This paper describes the inclusion of Sicilian in the Semantic Web through the development of new resources aligned with Linguistic Linked Open Data principles. More specifically, we model and publish the first Sicilian Lemma Bank and a bilingual Sicilian–Italian glossary extracted from the Sicilian Wiktionary (Wikizziunariu). These resources are formalized using the OntoLex-Lemon and LiLa (Linking Latin) ontologies with the aim of enabling cross-lingual interoperability. The glossary is also linked to the LiITA (Linking Italian) knowledge base. In addition, two preliminary experiments are reported: the first evaluates the translation capabilities of commercial Large Language Models (LLMs) from Sicilian into Italian; the second investigates bilingual lexicon induction through cross-lingual embedding alignment, with results indicating the challenges posed by low-resource dialects. This work aims to demonstrate the feasibility and importance of integrating under-resourced languages into broader Computational Linguistics and Semantic Web infrastructures.
Sprugnoli, R., Moretti, G., Muscianisi, D. G., Litta Modignani Picozzi, E. M. G., Ciallabacialla! Modeling and Linking a Regional Lexical Resource to Include Sicilian in the Semantic Web, in Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025), (Cagliari (Italia), 24-26 September 2025), CEUR Workshop Proceedings (CEUR-WS.org), Cagliari 2025:<<CEUR WORKSHOP PROCEEDINGS>>,4112 1093-1101 [https://hdl.handle.net/10807/337630]
Ciallabacialla! Modeling and Linking a Regional Lexical Resource to Include Sicilian in the Semantic Web
Sprugnoli, RachelePrimo
Writing – Original Draft Preparation
;Litta Modignani Picozzi, Eleonora Maria GabriellaUltimo
Writing – Review & Editing
2025
Abstract
This paper describes the inclusion of Sicilian in the Semantic Web through the development of new resources aligned with Linguistic Linked Open Data principles. More specifically, we model and publish the first Sicilian Lemma Bank and a bilingual Sicilian–Italian glossary extracted from the Sicilian Wiktionary (Wikizziunariu). These resources are formalized using the OntoLex-Lemon and LiLa (Linking Latin) ontologies with the aim of enabling cross-lingual interoperability. The glossary is also linked to the LiITA (Linking Italian) knowledge base. In addition, two preliminary experiments are reported: the first evaluates the translation capabilities of commercial Large Language Models (LLMs) from Sicilian into Italian; the second investigates bilingual lexicon induction through cross-lingual embedding alignment, with results indicating the challenges posed by low-resource dialects. This work aims to demonstrate the feasibility and importance of integrating under-resourced languages into broader Computational Linguistics and Semantic Web infrastructures.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.



