IRIS UniCatt

This paper presents an approach to integrating Latin inflected forms and corpus attestations within a Linked Open Data (LOD) framework, enhancing interoperability between Wikidata and the LiLa knowledge base. Building on the PrinParLat lexicon of Latin verb principal parts, we generate the complete set of inflected forms for over 8,000 verbs, encoded as RDF in a dedicated Wikibase instance. These forms are linked to the Index Thomisticus Treebank (ITTB), whose morphologically annotated tokens are related to corresponding forms based on segmental identity, lemma alignment, and mapped morphological features. Our generation and linking process achieves over 95% coverage of ITTB verbal tokens, demonstrating the robustness of our pipeline even for Medieval Latin data. By aligning Paralex, Wikidata, and LiLa ontologies, we ensure semantic interoperability and facilitate future integration into Wikidata. Beyond Latin, this workflow provides a reproducible model for linking inflectional paradigms and corpus attestations in other languages.
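The linking step described above relates a corpus token to a generated form on three criteria: segmental identity, lemma alignment, and mapped morphological features. The sketch below shows one way this could work and is not the authors' implementation. The class names, the `TAG_MAP` table, the feature labels, and the sample data are all hypothetical and do not reproduce the actual ITTB or Paralex tagsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    """A generated inflected form (hypothetical record layout)."""
    form_id: str
    written: str
    lemma: str
    features: frozenset


@dataclass(frozen=True)
class Token:
    """A morphologically annotated corpus token (hypothetical record layout)."""
    text: str
    lemma: str
    tags: frozenset


# Hypothetical mapping from corpus-side tags to lexicon-side feature labels.
TAG_MAP = {
    "Mood=Ind": "ind",
    "Tense=Pres": "prs",
    "Voice=Act": "act",
    "Person=3": "3",
    "Number=Sing": "sg",
    "Number=Plur": "pl",
}


def map_features(tags):
    """Translate corpus tags into lexicon labels, skipping unmapped tags."""
    return frozenset(TAG_MAP[t] for t in tags if t in TAG_MAP)


def link(token, forms):
    """Return IDs of forms matching the token on all three criteria."""
    wanted = map_features(token.tags)
    return [
        f.form_id
        for f in forms
        if f.written == token.text.lower()  # segmental identity
        and f.lemma == token.lemma          # lemma alignment
        and wanted <= f.features            # mapped morphological features
    ]


def coverage(tokens, forms):
    """Share of tokens linked to at least one form."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if link(t, forms)) / len(tokens)


# Invented sample data, for illustration only.
FORMS = [
    Form("F1", "amat", "amo", frozenset({"ind", "prs", "act", "3", "sg"})),
    Form("F2", "amant", "amo", frozenset({"ind", "prs", "act", "3", "pl"})),
]
TOKENS = [
    Token("Amat", "amo", frozenset({"Mood=Ind", "Tense=Pres", "Voice=Act", "Person=3", "Number=Sing"})),
    Token("dicit", "dico", frozenset({"Mood=Ind", "Tense=Pres", "Voice=Act", "Person=3", "Number=Sing"})),
]
```

With this sample data, `link(TOKENS[0], FORMS)` finds the form `F1`, while the second token has no generated counterpart and stays unlinked. A coverage figure like the one reported in the abstract would be computed over the full set of verbal tokens in the same way.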

Lindemann, D., Pellegrini, M., Mambrini, F., Passarotti, M. C., Wikidata and LiLa for Latin: Enabling Interoperability and Access to Inflected Forms and Corpus Attestations, Journal of Open Humanities Data, 2025; 11 (83): 1-17. doi:10.5334/johd.464. https://hdl.handle.net/10807/328059

Wikidata and LiLa for Latin: Enabling Interoperability and Access to Inflected Forms and Corpus Attestations

Lindemann, David; Pellegrini, Matteo; Mambrini, Francesco; Passarotti, Marco Carlo

2025



Publication year: 2025
Content language: English
Journal name: Journal of Open Humanities Data
DOI: https://doi.org/10.5334/johd.464
Appears in types: Journal article, case note


Use this identifier to cite or link to this document: https://hdl.handle.net/10807/328059
