IRIS UniCatt

Harmonizing Divergent Lemmatization and Part-of-Speech Tagging Practices for Latin Participles through the LiLa Knowledge Base

Passarotti, Marco Carlo; Iurescia, Federica; Ruffolo, Paolo

2025

Abstract

This paper addresses the challenge of divergent lemmatization and part-of-speech (PoS) tagging practices for Latin participles in annotated corpora. We propose a solution through the LiLa Knowledge Base, a Linked Open Data framework designed to unify lexical and textual data for Latin. Using lemmas as the point of connection between distributed textual and lexical resources, LiLa introduces hypolemmas (secondary citation forms belonging to a word’s inflectional paradigm) as a means of reconciling divergent annotations for participles. Rather than advocating a single uniform annotation scheme, LiLa preserves each resource’s native guidelines while ensuring that users can retrieve and analyze participial data seamlessly. Through empirical assessments of multiple Latin corpora, we show how LiLa’s integration of lemmas and hypolemmas enables consistent retrieval of participle forms regardless of whether they are categorized as verbal or adjectival.
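
The idea of linking hypolemmas to lemmas can be illustrated with a minimal Python sketch. It is not LiLa's actual data model or API: the corpus names, the `Token` fields, and the helpers `tokens_for_lemma` and `participle_tokens` are hypothetical, and the toy data assumes one corpus lemmatizes participles of "amo" under the verb while another uses the participial citation form "amans" tagged as an adjective.

```python
from dataclasses import dataclass

# Hypothetical toy link table: hypolemma -> main lemma.
# "amans" is the present participle of the verb "amo".
HYPOLEMMA_TO_LEMMA = {"amans": "amo"}


@dataclass(frozen=True)
class Token:
    corpus: str          # hypothetical corpus identifier
    form: str            # word form as it occurs in the text
    lemma: str           # lemma assigned by that corpus's own guidelines
    pos: str             # PoS tag assigned by that corpus
    verb_form: str = ""  # "Part" when the corpus marks the token as a participle


# Two corpora with divergent practices for the same participle.
TOKENS = [
    Token("corpus_A", "amantem", "amo", "VERB", "Part"),  # verbal treatment
    Token("corpus_B", "amantis", "amans", "ADJ"),         # adjectival treatment
    Token("corpus_B", "amat", "amo", "VERB", "Fin"),      # finite form, not a participle
]


def canonical_lemma(lemma: str) -> str:
    """Map a hypolemma to its main lemma; other lemmas map to themselves."""
    return HYPOLEMMA_TO_LEMMA.get(lemma, lemma)


def tokens_for_lemma(lemma: str, tokens: list[Token]) -> list[Token]:
    """All tokens reachable from a lemma or from any of its hypolemmas."""
    target = canonical_lemma(lemma)
    return [t for t in tokens if canonical_lemma(t.lemma) == target]


def participle_tokens(verb_lemma: str, tokens: list[Token]) -> list[Token]:
    """Participle tokens of a verb, whichever annotation practice was used."""
    hypolemmas = {h for h, l in HYPOLEMMA_TO_LEMMA.items() if l == verb_lemma}
    return [
        t for t in tokens
        if t.lemma in hypolemmas
        or (t.lemma == verb_lemma and t.verb_form == "Part")
    ]


if __name__ == "__main__":
    for t in participle_tokens("amo", TOKENS):
        print(t.corpus, t.form, t.lemma, t.pos)
```

In this sketch neither corpus is re-annotated: each token keeps its native lemma and tag, and only the link table makes both participle tokens retrievable from a single query.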

Publication year: 2025
Content language: English
Title of the proceedings volume: Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025)
Event name: The 19th Linguistic Annotation Workshop (LAW-XIX)
Event location: Vienna, Austria
Event start date: 31 July 2025
Event end date: 31 July 2025
Volume ISBN: 979-8-89176-262-6
Publisher: Association for Computational Linguistics
Alternative URL: https://aclanthology.org/2025.law-1.8/
Citation: Passarotti, M. C., Iurescia, F., Ruffolo, P., Harmonizing Divergent Lemmatization and Part-of-Speech Tagging Practices for Latin Participles through the LiLa Knowledge Base, in Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025), (Vienna, Austria, 31 July 2025), Association for Computational Linguistics, Vienna, Austria 2025: 103-114 [https://hdl.handle.net/10807/319716]
Appears in types: Proceedings of conferences, congresses, study days, workshops, etc. (in volume)

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/319716
