This paper addresses the challenge of divergent lemmatization and part-of-speech (PoS) tagging practices for Latin participles in annotated corpora. We propose a solution through the LiLa Knowledge Base, a Linked Open Data framework designed to unify lexical and textual data for Latin. Using lemmas as the point of connection between distributed textual and lexical resources, LiLa introduces hypolemmas (secondary citation forms belonging to a word’s inflectional paradigm) as a means of reconciling divergent annotations for participles. Rather than advocating a single uniform annotation scheme, LiLa preserves each resource’s native guidelines while ensuring that users can retrieve and analyze participial data seamlessly. Through empirical assessments of multiple Latin corpora, we show how LiLa’s integration of lemmas and hypolemmas enables consistent retrieval of participle forms regardless of whether they are categorized as verbal or adjectival.
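The linking idea in the abstract can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not LiLa's actual data model or API: the identifiers (`lemma:amo`, `hypo:amans`), the corpus labels, and the helper `tokens_for_paradigm` are all hypothetical. It shows how tokens that one corpus links to the verb and another links to the participle can be retrieved together once the hypolemma is tied to its lemma.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    corpus: str     # hypothetical corpus label
    form: str       # surface form as it occurs in the text
    linked_to: str  # identifier of the entry (lemma or hypolemma) the corpus chose


# Hypothetical lexicon: maps each hypolemma to the lemma of its paradigm.
HYPOLEMMA_OF = {"hypo:amans": "lemma:amo"}

# Hypothetical tokens from two corpora that follow different guidelines.
TOKENS = [
    Token("corpus_A", "amantem", "lemma:amo"),   # participle lemmatized under the verb
    Token("corpus_B", "amantes", "hypo:amans"),  # participle lemmatized under the participle
    Token("corpus_B", "amat", "lemma:amo"),      # finite verb form
]


def tokens_for_paradigm(entry_id, tokens, hypolemma_of):
    """Return every token linked to the lemma of entry_id or to any of its hypolemmas."""
    # Resolve the starting entry to its lemma (a lemma resolves to itself).
    lemma = hypolemma_of.get(entry_id, entry_id)
    # Collect the lemma plus all hypolemmas attached to it.
    targets = {lemma} | {h for h, l in hypolemma_of.items() if l == lemma}
    # Each corpus keeps its own annotation; we only follow the links.
    return [t for t in tokens if t.linked_to in targets]


if __name__ == "__main__":
    for token in tokens_for_paradigm("hypo:amans", TOKENS, HYPOLEMMA_OF):
        print(token.corpus, token.form, token.linked_to)
```

Starting from either the lemma or the hypolemma yields the same set of tokens, and no corpus annotation is rewritten. Narrowing the result to participial forms only would additionally rely on each corpus's own morphological annotation, which this sketch leaves out.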

Passarotti, M. C., Iurescia, F., Ruffolo, P., Harmonizing Divergent Lemmatization and Part-of-Speech Tagging Practices for Latin Participles through the LiLa Knowledge Base, in Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025) (Vienna, Austria, 31 July 2025), Association for Computational Linguistics, Vienna, Austria 2025: 103-114 [https://hdl.handle.net/10807/319716]

Harmonizing Divergent Lemmatization and Part-of-Speech Tagging Practices for Latin Participles through the LiLa Knowledge Base

Passarotti, Marco Carlo; Iurescia, Federica; Ruffolo, Paolo
2025

English
Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025)
The 19th Linguistic Annotation Workshop (LAW-XIX)
Vienna, Austria
31 July 2025
979-8-89176-262-6
Association for Computational Linguistics


Use this identifier to cite or link to this document: https://hdl.handle.net/10807/319716