First Steps towards the Semi-automatic Development of a Wordformation-based Lexicon of Latin

Passarotti, Marco Carlo; Mambrini, Francesco
2012

Abstract

Although the lexicography of Latin has a long tradition dating back to the ancient grammarians, and almost all Latin grammars devote at least part of their morphology section(s) to wordformation, none of the lexical resources and NLP tools currently available for Latin features a wordformation-based organization of the Latin lexicon. In this paper, we describe the first steps towards the semi-automatic development of a wordformation-based lexicon of Latin, detailing several problems that arise while building the lexicon and presenting our solutions. Developing a wordformation-based lexicon of Latin is now of the utmost importance, as recent years have seen a large growth in annotated corpora of Latin texts from different eras. While these corpora include lemmatization, morphological tagging and syntactic analysis, none of them features segmentation of word forms or wordformation relations between lexemes. This restricts the browsing and exploitation of the annotated data for linguistic research and for NLP tasks such as information retrieval and heuristics for PoS tagging of unknown words.
English
Proceedings of the Eighth International Conference on Language Resources and Evaluation
LREC 2012
Istanbul
23 May 2012
25 May 2012
978-2-9517408-7-7
ELDA
Passarotti, M. C., Mambrini, F., First Steps towards the Semi-automatic Development of a Wordformation-based Lexicon of Latin, in Proceedings of the Eighth International Conference on Language Resources and Evaluation, (Istanbul, 23-25 May 2012), ELDA, Istanbul 2012: 852-859 [http://hdl.handle.net/10807/1422]

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/1422