Passarotti, M. C., Leaving Behind the Less-Resourced Status. The Case of Latin through the Experience of the Index Thomisticus Treebank, in 7th SaLTMiL Workshop on Creation and Use of Basic Lexical Resources for Less-Resourced Languages, (La Valletta, Malta, 23 May 2010), ELRA, Malta 2010: 27-32 [http://hdl.handle.net/10807/1407]

Leaving Behind the Less-Resourced Status. The Case of Latin through the Experience of the Index Thomisticus Treebank

Passarotti, Marco Carlo
2010

Abstract

Despite its key role in the history of computational linguistics, thanks to the pioneering work by Roberto Busa SJ on the Index Thomisticus, Latin can still be considered a less-resourced language. Although several Latin texts have been digitized over the last decades, only a few of them have been linguistically tagged, while most still lack any linguistic annotation at all. However, while the less-resourced status affects historical languages in general, over the past few years a number of language resources for Latin and other historical languages have been started, among them several treebanks. Presenting the experience of the Index Thomisticus Treebank project and, in particular, its valency lexicon, this paper reports some general insights into the creation and use of language resources for less-resourced languages. It shows that, although creating a language resource from scratch for a less-resourced language remains a labor-intensive and time-consuming task, today the work is simplified by exploiting the results of previous, similar experiences in language resource development.
English
7th SaLTMiL Workshop on Creation and Use of Basic Lexical Resources for Less-Resourced Languages
La Valletta, Malta
23 May 2010
2-9517408-6-7

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/1407