Passarotti, M. C., Leaving Behind the Less-Resourced Status. The Case of Latin through the Experience of the Index Thomisticus Treebank, in 7th SaLTMiL Workshop on Creation and Use of Basic Lexical Resources for Less-Resourced Languages, (Valletta, Malta, 23 May 2010), ELRA, Malta 2010: 27-32 [http://hdl.handle.net/10807/1407]
Leaving Behind the Less-Resourced Status. The Case of Latin through the Experience of the Index Thomisticus Treebank
Passarotti, Marco Carlo
2010
Abstract
Despite its key role in the history of computational linguistics, owing to the pioneering work of Roberto Busa SJ on the Index Thomisticus, Latin can still be considered a less-resourced language. Although several Latin texts have been digitized over the last decades, only a few of them have been linguistically tagged. The less-resourced status affects historical languages in general, but over the past few years a number of language resources for Latin and other historical languages have been started, among them several treebanks. Drawing on the experience of the Index Thomisticus Treebank project and, in particular, its valency lexicon, this paper reports some general insights into the creation and use of language resources for less-resourced languages. It shows that, although creating a language resource for a less-resourced language from scratch remains a labor-intensive and time-consuming task, the work is now made easier by exploiting the results of previous similar experiences in language resource development.