IRIS UniCatt

Improvements in Parsing the Index Thomisticus Treebank. Revision, Combination and a Feature Model for Medieval Latin

Passarotti, Marco Carlo; Dell'Orletta, Felice

2010

Abstract

The creation of language resources for less-resourced languages, such as historical ones, benefits from language-independent tools and methods developed over the years by many projects for modern languages. Along these lines, a number of treebanks for historical languages have recently begun to appear, including treebanks for Latin. Among these, the Index Thomisticus Treebank is a 68,000-token dependency treebank based on the Index Thomisticus by Roberto Busa SJ, a corpus containing the opera omnia of Thomas Aquinas (118 texts) as well as 61 texts by other authors related to Thomas, for a total of approximately 11 million tokens. In this paper, we describe a number of modifications that we applied to the dependency parser DeSR in order to improve parsing accuracy on the Index Thomisticus Treebank. First, we adapted the parser to the specific processing of Medieval Latin by defining an ad hoc configuration of its features. Then, to further improve the accuracy provided by DeSR, we applied a revision parsing method and combined the outputs produced by different algorithms. This allowed us to improve accuracy rates substantially, reaching results well beyond the state of the art in parsing for Latin.

Publication year: 2010
Content language: English
Title of the proceedings volume: Proceedings of the Seventh International Conference on Language Resources and Evaluation
Event name: Seventh International Conference on Language Resources and Evaluation (LREC 2010)
Event location: La Valletta - Malta
Event start date: 19 May 2010
Event end date: 21 May 2010
Volume ISBN: 2-9517408-6-7
Publisher: ELRA
Citation: Passarotti, M. C., Dell'Orletta, F., Improvements in Parsing the Index Thomisticus Treebank. Revision, Combination and a Feature Model for Medieval Latin, in Proceedings of the Seventh International Conference on Language Resources and Evaluation, (La Valletta - Malta, 19-21 May 2010), ELRA, La Valletta 2010: 1964-1971 [http://hdl.handle.net/10807/1402]
Appears in types: Proceedings of conferences, congresses, study days, etc., workshops (in volume)

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/1402