IRIS UniCatt

Differentia compositionem facit. A Slower-Paced and Reliable Parser for Latin

Ponti, Edoardo; Passarotti, Marco Carlo

2016

Abstract

The Index Thomisticus Treebank is the largest available treebank for Latin; it contains Medieval Latin texts by Thomas Aquinas. After experimenting on its data with a number of dependency parsers based on different supervised machine learning techniques, we found that DeSR with a multilayer perceptron algorithm, a right-to-left transition, and a tailor-made feature model is the parser providing the highest accuracy rates. We improved the results further by using a technique that combines the output parses of DeSR with those provided by other parsers, outperforming the previous state of the art in parsing the Index Thomisticus Treebank. The key idea behind this improvement is to ensure sufficient diversity and accuracy of the outputs to be combined; for this reason, we performed an in-depth evaluation of the results provided by the different parsers that we combined. Finally, we found that, although the general architecture of the parser is portable to Classical Latin, the model trained on Medieval Latin is inadequate for that purpose.

Publication year: 2016
Content language: English
Title of the proceedings volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
Event name: Tenth International Conference on Language Resources and Evaluation (LREC 2016)
Event location: Portorož
Event start date: 23 May 2016
Event end date: 28 May 2016
ISBN of the volume: 978-2-9517408-9-1
Publisher: European Language Resources Association (ELRA)
Citation: Ponti, E., Passarotti, M. C., Differentia compositionem facit. A Slower-Paced and Reliable Parser for Latin, in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), (Portorož, 23-28 May 2016), European Language Resources Association (ELRA), Portorož 2016: 683-688 [http://hdl.handle.net/10807/78692]
Appears in types: Proceedings of conferences, congresses, study days, etc., workshops (in volume)

Files in this record:

There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/78692
