IRIS UniCatt

Join Together? Combining Data to Parse Italian Texts

Claudia Corbetta; Giovanni Moretti; Marco Carlo Passarotti

2024

Abstract

In this paper, we create and evaluate non-combined and combined models using Old and Contemporary Italian data to determine whether enlarging the training data through a combined model improves parsing accuracy and thereby facilitates manual annotation. We find that, despite the larger training set, in-domain parsing performs better. Additionally, we find that models trained on Old Italian data perform better on Contemporary Italian data than the reverse. We attempt to explain this result in terms of syntactic complexity, finding that Old Italian texts have longer sentences and a higher non-projectivity rate.
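The two complexity measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the record does not say how the paper defines the non-projectivity rate, so the sketch assumes it is the share of sentences containing at least one non-projective arc. The input format (one list of 1-based head indices per sentence, with 0 for the root) and the names `dominates`, `is_projective`, and `complexity` are likewise assumptions made for illustration.

```python
from typing import List, Tuple


def dominates(heads: List[int], ancestor: int, node: int) -> bool:
    """Return True if `ancestor` is `node` itself or one of its ancestors.

    heads[i] is the head of token i + 1; 0 marks the artificial root.
    Assumes the head list encodes a well-formed (acyclic) tree.
    """
    if ancestor == 0:
        return True  # the artificial root dominates every token
    current = node
    while current != 0:
        if current == ancestor:
            return True
        current = heads[current - 1]
    return False


def is_projective(heads: List[int]) -> bool:
    """A tree is projective if, for every arc, each token lying between
    the head and the dependent is dominated by the head."""
    for dep, head in enumerate(heads, start=1):
        lo, hi = sorted((head, dep))
        for between in range(lo + 1, hi):
            if not dominates(heads, head, between):
                return False
    return True


def complexity(treebank: List[List[int]]) -> Tuple[float, float]:
    """Return (mean sentence length, share of non-projective sentences)."""
    n = len(treebank)
    mean_length = sum(len(heads) for heads in treebank) / n
    non_projective = sum(not is_projective(heads) for heads in treebank) / n
    return mean_length, non_projective


# Hypothetical toy data: the first tree is projective, the second is not
# (the arc from token 4 to token 2 spans token 3, which token 4 does not dominate).
toy_treebank = [[2, 0, 2], [3, 4, 0, 3]]
print(complexity(toy_treebank))
```

Running the same function over two treebanks, for example one Old Italian and one Contemporary Italian, would give directly comparable figures under the stated assumption.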

Publication year: 2024
Content language: English
Title of the proceedings volume: Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Event name: Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)
Event location: Pisa
Event start date: 4 December 2024
Event end date: 6 December 2024
ISBN of the volume: 979-12-210-7060-6
Publisher: CEUR Workshop Proceedings
Alternative URL: https://aclanthology.org/2024.clicit-1.30/
Citation: Corbetta, C., Moretti, G., Passarotti, M. C., Join Together? Combining Data to Parse Italian Texts, in Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), (Pisa, 04-06 December 2024), CEUR Workshop Proceedings, Pisa 2024: 251-257 [https://hdl.handle.net/10807/308718]
Publication type: Proceedings of conferences, congresses, study days, etc., and workshops (in volume)

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/308718
