Join Together? Combining Data to Parse Italian Texts

Passarotti, Marco Carlo
2024

Abstract

In this paper, we create and evaluate non-combined and combined models using Old and Contemporary Italian data to determine whether increasing the size of the training data with a combined model could improve parsing accuracy to facilitate manual annotation. We find that, despite the increased size of the training data, in-domain parsing performs better. Additionally, we discover that models trained on Old Italian data perform better on Contemporary Italian data than the reverse. We attempt to explain this result in terms of syntactic complexity, finding that Old Italian text exhibits higher sentence length and non-projectivity rate.
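The two complexity measures named in the abstract, sentence length and non-projectivity rate, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the input format (one list of 1-based head indices per sentence, with 0 marking the root) and the choice to report the rate both per arc and per sentence are assumptions made here for illustration, since the record does not say how the rate was computed.

```python
# Illustrative sketch only; names and input format are assumptions,
# not taken from the paper.

def is_projective_arc(heads, dep):
    """Return True if the arc into token `dep` is projective.

    heads[i - 1] is the head of token i (tokens are 1-based, 0 is the root).
    An arc head -> dep is projective when every token strictly between
    them is a descendant of `head`. Assumes a well-formed tree (no cycles).
    """
    head = heads[dep - 1]
    if head == 0:
        return True  # every token descends from the artificial root
    lo, hi = min(head, dep), max(head, dep)
    for k in range(lo + 1, hi):
        node = k
        # Climb towards the root until we meet `head` or run out of tree.
        while node != 0 and node != head:
            node = heads[node - 1]
        if node != head:
            return False
    return True


def corpus_stats(treebank):
    """Mean sentence length and two non-projectivity rates for a treebank."""
    n_sent = len(treebank)
    n_tokens = sum(len(heads) for heads in treebank)
    flags = [
        [not is_projective_arc(heads, d) for d in range(1, len(heads) + 1)]
        for heads in treebank
    ]
    return {
        "mean_sentence_length": n_tokens / n_sent,
        # share of arcs that are non-projective
        "nonprojective_arc_rate": sum(map(sum, flags)) / n_tokens,
        # share of sentences with at least one non-projective arc
        "nonprojective_sentence_rate": sum(map(any, flags)) / n_sent,
    }


# Two toy sentences (hypothetical data): the first is projective,
# the second has one crossing arc (token 3 attaches to token 5).
toy_treebank = [[2, 0, 2], [2, 0, 5, 2, 2]]
stats = corpus_stats(toy_treebank)
print(stats)
```

Running the same function on an Old Italian and a Contemporary Italian treebank would give directly comparable figures for the kind of contrast the abstract reports.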
English
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)
Pisa
4 Dec 2024
6 Dec 2024
979-12-210-7060-6
CEUR Workshop Proceedings
Corbetta, C., Moretti, G., Passarotti, M. C., Join Together? Combining Data to Parse Italian Texts, in Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), (Pisa, 04-06 December 2024), CEUR Workshop Proceedings, Pisa 2024: 251-257 [https://hdl.handle.net/10807/308718]


Use this identifier to cite or link to this document: https://hdl.handle.net/10807/308718