This paper investigates recent advances in parsing the Index Thomisticus Treebank, which comprises Medieval Latin texts by Thomas Aquinas. The research focuses on two variables. On the one hand, it examines the impact of a larger dataset on parsing results; on the other hand, it compares the performance of new parsers with that of less recent tools. The baseline for measuring these advances is the set of results on the Index Thomisticus Treebank reported in a previous work. First, the best-performing parser among those considered in that study is tested on a larger dataset than the one originally used. Then, some parser combinations developed in the same study are evaluated as well, showing that more training data result in more accurate parsing. Finally, to examine the impact of newly available tools on parsing results, we train, test, and evaluate two neural parsers chosen from among the best performers in the CoNLL 2018 Shared Task. Our experiments reach the highest accuracy rates achieved so far in automatic syntactic parsing of the Index Thomisticus Treebank and of Latin overall.

Gamba, F., Passarotti, M. C., Ruffolo, P., More Data and New Tools. Advances in Parsing the Index Thomisticus Treebank, in Proceedings of the Conference on Computational Humanities Research 2021, Amsterdam, the Netherlands, November 17-19, 2021, CEUR Workshop Proceedings (CEUR-WS.org), 2021: 108-122 [http://hdl.handle.net/10807/187581]

More Data and New Tools. Advances in Parsing the Index Thomisticus Treebank

Passarotti, Marco Carlo
2021

2021
English
Proceedings of the Conference on Computational Humanities Research 2021. Amsterdam, the Netherlands, November 17-19, 2021, CEUR Workshop Proceedings, 2021
Conference on Computational Humanities Research 2021
AMSTERDAM -- NLD
17-nov-2021
21-nov-2021
NA
CEUR Workshop Proceedings (CEUR-WS.org)

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/187581