The Project of the Index Thomisticus Treebank

Passarotti, Marco Carlo
2019

Abstract

The paper introduces the project of the Index Thomisticus Treebank (IT-TB). The IT-TB is a dependency-based treebank built on the corpus of the Index Thomisticus (IT) by Father Roberto Busa, which includes the opera omnia of Thomas Aquinas, for a total of approximately 11 million words. Currently, the IT-TB is the largest Latin treebank available, with more than 350,000 nodes in around 17,000 sentences. The annotation covers all of Books 1, 2 and 3 of Summa contra Gentiles, plus excerpts from Scriptum super Sententiis Magistri Petri Lombardi and Summa Theologiae. The paper details the multi-layer annotation style of the IT-TB and its background theoretical motivations. The conversion process to the now widely used Universal Dependencies style is described as well. Over more than a decade, the project has developed a number of linguistic resources and NLP tools for Latin connected to the IT-TB. As for the resources, the paper presents the syntax-based subcategorization lexicon IT-VaLex and the valency lexicon Latin Vallex. As for the tools, the automatic dependency parsing process is described, highlighting the core issue of the portability of NLP tools across the wide diachronic and diatopic span of Latin texts. A section is dedicated to the automatic morphological analysis of Latin, introducing the analyzer Lemlat and its recent enhancement with information on derivational morphology and a new set of lexical entries covering a large Onomasticon (from the Forcellini dictionary) and Medieval Latin (from the Du Cange glossary).
Year: 2019
Language: English
Book title: Digital Classical Philology. Ancient Greek and Latin in the Digital Revolution
ISBN: 978-3-11-059678-6
Publisher: De Gruyter
Series volume: 10
Passarotti, M. C., The Project of the Index Thomisticus Treebank, in Berti, M. (ed.), Digital Classical Philology. Ancient Greek and Latin in the Digital Revolution, De Gruyter, Berlin - Boston 2019 (Age of Access? Grundfragen der Informationsgesellschaft, 10), pp. 299-319. DOI: 10.1515/9783110599572-017 [http://hdl.handle.net/10807/141133]

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/141133