Passarotti, M. C., Theory and Practice of Corpus Annotation in the Index Thomisticus Treebank, «LEXIS», 2009; 27: 5-23 [http://hdl.handle.net/10807/1403]
Theory and Practice of Corpus Annotation in the Index Thomisticus Treebank
Passarotti, Marco Carlo
2009
Abstract
Corpus linguistics is now a well-established field of research, and it requires collaboration with both computational and theoretical linguistics. Computational linguistics uses corpus data to train probabilistic Natural Language Processing (NLP) tools, such as taggers and parsers. Theoretical linguistics, in its empirical approaches to the study of language, relies on corpus evidence. For its part, corpus linguistics, as a discipline in its own right, uses NLP tools to build annotated corpora (semi-)automatically and draws on linguistic theory as the backbone of its annotation guidelines. Building a linguistically annotated corpus is therefore an excellent opportunity to test linguistic theories developed in the pre-corpus era against real data, and potentially to revise them. The challenge is even more attractive when the language in question is Latin. Although language-dependent computational processing of Latin is currently limited to automatic morphological tagging, a number of existing language-independent methods and analysis tools can be applied to it.