IRIS UniCatt

Theory and Practice of Corpus Annotation in the Index Thomisticus Treebank

Passarotti, Marco Carlo

2009

Abstract

Corpus linguistics is now a well-established field of research that requires collaboration with both computational and theoretical linguistics. Computational linguistics uses corpus data to train probabilistic Natural Language Processing (NLP) tools, such as taggers and parsers. Theoretical linguistics, in its empirical approaches to the study of language, relies on corpus evidence. Corpus linguistics, for its part, is a discipline in its own right. It uses NLP tools to build annotated corpora (semi-)automatically, and it relies on linguistic theory as the backbone of its annotation guidelines. Building a linguistically annotated corpus is therefore an excellent opportunity to apply linguistic theories designed in the pre-corpus era to real data, and potentially to revise them. The challenge is all the more attractive when the language involved is Latin. Language-dependent computational processing of Latin is at present limited to automatic morphological tagging. However, a number of available language-independent methods and analysis tools can be applied to it.

Publication year: 2009
Content language: English
Journal: LEXIS
Citation: Passarotti, M. C., Theory and Practice of Corpus Annotation in the Index Thomisticus Treebank, <<LEXIS>>, 2009; 27 (N/A): 5-23 [http://hdl.handle.net/10807/1403]
Appears in item types: Journal article, Case note

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/1403
