From Treebanks to Lexical Entries.
Clustering the Index Thomisticus

Passarotti, Marco Carlo

Language resources (LRs) such as corpora, lexica, grammars and ontologies are closely related to each other at both the development and the exploitation stage. In particular, a strong relation holds between lexical resources and annotated corpora. Recent years have seen a large growth of projects aimed at building LRs for Classical languages. Among these new LRs are syntactically annotated corpora (treebanks), which can be exploited to provide empirical evidence to test and refine lexical resources developed over the centuries by Ancient Greek and Latin lexicography. This paper describes the application of clustering techniques to the Index Thomisticus Treebank corpus to organise the meanings of the lemma forma in Thomas Aquinas’ works according to its textual and syntactic behaviour. Clustering is an unsupervised learning method that seeks structure in a collection of data. Applying clustering techniques to textual data rests on the theoretical assumption that words used in similar contexts tend to have the same or related meanings (the Distributional Hypothesis of HARRIS (1954)). Our results show that syntactic metadata are indeed helpful for clustering purposes.
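The idea of grouping occurrences of a lemma by the contexts they appear in can be illustrated with a minimal sketch. It is not the method used in the paper: the occurrence names, context lemmas and dependency labels below are invented toy data, and average-link agglomerative clustering over cosine similarity is only one plausible choice of technique.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy data (not taken from the Index Thomisticus Treebank):
# each occurrence of a target lemma is described by nearby lemmas ("w:")
# and by a dependency label attached to it ("dep:").
OCCURRENCES = {
    "occ1": ["w:materia", "w:corpus", "dep:Sb"],
    "occ2": ["w:materia", "w:substantia", "dep:Sb"],
    "occ3": ["w:figura", "w:species", "dep:Obj"],
    "occ4": ["w:figura", "w:imago", "dep:Obj"],
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def average_link(c1, c2, vectors) -> float:
    """Mean pairwise similarity between the members of two clusters."""
    sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
    return sum(sims) / len(sims)


def cluster(occurrences, k):
    """Merge the two most similar clusters until only k clusters remain."""
    vectors = {name: Counter(feats) for name, feats in occurrences.items()}
    clusters = [frozenset([name]) for name in sorted(vectors)]
    while len(clusters) > k:
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: average_link(pair[0], pair[1], vectors),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


if __name__ == "__main__":
    for group in sorted(sorted(c) for c in cluster(OCCURRENCES, 2)):
        print(group)
```

In this toy setting the syntactic labels are simply extra features alongside the lexical context, so occurrences that share both neighbouring lemmas and a syntactic role end up closer together.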

Passarotti, M. C., From Treebanks to Lexical Entries. Clustering the Index Thomisticus, in Actes du 31e Colloque International sur le Lexique et la Grammaire (Nové Hrady, 19-22 September 2012), Université de Bohême du Sud à České Budějovice, České Budějovice 2012: 143-147 [http://hdl.handle.net/10807/40639]


Publication year: 2012
Content language: English
Proceedings volume title: Actes du 31e Colloque International sur le Lexique et la Grammaire
Event name: 31e Colloque International sur le Lexique et la Grammaire
Event location: Nové Hrady
Event start date: 19 September 2012
Event end date: 22 September 2012
Volume ISBN: 978-80-7394-409-4
Publisher: Université de Bohême du Sud à České Budějovice
Alternative URL: http://books.google.it/books?id=yHnhKOpA0uAC&pg=PA143&lpg=PA143&dq=passarotti+Actes+du+31e+Colloque+International+sur+le+Lexique+et+la+Grammaire,&source=bl&ots=DUpid9NUjV&sig=fngaVnah591c5snTdkCpoYx2ZYE&hl=it&sa=X&ei=IUS0Uq6uKsifyQP4kYDYCg&ved=0CEAQ6AEwAw#v=onepage&q=passarotti%20Actes%20du%2031e%20Colloque%20International%20sur%20le%20Lexique%20et%20la%20Grammaire%2C&f=false
			
Publication type: Proceedings of conferences, congresses, study days, workshops, etc. (in volume)


Use this identifier to cite or link to this document: https://hdl.handle.net/10807/40639
