From Treebanks to Lexical Entries. Clustering the Index Thomisticus

Passarotti, Marco Carlo
2012

Abstract

Language resources (LRs) such as corpora, lexica, grammars and ontologies are closely related to each other at both the development and the exploitation stage. In particular, a strong relation holds between lexical resources and annotated corpora. Recent years have seen a large growth in projects aimed at building LRs for Classical languages. Among these new LRs are syntactically annotated corpora (treebanks), which can provide empirical evidence to test and refine the lexical resources developed over the centuries by Ancient Greek and Latin lexicography. This paper describes the application of clustering techniques to the Index Thomisticus Treebank corpus in order to organise the meanings of the lemma forma in Thomas Aquinas’ works according to its textual and syntactic behaviour. Clustering is an unsupervised learning method that seeks structure in a collection of data. Applying clustering techniques to textual data rests on the theoretical assumption that words used in similar contexts tend to have the same or related meanings (the Distributional Hypothesis of Harris (1954)). Our results show that syntactic metadata are indeed helpful for clustering purposes.
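Year: 2012
Language: English
Book title: Actes du 31e Colloque International sur le Lexique et la Grammaire
Conference: 31e Colloque International sur le Lexique et la Grammaire
Conference venue: Nové Hrady
Conference start date: 19 September 2012
Conference end date: 22 September 2012
ISBN: 978-80-7394-409-4
Publisher: Université de Bohême du Sud à České Budějovice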
Passarotti, M. C., From Treebanks to Lexical Entries. Clustering the Index Thomisticus, in Actes du 31e Colloque International sur le Lexique et la Grammaire, (Nové Hrady, 19-22 September 2012), Université de Bohême du Sud à České Budějovice, České Budějovice 2012: 143-147 [http://hdl.handle.net/10807/40639]