In this paper we present an unsupervised, graph-based approach to Word Sense Discrimination. Given a set of sentences, we derive a word co-occurrence graph and define on it a distance based on the Jaccard index. This distance is then used to cluster the neighbouring nodes of ambiguous terms, using the concept of “gangplanks”: edges that separate denser regions (“islands”) of the graph. The proposed approach has been evaluated on a real data set, showing promising performance in Word Sense Discrimination.
The aim of this paper is to describe an unsupervised, graph-based clustering approach for identifying and discriminating the different senses that a term can take on within a text. Starting from a co-occurrence graph, we define a distance between its nodes and apply an algorithm based on “gangplanks”, that is, edges that separate dense regions (“islands”) within the graph. We discuss the results obtained on a data set composed of tweets.
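The first two steps named in the abstract (deriving a co-occurrence graph from sentences and defining a Jaccard-based distance on it) can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so everything here is an assumption made for illustration: words co-occur when they share a sentence, the distance is one minus the Jaccard index of the closed neighbourhoods of two nodes, and the function names `cooccurrence_graph` and `jaccard_distance` as well as the toy sentences are hypothetical. The gangplank clustering step itself is not sketched, since the abstract does not specify it.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(sentences):
    """Map each word to the set of words it shares a sentence with.

    Assumption: tokens are whitespace-separated and lowercased; the
    paper may use a different tokenisation or weighting.
    """
    graph = defaultdict(set)
    for sentence in sentences:
        words = set(sentence.lower().split())
        for u, v in combinations(words, 2):
            graph[u].add(v)
            graph[v].add(u)
    return dict(graph)


def jaccard_distance(graph, u, v):
    """Return 1 - |N[u] & N[v]| / |N[u] | N[v]|.

    Assumption: N[x] is the closed neighbourhood of x (its neighbours
    plus x itself), which keeps the union non-empty.
    """
    nu = graph.get(u, set()) | {u}
    nv = graph.get(v, set()) | {v}
    return 1.0 - len(nu & nv) / len(nu | nv)


# Toy data (hypothetical): "bank" is ambiguous between two senses.
sentences = [
    "bank river water",
    "bank money loan",
    "river water fish",
    "money loan credit",
]
graph = cooccurrence_graph(sentences)

# Neighbours of "bank" from the same sense are closer to each other
# than neighbours from different senses.
same_sense = jaccard_distance(graph, "river", "water")
cross_sense = jaccard_distance(graph, "river", "money")
print(same_sense < cross_sense)
```

In this toy graph the neighbours of "bank" fall into two groups that are close internally and far from each other, which is the kind of structure a clustering of the neighbourhood of an ambiguous term would aim to recover.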
Cecchini, F. M., Fersini, E., Word Sense Discrimination: A Gangplank Algorithm, Comunicazione, in Proceedings of the second Italian conference on Computational Linguistics CLiC-it 2015, (Fondazione Bruno Kessler, Trento, 03-04 December 2015), aAccademia University Press, Torino 2015: 77-81 [http://hdl.handle.net/10807/122108]
Word Sense Discrimination: A Gangplank Algorithm
Cecchini, Flavio Massimiliano
2015