In this paper we define two parallel data sets based on pseudowords, extracted from the same corpus. They both consist of word-centered graphs for each of 1225 different pseudowords, and use first-order co-occurrences and second-order semantic similarities, respectively. We propose an evaluation framework on these data sets for graph-based Word Sense Induction (WSI) focused on the case of coarse-grained homonymy: we compare different WSI clustering algorithms by measuring how well their outputs agree with the a priori known ground-truth decomposition of a pseudoword. We perform this evaluation for four different clustering algorithms: the Markov cluster algorithm, Chinese Whispers, MaxMax and a gangplank-based clustering algorithm. To further improve the comparison between these algorithms and the analysis of their behaviours, we also define a new specific evaluation measure. As far as we know, this is the first large-scale systematic pseudoword evaluation dedicated to the induction of coarse-grained homonymous word senses.
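
The pseudoword idea in the abstract can be illustrated with a minimal sketch: two words are merged into one pseudoword, so the origin of every context word is known in advance, and a clustering of those context words is scored against that ground truth. Everything in the block is an assumption made for illustration, not the authors' implementation: the toy context lists, the function names `make_pseudoword` and `purity`, and the choice of purity as the agreement score (the paper defines its own evaluation measure, which is not reproduced here).

```python
from collections import Counter


def make_pseudoword(contexts_a, contexts_b):
    """Merge the context words of two component words into one pseudoword.

    Returns a dict mapping each context word to its ground-truth origin
    (0 for the first component, 1 for the second). Context words shared
    by both components are dropped so the toy ground truth is unambiguous.
    """
    shared = set(contexts_a) & set(contexts_b)
    truth = {w: 0 for w in contexts_a if w not in shared}
    truth.update({w: 1 for w in contexts_b if w not in shared})
    return truth


def purity(clusters, truth):
    """Fraction of words that belong to the majority origin of their cluster."""
    total = sum(len(c) for c in clusters)
    if total == 0:
        return 0.0
    agree = sum(
        max(Counter(truth[w] for w in c).values()) for c in clusters if c
    )
    return agree / total


# Hypothetical toy data: context words of "banana" and "guitar",
# merged into the pseudoword "banana_guitar".
truth = make_pseudoword(["fruit", "peel", "yellow"], ["string", "chord", "amp"])

# Two hypothetical clusterings, as a WSI algorithm might return them.
perfect = [{"fruit", "peel", "yellow"}, {"string", "chord", "amp"}]
mixed = [{"fruit", "peel", "string"}, {"yellow", "chord", "amp"}]

print(purity(perfect, truth))
print(purity(mixed, truth))
```

A clustering that separates the two component words exactly scores 1.0, while one that mixes them scores lower. Purity alone rewards over-splitting into many small clusters, which is one reason a comparison of clustering algorithms may need more than a single measure.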

Cecchini, F. M., Riedl, M., Biemann, C., Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction, Paper, in Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa (Gothenburg, Sweden, 22-24 May 2017), Linköping University Electronic Press, Linköping 2017, Linköping Electronic Conference Proceedings 131, pp. 105-114 [http://hdl.handle.net/10807/122036]

Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction

Cecchini, Flavio Massimiliano
2017

English
Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa
Nordic Conference on Computational Linguistics, NoDaLiDa
Gothenburg, Sweden
Paper
22 May 2017
24 May 2017
9789176856017
Linköping University Electronic Press

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/122036