IRIS UniCatt



Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction

Cecchini, Flavio Massimiliano; Riedl, Martin; Biemann, Chris

2017

Abstract

In this paper we define two parallel data sets based on pseudowords, extracted from the same corpus. Both consist of word-centered graphs for each of 1225 different pseudowords, and they use first-order co-occurrences and second-order semantic similarities, respectively. We propose an evaluation framework on these data sets for graph-based Word Sense Induction (WSI) focused on the case of coarse-grained homonymy: we compare different WSI clustering algorithms by measuring how well their outputs agree with the a priori known ground-truth decomposition of a pseudoword. We perform this evaluation for four different clustering algorithms: the Markov cluster algorithm, Chinese Whispers, MaxMax and a gangplank-based clustering algorithm. To further improve the comparison between these algorithms and the analysis of their behaviours, we also define a new specific evaluation measure. As far as we know, this is the first large-scale systematic pseudoword evaluation dedicated to the induction of coarse-grained homonymous word senses.
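The evaluation idea in the abstract can be illustrated with a minimal sketch, assuming a toy word-centered graph for a hypothetical pseudoword "banana_door" (the two merged words, their neighbours and the edge weights are invented for illustration and are not taken from the paper's data sets). The graph is clustered with a simple reading of Chinese Whispers: label propagation in random order, with ties broken by the smallest label, a choice made here only for reproducibility. The result is then scored with purity and inverse purity against the known source words. Purity is a standard stand-in and is not the new evaluation measure defined in the paper.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, max_iter=20, seed=0):
    """Chinese Whispers-style label propagation on an undirected weighted graph.

    edges maps (u, v) node pairs to positive weights; returns node -> cluster label.
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    labels = {node: node for node in adj}  # every node starts in its own class
    rng = random.Random(seed)  # seeded so that the visiting order is reproducible
    nodes = sorted(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)  # visit nodes in a random order on each pass
        changed = False
        for node in nodes:
            # Sum the edge weights of the neighbours carrying each label.
            scores = defaultdict(float)
            for nb, w in adj[node].items():
                scores[labels[nb]] += w
            best = max(scores.values())
            # Adopt the heaviest label; ties go to the smallest label
            # (an assumption made here for determinism).
            new = min(lab for lab, s in scores.items() if s == best)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:  # a full pass without changes: converged
            break
    return labels


def purity(assignment, reference):
    """Share of nodes that fall in the majority reference class of their group."""
    groups = defaultdict(list)
    for node, group in assignment.items():
        groups[group].append(reference[node])
    hits = sum(max(members.count(c) for c in set(members)) for members in groups.values())
    return hits / len(assignment)


# Hypothetical word-centered graph for the pseudoword "banana_door": the
# neighbours of "banana" and of "door" are merged into one graph.
edges = {
    ("fruit", "peel"): 1.0, ("fruit", "yellow"): 1.0, ("peel", "yellow"): 1.0,
    ("handle", "lock"): 1.0, ("handle", "knock"): 1.0, ("lock", "knock"): 1.0,
    ("yellow", "handle"): 0.1,  # weak edge across the two source words
}
# Ground truth: which original word each neighbour was collected for.
gold = {"fruit": "banana", "peel": "banana", "yellow": "banana",
        "handle": "door", "lock": "door", "knock": "door"}

clusters = chinese_whispers(edges)
print(len(set(clusters.values())), purity(clusters, gold), purity(gold, clusters))
```

Because the source word of every neighbour is known by construction, a perfect clustering recovers exactly the two original words. Here the weak cross edge (0.1) never outweighs the within-word edges, so the sketch prints `2 1.0 1.0`: two clusters, with purity and inverse purity both equal to 1.0.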


Publication year: 2017
Content language: English
Title of the proceedings volume: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa
Event name: Nordic Conference on Computational Linguistics, NoDaLiDa
Event location: Gothenburg, Sweden
Contribution type: Paper
Event start date: 22-May-2017
Event end date: 24-May-2017
Publication ISBN: 9789176856017
Series name: Linköping Electronic Conference Proceedings
Publisher: Linköping University Electronic Press
Citation: Cecchini, F. M., Riedl, M., Biemann, C., Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction, Paper, in Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa (Gothenburg, Sweden, 22-24 May 2017), Linköping University Electronic Press, Linköping 2017 (Linköping Electronic Conference Proceedings, 131), pp. 105-114 [http://hdl.handle.net/10807/122036]
Appears in categories: Paper, Selected paper, Contributed paper, Working paper, Poster, Poster paper, Communication, Report (in volume)

Files in this item:

No files are associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/122036
