This paper presents L-KD, a tool that relies on available linguistic and knowledge resources to perform keyphrase clustering and labelling. The aim of L-KD is to help find and trace themes in English and Italian text data, represented by groups of keyphrases and associated domains. We evaluate the top-ranked domains on the 20 Newsgroups dataset and show that 8 domains out of 10 match the manually assigned labels. This confirms the good accuracy of the approach, which does not require supervision.
Moretti, G., Sprugnoli, R., Tonelli, S., KD Strikes Back: from Keyphrases to Labelled Domains Using External Knowledge Sources, in Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016), (Napoli, Italy, 5-7 December 2016), aAccademia University Press, Torino 2016:1749 216-221. [10.4000/books.aaccademia.1814] [http://hdl.handle.net/10807/132966]
KD Strikes Back: from Keyphrases to Labelled Domains Using External Knowledge Sources
Sprugnoli, Rachele;
2016