IRIS UniCatt


A knowledge discovery methodology for semantic categorization of unstructured textual sources

Toti, Daniele; Atzeni, P.; Polticelli, F.

2012

Abstract

We describe a methodology for identifying the characterizing terms of a source text or paper and automatically building an ontology around them. The goal is to categorize a paper corpus semantically, so that documents sharing similar subjects can subsequently be clustered together by means of ontology alignment. We first employ a Natural Language Processing pipeline to extract relevant terms from the source text, and then combine a pattern-based approach with a machine-learning approach to establish semantic relationships among those terms, with some user feedback required in between. This methodology for discovering characterizing knowledge from textual sources originated as an extension of PRAISED, our abbreviation discovery framework, intended to enhance its resolution capabilities. Moving from a paper-by-paper, mainly syntactic process to a corpus-based, semantic approach made it possible to overcome an earlier limitation of the system: abbreviations whose explanation could not be found within the same paper in which they appeared. The methodology is not tied to this specific task, however; it is relevant to a variety of contexts and might therefore be used to build a stand-alone system for advanced knowledge extraction and semantic categorization. © 2012 IEEE.
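The two stages named in the abstract (term extraction, then pattern-based relationship discovery) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not say which patterns or ranking criteria the authors use, so the sketch substitutes raw frequency ranking for the NLP pipeline and a single Hearst-style "X such as Y" pattern for the pattern-based step. The names `extract_terms`, `extract_is_a`, `STOPWORDS`, `SUCH_AS`, and `SPLIT` are hypothetical, and the machine-learning and user-feedback steps are not covered.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "in", "is", "of", "or", "such", "the", "to"}

# Hearst-style pattern (assumed for illustration):
#   "<hypernym> such as <hyponym>[, <hyponym>]* [and|or <hyponym>]"
SUCH_AS = re.compile(r"(\w+) such as (\w+(?:, (?!and |or )\w+)*(?:,? (?:and|or) \w+)?)")

# Separators inside the captured list: ", ", " and ", " or ", ", and ", ", or ".
SPLIT = re.compile(r",?\s+(?:and|or)\s+|,\s*")


def extract_terms(text, top_n=5):
    """Rank candidate terms by raw frequency after stopword removal."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(top_n)]


def extract_is_a(text):
    """Return (hyponym, "is_a", hypernym) triples found by the pattern."""
    relations = []
    for match in SUCH_AS.finditer(text.lower()):
        hypernym, listing = match.group(1), match.group(2)
        for hyponym in SPLIT.split(listing):
            relations.append((hyponym, "is_a", hypernym))
    return relations


if __name__ == "__main__":
    sample = (
        "Proteins such as insulin, keratin and collagen are studied. "
        "Proteins fold. Insulin regulates glucose."
    )
    print(extract_terms(sample, top_n=2))
    print(extract_is_a(sample))
```

In a fuller system the triples produced this way would only be candidates: per the abstract, they would be combined with a machine-learning step and reviewed with user feedback before entering the ontology.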


Publication year: 2012
Content language: English
Title of the proceedings volume: 8th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2012
Event name: 8th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2012
Event location: Sorrento, Italy
Contribution type: Paper
Event start date: 25 November 2012
Event end date: 29 November 2012
Publication ISBN: 978-1-4673-5152-2
Publisher: IEEE
Contribution DOI: https://dx.doi.org/10.1109/SITIS.2012.140
Citation: Toti, D., Atzeni, P., Polticelli, F., A knowledge discovery methodology for semantic categorization of unstructured textual sources, Paper, in 8th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2012 (Sorrento, Italy, 25-29 November 2012), IEEE, 345 E 47th St, New York, NY 10017, USA, 2012: 944-951. 10.1109/SITIS.2012.140 [http://hdl.handle.net/10807/163938]
Appears in the categories: Paper, Selected paper, Contributed paper, Working paper, Poster, Poster paper, Communication, Report (in volume)


Use this identifier to cite or link to this document: https://hdl.handle.net/10807/163938
