Atzeni, P., Polticelli, F., Toti, D., Knowledge discovery from textual sources by using semantic similarity, Paper, in Proceedings of the 20th Italian Symposium on Advanced Database Systems, SEBD 2012, (Venice, Italy, 24-27 June 2012), Università Ca' Foscari, Venezia 2012: 213-220 [http://hdl.handle.net/10807/165882]

Knowledge discovery from textual sources by using semantic similarity

Toti, D.
2012

Abstract

We propose a methodology to automatically discover characterizing knowledge from textual sources, in order to semantically categorize them and cluster them according to their subjects. The methodology relies on several challenging steps, such as terminology extraction and disambiguation, semantic similarity identification via ontology alignment, and a core pattern-based strategy for automatic ontology building. It was originally devised as an extension of PRAISED, our abbreviation identification and resolution proposal. The goal was to resolve previously unresolvable abbreviations: those whose expansion either escapes the system's proximity-based approach or does not appear in the source text in which they occur. By moving from a paper-by-paper, mainly syntactic process to a corpus-based, semantic approach, it will in fact be possible to greatly enhance our system's resolution capabilities. Nevertheless, the strategy presented here is not tied to this specific task. It is relevant to a variety of contexts and might therefore find far wider applicability in other advanced knowledge extraction and discovery systems. Copyright (c) 2012 - Edizioni Libreria Progetto and the authors.
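The following sketch illustrates what a "proximity-based approach" to abbreviation resolution typically looks like and why it leaves some abbreviations unresolved. It is not PRAISED's actual algorithm. It is a generic heuristic in the style of Schwartz and Hearst: the code finds a parenthesised short form and matches its letters, right to left, against the few words just before it. All names here are hypothetical and introduced only for illustration, including `match_long_form`, `resolve_in_document`, `resolve_corpus` and the two regular expressions. The corpus step is a deliberately naive global lookup that reuses an expansion found in any other document. It leaves out the semantic categorization and clustering described in the abstract, so it cannot distinguish different expansions of the same short form across subjects.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns: a parenthesised short form such as "(SVM)",
# and any all-caps token that may be an abbreviation.
PAREN = re.compile(r"\(([A-Z][A-Za-z0-9\-]{1,9})\)")
ABBR = re.compile(r"\b[A-Z]{2,}[0-9]*\b")


def match_long_form(short: str, candidate: str) -> Optional[str]:
    """Schwartz-Hearst-style match: scan both strings right to left and return
    the suffix of `candidate` that contains the alphanumeric characters of
    `short` in order, with the first one at the start of a word."""
    s, l = len(short) - 1, len(candidate) - 1
    while s >= 0:
        c = short[s].lower()
        if not c.isalnum():
            s -= 1
            continue
        while l >= 0 and (
            candidate[l].lower() != c
            or (s == 0 and l > 0 and candidate[l - 1].isalnum())
        ):
            l -= 1
        if l < 0:
            return None  # the expansion is not in the nearby text
        l -= 1
        s -= 1
    return candidate[l + 1:].strip()


def resolve_in_document(text: str) -> Dict[str, str]:
    """Per-document, proximity-based resolution of 'long form (SF)' patterns."""
    found: Dict[str, str] = {}
    for m in PAREN.finditer(text):
        short = m.group(1)
        words = text[: m.start()].split()
        # Candidate window of min(|SF| + 5, 2 * |SF|) words before the parenthesis.
        window = min(len(short) + 5, 2 * len(short))
        long_form = match_long_form(short, " ".join(words[-window:]))
        if long_form:
            found.setdefault(short, long_form)
    return found


def resolve_corpus(docs: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Naive corpus-level fallback: reuse an expansion found in any document."""
    per_doc = {doc_id: resolve_in_document(text) for doc_id, text in docs.items()}
    corpus: Dict[str, str] = {}
    for mapping in per_doc.values():
        for short, long_form in mapping.items():
            corpus.setdefault(short, long_form)  # first expansion seen wins
    resolved: Dict[str, Dict[str, str]] = {}
    for doc_id, text in docs.items():
        local = dict(per_doc[doc_id])
        for short in set(ABBR.findall(text)):
            if short not in local and short in corpus:
                local[short] = corpus[short]
        resolved[doc_id] = local
    return resolved
```

Consider a document that defines "support vector machine (SVM)" and a second document that only mentions "SVM". `resolve_in_document` resolves the abbreviation in the first document but not in the second. `resolve_corpus` fills it in for both. An abbreviation defined nowhere in the corpus stays unresolved under either function.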
Language: English
Published in: Proceedings of the 20th Italian Symposium on Advanced Database Systems, SEBD 2012
Conference: 20th Italian Symposium on Advanced Database Systems, SEBD 2012
Location: Venice, Italy
Type: Paper
Start date: 24 June 2012
End date: 27 June 2012
ISBN: 9788896477236
Publisher: Università Ca' Foscari
Files in this item:
No files are associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/165882
Citations
  • PubMed Central: N/A
  • Scopus: 2
  • Web of Science: N/A