We propose a framework for identifying, disambiguating and storing protein-related abbreviations as found in the full texts of scientific papers, in order to build and maintain a publicly available abbreviation repository via a semi-automatic process. This process involves information extraction methods and techniques for acronym identification and resolution, based on lexical clues and syntactical, largely domain-independent criteria. A dictionary and an ontology for proteins provide the means for matching and disambiguating the biological entities. User feedback is gathered at the end of the process and the confirmed entries are then stored and made available to the scientific community for further reviewing. © 2011 IEEE.

Atzeni, P., Polticelli, F., Toti, D., A framework for semi-automatic identification, disambiguation and storage of protein-related abbreviations in scientific literature, Paper, in Proceedings - International Conference on Data Engineering (Hannover, Germany, 11-16 April 2011), Hannover 2011: 59-61. doi:10.1109/ICDEW.2011.5767646 [http://hdl.handle.net/10807/163320]

A framework for semi-automatic identification, disambiguation and storage of protein-related abbreviations in scientific literature

Toti, Daniele
First
2011

2011
English
Proceedings - International Conference on Data Engineering
2011 IEEE 27th International Conference on Data Engineering Workshops, ICDE 2011
Hannover, Germany
Paper
11 April 2011
16 April 2011
978-1-4244-9195-7
N/A
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/163320
Citations
  • PMC: not available
  • Scopus: 14
  • Web of Science (ISI): not available