IRIS UniCatt

An automatic identification and resolution system for protein-related abbreviations in scientific papers

Atzeni, P.; Polticelli, F.; Toti, Daniele

2011

Abstract

We propose a methodology to identify and resolve protein-related abbreviations found in the full texts of scientific papers, as part of a semi-automatic process implemented in our PRAISED framework. The identification of biological acronyms is carried out via an effective syntactical approach, by taking advantage of lexical clues and using mostly domain-independent metrics, resulting in considerably high levels of recall as well as extremely low execution time. The subsequent abbreviation resolution uses both syntactical and semantic criteria in order to match an abbreviation with its potential explanation, as discovered among a number of contiguous words proportional to the abbreviation's length. We have tested our system against the Medstract Gold Standard corpus and a relevant set of manually annotated PubMed papers, obtaining significant results and high performance levels, while at the same time allowing for great customization, lightness and scalability. © 2011 Springer-Verlag.
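The resolution step described in the abstract, searching for an explanation among a number of contiguous words proportional to the abbreviation's length, can be illustrated with a minimal sketch. This is not the PRAISED implementation: the function names, the uppercase-based candidate test, the `factor` parameter and the in-order letter-matching rule are all assumptions made for illustration, and only a purely syntactic criterion is shown (the semantic criteria mentioned in the abstract are not covered).

```python
import re
from typing import List, Optional

# Hypothetical lexical clue: a short alphanumeric token with at least
# two uppercase letters is treated as a candidate abbreviation.
ABBR_RE = re.compile(r"^[A-Za-z0-9-]{2,10}$")


def is_candidate(token: str) -> bool:
    """Return True if the token looks like an abbreviation (assumed heuristic)."""
    return bool(ABBR_RE.match(token)) and sum(c.isupper() for c in token) >= 2


def matches(abbr: str, words: List[str]) -> bool:
    """Check that the characters of abbr occur in order within the words,
    and that the first character of abbr starts the first word."""
    letters = [c.lower() for c in abbr if c.isalnum()]
    text = " ".join(words).lower()
    if not letters or not text or text[0] != letters[0]:
        return False
    pos = 0
    for ch in letters:
        pos = text.find(ch, pos)
        if pos < 0:
            return False
        pos += 1
    return True


def resolve(tokens: List[str], index: int, factor: int = 2) -> Optional[str]:
    """Look for an explanation of tokens[index] among the preceding words.

    The search window is proportional to the abbreviation's length
    (factor * len(abbr) words; the factor is an assumption of this sketch).
    """
    abbr = tokens[index].strip("()")
    window = factor * len(abbr)
    start = max(0, index - window)
    # Try the shortest candidate explanation first, then grow it leftwards.
    for begin in range(index - 1, start - 1, -1):
        candidate = tokens[begin:index]
        if matches(abbr, candidate):
            return " ".join(candidate)
    return None


if __name__ == "__main__":
    sentence = "the heat shock protein (HSP) is induced by stress".split()
    for i, tok in enumerate(sentence):
        if is_candidate(tok.strip("()")):
            print(tok.strip("()"), "->", resolve(sentence, i))
```

Tying the window to the abbreviation's length keeps the search cheap and limits spurious matches against distant text, which is in line with the low execution time the abstract reports, although the actual metrics used by the authors are not given in this record.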

Publication year: 2011
Content language: English
Title of the proceedings volume: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Event name: 9th European Conference on Evolutionary Computation, Machine Learning, and Data Mining in Bioinformatics, EvoBIO 2011
Event location: Torino, Italy
Contribution type: Paper
Event start date: 27 April 2011
Event end date: 29 April 2011
ISBN of the publication: 978-3-642-20388-6
Series name: Lecture Notes in Computer Science
Publisher: Springer Verlag
DOI of the contribution: https://dx.doi.org/10.1007/978-3-642-20389-3_18
Citation: Atzeni, P., Polticelli, F., Toti, D., An automatic identification and resolution system for protein-related abbreviations in scientific papers, Paper, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), (Torino, Italy, 27-29 April 2011), Springer Verlag, 2011, Lecture Notes in Computer Science 6623, 171-176. 10.1007/978-3-642-20389-3_18 [http://hdl.handle.net/10807/163936]
Appears in types: Paper, Selected paper, Contributed paper, Working paper, Poster, Poster paper, Communication, Talk (in volume)

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/163936
