IRIS UniCatt

Scaling historical text re-use

Greta Franzini (second author); Emily Franzini (penultimate author); Maria Moritz (last author)

2014

Abstract

Text re-use is the spoken or written repetition of information. Historical text re-use, which spans a longer period of time, involves a wider range of morphological, linguistic, syntactic, semantic and copying variations, which makes text re-use detection more difficult. It also increases the likelihood of redundancy in a digital library. In Natural Language Processing, it is crucial to remove these redundancies before applying any kind of machine learning technique to the text. In the Humanities, by contrast, these redundancies are central to textual criticism and allow scholars to identify lines of transmission. Identification can be carried out with automatic or semi-automatic methods. Text re-use algorithms, however, have quadratic complexity and therefore demand considerable computational power. The present paper addresses this issue of complexity, with a particular focus on its algorithmic implications and solutions.

Publication year: 2014
Content language: English
Title of the proceedings volume: Proceedings of the 2014 IEEE International Conference on Big Data (Big Data)
Event name: 2014 IEEE International Conference on Big Data (Big Data)
Event location: Washington, DC
Contribution type: Paper
Event start date: 27 October 2014
Event end date: 30 October 2014
Publisher: N/A
DOI: https://dx.doi.org/10.1109/BigData.2014.7004449
Citation: Marco, B., Franzini, G., Franzini, E., Moritz, M., Scaling historical text re-use, Paper, in Proceedings of the 2014 IEEE International Conference on Big Data (Big Data), (Washington, DC, 27-30 October 2014), N/A, Washington, DC 2014: 23-31. 10.1109/BigData.2014.7004449 [http://hdl.handle.net/10807/127330]
Appears in types: Paper, Selected paper, Contributed paper, Working paper, Poster, Poster paper, Communication, Report (in volume)

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/127330
