IRIS UniCatt

Identifying communicative functions in discourse with content types

Caselli, T.; Sprugnoli, Rachele; Moretti, G. (Software)

2022

Abstract

Texts are not monolithic entities but coherent collections of micro illocutionary acts that together convey a unitary message of content and purpose. Identifying such text segments is challenging because it requires a fine-grained level of analysis, even within a single sentence. At the same time, accessing them facilitates the analysis of the communicative functions of a text as well as the identification of relevant information. We propose an empirical framework for modelling micro illocutionary acts at clause level, which we call content types, grounded in linguistic theories of text types, in particular the framework proposed by Werlich in 1976. We make available a newly annotated corpus of 279 documents (more than 180,000 tokens in total) belonging to different genres and temporal periods, based on a dedicated annotation scheme. We obtain an average Cohen’s kappa of 0.89 at token level. We achieve an average F1 score of 74.99% on the automatic classification of content types using a bi-LSTM model. Similar results are obtained on contemporary and historical documents, while performance across genres is more varied. This work promotes a discourse-oriented approach to information extraction and cross-fertilisation across disciplines through computationally aided linguistic analysis.

Publication year: 2022
Content language: English
Journal name: LANGUAGE RESOURCES AND EVALUATION
DOI: https://dx.doi.org/10.1007/s10579-021-09554-4
Alternative URL: https://link.springer.com/article/10.1007/s10579-021-09554-4
Citation: Caselli, T., Sprugnoli, R., Moretti, G., Identifying communicative functions in discourse with content types, <<LANGUAGE RESOURCES AND EVALUATION>>, 2022; 56 (2): 417-450. [doi:10.1007/s10579-021-09554-4] [https://hdl.handle.net/10807/309460]
Appears in types: Journal article, Case note

Files in this item:

File	Access	File type	License	Size	Format
unpaywall-bitstream-846338427.pdf	Open access	Publisher's version (PDF)	Creative Commons	824.05 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/309460
