Exploring Neural Topic Modeling on a Classical Latin Corpus

Mambrini, Francesco; Passarotti, Marco Carlo
2024

Abstract

The wide availability of machine-readable textual resources for Classical Latin has made it possible to study Latin literature through methods and tools that support distant reading. This paper describes a number of experiments carried out to test the possibility of investigating the thematic distribution of the Classical Latin corpus Opera Latina by means of topic modeling. For this purpose, we train, optimize and compare two neural models, Product-of-Experts LDA (ProdLDA) and Embedded Topic Model (ETM), suitably adapted to deal with the textual data from a Classical Latin corpus, in order to evaluate which one performs better, both on the basis of topic diversity and topic coherence metrics and from the point of view of human judgment. Our results show that the topics extracted by the neural models are coherent and interpretable, and that they are significant from the perspective of a Latin scholar. The source code of the proposed model is available at https://github.com/MIND-Lab/LatinProdLDA.
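
The sketch below is a minimal, illustrative outline of the kind of comparison the abstract describes, not the authors' actual pipeline. It assumes the OCTIS topic modeling library, which provides ProdLDA and ETM implementations together with topic diversity and NPMI coherence metrics; the dataset folder name and hyperparameter values are placeholders, and the paper's own adapted code is available in the repository linked above.

    # Train ProdLDA and ETM on a preprocessed corpus and score the resulting
    # topics with topic diversity and NPMI coherence (the two automatic
    # metrics mentioned in the abstract). Illustrative only.
    from octis.dataset.dataset import Dataset
    from octis.models.ProdLDA import ProdLDA
    from octis.models.ETM import ETM
    from octis.evaluation_metrics.coherence_metrics import Coherence
    from octis.evaluation_metrics.diversity_metrics import TopicDiversity

    # Load a custom preprocessed corpus; OCTIS expects a folder containing
    # corpus.tsv and vocabulary.txt. "opera_latina_octis" is a placeholder.
    dataset = Dataset()
    dataset.load_custom_dataset_from_folder("opera_latina_octis")

    # Hyperparameters are illustrative, not the values used in the paper.
    models = {
        "ProdLDA": ProdLDA(num_topics=20, num_epochs=100),
        "ETM": ETM(num_topics=20, num_epochs=100),
    }

    # Topic diversity = share of unique words among the top-k words of all
    # topics; coherence = NPMI computed over the corpus.
    diversity = TopicDiversity(topk=10)
    coherence = Coherence(texts=dataset.get_corpus(), topk=10, measure="c_npmi")

    for name, model in models.items():
        output = model.train_model(dataset)  # dict including a "topics" key
        print(f"{name}: diversity={diversity.score(output):.3f}, "
              f"npmi={coherence.score(output):.3f}")

Human judgment, the third criterion mentioned in the abstract, would complement these automatic scores by having a Latin scholar inspect the top words of each extracted topic.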
Year: 2024
Language: English
Published in: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Conference: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Location: TORINO -- ITA
Conference dates: 22-24 May 2024
ISBN: 978-2-493814-10-4
Publisher: ELRA and ICCL
Martinelli, G., Impicciché, P., Fersini, E., Mambrini, F., Passarotti, M. C., Exploring Neural Topic Modeling on a Classical Latin Corpus, in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), (TORINO -- ITA, 22-24 May 2024), ELRA and ICCL, TORINO -- ITA 2024: 6929-6934 [https://hdl.handle.net/10807/278617]
Files in this product:
There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/278617