Exploring Neural Topic Modeling on a Classical Latin Corpus

Mambrini, Francesco; Passarotti, Marco Carlo
2024

Abstract

The wide availability of machine-readable textual resources for Classical Latin has made it possible to study Latin literature through methods and tools that support distant reading. This paper describes a number of experiments carried out to test the possibility of investigating the thematic distribution of the Classical Latin corpus Opera Latina by means of topic modeling. For this purpose, we train, optimize and compare two neural models, Product-of-Experts LDA (ProdLDA) and Embedded Topic Model (ETM), suitably adapted to deal with the textual data from a Classical Latin corpus, in order to evaluate which one performs better, both on the basis of topic diversity and topic coherence metrics and from the point of view of human judgment. Our results show that the topics extracted by the neural models are coherent and interpretable, and that they are significant from the perspective of a Latin scholar. The source code of the proposed model is available at https://github.com/MIND-Lab/LatinProdLDA.
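
The sketch below is a minimal, illustrative outline of the kind of comparison the abstract describes, not the authors' actual pipeline. It assumes the OCTIS topic modeling library, which provides ProdLDA and ETM implementations together with topic diversity and NPMI coherence metrics; the dataset folder name and hyperparameter values are placeholders, and the paper's own adapted code is available in the repository linked above.

    # Train ProdLDA and ETM on a preprocessed corpus and score the resulting
    # topics with topic diversity and NPMI coherence (the two automatic
    # metrics mentioned in the abstract). Illustrative only.
    from octis.dataset.dataset import Dataset
    from octis.models.ProdLDA import ProdLDA
    from octis.models.ETM import ETM
    from octis.evaluation_metrics.coherence_metrics import Coherence
    from octis.evaluation_metrics.diversity_metrics import TopicDiversity

    # Load a custom preprocessed corpus; OCTIS expects a folder containing
    # corpus.tsv and vocabulary.txt. "opera_latina_octis" is a placeholder.
    dataset = Dataset()
    dataset.load_custom_dataset_from_folder("opera_latina_octis")

    # Hyperparameters are illustrative, not the values used in the paper.
    models = {
        "ProdLDA": ProdLDA(num_topics=20, num_epochs=100),
        "ETM": ETM(num_topics=20, num_epochs=100),
    }

    # Topic diversity = share of unique words among the top-k words of all
    # topics; coherence = NPMI computed over the corpus.
    diversity = TopicDiversity(topk=10)
    coherence = Coherence(texts=dataset.get_corpus(), topk=10, measure="c_npmi")

    for name, model in models.items():
        output = model.train_model(dataset)  # dict including a "topics" key
        print(f"{name}: diversity={diversity.score(output):.3f}, "
              f"npmi={coherence.score(output):.3f}")

Human judgment, the third criterion mentioned in the abstract, would complement these automatic scores by having a Latin scholar inspect the top words of each extracted topic.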
Year: 2024
Language: English
Published in: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Conference: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Location: TORINO -- ITA
Conference dates: 22-24 May 2024
ISBN: 978-2-493814-10-4
Publisher: ELRA and ICCL
Martinelli, G., Impicciché, P., Fersini, E., Mambrini, F., Passarotti, M. C., Exploring Neural Topic Modeling on a Classical Latin Corpus, in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), (TORINO -- ITA, 22-24 May 2024), ELRA and ICCL, TORINO -- ITA 2024: 6929-6934 [https://hdl.handle.net/10807/278617]
Files in this product:
There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/278617