We present a lexical-based investigation into the corpus of the opera omnia of Seneca. By applying a number of statistical techniques to textual data we aim to automatically collect similar texts into closely related groups. We demonstrate that our objective and unsupervised method is able to distinguish the texts by work and genre.

Cantaluppi, G., Passarotti, M. C., Clustering the Corpus of Seneca: A Lexical-Based Approach, in Carpita, M., Brentari, E., Qannari, E. M. (eds.), Advances in Latent Variables. Methods, Models and Applications, Springer, Heidelberg 2014: «Studies in Theoretical and Applied Statistics», 13-25. DOI: 10.1007/10104_2014_6 [http://hdl.handle.net/10807/62228]

Clustering the Corpus of Seneca: A Lexical-Based Approach

Cantaluppi, Gabriele; Passarotti, Marco Carlo
2014

English
Advances in Latent Variables. Methods, Models and Applications
978-3-319-02966-5
Springer


Use this identifier to cite or link to this document: https://hdl.handle.net/10807/62228