Cantaluppi, G., Passarotti, M. C., Clustering the Corpus of Seneca: A Lexical-Based Approach, in Carpita, M., Brentari, E., Qannari, E. M. (eds.), Advances in Latent Variables. Methods, Models and Applications, Springer, Heidelberg 2014 (Studies in Theoretical and Applied Statistics), 13-25. DOI: 10.1007/10104_2014_6 [http://hdl.handle.net/10807/62228]
Clustering the Corpus of Seneca: A Lexical-Based Approach
Cantaluppi, Gabriele; Passarotti, Marco Carlo
2014
Abstract
We present a lexical-based investigation of the corpus of Seneca's opera omnia. By applying a number of statistical techniques to the textual data, we aim to automatically group similar texts into closely related clusters. We show that our objective, unsupervised method is able to distinguish the texts by work and genre.