Passarotti, M. C., Cantaluppi, G., A Statistical Investigation into the Corpus of Seneca, in Poccetti, P. (ed.), Latinitatis Rationes. Descriptive and Historical Accounts for the Latin Language, De Gruyter, Berlin - Boston 2016: 684-706 [http://hdl.handle.net/10807/90593]
A Statistical Investigation into the Corpus of Seneca
Passarotti, Marco Carlo; Cantaluppi, Gabriele
2016
Abstract
We present a lexically based investigation of the opera omnia of Seneca. By applying a number of statistical techniques to textual data, we aim to automatically collect similar texts into closely related groups. We also compare the corpus with the orationes of Cicero, the Latin New Testament by Jerome (Vulgata) and the opera maiora of Thomas Aquinas. We demonstrate that our objective and unsupervised method is able to distinguish the texts by work, genre and author.