Redaelli, A., Sprugnoli, R., Is Sentence Splitting a Solved Task? Experiments to the Intersection Between NLP and Italian Linguistics, paper in CEUR Workshop Proceedings (Pisa, Italy, 4-6 December 2024), CEUR-WS, Pisa, Italy, 2024: 813-820 [https://hdl.handle.net/10807/309458]
Is Sentence Splitting a Solved Task? Experiments to the Intersection Between NLP and Italian Linguistics
Sprugnoli, Rachele
2024
Abstract
Sentence splitting, that is, the segmentation of raw input text into sentences, is a fundamental step in text processing. Although it is considered a solved task for texts such as news articles and Wikipedia pages, system performance can vary greatly depending on the text genre. This paper evaluates eight sentence splitting tools that adopt different approaches (rule-based, supervised, semi-supervised, and unsupervised learning) on Italian 19th-century novels, a genre that has not received sufficient attention so far but that offers an interesting common ground between Natural Language Processing and Digital Humanities.

File | Access | File type | License | Size | Format |
---|---|---|---|---|---|
2024.clicit-1.88.pdf | Open access | Publisher's version (PDF) | Creative Commons | 949.37 kB | Adobe PDF |