Sentence splitting, that is, the segmentation of raw input text into sentences, is a fundamental step in text processing. Although it is considered a solved task for texts such as news articles and Wikipedia pages, system performance can vary greatly depending on the text genre. This paper evaluates eight sentence splitting tools that adopt different approaches (rule-based, supervised, semi-supervised, and unsupervised learning) on Italian 19th-century novels, a genre that has so far received insufficient attention but that can be an interesting common ground between Natural Language Processing and Digital Humanities.

Redaelli, A., Sprugnoli, R., Is Sentence Splitting a Solved Task? Experiments to the Intersection Between NLP and Italian Linguistics, Paper, in CEUR Workshop Proceedings (Pisa, Italy, 4-6 December 2024), CEUR-WS, Pisa 2024: 813-820 [https://hdl.handle.net/10807/309458]

Is Sentence Splitting a Solved Task? Experiments to the Intersection Between NLP and Italian Linguistics

Sprugnoli, Rachele
2024

Language: English
Series: CEUR Workshop Proceedings
Conference: 10th Italian Conference on Computational Linguistics, CLiC-it 2024
Location: Pisa, Italy
Type: Paper
Start date: 4 December 2024
End date: 6 December 2024
ISBN: 979-12-210-7060-6
Publisher: CEUR-WS
Files in this item:
File: 2024.clicit-1.88.pdf
Access: open access
File type: Publisher's version (PDF)
License: Creative Commons
Size: 949.37 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10807/309458