Electronic Health Records (EHRs) contain a wealth of unstructured patient data, making it challenging for physicians to make informed decisions. In this paper, we introduce a Natural Language Processing (NLP) approach for extracting therapies, diagnoses, and symptoms from ambulatory EHRs of patients with chronic Lupus disease. We aim to demonstrate the effectiveness of a comprehensive pipeline in which a rule-based system is combined with text segmentation, transformer-based topic analysis, and a clinical ontology to enhance text preprocessing and automate rule identification. Our approach is applied to a sub-cohort of 56 patients, with a total of 750 EHRs written in Italian, achieving accuracy above 97% and F-score above 90% in all three extracted domains. This work has the potential to be integrated with EHR systems to automate information extraction, minimizing human intervention and providing personalized digital solutions in the chronic Lupus disease domain.

Lilli, L., Bosello, S. L., Antenucci, L., Patarnello, S., Ortolan, A., Lenkowicz, J., Gorini, M., Castellino, G., Cesario, A., D'Agostino, M. A., Masciocchi, C., A Comprehensive Natural Language Processing Pipeline for the Chronic Lupus Disease, <<Studies in health technology and informatics>>, 2024; 316 (aug): 909-913. [doi:10.3233/shti240559] [https://hdl.handle.net/10807/298479]

A Comprehensive Natural Language Processing Pipeline for the Chronic Lupus Disease

Lilli, Livia; Bosello, Silvia Laura; Antenucci, Laura; Ortolan, Augusta; Lenkowicz, Jacopo; Gorini, Marco; Cesario, Alfredo; D'Agostino, Maria Antonietta; Masciocchi, Carlotta
2024

Language: English
Files in this record:
File: compreh.pdf
Access: open access
File type: Publisher's version (PDF)
License: Creative Commons
Size: 275.61 kB
Format: Adobe PDF
