Large language models (LLMs) in the evaluation of emergency radiology reports: performance of ChatGPT-4, Perplexity, and Bard

Infante, Amato; Gaudino, Simona; Orsini, Federico; Del Ciello, Annemilia; Gulli', Consolato; Merlino, Biagio; Natale, Luigi; Iezzi, Roberto; Sala, Evis

doi:10.1016/j.crad.2023.11.011

Large language models (LLMs), especially those based on the Generative Pre-trained Transformer (GPT) architecture, have become widely popular and have been applied in various fields due to their ability to provide written responses to a diverse range of queries swiftly and accurately. LLMs have demonstrated a transformative and potentially revolutionary capacity in multiple medical subfields, including radiology.1 A promising utilisation of these models is to streamline free-text radiology reports into concise or structured formats,2,3 thereby enhancing the accessibility and organisation of extensive information and potentially facilitating communication among medical professionals. Furthermore, incorporating automated radiological structured reporting systems could enhance clinical procedures by standardising language across institutions, promoting effective communication among healthcare experts, and improving the efficiency of data extraction for research purposes. The present authors share their preliminary results with three LLMs, evaluating their accuracy in recognising emergency data within human-generated emergency radiology reports.

Infante, A., Gaudino, S., Orsini, F., Del Ciello, A., Gulli', C., Merlino, B., Natale, L., Iezzi, R., Sala, E., Large language models (LLMs) in the evaluation of emergency radiology reports: performance of ChatGPT-4, Perplexity, and Bard, <<CLINICAL RADIOLOGY>>, 2024; 79 (2): 102-106. [doi:10.1016/j.crad.2023.11.011] [https://hdl.handle.net/10807/271349]