Purpose: The aim of this study was to define the capability of ChatGPT-4 and Google Gemini in analyzing detailed glaucoma case descriptions and suggesting an accurate surgical plan. Methods: Retrospective analysis of 60 medical records of surgical glaucoma was divided into “ordinary” (n = 40) and “challenging” (n = 20) scenarios. Case descriptions were entered into ChatGPT and Bard’s interfaces with the question “What kind of surgery would you perform?” and repeated three times to analyze the answers’ consistency. After collecting the answers, we assessed the level of agreement with the unified opinion of three glaucoma surgeons. Moreover, we graded the quality of the responses with scores from 1 (poor quality) to 5 (excellent quality), according to the Global Quality Score (GQS) and compared the results. Results: ChatGPT surgical choice was consistent with those of glaucoma specialists in 35/60 cases (58%), compared to 19/60 (32%) of Gemini (p = 0.0001). Gemini was not able to complete the task in 16 cases (27%). Trabeculectomy was the most frequent choice for both chatbots (53% and 50% for ChatGPT and Gemini, respectively). In “challenging” cases, ChatGPT agreed with specialists in 9/20 choices (45%), outperforming Google Gemini performances (4/20, 20%). Overall, GQS scores were 3.5 ± 1.2 and 2.1 ± 1.5 for ChatGPT and Gemini (p = 0.002). This difference was even more marked if focusing only on “challenging” cases (1.5 ± 1.4 vs. 3.0 ± 1.5, p = 0.001). Conclusion: ChatGPT-4 showed a good analysis performance for glaucoma surgical cases, either ordinary or challenging. On the other side, Google Gemini showed strong limitations in this setting, presenting high rates of unprecise or missed answers.

Carlà, M. M., Gambini, G., Baldascino, A., Boselli, F., Giannuzzi, F., Margollicci, F., Rizzo, S., Large language models as assistance for glaucoma surgical cases: a ChatGPT vs. Google Gemini comparison, <<GRAEFE'S ARCHIVE FOR CLINICAL AND EXPERIMENTAL OPHTHALMOLOGY>>, 2024; 262 (9): 2945-2959. [doi:10.1007/s00417-024-06470-5] [https://hdl.handle.net/10807/305078]

Large language models as assistance for glaucoma surgical cases: a ChatGPT vs. Google Gemini comparison

Gambini, Gloria;Baldascino, Antonio;Boselli, Francesco;Giannuzzi, Federico;Margollicci, Fabio;Rizzo, Stanislao
2024

Abstract

Purpose: The aim of this study was to define the capability of ChatGPT-4 and Google Gemini in analyzing detailed glaucoma case descriptions and suggesting an accurate surgical plan. Methods: Retrospective analysis of 60 medical records of surgical glaucoma was divided into “ordinary” (n = 40) and “challenging” (n = 20) scenarios. Case descriptions were entered into ChatGPT and Bard’s interfaces with the question “What kind of surgery would you perform?” and repeated three times to analyze the answers’ consistency. After collecting the answers, we assessed the level of agreement with the unified opinion of three glaucoma surgeons. Moreover, we graded the quality of the responses with scores from 1 (poor quality) to 5 (excellent quality), according to the Global Quality Score (GQS) and compared the results. Results: ChatGPT surgical choice was consistent with those of glaucoma specialists in 35/60 cases (58%), compared to 19/60 (32%) of Gemini (p = 0.0001). Gemini was not able to complete the task in 16 cases (27%). Trabeculectomy was the most frequent choice for both chatbots (53% and 50% for ChatGPT and Gemini, respectively). In “challenging” cases, ChatGPT agreed with specialists in 9/20 choices (45%), outperforming Google Gemini performances (4/20, 20%). Overall, GQS scores were 3.5 ± 1.2 and 2.1 ± 1.5 for ChatGPT and Gemini (p = 0.002). This difference was even more marked if focusing only on “challenging” cases (1.5 ± 1.4 vs. 3.0 ± 1.5, p = 0.001). Conclusion: ChatGPT-4 showed a good analysis performance for glaucoma surgical cases, either ordinary or challenging. On the other side, Google Gemini showed strong limitations in this setting, presenting high rates of unprecise or missed answers.
2024
Inglese
Carlà, M. M., Gambini, G., Baldascino, A., Boselli, F., Giannuzzi, F., Margollicci, F., Rizzo, S., Large language models as assistance for glaucoma surgical cases: a ChatGPT vs. Google Gemini comparison, <<GRAEFE'S ARCHIVE FOR CLINICAL AND EXPERIMENTAL OPHTHALMOLOGY>>, 2024; 262 (9): 2945-2959. [doi:10.1007/s00417-024-06470-5] [https://hdl.handle.net/10807/305078]
File in questo prodotto:
File Dimensione Formato  
large.pdf

accesso aperto

Tipologia file ?: Versione Editoriale (PDF)
Licenza: Creative commons
Dimensione 3.16 MB
Formato Adobe PDF
3.16 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10807/305078
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 25
  • ???jsp.display-item.citation.isi??? 14
social impact