Research Article

Interpreting Chest X-ray with ChatGPT: Can It Serve as a Tool for Justifying Computed Tomography?

Year 2025, Volume: 2 Issue: 2, 118 - 126, 30.06.2025
https://doi.org/10.70058/cjm.1633438

Abstract

Objective: This study aimed to test how well ChatGPT-4 evaluates chest radiographs and detects abnormal findings, and then to assess its utility in justifying computed tomography (CT).

Methods: This study included 59 patients (20 in the first phase and 39 in the second phase) from a publicly available chest X-ray dataset. The X-rays were evaluated by an experienced chest radiologist (the gold standard), two radiology residents, and ChatGPT. Each reader first labeled the image as normal or abnormal and then, if it was abnormal, decided whether CT was needed. Finally, the decisions of ChatGPT and the two radiology residents were compared with the gold-standard decision of the expert radiologist to obtain accuracy values.

Results: Across all 59 patients, the accuracy of Resident 1, Resident 2, and ChatGPT for normal-abnormal labeling was 76.27%, 93.22%, and 76.27%, respectively. Their accuracy for CT necessity was 67.80%, 72.88%, and 66.10%, respectively. The expert radiologist determined that CT was not necessary in 30 patients. Of these 30 patients, Resident 1, Resident 2, and ChatGPT answered incorrectly in 14, 12, and 15 patients, respectively. There was no statistically significant difference between the responses of Resident 1, Resident 2, and ChatGPT for CT necessity (Chi-square, p=0.731).
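The CT-necessity figures above can be turned back into patient counts with a little arithmetic. The following is a minimal sketch, not the authors' analysis. It assumes that accuracy was computed over all 59 patients and that the 29 patients not labeled "CT not necessary" by the expert were labeled "CT necessary". The function name `implied_breakdown` and the dictionary keys are illustrative only.

```python
# Figures taken from the Results paragraph of the abstract.
N_TOTAL = 59          # all patients
N_CT_NOT_NEEDED = 30  # expert radiologist judged CT unnecessary

reported = {
    # reader: (CT-necessity accuracy in %, errors among the 30 "CT not needed" cases)
    "Resident 1": (67.80, 14),
    "Resident 2": (72.88, 12),
    "ChatGPT": (66.10, 15),
}


def implied_breakdown(accuracy_pct, wrong_in_not_needed):
    """Recover counts implied by a reported accuracy (assumes denominator = 59)."""
    correct = round(accuracy_pct / 100 * N_TOTAL)
    wrong_total = N_TOTAL - correct
    # Assumption: every remaining error falls in the 29 "CT needed" cases.
    wrong_in_needed = wrong_total - wrong_in_not_needed
    return {
        "correct": correct,
        "wrong_total": wrong_total,
        "wrong_in_needed": wrong_in_needed,
        # Share of "CT not needed" patients for whom CT was nevertheless advised.
        "over_referral_rate": wrong_in_not_needed / N_CT_NOT_NEEDED,
    }


breakdown = {name: implied_breakdown(*vals) for name, vals in reported.items()}

for name, b in breakdown.items():
    print(
        f"{name}: {b['correct']}/{N_TOTAL} correct, "
        f"{b['wrong_in_needed']} missed among CT-needed cases, "
        f"over-referral rate {b['over_referral_rate']:.1%}"
    )
```

Under these assumptions, most of each reader's errors are unnecessary CT recommendations rather than missed CT indications.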

Conclusion: The results of this study show that ChatGPT-4 is promising for chest X-ray interpretation and for the justification of CT scans. However, large language models such as ChatGPT still have major limitations and should be trained on a much larger number of radiology images.

Ethical Statement

Ethics committee approval was not required for this article because a publicly available dataset was used. The principles of the Declaration of Helsinki were followed during this study.

References

  • Mettler FA, Mahesh M, Bhargavan Chatfield M, et al. NCRP Report 184: Medical Radiation Exposure of Patients in the United States. Recommendations of the National Council on Radiation Protection and Measurements; 2019.
  • Friberg EG. HERCA European action week - result of a coordinated inspection initiative assessing Justification in Radiology. Int Conf Radiat Prot Med - Achiev Chang Pract. 2017;1–5.
  • Rastogi S, Singh R, Borse R, et al. Use of Multiphase CT Protocols in 18 Countries: Appropriateness and Radiation Doses. Can Assoc Radiol J. 2021;72(3):381-387. doi:10.1177/0846537119888390
  • Foley SJ, Bly R, Brady AP, et al. Justification of CT practices across Europe: results of a survey of national competent authorities and radiology societies. Insights Imaging. 2022;13(1):177. Published 2022 Nov 22. doi:10.1186/s13244-022-01325-1
  • American College of Radiology (ACR), Society of Advanced Body Imaging (SABI), Society for Pediatric Radiology (SPR), Society of Thoracic Radiology (STR). ACR–SABI–SPR–STR Practice Parameter for the Performance of Thoracic Computed Tomography (CT). Revised 2023. Available at: https://www.acr.org/
  • Speets AM, van der Graaf Y, Hoes AW, et al. Chest radiography in general practice: indications, diagnostic yield and consequences for patient management. Br J Gen Pract. 2006;56(529):574-578.
  • Gatt ME, Spectre G, Paltiel O, Hiller N, Stalnikowicz R. Chest radiographs in the emergency department: is the radiologist really necessary?. Postgrad Med J. 2003;79(930):214-217. doi:10.1136/pmj.79.930.214
  • Dadalı Y, Köksal D. Thorax CT findings of patients with hilar enlargement on chest X-Ray. Ann Clin Anal Med. 2020;11(3):235-238.
  • FDA Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices Page. Available at: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices
  • van Leeuwen KG, Schalekamp S, Rutten MJCM, van Ginneken B, de Rooij M. Artificial intelligence in radiology: 100 commercially available products and their scientific evidence. Eur Radiol. 2021;31(6):3797-3804. doi:10.1007/s00330-021-07892-z
  • Ziegelmayer S, Marka AW, Lenhart N, et al. Evaluation of GPT-4's Chest X-Ray Impression Generation: A Reader Study on Performance and Perception. J Med Internet Res. 2023;25:e50865. Published 2023 Dec 22. doi:10.2196/50865
  • Tiu E, Talius E, Patel P, Langlotz CP, Ng AY, Rajpurkar P. Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nat Biomed Eng. 2022;6(12):1399-1406. doi:10.1038/s41551-022-00936-9
  • Lee KH, Lee RW, Kwon YE. Validation of a Deep Learning Chest X-ray Interpretation Model: Integrating Large-Scale AI and Large Language Models for Comparative Analysis with ChatGPT. Diagnostics (Basel). 2023;14(1):90. Published 2023 Dec 30. doi:10.3390/diagnostics14010090
  • Bhayana R. Chatbots and Large Language Models in Radiology: A Practical Primer for Clinical and Research Applications. Radiology. 2024;310(1):e232756. doi:10.1148/radiol.232756
  • Zaboli A, Brigo F, Sibilio S, Mian M, Turcato G. Human intelligence versus Chat-GPT: who performs better in correctly classifying patients in triage?. Am J Emerg Med. 2024;79:44-47. doi:10.1016/j.ajem.2024.02.008
  • Mira FA, Favier V, Dos Santos Sobreira Nunes H, et al. Chat GPT for the management of obstructive sleep apnea: do we have a polar star?. Eur Arch Otorhinolaryngol. 2024;281(4):2087-2093. doi:10.1007/s00405-023-08270-9
  • Khan U. Revolutionizing Personalized Protein Energy Malnutrition Treatment: Harnessing the Power of Chat GPT. Ann Biomed Eng. 2024;52(5):1125-1127. doi:10.1007/s10439-023-03331-w
  • Günay S, Öztürk A, Özerol H, Yiğit Y, Erenler AK. Comparison of emergency medicine specialist, cardiologist, and chat-GPT in electrocardiography assessment. Am J Emerg Med. 2024;80:51-60. doi:10.1016/j.ajem.2024.03.017
  • Topçu Varlık A, Kaba E, Burakgazi G. The R.E.N.A.L. nephrometry scoring from CT reports with ChatGPT: example with proofs. Jpn J Radiol. 2024;42(8):929-931. doi:10.1007/s11604-024-01573-9
  • Xu S, Yang L, Kelly C, et al. ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders. 2023; Available at: http://arxiv.org/abs/2308.01317
  • Lee S, Youn J, Kim H, Kim M, Yoon SH. CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images. 2023; Available at: https://arxiv.org/abs/2310.18341v3
  • Shentu J, Al Moubayed N. CXR-IRGen: An Integrated Vision and Language Model for the Generation of Clinically Accurate Chest X-Ray Image-Report Pairs. 2024;5200–9.
  • Thawkar O, Shaker A, Mullappilly SS, Cholakkal H, Anwer RM, Khan S, et al. XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models. 2023; Available at: http://arxiv.org/abs/2306.07971
  • Kozel G, Gurses ME, Gecici NN, et al. Chat-GPT on brain tumors: An examination of Artificial Intelligence/Machine Learning's ability to provide diagnoses and treatment plans for example neuro-oncology cases. Clin Neurol Neurosurg. 2024;239:108238. doi:10.1016/j.clineuro.2024.108238
  • Brin D, Sorin V, Barash Y, et al. Assessing GPT-4 multimodal performance in radiological image analysis. Eur Radiol. 2025;35(4):1959-1965. doi:10.1007/s00330-024-11035-5
  • Chetla N, Tandon M, Chang J, Sukhija K, Patel R, Sanchez R. Evaluating ChatGPT's Efficacy in Pediatric Pneumonia Detection From Chest X-Rays: Comparative Analysis of Specialized AI Models. JMIR AI. 2025;4:e67621. Published 2025 Jan 10. doi:10.2196/67621

ChatGPT ile Akciğer Grafisi Yorumlama: Bilgisayarlı Tomografiyi Gerekçelendirmek İçin Bir Araç Olabilir mi?



There are 26 citations in total.

Details

Primary Language English
Subjects Radiology and Organ Imaging
Journal Section Research Articles
Authors

Nur Hürsoy 0000-0001-5059-2268

Hafsa Kolluk 0009-0007-1575-8294

Merve Solak 0000-0003-3466-7260

Kubilay Kağan Budak 0009-0007-3998-101X

Esat Kaba 0000-0001-7464-988X

Publication Date June 30, 2025
Submission Date February 4, 2025
Acceptance Date May 15, 2025
Published in Issue Year 2025 Volume: 2 Issue: 2

Cite

Vancouver Hürsoy N, Kolluk H, Solak M, Budak KK, Kaba E. Interpreting Chest X-ray with ChatGPT: Can It Serve as a Tool for Justifying Computed Tomography?. Cerasus J Med. 2025;2(2):118-26.