Research Article

Evaluation of artificial intelligence in thoracic surgery internship education: accuracy and usability of AI-generated exam questions

Year 2025, Volume: 8 Issue: 3, 524 - 528, 30.05.2025
https://doi.org/10.32322/jhsm.1660603

Abstract

Aims: This study aims to evaluate the usefulness and reliability of artificial intelligence (AI) applications in thoracic surgery internship education and exam preparation.
Methods: Claude Sonnet 3.7 AI was provided with core topics covered in the 5th-year thoracic surgery internship and was instructed to generate a 20-question multiple-choice exam, including an answer key. Four thoracic surgery specialists assessed the AI-generated questions using the Delphi panel method, classifying them as correct, minor error, or major error. Major errors included the absence of the correct answer among choices, incorrect AI-marked answers, or contradictions with established medical knowledge. A second exam was manually created by a thoracic surgery specialist and evaluated using the same methodology. Seven volunteer 5th-year medical students completed both exams, and the correlation between their scores was statistically analyzed.
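
For readers who want to replicate the exam-generation step, a minimal sketch is given below, assuming access to the model through the Anthropic Python SDK. The prompt wording, topic list, and model identifier are illustrative assumptions; the study does not report how the model was prompted or accessed.

```python
# Minimal sketch of the exam-generation step (assumption: access via the
# Anthropic Python SDK). The prompt wording, topic list, and model identifier
# are illustrative; the study does not report the exact prompt used.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical subset of core 5th-year thoracic surgery internship topics.
topics = [
    "pneumothorax", "pleural effusion", "lung cancer staging",
    "chest trauma", "mediastinal masses",
]

prompt = (
    "You are preparing an exam for 5th-year medical students completing a "
    "thoracic surgery internship. Using the core topics listed below, write a "
    "20-question multiple-choice exam with five options (A-E) per question, "
    "and provide an answer key at the end.\n"
    f"Topics: {', '.join(topics)}"
)

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # assumed identifier for Claude Sonnet 3.7
    max_tokens=4096,
    messages=[{"role": "user", "content": prompt}],
)

# Raw exam text plus answer key, passed on to the expert (Delphi) review.
exam_text = response.content[0].text
print(exam_text)
```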
Results: Among AI-generated questions, 8 (40%) contained major errors, while 1 (5%) had a minor error. The expert-generated exam had a perfect accuracy rate, whereas the AI-generated exam had significantly lower accuracy (p=0.001). Median scores were 75 (67-100) for the AI exam and 85 (70-95) for the expert exam. No significant correlation was found between students’ scores (r=0.042, p=0.929).
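
The statistical comparison can be illustrated with the short sketch below. The abstract does not name the tests used, so Fisher's exact test (for the per-question accuracy comparison) and Spearman correlation (for the seven paired student scores) are assumptions, and the score vectors are placeholders rather than the study data.

```python
# Illustrative re-analysis sketch: the tests (Fisher's exact, Spearman) are
# assumptions, and the individual student scores are placeholder values.
from scipy import stats

# Per-question accuracy: the expert exam had 20/20 error-free questions,
# the AI exam 11/20 (8 major + 1 minor error).
table = [[11, 9],   # AI exam: error-free, with errors
         [20, 0]]   # expert exam: error-free, with errors
_, p_accuracy = stats.fisher_exact(table)
print(f"accuracy comparison: p = {p_accuracy:.3f}")

# Paired scores of the seven volunteer students (placeholder values).
ai_scores = [75, 67, 100, 70, 85, 75, 80]
expert_scores = [85, 95, 80, 90, 70, 85, 75]
rho, p_corr = stats.spearmanr(ai_scores, expert_scores)
print(f"score correlation: rho = {rho:.3f}, p = {p_corr:.3f}")
```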
Conclusion: AI-generated questions had a high error rate (40% major, 5% minor), making them unreliable for unsupervised use in medical education. While AI may provide partial benefits under expert supervision, it currently lacks the accuracy required for independent implementation in thoracic surgery education.

Thanks

Prof. Dr. Osman Güler

References

  • Feigerlova E, Hani H, Hothersall-Davies E. A systematic review of the impact of artificial intelligence on educational outcomes in health professions education. BMC Med Educ. 2025;25:129. doi:10.1186/s12909-025-06719-5
  • Ennab F, Farhan H, Zary N. Generative artificial intelligence and its role in the development of clinical cases in medical education: a scoping review protocol. Preprints. 2025. doi:10.20944/preprints202501.1031.v1
  • Preiksaitis C, Rose C. Opportunities, challenges, and future directions of generative artificial intelligence in medical education: scoping review. JMIR Med Educ. 2023;9:e48785. doi:10.2196/48785
  • Koçak B, Ponsiglione A, Stanzione A, et al. Bias in artificial intelligence for medical imaging: fundamentals, detection, avoidance, mitigation, challenges, ethics, and prospects. Diagn Interv Radiol. 2025;31(2):75-88. doi:10.4274/dir.2024.242854
  • Colton S, Hatcher T. The web-based Delphi research technique as a method for content validation in HRD and adult education research. Online Submission. 2004.
  • Hicke Y, Geathers J, Rajashekar N, et al. MedSimAI: simulation and formative feedback generation to enhance deliberate practice in medical education. arXiv preprint arXiv:2503.05793. 2025. doi:10.48550/arXiv.2503.05793
  • Hersh W. Generative artificial intelligence: implications for biomedical and health professions education. arXiv preprint arXiv:2501.10186. 2025. doi:10.48550/arXiv.2501.10186
  • Mir MM, Mir GM, Raina NT, et al. Application of artificial intelligence in medical education: current scenario and future perspectives. J Adv Med Educ Prof. 2023;11(3):133-140. doi:10.30476/JAMP.2023.98655.1803
  • Barile J, Margolis A, Cason G, et al. Diagnostic accuracy of a large language model in pediatric case studies. JAMA Pediatr. 2024;178(3):313-315. doi:10.1001/jamapediatrics.2023.5750
  • Narayanan S, Ramakrishnan R, Durairaj E, Das A. Artificial intelligence revolutionizing the field of medical education. Cureus. 2023;15(11):e49604. doi:10.7759/cureus.49604
  • Johnson D, Goodman R, Patrinely J, et al. Assessing the accuracy and reliability of AI-generated medical responses: an evaluation of the Chat-GPT model. Res Sq [Preprint]. 2023:rs.3.rs-2566942. doi:10.21203/rs.3.rs-2566942/v1
  • Law AK, So J, Lui CT, et al. AI versus human-generated multiple-choice questions for medical education: a cohort study in a high-stakes examination. BMC Med Educ. 2025;25(1):208. doi:10.1186/s12909-025-06796-6
  • Al Shuraiqi S, Aal Abdulsalam A, Masters K, Zidoum H, AlZaabi A. Automatic generation of medical case-based multiple-choice questions (MCQs): a review of methodologies, applications, evaluation, and future directions. Big Data Cogn Comput. 2024;8(10):139. doi:10.3390/bdcc8100139


Details

Primary Language English
Subjects Thoracic Surgery
Journal Section Original Article
Authors

İsmail Dal (ORCID: 0000-0002-5118-0780)

Publication Date May 30, 2025
Submission Date March 18, 2025
Acceptance Date May 27, 2025
Published in Issue Year 2025 Volume: 8 Issue: 3

Cite

AMA Dal İ. Evaluation of artificial intelligence in thoracic surgery internship education: accuracy and usability of AI-generated exam questions. J Health Sci Med / JHSM. May 2025;8(3):524-528. doi:10.32322/jhsm.1660603

Interuniversity Board (UAK) Equivalency: Article published in a journal indexed in Ulakbim TR Index [10 POINTS]; article published in another internationally indexed journal (1d, excluding 1a, b, c) [5 POINTS].

The Directories (indexes) and Platforms we are included in are at the bottom of the page.

Note: Our journal is not indexed in Web of Science (WoS) and therefore does not have a quartile (Q) classification.

The Council of Higher Education (CoHE) [Yüksek Öğretim Kurumu (YÖK)] criteria and decisions on predatory/questionable journals, the author's clarification text, and the journal charge policy can be downloaded here: https://dergipark.org.tr/tr/journal/2316/file/4905/show







The indexes of the journal are ULAKBİM TR Dizin, Index Copernicus, ICI World of Journals, DOAJ, Directory of Research Journals Indexing (DRJI), General Impact Factor, ASOS Index, WorldCat (OCLC), MIAR, EuroPub, OpenAIRE, Türkiye Citation Index, Türk Medline Index, InfoBase Index, Scilit, etc.

The platforms of the journal are Google Scholar, CrossRef (DOI), ResearchBib, Open Access, COPE, ICMJE, NCBI, ORCID, Creative Commons, etc.


Journal articles are evaluated by double-blind peer review.

Our journal has adopted an Open Access policy; articles in JHSM are open access and fully comply with Open Access requirements. All articles in the system can be accessed and read without a journal user account: https://dergipark.org.tr/tr/pub/jhsm/page/9535

Journal charge policy: https://dergipark.org.tr/tr/pub/jhsm/page/10912

Our journal has been indexed in DOAJ as of May 18, 2020.

Our journal has been indexed in TR-Dizin as of March 12, 2021.



Articles published in the Journal of Health Sciences and Medicine are open access and licensed under the Creative Commons CC BY-NC-ND 4.0 International License.