Research Article

Multiple Attention-Based Deep Learning Model for MRI Captioning

Year 2025, Volume: 13, Issue: 1, 128-137, 30.06.2025
https://doi.org/10.18586/msufbd.1532112

Abstract

In recent years, the use of artificial intelligence in medicine, as in many other fields, has increased considerably. Writing magnetic resonance (MR) reports manually is a difficult, time-consuming, and potentially error-prone process for physicians. To address these problems, this study proposes a deep learning-based image captioning model that automatically generates reports from brain MRIs. The developed model combines image processing, natural language processing, and deep learning methods to produce text describing the content of, and diagnoses in, the medical image. First, pre-processing steps such as rotation at random angles, resizing, cropping, brightness and contrast adjustment, shadow addition, and mirroring were applied to the MR images. Then, a report-generation model was developed by utilizing the Bootstrapping Language-Image Pre-training (BLIP) model and its transformer architecture. Experimental studies showed that the proposed model achieved successful results across different metrics; the generated reports were highly similar to the original reports, and the model could be used as a supplementary tool in medicine.
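The paper does not publish source code, so the sketch below only illustrates the kind of pipeline the abstract describes. It assumes Pillow for the listed augmentations (random rotation, resizing, cropping, brightness/contrast jitter, mirroring) and the public Hugging Face BLIP checkpoint `Salesforce/blip-image-captioning-base` as a stand-in for the authors' fine-tuned model; all function names, parameter ranges, and the checkpoint choice are illustrative, not taken from the paper.

```python
# Hedged sketch of the two stages the abstract describes: image
# augmentation and BLIP-based caption/report generation.
# Parameter ranges (angles, jitter factors, crop margin) are assumptions.
import random
from PIL import Image, ImageEnhance, ImageOps


def augment(img, size=224, seed=None):
    """Apply one random augmentation pass to a single MR slice."""
    rng = random.Random(seed)
    img = img.rotate(rng.uniform(-15, 15))          # rotate at a random angle
    img = img.resize((size + 16, size + 16))        # resize slightly larger...
    left, top = rng.randint(0, 16), rng.randint(0, 16)
    img = img.crop((left, top, left + size, top + size))  # ...then random-crop
    img = ImageEnhance.Brightness(img).enhance(rng.uniform(0.8, 1.2))
    img = ImageEnhance.Contrast(img).enhance(rng.uniform(0.8, 1.2))
    if rng.random() < 0.5:
        img = ImageOps.mirror(img)                  # horizontal mirroring
    return img


def generate_report(img, num_beams=4):
    """Caption one image with a public BLIP checkpoint.

    Downloads model weights on first call; shown for illustration only --
    the paper fine-tunes BLIP on brain MRI report data, which this
    off-the-shelf checkpoint does not reproduce.
    """
    from transformers import BlipProcessor, BlipForConditionalGeneration
    name = "Salesforce/blip-image-captioning-base"
    processor = BlipProcessor.from_pretrained(name)
    model = BlipForConditionalGeneration.from_pretrained(name)
    inputs = processor(images=img.convert("RGB"), return_tensors="pt")
    out = model.generate(**inputs, num_beams=num_beams, max_new_tokens=60)
    return processor.decode(out[0], skip_special_tokens=True)
```

In this sketch the image is first resized slightly larger than the target so that the random crop always stays inside the frame; beam search (`num_beams`) at decoding time matches the beam-search reference cited by the paper.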

References

  • [1] Ravinder, P., & Srinivasan, S. (2024). Automated medical image captioning with soft attention-based LSTM model utilizing YOLOv4 algorithm.
  • [2] Aggarwal, A. K. Revealing AI-driven chest X-ray image captioning using BLIP transformer.
  • [3] Wang, Y., Lin, Z., Xu, Z., Dong, H., Luo, J., Tian, J., ... & He, Z. (2024). Trust it or not: Confidence-guided automatic radiology report generation. Neurocomputing, 127374.
  • [4] Boag, W., Hsu, T. M. H., McDermott, M., Berner, G., Alsentzer, E., & Szolovits, P. (2020, April). Baselines for chest X-ray report generation. In Machine Learning for Health Workshop (pp. 126-140). PMLR.
  • [5] Liu, F., Yin, C., Wu, X., Ge, S., Zou, Y., Zhang, P., & Sun, X. (2023). Contrastive attention for automatic chest X-ray report generation. arXiv. https://arxiv.org/abs/2106.06965.
  • [6] Lovelace, J., & Mortazavi, B. (2020, November). Learning to generate clinically coherent chest X-ray reports. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 1235-1243).
  • [7] Yang, S., Wu, X., Ge, S., Zhou, S. K., & Xiao, L. (2022). Knowledge matters: Chest radiology report generation with general and specific knowledge. Medical Image Analysis, 80, 102510.
  • [8] Yang, S., Wu, X., Ge, S., Zheng, Z., Zhou, S. K., & Xiao, L. (2023). Radiology report generation with a learned knowledge base and multi-modal alignment. Medical Image Analysis, 86, 102798.
  • [9] Pelka, O., Koitka, S., Rückert, J., Nensa, F., & Friedrich, C. M. (2018). Radiology objects in context (ROCO): A multimodal image dataset. In Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis: 7th Joint International Workshop, CVII-STENT 2018 and Third International Workshop, LABELS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Proceedings 3 (pp. 180-189). Springer International Publishing.
  • [10] Li, J., Li, D., Savarese, S., & Hoi, S. (2023). BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In Proceedings of the 40th International Conference on Machine Learning (PMLR 202, pp. 19730-19742).
  • [11] Li, J., Li, D., Xiong, C., & Hoi, S. (2022). BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proceedings of the 39th International Conference on Machine Learning (PMLR 162, pp. 12888-12900).
  • [12] Lemons, S., Linares López, C., Holte, R., & Ruml, W. (2022). Beam search: Faster and monotonic. Proceedings of the International Conference on Automated Planning and Scheduling, 32(1), 222-230. https://doi.org/10.1609/icaps.v32i1.19805.
  • [13] Zeng, X., Wen, L., Xu, Y., & Ji, C. (2020). Generating diagnostic report for medical image by high-middle-level visual information incorporation on double deep learning models. Computer Methods and Programs in Biomedicine, 197, 105700.
  • [14] Barbella, M., & Tortora, G. (2022). ROUGE metric evaluation for text summarization techniques. Available at SSRN 4120317.
  • [15] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • [16] ImageCLEF 2024 medical caption task. https://www.imageclef.org/2024/medical/caption
  • [17] Liu, G., Hsu, T. M. H., McDermott, M., Boag, W., Weng, W. H., Szolovits, P., & Ghassemi, M. (2019, October). Clinically accurate chest X-ray report generation. In Machine Learning for Healthcare Conference (pp. 249-269). PMLR.


Details

Primary Language: Turkish
Subjects: Decision Support and Group Support Systems
Section: Research Article
Authors

Burkay Maraş 0009-0004-2967-0486

Serhat Karatorak 0009-0001-3586-7056

Kevser Özdem Karaca 0000-0002-6695-200X

A. Orkun Gedik 0009-0004-2623-8073

M. Ali Akcayol 0000-0002-6615-1237

Early View Date: June 24, 2025
Publication Date: June 30, 2025
Submission Date: August 14, 2024
Acceptance Date: October 9, 2024
Published Issue: Year 2025, Volume: 13, Issue: 1

How to Cite

APA Maraş, B., Karatorak, S., Özdem Karaca, K., Gedik, A. O., et al. (2025). MR Altyazılama için Çoklu Dikkat Tabanlı Derin Öğrenme Modeli. Mus Alparslan University Journal of Science, 13(1), 128-137. https://doi.org/10.18586/msufbd.1532112
AMA Maraş B, Karatorak S, Özdem Karaca K, Gedik AO, Akcayol MA. MR Altyazılama için Çoklu Dikkat Tabanlı Derin Öğrenme Modeli. MAUN Fen Bil. Dergi. June 2025;13(1):128-137. doi:10.18586/msufbd.1532112
Chicago Maraş, Burkay, Serhat Karatorak, Kevser Özdem Karaca, A. Orkun Gedik, and M. Ali Akcayol. “MR Altyazılama için Çoklu Dikkat Tabanlı Derin Öğrenme Modeli”. Mus Alparslan University Journal of Science 13, no. 1 (June 2025): 128-37. https://doi.org/10.18586/msufbd.1532112.
EndNote Maraş B, Karatorak S, Özdem Karaca K, Gedik AO, Akcayol MA (June 1, 2025) MR Altyazılama için Çoklu Dikkat Tabanlı Derin Öğrenme Modeli. Mus Alparslan University Journal of Science 13 1 128–137.
IEEE B. Maraş, S. Karatorak, K. Özdem Karaca, A. O. Gedik, and M. A. Akcayol, “MR Altyazılama için Çoklu Dikkat Tabanlı Derin Öğrenme Modeli”, MAUN Fen Bil. Dergi., vol. 13, no. 1, pp. 128–137, 2025, doi: 10.18586/msufbd.1532112.
ISNAD Maraş, Burkay et al. “MR Altyazılama için Çoklu Dikkat Tabanlı Derin Öğrenme Modeli”. Mus Alparslan University Journal of Science 13/1 (June 2025), 128-137. https://doi.org/10.18586/msufbd.1532112.
JAMA Maraş B, Karatorak S, Özdem Karaca K, Gedik AO, Akcayol MA. MR Altyazılama için Çoklu Dikkat Tabanlı Derin Öğrenme Modeli. MAUN Fen Bil. Dergi. 2025;13:128–137.
MLA Maraş, Burkay et al. “MR Altyazılama için Çoklu Dikkat Tabanlı Derin Öğrenme Modeli”. Mus Alparslan University Journal of Science, vol. 13, no. 1, 2025, pp. 128-37, doi:10.18586/msufbd.1532112.
Vancouver Maraş B, Karatorak S, Özdem Karaca K, Gedik AO, Akcayol MA. MR Altyazılama için Çoklu Dikkat Tabanlı Derin Öğrenme Modeli. MAUN Fen Bil. Dergi. 2025;13(1):128-37.