ENHANCING TRANSLATION WITH VISUAL AND AUDITORY MODALITIES

Gülfidan Aytaş

doi:10.37999/udekad.1611713

Research Article

Year 2025, Volume: 8 Issue: 1, 425 - 438, 28.03.2025

Abstract

This study investigates the impact of integrating visual and auditory modalities into neural machine translation (NMT) processes. Traditional text-based NMT models are limited in translation quality because they cannot effectively capture contextual and cultural nuances. This research demonstrates that incorporating visual and auditory elements (such as scene context, character expressions, intonation, and emphasis) leads to significant improvements in translation quality. The study highlights the capacity of multimodal models to preserve cultural and emotional context beyond linguistic fidelity. It explores the potential of these models in various applications, including subtitle translation, video game localization, and educational materials. The findings show that visual and auditory modalities strengthen their interaction with linguistic context, enabling context-aware and culturally aligned content in translation processes. Additionally, this work systematically compares deep learning models such as Transformer, BERT, and GPT, evaluating their characteristics in improving translation quality. The results indicate that new technologies integrating visual and auditory contexts offer significant advantages over traditional text-based models. These results have important implications for both theoretical discussions and practical applications.

Keywords

multimodal translation, neural machine translation (NMT), audiovisual translation (AVT).

Details

Primary Language	English
Subjects	Translation and Interpretation Studies, Comparative Language Studies
Section	Research Articles
Authors	Gülfidan Aytaş 0000-0003-1566-1592
Early View Date	27 March 2025
Publication Date	28 March 2025
Submission Date	2 January 2025
Acceptance Date	26 March 2025
Published in Issue	Year 2025 Volume: 8 Issue: 1

Cite

APA	Aytaş, G. (2025). ENHANCING TRANSLATION WITH VISUAL AND AUDITORY MODALITIES. Uluslararası Dil Edebiyat Ve Kültür Araştırmaları Dergisi, 8(1), 425-438. https://doi.org/10.37999/udekad.1611713
