Research Article

X Platformunda İstenmeyen Gönderilerin Makine Öğrenmesi, Geniş Dil Modelleri ve Bilgisayarlı Görü ile Tespiti

Year 2025, Volume: 18 Issue: 2, 181 - 194, 30.04.2025
https://doi.org/10.17671/gazibtd.1596990

Detecting Spams on the X Platform with Machine Learning, Large Language Models, and Computer Vision

Abstract

While social media platforms have become crucial tools for information sharing and communication, the spread of unwanted content (spam) has also become a significant problem. This paper proposes a novel approach for spam detection on the social media platform X (formerly Twitter) by integrating machine learning, large language models, and computer vision techniques. A dataset containing posts with visual content on popular Turkish topics was created, aiming to identify the most effective machine learning algorithms for spam detection. Feature engineering was conducted to capture key aspects of social media interaction, including the relationship between post content and hashtags, as well as the relevance between multiple hashtags. Additionally, image-based features were introduced, such as the initial posting date of an image on X and its textual similarity to other web pages, to enhance visual content analysis. These features were developed using Google Gemini and Cloud Vision AI. Experimental evaluations with five machine learning algorithms (Decision Trees, Random Forest, SVM, Logistic Regression, and Multilayer Perceptron) demonstrated that the Random Forest algorithm achieved the highest accuracy and F1 score. This paper highlights the effectiveness of machine learning methods in spam detection on X and introduces new methodologies for leveraging Google Gemini and Cloud Vision AI. Furthermore, the engineered features provide a strong foundation for accurately classifying spam content.
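
The abstract ranks the classifiers by accuracy and F1 score. As a minimal sketch of how these two metrics are computed for a binary spam task, assuming label 1 means spam and 0 means not spam: the function names, the model names, and the toy labels below are illustrative assumptions and do not come from the paper or its dataset.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 = spam, 0 = not spam."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of posts whose predicted label matches the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the spam class: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical ground truth and predictions from two placeholder models.
y_true: List[int] = [1, 1, 1, 0, 0, 0, 0, 1]
predictions: Dict[str, List[int]] = {
    "model_a": [1, 1, 0, 0, 0, 0, 1, 1],
    "model_b": [1, 0, 0, 0, 0, 0, 0, 1],
}

# Score every model on both metrics, then pick the one with the highest F1.
scores = {
    name: (accuracy(y_true, y_pred), f1_score(y_true, y_pred))
    for name, y_pred in predictions.items()
}
best_by_f1 = max(scores, key=lambda name: scores[name][1])

for name, (acc, f1) in scores.items():
    print(f"{name}: accuracy={acc:.3f} f1={f1:.3f}")
print("best by F1:", best_by_f1)
```

In this toy data both placeholder models reach the same accuracy but differ in F1, which is one reason the two metrics are usually reported together when spam and non-spam posts are not equally easy to catch.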

Details

Primary Language Turkish
Subjects Deep Learning, Semi-supervised and Unsupervised Learning, Natural Language Processing
Section Articles
Authors

Ebutalha Camadan 0000-0001-7669-5601

Mehmet Şimşek 0000-0002-9797-5028

Publication Date April 30, 2025
Submission Date December 5, 2024
Acceptance Date March 11, 2025
Published in Issue Year 2025 Volume: 18 Issue: 2

Cite

APA Camadan, E., & Şimşek, M. (2025). X Platformunda İstenmeyen Gönderilerin Makine Öğrenmesi, Geniş Dil Modelleri ve Bilgisayarlı Görü ile Tespiti. Bilişim Teknolojileri Dergisi, 18(2), 181-194. https://doi.org/10.17671/gazibtd.1596990