Research Article

Performance of machine learning methods on breast cancer prediction

Year 2025, Issue 012, pp. 1-9, 30.04.2025

Abstract

Over the last 50 years, the contribution of cancer to the annual number of deaths has increased significantly. This has led to an increase in research on the early detection and diagnosis of cancer, since early diagnosis raises the chance of surviving the disease and lowers the likelihood of recurrence. Advances in artificial intelligence and machine learning are used to analyse patient data and, at the same time, help reduce the likelihood of developing disease. In this paper, seven machine learning algorithms commonly used in the literature are applied to breast cancer diagnosis: Logistic Regression (LR), K-Nearest Neighbours (KNN), Support Vector Machines (SVM) with linear and Radial Basis Function (RBF) kernels, Naive Bayes, Decision Tree (DT), and Random Forest (RF). Two separate datasets were used. On the first dataset, the Random Forest, SVM (RBF), and SVM (Linear) algorithms shared the highest accuracy (96.5%), the K-Nearest Neighbours algorithm had the highest sensitivity (98.8%), and the Decision Tree algorithm had the highest specificity (98.1%). The K-Nearest Neighbours algorithm was also the fastest, at 1.03 seconds. On the second dataset, which contains different data, the K-Nearest Neighbours algorithm reached the highest accuracy (97.7%) and was the second fastest algorithm at 1.48 seconds, after the Gaussian Naive Bayes algorithm at 1.14 seconds.
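The abstract ranks the algorithms by three different metrics, and the best model differs depending on which one is used. The sketch below shows the standard definitions of accuracy, sensitivity, and specificity computed from a binary confusion matrix. It is a minimal illustration, not the authors' code: the function names, the encoding of malignant as 1 and benign as 0, and the example labels are all assumptions made here.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels.

    Assumption: 1 means malignant (positive) and 0 means benign (negative).
    """
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # malignant case correctly flagged
        elif truth == 0 and pred == 0:
            tn += 1  # benign case correctly cleared
        elif truth == 0 and pred == 1:
            fp += 1  # benign case wrongly flagged
        else:
            fn += 1  # malignant case missed
    return tp, tn, fp, fn


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity, each as a percentage."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": 100.0 * tp / (tp + fn),  # share of malignant cases detected
        "specificity": 100.0 * tn / (tn + fp),  # share of benign cases cleared
    }


# Hypothetical labels for ten patients, not taken from either dataset.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = diagnostic_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}")
```

Because sensitivity depends only on the malignant cases and specificity only on the benign ones, a model can lead on one metric without leading on accuracy, which is consistent with the different winners reported for the first dataset.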


Meme Kanseri Tahmininde Makine Öğrenme Yöntemlerinin Performansı


Abstract

Over the last 50 years, the effect of cancer on the annual number of deaths has increased significantly. This has led to an increase in research on the early detection and diagnosis of cancer, because early diagnosis raises the chance of surviving the disease and lowers the likelihood of recurrence. Current technological advances in artificial intelligence and machine learning help analyse patient data while also reducing the likelihood of contracting diseases. In this paper, seven machine learning algorithms frequently used in the literature are applied to breast cancer diagnosis: Logistic Regression (LR), K-Nearest Neighbours (KNN), Support Vector Machines (SVM) with linear and Radial Basis Function (RBF) kernels, Naive Bayes, Decision Tree (DT), and Random Forest (RF). Two separate datasets were used for breast cancer diagnosis. On the first dataset, the Random Forest, SVM (RBF), and SVM (Linear) algorithms had the highest accuracy, at 96.5%, while the K-Nearest Neighbours algorithm had the highest sensitivity, at 98.8%, and the Decision Tree algorithm had the highest specificity, at 98.1%. The K-Nearest Neighbours algorithm was also found to be the fastest, at 1.03 seconds. On the second dataset, which contains different data, the K-Nearest Neighbours algorithm reached the highest accuracy, at 97.7%, and was the second fastest algorithm at 1.48 seconds, after the Gaussian Naive Bayes algorithm at 1.14 seconds.

There are 33 citations in total.

Details

Primary Language: English
Subjects: Planning and Decision Making
Journal Section: Research Articles
Authors:

Ghazwa Alsaffaf (ORCID 0000-0001-9824-5951)

Soydan Serttaş (ORCID 0000-0001-8887-8675)

Publication Date: April 30, 2025
Submission Date: April 24, 2025
Acceptance Date: April 30, 2025
Published in Issue: Year 2025, Issue 012

Cite

APA Alsaffaf, G., & Serttaş, S. (2025). Performance of machine learning methods on breast cancer prediction. Journal of Scientific Reports-B, (012), 1-9.