Research Article

Performance Evaluation of Supervised Machine Learning Classifiers for Type 2 Diabetes Mellitus Prediction

Year 2025, Volume 8, Issue 3, pp. 875-884, 15.05.2025
https://doi.org/10.34248/bsengineering.1618267

Abstract

Diabetes mellitus is a significant global health concern that profoundly affects individuals' lives and imposes a considerable burden on healthcare systems. Better predictive capability enables timely intervention, which can improve patient outcomes and relieve the strain on healthcare resources. Accurate and timely prediction of diabetes mellitus is therefore crucial for reducing mortality and minimizing complications. This study examines the relationship between type 2 diabetes mellitus (T2DM) and the key attributes that differentiate diabetic from non-diabetic cases, using several machine learning-based classification methods. To this end, a large, open-source dataset obtained from Kaggle was employed. To my knowledge, this is the first study using such a dataset to focus specifically on predicting T2DM in patients aged 35 years or older, in line with the American Diabetes Association (ADA) recommendation. To identify the key features associated with T2DM for use as inputs to each supervised classifier, the Minimum Redundancy Maximum Relevance (mRMR) feature selection algorithm was applied to the dataset. The performance of each supervised classifier with feature selection was then evaluated and compared using accuracy, sensitivity, specificity, precision (positive predictive value, PPV), negative predictive value (NPV), F1 score, and the area under the receiver operating characteristic curve (AUROC). The results show that the ensemble boosted trees (EBT) classifier outperforms the other models, achieving the highest macro-average accuracy (95.9%), PPV (97.7%), NPV (97.7%), and F1 score (89.7%), as well as the highest AUROC (95.57%) for both diabetes and non-diabetes cases.
These findings suggest that machine learning classifiers can serve as reliable tools for predicting T2DM and can support clinical decision-making by healthcare practitioners.
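The abstract reports macro-averaged metrics without spelling out how they are derived. The sketch below shows one common way to obtain them from a 2×2 confusion matrix, assuming that "macro-average" means computing each metric once with diabetes as the positive class and once with non-diabetes as the positive class, then taking the mean. This is an illustrative assumption, not the study's documented procedure, and the counts in the usage line are hypothetical. Under this definition, the macro-averaged PPV and NPV of a binary classifier are always equal (as are macro sensitivity and specificity), because swapping the positive class swaps PPV with NPV and sensitivity with specificity.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts, with 'diabetes' treated as the positive class."""
    tp: int  # diabetic cases predicted diabetic
    fn: int  # diabetic cases predicted non-diabetic
    fp: int  # non-diabetic cases predicted diabetic
    tn: int  # non-diabetic cases predicted non-diabetic


def safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def per_class_metrics(c: Confusion) -> dict:
    """Metrics for the positive class defined by the counts in c."""
    sensitivity = safe_div(c.tp, c.tp + c.fn)  # recall / true positive rate
    specificity = safe_div(c.tn, c.tn + c.fp)  # true negative rate
    ppv = safe_div(c.tp, c.tp + c.fp)          # precision
    npv = safe_div(c.tn, c.tn + c.fn)
    f1 = safe_div(2 * ppv * sensitivity, ppv + sensitivity)
    accuracy = safe_div(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "ppv": ppv, "npv": npv, "f1": f1}


def macro_average(c: Confusion) -> dict:
    """Assumed macro-average: mean of each metric over both choices of positive class."""
    # Treat non-diabetes as positive: its TP are the original TN, and so on.
    swapped = Confusion(tp=c.tn, fn=c.fp, fp=c.fn, tn=c.tp)
    a, b = per_class_metrics(c), per_class_metrics(swapped)
    return {k: (a[k] + b[k]) / 2 for k in a}


# Hypothetical counts for illustration only (not results from the study).
m = macro_average(Confusion(tp=80, fn=20, fp=10, tn=890))
print({k: round(v, 4) for k, v in m.items()})
```

Note that AUROC cannot be obtained from a single confusion matrix; it requires the classifier's scores across all decision thresholds, so it is omitted from this sketch.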

Ethical Statement

Ethics committee approval was not required for this study because no experiments were conducted on animals or humans; the study used an open-source dataset.

References

  • American Diabetes Association. 2021. URL: https://diabetes.org/newsroom/latest-ada-annual-standards-of-care-includes-changes-to-diabetes-screening-first-line-therapy-pregnancy-technology (accessed date: January 6, 2025)
  • Anderson RP, Jin R, Grunkemeier GL. 2003. Understanding logistic regression analysis in clinical reports: an introduction. Ann Thorac Surg, 75(3): 753–757.
  • Bhat SS, Banu M, Ansari GA, Selvam V. 2023. A risk assessment and prediction framework for diabetes mellitus using machine learning algorithms. Healthc Anal, 4: 100273.
  • Borse SP, Chhipa AS, Sharma V, Singh DP, Nivsarkar M. 2021. Management of type 2 diabetes: current strategies, unfocussed aspects, challenges, and alternatives. Med Princ Pract, 30(2): 109–121.
  • Cano-Cano F, Gómez-Jaramillo L, Ramos-García P, Arroba AI, Aguilar-Diosdado M. 2022. IL-1β implications in type 1 diabetes mellitus progression: systematic review and meta-analysis. J Clin Med, 11(5): 1303.
  • Carrillo-Larco RM, Guzman-Vilca WC, Xu X, Bernabe-Ortiz A. 2024. Mean age and body mass index at type 2 diabetes diagnosis: pooled analysis of 56 health surveys across income groups and world regions. Diabet Med, 41(2): e15174.
  • Chandra MA, Bedi SS. 2018. Survey on SVM and their application in image classification. Int J Inf Technol, 13(5): 1–11.
  • Costa-Cordella S, Luyten P, Cohen D, Mena F, Fonagy P. 2021. Mentalizing in mothers and children with type 1 diabetes. Dev Psychopathol, 33(1): 216–225.
  • Fazakis N, Kocsis O, Dritsas E, Alexiou S, Fakotakis N, Moustakas K. 2021. Machine learning tools for long-term type 2 diabetes risk prediction. IEEE Access, 9: 103737–103757.
  • Febrian ME, Ferdinan FX, Sendani GP, Suryanigrum KM, Yunanda R. 2023. Diabetes prediction using supervised machine learning. Procedia Comput Sci, 216: 21–30.
  • Hidayati N, Hermawan A. 2021. K-Nearest Neighbor (K-NN) algorithm with Euclidean and Manhattan in classification of student graduation. J Eng Appl Sci Technol, 2(2): 86–91.
  • International Diabetes Federation. 2021. URL: https://diabetesatlas.org/atlas/tenth-edition/ (accessed date: January 3, 2025)
  • Janiesch C, Zschech P, Heinrich K. 2021. Machine learning and deep learning. Electron Markets, 31(3): 685–695.
  • Jordan MI, Mitchell TM. 2015. Machine learning: trends, perspectives, and prospects. Science, 349(6245): 255–260.
  • Kaggle. 2024. Diabetes prediction dataset. URL: https://www.kaggle.com/datasets/iammustafatz/diabetes-prediction-dataset (accessed date: October 17, 2024)
  • Kurt O. 2024. Model-based prediction of water levels for the Great Lakes: a comparative analysis. Earth Sci Inform, 17(3): 3333–3349.
  • Laakso M, Kuusisto J. 2014. Insulin resistance and hyperglycaemia in cardiovascular disease development. Nat Rev Endocrinol, 10(5): 293–302.
  • Lin X, Xu Y, Pan X, Xu J, Ding Y, Sun X, Song X, Ren Y, Shan PF. 2020. Global, regional, and national burden and trend of diabetes in 195 countries and territories: an analysis from 1990 to 2025. Sci Rep, 10(1): 14790.
  • Ma CX, Ma XN, Guan CH, Li YD, Mauricio D, Fu SB. 2022. Cardiovascular disease in type 2 diabetes mellitus: progress toward personalized management. Cardiovasc Diabetol, 21(1): 74.
  • Modak SKS, Jha VK. 2024. Diabetes prediction model using machine learning techniques. Multimed Tools Appl, 83(13): 38523–38549.
  • Mohapatra SK, Das A, Mohanty MN. 2023. Application of ensemble learning–based classifiers for genetic expression data classification. Data Science for Genomics, Academic Press, pp: 11–23.
  • Rastogi R, Bansal M. 2023. Diabetes prediction model using data mining techniques. Meas Sens, 25(21): 100605.
  • Ren J, Lee SD, Chen X, Kao B, Cheng R, Cheung D. 2009. Naive Bayes classification of uncertain data. Ninth IEEE Int Conf Data Min, pp: 944–949.
  • Talukder MA, Islam MM, Uddin A, Kazi M, Khalid M, Akhter A, Moni MA. 2024. Toward reliable diabetes prediction: innovations in data engineering and machine learning applications. Digit Health, 10: 1–26.
  • Tasin I, Nabil TU, Islam S, Khan R. 2022. Diabetes prediction using machine learning and explainable AI techniques. Healthc Technol Lett, 10(1–2): 1–10.
  • Tigga NP, Garg S. 2020. Prediction of type 2 diabetes using machine learning classification methods. Procedia Comput Sci, 167: 706–716.
  • Varma KVSRP, Rao AA, Lakshmi TSM, Rao PVN. 2014. A computational intelligence approach for a better diagnosis of diabetic patients. Comput Electr Eng, 40(5): 1758–1765.
  • WHO. 2024. World Health Organization. URL: https://www.who.int/news-room/fact-sheets/detail/diabetes (accessed date: January 3, 2025)
  • Zhao S, Zhang B, Yang J, Zhou J, Xu Y. 2024. Linear discriminant analysis. Nat Rev Methods Primers, 4(1): 70.
  • Zhou H, Xin Y, Li S. 2023. A diabetes prediction model based on Boruta feature selection and ensemble learning. BMC Bioinform, 24(1): 224.

There are 30 citations in total.

Details

Primary Language: English
Subjects: Biomedical Diagnosis, Electrical Engineering (Other)
Journal Section: Research Articles
Authors: Onur Kurt (ORCID 0000-0002-4486-2257)
Publication Date: May 15, 2025
Submission Date: January 12, 2025
Acceptance Date: April 22, 2025
Published in Issue: Year 2025, Volume 8, Issue 3

Cite

APA: Kurt, O. (2025). Performance Evaluation of Supervised Machine Learning Classifiers for Type 2 Diabetes Mellitus Prediction. Black Sea Journal of Engineering and Science, 8(3), 875-884. https://doi.org/10.34248/bsengineering.1618267
AMA: Kurt O. Performance Evaluation of Supervised Machine Learning Classifiers for Type 2 Diabetes Mellitus Prediction. BSJ Eng. Sci. May 2025;8(3):875-884. doi:10.34248/bsengineering.1618267
Chicago: Kurt, Onur. “Performance Evaluation of Supervised Machine Learning Classifiers for Type 2 Diabetes Mellitus Prediction”. Black Sea Journal of Engineering and Science 8, no. 3 (May 2025): 875-84. https://doi.org/10.34248/bsengineering.1618267.
EndNote: Kurt O (May 1, 2025) Performance Evaluation of Supervised Machine Learning Classifiers for Type 2 Diabetes Mellitus Prediction. Black Sea Journal of Engineering and Science 8 3 875–884.
IEEE: O. Kurt, “Performance Evaluation of Supervised Machine Learning Classifiers for Type 2 Diabetes Mellitus Prediction”, BSJ Eng. Sci., vol. 8, no. 3, pp. 875–884, 2025, doi: 10.34248/bsengineering.1618267.
ISNAD: Kurt, Onur. “Performance Evaluation of Supervised Machine Learning Classifiers for Type 2 Diabetes Mellitus Prediction”. Black Sea Journal of Engineering and Science 8/3 (May 2025), 875-884. https://doi.org/10.34248/bsengineering.1618267.
JAMA: Kurt O. Performance Evaluation of Supervised Machine Learning Classifiers for Type 2 Diabetes Mellitus Prediction. BSJ Eng. Sci. 2025;8:875–884.
MLA: Kurt, Onur. “Performance Evaluation of Supervised Machine Learning Classifiers for Type 2 Diabetes Mellitus Prediction”. Black Sea Journal of Engineering and Science, vol. 8, no. 3, 2025, pp. 875-84, doi:10.34248/bsengineering.1618267.
Vancouver: Kurt O. Performance Evaluation of Supervised Machine Learning Classifiers for Type 2 Diabetes Mellitus Prediction. BSJ Eng. Sci. 2025;8(3):875-84.
