Diabetes mellitus is a significant global health concern that profoundly affects individuals' lives and imposes a considerable burden on healthcare systems. Accurate and timely prediction of diabetes mellitus enables earlier intervention, which can improve patient outcomes, reduce mortality and complications, and ease the strain on healthcare resources. This study examines the relationship between type 2 diabetes mellitus (T2DM) and the key attributes that distinguish diabetic from non-diabetic cases, using several machine learning classification methods. To this end, a large, open-source dataset obtained from Kaggle was employed. To my knowledge, this is the first study to use this dataset specifically to predict T2DM in patients aged 35 years or older, an age criterion based on American Diabetes Association (ADA) guidance. To identify the key features associated with T2DM for use as input to each supervised classifier, the Minimum Redundancy Maximum Relevance (mRMR) feature selection algorithm was applied to the dataset. The performance of each supervised classifier with feature selection was then evaluated and compared using accuracy, sensitivity, specificity, precision (positive predictive value, PPV), negative predictive value (NPV), F1 score, and the area under the receiver operating characteristic curve (AUROC). The results show that the ensemble boosted trees (EBT) classifier outperformed the other models, achieving the highest macro-average values for accuracy (95.9%), PPV (97.7%), NPV (97.7%), and F1 score (89.7%), along with the highest AUROC (95.57%) for both the diabetic and non-diabetic classes.
The study suggests that machine learning classifiers can serve as a reliable tool for the precise prediction of T2DM, thereby enhancing clinical decision-making processes for healthcare practitioners.
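As a rough illustration of how the evaluation metrics named above relate to one another, the following Python sketch derives them from the four cells of a binary confusion matrix and then macro-averages them over the two classes. It is not the study's implementation: the function names `binary_metrics` and `macro_average` are hypothetical, and the counts in the usage example are invented for illustration and are not taken from the study's results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common metrics from binary confusion-matrix counts.

    tp/fp/tn/fn are counts with the diabetic class treated as positive.
    Undefined ratios (zero denominator) are returned as NaN.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall, true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # precision
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def macro_average(tp, fp, tn, fn):
    """Average each metric over both classes, treating each class as positive in turn."""
    pos = binary_metrics(tp, fp, tn, fn)
    # With the non-diabetic class as positive, the roles of the cells swap.
    neg = binary_metrics(tp=tn, fp=fn, tn=tp, fn=fp)
    return {name: (pos[name] + neg[name]) / 2 for name in pos}


# Hypothetical counts, for illustration only.
m = binary_metrics(tp=80, fp=10, tn=90, fn=20)
macro = macro_average(tp=80, fp=10, tn=90, fn=20)
for name in m:
    print(f"{name}: class={m[name]:.3f} macro={macro[name]:.3f}")
```

Under macro-averaging in a two-class problem, the averaged PPV and the averaged NPV are the same quantity, because the PPV of one class is the NPV of the other; the same holds for sensitivity and specificity.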
Ethics committee approval was not required for this study because it did not involve human or animal subjects. The study used an open-source dataset.
Primary Language | English |
---|---|
Subjects | Biomedical Diagnosis, Electrical Engineering (Other) |
Journal Section | Research Articles |
Authors | |
Publication Date | May 15, 2025 |
Submission Date | January 12, 2025 |
Acceptance Date | April 22, 2025 |
Published in Issue | Year 2025 Volume: 8 Issue: 3 |