Research Article

Investigating the factors affecting obesity using machine learning algorithms

Year 2025, Volume: 4 Issue: 1, 18 - 24, 13.07.2025

Abstract

Obesity is a multifactorial public health challenge, influenced by a complex interplay of behavioral, dietary, genetic, and lifestyle factors. Traditional statistical methods often fall short in capturing nonlinear relationships and high-dimensional interactions within such data. This study aims to identify the most influential predictors of obesity using four machine learning-based feature selection methods, thereby offering robust insights for public health interventions and policy design. A dataset comprising 2,111 records from individuals in Mexico, Peru, and Colombia—partially augmented using the SMOTE technique—was analyzed using Boruta, Recursive Feature Elimination (RFE), Lasso Logistic Regression, and Genetic Algorithms. Variables included demographic, behavioral, dietary, and physical activity-related features. All analyses were conducted in R. Across all four methods, high-calorie food consumption, frequent snacking, low water intake, reduced physical activity, and family history of overweight were consistently identified as key predictors of obesity. In contrast, variables such as gender, smoking, and transportation mode were not selected by any method, suggesting limited predictive value in the given context. Some features, like alcohol intake and vegetable consumption, showed algorithm-specific relevance. The convergence of findings across multiple machine learning algorithms strengthens the validity of selected predictors, emphasizing the role of lifestyle and dietary habits in obesity risk. The study highlights the utility of multi-algorithmic feature selection in deriving interpretable and reliable insights from complex health data, with implications for designing targeted intervention strategies.
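
The comparison across the four selection methods can be pictured as a simple vote count. The sketch below is illustrative only: the study itself was carried out in R, and the function name `consensus`, the feature names `f1` to `f4`, and the per-method selections are hypothetical placeholders rather than the study's code or results. It groups features by whether every method, only some methods, or no method retained them.

```python
from collections import Counter


def consensus(selections, all_features):
    """Group features by how many selection methods retained them.

    selections: dict mapping a method name to the set of features it kept.
    all_features: list of every candidate feature.
    """
    if not selections:
        raise ValueError("at least one selection method is required")

    votes = Counter()
    for method, chosen in selections.items():
        unknown = set(chosen) - set(all_features)
        if unknown:
            raise ValueError(f"{method} selected unknown features: {sorted(unknown)}")
        votes.update(set(chosen))  # one vote per feature per method

    n_methods = len(selections)
    return {
        "all": sorted(f for f in all_features if votes[f] == n_methods),
        "some": sorted(f for f in all_features if 0 < votes[f] < n_methods),
        "none": sorted(f for f in all_features if votes[f] == 0),
    }


# Placeholder inputs (NOT the study's data): four methods, four features.
features = ["f1", "f2", "f3", "f4"]
selections = {
    "boruta": {"f1", "f2"},
    "rfe": {"f1", "f3"},
    "lasso": {"f1"},
    "genetic_algorithm": {"f1", "f2"},
}

groups = consensus(selections, features)
print(groups)
```

In this framing, features in the `all` group correspond to predictors on which the methods converge, `some` to algorithm-specific ones, and `none` to variables that no method selected.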


Details

Primary Language English
Subjects Clinical Sciences (Other)
Section Research Article
Authors

Onur Çamlı 0000-0003-3885-3781

Dilek Sevim 0009-0003-4133-3733

Early Pub Date 21 July 2025
Publication Date 13 July 2025
Submission Date 14 April 2025
Acceptance Date 2 July 2025
Published in Issue Year 2025 Volume: 4 Issue: 1

Cite

APA Çamlı, O., & Sevim, D. (2025). Investigating the factors affecting obesity using machine learning algorithms. Eurasian Journal of Molecular and Biochemical Sciences, 4(1), 18-24.