Research Article

DEVELOPMENT OF MACHINE LEARNING MODELS FOR PROJECT EFFORT PREDICTION AND EXPLANATION USING SHAP METHOD

Year 2025, Volume: 13 Issue: 2, 528 - 544, 27.06.2025
https://doi.org/10.21923/jesd.1604190

Abstract

In today’s digitalized world, businesses need successful project management, and the growing number of software projects has made accurate effort estimation a critical process. Effort estimation predicts the time and labor required to complete a project, which in turn helps optimize costs. This study developed six regression models for project effort estimation: random forest, decision tree, linear regression, artificial neural network, GradientBoost, and AdaBoost. The models were evaluated on six datasets (china_original, cocomonasa_v1, humans2, nasa93, usp05, and usp05-ft) using repeated holdout with 50 repetitions and were compared on four metrics: mean absolute error, mean squared logarithmic error, coefficient of determination, and mean magnitude of relative error. The artificial neural network, random forest, decision tree, and GradientBoost models were each the most successful on different datasets, and the decision tree was concluded to be the most successful model for project effort estimation. In a further analysis, the models were explained with SHAP (SHapley Additive exPlanations), an explainable artificial intelligence method. The explanations showed that, for each dataset, certain attributes influenced the models’ decisions more than others.
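The evaluation protocol named in the abstract (repeated holdout scored with four error metrics) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' setup: the 70/30 split ratio, the one-feature least-squares line standing in for the six models, the synthetic data, the log(1 + y) form of the mean squared logarithmic error, and all function names.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def msle(y_true, y_pred):
    """Mean squared logarithmic error, log(1 + y) form; inputs must be >= 0."""
    return sum((math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination; y_true must not be constant."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mmre(y_true, y_pred):
    """Mean magnitude of relative error; y_true must be strictly positive."""
    return sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(xs, ys):
    """One-feature least squares; a hypothetical stand-in for the real models."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def repeated_holdout(xs, ys, repeats=50, test_fraction=0.3, seed=0):
    """Average each metric over `repeats` random train/test splits."""
    rng = random.Random(seed)
    n = len(xs)
    n_test = max(2, round(n * test_fraction))
    scores = {"MAE": [], "MSLE": [], "R2": [], "MMRE": []}
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        y_true = [ys[i] for i in test]
        # Clip at zero so the logarithm in MSLE stays defined.
        y_pred = [max(model(xs[i]), 0.0) for i in test]
        scores["MAE"].append(mae(y_true, y_pred))
        scores["MSLE"].append(msle(y_true, y_pred))
        scores["R2"].append(r2(y_true, y_pred))
        scores["MMRE"].append(mmre(y_true, y_pred))
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


if __name__ == "__main__":
    # Synthetic "project size -> effort" data, for illustration only.
    rng = random.Random(1)
    sizes = [float(s) for s in range(10, 60)]
    efforts = [3.0 * s + 20.0 + rng.uniform(-5.0, 5.0) for s in sizes]
    for name, value in repeated_holdout(sizes, efforts).items():
        print(f"{name}: {value:.4f}")
```

Averaging over many random splits reduces the dependence of the reported scores on any single partition, which matters for small effort-estimation datasets. Comparing several models would amount to calling `repeated_holdout` once per model with the same seed so that every model sees the same splits.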


Details

Primary Language: Turkish
Subjects: Information Systems (Other), Computer Software, Automated Software Engineering
Journal Section: Research Articles
Authors:

Esma Nur Kaya (ORCID: 0009-0000-3144-2686)

Yasin Görmez (ORCID: 0000-0001-8276-2030)

Publication Date: June 27, 2025
Submission Date: December 20, 2024
Acceptance Date: May 16, 2025
Published in Issue: Year 2025, Volume: 13 Issue: 2

Cite

APA: Kaya, E. N., & Görmez, Y. (2025). PROJE EFOR TAHMİNİ İÇİN MAKİNE ÖĞRENMESİ MODELLERİNİN GELİŞTİRİLMESİ VE SHAP YÖNTEMİ KULLANILARAK AÇIKLANMASI [Development of machine learning models for project effort prediction and explanation using SHAP method]. Mühendislik Bilimleri Ve Tasarım Dergisi, 13(2), 528-544. https://doi.org/10.21923/jesd.1604190