Supervised Learning Approaches to Flight Delay Prediction
Year 2020,
Volume: 24 Issue: 6, 1223 - 1231, 01.12.2020
Mehmet Cemal Atlıoğlu
Mustafa Bolat
Murat Şahin
Volkan Tunalı
Deniz Kılınç
Abstract
Delays in flights and other airline operations have significant consequences for quality of service, operational costs, and customer satisfaction. It is therefore important to predict delays and take the necessary actions accordingly. In this study, we addressed the flight delay prediction problem from a supervised machine learning perspective. Using a real-world airline operations dataset provided by a leading airline company, we identified the dataset features that yield the best prediction accuracy. In addition, we trained and tested 11 machine learning models on the datasets that we derived from the original dataset via feature selection and transformation. CART and KNN performed consistently well in almost all cases, achieving F-Scores of 0.816 and 0.807, respectively. Similarly, GBM, XGB, and LGBM performed very well in most cases, achieving F-Scores of around 0.810.
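Since the results above are reported as F-Scores, a minimal sketch of how such a score is computed may help in reading them. It assumes a binary labelling (1 = delayed, 0 = on time) and the standard F-beta definition for a single positive class; the abstract does not state which averaging scheme the authors used, and the function name `f_score` and the toy labels are purely illustrative, not taken from the study.

```python
def f_score(y_true, y_pred, positive=1, beta=1.0):
    """F-beta score for one class, computed from paired true/predicted labels.

    Hypothetical helper for illustration only; not the authors' evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion counts with respect to the positive (delayed) class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall (beta=1 gives F1).
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy labels (assumed data): 4 delayed flights, 6 on-time flights.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(f_score(y_true, y_pred))  # 0.75 (precision 3/4, recall 3/4)
```

Under this definition, a score near 0.81 means that precision and recall on the scored class are both high, since the harmonic mean is pulled toward the lower of the two.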
Supporting Institution
Research and Development Center of TAV Airports Holding
Thanks
This work was partially funded by the Research and Development Center of TAV Airports Holding, accredited by the Ministry of Science of Turkey.
References
- N. Pyrgiotis, K. M. Malone, and A. Odoni, "Modelling delay propagation within an airport network," Transportation Research Part C: Emerging Technologies, vol. 27, pp. 60-75, 2013.
- J. J. Rebollo and H. Balakrishnan, "Characterization and prediction of air traffic delays," Transportation Research Part C: Emerging Technologies, vol. 44, pp. 231-241, 2014.
- Y. Ding, "Predicting flight delay based on multiple linear regression," in 2nd International Conference on Materials Science, Energy Technology and Environmental Engineering (MSETEE 2017), Zhuhai, China, 2017, pp. 1-8.
- N. Chakrabarty, "A Data Mining Approach to Flight Arrival Delay Prediction for American Airlines," CoRR, vol. abs/1903.06740, 2019.
- B. Yu, Z. Guo, S. Asian, H. Wang, and G. Chen, "Flight delay prediction for commercial air transport: A deep learning approach," Transportation Research Part E: Logistics and Transportation Review, vol. 125, pp. 203-221, 2019.
- H. Khaksar and A. Sheikholeslami, "Airline delay prediction by machine learning algorithms," Scientia Iranica, vol. 26, pp. 2689-2702, 2017.
- G. Gui, F. Liu, J. Sun, J. Yang, Z. Zhou, and D. Zhao, "Flight Delay Prediction Based on Aviation Big Data and Machine Learning," IEEE Transactions on Vehicular Technology, vol. 69, pp. 140-150, 2020.
- E. Alpaydın, Introduction to Machine Learning, 3rd ed. London, England: The MIT Press, 2014.
- J. Han and M. Kamber, Data Mining: Concepts and Techniques. USA: Morgan Kaufmann Publishers, 2006.
- J. C. Platt, "Fast training of support vector machines using sequential minimal optimization," in Advances in Kernel Methods, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds. MIT Press, 1999, pp. 185-208.
- L. Breiman, "Random Forests," Machine Learning, vol. 45, pp. 5-32, 2001.
- T. Chen and C. Guestrin, "XGBoost: A Scalable Tree Boosting System," presented at the Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA, 2016.
- G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, "LightGBM: a highly efficient gradient boosting decision tree," presented at the Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, California, USA, 2017.
- A. V. Dorogush, V. Ershov, and A. Gulin, "CatBoost: gradient boosting with categorical features support," CoRR, vol. abs/1810.11363, 2018.
- W. McKinney, "pandas: a foundational Python library for data analysis and statistics," Python for High Performance and Scientific Computing, vol. 14, 2011.
- F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and É. Duchesnay, "Scikit-learn: Machine Learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011.