Predicting Flight Arrival Delays: Machine Learning Approaches

Project Overview

Team project for CS4641: Machine Learning at Georgia Institute of Technology (Fall 2025), focusing on predicting flight arrival delays at Hartsfield–Jackson Atlanta International Airport using U.S. Bureau of Transportation Statistics data.

Team Members: Antonio Arias Alonso, Luoyi Zhang, Magnus H. Voss, Richie Jonathan, Sara El Khattabi Vilchez

Project Website: https://rjrichie.github.io/ml-flight-arrival-delay/

Objectives

  • Develop binary classification models to predict whether flights will be delayed (>15 minutes)
  • Build regression models to estimate arrival delay magnitude in minutes
  • Compare performance across multiple ML algorithms on an imbalanced dataset
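
As a minimal sketch of how the two targets relate, the classification label can be derived from the same arrival-delay value that the regression models predict. The function name, the handling of missing values, and the sample numbers are illustrative assumptions, not the project's actual code.

```python
from typing import Iterable, List, Optional

DELAY_THRESHOLD_MIN = 15  # threshold stated in the objectives (>15 minutes)


def label_delayed(arrival_delays: Iterable[Optional[float]],
                  threshold: float = DELAY_THRESHOLD_MIN) -> List[Optional[int]]:
    """Map arrival delay in minutes to 1 (delayed) or 0 (on time).

    Missing delays (None) are kept as None so they can be dropped or
    handled separately; how the project treated them is not stated.
    """
    labels: List[Optional[int]] = []
    for delay in arrival_delays:
        if delay is None:
            labels.append(None)
        else:
            # Strictly greater than the threshold counts as delayed.
            labels.append(1 if delay > threshold else 0)
    return labels


# Hypothetical delays in minutes (negative = early arrival).
sample_delays = [-5.0, 0.0, 15.0, 16.0, 120.0, None]
print(label_delayed(sample_delays))  # prints [0, 0, 0, 1, 1, None]
```

A flight arriving exactly 15 minutes late is labeled on time under this reading of ">15 minutes".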

Dataset

  • Source: U.S. Bureau of Transportation Statistics (BTS)
  • Scope: Domestic arrivals at ATL from 11 airlines (2019-2025, excluding COVID years)
  • Features: Scheduled/actual times, carrier information, origin airport, temporal factors
  • Challenge: Highly imbalanced (84.8% on-time, 15.2% delayed)
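
To make the imbalance concrete, the sketch below works out three numbers from the 84.8% / 15.2% split: the accuracy of always predicting "on time", the "balanced" class weights (the n_samples / (n_classes * class_count) heuristic that scikit-learn uses for class_weight="balanced"), and the negative-to-positive ratio commonly used for XGBoost's scale_pos_weight. The counts per 1,000 flights are an illustrative assumption, and whether the project used exactly these settings is not stated.

```python
def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count)
            for label, count in class_counts.items()}


# Assumed counts per 1,000 flights, matching the reported class split.
counts = {"on_time": 848, "delayed": 152}

# Accuracy of a model that always predicts the majority class.
majority_baseline_accuracy = max(counts.values()) / sum(counts.values())

# Per-class weights that make both classes contribute equally to the loss.
weights = balanced_class_weights(counts)

# Ratio of negatives to positives (delayed is the positive class).
neg_pos_ratio = counts["on_time"] / counts["delayed"]

print(f"majority baseline accuracy: {majority_baseline_accuracy:.3f}")
print(f"balanced weights: on_time={weights['on_time']:.3f}, "
      f"delayed={weights['delayed']:.3f}")
print(f"negative/positive ratio: {neg_pos_ratio:.2f}")
```

Under these weights a misclassified delayed flight costs roughly 5.6 times as much as a misclassified on-time flight.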

Methods & Models

Classification Models

  • Logistic Regression - Baseline linear model with class balancing
  • Random Forest Classifier - Ensemble method with hyperparameter tuning
  • Multi-Layer Perceptron - Neural network with early stopping
  • XGBoost - Gradient boosting with randomized search optimization

Regression Models

  • Random Forest Regressor - Predicting delay magnitude
  • MLP Regressor - Neural network for continuous delay estimation
  • XGBoost Regressor - Optimized gradient boosting

Key Results

Classification Performance (Best: XGBoost)

  • Accuracy: 61.4%
  • F1-Score: 0.316
  • ROC-AUC: 0.643
  • Balanced precision and recall on the minority (delayed) class; the trade-off is accuracy below the 84.8% that always predicting "on time" would score
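
The sketch below shows why accuracy alone is misleading on imbalanced data and how F1 for the delayed class is computed from a confusion matrix. The labels and predictions are a made-up toy example, not the project's results, and ROC-AUC is left out because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive (delayed = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data: 3 delayed flights out of 10 (hypothetical).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model_pred = [1, 1, 0, 1, 1, 1, 0, 0, 0, 0]   # catches 2 of 3 delays
all_on_time = [0] * 10                         # majority-class baseline

model_scores = classification_metrics(y_true, model_pred)
baseline_scores = classification_metrics(y_true, all_on_time)
print("model   :", model_scores)
print("baseline:", baseline_scores)
```

In this toy case the baseline has the higher accuracy (0.7 against 0.6) but an F1 of 0, because it never flags a delayed flight.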

Regression Performance (Best: XGBoost)

  • MAE: 19.48 minutes
  • RMSE: 34.11 minutes
  • R²: 0.039
  • 57.1% of predictions within 15 minutes of actual delay
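
For reference, the four regression figures above can be computed as in the following sketch. The data are invented for illustration, and treating "within 15 minutes" as an absolute error of at most 15 (inclusive) is an assumption about how the project defined it.

```python
import math


def regression_metrics(y_true, y_pred, tolerance=15.0):
    """MAE, RMSE, R^2 and the share of predictions within `tolerance` minutes."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    r2 = 1.0 - ss_res / ss_tot

    within = sum(1 for e in errors if abs(e) <= tolerance) / n
    return {"mae": mae, "rmse": rmse, "r2": r2, "within_tolerance": within}


# Hypothetical actual and predicted arrival delays in minutes.
actual = [0.0, 10.0, 20.0, 30.0, 100.0]
predicted = [5.0, 10.0, 10.0, 40.0, 50.0]

scores = regression_metrics(actual, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

The single large miss on the 100-minute delay dominates RMSE while barely moving the within-15-minutes share, which is why the two metrics can tell different stories on heavy-tailed delay data.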

Technical Contributions

  • Extensive feature engineering (temporal, operational, carrier-specific features)
  • Addressed class imbalance using balanced class weights and SMOTE oversampling
  • Comprehensive hyperparameter tuning with cross-validation
  • Feature importance analysis revealing scheduled elapsed time, origin airport, and seasonality as top predictors
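
The core idea behind SMOTE is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbors. The sketch below is a simplified, standard-library illustration of that idea, not the project's code; the function name, the two-feature points, and the choice of k are assumptions. In practice a library implementation such as imbalanced-learn's SMOTE would normally be used, applied to training data only.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (minority-class samples).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]

        # Indices of the k nearest minority neighbors of the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]

        # Pick a random point on the segment between base and neighbor.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (nb - b)
                               for b, nb in zip(base, neighbor)))
    return synthetic


# Hypothetical minority samples with two scaled numeric features.
delayed_points = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.95), (0.3, 0.7)]
new_points = smote_like_oversample(delayed_points, n_new=5)
print(len(new_points), "synthetic samples generated")
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the region the minority class already occupies. Interpolation is only meaningful for numeric features, so categorical fields such as carrier or origin airport would need separate handling.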

Technologies Used

  • Machine Learning: scikit-learn, XGBoost, PyTorch/TensorFlow
  • Data Processing: pandas, NumPy
  • Visualization: matplotlib, seaborn
  • Development: Python, Jupyter Notebook

Conclusions

Tree-based ensemble methods (Random Forest, XGBoost) consistently outperformed linear and neural models, demonstrating the importance of capturing nonlinear interactions in flight delay prediction. The low R² on the regression task (0.039 for the best model) shows that the available features explain little of the variance in delay magnitude, and points to the need for external features (weather, air traffic, upstream delays) to improve delay magnitude estimation.
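
One way to read the reported R² is through the identity R² = 1 - MSE / Var(y), which implies that a model always predicting the mean delay would have an RMSE of RMSE_model / sqrt(1 - R²). The short calculation below applies this to the reported XGBoost figures. It assumes both metrics were computed on the same evaluation set, which the write-up does not state explicitly.

```python
import math

# Reported metrics for the best regression model (XGBoost).
rmse_model = 34.11   # minutes
r2_model = 0.039

# RMSE of a constant predictor that always outputs the mean delay
# of the evaluation set (equal to the standard deviation of the target).
baseline_rmse = rmse_model / math.sqrt(1.0 - r2_model)

# How many minutes of RMSE the model gains over that baseline.
rmse_reduction = baseline_rmse - rmse_model

print(f"implied mean-predictor RMSE: {baseline_rmse:.2f} min")
print(f"RMSE reduction from the model: {rmse_reduction:.2f} min")
```

Under that assumption the implied baseline is about 34.8 minutes, so the model improves RMSE by well under a minute, consistent with the conclusion that the current features carry little signal about delay magnitude.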

Course

CS4641: Machine Learning
Georgia Institute of Technology, Fall 2025