[2603.21039] Benchmarking Scientific Machine Learning Models for Air Quality Data
arXiv:2603.21039 (cs): Computer Science > Machine Learning. Submitted on 22 Mar 2026.
Authors: Khawja Imran Masud, Venkata Sai Rahul Unnam, Sahara Ali

Abstract: Accurate air quality index (AQI) forecasting is essential for protecting public health in rapidly growing urban regions. However, practical model evaluation and selection are often hampered by the lack of rigorous, region-specific benchmarking on standardized datasets. Physics-guided machine learning and deep learning models may address this gap by delivering more accurate and efficient AQI forecasts. This study presents an explainable, comprehensive benchmark of classical time-series, machine-learning, and deep-learning approaches for multi-horizon AQI forecasting in North Texas (Dallas County). The benchmark provides model-selection guidelines and proposes a physics-guided best model. Using publicly available U.S. Environmental Protection Agency (EPA) daily air quality observations from 2022 to 2024, we curate city-level PM2.5 and O3 time series by aggregating station measurements. We then construct lag-wise forecasting datasets for LAG ∈ {1, 7, 14, 30} days. For benchmarking the best model, linear regression (LR), SARIMAX, multil...
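The data-preparation step described in the abstract has two parts: station measurements are aggregated into a city-level daily series, and lag-wise datasets are built for LAG ∈ {1, 7, 14, 30}. A minimal standard-library Python sketch of both parts follows. It is not the authors' pipeline and rests on two assumptions. First, it uses an unweighted mean across stations per day. Second, it reads LAG as the number of past daily values used as inputs to predict the next day; the paper may instead define LAG as the forecast horizon. The function names `city_daily_mean` and `lagged_dataset` and the toy readings are hypothetical and are not EPA data.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def city_daily_mean(records):
    """Aggregate (day, station_id, value) readings into one city-level value per day.

    Assumption: a plain unweighted mean over all stations reporting that day.
    """
    by_day = defaultdict(list)
    for day, _station, value in records:
        by_day[day].append(value)
    return {day: mean(vals) for day, vals in sorted(by_day.items())}


def lagged_dataset(series, lag):
    """Build (features, target) pairs from a {date: value} series.

    Assumption: the previous `lag` consecutive daily values (oldest first)
    are the inputs and the current day's value is the target. Any window
    with a missing calendar day is skipped rather than imputed.
    """
    X, y = [], []
    for day in sorted(series):
        window = [day - timedelta(days=k) for k in range(lag, 0, -1)]
        if all(d in series for d in window):
            X.append([series[d] for d in window])
            y.append(series[day])
    return X, y


# Hypothetical toy readings for two stations (illustrative only, not EPA data).
records = [
    (date(2022, 1, 1), "station_a", 8.0), (date(2022, 1, 1), "station_b", 10.0),
    (date(2022, 1, 2), "station_a", 12.0),
    (date(2022, 1, 3), "station_a", 7.0), (date(2022, 1, 3), "station_b", 9.0),
]
city = city_daily_mean(records)

# One dataset per lag, mirroring LAG in {1, 7, 14, 30}.
datasets = {lag: lagged_dataset(city, lag) for lag in (1, 7, 14, 30)}
for lag, (X, y) in datasets.items():
    print(f"LAG={lag}: {len(y)} samples")
```

Skipping incomplete windows keeps every sample aligned to true calendar lags when station records have missing days. An imputation step would be the alternative if more samples are needed, especially at LAG = 30.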