Artificial Intelligence

Analysis and Comparison of Missing Values Imputation Methods for Atmospheric Pollution Data

Abstract

Missing values frequently occur in real-world time series datasets and significantly affect the precision and reliability of data analysis and machine learning models. This research project explores the types of missing-data occurrence and examines various imputation methods. The approaches considered range from simple statistical techniques to more complex methods such as regression models, neural networks, and LSTM models. The effectiveness of these imputation techniques is assessed on atmospheric pollution data, with a particular focus on PM10 and PM2.5 levels. Each method's performance is evaluated on accuracy, consistency, and its impact on subsequent predictive models. The findings indicate that LSTM models are the most effective, while regression and MLP models, though less accurate, offer faster alternatives. Mean imputation, by contrast, yields the highest error values.
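
As an illustration of how such a comparison might be set up, the following minimal sketch hides a fraction of known values in a series, imputes them, and scores each method by RMSE on the hidden points only. It rests on several assumptions that are not taken from the article: the data are synthetic (a noisy daily cycle standing in for hourly PM10 readings), the gaps are missing completely at random, linear interpolation stands in for the "simple statistical techniques", and the helper names `mean_impute`, `linear_impute`, and `rmse` are hypothetical.

```python
import numpy as np

def mean_impute(series):
    """Replace every NaN with the mean of the observed values."""
    filled = series.copy()
    filled[np.isnan(filled)] = np.nanmean(series)
    return filled

def linear_impute(series):
    """Replace every NaN by linear interpolation between observed neighbours."""
    filled = series.copy()
    missing = np.isnan(filled)
    idx = np.arange(len(filled))
    filled[missing] = np.interp(idx[missing], idx[~missing], filled[~missing])
    return filled

def rmse(truth, estimate, mask):
    """Root-mean-square error restricted to the artificially hidden points."""
    return float(np.sqrt(np.mean((truth[mask] - estimate[mask]) ** 2)))

# Synthetic stand-in for two weeks of hourly readings (assumption, not real data).
rng = np.random.default_rng(0)
hours = np.arange(24 * 14)
truth = 30 + 10 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)

# Hide roughly 10% of the values at random; keep both endpoints observed.
mask = rng.random(hours.size) < 0.1
mask[[0, -1]] = False
observed = truth.copy()
observed[mask] = np.nan

rmse_mean = rmse(truth, mean_impute(observed), mask)
rmse_linear = rmse(truth, linear_impute(observed), mask)
print(f"mean imputation RMSE:      {rmse_mean:.2f}")
print(f"linear interpolation RMSE: {rmse_linear:.2f}")
```

Scoring only the hidden positions keeps the comparison fair, since every method leaves the observed values untouched. The same harness could in principle accept a regression, MLP, or LSTM imputer in place of the two functions shown here.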

DOI: doi.org/10.63721/25JPAIR0113
