Modeling the Impact of Emerging Pollutants on Aquatic Biodiversity Using Explainable Machine Learning
Abstract
Emerging pollutants such as microplastics, pharmaceuticals, per- and polyfluoroalkyl substances (PFAS), pesticides, and personal care product contaminants threaten aquatic biodiversity because of their persistence, their capacity to bioaccumulate, and their adverse environmental effects. This study investigates the influence of these pollutants on aquatic biodiversity using explainable machine learning (XML). It models the relationship between physicochemical parameters, emerging pollutant concentrations, and aquatic biodiversity indicators in order to identify the factors that affect ecosystem health. Data were collected at several freshwater monitoring sites between 2018 and 2025 and included water temperature, dissolved oxygen, species richness, the Shannon biodiversity index, and aquatic invertebrate abundance. After data pre-processing, feature engineering, and normalization, Random Forest, XGBoost, and LightGBM models were trained to predict biodiversity outcomes. To explain the predictions, SHAP (SHapley Additive exPlanations) was used to identify the factors with the greatest influence on biodiversity loss. Validation relied on descriptive statistics, the Pearson correlation coefficient, ANOVA, and k-fold cross-validation. Model performance was evaluated using R², MAE, and RMSE. XGBoost achieved the best predictive performance (R² > 0.94). The SHAP analysis indicated that microplastics, PFAS, water temperature, and dissolved oxygen were the main factors associated with biodiversity loss.
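For readers unfamiliar with the quantities named above, the following is a minimal sketch of how the Shannon biodiversity index and the three evaluation metrics (R², MAE, RMSE) can be computed. It uses only the Python standard library and is not the authors' code. The function names `shannon_index` and `regression_metrics` are hypothetical, the natural logarithm is assumed for the Shannon index, and the sample values are invented purely for illustration.

```python
import math


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    `counts` holds the number of individuals observed per taxon.
    The natural logarithm is an assumption; other bases are also used.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one individual")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def regression_metrics(y_true, y_pred):
    """Return R², MAE, and RMSE for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n              # mean absolute error
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    rmse = math.sqrt(ss_res / n)                       # root mean squared error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot                         # coefficient of determination

    return {"r2": r2, "mae": mae, "rmse": rmse}


if __name__ == "__main__":
    # Invented invertebrate counts per taxon at one hypothetical site.
    site_counts = [34, 12, 7, 3]
    print("Shannon index:", round(shannon_index(site_counts), 3))

    # Invented observed vs. predicted Shannon index values.
    observed = [1.9, 1.4, 2.3, 0.8, 1.1]
    predicted = [1.8, 1.5, 2.2, 0.9, 1.0]
    print(regression_metrics(observed, predicted))
```

In a k-fold setting these metrics would typically be computed on each held-out fold and then averaged, so that the reported R² reflects performance on data the model did not see during training.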