Predicting lake surface water phosphorus dynamics using process-guided machine learning

Paul C. Hanson; Aviah B. Stillman; Xiaowei Jia; Anuj Karpatne; Hilary A. Dugan; Cayelan C. Carey; Joseph Stachelek; Nicole K. Ward; Yu Zhang; Jordan S. Read; Vipin Kumar

doi:10.1016/j.ecolmodel.2020.109136

Computer Science and Engineering

Research output: Contribution to journal › Article › peer-review

53 Scopus citations

Abstract

Phosphorus (P) loading to lakes is degrading the quality and usability of water globally. Accurate predictions of lake P dynamics are needed to understand whole-ecosystem P budgets, as well as the consequences of changing lake P concentrations for water quality. However, complex biophysical processes within lakes, along with limited observational data, challenge our capacity to reproduce short-term lake dynamics needed for water quality predictions, as well as long-term dynamics needed to understand broad-scale controls over lake P. Here we use an emerging paradigm in modeling, process-guided machine learning (PGML), to produce a phosphorus budget for Lake Mendota (Wisconsin, USA) and to accurately predict epilimnetic phosphorus over a time range of days to decades. In our implementation of PGML, which we term a Process-Guided Recurrent Neural Network (PGRNN), we combine a process-based model for lake P with a recurrent neural network, and then constrain the predictions with ecological principles. We independently test the process-based model, the recurrent neural network, and the PGRNN to evaluate the overall approach. The process-based model accounted for most of the observed pattern in lake P; however, it missed the long-term trend in lake P and had the worst performance in predicting winter and summer P in surface waters. The root mean square error (RMSE) for the process-based model, the recurrent neural network, and the PGRNN was 33.0 μg P L⁻¹, 22.7 μg P L⁻¹, and 20.7 μg P L⁻¹, respectively. All models performed better during summer, with RMSE values for the three models (same order) equal to 14.3 μg P L⁻¹, 10.9 μg P L⁻¹, and 10.7 μg P L⁻¹. Although the PGRNN had only marginally better RMSE during summer, it had lower bias and reproduced long-term decreases in lake P missed by the other two models. For all seasons and all years, the recurrent neural network had better predictions than the process-based model alone, with RMSE of 23.8 μg P L⁻¹ and 28.0 μg P L⁻¹, respectively. The output of PGRNN indicated that new processes related to water temperature, thermal stratification, and long-term changes in external loads are needed to improve the process model. By using ecological knowledge, as well as the information content of complex data, PGML shows promise as a technique for accurate prediction in messy, real-world ecological dynamics, while providing valuable information that can improve our understanding of process.
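The abstract compares models by RMSE and describes constraining a neural network's predictions with ecological principles. The following is a minimal sketch of both ideas, not the authors' implementation. The function names, the `penalty_weight` parameter, and the choice of a non-negativity constraint on concentration are assumptions made for illustration; the abstract does not say which ecological principles the PGRNN enforces.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def process_guided_loss(observed, predicted, penalty_weight=1.0):
    """Hypothetical loss: data misfit plus a penalty for constraint violations.

    Assumed constraint (not taken from the paper): a predicted
    concentration should never be negative.
    """
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    # Only negative predictions contribute to the penalty term.
    violation = sum(max(0.0, -p) ** 2 for p in predicted) / n
    return mse + penalty_weight * violation


def relative_reduction(baseline, candidate):
    """Fractional decrease in error of `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline


# All-season RMSE values as reported in the abstract (ug P per L).
reported = {"process": 33.0, "rnn": 22.7, "pgrnn": 20.7}

if __name__ == "__main__":
    for name in ("rnn", "pgrnn"):
        gain = relative_reduction(reported["process"], reported[name])
        print(f"{name}: {gain:.1%} lower RMSE than the process-based model")
```

With the reported values, the PGRNN's all-season RMSE is roughly 37% lower than that of the process-based model. The penalty term illustrates the general pattern of adding a knowledge-based cost to a data-fitting loss, which is one common way to guide a learned model with process knowledge.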

Original language	English (US)
Article number	109136
Journal	Ecological Modelling
Volume	430
DOIs	https://doi.org/10.1016/j.ecolmodel.2020.109136
State	Published - Aug 15 2020

Bibliographical note

Funding Information:
We thank our CNH colleagues and our GLEON colleagues for valuable discussions of the ideas herein and Samantha Oliver for reviewing the manuscript. We are grateful to Y. Gil, who catalyzed the collaboration between ecologists and computer scientists. Two anonymous reviewers provided helpful criticisms. The NTL LTER (DEB-1440297) provided context and data for the study. Funding: The U.S. National Science Foundation provided funding through the CNH-Lakes project (ICER-1517823), DEB-1753639, OAC-1934633, and DEB-1753657.

Publisher Copyright:
© 2020 The Authors

Keywords

Lake
Lake Mendota
Long-term
Machine learning
Model
Phosphorus

Cite this

@article{c2404ea2a2234613bd571cddd43ef6a5,

title = "Predicting lake surface water phosphorus dynamics using process-guided machine learning",

abstract = "Phosphorus (P) loading to lakes is degrading the quality and usability of water globally. Accurate predictions of lake P dynamics are needed to understand whole-ecosystem P budgets, as well as the consequences of changing lake P concentrations for water quality. However, complex biophysical processes within lakes, along with limited observational data, challenge our capacity to reproduce short-term lake dynamics needed for water quality predictions, as well as long-term dynamics needed to understand broad scale controls over lake P. Here we use an emerging paradigm in modeling, process-guided machine learning (PGML), to produce a phosphorus budget for Lake Mendota (Wisconsin, USA) and to accurately predict epilimnetic phosphorus over a time range of days to decades. In our implementation of PGML, which we term a Process-Guided Recurrent Neural Network (PGRNN), we combine a process-based model for lake P with a recurrent neural network, and then constrain the predictions with ecological principles. We test independently the process-based model, the recurrent neural network, and the PGRNN to evaluate the overall approach. The process-based model accounted for most of the observed pattern in lake P; however it missed the long-term trend in lake P and had the worst performance in predicting winter and summer P in surface waters. The root mean square error (RMSE) for the process-based model, the recurrent neural network, and the PGRNN was 33.0 μg P L−1, 22.7 μg P L−1, and 20.7 μg P L−1, respectively. All models performed better during summer, with RMSE values for the three models (same order) equal to 14.3 μg P L−1, 10.9 μg P L−1, and 10.7 μg P L−1. Although the PGRNN had only marginally better RMSE during summer, it had lower bias and reproduced long-term decreases in lake P missed by the other two models. 
For all seasons and all years, the recurrent neural network had better predictions than process alone, with root mean square error (RMSE) of 23.8 μg P L−1 and 28.0 μg P L−1, respectively. The output of PGRNN indicated that new processes related to water temperature, thermal stratification, and long term changes in external loads are needed to improve the process model. By using ecological knowledge, as well as the information content of complex data, PGML shows promise as a technique for accurate prediction in messy, real-world ecological dynamics, while providing valuable information that can improve our understanding of process.",

keywords = "Lake, Lake Mendota, Long-term, Machine learning, Model, Phosphorus",

author = "Hanson, {Paul C.} and Stillman, {Aviah B.} and Xiaowei Jia and Anuj Karpatne and Dugan, {Hilary A.} and Carey, {Cayelan C.} and Joseph Stachelek and Ward, {Nicole K.} and Yu Zhang and Read, {Jordan S.} and Vipin Kumar",

note = "Funding Information: We thank our CNH colleagues and our GLEON colleagues for valuable discussions of the ideas herein and Samantha Oliver for reviewing the manuscript. We are grateful to Y. Gil, who catalyzed the collaboration between ecologists and computer scientists. Two anonymous reviewers provided helpful criticisms. The NTL LTER (DEB-1440297) provided context and data for the study. Funding: The U.S. National Science Foundation provided funding through the CNH-Lakes project (ICER-1517823), DEB-1753639, OAC-1934633, and DEB-1753657. Publisher Copyright: {\textcopyright} 2020 The Authors",

year = "2020",

month = aug,

day = "15",

doi = "10.1016/j.ecolmodel.2020.109136",

language = "English (US)",

volume = "430",

journal = "Ecological Modelling",

issn = "0304-3800",

publisher = "Elsevier",

}
