Contamination from anthropogenic activities is a long-standing challenge to the sustainability of groundwater resources. Physically based (PB) models are often used in groundwater risk assessments, but their application to large-scale problems requiring high spatial resolution remains computationally intractable. Machine learning (ML) models have emerged as an alternative to PB models in the era of big data, but the necessary number of observations may be impractical to obtain when events are rare, such as episodic groundwater contamination incidents. The current study employs metamodeling, a hybrid approach that combines the strengths of PB and ML models while addressing their respective limitations, to evaluate groundwater well vulnerability to contamination from unconventional oil and gas development (UD). We illustrate the approach in northeastern Pennsylvania, where intensive natural gas production from the Marcellus Shale overlaps with local community dependence on shallow aquifers. Metamodels were trained to classify vulnerability from predictors readily computable in a geographic information system. The trained metamodels exhibited high accuracy (average out-of-bag classification error <5%). A predictor combining information on topography, hydrology, and proximity to contaminant sources (inverse distance to nearest upgradient UD source) was found to be highly important for accurate metamodel predictions. Alongside violation reports and historical groundwater quality records, the predicted vulnerability provided critical insights for establishing the prevalence of UD contamination in 94 household wells that we sampled in 2018. While <10% of the sampled wells exhibited chemical signatures consistent with UD produced wastewaters, >60% were predicted to be in vulnerable locations, suggesting that future impacts are likely to occur with greater frequency if safeguards against contaminant releases are relaxed.
Our results show that hybrid physics-informed ML offers a robust and scalable framework for assessing groundwater contamination risks.
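As an illustration of the predictor named above (inverse distance to nearest upgradient UD source), the following is a minimal sketch and not the study's implementation. It assumes, purely for illustration, that wells and sources are given as planar coordinates in metres with a hydraulic head value, and that a source counts as "upgradient" when its head exceeds that of the well. The function name and the example values are hypothetical.

```python
import math


def inverse_distance_to_nearest_upgradient_source(well, sources):
    """Return 1/d to the closest source whose head exceeds the well's head.

    `well` and each entry of `sources` are (x, y, head) tuples in metres.
    Treating "upgradient" as "higher head than the well" is an assumption
    of this sketch. Returns 0.0 when no source lies upgradient.
    """
    wx, wy, well_head = well
    distances = [
        math.hypot(sx - wx, sy - wy)
        for sx, sy, source_head in sources
        if source_head > well_head  # keep upgradient sources only
    ]
    if not distances:
        return 0.0
    nearest = min(distances)
    # A source located exactly at the well gives an unbounded value.
    return 1.0 / nearest if nearest > 0 else math.inf


# Hypothetical example: one well and three candidate sources.
well = (0.0, 0.0, 300.0)
sources = [
    (300.0, 400.0, 320.0),  # upgradient, 500 m away
    (30.0, 40.0, 290.0),    # closer, but downgradient, so ignored
    (600.0, 800.0, 350.0),  # upgradient, 1000 m away
]
predictor = inverse_distance_to_nearest_upgradient_source(well, sources)
```

In this sketch the closest source is excluded because it lies downgradient, so the predictor reflects the 500 m source; a larger value indicates a nearer upgradient source.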
Bibliographical note
Funding Information:
This research was conducted as part of the Water and Energy Resources Study (https://medicine.yale.edu/water/). The project was developed under Assistance Agreement No. CR839249 awarded by the U.S. Environmental Protection Agency to Yale University. It has not been formally reviewed by EPA. The views expressed in this document are solely those of the authors and do not necessarily reflect those of the Agency. EPA does not endorse any products or commercial services mentioned in this publication. M A Soriano was also supported by the Yale Institute for Biospheric Studies Small Grants Program and the Geological Society of America Graduate Student Research Grants program. C J Clark was funded in part by the National Institute of Environmental Health Sciences under the National Institutes of Health [F31ES031441]. We thank the three anonymous reviewers for their thoughtful comments that led to improvements in this manuscript. We are also grateful to Dr Joshua Warren for initial insights on ML, and to Keli Sorrentino, Julie Plano, Livia Nock, Emma Ryan, Rebecca Brenneis, Nicholas Hoffman, and Nicolette Bugher for project assistance. We also thank the Yale Analytical and Stable Isotope Center and the Cary Institute of Ecosystem Studies for use of laboratory facilities, and the Yale Center for Research Computing for use of the high-performance computing infrastructure.
© 2021 The Author(s). Published by IOP Publishing Ltd
- Drinking water quality
- Groundwater contamination risk assessment
- Physics-informed machine learning
- Unconventional oil and gas development