Water Power June 2022

Spotlight

Yay-I

The challenge of forecasting streamflow is a perfect fit for machine learning. There is a wide range of possible inputs, a massive record of these inputs (big data at its finest!), and a set of complex, interconnected relationships between these inputs and the predicted output. And best of all, there’s a wealth of established science and lessons from past solutions to draw from and build upon. At first glance, this looks like an easy win for ML. However, it is important to keep in mind that not all solutions that employ machine learning are created equal. I’ll show that for a complex problem such as this, it’s possible to create a machine learning model that performs worse than simple solutions and conceptual models.

When AI Falls Short

There are existing, complex solutions to forecasting streamflow. Conceptual physical modeling, which means building calibrated equations that attempt to capture the relationships between variables (e.g. precipitation) and streamflow, has been applied to forecast flows at a variety of horizons for decades.

Conceptual models range in performance from woefully inadequate to pretty good. Their complexity is inherent to the problem itself. Streamflow depends on many different factors: the past winter’s snow, recent precipitation and temperature, upcoming precipitation and temperature, elevation and soil types across the basin, groundwater interaction, the distribution of land classification, and many (many!) others.

When the prior art, established science, and interdisciplinary nature of a problem are not sufficiently integrated into how machine learning is applied, the solution will either underperform immediately or, eventually, fail catastrophically. In practice, this means that a model is given only a partial set of the inputs that influence the forecast, is trained on a historical period that causes “overfitting,” or is otherwise structured in a way that violates the nature of the problem (for streamflow forecasting, the laws of physics).

In the case of forecasting streamflow, a hypothetical solution might take precipitation and temperature as inputs, some in situ flow measurements as observations, and an off-the-shelf machine learning model to tie it all together. Training and evaluation look pretty good! The solution-builder presents the result and moves forward to place it into a decision-making workflow.

But hold up, we’re savvy about hydrology. We know that more than precipitation and temperature drives streamflow. Can you think of when and why this approach will fail, catastrophically?

The first failure would be acute: the first big storm (more water) or intense drought (less water) that falls outside the range of the data used to train the model would produce nonsensical or incorrect predictions. Depending on how that model is used, that could have critical consequences, like forcing an operational team to scramble due to a missed forecast.

The second failure would be gradual, but no less severe. Long-term nonstationarity (i.e. the gradual drift of the relationship between input and output), in our case climate and landscape change, would cause the model to slowly deviate. This discrepancy might be almost imperceptible at first, but it would grow over time and be no less consequential than the acute failure, especially as extreme weather events are projected to increase.

[Figure: All sites, WY 2021, all lead times. Axis ticks run from -0.2 to 1.0. Model types: Statistical, Conceptual, Theory-guided statistical, Unknown. Regions: West, Northwest, Alabama, Southeast, Québec.]
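The acute failure can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the `true_runoff` rule is a toy stand-in for the physics, the training record is synthetic, and a hand-rolled nearest-neighbour averager (`knn_predict`) stands in for an off-the-shelf model. It is not how any real forecasting system is built.

```python
import random


def true_runoff(precip_mm: float) -> float:
    """Toy 'physics' (assumed): soil absorbs the first 20 mm, the rest runs off."""
    return max(0.0, precip_mm - 20.0)


def knn_predict(train, precip_mm, k=5):
    """Average the runoff of the k training storms closest in precipitation."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - precip_mm))[:k]
    return sum(runoff for _, runoff in nearest) / len(nearest)


random.seed(0)

# Synthetic training record: no storm larger than 60 mm was ever observed.
storms = [random.uniform(0.0, 60.0) for _ in range(500)]
train = [(p, true_runoff(p)) for p in storms]

in_range_storm = 40.0   # similar to storms in the training record
extreme_storm = 150.0   # far outside anything the model has seen

in_range_error = abs(knn_predict(train, in_range_storm) - true_runoff(in_range_storm))
extreme_error = abs(knn_predict(train, extreme_storm) - true_runoff(extreme_storm))

print(f"error on a 40 mm storm:  {in_range_error:.2f} mm")
print(f"error on a 150 mm storm: {extreme_error:.2f} mm")
```

The model looks excellent on the 40 mm storm, because it has many similar storms to average. On the 150 mm storm it can only return something close to the largest runoff in its training record, so it misses badly. Evaluation on held-out data from the same range would not reveal the problem.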

A Better AI: Theory-guided Machine Learning

Thankfully, there is a better way to employ ML to predict natural systems: an approach that respects prior solutions and structurally integrates their wisdom. And I posit that this approach is key to the application of ML and AI to any complex problem. Our friend Curt Jawdy of the Tennessee Valley Authority echoed this in his excellent reflections on the competition, asking “how will AI and conceptual models hybridize to provide a best-of-both approach?” At Upstream Tech we call this a theory-guided machine learning approach, and it’s the beating heart of HydroForecast. We:

1. Leverage expertise in meteorology and hydrology from our team and partners to inform how we select inputs

2. Meet with our customers to understand and incorporate their wisdom of the river(s) they work on

3. Build upon physical modeling approaches to inform how we train our models, evaluate our results, and make iterative improvements

How do we know this approach to applying machine learning makes HydroForecast stronger? The Forecast Rodeo participants included veteran forecasting teams at utilities like Tennessee Valley Authority and Hydro-Québec, governmental forecasts from agencies like NOAA’s National Weather Service River Forecast Centers, private vendors including Upstream Tech and Sapere, and, at some locations, public participants.

We were not the only ones who submitted AI forecasts! However, one approach to applying machine learning (ours) performed vastly better than the others. It’s likely that the other ML participants did not sufficiently incorporate hydrological theory. And we can speculate that those models’ performance would worsen over time as more extreme events occur and climate patterns and landscapes shift. In contrast, HydroForecast’s theory-guided machine learning design makes it the best forecasting model currently available, and it will continue to improve and perform in the years and decades to come.

It bears repeating: solutions that employ machine learning are not created equal. Often, there are simpler tools to use when devising solutions. And when machine learning is a good fit for the job, it is best applied with respect for the science and with the (human!) experts at the table. ●

Above: In each of the competition’s geographic regions, the best performance came from HydroForecast’s theory-guided machine learning approach. On average, machine learning models that were not theory-guided (“Statistical” above) underperformed conceptual models.
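The figure’s accuracy axis is labelled “Average NSE”. Assuming NSE here means the Nash–Sutcliffe efficiency, the skill score commonly used in hydrology, a minimal sketch of the calculation follows. The function name and the flow values are made up for illustration, and how the competition averaged scores across sites and lead times is not stated here.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("observed and simulated must be non-empty and equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    obs_variance = sum((o - mean_obs) ** 2 for o in observed)
    if obs_variance == 0:
        raise ValueError("NSE is undefined when observed flows are constant")
    return 1.0 - squared_error / obs_variance


# Hypothetical daily flows, purely illustrative.
observed = [10.0, 12.0, 30.0, 22.0, 15.0]
mean_flow = sum(observed) / len(observed)

perfect_score = nash_sutcliffe_efficiency(observed, observed)
mean_only_score = nash_sutcliffe_efficiency(observed, [mean_flow] * len(observed))
poor_score = nash_sutcliffe_efficiency(observed, [30.0, 30.0, 10.0, 10.0, 30.0])

print(perfect_score, mean_only_score, poor_score)
```

A perfect forecast scores 1, a forecast that always predicts the observed mean scores 0, and anything worse than that goes negative. Under this reading, a model near or below zero on the figure’s axis adds little over simply guessing the long-term average flow.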

More information

To learn more about HydroForecast, reach out at team@hydroforecast.com. Do you have a problem where you are considering machine learning as a tool for the job? I want to hear about it! Email me at: marshall@upstream.tech

www.waterpowermagazine.com | June 2022

[Figure axis label: Average NSE; higher values are more accurate]
