The photos above were found by doing a Google image search for ‘hydrologist’. Apparently our image is that of scientists who get to be outside a lot. We all know that the knowledge we gain from fieldwork gets codified in hydrological models, which can be written in all sorts of programming languages.
“I wonder what this analysis would look like using that other group’s hydrological model…”
This used to be either a thought quickly discarded by hydrologists, or something to (get an MSc or PhD student to) work on for months. We believe that running each other’s models shouldn’t be such a pain. Yet we also believe that simply shouting that other researchers should make their models easier for us to run won’t work.
Therefore, in the eWaterCycle project we set out to build a system that can generate ‘nice hydrographs for everyone’. Halfway into this four-year project, our team of hydrologists and Research Software Engineers has built a system that allows registered users to run each other’s models without having to dive into each other’s code. We also made a software stack for pre-processing climate forcing data that is ‘FAIR by Design’ and lets you pre-process forcing data from different climatological datasets in a transparent and easy manner. Below (and at the upcoming EGU General Assembly in May in Vienna!) we’ll explain what that means for you!
Recap: what is eWaterCycle II about
The goal we have for the eWaterCycle II project is to provide the hydrological community with tools that:
- Allow the use of a wide variety of models, by different research groups, written in different programming languages, without having to learn those languages.
- Run models needing large amounts of memory and CPUs.
- Have access to all the relevant datasets from the community (forcing, observations).
- Allow advanced use cases such as data assimilation and model coupling studies.
- Allow the sharing of models with the entire community, both for citing (DOIs) and re-use.
In 2018 we showed a Minimum Viable Product (MVP) of the system we want to build. This MVP was intended to show the hydrological community a working version of what we wanted to make available to everyone, without building the entire system. We used the MVP to get feedback from hydrologists on what they wanted, what they would need and what they would use in a system like eWaterCycle.
For hydrologists: a model comparison study using eWaterCycle leads to better pre-processing tools
In eWaterCycle II we are building a ‘community environment for hydrological computational science’. ‘Community’ is there for a reason: we want to build a system that benefits the hydrological community by making it easier to do our work. After the MVP of 2018, we expanded the system to be able to work with multiple users. Subsequently, we invited community members to a week-long workshop to add their models to eWaterCycle and do a large comparison study together.
During the workshop week, hydrologists and research software engineers (RSEs) worked together to add hydrological models based on different concepts (conceptual, semi-distributed and distributed) and written in different programming languages (Python, Fortran, MATLAB) to our system. The scientific question we set out to answer during the workshop was: “what is the impact for the hydrological community of the new ERA5 forcing dataset from ECMWF?”. And here we hit a snag. At the end of the week we had added nearly all models to our system, yet we had not pre-processed ERA5 for any of these models.
It turned out that pre-processing of forcing data is currently mostly done using model-specific scripts that are certainly not FAIR (Findable, Accessible, Interoperable and Re-usable). Since eWaterCycle II is a ‘FAIR by Design’ system, we could not let this stand and used most of the remaining time of 2019 to build a generic toolkit that allows hydrologists to pre-process climate forcing data in a FAIR way. We based our solution on ESMValTool, a post-processing toolkit for climate model output, which we repurposed to turn climate data into forcing data for hydrological models. Hydrologists now only need to provide a ‘recipe’ to be able to pre-process ERA5 or ERA-Interim data into model-specific inputs. For the models in our comparison study we already provide template recipes.
For Research Software Engineers: a stack of re-usable technology
eWaterCycle II is a flagship project of the Netherlands eScience Center together with Delft University of Technology. Projects run by the eScience Center differ from those funded by traditional funders (like the Dutch NWO or the American NSF) in that the bulk of the budget is provided as ‘in kind’ hours of their research software engineers (RSEs). These RSEs are tasked not only with providing solutions for scientists within the projects that they work on, but also with building these solutions as generically as possible, to facilitate re-use in other projects and domains. This means that most of the specific technology developed within eWaterCycle should be usable both within and outside of the project. Three prime examples of such re-usable technology are ERA5CLI, GRPC4BMI and our contributions to ESMValTool.
As part of the pre-processing workflow we built in 2019, we needed an automated method to download the ERA5 dataset from ECMWF. Currently, ERA5 can only be downloaded through the Climate Data Store (CDS) website, which requires manually entering information on which parts of the data a user wants to download, or through the programming interface, which requires a user to write a script to download the data. We built ERA5CLI, a command line interface to automatically download ERA5 data. This was received enthusiastically by the community, and even ECMWF themselves advocate the use of ERA5CLI for advanced users of the CDS.
eWaterCycle aims to bring together a community of hydrological modellers who use different programming languages for their specific models. We needed a method to allow scientists to interface with each other’s models without having to learn (another) programming language. The Basic Model Interface (BMI) developed at CSDMS already provides a generic way to interface with models, and it is defined for different programming languages. It does not, however, allow automatic translation of calls from one language into another. To this end we combined gRPC, container technology and BMI into GRPC4BMI, an inter-language model coupler that allows a user in a Jupyter notebook to treat a model in any programming language as a simple object they can call, without being confronted with the language that the model is written in. While GRPC4BMI is used in eWaterCycle to interface with different models, it can also be used to create one big (earth system) model built up from different sub-models. Large modelling communities such as NOAA and NCAR have expressed interest in using GRPC4BMI in their own workflows. We will follow up on these collaborations in 2020.
Pre-processing of climate data to generate input for hydrological models usually includes operations such as parameter selection, region selection, re-gridding (interpolation / aggregation) and even some derivations. When we looked into building a pre-processing tool for hydrological models, we realized that such a tool already exists in the climate sciences: ESMValTool. We contributed to ESMValTool by adding hydrological concepts, such as the capability to select a catchment from a dataset using the shape of the catchment. This allows hydrologists to use this powerful toolset to pre-process their input data in a FAIR manner. In 2020 we will be providing tutorials for hydrologists to help them get started with ESMValTool within eWaterCycle II.
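To make the ‘model as a simple object’ idea concrete, here is a minimal sketch in Python. It is not the GRPC4BMI API: `BucketModel` is a hypothetical toy linear-reservoir model, it implements only a handful of BMI-style method names (`initialize`, `update`, `get_current_time`, `get_end_time`, `get_value`, `finalize`), and it takes a dictionary instead of a configuration file to stay self-contained. The point is that the driver function `run_hydrograph` (also hypothetical) only ever talks to the interface, so it would not change if the object behind it were a proxy for a Fortran or MATLAB model running in a container.

```python
class BucketModel:
    """Toy linear reservoir exposing a small subset of BMI-style methods.

    Assumption: simplified signatures. The real BMI passes a config file
    path to initialize() and copies values into a caller-supplied array.
    """

    def initialize(self, config):
        self._storage = config["initial_storage"]       # water in the bucket
        self._k = config["recession_coefficient"]       # fraction released per step
        self._precip = list(config["precipitation"])    # one value per time step
        self._time = 0
        self._discharge = 0.0

    def update(self):
        # Add this step's rain, then release a fixed fraction as discharge.
        self._storage += self._precip[self._time]
        self._discharge = self._k * self._storage
        self._storage -= self._discharge
        self._time += 1

    def get_current_time(self):
        return self._time

    def get_end_time(self):
        return len(self._precip)

    def get_value(self, name):
        if name == "discharge":
            return self._discharge
        if name == "storage":
            return self._storage
        raise KeyError(name)

    def finalize(self):
        self._precip = []


def run_hydrograph(model, config, variable="discharge"):
    """Drive any object with the methods above; no model internals needed."""
    model.initialize(config)
    series = []
    while model.get_current_time() < model.get_end_time():
        model.update()
        series.append(model.get_value(variable))
    model.finalize()
    return series


example_config = {
    "initial_storage": 0.0,
    "recession_coefficient": 0.5,
    "precipitation": [10.0, 0.0, 0.0],
}
hydrograph = run_hydrograph(BucketModel(), example_config)
```

Swapping in another model then means passing a different object to `run_hydrograph`, which is the property that makes comparison studies across research groups practical.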
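For readers unfamiliar with these operations, the sketch below illustrates three of them on a tiny grid in plain Python: region selection by bounding box, aggregation to a coarser grid, and averaging over a catchment mask. This is not ESMValTool code and does not use its API; the function names (`select_region`, `aggregate`, `masked_mean`) and the 4 x 4 example grid are made up for illustration, and the averages are unweighted, whereas a real pre-processor would account for grid-cell area.

```python
def select_region(grid, lats, lons, lat_bounds, lon_bounds):
    """Keep the cells whose centre lies inside a lat/lon bounding box."""
    rows = [i for i, lat in enumerate(lats) if lat_bounds[0] <= lat <= lat_bounds[1]]
    cols = [j for j, lon in enumerate(lons) if lon_bounds[0] <= lon <= lon_bounds[1]]
    return [[grid[i][j] for j in cols] for i in rows]


def aggregate(grid, factor):
    """Coarsen a grid by averaging non-overlapping factor x factor blocks."""
    coarse = []
    for bi in range(len(grid) // factor):
        row = []
        for bj in range(len(grid[0]) // factor):
            block = [
                grid[bi * factor + di][bj * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


def masked_mean(grid, mask):
    """Average the cells flagged True in mask (a stand-in for a catchment shape)."""
    values = [
        value
        for grid_row, mask_row in zip(grid, mask)
        for value, inside in zip(grid_row, mask_row)
        if inside
    ]
    return sum(values) / len(values)


# Hypothetical 4 x 4 precipitation field with cell-centre coordinates.
lats = [50.0, 51.0, 52.0, 53.0]
lons = [4.0, 5.0, 6.0, 7.0]
precip = [[float(i * 4 + j) for j in range(4)] for i in range(4)]

region = select_region(precip, lats, lons, (50.0, 51.0), (4.0, 7.0))
coarse = aggregate(region, 2)
catchment_mean = masked_mean(coarse, [[True, False]])
```

In a recipe-based workflow these steps are declared rather than scripted, which is what makes the resulting forcing data reproducible by others.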
Looking forward, in 2020 the eWaterCycle team will focus on finishing (and publishing) the comparison study. After that we intend to open the system to the hydrological community for them to do their research in. We also plan to develop specific tools that help academics who teach hydrology use eWaterCycle II in their courses. Keep an eye on eWaterCycle presentations at the General Assembly to learn the latest!
Edited by Maria-Helena Ramos