Scalable, high-performance data validation in hydrometeorological applications

The quality assurance of hydrometeorological data has been a focus of meteorologists and hydrologists for decades, in some cases for centuries. Until now, the validation process has been driven by human experts, who today rely on software tools for data visualization and editing. With the growing volume of higher-resolution, higher-frequency data in recent years, the community is undergoing a paradigm shift towards semi-automatic and automatic data validation and estimation techniques. In this setting, hydromet experts act as supervisors of the process rather than as an in-the-loop resource within it.

Large-scale data validation imposes a number of requirements on the supporting software framework, of which scalability and performance are among the most prominent. A state-of-the-art software architecture derived from these requirements should address them for hydromet use cases. In this context, the central challenge is to handle heterogeneous data validation rules together with their temporal and spatial reference.
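As a rough illustration of what the temporal and spatial reference of a rule can mean from a software perspective, the following Python sketch classifies a rule by how much data it must see: a single value, a time window at one station, several stations at one time, or both. All names here (`RuleScope`, `ValidationRule`, `time_window_steps`, `neighbor_stations`, and the example rules) are hypothetical and are not taken from the framework described in this work; the sketch only assumes that the scope of a rule determines how its evaluation can be split up for parallel processing.

```python
from dataclasses import dataclass
from enum import Enum


class RuleScope(Enum):
    """Hypothetical classification of a rule by the data it needs to see."""
    POINT = "point"                    # one value at one station and time
    TEMPORAL = "temporal"              # a time window at one station
    SPATIAL = "spatial"                # several stations at one time
    SPATIOTEMPORAL = "spatiotemporal"  # several stations over a time window


@dataclass(frozen=True)
class ValidationRule:
    """Hypothetical rule descriptor; it records only the rule's data footprint."""
    name: str
    time_window_steps: int  # time steps the rule reads (1 = current value only)
    neighbor_stations: int  # other stations the rule reads (0 = none)

    def __post_init__(self) -> None:
        if self.time_window_steps < 1 or self.neighbor_stations < 0:
            raise ValueError("invalid rule footprint")

    @property
    def scope(self) -> RuleScope:
        temporal = self.time_window_steps > 1
        spatial = self.neighbor_stations > 0
        if temporal and spatial:
            return RuleScope.SPATIOTEMPORAL
        if temporal:
            return RuleScope.TEMPORAL
        if spatial:
            return RuleScope.SPATIAL
        return RuleScope.POINT

    @property
    def partition_key(self) -> str:
        """Dimension along which the data can be split for independent evaluation."""
        return {
            RuleScope.POINT: "any",
            RuleScope.TEMPORAL: "station",
            RuleScope.SPATIAL: "time",
            RuleScope.SPATIOTEMPORAL: "overlapping station-time blocks",
        }[self.scope]


# Example rules (illustrative only): a range check, a step check between
# consecutive values, a neighbor consistency check, and a windowed neighbor check.
rules = [
    ValidationRule("range_check", 1, 0),
    ValidationRule("step_check", 2, 0),
    ValidationRule("neighbor_consistency", 1, 4),
    ValidationRule("windowed_neighbor_check", 24, 4),
]

for rule in rules:
    print(f"{rule.name}: {rule.scope.value}, partition by {rule.partition_key}")
```

Under these assumptions, point rules can run on any partition of the data, temporal rules on data partitioned by station, and spatial rules on data partitioned by time, while spatiotemporal rules need partitions that overlap in both dimensions, which makes them the hardest case to distribute.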

OBJECTIVES

  • Develop a conceptual framework for the classification and assessment of hydrometeorological data validation rules from a software perspective
  • Design a scalable, high-performance software architecture for the large-scale validation of hydrometeorological data
  • Implement a proof-of-concept of the software design above and discuss its properties by performance metrics