Data validation in national and international context: where are we, and where are we going?

Presenter(s): Mark van der Loo, CBS

e-mail(s): mpj.vanderloo@cbs.nl

Date: 16 March 2021

Webinar aims

Checking the quality of data is a task that pervades statistical production. It does not matter whether we are working with raw data, cleaned data, or the results of an analysis: it is always important to convince ourselves that the data we are using are fit for their intended purpose.
Since it is such a pervasive task, it makes sense to understand, standardize, and automate data validation as much as possible so results can be compared over time, across statistics, and across organizations.
The aim of this webinar is to inform participants about the state of the art in systematic data validation, from both a national and an international (ESS) perspective. We will touch upon aspects related to business, methodology, and implementation, including the ‘validate’ R package. The webinar aims to be both practical and interactive, with real examples from statistical production and a live online quiz.

Webinar learning outcomes

Participants will be informed about the state of the art in data validation, both in the context of national statistical offices and in the context of data exchange between NSIs and Eurostat. After this webinar, participants will have a better grasp of why data validation is important, and of why and how it is standardized within the European Statistical System. Participants will also learn about some of the available tooling, which allows them to start improving data validation procedures in their own situation immediately.

Webinar content

  • Data validation: why and how?

  • Principles for data validation in statistical production systems.

  • Data validation in the context of the European Statistical System.

  • Data validation with R: the ‘validate’ package.

  • Analyzing, visualizing, and reporting results.

Difficulty level


Prerequisites for the webinar

The webinar is aimed at data professionals and their managers who want to learn about or get started with systematic and automated data validation. There are no formal prerequisites, apart from having some professional experience with managing data.

Further reading and resources

Getting started with data validation:
–  The data validation cookbook
–  The ESS handbook on methodology for data validation
–  The CROS portal website on data validation

Van der Loo, MPJ, and E de Jonge. “Data Validation.” Wiley StatsRef: Statistics Reference Online (2014): 1-7. DOI: 10.1002/9781118445112 (ArXiv preprint)

