RDF Data Quality Assessment – connecting the pieces

RDF and graph databases are steadily gaining adoption and are no longer niche choices confined to specialist communities. For almost 20 years, a constraint language for RDF was a major missing piece in the technology stack and a barrier to further adoption.

Even though most RDF-based systems performed data validation and quality assessment, there was no standardized way to define constraints. Instead, people relied on ad-hoc solutions, or on schemas and languages that were never designed for validation.

Thankfully, since 2017 there have been two additions to the RDF technology stack: SHACL and ShEx. Both provide a high-level RDF constraint language for defining data constraints (a.k.a. shapes), and each has different strengths.

This talk will outline the different types of RDF data quality issues and the existing approaches to quality assessment. The goal is to give an overview of the current RDF validation landscape and, hopefully, to inspire people to improve their RDF publishing workflows.

Dimitris Kontokostas
Senior Data & Knowledge Engineer, GeoPhy

Dimitris Kontokostas is a senior data & knowledge engineer at GeoPhy, where he designs and implements solutions for data quality and large-scale knowledge graphs. Dimitris holds a PhD (Dr.-Ing.) from Leipzig University (AKSW/KILT Group) on “Large-scale knowledge extraction, publishing and quality assessment”, which he completed magna cum laude. He served as head of technical developments at DBpedia for three years, created RDFUnit, a unit testing framework for RDF with high industry uptake, is heavily involved in data quality standardization activities (e.g. SHACL, ShEx), and is a co-author of the “Validating RDF data” book.


This is what Connected Data London 2018 brought to the fore.