Links form the backbone of Tim Berners-Lee’s vision for the Semantic Web and Open Data, and for good reason. Modelling Open Data with Linked Data standards makes disambiguation easier through the use of URIs. This in turn makes it easier to integrate with other Open Data sets, improves data quality, eases interoperability and gives users a common set of standards. End users benefit in several ways: the content of a dataset is easier to understand, and new, related data is easier to discover. The five-star Open Data scheme describes the differing levels of Linked Open Data. It places the ‘gold standard’ 5-star level at modelling your content as Linked Open Data (LOD), with the next-best 4-star level being the use of the Resource Description Framework (RDF). The more proprietary the format, the fewer stars are awarded.

The Open Data Institute (ODI) have recently done much work on maturity benchmarking, applying conventional Data Management benchmarking techniques to the content of Open Data authors. This has led to the creation of a pathway tool that encourages users to self-assess their maturity. It does, however, raise several questions. How mature are most authors of Open Data currently? Do authors face a significant barrier, and a real trade-off, when choosing between publishing data ‘as is’ and transforming it into LOD? Are there business benefits to managing Open Data internally as LOD?

Looking at one of the world’s largest Open Data portals, it’s clear that the maturity stack currently looks quite different, with only a very small proportion of datasets reaching the 5-star LOD standard: less than 0.5% of all datasets on the portal are modelled as LOD. Is this surprising? Considering the landscape of typical authors on the platform, perhaps not. These are typically councils, local authorities and government departments, amongst others. Each is likely to have highly complex data architectures and processes, so providing the data in an open format at all is no mean feat. Moreover, for organisations focused mainly on meeting their Freedom of Information (FOI) request obligations, ‘triplifying’ existing datasets is unlikely to be the highest priority.

Getting started with making your Open Data sets connected

Where, in practical terms, should an Open Data author interested in progressing through these maturity stages start? Is it better to begin by looking at a scalable platform for storage, data management and publishing? Or is it better to start by hiring skilled practitioners with experience of working with Linked Data? Should those interested start with a small pilot group of datasets, or scope a larger implementation plan that covers all the key components of turning Open Data into Linked Open Data?

