Dow Jones is Reimagining the News as a Knowledge Graph with Stardog

Bringing information to the world’s top organizations requires data integration. Data integration requires knowledge graphs at scale, and Stardog 7.0 is here to bring this to the enterprise.

For most people, the words “Dow Jones” evoke images of Wall Street. This is where Dow Jones got started in 1882, in an obscure Wall Street basement, and this is where it grew to become a worldwide news and information powerhouse as publisher of The Wall Street Journal, Barron’s, MarketWatch and data products such as Factiva.

Dow Jones has always been all about bringing information to the world’s top organizations. You don’t make it that far by not keeping up with the times, and Dow Jones has not just been keeping up, but leading: both in the race for news and data reporting, and in comprehensive, market-leading content and data integrated directly into customer workflows, platforms and applications.

Dow Jones has been reimagining what the news looks like, and AI is at the core. AI is transforming the financial media industry, impacting everything from content creation to consumption trends. A knowledge graph-powered platform enables Dow Jones to unify structured and unstructured data from a vast range of news sources and deliver cutting-edge insights for customers and partners globally.

Knowledge graphs for data unification

The questions the enterprise must answer are not neatly organized by data type or location. In fact, it’s entirely the opposite; the majority of data-driven initiatives require access to a wide breadth of data, which is usually spread across the organization, stored in all shapes and sizes, moving at every speed.

This is where traditional data management solutions have not kept pace with the needs of the business. A never-ending parade of copies of data, growing infrastructure costs for heavyweight partial solutions like data warehouses, and data silos that isolate relevant parts of the data landscape and make them difficult or impossible to access all limit the ability of an enterprise like Dow Jones to innovate based on data.

Increasingly, enterprises look to knowledge graphs to address these challenges. Knowledge graphs are getting a lot of hype these days, especially after being featured by Gartner as an emerging technology in 2018, and as a Top 10 Data and Analytics Technology Trend for 2019.

However, they’re not a new technology. Knowledge graphs excel in data integration, something Stardog’s founding team knows from a decade of experience. Born out of the University of Maryland AI and Computational Sciences Lab, Stardog provides a data unification platform, built on knowledge graphs, that aims to address the fundamental challenges enterprises face as they move forward with their mission-critical data initiatives.

Stardog for knowledge graphs in the enterprise

Knowledge graphs take to heart the value of flexibility that helped give rise to the NoSQL “Cambrian Explosion,” which occurred in part because of the rigidity of the legacy relational landscape. But they also embrace the value provided by a schema.

Data models, or schemas, are crucial. They provide significant value in understanding the data they help represent. While in relational systems, the connection between the schema itself and the meaning of the data is primarily notional, in a knowledge graph, that meaning, or context, is crucial, and in fact, is central to how they operate.

And unlike relational systems, knowledge graphs understand that there are multiple points of view: different parts of the business have different objects and will need to look at and analyze data from different perspectives. A single data model, especially in a world where data is increasingly varied, does not work at scale when enterprises are attempting to leverage their data.

Stardog calls the ability to support different lenses into the data, which highlight and augment the parts of the knowledge graph that are most important for each user and use case, schema multi-tenancy, and has made this a first-class feature in its latest version, Stardog 7.

With schema multi-tenancy, organizations can concurrently support any number of these lenses, or viewpoints, onto their data. And because of the inherent flexibility of the graph data model, these schemas can be built, and optimized, to accommodate all forms of data, irrespective of where it lives, what it looks like or how fast it moves.
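To make the idea of lenses concrete, here is a minimal sketch in plain Python. It is a toy illustration only and does not use or represent Stardog's API: the graph contents, the lens names ("newsroom", "markets") and the view function are all hypothetical. It shows one shared set of facts being read through two different schemas at the same time.

```python
# Toy illustration only: this is NOT Stardog's API. All names are hypothetical.

# One shared graph, stored as (subject, predicate, object) triples.
GRAPH = [
    ("article:1", "headline", "Markets rally"),
    ("article:1", "mentions", "company:acme"),
    ("article:1", "published", "2019-05-01"),
    ("company:acme", "name", "Acme Corp"),
    ("company:acme", "ticker", "ACME"),
]

# Each lens maps the field names one team uses to predicates in the shared graph.
LENSES = {
    "newsroom": {"title": "headline", "date": "published"},
    "markets": {"symbol": "ticker", "company_name": "name"},
}


def view(subject, lens_name):
    """Return the subject as seen through one lens: only the mapped fields."""
    lens = LENSES[lens_name]
    record = {}
    for field, predicate in lens.items():
        for s, p, o in GRAPH:
            if s == subject and p == predicate:
                record[field] = o
    return record
```

Both lenses read the same underlying triples, so adding a third viewpoint in this sketch means adding one more mapping rather than copying or reshaping the data.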

Part of what Stardog is trying to achieve is to provide a platform for true enterprise-scale data unification, which, in its founder’s words, will enable organizations to “act as if the walls of the silos do not exist” and will serve as the infrastructure for any and all AI and ML initiatives. The knowledge graph built from unifying the silos of an organization lets them act as if there is a single “go to” place for all of their data needs.

Stardog really means it when they say you can act as if the walls of the silos are not there. Debuting in version 7 is what they call Virtual Transparency, which hides from the user, whether that’s an application developer, a business analyst, a data scientist, or anyone else who needs data, the unnecessary details of where the data is or how it’s stored. Data consumers simply write a logical query, e.g. “get all purchases made in the last 30 days in the Mid-Atlantic, within the Electronics department, for more than $100,” and Stardog’s platform handles the details of accessing the data where it resides and returning a single, coherent and, most importantly, complete result.
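The purchases query above can be sketched in a few lines of Python to show what "one logical query, many sources" means in practice. This is a toy under stated assumptions, not Stardog's implementation or API: the two silos (an in-memory SQLite table and a list of records standing in for a second system), the sample rows, and the function names make_warehouse and recent_purchases are all hypothetical.

```python
import sqlite3
from datetime import date, timedelta

# Toy illustration only: NOT Stardog's API. Sources, rows and names are hypothetical.


def make_warehouse():
    """Silo 1: a relational table of purchases (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases "
        "(id TEXT, region TEXT, department TEXT, amount REAL, purchased_on TEXT)"
    )
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?, ?, ?, ?)",
        [
            ("w1", "mid-atlantic", "Electronics", 250.0, "2019-06-20"),
            ("w2", "mid-atlantic", "Electronics", 40.0, "2019-06-21"),
            ("w3", "mid-atlantic", "Electronics", 500.0, "2019-03-01"),
        ],
    )
    return conn


# Silo 2: records from some other system, held as plain dictionaries.
STORE_FEED = [
    {"id": "s1", "region": "mid-atlantic", "department": "Electronics",
     "amount": 120.0, "purchased_on": "2019-06-10"},
    {"id": "s2", "region": "west", "department": "Electronics",
     "amount": 900.0, "purchased_on": "2019-06-15"},
]


def recent_purchases(conn, store_feed, region, department, min_amount, today, days=30):
    """Answer one logical question across both silos and merge the results."""
    cutoff = (today - timedelta(days=days)).isoformat()  # ISO dates sort as text

    # Push the filter down to the relational source.
    rows = conn.execute(
        "SELECT id, amount FROM purchases "
        "WHERE region = ? AND department = ? AND amount > ? AND purchased_on >= ?",
        (region, department, min_amount, cutoff),
    ).fetchall()
    results = [{"id": r[0], "amount": r[1], "source": "warehouse"} for r in rows]

    # Apply the same filter to the second source in Python.
    for rec in store_feed:
        if (rec["region"] == region and rec["department"] == department
                and rec["amount"] > min_amount and rec["purchased_on"] >= cutoff):
            results.append({"id": rec["id"], "amount": rec["amount"], "source": "store_feed"})

    return sorted(results, key=lambda r: r["id"])
```

The caller passes only the logical criteria (region, department, amount, time window) and never names a table or a file. In this sketch the routing to each source is hard-coded inside recent_purchases; a data unification platform aims to do that routing for arbitrary queries.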

Performance to match enterprise requirements

Both of these capabilities have helped Dow Jones tremendously. And while Dow Jones may be among the most extreme examples of multiple data sources, schemas, and locations, it certainly isn’t the only one.

Accessing data irrespective of type, format, or location is important, but it’s almost as important that the performance is there to match enterprise requirements. Dow Jones is developing a platform to connect relevant news and data to their customers’ workflows and data lakes. Because of the sheer volume and variety of data they deal with, improved write performance is highly valuable.

Stardog responded by equipping Stardog 7 with a new low-level storage engine, which has been in development for over two years. The new storage engine is a key part of Stardog 7, and Stardog says it typically delivers a 10x to 20x improvement for writes.

For example, on a database with 1 billion nodes and edges and 10 concurrent transactions, Stardog 7 is 18 times faster than Stardog 6. In addition, even when a database is being updated with very large transactions that might run for hours, smaller transactions are not blocked and will complete in milliseconds.

Dow Jones: Reimagining the News as a Knowledge Graph from Connected Data London

Further, Stardog 7 was designed specifically with Stardog’s HA cluster in mind. It can handle many simultaneous clients all writing at once without any of them blocking one another, yielding a significant performance boost over previous versions.

This new generation of knowledge graph capabilities does not end there. Stardog also offers a wide range of options for modeling and querying knowledge graphs (RDF, OWL, SHACL, SPARQL, property graphs, Gremlin, and GraphQL), APIs in multiple programming environments (Java, JavaScript, Spring, .NET, Clojure, and Groovy), support for a number of graph algorithms, machine learning, and a visual environment out of the box.

To show how all of that has come together for Dow Jones, Clancy Childs, Dow Jones Knowledge Enablement General Manager, and Mike Grove, Stardog Vice President of Engineering & Co-Founder, shared how Dow Jones is Reimagining the News as a Knowledge Graph with Stardog.