Knowledge graphs excel at integrating data and leveraging connections. They offer a foundation for reaping the benefits of data exploration, analytics, data science and AI. But what if you’re not ready to make the switch and move all your enterprise data into a graph database? Siren can help.
Data lakes, data hubs, knowledge graphs, AI... Buzzword overload. The problem with buzzwords is that most of them don’t serve reality very well. In reality, organizations have data in various formats, stored across various systems.
Some systems were purchased on the merit of the buzzword du jour and then handed down as legacy. Others were chosen by developers who started coding a proof of concept, and stuck around. Some were chosen simply as the right tool for the kind of use that the data required.
Databases, relational and NoSQL. Documents, spreadsheets, text files and emails. And logs – lots of logs. No wonder then that one of the most popular ways for people to find what they’re looking for in this deluge of data all over the place is search.
Search done right is simple to use: you don’t need to know the whereabouts of what you are looking for, or the connections between that and everything else in your data. Search done right, however, largely relies on indexing and connecting data, which is anything but simple. That’s what got Siren started; going from search to unifying buzzwords has been an interesting ride.
Unifying Search and Knowledge Graphs for the win
Giovanni Tummarello and Renaud Delbru got started with search, and knowledge graphs, a while ago. Back in 2007, Tummarello and Delbru were researchers. Knowledge graphs were not yet a buzzword, although the underlying technology (Semantic Web, or Linked Data, or Web of Data) was enjoying its own hype.
Web pages were annotated with bits of metadata: machine readable information like “author” or “price”. This was embedded in HTML in formats such as microformats or RDF, which were then adopted by Google in Schema.org. Tummarello and Delbru set out to index them and make them discoverable in a search engine for the Web of Data called Sindice.
Sindice started out modestly but produced quite a few results before folding, with some of its contributions making their way into the Apache Foundation’s open source search projects, Lucene and Solr. Although being able to index and search a 30 billion edge knowledge graph was no small feat, commercializing it did not quite work out.
But that only served to launch Tummarello and Delbru to their next endeavor: Siren, the Semantic Information Retrieval Engine. In 2014 Siren’s founders set off to combine Knowledge Graphs and indexing to bring value to the enterprise. Knowledge Graphs excel in integrating various data sources, and index-based search is the easiest way to access that data.
By that time, the open source platform Elasticsearch, with its Kibana visualization layer and rich plugin ecosystem, was a big hit in the developer community and was used in many enterprises. Siren decided to build on Elasticsearch, offering a layered approach to getting the best of both worlds.
The Federation layer offers a number of connectors, enabling data to be ingested into Elasticsearch from a number of sources: SQL and NoSQL databases, Hadoop, and more. This makes Elasticsearch an integration point, while letting data stay in their original sources and relying on smart indexes.
Siren built the core of its solution based on a Semantic model. A unified data model (or ontology) is a formal representation of the entities and relationships that exist in a domain – to a level of detail useful to answer core domain questions. Siren abstracts the details of working with ontologies, while offering expert users the option to express graph queries in the Gremlin graph query language, part of Apache Tinkerpop.
The Investigate layer builds on the unified data model to offer link analysis within and across datasets, and to define and control access to data sets and data elements. These are implemented as a set of plugins compatible with Elasticsearch and Kibana, and build on search technology with significance ranking, fuzzy and phonetic search and more.
AI superpowers and Siren ML
The top layer in Siren is the Alert layer, which lets users define triggers to fire and receive notifications when certain events happen. This is a high availability layer, which also offers a scheduled reporting mode. It’s on this layer that Siren is now adding AI superpowers, integrating deep learning technology.
Siren 10.3 introduces Siren Machine Learning (Siren ML). Designed to leverage modern open-source machine-learning frameworks such as TensorFlow and to run in a cloud-compatible Docker environment, Siren ML aims to provide data investigators with a stress-free way to reap the benefits of using state-of-the-art “auto” machine-learning methods on their data.
Deep expertise in machine learning is not necessary to use Siren ML, as the plugin comes with a UI to support creating, updating, and activating ML models. Siren ML also takes care of hyperparameter optimization to find the best model to fit data. Hyperparameters are settings required prior to training models. Machine learning results can be viewed on Siren dashboards.
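To make the idea of hyperparameter optimization concrete, here is a minimal sketch, not Siren ML's actual method. It assumes a toy moving-average forecaster whose only hyperparameter is the window size, and it uses a plain grid search that keeps the window with the lowest error on held-out points. All function names and the sample series are hypothetical.

```python
from statistics import mean

def moving_average_forecast(history, window):
    """Predict the next value as the mean of the last `window` points."""
    return mean(history[-window:])

def validation_error(series, window, holdout):
    """Mean absolute error of one-step-ahead forecasts on the last `holdout` points."""
    errors = []
    for i in range(len(series) - holdout, len(series)):
        prediction = moving_average_forecast(series[:i], window)
        errors.append(abs(series[i] - prediction))
    return mean(errors)

def grid_search(series, windows, holdout=5):
    """Try every candidate window and return the (window, error) pair with the lowest error."""
    scored = [(w, validation_error(series, w, holdout)) for w in windows]
    return min(scored, key=lambda pair: pair[1])

# Hypothetical series, for illustration only.
series = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16, 15, 17]
best_window, best_error = grid_search(series, windows=[1, 2, 3, 4])
print(best_window, best_error)
```

The window is fixed before the forecaster runs, which is what makes it a hyperparameter rather than something learned from the data. Real systems search much larger spaces with smarter strategies, but the select-by-validation-error loop is the same idea.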
The first release of Siren ML offers two types of machine-learning models for numerical time series data: unsupervised anomaly detection and future value prediction (forecasting). Anomaly detection can help analysts scan through large volumes of data more efficiently, while future value prediction can be combined with alerts to give early warning of events of interest.
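As a rough illustration of unsupervised anomaly detection on a numerical time series, the sketch below flags points that sit far from the mean of a trailing window, measured in standard deviations. This is a simple statistical stand-in chosen for clarity, not the deep learning approach Siren ML uses. The function name, window size, threshold and readings are all assumptions.

```python
from statistics import mean, stdev

def detect_anomalies(series, window=5, threshold=3.0):
    """Return indices of points more than `threshold` standard deviations
    away from the mean of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # A perfectly flat window gives no scale to compare against; skip it.
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical sensor readings with one obvious spike at index 7.
readings = [20, 21, 19, 20, 22, 21, 20, 48, 21, 20, 19]
flagged = detect_anomalies(readings)
print(flagged)
```

No labels are needed: the series itself defines what "normal" looks like, which is what makes the approach unsupervised.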
Siren Entity Resolution (ER) is a Machine Learning component capable of recognizing that two or more records are very likely to be referring to the same real-world entity (for example, the same person). In addition to identity, ER can also prompt that two or more entities are interestingly connected: for example, they may share an unusual combination of attributes.
Siren ER offers advanced, mostly automated real-time operations, with no batch reprocessing required. It can scan across schemata and data sources, and anything that can be connected to Siren can benefit from this. Siren ER can match across dozens of address formats and conventions: it understands that “Robert” can also be referred to as “Rob” or “Bob” or “Роберц”.
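The name-matching idea can be sketched with a small alias table. This is only an illustration under loud assumptions: the `NAME_ALIASES` table and both helper functions are hypothetical, and a real entity resolution engine would rely on far richer linguistic resources than a hand-written dictionary.

```python
# Hypothetical alias table mapping name variants to one canonical form.
NAME_ALIASES = {
    "rob": "robert",
    "bob": "robert",
    "роберц": "robert",
}

def canonical_name(name):
    """Normalize case and whitespace, then map known variants to a canonical name."""
    key = name.strip().casefold()
    return NAME_ALIASES.get(key, key)

def same_first_name(a, b):
    """Two names match if they share a canonical form."""
    return canonical_name(a) == canonical_name(b)

print(same_first_name("Bob", "Robert"))
print(same_first_name("Роберц", " rob "))
print(same_first_name("Alice", "Bob"))
```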
Finally, Siren ER can correct previous assertions based on new facts. It both uses new records to revise previous assertions and automatically assesses the weight of attributes given their distribution over the data. Combined with link analysis over knowledge graphs (discovering entities which share identifiers), ER can reveal new insights even in known data sets.
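One way to picture weighting attributes by their distribution is to let rare shared values count for more than common ones. The sketch below assumes an inverse-frequency weight (the logarithm of total records over records sharing the value). This is a common heuristic picked for illustration, not a description of Siren ER's internals, and the records and function names are made up.

```python
import math
from collections import Counter

def rarity_weights(records, attributes):
    """For each attribute, weight every value by log(total / count): rarer values weigh more."""
    weights = {}
    for attr in attributes:
        counts = Counter(r[attr] for r in records if r.get(attr) is not None)
        total = sum(counts.values())
        weights[attr] = {value: math.log(total / count) for value, count in counts.items()}
    return weights

def match_score(a, b, weights):
    """Sum the rarity weights of all attribute values the two records share."""
    score = 0.0
    for attr, value_weights in weights.items():
        value = a.get(attr)
        if value is not None and value == b.get(attr):
            score += value_weights.get(value, 0.0)
    return score

# Hypothetical records: everyone lives in Dublin, but only two share a phone number.
records = [
    {"name": "Robert Smith", "city": "Dublin", "phone": "555-0101"},
    {"name": "Bob Smith", "city": "Dublin", "phone": "555-0101"},
    {"name": "Ann Lee", "city": "Dublin", "phone": "555-0199"},
    {"name": "Tom Ray", "city": "Dublin", "phone": "555-0142"},
]
weights = rarity_weights(records, ["city", "phone"])
print(match_score(records[0], records[1], weights))  # shared phone carries weight
print(match_score(records[0], records[2], weights))  # shared city alone carries none
```

Because the weights are derived from the data, they shift as new records arrive, which is one simple way earlier match decisions could be revisited.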
Exploring and visualizing connections
Visualization plays a key role in showing and exploring connections in data, and some people think of Siren as the long-missing link between Tableau-like visual analytics tools and graph tools. What may be hard to spot and follow through in text or tabular format becomes clearer when displayed on a dashboard. Siren has been building on Elasticsearch and Kibana, enhancing them with its own graph-specific visualizations.
Siren offers relational navigation capabilities: the Siren data model allows one to move from a set of records (e.g. a dashboard with a filter) to the set of records which are relationally connected (e.g. in another dashboard).
For example, one can move from a “set of companies” which have certain characteristics to the “set of investors” that have invested “at least 5M” in those companies and also move from “dashboard” to “link analysis” and back as required by the analysis.
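The companies-to-investors move can be sketched as a filter followed by a join. The records and field names below are hypothetical, and "at least 5M" is interpreted as a single investment of 5,000,000 or more, which is an assumption. Siren performs this kind of navigation over indexed data rather than in-memory lists.

```python
# Hypothetical sample records; field names are assumptions for illustration.
companies = [
    {"id": "c1", "name": "Acme Pay", "sector": "fintech"},
    {"id": "c2", "name": "Graphly", "sector": "analytics"},
    {"id": "c3", "name": "LedgerOne", "sector": "fintech"},
]
investments = [
    {"investor": "Alpha Capital", "company": "c1", "amount": 7_000_000},
    {"investor": "Beta Ventures", "company": "c1", "amount": 2_000_000},
    {"investor": "Beta Ventures", "company": "c2", "amount": 9_000_000},
    {"investor": "Gamma Fund", "company": "c3", "amount": 5_000_000},
]

def select_companies(companies, sector):
    """The starting set: ids of companies matching a filter (here, a sector)."""
    return {c["id"] for c in companies if c["sector"] == sector}

def connected_investors(company_ids, investments, min_amount):
    """The connected set: investors with a qualifying investment in any selected company."""
    return {
        inv["investor"]
        for inv in investments
        if inv["company"] in company_ids and inv["amount"] >= min_amount
    }

fintech = select_companies(companies, "fintech")
big_investors = connected_investors(fintech, investments, min_amount=5_000_000)
print(sorted(big_investors))
```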
Siren 10.3 introduces “Dashboard 360”. This feature can link visualizations within a dashboard to create a “360-degree” interactive view around an entity or group of entities. A new dashboard data model enables visual configuration of relationships between different search-based visualizations on a dashboard, which then allows coherent filtering across all of them.
For example, let’s assume a data model containing a table “investor” connected to another table “investments”, which then is connected to another table “companies” and yet another table “articles”: Investors – make → Investments – secured by → Companies – mentioned in → Articles.
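A data model like this chain can be pictured as a small graph of tables and named relations, where linking two visualizations amounts to finding a path between their tables. The sketch below uses a breadth-first search over the relations named above. The `RELATIONS` structure and `find_path` helper are hypothetical, and treating every relation as navigable in both directions is an assumption.

```python
from collections import deque

# The chain from the example, as (source table, relation label, target table).
RELATIONS = [
    ("investors", "make", "investments"),
    ("investments", "secured by", "companies"),
    ("companies", "mentioned in", "articles"),
]

def find_path(source, target, relations):
    """Breadth-first search for the shortest chain of relations between two tables.
    Returns alternating table names and relation labels, or None if unreachable."""
    adjacency = {}
    for left, label, right in relations:
        adjacency.setdefault(left, []).append((label, right))
        adjacency.setdefault(right, []).append((label, left))  # assume two-way navigation
    queue = deque([(source, [source])])
    visited = {source}
    while queue:
        table, path = queue.popleft()
        if table == target:
            return path
        for label, neighbor in adjacency.get(table, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [label, neighbor]))
    return None

print(find_path("investors", "articles", RELATIONS))
```

With such a path in hand, a filter applied to one table (say, a single investor) could be carried step by step along the chain to narrow the investments, companies and articles shown alongside it.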
For text-based data (reports, emails, news articles, etc.), Siren provides a built-in, real-time, interactive UI for visually exploring topic and keyword clusters, which helps in news monitoring, investigative textual data discovery, and e-discovery.
The Topic Explorer is a new visualization in Siren which can be embedded as part of dashboards and interacts with other visualizations creating filters. For example, users can associate it with date histograms or any other control to create explorative dashboards for their corpuses.
Last but not least, in addition to enabling existing data to be viewed as graphs, Siren also integrates graph data, starting with Neo4j. Neo4j data can be visualized in Siren dashboards, while native queries can leverage Neo4j capabilities and return results which are then explored in Siren.
Augmented Analytics for Knowledge Graphs in the real world
Analytics augmented by the use of AI: this is how Gartner defines Augmented Analytics. AI can make data in existing enterprise backends become a unified Knowledge Graph, with a UI experience to match.
Knowledge graph-based data models can serve as the backbone for enterprise data integration. While this realization is increasingly hitting home, not all organizations are ready to commit to moving all their data into a graph database. Still, data integration is the foundation for reaping the benefits of exploration, analytics, data science and AI.
Siren offers a pragmatic approach to knowledge graphs, abstracting as much as possible of the underlying complexity, and integrating data in a federated way – mostly leaving data “where it is”.
Building on this federation layer, out of the box capabilities for exploration, alerts and visualization make it an investigative platform for the enterprise.
Siren’s latest version, 10.3, adds deep-learning based AI superpowers to this foundation, making the life of enterprise users simpler. We expect to see Siren building even further on this.
Siren is open source, and comes in 3 editions – Community, IT, and Business. All new features are available in limited form in the Community edition, and in their fully fledged form in the IT and Business editions.
See Tummarello’s keynote at Connected Data London 2019, and meet the Siren team at booth 8.