Not all graph databases are created equal. As I pull back the covers on them, I'm finding that they tend to differ along the following lines:
RDF versus property graph
Resource Description Framework (RDF) graph databases, sometimes known as triple stores, offer a way of accessing data that follows W3C standards. Data sources that conform to these standards are accessible without data conversion. RDF databases also natively use SPARQL, the W3C-standard query language for RDF.
Labelled property graphs (LPGs) like Neo4j were originally purpose-built and conform less closely to the Web standards. However, they may perform better for certain types of graph analysis. LPGs may use non-standard languages such as Gremlin and Cypher. That said, SPARQL is not exactly well known either. SQL has traditionally lacked native graph functions, so you'll need to pick up another language: SPARQL, Cypher or Gremlin.
Some products offer both property graphs and RDF. AnzoGraph is an example of an all-in-one product for performing both W3C-conforming RDF-style analytics and LPG-style analytics.
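To make the difference between the two models concrete, here is a minimal sketch in plain Python, with no graph database involved. The data, the `ex:` prefixes and the function names are all made up for illustration. It stores the same fact ("Alice works at Acme") first as RDF-style triples and then as a labelled property graph.

```python
# Hypothetical data: one person who works at one company.

# RDF style: every fact is a (subject, predicate, object) triple.
triples = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:worksAt", "ex:acme"),
    ("ex:acme", "rdf:type", "ex:Company"),
}

# Property graph style: nodes and edges carry labels plus key/value properties.
nodes = {
    "alice": {"labels": {"Person"}, "props": {"name": "Alice"}},
    "acme": {"labels": {"Company"}, "props": {}},
}
edges = [
    {"src": "alice", "dst": "acme", "type": "WORKS_AT", "props": {"since": 2019}},
]


def employers_rdf(person):
    """Match the triple pattern (person, ex:worksAt, ?employer)."""
    return sorted(o for s, p, o in triples if s == person and p == "ex:worksAt")


def employers_lpg(person):
    """Follow outgoing WORKS_AT edges from the given node."""
    return sorted(
        e["dst"] for e in edges if e["src"] == person and e["type"] == "WORKS_AT"
    )


print(employers_rdf("ex:alice"))
print(employers_lpg("alice"))
```

In the property graph version, the `since` property sits directly on the relationship. With plain triples, you would need additional statements to say the same thing.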
Analytic databases (OLAP) versus operational databases (OLTP)
OLAP databases are designed for analytics that look across an entire set of data, while OLTP databases are aimed at pinpoint queries. In other words, if you want to look across the database and run a historical analysis of things that happened this month, OLAP databases do this well. If you’re more concerned about transactions, such as whether a seat on an airplane is available or whether a switch is on or off, OLTP systems are designed for this.
All operational databases support some degree of analytics, but how well they perform depends on the underlying architecture. If you have an even mix of OLAP-style queries and OLTP-style queries, it may benefit you to split these workloads, especially if you have a high data volume.
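The two access patterns can be sketched in a few lines of plain Python. This is only an illustration. The booking records, flight numbers and function names are invented, and a real database would use far more sophisticated storage and indexing.

```python
from datetime import date

# Hypothetical booking records.
bookings = [
    {"flight": "XY100", "seat": "12A", "day": date(2024, 5, 3), "fare": 120.0},
    {"flight": "XY100", "seat": "12B", "day": date(2024, 5, 9), "fare": 95.0},
    {"flight": "XY200", "seat": "3C", "day": date(2024, 4, 28), "fare": 210.0},
]

# OLTP style: an index keyed by (flight, seat) answers one pinpoint question.
taken = {(b["flight"], b["seat"]) for b in bookings}


def seat_available(flight, seat):
    """Single-key lookup; touches one entry regardless of data volume."""
    return (flight, seat) not in taken


# OLAP style: scan every record to summarize a whole period.
def revenue_for_month(year, month):
    """Full scan with aggregation; cost grows with the size of the data."""
    return sum(
        b["fare"]
        for b in bookings
        if b["day"].year == year and b["day"].month == month
    )


print(seat_available("XY100", "12A"))
print(revenue_for_month(2024, 5))
```

The lookup touches a single entry, while the aggregate has to read everything. That is why mixing both at high volume can make one workload slow down the other.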
Native engine versus “built-on”
Some graph databases have been built from the start around a native graph engine. Others have been built on top of other technologies, including Hadoop and Cassandra. It’s important to look at whether you need to manage an underlying infrastructure, or whether the engine is self-contained. The key considerations here are performance and the overhead of managing multiple systems.
Understand what you're downloading before you start in on your graph database selection process.