
What are graph databases?


Most of the descriptions of graph databases you find when you google them are academic. Many of them talk about the Seven Bridges of Königsberg or Tim Berners-Lee, the inventor of the World Wide Web. These theories and visions are fine, but I think it's important to lead with relevance. Why are graph databases important to you?

Imagine the data stored by a local restaurant chain. If you were keeping track, you'd store customer information in one database table, the items you offer in another, and the sales you've made in a third. This works fine when I want to understand what I sold, order inventory and find out who my best customer is. What's missing is the connective tissue: the connections between the items, along with functions in the database that let me make the most of them.
A graph database stores the same sort of data, but it also stores the links between things. For example, John buys a lot of Pepsi; Jack is married to Valerie and buys different drinks. I don't have to perform JOINs to understand how I should market to each individual customer. I can see the relationships in the data without having to form a hypothesis and test it.
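To make "storing the links between things" concrete, here is a minimal sketch of a toy in-memory graph in which every fact is a (subject, predicate, object) triple indexed by subject. It is not how any particular graph database is implemented, and the predicate names (`buys`, `married_to`) and Jack's drink (`Lemonade`) are hypothetical stand-ins for the examples above.

```python
from collections import defaultdict


class TripleStore:
    """A toy in-memory graph: every fact is a (subject, predicate, object) triple."""

    def __init__(self):
        # Index outgoing edges by subject so a node's relationships are one lookup away.
        self.out_edges = defaultdict(list)

    def add(self, subject, predicate, obj):
        self.out_edges[subject].append((predicate, obj))

    def relationships(self, subject):
        """Return every (predicate, object) pair attached to a subject."""
        return list(self.out_edges[subject])


store = TripleStore()
store.add("John", "buys", "Pepsi")
store.add("Jack", "married_to", "Valerie")
store.add("Jack", "buys", "Lemonade")  # hypothetical: the post only says "different drinks"

print(store.relationships("Jack"))
```

The relationship itself is the stored record, so asking "what is Jack connected to?" is a single lookup rather than a query that stitches tables back together.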

Table 1: Examples of applications for graph databases

Semantic info stored | Example | Use case
Ownership | Susan owns a Honda | Buyer intent
Interest | Steve is interested in football | Buyer intent
Designed by | Frank Lloyd Wright designed the Guggenheim | Knowledge graph
Classification | The Guggenheim is a museum | Knowledge graph
Connected via port | server1 is connected via port 8080 to server2 | Network/IT operations
Is associated with | A gene is associated with cancer | Life sciences
(Many more)
This new layer of connected information does a lot for you. It's useful not just for buyer intent but for many other use cases (see Table 1).
Since relational databases are designed around tables rather than linked data, SQL alone is no longer enough. This has given rise to SQL-like (but different) query languages such as SPARQL, Gremlin and Cypher, to name a few. A major difference is the set of analytical functions you need to act on linked data. If I wanted to find the most popular time to buy a certain product on my website, or to rank the popularity of an item, for example, there's new syntax for that. You need to learn the language of connected data to make the most of it.
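As a language-neutral illustration (this is plain Python, not SPARQL, Gremlin or Cypher syntax), ranking item popularity in a graph can be as simple as counting the incoming "bought" edges on each item node, a measure often called in-degree. The purchase edges below are hypothetical sample data, and real graph query languages provide their own aggregation and ranking functions for this.

```python
from collections import Counter

# Hypothetical purchase edges: (customer, predicate, item).
edges = [
    ("John", "bought", "Pepsi"),
    ("Jack", "bought", "Lemonade"),
    ("Valerie", "bought", "Pepsi"),
    ("Steve", "bought", "Burger"),
    ("Susan", "bought", "Pepsi"),
    ("Steve", "bought", "Lemonade"),
]


def rank_by_popularity(edges, predicate="bought"):
    """Rank objects by how many incoming edges with the given predicate they have."""
    counts = Counter(obj for _, pred, obj in edges if pred == predicate)
    return counts.most_common()


print(rank_by_popularity(edges))
```

Counting edges is the simplest popularity measure; graph algorithms such as PageRank refine it by also weighting who the connections come from.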

Can't you do that with an RDBMS?

Yes, it is possible to create these linkages in a traditional RDBMS. However, to do so, database administrators have toiled to maintain unique keys and reconstruct relationships with JOINs at query time. In a graph database, the subject and its relationship (known as the subject and the predicate) are stored directly, so there's no need to reconstruct the connections.
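The contrast can be sketched with Python's built-in `sqlite3` module. In the relational version, "John buys Pepsi" exists only as a row of foreign keys in a `sales` table and has to be rebuilt with two JOINs; in the graph version (a plain dictionary standing in for a graph database), the "buys" edge is stored on John's node and read directly. The table layout and sample rows are hypothetical, modeled on the restaurant example above.

```python
import sqlite3

# Relational version: the relationship lives in a sales table of foreign keys
# and must be reconstructed with JOINs at query time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales     (customer_id INTEGER REFERENCES customers(id),
                            item_id     INTEGER REFERENCES items(id));
    INSERT INTO customers VALUES (1, 'John'), (2, 'Jack');
    INSERT INTO items     VALUES (1, 'Pepsi'), (2, 'Lemonade');
    INSERT INTO sales     VALUES (1, 1), (1, 1), (2, 2);
""")
rows = conn.execute("""
    SELECT DISTINCT i.name
    FROM customers c
    JOIN sales s ON s.customer_id = c.id
    JOIN items i ON i.id = s.item_id
    WHERE c.name = ?
""", ("John",)).fetchall()
relational_answer = sorted(name for (name,) in rows)

# Graph version: the "buys" edge is stored on the node itself, so the answer
# is a walk along John's outgoing edges with no key matching.
graph = {
    "John": [("buys", "Pepsi")],
    "Jack": [("buys", "Lemonade"), ("married_to", "Valerie")],
}
graph_answer = sorted({obj for pred, obj in graph["John"] if pred == "buys"})

print(relational_answer, graph_answer)
```

Both approaches return the same answer; the difference is that the relational query has to know which keys to match, while the graph lookup simply follows an edge that already exists.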
Inference is another example. If you have defined that Mary is the mother of Zoe, a graph database with reasoning support can infer that Zoe is Mary's child, so you don't need to state both relationships explicitly. By comparison, a relational database cannot infer anything that hasn't been explicitly defined. This inferencing capability has clear value when looking at interests, households and communities.
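Here is a minimal sketch of that kind of inference, assuming a single hypothetical rule table that maps each predicate to its inverse (in the spirit of OWL's `owl:inverseOf`). Only "Mary mother_of Zoe" is stored; the reverse fact is derived. The predicate names are illustrative, and real reasoners support far richer rule sets than this.

```python
# Hypothetical rule set: each predicate maps to its inverse.
# married_to is symmetric, so it is its own inverse.
INVERSES = {"mother_of": "child_of", "married_to": "married_to"}


def infer_inverses(triples, inverses=INVERSES):
    """Return the input triples plus every inverse fact the rules imply."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in inverses:
            inferred.add((obj, inverses[predicate], subject))
    return inferred


facts = {("Mary", "mother_of", "Zoe"), ("Jack", "married_to", "Valerie")}
closure = infer_inverses(facts)

print(("Zoe", "child_of", "Mary") in closure)        # True
print(("Valerie", "married_to", "Jack") in closure)  # True
```

The stored data stays small (one fact per relationship), and the derived facts appear automatically whenever the rules are applied, rather than being inserted by hand as a second row.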

If you want to learn more about SPARQL, check out one of the many SPARQL tutorials online. There are also plenty of opportunities to try a graph database, such as AnzoGraph.


