The descriptions of graph databases that you get when you
google them are mostly academic. I see a lot of descriptions that talk about
the Seven Bridges of Königsberg or Tim Berners-Lee, the inventor of the World
Wide Web. There are theories and visions, which are fine, but I still think
it’s important to lead with the relevance. Why are graph databases important
to you?
Imagine the data that’s stored in a local restaurant chain.
If you were keeping track, you’d store customer information in one database table,
the items you offer in another and the sales you’ve made in a third.
This is fine when I want to understand what I sold, order inventory and find out
who my best customer is. But what’s missing is the connective tissue, the
connections between the items, along with functions in the database that let me
make the most of them.
A graph database stores the same sort of data, but it also
stores the linkages between the things.
For example, John buys a lot of Pepsi, while Jack is married to Valerie and buys
different drinks. I don’t have to perform JOINs to understand how I should market
to each individual customer. I can see the relationships in the data without
having to make a hypothesis and test it.
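To make the idea of stored linkages concrete, here is a minimal sketch in plain Python, assuming a simple in-memory graph where each node keeps a list of labelled edges. The relationship names (BUYS, MARRIED_TO) and the drinks Jack and Valerie buy are hypothetical, chosen only to mirror the example above; a real graph database would index and persist these edges.

```python
from collections import defaultdict

# Hypothetical in-memory graph: node -> list of (relationship, target) edges.
graph = defaultdict(list)

def link(source, relationship, target):
    """Store a directed, labelled edge from source to target."""
    graph[source].append((relationship, target))

# Illustrative data only; the specific drinks are made up.
link("John", "BUYS", "Pepsi")
link("Jack", "MARRIED_TO", "Valerie")
link("Valerie", "MARRIED_TO", "Jack")
link("Jack", "BUYS", "Iced Tea")
link("Valerie", "BUYS", "Lemonade")

def connections(customer):
    """Everything directly linked to a customer, read straight off the stored edges."""
    return graph[customer]

def household_purchases(customer):
    """Drinks bought by the customer or by anyone they are married to (one hop out)."""
    members = [customer] + [t for r, t in graph[customer] if r == "MARRIED_TO"]
    return sorted({t for m in members for r, t in graph[m] if r == "BUYS"})

print(connections("Jack"))
print(household_purchases("Jack"))
```

Because the marriage edge is stored alongside the purchase edges, finding what Jack’s household drinks is a short walk from node to node rather than a query that first has to rebuild the relationship.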
Table 1: Examples of applications for graph databases

| Semantic Info Stored | Example | Use Case |
|---|---|---|
| Ownership | Susan owns a Honda | Buyer Intent |
| Interest | Steve is interested in Football | Buyer Intent |
| Designed by | Frank Lloyd Wright designed the Guggenheim | Knowledge Graph |
| <classification> | Guggenheim is a museum | Knowledge Graph |
| Connections via port | server1 is connected via port 8080 to server2 | Network/IT operations |
| Is associated with | A gene is associated with cancer | Life Sciences |
| Many more | | |
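Each row of Table 1 reads naturally as a subject, a predicate and an object. The sketch below, assuming a plain Python set of triples rather than any particular product, stores several of the table’s examples that way and answers pattern queries in which None acts as a wildcard, loosely like a variable in a SPARQL pattern. The camelCase predicate names are hypothetical, and the `find` helper is an illustration, not a real database API.

```python
# Several rows of Table 1 as (subject, predicate, object) triples.
# Predicate names are hypothetical.
triples = {
    ("Susan", "owns", "Honda"),
    ("Steve", "isInterestedIn", "Football"),
    ("Frank Lloyd Wright", "designed", "Guggenheim"),
    ("Guggenheim", "isA", "Museum"),
    ("gene", "isAssociatedWith", "cancer"),
}

def find(subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None matches anything."""
    return sorted(
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )

def designed_kinds(architect):
    """Two-hop pattern: architect -designed-> building -isA-> kind of thing."""
    return [kind
            for (_, _, building) in find(architect, "designed")
            for (_, _, kind) in find(building, "isA")]

print(find(subject="Guggenheim"))
print(find(obj="Guggenheim"))
print(designed_kinds("Frank Lloyd Wright"))
```

The two-hop query chains two patterns through the shared Guggenheim node, which is the kind of question a knowledge graph answers by following edges.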
This new connected information layer does a lot for you.
It’s not just about buyer intent; it can help in many other use cases
(see Table 1).
Since relational databases are designed around tables rather than linked
data, SQL isn’t a natural fit for querying connections. This has given rise to
graph query languages such as SPARQL, Gremlin and Cypher, to name a few; some
borrow SQL-like syntax, but they work differently. A major difference is the
analytical functions you need to act on the linked data. If I wanted to find the
most popular time to buy a certain product on your website, or to rank the
popularity of an item, for example, there’s new syntax for that. You need to learn
the language of connected data to make the most of it.
Can’t you do that with an RDBMS?
Yes, it is possible to create these linkages in a
traditional RDBMS. However, to do so, database administrators toil to maintain
unique keys and reconstruct relationships with JOINs. In a graph database, both
the subject and its relationship to other things (in RDF terms, the subject and
the predicate) are stored explicitly, so there’s no need to reconstruct the
connections.
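To contrast the two approaches, the sketch below asks the same question, "who bought Pepsi?", twice. It uses Python’s built-in sqlite3 module for the relational side, where the customer-item relationship is rebuilt with JOINs through keys, and a plain list of subject-predicate-object edges for the graph side, where the relationship is stored directly. The table names, columns and data are hypothetical, and the linear scan over the edge list stands in for the indexed lookup a real graph database would perform.

```python
import sqlite3

# Relational version: a sale links customers to items only through keys,
# so the relationship is reconstructed with JOINs at query time.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales     (customer_id INTEGER REFERENCES customers(id),
                            item_id     INTEGER REFERENCES items(id));
    INSERT INTO customers VALUES (1, 'John'), (2, 'Jack');
    INSERT INTO items     VALUES (1, 'Pepsi'), (2, 'Iced Tea');
    INSERT INTO sales     VALUES (1, 1), (2, 2);
""")
sql_buyers = [row[0] for row in db.execute("""
    SELECT c.name
    FROM sales s
    JOIN customers c ON c.id = s.customer_id
    JOIN items i     ON i.id = s.item_id
    WHERE i.name = 'Pepsi'
""")]

# Graph version: the same fact is stored directly as a subject-predicate-object edge.
edges = [("John", "buys", "Pepsi"), ("Jack", "buys", "Iced Tea")]
graph_buyers = [s for s, p, o in edges if p == "buys" and o == "Pepsi"]

print(sql_buyers, graph_buyers)
```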
Inference is another example. If you have previously defined that Mary is the
mother of Zoe, a graph database can infer that Zoe is Mary’s child. You do not
necessarily need to define both relationships explicitly, because graph databases
that support reasoning (RDF stores that apply RDFS or OWL rules, for instance)
can derive one from the other. By comparison, a relational database knows only
what it has been explicitly told. This inferencing capability has clear value
when looking at interests, households and communities.
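The Mary and Zoe inference can be sketched with two kinds of rules applied repeatedly until nothing new is derived (forward chaining). The rules loosely mirror the RDFS/OWL ideas of a subproperty (`rdfs:subPropertyOf`: being someone’s mother implies being their parent) and an inverse property (`owl:inverseOf`: parent-of runs the other way as child-of). The predicate names and rule tables are hypothetical, and this is a toy reasoner, not how any particular database implements inference. Deriving the narrower "daughter" relationship would additionally require a fact about Zoe’s gender.

```python
# Hypothetical rule tables, loosely modelled on rdfs:subPropertyOf and owl:inverseOf.
SUBPROPERTIES = {"isMotherOf": "isParentOf"}           # mother-of implies parent-of
INVERSES = {"isMotherOf": "hasMother", "isParentOf": "isChildOf"}

def infer(triples):
    """Apply the rules repeatedly until no new triples are produced."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in SUBPROPERTIES:
                new.add((s, SUBPROPERTIES[p], o))
            if p in INVERSES:
                new.add((o, INVERSES[p], s))
        if new <= facts:       # nothing new was derived, so stop
            return facts
        facts |= new

# Only one relationship is stated explicitly.
facts = infer({("Mary", "isMotherOf", "Zoe")})
print(("Zoe", "isChildOf", "Mary") in facts)
```

Only "Mary isMotherOf Zoe" is stored; "Zoe isChildOf Mary" is derived by chaining the subproperty rule and then the inverse rule.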
If you want to learn more about SPARQL, check out one of the
many SPARQL tutorials online. There’s also ample opportunity to try a graph
database, such as AnzoGraph.