Intro

I have collected Q&A topics since about 2010. These are being put onto this blog gradually, which explains why they are dated 2017 and 2018. Most are responses to questions from my students, some are my responses to posts on the Linkedin forums. You are invited to comment on any post. To create a new topic post or ask me a question, please send an email to: geverest@umn.edu since people cannot post new topics on Google Blogspot unless they are listed as an author. Let me know if you would like me to do that.

2020-09-24

Graph Databases and Graph Theory - What is a Graph?

 

Victor Morgante, in "What is a graph database?" 2020 Sept 15

https://towardsdatascience.com/what-is-a-graph-database-249cd7fdf24d

is trying to get beyond the sales hype for graph databases by getting down to a base definition:

“A graph database is any database over which a graph schema ... can be written”

However, there are many different definitions and conditions for a graph schema which can account for the differences in graph databases. 

When we speak of graph databases we are invoking the notion of a graph.  At its base, a graph is simply a collection of nodes, representing things, and lines or arcs, each representing a relationship between nodes. If we add that an arc is always between two nodes, then we are restricted to binary relationships.

A critical assumption about graphs in graph theory, and one that is often overlooked or not stated, is that the nodes are homogeneous, at least in some sense, that is, they all belong to one population. The other assumption is that each node is uniquely identifiable and distinguished from all other nodes, that is, there is a name space over the population of nodes.

You have made it clear that the nodes in a graph you are speaking of are at the schema level, that is, every node represents a type or population of instances of things (or domain of values or identifiers)

Beyond that, there are many rules or characteristics we may assume or impose on the formation of a graph:

- Connected – every node must be connected to at least one other node?

- Acyclic – there is at most one path between any two nodes.

- Rooted – there is a single "entry point" into the graph (These three serve to define a tree structure)

- Directed – every arc "points" in one direction (and never both directions).

- Binary only, or allowing a reflexive relationship, a node relating or pointing to itself

- Ternary and higher order relationships involving more than two nodes.  E.g, Person possesses Skill at level of Proficiency, or Person enrolled in Class at School earns Grade.  

- Naming the arcs to reflect some semantics about the relationship.  In the data world, these would be verb phrases or predicates (while the node names are nouns).  Note, there is always an inverse reading, whether expressed or not.      E.g, Person works in Dept, Dept employs Person.

- Role Names – an alternate name for the node which reflects the role it plays in the relationship.       E.g., Person working in a Dept could be called an Employee, Person enrolled in a class is a Student.

- Being at the type or schema level, with each node being a population of instances, it now becomes critically important to specify for each relationship (arc) its cardinality, that is, the mandatory/optional and exclusivity/multiplicity characteristics, and in both directions (in each binary relationship).

In any given context we may go about drawing nodes and arcs without stating our assumptions about the nature or characteristics of the graph.  Perhaps it is differences in how these characteristics are manifest in particular graph databases that account for the different schemes.  To be sure the differences are significant in our understanding of graph databases in different data modeling schemes and data management systems.  Now it makes a difference how you define your graph database.  It is overly simplistic to say that every relational database is a graph database.

 

No comments:

Post a Comment

Comments to any post are always welcome. I thrive on challenges and it will be more interesting for you.