Victor Morgante, in "What is a graph database?" 2020 Sept 15
https://towardsdatascience.com/what-is-a-graph-database-249cd7fdf24d
is trying to get beyond the sales hype for graph databases by getting down to a base definition:
“A graph database is any database over which a graph schema ... can be written”
However, there are many different definitions and conditions for a graph schema which can account for the differences in graph databases.
When we speak of graph databases we are invoking the notion of a graph. At its base, a graph is simply a collection of nodes, representing things, and lines or arcs, each representing a relationship between nodes. If we add that an arc is always between two nodes, then we are restricted to binary relationships.
A critical assumption about graphs in graph theory, and one that is often overlooked or not stated, is that the nodes are homogeneous, at least in some sense, that is, they all belong to one population. The other assumption is that each node is uniquely identifiable and distinguished from all other nodes, that is, there is a name space over the population of nodes.
You have
made it clear that the nodes in a graph you are speaking of are at the schema
level, that is, every node represents a type or population of instances of
things (or domain of values or identifiers)
Beyond that, there are many rules or characteristics we may assume
or impose on the formation of a graph:
- Connected – every node must be connected to at least one other
node?
- Acyclic – there is at most one path between any two nodes.
- Rooted – there is a single "entry point" into the
graph (These three serve to define a tree structure)
- Directed – every arc "points" in one direction (and
never both directions).
- Binary only, or allowing a reflexive relationship, a node relating
or pointing to itself
- Ternary and higher order relationships involving more than two nodes. E.g, Person possesses Skill at level of Proficiency, or Person enrolled in Class at School earns Grade.
- Naming the arcs to reflect some semantics about the relationship. In the data world, these would be verb phrases or predicates (while the node names are nouns). Note, there is always an inverse reading, whether expressed or not. E.g, Person works in Dept, Dept employs Person.
- Role Names – an alternate name for the node which reflects the role it plays in the relationship. E.g., Person working in a Dept could be called an Employee, Person enrolled in a class is a Student.
- Being at the type or schema level, with each node being a population of instances, it now becomes critically important to specify for each relationship (arc) its cardinality, that is, the mandatory/optional and exclusivity/multiplicity characteristics, and in both directions (in each binary relationship).
In any given context we may go about drawing nodes and arcs
without stating our assumptions about the nature or characteristics of the
graph. Perhaps it is differences in how
these characteristics are manifest in particular graph databases that account
for the different schemes. To be sure
the differences are significant in our understanding of graph databases in
different data modeling schemes and data management systems. Now it makes a difference how you define your
graph database. It is overly simplistic
to say that every relational database is a graph database.
No comments:
Post a Comment
Comments to any post are always welcome. I thrive on challenges and it will be more interesting for you.