Martjin posts
(2012/10/11)
.. There have been endless
debates on how to name, identify, relate and transform the various kinds of
Data Models we can, should, would, must have in the process of designing,
developing and managing information systems (like data warehouses). We talk
about conceptual, logical and physical data models, usually in the context of
certain tools, platforms, frameworks or methodologies. A confusion of tongues
is usually the end result. Recently David Hay made an interesting video
(Kinds of Data Models ‑‑ And How to Name them) in which he tries to resolve this
issue in a consistent and complete manner. But on LinkedIn it was already
questioned whether this is a final or universal way of looking at such models.
.. I note that if you use e.g.
FCO‑IM diagrams you can go directly from a conceptual model to a "physical"
DBMS schema if you want to, even special ones like Data Vault or Dimensional
Modeling. I also want to remark that there are 'formal language' modeling
techniques like Gellish that defy David's classification scheme. They are both
ontological and fact driven and could in theory go from ontological to physical
in one step without a conceptual or logical layer (while still being consistent
and complete btw, so no special assumptions except a transformation strategy).
The question arises how many data model layers we want or need, and what each
layer should solve for us. There is tension between minimizing the number of
layers and not overloading a layer/model with too much semantics and too many
constructs, which hampers its usability.
.. For me this is governed by
the concept of concerns, pieces of interest that we try to describe with a
specific kind of data model. These can be linguistic concerns like
verbalization and translation, semantic concerns like identification,
definition and ontology, data quality concerns like uniqueness constraints, or
implementation and optimization concerns like physical model transformation
(e.g. Data Vault, Dimensional Modeling), partitioning and indexing.
Modeling/diagramming techniques and methods usually have primary concerns that
they want to solve in a consistent way, and secondary concerns that they can
model but do not deem important (yet these are usually important concerns at
another level/layer!). What makes this even more difficult is that within
certain kinds of data models there is also tension between notation, approach
and theory (the N.A.T. principle). E.g. the relational model is theoretically
sound, but its formal notation of nested sets isn't visually helpful. ER
diagramming looks good, but there is little theoretical foundation beneath it.
.. I personally think we should
try to rationalize the use of data model layers, driven by concerns, instead of
standardizing on a basic three-level approach of conceptual, logical, and physical.
We should be explicit about the concerns we want to tackle in each layer instead
of using generic suggestive layer names.
I would propose the following (data) model layers minimization rule:
A layered (data) modeling scenario supports the concept of separation
of concerns (as defined by Dijkstra) in a suitable way with a minimum of layers
using a minimum of modeling methodologies and notations.
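
To make the rule a bit more tangible, here is a rough Python sketch of how one
might compare two layered modeling scenarios. Everything in it is an assumption
for illustration only: the Layer type, the check_scenario function, the
placeholder notation names (N1, N2, N3), the assignment of concerns to layers,
and the reading of "separation of concerns" as "every required concern is
handled by exactly one layer". The concern names themselves are taken from the
examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    notation: str         # placeholder for a modeling technique/notation
    concerns: frozenset   # concerns this layer is primarily responsible for


def check_scenario(layers, required_concerns):
    """Return (missing, overlaps, cost) for a layered modeling scenario.

    missing  : required concerns that no layer handles
    overlaps : concerns handled by more than one layer
    cost     : (number of layers, number of distinct notations)
    """
    seen = set()
    overlaps = set()
    for layer in layers:
        for concern in layer.concerns:
            if concern in seen:
                overlaps.add(concern)
            seen.add(concern)
    missing = set(required_concerns) - seen
    cost = (len(layers), len({layer.notation for layer in layers}))
    return missing, overlaps, cost


# Hypothetical concerns we want covered (names borrowed from the text).
required = {"verbalization", "identification", "uniqueness", "indexing"}

# Hypothetical three-layer scenario; "identification" is modeled twice.
three_layer = [
    Layer("conceptual", "N1", frozenset({"verbalization", "identification"})),
    Layer("logical", "N2", frozenset({"identification", "uniqueness"})),
    Layer("physical", "N3", frozenset({"indexing"})),
]

# Hypothetical two-layer scenario; each concern lives in exactly one layer.
two_layer = [
    Layer("fact-oriented", "N1",
          frozenset({"verbalization", "identification", "uniqueness"})),
    Layer("physical", "N3", frozenset({"indexing"})),
]

for label, scenario in (("three-layer", three_layer), ("two-layer", two_layer)):
    missing, overlaps, cost = check_scenario(scenario, required)
    print(label, "missing:", sorted(missing),
          "overlaps:", sorted(overlaps), "cost:", cost)
```

Under these assumptions, a scenario would satisfy the rule when nothing is
missing, nothing overlaps, and its cost is the lowest among the candidates.
Whether a single layer is "overloaded" with too many concerns is a usability
judgment that this sketch does not try to capture.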