Martijn posts
(2012/10/11)
.. There have been endless
debates on how to name, identify, relate and transform the various kinds of
Data Models we can, should, would, must have in the process of designing,
developing and managing information systems (like Data warehouses). We talk
about conceptual, logical and physical data models, usually in the context of
certain tools, platforms, frameworks or methodologies. A confusion of tongues
is usually the end result. Recently David Hay made an interesting video
(Kinds of Data Models -- And How to Name Them) in which he tries to resolve this
issue in a consistent and complete manner. But on LinkedIn it has already been
questioned whether this is a final or universal way of looking at such models.
.. I note that if you use, e.g.,
FCO-IM diagrams, you can go directly from the conceptual model to a "physical"
DBMS schema if you want to, even special ones like Data Vault or Dimensional
Modeling. I also want to remark that there are 'formal language' modeling
techniques like Gellish that defy David's classification scheme. They are both
ontological and fact driven, and could in theory go from ontological to physical
in one step without a conceptual or logical layer (while still being consistent
and complete, by the way, so with no special assumptions except a transformation
strategy). The question arises how many data model layers we want or need, and
what each layer should solve for us. There is tension between minimizing the
number of layers and not overloading a single layer/model with so much semantics
and so many constructs that its usability suffers.
.. For me this is governed by
the concept of concerns: pieces of interest that we try to describe with a
specific kind of data model. These can be linguistic concerns like
verbalization and translation, semantic concerns like identification,
definition and ontology, data quality constraints like uniqueness, or
implementation and optimization concerns like physical model transformation
(e.g. Data Vault, Dimensional Modeling), partitioning and indexing.
Modeling/diagramming techniques and methods usually have primary concerns that
they want to solve in a consistent way, and secondary concerns they can model
but do not deem important (though these are usually important concerns at another
level/layer!). What makes this even more difficult is that within certain kinds
of data models there is also a tension between notation, approach and theory
(the N.A.T. principle). E.g. the relational model is theoretically sound, but its
formal notation of nested sets isn't visually helpful. ER diagramming looks
good, but there is little theoretical foundation beneath it.
.. I personally think we should
try to rationalize the use of data model layers, driven by concerns, instead of
standardizing on a basic three-level approach of conceptual, logical and physical.
We should be explicit about the concerns we want to tackle in each layer instead
of using generic, suggestive layer names.
I would propose the following (data) model layer minimization rule:
A layered (data) modeling scenario supports the concept of separation
of concerns (as defined by Dijkstra) in a suitable way with a minimum of layers
using a minimum of modeling methodologies and notations.
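One way to make the minimization rule concrete is to read it as a small covering problem: pick the fewest layers whose primary concerns together cover every concern we care about, then check which concerns the chosen layers still share (a hint that the separation of concerns is not clean). This is a hypothetical sketch, not the author's method; the layer names and their concern assignments below are made-up assumptions, not a classification of FCO-IM, Gellish, ER or any other real technique.

```python
from itertools import combinations

# Hypothetical catalogue: candidate layer -> the concerns it treats as primary.
# Names and assignments are illustrative assumptions only.
CANDIDATES = {
    "fact_model": {"verbalization", "translation", "identification", "definition"},
    "constraint_model": {"uniqueness", "identification"},
    "ontology_model": {"definition", "ontology"},
    "physical_model": {"physical transformation", "partitioning", "indexing"},
    "tuning_model": {"partitioning", "indexing"},
}

REQUIRED = {
    "verbalization", "translation", "identification", "definition", "ontology",
    "uniqueness", "physical transformation", "partitioning", "indexing",
}


def minimal_layering(required, candidates):
    """Return the smallest list of layers whose primary concerns cover `required`.

    Subsets are tried in increasing size, so the first cover found is minimal.
    Exponential in the number of candidates; fine for a handful of layers.
    """
    names = sorted(candidates)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = set().union(*(candidates[n] for n in combo))
            if required <= covered:
                return list(combo)
    return None  # no combination covers every required concern


def shared_concerns(layers, candidates):
    """Concerns claimed as primary by more than one of the chosen layers."""
    seen, shared = set(), set()
    for layer in layers:
        shared |= seen & candidates[layer]
        seen |= candidates[layer]
    return shared


layers = minimal_layering(REQUIRED, CANDIDATES)
print(layers)
print(shared_concerns(layers, CANDIDATES))
```

With this catalogue, ontology, uniqueness, verbalization and physical transformation each have exactly one owner, so the minimum is four layers (constraint_model, fact_model, ontology_model, physical_model), and identification and definition end up shared between two of them. Brute force is only reasonable for a few candidates, since set cover is NP-hard in general.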
EVEREST RESPONDS:
.. Martijn, I think you are on the right track. Rather than levels of data models, we need to identify the elements, things, concepts, constructs (or whatever you might call them) which we introduce into the models. For example: populations of things, names for populations of things, identifiers (lexical surrogates for members of a population of things), relationships (unary, binary, ternary...), characteristics/constraints of relationships (optional/mandatory, multiplicity/exclusivity), representation of relationships (e.g., foreign keys), attributes/properties of things, ring constraints, population constraints, etc. Ideally, we would like to define an ordering on these. Then we identify clusters of those elements to establish levels.

For example, in the first stages of FOM/ORM we do not need identifiers (reference modes); we can speak only of populations of things. We do need names for populations of things (so we can talk about them); those names provide the semantics of the populations. Note that we do not even need to introduce the notion of relationships until after introducing the notion of things/objects/entities. In a model, I can have objects without relationships, but I can't have relationships without objects. This sets a precedence ordering on introducing the elements of the model.

If we don't have relationships initially, then we have no need for foreign keys. In fact, in ER we don't even have/need the notion of foreign keys. Foreign keys are a particular method of representing relationships imposed by the relational data modeling scheme. Likewise, it is unnecessary to introduce the notion of single-valued/atomic attributes in ER. This (first normal form) is a constraint applied by the relational model for the purposes of implementation, not for modeling the user domain for users. It is interesting to note that we often jump to thinking about attributes of things prematurely when we think in terms of entity tables.
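The precedence ordering described above can be sketched as a layered topological sort: each element lands one level after its latest prerequisite, and elements that share a level have no order between them, which fits the point that the ordering is never completely linear. This is a hypothetical sketch; the prerequisite table is one possible encoding of this comment (for instance, the assumption that a foreign key needs both a relationship and an identifier to refer to), not an established standard.

```python
# Hypothetical encoding of the comment: element -> elements that must exist first.
PREREQUISITES = {
    "population": set(),
    "population name": {"population"},
    "population constraint": {"population"},
    "identifier": {"population"},
    "relationship": {"population"},
    "relationship constraint": {"relationship"},
    "ring constraint": {"relationship"},
    "attribute": {"relationship", "relationship constraint"},
    "foreign key": {"relationship", "identifier"},  # assumption: FK needs an identifier to refer to
}


def precedence_levels(prereqs):
    """Group elements into levels by a layered topological sort.

    Each level holds the elements whose prerequisites all sit in earlier
    levels; elements within one level are mutually unordered.
    Raises ValueError on unknown prerequisites or cyclic precedence.
    """
    unknown = set().union(*prereqs.values()) - set(prereqs)
    if unknown:
        raise ValueError(f"unknown prerequisites: {sorted(unknown)}")
    remaining = {element: set(before) for element, before in prereqs.items()}
    placed, levels = set(), []
    while remaining:
        ready = sorted(e for e, before in remaining.items() if before <= placed)
        if not ready:
            raise ValueError(f"cyclic precedence among: {sorted(remaining)}")
        levels.append(ready)
        placed.update(ready)
        for e in ready:
            del remaining[e]
    return levels


for depth, level in enumerate(precedence_levels(PREREQUISITES)):
    print(depth, level)
```

For this table the levels come out as: population; then identifier, population constraint, population name and relationship; then foreign key, relationship constraint and ring constraint; and finally attribute, which can only appear once relationships and their constraints exist.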
A lot of modeling can be done before putting stuff into tables (witness FOM and ORM). The truth here is that an attribute is an object which plays a role in a relationship with some (other) object. So you can't even have an attribute until you have presumed a relationship. In fact, in the relational model such relationships must be functional dependencies (M:1 or 1:1), so we must have the notion of relationship characteristics/constraints before we can have the notion of attributes in tables. So why not model all the relationships first? Then we can say that an object has attributes by being related to other objects.
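To illustrate the claim that an attribute is just an object playing a role in a functional (M:1 or 1:1) relationship, here is a minimal sketch that starts from relationships only and derives table layouts from them. It is loosely in the spirit of ORM's relational mapping but far simpler; the `FactType` structure, its `functional` flag, and the example fact types are hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactType:
    """Hypothetical minimal binary relationship between two object types."""
    name: str
    subject: str      # object type playing the first role
    obj: str          # object type playing the second role
    functional: bool  # True if each subject relates to at most one obj (M:1 or 1:1)


def group_into_tables(fact_types):
    """Derive table layouts from relationships alone.

    A functional fact type becomes an attribute column in the subject's table;
    a non-functional (M:N) fact type becomes a table of its own.
    The first column of each entity table stands in for its identifier.
    Object types with no functional fact type get no entity table in this sketch.
    """
    tables = {}
    for ft in fact_types:
        if ft.functional:
            tables.setdefault(ft.subject, [ft.subject]).append(ft.obj)
        else:
            tables[ft.name] = [ft.subject, ft.obj]
    return tables


facts = [
    FactType("Employee works in Department", "Employee", "Department", True),
    FactType("Employee has EmployeeName", "Employee", "EmployeeName", True),
    FactType("Employee speaks Language", "Employee", "Language", False),
]
print(group_into_tables(facts))
```

Here the `Department` column in the `Employee` table is exactly a foreign key: the relational representation of a relationship, which only shows up once the relationships and their uniqueness constraints have been modeled.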
.. What I call elements in/of the model, you are calling "concerns." Rather than beating our heads against the wall trying to define stages of data models/modeling, we should try to identify the basic elements of data models/modeling and then establish some sort of precedence ordering on those elements (though never to be completely linear).