By: Brent (LinkedIn Discussion: Started 2011 Nov. 9)
We're having an internal discussion about PDM standards. The existing standard
says that, for all tables, columns, keys, indexes, and relationships, the PDM
should match what was implemented in the physical database. We have a huge
Teradata database that has no primary key constraints, foreign key constraints,
or indexes implemented in the database, but they are still in the data model.
Those components are part of the application code used to load the tables, so
the total database implementation = database schema + data load rules, and the
standard is satisfied. Agree? Disagree? What's your standard?
EVEREST RESPONDS:
First, recognize that a data model goes through several stages, from a detailed “conceptual” representation of the users’ world to final implementation in some DBMS. This question suggests there are at least two stages. Ideally, as the data model is transformed from one stage to the next, the (user) semantics are retained (perhaps with differing syntactical representations), but additional features and constructs may be added. In general, the model at each stage will be different from the model at the previous stage. So I would answer that they need not and should not match, except in the semantics of the users’ world.
Note that user semantics do not include things such as clustering data items into records/tables, converting the representation of relationships into foreign keys, adding indexes, adding surrogate keys, enforcing first normal form (other normal forms are still relevant), etc. These are all introduced for the convenience of the system, to make the model implementable in some data modeling scheme (such as relational) and some target DBMS, and to serve storage-efficiency and performance goals. None of these has anything to do with the users’ view in a data model, and they should not be reflected in that first-stage model (some call this the “conceptual” data model). Here the notion of conceptual should not be confused with a high-level or abstract presentation of a data model. While we may present a data model at a high level of abstraction, the underlying model needs to be as detailed as necessary to capture as much as possible of the exogenous semantics of the users’ world. That model should have no considerations or constraints relating to physical implementation.
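To make the distinction concrete, here is a minimal sketch (hypothetical table and column names, with SQLite used purely as a convenient stand-in) of the kinds of constructs that get added on the way to a physical, relational design. The only user semantics here are “an employee has a name and works in a department”; the surrogate keys, the foreign key, and the index are all system conveniences.

```python
import sqlite3

# Sketch only: which parts carry user semantics, and which are system conveniences.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement itself is a system concern

conn.execute("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,   -- surrogate key: a system convenience
        dept_name TEXT NOT NULL
    )""")

conn.execute("""
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,   -- surrogate key again
        emp_name  TEXT NOT NULL,         -- the only attribute the users actually stated
        dept_id   INTEGER REFERENCES department(dept_id)
                  -- the works-in relationship, recast as a foreign key
    )""")

# An index exists purely for performance; it carries no user semantics at all.
conn.execute("CREATE INDEX ix_employee_dept ON employee(dept_id)")
```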
Back to the question. The implemented database schema (in some target DBMS) may have some additional elements defined in a way that is peculiar to that system, but these need not be expressed in the physical data model. While the physical data model is a prelude to implementation, it is primarily a vehicle for communication with people, to help them understand the data model. The physical data model can include generic elements of a physical representation, but not necessarily in the form they take in the implemented schema.
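As an illustration only (hypothetical names, and a sketch rather than a recommendation): the physical data model might record a generic primary-key declaration, while the implemented Teradata schema expresses the same table with a system-peculiar PRIMARY INDEX clause governing row distribution. The generic form is what belongs in the PDM.

```python
import sqlite3

# Generic, DBMS-neutral form that a physical data model might record:
pdm_ddl = """
CREATE TABLE customer (
    customer_id  INTEGER NOT NULL,
    cust_name    VARCHAR(60),
    PRIMARY KEY (customer_id)
)
"""

# Teradata-flavored implementation (illustrative only): the PRIMARY INDEX
# clause governs row distribution, a physical and system-specific concern.
teradata_ddl = """
CREATE TABLE customer (
    customer_id  INTEGER NOT NULL,
    cust_name    VARCHAR(60)
)
PRIMARY INDEX (customer_id)
"""

# The generic form runs on an ordinary SQL engine; the Teradata form would not.
sqlite3.connect(":memory:").execute(pdm_ddl)
```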
Of course, it is important to document (for people)
the transformations (mappings) made from one stage of a data model to the next,
and the rationale for those choices.
This documentation is key. From the discussion so far, people seem to be assuming that all you have are the two data models, and the issue is to what extent they differ or should be the same. If you want them to be the same, then I would argue for two physical data models: one generic, and one tailored for the target DBMS implementation. However, this is more work and probably not very useful. So keep them different, and document the differences. The tool (DBMS or data modeling tool) should have some ability to produce a human-readable representation of the stored schema, even in a graphical form. We need both logical/physical models and “logical maps.”
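What that documentation looks like can be very modest. Even a simple record like the following sketch (hypothetical content, expressed in Python only for concreteness) captures the essentials: what each conceptual construct became in the next stage, and why.

```python
# One entry per transformation from the conceptual model to the implemented schema.
mapping_log = [
    {
        "conceptual": "Employee works-in Department (M:1 relationship)",
        "physical":   "employee.dept_id foreign key referencing department.dept_id",
        "rationale":  "The relational model represents relationships via foreign keys",
    },
    {
        "conceptual": "(none; no user semantics involved)",
        "physical":   "index ix_employee_dept on employee(dept_id)",
        "rationale":  "Join and filter performance on a frequent access path",
    },
    {
        "conceptual": "Every employee's department must exist (referential integrity)",
        "physical":   "Enforced in the data-load application code, not in the DBMS",
        "rationale":  "No FK constraints declared in the target, as in the Teradata case above",
    },
]

for entry in mapping_log:
    print(f"- {entry['conceptual']}")
    print(f"    became: {entry['physical']}")
    print(f"    because: {entry['rationale']}")
```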
We should note here that data modeling schemes and DBMSs have their limitations (often chosen for efficiency reasons). Hence, we need to transform, fudge, or introduce (sometimes spurious) constructs to allow certain semantics to be represented in the next-stage model. For example, to transform an ER model into a relational model we must impose the “single atomic value” constraint on attributes (1NF), which means we must resolve every M:N relationship into two 1:M relationships with an intersection entity. Also, we must represent all relationships with foreign keys, which is another reason why we cannot represent M:N relationships directly in a relational model. None of this is necessary in the users’ conceptual model. People don’t have a problem understanding M:N relationships! Only a relational DBMS does!
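Here is a sketch of that forced transformation (hypothetical names, SQLite again as a convenient stand-in): the users’ single statement “a student enrolls in courses, and a course has many students” becomes two tables plus an intersection table the users never asked for.

```python
import sqlite3

# Sketch: an M:N relationship resolved into two 1:M relationships
# through an intersection (associative) entity.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- The intersection entity: it exists only because the relational model
    -- requires single atomic values (1NF) and foreign-key representation of
    -- relationships; the users' conceptual model never asked for it.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
""")
```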
One approach to implementing complex user constraints that are not expressible in the target DBMS is to embody them in triggers and stored procedures or, worse yet, in application code. One example might be enforcing referential integrity, if that feature is not available in your target DBMS. Now the semantics really get buried and become difficult for users to comprehend.
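For instance, here is a sketch (hypothetical names; SQLite standing in for a system whose foreign-key enforcement we assume is unavailable or unused) of referential integrity pushed into a trigger. The rule itself is genuine user semantics, but in this form it is hidden in procedural code where users are unlikely to ever see it.

```python
import sqlite3

# Sketch: "an order must refer to an existing customer", enforced by a trigger
# because no foreign-key constraint is declared on sales_order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer   (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_order (order_id   INTEGER PRIMARY KEY, customer_id INTEGER);

    CREATE TRIGGER order_customer_must_exist
    BEFORE INSERT ON sales_order
    WHEN NOT EXISTS (SELECT 1 FROM customer WHERE customer_id = NEW.customer_id)
    BEGIN
        SELECT RAISE(ABORT, 'referential integrity violation: unknown customer');
    END;
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO sales_order VALUES (100, 1)")       # accepted
try:
    conn.execute("INSERT INTO sales_order VALUES (101, 99)")  # rejected by the trigger
except sqlite3.Error as e:
    print("blocked:", e)  # RAISE(ABORT, ...) surfaces as a constraint error
```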
To Alan Howlett’s point, if we try to (manually) maintain a PDM and an implemented schema that are intended to be the same, that is simply an invitation to introduce inconsistency. If they are to be the same, then they must be stored as a single data model. Hopefully you have a tool which can generate user-friendly representations of the model: a graphical representation, or a schema for your DBMS (multiple?).
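In the absence of such a tool, even a crude sketch of the single-source idea makes the point (hypothetical structure and names): keep one stored model and generate both the DDL and a human-readable outline from it, rather than hand-maintaining two artifacts that are supposed to match.

```python
# One stored model, two generated representations.
model = {
    "employee": {
        "columns": {"emp_id": "INTEGER", "emp_name": "TEXT", "dept_id": "INTEGER"},
        "primary_key": ["emp_id"],
    },
    "department": {
        "columns": {"dept_id": "INTEGER", "dept_name": "TEXT"},
        "primary_key": ["dept_id"],
    },
}

def to_ddl(model):
    """Generate generic CREATE TABLE statements from the single stored model."""
    stmts = []
    for table, spec in model.items():
        cols = [f"    {name} {ctype}" for name, ctype in spec["columns"].items()]
        cols.append(f"    PRIMARY KEY ({', '.join(spec['primary_key'])})")
        stmts.append(f"CREATE TABLE {table} (\n" + ",\n".join(cols) + "\n)")
    return ";\n".join(stmts)

def to_outline(model):
    """Generate a human-readable outline of the same model."""
    return "\n".join(f"{table}: " + ", ".join(spec["columns"])
                     for table, spec in model.items())

print(to_ddl(model))
print(to_outline(model))
```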