Intro

I have collected Q&A topics since about 2010. These are being put onto this blog gradually, which explains why they are dated 2017 and 2018. Most are responses to questions from my students, some are my responses to posts on the Linkedin forums. You are invited to comment on any post. To create a new topic post or ask me a question, please send an email to: geverest@umn.edu since people cannot post new topics on Google Blogspot unless they are listed as an author. Let me know if you would like me to do that.

2018-01-02

Physical data model governed by business (logical) data model

Jaunine Conradie (student) asks: (email to me on 2012 Aug 13)
    The typical Teradata architecture includes a staging layer (stages data from sources in the form that they exist on the source), an integration layer (3NF integrated data from stage), and a semantic layer (dimensional view on the integration layer data).
.. We are two modelers on the project. The problem that we have is that the ETL lead is demanding a model change to facilitate data loads and queries.
.. In short, a Subscriber on the network (Vodacom here in South Africa, which is like Sprint or Verizon in the USA) have various Access Methods of various Access Method Types.  Types can for example be Primary (cell phone number), Data (for internet access), etc.
The change being requested by the ETL lead is to host the Primary access method on Subscriber.

This is problematic because:
1. This is inconsistent with the implementation, where some access methods for a subscriber will now be carried on the Access Method Subscriber entity, and some on the Subscriber entity itself.
2. We do not (and should not) carry history on the Subscriber entity.
3. Teradata has specialized index types to facilitate joins (this negating the request to denormalize because of query performance)


How do we convince the program manager to not let ETL‑related considerations dictate the design of the physical model?

1 comment:

  1. EVEREST responds:

    .. The logical model is what should govern here. As I have preached, the logical model should be as accurate a model of the perceived user world as possible. The logical model should not reflect constraints or considerations of physical implementation. Retain only what is important and meaningful to the users. That probably leaves out most consideration of data storage types, and identifiers, for example. These are normally (or should be) deferred until design for physical implementation. Any demands on design to promote efficiency of ETL data loading are entirely out of place if they compromise the logical design. So are considerations of what join processing algorithms are available. Such considerations can be brought to bear down stream in the implementation process.
    .. Now there may be some disagreement on what the world looks like, which will entail some negotiation... among all interested parties, not just ETL and DB implementers, in particular the users who best know their world. There should also be no constraints from either the need to reflect it in a 3NF relational database, nor in a dimensional model. These both tend to introduce spurious constructs and apply unnatural constraints on the design. As you well know by now, the starting point should be a model representation in ORM (or some equivalent, non record based modeling scheme. In Europe they call it FOM, or fact oriented modeling. I don't know what is predominant in South Africa, but you should try to hook into what your modelers/developers already understand, if anything. Otherwise, you have a unique teaching opportunity).
    .. So the issue will not be resolved between you and the ETL folks, without involving the users who know (collectively) what their world looks like, and what they need to do in that world (i.e., the business operations... on the data). Having all parties with a stake in the outcome at the table, the focus should initially be only on arriving at a logical structure (a la ORM) on which all parties can agree. Push all considerations of data storage (encoding), identifiers (reference modes), physical access, etc. downstream, i.e., deferred. With a good logical model as the basis, it is possible to think through and choose the (better) implementation alternatives. At the logical level you are asking, what are the objects (whether entities or attributes), what is the nature of the object populations (in your case, access types), what are the relationships among them, what are the constraints on those objects and relationships, what is/are the determinants (as in functional dependencies) for each object type. Sound familiar?
    .. For some comments on the specifics of your email. Note that your concerns rest on questions of where we store the "attributes", with which entity. In ORM there are no attributes, only relationships between objects. To point (2), history may be relevant (ask the users), where you store it is a downstream question.

    ReplyDelete

Comments to any post are always welcome. I thrive on challenges and it will be more interesting for you.