Everest on Data Modeling: March 2020

Intro

I have collected Q&A topics since about 2010. These are being put onto this blog gradually, which explains why they are dated 2017 and 2018. Most are responses to questions from my students; some are my responses to posts on the LinkedIn forums. You are invited to comment on any post. To create a new topic post or ask me a question, please send an email to geverest@umn.edu, since people cannot post new topics on Google Blogspot unless they are listed as an author. Let me know if you would like me to list you as one.

2020-03-29

How a data model is expressed

Fabian Pascal posts (LinkedIn, Data Modeling, 2020/03/29)
Shown a data model diagram, he says: "Actually, it is not a logical model, but a graphical representation of it that users understand -- the DBMS doesn't. Users understand the conceptual model SEMANTICALLY, the DBMS 'understands' the true logical model ALGORITHMICALLY and that's not what your drawing is."

Everest responds:

Regarding model (re)presentation, the exact same model can be presented in a variety of ways: as a graphical diagram, or in a formal linear notation (for machine processing; I take it this is what you call "logical"). But I would hope and expect that the underlying semantics would be exactly the same. We build a data model, and it should not matter how it is expressed, as long as the expressions are equivalent. The semantics relate to the model, not to how it is expressed (Conceptual?). Semantics is the meaning behind what is presented. Furthermore, all models are logical, that is, built according to some set of rules of logic and formal argument, no matter how they are presented or who reads them (check your dictionary). The rules in this case constitute what I call the modeling scheme.

Picking the right users for a data modeling project

Ken Evans said (LinkedIn, Data Modeling, 2020/03/29)
The piece of the puzzle that you have not mentioned is where the users' understanding of their domain is rather vague.

Everest responds:

If you have users with a vague understanding of their domain, you are talking to the wrong people. I have found that there are people in the user domain who really do know what is going on, what their world looks like. They are often seen as troublemakers: asking the tough questions, complaining about how things are done (or not done), and suggesting how things could be done better. They are people on the front lines, working in the trenches, who actually think about what they are doing, are not satisfied with the status quo, and go beyond their job description. Every organization has such people; you just need to find them. And the best way to find them is to ask other users. Most can readily tell you who they are. If there is no one they can point to, you have a dead or dying organization where nobody cares. The people who fit this profile will generally not be at the management or senior level; managers are usually busy managing and training people, as they should be. Once you get the right people to the table, it takes a skilled facilitator to elicit the needed information and document it in a usable form, i.e., a data model, in such a way that they understand and concur with the representation.

Who judges the accuracy of a data model?

Kevin Feeney says (LinkedIn, Data Modeling, 2020/03/29)
How do we know that we have a correct model? My general take would be that we know the model is correct, if the world moves and the model moves with it without breaking anything - if the model is wrong, you'll find duplication and other errors seep in. In terms of how you set the scope for the model and how you deal with systems integration and different views on data - generally you need to go upwards in abstraction and make the model correct at the top level. For example, when you find that two departments have different concepts of what a customer is, both of which are correct from their points of view, there are implicit subclasses (special types of customer) that can be made explicit to make everything correct.

Everest responds:

Kevin, I see two main problems with your viewpoint. (1) It sounds like you are saying: design it, build it, and wait for problems to arise. That would be irresponsible and dangerous. Surely we need to judge correctness before we commit resources to implementation. Before implementing the model and building a database, we need to have some assurance that our model is an accurate representation of the user domain. (2) It sounds like you are depending on the designers/modelers to make judgments about model correctness. That is the last thing I would do. Too often I have found that the data modeling experts had only a superficial understanding of the user domain. They may be well versed in the modeling task, but that alone doesn't produce a good model. The best modeling tool in the world and the best modeling methodology would, by themselves, be insufficient to produce a "correct" data model. Rather than a "correct" model, I prefer to speak of striving for an "accurate" data model, that is, one that accurately represents the user domain. As Simsion argued, and I agree, there is no single correct model.

So, who is best placed to judge the "correctness" of a data model? I say, the USERS THEMSELVES. They are the ones who understand their world better than anyone else. But you have to get the right users to the table. I have led dozens of data modeling projects, and we only go to implementation when ALL the user representatives sign off and say "Yes, this is an accurate representation of our world." If there are differences, they must be resolved among themselves (with wise direction from a trained data modeler). One caveat: the users must thoroughly understand the data model, in all its glorious detail (not just at a high level). It is the responsibility of the data modeler to ensure that the users collectively understand all the details of the model, an awesome responsibility. That means the users must understand the model diagrams and all the supporting documentation, particularly the definitions of the "things" (entities, objects), the relationships (binary, ternary, and more), and all the associated constraints (e.g., cardinalities). Our goal is to develop as rich a representation as possible of the semantics of the user domain, and that means having a rich set of constructs to use in developing the model. So far, I see ORM (Object-Role Modeling) as the richest modeling scheme.

The best way to make this happen is for the user representatives to be part of the modeling team. In fact, they should be the ones in control. Upper management needs to grant release time to those users most knowledgeable about their domain. An experienced data modeler needs to facilitate and guide the modeling process and the development of the data model. The team needs to be allowed to meet and deliberate as long as necessary to arrive at a model that they all feel comfortable approving. In my experience, the users have always known when they were done (and ready to go to implementation), although the time it took was difficult to predict up front. Only in one project were we unable to come to agreement, and that was because we had the wrong user representatives at the table. They were little more than data entry clerks who really didn't understand the meaning of the data, why it was important, or how it was used.

2020-03-26

Determining the "Correctness" of a data model

LinkedIn, Data modeling, 2020/3/26
I asked: How do we know when a data model is correct?
Ken Evans responded: That's easy, when the model conforms to stated requirements.
I then asked "who determines/documents the requirements?"
Ken responded:
It does not matter "who" determines the requirements. The point is that you can only judge whether a deliverable is "correct" if you have a set of pre-established requirements against which to assess "correctness". This principle has been widely accepted in the quality management discipline since 1978 when Philip Crosby published his book "Quality is Free." Crosby makes the point that "Quality is conformance to requirements."

Everest responds:

Sorry about the "who"; perhaps I should have said "how." While I accept the general principle, is it realistic? Crosby's statement raises the question: if quality is conformance to requirements, then how do we determine (and who determines) the correctness and completeness of the stated requirements? I have yet to see anything close to an a priori statement of requirements that was sufficient to judge the correctness of the end result, i.e., a domain data model. Furthermore, I have yet to see any guidelines sufficient for preparing a statement of requirements for a data modeling project. I would love to see examples, if anyone has them.

.. To me, the only satisfactory "statement of requirements" sufficient to judge the correctness of the model would be the final, detailed data model itself. Anything less would not be sufficient to express the full set of semantics to be included in the final model. In the case of ORM, that might be a complete set of elementary fact sentences, with well-defined object types and predicates. But that is what the entire process of data modeling is all about: to discover and document the semantics of some user domain. We want to capture as rich a set of semantics as our modeling scheme allows (which is why we need a modeling scheme such as ORM, which captures many more semantics than any other scheme, including ER/relational).

.. So the question remains, whether we are talking about the data model or the requirements for a data model: how do we judge the correctness of a data model? Who is in the best position to do that?

2020-03-25

Is data modeling Design or Description?

Ken Evans asks (LinkedIn, Data Modeling, 2020/3/25)

In the book "Data Modeling: Theory and Practice" Graeme Simsion describes the result of his extensive research into the "Design or Description?" question. Graeme's research showed that whilst many people believe that a data modeler's job is to "describe" a reality that is out there, the truth is that data modelers are designers rather than describers.

Everest responds:

I do not intend to contradict what Simsion said. If modeling were purely a descriptive activity, it would be easy, although we would have differing points of view and different interpretations. It is these differences in description that make it best considered a "design" activity, precisely because there are many choices to be made in building the model. The modeler CHOOSES how to represent the domain in building the model. So the question remains: how do we know when we have a correct model? It is interesting that Simsion maintains there is no one correct model, hence a design activity. I agree with that, which is why I questioned whether the goal of data modeling is to find THE correct model.

… Or perhaps one would be open to finding A correct model! However, that still doesn't answer the question: How do we know it is A correct model?

Is a data model a "representation"? ...of what?

Ken Evans asks (LinkedIn, Data modeling, 2020/3/25)

Hmm. Is a data model a "representation"?

Everest responds:

First of all, I don't like the phrase "Data Model" (sorry, "Ted" Codd). It suggests that it is a "model of data." That is misleading to someone outside our community. It is only a model of data if we have some data; then the model would be an (abstract) "representation" of the data. For us, a "data" model is a model of some aspects of a domain of interest to a community of users, whether real (world) or imagined/desired/yet to be built. It is a model "in data," that is, built using informational constructs, all guided by a modeling scheme. The modeling scheme tells us what to look for in the domain and how to represent it in the model. So we identify a population of similar things, give it a label and a definition, and put a box or circle into a diagram to represent that population of things. We build up or "design" a model with lots of types of things, add relationships among those things, and add constraints on those things and relationships. The modeling scheme tells us how to represent those relationships and constraints in our model. Graham Witt sheds some light on this argument by calling it a "business data model."
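To make the idea of a model "in data" a little more concrete, here is a minimal sketch in Python, assuming a deliberately simplified modeling scheme with just three constructs: labeled and defined object types, relationships (written as fact types with a sentence reading), and constraints. The class names (ObjectType, FactType, Constraint, DataModel) and the Customer/Order example are hypothetical illustrations, not part of ORM or of any modeling tool. The point is only that each construct carries a label, definition, or reading, so the whole model can be read back to user representatives as plain sentences for them to confirm or correct.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectType:
    """A labeled, defined population of similar things (a box or circle in a diagram)."""
    label: str
    definition: str


@dataclass(frozen=True)
class FactType:
    """A relationship among object types, expressed as a sentence template."""
    reading: str                    # e.g. "{0} places {1}"; one placeholder per role
    roles: tuple[ObjectType, ...]   # object types playing each role, in reading order


@dataclass(frozen=True)
class Constraint:
    """A rule restricting a fact type (e.g. a cardinality), stated in plain words."""
    fact_type: FactType
    statement: str


@dataclass
class DataModel:
    """Hypothetical container: a model built from the scheme's constructs, not from data."""
    object_types: list[ObjectType] = field(default_factory=list)
    fact_types: list[FactType] = field(default_factory=list)
    constraints: list[Constraint] = field(default_factory=list)

    def verbalize(self) -> list[str]:
        """Return the model as sentences a user representative can review."""
        sentences = [f"{o.label}: {o.definition}" for o in self.object_types]
        for f in self.fact_types:
            # Fill each placeholder in the reading with the label of the role's object type.
            sentences.append(f.reading.format(*(r.label for r in f.roles)))
        sentences.extend(c.statement for c in self.constraints)
        return sentences


# Hypothetical example domain
customer = ObjectType("Customer", "A party that has placed at least one order.")
order = ObjectType("Order", "A request from a customer for goods or services.")
places = FactType("{0} places {1}", (customer, order))
model = DataModel(
    object_types=[customer, order],
    fact_types=[places],
    constraints=[Constraint(places, "Each Order is placed by exactly one Customer.")],
)

for sentence in model.verbalize():
    print(sentence)
```

A real modeling scheme such as ORM offers far richer constructs and constraint types than this; the sketch only shows that the model is assembled from informational constructs guided by a scheme, rather than being a copy of some data.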
