Intro

I have collected Q&A topics since about 2010. These are being put onto this blog gradually, which explains why they are dated 2017 and 2018. Most are responses to questions from my students, some are my responses to posts on the Linkedin forums. You are invited to comment on any post. To create a new topic post or ask me a question, please send an email to: geverest@umn.edu since people cannot post new topics on Google Blogspot unless they are listed as an author. Let me know if you would like me to do that.
Showing posts with label data model correctness. Show all posts
Showing posts with label data model correctness. Show all posts

2020-03-29

Who judges the accuracy of a data model?

Kevin Feeney says (LinkedIn, Data Modeling, 2020/03/29)
How do we know that we have a correct model? My general take would be that we know the model is correct, if the world moves and the model moves with it without breaking anything - if the model is wrong, you'll find duplication and other errors seep in. In terms of how you set the scope for the model and how you deal with systems integration and different views on data - generally you need to go upwards in abstraction and make the model correct at the top level. For example, when you find that two departments have different concepts of what a customer is, both of which are correct from their points of view, there are implicit subclasses (special types of customer) that can be made explicit to make everything correct.

Everest responds:

Kevin, I see two main problems with your viewpoint. (1) Sounds like you are saying design it, build it, and wait for problems to arise. Surely we need to judge correctness before we commit resources to implementation. That would be irresponsible and dangerous. Before implementing the model and building a database, we need to have some assurance that our model is an accurate representation of the user domain. (2) It sounds like you are depending on the designers/modelers to make judgments about model correctness. That is the last thing I would do. Too often I have found that the data modeling experts had only superficial understanding of the user domain. They may be well versed in the modeling task, but that doesn't produce a good model. The best modeling tool in the world and the best modeling methodology would be insufficient to produce a "correct" data model. Rather than a "correct" model I prefer to call it striving for an "accurate" data model, that is, one that accurately represents the user domain. As Simsion argued and I agree, there is no single correct model. So, who best to judge?

So, who best to judge the "correctness" of a data model? I say, the USERS THEMSELVES. They are the ones who understand their world better than anyone else. But you have to get the right users to the table. I have lead dozens of data modeling projects and we only go to implementation when ALL the user representatives sign off and say "Yes, this is an accurate representation of our world." If there are differences, they must be resolved among themselves (with wise direction from a trained data modeler). One caveat: the users must thoroughly understand the data model, in all its glorious detail (not high-level). This is the responsibility of the data modeler to ensure the users collectively understand all the details of the model -- an awesome responsibility. That means the users must understand the model diagrams and all the supporting documentation, particularly the definition of the "things" (entities, objects), relationships (binary, ternary, and more), and all the associated constraints (e.g., cardinalities). Our goal is to develop as rich a representation as possible of the semantics of the user domain, and that means having a rich set of constructs to use in developing the model. So far, I see ORM as the richest modeling scheme.

The best way to make this happen is for the user representatives to be part of the modeling team. In fact, they should be the ones in control. Upper management needs to grant release time to those users most knowledgeable about their domain. An experienced data modeler needs to facilitate and guide the modeling process and the development of the data model. The team needs to be allowed to meet and deliberate as long as necessary to arrive at a model which they all feel comfortable approving. In my experience the users have always known when they were done (and ready to go to implementation), although the time it took was difficult to predict up front. Only in one project were we unable to come to agreement and that is because we had the wrong user representatives at the table. They were little more than data entry clerks who really didn't understand the meaning of the data, why it was important, nor how it was used.

2020-03-26

Determining the "Correctness" of a data model

LinkedIn, Data modeling, 2020/3/26
I asked: How do we know when a data model is correct?
Ken Evans responded: That's easy, when the model conforms to stated requirements.
I then asked "who determines/documents the requirements?"
Ken responded:
It does not matter "who" determines the requirements. The point is that you can only judge whether a deliverable is "correct" if you have a set of pre-established requirements against which to assess "correctness". This principle has been widely accepted in the quality management discipline since 1978 when Philip Crosby published his book "Quality is Free." Crosby makes the point that "Quality is conformance to requirements."

Everest Responds:
Sorry about the "who", perhaps I should have said "how." While I accept the general principle, is it realistic? Crosby's statement begs the question, if quality is conformance to requirements, then who/how do we determine the correctness and completeness of the stated requirements? I have yet to see anything close to an a priori statement of requirements that was sufficient to judge the correctness of the end result, i.e., a domain data model. Furthermore, I have yet to see any guidelines sufficient for preparing a statement of requirements for a data modeling project. I would love to see any examples.
.. To me, the only satisfactory "statement of requirements" sufficient to judge the correctness of the model would be the final, detailed data model itself. Anything less than that would not be sufficient to express the full set of semantics to be included in the final model. In the case of ORM perhaps, a complete set of elementary fact sentences, with well defined object types and predicates. But that is what the entire process of data modeling is all about -- to discover and document the semantics of some user domain. We want to capture at least as rich a set of semantics as possible given our modeling scheme (which is why we need to use a modeling scheme such as ORM which captures many more semantics than any other scheme, including ER/relational).
.. So the question remains, whether we are talking about the data model, or the requirements for a data model -- how do we judge the correctness of a data model? Who is in the best position to do that?

2020-03-25

Is data modeling Design or Description?


Ken Evans asks (LinkedIn, Data Modeling, 2020/3/25)

In the book "Data Modeling: Theory and Practice" describes the result of his extensive research into the "Design or Description?" question. Graeme's research showed that whilst many people believe that a data modeler's job is to "describe" a reality that is out there, the truth is that data modelers are designers rather than describers.

Everest Responds:

I do not intend to contradict what Simsion said.  If modeling was purely a descriptive activity it would be easy, although we would have differing points of view, different interpretations.  It is the differences in these descriptions that means it is best considered a "design" activity -- precisely because there are many choices to be made in building the model.  The modeler CHOOSES how to represent that domain in building the model.  So the question remains, how do we know when we have a correct model?  Interesting that Simsion maintains that there is no one correct model, hence a design activity.  I agree with that, which is why I posed the question about the goal of data modeling  is to find THE correct model.
Or perhaps one would be open to finding A correct model! However, that still doesn't answer the question: How do we know it is A correct model?