Intro

I have collected Q&A topics since about 2010. These are being put onto this blog gradually, which explains why they are dated 2017 and 2018. Most are responses to questions from my students; some are my responses to posts on the LinkedIn forums. You are invited to comment on any post. To create a new topic post or ask me a question, please send an email to geverest@umn.edu, since people cannot post new topics on Google Blogspot unless they are listed as an author. Let me know if you would like me to list you as an author.
Showing posts with label data model. Show all posts

2020-04-25

Are we really modeling data?

Even the title of this blog is a little misleading.  Perhaps I should call it
Everest on "Data" Modeling!
I never did like the phrase "data" modeling. It suggests that it is a "model of data." That is misleading to someone outside of our community. At the heart of it, we are not modeling data; we are modeling some user domain. It is only a model of data if we have some data. Then the model would be an (abstract) "representation" of the data. For us, as data modelers, a "data" model is a model of some aspects of a domain of interest to a community of users, real (world) or imagined/desired/yet to be built. It is a model expressed "in data," that is, built using informational constructs, all guided by a modeling scheme. The modeling scheme tells us what to look for in the domain and how to represent it in the model. So we identify a population of similar things, give it a label and a definition, and put a box or circle into a diagram to represent that population of things. We build up or "design" a model with lots of types of things, add relationships among those things, and add constraints on those things and relationships. The modeling scheme tells us how to represent those relationships and constraints in our model.

2020-04-21

In a data model: nouns and predicates

John O'Gorman asks (LinkedIn Data Modeling 2020 April)

Why do data models only include nouns? Second, the word 'Status' is a noun, right? If I use it as the name of a set, could I include the words 'Active', 'Inactive', 'Stalled', and 'Inverted' as members of the set? If so, could I include them in a data model as Concepts even though they are clearly not nouns?

Ken Evans answers:

Not true. A proper data model has nouns and predicates that define the relationships between the nouns.

Everest responds:
First, for data modeling, I note that a noun implies a population of "things". Perhaps the hardest but most important part of building a data model is in defining the members of that population so we can always determine what is included and what is excluded from the population.
Second, a data model includes not only nouns and predicates (verb phrases) but also adjectives, understanding that an adjective serves to restrict the population of the noun. A noun qualified by an adjective names a subset of the noun population, e.g., Employee and full-time Employee.
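
As an illustration only, the noun/adjective point can be sketched in a few lines of Python. The names, the sample people, and the 35-hour rule for "full-time" are hypothetical assumptions made for this sketch; a real definition of "full-time" would have to come from the users.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    """A member of the population named by the noun 'Employee'."""
    name: str
    hours_per_week: int


def is_full_time(employee):
    """The adjective 'full-time' as a membership test (assumed rule: 35+ hours)."""
    return employee.hours_per_week >= 35


# The noun names a population of things.
employees = {
    Employee("Ann", 40),
    Employee("Raj", 20),
    Employee("Lee", 38),
}

# The adjective restricts that population to a subset.
full_time_employees = {e for e in employees if is_full_time(e)}
```

Whatever rule defines the adjective, the qualified population is always a subset of the noun population, which is why the definition of membership matters so much.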


2020-04-01

Thinking about Attributes or Properties

Kevin Feeney (LinkedIn, Data Modeling, 2020/03/24)
In his presentation on data modeling (https://lnkd.in/dnwYTEY), he says that an accurate data model defines Things, Properties of things, how things are Identified, and Relationships.

Everest says:
A caution when thinking about Properties: you cannot define an Attribute until you first have (or presume) a Relationship. An Attribute is a thing with a population (a domain of values). ORM therefore does not distinguish between the two; it calls them both Objects. For example, I could have a thing called a Skill Code, and Employees have Skills. That means there is a relationship between Employee and Skill. We often depict an Attribute as being tucked away in a box for the Employee. This naturally leads to (thinking about) putting it in a column in a table for the Employee entity. That can lead to problems. In ORM we defer thinking about tables since that is really a step toward implementation (in a Relational DBMS). It is better to think in terms of two objects, Employee and Skill, with a relationship between them. So here is the definition: An ATTRIBUTE (or Property) is an OBJECT which plays a ROLE in a RELATIONSHIP with another OBJECT. Now we can add cardinality to the relationship. In fact, in this example, if an Employee can possess multiple Skills there is an M:N relationship, and Skill cannot be stored in an Employee table (it would violate First Normal Form). But Skill is no less an attribute of Employee, even if it is not stored in the Employee table. That further reinforces the fact that an OBJECT has ATTRIBUTES by virtue of having RELATIONSHIPS with other OBJECTS. Hence, there is no need for an Attribute artifact in a data model.
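
A minimal sketch of this idea in Python follows, under stated assumptions: the class names, the `possesses` fact set, and the sample people and skill codes are hypothetical, and this is not ORM notation or any tool's API. It treats Employee and Skill as two objects and records the relationship between them as a separate set of facts rather than as a field inside Employee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    name: str


@dataclass(frozen=True)
class Skill:
    code: str


alice = Employee("Alice")
bob = Employee("Bob")
sql = Skill("SQL")
orm = Skill("ORM")

# The relationship "Employee possesses Skill" is a population of facts
# (pairs of objects), not a column tucked inside the Employee box.
possesses = {
    (alice, sql),
    (alice, orm),  # one Employee, many Skills
    (bob, sql),    # one Skill, many Employees, so the relationship is M:N
}


def skills_of(employee):
    """Skill viewed as an attribute of Employee, derived from the relationship."""
    return {s for (e, s) in possesses if e == employee}


def employees_with(skill):
    """The same relationship read in the other direction."""
    return {e for (e, s) in possesses if s == skill}
```

Here `skills_of(alice)` returns both of Alice's skills even though `Employee` has no skill field at all; the "attribute" exists only by virtue of the relationship.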

2020-03-29

How a data model is expressed

Fabian Pascal posts (LinkedIn, Data Modeling, 2020/03/29)
Shown a data model diagram, he says: "Actually, it is not a logical model, but a graphical representation of it that users understand -- the DBMS doesn't. Users understand the conceptual model SEMANTICALLY, the DBMS "understands" the true logical model ALGORITHMICALLY and that's not what your drawing is."

Everest responds:

Regarding model (re)presentation, the exact same model can be presented in a variety of ways: as a graphical diagram or in a formal linear form (for machine processing; I take it you call this "logical"). But I would hope and expect that the underlying semantics would be exactly the same. We build a data model, and it should not matter how it is expressed, as long as the expressions are equivalent. The semantics relate to the model, not how it is expressed (Conceptual?). Semantics is the meaning behind what is presented. Furthermore, all models are logical, that is, built according to some set of rules of logic and formal argument, no matter how they are presented or who reads them (check your dictionary). The rules in this case constitute what I call the modeling scheme.
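
To make the equivalence point concrete, here is a rough sketch in Python, not a definitive treatment. It assumes, purely for illustration, that a model is a set of (object, predicate, object) facts; the sample facts, the " | " separator, and the function names are all hypothetical. The same model is expressed once as linear text and once as a diagram-like structure, and both recover the same facts.

```python
# One model, held as a set of elementary facts (assumed form for this sketch).
model = {
    ("Employee", "possesses", "Skill"),
    ("Employee", "works in", "Department"),
}


def to_linear(facts):
    """Express the model as linear text, one fact per line."""
    return "\n".join(" | ".join(fact) for fact in sorted(facts))


def from_linear(text):
    """Recover the facts from the linear text."""
    return {tuple(line.split(" | ")) for line in text.splitlines()}


def to_diagram(facts):
    """Express the model as boxes (keys) with labeled lines to other boxes."""
    diagram = {}
    for subject, predicate, obj in facts:
        diagram.setdefault(subject, []).append((predicate, obj))
    return diagram


def from_diagram(diagram):
    """Recover the facts from the diagram-like structure."""
    return {(s, p, o) for s, edges in diagram.items() for (p, o) in edges}


# Two different expressions, one underlying model.
same_semantics = from_linear(to_linear(model)) == from_diagram(to_diagram(model))
```

The two expressions look nothing alike, yet each can be turned back into the identical set of facts, which is the sense in which the semantics belong to the model and not to its presentation.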

Who judges the accuracy of a data model?

Kevin Feeney says (LinkedIn, Data Modeling, 2020/03/29)
How do we know that we have a correct model? My general take would be that we know the model is correct, if the world moves and the model moves with it without breaking anything - if the model is wrong, you'll find duplication and other errors seep in. In terms of how you set the scope for the model and how you deal with systems integration and different views on data - generally you need to go upwards in abstraction and make the model correct at the top level. For example, when you find that two departments have different concepts of what a customer is, both of which are correct from their points of view, there are implicit subclasses (special types of customer) that can be made explicit to make everything correct.

Everest responds:

Kevin, I see two main problems with your viewpoint. (1) It sounds like you are saying design it, build it, and wait for problems to arise. That would be irresponsible and dangerous. Surely we need to judge correctness before we commit resources to implementation. Before implementing the model and building a database, we need to have some assurance that our model is an accurate representation of the user domain. (2) It sounds like you are depending on the designers/modelers to make judgments about model correctness. That is the last thing I would do. Too often I have found that the data modeling experts had only a superficial understanding of the user domain. They may be well versed in the modeling task, but that doesn't produce a good model. The best modeling tool in the world and the best modeling methodology would be insufficient to produce a "correct" data model. Rather than a "correct" model, I prefer to speak of striving for an "accurate" data model, that is, one that accurately represents the user domain. As Simsion argued and I agree, there is no single correct model. So, who best to judge?

So, who best to judge the "correctness" of a data model? I say, the USERS THEMSELVES. They are the ones who understand their world better than anyone else. But you have to get the right users to the table. I have led dozens of data modeling projects and we only go to implementation when ALL the user representatives sign off and say "Yes, this is an accurate representation of our world." If there are differences, the users must resolve them among themselves (with wise direction from a trained data modeler). One caveat: the users must thoroughly understand the data model, in all its glorious detail (not high-level). It is the responsibility of the data modeler to ensure the users collectively understand all the details of the model -- an awesome responsibility. That means the users must understand the model diagrams and all the supporting documentation, particularly the definition of the "things" (entities, objects), relationships (binary, ternary, and more), and all the associated constraints (e.g., cardinalities). Our goal is to develop as rich a representation as possible of the semantics of the user domain, and that means having a rich set of constructs to use in developing the model. So far, I see ORM as the richest modeling scheme.

The best way to make this happen is for the user representatives to be part of the modeling team. In fact, they should be the ones in control. Upper management needs to grant release time to those users most knowledgeable about their domain. An experienced data modeler needs to facilitate and guide the modeling process and the development of the data model. The team needs to be allowed to meet and deliberate as long as necessary to arrive at a model which they all feel comfortable approving. In my experience the users have always known when they were done (and ready to go to implementation), although the time it took was difficult to predict up front. In only one project were we unable to come to agreement, and that was because we had the wrong user representatives at the table. They were little more than data entry clerks who really didn't understand the meaning of the data, why it was important, or how it was used.

2020-03-25

Is a data model a "representation"? ...of what?


Ken Evans asks (LinkedIn, Data Modeling, 2020/03/25)
Hmm. Is a data model a "representation"?

Everest responds:

First of all I don't like the phrase "Data Model" (sorry "Ted" Codd).  It suggests that it is a "model of data."  That is misleading to someone outside of our community.  It is only a model of data if we have some data.  Then the model would be an (abstract) "representation" of the data.  For us, a "data" model is a model of some aspects of a domain of interest to a community of users, real (world) or imagined/desired/yet to be built.  It is a model "in data," that is, built using informational constructs, all guided by a modeling scheme.  The modeling scheme tells us what to look for in the domain and how to represent it in the model.  So we identify a population of similar things, give it a label and a definition, and put a box or circle into a diagram to represent that population of things.  We build up or "design" a model with lots of types of things, add relationships among those things, and constraints on those things and relationships.  The modeling scheme tells us how to represent those relationships and constraints in our model.  Graham Witt adds some light to this argument by calling it a "business data model."