Intro

I have collected Q&A topics since about 2010. These are being put onto this blog gradually, which explains why they are dated 2017 and 2018. Most are responses to questions from my students; some are my responses to posts on the LinkedIn forums. You are invited to comment on any post. To create a new topic post or ask me a question, please send an email to geverest@umn.edu, since people cannot post new topics on Google Blogspot unless they are listed as an author. Let me know if you would like me to list you as an author.

2020-04-23

All language is referential: Members of a population vs. identifiers

John O'Gorman posts (LinkedIn, 2020 April 22):

Since all language is referential I can declare membership of strings based on their usage in communication. I don't need attributes or properties to do so. For example, the string 'John Smith' looks to me to refer to a Person, so I make an ontological commitment to associate that string with that (Person) class. Doing so accomplishes a couple of things: 1. I can use that class anywhere I might want to reference members of that collection.

Everest responds:

John O'Gorman. "All language is referential" - love it and I agree absolutely. Let's remember that we are modeling/representing things in some user domain. We (the designers/modelers) are the ones who define the groupings into populations. A grouping may seem arbitrary, but it is chosen by the designer based on their purposes and how they wish to view the world. Such groupings do not naturally occur. The world only consists of instances of things. The designer, of course, uses clues in deciding the rules of membership, and those rules are usually based on the characteristics we observe in individual prospective members of a population and on what is of interest to us.
.. The task of the modeler is to design a model to (accurately) represent the user domain. The model is essentially an abstract representation, abstract because we use "tokens" to represent things and populations of things in that domain. The dilemma for us is that we must find some way to (uniquely) identify individuals (members of populations). You would not be comfortable being put into a data storage device and spinning around on the surface of a disk at 100 mph! So we need a token to serve as a surrogate for you. We call that an identifier. (Criteria for choosing an identifier are a topic for another discussion.) Its form is usually some string of characters.
.. The trap we fall into is thinking that a character string, a particular surrogate token, IS the person.
It is not the string "John Smith" you want to associate with the class (or population), it is the actual person. You need to have confidence that the string uniquely identifies or references the person. The operation of our systems and databases depends on it. This is why we need to have a careful definition of the population of things we use in developing our models.
.. Defining populations and the criteria for inclusion and exclusion (and choosing identifiers) are the critical and difficult tasks of a "data" modeler.
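
The distinction between an identifier and the thing it stands for can be illustrated with a small sketch. This is only one possible reading of the argument above, not a prescribed design: the names Person, Population, admits, enroll, and find_by_name are hypothetical, the birth-year rule is an arbitrary membership criterion invented for the example, and a sequential integer is assumed as the surrogate identifier.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Person:
    """Stands in for the real-world individual; the name is only an attribute."""
    name: str
    birth_year: int


@dataclass
class Population:
    """A designer-defined grouping: a membership rule plus surrogate identifiers."""
    label: str
    admits: Callable[[Person], bool]  # hypothetical inclusion/exclusion criteria
    members: Dict[int, Person] = field(default_factory=dict)
    _next_id: count = field(default_factory=lambda: count(1), repr=False)

    def enroll(self, person: Person) -> int:
        """Admit a person if the rule allows it and return the assigned identifier."""
        if not self.admits(person):
            raise ValueError(f"{person.name} does not meet the criteria for {self.label}")
        identifier = next(self._next_id)
        self.members[identifier] = person
        return identifier

    def find_by_name(self, name: str) -> List[int]:
        """A name may match zero, one, or many members, so return every match."""
        return [i for i, p in self.members.items() if p.name == name]


# Assumed rule for illustration: the designer decides who counts as an "Adult".
adults = Population("Adult", admits=lambda p: p.birth_year < 2002)

first = adults.enroll(Person("John Smith", 1960))
second = adults.enroll(Person("John Smith", 1985))  # a different individual, same name

print(adults.find_by_name("John Smith"))  # → [1, 2]
print(first != second)                    # → True
```

The string "John Smith" matches two members, so it cannot by itself be what we associate with the population. The identifier assigned at enrollment is what references each individual, and the membership rule is the designer's choice rather than something found in the world.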
