Intro

I have been collecting Q&A topics since about 2010. They are being put onto this blog gradually, which explains why they are dated 2017 and 2018. Most are responses to questions from my students; some are my responses to posts on the LinkedIn forums. You are invited to comment on any post. To create a new topic post or to ask me a question, please send an email to geverest@umn.edu, since people cannot post new topics on Google Blogspot unless they are listed as an author. Let me know if you would like me to add you as an author.

2017-12-20

Can we avoid the need for normalization entirely?

Normalization is the Achilles heel of data modeling. Even experienced data modelers get it wrong sometimes. Wouldn't it be nice if we had a data modeling scheme which did not require the modeler to think about and apply the rules of normalization? How might that be possible?

Any form of record-based modeling (e.g., Relational, ER, IDEF1X, UML...) requires the enforcement of normalization rules. In record-based modeling, data items are clustered together in records (or tables) to describe a particular entity type; the data items are then called "attributes" of the entity. The problem is that sometimes we put attributes into a table when we shouldn't. The rules of normalization help us identify which attributes violate the rules and hence must be removed from that table (and probably put into some other, often new, table).
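To make that concrete, here is a minimal sketch in Python (the table and column names are hypothetical, not from any particular design). Everything below is declared as an attribute of one Employee record, even though dept_location really describes the department, not the employee -- exactly the kind of attribute normalization would flag.

```python
# A minimal sketch (hypothetical table and column names) of the clustering
# problem: all data items are declared as attributes of one Employee record.

from dataclasses import dataclass

@dataclass
class Employee:                 # one record type / table
    emp_no: int                 # the identifier (key)
    name: str                   # determined by emp_no -- belongs here
    dept: str                   # determined by emp_no -- belongs here
    dept_location: str          # determined by dept, not emp_no -- does not belong

e = Employee(emp_no=1001, name="Ada", dept="R&D", dept_location="Oslo")
print(e)
```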

Unfortunately, normalization is necessary but not sufficient for a good design. The rules of normalization are applied to only a single record/table at a time. There may be inter-record relationships which require further examination to ensure a good design. Another problem is that you must first determine or designate the identifier before you can apply the rules of normalization. That is because normalization is all about what "determines" each attribute. To paraphrase Chris Date (as widely recounted): "every data item (in a record) must be single valued (for each value of the identifier key) [1NF], must be functionally dependent on the key, the whole key (if composite) [2NF], and nothing but the key (i.e., directly dependent, with no transitive dependencies) [3NF]... so help me Codd!"
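To make the paraphrase concrete, here is a minimal sketch in Python (with hypothetical column names and made-up data) of the test behind it: a set of columns X functionally determines Y when no two rows agree on X but differ on Y. In the order-line table below, whose identifier is the composite key (order_no, product_no), qty passes the "whole key" test, while product_name depends on product_no alone -- the partial dependency that 2NF tells us to remove.

```python
# A small illustration (made-up order-line data) of the functional dependency
# test: Y is functionally dependent on X if no two rows agree on X but differ on Y.

def functionally_determines(rows, x_cols, y_cols):
    """Return True if the columns in x_cols determine the columns in y_cols."""
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in x_cols)
        y = tuple(row[c] for c in y_cols)
        if x in seen and seen[x] != y:
            return False        # same determinant value, different dependent value
        seen[x] = y
    return True

# An order-line table with the composite key (order_no, product_no).
order_lines = [
    {"order_no": 1, "product_no": "A", "qty": 2, "product_name": "Widget"},
    {"order_no": 1, "product_no": "B", "qty": 5, "product_name": "Gadget"},
    {"order_no": 2, "product_no": "A", "qty": 1, "product_name": "Widget"},
]

# qty depends on the whole composite key -- fine under 2NF.
print(functionally_determines(order_lines, ["order_no", "product_no"], ["qty"]))
# product_name depends on product_no alone -- a partial dependency, so keeping
# it in this table violates 2NF.
print(functionally_determines(order_lines, ["product_no"], ["product_name"]))
```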

Normalization is the test, and record decomposition is the remedy for violations (i.e., removing data items from the record). The need for normalization can be avoided if we don't cluster data items into records in the first place. That is the essence of fact (oriented) modeling, such as ORM. The inverse of functional dependency is determinacy. In a normalized record, the determinant of each attribute is the key or identifier. In fact modeling, we directly relate each data item with its determinant. That gives us an unambiguous way to store data items in records or tables without violating any of the rules of normalization. In a sense, a fact model is the ultimate decomposition of a (record-based) database, such that each record now has at most one non-key domain. Some have called this sixth normal form (6NF), but beware: there are other definitions of 6NF.
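Returning to the hypothetical Employee example above, here is a rough sketch of that ultimate decomposition: each non-key data item is paired directly with its determinant, so the transitive dependency that 3NF would otherwise have to catch simply cannot be expressed.

```python
# A minimal sketch (same hypothetical names as above) of fact-oriented storage:
# each data item is recorded with its determinant, one non-key domain per table.

# Record-based design: one wide Employee record, normalization required.
employee_record = {"emp_no": 1001, "name": "Ada", "dept": "R&D", "dept_location": "Oslo"}

# Fact-oriented design: each data item paired directly with its determinant.
employee_name = {1001: "Ada"}       # emp_no determines name
employee_dept = {1001: "R&D"}       # emp_no determines dept
dept_location = {"R&D": "Oslo"}     # dept determines location (not emp_no)

# Because dept_location is recorded against its true determinant (dept), the
# transitive dependency emp_no -> dept -> location never appears inside a
# single record; there is nothing left for the normalization rules to catch.
print(dept_location[employee_dept[1001]])   # prints: Oslo
```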

So the answer to your question is yes... if we design our data models using fact modeling we can avoid the need for normalization entirely!

Comments on any post are always welcome. I thrive on challenges, and it will be more interesting for you.