E N D
1. Data Modelling Contentious Issues The Best Debates from the Data Modelling List
January 2005
Karen Lopez
InfoAdvisors.com
2. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 2 Karen López, I.S.P. Karen is the moderator of the Data Modelling List. She has 18 years of data modelling experience on large, multi-project programs
She has a B.Sc. in Computer Technology / Information Systems from Purdue University
She is a former President of the Information Resource Management Association of Canada (IRMAC)
3. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 3 Upcoming Speaking Engagements
DAMA / Meta Data Symposium
May 2005
Toronto IRMAC
Spring 2005
4. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 4 About this Presentation We will be using an interactive format - you will be participating in informal polls about data modelling issues and best practices.
This is not an introductory presentation - a good knowledge of data modelling issues will be assumed.
5. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 5 InfoAdvisors’ Discussion Groups E-mail, web, and newsgroup based discussion group
Data Modeling, Frameworks, Tools Groups
Over 8000 subscribers
Moderated
No Charge
www.infoadvisors.com
6. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 6 E-mail
7. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 7 Web
8. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 8 Newsgroups
9. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 9 Agenda Contentious Issues
Background
Discussion Quotes
Poll
Results & Analysis
Resources
10. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 10 Contentious Issues Near-religious discussions and debates
People rarely change their minds based on this discussion
The most successful discussions are ones where both sides learn something new about the other viewpoint.
11. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 11 Conceptual Data Models
Do you do them?
Are they used? How?
Just what is one?
How do they differ from other Data Models?
12. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 12 Conceptual Data Models What is a CDM?
Entity only, no cardinality, fewer than 15 entities
Entity only, cardinality, more than 15 entities
Entity only, one Entity for every entity in the LDM
Entity & Attributes, no physical, no surrogate keys
Entity & Attributes, no DBMS-specific issues
13. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 13 Conceptual Data Models – View 1 The conceptual model is concerned with the real world view and understanding of data; the logical model is a generalized formal structure in the rules of information science; the physical model specifies how this will be executed in a particular DBMS instance.
Duncan Dwelle, AIS Intl. http://www.aisintl.com/case/CDM-PDM.html
14. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 14 Conceptual Data Models – View 2 “A [CDM], typically called an Entity Relationship Diagram (ERD), contains business information and structure without regard to physical storage concerns. Business items become entities and you describe them with attributes. You formalize relationships…. Typically you have no index, foreign key or tablespace information.
…
A Logical Data Model (LDM) is sometimes skipped over. It is a hybrid between the CDM and PDM and contains some DBMS elements like denormalization and indexes, but isn't as detailed as the PDM.”
- Michael N.
15. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 15 Conceptual Data Models – View 3 “[CDM is] a tricky phrase to define in that our industry uses this same phrase for two very different (in my opinion) ways. If you come from the Information Engineering background, a CDM is an entity and relationship-only method (no attributes and often no cardinalities to the relationships) that has a purpose of defining the scope of smaller, more detailed logical data modelling projects. Those logical data models are not constrained by organizational, platform, or other technical issues.If you come from an ORM background, CDM is much like Mike has described [previously]. As he mentions, in this case Logical Data Modelling is similar to a first cut physical data model.
- Karen Lopez
16. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 16 Conceptual Data Models – Warm up Vote Do Conceptual Data Models have Attributes?
Do they have cardinality?
Are there fewer Entities in a CDM than in an LDM?
If they have attributes, can they have surrogate keys?
17. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 17 Vote! Ok, Here’s How we vote
I post a question
The question has a Range.
You put the sticky in the area for your vote (1-5), sometimes once for PDM and once for LDM.
We debate the answer and the results
18. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 18 Conceptual Data Models - Vote Do you create Conceptual Data Models …that are actively used?
Yes, it’s up-to-date, easily available, and used at least a couple of times a year
What’s a Conceptual Data Model? We don’t need no stinkin’ CDMs!
19. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 19 Conceptual Data Models TRAP!You may find yourself debating a topic because there is no common definition of a model or object, not because there is a real disagreement.
20. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 20 Do we need Classwords? A traditional naming convention
Usually means there’s a standard classword list
Some tools can check for standards compliance
Examples: Date, Amount, Count, Quantity
21. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 21 Attribute Names Customer First Name
Customer
Backordered Specialty Order Item ID
Item ID
Item
22. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 22 Classwords - Vote Does a good LDM / PDM data model use classwords?
Always
Classwords are so obsolete….
23. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 23 Keys - Natural or Surrogate Just what you call them may show your true colours....
Surrogate, Non-intelligent, Unnatural, Dataless or Meaningless
Natural, Intelligent, Normal, Cluttered
24. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 24 Keys – Natural or Surrogate
25. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 25 Keys - Natural or Surrogate – View 1 I have inherited many [tables] that have used business elements for keys. The complexity of the keys for some tables is horrendous…Another thing I hate about this method is putting business information into the PK. This leads to the business changing either the contents of, or the definition of, a key column. …You should never change PK values….
With single column PK, you have single column FKs and easier maintenance of the system over time. You may have to perform more joins to satisfy complexqueries, but the joins are simpler. … I have implemented surrogate keys as a "strongly recommended“ guideline for all future database development.
- Michael N.
26. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 26 Keys - Natural or Surrogate – View 2 I agree wholeheartedly with Mike, but with one exception. I think reference tables would be better retaining the business elements in the primary key since they're usually standalone tables not used in joins. On a rather large application we set the ref tables up with sequence numbers as the primarykey. Quickly some developers began hard coding them to refer back to the row. This caused us much difficulty
- Thomas Z.
27. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 27 Keys - Natural or Surrogate - Vote What best describes your approach in an LDM / PDM?
Surrogate Keys? We don’t need surrogate keys - we have natural identifiers.
Every entity deserves its own surrogate key.
28. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 28 Derived and Redundant Data Dilemma Not all derived data is calculated
Snapshot versus History
History: Just maintain a foreign key (or relationship) back to the timestamped data
Snapshot: Copy the data to another location to preserve the data at that point in time.
29. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 29 Historical (no derived data)
Derived and
Snapshot
Derived Data Dilemma
30. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 30 Derived Data Dilemma - Vote Do Derived Attributes belong in an LDM/PDM?
Never
All the time - pure logical data modelling doesn’t work in the real world.
31. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 31 Derived Data Dilemma TIP!Every decision is ultimately made based on cost, benefit, and risk. Be prepared to analyze all three.
32. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 32 Abstraction / Generalization A modelling design decision
Can be very flexible
Can be very difficult to understand
33. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 33 Abstraction
34. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 34 Abstract or Specific - View “Although your model represents a higher level of abstraction than most people would create it does not violate the rules of normalization. So, it is not technically speaking incorrect. Now come the issues. What is the purpose of the model? Are you documenting the business requirements and rules? If so your approach does not provide as much information about the business as a less abstract approach would…If you are trying to model a flexible environment where the required data (represented by the repeating groups) changes frequently then your model would be more appropriate.”
- Rick B.
35. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 35 Abstract or Specific Which style are you most likely to use in an LDM / PDM?
The more flexible (Abstract) the better
The more precise (Specific) the better
36. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 36 To Party or Not to Party Party, Party Role, Party Type, Party Category
Migration of Party ID
Use of Subtype “Owned Keys”
Universality of Party
37. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 37 To Party or Not to Party Party
Non-Party
38. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 38 To Party or Not to Party It is so unfortunate that there is still a mentality that we need to model the simple case only. I personally think that once you understand the party, party role model it is simpler. That simplicity reinforces itself as you are able to apply the same pattern to lots of different situations and environments.
- George P.
39. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 39 To Party or Not to Party The problem I find with PARTY ROLE is that it is derivable information. This is purely a classification of an individual body based on other data that must be available. Otherwise if I identify you as my “customer” what business meaning does it have? There must be one or more “orders”, or “contracts”, or “correspondences”, or “contacts”, or whatever I deem necessary for it to be sensible for you to be known as my customer. It’s the ORDER that makes you my customer, not being my customer that makes me able to take your ORDER.
- Mike V.
40. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 40 To Party or Not to Party - Vote Which is the best approach to PARTY in an LDM /PDM?
It’s the only way to be fully logical
It’s too academic, too theoretical and it just won’t work
41. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 41 DA – Lawyer or Architect? Several new legislative actions focus on data quality, integrity, reliability, availability…
Sounds like good DA work, right?
42. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 42 Examples Sarbanes-Oxley (Sarbox): Financial Accountability & Reliability
Anti-Spam Legislation: Data integrity, Reliability
Privacy: Quality, Reliability, Accountability…
43. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 43 DA Role in Compliance Finally, a reason for the Execs to support us…
Hey! I don’t even have time write good definitions…
44. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 44 Did UML kill the Data Modelling Star? Combines data and process into one diagramming technique
Some tools support ERDs as well as UML models
Many DA’s are being pressured to choose between UML and ERDs.
45. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 45 UML Class Diagram
46. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 46 UML - Vote? Is UML an acceptable replacement for LDM?
Never – 2 different things
UML is the future, we might as well switch now….
47. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 47 Generic Data Models Generic Data Models are models or subsets of models that an organization can purchase.
They are sometimes prepared for a specific industry
May be very generic, may be both.
48. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 48 Generic Data Models - Vote What do you think about Generic Data Models or Patterns?
I love them and won’t ever start a project from a blank page again.
They are so high level that I don’t see the value in purchasing them.
49. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 49 Who Gets to Update the Model? Many marketing pitches portray a team of developers, modelers, DBAs, etc., happily working on the model
Is it “More hands make for less work” or “Too many cooks spoil the soup”?
50. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 50 Who Gets to Update the Model Only seasoned Data Modeling Professionals
Everybody – we’re all professionals here
51. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 51 Who calls the shots in a Logical Data Model? Many stakeholders in a LDM
Not everyone shares the same understanding about the purpose of the LDM
DM tools can greatly influence the decision
DA’s can report through a variety of departments
52. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 52 Who calls the shots in a Logical Data Model? The FINAL say on LDM / PDM decisions belongs to:
The Data Architect/Modeller
The Project Manager
The User…Customer
The DBA
The CIO/CEO
53. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 53 XML – the next Data Specification XML is a type of markup language for data
Similar to HTML, but carries meaning as well as format metadata
Many who use EDI are looking toward XML
Many vendors are jumping onto this ‘standard’
Still requires establishing meaning.
54. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 54 XML <?xml version="1.0" encoding="iso-8859-1" ?> -
<clientapp>
<global>
<string phone_number="*8369464" />
<string user id=“TestUser”/>
<string dir_html="html" />
<string dir_language="english" />
<string filename_html_connected="connected.htm" />
<string filename_html_targetfile="target.htm" />
</global>-
55. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 55 XML – the next Data Specification I have heard from a wide variety of sources that XML means that data modelling is dead -- who needs a data model when you can create a DTD and call the piece of data anything you want? All you have to do is convince everyone else to call it the same thing...
- Karen Lopez
56. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 56 XML – the next Data Specification XML is very simple. It is a facility for defining your own tags in transmitting documents. (With HTML you have to use theirs.) The implication of this is that the browser/person receiving the document has to know how to interpret your tags. This is the fun part. Some organizations have begun to define a set of tags to serve their purposes -- mathematicians, chemists, metadata gurus, etc.Alternatively, if you send the "Document Tag Definition (DTD)" with your data, it can be used to decode it, at least syntactically.
- David H.
57. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 57 XML – the next Data Specification
I see XML as a new medium for presenting data. To effectively present the data it needs to be understood and this is the role of the data model. Do we have the fundamental meaning of our data understood and represented in a quality data model? If yes then the transformation of the requirements into the presentation by XML will be much more effective. ….
Enough said for now on how I feel that the data model and XML need to correlate data requirements.
- Paul E.
58. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 58 XML - Vote? Will XML replace data models?
Never – 2 different things
XML is the future, we might as well switch now….
59. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 59 Is that with one “L” or Two? How is it spelled?
Modeling/Modeler
Modelling/Modeller
It depends on my mood…
60. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 60 Online Data Modelling Resources InfoAdvisors (http:///www.infoadvisors.com)
The Data Model List
Product User Discussion Groups (ERwin, Advantage Repository, Visible Analyst/Advantage, ER/Studio, DBArtisan, CaseWise)
The Data Administration Newsletter (http://www.tdan.com)
61. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 61 Recommended Books Data Modeling Essentials, Graeme Simsion, 2004
Building Quality Databases with IDEF1X, Thomas Bruce, 1992, Dorset House, ISBN 0-932633-18-8
The Data Modeling Handbook, Michael Reingruber & William Gregory, 1994, Wiley QED, ISBN0-471-05290-6
A Practical Guide to Logical Data Modeling, George Tillmann, 1993,McGraw-Hill, $45, ISBN 0-07-064615-5
62. 7 Mar 01Jan 2005 © 2001 InfoAdvisors 62 Recommended Books Data Model Patterns: Conventions of Thought, David C. Hay, 1996, Dorset House, ISBN 0-9326333-29-3
The Data Model Resource Book, Silverston, Volumes 1 and 2
63. Jan 2005Jan 2005 © 2005 InfoAdvisors 63 InfoAdvisors 11066 Sheppard Ave East
Toronto, ON CANADA
karenlopez@infoadvisors.com
http://www.infoadvisors.com
8000+ IRM participants subscribed to several discussion groups