350 likes | 591 Views
Databases and Database Design. William A. Yasnoff, MD, PhD Oregon Health Division. Databases & Database Design. Introduction Relational Databases Database Design: Data Models. Introduction. Database = large collection of information Stored on hard disk Retrieval
 
                
                E N D
Databases and Database Design William A. Yasnoff, MD, PhD Oregon Health Division
Databases & Database Design • Introduction • Relational Databases • Database Design: Data Models
Introduction • Database = large collection of information • Stored on hard disk • Retrieval • 100,000 items X 10 ms = 17 minutes • Need to organize data to improve retrieval speed • How do you organize your paper files?
Introduction (continued) • Data Organization • sorting data: phonebook in alpha v. random order • “tabs” at key points: dictionary with tabs for each letter • direct pointers to information = index • table of key element value and address of other data • e.g. name & mailbox number --> letters in mailbox
Relational Database Model • Developed by EF Codd, CJ Date (70s) • Table = Entity = Relation • Table row = tuple = instance • Table column = attribute • Table linkage by values • Entity-Relationship Model
Attributes • Key • uniquely identify row/tuple • may be one or more attributes • Non-key • other properties of instance • dependent on key
Retrieval in RDMS • SQL • Select ... • From ... • optional: can be computed • Where ... • Query by example • Fill in the blanks • WONDER
Advantages of RDMS • Very strong theoretical basis • storage • retrieval • Easy to implement • Conflicts and anomalies can be avoided • Intuitive appeal • Easy retrieval
Relational Model: Summary • popular method of organizing data • strong theoretical properties facilitate retrieval • relation = table • attribute = column • tuple = row • key = {attributes} that uniquely identify each row
Relational Model: Summary 2 • Allows complex data relationships with multiple tables: • {patient id, patient demographics} • {patient id, patient visit date, blood pressure} • Structured Query Language (SQL) retrieval • Select patient _name where blood_lead_level > 10
Data Models • Definition: A representation of the data and and data relationships of an activity • data: case report of serious E. Coli illness • data: E. Coli serotype O157 • relationship: case report “contains” serotype (semantic relationship) • Database Design = Development of a quality data model
Correctness Conceptual (concepts represented properly) real world representation Syntactic (relationships represented properly) real world language Completeness (wholeness) Conceptual (all concepts represented) Syntactic (all relationships represented) Data Model Quality
Entity • “Any distinguishable object that is to be represented in a database” [C.J. Date] • Properties • within scope of model • single concept • a set of distinguishable “instances” • satisfies normalization rules • Also called a relation
Entity Guidelines • Relationship with at least one other entity • Unique, descriptive name • Class or set of things (not just one) • Single meaning (no homonyms) • No synonyms (two entities describing same class)
Attribute • “a type or characteristic of an entity” (e.g. “gender” is an attribute of the entity “patient”) • Domain = the set of values from which an attribute may be selected (e.g. the domain of the attribute “gender” is [male, female]) • An entity typically has many attributes
Attribute Characteristics • Key • value uniquely identifies entity • e.g. “Lab test ID” is key attribute of “specimen” • Non-Key • value does not uniquely identify entity • e.g. “author” does not uniquely identify “publication”
Attribute Characteristics • Atomic = individual data value (one and only one fact) • Description = complete and clear definition • e.g. “professional privilege date” = The date on which a health care professional is granted privilege to practice in a particular health care facility, establishing the provider’s eligibility for patient care assignments and liability coverage
Attribute Guidelines • Unique name • no plurals, possessives, articles, conjunctions, verbs, or prepositions • Clear, complete, unambiguous description • Atomic (no positional information) • Domain with 2 or more values • Originates in only one entity
Primary Key • Attribute of an entity whose values uniquely determine its occurrences • {birth certificate number, person name, birth date, mother name, physician name} • {facility, patient name, physician name, date, temperature, pulse, blood pressure}
Primary Key Characteristics • Stable: does not change over time • Minimal: fewest attributes necessary • Factless: no hidden information • Definitive: value always exists • Accessible: available when data created • Unique: absolutely no duplicates
Relationship • Semantic: “contains”, “is part of”, “belongs to” • One-to-one • serotype of an organism • immunization status of a child • One-to-many • antibiotic resistances of an organism • vaccines administered to a child
Relationship Guidelines • No circular references (e.g. “health plans -> markets -> products -> health plans” should be “health plans -> health plan markets -> market products <- health plan products <- health plans”) • Single relationship between two entities • No recursive relationships
Single Relationship of Entities • Problem: employee <--> job assignment <--> job [double circular] • Solution: person --> job assignment <-- job [“person” includes “status” as employee or contractor]
Recursive Relationships • Problem: supervisor <--> employee [may be circular] • Solution: person {person ID } --> employee-supervisor relationship {employee ID, supervisor ID} • multiple supervisory roles for each person
Normalization • Formalization of common sense rules of information organization • An attribute is functionally dependent on X only if each of its values is determined by the value of X (X may be composite) • Example: DOB is functionally dependent on Driver’s License number
Key Normalization Concept • Functional dependence of each entity must be • based on entire primary key • NOT based on any other attributes
Benefits of Normalization • Aids in database design, integration • Ensures precise capture of business logic • Minimizes redundancy • Minimizes need for null values • Prevents • information loss • unintentional results
Summary of Normalization • One Fact in One Place
Common DB Design Errors • Multiple instances in same row of table, e.g. first_value, second_value, third_value [Problem: what to do with 4th value?] • Same data item repeated in multiple places, e.g. address appears in two different tables [Problem: how to keep two values synchronized?]
Correctness Conceptual (concepts represented properly) real world representation Syntactic (relationships represented properly) real world language Completeness (wholeness) Conceptual (all concepts represented) Syntactic (all relationships represented) Data Model Quality
Database Design Pearls • Accommodate all data needed • Correct relationships between data items • No duplicate representation • Anticipated retrievals use indexes • Meet confidentiality requirements
References - 1 • Reingruber MC & Gregory WW: The Data Modeling Handbook (New York: John Wiley & Sons, 1994) • Montgomery SL: Object-Oriented Information Engineering (Boston: AP Professional/Harcourt Brace, 1994)
References - 2 • Codd EF: The Relational Model for Database Management (Reading, MA: Addison-Wesley, 1990) • Date CJ: An Introduction to Database Systems, 5th ed. (Reading, MA: Addison-Wesley, 1990) • Duncan KA: Health Information and Health Reform (San Francisco: Jossey-Bass, 1994)