44220: Database Design & Implementation Avoiding Database Anomalies

44220: Database Design & ImplementationAvoiding Database Anomalies Ian Perry Room: C49 Tel Ext.: 7287 E-mail: I.P.Perry@hull.ac.uk http://itsy.co.uk/ac/0506/sem2/44220_DDI/

Avoiding Database Anomalies • This lecture concentrates upon building a ‘robust’ Logical Data Model. i.e.: • Transforming a Conceptual Data Model into a set of Relations. • Checking these Relations for any Anomalies. • Documenting them as a Database Schema. • Most Database books have a section describing a mathematically-based technique called Normalisation: • I will show you a much easier way of achieving the same result, i.e. a robust database design.

What is an Anomaly? • Anything we try to do with a database that leads to unexpected and/or unpredictable results. • Three types of Anomaly to guard against: • insert • delete • update • Need to check your database design carefully: • the only good database is an anomaly free database.

Insert Anomaly • When we want to enter a value into a data cell but the attempt is prevented, as another value is not known. • e.g. We have built a new Room (e.g. B123), but it has not yet been timetabled for any courses or members of staff.

Delete Anomaly • When a value we want to delete also means we will delete values we wish to keep. • e.g. CoNo 351 has ended, but Room C320 will be used elsewhere.

Update Anomaly • When we want to change a single data item value, but must update multiple entries • e.g. Room H940 has been improved, it is now of RSize = 500.

A Conceptual Model M M 1 M Course Staff Student • Consider the following ‘simple’ conceptual data model: Staff(Staff-ID, Name, Address, ScalePoint, RateOfPay, DOB, ...) Student(Enrol-No, Name, Address, OLevelPoints, ...) Course(CourseCode, Name, Duration, ...)

The ‘Translation’ Process • EntitiesbecomeRelations • AttributesbecomeAttributes(?) • Key Attribute(s) becomePrimary Key(s) • Relationshipsare represented by additional Foreign Key Attributes: • for those Relations that are at the ‘M’ end of each 1:M Relationship.

The ‘Staff’ & ‘Student’ Relations Staff(Staff-ID, Name, Address, ScalePoint, RateOfPay, DOB, ...) becomes: Staff • Student(Enrol-No, Name, Address, OLevelPoints, ...) • becomes: • Student NB. Foreign Key Tutor references Staff.Staff-ID

The ‘Staff’ & ‘Course’ Relations Staff Course(CourseCode, Name, Duration, ...) becomes: Course • NB. Can’t add a Foreign Key; as BOTH Relations have a ‘M’ end: • I warned you about leaving M:M relationships in your Conceptual Data Model. • Must create an ‘artificial’ linking Relation.

‘Staff’, ‘Course’ & ‘Team’ Relations Staff Team Course • NB. In the ‘artificial’ Relation (i.e. Team): • The Primary Key is a composite of CourseCode & Staff-ID • Foreign Key CourseCode references Course.CourseCode • Foreign Key Staff-ID references Staff.Staff-ID

4 Relations from 3 Entities? Student Staff Team Course BUT - are they anomaly free?

Check Relations for Anomalies! • every Tuple unique? • no hidden meaning from location? • data cells atomic? • for Relations with single-attribute keys: • every Attribute depends upon the Primary Key? • for Relations with composite keys: • every Attribute depends upon all of the Composite Key?

What if the checks fail? • If any Relation fails ‘checks’: • especially those checking dependency. • we MUST split that Relation into multiple Relations: • until they pass the tests. • but MUST remember to leave behind a Foreign Key: • to ‘point’ forwards to the Primary Key of the ‘new’ split-off Relation.

Are they Anomaly Free? Student Staff Team NOT this one! as RateOfPay does NOT depend upon Staff-ID Course

The Attribute ‘RateOfPay’ depends upon ‘ScalePoint’ NOT ‘Staff-ID’. So, we need to split this Relation: Fixing this ‘Problem’ Staff Staff Pay NB. Foreign Key ScalePoint references Pay.ScalePoint

5 Relations from 3 Entities Student Staff Course an ‘artificial’ Relation - to ‘solve’ a M:M ‘problem’ Team a ‘split-off’ Relation - to ‘solve’ a Dependency ‘problem’ Pay

Don’t change Conceptual Model • Remember, we can chose from one of a range of Database Theories with which to build our Logical Data Model: • Hierarchical • Relational • Object • Each of these Database Theories may require different compromises (i.e. at the Logical Modelling stage); • from the ‘pure’ meaning captured by your Conceptual Model.

Document Relations as a Database Schema • A Database Schema: • defines all Relations, • lists all Attributes (with their Domains), • and identifies all Primary & Foreign Keys. • We should have ‘captured’ the Business situation (assumptions and constraints) in the Conceptual Data Model, e.g: • a College only delivers 10 Courses. • a Hospital only has 12 Wards. • These assumptions and constraints need to be expressed as the Domains of the Database Schema.

Logical Schema 1 - Domains • Schema College • Domains • StudentIdentifiers = 1 - 9999; • StaffIdentifiers = 1001 - 1199; • PersonNames = TextString (15 Characters); • Addresses = TextString (25 Characters); • CourseIdentifiers = 101 - 110; • CourseNames = Comp, IS, Law, Mkt, ...; • OLevelPoints = 0 - 100; • ScalePoints = 1 - 12; • PayRates = £14,005, £14,789, £15,407, ...; • StaffBirthDates = Date (dd/mm/yyyy), >21 Years before Today;

Logical Schema 2 - Relations • Relation Student • Enrol-No: StudentIdentifiers; • Name: PersonNames; • Address: Addresses; • OLevelPoints: OLevelPoints; • Tutor: StaffIdentifiers; • Primary Key: Enrol-No • Foreign Key Tutor references Staff.Staff-ID

Logical Schema 3 - Relations • Relation Staff • Staff-ID: StaffIdentifiers; • Name: PersonNames; • Address: Addresses; • ScalePoint: ScalePoints; • DOB: StaffBirthDates; • Primary Key: Staff-ID • Foreign Key ScalePoint references Pay.ScalePoint

Logical Schema ... • Relation Course • CourseCode: CourseIdentifiers; • Name: CourseNames; • … etc. • Continue to define each of the Relations in a similar manner. • All Relations MUST have a Primary Key. • Any Relation at the M-end of a 1:M Relationship MUST have a Foreign Key. • Make sure that you define ALL of the Relations, including: • ‘artificial’ ones (e.g. Team) • ‘split-off’ ones (e.g. Pay)

This Week’s Workshop • In this Workshops session we will; • test a logical data model; to ensure that it is anomaly free (i.e. robust), • practice documenting a Database Schema; based on a small conceptual model (as represented by an ER Diagram). • Examine a table of data: • Explain the ‘potential’ for insert, delete & update anomalies in a table of data. • Define what a ‘better’ set of tables (Relations?) to store the data look like? • Examine an ER Diagram: • Identify suitable Attributes for each Relation; as a minimum those that will act as the Primary & Foreign Keys. • Document as a Database Schema; starting with the Relations first, then coming back to document suitable Domains.

44220: Database Design & Implementation Avoiding Database Anomalies