Denormalization Techniques: Balancing Performance and Data Redundancy

Denormalization
• Introduces redundancy in exchange for faster reads; the schema no longer guarantees integrity between the redundant copies.
• Denormalize when (a) specific queries occur frequently, (b) strict performance requirements must be met, and (c) the data is not heavily updated.
• Denormalize only when there is a very clear advantage to doing so, and document the reason carefully.

Typical denormalization techniques
(1) Flatten a repeating group into one table.
Instead of: EMP (E#, Ename), SKILL (E#, Skill)
Use: EMP (E#, Skill, Ename), when EMP has a small number of attributes.
- This is Method 2 of the 1NF algorithm. Be aware of the danger of this method, as discussed under multivalued dependencies (MVDs).
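The flattening in technique (1) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the column name eno stands in for E#, and the table name emp_flat and the sample rows are invented here, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: one row per employee, one row per (employee, skill).
    CREATE TABLE emp   (eno INTEGER PRIMARY KEY, ename TEXT);
    CREATE TABLE skill (eno INTEGER REFERENCES emp(eno), skill TEXT,
                        PRIMARY KEY (eno, skill));
    INSERT INTO emp   VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO skill VALUES (1, 'SQL'), (1, 'Java'), (2, 'SQL');

    -- Flattened design (hypothetical name): ename is repeated once per skill.
    CREATE TABLE emp_flat (eno INTEGER, skill TEXT, ename TEXT,
                           PRIMARY KEY (eno, skill));
    INSERT INTO emp_flat
        SELECT e.eno, s.skill, e.ename
        FROM emp e JOIN skill s ON s.eno = e.eno;
""")

# The frequent query needs no join on the flattened table.
sql_people = [row[0] for row in conn.execute(
    "SELECT ename FROM emp_flat WHERE skill = 'SQL' ORDER BY ename")]

# The cost: renaming one employee now touches one row per skill.
cur = conn.execute("UPDATE emp_flat SET ename = 'Smyth' WHERE eno = 1")
rows_touched = cur.rowcount
```

The update touches as many rows as the employee has skills, which is the redundancy the slide warns about.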

(2) Embed a stable code-interpretation (reference) table.
Instead of: FLIGHT (F#, Departs, From_Code, To_Code), CODE (Code, Airport_Name)
Use: FLIGHT (F#, Departs, From_AP, From_Code, To_AP, To_Code)
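A minimal sketch of technique (2), again using sqlite3. The column name fno stands in for F#, the table name flight_denorm is hypothetical, and the airport codes and names are made-up placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: airport names live only in the reference table.
    CREATE TABLE code   (code TEXT PRIMARY KEY, airport_name TEXT);
    CREATE TABLE flight (fno TEXT PRIMARY KEY, departs TEXT,
                         from_code TEXT, to_code TEXT);
    INSERT INTO code   VALUES ('AAA', 'Airport A'), ('BBB', 'Airport B');
    INSERT INTO flight VALUES ('F1', '09:00', 'AAA', 'BBB');

    -- Denormalized design (hypothetical name): names copied into each flight.
    CREATE TABLE flight_denorm (fno TEXT PRIMARY KEY, departs TEXT,
                                from_ap TEXT, from_code TEXT,
                                to_ap TEXT, to_code TEXT);
    INSERT INTO flight_denorm
        SELECT f.fno, f.departs, c1.airport_name, f.from_code,
               c2.airport_name, f.to_code
        FROM flight f
        JOIN code c1 ON c1.code = f.from_code
        JOIN code c2 ON c2.code = f.to_code;
""")

# Normalized lookup: two joins against the reference table.
normalized = conn.execute("""
    SELECT f.fno, c1.airport_name, c2.airport_name
    FROM flight f
    JOIN code c1 ON c1.code = f.from_code
    JOIN code c2 ON c2.code = f.to_code
""").fetchone()

# Denormalized lookup: a single-table read returns the same answer.
denormalized = conn.execute(
    "SELECT fno, from_ap, to_ap FROM flight_denorm").fetchone()
```

This only pays off while the reference data is stable: if an airport were renamed, every flight row carrying the old name would need updating.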

(3) Combine a 1:1 or 1:N relationship when (a) N is small and (b) the record on the "one" side is small, so the amount of redundancy stays small.
Instead of: SALE (S#, SPName, SaleDate), SALE_ITEMS (S#, Line#, Code, Qty)
Use: SALE (S#, Line#, SPName, SaleDate, Code, Qty)
- "How many T179's did we sell yesterday?" can then be answered without a join.
• Another example: Order_Item (O#, I#, C#, Cname, I_Desc, Qty, I_Price)
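The "How many T179's" query against the combined SALE table might look like the sketch below. The column names sno and line_no stand in for S# and Line#, the helper units_sold is hypothetical, and the rows and dates are invented sample data; a fixed date is used in place of "yesterday" so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Combined table: header columns (spname, sale_date) repeat on every line.
    CREATE TABLE sale (sno INTEGER, line_no INTEGER, spname TEXT,
                       sale_date TEXT, code TEXT, qty INTEGER,
                       PRIMARY KEY (sno, line_no));
    INSERT INTO sale VALUES
        (1, 1, 'Lee', '2024-03-01', 'T179', 2),
        (1, 2, 'Lee', '2024-03-01', 'B220', 1),
        (2, 1, 'Kim', '2024-03-01', 'T179', 3),
        (3, 1, 'Lee', '2024-03-02', 'T179', 5);
""")


def units_sold(code, day):
    """Total quantity of one item code sold on one day, with no join."""
    row = conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM sale "
        "WHERE code = ? AND sale_date = ?",
        (code, day),
    ).fetchone()
    return row[0]


t179_sold = units_sold("T179", "2024-03-01")
```

In the normalized design the same question would need a join between SALE and SALE_ITEMS, because the date and the item code sit in different tables.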

(4) Fold the other entity into the table when that entity is not interesting by itself.
Order (O#, ODate, OShipTerms, PmtTerms, Cname, CAddr)
(5) Replicate infrequently updated attributes to avoid a JOIN.
WORK_ON (ESSN, P_NUM, PName, Hours)

Problems of denormalization
• Makes rows longer
• Makes data transfer take longer
• Needs more memory for in-memory processing
• Causes redundancy and makes updates expensive

Adding redundant data
- Add summary attributes or derived attributes.
- Redundant relationships can improve performance at the cost of update overhead.
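One way to keep a summary attribute in step with its detail rows is a trigger. The sketch below assumes a hypothetical orders/order_item schema with a stored order_total column; none of these names come from the slides, and only inserts are handled (updates and deletes would need their own triggers).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (ono INTEGER PRIMARY KEY,
                         order_total REAL NOT NULL DEFAULT 0);  -- summary attribute
    CREATE TABLE order_item (ono INTEGER REFERENCES orders(ono), ino INTEGER,
                             qty INTEGER, price REAL,
                             PRIMARY KEY (ono, ino));

    -- Update overhead: every inserted item also rewrites its order row.
    CREATE TRIGGER order_item_ai AFTER INSERT ON order_item
    BEGIN
        UPDATE orders
        SET order_total = order_total + NEW.qty * NEW.price
        WHERE ono = NEW.ono;
    END;
""")

conn.execute("INSERT INTO orders (ono) VALUES (1)")
conn.executemany(
    "INSERT INTO order_item VALUES (?, ?, ?, ?)",
    [(1, 1, 2, 10.0), (1, 2, 1, 5.5)],
)

# Fast path: read the stored summary attribute.
stored_total = conn.execute(
    "SELECT order_total FROM orders WHERE ono = 1").fetchone()[0]

# Slow path: recompute from the detail rows, used here only as a check.
recomputed_total = conn.execute(
    "SELECT SUM(qty * price) FROM order_item WHERE ono = 1").fetchone()[0]
```

Reads of the total become a single-row lookup, while every write to order_item pays for an extra update.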

Schema translation
• Reduce the number of relations to JOIN by using mapped translation
• Handling null values
• Combine 1:1 relationships
• Relax participation constraints
• Divide a big table into two, if A and B are distinct in R(A, B)
• Ignore FDs based on co-occurring attributes that are not updated, e.g., ZIP --> CITY

Primary key
- Choose the most frequently used attributes.
- Prefer small attributes, because the key is used in indexes and in referential integrity checks.

Index
- Create a set of appropriate indexes to optimize queries. (This will be discussed more in the physical DB chapters.)
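As a small illustration of how an index changes query execution, the sketch below asks SQLite for its query plan before and after creating an index on the WORK_ON table from technique (5). The index name idx_work_on_pnum and the helper plan are assumptions made for this example, and the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE work_on (essn TEXT, p_num INTEGER, pname TEXT, hours REAL,
                          PRIMARY KEY (essn, p_num))
""")

QUERY = "SELECT essn, hours FROM work_on WHERE p_num = ?"


def plan(sql):
    """Return SQLite's query plan for sql as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


plan_before = plan(QUERY)  # no index leads with p_num yet

# Hypothetical index chosen for the frequent "who works on project X" query.
conn.execute("CREATE INDEX idx_work_on_pnum ON work_on (p_num)")

plan_after = plan(QUERY)   # the plan should now mention the new index
print(plan_before)
print(plan_after)
```

The primary key (essn, p_num) does not help a lookup by p_num alone, which is why the extra index is worth considering for that query.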
