Database Normalization and Denormalization. Trisha Cummings. What is Normalization?. In relational database theory
PowerPoint Slideshow about 'Database Normalization and Denormalization' - liam
An Image/Link below is provided (as is) to download presentation
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Normalization is the process of restructuring the logical data model of a database to eliminate redundancy, organize data efficiently and reduce repeating data and to reduce the potential for anomalies during data operations.
Data normalization also may improve data consistency and simplify future extension of the logical data model.
The formal classifications used for describing a relational database's level of normalization are called normal forms
A non-normalized database can suffer from data anomalies:
A non-normalized database may store data representing a particular referent in multiple locations. An update to such data in some but not all of those locations results in an update anomaly, yielding inconsistent data. A normalized database prevents such an anomaly by storing such data (i.e. data other than primary keys) in only one location.
A non-normalized database may have inappropriate dependencies, i.e. relationships between data with no functional dependencies. Adding data to such a database may require first adding the unrelated dependency. A normalized database prevents such insertion anomalies by ensuring that database relations mirror functional dependencies.
Similarly, such dependencies in non-normalized databases can hinder deletion. That is, deleting data from such databases may require deleting data from the inappropriate dependency. A normalized database prevents such deletion anomalies by ensuring that all records are uniquely identifiable and contain no extraneous information.
Normalized databases have a design that reflects the true dependencies between tracked quantities, allowing quick updates to data with little risk of introducing inconsistencies.
Instead of attempting to lump all information into one table, data is spread out logically into many tables.
Normalizing the data is decomposing a single relation into a set of smaller relations which satisfy the constraints of the original relation.
Redundancy can be solved by decomposing the tables. However certain new problems are caused by decomposition.
Normalization helps us to make a conscious decision to avoid redundancy keeping the pros and cons in mind.
One can only describe a database as having a normal form if the relationships between quantities have been rigorously defined.
It is possible to use set theory to express this knowledge once a problem domain has been fully understood, but most database designers model the relationships in terms of an "idealized schema". (The mathematical support came back into play in proofs regarding the process of transforming from one form to another.)
The transformation of conceptual model to computer representation format is known as Normalization.
Proposed the process of normalization and what came to be known as the 1st normal form:
Defined the first three normal forms
“There is, in fact, a very simple elimination* procedure which we shall call normalization. Through decomposition nonsimple domains are replaced by "domains whose elements are atomic (nondecomposable) values.“ E.F.Codd
Fifth normal form (5NF and also PJ/NF) requires that there are no non-trivial join dependencies that do not follow from the key constraints. A table is said to be in the 5NF if and only if it is in 4NF and every join dependency in it is implied by the candidate keys.
Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for On Line Analytical Processing (OLAP).
OLTP Applications are characterized by a high volume of small transactions such as updating a sales record at a super market checkout counter.
The expectation is that each transaction will leave the database in a consistent state.
By contrast, databases intended for OLAP operations are primarily "read only" databases.
OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate Business Intelligence applications.
Specifically, dimensional tables in a star schema often contain denormalized data.
The denormalized or redundant data must be carefully controlled during ETL processing, and users should not be permitted to see the data until it is in a consistent state.
The normalized alternative to the star schema is the snowflake schema.
Denormalization is also used to improve performance on smaller computers as in computerized cash-registers. Since these use the data for look-up only (e.g. price lookups), no changes are to be made to the data and a swift response is crucial.
The star schema (sometimes referenced as star join schema) is the simplest data warehouse schema, consisting of a single "fact table" with a compound primary key, with one segment for each "dimension" and with additional columns of additive, numeric facts.
The star schema makes multi-dimensional database (MDDB) functionality possible using a traditional relational database.
Because relational databases are the most common data management system in organizations today, implementing multi-dimensional views of data using a relational database is very appealing.
Even if you are using a specific MDDB solution, its sources likely are relational databases. Another reason for using star schema is its ease of understanding.
Fact tables in star schema are mostly in third normal form (3NF), but dimensional tables are in de-normalized second normal form (2NF).
If you want to normalize dimensional tables, they look like snowflakes (see snowflake schema) and the same problems of relational databases arise - you need complex queries and business users cannot easily understand the meaning of data.
Although query performance may be improved by advanced DBMS technology and hardware, highly normalized tables make reporting difficult and applications complex.
The snowflake schema is a variation of the star schema used in a data warehouse. The snowflake and star schema are methods of storing data which are multidimensional in nature (i.e. which can be analysed by any or all of a number of independent factors) in a relational database
The snowflake schema (sometimes callled snowflake join schema) is a more complex schema than the star schema because the tables which describe the dimensions are normalized.
If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snowflaking may be appropriate.
A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalised. A snowflake schema will hence be easier to implement.
A snowflake schema can sometimes reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organisation of the database.
Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria.
Some snowflaking would typically be required to permit simple query tools such as Cognos Powerplay to form such a query, especially if provision for these forms of query weren't anticpated when the data warehouse was first designed.
In practice, many data warehouses will normalize some dimensions and not others, and hence use a combination of snowflake and classic star schema.
In recognition that denormalization can be deliberate and useful, the non-first normal form is a definition of database designs which do not conform to the first normal form, by allowing "sets and sets of sets to be attribute domains" (Schek 1982). This extension introduces hierarchies in relations.
In mathematics, a function f from a set X to a set Y is said to be bijective if for every y in Y there is exactly one x in X such that f(x) = y.
Said another way, f is bijective if it is a one-to-one correspondence between those sets; i.e., both one-to-one (injective) and onto (surjective).
For example, consider the function succ, defined from the set of integers to , that to each integer x associates the integer succ(x) = x + 1. For another example, consider the function sumdif that to each pair (x,y) of real numbers associates the pair sumdif(x,y) = (x + y, x − y).
A bijective function is also called a bijection or permutation. The latter is more commonly used when X = Y. It should be noted that one-to-one function means one-to-one correspondence (i.e., bijection) to some authors, but injection to others. The set of all bijections from X to Y is denoted as XY.
Bijective functions play a fundamental role in many areas of mathematics, for instance in the definition of isomorphism (and related concepts such as homeomorphism and diffeomorphism), permutation group, projective map, and many others.