Fundamentals of data warehousing
1 / 20

Fundamentals of Data Warehousing - PowerPoint PPT Presentation

  • Uploaded on

Fundamentals of Data Warehousing. Dr. Akhtar Ali School of Computing, Engineering and Information Sciences. Lecture Outline. Inmon’s Four Characteristics of a DW Subject-Oriented, Integrated, Time Variant and Non-Volatile Some Useful Definitions Issues in Data Warehousing DW Design

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Fundamentals of Data Warehousing' - haile

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Fundamentals of data warehousing

Fundamentals of Data Warehousing

Dr. Akhtar Ali

School of Computing, Engineering and Information Sciences

Lecture outline
Lecture Outline

  • Inmon’s Four Characteristics of a DW

    • Subject-Oriented, Integrated, Time Variant and Non-Volatile

  • Some Useful Definitions

  • Issues in Data Warehousing

    • DW Design

    • Extraction

    • Integration

    • Creation and Loading

    • DW Maintenance

    • Querying and Optimization

1 1 subject oriented





Data Warehouse

Subject Area

1.1 Subject-Oriented

Data is categorized and stored by business subject rather than

by application.






Operational Systems

1 2 integrated
1.2 Integrated

Data on a given subject is defined and stored once.











Subject = Customer

Data Warehouse

Operational Environment

1 3 time variant




Data for January


Data for February


Data for March

1.3 Time Variant

Data is stored as a series of snapshots, each representing a

period of time.



1 4 non volatile

Operational Databases

Warehouse Database

1.4 Non-Volatile

Typically data in the data warehouse is not directly updated or deleted.






2 1 what is an operational data store
2.1 What is an Operational Data Store?

  • An operational data store (ODS) is the point of integration for operational/transaction-oriented systems.

    • For example Banks typically have several independent systems set up to support different financial products e.g. loans, checking accounts, savings accounts etc.

    • The advent of ATMs helped push many banks to create an ODS to integrate current balances and recent transactional data from these separate accounts under one customer number.

    • Such ODSs are normally kept separate from a DW.

  • An ODS may be seen as a lowest layer of a DW for lower-management to access detailed as well as integrated data.

    • This means that an ODS may be seen as a front edge of a DW.

    • Such ODSs are normally kept as part of a DW.

2 2 what is a data mart
2.2 What is a Data Mart?

  • A data mart (DMT) is a logical subset of a complete DW.

    • A DMT is a complete “pie-wedge” of the overall DW pie.

    • A DW is a made up of the union of all its DMTs.

    • Some people take the definition literally. They create several independent DMTs to meet the needs of several departments.

    • Will everyone be happy? Well, maybe. There may be serious issues of integrating these DMTs together.

  • A DMT is an extension of a DW.

    • Data is integrated as it enters the DW. DMTs then derive data from the central source, the DW.

    • Each department gets its own DMT.

    • Each department determines which of the data warehouse contents are of interest .

    • These subject areas are then replicated into the smaller and local DMT so that users can get to the data they want with less interference from other departments.

2 3 what is on line analytical processing olap
2.3 What is On-Line Analytical Processing (OLAP) ?

  • OLAP is complementary to data warehousing.

  • OLAP embodies general activities of querying and presenting text and number data from DWs.

  • OLAP is based on dimensional modelling as opposed to entity-relationship (ER) modelling.

  • A dimensional model may contain the same information as an ER model but packages data in a symmetric form.

  • A dimensional model is geared towards user understandability and high performance query processing.

  • ROLAP (Relational OLAP)

    • A set of user interfaces and applications that give a relational database (RDB) a dimensional flavour.

  • MOLAP (Multi-dimensional OLAP)

    • A set of user interfaces, applications and proprietary database technologies that have a strong dimensional flavour.

2 4 what is data mining dm
2.4 What is Data Mining (DM) ?

  • DM is often defined as “finding hidden information” in a database.

  • Alternatively, DM is “exploratory data analysis, data driven discovery, and deductive learning”.

  • Data mining software is a class of tools that apply artificial intelligence techniques to the analysis of data.

  • Given access to data, DM tools dig through the data looking for patterns and discovering relationships that the user might have never suspected.

  • DM tools work against an operational database or a DW.

  • Since data in a DW is usually integrated and summarized it may be more efficient to use it for DM.

  • But a DM tool may find more useful information from an operational database (compared to a DW) as a DW usually hosts data to support anticipated DSS and may miss out data useful for DM.

3 1 warehouse design
3.1 Warehouse Design

  • Influenced by both maintenance and querying

  • Many trade-offs

    • Space vs. update time vs. query performance

  • Logical model of data

    • ER vs. Dimensional, Relational vs. OR vs. OO, ROLAP vs. MOLAP

  • Identify sources of data

  • Identify warehouse data – what to materialize?

    • Which summary tables?

    • Which fact/dimensional tables?

    • Which indices?

  • Choose software and hardware

3 2 data extraction
3.2 Data Extraction

  • Selecting relevant data from data sources (DSs) and moving it into DW.

  • DS types

    • Database (e.g. relational), flat file, WWW, XML, COBOL, etc

  • How to obtain the data?

    • Using data replication servers/tools

    • Dump file or Export tools


    • Third party Wrappers/Middleware/Agents

  • Other activities: data transformation, change detection (monitoring), cleansing etc.

3 2 1 monitors
3.2.1 Monitors

  • Detecting changes (of interest to a DW) in data sources and propagate to DW.

  • How?

    • Triggers

    • Replication servers/tools

    • Log Sniffer

    • Compare query results

    • Compare snapshots/dumps/exported data

3 2 1 data cleansing
3.2.1 Data Cleansing

  • Finds and removes duplicate tuples

    • For example Judie Harris Morris vs. Judie H. Morris

  • Detect inconsistent or wrong data

    • Attribute values that do not match (because of wrong data types or violating certain constraints e.g. for Gender attribute a value of ‘N’ meaning neither may be rejected).

  • Unreadable or incomplete data

  • Notify DSs of errors found during the cleansing process

3 3 data integration
3.3 Data Integration

  • Receive data (changes) from multiple wrappers/monitors/data cleansers and integrate into DW.

  • Often Rule-based

  • Actions

    • Resolve inconsistencies

    • Eliminate duplicates

    • Integrate into DW

    • Summarize data

    • Fetch more data from DSs

    • Notify users that DW is now up-to-date

3 4 1 warehouse loading
3.4 .1 Warehouse Loading

  • Includes all of the previous processes

  • Similar to loading/populating a database but complex due to heterogeneity of data and dealing with multiple DSs, possibly remote and external.

    • Building indices

    • Checking integrity constraints, etc.

  • Issue: huge volumes of data but small time window to complete the process.

  • Computation of additional data

    • Auxiliary data to facilitate DW maintenance and support/speed up querying and analysis.

3 4 2 warehouse creation
3.4.2 Warehouse Creation

  • A DW can be seen as a collection of materialized views (MVs) over DSs.

    • Contains a copy of data (collected from DSs) tailored to end-users.

  • Steps

    • Create DW schema (e.g. creating fact and dimensional tables, defining MVs)

    • Load warehouse

    • Start monitoring for changes at DSs

    • Update/Maintain DW as needed.

3 5 dw maintenance
3.5 DW Maintenance

  • DSs on which a DW is based may change over time.

  • Changes at DSs may require changes at a DW.

  • How often to propagate changes to a DW?

    • At night, weekly/fortnightly/monthly, immediately, etc.

  • Off-line or on-line

    • Most current vendor products take a warehouse off-line during maintenance

  • How to propagate changes to a DW?

    • Completely re-build all affected tables at the DW (easy but inefficient)

    • Apply changes to affected tables incrementally (efficient but difficult)

  • Read my paper about MOVIE or wait until we discuss this topic in detail.

3 6 querying and optimization
3.6 Querying and Optimization

  • Queries are long-running and complex

    • Multiple/Nested joins and aggregation

    • Usually “touch all tuples” kind of queries

  • Query language or analysis tools must support multi-dimensional operations

    • Pivot, Slice/Dice, Rollup, Percentile, etc

    • Standard SQL does not provide adequate operations

  • Solution: pre-compute partial answers and reuse

    • Drawback: it may increase DW maintenance

  • Emergence of warehouse management systems (WHMS e.g. ADMS, WHIPS)