O n l ine a nalytical p rocessing
Download
1 / 68

O N- L INE A NALYTICAL P ROCESSING - PowerPoint PPT Presentation


  • 149 Views
  • Uploaded on

O N- L INE A NALYTICAL P ROCESSING. Objectives. What is OLAP Need for OLAP Features & functions of OLAP Different OLAP models OLAP implementations. Demand for OLAP. To develop DM, three approaches In all approaches, Data Marts rest on Dimensional Model

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' O N- L INE A NALYTICAL P ROCESSING' - keith


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
O n l ine a nalytical p rocessing

ON-LINE ANALYTICAL PROCESSING


Objectives
Objectives

  • What is OLAP

  • Need for OLAP

  • Features & functions of OLAP

  • Different OLAP models

  • OLAP implementations


Demand for olap
Demand for OLAP

  • To develop DM, three approaches

  • In all approaches, Data Marts rest on Dimensional Model

  • Data Marts are sufficient for basic data analysis

  • Users need to go beyond such basic analysis


Demand for olap1
Demand for OLAP

  • Need for Multidimensional Analysis

  • Fast Access & Powerful Calculations

  • Limitations of other analysis methods like:

    • SQL

    • Spreadsheets

    • Report Writers


Demand for olap2
Demand for OLAP

  • Traditional tools of report writers, query products, spreadsheets, & language interfaces do not match the user expectations as far as performing multidimensional analysis with complex calculations is concerned.

  • Tools used with OLTP and basic DW environments do not match up to the task


Olap is the answer
OLAP is the Answer!

OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into the data through fast, consistent, interactive, access in a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.


Why is OLAP useful?

  • Facilitates multidimensional data analysis by pre-computing aggregates across many sets of dimensions

  • Provides for:

    • Greater speed and responsiveness

    • Improved user interactivity


Data warehouses
Data Warehouses

  • A data warehouse is based on a multidimensional data model which views data in the form of a data cube

  • A data cube allows data to be modeled and viewed in multiple dimensions

  • In data warehousing literature, an n-D base cube is called a base cuboid. The top most 0-D cuboid, which holds the highest-level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.


Lattice of cuboids

all

0-D(apex) cuboid

time

item

location

supplier

1-D cuboids

time,item

time,location

item,location

location,supplier

2-D cuboids

time,supplier

item,supplier

time,location,supplier

time,item,location

3-D cuboids

item,location,supplier

time,item,supplier

4-D(base) cuboid

time, item, location, supplier

Lattice of Cuboids


day 2

day 1

CUBE

Multi-dimensional cube:

Fact table view:

dimensions = 3


Aggregates
Aggregates

  • Add up amounts for day 1

  • In SQL: SELECT sum(amt) FROM SALE

  • WHERE date = 1

81


Aggregates1
Aggregates

  • Add up amounts by day

  • In SQL: SELECT date, sum(amt) FROM SALE

  • GROUP BY date


Aggregates

  • Operators: sum, count, max, min, median, avg

  • “Having” clause

  • Using dimension hierarchy

    • average by region (within store)

    • maximum by month (within date)


Cube aggregation

rollup

drill-down

Cube Aggregation

Example: computing sums

day 2

. . .

day 1

129


Cube operators
Cube Operators

day 2

. . .

day 1

sale(c1,*,*)

129

sale(c2,p2,*)

sale(*,*,*)

sale(*,p1,*)


Extended cube
Extended Cube

*

day 2

sale(*,p2,*)

day 1


Aggregation using hierarchies

day 2

day 1

Aggregation Using Hierarchies

customer

region

country

(customer c1 in Region A;

customers c2, c3 in Region B)


Pivoting

day 2

day 1

Pivoting

Fact table view:

Multi-dimensional cube:


Cube aggregates lattice

day 2

day 1

Cube Aggregates Lattice

129

all

city

product

date

city, product

city, date

product, date

use greedy

algorithm to

decide what

to materialize

city, product, date


Dimension hierarchies
Dimension Hierarchies

all

state

city


Dimension hierarchies1
Dimension Hierarchies

all

product

city

date

product, date

city, product

city, date

state

city, product, date

state, date

state, product

state, product, date

not all arcs shown...


Interesting hierarchy
Interesting Hierarchy

all

years

weeks

quarters

conceptual

dimension table

months

days


Sample cube

Date

2Qtr

1Qtr

sum

3Qtr

4Qtr

TV

Product

U.S.A

PC

VCR

sum

Canada

Country

Mexico

sum

All, All, All

SAMPLE CUBE

Total annual sales

of TV in U.S.A.

Total annual sales

of PC in U.S.A.

Total sales

In U.S.A

Total annual sales

of VCR in U.S.A.

Total Q1 sales

In U.S.A

Total sales

In Canada

Total Q1 sales

In Canada

Total sales

In Mexico

Total Q1 sales

In Mexico

Total Q2 sales

In all countries

TOTAL SALES

Total Q1 sales

In all countries


OLAP Operations

  • Roll-Up

  • Drill-Down

  • Slice & Dice

  • Pivot

  • Drill-Across

  • Drill-Through







Other olap operations
Other OLAP Operations

  • Drill-Across: Queries involving more than one fact table

  • Drill-Through: Makes use of SQL to drill through the bottom level of a data cube down to its back-end relational tables

  • Pivot (rotate): Pivot (also called "rotate") is a

  • visualization operation which rotates the data axes in

  • view in order to provide an alternative presentation of

  • the data. Other examples include rotating the axes in a

  • 3-D cube, or transforming a 3-D cube into a series of 2-

  • D planes.


Other olap operations1
Other OLAP Operations

  • Moving Averages

  • Growth Rates

  • Depreciation

  • Currency Conversion

  • Statistical Functions

  • Top N or Bottom N queries


Conceptual vs. Actual

  • The “cube” is a logical way of visualizing the data in an OLAP setting

  • Not how the data is actually represented on disk

  • Two ways of storing data:

    • ROLAP: Relational OLAP

    • MOLAP: Multidimensional OLAP


OLAP & CUBE

  • Construction of the data cube is key to the operation of OLAP

  • The computation process creates a set of aggregates on the various dimensions of the data

  • The CUBE operator



The CUBE Operator

  • Proposed by Gray et al*

  • Effectively involves a series of GROUP-BY operations to aggregate data

  • Creates power set on all attributes according to:

    • A measure

    • An aggregator function

*J. Gray, S. Chaudhuri, A. Bosworth, A. Layman,D. Reichart, M. Venkatrao, F. Pellow and H. Pirahesh.

Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals.

Data Mining and Knowledge Discovery, 1:29-54, 1997.


CUBING Problem

  • Problem: this generates a lot of data and work (2n sets in total, where n is the number of dimensions)

  • Solution: optimized algorithms to run faster, consume less memory, and perform fewer I/Os.


Efficient Computation of Data Cubes

  • ROLAP-based cubing algorithms (Agarwal et al’96)

  • Array-based cubing algorithm

    (Zhao et al’97)

S. Agarwal, R. Agrawal, P. M. Deshpande, A.Gupta, J. F. Naughton, R. Ramakrishnan and S.Sarawagi.

On the computation of multidimensional aggregates. In VLDB'96.

Y. Zhao, P. M. Deshpande, and J. F. Naughton.

An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97.


Efficient Computation of Data Cubes

  • How many cuboids in a cube with 3 dimensions?

  • Answer:

  • As many group by operations?

  • No hierarchies involved!!

  • π(Li+1), where Li is the number of levels associated with dimension I

  • 10 dimensions & 4 levels for each dimension

  • Total Cuboids = 510


Approaches to OLAP Servers

  • It is all about which DBMS you choose to store your data warehouse data

  • RDBMS – ROLAP

  • MDDB – MOLAP

  • BOTH - HOLAP


Approaches to OLAP Servers

Three possibilities for OLAP servers

(1) Relational OLAP (ROLAP)

  • Relational and specialized relational DBMS to store and manage warehouse data

  • OLAP middleware to support missing pieces

    (2) Multidimensional OLAP (MOLAP)

  • Array-based storage structures

  • Direct access to array data structures

    (3) Hybrid OLAP (HOLAP)

  • Storing detailed data in RDBMS

  • Storing aggregated data in MDBMS

  • User access via MOLAP tools


ROLAP

  • Special schema design: star, snowflake

  • Special indexes: bitmap, multi-table join

  • Proven technology (relational model, DBMS), tend to outperform specialized MDDB especially on large data sets

  • Products

    • IBM DB2, Oracle, Sybase IQ, RedBrick, Informix


ROLAP

  • Defines complex, multi-dimensional data with simple model

  • Reduces the number of joins a query has to process

  • Allows the data warehouse to evolve with relatively low maintenance

  • Can contain both detailed and summarized data.

  • ROLAP is based on familiar, proven, and already selected technologies.

    BUT!!!

  • SQL for multi-dimensional manipulation of calculations.


MOLAP

  • MDDB: a special-purpose data model

  • Facts stored in multi-dimensional arrays

  • Dimensions used to index array

  • Sometimes on top of relational DB

  • Products

    • Pilot, Arbor Essbase, Gentia


MOLAP

  • Pre-calculating or pre-consolidating transactional data improves speed.

    BUT

    Fully pre-consolidating incoming data, MDDs require an enormous amount of overhead both in processing time and in storage. An input file of 200MB can easily expand to 5GB

    MDDBs are great candidates for the < 100GB department data marts.

  • With MDDs, application design is essentially the definition of dimensions and calculation rules, while the RDBMS requires that the database schema be a star or snowflake.


OLAP Needs

  • User Needs

    • Multidimensional view

    • Excellent Performance

    • Analytical Flexibility

    • Real-Time Data Access

    • High Data Capacity

  • MIS Needs

    • Leverages Data Warehouse

    • Easy Development

    • Low Structure Maintenance

    • Low Aggregate Maintenance


OLAP Needs: User Needs

Multidimensional View

  • All true OLAP tools, whether they work with a MDDB or an RDBMS, provide a multidimensional view of data.

  • For example, decision makers may view sales by office, quarter, representative, product, etc. This perspective on data, which mirrors the way business professional think, allows for more intuitive and more powerful analysis.


OLAP Needs: User Needs

Excellent Performance

  • The performance of your decision support tool directly depends on the way it manages aggregates.

  • RDBMS

    • Calculate aggregates on fly (response time suffers)

    • DBA creates summary tables to store aggregates (enormous amount of disk space)


OLAP Needs: User Needs

Excellent Performance

  • For example, suppose you have a Sales indicator with six dimensions—Representatives, Products, Customers, Regions, Months, and Years.

  • MOLAP tools will store a given aggregate, such as the November 1997 government sales of product A504 by representative 1040 in New York, in 1 cell of the MDDB.

  • In contrast, ROLAP tools consume 600% more space, because they require a record of seven values—six foreign keys and the actual aggregate—in a relational summary table.


OLAP Needs: User Needs

Excellent Performance


OLAP Needs: User Needs

Excellent Performance

RDBMSs must use several summary tables to store the aggregates

that a MOLAP could store in just one cube. For example, consider a Sales indicator with three dimensions: Months, Regions, and Products. The indicator cube will contain seven sets of aggregates:

• Sales by month

• Sales by product

• Sales by region

• Sales by month and product

• Sales by month and region

• Sales by product and region

• Sales by product, month, and region

To store these aggregates in an RDBMS, you’d have to create seven summary tables, one for each aggregate set.

HOW MANY SUMMARY TABLES FOR 6 DIMENSIONS?

(Separate fact table and shrunken dimension table approach for storing aggregates)


OLAP Needs: User Needs

Excellent Performance

  • Huge amounts of extra storage space is required (even if there is no sparsity failure)

  • Maintenance costs are high

  • Lot of statistical analysis needs to be done to decide which aggregates are to be precomputed

  • DBA must keep the cost/performance ratio in check


OLAP Needs: User Needs

Excellent Performance

  • In contrast, we’ve seen that multidimensional databases store aggregates in a very compact structure that consumes very little disk space and requires very little maintenance

  • All levels of consolidation can therefore be precomputed and stored in MDDB

  • As a result, fast response time is not limited to the most frequently accessed queries; all aggregates can be accessed with lightning speed.


OLAP Needs: User Needs

Analytical Flexibility

  • Both ROLAP & MOLAP tools offer comparative performance for

    • Comparative Analysis

    • Roll-up and Drill-down

    • Slicing & Dicing

  • Only MOLAP tools offer ‘what-if’ analysis


OLAP Needs: User Needs

Real-Time Data Access

  • MOLAP tools load data into the multidimensional cubes. Consequently, the data being accessed is only as recent as the last load.

  • Some applications require real-time data access

  • Process of continually refreshing the data attaches higher costs to operating a MOLAP system

  • Some MOLAP tools offer reach-through functionality to access volatile data stored outside the MDDB

  • Unfortunately, users must be aware of the underlying database structure

  • Relational data access is too complex for the typical user


OLAP Needs: User Needs

Real-Time Data Access

  • ROLAP tools maintain a constant link to the operational RDBMS, which provides users with up-to-the-minute, accurate data

    (Real-Time Data Warehousing)

  • Industries & organizations with highly volatile data particularly benefit from this access to live, operational data.


OLAP Needs: User Needs

High Capacity Data

  • MOLAP products are limited by the size of the cube defined by the multidimensional view. When dimension elements are predefined, the scope of available data is limited at the onset.

  • ROLAP tools circumvent this barrier. Dynamic dimensions are not stored in the predefined multidimensional model, but fetched at run time from the RDBMS.


OLAP Needs: User Needs

High Capacity Data

  • In MOLAP, only aggregates are stored in the cube. Atomic, operational data are forced out of the user’s analytical realm.

  • ROLAP systems can access extremely detailed operational data, as well as aggregated data stored in summary tables.


OLAP Needs

MIS Needs

Administrators should be able to leverage their existing relational databases without devoting large amounts of time and effort to intricate development, fine tuning, or intensive maintenance.


OLAP Needs: MIS Needs

Leveraging Data Warehouse

  • Both the finance and the MIS departments of your organization will appreciate a decision support tool that leverages existing investments in data warehousing.

  • MIS staff that opts for a MOLAP tool must duplicate data in its own proprietary MDDB.

  • MIS staff that chooses a ROLAP tool will be able to access the data warehouse directly.


OLAP Needs: MIS Needs

Easy Development

  • MOLAP development is straightforward, it requires no fine tuning and creates its own aggregates.

  • ROLAP tools, on the other hand, require a specific schema for the relational database.

  • Skilled DBAs must provide the appropriate schema (star or snowflake schema), tune the database, and create the appropriate summary tables.

  • However, many ROLAP tools are metadata-driven, which means the multidimensional view is generated and maintained more easily.


OLAP Needs: MIS Needs

Low Structure Maintenance

  • The structure of a MOLAP tool’s underlying MDDB greatly depends on each of its dimensions. When one dimension changes, the entire MDDB must be re-structured.

  • Multi-matrix MDDBs reduce the maintenance burden

  • ROLAP systems do not store data in a proprietary structure.

  • They build and maintain a constant link between the multidimensional view and the underlying RDBMS using the metadata.

  • No database restructuring is required.


OLAP Needs: MIS Needs

Low Aggregate Maintenance

  • MOLAP tools automatically create high-level aggregates based on your lower-level MDDB data and aggregate definitions.

  • When data is updated, the aggregates are automatically updated and stored in the MDDB.

  • With ROLAP tools, MIS staff must continually monitor the use of summary tables to keep their cost/performance ratio in check.

  • DBAs inevitably use sophisticated statistics to isolate only the most frequently accessed aggregates, and store them in summary tables.

  • These tables leave ROLAP administrators with a heavy maintenance burden.



ROLAP vs. MOLAP

  • 1)Performance:

    • How fast will the system appear to the end-user?

    • MDD server vendors believe this is a key point in their favor.

  • 2) Data volume and scalability:

    • While MDD servers can handle up to 100GB of storage, RDBMS servers can handle hundreds of gigabytes and terabytes.


Hybrid OLAP - HOLAP

  • Best of both worlds

  • Storing detailed data in RDBMS

  • Storing aggregated data in MDBMS

  • User access via MOLAP tools


RDBMS Server

MDBMS Server

Client

Multidimensional Viewer

Multi-dimensional access

SQL-Read

User

data

Multi-dimensional data

Metadata

Derived

data

RelationalViewer

SQL-Reach Through

SQL-Read

HOLAP


ROLAP, MOLAP, or HOLAP

IF

A. You require write access

B. Your data is under 50 GB

C. Your timetable to implement is 60-90 days

D. Lowest level already aggregated

E. Data access on aggregated level

F. You’re developing a general-purpose application for inventory movement or assets management

THEN

Consider an MDD /MOLAP solution for your data mart

IF

A. Your data is over 100 GB

B. You have a "read-only" requirement

C. Historical data at the lowest level of granularity

D. Detailed access, long-running queries

E. Data assigned to lowest level elements

THEN

Consider an RDBMS/ROLAP solution for your data mart.

IF

A. OLAP on aggregated and detailed data

B. Different user groups

C. Ease of use and detailed data

THEN

Consider an HOLAP for your data mart


Conclusions

  • ROLAP: RDBMS -> star/snowflake schema

  • MOLAP: MDDB -> Cube structures

  • ROLAP or MOLAP: Data models used play major role in performance differences

  • MOLAP: for summarized and relatively lesser volumes of data (100GB)

  • ROLAP: for detailed and larger volumes of data

  • Both storage methods have strengths and weaknesses

  • The choice is requirement specific, though currently data warehouses are predominantly built using RDBMSs/ROLAP.

  • HOLAP is emerging as the OLPA server of choice


ad