Multidimensional databases
Download
1 / 75

Multidimensional Databases - PowerPoint PPT Presentation


  • 120 Views
  • Uploaded on

Multidimensional Databases. Prof. Navneet Goyal Computer Science & Information Systems Department BITS, Pilani. Database Evolution. Flat files Hierarchical and Network Relational Distributed Relational Multidimensional (MDDB). MDDB History.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Multidimensional Databases' - mikasi


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Multidimensional databases

Multidimensional Databases

Prof. Navneet Goyal

Computer Science & Information Systems Department

BITS, Pilani


Database evolution
Database Evolution

  • Flat files

  • Hierarchical and Network

  • Relational

  • Distributed Relational

  • Multidimensional (MDDB)


Mddb history
MDDB History

  • Multidimensional databases do not have their origin in DB technology but stem from multidimensional matrix algebra, which has been used for manual data analysis since the late 19th century


Mddb history1
MDDB History

  • Evolved from econometric research conducted at MIT in the 1960s, the MDDB has matured into the database engine of choice for data analysis applications


Mddb history2
MDDB History

  • During the late 1960s, IRI Software and Comshare independently began developing what later became MDDB systems

  • IRI Express, a popular tool for marketing analysis in the late 1970s & early 1980s, became a market-leading online analytical processing tool & was acquired by Oracle.


Mddb history3
MDDB History

  • Concurrently, the Comshare system developed into System W, which saw heavy use for financial planning, analysis, & reporting during the 1980s.


Mddb history4
MDDB History

  • In 1991, Arbor Software, now Hyperion Solutions, was formed withthe specific purpose of creating a multiuser, MDDB server, which resulted in the Essbase system. Arbor later licensed a basic version of

    Essbase to IBM for integration into DB2.


Mddb history5
MDDB History

  • In 1993, E.F. Codd coined the term OLAP1

  • Another significant development

    in the early 1990s was the advent of data warehouses, which are typically based on relational star or snowflake schemas, an approach that uses relational database technology to handle multidimensional data

    1http://dev.hyperion.com/resource_library/white_papers/


Why multi dimensional databases
Why Multi-Dimensional Databases?

  • No single "best" data structure for all applications within an enterprise

  • Organizations have abandoned the search for the HOLY GRAIL of globally accepted database

  • Select the most appropriate data structure on a case-by-case basis from a palette of standard database structures

  • Multidimensional Databases for OLAP?


Why multi dimensional databases1
Why Multi-Dimensional Databases?

  • Inherent ability to integrate and analyze large volumes of enterprise data

  • Offers a good conceptual fit with the way end-users visualize business data

    • Most business people already think about their businesses in multidimensional terms

    • Managers tend to ask questions about product sales in different markets over specific time periods


Why multi dimensional databases2
Why Multi-Dimensional Databases?

  • Multidimensional databases is a key technology in the enabling of interactive analyses of large amounts of data for decision-making purposes.

  • Multidimensional databases differ from previous technologies by viewing data specifically as multidimensional cubes, which have proven to be particularly well suited for data analyses.


Multi dimensional data model
Multi-Dimensional Data Model

  • The multidimensional data model, has emerged that has taken over from the relational model when the objective is to analyze data, rather than to perform on-line transactions

  • Multidimensional models categorize data as being either facts with associated numerical measures, or as being dimensionsthat characterize the facts & are mostly textual.


Multi dimensional data model1
Multi-Dimensional Data Model

3Major Applications within data analysis:

  • Data Warehousing

  • OLAP

  • Data Mining


Data Warehousing

Information

Operational data

Knowledgeworkers

Fig. taken from w w w . e f o r c e g l o b a l . c o m


A subject oriented approach

Operational Systems

Data Warehouse

Sales

Customer

Financial

Geography

Inventory

Product

Organized by processes

or tasks

Organized by subject

A subject-Oriented Approach

Fig. taken from w w w . e f o r c e g l o b a l . c o m


Data Warehouse Process

DATA SOURCES STAGING AREA DATA WAREHOUSE DECISION SUPPORT

Application Databases

______________________________________________________

Reports

Packaged application/ERP Data

DATA MARTS

INCOME ANNUAL REPORT

___ ___ ____ _____ ___ __

___ ___ ____ _____ ___ __

___ ___ ____ _____ ___ __

EXTRACTION

TRANSFORMING

CLEANING

AGGREGATION

DATA

WAREHOUSE

EIS

Desktop Data

OLAP

External Data

Statistical & Financial Analysis

Web-based Data

[Adapted from SunExpert Magazine, October 1998.]


What is olap
What is OLAP?

  • OLAP enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information

  • OLAP transforms raw data so that it reflects the real dimensionality of the enterprise as understood by the user


What is olap1
What is OLAP?

  • OLAP is a decision support tool which enables the sophisticated analysis of an organization’s performance by providing access to views of data which characterize the multidimensional nature of the enterprise [Codd et at 1993]


Data mining
Data Mining

  • Discovery of useful summaries of data - Ullman

  • Extracting or “Mining” knowledge form large amounts of data

  • The efficient discovery of previously unknown patterns in large databases

  • Technology which predict future trends based on historical data

  • It helps businesses make proactive and knowledge-driven decisions

  • Data Mining vs. KDD

  • The name “Data Mining” a misnomer?


Continuum of analysis
Continuum of Analysis

OLTP

Specialized Algorithms

SQL + Extensions

SQL

Data Mining

DW/

OLAP

Primitive & Canned Analysis

Complex Ad-hoc Analysis

Automated Analysis

Role of MDDBs??


Sql extensions
SQL Extensions

  • Window Queries

  • Top N Queries

  • Rollup Operator

  • Cube


Multidimensional database
Multidimensional Database

  • Spreadsheets – A 2D database?

  • Functionalities

  • What about a stack of similar spreadsheets for different times?

  • Limitations?

We can not relate data in different

sheets easily!


Multidimensional database1
Multidimensional Database

An MDDB is a computer software system designed to allow for the efficient and convenient storage and retrieval system of large volumes of data that is

  • Intimately related &

  • Stored, viewed and analyzed form different perspectives

    These perspectives are called Dimensions


A motivating example
A Motivating Example

An automobile manufacturer wants to increase sale volumes by examining sales data collected throughout the organization. The evaluation would require viewing historical sales volume figures from multiple dimensions such as

  • Sales volume by model

  • Sales volume by color

  • Sales volume by dealer

  • Sales volume over time



6

5

4

M

O

D

3

5

5

E

L

4

3

2

COLOR

Multidimensional Array Structure

Measurement

Sales Volumes

Dimension

MiniVan

Coupe

Sedan

Blue

Red

White

Dimension

Positions


Rdbms vs mddb
RDBMS vs. MDDB

  • Multidimensional array structure represents a higher level of organization than the relational table

  • Perspectives are embedded directly into the structure in the multidimensional model

    • All possible combinations of perspectives containing a specific attribute (the color BLUE, for example) line up along the dimension position for that attribute.

  • Perspectives are placed in fields in the relational model - tells us nothing about field contents.


  • RDBMS vs. MDD

    • MDD makes data browsing and manipulation intuitive to the end-user

    • Any data manipulation action possible with a MDD is also possible using relational technology

    • Substantial cognitive advantages in query formulation

    • Substantial computational performance advantages in query processing when using MDD



    Mutlidimensional representation

    Mini Van

    M

    Coupe

    O

    D

    Carr

    Sedan

    E

    Gleason

    L

    Clyde

    DEALERSHIP

    Blue

    Red

    White

    COLOR

    Mutlidimensional Representation

    Sales Volumes


    Viewing data an example

    M

    O

    D

    E

    L

    DEALERSHIP

    COLOR

    Viewing Data - An Example

    Sales Volumes

    Assume that each dimension has 10 positions, as shown in the cube above


    Viewing Data - An Example

    • How many records would be there in a relational table?

    • Implications for viewing data from an end-user standpoint?

    SALES VOLUMES FOR ALL DEALERSHIPS

    MODEL COLOR DEALERSHIP VOLUME

    MINI VAN BLUE CLYDE 2

    MINI VAN BLUE GLEASON 2

    MINI VAN BLUE CARR 2

    MINI VAN RED CLYDE 1

    MINI VAN WHITE GLEASON 3

    RECORD NUMBER.... 998

    RECORD NUMBER.... 999

    RECORD NUMBER.... 1000


    Performance Advantages

    • Volume figure when car type = SEDAN, color=BLUE, & dealer=GLEASON?

    • RDBMS – all 1000 records might need to be searched to find the right record

    • MDDB has more ‘knowledge’ about where the data lies

    • Max. of 30 position searches!!

    • Average case

      15 vs. 500


    Performance Advantages

    • Total Sales across all colors and dealers when model = SEDAN?

    • RDBMS – all 1000 records must be searched to get the answer

    • MDDB – Sum the contents of one 10x10 ‘slice’


    Performance Advantages

    • Data manipulation that requires a minute in RDBMS may require only a few seconds in MDDB

    • MDDBs are an order of magnitude faster than RDBMSs

    • Performance benefits are more for queries that generate cross-tab views of data

    • The performance advantages offered by multidimensional technology facilitates the development of interactive decision support applications like OLAP that can be impractical in a relational environment.


    RDBMS vs. MDDB

    • Any data manipulation action possible with a multidimensional database is also possible using relational technology

    • MDDBs however offer several advantages like:

      • Ease of data presentation and navigation

      • Ease of maintenance

      • Performance


    Ease of Data Presentation & Navigation

    • Intuitive spreadsheet like data views are natural output of MDBs

    • Obtaining the same views in a relational environment, requires either a complex SQL or a SQL generator against a RDB to convert the table outputs into a more intuitive format

    • Top N queries are not possible with SQL at all


    Ease of Maintenance

    • Ease of maintenance because data is stored as it is viewed

    • No additional overhead is required to translate user queries into requests for data

    • To provide same intuitiveness, RDBs use indexes and sophisticated joins which require significant maintenance and storage


    Performance

    • Performance of MDDBs can be matched by RDBs through database tuning

    • Not possible to tune the database for all possible adhoc queries

    • Tuning requires resources of an expensive DB specialist

    • Aggregate navigators are helping RDBs to catch up with MDBs as far as aggregation queries are concerned


    Adding dimension an example

    M

    Mini Van

    Mini Van

    Mini Van

    O

    D

    Coupe

    Coupe

    Coupe

    E

    L

    Carr

    Carr

    Carr

    Sedan

    Sedan

    Sedan

    Gleason

    Gleason

    Gleason

    DEALERSHIP

    Clyde

    Clyde

    Clyde

    Blue

    Red

    White

    Blue

    Red

    White

    Blue

    Red

    White

    COLOR

    COLOR

    COLOR

    JANUARY

    FEBRUARY

    MARCH

    Adding Dimension –An Example

    Sales Volumes


    When is mdd in appropriate
    When is MDD (In)appropriate?

    First, consider situation 1

    PERSONNEL

    EMPLOYEE#

    LAST NAME

    EMPLOYEE AGE

    SMITH

    01

    21

    REGAN

    12

    19

    FOX

    31

    63

    WELD

    14

    31

    KELLY

    54

    27

    LINK

    03

    56

    KRANZ

    41

    45

    LUCUS

    33

    41

    WEISS

    23

    19


    When is MDD (In)appropriate?

    Now consider situation 2

    1. Set up a MDD structure for situation 1, with LAST NAME & Employee# as dimensions, and AGE as the measurement.

    2. Set up a MDD structure for situation 2, with MODEL & COLOR as dimensions, and SALES VOLUME as the measurement.


    EmployeeAge

    Smith

    21

    Regan

    19

    Sales Volumes

    Fox

    63

    L

    A

    6

    5

    4

    Mini Van

    Weld

    31

    M

    S

    O

    T

    D

    3

    5

    5

    Coupe

    Kelly

    27

    E

    N

    L

    A

    4

    3

    2

    Link

    M

    56

    Sedan

    E

    Red

    White

    Blue

    Kranz

    45

    COLOR

    Lucas

    41

    Weiss

    19

    31

    41

    23

    01

    14

    54

    03

    12

    33

    EMPLOYEE #

    When is MDD (In)appropriate?

    MDD Structures for the Situations

    Note the sparse between the two MDD representations


    When is MDD (In)appropriate?

    • Our sales volume dataset has a great number of meaningful interrelationships

    • Interrelationships more meaningful than individual data elements themselves.

    • The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company.

    • Highly interrelated dataset types be placed in a multidimensional data structure for greatest ease of access and analysis


    When is MDD (In)appropriate?

    • No last name is matching with more than one emp # and no emp # is matching with more than one last name

    • In contrast, there is a sales figure associated with every combination of model and color resulting in a completed filled up 3x3 matrix

    • Performance suffers (RDB 9 vs. MDB 18)


    When is MDD (In)appropriate?

    • The relative performance advantages of storing multidimensional data in a multidimensional array increase as the size of the dataset increases

    • The relative performance disadvantages of storing non-multidimensional data in a multidimensional array increase as the size of the dataset increases.

    • NO inherent value of storing Non-multidimensional data (employee data) in multidimensional arrays


    When is MDD Appropriate?

    • The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company.

    • Most companies have limited time and resources to devote to analyzing data

    • It therefore becomes critical that these highly interrelated dataset types be placed in a multidimensional data structure for greatest ease of access and analysis.


    Queries
    Queries

    • High degree of structure in MDB makes the query language very simple and efficient

    • Query language is intuitive

    • Output is immediately useful to end user


    Queries example
    Queries: Example

    • Display sales volume by model for each dealership

      PRINT TOTAL.(SALES_VOLUME KEEP MODEL DEALERSHIP)

      Trends emerge and comparisons are easily made

    DEALERSHIP

    MODELCLYDE GLEASON CARR

    MINI VAN 7 5 6

    SPORTS COUPE 4 6 8

    SEDAN 3 8 12


    Queries example1
    Queries: Example

    • Corresponding SQL

      SELECT MODEL, DEALERSHIP, SUM(SALES_VOLUME)

      FROM SALES_VOLUME

      GROUP BY MODEL, DEALERSHIP

      ORDER BY MODEL, DEALERSHIP

    MODEL | DEALERSHIP | SUM(SALES_VOLUME)

    MINI VAN | CLYDE | 7

    MINI VAN | GLEASON | 5

    MINI VAN | CARR | 6

    SPORTS COUPE | CLYDE | 4

    SPORTS COUPE | GLEASON | 6

    SPORTS COUPE| CARR | 8

    SEDAN | CLYDE | 3

    SEDAN | GLEASON | 8

    SEDAN| CARR | 12


    Queries example2
    Queries: Example

    Use report writer in addition to SQL and we get

    MINI VAN

    CLYDE 7

    GLEASON 5

    CARR 6

    SPORTS COUPE

    CLYDE 4

    GLEASON 5

    CARR 8

    SEDAN

    CLYDE 3

    GLEASON 8

    CARR 12


    Rdbms vs mddb1
    RDBMS vs. MDDB

    • DO I still use RDBMS for my DW?

    • MDDBs store data in hypercube, i.e., multidimensional array

    • RDBMS store data as tables with row and columns that do not map directly to multidimensional view that user have of data

    • EDW – RDBMS

    • Data Marts - MDDB


    Rdbms vs mddb trade offs
    RDBMS vs. MDDB:Trade-Offs

    • SIZE

      • MDDBs limited by size

        • Mid – 1990s 10GB caused problems

        • Today – 100GB is OK

      • Large DWs are still better served by relational front-ends running against high performance and scalable RDBMS

    • VOLATILITY

      • Highly volatile data are better handled by RDBMS

      • MDDBs take long to load and update


    Rdbms vs mddb trade offs1
    RDBMS vs. MDDB:Trade-Offs

    • AGGREGATE STRATEGY

      • MDDBs support aggregates better

      • RDBMSs are catching up with the help of Aggregate Navigators

    • INVESTMENT PROTECTION

      • Most organizations already have made significant investments in relational technology and skill sets

      • Continued use for another purpose (DW) provides additional ROI and lowers technical risk of failure

      • MDDBs – need to acquire new software & train staff to use it


    Rdbms vs mddb trade offs2
    RDBMS vs. MDDB:Trade-Offs

    TYPE OF USERS

    • Power users prefer the range of functionalities available in MOLAP tools

    • Users that require broad views of enterprise data require access to DW and therefore better served by a ROLAP tool


    Integrated architecture
    INTEGRATED ARCHITECTURE

    • DB vendors have integrated their multidimensional and relational database products

    • Multidimensional Front-end tools

    • If queries require data that are not available in MDDB, the tools retrieve the data from the larger RDB

    • Known as “DRILL-THROUGH”


    Mddb major limitations
    MDDB: Major Limitations

    • Database Management Tools

      • Integrity through Backup & Restore

      • Tune DB for optimum performance

      • User security at multiple levels

      • Limit resource usage based on user privileges

  • Support for Drill-down to atomic data

  • Incremental DB refresh

  • Multiple Arrays Support

    - MDDBs do not support the creation of multiple, related multidomensional arrays within a single DB stucture


  • Mddb major limitations1
    MDDB: Major Limitations

    • Database Joins

      - MDDBs do not support the logical joining of multiple multidimensional arrays

    • Subset Selection

    • Indexing

    • Fragmentation

    • Handling of Sparse Data

    • Handling Large Data Volumes

    • Lack of Industry Standards



    MDD Features - Rotation

    6

    5

    4

    6

    3

    4

    M

    C

    O

    O

    D

    L

    3

    5

    5

    5

    5

    3

    E

    O

    L

    R

    4

    3

    2

    4

    5

    2

    o

    ( ROTATE 90

    )

    COLOR

    MODEL

    Sales Volumes

    Mini Van

    Blue

    Coupe

    Red

    Sedan

    White

    Mini Van

    Coupe

    Sedan

    Blue

    Red

    White

    View #1: ModelxColor

    View #2: ColorxModel

    • Also referred to as “data slicing.”

    • Each rotation yields a different slice or two dimensional table

    • of data.


    Mdd features rotation

    Sales Volumes

    M

    C

    C

    Mini Van

    Blue

    Blue

    O

    O

    O

    D

    L

    L

    Coupe

    Red

    Red

    E

    O

    O

    L

    R

    R

    Mini Van

    Carr

    Carr

    Sedan

    White

    White

    Gleason

    Gleason

    Coupe

    Clyde

    Clyde

    Sedan

    Sedan

    Coupe

    Mini Van

    Blue

    Red

    White

    Carr

    Gleason

    Clyde

    DEALERSHIP

    COLOR

    MODEL

    o

    o

    o

    ( ROTATE 90

    )

    ( ROTATE 90

    )

    ( ROTATE 90

    )

    DEALERSHIP

    DEALERSHIP

    MODEL

    View #3

    View #1

    View #2

    D

    D

    E

    E

    A

    A

    L

    L

    Mini Van

    Carr

    Carr

    E

    E

    M

    R

    R

    O

    Coupe

    Gleason

    Gleason

    S

    S

    D

    H

    H

    E

    Blue

    Blue

    Mini Van

    Sedan

    I

    I

    L

    Red

    Clyde

    Clyde

    Red

    Coupe

    White

    P

    P

    White

    Sedan

    Coupe

    Sedan

    Mini Van

    White

    Clyde

    Gleason

    Carr

    Red

    Blue

    DEALERSHIP

    COLOR

    MODEL

    o

    o

    ( ROTATE 90

    )

    ( ROTATE 90

    )

    MODEL

    COLOR

    COLOR

    View #6

    View #4

    View #5

    MDD Features - Rotation


    Mdd features rotation1
    MDD Features - Rotation

    • All the six views can be obtained by simple rotation

    • In MDBs rotations are simple as no rearrangement of data is required

    • Rotation is also referred to as “data slicing”

    • No. of views

      • 2D – 2

      • 3D – 6

      • 4D - ?

    24


    Mdd features ranging
    MDD Features - Ranging

    • How sales volume of models painted with new metallic blue compared with the sales of normal blue color models?

    • The user knows that only Sports Coupe and Mini Van models have received the new paint treatment

    • Also the user knows that only 2 dealers viz, Carr and Clyde have unconstrained supply of these models


    Mdd features ranging1

    Sales Volumes

    Mini Van

    M

    Mini Van

    O

    D

    Coupe

    Carr

    E

    Coupe

    Clyde

    L

    Normal

    Metal

    Blue

    Blue

    Carr

    Clyde

    DEALERSHIP

    Normal

    Metal

    Blue

    Blue

    COLOR

    MDD Features - Ranging

    • The end user selects the desired positions along each dimension.

    • Also referred to as "data dicing."

    • The data is scoped down to a subset grouping


    Mdd features ranging2
    MDD Features - Ranging

    • The reduced array can now be rotated and used in computations in the same was as the parent array

    • Referred to as “Data Dicing” as data is scoped down to a subset grouping

    • Complex SQL query is required in RDB

    • Performance is better in MDB as less resource consuming searches are required


    Mdd features roll up drill down
    MDD Features – Roll-Up & Drill-Down

    • Users want different views of the same data

    • For eg., Sales Volume by model vs, sales volume by dealership

    • Many times views are similar

    • Sales volume by dealership vs. volume by district

    • Natural relationship between Sales Volumes at the DEALERSHIP level and Sales Volumes at the DISTRICT level

    • Sales Volumes for all the dealerships in a district sum to the Sales Volumes for that district


    Mdd features roll up drill down1
    MDD Features – Roll-Up & Drill-Down

    • Multidimensional database technology is specially designed to facilitate the handling of these natural relationships

    • Define two related aggregates on the same dimension

    • One aggregation is dealership and the other district

    • District is at a higher level of aggregation than dealership


    Mdd features roll ups drill downs
    MDD Features - Roll-Ups & Drill Downs

    • The figure presents a definition of a hierarchy within

    • the organization dimension.

    • Aggregations perceived as being part of the same dimension.

    • Moving up and moving down levels in a hierarchy is referred to

    • as “roll-up” and “drill-down.”



    Mdd features drill down through a dimension
    MDD Features:Drill-Down Through a Dimension


    Mdd features multidimensional computations
    MDD Features:Multidimensional Computations

    • Well equipped to handle demanding mathematical functions.

    • Can treat arrays like cells in spreadsheets. For example, in a budget analysis situation, one can divide the ACTUAL array by the BUDGET array to compute the VARIANCE array.

    • Applications based on multidimensional database technology typically have one dimension defined as a "business measurements" dimension.

    • Integrates computational tools very tightly with the database structure.


    Mdd features multidimensional computations1

    Sales Volumes

    10

    0.1

    11

    Mini Van

    Coupe

    12

    0.33

    16

    -

    10

    8

    0.2

    Sedan

    16

    0.0

    16

    Variance

    Budget

    Actual

    BUSINESS MEASUREMENTS

    MDD Features:Multidimensional Computations


    The time dimension
    The Time Dimension

    • TIME as a predefined hierarchy for rolling-up and drilling-down across days, weeks, months, years and special periods, such as fiscal years.

      • Eliminates the effort required to build sophisticated hierarchies every time a database is set up.

      • Extra performance advantages




    ad