slide1
Download
Skip this Video
Download Presentation
ITCS 6163 Data Warehousing

Loading in 2 Seconds...

play fullscreen
1 / 76

ITCS 6163 Data Warehousing - PowerPoint PPT Presentation


  • 68 Views
  • Uploaded on

ITCS 6163 Data Warehousing. Xintao Wu. History. 60s C. Bachman GE network data model Late 60s IBM IMS hierarchical data model 70 E.Codd relational model 80s SQL IBM R transaction J. Gray Late 80s-90s DB2, Oracle, informix, sybase

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' ITCS 6163 Data Warehousing ' - quintessa-haley


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
history

History

60s C. Bachman GE network data model

Late 60s IBM IMS hierarchical data model

70 E.Codd relational model

80s SQL IBM R transaction J. Gray

Late 80s-90s DB2, Oracle, informix, sybase

90s- DW, internet

Turing award and Turing test?

evolution of database technology see fig 1 1
Evolution of Database Technology(See Fig. 1.1)
  • 1960s:
    • Data collection, database creation, IMS and network DBMS
  • 1970s:
    • Relational data model, relational DBMS implementation
  • 1980s:
    • RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
  • 1990s—2000s:
    • Data mining and data warehousing, multimedia databases, and Web databases
can you easily answer these questions
Can You Easily Answer These Questions?

What are Personnel Services costs across all departments for all funding sources?

What is the correlation between expenditures and collection of delinquent taxes?

What are the effects of outsourcing specific services?

What is the impact on revenues and expenditures of changing the operating hours of the Dept. of Motor Vehicles?

What is the economic impact of the small business initiative in our district?

overview data warehousing and olap technology for data mining
Overview: Data Warehousing and OLAP Technology for Data Mining
  • What is a data warehouse?
  • Why a data warehouse?
  • A multi-dimensional data model
  • Data warehouse architecture
  • Data warehouse implementation
  • From data warehousing to data mining
what is a warehouse

more

What is a Warehouse?
  • Collection of diverse data
    • subject oriented
    • aimed at executive, decision maker
    • often a copy of operational data
    • with value-added data (e.g., summaries, history)
    • integrated
    • time-varying
    • non-volatile
what is a warehouse1
What is a Warehouse?
  • Collection of tools
    • gathering data
    • cleansing, integrating, ...
    • querying, reporting, analysis
    • data mining
    • monitoring, administering warehouse
what is a warehouse2
What is a Warehouse?
  • Defined in many different ways, but not rigorously.
    • A decision support database that is maintained separately from the organization’s operational database
    • Support information processing by providing a solid platform of consolidated, historical data for analysis.
  • “A data warehouse is asubject-oriented, integrated, time-variant, and nonvolatilecollection of data in support of management’s decision-making process.”—W. H. Inmon
  • Data warehousing:
    • The process of constructing and using data warehouses
data warehouse subject oriented
Data Warehouse—Subject-Oriented
  • Organized around major subjects, such as customer, product, sales.
  • Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
  • Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
data warehouse integrated
Data Warehouse—Integrated
  • Constructed by integrating multiple, heterogeneous data sources
    • relational databases, flat files, on-line transaction records
  • Data cleaning and data integration techniques are applied.
    • Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
      • E.g., Hotel price: currency, tax, breakfast covered, etc.
    • When data is moved to the warehouse, it is converted.
data warehouse time variant
Data Warehouse—Time Variant
  • The time horizon for the data warehouse is significantly longer than that of operational systems.
    • Operational database: current value data.
    • Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)
  • Every key structure in the data warehouse
    • Contains an element of time, explicitly or implicitly
    • But the key of operational data may or may not contain “time element”.
data warehouse non volatile
Data Warehouse—Non-Volatile
  • A physically separate store of data transformed from the operational environment.
  • Operational update of data does not occur in the data warehouse environment.
    • Does not require transaction processing, recovery, and concurrency control mechanisms
    • Requires only two operations in data accessing:
      • initial loading of data and access of data.
warehouse is specialized db
Mostly updates

Many small transactions

Mb-Tb of data

Current snapshot

Raw data

Clerical users

Mostly reads

Queries are long, complex

Gb-Tb of data

History

Summarized, consolidated data

Decision-makers, analysts as users

Warehouse is specialized DB

Standard DB

Warehouse

data warehouse vs heterogeneous dbms
Data Warehouse vs. Heterogeneous DBMS
  • Traditional heterogeneous DB integration:
    • Build wrappers/mediators on top of heterogeneous databases
    • Query driven approach
      • When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for individual heterogeneous sites involved, and the results are integrated into a global answer set
      • Complex information filtering, compete for resources
  • Data warehouse: update-driven, high performance
    • Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis
data warehouse vs operational dbms
Data Warehouse vs. Operational DBMS
  • OLTP (on-line transaction processing)
    • Major task of traditional relational DBMS
    • Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
  • OLAP (on-line analytical processing)
    • Major task of data warehouse system
    • Data analysis and decision making
  • Distinct features (OLTP vs. OLAP):
    • User and system orientation: customer vs. market
    • Data contents: current, detailed vs. historical, consolidated
    • Database design: ER + application vs. star + subject
    • View: current, local vs. evolutionary, integrated
    • Access patterns: update vs. read-only but complex queries
overview data warehousing and olap technology for data mining1
Overview: Data Warehousing and OLAP Technology for Data Mining
  • What a data warehouse?
  • Why a data warehouse?
  • A multi-dimensional data model
  • Data warehouse architecture
  • Data warehouse implementation
  • From data warehousing to data mining
why separate data warehouse
Why Separate Data Warehouse?
  • High performance for both systems
    • DBMS— tuned for OLTP: access methods, indexing, concurrency control, recovery
    • Warehouse—tuned for OLAP: complex OLAP queries, multidimensional view, consolidation.
  • Different functions and different data:
    • missing data: Decision support requires historical data which operational DBs do not typically maintain
    • data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources
    • data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled
warehouse architecture

Client

Client

Query & Analysis

Warehouse

Integration

Source

Source

Source

Warehouse Architecture

Metadata

why a warehouse

?

Source

Source

Why a Warehouse?
  • Two Approaches:
    • Query-Driven (Eager)
    • Warehouse (Lazy)
query driven approach

Client

Client

Mediator

Wrapper

Wrapper

Wrapper

Source

Source

Source

Query-Driven Approach
advantages of warehousing
Advantages of Warehousing
  • High query performance
  • Queries not visible outside warehouse
  • Local processing at sources unaffected
  • Can operate when sources unavailable
  • Can query data not stored in a DBMS
  • Extra information at warehouse
    • Modify, summarize (store aggregates)
    • Add historical information
advantages of query driven
Advantages of Query-Driven
  • No need to copy data
    • less storage
    • no need to purchase data
  • More up-to-date data
  • Query needs can be unknown
  • Only query interface needed at sources
overview data warehousing and olap technology for data mining2
Overview: Data Warehousing and OLAP Technology for Data Mining
  • What a data warehouse?
  • Why a data warehouse?
  • A multi-dimensional data model
  • Data warehouse architecture
  • Data warehouse implementation
  • From data warehousing to data mining
modeling oltp systems
Modeling OLTP Systems
  • Goal -- Update as many transactions as possible in the shortest period of time
  • Approach
    • Model to 3rd Normal Form (3NF)
    • Minimize redundancy to optimize update
  • Result
    • Create many (hundreds) of tables
    • Difficult for business users to understand and use
    • Retrieval requires many JOINs = lousy performance
modeling the data warehouse
Modeling the Data Warehouse
  • Tuning the relational model
    • Denormalize
      • Reduces the number of tables
      • Improves usability
      • Improves performance
    • Add aggregate data (typically separate tables)
      • Improves performance
      • Degrades usability
modeling the data warehouse1
Modeling the Data Warehouse

“Entity relation data models are a disaster for querying because they cannot be understood by users and they cannot be navigated usefully by DBMS software. Entity relation models cannot be used as the basis for enterprise data warehouses.”

Ralph Kimball, The Data Warehouse Toolkit,

1996, John Wiley & Sons, Inc., ISBN 0-471-15337-0

from tables and spreadsheets to data cubes
From Tables and Spreadsheets to Data Cubes
  • A data warehouse is based on a multidimensional data model which views data in the form of a data cube
  • A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
    • Dimension tables, such as item (item_name, brand, type), or time(day, week, month, quarter, year)
    • Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
  • In data warehousing literature, an n-D base cube is called a base cuboid. The top most 0-D cuboid, which holds the highest-level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
cube a lattice of cuboids
Cube: A Lattice of Cuboids

all

0-D(apex) cuboid

time

item

location

supplier

1-D cuboids

time,item

time,location

item,location

location,supplier

2-D cuboids

time,supplier

item,supplier

time,location,supplier

time,item,location

3-D cuboids

item,location,supplier

time,item,supplier

4-D(base) cuboid

time, item, location, supplier

conceptual modeling of data warehouses
Conceptual Modeling of Data Warehouses
  • Modeling data warehouses: dimensions & measures
    • Star schema: A fact table in the middle connected to a set of dimension tables
    • Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake
    • Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation
example of star schema

item

time

item_key

item_name

brand

type

supplier_type

time_key

day

day_of_the_week

month

quarter

year

location

branch

location_key

street

city

province_or_street

country

branch_key

branch_name

branch_type

Example of Star Schema

Sales Fact Table

time_key

item_key

branch_key

location_key

units_sold

dollars_sold

avg_sales

Measures

example of snowflake schema

supplier

item

time

item_key

item_name

brand

type

supplier_key

supplier_key

supplier_type

time_key

day

day_of_the_week

month

quarter

year

city

location

branch

city_key

city

province_or_street

country

location_key

street

city_key

branch_key

branch_name

branch_type

Example of Snowflake Schema

Sales Fact Table

time_key

item_key

branch_key

location_key

units_sold

dollars_sold

avg_sales

Measures

example of fact constellation

item

time

item_key

item_name

brand

type

supplier_type

time_key

day

day_of_the_week

month

quarter

year

location

location_key

street

city

province_or_street

country

shipper

branch

shipper_key

shipper_name

location_key

shipper_type

branch_key

branch_name

branch_type

Example of Fact Constellation

Shipping Fact Table

time_key

Sales Fact Table

item_key

time_key

shipper_key

item_key

from_location

branch_key

to_location

location_key

dollars_cost

units_sold

units_shipped

dollars_sold

avg_sales

Measures

multidimensional data
Multidimensional Data
  • Sales volume as a function of product, month, and region

Dimensions: Product, Location, Time

Hierarchical summarization paths

Region

Industry Region Year

Category Country Quarter

Product City Month Week

Office Day

Product

Month

a sample data cube

Date

2Qtr

1Qtr

sum

3Qtr

4Qtr

TV

Product

U.S.A

PC

VCR

sum

Canada

Country

Mexico

sum

All, All, All

A Sample Data Cube

Total annual sales

of TV in U.S.A.

cuboids corresponding to the cube
Cuboids Corresponding to the Cube

all

0-D(apex) cuboid

country

product

date

1-D cuboids

product,date

product,country

date, country

2-D cuboids

3-D(base) cuboid

product, date, country

typical olap operations
Typical OLAP Operations
  • Roll up (drill-up): summarize data
    • by climbing up hierarchy or by dimension reduction
  • Drill down (roll down): reverse of roll-up
    • from higher level summary to lower level summary or detailed data, or introducing new dimensions
  • Slice and dice:
    • project and select
  • Pivot (rotate):
    • reorient the cube, visualization, 3D to series of 2D planes.
  • Other operations
    • drill across: involving (across) more than one fact table
    • drill through: through the bottom level of the cube to its back-end relational tables (using SQL)
relational operators
Relational Operators
  • Select
  • Project
  • Join
aggregates
Aggregates
  • Add up amounts by day
  • In SQL: SELECT date, sum(amt) FROM SALE
  • GROUP BY date
another example
Another Example
  • Add up amounts by day, product
  • In SQL: SELECT date, sum(amt) FROM SALE
  • GROUP BY date, prodId

rollup

drill-down

aggregates1
Aggregates
  • Operators: sum, count, max, min, median, avg
  • Type
    • Distributive
    • Algebraic
    • holistic
  • “Having” clause
  • Using dimension hierarchy
    • average by region (within store)
    • maximum by month (within date)
cube aggregation

rollup

drill-down

Cube Aggregation

Example: computing sums

day 2

. . .

day 1

129

aggregation using hierarchies

day 2

day 1

Aggregation Using Hierarchies

customer

region

country

(customer c1 in Region A;

customers c2, c3 in Region B)

pivoting

day 2

day 1

Pivoting

Fact table view:

Multi-dimensional cube:

overview data warehousing and olap technology for data mining3
Overview: Data Warehousing and OLAP Technology for Data Mining
  • What a data warehouse?
  • Why a data warehouse?
  • A multi-dimensional data model
  • Data warehouse architecture
  • Data warehouse implementation
  • From data warehousing to data mining
design of a data warehouse a business analysis framework
Design of a Data Warehouse: A Business Analysis Framework
  • Four views regarding the design of a data warehouse
    • Top-down view
      • allows selection of the relevant information necessary for the data warehouse
    • Data source view
      • exposes the information being captured, stored, and managed by operational systems
    • Data warehouse view
      • consists of fact tables and dimension tables
    • Business query view
      • sees the perspectives of data in the warehouse from the view of end-user
data warehouse design process
Data Warehouse Design Process
  • Top-down, bottom-up approaches or a combination of both
    • Top-down: Starts with overall design and planning (mature)
    • Bottom-up: Starts with experiments and prototypes (rapid)
  • From software engineering point of view
    • Waterfall: structured and systematic analysis at each step before proceeding to the next
    • Spiral: rapid generation of increasingly functional systems, short turn around time, quick turn around
  • Typical data warehouse design process
    • Choose a business process to model, e.g., orders, invoices, etc.
    • Choose the grain (atomic level of data) of the business process
    • Choose the dimensions that will apply to each fact table record
    • Choose the measure that will populate each fact table record
slide48

other

sources

Extract

Transform

Load

Refresh

Operational

DBs

Multi-Tiered Architecture

Monitor

&

Integrator

OLAP Server

Metadata

Analysis

Query

Reports

Data mining

Serve

Data

Warehouse

Data Marts

Data Sources

Data Storage

OLAP Engine

Front-End Tools

three data warehouse models
Three Data Warehouse Models
  • Enterprise warehouse
    • collects all of the information about subjects spanning the entire organization
  • Data Mart
    • a subset of corporate-wide data that is of value to a specific groups of users. Its scope is confined to specific, selected groups, such as marketing data mart
      • Independent vs. dependent (directly from warehouse) data mart
  • Virtual warehouse
    • A set of views over operational databases
    • Only some of the possible summary views may be materialized
what is a data mart

Finance

HR

Budget

Info.

Tech.

Asset

Mgmt.

Purch-

asing

What is a Data Mart ?
  • A data mart is a small-scale data warehouse that is focused on a single department or single subject area to provide a subset of data warehouse data to address specific reporting and analysis requirements.
  • Smaller warehouses
  • Spans part of organization
  • Do not require enterprise-wide consensus
    • but long term integration problems?
data warehouse development a recommended approach
Data Warehouse Development: A Recommended Approach

Multi-Tier Data Warehouse

Distributed Data Marts

Enterprise Data Warehouse

Data Mart

Data Mart

Model refinement

Model refinement

Define a high-level corporate data model

olap server architectures
OLAP Server Architectures
  • Relational OLAP (ROLAP)
    • ROLAP - provides a Multi-dimensional view of a relational DB (e.g. MicroStrategy)
    • Use relational or extended-relational DBMS to store and manage warehouse data and OLAP middle ware to support missing pieces
    • Include optimization of DBMS backend, implementation of aggregation navigation logic, and additional tools and services
    • greater scalability
  • Multidimensional OLAP (MOLAP)
    • Array-based multidimensional storage engine (sparse matrix techniques)
    • fast indexing to pre-computed summarized data
  • Hybrid OLAP (HOLAP)
    • User flexibility, e.g., low level: relational, high-level: array
  • Specialized SQL servers
    • specialized support for SQL queries over star/snowflake schemas
slide53

MOLAP Databases

  • Data is stored using a proprietary format(MOLAP)
  • Accessible only through the DB vendor’s tools
  • Suitable only for summarized data
  • Data may be summarized in advance or real-time
  • Examples:
    • PowerPlay
    • Holos
    • Essbase
slide54

RDBMS: Indexing Strategies

  • Select columns to be indexed:
    • Choose combinations of columns most often used to constrain queries (“where …” clause)
    • Queries must use constraining columns in the same order as the columns in the index
  • Unique more efficient than non-unique.
  • More indexes means faster query performance, but also longer transformation/load times.
  • Types of Indexes:
    • B-tree -- many possible values (e.g., invoice number)
    • Bitmap -- few possible values (e.g., M/F, S/M/D/W)
molap versus rolap
MOLAP

Multidimensional OLAP

Data stored in multi-dimensional cube

Transformation required

Data retrieved directly from cube for analysis

Faster analytical processing

Cube size limitations

ROLAP

Relational OLAP

Data stored in relational database as virtual cube

No transformation needed

Data retrieved via SQL from database for analysis

Slower analytical processing

No size limitations

MOLAP versus ROLAP
data warehouse usage
Data Warehouse Usage
  • Three kinds of data warehouse applications
    • Information processing
      • supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs
    • Analytical processing
      • multidimensional analysis of data warehouse data
      • supports basic OLAP operations, slice-dice, drilling, pivoting
    • Data mining
      • knowledge discovery from hidden patterns
      • supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools.
  • Differences among the three tasks
data mining forecasting
Data Mining & Forecasting
  • Mining the Warehouse
    • Choose data population
    • Select mining technique
    • Segment data into groups
    • Identify data patterns
  • Forecasting Data
    • Select trend data
    • Choose forecast model
    • Run forecast
    • Display predictions
accessing analyzing data
Accessing & Analyzing Data
  • Query & Reporting … retrieving data directly from the warehouse and preparing it for presentation
  • Online Analytical Processing (OLAP) … analyzing aggregated data from a variety of perspectives
  • Data Mining & Forecasting … analyzing and predicting data using mathematical models
query reporting
Query & Reporting
  • Query the Data ...
    • Select & filter data
    • Retrieve results
  • Report the Results ...
    • Sort & group data
    • Format & present data
  • Save or Export Data
    • Save queries & reports
    • Export to other tools
    • Publish HTML pages
query reporting tools
Query & Reporting Tools
  • Cognos Impromptu
  • Business Objects
  • Crystal Info
  • BrioQuery
  • IQ
  • GQL
  • SAS
online analytical processing
Online Analytical Processing
  • Slice and Dice ...
    • Select dimensions
    • Choose measures
    • Filter by dimensions
  • Drill Down ...
    • Drill down hierarchies
    • Drill through to details
  • Present the Results
    • Present as spreadsheet
    • Display graphically
olap tools
OLAP Tools
  • Cognos PowerPlay
  • Business Analyzer
  • Holos
  • BrioAnalyzer
  • Microstrategy
  • Oracle Express
  • SAS
  • Arbor Essbase
data mining forecasting1
Data Mining & Forecasting
  • Mining the Warehouse
    • Choose data population
    • Select mining technique
    • Segment data into groups
    • Identify data patterns
  • Forecasting Data
    • Select trend data
    • Choose forecast model
    • Run forecast
    • Display predictions
data mining
Data Mining

transaction

id

products

bought

customer

id

sales

records:

  • Trend: Products p5, p8 often bough together
  • Trend: Customer 12 likes product p9
mining and forecasting tools
Mining and Forecasting Tools
  • Scenario
  • 4Thought
  • Business Miner
  • Clementine
  • Darwin
  • Holos
  • SAS
data warehouse back end tools and utilities
Data Warehouse Back-End Tools and Utilities
  • Data extraction:
    • get data from multiple, heterogeneous, and external sources
  • Data cleaning:
    • detect errors in the data and rectify them when possible
  • Data transformation:
    • convert data from legacy or host format to warehouse format
  • Load:
    • sort, summarize, consolidate, compute views, check integrity, and build indicies and partitions
  • Refresh
    • propagate the updates from the data sources to the warehouse
data cleaning
Data Cleaning
  • Of primordial interest in the warehouse creation
  • One of the biggest problems
  • Difficult to achieve
  • Probability of one or many of the sources containing “dirty data” is high.
  • Lots of manual intervention
data cleaning problems
Data Cleaning Problems

Data quality problems

Multi-source

Single Source

Schema level Instance level

Schema levelInstance level

(poor schema design) (data entry errors) (heterogeneity) (overlapping contradicting data)

.Uniqueness .MisspellingsNaming conflicts Inconsistent aggregation

multisource problems
Multisource problems
  • All the previous problems +
  • Schema differences (translation and integration)
    • E.g.: EmpID, CID, Sex= M/F, Sex=0/1
  • Instance level conflicts
    • Duplicate records, contradicting records
    • Different measures ($, Euros)
    • Different aggregation levels (weeks, quarters)
overview data warehousing and olap technology for data mining4
Overview: Data Warehousing and OLAP Technology for Data Mining
  • What a data warehouse?
  • Why a data warehouse?
  • A multi-dimensional data model
  • Data warehouse architecture
  • Data warehouse implementation
  • From data warehouse to data mining
data mining a kdd process
Data Mining: A KDD Process

Knowledge

Pattern Evaluation

  • Data mining: the core of knowledge discovery process.

Data Mining

Task-relevant Data

Selection

Data Warehouse

Data Cleaning

Data Integration

Databases

steps of a kdd process
Steps of a KDD Process
  • Learning the application domain:
    • relevant prior knowledge and goals of application
  • Creating a target data set: data selection
  • Data cleaning and preprocessing: (may take 60% of effort!)
  • Data reduction and transformation:
    • Find useful features, dimensionality/variable reduction, invariant representation.
  • Choosing functions of data mining
    • summarization, classification, regression, association, clustering.
  • Choosing the mining algorithm(s)
  • Data mining: search for patterns of interest
  • Pattern evaluation and knowledge presentation
    • visualization, transformation, removing redundant patterns, etc.
  • Use of discovered knowledge
data mining and business intelligence
Data Mining and Business Intelligence

Increasing potential

to support

business decisions

End User

Making

Decisions

Business

Analyst

Data Presentation

Visualization Techniques

Data Mining

Data

Analyst

Information Discovery

Data Exploration

Statistical Analysis, Querying and Reporting

Data Warehouses / Data Marts

OLAP, MDA

DBA

Data Sources

Paper, Files, Information Providers, Database Systems, OLTP

from olap to on line analytical mining olam
From OLAP to On Line Analytical Mining (OLAM)
  • Why online analytical mining?
    • High quality of data in data warehouses
      • DW contains integrated, consistent, cleaned data
    • Available information processing structure surrounding data warehouses
      • ODBC, OLEDB, Web accessing, service facilities, reporting and OLAP tools
    • OLAP-based exploratory data analysis
      • mining with drilling, dicing, pivoting, etc.
    • On-line selection of data mining functions
      • integration and swapping of multiple mining functions, algorithms, and tasks.
  • Architecture of OLAM
an olam architecture
An OLAM Architecture

Mining query

Mining result

Layer4

User Interface

User GUI API

OLAM

Engine

OLAP

Engine

Layer3

OLAP/OLAM

Data Cube API

Layer2

MDDB

MDDB

Meta Data

Database API

Filtering&Integration

Filtering

Layer1

Data Repository

Data cleaning

Data

Warehouse

Databases

Data integration

summary
Summary
  • Data warehouse
    • A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process
  • A multi-dimensional model of a data warehouse
    • Star schema, snowflake schema, fact constellations
    • A data cube consists of dimensions & measures
  • OLAP operations: drilling, rolling, slicing, dicing and pivoting
  • OLAP servers: ROLAP, MOLAP, HOLAP
  • From OLAP to OLAM
ad