
A Narrow Escape: How We Avoided the Data Warehouse “Death March”

Northern California Oracle User’s Group

November 15, 2001

Chris Lawson

home.mindspring.com/~karo3

What is a “Death March” ?

Edward Yourdon, 1997, Death March: The Complete Software Developer’s Guide to Surviving “Mission Impossible” Projects.

Death March projects involve “a forced march imposed upon relatively innocent victims, the outcome of which is usually a high casualty rate.”

Some “Encouraging” Words
  • DW projects have a terrible reputation, often producing mad customers and a frustrated project team.
  • Many seek to “jump ship.”

Yourdon: “A large percentage of the projects associated with start-up companies are death march projects. A large percentage of these projects will fail.”

Death March,

Death March!

Further “Words of Encouragement”
  • Approximately 50% of large DW projects “fail to meet the desired levels of success,” often failing on the first or even second attempt.

CIO magazine, “The Middle Ground,” 1/15/99.

By the time the data sources have been identified and transformed, the in-house customers have often lost interest.

Traditional Approach: Will it Work?
  • For large information systems, development has traditionally been an extremely structured process.
  • Many days are spent on up-front analysis, requirements analysis, design reviews, etc.
  • The strategy for these types of projects is to invest much time early, when mistakes are cheaper to fix.
Traditional Software Project
  • For large government-funded projects, this is reasonable because requirements are fixed.
  • Customer may even mandate standards, such as “SEI” (Software Engineering Institute).
  • These standards require strict requirement analysis and rigorous design reviews.
  • Emphasis is on repeatable processes and continuous improvement.
Change in “Ground Rules”
  • The strategy described above often does not work well with large, private-sector projects, such as Data Warehouses.
  • The “ground rules” defining a successful project are different.
  • For example, in contrast to government projects, Data Warehouse requirements are negotiable, elusive, and will probably change.
Change in Ground Rules (cont’d)

“Just when you are about to deliver, expect the rules to change – then change again.”

Steve Tracy, Intelligent Enterprise, “Close the Loop,” March 27, 2001.

In other words, the entire nature of the project development cycle is fluid.

Our Escape Route: Prototyping
  • Early phases were very limited in scope.
  • We deliberately followed a simpler development process.
  • The Prototype approach allows early demonstration of a working model with some features.

The data warehouse designed for DigitalThink used a prototype strategy with a very restricted scope.

Key to Prototyping Benefits
  • With scope limited to only six key reports and one data source, our early successes provided confidence for later stages.
  • Prototyping provides the opportunity to quickly uncover “show stoppers.”
  • Roadblocks will be found quickly.
Another Benefit: Credibility
  • Prototyping allows the team to gain support among potential users.
  • The working prototype enabled the team to perform several “demos” of key reports.
  • These demos provided “buy-in” from interested parties.
  • Demos also proved that the project was “on track”--we gained credibility.
Design Overview
  • Like most data warehouses, the DigitalThink warehouse used a process called ETL—Extraction, Transformation, and Loading.
  • Nightly runs perform data extraction, transformation, and loading from the production Sybase system into Oracle 8i.
  • Only the changed production data is inserted (see the sketch after this list).
  • Data is transformed into the appropriate ‘star’ schema format using the E.piphany ETL tool.
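A minimal sketch of the “changed rows only” idea, using hypothetical table and column names (the actual extraction and transformation were handled by the E.piphany tool, and a gateway or flat-file transfer would sit between Sybase and Oracle):

  -- Hypothetical nightly incremental copy: pull only rows changed since the last load.
  INSERT INTO stg_enrollment (enrollment_id, student_id, course_id, status, last_modified)
  SELECT enrollment_id, student_id, course_id, status, last_modified
  FROM   prod_enrollment@prod_gateway        -- illustrative link to the Sybase source
  WHERE  last_modified > (SELECT NVL(MAX(last_modified), TO_DATE('1900-01-01','YYYY-MM-DD'))
                          FROM   stg_enrollment);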

[Architecture diagram]
Data flow: Production Sybase → Load Program → STAGING Oracle → ETL Program → REPORTING Oracle, hosted on a Sun E4500 server cluster.
Reporting Server: Sun E-3500 cluster running the Actuate e.Reporting Server and Actuate ReportCast.
Web Server: Apache 1.3.12 on two Sun Ultra 2 machines.
Reporting System
  • The application server layer was actually a Report Server in our case.
  • Reports are produced by the Actuate reporting system.
  • Actuate reads the pre-defined report definition to build a database query.
  • It then executes the query on the Oracle 8i database and adds formatting and graphics.
  • Finally, it forwards the report to the Apache web server for delivery to the client’s web browser.
Database Surprises
  • Surprisingly(?) some of the newest Oracle features were a mixed blessing.
  • Star Transformation did improve performance of some reports, but degraded others.

I wasn’t surprised

The feature was turned off when a bug caused different results from the same query!

More on Star Schemas later!

Report “Factories”
  • Some queries require extensive processing.
  • This presents risk that numerous reports might be requested simultaneously, degrading the system.
  • This risk was mitigated via “factories,” which restrict the number of simultaneous queries.
  • Reports are assigned to particular factories based on expected run time.
  • The factory for the typical report was configured to allow execution of two reports per CPU.
A Big Challenge: Lumpy Data
  • Certain customers accounted for a large proportion of the data. This presented a performance-tuning dilemma.
  • Time for reports ranged from a few seconds for most to a few minutes for some customers.
  • Histograms were no good, because a ‘big’ customer looks like a ‘regular’ account (at first).

How can performance be optimized given this huge variance?

Oracle Trick: “Query Rewrite”
  • A recent database feature uses “materialized views” (snapshots) with “Query Rewrite.”
  • Each night, aggregate data is stored in a materialized view for the big accounts.
  • Then, whenever these customers submit a report, the optimizer rewrites the database query to use the special aggregate data (see the sketch after this list).
  • Another help was “pre-running” reports for big customers. The Report Server then uses the pre-built report instead of issuing a new query.
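A minimal sketch of the materialized-view idea, with hypothetical table and column names; ENABLE QUERY REWRITE is what lets the optimizer silently substitute the aggregate:

  -- Nightly aggregate for the largest accounts (illustrative names).
  CREATE MATERIALIZED VIEW mv_big_account_activity
    BUILD IMMEDIATE
    REFRESH COMPLETE
    ENABLE QUERY REWRITE
  AS
  SELECT customer_id, course_id, COUNT(*) AS completions
  FROM   course_activity
  GROUP BY customer_id, course_id;

  -- With QUERY_REWRITE_ENABLED = TRUE, a report query that aggregates
  -- COURSE_ACTIVITY by these columns can be rewritten to read the much
  -- smaller materialized view instead of the detail table.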
Star Schema Queries

Let’s look at how Oracle uses Star Schemas

[Star schema diagram]
Fact table: SALES (100 million rows)
Dimension tables: FACTORY (10 rows), PRODUCT (500 rows), STORE (100 rows), TIME_PERIOD (365 rows)

Star Schema Performance
  • Star Schemas are very useful, but good performance can be elusive when joining.
  • There is always the “normal” way of joining--Hash Join, Nested Loop, Sort/Merge.
  • But--the Oracle optimizer has some “tricks” to use when joining tables in Star Schemas.
  • These two tricks are called “Star Join” and “Star Transformation.”
  • These two methods are NOT the same, and use drastically different ideas.
Intro to SQL Joins with Stars
  • The Issue: What if the selection criteria are spread among the Dimension tables?
  • That is, the query as a whole only returns a few rows, but the “where clause” on each Dimension table is not very restrictive.
  • This presents a special type of problem, and the join methods can have drastically different performance, as we will see.

Failure to understand this issue will cause bad things to happen.

Example: Typical Query

Select Sales.Sale# From Sales, Time_Period, Factory, Product, Store
where Time_Period.Quarter = 'Q1-2000'
  and Factory.Manuf_Name = 'Coleman'
  and Product.Description = 'Camping Gear'
  and Store.City = 'Elk Grove';

The query as a whole returns only a few rows, but any one criterion alone would return many rows. (Join predicates between SALES and the Dimension tables are omitted here for brevity.)

The Dilemma
  • How can we “jump” to the final result set?
  • It seems like no matter how we join the tables, we must process many rows. The earliest joins must “pay the price.”
  • A solution seems impossible! Or is it?
  • We only want the final result set, not all the rows we have to “wade through” to get there.
Oracle’s First Answer: STAR Join
  • The optimizer recognizes the Star setup and bypasses the usual join methods.
  • Prior to joining to the Fact table, the optimizer jointly considers the dimension constraints.
  • The optimizer explicitly builds a list of all possible combinations (Cartesian product) of the Dimension rows that meet the selection criteria.
  • This small set of rows is used to access the Fact table, via a (B*tree) composite index.
Star Join Example

Step 1: Find the Cartesian product of Dimension rows that meet the search criteria.

  Quarter = 'Q1-2000'          90 rows
  Manuf   = 'Coleman'        x  1 row
  Product = 'Sleeping Bag'   x  1 row
  Store   = 'Boston'         x  1 row
  Total                        90 rows

Step 2: Using this set of 90 rows, access the Sales table via the 4-column index.

STAR Join: Complications
  • Requires a huge composite index on the Fact table.
  • Multiple huge indexes will be necessary so that the “leading columns” rule is always met (an example index is sketched below).
  • The Cartesian product can be huge. Example:

  Quarter = 'Q1-2000'            90 rows
  Manuf like 'C%'              x 10 rows
  Product like 'Sleeping%'     x 10 rows
  Store_Area = 'East'          x 20 rows
  Total                           180,000 rows

Thus, a Star Join would require 180,000 index lookups.
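For illustration, the kind of composite B*tree index a Star Join relies on might look like the following; the foreign-key column names are hypothetical:

  -- One large composite index covering all four dimension foreign keys (illustrative names).
  CREATE INDEX sales_star_idx
    ON sales (time_id, factory_id, product_id, store_id);

  -- Additional permutations of the leading columns would be needed so that
  -- queries constraining only some of the dimensions can still use an index.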

Oracle’s Second Answer: STAR Transformation

The “Trick”--use the inherent speed of combining bitmaps.

Instead of figuring out all possible combinations, the combining is done at the bitmap level.

  • Need a bitmap index on each Fact-table foreign key pointing to a Dimension table (see the sketch after this list).
  • Star_Transformation_Enabled = True
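A minimal sketch of that setup, reusing the same hypothetical foreign-key column names as above:

  -- One bitmap index per dimension foreign key on the Fact table (illustrative names).
  CREATE BITMAP INDEX sales_time_bix    ON sales (time_id);
  CREATE BITMAP INDEX sales_factory_bix ON sales (factory_id);
  CREATE BITMAP INDEX sales_product_bix ON sales (product_id);
  CREATE BITMAP INDEX sales_store_bix   ON sales (store_id);

  -- Enable the feature for the session (it can also be set instance-wide).
  ALTER SESSION SET star_transformation_enabled = TRUE;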
Example of STAR Transformation

Consider again the criteria: Quarter = 'Q1-2000' AND Manuf like 'C%' AND Product like 'Sleeping%' AND Store_Area = 'East'

Step 1: Scan each Dimension table to find rows that match the search criteria, and get their primary keys:

  QUARTER   90 rows {'1/1/2000', '1/2/2000', etc.}
  MANUF     10 rows {'Coleman', 'Clark', etc.}
  PRODUCT   10 rows {'Sleep Bag', 'Sleepware', etc.}
  STORE     20 rows {'Boston', 'New York', etc.}

STAR Transformation

Step 2: Access matching bitmaps on Fact table. Combine bitmaps (‘OR’) for each Dimension:

  QUARTER   Merge 90 bitmaps => 1 bitmap
  MANUF     Merge 10 bitmaps => 1 bitmap
  PRODUCT   Merge 10 bitmaps => 1 bitmap
  STORE     Merge 20 bitmaps => 1 bitmap

STAR Transformation (cont’d)

Step 3: Combine (‘AND’) bitmaps yielding the “final” bitmap that encompasses all selection criteria.

Step 4: Use final bitmap to read the Fact table.

Step 5: [If necessary] Return to Dimension tables to get columns not yet read.
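If the optimizer does not pick the transformation on its own, it can be requested per query with Oracle’s STAR_TRANSFORMATION hint. A hedged example against the sample star schema (the join-key column names are hypothetical):

  SELECT /*+ STAR_TRANSFORMATION */ s.sale#
  FROM   sales s, time_period t, factory f, product p, store st
  WHERE  s.time_id    = t.time_id
  AND    s.factory_id = f.factory_id
  AND    s.product_id = p.product_id
  AND    s.store_id   = st.store_id
  AND    t.quarter     = 'Q1-2000'
  AND    f.manuf_name  LIKE 'C%'
  AND    p.description LIKE 'Sleeping%'
  AND    st.store_area = 'East';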

High Availability
  • The Reporting system was required to be available 24x7, because of world-wide customers.
  • This required a method of keeping the database available while new data is loaded.
  • The E.piphany tool provided a handy solution to this via a duplicate set of database tables.
High Availability (cont’d)
  • Each table set contains all the data needed to run a report. Thus, while the users are querying set “A,” new data is copied into “B.”
  • At the conclusion of the nightly data update, new reports are directed (via a view) to the most recent set of data--either “A” or “B” (see the sketch after this list).
  • Of course, this feature requires more storage, but the high availability benefit was deemed a good tradeoff.
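A minimal sketch of the A/B switch, with hypothetical table and view names (reports always query the view, never the tables directly):

  -- Users currently read set "A".
  CREATE OR REPLACE VIEW sales_current AS SELECT * FROM sales_a;

  -- ...the nightly load writes the fresh data into SALES_B...

  -- At the end of the load, repoint readers to set "B" with no downtime.
  CREATE OR REPLACE VIEW sales_current AS SELECT * FROM sales_b;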
High Availability (HA)
  • The database and reporting servers were configured for HA (High Availability) using Veritas Cluster Server (VCS).
  • With VCS, each server is actually part of a two-node cluster.
  • Upon failure of one node, the entire application and database automatically switch to the alternate node.
  • See web site for further documentation on Veritas VCS.
Teamwork
  • The critical position of project architect was filled very early in the project.

Our architect

  • The architect was one of the few team members with extensive data warehouse design experience.
  • He provided the “road map” and designed the database “Star” schema.
Teamwork (continued)
  • External customers were invited to preview prototype reports, and provide feedback.
  • The DBA worked with the designers to optimize the response time, and identify database bottlenecks.
  • This frequently meant identifying poorly performing SQL code and suggesting corrections or new indexes.

Our DBA loves SQL tuning

Summary: Our Secret Formula
  • The project’s success was not primarily due to “gurus” or clever development techniques.
  • Rather, the prototyping strategy provided a positive impetus and laid good groundwork for further development.
  • Trust and cooperation among team members were also key contributors to success.
  • Recipe for success: competent personnel plus small early successes plus credibility.


For Further Information:

Chris_Lawson @ Yahoo.com

home.mindspring.com/~karo3