Multi-terabyte Data Warehouses on MYSQL? Absolutely!
This presentation is the property of its rightful owner.
Sponsored Links
1 / 33

Multi-terabyte Data Warehouses on MYSQL? Absolutely! PowerPoint PPT Presentation


  • 78 Views
  • Uploaded on
  • Presentation posted in: General

Multi-terabyte Data Warehouses on MYSQL? Absolutely!. Agenda. Data Warehousing Today Traditional Data Warehouse Solutions A New Approach to Multi-Terabyte Data Warehouses. A Look at the Market. The worldwide database market was $18.8 billion in 2006 11.1 Billion for OLTP

Download Presentation

Multi-terabyte Data Warehouses on MYSQL? Absolutely!

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Multi terabyte data warehouses on mysql absolutely

Multi-terabyte Data Warehouses on MYSQL? Absolutely!


Agenda

Agenda

  • Data Warehousing Today

  • Traditional Data Warehouse Solutions

  • A New Approach to Multi-Terabyte Data Warehouses


A look at the market

A Look at the Market

  • The worldwide database market was $18.8 billion in 2006

    • 11.1 Billion for OLTP

    • 7.7 Billion for Data warehousing

  • 41% of the database budget is spent on data warehousing

  • Data warehousing is the number one area of CIO spend in North America in 2007 and 2008

  • The data warehousing market is growing twice as fast as the OLTP database market


Bi is not just for the boardroom

BI is Not Just for the Boardroom

  • BI started as a strategic, decision-support tool, used to create canned reports by executives and analysts to guide the ship

  • Today, BI is mission critical, and serves users across the enterprise, used to support not only traditional analytics, but also daily, operational decision making

  • These changes in use have brought changes in infrastructure requirements

  • Traditional RDBMS have trouble making the grade


How is the data warehouse different

How is the Data Warehouse different?

  • Many queries - all very different, unpredictable, and always changing

  • Queries are very complex - lots of joins, group bys, where clauses

  • Focus on history over many days, months, years

  • Interface to user and many different Business Intelligence tools

OLTP Data Warehouse

  • Many simple transactions of exactly the same type

  • A lot of tuning (data model, indexes, partitions) in order for the specific transaction to perform well

  • Focus on current data

  • Scaling thru massive hardware and multiple copies of the database (eg. Online gaming systems have 1 database instance for every customer)

  • Interface to an application, often custom


The data warehousing challenge

The Data Warehousing Challenge

“Volume of the world’s data doubles every three years. Ninety-two percent of new information is stored in magnetic media...organizations face a simple problem: what to do with all the data.”

Industry Research

“Collecting and analyzing information that enables your organization to better lead, decide, measure, manage and optimize its overall efficiency is a major financial and competitive differentiator. The faster an enterprise can gather and use relevant information, the faster it will be able to reduce costs and increase profits.”

Gartner


The problem data warehouses are strained

The Problem: Data Warehouses are Strained

Data Volume

Complexity

Trouble

+

=

Users are asking more complex questions

Data is aggregated and deleted

Data is archived and not usable

Complex queries are blocked

Complex queries don’t perform

Data is growing exponentially


What do the current limitations mean for stakeholders

What do the current limitations mean for Stakeholders?

  • Users

    • Do not get access to data they need;

    • Queries run too slowly;

    • Are not allowed to think creatively – ask new and different questions;

    • Are told to wait for months for what they want in minutes

  • IT

    • Besieged with requests for new data sources;

    • Feature creep and changing requirements straining resources;

    • Analytic system maintenance tuning affect support for operational systems;

  • Executives

    • CIOs face service level complaints and rising IT costs;

    • Business unit leaders without analytic data fail to achieve objectives;


About infobright

About Infobright


Data warehousing part of the problem

Data Warehousing: Part of the Problem

10

10

Clickstream and log files

0101010101010101010101010101

10

10

0101010101010101010101010101

10

10

101

101

101

Existing data warehouse

0101010101010101010101010

0101010101010101010101010

10

10

10

01

01

01

External Sources

0101010101010101010101

0101010101010101010101

0

1

1

1

0

1

0

1

1

10

0

10

0

10

01

0

01

10

1

10

01

1

01

10

01

1

0

0

01

10

0

10

101

10

101

101

1

10

1

0

10

0

1

1

101

101

10

101

1010

0

10

1

.

1

0

010

01

0

1

010

0

1010

1

0

1

01

1

0

01

0

0101

10

101

0

01

0

101

0

More Kinds of Output Needed

by More Business Users

More Data & Data Sources

  • I/O intensive, write centric

  • Labor intensive, heavy indexing and partitioning

  • Hardware intensive: massive storage; big servers

Traditional Data Warehousing


Real life example

Real Life Example

“Data is the difference. The difference between a campaign that meets your objectives vs. one that blows them away. Between paying $9 vs. $60 for a new customer. Between predicting what sites to advertise on vs. knowing you’ve put the right message in front of the right person no matter what site they are on.”


Desired queries

Desired Queries:

  • #1 - Campaign Effectiveness:

    • Goal was to determine the optimum number of times to show an ad to get best results

    • Actual query example: analyzed 2 billion rows of campaign frequency by date, to look at 5 campaigns in order to determine how many times a user saw each campaign.

  • #2 - User Demographics by Campaign:

    • Counts users by different demographic categories

    • Very wide range of possible results across varying range of rows

    • Two actual query examples:

      • User entered incorrect campaign number. Search was performed against 1.3 billion rows in user campaign aggregate table and the result was a null set

      • Largest campaign (highest results returned) where 89 million rows (11% of entire table) in user campaign were selected and joined to 57 million rows in the user dimension table


Traditional data warehouse approach

Traditional Data Warehouse Approach

  • Identify the reporting requirements

  • Determine the data needed

  • Design the data warehouse:

    • Extract-Transform-Load

    • Data Model (Logical and Physical)

    • Canned reports and BI tools

      …then

  • Revise the model as reporting requirements change and data grows:

    • Add indexes

    • Partition data to improve performance

    • Restrict users!


Traditional data warehouse approach1

Traditional Data Warehouse Approach

  • Results:

    • Software costs well known and predictable but...

    • Management and support costs spiral:

      • Partitioning strategies

      • Indexing strategies

      • Additional data marts

      • More hardware

    • Business user satisfaction declines as restrictions are placed on:

      • Adhoc query capabilities

      • Volume of historical data that can be queried

      • Time lag between business requirement and system delivery

  • With this particular client, their systems were unable to handle this volume of data so they couldn’t run these queries at all!


Market evolution

Market Evolution

All-purpose RDBMS

Resource intensive, lots of DBA time

Divide and conquer on

lots of hardware (MPP)

Nothing to address underlying issues

Extending Database Concepts

Incremental improvements, still inflexible

Radical New Approach

Hardware Advances

Database Advances

Traditional

Work Harder

Work Smarter

Data Warehouse Innovator

Working Smarter

Innovation


What to look for in a new approach

What to Look for in a New Approach

  • New Approach

  • Leverages column approach

  • Automatically creates structures that:

  • finds needed data

  • responds to all queries,

  • are always ready

  • Has small footprint

  • Uses existing infrastructure

  • Is easy to setup and maintain

Clickstream and log files

0101010101010101010101010101

0101010101010101010101010101

10

101

Existing data warehouse

0101010101010101010101010

0101010101010101010101010

10

01

External Sources

10

0101010101010101010101

0101010101010101010101

101

0

10

1

10

10

10

0

101

01

01

1

10

10

0

01

1

01

10

0

1

0

10

1

10

101

01

0

10

1

01

1

01

1

0

0

01

10

10

0

01

10

101

101

1

01

1

101

01

1

1010101010101010101

10

1

101010101010101010101010

0

10101010101010101010101010101

The Analytic Data Warehouse


A new approach introducing brighthouse

A New Approach: Introducing Brighthouse

Working Smarter, Not Harder

Scalable solution without scaling IT

Clickstream and log files

0101010101010101010101010101

  • Better Analytics

  • Faster Response

  • Decreased IT Burden

  • Smaller Footprint

0101010101010101010101010101

10

101

Existing data warehouse

0101010101010101010101010

0101010101010101010101010

10

01

External Sources

10

0101010101010101010101

0101010101010101010101

101

0

10

1

10

10

10

0

101

01

01

1

10

10

0

01

1

01

10

0

1

0

10

1

10

101

01

0

10

1

1

01

01

1

0

0

01

10

10

0

10

101

101

101

1010101010101010101

1

101010101010101010101010

1

10101010101010101010101010101

The Infobright Analytic Data Warehouse


How brighthouse works smarter

How Brighthouse Works Smarter

  • Smarter architecture:

  • Load data and go

  • No indices or partitions to build / maintain

  • Knowledge Grid created automatically as data loaded

  • Up to 40:1 compression reduces storage

  • Open architecture leverages off-the-shelf hardware

Knowledge Grid—statistics and metadata “describing” the super-compressed data

Data Packs—Data stored in manageably sized, highly compressed data packs

Data compressed using algorithms tailored to data type

Brighthouse


How brighthouse works smarter1

How Brighthouse Works Smarter

Queryreceived by Brighthouse

Optimizeriterates over the Knowledge Grid

Only the data packs needed to resolve the query are decompressed

Often query results can be determined from the Knowledge Grid alone

Knowledge Grid

Data Packs

Brighthouse


Brighthouse is easy on it

Brighthouse is Easy on IT

0101010101010101010101010101

0101010101010101010101010

0101010101010101010101

10

0101010101010101010101

0101010101010101010101010

0101010101010101010101010101

10

10

BI Connectors

10

101

10

10

101

101

10

01

10

10

01

01

1

0

1

0

1

1

10

0

0

1

1

10

0

01

0

10

10

1

01

01

10

1

01

0

01

10

1

10

0

01

101

10

0

10

1

10

0

1

Existing Data Warehouses

Clickstream/Logfiles

  • No strain on IT:

  • No need for physical data modeling

  • Run on standard hardware

  • Works with existing BI and ETL platforms

  • MySQL “wrapper”

  • No need to learn new database system

  • Leverage mature tools

External Sources

ETL Platform Connector


Multi terabyte data warehouses on mysql absolutely

BrightHouse Architecture and MySQL

  • MySQL selected due to:

  • mature connectors, tools, resources

  • interconnectivity and certification with BI Tools

  • commercial OEM license protects our IP

  • most broadly used open-source DB (12 million users).

  • Benefits

  • Greatly improved time to market

  • Development focused on competitive differentiators

  • Sell to MySQL customers experiencing scalability problems


Real life example results with brighthouse

Real Life Example: Results with Brighthouse


Customer query response time results vs oracle

Customer Query Response TimeResults vs. Oracle

Oracle Time: 136 sec

Brighthouse 3.0 Time: 16.8 sec


Query speed as volume of data grows

Query Speed as Volume of Data Grows

Impact of Additional Data on Query Times

Increases as data volume grows

Average Response Time (in Secs)

Brighthouse Performance Advantage

  • Queries were moderately complex, with at least two table joins and two or more where clauses

  • Tables were indexed

  • Response time represents the average response time of queries


Brighthouse load time remains constant

Brighthouse Load Time RemainsConstant

Data Load Times as Volume Increases

Savings in processing time during load over conventional databases

Load Time (in Secs)

Brighthouse load time stays constant

Volume (Rows in Millions)

  • Comparison of load to a single table. Data was loaded in 10 million row chunks

  • Table had a single index


Brighthouse is fast

Brighthouse is Fast

  • Brighthouse is designed specifically to quickly run complex queries on large data sets

    • The Knowledge Grid’s small chunks of highly compressed data are fast and easy to manipulate

    • Knowledge Grid optimizer iteratively optimizes query execution plan

    • Only data packs needed to answer query are opened; Often query results can be determined from the Knowledge Grid alone

  • Users enjoy fast response times no matter how complex or spontaneous their query

“Each month we process and analyze data generated by 20 billion online transactions,”...We are pleased by Brighthouse’s performance and the fact that we now can get answers to questions we want to ask. --Ola Udén, CTO of TradeDoubler


Brighthouse is flexible

Brighthouse is Flexible

  • Brighthouse ensures changing and complex analytic requirements are supported with fast response times

    • Knowledge Grid is built on-the-fly, creating a layer of statistics and metadata across all columns and rows

    • Knowledge Grid obviates the need for Indexing, data partitioning, or other physical data structures

    • Data no longer needs to be off-loaded or archived

  • Users can ask any question of all the data

"Brighthouse allows us to do very complex analyses on over 30 terabytes of data”

-- Jay Webster, General Manager, BlueLithium


Brighthouse is simple

Brighthouse is Simple

  • Brighthouse eliminates the complications, cost and disruption IT teams must endure to support complex queries

    • No DBA resources required to build indices and data partitions in response to user requirements

    • No complicated performance variables to tune; “No Knobs”

    • Leverages MySQL ease of use, connectivity, and supported BI tools

    • Runs on off-the-shelf hardware

  • Reduced complexity frees IT resources and significantly lowers lifetime TCO


How does brighthouse impact tco

How does Brighthouse impact TCO?

  • Hardware footprint 20 to 50 times smaller

  • Fewer DBA resources required

    • 40 – 60% reduction in one-time build

    • Up to 90% reduction in ongoing support

  • Support for existing infrastructure

  • Load and Go

  • Improved SLAs – immediate response vs weeks or months

ETL and

Data Changes

PhysicalModeling

Software

Hardware

Tuning


Prove it

  • Contact Us

  • [email protected]

  • 416.596.2483, x. 225

  • Download Claudia Imhoff paper:

  • http://www.infobright.com

Prove it!

RAPID START:

  • Call us– we’ll walk you through a a few questions to mutually determine if our technology is a good fit.

  • Agree on process– e.g. your place or ours?

  • Load and go – Load your data, run your queries

  • Summarize results – performance, compression, load times

  • Next steps –did we prove it?


  • Login