World’s Largest
Download
1 / 37

largeDB - PowerPoint PPT Presentation


  • 247 Views
  • Uploaded on

World’s Largest Databases. Howard Fosdick (630)-279-4286. (C) 2004 FCI. Hands-on DBA (and SA) for … Oracle, DB2, SQL Server Unix, Linux, Windows Founder IDUG, MWDUG, CAMP Author, Speaker. Who Am I?. Independent Contractor (630)-279-4286 [email protected]

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'largeDB' - EllenMixel


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Slide1 l.jpg

World’s Largest

Databases

Howard Fosdick

(630)-279-4286

(C) 2004 FCI


Slide2 l.jpg

Who Am I?

Independent Contractor

(630)-279-4286

[email protected]


Slide3 l.jpg

Outline

1. What’s a “Big Database”

2. DSS

3. OLTP

4. Observations


Slide4 l.jpg

1. Winter Corp.

  • -- Database Top Ten

  • -- Yearly survey

    -- Vendor neutral

    -- Free at: www.wintercorp.com

    2. Survey.com

    -- High-End BI/DW Competitive Analysis

    -- Survey of 150 companies w/ big warehouses

    -- Free at: www.survey.com

Statistics Sources

“Thank You” to both sources


Slide5 l.jpg

Classifying Large Databases

DSS

OLTP

Decision Support Systems (DSS)

Online Analytical Processing (OLAP)

Data Warehouses (DW)

Multi-dimensional Databases (MDD)

+ Query oriented, mainly Read-only

Online Transaction Processing (OLTP)

+ Update with short transactions

(transaction= small CPU & data resources)

Commercial IT vs. Scientific/Research databases


Slide6 l.jpg

Database Size

- User data

- User data plus metadata & indexes

- DASD farm

Users

- Concurrent users

- Total user population

Load

- Concurrent queries

- Queries / day or hour

(simple vs complex queries)

What’s a Large Database ?

VLDB = Very Large Database

Good definitions and measurements are key to success



Slide8 l.jpg

Data Warehouses VS. Data Marts

DW

DM

  • Application neutral

  • Service multiple organizational needs

  • Application specific

  • Organizationally focused

Largest systems are usually data warehouses


Slide9 l.jpg

!!!!! Super Big Groceries !!!!!

Preferred Customer Card #283736

Hello, I’m Scot94

03/04/04 02:38 3284 03 2918 33

Store 493 Loc 229

PRETTY-LADY HAIRCLR 1 5.99

AARP MAGAZINE 1 4.95

DIAPERS 2 10.00

BEER SIX-PACK 1 3.45

Tax 2.40 BAL 36.79

Cash 40.00

Change 3.21

Save this Receipt –

Get $2.00 off on Prozac

When You Buy Super-Baby Food !

Web Sites --

- Clickstream data

Retail --

- Transaction Level Detail (TLD)

What’s Driving the Growth of

Large Data Warehouses ?

Understanding customer

behavior means $$$ !


Slide10 l.jpg

  • Necessary Preconditions --

  • Cheap Hardware

  • Higher reliability / availability

  • (based on dynamic hardware swapping)

  • Better Software

  • Lax privacy laws in USA

    • EU curtails cross-usage of data

    • EU has stronger privacy laws

What’s Driving the Growth of

Large Data Warehouses ?


Slide11 l.jpg

World’s Largest DSS Systems

  • Way bigger than just 3 years ago

  • All Unix “mainframes”

  • All use SANs (Storage Area Networks) (aka ESS)

  • No IBM Mainframes

  • No Windows or Wintel

  • No SQL Server

  • No Linux or Open Source databases

  • NCR/Teradata niche market at 2.7% (Gartner 05/28/03)

  • Goodbye Informix!

©2003 Winter Corp.

Database Size =

disk storage for

user tables,

indices, aggregates


Slide12 l.jpg

Storage Area Network

Unix “mainframe”

Large DSS Systems

Sun E12/15K

HP Superdome

IBM Regatta

Query

Users

EMC

Hitachi

HP

LSI

  • Unix “mainframes” –

    • + Dynamically add/drop CPUs, RAM

      • (Sun calls it partitioning)

    • + High reliability

    • (as good as clusters or Mainframes)

    • + Capacity on Demand

  • SANs –

    • + Flash (“snap”) backup

      • (OS-level backup)

    • + Large Cache

    • + Intelligent data

    • placement/movement


Slide13 l.jpg

Example Evolution

– Scaling a Unix “Mainframe”

35 concurrent

users

25 concurrent

users

64 CPUs

@ 64 Gig RAM

12 concurrent

users

Other upgrades:

Oracle 8i -> 9i

Sun E10K -> E12K

32 CPUs

@ 64 Gig RAM

8 CPUs

@ 16 Gig RAM


Slide14 l.jpg

World’s Largest DSS Systems -- Windows

©2003 Winter Corp.

  • Way smaller than Unix systems

  • Way bigger than just 3 years ago

  • Oracle vs SQL Server (like market share battle for Windows DBMSs)

  • Also use SANs (Storage Area Networks)

  • No IBM DB2 UDB

  • No Teradata


Slide15 l.jpg

©2003 Winter Corp.

World’s Largest DSS Systems

-- By Peak Workload

©2003 Winter Corp.


Slide16 l.jpg

Where did IBM Mainframes Go ?

1994

2004

Big

Iron

Big

Silicon

Poof!

  • -- Goodbye…

    • -- Largest databases

    • -- Smaller mainframes (VM, VSE)

  • -- Reliability advantage eroded

  • -- High cost per CPU

  • +Hello Linux !

  • + Good for --

  • + Consolidation platform

    • + Legacy systems

    • + Virtualization

    • (multi-OS platform)


Slide17 l.jpg

Oracle Rising

  • Joined the Top Ten list 3 to 5 years ago

  • 8i added essential DSS technologies ...

    • + Partitions

    • + New ROW ID (for bigger databases)

    • + Thorough Parallelism (DML, DDL, utilities)

    • + Index improvements

    • (bit mapped IXs, function-based, desc, others)

    • + Resource Manager (proactive)

    • + Materialized Views

    • + Large memory mgmt

    • + Optimizer is Partition-aware

    • + Online DDL operations and Utilities


Slide18 l.jpg

Amazon

Best Buy

Colgate

Telecom Italia Mobile

System

HP Superdome

Sun 15K

IBM p690 Regatta

HP AlphaServer

Architecture

SMP

SMP

SMP

Cluster

Storage

EMC

EMC

IBM

EMC

Processors

64

24

24

2 node cluster

Oracle Version

9i

8i

9i

8i

DB Size

13 T

6.3 T

3.8 T

16 T

Number of Tables

600

4025

27,000

1,200

Detail Data

Clickstream data

Sales Transaction data

Varied detail data

Call detail records

User Population

800

16,000

6,200

400

Concurrent Users

55-60

600-700

600-700

55

DBAs

2

2

n/a

3

Peak Workload

4300 queries / day

150,000 queries /

4 hour period

14,200 steps /

day

700 M records loaded / day

Example Oracle Warehouses

©2003

Winter

Corp.


Slide19 l.jpg

Why Not Oracle Clustering ?

  • + Great for non-disruptive scaling of existing systems

  • . . . But the biggest systems tend not to use it

  • -- Unix “mainframe” no longer requires clustering

  • for reliability, availability or easy scalability

  • -- Clustering means complexity in minimizing the…

  • -- Locking issues

9i improved this via Cache Fusion –

but SMP Unix “mainframe” will still be favored


Slide20 l.jpg

Where’s SQL Server 2000 ?

  • Big in OLTP but lacks essential DSS technologies ...

    • -- Parallelism restricted to SELECTs

      • -- Needs it for other DML, DDL, utilities

    • -- Partitions

    • -- Wintel restriction

Yukon ?

(Features = partitioning, database mirroring, mirrored backups, online Indexing & Restore, fast recovery,

ANSI 1999 T-SQL, CLR support, native XML, XML Query, better .NET support,

Reporting Services, Service Broker (async messaging), extensible data types…)

-- Many new features. . . ready for “Top Ten” DSS ?


Slide21 l.jpg

Where’s Open Source ?

  • Linux

  • + 2.6 kernel now out

  • + More CPUs (to 16)

  • + More RAM (> 4+ Gig)

  • + Better threading, file system support

  • MySQL and PostgresQL

    • -- Top out at 500,000 page views per day (EWeek 2003)

    • (or 15 per second)

    • + Improving rapidly

Prediction – open source will support big databases

but not “Top Ten” list sites


Slide22 l.jpg

Risks of Large DWs

  • 40% of IT projects fail due to … Management (time&budgetissues)

  • “Large warehouses are unforgiving”-- Survey.com

  • Design issues critical

    • Database Design

    • Query design (and EXPLAINs)

    • ETL design and scheduling

  • Pre-program wherever possible

  • (control users and the resources they use)

  • Monitoring and alerts

  • Scale gradually (staggered loads on a schedule…)

  • Benchmarks (after each Scaling Point)


Slide23 l.jpg

Risks of Large DWs

  • Partitioning data properly is critical

    • For better physical management (utilities)

    • Optimizers use this info

    • Parallelism via multiple partitions

  • How to partition

    • Depends on data usage

    • Examples: geographical, hash, unique id, ranges…



Slide25 l.jpg

World’s Largest OLTP Systems

©2003 Winter Corp.

  • Wintel “mainframes” arrive !

  • SQL Server arrives

  • Use SANs

  • CA can do the job (but has tiny overall database market share)

  • Oracle has big systems -- but not in the top ten


Slide26 l.jpg

World’s Largest OLTP Systems

-- Unix -- Windows

©2003

Winter Corp.

©2003

Winter Corp.


Slide27 l.jpg

World’s Largest OLTP Systems

-- By Number of Rows

©2003

Winter Corp.

©2003

Winter Corp.


Slide28 l.jpg

OLTP Observations

  • Wintel “mainframes” w/ SQL Server displace MVS/CICS

  • SQL Server dominates Wintel OLTP

    • Great for pre-programmed, resource-limited txns

  • Oracle dominates Unix OLTP



Slide30 l.jpg

Architectures

Shared-disk

Clusters

Shared-nothing

(Massively Parallel Processing or MPP)

Large SMP

“mainframe”

The “architectural debate” means

far less than it used to !


Slide31 l.jpg

Product:

Architecture:

Implementation:

DB2 UDB for z/OS

Shared-disk clustering

DB2 Data Sharing on Sysplex

DB2 UDB for LUW

Shared nothing

DB2 UDB ESE partitioning feature

Oracle

Shared-disk clustering

or SMP

Real Application Clusters (RAC)-- previously known as

Oracle Parallel Server (OPS)

SQL Server 2000

Shared nothing

or SMP

Customer-developed partitioning based on SQL Server features

Teradata

Shared nothing

Teradata on NCR MPP

Vendor Architectures


Slide32 l.jpg

DBMS Licensing Costs

+ Low-cost SQL Server supports the

biggest OLTP systems

-- Pressure on Teradata to keep its niche

+ Open Source DBMSs have a role

but it’s not “Top Ten” databases

Teradata

$$$$$

Oracle

DB2 UDB

Biggest DSS

Systems

Biggest OLTP

Systems

SQL Server 2000

Open Source

(MySQL, PostgreSQL)

Database pricing varies by the options

selected and by the deal an IT organization

cuts with the vendor.

Your mileage may vary!

$

TCO ?


Slide33 l.jpg

DW Labor Costs

© 2002 Survey.com

  • Like TCO, Labor Costs may be an un-measurable …

  • Figures applicable across sites ?

  • Every vendor claims lowest labor costs

  • “Terabytes perDBA” may be non-linear!

  • 1 or 2 DBAs for a 24/7 site ?

  • Development staff will be larger than Maintenance staff

  • Your mileage will vary


Slide34 l.jpg

Trend !

Multi- Machine Mixed Systems

Sabre /

Travelocity

45 Linux w/

MySQL servers

17 Himalaya

Non-stop w/

Master database

(Fare look-up

and routing)

(Transactional updates)

EWeek, 2/23/04


Slide35 l.jpg

Trend !

Multi- Machine Mixed Systems

Omaha

Steaks

* 50,000 to 68,000 daily sessions

* 1 year in Production / 8 Million sessions

DB2

17 Linux w/

MySQL servers

(Shopping cart)

ISeries

(Transactional

updates)

EWeek 2003


Slide36 l.jpg

Trends !

Conclusions

  • Databases are growing exponentially

  • IT is closing in on Scientific/Research databases

  • “Multiple machine” mixed systems are becoming popular

  • (Monolithic central databases are no longer the only game in town)

  • “Mixed use” databases are becoming more common

    • Multiple applications

    • Read and update

  • Open Source supports large systems -- but not “Top Ten”

  • VLDBs are instructive – but unique in some ways


Slide37 l.jpg

?

?

?

?

?

questions...

?

?

?

?


ad