
The Rebirth of Database Machines

Dina Bitton

Jim Gray

Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt

Outline
  • Active Disks are coming
  • Disk Tutorial (not presented, but slides in deck)
  • Disk Arms are important (optimize them)
  • The Rebirth of Database Machines

Disks of 30 Years Ago
  • 10 MB
  • Failed every few weeks
  • Cost more than 400$

Disk Arrays
  • 24 CPUs
  • 384 disks
  • More MIPS in the disks than in the CPUs

Year 2003 Disks
  • Big disk (10 $/GB)
    • 3”
    • 200 GB
    • 150 Kaps (KB accesses per second)
    • 30 MBps sequential
  • Small disk (20 $/GB)
    • 2”
    • 40 GB
    • 100 Kaps
    • 20 MBps sequential
  • Both running DBMS, Mail, Web, and OS

From CMU Active Disk web site: http://www.pdl.cs.cmu.edu/Active/

Research Problem: When every disk is a super-computer… and there are thousands of them...
  • Who manages data placement?
  • Query plans among 1,000 servers?
  • How does
    • mirroring work?
    • backup work?
  • Where does my program run?

Relevant University Research on Active Disks
  • Kim Keeton & Dave Patterson @ UC Berkeley: http://www.cs.berkeley.edu/~pattrsn/talks/sigmod98-keynote.ppt
  • Erik Riedel & Garth Gibson @ CMU: http://www.pdl.cs.cmu.edu/Active/
  • Mike Franklin @ U Maryland: http://www.cs.umd.edu/projects/bdisk
  • Anurag Acharya, Mustafa Uysal @ UCSB: http://www.cs.ucsb.edu/TRs/TRCS98-06.html

Outline
  • Active Disks are coming
  • Disk Tutorial (not presented, but slides in deck)
  • Disk Arms are important
  • The Rebirth of Database Machines

Disk Access Time
  • Access time = SeekTime (6 ms) + RotateTime (3 ms) + ReadTime (1 ms)
  • Rotate time:
    • 5,000 to 10,000 rpm
      • ~ 12 to 6 milliseconds per rotation
      • ~ 6 to 3 ms rotational latency
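
The arithmetic on this slide can be checked with a short sketch. The function names are illustrative, and the average rotational latency is taken to be half a rotation, which is what the slide's figures imply.

```python
def rotation_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def access_time_ms(seek_ms: float, rpm: float, read_ms: float) -> float:
    """Seek time + average rotational latency (half a rotation) + read time."""
    return seek_ms + rotation_ms(rpm) / 2.0 + read_ms


if __name__ == "__main__":
    for rpm in (5_000, 10_000):
        full = rotation_ms(rpm)
        print(f"{rpm} rpm: {full} ms per rotation, {full / 2.0} ms average latency")
    # The slide's example: 6 ms seek, 3 ms rotate (10,000 rpm), 1 ms read.
    print("access time:", access_time_ms(seek_ms=6.0, rpm=10_000, read_ms=1.0), "ms")
```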

Disk Access Time Improves Slowly
  • Access time = SeekTime (6 ms, improving 8%/y) + RotateTime (3 ms, improving 8%/y) + ReadTime (1 ms, improving 40%/y)
  • Other useful facts:
    • Power rises more than size^3 (small is indeed beautiful)
    • Small devices are more rugged
    • Small devices can use plastics (forces are much smaller), e.g. bugs fall without breaking anything
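
To see why the two 8%/y terms come to dominate, here is a small sketch of compound improvement. It assumes each rate means the component's time is divided by (1 + rate) every year; that reading of "%/y" is an assumption, and the function names are illustrative.

```python
import math


def years_to_halve(rate_per_year: float) -> float:
    """Years until a time shrinks by half at a compounding yearly rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)


def after_years(value_ms: float, rate_per_year: float, years: float) -> float:
    """Projected time after some years of compounding improvement."""
    return value_ms / (1.0 + rate_per_year) ** years


if __name__ == "__main__":
    print("seek/rotate halve every", round(years_to_halve(0.08), 1), "years")
    print("read time halves every", round(years_to_halve(0.40), 1), "years")
    seek = after_years(6.0, 0.08, 10)
    rotate = after_years(3.0, 0.08, 10)
    read = after_years(1.0, 0.40, 10)
    print("after 10 years (ms):", round(seek, 2), round(rotate, 2), round(read, 3))
```

Under this reading, seek and rotation take roughly nine years to halve while read time halves in about two, so the mechanical terms account for nearly all of the access time after a decade.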

Disk Seek Time
  • Seek time is ~ sqrt(distance) (distance = 1/2 × acceleration × time^2)
  • Specs assume seek is 1/3 of disk
  • Short seeks are common (over 50% are zero length)
  • Typical 1/3 seek time: 6 ms
  • 4x improvement in 20 years.
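
A minimal sketch of the square-root seek model, calibrated so that a 1/3-stroke seek takes the 6 ms quoted above. The calibration and the function name are assumptions for illustration. The model ignores settle time and any constant-velocity phase, so it is only a rough guide.

```python
import math

THIRD_STROKE_SEEK_MS = 6.0  # typical 1/3-stroke seek time from the slide


def seek_ms(distance_fraction: float) -> float:
    """Estimated seek time for a seek covering a fraction (0..1) of the disk.

    Pure sqrt(distance) model, scaled to 6 ms at 1/3 of the stroke.
    """
    if not 0.0 <= distance_fraction <= 1.0:
        raise ValueError("distance_fraction must be between 0 and 1")
    return THIRD_STROKE_SEEK_MS * math.sqrt(distance_fraction / (1.0 / 3.0))


if __name__ == "__main__":
    for fraction in (0.0, 0.01, 1.0 / 3.0, 1.0):
        print(f"{fraction:.3f} of stroke -> {seek_ms(fraction):.2f} ms")
```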

[Figure: arm speed versus time during a seek, labeled "Full Accelerate" and "Full Stop"]

Disk Access Ratios Have Changed
  • Key metrics:
    • $/GB
    • Kaps/GB (KB accesses per second per GB)
    • SCAN: time to scan the disk
  • Scan going from minutes to days
  • Disk arms are the precious resource (disk capacity is no longer the precious resource)
    • Kaps/GB went from 500 to 7 and is going to 1
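
As a rough check of these ratios, the sketch below applies them to the figures on the "Year 2003 Disks" slide. It assumes decimal units (1 GB = 1,000 MB = 1,000,000 KB) and treats a random "scan" as reading the whole disk one 1 KB access at a time. The class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    name: str
    capacity_gb: float
    kaps: float       # 1 KB accesses per second
    seq_mbps: float   # sequential bandwidth in MB/s

    def kaps_per_gb(self) -> float:
        return self.kaps / self.capacity_gb

    def sequential_scan_hours(self) -> float:
        return self.capacity_gb * 1_000.0 / self.seq_mbps / 3_600.0

    def random_scan_days(self) -> float:
        return self.capacity_gb * 1_000_000.0 / self.kaps / 86_400.0


BIG = Disk("big", capacity_gb=200.0, kaps=150.0, seq_mbps=30.0)
SMALL = Disk("small", capacity_gb=40.0, kaps=100.0, seq_mbps=20.0)

if __name__ == "__main__":
    for disk in (BIG, SMALL):
        print(
            f"{disk.name}: {disk.kaps_per_gb():.2f} Kaps/GB, "
            f"sequential scan {disk.sequential_scan_hours():.2f} h, "
            f"random scan {disk.random_scan_days():.1f} days"
        )
```

With these inputs the big disk comes out below 1 Kaps/GB, and reading it through random 1 KB accesses takes on the order of two weeks, which fits the "minutes to days" trend.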

Stripe For More Bandwidth
  • N-stores have N-times the bandwidth
  • Works great!
  • Supported by most file systems
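
One common way to stripe is round-robin block placement, sketched below. The slide does not specify a layout, so the mapping, the one-block stripe unit, and the function names are assumptions.

```python
def stripe_location(block: int, n_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    if n_disks <= 0 or block < 0:
        raise ValueError("need n_disks > 0 and block >= 0")
    return block % n_disks, block // n_disks


def blocks_per_disk(first_block: int, count: int, n_disks: int) -> list[int]:
    """How many blocks of a sequential read land on each disk."""
    counts = [0] * n_disks
    for block in range(first_block, first_block + count):
        disk, _ = stripe_location(block, n_disks)
        counts[disk] += 1
    return counts


if __name__ == "__main__":
    # A 16-block sequential read over 4 disks touches each disk equally,
    # so all four can transfer in parallel.
    print(blocks_per_disk(first_block=0, count=16, n_disks=4))
```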

Mirrors: Replicate Stores for Availability
  • Read one, write all
  • If one fails, rebuild from survivor
  • Run scrubber in background to fix faults
  • N-replicas can give N-times the bandwidth
  • Unavailability ~

A Million Years!!!
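
The slide's own formula is not preserved here, so the sketch below is not a reconstruction of it. It uses the common textbook approximation for a mirrored pair with independent failures, mean time to data loss ≈ MTTF² / (2 × MTTR), with made-up input values, only to show how mirroring can push expected data loss out to very long time scales.

```python
HOURS_PER_YEAR = 24 * 365


def mirrored_pair_mttdl_hours(mttf_hours: float, mttr_hours: float) -> float:
    """Approximate mean time to data loss for two mirrored disks.

    Textbook approximation assuming independent failures and MTTR << MTTF.
    """
    return mttf_hours ** 2 / (2.0 * mttr_hours)


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the slides.
    mttf = 500_000.0  # hours between failures of one disk
    mttr = 24.0       # hours to replace and rebuild a failed disk
    years = mirrored_pair_mttdl_hours(mttf, mttr) / HOURS_PER_YEAR
    print(f"mean time to data loss: about {years:,.0f} years")
```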

RAID5: Parity Saves Storage Space
  • Mirrors: 50% storage overhead
    • read one, write both
  • RAID5: 12% storage overhead
    • read one, write one plus parity
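
A minimal sketch of XOR parity, the mechanism behind RAID5's low overhead. It assumes equal-sized blocks and a group of 8 disks (7 data plus 1 parity), which is one way to arrive at roughly 12% overhead; the slide does not state the group size. Function names are illustrative.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Small-write update: only the data block and the parity block change."""
    return xor_blocks([old_parity, old_data, new_data])


def parity_overhead(n_disks: int) -> float:
    """Fraction of raw capacity spent on parity with one parity block per stripe."""
    return 1.0 / n_disks


if __name__ == "__main__":
    data = [b"disk", b"arms", b"rule"]
    parity = xor_blocks(data)
    # Lose the second block, then rebuild it from the survivors plus parity.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])
    print(parity_overhead(8))
```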

Interesting Fact: Mirrored Disks Optimize Disk Arms
  • Doubles read bandwidth
    • Sequential read: stagger reads from each drive (stripe)
    • Random read: use the closest arm; seek is the min seek
  • Doubles write cost (write both)
    • Write time increases because seek is the max seek
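
The min-seek and max-seek effect can be illustrated with a small Monte Carlo sketch. It assumes both arms and the target sit at independent, uniformly random positions on a disk of length 1, which is a simplification (real mirrors leave both arms at the same place after a write). The function name is illustrative.

```python
import random


def mean_seek_distances(samples: int = 100_000, seed: int = 1) -> tuple[float, float, float]:
    """Return mean seek distance for (one disk, nearest of two, farthest of two)."""
    rng = random.Random(seed)
    single = nearest = farthest = 0.0
    for _ in range(samples):
        target = rng.random()
        d1 = abs(rng.random() - target)  # arm 1 to target
        d2 = abs(rng.random() - target)  # arm 2 to target
        single += d1
        nearest += min(d1, d2)   # read: either copy will do
        farthest += max(d1, d2)  # write: must wait for both arms
    return single / samples, nearest / samples, farthest / samples


if __name__ == "__main__":
    one, read, write = mean_seek_distances()
    print(f"single disk {one:.3f}, mirrored read {read:.3f}, mirrored write {write:.3f}")
```

Under these assumptions the expected distances work out to 1/3, 5/24 and 11/24 of a full stroke, so a mirrored read seeks noticeably less than a single disk and a mirrored write seeks more.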

If Mix Reads & Writes, Mirror is Better Than Partition
  • 2 servers are better than one
  • Benefit is better than 2x write cost if reads outnumber writes

What if you have LOTS of Disks
  • When you have BIG disks (200 GB), arms are precious, space is cheap.
  • If you replicate 1000x
    • write seek time asymptotically approaches 1.7x
    • read seek time asymptotically approaches zero.
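
A sketch of how seek time might scale with the number of replicas. It assumes seek time proportional to sqrt(distance) and independent, uniformly random arm positions; the slide's own model may differ, and the function name is illustrative.

```python
import math
import random


def seek_ratios(n_replicas: int, samples: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Return (read, write) seek time relative to a single unreplicated disk."""
    rng = random.Random(seed)
    single = read = write = 0.0
    for _ in range(samples):
        target = rng.random()
        times = [math.sqrt(abs(rng.random() - target)) for _ in range(n_replicas)]
        single += times[0]   # what one disk alone would have paid
        read += min(times)   # read from the replica with the closest arm
        write += max(times)  # write waits for the farthest arm
    return read / single, write / single


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        read_ratio, write_ratio = seek_ratios(n)
        print(f"{n:4d} replicas: read {read_ratio:.2f}x, write {write_ratio:.2f}x")
```

In this simplified model the read ratio falls toward zero as replicas are added, while the write ratio levels off at roughly 1.6, in the neighborhood of the 1.7x quoted on the slide.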

Outline
  • Active Disks are coming
  • Disk Tutorial (not presented, but slides in deck)
  • Disk Arms are important
  • The Rebirth of Database Machines


The Rebirth of Database Machines

Dina Bitton (IDS)

Jim Gray (Microsoft)

Outline
  • Performance-hungry databases
  • History: life and death of database machines
  • What has changed that can make database machines work today
  • Shared-Nothing Database Machine
  • Where is the required bandwidth
  • DMP: Shared-Nothing & Shared-Everything

Demand for Database Performance
  • Larger Databases:
    • marketing data warehouses: TB of historical data
    • daily news broadcasts: 1 TB of searchable video/audio data
  • Large Scans: searches require access to a large fraction of the database
  • Repeated Scans: DSS queries, data mining algorithms

Life, Death & Reincarnation

Database Machines are coming, Database Machines are coming ... (Hsiao 1979)

Then there were Britton-Lee, Direct, ICL …

Teradata builds highly-parallel shared-nothing SQL server

many university “paper” designs

“Database Machines, An Idea whose time has Passed?” (Boral-DeWitt 1983)

Then there were MMDBs, Grace, Gamma and more Teradata

Then there was Software (Parallel Database Query)

Next: PDQ + lots of disks with power controllers

And All Along

Stonebraker’s Opinion:

“The history of DBMS research is littered with innumerable proposals to construct hardware database machines to provide high performance operations. In general these have been proposed by hardware types with a clever solution in search of a problem on which it might work.”

Readings in Database Systems, Morgan-Kaufmann

Why Not then, but Yes now
  • Too early: small databases on 1 disk
    • Now: TB databases span thousands of disks, need partitioning
  • Disk filter designs: addressed only a small part of DBMS requirements
    • Now: disk controllers are fast computers
  • Exotic technologies (bubbles, CCD…) went away
  • Special-purpose hardware increased design time and cost
    • Now: higher level of integration, VLSI design tools better
  • Parallel query processing was not well understood
    • Now: large body of research, successful commercial implementations

Parallel Query Processing [DeWitt-Gray CACM91]
  • Pipelining: data streams flow from one operator to the next
  • Partitioning: tables are partitioned to allow concurrent processing on partitions
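
The two forms of parallelism can be sketched with Python generators (pipelining) and a hash partitioner (partitioning). This is a single-process illustration only. The operator names, the toy rows, and the choice of hash partitioning are assumptions rather than anything specified on the slide.

```python
from typing import Callable, Iterable, Iterator

Row = dict


def scan(table: Iterable[Row]) -> Iterator[Row]:
    """Leaf operator: stream rows out of a table."""
    for row in table:
        yield row


def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Pass along only rows that satisfy the predicate, one at a time."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows: Iterable[Row], columns: list[str]) -> Iterator[Row]:
    """Keep only the named columns of each row."""
    for row in rows:
        yield {column: row[column] for column in columns}


def hash_partition(rows: Iterable[Row], key: str, n_partitions: int) -> list[list[Row]]:
    """Split rows so that equal keys always land in the same partition."""
    partitions: list[list[Row]] = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key]) % n_partitions].append(row)
    return partitions


if __name__ == "__main__":
    # Hypothetical table for illustration.
    table = [{"id": i, "amount": 10 * i} for i in range(8)]
    for part in hash_partition(table, key="id", n_partitions=2):
        # Each partition gets its own scan -> select -> project pipeline;
        # on a parallel machine each pipeline could run on a separate node.
        pipeline = project(select(scan(part), lambda r: r["amount"] >= 30), ["id"])
        print(list(pipeline))
```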

Data Pathway Contention [Patterson SIGMOD 1998]
  • Disk: external I/O bus is the bottleneck to transfer rate, cost
  • Network: internal I/O bus interface is the bottleneck to delivered bandwidth
  • Memory-Processor: processor-memory interface (cache + memory bus) is the bottleneck to delivered bandwidth

A Shared-Nothing Database Machine

[Figure: a scalable interconnect linking multiple "Processor & Memory" nodes]

  • No contention in memory access or parallel disk access
    => “Embarrassingly Parallel” scan [Patterson]
  • But: how fast does the interconnect need to be?
  • Each processor has its own OS, communication protocols, DB instance
  • Exchange data streams for pipelining ops, for sort, merge
  • Can’t support M:N mapping between disks & threads

Share-Everything?
  • Need more bandwidth for shipping data streams than network can provide
  • Need M:N mapping from disks to processors for sort/merge
  • Control & synchronization: Data-flow best to synchronize processors

Where to Get the Bandwidth?

The Data Manipulation Platform

  • Massive parallel operation, data-flow control
  • M:N thread-to-disk

[Figure: DMP board. Labels: To Host Computer; To other DMP Boards via high-speed switch; I/O interface adapter; Bus adapter; NP 1, NP 2, … NP 16; P 1 … P 4; RFM, BAM, RAM; Direct processor to disk access; Direct disk to memory connect; Direct connection; 1 … 80]


A DSS Query Execution Plan

    select sum(tabX.amount * .08), tabY.region
    from tabX, tabY
    where tabX.key = tabY.region
    group by tabY.region
    order by tabY.region;

[Figure: plan tree, top to bottom: Exchange 5; Sort 1, Sort 2; Exchange 4; Group 1, Group 2 (1/10 grouped); Exchange 3; Temp Disks; HJoin operators (1/10 joined); Exchange 1, Exchange 2; Scan tabX 1 … 32 and Scan tabY 1 … 3 (1/3 selected); Database Disks]
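
The operators named in this plan (hash join, group, sort) can be sketched on a single node as follows. The rows are hypothetical and the exchange operators that move data between parallel workers are left out, so this illustrates the operator logic and not the parallel plan itself.

```python
from collections import defaultdict
from typing import Iterable, Iterator

Row = dict


def hash_join(
    build_rows: Iterable[Row], build_key: str,
    probe_rows: Iterable[Row], probe_key: str,
) -> Iterator[tuple[Row, Row]]:
    """Build a hash table on one input, then stream the other input past it."""
    table: dict = defaultdict(list)
    for build_row in build_rows:
        table[build_row[build_key]].append(build_row)
    for probe_row in probe_rows:
        for build_row in table.get(probe_row[probe_key], []):
            yield probe_row, build_row


def run_query(tab_x: list[Row], tab_y: list[Row]) -> list[tuple[str, float]]:
    """sum(tabX.amount * .08) per tabY.region, ordered by region."""
    totals: dict = defaultdict(float)
    for x_row, y_row in hash_join(tab_y, "region", tab_x, "key"):
        totals[y_row["region"]] += x_row["amount"] * 0.08  # group
    return sorted(totals.items())                          # sort


if __name__ == "__main__":
    # Made-up rows for illustration only.
    tab_y = [{"region": "east"}, {"region": "west"}]
    tab_x = [
        {"key": "east", "amount": 100.0},
        {"key": "west", "amount": 50.0},
        {"key": "east", "amount": 25.0},
        {"key": "north", "amount": 10.0},  # no matching region, dropped by the join
    ]
    for region, total in run_query(tab_x, tab_y):
        print(region, round(total, 2))
```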


Bandwidth Requirements

[Figure: the DSS query execution plan annotated with data rates, bottom to top]
  • Database disks to scans: 32 * 20 MB/s = 640 MB/s
  • Scans to Exchange 1 and Exchange 2: 210 MB/s
  • Exchange 3 to Group 1 and Group 2: 21 MB/s
  • Exchange 4 to Sort 1 and Sort 2: 2.1 MB/s
  • Temp disk contention at the hash joins (HJoin)
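
The rates on this slide follow from the disk count and the reduction factors shown on the execution plan (1/3 selected, 1/10 joined, 1/10 grouped). A short sketch of that arithmetic, with an illustrative function name:

```python
def stage_bandwidths(
    n_disks: int = 32,
    disk_mbps: float = 20.0,
    reductions: tuple[float, ...] = (1 / 3, 1 / 10, 1 / 10),
) -> list[float]:
    """Data rate in MB/s entering each stage, starting at the disks."""
    rates = [n_disks * disk_mbps]
    for fraction in reductions:
        rates.append(rates[-1] * fraction)
    return rates


if __name__ == "__main__":
    labels = ["off the disks", "after selection", "after join", "after grouping"]
    for label, rate in zip(labels, stage_bandwidths()):
        print(f"{label}: {rate:.1f} MB/s")
```

The computed values are slightly above the slide's 210, 21 and 2.1 MB/s because the slide rounds 640/3 down to 210.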

Conclusion

DMP: shared-nothing and shared-everything

IT ISN’T THAT YOU CAN’T SHARE

IT IS WHERE YOU SHARE

ON A CHIP

ON A BOARD

ON A NETWORK
