
Scaleable Computing
Jim Gray
Microsoft Corporation
gray@microsoft.com


Thesis: Scaleable Servers

  • Scaleable Servers

    • Commodity hardware allows new applications

    • New applications need huge servers

    • Clients and servers are built of the same “stuff”

      • Commodity software and

      • Commodity hardware

  • Servers should be able to

    • Scale up (grow node by adding CPUs, disks, networks)

    • Scale out (grow by adding nodes)

    • Scale down (can start small)

  • Key software technologies

    • Objects, Transactions, Clusters, Parallelism


1987: 256 tps Benchmark

  • 14 M$ computer (Tandem)

  • A dozen people

  • False floor, 2 rooms of machines

Figure labels:

  • Staff: manager, auditor, admin expert, hardware experts, network expert, performance expert, OS expert, DB expert

  • Equipment: a 32-node processor array; a 40 GB disk array (80 drives)

  • Load: simulate 25,600 clients


1988: DB2 + CICS Mainframe, 65 tps

  • IBM 4391

  • Simulated network of 800 clients

  • 2 M$ computer

  • Staff of 6 to do benchmark

Figure labels: 2 x 3725 network controllers; refrigerator-sized CPU; 16 GB disk farm (4 x 8 x .5 GB).


1997: 10 years later, 1 Person and 1 box = 1250 tps

  • 1 Breadbox ~ 5x 1987 machine room

  • 23 GB is hand-held

  • One person does all the work

  • Cost/tps is 1,000x less: 25 micro dollars per transaction

Figure labels: 4 x 200 MHz CPU; 1/2 GB DRAM; 12 x 4 GB disk; 3 x 7 x 4 GB disk arrays. One person covers every role: hardware expert, OS expert, net expert, DB expert, app expert.


What Happened?

Figure: price versus time for mainframe, mini, and micro.

  • Moore’s law: Things get 4x better every 3 years (applies to computers, storage, and networks)

  • New Economics: Commodity

    class            price ($/mips)    software (k$/year)
    mainframe        10,000            100
    minicomputer     100               10
    microcomputer    10                1

  • GUI: Human-computer tradeoff: optimize for people, not computers
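
The Moore's law bullet is a compounding rule, so it can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `improvement` is made up here, and it assumes the "4x every 3 years" rate holds smoothly between the three-year marks.

```python
def improvement(years, factor=4.0, period_years=3.0):
    """Compound improvement after `years`, given `factor` gain per `period_years`."""
    return factor ** (years / period_years)

three_years = improvement(3)   # one period
six_years = improvement(6)     # two periods
decade = improvement(10)       # a bit more than three periods

print(f"3 years: {three_years:.0f}x, 6 years: {six_years:.0f}x, 10 years: {decade:.0f}x")
```

Under this rule alone a decade compounds to roughly two orders of magnitude.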


What Happens Next

Figure: performance versus time, axis marked 1985, 1995, 2005, with a "?" on the curve.

  • Last 10 years: 1000x improvement

  • Next 10 years: ????

  • Today: text and image servers are free: 25 m$/hit => advertising pays for them

  • Future: video, audio, … servers are free: “You ain’t seen nothing yet!”


Kinds Of Information Processing

  • Immediate (network)

    • Point-to-point: conversation, money

    • Broadcast: lecture, concert

  • Time-shifted (database)

    • Point-to-point: mail

    • Broadcast: book, newspaper

It’s ALL going electronic

Immediate is being stored for analysis (so ALL database)

Analysis and automatic processing are being added


Why Put Everything In Cyberspace?

  • Low rent: min $/byte

  • Shrinks time: now or later

  • Shrinks space: here or there

  • Automate processing: knowbots

Figure labels: network (point-to-point OR broadcast; immediate OR time-delayed); database (locate, process, analyze, summarize).


Magnetic Storage Cheaper Than Paper

  • File cabinet:

    cabinet (four drawer)       250$
    paper (24,000 sheets)       250$
    space (2x3 @ 10$/ft2)       180$
    total                       700$     3¢/sheet

  • Disk:

    disk (4 GB)                 800$
    ASCII: 2 mil pages                   0.04¢/sheet (80x cheaper)
    Image: 200,000 pages                 0.4¢/sheet (8x cheaper)

  • Store everything on disk
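
The per-sheet figures on this slide follow from dividing total cost by page count. A minimal sketch of that arithmetic, using only the slide's numbers (the helper name `cents_per_sheet` is invented for illustration):

```python
def cents_per_sheet(total_dollars, sheets):
    """Cost of storing one sheet, in cents."""
    return 100.0 * total_dollars / sheets

paper = cents_per_sheet(700, 24_000)          # file cabinet, paper, and floor space
ascii_disk = cents_per_sheet(800, 2_000_000)  # 4 GB disk holding ASCII pages
image_disk = cents_per_sheet(800, 200_000)    # same disk holding page images

print(f"paper {paper:.2f}, ASCII {ascii_disk:.2f}, image {image_disk:.2f} cents/sheet")
print(f"ASCII is {paper / ascii_disk:.0f}x cheaper, image is {paper / image_disk:.0f}x cheaper")
```

The computed ratios land slightly below the slide's "80x" and "8x", which appear to be rounded from the 3¢ paper figure.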


Databases: Information at Your Fingertips™, Information Network™, Knowledge Navigator™

  • All information will be in an online database (somewhere)

  • You might record everything you

    • Read: 10 MB/day, 400 GB/lifetime (eight tapes today)

    • Hear: 400 MB/day, 16 TB/lifetime (three tapes/year today)

    • See: 1 MB/s, 40 GB/day, 1.6 PB/lifetime (maybe someday)
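
The daily and lifetime figures above imply a lifetime length, which can be recovered by division. The sketch below assumes decimal units (1 GB = 1,000 MB and so on); the dictionary and variable names are illustrative only.

```python
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15

# (bytes per day, bytes per lifetime) exactly as quoted on the slide
streams = {
    "read": (10 * MB, 400 * GB),
    "hear": (400 * MB, 16 * TB),
    "see": (40 * GB, 1600 * TB),   # 1.6 PB
}

# Days in a "lifetime" implied by each pair of figures
implied_days = {name: life // day for name, (day, life) in streams.items()}

# Seconds of viewing per day implied by 1 MB/s and 40 GB/day
viewing_seconds = (40 * GB) // (1 * MB)

print(implied_days)
print(f"about {implied_days['read'] / 365:.0f} years; {viewing_seconds / 3600:.1f} viewing hours/day")
```

All three streams use the same implied lifetime, so the slide's numbers are internally consistent.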


Database Store ALL Data Types

Figure: the old world, a People table with columns Name and Address (David, NY; Mike, Berk; Won, Austin).

  • The old world:

    • Millions of objects

    • 100-byte objects

  • The new world:

    • Billions of objects

    • Big objects (1 MB)

    • Objects have behavior (methods)

  • Paperless office

  • Library of Congress online

  • All information online

    • Entertainment

    • Publishing

    • Business

  • WWW and Internet

Figure: the new world, a People table with columns Name, Voice, Address, Papers, and Picture (David, NY; Mike, Berk; Won, Austin).


Billions Of Clients

  • Every device will be “intelligent”

  • Doors, rooms, cars…

  • Computing will be ubiquitous


Billions Of Clients Need Millions Of Servers

  • All clients networked to servers

    • May be nomadic or on-demand

  • Fast clients want faster servers

  • Servers provide

    • Shared Data

    • Control

    • Coordination

    • Communication

Figure labels: clients (mobile clients, fixed clients); servers (server, super server).


Thesis: Many little beat few big

Figure labels:

  • Machine classes: mainframe, mini, micro, nano, pico processor

  • Prices: $1 million, $100 K, $10 K

  • Disc form factors: 14", 9", 5.25", 3.5", 2.5", 1.8"

  • Storage capacities: 1 MB, 100 MB, 10 GB, 1 TB, 100 TB

  • Storage latencies: 10 pico-second RAM, 10 nano-second RAM, 10 microsecond RAM, 10 millisecond disc, 10 second tape archive

  • Other labels: 1 mm3; 1 M SPECmarks, 1 TFLOP; 10^6 clocks to bulk RAM; event horizon on chip; VM reincarnated; multiprogram cache, on-chip SMP

  • Smoking, hairy golf ball

  • How to connect the many little parts?

  • How to program the many little parts?

  • Fault tolerance?


Future Super Server: 4T Machine

Figure labels: CPU, 50 GB disc, 5 GB RAM.

  • Array of 1,000 4B machines

    • 1 Bips processors

    • 1 BB DRAM

    • 10 BB disks

    • 1 Bbps comm lines

    • 1 TB tape robot

  • A few megabucks

  • Challenge:

    • Manageability

    • Programmability

    • Security

    • Availability

    • Scaleability

    • Affordability

  • As easy as a single system

Figure label: Cyber Brick, a 4B machine.

Future servers are CLUSTERS of processors, discs.

Distributed database techniques make clusters work.


The Hardware Is In Place… And then a miracle occurs?

  • SNAP: scaleable network and platforms

  • Commodity distributed OS built on:

    • Commodity platforms

    • Commodity network interconnect

  • Enables parallel applications


Thesis: Scaleable Servers

  • Scaleable Servers

    • Commodity hardware allows new applications

    • New applications need huge servers

    • Clients and servers are built of the same “stuff”

      • Commodity software and

      • Commodity hardware

  • Servers should be able to

    • Scale up (grow node by adding CPUs, disks, networks)

    • Scale out (grow by adding nodes)

    • Scale down (can start small)

  • Key software technologies

    • Objects, Transactions, Clusters, Parallelism


Scaleable Servers: BOTH SMP And Cluster

Grow up with SMP; 4xP6 is now standard

Grow out with cluster

Cluster has inexpensive parts

Figure labels: personal system, departmental server, SMP superserver, cluster of PCs.


SMPs Have Advantages

  • Single system image: easier to manage, easier to program (threads in shared memory, disk, Net)

  • 4x SMP is commodity

  • Software capable of 16x

  • Problems:

    • >4 not commodity

    • Scale-down problem (starter systems expensive)

  • There is a BIGGEST one

Figure labels: personal system, departmental server, SMP superserver.


Building the Largest Node

Figure labels: 1-TB home page (www.SQL.1TB.com); 1-TB SQL Server DB holding satellite and aerial photos; support files.

  • There is a biggest node (size grows over time)

  • Today, with NT, it is probably 1TB

  • We are building it (with help from DEC and SPIN2)

    • 1 TB GeoSpatial SQL Server database

    • (1.4 TB of disks = 320 drives).

    • 30K BTU, 8 KVA, 1.5 metric tons.

  • Will put it on the Web as a demo app.

  • 10 meter image of the ENTIRE PLANET.

  • 2 meter image of interesting parts (2% of land)

  • One pixel per meter = 500 TB uncompressed.

  • Better resolution in US (courtesy of USGS).
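
The "one pixel per meter = 500 TB" figure can be roughly reproduced from the planet's surface area. The sketch below rests on two assumptions that are not on the slide: a total surface area of about 510 million km2 (land plus ocean) and one byte per pixel. The function name is illustrative.

```python
EARTH_SURFACE_KM2 = 510_000_000   # assumption: approximate total surface area
BYTES_PER_PIXEL = 1               # assumption: one byte per pixel, uncompressed
TB = 10**12

def image_terabytes(meters_per_pixel):
    """Uncompressed size, in TB, of a whole-planet image at the given resolution."""
    area_m2 = EARTH_SURFACE_KM2 * 10**6
    pixels = area_m2 / meters_per_pixel**2
    return pixels * BYTES_PER_PIXEL / TB

one_meter = image_terabytes(1)
ten_meter = image_terabytes(10)

print(f"1 m/pixel: {one_meter:.0f} TB, 10 m/pixel: {ten_meter:.1f} TB")
```

Resolution enters as a square, so moving from 10 meter to 1 meter pixels multiplies the storage by 100.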


What’s a TeraByte?

  • 1 Terabyte:

    1,000,000,000 business letters      150 miles of book shelf
    100,000,000 book pages              15 miles of book shelf
    50,000,000 FAX images               7 miles of book shelf
    10,000,000 TV pictures (mpeg)       10 days of video
    4,000 LandSat images                16 earth images (100m)
    100,000,000 web pages               10 copies of the web HTML

  • Library of Congress (in ASCII) is 25 TB

    1980: $200 million of disc          10,000 discs
          $5 million of tape silo       10,000 tapes
    1997: $200 K of magnetic disc       48 discs
          $30 K of nearline tape        20 tapes

    Terror Byte !



TPC-C Web-Based Benchmarks

  • Client is a Web browser (7,500 of them!)

  • Submits

    • Order

    • Invoice

    • Query to server via Web page interface

  • Web server translates to DB

  • SQL does DB work

  • Net:

    • easy to implement

    • performance is GREAT!

Figure labels: HTTP; IIS (= Web); ODBC; SQL.


Grow UP and OUT

  • Cluster:

    • a collection of nodes

    • as easy to program and manage as a single node

Figure labels: personal system, departmental server, SMP superserver; 1 Terabyte DB; 1 billion transactions per day.


Clusters Have Advantages

  • Clients and servers made from the same stuff

  • Inexpensive:

    • Built with commodity components

  • Fault tolerance:

    • Spare modules mask failures

  • Modular growth

    • Grow by adding small modules

  • Unlimited growth: no biggest one


Windows NT Clusters

  • Microsoft & 60 vendors defining NT clusters

    • Almost all big hardware and software vendors involved

  • No special hardware needed - but it may help

  • Fault-tolerant first, scaleable second

    • Microsoft, Oracle, SAP giving demos today

  • Enables

    • Commodity fault-tolerance

    • Commodity parallelism (data mining, virtual reality…)

    • Also great for workgroups!


Billion Transactions per Day Project

  • Building a 20-node Windows NT Cluster (with help from Intel), > 800 disks

  • All commodity parts

  • Using SQL Server & DTC distributed transactions

  • Each node has 1/20th of the DB

  • Each node does 1/20th of the work

  • 15% of the transactions are “distributed”


How Much Is 1 Billion Transactions Per Day?

  • 1 Btpd = 11,574 tps (transactions per second) ~ 700,000 tpm (transactions/minute)

  • AT&T

    • 185 million calls (peak day worldwide)

  • Visa ~20 M tpd

    • 400 M customers

    • 250,000 ATMs worldwide

    • 7 billion transactions / year (card+cheque) in 1994
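
The rate conversions on this slide are plain unit arithmetic. A minimal sketch using only the slide's figures (the function and variable names are illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60

def per_second(per_day):
    """Convert a daily count to an average per-second rate."""
    return per_day / SECONDS_PER_DAY

btpd_tps = per_second(1_000_000_000)   # 1 billion transactions per day
btpd_tpm = btpd_tps * 60               # the same rate per minute
att_calls_per_second = per_second(185_000_000)
visa_per_day = 7_000_000_000 / 365     # 7 billion transactions per year

print(f"1 Btpd = {btpd_tps:,.0f} tps = {btpd_tpm:,.0f} tpm")
print(f"AT&T peak day = {att_calls_per_second:,.0f} calls/s")
print(f"Visa = {visa_per_day / 1e6:.1f} M transactions/day")
```

These are daily averages; peak-hour rates would be higher than the averages computed here.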

Figure: millions of transactions per day (Mtpd), axis from 0.1 to 1,000, for AT&T, Visa, BofA, NYSE, and 1 Btpd.


Parallelism: The OTHER aspect of clusters

  • Clusters of machines allow two kinds of parallelism

    • Many little jobs: online transaction processing

      • TPC-A, B, C…

    • A few big jobs: data search and analysis

      • TPC-D, DSS, OLAP

  • Both give automatic parallelism


Kinds of Parallel Execution

  • Pipeline: any sequential program feeds any sequential program

  • Partition: outputs split N ways, inputs merge M ways; each partition runs any sequential program

Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
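
The two kinds of parallel execution can be illustrated with ordinary sequential programs. The sketch below is not taken from the survey: the function names, the modulo partitioning rule, and the use of a thread pool to stand in for separate nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def keep_even(rows):
    """A sequential program: pass through even values only."""
    for r in rows:
        if r % 2 == 0:
            yield r

def square(rows):
    """Another sequential program: square each value it receives."""
    for r in rows:
        yield r * r

def pipeline_sum(rows):
    # Pipeline: square() consumes keep_even()'s output one row at a time.
    return sum(square(keep_even(rows)))

def partition(rows, n):
    # Partition: split the input n ways by a simple modulo rule.
    parts = [[] for _ in range(n)]
    for r in rows:
        parts[r % n].append(r)
    return parts

def partitioned_sum(rows, n):
    # Run the same sequential pipeline on each partition, then merge the results.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return sum(pool.map(pipeline_sum, partition(rows, n)))

sequential = pipeline_sum(range(10))
parallel = partitioned_sum(range(10), 3)
print(sequential, parallel)
```

Neither `keep_even` nor `square` knows it is running in parallel, which is the point of the "any sequential program" label.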


Partitioned Execution

Spreads computation and IO among processors

Partitioned data gives NATURAL parallelism


N x M way Parallelism

N inputs, M outputs, no bottlenecks.

Partitioned Data

Partitioned and Pipelined Data Flows
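
One way to picture N x M parallelism is a repartitioning step: each of N producers splits its output M ways, and each of M consumers merges N inputs. The sketch below assumes integer keys and a modulo split rule; the function names are invented for illustration.

```python
from collections import defaultdict

def split(rows, m):
    """One producer: route each (key, value) row to one of m output streams."""
    outs = [[] for _ in range(m)]
    for key, value in rows:
        outs[key % m].append((key, value))
    return outs

def merge_and_sum(streams):
    """One consumer: merge its input streams and total the values per key."""
    totals = defaultdict(int)
    for stream in streams:
        for key, value in stream:
            totals[key] += value
    return dict(totals)

def n_by_m(inputs, m):
    split_outputs = [split(rows, m) for rows in inputs]   # N producers, M streams each
    return [merge_and_sum([outs[j] for outs in split_outputs]) for j in range(m)]

inputs = [[(1, 10), (2, 20)], [(1, 1), (3, 3)], [(2, 2), (4, 4)]]   # N = 3
results = n_by_m(inputs, 2)                                         # M = 2
print(results)
```

Every key lands at exactly one consumer, so no consumer needs to coordinate with another and no single stage sees all the data.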



The Parallel Law Of Computing

  • Grosch's Law: 2x $ is 4x performance

    • 1 MIPS for 1 $; 1,000 MIPS for 32 $ (.03 $/MIPS)

  • Parallel Law: 2x $ is 2x performance

    • 1 MIPS for 1 $; 1,000 MIPS for 1,000 $

  • Needs:

    • Linear speedup and linear scale-up

    • Not always possible
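
The dollar figures attached to the two laws follow from their scaling rules: under Grosch's Law performance grows as the square of cost, and under the Parallel Law it grows linearly. A small sketch, with function names chosen for illustration and the slide's "1 MIPS costs 1 $" as the baseline:

```python
import math

def grosch_cost(mips, dollars_for_one_mips=1.0):
    """Grosch's Law: performance ~ cost**2, so cost ~ sqrt(performance)."""
    return dollars_for_one_mips * math.sqrt(mips)

def parallel_cost(mips, dollars_for_one_mips=1.0):
    """Parallel Law: performance ~ cost, so cost ~ performance."""
    return dollars_for_one_mips * mips

grosch_1000 = grosch_cost(1000)
parallel_1000 = parallel_cost(1000)

print(f"Grosch:   1,000 MIPS for {grosch_1000:.0f} $ ({grosch_1000 / 1000:.2f} $/MIPS)")
print(f"Parallel: 1,000 MIPS for {parallel_1000:.0f} $ ({parallel_1000 / 1000:.2f} $/MIPS)")
```

The Parallel Law only holds when the workload achieves linear speedup and scale-up, which is the caveat the slide lists under "Needs".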


Thesis: Scaleable Servers

  • Scaleable Servers

    • Commodity hardware allows new applications

    • New applications need huge servers

    • Clients and servers are built of the same “stuff”

      • Commodity software and

      • Commodity hardware

  • Servers should be able to

    • Scale up (grow node by adding CPUs, disks, networks)

    • Scale out (grow by adding nodes)

    • Scale down (can start small)

  • Key software technologies

    • Objects, Transactions, Clusters, Parallelism


The BIG Picture: Components and transactions

  • Software modules are objects

  • Object Request Broker (a.k.a. Transaction Processing Monitor) connects objects (clients to servers)

  • Standard interfaces allow software plug-ins

  • Transaction ties execution of a “job” into an atomic unit: all-or-nothing, durable, isolated



Linking And Embedding: Objects are data modules; transactions are execution modules

  • Link: pointer to object somewhere else

    • Think URL in Internet

  • Embed: bytes are here

  • Objects may be active; can callback to subscribers


Objects Meet Databases: The basis for universal data servers, access, & integration

Figure labels: database, spreadsheet, photos, mail, map, document.

  • Object-oriented (COM-oriented) programming interface to data

  • Breaks DBMS into components

  • Anything can be a data source

  • Optimization/navigation “on top of” other data sources

  • A way to componentize a DBMS

  • Makes an RDBMS an O-RDBMS (assumes optimizer understands objects)

Figure label: DBMS engine.


The Three Tiers

  • Web client: HTML; VB and Java plug-ins; VBScript, JavaScript

  • Middleware: ORB, TP monitor, Web server...; object server pool; VB or Java script engine; VB or Java virtual machine

  • Object & data server

Figure labels for the connections: HTTP + DCOM over the Internet; DCOM (OLE DB, ODBC, ...); LU6.2 through legacy gateways to IBM.


Server Side Objects: Easy Server-Side Execution

  • Give simple execution environment

  • Object gets

    • start

    • invoke

    • shutdown

  • Everything else is automatic

  • Drag & Drop Business Objects
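
The start / invoke / shutdown contract above can be pictured as a small hosting loop. The sketch below is a hypothetical host, not the runtime described in the talk: the class names `Host` and `Greeter` are invented, and a standard-library thread pool stands in for the connection, queueing, and thread-pool services the server would supply.

```python
from concurrent.futures import ThreadPoolExecutor

class Greeter:
    """A business object: it implements only start, invoke, and shutdown."""
    def start(self):
        self.events = ["start"]

    def invoke(self, request):
        return f"hello {request}"

    def shutdown(self):
        self.events.append("shutdown")

class Host:
    """Hypothetical host: owns the thread pool and drives the object's lifecycle."""
    def __init__(self, obj, workers=4):
        self.obj = obj
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.obj.start()

    def submit(self, request):
        # Requests are queued by the pool and run on one of its worker threads.
        return self.pool.submit(self.obj.invoke, request)

    def close(self):
        self.pool.shutdown(wait=True)   # drain outstanding requests first
        self.obj.shutdown()

greeter = Greeter()
host = Host(greeter)
futures = [host.submit(name) for name in ("ann", "bob")]
replies = [f.result() for f in futures]
host.close()
print(replies, greeter.events)
```

The object holds no threading or queueing code of its own, which is the sense in which everything else is automatic.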

Figure labels, a server: network, receiver, queue, management, connections, security, context, configuration, thread pool, service logic, synchronization, shared data.


A new programming paradigm

  • Develop objects on the desktop

  • Better yet: download them from the Net

  • Script work flows as method invocations

  • All on desktop

  • Then, move work flows and objects to server(s)

  • Gives

    • desktop development

    • three-tier deployment

    • Software Cyberbricks


Transactions Coordinate Components (ACID)

  • Transaction properties

    • Atomic: all or nothing

    • Consistent: old and new values

    • Isolated: automatic locking or versioning

    • Durable: once committed, effects survive

  • Transactions are built into modern OSs

    • MVS/TM, Tandem TMF, VMS DEC-DTM, NT-DTC
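
The atomic property (all or nothing) is easy to see in a small example. The sketch below uses Python's standard `sqlite3` module as a stand-in for the transaction systems named on the slide; the `account` table and the `transfer` function are invented for illustration.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move money between accounts as one atomic unit."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

transfer(conn, "a", "b", 60)       # succeeds and commits
try:
    transfer(conn, "a", "b", 60)   # would overdraw "a", so it rolls back
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

The failed transfer leaves no trace: the debit that ran before the check is undone along with everything else in the unit.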


Transactions & Objects

  • Application requests transaction identifier (XID)

  • XID flows with method invocations

  • Object Managers join (enlist) in transaction

  • Distributed Transaction Manager coordinates commit/abort
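
The four steps above can be sketched as a toy coordinator. This is a hypothetical illustration, not the DTC interface: the class and method names are invented, and it assumes a two-phase (prepare, then commit or abort) protocol, which the slide does not name.

```python
import itertools

class TransactionManager:
    """Toy coordinator: hands out XIDs and decides commit or abort."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._enlisted = {}

    def begin(self):
        xid = next(self._ids)
        self._enlisted[xid] = []
        return xid

    def enlist(self, xid, manager):
        self._enlisted[xid].append(manager)

    def commit(self, xid):
        managers = self._enlisted.pop(xid)
        if all(m.prepare(xid) for m in managers):   # phase 1: every manager votes
            for m in managers:
                m.commit(xid)                       # phase 2: make the work permanent
            return True
        for m in managers:
            m.abort(xid)                            # any "no" vote undoes everything
        return False

class ObjectManager:
    """Toy resource: buffers updates per XID until told to commit or abort."""
    def __init__(self, tm, can_commit=True):
        self.tm, self.can_commit = tm, can_commit
        self.pending, self.committed = {}, []

    def invoke(self, xid, update):
        if xid not in self.pending:      # first call under this XID: enlist
            self.pending[xid] = []
            self.tm.enlist(xid, self)
        self.pending[xid].append(update)

    def prepare(self, xid):
        return self.can_commit

    def commit(self, xid):
        self.committed.extend(self.pending.pop(xid, []))

    def abort(self, xid):
        self.pending.pop(xid, None)

tm = TransactionManager()
orders, stock = ObjectManager(tm), ObjectManager(tm, can_commit=False)

xid = tm.begin()
orders.invoke(xid, "new order")
stock.invoke(xid, "decrement")
first_ok = tm.commit(xid)        # stock votes no, so both abort

xid = tm.begin()
orders.invoke(xid, "new order")
second_ok = tm.commit(xid)       # only orders enlisted, and it votes yes
print(first_ok, second_ok, orders.committed)
```

The XID is the only thing the application passes along; enlistment happens as a side effect of the first method call under that XID.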


Distributed Transactions Enable Huge Throughput

  • Each node capable of 7 KtpmC (7,000 active users!)

  • Can add nodes to cluster (to support 100,000 users)

  • Transactions coordinate nodes

  • ORB / TP monitor spreads work among nodes
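
The sizing claim above is a division, and spreading work among nodes can be as simple as a routing rule. The sketch below takes the slide's 7,000 users per node at face value; the function names and the modulo routing rule are assumptions for illustration, not the ORB's actual policy.

```python
import math

USERS_PER_NODE = 7_000   # from the slide: each node supports 7,000 active users

def nodes_needed(users, users_per_node=USERS_PER_NODE):
    """Smallest cluster that can carry the given number of active users."""
    return math.ceil(users / users_per_node)

def node_for(user_id, node_count):
    """Hypothetical routing rule: spread users evenly by user id."""
    return user_id % node_count

cluster_size = nodes_needed(100_000)
load = [0] * cluster_size
for user_id in range(100_000):
    load[node_for(user_id, cluster_size)] += 1

print(cluster_size, max(load))
```

With an even spread, no node carries more than its 7,000-user capacity.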


Distributed Transactions Enable Huge DBs

  • Distributed database technology spreads data among nodes

  • Transaction processing technology manages nodes


Thesis: Scaleable Servers

  • Scaleable Servers Built from Cyberbricks

    • Allow new applications

  • Servers should be able to

    • Scale up, out, down

  • Key software technologies

    • Clusters (tie the hardware together)

    • Parallelism (uses the independent CPUs, stores, wires)

    • Objects (software CyberBricks)

    • Transactions (mask errors)

