Presentation Transcript
Computer Technology Forecast

Jim Gray

Microsoft Research

[email protected]

http://research.Microsoft.com/~Gray


Reality Check

  • Good news

    • In the limit, processing & storage & network is free

    • Processing & network is infinitely fast

  • Bad news

    • Most of us live in the present.

    • People are getting more expensive. Management/programming cost exceeds hardware cost.

    • Speed of light not improving.

    • WAN prices have not changed much in last 8 years.


Interesting Topics

  • I’ll talk about server-side hardware

  • What about client hardware?

    • Displays, cameras, speech,….

  • What about Software?

    • Databases, data mining, PDB, OODB

    • Objects / class libraries …

    • Visualization

    • Open Source movement


How Much Information Is There?

  • Soon everything can be recorded and indexed

  • Most data will never be seen by humans

  • Precious Resource: Human attention. Auto-Summarization and Auto-Search are the key technology. www.lesk.com/mlesk/ksg97/ksg.html

(Slide chart: a byte scale running kilo, mega, giga, tera, peta, exa, zetta, yotta, with markers for a book, a photo, a movie, all LoC books (words), all books (multimedia), and everything recorded.)


Moore’s Law

  • Performance/Price doubles every 18 months

  • 100x per decade

  • Progress in next 18 months = ALL previous progress

    • New storage = sum of all old storage (ever)

    • New processing = sum of all old processing.

  • E. coli doubles every 20 minutes!

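A back-of-envelope check (Python; not part of the original deck) of the Moore's Law arithmetic above:

```python
# Back-of-envelope check of the bullets above, assuming the 18-month doubling.
doubling_months = 18

# "100x per decade": 2 ** (120 / 18) is roughly 100.
print(f"improvement per decade: {2 ** (120 / doubling_months):.0f}x")

# "Progress in next 18 months = ALL previous progress": with capacity doubling
# each period, the amount added in period n equals everything added in periods
# 0..n-1 plus the original unit.
prior = sum(2 ** k for k in range(10))   # added in periods 0..9  -> 1023
next_period = 2 ** 10                    # added in period 10     -> 1024
print(prior, next_period)
```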


Trends: ops/s/$ Had Three Growth Phases

  • 1890-1945: mechanical, relay (7-year doubling)

  • 1945-1985: tube, transistor, … (2.3-year doubling)

  • 1985-2000: microprocessor (1.0-year doubling)


What’s a Balanced System?

(Slide diagram: a system bus and two PCI buses.)


Storage capacity beating Moore’s law

  • 5 k$/TB today (raw disk)


Cheap Storage

  • Disks are getting cheap:

  • 7 k$/TB disks (25 x 40 GB disks @ 230$ each)


Cheap Storage or Balanced System

  • Low cost storage (2 x 1.5k$ servers): 7K$/TB, 2 x (1K$ system + 8 x 60 GB disks + 100 Mb Ethernet)

  • Balanced server (7k$/.5 TB)

    • 2x800Mhz (2k$)

    • 256 MB (400$)

    • 8 x 60 GB drives (3K$)

    • Gbps Ethernet + switch (1.5k$)

    • 14k$/TB, 28K$/RAIDED TB (arithmetic sketched below)
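A quick sketch (Python; not from the deck) of the price arithmetic behind the balanced-server bullets above:

```python
# Back-of-envelope check of the "balanced server" pricing (figures from the slide).
cpu, ram, disks, net = 2000, 400, 3000, 1500   # $: 2x800MHz, 256MB, 8x60GB drives, GbE + switch
capacity_tb = 8 * 60 / 1000                    # 0.48 TB raw

total = cpu + ram + disks + net                # ~6.9 k$, i.e. the "7k$/.5 TB" bullet
print(f"{total/1000:.1f} k$ for {capacity_tb:.2f} TB -> {total/capacity_tb/1000:.0f} k$/TB")
# Mirroring doubles it; the slide rounds this to 28K$/RAIDED TB.
print(f"mirrored (RAID-1): {2*total/capacity_tb/1000:.0f} k$/TB")
```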


The “Absurd” Disk

  • 2.5 hr scan time (poor sequential access)

  • 1 aps / 5 GB (VERY cold data)

  • It’s a tape!

(Slide diagram: a 1 TB drive, 100 MB/s, 200 Kaps.)
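A sketch (Python; not from the deck) of the "absurd disk" arithmetic, assuming the drive delivers on the order of 200 random accesses per second; that access rate is an assumption here (the diagram label reads "200 Kaps"):

```python
# Sketch of the "absurd disk" arithmetic: 1 TB capacity, 100 MB/s sequential
# bandwidth, an assumed ~200 random accesses/second for the whole drive.
capacity_gb = 1000
seq_mb_per_s = 100
random_aps = 200   # assumed figure, not from the deck

scan_hours = capacity_gb * 1000 / seq_mb_per_s / 3600
print(f"full scan: {scan_hours:.1f} hours")                            # ~2.8 h, the "2.5 hr" bullet
print(f"access density: 1 aps per {capacity_gb / random_aps:.0f} GB")  # the "1 aps / 5 GB" bullet
```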


Hot Swap Drives for Archive or Data Interchange

  • 25 MBps write (so can write N x 60 GB in 40 minutes)

  • 60 GB/overnite

    = ~N x 2 MB/second

    @ 19.95$/nite

(Slide photos with price labels: 17$ and 260$.)


240 GB, 2k$ (now); 300 GB by year end

  • 4 x 60 GB IDE (2 hot pluggable)

    • (1,100$)

  • SCSI-IDE bridge

    • 200$

  • Box

    • 500 MHz cpu

    • 256 MB SRAM

    • Fan, power, Enet

    • 700$

  • Or 8 disks/box: 600 GB for ~3K$ (or 300 GB RAID)


Hot Swap Drives for Archive or Data Interchange

  • 25 MBps write (so can write N x 74 GB in 3 hours)

  • 74 GB/overnite

    = ~N x 2 MB/second

    @ 19.95$/nite


It’s Hard to Archive a Petabyte. It takes a LONG time to restore it.

  • At 1 GBps it takes 12 days! (see the sketch below)

  • Store it in two (or more) places online (on disk?). A geo-plex.

  • Scrub it continuously (look for errors)

  • On failure,

    • use other copy until failure repaired,

    • refresh lost copy from safe copy.

  • Can organize the two copies differently (e.g.: one by time, one by space)
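The restore-time claim is simple arithmetic; a minimal sketch (Python; not from the deck):

```python
# Sketch of the petabyte-restore arithmetic above: 1 PB restored at 1 GB/s.
petabyte_gb = 1_000_000
rate_gb_per_s = 1

days = petabyte_gb / rate_gb_per_s / 86_400
print(f"restore time: {days:.1f} days")   # ~11.6 days, the "12 days" bullet
```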


Disk vs Tape

  • Disk: 60 GB, 30 MBps, 5 ms seek time, 3 ms rotate latency, 7$/GB for drive + 3$/GB for ctlrs/cabinet, 4 TB/rack, 1 hour scan

  • Tape: 40 GB, 10 MBps, 10 sec pick time, 30-120 second seek time, 2$/GB for media + 8$/GB for drive+library, 10 TB/rack, 1 week scan

(Slide annotations, guesstimates: CERN: 200 TB, 3480 tapes; 2 col = 50 GB; rack = 1 TB = 20 drives.)

The price advantage of tape is narrowing, and the performance advantage of disk is growing. At 10K$/TB, disk is competitive with nearline tape.


Trends: Gilder’s Law: 3x bandwidth/year for 25 more years

  • Today:

    • 10 Gbps per channel

    • 4 channels per fiber: 40 Gbps

    • 32 fibers/bundle = 1.2 Tbps/bundle

  • In lab 3 Tbps/fiber (400 x WDM)

  • In theory 25 Tbps per fiber

  • 1 Tbps = USA 1996 WAN bisection bandwidth

  • Aggregate bandwidth doubles every 8 months!

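A quick check (Python; not from the deck) of the per-fiber and per-bundle figures, and of how a 3x/year growth rate relates to the 8-month doubling quoted above:

```python
import math

# Sketch of the fiber-bandwidth bullets above.
gbps_per_channel = 10
channels_per_fiber = 4
fibers_per_bundle = 32

per_fiber = gbps_per_channel * channels_per_fiber      # 40 Gbps per fiber
per_bundle = per_fiber * fibers_per_bundle / 1000      # ~1.28 Tbps, the "1.2 Tbps/bundle" bullet
print(f"{per_fiber} Gbps/fiber, {per_bundle:.2f} Tbps/bundle")

# A 3x/year growth rate is a doubling time of 12*ln(2)/ln(3) ~= 7.6 months.
print(f"doubling time at 3x/year: {12 * math.log(2) / math.log(3):.1f} months")
```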


Sense of Scale

  • How fat is your pipe?

  • Fattest pipe on MS campus is the WAN!

(Slide chart of relative bandwidths: 20 MBps disk / ATM / OC3; 90 MBps PCI; 94 MBps coast to coast; 300 MBps OC48 = G2, or memcpy().)


(Slide map: the 5626 km, 10-hop route from Arlington, VA (Information Sciences Institute) over the Qwest backbone through New York and San Francisco, CA to Seattle, WA: Pacific Northwest Gigapop, University of Washington, and Microsoft in Redmond; part of HSCC, the DARPA-sponsored high speed connectivity consortium.)


The Path

DC -> SEA

C:\> tracert -d 131.107.151.194
Tracing route to 131.107.151.194 over a maximum of 30 hops

 0                                   -------  DELL 4400 Win2K WKS, Arlington Virginia, ISI Alteon GbE
 1   16 ms  <10 ms  <10 ms  140.173.170.65   -------  Juniper M40 GbE, Arlington Virginia, ISI Interface ISIe
 2  <10 ms  <10 ms  <10 ms  205.171.40.61    -------  Cisco GSR OC48, Arlington Virginia, Qwest DC Edge
 3  <10 ms  <10 ms  <10 ms  205.171.24.85    -------  Cisco GSR OC48, Arlington Virginia, Qwest DC Core
 4  <10 ms  <10 ms   16 ms  205.171.5.233    -------  Cisco GSR OC48, New York, New York, Qwest NYC Core
 5   62 ms   63 ms   62 ms  205.171.5.115    -------  Cisco GSR OC48, San Francisco, CA, Qwest SF Core
 6   78 ms   78 ms   78 ms  205.171.5.108    -------  Cisco GSR OC48, Seattle, Washington, Qwest Sea Core
 7   78 ms   78 ms   94 ms  205.171.26.42    -------  Juniper M40 OC48, Seattle, Washington, Qwest Sea Edge
 8   78 ms   79 ms   78 ms  208.46.239.90    -------  Juniper M40 OC48, Seattle, Washington, PNW Gigapop
 9   78 ms   78 ms   94 ms  198.48.91.30     -------  Cisco GSR OC48, Redmond Washington, Microsoft
10   78 ms   78 ms   94 ms  131.107.151.194  -------  Compaq SP750 Win2K WKS, Redmond Washington, Microsoft SysKonnect GbE


“PetaBumps”

  • 751 Mbps for 300 seconds = ~28 GB

    single-thread, single-stream tcp/ip desktop-to-desktop, out-of-the-box performance*

  • 5626 km x 751 Mbps = ~4.2e15 bit meters / second ~ 4.2 Peta bmps (see the sketch below)

  • Multi-stream is 952 Mbps ~ 5.2 Peta bmps

  • 4470 byte MTUs were enabled on all routers.

  • 20 MB window size
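A sketch (Python; not from the deck) of the bit-meters-per-second arithmetic for the single-stream run:

```python
# Sketch of the "PetaBumps" arithmetic above.
distance_m = 5626e3        # DC-to-Seattle path length, 5626 km
rate_bps = 751e6           # single-stream TCP throughput
seconds = 300

gigabytes = rate_bps * seconds / 8 / 1e9
peta_bmps = distance_m * rate_bps / 1e15
print(f"{gigabytes:.0f} GB moved in {seconds} s")       # ~28 GB
print(f"{peta_bmps:.1f} peta bit-meters per second")    # ~4.2
```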


The Promise of SAN/VIA: 10x in 2 years (http://www.ViArch.org/)

  • Yesterday:

    • 10 MBps (100 Mbps Ethernet)

    • ~20 MBps tcp/ip saturates 2 cpus

    • round-trip latency ~250 µs

  • Now

    • Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet, …

    • Fast user-level communication

      • tcp/ip ~ 100 MBps 10% cpu

      • round-trip latency is 15 us

  • 1.6 Gbps demoed on a WAN


Pointers

  • The single-stream submission: http://research.microsoft.com/~gray/papers/Windows2000_I2_land_Speed_Contest_Entry_(Single_Stream_mail).htm

  • The multi-stream submission: http://research.Microsoft.com/~gray/papers/Windows2000_I2_land_Speed_Contest_Entry_(Multi_Stream_mail).htm

  • The code: http://research.Microsoft.com/~gray/papers/speedy.htm (speedy.h, speedy.c). And a PowerPoint presentation about it: http://research.Microsoft.com/~gray/papers/Windows2000_WAN_Speed_Record.ppt


Networking

  • WANs are getting faster than LANs. G8 = OC192 = 8 Gbps is “standard”

  • Link bandwidth improves 4x per 3 years

  • Speed of light (60 ms round trip in US)

  • Software stacks have always been the problem.

Time = SenderCPU + ReceiverCPU + bytes/bandwidth

The CPU terms have been the problem (illustrated in the sketch below).
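A small illustration (Python) of the time model above. The per-byte CPU cost and CPU speed are assumed figures, not from the deck; the point is that the two CPU terms, not the wire, dominate:

```python
# Illustration of Time = SenderCPU + ReceiverCPU + bytes/bandwidth, with assumed
# costs: ~10 instructions per byte of protocol overhead on a 500 MIPS CPU,
# moving 100 MB over a link that delivers 100 MB/s.
bytes_moved = 100e6
wire_bytes_per_s = 100e6
cpu_s_per_byte = 10 / 500e6        # instructions/byte divided by instructions/second

sender = receiver = bytes_moved * cpu_s_per_byte     # 2 s at each end
wire = bytes_moved / wire_bytes_per_s                # 1 s on the wire
print(f"{sender:.0f}s send CPU + {receiver:.0f}s receive CPU + "
      f"{wire:.0f}s wire = {sender + receiver + wire:.0f}s total")
```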


Rules of Thumb in Data Engineering

  • Moore’s law -> an address bit per 18 months.

  • Storage grows 100x/decade (except 1000x last decade!)

  • Disk data of 10 years ago now fits in RAM (iso-price).

  • Device bandwidth grows 10x/decade – so need parallelism

  • RAM:disk:tape price is 1:10:30 going to 1:10:10

  • Amdahl’s speedup law: maximum speedup is (S+P)/S

  • Amdahl’s IO law: a bit of IO per instruction per second (about a TBps per 10 teraops! 50,000 disks per 10 teraops: 100 M$)

  • Amdahl’s memory law: a byte of memory per instruction per second (going to 10) (1 TB RAM per teraop: 1 TeraDollars)

  • PetaOps anyone?

  • Gilder’s law: aggregate bandwidth doubles every 8 months.

  • 5 Minute rule: cache disk data that is reused in 5 minutes (see the sketch after this list).

  • Web rule: cache everything!

    http://research.Microsoft.com/~gray/papers/MS_TR_99_100_Rules_of_Thumb_in_Data_Engineering.doc
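A sketch (Python) of the five-minute-rule arithmetic behind the bullet above, using roughly the 1997 figures from Gray and Graefe's five-minute-rule paper; those prices and rates are assumptions here, not given in the deck:

```python
# Sketch of the five-minute rule: break-even reference interval =
#   (pages per MB of RAM / accesses per second per disk) * (price per disk / price per MB of RAM)
pages_per_mb_ram = 128            # 8 KB pages (assumed)
accesses_per_sec_per_disk = 64    # random accesses/second for one drive (assumed)
price_per_disk_drive = 2000       # $ (assumed, circa 1997)
price_per_mb_ram = 15             # $ (assumed, circa 1997)

break_even_secs = (pages_per_mb_ram / accesses_per_sec_per_disk) * \
                  (price_per_disk_drive / price_per_mb_ram)
print(f"cache a page if it is re-used within ~{break_even_secs/60:.1f} minutes")  # ~4.4 minutes
```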


Dealing With TeraBytes (Petabytes): Requires Parallelism

  • At 10 MB/s it takes 1.2 days to scan a terabyte

  • 1,000 x parallel: 100 second scan

  • Use 100 processors & 1,000 disks

  • Parallelism: use many little devices in parallel (arithmetic sketched below)
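The scan arithmetic, sketched (Python; not from the deck):

```python
# Sketch of the scan-time arithmetic above: 1 TB at 10 MB/s per disk.
terabyte_mb = 1_000_000
mb_per_s_per_disk = 10

one_disk_days = terabyte_mb / mb_per_s_per_disk / 86_400
thousand_disk_secs = terabyte_mb / (mb_per_s_per_disk * 1000)
print(f"1 disk: {one_disk_days:.1f} days;  1,000 disks: {thousand_disk_secs:.0f} seconds")
```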


Parallelism Must Be Automatic

  • There are thousands of MPI programmers.

  • There are hundreds-of-millions of people using parallel database search.

  • Parallel programming is HARD!

  • Find design patterns and automate them.

  • Data search/mining has parallel design patterns.


Scalability: Up and Out

  • “Scale Up”

    • Use “big iron” (SMP)

    • Cluster into packs for availability

  • “Scale Out”: clones & partitions

    • Use commodity servers

    • Add clones & partitions as needed


Everyone Scales Out. What’s the Brick?

  • 1M$/slice

    • IBM S390?

    • Sun E 10,000?

  • 100 K$/slice

    • HPUX/AIX/Solaris/IRIX/EMC

  • 10 K$/slice

    • Utel / Wintel 4x

  • 1 K$/slice

    • Beowulf / Wintel 1x


Terminology for Scaleability

(Slide diagram: a farm is made up of clones and partitions; clones come in shared-nothing and shared-disk forms; partitions are shared-nothing and grow into packs, which run active-active or active-passive.)

  • Farms of servers:

    • Clones: identical

      • Scaleability + availability

    • Partitions:

      • Scaleability

    • Packs

      • Partition availability via fail-over

  • GeoPlex for disaster tolerance.


(Slide diagram: a farm of clones and partitions; shared-nothing clones, shared-disk clones, shared-nothing partitions, and packed partitions; packs run active-active or active-passive.)


Unpredictable Growth

  • The TerraServer Story:

    • We expected 5 M hits per day

    • We got 50 M hits on day 1

    • We peak at 15-20 M hpd on a “hot” day

    • Average 5 M hpd after 1 year

  • Most of us cannot predict demand

    • Must be able to deal with NO demand

    • Must be able to deal with HUGE demand


An Architecture for Internet Services?

  • Need to be able to add capacity

    • New processing

    • New storage

    • New networking

  • Need continuous service

    • Online change of all components (hardware and software)

    • Multiple service sites

    • Multiple network providers

  • Need great development tools

    • Change the application several times per year.

    • Add new services several times per year.


Premise: Each Site is a Farm

  • Buy computing by the slice (brick):

    • Rack of servers + disks.

  • Grow by adding slices

    • Spread data and computation to new slices

  • Two styles:

    • Clones: anonymous servers

    • Parts+Packs: Partitions fail over within a pack

  • In both cases, remote farm for disaster recovery


Clones: Availability+Scalability

  • Some applications are

    • Read-mostly

    • Low consistency requirements

    • Modest storage requirement (less than 1TB)

  • Examples:

    • HTML web servers (IP sprayer/sieve + replication)

    • LDAP servers (replication via gossip)

  • Replicate app at all nodes (clones)

  • Spray requests across nodes.

  • Grow by adding clones

  • Fault tolerance: stop sending to that clone.

  • Growth: add a clone.
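A toy sketch (Python; hypothetical clone names, not any product's IP sprayer) of the clone pattern above: route each request to any live clone, stop sending to a failed clone, grow by adding a clone:

```python
import itertools

class Sprayer:
    """Round-robin request sprayer over a set of identical clones."""

    def __init__(self, clones):
        self.clones = list(clones)      # e.g. ["web1", "web2", "web3"] (hypothetical names)
        self.down = set()
        self._next = itertools.count()

    def add_clone(self, name):          # growth: add a clone
        self.clones.append(name)

    def mark_down(self, name):          # fault tolerance: stop sending to that clone
        self.down.add(name)

    def route(self, request):
        live = [c for c in self.clones if c not in self.down]
        if not live:
            raise RuntimeError("no live clones")
        return live[next(self._next) % len(live)]   # any clone can serve any request

sprayer = Sprayer(["web1", "web2", "web3"])
sprayer.mark_down("web2")
print([sprayer.route(f"GET /page{i}") for i in range(4)])   # only web1 and web3 are used
```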


Two Clone Geometries

  • Shared-Nothing: exact replicas

  • Shared-Disk (state stored in server)


Facilities Clones Need

  • Automatic replication

    • Applications (and system software)

    • Data

  • Automatic request routing

    • Spray or sieve

  • Management:

    • Who is up?

    • Update management & propagation

    • Application monitoring.

  • Clones are very easy to manage:

    • Rule of thumb: 100’s of clones per admin


Partitions for Scalability

  • Clones are not appropriate for some apps.

    • Stateful apps do not replicate well

    • High update rates do not replicate well

  • Examples

    • Email / chat / …

    • Databases

  • Partition state among servers

  • Scalability (online):

    • Partition split/merge

    • Partitioning must be transparent to client.
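A toy sketch (Python; hypothetical server names, not any product's scheme) of transparent partitioning: clients call a single route() function, and an online split only changes the partition map:

```python
import bisect
import hashlib

BUCKETS = 1024

def bucket(key: str) -> int:
    # hash the key (e.g. a mailbox name) into a fixed ring of buckets
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % BUCKETS

# partition map: sorted (first_bucket, server) pairs; server names are hypothetical
partitions = [(0, "mail1"), (512, "mail2")]

def route(key: str) -> str:
    # clients only ever call route(), so the partitioning stays transparent
    starts = [start for start, _ in partitions]
    return partitions[bisect.bisect_right(starts, bucket(key)) - 1][1]

def split(new_start: int, new_server: str) -> None:
    # online split: hand the upper part of a partition's buckets to a new server
    partitions.append((new_start, new_server))
    partitions.sort()

print(route("inbox:user42"))     # routed by hash, wherever that bucket lives
split(768, "mail3")              # grow: mail2's upper range moves to mail3
print(route("inbox:user42"))     # same call, possibly a different server now
```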


Partitioned/Clustered Apps

  • Mail servers

    • Perfectly partitionable

  • Business Object Servers

    • Partition by set of objects.

  • Parallel Databases

    • Transparent access to partitioned tables

    • Parallel Query


Packs for Availability

  • Each partition may fail (independent of others)

  • Partitions migrate to new node via fail-over

    • Fail-over in seconds

  • Pack: the nodes supporting a partition

    • VMS Cluster

    • Tandem Process Pair

    • SP2 HACMP

    • Sysplex™

    • WinNT MSCS (wolfpack)

  • Cluster In A Box now commodity

  • Partitions typically grow in packs.


What Parts+Packs Need

  • Automatic partitioning (in dbms, mail, files,…)

    • Location transparent

    • Partition split/merge

    • Grow without limits (100x10TB)

  • Simple failover model

    • Partition migration is transparent

    • MSCS-like model for services

  • Application-centric request routing

  • Management:

    • Who is up?

    • Automatic partition management (split/merge)

    • Application monitoring.


Partitions and Packs

(Slide diagram: partitions and packed partitions.)

  • Packs for availability


GeoPlex: Farm pairs

  • Two farms

  • Changes from one sent to other

  • When one farm fails, the other provides service

  • Masks

    • Hardware/Software faults

    • Operations tasks (reorganize, upgrade, move)

    • Environmental faults (power fail)


Services on Clones & Partitions

  • Application provides a set of services

  • If cloned:

    • Services are on subset of clones

  • If partitioned:

    • Services run at each partition

  • System load balancing routes request to

    • Any clone

    • Correct partition.

    • Routes around failures.


Cluster Scenarios: 3-tier Systems

A simple web site

(Slide diagram: web clients, a load balancer, and a front end backed by a web file store, SQL temp state, and a SQL database; clones for availability on the front end, packs for availability on the database.)


Cluster Scale Out Scenarios

The FARM: Clones and Packs of Partitions

(Slide diagram: web clients reach a load balancer in front of cloned front ends (firewall, sprayer, web server); behind them are cloned, packed file servers holding Web File Store A and B (kept in sync by replication), SQL temp state, and a SQL database split into packed partitions (SQL Partition 1, 2, 3) for database transparency.)


Terminology

(Slide diagram, repeated: a farm of clones and partitions; shared-nothing and shared-disk clones; shared-nothing partitions and packed partitions; packs run active-active or active-passive.)

  • Terminology for scaleability

  • Farms of servers:

    • Clones: identical

      • Scaleability + availability

    • Partitions:

      • Scaleability

    • Packs

      • Partition availability via fail-over

  • GeoPlex for disaster tolerance.


What we have been doing with SDSS

  • Helping move the data to SQL

    • Database design

    • Data loading

  • Experimenting with queries on a 4 M object DB

    • 20 questions like “find gravitational lens candidates”

    • Queries use parallelism; most run in a few seconds (auto parallel)

    • Some run in hours (neighbors within 1 arcsec)

    • EASY to ask questions.

  • Helping with an “outreach” website: SkyServer

  • Personal goal: Try datamining techniques to “re-discover” Astronomy


References (.doc or .pdf)

  • Technology forecast: http://research.microsoft.com/~gray/papers/MS_TR_99_100_Rules_of_Thumb_in_Data_Engineering.doc

  • Gbps experiments: http://research.microsoft.com/~gray/

  • Disk experiments (10K$/TB): http://research.microsoft.com/~gray/papers/Win2K_IO_MSTR_2000_55.doc

  • Scaleability Terminology: http://research.microsoft.com/~gray/papers/MS_TR_99_85_Scalability_Terminology.doc

