
Jim Gray Talk at University of Tokyo

  • Personal views on PITAC report: invest in long term research

  • Preview of Turing lecture: 10 long term research problems

    • Bush: Summarize info in cyberspace

    • Turing: Intelligent Computers

    • 7 9s: build systems that are always up and prove it.

  • 5-Minute rule

    • For disks

    • For tapes

  • Sorting Progress

    • PennySort

    • Terabyte Sort (!)

  • Slides will be at http://research.Microsoft.com/~Gray/talks



Presidential Advisory Committee on High Performance Computing and Communications, Information Technologies, and the Next Generation Internet

Information Technology

http://www.ccic.gov/ac/interim/ or

http://research.microsoft.com/~Gray/papers/PITAC_Interim_Report_8_98.doc



Charter for the Committee: provide an independent assessment of

  • High-Performance Computing and Communications (HPCC)

    • Progress

    • Balance among research components;

  • Next Generation Internet initiative;

    • Progress

    • Balance

  • IT research and development

    • Maintain United States leadership in

      • IT and

      • Applications



Committee Members

  • Co-Chairs:

    • Bill Joy, Sun Microsystems

    • Ken Kennedy, Rice University

  • Members:

    • Eric Benhamou, 3Com

    • Vinton Cerf, MCI

    • Ching-chih Chen, Simmons

    • David Cooper, LLNL

    • Steve Dorfman, Hughes

    • David Dorman, PointCast

    • Bob Ewald, SGI

    • David Farber, U. of Pennsylvania

    • Sherri Fuller, U. of Washington

    • Hector Garcia-Molina, Stanford

    • Susan Graham, UC Berkeley

    • Jim Gray, Microsoft

    • Danny Hillis, Disney, Inc.

    • John Miller, Montana State Univ.

    • David Nagel, AT&T

    • Raj Reddy, Carnegie Mellon

    • Ted Shortliffe, Stanford

    • Larry Smarr, U. of Illinois @ UC

    • Joe Thompson, Miss. State U.

    • Les Vadasz, Intel

    • Andy Viterbi, Qualcomm

    • Steve Wallach, Centerpoint

    • Irving Wladawsky-Berger, IBM



My Summary of the Report

  • 1/3 of the US economic growth since 1992 was in the IT sector. IT is key to our health, wealth, and safety.

  • Created 400 B$ of wealth in last 3 years (!!)

  • Federal IT research funding of twenty years ago created the boom.

  • Federal IT research funding for the last decade has been flat (in constant dollars).

  • Research funding is increasingly near-term & applied development

  • The committee recommends increasing long-term research funding in:

    • Software design and implementation technologies

    • Technologies to scale the Next Generation Internet to 6 billion users.

    • Tools, algorithms, and systems for high-performance computing.

  • Spend a billion dollars over the next 5 years on Lewis and Clark style "expeditions" into cyberspace.



Myths

  • Now that IT is a big business, industry will do long-term research.

    FACT: industry spends LITTLE on long-term research; it is not in its best interest.

  • IT research = buy computers for scientists.

    FACT: computer science research is different from the application of computers to some discipline.



Research Priorities

  • Findings:

    • Total federal information technology R&D investment is inadequate

    • Federal IT R&D is excessively focused on near-term problems

  • Recommendations:

    • Create a strategic initiative in long-term IT R&D

    • Increase the investment for research in software, scalable information infrastructure, high-end computing, and socio-economic and workforce impacts



Software Research

  • Findings:

    • Demand for software far exceeds the nation’s ability to produce it

    • The nation depends on fragile software

    • Technologies to build reliable and secure software are inadequate

    • The nation is under-investing in fundamental software research

  • Recommendations:

    • Fund more fundamental research in software development methods and component technologies

    • Sponsor a national library of software components

    • Make software research a substantive component of every major IT research initiative

    • Support research in human-computer interfaces and interaction

  • Make fundamental software research an absolute priority



Scalable Information Infrastructure

  • Findings:

    • The Internet has grown well beyond the intent of its original designers

    • Our nation’s dependence on the information infrastructure is increasing daily

    • We cannot safely extend what we currently know to more complex systems

    • Learning how to build large-scale, highly reliable and secure systems requires research

  • Recommendations:

    • Increase funding in research and development of core software and communications technologies aimed directly at the challenge of scaling the information infrastructure

    • Expand the Next Generation Internet test beds to include additional industry partnerships in order to foster the rapid commercialization and deployment of enabling technologies



High-End Computing

  • Findings: HEC is

    • essential for science and engineering research

    • an element of United States national security

    • ripe for new applications

    • a field whose suppliers face unusual market pressures

  • Research & Development Recommendations:

    • Fund innovative technologies and architectures

    • Fund HEC software (parallel programming)

    • Aim for real-application petaops by 2010 through both hardware and software strategies

    • Fund HEC systems for science and engineering research



Social, Economic, Workforce Recommendations

  • Expand research on the social and economic impacts of information technology diffusion and adoption

  • Expand initiatives to increase IT literacy, access and research capabilities

  • Address the shortage of high-technology workers

  • Programs to re-train “stale” IT workers

  • Encourage participation by women and minorities

  • Short-term increase in immigration of skilled IT workers



Conclusions

  • IT is an essential foundation for commerce, education, health care, environmental stewardship, and national security:

    • Dramatically transform the way we communicate, learn, deal with information and conduct research

    • Transform the nature of work, nature of commerce, product design cycle, practice of health care, and the government itself

  • The total Federal IT R&D investment is inadequate

  • The Federal IT R&D is excessively focused on near-term problems

  • U. S. government must:

    • Create a strategic initiative in long-term IT R&D

    • Establish an effective structure for managing and coordinating IT



Vannevar Bush: Memex

  • Memex: Proposed putting all information online (1945)

  • It will happen

  • Result: InfoGlut. Too much information in the shoebox

  • Challenge:

    • Organize the information.

    • Give answers as good as an expert in the field.

    • Anticipate questions and inform the “subscriber”

  • Protect personal privacy

    • A hacker cannot get access to your personal information without your consent.



Turing’s Test (1951): Intelligent Machines

  • Computers helped with the 4-color problem end game

  • Computers (and people) won world chess championship

  • Computers will likely be our 5th brain

    • Augment our intelligence

    • See for us, hear for us, read for us,

    • Prosthetic eyes, ears, voices, arms, legs, …

  • Probably computers will be intelligent like plants and animals.

  • Perhaps computers can be intelligent like people

    • Pass the Turing Test (easy/impossible?) (70%, 5 minutes, B can lie)

    • Translating telephone (as good as a human translator)

    • Read a textbook and pass the written exam.

    • Pass a graduate programming class

    • Pass a graduate literature class

  • Radical: Download someone.



Dependable Systems

  • Build a system used by millions of people each day.

  • Then:

    • Prove that it does what it is supposed to do (code matches spec).

    • Prove that it delivers 99.99999% (7 9s) availability (1 hr per millennium)

    • Prove that it cannot be “hacked” for less than 1B$ (Y2K $)

  • Then build the system automatically from the specification.
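The 7 9s figure can be checked by turning availability into expected downtime. This is a minimal Python sketch; `downtime_hours` is a hypothetical helper, and the 365.25-day year is an assumption.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length including leap days


def downtime_hours(availability: float, years: float) -> float:
    """Expected hours of outage over `years` at the given availability (0..1)."""
    return (1.0 - availability) * years * HOURS_PER_YEAR


if __name__ == "__main__":
    # Downtime per millennium for 3 through 7 nines of availability.
    for nines in range(3, 8):
        availability = 1.0 - 10.0 ** -nines
        print(f"{nines} 9s: {downtime_hours(availability, 1000):,.1f} hours per millennium")
```

Seven 9s works out to roughly 0.88 hours of outage per thousand years, which the slide rounds to 1 hour per millennium.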



Storage Hierarchy (9 levels)

  • Cache: 1, 2

  • Main: 1, 2, 3 (if NUMA)

  • Disk: 1 (cached), 2

  • Tape: 1 (mounted), 2



Meta-Message: Technology Ratios Are Important

  • If everything gets faster & cheaper at the same rate THEN nothing really changes.

  • Things getting MUCH BETTER:

    • communication speed & cost 1,000x

    • processor speed & cost 100x

    • storage size & cost 100x

  • Things staying about the same

    • speed of light (more or less constant)

    • people (10x more expensive)

    • storage speed (only 10x better)


Today’s Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs

[Figure: two log-scale panels plotted against access time (seconds, 10^-9 to 10^3). “Size vs Speed” shows typical system size (bytes, 10^3 to 10^15); “Price vs Speed” shows $/MB (10^-4 to 10^4). The levels are labeled Cache, Main, Secondary (disc), Online tape, Nearline tape, and Offline tape.]



Storage Ratios Changed

  • 10x better access time

  • 10x more bandwidth

  • 4,000x lower media price

  • DRAM/disk media price ratio: 100:1 to 10:1 to 50:1



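The 5 Minute Rule Derived

  • M$: cost of a RAM page (per second of its lifetime)

  • A$: cost of a disk access

  • RI: Reference Interval, the time between accesses to a page

  • Disk access cost per second: A$/RI

$$M\$ = \frac{\text{RAMprice}}{\text{PageSize} \times \text{Lifetime}}, \qquad A\$ = \frac{\text{DiskPrice}}{\text{AccessesPerSec} \times \text{Lifetime}}$$

where RAMprice is in $/MB and PageSize enters as pages per MB of RAM.

Breakeven, when holding a page in RAM costs the same as re-reading it from disk (Lifetime cancels):

$$M\$ = \frac{A\$}{RI} \quad\Longrightarrow\quad RI = \frac{A\$}{M\$} = \frac{\text{DiskPrice} \times \text{PageSize}}{\text{RAMprice} \times \text{AccessesPerSec}}$$

The Reference Interval is a time (seconds): a page re-referenced more often than every RI seconds is cheaper to keep in RAM.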
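A minimal Python sketch of the break-even calculation, assuming the standard form of the rule in which the technology term is pages per MB of RAM divided by disk accesses per second. The function name is hypothetical, and converting a page size in KB to pages per MB with 1 MB = 1024 KB is an assumption.

```python
def break_even_interval_s(disk_price: float, ram_price_per_mb: float,
                          page_size_kb: float, accesses_per_sec: float) -> float:
    """Break-even reference interval RI = A$ / M$, in seconds.

    A$ = disk_price / (accesses_per_sec * lifetime)     cost of one disk access
    M$ = ram_price_per_mb / (pages_per_mb * lifetime)   cost per second of one RAM page
    The device lifetime cancels, so it is not a parameter.
    """
    pages_per_mb = 1024 / page_size_kb                   # assumption: 1 MB = 1024 KB
    technology_term = pages_per_mb / accesses_per_sec
    economic_term = disk_price / ram_price_per_mb
    return technology_term * economic_term


# Prices and access rate as quoted on the Observations slide:
# 400$ disk, 4$/MB RAM, 8 KB pages, 80 accesses per second.
ri = break_even_interval_s(400, 4, 8, 80)
print(f"keep a page in RAM if it is re-referenced within {ri:.0f} s ({ri / 60:.1f} min)")
```

With those inputs the sketch gives 160 seconds, a few minutes. Doubling both the disk price and the RAM price leaves RI unchanged, because only the ratio of the two enters the formula.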



The Five Minute Rule Observations

  • Break-even has two terms:

    (1) Technology term: PageSize / DiskAccPerSec ~ 8KB : 80 = 100:1

    (2) Economic term: DiskPrice / RAM_MB_Price ~ 400:4 = 100:1

  • The economic term trends down

  • The technology term trends up to compensate

  • Still about 5 minutes for random access, 1 minute for sequential access



Shows Best Page Index Page Size ~16KB



Standard Storage Metrics

  • Capacity:

    • RAM: MB and $/MB: today at 10 MB and 100 $/MB

    • Disk: GB and $/GB: today at 10 GB and 200 $/GB

    • Tape: TB and $/TB: today at 0.1 TB and 25 k$/TB (nearline)

  • Access time (latency)

    • RAM: 100 ns

    • Disk: 10 ms

    • Tape: 30 second pick, 30 second position

  • Transfer rate

    • RAM: 1 GB/s

    • Disk: 5 MB/s (arrays can go to 1 GB/s)

    • Tape: 5 MB/s (striping is problematic)



New Storage Metrics: Kaps, Maps, SCAN?

  • Kaps: how many KB objects served per second

    • The file server, transaction processing metric

    • This is the OLD metric.

  • Maps: how many MB objects served per second

    • The Multi-Media metric

  • SCAN: How long to scan all the data

    • The data mining and utility metric

  • And

    • Kaps/$, Maps/$, TBscan/$
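The three metrics can be estimated from the capacity, latency, and transfer-rate figures on the Standard Storage Metrics slide. This Python sketch assumes a simple model in which each request pays one access time plus one transfer, with no queuing or overlap, and uses decimal KB/MB; the `Device` class is a hypothetical illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    capacity_bytes: float
    access_time_s: float         # seek or pick + position time per request
    transfer_bytes_per_s: float

    def objects_per_s(self, object_bytes: float) -> float:
        # One access plus one transfer per request; queuing and overlap ignored.
        return 1.0 / (self.access_time_s + object_bytes / self.transfer_bytes_per_s)

    def kaps(self) -> float:
        return self.objects_per_s(1e3)   # 1 KB objects (decimal KB assumed)

    def maps(self) -> float:
        return self.objects_per_s(1e6)   # 1 MB objects (decimal MB assumed)

    def scan_s(self) -> float:
        # Time to read the whole device sequentially.
        return self.capacity_bytes / self.transfer_bytes_per_s


# Slide figures: disk 10 GB, 10 ms, 5 MB/s; tape 0.1 TB, 30 s pick + 30 s position, 5 MB/s.
disk = Device("disk", 10e9, 0.010, 5e6)
tape = Device("tape", 0.1e12, 60.0, 5e6)
for d in (disk, tape):
    print(f"{d.name}: {d.kaps():.3g} Kaps, {d.maps():.3g} Maps, SCAN {d.scan_s() / 3600:.2f} h")
```

Dividing each result by the device price would give the Kaps/$, Maps/$, and scan/$ variants.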



For the Record (good 1998 devices packaged in a system: http://www.tpc.org/results/individual_results/Dell/dell.6100.9801.es.pdf)

X 14



How To Get Lots of Maps, SCANs

  • parallelism: use many little devices in parallel

  • Beware of the media myth

  • Beware of the access time myth

At 10 MB/s: 1.2 days to scan

With 1,000-way parallelism: a 100-second SCAN.

Parallelism: divide a big problem into many smaller ones to be solved in parallel.
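The scan arithmetic is a single division. The slide text does not state the data size, so this sketch assumes a 1 TB store, which reproduces the 1.2-day figure at 10 MB/s; `scan_seconds` is a hypothetical helper and assumes data is spread evenly over devices that never contend.

```python
def scan_seconds(total_bytes: float, bytes_per_s_per_device: float, devices: int = 1) -> float:
    """Time to read every byte once, with data spread evenly and no contention."""
    return total_bytes / (bytes_per_s_per_device * devices)


TOTAL_BYTES = 1e12  # assumption: 1 TB of data
serial = scan_seconds(TOTAL_BYTES, 10e6)
parallel = scan_seconds(TOTAL_BYTES, 10e6, devices=1000)
print(f"serial: {serial / 86400:.2f} days, 1,000-way parallel: {parallel:.0f} s")
```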



The Disk Farm On a Card

The 1 TB disc card (14"): an array of discs that can be used as

  • 100 discs

  • 1 striped disc

  • 10 fault-tolerant discs

  • ...etc.

giving LOTS of accesses/second and bandwidth.

  • Life is cheap, it's the accessories that cost ya.

  • Processors are cheap, it’s the peripherals that cost ya

    • (a 10k$ disc card).



Tape Farms for Tertiary Storage, Not Mainframe Silos

Many independent tape robots (like a disc farm):

  • One robot: 10K$, 14 tapes, 500 GB, 5 MB/s, 20$/GB, 30 Maps, 27 hr scan

  • 100 robots: 1M$, 50TB, 50$/GB, 3K Maps, scan in 27 hours
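The farm figures scale linearly from one robot. This sketch uses the per-robot numbers from the slide and assumes every robot scans its own tapes in parallel with no shared bottleneck; the constant names are hypothetical.

```python
ROBOT_PRICE = 10_000     # $ per robot
ROBOT_BYTES = 500e9      # 14 tapes, about 500 GB per robot
TAPE_RATE = 5e6          # bytes/s per robot
MAPS_PER_ROBOT = 30
ROBOTS = 100

scan_hours = ROBOT_BYTES / TAPE_RATE / 3600   # each robot scans its own tapes concurrently
robot_dollars_per_gb = ROBOT_PRICE / (ROBOT_BYTES / 1e9)
farm_price = ROBOTS * ROBOT_PRICE
farm_bytes = ROBOTS * ROBOT_BYTES
farm_maps = ROBOTS * MAPS_PER_ROBOT
print(f"scan {scan_hours:.1f} h, {robot_dollars_per_gb:.0f} $/GB per robot, "
      f"farm {farm_price:,} $, {farm_bytes / 1e12:.0f} TB, {farm_maps:,} Maps")
```

Adding robots multiplies capacity and Maps while the scan time stays at about 27 hours.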



Tape & Optical: Beware of the Media Myth

  • Optical is cheap: 200 $/platter, 2 GB/platter => 100 $/GB (2x cheaper than disc)

  • Tape is cheap: 30 $/tape, 20 GB/tape => 1.5 $/GB (100x cheaper than disc)



Tape & Optical Reality: Media is 10% of System Cost

  • Tape needs a robot (10 k$ ... 3 M$)

  • 10 ... 1000 tapes (at 20 GB each) => 20 $/GB ... 200 $/GB

    • (1x…10x cheaper than disc)

  • Optical needs a robot (100 k$)

  • 100 platters = 200 GB (TODAY) => 400 $/GB

    • (more expensive than mag disc)

  • Robots have poor access times

  • Not good for Library of Congress (25TB)

  • Data motel: data checks in but it never checks out!
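The gap between the media myth and the system reality comes from adding the robot to the price. This sketch, with a hypothetical `dollars_per_gb` helper, uses the tape figures above (30 $/tape, 20 GB/tape, a 10 k$ robot) and assumes a small robot holding 10 tapes; it illustrates the mechanism rather than reproducing the slide's rounded ranges.

```python
def dollars_per_gb(robot_price: float, media_count: int,
                   media_price: float, gb_per_media: float) -> float:
    """System $/GB: robot plus media, divided by total capacity."""
    total_price = robot_price + media_count * media_price
    return total_price / (media_count * gb_per_media)


media_only = dollars_per_gb(0, 10, 30, 20)          # the "media myth" view: tapes alone
small_robot = dollars_per_gb(10_000, 10, 30, 20)    # assumption: 10 k$ robot holding 10 tapes
media_share = (10 * 30) / (10_000 + 10 * 30)        # fraction of system price spent on media
print(f"media only: {media_only:.2f} $/GB, with robot: {small_robot:.2f} $/GB, "
      f"media share: {media_share:.1%}")
```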



The Access Time Myth

  • The myth: seek or pick time dominates

  • The reality:

    • (1) Queuing dominates

    • (2) Transfer dominates BLOBs

    • (3) Disk seeks are often short

  • Implication: many cheap servers are better than one fast, expensive server

    • shorter queues

    • parallel transfer

    • lower cost/access and cost/byte

  • This is now obvious for disk arrays

  • This will be obvious for tape arrays
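Point (2) can be checked by splitting one request's service time into access and transfer. This sketch uses the disk figures from the Standard Storage Metrics slide (10 ms access, 5 MB/s), ignores queuing, and picks a 100 MB BLOB as a hypothetical example size.

```python
def transfer_share(access_s: float, object_bytes: float, bytes_per_s: float) -> float:
    """Fraction of one request's service time spent transferring data (queuing ignored)."""
    transfer_s = object_bytes / bytes_per_s
    return transfer_s / (access_s + transfer_s)


DISK_ACCESS_S = 0.010   # 10 ms
DISK_RATE = 5e6         # 5 MB/s
small = transfer_share(DISK_ACCESS_S, 1e3, DISK_RATE)    # 1 KB record
blob = transfer_share(DISK_ACCESS_S, 100e6, DISK_RATE)   # 100 MB BLOB (hypothetical size)
print(f"1 KB record: {small:.1%} transfer; 100 MB BLOB: {blob:.2%} transfer")
```

For small records the access time dominates; for a BLOB nearly all the time is transfer, so parallel transfer across many devices helps more than a faster seek.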



Penny Sort Ground Rules (http://research.microsoft.com/barc/SortBenchmark)

  • How much can you sort for a penny?

    • Hardware and software cost

    • Depreciated over 3 years

    • A 1M$ system gets about 1 second

    • A 1K$ system gets about 1,000 seconds

    • Time (seconds) = 946,080 / SystemPrice ($)

  • Input and output are disk resident

  • Input is

    • 100-byte records (random data)

    • key is the first 10 bytes

  • Must create an output file and fill it with a sorted version of the input file

  • Daytona (product) and Indy (special) categories
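The constant 946,080 is the number of seconds in three 365-day years divided by 100 (one cent is 1/100 of a dollar). This Python sketch derives it and also inverts the rule; both function names are hypothetical, and ignoring leap days is an assumption that happens to reproduce the stated constant.

```python
SECONDS_IN_3_YEARS = 3 * 365 * 24 * 3600   # 94,608,000 s depreciation period (365-day years)


def penny_time_budget_s(system_price: float) -> float:
    """Seconds of machine time one cent buys when the price is spread over 3 years."""
    return (SECONDS_IN_3_YEARS / 100) / system_price   # 946,080 / SystemPrice


def max_system_price(elapsed_s: float) -> float:
    """Most expensive system whose run of `elapsed_s` seconds still costs at most one cent."""
    return (SECONDS_IN_3_YEARS / 100) / elapsed_s


print(penny_time_budget_s(1_000_000))   # about 0.95 s
print(penny_time_budget_s(1_000))       # about 946 s
print(max_system_price(820))            # about 1,154 $, the 820 s PennySort run below
```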



PennySort

  • Hardware

    • 266 MHz Intel PPro

    • 64 MB SDRAM (10ns)

    • Dual Fujitsu DMA 3.2GB EIDE disks

  • Software

    • NT Workstation 4.3

    • NT 5 sort

  • Performance

    • sort 15 M 100-byte records (~1.5 GB)

    • Disk to disk

    • elapsed time 820 sec

      • cpu time = 404 sec



Sort Speed Doubles Every Year



Recent Results

  • NOW Sort: 9 GB on a cluster of 100 UltraSparcs in 1 minute

  • MilleniumSort: 16x Dell NT cluster: 100 MB in 1.8 sec (Datamation)

  • Tandem/Sandia Sort: 68-CPU ServerNet: 1 TB in 47 minutes

  • Rumor of IBM Sort: 7,000-CPU Blue Pacific: 1 TB in 1,024 seconds (17 minutes), 10 Mrps (1 GBps)
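Normalizing these results to a common throughput makes them easier to compare. This sketch uses the sizes and times from the list above and assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes).

```python
# (bytes sorted, seconds) from the Recent Results slide; decimal GB/TB assumed.
results = {
    "NOW Sort": (9e9, 60),
    "MilleniumSort": (100e6, 1.8),
    "Tandem/Sandia Sort": (1e12, 47 * 60),
    "IBM Sort (rumor)": (1e12, 1024),
}

# Sort throughput in MB/s for each result.
rates_mb_s = {name: size / secs / 1e6 for name, (size, secs) in results.items()}
for name, rate in rates_mb_s.items():
    print(f"{name}: {rate:,.0f} MB/s")
```

The rumored IBM run comes to about 977 MB/s, consistent with the slide's "1 GBps".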

