Jim Gray: Talk at University of Tokyo
  • Personal views on PITAC report: invest in long term research
  • Preview of Turing lecture: 10 long term research problems
    • Bush: Summarize info in cyberspace
    • Turing: Intelligent Computers
    • 7 9s: build systems that are always up and prove it.
  • 5-Minute rule
    • For disks
    • For tapes
  • Sorting Progress
    • PennySort
    • Terabyte Sort (!)
  • Slides will be at http://research.Microsoft.com/~Gray/talks

Presidential Advisory Committee on High Performance Computing and Communications, Information Technologies, and the Next Generation Internet

Information Technology

http://www.ccic.gov/ac/interim/ or

http://research.microsoft.com/~Gray/papers/PITAC_Interim_Report_8_98.doc

Charter for the Committee: provide an independent assessment of
  • High-Performance Computing and Communications (HPCC)
    • Progress
    • Balance among research components;
  • Next Generation Internet initiative;
    • Progress
    • Balance
  • IT Research and development
    • Maintain United States leadership in
      • IT and
      • Applications
Committee Members
  • Co-Chairs:
    • Bill Joy, Sun Microsystems • Ken Kennedy, Rice University
  • Members:
    • Eric Benhamou, 3Com • Vinton Cerf, MCI
    • Ching-chih Chen, Simmons • David Cooper, LLNL
    • Steve Dorfman, Hughes • David Dorman, PointCast
    • Bob Ewald, SGI • David Farber, U. of Pennsylvania
    • Sherri Fuller, U. of Washington • Hector Garcia-Molina, Stanford
    • Susan Graham, UC Berkeley • Jim Gray, Microsoft
    • Danny Hillis, Disney, Inc • John Miller, Montana State Univ.
    • David Nagel, AT&T • Raj Reddy, Carnegie Mellon
    • Ted Shortliffe, Stanford • Larry Smarr, U. of Illinois @ UC
    • Joe Thompson, Miss. State U. • Les Vadasz, Intel
    • Andy Viterbi, Qualcomm • Steve Wallach, Centerpoint
    • Irving Wladawsky-Berger, IBM
My Summary of the Report
  • 1/3 of the US economic growth since 1992 was in the IT sector. IT is key to our health, wealth, and safety.
  • Created 400 B$ of wealth in last 3 years (!!)
  • Federal IT research funding of twenty years ago created the boom.
  • Federal IT research funding for the last decade has been flat (in constant dollars).
  • Research funding is increasingly near-term & applied development
  • The committee recommends increasing long-term research funding in:
    • Software design and implementation technologies
    • Technologies to scale the Next Generation Internet to 6 billion users.
    • Tools, algorithms, and systems for high-performance computing.
  • Spend a billion dollars over the next 5 years on Lewis and Clark style "expeditions" into cyberspace.
Myths
  • Myth: Now that IT is a big business, industry will do long-term research.
    • FACT: Industry spends LITTLE on long-term research; it is not in its best interest.
  • Myth: IT research = buy computers for scientists.
    • FACT: Computer science research is different from the application of computers to some discipline.

Research Priorities
  • Findings:
    • Total federal Information technology R&D investment is inadequate
    • Federal IT R&D is excessively focused on near-term problems
  • Recommendations:
    • Create a strategic initiative in long-term IT R&D
    • Increase the investment for research in software, scalable information infrastructure, high-end computing, and socio-economic and workforce impacts
Software Research
  • Findings:
    • Demand for software far exceeds the nation’s ability to produce it
    • The nation depends on fragile software
    • Technologies to build reliable and secure software are inadequate
    • The nation is under-investing in fundamental software research
  • Recommendations:
    • Fund more fundamental research in software development methods and component technologies
    • Sponsor a national library of software components
    • Make software research a substantive component of every major IT research initiative
    • Support research in human-computer interfaces and interaction
  • Make fundamental software research an absolute priority
Scalable Information Infrastructure
  • Findings:
    • The Internet has grown well beyond the intent of its original designers
    • Our nation’s dependence on the information infrastructure is increasing daily
    • We cannot safely extend what we currently know to more complex systems
    • Learning how to build large-scale, highly reliable and secure systems requires research
  • Recommendations:
    • Increase funding in research and development of core software and communications technologies aimed directly at the challenge of scaling the information infrastructure
    • Expand the Next Generation Internet test beds to include additional industry partnerships in order to foster the rapid commercialization and deployment of enabling technologies
High-End Computing
  • Findings:
    • HEC is essential for science and engineering research
    • HEC is an element of United States national security
    • HEC is ripe for new applications
    • HEC suppliers suffer from unusual market pressures
  • Research & Development Recommendations:
    • Fund innovative technologies and architectures
    • Fund HEC software (parallel programming)
    • Aim for a real-application petaops by 2010 through both hardware and software strategies
    • Fund HEC systems for science and engineering research
Social, Economic, Workforce Recommendations
  • Expand research on the social and economic impacts of information technology diffusion and adoption
  • Expand initiatives to increase IT literacy, access and research capabilities
  • Address the shortage of high-technology workers
  • Programs to re-train “stale” IT workers
  • Encourage participation by women and minorities
  • Short-term increase in immigration of skilled IT workers
Conclusions
  • IT is an essential foundation for commerce, education, health care, environmental stewardship, and national security:
    • Dramatically transform the way we communicate, learn, deal with information and conduct research
    • Transform the nature of work, nature of commerce, product design cycle, practice of health care, and the government itself
  • The total Federal IT R&D investment is inadequate
  • The Federal IT R&D is excessively focused on near-term problems
  • U. S. government must:
    • Create a strategic initiative in long-term IT R&D
    • Establish an effective structure for managing and coordinating IT
Vannevar Bush: Memex
  • Memex: Proposed putting all information online (1945)
  • It will happen
  • Result: InfoGlut. Too much information in the shoebox
  • Challenge:
    • Organize the information.
    • Give answers as good as an expert in the field.
    • Anticipate questions and so inform “subscriber”
  • Protect personal privacy
    • A hacker cannot get access to your personal information without your consent.
Turing’s Test (1950): Intelligent Machines
  • Computers helped with the 4-color problem end game
  • Computers (and people) won world chess championship
  • Computers will likely be our 5th brain
    • Augment our intelligence
    • See for us, hear for us, read for us,
    • Prosthetic eyes, ears, voices, arms, legs,….
  • Probably computers will be intelligent like plants and animals.
  • Perhaps computers can be intelligent like people
    • Pass the Turing Test (easy/impossible?) (70%, 5 minutes, B can lie)
    • Translating telephone (as good as a human translator)
    • Read a textbook and pass the written exam.
    • Pass a graduate programming class
    • Pass a graduate literature class
  • Radical: Download someone.
Dependable Systems
  • Build a system used by millions of people each day.
  • Then:
    • Prove that it does what it is supposed to do (code matches spec).
    • Prove that it delivers 99.99999% (7 9s) availability (1 hr per millennium)
    • Prove that it cannot be “hacked” for less than 1B$ (Y2K $)
  • Then build the system automatically from the specification.
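
The "7 9s" figure can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `downtime_hours` and the 365.25-day year are assumptions, not part of the talk.

```python
HOURS_PER_YEAR = 365.25 * 24  # average year length; an assumption of this sketch


def downtime_hours(nines: int, years: float) -> float:
    """Allowed downtime in hours over `years` at an availability of `nines` nines."""
    unavailability = 10.0 ** (-nines)  # e.g. 7 nines -> 1e-7
    return unavailability * years * HOURS_PER_YEAR


if __name__ == "__main__":
    # Seven nines over a millennium: a bit under one hour.
    print(f"{downtime_hours(7, 1000):.2f} hours per millennium")
    # The same target expressed per year: about three seconds.
    print(f"{downtime_hours(7, 1) * 3600:.1f} seconds per year")
```

Under these assumptions seven nines allows roughly 0.88 hours of downtime per millennium, which matches the slide's "1 hr per millennium" to one significant figure.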
Storage Hierarchy (9 levels)

Cache 1, 2

Main (1, 2, 3 if NUMA).

Disk (1 (cached), 2)

Tape (1 (mounted), 2)

Meta-Message: Technology Ratios Are Important
  • If everything gets faster & cheaper at the same rate THEN nothing really changes.
  • Things getting MUCH BETTER:
    • communication speed & cost 1,000x
    • processor speed & cost 100x
    • storage size & cost 100x
  • Things staying about the same
    • speed of light (more or less constant)
    • people (10x more expensive)
    • storage speed (only 10x better)
Today’s Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs

[Figure: two log-log plots against Access Time (seconds), from 10^-9 to 10^3. "Size vs Speed" plots Typical System (bytes), from 10^3 to 10^15. "Price vs Speed" plots $/MB, from 10^-4 to 10^4. Points in both panels: Cache, Main, Secondary (Disc), Online Tape, Nearline Tape, Offline Tape.]


Storage Ratios Changed
  • 10x better access time
  • 10x more bandwidth
  • 4,000x lower media price
  • DRAM/DISK 100:1 to 10:1 to 50:1
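The 5 Minute Rule Derived

  • M$: cost of a RAM page (per unit of time)

\[ M_\$ = \frac{\text{RAM \$/MB}}{\text{PageSize} \times \text{Lifetime}} \]

  • A$: cost of a disk access

\[ A_\$ = \frac{\text{DiskPrice}}{\text{AccessesPerSec} \times \text{Lifetime}} \]

  • RI: Reference Interval, the time between accesses to a page. A page left on disk costs A$/RI.
  • Breakeven: M$ = A$ / RI, so

\[ \text{RI} = \frac{A_\$}{M_\$} = \frac{\text{DiskPrice} \times \text{PageSize}}{\text{RAMprice} \times \text{AccPerSec}} \]

  • PageSize here counts pages per MB of RAM, so the result is in seconds.

[Figure: cost ($) against Reference Interval (time). The constant cost of a RAM page, M$, crosses the falling disk access cost, A$/RI, at the breakeven interval.]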

The Five Minute Rule Observations
  • Break-even has two terms:
    • (1) Technology term: PageSize / DiskAccPerSec ~ 8KB : 80 = 100:1
    • (2) Economic term: DiskPrice / RAM_MB_Price ~ 400:4 = 100:1
  • Economic term trends down
  • Technology term trends up to compensate.
  • Still at 5 minutes for random access, 1 minute for sequential
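
The break-even interval is the product of the technology term and the economic term. The sketch below is a hedged illustration, not the author's code: it reads the technology term as pages per MB of RAM divided by disk accesses per second (an assumption made so the units come out in seconds), and the function name and the use of 1 MB = 1024 KB are likewise assumptions. The inputs are the slide's rough figures.

```python
def break_even_seconds(disk_price: float, ram_price_per_mb: float,
                       page_size_kb: float, disk_accesses_per_sec: float) -> float:
    """Reference interval (seconds) at which caching a page in RAM
    costs the same as re-reading it from disk on every access."""
    pages_per_mb = 1024.0 / page_size_kb  # assumption: 1 MB = 1024 KB
    # (DiskPrice / RAMprice) * (PagesPerMB / AccPerSec), written as one ratio.
    return (disk_price * pages_per_mb) / (ram_price_per_mb * disk_accesses_per_sec)


if __name__ == "__main__":
    # Slide's rough figures: $400 disk, $4/MB RAM, 8 KB pages, 80 accesses/s.
    interval = break_even_seconds(400.0, 4.0, 8.0, 80.0)
    print(f"break-even interval: {interval:.0f} s ({interval / 60:.1f} min)")
```

With these particular inputs the sketch gives an interval of a few minutes. Pages referenced more often than that are cheaper to keep in RAM, and colder pages are cheaper to leave on disk.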
Standard Storage Metrics
  • Capacity:
    • RAM: MB and $/MB: today at 10MB & 100$/MB
    • Disk: GB and $/GB: today at 10 GB and 200$/GB
    • Tape: TB and $/TB: today at .1TB and 25k$/TB (nearline)
  • Access time (latency)
    • RAM: 100 ns
    • Disk: 10 ms
    • Tape: 30 second pick, 30 second position
  • Transfer rate
    • RAM: 1 GB/s
    • Disk: 5 MB/s - - - Arrays can go to 1GB/s
    • Tape: 5 MB/s - - - striping is problematic
New Storage Metrics: Kaps, Maps, SCAN?
  • Kaps: How many KB objects served per second
    • The file server, transaction processing metric
    • This is the OLD metric.
  • Maps: How many MB objects served per sec
    • The Multi-Media metric
  • SCAN: How long to scan all the data
    • The data mining and utility metric
  • And
    • Kaps/$, Maps/$, TBscan/$
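
One way to make these metrics concrete is a simple service-time model: each request pays one access latency plus the transfer time for the object. That model, the `Device` class, and the decimal units (1 KB = 0.001 MB, 10 GB = 10,000 MB) are assumptions of this sketch, not definitions from the talk. The disk figures are the 10 ms, 5 MB/s, 10 GB values from the standard-metrics slide.

```python
from dataclasses import dataclass


@dataclass
class Device:
    access_time_s: float      # latency before the first byte arrives
    transfer_mb_per_s: float  # sustained transfer rate
    capacity_mb: float        # total capacity

    def objects_per_sec(self, object_mb: float) -> float:
        """Objects served per second if each costs one access plus its transfer."""
        return 1.0 / (self.access_time_s + object_mb / self.transfer_mb_per_s)

    def kaps(self) -> float:
        return self.objects_per_sec(0.001)  # 1 KB objects

    def maps(self) -> float:
        return self.objects_per_sec(1.0)    # 1 MB objects

    def scan_seconds(self) -> float:
        """Time to read the whole device sequentially."""
        return self.capacity_mb / self.transfer_mb_per_s


if __name__ == "__main__":
    disk = Device(access_time_s=0.010, transfer_mb_per_s=5.0, capacity_mb=10_000.0)
    print(f"Kaps: {disk.kaps():.1f}  Maps: {disk.maps():.2f}  SCAN: {disk.scan_seconds():.0f} s")
```

In this model Kaps is limited mostly by access time, while Maps and SCAN are limited by transfer rate, which is why the three metrics rank devices differently.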
For the Record (good 1998 devices packaged in a system: http://www.tpc.org/results/individual_results/Dell/dell.6100.9801.es.pdf)

[Table not preserved in the transcript; only the annotation "X 14" survives.]


How To Get Lots of Maps, SCANs
  • parallelism: use many little devices in parallel
  • Beware of the media myth
  • Beware of the access time myth

At 10 MB/s: 1.2 days to scan 1 TB

1,000 x parallel: 100 seconds SCAN.

Parallelism: divide a big problem into many smaller ones to be solved in parallel.
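
The scan figures follow from simple division. The sketch below assumes a 1 TB data set (10^12 bytes), which is what the slide's numbers imply, and assumes the data is split evenly across devices. The function name is illustrative.

```python
def scan_seconds(data_bytes: float, device_bytes_per_sec: float, devices: int = 1) -> float:
    """Time to read all the data when it is spread evenly over `devices`
    identical devices that all stream in parallel."""
    return data_bytes / (device_bytes_per_sec * devices)


if __name__ == "__main__":
    one_tb = 1e12           # assumption: decimal terabyte
    ten_mb_per_sec = 10e6   # the slide's 10 MB/s per device

    serial = scan_seconds(one_tb, ten_mb_per_sec)
    parallel = scan_seconds(one_tb, ten_mb_per_sec, devices=1000)
    print(f"1 device:      {serial / 86400:.2f} days")
    print(f"1,000 devices: {parallel:.0f} seconds")
```

The speedup is linear only because the sketch ignores skew and coordination costs. Real scans get close to it when the partitions are balanced.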

The Disk Farm On a Card
  • The 1 TB disc card: an array of discs on a 14" card
  • Can be used as
    • 100 discs
    • 1 striped disc
    • 10 Fault Tolerant discs
    • ....etc
  • LOTS of accesses/second and bandwidth
  • Life is cheap, it’s the accessories that cost ya.
  • Processors are cheap, it’s the peripherals that cost ya (a 10k$ disc card).
Tape Farms for Tertiary Storage, Not Mainframe Silos
  • One robot: 10K$, 14 tapes, 500 GB, 5 MB/s, 20$/GB, 30 Maps
  • Farm of 100 robots: 1M$, 50TB, 50$/GB, 3K Maps, 27 hr Scan
  • Scan in 27 hours.
  • Many independent tape robots (like a disc farm)


Tape & Optical: Beware of the Media Myth

  • Optical is cheap: 200 $/platter, 2 GB/platter => 100 $/GB (2x cheaper than disc)
  • Tape is cheap: 30 $/tape, 20 GB/tape => 1.5 $/GB (100x cheaper than disc).

Tape & Optical Reality: Media is 10% of System Cost
  • Tape needs a robot (10 k$ ... 3 M$ )
  • 10 ... 1000 tapes (at 20GB each) => 20$/GB ... 200$/GB
          • (1x…10x cheaper than disc)
  • Optical needs a robot (100 k$ )
  • 100 platters = 200GB ( TODAY ) => 400 $/GB
          • ( more expensive than mag disc )
  • Robots have poor access times
  • Not good for Library of Congress (25TB)
  • Data motel: data checks in but it never checks out!
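
The gap between media price and system price can be seen by adding the robot to the bill. The sketch below is illustrative only: the function names are hypothetical, and the configuration (a 10 k$ robot holding ten 30 $ tapes of 20 GB each) is one point chosen from the ranges on the slides.

```python
def system_dollars_per_gb(robot_price: float, media_count: int,
                          media_price: float, gb_per_medium: float) -> float:
    """Cost per GB once the robot is counted along with the media."""
    total_price = robot_price + media_count * media_price
    return total_price / (media_count * gb_per_medium)


def media_share(robot_price: float, media_count: int, media_price: float) -> float:
    """Fraction of the total system price that is media."""
    media_total = media_count * media_price
    return media_total / (robot_price + media_total)


if __name__ == "__main__":
    # Assumed configuration: 10 k$ robot, 10 tapes at 30 $ and 20 GB each.
    print(f"media only:  {30.0 / 20.0:.2f} $/GB")
    print(f"with robot:  {system_dollars_per_gb(10_000.0, 10, 30.0, 20.0):.2f} $/GB")
    print(f"media share: {media_share(10_000.0, 10, 30.0):.1%}")
```

For this small configuration the robot dominates the price, so the per-GB cost lands inside the slide's 20 $/GB to 200 $/GB range and far above the media-only figure.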
The Access Time Myth

The Myth: seek or pick time dominates

The reality: (1) Queuing dominates

(2) Transfer dominates BLOBs

(3) Disk seeks often short

Implication: many cheap servers better than one fast expensive server

  • shorter queues
  • parallel transfer
  • lower cost/access and cost/byte

This is now obvious for disk arrays

This will be obvious for tape arrays

Penny Sort Ground Rules
http://research.microsoft.com/barc/SortBenchmark
  • How much can you sort for a penny?
    • Hardware and Software cost
    • Depreciated over 3 years
    • 1M$ system gets about 1 second,
    • 1K$ system gets about 1,000 seconds.
    • Time (seconds) = 946,080 / SystemPrice ($)
  • Input and output are disk resident
  • Input is
    • 100-byte records (random data)
    • key is first 10 bytes.
  • Must create output file and fill with sorted version of input file.
  • Daytona (product) and Indy (special) categories
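
The time budget follows from the three-year depreciation rule: a penny buys 0.01 / SystemPrice of the system's lifetime. The sketch below assumes 365-day years, which reproduces the 946,080 constant. The function name is illustrative.

```python
SECONDS_PER_3_YEARS = 3 * 365 * 24 * 3600  # 94,608,000 with 365-day years


def penny_budget_seconds(system_price_dollars: float) -> float:
    """Seconds of machine time that one cent buys on a system
    depreciated linearly over three years."""
    return 0.01 * SECONDS_PER_3_YEARS / system_price_dollars


if __name__ == "__main__":
    print(f"1M$ system: {penny_budget_seconds(1_000_000):.2f} s")
    print(f"1K$ system: {penny_budget_seconds(1_000):.0f} s")
```

The budget is inversely proportional to price, which is why the benchmark favours cheap commodity hardware.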
PennySort
  • Hardware
    • 266 MHz Intel PPro
    • 64 MB SDRAM (10ns)
    • Dual Fujitsu DMA 3.2GB EIDE disks
  • Software
    • NT workstation 4.3
    • NT 5 sort
  • Performance
    • sort 15 M 100-byte records (~1.5 GB)
    • Disk to disk
    • elapsed time 820 sec
      • cpu time = 404 sec
Recent Results
  • NOW Sort: 9 GB on a cluster of 100 UltraSparcs in 1 minute
  • MillenniumSort: 16x Dell NT cluster: 100 MB in 1.8 Sec (Datamation)
  • Tandem/Sandia Sort: 68 CPU ServerNet 1 TB in 47 minutes
  • Rumor of IBM Sort: 7000 cpu Blue Pacific 1 TB in 1024 seconds (17 minutes). 10 Mrps (1GBps)
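
The rates quoted for the terabyte sorts can be recomputed from size and elapsed time. The sketch below assumes 1 TB = 10^12 bytes and the benchmark's 100-byte records. The function name is illustrative.

```python
RECORD_BYTES = 100  # record size used by the sort benchmark


def sort_rates(total_bytes: float, elapsed_seconds: float) -> tuple[float, float]:
    """Return (records per second, bytes per second) for a sort run."""
    bytes_per_sec = total_bytes / elapsed_seconds
    return bytes_per_sec / RECORD_BYTES, bytes_per_sec


if __name__ == "__main__":
    one_tb = 1e12  # assumption: decimal terabyte
    runs = [("Rumored IBM sort", 1024.0), ("Tandem/Sandia sort", 47 * 60.0)]
    for name, seconds in runs:
        rps, bps = sort_rates(one_tb, seconds)
        print(f"{name}: {rps / 1e6:.2f} Mrps, {bps / 1e9:.2f} GB/s")
```

Under these assumptions the rumored run comes out just under 10 Mrps and 1 GB/s, consistent with the rounded figures on the slide.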