Jim Gray Talk at University of Tokyo
  • Personal views on PITAC report: invest in long term research
  • Preview of Turing lecture: 10 long term research problems
    • Bush: Summarize info in cyberspace
    • Turing: Intelligent Computers
    • 7 9s: build systems that are always up and prove it.
  • 5-Minute rule
    • For disks
    • For tapes
  • Sorting Progress
    • PennySort
    • Terabyte Sort (!)
  • Slides will be at http://research.Microsoft.com/~Gray/talks
Presidential Advisory Committee on High Performance Computing and Communications, Information Technologies, and the Next Generation Internet
Information Technology

http://www.ccic.gov/ac/interim/ or

http://research.microsoft.com/~Gray/papers/PITAC_Interim_Report_8_98.doc

Charter for the Committee: provide an independent assessment of
  • High-Performance Computing and Communications (HPCC)
    • Progress
    • Balance among research components;
  • Next Generation Internet initiative;
    • Progress
    • Balance
  • IT Research and development
    • Maintain United States leadership in
      • IT and
      • Applications
Committee Members
  • Co-Chairs:
    • Bill Joy, Sun Microsystems • Ken Kennedy, Rice University
  • Members:
    • Eric Benhamou, 3Com • Vinton Cerf, MCI
    • Ching-chih Chen, Simmons • David Cooper, LLNL
    • Steve Dorfman, Hughes • David Dorman, PointCast
    • Bob Ewald, SGI • David Farber, U. of Pennsylvania
    • Sherri Fuller, U. of Washington • Hector Garcia-Molina, Stanford
    • Susan Graham, UC Berkeley • Jim Gray, Microsoft
    • Danny Hillis, Disney, Inc • John Miller, Montana State Univ.
    • David Nagel, AT&T • Raj Reddy, Carnegie Mellon
    • Ted Shortliffe, Stanford • Larry Smarr, U. of Illinois @ UC
    • Joe Thompson, Miss. State U. • Les Vadasz, Intel
    • Andy Viterbi, Qualcomm • Steve Wallach, Centerpoint
    • Irving Wladawsky-Berger, IBM
My Summary of the Report
  • 1/3 of the US economic growth since 1992 was in the IT sector. IT is key to our health, wealth, and safety.
  • Created 400 B$ of wealth in last 3 years (!!)
  • Federal IT research funding of twenty years ago created the boom.
  • Federal IT research funding for the last decade has been flat (in constant dollars).
  • Research funding increasingly goes to near-term, applied development.
  • The committee recommends increasing long-term research funding in:
    • Software design and implementation technologies
    • Technologies to scale the Next Generation Internet to 6 billion users.
    • Tools, algorithms, and systems for high-performance computing.
  • Spend a billion dollars over the next 5 years on Lewis and Clark style "expeditions" into cyberspace.
Myths
  • Myth: Now that IT is a big business, industry will do long-term research.
    • FACT: Industry spends LITTLE on long-term research. It is not in their best interest.
  • Myth: IT research = buy computers for scientists.
    • FACT: Computer science research is different from the application of computers to some discipline.

Research Priorities
  • Findings:
    • Total federal Information technology R&D investment is inadequate
    • Federal IT R&D is excessively focused on near-term problems
  • Recommendations:
    • Create a strategic initiative in long-term IT R&D
    • Increase the investment for research in software, scalable information infrastructure, high-end computing, and socio-economic and workforce impacts
Software Research
  • Findings:
    • Demand for software far exceeds the nation’s ability to produce it
    • The nation depends on fragile software
    • Technologies to build reliable and secure software are inadequate
    • The nation is under-investing in fundamental software research
  • Recommendations:
    • Fund more fundamental research in software development methods and component technologies
    • Sponsor a national library of software components
    • Make software research a substantive component of every major IT research initiative
    • Support research in human-computer interfaces and interaction
  • Make fundamental software research an absolute priority
Scalable Information Infrastructure
  • Findings:
    • The Internet has grown well beyond the intent of its original designers
    • Our nation’s dependence on the information infrastructure is increasing daily
    • We cannot safely extend what we currently know to more complex systems
    • Learning how to build large-scale, highly reliable and secure systems requires research
  • Recommendations:
    • Increase funding in research and development of core software and communications technologies aimed directly at the challenge of scaling the information infrastructure
    • Expand the Next Generation Internet test beds to include additional industry partnerships in order to foster the rapid commercialization and deployment of enabling technologies
High-End Computing
  • Findings: HEC is
    • essential for science and engineering research
    • an element of United States national security
    • ripe for new applications
    • supplied by vendors that suffer from unusual market pressures
  • Research & Development Recommendations
    • Fund innovative technologies and architectures
    • Fund HEC software (parallel programming)
    • Aim for a real application petaops by 2010 through both hardware and software strategies
    • Fund HEC systems for science and engineering research
Social, Economic, Workforce Recommendations
  • Expand research on the social and economic impacts of information technology diffusion and adoption
  • Expand initiatives to increase IT literacy, access and research capabilities
  • Address the shortage of high-technology workers
  • Programs to re-train “stale” IT workers
  • Encourage participation by women and minorities
  • Short-term increase in immigration of skilled IT workers
Conclusions
  • IT is an essential foundation for commerce, education, health care, environmental stewardship, and national security:
    • Dramatically transform the way we communicate, learn, deal with information and conduct research
    • Transform the nature of work, nature of commerce, product design cycle, practice of health care, and the government itself
  • The total Federal IT R&D investment is inadequate
  • The Federal IT R&D is excessively focused on near-term problems
  • U. S. government must:
    • Create a strategic initiative in long-term IT R&D
    • Establish an effective structure for managing and coordinating IT
Vannevar Bush: Memex
  • Memex: Proposed putting all information online (1945)
  • It will happen
  • Result: InfoGlut. Too much information in the shoebox
  • Challenge:
    • Organize the information.
    • Give answers as good as an expert in the field.
    • Anticipate questions and so inform “subscriber”
  • Protect personal privacy
    • A hacker cannot get access to your personal information without your consent.
Turing’s Test (1950): Intelligent Machines
  • Computers helped with the 4-color problem end game
  • Computers (and people) won world chess championship
  • Computers will likely be our 5th brain
    • Augment our intelligence
    • See for us, hear for us, read for us,
    • Prosthetic eyes, ears, voices, arms, legs,….
  • Probably computers will be intelligent like plants and animals.
  • Perhaps computers can be intelligent like people
    • Pass the Turing Test (easy/impossible?) (70%, 5 minutes, B can lie)
    • Translating telephone (as good as a human translator)
    • Read a textbook and pass the written exam.
    • Pass a graduate programming class
    • Pass a graduate literature class
  • Radical: Download someone.
Dependable Systems
  • Build a system used by millions of people each day.
  • Then:
    • Prove that it does what it is supposed to do (code matches spec).
    • Prove that it delivers 99.99999% (7 9s) availability (1 hr per millennium)
    • Prove that it cannot be “hacked” for less than 1B$ (Y2K $)
  • Then build the system automatically from the specification.
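
The "7 9s" figure can be checked with a little arithmetic. The sketch below is only an illustration, assuming a 365-day year; the function name is made up for this example and is not from the talk.

```python
# Downtime budget implied by an availability of N nines.
# Assumption: a 365-day year (31,536,000 seconds).
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_seconds(nines: int, years: float = 1.0) -> float:
    """Allowed downtime, in seconds, for `nines` nines over `years` years."""
    unavailability = 10.0 ** (-nines)  # 7 nines -> 1e-7
    return unavailability * SECONDS_PER_YEAR * years


per_year = downtime_seconds(7)
per_millennium_minutes = downtime_seconds(7, years=1000) / 60

print(f"7 nines: {per_year:.2f} s/year, "
      f"{per_millennium_minutes:.1f} min/millennium")
```

Seven nines allow about 3 seconds of downtime per year, or a little under an hour per millennium, which matches the slide's "1 hr per millennium".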
Storage Hierarchy (9 levels)
  • Cache 1, 2
  • Main (1, 2, 3 if NUMA)
  • Disk (1 (cached), 2)
  • Tape (1 (mounted), 2)

Meta-Message: Technology Ratios Are Important
  • If everything gets faster & cheaper at the same rate THEN nothing really changes.
  • Things getting MUCH BETTER:
    • communication speed & cost 1,000x
    • processor speed & cost 100x
    • storage size & cost 100x
  • Things staying about the same
    • speed of light (more or less constant)
    • people (10x more expensive)
    • storage speed (only 10x better)
Today’s Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs

[Figure: two log-log charts plotted against Access Time (seconds), from 10^-9 to 10^3. "Size vs Speed" shows Typical System (bytes), from 10^3 to 10^15. "Price vs Speed" shows $/MB, from 10^-4 to 10^4. Levels plotted in both charts: Cache, Main, Secondary (Disc), Online Tape, Nearline Tape, Offline Tape.]


Storage Ratios Changed
  • 10x better access time
  • 10x more bandwidth
  • 4,000x lower media price
  • DRAM/DISK 100:1 to 10:1 to 50:1
The 5 Minute Rule Derived
  • M$: cost of a RAM page = (RAM $/MB) / (PageSize x Lifetime)
  • A$: cost of a disk access = DiskPrice / (AccessesPerSec x Lifetime)
  • RI: Reference Interval = time between accesses to a page
  • Cost of keeping the page in RAM: M$
  • Cost of re-reading the page from disk: A$ / RI
  • Breakeven: M$ = A$ / Reference Interval
  • Reference Interval = A$ / M$ = (DiskPrice x PageSize) / (RAMprice x AccPerSec)

[Figure: cost ($) vs Reference Interval (time). The flat RAM-page cost M$ crosses the falling disk-access cost A$/RI at the breakeven point.]


The Five Minute Rule Observations
  • Break even has two terms:
    • (1) Technology term: PageSize / DiskAccPerSec ~ 8KB : 80 = 100:1
    • (2) Economic term: DiskPrice / RAM_MB_Price ~ 400:4 = 100:1
  • Economic term trends down
  • Technology term trends up to compensate.
  • Still at 5 minutes for random, 1 minute for sequential
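
The break-even interval can be sketched numerically as the product of the technology term and the economic term. This is a minimal illustration, not the slide's own calculation: it assumes the page-size factor is expressed as pages per MB of RAM (so the units come out in seconds), and it plugs in the slide's round numbers (a 400$ disk, 4$/MB RAM, 8 KB pages, 80 accesses/sec). The function and parameter names are made up for this example.

```python
def break_even_seconds(disk_price: float,
                       ram_price_per_mb: float,
                       page_size_kb: float,
                       disk_accesses_per_sec: float) -> float:
    """Reference interval at which caching a page in RAM costs the same
    as re-reading it from disk on every access."""
    pages_per_mb = 1024 / page_size_kb                     # assumption: 1 MB = 1024 KB
    technology_term = pages_per_mb / disk_accesses_per_sec
    economic_term = disk_price / ram_price_per_mb
    return technology_term * economic_term


interval = break_even_seconds(disk_price=400, ram_price_per_mb=4,
                              page_size_kb=8, disk_accesses_per_sec=80)
print(f"break-even reference interval: {interval:.0f} s "
      f"(~{interval / 60:.1f} minutes)")
```

With these inputs the interval comes out at roughly 160 seconds, the same order of magnitude as five minutes. Pages re-referenced more often than that are cheaper to keep in RAM; colder pages are cheaper to leave on disk.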
Standard Storage Metrics
  • Capacity:
    • RAM: MB and $/MB: today at 10MB & 100$/MB
    • Disk: GB and $/GB: today at 10 GB and 200$/GB
    • Tape: TB and $/TB: today at .1TB and 25k$/TB (nearline)
  • Access time (latency)
    • RAM: 100 ns
    • Disk: 10 ms
    • Tape: 30 second pick, 30 second position
  • Transfer rate
    • RAM: 1 GB/s
    • Disk: 5 MB/s - - - Arrays can go to 1GB/s
    • Tape: 5 MB/s - - - striping is problematic
New Storage Metrics: Kaps, Maps, SCAN?
  • Kaps: How many KB objects served per second
    • The file server, transaction processing metric
    • This is the OLD metric.
  • Maps: How many MB objects served per sec
    • The Multi-Media metric
  • SCAN: How long to scan all the data
    • The data mining and utility metric
  • And
    • Kaps/$, Maps/$, TBscan/$
For the Record (good 1998 devices packaged in system: http://www.tpc.org/results/individual_results/Dell/dell.6100.9801.es.pdf)

X 14

How To Get Lots of Maps, SCANs
  • parallelism: use many little devices in parallel
  • Beware of the media myth
  • Beware of the access time myth

At 10 MB/s: 1.2 days to scan

1,000 x parallel: 100 seconds SCAN.

Parallelism: divide a big problem into many smaller ones to be solved in parallel.
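
The scan-time figures above follow from dividing data size by aggregate bandwidth. The sketch below assumes a 1 TB data set (the size that makes the slide's numbers work out) and decimal units; the helper name is hypothetical.

```python
TB = 10**12  # decimal terabyte (assumption)
MB = 10**6   # decimal megabyte (assumption)


def scan_seconds(data_bytes: float, bytes_per_sec_per_device: float,
                 devices: int = 1) -> float:
    """Time to read all the data once, with `devices` reading in parallel."""
    return data_bytes / (bytes_per_sec_per_device * devices)


serial = scan_seconds(1 * TB, 10 * MB)
parallel = scan_seconds(1 * TB, 10 * MB, devices=1000)

print(f"1 device:      {serial / 86400:.2f} days")
print(f"1,000 devices: {parallel:.0f} seconds")
```

One device at 10 MB/s needs about 1.2 days; a thousand such devices scanning disjoint slices finish in 100 seconds, assuming the work divides evenly.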

The Disk Farm On a Card
  • The 1 TB disc card (14"): an array of discs
  • Can be used as
    • 100 discs
    • 1 striped disc
    • 10 Fault Tolerant discs
    • ....etc
  • LOTS of accesses/second and bandwidth
  • Life is cheap, it’s the accessories that cost ya.
  • Processors are cheap, it’s the peripherals that cost ya (a 10k$ disc card).
Tape Farms for Tertiary Storage, Not Mainframe Silos
  • One robot: 10K$ robot, 14 tapes, 500 GB, 5 MB/s, 20$/GB, 30 Maps, scan in 27 hours
  • Farm of 100 robots: 1M$, 50TB, 50$/GB, 3K Maps, 27 hr scan
  • Many independent tape robots (like a disc farm)


Tape & Optical: Beware of the Media Myth
  • Optical is cheap: 200 $/platter, 2 GB/platter => 100$/GB (2x cheaper than disc)
  • Tape is cheap: 30 $/tape, 20 GB/tape => 1.5 $/GB (100x cheaper than disc).

Tape & Optical Reality: Media is 10% of System Cost
  • Tape needs a robot (10 k$ ... 3 M$)
  • 10 ... 1000 tapes (at 20GB each) => 20$/GB ... 200$/GB
          • (1x…10x cheaper than disc)
  • Optical needs a robot (100 k$ )
  • 100 platters = 200GB ( TODAY ) => 400 $/GB
          • ( more expensive than mag disc )
  • Robots have poor access times
  • Not good for Library of Congress (25TB)
  • Data motel: data checks in but it never checks out!
The Access Time Myth
  • The Myth: seek or pick time dominates
  • The reality:
    • (1) Queuing dominates
    • (2) Transfer dominates BLOBs
    • (3) Disk seeks often short
  • Implication: many cheap servers better than one fast expensive server
    • shorter queues
    • parallel transfer
    • lower cost/access and cost/byte
  • This is now obvious for disk arrays
  • This will be obvious for tape arrays

Penny Sort Ground Rules (http://research.microsoft.com/barc/SortBenchmark)
  • How much can you sort for a penny?
    • Hardware and Software cost
    • Depreciated over 3 years
    • 1M$ system gets about 1 second,
    • 1K$ system gets about 1,000 seconds.
    • Time (seconds) = 946,080 / SystemPrice ($)
  • Input and output are disk resident
  • Input is
    • 100-byte records (random data)
    • key is first 10 bytes.
  • Must create output file and fill with sorted version of input file.
  • Daytona (product) and Indy (special) categories
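
The time budget follows from spreading the system price over three years of seconds and asking how many seconds one cent buys. A minimal sketch, assuming 365-day years; the function name is made up for this example.

```python
# Seconds in the 3-year depreciation period (assumption: 365-day years).
SECONDS_IN_3_YEARS = 3 * 365 * 24 * 3600  # 94,608,000


def penny_seconds(system_price_dollars: float) -> float:
    """Seconds of machine time that one cent buys on a system of this price."""
    dollars_per_second = system_price_dollars / SECONDS_IN_3_YEARS
    return 0.01 / dollars_per_second


for price in (1_000_000, 1_000):
    print(f"{price:>9,}$ system: {penny_seconds(price):8.2f} s per penny")
```

This reproduces the "about 1 second" and "about 1,000 seconds" bullets: a cheaper system gets proportionally more time, so the benchmark rewards price/performance rather than raw speed.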
PennySort
  • Hardware
    • 266 MHz Intel PPro
    • 64 MB SDRAM (10ns)
    • Dual Fujitsu DMA 3.2GB EIDE disks
  • Software
    • NT workstation 4.3
    • NT 5 sort
  • Performance
    • sort 15 M 100-byte records (~1.5 GB)
    • Disk to disk
    • elapsed time 820 sec
      • cpu time = 404 sec
Recent Results
  • NOW Sort: 9 GB on a cluster of 100 UltraSparcs in 1 minute
  • MilleniumSort: 16x Dell NT cluster: 100 MB in 1.8 Sec (Datamation)
  • Tandem/Sandia Sort: 68 CPU ServerNet 1 TB in 47 minutes
  • Rumor of IBM Sort: 7000 cpu Blue Pacific 1 TB in 1024 seconds (17 minutes). 10 Mrps (1GBps)
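
To compare these results on one scale, each can be converted to a throughput. The sketch below assumes decimal units (1 GB = 10^9 bytes) and the benchmark's 100-byte records for every entry; the dictionary layout and function name are illustrative only.

```python
RECORD_BYTES = 100  # benchmark record size

# (bytes sorted, elapsed seconds), taken from the list above
results = {
    "NOW Sort": (9e9, 60.0),
    "MilleniumSort": (100e6, 1.8),
    "Tandem/Sandia Sort": (1e12, 47 * 60.0),
    "IBM Sort (rumored)": (1e12, 1024.0),
}


def throughput(bytes_sorted: float, seconds: float) -> tuple[float, float]:
    """Return (MB/s, million records per second) for one sort run."""
    bytes_per_sec = bytes_sorted / seconds
    return bytes_per_sec / 1e6, bytes_per_sec / RECORD_BYTES / 1e6


for name, (size, secs) in results.items():
    mbps, mrps = throughput(size, secs)
    print(f"{name:20s} {mbps:8.1f} MB/s {mrps:6.2f} Mrps")
```

The rumored IBM run works out to just under 1 GB/s and just under 10 million records per second, in line with the "10 Mrps (1GBps)" note.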