Distributed Computing Economics

Distributed Computing Economics Jim Gray Microsoft Research gray@microsoft.com Talk at SD Forum: http://www.sdforum.org/ 18 Sept 2003, PARC Auditorium, Palo Alto, CA. Slides at: http://research.microsoft.com/~gray/talks

Two (?) Talks • Distributed Computing Economics • Online Science (what I have been doing).

Distributed Computing Economics • Why is Seti@Home a great idea? • Why is Napster a great deal? • Why is the Computational Grid uneconomic? • When does computing on demand work? • What is the “right” level of abstraction? • Is the Access Grid the real killer app? Based on: Distributed Computing Economics, Jim Gray, Microsoft Tech report, March 2003, MSR-TR-2003-24 http://research.microsoft.com/research/pubs/view.aspx?tr_id=655

Computing is Free • Computers cost 1k$ (if you shop right) (yes, there are 1μ$ to 1M$ computers, but..) • So 1 cpu day == 1$ (computers last 3 years) • If you pay the phone bill, Internet bandwidth costs 50 … 500$/Mbps/month (not including routers and management). • So 1GB costs 1$ to send and 1$ to receive • Caveat: All numbers rounded to nearest factor of 3.
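The rounded unit prices on this slide follow from simple arithmetic. A minimal sketch, assuming a 30-day month, decimal gigabytes, and a fully used link (all three are assumptions of this example, not stated on the slide):

```python
# Rough unit costs behind the "Computing is Free" numbers.
# Assumptions (not from the slide): 30-day month, 1 GB = 1e9 bytes,
# and a link that is kept fully busy.

COMPUTER_PRICE_USD = 1_000          # "computers cost 1k$"
LIFETIME_DAYS = 3 * 365             # "computers last 3 years", about 1k days

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumed 30-day month
BYTES_PER_GB = 1e9


def cpu_day_cost() -> float:
    """Dollars per cpu day when the purchase price is spread over the lifetime."""
    return COMPUTER_PRICE_USD / LIFETIME_DAYS


def cost_per_gb(usd_per_mbps_month: float) -> float:
    """Dollars per GB moved over a link rented at the given monthly price per Mbps."""
    gb_per_month = 1e6 / 8 * SECONDS_PER_MONTH / BYTES_PER_GB  # one Mbps for a month
    return usd_per_mbps_month / gb_per_month


if __name__ == "__main__":
    print(f"cpu day: {cpu_day_cost():.2f} $")
    for price in (50, 500):
        print(f"{price} $/Mbps/month -> {cost_per_gb(price):.2f} $/GB")
```

Under these assumptions one Mbps carries about 324 GB per month, so the 50 … 500 $/Mbps/month range works out to roughly 0.15 to 1.5 $/GB, which the slide rounds to 1 $/GB.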

Why is Seti@Home a Good Deal? • Send 300 KB costs 3e-4$ • User computes for ½ day: benefit 5e-1$ • ROI: 1500:1
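The ROI figure can be reproduced from the deck's rounded unit prices (1 $ per GB sent, 1 $ per cpu day). A small sketch; the function name `roi` is illustrative:

```python
# Return on investment for shipping a work unit to a donated cpu.
# Unit prices are the deck's rounded figures: 1 $/GB sent, 1 $/cpu day.

USD_PER_GB = 1.0
USD_PER_CPU_DAY = 1.0


def roi(bytes_sent: float, cpu_days: float) -> float:
    """Value of the donated compute divided by the cost of sending the input."""
    send_cost = bytes_sent / 1e9 * USD_PER_GB
    compute_value = cpu_days * USD_PER_CPU_DAY
    return compute_value / send_cost


if __name__ == "__main__":
    # 300 KB sent, half a cpu day computed by the volunteer.
    print(f"Seti@Home ROI ~ {roi(300e3, 0.5):.0f}:1")
```

Half a cpu day (0.5 $) against 3e-4 $ of bandwidth gives a ratio of about 1,700:1, in line with the slide's rounded 1500:1.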

Seti@Home: The world's most powerful computer • 61 TF is the sum of the top 4 of the Top 500. • 61 TF is 9x the number 2 system. • 61 TF is more than the sum of systems 2..10

Why was Napster a Good Deal? • Send 5 MB costs 5e-3$: ½ a penny per song • Both sender and receiver can afford it. • Same logic powers web sites (Yahoo!...): • 1e-3$/page view advertising revenue • 1e-5$/page view cost of serving web page • 100:1 ROI

The Cost of Computing: Computers are NOT free! • IBM, HP, Dell make billions • Capital Cost of a TPC-C system is mostly storage and storage software (database) • IBM 32 cpu, 512 GB ram, 2,500 disks, 43 TB (680,613 tpmC @ 11.13 $/tpmC, available 11/08/03) http://www.tpc.org/results/individual_results/IBM/IBMp690es_05092003.pdf • A 7.5M$ super-computer • Total Data Center Cost: 40% capital & facilities, 60% staff (includes app development)

Computing Equivalents: 1 $ buys • 1 day of cpu time • 4 GB (fast) ram for a day • 1 GB of network bandwidth • 1 GB of disk storage for 3 years • 10 M database accesses • 10 TB of disk access (sequential) • 10 TB of LAN bandwidth (bulk) • 10 KWhrs == 4 days of computer time Depreciating over 3 years, and there are about 1k days in 3 years.

Some consequences • Beowulf networking is 10,000x cheaper than WAN networking: factors of 10^5 matter. • The cheapest and fastest way to move Terabytes cross country is sneakernet. 24 hours = 4 MB/s; 50$ shipping vs 1,000$ WAN cost. • Sending 10PB CERN data via network is silly: buy disk bricks in Geneva, fill them, ship them. Reference: TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange, Jim Gray; Wyman Chong; Tom Barclay; Alex Szalay; Jan vandenBerg, Microsoft Technical Report, May 2002, MSR-TR-2002-54 http://research.microsoft.com/research/pubs/view.aspx?tr_id=569

How Do You Move A Terabyte?
Context      Speed (Mbps)   Rent ($/month)   $/Mbps   $/TB sent   Time/TB
Home phone   0.04           40               1,000    3,086       6 years
Home DSL     0.6            50               117      360         5 months
T1           1.5            1,200            800      2,469       2 months
T3           43             28,000           651      2,010       2 days
OC3          155            49,000           316      976         14 hours
OC 192       9600           1,920,000        200      617         14 minutes
100 Mbps     100            -                -        -           1 day
Gbps         1000           -                -        -           2.2 hours
Source: TeraScale Sneakernet, Microsoft Research, Jim Gray et al.
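The derived columns of this table ($/Mbps, $/TB sent, Time/TB) can be recomputed from speed and rent. A sketch, assuming 1 TB = 1e12 bytes, a 30-day month, and a link that is kept fully busy (assumptions of this example; the function names are illustrative):

```python
# Recompute the derived columns of the "How Do You Move A Terabyte?" table.
# Assumptions (not from the slide): 1 TB = 1e12 bytes, 30-day month,
# and a link that is kept fully busy.

BITS_PER_TB = 8e12
SECONDS_PER_MONTH = 30 * 24 * 3600


def usd_per_mbps(rent_usd_month: float, speed_mbps: float) -> float:
    """Monthly rent divided by link speed."""
    return rent_usd_month / speed_mbps


def seconds_per_tb(speed_mbps: float) -> float:
    """Seconds needed to push one terabyte through the link."""
    return BITS_PER_TB / (speed_mbps * 1e6)


def usd_per_tb_sent(rent_usd_month: float, speed_mbps: float) -> float:
    """Rent paid for the fraction of a month it takes to send one terabyte."""
    months = seconds_per_tb(speed_mbps) / SECONDS_PER_MONTH
    return rent_usd_month * months


if __name__ == "__main__":
    # T3 row: 43 Mbps at 28,000 $/month.
    print(round(usd_per_mbps(28_000, 43)))       # $/Mbps
    print(round(usd_per_tb_sent(28_000, 43)))    # $/TB sent
    print(seconds_per_tb(43) / 86_400)           # days per TB
```

With these assumptions the T3 and home-phone rows match the table's $/TB values, which suggests the table was built the same way.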

Computational Grid Economics • To the extent that the computational grid is like Seti@Home or ZetaNet or Folding@home or… it is a great thing • To the extent that the computational grid is MPI or data analysis, it fails on economic grounds: move the programs to the data, not the data to the programs. • The Internet is NOT the cpu backplane. • An alternate reality: Nearly free networking • Telcos go bankrupt and price=cost=0 • Taxpayers pay your phone bill so price=0 and telcos get a BIG government subsidy

When to Export a Task
IF instruction density > 100,000 instructions/byte
AND remote computer is free (costs you nothing)
THEN ROI > 0
ELSE ROI < 0
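The rule can be written as a small predicate. This is a sketch: the 100,000 instructions/byte threshold is the slide's figure, while the function name, the parameter names, and the 1e9 instructions/second cpu rate used in the example are assumptions made for illustration.

```python
# The slide's export rule as a predicate.
# The 100,000 instructions/byte threshold is the slide's figure; the
# function and parameter names are illustrative.

BREAK_EVEN_INSTRUCTIONS_PER_BYTE = 100_000


def worth_exporting(instructions: float, bytes_moved: float, remote_cpu_is_free: bool) -> bool:
    """True when shipping the task over the WAN has positive ROI under the slide's rule."""
    if bytes_moved <= 0:
        # Nothing to ship: only the price of the remote cpu matters.
        return remote_cpu_is_free
    density = instructions / bytes_moved
    return remote_cpu_is_free and density > BREAK_EVEN_INSTRUCTIONS_PER_BYTE


if __name__ == "__main__":
    ASSUMED_INSTRUCTIONS_PER_SEC = 1e9  # assumption, not from the slide

    # Seti@Home-like task: half a cpu day on 300 KB of input.
    seti = 0.5 * 86_400 * ASSUMED_INSTRUCTIONS_PER_SEC
    print(worth_exporting(seti, 300e3, remote_cpu_is_free=True))

    # Data-analysis-like task: 1,000 instructions per byte over 1 TB.
    print(worth_exporting(1_000 * 1e12, 1e12, remote_cpu_is_free=True))
```

Under the assumed cpu rate the Seti@Home-style task has a density above 1e8 instructions/byte and passes easily, while a scan-style analysis at 1,000 instructions/byte does not.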

Computing on Demand • Was called outsourcing / service bureaus in my youth. CSC and IBM did it. • It is not a new way of doing things: think payroll. Payroll is the standard outsource. • Now Hotmail, Salesforce.com, Oracle.com,…. • Works for standard apps. • COD works for commoditized services. • Airlines outsource reservations. Banks outsource ATMs. • But Amazon, Amex, Wal-Mart, eTrade, eBay... can’t outsource their core competence.

What do you Outsource? • Disk blocks? Ø • Files? Xdrive • SQL? SkyServer • RPC? TerraServer • Application? AOL, Google, Hotmail, Yahoo!, ….

What’s the right abstraction level for Internet Scale Distributed Computing? • Disk block? No, too low. • File? No, too low. • Database? No, too low. • Application? Yes, of course. • Blast search • Google search • Send/Get eMail • Portals that federate astronomy archives (http://skyQuery.Net/) • Web Services (.NET, EJB, OGSA) give this abstraction level.

Access Grid • Q: What comes after the telephone? • A: eMail? • A: Instant messaging? • Both seem retro: text & emoticons. • Access Grid could revolutionize human communication. • But, it needs a new idea. • Q: What comes after the telephone?

Distributed Computing Economics • Why is Seti@Home a great idea? • Why is Napster a great deal? • Why is the Computational Grid uneconomic? • When does computing on demand work? • What is the “right” level of abstraction? • Is the Access Grid the real killer app? Based on: Distributed Computing Economics, Jim Gray, Microsoft Tech report, March 2003, MSR-TR-2003-24 http://research.microsoft.com/research/pubs/view.aspx?tr_id=655

Two (?) Talks • Distributed Computing Economics • Online Science (what I have been doing).

The World Wide Telescope • I have been looking for a distributed DB for most of my career. • I think I found one! (sort of).

The Evolution of Science • Observational Science • Scientist gathers data by direct observation • Scientist analyzes Information • Analytical Science • Scientist builds analytical model • Makes predictions. • Computational Science • Simulate analytical model • Validate model and make predictions • Science-Informatics: Information Exploration Science • Information captured by instruments, or Information generated by simulator • Processed by software • Placed in a database / files • Scientist analyzes database / files

Computational Science Evolves • Historically, Computational Science = simulation. • New emphasis on informatics: • Capturing, • Organizing, • Summarizing, • Analyzing, • Visualizing • Largely driven by observational science, but also needed by simulations. • Too soon to say if comp-X and X-info will unify or compete. (Images: BaBar, Stanford; P&E Gene Sequencer, from http://www.genome.uci.edu/; Space Telescope)

Comp-Science generating an Information avalanche: comp-chem, comp-physics, comp-bio, comp-astro, comp-linguistics, comp-music, comp-entertainment, comp-warfare • Science-Info dealing with the Information avalanche: bio-info, astro-info, text-info • Both comp-X and X-info are generating Petabytes

Information Avalanche Stories • Turbulence: 100 TB simulation, then mine the Information • BaBar: Grows 1TB/day; 2/3 simulation Information, 1/3 observational Information • CERN: LHC will generate 1GB/s, 10 PB/y • VLBA (NRAO) generates 1GB/s today • NCBI: “only ½ TB” but doubling each year; very rich dataset. • Pixar: 100 TB/Movie

Astro-Info: World Wide Telescope http://www.astro.caltech.edu/nvoconf/ http://www.voforum.org/ • Premise: Most data is (or could be) online • Internet is the world’s best telescope: • It has data on every part of the sky • In every measured spectral band: optical, x-ray, radio.. • As deep as the best instruments (2 years ago). • It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no..). • It’s a smart telescope: links objects and data to literature on them.

Why Astronomy Data? • It has no commercial value • No privacy concerns • Can freely share results with others • Great for experimenting with algorithms • It is real and well documented • High-dimensional data (with confidence intervals) • Spatial data • Temporal data • Many different instruments from many different places and many different times • But, it’s the same universe so comparisons make sense & are interesting. • Federation is a goal • There is a lot of it (petabytes) • Great sandbox for data mining algorithms • Can share cross company • University researchers • Great way to teach both Astronomy and Computational Science (Image panels: ROSAT ~keV, DSS Optical, IRAS 25μm, 2MASS 2μm, GB 6cm, WENSS 92cm, NVSS 20cm, IRAS 100μm)

What X-info Needs from us (cs) (not drawn to scale) • Scientists: Science Data & Questions • Miners: Data Mining Algorithms • Plumbers: Database to store data, execute queries • Tools: Question & Answer, Visualization

Show Maria’s 5-minute PPT: SDSS Image Cutout slide show by Maria A. Nieto-Santisteban of JHU http://www.research.microsoft.com/~Gray/talks/FDIS_ImgCutoutPresentation.ppt

Data Access is hitting a wall: FTP and GREP are not adequate • You can GREP 1 MB in a second • You can GREP 1 GB in a minute • You can GREP 1 TB in 2 days • You can GREP 1 PB in 3 years. • Oh!, and 1PB ~5,000 disks • At some point you need indices to limit search, plus parallel data search and analysis • This is where databases can help • You can FTP 1 MB in 1 sec • You can FTP 1 GB / min (= 1 $/GB) • … 2 days and 1K$ • … 3 years and 1M$
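The scan times above are what a fixed sequential rate gives as data grows by factors of 1,000. A sketch, assuming a 10 MB/s per-disk scan rate (an assumption of this example, picked because it lands within the deck's factor-of-3 rounding) and data spread evenly across disks:

```python
# How long a sequential scan (GREP) takes at a fixed rate, and what
# parallelism buys. The 10 MB/s per-disk scan rate is an assumption of
# this sketch, chosen to land within the deck's factor-of-3 rounding.

SCAN_BYTES_PER_SEC = 10e6   # assumed per-disk scan rate
SECONDS_PER_DAY = 86_400


def scan_seconds(n_bytes: float, disks_in_parallel: int = 1) -> float:
    """Seconds to scan n_bytes when the data is spread evenly over the disks."""
    return n_bytes / (SCAN_BYTES_PER_SEC * disks_in_parallel)


if __name__ == "__main__":
    print(f"1 TB, 1 disk:      {scan_seconds(1e12) / SECONDS_PER_DAY:.1f} days")
    print(f"1 PB, 1 disk:      {scan_seconds(1e15) / SECONDS_PER_DAY / 365:.1f} years")
    print(f"1 PB, 5,000 disks: {scan_seconds(1e15, 5_000) / 3600:.1f} hours")
```

At the assumed rate a single scanner needs about a day per terabyte and about three years per petabyte, while scanning all 5,000 disks in parallel brings the petabyte down to hours. That is the case for parallel search, and indices reduce the bytes that must be scanned at all.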

Next-Generation Data Analysis • Looking for • Needles in haystacks – the Higgs particle • Haystacks: Dark matter, Dark energy • Needles are easier than haystacks • Global statistics have poor scaling • Correlation functions are N^2, likelihood techniques N^3 • As data and processing grow at same rate, we can only keep up with N log N • A way out? • Discard notion of optimal (data is fuzzy, answers are approximate) • Don’t assume infinite computational resources or memory • Requires combination of statistics & computer science
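The scaling claim can be checked numerically: if the data size and the available compute both grow by the same factor, how much longer does each class of algorithm take? A sketch; the helper names and the example sizes are illustrative:

```python
import math

# If data size N and compute power both grow by the same factor,
# running time changes by cost(N * growth) / (cost(N) * growth).


def n_log_n(n: float) -> float:
    return n * math.log2(n)


def n_squared(n: float) -> float:
    return n ** 2


def n_cubed(n: float) -> float:
    return n ** 3


def slowdown(cost, n: float, growth: float) -> float:
    """Factor by which wall-clock time grows when data and compute both grow by `growth`."""
    return cost(n * growth) / (cost(n) * growth)


if __name__ == "__main__":
    n, growth = 1e9, 10.0   # example: a billion objects, 10x more data and cpu
    for name, cost in (("N log N", n_log_n), ("N^2", n_squared), ("N^3", n_cubed)):
        print(f"{name:8s} slows down by {slowdown(cost, n, growth):.2f}x")
```

With 10x more data and 10x more compute, an N log N method takes about 1.1x as long, an N^2 method 10x, and an N^3 method 100x, which is why only N log N keeps up.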

Analysis and Databases • Statistical analysis deals with • Creating uniform samples • Data filtering & censoring bad data • Assembling subsets • Estimating completeness • Counting and building histograms • Generating Monte-Carlo subsets • Likelihood calculations • Hypothesis testing • Traditionally these are performed on files • Most of these tasks are much better done inside a database, close to the data. • Move Mohamed to the mountain, not the mountain to Mohamed.

Goal: Easy Data Publication & Access • Augment FTP with data query: Return intelligent data subsets • Make it easy to • Publish: Record structured data • Find: • Find data anywhere in the network • Get the subset you need • Explore datasets interactively • Realistic goal: • Make it as easy as publishing/reading web sites today.

Data Federations of Web Services • Massive datasets live near their owners: • Near the instrument’s software pipeline • Near the applications • Near data knowledge and curation • Super Computer centers become Super Data Centers • Each Archive publishes a web service • Schema: documents the data • Methods on objects (queries) • Scientists get “personalized” extracts • Uniform access to multiple Archives • A common global schema Federation

Web Services: The Key? • Web SERVER: • Given a url + parameters • Returns a web page (often dynamic) • Web SERVICE: • Given an XML document (soap msg) • Returns an XML document • Tools make this look like an RPC. • F(x,y,z) returns (u, v, w) • Distributed objects for the web. • + naming, discovery, security,.. • Internet-scale distributed computing (Diagram: Your program talks http to a Web Server and gets back a web page; Your program talks soap to a Web Service and gets back an object in xml, which becomes data in your address space)

The Challenge • This has failed several times before: understand why. • Develop • Common data models (schemas), • Common interfaces (class/method) • Build useful prototypes (nodes and portals) • Create a community that uses the prototypes and evolves the prototypes.

Grid and Web Services Synergy • I believe the Grid will be many web services • IETF standards Provide • Naming • Authorization / Security / Privacy • Distributed Objects Discovery, Definition, Invocation, Object Model • Higher level services: workflow, transactions, DB,.. • Synergy: commercial Internet & Grid tools

Some Interesting Things We are Doing in SDSS (what’s new) • SkyServer is “done.” Now it is 99% perspiration to load 25 TB (many times) and manage it. • I’m using it as a research vehicle to explore new DB ideas. • Others are cloning it for other surveys. Some are doing DB2 & Oracle variants.

SkyServer Overview (10 min) • 10 minute SkyServer tour • Pixel space: http://skyserver.sdss.org/en/ • Record space: http://skyserver.sdss.org/en/tools/explore/obj.asp?id=2255030989160697 • Doc space: Ned • Set space: • Web & Query Logs: • Dr1 WebService • You can download (thanks to Cathan Cook) • Data + Database code: • Website: • Data Mining the SDSS SkyServer Database, MSR-TR-2002-01
select top 10 * from weblog..weblog where yy = 2003 and mm = 7 and dd = 25 order by seq desc
select top 10 * from weblog..sqlLog order by theTime desc
http://skyserver.pha.jhu.edu/dr1/en/tools/chart/navi.asp
http://research.microsoft.com/~gray/SDSS/personal_skyserver.htm

Cutout Service (10 min): A typical web service • Show it • Show WSDL • Show fixing a bug • Rush through code. • You can download it. Maria A. Nieto-Santisteban did most of this (Alex and I started it) http://research.microsoft.com/~gray/SDSS/personal_skyserver.htm

SkyQuery: http://skyquery.net/ • Distributed Query tool using a set of web services • Four astronomy archives from Pasadena, Chicago, Baltimore, Cambridge (England). • Feasibility study, built in 6 weeks • Tanu Malik (JHU CS grad student) • Tamas Budavari (JHU astro postdoc) • With help from Szalay, Thakar, Gray • Implemented in C# and .NET • Allows queries like:
SELECT o.objId, o.r, o.type, t.objId
FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
WHERE XMATCH(o,t)<3.5 AND AREA(181.3,-0.76,6.5) AND o.type=3 and (o.I - t.m_j)>2

SkyQuery Structure • Each SkyNode publishes • Schema Web Service • Database Web Service • Portal • Plans Query (2 phase) • Integrates answers • Is itself a web service (Diagram: ImageCutout, SkyQuery Portal, 2MASS, INT, SDSS, FIRST)

SkyQuery and The Grid • This is a DataGrid • It works today • It is challenging for OGSA-DAIS (hello world in OGSI-DAI is complex) • SkyQuery is being used as a vehicle to explore OGSA and DAIS requirements. (Diagram: ImageCutout, SkyQuery Portal, 2MASS, INT, SDSS, FIRST)

MyDB added to SkyQuery • Let users add a personal DB: 1GB for now. • Use it as a workbook. • Online and batch queries. • Moves analysis to the data • Users can cooperate (share MyDB) • Still exploring this (Diagram: ImageCutout, SkyQuery Portal, 2MASS, INT, SDSS, FIRST, MyDB)

Some Database Topics • Sparse tables: column vs row store; tag and index tables; pivot • Maplist (cross apply) • Dealing with bad statistics • Object Relational has arrived.

Column Store Pyramid • Users see fat base tables (universal relation) • Define popular columns index tag table: 10% ~ 100 columns • Make many skinny indices: 1% ~ 10 columns • Query optimizer picks right plan • Automate definition & use • Fast read, slow insert/update • Data warehouse • Note: prior to Yukon, index had 16 column limit. A bane of my existence. (Pyramid diagram levels: BASE, TAG, INDICES; query labels: Obese query, Fat query, Typical, Semi-join, Simple)

Examples
create table base (id bigint, f1 int primary key, f2 int, …, f1000 int)
create index tag on base (id) include (f1, …, f100)
create index skinny on base (f2, …, f17)
(Pyramid diagram levels: BASE, TAG, INDICES; query labels: Obese query, Fat query, Typical, Semi-join, Simple)

A Semi-Join Example
create table fat (a int primary key, b int, c int, fat char(988))
declare @i int, @j int; set @i = 0
again: insert fat values (@i, cast(100*rand() as int), cast(100*rand() as int), ' ')
set @i = @i + 1; if (@i < 1000000) goto again
create index ab on fat(a,b)
create index ac on fat(a,c)
dbcc dropcleanbuffers with no_infomsgs
select count(*) from fat with (index (0)) where c = b
-- Table 'fat'. Scan 3, reads 137,230, CPU : 1.3 s, elapsed 31.1s.
dbcc dropcleanbuffers with no_infomsgs
select count(*) from fat where b=c
-- Table 'fat'. Scan 2, reads: 3,482 CPU 1.1 s, elapsed: 1.4 s.
(Diagram: evaluating b=c against the 1GB base table costs 137 K IO, 31 sec; evaluating b=c against the 8MB ab and 8MB ac indices costs 3.4K IO, 1.4 sec)
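The gap between the two read counts can be estimated from row widths alone. A back-of-envelope sketch, assuming 8 KB pages, 4-byte ints, and no per-row or per-page overhead (all assumptions of this example), and assuming the second plan reads both skinny indexes instead of the base table:

```python
# Back-of-envelope I/O estimate for the semi-join example.
# Assumptions of this sketch: 8 KB pages, 4-byte ints, no per-row or
# per-page overhead. Real tables and indexes carry overhead, so the
# measured read counts are higher than these estimates.

PAGE_BYTES = 8 * 1024
ROWS = 1_000_000
INT_BYTES = 4


def pages(row_bytes: int, rows: int = ROWS) -> int:
    """Pages needed to hold `rows` rows of `row_bytes` each (rounded up)."""
    return -(-rows * row_bytes // PAGE_BYTES)


base_row = 3 * INT_BYTES + 988      # a, b, c plus the char(988) filler
index_row = 2 * INT_BYTES           # (a,b) or (a,c)

base_scan_pages = pages(base_row)           # scan the whole 1 GB table
semi_join_pages = 2 * pages(index_row)      # scan both 8 MB indexes

if __name__ == "__main__":
    print(f"base table scan: ~{base_scan_pages:,} pages")
    print(f"two index scans: ~{semi_join_pages:,} pages")
```

The estimate gives roughly 122K pages for the table scan against roughly 2K pages for the two indexes, the same order of magnitude as the measured 137,230 and 3,482 reads.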

Moving From Rows to Columns: Pivot & UnPivot • What if the table is sparse? • LDAP has 7 mandatory and 1,000 optional attributes • Store row, col, value
create table Features (object varchar, attribute varchar, value varchar, primary key (object, attribute))
select * from (features pivot value on attribute in (year, color)) as T where object = '4PNC450'
Features
object    attribute   value
●●●●
4PNC450   year        2000
4PNC450   color       white
4PNC450   make        Ford
4PNC450   model       Taurus
●●●●
T
Object    year   color
4PNC450   2000   white
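The same row/col/value idea can be tried with Python's built-in sqlite3 module. SQLite has no PIVOT operator, so this sketch swaps in conditional aggregation (MAX over CASE) as a stand-in; it illustrates the result shape, not the SQL Server syntax shown on the slide.

```python
import sqlite3

# Row/col/value storage pivoted into columns with conditional aggregation.
# SQLite has no PIVOT operator, so MAX(CASE ...) stands in for it here.

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Features ("
    " object TEXT, attribute TEXT, value TEXT,"
    " PRIMARY KEY (object, attribute))"
)
conn.executemany(
    "INSERT INTO Features VALUES (?, ?, ?)",
    [
        ("4PNC450", "year", "2000"),
        ("4PNC450", "color", "white"),
        ("4PNC450", "make", "Ford"),
        ("4PNC450", "model", "Taurus"),
    ],
)

# One output row per object; each requested attribute becomes a column.
pivoted = conn.execute(
    "SELECT object,"
    " MAX(CASE WHEN attribute = 'year' THEN value END) AS year,"
    " MAX(CASE WHEN attribute = 'color' THEN value END) AS color"
    " FROM Features WHERE object = ? GROUP BY object",
    ("4PNC450",),
).fetchall()

if __name__ == "__main__":
    print(pivoted)
```

Only the attributes named in the query become columns, so a sparse object with 1,000 optional attributes costs storage only for the attributes it actually has.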

Maplist Meets SQL: cross apply
select p.*, q.* from parent as p cross apply f(p.a, p.b, p.c) as q where p.type = 1
• Your table-valued function F(a,b,c) returns all objects related to a,b,c. • spatial neighbors, • sub-assemblies, • members of a group, • items in a folder,… • Apply this function to each row • Classic drill-down • Use outer apply if f() may be null (Diagram: each row p1, p2, …, pn is paired with f(p1), f(p2), …, f(pn))
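For readers who know maplist better than SQL, the semantics of the two operators can be sketched as Python generators. The function names `cross_apply` and `outer_apply` and the `folders` data are illustrative, not part of any library:

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

P = TypeVar("P")
Q = TypeVar("Q")


def cross_apply(rows: Iterable[P], f: Callable[[P], Iterable[Q]]) -> Iterator[Tuple[P, Q]]:
    """Pair each row with every result of f(row); rows with no results vanish."""
    for p in rows:
        for q in f(p):
            yield p, q


def outer_apply(rows: Iterable[P], f: Callable[[P], Iterable[Q]]) -> Iterator[Tuple[P, Optional[Q]]]:
    """Like cross_apply, but a row with no results is kept once, paired with None."""
    for p in rows:
        results = list(f(p))
        if not results:
            yield p, None
        for q in results:
            yield p, q


# Hypothetical data: items in a folder, one of the slide's examples.
folders = {"docs": ["a.txt", "b.txt"], "empty": []}

if __name__ == "__main__":
    print(list(cross_apply(folders, lambda name: folders[name])))
    print(list(outer_apply(folders, lambda name: folders[name])))
```

The empty folder disappears from the cross apply result and survives, paired with None, in the outer apply result, which mirrors the inner versus outer join distinction.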

When SQL Optimizer Guesses Wrong, Life is DREADFUL • SQL is a non-procedural language. • The compiler/optimizer picks the procedure based on statistics. • If the stats are wrong or missing…. Bad things happen. Queries can run VERY slowly. • Strategy 1: allow users to specify plan. • Strategy 2: make the optimizer smarter (and accept hints from the user.)
