
What Happens When Processing, Storage, and Bandwidth Are Free and Infinite?


Presentation Transcript


  1. What Happens When Processing, Storage, and Bandwidth Are Free and Infinite? Jim Gray, Microsoft Research

  2. Outline • Hardware CyberBricks: all nodes are very intelligent • Software CyberBricks: a standard way to interconnect intelligent nodes • What next? • Processing migrates to where the power is • Disk, network, and display controllers have full-blown OSes • Send RPCs (SQL, Java, HTTP, DCOM, CORBA) to them • The computer is a federated distributed system.

  3. A Hypothetical Question: Taking things to the limit • Moore’s law (100x per decade): • Exa-instructions per second in 30 years • Exa-bit memory chips • Exa-byte disks • Gilder’s Law of the Telecosm (3x/year more bandwidth, 60,000x per decade!): • 40 Gbps per fiber today

  4. Grove’s Law • Link Bandwidth doubles every 100 years! • Not much has happened to telephones lately • Still twisted pair

  5. Gilder’s Telecosm Law: 3x bandwidth/year for 25 more years • Today: • 10 Gbps per channel • 4 channels per fiber: 40 Gbps • 32 fibers/bundle = 1.2 Tbps/bundle • In the lab: 3 Tbps/fiber (400 x WDM) • In theory: 25 Tbps per fiber • 1 Tbps = USA 1996 WAN bisection bandwidth

  6. Thesis: Many little beat few big • [Figure: the price/package spectrum from mainframe to pico processor: mainframe $1 million, mini $100 K, micro $10 K; packages shrink from 14" and 9" to 5.25", 3.5", 2.5", 1.8"; capacities run from 1 MB through 100 MB, 10 GB, and 1 TB to 100 TB; latencies run from 10 pico-second RAM through 10 nano-second and 10 microsecond RAM to 10 millisecond disc and 10 second tape archive] • 1 M SPEC marks, 1 TFLOP • 10^6 clocks to bulk RAM • Event-horizon on chip • VM reincarnated • Multi-program cache, on-chip SMP • A “smoking, hairy golf ball” • How to connect the many little parts? • How to program the many little parts? • Fault tolerance?

  7. The Year 2000 commodity PC: the “4B machine” • Billion instructions/sec processor • .1 billion bytes (100 MB) RAM • Billion bits/sec LAN/WAN • 10 billion bytes disk • Billion pixel display (3000 x 3000 x 24) • 1,000 $

  8. 4B PCs: The Bricks of Cyberspace • Cost 1,000 $ • Come with: • OS (NT, POSIX, …) • DBMS • High-speed net • System management • GUI / OOUI • Tools • Compatible with everyone else • CyberBricks

  9. Super Server: 4T Machine • An array of 1,000 4B machines: • 1 Bips processors • 1 billion byte DRAM • 10 billion byte disks • 1 Bbps comm lines • 1 TB tape robot • A few megabucks • [Figure: a rack of CyberBricks; each CyberBrick is a 4B machine] • Challenge: • manageability • programmability • security • availability • scalability • affordability • as easy as a single system • Future servers are CLUSTERS of processors and discs • Distributed database techniques make clusters work

  10. Functionally Specialized Cards • A P mips processor, M MB of DRAM, and an ASIC, specialized for: • storage • network • display • Today: P = 50 mips, M = 2 MB • In a few years: P = 200 mips, M = 64 MB

  11. It’s Already True of Printers: Peripheral = CyberBrick • You buy a printer • You get: • several network interfaces • a PostScript engine • cpu, memory, software, a spooler (soon) • and… a print engine.

  12. System On A Chip • Integrate processing with memory on one chip • the chip is 75% memory now • 1 MB cache >> 1960 supercomputers • a 256 Mb memory chip is 32 MB! • IRAM, CRAM, PIM, … projects abound • Integrate networking with processing on one chip • the system bus is a kind of network • ATM, FibreChannel, Ethernet, … logic on chip • direct I/O (no intermediate bus) • Functionally specialized cards shrink to a chip.

  13. All Device Controllers will be Cray 1’s • TODAY • The disk controller is a 10 mips RISC engine with 2 MB DRAM • The NIC is similar power • SOON • They will become 100 mips systems with 100 MB DRAM • They are nodes in a federation (you can run Oracle on NT in the disk controller) • Advantages: • uniform programming model • great tools • security • economics (CyberBricks) • move computation to data (minimize traffic)

  14. With Tera Byte Interconnect and Super Computer Adapters • Processing is incidental to: • networking • storage • UI • The disk controller/NIC is: • faster than the device • close to the device • able to borrow the device package & power • So use the idle capacity for computation: run the app in the device.

  15. Implications • Conventional: • offload device handling to the NIC/HBA • higher-level protocols: I2O, NASD, VIA… • SMP and cluster parallelism are important • Radical: • move the app to the NIC/device controller • higher-higher level protocols: CORBA / DCOM • cluster parallelism is VERY important

  16. How Do They Talk to Each Other? • Each node has an OS • Each node has local resources: a federation • Each node does not completely trust the others • Nodes use RPC to talk to each other: CORBA? DCOM? IIOP? RMI? One or all of the above • Huge leverage in high-level interfaces • Same old distributed-system story • [Figure: two application stacks talking across the wire(s), each layering applications over RPC, streams, or datagrams, over VIAL/VIPL] • (a minimal RPC sketch follows below)
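
The slide names several RPC flavors; a minimal Java RMI sketch of one federation node invoking another gives the flavor. The DiskNode interface, the registry name, and the query string are hypothetical illustrations, not any product's API:

    import java.rmi.Naming;
    import java.rmi.Remote;
    import java.rmi.RemoteException;

    // The high-level interface a smart disk controller might export (hypothetical).
    interface DiskNode extends Remote {
        byte[] readBlock(long blockNumber) throws RemoteException;
        String query(String sql) throws RemoteException; // ship the question, not the data
    }

    public class FederationClient {
        public static void main(String[] args) throws Exception {
            // Look the remote node up by name and invoke it like a local object.
            DiskNode disk = (DiskNode) Naming.lookup("rmi://disk-controller-7/DiskNode");
            System.out.println(disk.query("SELECT COUNT(*) FROM orders"));
        }
    }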

  17. Objects! • It’s a zoo • ORBs, COM, CORBA, … • Object-Relational Databases • Objects and 3-tier computing

  18. History and Alphabet Soup • [Figure: a 1985–1995 timeline: UNIX International leads to Solaris; the Open Software Foundation (OSF) produces OSF DCE (DCE RPC, GUIDs, IDL, DNS, Kerberos), which feeds NT and COM; X/Open (ODBC, XA / TX) and OSF merge into the Open Group; the Object Management Group (OMG) produces CORBA] • Microsoft DCOM is based on OSF-DCE technology • DCOM and ActiveX extend it

  19. The Promise: Objects are Software CyberBricks • Productivity breakthrough (plug-ins) • Manageability breakthrough (modules) • Microsoft promises Cairo: distributed objects; secure, transparent, fast invocation • IBM/Sun/Oracle/Netscape promise CORBA + OpenDoc + Java Beans + … • All will deliver; customers can pick the best one • Both camps share key goals: • Encapsulation: hide implementation • Polymorphism: generic ops, key to GUI and reuse • Uniform naming • Discovery: finding a service • Fault handling: transactions • Versioning: allow upgrades • Transparency: local/remote • Security: who has authority • Shrink-wrap: minimal inheritance • Automation: easy

  20. The OLE-COM Experience • Macintosh had Publish & Subscribe • PowerPoint needed graphs: plugged MS Graph in as a component • Office adopted OLE: one graph program for all of Office • The Internet arrived: URLs are object references, so Office was Web-enabled right away! • Office97 is smaller than Office95 because of shared components • It works!!

  21. Linking And Embedding: objects are data modules; transactions are execution modules • Link: a pointer to an object somewhere else • Think URL in the Internet • Embed: the bytes are here • Objects may be active; they can call back to subscribers

  22. Objects Meet Databases: the basis for universal data servers, access, & integration • [Figure: a DBMS engine fronting many data sources: database, spreadsheet, photos, mail, map, document] • Object-oriented (COM-oriented) interface to data • Breaks the DBMS into components • Anything can be a data source • Optimization/navigation “on top of” other data sources • Makes an RDBMS an O-R DBMS, assuming the optimizer understands objects

  23. The BIG Picture: components and transactions • Software modules are objects • The Object Request Broker (a.k.a. Transaction Processing Monitor) connects objects (clients to servers) • Standard interfaces allow software plug-ins • A transaction ties execution of a “job” into an atomic unit: all-or-nothing, durable, isolated • ActiveX components are a 250M$/year business.

  24. Object Request Broker (ORB): Orchestrates RPC • Registers servers • Manages pools of servers • Connects clients to servers • Does naming, request-level authorization • Provides transaction coordination • Direct and queued invocation • Old names: • Transaction Processing Monitor • Web server • NetWare • (a hypothetical interface sketch follows below)
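
As a sketch only, the ORB duties listed above can be restated as a Java interface; every name here is invented for illustration and matches no real ORB API:

    // Hypothetical interfaces restating the slide's bullet list as code.
    interface ObjectRequestBroker {
        void registerServer(String service, Server s);    // registers servers, manages pools
        Server connect(String service, String principal); // naming + request-level authorization
        Object invoke(String service, Object request);    // direct invocation via a pooled server
        void enqueue(String service, Object request);     // queued invocation
        Transaction begin();                              // transaction coordination
    }

    interface Server { Object handle(Object request, Transaction tx); }

    interface Transaction { void commit(); void rollback(); }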

  25. The OO Points So Far • Objects are software Cyber Bricks • Object interconnect standards are emerging • Cyber Bricks become Federated Systems. • Next points: • put processing close to data • do parallel processing.

  26. Three Tier Computing • Clients do presentation, gather input • Clients do some workflow (Xscript) • Clients send high-level requests to the ORB • The ORB dispatches workflows and business objects -- proxies for the client that orchestrate flows & queues • Server-side workflow scripts call on distributed business objects to execute the task (sketched below) • [Figure: Presentation → workflow → Business Objects → Database]
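
A small Java sketch of the tiers, with every class and method name invented for illustration: the thin client sends one high-level request, a server-side workflow orchestrates business objects, and only the business objects touch the database:

    // Tier 2: a server-side workflow script, invoked by the ORB on a client's behalf.
    public class PlaceOrderWorkflow {
        private final Inventory inventory; // business object (proxy for the client)
        private final Billing billing;     // business object

        PlaceOrderWorkflow(Inventory inventory, Billing billing) {
            this.inventory = inventory;
            this.billing = billing;
        }

        // One high-level request from the client drives the whole flow.
        public String execute(String customer, String item, int qty) {
            inventory.reserve(item, qty);        // each business object hits tier 3 (the database)
            billing.charge(customer, item, qty);
            return "order accepted";
        }
    }

    interface Inventory { void reserve(String item, int qty); }
    interface Billing { void charge(String customer, String item, int qty); }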

  27. The Three Tiers • Web client: HTML, VB, Java plug-ins, VBScript, JavaScript • Middleware: ORB, TP Monitor, web server…, object server pool, VB or Java script engine, VB or Java virtual machine • Object & data servers: IBM, legacy gateways, LU6.2 • The client talks to the middleware across the Internet via HTTP + DCOM; the middleware talks to the data tier via DCOM (OLE DB, ODBC, …)

  28. Transaction Processing Evolution to Three Tier: intelligence migrated to clients • Mainframe batch processing (centralized): mainframe and cards • Dumb terminals & Remote Job Entry: green-screen 3270s against the server • Intelligent terminals, database backends: TP Monitor • Workflow systems, Object Request Brokers, application generators: active clients and an ORB

  29. Web Evolution to Three Tier: intelligence migrated to clients (like TP) • Character-mode clients, smart servers: green screen, archie, gopher, WAIS • GUI browsers, web file servers: Mosaic, web server • GUI plug-ins, web dispatchers, CGI: NS & IE • Smart clients, web dispatcher (ORB), pools of app servers (ISAPI, Viper), workflow scripts at client & server: Active

  30. PC Evolution to Three Tier: intelligence migrated to server • Stand-alone PC (centralized) • PC + file & print server: a message per I/O (disk I/O request/reply) • PC + database server: a message per SQL statement • PC + app server: a message per transaction • ActiveX client, ORB, ActiveX server, Xscript • (see the JDBC sketch below)
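
The last two steps can be seen in JDBC. This hedged sketch assumes a data source named shop, an accounts table, and a transfer stored procedure, all hypothetical: against a plain database server the client pays one network round trip per SQL statement, while an app-server-style stored procedure does the whole transaction in one round trip:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class MessageGranularity {
        public static void main(String[] args) throws Exception {
            Connection con = DriverManager.getConnection("jdbc:odbc:shop"); // hypothetical DSN

            // PC + database server: one message per SQL statement (two round trips here).
            try (PreparedStatement s = con.prepareStatement(
                    "UPDATE accounts SET balance = balance - ? WHERE id = ?")) {
                s.setInt(1, 100); s.setInt(2, 42); s.executeUpdate(); // round trip 1
            }
            try (PreparedStatement s = con.prepareStatement(
                    "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                s.setInt(1, 100); s.setInt(2, 43); s.executeUpdate(); // round trip 2
            }

            // PC + app server: one message per transaction (a single round trip).
            try (CallableStatement c = con.prepareCall("{call transfer(?, ?, ?)}")) {
                c.setInt(1, 42); c.setInt(2, 43); c.setInt(3, 100);
                c.execute();
            }
        }
    }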

  31. Why Did Everyone Go To Three-Tier? • Manageability: • business rules must be with the data • middleware operations tools • Performance (scalability): • server resources are precious • an ORB dispatches requests to server pools • Technology & physics: • put UI processing near the user • put shared-data processing near the shared data • minimizes data moves • encapsulation / modularity • [Figure: Presentation → workflow → Business Objects → Database]

  32. Why Put Business Objects at the Server? • MOM’s Business Objects: • customer comes to the store with a list • gives the list to a clerk • clerk gets the goods, makes an invoice • customer pays the clerk, gets the goods • easy to manage: the clerk controls access (encapsulation) • DAD’s Raw Data: • customer comes to the store • takes what he wants • fills out an invoice • leaves money for the goods • easy to build: no clerks

  33. The OO Points So Far • Objects are software Cyber Bricks • Object interconnect standards are emerging • Cyber Bricks become Federated Systems. • Put processing close to data • Next point: • do parallel processing.

  34. Parallelism: the OTHER half of Super-Servers • Clusters of machines allow two kinds of parallelism • Many little jobs: Online transaction processing • TPC A, B, C,… • A few big jobs: data search & analysis • TPC D, DSS, OLAP • Both give automatic Parallelism

  35. Why Parallel Access To Data? BANDWIDTH • At 10 MB/s: 1.2 days to scan • 1,000 x parallel: a 100-second SCAN • Parallelism: divide a big problem into many smaller ones to be solved in parallel (worked out below)
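
The slide's arithmetic implies a one-terabyte scan (about 10^5 seconds at 10 MB/s is roughly 1.2 days); a quick check in Java, with the terabyte taken as the assumed input size:

    public class ScanTime {
        public static void main(String[] args) {
            double bytes = 1e12;          // assumed: a 1 TB table
            double rate = 10e6;           // 10 MB/s per disk
            double serial = bytes / rate; // 100,000 seconds
            System.out.printf("serial scan: %.1f days%n", serial / 86400);          // ~1.2 days
            System.out.printf("1,000-way parallel: %.0f seconds%n", serial / 1000); // 100 s
        }
    }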

  36. Kinds of Parallel Execution • Pipeline: one sequential program feeds its output stream straight into the next • Partition: inputs split N ways, outputs merge M ways; each partition runs the same sequential program • (a partition sketch follows below)
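
A minimal Java sketch of partition parallelism under stated assumptions (four workers, a toy sum standing in for the "same sequential program"): each worker runs identical code on its slice, and the outputs merge at the end. Pipeline parallelism would instead chain two such programs through a queue, the second consuming while the first produces.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class PartitionParallelism {
        public static void main(String[] args) throws Exception {
            int n = 4; // inputs split N ways
            ExecutorService pool = Executors.newFixedThreadPool(n);
            List<Future<Long>> parts = new ArrayList<>();
            for (int p = 0; p < n; p++) {
                final int part = p;
                // Each partition runs the SAME sequential program on its slice.
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (long i = part; i < 1_000_000; i += n) sum += i;
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get(); // outputs merge
            System.out.println(total);
            pool.shutdown();
        }
    }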

  37. Why are Relational Operators Successful for Parallelism? • The relational data model: uniform operators on uniform data streams • Closed under composition (sketched below) • Each operator consumes 1 or 2 input streams • Each stream is a uniform collection of data • Sequential data in and out: pure dataflow • Partitioning some operators (e.g. aggregates, non-equi-join, sort, …) requires innovation • AUTOMATIC PARALLELISM
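
Why composition closes can be shown with the classic iterator model; this is a simplified illustration, not any particular engine's internals. A Filter consumes one uniform stream and produces one, so such operators plug together freely:

    import java.util.Iterator;
    import java.util.function.Predicate;

    // Every operator is itself a stream of rows.
    interface Operator<Row> extends Iterator<Row> {}

    // One input stream in, one stream out: closed under composition.
    class Filter<Row> implements Operator<Row> {
        private final Iterator<Row> input;
        private final Predicate<Row> pred;
        private Row pending;

        Filter(Iterator<Row> input, Predicate<Row> pred) {
            this.input = input;
            this.pred = pred;
            advance();
        }
        private void advance() {
            pending = null;
            while (input.hasNext()) {
                Row r = input.next();
                if (pred.test(r)) { pending = r; break; }
            }
        }
        public boolean hasNext() { return pending != null; }
        public Row next() { Row r = pending; advance(); return r; }
    }
    // Composition: new Filter<>(new Filter<>(scan, p1), p2) is again an Operator.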

  38. Database Systems “Hide” Parallelism • Automate system management via tools • data placement • data organization (indexing) • periodic tasks (dump / recover / reorganize) • Automatic fault tolerance • duplex & failover • transactions • Automatic parallelism • among transactions (locking) • within a transaction (parallel execution)

  39. SQL: a Non-Procedural Programming Language • SQL is a functional programming language: it describes the answer set • The optimizer picks the best execution plan: • the dataflow web (pipelines) • the degree of parallelism (partitioning) • other execution parameters (process placement, memory, …) • [Figure: GUI → optimizer (consulting the schema) → plan → execution planning → executors, watched by a monitor; data flows in rivers]

  40. Automatic Data Partitioning • Split a SQL table across a subset of nodes & disks • Partition within the set by: • Range: good for equijoins, range queries, group-by • Hash: good for equijoins • Round robin: good to spread load • Shared-disk and shared-memory systems are less sensitive to partitioning; shared-nothing benefits from “good” partitioning • (the three rules are sketched below)
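
The three rules, sketched as plain Java functions; the node count and key types are invented for illustration:

    public class Partitioner {
        private final int nodes;
        private int cursor = 0; // round-robin state

        Partitioner(int nodes) { this.nodes = nodes; }

        // Range: contiguous key ranges per node; good for equijoins,
        // range queries, and group-by on the partitioning key.
        int range(int key, int minKey, int maxKey) {
            int width = (maxKey - minKey + nodes) / nodes; // ceiling of span/nodes
            return Math.min((key - minKey) / width, nodes - 1);
        }

        // Hash: scatter by a hash of the key; good for equijoins.
        int hash(Object key) { return Math.floorMod(key.hashCode(), nodes); }

        // Round robin: deal rows out in turn; good to spread load.
        int roundRobin() { return Math.floorMod(cursor++, nodes); }
    }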

  41. N x M way Parallelism N inputs, M outputs, no bottlenecks.

  42. Parallel Objects? • How does all this DB parallelism connect to hardware/software CyberBricks? • To scale to large client sets: • need lots of independent parallel execution • comes for free from the ORB • To scale to large data sets: • need intra-program parallelism (like parallel DBs) • requires some invention

  43. Outline • Hardware CyberBricks: all nodes are very intelligent • Software CyberBricks: a standard way to interconnect intelligent nodes • What next? • Processing migrates to where the power is • Disk, network, and display controllers have full-blown OSes • Send RPCs (SQL, Java, HTTP, DCOM, CORBA) to them • The computer is a federated distributed system • Parallel execution is important

  44. MORE SLIDES, but there is only so much time. Too bad.

  45. The Disk Farm On a Card • The 100 GB disc card: an array of discs in a 14" package • Can be used as: • 100 discs • 1 striped disc • 10 fault-tolerant discs • … etc. • LOTS of accesses/second and bandwidth • Life is cheap, it’s the accessories that cost ya • Processors are cheap, it’s the peripherals that cost ya (a 10k$ disc card)

  46. Parallelism: Performance is the Goal • The goal is to get ‘good’ performance: trade time for money • Law 1: a parallel system should be faster than the serial system • Law 2: a parallel system should give near-linear scaleup or near-linear speedup or both • Parallel DBMSs obey these laws (definitions sketched below)
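
The two laws as definitions, in a hedged Java sketch: speedup compares the same job on a bigger system; scaleup compares an N-times-bigger job on an N-times-bigger system (1.0 is linear):

    public class ParallelLaws {
        // Speedup = serial time / parallel time (linear when it equals N).
        static double speedup(double serialSeconds, double parallelSeconds) {
            return serialSeconds / parallelSeconds;
        }
        // Scaleup = small-job time / big-job-on-big-system time (linear at 1.0).
        static double scaleup(double smallJobSeconds, double bigJobSeconds) {
            return smallJobSeconds / bigJobSeconds;
        }
        public static void main(String[] args) {
            System.out.println(speedup(100000, 100)); // 1,000x on 1,000 nodes: linear
            System.out.println(scaleup(100, 100));    // N-times the work in the same time: linear
        }
    }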

  47. Success Stories • Online transaction processing: many little jobs • SQL systems support 50 k tpm-C (44 CPUs, 600 disks, 2 nodes) • Batch (decision support and utility): a few big jobs, parallelism inside • scan data at 100 MB/s • linear scaleup to 1,000 processors

  48. The New Law of Computing • Grosch’s Law: 2x $ is 4x performance (1 MIPS for 1 $, 1,000 MIPS for 32 $: .03 $/MIPS) • Parallel Law: 2x $ is 2x performance (1 MIPS for 1 $, 1,000 MIPS for 1,000 $) • Needs linear speedup and linear scaleup • Not always possible • (the arithmetic is checked below)
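
The slide's numbers check out if Grosch's law is read as performance growing with the square of cost: sqrt(1000) ≈ 32, so 1,000 MIPS costs about 32 $ (.03 $/MIPS), while the parallel law prices performance linearly. A small verification:

    public class CostLaws {
        public static void main(String[] args) {
            double targetMips = 1000;
            double groschCost = Math.sqrt(targetMips); // perf = cost^2  =>  ~32 $
            double parallelCost = targetMips;          // perf = cost    =>  1,000 $
            System.out.printf("Grosch: %.0f $ (%.3f $/MIPS), parallel: %.0f $%n",
                    groschCost, groschCost / targetMips, parallelCost);
        }
    }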

  49. Clusters being built • Teradata: 1,000 nodes (30k$/slice) • Tandem, VMScluster: 150 nodes (100k$/slice) • Intel: 9,000 nodes @ 55M$ (6k$/slice) • Teradata, Tandem, DEC moving to NT + low slice price • IBM: 512-node ASCI @ 100M$ (200k$/slice) • PC clusters (bare-handed) at dozens of nodes: web servers (MSN, PointCast, …), DB servers • THE KEY TECHNOLOGY HERE IS THE APPS • Apps distribute data • Apps distribute execution

  50. BOTH SMP and Cluster? • Grow UP with SMP: 4xP6 is now standard • Grow OUT with clusters: a cluster has inexpensive parts • A cluster of PCs
