
Data Management in a Highly Connected World



  1. Data Management in a Highly Connected World James Hamilton JamesRH@microsoft.com Microsoft SQL Server March 3, 2000

  2. Agenda • Client Tier • Number of devices • Device interconnect fabric • Standard programming infrastructure • Client tier database issues • Resource requirements • Implementation language • Administrative cost implications • Development cost implications • Middle Tier • Server Tier • Summary

  3. How Many Clients? • 1998 US WWW users (IDC) • US: 51M; World wide: 131M • 2001 estimates: • World Wide: 319M users • 515M connected devices • ½ billion connected Clients • Conservative estimate based upon conventional device counts

  4. Other Device Types • TVs, VCRs, stoves, thermostats, microwaves, CD players, computers, garage door openers, lights, sprinklers, appliances, driveway de-icers, security systems, refrigerators, health monitoring, etc. • Sony evangelizing IEEE 1394 Interconnect • http://www.sel.sony.com/semi/iee1394wp.html • Microsoft & consortium evangelizing Universal Plug & Play • www.upnp.org • WAP: Wireless Application Protocol • http://www.wap.net/

  5. Device Interconnect Infrastructure • Power line control • X10: http://www.x10.org • Sunbeam Thalia: • http://www.thaliaproducts.com/

  6. Why Connect These Devices? • TV guide & auto VCR programming • CD label info & song list download • Sharing data & resources • Set clocks (flashing 12:00) • Fire and burglar alarms • Persist thermostat settings • Feedback & data sharing based systems: • Temperature control & power blind interaction • Occupancy directed heating and lighting

  7. Device Communication Implications • The need is there • Infrastructure is going in: • Wireless • Power line communications • Unused twisted pair (phone) bandwidth • Connectable devices & infrastructure arriving & being deployed • On order of billions of client devices

  8. Device Interconnect Example

  9. Device Interconnect Example

  10. Device Interconnect Example

  11. Device Interconnect Example • [Diagram: home network. Den: Windows NT Server, Ethernet hub, 56 kbps line. Ethernet backbone links the den, a 660 gallon marine aquarium (living room), two 130 gallon F/W aquariums, and a deck filtration plant. X10 backbone reaches the home sprinklers and bedroom.]

  12. Improvements For Example • Cooperation of lighting, A/C and power blind systems • Alarms and remote notification for failures in: • Circulation pump • Heating & cooling • Salinity & other water chemistry changes • Filtration system • Feedback directed systems

  13. Palmtop Resource Trends • Palmtops I’ve purchased through the years • All about same cost & physical size • [Chart: palmtop RAM vs. Moore’s Law, 1990-2002: Sharp IQ7000 (0.125 MB), Sharp IQ8300 (0.25 MB), HP 95LX (0.5 MB), HP 100LX (1 MB), HP 200LX (2 MB), Everex A20 (4 MB), Casio E105 (32 MB)]

  14. O/S Memory Requirements • Windows memory requirements over time • [Chart: desktop RAM vs. Moore’s Law, 1985-1999: Windows 1.0 (256 KB), Windows 2.0 (512 KB), Windows 3.0 (2 MB), WFW 3.1 (3 MB), Windows 95 (4 MB), Windows 98 (16 MB), Windows 2000 (64 MB)]

  15. Smartcard Resource Trends • [Chart: smartcard memory size in bits, 1990-2004, growing from roughly 3 Kb (“you are here”) toward 1 Mb and 300 Mb] • Source: Denis Roberson, PIN/Card-Tech/NCR

  16. Devices Smaller Than PDAs • Qualcomm PDQ • 2 MB total memory • Same memory curve as PDAs…just 2 to 5 years behind • Nokia 9000il • 8 MB total memory

  17. Digital Cameras

  18. Resource Trend Implications • Device resources at constant cost are growing at super-Moore rates • Same but 2 to 3 yrs behind desktop system growth • Same is true of each class of devices • Telephones trail PDAs but again grow at the same rate • Memory growth is not the problem • However devices always smaller than desktops • Devices more specialized so resource consumption less … can still run standard vertical app slice

  19. Standard Infrastructure at Client • Clearly specialized user interface S/W needed • But we have the memory resources to support: • Standard communications stack (TCP/IP) • Standard O/S software • Standard data management S/W with query • Transparent replication • Symmetric multi-tiered infrastructure S/W: • Leverage best development environments • No need to rewrite millions of redundant lines of code • More heavily used & tested so fewer bugs • Better productivity in programming to richer platform • A full DBMS at client is both practical & useful

  20. Client-Side Database Issues • “Honey I shrunk the database” (SIGMOD99): • DB Footprint • Implementation Language • Both issues either largely irrelevant or soon to be: • Resource availability trends support standard infrastructure S/W • Dominant costs: admin, operations & user training, and programming • Vertical slice of standard apps rather than full custom infrastructure

  21. DB Implementation Language • Special DB implementation language (Java) argument: • centers on auto-installation of S/W infrastructure • Auto-install is absolutely vital, but independent of implementation language • Auto-install not enough: client should be a cache of recently used S/W and data • Full DBMS at client • Client-side cache of recently accessed data • Optimizer selected access path choice: • driven by accuracy & currency requirements • balanced against connectivity state & communications costs

  22. Admin Costs Still Dominate • 60’s large system mentality still prevails: • Optimizing precious machine resources is false economy • Admin & education costs more important • TCO education from the PC world repeated • Each app requires admin and user training…much cheaper to roll out 1 infrastructure across multiple form factors • Sony PlayStation has 3 MB RAM & flash • Nokia 9000il phone has 8 MB RAM • Trending towards 64 MB palmtop in 2001 • Vertical app slice resource reqmt can be met

  23. Dev Costs Over Memory Costs • Specialty RTOSes have weak dev environments • Quality & quantity of apps driven by: • Dev environment quality • Availability of trained programmers • Requirement for custom client development & configuration greatly reduces deployment speed • Same apps span a wide range of device form factors • Symmetric client/server execution environ. • DB components and data treated uniformly • Both replicated to client as needed

  24. Client Side Summary • On order of billions connected client devices • Most are non-conventional computing devices • All devices include standard DB components • Standard physical & logical device interconnect standards will emerge • DB implementation language irrelevant • Device DB resource consumption much less important than ease of: • Installation • Administration • Programming • Symmetric client/server execution environment

  25. Agenda • Client Tier • Middle Tier • High Availability via redundant data & metadata • Fault Isolation domains • XML • Mid-tier Caching • Server Tier • Summary

  26. High Availability is Tough

  27. Server Availability: Heisenbugs • Industry good at finding functional errors • Multi-user & application interactions hard: • Sequences of statistically unlikely events • Heisenbugs (http://research.microsoft.com/~gray/talks) • Testing for these is exponentially expensive • Server stack is nearing 100 MLOC • Long testing and beta cycles delay software release • System size & complexity growth inevitable • Masking approaches: • Re-try operation (Microsoft Exchange) • Re-run operation against redundant data copy (Tandem) • Fail-fast design is robust but only acceptable with redundant access to redundant copies of data

  28. The Inktomi Lesson • Inktomi web search engine (Brewer, SIGMOD’98) • Quickly evolving software: • Memory leaks, race conditions, etc. considered normal • Don’t attempt to test & beta until quality high • System availability of paramount importance • Individual node availability unimportant • Shared nothing cluster • Exploit ability to fail individual nodes: • Automatic reboots avoid memory leaks • Automatic restart of failed nodes • Fail fast: fail & restart when redundant checks fail • Replace failed hardware weekly (mostly disks) • Dark machine room • No panic midnight calls to admins • Mask failures rather than futilely attempting to avoid them

  29. Apply to High Value TP Data? • Inktomi model: • Scales to 100’s of nodes • S/W evolves quickly • Low testing costs and no-beta requirement • Exploits ability to lose individual node without impacting system availability • Ability to temporarily lose some data w/o significantly impacting query quality • Can’t lose data availability in most TP systems • Redundant data allows node loss w/o losing data availability • Inktomi model with redundant data & metadata is a potential solution

  30. Redundant Data & Metadata • TP point access to data is a nearly solved problem • TP systems scale with user number, people on planet, or business size • All trending at sub-Moore rates • Data analysis systems growing far faster than Moore’s Law: • Greg’s law: 2x every 9 to 12 months (Patterson, SIGMOD’98) • Seriously super-Moore, implying that no single system can scale sufficiently: clusters are the only solution • Storage trending to free with access speed the limiting factor • Detailed data distribution statistics need to be maintained • Improve access speed & availability using redundant data (indexes, materialized views, etc.; sketch below) • Async update for stats, indexes, mat views • Data path choice based upon needed currency & accuracy
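A minimal T-SQL sketch of the redundant-data idea using an indexed (materialized) view, a redundant copy the optimizer can route queries to. The dbo.Sales table and all names are hypothetical, for illustration only; note too that SQL Server maintains indexed views synchronously, whereas the slide proposes asynchronous maintenance.

    -- Hypothetical base table.
    CREATE TABLE dbo.Sales (
        SaleID int          NOT NULL PRIMARY KEY,
        Region nvarchar(30) NOT NULL,
        Amount money        NOT NULL
    )
    GO
    -- The view must be schema-bound before it can be indexed.
    CREATE VIEW dbo.SalesByRegion WITH SCHEMABINDING AS
        SELECT Region,
               COUNT_BIG(*) AS OrderCount,  -- required in indexed views with GROUP BY
               SUM(Amount)  AS TotalAmount
        FROM dbo.Sales
        GROUP BY Region
    GO
    -- The unique clustered index materializes the view: a redundant,
    -- automatically maintained copy usable as an alternate access path.
    CREATE UNIQUE CLUSTERED INDEX IX_SalesByRegion ON dbo.SalesByRegion (Region)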

  31. Affordable Availability • Web-enabled direct access model driving high availability requirements: • Recent high profile failures at eTrade and Charles Schwab • Web model enabling competition in information access • Drives much faster server side software innovation, which negatively impacts quality • “Dark machine room” approach requires auto-admin and data redundancy • Inktomi model (Eric Brewer, SIGMOD’98) • 42% of system failures are admin error (Gray) • Paging an admin at 2am to fix a problem is dangerous

  32. Connection Model/Architecture • Redundant data & metadata • Shared nothing • Single system image • Symmetric server nodes • Any client connects to any server • All nodes SAN-connected • [Diagram: client connecting to a server cloud of server nodes]

  33. Compilation & Execution Model • Query execution on many subthreads synchronized by root thread • [Diagram: client submits a query to a server thread (lex analyze, parse, normalize, optimize, code generate); query execution then fans out across the server cloud]

  34. Node Loss/Rejoin • Execution in progress • Lose node: • Recompile • Re-execute • Rejoin: • Node local recovery • Rejoin cluster • Recover global data at rejoining node • Rejoin cluster • [Diagram: client and server cloud during node loss and rejoin]

  35. Redundant Data Update Model • Updates are standard parallel query plans • Optimizer manages redundant access paths • Query plan responsible for access plan management: • No significant new technology • Similar to materialized view & index updates today • [Diagram: client and server cloud]

  36. Fault Isolation Domains • Trade single-node perf for redundant data checks: • Complex error recovery more likely to be wrong than original forward processing code • Many redundant checks are compiled out of “retail versions” when shipped • Fail fast rather than attempting to repair: • Bring down node for mem-based data structure faults • Don’t patch inconsistent data … copies keep system available • If anything goes wrong “fire” the node and continue: • Attempt node restart • Auto-reinstall O/S, DB and recreate DB partition • Mark node “dead” for later replacement

  37. Data Structure Matters • Most internet content is unstructured text • restricted to simple Boolean search techniques • Docs have structure, just not explicit • Yahoo hand categorizes content • indexing limited & human involvement doesn’t scale well • XML is a good mix of simplicity, flexibility, & potential richness • Structure description language of the internet • DBMSs need to support it as a first class datatype • Too few librarians in world • so all information must be self-describing

  38. Relational to XML • SELECT … FOR XML • FOR XML RAW (return an XML rowset) • FOR XML AUTO (exploit RI, name matching, etc.) • FOR XML EXPLICIT (maximal control) • Annotated Schema • Mapping between XML and relational schema expressed in XML • Templates • Encapsulated parameterized query • XSL/T support • XPATH support • Direct URL access (SQL “owned” virtual root) • Example below
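A minimal FOR XML AUTO sketch against the Northwind sample database; it is the same query shown as a URL on slide 40:

    -- FOR XML AUTO: each result row is returned as an element named
    -- after the source table, with selected columns as attributes.
    SELECT DISTINCT ContactTitle
    FROM Customers
    WHERE ContactTitle LIKE 'Sa%'
    ORDER BY ContactTitle
    FOR XML AUTO
    -- Yields rows of the form: <Customers ContactTitle="Sales Agent"/>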

  39. XML to Relational • XML bulk load • Templates and Annotated Schema • SQL Server hosted XML tree • Directly insert document into SQL Server hosted XML tree • Select from server hosted XML tree rowset & insert into SQL tables (sketch below) • XML data type support • Hierarchical full text search
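A hedged sketch of the server-hosted XML tree path using SQL Server 2000’s sp_xml_preparedocument and OPENXML; the CustomerTitles target table is hypothetical:

    DECLARE @hdoc int
    -- Parse the document into a SQL Server hosted XML tree.
    EXEC sp_xml_preparedocument @hdoc OUTPUT,
        '<Customers>
           <Customer ContactTitle="Sales Agent"/>
           <Customer ContactTitle="Sales Manager"/>
         </Customers>'
    -- Select from the hosted tree as a rowset & insert into a SQL table.
    INSERT INTO CustomerTitles (ContactTitle)  -- hypothetical table
    SELECT ContactTitle
    FROM OPENXML(@hdoc, '/Customers/Customer', 1)  -- 1 = attribute-centric
         WITH (ContactTitle nvarchar(40))
    -- Release the server-side tree.
    EXEC sp_xml_removedocument @hdoc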

  40. XML: Example • URL query: http://SRV1/nwind?sql=SELECT+DISTINCT+ContactTitle+FROM+Customers+WHERE+ContactTitle+LIKE+'Sa%25'+ORDER+bY+ContactTitle+FOR+XML+AUTO • Result set: <Customers ContactTitle="Sales Agent"/> <Customers ContactTitle="Sales Associate"/> <Customers ContactTitle="Sales Manager"/> <Customers ContactTitle="Sales Representative"/>

  41. Mid-Tier Cache Requirements • Non-proprietary multi-lingual programming • Symmetric mid-tier & server programming model • Non-connected, stateless programming model • High scale thread pool based • Efficient main memory DB support • Full query over local cache • Query over just cached data, or • Query over full corpus (server interaction reqd) • Ability to handle network partitions & server failure • Support for life-time attributed data: • Transactional (possibly multi-server) • Near real time • Every N time units • Read only

  42. Agenda • Client Tier • Middle Tier • Server Tier • Affordable computing by the slice • Everything online • Disks are actually getting slower • Processing moves to storage • Approximate answers quickly • Semi-structured storage support • Administrative issues • Summary

  43. Server-Side Changes • Server databases more functionally rich than often required • Trend reversal: • Less at the server-tier with richer mid-tier • Focus at back-end shifts to: • Reliability, Availability, and Scalability • Reducing administrative costs • Server side trends: • Scalability over single-node performance • Everything online • Affordable availability in high scale systems

  44. Compaq/Microsoft TPC-C Benchmark • Top 5 TPC-C results as of February 17, 2000: • 227,079 tpmC: ProLiant 8500 Cluster, Windows 2000, SQL Server 2000, $4,341,603 ($19.12/tpmC) • 152,207 tpmC: ProLiant 8500 Cluster, Windows 2000, SQL Server 2000, $2,880,431 ($18.93/tpmC) • 135,815 tpmC: IBM RS/6000 S80, AIX 4.3.3, Oracle 8.1.6, $7,156,910 ($52.70/tpmC) • 135,815 tpmC: Escala EPC2400, AIX 4.3.3, Oracle 8.1.6, $7,462,215 ($54.94/tpmC) • 135,461 tpmC: Enterprise 6500, Solaris 2.6, Oracle 8i 8.1.6, $13,153,324 ($97.10/tpmC)

  45. Computing by the Slice • [Chart omitted] • Source: TPC report executive summary

  46. Just Save Everything • Able to store all info produced on earth (Lesk): • Paper sources: less than 160 TB • Cinema: less than 166 TB • Images: 520,000 TB • Broadcasting: 80,000 TB • Sound: 60 TB • Telephony: 4,000,000 TB • These data yield roughly 5,000 petabytes (sum below) • Others estimate upwards of 12,000 petabytes • World wide 1998 storage production: 13,000 petabytes • No need to manage deletion of old data • Most data never accessed by a human • Access aggregations & analysis, not point fetch • More storage than data allows for greater redundancy: • indexes, materialized views, statistics, & other metadata
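A rough check of the 5,000-petabyte figure from the categories above: 160 + 166 + 520,000 + 80,000 + 60 + 4,000,000 TB ≈ 4.6 million TB, i.e. about 4,600 petabytes, which rounds to the quoted 5,000.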

  47. Disks are Becoming Black Holes • Seagate Cheetah 73 • Fast: 10k RPM, 5.6 ms access, 16 MB cache • But very large: 73.4 GB • Result? Black hole: 2.4 accesses/sec/GB (arithmetic below) • Large data caches required • Employ redundant access paths
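The arithmetic behind the black-hole figure: at 5.6 ms per random access the drive sustains roughly 1000 / 5.6 ≈ 179 accesses/sec; spread across 73.4 GB of capacity, that is 179 / 73.4 ≈ 2.4 accesses/sec/GB. Each disk generation holds more data but serves it no faster, so accesses per gigabyte keep falling.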

  48. Processing Moves Towards Storage • Trends: • I/O bus bandwidth is bottleneck • Switched serial nets support very high bandwidth • Processor/memory interface is bottleneck • Growing CPU/DRAM perf gap leading to most CPU cycles spent in stalls • Combine CPU, serial network, memory, & disk in single package • E.g., David Patterson’s ISTORE project

  49. Processing Moves Towards Storage • Each disk forms part of multi-thousand node cluster • Redundant data masks failure • RAID-like approach • Each cyberbrick commodity H/W and S/W • O/S, database, and other server software • Each “slice” plugged in & personality set (e.g., database or SAP app server) • No other configuration required • On failure of S/W or H/W, redundant nodes pick up workload • Replace failed components at leisure • Predictive failure models

  50. Approximate Answers Quickly • DB systems focus on absolute correctness • As size grows, a correct answer is increasingly expensive • Text search systems depend upon quick approximate answers • Approximate answer with statistical confidence bound: • Steadily improve result until user satisfied • “Ripple Joins for Online Aggregation” (Hellerstein, SIGMOD’99) • Allows rapid exploration of large search spaces: • Conventional full accuracy only when needed • Run query on incomplete mid-tier cache? • Sketch below
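Not ripple joins themselves, but a minimal T-SQL sketch of the underlying sample-and-scale idea; the Orders table and Amount column are hypothetical:

    -- Approximate SUM(Amount) from a random sample instead of a full scan.
    DECLARE @rows bigint, @sample_avg float
    SELECT @rows = COUNT(*) FROM Orders
    -- Draw a pseudo-random sample (illustrative only: ORDER BY NEWID()
    -- scans the table; a real system would sample pages physically).
    SELECT @sample_avg = AVG(CAST(Amount AS float))
    FROM (SELECT TOP 1000 Amount FROM Orders ORDER BY NEWID()) AS s
    -- Scale the sample mean up to an estimate of the true total;
    -- growing the sample steadily tightens the confidence bound.
    SELECT @sample_avg * @rows AS ApproxTotalAmount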
