1 / 59

Commodity Database Servers

Status report on commodity server performance, why most VLDBs will be multimedia servers, and preview of Microsoft's SQL Server 7.

wilsona
Download Presentation

Commodity Database Servers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Commodity Database Servers Jim Gray Microsoft Research Gray@Microsoft.com http://Research.Microsoft.com/~Gray/talks

  2. Outline • Status report on Commodity Server Performance • Why Most VLDBs will be Multi-Media Servers • Preview of Microsoft’s SQL Server 7

  3. Status Report on Commodity Server Performance • Standards: • TPC, • SpecWeb, ... • Product benchmarks: e.g. • SAP, • PeopleSoft,… • Both indicate that • NT is 18 months behind Unix-SMP performance • but clusters can make up the difference

  4. SMP HP 9000 16 cpu, Sybase 1152.1 ktpmC, 82$/tpmC NEC 8 cpuSQL Server14.9 ktpmC, 60$/tpmC Cluster IBM SP2 12x8 cpuOracle 8.257 ktpmC, 148$/tpmc Predict:large & inexpensive NT cluster number this year. TPC-C

  5. Performance Champions:NCR/Teradata 1 TB:32x4 node clusters 300 GB: 24x4 node cluster 100 GB: 8x4 cluster All use Teradata software on NCR World-Mark Intel-based hardware TPC-D

  6. Outline • Status report on Commodity Server Performance • Why Most VLDBs will be Multi-Media Servers • Preview of Microsoft’s SQL Server 7

  7. California DMV ~ 20 million cars, drivers, doctors, barbers,.. Some drivers have moving violations DMV knows about 1.5 KB about each one 30 GB total. Microsoft: too big says DoJ 40B$ revenue (in company life time) ~1 billion unit sales: @ 100 B = 100 GB ~100 M customers: @1 KB = 100 GB Wall Mart (no one bigger!) Sells 10 B items per year 100 bytes/item => 1 TB ATT 300 M calls per day (peak day) 10 B calls per year 100 b/call = 1 TB VLDB Reality Test

  8. Its HARD to find 1 TB of transaction data 100 M web hits/day 250 B/hit 1TB/year Its HARD to find 1TB of text data 100 M web pages 10 KB/page = 1 TB How do they do it? Lots of indices? No: that is only 3x Precomputed Aggregates? Yes: OLAP benchmark Start at 30 MB Use 2.7 GB or 6GB database But: this is dumb Email? Microsoft: 6 TB Hotmail: 3.5 TB AOL? VLDB Reality Test

  9. Data Tidal Wave • Seagate 47GB drive @ 3k$ • 100 GB penny per MB drive coming in 2000 • 10 $/GB = 10 k$/ Terabyte! (in y2k) • Everyone can afford one • What’s a terror bite? • If you sell ten billion items a year (e.g Wal-Mart) • And you record 100 bytes on each one • Then you got a Terror Bite • Where will the terror bytes come from? • Multimedia (like the TerraServer) and...

  10. Multi Media: Very Large DBs • Photo is 100 KB, not 100 B • So, photo DBs are 1,000x larger • Examples: • Scanned documents • Photo records of products/people/places • Surveillance • Scientific monitoring

  11. Some TerrorByte Databases • EOS/DIS (picture of planet each week) • 15 PB by 2007 • Federal Reserve Clearing house: images of checks • 15 PB by 2006 (7 year history) • Sloan Digital Sky Survey: • 40 TB raw, 2 TB cooked • TerraServer:

  12. Scaleup - Big Database • Build a 1 TB SQL Server database • Show off Windows NT and SQL Server scalability • Stress test the product • Data must be • 1 TB • Unencumbered • Interesting to everyone everywhere • And not offensive to anyone anywhere • Loaded • 1.1 M place names from Encarta World Atlas • 1 M Sq Km from USGS (1 meter resolution) • 2 M Sq Km from Russian Space agency (2 m) • Will be on web (world’s largest atlas) • Sell images with commerce server. • USGS CRDA: 3 TB more coming.

  13. 324 disks (2.9 terabytes) 8 x 440Mhz Alpha CPUs 10 GB DRAM NT EE & SQL 7.0 Photo of the planetUSGS and Russianimages TerraServerWorld’s Largest PC!

  14. Earth is 500 Tera-meters square USA is 10 tm2 100 TM2 land in 70ºN to 70ºS We have pictures of 6% of it 3 tsm from USGS 2 tsm from Russian Space Agency Compress 5:1 (JPEG) to 1.5 TB. Slice into 10 KB chunks Store chunks in DB Navigate with Encarta™ Atlas globe gazetteer StreetsPlus™ in the USA Someday multi-spectral image of everywhere once a day / hour 1.8x1.2 km2 tile 10x15 km2 thumbnail 20x30 km2 browse image 40x60 km2 jump image Background

  15. US Geologic Survey 3 TeraBytes Most data not yet published Based on a CRADA TerraServer makes data available. 1x1 meter 4 TB Continental US New DataComing USGS “DOQ” USGS Digital Ortho Quads (DOQ)

  16. SPIN-2 Russian Space Agency(SovInfomSputnik)SPIN-2 (Aerial Images is Worldwide Distributor) • 1.5 Meter Geo Rectified imagery of (almost) anywhere • Almost equal-area projection • De-classified satellite photos (from 200 KM), • More data coming (1 m) • Want to sell imagery on Internet. • Putting 2 tm2 onto TerraServer.

  17. Microsoft BackOffice SPIN-2 Demo http://www.TerraServer.com

  18. Hardware SPIN-2 1TB Database Server AlphaServer 8400 4x400. 10 GB RAM 324 StorageWorks disks 10 drive tape library (STC Timber Wolf DLT7000 )

  19. Software Terra-Server Web Site Web Client ImageServer Active Server Pages Internet InformationServer 4.0 HTML JavaViewer The Internet broswer MTS Terra-ServerStored Procedures Internet InfoServer 4.0 Internet InformationServer 4.0 Sphinx (SQL Server) MicrosoftSite Server EE Microsoft AutomapActiveX Server Automap Server Image DeliveryApplication SQL Server7 Terra-Server DB Image Provider Site(s)

  20. System Management & Maintenance • Backup and Recovery • STC 9717 Tape robot • Legato NetWorker™ • Sphinx Backup/Restore Utility • Clocked at 80 MBps!! • SQL Server Enterprise Mgr • DBA Maintenance • SQL Performance Monitor

  21. H: G: E: F: TerraServer File Group Layout • Convert 324 disks to 28 RAID5 setsplus 28 spare drives • Make 4 NT volumes (RAID 50)595 GB per volume • Build 30 20GB files on each volume • DB is File Group of 120 files

  22. Gazetteer Design • Classic Snowflake Schema • Fast First hint to Optimizer

  23. Image Data Design • Image pyramid stored in DBMS (250 M recs)

  24. ESA LoadMgr AlphaServer4100 AlphaServer4100 60 4.3 GB Drives Image Delivery and Load DLTTape “tar” \Drop’N’ LoadMgrDB DoJob Wait 4 Load DLTTape NTBackup ... Cutting Machines LoadMgr 10: ImgCutter 20: Partition 30: ThumbImg40: BrowseImg 45: JumpImg 50: TileImg 55: Meta Data 60: Tile Meta 70: Img Meta 80: Update Place ImgCutter 100mbitEtherSwitch \Drop’N’ \Images TerraServer Enterprise Storage Array STCDLTTape Library AlphaServer8400 108 9.1 GB Drives 108 9.1 GB Drives 108 9.1 GB Drives

  25. SQL 7 Testimonial • We started using it March 4 1997 • SQL 7 Pre-Alpha • SQL 7 Alpha • SLQ 7 Beta 1 • SQL 7 Beta • Loaded the DB twice • (we made application mistakes) • Now doing it “right” • Reliability: Great! SQL 7 never lost data • Ease of use: Great! • Functionality: Great!

  26. Outline • Status report on Commodity Server Performance • Why Most VLDBs will be Multi-Media Servers • Preview of Microsoft’s SQL Server 7

  27. SQL 7: Easy & Functional • Dynamic self management • Multi-site management • Alert/response management • Job scheduling and execution • Scriptable management • profiling/tuning tools • Fully Unicode • English Language Query • Integrated text search engine Easy Scalability Data Warehousing

  28. Made It Easier!(fewer knobs) • Desktop & Workgroups • Auto Configure Engine / Dynamic Disk/memory • Reduce Learning Curve, Increase Productivity • Self-Managing SQLAgent, Wizards, “Task Pads” • Large Organizations • Deploy/manage hundreds of SQL Servers • Lower TOC for Large Environments • Multi-Server Operations/ “Lights-out” Environment

  29. Multi-Site Management • Admin servers from one place • Automate simple stuff • Wizards for common stuff • Manage arrays of servers • operations, security,… • Replication • Import/export • Interface is scriptable • COM object model • Script with Java, VB, ... • Scheduling and Multi-step jobs

  30. DBA and Developer Tools • Built-in GUI • data/schema design • data query & edit • intgrated with programming tools • SQL Server Profiler • Selected server events and trace criteria • “Capture” output to screen or replay • SQL Server Expert • Analyzes actual server usage history • Makes recommendations to improve performance • Recommends Index design • Recommends operations procedures

  31. Wizards and GUIs • Wizards galore (over 50 at last count) • MS Access as a query interface • Built-in data access tools (integrated with tools) • Graphical show plan

  32. Many New Wizards... • Web Assistant • Register Servers • Configure Replication • Create Publication • Create Pull Subscription • Create Push Subscription • Replication Partitioning • Create an Index • Create a Stored Procedure • Create a View • More to come... • Create a Database • Scheduled Backup • Create a Maintenance Plan • Create a Scheduled Job • Create an Alert • Security Wizard • Import Data to SQL Server • Export Data From SQL Server • Clustering (Wolfpack) • Index Tuning Wizard

  33. Distributed Management Objects (SQL-DMO) • COM Interfaces for administering SQL Server • Embedded Administration (no UI) • All Administration Functions Supported • Server, Database Configurations, Settings • Object Creation, Security, Replication, Scripting,.. • 40+ Objects, 1000+ properties and methods • Integration Interface for ISV Administration • I.e., Baan using DMO for Scripted App Install • Scripting Via VBA and Jscript + DCOM

  34. SQLAgent Jobs Tasks Alerts Operators FileGroups Files Table Columns View Indexes Stored Procs Keys (PK/FK) Rules Triggers Defaults DMO: Object Model (Overview) SQL Server Databases Users DB Options Transaction Logs Publications Logins Configurations Linked Servers Remote Login

  35. DMO Scripting • Backup a Database Set MyServer = CreateObject("SQLDMO.SQLServer")‘Create Server Object Set MyBackup = CreateObject("SQLDMO.Backup") ‘Create Backup Object MyServer.Name = “MSSALES” ‘ Identify Server MyServer.LoginSecure = True ‘ Windows NT Auth MyServer.Connect ‘ Connect MyBackup.Database = ”SALESII” ‘ Database to backup MyBackup.Files = "\\MyServer\Backups\" _ ‘ Backup Location + MyBackup.Database +”.bak” ‘ Name Backup File MyBackup.SQLBackup MyServer ‘ Back it Up MyServer.Disconnect ‘ We’re Done!

  36. Scalability • Win9x/NTW version • Dynamic row-level locking • Improved query optimizer • Intra-query parallelism • 64-bit support • Replication • Distributed query • High Availability Clusters Easy Scalability Data Warehousing

  37. Scale Down to Windows 95-98 • Full function (same as NTW) • Self managing • Many tools • Integration with Next MS Access • Great for imbedded apps

  38. Distributor ODBC OS 390 DB2 DB2 VSAM CICS Subscriber Subscriber Subscriber Subscriber Publisher Replication 2PC, RPC • Transactional and Merge • Remote update • ODBC and OLE DB subscribers • Wizards • Performance Updating Subscriber (immediate updates)

  39. Global Agg. + Result 50 rows 4 x 50 rows + + + + Local Agg. 50,000 rows Disks Parallel QuerySMP & Disk Parallelism • Plus Distributed • Plus Hash Join (fanciest on the planet) • Plus Optimized Partitioned views • # of emp. per group • total inc. per group

  40. DirectoryService Database (DB2, VSAM, Oracle, …) Spreadsheet Photos Mail Maps Documents and the Web Distributed Heterogeneous QueriesData Fusion / Integration SQL 7.0 Query Processor Join spread sheets, databases, directories, Text DBs etc. Any source that exposes OLE DB interfaces SQL Server as gateway, even on the desktop

  41. Backup Fuzzy Parallel Incremental Restartable Recovery Fast File granularity Reorganize shrinks file reclusters file Auto-repair UtilitiesThe Key to LARGE Databases

  42. Data Warehousing • Warehousing Framework • Visual data modeler • Microsoft repository • Data transformation services (DTS) • Plato & Dcube - Multi Dimensional Data Cubes • English query 2.0 • Built-in text-index engine Easy Scalability Data Warehousing

  43. Key Microsoft Data Warehouse Programs • Data Warehouse Framework (DWF) • Process -- for building, using and managing • Pipeline -- for metadata flow • Protocols -- to integrate components • Data Warehouse Alliance (DWA) • Partners -- ISVs pledged to the framework and its parts • Products -- complete spectrum from Microsoft and third-parties

  44. Microsoft Data Warehousing Framework Building Using Data Warehouse Design (logical/physical schema*/ data flow**) Data Mart Design** (Cubes/Star schema) End-User Tools (Excel**, Access, English Query) Operational Data (OLE-DB **) Data Transformations (DTS**) Data Marts (SQL Server** & OLAP Server**) OLE-DB** Managing Microsoft Repository** (Persistent Shared Meta-Data) DB Schema** Transformation** Scheduling OLAP Data Warehouse Management (Console*, Scheduling**, Events**,Topology*,) **available in SQL Server 7 (* partially) Data Flow Meta-Data Flow

  45. BMC Data Mirror Execusoft Informatica Microsoft Platinum Technology Praxis Prism Sagent SAS Sterling V-Mark Technical and marketing relationship Supports SQL Server storage engine Third-party products tested with BackOffice Alliance for Data Warehousing DW Build DW Access Andyne Business Objects Cognos IQ Software Microsoft NCR Data Mining Pilot Platinum Technology Sagent SAS Seagate Wall Data

  46. DW Alliance Milestones • 9/96 - Launched with 8 founding members • 3/97 - Design review • 1/97 - 6/97 - Expanded to 21 members • 7/97 - Repository design review • Team development of shared metadata • 9/97 - OLE DB for OLAP API specification • 1H’98 - Integration development with Sphinx DTS and Replication APIs

  47. UML UMX DTM GEN DBM CDE SQL COM OCL UML Unified Modeling Language GEN Generic UMX Uml Extensions DBM Database Model SQL Microsoft SQL Server CDE Component Descriptions COM Component Object Model OCL Oracle DTM Data Type Model Microsoft Repository • Based on joint Sterling/Microsoft design (Shipped 97Q2) • Wide distribution:VB, Visual Studio and Third-Parties • Designed with over 60 vendors • Extended to support DB schema, transformations, OLAP • Key element of the DW Framework • UML is abstract model • Everything viewable in UML terms

  48. Repository & Data Warehousing • Common infrastructure -- the meta-data pipeline • Supports interoperability between data warehousing tools and products • Process: • Initial spec developed with 12 vendors • Gathering feedback now • Final spec review in Redmond, 2/98

  49. IUnknown IDTSDataPump Transforms Oracle > SQL Server Data Pump Data Pump Data Transformation • Workflow system manages Data Pump • Pre-defined transforms using the DTS GUI • Procedural VB Script, JavaScript, VBA, any COM • Multi-stream in, Multi-stream out Repository Metadata Transformation Objects ActiveX Scripts • Function Example() Transform() • If DTSSource(“CreditRating”) = “1” then • DTSDestination(” Risk ") = ”Good" • Else If DTSSource(”Credit") = ”2” • DTSDestination(” Risk ") = ”Average” • Else If DTSSource(”Credit") = ”3” • DTSDestination(” Risk ") = ”Bad” • Else • Example = DTS_SkipRow • End if • End Function SQLAgent Multiserver Operations

  50. Transformations • Data quality and validation • Missing values, scrubbing, exception handling • Data integration • Heterogeneous query, join keys, elim. dups • Transforms • Combine/decompose multiple columns to one • Aggregation • Central metadata • Business rules, data lineage

More Related