1 / 36

Tom Barclay Jim Gray, Don Slutz, Greg Smith, many others Microsoft Research

SPIN-2. Tom Barclay Jim Gray, Don Slutz, Greg Smith, many others Microsoft Research. Scaleup - Big Database. Build a 1 TB SQL Server database Data must be 1 TB Unencumbered Interesting to everyone everywhere And not offensive to anyone anywhere Loaded

ami
Download Presentation

Tom Barclay Jim Gray, Don Slutz, Greg Smith, many others Microsoft Research

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SPIN-2 Tom Barclay Jim Gray, Don Slutz, Greg Smith, many others Microsoft Research

  2. Scaleup - Big Database • Build a 1 TB SQL Server database • Data must be • 1 TB • Unencumbered • Interesting to everyone everywhere • And not offensive to anyone anywhere • Loaded • 1.1 M place names from Encarta World Atlas • 1 M Sq Km from USGS (1 meter resolution) • 2 M Sq Km from Russian Space agency (2 m) • Will be on web (world’s largest atlas) • Sell images with commerce server.

  3. What’s a Terabyte? 1 Terabyte 1,000,000,000 business letters 150 miles of book shelf 100,000,000 book pages 15 miles of book shelf 50,000,000 FAX images 7 miles of book shelf 10,000,000 TV pictures (mpeg) 10 days of video 4,000 LandSat images 16 earth images (100m) Library of Congress (in ASCII) is 25 TB 1980: 200 M$ of disc 10,000 discs 5 M$ of tape silo 10,000 tapes 1998: 100 k$ of magnetic disc 60 discs 50 K$ nearline tape 30 tapes Terror Byte !!

  4. Some Other Terror-Byte Databases • TerraServer • Sloan Digital Sky Survey: • 40 TB raw, 2 TB cooked • EOS/DIS (picture of planet each week) • 15 PB by 2007 • Federal Reserve Clearing house: images of checks • 15 PB by 2006 (7 year history) • Nuclear Stockpile Stewardship Program • 10 Exabytes (???!!) Kilo Mega Giga Tera Peta Exa Zetta Yotta

  5. TerraServer is: “A shameless advertisement of WNT and SQL Server Scalability” • An on-line demo and sales tool directed at IT customers and ISVs • A test of the Sphinx VLDB features: • Load performance • Online Backup/Restore • Query Performance • A “cool 90s app” • Image and Text data • Web-lication • Electronic Commerce

  6. Application Requirements • BIG —1 TB of data. • PUBLIC — available on the world wide web. • INTERESTING — to a wide audience • ACCESSIBLE — using standard browsers (IE, Netscape) • REAL — a real application (users can buy imagery) • FREE —cannot require NDA or money to access • FAST — impress customers for BackOffice, StorageWorks • EASY — Inexpensive to develop, deploy, and maintain

  7. Coverage: Range from 70ºN to 70ºS35% U.S., 1% outside U.S. Source Imagery: 3.5 TB 1sq meter/pixel Aerial (USGS - 60,000 46Mb B&W- 151Mb Color IR files) 700 GB 1.56 meter/pixelSatellite (Spin-2 - 2400 300 Mb B&W) Display Imagery: 80 m 225 x 150 pixel images, 1.6 m x 3 sub-sampled views Nav Tools: 1.5 m place names “Click-on” Coverage map Expedia & Virtual Globe map Concept: User navigates an ‘almost seamless’ image of earth 225x150m tile 1.8x1.2km 8m browse 1.8x1.2km 16m thumbnail 1.8x1.2km 32m “city view” Database & App UI

  8. World’s Largest PC! 324 disks (2.4 TB) 8 x 440 mhz Alpha CPU 10 GB RAM

  9. Alpha 8400 (8x440) 10GB Ram Site Configuration StorageTek Enterprise Storage Array 9 HSZ70 Ultra-SCSI Dual redundant Controllers 324 9.1 Seagate Disks 6 DLT7000 Quantum Drives FWD SCSI Compaq 5500 4x200mhz Web Servers Compaq 5500 4x200mhz Web Servers To the Web

  10. Software Terra-Server Web Site Web Client ImageServer Active Server Pages Internet InformationServer 4.0 HTML JavaViewer The Internet broswer MTS Terra-ServerStored Procedures Internet InfoServer 4.0 Internet InformationServer 4.0 Sphinx (SQL Server) MicrosoftSite Server EE Microsoft AutomapActiveX Server Automap Server Image DeliveryApplication SQL Server7 Terra-Server DB Image Provider Site(s)

  11. How We Did It • “Chopped” big images into small “tiles” • Sub-sampled tiles to create zoom levels • Tile sizes map to Lat/Lon system • Unique ID assigned to each Tile location • (Z-transform of lat/long or UTM) • Unique ID clusters adjacent tiles onto the same database & index pages • Wrote Load Management program • Runs image cutting job • Loads meta and image data into SQL • Multiple Loaders can run in parallel • Web Active Server Page controls load process

  12. 1 Degree Latitude 1 Degree Longitude USGS Editing Process 1 8 9 1 “QUAD” DOQ Photo (3.75’ x 3.75’) 1 Quadrangle (7.5’ x 7.5’) 64 1 2 3 4 5 6 DOQTiles 7 8 9 10 11 12 13 14 15 16 17 18 Quad Cut 3x6Jump, Thumb-nails & Browse Images DOQQ Origin Point

  13. Spin-2 Image Editing Process 48 x 96 cells per sq degree Image aligned to left corner of grid system Non-image squares (all white) are discarded Cut Images are extracted SubSample Jump 32m Thumb Tiles are cut 5x5, scrambled output Jpeg 16m 8m Browse

  14. File name (of image) City1 State1 Country Number of Rows Number of Columns Shooting Height Height of Sun Date of survey (mm/dd/yyyy) Time of survey (GMT) (hr:mn:ss) Upper Left Latitude Upper Left Longitude Lower Right Latitude Lower Right Longitude Camera System1 Pixel size1 Copyright1 Spin-2 Meta Data Semi-colon delimited fields, ASCII encoding 1 records per line 1Field is not required, if not present, then a blank field is present

  15. Database Design and Load • Build a 1 TB (2**40B) SQL Server Database • Database includes • Gazetteer data for searching • Image data pyramid and metadata • Load the Database • Chop the big images into tiles • BCP data and metadata in • Allow for restart and undo of loads • Create indexes • Check consistency of the data • Keep it Simple, no Tricks, Test the Scaling

  16. 1:1 1:1 64:1 Jump image 1 pixel = 32x32 m2 Dithered Browse image 1 pixel = 16x16 m2 USGS Tile image DOQ of Washington Monument 1 pixel = 1 sq meter Dithered Thumb image 1 pixel = 8x8 m2 The Image Pyramid • Zooming in on the Washington Monument

  17. Country State Image Data & Meta Data Theme Meta Information TileLog Place PlaceType TileMeta ImgMeta FeatureType Gazetteer Star schema Index on • image, place, type • image, state, type • image, state, country, type • image, place, state, type • image, place, country, type all lookups are fast BrowseImg TileImg Jump Img Thumb Img Lookup by UGrid or ZGrid ID plus resolution Lookups are fast. Indices are in DRAM (auto-magically by SQL) SQL manages all the tiles and indices Images are brought in on demand ‘Logical’ Schema Lat/Long(U/ZGridId)

  18. Gazetteer Design • Classic Snowflake Schema • Top 10 Hint to RE for Cursor Select

  19. Image Data Design • Image pyramid stored in DBMS (250 M recs)

  20. I: H: F: G: TerraServer File Group Design • Make 28 RAID5 sets from 324 disks Each raid set has 11 disks (16 spare drives) • Make 4 595GB NT volumesEach striped over 7 Raid sets on 7 controllers • Create 26 20,000MB files on F:, 27 on G: • DB is File Group of 53 files (1.011TB)

  21. 36 9GB disks 36 9GB disks 10x tapes 36 9GB disks 36 9GB disks 36 9GB disks 36 9GB disks 36 9GB disks 36 9GB disks 36 9GB disks Alpha 84004x400Mhz2 GB HSZ50Controller HSZ50Controller(2) HSZ50Controller HSZ50Controller HSZ50Controller HSZ50Controller HSZ50Controller HSZ0Controller HSZ50Controller HSZ50Controller Physical DB Design • 324 disks ~ 3 TB of disk space • Configured as RAID5 => ~2.4 TB • Configured as 20 NT volumes • Each volume ~ 120 GB • Big files! • SQL data spread across all volumes. • Combines the 20 files. • One BIG table for the tiles • Images stored as blobs (JPEG compressed) • 2 GB RAM holds • all indices and • gazetteer. DEC Alpha Server T2B2 PCI U-SCSI

  22. Other Details • Active Server pages • faster and easier than DB stored procedures. • Commerce Server is interesting • Images the Inventory • no SKU, • millions of them • USGS built their own • they are very smart, but it is easy • masquerade as a credit-card reader. • The earth is a geoid, and • Every Geographer has a coordinate system (or two). • Tapes are still a nightmare. • Everyone is a UI expert.

  23. Physical Database • 53 Files. 20,000MB each • 16,960,000 extents • 135,680,000 pages • Separate tables for DOQ, Spin ‘Themes’ • Each image stored in column of type ‘image’ • All tile images in one (big) table • A number of indexes too

  24. TerraServer Tables • USGS DOQ Data • 48,000 DOQQ images (45-55mb / image) • Creates 864,000 Jump, Thumb, & Browse images (3.5 m rows) • Creates 55.3 m Tile images (110.6 m rows) • SPIN-2 Data • 3200 278 MB images (approximate size) • Creates 620,800 Jump, Thumb, & Browse images (2.5 m rows) • Creates 15.5 m Tile images (31 m rows) • Gazetteer Data • 1.1 m named places (Encarta World Atlas) • 45 m cell names • Total Rows = 193.7 M

  25. The Loading Process • Includes Cutting Images, building BCP files, BCP meta data, BCP image data • First Load 1/97-5/97 for Scalability Day • 190 GB actual image data, 800 GB duplicates • Pre-beta Sphinx • Second Load 12/97-4/98 for Web Server • 750 GB actual image data, all images recut

  26. ESA LoadMgr AlphaServer4100 AlphaServer4100 60 4.3 GB Drives Image Preperation and Load DLTTape “tar” \Drop’N’ LoadMgrDB DoJob Wait 4 Load DLTTape NTBackup ... Cutting Machines LoadMgr 10: ImgCutter 20: Partition 30: ThumbImg40: BrowseImg 45: JumpImg 50: TileImg 55: Meta Data 60: Tile Meta 70: Img Meta 80: Update Place ImgCutter 100mbitEtherSwitch \Drop’N’ \Images TerraServer Enterprise Storage Array STCDLTTape Library AlphaServer8400 108 9.1 GB Drives 108 9.1 GB Drives 108 9.1 GB Drives

  27. “SRC”ThumbImg “SRC”BrowseImg “SRC”TileImg TileMeta ImgMeta ThumbImgId int ImgMetaId intZLatLong intSrcId intImgTypeId intPixWidth int PixHeight intImgData Blob BrowseImgId intImgMetaId intZLatLong intSrcId intImgTypeId intPixWidth int PixHeight intImgData Blob TileImgId intTileMetaId intZLatLong intSrcId intImgTypeId intPixWidth int PixHeight intImgData Blob TileMetaId intImgMetaId intOrigMetaId intSrcId intImgTypeId intXGridId intYGridId intHemisphere smallintContinent smallintxxLat smallintxxLong smallintZLatLong int ImgMetaId intOrigMetaId intSrcId intImgTypeId intXGridId intYGridId intImgDate DateHemisphere smallintContinent smallintxxLat smallintxxLong smallintZLatLong intMetaStr vchar(255) *.IMD & *.JPG Pre-Process DataRead *.IMD filesGenerate IdsGenerate ZLatLongSort by ZLatLong NT Backup Meta & ImageLoad Process Image Meta Tile Meta Load Browse ImgRead Image MetaRead Image DataBCP into ImgTbl Load Thumb ImgRead Image MetaRead Image DataBCP into ImgTbl Load Tile ImgRead Tile MetaRead Tile DataBCP into TileTbl Load Tile MetaRead Image MetaBCP into TileMeta Load Img MetaRead Image MetaBCP into TileMeta

  28. The Load Manager • A Workflow System. Manages Job ‘Steps’. • Built as an SQL Database App. Collects Stats. • Would use Data Transformation Services today

  29. Load Statistics • 601 DOQ Jobs, 818 Spin Jobs • Each job does 3 meta BCP, 4 Image BCP steps • 5676 Image BCP Steps • 106 million total images loaded • 546 GB total. 5.4 KB avg image size • For Tile Images (96% of the database) • avg 68,000 images/step. max 757,000 • avg 33 minutes/step. max 596 • total time 796 hours (33 days)

  30. System Maintenance: Backup &Recovery • Industrial Strength • High Performance • Online Backups • Simple, Error Free Media Handling • Minimal Recovery Time

  31. Project Phases & Characteristics • Load Phase • Ongoing Massive Data Loads • Updates to Fix Errors in Meta-Data • Backups at Key Milestones • Deployed • 7 x 24 • Some Updates to Existing Data • Small Loads as More Data Arrives • Infrequent Large Loads

  32. Deployed6/98... • ISV Supports SQL Server 7.0 High Performance Backup API • ISV Supports Full Range of SQL Server 7.0 Backup/Restore Features Tape Library SQL Server Backup Software Backup API

  33. Backup API Performance

  34. Verifying Backup/Restore • Minimal Risk Restore to a Separate System at DECWest • Early Problems with Unreadable Tapes Test System TerraServer Another Terabyte of Disk!

  35. TerraServer Backup/RestoreFactoids • Backup/Restore Rate • Time Required for Full Database Backup: • Number of DLT Tape Cartridges: 200 GB/Hr (57 MB/sec) 5 Hours 36

  36. Other Details • Active Server pages • faster and easier than DB stored procedures. • Commerce Server is interesting • Images the Inventory • no SKU, • millions of them • USGS built their own • they are very smart, but it is easy • masquerade as a credit-card reader. • The earth is a geoid, and • Every Geographer has a coordinate system (or two). • Tapes are still a nightmare. • Everyone is a UI expert.

More Related