
Data Management Infrastructure


Presentation Transcript


  1. Data Management Infrastructure. NOAO Brown Bag, Tucson, AZ, March 18, 2008. Jeff Kantor, LSST Corporation.

  2. The DM reference design uses layers for scalability, reliability, and evolution
  Application Layer (Data Products, Pipelines, Application Framework) • Scientific layer • Pipelines constructed from reusable, standard "parts", i.e. the Application Framework • Data Product representations standardized • Metadata extendable without schema change • Object-oriented custom software in Python and C++
  Middleware Layer (Data Access, Distributed Processing, User Interface) • Portability to clusters, grids, and other platforms • Provides standard services so applications behave consistently (e.g. recording provenance) • Kept "thin" for performance and scalability • Open-source and off-the-shelf software with custom integration
  Infrastructure Layer (Storage, Computing, Communications; System Administration, Operations, Security; Physical Plant) • Distributed platform • Different parts specialized for real-time alerting vs. peta-scale data access • Off-the-shelf commercial hardware and software with custom integration
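To make the Application Layer bullets concrete, here is a minimal sketch of "pipelines built from reusable parts", with metadata carried as an open key/value mapping so new fields need no schema change. The class and method names are illustrative assumptions, not the actual LSST Application Framework API.

```python
# Hypothetical sketch (not the actual LSST Application Framework API):
# a pipeline composed from reusable stages, with metadata as an open mapping.

class Stage:
    """A reusable pipeline 'part'; subclasses implement process()."""
    def process(self, image, metadata):
        raise NotImplementedError

class InstrumentSignatureRemoval(Stage):
    def process(self, image, metadata):
        metadata["isr.version"] = "0.1"       # extendable metadata, no schema change
        return image                          # placeholder for the real correction

class SourceDetection(Stage):
    def process(self, image, metadata):
        metadata["detection.nSources"] = 0    # placeholder result
        return image

class Pipeline:
    """Application-layer pipeline built from standard stages."""
    def __init__(self, stages):
        self.stages = stages

    def run(self, image):
        metadata = {}
        for stage in self.stages:
            image = stage.process(image, metadata)
        return image, metadata

if __name__ == "__main__":
    pipeline = Pipeline([InstrumentSignatureRemoval(), SourceDetection()])
    _, meta = pipeline.run(image=None)
    print(meta)
```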

  3. Data Management is a distributed system that leverages world-class facilities and cyber-infrastructure
  • Archive Center (NCSA, Champaign, IL): 100 to 250 TFLOPS, 75 PB
  • Data Access Centers (U.S. (2) and Chile (1)): 45 TFLOPS, 87 PB
  • Long-Haul Communications (Chile - U.S. and within the U.S.): 2.5 Gbps average, 10 Gbps peak
  • Mountain Summit/Base Facility (Cerro Pachon and La Serena, Chile): 25 TFLOPS, 150 TB
  (1 TFLOPS = 10^12 floating point operations/second; 1 PB = 2^50 bytes, or ~10^15 bytes)

  4. Long-haul communications (Cerro Pachon - La Serena - Champaign) are feasible • Over 2 terabytes/second of dark fiber capacity available • Only new fiber is Cerro Pachon to La Serena (~100 km) • 2.4 gigabits/second needed from La Serena to Champaign, IL • Quotes from carriers include a 10 gigabit/second burst mode for failure recovery • Specified availability is 98% • Clear channel, protected circuits
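As a back-of-envelope check on the 2.4 Gb/s figure, the sketch below spreads the nightly 15 TB of raw data (see the next slide) over 24 hours with the 20% transfer overhead. The assumptions (1 TB = 10^12 bytes, uniform transfer over the day) are mine, not from the slide.

```python
# Rough average rate needed for the nightly raw data (illustrative arithmetic only).
RAW_TB_PER_NIGHT = 15
OVERHEAD = 0.20
SECONDS_PER_DAY = 24 * 3600

bits_per_night = RAW_TB_PER_NIGHT * 1e12 * 8 * (1 + OVERHEAD)
avg_gbps = bits_per_night / SECONDS_PER_DAY / 1e9
print(f"average rate needed: {avg_gbps:.2f} Gb/s")   # ~1.7 Gb/s, under the 2.4 Gb/s provisioned
```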

  5. Baseline raw data rates/volumes
  • Raw image data volume is 15 TB per 24 hours (16-bit) out of the Camera Science Data Subsystem/DAQ
  • Crosstalk-corrected image data is 1.5x raw, transported to the Base for transient alert processing ahead of the raw data, and not archived
  • Raw image data is transported as 16-bit to the Base and on to the Archive, then expanded at the destination to 32-bit for processing
  • Archived image data is stored as 32-bit, compressed
  • Metadata flows from the summit to the Base to the Archive; the volume is small, with a current baseline of << 1.5 TB/night, archived
  • Overhead on data transfer is assumed to be 20%
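A rough per-night volume accounting based on the figures above; assigning the crosstalk-corrected stream to the summit-to-base transfer and using 1 TB = 10^12 bytes are my assumptions.

```python
# Per-night volumes implied by the bullets above (illustrative arithmetic only).
raw_tb = 15.0                             # 16-bit raw, per 24 hours
crosstalk_tb = 1.5 * raw_tb               # crosstalk-corrected, sent to the Base, not archived
overhead = 0.20                           # assumed transfer overhead

summit_to_base_tb = (raw_tb + crosstalk_tb) * (1 + overhead)
working_32bit_tb = 2 * raw_tb             # 16-bit raw expanded to 32-bit for processing

print(f"summit -> base per night: {summit_to_base_tb:.1f} TB")   # 45.0 TB
print(f"32-bit working volume:    {working_32bit_tb:.1f} TB (before compression)")
```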

  6. Performance: the nightly processing timeline for a visit meets the alert latency requirement
  [Timeline figure: a visit of two 15 s exposures, each followed by a 2 s shutter close and 6 s readout, then transfer to the Base, image processing/detection, association, and alert generation. T0 marks the start of the 60-second latency timer; the timeline completes at T0 + 51 s.]
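The snippet below restates the latency budget on this slide. The breakdown of the 51 s into named stages is a hedged reading of the timeline labels (in particular, the stage names attached to the 3 s and 10 s steps are my interpretation, not slide text).

```python
# Hedged reading of the timeline: stage durations after T0 that sum to the quoted 51 s.
stages_s = {
    "shutter close": 2,
    "readout": 6,
    "crosstalk correction (assumed)": 3,
    "transfer to Base": 20,
    "image processing / detection": 10,
    "association / alert generation": 10,
}
total_s = sum(stages_s.values())
print(f"total after T0: {total_s} s (requirement: 60 s, margin: {60 - total_s} s)")
```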

  7. Computing needs show moderate growth
  [Chart: projected computing capacity over time for the Archive Center, the Base, and the Data Access Centers, with an Archive Center trend line.]

  8. Alerts, data rates, and data mining require distributed computing
  [Data-flow diagram spanning the Summit, Base, Archive, and Data Access Centers:]
  • Processing stages: crosstalk correction & science data transfer, instrument signature removal, image subtraction/source detection, image quality assessment, moving object prediction, object association, alert generation and distribution, calibration products, nightly processing (repeat), and data release processing
  • Links: Summit-to-Base link, 40 Gbps (crosstalk-corrected and raw images, metadata); intercontinental link, 2.4 to 10 Gbps (alerts, raw images, metadata); U.S. links, 10 Gbps (alerts, images, catalogs, metadata)
  • Computing: data acquisition computers and observatory control at the Summit; 25 TF Base computing; 100 TF Archive computing; 47 TF Data Access Center computing
  • Storage: 150 TB of catalogs, metadata, and sky templates plus a 150 TB nightly-processing image buffer; a 60 PB image archive and 15 PB of alerts, catalogs, and metadata at the Archive Center; 75 PB of replicated alerts, catalogs, metadata, and images plus 12 PB of user space at the Data Access Centers
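For reference, the compute and link figures in the diagram above can be restated as a small data structure; the grouping and key names below are mine, while the numbers come from the slide.

```python
# Sites, compute, and links from the diagram, restated as data (illustrative only).
COMPUTE_TF = {"Base": 25, "Archive Center": 100, "Data Access Centers": 47}

LINKS = [
    ("Summit",  "Base",    "40 Gb/s",        "crosstalk-corrected + raw images, metadata"),
    ("Base",    "Archive", "2.4-10 Gb/s",    "alerts, raw images, metadata (intercontinental)"),
    ("Archive", "DACs",    "10 Gb/s (U.S.)", "alerts, images, catalogs, metadata"),
]

for site, tf in COMPUTE_TF.items():
    print(f"{site:<20} {tf:>4} TF")
for src, dst, rate, payload in LINKS:
    print(f"{src:>7} -> {dst:<7} {rate:<15} {payload}")
```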

  9. Base Facility Cluster • Cluster backbone (InfiniBand or Ethernet) • 201 compute nodes (1 node per CCD, 16 cores = 1 per amplifier) • 4 file system nodes • 4 network nodes to the Archive • 1 system management node • 2 job control nodes • 7-10 summit-to-base buffer nodes • 32 hot spares
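The node-count arithmetic implied by the list above (taking the 7-10 buffer nodes at their upper bound is my choice):

```python
# Core and node counts implied by the Base cluster list above.
compute_nodes = 201
cores_per_node = 16
print(f"compute cores: {compute_nodes * cores_per_node}")   # 3216, one per amplifier

# file system + network + management + job control + buffer (upper bound) + hot spares
support_nodes = 4 + 4 + 1 + 2 + 10 + 32
print(f"total nodes:   {compute_nodes + support_nodes}")     # 254 with 10 buffer nodes
```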

  10. Compute Node Architecture - Today • 4-slot quad-core • DDR2-5400 RAM, 2 GB per core • 10 GB/s bandwidth per slot • 1 PCIe 8x bus for 10 Gb/s Ethernet • 1 PCIe 8x bus for the HCA • 1U or 2U • No local disk

  11. File System Node Architecture - Today • 4-slot quad-core • DDR2-5400 RAM, 2 GB per core • 10 GB/s bandwidth per slot • 1 PCIe 8x bus for 10 Gb/s Ethernet • Raw image write rate is 400 Mbit/s • 2U
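A quick sanity check (my arithmetic, not slide content): the quoted 400 Mbit/s raw image write rate is a small fraction of the node's single 10 Gb/s Ethernet link.

```python
# Fraction of a 10 Gb/s Ethernet link consumed by the 400 Mb/s raw image writes.
write_mbps = 400
link_mbps = 10_000
print(f"link utilisation from raw image writes: {write_mbps / link_mbps:.0%}")   # 4%
```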

  12. Base Network Node Architecture - Today • 4-slot quad-core • DDR2-5400 RAM, 2 GB per core • 10 GB/s bandwidth per slot • 1 PCIe 8x bus for 10 Gb/s Ethernet • Raw image write rate is 400 Mbit/s • 1 HCA • 2U

  13. Sample Layout for a 10 TF 2U Cluster: an 8-rack layout using 144 2U/4-socket nodes and 12 24-port IBA (InfiniBand) switches

  14. Base Computing Configuration (30 TF)

  15. Computing/disk cost/performance at the start of the construction phase • Continue the usual 2x performance metric: 5 TF @ ~$1M today becomes 20 TF @ ~$1M in 2011 • 1 petabyte of disk storage: ~$1.5M today using 1 TB disks, ~$620K in 2011 using 4 TB disks
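The "2x performance metric" made explicit: interpreting it as price/performance doubling roughly every 18 months over 2008-2011 is my assumption, and it reproduces the quoted jump from 5 TF to 20 TF per $1M.

```python
# Price/performance projection behind the "usual 2x" metric (assumed ~18-month
# doubling period; the 2008 and 2011 endpoints are from the slide).
tf_per_million_2008 = 5
years = 3
doubling_period_years = 1.5
tf_per_million_2011 = tf_per_million_2008 * 2 ** (years / doubling_period_years)
print(f"~{tf_per_million_2011:.0f} TF per $1M in 2011")   # ~20 TF
```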

  16. Database Volumes • Detailed spreadsheet-based analysis done • Expecting: 6 petabytes of data, 14 petabytes of data plus indexes • All tables: ~16 trillion rows (16 x 10^12) • Largest table: 3 trillion rows (3 x 10^12)
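For scale, the implied average row sizes (assuming 1 PB is ~10^15 bytes, as on slide 3; this is my arithmetic, not slide content):

```python
# Average bytes per row implied by the database volume estimates above.
rows = 16e12                  # ~16 trillion rows across all tables
data_pb, total_pb = 6, 14     # data only vs. data + indexes
print(f"data only:      ~{data_pb  * 1e15 / rows:.0f} bytes/row")   # ~375
print(f"data + indexes: ~{total_pb * 1e15 / rows:.0f} bytes/row")   # ~875
```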

  17. Archive Center infrastructure based on anticipated 2014-2023 technology trends
  [Block diagram: data arrive from the Base Facility and flow through an IB switch to the Data Access Center and end-user sites.]
  • Pipeline servers: 442 nodes • Archive operations servers: 64 nodes • Fast storage: 780 TB disk array with 16 file server nodes • Archive storage: disk array with 32 file server nodes • Deep storage: tape library and disk array with 20 file server nodes

  18. Data Access Center infrastructure based on anticipated 2014-2023 technology trends
  [Block diagram: data arrive from the Archive Center and flow through an IB switch to end-user sites.]
  • Archive operations/communications servers: 48 nodes • Archive storage: disk array with 16 file server nodes • Deep storage: tape library and disk array with 4 file server nodes

  19. Emergency Operations: Failure Modes
  • Network failure (Mountain to Base): if end equipment, automated/autonomous fail-over to a standing spare and processing continues; if fiber, buffer data and transport a backup store to the Base, then use spare fiber to catch up after repair
  • Network slow-down (Base to Archive): continue processing at a reduced rate; catch-up processing on re-processing nodes (alerts are sent late from the Archive Center)
  • Network failure (Base to Archive): continue processing at the Base; switch mode to buffering only for alerts, metadata, and raw images; on repair, switch to the unprotected circuit (doubles capacity), transmit buffered data to the Archive Center, and catch up on re-processing nodes
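A hypothetical sketch of the Base-to-Archive degradation policy described in the last two bullets; the function and action names are illustrative, not actual LSST DM middleware.

```python
# Hypothetical sketch of the Base-to-Archive degradation policy (illustrative only).
def base_transfer_policy(link_up, link_degraded):
    if not link_up:
        # Failure: keep processing at the Base, buffer alerts/metadata/raw images,
        # then switch to the unprotected circuit and drain the buffer on repair.
        return ["process_at_base", "buffer_outputs",
                "on_repair: use_unprotected_circuit_and_catch_up"]
    if link_degraded:
        # Slow-down: keep going at a reduced rate; Archive re-processing nodes
        # catch up, and alerts from the Archive Center are sent late.
        return ["process_at_reduced_rate", "archive_catch_up_on_reprocessing_nodes"]
    return ["normal_operations"]

print(base_transfer_policy(link_up=False, link_degraded=False))
```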

  20. Summit Single Point of Failure Modes • Single flash fault: software maps out bad locations and relocates data to spare capacity • Network switch: hot spare line cards and redundant capacity in fibers

  21. Base Single Point of Failure Modes
  • Single image buffer node fault: image data still at the summit, so no data loss
  • Compute node fault: configure a hot spare (delays alerts); no data loss, with copies at the Base and summit
  • File server fault: redundant F/S, MDS, and OSS
  • Long-haul network node fault: data remain at the summit and Base (no data loss); burst capacity to 400% for catch-up transfer
  • Base InfiniBand switch fault: image data still at the summit (no data loss); repair and restart, which can be done in daylight
