LSST Data Management: Making Peta-scale Data Accessible
Jeff Kantor, LSST Data Management Systems Manager, LSST Corporation

Presentation Transcript


1. LSST Data Management: Making Peta-scale Data Accessible
Jeff Kantor, LSST Data Management Systems Manager, LSST Corporation
Institute for Astronomy, University of Hawaii, Honolulu, Hawaii
June 19, 2008

2. LSST Data Management System
• Mountain Summit/Base Facility (Cerro Pachon and La Serena, Chile): 10x10 Gbps fiber optics, 25 TFLOPS, 150 TB
• Archive Center (NCSA, Champaign, IL): 100 to 250 TFLOPS, 75 PB
• Data Access Centers (U.S. (2) and Chile (1)): 45 TFLOPS, 87 PB
• Long-Haul Communications (Chile to U.S., and within U.S.): 2.5 Gbps average, 10 Gbps peak
• Units: 1 TFLOPS = 10^12 floating point operations/second; 1 PB = 2^50 bytes, or ~10^15 bytes
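To make these figures concrete, here is a minimal Python sanity-check sketch. Only the unit definitions and the 2.5 Gbps average link rate come from the slide; the derived per-day transfer volume is an illustration, not a number from the presentation.

```python
# Sanity-check the slide's units and link sizing.
PB = 2**50                    # bytes (~10**15), per the slide's definition
GBPS = 10**9                  # bits per second

avg_link_bps = 2.5 * GBPS     # Chile-U.S. average bandwidth (from the slide)
seconds_per_day = 86_400

# Volume the long-haul link can move in one day at the average rate
# (a derived figure, not stated on the slide).
bytes_per_day = avg_link_bps / 8 * seconds_per_day
print(f"~{bytes_per_day / 1e12:.0f} TB/day sustained")      # ~27 TB/day
print(f"75 PB is ~{75 * PB / 1e15:.0f} x 10^15 bytes")      # ~84
```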

3. LSST Data Products

4. Database Volumes
• Detailed analysis based on existing surveys and SRD requirements
• Expecting:
  • 6 petabytes of data, 14 petabytes of data plus indexes
  • all tables: ~16 trillion rows (16 x 10^12)
  • largest table: 3 trillion rows (3 x 10^12)
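The totals above imply a useful cross-check on average row size. A minimal sketch; the per-row figure is derived here, not stated on the slide:

```python
# Derive the average row size implied by the slide's totals.
PB = 10**15                   # decimal approximation, as on the slide
total_bytes = 14 * PB         # data plus indexes
total_rows = 16 * 10**12      # ~16 trillion rows across all tables

avg_row = total_bytes / total_rows
print(f"~{avg_row:.0f} bytes per row on average")   # ~875 bytes
```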

5. Data Products, Pipelines, Application Framework
The DM reference design uses layers for scalability, reliability, and evolution.
• Application Layer (scientific layer; custom software)
  • Pipelines constructed from reusable, standard "parts", i.e. the Application Framework
  • Data product representations standardized
  • Metadata extendable without schema change
  • Object-oriented, Python and C++
• Middleware Layer (data access, distributed processing, user interface; open source, off-the-shelf software, custom integration)
  • Portability to clusters, grid, and other platforms
  • Provides standard services so applications behave consistently (e.g. recording provenance)
  • Kept "thin" for performance and scalability
• Infrastructure Layer (computing, communications, storage, physical plant; system administration, operations, security; off-the-shelf commercial hardware and software, custom integration)
  • Distributed platform
  • Different parts specialized for real-time alerting vs. peta-scale data access
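The sketch below illustrates the layering idea in Python: the application-layer stage contains only scientific code, while a middleware service (provenance recording here) is injected so every stage behaves consistently. All class and method names are hypothetical, not the actual LSST API.

```python
class ProvenanceRecorder:
    """Middleware-layer service: records what ran and on which inputs."""
    def record(self, stage_name: str, inputs: dict) -> None:
        print(f"provenance: {stage_name} consumed {sorted(inputs)}")


class PipelineStage:
    """Application-framework base class: one standard lifecycle for all stages."""
    def __init__(self, provenance: ProvenanceRecorder):
        self.provenance = provenance

    def run(self, inputs: dict) -> dict:
        # Middleware behavior happens uniformly, outside the science code.
        self.provenance.record(type(self).__name__, inputs)
        return self.process(inputs)

    def process(self, inputs: dict) -> dict:
        raise NotImplementedError


class SourceDetectionStage(PipelineStage):
    """Application layer: only the scientific algorithm lives here."""
    def process(self, inputs: dict) -> dict:
        # ... source detection on inputs["exposure"] would go here ...
        return {"sources": []}


stage = SourceDetectionStage(ProvenanceRecorder())
stage.run({"exposure": "raw-visit-0001"})
```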

6. LSST DM Middleware makes it easy to answer these questions
• There are 75 PB of data; how do I get the data I need as fast as I need it?
• I want to run an analysis code on MY [laptop, workstation, cluster, Grid]; how do I do that?
• I want to run an analysis code on YOUR [laptop, workstation, cluster, Grid]; how do I do that? (See the portability sketch after this list.)
• My multi-core nodes are only getting 10% performance and I don't know how to code for GPUs; how can I get better performance in my pipeline?
• I want to reuse LSST pipeline software and add some of my own; how can I do that?
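One way to read the "MY/YOUR laptop, cluster, Grid" questions is that the pipeline is described once and a platform profile decides how it is launched. This is an illustrative sketch only, not the actual LSST middleware API; every name in it is hypothetical.

```python
import subprocess

# Hypothetical platform profiles: same pipeline, different launchers.
PLATFORMS = {
    "laptop":  ["python3", "pipeline.py"],    # run locally, in one process
    "cluster": ["qsub", "pipeline.pbs"],      # submit to a batch scheduler
}

def launch(platform: str, dry_run: bool = True) -> None:
    cmd = PLATFORMS[platform]
    if dry_run:
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)

launch("laptop")    # same pipeline description, different execution backend
```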

7. Facilities and Data Flows
[Data-flow diagram] Mountain Site: the LSST camera (instrument subsystem) and the LSST OCS (observatory control system) feed the data management subsystem through the data acquisition interface; crosstalk-corrected and raw data with metadata flow to the Base Facility. Base Facility: a pipeline server with high-speed storage performs DQA using sky template and catalog data; raw data, metadata, and alerts flow to the Archive Center. Archive Center: pipeline servers and high-speed storage hold raw data, metadata, sky templates, and catalog data, and generate data products. Data products are replicated to the tiered Data Centers (Tier 1 through Tier 4) and served to end users through VO-compliant data access servers.

8. Computing needs show moderate growth
[Chart: projected computing capacity for the Archive Center, Base facility, and Data Access Centers, with an Archive Center trend line]

9. Long-haul communications are feasible (Cerro Pachon to La Serena to U.S.)
• Over 2 terabytes/second of dark fiber capacity available
• The only new fiber required is Cerro Pachon to La Serena (~100 km)
• 2.4 gigabits/second needed from La Serena to Champaign, IL
• Quotes from carriers include 10 gigabits/second burst for failure recovery
• Specified availability is 98%
• Clear channel, protected circuits
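A quick derived check on these rates (the TB/day and catch-up factor below are computed here, not stated on the slide):

```python
# What the quoted La Serena -> Champaign rates sustain.
sustained = 2.4e9      # bits/second, required rate (from the slide)
burst = 10e9           # bits/second, failure-recovery burst (from the slide)
day = 86_400           # seconds, upper bound on transfer time per day

tb_per_day = sustained / 8 * day / 1e12
print(f"~{tb_per_day:.0f} TB/day at the sustained rate")    # ~26 TB
print(f"burst catch-up factor: {burst / sustained:.1f}x")   # ~4.2x
```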

10. LSST Timeline (FY-07 through FY-17)
[Gantt chart] NSF track: D&D funding; MREFC proposal submission; CoDR, PDR, CDR; NSB and MREFC readiness; NSF MREFC funding; NSF + privately supported construction (8.5 years); commissioning; operations. DOE track: R&D funding; CD-0 (Q1-06), CD-1, CD-2, CD-3, CD-4; MIE funding; sensor procurement starts; camera fabrication (5 years); camera I&C; camera delivered to Chile; camera ready to install; DOE ops funding. Key milestones: telescope first light, system first light, ORR.

11. Validating the design - Data Challenges

12. Validating the design - Data Challenge work products to date

13. Data Challenges 1 & 2 were very successful

14. LSST Data Management Resources
• Base-year (2006) cost for developing the LSST DM system and reducing/releasing data:
  • $5.5M R&D
  • $106M MREFC
  • $17M/yr operations
  • Covers software, support, mountain, base, archive center, and science centers
• Includes data access user resources:
  • Two DACs at U.S. locations
  • One EPO DAC at another U.S. location (added recently)
  • One DAC in Chile
• Total scientific data access user resources available across DACs:
  • 16 Gbps network bandwidth
  • 12 petabytes of end-user storage
  • 25 TFLOPS computing

  15. Philosophy & Terminology • Access to LSST data should be completely open to anyone, anywhere • All data in the LSST public archive should be accessible to everyone worldwide; we should not restrict any of this data to “special” users • Library analogy: anyone can check out any book • Access to LSST data processing resources must be managed • Computers, bandwidth, and storage cost real money to purchase and to operate; we cannot size the system to allow everyone unlimited computing resources • Library analogy: we limit how many books various people can check out at one time so as equitably to share resources • Throughout the following, “access” will mean access to resources, not permission to view the data Institute for Astronomy University of Hawaii Honolulu, Hawaii

16. Data Access Policy Considerations
• The vast quantity of LSST data makes it necessary to use computing located at a copy of the archive
  • Compute power to access and work with the data is a limited resource
• LSSTC must equitably and efficiently manage the allocation of finite resources
  • Declaring "open season" on the data would lead to inefficient use
  • Granting different levels of access for different uses will increase the scientific return
• The data have value
  • Building and operating the system will require significant expenditures
  • Setting a value on the data products is an important ingredient of any cost-sharing negotiation

17. Service Levels
Current LSST plans are for resources to be apportioned across four service levels:
• All users will automatically be granted access at the lowest level
• Access to higher levels will be granted on merit through a proposal process under observatory management
  • The review process includes the scientific collaborations and other astronomy and physics community representatives
• Higher levels are targeted at different uses
Foreign investigators will be granted resources beyond the base level in proportion to their country's or institution's participation in cost sharing. Additional access to resources may similarly be obtained by any individual or group.

18. Service Levels defined in the MREFC Proposal
• Level 4 (typical/general users, no special access required): 6 Gbps bandwidth, 1 PB data storage, 1 TFLOPS total
• Level 3 (power-user individuals, requires approval): 2 Gbps bandwidth, 100 TB storage, 1 TFLOPS at each DAC
• Level 2 (power-user institutions, requires approval): 2 Gbps bandwidth, 900 TB storage (100 TB/yr), 5 TFLOPS at each DAC (1 TFLOPS/yr for 5 years)
• Level 1 (most demanding applications, requires approval): 6 Gbps bandwidth, 10 PB storage (1 PB/yr), 25 TFLOPS (5 TFLOPS/yr for 5 years)
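For reference, the four levels encode naturally as a small lookup table. A minimal sketch: the field names are invented for illustration, but the values are the slide's.

```python
# Service levels from the MREFC proposal slide (field names are ours).
SERVICE_LEVELS = {
    4: {"who": "typical/general users", "approval": False,
        "bandwidth_gbps": 6, "storage": "1 PB", "compute": "1 TFLOPS total"},
    3: {"who": "power-user individuals", "approval": True,
        "bandwidth_gbps": 2, "storage": "100 TB",
        "compute": "1 TFLOPS per DAC"},
    2: {"who": "power-user institutions", "approval": True,
        "bandwidth_gbps": 2, "storage": "900 TB (100 TB/yr)",
        "compute": "5 TFLOPS per DAC (1 TFLOPS/yr for 5 yrs)"},
    1: {"who": "most demanding applications", "approval": True,
        "bandwidth_gbps": 6, "storage": "10 PB (1 PB/yr)",
        "compute": "25 TFLOPS (5 TFLOPS/yr for 5 yrs)"},
}

for level in sorted(SERVICE_LEVELS, reverse=True):
    info = SERVICE_LEVELS[level]
    print(f"Level {level}: {info['who']}, approval required: {info['approval']}")
```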
