
Organizing the Extremely Large LSST Database for Real-Time Astronomical Processing

ADASS, London, UK, September 23-26, 2007
Jacek Becla (1), Kian-Tat Lim (1), Serge Monkewitz (2), Maria Nieto-Santisteban (3), Ani Thakar (3)
(1) Stanford Linear Accelerator Center, (2) California Institute of Technology, (3) Johns Hopkins University


Presentation Transcript


1. Organizing the Extremely Large LSST Database for Real-Time Astronomical Processing
Jacek Becla (1), Kian-Tat Lim (1), Serge Monkewitz (2), Maria Nieto-Santisteban (3), Ani Thakar (3)
(1) Stanford Linear Accelerator Center, (2) California Institute of Technology, (3) Johns Hopkins University
ADASS, London, UK, September 23-26, 2007

2. Expected LSST Database Volumes
LSST:
• 16×10^12 rows
• 6 petabytes (data only)
• 14 petabytes (data and indexes)
SDSS:
• 68×10^9 rows
• 0.02 petabyte
LSST is roughly 700× larger than SDSS, but arrives much later.

3. Real-Time Alert Generation: <60 sec Goal
• Goal: generate transient alerts in under 60 seconds
  • association budget: <10 sec
• Approach:
  • minimize I/O
  • maximize computational efficiency
[Pipeline diagram: transfer to base → image processing → source detection → association → alert generation]

4. Association Steps
Inputs:
• Difference Source Collection: new detections, on average ~40K, at most ~100K per "visit"
• Object Catalog: historical data, ~50×10^9 objects
• MOPS Catalog: known moving objects with predicted positions
Steps:
• cross-correlate difference sources with the Object Catalog; mark matched sources and update the matched objects (~40K)
• cross-correlate the remaining sources with the MOPS Catalog; mark matched sources
• add new objects for unmatched sources (~1K)
• persist new sources, new objects, and updates
Updates must be available when subsequent observations are processed.
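The per-visit flow above can be sketched in code. The snippet below is purely illustrative: the catalog interfaces (nearest, update, new_object, persist) and the source fields are assumed names, not the LSST pipeline API.

```python
# Illustrative sketch of one visit's association step; interfaces are assumed.
def associate(dia_sources, dia_source_catalog, object_catalog, mops_catalog,
              match_radius_arcsec=1.0):
    """Cross-match one visit's difference sources, then persist the results."""
    unmatched = []
    for src in dia_sources:                                # avg ~40K, max ~100K per visit
        obj = object_catalog.nearest(src.ra, src.decl, match_radius_arcsec)
        if obj is not None:
            src.object_id = obj.object_id                  # mark matched source
            object_catalog.update(obj, src)                # ~40K object updates
            continue
        mover = mops_catalog.nearest(src.ra, src.decl, match_radius_arcsec)
        if mover is not None:
            src.moving_object_id = mover.moving_object_id  # matched a known moving object
            continue
        unmatched.append(src)

    new_objects = [object_catalog.new_object(src) for src in unmatched]  # ~1K per visit

    # Results must be visible before the next visit is processed.
    dia_source_catalog.persist(dia_sources)
    object_catalog.persist(new_objects)
```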

5. I/O: Large Volumes, Sparse Updates
• Many small, sparse updates and writes:
  • update matching objects, ~40K (Object Catalog)
  • add ~1K new objects (Object Catalog)
  • save detected sources (DIASource Catalog)
• Potentially large volume of data accessed:
  • locate, then read objects for a given FOV, 4×10^6 on average (Object Catalog)

6. Minimizing and Sequentializing I/O
• Match only against objects in or near the processed FOV
  • reduces the problem from 100K × 50×10^9 to 100K × 4×10^6
• Fetch only the columns used during the cross-match
  • reduces the data read from 4×10^6 × 1,658 bytes to 4×10^6 × 24 bytes = ~100 MB
• Cluster together data that is accessed together (spatially)
• Maintain a specialized copy of the data used by the cross-match
  • RA, Decl, objectId for objects
• Partition the data, round-robin across several disks
• Serve data from memory during the time-critical period
  • removes the dependency on slow disks
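As a rough illustration of the column-projection and specialized-copy points, the sketch below builds a slim three-column in-memory copy of the objects for one FOV. The fetch_fov_rows helper is hypothetical; the 24 bytes per row and the ~100 MB total follow directly from the chosen dtype.

```python
import numpy as np

# Slim cross-match copy: only objectId, RA, Decl are kept (3 x 8 = 24 bytes/row),
# so ~4e6 objects for one FOV occupy roughly 100 MB instead of ~6.6 GB of full rows.
SLIM_DTYPE = np.dtype([("object_id", np.int64),
                       ("ra",        np.float64),
                       ("decl",      np.float64)])

def load_fov_objects(fetch_fov_rows, fov_id):
    """fetch_fov_rows(fov_id) is assumed to yield (object_id, ra, decl) tuples for
    the partitions overlapping the FOV; projection to the three columns happens
    at fetch time, not after reading full rows."""
    return np.array(list(fetch_fov_rows(fov_id)), dtype=SLIM_DTYPE)
```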

7. Minimizing and Sequentializing I/O
• Scan whole catalog (naive)
• Match against processed FOV
• Cluster data spatially
• Fetch used columns only
• Keep specialized copy
• Partition data
• Serve from memory

8. Partitioning Data
• Divide difference sources and objects into declination stripes and physically partition them into chunks
  • optimal number of partitions: O(300,000)
• Stripes are a fraction of a field-of-view high (less than one FOV, to minimize wasted I/O around the circular FOV)
• Chunks are one stripe height wide
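A minimal sketch of mapping a position to its stripe and chunk under this scheme follows; the stripe height is an assumed, illustrative value, not the production figure.

```python
# Illustrative stripe/chunk mapping: stripes are declination bands a fraction of a
# FOV high; chunks are one stripe height wide in RA.
STRIPE_HEIGHT_DEG = 0.35   # assumed value, for illustration only

def stripe_id(decl_deg):
    """Stripe index, counted up from the south celestial pole."""
    return int((decl_deg + 90.0) // STRIPE_HEIGHT_DEG)

def chunk_id(ra_deg, decl_deg):
    """(stripe, chunk) pair identifying the physical partition holding a position."""
    return stripe_id(decl_deg), int(ra_deg // STRIPE_HEIGHT_DEG)
```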

9. Serving Data From Memory
• Pre-fetch variable objects into memory before processing, once the FOV is known
• During the time-critical period:
  • look up newly variable objects on disk (<1,000)
  • serve all other reads and writes from RAM (98+% of the I/O)
• Flush updates to disk after processing
• Regain fault tolerance by mirroring
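A sketch of the pre-fetch / serve-from-RAM / flush cycle is below; the class and its helpers (fetch_chunk, write_rows) are hypothetical and only show where disk I/O happens relative to the time-critical window.

```python
# Hypothetical in-memory object store illustrating the pre-fetch / flush cycle.
class InMemoryObjectStore:
    def __init__(self, fetch_chunk):
        self._fetch_chunk = fetch_chunk   # reads one partition of the slim copy from disk
        self._objects = {}                # object_id -> slim record
        self._pending = {}                # updates and new objects made during the visit

    def prefetch(self, chunk_ids):
        """Called once the FOV is known, before the time-critical period starts."""
        for chunk in chunk_ids:
            self._objects.update(self._fetch_chunk(chunk))

    def get(self, object_id):
        return self._objects.get(object_id)    # served from RAM during processing

    def put(self, object_id, record):
        self._objects[object_id] = record
        self._pending[object_id] = record       # not written to disk yet

    def flush(self, write_rows):
        """After processing: push the accumulated writes to disk. Fault tolerance
        during the window would come from mirroring the RAM copy."""
        write_rows(list(self._pending.values()))
        self._pending.clear()
```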

10. Maximizing Computational Efficiency: Algorithm
• Generate declination zones dynamically, in memory
• Zones are several search radii high
  • more than one radius, to minimize index overhead
• Sort data within zones by RA on the fly
• Match difference sources and objects within zones
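The zone match can be sketched as follows. The zone height and match radius are illustrative constants, and the test is a simple RA/Decl box (no RA wrap-around or spherical distance), so this is a sketch of the idea rather than the production algorithm.

```python
import bisect
from collections import defaultdict

ZONE_HEIGHT_DEG = 0.002    # several times the match radius (illustrative)
MATCH_RADIUS_DEG = 0.0003  # roughly 1 arcsec (illustrative)

def build_zones(objects):
    """objects: iterable of (object_id, ra, decl) in degrees.
    Returns zone index -> list of (ra, decl, object_id), sorted by RA."""
    zones = defaultdict(list)
    for oid, ra, decl in objects:
        zones[int(decl // ZONE_HEIGHT_DEG)].append((ra, decl, oid))
    for entries in zones.values():
        entries.sort()                    # sort within each zone by RA on the fly
    return zones

def match_source(ra, decl, zones, radius=MATCH_RADIUS_DEG):
    """Object ids falling within a small RA/Decl box around a difference source."""
    hits = []
    for z in range(int((decl - radius) // ZONE_HEIGHT_DEG),
                   int((decl + radius) // ZONE_HEIGHT_DEG) + 1):
        entries = zones.get(z, [])
        lo = bisect.bisect_left(entries, (ra - radius,))
        hi = bisect.bisect_right(entries, (ra + radius,))
        hits.extend(oid for era, edecl, oid in entries[lo:hi]
                    if abs(edecl - decl) <= radius)
    return hits
```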

11. Maximizing Computational Efficiency: Implementation
• Push computation from the database into the application
  • a specialized algorithm is faster than a generic database approach
  • this moves data to the computation, not computation to the data
  • it works because the accessed data volume has already been reduced significantly
  • result: significant speed-up
• Parallelize the computation
  • by zone
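Parallelizing by zone might look like the sketch below, which hands each declination band (its sources plus its zones) to a separate worker; it reuses match_source from the previous sketch and Python's standard process pool purely for illustration.

```python
from concurrent.futures import ProcessPoolExecutor

def match_band(band):
    """band: (sources, zones) for one declination band, where sources are
    (source_id, ra, decl) tuples and zones come from build_zones()."""
    sources, zones = band
    return [(sid, match_source(ra, decl, zones)) for sid, ra, decl in sources]

def parallel_match(bands, workers=4):
    """Workers never share data, so no locking is needed."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return [pair for result in pool.map(match_band, bands) for pair in result]
```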

12. Results
• Test at average real-LSST scale
  • ~40K sources vs ~4 million objects
• Using a single 2002-era server
  • SunFire V240, UltraSPARC IIIi, 1.5 GHz
• Results
  • read and index object data: 9.4 sec (required: <30 sec)
  • read and index source data: 0.91 sec
  • perform cross-match: 0.14 sec
  • (source read plus cross-match required: <10 sec)

13. Summary
• LSST real-time spatial cross-match:
  • large scale
  • stringent timing requirements
  • still easily doable on a single server with appropriate setup and data organization
