
Distributed Pipeline Programming for Mosaics



Presentation Transcript


  1. Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

  2. NOAO Mosaic Pipeline

  3. Major Features and Goals
  • Data products for NOAO archive and NVO node
  • Data products for observers
  • Pipeline for NOAO and mosaic community
  • Basic CCD mosaic calibrations
  • Advanced time-domain data products
  • Real-time data quality assessment and monitoring
  • High-performance, data-parallel system
  • LSST testbed
  • Fairly generic pipeline infrastructure (NEWFIRM, …)
  • Automated operation
  • Thorough processing history and data documentation

  4. MARIO Mosaic Automatic Reduction Infrastructure and Operations (i.e. a pipeline)

  5. Key Concepts (Tips’N’Tricks)
  • sub-pipelines: “meta pipeline programming”
  • indirect files
  • load balancing using trigger files
  • stay-alive module
  • parallelization of algorithms over the mosaic
  • shared monitoring
  • network filenames
  • image processing language (CL)

  6. What is a pipeline?
  • a collection of processing modules
  • connected by dependency rules
  • modules may run concurrently on different data objects
  • infrastructure to manage processes
  • infrastructure to manage dependencies
  • infrastructure to monitor processes and processing
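As a minimal sketch of the definition above (not the MARIO or OPUS implementation), modules connected by dependency rules can be expressed as a table of prerequisites that is satisfied recursively; the stage names here are invented for illustration:

```python
# Minimal dependency-driven pipeline sketch: a module runs only after
# all of its prerequisite modules have completed.
modules = {
    "split":   [],            # invented stage names, illustration only
    "ccdproc": ["split"],
    "combine": ["ccdproc"],
}

def run(name, done, actions):
    """Run `name` after recursively satisfying its dependencies."""
    for dep in modules[name]:
        if dep not in done:
            run(dep, done, actions)
    actions.append(name)      # stand-in for real processing
    done.add(name)

done, actions = set(), []
run("combine", done, actions)
print(actions)                # dependency order: split, ccdproc, combine
```

A real pipeline would also dispatch independent modules concurrently across data objects; this sketch only shows the dependency ordering.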

  7. OPUS Operations Pipeline Unified System

  8. OPUS
  • Triggers (dependency rules): file, OSF, time
  • Blackboard
  • Polling
  • Monitors and managers
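OPUS-style file triggers amount to polling a directory until files with a given suffix appear. A hedged sketch of that polling loop (directory layout, suffix, and function name are assumptions, not the OPUS API):

```python
import os
import tempfile
import time

def poll_for_triggers(trigger_dir, suffix=".trig", interval=0.1, max_polls=3):
    """Poll a trigger directory until files with `suffix` appear,
    in the spirit of OPUS file triggers (illustrative only)."""
    for _ in range(max_polls):
        hits = sorted(f for f in os.listdir(trigger_dir) if f.endswith(suffix))
        if hits:
            return hits
        time.sleep(interval)
    return []

# Demo: drop a trigger file into a scratch directory and poll for it.
tmp = tempfile.mkdtemp()
open(os.path.join(tmp, "obj123.trig"), "w").close()
found = poll_for_triggers(tmp)
```

OSF (status-file) and time triggers would follow the same polling pattern against the blackboard rather than a data directory.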

  9. Distributed Pipeline Issues
  • data vs. functional parallelism
  • shared file system vs. local file system
  • heterogeneous vs. homogeneous processors
  • parasitic processing
  • push vs. pull
  • load balancing
  • master-worker vs. peer-to-peer

  10. MARIO Choices
  • data parallelism
  • local file system (with shared blackboard)
  • heterogeneous processors
  • push AND pull
  • load balancing by number of data objects
  • peer-to-peer

  11. MARIO Architecture Concept
  • Multiple CPUs, but no dependency on N
  • Multiple types of sub-pipelines, by function:
  • one for operations over all mosaic elements
  • one for operations on individual elements
  • one for cataloging
  • one for image differencing
  • All types on all CPUs: no master!
  • Sub-pipelines triggered by files

  12. “Meta Pipeline Programming”
  • Build a pipeline out of sub-pipelines
  • Form a distributed web of sub-pipelines
  • Sub-pipelines play the role of subroutines
  • Need equivalents of: objects, call and return, node assignment
  • Library of standard modules: start, call, return, done, obs, run

  13. What is a sub-pipeline?
  • primarily operates on one type of object
  • operates on one node
  • data is maintained locally
  • multiple stages but limited functionality

  14. Example of Sub-pipelines
  [Diagram: sub-pipelines NGT, CAL, SCL, DTS, MEF (multi-extension images), and SIF (single images)]

  15. Sub-pipelines
  • NGT (night’s worth of data): Group, Zero, Dome Flat, Objects, Done
  • CAL (calibration sequence, MEF): Setup, Split, Done
  • SCL (calibration sequence, SIF): Setup, CCDPROC, Combine, Done
  • MEF (process objects, MEF): Setup, Split, Done
  • SIF (process objects, SIF): Setup, CCDPROC, Done

  16. Network of Sub-pipelines and CPUs
  [Diagram: MEF and SIF sub-pipelines replicated on every CPU in the network]
  • MEF: pipeline for operations over all mosaic extensions, e.g. crosstalk, global WCS correction
  • SIF: pipeline for single CCD images, e.g. ccdproc, masking

  17. Example Processing Status

  OBJECT NAME              PIPELINE  NODE        STAGES
  anight1                  ngt       dhcp-4-152  cccw_
  ct4m20030102T183424S     cal       dhcp-4-152  cccd_
  ct4m20030102T183424S_01  scl       archive2    ccccd
  ct4m20030102T183424S_02  scl       dhcp-4-152  ccccd
  ct4m20030102T183424S_03  scl       archive2    ccccd
  ct4m20030102T183424S_04  scl       dhcp-4-152  ccccd
  ct4m20030102T191558S     cal       dhcp-4-152  cccd_
  ct4m20030102T191558S_01  scl       archive2    ccccd
  ct4m20030102T191558S_02  scl       vmware      ccccd
  ct4m20030102T191558S_03  scl       archive2    ccccd
  ct4m20030102T191558S_04  scl       dhcp-4-152  ccccd
  ct4m20030103T084044      mef       dhcp-4-152  ccw__
  ct4m20030103T084044_01   sif       archive2    ccd__
  ct4m20030103T084044_02   sif       archive2    cp___
  ct4m20030103T084044_03   sif       vmware      p____
  ct4m20030103T084044_04   sif       archive2    _____
  ct4m20030103T084307      mef       dhcp-4-152  cccd_
  ct4m20030103T084307_01   sif       archive2    ccd__
  ct4m20030103T084307_02   sif       vmware      ccd__
  ct4m20030103T084307_03   sif       archive2    ccd__
  ct4m20030103T084307_04   sif       archive2    ccd__

  18. Calling a Sub-pipeline
  • Data is set up either locally or on the target node
  • File with the path for the returned result is written to the target pipeline
  • File with the paths of returned results is written in the calling pipeline
  • Trigger file is written to the target pipeline

  19. Returning Results
  • Return module in target pipeline looks for the return file
  • Results are written to the trigger file for the calling pipeline specified in the return file
  • Calling pipeline triggers on the return file

  20. Call/Return
  A -> HN!B/data/abcN (derived from abc)
  A -> HN!B/return/abcN [H!A/abcN.btrig]
  A -> H!A/abc.b [abc1.btrig, abc2.btrig, …]
  A -> HN!B/abcN.btrig [HN!B/data/abcN]
  HN!B -> H!A/abcN.btrig [results]
  The return module checks H!A/abc.b for all done.
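The call/return steps above can be sketched with plain directories standing in for the `host!pipeline` paths. This is a hedged illustration of the protocol, not MARIO code; file names and helper names are invented:

```python
import os
import tempfile

def call_subpipeline(caller, target, obj):
    """Caller side: stage data on the target node, record the return
    address (the caller's trigger path), leaving the target to fire."""
    os.makedirs(os.path.join(target, "data"), exist_ok=True)
    os.makedirs(os.path.join(target, "return"), exist_ok=True)
    with open(os.path.join(target, "data", obj), "w") as f:
        f.write("pixels")                              # stand-in for data
    with open(os.path.join(target, "return", obj), "w") as f:
        f.write(os.path.join(caller, obj + ".btrig"))  # return address

def subpipeline_done(target, obj):
    """Target side: after processing, write results to the trigger file
    named in the return file, so the calling pipeline triggers on it."""
    with open(os.path.join(target, "return", obj)) as f:
        back = f.read()
    with open(back, "w") as f:
        f.write("results for " + obj)

caller, target = tempfile.mkdtemp(), tempfile.mkdtemp()
call_subpipeline(caller, target, "abc1")
subpipeline_done(target, "abc1")
returned = os.path.exists(os.path.join(caller, "abc1.btrig"))
```

The `abc.b` bookkeeping file from the slide, listing every expected `abcN.btrig`, is what lets the return module decide when all pieces are done.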

  21. Indirect Files
  • Distribute data files across a network
  • Move references, and only move data as needed
  • Pipeline objects: standard form, variable content
  • Act as triggers and meta-data containers
  • Examples: anight1.ngttrig, anight1.list

  22. Data Trigger
  [Diagram: a source (DRA, user, or pipeline module) delivers a file from tape or disk via the DTS into the pipeline's data directory (e.g. obj123.fits); a trigger file (obj123.trig), containing a reference to the data, is written to the trigger directory and starts the processing module ("GO").]
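A small sketch of the delivery step above, with invented names: the data file is written first, then the trigger file that references it, so a polling module never sees a trigger whose data is still incomplete:

```python
import os
import tempfile

def deliver(data_dir, trigger_dir, name, payload):
    """Write the data file first, then the trigger that references it.
    The trigger carries a reference to the data, not the data itself."""
    data_path = os.path.join(data_dir, name)
    with open(data_path, "w") as f:
        f.write(payload)                               # the data itself
    with open(os.path.join(trigger_dir, name + ".trig"), "w") as f:
        f.write(data_path)                             # reference only

data_dir, trigger_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
deliver(data_dir, trigger_dir, "obj123.fits", "pixels")
ref = open(os.path.join(trigger_dir, "obj123.fits.trig")).read()
```

Because the trigger is only a reference, data and trigger directories can live on different hosts, matching the indirect-file idea on the previous slide.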

  23. Data Flow Networking: Example
  [Diagram: Host0 runs crosstalk on Obj123; its pieces are fetched by network filename, Obj123.1 from Host3 (Host3!Obj123.1) and Obj123.2 from Host2 (Host2!Obj123.2); Host1 holds Obj456.1 and Obj321.2; Host2 also holds Obj567.2; Host4 is down.]

  24. Data Parallel Modules
  Some algorithms may need to be re-implemented specifically for a data-parallel pipeline. One type is where measurements are made across the whole mosaic for a global calibration. Rather than requiring all pieces to be in one pipeline, arrange for the measurements to be made in parallel, collect them for the global calibration, and then apply the global calibration to the pieces in parallel.
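A toy sketch of that measure/collect/apply pattern, with invented names and a median standing in for a real calibration measurement:

```python
from statistics import median

def global_calibration(piece_pixels):
    """Scatter-gather stand-in for a data-parallel global calibration:
    measure each mosaic piece independently, combine the per-piece
    measurements into one global value, then apply it to every piece."""
    measurements = [median(p) for p in piece_pixels]   # parallelizable step
    global_level = median(measurements)                # the gather step
    return [[v - global_level for v in p] for p in piece_pixels]

# Two mosaic pieces with different background levels.
pieces = [[10, 12, 14], [20, 22, 24]]
flat = global_calibration(pieces)
```

In the pipeline, the per-piece measurement and the final application each run in the SIF sub-pipelines, while only the small gather step needs all the measurements in one place.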
