Distributed pipeline programming for mosaics

Or Mario Tips’N’Tricks

NOAO Mosaic Pipeline

Presentation Transcript


Major Features and Goals

  • Data products for NOAO archive and NVO node

  • Data products for observers

  • Pipeline for NOAO and mosaic community

  • Basic CCD mosaic calibrations

  • Advanced time-domain data products

  • Real-time data quality assessment and monitoring

  • High performance, data parallel system

  • LSST testbed

  • Fairly generic pipeline infrastructure (NEWFIRM, …)

  • Automated operation

  • Thorough processing history and data documentation



MARIO

Mosaic Automatic Reduction Infrastructure and Operations

(i.e. a pipeline)


Key Concepts (Tips’N’Tricks)

  • sub-pipelines - “meta pipeline programming”

  • indirect files

  • load balancing using trigger files

  • stay-alive module

  • parallelization of algorithms over mosaic

  • shared monitoring

  • network filenames

  • image processing language (CL)


What is a pipeline?

  • collection of processing modules

  • connected by dependency rules

  • modules may run concurrently on different data objects

  • Infrastructure to manage processes

  • Infrastructure to manage dependencies

  • Infrastructure to monitor processes and processing


OPUS

Operations Pipeline Unified System


OPUS

  • Triggers (dependency rules)

    • file, osf, time

  • Blackboard

  • Polling

  • Monitors and Managers


Distributed Pipeline Issues

  • data vs. functional parallelism

  • shared file system vs. local file system

  • heterogeneous vs. homogeneous processors

  • parasitic processing

  • push vs. pull

  • load balancing

  • master-worker vs. peer-to-peer


MARIO Choices

  • data parallelism

  • local file system (w/ shared blackboard)

  • heterogeneous processors

  • push AND pull

  • load balancing by number of data objects

  • peer-to-peer
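
One possible reading of "load balancing by number of data objects" is that a caller sends the next object to whichever node currently holds the fewest pending objects. The sketch below illustrates only that idea. The names `pick_node` and `distribute`, the in-memory count of pending objects, and the alphabetical tie-break are assumptions made for the example, not MARIO's actual mechanism.

```python
def pick_node(pending):
    """Return the node with the fewest pending data objects.

    `pending` maps node name -> number of objects waiting there.
    Ties are broken alphabetically so the choice is deterministic
    (an assumption of this sketch).
    """
    if not pending:
        raise ValueError("no nodes available")
    return min(sorted(pending), key=lambda node: pending[node])


def distribute(objects, nodes):
    """Assign each object to the least-loaded node, updating counts as we go."""
    pending = {node: 0 for node in nodes}
    assignment = {}
    for obj in objects:
        node = pick_node(pending)
        assignment[obj] = node
        pending[node] += 1
    return assignment
```

Because the choice depends only on object counts, the same logic applies to any number of CPUs, which fits the "no dependency on N" goal.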


MARIO Architecture Concept

  • Multiple CPUs but no dependency on N

  • Multiple types of sub-pipelines by function

    • One for operations over all mosaic elements

    • One for operations on individual elements

    • One for cataloging

    • One for image differencing

  • All types on all CPUs: no master!

  • Sub-pipelines triggered by files


“Meta Pipeline Programming”

  • Build a pipeline out of sub-pipelines

  • Form a distributed web of sub-pipelines

  • Sub-pipelines play role of subroutines

  • Need equivalents of:

    • objects

    • call and return

    • node assignment

    • library of standard modules

      • start, call, return, done, obs, run


What is a sub-pipeline?

  • primarily operates on one type of object

  • operates on one node

  • data is maintained locally

  • multiple stages but limited functionality


Example of Sub-pipelines

[Diagram: sub-pipelines NGT, CAL, SCL, DTS, MEF, and SIF, annotated "multiextension" and "single images"]


Sub-pipelines

  • NGT: Night's worth of data

    • Group, Zero, Dome Flat, Objects, Done

  • CAL: Calibration sequence (MEF)

    • Setup, Split, Done

  • SCL: Calibration sequence (SIF)

    • Setup, CCDPROC, Combine, Done

  • MEF: Process objects (MEF)

    • Setup, Split, Done

  • SIF: Process objects (SIF)

    • Setup, CCDPROC, Done


Network of Sub-pipelines and CPUs

[Diagram: one pipeline spread over six CPUs, each CPU shown with an MEF and an SIF sub-pipeline]

MEF: pipeline for operations over all mosaic extensions; e.g. crosstalk, global WCS correction

SIF: pipeline for single CCD images; e.g. ccdproc, masking


Example Processing Status

OBJECT NAME               PIPELINE  NODE        STAGES

anight1                   ngt       dhcp-4-152  cccw_
ct4m20030102T183424S      cal       dhcp-4-152  cccd_
ct4m20030102T183424S_01   scl       archive2    ccccd
ct4m20030102T183424S_02   scl       dhcp-4-152  ccccd
ct4m20030102T183424S_03   scl       archive2    ccccd
ct4m20030102T183424S_04   scl       dhcp-4-152  ccccd
ct4m20030102T191558S      cal       dhcp-4-152  cccd_
ct4m20030102T191558S_01   scl       archive2    ccccd
ct4m20030102T191558S_02   scl       vmware      ccccd
ct4m20030102T191558S_03   scl       archive2    ccccd
ct4m20030102T191558S_04   scl       dhcp-4-152  ccccd
ct4m20030103T084044       mef       dhcp-4-152  ccw__
ct4m20030103T084044_01    sif       archive2    ccd__
ct4m20030103T084044_02    sif       archive2    cp___
ct4m20030103T084044_03    sif       vmware      p____
ct4m20030103T084044_04    sif       archive2    _____
ct4m20030103T084307       mef       dhcp-4-152  cccd_
ct4m20030103T084307_01    sif       archive2    ccd__
ct4m20030103T084307_02    sif       vmware      ccd__
ct4m20030103T084307_03    sif       archive2    ccd__
ct4m20030103T084307_04    sif       archive2    ccd__


Calling a Sub-pipeline

  • Data is set up either locally or on target node

  • File with path for returned result written to target pipeline

  • File with paths of returned results written in calling pipeline

  • Trigger file written to target pipeline


Returning Results

  • Return module in target pipeline looks for return file

  • Results are written to trigger file for calling pipeline specified in the return file

  • Calling pipeline triggers on return file


Call/Return

A -> HN!B/data/abcN (derived from abc)

A -> HN!B/return/abcN [H!A/abcN.btrig]

A -> H!A/abc.b [abc1.btrig,abc2.btrig,…]

A -> HN!B/abcN.btrig [HN!B/data/abcN]

HN!B -> H!A/abcN.btrig [results]

return checks H!A/abc.b for all done
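
The call/return file exchange can be sketched with ordinary directories standing in for nodes. This is a minimal illustration under stated assumptions: the function names `call`, `return_results`, and `all_returned` are hypothetical, local paths replace the `host!path` network names, and the file contents are placeholders. Only the file layout (`data/`, `return/`, `.btrig`, `.b`) follows the notation above.

```python
from pathlib import Path


def call(caller, targets, name):
    """Caller A splits object `name` into one piece per target B and triggers each."""
    caller.mkdir(parents=True, exist_ok=True)
    pieces = [(f"{name}{n}", target) for n, target in enumerate(targets, start=1)]
    # A -> H!A/abc.b : list of the return triggers the caller expects.
    (caller / f"{name}.b").write_text("".join(f"{p}.btrig\n" for p, _ in pieces))
    for piece, target in pieces:
        (target / "data").mkdir(parents=True, exist_ok=True)
        (target / "return").mkdir(parents=True, exist_ok=True)
        # A -> HN!B/data/abcN : data for the piece is set up on the target.
        data = target / "data" / piece
        data.write_text(f"placeholder data for {piece}\n")
        # A -> HN!B/return/abcN : where the target should write its results.
        (target / "return" / piece).write_text(f"{caller / (piece + '.btrig')}\n")
        # A -> HN!B/abcN.btrig : trigger file, written last, references the data.
        (target / f"{piece}.btrig").write_text(f"{data}\n")


def return_results(target, piece, results):
    """Return module on B: write results to the trigger path named in the return file."""
    back = Path((target / "return" / piece).read_text().strip())
    back.write_text(results)


def all_returned(caller, name):
    """Caller-side check of H!A/abc.b: has every expected return trigger arrived?"""
    expected = (caller / f"{name}.b").read_text().split()
    return all((caller / trig).exists() for trig in expected)
```

Writing the expected-returns list before any trigger file is a choice made in this sketch so that a fast target cannot return before the caller knows what to wait for.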


Indirect Files

  • Distribute data files across a network

  • Move references and only move data as needed

  • Pipeline objects: standard form, variable content

  • Act as triggers and meta-data containers

[Figure: example indirect files anight1.ngttrig and anight1.list; contents not preserved in the transcript]
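
The idea of moving references rather than data can be sketched as follows, assuming an indirect file is a plain list of `host!path` network filenames, one per line. That layout, the function names, and the example paths are assumptions for illustration; the actual contents of the indirect files are not shown in the slides.

```python
def parse_ref(ref, default_host):
    """Split 'host!path' into (host, path); a bare path lives on default_host."""
    host, sep, path = ref.partition("!")
    if not sep:
        return default_host, ref
    return host, path


def read_indirect(text, default_host):
    """Parse the non-empty lines of an indirect file into (host, path) references."""
    return [parse_ref(line.strip(), default_host)
            for line in text.splitlines() if line.strip()]


def needs_transfer(refs, local_host):
    """Return only the references whose data lives on another host."""
    return [(host, path) for host, path in refs if host != local_host]
```

A module that receives the list can work on local entries immediately and fetch only the entries returned by `needs_transfer`, so data moves only when a stage actually needs it.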


File Triggers

[Diagram: a data trigger (DRA, user, or pipeline module) delivers data from tape, disk, or DTS to the pipeline. The data directory holds obj123.fits; the trigger directory holds obj123.trig, which contains a reference to the data and tells the module process to GO.]
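
A file trigger of this kind can be sketched as a module that polls a trigger directory, reads the data reference from each trigger file, and processes it. The names `poll_once` and `poll_forever`, the `.trig` suffix default, and the choice to delete a trigger once handled are assumptions of this sketch; a real pipeline framework may instead rename or mark the trigger.

```python
import time
from pathlib import Path


def poll_once(trigger_dir, module, suffix=".trig"):
    """Run `module` on the data referenced by each pending trigger file.

    Returns the names of the triggers that were handled.
    """
    handled = []
    for trig in sorted(Path(trigger_dir).glob(f"*{suffix}")):
        data_ref = trig.read_text().strip()  # trigger contains a reference to data
        module(data_ref)
        trig.unlink()  # consume the trigger so the object is not processed twice
        handled.append(trig.name)
    return handled


def poll_forever(trigger_dir, module, interval=5.0):
    """Simple polling loop: check for triggers, then sleep."""
    while True:
        poll_once(trigger_dir, module)
        time.sleep(interval)
```

Because the trigger holds only a reference, the same loop works whether the data sits in the local data directory or on another node.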


Data Flow Networking: Example

[Diagram: pieces of objects and network references (Host!Object) distributed across hosts]

  • Host0: Crosstalk, Obj123

  • Host1: Obj456.1, Obj321.2

  • Host2: Obj567.2, Host2!Obj123.2, Obj123.2

  • Host3: Host3!Obj123.1, Obj123.1

  • Host4: DOWN


Data Parallel Modules

Some algorithms may need to be re-implemented specifically for a data parallel pipeline.

One type is where measurements are made across the mosaic for a global calibration.

Rather than requiring all pieces to be in one pipeline, arrange for the measurements made in parallel to be collected for the global calibration, and then apply the global calibration to the pieces in parallel.
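
The measure, collect, apply pattern described here can be sketched in a few lines. In this illustration each "piece" is a plain list of pixel values, the per-piece measurement is a median, and the global calibration is the mean of those medians. All three choices, the function names, and the use of a thread pool as a stand-in for separate nodes are assumptions for the example, not the pipeline's actual algorithms.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, median


def measure(piece):
    """Per-piece measurement, made independently on each mosaic element."""
    return median(piece)


def apply_calibration(piece, level):
    """Apply the single global value to one piece."""
    return [value - level for value in piece]


def global_calibrate(pieces, workers=4):
    """Measure in parallel, combine once, then apply in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Step 1: measurements are made in parallel, one per piece.
        measurements = list(pool.map(measure, pieces))
        # Step 2: only the small measurements are collected in one place.
        level = mean(measurements)
        # Step 3: the global calibration is applied to the pieces in parallel.
        return list(pool.map(lambda p: apply_calibration(p, level), pieces))
```

Only the measurements travel to the collecting step; the image data itself stays where it was processed.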

