JSOC Pipeline Processing Overview

Rasmus Munk Larsen, Stanford University

rmunk@quake.stanford.edu

650-725-5485

Overview
  • Hardware overview
  • JSOC data model
  • Pipeline infrastructure & subsystems
  • Pipeline modules
JSOC Disk array

[Figure: JSOC disk array.]

JSOC Connectivity

[Figure: connectivity diagram; nodes include Stanford (JSOC), the DDS, the MOC, NASA Ames and LMSAL, with links labeled "1 Gb", "Private line" and NASA "White" Net.]
JSOC data model: Motivation
  • Evolved from MDI dataset concept to
    • Enable record level access to meta-data for queries and browsing
    • Accommodate more complex data models required by higher-level processing
  • Main design features
    • Lesson learned from MDI: Separate meta-data (keywords) and image data
      • No need to re-write large image files when only keywords change (lev1.8 problem)
      • No out-of-date keyword values in FITS headers - can bind to most recent values on export
    • Data access through query-like dataset names
      • All access in terms of (sets of) data records, which are the “atomic units” of a data series
      • A dataset name is a query specifying a set of data records (see the sketch after this list):
        • jsoc:hmi_lev1_V[#3000-#3020] (21 records from a series with known epoch and cadence)
        • jsoc:hmi_lev0_fg[t_obs=2008-11-07_02:00:00/8h][cam='doppler'] (8 hours' worth of filtergrams)
    • Storage and tape management must be transparent to user
      • Chunking of data records into storage units for efficient tape/disk usage done internally
      • Completely separate storage unit and meta-data databases: more modular design
      • MDI data and modules will be migrated to use new storage service
    • Store meta-data (keywords) in relational database
      • Can use power of relational database to search and index data records
      • Easy and fast to create time series of any keyword value (for trending etc.)
      • Consequence: Data records must be well defined (e.g. have a fixed set of keywords)
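To make the dataset-name idea concrete, here is a minimal sketch of resolving such a name into a set of records. It reuses only the drms_open_records/drms_close_records calls and the DRMS_Env_t/DRMS_RecordSet_t types that appear in the module example at the end of this presentation, assumes the DRMS header declaring them is included, and the helper name count_records is hypothetical.

#include <stdio.h>

extern DRMS_Env_t *drms_env;    /* DRMS environment, as in the module example */

/* Resolve a dataset name (a record-set query) and report how many records matched. */
int count_records(char *datasetname)
{
  int status, n;
  DRMS_RecordSet_t *rs;

  rs = drms_open_records(drms_env, datasetname, "RD", &status);
  if (status || rs == NULL)
    return -1;
  n = rs->num_recs;
  printf("%s matched %d records\n", datasetname, n);
  drms_close_records(drms_env, rs);
  return n;
}

/* e.g. count_records("jsoc:hmi_lev1_V[#3000-#3020]") would report 21 records. */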
JSOC data model

JSOC data will be organized according to a data model with the following classes (a code sketch after the list shows how a module sees them):

  • Series: A sequence of like data records, typically data products produced by a particular analysis
    • Attributes include: Name, Owner, primary search index, Storage unit size, Storage group
  • Record: Single measurement/image/observation with associated meta-data
    • Attributes include: ID, Storage Unit ID, Storage Unit Slot#
    • Contain Keywords, Links, Data segments
    • Records are the main data objects seen by module programmers
  • Keyword: Named meta-data value, stored in database
    • Attributes include: Name, Type, Value, Physical unit
  • Link: Named pointer from one record to another, stored in database
    • Attributes include: Name, Target series, target record id or primary index value
    • Used to capture data dependencies and processing history
  • Data Segment: Named data container representing the primary data on disk belonging to a record
    • Attributes include: Name, filename, datatype, naxis, axis[0…naxis-1], storage format
    • Can be either structure-less (any file) or n-dimensional array stored in tiled, compressed file format
  • Storage Unit: A chunk of data records from the same series stored in a single directory tree
    • Attributes include: Online location, offline location, tape group, retention time
    • Managed by the Storage Unit Manager in a manner transparent to most module programmers
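From a module's point of view, these classes map onto a handful of DRMS library calls. The sketch below reads one keyword and opens one data segment of a record; it reuses only calls shown in the module example at the end of this presentation, assumes a DRMS_Record_t type for the elements of a record set's records array, and uses hypothetical keyword and segment names.

#include <stdio.h>

/* Inspect a single data record: one keyword and one 2-D data segment. */
void inspect_record(DRMS_Record_t *rec)
{
  /* Keyword: named meta-data value stored in the database. */
  char *t_obs = drms_getkey_string(rec, "T_OBS");

  /* Data segment: the primary data on disk belonging to the record. */
  DRMS_Segment_t *seg = drms_open_datasegment(rec, "v_doppler", "RD");
  int n_cols = drms_getaxis(seg, 0);
  int n_rows = drms_getaxis(seg, 1);
  double *data = (double *)drms_getdata(seg, 0, 0);

  printf("T_OBS = %s, segment is %d x %d, first value %g\n",
         t_obs, n_cols, n_rows, data[0]);
}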
JSOC data model

[Figure: JSOC data series (e.g. hmi_lev0_cam1_fg, aia_lev0_cont1700, hmi_lev1_fd_M, hmi_lev1_fd_V, aia_lev0_FE171), each holding a sequence of data records; records hmi_lev1_fd_V#12345 ... #12353 of series hmi_lev1_fd_V are shown, with a chunk of consecutive records grouped into one storage unit (= directory).]

A single hmi_lev1_fd_V data record:

Keywords:
  RECORDNUM = 12345                      # Unique serial number
  SERIESNUM = 5531704                    # Slots since epoch
  T_OBS     = '2009.01.05_23:22:40_TAI'
  DATAMIN   = -2.537730543544E+03
  DATAMAX   = 1.935749511719E+03
  ...
  P_ANGLE   = LINK:ORBIT, KEYWORD:SOLAR_P

Links:
  ORBIT    = hmi_lev0_orbit, SERIESNUM = 221268160
  CALTABLE = hmi_lev0_dopcal, RECORDNUM = 7
  L1       = hmi_lev0_cam1_fg, RECORDNUM = 42345232
  R1       = hmi_lev0_cam1_fg, RECORDNUM = 42345233

Data Segments:
  V_DOPPLER = [image]
JSOC subsystems
  • SUMS: Storage Unit Management System
    • Maintains database of storage units and their location on disk and tape
    • Manages JSOC storage subsystems: Disk array, Robotic tape library
      • Scrubs old data from disk cache to maintain enough free workspace
      • Loads and unloads tape to/from tape drives and robotic library
    • Allocates disk storage needed by pipeline processes through DRMS
    • Stages storage units requested by pipeline processes through DRMS
    • Design features:
      • RPC client-server protocol
      • Oracle DBMS (to be migrated to PostgreSQL)
  • DRMS: Data Record Management System
    • Maintains database holding
      • Master tables with definitions of all JSOC series and their keyword, link and data segment definitions
      • One table per series containing record meta-data, e.g. keyword values
    • Provides distributed transaction processing framework for pipeline
    • Provides full meta-data searching through JSOC query language
      • Multi-column indexed searches on primary index values allow fast and simple querying for common cases
      • Inclusion of free-form SQL clauses allows advanced querying
    • Provides software libraries for querying, creating, retrieving and storing JSOC series, data records and their keywords, links, and data segments (see the sketch after this list)
      • Currently available in C. Wrappers (with read-only restriction?) for Fortran, Matlab and IDL are planned.
    • Design features:
      • TCP/IP socket client-server protocol
      • PostgreSQL DBMS
      • Slony DB replication system to be added for managing query load and enabling multi-site distributed archives
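As a minimal sketch of the create-and-store side of the C library, reusing only the drms_create_records/drms_close_records calls from the module example at the end of this presentation (the series name and helper name are hypothetical, and the DRMS header declaring the types is assumed to be included):

/* Create one new, empty record in an output series and hand it back to DRMS. */
int make_one_record(DRMS_Env_t *env, char *series)
{
  int status;
  DRMS_RecordSet_t *rs;

  rs = drms_create_records(env, series, 1, &status);
  if (status)
    return -1;

  /* ... set keywords, links and data segments of rs->records[0] here ... */

  drms_close_records(env, rs);   /* record becomes part of the pipeline session */
  return 0;
}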
Pipeline software/hardware architecture

[Figure: architecture diagram. A pipeline program ("module") links against the JSOC science libraries, utility libraries and the DRMS library (OpenRecords, CloseRecords, GetKeyword, SetKeyword, GetLink, SetLink, OpenDataSegment, CloseDataSegment), keeps a record cache (keywords + links + data paths), and performs data segment I/O directly against the JSOC disks. Over the DRMS socket protocol the module talks to a Data Record Management Service (DRMS) instance, which issues SQL queries to the database server (series tables, record tables / record catalogs, storage unit tables) and requests storage units from the Storage Unit Management Service (SUMS) via AllocUnit, GetUnit and PutUnit. SUMS transfers storage units between the JSOC disks and the robotic tape archive.]
JSOC Pipeline Workflow

[Figure: workflow diagram. The pipeline operator turns a pipeline processing plan into a processing script ("mapfile"): a list of pipeline modules with the datasets they need for input and output. The PUI (Pipeline User Interface, the scheduler) runs the modules (Module1, Module2, Module3, ...) within a DRMS session and keeps a processing history log; the modules access data through DRMS (Data Record Management service) and SUMS (Storage Unit Management System).]
Analysis modules: co-I contributions and collaboration
  • Contributions from co-I teams:
    • Software for intermediate and high level analysis modules
    • Data series definitions
      • Keywords, links, data segments, size of storage units, primary index keywords etc.
    • Documentation
    • Test data and intended results for verification
    • Time
      • Explain algorithms and implementation
      • Help with verification
      • Collaborate on improvements if required (e.g. performance or maintainability)
  • Contributions from HMI team:
    • Pipeline execution environment
    • Software & hardware resources (Development environment, libraries, tools)
    • Time
      • Help with defining data series
      • Help with porting code to JSOC API
      • If needed, collaborate on algorithmic improvements, tuning for JSOC hardware, parallelization
      • Verification
HMI module status and MDI heritage

[Figure: flowchart of HMI data products, annotated per module with code status: code developed at Stanford, code developed at HAO, standalone "production" code routinely used, MDI pipeline modules exist, research code in use. Primary observables: Doppler velocity, Stokes I,V, Stokes I,Q,U,V, continuum brightness. Intermediate and high level data products include: heliographic Doppler velocity maps, spherical harmonic time series, mode frequencies and splittings, internal rotation, internal sound speed; ring diagrams, local wave frequency shifts, full-disk velocity and sound speed maps (0-30 Mm), Carrington synoptic v and cs maps (0-30 Mm); tracked tiles of Dopplergrams, time-distance cross-covariance functions, wave travel times, high-resolution v and cs maps (0-30 Mm), deep-focus v and cs maps (0-200 Mm); egression and ingression maps, wave phase shift maps, far-side activity index; line-of-sight magnetograms, line-of-sight magnetic field maps, full-disk 10-min averaged maps; vector magnetograms (fast algorithm), vector magnetic field maps, vector magnetograms (inversion algorithm), coronal magnetic field extrapolations, coronal and solar wind models; tracked full-disk 1-hour averaged continuum maps, solar limb parameters, brightness feature maps, brightness images.]
Questions to be discussed at working sessions
  • List of standard science data products
    • Which data products, including intermediate ones, should be produced by JSOC to accomplish the science goals of the mission?
    • What cadence, resolution, coverage etc. should each data product have?
    • Which data products should be computed on the fly and which should be archived?
    • What are the challenges to be overcome for each analysis technique?
  • Detailing each branch of the processing pipeline
    • What are the detailed steps in each branch?
    • Can some of the computational steps be encapsulated in general tools that can be shared among different branches (example: tracking)?
    • What are the CPU and I/O resource requirements of computational steps?
  • Contributed analysis modules
    • Which groups or individuals will contribute code and incorporate it into the pipeline?
    • If multiple candidate techniques and/or implementations exist, which should be included in the pipeline?
    • What is the test plan and what data is needed to verify the approach?
Database tables for example series hmi_fd_v
  • Tables specific to each series contain per-record values of:
    • Keywords
    • Record numbers of records pointed to by links
    • DSIndex = an index identifying the SUMS storage unit containing the data segments of a record
    • Series sequence counter used for generating unique record numbers
Pipeline batch processing
  • A pipeline batch is encapsulated in a single database transaction (see the control-flow sketch after the diagram below):
    • If no module fails, all data records are committed and become visible to other clients of the JSOC catalog at the end of the session
    • If a failure occurs, all data records are deleted and the database is rolled back
    • It is possible to commit data produced up to intermediate checkpoints during a session

[Figure: a pipeline batch as one atomic transaction. The batch registers a session with the DRMS service (the session master), runs its modules (Module 1, Module 2.1, Module 2.2, ..., Module N) through the DRMS API, reading input data records and writing output data records, then commits the data and deregisters. The DRMS service talks to the record & series database and to SUMS.]
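The transaction semantics above can be summarized in a short control-flow sketch. This is illustrative pseudocode only: every type and function in it (Module, run_module, begin_batch_transaction, checkpoint_batch, commit_batch, rollback_batch) is a hypothetical stand-in, not part of the DRMS API.

#include <stdio.h>

/* Hypothetical stand-ins, just to make the batch control flow concrete. */
typedef struct { const char *name; int checkpoint; } Module;
static void begin_batch_transaction(void) { printf("register session\n"); }
static void checkpoint_batch(void)        { printf("intermediate commit\n"); }
static void commit_batch(void)            { printf("commit data & deregister\n"); }
static void rollback_batch(void)          { printf("roll back & deregister\n"); }
static int  run_module(const Module *m)   { printf("run %s\n", m->name); return 0; }

/* A pipeline batch as one atomic transaction: commit only if every module
   succeeds, otherwise roll the whole batch back. */
int run_pipeline_batch(const Module modules[], int n_modules)
{
  int i;
  begin_batch_transaction();
  for (i = 0; i < n_modules; i++)
  {
    if (run_module(&modules[i]) != 0)
    {
      rollback_batch();           /* failure: all new records are deleted */
      return -1;
    }
    if (modules[i].checkpoint)
      checkpoint_batch();         /* optional commit at an intermediate checkpoint */
  }
  commit_batch();                 /* success: new records become visible */
  return 0;
}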

Example of module code:
  • A module doing a (naïve) Doppler velocity calculation could look as shown below
  • Usage:

doppler DRMSSESSION=helios:33546 "2009.09.01_16:00:00_TAI" "2009.09.01_17:00:00_TAI"

extern CmdParams_t cmdparams;   /* command line args */
extern DRMS_Env_t *drms_env;    /* DRMS environment */

int module_main(void)
{
  DRMS_RecordSet_t *filtergrams, *dopplergram;
  int first_frame, status;
  char query[1024], *start, *end;

  /* Build a record-set query from the two time arguments on the command line. */
  start = cmdparms_getarg(&cmdparams, 1);
  end = cmdparms_getarg(&cmdparams, 2);
  sprintf(query, "hmi_lev0_fg[T_Obs=%s-%s]", start, end);

  /* Open the input filtergram records (read-only). */
  filtergrams = drms_open_records(drms_env, query, "RD", &status);
  if (status || filtergrams->num_recs == 0)
  {
    printf("Sorry, no filtergrams found for that time interval.\n");
    return -1;
  }

  first_frame = 0;   /* Start looping over record set. */
  for (;;)
  {
    first_frame = find_next_framelist(first_frame, filtergrams);
    if (first_frame == -1)   /* No more complete framelists. Exit. */
      break;

    /* Create one output record in the hmi_fd_v series and fill it in. */
    dopplergram = drms_create_records(drms_env, "hmi_fd_v", 1, &status);
    if (status)
      return -1;
    compute_dopplergram(first_frame, filtergrams, dopplergram);
    drms_close_records(drms_env, dopplergram);
  }
  return 0;
}

Example continued

int compute_dopplergram(int first_frame, DRMS_RecordSet_t *filtergrams,
                        DRMS_RecordSet_t *dopplergram)
{
  int i, n_rows, n_cols, tuning;
  DRMS_Segment_t *fg[10], *dop;
  short *fg_data[10];
  char *pol;
  double *dop_data;

  /* Get pointers for the Doppler data array (read-write output segment). */
  dop = drms_open_datasegment(dopplergram->records[0], "v_doppler", "RDWR");
  n_cols = drms_getaxis(dop, 0);
  n_rows = drms_getaxis(dop, 1);
  dop_data = (double *)drms_getdata(dop, 0, 0);

  /* Get pointers for the 10 filtergram data arrays of this framelist. */
  for (i = 0; i < 10; i++)
  {
    fg[i] = drms_open_datasegment(filtergrams->records[first_frame + i],
                                  "intensity", "RD");
    fg_data[i] = (short *)drms_getdata(fg[i], 0, 0);
    pol = drms_getkey_string(filtergrams->records[first_frame + i], "Polarization");
    tuning = drms_getkey_int(filtergrams->records[first_frame + i], "Tuning");
    printf("Using filtergram (%s, %d)\n", pol, tuning);
  }

  /* Do the actual Doppler computation. */
  calc_v(fg_data, dop_data);
  return 0;
}