
JSOC Pipeline Processing Environment

Rasmus Munk Larsen, Stanford University, rmunk@quake.stanford.edu, 650-725-5485. Overview: JSOC data series organization; pipeline execution environment; pipeline software architecture; Co-I analysis module contribution; pipeline data products.


Presentation Transcript


  1. JSOC Pipeline Processing Environment Rasmus Munk Larsen, Stanford University rmunk@quake.stanford.edu 650-725-5485

  2. Overview
     • JSOC data series organization
     • Pipeline execution environment
     • Pipeline software architecture
     • Co-I analysis module contribution
     • Pipeline Data Products

  3. JSOC logical data organization
     • Evolved from MDI dataset concept to
       • Fix known limitations/problems
       • Accommodate more complex data models required by higher-level processing
     • Main design features
       • Separation of meta-data (keywords) and image data
         • No need to re-write large image files when only keywords change (lev1.8 problem)
         • No (fewer) out-of-date keyword values in FITS headers
         • Can bind to most recent values on export
       • Easier data access
         • All access in terms of (collections of) data records, which are the "atomic units" of a data series
         • A datasetname is a query specifying a set of data records (possibly from multiple data series):
           • jsoc:hmi_lev0_cam1_fg?recordnum=12345 (a specific filtergram with unique record number 12345)
           • jsoc:hmi_lev0_cam1_fg[12300-12330] (a minute's worth of filtergrams from camera 1)
           • jsoc:hmi_lev1_fd_V?"T_OBS>='2008-11-01' AND T_OBS<'2008-12-01' AND N_MISSING<100"
       • Storage and tape management must be transparent to the user
         • Chunking of data records into storage units for efficient tape/disk usage is done internally
         • Completely separate storage and catalog (i.e. series & record) databases: more modular design
         • Legacy MDI modules should run on top of the new storage service
       • Storing keywords in a relational database system (Oracle)
         • Can use the power of a relational database to rapidly find data records
         • Easy and fast to create time series of any keyword value (for trending etc.)
         • Consequence: data records for a given series must be well defined (e.g. have a fixed set of keywords)
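As a rough illustration of why keeping keywords in a relational database makes record selection and trending cheap, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for the Oracle catalog; the table layout, the sample rows, and the helper `select_records` are assumptions made for this example and are not part of the JSOC design.

```python
import sqlite3

# In-memory SQLite stands in for the Oracle keyword catalog (assumption).
# Column names follow the keywords used in the slide; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hmi_lev1_fd_V ("
    " recordnum INTEGER PRIMARY KEY,"
    " t_obs TEXT NOT NULL,"          # ISO dates compare correctly as strings
    " n_missing INTEGER NOT NULL,"
    " datamax REAL)"
)
rows = [
    (1, "2008-10-31", 0, 1900.0),
    (2, "2008-11-01", 12, 1935.7),
    (3, "2008-11-15", 250, 1800.2),  # rejected: too many missing pixels
    (4, "2008-11-30", 99, 1910.4),
    (5, "2008-12-01", 0, 1925.0),
]
conn.executemany("INSERT INTO hmi_lev1_fd_V VALUES (?, ?, ?, ?)", rows)


def select_records(conn, series, where, params=()):
    """Hypothetical helper: record numbers in `series` matching a keyword condition."""
    # The series name and condition are interpolated for brevity only;
    # production code would have to validate them.
    sql = f"SELECT recordnum FROM {series} WHERE {where} ORDER BY recordnum"
    return [row[0] for row in conn.execute(sql, params)]


# Equivalent of the third dataset name in the slide.
november = select_records(
    conn,
    "hmi_lev1_fd_V",
    "t_obs >= ? AND t_obs < ? AND n_missing < ?",
    ("2008-11-01", "2008-12-01", 100),
)
print(november)

# A keyword time series for trending is a single query, no image files touched.
trend = conn.execute(
    "SELECT t_obs, datamax FROM hmi_lev1_fd_V ORDER BY t_obs"
).fetchall()
print(trend)
```

Because only the catalog is queried, no image data has to be read to pick the records or to build the trend.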

  4. Logical Data Organization
     [Figure: diagram relating JSOC data series, the data records of one series, and the contents of a single data record. Storage Unit = Directory.]
     JSOC Data Series: hmi_lev0_cam1_fg, aia_lev0_cont1700, hmi_lev1_fd_M, hmi_lev1_fd_V, aia_lev0_FE171, …
     Data records for series hmi_lev1_fd_V: hmi_lev1_fd_V#12345, hmi_lev1_fd_V#12346, hmi_lev1_fd_V#12347, hmi_lev1_fd_V#12348, hmi_lev1_fd_V#12349, hmi_lev1_fd_V#12350, hmi_lev1_fd_V#12351, hmi_lev1_fd_V#12352, hmi_lev1_fd_V#12353, …
     Single hmi_lev1_fd_V data record:
       Keywords:
         RECORDNUM = 12345 # Unique serial number
         SERIESNUM = 5531704 # Slots since epoch.
         T_OBS = '2009.01.05_23:22:40_TAI'
         DATAMIN = -2.537730543544E+03
         DATAMAX = 1.935749511719E+03
         ...
         P_ANGLE = LINK:ORBIT,KEYWORD:SOLAR_P
         …
       Links:
         ORBIT = hmi_lev0_orbit, SERIESNUM = 221268160
         CALTABLE = hmi_lev0_dopcal, RECORDNUM = 7
         L1 = hmi_lev0_cam1_fg, RECORDNUM = 42345232
         R1 = hmi_lev0_cam1_fg, RECORDNUM = 42345233
         …
       Data Segments:
         V_DOPPLER
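The keyword P_ANGLE = LINK:ORBIT,KEYWORD:SOLAR_P above takes its value from another record through a link. The sketch below shows one way such a lookup could work. The `Record` class, the `catalog` dictionary, `get_keyword`, and the SOLAR_P value are all hypothetical, and the sketch ignores the difference between links bound by SERIESNUM and links bound by RECORDNUM.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical in-memory data record: keywords plus named links."""
    series: str
    number: int
    keywords: dict = field(default_factory=dict)
    # link name -> (target series, target record or series number)
    links: dict = field(default_factory=dict)


def get_keyword(catalog, record, name):
    """Return a keyword value, following 'LINK:<link>,KEYWORD:<name>' indirections."""
    value = record.keywords[name]
    if isinstance(value, str) and value.startswith("LINK:"):
        link_part, keyword_part = value.split(",", 1)
        link_name = link_part[len("LINK:"):]
        target_keyword = keyword_part[len("KEYWORD:"):]
        target = catalog[record.links[link_name]]
        # The target keyword may itself be linked, so recurse.
        return get_keyword(catalog, target, target_keyword)
    return value


# Made-up orbit record; 3.5 is an illustrative value, not real data.
catalog = {
    ("hmi_lev0_orbit", 221268160): Record(
        "hmi_lev0_orbit", 221268160, {"SOLAR_P": 3.5}
    ),
}
rec = Record(
    "hmi_lev1_fd_V",
    12345,
    keywords={"RECORDNUM": 12345, "P_ANGLE": "LINK:ORBIT,KEYWORD:SOLAR_P"},
    links={"ORBIT": ("hmi_lev0_orbit", 221268160)},
)

p_angle = get_keyword(catalog, rec, "P_ANGLE")
print(p_angle)
```

Storing the link rather than a copy of the value means the record picks up the current SOLAR_P whenever it is read, which matches the goal of binding to the most recent values on export.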

  5. JSOC Series Definition (JSD)

     #======================= Global series information ===========================
     Seriesname: "testclass1"
     Description: "This is a small example of a JSOC series definition."
     Author: "Rasmus Munk Larsen"
     Owners: "rmunk"
     Unitsize: 10
     Archive: 1
     Retention: permanent
     Tapegroup: 127
     Primary Index:
     #============================ Keywords =================================
     # Format:
     # Keyword: <name>, link, <linkname>, <target keyword name>
     # or
     # Keyword: <name>, <type>, <default value>, <format>, <unit>, <comment>
     #
     Keyword: "keywd0", float, 0.0f, "%f", "unit3", "Comment3"
     Keyword: "keywd1", double, 0.0, "%lf", "unit4", "Comment4"
     Keyword: "keywd2", datetime, "1970-01-01 00:00:00", "%-s", "unit5", "Comment5"
     Keyword: "keywd3", timestamp, "19700101000000", "%-s", "unit6", "Comment6"
     Keyword: "keywd4", string, "", "%-s", "unit7", "Comment7"
     Keyword: "keywd5", link, "link1", "keywd0"
     Keyword: "keywd6", char, '\0', "%d", "unit1", "Comment1"
     Keyword: "keywd7", int, 0, "%d", "unit2", "Comment2"
     #============================ Links =====================================
     # Format:
     # Link: <name>, <target series>, { static | dynamic }
     #
     Link: "link0", "testclass0", static
     Link: "link1", "testclass0", dynamic
     #============================ Data segments ===============================
     # Data: <name>, <type>, <naxis>, <axis dims>, <unit>, <protocol>
     #
     Data: "x-axis", float, 1, 100, "m", fits
     Data: "y-axis", float, 1, 200, "m", fits
     Data: "z-axis", float, 1, 50, "m", fits
     Data: "pressure", float, 3, 100, 200, 50, "kg/(s^2*m)", fitz
     Data: "velocity", float, 4, 100, 200, 50, 3, "m/s", fitz

     Creating a new Data Series: testclass1.jsd → JSD parser → Oracle database
       SQL: INSERT INTO series_catalog VALUES('testclass1','rmunk', …
       SQL: CREATE TABLE testclass1 ( recnum integer not null unique, keywd0 binary_float, …
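The slide shows the JSD parser turning a series definition into a CREATE TABLE statement. The following is a minimal sketch of that step, not the actual JSD parser. Only the float to binary_float mapping appears in the slide; the other type mappings, the choice to give link keywords no column of their own, and the function names are assumptions for illustration.

```python
import csv

# A shortened JSD, using lines from the example series definition.
JSD_TEXT = '''\
Seriesname: "testclass1"
# Keyword: <name>, <type>, <default value>, <format>, <unit>, <comment>
Keyword: "keywd0", float, 0.0f, "%f", "unit3", "Comment3"
Keyword: "keywd1", double, 0.0, "%lf", "unit4", "Comment4"
Keyword: "keywd4", string, "", "%-s", "unit7", "Comment7"
Keyword: "keywd5", link, "link1", "keywd0"
Keyword: "keywd7", int, 0, "%d", "unit2", "Comment2"
'''

# Only float -> binary_float is taken from the slide; the rest is assumed.
SQL_TYPES = {
    "float": "binary_float",
    "double": "binary_double",
    "int": "integer",
    "string": "varchar2(255)",
}


def parse_fields(rest):
    """Split a comma-separated JSD value list, honoring double quotes."""
    return next(csv.reader([rest], skipinitialspace=True), [])


def jsd_to_create_table(text):
    """Build a CREATE TABLE statement from the Seriesname and Keyword lines."""
    series = None
    columns = ["recnum integer not null unique"]
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tag, _, rest = line.partition(":")
        fields = parse_fields(rest.strip())
        if tag == "Seriesname":
            series = fields[0]
        elif tag == "Keyword" and fields[1] != "link":
            # Linked keywords live in the target series, so no column here.
            columns.append(f"{fields[0]} {SQL_TYPES[fields[1]]}")
    if series is None:
        raise ValueError("JSD has no Seriesname line")
    return f"CREATE TABLE {series} ({', '.join(columns)})"


sql = jsd_to_create_table(JSD_TEXT)
print(sql)
```

Requiring a fixed keyword set per series is what makes this one-table-per-series mapping possible.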

  6. Pipeline batch processing (a.k.a. MDI mapfile)
     [Figure: pipeline batch = atomic transaction. Register session → Module 1 → Module 2 → … → Module N → Commit Data & Deregister. Every step goes through the JSOC API; input data records are read from, and output data records written to, the JSOC archive on disk.]
     • Pipeline processing is scheduled in batches by PUI+, a data-driven pipeline scheduler inherited from MDI
     • A pipeline batch is a single atomic transaction:
       • If no module fails, all data records are committed and become visible to other clients of the archive
       • If a failure occurs, all data records are deleted and the database is rolled back
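The all-or-nothing behavior of a pipeline batch can be sketched with an ordinary database transaction. This example uses SQLite in place of the JSOC catalog, and `run_batch`, the `records` table, and the two modules are hypothetical. It covers only the database side; deleting the data files of a failed batch is not shown.

```python
import sqlite3


def run_batch(conn, modules):
    """Run modules in order inside one transaction; return True if committed."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            for module in modules:
                module(conn)
        return True
    except Exception:
        return False


def make_dopplergram(conn):
    # Stand-in for a module that writes one output data record.
    conn.execute("INSERT INTO records (series) VALUES ('hmi_lev1_fd_V')")


def failing_module(conn):
    raise RuntimeError("module failed")


def count_records(conn):
    return conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (recordnum INTEGER PRIMARY KEY, series TEXT)")
conn.commit()

ok = run_batch(conn, [make_dopplergram, make_dopplergram])
after_ok = count_records(conn)

failed = run_batch(conn, [make_dopplergram, failing_module])
after_fail = count_records(conn)

print(ok, after_ok, failed, after_fail)
```

The record written by the first module of the failed batch never becomes visible, because the rollback discards it together with the rest of the batch.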

  7. Pipeline Client-Server Architecture
     [Figure: client-server architecture diagram. Recoverable labels are listed below.]
     • Pipeline client process
       • Analysis code (C/Fortran/IDL/Matlab)
       • JSOC Library calls: OpenRecords, CloseRecords, GetKeyword, SetKeyword, GetLink, SetLink, OpenDataSegment, CloseDataSegment
       • JSOC Library components: Data Segment I/O (file I/O to JSOC Disks), Record Cache (Keywords + Links + Data paths)
     • Data Record Management Service (DRMS)
       • SQL queries to the Oracle Database Server (Series Catalog, Record Catalogs)
     • Storage Unit Management Service (SUMS)
       • Calls: AllocUnit, GetUnit, PutUnit
       • Storage unit transfer (JSOC Disks, Tape Archive Service)
       • SQL queries to the Storage Database

  8. Co-I contributions and collaboration
     • Contributions from Co-I teams:
       • Software for intermediate and high level analysis modules
       • Output data series definition
         • Keywords, links, data segments, size of storage units etc.
       • Documentation (detailed enough to understand the contributed code)
       • Test data and intended results for verification
       • Time
         • Explain algorithms and implementation
         • Help with verification
         • Collaborate on improvements if required (e.g. performance or maintainability)
     • Contributions from HMI team:
       • Pipeline execution environment
       • Software & hardware resources (development environment, libraries, tools)
       • Time
         • Help with defining data series
         • Help with porting code to JSOC API
         • If needed, collaborate on algorithmic improvements, tuning for JSOC hardware, parallelization
         • Verification

  9. HMI module status and MDI heritage
     [Figure: flow chart from primary observables to intermediate and high level data products, with each processing step marked by code status. Recoverable labels are listed below.]
     • Code status legend:
       • Instrument specific code, Stanford is primary developer
       • Research code exists in the community
       • New codes under development (HAO)
       • Research code currently used
       • Standalone "production" code routinely used
       • MDI pipeline modules exist
     • Primary observables: Doppler Velocity; Stokes I,V (Line-of-sight Magnetograms); Stokes I,Q,U,V (Vector Magnetograms); Continuum Brightness (Brightness Images)
     • Intermediate and high level data products:
       • Heliographic Doppler velocity maps; Spherical Harmonic Time series; Mode frequencies and splitting; Internal rotation; Internal sound speed
       • Ring diagrams; Local wave frequency shifts; Full-disk velocity and sound speed maps (0-30Mm); Carrington synoptic v and cs maps (0-30Mm)
       • Tracked Tiles of Dopplergrams; Time-distance Cross-covariance function; Wave travel times; High-resolution v and cs maps (0-30Mm); Deep-focus v and cs maps (0-200Mm)
       • Egression and Ingression maps; Wave phase shift maps; Far-side activity index
       • Line-of-Sight Magnetic Field Maps; Full-disk 10-min Averaged maps; Tracked Tiles
       • Vector Magnetograms (Fast algorithm); Vector Magnetograms (Inversion algorithm); Vector Magnetic Field Maps; Coronal magnetic Field Extrapolations; Coronal and Solar wind models
       • Tracked full-disk 1-hour averaged Continuum maps; Solar limb parameters; Brightness feature maps

  10. Questions this meeting should address
     • List of all science data products
       • Which data products, including intermediate ones, should be produced by JSOC?
       • What cadence, resolution, coverage etc. will/should each data product have?
         • Eventually a JSOC series description must be written for each one.
       • Which data products should be computed on the fly and which should be archived?
       • Have we got the basic pipeline right? Are there maturing new techniques that have been overlooked?
     • Detailing each branch of the processing pipeline
       • What are the detailed steps in each branch?
       • Can some of the computational steps be encapsulated in general tools that can be shared among different branches (example: tracking)?
       • What are the computer resource requirements of the computational steps?
     • Contributed analysis modules
       • Who will contribute code?
       • Which codes are mature enough for inclusion? They should be at least working research code now, since integration has to begin by c. mid 2006.

  11. Example: Global Seismology Pipeline
