
Tracking Metadata and Lineage of the Data Processing Chain for Mapping Snow Cover Properties with the NASA MODIS

James Frew¹, Thomas H. Painter², Peter Slaughter¹, Jeff Dozier¹. ¹Donald Bren School of Environmental Science and Management, University of California, Santa Barbara


Presentation Transcript


  1. Tracking Metadata and Lineage of the Data Processing Chain for Mapping Snow Cover Properties with the NASA MODIS. James Frew¹, Thomas H. Painter², Peter Slaughter¹, Jeff Dozier¹. ¹Donald Bren School of Environmental Science and Management, University of California, Santa Barbara. ²National Snow and Ice Data Center, University of Colorado, Boulder

  2. Outline • Motivation • Snow mapping product • Implications for hydrologic modeling • Lineage Capture • Wrapping: the ESSW experience • Instrumenting, overriding, monitoring: the (ongoing) ES3 experience

  3. MODIS image – Sierra Nevada. EOS Terra MODIS, 07 March 2004. MOD09 Surface Reflectance; bands at 0.555, 0.645, and 0.858 µm

  4. Snow-covered area and grain size

  5. Hindu Kush 2003 DOY 070

  6. Colorado Rockies, CLPX, 13 March 2002

  7. Model structure: MODIS snow-area / albedo

  8. Lineage Capture, Take 1 The ESSW experience

  9. Using Existing Science Applications • No “standard” Earth science computing environment • commercial packages (ArcInfo, MATLAB, …) • public packages/models (MM5, MODTRAN, …) • locally-developed codes • arbitrary combinations of the above • Example: SST from AVHRR • commercial, standalone programs • parameters highly customized for UCSB • How do we get these programs to • communicate • cooperate with ESSW, without rewriting them? [Diagram: Receive; Ingest and Calibrate; Navigate (Manual/Automatic); Sea Surface Temp (SST); Rectify; SST Maps]

  10. Lineage: Current Best Practice

  11. Earth System Science Workbench (ESSW). Producer and consumer issues can both be addressed by a laboratory metaphor • Experiment • Network of models • … ingesting / synthesizing data • … generating products • Laboratory • Experiment execution environment • Computing + storage = accessibility + scalability • Lab Notebook • Persistent storage that can be queried • Keeps track of all experiments • Documentation + lineage = accountability

  12. Wrap Your App: Scripts Talk to ESSW • No changes, just additions • Wrapper scripts • Make program (groups) look like ESSW experiments • use Perl API • Lab Notebook daemon • Accepts API commands • Creates XML documents • Sends to database • ESSW database • XML metadata & DTDs • Tabular metadata • XML search terms • Lineage links [Diagram labels: XML + SQL; Perl API; Lab Notebook daemon; Receive; Ingest and Calibrate; ESSW Database; Navigate (Manual/Automatic); Sea Surface Temp (SST); Rectify; Java; MySQL; JDBC; Perl; SST Maps]

  13. ESSW Metadata management • Lab Notebook daemon verifies XML metadata document • Experiment step metadata stored for product lineage tracking • Complete metadata document stored in custom database table • XML DTD ← 1:1 → database table • (n+1)th column is document itself • Some metadata values extracted into database tables • DTD contains column names and types for some elements • Always save all the XML, even if you don’t know how to “columnize” all of it
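
The storage rule on slide 13 (extract the values you know how to "columnize", and always keep the whole document in one extra column) can be sketched as follows. This is a minimal illustration, not the ESSW implementation: it assumes SQLite in place of MySQL, the sample document, the `example-sat` value, and the `COLUMNS` mapping are hypothetical, and `ET.fromstring` only checks well-formedness rather than validating against a DTD.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical metadata document; element names and values are illustrative only.
DOC = """<avhrr_sst>
  <scene_id><satellite>example-sat</satellite><pass_date>2004-03-07</pass_date></scene_id>
  <note>free text that has no dedicated column</note>
</avhrr_sst>"""

# Elements we know how to "columnize": column name -> path inside the document.
COLUMNS = {"satellite": "scene_id/satellite", "pass_date": "scene_id/pass_date"}


def store(conn, table, xml_text):
    """Store extracted values in n columns and the full document in column n+1."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is not well-formed
    values = [root.findtext(path) for path in COLUMNS.values()]
    col_defs = ", ".join(f"{name} TEXT" for name in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_defs}, document TEXT)")
    placeholders = ", ".join("?" for _ in range(len(COLUMNS) + 1))
    conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", values + [xml_text])


conn = sqlite3.connect(":memory:")
store(conn, "avhrr_sst", DOC)
row = conn.execute("SELECT satellite, pass_date, document FROM avhrr_sst").fetchone()
```

The `<note>` element has no column of its own, yet it remains retrievable because the complete document is stored alongside the extracted values.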

  14. Wrapper Example: Input Dataset
      [Diagram: AVHRR Level 1B product (avhrr_l1b); Multi-channel sea surface temperature algorithm (avhrr_sstModel); Sea surface temperature (SST) (avhrr_sst)]
      # SST experiment wrapper
      # $L1B is the input Level 1B AVHRR image file
      # $SST is the output SST image file
      # run legacy command "nitpix": creates SST image from L1B image
      $base_temp = 5.0;
      $temp_step = 0.1;
      ...
      system("nitpix base_temp=$base_temp temp_step=$temp_step ... $L1B $SST");
      # start recording ESSW metadata
      beginXMLBld($ENV{USER}, "PRODUCTION");
      # get metadata for input file
      $L1B_ID = findSciObjFromFile($L1B);

  15. Wrapper Example: Output Dataset
      [Diagram: AVHRR Level 1B product (avhrr_l1b); Multi-channel sea surface temperature algorithm (avhrr_sstModel); Sea surface temperature (SST) (avhrr_sst)]
      # create metadata for SST image
      $SST_ID = createMetadata("avhrr_sst");
      addValue($SST_ID, "avhrr_sst.scene_id.satellite", $satellite);
      addValue($SST_ID, "avhrr_sst.scene_id.pass_date", $pass_date);
      ...
      saveToDB($SST_ID, "avhrr_sst");
      closeMetadata($SST_ID);
      saveDigest($SST, $SST_ID);

  16. Wrapper Example: Process
      [Diagram: AVHRR Level 1B product (avhrr_l1b); Multi-channel sea surface temperature algorithm (avhrr_sstModel); Sea surface temperature (SST) (avhrr_sst)]
      # create metadata for SST experiment
      $exp = createExperimentMetadata("avhrr_sstModel");
      $exp_step = createExpStepMetadata($exp, "avhrr_sstExpStp");
      addValue($exp_step, "avhrr_sstExpStp.base_temp", $base_temp);
      addValue($exp_step, "avhrr_sstExpStp.temp_step", $temp_step);
      ...
      saveToDB($exp_step, "avhrr_sstExpStp");
      closeMetadata($exp_step);
      # connect input and output images to experiment
      registerExperimentInputs($exp, $L1B_ID);
      registerExperimentOutputs($exp, $SST_ID);
      # finish recording ESSW metadata
      endXMLBld();

  17. Wrapper Example: Lineage Links
      [Diagram: AVHRR Level 1B product (avhrr_l1b); Multi-channel sea surface temperature algorithm (avhrr_sstModel); Sea surface temperature (SST) (avhrr_sst)]
      # create metadata for SST experiment
      $exp = createExperimentMetadata("avhrr_sstModel");
      $exp_step = createExpStepMetadata($exp, "avhrr_sstExpStp");
      addValue($exp_step, "avhrr_sstExpStp.base_temp", $base_temp);
      addValue($exp_step, "avhrr_sstExpStp.temp_step", $temp_step);
      ...
      saveToDB($exp_step, "avhrr_sstExpStp");
      closeMetadata($exp_step);
      # connect input and output images to experiment
      registerExperimentInputs($exp, $L1B_ID);
      registerExperimentOutputs($exp, $SST_ID);
      # finish recording ESSW metadata
      endXMLBld();

  18. Process graph reconstructed from ESSW database
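
One way such a process graph could be rebuilt from stored lineage links is an upstream walk from a product of interest. The sketch below is an assumption about the general idea, not the ESSW query itself: the `LINKS` tuples stand in for rows of a lineage-link table, and only `avhrr_l1b`, `avhrr_sstModel`, and `avhrr_sst` come from the slides; `raw_pass`, `calibrate`, `rectify`, and `sst_map` are hypothetical names.

```python
# Each link is (experiment, input object, output object).
# Only the avhrr_* names appear in the slides; the others are hypothetical.
LINKS = [
    ("calibrate", "raw_pass", "avhrr_l1b"),
    ("avhrr_sstModel", "avhrr_l1b", "avhrr_sst"),
    ("rectify", "avhrr_sst", "sst_map"),
]


def upstream(product, links):
    """Return (input, experiment, output) edges leading to `product`, nearest first."""
    producers = {}
    for experiment, source, target in links:
        producers.setdefault(target, []).append((experiment, source))

    edges, seen, pending = [], set(), [product]
    while pending:
        current = pending.pop()
        for experiment, source in producers.get(current, []):
            edge = (source, experiment, current)
            if edge not in seen:  # guard against revisiting shared ancestors
                seen.add(edge)
                edges.append(edge)
                pending.append(source)
    return edges


lineage = upstream("sst_map", LINKS)
```

For the single chain above, `lineage` lists the rectify step first, then the SST model, then the calibration step.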

  19. ESSW Lessons • Providers are customers • ESIPs aren’t much good unless scientists are happy to put information in them • A light touch is the right touch • Wrapping is easier for scientists and their programmers to deal with than complete re-engineering • Scientists do write scripts, but not necessarily Perl • Scripting (gluing stuff together) comes naturally to scientists • Scientists don’t write DTDs • Nobody calls metadata APIs. ESSW was automatic, but not automatic enough…

  20. Lineage Capture, Take 2 The ES3 experience

  21. ES3: Earth System Science Server (ESSW++) [Diagram labels: data lineage tracking; MODster; OpenDAP; Watershed-scale snow product; MODIS; Microsoft TerraServer; AVHRR; Global-scale snow product; Alexandria Digital Library; Corona; BUB data storage; ROCKS processing clusters]

  22. From ESSW to ES3: Summary • Perl wrappers → “Probulators” • Perl API → web services + XML messages • MySQL → XML database(s)

  23. From Wrappers to Probulators Wrappers: Active Lineage • + • Complete control over what gets recorded • Single language/API for all wrapped events • Not tied to execution • You can even lie about what happened • – • Must explicitly script everything • Scripts can drift from reality • You can even lie about what happened

  24. From Wrappers to Probulators Probulators: Passive Lineage • + • Record what actually happened • Not just what you think happened • Not what didn’t happen • Automatic: don’t have to write new scripts for everything • – • Different flavors for different environments • Can’t just do everything in Perl…

  25. Probulator patterns • Instrumentation • Insert lineage capture instructions directly into science codes • e.g. “I just created file ‘foo’” • Typical implementation: preprocessor/precompiler • Overriding • Replace standard routines/libraries with lineage-capturing versions • e.g. open(…) → snoopy_open(…) • Typical implementation: modify execution environment • environment variables • configuration files • Passive monitoring • Trace program execution • e.g. “called open() with args foo, bar, …” • Typical implementation: strace’d shell

  26. ES3 Lineage Architecture [Diagram labels: probulator 1 … probulator n; logger; logfiles; transmitter; ES3 core]

  27. Probulating IDL: Instrumenting the code
      ;edit
      pro modscag_cleanse,prefix=prefix,ns=ns,nl=nl
      HELP, NAMES="*", OUTPUT=ES3_ENVIROMENT & ES3_LOG, $
        ENTER="modscag_cleanse", ENVIROMENT=ES3_ENVIROMENT
      ; clean up {under,over}flow of MODSCAG run
      ;
      ; Input: prefix = prefix for all of the MODSCAG output filenames
      ;        ns = number of samples
      ;        nl = number of lines
      ; Output: rewrite of the MODSCAG files
      ;
      ; t.h.painter / 1.19.2005
      ; open snow file
      ES3_openr,1,string(prefix,'snow.pic')
      snow=fltarr(ns,nl)
      readu,1,snow
      [ blah blah blah ]
      HELP, NAMES="*", OUTPUT=ES3_ENVIROMENT & ES3_LOG, LEAVE="modscag_cleanse", $
        ENVIROMENT=ES3_ENVIROMENT
      END ; modscag_cleanse

  28. Probulating IDL: Results
      <init time="20050522T234606Z" pid="31002" stime="20050522T234604Z" pstime="20050522T234256Z" ppid="30920" language="idl" user="haavar" hostname="spitting-duck.bren.ucsb.edu">
        <enviroment>
          <variable name="!PATH" value="/home/haavar/probulator//idl: /home/rsi/idl_6.1/lib/hook: […]
        </enviroment>
        <mount-points>
          <mount share="dab15:/ed15/rsi" type="nfs">/home/rsi</mount>
        </mount-points>
      </init>
      <enter region="modscag_cleanse">
        <enviroment>
          <variable type="INT" name="NL" value="2"/>
          <variable type="INT" name="NS" value="2"/>
          […]
        </enviroment>
      </enter>
      <exec time="20050522T234610Z" routine="OPENR">
        <io>
          <file read="true">/home/haavar/painter/data/tillsnow.pic</file>
        </io>
      </exec>

  29. Probulating bash: Passive Monitoring
      cat /etc/passwd | grep haavar | sed -n 's/\(.*:\)\{2\}\([0-9]\+\).*/\2/p'
      25232 1138336174.480079 open("/etc/ld.so.cache", O_RDONLY) = 3
      25232 1138336174.480215 open("/lib/libm.so.6", O_RDONLY) = 3
      […]
      25234 1138336178.887267 dup2(3, 255) = 255
      25234 1138336178.887912 pipe([3, 4]) = 0
      25234 1138336178.888257 clone(child_stack=0, […], child_tidptr=0xb7f2e708) = 25235
      25235 1138336178.889366 dup2(4, 1) = 1
      25235 1138336178.889975 pipe([3, 4]) = 0
      25235 1138336178.890326 clone(child_stack=0, […], child_tidptr=0xb7f2e708) = 25236
      25235 1138336178.891260 pipe([4, 5]) = 0
      25235 1138336178.891756 clone(child_stack=0, […], child_tidptr=0xb7f2e708) = 25237
      25235 1138336178.892753 clone(child_stack=0, […], child_tidptr=0xb7f2e708) = 25238
      25238 1138336178.894266 dup2(4, 0) = 0
      25236 1138336178.894726 dup2(4, 1) = 1
      25237 1138336178.894763 dup2(3, 0) = 0
      25237 1138336178.895581 dup2(5, 1) = 1
      […]
      25238 1138336178.897006 execve("/bin/sed", ["sed", "-n", "s/\\(.*:\\)\\{2\\}\\([0-9]\\+\\).*/\\2/p"], ["HOSTNAME=rubber-duck.bren.ucsb.edu", "TERM=xterm-color", […]
      25236 1138336178.900117 execve("/bin/cat", ["cat", "/etc/passwd"], […]
      25237 1138336178.903342 execve("/bin/grep", ["grep", "haavar"], […]

  30. Probulating bash: Results
      [… <init> same as IDL …]
      <exec time="20060027T042938.900117Z" routine="/bin/cat" pid="25236" ppid="25235">
        <arguments>
          <argument>/etc/passwd</argument>
        </arguments>
        <io>
          <pipe read="true" id="std-in"/>
          <pipe write="true" id="3"/>
          <pipe write="true" id="std-err"/>
          <file read="true">/etc/ld.so.cache</file>
          […]
          <file read="true">/etc/passwd</file>
        </io>
      </exec>
      <exec time="20060027T042938.903342Z" routine="/bin/grep" pid="25237" ppid="25235">
        <arguments>
          <argument>haavar</argument>
        </arguments>
        <io>
          <pipe read="true" id="3"/>
          <pipe write="true" id="4"/>
          […]
        </io>
      </exec>
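
The step from raw trace lines to `<exec>` reports might look roughly like the sketch below. It is an illustration only, not the ES3 probulator: it handles just `execve` lines, keeps the epoch timestamp as-is instead of reformatting it, and omits `ppid` and the `<io>` section, which would require following the `clone`, `pipe`, `dup2`, and `open` calls. The `TRACE` sample is modeled on slide 29, with the elided environment list written as `[...]`.

```python
import re
import xml.etree.ElementTree as ET

# Sample lines modeled on the trace on slide 29 (environment list elided).
TRACE = [
    '25236 1138336178.900117 execve("/bin/cat", ["cat", "/etc/passwd"], [...]',
    '25237 1138336178.903342 execve("/bin/grep", ["grep", "haavar"], [...]',
    "25235 1138336178.889366 dup2(4, 1) = 1",
]

QUOTED = r'"((?:[^"\\]|\\.)*)"'                    # one double-quoted string
ARGV = r'\[((?:"(?:[^"\\]|\\.)*"(?:, )?)*)\]'      # a bracketed list of quoted strings
EXECVE = re.compile(r'^(\d+) (\d+\.\d+) execve\("([^"]+)", ' + ARGV)


def parse_execve(line):
    """Return a record for an execve line, or None for any other trace line."""
    match = EXECVE.match(line)
    if match is None:
        return None
    pid, stamp, routine, argv = match.groups()
    args = re.findall(QUOTED, argv)[1:]  # drop argv[0], the program name
    return {"pid": pid, "time": stamp, "routine": routine, "args": args}


def to_exec_element(record):
    """Build an <exec> element carrying the routine, pid, and arguments."""
    element = ET.Element(
        "exec", time=record["time"], routine=record["routine"], pid=record["pid"]
    )
    arguments = ET.SubElement(element, "arguments")
    for arg in record["args"]:
        ET.SubElement(arguments, "argument").text = arg
    return element


records = [rec for rec in map(parse_execve, TRACE) if rec is not None]
reports = [ET.tostring(to_exec_element(rec), encoding="unicode") for rec in records]
```

Matching the argument list as a sequence of quoted strings, rather than "everything up to the next `]`", keeps arguments such as the `sed` expression with its `[0-9]` from truncating the match.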

  31. Now What? • Probulator reports are not universally unique • Q: How do we hook separate reports together? • A: Logger assigns UUIDs to • Data streams • Processes • Jobs (workflows) • Lineage is not explicit • Q: How do we publish lineage? • A: ES3 Core builds a serialized graph
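
The two answers on slide 31 (assign UUIDs, then build a serialized graph) can be sketched together. Everything here is an assumption for illustration: the report fields `routine`, `reads`, and `writes`, the `pipe:3` style stream names, and the JSON serialization are hypothetical, and a real logger would also need to tell apart streams that reuse the same name across jobs.

```python
import json
import uuid

# Hypothetical simplified probulator reports for the cat | grep pipeline.
REPORTS = [
    {"routine": "/bin/cat", "reads": ["/etc/passwd"], "writes": ["pipe:3"]},
    {"routine": "/bin/grep", "reads": ["pipe:3"], "writes": ["pipe:4"]},
]


def build_graph(reports):
    """Assign UUIDs to processes and data streams, then link them with edges."""
    stream_ids = {}  # stream name -> UUID, shared across all reports
    nodes, edges = {}, []

    def stream_id(name):
        if name not in stream_ids:
            stream_ids[name] = str(uuid.uuid4())
            nodes[stream_ids[name]] = {"kind": "data", "name": name}
        return stream_ids[name]

    for report in reports:
        process_id = str(uuid.uuid4())
        nodes[process_id] = {"kind": "process", "name": report["routine"]}
        for name in report["reads"]:
            edges.append([stream_id(name), process_id])   # data -> process
        for name in report["writes"]:
            edges.append([process_id, stream_id(name)])   # process -> data
    return {"nodes": nodes, "edges": edges}


graph = build_graph(REPORTS)
serialized = json.dumps(graph, indent=2)
```

Because both reports refer to `pipe:3`, they resolve to the same UUID, which is what connects the output of `cat` to the input of `grep` in the resulting graph.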

  32. Thanks to: Current • Mike Colee • Stephane Maritorena • Dominic Metzger • Karl Rittger • Dave Siegel Former • Anurag Acharya • Rajendra Bose • Scott Denning • Debbie Donahue • Jim Duff • Calin Duma • Erik Fields • Jim Gray • Steve Miley • Jordan Morris • Mark Pelletier • Pete Peterson • Walter Rosenthal • Klaus Schauser • Håvar Valeur

  33. To Probulate Further…
      Bose, R. and Frew, J., 2005. Lineage retrieval for scientific data processing: a survey. ACM Computing Surveys, vol. 37, no. 1, pp. 1-28. doi:10.1145/1057977.1057978
      Dozier, J., and Painter, T.H., 2004. Multispectral and hyperspectral remote sensing of alpine snow properties. Annual Review of Earth and Planetary Sciences, vol. 32, pp. 465-494. doi:10.1146/annurev.earth.32.101802.120404
      Molotch, N.P., Painter, T.H., Bales, R.C., and Dozier, J., 2004. Incorporating remotely sensed snow albedo into spatially distributed snowmelt modeling. Geophysical Research Letters, vol. 31, L03501. doi:10.1029/2003GL019063
      Frew, J. and Bose, R., 2001. Earth System Science Workbench: a data management infrastructure for Earth science products. In: Kerschberg, L. and Kafatos, M. (eds.), Proceedings, 13th International Conference on Scientific and Statistical Database Management (SSDBM 2001), pp. 180-189. doi:10.1109/SSDM.2001.938550
      http://www.snow.ucsb.edu : Publications
