
GLAST Large Area Telescope Data Access Tony Johnson Stanford Linear Accelerator Center



Presentation Transcript


  1. Gamma-ray Large Area Space Telescope GLAST Large Area Telescope Data Access Tony Johnson Stanford Linear Accelerator Center tonyj@slac.stanford.edu http://glast-ground.slac.stanford.edu/

  2. Outline • Topics Covered • xrootd • LAT Data Catalog • Features • Web Interface • Tools • Download Manager • Skimmer • WIRED • Astro Server • Miscellaneous

  3. xrootd • xrootd • System developed at SLAC to manage large datasets • Distributes files across disks • Maximizes throughput • Minimizes manual disk management • Automates archiving datasets to (and restoring from) tape • Provides more reliability and scalability than NFS • Supports access control based on GLAST collaborator list • Has been in use for OpsSim2 and “Big MC Run” • Mostly working smoothly • Miscellaneous idiosyncrasies that need to be understood • Timeout problems when reading files

  4. LAT Data Catalog • Data catalog is a database designed for tracking LAT datasets • Can be used with • Disk files in AFS, NFS, or XROOTD servers, or tape archives • Data created inside or outside of processing pipeline • Data created/stored at SLAC or elsewhere • One or more locations per dataset • Simplifies access to data by providing a uniform view of files irrespective of their physical location • Allows data to be organized into a tree of “virtual” folders • Folders don’t have to correspond to physical location of data • Allows data to have associated “meta-data” • Some meta-data is required and verified by catalog • size, location, run range, creation date • Other meta-data is user-defined and arbitrarily extensible • Data can be • Browsed using virtual folders and “groups” • Folders contain arbitrary sub-folders, datasets and groups • Groups contain homogeneous list of datasets • Searched using meta-data • E.g. DatasetType=MC && RunMin > 50 && RunMin < 100 • Data crawler • As new datasets are registered crawler validates files and extracts meta-data (file size, number of events, etc).
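To make the folder and meta-data ideas on this slide concrete, here is a minimal in-memory sketch in Python. It is not the Data Catalog's Java API or schema: the `Dataset` class, the `search` function, and the sample entries and locations are all hypothetical, chosen only to mirror the example query `DatasetType=MC && RunMin > 50 && RunMin < 100`.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a catalog entry; not the real Data Catalog schema."""
    name: str
    folder: str      # virtual folder path, independent of the physical location
    locations: list  # one or more physical locations for the same dataset
    metadata: dict = field(default_factory=dict)


def search(datasets, folder, predicate):
    """Return datasets under a virtual folder whose meta-data satisfy the predicate."""
    prefix = folder.rstrip("/") + "/"
    return [d for d in datasets
            if (d.folder + "/").startswith(prefix) and predicate(d.metadata)]


# Made-up entries: names, folders and locations are illustrative only.
catalog = [
    Dataset("run-a", "/MC-Tasks/demo", ["root://host-a//data/run-a.root"],
            {"DatasetType": "MC", "RunMin": 60}),
    Dataset("run-b", "/MC-Tasks/demo",
            ["/nfs/data/run-b.root", "root://host-a//data/run-b.root"],
            {"DatasetType": "MC", "RunMin": 120}),
    Dataset("run-c", "/Data/demo", ["/afs/data/run-c.root"],
            {"DatasetType": "DATA", "RunMin": 70}),
]

# Equivalent of the slide's example: DatasetType=MC && RunMin > 50 && RunMin < 100
hits = search(
    catalog, "/MC-Tasks",
    lambda m: m.get("DatasetType") == "MC" and 50 < m.get("RunMin", -1) < 100,
)
print([d.name for d in hits])
```

The point of the sketch is that the search never looks at `locations`: selection works on folders and meta-data only, which is what lets the catalog present a uniform view regardless of where the files live.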

  5. LAT Data Catalog - Web Interface • Access/authentication handled by web • Dataset description • Events, file size, run range automatically set by “crawler” • Supports mirroring at multiple sites • Browsable tree of datasets • Meta-data added by creator • http://glast-ground.slac.stanford.edu/DataCatalog/

  6. LAT Data Catalog - Tools • Pipeline Tools • From within “Pipeline Scriptlet” datasets can be • registered together with meta-data and multiple locations • located using meta-data and passed to subsequent processing stages • Command Line Tools • Available now • registerDataset • Wildcards supported for registering many datasets at once • find • List/search for files • addLocation • addMetadata • Coming soon • remove • move • Java API • Programmatic access to full functionality • More Info • Data catalog User’s Guide • http://confluence.slac.stanford.edu/display/ds/Data+Catalog+Users+Guide

  7. Recent Improvements • Line-mode client find command
     • datacat find -G merit /MC-Tasks/OpsSim/opssim2-GR-v13r9/runs -s RunMin
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000000-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000001-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000002-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000003-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000004-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000005-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000006-merit.root
     • datacat find --recurse --search-groups -F 'DataType=="MERIT"&&nMetStart>=257731200 && nMetStart<=257731202' -S SLAC_XROOT -s TaskName -s Name /MC-Tasks/OpsSim/
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-HEAD1-1041-2-6/merit/opssim2-GR-HEAD1-1041-2-6-000000-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-HEAD1-1041-2-6/merit/opssim2-GR-HEAD1-1041-2-6-000001-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000000-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000001-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p1/merit/opssim2-GR-v13r9p1-000000-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p1/merit/opssim2-GR-v13r9p1-000001-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2/merit/opssim2-GR-v13r9p2-000000-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2/merit/opssim2-GR-v13r9p2-000001-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2-np/merit/opssim2-GR-v13r9p2-np-000000-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2-np/merit/opssim2-GR-v13r9p2-np-000001-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p3/merit/opssim2-GR-v13r9p3-000000-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p3/merit/opssim2-GR-v13r9p3-000001-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-nocel/merit/opssim2-nocel-000000-merit.root
       root://glast-rdr//glast/mc/OpsSim/opssim2-nocel/merit/opssim2-nocel-000001-merit.root
     • Available now in DEV, feedback encouraged • Dan is preparing an addition to the data catalog user’s guide • Enhancements to data catalog access in pipeline • Access meta-data from search results
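The `datacat find` output shown on this slide is a plain list of xrootd URLs, so it is straightforward to post-process in a script. The following is a small sketch of one way to group such a list by task. The `group_by_task` helper is hypothetical, and it assumes the `.../<task>/merit/<file>` layout seen in the output; the two sample lines are copied from the slide.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Two lines copied from the slide; in practice this text would come from the command's stdout.
find_output = """\
root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000000-merit.root
root://glast-rdr//glast/mc/OpsSim/opssim2-nocel/merit/opssim2-nocel-000000-merit.root
"""


def group_by_task(text):
    """Group xrootd URLs by task directory, assuming a .../<task>/merit/<file> layout."""
    groups = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = urlsplit(line)
        if parts.scheme != "root":
            continue  # ignore anything that is not an xrootd URL
        segments = parts.path.split("/")
        if len(segments) < 3:
            continue  # path too short to contain <task>/<type>/<file>
        groups[segments[-3]].append(line)
    return dict(groups)


for task, urls in group_by_task(find_output).items():
    print(task, len(urls))
```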

  8. Recent Improvements • New faster crawler • Original crawler was not able to keep up with MC running at full throttle. • New crawler processes files in parallel and can easily keep up • During Ops Sim2 problems discovered with files >2GB in length • Now fixed
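The slide does not describe how the new crawler is implemented. As an illustration of the general idea of processing files in parallel, here is a minimal Python sketch using a thread pool. The function names are hypothetical, and only the file size is extracted, since reading the number of events would need ROOT.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def crawl_one(path):
    """Validate one file and extract basic meta-data (size only in this sketch)."""
    if not os.path.isfile(path):
        return path, {"valid": False}
    return path, {"valid": True, "size": os.path.getsize(path)}


def crawl(paths, workers=4):
    """Process newly registered files in parallel and collect their meta-data."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(crawl_one, paths))


# Demo with throwaway files standing in for newly registered datasets.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_paths = []
    for index in range(3):
        demo_path = os.path.join(demo_dir, f"demo-{index}.root")
        with open(demo_path, "wb") as handle:
            handle.write(b"x" * (index + 1))
        demo_paths.append(demo_path)
    demo_paths.append(os.path.join(demo_dir, "missing.root"))
    results = crawl(demo_paths)
    for demo_path in demo_paths:
        print(os.path.basename(demo_path), results[demo_path])
```

Threads suit this kind of work because each file check is dominated by I/O rather than computation.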

  9. Status/Problems/Plans • Problems • Can be painfully slow (with 5,000,000 datasets) • New Oracle database being tested now • Karen working on adding “materialized views” • Further optimization of queries needed • Sensible pagination of large datasets • Web interface needs to allow selection of data based on • Run number range • Time range • Meta-data search (cf. line-mode client) • File versions • As of Ops Sim 2 L1Proc registers multiple versions of files • r0257998848_v001_merit.root • r0257998848_v002_merit.root • Data catalog does not know these are multiple versions of the same file • Sends them both to the skimmer → duplicate events • Propose to add versioning to data catalog (show only latest by default) • Need Custom Views of data • E.g. All ASP products for run nnn source abc • Plan • Fix problems
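The proposed "show only latest by default" behaviour can be sketched in a few lines of Python. This is not how the data catalog implements versioning: the naming pattern is inferred only from the two example file names on the slide, and the third file name in the demo is made up.

```python
import re

# Assumed naming pattern <stem>_v<NNN>_<kind>.root, inferred from the two examples only.
VERSIONED = re.compile(r"^(?P<stem>.+)_v(?P<version>\d+)_(?P<kind>[^_]+\.root)$")


def latest_versions(filenames):
    """Keep only the highest-numbered version of each (stem, kind) pair."""
    latest = {}
    for name in filenames:
        match = VERSIONED.match(name)
        if match is None:
            # Names without a version tag are kept as-is, keyed by the full name.
            latest[(name, "")] = (0, name)
            continue
        key = (match["stem"], match["kind"])
        version = int(match["version"])
        if key not in latest or version > latest[key][0]:
            latest[key] = (version, name)
    return sorted(name for _, name in latest.values())


files = [
    "r0257998848_v001_merit.root",
    "r0257998848_v002_merit.root",
    "r0257998849_v001_merit.root",  # made-up third file for illustration
]
print(latest_versions(files))
```

Filtering like this before a list is handed to the skimmer would avoid the duplicate events described above.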

  10. Download Manager • One-click download of multiple files • Inherits authorization from web login • note no anonymous FTP in future – SLAC account will be required for data access • Works with ftp:, http: and root: • Validates files (length, checksum) against data catalog • Supports simultaneous download of multiple files • Does not download files which already exist in target dir • So easy to fetch recently added files • Can resume download of partially downloaded files
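The skip, resume and validate behaviour listed on this slide amounts to a small decision per file. Here is a hedged Python sketch of that decision; it is not the Download Manager's code. The `plan_download` helper is hypothetical, and MD5 is an assumption, since the slide does not say which checksum the data catalog stores.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Compute an MD5 digest by streaming the file in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_download(target_path, expected_size, expected_md5):
    """Decide what to do for one file: 'download', 'resume', 'skip' or 'redownload'."""
    if not os.path.exists(target_path):
        return "download"
    local_size = os.path.getsize(target_path)
    if local_size < expected_size:
        return "resume"      # partial file: continue from byte offset local_size
    if local_size == expected_size and file_md5(target_path) == expected_md5:
        return "skip"        # already complete and valid
    return "redownload"      # too long, or checksum does not match the catalog


# Demo: the same target file at three stages of a download.
with tempfile.TemporaryDirectory() as target_dir:
    path = os.path.join(target_dir, "demo.root")
    payload = b"example payload"
    expected = hashlib.md5(payload).hexdigest()
    print(plan_download(path, len(payload), expected))  # no local file yet
    with open(path, "wb") as handle:
        handle.write(payload[:5])
    print(plan_download(path, len(payload), expected))  # partial local file
    with open(path, "wb") as handle:
        handle.write(payload)
    print(plan_download(path, len(payload), expected))  # complete local file
```

Checking the size before the checksum keeps the common cases (missing or partial file) cheap, because the digest is computed only when the length already matches.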

  11. Status/Problems/Plans • Several problems discovered during Ops Sim 2 • 100% CPU usage after file recovery (fixed) • Bad error message if checksum inconsistent (fixed) • Problems downloading files >2GB (almost fixed) • New feature • Start/Pause download requested (now available) • Feature requests pending • Ability to download select run/time ranges • This will work automatically once this feature is added to data catalog web application • Non-GUI version for automated download/sync of data • Ability to select files to download from GUI (without web)

  12. LAT Data Skimmer • Allows data to be selected using “TCut” on tuple columns • Can output either Root or Fits (FT1) files • Uses Pipeline II for data processing • Allows parallel processing for large tasks • Output available for download for 10 days • Complete skim history maintained for later reuse

  13. 3 Ways to Access Data Skimmer • Directly from Data Portal • http://glast-ground.slac.stanford.edu/DataPortal/ • click on “Simple Skimmer” • Data Processing Page(s) • From the Data Catalog

  14. LAT Data Skimmer

  15. Status/Problems/Plans • Problems • Backend/root crashes • new (compiled) backend available soon • E-mail notification should include data dir even if failed • Need to be able to navigate from pipeline → data dirs • Skimmer improvements in progress • Ability to skim more types of files • “svac”, “cal” and “gcr” added by David Chamont • Web interface needs to catch up • Ability to output more event types • Full Recon, Digi, MC trees • “Extended Event” (intermediate between FT1 and Merit) • Event Lists • CompositeEventLists (CEL) files • Access to more “expert” options

  16. Event Display (WIRED) • WIRED allows quick look at detector response • can be installed directly from Web with no additional GLAST software required. • Uses “HepRep” interchange format/infrastructure (shared with FRED)

  17. Event Display (WIRED)

  18. Status/Problems/Plans • According to rumour doesn’t work outside my office • Actually it doesn’t work in my office either • But it did work fine for DC2 data • Invariant under spatial translations/rotations • Now being hooked up to data catalog/xrootd • Issue related to CEL files in gleam being investigated • Should be working again in next few days • “Event Display” link will appear in the data catalog • Will support browsing events or selection of specific events

  19. Astro Data Server • Similar to skimmer, allows events to be selected using cuts • Cuts can only be on position in the sky, energy, time, and event category • Works much faster than Skimmer • Currently loaded with DC2 data • Currently being refurbished for use with Service Challenge data and beyond • Will load all events as soon as they are produced by L1Proc • User will be able to select • all data including partial runs • only “complete” runs • Loose event cuts CTBClassLevel>1 • User can select CTBClassLevel category • Able to output FT1, FT2, Extended event files, Merit root files • API for programmatic event selection • Will be used by ASDC tools • Closer integration with data catalog, skimmer
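A selection on sky position, energy and time, as described on this slide, can be sketched as a cone cut plus two range cuts. The Python below only illustrates the geometry and is not the Astro Data Server's API: the event field names (`ra`, `dec`, `energy`, `time`), the units and the sample events are all hypothetical.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def select(events, ra, dec, radius_deg, e_min, e_max, t_min, t_max):
    """Keep events inside a cone on the sky and within the energy and time ranges."""
    return [
        ev for ev in events
        if angular_separation_deg(ev["ra"], ev["dec"], ra, dec) <= radius_deg
        and e_min <= ev["energy"] <= e_max
        and t_min <= ev["time"] <= t_max
    ]


# Made-up events; field names and units are illustrative only.
events = [
    {"ra": 10.0, "dec": 0.0, "energy": 500.0, "time": 100.0},
    {"ra": 12.0, "dec": 1.0, "energy": 50.0, "time": 150.0},
    {"ra": 80.0, "dec": 20.0, "energy": 800.0, "time": 120.0},
]
selected = select(events, ra=11.0, dec=0.0, radius_deg=5.0,
                  e_min=100.0, e_max=1000.0, t_min=0.0, t_max=200.0)
print(len(selected))
```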

  20. Astro Data Server • Astro data server will remember the last set of parameters you used • Astro Server also has a “Favorites” page • Keeps a list of your “favorite” search parameters

  21. Status/Problems/Plans • Was used for SC2 55 day run • Not used in Ops Sim 2 • Still plan to • Load data from L1Proc • Add programmatic interface for use by ASP/ASDC tools • Better integration with Data Portal • Bottom of priority list

  22. Miscellaneous • Data Access Restrictions • Starting very soon (this week hopefully) you will need to be a “glast collaborator” to access files from xrootd • You will need to login to access data catalog/download manager • Need to define standard skims • Automate their production • Part of RSP? • Automate their registration in data catalog • Access to ASP/RSP data has not been discussed here • But is in the plan • Feedback from Ops Sim2 has been very useful • Not all digested yet • Need more/better documentation • Data Access frequently asked questions • http://confluence.slac.stanford.edu/x/zgAz • Please suggest more FAQs • More feedback welcome • http://glast-ground.slac.stanford.edu/DataPortal/
