
The Data Access Layer for D0 Run II Design and Features of SAM

Vicky White for the SAM team: Lee Lueking, Vicky White, Heidi Schellman, Igor Terekhov, Matt Vranicar, Julie Trumbo, Rich Wellner, Steve White, Sinisa Veseli.


Presentation Transcript


  1. The Data Access Layer for D0 Run II: Design and Features of SAM
     Vicky White for the SAM team
     Lee Lueking, Vicky White, Heidi Schellman, Igor Terekhov, Matt Vranicar, Julie Trumbo, Rich Wellner, Steve White, Sinisa Veseli

  2. SAM Overview
     • SAM stands for Sequential Access Model. It is a major part of the D0 Data Handling System, handling all data access:
       • from the writing of RAW event data files directly from the online system,
       • to the reading of whole data-streams into the Farms for reconstruction and writing of reconstructed data,
       • to read/write access to Thumbnail data and datasets of AOD,
       • to read access to RAW or reconstructed data for a single event.
     • Other services and layers of functionality are needed to build the entire Data Handling System. These include
       • access to tapes and movement of data from tape to disk
       • regulation of analysis jobs through a batch system (where needed)
       • management of production jobs through a Farm system

  3. Software Layers for Data Management and Analysis
     (Diagram labels) User Programs/Jobs for Analysis and writing of files; e.g. Farm specific job hooks; Farms Production System; e.g. Analysis job resources and parameters, stages of jobs; e.g. D0 file access package d0om; D0 Framework package calls; Batch System(s)/Job Scheduler; Data Access (SAM); Storage Management (ENSTORE); e.g. Stage files; Inter-Station; Operator Interaction
     (Hardware Resources) CPUs and scratch disk; Tape Drives; Network Disk; Tape Robots; Local Disk; Tape Shelves

  4. D0 Hardware Environment

  5. Goals and Strategies
     • Cluster data into tertiary storage in a manner corresponding to expected access patterns.
     • Cache frequently accessed data on disk.
     • Organize data access to optimize use of robot, tape drive, and network resources.
     • Carefully track locations and processing steps for all data.
     • Estimate the resources required before access requests are initiated.
     • Provide user interfaces which integrate easily into data processing and analysis activities, i.e. D0 framework programs.

  6. Organization of data - an optimization strategy
     (Diagram labels) File & Event Catalog Database ties together all data tiers; User and physics group (derived) data; Warm Cache; Event Data Tiers; Warm Cache; Physical Clustering

  7. Categorize Access patterns - handling built into SAM
     (Diagram labels) Mass Storage; Type of data/mode of processing; Data Consumers; One user many jobs; Farm reconst.; Freight Train; Pick Event; User File; Thumbnail
     (Legend) Group of Users; Data flow; File; Disk Storage; Tape Storage; Pipeline Name; Single User; Event

  8. Enstore
     • Enstore is the storage management system (E176 - next talk)
     • Provides:
       • encp - copy files between media and disk
       • pnfs - namespace of files (courtesy DESY)
       • enstore - user control interfaces to Enstore system
       • web interface - details of what the system is doing
       • enstore_tape - volume import mechanism
     • Enstore provides a scalable system - designed for each mover/tape combination to read/write data to tape at full tape rate (about 10 MB/sec each expected)

  9. SAM Database knows all!
     The SAM database tracks information about
     • Events, Files, File locations, File processing and lineage; meta-data and how/when analysed are known for every file
     • Also configuration and operational data for SAM itself
       • configure and control resources, set caching policies, etc.
       • support robustness features -- restart
       • track and understand performance of the system itself
     • The database keeps excellent track of the correlation between “Physics Data” and “Conditions Data”.
     • Support for data export and remote sites

  10. SAM Simplified Schema - Event/File meta-data part
      (Diagram labels)
      • Events: ID, Event Number, Trigger L1, Trigger L2, Trigger L3, Off-line Filter, Thumbnail
      • Files: ID, Name, Format, Size, # Events
      • Related entities: Run, Volume, Data Tier, Physical Data Stream, Trigger Configuration, Project, Event-File Catalog, Creation & Processing Info
      • Conditions: Run Conditions, Luminosity, Calibration, Alignment
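The event and file parts of this schema can be pictured as two record types joined by an event-file catalog. The following is a minimal sketch only: the field names loosely follow the slide labels, while the Python types, the in-memory dictionaries standing in for database tables, and the method names are all assumptions, not the actual Oracle schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileRecord:
    # Field names follow the "Files" labels on the slide; types are assumed.
    file_id: int
    name: str
    file_format: str
    size_bytes: int
    n_events: int
    data_tier: str


@dataclass
class EventRecord:
    # Field names follow the "Events" labels on the slide; types are assumed.
    event_id: int
    run: int
    event_number: int
    trigger_l1: int
    trigger_l2: int
    trigger_l3: int


@dataclass
class EventFileCatalog:
    """Hypothetical in-memory stand-in for the event-file catalog."""

    files: Dict[int, FileRecord] = field(default_factory=dict)
    events: Dict[int, EventRecord] = field(default_factory=dict)
    # Maps an event id to the ids of every file that contains the event.
    event_to_files: Dict[int, List[int]] = field(default_factory=dict)

    def add_file(self, rec: FileRecord) -> None:
        self.files[rec.file_id] = rec

    def link(self, event: EventRecord, file_id: int) -> None:
        """Record that `event` is stored in the file with id `file_id`."""
        if file_id not in self.files:
            raise KeyError(f"unknown file id {file_id}")
        self.events[event.event_id] = event
        self.event_to_files.setdefault(event.event_id, []).append(file_id)

    def files_for_event(
        self, event_id: int, data_tier: Optional[str] = None
    ) -> List[FileRecord]:
        """Return the files holding an event, optionally for one data tier."""
        recs = [self.files[i] for i in self.event_to_files.get(event_id, [])]
        if data_tier is not None:
            recs = [r for r in recs if r.data_tier == data_tier]
        return recs
```

A lookup such as `catalog.files_for_event(42, data_tier="raw")` illustrates the kind of query a single-event read would need: go from an event to the file that holds it in a given tier.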

  11. Processing Chains including Reprocessing/Merging/Splitting
      (Diagram labels) Typical file split; Reprocessing; Potential file merge location; EDU50,250; Bad; Archived; Reco Farm; Good; Freight train processing; Other; DAQ-L3; Analysis processing; Thumbnail; Thumbnail processing; Other streams; Pick Events selection; Potential file merge location

  12. The database is implemented in Oracle - why …
      • robust and mature product - essential to stability -- need high availability system
      • size of database will be large (including Event Catalog), up to 0.5 TB or more - needs partition/backup/index support for large databases
      • many design tools and monitoring, backup and tuning tools available
      • Using Oracle Enterprise Manager to manage and monitor databases

  13. SAM is a fully distributed client/server system
      Both internal and client interfaces are defined in IDL and implemented using CORBA. This gives us
      • simple clients
      • support for multiple languages -- C++, Python, Java
      • support for multiple platforms right from the start - stay on top of technology changes (Linux, IRIX, OSF1, Sun are the current platforms)

  14. SAM command -> Servers
      (Clients) sam command; web page/GUI
      • Station Master: manages disk cache and all projects on a single ‘Station’. Interfaces with Batch system
      • Project Master or File Storage Server: arranges the delivery of the set of files for a single project - or stores a file, records location
      • Database Server: supplies information, resolves queries, records transactions and file information

  15. Behind the scenes are more servers ...
      (Diagram labels) Station; CORBA Name Server; Project or File Storage; Log; Optimizer; Database; Info; Stager(s)
      • Stager: program which copies or ‘gets’ a file for you when it is not in the local disk cache
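The stager's role, fetching a file only when it is missing from the local disk cache, can be sketched as a cache with a fetch-on-miss callback. This is an illustration under stated assumptions: the class name `StationCache`, the `stage` callback, the least-recently-used eviction rule and the capacity counted in files are all hypothetical; the slides say only that caching policies are configurable.

```python
from collections import OrderedDict
from typing import Callable


class StationCache:
    """Hypothetical model of a station disk cache (capacity counted in files)."""

    def __init__(self, capacity: int, stage: Callable[[str], bytes]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.stage = stage          # stand-in for the stager's copy from tape
        self.files = OrderedDict()  # file name -> contents, least recent first
        self.hits = 0
        self.misses = 0

    def get(self, name: str) -> bytes:
        """Return a file, staging it first if it is not already cached."""
        if name in self.files:
            self.hits += 1
            self.files.move_to_end(name)  # mark as most recently used
            return self.files[name]
        self.misses += 1
        data = self.stage(name)           # cache miss: ask the stager
        self.files[name] = data
        if len(self.files) > self.capacity:
            self.files.popitem(last=False)  # evict the least recently used file
        return data
```

With a capacity of two, requesting files a, b, a, c stages three files and evicts b, since a was touched more recently.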

  16. Client/Server and CORBA
      • Only one ‘singleton’ - Optimizer -- must look globally to control access to the Robot
      • Easy to run many parallel universes: production, development, integration, Mary’s test, etc.
      • Many CORBA products -- chose 2 freeware ones
        • Orbacus (for C++ and Java) http://www.ooc.com
        • Fnorb (for Python) http://www.fnorb.org
      • Servers are currently all in C++ or Python; clients exist for all 3 languages

  17. SAM clients: programs or people that use the services of SAM to store or retrieve data, categorize it, browse it, configure SAM resources and policies, etc.

  18. Using SAM
      • SAM (from the user perspective) is just a few useful commands
        • all are available on the command line
        • a few from a web-GUI (define project etc.)
        • some (more later) are available from within d0reco or any other d0 framework program
      • The SAM database can be queried and browsed extensively

  19. SAM user commands - e.g.
      sam create project definition <defin. params>
      sam create project snapshot <project params>
      sam create analysis project <project params>
      sam verify snapshot <snap params>

  20. SAM user commands e.g.
      sam start project <…>
      sam start consumer <…>
      sam start process <…>
      sam get next file <…>
      sam release <file params…>
      sam store <file and file metadata params…>
      sam locate <file>
      sam dump <project>
      … and many more …
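The commands above suggest a consumer cycle: start a consumer on a project, ask repeatedly for the next file, and release each file when done. The sketch below models that cycle with a purely hypothetical in-memory class, `ToyProject`. Its method names echo the command names, but it is not the SAM client API, and the "ok"/"failed" release status is an assumption made for illustration.

```python
from collections import deque
from typing import Callable, Iterable, Optional


class ToyProject:
    """Hypothetical in-memory stand-in for a SAM project; not the real API."""

    def __init__(self, files: Iterable[str]):
        self.pending = deque(files)  # files not yet handed to a consumer
        self.in_use = {}             # file name -> id of the consumer holding it
        self.released = []           # (file name, status) in release order
        self._next_consumer_id = 1

    def start_consumer(self) -> int:
        consumer_id = self._next_consumer_id
        self._next_consumer_id += 1
        return consumer_id

    def get_next_file(self, consumer_id: int) -> Optional[str]:
        """Hand out the next pending file, or None when the project is done."""
        if not self.pending:
            return None
        name = self.pending.popleft()
        self.in_use[name] = consumer_id
        return name

    def release(self, consumer_id: int, name: str, status: str = "ok") -> None:
        if self.in_use.get(name) != consumer_id:
            raise ValueError(f"{name} is not held by consumer {consumer_id}")
        del self.in_use[name]
        self.released.append((name, status))


def run_consumer(project: ToyProject, process: Callable[[str], None]) -> int:
    """Process every file of a project; return how many succeeded."""
    consumer_id = project.start_consumer()
    succeeded = 0
    while True:
        name = project.get_next_file(consumer_id)
        if name is None:
            break
        try:
            process(name)
            status = "ok"
            succeeded += 1
        except Exception:
            status = "failed"  # release anyway so the file is not left held
        project.release(consumer_id, name, status)
    return succeeded
```

Releasing a file even when processing fails keeps the bookkeeping consistent, which is the kind of per-file tracking the database slides describe.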

  21. SAMManager and Framework and d0om (persistency)
      SAM interaction through
      a) name expanders - used by d0StreamName
      b) File Open/Close messages generated by ReadEvent and WriteEvent
      A "sam:" in a file name will be resolved by a SAM name expander --> SAM Servers to get next file, or get place/name for output file
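The name-expander idea can be sketched as a function that passes ordinary file names through unchanged and turns a "sam:" name into a sequence of concrete files obtained one at a time. This is a rough illustration, not the d0om implementation: the function name `expand_stream_name`, the `get_next_file` callback, and the reading of the text after "sam:" as a project name are all assumptions.

```python
from typing import Callable, Iterator, Optional

SAM_PREFIX = "sam:"


def expand_stream_name(
    name: str, get_next_file: Callable[[str], Optional[str]]
) -> Iterator[str]:
    """Yield the concrete file names behind a stream name.

    `get_next_file` is a hypothetical stand-in for the request to the SAM
    servers; it returns the next file path, or None when no files remain.
    """
    if not name.startswith(SAM_PREFIX):
        yield name  # ordinary file name: nothing to resolve
        return
    project = name[len(SAM_PREFIX):]  # assumed meaning of the suffix
    while True:
        path = get_next_file(project)
        if path is None:
            return
        yield path
```

Because the function is a generator, a framework read loop could open each file as it is delivered instead of waiting for the whole set.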

  22. Constraint to SQL Helper

  25. Usage and Status
      • System is documented and has been used to store over 1.5 TB of Monte Carlo data (see E311)
      • Most of the files have been fetched and processed through a 50 node reconstruction Farm, using SAM (E60 - Monday)
      • Data is added to the system daily -- produced at Fermilab and by collaborators in France, Amsterdam and Prague, and ftp’d to Fermilab along with the required meta-data description file.
      • A feature to store parameter files along with the data files has recently been implemented

  26. Performance and Robustness
      • we have started to build a serious test harness to emulate the entire load - from online --> random end users
      • promising performance numbers so far -- easily got 20 MB/sec into Origin with only 1 Gbit Ethernet and a limited number of current generation tape drives (3 MB/sec)
      • the test harness runs with tens of projects, each delivering cached and tape resident files, each with tens of consumer analysis processes gaining access to all of them
      • Farms also got up to 20 MB/sec with 50 nodes all requesting files - this is the required rate during a run (but of course we will test to the saturation point of the Farm nodes)

  28. What next?
      • strong focus on testing and robustness
      • aiming for a very high availability system -- all servers restart, clients recover, etc.
      • need to integrate with the Batch system and address the resulting resource management issues
      • will add the single-event data pick feature and test use of the Event Catalog more extensively
      • more support for sites outside Fermilab to use the system and/or set up their own

  29. Analysis outside Fermilab, using SAM
      • In addition to your program, which must talk to a SAM Project Server and Database Server somewhere, and may need to have files staged, you will need
        • Calibration Data
        • Alignment Data
        • Geometry Data
        • RCP Data
      (Diagram labels) dspack files; get through d0om interface to a Database Server; other I/O possibilities; RCP manager; extracted RCP files; interface to a Database Server

  30. Conclusions
      • We have a working system to first order
      • Users are starting to use it
      • Making it robust and highly available is high on our priority list
        • this involves many aspects: Robot hardware, operating systems, network components, the database server machine, Enstore movers and servers and software, and last of all the SAM servers and client code, which sit atop all of that
      • Involving off-site users and planning to provide access to data for all is starting now and will be there before we run
