
HEPCAL, PPDG CS11 & the GAE workshop

HEPCAL, PPDG CS11 & the GAE workshop. Ruth Pordes (Fermilab), presenting (as usual) the work of many others. The HEPCALs: documenting use cases. A forum for coming to a common understanding and for generating and checking Grid middleware requirements across the 4 LHC experiments.

Presentation Transcript


  1. HEPCAL, PPDG CS11 & the GAE workshop. Ruth Pordes (Fermilab), presenting (as usual) the work of many others.
     The HEPCALs: documenting use cases. A forum for coming to a common understanding and for generating and checking Grid middleware requirements across the 4 LHC experiments. The chair of the committees is Federico, and Jeff Templon is the chief editor.
     • HEPCAL: Summer '02
     • HEPCAL-prime (HEPCAL updated): Spring '03
     • HEPCAL-II (analysis use cases): Phase 1 June '03; Phase 2 Nov '03

  2. HEPCAL - its usefulness
     • Discussion and comments following the release stimulated Test Case implementations for EDG.
     • Useful in identifying holes and in thinking through the details of end-to-end functionality.
     • Helped to solidify how to move forward to the joint “GLUE” testing project.
     • The joint response from the US and EU Grid middleware projects helped:
         • understanding of the boundaries between VDT and EDG components
         • the ability to move to a common underlying infrastructure
         • better appreciation of the components in LCG, EDG and VDT
     • Good reference for glossary and definitions.
     Willingness to have regular updates to this document will contribute to its usefulness -> HEPCAL-prime
     R. Pordes, GAE workshop

  3. HEPCAL aims to give input/guidance to Software in the “Grid Domain”

  4. HEPCAL-prime - its relevance
     • Gives agreed-upon definitions and scope for many concepts. These may be wrong, but there is plenty of text to critique, an active mailing list for discussion, and a recognised forum for consensus and decisions. E.g.:
         • “catalogues and datasets. A catalogue is a collection of data that is updateable and transactional. A dataset is a read-only collection of data. A special case of the dataset is the Virtual Dataset”.
         • Long discussion of datasets etc.
         • We expect the Grid to assign a unique job identifier to each Job. Classify all Jobs into 2 categories: “Organized” or “Chaotic”.
     • Some significant areas of requirements and use are only superficially addressed, e.g.:
         • System-wide issues: architecture, requirements, operations
         • Security: VO, authorization mechanisms
         • Treatment of failures and faults
         • Long transactions and persistent state
     • Are the fundamental assumptions and scope correct, or agreed to?
         • Mostly FILEs
         • LDN and GUID
         • All events part of a tree
     • The concept that the “user” is often an “Agent” or a “Role”-based capability came late, and there are gaps due to this.
     http://cern.ch/fca/HEPCAL-prime.doc
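
The quoted definitions can be made concrete with a small sketch. This is only an illustration of the wording on the slide, not HEPCAL's own design: the class names, the use of a UUID as a stand-in for the Grid-assigned job identifier, and the "reject `None` values" validation rule are all assumptions, and the Virtual Dataset special case is not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from types import MappingProxyType
import uuid


class JobCategory(Enum):
    """The two job categories named on the slide."""
    ORGANIZED = "Organized"
    CHAOTIC = "Chaotic"


@dataclass(frozen=True)
class Job:
    category: JobCategory
    # Hypothetical stand-in for the unique identifier the Grid would assign.
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class Dataset:
    """A read-only collection of data: it exposes lookups but no mutation."""

    def __init__(self, entries):
        self._entries = MappingProxyType(dict(entries))

    def __getitem__(self, key):
        return self._entries[key]

    def __len__(self):
        return len(self._entries)


class Catalogue:
    """An updateable collection whose updates are all-or-nothing."""

    def __init__(self):
        self._entries = {}

    def update(self, changes):
        # Stage the changes on a copy; commit only if every change is valid.
        staged = dict(self._entries)
        for key, value in changes.items():
            if value is None:  # hypothetical validation rule
                raise ValueError(f"no value given for {key!r}")
            staged[key] = value
        self._entries = staged

    def snapshot(self):
        """Freeze the current contents as a read-only Dataset."""
        return Dataset(self._entries)
```

A failed `update` leaves the catalogue unchanged, which is the sense of "transactional" assumed here; a `Dataset` produced by `snapshot` cannot be modified afterwards.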

  5. HEPCAL-prime has added first Performance Requirements

  6. HEPCAL-II scope and status
     • The goal is to provide Use Cases describing Analysis such that Requirements can be synthesized and a Software Architecture and Design started.
     • First phase is “over”, with the document to be delivered to the SC2 at the end of this month. It is not clear that this is sufficient for the new RTAG.
     • Really only a first pass at bringing the thinking of the people on the committees forward to approach the differences and similarities between Analysis and Production Processing.
     • At the moment there seem to be a couple of concepts that people agree are different:
         • The Input Data that is needed may not be known until the job is run (job executions are preceded by Queries to define the input data).
         • User Interaction may be required and will have a wide range of “response” needs.
     • System concepts like planning, prioritization and VO management are not included.

  7. Still simple models of end to end Analysis steps

  8. Performance Requirements: [ This section needs considerable reworking, still looking for brilliant ideas. ]
     About 10-15 physics analysis groups are expected in each experiment, with probably 10-20 active people in each extracting the data from the earlier scenarios... For the later stages ... the produced data may not necessarily be registered on the Grid. In addition, about 30(?) people per subdetector are expected in each experiment (a total of 300-500? per experiment) accessing the data for detector studies and/or calibration purposes. So a total of 400-600 people in each experiment is expected to do the extraction of (possibly private) results. This number is representative; depending on the stage of the experiment the profile might be quite different.
     • Is there a common data handling layer that is external to the application and has middleware and/or external-to-middleware components? Still no assumption on this. Is it time to make a decision? Query handlers as an LCG common project? Collaborating with PPDG?
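
As a rough check on the head-count, the ranges quoted above can be combined directly. This sketch assumes that the slide's "3-500" means 300-500, that the analysis groups and the detector-study people do not overlap, and that simply combining the lower bounds and the upper bounds is a fair way to bound the total; none of this is stated in the source, and the helper names are illustrative.

```python
def multiply_ranges(a, b):
    """Bounds of a product of two non-negative (low, high) ranges."""
    return (a[0] * b[0], a[1] * b[1])


def add_ranges(a, b):
    """Bounds of a sum of two (low, high) ranges."""
    return (a[0] + b[0], a[1] + b[1])


groups_per_experiment = (10, 15)   # physics analysis groups
people_per_group = (10, 20)        # active people in each group
detector_people = (300, 500)       # assumed reading of "3-500?"

analysis_people = multiply_ranges(groups_per_experiment, people_per_group)
total_people = add_ranges(analysis_people, detector_people)

print(analysis_people)  # (100, 300)
print(total_people)     # (400, 800)
```

Under these assumptions the quoted total of 400-600 people per experiment sits in the lower part of the 400-800 range.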

  9. The Arrow of “increasing interactivity”
     • The horizontal axis can be divided into general regions based largely on human time-scales:
         • < 1 sec: Instantaneous. User's attention is continually focused upon the job.
         • < 1 min: Fast. Time spent waiting for a response or results is short enough that the user will not start another task in the interim.
         • < 1 hour: Slow. User will likely devote attention to another task while waiting for response/results, but will return to the task in the same working day.
         • > 1 day: Glacial. User will likely release and forget. Will return to the task after an extended period, or only upon notification that the task has completed.
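
The regions above amount to a simple threshold classification of response time. A minimal sketch follows, with one loudly flagged assumption: the slide does not assign the interval between 1 hour and 1 day to any region, and this sketch folds it into "Slow" on the grounds that the user still returns within the working day. The names `Interactivity` and `classify_response_time` are illustrative only.

```python
from enum import Enum

SECOND = 1
MINUTE = 60 * SECOND
HOUR = 60 * MINUTE
DAY = 24 * HOUR


class Interactivity(Enum):
    INSTANTANEOUS = "Instantaneous"
    FAST = "Fast"
    SLOW = "Slow"
    GLACIAL = "Glacial"


def classify_response_time(seconds):
    """Map a response time in seconds to one of the slide's regions."""
    if seconds < 0:
        raise ValueError("response time cannot be negative")
    if seconds < SECOND:
        return Interactivity.INSTANTANEOUS
    if seconds < MINUTE:
        return Interactivity.FAST
    if seconds <= DAY:
        # "< 1 hour" is Slow on the slide; 1 hour to 1 day is unassigned
        # there and treated as Slow here (assumption).
        return Interactivity.SLOW
    return Interactivity.GLACIAL
```

For example, `classify_response_time(30)` gives `Interactivity.FAST`, and a job taking two days gives `Interactivity.GLACIAL`.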

  10. 1.1.1 Persistent interactive environment
     For each analysis session the user should be able to assign a name (in the user’s private namespace) to which he/she can subsequently refer in order to:
     • get additional information about analysis status, estimated time to completion, …
     • find and retrieve partial results of his/her analysis
     • re-establish the complete analysis environment at a later stage
     • …
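
One way to picture this requirement is a registry keyed by (user, session name), so that names are private to each user. The sketch below is an assumption-laden illustration, not an interface from HEPCAL-II: `AnalysisSession`, `SessionRegistry`, and their fields are hypothetical, and persistence across restarts is left out.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisSession:
    status: str = "running"
    partial_results: list = field(default_factory=list)
    # Whatever is needed to re-establish the analysis environment later.
    environment: dict = field(default_factory=dict)


class SessionRegistry:
    """Named sessions, with one private namespace per user."""

    def __init__(self):
        self._sessions = {}  # (user, name) -> AnalysisSession

    def create(self, user, name, environment=None):
        key = (user, name)
        if key in self._sessions:
            raise ValueError(f"{user!r} already has a session named {name!r}")
        session = AnalysisSession(environment=dict(environment or {}))
        self._sessions[key] = session
        return session

    def lookup(self, user, name):
        """Return the session a user previously named; KeyError if unknown."""
        return self._sessions[(user, name)]
```

Two users can both call a session `"higgs-skim"` without clashing, and either can later call `lookup` to read the status, fetch partial results, or recover the stored environment.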

  11. PPDG CS-11

  12. PPDG CS-11: “Interactive” Physics Analysis on a Grid
     • Cross-experiment working group to discuss common requirements and interfaces.
     • Forum to bring together information about the many parallel implementations and prototypes needed to gain understanding.
     • Extract the common requirements that such applications make on the grid, to influence grid middleware to contain the necessary hooks.
     • Evaluate existing interfaces and services; propose extensions and describe new interfaces as needed.
     • Particularly strong participation has come from analysis tool makers in the US: JAS, Caigee, ROOT.

  13. PPDG Analysis Tools Work
     Not focused yet on a common development effort. Still a “working group” for PPDG Year 3. Expect it to be a focus of Years 4 & 5. People in PPDG are encouraging us to make it a strong focus: development -> production effort sooner? However, PPDG must avoid landing in today's situation for Replica Management systems, i.e. 6 different implementations IN PRODUCTION.

  14. ..CS-11 service names to date..
     • Discover Resource
     • Reserve Resource
     • Matchmaker
     • Manage Storage
     • Login/Logout
     • Install Software
     • Submit Abstract Job
     • Submit Concrete Job
     • Control Concrete Job
     • Status of Concrete Job (Status is an exposed interface to every service)
     • Concrete Job Capabilities
     • Sub-Job Management / Partition Job
     • Estimate Performance
     • Move Data
     • Copy Data
     • Query DataSet Catalog
     • Manage Dataset Catalog
     • Manage Data Replication
     • Access Metadata Catalog
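
The note that status is an exposed interface to every service could be expressed as a common base class that each service must implement. This is a hypothetical sketch, not the CS-11 interface definition: the `Service` base class, the `MoveData` example, its `move` method, and the status string format are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Service(ABC):
    """Common base: every service must expose its status."""

    name = "unnamed service"

    @abstractmethod
    def status(self):
        """Return a human-readable status string."""


class MoveData(Service):
    """Toy stand-in for the 'Move Data' service; it only counts requests."""

    name = "Move Data"

    def __init__(self):
        self._requests = []

    def move(self, source, destination):
        # A real service would start a transfer; here the request is recorded.
        self._requests.append((source, destination))

    def status(self):
        return f"{self.name}: {len(self._requests)} transfer(s) requested"
```

Because `status` is abstract, a service class that forgets to implement it cannot be instantiated, which keeps the "every service exposes status" rule enforceable in code.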
