
GridPP2: Metadata Management

This workshop focuses on grid-enabling metadata services for experiments, building upon previous work and forming a UK metadata group within GridPP2. The group will take responsibility for common experiment metadata technologies and collaborate with other projects.


Presentation Transcript


  1. GridPP2: Metadata Management Gavin McCance University of Glasgow GridPP2 Workshop, UCL

  2. GridPP2 Middleware: Metadata Management

  3. Work areas • Metadata management and UK metadata group • Storage management •  See Jens’ talk

  4. Metadata Management • The focus is upon Grid-enabling metadata services for the experiments • Building upon our previous work in this area • Building upon experiments’ existing work in this area • Formation of a UK metadata group within GridPP2 • 1 generic Grid metadata post @ Glasgow • ~1 post per experiment • ATLAS @ Glasgow, LHCb @ Oxford, CMS @ Bristol • US expts, others?? • The UK metadata group will form part of the work of these experiment posts • Interaction with the UK data management support teams

  5. GridPP2 Metadata Group • Purpose will be to • Take overall responsibility for common experiment metadata technologies in order to Grid-enable the experiments’ metadata • Identify the commonalities and experience across experiments and make sure these are recognized • i.e. technologies, schema: data product navigational problem • Come to agreement and feed this back into the wider ARDA process • Work directly with interested groups forming the ARDA • EGEE JRA1 Data Management Group (@CERN) • LCG Deployment Teams (@CERN) • LCG Experiments • IT Database group (@CERN)

  6. Metadata Responsibilities • Generic metadata post @Glasgow: • Concentration on the technologies used to create scalable, manageable and fault-tolerant metadata services • The underlying Grid software stack • Emphasis upon the service, not just the product • 24/7 supportable production metadata services • Not prescribing things like the schema, or saying the ‘API must look like Spitfire’: prototype interfaces should be based upon experiments’ existing metadata interfaces • Will track, develop and adopt as necessary Grid metadata access standards • Feed into standards to make sure we’re in a position to benefit from the future production products that implement these standards • Feed PPE use-case and experience back into the wider world

  7. Metadata Responsibilities • Experiment metadata posts (~1 per experiment): • Document existing implementations from the experiments • Make sure all the experiments’ use-cases are satisfied by the products and the technologies being proposed by the group • Work within the group to ensure that commonalities and experience across experiments are recognized and effort is not wasted • At the technology level – e.g. using the same underlying Grid software stack • At the interface level – e.g. GANGA • Possibly at the schema level… • Feed this understanding and agreement back into the wider ARDA process and back into their own experiments • ARDA terminology: Dataset metadata → ARDA Metadata service • Data product navigation → ARDA Job Provenance service

  8. Short term plans of the group… • Immediate work: • Current task of the group is information gathering • http://www.gridpp.ac.uk/datamanagement/metadata/ • A review of how each experiment uses metadata: • What you mean by the term metadata: what does it include? • Details on this.. how do you use the metadata? • Implementation and deployment details: how is it split into services, the size of metadata, details on the schema, technologies used, etc. • Relation to other products, e.g. POOL • Future directions already in people’s minds?

  9. …Short term plans of the group • The results of this review are being made available on a web page and should be pulled into a document • Common format to easily compare the different experiments’ uses of metadata • This document will serve as input to a metadata workshop • ~end of April..? Still to be arranged… @Glasgow? VRVS? • Purpose of the workshop will be to identify areas of commonality and work on the future programme for the group • Generate ~short-lived sub-tasks within the group with a clear purpose and outcome • Continue regular planning meetings to guide these sub-tasks • Should ensure we have input from other sciences as well.. • Can request input from the EDG WP9/10 groups and EGEE Biomed groups

  10. Links to other projects… • We can’t do this ourselves… • EGEE JRA1: The JRA1 data management development cluster of EGEE is based at CERN - we will build upon the relationship formed within EDG (it’s a similar team to EDG) • Primary interface to JRA1 will be the generic middleware post at Glasgow • Proposal to work directly with JRA1 DM • i.e. use the JRA1 CVS repository, use the same development tools and infrastructure, use the experience of the testing and integration teams of EGEE, deliver through this group • The large experiment participation in this UK metadata group is seen as very helpful within the JRA1 DM cluster • Lack of any formal agreement…

  11. …Links to other projects… • LCG / EGEE SA1: products delivered to LCG through EGEE JRA1?? • See UK data management support posts later… • Experiments: members of the experiments will form part of the metadata group • Feed back the work of this group into the experiments and verify that the proposed solutions will work for their experiments • Hope is to establish a recognized UK lead in metadata that is recognisably cross-experiment • ARDA project: Some combination of the above.. • ARDA is now a real project at CERN, though the details of how we work need to be sorted out

  12. …Links to other projects • Direct testing of our products and solutions for other sciences • Planning to do this through the other EGEE application groups • e.g. biomed have very strict security requirements • Is there another avenue in the UK for this sort of cross-science activity?? • Various Grid and web-service forums: • Global Grid Forum • Mainly the DAIS group, with probable participation in the related Data Area groups • Due to EDG focus on stability and support, we lost touch with the GGF data area groups the last year or so – re-establish… • W3C, OASIS ?

  13. Review of objectives and timelines • Multiple experiment posts with different deliverables and focus • Not all of the experiment posts’ work will be within the scope of the metadata group, but all work done should be reported there so that commonalities can be identified early • As an example of how the work will be divided and for the general timelines, I highlight the relevant objectives for: • The generic-middleware metadata post @Glasgow • The ATLAS post @Glasgow • Then discuss the timelines for the development

  14. Generic middleware objectives • Proforma 2 + 3: • Development of Grid technologies within a service-focussed architecture (such as WSRF) for use in metadata based applications for the experiments; • Delivery of fault-tolerant, reliable and manageable software for this purpose. The emphasis from the beginning will be upon developing services that meet the requirements of the experiments; • Use of this technology for the enabling of existing experiments’ metadata based products in line with the Metadata Catalog service described in the ARDA document (from LCG SC2 RTAG11); • Participation in the Grid Forum data areas to ensure that particle physics is in a position to benefit from developments here. Promising developments will influence the design of the metadata services and we will feed back our requirements and experience into these forums.

  15. ATLAS middleware objectives • Proforma 2 + 3: • Gain a conceptual understanding of the existing ATLAS metadata structures and the ATLAS specific use-cases that drive them; • Develop, with reference to the use-cases and interactions with other ATLAS developers, the metadata necessary to support the navigational use-cases. Both the schema itself and the optimal location of the metadata require study; • Understand the analysis use-cases and optimise the event to file granularity for different types of analysis data (ESD, AOD, TAG) depending upon the use-case. Develop automated ways to monitor the best granularity of event data based on analysis access patterns; • Implement fully working and documented solutions, working with the ATLAS and UK metadata teams to ensure that the developments here are fully integrated with the rest of the ATHENA/GAUDI software, in particular, with the ATLAS Metadata Infrastructure (AMI) product.

  16. Timescales for the deliverables… • Pre – Participate in architecture discussions and prototyping • PM1 – Architecture and Planning “Report” • Placing exercise in response to the EGEE architecture • PM2 – Understanding of the Experiment Metadata Requirements (process started now…) • PM3 – Design of Grid Services (Release 1) • PM7 – Software and Associated Documentation (Release 1) • PM9 – Participate in LCG TDR Review • PM10 – Tier 1 and 2 Support “Report” • In collaboration with UK data management support posts • PM11 – Detailed Metadata Requirements “Report” • PM11 – Architecture and Planning (Release 2)

  17. …Timescales for deliverables • PM12 – Design and Refactor of Grid Services (Release 2) • PM16 – Software and Associated Documentation (Release 2) • PM21 – Tier 1 and Tier 2 Detailed Support Plan • In collaboration with UK data management support posts • PM23 – Architecture and Planning (Release 3) • PM26 – Design and Refactor of Grid Services (Release 3) • PM32 – Software and Associated Documentation (Release 3) • PM36 – Final Report

  18. Support team of GridPP2 • UK data management support posts • Aim: to provide first-level support for all DM software • first stop for UK system administrators • Work directly with the development and deployment teams (GridPP2 Metadata Group and Storage, EGEE and LCG) • Provide hands-on deployment help for data challenge support • Develop how-to portal to collect deployment experience • Feed back sys-admin issues and experience to developers • Site policies, quotas, firewalls – survey sysadmins • Develop site validation tools • Responsible for developing the overall support plan for the data management services beyond GridPP2 • Need to fit all this in with the rest of the UK Support Plan
