
Middleware Summary


Presentation Transcript


  1. Middleware Summary Robin Middleton (RAL/PPD) GridPP7 – Oxford – R.Middleton

  2. Overview
      • What’s in EDG2.0
      • What’s planned for EDG2.1 (Rel 3.0)
      • Towards GridPP2
      • General

  3. Requirements
      • HEPCAL: 43 use cases
      • EDG 1.4
          • 6 fully implemented
          • 12 mostly satisfied (restrictions/complications)
          • 16 not implemented as functionality missing
          • 9 partially implemented
      • EDG 2.0
          • Re-assessment (for 1.4, 2.0 & 2.1) by AWG (FH) by 11th July
          • Authorisation, job control, optimisation improvements
      • Missing features
          • Virtual Data (not within EDG scope)
          • MetaData catalogues (some m/w support, but exp. clarification needed)

  4. EDG 2.0

  5. What’s in 2.0?
      • Starting from Rel 1.4…
      • Move to RH7.3 & use LCFGng – Jan 2003
      • Move to Globus 2.2 & Condor 6.4 – Feb 2003
      • Based on the VDT packaging; compatibility with other projects
      • http://www.lsc-group.phys.uwm.edu/vdt/home.html
      • RB supports interactive jobs, MPICH, checkpointing
      • RLS - Replica Location Service, etc. (see J.Casey’s talk)
      • RGMA as backbone of information & monitoring system (see S.Fisher’s talk)
      • Updated to GLUE schema
      • Storage Element (see J.Jensen’s talk)
      • Network Cost Function

  6. Job Checkpointing
      • Job checkpoint states are saved in the Logging & Bookkeeping (LB) server
      • The LB server is also used (even in Rel. 1) as repository of job status info
      • Already proved to be robust and reliable
      • The load can be distributed between multiple LB servers, to address scalability problems
      [Diagram: the job saves its checkpoint state with state.saveState(); the checkpoint is later retrieved from the LB server]
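The save-and-retrieve cycle on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: `LBServer`, `JobState`, `restore` and `process_events` are hypothetical names, the LB server is replaced by an in-memory dictionary, and only the idea behind `state.saveState()` is taken from the slide.

```python
import copy


class LBServer:
    """Hypothetical in-memory stand-in for a Logging & Bookkeeping server."""

    def __init__(self):
        self._checkpoints = {}  # job id -> list of saved state snapshots

    def store(self, job_id, snapshot):
        # Keep a copy so later changes in the job do not alter the checkpoint.
        self._checkpoints.setdefault(job_id, []).append(copy.deepcopy(snapshot))

    def latest(self, job_id):
        saved = self._checkpoints.get(job_id)
        return copy.deepcopy(saved[-1]) if saved else None


class JobState:
    """Hypothetical job state; save_state plays the role of state.saveState()."""

    def __init__(self, job_id, server, values=None):
        self.job_id = job_id
        self.server = server
        self.values = dict(values or {})

    def save_state(self):
        self.server.store(self.job_id, self.values)

    @classmethod
    def restore(cls, job_id, server):
        # Rebuild the state from the most recent checkpoint, if any.
        return cls(job_id, server, server.latest(job_id))


def process_events(state, n_events, fail_at=None):
    """Process events in order, checkpointing after each one.

    fail_at simulates a crash just before the given event is processed.
    """
    start = state.values.get("next_event", 0)
    for event in range(start, n_events):
        if event == fail_at:
            raise RuntimeError(f"simulated failure at event {event}")
        state.values["next_event"] = event + 1
        state.save_state()
    return state.values.get("next_event", start)


lb = LBServer()
try:
    process_events(JobState("job-1", lb), n_events=10, fail_at=6)
except RuntimeError:
    pass  # the first attempt dies part-way through

resumed = JobState.restore("job-1", lb)
resumed_from = resumed.values["next_event"]
finished_at = process_events(resumed, n_events=10)
```

The second attempt starts from the last saved event rather than from zero, which is the benefit checkpointing is meant to give.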

  7. RLS in EDG2.0
      [Diagram components: User Interface or Worker Node; Resource Broker; Virtual Organization Membership Service; Information Service; Replica Metadata Catalog; Replica Location Service; Replica Manager; Replica Optimization Service; Storage Element (x2); SE Monitor; Network Monitor]

  8. RGMA in EDG2.0 (Ack: WP3)
      • Push mode
      • Updates every 30s
      • >70 sites (simul.)
      [Diagram labels: LDAP InfoProvider; GIN; R-GMA Stream Producer; Archiver (LatestProducer); Latest Producer; RDBMS; GLUE Schema; ConsumerAPI; R-GMA Consumers: Consumer (CE), Consumer (SE), Consumer (SiteInfo); GOUT; LDAP Server]

  9. SE in EDG2.0
      • The design of the SE follows a layered model with a central core handling all paths between client and MSS.
      • Core is flexible and extensible, making it easy to support new protocols, features and MSS
      [Diagram labels: Client App; C Client; Java Client; API; Tomcat; Web Service; AXIS; SE HTTP library; SSL socket; Apache library; RMANMAN; SE core; SE]

  10. WP7 in EDG2.0 http://comp7.in2p3.fr/wp7archive/

  11. WP7 – GridFTP Logging

  12. WP7 Network Monitoring

  13. EDG 2.1

  14. EDG2.1
      • Integration schedule
      • Detailed integration times throughout July & August
      • Feature freeze end August…only debug, integrate & fix after this
      • General (effort) move from design & code -> test & bug-fix
      • Quality above quantity
      • Final software release of EDG
      • Some important new functionality (in the wings)

  15. EDG 2.1 (aka Rel 3.0 !!) (Ack: E.Laure CERN/EDG)
      • Testing & Bugfixing
      • gcc3.2.2/RH7.3 (RH8/9 test on WN/UI); VDT update?
      • VOMS & Security (ACLs, LCMAPS,…)
      • Scalability/Stability measures (RLS, R-GMA, MSS staging,…)
      • New Functionality (gridOpen, DAGs, …)

  16. TB2.1 – WP1
      • Direct interaction of RB with R-GMA (incl. Logging & Bookkeeping)
      • Integration with VOMS (proxy renewal)
      • Job dependencies & DAGman scheduling
      • Job partitioning
      • RB support for Data prefetch (depends on WP2)
      • Accounting & Advance Reservation
      • Strong dependence on underlying system, so probably only a demonstrator

  17. TB2.1 – WP1 - DAGs
      A = [
        Executable = "A.sh";
        PreScript = "PreA.sh";
        PreScriptArguments = { "1" };
        Children = { "B", "C" }
      ];
      B = [
        Executable = "B.sh";
        PostScript = "PostA.sh";
        PostScriptArguments = { "$RETURN" };
        Children = { "D" }
      ];
      C = [
        Executable = "C.sh";
        Children = { "D" }
      ];
      D = [
        Executable = "D.sh";
        PreScript = "PreD.sh";
        PostScript = "PostD.sh";
        PostScriptArguments = { "1", "a" }
      ]
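The Children attributes in this example define a dependency graph: A must finish before B and C, and both must finish before D. A minimal Python sketch of how a scheduler might derive a valid run order from such a description is given here. It uses a standard topological sort (Kahn's algorithm); the `children` dictionary and the `execution_order` function are illustrative names and do not represent how the RB or DAGMan is actually implemented.

```python
from collections import deque

# Children relation copied from the JDL example on the slide.
children = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def execution_order(children):
    """Return one order in which every node runs after all of its parents."""
    # Count how many parents each node is still waiting for.
    pending_parents = {node: 0 for node in children}
    for kids in children.values():
        for kid in kids:
            pending_parents[kid] += 1

    # Nodes with no parents can start immediately.
    ready = deque(sorted(n for n, count in pending_parents.items() if count == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for kid in children[node]:
            pending_parents[kid] -= 1
            if pending_parents[kid] == 0:
                ready.append(kid)

    if len(order) != len(children):
        raise ValueError("dependency cycle detected; not a DAG")
    return order


order = execution_order(children)
```

In this graph B and C have no dependency on each other, so a real scheduler would be free to run them in parallel once A has completed.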

  18. TB2.1 – WP1 – Job Partitioning
      JobType = Partitionable;
      Executable = ...;
      JobSteps = ...;
      StepWeight = ...;
      Requirements = ...;
      ... ...
      Prejob = [
        Executable = ...;
        Requirements = ...;
        ... ...
      ];
      Aggregator = [
        Executable = ...;
        Requirements = ...;
        ... ...
      ];
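One way to read this JDL skeleton is that a job consisting of weighted steps is split into sub-jobs, a Prejob runs first, and an Aggregator combines the partial results at the end. The Python sketch below illustrates that reading only. The balancing strategy (heaviest step first, always to the lightest sub-job) and the names `partition_steps` and `run_partitioned` are assumptions made for the example, not the WP1 algorithm.

```python
def partition_steps(step_weights, n_subjobs):
    """Greedily assign each step to the currently lightest sub-job."""
    subjobs = [{"steps": [], "weight": 0} for _ in range(n_subjobs)]
    # Place heavy steps first so the final loads end up close to each other.
    for step in sorted(range(len(step_weights)), key=lambda s: -step_weights[s]):
        lightest = min(subjobs, key=lambda sj: sj["weight"])
        lightest["steps"].append(step)
        lightest["weight"] += step_weights[step]
    return subjobs


def run_partitioned(step_weights, n_subjobs, prejob, run_step, aggregator):
    """Run the prejob, then every sub-job, then aggregate the partial results."""
    prejob()
    partial_results = []
    for subjob in partition_steps(step_weights, n_subjobs):
        partial_results.append([run_step(step) for step in subjob["steps"]])
    return aggregator(partial_results)


weights = [5, 3, 3, 2, 2, 1]  # hypothetical StepWeight values
plan = partition_steps(weights, n_subjobs=2)

log = []
total = run_partitioned(
    weights,
    2,
    prejob=lambda: log.append("prejob"),
    run_step=lambda step: weights[step],  # stand-in for real work
    aggregator=lambda parts: sum(sum(part) for part in parts),
)
```

Here the sub-jobs run one after another for simplicity; on a grid they would be submitted as independent jobs and the aggregator would wait for all of them.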

  19. TB2.1 – WP2
      • Full RLS deployment
      • RLI integrated with LRC
      • VOMS aware security
      • EDG Trust Manager
      • EDG Authorisation Manager (coarse grained)
      • File pre-fetch (needed by WP1) – not for 2.1
      • Replica Subscription Service – not for 2.1
      • First step towards proxy service for supporting sites w/o outbound IP
      • Must not compromise support RLS or security

  20. RLS at SC2002 (Ack: G.McCance)
      • Used Globus RLS

  21. TB2.1 – WP3
      • General performance enhancements
      • Performance enhancement for GRM/PROVE use
      • Registry resilience (replication)
      • VOMS aware security (authentication + basic authorisation)
      [Diagram: Registry1, Registry2 and Registry3 each hold the info they master plus copies of the info mastered by the other two registries; Producer1 and Producer2 are also shown]

  22. TB2.1 – WP4
      • Resource management
      • GLUE info provider maintenance
      • Support for LSF, Condor & advance reservation
      • Fault tolerance framework
      • Gridification
      • LCMAPS-1.0, LCAS-2.0, VOMS plugin, job repository
      • Monitoring (see Jan van Eldik’s talk)
      • Full architecture, Oracle & MySQL backends, alarm display
      • New Install & Config architecture piloted at CERN, but NOT replacing LCFGng before end of EDG

  23. Install & Config (Ack: WP4)
      • Software Package Mgmt Agent (SPMA)
          • SPMA manages the installed packages
          • Runs on Linux (RPM) or Solaris (PKG)
          • SPMA configuration done via an NCM component
          • Can use a local cache for pre-fetching packages (simultaneous upgrades of large farms)
      • Automated Installation Infrastructure
          • DHCP and Kickstart (or JumpStart) are re-generated according to CDB contents
          • PXE can be set to reboot or reinstall by operator
      • Software Repository
          • Packages (in RPM or PKG format) can be uploaded into multiple Software Repositories
          • Client access is using HTTP, NFS/AFS or FTP
          • Management access subject to authentication/authorization
      • Node Configuration Manager (NCM)
          • Configuration Management on the node is done by NCM Components
          • Each component is responsible for configuring a service (network, NFS, sendmail, PBS)
          • Components are notified by the Cdispd whenever there was a change in their configuration
      • Configuration Data Base (CDB)
          • Configuration Information store. The information is updated in transactions, it is validated and versioned.
          • Pan Templates are compiled into XML profiles
      • Configuration Information is stored in the local cache. It is accessed via NVA-API
      [Diagram labels: SWRep Servers; Packages (rpm, pkg); http, nfs, ftp; cache; Mgmt API; ACL’s; SPMA; SPMA.cfg; NCM Components; NCM; Cdispd; CCM; Installation server; PXE handling; DHCP handling; KS/JS generator; Registration; Notification; Node Install; Node (re)install?; Client Nodes; CDB]
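The statement that SPMA manages the installed packages can be pictured as reconciling what is installed on a node with a target list from its configuration. The Python sketch below shows that idea only. It is not SPMA's real logic or interface: `plan_operations`, the operation names, and the package names and versions are all made up for illustration.

```python
def plan_operations(installed, target):
    """Return the package operations that turn `installed` into `target`.

    Both arguments map a package name to a version string.
    """
    operations = []
    # Anything installed that the target does not list is removed.
    for name in sorted(installed.keys() - target.keys()):
        operations.append(("remove", name, installed[name]))
    # Missing packages are installed; differing versions are replaced.
    for name in sorted(target):
        if name not in installed:
            operations.append(("install", name, target[name]))
        elif installed[name] != target[name]:
            operations.append(("replace", name, target[name]))
    return operations


# Hypothetical node state and target profile.
installed = {"globus": "2.0", "condor": "6.4", "oldtool": "1.0"}
target = {"globus": "2.2", "condor": "6.4", "edg-rb": "2.1"}

operations = plan_operations(installed, target)
```

Computing the whole plan before acting is what makes a local package cache useful: every package in the plan can be pre-fetched before any node in a large farm starts its upgrade.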

  24. TB2.1 – WP5
      • SRM interface
      • Asynchronous interaction
      • SE setup for WP9(EO)/10(Bio)
      • VOMS aware security
      • Improved error handling

  25. TB2.1 – WP7
      • Probe Coordination Protocol
      • Network cost function enhancement
      • Network GLUE schema prototype
      • QoS & high throughput tests with GEANT

  26. TB2.x User Authorisation
      [Diagram labels: low frequency: CA; host cert (long life); user cert (long life); crl update; registration with VO-VOMS. High frequency: voms-proxy-init; proxy cert (short life); service cert (short life); authz cert (short life) from VO-VOMS; authentication & authorization info; edg-java-security; LCAS. Actors: service, user]

  27. TB2.1 - Security
      • VOMS deployment
      • Server manually set up at several places
      • Work on auto-config ongoing – start testing soon
      [Diagram: migration phases]
      • phase 0: testing the VOMS servers
      • phase 1: user management on VOMS
      • phase 2: compatibility mode: mixed services
      • phase 3: fully migrated: only VOMS-aware services
      [Diagram labels: voms-ldap-sync; VO-LDAP; VOMS; edg-mkgridmap; grid-mapfile; grid-proxy-init; voms-proxy-init; proxy; proxy (voms); service; user]
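In the phases that still rely on a grid-mapfile, a tool such as edg-mkgridmap turns VO membership into entries that map a certificate subject (DN) to a local account. The Python sketch below illustrates that mapping step. It is not the edg-mkgridmap implementation: `make_gridmap` is a hypothetical helper, the DNs and VO names are invented, and the only thing assumed from real deployments is the conventional entry form of a quoted DN followed by an account name, where a leading dot denotes a pool account.

```python
def make_gridmap(members, account_for_vo):
    """Build grid-mapfile text from (DN, VO) pairs.

    members: iterable of (distinguished_name, vo_name) tuples.
    account_for_vo: maps a VO name to a local account or pool account.
    """
    lines = []
    for dn, vo in sorted(members):
        # One entry per user: quoted DN, then the local account to map to.
        lines.append(f'"{dn}" {account_for_vo[vo]}')
    return "\n".join(lines) + "\n"


# Hypothetical VO membership, as it might be read from VO-LDAP or VOMS.
members = [
    ("/O=Grid/O=Example/CN=Bob Example", "cms"),
    ("/O=Grid/O=Example/CN=Alice Example", "atlas"),
]
account_for_vo = {"atlas": ".atlas", "cms": ".cms"}

gridmap_text = make_gridmap(members, account_for_vo)
```

Once all services are VOMS-aware (phase 3), the authorisation information travels inside the user's VOMS proxy and this generated file is no longer needed.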

  28. SAM – D0/CDF

  29. GridPP-2 - Middleware

  30. GridPP-2 Middleware Directions
      • Policy
          • Mission critical to PP OR
          • Demonstrable lead on international stage OR
          • Contribute to wider programme leveraging benefit for PP
      • Guidelines
          • Clustering of expertise
          • Useful to LCG programme
          • Partnership/collaboration where possible (e.g. UK e-Science)
          • Tech. transfer to industrial sector
          • Awareness of / engagement with GGF (move to OGSA/I)
      • Areas
          • Data & Storage Management
          • Workload Management
          • Information & Monitoring
          • Security
          • Networking

  31. Evolution of m/w Effort

  32. GridPP2 - Middleware
      • Data Storage & Management
          • Fuller integration of exp meta-data with m/w
          • Site-local data management
              • caches, space reservation, cleanup
          • Full integration of MSS
          • Improved replica optimisation
      • Workload Management
          • OGSIfication of the RB
          • Redesign of WM architecture
          • Develop Java client
          • Autonomic aspects of WM
          • New scheduling algorithms

  33. GridPP2 - Middleware
      • Information & Monitoring
          • Requirements & architecture revision/cycle
          • OGSIfication of core service(s) (see A.Djaoui’s talk)
          • QA & Production Service Dev
          • Information model co-ordination
          • End-user tools/displays
      • Security
          • LCG Security
          • Local Access Control
          • Pool accounts, GACL, /grid, batch interfaces
          • Local Usage Control
          • GridSite
          • VO Access Management
          • VO Usage Management
          • Interface alternative authorisation frameworks
          • Audit and grid intrusion detection
          • Tool ports to other UNIX & Windows

  34. GridPP2 - Middleware
      • Networking
          • Next generation Grid Network Performance Measurement Service
          • High performance data transport
          • Resource allocation & reservation services
          • (UKLIGHT participation)
          • (PPNCG support)

  35. General

  36. EDG Quality Group (Slide: R.Jones – Barcelona meeting)
      • The Quality Group (QAG) was created in August 2002 with a Quality representative (QAR) from each WP. Each QAR ensures the measures are applied inside his/her WP. Chaired by Gabriel Zaquine.
      • http://www.eu-datagrid.org/QAG/
      • The Quality Group has produced an EDG developers guide document
      • The document gives an overview of the tools available and conventions to be followed for the software development within EDG:
          • Packaging - Code Management - Automatic Build system - Environment - Interfaces and APIs - Documentation
          • Test and validation process - Integration procedure - Style and naming conventions
      • http://edms.cern.ch/document/358824
      • Work on EDG 2.0 shows that conventions are not yet being followed by everyone
      • All developers must read this document and ensure their software complies

  37. EDG Architecture Group (Slide: R.Jones – Barcelona meeting)
      • ATF has been working to clarify the details of the interactions and interfaces of EDG 2.0
      • Continues to meet on a monthly basis http://agenda.cern.ch/displayLevel.php?fid=3l148
      • Work driven by use cases provided by the application representatives
      • A document describing the architecture for EDG 2.0 has been produced: https://edms.cern.ch/file/368971/
      • ATF has been further empowered to “own” the external interfaces
      • Intended to avoid discrepancies between the interface details agreed by ATF and those found in the software delivered by the mware WPs
      • Baseline document with interface definitions now in preparation
      • Mware WPs please make sure ATF have the APIs for your external interfaces

  38. How far have we come/to go? (n.b. very subjective !!)

  39. Summary
      • Achievements
          • RB, VOMS, RLS, RGMA, SE, LCFG,…
      • Challenges
          • LCG-1, LCG-2, …
          • OGSA migration
          • Engineering production quality (R3 etc.)
      Scalability, Scalability, Scalability
      Stability, Stability, Stability

  40. END
