Middleware Summary
Robin Middleton (RAL/PPD)
GridPP7 – Oxford – R.Middleton
Overview
• What’s in EDG2.0
• What’s planned for EDG2.1 (Rel 3.0)
• Towards GridPP2
• General
Requirements
• HEPCAL: 43 use cases
• EDG 1.4
  • 6 fully implemented
  • 12 mostly satisfied (restrictions/complications)
  • 16 not implemented as functionality missing
  • 9 partially implemented
• EDG 2.0
  • Re-assessment (for 1.4, 2.0 & 2.1) by AWG (FH) by 11th July
  • Authorisation, job control, optimisation improvements
• Missing features
  • Virtual Data (not within EDG scope)
  • MetaData catalogues (some m/w support, but exp. clarification needed)
EDG 2.0
What’s in 2.0?
• Starting from Rel 1.4…
• Move to RH7.3 & use LCFGng – Jan 2003
• Move to Globus 2.2 & Condor 6.4 – Feb 2003
  • Based on the VDT packaging; compatibility with other projects
  • http://www.lsc-group.phys.uwm.edu/vdt/home.html
• RB supports interactive jobs, MPICH, checkpointing
• RLS - Replica Location Service, etc. (see J.Casey’s talk)
• RGMA as backbone of information & monitoring system (see S.Fisher’s talk)
  • Updated to GLUE schema
• Storage Element (see J.Jensen’s talk)
• Network Cost Function
Job Checkpointing
Diagram labels: Job; Saving of job checkpoint state (state.saveState()); Logging & Bookkeeping Server; Retrieval of job checkpoint.
• Job checkpoint states saved in the LB server
  • Also used (even in rel. 1) as repository of job status info
  • Already proved to be robust and reliable
  • The load can be distributed between multiple LB servers, to address scalability problems
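A minimal sketch of the save/retrieve cycle described on this slide, assuming a purely in-memory stand-in for the LB server. The names InMemoryLBServer and run_job are hypothetical and are not part of any EDG API; the real state.saveState() call and its storage format are not shown in the slides.

```python
import json


class InMemoryLBServer:
    """Hypothetical stand-in for a Logging & Bookkeeping server (not the EDG API)."""

    def __init__(self):
        self._states = {}

    def save(self, job_id, state):
        # Serialise so the stored checkpoint is independent of the live object.
        self._states[job_id] = json.dumps(state)

    def load(self, job_id):
        raw = self._states.get(job_id)
        return json.loads(raw) if raw is not None else None


def run_job(job_id, lb, total_events, fail_at=None):
    """Process events 0..total_events-1, checkpointing after each one."""
    # Resume from the last saved checkpoint if there is one.
    state = lb.load(job_id) or {"next_event": 0, "sum": 0}
    for i in range(state["next_event"], total_events):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")
        state["sum"] += i
        state["next_event"] = i + 1
        lb.save(job_id, state)
    return state["sum"]
```

A job that crashes part-way can be rerun with the same job identifier and continues from the last saved event rather than from the start.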
RLS in EDG2.0
Diagram components: User Interface or Worker Node, Resource Broker, Virtual Organization Membership Service, Information Service, Replica Metadata Catalog, Replica Location Service, Replica Manager, Replica Optimization Service, Storage Elements, SE Monitor, Network Monitor.
RGMA in EDG2.0 (Ack: WP3)
Diagram labels: LDAP InfoProvider, GIN, Stream Producer, R-GMA, Archiver (LatestProducer), Latest Producer, RDBMS, ConsumerAPI, Consumer (CE), Consumer (SE), Consumer (SiteInfo), R-GMA Consumers, GOUT, LDAP Server, GLUE Schema.
• Push mode
• Updates every 30s
• >70 sites (simul.)
SE in EDG2.0
Diagram labels: Client App, C Client, Java Client, API, Java Client API, HTTP library, SSL socket, Apache library, Tomcat, AXIS, Web Service, RMANMAN, SE core.
• The design of the SE follows a layered model with a central core handling all paths between client and MSS.
• Core is flexible and extensible, making it easy to support new protocols, features and MSS.
WP7 in EDG2.0
http://comp7.in2p3.fr/wp7archive/
WP7 – GridFTP Logging
WP7 Network Monitoring
EDG 2.1
EDG2.1
• Integration schedule
  • Detailed integration times throughout July & August
  • Feature freeze end August… only debug, integrate & fix after this
• General (effort) move from design & code -> test & bug-fix
• Quality above quantity
• Final software release of EDG
• Some important new functionality (in the wings)
EDG 2.1 (aka Rel 3.0 !!) (Ack: E.Laure CERN/EDG)
• Testing & Bugfixing
• gcc3.2.2/RH7.3 (RH8/9 test on WN/UI); VDT update?
• VOMS & Security (ACLs, LCMAPS,…)
• Scalability/Stability measures (RLS, R-GMA, MSS staging,…)
• New Functionality (gridOpen, DAGs, …)
TB2.1 – WP1
• Direct interaction of RB with R-GMA (incl. Logging & Bookkeeping)
• Integration with VOMS (proxy renewal)
• Job dependencies & DAGman scheduling
• Job partitioning
• RB support for Data prefetch (depends on WP2)
• Accounting & Advance Reservation
  • Strong dependence on underlying system; probably only a demonstrator
TB2.1 – WP1 - DAGs
A = [
  Executable = "A.sh";
  PreScript = "PreA.sh";
  PreScriptArguments = { "1" };
  Children = { "B", "C" }
];
B = [
  Executable = "B.sh";
  PostScript = "PostA.sh";
  PostScriptArguments = { "$RETURN" };
  Children = { "D" }
];
C = [
  Executable = "C.sh";
  Children = { "D" }
];
D = [
  Executable = "D.sh";
  PreScript = "PreD.sh";
  PostScript = "PostD.sh";
  PostScriptArguments = { "1", "a" }
]
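To illustrate what the Children attributes above imply for scheduling, here is a small sketch that derives a valid run order for the four nodes using Kahn's topological sort. It is only an illustration of the dependency structure, not the DAGman or RB implementation; the function name submission_order is hypothetical.

```python
from collections import deque

# Children relation copied from the JDL example: A -> B, C; B -> D; C -> D.
children = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def submission_order(children):
    """Return nodes in an order where every node follows all of its parents."""
    # Count unfinished parents for each node.
    pending = {node: 0 for node in children}
    for kids in children.values():
        for kid in kids:
            pending[kid] += 1
    # Nodes without parents can start immediately.
    ready = deque(sorted(n for n, count in pending.items() if count == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for kid in children[node]:
            pending[kid] -= 1
            if pending[kid] == 0:
                ready.append(kid)
    if len(order) != len(children):
        raise ValueError("dependency cycle detected")
    return order
```

For the example above, D becomes runnable only after both B and C have finished, which is the diamond shape the JDL describes.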
TB2.1 – WP1 – Job Partitioning
JobType = Partitionable;
Executable = ...;
JobSteps = ...;
StepWeight = ...;
Requirements = ...;
...
...
Prejob = [
  Executable = ...
  Requirements = ...;
  ...
  ...
Aggregator = [
  Executable = ...
  Requirements = ...;
  ...
  ...
];
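The slide does not say how JobSteps and StepWeight are used to split a job. As one plausible reading, assuming StepWeight gives a relative cost per step, the sketch below splits contiguous steps into sub-jobs of roughly equal total weight. The function partition_steps and its greedy strategy are assumptions for illustration, not the WP1 algorithm.

```python
def partition_steps(step_weights, n_subjobs):
    """Split step indices into at most n_subjobs contiguous chunks of similar weight."""
    if n_subjobs < 1:
        raise ValueError("n_subjobs must be at least 1")
    target = sum(step_weights) / n_subjobs
    chunks, current, acc = [], [], 0.0
    for step, weight in enumerate(step_weights):
        current.append(step)
        acc += weight
        # Close the chunk once it reaches the target, keeping one slot for the rest.
        if acc >= target and len(chunks) < n_subjobs - 1:
            chunks.append(current)
            current, acc = [], 0.0
    if current:
        chunks.append(current)
    return chunks
```

In this reading, each chunk would run as one sub-job between the Prejob and the Aggregator.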
TB2.1 – WP2
• Full RLS deployment
  • RLI integrated with LRC
• VOMS aware security
  • EDG Trust Manager
  • EDG Authorisation Manager (coarse grained)
• File pre-fetch (needed by WP1) – not for 2.1
• Replica Subscription Service – not for 2.1
• First step towards proxy service for supporting sites w/o outbound IP
  • Must not compromise support RLS or security
RLS at SC2002 (Ack: G.McCance)
• Used Globus RLS
TB2.1 – WP3
• General performance enhancements
• Performance enhancement for GRM/PROVE use
• Registry resilience (replication)
• VOMS aware security (authentication + basic authorisation)
Diagram labels: Producer1, Producer2; Registry1, Registry2, Registry3. Each registry holds the info it masters plus a copy of the info from each of the other two registries.
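The replication picture above can be sketched as follows, assuming each registry pushes a full copy of the info it masters to its peers on every registration. The Registry class, its methods and the connect helper are hypothetical and do not reflect the R-GMA code or its replication protocol.

```python
class Registry:
    """Hypothetical registry that masters its own entries and mirrors its peers."""

    def __init__(self, name):
        self.name = name
        self.mastered = {}  # entries this registry is authoritative for
        self.copies = {}    # master name -> copy of that master's entries
        self.peers = []

    def register(self, producer, table):
        self.mastered[producer] = table
        # Push a fresh copy of the mastered info to every peer.
        for peer in self.peers:
            peer.copies[self.name] = dict(self.mastered)

    def lookup(self, producer):
        if producer in self.mastered:
            return self.mastered[producer]
        for entries in self.copies.values():
            if producer in entries:
                return entries[producer]
        return None


def connect(registries):
    """Make every registry a peer of every other one."""
    for registry in registries:
        registry.peers = [p for p in registries if p is not registry]
```

With three connected registries, a producer registered at one of them can be found through either of the other two, which is the resilience the slide refers to.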
TB2.1 – WP4
• Resource management
  • GLUE info provider maintenance
  • Support for LSF, Condor & advance reservation
• Fault tolerance framework
• Gridification
  • LCMAPS-1.0, LCAS-2.0, VOMS plugin, job repository
• Monitoring (see Jan van Eldik’s talk)
  • Full architecture, Oracle & MySQL backends, alarm display
• New Install & Config architecture piloted at CERN, but NOT replacing LCFGng before end of EDG
Install & Config (Ack: WP4)
• Software Package Mgmt Agent (SPMA)
  • SPMA manages the installed packages
  • Runs on Linux (RPM) or Solaris (PKG)
  • SPMA configuration done via an NCM component
  • Can use a local cache for pre-fetching packages (simultaneous upgrades of large farms)
• Automated Installation Infrastructure
  • DHCP and Kickstart (or JumpStart) are re-generated according to CDB contents
  • PXE can be set to reboot or reinstall by operator
• Software Repository
  • Packages (in RPM or PKG format) can be uploaded into multiple Software Repositories
  • Client access is using HTTP, NFS/AFS or FTP
  • Management access subject to authentication/authorization
• Node Configuration Manager (NCM)
  • Configuration Management on the node is done by NCM Components
  • Each component is responsible for configuring a service (network, NFS, sendmail, PBS)
  • Components are notified by the Cdispd whenever there was a change in their configuration
• Configuration Data Base (CDB)
  • Configuration Information store. The information is updated in transactions, it is validated and versioned. Pan Templates are compiled into XML profiles
• Configuration Information is stored in the local cache. It is accessed via NVA-API
Diagram labels: SWRep Servers (packages (rpm, pkg), Mgmt API, ACL’s; http, nfs, ftp; cache), SPMA, SPMA.cfg, NCM Components, NCM, Cdispd, CCM, Installation server (PXE handling, DHCP handling, KS/JS generator, Mgmt API, ACL’s, Registration, Notification), Node Install, Node (re)install?, Client Nodes, CDB.
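As a rough illustration of what "manages the installed packages" could involve, the sketch below compares the packages installed on a node with a desired list and works out what to add, replace or remove. This is an assumed reconciliation scheme for illustration only; plan_package_changes and the sample package names are hypothetical, and the actual SPMA logic and configuration format are not described in the slides.

```python
def plan_package_changes(installed, desired):
    """Compare {name: version} maps and return (to_install, to_replace, to_remove)."""
    # Packages wanted but absent from the node.
    to_install = {name: ver for name, ver in desired.items() if name not in installed}
    # Packages present at a different version: (current version, wanted version).
    to_replace = {
        name: (installed[name], ver)
        for name, ver in desired.items()
        if name in installed and installed[name] != ver
    }
    # Packages on the node that the desired list no longer mentions.
    to_remove = sorted(name for name in installed if name not in desired)
    return to_install, to_replace, to_remove


# Hypothetical sample data, not taken from any real node profile.
installed = {"globus": "2.0", "condor": "6.4", "oldtool": "1.0"}
desired = {"globus": "2.2", "condor": "6.4", "vdt": "1.1"}
plan = plan_package_changes(installed, desired)
```

Applying the same plan on every node is what would make simultaneous upgrades of a large farm tractable, with the local cache avoiding repeated downloads.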
TB2.1 – WP5
• SRM interface
• Asynchronous interaction
• SE setup for WP9(EO)/10(Bio)
• VOMS aware security
• Improved error handling
TB2.1 – WP7
• Probe Coordination Protocol
• Network cost function enhancement
• Network GLUE schema prototype
• QoS & high throughput tests with GEANT
TB2.x User Authorisation
Diagram labels: low frequency, high frequency; CA; service; user; host cert (long life); user cert (long life); crl update; registration; VO-VOMS; voms-proxy-init; proxy cert (short life); service cert (short life); authz cert (short life); authentication & authorization info; edg-java-security; LCAS.
TB2.1 - Security
• VOMS deployment
  • Server manually set up at several places
  • Work on auto-config ongoing – start testing soon
Diagram: VOMS migration phases
• phase 0: testing the VOMS servers
• phase 1: user management on VOMS
• phase 2: compatibility mode: mixed services
• phase 3: fully migrated: only VOMS-aware services
Diagram labels: VOMS, VO-LDAP, voms-ldap-sync, edg-mkgridmap, grid-mapfile, grid-proxy-init, voms-proxy-init, service, user, proxy, proxy (voms).
SAM – D0/CDF
GridPP-2 - Middleware
GridPP-2 Middleware Directions
• Policy
  • Mission critical to PP OR
  • Demonstrable lead on international stage OR
  • Contribute to wider programme leveraging benefit for PP
• Guidelines
  • Clustering of expertise
  • Useful to LCG programme
  • Partnership/collaboration where possible (e.g. UK e-Science)
  • Tech. transfer to industrial sector
  • Awareness of / engagement with GGF (move to OGSA/I)
• Areas
  • Data & Storage Management
  • Workload Management
  • Information & Monitoring
  • Security
  • Networking
Evolution of m/w Effort
GridPP2 - Middleware
• Data Storage & Management
  • Fuller integration of exp meta-data with m/w
  • Site-local data management
    • caches, space reservation, cleanup
  • Full integration of MSS
  • Improved replica optimisation
• Workload Management
  • OGSIfication of the RB
  • Redesign of WM architecture
  • Develop Java client
  • Autonomic aspects of WM
  • New scheduling algorithms
GridPP2 - Middleware
• Information & Monitoring
  • Requirements & architecture revision/cycle
  • OGSIfication of core service(s) (see A.Djaoui’s talk)
  • QA & Production Service Dev
  • Information model co-ordination
  • End-user tools/displays
• Security
  • LCG Security
  • Local Access Control
    • Pool accounts, GACL, /grid, batch interfaces
  • Local Usage Control
  • GridSite
  • VO Access Management
  • VO Usage Management
  • Interface alternative authorisation frameworks
  • Audit and grid intrusion detection
  • Tool ports to other UNIX & Windows
GridPP2 - Middleware
• Networking
  • Next generation Grid Network Performance Measurement Service
  • High performance data transport
  • Resource allocation & reservation services
  • (UKLIGHT participation)
  • (PPNCG support)
General
EDG Quality Group (Slide: R.Jones – Barcelona meeting)
• The Quality Group (QAG) was created in August 2002 with a Quality representative (QAR) from each WP. The QAR ensures the measures are applied inside his/her WP. Chaired by Gabriel Zaquine.
  • http://www.eu-datagrid.org/QAG/
• The Quality Group has produced an EDG developers guide document
  • The document gives an overview of the tools available and conventions to be followed for the software development within EDG:
    • Packaging - Code Management - Automatic Build system - Environment - Interfaces and API's - Documentation
    • Test and validation process - Integration procedure - Style and naming conventions
  • http://edms.cern.ch/document/358824
• Work on EDG 2.0 shows that conventions are not yet being followed by everyone
• All developers must read this document and ensure their software complies
EDG Architecture Group (Slide: R.Jones – Barcelona meeting)
• ATF has been working to clarify the details of the interactions and interfaces of EDG 2.0
  • Continues to meet on a monthly basis http://agenda.cern.ch/displayLevel.php?fid=3l148
  • Work driven by use cases provided by the application representatives
• A document describing the architecture for EDG 2.0 has been produced: https://edms.cern.ch/file/368971/
• ATF has been further empowered to “own” the external interfaces
  • Intended to avoid discrepancies between the interface details agreed by ATF and those found in the software delivered by the mware WPs
  • Baseline document with interface definitions now in preparation
• Mware WPs: please make sure ATF have the APIs for your external interfaces
How far have we come/to go? (n.b. very subjective !!)
Summary
• Achievements
  • RB, VOMS, RLS, RGMA, SE, LCFG,…
• Challenges
  • LCG-1, LCG-2, …
  • OGSA migration
  • Engineering production quality (R3 etc.)
Scalability, Scalability, Scalability
Stability, Stability, Stability
END