EPP Grid Activities - PowerPoint PPT Presentation

Presentation Transcript

    1. EPP Grid Activities (AusEHEP, Wollongong, Nov 2004)

    2. Grid Anatomy
    What are the essential components?
    - CPU resources + middleware (software providing a common interface)
    - Data resources + middleware: replica catalogues, unifying many data sources
    - Authentication mechanism: certificates (Globus GSI), certificate authorities
    - Virtual Organisation (VO) information services
      - the Grid consists of VOs!? users + resources participating in a VO
      - who is a part of what research/effort/group
      - authorisation for resource use
    - Job scheduling, dispatch, and information services
    - Collaborative information-sharing services
      - documentation & discussion (web, wiki, ...)
      - meetings & conferences (video conf., AccessGrid)
      - code & software (CVS, CMT, PacMan)
      - data information (meta-data systems)

    3. 2nd Generation
    Accessible resources for Belle/ATLAS:
    - We have access to around 120 CPUs (over 2 GHz)
      - APAC, AC3, VPAC, ARC; currently 50% Grid accessible
      - continuing to encourage HPC facilities to install middleware
    - We have access to the ANUSF petabyte storage facility
      - will request ~100 TB for Belle data
    - SRB (Storage Resource Broker)
      - replica catalogue federating KEK/Belle, ANUSF, and Melbourne EPP data storage
      - used to participate in Belle's 4x10^9 event MC production during 2004

    4. 2nd Generation
    SRB (Storage Resource Broker): a globally accessible virtual file system
    - Domains of storage resources
      - e.g. the ANUSF domain contains the ANU petabyte storage facility and disk on Roberts in Melbourne
    - Federations of domains
      - e.g. ANUSF and KEK are federated
        $ Scd /anusf/home/ljw563.anusf
        $ Sls -l
        $ Sget datafile.mdst
        $ Scd /bcs20zone/home/srb.KEK-B
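To make the "virtual file system over federated domains" idea concrete, here is a minimal Python sketch of what a replica catalogue does behind commands like `Sget`: it maps one logical path to physical copies held in several storage domains, preferring a nearby replica. This is purely illustrative, not real SRB code; the domain names and URLs are invented for the example.

```python
# Hypothetical replica catalogue: logical path -> list of (domain, physical URL).
# All names below are made up for illustration.
REPLICA_CATALOGUE = {
    "/anusf/home/ljw563.anusf/datafile.mdst": [
        ("ANUSF", "tape://anu-petabyte/ljw563/datafile.mdst"),
        ("Melbourne", "disk://roberts/epp/datafile.mdst"),
    ],
}

def resolve(logical_path, preferred_domain=None):
    """Return a physical location for a logical path, preferring a
    replica in the caller's own domain when one exists."""
    replicas = REPLICA_CATALOGUE.get(logical_path, [])
    for domain, url in replicas:
        if domain == preferred_domain:
            return url                      # local replica found
    return replicas[0][1] if replicas else None  # fall back to any replica
```

A client in Melbourne would be handed the local disk copy, while any other client falls back to the first registered replica.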

    5. Grid Anatomy (outline slide; content identical to slide 2)

    6. 3rd Generation Solutions
    - NorduGrid -> ARC (Advanced Resource Connector)
      - Nordic countries plus others, like Australia; we've used this for ATLAS DC2
      - Globus 2.4 based middleware
      - stable, patched, and redesigned collection of existing middleware (Globus, EDG)
    - Grid3 Middleware -> VDT
      - US-based coordination between iVDGL, GriPhyN, PPDG
      - Globus 2.4 based middleware
    - LHC Computing Grid (LCG) <- EDG -> EGEE
      - multiple tiers: CERN T0, Japan/Taiwan T1, Australia T2?
      - Regional Operations Centre in Taiwan
      - substantial recent development; needs to be looked at once again!

    7. 3rd Generation Solutions
    - Still a lot of development going on:
      - data-aware job scheduling is still developing
      - VO systems are starting to emerge
      - meta-data infrastructure is basic
    - Deployment is still a difficult task: prescribed system/OS only

    8. Grid Anatomy (outline slide; content identical to slide 2)

    9. Virtual Organisation Systems
    Now there are 3 systems available:
    - EDG/NorduGrid LDAP-based VO
    - VOMS (VO Membership Service) from LCG
    - CAS (Community Authorisation Service) from Globus
    In 2003 we modified the NorduGrid VO software for use with the Belle Demo Testbed, SC2003 HPC Challenge (world's largest testbed):
    - More useful for rapid Grid deployment than the above systems
    - Accommodates resource owners' security policies
      - resource organisations are part of the community
      - their internal security policies are frequently ignored/by-passed
    - Takes CAs into account
      - certificate authorities are a part of the community
      - a VO should be able to list the CAs who they trust to sign certificates
    - Compatible with existing Globus
    - Might be of use/interest to the Australian Grid community?
    GridMgr (Grid Manager)
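The authorisation rules on this slide (trusted CAs are part of the community, resource owners' policies are honoured) can be sketched as a simple check. This is a hedged illustration of the logic described, not GridMgr, VOMS, or CAS code; the CA names and distinguished name below are hypothetical.

```python
# Hypothetical VO trust data: which CAs the VO trusts to sign
# certificates, and which certificate subjects are VO members.
TRUSTED_CAS = {"APACGrid CA", "KEK-GRID CA"}
VO_MEMBERS = {"/C=AU/O=Grid/CN=Alice Example"}

def authorised(subject_dn, issuer_ca, local_denied=frozenset()):
    """Admit a user only if their certificate comes from a CA the VO
    trusts, they are a registered member, and the resource owner's own
    deny list does not exclude them."""
    if issuer_ca not in TRUSTED_CAS:
        return False      # the VO does not trust this certificate authority
    if subject_dn not in VO_MEMBERS:
        return False      # not a registered VO member
    if subject_dn in local_denied:
        return False      # the resource owner's local policy wins
    return True
```

The key design point from the slide is the third check: resource organisations keep their own security policies rather than having the VO by-pass them.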

    10. Virtual Organisation Systems
    How do VOs manage internal priorities?
    - This problem has not yet become apparent! It has been left up to local resource settings.
      - for non-VO resources, changes would require allocation or configuration renegotiation
    - CAS is the only VO middleware to address this
      - done by VOs specifying policies allowing/denying access to resources
      - local resource priorities are not taken into account; difficult to predict the effect
    - VO-managed job queue
      - centrally managed VO priorities, independent of locally managed resource priorities
      - resource job consumers pull jobs from the queue
      - the VO decides, and can change, which jobs are run first
      - results of prototype testing: a fair-share system could be used
        - users/groups are allocated a target fraction of all resources
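The fair-share idea at the end of the slide can be sketched in a few lines: each group is allocated a target fraction, and the centrally managed VO queue hands the next job to whichever group is furthest below its target. This is an illustrative sketch of the concept only, not the prototype's actual code; the group names, targets, and job IDs are invented.

```python
from collections import deque

targets = {"belle": 0.6, "atlas": 0.4}   # allocated target fractions
usage = {"belle": 0, "atlas": 0}         # jobs dispatched so far
queues = {"belle": deque(["b1", "b2", "b3"]),
          "atlas": deque(["a1", "a2"])}

def next_job():
    """Dispatch from the group whose share of dispatched jobs is
    furthest below its target fraction."""
    total = sum(usage.values()) or 1
    ready = [g for g in queues if queues[g]]   # groups with queued jobs
    if not ready:
        return None
    group = min(ready, key=lambda g: (usage[g] / total) / targets[g])
    usage[group] += 1
    return queues[group].popleft()

order = [next_job() for _ in range(5)]
# belle ends up with 3 of the 5 slots (its 0.6 target), atlas with 2
```

Note that the priority decision lives entirely in the VO's queue: resource-side job consumers just pull whatever job is offered, matching the "pull jobs from the queue" model on the slide.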

    11. Grid Anatomy (outline slide; content identical to slide 2)

    12. Data Grid Scheduling
    - Task -> Job1, Job2, ...
    - Job1 -> input replica 1, input replica 2, ...
    - Job1 + input -> CPU resource 1, ...
    How do you determine what + where is best?

    13. Data Grid Scheduling
    What's the problem?
    - Try to schedule wisely: free resources, close to input data, fewer failures
    - Some resources are inappropriate
      - need to parse and check job requirements and resource info (RSL, Resource Specification Language)
    - Job failure is common
      - error reporting is minimal
      - need multiple retries for each operation
      - need to try other resources in case of resource failure
      - eventually we stop and mark a job as BAD
    - What about firewalls? Some resources have CPUs which cannot access data.
    Schedulers:
    - Nimrod/G (parameter sweep, not Data Grid)
    - GridBus Scheduler (2003, 2004 aided them towards SRB)
    - GQSched (prototype developed in 2002, used in 2003 demos)
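The scheduling concerns above (free resources, nearness to input data, retry bookkeeping, giving up on BAD jobs) can be combined into one scoring function. The sketch below is a made-up illustration of such metric-based matchmaking, not GQSched's implementation; resource names, the nearness table, and weights are all hypothetical.

```python
MAX_RETRIES = 3
failures = {}    # (job, resource) -> failure count, updated on each failure

# Lower nearness = replica is "closer" to the CPU resource (invented values).
nearness = {("melb-cpu", "melb-disk"): 0, ("melb-cpu", "anusf-tape"): 2,
            ("apac-cpu", "anusf-tape"): 1, ("apac-cpu", "melb-disk"): 2}
free_slots = {"melb-cpu": 4, "apac-cpu": 0}

def best_assignment(job, replicas):
    """Pick the cheapest (CPU resource, input replica) pair, skipping
    busy resources and resources where this job keeps failing."""
    candidates = []
    for resource in free_slots:
        if free_slots[resource] == 0:
            continue                               # no free CPUs here
        if failures.get((job, resource), 0) >= MAX_RETRIES:
            continue                               # failed too often here
        for replica in replicas:
            cost = (nearness.get((resource, replica), 99)
                    + failures.get((job, resource), 0))
            candidates.append((cost, resource, replica))
    if not candidates:
        return None                                # job is BAD everywhere
    _, resource, replica = min(candidates)
    return resource, replica

choice = best_assignment("job1", ["melb-disk", "anusf-tape"])
```

Returning `None` once every resource is either busy or over the retry limit corresponds to the slide's "eventually we stop and mark a job as BAD".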

    14. Data Grid Scheduling
    GQSched (Grid Quick Scheduler)
    - Idea is based around the Nimrod model (user-driven parameter-sweep dispatcher), with the addition of sweeps over data files and collections
    - Built in 2002 as a demonstration to computer scientists of simple data grid scheduling
    - Simple tool familiar to physicists: shell script, environment parameters
    - Data Grid enabled: seamless access to data catalogues and Grid storage systems
      - protocols: GSIFTP, GASS (and non-Grid protocols also: HTTP, HTTPS, FTP)
      - catalogues: GTK2 Replica Catalog, SRB (currently testing)
    - Scheduling based on metrics for CPU resource + data resource combinations
      - previous failures of job on resource
      - nearness of physical file locations (replicas)
      - resource availability
    - Extra features
      - pre- and post-processing for preparation/collation of data and job status checks
      - creation and clean-up of a unique job execution area
      - private-network friendly: staging of files for specific resources (3-stage jobs)
      - automatic retry and resubmit of jobs
      - reporting of file access errors and job errors
      - merging of RSL requirements for resources and jobs
      - automatic checking and creation of Grid proxy
      - file globbing for Globus file staging, GSIFTP, GTK2 Replica Catalog, SRB
    - Future features
      - scheduling based on optimised output storage location
      - dynamic parameter sweep and RSL specification (e.g. splitting file processing via meta-data)
      - job profile reporting (in XML format, partially tested)
      - automatic proxy renewal or notification of expiry (for long-term jobs)
      - Job Submission Service mode for ongoing tasks such as Belle MC production?
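The core GQSched idea named above, a Nimrod-style parameter sweep extended to sweep over data files, amounts to taking the cross product of the parameter grid and the input file list. Here is a minimal sketch of that expansion step; the script name, parameter names, and file names are hypothetical examples, not part of GQSched.

```python
from itertools import product

def expand_sweep(script, params, input_files):
    """Yield one job spec per (parameter combination, input file):
    the data-file sweep is just one more axis of the parameter sweep."""
    names = sorted(params)                # fix a deterministic parameter order
    jobs = []
    for values in product(*(params[n] for n in names)):
        for f in input_files:
            jobs.append({"script": script,
                         "env": dict(zip(names, values)),  # environment parameters
                         "input": f})
    return jobs

jobs = expand_sweep("myscript.csh",
                    {"SEED": [1, 2], "ENERGY": ["10.58"]},
                    ["run1.mdst", "run2.mdst"])
# 2 seeds x 1 energy x 2 input files -> 4 jobs
```

Each resulting job spec carries its parameters as environment variables and one input file, matching the "shell script, environment parameters" interface described on the slide.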

    15. Grid Scheduling
        $ gqsched myresources myscript.csh

    16. Grid Anatomy (outline slide; content identical to slide 2)

    17. Meta-Data System
    Advanced meta-data repository ("advanced" = above and beyond file/collection-oriented meta-data)
    - Data-oriented queries
      - List the files resulting from task X.
      - Retrieve the list of all simulation data of event type X.
      - How can file X be regenerated? (if lost or expired)
    - Other queries we can imagine
      - What is the status of job X?
      - What analyses similar to X have been undertaken?
      - What tools are being used for X analysis?
      - Who else is doing analysis X or using tool Y?
      - What are the typical parameters used for tool X? And for analysis Y?
      - Search for data skims (filtered sets) that are supersets of my analysis criteria.
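Two of the data-oriented queries above (files from a task, and how a lost file can be regenerated) can be answered from a toy provenance store. The record layout, task ID, and file names below are invented for illustration; the real repository's schema is discussed on the following slides.

```python
# Toy meta-data store: one provenance record per task (invented layout).
METADATA = {
    "evtgen-042": {
        "tool": "BASF",
        "params": {"events": 10000, "type": "BBbar"},
        "outputs": ["mc-042-0.mdst", "mc-042-1.mdst"],
    },
}

def files_from_task(task_id):
    """'List the files resulting from task X.'"""
    return METADATA[task_id]["outputs"]

def regeneration_recipe(filename):
    """'How can file X be regenerated?': find the task, tool, and
    parameters that produced it, or None if nothing is recorded."""
    for task_id, rec in METADATA.items():
        if filename in rec["outputs"]:
            return {"task": task_id, "tool": rec["tool"], "params": rec["params"]}
    return None
```

The point of the sketch is that regeneration only works if the tool and its parameters were captured at job time, which is exactly the auto-generated meta-data gap flagged on the last slide.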

    18. Meta-Data System
    XML: some great advantages
    - natural tree structure
    - strict schema; data can be validated
    - powerful query language (XPath)
    - format is very portable
    - information readily transformable (XSLT)
    ...and some real disadvantages
    - XML databases are still developing, not scalable
    - XML DBs are based on lots of documents of the same type
    - would need to break the tree into domains, and queries become difficult
    LDAP: a compromise
    - natural tree structure
    - loose schema, but well defined
    - reasonable query features, not as good as XML's
    - very scalable (easily distributed and mirrored)
    - information can be converted to XML with little effort if necessary
    - structure/schema is easily accessible and describes itself!
      - might be a way to deal with schema migration (meta-data structure is dynamic; need to preserve old information)
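The XPath advantage claimed above can be shown with Python's standard library, whose `xml.etree.ElementTree` supports a useful subset of XPath. The meta-data layout in the example is a made-up illustration of the "natural tree structure", not the repository's actual schema.

```python
import xml.etree.ElementTree as ET

# Invented example of tree-structured task meta-data.
doc = ET.fromstring("""
<tasks>
  <task id="mc-1"><type>BBbar</type><file>a.mdst</file></task>
  <task id="mc-2"><type>continuum</type><file>b.mdst</file></task>
</tasks>
""")

# XPath-style query: all output files from tasks of a given event type.
files = [f.text for f in doc.findall(".//task[type='BBbar']/file")]
task_ids = [t.get("id") for t in doc.findall("task")]
```

A single path expression answers "retrieve all simulation data of event type X"; expressing the same query over an LDAP tree takes more work, which is the trade-off the slide describes.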

    19. Meta-Data System Components
    - navigation, search, and management of MD
    - task/job/application-generated MD
    - merging and uploading MD

    20. Meta-Data System
    - navigation and creation via web
    - search is coming

    21. How to use it all together?
    - Getting set up
      - certificate from a recognised CA (VPAC)
      - accounts on each CPU/storage resource: ANUSF storage, VPAC, ARC (UniMelb), APAC
      - install required software on resources (e.g. BASF)
      - your certificate in the VO system
    - Running jobs
      - find SRB input files, set up an output collection
      - convert your scripts to GQSched scripts
      - run GQSched to execute the jobs
    - Meta Data
      - find/create a context for your tasks (what you are currently doing)
      - submit this with your job, or store it with the output
      - merge context + output meta-data, then upload
      - NOT COMPLETE: need auto-generated MD from BASF/jobs