DIRAC and SAFE

  1. DIRAC and SAFE

  2. DIRAC requirements
     • DIRAC serves a variety of different user communities.
       • These have different computational requirements, best served by different types of computer.
       • User communities are spread across many different institutions.
     • Resources are geographically distributed and run by multiple organisations.
       • Some of these resources are provided by existing services with existing procedures.
     • Funding is limited.
       • Mostly only hardware was funded.
       • Need to provide the rest of the service as efficiently as possible.
       • Need to utilise existing infrastructure and processes where possible.
       • Avoid unnecessary complications.

  3. Stakeholders
     • DIRAC management
       • Need overview of usage of resources to inform allocation policy.
       • Need mechanisms to implement allocation policy.
     • Research communities
       • Need resource usage information to manage community science programme.
       • Need mechanisms to manage community membership.
       • Need mechanisms to manage community resources.
     • Users
       • Need to be able to request accounts (frequently at remote institutions).
       • Need to access accounts remotely.
       • Want to get on with science without additional complications.

  4. Level of integration
     • Most requirements for integration are at the management level.
     • Experience suggests a strong correlation between user communities and compute resource.
       • Communities will choose resources appropriate to their science.
       • Users will want to access the unique features of these resources.
       • Though projects may span resources, most individual users will probably stick to a single system.
     • Global accounts, single sign-on, etc. are not essential.

  5. GRID?
     • Computational grid not appropriate.
       • Grids are designed to provide uniform access to interchangeable resources. DIRAC resources are complementary, not interchangeable.
       • A grid provides a standard interface, but only to features common to all systems.
     • Data grid may be more relevant.
       • Depends on the data handling requirements of user communities.
       • Need to gather more requirements.

  6. SAFE design principles
     • SAFE has been built to provide a single point of contact for users of national HPC services.
       • Role essentially that of the ITIL service desk.
       • Originally deployed for the HPCx service; currently used for the HECToR service. Also used for internal EPCC services.
     • Provides a well-defined interface for service providers.
       • Tries to express all requests as standard tickets.
       • Supports multiple service providers with different support policies.
     • Has to make very few technological assumptions.
       • Users can come from any academic institution. Can't assume much more than email and web.
       • We usually bid to run a service in parallel with hardware procurement. We have little say over hardware or system software and need to adapt SAFE quickly to provide the service if the bid is successful.

  7. SAFE design principles II
     • Has to be flexible rather than prescriptive.
       • Requirements have changed constantly over the 10 years of SAFE development.
       • Need to be able to quickly implement new reports or policies generated by RCs or policy panels.
       • Need to maintain access to old data even when the current system/policy has changed.
       • Need to be able to integrate new services into existing instances.
       • Need to be able to adapt tickets to meet the needs of service teams and underlying infrastructure.
     • Controlling our own software gives us a great deal of flexibility.
       • We have built up an extensive toolbox to allow rapid implementation of new requirements.

  8. What can SAFE offer DIRAC?
     • Software already exists and is already managing the BG/Q service (minimal cost).
     • It is designed to handle distributed user communities from many different institutions.
     • Many DIRAC users will already be familiar with it.
     • It is designed to handle multiple service providers with different operating policies.
     • While SAFE supports many features, sites only need to adopt those that fit their normal way of working.

  9. SAFE as a service
     • Can use the BG/Q SAFE to provide a service for the whole of DIRAC.
       • Host, install, maintain, modify where necessary.
       • Generates necessary reports and statistics for the whole of DIRAC.
       • Provides a single point to manage project membership, account creation, etc.
     • Lightweight and non-intrusive integration with service providers.
       • Special handling to work within local policies.
       • Choice over which features are adopted.
     • A centralised service requires minimal changes to existing software and only needs O(N) interactions, not O(N^2).
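
The O(N) versus O(N^2) point can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function names are made up here, and it assumes the alternative to a central service is every site integrating directly with every other site.

```python
def hub_links(n_sites: int) -> int:
    """Integrations needed when each site talks only to one central service."""
    return n_sites


def pairwise_links(n_sites: int) -> int:
    """Integrations needed when every site talks directly to every other site."""
    return n_sites * (n_sites - 1) // 2


if __name__ == "__main__":
    # Compare how the two approaches grow with the number of sites.
    for n in (2, 5, 10):
        print(f"{n} sites: central={hub_links(n)} pairwise={pairwise_links(n)}")
```

The gap is small for a handful of sites but widens quickly as more providers join.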

  10. Account creation
      • Accounts are requested via SAFE.
        • SAFE sends the request to the project manager.
        • Once approved, SAFE raises a ticket with the service provider.
        • Default is to do this by email; XML is available for scripts.

      Example email ticket:

          Hi Support,

          This user has been authorised to have an account on one of our machines.
          Please create a new user account for them using the following information.

          Task ID: 46067
          Machine: hector
          Username: demo
          Email: s.booth@ed.ac.uk
          User's Name: Dr Stephen P Booth
          Consortium: z01 - USL
          Project Group(s): z01
          UID: 13535
          GID: 1001

          Thanks,
          The SAF.

          P.S. You can see the current pending queue by looking at
          https://www.hector.ac.uk/safe/servlet/SysAdminServlet

      Example XML ticket:

          <SysAdmin>
            <Id>46067</Id>
            <Type>New User</Type>
            <Status>Pending</Status>
            <StarDate>2012-6-4 11:3:51</StarDate>
            <EndDate>0000-00-00 00:00:00</EndDate>
            <Machine>hector</Machine>
            <Project>
              <Code>z01</Code>
              <Name>USL</Name>
            </Project>
            <ProjectGroup>
              <Code>z01</Code><GroupID>1001</GroupID>
            </ProjectGroup>
            <Account>
              <Name>demo</Name>
              <UID>13535</UID>
              <GID>1001</GID>
              <Groups>z01</Groups>
            </Account>
            <Person>
              <Name><Title>Dr</Title><Firstname>Stephen</Firstname><Lastname>Booth</Lastname></Name>
              <Email>s.booth@ed.ac.uk</Email>
            </Person>
          </SysAdmin>
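
A site script consuming the XML form of a ticket might look roughly like the sketch below. It uses only Python's standard `xml.etree.ElementTree` and a trimmed copy of the example ticket from the slide. The function names and the choice to print a `useradd` command are assumptions for illustration; the slides do not say how any site actually creates accounts.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example ticket shown on the slide.
TICKET_XML = """
<SysAdmin>
  <Id>46067</Id>
  <Type>New User</Type>
  <Status>Pending</Status>
  <Machine>hector</Machine>
  <Account>
    <Name>demo</Name>
    <UID>13535</UID>
    <GID>1001</GID>
    <Groups>z01</Groups>
  </Account>
  <Person>
    <Email>s.booth@ed.ac.uk</Email>
  </Person>
</SysAdmin>
"""


def parse_ticket(xml_text: str) -> dict:
    """Pull the fields a site needs for account creation out of a ticket."""
    root = ET.fromstring(xml_text)
    return {
        "task_id": int(root.findtext("Id")),
        "type": root.findtext("Type"),
        "status": root.findtext("Status"),
        "machine": root.findtext("Machine"),
        "username": root.findtext("Account/Name"),
        "uid": int(root.findtext("Account/UID")),
        "gid": int(root.findtext("Account/GID")),
        "email": root.findtext("Person/Email"),
    }


def useradd_command(ticket: dict) -> list:
    """Build (but do not run) a useradd command line; hypothetical site policy."""
    return ["useradd", "-u", str(ticket["uid"]), "-g", str(ticket["gid"]),
            ticket["username"]]


ticket = parse_ticket(TICKET_XML)

if __name__ == "__main__":
    if ticket["type"] == "New User" and ticket["status"] == "Pending":
        print(" ".join(useradd_command(ticket)))
```

Building the command as a list rather than executing it keeps the sketch safe to run anywhere; a real site would substitute its own account-creation procedure.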

  11. Completing tickets
      • Once the account is created, the service provider needs to notify SAFE via a web form.
        • Manually via browser or automatically via script.
        • The service provider can reject tickets.
        • Initial (one-shot?) password is returned to SAFE for retrieval by the user.
        • A similar mechanism is possible for password resets.
      • We can gather more information if needed.
        • IP address ranges have been requested.
      • We can encode local policies on usernames and UID/GID ranges into SAFE.
        • Or we can let the site choose UID/GID/username and return the values to SAFE when completing the ticket.
        • UID/GID only need to be managed centrally if supporting file-system cross mounts.
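
As a rough illustration of what "encoding a local UID/GID range policy" could mean, the sketch below picks the next free ID inside a site-specific range. Everything here is hypothetical: the `IdRange` class, the range values, and the allocation rule are assumptions, not a description of how SAFE implements this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdRange:
    """Inclusive range of IDs a site allows for accounts (hypothetical policy)."""
    low: int
    high: int

    def contains(self, value: int) -> bool:
        return self.low <= value <= self.high


def next_free_id(id_range: IdRange, in_use: set) -> int:
    """Return the lowest ID in the range that is not already allocated."""
    for candidate in range(id_range.low, id_range.high + 1):
        if candidate not in in_use:
            return candidate
    raise ValueError("no free IDs left in range")


if __name__ == "__main__":
    site_uids = IdRange(13000, 13999)   # made-up range for one site
    allocated = {13000, 13001}          # made-up existing accounts
    print(next_free_id(site_uids, allocated))
```

When a site prefers to choose IDs itself, the same check (`contains`) could instead be used to validate the values it returns on ticket completion.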

  12. Accounting/Reports
      • SAFE contains an extensive accounting sub-system.
      • Accounting data is parsed into DB tables.
        • Do NOT mandate a fixed format; instead, keep data close to its raw format and define mappings to standard properties.
        • Easier to change system/policy without re-importing old data.
        • Easier to handle different service provider policies.
        • A single report may combine data from multiple tables in different formats, provided the report is based on common properties.
      • Service providers only need to provide DIRAC usage data in some convenient format.
        • Normally upload data daily.
      • Can also support storage accounting, though this does currently use a fixed format.
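
The idea of keeping raw records and mapping them to standard properties can be sketched as follows. The two record layouts, the property names (`project`, `core_hours`) and the site labels are all invented for illustration; SAFE's real tables and property names are not given in the slides.

```python
# Raw usage records, stored as each (hypothetical) site supplies them.
SITE_A_RECORDS = [
    {"user": "demo", "account": "z01", "ncpus": 32, "walltime_s": 3600},
]
SITE_B_RECORDS = [
    {"login": "demo", "budget": "z01", "cores": 16, "elapsed_h": 2.0},
]

# Per-table mappings from the raw layout to common, standard properties.
MAPPINGS = {
    "site_a": {
        "project": lambda r: r["account"],
        "core_hours": lambda r: r["ncpus"] * r["walltime_s"] / 3600,
    },
    "site_b": {
        "project": lambda r: r["budget"],
        "core_hours": lambda r: r["cores"] * r["elapsed_h"],
    },
}


def core_hours_by_project(tables: dict) -> dict:
    """One report over several tables, written only against common properties."""
    totals = {}
    for table_name, records in tables.items():
        mapping = MAPPINGS[table_name]
        for record in records:
            project = mapping["project"](record)
            hours = mapping["core_hours"](record)
            totals[project] = totals.get(project, 0.0) + hours
    return totals


report = core_hours_by_project({"site_a": SITE_A_RECORDS, "site_b": SITE_B_RECORDS})

if __name__ == "__main__":
    print(report)
```

Because the raw records are never rewritten, a change in charging policy only means editing a mapping, which matches the slide's point about not re-importing old data.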

  13. Resource Management
      • SAFE can provide more detailed resource management, using a 3-level model.
        • Project: top level. Corresponds to a grant of resources from the allocation panel; mostly internal to SAFE.
        • ProjectGroup: internal project management grouping, controlled by the project PI or designated managers through the web interface. These can be just compute budgets but may also correspond to unix groups if used to manage disk resources.
        • User: individual user.
      • Though this gives a lot of fine control to the PI/PM, it requires more integration with the service provider.
      • Sites can choose to use local resource management procedures instead.
      • Accounting does NOT depend on SAFE managing the resources.
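
The 3-level model can be pictured with a small data-structure sketch. Class and field names below are assumptions chosen to mirror the slide's terms; the budget units and the rule that group budgets may not exceed the project allocation are illustrative guesses, not documented SAFE behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class User:
    username: str


@dataclass
class ProjectGroup:
    """Budget managed by the PI; may optionally map to a unix group."""
    code: str
    budget: float
    used: float = 0.0
    unix_gid: Optional[int] = None
    members: List[User] = field(default_factory=list)

    @property
    def remaining(self) -> float:
        return self.budget - self.used

    def charge(self, amount: float) -> None:
        """Record usage against the group budget, refusing to overspend."""
        if amount > self.remaining:
            raise ValueError("budget exceeded")
        self.used += amount


@dataclass
class Project:
    """Top level: a grant of resources from the allocation panel."""
    code: str
    name: str
    allocation: float
    groups: Dict[str, ProjectGroup] = field(default_factory=dict)

    def add_group(self, group: ProjectGroup) -> None:
        committed = sum(g.budget for g in self.groups.values())
        if committed + group.budget > self.allocation:
            raise ValueError("group budgets exceed project allocation")
        self.groups[group.code] = group


project = Project(code="z01", name="USL", allocation=1000.0)
group = ProjectGroup(code="z01", budget=600.0, unix_gid=1001)
group.members.append(User("demo"))
project.add_group(group)
group.charge(100.0)

if __name__ == "__main__":
    print(group.remaining)
```

A site that keeps its own resource management would simply not use the `charge` path; as the slide notes, accounting still works without it.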
