
The OxGrid Resource Broker

David Wallom. OxGrid is a university campus grid that gives users a single entry point to shared and dedicated resources.


Presentation Transcript


  1. The OxGrid Resource Broker David Wallom

  2. Overview • OxGrid • Resource Broking • Why build our own • Job Submission and other tools • Future developments

  3. OxGrid, a University Campus Grid • Single entry point for users to shared and dedicated resources • Seamless access to NGS and OSC for registered users

  4. Resource Broking • The original idea of the grid relied on efficient resource broking to abstract the user away from the resources • This has been significantly neglected by grid software developers • Push or pull mechanisms: each has significant advantages and disadvantages • Resources that have multiple job sources increase complexity manyfold

  5. Why build our own? • OxGrid is intended to be a lightweight development • Replacement of individual components should be simple • Use of service-based interfaces is the goal • Current solutions do not allow this, with massive dependencies and non-trivial maintenance requirements • Condor-G is a simple off-the-shelf Grid metascheduler, so why make it so much more complicated?

  6. Condor Matchmaking • Matchmaking is a methodology for Distributed Resource Management • Conceptually simple: service providers and requesters advertise; compatible advertisements are matched; matched entities cooperate to perform the service • Developed for opportunistic environments: use resources as and when available (Thanks to Miron and the Condor Team)

  7. Condor Matchmaking (Cont.) • Customers and Servers advertise to a Matchmaking Service • Advertisements describe the advertising entities: characteristics, requirements and constraints, preferences • These descriptions are called classified advertisements (classads) (Thanks to Miron and the Condor Team)
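
The matchmaking idea on the two slides above can be sketched in a few lines of Python. This is an illustration of the concept only, not Condor's implementation or its ClassAd language: the dictionary layout, the `requirements` and `rank` callables, and the names `is_match` and `matchmake` are all assumptions made for this example.

```python
from typing import Any, Dict, List, Optional, Tuple

# An advertisement is modelled as a plain dict (an assumption for this sketch).
# "requirements" is a callable (own_ad, other_ad) -> bool;
# "rank" is a callable (own_ad, other_ad) -> number expressing a preference.
Ad = Dict[str, Any]


def is_match(job: Ad, machine: Ad) -> bool:
    """Two ads are compatible only if each side accepts the other."""
    job_ok = job["requirements"](job, machine)
    machine_ok = machine["requirements"](machine, job)
    return job_ok and machine_ok


def matchmake(jobs: List[Ad], machines: List[Ad]) -> List[Tuple[str, Optional[str]]]:
    """Pair each job with the compatible machine that the job ranks highest."""
    results: List[Tuple[str, Optional[str]]] = []
    for job in jobs:
        candidates = [m for m in machines if is_match(job, m)]
        if not candidates:
            results.append((job["name"], None))  # stays unmatched this pass
            continue
        best = max(candidates, key=lambda m: job["rank"](job, m))
        best["cur_matches"] += 1  # the machine now holds one more match
        results.append((job["name"], best["name"]))
    return results


# Hypothetical advertisements, loosely modelled on the attributes in the slides.
machines = [
    {"name": "linux-small", "opsys": "LINUX", "memory": 501, "cur_matches": 0,
     "requirements": lambda me, job: me["cur_matches"] < 20},
    {"name": "linux-full", "opsys": "LINUX", "memory": 2048, "cur_matches": 20,
     "requirements": lambda me, job: me["cur_matches"] < 20},
]
jobs = [
    {"name": "job-1",
     "requirements": lambda me, m: m["opsys"] == "LINUX",
     "rank": lambda me, m: m["memory"]},
    {"name": "job-2",
     "requirements": lambda me, m: m["opsys"] == "WINDOWS",
     "rank": lambda me, m: m["memory"]},
]

matches = matchmake(jobs, machines)
```

Here `job-1` prefers more memory but can only be matched to `linux-small`, because `linux-full` has already reached its match limit; `job-2` finds no compatible machine and stays unmatched.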

  8. Static and Dynamic Information • Static information: e.g. processor architecture, physical memory, operating system, scheduling system, no. of nodes • Dynamic information: e.g. system availability, scheduler load, queue length, used disk or memory

  9. OxGrid Virtual Organisation Manager Database • Final repository for authorisation information • Stores additional static information for each resource such as capability and maximum number of submitted jobs for that node

  10. Data Harvesting cycle • Information sources can be added or removed at will • A source is either a single repository for information aggregation (e.g. ngsinfo) or an individual machine • A simple internal representation of information makes it easy to add new types of information source
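
One way to picture a harvesting cycle with pluggable sources is sketched below. The slides do not show the broker's internal representation, so everything here is an assumption: each source is modelled as a function returning a mapping from resource name to attributes, and `static_source`, `dynamic_source`, and `harvest` are hypothetical names with made-up return values.

```python
from typing import Callable, Dict, List

# Internal representation assumed for this sketch:
# resource name -> {attribute name -> value}
Info = Dict[str, Dict[str, object]]
Source = Callable[[], Info]


def static_source() -> Info:
    """Hypothetical stand-in for static data, e.g. from the VO manager database."""
    return {"bedrock.oucs.ox.ac.uk": {"OpSys": "LINUX", "Arch": "INTEL", "max_jobs": 20}}


def dynamic_source() -> Info:
    """Hypothetical stand-in for a dynamic source (aggregator or single machine)."""
    return {"bedrock.oucs.ox.ac.uk": {"Memory": 501, "queue_length": 3}}


def harvest(sources: List[Source]) -> Info:
    """Run one harvesting cycle: poll every source and merge results per resource."""
    merged: Info = {}
    for source in sources:
        try:
            report = source()
        except Exception:
            # A failing source is skipped so that it cannot stop the cycle.
            continue
        for resource, attrs in report.items():
            merged.setdefault(resource, {}).update(attrs)
    return merged


# Adding or removing a source is just a change to this list.
sources = [static_source, dynamic_source]
snapshot = harvest(sources)
```

Because every source returns the same simple shape, a new kind of source only needs a function that produces that shape; later sources in the list overwrite attributes reported by earlier ones.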

  11. Generated classad
      MyType = "Machine"
      TargetType = "Job"
      Name = "bedrock.oucs.ox.ac.uk-condor"
      gatekeeper_url = "bedrock.oucs.ox.ac.uk/jobmanager-condor"
      Requirements = (CurMatches < 20) && (TARGET.JobUniverse == 9)
      WantAdRevaluate = True
      UpdateSequenceNumber = 1097580300
      CurMatches = 0
      OpSys = "LINUX"
      Arch = "INTEL"
      Memory = 501
      MPI = False
      INTEL_COMPILER = True
      GCC3 = True
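
A classad like the one above could be rendered from harvested attributes roughly as follows. This is a minimal sketch, not the OxGrid generator: `format_value` and `render_machine_ad` are hypothetical helpers, the fixed lines simply mirror the slide, and only a subset of the slide's attributes is emitted.

```python
from typing import Dict, Union

Value = Union[str, int, bool]


def format_value(value: Value) -> str:
    """Format a Python value the way it appears in the slide's classad."""
    # bool is tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "True" if value else "False"
    if isinstance(value, int):
        return str(value)
    # Strings are double-quoted; embedded quotes are backslash-escaped.
    return '"' + value.replace('"', '\\"') + '"'


def render_machine_ad(host: str, jobmanager: str,
                      attrs: Dict[str, Value], max_matches: int) -> str:
    """Build the text of a machine classad for one gatekeeper/jobmanager pair."""
    lines = [
        'MyType = "Machine"',
        'TargetType = "Job"',
        f'Name = "{host}-{jobmanager}"',
        f'gatekeeper_url = "{host}/jobmanager-{jobmanager}"',
        f"Requirements = (CurMatches < {max_matches}) && (TARGET.JobUniverse == 9)",
        "CurMatches = 0",
    ]
    lines += [f"{key} = {format_value(val)}" for key, val in attrs.items()]
    return "\n".join(lines)


ad = render_machine_ad(
    "bedrock.oucs.ox.ac.uk",
    "condor",
    {"OpSys": "LINUX", "Arch": "INTEL", "Memory": 501, "MPI": False},
    max_matches=20,
)
```

The `max_matches` parameter corresponds to the per-resource limit on submitted jobs that the slides say is stored as static information.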

  12. Tuning Condor to act as a metascheduler • The default configuration of Condor is as a cycle scavenger • Alter this by ensuring that the Negotiator attempts to match all available tasks on each pass • Since we are a Condor-G-only system, we change the default universe of the system to grid

  13. Changes to Condor configuration
      DEFAULT_UNIVERSE = GLOBUS
      CLASSAD_LIFETIME = 900
      NEGOTIATE_ALL_JOBS_IN_CLUSTER = True
      NEGOTIATOR_INTERVAL = 30
      JOB_START_DELAY = 10
      GRIDMANAGER_JOB_PROBE_INTERVAL = 60

  14. Job Submission • Most users are comfortable with command-line applications • Condor submission scripts would be another language for our users to learn… • Provide the submission step as a scriptable application with arguments • Created job-submission

  15. job-submission
      -h  <HOSTNAME>/<JOBMANAGER>
      -e  <EXECUTABLE>
      -t  Boolean: transfer the executable?
      -a  Arguments for the executable
      -i  Input files to be transferred
      -o  Output files to be transferred

  16. Job classad
      executable = update_file
      Transfer_Executable = True
      globusscheduler = $$(gatekeeper_url)
      Requirements = (TARGET.gatekeeper_url == "t2ce02.physics.ox.ac.uk/jobmanager-lcgpbs" || TARGET.gatekeeper_url == "condor.oucs.ox.ac.uk/jobmanager-condor" || TARGET.gatekeeper_url == "grid-compute.oesc.ox.ac.uk/jobmanager-pbsox" || TARGET.gatekeeper_url == "bedrock.oucs.ox.ac.uk/jobmanager-sge") && TARGET.gatekeeper_url =!= UNDEFINED && TARGET.OpSys == "LINUX"
      match_list_length = 1
      arguments = TEST_3_2.in TEST_3_2.out
      transfer_input_files = TEST_3_2.in
      transfer_output_files = TEST_3_2.out
      WhenToTransferOutput = ON_EXIT
      universe = grid
      grid_type = gt2
      notification = ERROR
      output = temp-1168783341-2.out
      error = temp-1168783341-2.err
      log = temp-1168783341-2.log
      queue
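
The step from the job-submission options to a submit description like the one above might look roughly like this. It is a sketch under several assumptions, not the actual tool: `-t` is treated as a plain flag, `-h` is treated as optional (no host means any advertised gatekeeper may match), and the helper names and the `tag` used for the output file names are hypothetical.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # add_help=False frees "-h", which the slide uses for the host/jobmanager.
    parser = argparse.ArgumentParser(prog="job-submission", add_help=False)
    parser.add_argument("-h", dest="target", default=None)
    parser.add_argument("-e", dest="executable", required=True)
    parser.add_argument("-t", dest="transfer_exe", action="store_true")
    parser.add_argument("-a", dest="arguments", default="")
    parser.add_argument("-i", dest="inputs", default="")
    parser.add_argument("-o", dest="outputs", default="")
    return parser


def submit_description(argv: List[str], tag: str = "job") -> str:
    """Turn job-submission style arguments into Condor submit description text."""
    opts = build_parser().parse_args(argv)
    requirements = "TARGET.gatekeeper_url =!= UNDEFINED"
    if opts.target:
        # Pin the job to the requested gatekeeper when one is given.
        requirements = f'(TARGET.gatekeeper_url == "{opts.target}") && {requirements}'
    lines = [
        f"executable = {opts.executable}",
        f"Transfer_Executable = {opts.transfer_exe}",
        "globusscheduler = $$(gatekeeper_url)",
        f"Requirements = {requirements}",
    ]
    if opts.arguments:
        lines.append(f"arguments = {opts.arguments}")
    if opts.inputs:
        lines.append(f"transfer_input_files = {opts.inputs}")
    if opts.outputs:
        lines.append(f"transfer_output_files = {opts.outputs}")
        lines.append("WhenToTransferOutput = ON_EXIT")
    lines += [
        "universe = grid",
        "grid_type = gt2",
        f"output = {tag}.out",
        f"error = {tag}.err",
        f"log = {tag}.log",
        "queue",
    ]
    return "\n".join(lines)


text = submit_description([
    "-h", "bedrock.oucs.ox.ac.uk/jobmanager-sge",
    "-e", "update_file", "-t",
    "-a", "TEST_3_2.in TEST_3_2.out",
    "-i", "TEST_3_2.in", "-o", "TEST_3_2.out",
])
```

The point of the wrapper is the one made on the Job Submission slide: users pass familiar command-line options and never have to write the submit description themselves.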

  17. Additional User Tools • oxgrid_certificate_import: simplifies the installation of a user's digital certificate to a single command • oxgrid_q: displays the user's current queue at the resource broker, with an option to show the full task queue • oxgrid_status: displays the resources that are available to the user, with an option to show all resources currently registered with the resource broker • oxgrid_cleanup: removes either a single submitted process or a range of child processes together with their master

  18. oxgrid_status

  19. Users • Statistics • Materials science • Inorganic chemistry • Theoretical chemistry • Biochemistry • Computational biology • Astrophysics • Condensed matter physics • Zoology • Researchers and students

  20. Future Developments • As part of GridBS project development: • Additional direct submission into MS CCS using GridSAM BLAH • Addition of new types of data sources: EGEE, Grimoires • Continue to improve packaging to ensure ease of installation and re-distribution

  21. Conclusion • We have designed a resource broker that is orders of magnitude smaller, with minimal external dependencies • Simple tools have given users of OxGrid easy access to resources in many different institutions • Over 65k individual tasks have been submitted to connected resources since January
