
Grid Challenge - programming competition on the Grid -

22nd APAN Meeting in Singapore. Grid Challenge - programming competition on the Grid -. Kento Aida, Tokyo Institute of Technology. What is Grid Challenge? A programming competition to develop high-performance programs on the Grid. The organizer operates a Grid testbed.


Presentation Transcript


  1. 22nd APAN Meeting in Singapore
     Grid Challenge - programming competition on the Grid -
     Kento Aida, Tokyo Institute of Technology

  2. What is Grid Challenge?
     • a programming competition to develop high-performance programs on the Grid
       • The organizer operates a Grid testbed.
       • Participants develop/run programs on the testbed.
     • a special event in the Annual Symposium on Advanced Computing Systems and Infrastructures (SACSIS)
     • history
       • 1st Grid Challenge in SACSIS 2005
       • 2nd Grid Challenge in SACSIS 2006

  3. Category
     • compulsory
       • programming competition on the Grid testbed
       • solving the problem provided by the organizer
         • Graph Partitioning Problem
       • students (university and high school)
     • free
       • giving opportunities to perform experiments on the Grid
       • presentations during the conference
       • students, engineers and researchers

  4. Compulsory: Graph Partitioning Problem
     [Figure: an example graph with vertices 1 to 6 divided into partitions L and R]
     • For a given undirected graph G(V,E) with |V| = 2n, L and R are disjoint partitions generated by equally dividing G, where |L| = |R|.
     • Find the partition that minimizes the number of edges with one endpoint in L and the other in R.
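
The objective on this slide can be written down directly. The following is a minimal Python sketch, not the organizer's reference code: `cut_size` counts the edges crossing between L and R, and `best_bisection` tries every balanced split, which is only feasible for tiny graphs. The function names and the six-vertex example graph (two triangles joined by one edge) are assumptions made for illustration; the edges of the graph drawn on the slide are not recoverable from the transcript.

```python
from itertools import combinations


def cut_size(edges, left):
    """Count the edges that have exactly one endpoint in `left`."""
    left = set(left)
    return sum((u in left) != (v in left) for u, v in edges)


def best_bisection(vertices, edges):
    """Exhaustively search all balanced bisections (tiny graphs only)."""
    vertices = sorted(vertices)
    half = len(vertices) // 2
    first = vertices[0]
    best = None
    # Fix the first vertex in L so mirror-image partitions are skipped.
    for rest in combinations(vertices[1:], half - 1):
        left = {first, *rest}
        size = cut_size(edges, left)
        if best is None or size < best[0]:
            best = (size, left)
    return best


# Hypothetical example graph: two triangles joined by the edge (3, 4).
EXAMPLE_EDGES = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]

if __name__ == "__main__":
    print(best_bisection(range(1, 7), EXAMPLE_EDGES))
```

Exhaustive search grows exponentially with |V|, so it serves only to make the definition concrete; the competition sizes call for heuristics.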

  5. Compulsory (cont’d)
     • qualifying runs (3 weeks)
       • Solve early!
       • to find a solution within a given threshold
       • shared resources
       • problem size: |V| = 500 - 1500
     • final runs (2 weeks)
       • Solve fast!
       • dedicated time slots for finalists (2.5 h per team)
       • to find a solution within a given period (10 min)
       • The finalist with the best solution will be the winner!
       • problem size: |V| = 30000 - 35000
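
The final-run rule (report the best solution found within a fixed period) suggests an "anytime" search that can be stopped at a deadline. The sketch below is one simple possibility, assuming a random pair-swap hill climb; it is not the method used by any participant, and the names `anytime_bisection`, `time_limit_s` and `seed` are hypothetical.

```python
import random
import time


def cut_size(edges, left):
    """Count the edges that have exactly one endpoint in `left`."""
    return sum((u in left) != (v in left) for u, v in edges)


def anytime_bisection(vertices, edges, time_limit_s, seed=0):
    """Improve a random balanced bisection until the deadline passes."""
    rng = random.Random(seed)
    order = list(vertices)
    rng.shuffle(order)
    half = len(order) // 2
    left, right = order[:half], order[half:]
    best = cut_size(edges, set(left))
    deadline = time.monotonic() + time_limit_s
    while best > 0 and time.monotonic() < deadline:
        i = rng.randrange(len(left))
        j = rng.randrange(len(right))
        # Swapping one vertex from each side keeps |L| = |R|.
        left[i], right[j] = right[j], left[i]
        new = cut_size(edges, set(left))
        if new <= best:
            best = new
        else:
            # Undo a swap that made the cut worse.
            left[i], right[j] = right[j], left[i]
    return best, set(left)


if __name__ == "__main__":
    edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
    print(anytime_bisection(range(1, 7), edges, time_limit_s=0.05))
```

Recomputing the whole cut after every swap costs O(|E|) per step; at the final-run sizes an implementation would more plausibly update the cut incrementally and spread the search across the testbed.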

  6. Free
     • experiments of research projects (1 month)
     • shared resources
     • projects
       • tools
         • a monitoring tool, a message passing system, a programming tool, volunteer computing
       • applications
         • physics simulation, bioinformatics, simulation of a diesel engine, optimization problems

  7. Participants
     [Figure: two charts of participants, one for the compulsory category and one for the free category; extracted labels: H, 1; D, 2; U, 1; D, 2; U, 6; M, 12; M, 5]

  8. Testbed
     • Grid Challenge Federation
       • AIST
       • Tokyo Institute of Technology
       • The University of Tokyo
       • Doshisha University
     • more than 1,200 CPUs

  9. Resources
     • collection of PC clusters
     • spec of a PC cluster
       • a gateway node
         • gateway, compiling
       • computing nodes
         • computation
       • global IP address/private IP address
       • NFS
         • “/home” is shared among nodes

  10. Resources (cont’d)

  11. Internet Connection
      [Figure: network diagram; extracted labels: SAKURA, Tsukuba WAN, F32, PrestoIII, WIDE, Chikayama, DIS, Tau, SINET, Xenia]

  12. Software
      • Grid middleware
        • Globus Toolkit 2.4
      • batch queueing system
        • Sun Grid Engine, PBS
      • remote process invocation
        • SSH, GXP
      • monitoring
        • Ganglia
      • programming
        • MPICH 1.2.7, Ninf-G 2.4

  13. GXP (http://www.logos.ic.i.u-tokyo.ac.jp/phoenix/gxp_quick_man.shtml)
      • shell for distributed multi-cluster environment
        • fast simultaneous command submissions
        • parallel job pipes
        • interactive selection of nodes to execute commands
      • no cumbersome per-node operations!
        • installation and deployment
        • invocation of parallel processes
        • monitoring, trouble diagnosis, debugging
        • dead processes clean-up

  14. Ninf-G (http://ninf.apgrid.org/)
      • reference implementation of GridRPC
      • GridRPC: a simple RPC-based programming model for the Grid
        • Client invokes remote libraries installed on remote servers on the Grid.
        • utilizing task parallelism
      [Figure: a client program issues grpc_call(…), sending data to libraries on servers and receiving results]
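
The task-parallel pattern described on this slide (a client hands independent pieces of data to libraries on several servers and collects the results) can be illustrated without a Grid. The sketch below is a local analogy in Python using `concurrent.futures`; it does not use the Ninf-G or GridRPC API, and `remote_library`, `call_all` and the server names are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def remote_library(server, data):
    """Hypothetical stand-in for a library installed on `server`."""
    return server, sum(data)


def call_all(servers, chunks):
    """Issue one asynchronous call per server, then gather every result."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [
            pool.submit(remote_library, server, chunk)
            for server, chunk in zip(servers, chunks)
        ]
        # Results come back in submission order once each call finishes.
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(call_all(["server-a", "server-b"], [[1, 2], [3, 4]]))
```

In a GridRPC program the submit step would correspond to calls made through the middleware rather than to local threads; the structure of "issue calls, then wait for results" is the part this sketch is meant to show.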

  15. Ganglia (http://ganglia.sourceforge.net/)
      • a distributed monitoring tool for high-performance computing systems such as PC clusters and Grids
        • CPU load
        • memory usage
        • network traffic

  16. Operation
      • The testbed is operated by volunteers!
        • researchers/technical staff/students
      • What we need to do
        • installation and its training for students
        • user management
        • job management

  17. User Management
      • local account
        • the same UID and login name for a user on all sites
        • remote login via ssh
          • public key
      • Globus account
        • temporary CA for the Grid Challenge

  18. Job Management
      • interactive or batch
        • All sites provide both environments for job execution.
      • dedicated slot
        • Finalists are assigned dedicated slots for their application runs.
        • the gentlemen’s agreement

  19. Troubles …
      • computing nodes
        • OS hang-ups, trouble with hard disk drives
      • power supply
        • failure to balance the power supply
      • servers
        • trouble with NFS, batch queueing systems
      • monitoring
        • trouble collecting monitoring data with Ganglia

  20. Troubles … (cont’d)
      • jobs being out of control
        • waste of CPU/memory resources by out-of-control jobs
      • dedicated slots
        • jobs running beyond their slots

  21. Operational Issue
      • trouble on computing nodes
        • monitoring tools to identify computing nodes
      • power supply
        • critical problem for small groups, e.g., a lab in a university
        • tools for power monitoring
        • low-power processors
      • servers
        • redundancy

  22. Operational Issue (cont’d)
      • user/process management
        • tools to control user processes
          • monitoring user processes
          • detecting unusual behavior
          • suspending/killing jobs being out of control
        • tools for reservation
          • reserving dedicated slots for users
          • controlling user jobs
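
The reservation tooling asked for on this slide needs, at minimum, a way to decide which running jobs fall outside the currently reserved slot. The following Python sketch shows that check as a pure function; it is not tied to Sun Grid Engine, PBS or any tool used on the testbed, and the `Slot` and `Job` records, the field names and the team names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    user: str
    start: float  # seconds since an arbitrary reference time
    end: float


@dataclass(frozen=True)
class Job:
    job_id: int
    user: str


def jobs_outside_slot(jobs, slots, now):
    """Return the jobs whose owner holds no slot covering `now`."""
    active_users = {s.user for s in slots if s.start <= now < s.end}
    return [job for job in jobs if job.user not in active_users]


if __name__ == "__main__":
    # Two consecutive 2.5 h (9000 s) slots for hypothetical teams.
    slots = [Slot("team1", 0, 9000), Slot("team2", 9000, 18000)]
    jobs = [Job(1, "team1"), Job(2, "team2")]
    print(jobs_outside_slot(jobs, slots, now=9500))
```

A monitoring loop could pass the returned jobs to whatever suspend or kill mechanism the site's batch system provides; that step is deliberately left out here.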

  23. Snapshots
      [Figure: snapshots of the qualifying runs and the final runs]

  24. Snapshots (cont’d)

  25. Conclusions
      • Grid Challenge is a programming competition to develop high-performance programs on the Grid.
        • compulsory and free categories
      • Grid testbed for Grid Challenge
        • 6 sites, 7 PC clusters, >1200 CPUs
        • Globus, SGE, PBS, GXP, Ganglia, Ninf-G, MPICH, …
      • discussion about operational issues
        • tools for monitoring, power supply, user/process management

  26. Acknowledgements
      • Information Processing Society of Japan
      • Sun Microsystems
      • Soum Corporation
      • Grid Consortium Japan

  27. Thank you.
