Taiwan UniGrid

Yeh-Ching Chung, Department of Computer Science, National Tsing Hua University, Hsin-Chu, 300, Taiwan


  1. Taiwan UniGrid Yeh-Ching Chung Department of Computer Science National Tsing Hua University Hsin-Chu, 300, Taiwan

  2. Outline • Introduction • Portal • Broker and Scheduler • Resource Information Service • Storage Service • Applications • Conclusion

  3. Introduction (1) • The purpose of grid computing is to integrate various resources within a large network environment. • The purpose of the UniGrid project is to build a platform for academic research using grid-related technologies in Taiwan.

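  4. Introduction (2) • Nine institutes jointly develop the system • National Center for High-performance Computing (NCHC) • Department of Computer Science, National Tsing Hua University • Institute of Information Science, Academia Sinica • Department of Computer Science and Information Engineering, National Dong Hwa University • Department of Computer Science, Tunghai University • Department of Computer Science and Information Engineering, Chung Hua University • Department of Information Management, Providence University • Department of Electronic Commerce, Hsing Kuo University of Management • Department of Atmospheric Sciences, National Taiwan University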

  5. Introduction (3) • All institutes that participate in the UniGrid project contribute some resources. • These resources can be used in collaboration for large scale applications.

  6. Introduction (4) • System Architecture

  7. Outline • Introduction • Portal • Broker and Scheduler • Resource Information Service • Storage Service • Applications • Conclusion

  8. Portal • The UniGrid portal provides an interface for UniGrid users to use the resources available in the UniGrid system. • Functionalities of the portal • System status monitoring • Single sign-on • User workflow management • Project information

  9. System Status Monitoring (1) • UniGrid users can examine the status of system resources through the portal. • The portal gathers the current system information from the information service and presents this information to the users.

  10. System Status Monitoring (2) • Screenshot of the system status monitoring web page

  11. Single Sign-On (1) • Single sign-on is a mechanism whereby a single authentication permits a user to access every resource he has permission to use, without entering multiple passwords. • All user account information is kept in a database at the portal site. • When a user requests a service, his verification data is passed to that service. • The request is granted only if the identity is verified by the verification web service.
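The verification step on this slide can be sketched as follows. The slides do not say what the "verification data" looks like, so this sketch assumes an HMAC-signed token issued by the portal at login; `SECRET_KEY`, `ACCOUNTS`, `issue_token`, and `verify` are hypothetical names, not part of UniGrid.

```python
import hashlib
import hmac

# Hypothetical secret known only to the portal's verification service.
SECRET_KEY = b"example-secret"

# Hypothetical account table: user name -> resources the user may access.
ACCOUNTS = {"alice": {"cluster-a", "storage"}}


def _sign(user: str) -> str:
    """Return the hex HMAC-SHA256 signature of a user name."""
    return hmac.new(SECRET_KEY, user.encode(), hashlib.sha256).hexdigest()


def issue_token(user: str) -> str:
    """Portal side: sign the user name once, at login."""
    return f"{user}:{_sign(user)}"


def verify(token: str, resource: str) -> bool:
    """Verification service: check the signature, then the access permission."""
    user, _, signature = token.partition(":")
    # Constant-time comparison of the presented and expected signatures.
    if not hmac.compare_digest(signature.encode(), _sign(user).encode()):
        return False
    return resource in ACCOUNTS.get(user, set())
```

A service would call `verify(token, "storage")` on each request and grant it only when the result is `True`, so the user authenticates once and never re-enters a password.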

  12. Single Sign-On (2) • User identity verification through single sign-on service

  13. User Workflow Management (1) • A UniGrid user can design and save his own workflows at the UniGrid portal. • A user can select any workflow he designed and execute the workflow through the UniGrid portal. • A user can also monitor the status of his workflow through the UniGrid portal.

  14. User Workflow Management (2) • Structure of a workflow [Figure labels: Workflow, parallel execution, sequential execution]

  15. User Workflow Management (3) • The workflows of each user are stored in the portal storage in XML format. •
      <flow name="testflow" numstages="3">
        <stage name="stage1" numjobs="1">
          <job id="0">
            <sortkey>1</sortkey>
            <runtype>mpi</runtype>
            <workdir>/home/test/</workdir>
            <filename>mm_mpi</filename>
            <runrp>true</runrp>
            <datafile/>
            <argu>256</argu>
            <otherurl/>
            <cpuno>4</cpuno>
          </job>
        </stage>
        …
      </flow>
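A minimal sketch of how a workflow document in this format could be read, using Python's standard `xml.etree.ElementTree`. The sample below keeps only the one stage shown on the slide (so `numstages` is set to 1 here), and the meanings given to the elements (for example `cpuno` as a CPU count) are inferred from their names rather than stated in the slides. `load_workflow` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# One-stage sample based on the slide; the slide's elided stages are not reproduced.
WORKFLOW_XML = """
<flow name="testflow" numstages="1">
  <stage name="stage1" numjobs="1">
    <job id="0">
      <sortkey>1</sortkey>
      <runtype>mpi</runtype>
      <workdir>/home/test/</workdir>
      <filename>mm_mpi</filename>
      <runrp>true</runrp>
      <datafile/>
      <argu>256</argu>
      <otherurl/>
      <cpuno>4</cpuno>
    </job>
  </stage>
</flow>
"""


def load_workflow(xml_text):
    """Parse a workflow document into plain dictionaries and lists."""
    root = ET.fromstring(xml_text.strip())
    stages = []
    for stage in root.findall("stage"):
        jobs = []
        for job in stage.findall("job"):
            jobs.append({
                "id": job.get("id"),
                "runtype": job.findtext("runtype"),
                # Assumed: workdir + filename gives the executable path.
                "command": job.findtext("workdir", "") + job.findtext("filename", ""),
                "args": job.findtext("argu"),
                "cpus": int(job.findtext("cpuno", "1")),
            })
        stages.append({"name": stage.get("name"), "jobs": jobs})
    return {"name": root.get("name"), "stages": stages}
```

Calling `load_workflow(WORKFLOW_XML)` yields one stage containing one MPI job that requests four CPUs.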

  16. User Workflow Management (4) • Screenshot of the workflow editing web page

  17. User Workflow Management (5) • When a user submits a workflow, the portal passes the selected workflow information to the broker. • Upon receiving an execution request, the resource broker finds the required resources for that workflow and schedules its execution.

  18. User Workflow Management (6)

  19. User Workflow Management (7) • A user can examine the execution status of his workflow through the portal’s workflow monitoring system. • All workflow execution information is stored in a database on the machine where the resource broker is installed. • The portal queries the database and obtains the current status of a particular workflow. • The status information is processed and presented in the form of web pages.

  20. User Workflow Management (8) • Screenshot of the workflow monitoring web page

  21. User Workflow Management (9) • Screenshot of the UniGrid workflow management web page

  22. Outline • Introduction • Portal • Broker and Scheduler • Resource Information Service • Storage Service • Applications • Conclusion

  23. Broker & Scheduler (1) • The broker provides a uniform interface to access available resources in the UniGrid system. • The broker uses the resource information service to obtain the current status of the resources in the system. • After this information is gathered, the broker allocates the resources that meet the requirements of the current job. • The jobs are then passed to the corresponding local schedulers to be executed locally.
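The allocation step described on this slide might look like the sketch below. The slides do not give the broker's selection rule, so the tie-break (prefer the site with the most free CPUs) and the names `Site`, `Job`, and `allocate` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Resource status as it might be reported by the information service."""
    name: str
    free_cpus: int
    online: bool = True


@dataclass
class Job:
    name: str
    cpus: int


def allocate(job, sites):
    """Return the name of a site that can run the job, or None if none can."""
    candidates = [s for s in sites if s.online and s.free_cpus >= job.cpus]
    if not candidates:
        return None
    # Assumed tie-break: prefer the site with the most free CPUs.
    chosen = max(candidates, key=lambda s: s.free_cpus)
    chosen.free_cpus -= job.cpus  # reserve the CPUs before handing off
    return chosen.name
```

The returned site name stands in for the hand-off to that site's local scheduler.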

  24. Broker & Scheduler (2) • Broker workflow

  25. Broker & Scheduler (3) • Each participating organization has a local scheduler (Condor) installed to schedule the jobs assigned to that organization. • Condor • A scheduler for large collections of distributively owned computing resources • Developed by researchers at the University of Wisconsin • Specialized for compute-intensive jobs • Uses the “ClassAd” mechanism to match job requirements to machine status and schedules the jobs according to the matching results
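The matching idea behind ClassAds can be illustrated with a simplified sketch. This is not Condor's ClassAd language or API: advertisements are modeled here as plain Python dictionaries, with a `requirements` predicate on both sides and a `rank` function on the job side, and `match` is a hypothetical helper.

```python
def match(job_ad, machine_ads):
    """Return the name of the best mutually acceptable machine, or None."""
    acceptable = [
        m for m in machine_ads
        # Matching is two-sided: the job must accept the machine and vice versa.
        if job_ad["requirements"](m) and m["requirements"](job_ad)
    ]
    if not acceptable:
        return None
    # Among acceptable machines, pick the one the job ranks highest.
    return max(acceptable, key=job_ad["rank"])["name"]


machines = [
    {"name": "node1", "arch": "INTEL", "memory": 2048,
     "requirements": lambda job: job["owner"] != "guest"},
    {"name": "node2", "arch": "INTEL", "memory": 4096,
     "requirements": lambda job: True},
]

job = {
    "owner": "alice",
    "requirements": lambda m: m["arch"] == "INTEL" and m["memory"] >= 1024,
    "rank": lambda m: m["memory"],  # prefer machines with more memory
}

best = match(job, machines)  # "node2": both accept, node2 has more memory
```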

  26. Related Research (1) • Tools have been developed to simulate different load sharing and scheduling policies on computing grids and to analyze their performance • Queuing methods • Independent clusters • Multiple queues • Forwarding to no-need-to-wait site • Forwarding to shortest-queue site • Forwarding to least-load site, load=

  27. Related Research (2) • Queuing methods (cont’d.) • Single queue • Multi-pool centralized queue • Single-pool centralized queue • One big cluster • Two-level scheduling • Empty queue only • Shortest queue first • Least load first • Two-level local queues • Forwarding to shortest-queue site
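Two of the forwarding policies named above can be sketched as site-selection functions. The slides only name the policies, so the rules below are one plausible reading, and `queues` (site name mapped to the number of waiting jobs) is an assumed data structure. The least-load policy is left out because the transcript cuts off its load definition.

```python
def no_need_to_wait_site(queues, local):
    """Stay local unless the local queue is busy and another site is idle."""
    if queues[local] == 0:
        return local
    for site, waiting in queues.items():
        if waiting == 0:
            return site
    return local


def shortest_queue_site(queues, local):
    """Forward to the site with the fewest waiting jobs; ties favor the local site."""
    best = min(queues, key=queues.get)
    return local if queues[local] == queues[best] else best
```

With `queues = {"a": 3, "b": 0, "c": 1}` and a job submitted at site `a`, both functions forward the job to site `b`.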

  28. Related Research (3) • Scheduling policies • Non-FCFS • Multi-pool centralized queue • Single-pool centralized queue • FCFS • Two-level scheduling • The performance of Non-FCFS is three times better than FCFS

  29. Related Research (4) • Implementation Approaches • Multi-Pool Centralized Queue • Global queue scheduling in the broker, no local queuing system • Global queue scheduling in the broker, confirming available processors through the local queuing system • Single-Pool Centralized Queue • Global queue scheduling in the broker, no local queuing system

  30. Related Research (5) • Two-Level Scheduling (Empty-Queue-Only Multi-Pool Grid) • Global queue in the broker, local queues in the local queuing systems

  31. Related Research (6) • Simulation results

  32. Related Research (7) • Simulation results (cont’d.)

  33. Related Research (8) • Discussion • Non-FCFS methods can effectively improve the overall system utilization and performance. • The smallest-first non-FCFS policy outperforms all other policies in terms of waiting time and waiting ratio. • As far as the worst case is concerned, the backfilling policy is superior because it does not allow jobs to be delayed by the backfilling activities.
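A toy calculation shows why a smallest-first order can reduce waiting time compared with FCFS. It assumes that "smallest" means shortest runtime and that the pool runs one job at a time, neither of which the slides specify, and it does not reproduce the simulation results reported in the slides.

```python
def average_wait(runtimes):
    """Mean waiting time when jobs run one after another in the given order."""
    waited, clock = 0, 0
    for runtime in runtimes:
        waited += clock   # this job waited until all earlier jobs finished
        clock += runtime
    return waited / len(runtimes)


jobs = [5, 1, 3]                             # runtimes in arrival order
fcfs = average_wait(jobs)                    # waits 0, 5, 6
smallest_first = average_wait(sorted(jobs))  # waits 0, 1, 4
```

Running the short jobs first means fewer jobs sit behind the long one, which lowers the average wait.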

  34. Outline • Introduction • Portal • Broker & Scheduler • Resource Information Service • Storage Service • Applications • Conclusion

  35. Resource Information Services • The resource information service provides information about current resource status; this information can be used by other services of the system • Functionalities of the resource information service • Information system • Performance visualization of MPI parallel program execution

  36. Information System (1) • Provides an interface for other services to query various information about computing nodes • The statistics about the individual nodes are obtained using MDS (Monitoring & Discovery Service) provided by the Globus Toolkit • The current network status between machines is gathered using NWS (Network Weather Service) • Automatic update of node information • When a computing node is added or removed

  37. Information System (2) • The Network Weather Service (NWS) • A distributed system that periodically monitors and dynamically forecasts the performance that various network and computational resources can deliver over a given time interval • Developed by researchers at UCSB • It uses numerical models to generate forecasts of what the conditions will be for a given time frame • Because this functionality is analogous to weather forecasting, the system is called the Network Weather Service
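The forecasting idea can be sketched as follows. NWS is generally described as running several simple predictors over the measurement history and reporting the one with the lowest accumulated error; the three predictors, the window size, and the function names below are illustrative assumptions, not the actual NWS models or API.

```python
from statistics import mean, median

# Illustrative predictors: each maps a non-empty history to a predicted next value.
FORECASTERS = {
    "last_value": lambda history: history[-1],
    "running_mean": lambda history: mean(history),
    "sliding_median": lambda history: median(history[-5:]),
}


def forecast(measurements):
    """Predict the next value with the predictor that has erred least so far.

    Requires at least one measurement. Returns (predictor_name, prediction).
    """
    errors = {name: 0.0 for name in FORECASTERS}
    # Replay the history: score each predictor on every value it could have predicted.
    for i in range(1, len(measurements)):
        history, actual = measurements[:i], measurements[i]
        for name, predict in FORECASTERS.items():
            errors[name] += abs(predict(history) - actual)
    best = min(errors, key=errors.get)
    return best, FORECASTERS[best](measurements)
```

For a steadily rising series such as `[1, 2, 3, 4, 5]`, the last-value predictor accumulates the least error and is chosen.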

  38. Information System (3)

  39. Information System (4) • Screenshot of the node status webpage

  40. Performance Visualization of MPI Programs (1) • Input: any application (depending on the availability of compiler in grid platform) • Output: performance visualization of the execution of this application

  41. Performance Visualization of MPI Programs (2) • Execution of a Parallel Application using 4 computing nodes

  42. Related Research (1) • Communication localization & data partitioning techniques in cluster-based grid system • Localized communication enhances performance of parallel applications on grid • Adaptive data partitioning for identical cluster & non-identical cluster grid topology • In-core & out-of-core applications

  43. Related Research (2) • Communication localization techniques for identical cluster [Figure panels: original communication patterns; localized communication patterns]

  44. Related Research (3) • Communication localization techniques for non-identical cluster [Figure: original communication table]

  45. Related Research (4) • Communication localization techniques for non-identical cluster (cont’d.) [Figure: localized communication table]

  46. Outline • Introduction • Portal • Broker and Scheduler • Resource Information Service • Storage Service • Applications • Conclusion

  47. Storage Service • The goal of the storage service is to provide a collaborative space where UniGrid users can share their data and resources with others. • Components of the storage service • Virtual storage system • Data management system

  48. Virtual Storage System (1) • Virtual storage system architecture

  49. Virtual Storage System (2) • The virtual storage system is implemented in Java as a web service • UniGrid services access the virtual storage system when they need to fetch or modify users’ data files • A client program is available for users to manage their own storage space • The files are stored in a master file server and replicas of the files are distributed to other machines
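The master-plus-replicas layout can be illustrated with an in-memory sketch. The actual system is a Java web service whose interface is not shown in the slides; the class `VirtualStorage`, its methods, and the eager replication on every write are assumptions made only to show the data flow.

```python
class VirtualStorage:
    """In-memory stand-in for a master file server with replica machines."""

    def __init__(self, replica_names):
        self.master = {}
        self.replicas = {name: {} for name in replica_names}

    def put(self, path, data):
        """Write to the master, then copy the file to every replica."""
        self.master[path] = data
        for store in self.replicas.values():
            store[path] = data

    def get(self, path, prefer=None):
        """Read from the preferred replica if it holds the file, else from the master."""
        if prefer in self.replicas and path in self.replicas[prefer]:
            return self.replicas[prefer][path]
        return self.master[path]
```

A service close to one replica could pass its name as `prefer` to avoid a round trip to the master.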

  50. Virtual Storage System (3)
