Condor's Use of the Cisco Unified Computing System

Presentation Transcript


  1. Condor's Use of the Cisco Unified Computing System

  2. Condor and UCS • How can UCS benefit a Condor pool • Automated provisioning of machines with specialized configurations • Experimenting with a small UCS system and our local Condor pools • CHTC • BaTLab

  3. Center for High-Throughput Computing • Offers free compute resources to any researcher at UW-Madison • Extensive hands-on assistance for researchers • 2000 dedicated cores in CHTC • Many more cores available across campus and in OSG

  4. BaTLab • Automated building and testing of software • Dozens of platforms • Used by Condor and many other projects • www.batlab.org

  5. Use Cases • Switching OS on a machine • Match resource mix to demand • Jobs that require hardware access • GPUs, hardware counters, VM testing • Jobs that need special network configurations • Sensitive data (HIPAA) • Tests analyzing network traffic

  6. Switching OS on a Machine • BaTLab has 1 or 2 machines per platform • Spike in jobs for a platform can cause large backlog • OS-switching machines can help manage backlog
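The backlog idea on this slide can be illustrated with a minimal sketch. It assumes a simple policy that the slides do not state: boot the OS-switching machine into whichever platform has the most idle jobs per machine currently serving it. The function name, the platform labels, and the dictionary inputs are all hypothetical and are not part of Condor, BaTLab, or GoUCS.

```python
def pick_platform_to_boot(idle_jobs, machines):
    """Return the platform with the largest backlog per machine.

    idle_jobs: dict mapping platform name -> number of idle jobs (hypothetical input)
    machines:  dict mapping platform name -> machines currently running it
    Returns None when no platform has idle jobs.
    """
    best_platform = None
    best_ratio = 0.0
    for platform, idle in idle_jobs.items():
        if idle == 0:
            continue
        # Treat a platform with no running machines as having one,
        # to avoid dividing by zero.
        ratio = idle / max(machines.get(platform, 0), 1)
        if ratio > best_ratio:
            best_platform, best_ratio = platform, ratio
    return best_platform


if __name__ == "__main__":
    idle = {"rhel5": 4, "debian6": 40, "macos": 0}
    running = {"rhel5": 2, "debian6": 2, "macos": 1}
    print(pick_platform_to_boot(idle, running))
```

A real deployment would likely also weigh the cost of re-imaging a machine against the expected time to drain the backlog; this sketch ignores that.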

  7. Tools We’re Using • Offline ads • Condor rooster • GoUCS • PXE boot • Cobbler • Puppet

  8. Machine Startup • [Diagram. Labeled components: rooster; Central Manager (negotiator, collector); GoUCS; UCS; Submit Machine; two Execute Machines; cobbler; PXE; startd; puppet; schedd]

  9. Future Work • Improve offline ads and rooster • Multiple ads represent one machine • Pick most-matched ad to awaken • Shut down machine to switch ads
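One way to read "pick most-matched ad to awaken" is sketched below, under loud assumptions: each offline ad is modeled as a plain dictionary, each idle job as a dictionary of required attribute values, and a match means every required value is equal. This is a toy stand-in for ClassAd matchmaking, not how the negotiator or condor_rooster evaluates requirements, and the function names are hypothetical.

```python
def matches(job_requirements, ad):
    """Toy match: every attribute the job requires must equal the ad's value."""
    return all(ad.get(key) == value for key, value in job_requirements.items())


def most_matched_ad(offline_ads, idle_jobs):
    """Return the offline ad that matches the most idle jobs, or None if none match.

    offline_ads: list of dicts, several of which may describe the same
                 physical machine under different OS images
    idle_jobs:   list of dicts of required attribute values
    """
    best_ad = None
    best_count = 0
    for ad in offline_ads:
        count = sum(1 for job in idle_jobs if matches(job, ad))
        if count > best_count:
            best_ad, best_count = ad, count
    return best_ad


if __name__ == "__main__":
    # Two ads for one machine, one per bootable OS image (illustrative values).
    ads = [
        {"Machine": "ucs1", "OpSys": "LINUX"},
        {"Machine": "ucs1", "OpSys": "WINDOWS"},
    ]
    jobs = [{"OpSys": "WINDOWS"}, {"OpSys": "WINDOWS"}, {"OpSys": "LINUX"}]
    print(most_matched_ad(ads, jobs))
```

Once an ad is chosen, the machine would be booted into the configuration that ad describes; switching to a different ad later would require shutting the machine down first, as the slide notes.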

  10. Future Use Case • Jobs that require hardware access • Jobs that use GPUs, hardware counters, VMs • Administrator access required • Jobs may corrupt OS or hardware • Need hardware-level method to stop job and re-image machine

  11. Future Use Case • Special Network Configurations • Jobs with sensitive data • E.g. HIPAA data • Configure machine on secured network • Jobs analyzing network traffic • E.g. Three-machine test where Alice and Bob communicate and Eve listens in • Configure machines on an isolated network