
Self-configuring Condor Virtual Machine Appliances for Ad-Hoc Grids


Presentation Transcript


  1. Self-configuring Condor Virtual Machine Appliances for Ad-Hoc Grids. Renato Figueiredo, Arijit Ganguly, David Wolinsky, J. Rhett Aultman, P. Oscar Boykin. ACIS Lab, University of Florida. http://wow.acis.ufl.edu

  2. Outline • Motivations • Background • Condor Virtual Appliance: features • On-going and future work

  3. Motivations • Goal: plug-and-play deployment of Condor grids • High-throughput computing; LAN and WAN • Collaboration: file systems, messaging, .. • Synergistic approach: VM + virtual network + Condor • “WOWs” are wide-area NOWs, where: • Nodes are virtual machines • Network is virtual: IP-over-P2P (IPOP) overlay • VMs provide: • Sandboxing; software packaging; decoupling • Virtual network provides: • Virtual private LAN over WAN; self-configuring and capable of firewall/NAT traversal • Condor provides: • Match-making, reliable scheduling, … unmodified

  4. Condor WOWs - outlook • 1. Prime a base VM image with the O/S, Condor, and the virtual network software; publish it (Web/Torrent) • 2. Download the image; boot it using a free VM monitor (e.g. VMware Player or Server) • 3. Create a virtual IP namespace for the pool (MyGrid: 10.0.0.0/255.0.0.0); prime a custom image with the virtual namespace and desired tools; bootstrap the manager(s) • 4. Download the base and custom VM images; boot up • 5. VMs obtain IP addresses (10.0.0.1-10.0.0.4 in the diagram) from the MyGrid virtual DHCP server, join the virtual IP network, discover the available manager(s), and join the pool • 5b. VMs in a second pool do the same against the OtherGrid virtual DHCP server
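
Step 2 of the walkthrough above amounts to a couple of host-side commands. A minimal sketch assuming a Linux host with VMware Player installed; the download path and file names are placeholders, not the project's published ones:

    # Fetch and unpack the published appliance image (URL path and names are illustrative)
    wget http://wow.acis.ufl.edu/path/to/grid-appliance.zip
    unzip grid-appliance.zip
    # Boot it with the free VMware Player (the .vmx file name is hypothetical)
    vmplayer grid-appliance/grid-appliance.vmx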

  5. Condor WOW snapshot [map: WOW nodes in Gainesville, Zurich, and Long Beach]

  6. Roadmap • The basics: 1.1 VMs and appliances 1.2 IPOP: IP-over-P2P virtual network 1.3 Grid Appliance and Condor • The details: 2.1 Customization, updates 2.2 User interface 2.3 Security 2.4 Performance • Usage experience

  7. 1.1: VMs and appliances • System VMs: • VMware, KVM, Xen • Homogeneous system • Sandboxing • Co-exist with unmodified hosts • Virtual appliances: • Hardware/software configuration packaged in easy-to-deploy VM images • Only dependencies: ISA (x86), VMM

  8. 1.2: IPOP virtual networking • Key technique: IP-over-P2P tunneling • Interconnect VM appliances • WAN VMs perceive a virtual LAN environment • IPOP is self-configuring • Avoid administrative overhead of VPNs • NAT and firewall traversal • IPOP is scalable and robust • P2P routing deals with node joins and leaves • IPOP networks are isolated • One or more private IP address spaces • Decentralized DHCP serves addresses for each space

  9. 1.2: IPOP virtual networking [diagram: an application on IPOP node A (tap0 10.0.0.2, eth0 128.227.136.244) reaches an application on IPOP node B (tap0 10.0.0.3, eth0 139.70.24.100) through the P2P overlay] • Structured overlay network topology • Bootstrap 1-hop IP tunnels on demand • Discover NAT mappings; decentralized hole punching • VM keeps its IPOP address even if it migrates on the WAN • [Ganguly et al., IPDPS 2006, HPDC 2006]
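
The tunneling idea in the diagram can be sketched in a few lines of C#, the language IPOP itself is written in. This is only a sketch, not the Brunet/IPOP code: every class, field, and method name here is made up for illustration. A packet captured from tap0 is encapsulated as the payload of an overlay message and sent to whatever physical endpoint has been resolved for its destination virtual IP.

    // Sketch only; not the actual IPOP implementation.
    using System.Collections.Generic;
    using System.Net;
    using System.Net.Sockets;

    class IpopTunnelSketch {
        // Stand-in for the overlay's routing state: virtual IP -> physical UDP endpoint.
        readonly Dictionary<IPAddress, IPEndPoint> shortcuts =
            new Dictionary<IPAddress, IPEndPoint>();
        readonly UdpClient overlaySocket = new UdpClient();

        // Called for each IP packet captured from the tap0 device.
        public void Forward(byte[] ipPacket) {
            // The IPv4 destination address occupies bytes 16..19 of the header.
            var dest = new IPAddress(new[] {
                ipPacket[16], ipPacket[17], ipPacket[18], ipPacket[19] });

            IPEndPoint hop;
            if (shortcuts.TryGetValue(dest, out hop)) {
                // A direct shortcut exists: encapsulate the whole IP packet in one UDP message.
                overlaySocket.Send(ipPacket, ipPacket.Length, hop);
            }
            // Otherwise the packet would be handed to the structured overlay for
            // multi-hop routing, which also triggers the connect-to-me (CTM)
            // shortcut setup described on the NAT-traversal slides.
        }

        static void Main() {
            // Smoke test: a fabricated 20-byte IPv4 header whose destination is 10.0.0.3.
            var pkt = new byte[20];
            pkt[16] = 10; pkt[19] = 3;
            new IpopTunnelSketch().Forward(pkt);
        }
    }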

  10. 1.3 Grid appliance and Condor • Base: Debian Linux; Condor; IPOP • Works on x86 Linux/Windows/MacOS; VMware, KVM/QEMU • 157MB zipped • Uses NAT and host-only NICs • No need to get IP address on host network • Managed negotiator/collector VMs • Easy to deploy schedd/startd VMs • Flocking is easy – virtual network is a LAN
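
Because every appliance sits on the same virtual LAN, the Condor side needs nothing unusual. Below is a rough sketch of what the relevant lines of an appliance's condor_config.local could look like; the addresses and values are illustrative examples, not the appliance's actual defaults:

    ## Illustrative excerpt only; all IPs are examples from the virtual address space.
    CONDOR_HOST        = 10.0.0.1       # managed negotiator/collector VM on the virtual LAN
    NETWORK_INTERFACE  = 10.0.0.37      # bind Condor to the IPOP (tap) address, not to eth0
    DAEMON_LIST        = MASTER, STARTD, SCHEDD
    FLOCK_TO           = 10.128.0.1     # manager of another pool, still one virtual-LAN hop away
    HOSTALLOW_WRITE    = 10.*           # only hosts inside the private virtual network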

  11. 2.1: Customization and updates • VM image: Virtual Disks • Portable medium for data • Growable after distribution • Disks are logically stacked • Leverage UnionFS file system • Three stacks: • Base – O/S, Condor, IPOP • Module – site specific configuration (e.g. nanoHUB) • Home – user persistent data • Major updates: replace base/module • Minor updates: automatic, apt-based
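
The stacking is a UnionFS mount assembled at boot. A minimal sketch of the idea using the classic unionfs mount syntax; the branch paths are invented for illustration and are not the appliance's actual layout:

    # Writable home branch stacked over the site module and the read-only base (paths illustrative)
    mount -t unionfs -o dirs=/mnt/home=rw:/mnt/module=ro:/mnt/base=ro unionfs /mnt/root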

  12. 2.2: User interface (Windows host) [screenshot: VM console with X11 GUI, host-mounted loopback Samba folder, loopback SSH session]

  13. 2.2: User interface (Mac host) [screenshot: VM console with X11 GUI, host-mounted loopback Samba folder, loopback SSH session]

  14. 2.2: User interface (Linux host) [screenshot: VM console with X11 GUI, host-mounted loopback Samba folder, loopback SSH session]

  15. 2.3 Security • Appliance firewall • eth0: block all outgoing Internet packets • Except DHCP, DNS, IPOP’s UDP port • Only traffic within WOW allowed • eth1 (host-only): allow ssh, Samba • IPsec • X.509 host certificates • Authentication and end-to-end encryption • VM joins WOW only with signed certificate bound to its virtual IP • Private net/netmask: ~10 lines of IPsec configuration for an entire class A network!
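
The "~10 lines" figure is plausible because the policy only has to name the private prefix once. A hedged sketch in Linux ipsec-tools (setkey) syntax, not necessarily the appliance's actual configuration, requiring ESP for all traffic inside the class A virtual network:

    # Illustrative policy covering the entire 10/8 virtual address space
    spdflush;
    spdadd 10.0.0.0/8 10.0.0.0/8 any -P out ipsec esp/transport//require;
    spdadd 10.0.0.0/8 10.0.0.0/8 any -P in  ipsec esp/transport//require;
    # Key exchange and the X.509 host-certificate checks are left to the IKE daemon (e.g. racoon).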

  16. 2.4: Performance • User-level C# IPOP implementation (UDP): • Link bandwidth: 25-30 Mbit/s • Latency overhead: ~4 ms • Connection times: • ~5-10 s to join the P2P ring and obtain a DHCP address • ~10 s to create shortcuts via UDP hole punching • [chart: SimpleScalar 3.0 (cycle-accurate CPU simulator) results]

  17. Experiences • Bootstrap WOW with VMs at UF and partners • Currently ~300 VMs, IPOP overlay routers (Planetlab) • Exercised with 10,000s of Condor jobs from real users • nanoHUB: 3-week long, 9,000-job batch (BioMoca) submitted via a Condor-G gateway • P2Psim, CH3D, SimpleScalar • Pursuing interactions with users and the Condor community for broader dissemination

  18. Time scales and expertise • Development of baseline VM image: • VM/Condor/IPOP expertise; weeks/months • Development of custom module: • Domain-specific expertise; hours/days/weeks • Deployment of VM appliance: • No previous experience with VMs or Condor • 15-30 minutes to download and install VMM • 15-30 minutes to download and unzip appliance • 15-30 minutes to boot appliance, automatically connect to a Condor pool, run condor_status and a demo condor_submit job
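
The demo job in the last step needs nothing beyond a stock submit file. A minimal sketch of the kind of job a first-time user could run; the file name and executable are arbitrary examples, not something shipped with the appliance:

    # demo.submit (illustrative)
    universe   = vanilla
    executable = /bin/hostname
    output     = demo.$(Process).out
    error      = demo.$(Process).err
    log        = demo.log
    queue 1

Running condor_status then lists the startds visible across the virtual LAN, and condor_submit demo.submit hands the job to whichever machine the matchmaker selects.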

  19. On-going and future work • Enhancing self-organization at the Condor level: • Structured P2P for manager publish/discovery • Distributed hash table (DHT); primary and flocking • Condor integration via configuration files, DHT scripts • Unstructured P2P for matchmaking • Publish/replicate/cache classads on P2P overlay • Support for arbitrary queries • Condor integration: proxies for collector/negotiator • Decentralized storage, cooperative caching • Virtual file systems (NFS proxies) • Distribution of updates, read-only code repositories • Caching and COW for diskless, net-boot appliances

  20. Acknowledgments • National Science Foundation NMI, CI-TEAM • SURA SCOOP (Coastal Ocean Observing and Prediction) • http://wow.acis.ufl.edu • Publications, Brunet/IPOP code (GPL’ed C#), Condor Grid appliance

  21. Questions?

  22. Self-organizing NAT traversal, shortcuts [diagram: Node A sends a CTM request to Node B: "connect to me at my NAT IP:port"] • A starts exchanging IP packets with B • Traffic inspection triggers a request to create a shortcut: Connect-to-me (CTM) • "A" tells "B" its known address(es) • "A" had learned its NATed public IP/port when it joined the overlay

  23. Self-organizing NAT traversal, shortcuts [diagram: Node B sends a CTM reply through the overlay ("send to NAT (IP:port)B") and then a link request directly to NAT endpoint (IP:port)A] • "B" sends the CTM reply, routed through the overlay • "B" tells "A" its address(es) • "B" initiates the linking protocol by attempting to connect to "A" directly

  24. Self-organizing NAT traversal, shortcuts [diagram: A gets the CTM reply and initiates linking with Node B] • B's linking-protocol message to A pokes a hole in B's NAT • A's linking-protocol message to B pokes a hole in A's NAT • The CTM protocol establishes a direct shortcut
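
The hole-punching step at the end of this exchange can be made concrete with plain UDP sockets. The sketch below is not the Brunet linking protocol; it is written in C# only because that is what IPOP uses, and all names and message contents are invented. Once each side has learned the other's NAT-mapped endpoint through the overlay, both sides send probes to that endpoint; the outbound probes open mappings in their own NATs, which is what lets the peer's probes arrive.

    // Hole-punching sketch only; run one instance behind each NAT.
    using System;
    using System.Net;
    using System.Net.Sockets;
    using System.Text;
    using System.Threading;

    class HolePunchSketch {
        // args: <localPort> <peerPublicIp> <peerPublicPort>, endpoints learned via the overlay
        static void Main(string[] args) {
            var sock = new UdpClient(int.Parse(args[0]));
            var peer = new IPEndPoint(IPAddress.Parse(args[1]), int.Parse(args[2]));

            // Each outbound probe opens (or refreshes) a mapping in our own NAT,
            // so the peer's probes addressed to that mapping can reach us.
            for (int i = 0; i < 5; i++) {
                byte[] probe = Encoding.ASCII.GetBytes("link-request");
                sock.Send(probe, probe.Length, peer);
                Thread.Sleep(200);
            }

            // If the peer runs the same loop toward our mapped endpoint,
            // one of its probes arrives here and the direct shortcut is usable.
            var from = new IPEndPoint(IPAddress.Any, 0);
            byte[] msg = sock.Receive(ref from);
            Console.WriteLine("direct path up: got \"{0}\" from {1}",
                              Encoding.ASCII.GetString(msg), from);
        }
    }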

  25. Performance considerations • CPU-intensive application, Condor • SimpleScalar 3.0d execution-driven computer architecture simulator

  26. Performance considerations • I/O: PostMark • Version 1.51 • Parameters: • Minimum file size: 500 bytes • Maximum file size: 4.77 MB • Transactions: 5,000

  27. Performance considerations • User-level C# IPOP implementation (UDP): • Link bandwidth: 25-30 Mbit/s (LAN) • Latency overhead: ~4 ms • Connection times: • Fine-tuning has reduced the mean acquire time to ~6-10 s, with degree of redundancy n=8

  28. Condor Appliance on a desktop [diagram: VM with a hardware configuration and stacked virtual disks: Linux/Condor/IPOP base, domain-specific tools, user files, swap]

  29. Related Work • Virtual Networking • VIOLIN • VNET; topology adaptation • ViNe • Internet Indirection Infrastructure (i3) • Support for mobility, multicast, anycast • Decouples packet sending from receiving • Based on Chord p2p protocol • IPv6 tunneling • IPv6 over UDP (Teredo protocol) • IPv6 over P2P (P6P)
