1 / 15

Exploiting SCI in the MultiOS management system

This paper discusses the problem of crashes in OS research environments and proposes a MultiOS solution that utilizes SCI to efficiently manage multiple environments in a cluster.

ehayes
Download Presentation

Exploiting SCI in the MultiOS management system

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Exploiting SCI in the MultiOS management system Ronan Cunniffe Brian Coghlan SCIEurope’2000 29-AUG-2000

  2. The Problem Those OS researchers crashed our cluster !

  3. The Desire • Crashes may be illuminating ! • OS research environment: • may not be stable • may be missing features • Separate cluster per project: • is inefficient • is expensive Mmmm …. As clusters become more common, problem gets more acute

  4. Existing solution Partition the local disk i.e. Dual/Multiple Boot • all candidate environments must be there • no easy way to add another environment • assumes environments will not over-write others • only the current image is accessible

  5. Time …. OS #1 OS #2 OS #1 out OS #2 in MultiOS solution - import & export every time Import the environment each time (from remote disk) • can support any number of environments • no assumptions about stability or good behaviour • environments are accessible when off-line • places great demands on network and remote disk

  6. MultiOS • standard mechanism for diskless workstations • BOOTP: Query remote server for file to download • TFTP: Download indicated file • Pass control to downloaded file • alternate between two types of session • Management sessions: • node runs management software obtained over network • this loads user image to local disk • User sessions: • node runs whatever environment is on the local disk • MultiOS considers this a black box How is it implemented?

  7. Use a full OS as management environment • can run from a RAM disk • can use network-mounted filesystems • Advantages of a full OS • SCI drivers • raw disk I/O support • standard UNIX tools: dd, diff, gzip, rcp, etc. • new tools can be written as necessary Management software is Linux

  8. SCI Network Scheduler Image Server BOOTP Server MultiOS Server Console (WWW) Linux FS (+MultiOS client) TFTP Server Cluster Nodes Overall MultiOS architecture • Architecture: • standard servers: BOOTP, TFTP, HTTP • isolated web console

  9. Disk images are big - hard disks are slow Our cluster: 16 nodes x 2GB disks 2GB raw over 100Mbps Ethernet: Nodes Network traffic Time 4 8GB@ 8MB/s 17.1 mins 8 16GB@ 8MB/s 34.1 mins 16 32GB@ 8MB/s 68.3 mins 32 64GB@ 8MB/s 136.5 mins 2GB raw over SCI: Nodes Network traffic Time 4 8GB@ 40MB/s 3.3 mins 8 16GB@ 50MB/s 5.5 mins 16 32GB@ 50MB/s 10.7 mins 32 64GB@ 50MB/s 21.3 mins

  10. Making the numbers smaller • Compression • Loading less than a disk • Multicast reference image + patching 3 ways to do it: • For 100MB differential information for each compute node: • Nodes Network traffic Storage Time • 4 2GB@10MB/s + 0.4GB@40MB/s 2.4GB 3.5 mins • 8 2GB@10MB/s + 0.8GB@50MB/s 2.8GB 3.6 mins • 16 2GB@10MB/s + 1.6GB@50MB/s 3.6GB 4.0 mins • 32 2GB@10MB/s + 3.2GB@50MB/s 5.2GB 4.4 mins

  11. If disks were faster Assuming 50MB/s disks : • For 100MB differential information for each compute node: • Nodes Network traffic Storage Time • 4 2GB@50MB/s + 0.4GB@50MB/s 2.4GB 48 secs • 8 2GB@50MB/s + 0.8GB@50MB/s 2.8GB 56 secs • 16 2GB@50MB/s + 1.6GB@50MB/s 3.6GB 72 secs • 32 2GB@50MB/s + 3.2GB@50MB/s 5.2GB 104 secs

  12. SCI Multicast vs. Ethernet Multicast • SCI Multicast-by-propagation • end-to-end latency small compared to total time • little extraneous traffic • Difference becomes important in partitioned clusters • IP Multicast • multicast = broadcast • image traffic disturbs everyone

  13. Partitionable clusters • MultiOS traffic can easily saturate the image server • fewer points of entry than partitions • transport scripts can provide traffic shaping

  14. Summary MultiOS allows a cluster to be shared • for any number of different environments • for any type of research • MultiOS via SCI exploits: • high bandwidth ceiling • low protocol overheads

  15. Conclusion Our vision of the future http://www.cs.tcd.ie/multios/ NB: OS research is an equal opportunity employer ! Normal user OS researcher

More Related