
How to build a Myria-cluster (In the metric system, “myria” = 10,000)

How to build a Myria-cluster (In the metric system, “myria” = 10,000). Charles L. Seitz, Myricom, Inc. (chuck@myri.com). CCGSC, 24 September 2000. Background: the assumed time frame is 2001-2002. There are a few 1,000+-host clusters in operation today, with many more planned.


Presentation Transcript


  1. How to build a Myria-cluster (In the metric system, “myria” = 10,000). Charles L. Seitz, Myricom, Inc. (chuck@myri.com). CCGSC, 24 September 2000

  2. Background • The assumed time frame is 2001-2002. • There are a few 1,000+-host clusters in operation today, with many more planned. • Just 4 years ago, a 100-host cluster was considered remarkable. This brief talk considers, from an engineering viewpoint, the feasibility of a cluster of 10,000 hosts. The talk is intended to be interactive: I shall frequently ask the members of this distinguished audience for their estimates, opinions, and predictions of the future.

  3. The performance/cost argument for clusters [Figure: performance versus cost for a single processor, with labeled regions for cost-effective processors and for diminishing returns.] Cluster computing shifts the burden of extending performance from the processors, where it is limited by physics and technology, to algorithms and programming (distributing an application across a collection of cost-effective hosts).

  4. Assumed Host Characteristics • The best peak performance/cost is likely with 2-4 processors per host (small-scale SMP architecture). • Example: 2-processor or 4-processor Alpha EV-7 host: • >>100 SPECfp95 and >>50 SPECint95 per processor • Main-memory peak data rate >10 GB/s (?) • Peak I/O rate (PCI-X) ~1 GB/s per PCI slot A useful “rule of thumb” for clusters is that the data rate to or from a host need not be more than a modest fraction, perhaps 10-20%, of the host’s memory bandwidth. A distributed computation that consumes so much memory bandwidth that it impacts compute performance is over-distributed. Thus, the network connection should support a data rate of 1-2 GB/s.
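The rule of thumb on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the slide's ">10 GB/s (?)" memory data rate as exactly 10 GB/s, which is an assumption, and the function and constant names are hypothetical.

```python
# Rule of thumb from the slide: the network data rate to or from a host
# need be only about 10-20% of the host's memory bandwidth.

# Assumption: the slide says ">10 GB/s (?)"; we use 10 GB/s as a round figure.
MEMORY_BANDWIDTH_GB_S = 10.0
FRACTION_RANGE = (0.10, 0.20)


def network_rate_range(memory_bw_gb_s, fractions=FRACTION_RANGE):
    """Return the (low, high) target network data rate in GB/s."""
    low_fraction, high_fraction = fractions
    return memory_bw_gb_s * low_fraction, memory_bw_gb_s * high_fraction


low, high = network_rate_range(MEMORY_BANDWIDTH_GB_S)
print(f"Target network data rate per host: {low:.0f}-{high:.0f} GB/s")
```

With these inputs the range comes out at 1-2 GB/s per host, the figure the slide arrives at.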

  5. Physical Size • A rack, including front and back access space, occupies ~1 m² of floor space. • Somewhat pessimistic estimate: at 10 hosts/rack, 10,000 hosts need 1,000 m² ≈ (32 m)². • Implication: Most connectivity will be fiber if a “good” topology is employed. • Best data rate per cost will be with 2.5 GBaud (2 Gbit/s after 8b/10b encoding) VCSEL optical components and multimode fiber, the fiber PHY level of 1x InfiniBand, Myrinet-2000, and other fast networks. [Photo: part of the “Los Lobos” cluster at the University of New Mexico. The 256 hosts are 2-processor IBM Netfinity units, the operating system is Linux, and the interconnect is Myrinet. The system supplier was IBM.]
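The floor-space and link-rate figures on this slide follow from simple arithmetic, sketched below. The inputs are the slide's own numbers; the variable names are hypothetical, and the conversion to MB/s assumes 1 Gbit/s = 1000 Mbit/s and 8 bits per byte.

```python
import math

# Floor space: the slide's somewhat pessimistic estimate.
HOSTS = 10_000
HOSTS_PER_RACK = 10
RACK_AREA_M2 = 1.0  # includes front and back access space

racks = HOSTS // HOSTS_PER_RACK
floor_area_m2 = racks * RACK_AREA_M2
side_m = math.sqrt(floor_area_m2)  # side of an equivalent square room

# Link rate: 8b/10b encoding carries 8 payload bits in every 10 line bits.
SIGNALLING_RATE_GBAUD = 2.5
payload_gbit_s = SIGNALLING_RATE_GBAUD * 8 / 10
payload_mb_s = payload_gbit_s * 1000 / 8  # Gbit/s -> MB/s

print(f"{racks} racks, {floor_area_m2:.0f} m^2, about {side_m:.0f} m on a side")
print(f"{payload_gbit_s:.1f} Gbit/s payload = {payload_mb_s:.0f} MB/s per direction")
```

The square root of 1,000 m² is about 31.6 m, which the slide rounds to 32 m. The 250 MB/s payload rate is per direction on a single link.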

  6. The topology should be a Clos network • Clos networks are named for Charles Clos, who introduced them in a paper titled “A Study of Non-Blocking Switching Networks,” published in the Bell System Technical Journal in March 1953. • A Clos network is rearrangeable, which means that it can route any permutation without blocking. Although the property of being rearrangeable is rarely exploited directly in clusters, a rearrangeable network necessarily exhibits full (maximal) bisection, a property that is crucial to any network that claims to be scalable. • A crossbar switch is also a rearrangeable, full-bisection network, but technology sets an upper bound on the degree of crossbar switches.

  7. Scalable Clos Networks Example of a Clos network for 320 hosts, composed of 16-port switches. [Figure: the network drawn with five groups of 64 hosts each.]

  8. A small, intuitive explanation [Figure labels: “64 host links”; “64 inter-switch links”; “This line cuts 64 links”; “Preserves 64-link bandwidth in this direction”.] The vertical dashed line cuts 32 links. In fact, the number of links between any {32, 32} partition of hosts is 32 (maximal).

  9. Why Clos Networks? • Maximal performance under arbitrary traffic patterns • Minimum bisection is the largest possible • “Rearrangeable Network” (can route any permutation) • Network looks the same from any host (simplifies cluster management) • Multiple paths • All progressive routes are deadlock-free • Use multiple paths for redundancy • Use multiple paths to avoid hot spots (random dispersion) • Scales well. For n hosts (minimum bisection = n/2): • Diameter varies as log(n) • Cost varies as n log(n) • Modular • Economies of sharing power and system monitoring between many switches, and implementing many of the inter-switch links on circuit boards rather than cables.
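The scaling claims on this slide (diameter as log(n), cost as n log(n)) can be illustrated with a rough sizing sketch. It rests on assumptions that are not stated in the slides: the network is a folded Clos in which every non-top switch uses half its ports downward and half upward, so that L levels of k-port switches serve at most 2·(k/2)^L hosts, and full bisection is kept at every level. The function names are hypothetical, and the switch count is an estimate, not a vendor configuration.

```python
import math


def clos_levels(host_ports, switch_ports=16):
    """Smallest number of switch levels whose capacity covers host_ports.

    Assumes a folded Clos with capacity 2 * (k/2)**levels host ports.
    """
    half = switch_ports // 2
    levels = 1
    while 2 * half ** levels < host_ports:
        levels += 1
    return levels


def clos_diameter(host_ports, switch_ports=16):
    """Switches traversed on the longest route: up to the top level and back."""
    return 2 * clos_levels(host_ports, switch_ports) - 1


def estimated_switches(host_ports, switch_ports=16):
    """Rough switch count, assuming full bisection at every level.

    Each host link uses one switch port; each of the (levels - 1) sets of
    inter-level links uses two switch ports per host.
    """
    switch_port_count = host_ports * clos_diameter(host_ports, switch_ports)
    return math.ceil(switch_port_count / switch_ports)


for n in (320, 20_000):
    print(n, clos_levels(n), clos_diameter(n), estimated_switches(n))
```

Under these assumptions, 320 hosts need 3 levels (diameter 5) and 20,000 host ports need 5 levels (diameter 9). The diameter grows with the number of levels, which is logarithmic in n, and the switch-port count grows as n times the diameter, which is the n log(n) cost behavior.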

  10. The internal motherboard of the Myrinet M2LM-Clos64

  11. Recap of the Configuration • 10,000 2-processor or 4-processor hosts (10 (?) peak Gflops per host) • 2 x (250+250 MB/s) ports per host (1 GB/s total data rate) • 2 ports per NIC is attractive both for performance and for failover • 20,000-host-port Clos network • Diameter = 9 switches if based on 16-port switches. Reliability / Availability • At 50K hours MTBF, an average of 5 hosts fail each day. • At 4M hours MTBF, a NIC fails about every 2 weeks. • A localized failure in the central switch occurs about every 2 weeks (?). Clearly, such a system must be designed so that all components can be hot-swapped, and the system-management software must monitor the system status and allocate resources accordingly.
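The failure-rate figures above follow from dividing operating hours by MTBF. The sketch below reproduces them under two assumptions that the slide does not state explicitly: each host carries one 2-port NIC (so 10,000 NICs in total), and failures are independent with a constant rate. The names are hypothetical.

```python
HOURS_PER_DAY = 24


def failures_per_day(units, mtbf_hours):
    """Expected failures per day across `units` identical components."""
    return units * HOURS_PER_DAY / mtbf_hours


HOSTS = 10_000
NICS = 10_000  # assumption: one 2-port NIC per host

host_failures = failures_per_day(HOSTS, 50_000)
nic_failures = failures_per_day(NICS, 4_000_000)
days_between_nic_failures = 1 / nic_failures

print(f"Hosts: {host_failures:.1f} failures/day")
print(f"NICs: one failure every {days_between_nic_failures:.1f} days")
```

This gives about 4.8 host failures per day and one NIC failure roughly every 17 days, in line with the slide's "5 hosts each day" and "each 2 weeks".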

  12. Difficult questions. Cost: • ~$250M, based on $20K/host plus interconnect and integration. Operating System: • Linux, of course ;-). Would you want to use a proprietary OS? Why? • A sufficiently important set of computing problems. • Bragging rights (at several levels).
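The cost figure can be broken down with the slide's own numbers. The sketch below is only arithmetic; the split it prints is inferred by subtraction and is not a breakdown given in the talk, and the names are hypothetical.

```python
HOSTS = 10_000
COST_PER_HOST_USD = 20_000
TOTAL_ESTIMATE_USD = 250_000_000  # the slide's "~$250M"

host_cost_usd = HOSTS * COST_PER_HOST_USD
# Inferred remainder for interconnect and integration (not stated in the talk).
remainder_usd = TOTAL_ESTIMATE_USD - host_cost_usd

print(f"Hosts: ${host_cost_usd / 1e6:.0f}M")
print(f"Interconnect and integration (inferred): ${remainder_usd / 1e6:.0f}M")
```

On these figures the hosts account for $200M, leaving roughly $50M, or about a fifth of the total, for interconnect and integration.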
