Current and Emerging Cluster Components

Presentation Transcript

1. Current and Emerging Cluster Components Mark Baker

2. Contents: Background – the problem domain. Cluster 2001 – highlighted issues: System tools, Communications. The TFCC. Summary.

3. Beowulf is a Breakthrough! Bringing the power to people: Reducing the entry price for high-performance computing. Have your own supercomputer to play with! More people, more innovative ideas, more software, more applications… The only hope for small groups and developing countries to play with a machine of this class, like the big institutions/labs. Not yet there with this class of system – the problem is that the software is still immature: Installation, configuration, programming and usage.

4. Cluster Software Tools and Environments.

5. Cluster Software Odyssey: The Dark Ages, circa 1994-1996. Few cluster tools: The system has to be integrated and operated manually. System administrators use UNIX-based tools and ad hoc scripts. Difficult – only a few elite organizations can build a Beowulf cluster.

6. Cluster Software Odyssey: The Middle Ages, circa 1997-1999. Simple, non-interoperable tools: Cluster tools started to appear. Each tool solves one particular problem, on one particular cluster. System integration and management depend on learning many tools.

7. Cluster Software Odyssey: The Modern Age, 2000 - present. Integrated system tools and environments: Cluster distributions and software packages such as SCE, OSCAR, SCYLD, ROCKS. Better integration and more comprehensive functionality. Tools are starting to appear to debug, profile and load-balance applications.

8. A Solid Software Philosophy. Simplicity: Easy-to-use GUI and a rich set of command-line tools. Simple, well-defined components that, combined together, can form a powerful system. Portability: Portable and compatible with other software components. Operable in a heterogeneous environment.

9. A Solid Software Philosophy. Interoperability: Shared configuration, system information, common components and services. Extensibility: Offer a rich set of services and well-defined "open" APIs. Extensible, adaptable and dynamic components – easy to bolt on and load new components.

10. A Solid Software Architecture. Middleware extends OS services. Services extend middleware. The command line extends services using the API. GUI/Web tools are built on the command line.
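
As a minimal sketch of this layering (with hypothetical names – mw_*, svc_* – not taken from any of the packages discussed later): middleware wraps a raw OS service, a cluster service builds on the middleware, and a command-line tool is a thin front end over the service API; a GUI or Web front end would drive the same command line or API.

    /* Layering sketch: command line -> service API -> middleware -> OS.
     * All names here (mw_node_name, svc_report_node) are hypothetical. */
    #include <stdio.h>
    #include <sys/utsname.h>

    /* Middleware layer: extends a raw OS service (uname) with a small API. */
    static int mw_node_name(char *buf, size_t len)
    {
        struct utsname u;
        if (uname(&u) != 0)
            return -1;
        snprintf(buf, len, "%s", u.nodename);
        return 0;
    }

    /* Service layer: extends the middleware into a cluster-level service. */
    static int svc_report_node(FILE *out)
    {
        char name[256];
        if (mw_node_name(name, sizeof name) != 0)
            return -1;
        fprintf(out, "node: %s  status: up\n", name);
        return 0;
    }

    /* Command-line layer: a thin front end over the service API. */
    int main(void)
    {
        return svc_report_node(stdout) == 0 ? 0 : 1;
    }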

11. Solid Standards. Information exchange and representation standards to describe: System configuration, dynamics, application performance, QoS, FT, reliability, etc. Software interaction and communication APIs. Using a powerful meta-language such as XML – a W3C standard – seems an excellent choice for this. Perhaps produce markup languages and style sheets for mapping between language definitions.
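
To make the idea concrete, here is a small sketch that emits a hypothetical XML description of one cluster node. The element and attribute names (cluster, node, nic, software) are invented for illustration and follow no published schema; a style sheet (e.g. XSLT) could then map between such vocabularies, as the slide suggests.

    /* Emits a small, hypothetical XML cluster description.  The markup
     * vocabulary is invented for this sketch, not a real standard. */
    #include <stdio.h>

    int main(void)
    {
        printf("<?xml version=\"1.0\"?>\n");
        printf("<cluster name=\"example\">\n");
        printf("  <node id=\"node001\" cpus=\"2\" memoryMB=\"1024\">\n");
        printf("    <nic type=\"myrinet\" rateMbps=\"2000\"/>\n");
        printf("    <nic type=\"ethernet\" rateMbps=\"100\"/>\n");
        printf("    <software os=\"linux\" kernel=\"2.4\"/>\n");
        printf("  </node>\n");
        printf("</cluster>\n");
        return 0;
    }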

12. Solid Standards. We desperately need organised community efforts in defining such standards: Each tool/utility builder group typically takes its own approach and still does not contribute to the global community. Give users more freedom to mix and match tools. Give vendors a stable target to base their tools/utilities on… what MPI did for message passing!

13. More answers. When can we expect a profound shift in the capability of cluster software? No profound shift: Cluster software development is evolutionary, not revolutionary. New demands, new features evolve… It is a difficult problem and a lot of work: Learning how to do it properly takes time. Need lots of "people resources" as there are many things to do. Beowulf systems are helping to stimulate the rapid development of cluster software since they give HPC technology and freedom to people.

14. Cluster 2001, Newport Beach, CA.

15. Cluster 2001 - Flier.

16. Cluster 2001.

17. Cluster 2001 – Some Statistics. Very successful, despite the recent circumstances: 300 delegates, 60 papers, 3 keynote talks, 20 invited talks. Major sponsorship – Microsoft/Sun/IBM. 20 exhibitors – all the major vendors apart from Cray! Queen Mary banquet. Next year's event is being hosted by NCSA/ANL in Chicago, late September 2002.

18-22. (Figure-only slides; no text transcribed.)

23. Cluster 2001 – Hot Areas. System administration tools: Rocks/SCE/OSCAR/… Communications: Infiniband/GM Sockets. I will talk a little about these areas… and skip these: Scheduling, Transmeta, rolling upgrades, multirail comms, DECK.

24. Tools for system installation, configuration and management.

25. System Tools: Cluster "industry" Problems. Cluster standard versus standard cluster: One size does not fit all. One type does not work for all. There should be a unified view for all. A stumbling block to commercialisation: I want the software for free: Oh, and I want it to be of commercial quality. Did I mention support – I'd like that free too… Open source does not mean free: What are you willing to pay?

26. Extreme Linux. May 13, 1998. $29.95 CD from RedHat.

27. PGI Cluster Development Kit (CDK). Includes "floating" seat compiler licenses.

28. Rocks. Cluster distribution. Cluster tools. RedHat/Kickstart/RPM based.

29. SCore. Claims NOT to be a Beowulf-"type" cluster. PMv2 communication library. Not based on the TCP/IP stack.

30. Scyld. Unique and cool! Built to cluster. Single-point administration. Load and run.

31. EnFuzion. Same-job-everywhere model.

32. PowerCockpit. Cool GUI. Image-based system. Global commands.

33. CPlant. Emphasis on scalability: of running codes, not of building… Help me!!!

34. Unlimited Scale. "We will help you!" When? Will it support a range of system sizes? aka the small-user versus big-user problem.

35. MOSIX. Migration is cool: performance hit. Longevity award.

36. SCE. Scalable Cluster Environment. Diskless clusters. Tools included.

37. OSCAR. v1.x – repackage current best practices: one button – it was pressed for you at the factory. v2.x – repackage and more: install, maintenance, daily operation.

38. MSC.Linux. OSCAR based. Adds: Webmin tool. Commercial-grade integration and testing.

39. Cluster-in-a-Box. OSCAR based. Adds: Myrinet. Additional Alliance software. Support for IA-64.

40. System Components. Functional areas: Cluster Installation, Programming Environment, Workload Management, Security, General Administration & Maintenance. Others: Packaging, Documentation.

41. Communications Infrastructure: GM Sockets, Infiniband.

42. GM Sockets. APIs for high-performance applications are generally based either on high-level ones, such as MPI/PVM, or on low-level ones, such as GM/VIA/AM. To date there has been little interest in producing a high-performance Sockets API for existing applications, such as client/server or DBMS software.

43. GM Sockets - Motivations. TCP/IP on GM = Gigabit Ethernet performance = 85 MBytes/s, with 80 µs latency. GM raw performance = 247 MBytes/s, ~7 µs. How can existing, distributed applications using TCP/IP be improved?
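
A back-of-the-envelope model using only the figures quoted above – time(n) ≈ latency + n/bandwidth – illustrates why both gaps matter: small transfers are dominated by the 80 µs versus ~7 µs latency, large ones by the 85 versus 247 MBytes/s bandwidth. Real protocol behaviour is more complicated; this is only a sketch.

    /* Rough transfer-time model: time(n) ~= latency + n / bandwidth,
     * using the TCP/IP-over-GM and raw-GM figures quoted on the slide. */
    #include <stdio.h>

    static double xfer_us(double bytes, double latency_us, double mbytes_per_s)
    {
        /* bytes / (MBytes/s) gives microseconds, treating M as 10^6. */
        return latency_us + bytes / mbytes_per_s;
    }

    int main(void)
    {
        const double sizes[] = { 64.0, 1024.0, 65536.0, 1048576.0 };  /* bytes */
        for (int i = 0; i < 4; i++) {
            double tcp = xfer_us(sizes[i], 80.0, 85.0);   /* TCP/IP over GM */
            double gm  = xfer_us(sizes[i], 7.0, 247.0);   /* raw GM         */
            printf("%8.0f bytes: TCP/IP %9.1f us,  raw GM %8.1f us  (%.1fx)\n",
                   sizes[i], tcp, gm, tcp / gm);
        }
        return 0;
    }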

44. Goals – GMSOCKS. Replace the TCP/IP stack with a thin, fast software layer and reduce overhead (or: redesign your application and use MPI/PVM or VIA to speed up performance…). Provide "ALL" TCP/IP semantics. Boost performance. Let existing applications run out of the box: no relinking. Achieve a failover strategy in case something goes wrong. External communication without a SAN.

45. About GMSOCKS. Provides a thin layer which mimics TCP/IP semantics. Requires additional control around the original socket functions. Uses companion sockets for high-speed data transfer. Challenge: map existing TCP/IP calls to the GMSOCKS layer and mimic system/kernel functionality! Detect processes that get a SegV, kill, Control-C…! This is user-level communication.
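
GMSOCKS itself hooks the Winsock calls with the Detours package on Windows and uses a layering technique on Linux/Solaris (see the summary slide below). The sketch that follows is not the GMSOCKS code: it is the generic Linux LD_PRELOAD/dlsym idiom for intercepting an unmodified application's socket calls – the same "no relinking" idea – with the fast-transport hand-off left as a comment.

    /* Generic call-interception sketch (not GMSOCKS): a preloaded shared
     * library defines send(), so an unmodified binary's send() calls land
     * here.  A real layer would divert sockets bound to SAN peers to the
     * fast transport and fall back to the original send() otherwise.
     *
     * Build:  gcc -shared -fPIC -o libfastsock.so fastsock.c -ldl
     * Run:    LD_PRELOAD=./libfastsock.so ./existing_app
     */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    ssize_t send(int fd, const void *buf, size_t len, int flags)
    {
        static ssize_t (*real_send)(int, const void *, size_t, int);
        if (!real_send)   /* look up the libc send() behind this library */
            real_send = (ssize_t (*)(int, const void *, size_t, int))
                        dlsym(RTLD_NEXT, "send");

        /* A real layer would check here whether fd maps to a SAN peer
         * and, if so, hand the buffer to the high-speed transport. */
        return real_send(fd, buf, len, flags);
    }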

46. About GMSOCKS. Be Winsock 1.1 and Winsock 2 compliant: Winsock 2 adds overlapped I/O! A bunch of additional socket functions! Other functions can perform operations on socket handles. You are allowed to combine Winsock 1.1 and Winsock 2 functions.
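
For reference, a minimal sketch of what "overlapped" means here: the send is posted, the application can keep computing, and the completion is collected later – behaviour a Winsock 2-compliant layer has to reproduce. It assumes WSAStartup has already been called and that s is a connected SOCKET; error handling is trimmed.

    /* Overlapped WSASend sketch: post the send, overlap other work,
     * then harvest the completion through the event in the OVERLAPPED. */
    #include <winsock2.h>
    #include <stdio.h>

    int overlapped_send(SOCKET s, char *data, unsigned long n)
    {
        WSABUF wb = { n, data };          /* ULONG len, CHAR *buf */
        WSAOVERLAPPED ov = { 0 };
        DWORD sent = 0, flags = 0;

        ov.hEvent = WSACreateEvent();
        if (WSASend(s, &wb, 1, NULL, 0, &ov, NULL) == SOCKET_ERROR &&
            WSAGetLastError() != WSA_IO_PENDING) {
            WSACloseEvent(ov.hEvent);
            return -1;                    /* failed immediately */
        }

        /* ... the application can overlap computation here ... */

        WSAWaitForMultipleEvents(1, &ov.hEvent, TRUE, WSA_INFINITE, FALSE);
        WSAGetOverlappedResult(s, &ov, &sent, FALSE, &flags);
        WSACloseEvent(ov.hEvent);
        printf("sent %lu bytes\n", (unsigned long)sent);
        return 0;
    }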

47. GM Functionality. Available for W2K, Linux, Solaris, … Connectionless, reliable, point-to-point communication. Message-passing-style API: non-blocking gm_receive, gm_send. Provides zero-copy methods: gm_register, gm_direct_send. Provides get/put functionality. The DMA engine is ideal for overlapping and CPU offloading. Fits the socket programming style well!

48. Benchmarks – PC Specification. PCI 64-bit, 66 MHz. Myrinet 2000. Windows 2000. Supermicro 370DLE, PIII 1 GHz: 455 MBytes/s bus read, 512 MBytes/s bus write.

49. Latency Comparison. (Chart only; no data transcribed.)

50. Netpipe versus GM Raw (Streaming). (Chart only; no data transcribed.)

51. Summary. GMSOCKS running (well) with the Detours package. Tuning for larger messages. Binary compatibility! Looking at techniques based on: Layered Service Provider, Winsock Direct. Layering technique for Linux/Solaris: testing phase, including fork. Some more applications (ftp, ftpd). Starting to look for a commercial application: databases, …

52-59. Infiniband. (Figure-only slides; no text transcribed.)

60. TFCC - A Little History. With the obvious huge interest in clusters, it seemed natural to set up a focussed group in this area. A Cluster Computing Task Force was proposed to the IEEE CS, approved, and started operating in February 1999 – it has been going for just over 2 years. Task Forces are expected to have a finite life (<3 years); longer is generally not appropriate. Either increase the scope of activities or fade and die. The TFCC will submit an application to the CS to become a TC later this year.

61. TFCC's Web Page.

62. Annual Conference – ClusterXY. IEEE International Workshop on Cluster Computing (IWCC99), Melbourne, Australia, December 1999, about 105 delegates from 16 countries. IEEE International Conference on Cluster Computing (Cluster 2000), Chemnitz, Germany, November 2000, anticipated 160 delegates. IEEE International Conference on Cluster Computing (Cluster 2001), Newport Beach, California, October 8-11, 2001, expected 300 delegates.

63. Other TFCC Activities. Book Donation Programme: 500+ copies of books and journals have been donated to faculties all over the world as part of an educational promotion programme. Cluster Computing Archive: CoRR is an initiative in collaboration with the ACM, the Los Alamos e-Print archive, and NCSTRL (Networked Computer Science Technical Reference Library). Top Clusters: a TFCC collaboration with the Top500 project. Numeric, I/O, Web, database, and application-level benchmarking of clusters.

64. TFCC White Paper. A white paper on Cluster Computing, submitted to the International Journal of High-Performance Applications and Supercomputing, November 2000. A snapshot of the state of the art of cluster computing. Preprint:

65. Future Plans. We plan to submit an application to the IEEE CS Technical Activities Board (TAB) to attain full Technical Committee status – in November at SC2001 in Denver. The TAB sees the TFCC as a success and we hope that our application will be successful.

66. TFCC - Summary. A successful conference series has been started, with commercial sponsorship. Promoting cluster-based technologies through TFCC sponsorship. Helping the community with our book donation programme. Engendering debate and discussion through our mailing forum. Keeping the community informed with our information-rich TFCC Web site.

67. Vision of the Future Age

68. Powerful User Environment. An easier and better-integrated user environment: How to make it easier for the user to run, debug, profile and visualise their program. Migration of PC/Windows/Mac ideas to make things easy.

69. Where are We. The distinction between PC and workstation hardware/software has evaporated. Beowulf-class systems and other PC clusters are firmly established as a mainstream compute-performance resource strategy. Linux and Windows are established as the dominant O/S platforms. Integrating COTS network technology capable of supporting many applications/algorithms. Both business/commerce and science/engineering are exploiting Beowulfs for price-performance and flexibility.

70. Where are We (2). Thousand-processor Beowulf systems. Multi-Gflops/s processors. MPI and PVM standards. The Extreme Linux effort providing robust and scalable resource management. SMP support (on a node). First-generation middleware components for distributed-resource, multi-user environments. Books on Linux, Beowulfs, and general clustering available. Vendor acceptance into market strategy.

71. Million $$ Tflops/s. Today, $2M per peak Tflops/s. In 2002, $1M per peak Tflops/s. Performance efficiency is a serious challenge. System integration: does vendor support of massive parallelism have to mean massive markup? System administration: boring but necessary. Maintenance without vendors; how? A new kind of vendor for support! Heterogeneity will become a major aspect.

72. Summary of Immediate Challenges. There are more costs than capital costs. A higher level of expertise is required in house. Software environments are behind vendor offerings. Tightly coupled systems are easier to exploit in some cases. The Linux model of development scares people. Not yet for everyone. PC clusters have not achieved maturity.

73. Software Stumbling Blocks. Linux cruftiness… Heterogeneity. Scheduling and protection in time and space. Task migration. Checkpointing and restarting. Effective, scalable parallel file systems. Parallel debugging and performance optimization. System software development frameworks and conventions.

74. Towards the near Future: what can we expect? 2+ Gflops/s peak processors. <$1000 per processor. 1 Gbps at <$250 per port. New backplane performance, e.g. PCI++, Infiniband. Light-weight communications, <5 µs latency. Optimized math libraries. 1+ GByte main memory per node. 40 GByte disk storage per node. De facto standardised middleware.

75. The Future. Common standards and Open Source software. Better tools, utilities and libraries; design with minimal risk to accepted standards. A higher degree of portability (standards). A wider range and scope of HPC applications. Wider acceptance of HPC technologies and techniques in commerce and industry. Emerging Grid-based environments.

76. Clusters go Global. Grid computing: computing involves people, computing systems, devices, massive databases, and a high diversity of people and computing infrastructure, distributed geographically. Clusters will be an important part of the big picture for global-scale computing. We need software environments that seamlessly integrate this infrastructure into the Grid.

77. Acknowledgements. Putchong Uthayopas – Kasetsart University, Thailand; Stephen Scott – ORNL, USA; Markus Fischer – University of Mannheim, Germany; Gregory Pfister – IBM, Austin, Texas, USA; Thomas Sterling – Caltech/JPL, USA; Larry Bergman & Dan Katz – JPL, USA.
