The IEEE CS Task Force on Cluster Computing (TFCC)


  1. The IEEE CS Task Force on Cluster Computing (TFCC) William Gropp, Mathematics and Computer Science, Argonne National Lab, www.mcs.anl.gov/~gropp. Thanks to Mark Baker, University of Portsmouth, UK, http://www.dcs.port.ac.uk/~mab

  2. A Little History • In 1998 there was obviously huge interest in clusters, so it seemed natural to set up a focused group in this area. • A Cluster Computing Task Force was proposed to the IEEE CS. • The TFCC was approved and started operating in February 1999; it has been going for just over two years.

  3. Proposed Activities • Act as an international forum to promote cluster computing research and education, and participate in setting up technical standards in this area. • Be involved with issues related to the design, analysis, and development of cluster systems, as well as the applications that use them. • Sponsor professional meetings, produce publications, set guidelines for educational programs, and help co-ordinate academic, funding agency, and industry activities. • Organize events and hold a number of workshops that span the range of activities sponsored by the Task Force. • Publish a bi-annual newsletter to help the community keep abreast of activities in the field.

  4. IEEE CS Task Forces • A TF is expected to have a finite term of existence, normally 2-3 years; continued existence beyond that point is generally not appropriate. • A TF is expected either to increase its scope of activities such that establishment of a Technical Committee (TC) is warranted, or to be merged into existing TCs. • The TFCC will submit an application to the CS to become a TC later this year.

  5. Why a separate TFCC? • It brings together all the activities/technologies used with Cluster Computing into one area, so instead of tracking four or five IEEE TCs there is one... • Cluster Computing is NOT just parallel computing, distributed computing, operating systems, or the Internet; it is a mix of them all, and consequently different. • The TFCC is an appropriate body for focusing activities and publications associated with Cluster Computing.

  6. http://www.ieeetfcc.org

  7. TFCC Mailing Lists • Currently three email lists have been set up: • tfcc-l@bucknell.edu – a discussion list open to anyone interested in the TFCC; see the TFCC page for information on how to subscribe. • tfcc-exe@port.ac.uk – a closed executive committee mailing reflector. • tfcc-adv@port.ac.uk – a closed advisory committee mailing reflector.

  8. Annual Conference – ClusterXY • 1st IEEE International Workshop on Cluster Computing (Cluster 1999), Melbourne, Australia, December 1999, about 105 attendees from 16 countries. http://www.clustercomp.org • 2nd IEEE International Conference on Cluster Computing (Cluster 2000), Chemnitz, Germany, November 2000, anticipated 160 attendees. http://www.tu-chemnitz.de/cluster2000 • 3rd IEEE International Conference on Cluster Computing (Cluster 2001), Newport Beach, California, October 8-11, 2001, expecting 250-300 attendees. http://andy.usc.edu/cluster2001

  9. Associated Events - GRID’XY • 1st IEEE/ACM International Workshop on Grid Computing (Grid2000), Bangalore, India, December 17, 2000 (attendees from 15 countries). http://www.gridcomputing.org • 2nd IEEE/ACM International Workshop on Grid Computing (Grid2001), at SC2001, November 2001

  10. Supercomputing • “Birds of a Feather” sessions at SC99 and SC2000. • The aim of these meetings is to gather interested parties and bring them up to date, but also to put together a set of short talks and start discussion on a variety of topics… • There will probably be another at SC01, depending on community interest.

  11. Other Activities • Book donation program • Cluster Computing Archive • www.ieeetfcc.org/ClusterArchive.html • TopClusters Project • www.TopClusters.org • TFCC Whitepaper • www.dcs.port.ac.uk/~mab/tfcc/WhitePaper • TFCC Newsletter • www.eg.bucknell.edu/~hyde/tfcc

  12. TopClusters Project • http://www.TopClusters.org • TFCC collaboration with Top500 project. • Numeric, I/O, Web, Database, and Application level benchmarking of clusters. • Joint BOF with Top500 at SC2000 on Cluster-based benchmarking. • Ongoing effort…

  13. TFCC Whitepaper • A Whitepaper on Cluster Computing, submitted to the International Journal of High-Performance Applications and Supercomputing, November 2000 • A snapshot of the state of the art in Cluster Computing. • Preprint: www.dcs.port.ac.uk/~mab/tfcc/WhitePaper/

  14. TFCC Membership • Over 300 registered members • Free membership open to all, though a few benefits may be restricted (e.g., reduced registration fees for IEEE members) • Over 450 on the TFCC mailing list <tfcc-l@bucknell.edu>

  15. Future Plans • We plan to submit an application to the IEEE CS Technical Activities Board (TAB) to attain full Technical Committee status. • The TAB sees the TFCC as a success, and we hope that our application will be successful. • Obviously, if we achieve TC status, we will need the continuing assistance of the TFCC’s current volunteers, and we will need to encourage new ones…

  16. Summary • A successful conference series has been started, with commercial sponsorship. • Promoting cluster-based technologies through TFCC sponsorship. • Helping the community with our book donation program. • Engendering debate and discussion through our mailing list. • Keeping the community informed with our information-rich TFCC Web site.

  17. Scalable Clusters • TopClusters.org list: • 26 clusters with 128+ nodes • 8 with 500+ nodes • 34 with 64-127 nodes • Most run Linux • Most dedicated to applications • Where are scalable tools developed and tested? • Caveats: • Does not include MPP-like systems (IBM SP, SGI Origin, Compaq, Intel TFLOPS, etc.) • Not a complete list • Only clusters explicitly contributed to TopClusters.org

  18. What is Scalability? • Most common definition in use: • Works for n+1 nodes if it works for n, for small n • Practical definition • Operations complete “fast enough” • 0.5 to 3 seconds for “interactive” • Operations are reliable • Approach to scalability must not be fragile
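One way to make the “fast enough” criterion concrete is simply to time a cluster-wide operation against the interactive window quoted above. A minimal sketch, assuming the ptls command and -all flag introduced on the following slides (the target directory is illustrative):

    # Under the "interactive" criterion, this should finish in
    # roughly 0.5-3 seconds even as the node count grows.
    time ptls -all /tmp > /dev/null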

  19. Issues in Clusters and Scalability • Developing and Testing Tools • Requires convenient access to large-scale system • Can this co-exist with production computing? • Too many different tools • Why not adopt Unix philosophy? • Example solution: Scalable Unix Tools • Following slides thanks to Rusty Lusk and Emil Ong

  20. What Are the Scalable Unix Tools? • Parallel versions of common Unix commands like ps, ls, cp, …, with appropriate semantics • A few new commands in the same spirit but without a serial counterpart • Designed for users • New this spring: release of a high-performance implementation based on MPI • One of the original “official” Ptools projects • Original definition published • Proceedings of the Scalable High Performance Computing Conference • http://www.mcs.anl.gov/~gropp/papers/1994/shpcc-paper.ps
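For flavor, a minimal sketch of a session with these tools. ptls and ptexec appear on later slides; a parallel ps following the pt<Unix-name> convention (slide 23) would presumably be called ptps, but that name is an assumption here:

    ptls -all /scratch           # list /scratch on every node at once
    ptps -all | grep my_app      # cluster-wide process search (ptps name assumed)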

  21. Motivation • Basic Unix commands (ls, grep, find, …) are quintessential tools. • Simple syntax and semantics (except maybe find syntax) • Have same component interface (lines of text, stdin, stdout) • Unix redirection ( <, >, and especially | ) allow tools to be easily combined into powerful command lines • “Old-fashioned”: no GUI, little interactivity
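Because the pt-tools keep this same lines-of-text interface, they drop straight into ordinary pipelines. A hedged illustration (the path is made up; the -h labeling option is shown on slide 25):

    # Which nodes are holding core files? -h prefixes each output
    # line with its node name, so plain grep does the rest.
    ptls -all -h /scratch | grep core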

  22. Motivation, continued • Many parallel machines have Unix and at least partially distinct file systems on each node. • A user needs simple and familiar ways to • Copy a file to local file space on each node • Find all processes running on all nodes • Test for conditions on all nodes • Avoid getting swamped with output • On large machines these commands are not useful unless they take advantage of parallelism in their execution.
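A sketch of those three tasks using tools named on the following slides; pttest’s test(1)-style -e flag is an assumption based on its name, and the file names are illustrative:

    ptcp -all app.conf /tmp/app.conf      # copy a file to each node's local space
    ptexec -all 'ps aux' | grep my_sim    # find my_sim processes on all nodes
    pttest -all -e /tmp/app.conf          # test a condition (file exists) on every node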

  23. Design Goals • Familiar to Unix users • Similar names (we chose pt<Unix-name>) • Same arguments, similar semantics • Interact well with traditional Unix commands, facilitating construction of powerful command lines • Run at interactive speeds (requires scalability in parallel process manager startup and handling of I/O)

  24. Part I: Parallel Versions of Traditional Commands • ptcp, ptmv, ptrm, ptln, ptmkdir, ptrmdir, ptchmod, ptchgrp, ptchown, pttest[ao] • Select nodes to run on with: • -all • -m <file of hostnames> • -M <hostlist>, e.g. ‘donner dasher blitzen’ or ‘ccn%d@1-32,42,65-96’
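The same selection flags work with any of the commands above. An illustrative sketch with ptrm (the paths are made up; the %d@ expansion follows the node%d@1-3 example on the next slide):

    ptrm -all /tmp/scratch.dat                        # every node
    ptrm -m mynodes.txt /tmp/scratch.dat              # hosts listed in a file
    ptrm -M 'ccn%d@1-32,42,65-96' /tmp/scratch.dat    # ccn1..ccn32, ccn42, ccn65..ccn96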

  25. Part II: Traditional Commands Producing Lots of Output • ptcat, ptls, ptfind • Have the potential to produce lots of output, where the source node is also of interest • With the -h option, output is labeled by node:

    ptls -M node%d@1-3 -h
    [node1] myfile1
    [node2]
    [node3] myfile1 myfile2
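The per-node labels make the merged output easy to post-process. For example, to list just the nodes that hold myfile2 (a sketch, assuming the one-line-per-node layout shown above):

    ptls -M node%d@1-3 -h | grep myfile2 | awk '{print $1}'
    # prints: [node3]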

  26. Performance of ptcp • Copying a single 10 MB file to 241 nodes takes 14 seconds. [Charts: time to copy the 10 MB file; total bandwidth]

  27. Watching ptcp • Start the big copy, then repeatedly ask every node how much of the file has arrived and feed the results to ptdisp:

    ptcp -all bigfile BIGFILE &    # run the copy in the background (or another window)
    while true; do \
      ptexec -all 'echo "`hostname`: `ls -s BIGFILE | \
        awk "{print \"percentage \" \$1/98 \" blue red\"}"`"' \
      | ptdisp -h; \
    done

  28. Percentage of Completion [ptdisp progress display screenshot]

  29. Percentage of Completion [ptdisp progress display screenshot]

  30. Availability • Open source • Get it from http://www.mcs.anl.gov/sut • All source, man pages • Configure, make, on Linux, Solaris, Irix, AIX • Needs an MPI implementation with mpirun • Developed with Linux, MPICH, MPD, on Chiba City at Argonne
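A hedged build sketch based on the bullets above; only the configure/make steps and the platform list come from the slide, and the archive name is assumed:

    tar xzf sut.tar.gz && cd sut    # archive name assumed
    ./configure                     # Linux, Solaris, Irix, or AIX
    make
    # requires an MPI implementation that provides mpirun,
    # e.g. MPICH with the MPD process manager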

  31. Chiba City Scalability Testbed • http://www-unix.mcs.anl.gov/chiba/

  32. Some Other Efforts in Scalable Clusters • Large Programs • DOE Scientific Discovery through Advanced Computing (SciDAC) • NSF Distributed Terascale Facility (DTF) • OSCAR • Goal is a “cluster in a box” CD • PVFS (Parallel Virtual File System) • Many Smaller Efforts • www.beowulf.org, etc. • Commercial Efforts • Scyld, etc.
