
Technologies for the Future: CLUSTERS




  1. Technologies for the Future: CLUSTERS. Anne C. Elster, Dept. of Computer & Information Science (IDI), Norwegian Univ. of Science & Tech. (NTNU), Trondheim, Norway. NOTUR 2003. NOTUR Cluster proj. status

  2. Clusters (Networks of PCs/Workstations) Are they suitable for HPC? Advantage: cost-effective hardware, since clusters use COTS (Commercial Off-The-Shelf) parts. BUT: typically much slower processor interconnects than traditional HPC systems. What about usability? NTNU IDI’s 40-node AMD 1.46GHz cluster: 2GB RAM, 40GB disk, Fast Ethernet
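To make the interconnect disadvantage concrete, here is a minimal sketch using the standard linear latency/bandwidth cost model for one point-to-point message. The latency and bandwidth figures are assumed, order-of-magnitude values chosen only to illustrate the gap between Fast Ethernet and a traditional HPC network; they are not measurements from the project.

```python
# Illustrative message cost model: t = latency + size / bandwidth.
# All numbers below are hypothetical, for illustration only.

def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to move one message under the linear latency/bandwidth model."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

msg = 1_000_000  # a 1 MB message

# Fast Ethernet: ~100 us latency, 100 Mbit/s ~= 12.5 MB/s (assumed)
ethernet = transfer_time(msg, 100e-6, 12.5e6)
# Traditional HPC interconnect: ~5 us latency, ~300 MB/s (assumed)
hpc = transfer_time(msg, 5e-6, 300e6)

print(f"Fast Ethernet: {ethernet * 1e3:.1f} ms")  # ~80.1 ms
print(f"HPC network:   {hpc * 1e3:.2f} ms")       # ~3.34 ms
```

Under these assumed figures the same message costs roughly 25x more on the cluster network, which is why communication-heavy applications are the hard case for COTS clusters.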

  3. Cluster Technologies: NOTUR Emerging Technology project. Collaboration between NTNU & Univ. of Tromsø. Goal: Analyze cluster technologies’ suitability for HPC by looking at some of the most interesting NOTUR applications. • The results will provide a foundation for decisions regarding future HPC programs

  4. Main Collaborators include • Anne C. Elster (IDI, NTNU) – Project leader • Otto Anshus & Tore Larsen (CS, U of Tromsø) • Tor Johansen & staff (CC, U of Tromsø) • Torbjørn Hallgren (IDI, NTNU) • Einar Rønquist (IMF, NTNU) • Master & Ph.D. students and Post Docs at NTNU and Univ. of Tromsø

  5. General Issues to Consider: • Why clusters vs. powerful desktops vs. large SMPs? • What are the total costs associated with clusters (hardware, software, support, usability)? • 32-bit vs. 64-bit architectures

  6. Cluster Project ACTIVITIES: A.1 Profiling & Tuning Selected Applications: A.1.a/b Physics and Chemistry Codes (Elster & students, Dept. of Computer Science, NTNU) A.1.2a Profiling & User-Analysis of Amber, Dalton & Gaussian (Tor Johansen & staff, Comp. Center, U of Tromsø) A.1.2b Optimization & tool analysis of Dalton (Anshus & PostDoc/student, Dept. of Comp. Sci., U of Tromsø)

  7. Cluster Project ACTIVITIES continued: A.2 Execution Monitoring (Anshus, Tore Larsen & students, CS, U of T) A.3 Visualization servers, etc. (Hallgren, Elster & students, CS, NTNU) A.4 Impact of future numerical algorithms (Rønquist & student, Dept. of Mathematics, NTNU) A.5 Interface with NOTUR ET – Grid Project (Elster, Harald Simonsen and colleagues, staff & students associated with the NOTUR ET Cluster & Grid projects)

  8. A.1.a/b Physics & Chemistry Codes (Elster & students, Dept. of CS, NTNU) • FORTRAN problems: • Different FORTRAN implementations have non-standard add-ons (e.g. FORTRAN 90) • This leads to great difficulty in porting code to a different platform with a different Fortran compiler (e.g. by a different vendor). Lessons learned so far -- Paul Sack’s work on a physics application (report available on the Web)

  9. A.1.a/b Physics & Chemistry Codes (contin.) • Performance of individual programs can vary across different machines. Åsmund Østvold wrote a project report on porting PROTOMOL from an SMP with MPI one-sided communication primitives (MPI put/get) to a cluster (available on the Web). He also did an MS study with SCALI on various MPI broadcast algorithms and benchmarking
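Why broadcast algorithms are worth benchmarking can be seen from a simple round count. The sketch below is a hedged illustration, not the algorithms SCALI actually implements: a naive linear broadcast where the root sends to every other rank one by one, versus a binomial tree where the set of ranks holding the data doubles each round.

```python
import math

def linear_bcast_steps(p):
    """Naive broadcast: the root sends to each of the other p-1 ranks in turn."""
    return p - 1

def binomial_bcast_steps(p):
    """Binomial-tree broadcast: each round doubles how many ranks hold the data."""
    return math.ceil(math.log2(p)) if p > 1 else 0

for p in (2, 8, 32):
    print(f"p={p}: linear={linear_bcast_steps(p)} rounds, "
          f"binomial={binomial_bcast_steps(p)} rounds")
```

On 32 nodes the tree needs 5 communication rounds instead of 31, which matters most on high-latency cluster networks; for large messages, pipelined or scatter-based variants can do better still, which is exactly the kind of trade-off such a benchmarking study explores.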

  10. A.1.a/b Physics & Chemistry Codes (contin. 2) Ongoing work with Snorre Boasson & Jan Christian Meyer on porting a PIC code using Pthreads (SMP primitives) to MPI. A preliminary report will be available later this week. ”Recent Trends in Cluster Computing”, presented at ParCo 2003 by Elster et al., includes hardware trends and a survey of libraries and performance tools.
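The core difficulty in such a Pthreads-to-MPI port is that threads read neighbouring cells from one shared array for free, while MPI ranks own only their block and must exchange boundary ("ghost") cells explicitly. The sketch below simulates this for a 1-D stencil with plain Python lists; it is an illustration of the general technique, not the project's PIC code, and the direct copies marked below stand in for what would be MPI send/receive calls.

```python
# Shared memory vs. distributed memory for a 1-D stencil, simulated with lists.

def step(cells):
    """One stencil sweep: average each interior cell with its two neighbours."""
    return [(cells[i - 1] + cells[i] + cells[i + 1]) / 3
            for i in range(1, len(cells) - 1)]

field = [0.0, 0.0, 3.0, 3.0, 0.0, 0.0]

# Shared memory (Pthreads style): one global array, one sweep over the interior.
shared = field[:1] + step(field) + field[-1:]

# Distributed memory (MPI style): two blocks, each padded with one ghost cell.
left = field[:3] + [field[3]]   # ghost cell; a real port would MPI_Recv this
right = [field[2]] + field[3:]  # ghost cell; a real port would MPI_Recv this
dist = field[:1] + step(left) + step(right) + field[-1:]

assert shared == dist  # identical results once the halos are exchanged
print(shared)
```

The porting work is in inserting those halo exchanges at every timestep and keeping them correct, which is why moving from SMP primitives to message passing is rarely mechanical.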

  11. A.1.2a Profiling & User-Analysis of Amber, Dalton & Gaussian (Tor Johansen & staff, Comp. Center, U of Tromsø) Coordination work • Travel: NOTUR 2003 • Porting and testing of Amber and Scali SW

  12. A.1.2b Optimization & tool analysis of Dalton (Anshus & PostDoc/students, CS, U of Tromsø) “Ytelsesmålinger gjort på DALTON” (“Performance measurements on DALTON”), a report for the NOTUR Project Emerging Technologies: Cluster • Daniel Stødle, Otto J. Anshus, John Markus Bjørndalen “Survey of optimizing techniques for parallel programs running on computer clusters” • Espen S. Johnsen, Otto J. Anshus, John Markus Bjørndalen, Lars Ailo Bongo (September 29, 2003)

  13. A.1.2b Optimization & tool analysis of Dalton (Anshus & PostDoc/student, IFI, U of Tromsø) CONTINUED RESULTS: • Dalton scales pretty well – 25x speedup on 32 nodes. NOTE: Only without caching of temporary files. If caching is used – only 3-5x speedup on 32! • Even though the 8-way cluster had no local disk (only a network file system), the sequential Dalton code was significantly faster. This indicates that network bandwidth may not be a problem if caching is used in the parallel version. • Communication pattern: master-slave ”bag-of-tasks” oriented programs with little communication & synchronization and generally good utilization of the slave nodes. • The master does relatively little work and is blocked most of the time. • Finally checked whether the master node could be a bottleneck, but could not detect differences in execution time when the master was placed on a slow node vs. a fast node. NOTE: Only tested up to 32 nodes … using a larger number of nodes may limit performance by overloading the master node.
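A back-of-envelope way to interpret these two speedup figures is the Karp-Flatt metric, which recovers the experimentally determined serial fraction from a measured speedup. This is an illustration added here, not a calculation from the report; the 4x figure is taken from the middle of the reported 3-5x range.

```python
def serial_fraction(speedup, p):
    """Karp-Flatt experimentally determined serial fraction:
    e = (1/S - 1/p) / (1 - 1/p)."""
    return (1 / speedup - 1 / p) / (1 - 1 / p)

# Reported: ~25x speedup on 32 nodes without caching of temporary files.
print(round(serial_fraction(25, 32), 4))  # ~0.009: scales well
# Reported: only ~3-5x on 32 nodes with caching (4x assumed here).
print(round(serial_fraction(4, 32), 4))   # ~0.226: heavily serialized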

  14. A.1.2b Optimization & tool analysis of Dalton (Anshus & PostDoc/student, IFI, U of Tromsø) CONTINUED 2 Thanks to: • Kenneth Ruud, Chemistry, UiT • Roy Dragseth, CC, UiT, for support on the Itanium at U of Tromsø.

  15. A.2 Execution Monitoring (Anshus, Tore Larsen & students, CS, U of T) • “Survey of execution monitoring tools for computer clusters” • Espen S. Johnsen, Otto J. Anshus, John Markus Bjørndalen, Lars Ailo Bongo, Sept 03 • “Performance Monitoring” • Lars Ailo Bongo, Otto J. Anshus, John Markus Bjørndalen

  16. A.3 Visualization servers, etc. (Hallgren, Elster & students, CS, NTNU) Ongoing work with Torbjørn Vik. Preliminary report on a survey of how clusters are currently used in visualization. Two types of cluster usage: • off-line (non-real-time rendering). Often called ”rendering farms”, with lots of nodes which each work on one frame of a larger animation. • Typically used in the film industry and other areas where interactivity and/or real-time rendering is not needed. • All larger 3D modelling programs, such as Lightwave, 3DStudio and Maya, have functionality for this. • on-line (real-time). Most interesting from a technical viewpoint...

  17. A.3 Visualization servers, etc. - Contin. • Clusters are used within interactive visualization software to • increase performance, • enable larger datasets, • avoid limitations of local hardware. • Most visualization clusters work, in principle, by having the user sit at a client machine which itself has little capacity. The cluster handles all computation and sends only the finished images to the client. The client machine also collects input from the user and forwards it to the cluster. Datasets for such visualization are often very large, and, depending on the situation, both polygon-based and voxel-based rendering are used. • The main problem in making clusters usable for interactive visualization programs is network-induced delay. This is addressed by reducing the time spent transferring images between cluster and client, which can be done by either • reducing the amount of data (compression methods) or • increasing network performance. Or both. • Parallelism within the cluster itself is based on independence relations between different data: there can be independence between different parts of the same dataset, or between different frames in a 4D dataset. Load balancing often becomes a problem in such settings and is an important research area. • Which load-balancing method is used is usually highly context-dependent. • Cluster software for visualization still lacking??
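The compression-vs-bandwidth trade-off above can be sketched with simple arithmetic: the achievable frame rate is bounded by per-frame latency plus transfer time of the (possibly compressed) image. The resolution, bandwidth, latency and compression ratio below are assumed, illustrative values, not measurements from the survey.

```python
def fps(width, height, bytes_per_pixel, compression, bandwidth_bytes_s, latency_s):
    """Upper bound on frame rate when a cluster streams rendered frames
    to a thin client, under a linear latency/bandwidth model."""
    frame_bytes = width * height * bytes_per_pixel / compression
    return 1.0 / (latency_s + frame_bytes / bandwidth_bytes_s)

# 1024x768 RGB frames over ~100 Mbit/s (12.5 MB/s) with 1 ms latency (assumed)
print(round(fps(1024, 768, 3, 1, 12.5e6, 1e-3), 1))   # uncompressed: ~5.3 fps
print(round(fps(1024, 768, 3, 10, 12.5e6, 1e-3), 1))  # 10:1 compressed: ~50.3 fps
```

Under these assumptions, uncompressed streaming is nowhere near interactive rates, while a 10:1 compression brings the same link into real-time territory, which is why image compression and network performance are named above as the two levers.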

  18. A.4 Impact of future numerical algorithms (Rønquist & student, Dept. of Mathematics, NTNU) • Rønquist’s student Staff (now at Simulasenteret) wrote a report based on his summer job • May add in experiences from Elster’s group – fall 2003

  19. A.5 Interface with NOTUR ET – Grid Project (Elster, Harald Simonsen and colleagues, staff & students associated with the NOTUR ET Cluster & Grid projects) • Test node established at NTNU • Andreas Botnen (USIT) and • Robin Holtet (IDI, now ITEA) • May use IDI’s 30-40-node cluster in the test grid • Meetings • Between Elster’s and Simonsen’s groups • Robin Holtet and Elster’s student Thorvald Natvig to the Linköping meeting this month • Collaborations regarding the National GRID and EGEE • Students from NTNU and UiO at CERN

  20. Main cluster issues: • Global operations have a more severe impact on cluster performance than on traditional supercomputers, since communication between processors takes a relatively larger share of the total execution time • SCALABILITY!!
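The scalability point can be made concrete with a toy model, added here for illustration with assumed constants: per-node compute shrinks as 1/p while one global reduction per timestep grows roughly as log2(p), so the communication share of each step climbs with node count.

```python
import math

def comm_fraction(p, compute_total=1.0, reduce_unit_cost=0.01):
    """Fraction of a timestep spent in one global reduction, under a toy
    model: compute work splits p ways, the reduction costs ~log2(p) units.
    Both constants are arbitrary illustrative values."""
    compute = compute_total / p
    comm = reduce_unit_cost * math.log2(p)
    return comm / (compute + comm)

for p in (2, 8, 32, 128):
    print(f"p={p}: {comm_fraction(p):.0%} of the step is the global operation")
```

Under these assumed constants the global operation grows from a few percent of the step on 2 nodes to the dominant cost on 128, which is the effect the slide warns about; on slow cluster interconnects the crossover simply arrives at smaller node counts.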

  21. Lessons learned • Clusters generally have cheap hardware, but may incur increased ”hidden” costs regarding: • More incompatible compilers, especially Fortran 90 (also C++) • Some applications are non-trivial to port from a shared-memory paradigm to a distributed-memory paradigm • Some applications require high-bandwidth interconnects, which drive up costs (e.g. SGI Altix) • Power and cooling costs (ref. Brian Vinter) • Stability, recovery • Overall costs and scalability should be further studied

  22. The ”Ideal” Cluster -- Hardware • High-bandwidth network • Low-latency network • Low operating-system overhead (TCP causes ”slow start”) • Great floating-point performance (64-bit processors or more?)

  23. The ”Ideal” Cluster -- Software • A compiler that is: • Portable • Optimizing • Does extra work to save communication • Self-tuning / load-balanced • Automatic selection of the best algorithm • One-sided communication support? • Optimized middleware
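"Automatic selection of the best algorithm" typically means dispatching on runtime parameters such as message size and node count. The sketch below is a hypothetical toy policy for choosing a broadcast algorithm; the cutoff value and algorithm names are invented for illustration and do not describe any real MPI library's decision logic.

```python
def pick_bcast(msg_bytes, p, small_cutoff=4096):
    """Toy automatic algorithm selection for broadcast: a binomial tree for
    short (latency-bound) messages, a scatter-then-allgather scheme for long
    (bandwidth-bound) ones. Cutoff and names are illustrative assumptions."""
    if p <= 2 or msg_bytes <= small_cutoff:
        return "binomial-tree"
    return "scatter-allgather"

print(pick_bcast(256, 32))        # binomial-tree
print(pick_bcast(1_000_000, 32))  # scatter-allgather
```

A self-tuning library would go one step further and calibrate such cutoffs per machine by micro-benchmarking at install time, rather than hard-coding them as done here.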

  24. For more information: A dozen or more reports associated with this project will be made available on the web at: http://www.idi.ntnu.no/~elster Email: elster@idi.ntnu.no
