Planning and Building a Linux-Based Cluster for NWP


Presentation Transcript


  1. Planning and Building a Linux-Based Cluster for NWP • Climatological Research Institute (CRI Cluster) • Dr. Jamali Chezgi

  2. Outline • Introduction • Our problem • Our solution • Building the CRI Cluster • Monitoring and controlling • Benchmarking • Future plans • References

  3. Introduction • Simulation joins Theory and Experiment as a way of studying Nature

  4. Environment / Climate / Weather • Aeronautics and space exploration • Energy research • Virtual reality • Scientific visualization • Health sciences

  5. Make observations • Collect and process data • Run the forecast model • Create products • Provide them to end users

  6. Main issues • Very large data sets • Distributed data • High processing requirements • Need for real-time processing • Coupled models

  7. Our problems • Data management • Lisa • NWP models • ARPS • MM5 • HRM • Climatological models • NCM

  8. NWP models • ARPS • MM5 • HRM

  9. ARPS • Advanced Regional Prediction System • Open source • Parallel code • Runs on all Unix platforms: • IBM RS/6000 workstation • Cray C-90 • Cray T3D • Cray J90 • CM-5 • Linux PCs

  10. ARPS model process flow chart (components recovered from the figure):
      • ARPSTERN (terrain data preprocessor; arpstern.input): reads indexed terrain elevation files (1°, 5 min, or 30 sec)
      • ARPSSFC (surface characteristics data preprocessor): soil, vegetation type, and other land-use data
      • EXT2ARPS (gridded data interpolator; arps40.input): user-supplied gridded data (e.g. OLAPS, NMC analyses)
      • ARPS Data Assimilation System and ARPS Analysis System: rawinsondes, VAD, wind profilers, and single-level data
      • ARPSRETRV (Doppler radar data retrieval system): Doppler radar data
      • ARPS (main model driver; arps40.input)
      • ARPSCVT (history data format converter; arpscvt.input)
      • ARPSPLT (vector graphics post-processor; arpsplt.input) and other post-processing tools
      • Visualization packages (Savi3D, AVS, etc.)

  11. Climatological Models

  12. Our solution • Memory: use a bigger memory? • CPU: use a better CPU? • Cluster: scale both memory and CPU together

  13. Building Cluster

  14. CRI Cluster

  15. Prebuilt clusters? Why build our own: • A direct relation between the technology and the end users • We can customize it for our users • We obtain the technology ourselves • Better use • We can upgrade it • Lower costs • Sample clusters around the world

  16. OU Cluster • Breakdown of nodes • 132 compute nodes (computing jobs) • 8 storage nodes (Parallel Virtual File System) • 2 head nodes (login, compile, debug, test) • 1 management node (PVFS control, batch queue) • Each node • 2 Pentium 4 Xeon DP CPUs (2 GHz, 512 KB L2 cache) • 2 GB RDRAM (400 MHz, 3.2 GB/s) • Myrinet-2000 adapter

  17. Cluster Architecture

  18. Cluster room • Space • Packing • Power • Air conditioning • Ease of repair • Security • Cabling

  19. Linux • True multitasking • Virtual memory • Shared libraries • Demand loading • Shared copy-on-write executables • Proper memory management • TCP/IP networking • Up to 64 GB memory support on i386 • IP Virtual Server support (via NAT, tunneling, or direct routing) • VLAN • Fast switching • Bonding driver (see the sketch below) • EQL • Ports: 386/486 PCs, ARM, DEC Alpha, Sun SPARC, M68000, MIPS, PowerPC, …
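
The bonding driver in the list above matters directly for cluster interconnects: it aggregates two NICs into one logical link between nodes. A minimal sketch, assuming a Red Hat-style system with eth0/eth1 and the ifenslave tool installed; the address and mode are placeholders:

    # Load the bonding driver; mode=0 (balance-rr) stripes traffic across
    # both slave NICs, miimon=100 checks link state every 100 ms.
    modprobe bonding mode=0 miimon=100

    # Bring up the logical interface (placeholder address).
    ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up

    # Enslave both physical NICs.
    ifenslave bond0 eth0
    ifenslave bond0 eth1

    # Verify that both slaves are active.
    cat /proc/net/bonding/bond0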

  20. Communication protocols • Internet protocols • Low latency protocols • Active messages • Fast messages • VMMC • U-net • BIP

  21. TCP/IP problems for clustering • High latency, which dominates for small packets • Limited effective bandwidth for big packets (a measurement sketch follows)
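
A quick way to see both effects on a real node pair; node02 is a placeholder hostname, and the bandwidth test assumes the netperf package with netserver running on the remote node:

    # Small-packet round-trip latency: 100 pings of 64 bytes.
    ping -c 100 -s 64 node02 | tail -2

    # Sustained TCP bandwidth for large transfers.
    netperf -H node02 -t TCP_STREAM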

  22. Protocol overhead (the send path through the OS): 1) The user process prepares data in user memory 2) It raises a send interrupt (system call) to the OS 3) The OS copies the data from user memory into its internal buffers 4) An interrupt triggers sending the data out 5) The data is handed to the NIC, which puts it on the wire

  23. Cluster computing standards • VIA • A combination of many protocols • Like U-Net, it uses a virtual network interface • Native and emulated versions • An emulated VIA version outperforms TCP/IP • MPICH over VIA • InfiniBand • Backed by Compaq, Dell, HP, IBM, Intel, Microsoft, and Sun • Replaces shared I/O with a high-speed, serial, channel-based, message-passing, scalable, switched fabric • Uses HCAs and TCAs to connect to the channel • Six transfer types: reliable and unreliable connections and datagrams, multicast connections, and raw packets • Supports DMA • IPv6

  24. Hardware products • Ethernet, Fast Ethernet, and Gigabit Ethernet • Giganet (cLAN) • Myrinet • QsNet • ServerNet • SCI (Scalable Coherent Interface) • ATM • Fibre Channel • HIPPI • Reflective Memory • ATOLL

  25. Installing and configuring • Installing the server • Building services • Auto-installing clients • Auto-configuring clients • Management of the nodes

  26. NIS configuration on the server (a consolidated script sketch follows): 1) Specify the domain name: # domainname <DOMAIN_NAME> 2) Put it in /etc/sysconfig/network: NISDOMAIN=<DOMAIN_NAME> 3) Specify the server name in /etc/yp.conf: domain <DOMAIN_NAME> server <SERVER_NAME> 4) Restart the daemons: # /etc/rc.d/init.d/ypserv restart and # /etc/rc.d/init.d/ypbind restart 5) Enable them in the init scripts so they start at boot 6) Edit /var/yp/Makefile: set MERGE_PASSWD=TRUE and MERGE_GROUP=TRUE, and delete netgrp from the all target 7) Build the NIS database: # /usr/lib/yp/ypinit -m 8) After any change in the future, only run: # cd /var/yp; make
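
The server-side steps can be collected into one script. A sketch, assuming a Red Hat-style init layout and GNU sed; the domain name is a placeholder:

    #!/bin/bash
    NISDOMAIN=cri.cluster                                    # placeholder

    domainname $NISDOMAIN                                    # step 1
    echo "NISDOMAIN=$NISDOMAIN" >> /etc/sysconfig/network    # step 2

    # Step 6: merge shadow passwords/groups into the NIS maps.
    sed -i 's/^MERGE_PASSWD=.*/MERGE_PASSWD=true/' /var/yp/Makefile
    sed -i 's/^MERGE_GROUP=.*/MERGE_GROUP=true/'   /var/yp/Makefile

    /etc/rc.d/init.d/ypserv restart                          # step 4
    /usr/lib/yp/ypinit -m                                    # step 7

    # Step 8: after any later account change, rebuild the maps.
    cd /var/yp && make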

  27. NIS configuration on the client (a verification sketch follows): 1) Specify the domain name: # domainname <DOMAIN_NAME> 2) Put it in /etc/sysconfig/network: NISDOMAIN=<DOMAIN_NAME> 3) Specify the server name in /etc/yp.conf: domain <DOMAIN_NAME> server <SERVER_NAME> 4) Restart the daemon: # /etc/rc.d/init.d/ypbind restart 5) Enable it in the init scripts 6) Test it by logging in with one of the server's users
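
Once ypbind is restarted on a client, the binding can be checked with the standard yp-tools before trying a login (step 6):

    ypwhich                   # should print the NIS server's hostname
    ypcat passwd | head -3    # should list accounts served by NIS
    # Finally, log in as a user defined only on the server.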

  28. Monitoring and controlling • 1) Scripts: Perl, Python, Bash (a sketch follows) • 2) Prebuilt tools: Webmin, Scyld, SCD
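
A minimal example of the script-based option, assuming passwordless ssh and node hostnames node01..node16 (placeholders):

    #!/bin/bash
    # Poll load average and memory use on every compute node.
    for n in $(seq -w 1 16); do
        echo "--- node$n ---"
        ssh node$n 'uptime; free -m | grep Mem' 2>/dev/null \
            || echo "node$n is DOWN"
    done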

  29. Hardware monitoring and control (ICE Box) • ICE Box: node management in hardware • Monitors temperatures within nodes and remotely resets motherboards through internally placed probes • SNMP compliant • DHCP or static network configuration • NIMP (Network ICE Management Protocol) • SIMP (Serial ICE Management Protocol) • Out-of-band serial data buffering • Accessible with several protocols (NIMP, SIMP, null modem, Telnet, SNMP, ClusterWorX) • Remote monitoring of CPU temperatures • Remote power management • Power sequencing to start up nodes • Optional cabinet temperature monitoring (eight sensors per ICE Box) • Node reset • Multiple ICE Boxes scale to support large clusters • Embedded CPU powered by Linux for a stable run-time environment • The ICE Box operating system can be updated easily and safely without cluster downtime

  30. Security • SSH (key setup sketched below) • PAM • Xinetd
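
On a cluster, SSH matters mostly for passwordless logins between nodes, which MPI launchers need. A sketch of RSA key distribution; the node names are placeholders:

    # Generate a key pair with an empty passphrase (done once, as the user).
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

    # Append the public key to every node's authorized_keys.
    for n in $(seq -w 1 16); do
        cat ~/.ssh/id_rsa.pub | \
            ssh node$n 'mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys'
    done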

  31. Running ARPS • Fortran 77 compiler (GNU) • Preprocessing data • BC and IC data from other models • Post-processing tools (NCAR Graphics) • Running flowchart (a sample run script follows): • Preprocessing (always done once) • Splitting • Initializing • Boundary conditions • Running • Joining • Post-processing (on other computers)
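
A sketch of one parallel run following the flowchart above, assuming an MPI build of ARPS; the split/join utility names and file names below are placeholders standing in for the "splitting" and "joining" steps:

    #!/bin/bash
    NPX=4; NPY=4                            # 4 x 4 = 16 subdomains

    # Preprocessing: split the IC/BC files into per-subdomain pieces
    # (placeholder tool name for the "splitting" step).
    ./split_files arps.input $NPX $NPY

    # Run the model across all subdomains.
    mpirun -np $((NPX * NPY)) ./arps_mpi < arps.input > arps.log

    # Merge per-processor history files back into one domain
    # (placeholder tool name for the "joining" step).
    ./join_files arps.input

    # Post-processing (ARPSPLT, NCAR Graphics) can run elsewhere.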

  32. Parallel architecture of ARPS

  33. Transform Tool

  34. [Figure: example domain decompositions of 800 * 400, 200 * 200, and 800 * 800 grids]

  35. Grid computing? Nested grids at 10 km, 3 km, and 1 km resolution. 1) Instead of one big low-resolution domain, use a coarse outer domain with better-resolution nests. 2) In data assimilation, the code moves close to where the data are.

  36. AUI

  37. Benchmarking • ARPS results • GMandel • BPS

  38. Performance Utilities • AIMS - instrumentors, monitoring library, and analysis tools • MPE logging library and Nupshot performance visualization tool • Pablo - monitoring library and analysis tools • Paradyn - dynamic instrumentation and run-time analysis tool • SvPablo - integrated instrumentor, monitoring library, and analysis tool • VAMPIRtrace monitoring library and VAMPIR performance visualization tool • VT - monitoring library and performance analysis and visualization tool for the IBM SP

  39. ARPS performance • Performance is better with a larger domain per CPU • The cluster network is the bottleneck, so we need more calculation per data transfer (see the worked ratio below)
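
The reasoning can be made concrete: an n x n subdomain does work on about n^2 points each step but exchanges only its roughly 4n halo points with neighbors, so the compute-to-communication ratio grows linearly with n. A small check:

    # Compute-to-communication ratio for an n x n subdomain
    # with a one-cell-wide halo (ratio = n^2 / 4n = n/4).
    for n in 50 100 200 400; do
        awk -v n=$n 'BEGIN { printf "n=%3d  work/comm = %.1f\n", n, n*n/(4*n) }'
    done
    # n= 50  work/comm = 12.5
    # n=100  work/comm = 25.0
    # n=200  work/comm = 50.0
    # n=400  work/comm = 100.0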

  40. Model situation • 200 * 200 grid points per processor • Prediction time = 60 s • Output = none • Dtbig = 6 s • 1 km * 1 km * 500 m grid cells

  41. 200 * 200 per domain {200 x 200}, 1 CPU
      ARPS stopped normally in the main program. The ending time was 60.000 seconds. Thanks for using ARPS.
      Process            CPU time used   Percentage
      -----------------------------------------------
      Initialization  :  0.760000E+01s     1.40%
      Data output     :  0.829005E+01s     1.53%
      Wind advection  :  0.190701E+02s     3.52%
      Scalar advection:  0.397800E+02s     7.34%
      Coriolis force  :  0.000000E+00s     0.00%
      Buoyancy term   :  0.618995E+01s     1.14%
      Small time steps:  0.241000E+03s    44.48%
      Radiation       :  0.000000E+00s     0.00%
      Soil model      :  0.000000E+00s     0.00%
      Surface physics :  0.000000E+00s     0.00%
      Turbulence      :  0.874099E+02s    16.13%
      Comput. mixing  :  0.352601E+02s     6.51%
      Rayleigh damping:  0.271003E+01s     0.50%
      TKE src terms   :  0.287300E+02s     5.30%
      Bound.conditions:  0.220026E+00s     0.04%
      Gridscale precp.:  0.000000E+00s     0.00%
      Kuo cumulus     :  0.000000E+00s     0.00%
      Kain-Fritsch    :  0.000000E+00s     0.00%
      Warmrain microph:  0.452400E+02s     8.35%
      Lin ice microph :  0.000000E+00s     0.00%
      NEM ice microph :  0.000000E+00s     0.00%
      Hydrometero fall:  0.000000E+00s     0.00%
      Miscellaneous   :  0.169800E+02s     3.13%
      Entire model    :  0.541820E+03s   100.00%

  42. 200 * 200 per domain {400 x 200}, 2 CPUs
      ARPS stopped normally in the main program. The ending time was 60.000 seconds. Thanks for using ARPS.
      Process            CPU time used   Percentage
      -----------------------------------------------
      Initialization  :  0.763000E+01s     1.41%
      Data output     :  0.822997E+01s     1.52%
      Wind advection  :  0.190600E+02s     3.52%
      Scalar advection:  0.402001E+02s     7.42%
      Coriolis force  :  0.000000E+00s     0.00%
      Buoyancy term   :  0.615997E+01s     1.14%
      Small time steps:  0.241520E+03s    44.56%
      Radiation       :  0.000000E+00s     0.00%
      Soil model      :  0.000000E+00s     0.00%
      Surface physics :  0.000000E+00s     0.00%
      Turbulence      :  0.872100E+02s    16.09%
      Comput. mixing  :  0.351900E+02s     6.49%
      Rayleigh damping:  0.276001E+01s     0.51%
      TKE src terms   :  0.285300E+02s     5.26%
      Bound.conditions:  0.240047E+00s     0.04%
      Gridscale precp.:  0.000000E+00s     0.00%
      Kuo cumulus     :  0.000000E+00s     0.00%
      Kain-Fritsch    :  0.000000E+00s     0.00%
      Warmrain microph:  0.451199E+02s     8.32%
      Lin ice microph :  0.000000E+00s     0.00%
      NEM ice microph :  0.000000E+00s     0.00%
      Hydrometero fall:  0.000000E+00s     0.00%
      Miscellaneous   :  0.168399E+02s     3.11%
      Entire model    :  0.542000E+03s   100.00%

  43. 200 * 200 per domain {400 x 400}, 4 CPUs
      ARPS stopped normally in the main program. The ending time was 60.000 seconds. Thanks for using ARPS.
      Process            CPU time used   Percentage
      -----------------------------------------------
      Initialization  :  0.762000E+01s     1.40%
      Data output     :  0.827001E+01s     1.52%
      Wind advection  :  0.191300E+02s     3.52%
      Scalar advection:  0.404000E+02s     7.44%
      Coriolis force  :  0.000000E+00s     0.00%
      Buoyancy term   :  0.614000E+01s     1.13%
      Small time steps:  0.241750E+03s    44.53%
      Radiation       :  0.000000E+00s     0.00%
      Soil model      :  0.000000E+00s     0.00%
      Surface physics :  0.000000E+00s     0.00%
      Turbulence      :  0.874600E+02s    16.11%
      Comput. mixing  :  0.351000E+02s     6.47%
      Rayleigh damping:  0.273998E+01s     0.50%
      TKE src terms   :  0.285099E+02s     5.25%
      Bound.conditions:  0.249939E+00s     0.05%
      Gridscale precp.:  0.000000E+00s     0.00%
      Kuo cumulus     :  0.000000E+00s     0.00%
      Kain-Fritsch    :  0.000000E+00s     0.00%
      Warmrain microph:  0.451600E+02s     8.32%
      Lin ice microph :  0.000000E+00s     0.00%
      NEM ice microph :  0.000000E+00s     0.00%
      Hydrometero fall:  0.000000E+00s     0.00%
      Miscellaneous   :  0.169001E+02s     3.11%
      Entire model    :  0.542850E+03s   100.00%

  44. 200 * 200 per domain {800 x 400}, 8 CPUs
      ARPS stopped normally in the main program. The ending time was 60.000 seconds. Thanks for using ARPS.
      Process            CPU time used   Percentage
      -----------------------------------------------
      Initialization  :  0.758000E+01s     1.39%
      Data output     :  0.827006E+01s     1.52%
      Wind advection  :  0.190499E+02s     3.50%
      Scalar advection:  0.404402E+02s     7.44%
      Coriolis force  :  0.000000E+00s     0.00%
      Buoyancy term   :  0.619997E+01s     1.14%
      Small time steps:  0.242260E+03s    44.57%
      Radiation       :  0.000000E+00s     0.00%
      Soil model      :  0.000000E+00s     0.00%
      Surface physics :  0.000000E+00s     0.00%
      Turbulence      :  0.873999E+02s    16.08%
      Comput. mixing  :  0.352699E+02s     6.49%
      Rayleigh damping:  0.271999E+01s     0.50%
      TKE src terms   :  0.286100E+02s     5.26%
      Bound.conditions:  0.290039E+00s     0.05%
      Gridscale precp.:  0.000000E+00s     0.00%
      Kuo cumulus     :  0.000000E+00s     0.00%
      Kain-Fritsch    :  0.000000E+00s     0.00%
      Warmrain microph:  0.451000E+02s     8.30%
      Lin ice microph :  0.000000E+00s     0.00%
      NEM ice microph :  0.000000E+00s     0.00%
      Hydrometero fall:  0.000000E+00s     0.00%
      Miscellaneous   :  0.169199E+02s     3.11%
      Entire model    :  0.543510E+03s   100.00%

  45. 200 * 200 per domain {(200-3)*4+3 = 791, or ~800 totally}, 16 CPUs
      ARPS stopped normally in the main program. The ending time was 60.000 seconds. Thanks for using ARPS.
      Process            CPU time used   Percentage
      -----------------------------------------------
      Initialization  :  0.762000E+01s     1.40%
      Data output     :  0.820012E+01s     1.50%
      Wind advection  :  0.191300E+02s     3.50%
      Scalar advection:  0.403599E+02s     7.39%
      Coriolis force  :  0.000000E+00s     0.00%
      Buoyancy term   :  0.615000E+01s     1.13%
      Small time steps:  0.243190E+03s    44.55%
      Radiation       :  0.000000E+00s     0.00%
      Soil model      :  0.000000E+00s     0.00%
      Surface physics :  0.000000E+00s     0.00%
      Turbulence      :  0.880600E+02s    16.13%
      Comput. mixing  :  0.354600E+02s     6.50%
      Rayleigh damping:  0.276005E+01s     0.51%
      TKE src terms   :  0.287300E+02s     5.26%
      Bound.conditions:  0.309933E+00s     0.06%
      Gridscale precp.:  0.000000E+00s     0.00%
      Kuo cumulus     :  0.000000E+00s     0.00%
      Kain-Fritsch    :  0.000000E+00s     0.00%
      Warmrain microph:  0.455600E+02s     8.35%
      Lin ice microph :  0.000000E+00s     0.00%
      NEM ice microph :  0.000000E+00s     0.00%
      Hydrometero fall:  0.000000E+00s     0.00%
      Miscellaneous   :  0.169700E+02s     3.11%
      Entire model    :  0.545870E+03s   100.00%
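
Slides 41-45 form a weak-scaling experiment: the 200 x 200 per-CPU subdomain is fixed while the total domain grows with the CPU count, so ideal scaling keeps the run time constant. Treating the reported "Entire model" times as per-run totals, the efficiency T(1)/T(n) stays above 99%, which supports the claim on slide 39:

    # Weak-scaling efficiency from the "Entire model" rows above.
    awk 'BEGIN {
        split("1 2 4 8 16", p)
        split("541.82 542.00 542.85 543.51 545.87", t)
        for (i = 1; i <= 5; i++)
            printf "%2d CPUs: %.2f s  efficiency = %.1f%%\n",
                   p[i], t[i], 100 * t[1] / t[i]
    }'
    #  1 CPUs: 541.82 s  efficiency = 100.0%
    #  2 CPUs: 542.00 s  efficiency = 100.0%
    #  4 CPUs: 542.85 s  efficiency = 99.8%
    #  8 CPUs: 543.51 s  efficiency = 99.7%
    # 16 CPUs: 545.87 s  efficiency = 99.3%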

  46. Gmandel PVM benchmark
      Run 1: x1=-0.760416667, y1=-0.354166667, x2=-0.614583333, y2=-0.208333333, limit=1000000, wall time = 97 s, MFLOPS = 19556.6
      Run 2: x1=-2.000000000, y1=-2.000000000, x2=2.000000000, y2=2.000000000, limit=1000000, wall time = 17 s, MFLOPS = 19461.0
