Benchmarks of a weather forecasting research model

Benchmarks of a Weather Forecasting Research Model. Daniel B. Weber, Ph.D. Research Scientist CAPS/University of Oklahoma ****CONFIDENTIAL**** August 3, 2001. UNM Los Lobos INTEL Benchmark Summary.

Presentation Transcript


Benchmarks of a Weather Forecasting Research Model

Daniel B. Weber, Ph.D.

Research Scientist

CAPS/University of Oklahoma

****CONFIDENTIAL****

August 3, 2001


UNM Los Lobos INTEL Benchmark Summary

  • Compute time increases by 20% in the 2proc/node configuration on Intel-based systems, due to bus contention.

  • The file system is very slow on Intel-based systems without Fibre Channel.

  • The file system is a weak link (UNM-LL):

    • 5.5 MB/s sustained for 480 2proc/node tests writing 2.1 MB files from 8 separate processors simultaneously

    • passing through a Linux file server, not r6000
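To put the sustained rate in perspective, the arithmetic below estimates how long one round of output takes when 8 processors each write a 2.1 MB file. It is a back-of-the-envelope sketch in Python, assuming the 5.5 MB/s figure is the aggregate rate across all writers, which the slide does not state explicitly; the function name is illustrative.

```python
def write_round_seconds(n_writers, file_mb, aggregate_mb_per_s):
    """Seconds for n_writers to each write one file of file_mb megabytes,
    assuming they share a single aggregate transfer rate."""
    return n_writers * file_mb / aggregate_mb_per_s

# Figures from the slide: 8 writers, 2.1 MB files, 5.5 MB/s sustained.
round_seconds = write_round_seconds(8, 2.1, 5.5)
print(f"{round_seconds:.2f} s per output round")
```

Under that assumption each output round costs roughly three seconds of wall-clock time, which is why the file system shows up as a bottleneck for frequent history dumps.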


ALPHA Benchmark Summary

  • The ES-40 Alpha EV-67 (TCS) is 5 times faster computationally than the Intel PIII/733.

  • The Alpha (TCS) file system is very slow at times and its configuration needs to be examined, but it shows potential for very fast transfer rates.

  • MPI overhead for a 256-processor TCS job is on the order of 15%, which indicates very good network performance.


ALPHA Benchmark Summary

  • ES-45 Alpha EV-67 (TCS) is 1.5 times faster computationally than the ES-40

  • It is 4-5 times faster than the Intel PIII at 1 GHz (using the Intel F90 compiler).


ARPS Optimization Revisited

  • Two modes:

    • Loop Optimization

    • MPI optimization

  • MPI accounts for 30% of the run time on 450 processors of the Platinum IA-32 cluster at NCSA.

  • Calculations (70+%) are primarily 3-D DO loops.


ARPS Optimization Revisited

  • MPI Optimization:

    • Hide communication behind calculations.

    • This requires hand coding and knowledge of the computational structure, a very time-intensive task.

    • The maximum gain is limited to the communication cost (30%); realistically we may obtain a 15% improvement.
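The bound on the gain is simple arithmetic: if communication takes a fraction f of the run time and a fraction h of it can be hidden behind computation, the run time drops by f times h. A minimal Python sketch, assuming the 30% figure is a fraction of total wall-clock time and that hidden communication adds no other cost; the function name and the choice of h = 0.5 for the "realistic" case are illustrative, not taken from the slides.

```python
def runtime_after_overlap(comm_fraction, hidden_fraction):
    """Relative run time (1.0 = unoptimized) after hiding part of the
    communication behind computation."""
    return 1.0 - comm_fraction * hidden_fraction

best_case = runtime_after_overlap(0.30, 1.0)   # all communication hidden
realistic = runtime_after_overlap(0.30, 0.5)   # half of it hidden (assumed)

for label, t in (("best case", best_case), ("realistic", realistic)):
    print(f"{label}: {1 - t:.0%} less run time, speedup {1 / t:.2f}x")
```

Even in the best case the speedup stays below 1.5x, which is why the slides treat MPI optimization as the smaller of the two opportunities.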


ARPS Optimization Revisited

  • Loop Optimization for Vector Processors

  • Issues:

    • Length of vector pipeline, the longer the better.

    • KMA work shows nearly 75% of peak (6 GFLOPS per processor on the SX-5).

    • Code was hand tuned, hundreds of loops.
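As a quick consistency check on the figures above, the sustained rate and the fraction of peak together imply a per-processor peak. The Python lines below do only that arithmetic; treating "nearly 75%" as exactly 0.75 is an assumption.

```python
sustained_gflops = 6.0      # per processor, from the slide
fraction_of_peak = 0.75     # "nearly 75%", taken as exact here

# Peak rate implied by the two figures above.
implied_peak_gflops = sustained_gflops / fraction_of_peak
print(f"implied peak: {implied_peak_gflops:.1f} GFLOPS per processor")
```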


ARPS Optimization Revisited

  • Loop Optimization for Scalar Processors

  • Issues:

    • Cheap, fast processors.

    • Cache reuse is very important.

    • Rethink the order/layout of the computational structure of ARPS.

    • Some optimization was included in 1997 that removed redundant computations and combined loops (good for both vector and scalar machines).

    • CPU utilization is only 10-20% of peak.


ARPS Optimization Revisited

  • New Approach to Loop Optimization

    • Combine loops further; the result is fewer loads and stores. This is very important on the new Intel technology.

    • Cache reuse is critical!

    • Force improvements in the compiler technology.

    • Our goal is to generate optimizations that are platform INDEPENDENT.

    • Example


Horizontal Advection - Original Version

      DO k=2,nz-2            ! compute avgx(u) * difx(u)
        DO j=1,ny-1
          DO i=1,nx-1
            tem2(i,j,k)=tema*(u(i,j,k,2)+u(i+1,j,k,2))*(u(i+1,j,k,2)-u(i,j,k,2))
          END DO
        END DO
      END DO

      DO k=2,nz-2            ! compute avg2x(u)*dif2x(u)
        DO j=1,ny-1
          DO i=2,nx-1
            tem3(i,j,k)=tema*(u(i-1,j,k,2)+u(i+1,j,k,2))*(u(i+1,j,k,2)-u(i-1,j,k,2))
          END DO
        END DO
      END DO

      DO k=2,nz-2            ! compute 4/3*avgx(tem2)+1/3*avg2x(tem3)
        DO j=1,ny-1          ! signs are reversed for force array.
          DO i=3,nx-2
            uforce(i,j,k)=uforce(i,j,k)
     :        +tema*(tem3(i+1,j,k)+tem3(i-1,j,k))
     :        -temb*(tem2(i-1,j,k)+tem2(i,j,k))
          END DO
        END DO
      END DO


Horizontal Advection - Modified Version

  • Three loops are merged into one large loop that reuses data and reduces loads and stores.

      DO k=2,nz-2
        DO j=1,ny-1
          DO i=3,nx-2
            uforce(i,j,k)=uforce(i,j,k)
     :        +tema*((u(i,j,k,2)+u(i+2,j,k,2))*(u(i+2,j,k,2)-u(i,j,k,2))
     :              +(u(i-2,j,k,2)+u(i,j,k,2))*(u(i,j,k,2)-u(i-2,j,k,2)))
     :        -temb*((u(i,j,k,2)+u(i+1,j,k,2))*(u(i+1,j,k,2)-u(i,j,k,2))
     :              +(u(i-1,j,k,2)+u(i,j,k,2))*(u(i,j,k,2)-u(i-1,j,k,2)))
          END DO
        END DO
      END DO
      ...
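The equivalence of the three-loop and merged forms can be checked numerically. The Python sketch below reduces the problem to one dimension (the i index only) and compares a three-pass version using temporaries against a single fused pass. Several things here are assumptions rather than ARPS code: the function names, the test data, the coefficient values, and the choice to leave the leading `tema` factor out of the temporaries so that the two versions agree term by term (how the constants are folded is not shown on the slides).

```python
import random

def advect_original(u, uforce, tema, temb):
    """Three separate passes with temporaries tem2 and tem3.
    Lists are 1-based: index 0 is unused padding."""
    nx = len(u) - 1
    tem2 = [0.0] * (nx + 1)
    tem3 = [0.0] * (nx + 1)
    out = list(uforce)
    for i in range(1, nx):            # i = 1 .. nx-1: avgx(u)*difx(u)
        tem2[i] = (u[i] + u[i + 1]) * (u[i + 1] - u[i])
    for i in range(2, nx):            # i = 2 .. nx-1: avg2x(u)*dif2x(u)
        tem3[i] = (u[i - 1] + u[i + 1]) * (u[i + 1] - u[i - 1])
    for i in range(3, nx - 1):        # i = 3 .. nx-2: combine temporaries
        out[i] = (out[i]
                  + tema * (tem3[i + 1] + tem3[i - 1])
                  - temb * (tem2[i - 1] + tem2[i]))
    return out

def advect_fused(u, uforce, tema, temb):
    """Single pass: the temporaries are expanded in place, so nothing is
    stored to or reloaded from tem2/tem3."""
    nx = len(u) - 1
    out = list(uforce)
    for i in range(3, nx - 1):        # i = 3 .. nx-2
        out[i] = (out[i]
                  + tema * ((u[i] + u[i + 2]) * (u[i + 2] - u[i])
                            + (u[i - 2] + u[i]) * (u[i] - u[i - 2]))
                  - temb * ((u[i] + u[i + 1]) * (u[i + 1] - u[i])
                            + (u[i - 1] + u[i]) * (u[i] - u[i - 1])))
    return out

random.seed(0)
nx = 16
u = [0.0] + [random.uniform(-1.0, 1.0) for _ in range(nx)]
uforce = [0.0] * (nx + 1)
tema, temb = 0.25, 0.5                # arbitrary illustrative coefficients

a = advect_original(u, uforce, tema, temb)
b = advect_fused(u, uforce, tema, temb)
max_diff = max(abs(x - y) for x, y in zip(a, b))
print("max difference between versions:", max_diff)
```

The fused pass does a little more arithmetic per point, but it never writes or rereads the temporary arrays, which is the trade the slides argue for on cache-based processors.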


[Two chart slides comparing the original and optimized versions; only the legend labels "original" and "optimized" survive in this transcript.]

