Benchmarking FORTRAN / C / C++


For LINUX 2.0 (Debian and Red Hat), and Windows NT 4.0

C. Leggett

Compilers
  • Linux: Debian 2.0
    • g++: egcs-2.91.57 19980901 (egcs-1.1 release)
    • KCC: 3.3c -- June 24, 1998
    • f77
    • g77: egcs-2.91.57 19980901 (egcs-1.1 release)
  • Linux: Red Hat 5.1
    • g++: egcs-2.90.27 980315 (egcs-1.0.2 release)
    • g++: egcs-2.91.57 19980901 (egcs-1.1 release)
    • KCC: 3.3c -- June 24, 1998
    • g77: egcs-2.90.27 980315 (egcs-1.0.2 release)
  • Windows NT 4.0 (SP3)
    • Microsoft Visual C++ v6.0
Haney Kernels
  • Measures relative performance of FORTRAN, C, and C++
  • C code is compiled by C++ compiler
  • 3 Kernels:
    • Complex Matrix Multiply
      • Use complex classes and operator overloading
    • Real Matrix Multiply
      • Use real matrix classes with storage management and indexing
    • Vector Operations
      • Use array classes and operator overloading
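
As a rough illustration of what the vector-operations kernel compares, the sketch below puts an array class with overloaded operators next to the equivalent C-style loop. The `Vec` class and `axpy_c` function are hypothetical names invented for this example; they are not taken from the Haney source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical array class (not the Haney code).
class Vec {
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    std::size_t size() const { return data_.size(); }
    double& operator[](std::size_t i) { return data_[i]; }
    double operator[](std::size_t i) const { return data_[i]; }
private:
    std::vector<double> data_;
};

// Each overloaded operator builds and returns a temporary Vec;
// that temporary is the kind of abstraction cost such a kernel exposes.
Vec operator+(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] + b[i];
    return r;
}

Vec operator*(double s, const Vec& a) {
    Vec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = s * a[i];
    return r;
}

// C-style equivalent of y = s*x + y: one loop, no temporaries.
void axpy_c(std::size_t n, double s, const double* x, double* y) {
    for (std::size_t i = 0; i < n; ++i) y[i] = s * x[i] + y[i];
}
```

With these definitions, `Vec z = 3.0 * x + y;` reads like the mathematics but allocates two temporaries, while `axpy_c(n, 3.0, xs, ys)` does the same arithmetic in a single pass.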
Haney Kernels
  • FORTRAN is usually faster than C which is usually faster than C++
  • g77 is faster than f77, which makes heavy use of f2c
  • Debian’s version of g++ (egcs-1.1) is more recent and considerably faster than the Red Hat 5.1 version (egcs-1.0.2)
  • The KAI compiler is not all it's cracked up to be
Bench++ Suite

http://www.research.att.com/~orost/bench_plus_plus.html

  • Benchmarks price of various C++ features, and compares C/C++ performance
  • Incorporates many ‘Standard’ benchmarks such as Dhrystone, Whetstone, Hennessy, OOPACK, and Stepanov benchmarks
Bench++
  • Dhrystone
  • Whetstone
  • Hennessy benchmarks (11)
Bench++: Composite
  • Tracker (float)
  • Tracker (double)
  • Tracker (float + int)
  • Orbit
  • Kalman
  • Centroid
Bench++: Dynamic Allocation
  • malloc & free: 1000 ints
  • malloc & init & free: 1000 ints
  • new & delete: 1000 ints
  • new & init & delete: 1000 ints
  • alloca: 1000 ints (FAIL)
  • alloca & init: 1000 ints
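
A minimal sketch of the "malloc & init & free" and "new & init & delete" cases follows. The function names are hypothetical and the Bench++ timing harness is omitted; the point is only to show the two allocation styles side by side.

```cpp
#include <cstddef>
#include <cstdlib>

// "malloc & init & free": allocate n ints, initialize them, release them.
// Returns the sum of the stored values, or -1 if the allocation fails.
long sum_with_malloc(std::size_t n) {
    int* p = static_cast<int*>(std::malloc(n * sizeof(int)));
    if (p == nullptr) return -1;
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        p[i] = static_cast<int>(i);
        sum += p[i];
    }
    std::free(p);
    return sum;
}

// "new & init & delete": the same work through the C++ allocation operators.
long sum_with_new(std::size_t n) {
    int* p = new int[n];
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        p[i] = static_cast<int>(i);
        sum += p[i];
    }
    delete[] p;
    return sum;
}
```

Calling each function with 1000 in a timed loop would approximate the corresponding benchmark rows. `alloca` is left out because it is not part of standard C or C++.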
Bench++: Exceptions
  • Local exception caught
  • class method exception caught
  • procedure exception caught: 3 deep
  • procedure exception caught: 4 deep
  • declared proc exception caught: 4 deep
  • proc exception caught: 4 deep re-thrown at each level
  • proc exception caught: implemented using setjmp/longjmp
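
The contrast between a C++ exception thrown several calls deep and the setjmp/longjmp alternative can be sketched as below. All function names are hypothetical; this is not the Bench++ code, only an assumed shape for the two mechanisms.

```cpp
#include <csetjmp>
#include <stdexcept>

// Recurse 'depth' frames down, then throw.
void throw_at_depth(int depth) {
    if (depth == 0) throw std::runtime_error("bottom");
    throw_at_depth(depth - 1);
}

// Returns true if the exception thrown 'depth' frames down is caught here.
bool caught_with_exception(int depth) {
    try {
        throw_at_depth(depth);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}

static std::jmp_buf g_env;  // jump target shared by the two functions below

// Recurse 'depth' frames down, then jump straight back to the setjmp point.
// No destructors run on the way, so the frames must not own resources.
void longjmp_at_depth(int depth) {
    if (depth == 0) std::longjmp(g_env, 1);
    longjmp_at_depth(depth - 1);
}

// Returns true if control came back through longjmp.
bool caught_with_longjmp(int depth) {
    if (setjmp(g_env) == 0) {
        longjmp_at_depth(depth);
        return false;  // not reached: longjmp_at_depth never returns normally
    }
    return true;
}
```

Timing `caught_with_exception(4)` against `caught_with_longjmp(4)` would correspond roughly to the "4 deep" rows. The longjmp version skips stack unwinding entirely, which is why it is only safe when the intermediate frames hold nothing that needs a destructor.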
Bench++: Coding Style
  • boolean assignment
  • boolean if
  • 2-way if/else
  • 2-way switch
  • 10-way if/else
  • 10-way switch
  • 10-way sparse switch
  • 10-way virtual function call
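
The three dispatch styles being timed can be sketched as follows, cut down to three cases instead of ten to keep it short. The names and return values are hypothetical and not taken from Bench++.

```cpp
// if/else chain: cases are tested one after another.
int dispatch_if(int k) {
    if (k == 0) return 10;
    else if (k == 1) return 20;
    else return 30;
}

// switch: a compiler may turn dense cases into a jump table.
int dispatch_switch(int k) {
    switch (k) {
        case 0:  return 10;
        case 1:  return 20;
        default: return 30;
    }
}

// Virtual call: the case is selected by the object's dynamic type.
struct Handler {
    virtual ~Handler() {}
    virtual int run() const = 0;
};
struct Handler0 : Handler { int run() const override { return 10; } };
struct Handler1 : Handler { int run() const override { return 20; } };
struct Handler2 : Handler { int run() const override { return 30; } };

int dispatch_virtual(const Handler& h) { return h.run(); }
```

All three produce the same result for a given case; the benchmark measures only how long the selection takes.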
Bench++: I/O Timing
  • iostream.getline: 20 char buffer
  • iostream.>>: 20 chars in loop
  • iostream.<<: 20 char buffer
  • iostream.<<: 20 chars in loop
  • istrstream.>>: int
  • istrstream.>>: float
  • fstream.open/fstream.close
Bench++: Machine Level Features
  • Packed bit arrays
  • unpacked bit arrays
  • packed bit ops in loop
  • unpacked bit ops in loop
  • int conversion
  • 10 float conversion
  • bit fields
  • bit fields and packed bit arrays
  • pack and unpack class objects
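
To make the packed/unpacked distinction concrete, here is a minimal sketch of both representations. The struct names and the 32-bit word size are assumptions made for this example, not details of the Bench++ implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Unpacked: one byte per flag. Access is a plain load or store.
struct UnpackedBits {
    std::vector<unsigned char> bytes;
    explicit UnpackedBits(std::size_t n) : bytes(n, 0) {}
    void set(std::size_t i, bool v) { bytes[i] = v ? 1 : 0; }
    bool get(std::size_t i) const { return bytes[i] != 0; }
};

// Packed: 32 flags per word. Access needs a shift and a mask,
// but the array is 8x smaller than the unpacked form.
struct PackedBits {
    std::vector<std::uint32_t> words;
    explicit PackedBits(std::size_t n) : words((n + 31) / 32, 0) {}
    void set(std::size_t i, bool v) {
        const std::uint32_t mask = std::uint32_t(1) << (i % 32);
        if (v) words[i / 32] |= mask;
        else   words[i / 32] &= ~mask;
    }
    bool get(std::size_t i) const {
        return ((words[i / 32] >> (i % 32)) & 1u) != 0;
    }
};
```

The trade-off is memory footprint against the extra shift-and-mask work on every access.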
Bench++: Loop Overhead
  • “for” loop
  • “while” loop
  • infinite loop w/ break
  • 5-iteration loop
Bench++: Optimizer Performance
  • Constant propagation
  • local common sub-expression
  • global common sub-expression
  • unnecessary copy
  • code motion
  • induction variable
  • reduction in strength
  • dead code
  • loop jamming
  • redundant code
  • unreachable code
  • string ops
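
Two of the optimizations listed, code motion and reduction in strength, can be illustrated with a naive loop and its hand-optimized counterpart. The functions are hypothetical examples written for this note; the actual Bench++ test bodies may differ.

```cpp
#include <cstddef>

// Naive form: the loop-invariant product a*b is written inside the loop,
// and each element needs a multiplication by the loop index.
void fill_naive(int* out, std::size_t n, int a, int b, int stride) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a * b + static_cast<int>(i) * stride;
    }
}

// Hand-optimized form.
void fill_hand(int* out, std::size_t n, int a, int b, int stride) {
    int value = a * b;       // code motion: invariant hoisted out of the loop
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = value;
        value += stride;     // reduction in strength: multiply replaced by add
    }
}
```

A compiler that performs these optimizations should make `fill_naive` run about as fast as `fill_hand`; a large gap between the two suggests the optimization was not applied.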
Bench++: Procedure Calls
  • procedure call: no args
  • procedure call: no args, catches exceptions
  • static class method call: no args, catches exceptions
  • inline procedure call: no args
  • static class method call: 1 int arg: catches exception
  • static class method call: 1 int *arg: catches exception
  • static class method call: 1 int &arg: catches exception
  • procedure call: no pars, called thru pointer, catch exception
  • procedure call: 10 int arg: catch exception
  • procedure call: 20 int arg: catch exception
  • procedure call: 10 (3-int) arg: catch exception
  • procedure call: 20 (3-int) arg: catch exception
  • class method call: 1 this arg: catch exception
  • virtual class method call: 1 this arg: catch exception
  • virtual const class method call: 1 this arg: catch exception
  • ibid, called in loop to check lookup optimization
Bench++: Abstraction
  • max: C++ style
  • max: C style
  • matrix: C++ style
  • matrix: C style
  • iterator: C++ style
  • iterator: C style
  • complex: C++ style
  • complex: C style
  • Stepanov C++ Abstraction
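
One plausible reading of "max: C style" versus "max: C++ style" is a preprocessor macro versus an inline function template. The sketch below assumes that reading; `MAX_C` and `max_cpp` are hypothetical names, and the Bench++ source may define the pair differently.

```cpp
// C style: textual substitution, so an argument can be evaluated twice.
#define MAX_C(a, b) ((a) > (b) ? (a) : (b))

// C++ style: type-checked, each argument evaluated exactly once,
// and normally inlined by the compiler.
template <typename T>
inline const T& max_cpp(const T& a, const T& b) {
    return a > b ? a : b;
}
```

The difference shows up with side effects: `MAX_C(i++, 2)` increments `i` twice when `i` is the larger value, while `max_cpp(i++, 2)` increments it once. The benchmark asks whether the safer C++ form costs anything at run time.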
Bench++
  • float matrix multiply vs integer
  • double covariance matrix vs float
  • float & int covariance matrix vs float
  • new/delete vs malloc/free
Bench++
  • 4 deep exception handled vs 3 deep
  • declared exception handled vs not declared
  • 4 deep rethrown exception vs 4 deep
  • 4 deep setjmp/longjmp vs 4 deep exception
Bench++
  • if test vs logical equation
  • 2-way switch vs 2-way if/else
  • 10-way switch vs 10-way if/else
  • 10-way sparse switch vs 10-way if/else
  • 10-way sparse switch vs 10-way switch
  • 10-way virtual function vs 10-way switch
Bench++
  • 20-iostream.>> vs 20 char iostream.getline & gcount
  • 20-iostream.<< vs 20 char iostream.<<
  • istrstream.>> a float from local string vs int
  • boolean operations on bit arrays vs byte arrays
  • boolean operations on bits in loop vs bit arrays
  • boolean operations on bytes in loop vs byte arrays
  • while loop vs for loop
  • simple loop w/break vs for loop
Bench++
  • Constant Propagation
  • Local Common-sub
  • Global Common-sub
  • Unnecessary copy
  • Code Motion
  • Induction Variable
  • Reduction in Strength
  • Dead Code
  • Loop Jamming
  • Redundant Code
  • Unreachable Code

Hand optimized vs compiler optimized (higher is better)

Bench++
  • Static class method call vs local procedure call
  • Inline procedure call vs inlineable local procedure call
  • Static class method call w/ 1 int* par vs int par
  • Static class method call w/ 1 int& par vs int par
  • Call thru a procedure variable vs local procedure call
  • Static class method call w/ 10 int pars vs 1 int par
  • Static class method call w/ 20 int pars vs 1 int par
  • Static class method call w/ 10 3-int pars vs 10 1-int pars
  • Static class method call w/ 20 3-int pars vs 20 1-int pars
  • Class method call w/ this par vs static class method call w/ int par
  • Virtual class method call vs class method call
  • Virtual const class method call vs class method call
  • Loop of virtual const class method call vs no loop
Bench++
  • C++ style Max vs C style
  • C++ style Matrix vs C style
  • C++ style Iterator vs C style
  • C++ style Complex vs C style
Bench++
  • Stepanov abstraction level n(12 .. 1) vs level 0
Bench++: Stepanov Abstraction
  • Level 0: Use a simple Fortran-like loop
  • Level 1,3,5,7,9,11: use doubles
  • Level 2,4,6,8,10,12: use Double - double wrapped in a class
  • Level 1,2: use regular pointers
  • Level 3,4: use pointers wrapped in a class
  • Level 5,6: use pointers wrapped in a reverse-iterator adapter
  • Level 7,8: use wrapped pointers wrapped in a reverse-iterator adapter
  • Level 9,10: use pointers wrapped in a reverse-iterator adapter wrapped in a reverse-iterator adapter
  • Level 11,12: use wrapped pointers wrapped in a reverse-iterator adapter wrapped in a reverse-iterator adapter
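
The layering described above can be sketched with a wrapped `Double`, a generic accumulation loop, and `std::reverse_iterator` standing in for the reverse-iterator adapter. This is a simplified approximation written for illustration, with hypothetical function names, and not the Stepanov benchmark source.

```cpp
#include <cstddef>
#include <iterator>

// Level 0: a simple Fortran-like loop over raw doubles.
double sum_level0(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// "Double": a double wrapped in a class.
struct Double {
    double value;
    Double(double v = 0.0) : value(v) {}
};
inline Double operator+(const Double& x, const Double& y) {
    return Double(x.value + y.value);
}

// Generic accumulation over any iterator range. Instantiating it with
// wrapped values and nested iterator adapters approximates the higher levels.
template <typename Iter, typename T>
T accumulate_range(Iter first, Iter last, T result) {
    while (first != last) {
        result = result + *first;
        ++first;
    }
    return result;
}
```

For an array `Double D[4]`, a level-6-like traversal would be `accumulate_range(RIt(D + 4), RIt(D), Double(0.0))` with `RIt` defined as `std::reverse_iterator<const Double*>`. Wrapping `RIt` in a second `std::reverse_iterator` approximates levels 9 to 12. Ideally every level compiles down to the same machine code as level 0; the benchmark reports how far each compiler falls short of that.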
Bench++: Some Conclusions
  • There is a significant increase in speed between versions 1.0 and 1.1 of egcs.
  • vC++ handles exceptions very well; g++ and KCC do not, though KCC does better with declared exceptions
  • alloca is much faster than new/delete. Too bad you can’t use it portably.
  • switch is much faster than if/else, but virtual functions don’t have much of a penalty
  • KCC does not do well with I/O (cf. g++), but reading characters as strings (iostream.getline) improves performance
  • KCC has trouble optimizing some simple loop structures
  • KCC handles procedure calls very badly, but shows a lower overhead than g++ or vC++ when large numbers of parameters are passed
  • The C++ compilers on Linux platforms do not optimize such things as dead and redundant code, code motion, and local common sub-expressions.
  • Abstraction has serious consequences, but KCC tends to handle the complex class well.
Observations on VC++
  • Visual C++ 6.0 is not obviously superior. Neither is it incredibly inferior. Much like the other compilers, it does well in some areas, and poorly in others:
    • vC++ handles abstraction and procedure calls badly
    • vC++ handles exceptions well
    • vC++ does I/O well, except for opening and closing files
    • vC++ handles simple optimizations well (i.e. constant propagation, common sub-expressions, redundant code, etc.)
Alternatives to Improving Compiler Optimization
  • Decrease the level of abstraction - write kernels in low level languages such as C or Fortran
  • Put C++ wrappers around low level kernels to retain some advantages of C++. Only useful for large chunks of code.
  • Use macros and templates cleverly.
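
The wrapper approach can be sketched as a thin C++ class that forwards to a low-level kernel. Here the kernel is written inline so the example is self-contained; in practice it would be a C or Fortran routine. The names `matmul_kernel` and `Matrix` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Low-level kernel: c = a * b for n x n row-major matrices.
// Stands in for a routine that would be written in C or Fortran.
void matmul_kernel(std::size_t n, const double* a, const double* b, double* c) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k) s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
    }
}

// Thin C++ wrapper: owns the storage and offers operator syntax,
// but does all heavy work through a single kernel call.
class Matrix {
public:
    explicit Matrix(std::size_t n) : n_(n), data_(n * n, 0.0) {}
    std::size_t size() const { return n_; }
    double& operator()(std::size_t i, std::size_t j) { return data_[i * n_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * n_ + j]; }

    friend Matrix operator*(const Matrix& a, const Matrix& b) {
        assert(a.n_ == b.n_);
        Matrix c(a.n_);
        matmul_kernel(a.n_, a.data_.data(), b.data_.data(), c.data_.data());
        return c;
    }

private:
    std::size_t n_;
    std::vector<double> data_;
};
```

The abstraction cost is paid once per `A * B`, not once per element, which is why the approach only pays off when each kernel call covers a large chunk of work.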

Details at:

http://annwm.lbl.gov/~leggett/bench/
