
Benchmarking FORTRAN / C / C++

For LINUX 2.0 (Debian and Red Hat), and Windows NT 4.0

C. Leggett


Compilers

  • Linux: Debian 2.0

    • g++: egcs-2.91.57 19980901 (egcs-1.1 release)

    • KCC: 3.3c -- June 24, 1998

    • f77

    • g77: egcs-2.91.57 19980901 (egcs-1.1 release)

  • Linux: Red Hat 5.1

    • g++: egcs-2.90.27 980315 (egcs-1.0.2 release)

    • g++: egcs-2.91.57 19980901 (egcs-1.1 release)

    • KCC: 3.3c -- June 24, 1998

    • g77: egcs-2.90.27 980315 (egcs-1.0.2 release)

  • Windows NT 4.0 (sp 3)

    • Micro$loth Visual C++ v6.0


Haney Kernels

  • Measures relative performance of FORTRAN, C, and C++

  • C code is compiled by C++ compiler

  • 3 Kernels:

    • Complex Matrix Multiply

      • Use complex classes and operator overloading

    • Real Matrix Multiply

      • Use real matrix classes with storage management and indexing

    • Vector Operations

      • Use array classes and operator overloading
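To make the "array classes and operator overloading" style concrete, here is a minimal sketch contrasting it with a C-style loop. It is not the Haney kernel source; the names `Vec` and `axpy_c` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical array class in the spirit of the "Vector Operations" kernel.
class Vec {
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    std::size_t size() const { return data_.size(); }
    double& operator[](std::size_t i) { return data_[i]; }
    double operator[](std::size_t i) const { return data_[i]; }
private:
    std::vector<double> data_;
};

// Overloaded operators: each call builds and returns a temporary Vec.
Vec operator+(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] + b[i];
    return r;
}

Vec operator*(double s, const Vec& a) {
    Vec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = s * a[i];
    return r;
}

// C style: one fused loop over raw pointers, no temporaries.
void axpy_c(std::size_t n, double s, const double* x, const double* y,
            double* out) {
    for (std::size_t i = 0; i < n; ++i) out[i] = s * x[i] + y[i];
}
```

With this sketch, `Vec z = 3.0 * x + y;` reads like the mathematics but walks the data twice and allocates intermediate storage, while `axpy_c` does the same work in a single pass. How much of that gap a given compiler removes is the kind of thing these kernels measure.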


Haney Kernels: Red Hat (egcs 1.0)


Haney Kernels: Debian (egcs 1.1)


Haney Kernels

  • FORTRAN is usually faster than C which is usually faster than C++

  • g77 is faster than f77, which makes heavy use of f2c

  • Debian’s version of g++ (egcs-1.1) is more recent and considerably faster than the Red Hat 5.1 version (egcs-1.0.2)

  • The KAI compiler is not all it's cracked up to be


Bench++ Suite

  • Written by Joe Orost

    http://www.research.att.com/~orost/bench_plus_plus.html

  • Benchmarks the cost of various C++ features, and compares C and C++ performance

  • Incorporates many ‘standard’ benchmarks, such as the Dhrystone, Whetstone, Hennessy, OOPACK, and Stepanov benchmarks


Bench++

  • Dhrystone

  • Whetstone

  • Hennessy benchmarks (11)


Bench++: Composite

  • Tracker (float)

  • Tracker (double)

  • Tracker (float + int)

  • Orbit

  • Kalman

  • Centroid


Bench++: Dynamic Allocation

  • malloc & free: 1000 ints

  • malloc & init & free: 1000 ints

  • new & delete: 1000 ints

  • new & init & delete: 1000 ints

  • alloca: 1000 ints (FAIL)

  • alloca & init: 1000 ints
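As a rough illustration of what the "malloc & init & free" and "new & init & delete" entries exercise, the sketch below allocates, fills, and releases a block of ints both ways, and adds a small timing helper. The function names and the helper `seconds_per_call` are assumptions made for this example, not part of Bench++; `alloca` is left out because it is not portable.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>

// C style: malloc, initialise, free. Returns the sum so the work is observable.
long malloc_init_free(int n) {
    assert(n > 0);
    int* p = static_cast<int*>(
        std::malloc(static_cast<std::size_t>(n) * sizeof(int)));
    assert(p != nullptr);
    long sum = 0;
    for (int i = 0; i < n; ++i) { p[i] = i; sum += p[i]; }
    std::free(p);
    return sum;
}

// C++ style: new[], initialise, delete[].
long new_init_delete(int n) {
    assert(n > 0);
    int* p = new int[n];
    long sum = 0;
    for (int i = 0; i < n; ++i) { p[i] = i; sum += p[i]; }
    delete[] p;
    return sum;
}

// Hypothetical helper: average wall-clock seconds per call over `reps` calls.
template <class F>
double seconds_per_call(F f, int reps) {
    volatile long sink = 0;  // keeps the calls from being optimised away
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) sink = sink + f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count() / reps;
}
```

A comparison in the style of the slides would be the ratio of `seconds_per_call([]{ return new_init_delete(1000); }, reps)` to the same call with `malloc_init_free`; the absolute numbers depend entirely on the compiler and allocator in use.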


Bench++: Exceptions

  • Local exception caught

  • class method exception caught

  • procedure exception caught: 3 deep

  • procedure exception caught: 4 deep

  • declared proc exception caught: 4 deep

  • proc exception caught: 4 deep re-thrown at each level

  • proc exception caught: implemented using setjmp/longjmp
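The contrast between a C++ exception thrown several calls deep and the same control transfer done with setjmp/longjmp can be sketched as follows. This is an illustrative reconstruction, not the Bench++ code; `Deep`, `g_env`, `g_code`, and the function names are all hypothetical.

```cpp
#include <cassert>
#include <csetjmp>

// Hypothetical exception type carrying a result code.
struct Deep { int code; };

// Recurse `depth` calls down, then throw.
void throw_at_depth(int depth, int code) {
    if (depth == 0) throw Deep{code};
    throw_at_depth(depth - 1, code);
}

// Catch an exception thrown `depth` calls below this frame.
int catch_from_depth(int depth, int code) {
    assert(depth >= 0);
    try {
        throw_at_depth(depth, code);
    } catch (const Deep& e) {
        return e.code;
    }
    return -1;  // not reached: throw_at_depth always throws
}

// setjmp/longjmp variant. No destructors run on the way out, so this is
// only safe when the skipped frames hold no objects that need cleanup.
static std::jmp_buf g_env;
static int g_code = 0;

void longjmp_at_depth(int depth, int code) {
    if (depth == 0) {
        g_code = code;          // pass the result through a global
        std::longjmp(g_env, 1);
    }
    longjmp_at_depth(depth - 1, code);
}

int setjmp_from_depth(int depth, int code) {
    assert(depth >= 0);
    g_code = 0;
    if (setjmp(g_env) == 0) {
        longjmp_at_depth(depth, code);  // returns here via longjmp
    }
    return g_code;
}
```

Calling `catch_from_depth(4, 42)` and `setjmp_from_depth(4, 42)` in a timed loop gives the flavour of the "4 deep" comparison; the setjmp version trades stack unwinding for a raw register restore.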


Bench++: Coding Style

  • boolean assignment

  • boolean if

  • 2-way if/else

  • 2-way switch

  • 10-way if/else

  • 10-way switch

  • 10-way sparse switch

  • 10-way virtual function call
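The three dispatch styles compared in this group can be sketched in miniature. The example below uses a 4-way choice rather than 10 to stay short, and every name in it (`dispatch_if`, `dispatch_switch`, `Op`, `op_for`) is an assumption made for illustration rather than something taken from Bench++.

```cpp
#include <cassert>

// if/else chain: conditions are tested one after another.
int dispatch_if(int k, int x) {
    if (k == 0) return x + 1;
    else if (k == 1) return x * 2;
    else if (k == 2) return x - 3;
    else return -x;
}

// switch: a dense range lets the compiler emit a jump table.
int dispatch_switch(int k, int x) {
    switch (k) {
        case 0: return x + 1;
        case 1: return x * 2;
        case 2: return x - 3;
        default: return -x;
    }
}

// Virtual call: the choice is encoded in the object's dynamic type.
struct Op {
    virtual ~Op() {}
    virtual int apply(int x) const = 0;
};
struct Inc : Op { int apply(int x) const override { return x + 1; } };
struct Dbl : Op { int apply(int x) const override { return x * 2; } };
struct Sub3 : Op { int apply(int x) const override { return x - 3; } };
struct Neg : Op { int apply(int x) const override { return -x; } };

// Hypothetical lookup from selector to operation object.
const Op& op_for(int k) {
    static Inc inc;
    static Dbl dbl;
    static Sub3 sub3;
    static Neg neg;
    static const Op* const table[4] = { &inc, &dbl, &sub3, &neg };
    assert(k >= 0 && k < 4);
    return *table[k];
}

int dispatch_virtual(int k, int x) { return op_for(k).apply(x); }
```

All three functions compute the same result for `k` in 0..3, so timing them against each other isolates the dispatch mechanism itself.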


Bench++: I/O Timing

  • iostream.getline: 20 char buffer

  • iostream.>>: 20 chars in loop

  • iostream.<<: 20 char buffer

  • iostream.<<: 20 chars in loop

  • istrstream.>>: int

  • istrstream.>>: float

  • fstream.open/fstream.close


Bench++: Machine Level Features

  • Packed bit arrays

  • unpacked bit arrays

  • packed bit ops in loop

  • unpacked bit ops in loop

  • int conversion

  • 10 float conversion

  • bit fields

  • bit fields and packed bit arrays

  • pack and unpack class objects


Bench++: Loop Overhead

  • “for” loop

  • “while” loop

  • infinite loop w/ break

  • 5-iteration loop


Bench++: Optimizer Performance

  • Constant propagation

  • local common sub-expression

  • global common sub-expression

  • unnecessary copy

  • code motion

  • induction variable

  • reduction in strength

  • dead code

  • loop jamming

  • redundant code

  • unreachable code

  • string ops
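A few of these optimizations are easier to picture with a before/after pair. The sketch below writes one function naively and one with code motion, reduction in strength, and loop jamming applied by hand; the function names are hypothetical and the code is not drawn from Bench++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive form: leaves three optimizations for the compiler to find.
long naive(const std::vector<int>& v, int a, int b) {
    long sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        sum += static_cast<long>(v[i]) * (a * b);  // a * b is loop-invariant
    for (std::size_t i = 0; i < v.size(); ++i)
        sum += static_cast<long>(i) * 4;           // multiply by the index
    return sum;
}

// Hand-optimized form of the same computation.
long hand_optimized(const std::vector<int>& v, int a, int b) {
    const int ab = a * b;   // code motion: invariant hoisted out of the loop
    long sum = 0;
    long step = 0;          // reduction in strength: i * 4 becomes a running add
    for (std::size_t i = 0; i < v.size(); ++i) {   // loop jamming: one loop
        sum += static_cast<long>(v[i]) * ab;
        sum += step;
        step += 4;
    }
    return sum;
}

// A comparison is only meaningful if both variants agree on the result.
void check_equivalent(const std::vector<int>& v, int a, int b) {
    assert(naive(v, a, b) == hand_optimized(v, a, b));
}
```

If a compiler performs these transformations itself, the two functions should run at about the same speed; a large gap suggests the optimizer missed them.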


Bench++: Procedure Calls

  • procedure call: no args

  • procedure call: no args, catches exceptions

  • static class method call: no args, catches exceptions

  • inline procedure call: no args

  • static class method call: 1 int arg: catches exception

  • static class method call: 1 int *arg: catches exception

  • static class method call: 1 int &arg: catches exception

  • procedure call: no pars, called thru pointer, catch exception

  • procedure call: 10 int arg: catch exception

  • procedure call: 20 int arg: catch exception

  • procedure call: 10 (3-int) arg: catch exception

  • procedure call: 20 (3-int) arg: catch exception

  • class method call: 1 this arg: catch exception

  • virtual class method call: 1 this arg: catch exception

  • virtual const class method call: 1 this arg: catch exception

  • same as above, called in a loop to check lookup optimization


Bench++: Abstraction

  • max: C++ style

  • max: C style

  • matrix: C++ style

  • matrix: C style

  • iterator: C++ style

  • iterator: C style

  • complex: C++ style

  • complex: C style

  • Stepanov C++ Abstraction


Bench++

  • float matrix multiply vs integer

  • double covariance matrix vs float

  • float & int covariance matrix vs float

  • new/delete vs malloc/free


Bench++

  • 4 deep exception handled vs 3 deep

  • declared exception handled vs not declared

  • 4 deep rethrown exception vs 4 deep

  • 4 deep setjmp/longjmp vs 4 deep exception


Bench++

  • if test vs logical equation

  • 2-way switch vs 2-way if/else

  • 10-way switch vs 10-way if/else

  • 10-way sparse switch vs 10-way if/else

  • 10-way sparse switch vs 10-way switch

  • 10-way virtual function vs 10-way switch


Bench++

  • 20-iostream.>> vs 20 char iostream.getline & gcount

  • 20-iostream.<< vs 20 char iostream.<<

  • istrstream.>> a float from local string vs int

  • boolean operations on bit arrays vs byte arrays

  • boolean operations on bits in loop vs bit arrays

  • boolean operations on bytes in loop vs byte arrays

  • while loop vs for loop

  • simple loop w/break vs for loop


Bench++

  • Constant Propagation

  • Local Common-sub

  • Global Common-sub

  • Unnecessary copy

  • Code Motion

  • Induction Variable

  • Reduction in Strength

  • Dead Code

  • Loop Jamming

  • Redundant Code

  • Unreachable Code

Hand optimized vs compiler optimized (higher is better)


Bench++

  • Static class method call vs local procedure call

  • Inline procedure call vs inlineable local procedure call

  • Static class method call w/ 1 int* par vs int par

  • Static class method call w/ 1 int& par vs int par

  • Call thru a procedure variable vs local procedure call

  • Static class method call w/ 10 int pars vs 1 int par

  • Static class method call w/ 20 int pars vs 1 int par

  • Static class method call w/ 10 3-int pars vs 10 1-int pars

  • Static class method call w/ 20 3-int pars vs 20 1-int pars

  • Class method call w/ this par vs static class method call w/ int par

  • Virtual class method call vs class method call

  • Virtual const class method call vs class method call

  • Loop of virtual const class method call vs no loop


Bench++

  • C++ style Max vs C style

  • C++ style Matrix vs C style

  • C++ style Iterator vs C style

  • C++ style Complex vs C style


Bench++

  • Stepanov abstraction level n (12 .. 1) vs level 0


Bench++: Stepanov Abstraction

  • Level 0: Use a simple Fortran-like loop

  • Level 1,3,5,7,9,11: use doubles

  • Level 2,4,6,8,10,12: use Double - double wrapped in a class

  • Level 1,2: use regular pointers

  • Level 3,4: use pointers wrapped in a class

  • Level 5,6: use pointers wrapped in a reverse-iterator adapter

  • Level 7,8: use wrapped pointers wrapped in a reverse-iterator adapter

  • Level 9,10: use pointers wrapped in a reverse-iterator adapter wrapped in a reverse-iterator adapter

  • Level 11,12: use wrapped pointers wrapped in a reverse-iterator adapter wrapped in a reverse-iterator adapter
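The lower abstraction levels listed above can be sketched as follows. This is a reconstruction for illustration, not the Stepanov benchmark source: `accumulate_generic` and the `sum_level*` names are hypothetical, and the standard `std::reverse_iterator` stands in for the benchmark's own reverse-iterator adapter.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Level 0: a simple Fortran-like loop.
double sum_level0(const double* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// A double wrapped in a class, as used by the even-numbered levels.
struct Double {
    double value;
    Double(double v = 0.0) : value(v) {}
};
inline Double operator+(const Double& x, const Double& y) {
    return Double(x.value + y.value);
}

// Generic accumulate shared by the higher levels.
template <class Iter, class T>
T accumulate_generic(Iter first, Iter last, T init) {
    for (; first != last; ++first) init = init + *first;
    return init;
}

// Level 1: doubles through regular pointers.
double sum_level1(const double* a, std::size_t n) {
    return accumulate_generic(a, a + n, 0.0);
}

// Level 2: Double through regular pointers.
double sum_level2(const Double* a, std::size_t n) {
    return accumulate_generic(a, a + n, Double(0.0)).value;
}

// Level 5: doubles through pointers wrapped in a reverse-iterator adapter.
double sum_level5(const double* a, std::size_t n) {
    typedef std::reverse_iterator<const double*> Rev;
    return accumulate_generic(Rev(a + n), Rev(a), 0.0);
}
```

Every level computes the same sum, so any slowdown relative to `sum_level0` is the abstraction penalty: overhead the compiler failed to remove when inlining the wrappers.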


Bench++: Some Conclusions

  • There is a significant increase in speed between versions 1.0 and 1.1 of egcs.

  • vC++ handles exceptions very well; g++ and KCC do not, though KCC does better with declared exceptions

  • alloca is much faster than new/delete. Too bad you can’t use it portably.

  • switch is much faster than if/else, but virtual functions don’t have much of a penalty

  • KCC does not do well with I/O (cf. g++), but reading characters as strings (iostream.getline) improves performance

  • KCC has trouble optimizing some simple loop structures

  • KCC handles procedure calls very badly, but shows a lower overhead than g++ or vC++ when large numbers of parameters are passed

  • On the Linux platforms, the C++ compilers do not optimize such things as dead and redundant code, code motion, and local common sub-expressions.

  • Abstraction carries a serious performance penalty, but KCC tends to handle the complex class well.


Observations on VC++

  • Visual C++ 6.0 is not obviously superior. Neither is it incredibly inferior. Much like the other compilers, it does well in some areas, and poorly in others:

    • vC++ handles abstraction and procedure calls badly

    • vC++ handles exceptions well

    • vC++ does I/O well, except for opening and closing files

    • vC++ handles simple optimization well (i.e., constant propagation, common sub-expressions, redundant code, etc.)


Alternatives to Improving Compiler Optimization

  • Decrease the level of abstraction - write kernels in low level languages such as C or Fortran

  • Put C++ wrappers around low level kernels to retain some advantages of C++. Only useful for large chunks of code.

  • Use macros and templates cleverly.
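The second suggestion, a C++ wrapper around a low-level kernel, might look something like the sketch below. It assumes square row-major matrices; `matmul_kernel` and `Matrix` are hypothetical names for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Low-level kernel: plain pointers and loops, n x n row-major matrices.
// In practice this could equally be a C or Fortran routine.
void matmul_kernel(const double* a, const double* b, double* c,
                   std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k) s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
    }
}

// Thin C++ wrapper: owns the storage and gives callers operator syntax,
// while the inner loops stay in the kernel.
class Matrix {
public:
    explicit Matrix(std::size_t n) : n_(n), data_(n * n, 0.0) {}
    std::size_t size() const { return n_; }
    double& operator()(std::size_t i, std::size_t j) { return data_[i * n_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * n_ + j]; }

    friend Matrix operator*(const Matrix& a, const Matrix& b) {
        assert(a.n_ == b.n_);
        Matrix c(a.n_);
        matmul_kernel(a.data_.data(), b.data_.data(), c.data_.data(), a.n_);
        return c;
    }

private:
    std::size_t n_;
    std::vector<double> data_;
};
```

The wrapper costs one call and one result object per multiply, which is negligible for a large matrix but not for tiny operations; that is consistent with the slide's remark that this approach is only useful for large chunks of code.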

Details at:

http://annwm.lbl.gov/~leggett/bench/

