Estimating Multimedia Instruction Performance Based on Workload Characterization and Measurement

Gheewala, A.; Peir, J.-K.; Yen-Kuang Chen; Lai, K. IEEE International Workshop on Workload Characterization, pages 98-106, Nov. 2002.

Abstract

  • The increasing popularity of multimedia applications has prompted microprocessors to include media-enhancement instructions. In this paper, we describe a methodology to estimate the performance improvement of a new set of media instructions on emerging applications, based on workload characterization and measurement. Application programs are characterized into a sequential segment, a vectorizable segment, and the extra data moves needed to exploit the SIMD capability of the new media instructions.

  • Techniques based on benchmarking and measurement on existing systems are used to estimate the execution time of each segment. From the measurement results, the speedup from the new media instructions, and the cost of the additional data moves they require, can be estimated to help processor architects and designers evaluate design tradeoffs.


Outline

  • What’s the problem

  • Introduction

  • Methodology foundation and analysis

  • Proposed performance estimation methodology

  • Experimental results and evaluation

  • Conclusions


What’s the Problem?

  • Traditional performance evaluation of a new set of media instructions is a time-consuming process

    • Requires detailed processor models to handle both regular and new SIMD media instructions

    • Requires generating executable binaries that contain the new media-extension instructions to drive the simulator

  • It is essential to quickly estimate the speedup that a few additional media instructions would give an application, in order to assess the tradeoffs of new media instructions


Introduction

  • The proposed methodology

    • Based on timing measurements on existing systems

      • On which the new SIMD instructions are not available

    • Execution time of the following segments can be derived

      • Sequential segment

      • Vectorized segment

        • Code segment that can be vectorized by a set of new SIMD instructions

      • Data move segment

        • Explicit data-move code needed to use the new SIMD instructions

    • Execution time of an application with SIMD instructions can be estimated from the three segments

      • Only need existing hardware

      • No cycle-accurate simulator is required


Estimating Speedup for MMX

  • Amdahl’s law can estimate the speedup of an application

    • f is the fraction of the program's execution time that can be vectorized

    • n is the ideal speedup of the vectorizable fraction f

  • Modify Amdahl’s law to accommodate the MMX technology

    • O is the portion of the vectorizable segment that cannot be replaced by MMX instructions

      • Such as program constructs like loop controls and procedure calls

    • D is the fraction of execution time spent on the added data-move instructions

      • Explicit data-move instructions to/from MMX registers

    • m is the speedup of the data moves
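The formulas on this slide did not survive extraction. The standard form of Amdahl's law is S = 1 / ((1 - f) + f/n). For the modified version, the sketch below assumes the same four time components that the projection slides list, (1 - f) + O + (f - O)/n + D, and additionally assumes that the data-move term is divided by m. Where m enters is our reconstruction, not necessarily the paper's exact equation, and the function names are hypothetical.

```python
def amdahl_speedup(f: float, n: float) -> float:
    """Standard Amdahl's law: a fraction f of the run time is sped up by n."""
    if not (0.0 <= f <= 1.0) or n <= 0:
        raise ValueError("need 0 <= f <= 1 and n > 0")
    return 1.0 / ((1.0 - f) + f / n)


def modified_speedup(f: float, n: float, o: float, d: float, m: float = 1.0) -> float:
    """Assumed MMX-aware variant (a reconstruction, not the paper's exact formula).

    All terms are fractions of the original C-code run time (normalized to 1):
      f: vectorizable segment
      o: part of f that cannot become MMX code (loop control, procedure calls)
      d: added explicit data moves to/from MMX registers
      m: assumed speedup of the data moves; m = 1 leaves d unchanged
    """
    if not (0.0 <= o <= f <= 1.0) or n <= 0 or m <= 0 or d < 0:
        raise ValueError("invalid parameters")
    return 1.0 / ((1.0 - f) + o + (f - o) / n + d / m)
```

With O = 0 and D = 0 the variant reduces to plain Amdahl's law; any nonzero O or D lowers the speedup below that bound.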


SIMD with Data Rearrangement

  • Data Arrangement in Registers for Matrix Multiplication

  • Packed Multiply-and-Add (PMADDWD)

    • Performs four 16-bit multiplications and two 32-bit additions

  • Packed-Add (PADDD)

    • Performs two 32-bit additions

[Figure: data arrangement in registers for matrix multiplication, with 16-bit source elements and 32-bit results]
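To make the two instruction descriptions above concrete, here is a small functional model (not cycle-accurate). It assumes each 64-bit MMX register is represented as a Python list with element 0 in the lowest-order position, and it wraps 32-bit results the way packed integer adds do. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_int32(x: int) -> int:
    """Wrap an integer to a signed 32-bit value (two's complement)."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def pmaddwd(a: list[int], b: list[int]) -> list[int]:
    """Model of PMADDWD on one register pair.

    a, b: four signed 16-bit words each (index 0 = lowest word).
    Returns two signed 32-bit dwords: a0*b0 + a1*b1 and a2*b2 + a3*b3.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected four 16-bit words per register")
    if not all(-32768 <= v <= 32767 for v in a + b):
        raise ValueError("words must fit in signed 16 bits")
    products = [x * y for x, y in zip(a, b)]        # four 16x16-bit multiplies
    return [to_int32(products[0] + products[1]),    # two 32-bit additions
            to_int32(products[2] + products[3])]


def paddd(a: list[int], b: list[int]) -> list[int]:
    """Model of PADDD: element-wise wraparound add of two packed 32-bit dwords."""
    if len(a) != 2 or len(b) != 2:
        raise ValueError("expected two 32-bit dwords per register")
    return [to_int32(x + y) for x, y in zip(a, b)]
```

For a 4-element row-column product, `pmaddwd([1, 2, 3, 4], [5, 6, 7, 8])` returns the two partial sums `[17, 53]`; turning them into a single dot product still needs one more add, which is where the register data arrangement matters.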


SIMD with Data Rearrangement (cont.)

  • Another Way of Data Arrangement in Registers

    • More natural data arrangement

    • Introduces a new PADDD variant to accomplish this

      • Adds the high-order and low-order 32-bit halves of each of the two source registers

[Figure: alternative data arrangement in registers, with 16-bit source elements and 32-bit results]
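A sketch of how the proposed PADDD variant could be used. The slide does not give the result layout, so this assumes the sum of `src1`'s halves lands in the low dword and the sum of `src2`'s halves in the high dword. With that layout, the PMADDWD results for two matrix rows kept in natural order combine into both outputs of a 2×4 matrix-vector product with one instruction. `paddd_horizontal` is a hypothetical name.

```python
def to_int32(x: int) -> int:
    """Wrap an integer to a signed 32-bit value (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def pmaddwd(a: list[int], b: list[int]) -> list[int]:
    """Model of PMADDWD: four 16-bit multiplies, then two 32-bit pair sums."""
    p = [x * y for x, y in zip(a, b)]
    return [to_int32(p[0] + p[1]), to_int32(p[2] + p[3])]


def paddd_horizontal(src1: list[int], src2: list[int]) -> list[int]:
    """Hypothetical PADDD variant: adds the low and high 32-bit halves of each
    source register. Assumed result layout: [sum of src1, sum of src2]."""
    return [to_int32(src1[0] + src1[1]), to_int32(src2[0] + src2[1])]


# 2x4 matrix times a 4-vector, with each matrix row kept in its natural order
row0, row1 = [1, 2, 3, 4], [5, 6, 7, 8]
x = [1, 1, 2, 2]
y = paddd_horizontal(pmaddwd(row0, x), pmaddwd(row1, x))
```

Here `y` holds the two dot products row0·x and row1·x in one register, without first rearranging the matrix elements across registers.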


Workload Characterization and Measurement

  • Four types of code

    • Equivalent C-code (executable on existing system)

      • Application program written in C

    • MMX-code (not executable on existing system)

      • Written with the new SIMD and data-move instructions

    • Pseudo MMX-code (executable on existing system)

      • Replaces the new SIMD instructions with equivalent MMX-like C code

      • Keeps all the data moves of the MMX-code

    • Cripple code (executable on existing system)

      • Removes the new SIMD instructions from the MMX-code without replacement

  • Important assumption

    • Four SIMD computation instructions are assumed to be new to the current MMX ISA

      • PMADDWD, PADDD, PSUBD, PSRAD
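PSUBD and PSRAD, two of the four instructions listed above, behave as follows on packed signed 32-bit dwords. This is a functional sketch of the kind of "equivalent C instructions" a pseudo MMX-code would substitute; the list representation and function names are assumptions for illustration. As on MMX, a PSRAD shift count above 31 fills each element with its sign bit.

```python
def to_int32(x: int) -> int:
    """Wrap an integer to a signed 32-bit value (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def psubd(a: list[int], b: list[int]) -> list[int]:
    """Model of PSUBD: element-wise wraparound subtraction a - b of packed dwords."""
    return [to_int32(x - y) for x, y in zip(a, b)]


def psrad(a: list[int], count: int) -> list[int]:
    """Model of PSRAD on packed dwords: arithmetic (sign-preserving) right shift.

    Counts above 31 act like 31, leaving each element 0 or -1.
    Python's >> on negative ints already rounds toward minus infinity,
    which matches an arithmetic shift.
    """
    count = min(count, 31)
    return [to_int32(x) >> count for x in a]
```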


Workload Characterization and Measurement (cont.)

Pseudo MMX-code keeps all the data-move instructions of the original MMX-code

It replaces the corresponding new SIMD instructions with equivalent C instructions

Figure: Portion of the MMX-code and its equivalent pseudo MMX-code from IDCT
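The IDCT code in the figure is not recoverable, so here is a hypothetical stand-in: the PMADDWD step of a dot product written as three of the four code types. Python lists stand in for memory and for 64-bit MMX registers, and the kernel and function names are invented for illustration. Timing these in Python would mostly measure interpreter overhead; the methodology times compiled versions on real hardware. The sketch only shows what each version keeps and drops.

```python
# Hypothetical kernel; lists model memory and MMX registers (mm0, mm1).

def c_code(a: list[int], b: list[int], out: list[int]) -> None:
    """Equivalent C-code: scalar loop, no MMX data moves."""
    for k in range(2):
        out[k] = a[2 * k] * b[2 * k] + a[2 * k + 1] * b[2 * k + 1]


def pseudo_mmx_code(a: list[int], b: list[int], out: list[int]) -> None:
    """Pseudo MMX-code: keeps the MMX-code's data moves and replaces the new
    SIMD instruction (PMADDWD) with equivalent scalar code."""
    mm0 = list(a[0:4])                            # data move: memory -> register
    mm1 = list(b[0:4])                            # data move: memory -> register
    mm0 = [mm0[0] * mm1[0] + mm0[1] * mm1[1],     # stands in for PMADDWD mm0, mm1
           mm0[2] * mm1[2] + mm0[3] * mm1[3]]
    out[0:2] = mm0                                # data move: register -> memory


def cripple_code(a: list[int], b: list[int], out: list[int]) -> None:
    """Cripple code: same data moves, PMADDWD removed without replacement.
    Its output is meaningless; only its run time is used."""
    mm0 = list(a[0:4])                            # data move kept
    mm1 = list(b[0:4])                            # data move kept (now unused)
    # PMADDWD mm0, mm1 removed here
    out[0:2] = mm0[0:2]                           # data move kept
```

The C-code and pseudo MMX-code produce the same result; the pseudo version differs only by the added data moves, and the cripple version differs from the pseudo version only by the missing computation. Those two differences are what the timing derivation in this deck isolates.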


Timing Components of Four Types of Code

Sequential segment (1-f)

Vectorizable portion of the C-code (f-O): the main target for improvement with new SIMD instructions

Unvectorizable portion (O)

Data-move segment (D)

Execution time for each individual component can be derived, except for the new SIMD instructions themselves


Performance Projection and Verification

  • Individual Timing Components Derivation

    • Data-move segment (D)

      • Difference in execution time between the pseudo MMX-code and the equivalent C-code

    • Vectorizable portion of the C-code (f-O)

      • Difference in execution time between the pseudo MMX-code and the Cripple code

    • Unvectorizable portion (O)

      • Difference between the original vectorizable segment (f) and the vectorizable portion of the C-code (f-O)

  • Estimated total execution time is the sum of the following components; the speedup is the original C-code time divided by this total

    • Sequential segment execution time (1-f)

    • Unvectorizable portion execution time (O)

    • Execution time spent on new SIMD instructions (f-O) / n

    • Data-move segment execution time (D)
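The derivation above reduces to a few subtractions. This sketch assumes one measurement the slides do not spell out: the time of the vectorizable segment (f) within the C-code, called `t_vec` here. The class, field names, and example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurements:
    """Run times measured on an existing system (any consistent unit)."""
    t_c: float        # equivalent C-code
    t_pseudo: float   # pseudo MMX-code
    t_cripple: float  # cripple code
    t_vec: float      # vectorizable segment f, timed inside the C-code (assumed)


def project(m: Measurements, n: float) -> tuple[float, float]:
    """Return (estimated run time with the new SIMD instructions, estimated speedup)."""
    d = m.t_pseudo - m.t_c                 # data-move segment D
    f_minus_o = m.t_pseudo - m.t_cripple   # vectorizable portion of the C-code (f - O)
    o = m.t_vec - f_minus_o                # unvectorizable portion O
    seq = m.t_c - m.t_vec                  # sequential segment (1 - f)
    if min(d, f_minus_o, o, seq) < 0:
        raise ValueError("measurements are inconsistent with the time model")
    t_new = seq + o + f_minus_o / n + d
    return t_new, m.t_c / t_new
```

For example, `project(Measurements(t_c=100.0, t_pseudo=110.0, t_cripple=40.0, t_vec=80.0), n=8.0)` derives D = 10, (f-O) = 70, O = 10 and (1-f) = 20, giving a projected time of 48.75 and a speedup of about 2.05.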


Performance Projection and Verification (cont.)

  • Steps for estimating the speedup factor (n) of the new SIMD instructions

    • Step 1: Examine the equivalent assembly code for each new SIMD instruction

[Figure: PMADDWD expressed as equivalent assembly code, including explicit data-move instructions]


Performance Projection and Verification (cont.)

  • Step 2: Estimate the execution latency of the equivalent assembly code

    • The execution latency of each assembly instruction is taken from the processor's architecture manual

    • From these latencies, obtain the estimated speedup factor (n)

  • Step 3: Repeat the above steps for each new SIMD instruction

    • Obtain the respective speedup of each new SIMD instruction

  • Step 4: Calculate the weighted average speedup

    • Weighted by the number of occurrences of each new SIMD instruction in the application

Thus, the time spent on all the new SIMD instructions can be estimated as (f-O) / n
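Steps 1 to 4 can be sketched as follows, assuming each instruction's speedup factor is the total latency of its equivalent assembly sequence divided by the latency assumed for the new instruction. The latencies and occurrence counts below are made up for illustration, not taken from the paper.

```python
def speedup_factor(equiv_latencies: list[float], new_latency: float) -> float:
    """Steps 1-2: n for one new SIMD instruction = total latency of its
    equivalent assembly sequence / latency assumed for the new instruction."""
    return sum(equiv_latencies) / new_latency


def weighted_average_speedup(factors: dict[str, float], counts: dict[str, int]) -> float:
    """Step 4: average the per-instruction n values, weighted by how often
    each new SIMD instruction occurs in the application."""
    total = sum(counts[name] for name in factors)
    return sum(factors[name] * counts[name] for name in factors) / total


# Hypothetical latencies (cycles) and occurrence counts
factors = {
    "PMADDWD": speedup_factor([1.0] * 12, 3.0),  # 12 one-cycle ops vs. 3 cycles
    "PADDD": speedup_factor([1.0] * 6, 1.0),     # 6 one-cycle ops vs. 1 cycle
}
counts = {"PMADDWD": 30, "PADDD": 10}
n = weighted_average_speedup(factors, counts)
```

Weighting by occurrence count follows the slide. If the instructions differ widely in latency, weighting by the time each one accounts for may be a closer fit to the (f-O)/n term; treat that as a possible refinement, not the paper's method.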


IDCT Case Study Results

  • Estimated Speedup Factor (n) for New SIMD Instructions

$n = \frac{T_{\text{equivalent C code}}}{T_{\text{new SIMD computation instructions}}} = 8.09$


IDCT Case Study Results (cont.)

  • IDCT Performance Measurement and Projection

[Figure: IDCT execution time broken down as Sequential + Unvectorizable + Data moves + New MMX]


Experimental Results and Evaluation

  • Overall speedup is close to 1.5 when the new SIMD instructions give a 2-times performance improvement

  • Overall speedup exceeds 2.5 when the new SIMD instructions give a 10-times improvement

[Figure: overall speedup and execution time]
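The shape of these results follows from the four-term time model: as n grows, the (f-O)/n term shrinks and the speedup approaches the limit set by the sequential, unvectorizable, and data-move terms. The sweep below uses made-up component fractions, not the paper's IDCT measurements, so its numbers will not match the figures quoted above.

```python
def overall_speedup(seq: float, o: float, f_minus_o: float, d: float, n: float) -> float:
    """Original time = seq + o + f_minus_o; the projected time adds the data
    moves d and divides the vectorizable portion (f - O) by n."""
    original = seq + o + f_minus_o
    return original / (seq + o + f_minus_o / n + d)


# Hypothetical fractions of the original run time (not the paper's IDCT data)
parts = dict(seq=0.2, o=0.1, f_minus_o=0.7, d=0.1)
for n in (1, 2, 4, 8, 10):
    print(f"n = {n:2d}: overall speedup {overall_speedup(n=n, **parts):.2f}")
```

At n = 1 the speedup falls below 1, because the data moves are pure overhead until the new instructions are faster than their C equivalents.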


Experimental Results and Evaluation (cont.)

  • Overall speedup drops from 2.9 to 2.7 with 30% more data-move overhead

  • Overall speedup rises from 2.9 to 3.1 if the data-move overhead can be reduced by 30%

[Figure: overall speedup and execution time]
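Data-move overhead enters the same time model as an additive term, so scaling D up or down by 30% moves the projected speedup in the opposite direction. A minimal sketch, again with hypothetical fractions and n rather than the measured IDCT values:

```python
def overall_speedup(seq: float, o: float, f_minus_o: float, d: float, n: float) -> float:
    """Original time = seq + o + f_minus_o; projected time adds d and divides f_minus_o by n."""
    return (seq + o + f_minus_o) / (seq + o + f_minus_o / n + d)


def data_move_sensitivity(seq: float, o: float, f_minus_o: float, d: float, n: float,
                          scales: tuple[float, ...] = (0.7, 1.0, 1.3)) -> dict[float, float]:
    """Overall speedup when the data-move overhead D is scaled by each factor."""
    return {s: overall_speedup(seq, o, f_minus_o, d * s, n) for s in scales}


# Hypothetical component fractions and n (not the paper's measured values)
result = data_move_sensitivity(seq=0.2, o=0.1, f_minus_o=0.7, d=0.1, n=10.0)
for scale, speedup in result.items():
    print(f"D x {scale:.1f}: overall speedup {speedup:.2f}")
```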


Conclusions

  • Presents a performance estimation method for using new media instructions

    • Based on characterizing media workloads through benchmarking and measurement on existing systems

    • No cycle-accurate simulator is required

  • Given a range of performance improvements for the new media instructions, the proposed method can estimate the corresponding range of overall speedup

