Non-Intrusive Dynamic Application Profiling for Detailed Loop Execution Characterization

Ajay Nair, Roman Lysecky

Department of Electrical and Computer Engineering

University of Arizona

Tucson, AZ USA

{ajaynair, [email protected]


Introduction: Application Profiling

  • Application profiling is useful for many purposes

    • Often used to identify frequently executed code regions

      • Allows a designer to focus optimization effort on those regions

      • Maps frequently executed code and data regions to non-interfering cache regions

      • Used within binary translation approaches to decide which translation results to store

        • E.g., x86 binary translation in the Transmeta Crusoe

    • Can be used to create optimized SW or HW implementations that are selected at runtime

    • And many others

Introduction: Application Profiling – HW/SW Partitioning

Hardware/Software Partitioning

Profiling is a critical step within hardware/software partitioning

Often used to determine critical software regions

Frequently executed loops or functions

Critical kernels can be re-implemented in hardware

Speedup of 2X to 10X

Speedup of 1000X possible

Energy reduction of 25% to 95%

Figure: HW/SW partitioning flow. A software application (C/C++) passes through application profiling to identify critical kernels; partitioning assigns code to SW (µP with I$ and D$) or HW (coprocessor, ASIC/FPGA).

Introduction: Application Profiling – Warp Processing Overview

1. Application initially executes on the microprocessor

2. Profiler dynamically detects the application's kernels

3. On-chip CAD maps the kernels onto the FPGA

4. Configure the FPGA and update the application binary

5. Warped execution is 2-100X faster, or consumes 75% less power

Figure: Warp processor architecture with the µP (I$, D$), profiler, on-chip CAD, and W-FPGA

Introduction: Application Profiling – Warp Processing

Warp Processing – Dynamic Hardware/Software Partitioning

Dynamically re-implements critical kernels as HW within the W-FPGA

Requires non-intrusive profiling to determine critical kernels at runtime

Incorporated the Frequent Loop Detection Profiler [Gordon-Ross, Vahid – TC 2005]

Monitors short backwards branches

Maintains a small list of branch execution frequencies

May lead to sub-optimal partitioning, as it does not provide detailed loop execution statistics
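To see why a per-branch frequency list cannot separate loop executions from loop iterations, here is a minimal Python sketch in the spirit of a frequent loop detection profiler. It is a simplification, not the Gordon-Ross/Vahid design: it assumes one sbb event per loop iteration, and the table size (`capacity`) and the evict-the-smallest-count policy are hypothetical.

```python
from typing import Dict, Iterable


def fld_profile(sbb_trace: Iterable[int], capacity: int = 4) -> Dict[int, int]:
    """Count executed short backwards branches per branch address.

    Simplified sketch: a small table of (sbb address -> count). On a miss with a
    full table, the entry with the smallest count is evicted (hypothetical policy).
    """
    counts: Dict[int, int] = {}
    for addr in sbb_trace:
        if addr in counts:
            counts[addr] += 1
        else:
            if len(counts) >= capacity:
                victim = min(counts, key=lambda a: counts[a])
                del counts[victim]
            counts[addr] = 1
    return counts


# Loop 0x100: one execution of 100 iterations.
# Loop 0x200: 50 executions of 2 iterations each.
trace = [0x100] * 100 + [0x200, 0x200] * 50
print(fld_profile(trace))  # both loops report 100 sbb events
```

Both loops look identical in this profile, even though one is entered once and the other fifty times, which is exactly the information a partitioner needs when each entry into a HW kernel has a cost.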

Introduction: Application Profiling – HW/SW Partitioning

Loop iteration count alone may not provide sufficient information for accurate performance estimation

Example

Assume we want to partition only one of two loops, kernel A or kernel B, to HW

With profile data from the Frequent Loop Detection Profiler, kernel B appears to be the better candidate

Introduction: Application Profiling – Warp Processing

However, communication requirements can significantly impact overall performance

Kernel A may in fact be the better choice
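A small cost model makes this concrete. The sketch below is hypothetical throughout: the per-iteration cycle counts, the fixed communication cost charged once per loop execution, and the kernel statistics are illustrative numbers, not results from this work.

```python
from dataclasses import dataclass


@dataclass
class LoopProfile:
    name: str
    executions: int        # times the loop was entered
    avg_iterations: float  # average iterations per execution

    @property
    def total_iterations(self) -> float:
        return self.executions * self.avg_iterations


def cycles_saved(loop: LoopProfile, sw_cycles_per_iter: int,
                 hw_cycles_per_iter: int, comm_cycles_per_exec: int) -> float:
    """Cycles saved by moving the loop to HW (hypothetical cost model)."""
    sw = loop.total_iterations * sw_cycles_per_iter
    # HW pays its compute cost plus a communication cost every time the loop is entered.
    hw = loop.total_iterations * hw_cycles_per_iter + loop.executions * comm_cycles_per_exec
    return sw - hw


kernel_a = LoopProfile("A", executions=1, avg_iterations=10_000)
kernel_b = LoopProfile("B", executions=1_000, avg_iterations=12)

for k in (kernel_a, kernel_b):
    print(k.name, k.total_iterations, cycles_saved(k, 20, 2, 200))
```

Kernel B has more total iterations (12,000 versus 10,000), which is all an iteration counter sees, but each of its 1,000 executions pays the communication cost, so under these assumed numbers kernel A saves far more cycles. Separating executions from average iterations is what exposes the difference.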

Introduction: Application Profiling – Goal: Non-Intrusive Profiling

Non-intrusive Application Profiling

Goal: Profile the application at runtime to determine detailed loop execution statistics with no impact on application execution

Many applications cannot tolerate profiling overhead at runtime

E.g., real-time and embedded systems

Overhead may lead to missed deadlines and potentially system failure

Introduction: Application Profiling – Existing Profiling Methods

Software-Based Profiling

Instrumentation – insert code directly within the software

E.g., monitor branches, basic blocks, functions, etc.

Intrusive: Increases code size and introduces runtime overhead

Statistical Sampling

Periodically interrupt the processor – or execute an additional software task – to sample the program counter

Statistically determine the application profile

Very good accuracy with lower overhead than instrumentation

Intrusive: Introduces runtime overhead
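For comparison, statistical sampling can be sketched in a few lines of Python. Here a list of program counter values stands in for the running program (one entry per cycle), and `period` is a hypothetical sampling interval; on a real system each sample is an interrupt or an extra software task, which is exactly the runtime overhead noted above.

```python
from collections import Counter
from typing import Dict, Sequence


def sampled_profile(pc_trace: Sequence[int], period: int) -> Dict[int, float]:
    """Estimate the fraction of time spent at each PC by sampling every `period` cycles."""
    samples = Counter(pc_trace[period - 1::period])  # one PC sample per period
    total = sum(samples.values())
    return {pc: n / total for pc, n in samples.items()}


# 750 cycles spent at 0x10, then 250 cycles at 0x20.
trace = [0x10] * 750 + [0x20] * 250
print(sampled_profile(trace, period=10))  # {16: 0.75, 32: 0.25}
```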

Introduction: Application Profiling – Existing Profiling Methods

Hardware-Based Profiling

Processor Support – Event Counters

Many processors include event counters that can be used to profile an application

Intrusive: Requires additional software to read and process the event counters

JTAG – Joint Test Action Group

Standard interface for reading registers within hardware devices

Intrusive: Requires the processor to be halted to read the values

Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling

Dynamic Application Profiler (DAProf)

Non-intrusively monitors both loop executions and iterations

Monitors the processor's instruction bus and branch execution behavior to build the application profile

Requires a short backwards branch (sbb) signal from the microprocessor
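A short backwards branch is a branch whose target lies a small distance behind it, which is how compiled loops typically close. The sketch below is an assumption about how such a signal could be derived, not the processor's actual logic: it treats a branch as an sbb when its offset, counted in instructions, is negative and small enough for the profile cache's 8-bit Offset field (loops of fewer than 256 instructions). The `(address, offset)` event format is hypothetical, and this sketch stores the loop size as the magnitude of the negative offset.

```python
from typing import Iterable, List, Tuple

MAX_LOOP_SIZE = 256  # 8-bit Offset field: loop sizes 1..255 instructions


def is_sbb(offset: int) -> bool:
    """True if a branch jumps backwards by fewer than MAX_LOOP_SIZE instructions."""
    return -MAX_LOOP_SIZE < offset < 0


def sbb_events(branches: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Map executed (iAddr, branch offset) pairs to (iAddr, iOffset) pairs for the profiler."""
    return [(addr, -off) for addr, off in branches if is_sbb(off)]


branches = [(0x40, -6), (0x80, 12), (0x200, -300), (0x44, -1)]
print(sbb_events(branches))  # [(64, 6), (68, 1)]
```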

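Figure: DAProf block diagram. The µP supplies the sbb signal and instruction address (iAddr) to DAProf; the system also includes the I$, D$, and an FPGA/ASIC. Within DAProf, the Profiler FIFO passes iAddr and iOffset to the Profiler Controller, which queries the Profile Cache and receives found, foundIndex, and replaceIndex. Each profile cache entry contains Tag (30 bits), Offset (8), CurrIter (10), AvgIter (13), Execs (16), InLoop (1), and Freshness (3).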

Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling

Profiler FIFO

Small FIFO that stores the instruction address (iAddr) and instruction offset (iOffset) of all executed sbbs

Synchronizes between the processor's execution frequency and the profiler's slower internal frequency
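The FIFO's role as a rate decoupler can be illustrated with a small cycle-level simulation. This sketch rests on assumed parameters: the processor produces at most one sbb per cycle, the profiler consumes at most one entry every `ratio` processor cycles, and the FIFO is left unbounded so that its peak occupancy can be measured. The real FIFO depth and overflow behavior are not specified here.

```python
from collections import deque
from typing import Sequence


def peak_fifo_occupancy(sbb_per_cycle: Sequence[bool], ratio: int) -> int:
    """Return the largest number of entries the FIFO holds at once.

    sbb_per_cycle[i] is True if an sbb executed in processor cycle i.
    The profiler runs `ratio` times slower and pops one entry per profiler cycle.
    """
    fifo: deque = deque()
    peak = 0
    for cycle, sbb in enumerate(sbb_per_cycle):
        if sbb:
            fifo.append(cycle)  # stands in for the (iAddr, iOffset) pair
        peak = max(peak, len(fifo))
        if cycle % ratio == ratio - 1 and fifo:  # end of a profiler clock period
            fifo.popleft()
    return peak


steady = [True, False, False, False] * 8  # one sbb every 4 cycles
burst = [True] * 4 + [False] * 8          # four back-to-back sbbs
print(peak_fifo_occupancy(steady, ratio=2), peak_fifo_occupancy(burst, ratio=2))  # 1 3
```

As long as the average sbb rate stays below the profiler's service rate, the FIFO only has to absorb bursts, which is why a small FIFO can suffice.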

Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling

Profile Cache

Tag: Address of the short backwards branch

Offset: Negative branch offset

Corresponds to the size of the loop

Currently supports loops with fewer than 256 instructions
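The field widths in the profile cache figure add up to an 81-bit entry. The sketch below packs and unpacks one entry as a Python integer. The field order (Tag in the low bits) is an arbitrary choice for illustration, and deriving the 30-bit Tag by dropping the two low address bits is an assumption that instruction addresses are 32-bit and word-aligned.

```python
from typing import Dict

# (field name, width in bits); widths taken from the profile cache figure
FIELDS = [("tag", 30), ("offset", 8), ("curr_iter", 10), ("avg_iter", 13),
          ("execs", 16), ("in_loop", 1), ("freshness", 3)]
ENTRY_BITS = sum(width for _, width in FIELDS)  # 81


def tag_of(iaddr: int) -> int:
    """Assumption: 32-bit word-aligned addresses, so the low 2 bits are always zero."""
    return (iaddr >> 2) & ((1 << 30) - 1)


def pack_entry(**values: int) -> int:
    """Pack named field values into one integer; missing fields default to 0."""
    word, shift = 0, 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack_entry(word: int) -> Dict[str, int]:
    fields, shift = {}, 0
    for name, width in FIELDS:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields


entry = pack_entry(tag=tag_of(0x1F04), offset=12, curr_iter=3, in_loop=1, freshness=7)
print(ENTRY_BITS, unpack_entry(entry)["tag"] == 0x7C1)  # 81 True
```

The range check on the 8-bit Offset field is what enforces the fewer-than-256-instructions limit: `pack_entry(offset=256)` raises `ValueError`.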

Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling

Profile Cache

CurrIter: Number of iterations for the current loop execution

AvgIter: Average iterations per execution of the loop

13-bit fixed-point representation with 10 integer bits and 3 fractional bits
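The 10.3 format stores AvgIter in units of 1/8 of an iteration, covering 0 to 1023.875, which matches the 10-bit range of CurrIter. A minimal encode/decode sketch, assuming an unsigned representation with round-to-nearest and saturation at the largest code (the hardware's actual rounding behavior is not stated):

```python
FRAC_BITS = 3                                 # 3 fractional bits -> resolution 1/8
INT_BITS = 10                                 # 10 integer bits
MAX_CODE = (1 << (INT_BITS + FRAC_BITS)) - 1  # 8191, the largest 13-bit value


def to_fixed(x: float) -> int:
    """Encode a non-negative average as a 13-bit unsigned 10.3 fixed-point code."""
    code = round(x * (1 << FRAC_BITS))
    return max(0, min(code, MAX_CODE))  # saturate instead of wrapping


def from_fixed(code: int) -> float:
    return code / (1 << FRAC_BITS)


print(to_fixed(12.5), from_fixed(to_fixed(12.5)))  # 100 12.5
print(from_fixed(to_fixed(3.3)))                   # 3.25 (nearest multiple of 1/8)
print(from_fixed(to_fixed(5000.0)))                # 1023.875 (saturated)
```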

Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling

Profile Cache

InLoop: Flag indicating the loop is currently executing

Used to distinguish between loop iterations and loop executions

Freshness: Indicates how recently a loop has been executed

Used to ensure newly identified loops are not immediately replaced in the profile cache

Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling

Profile Cache Outputs

found: Indicates whether the current loop (identified by iAddr) is found within the profile cache

foundIndex: Location of the loop within the profile cache, if found

replaceIndex: Entry that will be replaced upon a new loop execution

The loop not identified as fresh with the fewest total iterations
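The replacement rule can be sketched as a selection over the cache entries. Two details are assumptions, since the slides do not specify them: a loop counts as fresh while its Freshness value is nonzero, and total iterations are estimated as Execs × AvgIter. The fallback when every entry is fresh (pick the fewest total iterations anyway) is also hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    tag: int
    execs: int
    avg_iter: float
    freshness: int  # assumption: 0 means "not fresh"

    @property
    def total_iterations(self) -> float:
        return self.execs * self.avg_iter


def replace_index(entries: List[Optional[Entry]]) -> int:
    """Choose the cache slot to overwrite when a new loop is encountered."""
    for i, e in enumerate(entries):
        if e is None:  # an empty slot is always used first
            return i
    stale = [i for i, e in enumerate(entries) if e.freshness == 0]
    candidates = stale or list(range(len(entries)))  # hypothetical fallback: all fresh
    return min(candidates, key=lambda i: entries[i].total_iterations)


cache = [Entry(0x100, 10, 5.0, 0),  # 50 total iterations, not fresh
         Entry(0x200, 1, 2.0, 3),   # 2 total iterations, but fresh: protected
         Entry(0x300, 4, 3.0, 0)]   # 12 total iterations, not fresh
print(replace_index(cache))  # 2
```

The fresh entry has the fewest total iterations but is skipped, which is the point of Freshness: a loop that has just been discovered has not yet had time to accumulate iterations.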

Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling

Profiler Controller

If loop is found within cache

If InLoop flag is set

New iteration

Increment current iterations

Otherwise

New execution

Increment executions

Set current iterations to 1

Set InLoop flag

Update Freshness

Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling

Profiler Controller

If loop is not found within cache

Replace profile cache entry

Initialize executions and current iterations to 1

Set InLoop flag

Update Freshness

Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling

Profiler Controller

If the current sbb (iAddr) is detected outside a loop stored within the profile cache

And that loop's InLoop flag is set

Reset InLoop flag

Update average iterations

Ratio-based average iteration calculation

Simple hardware requirements

Good accuracy for the applications considered
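Putting the three controller cases together, the following Python model processes one sbb event at a time. It is a behavioral sketch, not the Verilog design, and several details are hypothetical: addresses are instruction (word) addresses, so a loop spans [Tag - Offset, Tag]; the cache is fully associative with `num_entries` slots; Freshness is set to 7 on access and every entry ages by one per sbb. Because the ratio-based averaging hardware is not detailed here, the sketch substitutes an exact running mean, which needs a divider that the real design avoids.

```python
from dataclasses import dataclass
from typing import List, Optional

FRESH_MAX = 7  # largest value of the 3-bit Freshness field


@dataclass
class Entry:
    tag: int                  # address of the loop's sbb
    offset: int               # loop size in instructions
    execs: int = 1
    curr_iter: int = 1
    avg_iter: float = 0.0
    in_loop: bool = True
    freshness: int = FRESH_MAX

    def contains(self, iaddr: int) -> bool:
        return self.tag - self.offset <= iaddr <= self.tag


class DAProfModel:
    def __init__(self, num_entries: int = 4) -> None:
        self.entries: List[Optional[Entry]] = [None] * num_entries

    def lookup(self, tag: int) -> Optional[Entry]:
        return next((e for e in self.entries if e and e.tag == tag), None)

    def _replace_index(self) -> int:
        for i, e in enumerate(self.entries):
            if e is None:
                return i
        stale = [i for i, e in enumerate(self.entries) if e.freshness == 0]
        pool = stale or list(range(len(self.entries)))
        return min(pool, key=lambda i: self.entries[i].execs * self.entries[i].avg_iter)

    def on_sbb(self, iaddr: int, ioffset: int) -> None:
        for e in self.entries:
            # Exit detection: an sbb outside a loop whose InLoop flag is set ends that execution.
            if e and e.in_loop and not e.contains(iaddr):
                e.in_loop = False
                # Stand-in for the ratio-based update: exact running mean.
                e.avg_iter += (e.curr_iter - e.avg_iter) / e.execs
            if e and e.freshness > 0:
                e.freshness -= 1  # age every entry; the accessed one is refreshed below
        hit = self.lookup(iaddr)
        if hit is None:              # not found: replace an entry, execs = curr_iter = 1
            hit = Entry(tag=iaddr, offset=ioffset)
            self.entries[self._replace_index()] = hit
        elif hit.in_loop:            # found, InLoop set: new iteration
            hit.curr_iter += 1
        else:                        # found, InLoop clear: new execution
            hit.execs += 1
            hit.curr_iter = 1
            hit.in_loop = True
        hit.freshness = FRESH_MAX


model = DAProfModel()
trace = ([(100, 10)] * 3 + [(200, 5)] * 2 +   # loop A: 3 iterations; loop B: 2
         [(100, 10)] * 5 + [(200, 5)])        # loop A: 5 iterations; loop B starts again
for iaddr, ioffset in trace:
    model.on_sbb(iaddr, ioffset)
a, b = model.lookup(100), model.lookup(200)
print(a.execs, a.avg_iter, b.execs, b.avg_iter)  # 2 4.0 2 2.0
```

Note that an execution is only closed, and its iteration count folded into the average, when a later sbb appears outside the loop; loop B's second execution is still open at the end of the trace. Nested loops work because an inner loop's sbb lies inside the outer loop's address range.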

Dynamic Application Profiler (DAProf): Hardware Implementation

  • DAProf Hardware

    • Implemented fully associative, 16-way set-associative, and 8-way set-associative profiler designs in Verilog

    • Synthesized using Synopsys Design Compiler targeting UMC 0.18 µm technology

Dynamic Application Profiler (DAProf): Profiling Accuracy

  • DAProf Profiling Accuracy

    • Compared profiling accuracy for the top ten loops of several MiBench applications against detailed simulation-based profiling

    • Results presented for the 8-way DAProf design

      • All three associativities performed similarly well

Results: 90% accuracy for average iterations, 97% accuracy for executions, and 95% accuracy for % execution time
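The percentage of execution time attributed to each loop is an estimate derived from the loop statistics. One plausible formulation, assumed here rather than taken from this work, weights each loop by Execs × AvgIter × loop size (assuming every iteration runs the whole loop body at one instruction per cycle) and divides by the application's total executed instruction count. The sketch then ranks the top k loops, mirroring the top-ten comparison above.

```python
from typing import Dict, List, Tuple

# tag -> (executions, average iterations, loop size in instructions)
Profile = Dict[int, Tuple[int, float, int]]


def top_loops(profile: Profile, total_instructions: int, k: int = 10) -> List[Tuple[int, float]]:
    """Rank loops by their estimated share (%) of total execution time."""
    shares = {tag: 100.0 * execs * avg * size / total_instructions
              for tag, (execs, avg, size) in profile.items()}
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:k]


profile = {100: (2, 4.0, 10), 200: (2, 2.0, 5)}
print(top_loops(profile, total_instructions=400))  # [(100, 20.0), (200, 5.0)]
```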

Dynamic Application Profiler (DAProf): Profiling Accuracy – Function Call Interference

  • DAProf Profiling Accuracy

    • Some applications are affected by function call interference

      • Loops executing within functions called from inside a loop may cause the calling loop's InLoop flag to be incorrectly reset

      • Average iterations will then be incorrectly updated

Figure: Function call interference

Current Work – Dynamic Application Profiler Function Call Support

Extended DAProf with Function Call Support

Monitors function calls and returns to avoid function call interference

InFunc: Flag within the profile cache indicating whether a loop has called a function

Average iterations are not updated until the function call returns
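The InFunc mechanism can be sketched as a guard on the loop-exit check. In this hypothetical Python model, a func event marks every loop whose InLoop flag is set, a ret event clears the marks, and a marked loop ignores sbbs outside its address range. Because a single bit cannot count nesting depth, clearing on any return is a simplification assumed here; the actual handling of nested calls is not described. Averaging again uses an exact running mean as a stand-in for the ratio-based hardware.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    tag: int               # sbb address (word address)
    offset: int            # loop size in instructions
    curr_iter: int = 1
    execs: int = 1
    avg_iter: float = 0.0
    in_loop: bool = True
    in_func: bool = False  # set while the loop has an outstanding function call

    def contains(self, iaddr: int) -> bool:
        return self.tag - self.offset <= iaddr <= self.tag


def on_func(entries: List[Entry]) -> None:
    for e in entries:
        if e.in_loop:
            e.in_func = True


def on_ret(entries: List[Entry]) -> None:
    for e in entries:
        e.in_func = False


def check_exit(entries: List[Entry], iaddr: int, func_support: bool = True) -> None:
    for e in entries:
        if e.in_loop and not e.contains(iaddr) and not (func_support and e.in_func):
            e.in_loop = False
            e.avg_iter += (e.curr_iter - e.avg_iter) / e.execs


# The outer loop (sbb at 100) is in its 3rd iteration when it calls a function
# whose own loop has its sbb at 500.
for support in (False, True):
    outer = Entry(tag=100, offset=20, curr_iter=3)
    on_func([outer])
    check_exit([outer], iaddr=500, func_support=support)
    print(support, outer.in_loop, outer.avg_iter)
# False False 3.0  -> interference: a partial execution is averaged in
# True True 0.0    -> the callee's sbb is ignored until ret
```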

Figure: Extended DAProf block diagram with function call support. In addition to sbb and iAddr, the µP supplies func and ret signals, and each profile cache entry gains a 1-bit InFunc field alongside Tag (30), Offset (8), CurrIter (10), AvgIter (13), Execs (16), InLoop (1), and Freshness (3).

Current Work – Dynamic Application Profiler: Profiling Accuracy with Function Call Support

  • DAProf Profiling Accuracy with Function Call Support

    • Compared profiling accuracy for the top ten loops of several MiBench applications against detailed simulation-based profiling

    • Results presented for the 8-way DAProf design

      • All three associativities performed similarly well

Results: 95% accuracy for average iterations, executions, and % execution time

Conclusions

  • Conclusions

    • Developed a non-intrusive dynamic application profiler (DAProf)

      • Profiles an application at runtime, providing detailed loop execution characteristics

      • Developed efficient methods for distinguishing loop executions from loop iterations

      • Developed a freshness-based replacement policy to ensure newly executed loops are not immediately replaced

      • Developed an efficient method for monitoring function call executions

    • Achieves excellent profiling accuracy

      • On average, better than 95% accurate for average iterations per execution, loop executions, and estimated percentage of total application execution time

    • Efficient hardware implementation

      • Area requirement as little as 11% of an ARM9 processor

      • Maximum operating frequency of 495 MHz

Current/Future Work

The current DAProf performs very well when profiling single-threaded software applications

However, multitasked/multithreaded applications may lead to context switch interference

Similar implications to those of function call interference

Need for task/thread-aware profiling
