
Intel Parallel Advisor Workflow

David Valentine

Computer Science

Slippery Rock University


Parallel Advisor: the goal

  • Find the sections of your application that, when parallelized, will give you the best performance gains and scalability, while maintaining correct results


Advisor Workflow (from .NET)

“Open Advisor Workflow”

You can also get here from Tools > Intel Advisor XE 2013 > Open Advisor XE Workflow

Simple, 5-step process

  • All analysis is done on your serial code


Different Builds needed at Different Steps in the Workflow

Work Flow Step 1: Survey Target

  • This “hot spot” tool needs a “Release Mode” configuration along with these Project Properties:

    • C/C++

      • General: Debug Information Format set to /Zi or /ZI

      • Optimization: Maximize Speed (/O2) and Inline Function Expansion set to Only __inline (/Ob1)

      • Code Generation: Runtime Library set to Multi-threaded DLL (/MD) or Multi-threaded Debug DLL (/MDd)

    • Linker: Debugging, Generate Debug Info set to Yes (/DEBUG)

  • Build Project

  • Click “Collect Survey Data”


Look at Survey Report

100% of the time is spent in the loop in function trap. Double-click it to see the code involved.


Summary Report

We can see all the time was spent in a single, time-consuming loop. We now have a target to parallelize.


Step 2: Annotate Source

  • The Workflow Advisor gives us 5 tasks:

    • Specify the Intel Advisor XE include directory

      • I prefer to set this in the .NET IDE: Tools > Options > Projects & Solutions > VC++ Directories

      • Then set the “Show directories for” drop-down to Include Files

      • Browse to “C:\Program Files\Intel\Advisor XE 2013\include”

    • Include the annotation definitions

      • Go to the top of the code (in the #includes) and right-click

      • Select Intel Advisor XE 2013 > Insert Annotation Definitions Reference and the #include will be inserted for you.

    • Specify the library name and directory


Checking Suitability

  • Insert the actual annotations: highlight the code and right-click

    ANNOTATE_SITE_BEGIN(MySite1);
    for(int i=1; i<numIntervals; i++) { //get the interior points
        ANNOTATE_TASK_BEGIN(MyTask1);
        x = xLo + i*width;
        area += f(x);
        ANNOTATE_TASK_END(MyTask1);
    }
    ANNOTATE_SITE_END(MySite1);


Checking Suitability

  • Rebuild project (Release configuration)

    • The Survey & Suitability tools take a RELEASE build

    • The Correctness tool (when we get there) takes a DEBUG build.


Suitability Report

We can almost double speed on dual core

But the tasks are VERY small


Check Correctness

  • Rebuild Project with Debug configuration

    • Compiler: Debug Information Format (/Zi)

    • Compiler: Optimization Disabled (/Od)

    • Compiler: Code Generation Runtime Library (/MD or /MDd)

    • Linker Debugging: Generate Debug Info Yes (/DEBUG)

  • And KNOCK DOWN THE ITERATIONS!

    • Correctness takes a LONG, LONG time
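One way to make it easy to knock down the iterations is to read the interval count from the command line, so the Debug build used for the Correctness run can be given a small value without editing the source. The helper below is a hypothetical sketch, not something shown in the slides; its name and fallback behavior are assumptions.

```cpp
#include <climits>
#include <cstdlib>

// Hypothetical helper: take the interval count from the first command-line
// argument, or return defaultCount when the argument is missing or not a
// positive value that fits in an int.
int intervalsFromArgs(int argc, char* argv[], int defaultCount) {
    if (argc > 1) {
        long n = std::strtol(argv[1], nullptr, 10);  // yields 0 for non-numeric text
        if (n > 0 && n <= INT_MAX) {
            return static_cast<int>(n);
        }
    }
    return defaultCount;
}
```

In main, something like int numIntervals = intervalsFromArgs(argc, argv, 100000000); would keep the large default for Release timing runs while allowing a much smaller count for the Correctness analysis.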


We find a data race error

Each thread tries to update “area”; we have a data race.

(There is also a bug in Advisor)


Fix data race with lock

    ANNOTATE_SITE_BEGIN(MySite1);
    for(int i=1; i<numIntervals; i++) { //get the interior points
        ANNOTATE_TASK_BEGIN(MyTask1);
        x = xLo + i*width;
        ANNOTATE_LOCK_ACQUIRE(&area);
        area += f(x); //add the interior value
        ANNOTATE_LOCK_RELEASE(&area);
        ANNOTATE_TASK_END(MyTask1);
    }
    ANNOTATE_SITE_END(MySite1);


Run Correctness again

Clean bill of health!


Now add Parallel Framework

    //default(none): make the newbie list ALL variables
    //private(x): each thread has its own x
    //shared(...): all threads share these
    //reduction(+:area): threads combine their areas at the end
    #pragma omp parallel for default(none) \
        private(x) \
        shared(numIntervals, xLo, width) \
        reduction(+:area)
    for(int i=1; i<numIntervals; i++) { //get the interior points
        x = xLo + i*width; //makes each iteration independent of others
        area += f(x); //add the interior value ***
    }

  • Will also need to add:

    • #include <omp.h>

    • Properties > Configuration Properties > C/C++ > Language > OpenMP Support: Yes (/openmp)


Watch it run!

100% core usage!


Now on to the Nifties…

  • Please respect the work of colleagues

  • DO NOT POST SOURCE CODE

  • Give credit back to the authors

  • DO NOT POST SOURCE CODE

  • Feel free to tweak the assignments

  • DO NOT POST SOURCE CODE

