Be-Nice Scheduling for Embedded SMT Processors

Be-Nice Scheduling for embedded SMT processors. Handong Ye. Apr 6th, 2008, Boston.


Be-Nice Scheduling for embedded SMT processors

Handong Ye

Apr 6th, 2008

Boston


Be-Nice Scheduling

  • ITS (Inter-Thread Stall) Introduction

  • Be-Nice Scheduling

  • Some experimental results


Be-Nice Scheduling

  • ITS Introduction

    • ITS in Out-Of-Order processor

    • ITS in In-Order processor

  • Be-Nice Scheduling

  • Some experimental results


Be-Nice Scheduling

  • ITS Introduction

    • ITS in Out-Of-Order machine

      • A thread holds (or fills up) shared resources for too long, e.g., the instruction queue, the reservation stations, ..., and blocks the other threads

      • Flush, …

    • ITS in In-Order machine

      • A thread holds Functional Units, blocking others

      • 2 examples

      • What can the compiler do?


Be-Nice Scheduling

  • ITS Introduction

    • ITS In In-Order machine

      • Examples, assume:

        • SMT, 2 threads

        • Embedded

        • 2 LS (load/store) units and 2 ALUs

        • A separate dispatch buffer for each thread


Be-Nice Scheduling

  • ITS Introduction

    • ITS In In-Order machine

      • Example – 1 (Same FU ITS)

        • A load that misses in the cache can block other threads that are using the same LS unit


[Figure: pipeline diagram. Thread-A and Thread-B each have a dispatch buffer holding add and ld instructions; the pipeline stages are EXE, MEM and WB, with a MISS marked; the functional units are LS1, LS2, ALU1 and ALU2.]

Example 1: same-FU block


Be-Nice Scheduling

  • ITS Introduction

    • ITS In In-Order machine

      • Example – 2 (Cross FU ITS)

        • A load that misses in the cache can also block other threads that are using non-LS functional units, e.g., an ALU


[Figure: pipeline diagram. Thread-A and Thread-B each have a dispatch buffer holding add and ld instructions; the pipeline stages are EXE, MEM and WB, with a MISS marked; the functional units are LS1, LS2, ALU1 and ALU2.]

Example 2: cross-FU block


Be-Nice Scheduling

  • ITS Introduction

    • ITS In In-Order machine

      • Assume:

        • 1. Thread-A has a cache miss rate of around 1%~2%

        • 2. Thread-B always hits

      • Results:

        • 1. Half of the idle cycles are due to ITS

        • 2. Almost 1/3 of the cycles are idle

The effect of ITS, from thread-A to thread-B
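
Taken together, the two results suggest how much of the total run time is lost to ITS. The short calculation below is only a sketch: it assumes both fractions refer to the same set of cycles, and it treats "half" and "almost 1/3" as exact values. The variable names are illustrative and do not come from the slides.

```python
from fractions import Fraction

# Values quoted on the slide, treated as exact for illustration.
idle_share_of_all_cycles = Fraction(1, 3)   # "almost 1/3 cycles are idle"
its_share_of_idle_cycles = Fraction(1, 2)   # "half of idle cycles are due to ITS"

# Fraction of all cycles that are idle because of ITS.
its_share_of_all_cycles = idle_share_of_all_cycles * its_share_of_idle_cycles

print(its_share_of_all_cycles)  # 1/6
```

Under these assumptions, roughly one cycle in six is lost to ITS.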


Be-Nice Scheduling

  • ITS Introduction

    • ITS In In-Order machine

      • What can the compiler do?

        • Focus on in-order embedded processors

        • Needs only a little simple HW support

        • Implemented in Open64, in instruction scheduling


Be-Nice Scheduling

  • ITS (Inter-Thread Stall) Introduction

  • Be-Nice Scheduling

  • Some experimental results


Be-Nice Scheduling

  • Be-Nice Scheduling

    • Intuitive thinking

      • Prefetching: unacceptable for embedded systems

      • Reduce cross-FU ITS: reduce the number of FUs held by thread-A

      • Reduce same-FU ITS: avoid issuing instructions from other threads into the blocked FUs


[Figure: the original Thread-A code (ld, add, add, ld) and the rescheduled Thread-A ("sched"), shown together with Thread-B in the dispatch buffers; the pipeline stages are EXE, MEM and WB; the functional units are LS1, LS2, ALU1 and ALU2.]


Be-Nice Scheduling

  • Be-Nice Scheduling

    • Objective

      • Schedule n (>=2) loads back-to-back

      • Issue the n loads to the same FU

    • Compiler + HW solution

      • HW side

        • Add an extra load instruction, ld.n (n = 1, 2), which sends the load only to the nth LS unit

        • Each thread has its own preferred LS unit

      • Compiler side

        • Profile to identify the loads that are highly likely to miss, say ‘load_a’

        • Schedule another load, say ‘load_b’, right behind ‘load_a’, and glue the two together as a pseudo OP

        • Change ‘load_a’ and ‘load_b’ to use the thread’s preferred LS unit, e.g., both are changed to ‘ld.1’
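
The HW-side rule above (ld.n goes only to the nth LS unit, while a plain ld may use any LS unit) can be sketched as a small steering function. This is an illustrative model, not the actual hardware logic: the names `Load`, `pick_ls_unit` and `units_held_after` are made up for this sketch, and it assumes every load of thread-A misses and keeps its unit busy.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_LS_UNITS = 2  # the slides assume 2 LS units


@dataclass
class Load:
    thread: str
    unit_hint: Optional[int] = None  # None = plain ld, n = ld.n (1-based)


def pick_ls_unit(load: Load, busy: List[bool]) -> Optional[int]:
    """Return the 1-based LS unit the load may issue to, or None if it must wait.

    busy[i] is True when LS unit i+1 is occupied, e.g. by a missed load.
    """
    if load.unit_hint is not None:
        # ld.n: only the nth unit is allowed, even if another unit is free.
        n = load.unit_hint
        return None if busy[n - 1] else n
    # Plain ld: take the first free unit, whichever it is.
    for i, is_busy in enumerate(busy):
        if not is_busy:
            return i + 1
    return None


def units_held_after(loads: List[Load]) -> List[bool]:
    """Issue the loads in order, assuming each issued load misses and holds its unit."""
    busy = [False] * NUM_LS_UNITS
    for load in loads:
        unit = pick_ls_unit(load, busy)
        if unit is not None:
            busy[unit - 1] = True
        # A load that gets no unit simply waits; it does not spill to another unit.
    return busy


# Thread-A with two plain loads: the misses spread over both LS units.
plain = units_held_after([Load("A"), Load("A")])
# Thread-A with two ld.1 loads: both stay on LS1, LS2 remains free.
steered = units_held_after([Load("A", 1), Load("A", 1)])

print(plain)    # [True, True]
print(steered)  # [True, False]
```

In this model a load from thread-B finds no free LS unit in the `plain` case, while in the `steered` case `pick_ls_unit(Load("B", 2), steered)` returns 2, assuming thread-B's preferred unit is LS2.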


Be-Nice Scheduling

  • Be-Nice Scheduling

    • A Compiler + HW solution

Identified to miss

BB1:
  $r1 = ld $r2
  $r2 = $r2 + 4
  $r3 = ld $r4
  $r3 = $r3 + 4
  $r5 = $r1 + $r3

BB1:
  $r1 = ld $r2
  $r2 = $r2 + 4
  $r3 = ld $r4
  $r3 = $r3 + 4
  $r5 = $r1 + $r3

BB1:
  $r1 = ld $r2
  $r3 = ld $r4
  $r2 = $r2 + 4
  $r3 = $r3 + 4
  $r5 = $r1 + $r3

BB1:
  $r1 = ld.1 $r2
  $r3 = ld.1 $r4
  $r2 = $r2 + 4
  $r3 = $r3 + 4
  $r5 = $r1 + $r3
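
The BB1 transformation can be reproduced with a minimal sketch of the compiler-side steps. It is not the Open64 implementation: the helpers `parse`, `depends` and `be_nice` are hypothetical, the likely-to-miss load is assumed to be the first instruction of the block, and only register dependences are checked (stores and memory aliasing are ignored).

```python
import re
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Instr:
    text: str
    dest: str
    srcs: Tuple[str, ...]
    is_load: bool


def parse(line: str) -> Instr:
    """Parse a line such as '$r1 = ld $r2' or '$r2 = $r2 + 4'."""
    dest, rhs = [part.strip() for part in line.split("=", 1)]
    is_load = rhs.split()[0].split(".")[0] == "ld"
    return Instr(line.strip(), dest, tuple(re.findall(r"\$r\d+", rhs)), is_load)


def depends(later: Instr, earlier: Instr) -> bool:
    """True if `later` must not be moved above `earlier` (register RAW, WAR or WAW)."""
    return (earlier.dest in later.srcs
            or later.dest in earlier.srcs
            or later.dest == earlier.dest)


def retag(instr: Instr, unit: int) -> Instr:
    """Rewrite 'ld' (or 'ld.k') into 'ld.<unit>'."""
    new_text = re.sub(r"\bld(\.\d+)?(?=\s)", f"ld.{unit}", instr.text, count=1)
    return replace(instr, text=new_text)


def be_nice(block: List[Instr], miss_idx: int, unit: int) -> List[Instr]:
    """Pull the next movable load up behind block[miss_idx]; tag both as ld.<unit>.

    Returns an unchanged copy of the block when no such load exists.
    """
    out = list(block)
    for j in range(miss_idx + 1, len(out)):
        cand = out[j]
        if not cand.is_load:
            continue
        # The candidate may only hop over instructions it does not depend on.
        if any(depends(cand, e) for e in out[miss_idx + 1:j]):
            continue
        out.insert(miss_idx + 1, out.pop(j))
        out[miss_idx] = retag(out[miss_idx], unit)
        out[miss_idx + 1] = retag(out[miss_idx + 1], unit)
        return out
    return out


bb1 = [parse(line) for line in [
    "$r1 = ld $r2",
    "$r2 = $r2 + 4",
    "$r3 = ld $r4",
    "$r3 = $r3 + 4",
    "$r5 = $r1 + $r3",
]]

# Assumption: profiling flagged the first load, and this thread prefers LS unit 1.
result = [ins.text for ins in be_nice(bb1, miss_idx=0, unit=1)]
for text in result:
    print(text)
```

With these inputs the output matches the last BB1 listing: the two loads become adjacent ld.1 instructions, followed by the three additions in their original order.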


[Figure: Open64 code generator flow, from WHIRL through CG-expand to CGIR and on to code emission (.s). Boxes shown: extended block optimizer, software pipelining, loop unrolling, scheduling pre-pass (GCM here), global register alloc, local register alloc, control flow opt., scheduling post-pass, if-conversion, loop optimizations, prolog and epilog. A "Be-Nice Scheduling" label marks where the technique is applied.]


Be-Nice Scheduling

  • Be-Nice Scheduling (in Open64 GCM and LIS)

    • The key points during code motion

      • Use GCM to find candidate <ld.1, ld.1> pairs

      • Move each pair as a single ‘pseudo’ instruction
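
Moving the pair as one unit can be sketched as follows. This only illustrates the idea and is not Open64 code: `Op`, `PseudoOp`, `glue`, `move` and `expand` are hypothetical names, the `$r6 = $r7 + 1` instruction is made up for the demo, and the legality of the motion (dependence checks) is assumed to be handled elsewhere.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Op:
    text: str


@dataclass(frozen=True)
class PseudoOp:
    members: Tuple[Op, ...]  # ops that must stay adjacent, in this order


Item = Union[Op, PseudoOp]


def glue(block: List[Item], i: int) -> List[Item]:
    """Replace block[i] and block[i + 1] with a single PseudoOp."""
    return block[:i] + [PseudoOp((block[i], block[i + 1]))] + block[i + 2:]


def move(block: List[Item], src: int, dst: int) -> List[Item]:
    """Move one scheduling unit; a PseudoOp travels as a whole."""
    out = list(block)
    out.insert(dst, out.pop(src))
    return out


def expand(block: List[Item]) -> List[Op]:
    """Flatten pseudo ops back into real instructions after scheduling."""
    flat: List[Op] = []
    for item in block:
        if isinstance(item, PseudoOp):
            flat.extend(item.members)
        else:
            flat.append(item)
    return flat


block = [
    Op("$r6 = $r7 + 1"),   # hypothetical unrelated instruction
    Op("$r1 = ld.1 $r2"),
    Op("$r3 = ld.1 $r4"),
    Op("$r2 = $r2 + 4"),
]

glued = glue(block, 1)        # the two ld.1 ops become one unit
hoisted = move(glued, 1, 0)   # move that unit to the top of the block
final = [op.text for op in expand(hoisted)]
print(final)
```

Because the scheduler sees only one unit, no other instruction can end up between the two loads, whichever position the pair is moved to.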


Be-Nice Scheduling

  • Some experimental results

    • Be-Nice Schedule on Thread-A

    • Performance difference on Thread-B


Be-Nice Scheduling

  • Some experimental results

The Number of ITS Cycles in thread-B: w/ Be-Nice vs. w/o Be-Nice


Be-Nice Scheduling

  • Some experimental results

IPC Improvement of thread-B with Be-Nice Instruction Scheduling

