Building Composable Parallel Software with Liquid Threads

Heidi Pan*, Benjamin Hindman+, Krste Asanovic+

*MIT, +UC Berkeley

Microsoft Numerical Library Incubation Team Visit

UC Berkeley, April 29, 2008


Today’s Parallel Programs are Fragile
  • Parallel programs usually need to be aware of hardware resources to achieve good performance.
    • Don’t incur overhead of thread creation if no resources to run in parallel.
    • Run related tasks on same core to preserve locality.
  • Today’s programs don’t have direct control over resources, but hope that the OS will do the right thing.
    • Create 1 kernel thread per core.
    • Manually multiplex work onto kthreads to control locality & task prioritization.
  • Even if the OS tries to bind each thread to a particular core, it’s still not enough!
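The "one kernel thread per core, multiplex work manually" pattern described above can be sketched as follows. This is a minimal illustration, not code from the talk: the function name `run_on_fixed_pool` is hypothetical, Python threads stand in for kernel threads, and `os.cpu_count()` is assumed to approximate the number of usable cores. The program still only hopes that the OS places each thread on its own core.

```python
import os
import queue
import threading


def run_on_fixed_pool(tasks, num_workers=None):
    """Run callables on a fixed pool of threads (one per core by default)."""
    # Assumption: os.cpu_count() approximates the cores available to us.
    num_workers = num_workers or os.cpu_count() or 1
    work = queue.Queue()
    for index, task in enumerate(tasks):
        work.put((index, task))
    results = [None] * len(tasks)

    def worker():
        # Each worker multiplexes many tasks onto its single thread,
        # so no thread is created per task.
        while True:
            try:
                index, task = work.get_nowait()
            except queue.Empty:
                return
            results[index] = task()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Nothing in this sketch controls which core a thread runs on, which is the fragility the slide points at.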

[Figure: Integer Programming App (B&B) issuing spawns to the Task Parallel Library (TPL) runtime, above the OS with kernel threads KT0 to KT5 and processors P0 to P5.]

Today’s Parallel Codes are Not Composable
  • The system is oversubscribed!
  • Today’s typical solution: use sequential version of libraries within parallel app!
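A small worked example of the oversubscription, under stated assumptions: the six cores come from the slide's figure (P0 to P5), and it is assumed (not measured) that the outer runtime and the library's runtime each independently create one thread per core. The function name is hypothetical.

```python
def runnable_threads(outer_workers, inner_threads_per_call):
    """Threads competing for cores when every outer worker is inside a parallel library call."""
    return outer_workers * inner_threads_per_call


cores = 6                                   # P0..P5 in the figure
# Assumption: both runtimes size themselves to the whole machine.
demand = runnable_threads(cores, cores)     # 36 threads
oversubscription = demand / cores           # 6.0 threads per core
```

Calling the sequential version of the library instead sets `inner_threads_per_call` to 1, which removes the oversubscription but also gives up the library's parallelism.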

[Figure: Integer Programming App (B&B) calling MathLib (MKL), which runs a parallel for over matrix blocks. The app spawns onto the Task Parallel Library (TPL) runtime and the library onto the OpenMP runtime; both sit on the OS over processors P0 to P5.]

Global Scheduler is Not the Right Solution
  • Difficult to design a one-size-fits-all scheduler that provides enough expressiveness and performance for a wide range of codes efficiently.
    • How do you design a dynamic load-balancing scheduler that preserves locality of both divide-and-conquer and linear algebra algorithms?
  • Difficult to convince all SW vendors and programmers to comply with the same programming model.
  • Difficult to optimize critical sections of code w/o interfering with or changing the global scheduler.

[Figure: Integer Programming App (B&B) and Solver, both using parallel constructs (spawn, parallel for, …) on top of a Generic Global Scheduler (User or OS).]

Cooperative Hierarchical Scheduling

Goals:

  • Distributed Scheduling
    • Customizable, scalable, extensible schedulers that make localized code-specific scheduling decisions.
  • Hierarchical Scheduling
    • Parent decides relative priority of its children.
  • Cooperative Scheduling
    • Schedulers cooperate with each other to achieve globally optimal performance for app.

[Figure: Integer Programming App (B&B) with the TPL Scheduler (Parent), and Solver with the OpenMP Scheduler (Child).]

[Figure: hierarchy of TPL and OpenMP schedulers.]

Cooperative Hierarchical Scheduling
  • Distributed Scheduling
    • At any point in time, each scheduler has full control over a subset of the kernel threads allotted to the application to schedule its code.
  • Hierarchical Scheduling
    • A scheduler decides how many of its kernel threads to give to each child scheduler, and when these threads are given.
  • Cooperative Scheduling
    • A scheduler decides when to relinquish its kernel threads instead of being pre-empted by its parent scheduler.
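The three properties above can be illustrated with a toy model that tracks only how many threads each scheduler controls. This is a sketch under assumptions: the `Scheduler` class and its `grant` and `relinquish` methods are hypothetical names, not an interface from the talk.

```python
class Scheduler:
    """Toy scheduler that only tracks how many threads it controls."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.threads = 0  # distributed: each scheduler owns its own subset

    def grant(self, child, count):
        # Hierarchical: the parent decides how many threads a child gets, and when.
        if child.parent is not self:
            raise ValueError("can only grant threads to own children")
        if count > self.threads:
            raise ValueError("cannot grant more threads than held")
        self.threads -= count
        child.threads += count

    def relinquish(self, count):
        # Cooperative: the child decides when to hand threads back;
        # the parent has no method for taking them away.
        if self.parent is None:
            raise ValueError("root scheduler has no parent")
        if count > self.threads:
            raise ValueError("cannot relinquish more threads than held")
        self.threads -= count
        self.parent.threads += count


tpl = Scheduler("TPL")
tpl.threads = 6                       # the application's allotment
omp = Scheduler("OpenMP", parent=tpl)
tpl.grant(omp, 4)
after_grant = (tpl.threads, omp.threads)     # (2, 4)
omp.relinquish(4)
after_return = (tpl.threads, omp.threads)    # (6, 0)
```

The total number of threads stays constant throughout, so the application never holds more than its allotment.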
Standardizing Inter-Scheduler Interface

[Figure: Integer Programming App (B&B) with the TPL Scheduler (Parent) and Solver with the OpenMP Scheduler (Child), connected by a standardized inter-scheduler resource management interface to achieve cooperative hierarchical scheduling.]

Need to extend sequential ABI to support the transfer of resources!

Updating the ABI for the Parallel World
  • Functional ABI
    • Call transfers the thread to the callee, which has full control of register & stack resources to schedule its instructions, and cooperatively relinquishes thread upon return.
    • Identical to sequential call.

[Figure: timeline (t) of a functional call from the Integer Programming App (B&B) into solve(A) (OpenMP), showing call and ret with threads T0 to T5.]

  • Resource Mgmt ABI
    • Parallel callee registers with caller to ask for more resources.
    • Caller enters callee on additional threads that it decides to grant.
    • Callee cooperatively yields threads.

[Figure: timeline (t) of the resource management ABI between the TPL Scheduler and the callee, over the OS and processors P0 to P5: call, reg, (steal), enter, yield, unreg, ret.]

The Case for a Resource Mgmt ABI

By making resources a first-class citizen, we enable:

  • Composability:
    • Code can be written without knowing the context in which it will be called, which encourages abstraction, reuse, and independence.
  • Scalability:
    • Code can call any library function without worrying about inadvertently oversubscribing the system’s resources.
  • Heterogeneity:
    • An application can incorporate parallel libraries that are implemented in different languages and/or linked with different runtimes.
  • Transparency:
    • A library function looks the same to its caller, regardless of whether its implementation is sequential or parallel.
TPL Example: Managing Child Schedulers
  • T0: 1) Push continuations at spawn points onto work queue. 2) Upon child registration, push child’s enter to recruit more threads. 3) Child keeps track of its own parallelism (not pushed onto parent queue).
  • T1: Steal subtree to compute.
  • T2: Steal enter task, which effectively grants the thread to the child.
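The three steps above can be traced with a single-threaded simulation of T0's work queue. The sketch makes assumptions the slide does not state: the `WorkQueue` class is hypothetical, thieves are assumed to take the oldest task first (a common work-stealing convention), and "threads" are just labels passed to each task.

```python
from collections import deque


class WorkQueue:
    """Hypothetical work queue owned by one thread and open to thieves."""

    def __init__(self):
        self.tasks = deque()

    def push(self, task):
        # The owner pushes new tasks at one end.
        self.tasks.append(task)

    def steal(self):
        # Assumption: thieves take the oldest task from the other end.
        return self.tasks.popleft() if self.tasks else None


log = []
queue_t0 = WorkQueue()

# Step 1: T0 pushes the continuation at a spawn point.
queue_t0.push(lambda thread: log.append((thread, "compute subtree")))


# Step 2: the child registers, so T0 pushes the child's enter task.
def child_enter(thread):
    # Step 3: inside the child, work is tracked by the child itself,
    # not pushed back onto the parent's queue.
    log.append((thread, "enter OpenMP child"))


queue_t0.push(child_enter)

# T1 and T2 steal in turn; stealing the enter task grants that thread to the child.
for thief in ("T1", "T2"):
    task = queue_t0.steal()
    if task is not None:
        task(thief)
```

After the loop, `log` holds `("T1", "compute subtree")` followed by `("T2", "enter OpenMP child")`, matching the roles of T1 and T2 in the list above.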

[Figure: timeline for threads T0, T1, and T2. T0 calls solve(A) (OpenMP) and spawns; T1 and T2 steal subtree and enter tasks.]

MVMult Ex: Managing Variable # of Threads
  • Partition work into tasks, each operating on an optimal cache block size.
  • Instead of statically mapping all tasks onto a fixed number of threads (SPMD), tasks are dynamically fetched by current threads (and load balanced).
  • No loss of locality if no reuse of data between tasks.
  • Additional synchronization may be needed to impose an ordering on non-associative floating-point operations.
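A minimal sketch of dynamic task fetching for matrix-vector multiply, assuming row blocks as the unit of work. The function name `mvmult` and the `rows_per_task` parameter are illustrative choices rather than names from the talk, and `num_threads` is assumed to be at least 1.

```python
import threading


def mvmult(matrix, vector, num_threads, rows_per_task=2):
    """Multiply matrix by vector, with tasks fetched dynamically by threads."""
    n = len(matrix)
    result = [0.0] * n
    next_row = 0
    lock = threading.Lock()

    def fetch_task():
        # Hand out the next block of rows to whichever thread asks first.
        nonlocal next_row
        with lock:
            start = next_row
            end = min(n, start + rows_per_task)
            next_row = end
        return start, end

    def worker():
        while True:
            start, end = fetch_task()
            if start >= end:
                return  # no tasks left
            for i in range(start, end):
                # Each row is summed by one task in a fixed left-to-right order,
                # so the result does not depend on how many threads take part.
                result[i] = sum(a * x for a, x in zip(matrix[i], vector))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

Because no task shares an output element with another, this particular partitioning needs no extra ordering synchronization. Splitting a single row's sum across tasks would reintroduce the ordering concern from the last bullet.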

[Figure: timeline (t) of a parallel for over matrix blocks: call, reg, enter, enter, next task, yield, yield, unreg, ret.]

Liquid Threads Model
  • Thread resources flow dynamically & flexibly between different modules.
  • More robust parallel codes that adapt to different/changing environments.

[Figure: processors P0 to P3 over time (t), with threads flowing between modules at call, enter, enter, yield, yield, and ret.]

Lithe: Liquid Thread Environment

[Figure: the Lithe ABI, grouped into functional calls (call, ret) and cooperative resource management calls (enter, yield, request, ...).]

  • Not a (high-level) programming model.
  • Low-level ABI for expert programmers (compiler/tool/standard library developers) to control resources & map parallel codes.
  • Lithe can be deployed incrementally b/c it supports sequential library function calls & provides some basic cooperative schedulers.

  • Lithe also supports management of other resources, such as memory and bandwidth.
  • Lithe also supports (uncooperative) revocation of resources by the OS.
Lithe’s Interaction with the OS


  • Up till now, we’ve implicitly assumed that we’re the only app running, but the OS is usually time-multiplexing multiple apps onto the machine.
  • We believe that a manycore OS should partition the machine spatially & give each app direct control over resources (cores instead of kthreads).
  • The OS may want to dynamically change the resource allocation between the apps depending on the current workload.
    • Lithe-compliant schedulers are robust and can easily absorb additional threads given by the OS & yield threads voluntarily to the OS.
    • Lithe-compliant schedulers can also dynamically check for contexts of threads pre-empted by the OS and schedule them on the remaining threads.
    • Lithe-compliant schedulers don’t use spinlocks (deadlock avoidance).

[Figure: time-multiplexing versus space-multiplexing (spatial partitioning) of apps (App 1, App 2, App 3) by the OS on processors P0 to P3.]


Status: In Early Stage of Development

[Figure: Slither prompt with add/kill thread commands, running Fibonacci on Vthread (work-stealing scheduler).]

  • Slither simulates a variable-sized partition.
    • We simulate hard threads using pthreads.
    • We simulate partitions using processes.
  • User can dynamically add/kill threads from the Vthread partition through the Slither prompt & Vthread will adapt.
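The idea of a computation adapting while threads are added and removed can be sketched with a pool whose size changes at run time. This is not Slither or Vthread: `ElasticPool` and its methods are hypothetical, a shared queue replaces work stealing for brevity, and "kill" is modeled cooperatively (the worker finishes its current task before exiting).

```python
import queue
import threading


class ElasticPool:
    """Hypothetical pool whose worker count can change while tasks run."""

    def __init__(self):
        self.work = queue.Queue()
        self.results = queue.Queue()
        self.workers = []  # (thread, stop_event) pairs

    def add_thread(self):
        stop = threading.Event()
        t = threading.Thread(target=self._run, args=(stop,))
        self.workers.append((t, stop))
        t.start()

    def kill_thread(self):
        # Assumption: removal is cooperative; the worker exits between tasks.
        t, stop = self.workers.pop()
        stop.set()
        t.join()

    def _run(self, stop):
        while not stop.is_set():
            try:
                task = self.work.get(timeout=0.01)
            except queue.Empty:
                continue
            self.results.put(task())
            self.work.task_done()

    def submit(self, task):
        self.work.put(task)

    def shutdown(self):
        # Assumes at least one worker remains to drain the queue.
        self.work.join()
        while self.workers:
            self.kill_thread()


pool = ElasticPool()
pool.add_thread()
for i in range(20):
    pool.submit(lambda i=i: i * 2)
pool.add_thread()      # the partition grows
pool.add_thread()
pool.kill_thread()     # the partition shrinks
pool.shutdown()
doubled = sorted(pool.results.get() for _ in range(20))
```

All twenty tasks complete regardless of when workers come and go, because no task is bound to a particular thread.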
Summary
  • Lithe defines a new parallel ABI that:
    • supports cooperative hierarchical scheduling.
    • enables a liquid threads model in which thread resources flow dynamically & flexibly between different modules.
    • provides the foundation to build composable & robust parallel software.
  • The work is funded partly by