
Programming models for the Barrelfish multi-kernel operating system

Tim Harris

Based on joint work with Martín Abadi, Andrew Baumann, Paul Barham, Richard Black, Orion Hodson, Rebecca Isaacs, Ross McIlroy, Simon Peter, Vijayan Prabhakaran, Timothy Roscoe, Adrian Schüpbach, Akhilesh Singhania



Cache-coherent multicore

[Diagram: many cores, each with a private L2 cache; groups of cores share an L3 cache and a RAM bank]

AMD Istanbul: 6 cores, per-core L2, per-package L3



Single-chip cloud computer (SCC)

[Diagram: a 2-core tile, each core with a private L2 cache, sharing a message-passing buffer (MPB) and a router; memory controllers (MC-0, MC-1, MC-3, MC-4) connect the mesh to RAM; VRC and system interface]

  • Non-coherent caches

  • Hardware-supported messaging

  • 24 * 2-core tiles

  • On-chip mesh network



MSR Beehive

[Diagram: RISCN core modules on a ring interconnect (RingIn[31:0], SlotTypeIn[3:0], SrcDestIn[3:0]); a MemMux module handles messages and locks, multiplexing addresses and data (32-bit Rdreturn, 128-bit RD, pipelined bus to all cores) between the cores, the DDR controller, the display controller, and RAM]

  • Ring interconnect

  • Message passing in h/w

  • No cache coherence

  • Split-phase memory access



IBM Cell

[Diagram: eight SXU cores, each with a local store (LS) and memory flow controller (MFC); one PXU core with L1 and L2 caches; shared memory and I/O interfaces]

64-bit Power (PXU) and SIMD (SXU) core types



Messaging vs shared data as default

  • Fundamental model is message based

  • “It’s better to have shared memory and not need it than to need shared memory and not have it”

[Spectrum: traditional operating systems → Barrelfish multikernel]

Shared state, one big lock → fine-grained locking → clustered objects, partitioning → distributed state, replica maintenance



The Barrelfish multi-kernel OS

[Diagram: apps running above per-core OS nodes, each holding a state replica, on heterogeneous cores (x64, x64, ARM, accelerator core) joined by a hardware interconnect; OS nodes communicate via message passing]



The Barrelfish multi-kernel OS

[Diagram as on the previous slide]

System runs on heterogeneous hardware, currently supporting ARM, Beehive, SCC, x86 & x64



The Barrelfish multi-kernel OS

[Diagram as on the previous slide]

Focus of this talk: system components, each local to a specific core, and using message passing

System runs on heterogeneous hardware, currently supporting ARM, Beehive, SCC, x86 & x64



The Barrelfish multi-kernel OS

[Diagram as on the previous slide]

User-mode programs: several models supported, including conventional shared-memory OpenMP & pthreads

Focus of this talk: system components, each local to a specific core, and using message passing

System runs on heterogeneous hardware, currently supporting ARM, Beehive, SCC, x86 & x64


The Barrelfish messaging stack

  • High-level language runtime system: threads, join patterns, buffering, ...

  • Simple synchronous send/receive interface, support for concurrency

  • Portable IDL: marshalling to/from C structs, events on send/receive possible

  • H/W-specific interface: max message sizes, guarantees, flow control, ...


Programming models for the Barrelfish multi-kernel operating system

Multi-core hardware & Barrelfish · Event-based model · Synchronous model · Supporting concurrency · Cancellation · Performance



Event-based interface

interface Echo {
    rpc ping(in int32 arg, out int32 result);
}

[Diagram: the stub compiler turns the interface definition into interconnect-specific stubs (SCC, shared-mem, Beehive, same core) beneath a common event-based programming interface]



Event-based interface

interface Echo {
    rpc ping(in int32 arg, out int32 result);
}

// Try to send => OK or ERR_TX_BUSY
errval_t Echo_send_ping_call(Echo_binding *b, int arg);

// Register callback when send worth re-trying
errval_t Echo_register_send(Echo_binding *b, Echo_can_send_cb *cb);
typedef void Echo_can_send_cb(Echo_binding *b);

// Register callback on incoming "ping" response
errval_t Echo_register_recv_ping_resp(Echo_binding *b, Echo_recv_ping_cb *cb);
typedef void Echo_recv_ping_cb(Echo_binding *b, int result);

// Wait for next callback to be ready, then execute it
errval_t event_dispatch(void);

[Diagram: the common programming interface sits above interconnect-specific stubs (SCC, UMP, BMP, LMP)]


Event-based interface

Echo_binding *b;
...
Echo_register_recv_ping_resp(b, &response_handler);
...
bool done = false;
err = Echo_send_ping_call(b, 10);
if (err == ERR_TX_BUSY) {
    // Channel busy: stash the argument, retry when the send is worth re-trying
    b->st = malloc(sizeof(int));
    *(int*)b->st = 10;
    err = Echo_register_send(b, &resend_handler);
    assert(err == ERR_OK);
}
while (!done) {
    event_dispatch();
}

static void response_handler(Echo_binding *b, int val) {
    printf("Got response %d\n", val);
    done = true;
}

static void resend_handler(Echo_binding *b) {
    err = Echo_send_ping_call(b, *(int*)b->st);
    if (err == ERR_TX_BUSY) {
        err = Echo_register_send(b, &resend_handler);
        assert(err == ERR_OK);
    } else {
        free(b->st);
    }
}



Why do it this way?

  • Overlap computation and communication

    • Non-blocking send/receive operations allow the caller to continue with other work

  • Remain responsive to multiple clients

    • Don’t get “stuck” blocked on a receive from one client while another client is ready for service

  • Lightweight runtime system

    • No threads, no GC, etc.

    • Support on diverse hardware

    • Use within the implementation of runtime systems for higher-level languages
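
A minimal sketch of what the first two points buy, using only the generated operations above (the ERR_TX_BUSY retry path is omitted for brevity):

static int replies = 0, sum = 0;

static void on_resp(Echo_binding *b, int result) {
    sum += result;                  // tally each server's answer as it arrives
    replies++;
}

static int total_event(Echo_binding *b1, Echo_binding *b2, int arg) {
    Echo_register_recv_ping_resp(b1, &on_resp);
    Echo_register_recv_ping_resp(b2, &on_resp);
    Echo_send_ping_call(b1, arg);   // both requests outstanding at once
    Echo_send_ping_call(b2, arg);
    while (replies < 2) {
        event_dispatch();           // handlers for other clients can run here too
    }
    return sum;
}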


Programming models for the Barrelfish multi-kernel operating system

Multi-core hardware & Barrelfish · Event-based model · Synchronous model · Supporting concurrency · Cancellation · Performance



Goals for the synchronous model

  • Cleaner programming model

  • Integration in C

    • Resource consumption can be anticipated

  • Low overhead over the underlying messaging primitives

    • Don’t want to harm speed

    • Don’t want to harm flexibility (e.g., ability to compute while waiting for responses)

  • Focus on concurrency between communicating processes

    • Everything runs in a single thread (unless the code says otherwise)

    • Execution is deterministic (modulo the timing and content of external inputs)



Event-based programming model

interface Echo {
    rpc ping(in int32 arg, out int32 result);
}

[Diagram as before: the stub compiler generates interconnect-specific stubs (SCC, shared-mem, same core) beneath a common event-based programming interface]



Synchronous message-passing

interface Echo {
    rpc ping(in int32 arg, out int32 result);
}

[Diagram: synchronous message-passing stubs, generated by the stub compiler, layered above the common event-based programming interface and the interconnect-specific stubs (SCC, shared-mem, same core)]



Synchronous message-passing

interface Echo {
    rpc ping(in int32 arg, out int32 result);
}

// Send "ping", block until complete
void Echo_tx_ping(Echo_binding *b, int arg);

// Wait for and receive response to "ping"
void Echo_rx_ping(Echo_binding *b, int *result);

// RPC send-receive pair
void Echo_ping(Echo_binding *b, int arg, int *result);

[Diagram: these synchronous stubs sit above the common event-based interface and the interconnect-specific stubs (SCC, UMP, LMP)]


Channel abstraction

[Diagram: client-side and server-side endpoints joined by a pair of channels]

  • Pair of uni-directional channels, of bounded but unknown capacity

  • Send is synchronous between the sender and their channel endpoint

  • FIFO, lossless transmission, with unknown delay

  • Receive is synchronous between the receiver and the head of the channel

  • Only whole-channel failures, e.g. if the other party exits
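
To illustrate these semantics with the split stubs from the previous slide (do_other_work is a hypothetical placeholder):

Echo_tx_ping(b, 10);        // synchronous with our endpoint: blocks only until the channel accepts the message
do_other_work();            // message in flight: FIFO, lossless, unknown delay
int result;
Echo_rx_ping(b, &result);   // blocks until the response reaches the head of our channel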



Two back-to-back RPCs

  • This looks cleaner but:

    • We’ve lost the ability to contact multiple servers concurrently

    • We’ve lost the ability to overlap computation with waiting

static int total_rpc(Echo_binding *b1,
                     Echo_binding *b2,
                     int arg) {
    int result1, result2;
    Echo_ping(b1, arg, &result1);
    Echo_ping(b2, arg, &result2);
    return result1 + result2;
}


Programming models for the Barrelfish multi-kernel operating system

Multi-core hardware & Barrelfish · Event-based model · Synchronous model · Supporting concurrency · Cancellation · Performance


Adding asynchrony: async, do..finish

static int total_rpc(Echo_binding *b1,
                     Echo_binding *b2,
                     int arg) {
    int result1, result2;
    do {
        async Echo_ping(b1, arg, &result1);
        async Echo_ping(b2, arg, &result2);
    } finish;
    return result1 + result2;
}

  • If the async code blocks, then resume after the async

  • finish waits until all the async work (dynamically) in the do..finish has completed
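
Because a blocked async returns control to the caller, local computation can be overlapped with an outstanding RPC. A minimal variant of the example above (precompute and local are hypothetical, purely for illustration):

int local;
do {
    async Echo_ping(b1, arg, &result1);   // may block; caller resumes here
    local = precompute(arg);              // overlaps with the outstanding RPC
    async Echo_ping(b2, arg, &result2);
} finish;                                 // both RPCs complete before this returns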


Example: same-core L4-style RPC

static int total_rpc(Echo_binding *b1,
                     Echo_binding *b2,
                     int arg) {
    int result1, result2;
    do {
        async Echo_ping(b1, arg, &result1);
        async Echo_ping(b2, arg, &result2);
    } finish;
    return result1 + result2;
}

Step by step:

  • Execute to the “ping” call as normal

  • Message send transitions directly to the recipient process

  • Response returned to the caller

  • Caller proceeds through the second call


Example: cross-core RPC

static int total_rpc(Echo_binding *b1,
                     Echo_binding *b2,
                     int arg) {
    int result1, result2;
    do {
        async Echo_ping(b1, arg, &result1);
        async Echo_ping(b2, arg, &result2);
    } finish;
    return result1 + result2;
}

Step by step:

  • First call sends its message; this core is now idle

  • Resume after the async; send the second message

  • Resume after the async; block at finish until done

  • Continue once both responses have been received


Programming models for the Barrelfish multi-kernel operating system

Multi-core hardware & Barrelfish · Event-based model · Synchronous model · Supporting concurrency · Cancellation · Performance


Cancellation

  • Send a request on n channels

  • Receivers vote yes/no

  • Wait for votes one way or the other

static bool tally_votes(Vote_binding *b[], int n) {
    int yes = 0, no = 0;
    int winning = (n/2) + 1;
    do {
        for (int i = 0; i < n; i++) {
            async {
                bool v;
                Vote_get_vote(b[i], &v);
                if (v) { yes++; } else { no++; }
                if (yes >= winning || no >= winning) { ??? }
            }
        }
    } finish;
    return (yes > no);
}


Cancellation

static bool tally_votes(Vote_binding *b[], int n) {
    int yes = 0, no = 0;
    int winning = (n/2) + 1;
    do {
        for (int i = 0; i < n; i++) {
            async {
                bool v;
                if (Vote_get_vote_x(b[i], &v) != CANCELLED) {
                    if (v) { yes++; } else { no++; }
                    if (yes >= winning || no >= winning) cancel;
                }
            }
        }
    } finish;
    return (yes > no);
}

  • “_x” suffix marks this as a cancelable function

  • We block at finish waiting for all the async work to be done



Cancellation with deferred clean-up

Split RPC into explicit send/receive: distinguish two cancellation points

async {
    bool v;
    if (Echo_tx_get_vote_x(b[i]) == OK) {
        if (Echo_rx_get_vote_x(b[i], &v) == OK) {
            if (v) { yes++; } else { no++; }
            if (yes >= winning || no >= winning) cancel;
        } else {
            push_work(q, [=]{ Echo_tx_apologise(b[i]); });
        }
    }
}

C++ closure to apologise for lost votes

Push clean-up operation to queue if cancelled after tx, before rx
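
One plausible shape for that clean-up queue, sketched in plain C with a function/argument pair standing in for the C++ closure (all names here are illustrative, not Barrelfish's actual API):

#include <stdlib.h>

typedef struct work {
    void (*fn)(void *arg);          // deferred clean-up action
    void *arg;
    struct work *next;
} work_t;

typedef struct { work_t *head, *tail; } work_queue_t;

static void push_work(work_queue_t *q, void (*fn)(void *), void *arg) {
    work_t *w = malloc(sizeof(*w));
    w->fn = fn; w->arg = arg; w->next = NULL;
    if (q->tail) q->tail->next = w; else q->head = w;
    q->tail = w;
}

// Run after finish: every async has completed or been cancelled,
// so the deferred apologies can be sent safely.
static void drain_work(work_queue_t *q) {
    while (q->head) {
        work_t *w = q->head;
        q->head = w->next;
        if (!q->head) q->tail = NULL;
        w->fn(w->arg);
        free(w);
    }
}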


Programming models for the Barrelfish multi-kernel operating system

Multi-core hardware & Barrelfish · Event-based model · Synchronous model · Supporting concurrency · Cancellation · Performance



Implementation

[Diagram: a thread's stack holds finish-block (FB) records (current FB, enclosing FB, count, completion AWI); a per-thread run queue (first/last pointers) holds ready AWIs]

  • Techniques (sketched in code below):

    • Per-thread run queue of ready work items (“AWIs”)

    • Wait for next IO completion when queue empty

    • Book-keeping data all stack allocated

    • Perform work lazily when blocking: overhead of “async X()” versus “X()” is a couple of cycles
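
A minimal sketch of the dispatch idea (shapes assumed for illustration; wait_for_io_completion is a hypothetical stand-in for the real blocking primitive):

typedef struct awi {
    void (*run)(struct awi *self);      // continuation to resume
    struct awi *next;
} awi_t;

typedef struct {
    awi_t *first, *last;                // per-thread run queue of ready AWIs
} run_queue_t;

extern void wait_for_io_completion(run_queue_t *q);  // hypothetical: blocks, then enqueues ready AWIs

static void dispatch_loop(run_queue_t *q) {
    for (;;) {
        if (q->first) {
            awi_t *w = q->first;        // pop the next ready work item
            q->first = w->next;
            if (!q->first) q->last = NULL;
            w->run(w);                  // resume it
        } else {
            wait_for_io_completion(q);  // queue empty: wait for the next IO completion
        }
    }
}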


Performance

Ping-pong test, minimum-sized messages

  • AMD 4 * 4-core machine

  • Using cores sharing L3 cache

[Chart: ping-pong latency results]



Performance

  • “Do not fear async”

    • Think about correctness: if the callee doesn’t block then perf is basically unchanged



Software RAID0, 64KB transfers

2 * Intel X25-M SSDs, each 250MB/s

[Chart: throughput of the AC, Async, and Sync variants]



2-phase commit between cores

[Chart: performance of the AC, Async, and Sync variants]



Project status

  • Papers:

    • MMCS’08, HotOS’09, PLOS’09, SOSP’09, HotPar’10

  • Substantial prototype system:

    • 146kLOC (new), 488kLOC (including ported code)

  • Establishing collaborations around this research platform

    • More than 20,000 downloads of ETHZ source-code release

    • Intel: Early access to SCC, other test systems

    • Interest from other HW vendors: Tilera, ARM, . . .

    • Boston Uni, IBM Research: HPC work, BlueGene

    • KTH Sweden: Tilera port

    • Barcelona Supercomputing Center

