Fast Communication and User Level Parallelism
Howard Marron

Presentation Transcript
Introduction

We have studied systems that attempt to build transparent layers below the application, providing properties such as replication and group communication.

Here we look at approaches that give the user more direct control over parallelism.

Threads
  • Smaller granularity than processes gives programs better parallelism and performance.
  • Threads have lower overhead than processes.
  • The same program runs on a uniprocessor or a multiprocessor with little or no modification.
  • Threads in the same process communicate easily, since they share an address space.
Implementation

Do we want threads, and if so, where should we implement them?

Latency in μs on a Firefly system (comparison table not reproduced in this transcript)

Advantages and Problems of ULT
Advantages

Thread switching does not involve the kernel: no mode switch is needed.

Scheduling can be application specific: choose the best algorithm.

ULTs can run on any OS; only a thread library is needed.

Disadvantages

Most system calls are blocking, and the kernel blocks at the process level, so all threads within the process are blocked.

The kernel can only assign processors to processes, so two threads within the same process cannot run simultaneously on two processors.

Advantages and Problems of KLT
Advantages

The kernel knows the processing environment and assigns threads accordingly.

Blocking is done at the thread level.

Kernel routines can be multithreaded

Disadvantages

Thread switching within the same process involves the kernel: two mode switches per thread switch.

This significantly slows down thread switching within the same process.

ULT with Scheduler Activations
  • Implement user level threads with the help of the kernel.
  • Gain the flexibility and performance of ULT
  • Have functionality of KLT without the overhead
ULT over KLT
  • The kernel operates without knowledge of the user-level program.
  • User threads are never notified of the kernel's scheduling decisions, since they are transparent to the user.
  • The kernel schedules threads without regard to user-thread priorities or memory locations.
The Model

Diagram: at user level, a thread pool feeds the scheduler; the kernel runs an instance of the scheduler on each processor (P1, P2).
Kernel Support of ULT
  • The kernel controls processor allocation.
  • The ULT scheduler controls which threads run on the allocated processors.
  • The kernel notifies the ULT scheduler of any changes to the environment.
  • The ULT scheduler can notify the kernel of its current processor needs.
Scheduler Activations
  • Add processor: run a thread on it.
  • Processor preempted: returns the state of the preempted thread; another thread can be run.
  • Scheduler activation has blocked: its processor can run another thread.
  • Scheduler activation has unblocked: return the thread to the ready list.
Hints to Kernel
  • Add more processors
  • This processor is idle
Critical Sections
  • Idea 1
    • On a CS conflict, give control back to the thread holding the lock.
    • That thread gives control back once it is done with the CS.
    • Found to be too slow to determine whether a thread was inside a CS.
    • Hard to make the thread give up control after the CS is done.
Critical Sections (Cont.)
  • Idea 2
    • Make copies of the critical sections available to the scheduler.
    • Compare the preempted thread's PC with the CS copies to check whether it holds a lock.
    • Run the copy of the CS; control returns sooner than before, since the release of the lock is known to the scheduler.
Threads Summary
  • The best solution lies somewhere between pure ULT and pure KLT.
  • Both levels must cooperate for best performance.
  • Most thread-management control belongs at user level, since the kernel is far removed from the threads.
Remote Procedure Calls
  • A technique for constructing distributed systems
  • Allows the user to remain unaware of the transport system
  • Called procedure can be located anywhere
  • Strong client/server model of computing
Problems with RPC
  • Adds a large amount of overhead
    • More protection checks on every call
    • All calls trap to the OS
    • Must wait for a response from the other system
    • All calls are treated the same, i.e. as the worst case
Ways to improve
  • More than 95% of RPCs go to the local domain (the same machine)
  • Optimize the most-taken path
  • Reduce the number of system boundaries an RPC crosses
Anatomy of a remote RPC

Diagram: path of a remote RPC from client to server and back.

  1. Client, user level: callRPC()
  2. Client kernel: protection checks; interpret and dispatch
  3. Message transfer to the server
  4. Server kernel: protection checks; schedule a server thread
  5. Server, user level: run the service and reply
  6. Message transfer back to the client
  7. Client kernel: wake up the waiting thread and reschedule
Lightweight RPC (LRPC)
  • Create new routines for cross-domain calls
  • Use RPC-like calls for cross-machine calls
  • Blur the client/server line in the new calls
  • Reduce the number of copies between variables, messages, and stacks by maintaining argument stacks dedicated to individual calls
  • Eliminate the need to schedule a server thread on RPC receipt: the processor can simply switch between the calling and called threads
Anatomy of a local LRPC

Diagram: path of a local (cross-domain) LRPC.

  1. Client, user level: callRPC(); copy arguments to the shared stack
  2. Kernel: protection checks; switch directly into the server domain (there is no need to schedule threads here; the scheduler can be told to just switch the two threads)
  3. Server, user level: run the service; reply; copy results to the stack
  4. Kernel: resume the client thread
Multiprocessors
  • Whole processor contexts can be cached on idle processors
  • Instead of context switching the local processor for a cross-domain call, run the procedure on a processor where the target context is already cached
  • This avoids TLB misses and other costs of the exchange, such as updating virtual-memory state
LRPC Conclusions
  • RPC, as generally implemented, leaves much room for improvement
  • The common case should be emphasized, not the most general case
  • Many unnecessary tasks can be eliminated when optimizing for cross-domain calls