Fast Multi-Threading on Shared Memory Multi-Processors
Presentation Transcript

Fast Multi-Threading on Shared Memory Multi-Processors

Joseph Cordina

B.Sc. Computer Science and Physics

Year IV

Aims of Project

  • Implementation of MESH, a user-level threads package, onto a shared memory multi-processor machine

  • Take advantage of concurrent processing while maintaining the advantages of fine grain user level thread scheduling with low latency context switching

  • Enable concurrent inter-process communication on the same machine and on an Ethernet network through the NIC

What Is MESH ?

  • A tightly coupled fine grain uni-processor user level thread scheduler for the C language

  • MESH provides an environment in which to manage user level threads

  • Makes use of inline active context switching, relying on the compiler's knowledge of the registers in use at any one time (minimum context switch: 55 ns)

  • Direct hardware access achieves throughput close to the maximum theoretical limit when using jumbo frames

  • Communication API supports message pools, ports and poolports

Concurrent Resource Access

  • Scheduler entry points are explicit

  • Scheduler entry occurs concurrently when using more than one thread of execution

  • Access to global data needs to be protected from concurrency

    • Data read access does not need to be protected

    • Data write access cannot occur concurrently with data reads

  • Spin-lock-protected resources with small critical sections, minimising busy-wait time

  • Spin on read to preserve cache

Scheduling in SMP-MESH

  • Shared run queue to store user-level thread descriptors at 32 levels of priority

  • Multiple kernel-level threads access it concurrently to retrieve threads and enqueue new ones, which can lead to data corruption

  • Lock protected run queue forces synchronisation

  • Fine thread granularity increases contention for run queue lock

Scheduling in SMP-MESH (2)

Unlike SunOS LWPs, Linux does not provide a private memory area for each kernel-level thread

  • Kernel-level threads therefore identify themselves by comparing stack addresses

  • The number of kernel-level threads should equal the number of processors for best utilisation

Well Behaved Idling

  • Upon finding the run queue empty, kernel level threads sleep in the kernel giving up the processor for other applications to execute

  • Sleeping on a semaphore removes the risk of a lost wake-up, unlike signals and message passing

  • Upon re-awakening, new user-level threads are handed directly to the sleeping thread, without accessing the run queue

Load Balancing

  • No Kernel Level Thread is idle when a user level thread is on the shared run-queue

  • The run-queue’s FIFO structure ensures that the oldest threads are executed first

  • Cache affinity is not preserved when using the shared run-queue, since a thread may resume on a different processor

Communication in SMP-MESH

  • Inter-thread communication on the same system and between different systems

  • All instances of message pools, ports and poolports have a private lock, allowing maximum concurrency in communication

  • Contiguous memory needs to be protected when creating messages

  • Message transmission to the NIC uses a lock; reception from the NIC needs no lock


  • 500,000 context switches at differing thread granularity

  • Contention for shared resources at fine thread granularity gives worse performance on an SMP machine than on a uni-processor machine


  • SMP-MESH takes advantage of multi-processors for fine grain multi-threading

  • Concurrency is encouraged in all areas unless risk of data corruption exists

  • Overheads relative to uni-processor MESH are expected, yet are counter-balanced as the number of processors increases

  • Considerable speedup is available at minimum cost; the main disadvantage is the need for more careful synchronisation in application design