
Fast Multi-Threading on Shared Memory Multi-Processors


Presentation Transcript


  1. Fast Multi-Threading on Shared Memory Multi-Processors Joseph Cordina B.Sc. Computer Science and Physics Year IV

  2. Aims of Project • Implementation of MESH, a user-level threads package, on a shared memory multi-processor machine • Take advantage of concurrent processing while retaining the benefits of fine-grain user-level thread scheduling with low-latency context switching • Enable concurrent inter-process communication on the same machine and over an Ethernet network through the NIC

  3. What Is MESH? • A tightly coupled, fine-grain, uni-processor user-level thread scheduler for the C language • MESH provides an environment in which to manage user-level threads • Makes use of inline active context switching, relying on the compiler's knowledge of the registers in use at any one time (minimum context switch: 55 ns); the idea is illustrated below • Direct hardware access reaches close to the theoretical maximum throughput when using jumbo frames • The communication API supports message pools, ports and poolports
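MESH's inline active context switch is compiler-assisted and cannot be reproduced in portable C, but the underlying idea, switching between user-level threads without involving the kernel scheduler, can be illustrated with POSIX ucontext. This is a minimal sketch under that assumption; the names and the 64 KiB stack are invented, and MESH's real mechanism is far faster than swapcontext.

/* Illustration only: user-level context switching via POSIX ucontext.
 * MESH itself uses inline, compiler-assisted switching instead. */
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, thread_ctx;
static char thread_stack[64 * 1024];        /* arbitrary stack size */

static void worker(void)
{
    puts("user-level thread running");
    swapcontext(&thread_ctx, &main_ctx);    /* yield back to main */
}

int main(void)
{
    getcontext(&thread_ctx);
    thread_ctx.uc_stack.ss_sp = thread_stack;
    thread_ctx.uc_stack.ss_size = sizeof thread_stack;
    thread_ctx.uc_link = &main_ctx;         /* return here when worker ends */
    makecontext(&thread_ctx, worker, 0);

    swapcontext(&main_ctx, &thread_ctx);    /* switch to the worker */
    puts("back in main");
    return 0;
}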

  4. Concurrent Resource Access • Scheduler entry points are explicit • With more than one thread of execution, the scheduler may be entered concurrently • Access to global data therefore needs protection from concurrent modification • Read-only access to data needs no protection • Writes cannot occur concurrently with reads • Resources are protected by spin-locks with small critical sections, keeping busy-wait time to a minimum • Spinning on a read preserves the cache (see the sketch below)
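A minimal sketch of such a spin-on-read (test-and-test-and-set) lock, written with C11 atomics; the names are invented and nothing here is taken from MESH's actual lock code. Waiters poll the lock word with plain loads, which hit in the local cache, and attempt the expensive atomic exchange only once the lock looks free.

/* Spin-on-read lock: read-only polling keeps the cache line local;
 * the atomic exchange is attempted only when the lock appears free. */
#include <stdatomic.h>

typedef struct { atomic_int locked; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
    for (;;) {
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;                                   /* cheap read-only spin */
        if (!atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
            return;                             /* acquired the lock */
    }
}

static void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}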

  5. Scheduling in SMP-MESH • A shared run queue stores user-level thread descriptors at 32 levels of priority • Multiple kernel-level threads access it to retrieve threads and place new ones, which can corrupt data without synchronisation • A lock-protected run queue forces the needed synchronisation • Fine thread granularity increases contention for the run queue lock (a sketch of such a queue follows)
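Building on the spin-lock sketch above, a lock-protected run queue with 32 priority levels might look like the following; the structure layout and names are assumptions for illustration, not SMP-MESH's actual code.

/* Hypothetical shared run queue: 32 FIFO lists of thread descriptors,
 * all guarded by one spin-lock (hence the contention noted above). */
#define NUM_PRIORITIES 32

typedef struct thread_desc {
    struct thread_desc *next;
    int priority;                        /* 0 = highest */
} thread_desc;

typedef struct {
    spinlock_t   lock;                   /* single lock guards all levels */
    thread_desc *head[NUM_PRIORITIES];
    thread_desc *tail[NUM_PRIORITIES];
} run_queue;

static void rq_enqueue(run_queue *rq, thread_desc *t)
{
    spin_lock(&rq->lock);
    t->next = NULL;
    if (rq->tail[t->priority])
        rq->tail[t->priority]->next = t;
    else
        rq->head[t->priority] = t;
    rq->tail[t->priority] = t;
    spin_unlock(&rq->lock);
}

static thread_desc *rq_dequeue(run_queue *rq)
{
    spin_lock(&rq->lock);
    for (int p = 0; p < NUM_PRIORITIES; p++) {
        thread_desc *t = rq->head[p];
        if (t) {
            rq->head[p] = t->next;
            if (!rq->head[p])
                rq->tail[p] = NULL;
            spin_unlock(&rq->lock);
            return t;                    /* oldest thread, highest priority */
        }
    }
    spin_unlock(&rq->lock);
    return NULL;                         /* empty: caller should idle */
}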

  6. Scheduling in SMP-MESH (2) • Unlike SunOS LWPs, Linux does not provide a private memory area for each kernel-level thread • Kernel-level threads therefore identify themselves by comparing stack addresses (sketched below) • The number of kernel-level threads should equal the number of processors for best utilisation
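Assuming each kernel-level thread is created with its own fixed-size stack (as Linux's clone() permits), self-identification by stack comparison can be sketched as below; MAX_KTHREADS, STACK_SIZE and stack_lo[] are hypothetical names.

/* The address of any local variable lies within the calling thread's
 * stack, so scanning the known stack ranges yields the thread's id. */
#define MAX_KTHREADS 8
#define STACK_SIZE   (1 << 20)           /* 1 MiB per kernel-level thread */

static char *stack_lo[MAX_KTHREADS];     /* filled in at thread creation */

static int self_id(void)
{
    char probe;                          /* lives on our own stack */
    for (int i = 0; i < MAX_KTHREADS; i++)
        if (&probe >= stack_lo[i] && &probe < stack_lo[i] + STACK_SIZE)
            return i;
    return -1;                           /* unknown stack: should not happen */
}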

  7. Well Behaved Idling • Upon finding the run queue empty, kernel-level threads sleep in the kernel, giving up the processor for other applications to execute • Sleeping on a semaphore removes the risk of a lost wakeup, unlike signals and message passing • Upon re-awakening, new user-level threads are passed directly to the sleeping thread, without touching the run queue (see the sketch below)
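One way to realise this idle path is with a POSIX semaphore and a direct-handoff mailbox, simplified here to a single sleeper; the names are invented for illustration and reuse the run-queue sketch above.

/* Idle path: sleep in the kernel on a semaphore when the run queue is
 * empty; the waker publishes a thread in the mailbox before posting,
 * so the sleeper receives work without touching the run queue. */
#include <semaphore.h>
#include <stddef.h>

static sem_t idle_sem;                   /* counts pending wakeups */
static thread_desc *handoff;             /* direct-handoff mailbox */

static thread_desc *get_work(run_queue *rq)
{
    thread_desc *t = rq_dequeue(rq);
    if (t)
        return t;
    sem_wait(&idle_sem);                 /* blocks; no lost-wakeup risk */
    t = handoff;                         /* thread handed to us directly */
    handoff = NULL;
    return t;
}

static void wake_idler(thread_desc *t)
{
    handoff = t;                         /* publish the thread first...  */
    sem_post(&idle_sem);                 /* ...then wake the sleeper     */
}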

  8. Load Balancing • No kernel-level thread is idle while a user-level thread sits on the shared run-queue • The run-queue's FIFO structure ensures that the oldest threads are executed first • Cache affinity is not preserved when using the shared run-queue, since a thread may resume on a different processor

  9. Communication in SMP-MESH • Inter-thread communication works both within one system and between different systems • Every instance of a message pool, port or poolport has a private lock, allowing maximally concurrent communication • Contiguous memory must be protected when creating messages • Message transmission to the NIC uses a lock; reception from the NIC uses none (a sketch of per-instance locking follows)
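Per-instance locking can be sketched as follows, reusing the spin-lock from slide 4 and hypothetical msg/port types: sends to different ports proceed fully in parallel, while only sends to the same port serialise.

/* Each port carries its own private lock, so communication on distinct
 * ports never contends; this is what maximises concurrency. */
typedef struct msg {
    struct msg *next;
    /* payload ... */
} msg;

typedef struct {
    spinlock_t lock;                     /* private to this port */
    msg *head, *tail;
} port;

static void port_send(port *p, msg *m)
{
    m->next = NULL;
    spin_lock(&p->lock);                 /* serialises this port only */
    if (p->tail)
        p->tail->next = m;
    else
        p->head = m;
    p->tail = m;
    spin_unlock(&p->lock);
}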

  10. Results • 500,000 context switches were measured at differing thread granularities • Contention for shared resources at fine thread granularity gives worse performance on an SMP machine than on a uni-processor machine

  11. Conclusion • SMP-MESH takes advantage of multi-processors for fine-grain multi-threading • Concurrency is encouraged in all areas unless there is a risk of data corruption • Overheads relative to uni-processor MESH are expected, yet are counter-balanced as the number of processors increases • Considerable speedup is available at minimum cost; the main disadvantage is the need for more careful synchronisation in application design
