
Introduction to Threads



Presentation Transcript


  1. Introduction to Threads
  • Overview
  • Multithreading Models
  • Thread Libraries
  • Threading Issues
  • Operating System Examples
    • Windows XP Threads
    • Linux Threads

  2. Threads
  • A thread is just a sequence of instructions to execute.
  • Threads share the same memory space as other threads in the same application, so they automatically share data and variables.
  • Threads can run on different cores of a multicore processor, which makes applications faster and more responsive.
  • Even on a single-core processor, threads make an application more responsive: if one thread blocks waiting for I/O, other threads can still run.
  • Processes each have their own virtual memory address space, so the OS takes much longer to switch between processes than between threads, and sharing data between processes requires additional steps and overhead. For these reasons, most applications use one process with several threads.
  • In C/C++, a thread typically runs the code in an ordinary function, and a special API call starts a new thread running that function, as in the sketch below.
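
  To make that last point concrete, here is a minimal sketch using the POSIX Pthreads API (covered later in these slides): pthread_create() starts a new thread running an ordinary C function, and pthread_join() waits for it to finish. Compile with gcc -pthread.

  #include <pthread.h>
  #include <stdio.h>

  /* The function the new thread will execute. */
  void *worker(void *arg) {
      printf("Hello from the worker thread\n");
      return NULL;
  }

  int main(void) {
      pthread_t tid;
      pthread_create(&tid, NULL, worker, NULL);  /* start a new thread running worker() */
      pthread_join(tid, NULL);                   /* wait for the worker thread to finish */
      return 0;
  }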

  3. Single and Multithreaded Processes

  4. Benefits of Threads
  • Responsiveness
  • Applications can run up to N times faster on an N-core processor
  • Resource Sharing
  • Economy
  • Scalability

  5. Multicore Programming
  • An application runs on only one processor core unless it uses multiple threads.
  • Multicore systems put more pressure on programmers to use threads. Multithreaded application challenges include:
    • Dividing activities
    • Balancing the computational load
    • Data splitting
    • Data dependency
    • Testing and debugging

  6. Concurrent Execution on a Single-core System
  • The OS can time-slice among the four threads T1…T4 on the single core.

  7. Parallel Execution on a Multicore System
  • The OS can time-slice the four threads T1…T4 on two processor cores.
  • Two threads can run in parallel on different cores, so the application could run up to twice as fast.
  • Without threads, an application can run on only one core!

  8. User Threads
  • Thread management done by a user-level threads library
  • Three primary thread libraries:
    • POSIX Pthreads
    • Win32 threads
    • Java and C# threads

  9. Thread Libraries
  • A thread library provides the programmer with an API for creating and managing threads
  • Two primary ways of implementing:
    • Library entirely in user space
    • Kernel-level library supported by the OS

  10. Pthreads
  • A POSIX standard (IEEE 1003.1c) API for thread creation and synchronization
  • The API specifies the behavior of the thread library; the implementation is up to the developers of the library
  • Common in UNIX operating systems (Solaris, Linux, Mac OS X)
  • Can also be added to Windows by installing an optional Pthreads library

  11. Java and C# Threads
  • Thread support is built into these newer languages with keywords
  • Java threads are managed by the JVM
  • C# thread support is in the .NET Framework's Common Language Runtime (the CLR, .NET's counterpart to the JVM)
  • Typically implemented using the threads model provided by the underlying OS
  • Java and C# threads may be created by:
    • Extending the Thread class
    • Implementing the Runnable interface

  12. Threading Issues
  • Semantics of fork() and exec() system calls
  • Thread cancellation of a target thread
    • Asynchronous or deferred
  • Signal handling
  • Thread pools
  • Thread-specific data
  • Scheduler activations

  13. Thread Cancellation
  • Terminating a thread before it has finished
  • Two general approaches:
    • Asynchronous cancellation terminates the target thread immediately
    • Deferred cancellation allows the target thread to periodically check whether it should be cancelled (see the sketch below)
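
  Pthreads supports both approaches directly. As a minimal sketch: pthread_cancel() requests cancellation; in the default deferred mode the target thread is only cancelled at cancellation points, such as an explicit pthread_testcancel() call (pthread_setcanceltype() can select asynchronous cancellation instead).

  #include <pthread.h>
  #include <stdio.h>

  void *worker(void *arg) {
      /* Deferred cancellation is the Pthreads default. */
      while (1) {
          /* ... do a unit of work ... */
          pthread_testcancel();   /* explicit cancellation point */
      }
      return NULL;
  }

  int main(void) {
      pthread_t tid;
      pthread_create(&tid, NULL, worker, NULL);
      pthread_cancel(tid);        /* request cancellation */
      pthread_join(tid, NULL);    /* thread exits at its next cancellation point */
      printf("worker cancelled\n");
      return 0;
  }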

  14. Signal Handling
  • Signals are used in UNIX systems to notify a process that a particular event has occurred
  • A signal handler is used to process signals:
    • Signal is generated by a particular event
    • Signal is delivered to a process
    • Signal is handled
  • Options (the last is sketched below):
    • Deliver the signal to the thread to which the signal applies
    • Deliver the signal to every thread in the process
    • Deliver the signal to certain threads in the process
    • Assign a specific thread to receive all signals for the process
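
  The last option is a common Pthreads pattern. A minimal sketch: the signal is blocked in every thread with pthread_sigmask(), and one dedicated thread waits for it synchronously with sigwait().

  #include <pthread.h>
  #include <signal.h>
  #include <stdio.h>

  static sigset_t set;

  void *signal_thread(void *arg) {
      int sig;
      sigwait(&set, &sig);   /* wait synchronously for a signal blocked in all threads */
      printf("received signal %d\n", sig);
      return NULL;
  }

  int main(void) {
      sigemptyset(&set);
      sigaddset(&set, SIGINT);
      /* Block SIGINT in this thread; threads created afterwards inherit the mask. */
      pthread_sigmask(SIG_BLOCK, &set, NULL);

      pthread_t tid;
      pthread_create(&tid, NULL, signal_thread, NULL);
      pthread_join(tid, NULL);   /* Ctrl-C is now handled only by signal_thread */
      return 0;
  }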

  15. Thread Pools
  • Create a number of threads in a pool, where they await work (sketched below)
  • Advantages:
    • Usually somewhat faster to service a request with an existing thread than to create a new thread
    • Allows the number of threads in the application(s) to be bounded by the size of the pool
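
  A minimal sketch of the idea using Pthreads (the names submit, worker, and the int "tasks" are illustrative, not from any standard API): worker threads sleep on a condition variable and wake when work is enqueued, so no thread is created per request.

  #include <pthread.h>
  #include <stdio.h>

  #define POOL_SIZE  4
  #define QUEUE_SIZE 16

  static int queue[QUEUE_SIZE];     /* circular queue of "tasks" (just ints here) */
  static int head = 0, tail = 0, pending = 0;
  static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;

  static void submit(int task) {    /* full-queue check omitted for brevity */
      pthread_mutex_lock(&qlock);
      queue[tail] = task;
      tail = (tail + 1) % QUEUE_SIZE;
      pending++;
      pthread_cond_signal(&qcond);  /* wake one sleeping worker */
      pthread_mutex_unlock(&qlock);
  }

  static void *worker(void *arg) {
      while (1) {
          pthread_mutex_lock(&qlock);
          while (pending == 0)
              pthread_cond_wait(&qcond, &qlock);  /* sleep until work arrives */
          int task = queue[head];
          head = (head + 1) % QUEUE_SIZE;
          pending--;
          pthread_mutex_unlock(&qlock);
          printf("worker %ld got task %d\n", (long)arg, task);
      }
      return NULL;
  }

  int main(void) {
      pthread_t pool[POOL_SIZE];
      for (long i = 0; i < POOL_SIZE; i++)
          pthread_create(&pool[i], NULL, worker, (void *)i);
      for (int t = 0; t < 8; t++)
          submit(t);          /* reuses existing threads; no creation cost per task */
      pthread_exit(NULL);     /* main exits; workers keep servicing the queue */
  }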

  16. Windows Threads
  • Implements the one-to-one mapping; threads are kernel-level
  • Each thread contains:
    • A thread id
    • A register set
    • Separate user and kernel stacks
    • A private data storage area
  • The register set, stacks, and private storage area are known as the context of the thread
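
  For reference, a minimal sketch of thread creation with the Win32 API (CreateThread() and WaitForSingleObject() are the standard calls; error handling omitted):

  #include <windows.h>
  #include <stdio.h>

  /* Win32 thread functions use this signature. */
  DWORD WINAPI worker(LPVOID param) {
      printf("Hello from thread %lu\n", GetCurrentThreadId());
      return 0;
  }

  int main(void) {
      DWORD tid;
      HANDLE h = CreateThread(NULL, 0, worker, NULL, 0, &tid);
      WaitForSingleObject(h, INFINITE);   /* wait for the thread to exit */
      CloseHandle(h);
      return 0;
  }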

  17. Linux Threads
  • Linux refers to them as tasks rather than threads
  • Thread creation is done through the clone() system call
  • clone() allows a child task to share the address space of the parent task (process), as in the sketch below
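
  A minimal sketch using the glibc clone() wrapper (the flags shown are illustrative; real thread libraries pass several more): CLONE_VM makes the child share the parent's address space, which is what gives Linux tasks their thread-like behavior.

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/wait.h>

  static int shared = 0;

  static int child_fn(void *arg) {
      shared = 42;      /* visible to the parent because of CLONE_VM */
      return 0;
  }

  int main(void) {
      const size_t STACK_SIZE = 64 * 1024;
      char *stack = malloc(STACK_SIZE);   /* child needs its own stack */
      if (!stack) { perror("malloc"); exit(1); }

      /* CLONE_VM shares the address space; SIGCHLD lets waitpid() work. */
      int tid = clone(child_fn, stack + STACK_SIZE, CLONE_VM | SIGCHLD, NULL);
      if (tid == -1) { perror("clone"); exit(1); }

      waitpid(tid, NULL, 0);
      printf("shared = %d\n", shared);    /* prints 42 */
      free(stack);
      return 0;
  }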

  18. Background on the Need for Synchronization
  • Threads may need to wait for other threads to finish an operation
  • Additionally, concurrent access to shared data by threads may result in data inconsistency (i.e., incorrect values)
  • Maintaining data consistency requires mechanisms to ensure the orderly execution of cooperating processes (or threads)

  19. Example Problem
  • Suppose two threads share a common buffer array. The producer puts items into the buffer and the consumer removes them.
  • A solution to this two-thread producer-consumer problem that can fill all the buffer slots uses an integer count to keep track of the number of full buffer slots. Initially, count is set to 0. It is incremented by the producer after it produces a new item and decremented by the consumer after it consumes one.

  20. Producer

  while (true) {
      /* produce an item and put it in nextProduced */
      while (count == BUFFER_SIZE)
          ;  /* do nothing: buffer is full */
      buffer[in] = nextProduced;
      in = (in + 1) % BUFFER_SIZE;
      count++;
  }

  21. Consumer

  while (true) {
      while (count == 0)
          ;  /* do nothing: buffer is empty */
      nextConsumed = buffer[out];
      out = (out + 1) % BUFFER_SIZE;
      count--;
      /* consume the item in nextConsumed */
  }

  22. Critical Section
  • A code segment that reads and writes global data shared between threads or processes is called a "critical section"
  • Possible race-condition bugs on global variable values – an example follows
  • The OS synchronization API is used to solve this
  • You must be careful to use OS synchronization primitives to control access to a critical section, or hidden bugs will appear in the code

  23. Race Condition on Count
  • count++ could be implemented as
      register1 = count
      register1 = register1 + 1
      count = register1
  • count-- could be implemented as
      register2 = count
      register2 = register2 - 1
      count = register2
  • Consider this execution interleaving, with count = 5 initially:
      S0: producer executes register1 = count          {register1 = 5}
      S1: producer executes register1 = register1 + 1  {register1 = 6}
      S2: consumer executes register2 = count          {register2 = 5}
      S3: consumer executes register2 = register2 - 1  {register2 = 4}
      S4: producer executes count = register1          {count = 6}
      S5: consumer executes count = register2          {count = 4}

  24. Need an Atomic Operation
  • The count++ and count-- code must run to completion before switching to another thread, or bugs like the one above appear
  • An atomic operation here means a basic operation that cannot be stopped or interrupted in the middle to switch to another thread
  • Race conditions show up more readily on systems with multiple processors, since threads are truly running in parallel
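
  C11 provides atomic types that make the increment itself indivisible. A minimal sketch (one way to get an atomic operation, alongside the mutex approach on the next slides):

  #include <stdatomic.h>
  #include <pthread.h>
  #include <stdio.h>

  atomic_int count = 0;

  void *adder(void *arg) {
      for (int i = 0; i < 100000; i++)
          atomic_fetch_add(&count, 1);  /* read-modify-write as one indivisible step */
      return NULL;
  }

  int main(void) {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, adder, NULL);
      pthread_create(&t2, NULL, adder, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("count = %d\n", atomic_load(&count));  /* always 200000; no lost updates */
      return 0;
  }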

  25. Solution to Critical-Section Problem
  1. Mutual Exclusion (Mutex) – if process Pi is executing in its critical section, then no other processes can be executing in their critical sections
  2. Progress – if no process is executing in its critical section and there exist some processes that wish to enter their critical sections, then the selection of the process that will enter its critical section next cannot be postponed indefinitely
  3. Bounded Waiting – a bound must exist on the number of times other processes are allowed to enter their critical sections after a process has made a request to enter its critical section and before that request is granted
  • Assume that each process executes at a nonzero speed
  • No assumption is made concerning the relative speed of the N processes

  26. Solution to Critical-Section Problem Using Mutex Locks

  do {
      acquire lock
          critical section
      release lock
          remainder section
  } while (TRUE);
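
  Mapped onto the producer/consumer count from slides 20-23, a minimal sketch with a Pthreads mutex (the lock/unlock pairs bracket the critical section):

  #include <pthread.h>

  static int count = 0;
  static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

  void producer_update(void) {
      pthread_mutex_lock(&count_lock);     /* acquire lock */
      count++;                             /* critical section */
      pthread_mutex_unlock(&count_lock);   /* release lock */
  }

  void consumer_update(void) {
      pthread_mutex_lock(&count_lock);
      count--;
      pthread_mutex_unlock(&count_lock);
  }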

  27. Deadlock and Starvation
  • Deadlock – two or more processes or threads wait indefinitely for an event that can be caused by only one of the waiting processes
  • Let S and Q be two semaphores initialized to 1 (i.e., mutual exclusion locks):

      P0                 P1
      wait(S);           wait(Q);
      wait(Q);           wait(S);
        ...                ...
      signal(S);         signal(Q);
      signal(Q);         signal(S);

  • Starvation – indefinite blocking: a process may never be removed from the semaphore queue in which it is suspended
  • Priority Inversion – a scheduling problem in which a lower-priority process holds a lock needed by a higher-priority process; the scheduler may need to run the lower-priority process first so the higher-priority one can continue, which inverts the intended priorities
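
  One standard way to avoid the deadlock above is to make every thread acquire the locks in the same global order. A minimal sketch (S and Q here are Pthreads mutexes rather than semaphores, but the ordering idea is the same):

  #include <pthread.h>

  static pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
  static pthread_mutex_t Q = PTHREAD_MUTEX_INITIALIZER;

  /* Both threads acquire S before Q, so the circular wait
     in the P0/P1 example above cannot occur. */
  void *p0(void *arg) {
      pthread_mutex_lock(&S);
      pthread_mutex_lock(&Q);
      /* ... use both resources ... */
      pthread_mutex_unlock(&Q);
      pthread_mutex_unlock(&S);
      return NULL;
  }

  void *p1(void *arg) {
      pthread_mutex_lock(&S);   /* same order as p0, not Q first */
      pthread_mutex_lock(&Q);
      /* ... use both resources ... */
      pthread_mutex_unlock(&Q);
      pthread_mutex_unlock(&S);
      return NULL;
  }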

  28. RTOS
  • Real-Time Operating System (RTOS)
  • Used in systems that need a fast response time to external events, on the order of milliseconds
  • This is about 10-100X faster than the response time of a PC
  • The general-purpose OS in a PC is optimized for throughput and a fast graphical user interface, at the expense of real-time response

  29. Mbed RTOS & Threads
  • Uses a 1 ms time slice to switch between threads; this is about 10-100X faster than the time slice on a PC
  • Memory limits the system to around 8 threads: each thread needs its own stack, and the RTOS itself uses a fair chunk of the 32K of RAM. RAM is used for variables only; nonvolatile flash memory stores code and constants, and with 512K of it, flash is typically not the limiting factor.

  30. Mbed RTOS
  • The mbed RTOS also provides some basic synchronization primitives:
  • Mutex lock – used to lock and unlock access to shared memory (variables) and I/O devices (see the sketch below)
  • On the mbed compiler, declaring a simple built-in global variable (but not an array) with the keyword volatile gives roughly the effect of a mutex lock for that variable
  • Signals – can be used to send signals between threads
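
  A minimal sketch of a mutex-protected shared variable in the style of the classic mbed RTOS API these slides describe (the Thread constructor and Thread::wait() shown match the older mbed-rtos library; newer mbed OS versions use thread.start() and ThisThread::sleep_for() instead, so treat the exact signatures as assumptions):

  #include "mbed.h"
  #include "rtos.h"

  Mutex count_mutex;       // guards the shared counter
  int count = 0;           // shared between both threads

  void counter_thread(void const *arg) {
      while (true) {
          count_mutex.lock();      // enter critical section
          count++;
          count_mutex.unlock();    // leave critical section
          Thread::wait(500);       // let other threads run for 500 ms
      }
  }

  int main() {
      Thread t(counter_thread);    // classic API: thread starts running immediately
      while (true) {
          count_mutex.lock();
          printf("count = %d\r\n", count);
          count_mutex.unlock();
          Thread::wait(1000);
      }
  }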

  31. Mbed RTOS
  • Semaphores – a more advanced synchronization primitive than a mutex; can count things, but is also slower than a mutex
  • Thread::wait(x ms) – tells the RTOS scheduler not to run this thread again until x ms of time have passed. Useful to keep a thread from using processor time when it does not need it; other threads run during the delay.
  • Don't use the plain wait() call – use Thread::wait(), which lets other threads use the processor

  32. Mbed RTOS
  • Free for ARM mbed users (many RTOSes require a license fee). Just add the RTOS library to the project and a new #include "rtos.h" after the mbed.h include.
  • Documentation and code examples are found in the mbed Handbook under "Real Time Operating System": click the "mbed RTOS" link
  • Free networking libraries that use the RTOS are also available for Internet of Things (IoT) devices
