Lightweight (Multithreads) Processes


Many experimental operating systems, and some commercial ones, have recently added support for concurrent programming. The most popular approach is to allow multiple lightweight processes (lwp), or “threads”, within a single address space, used from within a single program. Concurrent programming has many problems that do not occur in sequential programming. A thread is a single sequential flow of control. In a high-level language we program a thread using procedures, where procedure calls follow the traditional stack discipline. Having “multiple threads” in a program means that at any instant the program has multiple points of execution, one in each of its threads.

The programmer views the threads as executing simultaneously, as if the computer had as many processors as there are threads. Having threads execute within a “single address space” means that the computer’s addressing hardware permits the threads to read and write the same memory locations. In a high-level language, this means that the off-stack (global) variables are shared among all the threads of the program. Each thread executes on a separate call stack with its own separate local variables. The programmer is responsible for using the synchronization mechanisms of the thread facility to ensure that the shared memory is accessed in a manner that will give the correct answer.

Thread facilities are “lightweight”. This means that thread creation, existence, destruction and synchronization primitives are cheap enough that the programmer will use them for all his concurrency needs. In most conventional operating systems we have multiple separate processes, each running in a separate address space. This tends to be expensive to set up, and the costs of communicating between address spaces are high.

Why use concurrency?
• Life would be simpler without concurrency. Who needs it:
• Multiprocessing (simultaneous points of execution).
• Driving slow devices such as disks, networks, terminals and printers. The program does other useful work while waiting for the device.
• Building distributed systems, such as shared network servers. A server is willing to service requests from multiple clients, and multiple threads allow it to handle clients’ requests in parallel.

Threads are “lighter” than ordinary processes. They represent a thread of control not bound to an address space. Threads operate more efficiently than ordinary SunOS processes because they communicate via shared memory instead of a file system. Because threads can share a common address space, the cost of creating tasks and of intertask communication is less than the cost of using more “heavyweight” primitives.

• Thread creation and destruction, status gathering, scheduling manipulation, suspend and resume.
• Multiplexing the clock (any number of threads can sleep concurrently).
• Individualized context switching. It is possible to specify that a given set of threads will touch floating point registers, and only those threads will context switch these registers.
• Monitors and condition variables to synchronize threads.
• Extended rendezvous (message send-receive-reply) between threads.

Scheduling is, by default, priority based and non-preemptive within a priority. It is possible to write your own scheduler. A high-priority thread may periodically reshuffle the queue of time-sliced threads which are at a lower priority. Threads currently lack kernel support, so system calls still serialize thread activity. When a set of threads is running, it is assumed that they all share memory.

Threads

The lwp mechanism allows several threads of control to share the same address space. Each thread is represented by a procedure that is converted into a thread by lwp_create(). Once created, a thread is an independent entity, with its own stack as supplied by the creator. A collection of threads runs within a single ordinary process. The collection is called a pod.

LWPs, or threads, are scheduled by priority. The highest-priority non-blocked thread is the one executing. Within a priority, threads execute on a first-come, first-served (FCFS) basis. Example 1

#include <lwp/lwp.h>
#include <lwp/stackdep.h>

int lwp_create(tid, func, prio, flags, stack, nargs, arg1, ..., argn)
thread_t *tid;
void (*func)();
int prio;
int flags;
stkalign_t *stack;
int nargs;
int arg1, ..., argn;

lwp_create() creates a lightweight process which starts at address func and has stack segment stack. If stack is NULL, the thread is created in a suspended state. prio is the scheduling priority of the thread (higher priorities are favored by the scheduler). The identity of the new thread is filled in the reference parameter tid. flags describes some options on the new thread.

The first time an lwp primitive is used, the lwp library automatically converts the caller (i.e., main) into a thread with the highest available scheduling priority. Scheduling is, by default, non-preemptive within a priority, and within a priority, threads enter the run queue on a FIFO basis (that is, whenever a thread becomes eligible to run, it goes to the end of the run queue of its particular priority). Thus, a thread continues to run until it voluntarily relinquishes control or an event (including thread creation) occurs to enable a higher priority thread. Some primitives may cause the current thread to block, in which case the unblocked thread with the highest priority runs next. When several threads are created with the same priority, they are queued for execution in the order of creation.
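The queueing rules above can be sketched as a small plain-C model. This is not the LWP library: the names runq, rq_enqueue(), rq_running() and rq_remove_running(), and the constants NPRIO and QCAP, are hypothetical and exist only for illustration. The model assumes one FIFO per priority level and treats the front of the highest non-empty queue as the running thread.

```c
#include <stddef.h>

#define NPRIO 4  /* hypothetical number of priority levels */
#define QCAP  8  /* hypothetical capacity of each per-priority queue */

/* One FIFO of thread ids per priority level (a model, not the LWP library). */
struct runq {
    int ids[NPRIO][QCAP];
    size_t len[NPRIO];
};

void rq_init(struct runq *rq)
{
    for (size_t p = 0; p < NPRIO; p++)
        rq->len[p] = 0;
}

/* A thread that becomes eligible to run joins the END of its priority's queue.
 * Returns 0 on success, -1 if prio is out of range or the queue is full. */
int rq_enqueue(struct runq *rq, int prio, int id)
{
    if (prio < 0 || prio >= NPRIO || rq->len[prio] == QCAP)
        return -1;
    rq->ids[prio][rq->len[prio]++] = id;
    return 0;
}

/* The running thread is the front of the highest non-empty priority queue.
 * Returns its id, or -1 if no thread is runnable. */
int rq_running(const struct runq *rq)
{
    for (int p = NPRIO - 1; p >= 0; p--)
        if (rq->len[p] > 0)
            return rq->ids[p][0];
    return -1;
}

/* The running thread blocks or dies: remove it so the next thread can run. */
void rq_remove_running(struct runq *rq)
{
    for (int p = NPRIO - 1; p >= 0; p--) {
        if (rq->len[p] > 0) {
            for (size_t i = 1; i < rq->len[p]; i++)
                rq->ids[p][i - 1] = rq->ids[p][i];
            rq->len[p]--;
            return;
        }
    }
}
```

In this model, enqueueing a thread at a higher priority immediately changes the result of rq_running(), which mirrors the statement that an event enabling a higher priority thread takes control away from the current one.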

There is no concept of ancestry in threads: the creator of a thread has no special relation to the thread it created. When all threads have died, the pod terminates. lwp_destroy() is a way to explicitly terminate a thread or agent (instead of having an executing thread "fall through", which also terminates the thread). tid specifies the id of the thread or agent to be terminated. If tid is SELF, the invoking thread is destroyed. Upon termination, the resources (messages, monitor locks, agents) owned by the thread are released, in some cases resulting in another thread being notified of the death of its peer (by having a blocking primitive become unblocked with an error indication).

lwp_newstk() returns a cached stack that is suitable for use in an lwp_create() call. lwp_setstkcache() must be called (once) prior to any use of lwp_newstk(). If running under SunOS 4.x, the stacks allocated by lwp_newstk() will be red-zone protected (an attempt to reference below the stack bottom will result in a SIGSEGV event).

Stack Size

How big should the thread stacks be? Once that is decided, we can decide whether we need protection against exceeding this limit. Unix presents the same problem to the user. Allocating large stacks is not a performance drain because pages are only allocated if actually used. Hence, we can allocate very large stacks.

Stacks are problematic with lightweight processes. Ideally, the stack of each thread is red-zone protected so that one thread's stack does not unexpectedly grow into the stack of another. In addition, stacks should ideally be of unbounded length, grown as needed. The process stack is a maximum-sized segment. This stack is red-zone protected, and you can even try to extend it beyond its initial maximum size in some cases.

The stack used by main() is the same stack that the system allocates for a process on fork(). For allocating other thread stacks, the client is free to use any statically or dynamically allocated memory (using memory from main()'s stack is subject to the stack resource limit for any process created by fork()). Threads created with stacks from lwp_newstk() should not use the NOLASTRITES flag. If they do, cached stacks will not be returned to the cache when a thread dies.

Protecting against stack overflow
• lwp_newstk() automatically allocates protected stacks. A reference beyond the stack limit generates a SIGSEGV event.
• There are two ways to check stack integrity when not using lwp_newstk():
• Use the CHECK() macro at the beginning of each procedure (before any locals are assigned), in conjunction with lwp_checkstkset(). If the procedure exceeds the thread stack limit, the procedure will return and set a global variable.
• Use lwp_stkcswset(). This enables stack checking on context switching. This is transparent to the client programs. It may not detect an error until after the stack limit has been exceeded.

With lwp_stkcswset() an error is considered fatal. CHECK() detects errors before any damage is done, so error recovery is possible. It is possible to assign a statically allocated stack to a thread. Example 2
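Why a check at context-switch time can only report an overflow after the fact can be illustrated with a guard-word scheme. This is a generic technique and an assumption on our part, not a description of how lwp_stkcswset() is implemented; the names stack_guard_init(), stack_guard_ok(), GUARD_WORDS and GUARD_MAGIC are hypothetical. The sketch assumes a stack that grows toward low addresses, so the guard sits at the low end of a statically allocated buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORDS 4            /* hypothetical size of the guard area */
#define GUARD_MAGIC 0xDEADBEEFu  /* hypothetical fill pattern */

/* Fill the low end of a stack buffer with a known pattern.
 * The buffer must hold at least GUARD_WORDS words. */
void stack_guard_init(uint32_t *stack)
{
    for (size_t i = 0; i < GUARD_WORDS; i++)
        stack[i] = GUARD_MAGIC;
}

/* Called when switching away from the thread: returns 1 if the guard is
 * intact, 0 if something already wrote past the stack limit. */
int stack_guard_ok(const uint32_t *stack)
{
    for (size_t i = 0; i < GUARD_WORDS; i++)
        if (stack[i] != GUARD_MAGIC)
            return 0;
    return 1;
}
```

By the time stack_guard_ok() returns 0 the overwrite has already happened, which is why a check of this kind has to treat the error as fatal, whereas a check made on procedure entry can refuse to proceed before any damage is done.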

UNIX command limit

cputime       unlimited
filesize      unlimited
datasize      2097152 kbytes
stacksize     65536 kbytes
coredumpsize  unlimited
memoryuse     247848 kbytes
vmemoryuse    2097152 kbytes
descriptors   200
threads       1024

Coroutines

It is possible to use threads as pure coroutines: one thread explicitly yields control to another. lwp_yield() allows a thread to yield to either a specific thread at the same priority, or the next thread in line at the same priority. Since we are using coroutines, a single priority (MINPRIO) is sufficient and there is no need to increase the number of available priorities with pod_setmaxpri().

If lwp_yield(THREADNULL) is called, the current thread goes to the end of its scheduling queue. When a specific yield is performed, the specified thread jumps in front of the current one at the front of the scheduling queue. Example 3: three coroutines, main(), coroutine() and other(). The numbers 1-7 have to be printed.

lwp_self() returns the ID of the current thread in tid. This is the only way to retrieve the identity of main. lwp_yield() allows the currently running thread to voluntarily relinquish control to another thread with the same scheduling priority. If tid is SELF, the next thread in the same priority queue of the yielding thread will run and the current thread will go to the end of the scheduling queue. Otherwise, it is the ID of the thread to run next, and the current thread will take second place in the scheduling queue.
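The two yield behaviours described above can be modelled on an array that represents one priority's scheduling queue, with element 0 as the running thread. This is a plain-C sketch, not the LWP library: yield_model() and NO_TARGET are hypothetical names, and thread ids are assumed to be non-negative integers.

```c
#include <stddef.h>

#define NO_TARGET (-1)  /* hypothetical stand-in for "no specific thread" */

/* q[0] is the running thread; q[1..n-1] wait in FIFO order.
 * Returns 0 on success, -1 if the queue is empty or target is not waiting. */
int yield_model(int *q, size_t n, int target)
{
    if (n == 0)
        return -1;

    if (target == NO_TARGET) {
        /* Plain yield: the current thread goes to the end of the queue. */
        int cur = q[0];
        for (size_t i = 1; i < n; i++)
            q[i - 1] = q[i];
        q[n - 1] = cur;
        return 0;
    }

    /* Specific yield: the target jumps to the front and the current
     * thread takes second place. */
    for (size_t i = 1; i < n; i++) {
        if (q[i] == target) {
            for (size_t j = i; j > 0; j--)
                q[j] = q[j - 1];
            q[0] = target;
            return 0;
        }
    }
    return -1;
}
```

Starting from the queue {1, 2, 3}, a plain yield gives {2, 3, 1}, and a subsequent yield to thread 1 gives {1, 2, 3}, with the yielding thread 2 in second place.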

Custom Schedulers
• There are three ways to provide scheduling control of threads to the client.
• Do nothing and provide the client a pointer to a thread context which can be scheduled at will. The problem: the client has to build its own scheduler from scratch.
• Provide a single scheduling policy with very little client control over what runs next. The UNIX system provides such a policy. It is difficult to implement policies that take into account the differing response time needs of client threads.
• Middle ground: a default scheduling policy, with enough primitives provided that it is possible to construct a wide variety of scheduling policies. This avoids the two problems above.

To custom-build your own scheduler we can use the following primitives:

lwp_sleep() blocks the thread executing this primitive for at least the time specified by timeout. Scheduling of threads is, by default, preemptive (higher priorities preempt lower ones) across priorities and non-preemptive within a priority. lwp_resched() moves the front thread for a given priority to the end of the scheduling queue. Thus, to achieve a preemptive round-robin scheduling discipline, a high priority thread can periodically wake up and shuffle the queue of threads at a lower priority. lwp_resched() does not affect threads which are blocked. If the priority of the rescheduled thread is the same as that of the caller, the effect is the same as lwp_yield().

lwp_setpri() is used to alter (raise or lower) the scheduling priority of the specified thread. If tid is SELF, the priority of the invoking thread is set. Note: if the priority of the affected thread becomes greater than that of the caller and the affected thread is not blocked, the caller will not run next. lwp_setpri() can be used on either blocked or unblocked threads.

lwp_suspend() makes the specified thread ineligible to run. If tid is SELF, the caller is itself suspended. lwp_resume() undoes the effect of lwp_suspend(). If a blocked thread is suspended, it will not run until it has been unblocked as well as explicitly made eligible to run using lwp_resume(). By suspending a thread, one can safely examine it without worrying that its execution-time state will change. Note: When scheduling preemptively, be sure to use monitors to protect shared data structures such as those used by the standard I/O library.

lwp_yield(), lwp_sleep(), lwp_resched(), lwp_join(), lwp_suspend() and lwp_resume() return 0 on success and -1 on failure.

Example 4: how to build a round-robin time-sliced scheduler. A high priority thread acts as the scheduler, with the other threads at a lower priority. This scheduler thread sleeps for the desired quantum. When the quantum expires, the scheduler issues an lwp_resched() call for the priority of the scheduled threads. This causes a reshuffling of the run queue at that priority.
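The effect of that scheduler thread can be traced with a small plain-C simulation. It does not call the LWP library: resched_model() only imitates the documented effect of lwp_resched() (the front thread of a priority moves to the end of its queue), and round_robin_trace() and MAXT are hypothetical names. Each loop iteration stands for one quantum during which the scheduler thread sleeps.

```c
#include <stddef.h>

#define MAXT 16  /* hypothetical upper bound on simulated threads */

/* Imitates the effect of lwp_resched(): front thread moves to the end. */
static void resched_model(int *q, size_t n)
{
    int front = q[0];
    for (size_t i = 1; i < n; i++)
        q[i - 1] = q[i];
    q[n - 1] = front;
}

/* Records in trace[t] which thread (0..nthreads-1) runs in quantum t.
 * trace must have room for nquanta entries.
 * Returns 0 on success, -1 if nthreads is 0 or exceeds MAXT. */
int round_robin_trace(size_t nthreads, size_t nquanta, int *trace)
{
    int q[MAXT];

    if (nthreads == 0 || nthreads > MAXT)
        return -1;
    for (size_t i = 0; i < nthreads; i++)
        q[i] = (int)i;

    for (size_t t = 0; t < nquanta; t++) {
        trace[t] = q[0];             /* front thread runs while the scheduler sleeps */
        resched_model(q, nthreads);  /* quantum expired: scheduler reshuffles the queue */
    }
    return 0;
}
```

With three threads and seven quanta the trace is 0, 1, 2, 0, 1, 2, 0, which is the round-robin order the example is meant to produce.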

Context Switching

A thread can pretend to be the only activity executing on its machine even though many threads are running. The LWP library provides this illusion by performing the context switches between threads.

Messages
• There are two types of process synchronization in use:
• Rendezvous: easy-to-use interprocess communication facilities (RPC). It supports communication across different address spaces. It is higher-level than monitors because both data transmission and synchronization are combined into a single concept. It is natural to map asynchronous events into higher-level abstractions since messages are reliable and conditions are not.
• Monitor: familiar to UNIX system programmers through its similarity to sleep() and wakeup() in the kernel.

Messages vs. Monitors

With rendezvous, a context switch is always required. With monitors, a context switch is only necessary if the monitor lock is busy at the time of access. The LWP library provides both.

Rendezvous Semantics

To use messages, one thread issues a msg_send() and another thread issues a msg_recv(). Whichever thread gets to the corresponding primitive first waits for the other, hence the term rendezvous. When rendezvous takes place, the sender remains blocked until the receiver decides to issue a msg_reply(). Immediately after msg_reply() returns, both threads are unblocked.
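The blocking rules of a single rendezvous can be written down as a small state machine. This is a plain-C model of the semantics described above, not the LWP message primitives themselves: the rv_* names and the state enumeration are hypothetical.

```c
/* States of one send/receive/reply exchange (a model, not the LWP library). */
enum rv_state {
    RV_IDLE,              /* neither side has arrived */
    RV_SENDER_WAITING,    /* send issued first: sender waits for a receiver */
    RV_RECEIVER_WAITING,  /* receive issued first: receiver waits for a sender */
    RV_MET,               /* rendezvous took place: sender still blocked */
    RV_DONE               /* reply issued: both threads unblocked */
};

/* Each transition returns 0 on success, -1 if it is not legal in state *s. */
int rv_send(enum rv_state *s)
{
    switch (*s) {
    case RV_IDLE:             *s = RV_SENDER_WAITING; return 0;
    case RV_RECEIVER_WAITING: *s = RV_MET;            return 0;
    default:                  return -1;
    }
}

int rv_recv(enum rv_state *s)
{
    switch (*s) {
    case RV_IDLE:           *s = RV_RECEIVER_WAITING; return 0;
    case RV_SENDER_WAITING: *s = RV_MET;              return 0;
    default:                return -1;
    }
}

int rv_reply(enum rv_state *s)
{
    if (*s != RV_MET)
        return -1;
    *s = RV_DONE;
    return 0;
}

/* The sender stays blocked from its send until the reply. */
int rv_sender_blocked(enum rv_state s)
{
    return s == RV_SENDER_WAITING || s == RV_MET;
}

/* The receiver is blocked only while it waits for a sender. */
int rv_receiver_blocked(enum rv_state s)
{
    return s == RV_RECEIVER_WAITING;
}
```

The model makes the asymmetry explicit: the receiver is released as soon as the two sides meet, while the sender is released only by the reply.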

msg_send, msg_recv, msg_reply, msg_recvall, msg_enumsend, msg_enumrecv - LWP send and receive messages

SYNOPSIS

#include <lwp/lwp.h>

int msg_send(dest, arg, argsize, res, ressize)
thread_t dest;      /* destination thread */
caddr_t arg;        /* argument buffer */
int argsize;        /* size of argument buffer */
caddr_t res;        /* result buffer */
int ressize;        /* size of result buffer */

int msg_recv(sender, arg, argsize, res, ressize, timeout)
thread_t *sender;         /* value-result: sending thread or agent */
caddr_t *arg;             /* argument buffer */
int *argsize;             /* argument size */
caddr_t *res;             /* result buffer */
int *ressize;             /* result size */
struct timeval *timeout;  /* POLL, INFINITY, else timeout */

int msg_reply(sender)
thread_t sender;    /* agent id or thread id */

int msg_enumsend(vec, maxsize)
thread_t vec[];     /* list of blocked senders */
int maxsize;

int msg_enumrecv(vec, maxsize)
thread_t vec[];     /* list of blocked receivers */
int maxsize;

int MSG_RECVALL(sender, arg, argsize, res, ressize, timeout)
/* Has the same parameters as msg_recv() but ensures that the sender is
   properly initialized to allow receipt from any sender. It returns the
   result from msg_recv(). */
thread_t *sender;
caddr_t *arg;
int *argsize;
caddr_t *res;
int *ressize;
struct timeval *timeout;

DESCRIPTION

Each thread queues messages addressed to it as they arrive. Threads may either specify that a particular sender's message is to be received next, or that any sender's message may be received next. msg_send() specifies a message buffer and a reply buffer, and initiates one half of a rendezvous with the receiver. The sender will block until the receiver replies using msg_reply(). msg_recv() initiates the other half of a rendezvous and blocks the invoking thread until a corresponding msg_send() is received. When unblocked by msg_send(), the receiver may read the message and generate a reply by filling in the reply buffer and issuing msg_reply().

msg_reply() unblocks the sender. Once a reply is sent, the receiver should no longer access either the message or reply buffer. In msg_send(), argsize specifies the size in bytes of the argument buffer argbuf, which is intended to be a read-only (to the receiver) buffer. ressize specifies the size in bytes of the result buffer resbuf, which is intended to be a write-only (to the receiver) buffer. dest is the thread that is the target of the send.

msg_recv() blocks the receiver until one of the following occurs:
• A message from the agent or thread bound to sender has been sent to the receiver.
• sender points to a THREADNULL-valued variable and any message has been sent to the receiver from a thread or agent.
• The time specified by timeout elapses and no message is received.

It is the responsibility of the sender to provide the buffer space both for a message to be sent to the receiver, and for a reply message from the receiver. While the sender is blocked, the receiver has access to the buffers provided by the sender.

Messages and Threads

Messages are sent to threads, and each thread has exactly one queue associated with it that receives messages. We could have provided message queues (ports) as objects not bound to processes. This would give more flexibility, but would require more complex functionality and would also complicate the implementation. To receive a rendezvous request, a process specifies the identity of the sending thread it wishes to rendezvous with. Optionally, a receiver may specify that any sender will do.

There is no other form of selection available. Example 6 demonstrates basic message passing.

Intelligent Servers

Because the reply can be done at any time, a receiver can receive a number of messages before replying to them. This makes it possible to implement complex servers. Example 7 demonstrates how processes send requests in a random order to a server thread. This server serializes the requests and processes them in the order associated with each request.
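The serializing step of such a server can be sketched in plain C, leaving the message primitives out. The sketch assumes each request carries an ordering key; struct request, its fields and reply_order() are hypothetical names used only for illustration, and the senders would still be blocked while the server holds their requests.

```c
#include <stdlib.h>

struct request {
    int sender;  /* hypothetical id of the blocked sending thread */
    int seq;     /* hypothetical ordering key carried in the message */
};

/* qsort() comparator: ascending by ordering key. */
static int by_seq(const void *a, const void *b)
{
    const struct request *x = a;
    const struct request *y = b;
    return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Sorts the pending requests by key and writes the sender ids to
 * senders_out in the order in which the server would reply to them. */
void reply_order(struct request *pending, size_t n, int *senders_out)
{
    qsort(pending, n, sizeof pending[0], by_seq);
    for (size_t i = 0; i < n; i++)
        senders_out[i] = pending[i].sender;
}
```

For requests that arrive as (sender 7, key 3), (sender 8, key 1), (sender 9, key 2), the reply order is 8, 9, 7 regardless of arrival order.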

Agents

Because of the random nature of interrupts, it is hard to understand programs that deal with them. The LWP library provides a simple way to transform asynchronous events into synchronous ones. A message paradigm was chosen (instead of monitors) to map interrupts because an interrupt cannot wait for a monitor lock if it is held by a client. With asynchronous interrupts, an event causes a context switch within the same thread. With LWPs, a thread must synchronously rendezvous with an interrupt. Thus, to have an event do something asynchronously, it is necessary to use a separate thread to handle it.

To simulate typical UNIX signal handling, we have to create two threads: one thread to represent the main program, and another thread at a higher priority to represent the signal handler. The latter thread would have an agent set up to receive signals. The agent mechanism is provided to map asynchronous events into messages to a lightweight process. A message from an agent looks exactly like a message from another thread. When an agent is created, we provide a portion of the pod’s address space for the agent to store its message. You cannot receive the next message from an agent until you reply to the current one.

System Calls

A set of heavyweight processes can execute system calls in the kernel concurrently. For example, three heavyweight processes can concurrently initiate writes to the same device. This is not the case for lightweight threads. There is no general solution to the problem of having several threads execute system calls concurrently until the LWP primitives are made available as true system calls operating on a set of descriptors. The non-blocking I/O library can help by automatically blocking a thread attempting any I/O until such I/O is likely to succeed immediately.

Non-Blocking I/O Library

Using the non-blocking I/O library: Examples 8 and 9 show how to use the non-blocking I/O library.
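The idea of waiting until I/O is likely to succeed immediately can be illustrated with standard POSIX calls. This is not the LWP non-blocking I/O library, whose interface is not shown in this transcript: it is a sketch using fcntl(), poll() and read(), and the helper names set_nonblocking(), wait_readable() and read_when_ready() are hypothetical. Note that poll() here blocks the whole process, whereas a thread library would block only the calling thread and run another one.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Wait up to timeout_ms for fd to become readable (or reach end of file).
 * Returns 1 if a read would not block, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int r = poll(&p, 1, timeout_ms);
    if (r < 0)
        return -1;
    return (r > 0 && (p.revents & (POLLIN | POLLHUP))) ? 1 : 0;
}

/* Read only once the descriptor is ready. On timeout, returns -1 with
 * errno set to EAGAIN, the same way a non-blocking read() would. */
ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    int r = wait_readable(fd, timeout_ms);
    if (r == 0) {
        errno = EAGAIN;
        return -1;
    }
    if (r < 0)
        return -1;
    return read(fd, buf, len);
}
```

On an empty pipe, read_when_ready() with a zero timeout fails with EAGAIN instead of stalling, and it returns the data as soon as something has been written to the other end.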

int socket(domain, type, protocol)
int domain, type, protocol;

socket() creates an endpoint for communication and returns a descriptor. The domain parameter specifies a communications domain within which communication will take place; this selects the protocol family which should be used. The protocol family generally is the same as the address family for the addresses supplied in later operations on the socket. These families are defined in the include file <sys/socket.h>. The currently understood formats are:

PF_UNIX (UNIX system internal protocols)
PF_INET (ARPA Internet protocols)
PF_IMPLINK (IMP "host at IMP" link layer)

The socket has the indicated type, which specifies the semantics of communication. Currently defined types are:

SOCK_STREAM
SOCK_DGRAM
SOCK_RAW
SOCK_SEQPACKET
SOCK_RDM

A SOCK_DGRAM socket supports datagrams (connectionless, unreliable messages of a fixed, typically small, maximum length).
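A minimal sketch of the call described above, written against the standard POSIX socket interface with ANSI prototypes rather than the old-style declarations in the synopsis. The helper names open_dgram_socket() and socket_type() are hypothetical. The sketch uses AF_UNIX, the address family that corresponds to the PF_UNIX protocol family named in the text.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a datagram socket in the given domain (for example AF_UNIX).
 * Passing 0 as the protocol selects the default for that domain and type.
 * Returns the descriptor, or -1 on error with errno set by socket(). */
int open_dgram_socket(int domain)
{
    return socket(domain, SOCK_DGRAM, 0);
}

/* Ask the system which type a socket was created with.
 * Returns the SOCK_* value, or -1 on error. */
int socket_type(int fd)
{
    int type = -1;
    socklen_t len = sizeof type;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
        return -1;
    return type;
}
```

The descriptor returned by open_dgram_socket() is an ordinary file descriptor and should be released with close() when it is no longer needed.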
