Distributed System: Distributed Process Scheduling

Presentation Transcript

The primary objective of scheduling is to enhance overall system performance metrics such as process completion time and processor utilization. The existence of multiple processing nodes in distributed systems presents a challenging problem for scheduling processes onto processors and vice versa.

A system performance model
  • Depicts the relationship among the algorithm, the scheduling, and the architecture to describe interprocess communication
  • There are basically three types of models:
  • Precedence process model (DAG)

Directed edges represent the precedence relationship

Communication process model
  • In this model processes coexist and communicate asynchronously.
  • Edges in this model represent the need for communication between the processes

Disjoint process model
  • In this model processes run independently and complete in finite time.
  • Processes are mapped to the processors to maximize the utilization of the processors and minimize the turnaround time of the processes.

Partitioning a task into multiple processes for execution can result in a speedup of the total task completion time. The speedup factor S is a function

S = F(Algorithm, System, Schedule)

S can be written as:

S = OSPT / CPT = (OSPT / OCPT_ideal) × (OCPT_ideal / CPT) = Si × Sd

where

OSPT = optimal sequential processing time

CPT = concurrent processing time

OCPT_ideal = optimal concurrent processing time

Si = OSPT / OCPT_ideal, the ideal speedup obtained by using a multiple-processor system over the best sequential time

Sd = OCPT_ideal / CPT, the degradation of the system due to the actual implementation compared to an ideal system


Si can be rewritten as

Si = RP × RC × n

where n is the number of processors and

RP = OSPT / Σ(i=1..m) Pi,  RC = Σ(i=1..m) Pi / (n × OCPT_ideal)

Here Σ(i=1..m) Pi is the total computation of the concurrent algorithm, where m is the number of tasks in the algorithm.

Sd can be rewritten as:

Sd = 1 / (1 + ρ), where ρ = (CPT − OCPT_ideal) / OCPT_ideal

RP is the Relative Processing: how much loss of speedup is due to the substitution of the best sequential algorithm by an algorithm better adapted for concurrent implementation.

  • RC is the Relative Concurrency, which measures how far from optimal the usage of the n processors is. It reflects how well adapted the given problem and its algorithm are to the ideal n-processor system.
  • The final expression for the speedup S is

S = RP × RC × n × (1 / (1 + ρ))

The term ρ is called the efficiency loss. It is a function of scheduling and of the system architecture. It could be decomposed into two independent terms (ρ = ρ_sched + ρ_syst), but this is not easy to do since scheduling and the architecture are interdependent. The best possible schedule on a given system hides the communication overhead (overlapping it with other computations).


The unified speedup model integrates three major components

    • algorithm development
    • system architecture
    • scheduling policy

with the objective of minimizing the total completion time (makespan) of a set of interacting processes. If processes are not constrained by precedence relations and are free to be redistributed or moved around among processors in the system, performance can be further improved by sharing the workload

    • statically - load sharing
    • dynamically - load balancing
Speed up

n – number of processors

ρ – efficiency loss when the algorithm is implemented on a real machine

RC – relative concurrency

RP – relative processing requirement

Speedup depends on:

  • Design and efficiency of the scheduling algorithm
  • Architecture of the system
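The unified speedup model above can be checked numerically. Below is a minimal Python sketch; the function name and the sample numbers (OSPT, total work, ideal and actual concurrent times, processor count) are illustrative assumptions, not values from the slides.

```python
def speedup(ospt, total_work, ocpt_ideal, cpt, n):
    """Compute S = RP * RC * n / (1 + rho) and its components.

    ospt       -- optimal sequential processing time (OSPT)
    total_work -- sum of the task times P_i of the concurrent algorithm
    ocpt_ideal -- optimal concurrent processing time on an ideal system
    cpt        -- actual concurrent processing time (CPT)
    n          -- number of processors
    """
    rp = ospt / total_work                 # relative processing
    rc = total_work / (n * ocpt_ideal)     # relative concurrency
    rho = (cpt - ocpt_ideal) / ocpt_ideal  # efficiency loss
    s = rp * rc * n / (1 + rho)            # algebraically equals OSPT / CPT
    return s, rp, rc, rho

# Illustrative numbers: best sequential time 100, the concurrent algorithm
# does 120 units of total work, ideally finishes in 30 on 4 processors,
# and the real schedule takes 40.
s, rp, rc, rho = speedup(100, 120, 30, 40, 4)
```

Note that the decomposition collapses back to S = OSPT / CPT, which is a quick sanity check on any chosen numbers.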

Static process scheduling
  • Scheduling a set of partially ordered tasks on a nonpreemptive multiprocessor system of identical processors to minimize the overall finishing time (makespan)
  • Except for some very restricted cases, scheduling to optimize makespan is NP-complete
  • Most research is oriented toward using approximate or heuristic methods to obtain a near-optimal solution to the problem
  • A good heuristic distributed scheduling algorithm is one that can best balance and overlap computation and communication

In static scheduling, the mapping of processes to processors is determined before the execution of the processes. Once a process is started, it stays at the processor until completion.


This model is used to describe scheduling for a ‘program’ that consists of several subtasks; the schedulable unit is the subtask.

  • Program is represented by a DAG.
  • Primary objective of task scheduling is to achieve maximal concurrency for task execution within a program
  • Precedence constraints among tasks in a program are explicitly specified.
  • critical path: the longest execution path in the DAG, often used to compare the performance of a heuristic algorithm.
Scheduling goal: minimize the makespan time.


List Scheduling (LS): communication overhead is not considered. Uses a simple greedy heuristic: no processor remains idle if there are tasks available that it could process.

Extended List Scheduling (ELS): the schedule produced by LS, evaluated with the communication delays taken into account.

Earliest Task First scheduling (ETF): the earliest schedulable task (with communication delay considered) is scheduled first.


[Chow and Johnson 1997]
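The greedy idea behind LS can be sketched as follows. This is a simplified illustration, not the algorithm from [Chow and Johnson 1997]: communication cost is ignored, the DAG and task times are made up, and ties are broken by earliest possible start time (an ETF-flavored choice).

```python
def list_schedule(tasks, deps, n_procs):
    """Greedy list schedule on identical processors, no communication cost.

    tasks: {name: execution time}; deps: {name: set of predecessor names}.
    Returns (makespan, {name: start time}).
    """
    succs = {t: set() for t in tasks}
    indeg = {t: 0 for t in tasks}
    for t, preds in deps.items():
        for p in preds:
            succs[p].add(t)
            indeg[t] += 1
    est = {t: 0 for t in tasks}          # earliest start = max finish of preds
    ready = [t for t in tasks if indeg[t] == 0]
    proc_free = [0] * n_procs            # time each processor becomes free
    start, finish = {}, {}
    while ready:
        ready.sort(key=lambda t: (est[t], t))   # earliest-startable first
        t = ready.pop(0)
        p = min(range(n_procs), key=lambda i: proc_free[i])
        start[t] = max(proc_free[p], est[t])    # no processor idles needlessly
        finish[t] = start[t] + tasks[t]
        proc_free[p] = finish[t]
        for u in succs[t]:               # release successors whose preds are done
            indeg[u] -= 1
            est[u] = max(est[u], finish[t])
            if indeg[u] == 0:
                ready.append(u)
    return max(finish.values()), start

makespan, start = list_schedule(
    {"A": 2, "B": 3, "C": 2, "D": 4},   # task -> execution time
    {"C": {"A"}, "D": {"A", "B"}},      # task -> predecessors
    n_procs=2,
)
```

Adding a per-edge communication delay to `est` when a successor is placed on a different processor would turn this sketch in the direction of ETF.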

Communication process model
  • Process scheduling for many system applications has a perspective very different from the precedence model: applications may be created independently, and processes have no explicit completion times or precedence constraints
  • The primary objectives of process scheduling are to maximize resource utilization and to minimize interprocess communication
  • The communication process model is an undirected graph G with node and edge sets V and E, where nodes represent processes and the weight on an edge is the amount of interaction between two connected processes

Objective function called Module Allocation for finding an optimal allocation of m process modules to P processors:


Heuristic solution: separate optimization of computation and communication into two independent phases.

    • Processes with higher interprocess interaction are merged into clusters
    • Each cluster is then assigned to the processor that minimizes the computation cost
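The two-phase heuristic above can be sketched in Python. All names, weights, costs, and the merge threshold below are illustrative assumptions; phase 1 merges heavily communicating processes with a tiny union-find, phase 2 picks the cheapest processor per cluster.

```python
def cluster_and_assign(comm, comp_cost, threshold):
    """Two-phase module allocation heuristic.

    comm:      {(a, b): weight} interprocess communication weights.
    comp_cost: {proc_id: {module: cost}} computation cost per processor.
    Pairs with communication weight above `threshold` are merged.
    Returns {module: proc_id}.
    """
    parent = {}                         # union-find forest over modules

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    # Phase 1: merge the most heavily communicating processes into clusters.
    for (a, b), w in sorted(comm.items(), key=lambda kv: -kv[1]):
        if w > threshold:
            parent[find(a)] = find(b)
    clusters = {}
    for m in {x for pair in comm for x in pair}:
        clusters.setdefault(find(m), []).append(m)

    # Phase 2: assign each cluster to its cheapest processor.
    assignment = {}
    for members in clusters.values():
        best = min(comp_cost, key=lambda p: sum(comp_cost[p][m] for m in members))
        for m in members:
            assignment[m] = best
    return assignment

comm = {("A", "B"): 10, ("B", "C"): 1, ("C", "D"): 8}   # interaction weights
cost = {0: {"A": 1, "B": 1, "C": 5, "D": 5},
        1: {"A": 4, "B": 4, "C": 1, "D": 1}}            # per-processor costs
where = cluster_and_assign(comm, cost, threshold=5)
```

With these numbers, A and B end up clustered on one processor and C and D on the other, which is the intended separation of communication and computation optimization.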
Dynamic load sharing and balancing
  • The assumption of prior knowledge of processes is not realistic for most distributed applications. The disjoint process model, which ignores the effect of interdependency among processes, is used.
  • Objectives of scheduling: utilization of the system (which has a direct bearing on throughput and completion time) and fairness to the user processes (difficult to define).

If we can designate a controller process that maintains the information about the queue size of each processor:

  • Fairness in terms of equal workload on each processor (join the shortest queue) - migration workstation model (use of load sharing and load balancing, perhaps load redistribution i.e. process migration)
  • Fairness in terms of user's share of computation resources (allocate processor to a waiting process at a user site that has the least share of the processor pool) - processor pool model

Solutions without a centralized controller: sender- and receiver-initiated algorithms.

Sender-initiated algorithms:

  • push model
  • includes a probing strategy for finding a node with the smallest queue length (perhaps multicast)
  • performs well on a lightly loaded system

Receiver-initiated algorithms:
  • pull model
  • a probing strategy can also be used
  • more stable
  • performs better on average

Combinations of both algorithms are possible: choice based on the estimated system load information or reaching threshold values of the processing node's queue.
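The push and pull variants with probing can be sketched as below. The node names, queue lengths, threshold, and probe limit are illustrative assumptions; real algorithms also cope with stale queue information and message delays, which this toy ignores.

```python
import random

def probe_for_receiver(queues, me, threshold, probe_limit, rng):
    """Sender-initiated (push): an overloaded node probes a few peers and
    transfers one task to the first peer found below the threshold."""
    if queues[me] <= threshold:
        return None                     # not overloaded; nothing to push
    peers = [n for n in queues if n != me]
    for peer in rng.sample(peers, min(probe_limit, len(peers))):
        if queues[peer] < threshold:
            queues[peer] += 1           # transfer one task
            queues[me] -= 1
            return peer
    return None

def probe_for_sender(queues, me, threshold, probe_limit, rng):
    """Receiver-initiated (pull): an underloaded node probes peers and
    pulls one task from the first peer found above the threshold."""
    if queues[me] >= threshold:
        return None                     # busy enough; nothing to pull
    peers = [n for n in queues if n != me]
    for peer in rng.sample(peers, min(probe_limit, len(peers))):
        if queues[peer] > threshold:
            queues[peer] -= 1
            queues[me] += 1
            return peer
    return None

queues = {"a": 5, "b": 0, "c": 1}       # illustrative queue lengths
moved_to = probe_for_receiver(queues, "a", threshold=2, probe_limit=2,
                              rng=random.Random(0))
```

A combined policy would call the push variant when the local queue crosses an upper threshold and the pull variant when it drops below a lower one.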


Three significant application scenarios:

  • Remote service: The message is interpreted as a request for a known service at the remote site (constrained only to services that are supported at the remote host)
    • remote procedure calls at the language level
    • remote commands at the operating system level
    • interpretive messages at the application level
  • Remote execution: The messages contain a program to be executed at the remote site; implementation issues:
    • load sharing algorithms (sender-initiated, registered hosts, broker, ...)
    • location independence of all IPC mechanisms, including signals
    • system heterogeneity (object code, data representation)
    • protection and security

Process migration: The messages represent a process being migrated to the remote site for continuing execution (an extension of load sharing obtained by allowing a remote execution to be preempted)

  • State information of a process in a distributed system consists of two parts: the computation state (similar to conventional context switching) and the communication state (the status of the process's communication links and messages in transit). The transfer of the communication state is performed by link redirection and message forwarding.

Reduction of freeze time can be achieved with the transfer of minimal state and leaving residual computation dependency on the source host: this concept fits well with distributed shared memory.

Real Time Systems
  • Correctness of the system may depend not only on the logical result of the computation but also on the time when these results are produced
  • Tasks attempt to control events or to react to events that take place in the outside world
  • These external events occur in real time and processing must be able to keep up
  • Processing must happen in a timely fashion, neither too late, nor too early
  • Some examples include air traffic control, robotics, car/train control, medical support, and multimedia.

Real-time services are carried out by a set of real-time tasks

  • Each task τi is described by

τi = (Si, Ci, Di)

where Si is the earliest possible start time of task τi, Ci is the worst-case execution time of τi, and Di is the deadline of τi

Types of Real Time Systems
  • Hard real time systems
    • Must always meet all deadlines
    • System fails if deadline window is missed
  • Soft real time systems
    • Must try to meet all deadlines
    • System does not fail if a few deadlines are missed
  • Firm real time systems
    • Result has no use outside deadline window
    • Tasks that fail are discarded


  • Aperiodic

- Each task can arrive at any time

  • Periodic

- Each task is repeated at a regular interval

- Max execution time is the same each period

- Arrival time is usually the start of the period

- Deadline is usually the end of the period


Each task is released at a given constant rate

  • Given by the period T
  • All instances of a task have:
    • The same worst case execution time: C
    • The same relative deadline: D=T (not a restriction)
    • The same relative arrival time: A=0 (not a restriction)
    • The same release time, released as soon as they arrive
  • All tasks are independent
  • No sharing resources
  • All overheads in the kernel are assumed to be zero, e.g. context switches

V={Ji=(Ci, Ti)|1≤ i ≤ n}

Real time scheduling
  • Schedule tasks for execution in such a way that all tasks meet their deadline
  • Uniprocessor scheduling
  • A schedule is a set A of execution intervals described as

A = {(si, fi, ti) | i = 1, ..., n}

where si is the start time of the interval, fi is the finish time of the interval, and ti is the task executed during the interval


The schedule is valid if

    • for every i = 1, ..., n: si < fi
    • for every i = 1, ..., n−1: fi ≤ si+1 (intervals do not overlap)
    • if ti = k, then Sk ≤ si and fi ≤ Dk
  • A schedule is feasible if every task τk receives at least Ck seconds of CPU execution in the schedule
  • A set of tasks is feasible if there is a feasible schedule for the tasks
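The validity and feasibility conditions can be checked directly. A minimal Python sketch, with an illustrative task set; intervals are assumed to be listed in start-time order.

```python
def valid_schedule(intervals, tasks):
    """Check a uniprocessor schedule against the validity conditions.

    intervals: list of (s, f, task_id), ordered by start time.
    tasks:     {task_id: (S, C, D)} with start time S, execution time C,
               deadline D.
    Returns (valid, feasible).
    """
    valid = all(s < f for s, f, _ in intervals)                  # s_i < f_i
    valid = valid and all(intervals[i][1] <= intervals[i + 1][0] # no overlap
                          for i in range(len(intervals) - 1))
    valid = valid and all(tasks[k][0] <= s and f <= tasks[k][2]  # window fits
                          for s, f, k in intervals)
    cpu = {k: 0 for k in tasks}
    for s, f, k in intervals:
        cpu[k] += f - s
    # feasible: every task receives at least C units of CPU time
    feasible = valid and all(cpu[k] >= tasks[k][1] for k in tasks)
    return valid, feasible

tasks = {1: (0, 2, 5), 2: (0, 1, 6)}               # (S, C, D), illustrative
ok = valid_schedule([(0, 2, 1), (2, 3, 2)], tasks)  # valid and feasible
bad = valid_schedule([(0, 2, 1), (1, 3, 2)], tasks) # overlapping intervals
```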
Rate Monotonic
  • Assumptions
    • Tasks are periodic and Ti is the period of task τi
    • Tasks do not communicate with each other
    • Tasks are scheduled according to priority, and task priorities are fixed (static priority scheduling)
  • If task τi is requested at time t, τi can meet its deadline if the time spent executing higher-priority tasks during the interval (t, t + Di) is Di − Ci or less
  • The critical instant for task τi occurs when τi and all higher-priority tasks are scheduled simultaneously

If τi can meet its deadline when it is scheduled at a critical instant, τi can always meet its deadline

  • Rate monotonic priority assignment

If Th< Tl then PRh > PRl
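Rate-monotonic priority assignment can be sketched in a few lines, together with the classical Liu and Layland utilization test U ≤ n(2^(1/n) − 1), which is a sufficient (not necessary) schedulability condition. The task set below is an illustrative assumption.

```python
def rm_priorities(tasks):
    """tasks: {name: (C, T)}. Returns names ordered highest priority first:
    the shorter the period T, the higher the priority."""
    return sorted(tasks, key=lambda t: tasks[t][1])

def rm_schedulable(tasks):
    """Sufficient utilization-based test: U <= n * (2**(1/n) - 1)."""
    n = len(tasks)
    u = sum(c / t for c, t in tasks.values())
    return u <= n * (2 ** (1 / n) - 1)

tasks = {"t1": (1, 4), "t2": (2, 8), "t3": (1, 16)}  # (C, T), illustrative
order = rm_priorities(tasks)
ok = rm_schedulable(tasks)
```

Here U = 1/4 + 2/8 + 1/16 = 0.5625, below the three-task bound of about 0.78, so the set passes the test.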

Deadline Monotonic
  • Some tasks in real time system might need to complete execution a short time after being requested
  • Tasks with shorter deadlines get higher priority.
  • Static Scheduling.
  • If D(h) < D(l), then PR(h) > PR(l), where D indicates the deadline. This is called Deadline Monotonic priority assignment.
Earliest Deadline First
  • Dynamic scheduling
  • Assume a preemptive system with dynamic priorities
  • Like deadline monotonic, the task with the shortest deadline gets the highest priority, but the difference is that priorities can vary during the system's execution. Priorities are reevaluated when events such as task arrivals, task completions, or synchronization occur
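Preemptive EDF can be illustrated with a unit-time-slice simulation: at every tick the pending job with the earliest absolute deadline runs. The task sets are made-up examples; deadlines are taken equal to periods and all tasks are released at time 0.

```python
def edf_simulate(tasks, horizon):
    """tasks: {name: (C, T)} periodic, deadline = period, first release at 0.
    Returns True iff no deadline is missed and all work finishes by `horizon`."""
    jobs = []  # mutable entries [abs_deadline, name, remaining]
    for t in range(horizon):
        for name, (c, period) in tasks.items():
            if t % period == 0:
                jobs.append([t + period, name, c])   # release a new job
        # a job whose deadline has passed with work left has missed it
        if any(d <= t and rem > 0 for d, _, rem in jobs):
            return False
        jobs = [j for j in jobs if j[2] > 0]         # drop finished jobs
        if jobs:
            jobs.sort()                              # earliest deadline first
            jobs[0][2] -= 1                          # run one tick
    return all(rem == 0 for _, _, rem in jobs)
```

With utilization U = 1/2 + 2/4 = 1.0 the first set below is EDF-schedulable; pushing U above 1 (second set) forces a miss.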
Real time synchronization
  • Required when tasks are not independent and need to share information and synchronize
  • If two tasks want to access the same data, semaphores are used to ensure non-simultaneous access
  • Blocking due to synchronization can cause subtle timing problems
Priority Inversions
  • A low-priority task holds a resource that prevents execution of a higher-priority task.

- Task L acquires the semaphore

- Task H needs the resource but is blocked by Task L, so Task L is allowed to execute

- Task M preempts Task L because it has higher priority

- Thus Task M delays Task L, which delays Task H


A task t will access a set of critical sections. Overlapping critical sections must be properly nested

  • A task th is blocked by a critical section zl(k) of a lower-priority task tl if th must wait for tl to exit zl(k) before resuming execution
Priority Inheritance Protocol
  • PIP eliminates priority inversion problem
  • The algorithm increases the priority of a task to the maximum priority of any task waiting for any resource on which the task holds a lock
      • i.e. if a lower-priority task L holds a lock on a resource required by a higher-priority task H, then the priority of the lower task is raised to the priority of the higher task
      • Once the lock is released, the task reverts to its original priority

The PIP rules are:

  • A task is assigned its normal priority when it is requested
  • The CPU is assigned to the highest-priority ready task
  • Before a task can enter a critical section, it must first acquire a lock on the semaphore that guards it
  • If task th is blocked through semaphore S, which is held by a lower-priority task tl, then th is removed from the ready list and tl inherits priority PRh
  • Priority inheritance is transitive, i.e. if t2 blocks t1 and t3 blocks t2, then both t2 and t3 inherit PR1
  • When tl releases semaphore S, the highest-priority task blocked through S is put on the ready queue. Task tl releases any priority it inherited through S and resumes its previous priority
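The inheritance rules above can be illustrated with a toy model. This is a deliberately simplified sketch (made-up task names, no ready queue, no transitive chains): on contention the lock holder inherits the blocked task's higher priority, and dropping the lock restores the base priority.

```python
class Task:
    def __init__(self, name, base_prio):
        self.name = name
        self.base = base_prio
        self.prio = base_prio          # current (possibly inherited) priority

class PIPSemaphore:
    def __init__(self):
        self.holder = None

    def acquire(self, task):
        """Take the lock if free; otherwise the holder inherits the
        blocked task's priority when it is higher."""
        if self.holder is None:
            self.holder = task
            return True
        if task.prio > self.holder.prio:
            self.holder.prio = task.prio   # priority inheritance
        return False                       # caller blocks

    def release(self):
        task, self.holder = self.holder, None
        task.prio = task.base              # revert to original priority
        return task

low, high = Task("L", 1), Task("H", 10)
s = PIPSemaphore()
s.acquire(low)                 # L enters the critical section
s.acquire(high)                # H blocks; L inherits priority 10
inherited = low.prio
s.release()                    # L exits; its priority reverts
restored = low.prio
```

In the priority inversion scenario above, the inherited priority is what prevents the medium-priority task from preempting L while H is waiting.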

PIP limits the time during which a task is blocked

  • There are two ways a low-priority task can block a high-priority task
    • Direct blocking: occurs when a high-priority task attempts to lock a semaphore held by a low-priority task
    • Push-through blocking: occurs when a low-priority task inherits a high priority and executes at the cost of a medium-priority task. In this case the medium-priority task experiences push-through blocking while the high-priority task experiences direct blocking

Blocking duration

    • th can be blocked for at most the duration of one critical section
  • Let Ceiling(S) be the priority of the highest-priority task that can be blocked by S
Priority Ceiling Protocol
  • A task can acquire a lock on semaphore S only if its priority is higher than Ceiling(R) for every semaphore R currently locked by other tasks. Thus a higher-priority task will not be blocked through both S and R
  • If a high priority task is blocked through a resource, then the task holding that resource gets the priority of the high priority task. Once the resource is released, the priority is reset back to its original
General access consistency models
  • Atomic consistency
    • Atomic consistency (also called strict consistency) is the most stringent one and is defined by the following condition:
      • Any read to a memory location x returns the value stored by the most recent write operation to x.
    • Definition assumes the existence of absolute global time - “impossible” to achieve in distributed systems

Sequential consistency

    • Slightly weaker memory model than strict consistency.
    • Definition (by Lamport):
      • The result of any execution is the same as if the operations of all processes were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.
    • Any valid interleaving is acceptable behavior, but all processes must see the same sequence of memory references

Sequential consistency can be implemented on a DSM system by replicating only read-only pages (writable pages are never replicated), or by ensuring that no memory operation is started until all previous ones have been completed (a totally ordered reliable broadcast mechanism)

  • Programmer-friendly, but has a serious performance problem: every write to a memory location must be propagated throughout the whole DSM system before the next write (to the same location) can be started

Causal consistency

    • A further relaxation of sequential consistency. A distinction is made between events that are potentially causally related and those that are not.
    • Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.
    • (Causality was defined during the explanation of vector logical clocks and causally ordered multicast.) Operations that are not causally related are said to be concurrent.
    • A causally consistent DSM can be implemented similarly to causally ordered multicast.

Causal consistency (continued)

P1: W(X)1

P2: R(X)1 W(X)2

P3: R(X)2 R(X)1

P4: R(X)1 R(X)2

not causally consistent

If we remove R(X)1, W(X)1 and W(X)2 are concurrent

P1: W(X)1

P2: W(X)2

P3: R(X)2 R(X)1

P4: R(X)1 R(X)2

causally consistent
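The two histories above can be checked mechanically. Below is a minimal Python sketch for a single location X; the helper and its name are illustrative simplifications, checking only that causally related writes are observed in their causal order (it does not derive the causal order itself, which in the first history comes from P2 reading 1 before writing 2).

```python
def reads_respect_order(reads, causal_order):
    """reads: values of X observed by one process, in program order.
    causal_order: write values in their causal order.
    The positions of observed values in causal_order must be nondecreasing:
    once a later write is seen, an earlier causally related one may not be.
    """
    pos = {v: i for i, v in enumerate(causal_order)}
    idx = [pos[v] for v in reads if v in pos]
    return all(a <= b for a, b in zip(idx, idx[1:]))

# First history: W(X)1 causally precedes W(X)2 (P2 read 1 before writing 2).
p3_ok = reads_respect_order([2, 1], [1, 2])   # P3 sees 2 then 1: violation
p4_ok = reads_respect_order([1, 2], [1, 2])   # P4 sees 1 then 2: allowed
```

In the second history the writes are concurrent, so no common order is required and both read sequences are acceptable without running the check.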


Processor consistency

Writes from the same processor are performed and observed in the order in which they were issued. Writes from different processors can be observed in any order.

P1: W(X)1

P2: R(X)1 W(X)2

P3: R(X)1 R(X)2

P4: R(X)2 R(X)1

processor consistent, not causally consistent

In this model all writes generated by different nodes are considered concurrent.


Slow memory consistency

Writes to the same location by the same processor must be in order.

P1: W(X)1 W(Y)2 W(X)3

P2: R(Y)2 R(X)1 R(X)3

slow memory consistent


Consistency models with synchronization access

  • uses information from the program to relax consistency
  • synchronization accesses: read/write operations on synchronization variables, issued only by special instructions

Weak consistency

• Accesses to synchronization variables are sequentially consistent

• No access to a synchronization variable is issued by a processor before all previous read/write operations have been performed

• No read/write data access is issued by a processor before a previous access to a synchronization variable has been performed

Weakly consistent:

P1: W(X)1 W(X)2 S

P2: R(X)1 R(X)2 S

P3: R(X)2 R(X)1 S

Not weakly consistent:

P1: W(X)1 W(X)2 S

P2: S R(X)1

Weak consistency enforces consistency on a group of operations, not on individual reads and writes.


Release consistency

Uses a pair of synchronization operations: acquire(S) and release(S)

• Acquire accesses are used to tell the DSM system that a critical region is about to be entered.

• Release accesses say that a critical section has just been exited.

  • No future access can be performed until the acquire operation is completed
  • All previous operations must have been performed before the completion of the release operation
  • The order of synchronization accesses follows the processor consistency model (acquire ~ read, release ~ write)

Entry consistency

Locks objects instead of locking critical sections, i.e.:

• For each shared variable X, associate acquire(X) and release(X)

• acquire(X) locks the shared variable X for the subsequent exclusive operations on X until X is unlocked by a release(X)


A distinction is made between exclusive (for writing) and nonexclusive (for reading) access to synchronization variables. The consistency rules for entry consistency:

  • Before an acquire access of a synchronization variable is allowed to perform, all updates to the guarded shared data must be performed with respect to the process.
  • When a synchronization variable is acquired in exclusive mode, no other process may hold the synchronization variable, not even in non-exclusive mode.
  • After an exclusive mode access to a synchronization variable has been performed, the next non-exclusive mode access to that synchronization variable performed by any other process is allowed to perform only after it is performed with respect to that variable’s owner.
Snooping cache and strong consistency
  • Hardware with capability for broadcasting and monitoring communication accesses.
  • In snooping cache with a common bus, each cache controller can monitor all memory access on the bus
  • Write-update with broadcast (write broadcast)
Distributed Shared Memory
  • Three options for accessing a remote memory block:
  • A remote access is performed remotely at a remote node
  • The remote block is migrated to the local node
  • The remote block is replicated to the local node – enables concurrent accesses

Four combinations with respect to the type of accesses (read, write) are meaningful:

  • Read-remote-write-remote: central server algorithm
    – servers are a potential bottleneck
    – memory coherence is trivial
  • Read-migrate-write-migrate: migration algorithm (SRSW)
    – better performance by exploiting program locality
    – suffers from the ping-pong (thrashing) effect and perhaps false sharing
  • Read-replicate-write-migrate: read-replication algorithm (MRSW)
    – uses a write-invalidate protocol in case of a write access to a read-replicated block
    – natural to use the notion of a block owner
  • Read-replicate-write-replicate: full replication algorithm (MRMW); most frequently uses a write-update protocol in case of a write access to a read-replicated block

  • difficult to achieve strong consistency in DSM
  • 2PC can be used for implementing atomic broadcast protocols
  • assume a definitive group of members (the copy set of the shared data block is known)
Block owner and copy list

Memory coherence managers in a DSM system do not have the capability of hardware broadcasts, so they need to maintain data structures for:

  • Locating the current owner of a data block (after several migrations)
  • Identifying all replicated copies for invalidating and update (i.e. representation of the copy set)

Data structures for the representation of a copy set:

  • Spanning tree; problem with deciding when the broadcast is completed
  • Distributed linear linked list; each node keeps two pointers:
    – to the node with the master copy
    – to the next node on the list
  • A write request is always forwarded to the head node with the master copy, which propagates the invalidation or update through the list
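The distributed linked list can be sketched as a toy Python model; the node names are made up, and a real DSM manager would also update stale master pointers on other nodes after the ownership change, which this sketch ignores.

```python
class CopyNode:
    """A node holding a copy of a shared block in a distributed linked list."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.master = None      # pointer to the node with the master copy
        self.next = None        # next replica on the list
        self.valid = True

def write_invalidate(requester):
    """Forward a write to the head (master copy), invalidate every other
    replica on the list, and make the requester the new owner."""
    node = requester.master     # forward the request to the head
    while node is not None:     # walk the list, invalidating replicas
        if node is not requester:
            node.valid = False
        node = node.next
    requester.master = requester  # requester now holds the master copy
    requester.next = None
    return requester

# Build a list master -> r1 -> r2, with everyone pointing at the master.
master, r1, r2 = CopyNode("m"), CopyNode("r1"), CopyNode("r2")
master.master = r1.master = r2.master = master
master.next, r1.next = r1, r2
owner = write_invalidate(r2)    # r2 writes: all other copies invalidated
```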

Computer security:

1. Secrecy (privacy, confidentiality)

2. Integrity

3. Availability (without denial of service)

  • Fault tolerance:

1. Reliability

2. Safety


A fault-tolerant and secure computer and communication system is called dependable.

  • Distributed systems are inherently more vulnerable to security threats than a single computer system:
    • open architecture
    • need for interaction across a wide range of autonomous and heterogeneous systems
    • message-passing IPC through a communication network (spoofing and forging)
Fundamentals of computer security
  • Two views of computer security:
    • access control policy: a security policy describing how objects are to be accessed by subjects
    • flow control policy: a security policy describing the information flow between entities (objects and subjects)

Four categories of common security threats to objects:

    • interruption
    • interception
    • modification
    • fabrication

Fundamental approaches in dealing with security problems:

  • authentication (excluding external intruders)
  • authorization (control of internal intruders)
  • fault-tolerance (prevention of unintentional faults)
  • encryption (maintaining privacy)
  • auditing (passive form of protection, catching security breaches)
Security issues in distributed systems
  • Distributed OS system architecture principle: separation of mechanisms (kernel) and policies (servers).

Retaining interoperability and transparency in the face of potential security threats is called security transparency. To achieve it, a standard security system architecture with an API for trusted applications is needed. Example: the Generic Security Service Application Program Interface (GSS-API).