Presented by Kunmun Garabadu & Roney Philip

Real-Time Communication - Paulo Verissimo

Real time communication • To achieve real-time communication: • Real time protocols • Real time networks - timely and reliable • Characteristics of real time communication • Known and bounded msg delivery • Deterministic behavior in the presence of disturbing factors • Recognition of latency classes • Connectivity

Real time networks • LAN or MAN • LAN • Small scale • Reliable to very reliable • Span a few thousand metres • Round-trip times 10^-5 to 10^-1 s

Reliability Strategies • Faults lead to: • Lost messages • Delays • Corrupted contents • Solution: • Space redundancy - replicated hardware • Mandatory for critical systems like flight control • Time redundancy - message repetition

Reliability Strategies • Space redundancy Cons • High cost of hardware • Complex • Time redundancy Cons • Communication reliability low for real-time applications • Which methods and techniques to use? • Ask 2 questions • Can we reliably obtain real-time behavior out of simplex (non-replicated) networks? • Which protocols and QoS to use?

Reliability Strategies • Solution to 1 • Combination of simplex standard LANs • Space redundancy in physical layer • To maintain connectivity • Protocol time redundancy • Protocols see only one LAN controller • Solution to 2 • For reliability of communication • Error masking • Error detection and forward recovery • Error detection and backward recovery

Error masking • Assume bounded number of failures, say k, from a particular component • Have more than k channels • Have more than k transmissions • Mask k failures [Figure: (a) space redundancy, (b) time redundancy]

Error detection: Forward recovery • For periodic real time communication • Relationship between consecutive measurements • Possible to skip a lost msg • Wait for the next msg [Figure (a): Forward recovery with k = 1; labels: use previous value, refreshed, V(t1), V(t3), maximum period without refreshing]
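
Forward recovery on a periodic signal can be sketched as a "hold last value" buffer. This is only an illustration: the class name, the time units, and the rule that a held value stays usable for (k + 1) periods are assumptions, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HoldLastValue:
    """Forward recovery for a periodic signal: reuse the last sample on loss."""
    period: float                   # nominal sampling period (assumed: seconds)
    k: int = 1                      # assumed bound on consecutive lost messages
    value: Optional[float] = None   # last sample received
    stamp: Optional[float] = None   # time at which `value` was received

    def on_message(self, now: float, value: float) -> None:
        # A fresh sample simply overwrites (refreshes) the held value.
        self.value, self.stamp = value, now

    def read(self, now: float) -> float:
        # Assumption: the held value is usable for at most (k + 1) periods,
        # i.e. one nominal period plus up to k skipped messages.
        if self.stamp is None:
            raise LookupError("no sample received yet")
        if now - self.stamp > (self.k + 1) * self.period:
            raise TimeoutError("value too old: more than k messages were lost")
        return self.value
```

With period 1.0 and k = 1, a reader keeps using the sample from t = 0 when the message at t = 1 is lost, and gets an error only if nothing arrives by t = 2.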

Error detection: Backward recovery • Ack based protocol • Restarts when a msg is lost • Appropriate when msgs cannot be lost [Figure (b): Backward recovery with k = 1; label: timeout]

Making real-time LANs reliable • LANs have to display real-time behavior • Obtained by: • Establishing a model • Traffic patterns • Reliability and timeliness requirements • Failure assumptions • Service and interface definition • Dressing the elementary LAN with hardware and software to comply with requirements

Abstract LAN Model • We need LAN interfacing to be LAN independent • Standardisation bodies achieved this through LLC • But no LLC service aims at real-time behaviour, reliability, etc. • So we devise a complete model overcoming these problems • Using some of the properties of LANs to implement protocols

Abstract LAN Properties • An1 – Broadcast • An2 – Error Detection • An3 – Network Order • An4 – Full Duplex • An5 – Tightness • An6 – Bounded Transmission Delay • An7 – Bounded Omission Degree • An8 – Bounded Inaccessibility

Real time communication requirements • LAN components display following failures: • Timing failures • Omission failures • Network partitions • Definition of reliable real time network RT- “A reliable real-time network displays bounded and known message delivery delay, in the presence of disturbing factors such as overload or faults”

Real time communication requirements • Some networks recognize urgency • Urgency classes • Critical or hard real-time • Best-effort or soft real-time • Background or non real-time

Solution to real-time communication requirements • Enforce bounded delay from request to transmission of a frame given the worst case conditions assumed (avoid timing failures) • Ensure that a message is delivered despite the occurrence of omissions (tolerate omission failures) • Maintain connectivity (control partitions)

Enforcing Bounded Transmission Delay • An6 not guaranteed • Factors to take into account: • Traffic patterns • Latency classes • LAN sizing and parametrising • User-level load/flow control

Traffic patterns • Designer must model the traffic offered to the network • Aperiodic traffic • No guarantees about transmission delays • Cyclic traffic – defined by period • Sporadic traffic – bursty

Latency classes • Traffic separation in latency classes • Highest criticality traffic should be given lowest latency class • Should be given certain amount of channel bandwidth to fulfill latency requirements • Enforce a given transmission time bound for every sender

LAN sizing and parametrising • LAN sized and parametrised to comply with aimed bound or vice-versa • Aimed latency may not be achievable with offered load • Consequences • Latency goes up • Number of nodes and/or their offered load go down • Sending node reduces its traffic demands • Iterative procedure

User level load/flow control • Flow based load control delays transmissions • Role of real-time load control • Regulate global offered load • Throttle individual traffic • Sporadic event class has bound for • Interarrival rate • Burst length • Burst rate

Fig: Timing pattern of sporadic events (labels: burst period, burst length, minimum interarrival time, average interarrival time)

User level load/flow control • Rate based flow control • Calculate average interarrival rate • Manipulate the rate at which data is sent • Smooths the bursty nature of the traffic • Sending rate should not fall below the average interarrival rate
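
A minimal sketch of rate-based smoothing, assuming a single sender that enforces a minimum gap between consecutive transmissions. The function name `pace` and the parameter `min_gap` are hypothetical. For the backlog to stay bounded, `min_gap` should not exceed the average interarrival time, which is the last bullet restated in terms of time.

```python
def pace(arrivals, min_gap):
    """Return departure times so that consecutive sends are >= min_gap apart.

    `arrivals` is a non-decreasing list of request times; a request is sent
    as soon as both it has arrived and the previous send is min_gap old.
    """
    departures = []
    next_free = float("-inf")   # earliest time the next send is allowed
    for t in arrivals:
        send_at = max(t, next_free)
        departures.append(send_at)
        next_free = send_at + min_gap
    return departures


# A burst of three requests at t = 0 is spread out; the late request is untouched.
smoothed = pace([0, 0, 0, 10], min_gap=2)
```

Here `smoothed` is `[0, 2, 4, 10]`: the burst is stretched over time while isolated requests go out immediately.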

User level load/flow control • Load control mechanisms • Rate control • Suited for periodic and sporadic traffic • Matches senders' and recipients' capabilities • No discontinuities in traffic flow • Credit control • Allocates recipients some credits • When the credit is exhausted, the recipient refuses to accept more information • Improved scheme: look-ahead credit request or supply
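
Credit control can be illustrated with a small counter-based sketch. The class and method names are hypothetical, and counting one credit per message is an assumption; the slides do not say what unit a credit represents.

```python
class CreditedReceiver:
    """Recipient that accepts messages only while it holds credits."""

    def __init__(self, credits):
        self.credits = credits   # assumption: one credit = one message
        self.accepted = []

    def offer(self, msg):
        # Refuse the message once the credit is exhausted.
        if self.credits == 0:
            return False
        self.credits -= 1
        self.accepted.append(msg)
        return True

    def grant(self, n):
        # Supplying credit before it runs out corresponds loosely to the
        # look-ahead variant: the flow never has to stop.
        self.credits += n


rx_credit = CreditedReceiver(credits=2)
results = [rx_credit.offer(m) for m in ("m1", "m2", "m3")]
```

The third offer is refused, which is the traffic discontinuity that rate control avoids.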

Handling Omission Failures Characteristics of omissions in a LAN: • Omissions are rare. • They can occur in bursts. • They are usually the result of failure of a single component. Omission Degree: the number of consecutive omissions produced by a component. An7: Bounded Omission Degree. In a known interval Trd, omission errors may affect at most k transmissions. This property is the foundation of basic error processing protocols with deterministic termination, which is important for real time operation.
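
The omission degree of an observed trace can be computed as the longest run of consecutive omissions. The sketch below assumes a trace is given as a sequence of booleans (True = delivered); both function names are hypothetical.

```python
def omission_degree(trace):
    """Longest run of consecutive omissions (False entries) in `trace`."""
    worst = run = 0
    for delivered in trace:
        run = 0 if delivered else run + 1
        worst = max(worst, run)
    return worst


def within_bound(trace, k):
    """True when the trace respects an assumed omission degree bound k."""
    return omission_degree(trace) <= k
```

For example, the trace `[True, False, False, True, False]` has omission degree 2, so it respects a bound of k = 2 but violates k = 1.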

Transmission-With-Reply
    tries := 0; resp := empty;
    do tries < nrTries ^ resp != full ->
        resp := empty;
        Tx(data, id);
        waitRepliesPutInBag(TwaitReply, resp);
        tries := tries + 1;
    od
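
The same loop can be written as a runnable sketch. Here `send_round` is a hypothetical stand-in for `Tx(data, id)` followed by waiting `TwaitReply`; it returns the set of nodes whose replies arrived in that round, and `expected` plays the role of the "full" bag.

```python
def tx_with_reply(send_round, expected, nr_tries):
    """Retransmit until every node in `expected` replied or tries run out.

    Returns (success, tries_used).
    """
    tries = 0
    resp = set()
    while tries < nr_tries and resp != expected:
        # As in the pseudocode, each round starts with an empty bag.
        resp = set(send_round())
        tries += 1
    return resp == expected, tries


# Scripted channel: node "b" misses the first transmission only.
rounds = iter([{"a"}, {"a", "b"}])
outcome = tx_with_reply(lambda: next(rounds), {"a", "b"}, nr_tries=3)
```

In the scripted run the second try collects both replies, so `outcome` is `(True, 2)`; with no omissions a single try suffices.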

Diffusion
    tries := 0;
    do tries < nrTries ->
        Tx(data, id);
        tries := tries + 1;
    od
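
A runnable sketch of diffusion with a receiver that uses the identifier to discard duplicates. The channel model (a set of attempt numbers that are lost) and all names are assumptions made for illustration; `nr_tries` is set to k + 1 so that k omissions are masked.

```python
def diffuse(deliver, data, msg_id, nr_tries):
    """Send the same (data, msg_id) frame nr_tries times, unconditionally."""
    for attempt in range(nr_tries):
        deliver(attempt, data, msg_id)


class Receiver:
    """Delivers each message once, discarding repeated copies by id."""

    def __init__(self):
        self.seen = set()
        self.delivered = []

    def on_frame(self, data, msg_id):
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.delivered.append(data)


k = 2                 # assumed omission degree bound
lost = {0, 1}         # the first two attempts are omitted
rx = Receiver()


def channel(attempt, data, msg_id):
    # Hypothetical lossy channel: drops the attempts listed in `lost`.
    if attempt not in lost:
        rx.on_frame(data, msg_id)


diffuse(channel, "valve=open", 7, nr_tries=k + 1)
```

After the run `rx.delivered` holds exactly one copy of the message, whether zero, one, or two of the attempts were lost.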

Tx-with-reply • Optimal for the average case, where the error rate is expected to be low • Only one try in the absence of errors • Identifier id allows duplicate messages to be recognised • It aims for a completely correct series • It allows for complete order among competing LAN transmissions.

Diffusion • At least one instance of the message reaches every node • It repeats transmission k + 1 times • Both algorithms execute within a bounded time in the absence of partitions

Comparison of Algorithms

Inaccessibility RT: Maintain connectivity An8 : Bounded Inaccessibility. In a known interval Trd, the network may be inaccessible at most i times with a total duration of at most Tina. Network is partitioned into subsets of nodes that cannot communicate. • Causes of partition : bus medium failure, ring disruption, transmitter or receiver defects, token loss etc. • Controlling partition : Solution is in knowing how long a partition lasts. This should be sufficiently small so that the service can be carried on effectively • Inaccessibility : Period of time for which the partition lasts.

Inaccessibility Control How to implement inaccessibility control? • Instrument the LAN to recover from all conditions leading to partition • Have a bound for the number and duration of inaccessibility periods • Accommodate inaccessibility in the protocols and timeliness calculations. • Determine the upper bound for recovery from partitioning • The upper bound may be dependent on the operating situation specific to each LAN. • If the network is properly managed and parameterised, inaccessibility figures can be drastically reduced.

Inaccessibility in Timeliness Model Inaccessibility must be accounted for in the following: • Calculations of real worst-case execution times • Dimensioning of timeouts Synchronous real-time operation of LAN: • Tina has to be added to the real worst-case execution time of protocols • A protocol may fail if its timeout expires too early because an inaccessibility period occurred. • Including Tina in time-outs is a sufficient condition for running synchronous operation • Tina may be much greater than Ttd, causing timeouts to be undesirably long.

Better to take inaccessibility out of the time-outs Methods to remove inaccessibility: Timer Freezing: • Inaccessibility is detected • All timers used in time-outs are suspended • Timers are restarted when the network becomes accessible Inaccessibility Trapping: • Each inaccessibility period falling between two consecutive transmission signals from the LAN is trapped. This avoids more than one timeout per inaccessibility period. • Each inaccessibility occurrence counts as one omission. • Extra omissions have to be added to the retry count of the low-level protocols.
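
Timer freezing can be sketched as a timeout that counts only the time during which the network is accessible. The class below is an illustration with hypothetical names; it reads "restarted" as "resumed from where the timer was suspended", which is an assumption.

```python
class FreezableTimer:
    """Timeout that ignores time spent while the network is inaccessible."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.elapsed = 0.0   # accessible time consumed so far
        self.last = None     # start of the current running stretch; None = frozen

    def start(self, now):
        self.elapsed, self.last = 0.0, now

    def freeze(self, now):
        # Called when inaccessibility is detected.
        if self.last is not None:
            self.elapsed += now - self.last
            self.last = None

    def resume(self, now):
        # Called when the network becomes accessible again.
        if self.last is None:
            self.last = now

    def expired(self, now):
        running = 0.0 if self.last is None else now - self.last
        return self.elapsed + running >= self.timeout


timer = FreezableTimer(timeout=5.0)
timer.start(0.0)
timer.freeze(2.0)     # inaccessibility detected after 2 time units
timer.resume(10.0)    # network accessible again 8 units later
```

The 8 units of inaccessibility are not charged to the timeout, so the timer expires at t = 13 rather than t = 5, and the timeout value itself does not need to include Tina.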

LAN Redundancy Enforcement of bounded omission degree and bounded inaccessibility can be obtained through redundancy in the physical and medium layers • FDDI has a dual-reconfiguring ring capable of surviving just one interruption. • Token-bus and Ethernet have no standardised redundancy. • Extra measures have to be implemented to survive multiple failures.

Fig: Dual Media Token Bus LAN (labels, top to bottom: higher-level protocols, Medium-Access Control VLSI, selector state machines, two physical layers)

Addressing • Efficient and timely to meet real-time requirements. • Reception of frames not addressed to anyone in the node has to be avoided. Frame addressing involves the following: • Construction of the address at frame transmission • Interpretation of the address of the passing or received frame Address formats correspond to (type; addressing mode) • Type performs the first step in selection; it points to a set of possible filters • Mode selects the appropriate filter.

Addressing Classification of several addressing modes: • Individual: It enables a sender to address a particular station by its physical address. • Broadcast: It enables a frame to be accepted in all nodes. • Logical: It is intended to address a given group of nodes identified by an n-bit gate address independent of their location and number. • Selective: It consists of an n-bit binary chain in which each bit represents a node. The association between a station and a bit can be static or dynamic.
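
The four modes can be illustrated as a receive-side filter. The `Node` fields, the string mode names, and the bit numbering (bit 0 = least significant) are assumptions made for this sketch, not a frame format from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    physical: int                            # physical (individual) address
    gates: set = field(default_factory=set)  # gate addresses of groups it belongs to
    bit: int = 0                             # its bit position in a selective address


def accepts(node, mode, address):
    """Decide whether `node` should receive a frame with this (mode, address)."""
    if mode == "broadcast":
        return True
    if mode == "individual":
        return address == node.physical
    if mode == "logical":
        return address in node.gates
    if mode == "selective":
        # Each bit of the address stands for one node.
        return bool((address >> node.bit) & 1)
    raise ValueError(f"unknown addressing mode: {mode}")


station = Node(physical=0x2A, gates={3}, bit=2)
```

With this station, the selective address `0b0100` is accepted and `0b1011` is rejected, while a logical frame for gate 3 is accepted regardless of where the station sits.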

Processor Group Membership • It provides a map of the nodes belonging to the group. • It is independent of higher level groupings of processes. • It maintains an Active Stations Table (AST) AST provides the station ordering and a basic mask where stations are marked “up” or “down”

Processor Group Membership Categories of events that PGM responds to: • Insert/Delete • Join/Leave • Failure PGM functions: • Maintenance of AST: Responds to insert/delete requests • Provision of Short Addresses: Reference a node by its position in the AST • Failure and Group Change Handling: Acts upon suspicion of failure that may come from a network driver, group communication protocol, etc. • Information about group members: Can respond to a number of requests regarding group members.
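
A minimal sketch of an Active Stations Table with short addresses. The method names are hypothetical, and two behaviours are assumptions: a newly inserted station starts "down", and deleting a station shifts the short addresses of the stations after it.

```python
class ActiveStationsTable:
    """Ordered list of stations plus an up/down mask."""

    def __init__(self):
        self.stations = []   # station ordering; index = short address
        self.is_up = []      # mask parallel to `stations`

    def insert(self, station):
        if station not in self.stations:
            self.stations.append(station)
            self.is_up.append(False)   # assumption: new entries start "down"

    def delete(self, station):
        i = self.stations.index(station)
        del self.stations[i], self.is_up[i]

    def mark(self, station, up):
        self.is_up[self.stations.index(station)] = up

    def short_address(self, station):
        # A node is referenced by its position in the table.
        return self.stations.index(station)

    def members_up(self):
        return [s for s, up in zip(self.stations, self.is_up) if up]


ast = ActiveStationsTable()
for name in ("n7", "n3", "n9"):
    ast.insert(name)
ast.mark("n7", True)
ast.mark("n9", True)
```

In this run `ast.short_address("n9")` is 2 and `ast.members_up()` is `["n7", "n9"]`.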

Clockless PGM Protocol Delta-4 System • A GroupChangeEvent for join, leave or failure cases triggers the protocol. • In case of failure, the component detecting the failure issues the check request. The node requests the other members' state. • The node gets replies and constructs the new AST. It sends it out to members. This is done using Tx-with-Reply to make sure all members install the new table. • The first message locks the table so that competitors are left out • With omissions, more than one competitor may lock subsets of the nodes • Each of them retries, incrementing a lock_level counter, until one of them locks all nodes successfully and then proceeds

Fig: Clockless PGM Protocol (labels: Group change event, Compute station table, GetState (and lock), My state, NewState (unlock), Installed; (a) StationTableOps: Insert, Delete, Down, Up)

Clock-driven PGM Protocol AAS System • Two events trigger the protocol : Upon request like join or passage of time • Periodically membership management is done to ensure changes are detected in bounded time • Group communication is through diffusion. Only way to detect failures is through such a protocol. • All processors diffuse an “I’m alive” message so that each and everyone will build the same view of processors alive.

Time-Triggered PGM Protocol MARS System • Periodically all nodes broadcast their message • Each message is sent twice to overcome omission • Each processor listens to all transmissions, making a vector of dimension N, where N is the number of nodes. Vu,v is a boolean which is true when processor u saw a valid message from processor v • Vector V is then sent in the following period's transmission. All processors receive N vectors • A matrix is built as follows: • Each column u accounts for the messages Pu saw from all others • Each row v accounts for the messages from Pv seen by all the others

Time-Triggered PGM Protocol • This protocol detects failures with one cycle delay at most. • Matrices may not be equal in all nodes. They are guaranteed to hold enough information to deterministically detect a failed processor. • A failed processor is one that fails to transmit both copies of its message to all or fails to receive both copies of another node's message

          u=1    u=2    u=3    u=4
    P1     *     V2,1   V3,1   V4,1
    P2    V1,2    *     V3,2   V4,2
    P3    V1,3   V2,3    *     V4,3
    P4    V1,4   V2,4   V3,4    *
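
One way to read a failure verdict off such a matrix is sketched below. The decision rule is an interpretation, not the MARS algorithm as specified: a processor is treated as a failed sender when nobody else saw it, and as a failed receiver when it missed a message that at least one other processor did see. The function name and the `saw[u][v]` layout are hypothetical.

```python
def failed_processors(saw):
    """saw[u][v] is True when processor u saw a valid message from processor v.

    Diagonal entries are ignored (the "*" positions in the matrix).
    """
    n = len(saw)
    failed = set()
    for v in range(n):
        seers = [u for u in range(n) if u != v and saw[u][v]]
        if not seers:
            failed.add(v)            # nobody received v: sender failure
            continue
        for u in range(n):
            if u != v and not saw[u][v]:
                failed.add(u)        # others received v, u did not: receiver failure
    return failed


def all_seen(n):
    """Helper: matrix in which everybody saw everybody."""
    return [[True] * n for _ in range(n)]


silent = all_seen(4)
for u in range(4):
    if u != 2:
        silent[u][2] = False         # processor 2 was heard by no one

deaf = all_seen(4)
deaf[1][3] = False                   # processor 1 missed processor 3's message
```

Under this rule the `silent` matrix yields `{2}` and the `deaf` matrix yields `{1}`, and a fully populated matrix yields the empty set.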

Summary • Real time communication • Real time networks • Real time protocols • Real time networking and reliability policies • Making real-time networks reliable and timely • Bounded transmission delays • Handling failures • Inaccessibility

Summary • Low level protocols assist high level protocols in attaining: • Transmission reliability • Selective and logical addressing
