
Videoconferencing over Packet Switched Networks



  1. Videoconferencing over Packet Switched Networks 1. Introduction 2. Components 3. Network Architecture 4. Software Architecture 5. Performance 6. Videoconferencing over ATM

  2. 1. Introduction • The International Telecommunication Union (ITU) has addressed videoconferencing standards. H.320 was defined for a circuit-switched narrow-band ISDN environment at bandwidths ranging from 64 kbit/s to over 2 Mbit/s. It specified: 1) a central conference server called a multipoint control unit (MCU) to enable multiparty calls; each participant is directly linked to the MCU, which then controls the conference; 2) the H.261 video compression standard. The ITU later defined H.324 for POTS, H.323 for LANs, and H.310 for ATM. • Characteristics of the circuit-switched network (narrow-band ISDN or the switched 56 kbit/s phone line): 1. All connections flow over dedicated point-to-point links. Once established, a connection is dedicated between the endpoints and bandwidth is guaranteed for the duration of the call. 2. A centralised MCU has to be used to achieve a multiparty conference. 3. The transmission rate is ensured by synchronization of the codec and the network clock in both sender and receiver stations. 4. The video bit rate is forced to be constant to efficiently utilize the connection bandwidth, at the cost of variations in video quality and an additional delay in the compression engine.

  3. 1. Introduction • The packet-based network solution (e.g., Ethernet and Token Ring, or more generally, IP networks) strives to carry real-time traffic over existing computer communications networks. 1. The nodes send discrete blocks of data to each other. 2. Connections may be established between endpoints, but are only used when there is data to be sent. 3. Routers and switches can multicast without going through a centralised MCU. 4. Video/audio processing can be performed in each end station. 5. The packet-based network is expected to become the dominant platform. • The existing applications over packet-based networks are: the Xerox PARC Network Video tool nv, the INRIA Video Conferencing System ivs, and the LBL/UCB flexible videoconferencing tool vic. All of them provide video in real time over the MBone. They share the goal of supporting low-rate multicast video over the Internet.

  4. 2. Components • The hardware configuration of a videoconferencing system consists of: a video/audio capture and display card, a network communication adaptor, a compression/decompression card (or software), a video camera, a microphone/speaker, and a high-performance computer.

  5. 3. Network Architecture 1) Multicast • The purpose of videoconferencing: a large group of people spread over a large geographic region have some reason to hold a conference, and they need to send and receive data within a scalable subgroup. • Multicast is a special form of broadcast in which packets are delivered to a specified subgroup of network hosts. • There are two styles of multicasting: ___ multicasting via the Multicast Backbone (MBone), in which the sender doesn't know who will receive the packets. The sender just sends to an address and it is up to the receivers to join that group. (nv, ivs and vic) ___ the sender specifies who will receive the packets. This gives more control over the distribution. The drawback is that it doesn't scale well; it cannot cope with thousands of receivers. A prototype named Multimedia Multiparty Teleconferencing (MMT) was proposed by IBM. It uses a predetermined multicast address to set up the multicast connections. Network directory services are required to easily assign addresses and provide confidentiality.

  6. MBone • MBone is a virtual network on "top" of the Internet providing a multicasting facility to the Internet. It is composed of networks (islands) that support multicast. • In Fig. 2, the MBone consists of three islands. Each island consists of a local network connecting a number of client hosts ("C"), and one host running mrouted ("M"). The mrouted hosts are connected with each other via unicast tunnels. • Each mrouted has routing tables for deciding which tunnel to forward into. Each tunnel has a metric and a threshold. The metric specifies a routing cost that is used in the Distance Vector Multicast Routing Protocol (DVMRP). The threshold is the minimum time-to-live (TTL) that a multicast datagram needs in order to be forwarded into a given tunnel; the TTL is specified for each multicast packet (see the check sketched below). • All traffic in the MBone uses the User Datagram Protocol (UDP) rather than TCP.
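
  The threshold rule above reduces to a single comparison per tunnel. A minimal Python sketch (function and parameter names are ours):

    def forward_into_tunnel(packet_ttl: int, tunnel_threshold: int) -> bool:
        """A multicast datagram is forwarded into a tunnel only if its
        remaining TTL meets the tunnel's threshold."""
        return packet_ttl >= tunnel_threshold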

  7. 2) Protocols • Because videoconferencing needs real-time video/audio transmission, TCP is not suitable for this requirement. • Both ivs and vic use the Real-time Transport Protocol (RTP) over UDP/IP to transmit the streams across the Internet. • RTP was developed by the Internet Engineering Task Force (IETF) as an application-level protocol. It aims to provide a very thin transport layer which is most often integrated into the application processing rather than being implemented as a separate layer. • In the IP protocol stack, RTP is layered over UDP; in the ATM stack, it runs over the ATM Adaptation Layer. (Basically, every connection which can be established using existing end-to-end protocols can be used for RTP.)

  8. Real Time Transport Protocol (RTP) • RTP describes two protocols: the data transfer protocol (RTP) and the control protocol (RTCP). RTP/RTCP concentrates on the transmission of real-time, non-reliable streaming data and is specialised to work with both unicast and multicast delivery. • Each RTP packet consists of an RTP header and an RTP payload. In ivs, the RTP payload = RTP-H.261 header + H.261 video stream. • RTCP manages control information like sender identification, receiver feedback, and cross-media synchronisation. RTCP packets are transmitted periodically to all participants in the session, and the period is adjusted according to the size of the session. • The RTP header mainly consists of the following fields: • Marker (M): 1 bit, 1 = the last packet of a video frame, 0 = other. • Payload type (PT): 7 bits, gives the media type and encoding/compression format of the RTP payload. At any given time an RTP sender is supposed to send only a single type of payload.

  9. The RTP header: • Sequence number: 16 bits, this number increments by one for each RTP data packet sent, and may be used by the receiver to detect packet loss and to restore packet sequence. • Timestamp: 32 bits, it encodes the sampling instant of the first octet of data in the RTP packet. The sampling instant must be based on a clock. If a video image occupies more than one packet, the timestamp will be the same on all of those packets. Packets from different video images must have different timestamps. • Synchronization source identifier (SSRC): 32 bits, it identifies the synchronization source. It is a randomly chosen value meant to be globally unique within a particular RTP session. • Contributing source (CSRC) list: 0 to 15 items, 32 bits each. The list identifies the contributing sources for the payload contained in this packet. CSRC lists are inserted by a mixer. The mixer is an intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner and then forwards a new RTP packet. (A parsing sketch follows.)
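
  To make the header layout above concrete, here is a minimal Python sketch that unpacks the fixed 12-byte RTP header and the optional CSRC list; the field offsets follow the description above, and the function name is our own:

    import struct

    def parse_rtp_header(packet: bytes) -> dict:
        """Unpack the fixed 12-byte RTP header plus any CSRC entries."""
        if len(packet) < 12:
            raise ValueError("packet shorter than the fixed RTP header")
        b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
        cc = b0 & 0x0F                    # CSRC count (0..15)
        marker = (b1 >> 7) & 0x01         # 1 = last packet of a video frame
        payload_type = b1 & 0x7F          # media/encoding format
        csrc = list(struct.unpack("!%dI" % cc, packet[12:12 + 4 * cc]))
        return {"marker": marker, "payload_type": payload_type,
                "sequence": seq, "timestamp": ts, "ssrc": ssrc, "csrc": csrc}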

  10. The main goals of RTP: • Synchronisation of various streams RTP provides functionality suited for carrying real-time content, e.g. a timestamp for synchronising different streams with timing properties. Synchronization has to happen at the application level by either: 1) using a playback buffer large enough to compensate for most of the jitter of all participating streams, or 2) introducing an additional RTCP-like management stream, which only gives feedback about relative arrival times between the independent streams and hence uses relatively little bandwidth. • Flow and congestion control The basis for flow and congestion control is provided by the RTCP Sender and Receiver Reports. Congestion is distinguished as transient congestion and persistent congestion. By analysing the interarrival-jitter field of the RTCP sender report, the jitter over a certain interval can be measured, which can indicate congestion before it becomes persistent (see the jitter sketch below). • A global service provider can detect local or global network congestion, and react in time to prevent heavy packet loss, by using a monitor program that receives and evaluates only RTCP packets. • The receivers of a stream detect decreasing packet rates and inform the sender via Receiver Reports. The sender can then change the format/compression of the media.
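
  The interarrival-jitter computation mentioned above can be sketched as follows; this is a minimal Python version of the RFC 3550-style estimator, with variable names of our own and all times assumed to be in RTP timestamp units:

    def update_jitter(jitter, arrival, timestamp, prev_arrival, prev_timestamp):
        """One step of the RTCP interarrival-jitter estimate."""
        # D: change in transit time between two consecutive packets
        d = (arrival - prev_arrival) - (timestamp - prev_timestamp)
        # Exponential smoothing with gain 1/16, as in the RTCP specification
        return jitter + (abs(d) - jitter) / 16.0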

  11. The main goals of RTP: • Support of different payload types The PT field of the RTP header identifies the payload media type. 1) dynamic payload types: in the range 96-127. 2) popular payload types: detailed descriptions. • Packet source tracing after arrival 1) the SSRC field of each RTP header can be used to trace its origin. 2) the CSRC field of the RTP header is used to identify contributing sources when mixers and translators are in use. • Reliability RTP is a non-reliable protocol. It doesn't provide any mechanism for error detection or packet ordering. 1) The sequence number of each RTP header can be used to reorder packets and estimate the local or overall packet loss rate (see the sketch below). 2) RTCP Receiver Reports are used to give feedback to the sender about the received packet rate and network influences like jitter.
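
  As a small illustration of point 1), here is one way a receiver might estimate the loss rate from the sequence numbers it has seen; this is a sketch that ignores 16-bit wraparound and duplicate handling:

    def loss_rate(received_seqs):
        """Fraction of packets missing between the lowest and highest
        sequence numbers observed (no wraparound handling)."""
        expected = max(received_seqs) - min(received_seqs) + 1
        return 1.0 - len(set(received_seqs)) / expected

    # e.g. loss_rate([1, 2, 4, 5]) -> 0.2  (packet 3 was lost)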

  12. 3) Congestion control • The congestion control scheme used in ivs is an end-to-end control model, which consists of a network sensor, a throughput controller and an output rate controller. 1) To implement the network sensor, two approaches are provided. __ let each receiver send a negative acknowledgment (NACK) packet whenever it detects a loss. __ periodically send a QoS measure of the packet loss rate observed by receivers during a time interval covering 100 packets. The QoS approach is more efficient than the NACK approach if the packet loss rate is higher than 1%. After receiving this information, the sender computes a median loss rate med_loss. 2) In the throughput controller, ivs adjusts the maximum output rate of the coder max_rate so that med_loss < a tolerable loss rate tol_loss. The control algorithm is as follows: if (med_loss > tol_loss) max_rate = max(max_rate/2, min_rate); else max_rate = gain*max_rate. Here min_rate = 10 kb/s, gain = 1.5, tol_loss = 10%, and max_rate = 100 kb/s. The resulting max_rate is used to control the output rate (a sketch of this controller appears below).
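
  Below is a minimal Python sketch of this multiplicative decrease / increase rule; treating the slide's 100 kb/s figure as an upper bound on max_rate is our assumption:

    MIN_RATE = 10_000      # min_rate = 10 kb/s
    GAIN = 1.5             # gain
    TOL_LOSS = 0.10        # tolerable median loss rate (10%)
    RATE_CAP = 100_000     # 100 kb/s, interpreted here as a cap (assumption)

    def adjust_max_rate(max_rate, med_loss):
        """One control step: halve on excessive loss, otherwise grow by GAIN."""
        if med_loss > TOL_LOSS:
            return max(max_rate / 2, MIN_RATE)
        return min(GAIN * max_rate, RATE_CAP)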

  13. 3) In ivs, two methods are used to control the output rate. __ In privilege quality (PQ) mode, the value of the quantizer and the motion detection threshold are constant, and match the maximal visual quality. The frame rate is changed so that the output rate stays below max_rate. __ In privilege frame rate (PFR) mode, to keep the frame rate constant, the output rate is controlled by varying the quantizer and motion detection threshold values. • In MMT, only frame rate control is used in the rate control implementation, because it uses a hardware JPEG chip in its compression scheme. The frame rate is calculated as follows: frame rate = target bandwidth / size of the current compressed frame. It is recomputed only when the difference between the current compressed frame size and the previous one is larger than a threshold (see the sketch below).
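
  A small Python sketch of the MMT frame-rate rule above; the threshold value and the function/variable names are ours:

    def mmt_frame_rate(target_bw, frame_size, prev_size, current_rate, threshold):
        """Recompute the frame rate only when the compressed-frame size
        changes by more than `threshold` (threshold value assumed)."""
        if abs(frame_size - prev_size) > threshold:
            return target_bw / frame_size   # frames per second
        return current_rate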

  14. 4) Error control • ivs uses two methods for error control. 1) The receiver identifies missing blocks by using the timestamp, as all GOBs belonging to a given image have the same timestamp. When the receiver notices that some packets have not been received, it sends a NACK packet to the sender. The sender then sends INTRA-encoded data of a new frame to the receiver. This is a forced replenishment, not a retransmission procedure. The drawbacks are: a) feedback explosion when receivers are too numerous, b) regular hardware H.261 codecs are not designed to handle NACK packets. 2) periodically refreshing the image in INTRA encoding mode. The H.261 recommendation requires INTRA encoding of each MB at least once every 131 times it is transmitted, to control the accumulation of decoder mismatch error. In ivs, when the number of receivers is less than 10, NACK packets are used; otherwise periodic INTRA refreshment is used (see the one-line policy below).
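
  The mode switch described above amounts to a one-line policy; a trivial sketch (the cutoff of 10 receivers is from the slide, the names are ours):

    def repair_mode(num_receivers):
        """ivs error-control policy: NACK-driven replenishment for small
        groups, periodic INTRA refresh for large ones."""
        return "NACK" if num_receivers < 10 else "INTRA_REFRESH"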

  15. 4. Software Architecture The software of a videoconferencing system consists of: 1) user interface, 2) capture path, 3) compression path, 4) packetization path, 5) encryption path, 6) the conference control bus, 7) decryption path, 8) depacketization path, 9) decompression path, 10) rendering path. nv, ivs and vic are developed in C++ and Tcl/Tk in a UNIX environment. • User Interface It should include the conference, compression and transmission information and other useful information. • Capture path It converts analog video to digital signals by software or hardware. --- The MIT Laboratory for Computer Science has developed a video capture Vidboard for a real-time distributed multimedia system centered around an ATM network. --- nv, ivs and vic do not provide software capture; they use a video capture board. --- vic optimised the capture paths by supporting each video format. • Rendering path It converts video from the YUV pixel representation used by most compression schemes to a format suitable for the output device (an external video device or an X window). vic supports several operations for X windows: colour-mapped display; simple conversion of pixels from the YUV color space to RGB (a conversion sketch follows below); dithering only the regions of the image that change.
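
  The YUV-to-RGB step in the rendering path is a fixed linear transform per pixel. A plain Python sketch using ITU-R BT.601-style coefficients (vic's exact constants and optimisations may differ):

    def yuv_to_rgb(y, u, v):
        """Convert one YUV (YCbCr, 0..255) pixel to RGB, clamped to 0..255."""
        r = y + 1.402 * (v - 128)
        g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
        b = y + 1.772 * (u - 128)
        return tuple(int(min(max(c, 0), 255)) for c in (r, g, b))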

  16. User Interface [Figure: the ivs and vic user interfaces]

  17. Compression path It performs video compression to reduce data redundancy. 1) nv uses a Haar wavelet compression scheme, in which each 8x8 image block is transformed by a Haar wavelet. Resulting coefficients less than a threshold are set to zero, and the remainder are run-length coded (see the sketch below). 2) vic provides several compression schemes including MPEG, Motion-JPEG, H.261 and Intra-H.261. Intra-H.261 uses the intraframe coding of H.261 without H.261's interframe coding. 3) ivs uses H.261. The H.261 standard, commonly called px64, is optimised to achieve a very high compression ratio for full-color, real-time motion video transmission. The px64 (p = 1, 2, ..., 30) compression algorithm combines: intraframe coding ___ DCT-based intraframe compression; interframe coding ___ predictive interframe coding based on DPCM (Differential Pulse Code Modulation) and motion estimation. The px64 algorithm operates with two picture formats adopted by the CCITT, Common Intermediate Format (CIF) and Quarter-CIF (QCIF), where 1 CIF = 12 GOBs (Groups of Blocks), 1 QCIF = 3 GOBs, 1 GOB = 3x11 MBs (Macro Blocks) and 1 MB = 4Y+Cb+Cr.
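
  For point 1), the block transform can be sketched as below: a full 8x8-block Haar decomposition with coefficient thresholding. nv's exact normalisation and coding order are not given here, so this is illustrative only:

    import numpy as np

    def haar_1d(v):
        """Full 1-D Haar decomposition of a length-8 vector."""
        out = v.astype(float).copy()
        n = len(out)
        while n > 1:
            a = (out[0:n:2] + out[1:n:2]) / np.sqrt(2)   # pairwise averages
            d = (out[0:n:2] - out[1:n:2]) / np.sqrt(2)   # pairwise details
            out[:n // 2], out[n // 2:n] = a, d
            n //= 2
        return out

    def haar_block(block, threshold):
        """2-D Haar transform of an 8x8 block, zeroing small coefficients
        before run-length coding (coding step omitted)."""
        coeffs = np.apply_along_axis(haar_1d, 1, np.asarray(block, float))
        coeffs = np.apply_along_axis(haar_1d, 0, coeffs)
        coeffs[np.abs(coeffs) < threshold] = 0.0
        return coeffs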

  18. Packetization path In the packetization path, the compressed data are fragmented or collected into transmission units, then interlaced with the audio stream and transmitted over the network. _ Both ivs and vic use the Real-time Transport Protocol (RTP) to transmit the video/audio flow. _ The latest version of ivs takes the MB as the unit of fragmentation in the packetization scheme, instead of the GOB used in the first version. 1) Packets must start and end on an MB boundary; that is, an MB cannot be split across multiple packets (see the sketch below). 2) The fragmentation units are carried as payload data within the RTP protocol. This packetization scheme is currently proposed as a standard by the audio-video transport working group (AVT-WG) at the IETF. • Encryption/decryption path This is for network security. Encryption is implemented as the last step in the transmission path and decryption as the first step in the reception path. _ vic employs the Data Encryption Standard (DES) in cipher block chaining mode.
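
  A minimal Python sketch of the MB-boundary rule in point 1): macroblocks are packed greedily into payloads without ever splitting one. The names and the assumption that every MB fits in a payload are ours:

    def packetize(macroblocks, max_payload):
        """Pack compressed MBs into payloads, never splitting an MB.
        Assumes every MB fits into one payload by itself."""
        packets, current = [], b""
        for mb in macroblocks:
            if current and len(current) + len(mb) > max_payload:
                packets.append(current)   # close the packet at an MB boundary
                current = b""
            current += mb
        if current:
            packets.append(current)
        return packets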

  19. The conference bus It is only provided by vic, as a mechanism for coordination among the separate processes. Each application can broadcast a typed message on the bus, and all applications that are registered to receive that message type will get a copy. Conference buses are implemented as multicast datagram sockets bound to the interface. vic uses the conference bus to provide the following functions: • Floor Control: the moderator can give the floor to a participant by multicasting a takes-floor directive with that participant's RTP CNAME. Locally, each receiver then mutes all participants except the one that holds the floor. • Synchronization: each real-time application induces a buffering delay, called the playback point, to adapt to packet delay variations. By broadcasting "synchronise" messages across the conference bus, the different media can compute the maximum of all advertised playout delays. This maximum is then used in the delay-adaptation algorithm. • Voice-switched Windows: current-speaker messages are broadcast by vat over the conference bus, indicating the speaker's RTP CNAME. vic monitors these messages and switches the viewing window to that person. • Device Access: applications sharing a common device issue claim-device and release-device messages on the global bus to coordinate ownership of an exclusive-access device. (A toy bus sender is sketched below.)
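
  To illustrate the mechanism, here is a toy bus sender built on a multicast datagram socket; the group address, port, and text message format are assumptions for illustration, not vic's actual values:

    import socket

    BUS_ADDR, BUS_PORT = "239.255.0.1", 5004   # assumed group/port

    def bus_send(msg_type, payload):
        """Broadcast a typed message on the bus; applications registered
        for this type on the same group receive a copy."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.sendto(("%s %s" % (msg_type, payload)).encode(),
                    (BUS_ADDR, BUS_PORT))
        sock.close()

    # e.g. floor control: bus_send("takes-floor", "participant-cname")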

  20. 5. Performance • Performance was tested on an SS10/20 (41 MIPS) platform with the SunVideo board. • Results: 1) only nv can reach 20 QCIF frames/s when the video is very animated. 2) the ivs H.261 coder gives a 30% higher compression rate than the vic H.261 coder. 3) the vic H.261 coder uses less CPU than the ivs H.261 coder. • nv is strong in the low complexity of its compression algorithm, and hence a higher frame rate; but its compression rate is low. • ivs is strong in network control. • vic is strong in compression methods.

  21. 6. Videoconferencing over ATM • ATM versus the Internet: • ATM provides all sorts of services, including audio, video and data applications. • The Internet is less suitable for real-time transmission. • ATM provides the high performance required by live video and audio. • ATM makes it easier to provide quality-of-service guarantees suitable for high-quality voice and video. • The CCITT pursued a cell architecture for future telephone networks, since cells had the potential to reduce the number of transmission networks, provide easier support for multicasting, and offer a better multiplexing scheme than ISDN at high speeds. Being a particular form of cell networking, ATM is argued to be the only rational choice for future networks.

  22. A prototype of videoconferencing over ATM
