RTP and playout delay compensation

Henning Schulzrinne
Dept. of Computer Science
Columbia University
Fall 2003

RTP packet header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
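To make the bit layout concrete, here is a minimal Python sketch that unpacks the fixed 12-byte header and the CSRC list with the standard `struct` module. The function name `parse_rtp_header` and the dictionary keys are illustrative choices, not part of any library API; the padding and extension bits are only reported, not processed.

```python
import struct

def parse_rtp_header(data: bytes) -> dict:
    """Decode the fixed RTP header and CSRC list (network byte order)."""
    if len(data) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    # B = byte 0 (V, P, X, CC), B = byte 1 (M, PT), H = sequence number,
    # I = timestamp, I = SSRC
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    cc = b0 & 0x0F  # number of 32-bit CSRC identifiers that follow
    if len(data) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack("!%dI" % cc, data[12:12 + 4 * cc]))
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": cc,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": ts,
        "ssrc": ssrc,
        "csrcs": csrcs,
    }
```

For example, `parse_rtp_header(struct.pack("!BBHII", 0x80, 0x80, 1, 160, 0x12345678))` yields version 2, marker set, payload type 0, sequence number 1 and timestamp 160.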

RTP: timestamp
• Timestamp measured in sample units
  • reflects nominal sampling time of first sample in packet
  • e.g., 20 ms block size of 8,000 Hz audio → 160 timestamp units per packet
• Always 90 kHz for video
  • e.g., 3000 timestamp units per frame for 30 fps
  • 3600 for 25 fps
  • 3750 for 24 fps
  • the nominal rate is used even if the real system clock is slightly slower or faster
• Note: 32-bit integer → may wrap around
  • if starting at 0, after about 6 days for audio, ½ day for video
  • but the starting value is supposed to be random
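The timestamp arithmetic on this slide can be checked with a few lines of Python. The helper names below are hypothetical; the sketch assumes integer clock rates, packet durations and frame rates, and models wrap-around as addition modulo 2^32.

```python
TS_MODULUS = 2 ** 32  # RTP timestamps are 32-bit unsigned integers

def ts_units_per_packet(clock_rate_hz: int, packet_ms: int) -> int:
    """Timestamp increment for one packet holding packet_ms of audio."""
    return clock_rate_hz * packet_ms // 1000

def video_units_per_frame(clock_rate_hz: int, fps: int) -> int:
    """Timestamp increment per video frame (assumes clock_rate_hz divisible by fps)."""
    return clock_rate_hz // fps

def wrap_days(clock_rate_hz: int) -> float:
    """Days until a timestamp starting at 0 wraps around."""
    return TS_MODULUS / clock_rate_hz / 86400

def advance_timestamp(ts: int, increment: int) -> int:
    """Next timestamp, wrapping at 2^32."""
    return (ts + increment) % TS_MODULUS
```

With these, 8 kHz audio in 20 ms packets advances by 160 per packet, and a 90 kHz video clock advances by 3000, 3600 and 3750 per frame at 30, 25 and 24 fps.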

RTP sequence number
• Counts packets actually sent
• Wraps around much more quickly (only 16 bits)
  • e.g., for 20 ms packets, in about 22 minutes
• Also uses a random starting value
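Because the 16-bit counter wraps within minutes, a receiver has to compare sequence numbers modulo 2^16. A minimal sketch, assuming serial-number style comparison (a difference of less than half the number space counts as "newer") and extending numbers into an unwrapped counter, roughly in the spirit of the extended sequence number in RFC 3550; all names are hypothetical, and source restarts and probation are not handled.

```python
SEQ_MODULUS = 2 ** 16  # RTP sequence numbers are 16-bit

def seq_delta(new: int, old: int) -> int:
    """Signed distance from old to new, interpreting the 16-bit space circularly."""
    d = (new - old) % SEQ_MODULUS
    return d - SEQ_MODULUS if d >= SEQ_MODULUS // 2 else d

def wrap_minutes(packet_ms: float) -> float:
    """Minutes until the sequence number wraps at one packet per packet_ms."""
    return SEQ_MODULUS * packet_ms / 1000 / 60

def extend_sequence_numbers(seqs):
    """Map received 16-bit numbers to an unwrapped counter (handles reordering)."""
    highest = None
    out = []
    for s in seqs:
        ext = s if highest is None else highest + seq_delta(s, highest % SEQ_MODULUS)
        out.append(ext)
        if highest is None or ext > highest:
            highest = ext
    return out

def packets_lost(seqs) -> int:
    """Gaps between the lowest and highest extended numbers seen."""
    ext = extend_sequence_numbers(seqs)
    expected = max(ext) - min(ext) + 1
    return expected - len(set(ext))
```

For the arrival order 65533, 65534, 0, 2, 1, 4, the sketch detects that 65535 and 3 are missing (two packets lost), even though the counter wrapped and packets 1 and 2 arrived out of order.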

RTP timestamp vs. sequence number
• Related, but serve different purposes
  • timestamp for timing reconstruction:
    • playout delay compensation (later)
    • synchronization with other sources (later)
  • sequence number for loss measurements and gap detection
• t = s*b + c
  • where t = timestamp
  • s = sequence number
  • b = sample (timestamp) units per packet
  • offset c is constant within a talkspurt, but changes after each talkspurt or after a transmission gap
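The relation t = s*b + c suggests a simple receiver-side check: compute c = t - s*b for each packet and note where it changes. A minimal sketch, assuming fixed-size packets and already-unwrapped sequence numbers and timestamps; the function name is hypothetical. A lost packet leaves c unchanged, because the sequence number counts packets actually sent, whereas silence suppression (no packets sent) shifts c.

```python
def talkspurt_starts(packets, b):
    """packets: list of (s, t) = (sequence number, timestamp), in arrival order.
    Returns indices where the offset c = t - s*b differs from the previous packet."""
    starts = []
    prev_c = None
    for i, (s, t) in enumerate(packets):
        c = t - s * b  # constant within a talkspurt
        if c != prev_c:
            starts.append(i)
        prev_c = c
    return starts

# 20 ms packets of 8 kHz audio (b = 160); a silence period precedes sequence number 103
pkts = [(100, 16000), (101, 16160), (102, 16320), (103, 20000), (104, 20160)]
print(talkspurt_starts(pkts, 160))
```

Here c is 0 for the first three packets and 3520 afterwards, so new talkspurts start at indices 0 and 3.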

Playout delay
• Converts variable network delay (“jitter”) into fixed delay
  • thus, end-to-end delay is max(jitter) + propagation delay
  • or, if willing to tolerate some late packets:
    • delay = 95th percentile of jitter + propagation delay (about 5% of packets then arrive too late)
• Propagation delay is invisible
  • and hard to measure without synchronized clocks
  • about 5 ms/1000 km one way
• Total delay should be less than 150 ms one-way
• End-to-end delay must remain constant within a talkspurt
  • otherwise, playout has gaps
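The percentile rule can be sketched directly. This assumes one-way delay samples measured without synchronized clocks (arrival time minus scaled timestamp), so each sample carries an unknown constant offset; subtracting the minimum removes that offset and leaves only the variable part. The nearest-rank percentile, the function names and the `propagation_ms` rule-of-thumb helper are illustrative assumptions.

```python
import math

def playout_delay(delays_ms, percentile=95.0):
    """Extra delay beyond the fastest packet so that `percentile` % arrive in time."""
    base = min(delays_ms)  # removes the unknown clock offset and propagation delay
    variable = sorted(d - base for d in delays_ms)
    rank = math.ceil(percentile * len(variable) / 100)  # nearest-rank method
    return variable[max(rank, 1) - 1]

def late_fraction(delays_ms, extra_ms):
    """Fraction of packets that would miss their playout point."""
    base = min(delays_ms)
    return sum(1 for d in delays_ms if d - base > extra_ms) / len(delays_ms)

def propagation_ms(km):
    """Rule of thumb from the slide: about 5 ms per 1000 km, one way."""
    return km / 1000 * 5
```

With 100 samples spread uniformly over 40 to 139 ms, the 95th-percentile extra delay is 94 ms and 5% of packets arrive late; over a 4000 km path that adds roughly 20 ms of propagation delay, which together must stay within the 150 ms one-way budget.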

[Figure: playout delay vs. packet jitter over time; packets arriving after their playout point count as lost (late = lost)]

Playout buffer
• Logically infinite buffer
  • implemented as “circular buffer”, with wrap-around
• Takes care of jitter and re-ordering, based on RTP timestamp t
• Playout point p = t*b + c
  • p = buffer position, measured in samples (typically 16 bits each if decoding is done before playout)
  • b = buffer positions per sample (usually = 1)
  • c = offset
• Usually, best to think of each talkspurt as an independently schedulable unit
  • p = p0 + (t – t0) * b
  • t0 = timestamp of first packet in talkspurt
  • p0 = position of first packet in talkspurt
[Figure: playout buffer with silence between talkspurts; decoder (G.729 → L16) before the buffer]
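A minimal sketch of the circular playout buffer, using p = p0 + (t – t0) * b per talkspurt. Assumptions: samples are already decoded integers (e.g., 16-bit linear PCM), positions p0 and the play pointer are kept as unwrapped counters and reduced modulo the buffer size only when indexing, played positions are reset to 0 (silence), and the class and method names are hypothetical. Packets whose position has already been played are discarded (late = lost).

```python
class PlayoutBuffer:
    """Circular buffer; packet samples go to p = p0 + (t - t0) * b (mod size)."""

    def __init__(self, size: int, b: int = 1):
        self.size = size
        self.b = b               # buffer positions per sample
        self.buf = [0] * size    # 0 = silence
        self.read_pos = 0        # play pointer, unwrapped
        self.t0 = None
        self.p0 = None

    def start_talkspurt(self, t0: int, p0: int) -> None:
        """Schedule the talkspurt whose first packet has timestamp t0 at position p0."""
        self.t0, self.p0 = t0, p0

    def insert(self, t: int, samples) -> bool:
        """Write a packet at its playout point; arrival order does not matter."""
        p = self.p0 + (t - self.t0) * self.b
        if p < self.read_pos:
            return False  # late = lost: playout point already passed
        if p + len(samples) * self.b > self.read_pos + self.size:
            raise ValueError("buffer too small for this playout delay")
        for i, sample in enumerate(samples):
            self.buf[(p + i * self.b) % self.size] = sample
        return True

    def play(self, n: int):
        """Read n positions, clearing each one to silence after playout."""
        out = []
        for _ in range(n):
            idx = self.read_pos % self.size
            out.append(self.buf[idx])
            self.buf[idx] = 0
            self.read_pos += 1
        return out

pb = PlayoutBuffer(size=16)
pb.start_talkspurt(t0=1000, p0=2)
pb.insert(1004, [5, 6, 7, 8])  # second packet arrives first (re-ordering)
pb.insert(1000, [1, 2, 3, 4])
print(pb.play(10))
```

The two packets end up in timestamp order regardless of arrival order, with two positions of silence before the talkspurt starts.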

Playout buffer, cont’d.
• Thus, the hard part is computing the insertion point for the first packet in a talkspurt
  • trying to predict the future: late loss vs. excessive delay
• Conceptually, two approaches:
  • look at the current playout point when the first packet arrives
    • then, leave some margin of error
    • may be too conservative
  • compute based on the last talkspurt and change c
    • avoids overestimation due to a slow first packet
    • deals less well with jumps in delay after long pauses
• Simple method: assume a roughly normal delay distribution and take n times the standard deviation (or mean deviation) of the delay (= jitter)
  • this becomes the extra delay
• Other mechanisms:
  • spike detection
  • optimal value for last talkspurt
[Figure: insert and play points in the playout buffer; packet timestamps t=100 and t=140]
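One way to apply the simple method adaptively is to keep exponentially weighted moving averages of the delay d and of its mean deviation v, and to use n*v as the extra delay. The EWMA estimator is a swapped-in choice (the slide does not say how mean and deviation are computed), and `alpha`, `n`, the class name and `schedule_first_packet` are assumptions. Delays measured without synchronized clocks carry an unknown constant offset; it shifts d but not v, so the margin n*v is unaffected. `schedule_first_packet` follows the first approach: current play position plus margin.

```python
class PlayoutEstimator:
    """EWMA estimates of delay d and mean deviation v; playout delay = d + n*v."""

    def __init__(self, alpha: float = 0.998, n: float = 4.0):
        self.alpha = alpha  # smoothing factor (assumed value)
        self.n = n          # safety factor (assumed value)
        self.d = None
        self.v = 0.0

    def update(self, delay_ms: float) -> None:
        """Feed one packet's measured delay."""
        if self.d is None:
            self.d = delay_ms
            return
        self.d = self.alpha * self.d + (1 - self.alpha) * delay_ms
        self.v = self.alpha * self.v + (1 - self.alpha) * abs(self.d - delay_ms)

    def extra_delay(self) -> float:
        """Margin beyond the mean delay: n times the deviation estimate."""
        return self.n * self.v

    def playout_delay(self) -> float:
        if self.d is None:
            raise ValueError("no delay samples yet")
        return self.d + self.extra_delay()

def schedule_first_packet(play_pos: int, extra_delay_ms: float, clock_rate_hz: int) -> int:
    """Position p0 for a talkspurt's first packet: current play point plus margin."""
    return play_pos + round(extra_delay_ms * clock_rate_hz / 1000)

est = PlayoutEstimator(alpha=0.5, n=4.0)
est.update(10.0)
est.update(20.0)
p0 = schedule_first_packet(1000, est.extra_delay(), 8000)
```

With `alpha=0.5`, the two samples give d = 15 ms and v = 2.5 ms, so the margin is 10 ms, or 80 samples at 8 kHz. A value of `alpha` close to 1 makes the estimate stable but slow to follow jumps in delay.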
