A STUDY OF DESIGN COMPROMISES FOR SPEECH CODERS IN PACKET NETWORKS

Roch Lefebvre, Philippe Gournay
University of Sherbrooke, Sherbrooke, Quebec, Canada

Redwan Salami
VoiceAge Corp., Montreal, Quebec, Canada

1. INTRODUCTION

• In voice over packet networks, the coding gain achieved by prediction-based speech coders is offset by packet losses. Concealment must be applied to the missing packets, which reduces quality for two main reasons:
    • not all missing packets can be concealed, especially when concealment uses only the past signal (onsets, transients)
    • the concealment error can propagate over several frames, even frames received correctly. Culprit: desynchronisation of the excitation content (LTP)
• We propose to compare two approaches for alleviating this problem:
    • Adding redundancy to increase the robustness of a baseline predictive encoder (G.729)
    • Using a speech coding model which does not have interframe dependencies (iLBC)
• To be compared, solutions should have comparable bit rates.

[Figure: 10-ms G.729 frames F2k-2 … F2k+3 grouped into 20-ms packets Pk-1, Pk, Pk+1; long-term prediction draws on the past excitation.]

2. ADDED REDUNDANCY versus FRAME INDEPENDENCE

Approach 1 : Use a lower bit rate, predictive (CELP) coder, and add channel redundancy to improve robustness to missing frames.

Approach 2 : Use a higher bit rate, non-predictive or "frame-independent" codec, to improve robustness to missing frames in the core codec itself.

Codec_P : G.729 (CELP-based)
Codec_FI : iLBC (frame-independent)

[Figure: anticipated gains in quality. Quality (robustness to frame loss) versus total payload bit rate for Codec_P, Codec_P + R (redundancy), Codec_P + R + Delay, and Codec_FI.]

3. PROPOSED APPROACHES FOR ADDING REDUNDANCY

Consider only G.729 at 8 kbps (baseline predictive coder) and add redundancy to obtain bit rates similar to iLBC at 15.2 kbps.

[Figure: content of each 20-ms packet (two G.729 frames) for G.729-0, G.729-1, G.729-2 / G.729-3 and G.729-4, built from frames F and reduced frames F’.]

G.729-2 and G.729-3 differ at the decoder :

G.729-2 : Decode packet Pk when it arrives (do not wait for packet Pk+1). If packet Pk is missing, apply concealment, then resynchronise the filter memories using F’2k and F’2k+1, which are received when packet Pk+1 arrives. Then start decoding packet Pk+1.

G.729-3 : Decode packet Pk only after packet Pk+1 has arrived (additional delay of 20 ms). If packet Pk was missing, just use F’2k and F’2k+1, which are added as redundancy in packet Pk+1. No concealment is applied in this case.

G.729-4 : At the decoder, wait for packet Pk+1 before decoding packet Pk.

In G.729-2 and G.729-3, F’k denotes Fk without the 18 LSF bits and the pitch parity bit (hence, frame F’k has 19 bits fewer than frame Fk). The missing LSFs have to be extrapolated at the decoder when a frame is missing.

Bit rate and algorithmic delay

[Figure: bit rate R (kbps) versus algorithmic delay D (ms) for G.729-0, G.729-1, G.729-2, G.729-3, G.729-4 and iLBC; point size proportional to quality at 10 % FER. Plotted (R, D) pairs: (8, 25), (11.8, 15), (12, 35), (14.1, 25), (14.1, 45), (15.2, 25), (16, 45).]

4. EFFECT ON ERROR PROPAGATION

[Figure: sequence of 20-ms packets with the 3rd packet lost. Panels: G.729 synthesis; error at decoder for G.729-0, G.729-1, G.729-2, G.729-3 and G.729-4; iLBC error at decoder (compared to iLBC synthesis without frame loss).]

G.729-0 : Every missing 20-ms packet implies that two consecutive 10-ms frames of G.729 are lost. Concealment and propagation introduce large artefacts.

G.729-1 : Every missing 20-ms packet reduces to a single 10-ms frame loss in G.729. Concealment is more effective, and propagation is reduced.

G.729-2 : Concealment followed by approximate resynchronisation of filter memories.

G.729-3 : Limited concealment (there would be no concealment if F’ were equal to F).

G.729-4 : No effective frame loss for any single packet loss.

iLBC : Concealment, but limited error propagation (only due to post-filtering at the decoder to smooth frame transitions).

5. SUBJECTIVE EXPERIMENT

• A formal listening test was conducted to compare the different solutions for increasing robustness to missing packets. The main features of this test are:
    • clean speech, narrowband, IRS filtered
    • 4 male, 4 female speakers
    • 32 naive listeners
    • listening using binaural headphones
    • following guidelines of ITU-T Rec. P.800
    • 36 conditions in total, including MNRU and other reference conditions
    • 0 to 20 % random packet losses, synchronized between iLBC and G.729

6. LISTENING TEST RESULTS

[Figure: listening test results for G.729-0, G.729-1, G.729-2, G.729-3, G.729-4 and iLBC; labels include "0 % FER".]

7. CONCLUSIONS

• From the test results, we can draw the following conclusions:
    • In clean channel conditions, iLBC at 15.2 kbps has equivalent quality to G.729 at 8 kbps (i.e. a much higher bit rate is necessary in a "frame-independent" coder to increase the quality in both clean channel and frame loss conditions). Extreme example: G.711 at 64 kbps.
    • The best quality in frame loss conditions was achieved by using a low-rate CELP coder with added redundancy and delay (G.729-4), with a total bit rate close to iLBC (16 kbps compared to 15.2 kbps).
    • The approaches studied to increase robustness represent only a subset of all possible combinations. Only solutions based on a standard CELP coder (G.729) were considered, and some of them are not optimal (e.g. G.729-2). Improved results could be expected by designing a solution without the constraint of using standard core codecs.
    • The G.729 RTP payload can already support solutions G.729-1 and G.729-4.
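
The total bit rates of the redundancy schemes can be checked with simple arithmetic. The sketch below is illustrative only and is not the authors' code. It assumes an 80-bit G.729 frame (8 kbps over 10 ms), a reduced frame F’ that is 19 bits shorter, one F’ of redundancy per frame in G.729-2 / G.729-3, and a full duplicate of every frame in G.729-4 (an assumption, chosen because it is consistent with the 16 kbps quoted in the conclusions). The names `rate_kbps`, `reduced_bits` and `rates` are hypothetical.

```python
# Sketch only: per-frame bit budgets implied by the text (names are illustrative).
FRAME_MS = 10        # G.729 frame length in ms
FRAME_BITS = 80      # 8 kbit/s x 10 ms
REMOVED_BITS = 19    # 18 LSF bits + 1 pitch parity bit dropped from F'


def rate_kbps(bits_per_frame: int, frame_ms: int = FRAME_MS) -> float:
    """Bits per millisecond is numerically equal to kbit/s."""
    return bits_per_frame / frame_ms


reduced_bits = FRAME_BITS - REMOVED_BITS  # size of a reduced frame F'

rates = {
    # Baseline: one full frame per 10 ms, no redundancy.
    "G.729-0": rate_kbps(FRAME_BITS),
    # One full frame plus one reduced frame F' per 10 ms.
    "G.729-2 / G.729-3": rate_kbps(FRAME_BITS + reduced_bits),
    # Assumption: every frame is sent twice in full.
    "G.729-4": rate_kbps(2 * FRAME_BITS),
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} kbit/s")
```

Under these assumptions the rates come out at 8, 14.1 and 16 kbps, which matches the 14.1 kbps points on the bit rate versus delay plot and the 16 kbps quoted for G.729-4.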
