This study analyzes vocal noise in running speech by means of bi-directional long-term linear prediction. The aim is to detect and assess vocal dysperiodicities in any speech fragment, including sustained vowels and running speech, for modal as well as hoarse voices. Forward and backward long-term prediction remove the existing correlations in the signal, so that what remains is the unpredictable noise component. The analysis gives insight into vocal quality without assuming that the signal is periodic or stationary.
Assessment of Vocal Noise via Bi-directional Long-term Linear Prediction of Running Speech
F. Bettens*, F. Grenez*, J. Schoentgen*,**
*Université Libre de Bruxelles
**National Fund for Scientific Research, Belgium
Existing Cues of Vocal Noise
• Detection of individual vocal cycles (or harmonics)
• Steady vowel fragments
• (Pseudo)-Periodicity
• Period Perturbation Quotient
• Amplitude Perturbation Quotient
• Harmonics-to-Noise Ratio
Objectives: Analysis of Dysperiodicities
• Drop the requirement that speech fragments be:
  • (Pseudo)-Periodic
  • Steady
• Analyze any speech fragment:
  • Modal Voices & (Very) Hoarse Voices
  • Sustained Vowels & Running Speech
Motivation: Analysis of Running Speech
• Voicing in running speech
• Variable acoustic impedance
• Voicing onsets & offsets
• Variable pressure drops
• Variable laryngeal positions
• Voice Loading
Double Linear Predictive Analysis
• Conventional short-term linear prediction, yielding the forward short-term prediction error
• Long-term linear prediction, yielding the forward double prediction error
• Purpose: remove existing correlations to leave the unpredictable noise component (Qi, 1999)
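The two prediction stages can be written in a generic textbook form. This is only a sketch: the predictor orders N and M, the coefficients a_i and b_j, and the sign convention are assumptions, not taken from the slide. The symbols x[n], eS[n], eL[n] and the distance P follow the deck's notation.

```latex
% Short-term stage: predict x[n] from its N most recent samples.
e_S[n] = x[n] - \sum_{i=1}^{N} a_i \, x[n-i]

% Long-term stage: predict the short-term error from samples
% about P samples (one or more vocal cycles) in the past.
e_L[n] = e_S[n] - \sum_{j=-M}^{M} b_j \, e_S[n-P-j]
```

In this form the long-term stage only sees eS[n], never x[n] itself.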
Double Linear Predictive Analysis
Drawbacks:
• eS[n] is an artificial signal
• the dysperiodicities in the weighted sum of x[n] are omitted
• eL[n] is inflated to the right of unvoiced/voiced boundaries
Solutions:
• remove the short-term linear predictive analysis stage
• proceed to bi-directional analysis
Bi-directional Long-term Prediction
• Forward long-term linear prediction, yielding the forward long-term prediction error
• Backward long-term linear prediction, yielding the backward long-term prediction error
• Bi-directional long-term linear prediction: keep the “best” of the two (frame by frame), yielding the bi-directional long-term prediction error
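A minimal sketch of the frame-by-frame selection, under several assumptions that are not in the slides: a single-tap predictor, a fixed distance P for the whole signal, a least-squares gain per frame, and "best" taken to mean the error with the lower energy. The function name `long_term_error` is hypothetical.

```python
import numpy as np

def long_term_error(x, P, frame_len, directions=("forward", "backward")):
    """Per-frame long-term prediction error, keeping the lowest-energy direction."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Forward prediction uses samples P in the past, backward P in the future.
    shifts = {"forward": -P, "backward": +P}
    err = x.copy()  # frames with no usable direction keep the raw signal
    for start in range(0, N, frame_len):
        idx = np.arange(start, min(start + frame_len, N))
        best = None
        for d in directions:
            ref_idx = idx + shifts[d]
            if ref_idx[0] < 0 or ref_idx[-1] >= N:
                continue  # reference samples fall outside the signal
            ref = x[ref_idx]
            denom = np.dot(ref, ref)
            # Least-squares gain of the single-tap predictor (assumption).
            b = np.dot(x[idx], ref) / denom if denom > 0 else 0.0
            e = x[idx] - b * ref
            if best is None or np.dot(e, e) < np.dot(best, best):
                best = e
        if best is not None:
            err[idx] = best
    return err

# Synthetic example: silence, then a periodic signal starting at n = 400.
n = np.arange(2000)
x = np.where(n >= 400, np.sin(2 * np.pi * n / 92.0), 0.0)
e_fwd = long_term_error(x, 92, 100, directions=("forward",))
e_bi = long_term_error(x, 92, 100)
```

In this synthetic case the forward-only error is large in the first voiced frame, because its reference samples lie in the silence, while the backward prediction finds a periodic reference and the bi-directional error stays near zero.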
Long-term Prediction Distance: P
• P is the lag at the maximum of the auto-correlation function
• Example: steady vowel [a] (dysphonic speaker), P = 184 (2 cycles)
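One way to pick P is to search a lag range for the maximum of the normalized auto-correlation. The sketch below assumes a whole-signal estimate and a caller-supplied search range; `prediction_distance`, `min_lag` and `max_lag` are hypothetical names, and the slides do not say how the search range is chosen.

```python
import numpy as np

def prediction_distance(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest normalized autocorrelation.

    min_lag must be at least 1 and max_lag smaller than len(x).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    best_lag, best_score = min_lag, -np.inf
    for lag in range(min_lag, max_lag + 1):
        a, b = x[lag:], x[:-lag]  # signal and its copy delayed by `lag`
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        score = np.dot(a, b) / denom if denom > 0 else -np.inf
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic example: a sinusoid with a 92-sample cycle, searched over a
# range that covers two cycles (184 samples) but not one.
n = np.arange(2000)
x = np.sin(2 * np.pi * n / 92.0)
P = prediction_distance(x, 150, 220)
```

The search range decides whether the estimate spans one cycle or several, as in the slide's example where P covers two cycles.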
Vocal Noise Cue
Signal-to-Dysperiodicity Ratio (SDR), computed from the speech signal x[n] and the bi-directional long-term prediction error eL[n]
Example: steady vowel [a]
• healthy speaker: SDR = 31.2 dB
• dysphonic speaker: SDR = 10.1 dB
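The SDR formula itself did not survive on the slide. The sketch below assumes the usual energy-ratio definition in decibels: signal energy over prediction-error energy. The function name is hypothetical.

```python
import numpy as np

def signal_to_dysperiodicity_ratio(x, e):
    """Energy of the signal over energy of the prediction error, in dB (assumed definition)."""
    x = np.asarray(x, dtype=float)
    e = np.asarray(e, dtype=float)
    # An all-zero error gives a division by zero (infinite ratio); not handled here.
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(e ** 2))

# Toy numbers: signal energy 25, error energy 0.25, so a ratio of 100 (20 dB).
sdr = signal_to_dysperiodicity_ratio([3.0, 4.0], [0.3, 0.4])
```

Under this definition a smaller prediction error gives a higher SDR, which matches the slide's values: higher for the healthy speaker, lower for the dysphonic one.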
Results 1: Sentence (1 female speaker; modal phonation type) (http://www.limsi.fr/VOQUAL/ : “Il est sorti avant le jour”)
[Figure: segments [il]; panels: speech signal, bi-directional long-term prediction error, forward long-term prediction error]
Results 2: Sentence (1 female speaker; 5 phonation types) (http://www.limsi.fr/VOQUAL/ : “Il est sorti avant le jour”)
Conclusion
• Forward and backward long-term prediction of speech enables the analysis of any speech signal with a view to assessing vocal noise (i.e. vocal dysperiodicities).
• The analysis makes no assumptions about the periodicity or stationarity of the speech signal.