EE513 Audio Signals and Systems
LPC Analysis and Speech
Kevin D. Donohue, Electrical and Computer Engineering, University of Kentucky
Speech Generation
Speech can be divided into fundamental building blocks of sound referred to as phonemes. All speech sounds result from turbulence through obstructed air flow.
The vocal cords create quasi-periodic obstructions of air flow as a sound source at the base of the vocal tract. Phonemes produced by vocal cord vibration are referred to as voiced speech.
Single-shot turbulence from obstructed air flow through the vocal tract is primarily generated by the teeth, tongue, and lips. Phonemes associated with non-periodic obstructed air flow are referred to as unvoiced speech.
Taken from http://www.kt.tu-cottbus.de/speech-analysis/
Figure: Air burst or continuous flow
The general speech model:
Sources can be modeled as quasi-periodic impulse trains or random sequences of impulses.
The vocal tract filter can be modeled as an all-pole filter related to the tract resonances.
The radiator can be modeled as a simple gain with spatial direction (possibly with some filtering).
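A minimal sketch of this source-filter model in Python (standard library only, chosen for illustration; it is not the slides' MATLAB code). The voiced source is a fixed-period impulse train, the unvoiced source is Gaussian noise, the vocal tract is an all-pole filter 1/A(z) with A = [1, c_1, ..., c_M] (the same layout MATLAB's lpc uses), and the radiator is a plain gain. The resonance frequency, pole radius, pitch period, and gain are hypothetical values.

```python
import math
import random


def impulse_train(n_samples, period):
    """Voiced source: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def noise_source(n_samples, seed=0):
    """Unvoiced source: zero-mean, unit-variance Gaussian random samples."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def all_pole_filter(source, A):
    """Vocal tract filter 1/A(z), A = [1, c1, ..., cM]: y(n) = s(n) - sum_{k>=1} c_k y(n-k)."""
    y = []
    for n, s in enumerate(source):
        acc = s
        for k in range(1, len(A)):
            if n - k >= 0:
                acc -= A[k] * y[n - k]
        y.append(acc)
    return y


def synthesize(source, A, gain):
    """Source -> all-pole vocal tract filter -> radiator modeled as a simple gain."""
    return [gain * v for v in all_pole_filter(source, A)]


# Hypothetical values: one resonance at 500 Hz with pole radius 0.95, fs = 8 kHz.
fs, f_res, radius = 8000, 500.0, 0.95
theta = 2 * math.pi * f_res / fs
A = [1.0, -2 * radius * math.cos(theta), radius ** 2]  # one complex-conjugate pole pair

voiced = synthesize(impulse_train(800, period=80), A, gain=0.5)  # 80-sample period = 100 Hz pitch
unvoiced = synthesize(noise_source(800), A, gain=0.5)
```

Only the source changes between the voiced and unvoiced outputs; the vocal tract filter and radiator gain are shared.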
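First three resonances of a tube with one closed end
The vocal tract length (L) is related to the signal wavelength (λ) of each resonance. It can be obtained from resonant frequencies (f) estimated from recorded speech sounds and the speed of sound (c), using the equations:
$$\lambda_n = \frac{c}{f_n}, \qquad L = \frac{(2n-1)\,\lambda_n}{4} = \frac{(2n-1)\,c}{4 f_n}, \qquad n = 1, 2, 3$$
Image adapted from: hyperphysics.phy-astr.gsu.edu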
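The quarter-wave relation for a tube closed at one end can be sketched in Python as follows (an illustration, not from the slides; the default c = 343 m/s, roughly the speed of sound in room-temperature air, is an assumption). For example, a first resonance of 500 Hz with c = 340 m/s gives L = 340/(4 x 500) = 0.17 m.

```python
def resonance_frequencies(length_m, c=343.0, count=3):
    """First `count` resonances f_n = (2n - 1) c / (4 L) of a tube closed at one end."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]


def tract_length_from_resonance(f_hz, n=1, c=343.0):
    """Invert f_n = (2n - 1) c / (4 L): estimate tube length L (m) from the nth resonance."""
    return (2 * n - 1) * c / (4.0 * f_hz)


# Hypothetical measurement: first formant at 500 Hz, c = 340 m/s.
length = tract_length_from_resonance(500.0, c=340.0)
print(f"Estimated vocal tract length: {100 * length:.1f} cm")
print(resonance_frequencies(length, c=340.0))
```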
The resonances of the vocal tract are called formants and can be estimated from peaks of the spectrum where the effects of pitch have been smoothed out (i.e., the spectral envelope).
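One way to locate formants on a spectral envelope, sketched in Python under stated assumptions: the envelope is taken to be the magnitude response of an all-pole model G/A(z) evaluated on a frequency grid, and formant candidates are its interior local maxima. The two resonances (500 Hz and 1500 Hz, pole radius 0.97, fs = 8 kHz) are hypothetical, and grid peak picking is only as precise as the grid spacing (about 7.8 Hz here).

```python
import cmath
import math


def resonator(f_hz, radius, fs):
    """A(z) factor [1, -2 r cos(theta), r^2] for a conjugate pole pair at f_hz (theta = 2 pi f / fs)."""
    theta = 2 * math.pi * f_hz / fs
    return [1.0, -2 * radius * math.cos(theta), radius ** 2]


def envelope_db(factors, fs, n_points=512, gain=1.0):
    """Spectral envelope 20*log10|G / A(e^{jw})| on a uniform grid from 0 to fs/2.

    A(z) is the product of the polynomial factors (each [1, c1, ..., cK] in powers of z^-1).
    """
    freqs, mags_db = [], []
    for i in range(n_points):
        f = 0.5 * fs * i / (n_points - 1)
        w = 2 * math.pi * f / fs
        a_w = 1.0 + 0.0j
        for poly in factors:
            a_w *= sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(poly))
        freqs.append(f)
        mags_db.append(20 * math.log10(gain / abs(a_w)))
    return freqs, mags_db


def envelope_peaks(freqs, mags_db):
    """Frequencies of interior local maxima of the envelope (formant candidates)."""
    return [freqs[i] for i in range(1, len(mags_db) - 1)
            if mags_db[i - 1] < mags_db[i] >= mags_db[i + 1]]


# Hypothetical 4th-order model: formants at 500 Hz and 1500 Hz, pole radius 0.97, fs = 8 kHz.
fs = 8000
freqs, mags_db = envelope_db([resonator(500, 0.97, fs), resonator(1500, 0.97, fs)], fs)
formants = envelope_peaks(freqs, mags_db)
```

Because the model envelope contains no pitch harmonics, each peak sits close to the angle of one pole pair.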
If voiced speech is characterized by a low-order all-pole model (e.g., order about 10 for a sampling rate of 8 kHz), then the pole frequencies correspond to the resonances of the vocal tract:
$$H(z) = \frac{G}{A(z)} = \frac{G}{1 - \sum_{k=1}^{M} a_k z^{-k}}, \qquad \text{poles at } \rho_i e^{\pm j\theta_i} \;\Rightarrow\; f_i = \frac{\theta_i f_s}{2\pi}$$
The denominator A(z), applied on its own as an FIR filter, computes the error between the current sample and the sample predicted from previous samples. Therefore, it is called a prediction error filter.
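A short Python sketch of the prediction error filter and its inverse (illustrative only; A = [1, c_1, ..., c_M] follows the layout of MATLAB's lpc output, and the coefficient values and test signal are hypothetical). Filtering with A(z) gives e(n); filtering e(n) with 1/A(z) recovers x(n), the same round trip as filter(a,1,...) followed by filter(1,a,...) in MATLAB.

```python
def prediction_error(x, A):
    """FIR prediction error filter: e(n) = sum_{k=0}^{M} A[k] x(n-k), zero initial state."""
    return [sum(A[k] * x[n - k] for k in range(len(A)) if n - k >= 0)
            for n in range(len(x))]


def all_pole(e, A):
    """Inverse filter 1/A(z): x(n) = e(n) - sum_{k=1}^{M} A[k] x(n-k), zero initial state."""
    x = []
    for n in range(len(e)):
        x.append(e[n] - sum(A[k] * x[n - k] for k in range(1, len(A)) if n - k >= 0))
    return x


# Hypothetical 2nd-order A(z) with poles at radius sqrt(0.9), and a short test signal.
A = [1.0, -1.6, 0.9]
x = [0.0, 1.0, 0.5, -0.3, -0.8, 0.2, 0.9, 0.1]
e = prediction_error(x, A)
x_rec = all_pole(e, A)          # recovers x from the error sequence

# For a signal generated by 1/A(z) from an impulse, the prediction error is that impulse.
impulse = [1.0] + [0.0] * 7
e_imp = prediction_error(all_pole(impulse, A), A)
```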
Create an “auh” sound (as the “a” in about or the “u” in hum) and use the linear prediction coefficient (LPC) command, lpc, to model this sound as a quasi-periodic sequence of impulses exciting an all-pole filter.
The lpc command finds a vector of filter coefficients such that the mean squared prediction error is minimized.
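Predict x(n) from previous samples:
$$\hat{x}(n) = \sum_{k=1}^{M} a_k\, x(n-k)$$
(MATLAB's lpc returns the coefficient vector [1, -a_1, ..., -a_M].)
Compute the prediction error sequence with:
$$e(n) = x(n) - \hat{x}(n) = x(n) - \sum_{k=1}^{M} a_k\, x(n-k)$$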
Use Z-transforms to find the transfer function of the filter that recovers x(n) from the LPCs and the error sequence e(n).
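One way to work this step, assuming the predictor convention $\hat{x}(n) = \sum_{k=1}^{M} a_k x(n-k)$ and using only linearity and the delay property $x(n-k) \leftrightarrow z^{-k}X(z)$:

```latex
e(n) = x(n) - \sum_{k=1}^{M} a_k\, x(n-k)
\;\;\Longrightarrow\;\;
E(z) = X(z) - \sum_{k=1}^{M} a_k z^{-k} X(z) = A(z)\,X(z),
\qquad A(z) = 1 - \sum_{k=1}^{M} a_k z^{-k}

X(z) = \frac{E(z)}{A(z)}
\quad\Longrightarrow\quad
H(z) = \frac{X(z)}{E(z)} = \frac{1}{1 - \sum_{k=1}^{M} a_k z^{-k}}
```

The recovery filter is therefore the all-pole filter 1/A(z), driven by e(n).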
Derive an algorithm to compute LPC coefficients from a stream of data that minimizes the mean squared prediction error.
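Let $\{x(n)\}_{n=0}^{N-1}$ be the sequence of data points,
$\mathbf{a} = [a_1, a_2, \ldots, a_M]^T$ be the Mth-order LPC coefficients, and $\hat{x}(n) = \sum_{k=1}^{M} a_k\, x(n-k)$ be the prediction estimate.
The mean squared error for the prediction is given by:
$$\varepsilon = \frac{1}{N-M}\sum_{n=M}^{N-1}\left(x(n) - \hat{x}(n)\right)^2 = \frac{1}{N-M}\sum_{n=M}^{N-1}\left(x(n) - \sum_{k=1}^{M} a_k\, x(n-k)\right)^2$$
Put the prediction equations in matrix form:
$$\mathbf{X}\mathbf{a} = \begin{bmatrix} x(M-1) & x(M-2) & \cdots & x(0) \\ x(M) & x(M-1) & \cdots & x(1) \\ \vdots & \vdots & & \vdots \\ x(N-2) & x(N-3) & \cdots & x(N-M-1) \end{bmatrix}\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{bmatrix} = \begin{bmatrix} \hat{x}(M) \\ \hat{x}(M+1) \\ \vdots \\ \hat{x}(N-1) \end{bmatrix}, \qquad \mathbf{p} = \begin{bmatrix} x(M) \\ x(M+1) \\ \vdots \\ x(N-1) \end{bmatrix}$$
Each row of $\mathbf{X}\mathbf{a}$ is a prediction of the corresponding sample in $\mathbf{p}$.
The mean squared error can be expressed as:
$$\varepsilon = \frac{1}{N-M}\left(\mathbf{p} - \mathbf{X}\mathbf{a}\right)^T\left(\mathbf{p} - \mathbf{X}\mathbf{a}\right)$$
If the derivative is taken with respect to $\mathbf{a}$ and set equal to 0, the result is:
$$\mathbf{X}^T\mathbf{X}\,\mathbf{a} = \mathbf{X}^T\mathbf{p} \quad\Rightarrow\quad \mathbf{a} = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{p}$$
The transpose of the data matrix times itself results (up to end effects) in the autocorrelation matrix:
$$\mathbf{X}^T\mathbf{X} \approx \mathbf{R} = \begin{bmatrix} r(0) & r(1) & \cdots & r(M-1) \\ r(1) & r(0) & \cdots & r(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(M-1) & r(M-2) & \cdots & r(0) \end{bmatrix}$$
The data matrix transpose times the future values (the $\mathbf{p}$ vector) becomes a sequence of autocorrelation values starting with the first lag:
$$\mathbf{X}^T\mathbf{p} \approx \mathbf{r} = [r(1), r(2), \ldots, r(M)]^T$$
Define the autocorrelation of a sequence as:
$$r(k) = \sum_{n=0}^{N-1-k} x(n)\,x(n+k), \qquad k = 0, 1, \ldots, M$$
Note that the LPC coefficients are computed from the autocorrelation coefficients:
$$\mathbf{a} = \mathbf{R}^{-1}\mathbf{r}$$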
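Normal equations with a Toeplitz matrix are commonly solved with the Levinson-Durbin recursion instead of an explicit inverse; that is a standard substitution, not something these slides specify. A minimal Python sketch (standard library only), using the unnormalized autocorrelation r(k) = sum of x(n)x(n+k) and a synthetic AR(2) signal as hypothetical test data:

```python
import random


def autocorrelation(x, max_lag):
    """r(k) = sum_{n=0}^{N-1-k} x(n) x(n+k) for k = 0..max_lag (unnormalized)."""
    N = len(x)
    return [sum(x[n] * x[n + k] for n in range(N - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve R a = [r(1), ..., r(M)]^T where R is Toeplitz with first row r(0..M-1).

    Returns (a, err, refl): predictor coefficients a_1..a_M for
    x_hat(n) = sum_k a_k x(n-k), the final prediction error energy
    (same scale as r(0)), and the reflection coefficient found at each order.
    """
    a, err, refl = [], r[0], []
    for m in range(1, order + 1):
        acc = r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))
        k = acc / err
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]  # order-m update
        err *= 1.0 - k * k
        refl.append(k)
    return a, err, refl


# Usage: synthetic AR(2) data x(n) = 1.6 x(n-1) - 0.9 x(n-2) + w(n) (hypothetical test signal).
rng = random.Random(1)
x = [0.0, 0.0]
for _ in range(5000):
    x.append(1.6 * x[-1] - 0.9 * x[-2] + rng.gauss(0.0, 1.0))
r = autocorrelation(x, 2)
a_hat, err, refl = levinson_durbin(r, 2)
A_matlab_style = [1.0] + [-v for v in a_hat]  # same layout as the vector returned by MATLAB's lpc
```

The recursion costs O(M^2) operations instead of the O(M^3) of a general solve, and it yields the reflection coefficients as a by-product.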
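winlens = 50; % PSD window length in milliseconds
[y,fs] = audioread('../data/aaa3.wav'); % Read in wave file (wavread has been removed from MATLAB)
y = y(:,1); % Keep one channel in case the file is stereo
winlen = round(winlens*fs/1000); % Window length in samples (must be an integer)
[cb,ca] = butter(5,2*100/fs,'high'); % 100 Hz high-pass filter to remove LF recording noise
yf = filtfilt(cb,ca,y); % Zero-phase filtering
[a,er] = lpc(yf,10); % Compute LPC coefficients with model order 10
predy = filter(a,1,yf); % Compute prediction error with the all-zero filter A(z)
kd = 1; % Starting figure number
figure(kd); plot(predy); hold on; plot(yf,'g'); hold off
title('Prediction error (blue) and signal (green)'); xlabel('Samples'); ylabel('Amplitude')
recon = filter(1,a,predy); % Compute reconstructed signal from error and all-pole filter 1/A(z)
figure(kd+1) % Plot reconstructed signal
% Plot with original delayed by one sample so it does not entirely overlap the perfectly reconstructed signal
plot(recon,'b'); hold on; plot([0; yf],'r'); hold off
title('Reconstructed Signal (blue) and Original (red)'); xlabel('Samples'); ylabel('Amplitude')
% By examining the error sequence, generate a simple impulse sequence to simulate its period (about 103 sample period)
g = [];
for k = 1:100 % Number of pitch periods to generate (assumed value)
    g = [g, 1, zeros(1,55)]; % One impulse followed by zeros for each period
end
% Run simulated error sequence through all-pole filter
sim = filter(1,a,g);
% Play the synthetic sound, 1 s of silence, then the original (each normalized)
soundsc([(sim')/std(sim); zeros(fix(fs)*1,1); yf/std(yf)],fs)
% Plot pole diagram
r = roots(a); % Poles of the all-pole filter 1/A(z)
w = 0:.001:2*pi; % Angles used to draw the unit circle
figure(kd+2); plot(real(r),imag(r),'x'); hold on; plot(cos(w),sin(w),'k'); hold off; axis equal
title('Pole diagram of vocal tract filter')
% Find resonant frequencies corresponding to poles
froots = (fs/2)*angle(r)/pi; % Pole angle in radians converted to Hz
nf = find(froots > 0 & froots < fs/2); % Keep one pole of each complex-conjugate pair
% Examine average spectrum with formant frequencies
[pd,f] = pwelch(yf,hamming(winlen),fix(winlen/2),2*winlen,fs);
dbspec = 10*log10(pd); % PSD is a power quantity, so use 10*log10
mxp = max(dbspec); % Find max and min points for graphing vertical lines
mnp = min(dbspec);
figure(kd+3); plot(f,dbspec,'b'); hold on % Plot PSD
% Overlay lines on plot where formant frequencies were estimated from LPCs
for k = 1:length(nf)
    plot([froots(nf(k)), froots(nf(k))], [mnp(1), mxp(1)], 'k--')
end
hold off
title('PSD plot with formant frequencies (Black broken lines)'); xlabel('Frequency (Hz)'); ylabel('Power/frequency (dB/Hz)')
% Get spectrum from the AR (LPC) parameters
[hz,fz] = freqz(1, a, 1024, fs);
figure(kd+4); plot(fz, 20*log10(abs(hz))) % |H| is an amplitude, so use 20*log10
title('Spectrum Generated by LPCs'); xlabel('Frequency (Hz)'); ylabel('Magnitude (dB)')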
Figure labels: Pole frequencies of LPC model (from vocal tract shape); frequency periodicities from harmonics of the pitch frequency
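Direct form 1 for the all-pole model (predictor coefficients $a_k$, gain $G$):
$$y(n) = G\,x(n) + \sum_{k=1}^{M} a_k\, y(n-k)$$
Direct form 1, second-order sections (one section per complex-conjugate pole pair $\rho_i e^{\pm j\theta_i}$):
$$H(z) = G\prod_{i=1}^{M/2}\frac{1}{1 - 2\rho_i\cos\theta_i\, z^{-1} + \rho_i^2 z^{-2}}$$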
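The two structures can be compared numerically with a Python sketch (illustrative; the three formant frequencies and pole radii are hypothetical). Each section holds one complex-conjugate pole pair; multiplying the section polynomials gives the single direct-form denominator, and both structures should produce the same impulse response up to rounding error.

```python
import math


def all_pole_direct(x, den):
    """Direct form 1/A(z), den = [1, c1, ..., cM]: y(n) = x(n) - sum_{k>=1} c_k y(n-k)."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(den)):
            if n - k >= 0:
                acc -= den[k] * y[n - k]
        y.append(acc)
    return y


def poly_mul(p, q):
    """Product of two polynomials in z^-1 (coefficient lists, lowest power first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, u in enumerate(p):
        for j, v in enumerate(q):
            out[i + j] += u * v
    return out


def section(f_hz, radius, fs):
    """Second-order section denominator [1, -2 rho cos(theta), rho^2] for one conjugate pole pair."""
    theta = 2 * math.pi * f_hz / fs
    return [1.0, -2 * radius * math.cos(theta), radius ** 2]


def cascade_sos(x, sections):
    """Apply the second-order all-pole sections one after another."""
    for den in sections:
        x = all_pole_direct(x, den)
    return x


# Hypothetical 6th-order vocal tract model: three formants at fs = 8 kHz.
fs = 8000
sections = [section(500, 0.97, fs), section(1500, 0.95, fs), section(2500, 0.90, fs)]
den = [1.0]
for s in sections:
    den = poly_mul(den, s)  # expand into one direct-form denominator A(z)

impulse = [1.0] + [0.0] * 199
h_direct = all_pole_direct(impulse, den)
h_sos = cascade_sos(impulse, sections)
```

For higher orders the cascade is generally preferred because each pole pair depends on only two coefficients, so coefficient rounding moves the poles less than in one high-order direct form.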
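Lattice implementations are popular because of their good numerical error and stability properties. The filter is implemented in modular stages whose coefficients (reflection coefficients) are directly related to the stability criterion (the filter is stable when every reflection coefficient has magnitude less than 1) and to the tube resonances of the vocal tract (example of a 2nd-order system):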
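A minimal Python sketch of an all-pole lattice, assuming the recursion $A_m(z) = A_{m-1}(z) + K_m z^{-m} A_{m-1}(z^{-1})$ (sign conventions for reflection coefficients differ between texts). The step_up helper converts reflection coefficients to the direct-form A(z) so the lattice output can be checked against a direct-form filter; the coefficient values are hypothetical and give a 2nd-order system with one complex-conjugate pole pair.

```python
def step_up(K):
    """Convert reflection coefficients K_1..K_M to A(z) = [1, c1, ..., cM].

    Uses A_m(z) = A_{m-1}(z) + K_m z^-m A_{m-1}(1/z), so K_m is the last coefficient of A_m(z).
    """
    A = [1.0]
    for m, k in enumerate(K, start=1):
        prev = A + [0.0]  # A_{m-1} padded to length m + 1
        A = [prev[j] + k * prev[m - j] for j in range(m + 1)]
    return A


def lattice_all_pole(x, K):
    """All-pole (IIR) lattice filter 1/A(z) built from reflection coefficients K."""
    M = len(K)
    g_prev = [0.0] * M  # g_prev[m] holds g_m(n-1) for m = 0..M-1
    y = []
    for xn in x:
        f = xn  # f_M(n) is the filter input
        g_new = [0.0] * M
        for m in range(M, 0, -1):
            f -= K[m - 1] * g_prev[m - 1]  # f_{m-1}(n) = f_m(n) - K_m g_{m-1}(n-1)
            if m < M:
                g_new[m] = K[m - 1] * f + g_prev[m - 1]  # g_m(n) = K_m f_{m-1}(n) + g_{m-1}(n-1)
        g_new[0] = f  # g_0(n) = f_0(n)
        g_prev = g_new
        y.append(f)  # output y(n) = f_0(n)
    return y


def all_pole_direct(x, A):
    """Reference direct form 1/A(z): y(n) = x(n) - sum_{k>=1} A[k] y(n-k)."""
    y = []
    for n in range(len(x)):
        y.append(x[n] - sum(A[k] * y[n - k] for k in range(1, len(A)) if n - k >= 0))
    return y


def is_stable(K):
    """The all-pole lattice is stable exactly when every |K_m| < 1."""
    return all(abs(k) < 1 for k in K)


# Hypothetical 2nd-order example: A(z) = [1, -1.71, 0.9], poles at radius sqrt(0.9).
K = [-0.9, 0.9]
impulse = [1.0] + [0.0] * 49
h_lattice = lattice_all_pole(impulse, K)
h_direct = all_pole_direct(impulse, step_up(K))
```

Stability can be read directly from the lattice coefficients, whereas the direct form requires finding the roots of A(z).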