Acoustic Analysis of Speech

Acoustic Analysis of Speech • Robert A. Prosek, Ph.D. • CSD 301

Acoustic Analysis • Instrumental acoustical analyses have been used for over 100 years • Analog techniques dominated the first 60 of these years • More recently, digital techniques have dominated the field • We will begin by introducing a few of the important analog methods, then turn to the digital methods

Oscillograph/Oscillogram • Any device that can display a waveform is an oscillograph • The output (display or hardcopy) is an oscillogram • There is limited information available in a waveform • silence • burst • noise • periodicity

Filter Bank Analysis • In this procedure, a filter bank or a single filter is used to divide the signal energy into frequency bands • The output energy is displayed for each band • This is a form of spectral analysis • The output is typically displayed in the form of a histogram • The technique is very common in audiology and hearing applications
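
As a rough illustration of dividing signal energy into frequency bands, the sketch below sums discrete Fourier transform (DFT) power inside each band and prints one histogram bar per band. It is a digital stand-in for a filter bank, not a model of any particular instrument; the function name `band_energies`, the band edges, and the two-tone test signal are all assumptions made for the example.

```python
import cmath
import math

def band_energies(samples, sample_rate, band_edges):
    """Return the summed DFT power in each band [edge[i], edge[i+1]) in Hz."""
    n = len(samples)
    energies = [0.0] * (len(band_edges) - 1)
    for k in range(n // 2 + 1):  # non-negative frequencies only
        freq = k * sample_rate / n
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        for b in range(len(energies)):
            if band_edges[b] <= freq < band_edges[b + 1]:
                energies[b] += abs(coeff) ** 2
    return energies

# Hypothetical test signal: a strong 500 Hz tone plus a weaker 2500 Hz tone.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 500 * t / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 2500 * t / sample_rate)
          for t in range(400)]
band_edges = [0, 1000, 2000, 3000, 4001]
energies = band_energies(signal, sample_rate, band_edges)

# Crude text histogram, one bar per band.
for low, high, energy in zip(band_edges, band_edges[1:], energies):
    print(f"{low:4d}-{high:4d} Hz  " + "#" * round(energy / 2000))
```

The band containing the stronger tone gets the longest bar, the band containing the weaker tone a shorter one, and the remaining bands stay essentially empty.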

Sound Spectrograph/Spectrogram • The instrument is called a spectrograph • The output (usually a hardcopy) is a spectrogram • This is the most commonly used device in speech research • The spectrograph can capture the dynamics of speech • Acoustic signals vary only in frequency, amplitude and time • The sound spectrograph captures all of these

Sound Spectrogram • Abscissa is time • Ordinate is frequency • Intensity is shown as shades of gray • Black areas indicate the highest amplitudes • White areas indicate the noise floor • Amplitudes between these extremes are shown in varying shades of gray • The more intense the signal is at a particular frequency and time, the darker the trace

Digital Signal Processing (1) • In the late 1960s, general-purpose digital computers made it possible to analyze acoustic signals on the computer • These techniques are necessarily discrete as well as digital • Once in discrete form, the signal can be stored conveniently and analyzed in many ways that were not possible with analog techniques

Digital Signal Processing (2) • Presampling or brickwall filtering • Nyquist Theorem • In order to represent a signal faithfully, it must be sampled at a rate of at least twice its highest frequency • The brickwall filter removes all of the energy above the Nyquist frequency • The clinician/researcher determines the Nyquist frequency • Some knowledge of speech and of speech and language disorders is required
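
The idea of a brickwall filter can be sketched digitally by zeroing every DFT component above a cutoff and transforming back. This is only an idealized illustration: a real presampling filter acts on the analog signal before conversion and cannot be perfectly sharp. The function names, the 16 kHz sampling rate, and the 4 kHz cutoff are assumptions chosen for the example.

```python
import cmath
import math

def dft(x):
    """Plain discrete Fourier transform (slow, but dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def brickwall_lowpass(x, sample_rate, cutoff_hz):
    """Remove all energy above cutoff_hz by zeroing DFT bins."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        # Bins above n/2 hold negative frequencies; fold them back.
        freq = min(k, n - k) * sample_rate / n
        if freq > cutoff_hz:
            spectrum[k] = 0
    return inverse_dft(spectrum)

# Hypothetical signal: a 1 kHz tone to keep plus a 6 kHz tone to remove.
sample_rate = 16000
low_tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(160)]
high_tone = [math.sin(2 * math.pi * 6000 * t / sample_rate) for t in range(160)]
mixed = [a + b for a, b in zip(low_tone, high_tone)]
filtered = brickwall_lowpass(mixed, sample_rate, cutoff_hz=4000)

largest_error = max(abs(f - l) for f, l in zip(filtered, low_tone))
print(f"largest difference from the 1 kHz tone: {largest_error:.2e}")
```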

Digital Signal Processing (3) • Sampling • Analog-to-digital conversion • Signal must be sampled at or above the Nyquist rate • Sampling decides the times at which the signal will be measured • Sampling converts the acoustic signal into a series of numbers • Instead of amplitudes at all instants of time, no matter how small the time interval, amplitudes in the digital world exist only at multiples of the sampling interval • Aliasing
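
A small sketch of aliasing, under assumed values: a 7000 Hz cosine sampled at 10000 Hz (too slow for that frequency) produces exactly the same numbers as a 3000 Hz cosine. The function names and the frequencies are illustrative choices, not taken from the slides.

```python
import math

def sample_cosine(freq_hz, sample_rate, count):
    """Sample a unit-amplitude cosine at the given rate."""
    return [math.cos(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(count)]

def alias_frequency(freq_hz, sample_rate):
    """Frequency at which freq_hz appears after sampling at sample_rate."""
    folded = freq_hz % sample_rate
    return min(folded, sample_rate - folded)

sample_rate = 10000
too_high = sample_cosine(7000, sample_rate, 20)
alias_hz = alias_frequency(7000, sample_rate)
alias = sample_cosine(alias_hz, sample_rate, 20)

largest_difference = max(abs(a - b) for a, b in zip(too_high, alias))
print(f"7000 Hz sampled at {sample_rate} Hz looks like {alias_hz} Hz")
print(f"largest sample difference: {largest_difference:.2e}")
```

Once the samples are taken, the two tones cannot be told apart, which is why energy above half the sampling rate has to be removed before conversion.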

Digital Signal Processing (4) • Quantization • Discrete number of amplitude levels • The more quantizer levels available, the more closely the discrete signal represents the original analog signal • In our applications, 16-bit quantizers over a 20-volt range are typical • This yields an amplitude resolution of about 305 μV and a signal-to-noise ratio of about 96 dB
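
The resolution and signal-to-noise figures for a 16-bit quantizer over a 20-volt range can be checked with a short calculation. The sketch assumes a uniform quantizer and uses the common dynamic-range estimate of 20·log10(number of levels); the function names are made up for the example.

```python
import math

def quantizer_step(voltage_range, bits):
    """Voltage spanned by one quantizer level of a uniform quantizer."""
    return voltage_range / 2 ** bits

def quantizer_snr_db(bits):
    """Ratio of full scale to one level, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)

step_volts = quantizer_step(20.0, 16)
snr_db = quantizer_snr_db(16)
print(f"step size: {step_volts * 1e6:.1f} microvolts")
print(f"signal-to-noise ratio: {snr_db:.1f} dB")
```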

Digital Signal Processing (5) • After A/D conversion • the signal is stored as a stream of numbers • the time of each sample is its index divided by the sampling rate • the amplitude is the stored number • in this form, many operations can be performed
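
The relation between sample index, sampling rate, and time can be written as two small helpers. The names and the 22050 Hz sampling rate are assumptions for illustration.

```python
def index_to_time(index, sample_rate):
    """Time in seconds of the sample stored at a given index."""
    return index / sample_rate

def time_to_index(time_s, sample_rate):
    """Index of the sample nearest to a given time in seconds."""
    return round(time_s * sample_rate)

sample_rate = 22050  # assumed sampling rate in Hz
print(index_to_time(11025, sample_rate))  # time of sample 11025 in seconds
print(time_to_index(0.5, sample_rate))    # sample nearest to 0.5 s
```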

Waveform Display • Duration measurements • speech changes gradually • some consistent rules need to be adopted • Signal editing • again, some consistent rules need to be adopted • Amplitude measurements • rms is the most common • vocal fundamental frequency
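
Two of the measurements listed above can be sketched directly on the stored samples: rms amplitude, and a simple autocorrelation estimate of vocal fundamental frequency. The autocorrelation approach, the 75 to 400 Hz search range, and the function names are assumptions for this example; the slides do not say which method is intended, and real speech needs more care than a pure tone.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_f0(samples, sample_rate, min_f0=75.0, max_f0=400.0):
    """Pick the lag with the largest autocorrelation inside the F0 range."""
    min_lag = int(sample_rate / max_f0)
    max_lag = int(sample_rate / min_f0)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(samples[t] * samples[t + lag]
                    for t in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

# Hypothetical test signal: 0.1 s of a 200 Hz sine sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(800)]
tone_rms = rms(tone)
tone_f0 = estimate_f0(tone, sample_rate)
print(f"rms amplitude: {tone_rms:.4f}")
print(f"estimated F0: {tone_f0:.1f} Hz")
```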

Digital Spectrum Analysis • The Fourier Transform revisited (FFT) • Periodic waveforms can be thought of as a series of sinusoids • amplitude and phase • The Fourier Transform and the Inverse Fourier Transform allow powerful analysis-by-synthesis techniques
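
A minimal sketch of analysis followed by synthesis: the forward transform recovers the amplitude and phase of a sinusoid, and the inverse transform rebuilds the waveform from the spectrum. It uses a plain DFT for clarity (the FFT is a fast way of computing the same result); the test sinusoid and the function names are assumptions.

```python
import cmath
import math

def dft(x):
    """Plain discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

# Hypothetical waveform: 3 cycles per 64 samples, amplitude 0.8, phase 0.5 rad.
n = 64
wave = [0.8 * math.cos(2 * math.pi * 3 * t / n + 0.5) for t in range(n)]

spectrum = dft(wave)                      # analysis
amplitude = 2 * abs(spectrum[3]) / n      # amplitude of the 3-cycle component
phase = cmath.phase(spectrum[3])          # its phase in radians
rebuilt = inverse_dft(spectrum)           # synthesis

print(f"amplitude {amplitude:.3f}, phase {phase:.3f} rad")
print(f"largest resynthesis error: "
      f"{max(abs(a - b) for a, b in zip(wave, rebuilt)):.2e}")
```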

Digital Spectrograph • This is a series of spectra based on the FFT or LPC (see below) • The amplitude is depicted as shades of gray • PRAAT is an example of a digital spectrograph • Speech Filing System, Speech Station 2, Wavesurfer, and many other free or commercial spectrographs are available
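
The "series of spectra" idea can be sketched as a short-time Fourier analysis: cut the signal into overlapping frames, window each frame, and take the magnitude spectrum of each. This is a generic sketch, not the algorithm of PRAAT or any other named program; the Hann window, the frame and hop sizes, and the test signal are assumptions.

```python
import cmath
import math

def spectrogram(samples, frame_length, hop):
    """Return one magnitude spectrum (bins 0..frame_length/2) per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_length)
              for i in range(frame_length)]  # Hann window
    frames = []
    for start in range(0, len(samples) - frame_length + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_length)]
        spectrum = [abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / frame_length)
                            for i in range(frame_length)))
                    for k in range(frame_length // 2 + 1)]
        frames.append(spectrum)
    return frames

# Hypothetical signal: 500 Hz for the first half, 1500 Hz for the second.
sample_rate = 8000
signal = [math.sin(2 * math.pi * (500 if t < 400 else 1500) * t / sample_rate)
          for t in range(800)]

frame_length = 80  # 10 ms frames
frames = spectrogram(signal, frame_length, hop=40)

# Frequency of the strongest bin in each frame (the darkest point of a column).
peaks_hz = [spectrum.index(max(spectrum)) * sample_rate / frame_length
            for spectrum in frames]
print(peaks_hz[0], peaks_hz[-1])
```

Mapping each magnitude to a shade of gray, with larger values darker, would turn `frames` into the familiar display.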

Linear Predictive Coding (1) • Speech is highly predictable over the short term • It is not hard to predict the amplitude of the next time sample of the speech waveform from a knowledge of the previous amplitudes • As few as 10 to 15 previous samples are all that is required

LPC (2) • From statistics, we know that: • $y = a_0 + a_1 x_{-1} + a_2 x_{-2} + \dots + a_n x_{-n}$ • where $y$ is the amplitude of the next sample • and $x_{-k}$ is the sample $k$ steps before it • This is linear prediction
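
One common way to find the prediction coefficients is the autocorrelation method solved with the Levinson-Durbin recursion, sketched below. The slides do not say which method is intended, so treat this as an assumption; the sketch also drops the constant term (it assumes a zero-mean signal), and the function names and test signal are made up for the example.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc_coefficients(x, order):
    """Levinson-Durbin recursion; returns [a1, ..., a_order]."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[k] multiplies the sample k steps back
    error = r[0]             # prediction error energy so far
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1 - k * k
    return a[1:]

# Hypothetical test signal built so that each sample is exactly
# 1.3 times the previous sample minus 0.8 times the one before that.
signal = [1.0, 1.3]
for t in range(2, 400):
    signal.append(1.3 * signal[t - 1] - 0.8 * signal[t - 2])

coefficients = lpc_coefficients(signal, order=2)
print([round(c, 6) for c in coefficients])
```

The recovered coefficients match the ones used to build the signal, which is the sense in which the next sample is predictable from the previous ones.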

LPC (3) • Linear Predictive Coding (LPC) is one of the most powerful techniques in speech analysis • The a’s in the previous equation can be used to estimate the resonances of the vocal tract • They can represent sections of the vocal tract
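
One way to read resonances off the coefficients is to treat them as an all-pole filter and look for peaks in its frequency response. The sketch below does this for a pair of hypothetical coefficients at an assumed 8000 Hz sampling rate; peak picking on a frequency grid is a simplification, and other approaches (such as finding the roots of the predictor polynomial) are also used.

```python
import cmath
import math

def lpc_envelope(coefficients, sample_rate, step_hz=10):
    """Gain of the all-pole filter 1/A(z) on a grid from 0 Hz to sample_rate/2."""
    freqs, gains = [], []
    for i in range(int(sample_rate / 2 / step_hz) + 1):
        freq = i * step_hz
        omega = 2 * math.pi * freq / sample_rate
        a_of_z = 1 - sum(a * cmath.exp(-1j * omega * (k + 1))
                         for k, a in enumerate(coefficients))
        freqs.append(freq)
        gains.append(1 / abs(a_of_z))
    return freqs, gains

def envelope_peaks(freqs, gains):
    """Frequencies where the envelope is higher than both neighbors."""
    return [freqs[i] for i in range(1, len(gains) - 1)
            if gains[i - 1] < gains[i] > gains[i + 1]]

# Hypothetical coefficients [a1, a2] and an assumed sampling rate.
coefficients = [1.3, -0.8]
freqs, gains = lpc_envelope(coefficients, sample_rate=8000)
resonances = envelope_peaks(freqs, gains)
print(resonances)
```

With real speech and 10 to 15 coefficients, the envelope typically shows several peaks, which serve as estimates of the formant frequencies.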

Wideband versus Narrowband Spectrograms • Wideband (window lengths of 0.005, 0.007, 0.009 s) • Short time window • Good for measuring formant frequencies • Narrowband (window lengths of 0.1, 0.05 s) • Long time window • Good for showing and measuring harmonics
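
The trade-off between window length and frequency detail can be approximated with a rule of thumb: the analysis bandwidth is roughly the reciprocal of the window length. The sketch assumes the numbers above are window lengths in seconds and uses an assumed fundamental frequency of 120 Hz; the exact bandwidth depends on the window shape, so the figures are only approximate.

```python
def approx_bandwidth_hz(window_s):
    """Rule-of-thumb analysis bandwidth for a window of the given length."""
    return 1.0 / window_s

def resolves_harmonics(window_s, f0_hz):
    """True when the bandwidth is narrower than the harmonic spacing."""
    return approx_bandwidth_hz(window_s) < f0_hz

f0_hz = 120.0  # assumed fundamental frequency of the voice
for window_s in (0.005, 0.05):
    bandwidth = approx_bandwidth_hz(window_s)
    kind = "narrowband" if resolves_harmonics(window_s, f0_hz) else "wideband"
    print(f"{window_s} s window: about {bandwidth:.0f} Hz bandwidth ({kind})")
```

A short window smears the individual harmonics together, which leaves the broad formant peaks visible; a long window separates the harmonics but blurs rapid changes in time.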
