1 / 38

Analysis and Synthesis of Pathological Vowels

Analysis and Synthesis of Pathological Vowels. Prospectus. Brian C. Gabelman. 6/13/2003. OVERVIEW OF PRESENTATION. I. Background II. Analysis of pathological voices III. Synthesis of pathological voices IV. Summary. BACKGROUND. What is a pathological vowel?.

wayne
Download Presentation

Analysis and Synthesis of Pathological Vowels

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Analysis and Synthesis of Pathological Vowels Prospectus Brian C. Gabelman 6/13/2003

  2. OVERVIEW OF PRESENTATION I. Background II. Analysis of pathological voices III. Synthesis of pathological voices IV. Summary

  3. BACKGROUND What is a pathological vowel? May be caused by physical or neural problems. Characterized by substantial and complex NONPERIODIC signal components. What is this project? Methods for modeling, analysis, and synthesis of pathological vowels incorporating novel approaches in: - System identification - Parameterization of non-periodic components (AM, FM, and noise) - Synthesizer designs for realtime and offline use

  4. BACKGROUND Why do it? - Create a non-subjective basis to compare pathological voices for: 1. Improved diagnosis 2. Tracking changes in a patient’s voice - Generate voice samples with known levels of variations (noise, roughness, etc.) for: 1. Evaluation of model parameters 2. Evaluation of listener variability 3. Evaluation of importance of levels of pathological features. What has been done before? - A well-established theory exists for NORMAL voices - Recent studies of pathological voices employ perturbation of normal features plus additive noise. - ES (external source) stimulation of the vocal tract to analyze formants (vowels) since 1942 for NORMAL voices.

  5. BACKGROUND For the theoretical/analytical aspect of the project, an expression of the hypothesis of the dissertation in one sentence is: “By means of FM and AM demodulation techniques, estimation of nonperiodic features of pathological vowels may be improved.”

  6. VOCAL TRACT FREQ. RESP. (freq domain) GLOTTAL SOURCE WAVEFORM (time domain) RESULTING VOICE SIG. (time domain) v(t) V(s) g(t) G(s) f(t) F(s) g(t) conv. f(t) = v(t) (time domain) G(s) x F(s) = V(s) (freq domain) ANALYSIS SOURCE - FILTER MODEL OF SPEECH

  7. ANALYSIS Steps for analysis: PERIODIC ANALYSIS: 1. FORMANT DETERMINATION Uses LP (linear prediction) to model vocal tract as cascaded 2nd order digital resonators. External source testing is shown to augment or replace LP for pathological vowels (Inv). 2. SOURCE MODELING Uses inverse filtering and least squares optimization to fit source waveform to a standard model (LF). NONPERIODIC ANALYSIS: 3. ANALYSIS OF PITCH VARIATION Uses high resolution pitch tracking to measure detailed nonperiodic frequency variation. Variations are segmented into low and high frequeny FM with Gaussian form.

  8. ANALYSIS 4. FM DEMODULATION Pitch variations are removed from original voice to achieve accurate noise estimation (step 7) (Inv.) 5. ANALYSIS OF POWER VARIATION Uses power tracking to measure detailed nonperiodic loudness variations. Variations are segmented into low and high frequency AM with Gaussian form. 6. AM DEMODULATION Power variations are removed from original voice to achieve accurate noise estimation (step 7) (Inv.) 7. ASPIRATION NOISE Frequency domain methods are used to separate aspiration noise component. The noise is spectrally modeled.[Gus de Krom]

  9. COLLECTION OF VOICE SAMPLES VOICE ANALYSIS + PARAMETERS (PITCH, NSR, FORMANTS,..) PERCEPTUAL COMPARISONS - VALIDATION VOICE SYNTHESIS ANALYSIS ANALYSIS BY SYNTHESIS

  10. ANALYSIS ANALYSIS/SYNTHESIS MODEL OVERVIEW SOURCE WAVEFORM PERIODIC SYNTHESIS FM MODULATION ANALYSIS AM MODULATION + VOCAL TRACT OUTPUT VOICE + NONPERIODIC SPECTRAL SHAPING GAUSSIAN NOISE

  11. ANALYSIS OVERVIEW OF PROJECT OPERATONS MIC SIGNAL SHIMMER JITTER ESTIMATE LPC FORMANT ESTIMATE. POWER PITCH ANALYSIS / TRACKER TRACKER MANUAL OPS VOLUME TREMOR TIME HIST TIME HIST POWER PITCH FORMANTS TIME HIST TIME HIST RESAMPLE RESAMPLE MIKE INVERSE TO REMOVE TO REMOVE SYNTHESIZER SIGNAL FILTERING TREMOR TREMOR RAW FLOW CONST. POWER CONST. PITCH DERIVATIVE VOICE VOICE CEPSTRAL LEAST SQR NOISE LF FIT ANALYSIS FITTED LF SRC NOISE NSR ESTIMATE SOURCE PULSE SPECTRUM

  12. + u(n) s(n) e(n) - UNKOWN SYSTEM PREDICTOR H(z) s’(n) ESTIMATED INVERSE SYSTEM A(z) G/H(z) PERIODIC ANALYSIS LINEAR PREDICTION Estimates the vocal tract as an all-pole filter by minimizing the error between actual and model-predicted signals. (VOCAL TRACT) (SOURCE) (ERROR) (MODEL)

  13. LPC COVARIANCE ROOTS. O=TRUE X=LPC 1 0.8 POLE REFLECTED INSIDE UNIT CIRCLE 0.6 0.4 0.2 Imaginary part 0 -0.2 -0.4 -0.6 -0.8 -1 -1.5 -1 -0.5 0 0.5 1 1.5 Real part 25 LPC COV. PREDICTOR: LINE = ACTUAL OUT, DOT= PREDICTED 20 15 10 5 0 -5 -10 -15 -20 0.03 0.035 0.04 0.045 0.05 0.055 0.06 2 ERROR SIGNAL OF COVARIANCE LPA = INVERSE FILTERED OUTPUT 0 -2 -4 -6 -8 -10 0.06 0.03 0.035 0.04 0.045 0.05 0.055 PERIODIC ANALYSIS IDEALIZED LP RESULT Requires a priori knowledge of system. More difficult for pathological vowels.

  14. CASE 0: NORMAL SOURCE AND NORMAL VOCAL TRACT SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIES VOICE SPECTRUM CASE 1: “BREATHY” SOURCE AND NORMAL VOCAL TRACT SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIES VOICE SPECTRUM CASE 2: NORMAL SOURCE AND ABNORMAL VOCAL TRACT SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIES VOICE SPECTRUM PERIODIC ANALYSIS SOURCE-FILTER AMBIGUITY Source & filter are mixed in final voice. Unique LP solution may be difficult.

  15. 150 E1 (FIRST SEG.) E2 (SECOND SEG.) 100 U(t) 50 tc 0 FLOW U(t) & DERIVATIVE E(t) tp (tpk,Epk) (t2,Ee/2) -50 30*E(t) (te,Ee) -100 -150 0 0.002 0.004 0.006 0.008 0.01 0.012 T (SECONDS) PERIODIC ANALYSIS FITTING RAW SOURCE TO LF Having established the inverse filtered source, it is fit to the LF model [Qi]

  16. 8 6 4 2 0 VOICE SIG -2 275 -4 -6 270 Measured Period -8 0.018 0.02 0.022 0.024 0.026 0.028 0.03 0.032 0.034 0.036 FREQ (HZ) TIME (SEC) 265 260 255 0 0.2 0.4 0.6 0.8 1 TIME (SEC) NON-PERIODIC ANALYSIS HIGH RESOLUTION PITCH TRACKING Nonperiodic analysis begins with interpolating pitch tracking. VOICE SIGNAL PITCH TRACK

  17. 275 270 FREQ HZ 265 260 0 0.2 0.4 0.6 0.8 1 SEC 4 2 0 FREQ HZ -2 -4 0 0.2 0.4 0.6 0.8 1 SEC NON-PERIODIC ANALYSIS FM DEVIATION SEGREATED TO LOW AND HIGH FREQUENCY The pitch track is low/hi pass filtered to yield tremor and HFPV (High Frequency Pitch Variation).

  18. 45 40 35 30 25 #OCCURRANCES 20 15 10 5 0 -1.5 -1 -0.5 0 0.5 1 1.5 DELT FREQ (%) TOTSKIP=0 FCUT=10 TOTMAN=0 NON-PERIODIC ANALYSIS GAUSSIAN HFPV Successful pitch tracking yields a Gaussian distribution in HPFV. The standard deviation is a convenient measure of HFPV.

  19. 0.1 0.05 0 DELTA FREQ PERCENT -0.05 -0.1 0 0.2 0.4 0.6 0.8 1 TIME (SEC) NON-PERIODIC ANALYSIS FM DEMODULATION The pitch track may be used to demodulate the original voice to obtain a version with almost no pitch variation; re-tracking verifies constant pitch (<0.1%).

  20. 1 PITCH TRACK FEATURES 4 x 10 5 0.9 4.5 4 0.8 3.5 0.7 SMOOTHED ABS ORIG VOICE 3 POWER 2.5 0.6 2 1.5 0.5 1 0.5 0.4 0 0 100 200 300 400 500 SAMPLE NUMBER 0.3 ENVELOPE MINIMA 0 0.5 1 1.5 2 2.5 3 3.5 4 SAMPLE # 4 x 10 NON-PERIODIC ANALYSIS POWER TRACKING Analogously to pitch tracking, voice power is tracked. POWER TRACK

  21. ORIGINAL POWER ( 0) AND TREMOR (LINE) 8 x 10 2 1.5 POWER 1 0.5 0 0.2 0.4 0.6 0.8 1 TIME (SEC) SHIM% = 100*(POWER - LOWPASS POWER)/POWER 20 10 0 DELTA POWER PERCENT -10 -20 0 0.2 0.4 0.6 0.8 1 TIME (SEC) NON-PERIODIC ANALYSIS POWER SEGREGATED TO LOW AND HIGH FREQUENCY The power track is low/hi pass filtered to yield low frequency power variations and high frequency shimmer.

  22. SHIM% HIST. STAND DEV = 4.823% 40 35 30 25 20 #OCCURRANCES 15 10 5 0 -20 -15 -10 -5 0 5 10 15 DELT PWR (%) NON-PERIODIC ANALYSIS GAUSSIAN POWER VARIATIONS Shimmer also displays Gaussian variations.

  23. VARIOUS MEASURES OF SIGNAL STRENGTH 1 0.95 ENERGY SUM POWER 0.9 MAX AMPL. 0.85 0.8 0.75 0 0.5 1 1.5 2 2.5 3 3.5 4 4 x 10 SAMPLE # NON-PERIODIC ANALYSIS AM DEMODULATION The power track may be used to demodulate the original voice to obtain a version with almost no variation in strength; re-tracking verifies constant power.

  24. 7 0.12 6 0.1 5 0.08 LOG10 MAGNITUDE 4 0.06 3 0.04 2 0.02 1 0 0 0 0.1 0.2 0.3 0.4 0.5 0 1000 2000 3000 4000 5000 TIME (SEC) FREQUENCY (HZ) Figure 2.30b. Cepstrum of original voice. Figure 2.30a. PSD oforiginal voice. 0.12 0.12 0.1 0.1 0.08 0.08 0.06 0.06 0.04 0.04 0.02 0.02 0 0 0 0.005 0.01 0.015 0.02 0 0.005 0.01 0.015 0.02 TIME (SEC) TIME (SEC) Figure 2.30c. Cepstrum (expanded scale). Figure 2.30d. Comb-liftered cepstrum of 14c. 7 4.5 6 4 5 LOG10 MAGNITUDE LOG10 MAGNITUDE 4 3.5 3 3 2 2.5 1 0 2 -1 -2 1.5 0 1000 2000 3000 4000 5000 0 1000 2000 3000 4000 5000 FREQUENCY (HZ) FREQUENCY (HZ) Figure 2.30f. Source aspiration PSD with vocal tract removed and fitted to 25-point piecewise-linear model. Figure 2.30e. Orig PSD, aspriation PSD, and vocal tract. NON-PERIODIC ANALYSIS ASPIRATION NOISE ANALYSIS Aspiration noise is segregated via spectral techniques. Peaks in the FFT of the log of the FFT (cepstrum) represent periodic energy, and are filtered out with a comb filter (lifter). Results are used to calculate noise-to-signal ratio (NSR).

  25. 7 6 5 4 3 LOG10 MAGNITUDE 2 1 0 -1 0 100 200 300 400 500 600 700 800 FREQUENCY (HZ) NON-PERIODIC ANALYSIS FM DEMODULATION IMPROVES NSR ACCURACY Using FM demodulation improves resolution of spectral peaks of periodic components, thus allowing longer FFT windows and more accurate NSR determination.

  26. 0 -5 -10 -15 NSR (DB) -20 -25 -30 0 5 10 15 20 25 30 35 CASE# - SORTED BY ASCENDING ORIGINAL NSR NON-PERIODIC ANALYSIS CHANGES IN NSR AFTER FM AND AM DEMODULATION FM demodulation reduces NSR measures by up to 20 dB, yielding results closer to perceived levels. AM demodulation has much less effect. ORIG TREMOR REMOVED ALL FM REMOVED

  27. PC #1: STIMULUS GEN. STIMULUS SIGNAL D/A ACOUSTIC CONDUIT /a/ AMPLIFIER XDUCER PC #2: DAQ CH 1 SIGNAL CONDITIONER MICROPHONE A/D MEM. VOCAL TRACT CH 2 SIGNAL CONDITIONER A/D TUBE OF LENGTH L WITH CLOSED END. FORMANTS AT F = C/4L, 3C/4L, 5C/4L,... PERIODIC ANALYSIS EXTERNAL (ES) SOURCE ANALYSIS Source-filter ambiguity may be resolved by augmenting the glottal source with an known external stimulus. [Epps].

  28. SPECTRUM WITH RAG, W/O RAG, & MAG OF T.F. (YEL) 140 |HA(f)| 120 |HB(f)| 100 80 60 TUBE FORMANTS 40 MAGNITUDE (dB) 20 0 -20 -40 |V(f)| -60 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 FREQUENCY (Hz) 4 x 10 PERIODIC ANALYSIS ES VERIFICATION A simple plastic tube model verified the ES experimental setup. Resonances occur at expected frequencies.

  29. SPECTRA D1: VC LPA(DASH), VC FFT(SOLID) @ 0dB, EXTSRC TF(DOT) @ 20dB EXT SRC T.F.’S 30 F1 F2 20 10 0 -10 F3 F4 -20 LPA -30 -40 -50 -60 -70 FFT -80 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 PERIODIC ANALYSIS ES: NORMAL /a/ LP & FFT analysis show consistent results with ES analysis for a normal vowel.

  30. D1BRTHY: VOICE LPA & FFT (BOT), EXTSRC TF (TOP) 30 EXT SRC T.F.’S 20 F1 F2 10 0 -10 F3 F4 -20 MAGNITUDE (dB) LPA -30 -40 -50 -60 FFT -70 -80 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 FREQUENCY (Hz) PERIODIC ANALYSIS ES: SIMULATED BREATHY /a/ LP & FFT analysis show poor resolution for F3 and F4 for a breathy /a/, while the ES resolution for F3 and F4 remains good.

  31. SYNTHESIS SYNTHESIS OF PATHOLOGICAL VOWELS Synthesis is a critical step in the study of pathological vowels. It provides evidence of the success of analysis and modeling steps via immediate comparisons of original and synthetic voice. Two synthesizers were implemented: 1. A realtime hardware-based synthesizer capable of providing instant response to changes in model parameters. 2. A software synthesizer implemented in MATLAB with extended features, convenient graphical interface, and ease of modification.

  32. CURRENT REALTIME SYNTHESIZER FUNCTIONAL OVERVIEW X86 INTERUPT IMPL CLK SOURCE SELECTION 20 POLES 4 ZEROES KGLOT 88 LOW PASS FILT. . . . D/A AGC + LF1 RECORD WAV AMP GAH LF2 OUTPUT SIGNAL FILE STORE/RECALL CONTROL PARMS ASP NOISE SPKR ARB SOURCE CONTROLS ARBITRARY SOURCE FILE PARAMETER TIME VARIATION JITTER SHIMMER DIPLOPHONIA SYNTHESIS REALTIME SYNTHESIZER - Implemented in native X86 assembly language - Executes all code within 100us cycle - Overrides PC OS to achieve determinancy - Employs dedicated clock, I/O, and control hardware implemented in a wire-wrap PCB adapter card

  33. SYNTHESIS SYNTHESIS VALIDATION The current model, analysis tools, and synthesizers yield a high level of fidelity in generation of synthetic pathological vowels. The system is currently employed at the UCLA Voicelab for NIH funded perceptual studies. In order to objectively validate the analysis/synthesis process, the loop is closed by re-analyzing the synthetic time series to confirm parameter values. Re-analysis also provides opportunity to observe interactions of nonperiodic components.

  34. 0 -5 -10 -15 MEASURED NSR IN SYNTHETIC (DB) -20 -25 -30 -30 -25 -20 -15 -10 -5 0 A.N. SET NSR IN SYNTHESIZER NPB21 SYNTHESIS SYNTHESIS VALIDATION Re-measured synthetic aspiration noise level agrees with level set in synthesizer.

  35. SYNTHESIS SYNTHESIS VALIDATION Aspiration noise adds about %0.2 to HFPV measurements. 2 1.8 1.6 1.4 1.2 1 JITTER MEASURED IN SYNTHETIC (%) 0.8 0.6 0.4 0.2 0 0 0.5 1 1.5 2 JITTER LEVEL SET IN SYNTHESIZER (%) HFPV adds about 4dB to NSR measurements. 0 -5 -10 MEASURED NSR IN SYNTH (dB) -15 -20 -25 -30 -30 -25 -20 -15 -10 -5 0 A.N. SET NSR IN SYNTHESIZER (dB)

  36. SYNTHESIS SYNTHESIS VALIDATION Subjective analysis by synthesis experiments demonstrate the success of AM and FM demodulation in achieving accurate modeling of nonperiodic features. Listeners adjust synthetic aspiration noise to match original. Match improves with demodulation 0 ORIGINAL VOICE (NO DEMOD) -5 PEARSON = 0.51 -10 -15 SABS ASPIRATION NOISE (DB) -20 -25 -30 -30 -25 -20 -15 -10 -5 0 CEPSTRAL NSR (DB)

  37. 0 0 TREMOR REMOVED ALL AM&FM REMOVED -5 -5 PEARSON = 0.71 PEARSON = 0.87 -10 -10 SABS ASPIRATION NOISE (DB) SABS ASPIRATION NOISE (DB) -15 -15 -20 -20 -25 -25 -30 -30 -30 -25 -20 -15 -10 -5 0 -30 -25 -20 -15 -10 -5 0 CEPSTRAL NSR (DB) CEPSTRAL NSR (DB) SYNTHESIS SYNTHESIS VALIDATION

  38. CONCLUSION SUMMARY This study has achieved improved automatic, objective analysis and synthesis of speech within the specialization of pathological vowels. Specific accomplishments include: - A unique, symmetric model for nonperiodic components as AM, FM and spectrally-shaped aspiration noise - Improved accuracy of noise analysis via AM & FM demodulation - Application of ES formant identification for pathological vowels. - Implementation of realtime and offline specialized high fidelity vowel synthesizers

More Related