Speech acoustics and phonetics



Louis C.W. Pols

Institute of Phonetic Sciences (IFA)

Amsterdam Center for Language and Communication (ACLC)

NATO-ASI “Dynamics of Speech Production and Perception” Il Ciocco, Tuscany, Italy, July 1, 2002


Overview

  • Dynamics in speech acoustics

  • Contour modeling (mainly formants)

  • Aspects of spectral undershoot

  • Modeling V and C reduction

  • Phonetic knowledge from speech corpora

    • IFA, CGN, TIMIT, found speech

  • Conclusions

Speech acoustics and phonetics, Il Ciocco


Dynamics in speech acoustics

  • Dynamics is the norm, not stationarity

    • articulatory efficiency

  • Dynamics is everywhere

    • generally no word boundaries in speech

    • deletion of words, syllables, phonemes; insertion

    • within/between word coarticulation/assimilation

    • vowel and consonant reduction

  • Acoustic manifestations

    • segment duration, F0, loudness, spectral quality

Dynamics is the norm

  • The speaker speaks as sloppily as the listener allows in communication

    • communicative efficiency

  • Articulatory vs. perceptual efficiency

    • do spectral transitions facilitate or hamper perception? (see other presentation)

  • Speaker flexibility; speaking style (clear vs. sloppy); speaking rate

Dynamics is everywhere

Acoustic manifestations

  • pitch, loudness, formant, component contours

  • contour stylization (e.g., pitch in Praat)

  • contour modeling

    • n-th degree curve fitting (D. van Bergem)

    • Legendre polynomials (R. van Son)

    • 16 points per segment (R. van Son)

  • (phoneme) segmentation

    • by hand (time-consuming; inconsistent)

    • automatically (via forced phoneme recognition and a pronunciation lexicon with alternatives; systematic errors)
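The "16 points per segment" representation listed above can be illustrated with a small time-normalization sketch. It assumes plain linear interpolation, which the slides do not specify; the function name and the example F2 tracks are hypothetical.

```python
def resample_contour(values, n_points=16):
    """Linearly interpolate a contour onto n_points equally spaced positions.

    Assumption: linear interpolation; the slides only say "16 points per segment".
    """
    if len(values) < 2 or n_points < 2:
        raise ValueError("need at least two samples and two output points")
    last = len(values) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)   # fractional index into the input
        i = min(int(pos), last - 1)       # left neighbour, clamped at the end
        frac = pos - i
        out.append(values[i] * (1 - frac) + values[i + 1] * frac)
    return out


# Hypothetical F2 tracks (Hz): a long and a short token with the same shape.
long_token = [1200 + 10 * t for t in range(31)]    # 31 analysis frames
short_token = [1200 + 20 * t for t in range(16)]   # 16 analysis frames

long_16 = resample_contour(long_token)
short_16 = resample_contour(short_token)
```

After resampling, tokens of different duration have the same number of points, so their track shapes can be compared point by point.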

Contour modeling

  • allows modeling of specific phenomena

    • pitch accentuation (vs. vowel onset)

    • reduction, centralization, undershoot

  • allows generation of stimuli for perception experiments

    • phoneme identification in extending context

    • two-alternative forced-choice identification of continua

    • discrimination, reaction time (RT)

  • allows statistics on large speech corpora

    • TIMIT, CGN, IFA-corpus, Switchboard

Static vs. dynamic vowel recognition

  • see Weenink (2001)

    • “Vowel normalizations with the TIMIT acoustic phonetic speech corpus”, IFA Proc. 24, 117-123

  • 438 male speakers, both training and test sentences of TIMIT

  • 35,385 vowel segments, hand segmented

  • 13 monophthongal vowel categories

  • 1-Bark band-filter analysis (18 filters), intensity normalized

  • 3 frames per segment: the central frame and the frames 25 ms to its left and right
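The difference between a static and a dynamic vowel representation can be sketched as follows. This is only an illustration under stated assumptions: frames are picked by nearest analysis time, the dynamic vector is the plain concatenation of the three frames, and the function name and toy data are hypothetical rather than taken from Weenink (2001).

```python
def pick_frames(frame_times, frames, start, end, offset=0.025):
    """Return (static, dynamic) feature vectors for one vowel segment.

    static  : the band-filter frame nearest the segment midpoint
    dynamic : frames nearest midpoint - offset, midpoint, midpoint + offset,
              concatenated (assumption: simple concatenation)
    """
    def nearest(t):
        idx = min(range(len(frame_times)), key=lambda i: abs(frame_times[i] - t))
        return list(frames[idx])

    mid = (start + end) / 2
    static = nearest(mid)
    dynamic = nearest(mid - offset) + static + nearest(mid + offset)
    return static, dynamic


# Toy data: one 18-band frame every 5 ms; each frame is filled with its index.
frame_times = [i * 0.005 for i in range(21)]
frames = [[float(i)] * 18 for i in range(21)]

static, dynamic = pick_frames(frame_times, frames, start=0.0, end=0.1)
```

With 18 bands per frame, the static vector has 18 values and the dynamic vector 54; either can then be passed to a discriminant classifier.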

Some results

  • Vowel classification (%) with discriminant functions

Formant tracks / speaking rate

  • Ph.D. thesis Rob van Son (1993)

    • “Spectro-temporal features of vowel segments”

    • see also Speech Communication 13, 135-148 (Pols & van Son)

  • 850-word text, read at normal and fast rate

  • hand segmentation of the 7 most frequent vowels plus schwa

  • formant tracks

    • via 16 points per segment or 5 Legendre polynomials

  • influence of rate, vowel duration, context, sentence accent

  • evidence for duration-controlled undershoot?
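Describing a formant track by a handful of Legendre coefficients can be sketched as a least-squares fit. The sketch assumes that "5 Legendre polynomials" means orders 0 to 4 and that the 16 track points are mapped onto the interval [-1, 1]; neither detail is stated on the slide, and the helper names are hypothetical.

```python
def legendre_basis(x, order):
    """Values P_0(x) .. P_order(x), computed with Bonnet's recurrence."""
    p = [1.0, x]
    for n in range(1, order):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: order + 1]


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def legendre_fit(track, order=4):
    """Least-squares Legendre coefficients of an equally spaced track."""
    n = len(track)
    xs = [-1 + 2 * k / (n - 1) for k in range(n)]   # map time onto [-1, 1]
    basis = [legendre_basis(x, order) for x in xs]
    size = order + 1
    ata = [[sum(b[i] * b[j] for b in basis) for j in range(size)]
           for i in range(size)]
    aty = [sum(b[i] * y for b, y in zip(basis, track)) for i in range(size)]
    return solve(ata, aty)


# Hypothetical 16-point F2 track (Hz): level 1500, rise 200, curvature -100.
xs = [-1 + 2 * k / 15 for k in range(16)]
track = [1500 + 200 * x - 100 * (3 * x * x - 1) / 2 for x in xs]
coeffs = legendre_fit(track)
```

In this parameterization the zeroth-order coefficient is close to the mean formant value, the first-order coefficient captures the overall slope, and the second-order coefficient captures the curvature of the track.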

Some results

  • no differences for F1/F2 in vowel center between normal- and fast-rate speech; only some overall rise in F1 for fast rate (irrespective of vowel)

  • same formant track shape (normalized to 16 points) for normal- and fast-rate speech

  • same results when using the more elaborate Legendre polynomials

  • Conclusion: changes in vowel duration do not change the amount of undershoot → active control of articulation speed

Formant representations

[Figure: zeroth-order Legendre polynomial coefficients (mean Fi in vowel segment); second-order Legendre polynomial coefficients (axes reversed)]

Modeling vowel reduction

  • Ph.D. thesis Dick van Bergem (1995)

    • “Acoustic and lexical vowel reduction”

    • see also Speech Communication 16, 329-358

  • lexical vowel reduction: Fr /betõ/ vs. Du /b@tOn/

  • acoustic vowel reduction: /banan, bAnan, b@nan/

    • as a function of sentence accent, word stress, and word class: can-candy-canteen

  • coarticulatory effects on the schwa

  • perceptual effects (full vowel or schwa, e.g. ‘ananas’)

Some results

[Figure panels: t-n and w-l contexts]

The schwa is not just a centralized vowel but something that is completely assimilated with its phonemic context

Modeling consonant reduction

  • Speech Communication 28 (1999), 125-140 (van Son & Pols)

  • 20 min. of speech, both spontaneous and read

  • 2 × 791 similar VCV sequences; hand segmented

  • 5 aspects of vowel and consonant reduction

    • related to coarticulation: F2 slope differences at CV vs. VC boundaries; F2 locus equations (F2 onset vs. F2 target)

    • related to speaking effort: duration; spectral center of gravity (COG, mean frequency); V-C sound energy differences
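Two of the measures named above lend themselves to a short sketch: the F2 locus equation as a straight-line regression of F2 onset on F2 target, and the spectral center of gravity as a weighted mean frequency. Power weighting is an assumption here (other weightings are in use), and the function names and toy numbers are hypothetical.

```python
def locus_equation(f2_target, f2_onset):
    """Least-squares line: f2_onset = slope * f2_target + intercept."""
    n = len(f2_target)
    mx = sum(f2_target) / n
    my = sum(f2_onset) / n
    sxx = sum((x - mx) ** 2 for x in f2_target)
    sxy = sum((x - mx) * (y - my) for x, y in zip(f2_target, f2_onset))
    slope = sxy / sxx
    return slope, my - slope * mx


def spectral_cog(freqs, power):
    """Power-weighted mean frequency of a spectrum (assumed weighting)."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    return sum(f * p for f, p in zip(freqs, power)) / total


# Toy values (Hz), invented for illustration only.
targets = [1000.0, 1500.0, 2000.0, 2500.0]
onsets = [0.5 * t + 900.0 for t in targets]
slope, intercept = locus_equation(targets, onsets)

cog = spectral_cog([1000.0, 2000.0, 3000.0], [1.0, 2.0, 1.0])
```

A slope near 1 means the F2 onset follows the vowel target closely (strong coarticulation); a slope near 0 means the onset stays at a fixed consonant locus.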

Some results

  • vowels markedly reduced in spontaneous speech

  • lower F2-slope difference in spontaneous speech → decrease in articulation speed

  • no systematic effect on F2 locus equation; vowel onsets and targets change in concert → any vowel reduction is mirrored by a comparable change in the consonant

  • spontaneous speech: vowels and consonants shorter; lower COG → decrease in vocal and articulatory effort

Access to large corpora

  • more, and more realistic, data

  • phonetic knowledge via statistical analyses

  • e.g. the highly accessible IFA-corpus (free, SQL)

    • see “Structure and access of the open source IFA-corpus”, IFA Proc. 24, 15-26 (van Son & Pols)

    • on-line http://www.fon.hum.uva.nl/IFAcorpus/

  • 4 male and 4 female speakers, 5.5 hours of speech

    • from informal to read speech, plus sentences, words, syllables

    • about 50,000 words segmented and labeled at phoneme level

Some results

  • speech + annotation + metadata: relational database

  • realization of final n, e.g. Du ‘geven’, ending in /@(n)/
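A question such as "how often is final n realized, per speaking style?" maps naturally onto a grouped SQL query. The sketch below uses an in-memory SQLite database with a made-up table; the table name, column names, and rows are hypothetical and do not reflect the actual IFA-corpus schema or its results.

```python
import sqlite3

# Hypothetical schema: one row per word token, with a flag for a realized final n.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (word TEXT, style TEXT, final_n INTEGER)")

# Invented toy rows, for illustration only (not corpus data).
rows = [
    ("geven", "read", 1),
    ("geven", "read", 0),
    ("lopen", "read", 1),
    ("geven", "informal", 0),
    ("lopen", "informal", 0),
    ("lopen", "informal", 1),
]
conn.executemany("INSERT INTO tokens VALUES (?, ?, ?)", rows)

# Token count and proportion of realized final n per speaking style.
query = """
    SELECT style, COUNT(*) AS n_tokens, AVG(final_n) AS prop_realized
    FROM tokens
    GROUP BY style
    ORDER BY style
"""
result = conn.execute(query).fetchall()
conn.close()
```

Keeping speech, annotation, and metadata in one relational database means such counts can be broken down by speaker, style, or context simply by changing the `GROUP BY` clause.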

Spoken Dutch Corpus (CGN)

  • 10 M words, 1,000 hrs of speech

  • variety of styles, incl. telephone speech

  • adult Dutch and Flemish speakers

  • for linguistic and technological research

  • see various LREC and ICSLP papers (2002)

  • see also http://lands.let.kun.nl/cgn/home.htm

  • fully transcribed: orthographic, part-of-speech (POS), lemmas

  • partly transcribed: phonemic, prosodic, syntactic

TIMIT

  • popular DB in acoustic phonetics and ASR

    • also telephone version (NTIMIT)

  • hand segmented & labeled at phoneme level

  • 438 males, 192 females (8 dialect regions)

  • 10 sentences per speaker (2 fixed, 5 phonetically compact, 3 phonetically diverse)

    sa1: “She had your dark suit in greasy wash water all year”

  • includes separate test data (112 M, 56 F)

  • e.g. Ph.D thesis X. Wang (1997)

    “Incorporating knowledge on segmental duration in HMM-based continuous speech recognition”


Useful info: durational variability

  • overall average = 95 ms

  • normal rate = 95 ms

  • primary stress = 104 ms

  • word final = 136 ms

  • utterance final = 186 ms

Adapted from Wang (1998)


[Figure: normalized phone duration and speaking rate for all 3,696 training sentences (sx + si) of the TIMIT training set]
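The slide does not say how phone duration was normalized or how speaking rate was measured. One common choice, shown here purely as an assumption, is a z-score per phone label for duration and phones per second for rate; the function names and toy durations are hypothetical.

```python
from statistics import mean, pstdev


def normalize_durations(phones):
    """Z-score each duration against the mean and SD of its phone label.

    phones: list of (label, duration_in_seconds) pairs.
    Assumption: per-label z-scores; the source does not define the normalization.
    """
    by_label = {}
    for label, dur in phones:
        by_label.setdefault(label, []).append(dur)
    stats = {lab: (mean(d), pstdev(d)) for lab, d in by_label.items()}
    out = []
    for label, dur in phones:
        mu, sd = stats[label]
        out.append((dur - mu) / sd if sd > 0 else 0.0)
    return out


def speaking_rate(phones):
    """Phones per second over one sentence (assumed rate measure)."""
    total = sum(dur for _, dur in phones)
    return len(phones) / total


# Invented toy sentence: two tokens each of /a/ and /t/.
sentence = [("a", 0.10), ("a", 0.14), ("t", 0.05), ("t", 0.05)]
z_scores = normalize_durations(sentence)
rate = speaking_rate(sentence)
```

Normalizing per phone label removes intrinsic duration differences between phones, so the remaining variation can be related to factors such as speaking rate, stress, and position.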


‘found’ speech

  • DARPA-LVSR community rather ambitious

  • Broadcast News (BN), Speech Communication 37 (2002)

For Proc. DARPA Workshops, see http://www.nist.gov/speech/proc/darpa99/index.htm

Articulatory-acoustic features in ASR

  • “A Dutch treatment of an elitist approach to articulatory-acoustic feature classification”, Proc. Eurospeech-2001, 1729-1732 (M. Wester et al.)

  • “Integrating articulatory features into acoustic models for speech recognition”, Phonus 5, 73-86 (K. Kirchhoff, 2000)

  • “An overlapping-feature-based phonological model incorporating linguistic constraints: Applications to speech recognition”, JASA 111 (2), 1086-1101 (J. Sun & L. Deng, 2002)

Conclusions

  • examples of dynamics in speech acoustics

  • going from formal to informal speech:

    • less dynamics, more reduction (articulatorily guided)

    • undershoot vs. speaking style

    • sloppiness or articulatory limits?

  • functionality of dynamics? → see other paper

  • systematicity of dynamics?

    • easing ASR, rules for TTS, acquiring knowledge?
