Speech acoustics and phonetics

Louis C.W. Pols

Institute of Phonetic Sciences (IFA)

Amsterdam Center for Language and Communication (ACLC)

NATO-ASI “Dynamics of Speech Production and Perception” Il Ciocco, Tuscany, Italy, July 1, 2002

Overview
  • Dynamics in speech acoustics
  • Contour modeling (mainly formants)
  • Aspects of spectral undershoot
  • Modeling vowel (V) and consonant (C) reduction
  • Phonetic knowledge from speech corpora
    • IFA, CGN, TIMIT, found speech
  • Conclusions

Speech acoustics and phonetics, Il Ciocco

Dynamics in speech acoustics
  • Dynamics is the norm, not stationarity
    • articulatory efficiency
  • Dynamics is everywhere
    • generally no word boundaries in speech
    • deletion of words, syllables, phonemes; insertion
    • within/between word coarticulation/assimilation
    • vowel and consonant reduction
  • Acoustic manifestations
    • segment duration, F0, loudness, spectral quality

Dynamics is the norm
  • The speaker speaks as sloppily as the listener allows in communication
    • communicative efficiency
  • Articulatory vs. perceptual efficiency
    • do spectral transitions facilitate or hamper perception? —> see other presentation
  • Speaker flexibility; speaking style (clear vs. sloppy); speaking rate

Dynamics is everywhere
  • Deletion
    • ‘bread and butter’ /brEmbY3/
    • ‘Amsterdam’ (Du) /Amst@rdAm/ —>/Ams@dAm/
    • ‘koninklijke’ (Du) /konIŋkl@k@/ —>/kol@k@/
  • Insertion
    • homorganic glide insertion: ‘die een’ (Du) /dij@n/
  • Degemination
    • ‘is zichtbaar’ (Du) /Is zIxtbar/ —>/IsIxbar/
  • Reduction, coarticulation, assimilation
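Deletions like the ones above can be counted by aligning a canonical transcription with the realized one. The sketch below illustrates that idea and is not the procedure behind these examples. It is a plain Levenshtein alignment that treats each character of the SAMPA-like strings as one phone, an assumption that breaks for multi-character symbols. It reports which canonical phones were deleted.

```python
def align_deletions(canonical, realized):
    """Return (edit distance, canonical phones deleted in the realization).

    Assumes one character per phone; ties in the backtrace pick one of
    several equally cheap alignments.
    """
    n, m = len(canonical), len(realized)
    # d[i][j] = minimal cost of aligning canonical[:i] with realized[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if canonical[i - 1] == realized[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    # Backtrace from the end, collecting deleted canonical phones
    deleted = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (canonical[i - 1] != realized[j - 1]):
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            deleted.append(canonical[i - 1])
            i -= 1
        else:
            j -= 1  # insertion in the realized form
    deleted.reverse()
    return d[n][m], deleted


print(align_deletions("Amst@rdAm", "Ams@dAm"))
```

For ‘Amsterdam’ the alignment has cost 2, and the deleted phones are /t/ and /r/. For ‘koninklijke’ (/konIŋkl@k@/ → /kol@k@/) it finds four deletions.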

Acoustic manifestations
  • pitch, loudness, formant, component contours
  • contour stylization (e.g., pitch in Praat)
  • contour modeling
    • n-th degree curve fitting (D. van Bergem)
    • Legendre polynomials (R. van Son)
    • 16 points per segment (R. van Son)
  • (phoneme) segmentation
    • by hand (time-consuming; inconsistent)
    • automatically (via forced phoneme recognition with a pronunciation lexicon containing alternatives; prone to systematic errors)
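The two contour representations above can be sketched as follows. The exact procedure behind them is not given here, so this is an assumed version in pure Python. A formant track is linearly interpolated onto 16 equidistant points. It is then fitted by least squares with Legendre polynomials over segment time mapped onto [-1, 1]. The zeroth-order coefficient then approximates the mean formant value in the segment. The helper names are hypothetical.

```python
def resample(times, values, n_points=16):
    """Linearly interpolate a track onto n_points equidistant times spanning it."""
    t0, t1 = times[0], times[-1]
    out = []
    for k in range(n_points):
        t = t0 + (t1 - t0) * k / (n_points - 1)
        i = 0
        # advance to the interval [times[i], times[i + 1]] that contains t
        while i < len(times) - 2 and times[i + 1] < t:
            i += 1
        span = times[i + 1] - times[i]
        w = 0.0 if span == 0 else (t - times[i]) / span
        out.append(values[i] + w * (values[i + 1] - values[i]))
    return out


def legendre_row(x, n_coef):
    """Values P_0(x) .. P_{n_coef-1}(x) via Bonnet's recurrence."""
    p = [1.0, x]
    for n in range(1, n_coef - 1):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[:n_coef]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def legendre_coefficients(values, n_coef=5):
    """Least-squares Legendre fit (normal equations), segment time mapped to [-1, 1]."""
    n = len(values)
    rows = [legendre_row(-1.0 + 2.0 * i / (n - 1), n_coef) for i in range(n)]
    ata = [[sum(r[a] * r[b] for r in rows) for b in range(n_coef)] for a in range(n_coef)]
    aty = [sum(r[a] * y for r, y in zip(rows, values)) for a in range(n_coef)]
    return solve(ata, aty)


# Hypothetical F1 track: times in s, frequencies in Hz
track = resample([0.0, 0.05, 0.1], [500.0, 600.0, 700.0])
print(legendre_coefficients(track))
```

For this linear toy track the fit returns approximately [600, 100, 0, 0, 0]: the mean (600 Hz), a linear slope term, and no curvature.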

Contour modeling
  • allows modeling of specific phenomena
    • pitch accentuation (vs. vowel onset)
    • reduction, centralization, undershoot
  • allows generation of stimuli for perceptual experiments
    • phoneme identification with progressively extended context
    • two-alternative forced-choice identification of continua
    • discrimination, reaction times (RT)
  • allows statistics on large speech corpora
    • TIMIT, CGN, IFA-corpus, Switchboard
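For the forced-choice identification of continua mentioned above, the stimulus parameters can be generated by interpolating between two endpoint formant patterns. This is a minimal sketch under stated assumptions. Steps are spaced linearly in Hz, whereas many experiments space them on a Bark or mel scale instead. The endpoint values are hypothetical. Feeding the values to a synthesizer is not shown.

```python
def formant_continuum(start, end, n_steps):
    """Return n_steps formant tuples linearly spaced from start to end (both included)."""
    if n_steps < 2:
        raise ValueError("a continuum needs at least two steps")
    if len(start) != len(end):
        raise ValueError("start and end must specify the same formants")
    return [
        # step k of n_steps - 1: move the fraction k / (n_steps - 1) from start to end
        tuple(s + (e - s) * k / (n_steps - 1) for s, e in zip(start, end))
        for k in range(n_steps)
    ]


# Hypothetical (F1, F2) endpoints in Hz for a 5-step continuum
for step in formant_continuum((300.0, 2300.0), (700.0, 1200.0), 5):
    print(step)
```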

Static vs. dynamic vowel recognition
  • see Weenink (2001)
    • “Vowel normalizations with the TIMIT acoustic phonetic speech corpus”, IFA Proc. 24, 117-123
  • 438 male speakers, both training and test sentences of TIMIT
  • 35,385 vowel segments, hand segmented
  • 13 monophthongal vowel categories
  • 1-Bark band-filter analysis (18 filters), intensity normalized
  • 3 frames per segment: central frame and frames 25 ms to the left and right
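The analysis setup above can be sketched as follows. The Hz-to-Bark conversion here is Traunmüller's approximation, swapped in as an assumption because the study's own formula is not stated. The layout of 18 filter centres spaced 1 Bark apart from 1 Bark upward is also a hypothetical choice. The three analysis frames sit at the segment centre and 25 ms to either side.

```python
def hz_to_bark(f_hz):
    # Traunmüller's approximation (assumed; the original analysis may differ)
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z_bark):
    # algebraic inverse of hz_to_bark
    return 1960.0 * (z_bark + 0.53) / (26.28 - z_bark)


def filter_centres_hz(n_filters=18, first_bark=1.0):
    """Centre frequencies of n_filters filters spaced 1 Bark apart (hypothetical layout)."""
    return [bark_to_hz(first_bark + k) for k in range(n_filters)]


def analysis_frame_times(seg_start, seg_end, offset=0.025):
    """Times (s) of the left, central and right frames: centre and centre +/- 25 ms."""
    mid = 0.5 * (seg_start + seg_end)
    return (mid - offset, mid, mid + offset)


print([round(f) for f in filter_centres_hz()])
print(analysis_frame_times(0.100, 0.200))
```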

Some results
  • Vowel classification (%) with discriminant functions

Formant tracks / speaking rate
  • Ph.D. thesis Rob van Son (1993)
    • “Spectro-temporal features of vowel segments”
    • see also Speech Communication 13, 135-148 (Pols & van Son)
  • 850-word text, read at normal and fast rate
  • hand segmentation of the 7 most frequent vowels + schwa
  • formant tracks
    • via 16 points per segment or 5 Legendre polynomials
  • influence of rate, vowel duration, context, sentence accent
  • evidence for duration-controlled undershoot?

Some results
  • no differences for F1/F2 in the vowel center between normal- and fast-rate speech; only some overall rise in F1 at the fast rate (irrespective of vowel)
  • same formant track shape (normalized to 16 points) for normal- and fast-rate speech
  • same results when using the more elaborate Legendre polynomials
  • Conclusion: changes in vowel duration do not change the amount of undershoot —> active control of articulation speed

Formant representations

(Figure panels: zeroth-order Legendre polynomial coefficients (mean Fi in vowel segment); second-order polynomials (axes reversed))

Modeling vowel reduction
  • Ph.D. thesis Dick van Bergem (1995)
    • “Acoustic and lexical vowel reduction”
    • see also Speech Communication 16, 329-358
  • lexical vowel reduction: French /betõ/ vs. Dutch /b@tOn/
  • acoustic vowel reduction: /banan, bAnan, b@nan/
    • as a function of sentence accent, word stress, and word class: can-candy-canteen
  • coarticulatory effects on the schwa
    • C1@C2V- and VC1@C2-type nonsense words
  • perceptual effects (full vowel or schwa, e.g. ‘ananas’)

Some results

(Figure panels: t-n and w-l contexts)

The schwa is not just a centralized vowel but something that is completely assimilated with its phonemic context

Modeling consonant reduction
  • Speech Communication 28 (1999), 125-140 (van Son & Pols)
  • 20 min. of speech, both spontaneous and read
  • 2 × 791 similar VCV segments, hand segmented
  • 5 aspects of vowel and consonant reduction
    • related to coarticulation: F2 slope differences at CV vs. VC boundaries; F2 locus equations (F2 onset vs. F2 target)
    • related to speaking effort: duration; spectral center of gravity (COG, mean frequency); V-C sound energy differences
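Two of the effort measures listed above can be computed from raw samples as follows. This is a sketch, not the authors' code. Spectral COG is taken here as the power-weighted mean frequency of a naive, unwindowed DFT spectrum. The V-C energy difference is taken as the dB ratio of the mean squared amplitudes of the two segments. Both definitions are assumptions about the exact computation.

```python
import cmath
import math


def power_spectrum(samples, sample_rate):
    """Naive DFT power spectrum (bins 0 .. N/2), no windowing; fine for short frames."""
    n = len(samples)
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        freqs.append(k * sample_rate / n)
        powers.append(abs(s) ** 2)
    return freqs, powers


def spectral_cog(freqs, powers):
    """Centre of gravity: power-weighted mean frequency in Hz."""
    return sum(f * p for f, p in zip(freqs, powers)) / sum(powers)


def energy_difference_db(vowel, consonant):
    """V-C energy difference in dB from the mean squared amplitude of each segment."""
    e_v = sum(x * x for x in vowel) / len(vowel)
    e_c = sum(x * x for x in consonant) / len(consonant)
    return 10.0 * math.log10(e_v / e_c)


# A 1000 Hz cosine at 8 kHz sampling falls exactly on DFT bin 8 of a 64-point frame
sr, n = 8000, 64
tone = [math.cos(2 * math.pi * 1000 * t / sr) for t in range(n)]
freqs, powers = power_spectrum(tone, sr)
print(spectral_cog(freqs, powers))
```

A lower COG in spontaneous speech, as reported in the next slide, would show up as a lower value from `spectral_cog` on comparable segments.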

Some results
  • V markedly reduced in spontaneous speech
  • lower F2 slope differences in spontaneous speech —> decrease in articulation speed
  • no systematic effect on F2 locus equations; vowel onsets and targets change in concert —> any vowel reduction is mirrored by a comparable change in the consonant
  • spontaneous speech: V and C shorter, lower COG —> decrease in vocal and articulatory effort
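A locus equation is a straight-line regression of F2 at vowel onset on F2 at the vowel target, usually fitted per consonant. The sketch below is a plain ordinary least-squares fit. The paired measurements are hypothetical. Comparing read and spontaneous speech would mean fitting each style separately and comparing slopes and intercepts.

```python
def locus_equation(f2_targets, f2_onsets):
    """Ordinary least-squares fit: f2_onset = slope * f2_target + intercept."""
    n = len(f2_targets)
    if n < 2 or n != len(f2_onsets):
        raise ValueError("need at least two paired measurements")
    mx = sum(f2_targets) / n
    my = sum(f2_onsets) / n
    sxx = sum((x - mx) ** 2 for x in f2_targets)
    sxy = sum((x - mx) * (y - my) for x, y in zip(f2_targets, f2_onsets))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical F2 values (Hz) for one consonant in several vowel contexts
slope, intercept = locus_equation([1000, 1500, 2000, 2500], [1300, 1550, 1800, 2050])
print(slope, intercept)
```

If vowel onsets and targets shift together, as the results above suggest, the fitted line stays roughly unchanged between styles even when individual vowels are reduced.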

Access to large corpora
  • more, and more realistic, data
  • phonetic knowledge via statistical analyses
  • e.g. the highly accessible IFA corpus (free, SQL access)
    • see “Structure and access of the open source IFA-corpus”, IFA Proc. 24, 15-26 (van Son & Pols)
    • on-line http://www.fon.hum.uva.nl/IFAcorpus/
  • 4 male / 4 female speakers, 5.5 hrs of speech
    • from informal to read speech, plus isolated sentences, words, syllables
    • ~50K words segmented and labeled at phoneme level

Some results
  • speech + annotations + metadata: relational DB
  • realization of final n, e.g. Dutch ‘geven’ /xev@(n)/
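Because speech, annotations, and metadata sit in a relational database, questions such as final /n/ realization become SQL queries. The sketch below uses Python's built-in sqlite3 module as a stand-in. The `word_tokens` table and its columns are hypothetical and are not the IFA corpus schema. The toy rows are invented for illustration and are not corpus results.

```python
import sqlite3

# Hypothetical schema: one row per word token (NOT the actual IFA-corpus schema)
SCHEMA = """CREATE TABLE word_tokens (
    orthography TEXT,        -- e.g. 'geven'
    canonical TEXT,          -- canonical transcription, e.g. 'xev@n'
    style TEXT,              -- speaking style label, e.g. 'read', 'informal'
    final_n_realized INTEGER -- 1 if the final /n/ was labeled as present, else 0
)"""


def final_n_rates(conn):
    """Per speaking style: (proportion of realized final /n/, token count) for words ending in /@n/."""
    rows = conn.execute(
        """SELECT style, AVG(final_n_realized), COUNT(*)
           FROM word_tokens
           WHERE canonical LIKE '%@n'
           GROUP BY style
           ORDER BY style"""
    ).fetchall()
    return {style: (rate, count) for style, rate, count in rows}


# Toy data for illustration only (invented values)
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO word_tokens VALUES (?, ?, ?, ?)",
    [("geven", "xev@n", "read", 1),
     ("geven", "xev@n", "read", 0),
     ("lopen", "lop@n", "informal", 0),
     ("geven", "xev@n", "informal", 0)],
)
print(final_n_rates(conn))
```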

Spoken Dutch Corpus (CGN)
  • 10 M words, 1,000 hrs of speech
  • variety of styles, incl. telephone speech
  • adult Dutch and Flemish speakers
  • for linguistic and technological research
  • see various LREC and ICSLP papers (2002)
  • see also http://lands.let.kun.nl/cgn/home.htm
  • fully transcribed: orthography, POS tags, lemmas
  • partly annotated: phonemic transcription, prosody, syntax

TIMIT
  • popular database in acoustic phonetics and ASR
    • also telephone version (NTIMIT)
  • hand segmented & labeled at phoneme level
  • 438 males, 192 females (8 dialect regions)
  • 10 sentences per speaker (2 fixed SA, 5 phonetically compact SX, 3 phonetically diverse SI)

sa1: “She had her dark suit in greasy wash water all year”

  • includes separate test data (112 M, 56 F)
  • e.g. Ph.D. thesis X. Wang (1997)

“Incorporating knowledge on segmental duration in HMM-based continuous speech recognition”
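Duration studies such as this one start from TIMIT's hand-made phone segmentation. Each .PHN file lists one segment per line as start sample, end sample, and phone label, with audio sampled at 16 kHz. The sketch below converts such lines into phone durations in milliseconds and skips the silence-type labels h#, pau, and epi. Which labels to exclude is an assumption, and the example lines are made up rather than taken from TIMIT.

```python
SAMPLE_RATE = 16000  # TIMIT waveforms are sampled at 16 kHz


def read_phn(text):
    """Parse TIMIT .PHN lines 'start end label' (sample indices) into tuples."""
    segments = []
    for line in text.strip().splitlines():
        start, end, label = line.split()
        segments.append((int(start), int(end), label))
    return segments


def phone_durations_ms(segments, skip=("h#", "pau", "epi")):
    """(label, duration in ms) per phone, skipping silence-type labels (assumed choice)."""
    return [
        (label, 1000.0 * (end - start) / SAMPLE_RATE)
        for start, end, label in segments
        if label not in skip
    ]


# Made-up excerpt in .PHN format for the start of "She ..." (values not from TIMIT)
example = """0 2400 h#
2400 4000 sh
4000 5600 iy"""
print(phone_durations_ms(read_phn(example)))
```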

Useful info: durational variability
  • overall average = 95 ms
  • normal rate = 95 ms
  • primary stress = 104 ms
  • word final = 136 ms
  • utterance final = 186 ms
  • adapted from Wang (1998)

(Figure: normalized phone duration vs. speaking rate, over all 3,696 training sentences (SX + SI) of the TIMIT training set)
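The normalization behind such a plot is not spelled out here. One common choice, assumed in this sketch, divides each token's duration by the mean duration of its phone label across the corpus. Averaging those normalized durations over an utterance then gives a simple relative-tempo measure: values above 1 indicate slower than average, values below 1 faster. The function names and durations are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def phone_means(utterances):
    """Mean duration per phone label over all utterances (each a list of (phone, ms))."""
    by_phone = defaultdict(list)
    for utt in utterances:
        for phone, dur in utt:
            by_phone[phone].append(dur)
    return {phone: mean(durs) for phone, durs in by_phone.items()}


def normalized_durations(utt, means):
    """Each token's duration divided by the corpus mean for its phone label."""
    return [dur / means[phone] for phone, dur in utt]


def relative_tempo(utt, means):
    """Mean normalized duration: > 1 slower than average, < 1 faster (assumed rate proxy)."""
    return mean(normalized_durations(utt, means))


# Hypothetical durations in ms for two short utterances
utts = [[("sh", 100.0), ("iy", 80.0)], [("sh", 60.0), ("iy", 120.0)]]
means = phone_means(utts)
print([relative_tempo(u, means) for u in utts])
```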

‘found’ speech
  • DARPA-LVSR community is rather ambitious
  • Broadcast News (BN), Speech Communication 37 (2002)

For Proc. DARPA Workshops, see http://www.nist.gov/speech/proc/darpa99/index.htm

Articulatory-acoustic features in ASR
  • “A Dutch treatment of an elitist approach to articulatory-acoustic feature classification”, Proc. Eurospeech-2001, 1729-1732 (M. Wester et al.)
  • “Integrating articulatory features into acoustic models for speech recognition”, Phonus 5, 73-86 (K. Kirchhoff, 2000)
  • “An overlapping-feature-based phonological model incorporating linguistic constraints: Applications to speech recognition”, JASA 111 (2), 1086-1101 (J. Sun & L. Deng, 2002)

Conclusions
  • examples of dynamics in speech acoustics
  • going from formal to informal speech:
    • less dynamics, more reduction (articulatorily guided)
    • undershoot vs. speaking style
    • sloppiness or articulatory limits?
  • functionality of dynamics? —> other paper
  • systematicity of dynamics?
    • easing ASR, rules for TTS, acquiring knowledge?
