SPEECH VARIATION AND THE USE OF DISTANCE METRICS ON THE ARTICULATORY FEATURE SPACE
Louis ten Bosch
Contents
  • Introduction
  • Objectives
  • Articulatory Features
  • Speech Material
  • Experimental details
    • set-up
    • Results
  • Questions, future plans
Introduction
  • Speech is usually represented in terms of sequences from a limited set of phone-like symbols (ASR, synthesis, annotation)
  • ‘Beads-on-a-string’ paradigm (Ostendorf, 1999; etc.)
    • Powerful as a meta-description
    • Weak at describing articulatory variation and pronunciation variation
  • Research on new descriptions & models of speech
    • Many proposals for new signal representations (continuity preserving, auditorily inspired) and new models (neural models, long-span models, parallel models)
    • Here: articulatory features (AF)
Objectives
  • To obtain alternative representations that intrinsically better model variation in speech
  • Focus on articulatory/pronunciation variation
  • To investigate the relation between better representations and decoding
Articulatory Features (AFs)
  • AFs offer several advantages:
    • Allow feature asynchrony
    • Deal with ‘incompleteness’: incomplete nasalization, voicing
    • Intrinsically better modelling of continuous processes
    • Assumed to better model fine phonetic details (FPD)
      • FPD mediate human speech processing (lexical access)
      • [together with indexical information]
Distance Metric in AF Space
  • Each utterance is a path in AF space
  • Distance metric in AF space defines ‘speed’ along path
    • Compare with delta-features in ASR
  • Detecting speed peaks imposes an intrinsic temporal structure
  • Which distances to use?
    • Three types (L1, L2, cosine)
  • How does this ‘intrinsic’ temporal structure relate to external temporal structure, e.g. phone boundaries?
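The idea of ‘speed’ along a path in AF space can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: the function names, the toy two-dimensional AF vectors, the convention of returning 0 for the cosine distance when a vector is all zeros, and the simple local-maximum peak picker are all assumptions made here.

```python
import math

def l1(a, b):
    # City-block distance between two AF vectors.
    return sum(abs(x - y) for x, y in zip(a, b))

def l2(a, b):
    # Euclidean distance between two AF vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    # Cosine distance: 1 - cos(angle between a and b).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # assumption: treat an all-zero vector as "no movement"
    # Clamp to [-1, 1] to guard against rounding error.
    return 1.0 - max(-1.0, min(1.0, dot / (na * nb)))

def speed(path, dist):
    # Distance between consecutive frames, i.e. 'speed' along the path.
    return [dist(path[t], path[t + 1]) for t in range(len(path) - 1)]

def peaks(values, threshold=0.0):
    # Indices of local maxima above a threshold (hypothetical peak picker).
    return [t for t in range(1, len(values) - 1)
            if values[t] > threshold
            and values[t] > values[t - 1]
            and values[t] >= values[t + 1]]

# Toy path: two AF dimensions, one abrupt transition in the middle.
path = [[1.0, 0.0], [1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0], [0.0, 1.0]]
for dist in (l1, l2, cosine):
    print(dist.__name__, peaks(speed(path, dist)))
```

On this toy path all three distances place the single speed peak at the same frame transition; on real AF trajectories they need not agree.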
Speech Material
  • IFA corpus (Dutch, read + prepared speech, 8 speakers: 6 used for training and development, 2 for test)
  • Many different rich annotation levels
Alignment Results
  • Number of hits (detected -> observed) versus time window size:

[Figure not reproduced in this transcript. Reference shown on the slide: Wesenick & Kipp ’96]
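A hit count of this kind can be sketched as follows. The matching rule is an assumption: a detected peak counts as a hit when at least one manual boundary lies within the window, and one-to-one matching is not enforced. The frame indices are invented for illustration and are not data from the study.

```python
def count_hits(detected, manual, window):
    """Count detected peaks that lie within `window` frames of a manual boundary."""
    return sum(1 for d in detected
               if any(abs(d - m) <= window for m in manual))

# Invented frame indices, for illustration only.
detected = [10, 25, 41, 60]
manual = [11, 24, 45]

for window in (0, 1, 2, 4):
    print(window, count_hits(detected, manual, window))
```

Sweeping the window size in this way yields the kind of hits-versus-window curve the slide refers to.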

Asynchrony and Phonetic Classes

Average and standard deviation (in number of frames) of the difference between the cosine-peak location and the manual boundary. Only the transitions with the most extreme negative and positive average differences are shown.

Manner transition      Avg. (st. dev.)
Fricative-fricative    -0.57 (1.6)
Vowel-vowel            -0.31 (1.8)
…
Silence-approximant     0.49 (1.8)
Approx.-stop            0.63 (1.6)
Vowel-silence           0.64 (2.1)
Nasal-approx.           0.66 (1.0)
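Per-class statistics like these could be computed along the following lines. This is a sketch under stated assumptions: the sign convention (peak frame minus boundary frame) and the use of the population standard deviation are choices made here, since the slides do not specify either, and the events are invented.

```python
import statistics
from collections import defaultdict

def diff_stats(events):
    """events: iterable of (manner_transition, peak_frame, boundary_frame).

    Returns {manner_transition: (mean difference, standard deviation)}.
    """
    groups = defaultdict(list)
    for label, peak, boundary in events:
        # Assumed sign convention: positive means the peak follows the boundary.
        groups[label].append(peak - boundary)
    return {label: (statistics.mean(d), statistics.pstdev(d))
            for label, d in groups.items()}

# Invented events, for illustration only.
events = [
    ("vowel-silence", 12, 11),
    ("vowel-silence", 30, 30),
    ("nasal-approx", 50, 49),
    ("nasal-approx", 71, 70),
]
stats = diff_stats(events)
for label, (avg, sd) in stats.items():
    print(label, avg, sd)
```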

Open questions 1
  • To what extent does the type of distance (L1, L2, cosine) affect the fine detail of the alignment with the manual segmentation?
    • For distances close to 0, all metrics give about the same result
    • The metrics deviate for larger distances, thereby putting more weight on different types of distinctions
  • This means that event parsing along the AF trajectory may result in essentially different segmentations for different metrics.
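A small constructed example shows how two metrics can weight distinctions differently. The AF vectors are invented: in one step a single feature changes completely, in the other four features each change partially. L1 ranks the second step as the larger movement, L2 the first, so a peak picker could place its event at a different step depending on the metric.

```python
import math

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

start = [0.0, 0.0, 0.0, 0.0]
one_full_change = [1.0, 0.0, 0.0, 0.0]        # one feature flips completely
four_partial_changes = [0.3, 0.3, 0.3, 0.3]   # four features shift partially

for name, target in (("one full change", one_full_change),
                     ("four partial changes", four_partial_changes)):
    print(name, "L1 =", l1(start, target), "L2 =", l2(start, target))
```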
Open questions 2
  • What about cue trading (by using weights)?
    • Difficult; depends on the phone
  • What about the precise quantification of asynchrony?
    • The variation of observed AF vectors around a canonical AF vector = feature asynchrony + the variation in the classifier output
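One way to picture cue trading by weights is a distance in which each AF dimension has its own weight. The weighted Euclidean form below is an assumption chosen for illustration; the slides do not say which weighting scheme, if any, was considered, and the weight values are hypothetical.

```python
import math

def weighted_l2(a, b, weights):
    """Euclidean distance with a non-negative weight per AF dimension."""
    return math.sqrt(sum(w * (x - y) ** 2
                         for w, x, y in zip(weights, a, b)))

a = [0.0, 0.0]
b = [3.0, 4.0]

print(weighted_l2(a, b, [1.0, 1.0]))  # both features count equally
print(weighted_l2(a, b, [1.0, 0.0]))  # second feature ignored
```

Because the appropriate weights would depend on the phone, a fixed weight vector like this is at best a starting point.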
Near-future plans
  • Exploit phenomena described here in terms of design principles for alternative procedures for data-driven annotation and unit selection
  • Design a word recognition framework based on the AF representation of speech
  • Study usability for memory-prediction models