This research explores new speech representations based on articulatory features in order to better model pronunciation variation, and investigates the use of distance metrics in the articulatory feature space to improve speech decoding.
SPEECH VARIATION AND THE USE OF DISTANCE METRICS ON THE ARTICULATORY FEATURE SPACE
Louis ten Bosch
Contents
• Introduction
• Objectives
• Articulatory Features
• Speech Material
• Experimental details
  • Set-up
• Results
• Questions, future plans
Introduction
• Speech is usually represented as sequences of symbols from a limited set of phone-like units (ASR, synthesis, annotation)
• The 'beads-on-a-string' paradigm (Ostendorf, 1999; etc.)
  • Powerful as a meta-description
  • Weak at describing articulatory variation and pronunciation variation
• Research on new descriptions and models of speech
  • Many proposals for new signal representations (continuity-preserving, auditorily inspired) and new models (neural models, long-span models, parallel models)
• Here: articulatory features (AFs)
Objectives
• To obtain alternative representations that intrinsically better model variation in speech
  • Focus on articulatory/pronunciation variation
• To investigate the relation between better representations and decoding
Articulatory Features (AFs)
• The advantages of AFs are twofold:
  • They allow feature asynchrony
  • They deal with 'incompleteness': incomplete nasalization or voicing (illustrated in the sketch below)
• Intrinsically better modelling of continuous processes
• Assumed to better model fine phonetic detail (FPD)
  • FPD mediates human speech processing (lexical access)
  • [together with indexical information]
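A minimal sketch of the representational point, with hypothetical AF dimensions (the slide does not specify the actual feature set): each frame gets a continuous value per articulatory dimension, so partial phenomena such as incomplete nasalization are representable, unlike a hard phone label.

```python
import numpy as np

# Hypothetical AF dimensions; real systems may use more or different ones.
AF_DIMS = ["voicing", "nasality", "frication", "tongue_height", "lip_rounding"]

# One frame of a partially nasalized, fully voiced vowel: the nasality
# dimension sits between 0 (oral) and 1 (nasal) instead of being forced
# into either phone category.
frame = np.array([1.0, 0.4, 0.0, 0.8, 0.2])

for name, value in zip(AF_DIMS, frame):
    print(f"{name:14s} {value:.2f}")
```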
Distance Metric in AF Space
• Each utterance is a path in AF space
• A distance metric in AF space defines a 'speed' along the path
  • Compare with delta-features in ASR
• Detection of speed peaks imposes an intrinsic temporal structure (see the sketch below)
• Which distances to use?
  • Three types: L1, L2, cosine
• How does this 'intrinsic' temporal structure relate to external temporal structure, e.g. phone boundaries?
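A minimal sketch of this idea under assumed conventions: an utterance is a (T, D) sequence of AF vectors, frame-to-frame distance defines a 'speed' along the path, and local speed maxima yield candidate boundaries. The AF matrix here is random stand-in data, not the paper's material.

```python
import numpy as np

def frame_speed(af, metric="cosine"):
    """Distance between consecutive AF frames; af has shape (T, D)."""
    a, b = af[:-1], af[1:]
    if metric == "L1":
        return np.abs(a - b).sum(axis=1)
    if metric == "L2":
        return np.sqrt(((a - b) ** 2).sum(axis=1))
    if metric == "cosine":
        num = (a * b).sum(axis=1)
        den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
        return 1.0 - num / den
    raise ValueError(metric)

def speed_peaks(speed):
    """Indices where the speed curve has a local maximum."""
    return np.where((speed[1:-1] > speed[:-2]) & (speed[1:-1] > speed[2:]))[0] + 1

rng = np.random.default_rng(0)
af = rng.random((200, 5))          # 200 frames, 5 AF dimensions (toy data)
for metric in ("L1", "L2", "cosine"):
    s = frame_speed(af, metric)
    print(metric, "peaks at frames:", speed_peaks(s)[:10])
```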
Speech Material
• IFAcorpus (Dutch, read + prepared speech; 8 speakers: 6 used for training and development, 2 for testing)
• Many different, rich annotation levels
Alignment Results
• Number of hits (detected → observed) versus time window size (cf. Wesenick & Kipp, 1996)
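A sketch of the evaluation presumably behind this result, under assumptions: a detected event counts as a 'hit' if it falls within a given time window of some observed (manual) boundary, and hits are reported as a function of window size. The boundary values below are toy data.

```python
import numpy as np

def count_hits(detected, observed, window):
    """Number of detected events within +/- window of some observed boundary."""
    detected = np.asarray(detected)[:, None]
    observed = np.asarray(observed)[None, :]
    return int((np.abs(detected - observed) <= window).any(axis=1).sum())

detected = [0.11, 0.30, 0.52, 0.78]      # peak locations (s), toy values
observed = [0.10, 0.33, 0.50, 0.80]      # manual boundaries (s), toy values
for window in (0.01, 0.02, 0.05):
    print(f"window {window:.2f}s: {count_hits(detected, observed, window)} hits")
```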
Asynchrony and Phonetic Classes
Average (in number of frames) and standard deviation of the difference between cosine-peak location and manual boundary. Only the transitions with the most extreme negative and positive differences are shown.

Manner transition        avg. (st.dev.)
Fricative-fricative      -0.57 (1.6)
Vowel-vowel              -0.31 (1.8)
…
Silence-approximant       0.49 (1.8)
Approximant-stop          0.63 (1.6)
Vowel-silence             0.64 (2.1)
Nasal-approximant         0.66 (1.0)
Open questions 1
• To what extent does the type of distance (L1, L2, cosine) distinguish fine detail in the alignment with manual segmentation?
  • For distances close to 0, all metrics provide roughly the same result
  • The metrics deviate for larger distances, thereby putting more weight on different types of distinctions (see the toy example below)
  • This means that event parsing along the AF trajectory may result in essentially different segmentations for different metrics
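A toy illustration of the point above (an assumed example, not from the paper): the same two frame-to-frame changes can be ranked differently by L1 and L2, so boundary detection along an AF trajectory can depend on the metric.

```python
import numpy as np

concentrated = np.array([1.0, 0.0, 0.0, 0.0])   # change in one AF dimension
spread       = np.array([0.3, 0.3, 0.3, 0.3])   # small change in all dimensions

for name, d in [("concentrated", concentrated), ("spread", spread)]:
    print(f"{name:12s} L1 = {np.abs(d).sum():.2f}  L2 = {np.linalg.norm(d):.2f}")
# L1 ranks the spread change as larger (1.20 > 1.00),
# while L2 ranks the concentrated change as larger (1.00 > 0.60).
```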
Open questions 2
• What about cue trading (by using weights)?
  • Difficult; depends on the phone
• What about the precise quantification of asynchrony?
  • The variation of observed AF vectors around a canonical AF vector = feature asynchrony + variation in the classifier output (see the sketch below)
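A minimal sketch of why this quantification is hard, with entirely synthetic numbers: the spread of observed AF vectors around a phone's canonical AF vector mixes genuine feature asynchrony with classifier output noise, so the raw variance alone cannot separate the two.

```python
import numpy as np

rng = np.random.default_rng(1)
canonical = np.array([1.0, 0.0, 0.0, 0.8, 0.2])          # canonical AF target (toy)
asynchrony = rng.normal(0.0, 0.15, size=(100, 5))        # true articulatory spread (toy)
classifier_noise = rng.normal(0.0, 0.10, size=(100, 5))  # estimation error (toy)

observed = canonical + asynchrony + classifier_noise
total_var = observed.var(axis=0)
print("total variance per AF dimension:", np.round(total_var, 3))
# Observed variance ~ asynchrony variance + noise variance (if independent);
# disentangling them requires an estimate of the classifier's own variance.
```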
Near-future plans
• Exploit the phenomena described here as design principles for alternative procedures for data-driven annotation and unit selection
• Design a word recognition framework based on an AF representation of speech
• Study usability for memory-prediction models