Analysis, Modelling and Synthesis of British, Australian and American Accents


Analysis, Modelling and Synthesis of British, Australian and American Accents. Supported by EPSRC. Qin Yan Saeed Vaseghi Multimedia Communication Signal processing Lab Department of Electronic and Computer Engineering Brunel University.


Analysis, Modelling and Synthesis of

British, Australian and American Accents

Supported by EPSRC

Qin Yan Saeed Vaseghi

Multimedia Communication Signal processing Lab

Department of Electronic and Computer Engineering

Brunel University

Content

1- Introduction to Phonetics and Acoustics of Accents

2- Research Issues in Modelling Acoustics of Accents of English

3- Current Research Problems

4- Accent Analysis and Models

5- Accent Morphing

6- Audio Demo
slide3
1. Introduction to Phonetics and Acoustics of Accents

1.1 Background

Accents are acoustic manifestations of differences in the pronunciation and intonation of a community of people from a national, regional or socio-economic grouping.

Accents are dynamic: they evolve over time under the influence of large-scale immigration, socio-economic changes and cultural trends.

Applications of accent models include:

- speech recognition,

- text-to-speech synthesis,

- voice editing,

- accent morphing in broadcasting and films,

- toys and computer games,

- accent coaching and education.

slide4
The importance of an accent feature depends on its distance from that of the ‘standard’ or ‘received’ pronunciation and the frequency with which that feature occurs in the acoustics of speech.
  • 1.2 Basic Structure of Accents
  • Generally the structural differences between accents can be divided into two broad parts:
  • (a) Differences in phonetic transcriptions.
  • (b) Differences in acoustic correlates and intonations of accents.
slide5

1.3 Phonetics of Accents

  • A dominant aspect of accents is in the differences in pronunciation as transcribed by a phonetic dictionary.
  • The differences in phonetic transcription can be categorized into two classes:
  • a) Differences in the number and identity of the phonemes.
  • For example, British English as transcribed by Cambridge University’s BEEP dictionary has five extra vowels, /ax (ə), ea (ɛə), ia (iə), ua (uə), ah (ɒ)/, compared to American English as transcribed by the Carnegie Mellon University (CMU) dictionary. /iə ɛə uə/ are allophones of /i ɛ u/. In American English /ɒ/ is merged with /a/, unlike in the British accent.
  • American transcription has three different levels of stress for vowels and diphthongs. Also, Australian English has distinctive vowels such as /æi/ instead of /ei/ and /æɔ/ for /au/.
  • b) Differences in phonetic realizations: phoneme substitution, deletion, insertion.
  • For example, ‘JOHN’ is pronounced as /ʤʌn/ in American but as /ʤɔn/ in British and Australian English. The word ‘SAY’ is pronounced as /sei/ in British and American but as /sæi/ in Australian.
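The substitution, deletion and insertion classes above can be found automatically by aligning two transcriptions of the same word. The following is a minimal sketch, not the authors' method: it uses Python's standard `difflib` alignment, and the function name `phonetic_differences` and the phoneme lists (taken from the ‘JOHN’ and ‘SAY’ examples) are illustrative assumptions.

```python
from difflib import SequenceMatcher

def phonetic_differences(reference, variant):
    """List the operations that turn the `reference` phoneme sequence into `variant`."""
    # Map difflib's opcode tags onto the terminology used in the slides.
    names = {"replace": "substitution", "delete": "deletion", "insert": "insertion"}
    matcher = SequenceMatcher(a=reference, b=variant, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            ops.append((names[tag], reference[i1:i2], variant[j1:j2]))
    return ops

# Transcriptions as quoted in the examples (illustrative segmentation into phonemes).
john_american = ["ʤ", "ʌ", "n"]
john_british = ["ʤ", "ɔ", "n"]
say_british = ["s", "ei"]
say_australian = ["s", "æi"]

john_ops = phonetic_differences(john_american, john_british)
say_ops = phonetic_differences(say_british, say_australian)
print(john_ops)
print(say_ops)
```

Both examples reduce to a single vowel substitution. Counting such operations over a whole pronunciation dictionary would give a rough measure of how often each difference occurs.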
slide6

1.4 Acoustics of Accents

  • Perceived acoustic differences between accents are due to differences, during the production of sound, in the configuration, positioning, tension and movement of the laryngeal and supra-laryngeal articulatory parameters, namely the vocal folds, vocal tract, tongue and lips.
  • Four aspects of the acoustic correlates of accents are considered essential for accent models and accent synthesis. These are:
  • (a) Formant (i.e. frequency of vocal tract resonance) correlates of accents, including:
  • (i) Formant trajectories F_kj(t), where k is the formant index and j is the phoneme index.
  • (ii) Timing and magnitude of the formant target point(s) in formant space for each phonetic unit.
slide7

(b) Pitch prosody correlates of accents, including:

(i) Pitch trajectory at various linguistic contexts and positions, e.g. pitch rise at the beginning of a voiced group or phrase, and pitch fall at the end of a phrase.

(ii) Pitch nucleus, i.e. the timing and magnitude of the prominent pitch event in a voiced group.

(c) Duration and timing correlates of accents, including:

(i) Duration of vowels and diphthongs.

(ii) Relative duration and timings of the two constituent vowels of diphthongs.

(d) Laryngeal (glottal) correlates of accents, i.e. the voice quality of speech segments in certain contexts as a function of accent.

slide8

2. Research Issues in Modelling Acoustics of Accents of English

  • Definition of an accent ‘feature set’ composed of formant trajectories, formant target points, pitch trajectory, power trajectory and duration.
  • Separation, normalisation, or averaging out of speakers’ characteristics from accent characteristics. This is required for modelling the parameters of an accent.
  • Modelling formants of vowels and diphthongs. A diphthong is composed of two connected elementary sounds.
  • Modelling the duration of vowels and diphthongs and the relative duration of the two halves of diphthongs.
  • Modelling pitch trajectory in different phonetic/linguistic positions and contexts.
  • Modelling voice quality correlates of an accent in different phonetic/linguistic positions and contexts.
  • Integration of all accent features within a coherent generative model.
slide9

Accent Profile (AP)

Category | Parameter | Comments | Rank
--- | --- | --- | ---
Phonetic Parameters | Substitution, insertion, deletion | Pronunciation differences obtained from phonetic transcription dictionaries | *****
Supra-laryngeal and Laryngeal Correlates | Formants & their trajectories | 2nd formant with largest variance is most sensitive to accent | ****
Supra-laryngeal and Laryngeal Correlates | Glottal pulse (Voice Quality) | Durations and shapes of opening and closing of glottal folds | **
Prosody Correlates | F0 mean | Average of pitch | *
Prosody Correlates | F0 range | Range of pitch | *
Prosody Correlates | Pitch Nucleus | Prominent point (stressed) within an intonation group (Tone Unit) | ***
Prosody Correlates | Initial Pitch Rise | First pitch slope of a narrative utterance | ***
Prosody Correlates | Final Pitch Lowering | Final fall pitch slope of a narrative utterance | ***
Prosody Correlates | Final Pitch Rise | Final rise pitch slope of a narrative utterance | ***
Timing and Delivery Correlates | Speaking Rate | Phonemes or words per second | *
Timing and Delivery Correlates | Phoneme Duration | Vowel duration elongation and complete pronunciation all affect | ***
Timing and Delivery Correlates | Excessive Co-articulation | Clipped or short duration sounds | ****

Speech Accent Feature Analysis Method

Figure: Block diagram illustration of the processes involved in accent analysis. Blocks: Input Speech; HMM Training; Labeling & Segmentation; Speaking Rate & Durations; Formants & Trajectories; Pitch Contour Tracker; Pitch Marker; F0 Range/Mean; Pitch Accents; Tone Nucleus Features; Accent Profile.

  • The basic processes involved in accent analysis include:
  • Speech phonetic labelling and boundary segmentation using HMMs
  • Pitch trajectory and pitch nucleus estimation
  • Formant models and formant track estimation
  • Duration and power trajectory analysis
Analysis of Duration Correlate of AU, US and UK Accent Speech

Figure: Comparison of speaking rates of British, Australian and American.

Figure: Comparison of phoneme durations of British, Australian and American. Y axis: duration (sec), 0.02 to 0.2; X axis: phonemes aa, ae, ah, ao, aw, ay, eh, er, ey, ih, iy, ow, oy, uh, uw; series: Australian, British, American.

slide12

Speaking Rate (number/sec)

Accent | Phone | Word
--- | --- | ---
British | 12.1 | 3.64
American | 11.6 | 3.1
Australian | 10.8 | 2.8

Table: Comparison of speaking rates of British, American and Australian accents.

  • Australian speaking (word) rate is 23% slower than British
  • American speaking (word) rate is 15% slower than British
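The two percentages follow directly from the word rates in the table. A short check, using only the tabulated values (the helper name `percent_slower` is an illustrative choice):

```python
# Word rates (words per second) taken from the speaking-rate table.
word_rate = {"British": 3.64, "American": 3.1, "Australian": 2.8}

def percent_slower(rate, reference):
    """Relative reduction of `rate` with respect to `reference`, in percent."""
    return 100.0 * (reference - rate) / reference

slowdown = {
    accent: percent_slower(rate, word_rate["British"])
    for accent, rate in word_rate.items()
    if accent != "British"
}
for accent, value in slowdown.items():
    print(f"{accent}: {value:.1f}% slower than British")
```

The results are about 14.8% for American and 23.1% for Australian, which round to the 15% and 23% quoted.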

Table: Word error (%) of speech recognition across British, American and Australian accents.

  • There is an apparent correlation between automatic speech recognition accuracy and speaking rate.
  • Australian, with the slowest speaking rate, obtains the best recognition results, followed by American and then British.
slide13

Formant Estimation with 2D-HMM

  • Formant feature extraction consists of three main functions:
  • (1) an LP model,
  • (2) a polynomial root finder, and
  • (3) a contour trend estimator.
  • Consider the z-transfer function of an LP model with K real poles, I complex pole pairs and a gain factor G,
  • where Ak is the pole radius, Fi the pole frequency and Fs the sampling frequency.
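The first two functions (LP model and polynomial root finder) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the autocorrelation method for the LP coefficients, omits windowing and the contour trend estimator, and relies on the standard relations that a complex pole of radius A and angle θ gives a candidate frequency θ·Fs/(2π) and a bandwidth of −ln(A)·Fs/π. The function names and the synthetic test signal are hypothetical.

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """Return LP coefficients [1, a1, ..., ap] via the autocorrelation method."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Normal (Yule-Walker) equations: sum_k a_k r(|i - k|) = -r(i), i = 1..p
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

def formant_candidates(frame, fs, order):
    """Formant candidates (frequency, bandwidth in Hz) from the complex LP poles."""
    a = lpc_autocorrelation(frame, order)
    poles = np.roots(a)                 # roots of A(z), i.e. poles of 1/A(z)
    poles = poles[np.imag(poles) > 0]   # keep one pole of each conjugate pair
    freqs = np.angle(poles) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(poles)) * fs / np.pi
    idx = np.argsort(freqs)
    return freqs[idx], bandwidths[idx]

# Synthetic check: impulse response of one resonance at 500 Hz, pole radius 0.95.
fs = 8000.0
radius, resonance_hz = 0.95, 500.0
theta = 2.0 * np.pi * resonance_hz / fs
h = np.zeros(512)
for n in range(len(h)):
    h[n] = 1.0 if n == 0 else 0.0
    if n >= 1:
        h[n] += 2.0 * radius * np.cos(theta) * h[n - 1]
    if n >= 2:
        h[n] -= radius ** 2 * h[n - 2]

freqs, bandwidths = formant_candidates(h, fs, order=2)
print(freqs, bandwidths)
```

On real speech a higher model order is needed and the candidates must still be assigned to formants, which is the role the slides give to the trend estimator and the formant HMMs.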

Figure: LP-based formant-candidate feature extraction method. Blocks: Speech; Segmentation & window; LPC Model; Polynomial roots; Formant candidate feature vector (Frequency, Bandwidth, Intensity Calculation); D-estimator.

slide14

Figure: Illustration of the LP spectrum and the modelling of 6 complex pole pairs of a speech segment with an HMM composed of 4 formant states. X axis: time (s); Y axis: frequency (Hz).

  • 2D HMMs span the time and frequency dimensions.
  • Left-right HMM states across frequency model the formants, such that the first state models the first formant, the second state the second formant, and so on.
  • The distribution of formants in each state is modelled by a Gaussian mixture density.
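The per-state Gaussian mixture density can be illustrated with a few lines of NumPy. This is only a sketch of how such a density is evaluated; the weights, means and standard deviations below are hypothetical values chosen for illustration, not parameters from the presentation.

```python
import numpy as np

def gaussian_mixture_pdf(freq_hz, weights, means_hz, stds_hz):
    """Evaluate a 1-D Gaussian mixture density at the given frequencies (Hz)."""
    freq = np.atleast_1d(np.asarray(freq_hz, dtype=float))[:, None]
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means_hz, dtype=float)
    stds = np.asarray(stds_hz, dtype=float)
    # One column per mixture component, one row per frequency.
    components = np.exp(-0.5 * ((freq - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))
    return components @ weights

# Hypothetical mixture for the state that models the second formant.
state_f2 = dict(weights=[0.6, 0.4], means_hz=[1500.0, 1900.0], stds_hz=[120.0, 200.0])

grid = np.arange(0.0, 4000.0, 1.0)      # 1 Hz steps
pdf = gaussian_mixture_pdf(grid, **state_f2)
area = pdf.sum() * 1.0                  # Riemann sum with a 1 Hz step
print(grid[pdf.argmax()], area)
```

A formant candidate can then be scored against each state's density, with the left-right ordering of states constraining which candidate is assigned to which formant.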
slide16

Comparison of histograms (thin solid line) and Gaussian HMMs of formants of Australian English (bold dashed line). X axis: frequency (Hz); Y axis: probability.

The figures show that HMMs are excellent models of the distribution of the formants.

Comparison of Formant Spaces of American, Australian and British Accents

F1 vs F2 space of British, Australian and American English.

  • Note the following features:
  • Raising of the vowels /ae/ and /eh/ in Australian.
  • Fronting of the open vowel /aa/ and the high vowel /uw/ in Australian.
  • Fronting and raising of the vowel /er/ in Australian.
  • The vowels /iy/, /eh/ and /ae/ in Australian are closer together.
slide18

Figure: Comparison of trajectories and target times of formants of British, Australian and American accents

slide19

Figure: Comparison of formants of Australian, British and American (female)

Formant Ranking using a normalised distance

  • The 2nd formant has the widest frequency range and is the most sensitive to accent.
Accent Morphing Method

Figure: Diagram of a voice morphing system used for accent conversion. Blocks: Source Speech; Speech Labeling & Segmentation; HMM Training/Adaptation; Accent Model; Pitch Tracker; Formant Estimation; Formant Mapping; Prosody Modification; Accent Synthesised Speech.

  • Formant Mapping: Transformation of the formants of the source towards those of the target accent is based on non-uniform frequency warping of the linear prediction model.
  • Prosody Modification: Based on the time-domain pitch synchronous overlap and add (TD-PSOLA) method.
  • Prosody modification covers pitch slope, duration and power trajectory.
  • Applications: Text-to-speech synthesis; broadcasting systems, e.g. accent modification in films; education software such as language teaching; speech interfaces in mobile devices, call centres and other electronic products.
Formant Transformation via Non-Uniform LP Frequency Warping

Figure: Illustration of non-uniform frequency warping using the LP model frequency response. The spectrum is divided into a number of bands centered on the formants, and a different set of warping parameters is applied to each band. X axis: frequency (Hz), divided into bands F01, F12, F23, F34 and F45; Y axis: magnitude (dB); labels BW1 to BW4 and I12, I23, I34 are marked on the spectrum.

Figure: Illustration of the modification of the spectrum towards the formants of the target accent. Blocks: Speech; Linear Prediction Model; Polynomial roots; Pole estimation; Formant Estimation; Formant HMMs; Formant Transformation Ratios; LP Spectrum Mapping; Accent modified spectrum.

slide22

The frequency bands of the source speaker, [F01, F12, F23, F34, F45], are mapped to the target accent using a set of warping ratios derived from the differences in the formants of phonetic segments of speech across accents:

$$\alpha_{i(i+1)} = \frac{f^{T}_{i+1} - f^{T}_{i}}{f^{S}_{i+1} - f^{S}_{i}}$$

where $f^{T}_{i}$ and $f^{S}_{i}$ are the ith formants of the target and source accents, respectively.

The frequency mapping can be expressed as

$$f'_{i(i+1)} = \alpha_{i(i+1)} \, f_{i(i+1)}$$

Figure: Illustration of warped (solid line) and original (dash-dot line) formant trajectories of /aa/ in accent conversion from Australian to British.
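The band-wise warping can be sketched numerically. The version below is an assumption-laden illustration rather than the authors' algorithm: it anchors each band at the formants, so that every source formant lands exactly on the corresponding target formant and the slope inside each band equals the ratio of target to source formant spacing. The formant values are made up for the example, and the end points 0 Hz and the Nyquist frequency are kept fixed.

```python
import numpy as np

def warping_ratios(source_formants, target_formants):
    """Ratio of target to source formant spacing for each band between formants."""
    return np.diff(target_formants) / np.diff(source_formants)

def warp_frequency(freq_hz, source_formants, target_formants, nyquist_hz):
    """Piecewise-linear frequency map anchored at 0 Hz, the formants and Nyquist."""
    src = np.concatenate(([0.0], source_formants, [nyquist_hz]))
    tgt = np.concatenate(([0.0], target_formants, [nyquist_hz]))
    return np.interp(freq_hz, src, tgt)

# Hypothetical formant frequencies (Hz) for one phonetic segment.
source = np.array([500.0, 1500.0, 2500.0, 3500.0])
target = np.array([550.0, 1400.0, 2600.0, 3500.0])

ratios = warping_ratios(source, target)
warped = warp_frequency(np.array([500.0, 1000.0, 1500.0]), source, target, 4000.0)
print(ratios, warped)
```

A frequency halfway between the first two source formants is moved halfway between the first two target formants, which is the behaviour the per-band ratios are meant to capture.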

Pitch Modification Using Time Domain PSOLA (TD-PSOLA)

Figure: Illustration of mapping of pitch periods of a source speech to a target. Rows: source speech pitch marks; target speech pitch marks.

  • TD-PSOLA is applied to each corresponding voiced speech segment to modify the pitch slope and duration of the segment.
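The pitch-mark mapping step of TD-PSOLA can be sketched as below. This covers only the placement of target pitch marks and their assignment to source marks; the windowing and overlap-add of the pitch-period segments are omitted. The function name, the constant pitch-scale factor and the example marks are assumptions made for illustration.

```python
import numpy as np

def map_pitch_marks(source_marks, pitch_scale):
    """Place target pitch marks and map each one to its nearest source mark.

    source_marks: increasing sample positions of the source pitch marks.
    pitch_scale:  factor applied to the pitch (2.0 halves each pitch period).
    """
    source_marks = np.asarray(source_marks, dtype=float)
    periods = np.diff(source_marks)
    target = [source_marks[0]]
    while True:
        # Use the local source period at the current target position.
        idx = np.searchsorted(source_marks, target[-1], side="right") - 1
        idx = min(idx, len(periods) - 1)
        nxt = target[-1] + periods[idx] / pitch_scale
        if nxt > source_marks[-1]:
            break
        target.append(nxt)
    target = np.array(target)
    # Index of the closest source mark for every target mark.
    nearest = np.abs(target[:, None] - source_marks[None, :]).argmin(axis=1)
    return target, nearest

# Hypothetical voiced segment: pitch marks every 100 samples.
source_marks = np.arange(0, 1001, 100)
target_marks, nearest_source = map_pitch_marks(source_marks, pitch_scale=2.0)
print(target_marks)
print(nearest_source)
```

Raising the pitch reuses each source period more than once, so the segment duration is preserved; changing duration instead would repeat or skip periods while keeping the period length.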
slide24

Examples of changes in accent/duration modulation of pitch

(a) ‘article’ in Australian, (b) Australian-accent ‘article’ transformed to British accent

(c) ‘asked’ in Australian, (d) Australian-accent ‘asked’ transformed to British accent

slide25

Figure: An outline of Voice-Morph, a system for voice and accent conversion. Blocks: Source Speech; LP Model; Formant Estimation; Formant Tracking; Source Speaker Formant Trajectory HMM Model; Target Speech; LPC Model; Target Speaker Formant Trajectory HMM Model; Formant Mapping; Warping Factors; Spectrum Warping / Pole Rotation; Mapped Speech Reconstruction.

An example of voice conversion: American male; Transformed (AM m->f); American female.

Accent Conversion Demonstration

Source Accent | Target Accent | Spoken words (with transformed versions)
--- | --- | ---
Australian | British | ‘Article’, ‘Claim’, ‘Beige’
British | American | ‘Cooperation’, ‘Boston’, ‘Opposition’, ‘The occupied’
