Multimedia Communications (371) Speech and Image Communications (348)


John Mason

Engineering

Swansea University

EG-348_371_09

Features in speech

[Figure: acquisition followed by feature extraction. The speech signal is divided along the time axis into frames (frame: 20/30 ms, sampling frequency: 8 kHz), and each frame yields a feature vector X1 ... Xi ...]

Speech production

[Figure: air from the lungs -> vocal fold -> vocal tract -> speech]

LPC Short and Long

[Figure: the production chain (air from the lungs -> vocal fold -> vocal tract -> speech) shown against its model: noise -> H1(z) -> H2(z) -> synthesised speech]

The spectral envelope reflects morphological characteristics of the vocal tract.

Features: building of statistical model

[Figure: a set of points labelled T1 and T2]

Speech Processing - Applications
  • Why?
    • Communications
    • Synthesis
    • Recognition
      • Speech & Speaker
  • How?
    • Frame-based
    • Systems approach

Some Books
  • Flanagan, ‘Speech Analysis, Synthesis and Perception’, Springer-Verlag: a classic!
  • Furui: several books on recognition
  • Parsons, ‘Voice and Speech Processing’, McGraw-Hill: one of the first textbooks on computer speech processing
  • O’Shaughnessy, ‘Speech Communication: Human and Machine’, Addison-Wesley
  • Rabiner & Juang, ‘Fundamentals of Speech Recognition’, Prentice Hall, 1993
  • Ramachandran & Mammone (eds), ‘Modern Methods of Speech Processing’, Kluwer Academic, 1995

Speech Communications
  • Person-to-Person
  • Person-to-Machine: speech/speaker recognition
  • Machine-to-Person: speech synthesis

(Electronic) Speech Communications

Perhaps separated by long distance (or in time).

Telephony & Broadcasting

[Figure: acoustic air path -> electronic link (channel transmission path) -> acoustic air path]

Speech Comms: Telephony

[Figure: the electronic link (channel transmission path) expanded into its processing chain]

Microphone -> ADC -> Analysis -> Coding -> Transmitter -> Receiver -> Decoding -> (re-)Synthesis -> DAC -> Loudspeaker

Speech Bit Rates

[Figure: the speech chain. Sending side: message creation -> language coding -> human acoustic generation -> transmission through acoustic space. Receiving side: human hearing extraction -> language decoding -> message realisation. The stages are annotated with approximate bit rates in bps: tens, hundreds, thousands, tens of thousands.]

Criteria in Speech Comms.

Quality versus Bit-rate

[Figure: speech quality (Poor, Fair, Good, Excellent) plotted against bit rate (4, 8, 16, 32, 64 kbps), with CELP, GSM and ADPCM marked]

4 Quality Measures:
  • intelligibility
  • loudness
  • naturalness
  • ease-of-listening

Low Bit Rate Speech Coding

Compandent http://www.compandent.com/

Speech Processing

The three main application areas are:

  • Speech Comms. (the ‘electronic link’)
  • Automatic Speech/Speaker recognition
  • Speech Synthesis

Much of the underlying analysis is common, eg linear predictive coding.

What does speech look like?
  • Dynamic range: for flexibility and robustness
  • Time-varying: to convey information

Frame-based Analysis
  • To capture time variations:
    • 20-30 ms frames - ‘centi-second’ labelling
    • spectral analysis
        • FFT
        • Filter-bank
        • Linear Predictive Coding
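The framing and spectral-analysis steps listed for frame-based analysis can be sketched as follows. This is a minimal illustration in plain Python rather than code from the lecture: the 10 ms hop, the Hamming window, the 1 kHz test tone and all function names are assumptions chosen for the example, and a direct DFT stands in for the FFT for readability.

```python
import cmath
import math

def split_into_frames(signal, fs_hz, frame_ms=20.0, hop_ms=10.0):
    """Cut a signal into frames of frame_ms, advancing hop_ms between frames."""
    frame_len = int(round(fs_hz * frame_ms / 1000.0))
    hop = int(round(fs_hz * hop_ms / 1000.0))
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]

def log_magnitude_spectrum(frame):
    """Windowed DFT magnitude in dB for bins 0..N/2 (direct O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * m / n)
                  for m, x in enumerate(windowed))
        spectrum.append(20.0 * math.log10(abs(acc) + 1e-12))
    return spectrum

fs = 8000                                    # sampling frequency in Hz
tone = [math.sin(2.0 * math.pi * 1000.0 * t / fs)
        for t in range(fs // 10)]            # 100 ms of a 1 kHz test tone
frames = split_into_frames(tone, fs)         # 20 ms frames = 160 samples each
spec = log_magnitude_spectrum(frames[0])
peak_bin = max(range(len(spec)), key=lambda k: spec[k])
peak_hz = peak_bin * fs / len(frames[0])     # bin spacing is fs / frame length
```

A 10 ms hop gives 100 frames per second; a larger hop lowers the frame rate and with it the bandwidth and computation.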

Speech Analysis/Coding
  • Two general cases:
      • Waveform coders
      • Source (voice) coders (vo-coders)
  • Source coders eg linear predictive coding (LPC):
      • Model the source ie the vocal tract (VT)
      • Linear, time varying model of VT, plus excitation

[Figure: excitation en (voiced or unvoiced) -> H(z) -> speech sn]

Systems Approach

[Figure: excitation (voiced, with fundamental frequency f0, or unvoiced) -> vocal tract speech model with time-varying parameters -> speech]

LPC Analysis/Synthesis
  • Synthesis:
      • Input: Excitation
      • Output: Speech
  • Analysis:
      • Input: Speech
      • Output: Excitation

[Figure: synthesis, en -> H(z) -> sn, where hn is the impulse response of H(z) and E(z), S(z) are the transforms of en, sn; analysis, sn -> 1/H(z) -> en]

‘Perfect’ Analysis/Synthesis

[Figure: sn -> 1/H(z) -> en -> H(z) -> sn]

Input sn and output sn are identical (within arithmetic limits).

Practical Analysis/Synthesis

[Figure: sending side, sn -> 1/H(z) -> en; transmission; receiving side, en -> H(z) -> sn]

  • Parameters for transmission:
      • Input / excitation en
      • Source model H(z)
  • Thus Analysis must derive these parameters, and
  • Synthesis must use them to re-generate speech

Linear Predictive Coding - LPC

Principle of linear prediction:

  • The next value (or sample) in a series, ie at time n, is predicted or estimated by a weighted sum of previous values, ie those at time n-1, n-2, ...
  • Thus for a predictor of order p, we have:

$$\hat{s}_n = a_1 s_{n-1} + a_2 s_{n-2} + a_3 s_{n-3} + \dots + a_p s_{n-p} = \sum_{i=1}^{p} a_i s_{n-i}$$

Linear Prediction

Transforming to the z-domain gives:

$$\hat{S}(z) = \sum_{i=1}^{p} a_i z^{-i} S(z) = A'(z)\, S(z), \qquad A'(z) = \sum_{i=1}^{p} a_i z^{-i}$$

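LPC Error Terms

Error is simply the difference between the actual and predicted values:

$$e_n = s_n - \hat{s}_n = s_n - \sum_{i=1}^{p} a_i s_{n-i}$$

[Figure: sn is fed to the predictor A’(z) to give the prediction ŝn, which is subtracted from sn to give en]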

Synthesis

[Figure: en -> H(z) -> sn, drawn as a feedback loop in which the output of A’(z), fed from sn, is added to en]

Parameters updated at frame rate.

NB ‘hat’ of approximation omitted for simplicity.

Analysis for Synthesis

[Figure: synthesis, en -> H(z) -> sn; analysis, sn -> 1/H(z) -> en, drawn as the output of A’(z) subtracted from sn]

  • The Analysis and Synthesis must match
      • What is needed for the Synthesis?
      • Answer: en, the excitation, and H(z), the system
  • Thus the Analysis must derive these terms (from sn):
  • The speech signal sn is analysed to give en and H(z), ie the A’(z) parameters, for transmission.
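The matching analysis and synthesis filters can be sketched as below. This is a minimal illustration, assuming the predictor convention used in these slides (analysis filter 1 – A’(z), synthesis filter 1/(1 – A’(z))), zero initial conditions, and made-up sample and coefficient values; the function names are hypothetical.

```python
def analysis_filter(s, a):
    """Residual e_n = s_n - sum_i a_i * s_{n-i}, ie the filter 1 - A'(z)."""
    p = len(a)
    return [s[n] - sum(a[i - 1] * s[n - i] for i in range(1, min(p, n) + 1))
            for n in range(len(s))]

def synthesis_filter(e, a):
    """Speech s_n = e_n + sum_i a_i * s_{n-i}, ie the filter 1 / (1 - A'(z))."""
    p = len(a)
    s = []
    for n in range(len(e)):
        feedback = sum(a[i - 1] * s[n - i] for i in range(1, min(p, n) + 1))
        s.append(e[n] + feedback)
    return s

# Illustrative values only: a short "speech" segment and a 2nd-order predictor.
speech = [0.0, 0.5, 0.9, 0.7, 0.1, -0.4, -0.8, -0.6]
coeffs = [1.5537, -0.8276]

residual = analysis_filter(speech, coeffs)      # what analysis sends
resynth = synthesis_filter(residual, coeffs)    # what synthesis rebuilds
max_error = max(abs(x - y) for x, y in zip(speech, resynth))
```

With the unquantised residual and the same coefficients on both sides, the round trip reproduces the input to within arithmetic precision; a practical coder departs from this once en and the coefficients are quantised for transmission.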

Derivation of LPC Coefficients - A(z)

Recall:

$$e_n = s_n - \sum_{i=1}^{p} a_i s_{n-i}$$

where the $a_i$ are the p prediction coefficients. The principle behind LPC is to find a set of p coefficients, $a_1, a_2, a_3, \dots, a_p$, which in some sense minimises the error signal $e_n$ over a frame of speech of N samples. This leads to a set of p coefficients for each frame.

Derivation of A(z) – (2)

Minimisation of $E_n$, the sum of the squared errors over the frame, is achieved by setting the p partial derivatives to zero:

$$E_n = \sum_{n} e_n^2, \qquad \frac{\partial E_n}{\partial a_i} = 0 \quad \text{for } i = 1, 2, \dots, p$$

From which:

$$\sum_{k=1}^{p} a_k\, r_{|i-k|} = r_i \quad \text{for } i = 1, 2, \dots, p$$

where:

$$r_i = \sum_{n} s_n s_{n-i}$$

In matrix form:

$$[R]\,\mathbf{a} = \mathbf{r} \quad \text{or} \quad \mathbf{a} = [R]^{-1}\,\mathbf{r}$$

The matrix [R] is symmetric Toeplitz, offering numerically efficient inversion techniques, Durbin’s recursion algorithm being one of the most popular.
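Durbin’s recursion, mentioned above as a popular way of exploiting the Toeplitz structure of [R], can be sketched as follows. This is the textbook Levinson-Durbin form for the autocorrelation method, written for the predictor convention of these slides; the function names and the sample frame are assumptions made for the illustration.

```python
def autocorrelation(frame, max_lag):
    """r_i = sum_n s_n * s_{n-i} over one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[k] * frame[k - lag] for k in range(lag, n))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, p):
    """Solve sum_k a_k r_|i-k| = r_i (i = 1..p); returns (a_1..a_p, error energy)."""
    a = [0.0] * (p + 1)          # a[0] is unused so that a[i] holds a_i
    err = r[0]                   # prediction error energy for order 0
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err            # reflection coefficient for order i
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)     # error energy shrinks as the order grows
    return a[1:], err

frame = [1.0, 2.0, 0.5, -1.0, -0.5, 0.3, 0.8, -0.2]   # illustrative samples
order = 3
r = autocorrelation(frame, order)
a, residual_energy = levinson_durbin(r, order)
```

The recursion needs on the order of p² operations rather than the p³ of a general matrix inversion, which is why it suits per-frame use.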

Derivation of A(z) – (3)
  • When N is very large, r gives the autocorrelation coefficients of s
  • s comes from e convolved with h (excitation & vocal tract)
  • we are interested here in separating e and h
  • the predictor order, p, is small to reflect the short-term periodicities (formants)
  • with higher predictor orders we will get the longer-term periodicities (pitch)
  • 2 practical problems with evaluating a:
    • matrix singularities in R^-1
    • unstable resultant H(z)
  • in practice both are solved by windowing, ie shaping the frame, eg with a Hamming window

Speech Signal Characteristics
  • Duration
  • Dynamic Range
  • Periodicities:
    • vocal tract
    • pitch
  • Frame-based Analysis
      • frame size: short enough to be quasi-stationary yet capture transitions, typically 20 - 30 ms
      • frame rate: task dependent; more frames mean more bandwidth/computation; up to 100 frames/second

Harmonic Structures and Periodicities

  • Harmonic Structures & Periodicities give potential for data reduction
  • LPC is one way of gaining this compression
  • Speech has two obvious separate structures
    • vocal tract resonances
    • pitch

Harmonic Structures and Periodicities

Short term prediction

[Figure: excitation en (voiced or unvoiced) -> H(z) (vocal tract, short term) -> speech sn; the waveform is marked with the pitch period Tp and the predictor span p]

Harmonic Structures and Periodicities

Long term prediction

[Figure: excitation en (voiced or unvoiced) -> Hlt(z) (pitch) -> epn -> Hst(z) (vocal tract) -> speech sn; the waveform is marked with the pitch period Tp and the predictor span P]

Harmonic Structures and Periodicities

Two structures: short-term (formants) & long-term (pitch, ie excitation).

[Figure: en -> gain k -> Hlt(z) -> epn -> Hst(z) -> sn, for eg a 20 ms frame (160 samples at 8 kHz), with coefficients ai for eg p = 3 and ai for eg p = 10]

NB Representations of these parameters are transmitted.

Practical Coding Systems
  • Waveform & Source Coders (Vocoders)
      • 2 periodicities/redundancies in source
        • short-term (formants)
        • long-term - pitch
      • Excitation en

[Figure: en -> Hlt(z) -> epn -> Hst(z) -> sn]

‘Perfect’ Analysis/Synthesis (1)

[Figure: sn -> 1/H(z) -> en -> H(z) -> sn]

Input sn and output sn are identical (within arithmetic limits).

‘Perfect’ Analysis/Synthesis (2)

[Figure: the same chain with the filters written out, 1/H(z) = 1 – A’(z) and H(z) = 1/(1 – A’(z)), so that sn -> 1 – A’(z) -> en -> 1/(1 – A’(z)) -> sn]

‘Perfect’ Analysis/Synthesis (3)

[Figure: the analysis filter 1 – A’(z) drawn as a tapped delay line. The original speech sn passes through a chain of z^-1 delays giving sn-1, ..., sn-i, ..., sn-p, weighted by a1, ..., ai, ..., ap; the weighted sum is subtracted from sn to give the residual en]

Note the minus sign: in Matlab it is combined with ai.

What determines p?

‘Perfect’ Analysis/Synthesis (4)

[Figure: analysis as in (3), original speech sn -> residual en, followed by re-synthesis with 1/(1 – A’(z)): the same tapped delay line of z^-1 delays and weights a1, ..., ai, ..., ap is fed from the output, and its weighted sum is added to en to give the re-synthesised sn]

Note: no minus sign in the re-synthesis loop.

Practical System

[Figure: sn -> 1/H(z) -> en -> transmitted data frame -> en -> H(z) -> sn]

Input sn and output sn are “similar”.

What does the Transmitted Data Frame contain?

Analysis-by-Synthesis: LPAS

Integrated encoder & decoder at the encoder

[Figure: LPAS encoder. An adaptive encoder drives a basic decoder; the decoder output is subtracted from the input sn, and the weighted error is fed back to the adaptive encoder]

Log Spectral Estimates
  • Comparisons between frames are very important in many situations
  • Log spectral estimates are the most common basis for the difference, D, between two frames
  • In Comms., computation is expensive, so parameter-vector approximations to D are used
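The slide’s own definition of D did not survive in this transcript. As a hedged illustration of what a log spectral comparison between two frames can look like, the sketch below uses the RMS difference in dB between two LPC log spectra; this is one common choice and not necessarily the lecture’s. The function names, the 128-point frequency grid and the coefficient values are all assumptions.

```python
import cmath
import math

def lpc_log_spectrum(a, gain=1.0, n_points=128):
    """20*log10 |gain / (1 - A'(e^{jw}))| at n_points frequencies from 0 to pi."""
    spectrum = []
    for k in range(n_points):
        w = math.pi * k / (n_points - 1)
        denom = 1.0 - sum(a_i * cmath.exp(-1j * w * (i + 1))
                          for i, a_i in enumerate(a))
        spectrum.append(20.0 * math.log10(gain / abs(denom)))
    return spectrum

def log_spectral_distance(spec_a, spec_b):
    """RMS difference in dB between two log spectra on the same frequency grid."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(spec_a, spec_b))
                     / len(spec_a))

frame_a = lpc_log_spectrum([1.5537, -0.8276])          # illustrative predictor
frame_b = lpc_log_spectrum([1.2000, -0.6000])          # a different frame
d_ab = log_spectral_distance(frame_a, frame_b)

# Same spectral shape, twice the gain: the distance is a constant 20*log10(2) dB.
louder = lpc_log_spectrum([1.5537, -0.8276], gain=2.0)
d_gain = log_spectral_distance(frame_a, louder)
```

Evaluating full spectra for every comparison is costly, which is the motivation the slide gives for working with approximations on the parameter vectors instead.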

Some Standards

Standard   Application         Coder      Bit rate (kb/s)
GSM        European Cellular   RPE-LTP    13
FS1016     Secure Voice        CELP       4.8
IS54       NA Cellular         VSELP      7.95
IS96       NA Cellular         QCELP      1-8
JDC-FR     Japanese Cellular   VSELP      6.7
JDC-HR     Japanese Cellular   PSI-CELP   3.67
G.728      (terrestrial)       LD-CELP    16


CELP eg

[Figure: a codebook (CB) entry selected by the CB index is scaled by the gain to give the excitation en, which passes through Hlt(z) (long-term coefficients, pitch) and Hst(z) (short-term coefficients, formants) to give sn]

Excitation is represented by address, ie CB index.

CELP – LPAS (Encoder)

[Figure: the CELP chain above (CB index, gain, Hlt(z), Hst(z)) acting as the basic decoder inside the LPAS encoder loop: its output is subtracted from the input sn, and the weighted error drives the adaptive encoder]

Excitation is represented by address, ie CB index.

Conversion of LPC Parameters
  • $A(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + \dots + a_p z^{-p}$, and the $a_i$ are to be transmitted (Tx’d)
  • Line Spectral Frequencies (LSFs) present a clever way of representing the LPC coefficients, the $a_i$ of A(z)
  • The $a_i$ are floating point numbers and their accuracy is important
  • Factorising A(z) tends to give complex roots in the z-domain
  • LSFs map these complex roots on to the unit circle

[Figure: z-plane (real axis x, imaginary axis jy) with the unit circle and the frequency ωs marked; an LSF at angle θ on the circle corresponds to the frequency ωs · θ / 2π]

LSFs
  • Lead to efficient coding
  • Ensure a minimum phase filter
  • Bit errors are localised in the spectrum, minimising loss of speech quality

Line Spectral Frequencies
  • Consider
      • $P(z) = A(z) + z^{-(n+1)} A(z^{-1})$
  • and
      • $Q(z) = A(z) - z^{-(n+1)} A(z^{-1})$
  • then P(z) and Q(z) lead to what are known as LSFs
  • Clearly if P(z) and Q(z) are known then A(z) can be found:
      • $A(z) = \{P(z) + Q(z)\} / 2$
  • Roots of P(z) and Q(z) lie on the unit circle in the z-domain. The locations give:
      • the LSFs
      • P(z) and Q(z), and whence A(z)
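The construction of P(z) and Q(z) from A(z) amounts to adding and subtracting a mirrored copy of the coefficient list. The sketch below illustrates this under the assumption that n is the order of A(z) and that polynomials are stored as coefficient lists in ascending powers of z^-1; the function name and the second-order coefficients are chosen for the example.

```python
def lsp_polynomials(a):
    """a = [1, a_1, ..., a_n], coefficients of A(z) in powers of z^-1.

    Returns the coefficient lists of P(z) and Q(z), each of length n + 2.
    """
    padded = list(a) + [0.0]             # A(z), extended to the power z^-(n+1)
    mirrored = [0.0] + list(a)[::-1]     # z^-(n+1) * A(z^-1)
    p = [x + y for x, y in zip(padded, mirrored)]
    q = [x - y for x, y in zip(padded, mirrored)]
    return p, q

a_coeffs = [1.0, -1.8, 0.9]              # A(z) = 1 - 1.8 z^-1 + 0.9 z^-2
p_coeffs, q_coeffs = lsp_polynomials(a_coeffs)

# A(z) is recovered as {P(z) + Q(z)} / 2 (the final z^-(n+1) term cancels).
recovered = [(x + y) / 2.0 for x, y in zip(p_coeffs, q_coeffs)]
```

P(z) comes out with symmetric coefficients and Q(z) with antisymmetric ones, which is what places their roots on the unit circle when A(z) is minimum phase.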

LSF Evaluation

Consider one pair of complex roots, $A_1(z)$:

$$A_1(z) = 1 + a_1 z^{-1} + a_2 z^{-2}$$

$$P_1(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + z^{-3}(1 + a_1 z + a_2 z^2) = (z^2 + (a_1 + a_2 - 1) z + 1)(z + 1)\, z^{-3}$$

$$Q_1(z) = 1 + a_1 z^{-1} + a_2 z^{-2} - z^{-3}(1 + a_1 z + a_2 z^2) = (z^2 + (a_1 - a_2 + 1) z + 1)(z - 1)\, z^{-3}$$

The roots at $z = -1$ (in $P_1$) and $z = +1$ (in $Q_1$) are discarded.

It follows that the LSFs, $\theta_1$ and $\theta_2$, are given by:

$$\cos(\theta_1) = -(a_1 + a_2 - 1)/2 \quad \text{and} \quad \cos(\theta_2) = -(a_1 - a_2 + 1)/2$$

Show:

$$a_1 = -(\cos(\theta_1) + \cos(\theta_2)) \quad \text{and} \quad a_2 = \cos(\theta_2) - \cos(\theta_1) + 1$$
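The second-order relations above can be checked numerically in both directions. The sketch below is an illustration only: the function names are hypothetical, and the coefficients a1 = -1.8, a2 = 0.9 are sample values.

```python
import math

def lpc2_to_lsf(a1, a2):
    """LSF angles (radians) for A(z) = 1 + a1 z^-1 + a2 z^-2."""
    theta1 = math.acos(-(a1 + a2 - 1.0) / 2.0)   # from the quadratic factor of P(z)
    theta2 = math.acos(-(a1 - a2 + 1.0) / 2.0)   # from the quadratic factor of Q(z)
    return theta1, theta2

def lsf_to_lpc2(theta1, theta2):
    """Inverse mapping: recover a1 and a2 from the two LSF angles."""
    a1 = -(math.cos(theta1) + math.cos(theta2))
    a2 = math.cos(theta2) - math.cos(theta1) + 1.0
    return a1, a2

theta1, theta2 = lpc2_to_lsf(-1.8, 0.9)
a1_back, a2_back = lsf_to_lpc2(theta1, theta2)
```

For these coefficients theta1 is close to 0.318 rad and theta2 is larger, so the two LSFs interleave on the unit circle, and the inverse mapping returns the original coefficients.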

LSF Test Example

$$A_1(z) = 1 + a_1 z^{-1} + a_2 z^{-2} = (z^2 + a_1 z + a_2)\, z^{-2} = (z^2 + 2\cos(\phi)\,\omega_n z + \omega_n^2)\, z^{-2}$$

where $\omega_n$ is the radius and $\phi$ is the angle from $\pi$. So: radius $= \sqrt{a_2}$ and $\theta = \pi - \phi$.

Note: in P & Q all $\omega_n^2$ terms (of the multiple 2nd orders) are unity.

EG 1: $a_2 = 1$, then $\cos(\theta_1) = -(a_1 + a_2 - 1)/2 = -a_1/2$: the roots are already on the circle and do not move (unstable system, not practical).

EG 2: $a_1 = 0$, then $\cos(\theta_1) = -(a_1 + a_2 - 1)/2 = -(a_2 - 1)/2$ and $\cos(\theta_2) = -(a_1 - a_2 + 1)/2 = -(-a_2 + 1)/2$, so the LSFs are symmetric about $\omega_s/4$.

LSF Review & Example (1)

LSFs/LSPs are defined as:

$$P(z) = A(z) + z^{-(n+1)} A(z^{-1})$$

and

$$Q(z) = A(z) - z^{-(n+1)} A(z^{-1})$$

thus

$$A(z) = \{P(z) + Q(z)\} / 2$$

LSF Review & Example (2)

For a second order $A(z) = 1 + a_1 z^{-1} + a_2 z^{-2}$:

$$P(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + (1 + a_1 z + a_2 z^2)\, z^{-3} = (z^2 + (a_1 + a_2 - 1) z + 1)(z + 1)\, z^{-3}$$

$$Q(z) = 1 + a_1 z^{-1} + a_2 z^{-2} - (1 + a_1 z + a_2 z^2)\, z^{-3} = (z^2 + (a_1 - a_2 + 1) z + 1)(z - 1)\, z^{-3}$$

cf: $(s^2 + (2\cos(\phi)\,\omega_n)\, s + \omega_n^2)$

LSF Review & Example (3)

[Figure: z-plane showing the roots of P(z) and Q(z) on the unit circle at angles θ1 and θ2]

For a second order $A(z) = 1 + a_1 z^{-1} + a_2 z^{-2}$:

$$P(z) = (z^2 + (a_1 + a_2 - 1) z + 1)(z + 1)\, z^{-3}$$

$$Q(z) = (z^2 + (a_1 - a_2 + 1) z + 1)(z - 1)\, z^{-3}$$

cf: $(s^2 + (2\cos(\phi)\,\omega_n)\, s + \omega_n^2)$

Thus:

$$(a_1 + a_2 - 1) = 2\cos(\phi_1) = -2\cos(\theta_1)$$

&

$$(a_1 - a_2 + 1) = -2\cos(\theta_2)$$

So, given:

i) LPC coeffs. $a_1$ and $a_2$, then the LSFs $\theta_1$ & $\theta_2$ can be found

ii) LSFs $\theta_1$ & $\theta_2$, then the LPC coeffs. $a_1$ and $a_2$ can be found

LSF Review & Example (4)

For a second order and with P(z) corresponding to the first root, Q(z) to the second root, and so for the second pair of $\theta_i$, 1.37 and 1.77:

$$P(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + (1 + a_1 z + a_2 z^2)\, z^{-3}$$

$$= (z^2 + (a_1 + a_2 - 1) z + 1)(z + 1)\, z^{-3}$$

$$= (z^2 - 2\cos(1.37)\, z + 1)(z + 1)\, z^{-3}$$

$$= (z^3 + (1 - 2\cos(1.37))\, z^2 + (1 - 2\cos(1.37))\, z + 1)\, z^{-3}$$

Likewise

$$Q(z) = 1 + a_1 z^{-1} + a_2 z^{-2} - (1 + a_1 z + a_2 z^2)\, z^{-3}$$

$$= (z^2 + (a_1 - a_2 + 1) z + 1)(z - 1)\, z^{-3}$$

$$= (z^2 - 2\cos(1.77)\, z + 1)(z - 1)\, z^{-3}$$

$$= (z^3 + (-1 - 2\cos(1.77))\, z^2 + (1 + 2\cos(1.77))\, z - 1)\, z^{-3}$$

Then

$$A(z) = \{P(z) + Q(z)\} / 2 = (z^3 - (\cos(1.37) + \cos(1.77))\, z^2 + (1 - \cos(1.37) + \cos(1.77))\, z)\, z^{-3}$$

LSF Examples

$$A(z) = 1 + a_1 z^{-1} + a_2 z^{-2}, \qquad a_1 = -1.8,\ a_2 = 0.9$$

$$P(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + (1 + a_1 z + a_2 z^2)\, z^{-3}$$

$$= (z^2 + (a_1 + a_2 - 1) z + 1)(z + 1)\, z^{-3}$$

$$= (z^2 + (-1.8 + 0.9 - 1) z + 1)(z + 1)\, z^{-3}$$

$$= (z^2 - 1.9 z + 1)(z + 1)\, z^{-3}$$

cf: $(z^2 + (2\cos(\phi)\,\omega_n)\, z + \omega_n^2)$

thus $\cos(\phi) = -1.9/2$, or $\phi = 2.824$, and $\theta_1 = \pi - \phi = 0.318$

Codebooks & VQ

[Figure: a sequence over time of p-element vectors is replaced by a sequence of indices i (0 … N-1) into a codebook of N = 2^L vectors; the receiver holds an identical book]

Data reduction: (p x B) to L

Codebook Compression

[Figure: an index i selects one of the N = 2^k codebook vectors, each of M elements; eg the index selects A(z), which sets H(z) in en -> H(z) -> sn]

  • Principle
    • representative data sets
    • data vector is replaced / represented by “nearest” vector, chosen from a “codebook” - a closed set of vectors
  • Examples
      • LPC parameter sets
      • Excitation as in CELP
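The principle of replacing a data vector by the index of its nearest codebook vector can be sketched as follows. This is a toy illustration: the four-entry codebook, the squared Euclidean distance, the input vectors and the assumed B = 16 bits per element are all hypothetical choices, not values from the lecture.

```python
def nearest_index(vector, codebook):
    """Index of the codebook entry closest to vector (squared Euclidean distance)."""
    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))

# N = 4 = 2^2 vectors, so each index costs L = 2 bits.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frames = [[0.9, 0.1], [0.2, 0.8], [0.1, -0.1]]

indices = [nearest_index(f, codebook) for f in frames]   # what is transmitted
decoded = [codebook[i] for i in indices]                 # receiver, identical book

bits_per_index = (len(codebook) - 1).bit_length()        # L
bits_per_raw_vector = len(frames[0]) * 16                # p x B with B = 16 assumed
```

Only the indices cross the channel; the decoded vectors are approximations, so the quality of the result depends on how well the codebook represents the data.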

Codebook Compression - CELP

[Figure: codebook of time-domain samples; a start point selects a y ms segment of samples en, which drives H(z) to give sn]

  • en are time domain samples (integers)
  • R samples per second (eg 8000 Hz)
  • Frame rate governs vector size
  • Codebook with P = 2^j entries
  • Bit rate = j/y bits per ms
  • NB en also includes gain

Codebook Compression of H(z)

[Figure: every x ms along the time axis, A[z] at time t (a vector of M elements) is replaced by the index i of one of the N = 2^k codebook vectors]

  • Vector with M elements, every x ms
  • Codebook with N = 2^k vectors
  • Bit rate = k/x bits per ms (not a function of M)
  • In practice A[z] is converted to LSFs.
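The two bit-rate expressions, j/y for the excitation codebook and k/x for the H(z) codebook, can be put into numbers as below. The index sizes and update intervals are made-up values for illustration; the function name is hypothetical.

```python
def codebook_bit_rate_bps(index_bits, update_ms):
    """Bits per ms (index_bits / update_ms) converted to bits per second."""
    return index_bits / update_ms * 1000.0

# H(z) codebook: N = 2^k vectors with k = 10, one index every x = 20 ms.
spectral_bps = codebook_bit_rate_bps(index_bits=10, update_ms=20.0)

# Excitation codebook: P = 2^j entries with j = 10, one index every y = 5 ms.
excitation_bps = codebook_bit_rate_bps(index_bits=10, update_ms=5.0)

total_bps = spectral_bps + excitation_bps   # gain and pitch bits not included
```

The vector length M does not appear anywhere: only the codebook size and the update interval set the rate.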

Codebook Generation

1) Initialise:
     form a single centroid of all training data, N = 1

2) Repeat
     Split centroids: N -> 2N
     Repeat
       Cluster data to nearest centroid
     until convergence
   until N large enough
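The splitting procedure above can be sketched in a few lines. This is a minimal illustration in the style of the LBG algorithm, under assumptions the slide does not state: centroids are split by adding and subtracting a small offset eps, each clustering pass also moves every centroid to the mean of its cluster, and convergence means the centroids stop changing. The function names and training data are hypothetical.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def nearest(vector, centroids):
    """Index of the centroid closest to vector (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((x - c) ** 2 for x, c in zip(vector, centroids[i])))

def train_codebook(data, target_size, eps=0.01, max_iters=50):
    centroids = [mean(data)]                          # 1) single centroid, N = 1
    while len(centroids) < target_size:               # 2) until N large enough
        # Split every centroid into two perturbed copies: N -> 2N.
        centroids = ([[x + eps for x in c] for c in centroids] +
                     [[x - eps for x in c] for c in centroids])
        for _ in range(max_iters):
            clusters = [[] for _ in centroids]
            for v in data:                            # cluster to nearest centroid
                clusters[nearest(v, centroids)].append(v)
            updated = [mean(cl) if cl else c          # keep centroid if cluster empty
                       for cl, c in zip(clusters, centroids)]
            if updated == centroids:                  # converged
                break
            centroids = updated
    return centroids

training = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
            [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
codebook = sorted(train_codebook(training, target_size=2))
```

Because N doubles at every split, the codebook size ends up a power of two, matching the N = 2^k form used for the index.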

VQ Performance on Unseen Data

[Two figures, from Ramachandran & Mammone (eds), ‘Modern Methods of Speech Processing’, Kluwer Academic, 1995]

LPC & FFT Spectra

[Figure: top, waveform of one frame, amplitude -1 to 1 against time 0 to 25.6 ms; bottom, LPC and FFT spectra with the LSFs marked, magnitude (dB) from -40 to 40 against frequency (kHz) from 0 to Fs/2]

LPC Roots:
  • -0.6651 ± 0.6695i
  • -0.0560 ± 0.9709i
  • 0.7228 ± 0.6225i
  • 0.8714 ± 0.3694i
  • 0.5758
  • -0.4200

LPC Spectra & LSFs

[Figure: LPC spectrum with the LSFs marked, magnitude (dB) from -40 to 40 against frequency (kHz) from 0 to Fs/2, for the same LPC roots as above]

LPC & FFT Spectra - 2nd Order

[Figure: top, waveform, amplitude -1 to 1 against time 0 to 25.6 ms; bottom, LPC and FFT spectra, magnitude (dB) from -40 to 40 against frequency (kHz) from 0 to Fs/2]

A(z): 1.5537 -0.8276

Roots: 0.7769 ± 0.4733i

$$H(0) = \frac{K}{1 - (1.5537 - 0.8276)}, \qquad H(\omega_s/2) = \frac{K}{1 - (-1.5537 - 0.8276)}$$

$$\frac{H(0)}{H(\omega_s/2)} = \frac{K/0.274}{K/3.38} = 21.8\ \text{dB}$$
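The second-order figures above can be reproduced with a few lines of arithmetic. The sketch assumes the two numbers listed under A(z) are the predictor coefficients a1 and a2 of H(z) = K / (1 - a1 z^-1 - a2 z^-2), which is the reading consistent with the quoted roots, and takes K = 1 since it cancels in the ratio; the variable names are hypothetical.

```python
import cmath
import math

a1, a2 = 1.5537, -0.8276

def lpc_gain(z_inv):
    """|H| with K = 1, evaluated at a given value of z^-1."""
    return 1.0 / abs(1.0 - a1 * z_inv - a2 * z_inv ** 2)

h_dc = lpc_gain(1.0)         # z = 1, ie frequency 0
h_nyquist = lpc_gain(-1.0)   # z = -1, ie frequency ws/2
ratio_db = 20.0 * math.log10(h_dc / h_nyquist)

# Poles: roots of z^2 - a1 z - a2 = 0.
disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
roots = [(a1 + disc) / 2.0, (a1 - disc) / 2.0]
```

The ratio comes out close to the 21.8 dB on the slide, and the roots close to 0.7769 ± 0.4733i.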

GSM
  • Groupe Spécial Mobile - EU
      • First digital cellular system in the world
      • See Hodge 1990
      • Based on TDMA & FDMA at 900 MHz, and RPE-LTP (ie it is an ‘LPAS’ system)
      • Now also at 1800 MHz
      • Carriers spaced at 200 kHz
      • Supporting 8 TDMA time slots each
      • Time slots: 577 µs, ie 156.25 bit periods
      • 8 time slots form 1 GSM frame of 4.62 ms
      • Modulation: Gaussian minimum shift keying (GMSK)
      • 26-bit training sequence in every time slot
      • Round-trip delay ~ 80 ms
      • EU: GSM, US: D-AMPS
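The timing figures in the list above are tied together by simple arithmetic, sketched below. The calculation assumes a slot duration of 577 µs and 156.25 bit periods per slot, which is the reading consistent with the stated 4.62 ms frame; the variable names are hypothetical.

```python
slot_us = 577.0            # duration of one TDMA time slot in microseconds
bits_per_slot = 156.25     # bit periods in one time slot
slots_per_frame = 8

frame_ms = slots_per_frame * slot_us / 1000.0        # one GSM frame
gross_rate_kbps = bits_per_slot / slot_us * 1000.0   # bit rate on one carrier
per_user_kbps = gross_rate_kbps / slots_per_frame    # one slot out of eight
```

Eight slots of 577 µs give a frame of about 4.62 ms and a gross carrier rate of roughly 271 kbps, of which each user sees about one eighth before training bits and other overheads are removed.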

Other Related Topics
  • Spectral Lifting: $H(z) = (1 - a z^{-1})$
  • Codebook Training
  • Spectral Differences between 2 frames
  • Cepstra
  • Modelling Speech Space - HMMs

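Pre-Emphasis Example

[Figure Q1: two waveforms, (a) and (b), each with amplitude between -1 and 1 over a 30 ms frame]

Pre-Emphasis Example

[Figure: z-plane (imaginary axis jy) with the filter's zero at a on the real axis; the gain at frequency 0 is the distance 1 - a, and the gain at ωs/2 is the distance 1 + a, marked as 1 + a = 2]

$$G(\omega_s/2) = 1 + a, \qquad G(0) = 1 - a$$

For $G(\omega_s/2) > G(0)$, a must be > 0.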
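The two gains quoted above follow directly from the first-order filter 1 - a z^-1, as the sketch below illustrates. The value a = 0.95, the test signals and the function names are assumptions chosen for the example.

```python
import cmath
import math

def pre_emphasis(s, a=0.95):
    """y_n = s_n - a * s_{n-1}, with the sample before n = 0 taken as zero."""
    return [s[0]] + [s[n] - a * s[n - 1] for n in range(1, len(s))]

def gain(a, w):
    """Magnitude of 1 - a z^-1 at z = e^{jw}, with w in radians per sample."""
    return abs(1.0 - a * cmath.exp(-1j * w))

a = 0.95
g_dc = gain(a, 0.0)            # 1 - a at frequency 0
g_nyquist = gain(a, math.pi)   # 1 + a at ws/2

flat = pre_emphasis([1.0, 1.0, 1.0, 1.0], a)           # lowest frequency
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0], a)  # highest frequency
```

A constant input is almost cancelled after the first sample, while an input alternating at ws/2 is nearly doubled, which is the high-frequency lift that pre-emphasis is meant to provide.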

Z-plane to Magnitude Spectrum

[Figure: top, z-plane plot, imaginary part against real part, both from -1 to 1; bottom, the corresponding magnitude spectrum, magnitude (dB) from -30 to 50 against frequency (kHz) from 0 to Fs/2, with the gain 1 + a = 2 marked at ωs/2]


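ST & LT Prediction

[Figure: two cascaded analysis stages of the form 1 – A’(z), each drawn as a tapped delay line of z^-1 delays (sn-1, ..., sn-i, ..., sn-p) with weights a1, ..., ai, ..., ap whose weighted sum is subtracted from the stage input. The short-term predictor (STP) takes the speech sn to the residual e`n, and the long-term predictor (LTP) takes e`n to en]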