Snack for Ruby

S Legrand

Talk Objectives

  • Tour of API

  • Learn the walk and talk

  • Have Fun

Snack

  • The Snack library is a tool for learning about sound, voice, and automatic speech recognition (ASR), and hopefully a fun way to experiment

  • Snack is a Tcl-based API

  • Snack has been adapted to Python and is included in the standard Python distribution

  • Snack is Swedish for “talk” or “chat”

  • Kåre Sjölander is the principal investigator for the Tcl-based Snack

  • Tcl Snack is available at http://www.speech.kth.se/snack/

Snack for Ruby

  • rbSnack is a Ruby wrapper around Tcl Snack

  • rbSnack has additional Ruby-based utilities

  • rbSnack has HTML-based help (rdoc + rbTeX)

  • rbSnack can be found at http://rbsnack.sourceforge.net/

Snack Toolkit Includes

  • Recording, Playback

  • Waveform display

  • Spectrogram: Fourier, LPC

  • Formant analysis

  • Power analysis

  • Filters

    (will demo)

The Speech Signal

  • Continuous speech is discretely sampled

  • The signal consists of rapidly changing data points.

  • The display of the sampled signal is called the waveform

  • Snack can display the waveform in real time

Analysis Uses Frames

  • The signal is broken into frames

  • Frames may overlap

  • Characteristics of the signal are analyzed with Fourier and LPC analysis on a per-frame basis.
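As a rough illustration of framing, here is a minimal plain-Ruby sketch. The helper name split_into_frames and the frame size and hop values are assumptions for this example and are not part of rbSnack.

```ruby
# Split a sampled signal into fixed-size frames.
# frame_size and hop are measured in samples; a hop smaller than
# frame_size makes consecutive frames overlap.
def split_into_frames(signal, frame_size, hop)
  frames = []
  start = 0
  while start + frame_size <= signal.length
    frames << signal[start, frame_size]
    start += hop
  end
  frames # a trailing partial frame is dropped
end

signal = (0...10).to_a                     # stand-in for audio samples
frames = split_into_frames(signal, 4, 2)   # 4-sample frames, 50% overlap
frames.each { |frame| p frame }
```

Each frame would then be handed to the Fourier or LPC analysis on its own.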

Going in Circles

  • Complex numbers are just a funny way of multiplying: multiply the lengths and add the angles.

  • Euler's formula: e^(iθ) = cos θ + i sin θ
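Both points can be checked numerically with Ruby's built-in Complex class. This is only a small sketch; the particular lengths and angles are arbitrary example values.

```ruby
# Multiplying complex numbers multiplies their lengths and adds their angles.
a = Complex.polar(2.0, Math::PI / 6)   # length 2, angle 30 degrees
b = Complex.polar(3.0, Math::PI / 3)   # length 3, angle 60 degrees
product = a * b

puts product.abs   # approximately 6 (2 * 3)
puts product.arg   # approximately PI / 2 (30 + 60 degrees)

# Euler's formula: e^(i*theta) = cos(theta) + i*sin(theta)
theta = 0.75
euler_lhs = Complex(Math::E, 0) ** Complex(0, theta)
euler_rhs = Complex(Math.cos(theta), Math.sin(theta))
puts (euler_lhs - euler_rhs).abs   # approximately 0
```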

Fourier Analysis

  • The (normalized) Fourier matrix is a unitary matrix

  • Multiplication by the Fourier matrix returns the frequency components of the signal, called the Fourier coefficients

  • The inverse is easy to compute (it is the conjugate transpose) and is called the inverse Fourier transform

The Fourier Matrix Looks Like

  • Spinning disks

  • Multiplying the matrix by the signal produces the Fourier coefficients (frequency components)
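The following plain-Ruby sketch builds a small Fourier matrix and applies it to one frame. It assumes the 1/sqrt(N) normalization (which is what makes the matrix unitary) and uses nested arrays; the helper names are hypothetical and this is not how Snack computes its spectra.

```ruby
# Build the normalized N x N Fourier (DFT) matrix as nested arrays.
# Entry (k, n) is a point on a circle: e^(-2*pi*i*k*n/N) / sqrt(N).
def fourier_matrix(size)
  Array.new(size) do |k|
    Array.new(size) do |n|
      Complex.polar(1.0 / Math.sqrt(size), -2.0 * Math::PI * k * n / size)
    end
  end
end

# Multiply a matrix (array of rows) by a vector.
def mat_vec(matrix, vector)
  matrix.map { |row| row.zip(vector).sum { |m, v| m * v } }
end

# Conjugate transpose; for a unitary matrix this is the inverse.
def conjugate_transpose(matrix)
  matrix.transpose.map { |row| row.map(&:conj) }
end

size = 8
# One cosine cycle across the frame: the energy should land in bins 1 and 7.
signal = Array.new(size) { |t| Math.cos(2.0 * Math::PI * t / size) }

f = fourier_matrix(size)
coefficients = mat_vec(f, signal)
magnitudes = coefficients.map { |c| c.abs.round(6) }
recovered = mat_vec(conjugate_transpose(f), coefficients).map(&:real)

p magnitudes
```

Multiplying by the whole matrix costs on the order of N squared operations per frame; a fast Fourier transform gives the same coefficients more cheaply.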

Examining Fourier Components

  • A spectrogram gives a picture of the Fourier components (coefficients) as they evolve over time. Snack can display it in real time.

  • Looks like an X-ray

  • Bands of high activity correspond to formants

Linear Filters

  • Useful to understand nature of speech signals

  • Generators: generate square waves, sine waves, sawtooth waves, etc.

  • Composers: compose several filters into one.

  • FIR: Finite impulse response

  • IIR: Infinite impulse response

FIR Filter

  • Determined completely by response to a unit impulse.

  • Response finite in duration.

y(t) = b0 x(t) + b1 x(t-1) + b2 x(t-2) + … + bn x(t-n)

(We will demo FIR using rbSnack)
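Independently of rbSnack, the FIR equation can be written out directly in plain Ruby. This is a minimal sketch: fir_filter is a hypothetical helper, the coefficients are arbitrary, and samples before the start of the input are assumed to be zero.

```ruby
# FIR filter: y(t) = b0*x(t) + b1*x(t-1) + ... + bn*x(t-n).
# Samples before the start of the input are treated as zero.
def fir_filter(b, x)
  x.each_index.map do |t|
    b.each_with_index.sum do |coef, k|
      t - k >= 0 ? coef * x[t - k] : 0.0
    end
  end
end

b = [0.5, 0.3, 0.2]                   # example coefficients b0, b1, b2
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
response = fir_filter(b, impulse)
p response
```

Feeding in a unit impulse returns the coefficients themselves followed by zeros, which is why the filter is completely determined by its impulse response and why that response is finite.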

IIR Filter

  • Also called a recursive filter

  • Response infinite in duration.

y(t) = b0 x(t) + b1 x(t-1) + b2 x(t-2) + … + bn x(t-n) + a1 y(t-1) + a2 y(t-2) + … + an y(t-n)

(We will demo IIR using rbSnack)
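The recursive equation can also be sketched in plain Ruby, separately from the rbSnack demo. The helper iir_filter is hypothetical; here a[0] holds a1, a[1] holds a2, and so on, and all samples before time 0 are assumed to be zero.

```ruby
# IIR filter:
#   y(t) = b0*x(t) + ... + bn*x(t-n) + a1*y(t-1) + ... + an*y(t-n)
# b[k] multiplies x(t-k); a[k] multiplies y(t-k-1).
def iir_filter(b, a, x)
  y = []
  x.each_index do |t|
    feedforward = b.each_with_index.sum { |coef, k| t - k >= 0 ? coef * x[t - k] : 0.0 }
    feedback = a.each_with_index.sum { |coef, k| t - k - 1 >= 0 ? coef * y[t - k - 1] : 0.0 }
    y << feedforward + feedback
  end
  y
end

impulse = [1.0] + [0.0] * 7
response = iir_filter([1.0], [0.5], impulse)   # y(t) = x(t) + 0.5*y(t-1)
p response
```

With this example the impulse response halves at every step and never reaches exactly zero in the ideal filter, which is the "infinite" in IIR.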

Linear Predictive Analysis

  • Analogous to Fourier analysis

  • Assumption: for each frame, the signal is predicted by

    y(t) = a1 y(t-1) + a2 y(t-2) + … + ap y(t-p)

  • The LPC coefficients are the best least-squares fit.

  • Can also be used to estimate formants
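One way to make "best least squares" concrete is to solve the normal equations for a single frame. The sketch below is plain Ruby under stated assumptions: the helpers solve_linear and lpc are hypothetical, the test frame is synthetic, and Snack's own LPC routine may well use a different formulation.

```ruby
# Solve the linear system m * a = rhs by Gaussian elimination with partial pivoting.
def solve_linear(m, rhs)
  n = rhs.length
  aug = m.each_with_index.map { |row, i| row.dup << rhs[i] }
  n.times do |col|
    pivot = (col...n).max_by { |r| aug[r][col].abs }
    aug[col], aug[pivot] = aug[pivot], aug[col]
    (col + 1...n).each do |r|
      factor = aug[r][col] / aug[col][col]
      (col..n).each { |c| aug[r][c] -= factor * aug[col][c] }
    end
  end
  solution = Array.new(n, 0.0)
  (n - 1).downto(0) do |i|
    known = (i + 1...n).sum { |j| aug[i][j] * solution[j] }
    solution[i] = (aug[i][n] - known) / aug[i][i]
  end
  solution
end

# Least-squares LPC for one frame: choose a1..ap minimizing the squared
# error of y(t) - (a1*y(t-1) + ... + ap*y(t-p)) over the frame.
def lpc(frame, order)
  ts = (order...frame.length)
  normal = Array.new(order) do |i|
    Array.new(order) { |j| ts.sum { |t| frame[t - 1 - i] * frame[t - 1 - j] } }
  end
  rhs = Array.new(order) { |i| ts.sum { |t| frame[t] * frame[t - 1 - i] } }
  solve_linear(normal, rhs)
end

# Synthetic frame that obeys y(t) = 1.2*y(t-1) - 0.8*y(t-2) exactly.
frame = [1.0, 0.5]
28.times { frame << 1.2 * frame[-1] - 0.8 * frame[-2] }

coeffs = lpc(frame, 2)
p coeffs   # should be close to [1.2, -0.8]
```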

What is Sound? What is Speech?

  • Sound is the signal created by longitudinal waves in a medium such as air.

  • Sound waves are continuous

  • Can be decomposed into a linear combination of sine waves.

  • Speech is a special noise made by humans

It’s Just Tubing…

  • The simplest model of speech is to consider the lungs and trachea as one long tube.

  • Resonance frequencies are called Formants.



Some Speech Recognition Features

  • Formants

  • Pitch

  • Voiced/Unvoiced

  • Nasality

  • Frication

  • Energy

Our current work only uses Formants and Energy

Basic Utterances

  • A basic unit of speech is called a phone

  • Vowels are utterances with constant formants

  • A diphthong is a transition from one vowel to another

  • Vowels and diphthongs are essentially characterized by the first and second formants.

Other Phones: The Consonants

  • Plosive: closure in the oral cavity, then release /p/

  • Nasal: oral cavity closed, air flows through the nasal cavity /m/

  • Fricative: turbulent airstream noise /s/

  • Retroflex liquid: vowel-like, tongue high and curled back /r/

  • Lateral liquid: vowel-like, tongue central, air stream along the sides /l/

  • Glide: vowel-like /y/

Some Problems with Speech Signals

  • Segmentation: when does a word begin and end? (Noise?)

  • Wetware: the speaker’s internal configuration, plus lip smacks, breathing, etc.

    SegmentationWorkshop demos one approach.
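For illustration only, here is one simple way to attack segmentation: threshold the short-time energy of each frame. This plain-Ruby sketch is an assumption for this example and is not necessarily the approach SegmentationWorkshop takes; the helper names, the test signal, and the threshold are all made up.

```ruby
# Short-time energy of each non-overlapping frame.
def frame_energies(signal, frame_size)
  signal.each_slice(frame_size).map { |frame| frame.sum { |s| s * s } }
end

# Return [first_frame, last_frame] of the region whose energy exceeds
# the threshold, or nil if no frame does.
def find_endpoints(energies, threshold)
  first = energies.index { |e| e > threshold }
  return nil if first.nil?
  last = energies.rindex { |e| e > threshold }
  [first, last]
end

silence = [0.01, -0.01] * 10                     # 20 quiet samples
word = Array.new(20) { |t| Math.sin(0.9 * t) }   # 20 loud samples
signal = silence + word + silence

energies = frame_energies(signal, 10)
endpoints = find_endpoints(energies, 0.5)
p endpoints
```

A fixed threshold like this is easily fooled by background noise, lip smacks, and breathing, which is exactly the problem the slide describes.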

Code Books

  • A code book consists of code words.

  • The idea is to search the code book for the code word that best matches the feature sequence.

  • rbSnack uses the code book approach for word recognition.
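The search itself can be sketched in a few lines of plain Ruby. Everything here is hypothetical: the helper names, the use of a single (first formant, second formant) vector instead of a feature sequence, and the rough formant values in Hz, which are illustrative numbers rather than rbSnack data.

```ruby
# Squared Euclidean distance between two feature vectors of equal length.
def squared_distance(u, v)
  u.zip(v).sum { |a, b| (a - b)**2 }
end

# Return the label of the code word closest to the observed features.
def best_match(code_book, features)
  code_book.min_by { |_label, code_word| squared_distance(code_word, features) }.first
end

# Hypothetical code book: one [F1, F2] pair per vowel (illustrative values).
code_book = {
  "i" => [270.0, 2290.0],
  "a" => [730.0, 1090.0],
  "u" => [300.0, 870.0]
}

match = best_match(code_book, [700.0, 1200.0])
puts match
```

The search is a linear scan over every code word, which is one reason the approach suits small vocabularies.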

Code Book Approach

  • ++ Easy to implement

  • + Good for isolated words

  • +- Works best on small vocabularies

  • -- Is insensitive to context, prone to errors


  • WhichWay is a simple demo of this approach

More Problems with Speech Signals

  • Accent: Southern vs. New England vs. California Valley vs. Other.

  • Variation in rate of speech makes it hard to compare words

Dynamic Time Warping

  • A pattern comparison technique

  • A way of stretching or compressing one sequence to match another.

  • Evaluated using dynamic programming

Dynamic Programming

  • Form a grid, with start at lower left, end at upper right.

  • Label each node with difference (error) between pattern 1 at time i and pattern 2 at time j.

  • Find minimal distance from start to end using

Dynamic Programming

Basic Assumption:

If best path P(S,E) passes through node N, then P(S,E) is the concatenation of P(S,N) (best from S to N) and P(N,E) (best from N to E)

  • A possible path
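The grid and the basic assumption translate into a short dynamic programming loop. The sketch below is plain Ruby and assumes non-empty numeric sequences, absolute difference as the node error, and a simple three-way step (advance either pattern or both). That step pattern is an assumption for this example; rbSnack's alignment examples may use different constraints.

```ruby
# Dynamic time warping distance between two non-empty numeric sequences.
# cost[i][j] is the minimal accumulated error of any path from the start
# node (0, 0) to node (i, j).
def dtw_distance(p1, p2)
  inf = Float::INFINITY
  cost = Array.new(p1.length) { Array.new(p2.length, inf) }
  p1.each_index do |i|
    p2.each_index do |j|
      local = (p1[i] - p2[j]).abs   # error between pattern 1 at i and pattern 2 at j
      best_previous =
        if i.zero? && j.zero?
          0.0
        else
          [
            i > 0 ? cost[i - 1][j] : inf,              # p2[j] matches another p1 sample
            j > 0 ? cost[i][j - 1] : inf,              # p1[i] matches another p2 sample
            i > 0 && j > 0 ? cost[i - 1][j - 1] : inf  # advance both patterns
          ].min
        end
      cost[i][j] = local + best_previous
    end
  end
  cost[-1][-1]
end

slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
fast = [1, 2, 3, 2, 1]
same_word = dtw_distance(slow, fast)
other_word = dtw_distance([1, 2, 3], [1, 2, 4])
puts same_word
puts other_word
```

The slow and fast versions of the same pattern align with zero error, which is the point of warping before comparing words spoken at different rates.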

Dynamic Programming

rbSnack includes examples for various time alignment approaches

(Figures: Type I, Type III, Type IV)

Hidden Markov Models

  • Sometimes the second (or third) best match is the right word. Use HMMs to ascertain the correct word in the context of the sentence. (Ditto for phones within a word.)

  • HMMs are similar to nondeterministic finite state machines, except that their output is also nondeterministic.

Hidden Markov Models

  • Dynamic Programming is used to compute weights.

  • HMMs look like

(Figure: HMM diagram with output probabilities P(/i/)=.5 P(/a/)=.2 P(/o/)=.3)
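To show where dynamic programming enters, here is a minimal Viterbi sketch in plain Ruby that finds the most likely state sequence for a string of phones. Only the output probabilities .5, .2, .3 for state :s1 come from the slide; the second state, the start probabilities, and the transition probabilities are invented for this example, and HMMs are not yet part of rbSnack.

```ruby
# Viterbi: most likely hidden state sequence for a list of observations.
# start[s], trans[q][s], and emit[s][obs] are probabilities.
# Returns [probability, path].
def viterbi(states, start, trans, emit, observations)
  # best[s] = [probability, path] of the best path that ends in state s
  best = states.to_h { |s| [s, [start[s] * emit[s][observations.first], [s]]] }
  observations.drop(1).each do |obs|
    best = states.to_h do |s|
      prev = states.max_by { |q| best[q][0] * trans[q][s] }
      [s, [best[prev][0] * trans[prev][s] * emit[s][obs], best[prev][1] + [s]]]
    end
  end
  best.values.max_by { |prob, _path| prob }
end

states = [:s1, :s2]
start = { s1: 0.6, s2: 0.4 }                      # hypothetical
trans = { s1: { s1: 0.7, s2: 0.3 },               # hypothetical
          s2: { s1: 0.4, s2: 0.6 } }
emit = { s1: { "i" => 0.5, "a" => 0.2, "o" => 0.3 },   # from the slide
         s2: { "i" => 0.1, "a" => 0.6, "o" => 0.3 } }  # hypothetical

probability, path = viterbi(states, start, trans, emit, ["i", "a", "a"])
p path
puts probability
```

As in time warping, the best path to each state at each step is built only from the best paths at the previous step.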



Possible Future Directions

  • Examine other features (pitch?)

  • Incorporate other libraries. (Do the computationally hard work in C)

  • Add more signal processing routines

  • Add more examples

  • Use Hidden Markov Models

Lessons Learned / To Be Learned

  • Document everything.

  • Nothing's perfect

  • Automate everything

  • Project is never done

What’s next?

  • Try it out.