Snack for Ruby

S Legrand

Talk Objectives
  • Tour of API
  • Learn the walk and talk
  • Have Fun
Snack
  • The Snack library is a tool to aid in learning about sound, voice, and ASR (automatic speech recognition), and is hopefully a fun way to experiment
  • Snack is a Tcl-based API
  • Snack has also been adapted to Python and included in the standard Python distribution
Snack
  • Snack is Swedish for “talk” or “chat”
  • Kåre Sjölander is the principal investigator for the Tcl-based Snack
  • Tcl Snack is available at http://www.speech.kth.se/snack/
Snack for Ruby
  • rbSnack is a Ruby wrapper around Tcl Snack
  • rbSnack has additional Ruby-based utilities
  • rbSnack has HTML-based help (rdoc + rbTeX)
  • rbSnack can be found at http://rbsnack.sourceforge.net/
Snack Toolkit Includes
  • Recording, Playback
  • Waveform display
  • Spectrogram: Fourier, LPC
  • Formant analysis
  • Power analysis
  • Filters

(will demo)

The Speech Signal
  • Continuous speech is discretely sampled
  • The signal consists of rapidly changing data points.
  • The display of the sampled signal is called the waveform
  • Snack can display the waveform in real time
Analysis uses frames
  • Signal is broken into frames
  • Frames may overlap
  • Characteristics of the signal are analyzed frame by frame using Fourier and LPC analysis.
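
A minimal sketch of breaking a signal into overlapping frames, in plain Ruby. The names `frames`, `frame_size`, and `hop` are made up for this illustration and are not part of rbSnack or Snack.

```ruby
# Split a sampled signal into fixed-size frames that may overlap.
# frame_size and hop are hypothetical parameter names chosen for this sketch:
# each frame holds frame_size samples and starts hop samples after the last one.
def frames(signal, frame_size, hop)
  raise ArgumentError, "hop must be positive" unless hop.positive?

  starts = (0..(signal.length - frame_size)).step(hop)
  starts.map { |start| signal[start, frame_size] }
end

signal = (0...10).to_a
overlapping = frames(signal, 4, 2)   # frames of 4 samples, 50% overlap
```

With a hop smaller than the frame size, neighbouring frames share samples; with a hop equal to the frame size they do not overlap at all.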
Going in Circles
  • Complex numbers are just a funny way of multiplying: multiply the magnitudes and add the angles.
  • Euler's formula: e^(iθ) = cos θ + i sin θ
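
The "add angles" idea can be checked with Ruby's built-in Complex class. This is only a small illustration; the magnitudes and angles are arbitrary values picked for the example.

```ruby
# Complex.polar(magnitude, angle) builds the number magnitude * e^(i * angle).
z1 = Complex.polar(2.0, Math::PI / 6)   # magnitude 2, angle 30 degrees
z2 = Complex.polar(3.0, Math::PI / 3)   # magnitude 3, angle 60 degrees
product = z1 * z2

product_magnitude = product.abs   # magnitudes multiply: close to 2 * 3
product_angle = product.arg       # angles add: close to pi/6 + pi/3

# Euler's formula: e^(i * theta) = cos(theta) + i * sin(theta)
theta = 0.75
euler_left = Complex(Math::E, 0) ** Complex(0, theta)
euler_right = Complex(Math.cos(theta), Math.sin(theta))
```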
Fourier Analysis
  • The Fourier matrix is a unitary matrix
  • Multiplication by the Fourier matrix returns the frequency components of the signal, called the Fourier coefficients
  • The inverse is easy to compute (for a unitary matrix it is the conjugate transpose); it is called the inverse Fourier transform
The Fourier Matrix Looks Like
  • Spinning disks

Multiplying the Fourier matrix by the signal produces the Fourier coefficients (frequency components)
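
A rough sketch of the matrix view in plain Ruby, assuming the unitary scaling 1/sqrt(N). It builds the matrix directly, which is slow compared with an FFT and is meant only to show the idea; the helper names are invented for this example and are not rbSnack calls.

```ruby
# Build the N x N unitary Fourier matrix. Entry (j, k) is a point on the unit
# circle, e^(-2*pi*i*j*k/N), scaled by 1/sqrt(N): each row is a "spinning disk".
def fourier_matrix(n)
  scale = 1.0 / Math.sqrt(n)
  Array.new(n) do |j|
    Array.new(n) { |k| Complex.polar(scale, -2.0 * Math::PI * j * k / n) }
  end
end

# Plain matrix-times-vector product.
def multiply(matrix, vector)
  matrix.map { |row| row.zip(vector).sum(Complex(0, 0)) { |m, x| m * x } }
end

# For a unitary matrix, the conjugate transpose is the inverse.
def conjugate_transpose(matrix)
  matrix.transpose.map { |row| row.map(&:conjugate) }
end

n = 8
signal = Array.new(n) { |t| Math.cos(2.0 * Math::PI * t / n) }  # one cycle per frame
f = fourier_matrix(n)
coefficients = multiply(f, signal)                      # Fourier coefficients
recovered = multiply(conjugate_transpose(f), coefficients)  # back to the signal
```

For this one-cycle cosine, only coefficients 1 and N-1 are non-zero, and multiplying by the conjugate transpose recovers the original samples up to rounding.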

Examining Fourier components
  • A spectrogram gives a picture of the Fourier components (coefficients) as they evolve over time. Snack can display it in real time.
  • Looks like an X-ray
  • Bands of high activity correspond to formants
Linear Filters
  • Useful to understand nature of speech signals
  • Generators: generate square waves, sine waves, sawtooth waves, etc.
  • Composers: compose several filters.
  • FIR: Finite impulse response
  • IIR: Infinite impulse response
FIR Filter
  • Determined completely by response to a unit impulse.
  • Response finite in duration.

y(t) = b0 x(t) + b1 x(t-1) + b2 x(t-2) + … + bn x(t-n)

(We will demo FIR using rbSnack)
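
The FIR equation above can be written out directly in plain Ruby. This is a sketch, not the rbSnack filter API; `fir_filter` is a name invented here, and samples before t = 0 are assumed to be zero.

```ruby
# y(t) = b0 x(t) + b1 x(t-1) + ... + bn x(t-n)
# Samples before t = 0 are treated as zero (an assumption of this sketch).
def fir_filter(b, x)
  Array.new(x.length) do |t|
    b.each_with_index.sum(0.0) { |coeff, k| t >= k ? coeff * x[t - k] : 0.0 }
  end
end

impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
b = [0.5, 0.3, 0.2]
impulse_response = fir_filter(b, impulse)
```

Feeding in a unit impulse returns the coefficients themselves followed by zeros, which is why the filter is determined completely by its impulse response and why that response is finite.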

IIR Filter
  • Also called a recursive filter
  • Response can be infinite in duration.

y(t) = b0 x(t) + b1 x(t-1) + b2 x(t-2) + … + bn x(t-n) + a1 y(t-1) + a2 y(t-2) + … + an y(t-n)

(We will demo IIR using rbSnack)
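
The recursive equation above, sketched in plain Ruby under the same assumptions as before (zero input and output before t = 0). `iir_filter` is an illustrative name, not an rbSnack call.

```ruby
# y(t) = b0 x(t) + ... + bn x(t-n) + a1 y(t-1) + ... + an y(t-n)
# In this sketch a[0] multiplies y(t-1), a[1] multiplies y(t-2), and so on.
def iir_filter(b, a, x)
  y = []
  x.each_index do |t|
    feedforward = b.each_with_index.sum(0.0) { |coeff, k| t >= k ? coeff * x[t - k] : 0.0 }
    feedback = a.each_with_index.sum(0.0) { |coeff, k| t > k ? coeff * y[t - k - 1] : 0.0 }
    y << feedforward + feedback
  end
  y
end

impulse = [1.0] + [0.0] * 7
decay = iir_filter([1.0], [0.5], impulse)   # y(t) = x(t) + 0.5 y(t-1)
```

Here a single impulse keeps echoing through the feedback term, halving at every step without ever reaching exactly zero: the response is infinite in duration.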

Linear Predictive Analysis
  • Analogous to Fourier analysis
  • Assumption: for each frame, the signal is predicted by a linear combination of its previous p samples:

y(t) = a1 y(t-1) + a2 y(t-2) + … + ap y(t-p)

  • The LPC coefficients are the best least-squares fit of this prediction.
  • Can also be used to estimate formants
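
One common way to find least-squares LPC coefficients for a frame is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below assumes that method; the slides do not say which method Snack or rbSnack uses, and the function names are invented for this example.

```ruby
# Autocorrelation of one frame for lags 0..max_lag.
def autocorrelation(frame, max_lag)
  (0..max_lag).map do |lag|
    (0...(frame.length - lag)).sum(0.0) { |t| frame[t] * frame[t + lag] }
  end
end

# Levinson-Durbin recursion. Returns [a1, ..., ap] such that
# y(t) is approximated by a1 y(t-1) + ... + ap y(t-p).
def lpc(frame, order)
  r = autocorrelation(frame, order)
  a = Array.new(order + 1, 0.0)   # a[k] multiplies y(t-k); a[0] is unused
  error = r[0]
  (1..order).each do |i|
    break if error.zero?          # silent frame: nothing left to predict

    reflection = (r[i] - (1...i).sum(0.0) { |k| a[k] * r[i - k] }) / error
    updated = a.dup
    updated[i] = reflection
    (1...i).each { |k| updated[k] = a[k] - reflection * a[i - k] }
    a = updated
    error *= 1.0 - reflection * reflection
  end
  a.drop(1)
end

# A decaying signal that obeys y(t) = 0.9 y(t-1) exactly.
frame = Array.new(200) { |t| 0.9**t }
coefficients = lpc(frame, 2)
```

For this test frame the first coefficient comes out close to 0.9 and the second close to zero, matching the rule that generated the samples.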

What is Sound? What is Speech?
  • Sound is the signal created by longitudinal waves in some medium such as air.
  • Sound waves are continuous
  • Can be decomposed into a linear combination of sine waves.
  • Speech is a special noise made by humans
It’s Just Tubing…
  • The simplest model of speech is to consider the vocal tract, from the glottis to the lips, as one long tube.
  • Resonance frequencies are called Formants.
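
As a rough worked example, assume the tube is uniform, closed at one end and open at the other. Such a tube resonates at odd multiples of c / (4 L). The 17 cm length and 340 m/s speed of sound below are typical textbook values chosen for illustration, not figures from the talk.

```ruby
# Resonances of a uniform tube closed at one end and open at the other:
# F_n = (2n - 1) * c / (4 * L), for n = 1, 2, 3, ...
# The default speed of sound is an approximate value for air.
def tube_resonances(length_m, count, speed_of_sound = 340.0)
  (1..count).map { |n| (2 * n - 1) * speed_of_sound / (4.0 * length_m) }
end

formants = tube_resonances(0.17, 3)   # roughly 500, 1500 and 2500 Hz
```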



Some Speech Recognition Features
  • Formants
  • Pitch
  • Voiced/Unvoiced
  • Nasality
  • Frication
  • Energy

Our current work only uses Formants and Energy

Basic Utterances
  • A basic unit of speech is called a Phone
  • Vowels are utterances with roughly constant formants
  • A diphthong is a transition from one vowel to another
  • Vowels and Diphthongs are essentially characterized by the first and second formant.
Other Phones: The Consonants
  • Plosive: closure in the oral cavity followed by a release, e.g. /p/
  • Nasal: closure in the oral cavity with air flowing through the nasal cavity, e.g. /m/
  • Fricative: turbulent airstream noise, e.g. /s/
  • Retroflex liquid: vowel-like, tongue high and curled back, e.g. /r/
  • Lateral liquid: vowel-like, tongue central, air stream along the sides, e.g. /l/
  • Glide: vowel-like, e.g. /y/
Some Problems with Speech Signals
  • Segmentation: when does a word begin and end? (Noise?)
  • Wetware: the speaker’s internal configuration, plus lip smacks, breathing, etc.

SegmentationWorkshop demos one approach.
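
One simple way to guess where a word begins and ends is to threshold the energy of each frame. The sketch below assumes that approach; it is not necessarily what SegmentationWorkshop does, and the function names and threshold are invented for this example.

```ruby
# Mean squared amplitude of one frame.
def frame_energy(frame)
  frame.sum(0.0) { |sample| sample * sample } / frame.length
end

# Returns [first_frame, last_frame] for the frames whose energy exceeds the
# threshold, or nil when no frame does. Frames here do not overlap.
def speech_bounds(signal, frame_size, threshold)
  energies = signal.each_slice(frame_size).map { |frame| frame_energy(frame) }
  first = energies.index { |e| e > threshold }
  return nil if first.nil?

  last = energies.rindex { |e| e > threshold }
  [first, last]
end

silence = [0.01] * 40                            # low-level background noise
word = Array.new(80) { |t| Math.sin(0.3 * t) }   # stand-in for a spoken word
signal = silence + word + silence
bounds = speech_bounds(signal, 20, 0.05)
```

A fixed threshold like this is easily fooled by lip smacks and breathing, which is exactly the difficulty the slide points out.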

Code Books
  • A code book consists of code words.
  • Idea is to search through code book to find code word corresponding to best match of feature sequence.
  • rbSnack uses the code book approach in word recognition.
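
A minimal sketch of a code book search in plain Ruby. For brevity each code word is a single feature vector rather than a sequence, the distance is squared Euclidean, and the words and numbers are made up; none of this is taken from rbSnack.

```ruby
# Squared Euclidean distance between two feature vectors of equal length.
def squared_distance(u, v)
  u.zip(v).sum(0.0) { |a, b| (a - b)**2 }
end

# codebook maps a label to its code word (a feature vector).
# Returns the label whose code word is closest to the observed features.
def best_match(codebook, features)
  codebook.min_by { |_label, code_word| squared_distance(code_word, features) }.first
end

# Hypothetical code words: [first formant, second formant, energy]
codebook = {
  "left"  => [400.0, 1900.0, 0.6],
  "right" => [650.0, 1100.0, 0.7],
  "stop"  => [550.0, 900.0, 0.4]
}
guess = best_match(codebook, [630.0, 1150.0, 0.65])
```

In practice the features would usually be scaled first so that formant frequencies in hertz do not swamp the energy term.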
Code Book Approach
  • ++ Easy to implement
  • + Good for isolated words
  • +- Works best on small vocabularies
  • -- Is insensitive to context, prone to errors
Code Book Approach
  • WhichWay is a simple demo of this approach
More Problems with Speech Signals
  • Accent: Southern vs. New England vs. California Valley vs. Other.
  • Variation in rate of speech makes it hard to compare words
Dynamic Time Warping
  • A pattern comparison technique
  • A way of stretching or compressing one sequence to match another.
  • Evaluated using dynamic programming
Dynamic Programming
  • Form a grid, with start at lower left, end at upper right.
  • Label each node with difference (error) between pattern 1 at time i and pattern 2 at time j.
  • Find minimal distance from start to end using
Dynamic Programming

Basic Assumption:

If best path P(S,E) passes through node N, then P(S,E) is the concatenation of P(S,N) (best from S to N) and P(N,E) (best from N to E)

  • A possible path
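
The grid search can be sketched in plain Ruby as follows. This version assumes the simplest local moves (one step right, one step up, or one diagonal step) and non-empty patterns of single numbers; the time alignment examples shipped with rbSnack may use different path constraints.

```ruby
# Dynamic time warping distance between two non-empty sequences of numbers.
def dtw_distance(pattern1, pattern2)
  rows = pattern1.length
  cols = pattern2.length
  inf = Float::INFINITY
  # cost[i][j] = smallest accumulated error of any path from the start node
  # (0, 0) to node (i, j)
  cost = Array.new(rows) { Array.new(cols, inf) }
  rows.times do |i|
    cols.times do |j|
      error = (pattern1[i] - pattern2[j]).abs
      best_previous =
        if i.zero? && j.zero?
          0.0
        else
          [
            i.positive? ? cost[i - 1][j] : inf,
            j.positive? ? cost[i][j - 1] : inf,
            i.positive? && j.positive? ? cost[i - 1][j - 1] : inf
          ].min
        end
      # The best path into (i, j) extends the best path into one of its
      # predecessors: this is the basic assumption stated above.
      cost[i][j] = error + best_previous
    end
  end
  cost[rows - 1][cols - 1]
end

slow = [1.0, 1.0, 2.0, 3.0, 3.0, 2.0]   # the same shape, spoken slowly
fast = [1.0, 2.0, 3.0, 2.0]
distance = dtw_distance(slow, fast)
```

The two patterns differ only in speed, so the warped distance is zero even though the sequences have different lengths.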
Dynamic Programming


rbSnack includes examples for various time alignment approaches.

(Figures: Type I, Type III)

Dynamic Programming










(Figure: Type IV)

Hidden Markov Models
  • Sometimes the second (or third) best match is the right word. Use HMMs to determine the correct word in the context of the sentence (and likewise for phones within a word).
  • HMMs are similar to non-deterministic finite state machines, except that their output is also non-deterministic.
Hidden Markov Models
  • Dynamic Programming is used to compute weights.
  • HMMs look like

(Figure: HMM state diagram with output probabilities P(/i/) = .5, P(/a/) = .2, P(/o/) = .3)
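
As an illustration of dynamic programming on an HMM, the sketch below uses the forward algorithm to compute the probability of a sequence of phones. The first state uses the output probabilities shown above; the second state, the initial probabilities, and the transition probabilities are invented for this example.

```ruby
# Forward algorithm: probability that the model produces the observations.
# Assumes at least one observation.
def sequence_probability(initial, transition, emission, observations)
  # alpha[s] = probability of the observations so far, ending in state s
  alpha = initial.each_index.map { |s| initial[s] * emission[s][observations.first] }
  observations.drop(1).each do |symbol|
    alpha = initial.each_index.map do |s|
      reach = alpha.each_index.sum(0.0) { |prev| alpha[prev] * transition[prev][s] }
      reach * emission[s][symbol]
    end
  end
  alpha.sum(0.0)
end

initial = [0.6, 0.4]                    # hypothetical start probabilities
transition = [[0.7, 0.3], [0.4, 0.6]]   # hypothetical; each row sums to 1
emission = [
  { i: 0.5, a: 0.2, o: 0.3 },           # output probabilities from the slide
  { i: 0.1, a: 0.6, o: 0.3 }            # hypothetical second state
]
probability = sequence_probability(initial, transition, emission, [:i, :a])
```

Each step reuses the totals from the previous step instead of enumerating every state path, which is what keeps the computation cheap.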



Possible Future Directions
  • Examine other features (pitch?)
  • Incorporate other libraries (do the computationally hard work in C)
  • Add more signal processing routines
  • Add more examples
  • Use Hidden Markov Models
Lessons Learned / To Be Learned
  • Document everything.
  • Nothing’s perfect
  • Automate everything
  • Project is never done
What’s next?
  • Try it out.