Speech processing


Speech Processing. Applications of Images and Signals in High Schools. AEGIS RET All-Hands Meeting, University of Central Florida, June 22, 2012. Contributors: Dr. Veton Këpuska, Faculty Mentor, FIT; Jacob Zurasky, Graduate Student Mentor, FIT.


Presentation Transcript


Speech Processing

Applications of Images and Signals in High Schools

AEGIS RET All-Hands Meeting

University of Central Florida

June 22, 2012


Contributors

Dr. Veton Këpuska, Faculty Mentor, FIT

Jacob Zurasky, Graduate Student Mentor, FIT

Becky Dowell, RET Teacher, BPS Titusville High


Motivation

  • Speech audio processing has become increasingly useful.

  • Applications

    • Siri on iPhone 4S

    • Automated telephone systems

    • Voice transcription (e.g. dictation software)

    • Hands-free computing (e.g., OnStar)

    • Video games (e.g., Xbox Kinect)

    • Military applications (e.g., aircraft control)

    • Healthcare applications


Motivation

  • Speech recognition requires speech to first be characterized by a set of “features”.

  • Features are used to determine what words are spoken.

  • Understanding how the features are computed is very important.

  • Our project will implement the feature extraction stage of a speech processing application.


Work Completed

  • MATLAB fundamentals

  • Introduction of Signal Processing and Filtering

  • Beginning Project Implementation


Speech Recognition

[Block diagram: Speech (large amount of data, e.g., 256 samples) → Front End: Pre-processing → Features (reduced data size, e.g., 13 features) → Back End: Recognition → Recognized speech]

  • Front End – reduces the amount of data passed to the back end while keeping enough to describe the signal accurately. Its output is a feature vector.

    • 256 samples → 13 features

  • Back End – statistical models classify each feature vector as a particular sound in speech.


Discrete Time Signals

  • A computer is a discrete system with finite memory, so it requires a discrete representation of sound.

  • Sound represented as a sequence of samples

    • time vs. amplitude

    • Amplitude = volume


Discrete Time Signals

  • Sampling rate (# of samples per second)

    • 8 kHz - telephone

    • 44.1 kHz – CD audio

    • 96 kHz – DVD audio
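
The relationship between sampling rate and sample count can be sketched in a few lines. This is a minimal Python illustration (the project itself used MATLAB); the function name `sample_sine` and the 440 Hz test tone are assumptions chosen for the example, not part of the original slides.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Sample a sine wave sample_rate_hz times per second for duration_s seconds."""
    num_samples = int(round(sample_rate_hz * duration_s))
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

# One second of a 440 Hz tone (hypothetical test tone) at each rate listed above.
for rate in (8000, 44100, 96000):
    samples = sample_sine(440.0, rate, 1.0)
    print(rate, "Hz sampling rate ->", len(samples), "samples per second of audio")
```

The same one-second tone needs 8000 numbers at telephone quality and 96000 at the highest rate listed, which is why the front end has to reduce the amount of data.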


Frequency Domain

  • We need to analyze signals in terms of frequency rather than time.

  • Sound is composed of many frequencies at the same time

  • Frequency determines the pitch of the sound

  • To recognize the sound, we need to know the frequencies that make the sound.


Fast Fourier Transform (FFT)

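  • Algorithm used to transform a signal from the time domain to the frequency domain.

  • MATLAB function: fft(x, N)

    • x – discrete-time signal

    • N – FFT size

  • The FFT computes the discrete Fourier transform:

    $$X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-j 2 \pi k n / N}$$

    • X[k] – frequency spectrum

    • k – frequency bin

    • N – FFT size

    • n – sample number

    • x[n] – input signal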
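
To make the symbols above concrete, here is a minimal Python sketch of the discrete Fourier transform that the FFT computes efficiently. It is a direct, slow evaluation of the sum and not the FFT algorithm itself; the helper names `dft` and `bin_to_hz` and the 1000 Hz test tone are assumptions for illustration.

```python
import cmath
import math

def dft(x, N):
    """Direct N-point DFT: X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)."""
    # Zero-pad or truncate the input to exactly N samples.
    padded = list(x[:N]) + [0.0] * max(0, N - len(x))
    return [sum(padded[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def bin_to_hz(k, N, sample_rate_hz):
    """Frequency in Hz represented by frequency bin k."""
    return k * sample_rate_hz / N

fs = 8000   # sampling rate in Hz
N = 64      # transform size
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(N)]  # 1000 Hz tone
X = dft(x, N)

# Search only the first half of the spectrum; the second half mirrors it for real input.
peak_bin = max(range(N // 2), key=lambda k: abs(X[k]))
print(peak_bin, bin_to_hz(peak_bin, N, fs))  # prints: 8 1000.0
```

Each bin k corresponds to the frequency k · fs / N, so a larger N gives finer frequency resolution at the cost of more computation.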


Sine Wave Example

  • MATLAB function sine_sound

    • Generate 3 sine waves and a composite signal

    • Play sound and plot graphs

    • Compute and plot FFT of composite signal


Sine Wave Example

% plays a C major chord (C4, E4, G4)

sine_sound(8000, 261.626, 329.628, 391.995, 1, 4096);
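
The source of `sine_sound` is not shown, so the following Python sketch is not a port of it. It assumes the arguments are the sampling rate, three frequencies, a duration, and the FFT size, and it reproduces only the analysis step: build the composite signal, transform it, and read the three strongest peaks back from the spectrum. The names `fft` and `chord_peaks` are hypothetical.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def chord_peaks(sample_rate_hz, freqs_hz, fft_size):
    """Return the frequencies (Hz) of the strongest spectral peaks of a sum of sines."""
    signal = [sum(math.sin(2 * math.pi * f * n / sample_rate_hz) for f in freqs_hz)
              for n in range(fft_size)]
    mags = [abs(v) for v in fft(signal)[:fft_size // 2]]
    # Local maxima of the magnitude spectrum, strongest first.
    peaks = [k for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    peaks.sort(key=lambda k: mags[k], reverse=True)
    strongest = sorted(peaks[:len(freqs_hz)])
    return [k * sample_rate_hz / fft_size for k in strongest]

notes = [261.626, 329.628, 391.995]  # C4, E4, G4
for note, peak in zip(notes, chord_peaks(8000, notes, 4096)):
    print("note %.3f Hz -> nearest FFT bin at %.3f Hz" % (note, peak))
```

With an 8 kHz sampling rate and a 4096-point FFT the bins are about 1.95 Hz apart, so each recovered peak lands within one bin of the true note frequency.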


Front-End Processing of Speech Recognizer

  • Pre-emphasis

  • Window

  • FFT

  • Mel-Scale

  • log

  • IFFT


Pre-Emphasis

  • First-order FIR filter

  • In human speech, higher frequencies carry less energy, so the filter compensates for this high-frequency roll-off.

  • High-pass filter
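
A first-order FIR pre-emphasis filter is commonly written as y[n] = x[n] − α·x[n−1]. The slides do not give the coefficient, so the value α = 0.97 below is an assumption (a typical choice), and the function names are hypothetical. The Python sketch also evaluates the filter's gain to show its high-pass behavior.

```python
import math

def pre_emphasis(x, alpha=0.97):
    """First-order FIR filter y[n] = x[n] - alpha * x[n-1], taking x[-1] as 0."""
    return [x[n] - alpha * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

def gain_at(freq_hz, sample_rate_hz, alpha=0.97):
    """Magnitude of the frequency response |1 - alpha * exp(-j*w)| at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    return abs(1 - alpha * complex(math.cos(w), -math.sin(w)))

# A constant (0 Hz) input is almost removed after the first sample.
print(pre_emphasis([1.0, 1.0, 1.0, 1.0]))

# Gain rises with frequency: low frequencies are attenuated, high ones boosted.
for f in (0, 500, 1000, 2000, 4000):
    print(f, "Hz gain:", round(gain_at(f, 8000), 3))
```

The gain grows from about 0.03 at 0 Hz to about 1.97 at half the sampling rate, which is how the filter offsets the lower energy of high frequencies in speech.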


Windowing

  • Separate speech signal into frames

  • Apply a window to smooth the edges of each frame of the speech signal
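
Framing and windowing can be sketched as follows in Python. The frame length of 256 samples matches the example earlier in the slides; the Hamming window and the 128-sample hop (50% overlap) are assumptions, since the slides do not name the window or the overlap used, and the function names are hypothetical.

```python
import math

def hamming(N):
    """Hamming window of length N: 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames; an incomplete final frame is dropped."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]

def windowed_frames(x, frame_len=256, hop=128):
    """Frame the signal, then taper each frame toward its edges with the window."""
    w = hamming(frame_len)
    return [[s * wn for s, wn in zip(frame, w)]
            for frame in frame_signal(x, frame_len, hop)]

signal = [1.0] * 1024                      # constant test signal
frames = windowed_frames(signal)
print(len(frames), "frames of", len(frames[0]), "samples")
print("edge value:", round(frames[0][0], 3), "center value:", round(frames[0][128], 3))
```

Tapering the frame edges reduces the abrupt jumps at frame boundaries, which would otherwise show up as spurious frequencies in the FFT of each frame.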


Mel-Scale

  • Model sound as humans perceive it – logarithmically.

  • At high frequencies, a larger change in frequency is required to notice a difference

  • Convert linear scale (Hz) to logarithmic scale (mel-scale)
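
The slides do not state which Hz-to-mel formula was used. One widely used version is mel = 2595 · log10(1 + f / 700); the Python sketch below assumes that formula, and the function names are hypothetical. It shows that the same 100 Hz step covers fewer mels at high frequencies than at low ones.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels using mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The same 100 Hz change is a smaller perceptual step at higher frequencies.
low_step = hz_to_mel(300) - hz_to_mel(200)
high_step = hz_to_mel(3100) - hz_to_mel(3000)
print("200 -> 300 Hz:", round(low_step, 1), "mels")
print("3000 -> 3100 Hz:", round(high_step, 1), "mels")
```

Under this formula 1000 Hz maps to roughly 1000 mels, and equal steps in mels correspond to progressively wider steps in Hz as frequency increases.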


FFT Size Comparison


Connections to High School Mathematics Curriculum

  • Florida Math Standard (NGSSS) MA.912.T.1.8:

    • Solve real world problems involving applications of trigonometric functions using graphing technology when appropriate.

  • Pre-Calculus course

    • related topics include graphs of trigonometric functions, unit circle, logarithmic scale, complex numbers in trig form


Timeline

  • Week 1

    • MATLAB fundamentals

    • MATLAB Filter Design & Analysis Tool

    • Introduction to Signal Processing, FFT, Filtering

    • Identified topics connected to high school math curriculum

  • Week 2

    • Continued tutorials on signal processing and filtering

    • Implementation of sample code for use in lesson plans

    • Implementation of Pre-emphasis, Windowing, FFT, Cepstral Transform


Timeline

  • Weeks 3–6

    • Implementation of Front-End Speech Processing

    • Work on deliverables.


References

  • Ingle, Vinay K., and John G. Proakis. Digital signal processing using MATLAB. 2nd ed. Toronto, Ont.: Nelson, 2007.

  • Oppenheim, Alan V., and Ronald W. Schafer. Discrete-time signal processing. 3rd ed. Upper Saddle River: Pearson, 2010.

  • Weeks, Michael. Digital signal processing using MATLAB and wavelets. Hingham, Mass.: Infinity Science Press, 2007.


Thank you!

Questions?

