Speech processing

Speech Processing. Applications of Images and Signals in High Schools. AEGIS RET All-Hands Meeting, Florida Institute of Technology, July 6, 2012. Contributors: Dr. Veton Këpuska, Faculty Mentor, FIT; Jacob Zurasky, Graduate Student Mentor, FIT.

Speech Processing

Applications of Images and Signals in High Schools

AEGIS RET All-Hands Meeting

Florida Institute of Technology

July 6, 2012


Contributors

Dr. Veton Këpuska, Faculty Mentor, FIT

Jacob Zurasky, Graduate Student Mentor, FIT

Becky Dowell, RET Teacher, BPS Titusville High


Timeline

  • 1874: Alexander Graham Bell proves frequency harmonics from an electrical signal can be divided

  • 1952: Bell Labs develops the first effective speech recognizer

  • 1971-1976, DARPA: speech should be understood, not just recognized

  • 1980s: Call center and text-to-speech products become commercially available

  • 1990s: PC processing power allows use of speech recognition (SR) software by ordinary users

    Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm


Motivation

  • Applications

    • Call center speech recognition

    • Speech-to-text applications (e.g. dictation software)

    • Hands-free user-interface (e.g., OnStar, XBOX Kinect, Siri)

      • Science Fiction 1968: Stanley Kubrick’s 2001: A Space Odyssey http://www.youtube.com/watch?v=6MMmYyIZlC4

      • Science Fact 2011: Apple iPhone 4S Siri http://www.apple.com/iphone/features/siri.html


Difficulties

  • Continuous Speech (word boundaries)

  • Noise

    • Background

    • Other speakers

  • Differences in speakers

    • Dialects/Accents

    • Male/female


Motivation

  • Speech recognition requires speech to first be characterized by a set of “features”.

  • Features are used to determine what words are spoken.

  • Our project implements the feature extraction stage of a speech processing application.


Speech Recognition

[Block diagram: Speech → Front End (pre-processing) → Features → Back End (recognition) → Recognized speech. The speech input is a large amount of data (e.g., 256 samples); the features are a reduced data size (e.g., 13 features).]

  • Front End: reduces the amount of data for the back end, but keeps enough data to accurately describe the signal. The output is a feature vector.

    • 256 samples → 13 features

  • Back End: statistical models are used to classify feature vectors as a certain sound in speech


Front-End Processing of Speech Recognizer

  • Pipeline: Pre-emphasis

  • Pre-emphasis: high-pass filter to compensate for the high-frequency roll-off in human speech
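A minimal sketch of the pre-emphasis stage, written in plain Python rather than the project's MATLAB (the slides do not show the project code). It assumes the common first-order filter y[n] = x[n] - alpha * x[n-1]; the function name `pre_emphasis` and the coefficient 0.97 are illustrative assumptions, not values taken from the presentation.

```python
def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed, commonly used value.
    """
    if not samples:
        return []
    out = [samples[0]]  # the first sample has no predecessor, so pass it through
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


# A constant (zero-frequency) signal is almost cancelled,
# while a signal alternating at the highest frequency is boosted.
constant = pre_emphasis([1.0, 1.0, 1.0, 1.0])
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0])
```

Comparing `constant` with `alternating` shows the high-pass behaviour: low-frequency content is attenuated and high-frequency content is emphasized.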


Front-End Processing of Speech Recognizer

  • Pipeline: Pre-emphasis → Window

  • Window: separate the speech signal into frames, then apply a window to smooth the edges of each framed segment
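Framing and windowing could be sketched as follows, in plain Python rather than the project's MATLAB. The frame length of 256 samples follows the example used elsewhere in the slides; the 128-sample hop, the choice of a Hamming window, and all function names are assumptions made for illustration.

```python
import math


def split_into_frames(samples, frame_len=256, hop=128):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


def hamming(n_points):
    """Hamming window: 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def apply_window(frame, window):
    """Multiply each sample by the matching window coefficient."""
    return [s * w for s, w in zip(frame, window)]


signal = [1.0] * 1024            # placeholder signal, not real speech
frames = split_into_frames(signal)
window = hamming(256)
windowed = [apply_window(f, window) for f in frames]
```

The window tapers each frame toward its edges, which reduces the discontinuity that framing would otherwise introduce at the frame boundaries.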


Front-End Processing of Speech Recognizer

  • Pipeline: Pre-emphasis → Window → FFT

  • FFT: transform the signal from the time domain to the frequency domain

    • The human ear perceives sound based on frequency content
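To illustrate the time-to-frequency transform, here is a small radix-2 FFT in plain Python (the project uses MATLAB, whose code is not shown in the slides). The 8000 Hz sample rate and the 250 Hz test tone are assumptions chosen so that the tone falls exactly on one frequency bin of a 256-sample frame.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


sample_rate = 8000   # Hz, assumed
n = 256              # frame length, as in the slides' example
frame = [math.sin(2 * math.pi * 250 * t / sample_rate) for t in range(n)]

spectrum = fft(frame)
# For a real signal only the first n/2 + 1 bins carry independent information.
magnitudes = [abs(c) for c in spectrum[:n // 2 + 1]]
peak_bin = max(range(len(magnitudes)), key=lambda k: magnitudes[k])
peak_hz = peak_bin * sample_rate / n
```

The largest magnitude lands in the bin whose frequency matches the test tone, which is the frequency-content view that later stages operate on.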


Front-End Processing of Speech Recognizer

  • Pipeline: Pre-emphasis → Window → FFT → Mel-Scale

  • Mel-Scale: convert linear-scale frequency (Hz) to the logarithmic mel scale
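The slides do not state which Hz-to-mel formula the project uses. The sketch below, in plain Python, assumes one widely used form, mel = 2595 * log10(1 + f / 700); the function names and the choice of 13 center frequencies up to 4000 Hz are illustrative assumptions.

```python
import math


def hz_to_mel(hz):
    """Assumed conversion: mel = 2595 * log10(1 + hz / 700)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_centers(low_hz, high_hz, count):
    """Center frequencies (in Hz) that are evenly spaced on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (count + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(count)]


centers = mel_spaced_centers(0.0, 4000.0, 13)
```

Points that are evenly spaced in mel are packed closely at low frequencies and spread out at high frequencies, which mirrors the finer frequency resolution of human hearing at the low end.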


Front-End Processing of Speech Recognizer

  • Pipeline: Pre-emphasis → Window → FFT → Mel-Scale → log

  • log: take the log of the magnitudes (multiplication becomes addition) to allow separation of signals
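The "multiplication becomes addition" property can be checked numerically. In this plain-Python sketch the two magnitude lists are made-up numbers standing in for two components whose spectra multiply; the small floor value that guards against log(0) is also an assumption.

```python
import math


def log_magnitudes(magnitudes, floor=1e-10):
    """Natural log of each magnitude; the floor avoids log(0) on empty bins."""
    return [math.log(max(m, floor)) for m in magnitudes]


# Hypothetical magnitudes of two components that combine by multiplication.
component_a = [2.0, 4.0, 8.0]
component_b = [0.5, 0.25, 3.0]
combined = [a * b for a, b in zip(component_a, component_b)]

log_combined = log_magnitudes(combined)
log_sum = [la + lb for la, lb in zip(log_magnitudes(component_a),
                                     log_magnitudes(component_b))]
```

`log_combined` and `log_sum` agree bin by bin, so after the log the two components are simply added together and can be separated by a later linear step.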


Front-End Processing of Speech Recognizer

  • Pipeline: Pre-emphasis → Window → FFT → Mel-Scale → log → IFFT

  • Pre-emphasis: high-pass filter to compensate for the high-frequency roll-off in human speech

  • Window: separate the speech signal into frames, then apply a window to smooth the edges of each framed segment

  • FFT: transform the signal from the time domain to the frequency domain

    • The human ear perceives sound based on frequency content

  • Mel-Scale: convert linear-scale frequency (Hz) to the logarithmic mel scale

  • log: take the log of the magnitudes (multiplication becomes addition) to allow separation of signals

  • IFFT: inverse of the FFT to transform to the cepstral domain. The result is the set of “features”
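The final step can be sketched as a real cepstrum: transform, take the log magnitude, then apply the inverse transform and keep the first few coefficients. This plain-Python example is a simplification under stated assumptions: it uses a direct (slow) DFT to stay self-contained, skips the mel-scale step, and uses a synthetic two-tone frame instead of speech. The count of 13 coefficients follows the slides' example; the function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform; inverse=True gives the inverse."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame, num_features=13, floor=1e-10):
    """Inverse transform of the log magnitude spectrum, truncated to num_features."""
    spectrum = dft(frame)
    log_mag = [math.log(max(abs(c), floor)) for c in spectrum]
    cepstrum = dft(log_mag, inverse=True)
    # The log magnitude spectrum of a real frame is symmetric,
    # so the imaginary parts are only rounding noise.
    return [c.real for c in cepstrum[:num_features]]


# Synthetic 256-sample frame (two sine components), not real speech.
frame = [math.sin(2 * math.pi * 8 * n / 256) + 0.5 * math.sin(2 * math.pi * 20 * n / 256)
         for n in range(256)]
features = real_cepstrum(frame)
```

Here 256 input samples are reduced to a 13-element feature vector, matching the data reduction described for the front end.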


Speech Analysis and Sound Effects (SASE) Project

  • Graphical User Interface (GUI)

  • Speech input

    • Record and save audio

    • Sound file (*.wav, *.ulaw, *.au)

  • Graphs the entire audio signal

  • Select a “frame” by clicking on graph

  • Process speech frame and display output for each stage of processing

  • Displays spectrogram


GUI Components

[Screenshot of the SASE GUI with labeled components: Plotting Axes, Buttons]


MATLAB Code

  • Graphical User Interface (GUI)

    • GUIDE (GUI Development Environment)

    • Callback functions

    • Work in progress

    • Extendable

  • Stages of speech processing

    • Modular functions for reusability


SASE Lab

  • Interactive teaching tool

  • Demo


Future Work

  • Improve GUI

  • Audio Effects

    • Ex: Echo, Reverberation, Chorus, Flange

  • Noise Filtering


References

  • Ingle, Vinay K., and John G. Proakis. Digital signal processing using MATLAB. 2nd ed. Toronto, Ont.: Nelson, 2007.

  • Oppenheim, Alan V., and Ronald W. Schafer. Discrete-time signal processing. 3rd ed. Upper Saddle River: Pearson, 2010.

  • Weeks, Michael. Digital signal processing using MATLAB and wavelets. Hingham, Mass.: Infinity Science Press, 2007.

  • Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm


Thank you!

Questions?


Unit Plan

  • Introduction

  • Lesson #1: The Sound of a Sine Wave

  • Lesson #2: Frequency Analysis

  • Lesson #3: Filtering (work in progress)

  • Lesson #4: SASE Lab (work in progress)

  • Conclusion

