Speech Processing


Speech Processing. Applications of Images and Signals in High Schools. AEGIS RET All-Hands Meeting, Florida Institute of Technology, July 6, 2012. Contributors: Dr. Veton Këpuska, Faculty Mentor, FIT [email protected]; Jacob Zurasky, Graduate Student Mentor, FIT [email protected]

Presentation Transcript

Speech Processing

Applications of Images and Signals in High Schools

AEGIS RET All-Hands Meeting

Florida Institute of Technology

July 6, 2012

Contributors

Dr. Veton Këpuska, Faculty Mentor, FIT

[email protected]

Jacob Zurasky, Graduate Student Mentor, FIT

[email protected]

Becky Dowell, RET Teacher, BPS Titusville High

[email protected]

Timeline
  • 1874: Alexander Graham Bell proves that frequency harmonics from an electrical signal can be divided
  • 1952: Bell Labs develops the first effective speech recognizer
  • 1971-1976, DARPA: speech should be understood, not just recognized
  • 1980s: Call center and text-to-speech products become commercially available
  • 1990s: PC processing power allows use of SR software by ordinary users

Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

Motivation
  • Applications
    • Call center speech recognition
    • Speech-to-text applications (e.g. dictation software)
    • Hands-free user-interface (e.g., OnStar, XBOX Kinect, Siri)
      • Science Fiction 1968: Stanley Kubrick’s 2001: A Space Odyssey http://www.youtube.com/watch?v=6MMmYyIZlC4
      • Science Fact 2011: Apple iPhone 4S Siri http://www.apple.com/iphone/features/siri.html
Difficulties
  • Continuous Speech (word boundaries)
  • Noise
    • Background
    • Other speakers
  • Differences in speakers
    • Dialects/Accents
    • Male/female
Motivation
  • Speech recognition requires speech to first be characterized by a set of “features”.
  • Features are used to determine what words are spoken.
  • Our project implements the feature extraction stage of a speech processing application.
Speech Recognition

[Block diagram: Speech → Front End (pre-processing) → Features → Back End (recognition) → Recognized speech. The speech input is a large amount of data (e.g., 256 samples); the features are a reduced data size (e.g., 13 features).]

  • Front End: reduces the amount of data for the back end while keeping enough data to accurately describe the signal. The output is a feature vector.
    • 256 samples → 13 features
  • Back End: statistical models classify each feature vector as a certain sound in speech
Front-End Processing of Speech Recognizer
  • Pre-emphasis
    • High-pass filter to compensate for higher-frequency roll-off in human speech
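A minimal sketch of the pre-emphasis stage, written in plain Python for illustration (the project itself uses MATLAB, and its code is not shown in these slides). It assumes a first-order high-pass filter y[n] = x[n] - alpha * x[n-1]; the function name and the coefficient 0.97 are assumptions, since the slides give neither.

```python
def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is an assumed, commonly used value; the slides do not give one.
    """
    if not samples:
        return []
    out = [samples[0]]  # the first sample has no predecessor, so pass it through
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


# A constant (zero-frequency) signal is strongly attenuated,
# while a rapidly alternating (high-frequency) signal is boosted.
flat = pre_emphasis([1.0, 1.0, 1.0])
alternating = pre_emphasis([1.0, -1.0, 1.0])
```

The two example signals show the high-pass behavior: slowly varying content shrinks toward zero while fast changes grow in amplitude.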
Front-End Processing of Speech Recognizer
  • Pre-emphasis
    • High-pass filter to compensate for higher-frequency roll-off in human speech
  • Window
    • Separate speech signal into frames
    • Apply window to smooth edges of framed speech signal
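The framing and windowing steps might be sketched as follows, in plain Python for illustration rather than the project's MATLAB. The 256-sample frame length follows the example given earlier in the slides; the 128-sample hop, the choice of a Hamming window, and all function names are assumptions.

```python
import math


def frame_signal(samples, frame_len=256, hop=128):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


def hamming(n_points):
    """Hamming window: tapers toward the frame edges to smooth them."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def apply_window(frame, window):
    """Multiply a frame by a window, sample by sample."""
    return [s * w for s, w in zip(frame, window)]


signal = [1.0] * 512                     # placeholder signal for the example
frames = frame_signal(signal)            # overlapping 256-sample frames
window = hamming(256)
windowed = [apply_window(f, window) for f in frames]
```

Tapering each frame reduces the abrupt jump at its edges, which would otherwise smear energy across the spectrum in the FFT stage.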
Front-End Processing of Speech Recognizer
  • Pre-emphasis
    • High-pass filter to compensate for higher-frequency roll-off in human speech
  • Window
    • Separate speech signal into frames
    • Apply window to smooth edges of framed speech signal
  • FFT
    • Transform signal from time domain to frequency domain
    • Human ear perceives sound based on frequency content
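As an illustration of the time-to-frequency transform, here is a small recursive radix-2 FFT in plain Python. It is a teaching sketch, not the project's MATLAB code; the function names and the 8-cycle test tone are assumptions made for this example.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(x)
    if n == 0:
        raise ValueError("input must not be empty")
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative frequency bins (0 .. N/2)."""
    spectrum = fft(frame)
    return [abs(c) for c in spectrum[: len(frame) // 2 + 1]]


# A cosine completing exactly 8 cycles in a 256-sample frame
# concentrates its energy in frequency bin 8.
tone = [math.cos(2 * math.pi * 8 * n / 256) for n in range(256)]
mags = magnitude_spectrum(tone)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
```

The single peak shows why the frequency domain is convenient: a tone that spans the whole frame in time collapses to one bin.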
Front-End Processing of Speech Recognizer
  • Pre-emphasis
    • High-pass filter to compensate for higher-frequency roll-off in human speech
  • Window
    • Separate speech signal into frames
    • Apply window to smooth edges of framed speech signal
  • FFT
    • Transform signal from time domain to frequency domain
    • Human ear perceives sound based on frequency content
  • Mel-Scale
    • Convert linear-scale frequency (Hz) to logarithmic scale (mel scale)
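The Hz-to-mel conversion can be sketched as below, in plain Python for illustration. The slides do not state a formula, so this assumes one widely used form, mel = 2595 * log10(1 + f / 700); the function names and the 0 to 4000 Hz range in the example are also assumptions.

```python
import math


def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels (assumed formula, see lead-in)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(n_points, f_min_hz, f_max_hz):
    """Frequencies (Hz) equally spaced on the mel scale, endpoints included."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_points - 1)
    return [mel_to_hz(lo + i * step) for i in range(n_points)]


# Equal steps in mel become wider and wider steps in Hz,
# mirroring the ear's coarser resolution at high frequencies.
centers = mel_spaced_frequencies(10, 0.0, 4000.0)
gaps = [b - a for a, b in zip(centers, centers[1:])]
```

In a full front end these mel-spaced frequencies would typically serve as the centers of a bank of band filters applied to the FFT magnitudes.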
Front-End Processing of Speech Recognizer
  • Pre-emphasis
    • High-pass filter to compensate for higher-frequency roll-off in human speech
  • Window
    • Separate speech signal into frames
    • Apply window to smooth edges of framed speech signal
  • FFT
    • Transform signal from time domain to frequency domain
    • Human ear perceives sound based on frequency content
  • Mel-Scale
    • Convert linear-scale frequency (Hz) to logarithmic scale (mel scale)
  • log
    • Take the log of the magnitudes (multiplication becomes addition) to allow separation of signals
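The "multiplication becomes addition" point can be checked numerically. This plain-Python sketch assumes the usual source-filter view of speech, in which the observed spectrum is the product of an excitation spectrum and a vocal-tract response; the variable names, the example values, and the small floor that guards against log(0) are all assumptions.

```python
import math


def log_magnitudes(magnitudes, floor=1e-10):
    """Natural log of each magnitude; the floor avoids log(0) on silent bins."""
    return [math.log(max(m, floor)) for m in magnitudes]


# Hypothetical per-bin magnitudes of the two components.
source = [2.0, 4.0, 8.0]
vocal_filter = [0.5, 0.25, 3.0]

# In the spectrum the components are multiplied together...
combined = [s * h for s, h in zip(source, vocal_filter)]

# ...but after the log they are simply added, so they can be pulled apart.
log_combined = log_magnitudes(combined)
log_sum = [a + b for a, b in
           zip(log_magnitudes(source), log_magnitudes(vocal_filter))]
```

Here `log_combined` and `log_sum` agree bin by bin (up to floating-point rounding), which is what makes the later transform able to separate the two contributions.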
Front-End Processing of Speech Recognizer
  • Pre-emphasis
    • High-pass filter to compensate for higher-frequency roll-off in human speech
  • Window
    • Separate speech signal into frames
    • Apply window to smooth edges of framed speech signal
  • FFT
    • Transform signal from time domain to frequency domain
    • Human ear perceives sound based on frequency content
  • Mel-Scale
    • Convert linear-scale frequency (Hz) to logarithmic scale (mel scale)
  • log
    • Take the log of the magnitudes (multiplication becomes addition) to allow separation of signals
  • IFFT
    • Inverse of FFT to transform to the cepstral domain… the result is the set of “features”
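A minimal sketch of the final step, in plain Python for illustration. The slides name an inverse FFT; this example swaps in a type-II discrete cosine transform (DCT), a common stand-in when the input is a real-valued log spectrum, so it should be read as an assumption rather than the project's method. Keeping 13 coefficients follows the slides' "13 features" example; the 26 input values and the function names are hypothetical.

```python
import math


def dct_ii(values):
    """Unnormalized type-II DCT: c[k] = sum_i v[i] * cos(pi * k * (i + 0.5) / N)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n)
            for i, v in enumerate(values))
        for k in range(n)
    ]


def cepstral_features(log_energies, n_features=13):
    """Transform log energies to the cepstral domain and keep the first coefficients."""
    return dct_ii(log_energies)[:n_features]


# A perfectly flat log spectrum has no shape to describe:
# only coefficient 0 (the overall level) is non-zero.
flat_log_spectrum = [1.0] * 26           # hypothetical 26 log band energies
features = cepstral_features(flat_log_spectrum)
```

Low-order coefficients capture the smooth overall shape of the spectrum, which is why truncating to a handful of values still describes the sound well.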
Speech Analysis and Sound Effects (SASE) Project
  • Graphical User Interface (GUI)
  • Speech input
    • Record and save audio
    • Sound file (*.wav, *.ulaw, *.au)
  • Graphs the entire audio signal
  • Select a “frame” by clicking on graph
  • Process speech frame and display output for each stage of processing
  • Displays spectrogram
GUI Components

[Screenshots of the GUI with labeled components: plotting axes and buttons.]

MATLAB Code
  • Graphical User Interface (GUI)
    • GUIDE (GUI Development Environment)
    • Callback functions
    • Work in progress
    • Extendable
  • Stages of speech processing
    • Modular functions for reusability
SASE Lab
  • Interactive teaching tool
  • Demo
Future Work
  • Improve GUI
  • Audio Effects
    • Ex: Echo, Reverberation, Chorus, Flange
  • Noise Filtering
References
  • Ingle, Vinay K., and John G. Proakis. Digital signal processing using MATLAB. 2nd ed. Toronto, Ont.: Nelson, 2007.
  • Oppenheim, Alan V., and Ronald W. Schafer. Discrete-time signal processing. 3rd ed. Upper Saddle River: Pearson, 2010.
  • Weeks, Michael. Digital signal processing using MATLAB and wavelets. Hingham, Mass.: Infinity Science Press, 2007.
  • Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm
Thank you!

Questions?

Unit Plan
  • Introduction
  • Lesson #1: The Sound of a Sine Wave
  • Lesson #2: Frequency Analysis
  • Lesson #3: Filtering (work in progress)
  • Lesson #4: SASE Lab (work in progress)
  • Conclusion