
Knowledge Base approach for spoken digit recognition

Knowledge Base approach for spoken digit recognition. Vijetha Periyavaram. Speech Recognition Systems. Provides a vehicle for communication between people and machines


Presentation Transcript


  1. Knowledge Base approach for spoken digit recognition Vijetha Periyavaram

  2. Speech Recognition Systems • Provides a vehicle for communication between people and machines • The exchange of information with machines is actually the complex product of more than 30 years of research in statistics, physics, linguistics, and computer science. • Characters in science fiction stories have conversed with robots and computers for a long time.

  3. Speech Recognition Systems • We may have shared a few words with a computer, car, or cell phone when it was not working properly • But now these machines can understand and respond because of speech recognition systems

  4. Advantages As a result of speech recognition systems you can • Ask your car for directions • Dial your mobile phone without touching it • Dictate a term instead of typing it on the keyboard • Give commands to a personal organizer, e.g. shut down, pop up the start menu, etc.

  5. Main Concept of Speech Recognition Systems • Speech recognition systems first break down spoken language into phonemes • For example: /w/ as in "we", "quite", "once"; /ch/ as in "much", "nature", "match"; /ou/ as in "no", "boat", "low"; /au/ as in "haul", "bought", "draw" • Almost 40 phonemes in total

  6. Main Concept • The system converts the individual sounds into digitized sound waves, which it matches against a built-in dictionary • The speech recognition system figures out the correct choice through a series of algorithms, or mathematical models, that help narrow down the possibilities to the ones that make the most sense

  7. Proposed Method • The method proposed here for speech recognition is a knowledge base approach for spoken digit recognition • In this method, digitized data is processed using the MATLAB DSP toolbox

  8. Problem definition • To develop a system that can identify an isolated spoken digit based on knowledge developed by analyzing the digits • Analysis is based on the following features, which can be extracted using the MATLAB DSP toolbox • Energy envelope: a plot of the energy of the wave over time • Zero crossing rate: the number of times in a sound sample that the amplitude of the sound wave changes sign
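
The two features named on this slide can be sketched in a few lines. The original work used MATLAB; the following is only an illustrative Python sketch, and the function names `sample_energy` and `zero_crossings` are hypothetical. It assumes that "energy" means squared amplitude per sample and that exact zeros are skipped when looking for sign changes; the slides do not state either detail.

```python
def sample_energy(samples):
    """Return the squared amplitude of each sample (assumed energy measure)."""
    return [s * s for s in samples]


def zero_crossings(samples):
    """Count how many times the waveform changes sign."""
    count = 0
    prev = 0  # last non-zero sample seen so far
    for s in samples:
        if s == 0:
            continue  # assumption: an exact zero carries no sign, so skip it
        if prev != 0 and (s > 0) != (prev > 0):
            count += 1
        prev = s
    return count


wave = [0.5, 0.2, -0.3, -0.1, 0.4, 0.0, -0.2]
print(zero_crossings(wave))  # prints 3
```

Dividing the count by the number of samples (or the segment duration) would turn it into a rate that is comparable across segments of different length.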

  9. Proposed solution • Utterances of different people are studied • A knowledge base for the digits is created from these utterances • Each digit has unique characteristics irrespective of the speaker's nationality, because this method concentrates mainly on the phonemes • The digits are recognized by analyzing a few features of the spoken words • The output is printed on the screen

  10. Scope of the system • Isolated word vocabulary • Unlimited speaker population, unrestricted by age or sex • Computer room speaking environment • Transmission over high quality microphone • No prior training • Single word format with pauses between each spoken input

  11. Technique used • Speech signal is sampled at a particular frequency • End points of isolated words are detected • Data is time normalized • All digits are set to the same number of samples • Zero crossing rate and energy envelope are determined for each segment

  12. Data Acquisition • Records voice from the user using multimedia sound recording equipment • Data is digitized • Sampled at the rate mentioned • Recorded speech is plotted

  13. Recorded wave for digit zero

  14. Filtering the recorded data • Required because of the presence of environmental, system, and inherent microphone noise • Uses an elliptic band pass filter covering the human voice frequency range

  15. Output of filter for digit zero

  16. Plotting energy envelope • Plots the energy of the spoken digit • This is smoothed using the moving average method • A 200-point moving average is chosen • Replaces each sample's amplitude with the average of 200 consecutive samples
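
As a rough illustration of the smoothing step, here is a minimal Python sketch of a 200-point moving average (the original work used MATLAB). The function name `moving_average` is hypothetical. The slides do not say how the window is aligned or how the first samples are handled; this sketch assumes a trailing window and averages over however many samples are available at the start.

```python
def moving_average(values, window=200):
    """Smooth values with a trailing moving average of `window` points."""
    smoothed = []
    running_sum = 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            # drop the sample that just left the window
            running_sum -= values[i - window]
        # assumption: near the start, average over the samples seen so far
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


print(moving_average([1, 2, 3, 4, 5], window=2))  # prints [1.0, 1.5, 2.5, 3.5, 4.5]
```

Keeping a running sum avoids re-adding 200 samples for every output point, which matters little at this signal length but keeps the cost linear.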

  17. Energy envelope for digit zero

  18. Location of start and end points • Start and end points of the envelope of the spoken digit are identified • Criterion: portions where the energy envelope is below 10% of its maximum value are not considered part of the digit
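
The 10% criterion can be sketched as follows in Python (the original work used MATLAB). The function name `find_endpoints` is hypothetical, and the sketch assumes the start and end points are simply the first and last envelope samples at or above the threshold; the slides do not give more detail than the criterion itself.

```python
def find_endpoints(envelope, fraction=0.10):
    """Return (start, end) indices of the first and last samples
    whose value is at least `fraction` of the envelope maximum."""
    if not envelope:
        return None
    threshold = fraction * max(envelope)
    above = [i for i, e in enumerate(envelope) if e >= threshold]
    return above[0], above[-1]


envelope = [0.0, 0.05, 0.5, 1.0, 0.6, 0.05, 0.0]
print(find_endpoints(envelope))  # prints (2, 4)
```

The samples between the two indices (inclusive) would then be passed on to time normalization.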

  19. Actual message for digit zero

  20. Time normalization of data • Envelope is resampled so that the spoken word always contains 6000 samples • Envelope is smoothed using the moving average method
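
Time normalization to a fixed length can be illustrated with a short Python sketch (the original work used MATLAB). The slides do not say which resampling method was used; this sketch substitutes plain linear interpolation as an assumption, and the function name `resample_linear` is hypothetical.

```python
def resample_linear(values, target_len=6000):
    """Resample values to target_len points using linear interpolation
    (an assumed stand-in for whatever resampling the authors used)."""
    n = len(values)
    if n == 0:
        raise ValueError("cannot resample an empty sequence")
    if n == 1 or target_len == 1:
        return [float(values[0])] * target_len
    # spacing between output points, measured in input-sample units
    step = (n - 1) / (target_len - 1)
    out = []
    for k in range(target_len):
        pos = k * step
        i = min(int(pos), n - 2)  # left neighbour, clamped so i + 1 is valid
        frac = pos - i
        out.append(values[i] * (1.0 - frac) + values[i + 1] * frac)
    return out


print(resample_linear([0, 10], target_len=5))  # prints [0.0, 2.5, 5.0, 7.5, 10.0]
```

After this step every utterance has the same number of samples, so peak positions can be compared across speakers who say the digit at different speeds.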

  21. Resampled envelope for digit zero

  22. Various wave forms for digit seven

  23. FLOW CHART Start → Wave Recording (8000 samples) → Filtering (band pass of 400–3200 Hz) → Energy Calculation

  24. FLOW CHART Smoothing (moving average filter of 200 points) → Determination of start and end points → Calculation of zero crossing rate → Resampling to 6000 samples

  25. FLOW CHART Calculation of number of peaks and peak positions → Classification Algorithm → Stop

  26. Setting up the knowledge base • Number of peaks in energy envelope of spoken digit • Energy peak level • Energy peak positions • Zero crossing rate for each segment

  27. Classification • First sweep: count the number of peaks in the energy envelope (single peak, two peaks, or three peaks) • Second sweep: peak positions • Third sweep: zero crossing rate
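
The first two sweeps depend on finding peaks in the smoothed energy envelope. A minimal Python sketch of that step follows (the original work used MATLAB). The function name `find_peaks` and the `min_fraction` height cutoff of 0.3 are hypothetical choices for illustration only; the slides do not describe how peaks were detected or what the knowledge base thresholds were.

```python
def find_peaks(envelope, min_fraction=0.3):
    """Return indices of strict local maxima that reach at least
    min_fraction of the global maximum (cutoff value is an assumption)."""
    if len(envelope) < 3:
        return []
    floor = min_fraction * max(envelope)
    peaks = []
    for i in range(1, len(envelope) - 1):
        higher_than_neighbours = (
            envelope[i] > envelope[i - 1] and envelope[i] > envelope[i + 1]
        )
        if higher_than_neighbours and envelope[i] >= floor:
            peaks.append(i)
    return peaks


envelope = [0, 1, 3, 1, 0.5, 2, 0.5, 0.2, 0.4, 0.1]
peaks = find_peaks(envelope)
print(len(peaks), peaks)  # prints 2 [2, 5]
```

Here `len(peaks)` would feed the first sweep and the indices in `peaks` the second sweep. Because the comparison is strict, a flat-topped peak is ignored, which is a limitation of this sketch.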

  28. Classification

  29. Results • The system was tested for 100 different human voice signals and the success rate was 89% • The final output was displayed on the monitor as well as on the LCD screen • The response time was 7 seconds.

  30. A Few More Applications • Speaker identification system • Security systems • Robot control • Bank transactions • Aircraft control system • Stock price quotation system
