uSpeak2Me Esol Android Tool

Presentation Transcript


  1. uSpeak2Me Esol Android Tool • Speech Recognition ECE5526 • Wilson Burgos

  2. Outline • Introduction • Objective • Existing Solutions • Implementation • Test and Result • Conclusion

  3. Introduction • Large sums of money are spent annually to improve the language skills of non-native speakers • Classes for ESOL (English for Speakers of Other Languages) • Lack of effective tools • Speech recognition can help in some areas

  4. Objective • Create a tool that helps people learn to speak English correctly in an effective way • Engage people using new technology (smartphones) • Uses pocketsphinx, Android and text-to-speech technology • Simple and intuitive to use • Fun

  5. Existing Solutions • EyeSpeak - http://www.eyespeakenglish.com • Pros: uses native speakers; scores pronunciation, pitch, timing and loudness • Cons: difficult to use; runs only on Windows

  6. Concept of Operation • The user starts the game from the main menu • The game screen leads the user through a series of words and logs the number of positive responses (the score) • Each word has a corresponding graphic to display. For example, the game might show the user a picture of a mountain • The user has at most 30 seconds to respond
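
The flow on this slide can be sketched in plain Java as follows. This is only an illustration under assumptions: the class name `QuizRoundSketch`, its methods, and the case-insensitive whole-word comparison are hypothetical and are not taken from the presentation; only the 30-second limit and the idea of counting positive responses come from the slide.

```java
import java.util.List;

class QuizRoundSketch {
    // "At most 30 seconds to respond", as stated on the slide.
    static final long TIME_LIMIT_MILLIS = 30_000;

    private final List<String> words; // words shown to the user, in order
    private int index = 0;            // position of the current word
    private int score = 0;            // number of positive responses

    QuizRoundSketch(List<String> words) {
        this.words = words;
    }

    boolean hasNext() {
        return index < words.size();
    }

    String currentWord() {
        return words.get(index);
    }

    /**
     * Records one response and advances to the next word.
     * Returns true if the response counted as positive.
     * The matching rule here is an assumption, not the author's.
     */
    boolean respond(String heard, long elapsedMillis) {
        boolean ok = elapsedMillis <= TIME_LIMIT_MILLIS
                && heard != null
                && heard.trim().equalsIgnoreCase(currentWord());
        if (ok) {
            score++;
        }
        index++;
        return ok;
    }

    int score() {
        return score;
    }
}
```

In an Android activity, `respond` would presumably be called from the recognizer callback, with the elapsed time taken from the game timer.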

  7. Development Environment • Eclipse IDE with Android plugin • Cygwin • Emulator • QEMU-based ARM emulator • Runs the same image as the device • Limitations • No Camera support

  8. Development Environment • Actual Device

  9. Implementation • Using Java with the Android SDK • Pocketsphinx • Lightweight speech recognition decoder library • Implemented in C

  10. Android Architecture

  11. Application Building Blocks • Activity • IntentReceiver • Service • ContentProvider

  12. Application Architecture

  13. Implementation • QuizGameActivity • The screen at the heart of the application: the game play screen • This screen prompts the user to answer a series of trivia questions and stores the resulting score information • Uses text-to-speech technology to speak the word when in simple mode

  14. Implementation RecognizerTask AudioTask PocketSphinx VUMeter

  15. Implementation • RecognizerTask • Interfaces directly with the pocketsphinx library using JNI calls • The Java Native Interface (JNI) enables the integration of code written in the Java programming language with code written in other languages such as C and C++ • Consumes data from the audio queue, which is produced by the AudioTask • Calls process_raw • Scoring • Based on positive detection of the utterance
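
The producer/consumer relationship between AudioTask and RecognizerTask described above might look roughly like the sketch below. It is a plain-Java stand-in, not the application's code: `AudioQueueSketch`, the `Decoder` interface, and the end-of-stream sentinel are hypothetical, silent buffers replace microphone data, and the real decoder call would go through JNI into pocketsphinx.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class AudioQueueSketch {
    // Sentinel buffer telling the consumer that no more audio is coming.
    static final short[] END_OF_STREAM = new short[0];

    /** Hypothetical stand-in for the native decoder reached through JNI. */
    interface Decoder {
        void processRaw(short[] samples, int length);
    }

    /** Producer: plays the role of AudioTask, queuing fixed-size buffers. */
    static Thread startProducer(BlockingQueue<short[]> queue, int buffers, int samplesPerBuffer) {
        Thread t = new Thread(() -> {
            try {
                for (int i = 0; i < buffers; i++) {
                    queue.put(new short[samplesPerBuffer]); // silence instead of real audio
                }
                queue.put(END_OF_STREAM);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    /** Consumer: plays the role of RecognizerTask; returns samples processed. */
    static int consume(BlockingQueue<short[]> queue, Decoder decoder) throws InterruptedException {
        int total = 0;
        while (true) {
            short[] buf = queue.take(); // blocks until the producer delivers data
            if (buf == END_OF_STREAM) {
                break;
            }
            decoder.processRaw(buf, buf.length);
            total += buf.length;
        }
        return total;
    }

    /** Runs producer and consumer together; returns samples processed, or -1 if interrupted. */
    static int runDemo(int buffers, int samplesPerBuffer) {
        BlockingQueue<short[]> queue = new LinkedBlockingQueue<>();
        Thread producer = startProducer(queue, buffers, samplesPerBuffer);
        try {
            int total = consume(queue, (samples, length) -> { /* decoder stub: does nothing */ });
            producer.join();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }
}
```

A blocking queue keeps the two tasks decoupled: the audio thread never waits on decoding, and the recognizer thread sleeps until data arrives.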

  16. Implementation • AudioTask • Interfaces directly with the audio peripherals to gather data • Format: 8000 Hz sample rate, 16-bit PCM, 8192k buffer

  17. PocketSphinx • Very limited documentation • Packaged pocketsphinx as a shared library • Created a Java counterpart library (jar) to be added to the Android application • Compiled using Cygwin • Initialized with a custom dictionary and language model • Speak2me.dic • Speak2me.lm • Loaded at startup from Java code

  18. Limitations • Hardware memory • In the sphinx4 demos the recognizer was active all the time, gathering data. On the device, the AudioRecord buffer fills up, which prevents the recognizer from being active all the time. • The game needs to be responsive, so how can this problem be solved?

  19. Limitations • Hardware memory • The VUMeter class calculates the energy of the sampled data, removing the DC offset with a filter • Detection logic was added to trigger the end of an utterance automatically, with configurable lock/unlock thresholds • The game timer automatically starts the recognizer after every given word • Device speed • To improve detection, the application uses partial results to determine whether a match has been found, and does not penalize the user if a partial result is incorrect
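
The energy measurement and lock/unlock logic described on this slide could be approximated as in the sketch below. It is not the VUMeter class itself: `EnergyEndpointSketch` and its methods are hypothetical, the DC offset is removed by subtracting the per-frame mean rather than by the filter the slide mentions, and the threshold values in the usage are arbitrary.

```java
class EnergyEndpointSketch {
    private final double lockThreshold;   // energy at or above this marks the start of speech
    private final double unlockThreshold; // energy at or below this, after speech, marks its end
    private boolean inSpeech = false;

    EnergyEndpointSketch(double lockThreshold, double unlockThreshold) {
        this.lockThreshold = lockThreshold;
        this.unlockThreshold = unlockThreshold;
    }

    /** Mean squared amplitude of a frame after subtracting its mean (the DC offset). */
    static double frameEnergy(short[] frame) {
        if (frame.length == 0) {
            return 0.0;
        }
        double mean = 0.0;
        for (short s : frame) {
            mean += s;
        }
        mean /= frame.length;
        double sum = 0.0;
        for (short s : frame) {
            double d = s - mean;
            sum += d * d;
        }
        return sum / frame.length;
    }

    /** Feeds one frame; returns true only on the frame where the utterance ends. */
    boolean update(short[] frame) {
        double energy = frameEnergy(frame);
        if (!inSpeech) {
            if (energy >= lockThreshold) {
                inSpeech = true; // "lock": speech has started
            }
            return false;
        }
        if (energy <= unlockThreshold) {
            inSpeech = false;    // "unlock": speech has ended
            return true;
        }
        return false;
    }
}
```

Using two thresholds instead of one gives hysteresis, so brief dips in loudness in the middle of a word do not end the utterance early.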

  20. Screenshots

  21. Test and Results • Recognition with cmu07a.dic was very poor • hub4_wsj_sc_3s_8k.cd_semi_5000 • TOTAL Words: 91 Correct: 56 Errors: 46 • TOTAL Percent correct = 61.54% • Error = 50.55% Accuracy = 49.45% • TOTAL Insertions: 11 Deletions: 3 Substitutions: 32 • hub4_wsj_sc_3s_8k.cd_semi_5000adapt • TOTAL Words: 91 Correct: 71 Errors: 25 • TOTAL Percent correct = 78.02% • Error = 27.47% Accuracy = 72.53% • TOTAL Insertions: 5 Deletions: 9 Substitutions: 11
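
The figures above are consistent with the usual word-level scoring formulas, shown in the sketch below. The class `RecognitionStats` is hypothetical and the formulas are an assumption about how the scoring tool derived its totals, but they do reproduce the numbers on the slide: correct = words - deletions - substitutions, errors = insertions + deletions + substitutions, and accuracy = 100% - error rate.

```java
class RecognitionStats {
    /** Reference words that were recognized correctly. */
    static int correct(int words, int deletions, int substitutions) {
        return words - deletions - substitutions;
    }

    /** Total errors: insertions, deletions and substitutions all count. */
    static int errors(int insertions, int deletions, int substitutions) {
        return insertions + deletions + substitutions;
    }

    /** Percent of reference words recognized correctly (ignores insertions). */
    static double percentCorrect(int words, int deletions, int substitutions) {
        return 100.0 * correct(words, deletions, substitutions) / words;
    }

    /** Word error rate in percent, relative to the number of reference words. */
    static double errorRate(int words, int insertions, int deletions, int substitutions) {
        return 100.0 * errors(insertions, deletions, substitutions) / words;
    }

    /** Accuracy in percent: 100 minus the word error rate. */
    static double accuracy(int words, int insertions, int deletions, int substitutions) {
        return 100.0 - errorRate(words, insertions, deletions, substitutions);
    }
}
```

For the unadapted model (91 words, 11 insertions, 3 deletions, 32 substitutions) this gives 56 correct words and 46 errors; for the adapted model (5 insertions, 9 deletions, 11 substitutions) it gives 71 correct words and 25 errors. Because insertions count as errors but not against "percent correct", accuracy is lower than percent correct in both runs.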

  22. Test and Results • Using the custom corpus and a custom language model, the tool detects speech accurately and in a timely fashion (about 2 s)

  23. Installation • The custom lexicon (dictionary) and language model files must be installed

  24. Future Additions • Adapt scoring based on pitch and phoneme recognition. • Add different levels of difficulty • Show progress reports

  25. References • http://developer.android.com • http://sites.google.com/site/io • Sams Teach Yourself Android Application Development, Lauren Darcey & Shane Conder (2010)
