
Text to Speech Synthesizer Presentation for COSC 683: Software Engineering Practicum by Kishore Chidarapu E00652529


Presentation Transcript


    1. Text to Speech Synthesizer Presentation for COSC 683: Software Engineering Practicum by Kishore Chidarapu E00652529

    2. Introduction; Project Beneficiaries; General Objectives; Modules; Java Speech API; Interfaces and Methods; Screenshots; “Demo”; Future Enhancements; Thank you

    3. Introduction to the Company: This project was carried out at Capgemini under the guidance of Mr. Srini Kancha, Senior Project Manager. Capgemini serves industries such as automotive, consumer products, financial services, health, and retail. This project falls under consumer products.

    4. Project Beneficiaries: Visually impaired people; potential users of computer systems; students and researchers; IT companies; the general public.

    5. General Objective: This project develops an application program that reads aloud input text typed at a source.

    6. Modules: Structure analysis. Process the input text to determine where paragraphs, sentences, and other structures start and end. For most languages, punctuation and formatting data are used in this stage.
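
As a rough illustration of the structure analysis stage, the sketch below splits text into paragraphs on blank lines and into sentences with the standard `java.text.BreakIterator`. The class and method names are hypothetical and not taken from the project; a real synthesizer would likely use richer formatting cues.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of structure analysis; not the project's actual code.
class StructureAnalysisSketch {

    // Assumption: paragraphs are separated by one or more blank lines.
    static List<String> paragraphs(String text) {
        List<String> out = new ArrayList<>();
        for (String p : text.split("\\n\\s*\\n")) {
            if (!p.trim().isEmpty()) {
                out.add(p.trim());
            }
        }
        return out;
    }

    // Uses the JDK's locale-aware sentence boundary rules (punctuation based).
    static List<String> sentences(String paragraph) {
        List<String> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(paragraph);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = paragraph.substring(start, end).trim();
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you?\n\nThis is a second paragraph.";
        for (String p : paragraphs(text)) {
            System.out.println("Paragraph: " + sentences(p));
        }
    }
}
```

Punctuation-only rules can misplace boundaries around abbreviations, which is one reason the text preprocessing stage exists.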

    7. Modules: Text preprocessing. Analyze the input text for special constructs of the language. In English, special treatment is required for abbreviations, acronyms, dates, times, numbers, currency amounts, email addresses, and many other forms.
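
A minimal sketch of text preprocessing, assuming only two of the constructs listed above are handled: a small abbreviation table and whole numbers from 0 to 999. The table entries and all names are illustrative assumptions; dates, times, currency, and email addresses would each need their own rules.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical normalizer: expands a few abbreviations and small numbers into words.
class TextNormalizerSketch {

    // Tiny illustrative table; a real system would use a much larger one.
    private static final Map<String, String> ABBREVIATIONS = new HashMap<>();
    static {
        ABBREVIATIONS.put("Mr.", "Mister");
        ABBREVIATIONS.put("Dr.", "Doctor");
        ABBREVIATIONS.put("etc.", "et cetera");
    }

    private static final String[] ONES = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };
    private static final String[] TENS = {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"
    };

    // Spells out an integer in the range 0..999.
    static String numberToWords(int n) {
        if (n < 20) {
            return ONES[n];
        }
        if (n < 100) {
            return TENS[n / 10] + (n % 10 != 0 ? "-" + ONES[n % 10] : "");
        }
        return ONES[n / 100] + " hundred" + (n % 100 != 0 ? " " + numberToWords(n % 100) : "");
    }

    // Replaces known abbreviations and 1-3 digit tokens; leaves other tokens unchanged.
    static String normalize(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {
            String spoken;
            if (ABBREVIATIONS.containsKey(token)) {
                spoken = ABBREVIATIONS.get(token);
            } else if (token.matches("\\d{1,3}")) {
                spoken = numberToWords(Integer.parseInt(token));
            } else {
                spoken = token;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(spoken);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Mr. Smith paid 42 dollars"));
    }
}
```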

    8. Modules: Text-to-phoneme conversion. Convert each word to phonemes. A phoneme is a basic unit of sound in a language. US English has around 45 phonemes, including the consonant and vowel sounds. Different languages have different sets of sounds (different phonemes).
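
One common way to convert words to phonemes is a pronunciation dictionary lookup, sketched below. The three-word lexicon and the ARPAbet-style symbols (without stress marks) are a tiny illustrative sample, and the `<unk:...>` placeholder is an assumption standing in for the letter-to-sound rules a real system would apply to unknown words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical dictionary-based text-to-phoneme lookup.
class PhonemeLookupSketch {

    // Illustrative sample lexicon; a usable one would hold many thousands of words.
    private static final Map<String, List<String>> LEXICON = new HashMap<>();
    static {
        LEXICON.put("hello", Arrays.asList("HH", "AH", "L", "OW"));
        LEXICON.put("world", Arrays.asList("W", "ER", "L", "D"));
        LEXICON.put("speech", Arrays.asList("S", "P", "IY", "CH"));
    }

    // Lowercases the text, splits it into words, and looks each word up.
    static List<String> toPhonemes(String text) {
        List<String> out = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.US).split("[^a-z']+")) {
            if (word.isEmpty()) {
                continue;
            }
            List<String> phonemes = LEXICON.get(word);
            if (phonemes != null) {
                out.addAll(phonemes);
            } else {
                // Placeholder for words outside the sample lexicon.
                out.add("<unk:" + word + ">");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toPhonemes("Hello, world!"));
    }
}
```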

    9. Modules: Prosody analysis. Process the sentence structure, words, and phonemes to determine appropriate prosody for the sentence. This includes the pitch (or melody), the timing (or rhythm), the pausing, the speaking rate, the emphasis on words, and many other features.
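
To make the idea of prosody slightly more concrete, the sketch below derives two of the features mentioned (pausing and a coarse pitch direction) from punctuation alone. The millisecond values are arbitrary placeholders, the names are hypothetical, and the rising-pitch rule is a simplification, since not every question ends with rising intonation.

```java
// Hypothetical punctuation-driven prosody hints; values are illustrative only.
class ProsodySketch {

    // Returns a pause length in milliseconds to insert after the given token.
    static int pauseMillisAfter(String token) {
        if (token.isEmpty()) {
            return 0;
        }
        char last = token.charAt(token.length() - 1);
        switch (last) {
            case ',':
                return 150;
            case ';':
            case ':':
                return 250;
            case '.':
            case '!':
            case '?':
                return 400;
            default:
                return 0;
        }
    }

    // Simplifying assumption: sentences ending in '?' get a rising final pitch.
    static boolean risingFinalPitch(String sentence) {
        return sentence.trim().endsWith("?");
    }

    public static void main(String[] args) {
        for (String token : "Well, is it ready?".split("\\s+")) {
            System.out.println(token + " -> pause " + pauseMillisAfter(token) + " ms");
        }
        System.out.println("rising: " + risingFinalPitch("Well, is it ready?"));
    }
}
```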

    10. Modules: Waveform production. The phonemes and prosody information are used to produce the audio waveform for each sentence. Current systems do this either by concatenating chunks of recorded human speech or by formant synthesis.
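
The concatenation approach can be sketched in its simplest form as joining arrays of 16-bit PCM samples end to end. The class name is hypothetical and the sample values in `main` are made up; a practical concatenative synthesizer would also select units from a recorded database and smooth the joins, which this sketch does not attempt.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: plain end-to-end concatenation of recorded speech units.
class ConcatenationSketch {

    // Copies each unit's samples, in order, into one output buffer.
    static short[] concatenate(List<short[]> units) {
        int total = 0;
        for (short[] unit : units) {
            total += unit.length;
        }
        short[] out = new short[total];
        int pos = 0;
        for (short[] unit : units) {
            System.arraycopy(unit, 0, out, pos, unit.length);
            pos += unit.length;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for recorded chunks.
        short[] first = {100, 200, 300};
        short[] second = {-50, -25};
        System.out.println(Arrays.toString(concatenate(Arrays.asList(first, second))));
    }
}
```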

    11. Java Speech API: The Java Speech API enables developers of speech-enabled applications to incorporate more sophisticated and natural user interfaces into Java applications and applets. Two core speech technologies are supported through the Java Speech API: speech recognition and speech synthesis.

    12. Design Goals for the Java Speech API: Provide support for speech synthesizers and for both command-and-control and dictation speech recognizers. Provide a robust cross-platform, cross-vendor interface to speech synthesis and speech recognition. Enable access to state-of-the-art speech technology. Support integration with other capabilities of the Java platform, including the suite of Java Media APIs. Be simple, compact, and easy to learn.

    13. Interfaces and Methods: Class com.sun.speech.freetts.Voice: allocate(), deallocate(), getDomain(), speak(). Class com.sun.speech.freetts.VoiceManager: getInstance(), getVoice().

    17. “DEMO”

    18. Future Enhancements: The TTS conversion tool will be tested on various types of text for naturalness and accuracy, and examined by linguistic experts to achieve more correct pronunciation. The outcomes of these examinations will be incorporated into the TTS.

    19. Future Enhancements: A new button named “Image converter” will be added. It can be used to recognize the characters of languages such as Chinese, Japanese, and Mongolian. Each character of that language is recognized, converted into English, and then converted into speech.

    20. Future Enhancements: The speech generated from the input text is stored as a wave file. The resulting audio can then be streamed to a destination, which can be any computer on a network.
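
Storing audio as a wave file can be done with the standard `javax.sound.sampled` package, as in the sketch below. Since no synthesizer output is available here, a generated sine tone stands in for speech samples; the 16 kHz mono 16-bit format, the output file name, and all class and method names are assumptions. Streaming to another computer is not covered.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

// Hypothetical sketch: write 16-bit mono PCM samples to a WAV file.
class WaveFileSketch {

    // Assumed sample rate; the project's actual audio format is not stated.
    static final float SAMPLE_RATE = 16000f;

    // Generates a sine tone as a stand-in for synthesized speech samples.
    static short[] sineTone(double frequencyHz, double seconds) {
        int n = (int) (SAMPLE_RATE * seconds);
        short[] samples = new short[n];
        for (int i = 0; i < n; i++) {
            double angle = 2.0 * Math.PI * frequencyHz * i / SAMPLE_RATE;
            samples[i] = (short) (Math.sin(angle) * 0.5 * Short.MAX_VALUE);
        }
        return samples;
    }

    // Packs samples as little-endian bytes and writes them with a WAV header.
    static void writeWav(short[] samples, File target) throws IOException {
        byte[] bytes = new byte[samples.length * 2];
        for (int i = 0; i < samples.length; i++) {
            bytes[2 * i] = (byte) (samples[i] & 0xFF);
            bytes[2 * i + 1] = (byte) ((samples[i] >> 8) & 0xFF);
        }
        // 16-bit, one channel, signed, little-endian PCM.
        AudioFormat format = new AudioFormat(SAMPLE_RATE, 16, 1, true, false);
        try (AudioInputStream in = new AudioInputStream(
                new ByteArrayInputStream(bytes), format, samples.length)) {
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, target);
        }
    }

    public static void main(String[] args) throws IOException {
        File out = new File("tone.wav"); // hypothetical output path
        writeWav(sineTone(440.0, 1.0), out);
        System.out.println("Wrote " + out.length() + " bytes to " + out.getPath());
    }
}
```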

    21. Thank you one and all
