
Design and Implementation of Voice Conversion Application (VOCAL)

Elizabeth Kwan (26406025). Supervised by: Ms. Liliana, M.Eng, and Mr. Resmana Lim, M.Eng. Voice conversion is a method to transform the input speech signal so that the output signal is perceived as produced by another speaker.


Presentation Transcript


  1. Design and Implementation of Voice Conversion Application (VOCAL). Elizabeth Kwan (26406025). Supervised by: Ms. Liliana, M.Eng, and Mr. Resmana Lim, M.Eng.

  2. DEFINITION: What is Voice Conversion? A method to transform the input speech signal so that the output signal is perceived as produced by another speaker.

  3. BACKGROUND: Why Voice Conversion? • Speech technology is developing rapidly • Speech recognition and text-to-speech have been the research priorities for improving human-machine (computer) interaction • Voice conversion can improve the naturalness of human-machine (computer) interaction • Voice conversion is used in the personification of speech-enabled systems

  4. SCOPE & LIMITATION: Scope and limitation of project. GENERAL: • Format: wave file (.wav), single channel (mono). INPUT: • Source speaker and target speaker speak the same utterances • Home recording • One person, with minimal noise (no background sound) • Speech only

  5. SCOPE & LIMITATION: Scope and limitation of project. PROCESS: • Not real-time; pre-recorded speech is needed • Text-dependent. OUTPUT: • The output signal will be perceived as produced by another speaker, as judged subjectively by human auditory perception • Dialect not included

  6. SCOPE & LIMITATION: Scope and limitation of project. • Tested using Mean Opinion Score (MOS) • Developed in the .NET environment (C# .NET, Visual Studio 2005)

  7. VOICE CONVERSION METHOD: Brief explanation on Voice Conversion. Different conversion systems use different methods. A general system has: • A method to represent the speaker-specific characteristics of the speech waveform • A method to map the source and target acoustical spaces • A method to modify the characteristics of the source speech using the mapping obtained in the previous step

  8. VOICE CONVERSION METHOD: see Page 33

  9. VOICE CONVERSION METHOD: Main Process (Flow Chart see Page 30). SEGMENTATION, ANALYSIS or MODELING, TRANSFORMATION, SYNTHESIS

  10. WHY IS IT DIFFICULT? External Problems: Complexity of human language. Speech is more than sequences of phones that form words and sentences. It carries information (rhythm, intonation, word stress, etc.). This information varies from one person to another. The infinite variety raises the application's complexity, especially in segmentation.

  11. WHY IS IT DIFFICULT? External Problems: Speaker Variability. Each voice is unique, and speech produced by one person may vary too: - Realization - Speaking style - Sex of speaker - Anatomy of vocal tract - Speed of speech - Dialects

  12. WHY IS IT DIFFICULT? Internal Problems: The digital form contains only amplitude information per period • Amplitude cannot be used directly to determine the speech parameters (a problem for the analysis process) • Manipulating (adding or deleting) part of the sound affects the whole sound

  13. VOICE CONVERSION METHOD: Main Process (Flow Chart see Page 30). SEGMENTATION, ANALYSIS or MODELING, TRANSFORMATION, SYNTHESIS

  14. SEGMENTATION: Flow Chart see Page 34. It is difficult to process an entire phrase, as tone, pitch, and other characteristics may vary over the whole signal • Split based on syllables • Use end-point detection methods, combining volume (two volume thresholds) and zero-crossing rate (ZCR)

  15. SEGMENTATION: Flow Chart see Page 34. • Volume: loudness of the audio signal • Zero-Crossing Rate (ZCR): rate at which the signal changes from positive to negative, and vice versa
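
The segmentation idea on slides 14 and 15 can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the function names, the use of the sum of absolute amplitudes as "volume", and the way the ZCR step widens the boundaries are all assumptions made for the example.

```python
def split_frames(signal, frame_len, hop):
    """Split a list of samples into frames of frame_len, advancing by hop."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def volume(frame):
    """Volume as the sum of absolute sample values (one common definition)."""
    return sum(abs(s) for s in frame)


def zero_crossing_rate(frame):
    """Count sign changes between consecutive samples (zero counts as positive)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def detect_endpoints(vols, zcrs, high, low, zcr_thresh):
    """Return (start, end) frame indices of the first speech segment, or None.

    high, low and zcr_thresh are hypothetical tuning parameters.
    """
    # 1. Find the first frame whose volume exceeds the upper threshold.
    core = next((i for i, v in enumerate(vols) if v > high), None)
    if core is None:
        return None
    start = end = core
    # 2. Expand outwards while the volume stays above the lower threshold.
    while start > 0 and vols[start - 1] > low:
        start -= 1
    while end < len(vols) - 1 and vols[end + 1] > low:
        end += 1
    # 3. Expand further while the ZCR is high (likely quiet, noisy consonants).
    while start > 0 and zcrs[start - 1] > zcr_thresh:
        start -= 1
    while end < len(vols) - 1 and zcrs[end + 1] > zcr_thresh:
        end += 1
    return start, end
```

The upper threshold finds frames that are clearly speech, the lower threshold recovers the quieter edges, and the ZCR step is one way to keep low-volume frames that still look like speech.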

  16. SEGMENTATION: Flow Chart see Page 34

  17. VOICE CONVERSION METHOD: Main Process (Flow Chart see Page 30). SEGMENTATION, ANALYSIS or MODELING, TRANSFORMATION, SYNTHESIS

  18. ANALYSIS OR MODELING: Main Process (Flow Chart see Page 36). ANALYSIS or MODELING: Linear Predictive Coding, Pitch Period Computation

  19. ANALYSIS OR MODELING: Main Process (Flow Chart see Page 36). ANALYSIS or MODELING: Linear Predictive Coding, Pitch Period Computation

  20. ANALYSIS OR MODELING: Modeling Vocal Tract

  21. ANALYSIS OR MODELING: Modeling Vocal Tract. Source: signal x(t) [excitation signal]. Filter: linear time-invariant h(t) [impulse response]. Speech: convolution of source and filter, y(t) = x(t) * h(t)
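
In discrete time, the source-filter relation on slide 21 becomes a sum over samples. The sketch below is only an illustration of that convolution; the function name and the toy pulse train and filter values are made up for the example.

```python
def convolve(x, h):
    """Discrete convolution y[n] = sum over k of x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


# Hypothetical excitation: one pulse every 4 samples (a crude "glottal" source).
source = [1, 0, 0, 0, 1, 0, 0, 0]
# Hypothetical short decaying impulse response standing in for the vocal tract.
vocal_tract = [1.0, 0.5, 0.25]

speech = convolve(source, vocal_tract)
```

Each pulse in the source is replaced by a copy of the filter's response, which is why recovering the source and the filter from the speech alone requires a de-convolution step.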

  22. ANALYSIS OR MODELING: Modeling Vocal Tract. De-convolution is needed to separate the source from the filter. LPC methods are used: a sample of the speech signal is predicted from several previous samples.
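
One standard way to obtain the prediction coefficients is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which LPC variant the application uses, so the following is a sketch under that assumption, with the convention that sample x[n] is predicted as a[0]*x[n-1] + a[1]*x[n-2] + ...

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n - k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for the given autocorrelation values.

    Returns (coeffs, error), where coeffs[k - 1] weights x[n - k] in the
    prediction and error is the remaining prediction error energy.
    """
    a = [0.0] * (order + 1)          # a[0] is unused, kept for 1-based indexing
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:                 # silent or perfectly predicted frame
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def lpc(frame, order):
    """LPC coefficients of one frame via the autocorrelation method."""
    return levinson_durbin(autocorrelation(frame, order), order)
```

For a signal in which every sample is 0.9 times the previous one, `lpc(frame, 1)` returns a single coefficient very close to 0.9.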

  23. ANALYSIS OR MODELING: Linear Predictive Coding

  24. VOICE CONVERSION METHOD: Main Process (Flow Chart see Page 36). ANALYSIS or MODELING: Linear Predictive Coding, Pitch Period Computation

  25. VOICE CONVERSION METHOD: Main Process (Flow Chart see Page 36). Pitch Period Computation: Pitch Analysis, Glottal Pulse Computation, Pitch Tier Computation

  26. ANALYSIS OR MODELING: Pitch Period Computation. Pitch Analysis: based on the autocorrelation method (Boersma 1993)
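
The core idea of autocorrelation-based pitch analysis can be sketched as below. This is a plain peak-picking version, not Boersma's full 1993 algorithm (which adds window correction, multiple candidates per frame, and path finding); the function name, the search range, and the voicing threshold are assumptions for illustration.

```python
def estimate_pitch(frame, sample_rate, f_min=75.0, f_max=600.0,
                   voicing_threshold=0.3):
    """Return an estimated pitch in Hz for one frame, or None if unvoiced.

    Picks the lag with the highest normalised autocorrelation inside the
    lag range that corresponds to f_min..f_max.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    energy = sum(s * s for s in frame)
    if energy == 0 or lag_max <= lag_min:
        return None
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[n] * frame[n - lag]
                for n in range(lag, len(frame))) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing_threshold:
        return None                  # no clear periodicity: treat as unvoiced
    return sample_rate / best_lag
```

A voiced frame repeats itself after one pitch period, so its autocorrelation peaks at that lag; dividing the sample rate by the lag gives the pitch in Hz.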

  27. ANALYSIS OR MODELING: Pitch Period Computation. Glottal Pulse Computation: repeated pattern of voiced sound; τ: glottal pulse

  28. ANALYSIS OR MODELING: Pitch Period Computation. Pitch Tier: the total number of points is calculated from the total number of voiced frames in the pitch contour obtained in the previous step

  29. VOICE CONVERSION METHOD: Main Process (Flow Chart see Page 30). SEGMENTATION, ANALYSIS or MODELING, TRANSFORMATION, SYNTHESIS

  30. TRANSFORMATION: Transform the speech parameters obtained

  31. SYNTHESIS: Main Process (Flow Chart see Page 30). SEGMENTATION, ANALYSIS or MODELING, TRANSFORMATION, SYNTHESIS

  32. SYNTHESIS: Flow Chart see Page 46. An LPC filter is used to reconstruct the transformed speech
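
LPC synthesis is commonly done by driving an all-pole filter, built from the prediction coefficients, with an excitation signal. The sketch below assumes the convention that x[n] is predicted as a[0]*x[n-1] + a[1]*x[n-2] + ...; the function name and the example values are hypothetical and not taken from the application.

```python
def lpc_synthesize(excitation, coeffs):
    """All-pole synthesis: y[n] = e[n] + sum over k of coeffs[k-1] * y[n-k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = float(e)
        for k, ak in enumerate(coeffs, start=1):
            if n - k >= 0:           # samples before the start count as zero
                acc += ak * y[n - k]
        y.append(acc)
    return y


# Hypothetical example: a single pulse through a first-order filter.
pulse = [1, 0, 0, 0]
decay = lpc_synthesize(pulse, [0.5])
```

Changing the spacing of the excitation pulses changes the pitch of the output, while changing the coefficients changes the vocal-tract shape, which is what lets the two be transformed separately before synthesis.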

  33. EXPERIMENTAL RESULT

  34. TESTING: Effect of choice of hardware used to record

  35. TESTING: Effect of choice of hardware used to record

  36. TESTING: Effect of choice of hardware used to record

  37. TESTING: Test on segmentation. Speech: “Hai” from 4 (four) different speakers

  38. TESTING: Test on segmentation. Speech: “Hai” from 4 (four) different speakers

  39. TESTING: Test on segmentation. Speech: “Hai” from 4 (four) different speakers. Percentage result: for speech with only 1 (one) syllable: 100% success

  40. TESTING: Test on segmentation. Speech: “Saya” from 4 (four) different speakers

  41. TESTING: Test on segmentation. Speech: “Saya” from 4 (four) different speakers

  42. TESTING: Test on segmentation. Speech: “Saya” from 4 (four) different speakers. Percentage result: for speech with 2 (two) syllables without a pause: 0% success (all detected as 1 (one) syllable only). But it works well in the application: 100% success

  43. TESTING: Test on segmentation. Speech: “Sistem Cerdas” from 4 (four) different speakers

  44. TESTING: Test on segmentation. Speech: “Sistem Cerdas” from 4 (four) different speakers

  45. TESTING: Test on segmentation. Speech: “Sistem Cerdas” from 4 (four) different speakers. Percentage result: for speech with more complex forms: 50% success. Related to Speaker Variability

  46. TESTING: Test on pitch modification

  47. TESTING: Test on pitch modification. Average percentage result: 98.67%

  48. TESTING: Subjectivity Test. Similarity (based on human auditory perception) • Test on 20 people, 5 utterances • Overall result: 3.71 out of 5.0
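
A Mean Opinion Score is the arithmetic mean of the listeners' ratings on a 1 to 5 scale. The sketch below shows only the calculation; the ratings in it are invented for illustration and are not the data behind the result reported on the slide.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of 1..5 listener ratings into a single MOS value."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1..5 scale")
    return mean(ratings)


# Hypothetical ratings from five listeners for one converted utterance.
example_ratings = [4, 3, 5, 4, 3]
example_mos = mean_opinion_score(example_ratings)
```

An overall score can be obtained the same way, by pooling the ratings of all listeners over all utterances before averaging.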

  49. TESTING: Subjectivity Test. Based on gender • Test on 22 people, 2 utterances • 4 gender combinations for each utterance

  50. TESTING: Subjectivity Test. Similarity of speaker characteristics • Test on 22 people, 5 utterances • Overall result: 3.64 out of 5.0
