Study and Implementation of Two Voice Warping Algorithms: Pitch Shifting & Time Stretching • EEL 6586 Project Presentation • Deng, Chengyu • Wang, Dexiang
Outline • Pitch analysis • Voice warping applications • Pitch shifting • Time stretching • Pitch shifting algorithm • How to change pitch • General approaches: phase vocoder vs. PSOLA • Improving frequency resolution • Formant considerations • Time stretching implementation • Software demonstration (the work we actually implemented)
Take a Look at Pitch • Definition: the perceived fundamental frequency of a sound • (Figure label: pitch period) • Produced by glottal excitation • An important cue for telling male from female voices, and adults from children • Accompanied by many harmonics
Voice Warping Applications • Pitch shifting: keep the time duration but raise or lower the pitch • Change a man's voice to a woman's, or vice versa • Create chipmunk or Mickey Mouse-like sounds • Many applications in the movie industry • Time stretching: keep the pitch unchanged but shorten or lengthen the time duration • Helps with word identification • Create extremely short or long stretches of speech that a person could hardly produce naturally
How to Change Pitch? • Naïve idea • Down-sample or up-sample the speech signal • Problems • The time duration changes as well • The formants move as well • Goal: generate the same number of samples while scaling only the pitch
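The first problem with the naïve idea can be seen in a minimal sketch. It assumes an 8 kHz sampling rate, a 100 Hz test tone and a linear-interpolation resampler; the names `resample_linear` and `count_zero_crossings` are hypothetical and are not taken from the project. Reading the signal twice as fast keeps the number of cycles but halves the number of samples, so both pitch and duration change.

```python
import math

def resample_linear(signal, factor):
    """Read `signal` `factor` times faster, using linear interpolation."""
    out_len = int(len(signal) / factor)
    out = []
    for n in range(out_len):
        pos = n * factor                      # fractional read position
        i = int(pos)
        frac = pos - i
        nxt = signal[i + 1] if i + 1 < len(signal) else signal[i]
        out.append((1.0 - frac) * signal[i] + frac * nxt)
    return out

def count_zero_crossings(signal):
    """Rough indicator of how many cycles the signal contains."""
    return sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))

SAMPLE_RATE = 8000  # assumed sampling rate in Hz
# One second of a 100 Hz tone (small phase offset avoids exact zeros).
tone = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE + 0.1)
        for n in range(SAMPLE_RATE)]
shifted = resample_linear(tone, 2.0)

print(len(tone), len(shifted))  # 8000 4000: the duration is halved
print(count_zero_crossings(tone), count_zero_crossings(shifted))
```

The two zero-crossing counts are nearly equal, which means the same number of cycles now fits into half the time: the pitch has doubled, but only at the cost of the duration.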
Two General Approaches • Phase vocoder • Manipulates the signal in the frequency domain • Phase is important for determining the pitch and the positions of its harmonics • More accurate, higher fidelity, but longer computation • Time-domain scaling ((P)SOLA, etc.) • Manipulates the signal in the time domain • Precise pitch detection is a critical prerequisite • Shorter computation, but lower quality • (P)SOLA: (Pitch) Synchronous OverLap/Add
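The pitch detection that time-domain methods depend on can be illustrated with a simple autocorrelation search. This is only a sketch under assumed conditions (a clean, voiced frame, an 8 kHz sampling rate, a 60 to 400 Hz search range); `estimate_pitch_hz` is a hypothetical name, and practical PSOLA systems use more robust detectors.

```python
import math

def estimate_pitch_hz(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Pick the lag with the largest autocorrelation inside the search range."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag  # pitch period in samples -> frequency

SAMPLE_RATE = 8000  # assumed sampling rate in Hz
# Synthetic "voiced" frame: 125 Hz fundamental plus one harmonic.
frame = [math.sin(2 * math.pi * 125 * n / SAMPLE_RATE)
         + 0.5 * math.sin(2 * math.pi * 250 * n / SAMPLE_RATE)
         for n in range(832)]
print(estimate_pitch_hz(frame, SAMPLE_RATE))
```

An error of a single sample in the detected lag already moves the estimate by about 2 Hz at this pitch, which hints at why detection accuracy matters so much for pitch-synchronous methods.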
Basic Algorithms We Used for Pitch Shifting • Processing chain: Original speech → Window → STFT → Frequency scaling → iSTFT → Window concatenation → Pitch-shifted speech • Frequency-domain processing (more accurate) • Use the short-time Fourier transform (STFT) • With overlapped windows • Scale the frequency axis to change the positions of the pitch and its harmonics • Upscaling: discard the high-frequency components that would be pushed past the top of the spectrum, to avoid aliasing (listeners cannot hear the difference) • Downscaling: fill the vacated high-frequency bins with zeros
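The frequency-scaling step for a single STFT frame can be sketched as follows. The sketch works on bin magnitudes only and leaves out phase handling; `scale_bins` is a hypothetical name, and rounding to the nearest bin is an assumption rather than the project's exact rule.

```python
def scale_bins(magnitudes, factor):
    """Move the content of bin k to bin round(k * factor).

    Bins pushed past the end of the array are discarded (upscaling);
    bins that receive nothing stay zero (downscaling).
    """
    out = [0.0] * len(magnitudes)
    for k, mag in enumerate(magnitudes):
        target = int(round(k * factor))
        if target < len(out):
            out[target] += mag
    return out

# Toy half-spectrum: a fundamental at bin 10 and one harmonic at bin 20.
frame = [0.0] * 32
frame[10] = 1.0
frame[20] = 0.5

up = scale_bins(frame, 1.5)    # peaks move to bins 15 and 30
down = scale_bins(frame, 0.5)  # peaks move to bins 5 and 10
print([k for k, v in enumerate(up) if v > 0])
print([k for k, v in enumerate(down) if v > 0])
```

With a factor of 2.0 the harmonic at bin 20 would land on bin 40, outside the 32-bin array, and is simply dropped, which is the "discard" case described on the slide.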
Improving Frequency Resolution • Problem: the discrete Fourier transform has limited frequency accuracy • It cannot precisely represent peak components that fall between bins • Example • A: frequency component exactly on the 50th frequency sample • B: frequency component between the 50th and 51st frequency samples • Solution • Use the phase difference between two successive windows to compute the exact frequency for each bin (the final report will have more details)
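The phase-difference idea can be sketched for a single bin. The sketch assumes a 256-point Hann-windowed frame, a hop of 64 samples and an 8 kHz sampling rate (so bins are 31.25 Hz apart); `true_frequency` and the other names are hypothetical, and a direct single-bin DFT is used instead of an FFT to keep the example self-contained.

```python
import cmath
import math

def dft_bin(frame, k):
    """Value of DFT bin k of `frame` (direct evaluation)."""
    n_fft = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
               for n, x in enumerate(frame))

def true_frequency(signal, k, n_fft, hop, sample_rate):
    """Refine the frequency of bin k from the phase advance between two frames."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    first = [signal[n] * win[n] for n in range(n_fft)]
    second = [signal[n + hop] * win[n] for n in range(n_fft)]
    phase_diff = cmath.phase(dft_bin(second, k)) - cmath.phase(dft_bin(first, k))
    expected = 2 * math.pi * k * hop / n_fft       # advance of the bin centre
    deviation = phase_diff - expected
    deviation = (deviation + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    bin_offset = deviation * n_fft / (2 * math.pi * hop)
    return (k + bin_offset) * sample_rate / n_fft

FS, N, HOP = 8000, 256, 64  # assumed analysis parameters

def tone(freq):
    return [math.sin(2 * math.pi * freq * n / FS) for n in range(N + HOP)]

# Case A: exactly on bin 50 (1562.5 Hz). Case B: halfway to bin 51 (1578.125 Hz).
print(true_frequency(tone(1562.5), 50, N, HOP, FS))
print(true_frequency(tone(1578.125), 50, N, HOP, FS))
```

Both tones peak at or next to bin 50 in a plain magnitude spectrum, yet the phase advance separates them. With this hop size the wrapped deviation covers about two bins on either side, so a half-bin offset is recovered without ambiguity.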
Formant Considerations • Dealing with formant movement • Plain frequency scaling loses the vocal tract information • Upshifting the pitch -> the voice sounds as if produced by a smaller vocal tract • Downshifting the pitch -> the voice sounds as if produced by a bigger vocal tract • Solution • Calculate the formant envelope (LPC) • Normalize the magnitudes by the envelope before frequency scaling • Scale the frequency axis • Restore the formant envelope
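The envelope calculation in the solution can be sketched with the textbook Levinson-Durbin recursion. This is not the code the project borrowed; the function names, the sign convention A(z) = 1 + a1 z^-1 + ... and the use of autocorrelation values as input are assumptions made for illustration.

```python
import cmath
import math

def levinson_durbin(r, order):
    """Solve for LPC coefficients from autocorrelation values r[0..order].

    Returns (a, err), where a[0] == 1 and err is the prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

def lpc_envelope(a, err, n_bins):
    """Magnitude of the all-pole model sqrt(err) / |A(e^jw)| on n_bins points in [0, pi)."""
    env = []
    for b in range(n_bins):
        w = math.pi * b / n_bins
        a_w = sum(c * cmath.exp(-1j * w * m) for m, c in enumerate(a))
        env.append(math.sqrt(err) / abs(a_w))
    return env

# Autocorrelation of a first-order process with correlation 0.5 per lag.
r = [1.0, 0.5, 0.25]
a, err = levinson_durbin(r, 2)
envelope = lpc_envelope(a, err, 8)
print(a, err)
```

Given such an envelope, the remaining steps on the slide amount to dividing each bin magnitude by the envelope value at that bin, scaling the frequency axis, and multiplying by the same unscaled envelope again so that the formants stay where they were.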
Time Stretching Implementation • Processing chain: Original speech → Window → STFT → Frequency-domain interpolation/compression → iSTFT → Window concatenation → Pitch shifted Speech • Still takes advantage of frequency-domain manipulation • Stretching the time duration • Interpolate additional samples between the original frequency bins (upsampling in the frequency domain) • Linear interpolation instead of sinc interpolation (for computational convenience) • Shrinking the time duration • Compress the original frequency bin samples (downsampling in the frequency domain)
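The interpolation and compression of frequency bins can be sketched as a linear resampling of one frame's bin array. The name `resample_bins` and the clamping at the last bin are assumptions; the sketch covers only the resampling step, not the windowing, inverse transform or phase handling around it.

```python
def resample_bins(bins, factor):
    """Linearly resample a list of bins to round(len(bins) * factor) points.

    factor > 1 inserts interpolated samples between the original bins
    (stretching); factor < 1 keeps fewer samples (shrinking).
    """
    out_len = max(1, int(round(len(bins) * factor)))
    last = len(bins) - 1
    out = []
    for m in range(out_len):
        pos = m / factor                  # position on the original bin axis
        i = min(int(pos), last)
        frac = pos - i
        nxt = bins[min(i + 1, last)]      # clamp at the final bin
        out.append((1 - frac) * bins[i] + frac * nxt)
    return out

bins = [0.0, 2.0, 4.0, 6.0]
print(resample_bins(bins, 2.0))   # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0]
print(resample_bins(bins, 0.5))   # [0.0, 4.0]
```

The same function accepts complex bin values, since it uses only addition and scaling. Linear interpolation is cheaper than sinc interpolation but only approximates the spectrum between the original bins, which is the trade-off named on the slide.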
Putting It All Together (Building Our Software Implementation) • Windows platform / Visual C++ • Self-developed framework & algorithms • Formant position maintenance (LPC formant envelope calculation) • Time stretching • Borrowed ideas and some source code from DSP websites • http://www.dspdimension.com/ for the elementary frequency shifting algorithm • http://www.koders.com/ for the Levinson LPC algorithm
Introduction to Our GUI Functions • Set the target parameters needed by the pitch shifting and time stretching process • Click “Process Voice File…” to choose the original voice file and the altered (output) voice file • Wait for processing to complete • Click the “Play Voice File…” button to hear the altered voice
Introduction to Our GUI Functions • Advanced setup • Change the parameters used in our algorithms • LPC order • Window size • Overlap percentage of the windows
Demo (question session follows)