Microcomputer Systems 2

Microcomputer Systems 2 Time Stretching & Pitch Shifting of Audio Signals


Outline
• Introduction
• Frequency Shift vs. Pitch Shift – Audio Examples
• Time Compression/Expansion
• Techniques Used for Time Compression/Expansion and Pitch Shifting
• The Phase Vocoder
• Related Topics
• Why Phase
• Time Domain Harmonic Scaling (TDHS)
• More recent approaches
• Comparison
• Which Method to Use
• Pitch Shifting Considerations
• Audio Examples
• Timbre and Formants
• Phase Vocoder and Formants
• Time Domain Harmonic Scaling and Formants
Veton Këpuska

Introduction
• Time stretching and pitch shifting are two dominant techniques used for speech and sound manipulation.
• Typical applications entail:
  • Changing the playback speed (altering the length of the signal) without altering the pitch of the voice and/or instruments.
  • Changing the pitch of the voice and/or instruments without changing the length of the signal.

Pitch Shifting

Pitch Shifting
• As opposed to pitch transposition achieved by a simple sample rate conversion, pitch shifting changes the pitch of a signal without changing its length.
• In practical applications, this is usually achieved by changing the length of the sound using one of the methods discussed next and then performing a sample rate conversion to change the pitch.

Introduction
• Pitch Shifting is NOT Frequency Shifting:
  • The terminology in the literature is somewhat confused: Pitch Shifting is often incorrectly called 'Frequency Shifting'.
  • A true Frequency Shift (obtainable by modulating an analytic signal with a complex exponential) shifts the spectrum of a sound by a constant offset, while Pitch Shifting dilates it, preserving the harmonic relationships of the sound.
  • Frequency Shifting yields a metallic, inharmonic sound that may be an interesting special effect but is entirely inadequate for changing the pitch of any harmonic sound other than a single sine wave.
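To make the difference concrete, the following MATLAB sketch applies both operations to the partial frequencies of a harmonic tone. The 200 Hz fundamental, the 150 Hz offset and the factor 1.5 are arbitrary values assumed for illustration; they are not taken from the audio examples.

```matlab
% Partials of a harmonic tone: integer multiples of the fundamental.
f0       = 200;                      % assumed fundamental in Hz
partials = f0 * (1:4);               % 200 400 600 800

% Frequency shift: add the same offset to every partial.
delta        = 150;                  % assumed offset in Hz
freq_shifted = partials + delta;     % 350 550 750 950

% Pitch shift: multiply every partial by the same factor.
ratio         = 1.5;                 % assumed pitch ratio
pitch_shifted = partials * ratio;    % 300 600 900 1200

% Ratios to the lowest partial: integer ratios mean the tone is still harmonic.
freq_ratios  = freq_shifted  / freq_shifted(1);
pitch_ratios = pitch_shifted / pitch_shifted(1);
disp(freq_ratios);
disp(pitch_ratios);
```

After the pitch shift the partials are still integer multiples of the lowest one; after the frequency shift they are not, which is why the result sounds inharmonic.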

Audio Examples of Pitch Shifting vs. Frequency Shifting
• Original Sound:
• Pitch Shifted:
• Frequency Shifted:

Time Compression/Expansion

Time Compression/Expansion
• Time Compression/Expansion, also known as "Time Stretching", is the reciprocal process to Pitch Shifting.
• It leaves the pitch of the signal intact while changing its speed (tempo).
• This is useful when you wish to change the speed of a voiceover without altering the timbre of the voice.

Time Compression/Expansion
• There are several fairly good methods for time compression/expansion and pitch shifting, but most of them will not perform well on all kinds of signals and for any desired shift/stretch ratio.
• Typically, good algorithms allow pitch shifting up to 5 semitones on average or stretching the length by 130%.
• When time stretching and pitch shifting single-instrument recordings, you might even be able to achieve a 200% time stretch or a one-octave pitch shift with no audible loss in quality.
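The semitone and percentage figures can be related with a short calculation. The sketch below assumes equal temperament, where one semitone corresponds to a frequency ratio of 2^(1/12); the slides do not state which tuning convention is meant.

```matlab
% Convert pitch shifts in semitones to frequency ratios (equal temperament assumed).
semitones = [5 12];                  % the two cases mentioned above
ratio     = 2 .^ (semitones / 12);   % about 1.335 for 5 semitones, 2 for an octave
percent   = 100 * ratio;             % the same ratios expressed as percentages
fprintf('%2d semitones -> ratio %.3f (%.0f%%)\n', [semitones; ratio; percent]);
```

Under that assumption a 5-semitone shift is a ratio of about 1.33, close to the 130% stretch quoted for the general case, and a one-octave shift is a ratio of 2, matching the 200% stretch quoted for single instruments.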

Time Compression/Expansion of Speech
• Typical Goals
  • To either speed up or slow down a speech signal while maintaining the approximate pitch
• Applications
  • Changing voice mail playback speed
  • Court stenographers: play back proceedings faster
  • Sound effects
  • Etc.

Techniques Used for Time Compression/Expansion & Pitch Shifting
• Option 1 – Change sample rate
  • If you modify the playback sample rate, you can change the speed, but the pitch also changes
  • Increase sample rate = higher pitch (chipmunk sound)
  • Decrease sample rate = lower pitch (drawn-out echo sound)
• Option 2 – Decimate or Interpolate Signal
  • If you change the number of samples, the result is the same as modifying the sample rate
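A minimal MATLAB sketch of Option 2 shows why dropping samples changes speed and pitch together. The 8 kHz sample rate and 200 Hz test tone are assumptions chosen so that the tone falls exactly on an FFT bin; no anti-aliasing filter is applied because the tone is far below the new Nyquist frequency.

```matlab
fs = 8000;                          % assumed sample rate in Hz
f0 = 200;                           % assumed test-tone frequency in Hz
x  = sin(2*pi*f0*(0:fs-1)'/fs);     % one second of tone
y  = x(1:2:end);                    % Option 2: keep every second sample

% Frequency of the strongest FFT bin when both signals are played back at fs.
Xmag = abs(fft(x));
Ymag = abs(fft(y));
[~, kx] = max(Xmag(1:floor(numel(x)/2)));
[~, ky] = max(Ymag(1:floor(numel(y)/2)));
fx = (kx-1) * fs / numel(x);
fy = (ky-1) * fs / numel(y);

dur_x = numel(x) / fs;              % duration in seconds at playback rate fs
dur_y = numel(y) / fs;
fprintf('original:  %.2f s, %.0f Hz\n', dur_x, fx);
fprintf('decimated: %.2f s, %.0f Hz\n', dur_y, fy);
```

The decimated signal lasts half as long and its tone is an octave higher, which is the coupling between speed and pitch that the more complex methods are meant to break.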

Techniques Used for Time Compression/Expansion & Pitch Shifting
• Option 3 – Use more complex methods
  • These change the speed of the signal while preserving the pitch
  • Short Time Fourier Transform
  • Short Time Fourier Transform Magnitude
  • Sinusoidal Synthesis
  • Linear Prediction Synthesis

Techniques Used for Time Compression/Expansion & Pitch Shifting
• Currently, two principal time compression/expansion and pitch shifting schemes are employed in most of today's applications:
  • Phase Vocoder
  • Time Domain Harmonic Scaling (TDHS)

Phase Vocoder

Phase Vocoder
• This method was introduced by Flanagan and Golden in 1966 and digitally implemented by Portnoff ten years later.
• Portnoff, M.R. 1981a. "Short-Time Fourier Analysis of Sampled Speech." IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-29(3):364-373.
• Portnoff, M.R. 1981b. "Time-Scale Modification of Speech Based on Short-Time Fourier Analysis." IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-29(3):374-390.

Phase Vocoder
• It uses a Short Time Fourier Transform (STFT from here on) to convert the audio signal to the complex Fourier representation.
• Since the STFT returns the frequency domain representation of the signal on a fixed frequency grid, the actual frequencies of the partial bins have to be found by converting the relative phase change between two STFT outputs to actual frequency changes.
• Note that the term 'partial' has nothing to do with the signal harmonics. In fact, an STFT will never readily give you any information about true harmonics unless you match the STFT length to the fundamental frequency of the signal, and even then the frequency domain resolution is quite different from what our ear and auditory system perceive.
• The timebase of the signal is changed by calculating the frequency changes in the Fourier domain on a different time basis, and then an inverse STFT (iSTFT) is done to regain the time domain representation of the signal.
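The step of converting a phase change between two STFT frames into an actual frequency can be sketched as follows. This is an illustrative fragment, not a full phase vocoder; the sample rate, FFT size, hop size, Hann window and 1010 Hz test tone are all assumptions made for the example.

```matlab
fs = 8000;  N = 512;  H = 128;          % assumed sample rate, FFT size, hop size
f_true = 1010;                          % test tone, deliberately between two bins
n = (0:N+H-1)';
x = cos(2*pi*f_true*n/fs);
w = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);     % Hann window

X1 = fft(x(1:N)     .* w);              % frame starting at sample 0
X2 = fft(x(H+1:H+N) .* w);              % frame starting H samples later

[~, k]  = max(abs(X1(1:N/2)));          % strongest bin (1-based index)
omega_k = 2*pi*(k-1)/N;                 % bin centre frequency in rad/sample

% Phase advance beyond what the bin centre frequency alone would produce.
dphi = angle(X2(k)) - angle(X1(k)) - omega_k*H;
dphi = mod(dphi + pi, 2*pi) - pi;       % wrap to [-pi, pi)

omega = omega_k + dphi/H;               % estimated true frequency in rad/sample
f_bin = omega_k*fs/(2*pi);
f_est = omega*fs/(2*pi);
fprintf('bin centre %.2f Hz, phase-based estimate %.2f Hz\n', f_bin, f_est);
```

The bin centre alone is off by several hertz because the grid spacing is fs/N, whereas the phase-based estimate recovers the tone frequency closely. The wrap step is only unambiguous while the deviation from the bin centre stays below pi/H radians per sample.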

Phase Vocoder
• Phase vocoder algorithms are used mainly in scientific and educational software products (to show the use and limitations of the Fourier Transform), but have gained in popularity over the past few years due to improvements that greatly reduce the artifacts of the "original" phase vocoder algorithm.
• The basic phase vocoder suffers from a severe drawback: it introduces a considerable amount of artifacts, audible as 'smearing' and 'reverberation' (even at low expansion ratios), due to the "non-synchronized vertical coherence of the sine and cosine basis functions" that are used to change the timebase.

Phase Vocoder
• Puckette, Laroche and Dolson have shown that the phasiness can be greatly reduced by picking peaks in the Fourier spectrum and keeping the relative phases around the peaks unchanged. Even though this improves the quality considerably, it still renders the result somewhat phasey and diffuse when compared to time domain methods.
• Current research focuses on improving the phase vocoder by applying intra-frame sinusoidal sweep and ramp rate correction (Bristow-Johnson and Bogdanowicz) and multi-resolution phase vocoder concepts (Bonada).

Links to Publicly Available Vocoders
• Pointers - Phase Vocoder:
  • The MIT Lab Phase Vocoder
  • WaveMasher - GPL/Open Source Phase Vocoder by Kenneth Sturgis
  • Sculptor: A Real Time Phase Vocoder by Nick Bailey
  • A Phase Vocoder implementation using Matlab
  • More reading on the Phase Vocoder
  • The IRCAM "Super Phase Vocoder"
  • S.M.Bernsee's Pitch Shifting Using The Fourier Transform article (with C code)

Time Domain Harmonic Scaling (TDHS).

Time Domain Harmonic Scaling (TDHS)
• This is based on a method proposed by Rabiner and Schafer in 1978.
• It relies heavily on a correct estimate of the fundamental frequency of the sound being processed.

Theory
• Short Time Fourier Transform Methods
  • Chapter 7 in our text (Discrete-Time Speech Signal Processing)
  • Refer to the class notes for the mathematical theory of operation
  • I will pick up from where Dr. Këpuska stopped in his notes

How is the Speech/Sound Signal Processed
• Link: Ch7-Short-Time_Fourier_Transform_Analysis_and_Synthesis.ppt

Terminology & Basic Idea
[Figure: frame rate and window size]

Short Time Fourier Transform
[Block diagram labels: Signal, STFT, Decimate Samples, IFFT, Output, OLA]
• Also called the Fairbanks method
• Extract successive short-time segments and then discard the following ones

Short Time Fourier Transform
• Frame rate factor L
• In the frequency domain, after taking the STFT, you get X(nL, ω)
• Form a new signal by Y(nL, ω) = X(snL, ω), where s = compression factor
• Take the inverse Fourier transform
• Use the overlap-and-add (OLA) method to form the new signal
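The steps above can be sketched in a few lines of MATLAB. This is a simplified illustration under stated assumptions, not the code from the later slides: the test signal is a 100 Hz tone sampled at 10 kHz, so its period equals the frame rate L and the retained frames happen to line up; the triangular window is built by hand so that overlapping windows sum to one; and the compression factor s is assumed to be an integer.

```matlab
% Fairbanks-style compression sketch: Y(nL, w) = X(s*n*L, w).
fs = 10000;                          % assumed sample rate in Hz
L  = 100;                            % frame rate (hop) in samples
Nw = 2*L;                            % window size in samples
s  = 2;                              % compression factor (assumed integer)
x  = sin(2*pi*100*(0:1999)'/fs);     % test tone whose period equals L

w  = 1 - abs((0:Nw-1)' - L)/L;       % triangular window; overlaps at hop L sum to 1
nFrames = floor((numel(x) - Nw)/L) + 1;

% Analysis: column n+1 of X holds X(nL, w).
X = zeros(Nw, nFrames);
for n = 0:nFrames-1
    X(:, n+1) = fft(x(n*L+1 : n*L+Nw) .* w);
end

% Modification: keep every s-th frame and discard the rest.
Y = X(:, 1:s:nFrames);
nKeep = size(Y, 2);

% Synthesis: inverse FFT of each kept frame, overlap-added at the original hop L.
y = zeros((nKeep-1)*L + Nw, 1);
for n = 0:nKeep-1
    idx = n*L+1 : n*L+Nw;
    y(idx) = y(idx) + real(ifft(Y(:, n+1)));
end

fprintf('input %d samples, output %d samples\n', numel(x), numel(y));
```

Because the tone's period matches L, the output here is a clean, shorter copy of the tone. With real speech the pitch period rarely matches the frame rate, so adjacent retained frames generally do not line up.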

Short Time Fourier Transform
[Figure: X(nL, ω) and Y(nL, ω) = X(2nL, ω)]

Short Time Fourier Transform
[Figure: original windowed sequence and new sequence]

Short Time Fourier Transform
• Problems
  • Pitch Synchronization
  • It is highly likely that the pitch periods will not line up properly

Short Time Fourier Transform Magnitude
• Problems with the STFT method relate directly to the linear phase component of the STFT
  • Time shift = phase change
• An alternative approach is to use only the magnitude portion of the STFT: the Short Time Fourier Transform Magnitude (STFTM)
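The statement "time shift = phase change" can be checked numerically. The sketch below uses an arbitrary short pulse and a circular shift of 5 samples, both assumptions made for the example, and compares the magnitude and phase of the two spectra.

```matlab
N = 64;                              % assumed frame length
d = 5;                               % assumed delay in samples
n = (0:N-1)';
x  = exp(-0.5*((n - 20)/4).^2);      % arbitrary short pulse
xs = circshift(x, d);                % same pulse delayed by d samples (circularly)

X  = fft(x);
Xs = fft(xs);

% The magnitudes are identical ...
mag_diff = max(abs(abs(X) - abs(Xs)));

% ... and the delay appears only as a linear phase term exp(-j*2*pi*k*d/N).
k = (0:N-1)';
phase_err = max(abs(Xs - X .* exp(-1j*2*pi*k*d/N)));

fprintf('max magnitude difference: %.2e\n', mag_diff);
fprintf('max error of linear-phase model: %.2e\n', phase_err);
```

Both differences are at the level of floating-point round-off, which is why discarding the phase removes the sensitivity to how a frame is positioned in time.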

Short Time Fourier Transform Magnitude
• Compression
  • With the Fairbanks method, time slices were discarded
  • Now we can just compress the time slices
  • Form a new signal by |Y(nM, ω)| = |X(nL, ω)|, where M = new frame rate = L / speed
  • For example, speeding up by two gives M = L/2

Short Time Fourier Transform Magnitude
• Compression
  • Take the inverse Fourier transform
  • Use the overlap-and-add (OLA) method to form the new signal
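The effect of re-spacing the frames from L to M = L/speed can be seen from the frame positions alone. The fragment below is only bookkeeping under assumed values (19 frames, L = 100, a 200-sample window); it does not perform the synthesis itself.

```matlab
L       = 100;                       % analysis frame rate (hop) in samples
Nw      = 200;                       % window size in samples
speed   = 2;                         % speed-up factor
nFrames = 19;                        % assumed number of analysis frames

M = L / speed;                       % synthesis hop: frames are placed closer together

in_start  = (0:nFrames-1)*L + 1;     % 1-based start of each frame in the input
out_start = (0:nFrames-1)*M + 1;     % 1-based start of the same frame in the output

in_len  = in_start(end)  + Nw - 1;   % samples spanned by the input frames
out_len = out_start(end) + Nw - 1;   % samples spanned by the re-spaced frames
fprintf('input %d samples, output %d samples (M = %d)\n', in_len, out_len, M);
```

Every frame is kept, but each one now overlaps more of its neighbours, so the overlap-add sum has to be rescaled accordingly. The output is slightly longer than in_len/speed because the last window still has its full length.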

Short Time Fourier Transform Magnitude
[Figure: X(nL, ω) and Y(nM, ω) = X(nL, ω), with M = L/2]

Short Time Fourier Transform Magnitude
[Figure: original windowed sequence and new sequence]

Other Methods
• Sinusoidal Synthesis (Chapter 9)
  • Time-warp the sinewave frequency track and the amplitude function
  • This technique has been successful not only with speech but also with music, biological, and mechanical signals
  • Problems
    • Does not maintain the original phase relations
    • Suffers from reverberance

Other Methods
• Linear Prediction Synthesis
  • Use homomorphic and linear prediction results to modify the time base
  • The book briefly mentions that this is possible, but I ran out of time before I could investigate it further

Other Methods
• New Techniques
  • An Internet search showed several methods trying to improve on what is out there now
• Software
  • Various software programs will change the speed for you
  • Adobe Audition is one of the most all-encompassing right now

Matlab Code-Prepare the Workspace
%%%%%%%%%%%%%%%%
% Prepare Workspace
%%%%%%%%%%%%%%%%
close all;
clear all;
window_size_1 = 200;   % window length in samples
frame_rate_1 = 100;    % frame step in samples
% Factor to speed up by (2 = twice as fast)
speed = 2;

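Matlab Code-Load the Speech Signal
%%%%%%%%%%%%%%%%
% Load Data File
%%%%%%%%%%%%%%%%
% The 's' flag returns the typed text as a string instead of evaluating it
filename = input('Please enter the file name to be used. ', 's');
% audioread replaces wavread, which has been removed from current MATLAB releases
[sample_data,sample_rate] = audioread(filename);
sample_data = sample_data(:,1);   % use the first channel only
loop_time = floor(max(size(sample_data))/frame_rate_1);
% Zero-pad to a whole number of frames without overwriting the last sample
sample_data((max(size(sample_data))+1):(loop_time+1)*frame_rate_1)=0;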

Matlab Code-Develop the Window
%%%%%%%%%%%%%%%%
% Create Windows
%%%%%%%%%%%%%%%%
% File sampled at 10,000 samples/sec
% window_size_1 = 200 samples gives a 20 ms window;
% frame_rate_1 = 100 samples gives a 10 ms frame step
triangle_30ms = triang(window_size_1);   % variable name kept from the original code
%triangle_30ms = hamming(window_size_1);
W0 = sum(triangle_30ms);

Matlab Code-Window the Entire Speech Signal
%%%%%%%%%%%%%%%%
% Window the speech
%%%%%%%%%%%%%%%%
% Each segment spans 2*frame_rate_1 samples, which equals window_size_1
for i = 0:loop_time-1
    window_data(:,i+1) = sample_data((frame_rate_1*i)+1:((i+2)*frame_rate_1)).*triangle_30ms;
end

Matlab Code-Perform the Fast Fourier Transform
%%%%%%%%%%%%%%%%
% Create FFT
%%%%%%%%%%%%%%%%
% Each windowed frame is zero-padded to 1024 points
for i = 1:loop_time
    window_data_fft(:,i) = fft(window_data(:,i),1024);
end

Matlab Code-Recreate the Modified Signal
%%%%%%%%%%%%%%%%
% Recreate Original Signal
%%%%%%%%%%%%%%%%
% Initialize the recreated signals
reconstructed_signal(1:(loop_time+1)*frame_rate_1)=0;
real_reconstructed_signal(1:(loop_time+1)*frame_rate_1)=0;
modified_reconstructed_signal(1:(loop_time+3)*(frame_rate_1/speed))=0;
modified_reconstructed_signal_compressed(1:(loop_time+3)*(frame_rate_1/speed))=0;

Matlab Code-Recreate the Modified Signal
% Perform the ifft
for i = 1:loop_time
    % Full (complex) spectrum and magnitude-only spectrum
    recreated_data_ifft(:,i) = ifft(window_data_fft(:,i),1024);
    real_recreated_data_ifft(:,i) = ifft(abs(window_data_fft(:,i)),1024);
    % Keep the first window_size_1 samples and rescale
    truncated_recreated_data_ifft(:,i) = recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
    real_truncated_recreated_data_ifft(:,i) = real_recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
end

Matlab Code-Recreate the Modified Signal
% Get back to the original signal (overlap and add at the original frame rate)
for i = 0:loop_time-1
    reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + truncated_recreated_data_ifft(:,i+1)';
    real_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = real_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + real_truncated_recreated_data_ifft(:,i+1)';
end

Matlab Code-Recreate the Modified Signal
% Get a modified signal by deleting certain parts (STFT)
% Only every speed-th frame is used; the frames come from the magnitude-only ifft
for i = 0:(loop_time-1)/speed
    modified_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = modified_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + real_truncated_recreated_data_ifft(:,i*speed+1)';
end

Matlab Code-Recreate the Modified Signal
% Initialize the compressed sequence (STFTM)
modified_reconstructed_signal_compressed(1:frame_rate_1+frame_rate_1/speed+1) = truncated_recreated_data_ifft(frame_rate_1-frame_rate_1/speed:window_size_1,1)';
% Get a modified signal by compressing (frames are placed frame_rate_1/speed samples apart)
for i = 0:(loop_time-2)
    modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1) = modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1) + real_truncated_recreated_data_ifft(:,i+2)';
end

Matlab Code-Plot Results
%%%%%%%%%%%%%%%%
% Plot Results
%%%%%%%%%%%%%%%%
figure;   % MATLAB is case-sensitive, so this must be lowercase
subplot(211)
plot(sample_data)
title('Original Speech');
v1=axis;
hold on;
subplot(212)
plot(real(modified_reconstructed_signal))
title(['STFT Synthesis w/ Speed = ',num2str(speed),'X']);
v2=axis;
% Use the axis limits of the longer signal for both plots
if speed > 1
    subplot(211); axis(v1)
    subplot(212); axis(v1)
else
    subplot(211); axis(v2)
    subplot(212); axis(v2)
end
