## An Overview of Pitch Detection Algorithms



Alexandre Savard

MUMT611: Music Information Acquisition, Preservation, and Retrieval

February 2006

Introduction

Classification

Applications

Problems and Constraints

Time Domain Algorithms

Frequency Domain Algorithms

Alternative Techniques

Conclusion

Prior Definitions

Pitch: The perceived highness or lowness of a sound. It is related to the periodicity of the sound.

Frequency: A physical attribute of a sound or any other type of signal. It describes the number of times a repeated event occurs per unit of time.

Fundamental Frequency: In a complex sound or signal, the frequency of the lowest partial.

Applications of Pitch Tracking

Automatic music transcription from audio signals to common music notation or to MIDI note numbers

Score Following

Musical queries by singing or humming

Acoustic feature for Human-Computer Interaction

Sound-editing programs, for operations such as pitch-shifting and time-scaling
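As an illustration of the transcription step, a detected fundamental frequency can be mapped to a MIDI note number. The sketch below assumes twelve-tone equal temperament with A4 = 440 Hz mapped to note 69; the function name `frequency_to_midi` is hypothetical and not taken from the presentation.

```python
import math

def frequency_to_midi(freq_hz, a4_hz=440.0):
    """Map a frequency in Hz to the nearest MIDI note number (A4 = 69).

    Assumes twelve-tone equal temperament: 12 semitones per octave.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return round(69 + 12 * math.log2(freq_hz / a4_hz))

# A4 and the A one octave above it are 12 semitones apart.
note_a4 = frequency_to_midi(440.0)
note_a5 = frequency_to_midi(880.0)
```

Rounding to the nearest integer discards any detuning, so a transcription system that cares about intonation would keep the fractional part instead.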

Non-Exclusive Classification

Voice ( Speech, Singing )

Instrumental

Monophonic

Polyphonic

Time-Based Algorithm

Spectral-Based Algorithm

Alternative

Generally Encountered Problems

Noise

Reverberation

Other Sounds from the environment

Shortness of the sustained part for certain sounds

Sounds need to be analyzed right after the attack transient, when they are not yet fully stable

Detuning during the sustained part of a sound

Minimal output delay required for real-time use

Music-Specific Difficulties

Large frequency range for musical instruments

Many instrumental sounds have inharmonic partials

Expressiveness factors ( glissando, vibrato, trill )

Need for fast algorithms for real-time processing

Multiphonics

Zero-Crossing Detection

Based on a direct application of the definition of periodicity

Counts the number of times that the signal crosses a reference level

Computationally inexpensive

Sensitive to noise

Performs poorly on signals with significant energy at high frequencies
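The idea above can be sketched in a few lines. This is a minimal illustration, assuming a reference level of zero and counting only upward crossings so that one crossing corresponds to one period; the function name and the test tone are hypothetical.

```python
import math

def zero_crossing_pitch(samples, sample_rate):
    """Estimate f0 from the spacing of upward zero crossings."""
    crossings = [
        i for i in range(1, len(samples))
        if samples[i - 1] < 0 <= samples[i]
    ]
    if len(crossings) < 2:
        return None  # not enough crossings to measure a period
    # The average spacing between successive upward crossings is one period.
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period

sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate + 0.3) for n in range(2000)]
estimate = zero_crossing_pitch(tone, sample_rate)
```

On a clean sinusoid this works well. Noise or strong upper partials add extra crossings per period, which is the weakness listed on the slide.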

Autocorrelation Technique

Cross-correlation is an operation that measures the similarity between two signals.

The corresponding samples of one signal and a time-shifted version of the other are multiplied together and the products are summed.

The cross-correlation function then has a peak at the offset that corresponds to the maximum similarity.

Autocorrelation Technique

Autocorrelation is the cross-correlation of a signal with itself.

The maximum similarity occurs at a time shift of zero.

In theory, another maximum should occur when the time shift of the signal corresponds to the fundamental period.
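A minimal sketch of this technique follows, assuming the plain (unnormalized) sum-of-products definition of autocorrelation and a search range of 50 to 500 Hz. The function names, the search range, and the test tone are illustrative choices, not part of the presentation.

```python
import math

def autocorrelation(samples, max_lag):
    """r[lag] = sum over n of x[n] * x[n + lag], for lag = 0..max_lag."""
    n = len(samples)
    return [
        sum(samples[i] * samples[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]

def autocorrelation_pitch(samples, sample_rate, f_min=50.0, f_max=500.0):
    """Pick the lag with the largest autocorrelation inside the search range."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    r = autocorrelation(samples, max_lag)
    # Lag 0 is always the global maximum, so the search starts at min_lag.
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])
    return sample_rate / best_lag

sample_rate = 8000
tone = [
    math.sin(2 * math.pi * 200 * n / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 400 * n / sample_rate)
    + 0.25 * math.sin(2 * math.pi * 600 * n / sample_rate)
    for n in range(800)
]
estimate = autocorrelation_pitch(tone, sample_rate)
```

Because the sum has fewer terms at larger lags, this definition slightly favors the first period over its multiples, which helps against octave errors on clean signals.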

Autocorrelation Technique

Less effective for high fundamental frequencies.

Computing the correlation directly in the time domain is very expensive.

Computational efficiency can be improved by using the FFT algorithm instead of direct convolution. This reduces the cost from N squared to N log N operations.

Most variations of this technique differ in the mathematical definition of the autocorrelation used, the way the maxima are located, and how errors in identifying the maximum are attenuated.
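The FFT shortcut can be sketched as follows: the autocorrelation is the inverse transform of the power spectrum, provided the signal is zero-padded to at least twice its length so that the circular result equals the linear one. The small recursive FFT is included only to keep the example self-contained; in practice a library FFT (for example `numpy.fft`) would normally be used. All function names are illustrative.

```python
import cmath
import math

def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even, odd = fft(values[0::2]), fft(values[1::2])
    half = n // 2
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(half)]
    return ([even[k] + tw[k] for k in range(half)]
            + [even[k] - tw[k] for k in range(half)])

def ifft(spectrum):
    """Inverse FFT via the conjugate trick."""
    n = len(spectrum)
    conj = fft([complex(v).conjugate() for v in spectrum])
    return [v.conjugate() / n for v in conj]

def autocorrelation_fft(samples):
    """Linear autocorrelation computed through the power spectrum."""
    n = len(samples)
    size = 1
    while size < 2 * n:  # zero-pad so circular wrap-around cannot occur
        size *= 2
    padded = list(samples) + [0.0] * (size - n)
    power = [abs(v) ** 2 for v in fft(padded)]
    return [v.real for v in ifft(power)[:n]]

def autocorrelation_direct(samples):
    """Reference O(N^2) computation for comparison."""
    n = len(samples)
    return [sum(samples[i] * samples[i + lag] for i in range(n - lag))
            for lag in range(n)]

signal = [math.sin(2 * math.pi * 5 * n / 64) + 0.5 * math.sin(2 * math.pi * 11 * n / 64)
          for n in range(200)]
fast = autocorrelation_fft(signal)
slow = autocorrelation_direct(signal)
```

Both functions return the same values up to rounding error; only the cost differs.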

Average Magnitude Difference Function

It is an alternative to the autocorrelation function.

It computes the difference between the signal and a time-shifted version of itself.

While the autocorrelation has peaks at maximum similarity, the average magnitude difference function has valleys there.
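A minimal sketch of the average magnitude difference function, assuming the mean absolute difference over the overlapping samples and a simple global-minimum search between 50 and 500 Hz. Names and parameters are illustrative.

```python
import math

def amdf(samples, max_lag):
    """d[lag] = mean of |x[n] - x[n + lag]| for lag = 0..max_lag."""
    n = len(samples)
    return [
        sum(abs(samples[i] - samples[i + lag]) for i in range(n - lag)) / (n - lag)
        for lag in range(max_lag + 1)
    ]

def amdf_pitch(samples, sample_rate, f_min=50.0, f_max=500.0):
    """Pick the lag with the deepest valley inside the search range."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    d = amdf(samples, max_lag)
    best_lag = min(range(min_lag, max_lag + 1), key=lambda lag: d[lag])
    return sample_rate / best_lag

sample_rate = 8000
tone = [
    math.sin(2 * math.pi * 210 * n / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 420 * n / sample_rate)
    for n in range(800)
]
estimate = amdf_pitch(tone, sample_rate)
```

The estimate is limited to whole-sample lags, so a 210 Hz tone at 8 kHz is reported as 8000 / 38, roughly 210.5 Hz. A plain minimum search like this one can also lock onto a multiple of the period, which is one reason practical variants add normalization or thresholding.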

Other Temporal Algorithms

Waveform Maximum Detection

Sum Magnitude Difference Squared Function

Average Squared Difference Function

Cumulative Mean Normalized Difference Function

Circular Average Magnitude Difference Function

Adaptive Filter

Harmonic Product Spectrum

The FFT is used to convert the temporal representation of a sound into its spectral representation

Assumes that all signals are made of harmonic partials

The spectrum is compressed by factors corresponding to the harmonic numbers

Multiplying the compressed spectra with the original one amplifies the fundamental frequency

Harmonic Product Spectrum

The highest peak most likely corresponds to the fundamental frequency

http://www-ccrma.stanford.edu/~pdelac/154/m154paper.htm#_ftn5

Harmonic Product Spectrum

Presents a high degree of robustness in a noisy environment

Less effective for sounds that are not made of harmonic components

Computationally inexpensive

Octave errors can occur
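The harmonic product spectrum steps can be sketched as below. This is a minimal illustration, assuming a Hann window, four harmonics, and compression by taking every h-th bin of the magnitude spectrum; these choices and all names are assumptions rather than details from the presentation. The small recursive FFT only keeps the example self-contained.

```python
import cmath
import math

def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even, odd = fft(values[0::2]), fft(values[1::2])
    half = n // 2
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(half)]
    return ([even[k] + tw[k] for k in range(half)]
            + [even[k] - tw[k] for k in range(half)])

def hps_pitch(samples, sample_rate, num_harmonics=4):
    """Estimate f0 as the peak of the harmonic product spectrum."""
    n = len(samples)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    spectrum = fft([s * w for s, w in zip(samples, window)])
    magnitude = [abs(v) for v in spectrum[: n // 2]]
    # Compressing by a factor h maps bin k * h onto bin k.
    limit = len(magnitude) // num_harmonics
    product = [
        math.prod(magnitude[k * h] for h in range(1, num_harmonics + 1))
        for k in range(limit)
    ]
    peak_bin = max(range(1, limit), key=lambda k: product[k])
    return peak_bin * sample_rate / n

sample_rate = 8192
size = 2048
tone = [
    sum(amp * math.sin(2 * math.pi * 200 * h * n / sample_rate)
        for h, amp in enumerate([1.0, 0.8, 0.6, 0.4], start=1))
    for n in range(size)
]
estimate = hps_pitch(tone, sample_rate)
```

The frequency resolution is one FFT bin (here 4 Hz). If the upper harmonics are weak or missing, the product at the true fundamental shrinks and a neighboring octave can win, which is the octave error mentioned on the slide.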

Cepstrum

The cepstrum is defined as the inverse Fourier transform of the logarithm of the power spectrum of a signal

The cepstrum extracts periodicity from the spectrum

It can be written informally as:

$$c(\tau) = \mathcal{F}^{-1}\left\{ \log \left| \mathcal{F}\{ s(t) \} \right|^2 \right\}$$

The result has a peak that corresponds to the fundamental period

Calculation of Cepstrum for Voice

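In the source-filter model, voiced speech s(t) can be considered as the convolution of a pulse train p(t) with the impulse response of the vocal tract h(t).

In the spectrum, writing S, P and H for the Fourier transforms of s, p and h, we get:

$$S(f) = P(f)\,H(f)$$

Taking the logarithm of the magnitude on both sides, we then obtain:

$$\log|S(f)| = \log|P(f)| + \log|H(f)|$$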

Cepstrum

The logarithm operation flattens the spectrum, which gives more robustness against formants

However, this same operation raises the noise level
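A minimal sketch of cepstral pitch detection follows. It assumes no windowing, a search range of 50 to 500 Hz, and a toy "voiced" frame built from a decaying pulse train convolved with a two-tap filter standing in for the vocal tract; the frame, the filter, and all names are hypothetical. The small recursive FFT only keeps the example self-contained.

```python
import cmath
import math

def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even, odd = fft(values[0::2]), fft(values[1::2])
    half = n // 2
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(half)]
    return ([even[k] + tw[k] for k in range(half)]
            + [even[k] - tw[k] for k in range(half)])

def ifft(spectrum):
    """Inverse FFT via the conjugate trick."""
    n = len(spectrum)
    conj = fft([complex(v).conjugate() for v in spectrum])
    return [v.conjugate() / n for v in conj]

def real_cepstrum(samples):
    """Inverse FFT of the log power spectrum."""
    # The small constant guards against log(0) in empty bins.
    log_power = [math.log(abs(v) ** 2 + 1e-12) for v in fft(samples)]
    return [v.real for v in ifft(log_power)]

def cepstrum_pitch(samples, sample_rate, f_min=50.0, f_max=500.0):
    """Pick the largest cepstral peak inside the plausible period range."""
    cepstrum = real_cepstrum(samples)
    min_q = int(sample_rate / f_max)
    max_q = int(sample_rate / f_min)
    peak = max(range(min_q, max_q + 1), key=lambda q: cepstrum[q])
    return sample_rate / peak

# Toy frame: pulses every 40 samples (200 Hz at 8 kHz), filtered by h.
sample_rate, period, size = 8000, 40, 1024
pulses = [0.0] * size
for i, start in enumerate(range(0, size - 1, period)):
    pulses[start] = 0.8 ** i
h = [1.0, 0.5]
frame = [sum(h[j] * pulses[n - j] for j in range(len(h)) if n - j >= 0)
         for n in range(size)]
estimate = cepstrum_pitch(frame, sample_rate)
```

The contribution of the short filter h stays near the origin of the cepstrum, while the pulse train produces a peak at the period, so restricting the search to the plausible period range separates the two.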

Other Frequency Domain Algorithms

Maximum Likelihood

Linear Prediction Coding

Spectral Autocorrelation

Teager Energy Function

Referring again to the source-filter model, the voice can be represented by a pulse train filtered by the vocal tract.

The pulse train is produced by the successive opening and closing of the glottis.

The production of speech is closely related to the release of energy through the glottis.

The opening and closing of the glottis results in a peak of energy in the signal

Teager Energy Function

The Teager energy function is a non-linear operator that defines the instantaneous energy of a discrete-time signal x(n) as:

$$\Psi[x(n)] = x^2(n) - x(n-1)\,x(n+1)$$

It is derived from the total energy of an oscillatory spring-mass system.

Estimating the periodicity of the energy peaks of the signal leads to an approximation of the fundamental frequency.
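As a small illustration of the operator itself, the sketch below assumes the common discrete-time form, in which each output sample is x[n]^2 - x[n-1] * x[n+1]. The function name and the test signal are hypothetical, and the peak-tracking stage that would turn energy peaks into a pitch estimate is not shown.

```python
import math

def teager_energy(samples):
    """psi[n] = x[n]^2 - x[n-1] * x[n+1] for every interior sample."""
    return [
        samples[n] ** 2 - samples[n - 1] * samples[n + 1]
        for n in range(1, len(samples) - 1)
    ]

# For a pure sinusoid A*cos(w*n + phi) the operator returns the constant
# A^2 * sin(w)^2, which reflects both amplitude and frequency.
amplitude, omega = 2.0, 0.3
sinusoid = [amplitude * math.cos(omega * n + 0.7) for n in range(100)]
energy = teager_energy(sinusoid)
expected = amplitude ** 2 * math.sin(omega) ** 2
```

Only three neighboring samples are needed per output value, so the operator responds almost instantly to sudden energy changes such as those described above for the glottis.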

Miscellaneous Techniques

Wavelet Transform

Bayesian Statistical Model

Hidden Markov Model

Graphical Probabilistic Models

Perceptual Pitch Detector

