A System for Hybridizing Vocal Performance

By Kim Hang Lau


Parameters of the singing voice

  • Parameters of the singing voice can be loosely classified as:

    • Timbre

    • Pitch contour

    • Time contour (rhythm)

    • Amplitude envelope (projections)

Vocal Modification

  • Vocal modification refers to the signal processing of live or recorded singing to achieve a different inflection and/or timbre

  • Commercially available units include

    • Intonation corrector

    • Pitch/formant processor

    • Harmonizer

    • Vocoder

Objectives

  • Prototype a system for vocal modification

  • Modify a source vocal sample to match the time evolution, pitch contour and amplitude envelope of a similarly sung, target vocal sample

  • Simulate a transfer of singing techniques from a target vocalist to a source vocalist, producing a hybridized vocal performance

Order of Presentation

  • System Overview

  • Individual components

  • System evaluation

  • System limitations

  • Conclusions and recommendations

System Overview

  • Three components

    • Pitch-marking

    • Time-alignment

    • Time/pitch/amplitude modification engine

  • Inspired by Verhelst’s prototype system for the post-synchronization of speech utterances

Component No. 1: Pitch-marking

Pitch-marking and Glottal Closure Instants (GCIs)

  • Information generated from pitch-marking

    • Pitch period

    • Amplitude envelope

    • Voiced/unvoiced segment boundaries
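
As a rough illustration of how these three quantities can be read off a list of pitch marks, the sketch below assumes the GCIs are already available as increasing sample indices. The function names and the `max_period` gap threshold are hypothetical and are not taken from the presentation.

```python
def pitch_track(gci_samples, sample_rate):
    """Return (period_s, f0_hz) for each pair of consecutive GCIs."""
    track = []
    for start, end in zip(gci_samples, gci_samples[1:]):
        period = (end - start) / sample_rate
        track.append((period, 1.0 / period))
    return track


def peak_envelope(signal, gci_samples):
    """Peak absolute amplitude inside each pitch period."""
    return [max(abs(v) for v in signal[start:end])
            for start, end in zip(gci_samples, gci_samples[1:])]


def voiced_segments(gci_samples, max_period):
    """Group GCIs into voiced runs.

    Assumption: a gap between marks longer than max_period (in samples)
    is treated as an unvoiced stretch separating two voiced runs.
    """
    segments, run_start = [], gci_samples[0]
    for prev, cur in zip(gci_samples, gci_samples[1:]):
        if cur - prev > max_period:
            segments.append((run_start, prev))
            run_start = cur
    segments.append((run_start, gci_samples[-1]))
    return segments
```

For example, `voiced_segments([0, 100, 200, 300, 1000, 1100, 1200], 200)` splits the marks into two voiced runs separated by the long gap.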

Pitch-marking applying Dyadic Wavelet Transform (DyWT)

  • Kadambe adapted Mallat’s algorithm for edge detection in image signals to the detection of GCIs in speech signals

  • The adaptation assumes a correspondence between edges in an image signal and GCIs in a speech signal

  • Computing the DyWT for dyadic scales 2^3 to 2^5 was sufficient for pitch-marking

  • If a peak detected in the DyWT coincides across two consecutive scales, starting from the lower scale, that time instant is taken as a GCI
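
The cross-scale matching rule can be sketched as follows. This is only an illustration under stated assumptions: a Haar-like difference of box averages stands in for the spline wavelet used in Mallat's algorithm, and the relative threshold and matching tolerance (`rel_threshold`, `tolerance`) are hypothetical parameters.

```python
def haar_response(x, scale):
    """Smoothed derivative at one dyadic scale: mean of the next `scale`
    samples minus mean of the previous `scale` samples (Haar-like stand-in)."""
    w = [0.0] * len(x)
    for n in range(scale, len(x) - scale + 1):
        ahead = sum(x[n:n + scale]) / scale
        behind = sum(x[n - scale:n]) / scale
        w[n] = ahead - behind
    return w


def local_peaks(w, rel_threshold=0.5):
    """Indices of local maxima above a fraction of the global maximum."""
    limit = rel_threshold * max(w)
    return [n for n in range(1, len(w) - 1)
            if w[n] > limit and w[n] > w[n - 1] and w[n] >= w[n + 1]]


def detect_gcis(x, scales=(8, 16, 32), tolerance=2):
    """Accept a peak as a GCI if it also appears, within `tolerance`
    samples, at the next higher scale (scales 2^3 to 2^5 by default)."""
    peaks = [local_peaks(haar_response(x, s)) for s in scales]
    found = set()
    for lower, upper in zip(peaks, peaks[1:]):
        for p in lower:
            if any(abs(p - q) <= tolerance for q in upper):
                found.add(p)
    return sorted(found)
```

On a synthetic sawtooth with a sharp jump every 80 samples, `detect_gcis` marks the jump positions, since the response at every scale peaks exactly where the discontinuity sits.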

Figure: original signal
The proposed pitch-marking scheme

  • Detection principle

    • Detect the scale that contains the fundamental period

    • Searching from a higher scale (lower frequency) toward lower scales, the frame power jumps considerably when this scale is reached

  • Features

    • 4X decimation to support high sampling rates

    • Frame based processing and error correction for possible quasi-real-time detection
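
One possible reading of the detection principle is sketched below. It assumes the per-scale frame powers have already been computed, and it uses a hypothetical `jump_ratio` threshold to decide what counts as a considerable jump; neither the threshold nor the function name comes from the presentation.

```python
def find_fundamental_scale(frame_power_by_scale, jump_ratio=4.0):
    """Return the first scale exponent, searching from the highest scale
    downward, whose frame power exceeds jump_ratio times the power of the
    previously visited (higher) scale. Returns None if no jump is found.

    frame_power_by_scale: dict mapping scale exponent j (scale 2^j) to the
    power of the current frame at that scale.
    """
    tiny = 1e-12  # guards against a zero-power reference
    previous = None
    for j in sorted(frame_power_by_scale, reverse=True):
        power = frame_power_by_scale[j]
        if previous is not None and power > jump_ratio * max(previous, tiny):
            return j
        previous = power
    return None
```

With frame powers `{7: 0.01, 6: 0.012, 5: 0.5, 4: 0.3}` the search stops at exponent 5, where the power first rises sharply.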

Comparisons of results with Auto-Tune

Proposed system


Component No. 2: The Modification Engine

Time/pitch/amplitude modification engine

The engine is driven by four functions of the sample index n: a time-modification factor, a pitch-modification factor, an amplitude-modification factor, and the time-warping function D(n)

TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add)

  • Time-domain splicing overlap-add method

  • Used in prosodic modification of speech
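
A minimal TD-PSOLA sketch is given below for orientation. It assumes constant, scalar modification factors and integer pitch marks, whereas the engine described here takes time-varying factors; the names `time_factor` and `pitch_factor` are hypothetical. Two-period Hann-windowed grains are taken at the analysis pitch marks and overlap-added at new synthesis marks.

```python
import math


def td_psola(x, marks, time_factor=1.0, pitch_factor=1.0):
    """Simplified TD-PSOLA with constant factors.

    x: list of samples; marks: increasing pitch-mark indices (at least two).
    time_factor > 1 lengthens the output; pitch_factor > 1 raises the pitch.
    """
    # Local pitch period associated with each analysis mark.
    periods = [b - a for a, b in zip(marks, marks[1:])]
    periods.append(periods[-1])

    out_len = int(round(len(x) * time_factor))
    y = [0.0] * out_len
    t_syn = marks[0] * time_factor  # position of the first synthesis mark
    while t_syn < out_len:
        # Map the synthesis instant back to the analysis time axis and
        # pick the nearest analysis pitch mark.
        t_ana = t_syn / time_factor
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - t_ana))
        p = periods[k]
        centre = int(round(t_syn))
        # Overlap-add a two-period Hann-windowed grain centred on mark k.
        for d in range(-p, p + 1):
            src, dst = marks[k] + d, centre + d
            if 0 <= src < len(x) and 0 <= dst < out_len:
                weight = 0.5 * (1.0 + math.cos(math.pi * d / p))
                y[dst] += weight * x[src]
        # Spacing of synthesis marks sets the output pitch period.
        t_syn += p / pitch_factor
    return y
```

With both factors equal to 1 and evenly spaced marks, overlapping Hann windows sum to one, so the interior of the signal is reproduced unchanged; that property is a convenient sanity check for any implementation.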

Evaluation of the modification engine




Component No. 3: Time-alignment

Time-alignment

  • Based on Verhelst’s prototype system, which applies Dynamic Time Warping (DTW)

  • He claimed that the basic local constraint produces the most accurate time-warping path

  • Computation grows rapidly as the length of comparison increases (quadratically for unconstrained DTW)

  • Accuracy deteriorates as length of comparison increases
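
For reference, a minimal DTW with the basic local constraint (each path step moves one frame forward in the source, in the target, or in both) might look like the sketch below. It compares scalar features with an absolute-difference distance, which is an assumption made for brevity; a real system would compare spectral feature vectors per frame.

```python
def dtw(a, b):
    """Return (total_cost, path) aligning sequences a and b.

    path is a list of (index_in_a, index_in_b) pairs from start to end.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both

    # Backtrack from the end, always taking the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path
```

The table has one cell per pair of frames, so both time and memory grow with the product of the two sequence lengths, which is why long comparisons become expensive.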

Adaptations from Verhelst’s method

  • Proposed to perform time-alignment on a voiced/unvoiced segmental basis

    • DTW for voiced segments

    • Linear Time Warping (LTW) for unvoiced segments

  • Global constraints are introduced to further reduce computations

  • Synchronization of voiced/unvoiced segments is required; in the current implementation it is done by manual editing
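
The two ingredients above can be sketched as follows, under stated assumptions. Linear time warping is shown as a uniform index mapping, and the global constraint is illustrated with a fixed-width band around the diagonal; the presentation does not say which global constraint was used, so the band and its `band` width parameter are hypothetical.

```python
def linear_time_warp(src_len, tgt_len):
    """For each target frame, the source frame it maps to under uniform
    stretching or compression (suitable for unvoiced segments)."""
    if tgt_len == 1:
        return [0]
    step = (src_len - 1) / (tgt_len - 1)
    return [int(k * step + 0.5) for k in range(tgt_len)]


def banded_dtw_cost(a, b, band):
    """DTW cost with the basic local constraint, evaluated only inside a
    band of +/- `band` frames around the diagonal (global constraint)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        centre = i * m / n  # diagonal position for this row
        lo = max(1, int(centre - band))
        hi = min(m, int(centre + band) + 1)
        for j in range(lo, hi + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

Cells outside the band are never evaluated, so the work per row is bounded by the band width rather than by the full length of the other sequence.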

Manipulation of modification parameters

  • The modification factors are smoothed with linear-phase FIR low-pass filters before being fed to the modification engine
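
A small example of this kind of smoothing is sketched below. It assumes a Hann-shaped symmetric tap set (symmetry is what gives linear phase) and pads the contour by repeating its end values so the output stays aligned with the input; the tap count and window choice are illustrative and are not taken from the presentation.

```python
import math


def hann_lowpass_taps(num_taps):
    """Symmetric, hence linear-phase, low-pass taps with unit DC gain."""
    raw = [0.5 - 0.5 * math.cos(2.0 * math.pi * (k + 1) / (num_taps + 1))
           for k in range(num_taps)]
    total = sum(raw)
    return [t / total for t in raw]


def smooth_contour(contour, num_taps=9):
    """Filter a modification-factor contour without shifting it in time.

    The group delay of (num_taps - 1) / 2 samples is compensated by
    centring the filter on each sample; edges are padded by repetition.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter can be centred")
    taps = hann_lowpass_taps(num_taps)
    half = num_taps // 2
    padded = [contour[0]] * half + list(contour) + [contour[-1]] * half
    return [sum(t * padded[n + k] for k, t in enumerate(taps))
            for n in range(len(contour))]
```

A constant contour passes through unchanged, while an isolated spike such as the one in `[1, 1, 1, 5, 1, 1, 1]` is spread out and reduced in height.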

System Limitations

  • Segmentation

    • Lack of a reliable technique for voiced/unvoiced segmentation

    • Segmentation and classification of different vocal sounds is key to devising rules for modification

  • Modification engine

    • Lacks the capability to handle pitch transitions and depends entirely on the pitch-marking stage

System Limitations

  • Pitch-marking

    • Proposed system lacks robustness

    • Despite the desirable time response of the wavelet filter bank, its frequency response cannot isolate harmonics effectively and efficiently

  • Time-alignment

    • The DTW basic local constraint allows infinite time expansion and compression.

    • This factor often causes distortions in the synthesized vocal sample

Conclusions and Recommendations

  • The current system works well for slow and continuous singing

  • Further improvements on the individual components are recommended to handle greater dynamic changes of the vocal signal, thereby extending the current good results to a wider range of singing styles
