A System for Hybridizing Vocal Performance
By Kim Hang Lau


Parameters of the singing voice

  • Parameters of the singing voice can be loosely classified as:

    • Timbre

    • Pitch contour

    • Time contour (rhythm)

    • Amplitude envelope (projections)


Vocal Modification

  • Vocal modification refers to the signal processing of live or recorded singing to achieve a different inflection and/or timbre

  • Commercially available units include

    • Intonation corrector

    • Pitch/formant processor

    • Harmonizer

    • Vocoder


Objectives

  • Prototype a system for vocal modification

  • Modify a source vocal sample to match the time evolution, pitch contour and amplitude envelope of a similarly sung, target vocal sample

  • Simulate a transfer of singing technique from a target vocalist to a source vocalist, thus hybridizing the vocal performance


Order of Presentation

  • System Overview

  • Individual components

  • System evaluation

  • System limitations

  • Conclusions and recommendations


System Overview

  • Three components

    • Pitch-marking

    • Time-alignment

    • Time/pitch/amplitude modification engine

  • Inspired by Verhelst’s prototype system for the post-synchronization of speech utterances


Targeted System Specifications


Component No. 1: Pitch-marking


Pitch-marking and Glottal Closure Instants (GCIs)

[Figure: waveform with pitch-marks labelled P and P'; two 5 ms intervals are annotated]

  • Information generated from pitch-marking

    • Pitch period

    • Amplitude envelope

    • Voiced/unvoiced segment boundaries
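As a rough illustration of how the three kinds of information listed above could be read off a list of pitch-marks, here is a minimal Python sketch. The function name, the `max_period` threshold and the choice of taking the envelope as the sample magnitude at each mark are assumptions made for illustration; the slides do not say how the prototype computes them.

```python
def pitch_mark_info(marks, samples, max_period):
    """Derive pitch periods, an amplitude envelope and voiced segments.

    marks:      sorted sample indices of the pitch-marks
    samples:    the signal as a list of floats
    max_period: largest gap (in samples) still treated as one pitch period
                (hypothetical threshold; larger gaps are treated as unvoiced)
    """
    if not marks:
        return [], [], []
    periods = []    # gaps between consecutive marks inside voiced segments
    segments = []   # (first_mark, last_mark) of each voiced segment
    start = marks[0]
    for prev, cur in zip(marks, marks[1:]):
        gap = cur - prev
        if gap <= max_period:
            periods.append(gap)
        else:
            # a long gap closes the current voiced segment
            segments.append((start, prev))
            start = cur
    segments.append((start, marks[-1]))
    # assumed envelope estimate: signal magnitude at each pitch-mark
    envelope = [abs(samples[m]) for m in marks]
    return periods, envelope, segments
```

Everything between two voiced segments is treated as unvoiced in this sketch, which is a simplification.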


Pitch-marking applying Dyadic Wavelet Transform (DyWT)

  • Kadambe adapted Mallat’s algorithm for edge detection in image signals to the detection of GCIs in speech signals

  • The adaptation assumes a correspondence between edges in image signals and GCIs in speech signals

  • DyWT computation for dyadic scales 2^3 to 2^5 was found sufficient for pitch-marking

  • If a peak detected in the DyWT occurs at the same time-instant in two consecutive scales, starting from the lower scale, that time-instant is taken as a GCI
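The two-scale matching rule above can be sketched in Python as follows. This sketch assumes the DyWT coefficients have already been computed for each scale and are supplied as plain lists; `threshold` and `tol` are hypothetical parameters (a peak-height threshold and a matching tolerance in samples), not values from the slides.

```python
def local_maxima(coeffs, threshold):
    """Indices where coeffs has a local maximum above threshold."""
    return [i for i in range(1, len(coeffs) - 1)
            if coeffs[i] > threshold
            and coeffs[i] >= coeffs[i - 1]
            and coeffs[i] > coeffs[i + 1]]


def gci_candidates(scale_coeffs, threshold, tol):
    """Accept peaks that agree across two consecutive scales.

    scale_coeffs: DyWT coefficient sequences ordered from lower to higher
                  scale (for example 2^3, 2^4, 2^5), all of equal length.
    """
    accepted = []
    for low, high in zip(scale_coeffs, scale_coeffs[1:]):
        high_peaks = local_maxima(high, threshold)
        for p in local_maxima(low, threshold):
            # skip instants already accepted from a lower pair of scales
            if any(abs(p - a) <= tol for a in accepted):
                continue
            if any(abs(p - q) <= tol for q in high_peaks):
                accepted.append(p)
    return sorted(accepted)
```

Allowing a small tolerance rather than demanding identical indices is a design choice of this sketch, since peaks tend to shift slightly between scales.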


[Figure: DyWT decomposition of the original signal at dyadic scales 2^1 to 2^5 plus the base-band; labels: Mallat, Kadambe]


The proposed pitch-marking scheme

  • Detection principle

    • Detection of the scale that contains the fundamental period

    • Starting from a higher scale (of lower frequency), there is a considerable jump in frame power when this scale is encountered

  • Features

    • 4× decimation to support high sampling rates

    • Frame-based processing and error correction for possible quasi-real-time detection
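One way to picture the detection principle is a scan over the per-scale frame powers, from the highest scale downwards, that stops at the first large jump. The Python sketch below is only an interpretation of the slide: the `jump_ratio` threshold and the dictionary layout are assumptions, and the decimation and error-correction features are not modelled.

```python
def frame_power(frame):
    """Mean squared value of one frame of coefficients."""
    return sum(x * x for x in frame) / len(frame)


def select_scale(frames_by_scale, jump_ratio=4.0):
    """Pick the scale at which frame power jumps.

    frames_by_scale: dict mapping the scale exponent j (for scale 2^j) to
                     the wavelet coefficients of the current frame.
    jump_ratio:      hypothetical threshold for a "considerable jump".
    Returns the exponent j of the first scale, scanning from the highest
    scale down, whose power exceeds jump_ratio times the power of the
    previously visited (higher) scale, or None if no jump is found.
    """
    scales = sorted(frames_by_scale, reverse=True)
    prev = frame_power(frames_by_scale[scales[0]])
    for j in scales[1:]:
        power = frame_power(frames_by_scale[j])
        if power > jump_ratio * max(prev, 1e-12):
            return j
        prev = power
    return None
```

Note that a ratio test of this kind cannot flag the highest scale itself, so a real detector would need a separate rule for that case.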


The proposed pitch-marking system


Comparisons of results with Auto-Tune

[Figure: results shown in two panels, "Proposed system" and "Auto-Tune"]


Component No. 2: The Modification Engine


Time/pitch/amplitude modification engine

[Figure: block diagram of the modification engine with its control inputs]

  • Time-modification factor (a function of n)

  • Pitch-modification factor (a function of n)

  • Amplitude-modification factor (a function of n)

  • D(n): time-warping function


TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add)

  • Time-domain splicing overlap-add method

  • Used in prosodic modification of speech
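To make the overlap-add idea concrete, here is a minimal Python sketch of TD-PSOLA for a single voiced segment. It is a simplification under stated assumptions: the time and pitch factors are constants (the prototype uses time-varying factors), each synthesis grain is taken from the nearest analysis pitch-mark, and amplitude modification is left out. The function and parameter names are illustrative only.

```python
import math


def td_psola(x, marks, time_factor=1.0, pitch_factor=1.0):
    """Overlap-add Hann-windowed two-period grains at new pitch-marks.

    x:     signal as a list of floats
    marks: sorted analysis pitch-marks (sample indices) of a voiced segment
    time_factor > 1 lengthens the output; pitch_factor > 1 raises the pitch.
    """
    if len(marks) < 2:
        return list(x)
    out_len = int(round(len(x) * time_factor))
    y = [0.0] * out_len
    # local pitch period at each analysis mark (the last reuses the previous)
    periods = [b - a for a, b in zip(marks, marks[1:])]
    periods.append(periods[-1])
    t = marks[0] * time_factor        # position of the first synthesis mark
    end = marks[-1] * time_factor
    while t <= end:
        # analysis mark closest to this synthesis instant mapped back in time
        src = t / time_factor
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - src))
        p = periods[k]
        centre = int(round(t))
        for n in range(-p, p + 1):
            i, j = marks[k] + n, centre + n
            if 0 <= i < len(x) and 0 <= j < out_len:
                w = 0.5 * (1.0 + math.cos(math.pi * n / p))  # Hann weight
                y[j] += w * x[i]
        t += p / pitch_factor         # synthesis marks follow the new period
    return y
```

With both factors equal to 1 and evenly spaced marks, adjacent Hann windows sum to one, so the samples between the first and last mark are reproduced unchanged. This gives a simple sanity check before trying other factors.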


Evaluation of the modification engine

[Figure: three panels labelled "Original", "TD-PSOLA" and "Auto-Tune"]


Component No. 3: Time-alignment


Time-alignment

  • Based on Verhelst’s prototype system, which applies Dynamic Time Warping (DTW)

  • Verhelst claimed that the basic local constraint produces the most accurate time-warping path

  • Computation grows steeply (quadratically, for unconstrained DTW) as the length of comparison increases

  • Accuracy deteriorates as the length of comparison increases
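For readers unfamiliar with DTW, the following Python sketch shows the textbook recursion with the basic local constraint, in which each cell may be reached from its left, lower or diagonal neighbour. Using scalar features with an absolute-difference distance is an assumption made to keep the example short; the slides do not state which features or distance the prototype uses.

```python
def dtw_path(a, b):
    """Align sequences a and b; return (total cost, warping path).

    The path is a list of (i, j) index pairs from (0, 0) to the last
    elements of a and b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(a[i] - b[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            cost[i][j] = d + best
    # backtrack from the end along the cheapest predecessors
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path
```

The cost table has `len(a) * len(b)` cells, which is why long comparisons become expensive.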


Adaptations from Verhelst’s method

  • Time-alignment is performed on a voiced/unvoiced segmental basis

    • DTW for voiced segments

    • Linear Time Warping (LTW) for unvoiced segments

  • Global constraints are introduced to further reduce computations

  • Synchronization of the voiced/unvoiced segments is required; in the current implementation it is done by manual editing
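The two helpers below sketch, in Python, what linear time warping of an unvoiced segment and a global path constraint might look like. Both are assumptions for illustration: the slides do not specify the form of the global constraints, so a fixed-width band around the diagonal is used here, and the names `linear_time_warp`, `in_band` and `width` are hypothetical.

```python
def linear_time_warp(src_len, tgt_len):
    """Source frame index for each target frame under a uniform stretch."""
    if src_len < 1 or tgt_len < 1:
        return []
    if tgt_len == 1:
        return [0]
    # integer arithmetic with rounding to the nearest source frame
    den = 2 * (tgt_len - 1)
    return [(2 * k * (src_len - 1) + (tgt_len - 1)) // den
            for k in range(tgt_len)]


def in_band(i, j, n, m, width):
    """True if DTW cell (i, j) lies within `width` frames of the diagonal.

    n and m are the lengths of the two sequences being aligned; cells
    outside the band can be skipped to reduce computation.
    """
    expected = i * (m - 1) / (n - 1) if n > 1 else 0.0
    return abs(expected - j) <= width
```

In a segmental scheme, `linear_time_warp` would be applied to each unvoiced segment pair, and `in_band` would prune cells inside the DTW loop for each voiced pair.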


Manipulation of modification parameters

  • The modification parameters are smoothed with linear-phase FIR low-pass filters before being fed to the modification engine
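A minimal Python sketch of such smoothing follows. It assumes a Hann-shaped symmetric kernel (symmetric taps give linear phase), centres the kernel on each sample to cancel the filter delay, and repeats the edge values at the boundaries. The tap count and the kernel shape are illustrative choices, not details from the slides.

```python
import math


def hann_lowpass(num_taps):
    """Symmetric low-pass FIR taps with unit gain at DC."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the kernel has a centre tap")
    taps = [0.5 - 0.5 * math.cos(2 * math.pi * (k + 1) / (num_taps + 1))
            for k in range(num_taps)]
    total = sum(taps)
    return [t / total for t in taps]


def smooth(contour, num_taps=9):
    """Smooth a parameter contour without shifting it in time."""
    taps = hann_lowpass(num_taps)
    half = num_taps // 2
    last = len(contour) - 1
    out = []
    for n in range(len(contour)):
        acc = 0.0
        for k, h in enumerate(taps):
            idx = min(max(n + k - half, 0), last)  # repeat edge values
            acc += h * contour[idx]
        out.append(acc)
    return out
```

Because the kernel is normalised, a constant contour passes through unchanged, while isolated jumps in a modification factor are spread over neighbouring frames.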


The Prototype System


System Evaluation: case 1


System Evaluation: case 2


System Limitations

  • Segmentation

    • Lack of a reliable technique for voiced/unvoiced segmentation

    • Segmentation and classification of different vocal sounds is key to devising rules for modification

  • Modification engine

    • Lacks the capability to handle pitch transitions and depends entirely on the pitch-marking stage


System Limitations (continued)

  • Pitch-marking

    • Proposed system lacks robustness

    • Despite the desirable time response of the wavelet filter bank, its frequency response cannot isolate harmonics effectively or efficiently

  • Time-alignment

    • The DTW basic local constraint allows unbounded time expansion and compression

    • This often causes distortion in the synthesized vocal sample


Conclusions and Recommendations

  • The current system works well for slow and continuous singing

  • Further improvements on the individual components are recommended to handle greater dynamic changes of the vocal signal, thereby extending the current good results to a wider range of singing styles


Questions & Answers


Wavelet filter bank


Dyadic Spline Wavelet


Wide-band analysis


DTW local constraints


Calculation of pitch-marks


DyWT

