A System for Hybridizing Vocal Performance

By Kim Hang Lau

Parameters of the singing voice
  • Parameters of the singing voice can be loosely classified as:
    • Timbre
    • Pitch contour
    • Time contour (rhythm)
    • Amplitude envelope (projections)
Vocal Modification
  • Vocal modification refers to the signal processing of live or recorded singing to achieve a different inflection and/or timbre
  • Commercially available units include
    • Intonation corrector
    • Pitch/formant processor
    • Harmonizer
    • Vocoder
  • Prototype a system for vocal modification
  • Modify a source vocal sample to match the time evolution, pitch contour and amplitude envelope of a similarly sung, target vocal sample
  • Simulates a transfer of singing techniques from a target vocalist to a source vocalist – thus a hybridizing vocal performance
Order of Presentation
  • System Overview
  • Individual components
  • System evaluation
  • System limitations
  • Conclusions and recommendations
System Overview
  • Three components
    • Pitch-marking
    • Time-alignment
    • Time/pitch/amplitude modification engine
  • Inspired by Verhelst’s prototype system for the post-synchronization of speech utterances
Pitch-marking and Glottal Closure Instants (GCIs)
  • Information generated from pitch-marking
    • Pitch period
    • Amplitude envelope
    • Voiced/unvoiced segment boundaries
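A minimal sketch of how the three items above could be derived once GCI positions are known. The function name, the `max_period_s` gap threshold, and the use of the per-cycle peak as the amplitude measure are illustrative assumptions, not details taken from the presentation.

```python
import numpy as np

def describe_pitch_marks(signal, gci, fs, max_period_s=0.02):
    """Derive pitch periods, a per-cycle amplitude envelope and
    voiced/unvoiced boundary candidates from GCI sample indices."""
    signal = np.asarray(signal, dtype=float)
    gci = np.asarray(gci, dtype=int)
    # Pitch period: time between successive GCIs, in seconds.
    periods = np.diff(gci) / fs
    # Amplitude envelope: peak absolute value inside each pitch cycle.
    amplitude = np.array(
        [np.max(np.abs(signal[a:b])) for a, b in zip(gci[:-1], gci[1:])]
    )
    # Assumed rule: a gap longer than max_period_s cannot be one pitch
    # cycle, so it is flagged as a voiced/unvoiced boundary.
    boundaries = np.flatnonzero(periods > max_period_s)
    return periods, amplitude, boundaries
```

With marks every 80 samples at 8 kHz, each period comes out as 10 ms; a long gap between two marks shows up in `boundaries`.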
Pitch-marking applying Dyadic Wavelet Transform (DyWT)
  • Kadambe adapted Mallat’s algorithm for edge detection in image signals to the detection of GCIs in speech signals
  • He assumed a correlation between edges in image signals and GCIs in speech signals
  • Computing the DyWT for dyadic scales 2^3 to 2^5 was sufficient for pitch-marking
  • If a peak detected in the DyWT occurs at the same time instant in two consecutive scales, starting from the lower scale, that time instant is taken as a GCI
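The peak-matching rule can be illustrated with a short sketch. It is not Kadambe's implementation: a first-derivative-of-Gaussian filter stands in for the DyWT wavelet, only the two lowest scales (2^3 and 2^4) are compared, and the names `dog_kernel`, `local_maxima`, `detect_gci`, the tolerance `tol` and the threshold `rel_threshold` are all assumptions made for illustration.

```python
import numpy as np

def dog_kernel(scale):
    """First derivative of a Gaussian with sigma = scale samples
    (assumed stand-in for the edge-detecting wavelet)."""
    t = np.arange(-4 * scale, 4 * scale + 1)
    k = -t * np.exp(-0.5 * (t / scale) ** 2)
    return k / np.sum(np.abs(k))

def local_maxima(x, threshold):
    """Indices of local maxima of x that exceed threshold."""
    mid = x[1:-1]
    hits = (mid > x[:-2]) & (mid >= x[2:]) & (mid > threshold)
    return np.flatnonzero(hits) + 1

def detect_gci(signal, scales=(8, 16), tol=2, rel_threshold=0.3):
    """Keep a peak at the lower scale only if the next scale has a
    peak within tol samples of it."""
    peaks = []
    for s in scales:
        w = np.convolve(signal, dog_kernel(s), mode="same")
        peaks.append(local_maxima(w, rel_threshold * w.max()))
    fine, coarse = peaks
    if coarse.size == 0:
        return np.array([], dtype=int)
    keep = [p for p in fine if np.abs(coarse - p).min() <= tol]
    return np.array(keep, dtype=int)
```

On a sawtooth with a sharp rise every 80 samples, the detected instants fall at those rises. Real singing would need a proper wavelet and tuned thresholds.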



[Figure panel: Original Signal]
The proposed pitch-marking scheme
  • Detection principle
    • Detection of the scale that contains the fundamental period
    • Starting from a higher scale (of lower frequency), there is a considerable jump in frame power when this scale is encountered
  • Features
    • 4X decimation to support high sampling rates
    • Frame based processing and error correction for possible quasi-real-time detection
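The detection principle can be sketched as a scan over octave bands from coarse to fine. This is only an illustration under several assumptions: an FFT of the frame replaces the wavelet filter bank, scale 2^j is taken to cover normalized frequencies from 1/2^(j+1) to 1/2^j cycles per sample, and the "considerable jump" is modelled as a band holding more than `jump_fraction` of the frame power. Decimation and error correction are not shown, and `fundamental_scale` is a hypothetical name.

```python
import numpy as np

def fundamental_scale(frame, max_level=8, min_level=1, jump_fraction=0.2):
    """Return the first dyadic level, scanning from coarse to fine,
    whose band power is a large share of the frame power."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame))      # cycles per sample
    total = power[1:].sum()                  # ignore the DC bin
    for level in range(max_level, min_level - 1, -1):
        lo, hi = 2.0 ** -(level + 1), 2.0 ** -level
        band = power[(freqs >= lo) & (freqs < hi)].sum()
        if band > jump_fraction * total:
            return level
    return None                              # no band stood out
```

For a 100 Hz tone sampled at 8 kHz (period 80 samples), the scan stops at level 6, whose assumed band spans periods of 64 to 128 samples.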
Time/pitch/amplitude modification engine

  • Time-modification factor, a function of n
  • Pitch-modification factor, a function of n
  • Amplitude-modification factor, a function of n
  • D(n): time-warping function

TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add)
  • Time-domain splicing overlap-add method
  • Used in prosodic modification of speech
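A minimal TD-PSOLA sketch for constant modification factors follows. It assumes pitch marks are already available, uses two-period Hann-windowed grains, and names its factors `alpha` (time) and `beta` (pitch). Those names, the grain shape and the window-sum normalization are illustrative choices; the engine in the presentation uses time-varying factors and also modifies amplitude, which is not shown here.

```python
import numpy as np

def td_psola(x, marks, alpha=1.0, beta=1.0):
    """Stretch duration by alpha and scale pitch by beta
    (both constant here, an assumption)."""
    x = np.asarray(x, dtype=float)
    marks = np.asarray(marks, dtype=int)
    periods = np.diff(marks)
    out_len = int(round(len(x) * alpha))
    y = np.zeros(out_len)
    weight = np.zeros(out_len)
    t_syn = marks[0] * alpha                  # first synthesis mark
    while t_syn < marks[-1] * alpha:
        # Analysis mark closest to this synthesis instant.
        i = int(np.argmin(np.abs(marks[:-1] - t_syn / alpha)))
        period = int(periods[i])
        lo, hi = marks[i] - period, marks[i] + period + 1
        c = int(round(t_syn))
        a, b = c - period, c + period + 1
        if lo >= 0 and hi <= len(x) and a >= 0 and b <= out_len:
            window = np.hanning(2 * period + 1)
            y[a:b] += window * x[lo:hi]       # overlap-add the grain
            weight[a:b] += window
        t_syn += period / beta                # new pitch period
    covered = weight > 1e-8
    y[covered] /= weight[covered]             # undo window overlap gain
    return y
```

With `beta=2.0` the grains are placed twice as densely, so a signal with an 80-sample period comes out with a 40-sample period at unchanged duration. With `alpha=1.5` grains are repeated so the output is 1.5 times longer at the original pitch.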
Evaluation of the modification engine




Time-alignment
  • Based on Verhelst’s prototype system, which applies Dynamic Time Warping (DTW)
  • He claimed that the basic local constraint produces the most accurate time-warping path
  • Computation increases steeply as the length of the comparison increases (the DTW cost grid scales with the product of the two sequence lengths)
  • Accuracy deteriorates as the length of the comparison increases
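For reference, DTW with the basic local constraint (each step moves one frame forward in the source, the target, or both) can be sketched as below. The scalar features and absolute-difference distance are assumptions for illustration; the features used in Verhelst's system are not given in these slides.

```python
import numpy as np

def dtw_path(a, b):
    """Return (total cost, warping path) aligning sequences a and b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Basic local constraint: diagonal, vertical or horizontal step.
            cost[i, j] = d + min(cost[i - 1, j - 1],
                                 cost[i - 1, j],
                                 cost[i, j - 1])
    # Backtrack from the end to recover the time-warping path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1, j - 1], i - 1, j - 1),
                 (cost[i - 1, j], i - 1, j),
                 (cost[i, j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    return cost[n, m], path[::-1]
```

The nested loops visit every cell of the n-by-m grid, which is why the cost rises quickly with the length of the comparison.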
Adaptations from Verhelst’s method
  • Proposed to perform time-alignment on a voiced/unvoiced segmental basis
    • DTW for voiced segments
    • Linear Time Warping (LTW) for unvoiced segments
  • Global constraints are introduced to further reduce computations
  • Synchronization of voiced/unvoiced segments is required; in the current implementation this is done by manual editing
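The two adaptations can be sketched as follows, assuming a simple band around the diagonal as the global constraint (the slides do not say which global constraint was used) and scalar features. `ltw_indices` and `banded_dtw_cost` are hypothetical names.

```python
import numpy as np

def ltw_indices(n_source, n_target):
    """Linear time warping: source frame index for each target frame."""
    return np.rint(np.linspace(0, n_source - 1, n_target)).astype(int)

def banded_dtw_cost(a, b, band=np.inf):
    """DTW cost with the basic local constraint, restricted to cells
    within `band` frames of the straight diagonal (assumed band shape)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(i - j * n / m) > band:
                continue          # outside the band: cell is never evaluated
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1],
                                 cost[i - 1, j],
                                 cost[i, j - 1])
    return cost[n, m]
```

A narrow band skips most cells but forbids strongly skewed paths, so its cost can exceed the unconstrained one. A fuller version would loop only over the in-band columns instead of testing every cell.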
Manipulation of modification parameters
  • Two of the modification factors, both functions of n, are smoothed with simple linear-phase FIR low-pass filters before being fed to the modification engine
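A sketch of such smoothing, assuming a normalized Hamming window as the filter taps and an odd tap count so the group delay can be removed exactly. The tap count, window choice and edge padding are assumptions, not values from the presentation.

```python
import numpy as np

def smooth_contour(values, num_taps=31):
    """Low-pass a modification-factor contour with a symmetric
    (linear-phase) FIR filter, compensating its delay."""
    values = np.asarray(values, dtype=float)
    taps = np.hamming(num_taps)          # symmetric taps => linear phase
    taps /= taps.sum()                   # unity gain at DC
    half = num_taps // 2
    # Repeat the edge values so the output keeps the input length
    # and is centred on the input (zero net delay).
    padded = np.pad(values, half, mode="edge")
    return np.convolve(padded, taps, mode="valid")
```

A constant contour passes through unchanged, while an abrupt step in a factor is spread over roughly `num_taps` samples.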
System Limitations
  • Segmentation
    • Lack of a reliable technique for voiced/unvoiced segmentation
    • Segmentation and classification of different vocal sounds is key to devising rules for modification
  • Modification engine
    • Lacks the capability to handle pitch transitions and depends entirely on the pitch-marking stage
System Limitations
  • Pitch-marking
    • Proposed system lacks robustness
    • Despite the desirable time response of the wavelet filter bank, its frequency response cannot isolate harmonics effectively and efficiently
  • Time-alignment
    • The DTW basic local constraint allows unbounded time expansion and compression
    • This often causes distortions in the synthesized vocal sample
Conclusions and Recommendations
  • The current system works well for slow and continuous singing
  • Further improvements on the individual components are recommended to handle greater dynamic changes of the vocal signal, thereby extending the current good results to a wider range of singing styles