1 / 16

ESTIMATION OF A PHYSICAL MODEL OF THE VOCAL FOLDS VIA DYNAMIC PROGRAMMING TECHINQUES

ESTIMATION OF A PHYSICAL MODEL OF THE VOCAL FOLDS VIA DYNAMIC PROGRAMMING TECHINQUES. E. Marchetto 1 , F. Avanzini 1 , and C. Drioli 2 1 Dept. of Information Engineering, University of Padova, Italy 2 Dept. of Computer Science, University of Verona, Italy. MAVEBA 2007 Firenze, 13-15 Dec. 2007.

lindsay
Download Presentation

ESTIMATION OF A PHYSICAL MODEL OF THE VOCAL FOLDS VIA DYNAMIC PROGRAMMING TECHINQUES

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ESTIMATION OF A PHYSICAL MODEL OF THE VOCAL FOLDS VIA DYNAMIC PROGRAMMING TECHINQUES E. Marchetto1, F. Avanzini1, and C. Drioli2 1Dept. of Information Engineering, University of Padova, Italy 2Dept. of Computer Science, University of Verona, Italy MAVEBA 2007 Firenze, 13-15 Dec. 2007

  2. Summary • The physical model and its control • A codebook between articulatory vectors and acoustical vectors • The inverse problem and its codebook • Non-univocity issue, cost function and dynamic programming • Applications of the RBFNs by clustering • Results with a resynthesis example • Conclusions

  3. The physical model • We refer to the two-mass vocal folds model presented in [1] • One-dimensional, quasi-stationary and incompressible flow • Time-varying separation point • Vocal tract modeled as an inertive load [2]

  4. Control of the physical model • Low-level physical parameters are not independently controlled by a speaker: more physiologically motivated control spaces are needed. • In [4] a set of rules, derived from [3], was used to control a two-mass model. • The rules link vocal fold geometry to the activation levels of three muscles: cricothyroid , thyroarytenoid and cricoarytenoid . We also consider the subglottal pressure . • Values normalized in [0-1], except in [0.5-1.5]kPa. The physical model is completely controlled by a set of only four articulatory parameters:

  5. Articulatory vector Acoustical vector The direct codebook • The glottal pulse is characterized by means of a set of well-known acoustic parameters: • Foundamental frequency (F0) • Open, Speed, Return Quotients (OQ, SQ, RQ) • Normalized Amplitude Quotient (NAQ) • Direct codebook as a Dictionary: • Articulatory vectors are the keys • Acoustical vector are the values • Only one value for each key

  6. The direct codebook • Large number of numerical simu-lations of the two-mass model (about 100k) • 86125 vectors in the codebook • The figure shows the distributions of the acoustical parameters

  7. The inverse problem • Given a glottal flow we want to estimate the articulatory vectors which, used as input to the simulator, lead to a re-synthesis of the given glottal flow • The problem is in principle non-unique • We build an inverse codebook • Each acoustical vector is associated to one or more articulatory vectors • How to tackle the non-uniqueness problem during the inverse lookup process? Dynamic programming techniques

  8. Dynamic programming • Rather than work on single vectors, we sub-divide the acoustical input sequence in frames • In each frame we find the optimal sequence of articulatory vectors by minimizing a cost function • Three terms: • Acoustical distance between input vector and its discretized companion in the codebook • Articulatory effort: distance between each consecutive articulatory vector in the output sequence • Accumulation term: provides a way to find the global minimum for the entire frame, but causes exponential complexity

  9. Dynamic programming • We have N acoustical vectors in the frame, each associated with Vk possible articulatory vectors • Lookup process in brief (for each frame): • Forward: Compute the cost function for each path • Backward: Minimize the cost function and choose the optimal output sequence for the frame • Dynamic Prog. cuts down the complexity from expo-nential to polynomial • Exploiting the optimal sub-structure we are able to store many values instead of recalculate them

  10. Radial Basis Function Networks • A way to interpolate the articulatory space • The input vectors are rarely present in the codebook • The output can only be the nearest approximation • We apply the RBFNs to interpolate from the acoustical space to the articulatory one[5]: • RBFNs are defined for functions, not for multi-maps Need to overcome the non-uniqueness

  11. Clusters and subclusters Cluster Acoustical space • The algorithm avoids the non-uniqueness problem • Subdivide the acoustical space in clusters • Associate to each cluster one or more subclusters in the articulatory space Subcluster Articulatory space • Subclusters are built joining the nearest vectors • Find a sort of hyperplanes in the articulatory space and put together the nearest vectors • Create as many subclusters as are necessary to put every non-unique vector in a different subcluster

  12. Results • We apply the descripted techniques to a complete resynthesis example • The process in brief: • Starting from a recorded utterance, we estimate the glottal flow and characterize it by means of the acoustical parameters before descripted • The obtained vectors are used as input for dynamic programming and eventually RBFNs • The output articulatory vectors drive the numerical simulator, which outputs a full synthetic flow • Filtering the obtained flow with tempo-variant formants (from recorded utterance) we are able to obtain the resynthetized speech

  13. Without RBFNs With RBFNs Results / Articulatory vectors Legend • About 160 vectors retrieved by dynamic programming. Notice the smoothness of the RBFNs vectors.

  14. Without RBFNs Reference Results / Acoustical vectors Legend • Comparison between the reference (input) acoustical vectors and the ones obtained by a look-up in the direct codebook using the vectors of the previous slide as keys With RBFNs

  15. Conclusions • We develop an effective approach to cope with the inverse problem, with reference to the glottal source • The cost function seems to adequately model the physiological facts • RBFNs have proved as a good tool in this context, but some work remains to be done (weights determination and other peculiarities) • The resynthesis is perceptually good • Also the time-varying vectors are almost well followed • Usually NAQ is followed with good accuracy • We recall the relation between NAQ and voice quality

  16. References • [1] N. J. C. Lous, G. C. J. Hofmans, R. N. J. Veldhuis, and A. Hirschberg, “A symmetrical two-mass vocal-fold model coupled to vocal tract and trachea, with application to prothesis design”, Acta Acustica united with Acustica, vol. 84 pp. 1135-1150, 1998 • [2] I. R. Titze and B. H. Story, “Acoustic interactions of the voice source with the lower vocal tract”, J. Acoust. Soc. Am., vol. 101(4) pp. 2234-2243, Apr. 1997 • [3] -, “Rules for controlling low-dimensional vocal fold models with muscle activation”, J. Acoust. Soc. Am., vol. 112(3) pp. 1064-1027, Sep. 2002 • [4] F. Avanzini, S. Maratea and C. Drioli, “Physiological control of low-dimensional glottal models with applications to voice-source parameter matching”, Acta Acustica united with Acustica, vol. 92 suppl. 1 pp. 731-740, Aug. 2002 • [5] T. Poggio and F. Girosi, “Networks for approximation and learning”, Proceedings of the IEEE, vol. 78(9) pp.1481-1497, Sep. 1990

More Related