Speech is more than only its linguistic content

Milan Rusko

Institute of Informatics of the Slovak Academy of Sciences

Dubravska cesta 9, 847 05 Bratislava, Slovakia

[email protected]

WIKT 2006


Expressive speech

“Expressive speech” designates the whole vocal display of a speaker.

It consists of:

  • Linguistic information: the part of the information that can be encoded in a general written text message

  • Various additional information about the speaker: age, cultural background, education, sex, intention, relation to the listener, individuality, etc.

(The expression “individuality” is used here to denote the personality, mood (attitude) and emotions of a speaker.)

Expressive speech

Expression => SPEECH => Impression

Personality (and temperament)

Personality is considered to be a set of constant features of an individual.

Temperament is that aspect of personality that is genetically based, inborn.

Ancient Greeks: 2 dimensions of temperament => 4 types of temperament:

  • sanguine type (cheerful and optimistic, pleasant to be with)

  • choleric type (quick, hot temper, often an aggressive nature)

  • phlegmatic type (characterized by slowness, laziness, and dullness)

  • melancholic type (sad, even depressed, pessimistic view of the world)

Generalized model of personality

Personality p has n dimensions, so it can be represented by the following vector (Egges, A., Kshirsagar, S., Magnenat-Thalmann, N. [2]):

$$p^{T} = [\alpha_1 \ \ldots \ \alpha_n], \quad \forall i \in [1, n]: \alpha_i \in [0, 1]$$

The OCEAN model: the “Big Five” model of personality

Five dimensions are enough to express the personality.

The Big Five model, also known as the OCEAN model, takes into account the following five dimensions of personality:

  • Openness

  • Conscientiousness

  • Extraversion

  • Agreeableness

  • Neuroticism

(Digman, J. M. [3]; McCrae, R. R., John, O. P. [4])


Traditional psychological classification of personality dimensions: the Five Factor Model [Digman 1990; McCrae, John 1992]


Mood and emotion

Mood (attitude) can be defined as a rather static state of being that is less static than personality and less fluid than emotions. Mood can be one-dimensional (e.g. good or bad mood) or perhaps multi-dimensional (feeling in love, being paranoid, etc.).

(Kshirsagar & Magnenat-Thalmann [5])

Generalized model of emotion

An emotional state has a structure similar to that of personality, but it changes over time.

It is defined as an m-dimensional vector in which each of the m emotion intensities is represented by a value in the interval [0,1].

The current emotional state depends on how the emotions have evolved so far, so emotions need to be modeled with respect to their previous trends (history).

An emotional state history ω_t is therefore defined that contains all emotional states up to e_t:

$$\omega_t = \langle e_0, e_1, \ldots, e_t \rangle$$

Generalized model of mood

Egges continues by defining the individual I_t as a triple (p, m_t, e_t), where m_t represents the mood of the individual at time t.

Each mood dimension is defined as a value in the interval [-1,1].

With k mood dimensions, the mood can be described as follows:

$$m_t^{T} = [\gamma_1 \ \ldots \ \gamma_k], \quad \forall i \in [1, k]: \gamma_i \in [-1, 1]$$

The mood and emotional values change over time, so both have to be updated regularly.
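The triple (p, m_t, e_t) and the emotional state history can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `Individual`, the helper `_check` and the method `update` are hypothetical, the personality range [0, 1] is assumed here, and the rule that actually computes new mood and emotion values is not given in the slides, so the caller supplies them.

```python
from dataclasses import dataclass, field
from typing import List


def _check(values, low, high, name):
    """Reject values outside the closed interval [low, high]."""
    for v in values:
        if not low <= v <= high:
            raise ValueError(f"{name} value {v} outside [{low}, {high}]")
    return list(values)


@dataclass
class Individual:
    """Hypothetical container for the individual I_t = (p, m_t, e_t)."""
    personality: List[float]   # p: n dimensions, constant (range [0, 1] assumed)
    mood: List[float]          # m_t: k dimensions, each in [-1, 1]
    emotion: List[float]       # e_t: m intensities, each in [0, 1]
    history: List[List[float]] = field(default_factory=list)  # omega_t

    def __post_init__(self):
        self.personality = _check(self.personality, 0.0, 1.0, "personality")
        self.mood = _check(self.mood, -1.0, 1.0, "mood")
        self.emotion = _check(self.emotion, 0.0, 1.0, "emotion")
        self.history.append(list(self.emotion))  # store e_0

    def update(self, new_mood, new_emotion):
        """Store externally computed mood and emotion for the next time step."""
        self.mood = _check(new_mood, -1.0, 1.0, "mood")
        self.emotion = _check(new_emotion, 0.0, 1.0, "emotion")
        self.history.append(list(self.emotion))  # extend omega_t
```

Personality is set once and never touched by `update`, which mirrors the statement that only mood and emotion change over time.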

Basic emotions

There are many theories of emotions and many different classifications exist.

This table, taken from Ortony, A., Turner, T. J. [6], gives a short overview of the basic emotion sets used by different authors.

Placement on emotion dimensions

Pleasure

  • Happy <======> Unhappy
  • Pleased <======> Annoyed
  • Satisfied <======> Unsatisfied
  • Contented <======> Melancholic
  • Hopeful <======> Despairing
  • Relaxed <======> Bored

Arousal

  • Stimulated <======> Relaxed
  • Excited <======> Calm
  • Frenzied <======> Sluggish
  • Jittery <======> Dull
  • Wide-awake <======> Sleepy
  • Aroused <======> Unaroused

Dominance

  • Controlling <======> Controlled
  • Influential <======> Influenced
  • In control <======> Cared-for
  • Important <======> Awed
  • Dominant <======> Submissive
  • Autonomous <======> Guided

Semantic differential scales are often used for measuring emotion dimensions.

The set of dimensions listed here was proposed by Mehrabian & Russell (1974, Appendix B, p. 216) [7].

It is evident that the authors have included mood and personality dimensions in this system too.
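One way to turn ratings on the scales listed above into a value per dimension is to average the ratings within each dimension. The sketch below is an illustration only: the rating range of -4 to +4 (a 9-point scale, positive towards the left-hand adjective), the normalization to [-1, 1] and the names `SCALES` and `pad_scores` are assumptions, not something stated in the slides.

```python
from statistics import mean

# Bipolar adjective pairs per dimension, copied from the scales listed above.
SCALES = {
    "pleasure": [("Happy", "Unhappy"), ("Pleased", "Annoyed"),
                 ("Satisfied", "Unsatisfied"), ("Contented", "Melancholic"),
                 ("Hopeful", "Despairing"), ("Relaxed", "Bored")],
    "arousal": [("Stimulated", "Relaxed"), ("Excited", "Calm"),
                ("Frenzied", "Sluggish"), ("Jittery", "Dull"),
                ("Wide-awake", "Sleepy"), ("Aroused", "Unaroused")],
    "dominance": [("Controlling", "Controlled"), ("Influential", "Influenced"),
                  ("In control", "Cared-for"), ("Important", "Awed"),
                  ("Dominant", "Submissive"), ("Autonomous", "Guided")],
}

MAX_RATING = 4  # assumed 9-point scale: -4 (right pole) ... +4 (left pole)


def pad_scores(ratings):
    """Average the available ratings per dimension and scale them to [-1, 1].

    `ratings` maps an adjective pair to an integer rating; pairs that were
    not rated are skipped, and a dimension without ratings yields None.
    """
    scores = {}
    for dimension, pairs in SCALES.items():
        values = [ratings[pair] for pair in pairs if pair in ratings]
        for v in values:
            if not -MAX_RATING <= v <= MAX_RATING:
                raise ValueError(f"rating {v} outside the assumed scale")
        scores[dimension] = mean(values) / MAX_RATING if values else None
    return scores
```

For example, `pad_scores({("Happy", "Unhappy"): 4, ("Pleased", "Annoyed"): 2})` gives a pleasure score of 0.75 and leaves arousal and dominance undefined.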

Acoustic correlates of emotions

Problem: the speech parameters involved in expressing personality, moods and emotions are shared by all the components of expressivity.

Decoding the expressive speech code is very subjective.

Nevertheless, a general set of the speech parameters responsible for the expression of emotion can be constructed. There are three main categories of speech correlates of emotion:

• Pitch contour

• Timing

• Voice quality

It is believed that value combinations of these speech parameters are used to express vocal emotion (Schröder, M. [8]).

Pitch contour

Pitch contour is a representation of the intonation of an utterance, which describes the nature of accents and the overall pitch range of the utterance.

Pitch is expressed as fundamental frequency (F0).

One of the most frequently used methods of F0 measurement uses the autocorrelation function of the linear prediction (LP) residual.

Parameters include average pitch, pitch range, contour slope, and final lowering.
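The F0 measurement from the autocorrelation of the LP residual can be sketched in plain Python as follows. This is a minimal illustration, not the author's implementation: the function names, the LP order of 10, the 60 to 400 Hz search range and the synthetic test signal in `demo_frame` are all assumptions, and a practical pitch tracker would add windowing, a voicing decision and smoothing across frames.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(x, order):
    """Levinson-Durbin recursion; returns [1, a1, ..., ap] of A(z)."""
    r = autocorr(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a


def lp_residual(x, a):
    """Inverse-filter x with A(z): e[n] = sum_k a[k] * x[n - k]."""
    order = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(min(n, order) + 1))
            for n in range(len(x))]


def estimate_f0(frame, fs, order=10, f0_min=60.0, f0_max=400.0):
    """Pick the residual autocorrelation peak inside the allowed lag range."""
    residual = lp_residual(frame, lpc(frame, order))
    lag_min, lag_max = int(fs / f0_max), int(fs / f0_min)
    r = autocorr(residual, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda lag: r[lag])
    return fs / best_lag


def demo_frame(fs=8000, f0=100.0, n=400):
    """Hypothetical test signal: impulse train through a two-pole resonator."""
    period = int(round(fs / f0))
    y = []
    for i in range(n):
        x = 1.0 if i % period == 0 else 0.0
        y1 = y[-1] if i >= 1 else 0.0
        y2 = y[-2] if i >= 2 else 0.0
        y.append(x + 1.3 * y1 - 0.8 * y2)
    return y
```

Calling `estimate_f0(demo_frame(), 8000)` should return a value close to the 100 Hz used to build the test signal. Working on the residual rather than on the speech signal itself reduces the influence of the formants on the autocorrelation peak.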

Intonation contour

Models of intonation fall into two main categories:

  • Phonetic

  • Phonological

The phonetic models (e.g. the Fujisaki model, the Tilt model, MOMEL and many others) model the intonation curve.

The phonological models (e.g. ToBI) model the speaker's concept of the distribution of accents in the intonational phrase.
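As an illustration of what a phonetic model of the intonation curve computes, the following sketch evaluates the Fujisaki model in its commonly published form: the logarithm of F0 is a base value plus the responses to phrase commands and accent commands. The function names and the default constants (`alpha`, `beta`, `gamma`) are assumptions chosen as typical textbook values; they are not taken from the slides.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, limited by gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, base_f0, phrase_cmds, accent_cmds):
    """F0 in Hz at time t (seconds).

    phrase_cmds: list of (onset_time, amplitude)
    accent_cmds: list of (onset_time, offset_time, amplitude)
    """
    log_f0 = math.log(base_f0)
    for onset, amplitude in phrase_cmds:
        log_f0 += amplitude * phrase_response(t - onset)
    for onset, offset, amplitude in accent_cmds:
        log_f0 += amplitude * (accent_response(t - onset)
                               - accent_response(t - offset))
    return math.exp(log_f0)


def f0_contour(duration, step, base_f0, phrase_cmds, accent_cmds):
    """Sample the model every `step` seconds from 0 up to `duration`."""
    count = int(round(duration / step)) + 1
    return [fujisaki_f0(i * step, base_f0, phrase_cmds, accent_cmds)
            for i in range(count)]
```

An analysis tool fits the command times and amplitudes so that the generated curve matches a measured F0 contour; the sketch only covers the generation direction.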

Automatic intonation contour analysis in Fujisaki editor

Pitch contour analysis in PRAAT with ToBI labels

Timing

Timing covers:

  • The speed at which an utterance is spoken

  • Rhythm

  • Duration of emphasized syllables

The results of measuring syllable and phoneme lengths are often given in the form of z-scores (the instantaneous value is normalized by the mean and standard deviation of the same elements in the whole database).

Parameters: speech rate, hesitation pauses, exaggeration...
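The z-score normalization of segment durations can be sketched as below. The function name, the input layout (a list of label and duration pairs) and the use of the sample standard deviation are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, stdev


def duration_z_scores(segments):
    """Normalize each duration against all segments with the same label.

    segments: list of (label, duration_ms) pairs for the whole database.
    Returns a list of (label, z_score) pairs in the same order. Labels seen
    only once, or with zero spread, get a z-score of 0.0.
    """
    by_label = defaultdict(list)
    for label, duration in segments:
        by_label[label].append(duration)

    stats = {}
    for label, durations in by_label.items():
        spread = stdev(durations) if len(durations) >= 2 else 0.0
        stats[label] = (mean(durations), spread)

    result = []
    for label, duration in segments:
        average, spread = stats[label]
        z = (duration - average) / spread if spread > 0 else 0.0
        result.append((label, z))
    return result
```

A z-score of +1.0 then means that the segment is one standard deviation longer than that phoneme or syllable usually is in the database, which makes lengthening comparable across different sounds.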

Voice quality

Voice quality denotes the overall ‘character’ of the voice, which includes effects such as whispering, hoarseness, breathiness, and intensity.

The voice quality is influenced mainly by:

  • the function of the glottis

  • the function of the vocal tract

A detailed classification scheme was published by Laver [9].

Analysis of the glottal function

The analysis of the glottal function is generally done using the source-filter model of speech production [10].

The glottal function is obtained from the speech signal by inverse filtering. One of the most efficient inverse filtering methods uses Discrete Linear Prediction, DLP (El-Jaroudi, A., Makhoul, J. [11]), to obtain the inverse filter coefficients and to filter the speech signal.

The resulting DLP residual is taken to represent the derivative of the glottal volume velocity function.

Time and spectral domain characteristics of the glottal function

Time characteristics

OQ, Open Quotient – the ratio of the open phase of the glottal waveform to the period of the pulse. OQ predicts the amplitudes of the lower harmonics: an increased OQ is correlated with an increase in the amplitude of the lower harmonics in the voice spectrum.

CQ, Closing Quotient – the ratio of the closing phase of the glottal pulse to the period of the pulse.

These characteristics have recently often been replaced by AQ (Amplitude Quotient) and NAQ (Normalized Amplitude Quotient) (Alku [12]).

EE, Excitation Strength – the amplitude of the negative peak, calculated after the positive peak. EE is correlated with the overall intensity of the signal. A decrease in EE is correlated with a breathy voice.

RK, Glottal Symmetry/Skew – the ratio of the closing phase to the opening phase of the differentiated glottal pulse. RK affects mainly the lower harmonics; the more symmetrical the pulse, the greater their amplitude.

Spectral characteristics

H1-H2 – the amplitude of the first harmonic (H1) compared to the amplitude of the second harmonic (H2). An indicator of the relative length of the opening phase of the glottal pulse (Hanson 1997).

H1-A1 – the amplitude of the first harmonic (H1) compared to the amplitude of the strongest harmonic in the first formant (A1). Reflects the first formant bandwidth.

Spectral tilt – expected to be large and positive for breathy voices and small and/or negative for creaky voices.

H1-A2 – the amplitude of the first harmonic (H1) compared to the amplitude of the strongest harmonic in the second formant (A2). An indicator of spectral tilt at the mid formant frequencies. Large and positive for breathy voices and small and/or negative for creaky voices.

H1-A3 – the amplitude of the first harmonic (H1) compared to the amplitude of the strongest harmonic in the third formant (A3). An indicator of spectral tilt at the higher formant frequencies. Large and positive for breathy voices and small and/or negative for creaky voices.
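The ratios defined above reduce to simple arithmetic once the landmarks of one glottal pulse and the harmonic amplitudes have been measured. The sketch below assumes that the opening instant, the peak, the closure instant and the period are already known, and that the harmonic amplitudes are linear (not dB) values; the function names are hypothetical.

```python
import math


def glottal_time_parameters(t_open, t_peak, t_close, period):
    """Compute OQ, CQ and RK from landmarks of a single glottal pulse.

    All arguments use the same time unit (e.g. milliseconds).
    """
    opening = t_peak - t_open      # duration of the opening phase
    closing = t_close - t_peak     # duration of the closing phase
    if opening <= 0 or closing <= 0 or period < opening + closing:
        raise ValueError("inconsistent pulse landmarks")
    return {
        "OQ": (opening + closing) / period,  # open phase / period
        "CQ": closing / period,              # closing phase / period
        "RK": closing / opening,             # closing phase / opening phase
    }


def harmonic_difference_db(amplitude_a, amplitude_b):
    """Level difference in dB between two linear amplitudes, e.g. H1 and H2."""
    if amplitude_a <= 0 or amplitude_b <= 0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(amplitude_a / amplitude_b)
```

For a 10 ms period with a 4 ms opening phase and a 2 ms closing phase this gives OQ = 0.6, CQ = 0.2 and RK = 0.5. The same `harmonic_difference_db` call covers H1-H2, H1-A1, H1-A2 and H1-A3, depending on which amplitudes are passed in.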

Glottal pulse analysis in APARAT

Analysis of the vocal tract

Methods of vocal tract shape estimation include X-ray, computed tomography and magnetic resonance methods.

  • suitable for stationary sound production only

A cheaper and quicker method is to compute the vocal tract shape from the speech signal, complementary to glottal pulse analysis from the speech signal (e.g. vocal tract shape computation from LPC-derived reflection coefficients).

  • allows for analysis of the dynamic behavior of the articulators

Similar information can be obtained by formant analysis using homomorphic deconvolution (cepstrum) or LPC spectrum analysis.
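The step from reflection coefficients to a vocal tract shape can be illustrated with the lossless-tube relation, in which each reflection coefficient fixes the ratio of two neighbouring cross-sectional areas. The sketch is an assumption-laden illustration: the sign convention of the coefficients, the direction in which the tube sections are numbered and the reference area of 1.0 all differ between texts and tools, and the function name is hypothetical.

```python
def tube_areas(reflection_coeffs, reference_area=1.0):
    """Relative cross-sectional areas of a lossless tube model.

    Assumed convention: A[i + 1] / A[i] = (1 - k[i]) / (1 + k[i]),
    starting from `reference_area` for the first section.
    """
    areas = [reference_area]
    for k in reflection_coeffs:
        if not -1.0 < k < 1.0:
            raise ValueError("reflection coefficients must lie in (-1, 1)")
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas
```

Only area ratios are recovered this way, so the result is a relative shape; a coefficient of zero means that two neighbouring sections have the same area.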

Static analysis by synthesis using articulatory synthesizer (TRACTSYN)

Dynamic analysis by synthesis (articulatory synth. TRACTSYN)

Vision: Speech Sound Mining

Aim: to extract information from supra-segmental and extra-linguistic layers

Where to look for information:

  • time domain

    a) quantity (lengths of segments)

    b) rhythm

  • frequency domain

    a) long term characteristics

    b) short term characteristics

  • model based characteristics

    a) glottal excitation function

    b) articulatory model

Vision: Speech Sound Mining

How to define a set of speech sound objects?

  • Objective methods of analysis (pattern recognition)

  • Subjective methods (impression of the listener)

Possible objects:

  • Speech sound event

  • Speech sound act

  • Speech sound gesture

  • Speech sound characteristic

  • Speech sound characteristic change

Vision: Speech Sound Mining

First steps to be accomplished:

  • Speech corpus building

  • Annotation of speech sound objects (SSO)

  • Boundary markers

  • Frequencies of occurrence of SSO

  • Concordances of SSO

  • Correlation among different sets of objects (pitch SSO, accent SSO, rhythmic SSO, timbre SSO, etc.)

  • Semantic representation of SSO

  • Cross cultural semantic analysis

Vision: Speech Sound Mining

  • Traditional methods used in NLP and data mining will be applicable:

    Bag of words => Bag of SSO

    WordNet => SSO semantic net

    etc.

  • Research on the relation between linguistic, paralinguistic and extralinguistic information.

  • Creation of a complex (holistic) model of the speech signal as an information carrier in communication.
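The analogy between a bag of words and a bag of SSO can be sketched with counting tools from the Python standard library. The SSO labels in the usage note are invented placeholders, since the slides do not define a label set, and the function names are hypothetical.

```python
from collections import Counter


def bag_of_sso(labels):
    """Count how often each speech sound object label occurs."""
    return Counter(labels)


def relative_frequencies(bag):
    """Convert raw counts to relative frequencies of occurrence."""
    total = sum(bag.values())
    return {label: count / total for label, count in bag.items()} if total else {}


def concordance(labels, target, width=1):
    """List the left and right context of every occurrence of `target`."""
    return [(labels[max(0, i - width):i], labels[i + 1:i + 1 + width])
            for i, label in enumerate(labels) if label == target]
```

With an annotated sequence such as `["pitch_rise", "pause", "pitch_rise", "creak"]`, `bag_of_sso` counts `pitch_rise` twice and `concordance(..., "pause")` returns the neighbouring objects on each side, which corresponds to the frequency and concordance steps named among the first steps.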

Thank you for your attention

Milan Rusko

Institute of Informatics

Slovak Academy of Sciences

[email protected]
