
Learning Relationships from Conversational Patterns

Tanzeem Choudhury+ and Sumit Basu*

+MIT Media Lab, Intel Research (Seattle)

*MIT EECS/Media Lab, Microsoft Research (Redmond)

From Conversations to Relationships
  • Modeling Relationships via Conversations:
    • conversations are a key part of our interactions
    • contextual features: how often we have conversations, who we have them with
    • behavioral features: how we act during a conversation
    • can we use these features to model relationships?
  • Our Approach:
    • robust, unobtrusive sensing method: detect conversations and extract conversational features as in S. Basu, Conversational Scene Analysis
    • probabilistic learning techniques that model prominence in the network and effects of individuals’ dynamics on interaction dynamics.
The Sociometer
  • Measurements:
  • Face-to-face proximity (IR sensor, 17 Hz sampling)
  • Speech information (microphone, 8 kHz)
  • Motion information (accelerometer, 50 Hz)

Factors that contribute towards the wearability of a wearable:

Shape, size, attachment, weight, movement, aesthetics

People involved in the hardware and design: Brian Clarkson, Rich DeVaul, Vadim Gerasimov, Josh Weaver

The Experiment
  • 23 subjects wore sociometers for 2 weeks – 6 hours every day
  • 66 hours of data per subject: total 1518 hours of interaction data
  • 4 different groups distributed throughout the lab
Aims of the Experiment
  • We want to identify:
    • When people are facing each other
    • When two people are conversing regardless of what they are saying
  • From that analyze:
    • The communication patterns within the community
    • Various social network properties
    • Individual turn-taking style
    • How people influence each other’s turn-taking styles
Auditory Features
  • Why these features: (Scherer et al., 1972)
    • These features are sufficient to recover emotional content

Features extracted: speech segments, voicing segments, speaking rate, pitch track, energy

[Figure: spectrogram of telephone speech (8 kHz, 8 bit) annotated with these features]

What the Microphone Gets
  • Close talking, 16 kHz, quiet surroundings
  • Mic 6” away, 8 kHz, noisy environment

Modeling the Dynamics of Speech
  • Transitions between Voiced/Unvoiced (V/UV)
    • Consistent within speech, despite variation in the features
    • Different transition statistics for speech vs. non-speech
  • The “linked” HMM: (Saul and Jordan ’95)

[Figure: graphical models of a standard HMM and the linked HMM (LHMM): chains of speech/non-speech states S, voiced/unvoiced states V, and observations O]
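The coupled-chain structure can be sketched with a forward pass over the joint state (S, V). This is a minimal illustration only, not the authors' implementation: the transition values and the assumed factorization P(S_t | S_t-1) · P(V_t | V_t-1, S_t) are hypothetical.

```python
import numpy as np

# Hypothetical parameters for illustration (not the paper's learned values).
A_s = np.array([[0.99, 0.01],                 # P(S_t | S_t-1): states persist
                [0.01, 0.99]])
A_v = np.array([[[0.9, 0.1], [0.1, 0.9]],     # P(V_t | V_t-1, S_t = non-speech)
                [[0.4, 0.6], [0.3, 0.7]]])    # P(V_t | V_t-1, S_t = speech)

def lhmm_forward(obs_lik):
    """Filtered posterior over the joint state (S, V) after the last frame;
    obs_lik[t, v] = p(o_t | V_t = v), since observations hang off the V chain."""
    alpha = np.full((2, 2), 0.25) * obs_lik[0]   # uniform prior times first obs
    alpha /= alpha.sum()
    for t in range(1, obs_lik.shape[0]):
        new = np.zeros((2, 2))
        for s in range(2):
            for v in range(2):
                # sum over the previous joint state (sp, vp)
                new[s, v] = obs_lik[t, v] * sum(
                    alpha[sp, vp] * A_s[sp, s] * A_v[s, vp, v]
                    for sp in range(2) for vp in range(2))
        alpha = new / new.sum()
    return alpha
```

Exact inference here enumerates all 4 joint states per step, which is what makes the LHMM costlier than two independent binary HMMs.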

LHMM: Computational Complexity
  • Cliques of 3:
  • Cost of Exact Inference (per timestep)
    • LHMM:
    • HMM:
    • For binary states: 36 vs. 12 operations

[Figure: clique structure of the LHMM (coupled chains of states S and V with observations O) vs. two independent HMMs]

Features: Spectral Entropy
  • Spectral Entropy
  • Higher values => spectrum is more “random”

Voiced: energy concentrated in harmonic bands => lower entropy

Unvoiced: flatter, noise-like spectrum => higher entropy
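The spectral-entropy feature can be sketched as follows (a minimal illustration; the frame length, Hann window, and 8 kHz rate are assumptions): treat the normalized magnitude spectrum of a frame as a probability distribution and take its entropy.

```python
import numpy as np

def spectral_entropy(frame):
    """Entropy (bits) of the normalized magnitude spectrum of one frame;
    higher values mean a flatter, more noise-like ("random") spectrum."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    p = mag / (mag.sum() + 1e-12)          # spectrum as a distribution
    return float(-np.sum(p * np.log2(p + 1e-12)))

fs = 8000                                   # telephone-band rate, as on the slides
t = np.arange(512) / fs
voiced_like = np.sin(2 * np.pi * 200 * t)   # energy banded at one frequency
unvoiced_like = np.random.default_rng(0).standard_normal(512)  # flat spectrum
```

A harmonic frame concentrates its energy in a few bins and so scores lower than a noise frame.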

Features: Noisy Autocorrelation
  • Normalized Autocorrelation:
    • Reject banded energy by adding noise to s[n]
    • Use max peak, number of peaks

[Figure: autocorrelation of a frame before and after adding noise, showing suppression of periodic interference]
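The noise-injection idea can be sketched as follows (the noise level and the lag range are illustrative choices): adding white noise to s[n] before computing the normalized autocorrelation dilutes narrowband ("banded") interference, while strong voiced periodicity still produces a large peak.

```python
import numpy as np

def noisy_autocorr_peak(s, noise_std=0.1, seed=0):
    """Max non-zero-lag peak of the normalized autocorrelation after adding
    white noise; large for periodic (voiced) frames, small otherwise."""
    rng = np.random.default_rng(seed)
    x = s + noise_std * rng.standard_normal(len(s))    # dilute banded energy
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    ac = ac / ac[0]                                    # normalize by lag-0 energy
    return float(ac[20:].max())                        # skip very short lags
```

The peak height (and, in the slides, the number of peaks) then feeds the voiced/unvoiced decision.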

Performance
  • Example
  • Versus HMM (–14 dB of noise, not shown)

[Figure: example segmentations from the LHMM and the HMM]

Performance: Noise

[Plots: speech/non-speech error and voiced/unvoiced error vs. SSNR (dB): < 4% error at 0 dB, < 2% error at 10 dB]

Performance: Distance from Microphone

[Plots: speech/non-speech error and voiced/unvoiced error vs. distance from the mic (feet): < 10% error at 20 ft for both]

More Features: Regularized Energy

[Figure: raw energy vs. regularized energy tracks at -13 dB and +20 dB]

Estimating Speaking Rate
  • Productive Segments: following (Pfau and Ruske 1998)
    • But: only measure within speech segments

[Plot: articulation rate (segments/sec) vs. passage length (seconds), comparing a passage read in 21 seconds with the same passage read in 46 seconds]
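A crude version of this measure can be sketched as follows (the frame rate, and using voiced-segment onsets as a stand-in for productive segments, are assumptions for illustration): count segment onsets per second of speech time only, not wall-clock time.

```python
def articulation_rate(voicing, fs_frames=100.0, speech_mask=None):
    """Voiced-segment onsets per second of *speech* time, per the slide's
    point of measuring only within speech segments.
    voicing: 0/1 per frame; fs_frames: frames per second (assumed 100)."""
    if speech_mask is None:
        speech_mask = [1] * len(voicing)          # assume all frames are speech
    onsets = sum(1 for i in range(1, len(voicing))
                 if speech_mask[i] and voicing[i] and not voicing[i - 1])
    speech_seconds = sum(speech_mask) / fs_frames
    return onsets / speech_seconds if speech_seconds else 0.0
```

Restricting the denominator to speech time keeps long pauses from deflating the rate.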

Speaker Segmentation
  • Regularize Energy Ratio over voicing segments
    • 6” from mic, 2’ of separation => 4:1 mixing ratio
    • Regularized log energy ratio:

[Figures: speaker segmentation with raw vs. regularized energy at -15 dB (both using V/UV), and segmentation performance in noise]
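The two-mic idea can be sketched as follows (a simplified stand-in that averages the log energy ratio over each voicing segment rather than applying the paper's regularization): each voicing segment is assigned to whichever wearer's microphone captured more energy.

```python
import numpy as np

def assign_segments(e1, e2, segments, eps=1e-6):
    """For each voicing segment (start, end), average the log energy ratio
    of the two sociometer mics and assign the segment to the louder side.
    Returns 0 for the mic-1 wearer, 1 for the mic-2 wearer."""
    labels = []
    for start, end in segments:
        ratio = np.log((e1[start:end] + eps) / (e2[start:end] + eps))
        labels.append(0 if ratio.mean() > 0 else 1)
    return labels
```

With the 4:1 mixing ratio quoted on the slide, the per-segment mean ratio is a much more stable statistic than any single frame.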

Segmenting Speakers: “Real World”
  • Two subjects wearing sociometers
    • 4 feet of separation, 6 feet from interfering speaker

mic

Reg

Reg

Raw

(still using V/UV)

Raw

(still using V/UV)

ROC: One Sociometer (1 mic)

ROC: Two Sociometers (2 mics)

Finding Conversations
  • Consider two voice segment streams
    • How tightly synchronized are they?
    • Alignment measure based on Mutual Information

[Figure: paired voicing streams shown at timescales of 1.6 seconds, 16 seconds, 2.5 minutes, and 30 minutes; alignment window k = 7500 frames (2 minutes)]

How Well Does It Work?
  • Callhome (telephone) conversations
    • Data: 5 hours of conversational speech
    • Performance in noise (two-minute segments):

[Plot: ROC curves on two-minute segments at SSNR values of 20 dB (--), -12.7 dB (O), -14.6 dB (V), -17.2 dB (+), and -20.7 dB (*); top curve: PD = 0.992, PFA = 0.0075]

Why Does It Work So Well?
  • Voicing segs: pseudorandom bit sequence
    • The conversational partner is a noisy complement

[Figure: alignment score for aligned vs. randomly paired voicing streams]

How About On Mixed Streams?
  • Sociometer Data
    • PERFECT!! (PD=1.00, PFA=0.00) with 15 seconds
    • BUT…


Accuracy of Real-World Interaction Data
  • Low consistency across subjects in the survey data:
    • Both acknowledge having a conversation: 54%
    • Both acknowledge having the same number of conversations: 29%
    • Per-conversation analysis not possible – only one survey per day
  • We therefore evaluated our algorithms against hand-labeled data: 4 subjects each labeled 2 days’ worth of data
  • Data was labeled in 5-minute chunks
Interaction Matrix (Conversations)

Each row corresponds to a different person; the color value indicates, for each subject, the proportion of their total interactions that they have with each of the other subjects.

Social Network

[Figure: social network layout based on multi-dimensional scaling of geodesic distances]

Effects of Distance

[Plot: probability of interaction vs. distance]

Distance codes:
  • 0 – office mates
  • 1 – 1-2 offices away
  • 2 – 3-5 offices away
  • 3 – offices on the same floor
  • 4 – offices separated by a floor
  • 5 – offices separated by two floors

Within/Cross Group Interactions

[Plot: fraction of within-group vs. cross-group interaction]

Identifying Prominent People

Betweenness centrality: based on how often one lies between other individuals in the network.

Let g_jk be the number of geodesics (shortest paths) linking actors j and k, with all geodesics equally likely to be chosen. If individual i is involved in g_jk(n_i) of the geodesics between j and k, the betweenness for i is calculated as:

    C_B(n_i) = Σ_{j<k} g_jk(n_i) / g_jk

Individuals with high betweenness play a role in keeping the community connected; removing someone with high betweenness can result in isolated subgroups.

It is a measure of how much control an individual has over the interaction of other individuals who are not directly connected.
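The definition can be sketched in code (a brute-force version for small unweighted graphs, not the study's implementation): BFS from every node yields distances and geodesic counts, and C_B(n_i) sums g_jk(n_i) / g_jk over all pairs.

```python
from collections import deque

def bfs_counts(adj, src):
    """Distances and geodesic (shortest-path) counts from src via BFS."""
    dist, sigma = {src: 0}, {src: 1}
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                q.append(w)
            if dist[w] == dist[u] + 1:    # w extends a geodesic through u
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """C_B(i) = sum over pairs j < k of g_jk(i) / g_jk (Freeman betweenness)."""
    nodes = list(adj)
    info = {v: bfs_counts(adj, v) for v in nodes}
    cb = {v: 0.0 for v in nodes}
    for jx, j in enumerate(nodes):
        dj, sj = info[j]
        for k in nodes[jx + 1:]:
            if k not in dj:
                continue                  # j and k are disconnected
            dk, sk = info[k]
            for i in nodes:
                if i in (j, k) or i not in dj or i not in dk:
                    continue
                if dj[i] + dk[i] == dj[k]:          # i lies on a j-k geodesic
                    cb[i] += sj[i] * sk[i] / sj[k]  # g_jk(i) / g_jk
    return cb
```

On a three-node path a-b-c, only b separates a pair, so it gets the whole score.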

Betweenness Centrality of Participants

Betweenness centrality of individuals in the interaction network


Beyond Overall Network Characteristics: Exploring the Dynamics of Interaction

Moving from who to how

Turn-taking Matrix
  • Person A converses with a given conversation partner. We can estimate:
    • Turn-taking matrix for A
    • And turn-taking matrix for the partner

[Figure: two 2x2 turn-taking matrices, one for person A's turn-taking behavior and one for the partner's. Each matrix contains the probability of holding the turn and the probability of giving up the turn to the conversation partner.]
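Estimating such a matrix from data can be sketched as follows (the per-frame "who holds the floor" sequence is an assumed discretization): count floor transitions and normalize each row.

```python
import numpy as np

def turn_taking_matrix(floor):
    """Estimate the 2x2 matrix M with M[i, j] = P(next floor = j | floor = i):
    diagonal entries are hold-turn probabilities, off-diagonal entries are
    give-up-turn probabilities. floor: sequence of 0 (person) / 1 (partner)."""
    counts = np.zeros((2, 2))
    for prev, cur in zip(floor[:-1], floor[1:]):
        counts[prev, cur] += 1
    return counts / counts.sum(axis=1, keepdims=True)  # row-normalize
```

Each conversation yields one such matrix per participant.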


Mixture of Speaker Dynamics

When two people interact, do they affect each other’s interaction style?

If they do, how do we model the effect?

[Figure: each person’s dynamics modeled as a mixture: person A combines A’s “average-self” model and B’s “average-partner” model with weights a_AA and a_AB; person B likewise combines B’s “average-self” and A’s “average-partner” with weights a_BB and a_BA]
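One way to sketch the mixture idea (the convex-combination form and the least-squares fit of the weight are illustrative simplifications, not the paper's estimation procedure):

```python
import numpy as np

def mix_dynamics(M_self, M_partner, alpha):
    """Convex mixture of A's "average-self" matrix and the partner's
    "average-partner" matrix; alpha plays the role of a_AA, and
    (1 - alpha) the role of a_AB, the partner's influence on A."""
    return alpha * M_self + (1 - alpha) * M_partner

def fit_alpha(M_obs, M_self, M_partner):
    """Least-squares estimate of the self-weight, clipped to [0, 1]."""
    d = (M_self - M_partner).ravel()
    num = (M_obs - M_partner).ravel() @ d
    return float(np.clip(num / (d @ d), 0.0, 1.0))
```

A small self-weight (large partner weight) would mark a speaker whose dynamics are easily pulled toward their partner's.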

Does Mixing Speaker Dynamics Lead to a Better Model?

  • Data: eighty different conversations, average conversation duration 5 minutes
  • KL divergence between the true model and the average-speaker model, vs. between the true model and the mixture model:
    • KL divergence reduced by 32%
  • The mixture model is a statistically significantly better model (F-test, p < 0.0001)
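The model comparison can be sketched with a row-wise KL divergence between turn-taking matrices (averaging rows uniformly is a simplifying assumption; the study's exact divergence computation is not specified on the slides):

```python
import numpy as np

def transition_kl(P, Q, eps=1e-12):
    """Mean row-wise KL divergence D(P_i || Q_i) between two stochastic
    matrices; a smaller value means Q better models P's dynamics."""
    P, Q = np.asarray(P), np.asarray(Q)
    return float(np.mean(np.sum(P * np.log((P + eps) / (Q + eps)), axis=1)))
```

Comparing transition_kl(true, average_speaker) against transition_kl(true, mixture) is the shape of the 32% reduction reported above.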
Who are the people with large a values?

Influence values were calculated for a subset of users: people who interacted with at least 4 different people more than once.

Correlating Influence Valueswith Centrality Scores

Correlation: 0.90

p<0.0004

This opens the possibility that a person’s style during one-on-one conversations may be indicative of the person’s overall prominence in the network.

Betweenness centrality indices best measure which individuals in the network are most frequently viewed as leaders (Freeman, L.C., Roeder, D., and Mulholland, R.R. Centrality in social networks: II. Experimental results. Social Networks, 1980, 2: 119-141).

Future Work
  • Can we find quantitative effects of other aspects of conversational behavior on conversational partners in terms of pitch, speaking rate, and dominance as well as turn taking?
  • Can we make finer distinctions between individual relationships – family members vs. friends, etc.?
  • Can we infer classes of relationships in an unsupervised manner?

[Figure: a conversation between two parents and their daughter, comparing the daughter talking to the father vs. talking to the mother, showing differences in dominance and style]