Framework for Inferring Ongoing Activities of Workstation Users

Yifen Huang, Sophie Wang and Tom Mitchell
School of Computer Science
Carnegie Mellon University


Activity Example: Learned Activity Frame from TM email corpus [1448 msgs, Feb 2004]

  • ActivityCluster4 (105 emails)

  • Keywords: CALO, TFC, SRI, examples, heads, labeled, Leslie, HMM, contacts, email, task, estimates, zero, reschedule, baseline, Rebecca

  • PrimarySenders: Mitchell(39), Kaelbling(7), McCallum(6), Perrault(4),

  • UserActivityFraction: 105/1448=.072 of total emails

  • IntensityOfUserInvolvement: created 37% of traffic; (default 31%)

  • ExtractedNames: Leslie(23), Rebecca(21), Carlos(12), Ray(10), Stuart(9), William(9), April(9), …

  • ExtractedDates: Wed(39), Tues(33), Fri(25), Mon(23), Thurs(20),… Feb 18 (16)

  • ExtractedTimes: 5pm(24), noon(14), morning(8), 8am(7), before 5pm(7),...

  • RequestEmails: <emailA>, <emailB>, …


I need to get to DARPA by COB tomorrow a list of CALO participants who need access to the IPTO booth. It seems to me we should ask for this for any of you who is likely to be there. Could you let me know asap if you *might* be there? No big deal if you end up not going.

Thanks, --r
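
The fields of the learned frame above can be pictured as a small record. This is only a minimal sketch: the class name `ActivityFrame` and its field names are hypothetical, chosen to mirror the slide labels, and only a few of the slide's fields are included. The counts are the ones shown on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ActivityFrame:
    """Hypothetical container mirroring some of the slide's frame fields."""
    name: str
    n_emails: int        # emails assigned to this activity cluster
    corpus_size: int     # total emails in the corpus
    keywords: List[str] = field(default_factory=list)
    primary_senders: Dict[str, int] = field(default_factory=dict)

    @property
    def user_activity_fraction(self) -> float:
        # Share of the whole corpus that belongs to this activity.
        return self.n_emails / self.corpus_size


frame = ActivityFrame(
    name="ActivityCluster4",
    n_emails=105,
    corpus_size=1448,
    keywords=["CALO", "TFC", "SRI"],
    primary_senders={"Mitchell": 39, "Kaelbling": 7, "McCallum": 6, "Perrault": 4},
)
fraction = frame.user_activity_fraction  # 105 / 1448
```

Deriving the fraction from the two counts, rather than storing it, keeps the frame consistent if emails are later moved between clusters.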


Content

  • Inferring on-going activities by clustering, social network filtering and information extraction

  • Getting information from the whole workstation

  • Accepting user’s feedback

  • Future work


Inferring Activities Using Emails

  • Clustering

  • Social network filtering

  • Information extraction

  • Output: activity clusters and descriptions


Unsupervised Learning of Activities

  • Cluster emails

    • (Text) Fit a multinomial Naïve Bayes model and refine the clusters with the EM algorithm

      • Represent each email as a bag of words from its subject and body

    • (Social network) Subdivide each cluster based on the graph of email co-recipients

      • Make each clique of co-recipients a subcluster

  • For each cluster, extract information from the email text and headers
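
The text-clustering step above can be sketched as EM over a multinomial mixture. This is an illustrative sketch only, not the authors' implementation: the function name `em_cluster`, the add-one smoothing `alpha`, the toy documents, and the starting labels are all assumptions.

```python
import math
from collections import Counter


def em_cluster(docs, init_labels, k, iters=20, alpha=1.0):
    """Cluster token lists with a multinomial mixture refined by EM.

    docs: list of token lists; init_labels: starting cluster id per doc.
    Returns one hard cluster id per document.
    """
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]
    # Start from hard (one-hot) responsibilities given by the initial labels.
    resp = [[1.0 if c == z else 0.0 for c in range(k)] for z in init_labels]

    for _ in range(iters):
        # M-step: smoothed cluster priors and per-cluster word distributions.
        log_prior, log_word = [], []
        for c in range(k):
            weight = sum(r[c] for r in resp)
            log_prior.append(math.log((weight + alpha) / (len(docs) + k * alpha)))
            word_mass = {w: alpha for w in vocab}
            for r, cnt in zip(resp, counts):
                for w, n in cnt.items():
                    word_mass[w] += r[c] * n
            total = sum(word_mass.values())
            log_word.append({w: math.log(m / total) for w, m in word_mass.items()})

        # E-step: posterior p(cluster | doc), normalised in log space.
        for i, cnt in enumerate(counts):
            scores = [
                log_prior[c] + sum(n * log_word[c][w] for w, n in cnt.items())
                for c in range(k)
            ]
            top = max(scores)
            exp_scores = [math.exp(s - top) for s in scores]
            norm = sum(exp_scores)
            resp[i] = [e / norm for e in exp_scores]

    return [max(range(k), key=lambda c: r[c]) for r in resp]


# Toy bag-of-words emails; the third one starts in the wrong cluster.
docs = [
    "fmri paper draft deadline".split(),
    "fmri meeting paper".split(),
    "fmri paper meeting draft".split(),
    "calo booth darpa list".split(),
    "calo darpa participants".split(),
]
labels = em_cluster(docs, init_labels=[0, 0, 1, 1, 1], k=2)
```

In this toy run the third email starts in the wrong cluster and EM moves it next to the other fMRI emails. EM only finds a local optimum, so the result depends on the starting labels.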


Workstation sources: Web Activity, Directories, Calendar, Email

Email thread:

To: Bill@cmu.edu

Subj: fMRI meeting

We need to meet soon to discuss the paper deadline.

To: Sue@cmu.edu

Subj: Re: fMRI meeting

Ok, I suggest Wednesday at 4pm.

To: Bill@cmu.edu

Subj: Re: fMRI meeting

See you then. Attached is the current draft.

Inferred activity: fMRI paper writing

  • People: Sue, Bill

  • Document: <fileptr>

  • Meetings: Aug 24,

  • Emails: 1423, 1644,

  • Leader: Bill

  • Deadline: Jan 15


Getting Information from the Whole Workstation

  • Bag-of-words features for any query, using Google Desktop Search

  • We can produce feature vectors for meetings, person names, and project keywords.

    • Cluster initialization using project keywords

    • Co-clustering meetings and emails

    • Mapping any query to activities
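
One way to picture a bag-of-words feature vector for a query is to sum word counts over whatever documents the query retrieves. In the sketch below, `search_desktop` is a hypothetical in-memory stand-in for a desktop search backend, not the Google Desktop Search API, and the document contents are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for documents indexed on a workstation.
DOCUMENTS = {
    "email_1423": "We need to meet soon to discuss the fMRI paper deadline.",
    "email_1644": "See you then. Attached is the current fMRI draft.",
    "notes.txt": "CALO booth list due to DARPA tomorrow.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_desktop(query):
    """Toy search: return texts that contain every query token."""
    terms = set(tokenize(query))
    return [t for t in DOCUMENTS.values() if terms <= set(tokenize(t))]


def query_features(query):
    """Bag-of-words vector summed over all documents the query retrieves."""
    bag = Counter()
    for text in search_desktop(query):
        bag.update(tokenize(text))
    return bag


features = query_features("fMRI")
```

The same function works for a meeting title, a person name, or a project keyword, so all three end up in one feature space and can be compared with email clusters.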


Cluster Initialization Using Bag of Features of Project Keywords, from YH email corpus [623 msgs, 2004]

DI: an improved version of random initialization (0.46)

GI: bag of features from Google desktop search for user-provided keywords (0.44)


Collecting User’s Feedback


Speclustering Model: split specific topics from general topics

[Figure: graphical model with nodes X, W, S (activity) and G, parameters β, ξ and π, and plates N and M]

  • Each document has a cluster label S.

  • For each word in a document, a hidden variable X indicates whether the word is generated by the cluster-specific topic S or by the general topic G.

  • Parameters can be estimated using the EM algorithm.
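
The role of the hidden variable X can be illustrated with a two-component word mixture. This is a sketch under stated assumptions: `p_specific` is a hypothetical mixing weight for "X selects the cluster-specific topic", the word distributions are made up, and the transcript does not say how the slide's β, ξ and π map onto these quantities.

```python
def word_probability(word, specific_topic, general_topic, p_specific):
    """p(word) when X picks the specific topic with probability p_specific."""
    return (p_specific * specific_topic.get(word, 0.0)
            + (1.0 - p_specific) * general_topic.get(word, 0.0))


def posterior_specific(word, specific_topic, general_topic, p_specific):
    """E-step quantity: p(X = specific | word) by Bayes' rule."""
    from_specific = p_specific * specific_topic.get(word, 0.0)
    total = word_probability(word, specific_topic, general_topic, p_specific)
    return from_specific / total if total > 0 else 0.0


# Made-up word distributions for one cluster and for the general topic.
specific = {"fmri": 0.6, "deadline": 0.3, "the": 0.1}
general = {"fmri": 0.0, "deadline": 0.1, "the": 0.9}

post_fmri = posterior_specific("fmri", specific, general, p_specific=0.5)
post_the = posterior_specific("the", specific, general, p_specific=0.5)
```

With these numbers, "fmri" is attributed entirely to the specific topic and "the" mostly to the general topic. This is the separation the model is meant to provide.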


EM Modification with User’s Feedback

  • Email-cluster association

    • Re-assign posterior probability p(cluster|email) according to user’s approval or disapproval.

  • Keyword-cluster association

    • Re-assign the keyword-cluster association if the keyword is confirmed by the user, and if the keyword is removed by the user.
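
A sketch of how email-cluster feedback might override the E-step posterior. The specific rule is an assumption, since the slide does not state the values: here an approval puts all probability on the approved cluster, and a disapproval zeroes the rejected cluster and renormalises. The function name `apply_email_feedback` is hypothetical.

```python
def apply_email_feedback(posterior, approved=None, disapproved=()):
    """Adjust p(cluster | email) for one email using user feedback.

    posterior: list of probabilities, one per cluster.
    approved: cluster id the user confirmed, or None.
    disapproved: cluster ids the user rejected.
    """
    if approved is not None:
        # Assumed rule: approval clamps the posterior to a one-hot vector.
        return [1.0 if c == approved else 0.0 for c in range(len(posterior))]
    adjusted = list(posterior)
    for c in disapproved:
        # Assumed rule: a rejected cluster receives zero probability.
        adjusted[c] = 0.0
    total = sum(adjusted)
    if total == 0.0:
        # Every cluster was rejected: fall back to the unmodified posterior.
        return list(posterior)
    return [p / total for p in adjusted]


after_approval = apply_email_feedback([0.5, 0.3, 0.2], approved=1)
after_rejection = apply_email_feedback([0.5, 0.3, 0.2], disapproved=[0])
```

Applying the override after every E-step keeps the feedback in force across iterations, so the following M-step re-estimates parameters from the corrected assignments.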


Folder Reconstruction Accuracy Using Speclustering Algorithm

[Figure: accuracy vs. iteration, with 149 feedback entries (76 keyword-cluster pairs and 73 email-cluster pairs)]


Future Work

  • Jointly cluster meetings, people, files and other interesting entities.

    • Preliminary results from jointly clustering emails and meetings:

      • Found good match between emails and meetings

      • Didn’t visibly improve cluster quality

  • Allow richer user feedback.

  • Move from bag of features to structural data.
