
SCALING ACTIVITY DISCOVERY AND RECOGNITION TO LARGE, COMPLEX DATASETS

SCALING ACTIVITY DISCOVERY AND RECOGNITION TO LARGE, COMPLEX DATASETS. Candidate: Parisa Rashidi Advisor: Diane J. Cook. Agenda. Introduction Challenges Solutions Sequence mining Stream mining Transfer Learning Active learning Results Conclusions & future directions. Smart Homes.


Presentation Transcript


  1. SCALING ACTIVITY DISCOVERY AND RECOGNITION TO LARGE, COMPLEX DATASETS Candidate: Parisa Rashidi Advisor: Diane J. Cook

  2. Agenda • Introduction • Challenges • Solutions • Sequence mining • Stream mining • Transfer Learning • Active learning • Results • Conclusions & future directions

  3. Smart Homes [Figure: agent and environment loop, with percepts (sensors) flowing to the agent and actions (controllers) flowing back] • Sensors & actuators integrated into everyday objects • Knowledge acquisition about inhabitant

  4. Applications • Energy efficiency • Security • Achieving more comfort • Monitoring well-being of residents • In home monitoring • Monitor daily activities • Check for anomalies • Help by giving prompts and cues

  5. Activity Recognition [Figure labels: "An Activity (Sequence of sensor events)", "A Sensor Event"] • A vital component of smart homes • Recognizing activities from a stream of sensor events

  6. Agenda • Introduction • Challenges • Solutions • Sequence mining • Stream mining • Transfer learning • Active learning • Results • Conclusions & future directions

  7. Why is it difficult? • Human activity is erratic and complex • Discontinuous (interrupting events) • Step order might vary each time • Inter-subject and intra-subject variability • The algorithm should be scalable • Data annotation • Costly and laborious • Training for each new space?

  8. Unsolved Challenges • Many methods proposed • Hidden Markov models, conditional random fields, naïve Bayes, … • Current methods • Make many simplifying assumptions • Are mostly supervised • Data annotation problem • Even if unsupervised • Trained for each new setting from scratch • Ignore activity variations or interruptions • …

  9. Agenda • Introduction • Challenges • Solutions • Sequence mining • Stream mining • Transfer learning • Active learning • Results • Conclusions & future directions

  10. Our Solutions • Discovering complex activities • Sequence mining • Discovering activities from a stream • Stream sequence mining • Transferring activity models to new spaces • Transfer learning • Guiding activity annotation • Active learning

  11. Agenda • Introduction • Challenges • Solutions • Sequence mining • Stream mining • Transfer learning • Active learning • Results • Conclusions & future directions

  12. Sequence Mining AGCTACCCGTTTA • Sequence • Ordered set of items • Examples • Speech: sequence of phonemes • DNA sequence: AAGCTACGTAA • Network: sequence of packets • Our data: sequence of sensor events • Goal • Finding repetitive sequential patterns in data • Many methods proposed • GSP, PrefixSpan, SPADE, …
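
As a rough illustration of the goal on this slide (finding repetitive sequential patterns), the sketch below counts contiguous subsequences of a fixed length in the slide's DNA example. It is a naive baseline, not GSP, PrefixSpan, or SPADE; the function name `frequent_ngrams` and its parameters are illustrative assumptions.

```python
from collections import Counter


def frequent_ngrams(sequence, length, min_count):
    """Count contiguous subsequences of `length` items and keep the frequent ones."""
    counts = Counter(
        tuple(sequence[i:i + length])
        for i in range(len(sequence) - length + 1)
    )
    return {pattern: n for pattern, n in counts.items() if n >= min_count}


# The DNA example from the slide; any ordered sequence (e.g. sensor events) works.
dna = "AAGCTACGTAA"
print(frequent_ngrams(dna, length=2, min_count=2))
```

In this string only AA and TA occur twice as length-2 patterns. Real sensor data would also need the discontinuous, order-varying matching that the following slides discuss.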

  13. Activity Sequence Mining Problem [Figure labels: "No boundaries!" vs. "Item-set boundary"] • Data: a single sequence with no boundaries • Unlike transaction data • We are looking for activity sequence patterns • With discontinuous steps • Variations of the same activity

  14. From Sequence Mining to Activity Recognition • Find activity patterns • Discontinuous Varied Sequence Mining (DVSM) • Continuous, varied Order, Multi Threshold (COM) • Cluster similar patterns • Cluster centroid is a representative activity. • Recognize activities • Hidden Markov Model

  15. DVSM [Figure: pattern instances {a,b,q}, {b,x,a}, {a,u,b}; general pattern <a,b>; labels: Compression, Continuity] • Finds general patterns/variations in several iterations • During each iteration • Finds patterns of increasing length • Extends by prefix and suffix at each iteration • Checks if it is a variation of a general pattern • At the end of each iteration • Retains only interesting patterns according to the MDL principle

  16. DVSM • Continuity • Pattern → Variations → Instances → Events • Prunes patterns/variations with low compression values • Highly discontinuous • Infrequent • Prunes non-maximal patterns • Prunes irrelevant variations using mutual information and sensor
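
The relationship between pattern instances and a general pattern on the DVSM slides can be sketched as follows. This is only an illustration under stated assumptions: the general pattern is taken to be the set of events shared by all instances, and the `continuity` score (pattern length divided by the span it occupies in an instance) is a hypothetical stand-in, since the slides do not give DVSM's actual formulas.

```python
def general_pattern(instances):
    """Events shared by every instance, ignoring order and interleaved events."""
    common = set(instances[0])
    for instance in instances[1:]:
        common &= set(instance)
    return sorted(common)


def continuity(instance, pattern):
    """Illustrative score (an assumption): pattern length / span it occupies.

    Assumes every pattern event occurs in the instance.
    """
    positions = [instance.index(event) for event in pattern]
    span = max(positions) - min(positions) + 1
    return len(pattern) / span


# The three instances shown on the DVSM slide.
instances = [["a", "b", "q"], ["b", "x", "a"], ["a", "u", "b"]]
pattern = general_pattern(instances)
for instance in instances:
    print(instance, pattern, round(continuity(instance, pattern), 2))
```

Under this toy score, {a,b,q} is fully continuous, while {b,x,a} and {a,u,b} are penalized for the interrupting event.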

  17. Improve DVSM: COM • Different sensor frequencies for • Different regions of home • Different types of sensor • “Rare item problem” • A global min-support doesn’t work! • Use multiple support thresholds
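
A minimal sketch of why multiple support thresholds help with the rare item problem. The slide does not state COM's exact rule; here a pattern's threshold is assumed to be the lowest threshold among its sensors, an idea borrowed from multiple-minimum-support association rule mining. The sensor names and threshold values are hypothetical.

```python
def min_support_for(pattern, support_by_sensor, default_support):
    """Threshold for a pattern: the lowest threshold among its sensors (assumption)."""
    return min(support_by_sensor.get(sensor, default_support) for sensor in pattern)


def is_frequent(pattern, count, support_by_sensor, default_support=10):
    """True when the pattern's count reaches its own, sensor-specific threshold."""
    return count >= min_support_for(pattern, support_by_sensor, default_support)


# Hypothetical thresholds: a busy kitchen motion sensor vs. a rarely used cabinet sensor.
support_by_sensor = {"kitchen_motion": 50, "medicine_cabinet": 3}
pattern = ("kitchen_motion", "medicine_cabinet")

print(is_frequent(pattern, 4, support_by_sensor))          # per-sensor thresholds
print(is_frequent(pattern, 4, {}, default_support=50))     # single global threshold
```

With per-sensor thresholds the rare medication pattern survives; with one global threshold tuned for busy sensors it is discarded.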

  18. Clustering • Grouping similar objects together • There are many different clustering methods • Partition based (k-Means) • Hierarchical (CURE) • Density based (DBSCAN) • Model based (EM)

  19. Similarity Measure • How is similarity determined? • Our activity similarity measure • Total Similarity = Start Time Similarity + Duration Similarity + Structure Similarity + Location Similarity

  20. Activity Recognition [Figure: DBN and HMM diagrams over time steps t and t+1; node labels Day, Time, Room, Activity n, X, Y] • Basically a sequence classification problem • Different from ordinary classification problems • Variable length records • Order • Probabilistic methods are the most widely used • Markov chains • Hidden Markov models • Dynamic Bayesian Networks • Conditional random fields

  21. Hidden Markov Model • A statistical model • Markovian property • A number of observed & hidden variables • Their transition probabilities • We automatically build an HMM from cluster centroids
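
To make the recognition step concrete, here is a small Viterbi decoder that recovers the most likely hidden activity sequence from observed sensor events. It is a generic textbook sketch: the two activities, the sensor names, and all probabilities are invented for illustration, and it does not show how the HMM is built from cluster centroids.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            column[s] = max(
                (prev[r][0] * trans_p[r][s] * emit_p[s][obs], r) for r in states
            )
        best.append(column)
    # Trace back from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical two-activity model over two motion sensors.
states = ["cooking", "sleeping"]
start_p = {"cooking": 0.5, "sleeping": 0.5}
trans_p = {
    "cooking": {"cooking": 0.8, "sleeping": 0.2},
    "sleeping": {"cooking": 0.2, "sleeping": 0.8},
}
emit_p = {
    "cooking": {"kitchen_motion": 0.9, "bedroom_motion": 0.1},
    "sleeping": {"kitchen_motion": 0.1, "bedroom_motion": 0.9},
}
events = ["kitchen_motion", "kitchen_motion", "bedroom_motion"]
print(viterbi(events, states, start_p, trans_p, emit_p))
```

For long event streams the products underflow, so a practical version would work with log probabilities.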

  22. Agenda • Introduction • Challenges • Solutions • Sequence mining • Stream mining • Transfer learning • Active learning • Results • Conclusions & future directions

  23. Stream Mining …0100101111101111… • Many emerging applications • IP network traffic • Scientific data • Process data as it arrives • We cannot store all data • One pass • Approximate and randomized answers • E.g. relaxed support threshold • Some proposed methods • Frequent itemset mining • Lossy counting [Manku 2002], SpaceSaving algorithm [Metwally 2005], … • Frequent sequence mining • SPEED algorithm [Raissi 2005], …
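
The one-pass, approximate style of counting mentioned on this slide can be sketched with lossy counting for single items. This follows the general scheme of the lossy counting algorithm cited above (fixed-width buckets, pruning at bucket boundaries, a relaxed output threshold); the class and method names are illustrative, and it counts items rather than itemsets or sequences.

```python
import math


class LossyCounter:
    """One-pass approximate frequency counts with error at most epsilon * n."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)   # bucket width
        self.n = 0                            # items seen so far
        self.entries = {}                     # item -> [count, max possible undercount]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop entries that cannot be frequent.
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > bucket
            }

    def frequent(self, support):
        """Items whose stored count reaches the relaxed threshold (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k for k, (count, _) in self.entries.items() if count >= threshold}


counter = LossyCounter(epsilon=0.2)
for event in "abacabadab":
    counter.add(event)
print(counter.frequent(support=0.5))
```

Memory stays bounded because rare items are evicted at each bucket boundary, at the cost of undercounting by at most epsilon times the stream length.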

  24. Tilted Time Model [Figure: time windows at month, day, and hour granularity] • Uses a set of time-tilted windows to keep frequency of items • Finer details for more recent time frames • Coarser details for older time frames • Shifting history into older time frames as data arrives * C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu, Mining Frequent Patterns in Data Streams at Multiple Time Granularities. MIT Press, 2003, ch. 3.
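
A simplified sketch of the tilted time idea: recent counts are kept at fine granularity, and whenever a fine level fills up its slots are summed into one slot of the next coarser level. The class name, the default capacities (4 quarter-hours, 24 hours, 31 days), and the roll-up policy are assumptions for illustration; the cited chapter describes more refined variants.

```python
class TiltedTimeWindow:
    """Keeps counts at several granularities, newest first within each level."""

    def __init__(self, capacities=(4, 24, 31)):
        self.capacities = capacities
        self.levels = [[] for _ in capacities]

    def add(self, count):
        """Record the count observed in the newest finest-grained time unit."""
        self._push(0, count)

    def _push(self, level, count):
        slots = self.levels[level]
        slots.insert(0, count)
        if level == len(self.levels) - 1:
            if len(slots) > self.capacities[level]:
                slots.pop()          # forget the oldest coarse slot
        elif len(slots) == self.capacities[level]:
            total = sum(slots)       # merge the full level into one coarser slot
            slots.clear()
            self._push(level + 1, total)


window = TiltedTimeWindow(capacities=(2, 2, 2))
for count in [1, 2, 3, 4, 5]:
    window.add(count)
print(window.levels)
```

With capacities of 2 per level, the first four counts have been rolled up into a single coarse slot while the latest count is still stored at the finest level.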

  25. Tilted Time Model • Minimum support: σ • Maximum support error: ε • An itemset can be • Frequent • Sub-frequent • Infrequent • Pruning itemsets (tail pruning)
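
The three categories on this slide can be expressed as a small classifier. The slide does not spell out the cut-offs; this sketch assumes the usual definitions from the cited tilted-time work, where an itemset is frequent at support σ or above, sub-frequent between ε and σ, and infrequent below ε. The function name is illustrative.

```python
def classify_support(support, sigma, epsilon):
    """Label an itemset by its relative support (assumed definitions, see lead-in)."""
    if support >= sigma:
        return "frequent"
    if support >= epsilon:
        return "sub-frequent"
    return "infrequent"


# Example with sigma = 10% and epsilon = 1%.
for support in (0.12, 0.05, 0.004):
    print(support, classify_support(support, sigma=0.10, epsilon=0.01))
```

Sub-frequent itemsets are kept because they may become frequent later; infrequent ones are candidates for pruning.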

  26. StreamCOM • Extending COM into a stream mining method • Using tilted time model

  27. Agenda • Introduction • Challenges • Solutions • Sequence mining • Stream mining • Transfer learning • Active learning • Results • Conclusions & future directions

  28. Transfer Learning [Figure: traditional ML vs. transfer learning, each showing training items and test items] • Apply skills learned in previous tasks to novel tasks • Chess → Checkers • Math → CS

  29. Why in Smart Homes? • Why transfer learning? • Supervised methods • Require annotation • Unsupervised methods • Require lots of data

  30. Our Transfer Learning Solutions • Activity Transfer • Transfer from one resident to another • Different residents, space layouts, sensors • Transfer from a single physical source to a target • Transfer from multiple physical sources to a target • Domain selection

  31. Multi Home Transfer Learning (MHTL) • Find activity models in both spaces • Source: extract activity model • Target: location based mining, incremental clustering • Activity consolidation, sensor selection • Map activity models from source to target • Map Sensors • Map activities • Map Labels • Use labels for recognition!

  32. MHTL Architecture

  33. Domain Selection Some animals are more equal ... George Orwell – Animal Farm • Our previous works • Assumed “all sources are equal” • Not all sources are equal • Some sources are more equal! • Select top N sources • Efficiency: do not use all sources • Accuracy: negative transfer effect

  34. Domain Similarity How to measure difference between two distributions?

  35. Domain Similarity • Conventional similarity measures • Kullback-Leibler divergence (KL), Jensen-Shannon divergence (JSD), L1 or Lp norms • Kifer et al. [2004] proposed the H distance • Later, Ben-David et al. [2007] proved that • Computing it is exactly the problem of minimizing the empirical risk of a classifier that discriminates between instances drawn from the two domains!
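
For reference, the two conventional divergences named on this slide can be computed for discrete distributions as below. This is a plain sketch of the standard definitions (base-2 logarithms), not of the H distance; the function names are illustrative.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; requires q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by 1 bit."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical sensor-usage distributions for two homes.
home_a = [0.7, 0.2, 0.1]
home_b = [0.1, 0.2, 0.7]
print(js_divergence(home_a, home_a))   # identical domains
print(js_divergence(home_a, home_b))   # different domains
```

Unlike KL, the Jensen-Shannon divergence stays finite even when the two distributions do not share support, because each is compared against their mixture.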

  36. Demonstration of H Distance H-distance: 0.1, small! *Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. In NIPS, 2007.

  37. Agenda • Introduction • Challenges • Solutions • Sequence mining • Stream mining • Transfer learning • Active learning • Results • Conclusions & future directions

  38. Active Learning • The learning algorithm can query for the label of a point • Ask the oracle! • Proposed methods • Uncertainty sampling, committee based, …
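
Uncertainty sampling, the first method listed, can be sketched in a few lines: ask the oracle about the unlabeled point the current model is least sure of. The least-confidence criterion used here is one common variant; the function name and the probability values are illustrative assumptions.

```python
def least_confident(probabilities):
    """Index of the instance whose most likely class has the lowest probability."""
    return min(range(len(probabilities)), key=lambda i: max(probabilities[i]))


# Hypothetical class probabilities predicted for three unlabeled instances.
predicted = [
    [0.95, 0.05],   # confident
    [0.55, 0.45],   # uncertain: best candidate for a query
    [0.20, 0.80],
]
query_index = least_confident(predicted)
print(query_index)
```

The selected instance would then be shown to the oracle, its label added to the training set, and the model retrained before the next query.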

  39. A Problem! • Traditional active learning methods • Ask overly specific queries • “What is the class label if (sex = female) and (age = 39) and (chest pain type = 3) and (serum cholesterol = 150.2 mg/dL) and (fasting blood sugar = 150 mg/dL)... and (electrocardiographic result = 1) and (maximum heart rate achieved = 126) and (exercise induced angina = 90) and (heart old peak = 2.3) and (number of major vessels colored by fluoroscopy = 3)?” • vs. • “What is the class label if (age > 65) and (chest pain type = 3) and (serum cholesterol > 240 mg/dL)?”

  40. Template Based Queries • Select the most informative instances • Select friends (+) and enemies (-) = Δ • Select relevant and weakly relevant features in Δ • Build a template query using relevant and weakly relevant features

  41. RIQY • RIQY: Rule Induced active learning QuerY method • Select the most informative instances • Select friends (+) and enemies (-) = Δ • Use rule induction to build generic queries

  42. Agenda • Introduction • Challenges • Solutions • Sequence mining • Stream mining • Transfer learning • Active learning • Results • Conclusions & future directions

  43. Can we discover activities? DVSM vs. COM

  44. Activity Discovery Confusion matrix for various activities in apartment 1

  45. Some Discovered Patterns

  46. StreamCOM Taking medication activity

  47. Transferring Activities

  48. Transferring Activities

  49. What about active learning? • Kyoto smart apartment dataset (CASAS) • Wisconsin breast cancer dataset (UCI repository)

  50. Conclusions • Two novel sequence mining methods • DVSM • COM • A novel stream data mining method • StreamCOM • A couple of transfer learning methods • Between residents • Between one/multiple smart homes • Source selection • Two novel active learning methods • Template based active learning • RIQY
