This study introduces a novel method to reduce missing data costs in travel behavior modeling, with wide applications and minimal impact on predictive performance. The approach is detailed with examples and experimental evaluations.
Attribute Constrained Rules: A new approach for missing traveler data Chad A. Williams†, Peter C. Nelson, Abolfazl (Kouros) Mohammadian University of Illinois at Chicago Department of Computer Science Colloquium July 16th, 2009 † Supported by the NSF IGERT program under Grant DGE-0549489
Abstract • Address reducing participant burden in travel behavior modeling • New approach introduced that greatly reduces cost of missing data • Opportunity to reduce data collected with limited impact on predictive performance • Review wider application implications
Outline • Overview and problem context • Approach • Illustrative example • Experimental evaluation • Summary and next steps
Problem space • Predicting sets within a sequence • Introduce technique for handling missing data • Minimal impact to predictive performance with even large quantities of data missing • Benefits • Able to tolerate more problematic data sources • Reduce data requirements • Opportunity to reduce communication needs
Applications • Reduce user burden associated with data collection • Travel surveys • Applications such as intelligent travelers assistant (ITA) • Reduce communication costs • Mobile networks • Sensor networks
Implications for ITA Map to sequence of sets • Activity • Start time • Length of Activity • Location • Activity • People involved • When planned • Time flexibility • Location flexibility • Accessibility • Travel • Start time • Travel time • Mode of travel • Trip distance • People involved • When mode planned Time
Problem statement Partially labeled sequence completion • Given database of sequences of sets • The attributes/labels and possible discrete values within each set are known • The value of any attribute within any sequence can be missing • Given a target set and the surrounding sequence predict the missing value(s) {a1,b2,c1}{a2,?}{a1,b1}
Related work • Lots of effort on mining frequent sequences and rules efficiently • Techniques for associative mining with missing values but don’t extend well to sequences Original DB VDB X1 VDB X2
Related work (cont.) • Regular expression rule mining • Recent work has started to examine tailoring rule form to benefit specific classes of problems {a1,b2,c1}{a2}{a1,b1}→ {a1,b2,c1}{a2,b2,c1}{a1,b1} {a1,b2,c1}{a2,*}{a1,b1}→ {a1,b2,c1}{a2,b2,c1}{a1,b1}
Design • Introduce new rule form for attribute constrained sequences with missing values • Constrained rule similar to a template • Focus on attribute presence as well as values • Allows for more accurate confidence & support using knowledge of problem
Attribute constrained rule form • Introduce new rule form for attribute constrained partially labeled sequences • Allows for more accurate confidence & support using knowledge of problem {a1,b2,c1}{a2,?}{a1,b1} Traditional sequential rule form {a1,b2,c1}{a2}{a1,b1}→ {a1,b2,c1}{a2,b2}{a1,b1} {a1,b2,c1}{a2}{a1,b1}→ {a1,b2,c1}{a2,c1}{a1,b1} Attribute constrained rule (ACR) form {a1,b2,c1}{a2,b:*}{a1,b1}→ {a1,b2,c1}{a2,b2}{a1,b1}
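As a rough illustration of how an ACR could be applied to the target {a1,b2,c1}{a2,?}{a1,b1}, the sketch below treats each set as a mapping from attribute to value, with `None` standing for a missing value ("?") and `"*"` for a template slot such as b:*. The position-by-position alignment of premise and sequence, the dictionary representation, and the names `premise_matches` and `complete` are simplifying assumptions made for this example; they are not taken from the talk.

```python
WILDCARD = "*"  # template slot: attribute must be present, value unconstrained


def premise_matches(premise, sequence):
    """Check a premise against a sequence, set by set (assumed alignment)."""
    if len(premise) != len(sequence):
        return False
    for pattern_set, data_set in zip(premise, sequence):
        for attr, value in pattern_set.items():
            if attr not in data_set:
                return False  # attribute is not part of this set at all
            if value != WILDCARD and data_set[attr] != value:
                return False  # a concrete value was required and differs
    return True


def complete(sequence, rule):
    """Return a copy of the sequence with the rule's value filled in, or None."""
    premise, (index, attr, value) = rule
    if not premise_matches(premise, sequence):
        return None
    completed = [dict(s) for s in sequence]  # leave the input untouched
    completed[index][attr] = value
    return completed


# {a1,b2,c1}{a2,?}{a1,b1}: attribute b of the second set is missing.
sequence = [{"a": 1, "b": 2, "c": 1}, {"a": 2, "b": None}, {"a": 1, "b": 1}]

# {a1,b2,c1}{a2,b:*}{a1,b1} -> {a1,b2,c1}{a2,b2}{a1,b1}
rule = (
    [{"a": 1, "b": 2, "c": 1}, {"a": 2, "b": WILDCARD}, {"a": 1, "b": 1}],
    (1, "b", 2),  # fill attribute b of set 1 with value 2
)

print(complete(sequence, rule))
```

The wildcard only requires that attribute b belongs to the target set, so the rule still applies while the value of b is unknown; a target set that does not carry attribute b at all is not matched.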
Example: Frequent sequence graph
Example sequence DB:
{a1}{a2,b2}{b1}
{a1}{a2,b2}{a2,b1}
{a1}{b2,c2}{a2}{b1}
{a1}{a2,c1}{b1}
Frequent sequences [support count]:
Ø [4]
{a1} [4] {a2} [4] {b1} [4] {b2} [3]
{a1}{a2} [4] {a1}{b1} [4] {a1}{b2} [3] {a2}{b1} [4] {b2}{a2} [2] {b2}{b1} [3] {a2b2} [2]
{a1}{a2}{b1} [4] {a1}{b2}{a2} [2] {a1}{b2}{b1} [3] {a1}{a2b2} [2] {a2b2}{b1} [2]
{a1}{a2b2}{b1} [2]
Traditional sequential rules: <{a1}{a2}{b1}> → <{a1}{a2,b2}{b1}> [support = 2/4, confidence = 2/4]
ACR extended frequent sequence graph • ACR templates added to the tree Ø [4] Level 0 {a:*} [4] {b:*} [4] {b:*}{b1} [3] {b2}{b:*} [3] Level 1 {a1} [4] {a2} [4] {b1} [4] {b2} [3] {a:*b2} [2] {a2b:*} [2] {a:*b:*} [2] {b2}{ b1} [3] {a2b2} [2]
ACR extended frequent sequence graph • ACR templates added to the tree • Additions - O(#freq. patterns * 2^min(freq. pattern len., set length)) Ø [4] {a:*} [4] {b:*} [4] Level 0 {a:*}{a2} [4] {a1}{a:*} [4] {a1}{b:*} [4] {a:*}{b1} [4] {a:*}{b2} [3] Level 1 {a1} [4] {a2} [4] {b1} [4] {b2} [3] {b2}{a:*} [2] {b:*}{a2} [2] {b:*}{b1} [3] {b2}{b:*} [3] {a:*b2} [2] {a2b:*} [2] {a:*b:*} [2] {a2}{b:*} [4] Level 2 {a1}{a2}{b:*} [4] {a1}{a:*}{b1} [4] {a:*}{a2}{b1} [4] {a:*b:*}{ b1} [2] {a:*b2}{ b1} [2] {a1}{a2} [4] {a1}{ b1} [4] {a1}{ b2} [3] {a2}{b1} [4] {b2}{ a2} [2] {b2}{ b1} [3] {a2b2} [2] {a1}{b2}{a:*} [2] {a1}{b:*}{a2} [2] {a:*}{b2}{a2} [2] {a1}{b2}{b:*} [3] {a1}{b:*}{b1} [3] {a:*}{a2b2} [2] {a1}{a2b:*} [2] {a1}{a:*b:*} [2] {a1}{a:*b2} [2] {a2b2}{ b:*} [2] {a2b:*}{ b1} [2] {a:*}{b2}{b1} [3] {a1}{a2}{b1} [4] {a1}{b2}{a2} [2] {a1}{b2}{b1} [3] {a1}{a2b2} [2] {a2b2}{b1} [2] {a1}{a2b2}{b:*} [2] {a1}{a2b:*}{b1} [2] {a1}{a:*b2}{b1} [2] Level 3 {a1}{a:*b:*}{b1} [2] {a:*}{a2b2}{b1} [2] {a1}{a2b2}{b1} [2]
Example
• Example rules:
• Traditional sequential rules:
• <{a1}{a2}{b1}> → <{a1}{a2,b2}{b1}>
• [support = 2/4, confidence = 2/4]
• Attribute constrained rules (ACR):
• <{a1}{a2,b:*}{b1}> → <{a1}{a2,b2}{b1}>
• [support = 2/4, confidence = 2/2]
Example sequence DB:
{a1}{a2,b2}{b1}
{a1}{a2,b2}{a2,b1}
{a1}{b2,c2}{a2}{b1}
{a1}{a2,c1}{b1}
Support = (# of occurrences) / (# of sequences)
Confidence = (# of full sequence) / (# of premise)
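The support and confidence figures on this slide can be reproduced with a short script. This is a minimal sketch under stated assumptions: a pattern is taken to occur in a sequence when its sets match, in order, a subsequence of the sequence's sets; an item such as "b:*" matches any value of attribute b; attribute names are single letters. The function names are hypothetical, chosen only for this example.

```python
# Example sequence DB from the slide; items are strings such as "a2".
DB = [
    [{"a1"}, {"a2", "b2"}, {"b1"}],
    [{"a1"}, {"a2", "b2"}, {"a2", "b1"}],
    [{"a1"}, {"b2", "c2"}, {"a2"}, {"b1"}],
    [{"a1"}, {"a2", "c1"}, {"b1"}],
]


def item_matches(pattern_item, data_set):
    """'b:*' matches any value of attribute b; other items must be present."""
    if pattern_item.endswith(":*"):
        attr = pattern_item[0]  # assumption: single-letter attribute names
        return any(item[0] == attr for item in data_set)
    return pattern_item in data_set


def contains(sequence, pattern):
    """True if the pattern's sets match an ordered subsequence of the sets."""
    pos = 0
    for pattern_set in pattern:
        # Greedily advance to the earliest set that satisfies this pattern set.
        while pos < len(sequence) and not all(
            item_matches(p, sequence[pos]) for p in pattern_set
        ):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def rule_counts(db, premise, full):
    """Return (# full sequence, # premise, # sequences) for one rule."""
    n_full = sum(contains(s, full) for s in db)
    n_premise = sum(contains(s, premise) for s in db)
    return n_full, n_premise, len(db)


full = [{"a1"}, {"a2", "b2"}, {"b1"}]
rules = {
    "traditional": [{"a1"}, {"a2"}, {"b1"}],
    "ACR": [{"a1"}, {"a2", "b:*"}, {"b1"}],
}
for name, premise in rules.items():
    n_full, n_premise, n_seq = rule_counts(DB, premise, full)
    print(f"{name}: support = {n_full}/{n_seq}, confidence = {n_full}/{n_premise}")
```

The traditional premise {a1}{a2}{b1} occurs in all four sequences, while the ACR premise only occurs in the two sequences whose a2 set also carries a value for b, which is why the ACR confidence rises from 2/4 to 2/2.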
Evaluation
• 2001 Atlanta household travel survey
• 21k people, 8k households, 126k places visited, 48 hrs
• 49,695 sets of information, avg. seq. length 7.4 sets
• Focus 6 attributes (act. type, mode, time, duration)
• Results take average over all combinations
{a1,b2,c1}{a2,?}{a1,b1}
Metrics
Precision = # true positives / (# true positives + # false positives)
Recall = # true positives / (# true positives + # false negatives)
F-measure = (2 * precision * recall) / (precision + recall)
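The three metrics translate directly into code. The sketch below is illustrative only: the function name is hypothetical, the counts in the usage line are made up and are not results from the Atlanta data, and returning 0.0 for an empty denominator is a convention chosen here rather than one stated in the talk.

```python
def precision_recall_f(tp, fp, fn):
    """Compute precision, recall and F-measure from prediction counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = (2 * precision * recall) / (precision + recall)
    return precision, recall, f_measure


# Made-up counts: 6 correct predictions, 2 wrong predictions, 4 values missed.
precision, recall, f_measure = precision_recall_f(tp=6, fp=2, fn=4)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```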
Performance vs. percent of values missing Williams, C. A.; Nelson, P. C. & Mohammadian, A., “Attribute Constrained Rules for Partially Labeled Sequence Completion”, LNCS: Advances in Data Mining - Applications and Theoretical Aspects 5633, 338–352, 2009.
Traditional performance vs. # of target values missing
ACR performance vs. traditional performance w.r.t. # of targets missing
ACR performance vs. traditional performance: Recall
ACR performance vs. traditional performance: Precision
Summary • Introduced technique for handling missing data • New rule form for entire class of problems • Much better predictions when data is missing than traditional methods • Represents opportunity to reduce number of data points that need to be collected/communicated • Computationally practical
Future work • Study underway to confirm benefits in survey applications • Adapt technique for streaming data sources • Sensor applications
Other related research • Transfer learning of activity patterns • Further data requirement reduction • Reduce learning time • Sequential pattern mining of streams • Current model: offline modeling • Enable real-time pattern learning for ITA • Demo of pattern and location learning • Tilted time windows
Questions? IGERT (Integrative Graduate Education and Research Traineeship) Program in Computational Transportation Science. This research was supported in part by the National Science Foundation IGERT program under Grant DGE-0549489
Additional results Sequential rules C. Williams, A. Mohammadian, P. Nelson, and S. Doherty, “Mining Sequential Association Rules for Traveler Context Prediction” The First International Workshop on Computational Transportation Science (IWCTS’08), Dublin, Ireland, July 2008.