Innovative Approach for Missing Traveler Data Management

Attribute Constrained Rules: A new approach for missing traveler data
Chad A. Williams†, Peter C. Nelson, Abolfazl (Kouros) Mohammadian
University of Illinois at Chicago
Department of Computer Science Colloquium, July 16th, 2009
† Supported by the NSF IGERT program under Grant DGE-0549489

Abstract • Addresses participant burden in travel behavior modeling • Introduces a new approach that greatly reduces the cost of missing data • Opportunity to reduce data collected with limited impact on predictive performance • Reviews wider application implications

Outline • Overview and problem context • Approach • Illustrative example • Experimental evaluation • Summary and next steps

Problem space • Predicting sets within a sequence • Introduce technique for handling missing data • Minimal impact to predictive performance with even large quantities of data missing • Benefits • Able to tolerate more problematic data sources • Reduce data requirements • Opportunity to reduce communication needs

Applications • Reduce user burden associated with data collection • Travel surveys • Applications such as intelligent travelers assistant (ITA) • Reduce communication costs • Mobile networks • Sensor networks

Implications for ITA
• Map to sequence of sets
• Activity: Start time, Length of activity, Location, Activity, People involved, When planned, Time flexibility, Location flexibility, Accessibility
• Travel: Start time, Travel time, Mode of travel, Trip distance, People involved, When mode planned

Problem statement Partially labeled sequence completion • Given database of sequences of sets • The attributes/labels and possible discrete values within each set are known • The value of any attribute within any sequence can be missing • Given a target set and the surrounding sequence predict the missing value(s) {a1,b2,c1}{a2,?}{a1,b1}
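One way to hold such a partially labeled sequence in code is sketched below. This is a minimal illustration, not the authors' implementation: the dictionary layout, the use of None for "?", and the helper name missing_slots are all assumptions, as is the choice that the unknown attribute in {a2,?} is b.

```python
from typing import Dict, List, Optional, Tuple

# Assumed layout: each set maps an attribute name to a discrete value,
# and None stands for a missing value (the "?" on the slide).
ItemSet = Dict[str, Optional[str]]

# The slide's target {a1,b2,c1}{a2,?}{a1,b1}, assuming the unknown attribute is b.
target: List[ItemSet] = [
    {"a": "1", "b": "2", "c": "1"},
    {"a": "2", "b": None},
    {"a": "1", "b": "1"},
]

def missing_slots(sequence: List[ItemSet]) -> List[Tuple[int, str]]:
    """Return (set index, attribute) pairs whose value must be predicted."""
    return [
        (index, attr)
        for index, item_set in enumerate(sequence)
        for attr, value in item_set.items()
        if value is None
    ]

print(missing_slots(target))  # [(1, 'b')]
```

Keeping the attribute name even when its value is unknown is what later lets a rule constrain on attribute presence rather than on a concrete value.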

Related work • Lots of effort on mining frequent sequences and rules efficiently • Techniques for associative mining with missing values but don’t extend well to sequences
[Figure labels: Original DB, VDB X1, VDB X2]

Related work (cont.) • Regular expression rule mining • Recent work has started to examine tailoring rule form to benefit specific classes of problems {a1,b2,c1}{a2}{a1,b1}→ {a1,b2,c1}{a2,b2,c1}{a1,b1} {a1,b2,c1}{a2,*}{a1,b1}→ {a1,b2,c1}{a2,b2,c1}{a1,b1}

Design • Introduce new rule form for attribute constrained sequences with missing values • Constrained rule similar to a template • Focus on attribute presence as well as values • Allows for more accurate confidence & support using knowledge of problem

Attribute constrained rule form • Introduce new rule form for attribute constrained partially labeled sequences • Allows for more accurate confidence & support using knowledge of problem
Target: {a1,b2,c1}{a2,?}{a1,b1}
Traditional sequential rule form:
{a1,b2,c1}{a2}{a1,b1} → {a1,b2,c1}{a2,b2}{a1,b1}
{a1,b2,c1}{a2}{a1,b1} → {a1,b2,c1}{a2,c1}{a1,b1}
Attribute constrained rule (ACR) form:
{a1,b2,c1}{a2,b:*}{a1,b1} → {a1,b2,c1}{a2,b2}{a1,b1}
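The ACR on this slide can be read as a template whose wildcard slot lines up with the unknown value in the target. A minimal sketch of applying one such rule follows. The position-by-position alignment of rule and target, the "*" and None markers, and the function names are simplifying assumptions made for illustration and are not taken from the talk.

```python
WILDCARD = "*"  # stands for "b:*": attribute b is present, value unconstrained

def premise_matches(premise, target):
    """Check a premise against a target of the same length, set by set.
    A wildcard only requires the attribute to be present (value may be None)."""
    if len(premise) != len(target):
        return False
    for p_set, t_set in zip(premise, target):
        for attr, p_val in p_set.items():
            if attr not in t_set:
                return False
            if p_val != WILDCARD and t_set[attr] != p_val:
                return False
    return True

def complete(target, premise, consequent):
    """Fill the target's missing values (None) from the rule's consequent,
    or return None when the premise does not apply."""
    if not premise_matches(premise, target):
        return None
    completed = [dict(item_set) for item_set in target]
    for t_set, c_set in zip(completed, consequent):
        for attr in list(t_set):
            if t_set[attr] is None and attr in c_set:
                t_set[attr] = c_set[attr]
    return completed

target = [{"a": "1", "b": "2", "c": "1"}, {"a": "2", "b": None}, {"a": "1", "b": "1"}]
premise = [{"a": "1", "b": "2", "c": "1"}, {"a": "2", "b": WILDCARD}, {"a": "1", "b": "1"}]
consequent = [{"a": "1", "b": "2", "c": "1"}, {"a": "2", "b": "2"}, {"a": "1", "b": "1"}]

print(complete(target, premise, consequent)[1])  # {'a': '2', 'b': '2'}
```

A traditional premise with {a2} in the middle would also match sequences whose middle set has no b attribute at all, which is why the ACR form can yield different support and confidence figures.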

Example: Frequent sequence graph
• Example sequence DB: {a1}{a2,b2}{b1}; {a1}{a2,b2}{a2,b1}; {a1}{b2,c2}{a2}{b1}; {a1}{a2,c1}{b1}
• Traditional sequential rule: <{a1}{a2}{b1}> → <{a1}{a2,b2}{b1}> [support = 2/4, confidence = 2/4]
• Graph nodes, shown as pattern [count]:
• Root: Ø [4]
• 1 item: {a1} [4] {a2} [4] {b1} [4] {b2} [3]
• 2 items: {a1}{a2} [4] {a1}{b1} [4] {a1}{b2} [3] {a2}{b1} [4] {b2}{a2} [2] {b2}{b1} [3] {a2,b2} [2]
• 3 items: {a1}{a2}{b1} [4] {a1}{b2}{a2} [2] {a1}{b2}{b1} [3] {a1}{a2,b2} [2] {a2,b2}{b1} [2]
• 4 items: {a1}{a2,b2}{b1} [2]

ACR extended frequent sequence graph • ACR templates added to the tree
• Frequent patterns shown: Ø [4] {a1} [4] {a2} [4] {b1} [4] {b2} [3] {b2}{b1} [3] {a2,b2} [2]
• ACR templates added: {a:*} [4] {b:*} [4] {b:*}{b1} [3] {b2}{b:*} [3] {a:*,b2} [2] {a2,b:*} [2] {a:*,b:*} [2]

ACR extended frequent sequence graph • ACR templates added to the tree • Additions: O(#freq. patterns × 2^min(freq. pattern len., set length))
• Root: Ø [4]
• 1 item, frequent patterns: {a1} [4] {a2} [4] {b1} [4] {b2} [3]
• 1 item, ACR templates: {a:*} [4] {b:*} [4]
• 2 items, frequent patterns: {a1}{a2} [4] {a1}{b1} [4] {a1}{b2} [3] {a2}{b1} [4] {b2}{a2} [2] {b2}{b1} [3] {a2,b2} [2]
• 2 items, ACR templates: {a:*}{a2} [4] {a1}{a:*} [4] {a1}{b:*} [4] {a:*}{b1} [4] {a:*}{b2} [3] {a2}{b:*} [4] {b2}{a:*} [2] {b:*}{a2} [2] {b:*}{b1} [3] {b2}{b:*} [3] {a:*,b2} [2] {a2,b:*} [2] {a:*,b:*} [2]
• 3 items, frequent patterns: {a1}{a2}{b1} [4] {a1}{b2}{a2} [2] {a1}{b2}{b1} [3] {a1}{a2,b2} [2] {a2,b2}{b1} [2]
• 3 items, ACR templates: {a1}{a2}{b:*} [4] {a1}{a:*}{b1} [4] {a:*}{a2}{b1} [4] {a:*,b:*}{b1} [2] {a:*,b2}{b1} [2] {a1}{b2}{a:*} [2] {a1}{b:*}{a2} [2] {a:*}{b2}{a2} [2] {a1}{b2}{b:*} [3] {a1}{b:*}{b1} [3] {a:*}{b2}{b1} [3] {a:*}{a2,b2} [2] {a1}{a2,b:*} [2] {a1}{a:*,b:*} [2] {a1}{a:*,b2} [2] {a2,b2}{b:*} [2] {a2,b:*}{b1} [2]
• 4 items, frequent pattern: {a1}{a2,b2}{b1} [2]
• 4 items, ACR templates: {a1}{a2,b2}{b:*} [2] {a1}{a2,b:*}{b1} [2] {a1}{a:*,b2}{b1} [2] {a1}{a:*,b:*}{b1} [2] {a:*}{a2,b2}{b1} [2]
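The templates listed on this slide all place their wildcards inside a single set of the underlying frequent pattern. Assuming that is the generation rule (an inference from the listed nodes, not something the talk states), the templates for one pattern can be enumerated as sketched below. The function names and the dictionary representation are hypothetical.

```python
from itertools import combinations

WILDCARD = "*"

def acr_templates(pattern):
    """For each set of the pattern, wildcard every non-empty subset of its
    attributes while leaving all other sets untouched."""
    templates = []
    for index, item_set in enumerate(pattern):
        attrs = sorted(item_set)
        for size in range(1, len(attrs) + 1):
            for chosen in combinations(attrs, size):
                new_set = {
                    attr: (WILDCARD if attr in chosen else value)
                    for attr, value in item_set.items()
                }
                templates.append(pattern[:index] + [new_set] + pattern[index + 1:])
    return templates

def show(pattern):
    """Render a pattern in the slide's notation, e.g. {a1}{a2,b:*}{b1}."""
    def item(attr, value):
        return f"{attr}:*" if value == WILDCARD else f"{attr}{value}"
    return "".join(
        "{" + ",".join(item(a, v) for a, v in sorted(s.items())) + "}"
        for s in pattern
    )

frequent = [{"a": "1"}, {"a": "2", "b": "2"}, {"b": "1"}]  # {a1}{a2,b2}{b1}
for template in acr_templates(frequent):
    print(show(template))
```

For {a1}{a2,b2}{b1} this produces five templates, matching the five 4-item ACR nodes on the slide. A set with k attributes contributes 2^k - 1 templates, which is where the exponential factor in the stated bound comes from.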

Example
• Example sequence DB: {a1}{a2,b2}{b1}; {a1}{a2,b2}{a2,b1}; {a1}{b2,c2}{a2}{b1}; {a1}{a2,c1}{b1}
• Traditional sequential rule: <{a1}{a2}{b1}> → <{a1}{a2,b2}{b1}> [support = 2/4, confidence = 2/4]
• Attribute constrained rule (ACR): <{a1}{a2,b:*}{b1}> → <{a1}{a2,b2}{b1}> [support = 2/4, confidence = 2/2]
• Support = (# of occurrences) / (# of sequences)
• Confidence = (# of full sequence) / (# of premise)
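The support and confidence figures on this slide can be checked against the four-sequence example DB with a short script. The sketch below assumes ordinary subsequence containment (sets matched in order, gaps allowed) and treats "b:*" as "attribute b present with any value". The function names and data layout are illustrative only.

```python
WILDCARD = "*"

def set_matches(pattern_set, data_set):
    """Every pattern attribute must be present; non-wildcard values must agree."""
    for attr, value in pattern_set.items():
        if attr not in data_set:
            return False
        if value != WILDCARD and data_set[attr] != value:
            return False
    return True

def contains(sequence, pattern):
    """True if the pattern's sets match distinct sets of the sequence in order.
    Greedy left-to-right matching is sufficient for this containment test."""
    pos = 0
    for pattern_set in pattern:
        while pos < len(sequence) and not set_matches(pattern_set, sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def rule_stats(db, premise, full):
    """Return (support, confidence) as defined on the slide."""
    n_full = sum(contains(seq, full) for seq in db)
    n_premise = sum(contains(seq, premise) for seq in db)
    confidence = n_full / n_premise if n_premise else 0.0
    return n_full / len(db), confidence

DB = [
    [{"a": "1"}, {"a": "2", "b": "2"}, {"b": "1"}],
    [{"a": "1"}, {"a": "2", "b": "2"}, {"a": "2", "b": "1"}],
    [{"a": "1"}, {"b": "2", "c": "2"}, {"a": "2"}, {"b": "1"}],
    [{"a": "1"}, {"a": "2", "c": "1"}, {"b": "1"}],
]
FULL = [{"a": "1"}, {"a": "2", "b": "2"}, {"b": "1"}]
TRADITIONAL = [{"a": "1"}, {"a": "2"}, {"b": "1"}]
ACR = [{"a": "1"}, {"a": "2", "b": WILDCARD}, {"b": "1"}]

print(rule_stats(DB, TRADITIONAL, FULL))  # (0.5, 0.5)
print(rule_stats(DB, ACR, FULL))          # (0.5, 1.0)
```

The traditional premise matches all four sequences, while the ACR premise matches only the two whose middle set actually carries a b attribute, so the same two supporting sequences give confidence 2/2 instead of 2/4.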

Evaluation • 2001 Atlanta household travel survey • 21k people, 8k households, 126k places visited, 48 hrs • 49,695 sets of information, avg. seq. length 7.4 sets • Focus 6 attributes (act. type, mode, time, duration) • Results are averaged over all combinations
Example target: {a1,b2,c1}{a2,?}{a1,b1}
Metrics
• Precision = # true positives / (# true positives + # false positives)
• Recall = # true positives / (# true positives + # false negatives)
• F-measure = (2 × precision × recall) / (precision + recall)
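The three metrics follow directly from prediction counts. The helper below is a small sketch of those formulas. The function name and the example counts are made up for illustration and are not results from the study.

```python
def precision_recall_f(true_pos, false_pos, false_neg):
    """Compute precision, recall and F-measure from raw counts.
    Each ratio falls back to 0.0 when its denominator is zero."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Hypothetical counts: 8 correct predictions, 2 wrong ones, 8 values never predicted.
precision, recall, f_measure = precision_recall_f(8, 2, 8)
print(precision, recall, round(f_measure, 3))
```

A rule set that declines to predict when no rule applies lowers recall rather than precision, which is why the two are reported separately.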

Performance vs. percent of values missing
Williams, C. A.; Nelson, P. C. & Mohammadian, A., “Attribute Constrained Rules for Partially Labeled Sequence Completion”, LNCS: Advances in Data Mining - Applications and Theoretical Aspects 5633, 338–352, 2009.

Traditional performance vs. # of target values missing

ACR performance vs. traditional performance w.r.t. # of targets missing

ACR performance vs. traditional performance: Recall

ACR performance vs. traditional performance: Precision

Summary • Introduced technique for handling missing data • New rule form for entire class of problems • Much better predictions when data is missing than traditional methods • Represents opportunity to reduce number of data points that need to be collected/communicated • Computationally practical

Future work • Study underway to confirm benefits in survey applications • Adapt technique for streaming data sources • Sensor applications

Other related research • Transfer learning of activity patterns • Further data requirement reduction • Reduce learning time • Sequential pattern mining of streams • Current model relies on offline modeling • Enable real-time pattern learning for ITA • Demo of pattern and location learning • Tilted time windows

Questions?
IGERT Program in Computational Transportation Science
IGERT: Integrative Graduate Education and Research Traineeship
This research was supported in part by the National Science Foundation IGERT program under Grant DGE-0549489

Additional results • Sequential rules
C. Williams, A. Mohammadian, P. Nelson, and S. Doherty, “Mining Sequential Association Rules for Traveler Context Prediction”, The First International Workshop on Computational Transportation Science (IWCTS’08), Dublin, Ireland, July 2008.
