
Attribute Constrained Rules: A new approach for missing traveler data

Attribute Constrained Rules: A new approach for missing traveler data. Chad A. Williams † Peter C. Nelson Abolfazl (Kouros) Mohammadian University of Illinois at Chicago Department of Computer Science Colloquium July 16th, 2009. † Supported by the NSF IGERT program under Grant DGE-0549489.


Presentation Transcript


  1. Attribute Constrained Rules: A new approach for missing traveler data Chad A. Williams† Peter C. Nelson Abolfazl (Kouros) Mohammadian University of Illinois at Chicago Department of Computer Science Colloquium July 16th, 2009 † Supported by the NSF IGERT program under Grant DGE-0549489

  2. Abstract • Address reducing participant burden in travel behavior modeling • New approach introduced that greatly reduces cost of missing data • Opportunity to reduce data collected with limited impact on predictive performance • Review wider application implications

  3. Outline • Overview and problem context • Approach • Illustrative example • Experimental evaluation • Summary and next steps

  4. Problem space • Predicting sets within a sequence • Introduce technique for handling missing data • Minimal impact to predictive performance with even large quantities of data missing • Benefits • Able to tolerate more problematic data sources • Reduce data requirements • Opportunity to reduce communication needs

  5. Applications • Reduce user burden associated with data collection • Travel surveys • Applications such as intelligent travelers assistant (ITA) • Reduce communication costs • Mobile networks • Sensor networks

  6. Implications for ITA • Map to sequence of sets • Activity • Start time • Length of Activity • Location • Activity • People involved • When planned • Time flexibility • Location flexibility • Accessibility • Travel • Start time • Travel time • Mode of travel • Trip distance • People involved • When mode planned • [Figure axis: Time]

  7. Problem statement • Partially labeled sequence completion • Given a database of sequences of sets • The attributes/labels and possible discrete values within each set are known • The value of any attribute within any sequence can be missing • Given a target set and the surrounding sequence, predict the missing value(s) • Example: {a1,b2,c1}{a2,?}{a1,b1}

  8. Related work • Lots of effort on mining frequent sequences and rules efficiently • Techniques exist for associative mining with missing values, but they don’t extend well to sequences • [Figure: Original DB, VDB X1, VDB X2]

  9. Related work (cont.) • Regular expression rule mining • Recent work has started to examine tailoring rule form to benefit specific classes of problems {a1,b2,c1}{a2}{a1,b1}→ {a1,b2,c1}{a2,b2,c1}{a1,b1} {a1,b2,c1}{a2,*}{a1,b1}→ {a1,b2,c1}{a2,b2,c1}{a1,b1}

  10. Design • Introduce new rule form for attribute constrained sequences with missing values • Constrained rule similar to a template • Focus on attribute presence as well as values • Allows for more accurate confidence & support using knowledge of problem

  11. Attribute constrained rule form • Introduce new rule form for attribute constrained partially labeled sequences • Allows for more accurate confidence & support using knowledge of problem • Example: {a1,b2,c1}{a2,?}{a1,b1} • Traditional sequential rule form: {a1,b2,c1}{a2}{a1,b1} → {a1,b2,c1}{a2,b2}{a1,b1}; {a1,b2,c1}{a2}{a1,b1} → {a1,b2,c1}{a2,c1}{a1,b1} • Attribute constrained rule (ACR) form: {a1,b2,c1}{a2,b:*}{a1,b1} → {a1,b2,c1}{a2,b2}{a1,b1}

  12. Example: Frequent sequence graph • Example sequence DB: {a1}{a2,b2}{b1}; {a1}{a2,b2}{a2,b1}; {a1}{b2,c2}{a2}{b1}; {a1}{a2,c1}{b1} • Frequent sequences [support count]: Ø [4]; {a1} [4], {a2} [4], {b1} [4], {b2} [3]; {a1}{a2} [4], {a1}{b1} [4], {a1}{b2} [3], {a2}{b1} [4], {b2}{a2} [2], {b2}{b1} [3], {a2b2} [2]; {a1}{a2}{b1} [4], {a1}{b2}{a2} [2], {a1}{b2}{b1} [3], {a1}{a2b2} [2], {a2b2}{b1} [2]; {a1}{a2b2}{b1} [2] • Traditional sequential rules: <{a1}{a2}{b1}> → <{a1}{a2,b2}{b1}> [support = 2/4, confidence = 2/4]

  13. ACR extended frequent sequence graph • ACR templates added to the tree • [Figure: partial graph, levels 0 and 1. Nodes shown with support counts: Ø [4], {a:*} [4], {b:*} [4], {b:*}{b1} [3], {b2}{b:*} [3], {a1} [4], {a2} [4], {b1} [4], {b2} [3], {a:*b2} [2], {a2b:*} [2], {a:*b:*} [2], {b2}{b1} [3], {a2b2} [2]]

  14. ACR extended frequent sequence graph • ACR templates added to the tree • Additions - O(#freq. patterns * 2^min(freq. pattern len., set length)) • [Figure: full graph, levels 0 to 3. Nodes shown with support counts: Ø [4], {a:*} [4], {b:*} [4], {a:*}{a2} [4], {a1}{a:*} [4], {a1}{b:*} [4], {a:*}{b1} [4], {a:*}{b2} [3], {a1} [4], {a2} [4], {b1} [4], {b2} [3], {b2}{a:*} [2], {b:*}{a2} [2], {b:*}{b1} [3], {b2}{b:*} [3], {a:*b2} [2], {a2b:*} [2], {a:*b:*} [2], {a2}{b:*} [4], {a1}{a2}{b:*} [4], {a1}{a:*}{b1} [4], {a:*}{a2}{b1} [4], {a:*b:*}{b1} [2], {a:*b2}{b1} [2], {a1}{a2} [4], {a1}{b1} [4], {a1}{b2} [3], {a2}{b1} [4], {b2}{a2} [2], {b2}{b1} [3], {a2b2} [2], {a1}{b2}{a:*} [2], {a1}{b:*}{a2} [2], {a:*}{b2}{a2} [2], {a1}{b2}{b:*} [3], {a1}{b:*}{b1} [3], {a:*}{a2b2} [2], {a1}{a2b:*} [2], {a1}{a:*b:*} [2], {a1}{a:*b2} [2], {a2b2}{b:*} [2], {a2b:*}{b1} [2], {a:*}{b2}{b1} [3], {a1}{a2}{b1} [4], {a1}{b2}{a2} [2], {a1}{b2}{b1} [3], {a1}{a2b2} [2], {a2b2}{b1} [2], {a1}{a2b2}{b:*} [2], {a1}{a2b:*}{b1} [2], {a1}{a:*b2}{b1} [2], {a1}{a:*b:*}{b1} [2], {a:*}{a2b2}{b1} [2], {a1}{a2b2}{b1} [2]]
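The template nodes in this graph look as if they are derived from each frequent pattern by turning some of the values in a single set into `attribute:*` wildcards. The sketch below is a guess at that derivation, inferred only from the nodes listed on the slide and not confirmed as the authors' procedure; the dict encoding and the names `ANY`, `acr_templates`, and `fmt` are hypothetical.

```python
from itertools import combinations

ANY = "*"  # assumed marker: attribute must be present, value unconstrained

def acr_templates(pattern):
    """Wildcard every non-empty subset of attributes within one set at a time.

    `pattern` is a list of dicts {attribute: value}. A set with s items
    contributes 2**s - 1 templates.
    """
    templates = []
    for index, item_set in enumerate(pattern):
        attrs = sorted(item_set)
        for size in range(1, len(attrs) + 1):
            for chosen in combinations(attrs, size):
                wild = {a: (ANY if a in chosen else v) for a, v in item_set.items()}
                templates.append(pattern[:index] + [wild] + pattern[index + 1:])
    return templates

def fmt(pattern):
    """Render a pattern in the slide notation, e.g. {a1}{a2,b:*}{b1}."""
    return "".join(
        "{" + ",".join(f"{a}:*" if v == ANY else f"{a}{v}"
                       for a, v in sorted(s.items())) + "}"
        for s in pattern
    )

frequent = [{"a": 1}, {"a": 2, "b": 2}, {"b": 1}]  # {a1}{a2,b2}{b1}
for template in acr_templates(frequent):
    print(fmt(template))
```

For `{a1}{a2,b2}{b1}` this yields five templates, which correspond to the five wildcard variants of that pattern that appear among the figure's nodes.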

  15. Example • Example rules: • Traditional sequential rules: • <{a1}{a2}{b1}> → <{a1}{a2,b2}{b1}> • [support = 2/4, confidence = 2/4] • Attribute constrained rules (ACR): • <{a1}{a2,b:*}{b1}> → <{a1}{a2,b2}{b1}> • [support = 2/4, confidence = 2/2] • Example sequence DB: {a1}{a2,b2}{b1}; {a1}{a2,b2}{a2,b1}; {a1}{b2,c2}{a2}{b1}; {a1}{a2,c1}{b1} • Support = # of occurrences / # of sequences • Confidence = # of full sequence / # of premise
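The support and confidence figures on this slide can be reproduced by counting matches in the four-sequence example DB. The sketch below is a minimal illustration, not the authors' implementation; it assumes that `b:*` means "attribute b is present with any value" and that a pattern matches when its sets occur in order as subsets of the sequence's sets. The names `ANY`, `set_matches`, `contains`, and `support_and_confidence` are hypothetical.

```python
ANY = "*"  # assumed marker: attribute must be present, value unconstrained

# Example sequence DB from the slide, one dict {attribute: value} per set.
DB = [
    [{"a": 1}, {"a": 2, "b": 2}, {"b": 1}],
    [{"a": 1}, {"a": 2, "b": 2}, {"a": 2, "b": 1}],
    [{"a": 1}, {"b": 2, "c": 2}, {"a": 2}, {"b": 1}],
    [{"a": 1}, {"a": 2, "c": 1}, {"b": 1}],
]

def set_matches(template_set, data_set):
    """True if every constrained attribute is present with a compatible value."""
    return all(
        attr in data_set and (value == ANY or data_set[attr] == value)
        for attr, value in template_set.items()
    )

def contains(sequence, pattern):
    """Greedy left-to-right check that the pattern occurs in order."""
    pos = 0
    for template_set in pattern:
        while pos < len(sequence) and not set_matches(template_set, sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support_and_confidence(db, premise, full):
    """Support = full matches / sequences; confidence = full / premise matches."""
    n_full = sum(contains(seq, full) for seq in db)
    n_premise = sum(contains(seq, premise) for seq in db)
    confidence = n_full / n_premise if n_premise else 0.0
    return n_full / len(db), confidence

full = [{"a": 1}, {"a": 2, "b": 2}, {"b": 1}]
traditional = [{"a": 1}, {"a": 2}, {"b": 1}]
acr = [{"a": 1}, {"a": 2, "b": ANY}, {"b": 1}]

print(support_and_confidence(DB, traditional, full))  # (0.5, 0.5)
print(support_and_confidence(DB, acr, full))          # (0.5, 1.0)
```

The traditional premise matches all four sequences, while the ACR premise matches only the two in which the `a2` set also carries a `b` value, which is why the confidence rises from 2/4 to 2/2 at the same support.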

  16. Evaluation • 2001 Atlanta household travel survey • 21k people, 8k households, 126k places visited, 48 hrs • 49,695 sets of information, avg. seq. length 7.4 sets • Focus on 6 attributes (act. type, mode, time, duration) • Results are averaged over all combinations • {a1,b2,c1}{a2,?}{a1,b1} • Metrics: • Precision = # true positives / (# true positives + # false positives) • Recall = # true positives / (# true positives + # false negatives) • F-measure = (2 * precision * recall) / (precision + recall)
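The three metrics can be computed directly from prediction counts. The sketch below is a plain restatement of the formulas on the slide; the function name `precision_recall_f` and the counts in the usage line are made up for illustration and are not results from the study.

```python
def precision_recall_f(true_pos, false_pos, false_neg):
    """Return (precision, recall, F-measure) from raw counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Hypothetical counts: 8 correct predictions, 2 wrong ones, 8 values missed.
precision, recall, f_measure = precision_recall_f(8, 2, 8)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.3f}")
```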

  17. Performance vs. percent of values missing • Williams, C. A.; Nelson, P. C. & Mohammadian, A., “Attribute Constrained Rules for Partially Labeled Sequence Completion”, LNCS: Advances in Data Mining - Applications and Theoretical Aspects 5633, 338 – 352, 2009.

  18. Traditional performance vs. # of target values missing • Williams, C. A.; Nelson, P. C. & Mohammadian, A., “Attribute Constrained Rules for Partially Labeled Sequence Completion”, LNCS: Advances in Data Mining - Applications and Theoretical Aspects 5633, 338 – 352, 2009.

  19. ACR performance vs. traditional performance w.r.t. # of targets missing • Williams, C. A.; Nelson, P. C. & Mohammadian, A., “Attribute Constrained Rules for Partially Labeled Sequence Completion”, LNCS: Advances in Data Mining - Applications and Theoretical Aspects 5633, 338 – 352, 2009.

  20. ACR performance vs. traditional performance: Recall • Williams, C. A.; Nelson, P. C. & Mohammadian, A., “Attribute Constrained Rules for Partially Labeled Sequence Completion”, LNCS: Advances in Data Mining - Applications and Theoretical Aspects 5633, 338 – 352, 2009.

  21. ACR performance vs. traditional performance: Precision • Williams, C. A.; Nelson, P. C. & Mohammadian, A., “Attribute Constrained Rules for Partially Labeled Sequence Completion”, LNCS: Advances in Data Mining - Applications and Theoretical Aspects 5633, 338 – 352, 2009.

  22. Summary • Introduced technique for handling missing data • New rule form for entire class of problems • Much better predictions when data is missing than traditional methods • Represents opportunity to reduce number of data points that need to be collected/communicated • Computationally practical

  23. Future work • Study underway to confirm benefits in survey applications • Adapt technique for streaming data sources • Sensor applications

  24. Other related research • Transfer learning of activity patterns • Further data requirement reduction • Reduce learning time • Sequential pattern mining of streams • Current model uses offline modeling • Enable real-time pattern learning for ITA • Demo of pattern and location learning • Tilted time windows

  25. Questions? • IGERT Program in Computational Transportation Science (Information Technology, Transportation) • IGERT: Integrative Graduate Education and Research Traineeship • This research was supported in part by the National Science Foundation IGERT program under Grant DGE-0549489

  26. Additional results • Sequential rules • C. Williams, A. Mohammadian, P. Nelson, and S. Doherty, “Mining Sequential Association Rules for Traveler Context Prediction”, The First International Workshop on Computational Transportation Science (IWCTS’08), Dublin, Ireland, July 2008.
