
An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns

Zhigang Zheng, Yanchang Zhao, Ziye Zuo, and Longbing Cao. PAKDD 2010.


Presentation Transcript


  1. An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns • Zhigang Zheng, Yanchang Zhao, Ziye Zuo, and Longbing Cao • PAKDD 2010

  2. Outline • Motivation • Problem Definition • GA-Based Negative Sequential Pattern Mining Algorithm • Experiments • Conclusion

  3. Motivation • Negative sequential patterns focus on negative relationships between itemsets. • Absent items are taken into consideration. • Drawback • The search space for mining negative patterns is much bigger than that for positive ones. • Huge amounts of negative candidates will be generated. • Ex. 10 distinct frequent positive 1-item patterns give 10^3 3-item positive candidates, but 20^3 3-item negative candidates.
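The counting argument in the example can be checked directly. The sketch below assumes every element holds a single item and that allowing negation simply doubles the alphabet (each item may appear as itself or negated), so 10 items give 10 × 10 × 10 = 1000 positive and 20 × 20 × 20 = 8000 negative 3-item candidates. The function name `candidate_count` is made up for illustration, and the count is a rough upper bound that ignores any validity constraints on negative patterns.

```python
def candidate_count(num_items: int, length: int, allow_negative: bool) -> int:
    """Count sequences of `length` single-item elements.

    When negatives are allowed, each item can also appear negated,
    which doubles the number of choices per position.
    """
    choices_per_position = 2 * num_items if allow_negative else num_items
    return choices_per_position ** length


print(candidate_count(10, 3, allow_negative=False))  # 1000
print(candidate_count(10, 3, allow_negative=True))   # 8000
```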

  4. (Cont.) • Based on a genetic algorithm (GA), each generation passes good genes on to the next generation by crossover and mutation, without generating candidates. • A dynamic fitness function and a pruning method are used to improve performance.

  5. Problem Definition • A sequence is an ordered list of elements. • An element ei consists of one or more items. • Ex. <a b (c,d) f> consists of 4 elements, and (c,d) is an element that includes two items. • A positive sequence: s = <a b c d> • A negative sequence: s = <a b ¬c d> or <a b ¬(c,d) f> • The sequence <a b f> is the max. positive subsequence of the sequences <a b ¬c f> and <a b ¬(c,d) f>
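A minimal sketch of these definitions, assuming one possible in-memory representation: each element is a pair of (items, is_negative), and a sequence is a list of such pairs. The helpers `pos`, `neg`, and `max_positive_subsequence` are illustrative names, not taken from the slides. Under that representation, the max. positive subsequence is what remains after dropping every negative element.

```python
from typing import List, Tuple

# Assumed representation: an element is (items, is_negative).
Element = Tuple[Tuple[str, ...], bool]


def pos(*items: str) -> Element:
    """Build a positive element, e.g. pos("c", "d") for (c,d)."""
    return (tuple(items), False)


def neg(*items: str) -> Element:
    """Build a negative element, e.g. neg("c") for not-c."""
    return (tuple(items), True)


def max_positive_subsequence(seq: List[Element]) -> List[Element]:
    """Keep the positive elements in their original order."""
    return [element for element in seq if not element[1]]


s1 = [pos("a"), pos("b"), neg("c"), pos("f")]       # <a b not-c f>
s2 = [pos("a"), pos("b"), neg("c", "d"), pos("f")]  # <a b not-(c,d) f>

# Both negative sequences share the max. positive subsequence <a b f>.
print(max_positive_subsequence(s1) == max_positive_subsequence(s2))
```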

  6. (Cont.) • Negative sequential pattern • s_sup ≥ min_sup • Items in the same element should be all positive or all negative. Ex. <a (a, ¬b) c> is not allowed. • Two or more consecutive negative elements are not accepted. • For each negative item in a negative pattern, its positive item is required to be frequent. • Negative Matching
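The three structural constraints can be sketched as a single validity check. This is an illustration under assumptions: here an element is a list of (item, is_negative) pairs so that a mixed element such as (a, ¬b) can be expressed, `frequent_items` is a hypothetical set of items already known to be frequent, and the support condition is left out because the transcript does not spell out how negative matching counts support.

```python
from typing import List, Set, Tuple

# Assumed representation: an element is a list of (item, is_negative) pairs.
SignedElement = List[Tuple[str, bool]]


def is_valid_negative_candidate(
    seq: List[SignedElement], frequent_items: Set[str]
) -> bool:
    """Check the structural constraints listed on the slide (not support)."""
    previous_was_negative = False
    for element in seq:
        signs = {negated for _, negated in element}
        if len(signs) != 1:
            return False  # empty element, or positive and negative items mixed
        is_negative = signs.pop()
        if is_negative and previous_was_negative:
            return False  # two negative elements in a row
        if is_negative and any(item not in frequent_items for item, _ in element):
            return False  # the positive form of a negated item must be frequent
        previous_was_negative = is_negative
    return True


frequent = {"a", "b", "c", "d"}
mixed = [[("a", False)], [("a", False), ("b", True)], [("c", False)]]
ok = [[("a", False)], [("b", False)], [("c", True)], [("d", False)]]
print(is_valid_negative_candidate(mixed, frequent))  # False
print(is_valid_negative_candidate(ok, frequent))     # True
```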

  7. GA-Based Negative Sequential Pattern Mining Algorithm • Population and Selection • Crossover and Mutation • Pruning • Algo. Flow

  8. Population and Selection • Initial Population: all 1-item frequent positive and negative patterns. • Selecting top K individuals with high dynamic fitness • In order to evaluate the individuals and decide which are the best for the next generation, a fitness function is used.
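Top-K selection can be sketched as follows. The transcript does not include the fitness formula, so the fitness function is passed in by the caller and the scores below are made-up placeholders, not values from the paper. `select_top_k` is an illustrative name.

```python
import heapq
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_top_k(
    population: Sequence[T], fitness: Callable[[T], float], k: int
) -> List[T]:
    """Return the k individuals with the highest fitness, best first."""
    return heapq.nlargest(k, population, key=fitness)


# Hypothetical fitness scores for four 1-item patterns.
toy_scores = {"<a>": 0.9, "<b>": 0.4, "<not-c>": 0.7, "<d>": 0.1}
survivors = select_top_k(list(toy_scores), lambda p: toy_scores[p], k=2)
print(survivors)  # ['<a>', '<not-c>']
```

With a dynamic fitness, the scores would be recomputed before each selection round; that update rule is not shown here because it is not given in the transcript.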

  9. Crossover and Mutation • Crossover • Parents with different lengths are allowed to cross over with each other. • Crossover may happen at different positions to get sequential patterns with varied lengths. • Ex.
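One way to get children of varied lengths from parents of different lengths is to pick an independent cut point in each parent and swap tails. The sketch below is an assumption about how such an operator could look, not the operator from the paper: the slide's own example is missing from the transcript, and the function name `crossover` and the uniform choice of cut points are illustrative. Elements are treated as opaque tokens.

```python
import random
from typing import List, Tuple


def crossover(
    parent1: List[str], parent2: List[str], rng: random.Random
) -> Tuple[List[str], List[str]]:
    """Single-point crossover with a separate cut point in each parent."""
    cut1 = rng.randint(0, len(parent1))  # inclusive bounds
    cut2 = rng.randint(0, len(parent2))
    child1 = parent1[:cut1] + parent2[cut2:]
    child2 = parent2[:cut2] + parent1[cut1:]
    return child1, child2


rng = random.Random(0)  # fixed seed so the run is repeatable
p1 = ["a", "b", "not-c", "d"]
p2 = ["e", "f"]
c1, c2 = crossover(p1, p2, rng)
print(c1, c2)
```

Children produced this way can be empty or can break the negative-pattern constraints (for example, two negative elements in a row), so they would still need a validity check before being added to the population.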

  10. (Cont.) • Mutation • Mutation is helpful in avoiding contraction of the population to a special frequent pattern. • Ex. <b ¬c a> → <b d ¬e>
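A mutation step can be sketched as replacing one randomly chosen element with an element drawn from a pool of 1-item patterns. This is a simplified assumption: the slide's example changes more than one position, and the transcript does not define the exact operator, so `mutate` and `pool` are illustrative names only.

```python
import random
from typing import List


def mutate(seq: List[str], pool: List[str], rng: random.Random) -> List[str]:
    """Return a copy of seq with one randomly chosen element replaced."""
    if not seq or not pool:
        return list(seq)
    position = rng.randrange(len(seq))
    mutated = list(seq)  # copy, so the parent is left unchanged
    mutated[position] = rng.choice(pool)
    return mutated


parent = ["b", "not-c", "a"]
pool = ["a", "b", "d", "not-e"]  # hypothetical 1-item patterns
child = mutate(parent, pool, random.Random(1))
print(child)
```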

  11. Pruning • Ex. c = <e1 e2 e3 … en>, and c' = <ei ej … ek> is the max. positive subsequence of c, with 0 < i ≤ j ≤ k ≤ n. • If c' is not frequent, c must be infrequent and should be pruned.
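The pruning rule can be sketched as: strip the negative elements from a candidate, count how many data sequences contain the remaining positive subsequence, and discard the candidate if that count is below the threshold. The sketch assumes elements are itemsets (frozensets), data sequences contain only positive elements, and `min_sup` is an absolute count; the names `contains`, `support`, and `should_prune` are illustrative.

```python
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Candidate = List[Tuple[Itemset, bool]]  # (items, is_negative)


def contains(data_seq: List[Itemset], pattern: List[Itemset]) -> bool:
    """True if pattern is a subsequence of data_seq (greedy, in order)."""
    matched = 0
    for element in data_seq:
        if matched < len(pattern) and pattern[matched] <= element:
            matched += 1
    return matched == len(pattern)


def support(pattern: List[Itemset], database: List[List[Itemset]]) -> int:
    """Number of data sequences that contain the positive pattern."""
    return sum(contains(seq, pattern) for seq in database)


def should_prune(
    candidate: Candidate, database: List[List[Itemset]], min_sup: int
) -> bool:
    """Prune when the max. positive subsequence is not frequent."""
    positive_part = [items for items, negated in candidate if not negated]
    return support(positive_part, database) < min_sup


def e(*items: str) -> Itemset:
    return frozenset(items)


database = [[e("a"), e("b"), e("c"), e("f")], [e("a"), e("f")], [e("b"), e("f")]]
cand1 = [(e("a"), False), (e("b"), False), (e("c"), True), (e("f"), False)]
cand2 = [(e("a"), False), (e("b"), True), (e("f"), False)]
print(should_prune(cand1, database, min_sup=2))  # True: <a b f> occurs once
print(should_prune(cand2, database, min_sup=2))  # False: <a f> occurs twice
```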

  12. Algo. Flow

  13. Experiments

  14. Conclusion • In the crossover process, how is the crossover position decided?
