

A Belief-Driven Method for Discovering Unexpected Patterns

Introduction: Zhang Yi

Algorithms: Lee Wai Choy, Julian

Application and Conclusions: Wee Jee Jeng

Outline of presentation (Part 1)

• Background information

• Structure of rule or belief

• Assumption of belief

• Unexpectedness of rule

• Terms for the algorithm

http://web.singnet.com.sg/~waichoy/cs6203/

Background

• Most research in the KDD (Knowledge Discovery in Databases) field focuses on the validity of the discovered patterns.

• Drawback: many existing tools generate a large number of valid but obvious or irrelevant patterns.

• To address this issue, some researchers have studied the discovery of novel and useful patterns.

Background (cont.)

• This paper focuses on discovering patterns that are unexpected relative to a belief system.

• A belief (a logical statement) is prior knowledge that captures a set of expectations about the problem domain.

• Beliefs can be obtained by eliciting them from a domain expert, learning them from data, refining existing beliefs using newly discovered patterns, etc. (This is not the concern of this paper.)

Overview of unexpectedness

1. Defined in probabilistic terms (Silberschatz and Tuzhilin 1996a)

• A rule is considered to be “interesting” if it affects the degrees of belief.

2. Based on a syntactic comparison between a rule and a belief (Liu and Hsu 1996)

• A rule and a belief are “different” when the consequents of the rule and the belief are “similar” but the antecedents are “far apart”, or vice versa.

Overview of unexpectedness (cont.)

• This paper uses a new definition of unexpectedness, in terms of a logical contradiction between a rule and a belief.

• The method uses the beliefs to seed the search for patterns in the data that contradict them.

Structure of a rule or belief

• Rules and beliefs are in the form: body -> head

• Body : conjunction of (attribute op value)

• Head : single (attribute op value)

• Op ∈ {≥, ≤, =}

Structure of a rule or belief (cont.)

• Sample of a rule:

• Education level >= Degree and Work experience >= 1 year -> Salary >= 3000
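The body -> head structure can be sketched as a small data structure. This is only an illustration: the class names `Condition` and `Rule`, the field names, and the numeric encoding of "Degree" as level 3 are assumptions made for the example, not part of the presentation.

```python
from dataclasses import dataclass
import operator

# The three comparison operators allowed in a condition (attribute op value).
OPS = {">=": operator.ge, "<=": operator.le, "=": operator.eq}


@dataclass(frozen=True)
class Condition:
    attribute: str
    op: str        # one of ">=", "<=", "="
    value: float

    def holds(self, record: dict) -> bool:
        """True if the record satisfies this single condition."""
        return OPS[self.op](record[self.attribute], self.value)


@dataclass(frozen=True)
class Rule:
    body: tuple        # conjunction of Condition objects
    head: Condition    # a single Condition

    def body_holds(self, record: dict) -> bool:
        """True if every condition in the body holds for the record."""
        return all(c.holds(record) for c in self.body)


# Hypothetical encoding: education levels are integers, "Degree" is level 3.
DEGREE = 3
sample_rule = Rule(
    body=(Condition("education", ">=", DEGREE),
          Condition("experience_years", ">=", 1)),
    head=Condition("salary", ">=", 3000),
)

record = {"education": 3, "experience_years": 2, "salary": 3500}
print(sample_rule.body_holds(record), sample_rule.head.holds(record))
```

A belief can reuse the same `Rule` type, since rules and beliefs share one form.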


Assumption of belief

• If we expect a belief Y -> B to hold on a dataset D,

• then we also expect it to hold on any “statistically large” subset of D.

Unexpectedness of rule

• The rule A -> B is unexpected with respect to the belief X -> Y on the dataset D if:

• B and Y logically contradict each other (B AND Y = False)

• A AND X holds on a statistically large subset of the tuples in D

• The rule A, X -> B holds
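The three conditions can be checked on a toy dataset roughly as follows. This is a sketch under stated assumptions: "statistically large" is approximated by a `min_support` threshold, "the rule holds" by a `min_conf` threshold, and the logical contradiction between B and Y is supplied by the caller as a flag rather than derived. The function names and the sample rows are invented for illustration.

```python
def support(rows, predicates):
    """Fraction of rows that satisfy every predicate in `predicates`."""
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if all(p(row) for p in predicates))
    return hits / len(rows)


def is_unexpected(rows, A, B, X, b_contradicts_y, min_support, min_conf):
    """Check whether A -> B is unexpected with respect to a belief X -> Y."""
    # Condition 1: B AND Y = False (decided by the caller, not from the data).
    if not b_contradicts_y:
        return False
    # Condition 2: A AND X holds on a large enough subset of the rows.
    joint_body = support(rows, [A, X])
    if joint_body == 0 or joint_body < min_support:
        return False
    # Condition 3: the rule A, X -> B holds (its confidence exceeds min_conf).
    confidence = support(rows, [A, X, B]) / joint_body
    return confidence > min_conf


# Belief: sal >= 5000 -> mags >= 3. Candidate rule: month = 12 -> mags < 3.
rows = [
    {"sal": 6000, "mags": 1, "month": 12},
    {"sal": 7000, "mags": 2, "month": 12},
    {"sal": 5500, "mags": 4, "month": 12},
    {"sal": 6000, "mags": 5, "month": 3},
    {"sal": 3000, "mags": 1, "month": 12},
    {"sal": 8000, "mags": 4, "month": 6},
]
A = lambda r: r["month"] == 12
B = lambda r: r["mags"] < 3
X = lambda r: r["sal"] >= 5000

result = is_unexpected(rows, A, B, X, True, min_support=0.3, min_conf=0.6)
print(result)
```

Here A AND X covers three of the six rows and two of those also satisfy B, so the rule passes both thresholds.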

Figure 1: Dataset D with the belief X -> Y, showing the subset that contains A (body of the rule), the subset that contains X (body of the belief), and the subset that contains A, X and B (¬Y).

Term “CONTR(Y)”

1) If the head of the belief is of the form “a ≥ val”:

a) Any condition of the form “a ≤ vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp < val;

Term “CONTR(Y)” (cont.)

b) Any condition of the form “a = vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp < val;

An example:

Month ≥ 10 is contradicted by Month ≤ x, x ∈ {1, 2, ..., 9}, and by Month = x, x ∈ {1, 2, ..., 9}

Term “CONTR(Y)” (cont.)

2) If the head of the belief is of the form “a ≤ val”:

• “a ≥ vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp > val

• “a = vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp > val;

Term “CONTR(Y)” (cont.)

3) If the head of the belief is of the form “a = val”:

• If a is an ordered attribute, “a ≥ vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp > val

• “a ≤ vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp < val

• “a = vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp ≠ val
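The three cases of CONTR(Y) can be enumerated mechanically. The sketch below is one reading of the definition: the function name `contr`, the tuple representation of a condition, and the `domain` argument (taken here to be the values v1, ..., vk of the attribute) are assumptions, and the `ordered` flag is applied to both the ≥ and ≤ conditions of case 3.

```python
def contr(attribute, op, val, domain, ordered=True):
    """Return the conditions (attribute, op, vp) that contradict the head
    condition (attribute, op, val), with vp ranging over `domain`."""
    result = []
    for vp in domain:
        if op == ">=" and vp < val:
            # Case 1: head is "a >= val".
            result += [(attribute, "<=", vp), (attribute, "=", vp)]
        elif op == "<=" and vp > val:
            # Case 2: head is "a <= val".
            result += [(attribute, ">=", vp), (attribute, "=", vp)]
        elif op == "=":
            # Case 3: head is "a = val".
            if ordered and vp > val:
                result.append((attribute, ">=", vp))
            if ordered and vp < val:
                result.append((attribute, "<=", vp))
            if vp != val:
                result.append((attribute, "=", vp))
    return result


months = range(1, 13)
contr_month_ge_10 = contr("month", ">=", 10, months)
print(len(contr_month_ge_10))
```

For the example head Month ≥ 10, the result contains Month ≤ x and Month = x for every x from 1 to 9.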

Confidence of the rule

• Confidence of the rule X, P -> C = support(X, P, C) / support(X, P)

• Two sets of candidate itemsets are used for support determination:

• Ck and Ck’

Confidence of the rule (cont.)

The form of an itemset in Ck: {X, P, C}

(i) the body X of the belief,

(ii) a condition C that contradicts the head of the belief (C ∈ CONTR(Y)), and

(iii) k other conditions (i.e. P is a conjunction of k conditions)

Confidence of the rule (cont.)

The form of an itemset in Ck’: {X, P}

• Each itemset in Ck’ (i.e. {X, P}) is generated from an itemset in Ck by dropping the contradictory condition, C
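The relationship between Ck, Ck’ and the confidence can be illustrated with a few lines of Python. This is a sketch only: conditions are represented as plain strings that have already been evaluated per transaction, and the transactions are invented. The name `generate_bodies` is borrowed from the algorithm slides, but its signature here is an assumption.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def generate_bodies(candidates, contr_items):
    """Build Ck' from Ck by dropping the contradictory condition."""
    return {frozenset(c - contr_items) for c in candidates}


def confidence(transactions, itemset, contr_items):
    """support(X, P, C) / support(X, P) for an itemset {X, P, C}."""
    body = frozenset(itemset) - contr_items
    denom = support(transactions, body)
    return support(transactions, itemset) / denom if denom else 0.0


# Invented transactions; each is the set of conditions that hold for it.
T = [
    {"sal>=5000", "mags<3", "pay=1"},
    {"sal>=5000", "mags<3", "pay=1"},
    {"sal>=5000", "pay=1"},
    {"sal>=5000", "pay=2"},
    {"mags<3", "pay=2"},
]
contr_items = frozenset({"mags<3"})   # CONTR of the head "mags >= 3"
C1 = {
    frozenset({"sal>=5000", "mags<3", "pay=1"}),
    frozenset({"sal>=5000", "mags<3", "pay=2"}),
}

C1_bodies = generate_bodies(C1, contr_items)
conf = confidence(T, {"sal>=5000", "mags<3", "pay=1"}, contr_items)
print(sorted(map(sorted, C1_bodies)), round(conf, 3))
```

Counting support for Ck and Ck’ in the same pass makes the numerator and the denominator of the confidence available at the same time.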

• ZoominUR discovers rules that are “refinements” of the beliefs, such that the beliefs are contradicted

• ZoomoutUR discovers more general rules that satisfy the conditions of unexpectedness

• In general, ZoomUR discovers all non-trivial unexpected rules with respect to a belief X → Y

• Generate a set of unexpected rules from the set of initial beliefs.

• e.g.

• Subscribers with a monthly income of more than $5000 tend to subscribe to more than 3 magazines.

• Senior subscribers tend to subscribe to health-related magazines.

ZoominUR Algorithm

Inputs: a set of beliefs (Bel_Set), the dataset, the expected min_support and the expected min_conf

1. forall beliefs B ∈ Bel_Set
2. { C0 = { {x, body(B)} | x ∈ CONTR(head(B)) };
     C0’ = { {body(B)} };
     k = 0;
3.   while ( Ck != ∅ ) do
4.   { forall candidates c ∈ Ck ∪ Ck’, compute support(c);
5.     Lk = { x | x ∈ Ck ∪ Ck’, support(x) ≥ min_support };
6.     k++;
7.     Ck = generate_new_candidates(Lk-1, B);
8.     Ck’ = generate_bodies(Ck, B);
9.   }
10.  Let X = { x | x ∈ ∪ Li, x ⊇ a, a ∈ CONTR(head(B)) }
11.  Items_In_UnexpRuleB = ∅
12.  forall (x ∈ X)
13.  { forall (a ∈ x ∩ CONTR(head(B)))
14.    { rule_conf = support(x) / support(x - a)
15.      if (rule “x - a → a” is not trivial) and (rule_conf > min_conf)
16.      { Items_In_UnexpRuleB = Items_In_UnexpRuleB ∪ {x};
17.        Output Rule “x - a → a”;
18.      }
19.    }
20.  }
21. }
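A simplified Python sketch of the ZoominUR loop for a single belief is given below. It is not the authors' implementation: conditions are plain strings, candidate generation simply extends each large candidate by one further item instead of using an Apriori-style join, and the non-triviality test of step 15 is omitted. The function name `zoomin_ur`, its parameters and the sample transactions are assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def zoomin_ur(transactions, body, contr, min_support, min_conf):
    """Return (antecedent, consequent, confidence) rules refining one belief.

    body  : items forming the body of the belief
    contr : items that contradict the head of the belief, i.e. CONTR(head(B))
    """
    body, contr = frozenset(body), frozenset(contr)
    other_items = frozenset().union(*transactions) - body - contr
    # Step 2: C0 pairs the belief body with each contradicting condition.
    candidates = {body | {a} for a in contr}
    large = set()
    while candidates:                                     # steps 3-9
        bodies = {c - contr for c in candidates}          # Ck'
        frequent = {c for c in candidates | bodies        # steps 4-5
                    if support(transactions, c) >= min_support}
        large |= frequent
        # Step 7 (simplified): add one more condition to each large candidate.
        candidates = {c | {p}
                      for c in frequent & candidates
                      for p in other_items - c}
    rules = []
    for x in large:                                       # steps 10-20
        for a in x & contr:
            antecedent = x - {a}
            denom = support(transactions, antecedent)
            if denom == 0:
                continue
            rule_conf = support(transactions, x) / denom  # step 14
            if rule_conf > min_conf:                      # step 15, confidence only
                rules.append((antecedent, a, rule_conf))
    return rules


# Invented data for the belief sal >= 5000 -> mags >= 3.
T = [
    frozenset({"sal>=5000", "mags<3", "pay=1"}),
    frozenset({"sal>=5000", "mags<3", "pay=1"}),
    frozenset({"sal>=5000", "mags>=3", "pay=1"}),
    frozenset({"sal>=5000", "mags>=3", "pay=2"}),
    frozenset({"sal>=5000", "mags>=3", "pay=2"}),
    frozenset({"mags<3", "pay=2"}),
]
rules = zoomin_ur(T, {"sal>=5000"}, {"mags<3"}, min_support=0.3, min_conf=0.6)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "->", consequent, round(conf, 3))
```

On this data the only rule that clears both thresholds is the refinement sal>=5000, pay=1 -> mags<3; the direct contradiction sal>=5000 -> mags<3 has a confidence of only 0.4.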

ZoominUR Algorithm (cont.)

• Steps 1-2: Convert each belief into the form x → y

• e.g. subscribers with a salary >= $5000 tend to subscribe to more than 3 magazines:

sal >= 5000 → noOfMag >= 3

Then,

C0 = {{ sal >= 5000, noOfMag < 3 }}

C0’ = {{ sal >= 5000 }}

• Step 4: Compute the support using the dataset

• Step 5: Generate the large itemsets, Lk (those that satisfy min_support).

• L0 = {{ sal >= 5000, noOfMag < 3 },

{ sal >= 5000 }}

• Steps 7-8: Generate the next candidate sets by adding one further condition (here on payment). Then

C1 = {{ sal >= 5000, noOfMag < 3, payment = 1 },

{ sal >= 5000, noOfMag < 3, payment = 2 }}

C1’ = {{ sal >= 5000, payment = 1 },

{ sal >= 5000, payment = 2 }}

ZoominUR Algorithm (cont.)

• The whole procedure is repeated for each belief in the belief set, Bel_Set.

• Steps 3-9: Repeat until Ck becomes a null set.

• Steps 10-20: Generate the unexpected rules, of the form x, p → a, from the itemsets in X.

• Steps 12-20: Repeat for all x in X.

• Step 14: Compute the confidence value of the rule.

• Step 15: Ensure that the rule

i) is NOT trivial, and

ii) satisfies the min_conf provided by the user

ZoomoutUR Algorithm

Input: the set of unexpected rules generated by ZoominUR.

1. forall beliefs B
2. { new_candidates = ∅;
3.   forall (x ∈ Items_In_UnexpRuleB)
4.   { Let K = { (k, k’) | k ⊆ x, k ⊇ x - body(B),
                k’ = k - a, a ∈ CONTR(head(B)) }
5.     new_candidates = new_candidates ∪ K;
6.   }
7.   find_support(new_candidates);
8.   foreach (k, k’) ∈ new_candidates
9.   { consider rule: k’ → k - k’ with confidence = support(k) / support(k’);
10.    if (confidence > min_conf) Output Rule “k’ → k - k’”
11.  }
12. }

• Steps 3-6: For each unexpected rule, generate more general association rules.

• Step 7: Find the support of the new candidates using the dataset.

• Steps 8-11: Iteratively check whether the new rules satisfy the minimum confidence required.
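The generalization step can be sketched in Python as follows. Assumptions: conditions are plain strings, the function name `zoomout_ur` and its parameters are invented, and only proper subsets of the belief body are kept, so that k = x (the rule ZoominUR already reported) is skipped; the pseudocode itself allows k = x.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def zoomout_ur(transactions, unexpected_itemsets, body, contr, min_conf):
    """Generalize ZoominUR itemsets by dropping conditions of the belief body."""
    body, contr = frozenset(body), frozenset(contr)
    new_candidates = set()
    for x in unexpected_itemsets:                       # steps 3-6
        x = frozenset(x)
        fixed = x - body                # k must contain x - body(B)
        kept_body = sorted(x & body)
        for r in range(len(kept_body)):                 # proper subsets of the body
            for subset in combinations(kept_body, r):
                k = fixed | frozenset(subset)
                for a in k & contr:
                    new_candidates.add((k, k - {a}))    # the pair (k, k')
    rules = []
    for k, k_prime in new_candidates:                   # steps 7-11
        denom = support(transactions, k_prime)
        if denom == 0:
            continue
        conf = support(transactions, k) / denom
        if conf > min_conf:
            rules.append((k_prime, k - k_prime, conf))
    return rules


# Invented data for the belief professional -> weekend.
T = [
    frozenset({"professional", "december", "weekday"}),
    frozenset({"professional", "december", "weekday"}),
    frozenset({"december", "weekday"}),
    frozenset({"december", "weekday"}),
    frozenset({"december", "weekend"}),
    frozenset({"professional", "weekend"}),
    frozenset({"professional", "weekend"}),
    frozenset({"weekend"}),
]
zoomin_itemsets = [{"professional", "december", "weekday"}]
general_rules = zoomout_ur(T, zoomin_itemsets, {"professional"}, {"weekday"}, 0.6)
for antecedent, consequent, conf in general_rules:
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 3))
```

Dropping the belief body from the ZoominUR itemset yields the more general rule december -> weekday, which holds for four of the five December transactions.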

Marketing Applications - ZoominUR

• Beliefs

• Shoppers in households with children tend to purchase regular beverages more than diet

• Unexpected pattern

Marketing Applications - ZoominUR (cont.)

• Beliefs

• Professionals tend to shop more on weekends than on weekdays

• Unexpected pattern

• In December, professionals tend to shop more on weekdays than on weekends (opposite)

• Professionals in large households tend to shop more on weekdays than on weekends (opposite)

Marketing Applications - ZoomoutUR

• Beliefs

• Professionals tend to shop more on weekends than on weekdays

• Unexpected pattern

• In December, shoppers in general shop more on weekdays than on weekends

• This is not necessarily a “professionals in December” effect, but one that applies to shoppers in general

• This rule is not just a refinement of the belief, but a substantially different rule

Mining Web Logfile Data - ZoominUR

• Belief

• For all files and all weeks, the number of hits to a file each week is approximately equal to the file’s average weekly hits

• Unexpected pattern

• For a certain “Call for Papers” file, in the weeks from September 1 through October 29, the weekly access count is much higher than the average
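The web-log belief can be made concrete with a simple check of weekly hit counts against the file's own average. This is only an illustration of the belief and of what contradicts it, not the ZoominUR procedure itself: the function name, the `factor` threshold standing in for "much higher", and the hit counts are all invented.

```python
from statistics import mean


def unexpected_weeks(weekly_hits, factor=2.0):
    """Return the indices of weeks whose hit count exceeds `factor` times
    the file's average weekly hits."""
    average = mean(weekly_hits)
    return [week for week, hits in enumerate(weekly_hits)
            if hits > factor * average]


# Invented weekly hit counts for one file; weeks 4 and 5 spike.
hits = [10, 12, 9, 11, 80, 95, 10, 9]
flagged = unexpected_weeks(hits)
print(flagged)
```

With an average of 29.5 hits per week, only the two spike weeks exceed twice the average.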

Problem

• How good is the initial set of beliefs?

• It is difficult to obtain beliefs in practice, especially ones that encode specific domain knowledge

Related work

• Degree of belief

• Bayesian approach, frequency approach, etc.

• A change in the user-defined degrees of belief indicates that there are interesting patterns in the data

Related work (cont.)

• Perform post-analysis to deal with the interestingness problem

• Fuzzy matching

• General impression

• Matching discovered rules

• Rank and find the unexpected rules

Conclusion

• Comparison operators are needed, since many interesting patterns are expressed in these terms.

• It is difficult to discover relevant patterns from the raw data without beliefs: beliefs provide valuable domain knowledge, which leads to the creation of several defined views and also drives the discovery process.

Conclusion (cont.)

• User-defined beliefs can drastically reduce the number of irrelevant and obvious patterns found during the discovery process, and focus it on the discovery of unexpected patterns

References

• Rakesh Agrawal, Tomasz Imielinski and Arun Swami, "Database Mining: A Performance Perspective," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, December 1993, pp. 914-925

• Christopher J. Matheus, Philip K. Chan and Gregory Piatetsky-Shapiro, "Systems for Knowledge Discovery in Databases," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, December 1993, pp. 903-912

• Vasant Dhar and Alexander Tuzhilin, "Abstract-Driven Pattern Discovery in Databases," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, December 1993, pp. 926-938

• B. Liu, W. Hsu and S. Chen, "Using General Impressions to Analyze Discovered Classification Rules," Proc. of the Third Intl. Conf. on Knowledge Discovery and Data Mining (KDD-97), pp. 25-36

• A. Silberschatz and A. Tuzhilin, "What Makes Patterns Interesting in Knowledge Discovery Systems," IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, December 1996, pp. 970-974

References (cont.)

• Bing Liu, Wynne Hsu, Lai-Fun Mun and Hing-Yan Lee, "Finding Interesting Patterns Using User Expectations," Technical Report TRA7/96, Department of Information Systems and Computer Science, National University of Singapore, 1996

• Bing Liu and Wynne Hsu, "Post-Analysis of Learned Rules," Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), Aug 4-8, 1996, Portland, Oregon, USA, pp. 828-834