A Belief-Driven Method for Discovering Unexpected Patterns


Introduction: Zhang Yi

Algorithms: Lee Wai Choy, Julian

Application and Conclusions: Wee Jee Jeng

Outline of presentation (Part 1)
  • Background information
  • Structure of rule or belief
  • Assumption of belief
  • Unexpectedness of rule
  • Terms for the algorithm

http://web.singnet.com.sg/~waichoy/cs6203/

Background
  • Most research in the KDD (Knowledge Discovery in Databases) field focuses on the validity of discovered patterns.
  • Drawback: many existing tools generate a large number of valid but obvious or irrelevant patterns.
  • To address this issue, some researchers have studied the discovery of novel and useful patterns.

Background (cont.)
  • This paper focuses on discovering patterns that are unexpected relative to a belief system.
  • A belief (a logical statement) is prior knowledge that encodes a set of expectations about the problem domain.
  • Beliefs can be obtained by eliciting them from a domain expert, learning them from data, refining existing beliefs with newly discovered patterns, etc. (not the concern of this paper).

Overview of unexpectedness

1. Defined in probabilistic terms (Silberschatz and Tuzhilin 1996a)

  • A rule is considered to be “interesting” if it affects the degrees of beliefs.

2. Based on a syntactic comparison between a rule and a belief (Liu and Hsu 1996)

  • A rule and a belief are “different” when the consequents of the rule and the belief are “similar” but the antecedents are “far apart”, or vice versa.

Overview of unexpectedness (cont.)

3. Logical contradiction

  • This paper uses a new definition of unexpectedness: a logical contradiction between a rule and a belief.
  • The method uses the beliefs to seed the search for patterns in the data that contradict them.

Structure of a rule or belief
  • Rules and beliefs are in the form: body -> head
  • Body : conjunction of (attribute op value)
  • Head : single (attribute op value)
  • Op ∈ {>=, <=, =}

Structure of a rule or belief (cont.)
  • Sample of a rule:
    • Education level >= Degree and Work experience >= 1 year -> Salary >= 3000
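
The body -> head structure above can be sketched as a small data type. This is only an illustration: the class names `Condition` and `Rule`, the field names, and the ordinal encoding of education levels are assumptions, not part of the presentation.

```python
import operator
from dataclasses import dataclass

# The three comparison operators allowed in a condition.
OPS = {">=": operator.ge, "<=": operator.le, "=": operator.eq}


@dataclass(frozen=True)
class Condition:
    """One (attribute op value) term."""
    attribute: str
    op: str
    value: object

    def holds(self, record):
        return OPS[self.op](record[self.attribute], self.value)


@dataclass(frozen=True)
class Rule:
    """body -> head, where body is a conjunction of conditions."""
    body: frozenset
    head: Condition

    def body_holds(self, record):
        return all(c.holds(record) for c in self.body)


# Assumed ordinal encoding: 0 = diploma, 1 = degree, 2 = masters.
DEGREE = 1
rule = Rule(
    body=frozenset({
        Condition("education", ">=", DEGREE),
        Condition("experience_years", ">=", 1),
    }),
    head=Condition("salary", ">=", 3000),
)

record = {"education": 2, "experience_years": 3, "salary": 3500}
print(rule.body_holds(record), rule.head.holds(record))
```

A belief has the same shape as a rule, so the same `Rule` type could represent both.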

Assumption :
  • If a belief Y -> B is expected to hold on a dataset D,
  • then the belief is also expected to hold on any “statistically large” subset of D.

Definition of unexpectedness used in this paper
  • The rule A -> B is unexpected with respect to the belief X -> Y on the dataset D if:
    • B and Y logically contradict each other (B and Y = False)
    • A and X holds on a statistically large subset of tuples in D
    • The rule A, X -> B holds
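
The three conditions can be checked directly on a small dataset. The sketch below is a simplified reading, not the paper's procedure: it assumes that “statistically large” means a minimum support threshold and that “holds” means a minimum confidence threshold. The function name `is_unexpected`, the predicate arguments and the toy data are all hypothetical.

```python
def is_unexpected(data, in_a, in_x, in_b, b_contradicts_y, min_support, min_conf):
    """Check the three conditions for a rule A -> B against a belief X -> Y.

    in_a, in_x, in_b are predicates on a record; b_contradicts_y states
    whether B and Y logically contradict (decided outside this function).
    """
    if not b_contradicts_y:                      # condition 1
        return False
    both = [r for r in data if in_a(r) and in_x(r)]
    if not both or len(both) / len(data) < min_support:
        return False                             # condition 2 (assumed: support)
    confidence = sum(1 for r in both if in_b(r)) / len(both)
    return confidence > min_conf                 # condition 3 (assumed: confidence)


# Toy data: belief "professionals shop on weekends", rule "in December, weekdays".
data = (
    [{"professional": True, "month": 12, "day": "weekday"}] * 3
    + [{"professional": True, "month": 12, "day": "weekend"}]
    + [{"professional": True, "month": 6, "day": "weekend"}] * 4
    + [{"professional": False, "month": 6, "day": "weekday"}] * 2
)
in_x = lambda r: r["professional"]       # body of the belief
in_a = lambda r: r["month"] == 12        # body of the rule
in_b = lambda r: r["day"] == "weekday"   # head of the rule, contradicts "weekend"

result = is_unexpected(data, in_a, in_x, in_b, True, min_support=0.2, min_conf=0.6)
print(result)
```

Here A and X together cover 4 of 10 records, and 3 of those 4 satisfy B, so the rule passes both assumed thresholds.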

Figure 1 (diagram): dataset D and the belief X -> Y, showing the subset containing A (body of the rule), the subset containing X (body of the belief), the subset containing A and X, and the subset containing A, X and B (¬Y).

Term “CONTR(Y)” (contradicting conditions)

1) If the head of the belief is of the form “a >= val”:

a) Any condition of the form “a <= vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp < val;

Term “CONTR(Y)” (cont.)

b) Any condition of the form “a = vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp < val.

An example: “Month >= 10” is contradicted by “Month <= x”, x ∈ {1, 2, ..., 9}, and by “Month = x”, x ∈ {1, 2, ..., 9}.

Term “CONTR(Y)” (cont.)

2) If the head of the belief is of the form “a <= val”:

  • “a >= vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp > val
  • “a = vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp > val

Term “CONTR(Y)” (cont.)

3) If the head of the belief is of the form “a = val”:

  • If a is an ordered attribute, “a >= vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp > val
  • “a <= vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp < val
  • “a = vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp <> val
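
The three cases of CONTR(Y) can be collected into one function. This is a minimal sketch: conditions are written as `(attribute, op, value)` tuples, and the function name `contr`, the `domain` argument (the values v1..vk) and the `ordered` flag are assumptions made for illustration. It also assumes that in case 3 both inequality forms apply only to ordered attributes.

```python
def contr(head, domain, ordered=True):
    """Return the set of conditions that contradict the belief head.

    head    -- (attribute, op, val) with op in {">=", "<=", "="}
    domain  -- the values v1..vk the attribute can take
    ordered -- whether the attribute's values are ordered (used in case 3)
    """
    attr, op, val = head
    result = set()
    for vp in domain:
        if op == ">=" and vp < val:          # case 1
            result.add((attr, "<=", vp))
            result.add((attr, "=", vp))
        elif op == "<=" and vp > val:        # case 2
            result.add((attr, ">=", vp))
            result.add((attr, "=", vp))
        elif op == "=" and vp != val:        # case 3
            result.add((attr, "=", vp))
            if ordered and vp > val:
                result.add((attr, ">=", vp))
            if ordered and vp < val:
                result.add((attr, "<=", vp))
    return result


# The Month example: "Month >= 10" over the months 1..12.
month_contr = contr(("month", ">=", 10), range(1, 13))
print(sorted(month_contr))
```

For the Month example this yields “month <= x” and “month = x” for x in 1..9, which matches the example given with case 1.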

Confidence of the rule
  • Confidence of the rule X, P -> C = support(X, P, C) / support(X, P)
  • Two sets of candidate itemsets are used to determine these supports: Ck and Ck’
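
A small worked example of this ratio follows. It is a sketch under assumptions: itemsets are frozensets of `(attribute, op, value)` tuples, support is taken as the fraction of records satisfying every condition, and the function names and the five-record dataset are made up for illustration.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, "=": operator.eq}


def satisfies(record, itemset):
    """True if the record meets every (attribute, op, value) condition."""
    return all(OPS[op](record[attr], value) for attr, op, value in itemset)


def support(data, itemset):
    """Fraction of records that satisfy the whole itemset."""
    return sum(satisfies(r, itemset) for r in data) / len(data)


def confidence(data, body, head):
    """support(body plus head) / support(body); 0.0 if the body never occurs."""
    body_support = support(data, body)
    if body_support == 0:
        return 0.0
    return support(data, body | {head}) / body_support


data = [
    {"sal": 6000, "noOfMag": 1, "payment": 1},
    {"sal": 7000, "noOfMag": 2, "payment": 1},
    {"sal": 8000, "noOfMag": 4, "payment": 1},
    {"sal": 5500, "noOfMag": 5, "payment": 2},
    {"sal": 3000, "noOfMag": 1, "payment": 1},
]
X = frozenset({("sal", ">=", 5000)})   # body of the belief
P = frozenset({("payment", "=", 1)})   # additional conditions
C = ("noOfMag", "<=", 2)               # contradicts "noOfMag >= 3"

conf = confidence(data, X | P, C)
print(support(data, X | P), support(data, X | P | {C}), conf)
```

Three records satisfy {X, P} and two of them also satisfy C, so the confidence of X, P -> C is 2/3.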

Confidence of the rule (cont.)

Each itemset in Ck has the form {X, P, C}:

(i) X, the body of the belief,

(ii) C, a condition that contradicts the head of the belief (C ∈ CONTR(Y)), and

(iii) P, k other conditions (i.e. P is a conjunction of k conditions)

Confidence of the rule (cont.)

Each itemset in Ck’ has the form {X, P}:

  • Each itemset in Ck’ (i.e. {X, P}) is generated from an itemset in Ck by dropping the contradicting condition, C
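
Dropping the contradicting condition is a set difference. The sketch below assumes itemsets are frozensets of `(attribute, op, value)` tuples; the function signature is hypothetical, and the values mirror the deck's subscriber example.

```python
def generate_bodies(candidates, contr_conditions):
    """Build Ck' from Ck by removing the contradicting condition from each itemset."""
    contr_conditions = frozenset(contr_conditions)
    return {itemset - contr_conditions for itemset in candidates}


contr_head = {("noOfMag", "<", 3)}   # contradicts the belief head "noOfMag >= 3"
c1 = {
    frozenset({("sal", ">=", 5000), ("noOfMag", "<", 3), ("payment", "=", 1)}),
    frozenset({("sal", ">=", 5000), ("noOfMag", "<", 3), ("payment", "=", 2)}),
}
c1_bodies = generate_bodies(c1, contr_head)
for body in c1_bodies:
    print(sorted(body))
```

Each itemset in Ck’ is exactly the rule body whose support is needed as the denominator of the confidence.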

ZoomUR
  • ZoominUR discovers rules that are “refinements” of the beliefs
    • e.g. rules under which the beliefs are contradicted
  • ZoomoutUR discovers more general rules that satisfy the conditions of unexpectedness
  • In general, ZoomUR discovers all non-trivial unexpected rules with respect to a belief X -> Y

ZoominUR Overview
  • Generates a set of unexpected rules from the set of initial beliefs.
  • Example beliefs:
    • Subscribers with a monthly income of more than $5000 tend to subscribe to more than 3 magazines.
    • Senior subscribers tend to subscribe to health-related magazines.

ZoominUR Algorithm

Inputs: a set of beliefs (Bel_Set), the dataset, the expected min_support and the expected min_conf

1. forall beliefs B ∈ Bel_Set
2. { C0 = { {x, body(B)} | x ∈ CONTR(head(B)) };
     C0’ = { {body(B)} };
     k = 0;
3.   while ( Ck != ∅ ) do
4.   { forall candidates c ∈ Ck ∪ Ck’, compute support(c);
5.     Lk = { x | x ∈ Ck ∪ Ck’, support(x) >= min_support };
6.     k++;
7.     Ck = generate_new_candidates(Lk-1, B);
8.     Ck’ = generate_bodies(Ck, B);
9.   }
10.  Let X = { x | x ∈ ∪ Li, x ⊇ a, a ∈ CONTR(head(B)) }
11.  Items_In_UnexpRuleB = ∅
12.  forall (x ∈ X)
13.  { forall (a ∈ x ∩ CONTR(head(B)))
14.    { rule_conf = support(x) / support(x - a)
15.      if (rule “x - a -> a” is not trivial) and (rule_conf > min_conf)
16.      { Items_In_UnexpRuleB = Items_In_UnexpRuleB ∪ {x};
17.        Output Rule “x - a -> a”;
18.      }
19.    }
20.  }
21. }
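
The loop can be sketched in Python for a single belief. This is a simplified illustration, not the authors' implementation. It assumes itemsets are frozensets of `(attribute, op, value)` tuples, candidates are extended only with conditions from a user-supplied `extra_conditions` list (one condition per unused attribute), and the triviality test of step 15 is omitted because the deck does not define it. All names and the ten-record dataset are hypothetical.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, "=": operator.eq}


def support(data, itemset):
    """Fraction of records satisfying every (attribute, op, value) condition."""
    hits = sum(all(OPS[op](r[a], v) for a, op, v in itemset) for r in data)
    return hits / len(data)


def zoomin_ur(data, body, contr_head, extra_conditions, min_support, min_conf):
    body, contr_head = frozenset(body), frozenset(contr_head)
    ck = {body | {c} for c in contr_head}   # step 2: C0
    ck_bodies = {body}                      # step 2: C0'
    large = {}                              # every large itemset -> its support
    while ck:                               # steps 3-9
        level = {}
        for c in ck | ck_bodies:            # steps 4-5
            s = support(data, c)
            if s >= min_support:
                level[c] = s
        large.update(level)
        ck = set()                          # step 7 (assumed extension scheme)
        for x in level:
            if not x & contr_head:
                continue                    # only extend itemsets of the form {X, P, C}
            used = {attr for attr, _, _ in x}
            for p in extra_conditions:
                if p[0] not in used:
                    ck.add(x | {p})
        ck_bodies = {x - contr_head for x in ck}   # step 8
    rules = []
    for x, s in large.items():              # steps 10-20, triviality test omitted
        for a in x & contr_head:
            body_support = large.get(x - {a}, 0.0)
            if body_support > 0 and s / body_support > min_conf:
                rules.append((x - {a}, a, s / body_support))
    return rules


mags = [1, 2, 1, 4, 5, 4, 6, 3, 1, 2]
sals = [6000, 7000, 8000, 9000, 6000, 6500, 7000, 7500, 3000, 2000]
pays = [1, 1, 1, 1, 2, 2, 2, 2, 1, 2]
data = [{"sal": s, "noOfMag": m, "payment": p} for s, m, p in zip(sals, mags, pays)]

rules = zoomin_ur(
    data,
    body={("sal", ">=", 5000)},                  # belief: sal >= 5000 -> noOfMag >= 3
    contr_head={("noOfMag", "<=", 2)},
    extra_conditions=[("payment", "=", 1), ("payment", "=", 2)],
    min_support=0.2,
    min_conf=0.6,
)
for lhs, head, conf in rules:
    print(sorted(lhs), "->", head, round(conf, 2))
```

On this toy data the only rule found is “sal >= 5000, payment = 1 -> noOfMag <= 2”, a refinement that contradicts the belief for one payment type.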

ZoominUR Algorithm (cont.)

  • Convert each belief into the form x -> y.
    • e.g. the belief “subscribers with salary >= $5000 tend to subscribe to more than 3 magazines” becomes sal >= 5000 -> noOfMag >= 3
  • Step 2: Build the initial candidates.
    • C0 = {{ sal >= 5000, noOfMag < 3 }}
    • C0’ = {{ sal >= 5000 }}
  • Step 4: Compute the support of each candidate using the dataset.
  • Step 5: Generate the large itemsets, Lk (the candidates that satisfy min_support).
    • e.g. if both candidates satisfy min_support, L0 = {{ sal >= 5000, noOfMag < 3 }, { sal >= 5000 }}
  • Steps 7 & 8: Generate the new candidates Ck from Lk-1 with respect to B.
    • With L0 as above, C1 = {{ sal >= 5000, noOfMag < 3, payment = 1 }, { sal >= 5000, noOfMag < 3, payment = 2 }}
    • C1’ = {{ sal >= 5000, payment = 1 }, { sal >= 5000, payment = 2 }}

ZoominUR Algorithm (cont.)

  • Step 1: The procedure is repeated for each belief in the belief set, Bel_Set.
  • Steps 3-9: Repeat until Ck becomes an empty set.
  • Steps 10-20: Generate the unexpected rules of the form x, p -> a.
  • Steps 12-20: Repeat for all x in X.
  • Step 14: Compute the confidence value of the rule.
  • Step 15: Ensure that the rule
    • (i) is not trivial, and
    • (ii) satisfies the min_conf provided by the user.

ZoomoutUR Algorithm

Input: the set of unexpected rules generated by ZoominUR (Items_In_UnexpRuleB for each belief B)

1. forall beliefs B
2. { new_candidates = ∅;
3.   forall (x ∈ Items_In_UnexpRuleB)
4.   { Let K = { (k, k’) | k ⊆ x, k ⊇ x - body(B),
                 k’ = k - a, a ∈ CONTR(head(B)) }
5.     new_candidates = new_candidates ∪ K;
6.   }
7.   find_support(new_candidates);
8.   foreach (k, k’) ∈ new_candidates
9.   { consider rule: k’ -> k - k’ with confidence = support(k) / support(k’);
10.    if (confidence > min_conf) Output Rule “k’ -> k - k’”
11.  }
12. }

  • Steps 3-6: For each unexpected rule, generate more general association rules.
  • Step 7: Find the support of the new candidates using the dataset.
  • Steps 8-11: Iteratively check whether the new rules satisfy the minimum confidence required.
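
The generalization step can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: itemsets are frozensets of `(attribute, op, value)` tuples, the function and variable names are hypothetical, and the sketch deliberately skips the case k = x, which would only repeat the rule already produced by ZoominUR.

```python
import operator
from itertools import combinations

OPS = {">=": operator.ge, "<=": operator.le, "=": operator.eq}


def support(data, itemset):
    """Fraction of records satisfying every (attribute, op, value) condition."""
    hits = sum(all(OPS[op](r[a], v) for a, op, v in itemset) for r in data)
    return hits / len(data)


def zoomout_ur(data, body, contr_head, unexpected_itemsets, min_conf):
    body, contr_head = frozenset(body), frozenset(contr_head)
    candidates = set()
    for x in unexpected_itemsets:                 # steps 3-6
        fixed = x - body                          # every k must contain x - body(B)
        droppable = list(x & body)
        for size in range(len(droppable)):        # proper subsets only (k != x)
            for kept in combinations(droppable, size):
                k = fixed | frozenset(kept)
                for a in k & contr_head:
                    candidates.add((k, k - {a}))  # pair (k, k')
    rules = []
    for k, k_body in candidates:                  # steps 7-11
        body_support = support(data, k_body)
        if body_support > 0:
            conf = support(data, k) / body_support
            if conf > min_conf:
                rules.append((k_body, k - k_body, conf))
    return rules


mags = [1, 2, 1, 4, 5, 4, 6, 3, 1, 2]
sals = [6000, 7000, 8000, 9000, 6000, 6500, 7000, 7500, 3000, 2000]
pays = [1, 1, 1, 1, 2, 2, 2, 2, 1, 2]
data = [{"sal": s, "noOfMag": m, "payment": p} for s, m, p in zip(sals, mags, pays)]

# One itemset assumed to have come out of ZoominUR for the belief
# sal >= 5000 -> noOfMag >= 3.
x = frozenset({("sal", ">=", 5000), ("noOfMag", "<=", 2), ("payment", "=", 1)})
rules = zoomout_ur(data, {("sal", ">=", 5000)}, {("noOfMag", "<=", 2)}, [x], min_conf=0.6)
for lhs, head, conf in rules:
    print(sorted(lhs), "->", sorted(head), round(conf, 2))
```

Dropping the belief body from the itemset gives the more general rule “payment = 1 -> noOfMag <= 2”, which on this toy data holds for all subscribers with that payment type, not only the high earners.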

Marketing Applications-ZoominUR
  • Beliefs
    • Shoppers in households with children tend to purchase regular beverages more than diet beverages
  • Unexpected pattern
    • When there is a large store advertisement, shoppers with children buy diet beverages (the opposite product)

Marketing Applications-ZoominUR (cont.)
  • Beliefs
    • Professionals tend to shop more on weekends than on weekdays
  • Unexpected pattern
    • In December, professionals tend to shop more on weekdays than on weekends (opposite)
    • Professionals in large households tend to shop more on weekdays than on weekends (opposite)
Marketing Applications-ZoomoutUR
  • Beliefs
    • Professionals tend to shop more on weekends than on weekdays
  • Unexpected pattern
    • In December, shoppers in general shop more on weekdays than on weekends
      • Not necessarily a “professional in December” effect, but one that applies to shoppers in general
      • This rule is not just a refinement of the belief, but a much different rule

Mining Web Logfile Data-ZoominUR
  • Belief
    • For all files, all weeks, the number of hits to a file each week is approximately equal to the file’s average weekly hits
  • Unexpected pattern
    • For a certain “Call for Papers” file, in the weeks from September 1 through October 29, the weekly access count is much higher than the average

Problem
  • How good is the initial set of beliefs?
    • It’s difficult to obtain belief information in practice, especially specific domain knowledge

Related work
  • Degree of belief
    • Bayesian approach, frequency approach, etc.
    • When changes to the user-defined beliefs occur, this means that there are interesting patterns in the data

Related work
  • Perform post-analysis to deal with the interestingness problem
    • Fuzzy matching
    • General impression
      • Matching discovered rules
      • Rank and find the unexpected rules

Conclusion
  • Comparison operators are needed since many of the interesting patterns are expressed in these terms.
  • It's difficult to discover relevant patterns from the raw data without the beliefs because beliefs provide valuable domain knowledge that results in the creation of several defined views and also drive the discovery process.

Conclusion (cont.)
  • User-defined beliefs can drastically reduce the number of irrelevant and obvious patterns found during the discovery process and focus on the discovery of unexpected patterns

References
  • Rakesh Agrawal, Tomasz Imielinski, and Arun Swami, "Database Mining: A Performance Perspective," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, December 1993, pp. 914-925
  • Christopher J. Matheus, Philip K. Chan, and Gregory Piatetsky-Shapiro, "Systems for Knowledge Discovery in Databases," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, December 1993, pp. 903-912
  • Vasant Dhar and Alexander Tuzhilin, "Abstract-Driven Pattern Discovery in Databases," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, December 1993, pp. 926-938
  • Liu, B., Hsu, W. and Chen, S., "Using General Impressions to Analyze Discovered Classification Rules," Proc. of the Third Intl. Conf. on Knowledge Discovery and Data Mining (KDD-97), pp. 25-36
  • Silberschatz, A. and Tuzhilin, A., "What Makes Patterns Interesting in Knowledge Discovery Systems," IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, December 1996, p. 970

References (cont.)
  • Bing Liu, Wynne Hsu, Lai-Fun Mun and Hing-Yan Lee, "Finding Interesting Patterns Using User Expectations," Technical report, TRA7/96, Department of Information Systems and Computer Science, National University of Singapore, 1996
  • Bing Liu and Wynne Hsu, "Post-Analysis of Learned Rules," Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), Aug 4-8, 1996, Portland, Oregon, USA, pp. 828-834

Question & Answer

?
