
DATA MINING - ASSOCIATION RULES

STEFAAN YETIMYAN

CS 157A

Outline

1. Data Mining (DM) ~ KDD [Definition]

2. DM Technique

-> Association rules [support & confidence]

3. Example

(4. Apriori Algorithm)

1. Data Mining ~ KDD [Definition]

- "Data mining (DM), also called Knowledge-Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns using specific DM technique."

- [more formal definition] KDD ~ "the non-trivial extraction of implicit, previously unknown and potentially useful knowledge from data"

1. Data Mining ~ KDD [Definition]

Data Mining techniques

  • Information Visualization
  • k-nearest neighbor
  • decision trees
  • neural networks
  • association rules
2. Association rules

Support

Every association rule has a support and a confidence.

“The support is the percentage of transactions that demonstrate the rule.”

Example: Database with transactions ( customer_# : item_a1, item_a2, … )

1: 1, 3, 5.

2: 1, 8, 14, 17, 12.

3: 4, 6, 8, 12, 9, 104.

4: 2, 1, 8.

support {8,12} = 2 (or 50% ~ 2 of 4 customers)

support {1, 5} = 1 (or 25% ~ 1 of 4 customers)

support {1} = 3 (or 75% ~ 3 of 4 customers)
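
A minimal Python sketch of this support count, using the four transactions above (the names transactions and support are illustrative, not part of the slides):

# Transactions from the example above (customer_# : items)
transactions = {
    1: {1, 3, 5},
    2: {1, 8, 14, 17, 12},
    3: {4, 6, 8, 12, 9, 104},
    4: {2, 1, 8},
}

def support(itemset, transactions):
    # Number of transactions that contain every item of the itemset
    return sum(1 for items in transactions.values() if set(itemset) <= items)

print(support({8, 12}, transactions))  # 2 (50% of 4 customers)
print(support({1, 5}, transactions))   # 1 (25%)
print(support({1}, transactions))      # 3 (75%)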

2. Association rules

Support

An itemset is called frequent if its support is equal to or greater than an agreed-upon minimal value, the support threshold.

adding to the previous example:

if the threshold is 50%,

then the itemsets {8,12} and {1} are called frequent
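
Continuing the same sketch (reusing the illustrative support helper and transactions from above), the frequent itemsets are simply those that meet the threshold:

threshold = 0.5  # the agreed-upon 50% support threshold

candidates = [{8, 12}, {1, 5}, {1}]
frequent = [s for s in candidates
            if support(s, transactions) / len(transactions) >= threshold]
print(frequent)  # [{8, 12}, {1}]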

2. Association rules

Confidence

Every association rule has a support and a confidence.

An association rule is of the form: X => Y

  • X => Y: if someone buys X, he also buys Y

The confidence is the conditional probability that, given X present in a transaction, Y will also be present.

Confidence measure, by definition:

Confidence(X => Y) = support(X, Y) / support(X)
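
In the same illustrative Python sketch, the confidence measure follows directly from two support counts (support as defined earlier):

def confidence(X, Y, transactions):
    # conf(X => Y) = support(X union Y) / support(X)
    return support(set(X) | set(Y), transactions) / support(set(X), transactions)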

2. Association rules

Confidence

We should only consider rules derived from itemsets with high support, and that also have high confidence.

“A rule with low confidence is not meaningful.”

Rules don’t explain anything; they just point out hard facts in the data.

3. Example

Example: Database with transactions ( customer_# : item_a1, item_a2, … )

1: 3, 5, 8.

2: 2, 6, 8.

3: 1, 4, 7, 10.

4: 3, 8, 10.

5: 2, 5, 8.

6: 1, 5, 6.

7: 4, 5, 6, 8.

8: 2, 3, 4.

9: 1, 5, 7, 8.

10: 3, 8, 9, 10.

Conf ( {5} => {8} ) ?

supp({5}) = 5, supp({8}) = 7, supp({5,8}) = 4,

then conf( {5} => {8} ) = 4/5 = 0.8, or 80%
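
The same sketch reproduces this calculation for the ten transactions above (reusing the illustrative support and confidence helpers):

transactions = {
    1: {3, 5, 8},    2: {2, 6, 8},     3: {1, 4, 7, 10},  4: {3, 8, 10},
    5: {2, 5, 8},    6: {1, 5, 6},     7: {4, 5, 6, 8},   8: {2, 3, 4},
    9: {1, 5, 7, 8}, 10: {3, 8, 9, 10},
}
print(support({5}, transactions))          # 5
print(support({8}, transactions))          # 7
print(support({5, 8}, transactions))       # 4
print(confidence({5}, {8}, transactions))  # 0.8, i.e. 80%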

3. Example

Example: Database with transactions ( customer_# : item_a1, item_a2, … )

1: 3, 5, 8.

2: 2, 6, 8.

3: 1, 4, 7, 10.

4: 3, 8, 10.

5: 2, 5, 8.

6: 1, 5, 6.

7: 4, 5, 6, 8.

8: 2, 3, 4.

9: 1, 5, 7, 8.

10: 3, 8, 9, 10.

Conf ( {5} => {8} ) ? 80% Done. Conf ( {8} => {5} ) ?

supp({5}) = 5, supp({8}) = 7, supp({5,8}) = 4,

then conf( {8} => {5} ) = 4/7 ≈ 0.57, or 57%

3. Example

Example: Database with transactions ( customer_# : item_a1, item_a2, … )

Conf ( {5} => {8} ) ? 80% Done.

Conf ( {8} => {5} ) ? 57% Done.

Rule ( {5} => {8} ) is more meaningful than

Rule ( {8} => {5} )

3. Example

Example: Database with transactions ( customer_# : item_a1, item_a2, … )

1: 3, 5, 8.

2: 2, 6, 8.

3: 1, 4, 7, 10.

4: 3, 8, 10.

5: 2, 5, 8.

6: 1, 5, 6.

7: 4, 5, 6, 8.

8: 2, 3, 4.

9: 1, 5, 7, 8.

10: 3, 8, 9, 10.

Conf ( {9} => {3} ) ?

supp({9}) = 1, supp({3}) = 4, supp({3,9}) = 1,

then conf( {9} => {3} ) = 1/1 = 1.0, or 100%. OK?
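
Running the same sketch on this rule shows the issue directly: the confidence is perfect, but the rule is backed by a single transaction out of ten:

print(confidence({9}, {3}, transactions))                 # 1.0, i.e. 100%
print(support({3, 9}, transactions))                      # 1
print(support({3, 9}, transactions) / len(transactions))  # 0.1, far below a 50% threshold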

3. Example

Example: Database with transactions ( customer_# : item_a1, item_a2, … )

Conf( {9} => {3} ) = 100%. Done.

Notice: High Confidence, Low Support.

-> Rule ( {9} => {3} ) is not meaningful

4. APRIORI ALGORITHM

APRIORI is an efficient algorithm for finding association rules (or, more precisely, frequent itemsets). The Apriori technique is used for “generating large itemsets”: out of all frequent (k)-itemsets, generate all candidate (k+1)-itemsets.

(Also: out of one k-itemset, we can produce (2^k - 2) rules)
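
A small sketch of that count (illustrative only): every non-empty proper subset of a k-itemset can serve as the left-hand side of a rule, which gives 2^k - 2 rules.

from itertools import combinations

def rules_from_itemset(itemset):
    # All rules X => Y with X, Y non-empty and X union Y = itemset
    items = set(itemset)
    rules = []
    for r in range(1, len(items)):  # proper, non-empty subsets as antecedents
        for X in combinations(sorted(items), r):
            rules.append((set(X), items - set(X)))
    return rules

print(len(rules_from_itemset({5, 8, 12})))  # 2**3 - 2 = 6 rules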

4. APRIORI ALGORITHM

Example:

with k = 3 (& k-itemsets lexicographically ordered)

{3,4,5}, {3,4,7}, {3,5,6}, {3,5,7}, {3,5,8}, {4,5,6}, {4,5,7}

generate all possible (k+1)-itemsets: for each two sets

{a_1, a_2, ..., a_(k-1), X} and {a_1, a_2, ..., a_(k-1), Y} that agree on their first k-1 items, produce the candidate {a_1, a_2, ..., a_(k-1), X, Y}.

{3,4,5,7}, {3,5,6,7}, {3,5,6,8}, {3,5,7,8}, {4,5,6,7}
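
A sketch of this join step, assuming the frequent k-itemsets are kept as sorted tuples (names are illustrative); it reproduces the five candidates listed above:

def generate_candidates(frequent_k):
    # Join k-itemsets that share their first k-1 items into (k+1)-candidates
    frequent_k = sorted(frequent_k)
    candidates = []
    for i in range(len(frequent_k)):
        for j in range(i + 1, len(frequent_k)):
            a, b = frequent_k[i], frequent_k[j]
            if a[:-1] == b[:-1]:  # identical first k-1 items
                candidates.append(a[:-1] + (a[-1], b[-1]))
    return candidates

frequent_3 = [(3, 4, 5), (3, 4, 7), (3, 5, 6), (3, 5, 7), (3, 5, 8), (4, 5, 6), (4, 5, 7)]
print(generate_candidates(frequent_3))
# [(3, 4, 5, 7), (3, 5, 6, 7), (3, 5, 6, 8), (3, 5, 7, 8), (4, 5, 6, 7)]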

4. APRIORI ALGORITHM

Example (CONTINUED):

{3,4,5,7}, {3,5,6,7}, {3,5,6,8}, {3,5,7,8}, {4,5,6,7}

Delete (prune) all itemset candidates that have a non-frequent subset. For example, {3,5,6,8} can itself never be frequent, since its subset {5,6,8} is not frequent.

Actually, here, only one candidate remains: {3,4,5,7}

Last: after pruning, determine the support of the remaining itemsets and check whether they meet the threshold.
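
Continuing the sketch (reusing generate_candidates and frequent_3 from above), the prune step drops every candidate that has a non-frequent k-subset, leaving only {3,4,5,7}, whose support is then counted against the threshold:

from itertools import combinations

def prune(candidates, frequent_k):
    # Keep only candidates whose every k-subset is a frequent k-itemset
    frequent_set = set(frequent_k)
    return [c for c in candidates
            if all(sub in frequent_set for sub in combinations(c, len(c) - 1))]

print(prune(generate_candidates(frequent_3), frequent_3))
# [(3, 4, 5, 7)] is the only surviving candidate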

REFERENCES
  • Textbook: Database System Concepts (Silberschatz et al.)
  • http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
  • http://aaaprod.gsfc.nasa.gov/teas/joel.html
  • http://www.liacs.nl/