
Association Rules II



Association Rules II. The slides are based on P.-N. Tan, M. Steinbach, V. Kumar, «Introduction to Data Mining», Addison Wesley, 2006. Brief Recap. Introduction. Market-basket transactions (the shopping basket!).


Presentation Transcript


Slides based on P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison Wesley, 2006.

Brief Recap


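Introduction

Market-basket transactions (the shopping basket!)

Input: a set of transactions, where each transaction is a set of items.

Examples of association rules:
{Diaper} → {Beer}, {Milk, Bread} → {Eggs, Coke}, {Beer, Bread} → {Milk}

A rule expresses co-occurrence, not causality.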


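Definitions

  • Itemset: a collection of one or more items

  • k-itemset: an itemset that contains k items

  • Support count (σ): the number of transactions that contain the itemset

  • Support (s): the fraction of transactions that contain the itemset

  • Frequent itemset: an itemset whose support is at least a threshold minsup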


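Definitions

  • Association rule

  • an expression of the form X → Y,

  • where X and Y are itemsets

  • X ⊆ I, Y ⊆ I, X ∩ Y = ∅

    • Example: {Milk, Diaper} → {Beer}

  • Support (s)

    the fraction of transactions that contain both X and Y (X ∪ Y)

  • Confidence (c)

    how often the items of Y appear in transactions that contain X, σ(X ∪ Y)/σ(X)

  • Input: a set of transactions T

  • Output: all rules with

    • support ≥ minsup

    • confidence ≥ minconf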
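As a minimal sketch of the two definitions, the snippet below computes the support and confidence of {Milk, Diaper} → {Beer}. The five transactions and the helper names (`support_count`, `support`, `confidence`) are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
# Hypothetical market-basket data, used only for illustration.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions containing every item."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """s(itemset): fraction of transactions containing the itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """c(lhs -> rhs) = sigma(lhs U rhs) / sigma(lhs)."""
    return support_count(lhs | rhs, transactions) / support_count(lhs, transactions)

lhs, rhs = {"Milk", "Diaper"}, {"Beer"}
s = support(lhs | rhs, transactions)
c = confidence(lhs, rhs, transactions)
print(f"s = {s:.2f}, c = {c:.2f}")
```

A rule is reported only when both values reach their thresholds, which is why support is checked on X ∪ Y and confidence on the pair (X, Y).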


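Mining association rules

  • Two-step approach:

    • Frequent Itemset Generation

      generate all itemsets whose support is at least minsup

    • Rule Generation

      from each frequent itemset, generate the high-confidence rules; each rule is a binary partition of the frequent itemset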


Itemset Lattice

[Figure: itemset lattice for 5 items]

Given d items, there are 2^d possible itemsets.


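The Apriori principle

Apriori principle: if an itemset is frequent, then all of its subsets are frequent as well. Conversely, if an itemset is infrequent, all of its supersets are infrequent and can be pruned (support-based pruning).

[Figure: itemset lattice over the items A to E, from null down to ABCDE, with the supersets of an infrequent itemset pruned]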


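The Apriori algorithm

k=1 #k: length of the itemsets

Generate the frequent 1-itemsets

Repeat until no new frequent itemsets are found

  • Generate candidate (k+1)-itemsets from the frequent k-itemsets

  • Prune the candidates that contain an infrequent subset of length k

  • Count the support of each candidate and keep the frequent (k+1)-itemsets

  • k = k + 1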
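The loop above can be sketched as follows. This is a simplified illustration, not the slides' own code: candidates are joined naively from all pairs of frequent k-itemsets, and the function name `apriori` and its parameters are assumptions. The data are the five transactions (ABC, ABCD, BCE, ACDE, DE) used in the maximal vs closed example of these slides.

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Return {frozenset: support count} for every frequent itemset."""
    def count(candidates):
        # Support counting: one pass over the transactions per level.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    singletons = {frozenset([i]) for t in transactions for i in t}
    frequent = {c: n for c, n in count(singletons).items() if n >= minsup_count}
    result = dict(frequent)
    k = 1
    while frequent:
        # Candidate generation: join frequent k-itemsets into (k+1)-itemsets.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k + 1}
        # Candidate pruning: every k-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        frequent = {c: n for c, n in count(candidates).items() if n >= minsup_count}
        result.update(frequent)
        k += 1
    return result

transactions = [set("ABC"), set("ABCD"), set("BCE"), set("ACDE"), set("DE")]
freq = apriori(transactions, minsup_count=2)
for itemset, n in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), n)
```

The pruning step is where the Apriori principle pays off: a candidate such as ABCD is discarded without counting because its subset ABD is not frequent.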


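Apriori: candidate generation

  • Generating the candidate k-itemsets

  • Fk-1 x F1

    extend each frequent (k-1)-itemset with a frequent item

  • Fk-1 x Fk-1

    merge two frequent (k-1)-itemsets whose first k-2 items are identical

  • In both cases, candidates that contain an infrequent subset are removed (pruning)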


I

apriori:

k+1-,

k+1

  • , k+1-

  • , k+1-

II


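Rule generation

  • Given a frequent itemset L, find every non-empty subset f ⊂ L such that the rule f → L − f satisfies the minimum confidence

  • Confidence is not anti-monotone in general, but it is for rules generated from the same itemset

  • L = {A,B,C,D}: c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)

    • Confidence is anti-monotone with respect to the number of items on the RHS of the rule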
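A minimal sketch of rule generation from one frequent itemset is shown below. It enumerates every binary partition by brute force and does not use the anti-monotone pruning, so it only illustrates the definition. The helper names `sigma` and `rules_from` are assumptions, and the data are the five transactions (ABC, ABCD, BCE, ACDE, DE) from the maximal vs closed example of these slides.

```python
from itertools import combinations

transactions = [set("ABC"), set("ABCD"), set("BCE"), set("ACDE"), set("DE")]

def sigma(itemset):
    """Support count of an itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def rules_from(itemset, minconf):
    """All rules f -> L - f from the itemset L with confidence >= minconf."""
    L = frozenset(itemset)
    out = []
    for r in range(1, len(L)):
        for lhs in combinations(sorted(L), r):
            lhs = frozenset(lhs)
            conf = sigma(L) / sigma(lhs)
            if conf >= minconf:
                out.append(("".join(sorted(lhs)), "".join(sorted(L - lhs)), conf))
    return out

rules = rules_from("ABC", minconf=0.7)
print(rules)

# Moving items from the LHS to the RHS can only lower the confidence.
c_ab_c = sigma("ABC") / sigma("AB")
c_a_bc = sigma("ABC") / sigma("A")
```

Because all rules from the same itemset share the numerator σ(L), a smaller left-hand side can only have a larger support count in the denominator, which is the reason for the ordering c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD).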


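Rule generation in Apriori

[Figure: lattice of the rules derived from the frequent itemset {A, B, C, D}, from ABCD => { } through BCD => A, ACD => B, ABD => C, ABC => D and CD => AB, BD => AC, BC => AD, AD => BC, AC => BD, AB => CD down to D => ABC, C => ABD, B => ACD, A => BCD. Rules with k+1 items in the consequent are built from rules with k items, starting with k=1, and a low-confidence rule is pruned together with the rules below it.]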


I

  • ,

  • ;

  • (, ):

  • Maximal

II


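Maximal frequent itemsets

Definition: a frequent itemset is maximal if none of its immediate supersets is frequent.

[Figure: itemset lattice over the items A to E, from null down to ABCDE, showing the border between frequent and infrequent itemsets; the maximal frequent itemsets lie directly above the border]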


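Closed itemsets

An itemset is closed if none of its immediate supersets has the same support as the itemset.

A closed frequent itemset is a closed itemset whose support is at least minsup.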


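Maximal vs Closed Itemsets

TID   Items
1     ABC
2     ABCD
3     BCE
4     ACDE
5     DE

[Figure: itemset lattice over the items A to E in which every itemset is annotated with the IDs of the transactions that contain it: A 124, B 123, C 1234, D 245, E 345; AB 12, AC 124, AD 24, AE 4, BC 123, BD 2, BE 3, CD 24, CE 34, DE 45; ABC 12, ABD 2, ACD 24, ACE 4, ADE 4, BCD 2, BCE 3, CDE 4; ABCD 2, ACDE 4. The remaining itemsets are not supported by any transaction.]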


Maximal vs Closed Itemsets

Minimum support count = 2

[Figure: the same annotated lattice over the items A to E, with the closed frequent itemsets marked and the itemsets that are both closed and maximal highlighted]

# Closed = 9

# Maximal = 4
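The two counts on this slide can be checked by brute force. The sketch below enumerates the whole lattice for the five transactions of the example (ABC, ABCD, BCE, ACDE, DE) with a minimum support count of 2; the variable and function names are assumptions made for illustration, and full enumeration is only reasonable for a handful of items.

```python
from itertools import combinations

transactions = [set("ABC"), set("ABCD"), set("BCE"), set("ACDE"), set("DE")]
minsup_count = 2
items = sorted(set().union(*transactions))

# Support count of every non-empty itemset (2^5 - 1 of them).
support = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = frozenset(combo)
        support[s] = sum(1 for t in transactions if s <= t)

frequent = {s for s, n in support.items() if n >= minsup_count}

def immediate_supersets(s):
    return [s | {i} for i in items if i not in s]

# Closed frequent: no immediate superset has the same support.
closed = {s for s in frequent
          if all(support[sup] < support[s] for sup in immediate_supersets(s))}
# Maximal frequent: no immediate superset is frequent.
maximal = {s for s in frequent
           if all(sup not in frequent for sup in immediate_supersets(s))}

print(len(closed), len(maximal))
print(sorted("".join(sorted(s)) for s in maximal))
```

Every maximal frequent itemset is also closed, so the maximal itemsets form a subset of the closed ones.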




I

  • Apriori , :

  • I/O

  • :

  • o

II


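Traversal of the lattice: general-to-specific

Apriori moves from the frequent (k-1)-itemsets to the k-itemsets (k-1 -> k), that is, from the top of the lattice towards the bottom. A specific-to-general traversal moves in the opposite direction (k -> k - 1), starting from the bottom of the lattice.

[Figure: itemset lattice over the items A to E, from null down to ABCDE, with the two traversal directions marked]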


I

:

-- vs --

k -> k 1 (--)

maximal (, )

,

-- --

II


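Traversal of the lattice: equivalence classes

Apriori: classes defined by the itemset length k, processed in the order 1, 2, ...

Prefix (Suffix): classes of itemsets that share a common prefix (suffix) of length k

[Figure: (a) prefix tree and (b) suffix tree over the items A, B, C, D, each running from null down to ABCD]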


I

:

BFS vs DFS

Apriori

DFS: Depth-First-Search

BFS: Breadth-First-Search

maximal BFS

maximal,

II


I

:

BFS vs DFS

prune

Prune aab ac

Maximal

II


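Representation of the database: horizontal vs vertical

Horizontal data layout (used by Apriori):

TID   Items
1     A,B,E
2     B,C,D
3     C,E
4     A,C,D
5     A,B,C,D
6     A,E
7     A,B
8     A,B,C
9     A,C,D
10    B

Vertical data layout (one TID-list per item):

Item  TID-list
A     1,4,5,6,7,8,9
B     1,2,5,7,8,10
C     2,3,4,5,8,9
D     2,4,5,9
E     1,3,6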


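TID-lists

  • The support of a k-itemset is computed by intersecting the TID-lists of two of its (k-1)-subsets.

  • Advantage: very fast support counting

  • Disadvantage: the intermediate TID-lists may become too large for memory

FP-Growth is a prefix-based alternative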
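A minimal sketch of the vertical layout: the ten transactions of the horizontal layout are turned into one TID-list per item, and the support of an itemset is the size of the intersection of the lists. The names `vertical` and `tid_list` are assumptions made for illustration.

```python
from collections import defaultdict

# Horizontal layout: TID -> items (the ten transactions of the example).
horizontal = {
    1: {"A", "B", "E"}, 2: {"B", "C", "D"}, 3: {"C", "E"}, 4: {"A", "C", "D"},
    5: {"A", "B", "C", "D"}, 6: {"A", "E"}, 7: {"A", "B"}, 8: {"A", "B", "C"},
    9: {"A", "C", "D"}, 10: {"B"},
}

# Vertical layout: item -> set of TIDs that contain it.
vertical = defaultdict(set)
for tid, items in horizontal.items():
    for item in items:
        vertical[item].add(tid)

def tid_list(itemset):
    """TID-list of an itemset = intersection of the TID-lists of its items."""
    return set.intersection(*(vertical[i] for i in itemset))

ab = tid_list("AB")
print(sorted(ab), len(ab))
```

The support count comes directly from the length of the intersected list, with no further pass over the transactions.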


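The FP-Growth algorithm

Idea:

  • Build an FP-tree

    • a compact representation of the transaction database in the form of a prefix tree (trie)

    • if it is small enough, the FP-tree fits in main memory

    • construction: every transaction is mapped onto a path of the tree

  • Extract the frequent itemsets directly from the FP-tree, with a divide-and-conquer approach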


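FP-Growth: building the FP-tree

The FP-tree is built by reading the transactions one at a time and mapping each of them onto a path that starts at the null root. The items of every transaction are kept in a fixed order, so that transactions with a common prefix share the corresponding part of the path; the counter of every node on the shared part is incremented.

[Figure sequence: the tree after reading TID=1 (null → A:1 → B:1); after TID=2 (a second branch null → B:1 → C:1 → D:1); after TID=3 (A:2, with a new branch C:1 → D:1 → E:1 below A next to B:1); and the final tree after all transactions, with A:7 and B:3 below the root and B:5 below A]

Size of the FP-tree

  • Because transactions often share items, the FP-tree is usually smaller than the uncompressed data

    • Best case: all transactions contain the same items, and the tree is a single path

    • Worst case: every transaction has a unique set of items, and the tree is as large as the original data (in fact larger, because of the pointers and counters)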


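FP-Growth: the example data

TID   Items
1     {A,B}
2     {B,C,D}
3     {A,C,D,E}
4     {A,D,E}
5     {A,B,C}
6     {A,B,C,D}
7     {B,C}
8     {A,B,C}
9     {A,B,D}
10    {B,C,E}

Building the FP-tree

  • Scan the data once and compute the support of each item

o here σ(A)=7, σ(B)=8, σ(C)=7, σ(D)=5, σ(E)=3

  • Infrequent items are discarded and the items of every transaction are sorted by decreasing support

For simplicity, the example uses the order A, B, C, D, E.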
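The construction can be sketched as below. The class `Node`, the function `build_fp_tree` and the header dictionary are assumptions made for illustration. The ten transactions assume that the items lost in extraction are A and B, which is consistent with the stated supports σ(A)=7 and σ(B)=8. Items are inserted in the order A, B, C, D, E, as the slides do for simplicity.

```python
class Node:
    def __init__(self, item, parent=None):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}

def build_fp_tree(transactions, order):
    """Insert each transaction as a path from the root, sharing common prefixes."""
    root = Node(None)
    header = {item: [] for item in order}        # item -> list of its nodes
    rank = {item: i for i, item in enumerate(order)}
    for t in transactions:
        node = root
        for item in sorted(t, key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:                    # start a new branch
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1                     # shared prefix: bump the counter
            node = child
    return root, header

# Assumed reconstruction of the ten example transactions.
transactions = [set("AB"), set("BCD"), set("ACDE"), set("ADE"), set("ABC"),
                set("ABCD"), set("BC"), set("ABC"), set("ABD"), set("BCE")]
root, header = build_fp_tree(transactions, order="ABCDE")

for item, child in root.children.items():
    print(f"{item}:{child.count}")
```

The header table collects, for each item, the nodes that carry it; following these lists is what later allows the bottom-up traversal by suffix.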


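FP-Growth: extracting the frequent itemsets

  • Input: the FP-tree

  • Method:

    • Divide and conquer

      • look first for the frequent itemsets ending in E, then D, C, B, A

        • for E, look in turn for those ending in DE, CE, BE, AE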


FP-Growth: the search space

The itemsets are divided into suffix-based classes: those ending in E (DE, CE, BE, AE, CDE, BDE, ADE, ...), then those ending in D, in C, in B and in A. Each class is examined in turn.

[Figure sequence: the lattice of itemsets arranged by suffix, from the single items up to ABCDE, with the class under examination highlighted]

[Figure sequence: the complete FP-tree (A:7 and B:3 below the root) with its header table. A bottom-up traversal follows the header-table pointers to the nodes of E, then D, C and B, one suffix-based class at a time.]


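FP-Growth

For each suffix there are two phases:

  • Phase 1: build the prefix paths that end in the suffix

  • Phase 2: build the conditional FP-tree

    • with the conditional FP-tree, find the frequent itemsets that end in the suffix

      • recursively, by divide-and-conquer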


FP-Growth: itemsets ending in E

Phase 1: collect the prefix paths of E, that is, the part of the FP-tree that leads to the three E:1 nodes.

[Figure: prefix paths of E: null → A → C → D → E, null → A → D → E and null → B → C → E]

These paths are used to find the frequent itemsets ending in E, among the candidates {E}, {D,E}, {C,D,E}, {A,D,E}, {A,C,D,E}, {C,E}, {B,C,E}

minsup = 2

Is {E} frequent? Its support is 1+1+1 = 3 > 2, so {E} is frequent. The itemsets ending in DE, CE, BE, AE are examined next.

Phase 2: turn the prefix paths into the conditional FP-tree for E

(1) update the support counts along the prefix paths so that they count only the transactions that contain E (for example, the path null->B->C->E corresponds to the transaction {B, C, E}, so B and C get count 1; A:7 becomes A:2)

(2) remove (truncate) the E nodes

(3) remove the items that are no longer frequent: B now has support 1, below minsup

[Figure: conditional FP-tree for E: null → A:2 → C:1 → D:1, A:2 → D:1, and null → C:1]

The conditional FP-tree for E is used to find the frequent itemsets ending in {D, E}, {C, E}, {A, E}.

Suffix D (DE): from the prefix paths of D in the conditional tree, the support of {D, E} is 1+1 = 2 ≥ 2, so {D, E} is frequent. In the conditional FP-tree for {D, E}, C has count 1 and is removed; only A:2 remains, which meets minsup, so {A, D, E} is frequent.

Suffix C (CE): the support of {C, E} is 1+1 = 2 ≥ 2, so {C, E} is frequent. In the conditional FP-tree for {C, E}, A has count 1 and is removed; the tree is empty, so no larger itemset is found.

Suffix A (AE): the only prefix path is null → A:2, so {A, E} is frequent.

Frequent itemsets ending in E: {E}, {D, E}, {A, D, E}, {C, E}, {A, E}

The procedure continues with the suffix D.
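The result for the suffix E can be reproduced with a simplified pattern-growth sketch. To stay short it does not build physical conditional FP-trees: each conditional database is kept as a list of (prefix path, count) pairs, which follows the same divide-and-conquer idea but is not the full FP-Growth data structure. The function name `pattern_growth` is an assumption, and the ten transactions assume that the items lost in extraction are A and B.

```python
from collections import Counter

def pattern_growth(paths, minsup, suffix=()):
    """paths: list of (ordered item tuple, count). Yields (itemset, support)."""
    counts = Counter()
    for items, n in paths:
        for item in items:
            counts[item] += n
    for item in sorted(counts, reverse=True):      # bottom-up: E, D, C, B, A
        if counts[item] < minsup:
            continue                               # infrequent with this suffix
        new_suffix = (item,) + suffix
        yield new_suffix, counts[item]
        # Conditional pattern base: the prefix of every path containing `item`.
        cond = [(items[:items.index(item)], n)
                for items, n in paths if item in items]
        yield from pattern_growth(cond, minsup, new_suffix)

# Assumed reconstruction of the ten example transactions, items in order A..E.
transactions = ["AB", "BCD", "ACDE", "ADE", "ABC",
                "ABCD", "BC", "ABC", "ABD", "BCE"]
paths = [(tuple(sorted(t)), 1) for t in transactions]

result = dict(pattern_growth(paths, minsup=2))
ending_in_e = {"".join(k) for k in result if k[-1] == "E"}
print(sorted(ending_in_e))
```

Each recursive call plays the role of one conditional FP-tree: it sees only the prefixes of the transactions that contain the current suffix, so the counts it computes are the supports of the extended itemsets.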


FP-Growth: itemsets ending in D

Phase 1: the support of D is 5 > 2, so {D} is frequent.

[Figure: the prefix paths of D in the FP-tree, ending at the five D:1 nodes]

Phase 2: build the conditional FP-tree for D

1. update the support counts along the prefix paths

2. remove the D nodes

[Figure sequence: the counts along the prefix paths of D are updated step by step and the D nodes are then removed, which leaves the conditional FP-tree for D]

The conditional FP-tree for D is used to find the frequent itemsets ending in AD, BD and CD.


I

FP-Growth

  • --

  • , -:

  • ,

II


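FP-Growth

The performance of FP-Growth depends on the compaction factor of the data set.

If the resulting conditional FP-trees are very bushy (in the worst case, a full prefix tree), performance degrades, because the algorithm has to generate many subproblems and merge their results.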


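Evaluation of association rules

  • Association rule algorithms tend to produce too many rules, many of them uninteresting or redundant

    • e.g., {A,B,C} → {D} and {A,B} → {D} are redundant if they have the same support & confidence

  • Interestingness measures can be used to prune or rank the derived patterns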


[Figure: a data table of products by features, from which patterns are extracted and then filtered in a postprocessing step]


Interestingness measures: objective vs subjective

Objective measures are computed from the data, using the frequency counts of a contingency table.


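Contingency table for a rule X → Y

f11: support of X and Y
f10: support of X and ¬Y
f01: support of ¬X and Y
f00: support of ¬X and ¬Y

f1+ is the support count of X and f+1 is the support count of Y.

Given a rule X → Y, the information needed to evaluate it can be read from the contingency table.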


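Drawback of confidence: is a high confidence enough?

Association rule: Tea → Coffee

Confidence = P(Coffee|Tea) = 0.75

However, P(Coffee) = 0.9 and P(Coffee|¬Tea) = 0.9375, so knowing that a person drinks tea actually lowers the probability that the person drinks coffee.

Confidence ignores the support of the itemset on the RHS of the rule.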


I

/,

,

II


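Statistical independence

Population of 1000 students

  • 600 students know how to swim (S)

  • 700 students know how to bike (B)

  • 420 students know how to swim and bike (S,B)

  • P(S∧B) = 420/1000 = 0.42

  • P(S) × P(B) = 0.6 × 0.7 = 0.42

  • P(S∧B) = P(S) × P(B) => statistically independent

  • P(S∧B) > P(S) × P(B) => positively correlated

  • P(S∧B) < P(S) × P(B) => negatively correlated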


I

:

II


Example: Lift/Interest

  • Association rule: Tea → Coffee

  • Confidence = P(Coffee|Tea) = 0.75

  • but P(Coffee) = 0.9

  • Interest = 0.15/(0.9 × 0.2) = 0.8333 (< 1, therefore negatively associated)
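The arithmetic of the example can be retraced in a few lines. This is only a sketch: the variable names are assumptions, and the three probabilities are the ones that appear in the Tea → Coffee example (P(Tea) = 0.2, P(Coffee) = 0.9, P(Tea, Coffee) = 0.15).

```python
p_tea, p_coffee, p_both = 0.2, 0.9, 0.15

confidence = p_both / p_tea                 # P(Coffee | Tea)
lift = confidence / p_coffee                # P(Coffee | Tea) / P(Coffee)
interest = p_both / (p_tea * p_coffee)      # P(Tea, Coffee) / (P(Tea) P(Coffee))

print(f"confidence = {confidence:.2f}")
print(f"lift = {lift:.4f}, interest = {interest:.4f}")
```

Lift and interest are the same quantity written in two ways; a value below 1 means the two items occur together less often than they would under independence.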


I

Lift & Interest

c = 10/100 = 0.1

s = 1

c = 90/100 = 0.9

s = 1

II


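φ-Coefficient

Takes values from -1 to 1; it is the analogue of Pearson's correlation coefficient for binary variables.

0: no association (statistical independence)

-1: perfect negative association

1: perfect positive association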
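As a sketch, the φ-coefficient can be computed from the four cells of a contingency table with the standard formula φ = (f11 f00 − f10 f01) / sqrt(f1+ f0+ f+1 f+0). The function name `phi` is an assumption. The first table is derived from the swim/bike example of these slides (1000 students, 600 S, 700 B, 420 both); the other two tables are hypothetical extremes.

```python
from math import sqrt

def phi(f11, f10, f01, f00):
    """phi-coefficient of a 2x2 contingency table."""
    row1, row0 = f11 + f10, f01 + f00      # f1+ and f0+
    col1, col0 = f11 + f01, f10 + f00      # f+1 and f+0
    return (f11 * f00 - f10 * f01) / sqrt(row1 * row0 * col1 * col0)

# Swim/bike example: 420 both, 180 swim only, 280 bike only, 120 neither.
independent = phi(420, 180, 280, 120)
# Hypothetical extreme tables.
perfect_pos = phi(50, 0, 0, 50)
perfect_neg = phi(0, 50, 50, 0)

print(independent, perfect_pos, perfect_neg)
```

The swim/bike table gives exactly 0, in agreement with P(S∧B) = P(S) × P(B) for that data.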


Coefficient1

-Coefficient

Coefficient

II


Coefficient2

-Coefficient

  • ( )

  • ,

II


Is measure

IS-measure

II


I

;

Apriori-style support based pruning ;

II


I

10 contingency :

(1 , 10 ):

II


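Properties of a good measure

  • Piatetsky-Shapiro:

  • 3 properties that a good measure M must satisfy:

    • M(A,B) = 0 if A and B are statistically independent

    • M(A,B) increases monotonically with P(A,B) when P(A) and P(B) remain unchanged

    • M(A,B) decreases monotonically with P(A) [or P(B)] when P(A,B) and P(B) [or P(A)] remain unchanged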


Variable permutation

Does M(A,B) = M(B,A)?

Symmetric measures:

  • support, lift, collective strength, cosine, Jaccard, etc.

Asymmetric measures:

  • confidence, conviction, Laplace, J-measure, etc.


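Row/Column Scaling

Grade-gender example (Mosteller, 1968):

[Figure: two contingency tables; the second is obtained from the first by scaling one column by 2x and the other by 10x]

Mosteller: the underlying association should not depend on the relative number of samples in each row and column.

A measure M is invariant under the row/column scaling operation if M(T) = M(T'), where T is the contingency table [f11, f10; f01, f00], T' is the contingency table [k1k3 f11, k2k3 f10; k1k4 f01, k2k4 f00] and k1, k2, k3, k4 are positive constants.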


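Inversion Operation

[Figure: binary vectors over the transactions 1 ... N and their inverted versions, with 0s and 1s exchanged]

A measure is invariant under the inversion operation if its value does not change when f11 is exchanged with f00 and f10 with f01.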


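Null Addition

Null addition: adding unrelated data, that is, transactions that contain neither of the two items, which increases only f00.

Invariant measures:

  • support, cosine, Jaccard, etc.

Non-invariant measures:

  • correlation, Gini, mutual information, odds ratio, etc.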
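A small check of the null-addition property, as a sketch with hypothetical counts (the table 20, 5, 5, 70 is not from the slides). It uses the standard definitions Jaccard = f11/(f11+f10+f01), cosine = f11/sqrt(f1+ f+1) and odds ratio = f11 f00/(f10 f01); the function names are assumptions.

```python
from math import sqrt

def jaccard(f11, f10, f01, f00):
    return f11 / (f11 + f10 + f01)

def cosine(f11, f10, f01, f00):
    return f11 / sqrt((f11 + f10) * (f11 + f01))

def odds_ratio(f11, f10, f01, f00):
    return (f11 * f00) / (f10 * f01)

before = (20, 5, 5, 70)           # hypothetical counts f11, f10, f01, f00
after = (20, 5, 5, 70 + 900)      # null addition: 900 transactions with neither item

for measure in (jaccard, cosine, odds_ratio):
    print(measure.__name__, measure(*before), measure(*after))
```

Jaccard and cosine never look at f00, which is why they are unaffected; the odds ratio multiplies by f00 and therefore grows with every added null transaction.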


Simpson's paradox

All customers:

c({HTVS=Yes} → {EM=Yes}) = 99/180 = 55%

c({HTVS=No} → {EM=Yes}) = 54/120 = 45%

Students:

c({HTVS=Yes} → {EM=Yes}) = 1/10 = 10%

c({HTVS=No} → {EM=Yes}) = 4/34 = 11.8%

Working adults:

c({HTVS=Yes} → {EM=Yes}) = 98/170 = 57.6%

c({HTVS=No} → {EM=Yes}) = 50/86 = 58.1%

In general, a/b < c/d and p/q < r/s do not imply (a+p)/(b+q) < (c+r)/(d+s)!

The data must be stratified appropriately (stratification) before conclusions are drawn.
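The reversal can be verified directly from the counts on the slide. The sketch below uses exact fractions; the dictionary layout and the helper names `conf` and `pooled` are assumptions made for illustration.

```python
from fractions import Fraction

# (customers with EM=Yes, group size) for HTVS=Yes and HTVS=No, per stratum.
students = {"yes": (1, 10), "no": (4, 34)}
adults = {"yes": (98, 170), "no": (50, 86)}

def conf(pair):
    """Confidence as an exact fraction: buyers / group size."""
    return Fraction(*pair)

def pooled(a, b):
    """Merge two strata by adding numerators and denominators."""
    return (a[0] + b[0], a[1] + b[1])

combined = {k: pooled(students[k], adults[k]) for k in ("yes", "no")}

for name, group in (("students", students), ("adults", adults), ("combined", combined)):
    print(name, float(conf(group["yes"])), float(conf(group["no"])))
```

In each stratum the HTVS=No group has the higher confidence, yet after pooling the HTVS=Yes group comes out ahead, because the two strata contribute very different group sizes.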


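Subjective interestingness measures

  • Objective measures:

    • rank the patterns with statistics computed from the data, e.g., 21 measures of association (support, confidence, Laplace, Gini, mutual information, Jaccard, etc.)

  • Subjective measures:

      • a pattern is interesting if it is unexpected, that is, if it contradicts the expectations of the user (Silberschatz & Tuzhilin)

      • a pattern is interesting if it is actionable (Silberschatz & Tuzhilin)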


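Interestingness via Unexpectedness

[Figure legend: + pattern expected to be frequent, - pattern expected to be infrequent; patterns found to be frequent and patterns found to be infrequent. Expected patterns are those where the expectation and the finding agree (+ with frequent, - with infrequent). Unexpected patterns are those where they disagree (- with frequent, + with infrequent).]

  • The expectations of the users have to be modelled (domain knowledge)

  • and combined with the evidence from the data (the extracted patterns)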


I

:

II


I

:

II


Sgi mineset 3 0

: (SGI/MineSet 3.0)

II

