IncSpan: Incremental Mining of Sequential Patterns in Large Database

Hong Cheng, Xifeng Yan, Jiawei Han

Proc. 2004 Int. Conf. on Knowledge Discovery and Data Mining (KDD'04)

Advisor: Jia-Ling Koh


Speaker: Chun-Wei Hsieh

02/25/2005


Problem

  • Databases are updated incrementally (customer shopping transaction sequences, weather sequences, and patient treatment sequences).

  • Two kinds of database updates:

    (1) INSERT: inserting new sequences (new customers)

    (2) APPEND: appending new itemsets/items to the existing sequences (newly purchased items for existing customers)


The property of updates

  • Notation: DB is the original database, ΔDB is the update, and DB’ is the updated database.

  • INSERT: If a sequence is infrequent in both DB and ΔDB, it cannot be frequent in DB’.

  • APPEND: Even if a sequence is infrequent in both DB and ΔDB, it might be frequent in DB’.

  • When the database is updated with a combination of INSERT and APPEND, we can treat INSERT as a special case of APPEND by treating the inserted sequences as transactions appended to an empty sequence in the original database.
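The two properties above can be checked on a toy database. The following is a minimal Python sketch, not code from the presentation: the representation (a dict from a hypothetical sequence id to a list of itemsets) and the names `apply_update`, `contains`, and `support` are assumptions made for illustration.

```python
def apply_update(db, delta):
    """Return the updated database DB' without modifying db.

    db and delta map a sequence id to a list of itemsets (Python sets).
    An id missing from db is treated as an empty sequence, so an INSERT
    is handled as an APPEND to an empty sequence.
    """
    updated = {sid: list(seq) for sid, seq in db.items()}
    for sid, appended in delta.items():
        updated.setdefault(sid, []).extend(appended)
    return updated


def contains(seq, pattern):
    """True if pattern is a subsequence of seq (itemset-wise inclusion)."""
    i = 0
    for itemset in seq:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)


def support(db, pattern):
    """Number of sequences in db that contain pattern."""
    return sum(contains(seq, pattern) for seq in db.values())


db = {1: [{"a"}, {"b"}], 2: [{"a"}]}
delta = {1: [{"c"}], 2: [{"c"}], 3: [{"a"}, {"c"}]}  # id 3 is an INSERT
db_new = apply_update(db, delta)
pattern = [{"a"}, {"c"}]  # the sequence <(a)(c)>
```

With min_sup = 3, the pattern <(a)(c)> has support 0 in `db` and 1 in `delta`, so it is infrequent in both, yet its support in `db_new` is 3 because the appended (c) completes the pattern in existing sequences.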


Examples

[Figure: examples in INSERT and APPEND database]


Preliminary Concepts

  • DB: an original sequence database

  • DB’: an appended sequence database

  • min_sup: a minimum support threshold

  • FS: the set of frequent sequential patterns

  • Buffer ratio: μ ≤ 1

  • SFS: the set of semi-frequent sequential patterns, i.e. patterns whose support is at least μ*min_sup but less than min_sup

  • The problem of incremental sequential pattern mining is to mine the set of frequent subsequences FS’ in DB’ based on FS, instead of mining DB’ from scratch.
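As a rough illustration of how the buffer ratio splits patterns into three groups, here is a small Python sketch. The function name `classify` is hypothetical, and the thresholds assume the reading that a semi-frequent pattern has support of at least μ*min_sup but below min_sup.

```python
def classify(support, min_sup, mu):
    """Label a pattern by its support, given min_sup and buffer ratio mu."""
    if support >= min_sup:
        return "frequent"        # would be kept in FS
    if support >= mu * min_sup:
        return "semi-frequent"   # would be buffered in SFS
    return "infrequent"          # not kept at all


# min_sup = 3 and mu = 0.6 are the values used in the example slides.
labels = {s: classify(s, min_sup=3, mu=0.6) for s in range(5)}
```

With these values the semi-frequent band starts at 0.6 * 3 = 1.8, so a support of 2 is semi-frequent and a support of 1 is infrequent.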


Buffering Semi-frequent Patterns

  • When the database DB is updated to DB’, there are several possibilities:

  • 1. A pattern which is frequent in DB is still frequent in DB’.

  • 2. A pattern which is semi-frequent in DB becomes frequent in DB’.

  • 3. A pattern which is semi-frequent in DB is still semi-frequent in DB’.

  • 4. The appended database ΔDB brings new items.

  • 5. A pattern which is infrequent in DB becomes frequent in DB’.

  • 6. A pattern which is infrequent in DB becomes semi-frequent in DB’.

  • Cases (1) to (3) are trivial, since these patterns are already kept in FS or SFS and only their supports need to be updated.


Case (4)

  • The appended database ΔDB brings new items that do not appear in DB.

  • Property: An item which does not appear in DB and is brought by ΔDB has no information in FS or SFS.

  • Solution: Scan the database LDB for single items. Then use each new frequent item as a prefix to construct a projected database and discover frequent and semi-frequent sequences recursively.
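The single-item scan in the solution above could look roughly like the following Python sketch. It is an illustration under assumptions: LDB is given as a list of sequences (each a list of item sets), `db_items` is the set of items already seen in DB, and the name `scan_new_items` is hypothetical. The recursive projection step is not shown.

```python
from collections import Counter


def scan_new_items(ldb, db_items, min_sup, mu):
    """Count items that are new (absent from DB) over the sequences of LDB.

    A new item can only occur in appended sequences, so its support in
    LDB is also its support in the updated database.
    """
    counts = Counter()
    for seq in ldb:
        items_in_seq = set().union(*seq)         # each item counted once per sequence
        counts.update(items_in_seq - db_items)   # keep only items unseen in DB
    frequent = sorted(i for i, c in counts.items() if c >= min_sup)
    semi = sorted(i for i, c in counts.items() if mu * min_sup <= c < min_sup)
    return frequent, semi


ldb = [[{"a"}, {"c"}], [{"b"}, {"c", "d"}], [{"c"}], [{"d"}]]
frequent, semi = scan_new_items(ldb, db_items={"a", "b"}, min_sup=3, mu=0.6)
```

Here the new item c occurs in three sequences and becomes a frequent prefix candidate, while d occurs in two and falls in the semi-frequent band.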


LDB and ODB

  • LDB is the set of sequences in DB’ which are appended with items/itemsets.

  • ODB is the set of sequences in DB which are appended with items/itemsets in DB’.
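To make the two definitions concrete, here is a minimal Python sketch that derives both sets by comparing the original and updated databases. The dict-based representation and the name `split_ldb_odb` are assumptions; an inserted sequence is treated as an append to an empty original sequence.

```python
def split_ldb_odb(db, db_new):
    """Return (LDB, ODB) as dicts keyed by sequence id.

    LDB holds the updated form of every sequence that changed;
    ODB holds the original form of the same sequences.
    """
    ldb, odb = {}, {}
    for sid, new_seq in db_new.items():
        old_seq = db.get(sid, [])   # inserted sequences start out empty
        if new_seq != old_seq:      # the sequence was appended to
            ldb[sid] = new_seq
            odb[sid] = old_seq
    return ldb, odb


db = {1: [{"a"}, {"b"}], 2: [{"a"}]}
db_new = {1: [{"a"}, {"b"}, {"c"}], 2: [{"a"}], 3: [{"d"}]}
ldb, odb = split_ldb_odb(db, db_new)
```

Sequence 2 is untouched and therefore appears in neither set, which is why later steps only need to examine LDB and ODB rather than the whole database.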


Case (4): examples (c)

min_sup = 3

μ = 0.6


Case (5)

  • A pattern which is infrequent in DB becomes frequent in DB’.

  • Property: If an infrequent sequence p’ in DB becomes frequent in DB’, all of its prefix subsequences must also be frequent in DB’.

  • Solution: Start from its frequent prefix p in FS and construct the p-projected database; we will discover p’.

  • A sequence p’ which changes from infrequent to frequent must have Δsup(p’) > (1 - μ)*min_sup, where Δsup(p’) is the support gained through the update.

  • If sup_LDB(p) < (1 - μ)*min_sup, we can safely prune search with prefix p.
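The pruning test in the last bullet is a single comparison. The Python sketch below shows it applied to a few prefixes; the name `can_prune_prefix`, the string encoding of patterns, and the support values are made up for illustration.

```python
def can_prune_prefix(sup_ldb_p, min_sup, mu):
    """True if no extension of prefix p can turn from infrequent to frequent."""
    return sup_ldb_p < (1 - mu) * min_sup


# Hypothetical supports in LDB for three frequent prefixes.
ldb_support = {"a": 1, "ab": 2, "c": 3}
to_project = [p for p, s in ldb_support.items()
              if not can_prune_prefix(s, min_sup=3, mu=0.6)]
```

With min_sup = 3 and μ = 0.6 the threshold is (1 - 0.6) * 3 = 1.2, so the prefix with LDB support 1 is pruned and only the other two would be projected.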


Case (5): examples (a,c)

min_sup = 3

μ = 0.6


Case (5): theorem

  • For a frequent pattern p, if its support in LDB satisfies sup_LDB(p) < (1 - μ)*min_sup, then there is no sequence p’ having p as prefix that changes from infrequent in DB to frequent in DB’.

  • Proof: p’ was infrequent in DB, so sup_DB(p’) < μ*min_sup. (1)

    If sup_LDB(p) < (1 - μ)*min_sup, then

    sup_LDB(p’) ≤ sup_LDB(p) < (1 - μ)*min_sup.

    Since sup_LDB(p’) = sup_ODB(p’) + Δsup(p’),

    we have Δsup(p’) ≤ sup_LDB(p’) < (1 - μ)*min_sup. (2)

    Since sup_DB’(p’) = sup_DB(p’) + Δsup(p’), combining (1) and (2), we have sup_DB’(p’) < min_sup. So p’ cannot be frequent in DB’.


Case (6)

  • A pattern which is infrequent in DB becomes semi-frequent in DB’.

  • Property: If an infrequent sequence p’ becomes semi-frequent in DB’, all of its prefix subsequences must be either frequent or semi-frequent.

  • Solution: Start from its prefix p in FS or SFS and construct the p-projected database; we will discover p’.


Case (6): examples (be)

min_sup = 3

μ = 0.6


IncSpan

  • Step 1: Scan LDB for single items, as shown in case (4).

  • Step 2: Check every pattern in FS and SFS against LDB to adjust the support of those patterns.

  • Step 2.1: If a pattern becomes frequent, add it to FS’. Then check whether it meets the projection condition. If so, use it as a prefix to project the database, as shown in case (5).

  • Step 2.2: If a pattern is semi-frequent, add it to SFS’.
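Step 2 can be sketched in Python as a pass over the buffered patterns. This is a simplified illustration, not the authors' implementation: it assumes the support gain of each pattern and its support in LDB have already been computed, takes the projection condition to be sup_LDB(p) ≥ (1 - μ)*min_sup (the complement of the pruning test in case (5)), and uses the hypothetical name `update_buffers`. Step 1 and the projection itself are left out.

```python
def update_buffers(old_sup, delta_sup, ldb_sup, min_sup, mu):
    """Re-classify buffered patterns after an update.

    old_sup:   support in DB of every pattern in FS or SFS
    delta_sup: support gained through the update (missing means 0)
    ldb_sup:   support of each pattern in LDB
    Returns (FS', SFS', patterns to use as projection prefixes).
    """
    fs_new, sfs_new, to_project = {}, {}, []
    for pattern, sup in old_sup.items():
        new_sup = sup + delta_sup.get(pattern, 0)
        if new_sup >= min_sup:                       # Step 2.1
            fs_new[pattern] = new_sup
            if ldb_sup.get(pattern, 0) >= (1 - mu) * min_sup:
                to_project.append(pattern)           # meets projection condition
        elif new_sup >= mu * min_sup:                # Step 2.2
            sfs_new[pattern] = new_sup
    return fs_new, sfs_new, to_project


old_sup = {("a",): 3, ("a", "b"): 2, ("b",): 2}
delta_sup = {("a", "b"): 1}
ldb_sup = {("a",): 1, ("a", "b"): 2, ("b",): 0}
fs_new, sfs_new, to_project = update_buffers(old_sup, delta_sup, ldb_sup, 3, 0.6)
```

In this toy run the pattern (a, b) moves from the semi-frequent buffer into FS’ and is the only prefix that needs projecting.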


Algorithm


Reverse Pattern Matching

  • Since the appended items are always at the end of the original sequence, reverse pattern matching is more efficient than projection from the front.

  • Let s be an original sequence, sa the part appended to it, and s’ the updated sequence (s followed by sa). If the last item of p is not supported by sa, we can prune searching.

  • If the last item of p is supported by sa, we have to check whether s’ supports p. If p is not supported by s’, we can prune searching and keep sup(p) unchanged. Otherwise we have to check whether s supports p. If s supports p, keep sup(p) unchanged; otherwise, increase sup(p) by 1.
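The decision rule in the last two bullets can be written down as a short Python sketch. It covers only the rule for updating sup(p), not the backward scan that makes the matching efficient. It assumes that the appended part consists of whole new itemsets placed after the original sequence, and the names `contains` and `support_delta` are hypothetical.

```python
def contains(seq, pattern):
    """True if pattern is a subsequence of seq (itemset-wise inclusion)."""
    i = 0
    for itemset in seq:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)


def support_delta(pattern, original, appended):
    """Return 1 if the append makes this sequence newly support pattern, else 0."""
    last = pattern[-1]
    if not any(last <= itemset for itemset in appended):
        return 0                      # last element not in sa: prune
    if not contains(original + appended, pattern):
        return 0                      # s' does not support p
    if contains(original, pattern):
        return 0                      # s already supported p: already counted
    return 1                          # support gained through the append


p = [{"a"}, {"c"}]
gain = support_delta(p, original=[{"a"}, {"b"}], appended=[{"c"}])
```

Checking the appended part first is what allows most sequences to be skipped cheaply, since a pattern whose last element does not occur in sa cannot gain support from that sequence.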


Shared Projection

  • When we detect a subsequence that needs a projected database, we do not do the projection immediately. Instead we label it. After checking and labeling all the sequences, we do the projection by traversing the sequential pattern tree.

  • Example: <(a)(b)(c)(d)> and <(a)(b)(c)(e)> share the projected database DB’|<(a)(b)(c)>.
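The benefit of deferring projection can be illustrated with a small Python sketch in which labeled patterns reuse the projected database of their common prefix. This is a simplification under stated assumptions: sequences are tuples of single items rather than itemsets, a dict cache stands in for the sequential pattern tree, and the names `project_item` and `shared_projections` are hypothetical.

```python
def project_item(sequences, item):
    """Suffixes after the first occurrence of item, for sequences containing it."""
    out = []
    for seq in sequences:
        if item in seq:
            out.append(seq[seq.index(item) + 1:])
    return out


def shared_projections(db, labeled):
    """Project db on each labeled pattern, reusing shared prefixes.

    Returns the projected databases and the number of single-item
    projection steps actually performed.
    """
    cache = {(): list(db)}
    steps = 0
    for pattern in sorted(labeled):
        for k in range(1, len(pattern) + 1):
            prefix = pattern[:k]
            if prefix not in cache:   # extend the parent prefix by one item
                cache[prefix] = project_item(cache[prefix[:-1]], prefix[-1])
                steps += 1
    return {p: cache[p] for p in labeled}, steps


db = [("a", "b", "c", "d"), ("a", "b", "c", "e"), ("a", "c", "e")]
labeled = [("a", "b", "c", "d"), ("a", "b", "c", "e")]
projected, steps = shared_projections(db, labeled)
```

Projecting the two patterns independently would take eight single-item steps, while here the prefix <(a)(b)(c)> is projected once and the total is five.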


Experiment

[Figure panels: (a) varying min_sup; (b) varying percentage of updated sequences]


Experiment

[Figure panels: (c) memory usage under varied min_sup; (a) varying buffer ratio]


Experiment

[Figure panels: (b) multiple increments of database; (c) varying # of sequences (in 1000) in DB]

