Mining of frequent patterns from sensor data

Mining of Frequent Patterns from Sensor Data

Presented by: Ivy Tong Suk Man

Supervisor: Dr. B C M Kao

20 August, 2003


Outline

  • Outline of the Presentation

    • Motivation

    • Problem Definition

    • Algorithm

      • Apriori with data transformation

      • Interval-List Apriori

    • Experimental Results

    • Conclusion


Motivation

[Figure: temperature timeline with readings 25ºC, 27ºC, 28ºC, 26ºC, changing at t = 0, 1, 5, 10 s]

  • Continuous items

    • reflect values from an entity that changes continuously in the external environment.

    • Update  Change of state of the real entity

    • E.g. temperature reading data

      • Initial temperature: 25ºC at t=0s

      • Sequence of updates: <timestamp, new_temp>

        <1s, 27ºC>, <5s, 28ºC>, <10s, 26ºC>, <14s,..> …

      • t=0s to 1s, 25ºC

        t=1s to 5s, 27ºC

        t=5s to 10s, 28ºC

    • What is the average temperature from t=0s to 10s?

      • Ans: (25×1 + 27×4 + 28×5)/10 = 27.3ºC
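As a quick illustration (not part of the original slides), the answer above is a time-weighted mean over the update stream. A minimal Python sketch, with an illustrative function name:

```python
def time_weighted_average(initial_value, updates, t_end):
    """updates: list of (timestamp, new_value) pairs, sorted by timestamp."""
    total, t_prev, value = 0.0, 0.0, initial_value
    for ts, new_value in updates:
        total += value * (ts - t_prev)   # value held over [t_prev, ts)
        t_prev, value = ts, new_value
    total += value * (t_end - t_prev)    # last value held until t_end
    return total / t_end

# Reproduces the slide's example: (25*1 + 27*4 + 28*5) / 10 = 27.3
print(time_weighted_average(25, [(1, 27), (5, 28), (10, 26)], 10))
```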


Motivation

  • Time is a component in some applications

    • E.g. stock price quotes, network traffic data

  • “Sensors” are used to monitor some conditions, for example:

    • Prices of stocks: by getting quotations from a finance website

    • Weather: measuring temperature, humidity, air pressure, wind, etc.

  • We want to find correlations of the readings among a set of sensors

  • Goal: To mine association rules from sensor data


Challenges

  • How different is it from mining association rules from market basket data?

    • Time component

      When searching for association rules in market basket data, the time field is usually ignored, as there is no temporal correlation between transactions

    • Streaming data

      Data arrives continuously, possibly infinitely, and in large volume


Notations

  • We have a set of sensors R = {r1,r2,…,rm}

  • Each sensor ri has a set of numerical states Vi

    • Assume binary states for all sensors

    • Vi = {0,1} ∀i s.t. ri ∈ R

  • Dataset D: a sequence of updates of sensor state in the form <ts, ri, vi>, where ri ∈ R, vi ∈ Vi

    • ts : timestamp of the update

    • ri: sensor to be updated

    • vi: new value of the state of ri

    • For sensors with binary states

      • updates are in the form <ts, ri>, as the new state can be inferred by toggling the old state


Example

  • R={A,B,C,D,E,F}

  • Initial states: all off

  • D:

    <1,A>

    <2,B>

    <4,D>

    <5,A>

    <6,E>

    <7,F>

    <8,E>

    <10,A>

    <11,F>

    <13,C>

[Figure: per-sensor ON/OFF timelines. A is ON during [1,5) and [10,15); B from t=2; C from t=13; D from t=4; E during [6,8); F during [7,11)]
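A small Python sketch (an assumed helper, not from the slides) that replays the binary update stream and recovers each sensor's ON intervals, matching the timelines above:

```python
def replay(sensors, updates, t_end):
    """Replay toggle updates <ts, ri> and return each sensor's ON intervals."""
    state = {r: 0 for r in sensors}        # all sensors start OFF
    on_since = {}                          # sensor -> time it switched ON
    intervals = {r: [] for r in sensors}
    for ts, r in updates:                  # each update toggles one sensor
        if state[r] == 0:
            state[r], on_since[r] = 1, ts
        else:
            state[r] = 0
            intervals[r].append((on_since.pop(r), ts))
    for r, start in on_since.items():      # close intervals still open at t_end
        intervals[r].append((start, t_end))
    return intervals

D = [(1,'A'),(2,'B'),(4,'D'),(5,'A'),(6,'E'),
     (7,'F'),(8,'E'),(10,'A'),(11,'F'),(13,'C')]
print(replay('ABCDEF', D, 15))   # A: [(1,5),(10,15)], B: [(2,15)], ...
```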


More Notations

  • An association rule is a rule, satisfying certain support and confidence restrictions, in the form X → Y, where X ⊆ R, Y ⊆ R and X ∩ Y = ∅


More Notations

  • Association rule X → Y has confidence c:

    in c% of the time when the sensors in X are ON (with state = 1), the sensors in Y are also ON

  • Association rule X → Y has support s:

    in s% of the total length of history, the sensors in X and Y are all ON


More Notations

  • Let TLS(X) denote the Total LifeSpan of X

    • Total length of time that the sensors in X are ON

  • T – total length of history

  • Sup(X) = TLS(X)/T

    Conf(X → Y) = Sup(X ∪ Y) / Sup(X)

  • Example:

    T = 15s

    TLS(A)=9, TLS(AB)=8

    Sup(A) = 9/15 = 60%

    Sup(AB) =8/15 = 53%

    Conf(A → B) = 8/9 ≈ 89%

[Figure: timelines of A (ON during [1,5) and [10,15)) and B (ON from t=2); T = 15s]
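A quick check of the formulas above (a sketch, using the numbers from the example):

```python
T = 15
TLS = {'A': 9, 'AB': 8}
sup_A, sup_AB = TLS['A'] / T, TLS['AB'] / T   # 0.60 and 0.533
conf = sup_AB / sup_A                          # = TLS(AB)/TLS(A) = 8/9 ~ 0.889
print(sup_A, sup_AB, conf)
```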


Algorithm A

  • Transform & Apriori

    • Transform the sequence of updates to the form of market basket data

    • At each point of update

      • take a snapshot of the states of all sensors

      • Output all sensors with state=on as a transaction

      • Attach Weight(transaction) = Lifespan(this update) = timestamp(next update) − timestamp(this update)


Algorithm A - Example

  • Initial states: all off

  • D: <1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>, <11,F>,<13,C>

  • End of history = 15s

  • Transformed database D': at each update, the snapshot of ON sensors is output as one transaction, weighted by the time until the next update (the original slides animate this step by step; a sketch of the transformation follows below).

[Figure: per-sensor timelines with the transactions of D' being formed at each update]
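A minimal Python sketch of Algorithm A's transformation under the slides' assumptions (binary sensors, one toggle per update, history ending at t_end); the function name is illustrative:

```python
def transform(sensors, updates, t_end):
    """Yield (transaction, weight): the set of ON sensors after each update,
    weighted by its lifespan (time until the next update, or t_end)."""
    state = {r: 0 for r in sensors}
    next_times = [ts for ts, _ in updates[1:]] + [t_end]
    for (ts, r), next_ts in zip(updates, next_times):
        state[r] ^= 1                     # toggle the updated sensor
        snapshot = frozenset(s for s, v in state.items() if v)
        yield snapshot, next_ts - ts      # weight = lifespan of this snapshot

D = [(1,'A'),(2,'B'),(4,'D'),(5,'A'),(6,'E'),
     (7,'F'),(8,'E'),(10,'A'),(11,'F'),(13,'C')]
for txn, w in transform('ABCDEF', D, 15):
    print(sorted(txn), 'weight =', w)
# ['A'] weight = 1, ['A', 'B'] weight = 2, ..., ['A', 'B', 'C', 'D'] weight = 2
```

Note that the weights sum to the length of history minus the initial all-OFF period, as expected.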


Algorithm A

  • Apply Apriori on the transformed dataset D’

  • Drawbacks:

    • A lot of redundancy

    • Adjacent transactions may be very similar, differing only in the one sensor whose state was updated


Algorithm B

  • Interval-List Apriori

  • Uses an “interval-list” format

    • <X, interval1, interval2, interval3, … >

      where intervali is an interval in which all sensors in X are ON.

    • TLS(X) = Σi (intervali.h − intervali.l)

  • Example:

[Figure: timeline of A, ON during [1,5) and [10,15); T = 15s]

<A, [1,5), [10,15)> TLS(A) = (5-1) + (15-10) = 9
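In code (a sketch): with half-open intervals [l, h), TLS is a one-liner.

```python
def tls(interval_list):
    """Total lifespan: sum of (h - l) over half-open intervals [l, h)."""
    return sum(h - l for l, h in interval_list)

print(tls([(1, 5), (10, 15)]))   # (5-1) + (15-10) = 9, as on the slide
```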


Algorithm B

  • Step 1:

    For each ri ∈ R,

    build a list of intervals in which ri is ON by scanning the sequence of updates

  • Calculate the TLS of each ri

    • If TLS(ri) ≥ min_sup, put ri into L1
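Step 1, sketched by reusing replay(), tls(), and the example stream D from the earlier snippets (with an absolute minimum support count, as in the example that follows):

```python
def step1(sensors, updates, t_end, min_sup_count):
    interval_lists = replay(sensors, updates, t_end)   # one scan over D
    return {r: il for r, il in interval_lists.items()
            if tls(il) >= min_sup_count}               # L1: frequent sensors

L1 = step1('ABCDEF', D, 15, min_sup_count=3)
print(sorted(L1))   # ['A', 'B', 'D', 'F']; C and E have TLS 2 < 3
```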


Algorithm B - Example

  • Initial states: all off

  • D: <1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>,<11,F>,<13,C>

  • All interval lists start empty. Scanning D, an interval is opened when a sensor turns ON (e.g. <A, [1,?)> after <1,A>) and closed when it turns OFF (e.g. <A, [1,5)> after <5,A>) or when the history ends.

  • Final interval lists (end of history T = 15s):

    <A, [1,5),[10,15)>

    <B, [2,15)>

    <C, [13,15)>

    <D, [4,15)>

    <E, [6,8)>

    <F, [7,11)>


Algorithm B

  • Step 2:

    • Find all larger frequent sensor-sets

  • Similar to the Apriori frequent-itemset property:

    • Any subset of a frequent sensor-set must be frequent.

  • Method:

    • Generate candidates of size i+1 from frequent sensor-sets of size i.

    • Approach used: join two size-i frequent sensor-sets to obtain a size-(i+1) candidate if they agree on i-1 sensors

    • May also prune candidates that have subsets that are not large.

    • Count the support by merging (intersecting) the interval lists of the two size-i frequent sensor-sets

    • If sup ≥ min_sup, put the candidate into Li+1

    • Repeat the process until the candidate set is empty


Algorithm B

  • Example:

    • <A, [1,5), [10,15)>

    • <B, [2,15)>

    • <AB, [2,5),[10,15)>

[Figure: timelines of A and B; their overlap gives the intervals of AB. T = 15s]
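The intersection itself can be a linear merge of the two sorted interval lists; a sketch that reproduces the AB example above:

```python
def intersect(xs, ys):
    """Intersect two sorted lists of half-open intervals in O(m+n) time."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        lo = max(xs[i][0], ys[j][0])
        hi = min(xs[i][1], ys[j][1])
        if lo < hi:
            out.append((lo, hi))     # the overlap is ON in both sensor-sets
        if xs[i][1] < ys[j][1]:      # advance the list whose interval ends first
            i += 1
        else:
            j += 1
    return out

print(intersect([(1, 5), (10, 15)], [(2, 15)]))   # [(2, 5), (10, 15)]
```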


Algorithm B (Example)

[Figure: candidate lattice with lifespans (LS), min support count = 3. Size 1: A LS 9, B LS 13, C LS 2, D LS 11, E LS 2, F LS 4. Size 2, from the frequent sensors A, B, D, F: AB LS 8, AD LS 6, AF LS 1, BD LS 11, BF LS 4. Size 3: ABD LS 6]


Algorithm B – Candidate Generation

  • When generating a candidate sensor-set C of size i from two size i-1 sensor-sets LA and LB (subsets of C), we also construct the interval list of C by intersecting the interval lists of LA and LB.

  • Joining the two interval lists (of length m and n) is a key step in our algorithm

    • A simple linear scan requires O(m+n) time

  • There are i different size-(i-1) subsets of C

    which two should we pick?


Algorithm B – Candidate Generation

  • Method 1:

    • Choose the two lists with the fewest number of intervals

    • ⇒ store the number of intervals for each sensor-set

  • Method 2:

    • Choose the two lists with the smallest count (TLS)

    • Intuitively, a shorter lifespan implies fewer intervals

    • Easier to implement

      • The lifespan is already at hand from checking whether the sensor-set is frequent (both heuristics are sketched below)
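Both heuristics in a sketch (assuming each frequent sensor-set is stored together with its interval list, so the interval count and the TLS are both available; tls() is the helper from earlier):

```python
def pick_pair(subsets, lists, method=2):
    """subsets: the size-(i-1) frequent subsets of a candidate C.
    lists: maps each sensor-set to its interval list.
    Returns the two subsets whose lists should be cheapest to intersect."""
    if method == 1:
        key = lambda s: len(lists[s])    # Method 1: fewest intervals
    else:
        key = lambda s: tls(lists[s])    # Method 2: smallest lifespan (TLS)
    a, b = sorted(subsets, key=key)[:2]
    return a, b
```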


Experiments

  • Data generation

    • Simulate data generated by a set of n binary sensors

    • Make use of a standard market basket dataset

    • With n sensors, each of which can be either on or off

      ⇒ 2^n possible combinations of sensor states

    • Assign a probability to each of the combinations


Experiments – Data Gen

  • How to assign the probabilities?

    • Let N be the number of transactions in the market basket dataset that contain exactly the sensors that are ON (and no others)

      • E.g. Consider R={A,B,C,D,E,F}

      • Suppose we want to assign prob to the sensor state AC (only A and C are ON)

      • N is the number of transactions that contain exactly A and C and nothing else

    • Assign prob = N/|D|, where |D| is the size of the market basket dataset

    • Note: a sufficiently large market basket dataset is needed

      • so that state combinations that occur very infrequently are not assigned zero probability
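A sketch of this assignment (names are illustrative; probabilities are keyed by the exact set of ON sensors):

```python
from collections import Counter

def state_probabilities(basket_db):
    """basket_db: iterable of transactions (sets of items).
    P(state) = (# transactions equal to exactly that item set) / |D|."""
    counts = Counter(frozenset(t) for t in basket_db)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

probs = state_probabilities([{'A','C'}, {'A','C'}, {'B'}, {'A'}])
print(probs[frozenset({'A','C'})])   # N/|D| = 2/4 = 0.5
```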


Experiments – Data Gen

  • Generating sensor set data

    • Choose the initial state (at t=0s), either

      • randomly, according to the probabilities assigned, or

      • by picking the combination with the highest assigned probability

        ⇒ first sensor-set state


Experiment – Data Gen

  • What is the next set of sensor-set states?

    • For simplicity, in our model, only one sensor can be updated at a time

    • For any two adjacent updates, the sensor-set states at the two time instants differ by only one sensor

      ⇒ change only one sensor state

      ⇒ n possible next combinations, obtained by toggling each of the n sensor states

    • We normalize the probabilities of the n combinations by their sum

    • Pick the next set of sensor-set states according to the normalized probabilities

  • Inter-arrival time of updates: exponential distribution
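One simulation step under this model might look as follows (a sketch; probs is assumed to come from state_probabilities() above, with unseen states given probability 0):

```python
import random

def next_update(t, state, sensors, probs, mean_gap):
    """state: set of ON sensors. Returns (next timestamp, next state)."""
    reachable = [state ^ {r} for r in sensors]   # toggle each sensor in turn
    weights = [probs.get(frozenset(s), 0.0) for s in reachable]
    if sum(weights) == 0:                        # avoid a dead end
        weights = [1.0] * len(reachable)
    # random.choices normalizes the weights by their sum, as the slide describes
    [new_state] = random.choices(reachable, weights=weights)
    gap = random.expovariate(1.0 / mean_gap)     # exponential inter-arrival time
    return t + gap, new_state
```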


Experiments

  • Market Basket Dataset

    • 8,000,000 transactions

    • 100 items

    • number of maximal potentially large itemsets = 2000

    • average transaction length: 10

    • average length of maximal large itemsets: 4

    • length of the maximal large itemsets: 11

    • minimum support: 0.05%

  • Algorithms:

    • Apriori: cached mode

    • IL-apriori:

      • (a) random join (IL-apriori)

      • (b) join by smallest lifespan (IL-apriori-S)

      • (c) join by fewest number of intervals (IL-apriori-C)


Experiments - Results

  • Performance of algorithms (larger support):

    • All IL-apriori variants outperform cached Apriori


Experiments - Results

  • Performance (lower support):

    • More candidates ⇒ joining the interval lists becomes expensive for IL-apriori


Experiments - Results

  • With more long frequent sensor-sets:

    • Apriori has to match the candidates by searching through the DB

    • IL-apriori-C and IL-apriori-S save a lot of time in joining the lists


Experiments - Results

  • Peak memory usage:

    • Cached Apriori stores the whole database

    • IL-apriori stores a lot of interval lists when the number of candidates grows large


Experiments - Results

(min_sup = 0.02%)

  • Apriori is faster in the first 3 passes

  • Running time for IL-apriori drops sharply after the first few passes

    • Apriori has to scan over the whole database

    • IL-apriori (C/S) needs to join relatively short interval-lists in later passes


Experiments - Results

(min_sup = 0.02%)

  • Memory requirement for IL-apriori is much higher when there are more frequent sensor-set interval lists to join


Experiments - Results

(min_sup = 0.05%)

  • Runtime for all algorithms increases linearly with total number of transactions


Experiments - Results

(min_sup = 0.05%)

  • Memory required by all algorithms increases as the number of transactions increases.

  • The rate of increase is faster for IL-apriori


Conclusions

  • An interval-list method for mining sensor data was described

  • The two interval-list joining strategies are quite effective in reducing running time

  • Memory requirement is quite high

  • Future Work

    • Other methods for joining interval-lists

      • Trade-off between time and space

    • Extending to the streaming case

      • Consider approaches other than the Lossy Counting algorithm (G. S. Manku and R. Motwani, VLDB '02)


