Mining of frequent patterns from sensor data

Mining of Frequent Patterns from Sensor Data

Presented by: Ivy Tong Suk Man

Supervisor: Dr. B C M Kao

20 August, 2003


Outline

  • Outline of the Presentation

    • Motivation

    • Problem Definition

    • Algorithm

      • Apriori with data transformation

      • Interval-List Apriori

    • Experimental Results

    • Conclusion


Motivation

[Figure: temperature timeline — 25ºC from t=0, 27ºC from t=1, 28ºC from t=5, 26ºC from t=10]

  • Continuous items

    • reflect values from an entity that changes continuously in the external environment.

    • Update ⇒ change of state of the real entity

    • E.g. temperature reading data

      • Initial temperature: 25ºC at t=0s

      • Sequence of updates: <timestamp, new_temp>

        <1s, 27ºC>, <5s, 28ºC>, <10s, 26ºC>, <14s,..> …

      • t=0s to 1s, 25ºC

        t=1s to 5s, 27ºC

        t=5s to 10s, 28ºC

    • What is the average temperature from t=0s to 10s?

      • Ans: (25x1+27x4+28x5)/10 = 27.3ºC
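The answer above is a time-weighted mean: each reading is weighted by how long it was in force. A minimal sketch of this computation (our own code, not from the deck):

```python
# Time-weighted average of a piecewise-constant reading:
# value v held from t_prev to ts contributes v * (ts - t_prev).

def time_weighted_avg(initial, updates, t_end):
    """initial: value at t=0; updates: list of (timestamp, new_value)."""
    total, t_prev, v = 0.0, 0, initial
    for ts, new_v in updates:
        total += v * (ts - t_prev)   # v was in force from t_prev to ts
        t_prev, v = ts, new_v
    total += v * (t_end - t_prev)    # last value holds until t_end
    return total / t_end

# Slide example: 25ºC at t=0, updates <1s,27ºC>, <5s,28ºC>, <10s,26ºC>,
# queried over t=0s..10s
print(time_weighted_avg(25, [(1, 27), (5, 28), (10, 26)], 10))  # → 27.3
```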


Motivation

  • Time is a component in some applications

    • E.g. stock price quotes, network traffic data

  • “Sensors” are used to monitor some conditions, for example:

    • Prices of stocks: by getting quotations from a finance website

    • Weather: measuring temperature, humidity, air pressure, wind, etc.

  • We want to find correlations of the readings among a set of sensors

  • Goal: To mine association rules from sensor data


Challenges

  • How different is it from mining association rules from market basket data?

    • Time component

      When searching for association rules in market basket data, the time field is usually ignored, as there is no temporal correlation between transactions

    • Streaming data

      Data arrives continuously, possibly infinitely, and in large volume


Notations

  • We have a set of sensors R = {r1,r2,…,rm}

  • Each sensor ri has a set of numerical states Vi

    • Assume binary states for all sensors

    • Vi = {0,1} ∀i s.t. ri ∈ R

  • Dataset D: a sequence of updates of sensor state in the form of <ts, ri, vi> where ri ∈ R, vi ∈ Vi

    • ts : timestamp of the update

    • ri: sensor to be updated

    • vi: new value of the state of ri

    • For sensors with binary states

      • update in form of <ts, ri> as the new state can be inferred by toggling the old state


Example

  • R={A,B,C,D,E,F}

  • Initial states: all off

  • D:

    <1,A>

    <2,B>

    <4,D>

    <5,A>

    <6,E>

    <7,F>

    <8,E>

    <10,A>

    <11,F>

    <13,C>

[Figure: per-sensor ON/OFF timelines — A toggles at t=1, 5, 10; B at t=2; C at t=13; D at t=4; E at t=6, 8; F at t=7, 11]


More Notations

  • An association rule is a rule, satisfying certain support and confidence restrictions, in the form X → Y, where X ⊆ R, Y ⊆ R and X ∩ Y = ∅


More Notations

  • Association rule X → Y has confidence c:

    in c% of the time when the sensors in X are ON (with state = 1), the sensors in Y are also ON

  • Association rule X → Y has support s:

    in s% of the total length of history, the sensors in X and Y are all ON


More Notations

  • Let TLS(X) denote the Total LifeSpan of X

    • Total length of time that the sensors in X are ON

  • T – total length of history

  • Sup(X) = TLS(X)/T

    Conf(X → Y) = Sup(X ∪ Y) / Sup(X)

  • Example:

    T = 15s

    TLS(A)=9, TLS(AB)=8

    Sup(A) = 9/15 = 60%

    Sup(AB) = 8/15 ≈ 53%

    Conf(A → B) = 8/9 ≈ 89%
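The figures above can be verified in a few lines (a quick check of the arithmetic, not part of the original deck; T and the TLS values are taken from the slide):

```python
# Support = lifespan / total history; confidence = Sup(X ∪ Y) / Sup(X).
T = 15
TLS = {"A": 9, "AB": 8}

sup_A = TLS["A"] / T        # 9/15 = 60%
sup_AB = TLS["AB"] / T      # 8/15 ≈ 53%
conf_A_B = sup_AB / sup_A   # cancels to TLS(AB)/TLS(A) = 8/9 ≈ 89%
print(sup_A, sup_AB, conf_A_B)
```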

[Figure: timelines for A (ON over [1,5) and [10,15)) and B (ON over [2,15))]


Algorithm A

  • Transform & Apriori

    • Transform the sequence of updates to the form of market basket data

    • At each point of update

      • take a snapshot of the states of all sensors

      • Output all sensors with state=on as a transaction

      • Attach

        Weight(transaction)

        = Lifespan(this update)

        = timestamp(next update) – timestamp(this update)
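The transformation can be sketched as follows (an assumed implementation, not the authors' code; names like `transform` are ours). It replays the toggles and emits each snapshot of ON sensors as a transaction weighted by its lifespan; note the initial all-OFF period is emitted too:

```python
# Algorithm A, step 1 (sketch): turn a sequence of binary-sensor toggles
# into weighted market-basket transactions.

def transform(sensors, updates, t_end):
    """updates: list of (ts, sensor) toggles; all sensors start OFF.
    Returns a list of (frozenset_of_on_sensors, weight) transactions."""
    state = {s: False for s in sensors}
    txns, t_prev = [], 0
    for ts, s in updates:
        if ts > t_prev:  # snapshot was in force for ts - t_prev
            txns.append((frozenset(k for k, v in state.items() if v),
                         ts - t_prev))
        state[s] = not state[s]  # binary sensor: update toggles the state
        t_prev = ts
    # final snapshot holds until the end of history
    txns.append((frozenset(k for k, v in state.items() if v), t_end - t_prev))
    return txns

# The running example from the slides, with end of history at 15s:
D = [(1, "A"), (2, "B"), (4, "D"), (5, "A"), (6, "E"),
     (7, "F"), (8, "E"), (10, "A"), (11, "F"), (13, "C")]
txns = transform("ABCDEF", D, 15)
for items, w in txns:
    print(sorted(items), w)
```

The weights of all transactions sum to the total length of history (15s), and summing the weights of transactions containing both A and B recovers TLS(AB) = 8.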


Algorithm A - Example

  • Initial states: all off

  • D: <1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>, <11,F>,<13,C>

  • End of history = 15s

[Figure: animation over several slides building the transformed database D' — at each update, the snapshot of ON sensors is output as one transaction whose weight is the time until the next update]

Algorithm A

  • Apply Apriori on the transformed dataset D’

  • Drawbacks:

    • A lot of redundancy

    • Adjacent transactions may be very similar, differing only in the one sensor whose state was updated


Algorithm B

  • Interval-List Apriori

  • Uses an “interval-list” format

    • <X, interval1, interval2, interval3, … >

      where each intervali is a maximal interval in which all the sensors in X are ON.

    • TLS(X) = Σi (intervali.h − intervali.l)

  • Example:

[Figure: timeline for A — ON over [1,5) and [10,15), with T = 15]

<A, [1,5), [10,15)> TLS(A) = (5-1) + (15-10) = 9
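The TLS formula can be checked with a tiny helper (our own sketch; intervals are represented as (l, h) pairs):

```python
# TLS(X) = Σ (interval.h - interval.l) over X's interval list.
def tls(intervals):
    return sum(h - l for l, h in intervals)

print(tls([(1, 5), (10, 15)]))  # → 9, matching TLS(A) on the slide
```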


Algorithm B

  • Step 1:

    For each ri ∈ R,

    build a list of intervals in which ri is ON, by scanning the sequence of updates

  • Calculate the TLS of each ri

    • If TLS(ri) ≥ min_sup, put ri into L1
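Step 1 can be sketched as a single scan (an assumed implementation using the running example's data; all names are ours). A toggle turning ri ON opens an interval, the next toggle of ri closes it, and anything still open is closed at the end of history:

```python
# Algorithm B, step 1 (sketch): build per-sensor interval lists in one
# scan of the binary-sensor update sequence.

def build_interval_lists(sensors, updates, t_end):
    lists = {s: [] for s in sensors}
    open_at = {}                    # sensor -> time it turned ON
    for ts, s in updates:           # binary sensors: each update toggles
        if s in open_at:
            lists[s].append((open_at.pop(s), ts))  # close [on, off)
        else:
            open_at[s] = ts                        # open a new interval
    for s, t0 in open_at.items():
        lists[s].append((t0, t_end))               # still ON at end of history
    return lists

D = [(1, "A"), (2, "B"), (4, "D"), (5, "A"), (6, "E"),
     (7, "F"), (8, "E"), (10, "A"), (11, "F"), (13, "C")]
L = build_interval_lists("ABCDEF", D, 15)
print(L["A"])  # → [(1, 5), (10, 15)]
```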


Algorithm B – Example

  • Initial states: all off

  • D:

    <1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>,<11,F>,<13,C>

  • Scanning the updates builds the lists incrementally: an update turning ri ON opens an interval [ts, ?), and the next update of ri closes it. At the end of history (T = 15s) the lists are:

  • <A, [1,5),[10,15)>

  • <B, [2,15)>

  • <C, [13,15)>

  • <D, [4,15)>

  • <E, [6,8)>

  • <F, [7,11)>


Algorithm B

  • Step 2:

    • Find all larger frequent sensor-sets

  • Similar to the Apriori frequent-itemset property

    • Any subset of a frequent sensor-set must be frequent.

  • Method:

    • Generate candidates of size i+1 from frequent sensor-sets of size i.

    • Approach used: join two size-i frequent sensor-sets that agree on i−1 elements to obtain a candidate of size i+1

    • May also prune candidates that have a subset that is not large.

    • Count the support by merging (intersecting) the interval lists of the two size-i frequent sensor-sets

    • If sup ≥ min_sup, put the candidate into Li+1

    • Repeat the process until the candidate set is empty


Algorithm B

  • Example:

    • <A, [1,5), [10,15)>

    • <B, [2,15)>

    • <AB, [2,5),[10,15)>

[Figure: timelines for A and B over T = 15; the overlap of their ON intervals gives the intervals of AB]
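The interval-list merge can be sketched as a linear scan (our own code; the slides do not give an implementation). Both lists are sorted, so one pass over each suffices, giving the O(m+n) cost noted later:

```python
# Intersect two sorted interval lists in O(m + n) time by a merge-scan.
def intersect(xs, ys):
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        lo = max(xs[i][0], ys[j][0])
        hi = min(xs[i][1], ys[j][1])
        if lo < hi:
            out.append((lo, hi))      # overlapping part of the two intervals
        if xs[i][1] < ys[j][1]:       # advance the list whose interval ends first
            i += 1
        else:
            j += 1
    return out

# A = [1,5), [10,15); B = [2,15)  =>  AB = [2,5), [10,15), as on the slide
print(intersect([(1, 5), (10, 15)], [(2, 15)]))  # → [(2, 5), (10, 15)]
```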


Algorithm B (Example)

[Figure: candidate lattice with lifespans, min support count = 3 — singletons A (LS 9), B (LS 13), C (LS 2), D (LS 11), E (LS 2), F (LS 4); pairs AB (LS 8), AD (LS 6), AF (LS 1), BD (LS 11), BF (LS 4); triple ABD (LS 6)]


Algorithm B – Candidate Generation

  • When generating a candidate sensor-set C of size i from two size i-1 sensor-sets LA and LB (subsets of C), we also construct the interval list of C by intersecting the interval lists of LA and LB.

  • Joining the two interval lists (of length m and n) is a key step in our algorithm

    • A simple linear scan requires O(m+n) time

  • There are i different size-(i−1) subsets of C

    which two should we pick?


Algorithm B – Candidate Generation

  • Method 1:

    • Choose the two lists with the fewest number of intervals

    • ⇒ store the number of intervals for each sensor-set

  • Method 2:

    • Choose the two lists with the smallest count (TLS)

    • Intuitively, a shorter lifespan implies fewer intervals

    • Easier to implement

      • The lifespan is already at hand from checking whether the sensor-set is frequent


Experiments

  • Data generation

    • Simulate data generated by a set of n binary sensors

    • Make use of a standard market basket dataset

    • With n sensors, each of which can be either on or off

      ⇒ 2^n possible combinations of sensor states

    • Assign a probability to each of the combinations


Experiments – Data Gen

  • How to assign the probabilities?

    • Let N be the number of occurrences of the market basket transaction that contains exactly the sensors that are ON

      • E.g. consider R={A,B,C,D,E,F}

      • Suppose we want to assign a probability to the sensor state AC (only A and C are ON)

      • N is the number of transactions that contain exactly A and C and nothing else

    • Assign prob = N/|D|, where |D| is the size of the market basket dataset

    • Note: we need sufficiently large market basket data

      • so that transactions that occur very infrequently are not given zero probability
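The probability assignment can be sketched as follows (our own code under the assumptions that baskets are sets of item names and that "exactly" means set equality):

```python
# Assign P(state) = N / |D|, where N counts baskets whose item set equals
# exactly the set of ON sensors in that state.
from collections import Counter

def state_probabilities(baskets):
    counts = Counter(frozenset(b) for b in baskets)
    total = len(baskets)
    return {state: n / total for state, n in counts.items()}

# Toy dataset: 2 of 4 baskets are exactly {A, C}
baskets = [{"A", "C"}, {"A", "C"}, {"A"}, {"B"}]
P = state_probabilities(baskets)
print(P[frozenset({"A", "C"})])  # → 0.5
```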


Experiments – Data Gen

  • Generating sensor set data

    • Choose the initial state (at t=0s)

      • Randomly

      • According to the probabilities assigned

      • Pick the combination with the highest probability assigned

        ⇒ the first sensor-set state


Experiment – Data Gen

  • What is the next set of sensor-set states?

    • For simplicity, in our model, only one sensor can be updated at a time

    • For any two adjacent updates, the sensor-set states at the two time instants differ by only one sensor

      ⇒ change only one sensor state

      ⇒ n possible combinations, one for toggling each of the n sensor states

    • We normalize the probabilities of the n combinations by their sum

    • Pick the next set of sensor-set states according to the normalized probabilities

  • Inter-arrival time of updates: exponential distribution
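One simulation step can be sketched as follows (our own code, assuming states are sets of ON sensors and `prob` is the dict of state probabilities from the previous step; the fallback for all-zero neighbour weights is our addition):

```python
# One step of the data generator: from the current state, consider the n
# states reachable by toggling one sensor, renormalize their assigned
# probabilities, and draw the next state; dwell time is exponential.
import random

def next_state(state, sensors, prob, rng):
    neighbours = [state ^ {s} for s in sensors]      # toggle one sensor each
    weights = [prob.get(frozenset(n), 0.0) for n in neighbours]
    total = sum(weights)
    if total == 0:                                   # no data for any neighbour
        return set(rng.choice(neighbours)), rng.expovariate(1.0)
    r, acc = rng.random() * total, 0.0               # draw per normalized weights
    for n, w in zip(neighbours, weights):
        acc += w
        if r <= acc:
            return set(n), rng.expovariate(1.0)      # (next state, dwell time)
    return set(neighbours[-1]), rng.expovariate(1.0)
```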


Experiments

  • Market Basket Dataset

    • 8,000,000 transactions

    • 100 items

    • number of maximal potentially large itemsets = 2000

    • average transaction length: 10

    • average length of maximal large itemsets: 4

    • length of the maximal large itemsets: 11

    • minimum support: 0.05%


  • Algorithms:

    • Apriori: cached mode

    • IL-apriori:

      • (a) random-join (IL-apriori)

      • (b) join-by-smallest lifespan (IL-apriori-S)

      • (c) join-by-fewest-no-of-intervals (IL-apriori-C)


Experiments - Results

  • Performance of algorithms (larger support):

    • All IL-apriori algorithms outperform cache apriori


Experiments - Results

  • Performance (lower support):

    • More candidates ⇒ for IL-apriori, joining the interval lists becomes expensive


Experiments - Results

  • With more long frequent sensor-sets:

    • Apriori has to match the candidates by searching through the DB

    • IL-apriori-C and IL-apriori-S save a lot of the time spent joining the lists


Experiments - Results

  • Amount of memory used (peak memory usage)

  • Cache apriori - stores the whole database

  • IL-apriori – stores a lot of interval lists when the number of candidates grows large


Experiments - Results

(min_sup = 0.02%)

  • Apriori is faster in the first 3 passes

  • Running time for IL-apriori drops sharply afterwards

    • Apriori has to scan over the whole database in every pass

    • IL-apriori (C/S) only needs to join relatively short interval-lists in later passes


Experiments - Results

(min_sup = 0.02%)

  • Memory requirement for IL-apriori is a lot higher when there are more frequent sensor-set interval lists to join


Experiments - Results

(min_sup = 0.05%)

  • Runtime for all algorithms increases linearly with total number of transactions


Experiments - Results

(min_sup = 0.05%)

  • Memory required by all algorithms increases as the number of transactions increases.

  • The rate of increase for IL-apriori is faster


Conclusions

  • An interval-list method for mining frequent patterns from sensor data was described

  • The two interval-list joining strategies are quite effective in reducing running time

  • The memory requirement is quite high

  • Future Work

    • Other methods for joining interval-lists

      • Trade-off between time and space

    • Extending to the streaming case

      • Consider approaches other than the Lossy Counting algorithm (Manku and Motwani, VLDB ’02)

