
### Mining of Frequent Patterns from Sensor Data

Presented by: Ivy Tong Suk Man

Supervisor: Dr. B C M Kao

20 August, 2003

Outline

- Outline of the Presentation
- Motivation
- Problem Definition
- Algorithm
- Apriori with data transformation
- Interval-List Apriori
- Experimental Results
- Conclusion

[Figure: temperature plotted against time t, with updates at t = 1, 5 and 10]
Motivation - Continuous items

- Continuous items reflect values from an entity that changes continuously in the external environment.
- Update: a change of state of the real entity
- E.g. temperature reading data
- Initial temperature: 25ºC at t=0s
- Sequence of updates: <timestamp, new_temp>

<1s, 27ºC>, <5s, 28ºC>, <10s, 26ºC>, <14s,..> …

- t=0s to 1s, 25ºC
- t=1s to 5s, 27ºC
- t=5s to 10s, 28ºC

- What is the average temperature from t=0s to 10s?
- Ans: (25x1+27x4+28x5)/10 = 27.3ºC
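
The time-weighted average above can be sketched in a few lines of Python. This is only an illustration: the function name `time_weighted_average` and its parameters are assumptions, and the updates are assumed to be sorted by timestamp.

```python
def time_weighted_average(initial_value, updates, end_time, start_time=0.0):
    """Average of a piecewise-constant reading over [start_time, end_time).

    `updates` is a list of (timestamp, new_value) pairs sorted by timestamp
    (hypothetical helper, not part of the original slides).
    """
    total = 0.0
    current_value, current_time = initial_value, start_time
    for ts, new_value in updates:
        if ts >= end_time:
            break
        # The old value was valid from current_time until this update.
        total += current_value * (ts - current_time)
        current_value, current_time = new_value, ts
    # The last value holds until the end of the window.
    total += current_value * (end_time - current_time)
    return total / (end_time - start_time)


updates = [(1, 27), (5, 28), (10, 26)]
average = time_weighted_average(25, updates, end_time=10)
print(round(average, 1))  # 27.3
```

Each reading is weighted by how long it stayed valid, which is the same idea the later slides use for support.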

Motivation

- Time is a component in some applications
- E.g. stock price quotes, network traffic data
- “Sensors” are used to monitor some conditions, for example:
- Prices of stocks: by getting quotations from a finance website
- Weather: measuring temperature, humidity, air pressure, wind, etc.
- We want to find correlations of the readings among a set of sensors
- Goal: To mine association rules from sensor data

Challenges

- How different is it from mining association rules from market basket data?
- Time component

When searching for association rules in market basket data, the time field is usually ignored, as there is no temporal correlation between the transactions

- Streaming data

Data arrives continuously, possibly infinitely, and in large volume

Notations

- We have a set of sensors R = {r1,r2,…,rm}
- Each sensor ri has a set of numerical states Vi
- Assume binary states for all sensors
- Vi = {0,1} ∀i s.t. ri ∈ R
- Dataset D: a sequence of updates of sensor state in the form of <ts, ri, vi> where ri ∈ R, vi ∈ Vi
- ts : timestamp of the update
- ri: sensor to be updated
- vi: new value of the state of ri
- For sensors with binary states
- update in form of <ts, ri> as the new state can be inferred by toggling the old state
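
As a small illustration of the toggle form, the state at any time can be recovered by replaying the `<ts, ri>` updates. The helper name `sensors_on_at` is hypothetical, and the sketch assumes all sensors start OFF and the updates are sorted by timestamp.

```python
def sensors_on_at(toggles, t):
    """Return the set of sensors that are ON at time t.

    Assumes every sensor starts OFF and `toggles` is a time-sorted list of
    (timestamp, sensor) pairs; each update flips that sensor's state.
    """
    on = set()
    for ts, sensor in toggles:
        if ts > t:
            break
        on ^= {sensor}  # symmetric difference toggles membership
    return on


toggles = [(1, "A"), (2, "B"), (5, "A")]
print(sorted(sensors_on_at(toggles, 4)))  # ['A', 'B']
print(sorted(sensors_on_at(toggles, 5)))  # ['B']
```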

Example

- R={A,B,C,D,E,F}
- Initial states: all off
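- D: <1,A>, <2,B>, <4,D>, <5,A>, <6,E>, <7,F>, <8,E>, <10,A>, <11,F>, <13,C>

[Figure: ON/OFF timelines of sensors A to F; A toggles at t = 1, 5, 10; B at t = 2; C at t = 13; D at t = 4; E at t = 6, 8; F at t = 7, 11]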

More Notations

- An association rule is a rule, satisfying certain support and confidence restrictions, in the form X → Y, where X ⊂ R, Y ⊂ R and X ∩ Y = ∅

More Notations

- Association rule X → Y has confidence c:

In c% of the time when the sensors in X are ON (with state = 1), the sensors in Y are ON

- Association rule X → Y has support s:

In s% of the total length of history, the sensors in X and Y are ON

More Notations

- TLS(X) denotes the Total LifeSpan of X
- Total length of time that the sensors in X are ON
- T: total length of history
- Sup(X) = TLS(X)/T

Conf(X → Y) = Sup(X ∪ Y) / Sup(X)

- Example:

T = 15s

TLS(A)=9, TLS(AB)=8

Sup(A) = 9/15 = 60%

Sup(AB) =8/15 = 53%

Conf(A->B) = 8/9 = 89%
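
The example computes the confidence directly from lifespans because the history length T cancels out. Written out, using only the definitions from this slide:

```latex
\mathrm{Conf}(X \rightarrow Y)
  = \frac{\mathrm{Sup}(X \cup Y)}{\mathrm{Sup}(X)}
  = \frac{\mathrm{TLS}(X \cup Y)/T}{\mathrm{TLS}(X)/T}
  = \frac{\mathrm{TLS}(X \cup Y)}{\mathrm{TLS}(X)},
\qquad
\mathrm{Conf}(A \rightarrow B)
  = \frac{\mathrm{TLS}(AB)}{\mathrm{TLS}(A)}
  = \frac{8}{9} \approx 89\%
```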

[Figure: ON/OFF timelines of sensors A and B; A toggles at t = 1, 5, 10 and B at t = 2]

Algorithm A

- Transform & Apriori
- Transform the sequence of updates to the form of market basket data
- At each point of update
- take a snapshot of the states of all sensors
- Output all sensors with state=on as a transaction
- Attach Weight(transaction) = Lifespan(this update) = timestamp(next update) – timestamp(this update)
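
A minimal sketch of this transformation, assuming all sensors start OFF and that the end of history is known. The names `to_weighted_transactions` and `weighted_support` are hypothetical. The empty snapshot before the first update is skipped here because it cannot contribute to any support.

```python
def to_weighted_transactions(toggles, end_of_history):
    """Turn time-sorted (timestamp, sensor) toggles into weighted transactions.

    Each transaction is (timestamp, sensors that are ON, weight), where the
    weight is the time until the next update (or the end of history).
    """
    transactions = []
    on = set()
    next_times = [ts for ts, _ in toggles[1:]] + [end_of_history]
    for (ts, sensor), next_ts in zip(toggles, next_times):
        on ^= {sensor}  # apply the toggle, then take the snapshot
        transactions.append((ts, frozenset(on), next_ts - ts))
    return transactions


def weighted_support(transactions, sensor_set):
    """Sum the weights of the transactions that contain every sensor in sensor_set."""
    return sum(w for _, items, w in transactions if sensor_set <= items)


D = [(1, "A"), (2, "B"), (4, "D"), (5, "A"), (6, "E"),
     (7, "F"), (8, "E"), (10, "A"), (11, "F"), (13, "C")]
D_prime = to_weighted_transactions(D, end_of_history=15)
for ts, items, weight in D_prime:
    print(ts, "".join(sorted(items)), weight)
print(weighted_support(D_prime, frozenset("AB")))  # 8
```

With weights in place of counts, the weighted support of a sensor-set equals its total lifespan, so a standard Apriori pass only needs to add weights instead of incrementing by one.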

Algorithm A - Example

Initial states: all off

D: <1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>, <11,F>,<13,C>

End of history = 15s

[Figure: ON/OFF timelines of sensors A to F, with a snapshot taken at each update timestamp (1, 2, 4, …, 13) and appended to the transformed database D’]

Algorithm A

- Apply Apriori on the transformed dataset D’
- Drawbacks:
- A lot of redundancy
- Adjacent transactions may be very similar, differing only in the one sensor whose state was updated

Algorithm B

- Interval-List Apriori
- Uses an “interval-list” format
- <X, interval1, interval2, interval3, … >

where intervali is the interval in which all sensors in X are on.

- TLS(X) = Σi (intervali.h – intervali.l)
- Example:

<A, [1,5), [10,15)> TLS(A) = (5-1) + (15-10) = 9
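
In code, an interval list can be held as a list of half-open `(low, high)` pairs, and the total lifespan is a one-line sum. The representation and the name `total_lifespan` are assumptions made for illustration.

```python
def total_lifespan(intervals):
    """Sum the lengths of half-open [low, high) intervals."""
    return sum(high - low for low, high in intervals)


interval_list_a = [(1, 5), (10, 15)]  # <A, [1,5), [10,15)>
print(total_lifespan(interval_list_a))  # 9
```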

Algorithm B

- Step 1:

For each ri ∈ R,

build a list of intervals in which ri is ON by scanning the sequence of updates

- Calculate the TLS of each ri
- If TLS(ri) ≥ min_sup, put ri into L1
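
Step 1 could be sketched as a single scan over the toggle updates. The function names are hypothetical, all sensors are assumed to start OFF, and `min_sup` is taken here as a lifespan threshold in the same time unit as the timestamps.

```python
def build_interval_lists(sensors, toggles, end_of_history):
    """Scan time-sorted (timestamp, sensor) toggles and build one interval list per sensor."""
    lists = {s: [] for s in sensors}
    opened_at = {}  # sensor -> time it was switched ON and is still ON
    for ts, sensor in toggles:
        if sensor in opened_at:
            # Second toggle closes the open interval.
            lists[sensor].append((opened_at.pop(sensor), ts))
        else:
            opened_at[sensor] = ts
    # Sensors still ON at the end of history get their interval closed there.
    for sensor, start in opened_at.items():
        lists[sensor].append((start, end_of_history))
    return lists


def frequent_single_sensors(lists, min_sup):
    """Keep the sensors whose total lifespan reaches min_sup (the set L1)."""
    return {s: iv for s, iv in lists.items()
            if sum(high - low for low, high in iv) >= min_sup}


D = [(1, "A"), (2, "B"), (4, "D"), (5, "A"), (6, "E"),
     (7, "F"), (8, "E"), (10, "A"), (11, "F"), (13, "C")]
interval_lists = build_interval_lists("ABCDEF", D, end_of_history=15)
L1 = frequent_single_sensors(interval_lists, min_sup=3)
print(interval_lists["A"])  # [(1, 5), (10, 15)]
print(sorted(L1))           # ['A', 'B', 'D', 'F']
```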

Algorithm B – Example

- Initial states: all off
- D:

<1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>,<11,F>,<13,C>

- <A, empty>
- <B, empty>
- <C, empty>
- <D, empty>
- <E, empty>
- <F, empty>

Algorithm B – Example

- Initial states: all off
- D:

<1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>,<11,F>,<13,C>

- <A, [1,?)>
- <B, empty>
- <C, empty>
- <D, empty>
- <E, empty>
- <F, empty>

Algorithm B – Example

- Initial states: all off
- D:

<1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>,<11,F>,<13,C>

- <A, [1,?)>
- <B, [2,?)>
- <C, empty>
- <D, empty>
- <E, empty>
- <F, empty>

Algorithm B – Example

- Initial states: all off
- D:

<1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>,<11,F>,<13,C>

- <A, [1,5)>
- <B, [2,?)>
- <C, empty>
- <D, [4,?)>
- <E, empty>
- <F, empty>

Algorithm B – Example

- Initial states: all off
- D:

<1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>,<11,F>,<13,C>

- <A, [1,5),[10,?)>
- <B, [2,?)>
- <C, [13,?)>
- <D, [4,?)>
- <E, [6,8)>
- <F, [7,11)>

Algorithm B – Example

- Initial states: all off
- D:

<1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>,<11,F>,<13,C>

- <A, [1,5),[10,15)>
- <B, [2,15)>
- <C, [13,15)>
- <D, [4,15)>
- <E, [6,8)>
- <F, [7,11)>

End of history T =15s

Algorithm B

- Step 2:
- Find all larger frequent sensor-sets
- Similar to the Apriori frequent itemset property
- Any subset of a frequent sensor-set must be frequent.
- Method:
- Generate candidates of size i+1 from frequent sensor-sets of size i.
- Approach used: join to obtain sensor-sets of size i+1 if two size-i frequent sensor-sets agree on i-1 sensors
- May also prune candidates that have subsets that are not large (frequent).
- Count the support by merging (intersection of) the interval lists of the two size-i frequent sensor-sets
- If sup ≥ min_sup, put into Li+1
- Repeat the process until the candidate set is empty
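
The join and prune steps can be sketched as follows, in the style of the usual Apriori candidate generation. The function name `generate_candidates` and the representation of sensor-sets as sorted tuples are assumptions; support counting over interval lists is left out of this sketch.

```python
from itertools import combinations


def generate_candidates(frequent_sets):
    """Generate size-(i+1) candidates from size-i frequent sensor-sets.

    Each sensor-set is a sorted tuple; all tuples must have the same size i.
    """
    frequent = set(frequent_sets)
    candidates = []
    for a, b in combinations(sorted(frequent), 2):
        # Join: the two sets agree on their first i-1 sensors.
        if a[:-1] == b[:-1]:
            candidate = a + (b[-1],)
            # Prune: every size-i subset of the candidate must be frequent.
            if all(sub in frequent for sub in combinations(candidate, len(a))):
                candidates.append(candidate)
    return candidates


# Illustrative input: four frequent pairs.
L2 = [("A", "B"), ("A", "D"), ("B", "D"), ("B", "F")]
print(generate_candidates(L2))  # [('A', 'B', 'D')]
```

Here the candidate BDF is produced by the join but pruned, because its subset DF is not in the illustrative input.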

Algorithm B (Example)

- Min support count: 3
- Size 1: A (LS:9), B (LS:13), C (LS:2), D (LS:11), E (LS:2), F (LS:4)
- Size 2: AB (LS:8), AD (LS:6), AF (LS:1), BD (LS:11), BF (LS:4)
- Size 3: ABD (LS:6)

Algorithm B – Candidate Generation

- When generating a candidate sensor-set C of size i from two size i-1 sensor-sets LA and LB (subsets of C), we also construct the interval list of C by intersecting the interval lists of LA and LB.
- Joining the two interval lists (of length m and n) is a key step in our algorithm
- A simple linear scan requires O(m+n) time
- There are i different size i-1 subsets of C: which two to pick?
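
A linear-scan join of two interval lists might look like the sketch below. It assumes each list is sorted and holds disjoint half-open `(low, high)` intervals; the function name is hypothetical.

```python
def intersect_interval_lists(list_a, list_b):
    """Intersect two sorted lists of disjoint [low, high) intervals in O(m + n)."""
    result = []
    i = j = 0
    while i < len(list_a) and j < len(list_b):
        low = max(list_a[i][0], list_b[j][0])
        high = min(list_a[i][1], list_b[j][1])
        if low < high:
            result.append((low, high))
        # Advance the list whose current interval ends first.
        if list_a[i][1] < list_b[j][1]:
            i += 1
        else:
            j += 1
    return result


list_a = [(1, 5), (10, 15)]  # <A, [1,5), [10,15)>
list_b = [(2, 15)]           # <B, [2,15)>
list_ab = intersect_interval_lists(list_a, list_b)
print(list_ab)                            # [(2, 5), (10, 15)]
print(sum(h - l for l, h in list_ab))     # 8
```

Each step advances one of the two cursors, so the loop runs at most m + n times.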

Algorithm B – Candidate Generation

- Method 1:
- Choose the two lists with the fewest intervals
- => Store the number of intervals for each sensor-set
- Method 2:
- Choose the two lists with the smallest count (TLS)
- Intuitively, a shorter lifespan implies fewer intervals
- Easier to implement
- The lifespan is already available from checking whether the sensor-set is frequent
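
Both selection heuristics can be sketched with one helper that ranks the size i-1 subsets of a candidate. The name `pick_join_pair`, its `by` parameter and the dictionary layout are assumptions for illustration. Any two distinct size i-1 subsets cover the whole candidate, so either choice yields the same intersected list.

```python
from itertools import combinations


def pick_join_pair(candidate, interval_lists, by="lifespan"):
    """Pick the two size-(i-1) subsets of `candidate` to join.

    `interval_lists` maps a sorted tuple of sensors to its interval list.
    by="lifespan" ranks by smallest TLS (Method 2);
    by="count" ranks by fewest intervals (Method 1).
    """
    def lifespan(subset):
        return sum(high - low for low, high in interval_lists[subset])

    def interval_count(subset):
        return len(interval_lists[subset])

    key = lifespan if by == "lifespan" else interval_count
    subsets = list(combinations(candidate, len(candidate) - 1))
    return tuple(sorted(subsets, key=key)[:2])


lists = {
    ("A", "B"): [(2, 5), (10, 15)],
    ("A", "D"): [(4, 5), (10, 15)],
    ("B", "D"): [(4, 15)],
}
print(pick_join_pair(("A", "B", "D"), lists, by="lifespan"))  # (('A', 'D'), ('A', 'B'))
print(pick_join_pair(("A", "B", "D"), lists, by="count"))     # (('B', 'D'), ('A', 'B'))
```

On this toy input the two heuristics pick different pairs, which shows that a shorter lifespan is only a proxy for a shorter list.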

Experiments

- Data generation
- Simulate data generated by a set of n binary sensors
- Make use of a standard market basket dataset
- With n sensors, each of which can be either on or off

=> 2^n possible combinations of sensor states

- Assign a probability to each of the combinations

Experiments – Data Gen

- How to assign the probabilities?
- Let N be the number of occurrences in the market basket data of the transaction that contains exactly the sensors that are ON
- E.g. Consider R={A,B,C,D,E,F}
- Suppose we want to assign a probability to the sensor state AC (only A and C are ON)
- N is the number of transactions that contain exactly A and C
- Assign prob = N/|D|, where |D| is the size of the market basket dataset
- Note: Need sufficiently large market basket data
- so that transactions that occur very infrequently are not given zero probability
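
The probability assignment amounts to counting exact matches. A minimal sketch, with a made-up four-transaction basket and a hypothetical function name:

```python
from collections import Counter


def state_probabilities(transactions):
    """Map each exact item combination to N / |D|.

    Combinations that never occur are simply absent (probability zero).
    """
    counts = Counter(frozenset(t) for t in transactions)
    total = len(transactions)
    return {state: n / total for state, n in counts.items()}


# Toy market basket data, invented for illustration only.
baskets = [{"A", "C"}, {"A", "C"}, {"B"}, {"A", "B", "C"}]
probs = state_probabilities(baskets)
print(probs[frozenset("AC")])  # 0.5
```

Note that the basket {A, B, C} does not count towards the state AC, since the match must be exact.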

Experiments – Data Gen

- Generating sensor set data
- Choose the initial state (at t=0s)
- Randomly
- According to the probabilities assigned
- Pick the combination with highest probability assigned

=> first sensor set states

Experiment – Data Gen

- What is the next set of sensor-set states?
- For simplicity, in our model, only one sensor can be updated at a time
- For any two adjacent updates, the sensor-set states at the two time instants differ by only one sensor

=> change only one sensor state

=> n possible combinations by toggling each of the n sensor states

- We normalize the probabilities of the n combinations by their sum
- Pick the next set of sensor-set states according to the normalized probabilities
- Inter-arrival time of updates: exponential distribution
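
The update generator described above might be sketched as follows. The function name, the `mean_gap` and `seed` parameters, and the choice to stop when no neighbouring state has any probability are all assumptions. `random.choices` accepts relative weights, which has the same effect as normalizing the n probabilities by their sum.

```python
import random


def generate_updates(sensors, probs, initial_state, n_updates, mean_gap=1.0, seed=0):
    """Generate (timestamp, toggled sensor) updates, one sensor per update.

    `probs` maps frozensets of ON sensors to probabilities.
    """
    rng = random.Random(seed)
    state = frozenset(initial_state)
    t = 0.0
    updates = []
    for _ in range(n_updates):
        # The n neighbouring states, each reached by toggling exactly one sensor.
        neighbours = [(s, state ^ {s}) for s in sensors]
        weights = [probs.get(nxt, 0.0) for _, nxt in neighbours]
        if sum(weights) == 0:
            break  # assumption: stop when no neighbouring state is possible
        sensor, state = rng.choices(neighbours, weights=weights, k=1)[0]
        # Exponentially distributed inter-arrival time with the given mean.
        t += rng.expovariate(1.0 / mean_gap)
        updates.append((t, sensor))
    return updates


# Toy probabilities over two sensors, invented for illustration only.
toy_probs = {frozenset(): 0.2, frozenset("A"): 0.3,
             frozenset("AB"): 0.3, frozenset("B"): 0.2}
toy_updates = generate_updates(["A", "B"], toy_probs, initial_state=(), n_updates=20)
print(len(toy_updates))
```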

Experiments

- Market Basket Dataset
- 8,000,000 transactions
- 100 items
- number of maximal potentially large itemsets = 2000
- average transaction length: 10
- average length of maximal large itemsets: 4
- length of the maximal large itemsets: 11
- minimum support: 0.05%
- length of the maximal large itemsets: ?
- Algorithms:
- Apriori: cached mode
- IL-apriori:
- (a) random-join (IL-apriori)
- (b) join-by-smallest lifespan (IL-apriori-S)
- (c) join-by-fewest-no-of-intervals (IL-apriori-C)

Experiments - Results

- Performance of algorithms (larger support):
- All IL-apriori algorithms outperform cache apriori

Experiments - Results

- Performance (lower support):
- More candidates => IL-apriori: Expensive to join interval lists

Experiments - Results

- More long frequent sensor-sets
- Apriori has to match the candidates by searching through the DB
- IL-apriori-C and IL-apriori-S reduce a lot of time in joining the lists

Experiments - Results

- Amounts of memory usage - peak memory usage
- Cache apriori - store the whole database
- IL-apriori – store a lot of interval lists when no of candidates is growing large

Experiments - Results

(min_sup = 0.02%)

- Apriori is faster in the first 3 passes
- Running time for IL-apriori drops sharply afterwards
- Apriori has to scan over the whole database
- IL-apriori (C/S) needs to join relatively short interval-lists in later passes

Experiments - Results

(min_sup = 0.02%)

- Memory requirement for IL-apriori is a lot higher when there are more frequent sensor-set interval lists to join

Experiments - Results

(min_sup = 0.05%)

- Runtime for all algorithms increases linearly with total number of transactions

Experiments - Results

(min_sup = 0.05%)

- Memory required by all algorithms increases as no of transactions increases.
- Rate of increase in IL-apriori is faster

Conclusions

- Interval-list method to mine sensor data is described
- Two interval list joining strategies are quite effective in reducing running time
- Memory requirement is quite high
- Future Work
- Other methods for joining interval-lists
- Trade-off between time and space
- Extending to the streaming case
- Consider approaches other than the Lossy Counting algorithm (G. S. Manku and R. Motwani, VLDB’02)
