Continuous Data Stream Processing

Continuous Data Stream Processing. MAKE Lab. Post-Excellence Project Subproject 6. Date: 2006/03/07. Peer search engine. Profile database. Cluster coordinator. Cluster monitor. Music channel simulator. XML Filtering engine. MusicXML database. Music Virtual Channel. Clustering


Presentation Transcript


  1. Continuous Data Stream Processing. MAKE Lab, Post-Excellence Project, Subproject 6. Date: 2006/03/07

  2. [System architecture diagram. Labeled components: Peer search engine; Profile database; Cluster coordinator; Cluster monitor; Music channel simulator; XML Filtering engine; MusicXML database; Music Virtual Channel; Clustering engine; Interface; Channel monitor; Profile monitor; Favorite channel; Internet; V.C. player; Filtering engine; Music metadata; Music collections. Some components are numbered 1, 2, …, N.]

  3. Research Directions [topic diagram]: Sequence Query Matching; Temporal Query Processing; Episode Query Matching; Range Search; Filtering; Spatial Query Processing; KNN Search; Aggregate Query Processing; Streaming Data Management; Top-K Search; Closed Tree Pattern Mining; Frequent Tree Pattern Mining; Mining; Frequent Itemset Mining (sliding window); Frequent Itemset Mining (landmark model)

  4. Sequence Query Matching
     • Given a set of sequence queries (SQs), how can we continuously monitor the event stream and, as soon as a segment arrives, report it if it approximately answers some query within that query's error bound?
     • Event Stream: <a,b,c,d><c,e><a,b,c><b,d><a,d><e,f><a,e><a,b,c><e,f><a,b,c><e><b,c,e><d,f>···
     • Sequence Query: <a,b,c><b,d><a,c,d><e,f><a,e>, ε=1
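The matching problem above can be sketched in a few lines. The slide does not define the error measure, so this sketch assumes one: the error of a window is the total number of items that appear in exactly one of the paired itemsets (stream itemset versus query itemset). The names `itemset_error` and `match_stream` are hypothetical, and a real matcher for many queries would share work between them rather than rescan each window.

```python
from collections import deque


def itemset_error(a, b):
    """Number of items present in exactly one of the two itemsets (assumed measure)."""
    return len(set(a) ^ set(b))


def match_stream(events, query, eps):
    """Yield (start_index, error) for every window whose error is within eps."""
    window = deque(maxlen=len(query))  # the most recent len(query) itemsets
    for i, itemset in enumerate(events):
        window.append(itemset)
        if len(window) < len(query):
            continue  # not enough events yet to compare against the query
        error = sum(itemset_error(w, q) for w, q in zip(window, query))
        if error <= eps:
            yield i - len(query) + 1, error


# Event stream and sequence query from the slide, one string per itemset.
stream = ["abcd", "ce", "abc", "bd", "ad", "ef", "ae",
          "abc", "ef", "abc", "e", "bce", "df"]
query = ["abc", "bd", "acd", "ef", "ae"]
matches = list(match_stream(stream, query, eps=1))
```

Under this assumed measure the only reported segment starts at the third itemset: `<a,b,c><b,d><a,d><e,f><a,e>` differs from the query by the single missing item `c`.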

  5. Episode Query Matching
     • Knowledge Discovery from Telecommunication Network Alarm Databases [ICDE96]
     • If an alarm of type A occurs, then an alarm of type B occurs within 30 seconds with probability 0.8.
     • If alarms of types A and B occur within 5 seconds, then an alarm of type C occurs within 60 seconds with probability 0.7.
     • If an alarm of type A precedes an alarm of type B, and C precedes D, all within 15 seconds, then E will follow within 4 minutes with probability 0.6.
     [Figure: alarm timeline with events B, A, A, A, B, C, D and windows of 15 seconds and 5 seconds]
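To make the first rule concrete, the sketch below estimates the probability in a rule of the form "if A occurs, then B occurs within w seconds" from a timestamped alarm log. It is a minimal illustration, not the method of the cited paper; the function name `rule_confidence` and the sample log are hypothetical.

```python
from bisect import bisect_right


def rule_confidence(events, antecedent, consequent, window):
    """Fraction of `antecedent` alarms followed by a `consequent` alarm
    within `window` seconds. `events` is a time-sorted list of (time, type)."""
    cons_times = [t for t, kind in events if kind == consequent]
    ante_times = [t for t, kind in events if kind == antecedent]
    if not ante_times:
        return None  # the rule never fires, so no estimate is possible
    hits = 0
    for t in ante_times:
        j = bisect_right(cons_times, t)  # first consequent strictly after t
        if j < len(cons_times) and cons_times[j] - t <= window:
            hits += 1
    return hits / len(ante_times)


# Hypothetical alarm log: (seconds, alarm type).
alarms = [(0, "A"), (10, "B"), (50, "A"), (100, "A"),
          (120, "B"), (200, "A"), (240, "B")]
confidence = rule_confidence(alarms, "A", "B", window=30)
```

In this made-up log two of the four A alarms are followed by a B alarm within 30 seconds, so the estimate is 0.5.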

  6. Top-K Query
     • Suppose two continuous queries (the first two below) are registered. Then another continuous query (the third below) is registered.
     • Which two web documents are the most popular across the first and second servers?
     • Which two web documents are the most popular across the third and fourth servers?
     • Which two web documents are the most popular across the second and third servers?
     [Figure: Coordinator, Queries, Server 1, Server 2, Server 3, Server 4]
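As a baseline for the three queries above, the sketch below has the coordinator pull every document count from the servers a query names and rank the sums. This is a naive illustration, not the approach the slides propose; `top_k` and the hit counts are hypothetical.

```python
from collections import Counter


def top_k(server_counts, servers, k):
    """Return the k documents with the largest total hits over `servers`."""
    total = Counter()
    for server in servers:
        total.update(server_counts[server])  # add this server's per-document hits
    return [doc for doc, _ in total.most_common(k)]


# Hypothetical per-server hit counts for web documents d1..d5.
hits = {
    1: {"d1": 50, "d2": 30, "d3": 5},
    2: {"d1": 10, "d2": 40, "d4": 20},
    3: {"d3": 60, "d4": 25, "d1": 5},
    4: {"d3": 10, "d5": 45, "d4": 30},
}
first = top_k(hits, [1, 2], 2)   # servers 1 and 2
second = top_k(hits, [3, 4], 2)  # servers 3 and 4
third = top_k(hits, [2, 3], 2)   # servers 2 and 3
```

Every evaluation ships all counts to the coordinator, and the third query re-reads servers 2 and 3 even though the first two queries already did.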

  7. Main Difficulties
     • Heavy Communication Cost
     • The server only updates its current data when necessary
     • Multiple Continuous Queries
     • Most papers focus on one-time top-k queries or a single continuous top-k query
     • Information sharing is necessary

  8. Spatial Query Processing
     • Continuous queries for moving objects in high-dimensional space
     • Range search
     • KNN search
     [Figure labels: user profile; Search engine; V.C. player; recommended channel; user profile, channel; Vote Mechanism; V.C. player (×4); selected channel]

  9. Problem Definition
     • Given a set of objects with positions in an N-dimensional (N > 20) region. The set of objects is highly dynamic: each object can move in an unrestricted fashion, i.e., we do not assume any pattern of motion
     • Continuously monitor the results for each query point
     • Range Query
     • KNN Query
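For reference, the two query types can be stated as brute-force scans over the current object positions. This is only a sketch of what the monitored result should contain at any instant, assuming Euclidean distance (the slides do not name a metric); `range_query`, `knn_query` and the sample objects are hypothetical.

```python
import heapq
import math


def range_query(objects, q, r):
    """Ids of all objects within distance r of query point q."""
    return sorted(oid for oid, pos in objects.items() if math.dist(pos, q) <= r)


def knn_query(objects, q, k):
    """Ids of the k objects nearest to q, closest first."""
    nearest = heapq.nsmallest(k, objects.items(),
                              key=lambda item: math.dist(item[1], q))
    return [oid for oid, _ in nearest]


# Hypothetical objects in a 21-dimensional space (N > 20, as on the slide).
N = 21
objects = {"o1": (0.0,) * N, "o2": (1.0,) * N, "o3": (3.0,) * N}
q = (0.0,) * N
in_range = range_query(objects, q, r=5.0)
nearest_two = knn_query(objects, q, k=2)
```

Rerunning these scans on every movement is the cost that continuous monitoring tries to avoid.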

  10. Main Difficulties
     • Heavy Communication Cost
     • Object updates occur only when the results of some queries might change
     • Safe Region [SIGMOD05]
     • Incremental Update
     • Efficiently maintain the effective results
     • Multiple Continuous Queries
     • Decide the quarantine area for each query
     • Mixed Types of Queries
     • Support both the range query and the KNN query
     [Figure: queries Q1 and Q2]

  11. Range Query
     Query Q: (x,y), r
     Cell labels, where max and min are the maximum and minimum dis(query, cell):
     • A: max < r
     • B: min ≤ r ≤ max
     • C: min > r
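The A/B/C labeling can be computed directly if cells are axis-aligned boxes, which is an assumption here since the slide does not specify the cell shape. In this sketch `min_max_dist` and `classify_cell` are hypothetical names, and distance is Euclidean.

```python
import math


def min_max_dist(q, lo, hi):
    """Minimum and maximum Euclidean distance from point q to the box [lo, hi]."""
    dmin_sq = 0.0
    dmax_sq = 0.0
    for x, a, b in zip(q, lo, hi):
        nearest = min(max(x, a), b)  # clamp x into [a, b] on this axis
        dmin_sq += (x - nearest) ** 2
        dmax_sq += max(abs(x - a), abs(x - b)) ** 2  # farther face on this axis
    return math.sqrt(dmin_sq), math.sqrt(dmax_sq)


def classify_cell(q, r, lo, hi):
    """Label a cell A (fully inside the range), B (partly inside) or C (outside)."""
    dmin, dmax = min_max_dist(q, lo, hi)
    if dmax < r:
        return "A"
    if dmin > r:
        return "C"
    return "B"


labels = [
    classify_cell((0, 0), 5, (1, 1), (2, 2)),  # whole cell within the range
    classify_cell((0, 0), 5, (3, 0), (6, 1)),  # range boundary crosses the cell
    classify_cell((0, 0), 5, (6, 6), (7, 7)),  # whole cell outside the range
]
```

Objects in an A cell are always in the result and objects in a C cell never are, so only B cells require per-object distance tests.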

  12. Range Query (Cont.): Moving Query (MQ). How do we maintain the result for an MQ?

  13. Range Query (Cont.): When to update?
     • We only need to consider those objects marked with B
     [Figure: a server holding queries Q1, Q2, Q3 and a client with flag = 0/1. Label combinations shown for (Q1, Q2, Q3): A A A; A A B; A A C. Outcomes shown: no update and no recalculation; update and recalculation for some queries; no update and no recalculation.]

  14. Range Query (Cont.)
     • For a range query Q: result list O3, O5, O7; covered cells A: C3, C4, C5 and B: C2, C7, C9
     • For a cell C: affected queries A: Q2, Q4, Q7 and B: Q3, Q6, Q9
     [Figure: query motion, cell C2]
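The two lookup tables on this slide (per query: result list and covered cells; per cell: affected queries) can be kept consistent with a small two-way index. The sketch below is one possible layout, not the authors' implementation; the class `RangeQueryIndex`, its method names and the sample entries are all hypothetical.

```python
from collections import defaultdict


class RangeQueryIndex:
    """Two-way bookkeeping between range queries and grid cells (hypothetical)."""

    def __init__(self):
        # query -> ids of objects currently in its result
        self.result_list = defaultdict(set)
        # query -> cells it covers, split by label A (fully) and B (partly)
        self.covered_cells = defaultdict(lambda: {"A": set(), "B": set()})
        # cell -> queries it affects, split by the same labels
        self.affected_queries = defaultdict(lambda: {"A": set(), "B": set()})

    def link(self, query, cell, label):
        """Record that `cell` has label "A" or "B" with respect to `query`."""
        self.covered_cells[query][label].add(cell)
        self.affected_queries[cell][label].add(query)

    def queries_to_recheck(self, cell):
        """Queries that need a distance test for an object reported in `cell`."""
        if cell not in self.affected_queries:
            return set()  # no query touches this cell
        return set(self.affected_queries[cell]["B"])


index = RangeQueryIndex()
index.link("Q1", "C3", "A")
index.link("Q1", "C2", "B")
index.link("Q2", "C2", "B")
index.result_list["Q1"].update({"O3", "O5"})
recheck = index.queries_to_recheck("C2")
```

An update from an object in a B cell triggers a distance test only for the queries listed under B for that cell; cells labeled A or C for a query need no test.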

  15. KNN Query
     Query Q: (x,y), 3
     [Figure: object updates, with cases labeled "update the order", "update the order", and "re-computation"]

  16. KNN Query (Cont.)
     Query Q: (x,y), 3
     Query Q’: (x’,y’), r, where r = d’max

  17. KNN Query (Cont.)
     Query Q: (x,y), 3
     Query Q’: (x’,y’), r, where r = dmax + dquery
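One reading of r = dmax + dquery, which is an assumption because the slide gives only the formula: dmax is the distance from the old query position to its k-th nearest neighbour, and dquery is the distance the query point has moved. By the triangle inequality the old k neighbours all lie within dmax + dquery of the new position, so the new k nearest neighbours must lie inside that range. The sketch below checks this against a brute-force search; `knn`, `knn_after_query_move` and the sample points are hypothetical.

```python
import heapq
import math


def knn(points, q, k):
    """The k points nearest to q, closest first (brute force)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))


def knn_after_query_move(points, old_q, new_q, k):
    """Answer the KNN query at new_q through a range query of radius r."""
    old_nn = knn(points, old_q, k)
    d_max = math.dist(old_nn[-1], old_q)  # distance to the old k-th neighbour
    d_query = math.dist(old_q, new_q)     # how far the query point moved
    r = d_max + d_query
    candidates = [p for p in points if math.dist(p, new_q) <= r]
    return knn(candidates, new_q, k), r


# Hypothetical object positions; the query moves from (0, 0) to (4, 0), k = 3.
points = [(0, 0), (1, 0), (2, 0), (5, 0), (9, 0), (10, 0), (20, 0)]
result, r = knn_after_query_move(points, (0, 0), (4, 0), k=3)
```

Here r is 6, the point at (20, 0) is never examined as a candidate, and the result equals the brute-force answer at the new position.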

  18. KNN Query (Cont.)
     Query Q: (x,y), 3
     Query Q’: (x’,y’), r, where r = dmax + dcell

  19. Tree Pattern Mining
     • As the trees stream in, find the subtrees that occur more than θ·N times, where N is the number of trees received so far and 0≦θ≦1
     [Figure: trees T1, T2, T3 stream into STMer, which outputs Frequent Tree Patterns]
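The θ·N threshold can be illustrated with a deliberately simplified miner that counts only the smallest non-trivial subtrees, single parent-child edges, and counts each pattern at most once per tree. Both simplifications are assumptions of this sketch; it is not the STMer algorithm, and `edges`, `EdgeMiner` and the sample trees are hypothetical.

```python
from collections import Counter


def edges(tree):
    """Set of (parent_label, child_label) pairs in a tree given as (label, [children])."""
    label, children = tree
    found = set()
    for child in children:
        found.add((label, child[0]))
        found |= edges(child)
    return found


class EdgeMiner:
    """Streaming support counts for 2-node subtrees (simplified stand-in)."""

    def __init__(self, theta):
        self.theta = theta
        self.n = 0                 # N: number of trees received so far
        self.support = Counter()   # pattern -> number of trees containing it

    def add(self, tree):
        self.n += 1
        self.support.update(edges(tree))  # a set, so one count per tree

    def frequent(self):
        """Patterns that occur in more than theta * N trees."""
        return {p for p, c in self.support.items() if c > self.theta * self.n}


miner = EdgeMiner(theta=0.5)
miner.add(("A", [("B", []), ("C", [])]))
miner.add(("A", [("B", [("D", [])])]))
miner.add(("A", [("C", [])]))
frequent_edges = miner.frequent()
```

After three trees the threshold is 1.5, so the edges A-B and A-C (support 2 each) are frequent and B-D (support 1) is not.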

  20. Closed Tree Pattern Mining
     • Mining closed frequent subtrees over data streams
     • A subtree is closed if none of its proper supertrees has the same support as it does
     [Figure: example trees with node labels A, B, C, D; frequent subtrees annotated with supports 2 and 3; closed subtrees marked]
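The closedness definition translates into a simple filter once supports are known. In the sketch below patterns are represented as frozensets of edge names and "proper supertree" is approximated by "proper superset"; that representation is a stand-in chosen for brevity, not a real subtree-containment test. The filter also assumes the input holds every frequent pattern, so that any supertree with equal support is present. `closed_patterns` and the supports are hypothetical.

```python
def closed_patterns(support, is_proper_super):
    """Keep patterns that have no proper super-pattern with the same support."""
    return {
        p for p, s in support.items()
        if not any(is_proper_super(q, p) and support[q] == s for q in support)
    }


# Hypothetical supports; each pattern is a frozenset of edge names.
support = {
    frozenset({"AB"}): 3,
    frozenset({"AC"}): 3,
    frozenset({"AB", "AC"}): 3,
    frozenset({"AB", "AC", "BD"}): 2,
}
# Stand-in containment test: q is a proper superset of p.
closed = closed_patterns(support, lambda q, p: q > p)
```

The single-edge patterns are dropped because the two-edge pattern containing them has the same support of 3. The two larger patterns are kept.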
