
NiagaraCQ


kaia

Presentation Transcript


  1. NiagaraCQ A Scalable Continuous Query System for Internet Databases

  2. Outline • Problem • NiagaraCQ • Selection Placement Strategies • Dynamic Regrouping Algorithm NiagaraCQ

  3. Problem: there is no scalable, efficient system that supports persistent queries, which let users receive new results as soon as they become available. Example: "Notify me whenever the price of Dell stock drops by more than 5% and the price of Intel stock remains unchanged over the next three months."

  4. NiagaraCQ • Supports continuous queries • Change-based queries • Timer-based queries • Scalability • Performance • Suited to the Internet • User interface: a high-level query language

  5. Command Language • Create a continuous query: CREATE CQ_name XML-QL query DO action {START start_time} {EVERY time_interval} {EXPIRE expiration_time} • Delete a continuous query: DELETE CQ_name
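To make the command clauses concrete, here is a minimal C++ sketch of how a parsed CREATE command might be held in memory. It also computes the timer firing times implied by START, EVERY, and EXPIRE. Several details are assumptions for illustration, not NiagaraCQ's actual representation: the struct ContinuousQuery, the function TimerFirings, integer timestamps, and the choice to fire first at START.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory form of a CREATE command. Times are plain integers
// (an assumption); the optional START/EVERY/EXPIRE clauses map to std::optional.
struct ContinuousQuery {
  std::string name;            // CQ_name
  std::string xmlql_query;     // XML-QL query
  std::string action;          // DO action
  std::optional<long> start;   // START start_time
  std::optional<long> every;   // EVERY time_interval (absent: change-based)
  std::optional<long> expire;  // EXPIRE expiration_time
};

// Lists the timer firing times of a timer-based query from START (or `now`)
// up to `horizon`, never past EXPIRE. A change-based query has no timer firings.
std::vector<long> TimerFirings(const ContinuousQuery& cq, long now, long horizon) {
  std::vector<long> times;
  if (!cq.every || *cq.every <= 0) return times;
  const long end = cq.expire ? std::min(*cq.expire, horizon) : horizon;
  for (long t = cq.start.value_or(now); t <= end; t += *cq.every)
    times.push_back(t);
  return times;
}
```

A query created without EVERY is treated here as change-based, so it gets no timer firings.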

  6. Expression Signature: represents the same syntactic structure, possibly with different constant values, in different queries. • Query 1: Where <Quotes> <Quote> <Symbol>INTC</> </> </> element_as $g in “http://www.cs.wisc.edu/db/quotes.xml” construct $g • Query 2: Where <Quotes> <Quote> <Symbol>MSFT</> </> </> element_as $g in “http://www.cs.wisc.edu/db/quotes.xml” construct $g

  7. Expression Signature (2): Quotes.Quote.Symbol = constant in quotes.xml
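A minimal C++ sketch of computing an expression signature for simple selection queries follows. SelectionQuery and Signature are hypothetical names. Real signatures are computed over XML-QL query plans rather than flat strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical representation of a simple selection query such as
// Symbol = "INTC" over quotes.xml.
struct SelectionQuery {
  std::string path;      // e.g. "Quotes.Quote.Symbol"
  std::string op;        // e.g. "="
  std::string constant;  // e.g. "INTC"
  std::string source;    // e.g. "quotes.xml"
};

// The expression signature keeps the structure (path, operator, data source)
// and replaces the constant value with a placeholder.
std::string Signature(const SelectionQuery& q) {
  return q.path + " " + q.op + " constant in " + q.source;
}
```

The INTC and MSFT queries differ only in their constants, so they produce the same signature string.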

  8. Query Plan • Trigger Action I ← Select Symbol=“INTC” ← File Scan quotes.xml • Trigger Action J ← Select Symbol=“MSFT” ← File Scan quotes.xml

  9. Group Signature: the common expression signature of all queries in the group, here Quotes.Quote.Symbol = constant in quotes.xml

  10. Group Constant Table

  11. Group Plan • Trigger Action I, Trigger Action J, …….. ← Split ← Join (Symbol = Constant_value) ← File Scan quotes.xml and Constant Table
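The following C++ sketch illustrates the group plan's join with the constant table, followed by the split. The Quote tuple type, the ConstantTable alias, and JoinAndSplit are hypothetical. Destinations are plain strings standing in for trigger actions or intermediate files.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical tuple produced by the file scan of quotes.xml.
struct Quote {
  std::string symbol;
  double price;
};

// Group constant table: maps each query's constant (Symbol value) to the
// destination of that query's results, e.g. its trigger action.
using ConstantTable = std::multimap<std::string, std::string>;

// Join the scanned tuples with the constant table on Symbol = Constant_value,
// then split the joined tuples by destination.
std::map<std::string, std::vector<Quote>> JoinAndSplit(
    const std::vector<Quote>& quotes, const ConstantTable& table) {
  std::map<std::string, std::vector<Quote>> by_destination;
  for (const Quote& q : quotes) {
    const auto [first, last] = table.equal_range(q.symbol);
    for (auto it = first; it != last; ++it)
      by_destination[it->second].push_back(q);
  }
  return by_destination;
}
```

A single scan and a single join serve every query in the group. Tuples whose symbol matches no constant, such as AOL here, are simply dropped.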

  12. Incremental Grouping Algorithm • The group optimizer traverses the new query's plan bottom-up. • It matches the query's expression signature against the signatures of existing groups. • New query plan: Trigger Action ← Select Symbol=“AOL” ← File Scan quotes.xml

  13. Incremental Grouping Algorithm (2) • The group optimizer breaks the query plan into two parts: the lower part is removed, and the upper part is added onto the group plan. • The query's constant is added to the constant table. • Query plan: Trigger Action ← Select Symbol=“AOL” ← File Scan quotes.xml
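Here is a hedged C++ sketch of the incremental grouping step for simple selection queries. If the new query's signature matches an existing group, only its constant and destination are added to that group's constant table. Otherwise a new group is created. GroupOptimizer and its methods are illustrative assumptions. In the real system, plans are traversed operator by operator rather than reduced to a signature string.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical selection query: path, comparison operator, constant, data source.
struct SelectionQuery {
  std::string path, op, constant, source;
};

// Hypothetical group optimizer for simple selection queries.
class GroupOptimizer {
 public:
  // Returns true if the query joined an existing group, false if a new group
  // (and, in a real system, a new group plan) had to be created.
  bool AddQuery(const SelectionQuery& q, const std::string& destination) {
    const std::string sig = q.path + " " + q.op + " constant in " + q.source;
    const bool existed = groups_.count(sig) > 0;
    // The lower part of the plan (file scan and selection) is dropped; only the
    // constant and the destination are added to the group's constant table.
    groups_[sig].emplace(q.constant, destination);
    return existed;
  }

  std::size_t GroupCount() const { return groups_.size(); }

  std::size_t ConstantCount(const std::string& sig) const {
    const auto it = groups_.find(sig);
    return it == groups_.end() ? 0 : it->second.size();
  }

 private:
  // signature -> constant table (constant -> destination)
  std::map<std::string, std::multimap<std::string, std::string>> groups_;
};
```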

  14. Pipeline Approach • Tuples are pipelined from the output of one operator into the input of the next operator. • Disadvantages: • Does not work for grouping timer-based queries. • The split operator may become a bottleneck. • Not all parts of the group plan need to be executed each time.

  15. Intermediate Files

  16. Intermediate Files (2) • Advantages: • Intermediate files and data sources are monitored uniformly. • Each query is scheduled independently. • The potential bottleneck problem of the pipelined approach is avoided. • Disadvantages: • Extra disk I/Os. • The split operator becomes a blocking operator.

  17. Virtual Intermediate Files • Query 1: Where <Quotes> <Quote> <Change_ratio>$c</> </> </> element_as $g in “quotes.xml”, $c>0.05 construct $g • Query 2: Where <Quotes> <Quote> <Change_ratio>$c</> </> </> element_as $g in “quotes.xml”, $c>0.15 construct $g • Signature: Quotes.Quote.Change_Ratio > constant in quotes.xml • The outputs of such range queries overlap.

  18. Virtual Intermediate Files (2) • All outputs of the split operator are stored in one real intermediate file. • This file has an index on the range attribute. • Each virtual intermediate file stores a value range. • Modifying a virtual intermediate file can trigger upper-level queries. • The value range is used to retrieve data from the real intermediate file.
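A small C++ sketch of this scheme follows, under two assumptions. A std::multimap keyed on Change_Ratio stands in for the indexed real intermediate file. Each virtual file stores only an exclusive lower bound, matching predicates like $c > 0.05. RealFile, VirtualFile, Read, and Triggers are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One real intermediate file holding all outputs of the split operator, indexed
// on the range attribute. A std::multimap keyed by Change_Ratio stands in for
// the indexed file (an assumption).
using RealFile = std::multimap<double, std::string>;

// A virtual intermediate file stores only a value range; here an exclusive
// lower bound, matching predicates of the form $c > constant.
struct VirtualFile {
  double lower_exclusive;
};

// Retrieve a virtual file's contents from the real file through the index.
std::vector<std::string> Read(const RealFile& file, const VirtualFile& v) {
  std::vector<std::string> rows;
  for (auto it = file.upper_bound(v.lower_exclusive); it != file.end(); ++it)
    rows.push_back(it->second);
  return rows;
}

// A new tuple modifies the virtual file (and should trigger its upper-level
// queries) only if its value falls inside the stored range.
bool Triggers(const VirtualFile& v, double new_value) {
  return new_value > v.lower_exclusive;
}
```

With lower bounds 0.05 and 0.15, a tuple with Change_Ratio 0.20 is stored once in the real file but appears in both virtual files.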

  19. Event Detection • Types of events: • Data-source changes • Timer events • Types of data sources: • Push-based • Pull-based

  20. Timer-Based Events • Timer events are stored in an event list, sorted by time. • Each entry stores query ids. • A query is fired only if its data source has been modified since its last firing time. • After a timer event, the next firing times are calculated and the queries are added to the corresponding entries.
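The event list can be sketched in C++ as an ordered map from time to query ids. TimerQuery, EventList, and integer time units are assumptions for illustration. The sketch reschedules every query in the processed entry, but fires only those whose data source changed since their last firing.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical timer-based query state.
struct TimerQuery {
  int id;
  long interval;             // EVERY time_interval
  long last_fired = 0;       // last firing time
  long source_modified = 0;  // last modification time of the data source
};

class EventList {
 public:
  void Schedule(long time, int query_id) { events_[time].push_back(query_id); }

  // Processes the earliest timer event. Returns the ids of the queries that
  // fire, and adds every query of the entry at its next firing time.
  std::vector<int> ProcessNext(std::map<int, TimerQuery>& queries) {
    std::vector<int> fired;
    if (events_.empty()) return fired;
    const long now = events_.begin()->first;
    const std::vector<int> ids = events_.begin()->second;
    events_.erase(events_.begin());
    for (int id : ids) {
      TimerQuery& q = queries.at(id);
      if (q.source_modified > q.last_fired) {  // modified since last firing
        fired.push_back(id);
        q.last_fired = now;
      }
      Schedule(now + q.interval, id);
    }
    return fired;
  }

 private:
  std::map<long, std::vector<int>> events_;  // entries sorted by time
};
```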

  21. Incremental Evaluation • Queries are invoked only on changed data. • For each file, NiagaraCQ keeps a “delta file”. • Queries are run over delta files. • Incremental evaluation of join operators requires the complete data files. • A timestamp is added to each tuple in order to support timer-based queries.
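Here is a minimal C++ sketch of evaluating a selection over a timestamped delta file. It assumes each DeltaTuple carries the time it was appended. Each query reads only the tuples that arrived between its own last firing time and the current firing time, so queries fired at different times can share one delta file. The names and the $c > threshold predicate are illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical delta-file entry: a changed tuple plus the time it was appended.
struct DeltaTuple {
  long timestamp;
  std::string symbol;
  double change_ratio;
};

// Evaluate "$c > threshold" only over tuples with last_fire < timestamp <= now.
std::vector<std::string> EvaluateIncrementally(const std::vector<DeltaTuple>& delta,
                                               long last_fire, long now,
                                               double threshold) {
  std::vector<std::string> out;
  for (const DeltaTuple& t : delta)
    if (t.timestamp > last_fire && t.timestamp <= now && t.change_ratio > threshold)
      out.push_back(t.symbol);
  return out;
}
```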

  22. Memory Caching • Query plans: an LRU policy that favors frequently fired queries. • Data files: the policy favors delta files. • Event list: only a “time window” of the list is kept in memory.
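As a rough illustration of the plan cache, here is a plain LRU cache in C++. It is a simplification: it does not implement the extra preference for frequently fired queries. PlanCache is a hypothetical design that tracks only which query ids have a resident plan.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Plain LRU cache of query plans keyed by query id (simplified: no bias toward
// frequently fired queries).
class PlanCache {
 public:
  explicit PlanCache(std::size_t capacity) : capacity_(capacity) {}

  // Returns true on a hit. On a miss the plan is "loaded" and inserted,
  // evicting the least recently fired plan if the cache is full.
  bool Fire(int query_id) {
    auto it = index_.find(query_id);
    if (it != index_.end()) {
      order_.splice(order_.begin(), order_, it->second);  // now most recent
      return true;
    }
    if (capacity_ == 0) return false;
    if (order_.size() == capacity_) {
      index_.erase(order_.back());
      order_.pop_back();
    }
    order_.push_front(query_id);
    index_[query_id] = order_.begin();
    return false;
  }

  bool Contains(int query_id) const { return index_.count(query_id) > 0; }

 private:
  std::size_t capacity_;
  std::list<int> order_;  // front = most recently fired
  std::unordered_map<int, std::list<int>::iterator> index_;
};
```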

  23. System Architecture

  24. Continuous Query Processing. Components: Continuous Query Manager (CQM), Event Detector (ED), Query Engine (QE), Data Manager (DM).
     (1) The CQM adds continuous queries with file and timer information to enable the ED to monitor the events.
     (2) The ED asks the DM to monitor changes to files.
     (3) When a timer event happens, the ED asks the DM for the last modified time of files.
     (4) The DM informs the ED of changes to push-based data sources.
     (5) If file changes and timer events are satisfied, the ED provides the CQM with a list of firing CQs.
     (6) The CQM invokes the QE to execute the firing CQs.
     (7) The file scan operator calls the DM to retrieve the selected documents.
     (8) The DM returns only the changes between the last fire time and the current fire time.

  25. Selection Placement Strategies • Query 1: Where <Quotes><Quote><Symbol>$s</> <Price>$p</></> element_as $g </> in “quotes.xml”, $p > 90 <Companies><Company><Symbol>$s</></> element_as $t</> in “profiles.xml” construct $g, $t • Query 2: Where <Quotes><Quote><Symbol>$s</> <Price>$p</></> element_as $g </> in “quotes.xml”, $p > 100 <Companies><Company><Symbol>$s</></> element_as $t</> in “profiles.xml” construct $g, $t

  26. Expression Signatures • Selection: Quotes.Quote.Price > constant in quotes.xml • Join: Symbol = Symbol between quotes.xml and profiles.xml

  27. Where to place the selection operator? • Below the join (PushDown): $(\sigma_1 R \bowtie S) \cup (\sigma_2 R \bowtie S) \cup \dots \cup (\sigma_n R \bowtie S)$ • Above the join (PullUp): $\sigma_1(R \bowtie S) \cup \sigma_2(R \bowtie S) \cup \dots \cup \sigma_n(R \bowtie S)$ • PullUp achieves an average 10-fold performance improvement over PushDown.
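The two placement strategies can be compared with a small C++ sketch over in-memory relations. It assumes R holds quotes.xml rows, S holds profiles.xml rows, and each σ_i is a hypothetical Price > threshold predicate. The sketch only checks that both strategies return the same tuples and counts join evaluations. It does not reproduce the reported performance numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct QuoteRow { std::string symbol; double price; };           // R: quotes.xml
struct ProfileRow { std::string symbol; std::string company; };  // S: profiles.xml

// (query index, symbol) pairs, so the outputs of both strategies can be compared.
using Result = std::set<std::pair<std::size_t, std::string>>;

// PushDown: evaluate (sigma_i R) join S separately for every query i.
Result PushDown(const std::vector<QuoteRow>& r, const std::vector<ProfileRow>& s,
                const std::vector<double>& thresholds, int& join_evaluations) {
  Result out;
  join_evaluations = 0;
  for (std::size_t i = 0; i < thresholds.size(); ++i) {
    ++join_evaluations;  // one join per query
    for (const QuoteRow& q : r)
      if (q.price > thresholds[i])
        for (const ProfileRow& p : s)
          if (p.symbol == q.symbol) out.insert({i, q.symbol});
  }
  return out;
}

// PullUp: evaluate R join S once, then apply every sigma_i to the shared result.
Result PullUp(const std::vector<QuoteRow>& r, const std::vector<ProfileRow>& s,
              const std::vector<double>& thresholds, int& join_evaluations) {
  std::vector<QuoteRow> joined;
  join_evaluations = 1;  // a single shared join
  for (const QuoteRow& q : r)
    for (const ProfileRow& p : s)
      if (p.symbol == q.symbol) joined.push_back(q);
  Result out;
  for (std::size_t i = 0; i < thresholds.size(); ++i)
    for (const QuoteRow& q : joined)
      if (q.price > thresholds[i]) out.insert({i, q.symbol});
  return out;
}
```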

  28. PushDown - Query Plan • Join(Select Price>90 (quotes.xml), profiles.xml)

  29. PushDown - Group Plans

  30. PullUp - Group Plans

  31. PullUp vs. PushDown • Advantages of PullUp: • Only one join group and one selection group • A single intermediate file is maintained • Disadvantages of PullUp: • Irrelevant tuples are joined • The intermediate file can become very large • Changes in profiles.xml affect the intermediate file (file_k), adding maintenance overhead.

  32. Filtered PullUp • Grouped join plan: Join(Selection Price>90 (quotes.xml), profiles.xml), with the per-query selections on quotes.xml applied above the join

  33. Filtered PullUp vs. PullUp • Only relevant tuples are joined. • The intermediate file is smaller. • Reduces the cost of PullUp by 75%. • Drawback: added complexity, since the filter predicate may need to be modified dynamically (e.g., when a query with Price > 70 is added).
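Here is a minimal C++ sketch of how a Filtered PullUp group might maintain the filter below the shared join. It assumes every query's selection has the form Price > t. Because the filter must admit every tuple that any query could select, it is Price > min(t). The class FilteredPullUpGroup is hypothetical. Query removal, which would require recomputing the minimum, is omitted.

```cpp
#include <cassert>
#include <limits>

// Hypothetical Filtered PullUp group state for predicates "price > t".
class FilteredPullUpGroup {
 public:
  // Adds a query's threshold; returns true if the filter below the join had to
  // be modified (relaxed) to admit the new query's tuples.
  bool AddQuery(double threshold) {
    if (threshold < filter_) {
      filter_ = threshold;
      return true;
    }
    return false;
  }

  double Filter() const { return filter_; }

  // Only tuples passing the filter take part in the grouped join.
  bool PassesFilter(double price) const { return price > filter_; }

 private:
  double filter_ = std::numeric_limits<double>::infinity();
};
```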

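  34. Dynamic Re-grouping • Let Q1 ($A \bowtie B \bowtie C$) and Q2 ($B \bowtie C$) be two continuous queries submitted in sequence. • For Q1, the incremental grouping algorithm chooses the plan $((A \bowtie B) \bowtie C)$. • Neither of the resulting groups ($A \bowtie B$ and $(A \bowtie B) \bowtie C$) can be used for Q2. • [Figure: query graphs over the nodes ABC, AB, and BC]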

  35. Dynamic Re-grouping (2) • Existing queries are not regrouped to exploit new grouping opportunities introduced by subsequent queries. • Overall performance degrades as queries are continuously added and removed. • Naive regrouping algorithm: periodically perform a global query optimization, which is: • Expensive • Redundant (it repeats work already done by the incremental optimizer)

  36. Data Structures • A query graph: a directed acyclic graph in which each node represents an existing join expression in the group plan.

```cpp
#include <list>
#include <string>

typedef unsigned long SIG_TYPE;  // signature type; not defined on the slide (assumed here)

struct Node;
struct Child { Node* node; float weight; };  // Child = {Node*, weight}

struct Node {
  std::string query;           // ASCII query plan
  SIG_TYPE sig;                // signature of the query string
  int final_node_count;        // number of users that require this query
                               // (0: non-final node; >0: final node)
  std::list<Child*> children;  // children of this node
  std::list<Node*> parents;    // parents of this node
  float updateFreq;            // update frequency of this node
  float cost;                  // the cost of computing this node
  // The following fields are used only for dynamic regrouping.
  int reference_count;         // reference count
  bool visited;                // whether PurgeSiblings has been performed on this node
};
```

  37. Data Structures (2) • A group table: an array of hash tables. The i-th hash table holds the queries of length i (number of joins). Each hash-table entry maps a query string to the corresponding node in the query graph. • [Figure: array → hash table → node]

  38. Data Structures (3) • A query log: an array of vectors that stores the new nodes added since the last regrouping. It is cleared after each regrouping. • [Figure: array → vector → node]
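Here is a C++ sketch of the group table and query log, under some assumptions: queries are identified by strings, Node is trimmed to two fields, and GroupTable is a hypothetical wrapper. Pointers to elements of a std::unordered_map remain valid across rehashing, which lets the query log store Node pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct Node {  // trimmed-down query-graph node
  std::string query;
  int final_node_count = 0;
};

// Group table: entry i maps query strings with i joins to their nodes.
// Query log: entry i lists the nodes with i joins added since the last regrouping.
class GroupTable {
 public:
  explicit GroupTable(std::size_t max_joins)
      : table_(max_joins + 1), log_(max_joins + 1) {}

  Node* Find(std::size_t joins, const std::string& q) {
    auto it = table_.at(joins).find(q);
    return it == table_[joins].end() ? nullptr : &it->second;
  }

  // Inserts a node if absent (recording it in the query log) and returns it.
  Node* Insert(std::size_t joins, const std::string& q) {
    if (Node* existing = Find(joins, q)) return existing;
    Node* n = &table_[joins].emplace(q, Node{q}).first->second;
    log_[joins].push_back(n);
    return n;
  }

  const std::vector<Node*>& Log(std::size_t joins) const { return log_.at(joins); }
  void ClearLog() {
    for (auto& entries : log_) entries.clear();
  }

 private:
  std::vector<std::unordered_map<std::string, Node>> table_;
  std::vector<std::vector<Node*>> log_;
};
```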

  39. Incremental Grouping Algorithm: top-down local exhaustive search • If the query already exists, increase its final node count by 1. • Otherwise: • Enumerate all possible sub-queries top-down and probe the group table to check whether each sub-query node exists. • Compute the minimal cost of a plan that uses existing sub-query nodes. • Compute the minimal cost of a plan that does not use existing sub-query nodes. • Choose the least costly plan.
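Here is a hedged C++ sketch of the top-down search. It assumes each join expression is a bitmask of base relations. It also uses a hypothetical cost model: reusing an existing node costs 0, and each newly created join costs 1. For simplicity it only considers splitting a query into one sub-query joined with a single base relation. It illustrates probing the group table during enumeration; it does not reproduce the paper's cost model.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <set>

using Expr = std::uint32_t;  // bitmask of base relations (A = 1, B = 2, C = 4, ...)

// Hypothetical cost model: reusing an existing group-plan node costs 0 and
// every newly created join costs 1. Returns the minimal number of new joins.
int MinNewJoins(Expr q, const std::set<Expr>& existing) {
  if (existing.count(q)) return 0;               // probe the group table
  if (std::bitset<32>(q).count() <= 1) return 0;  // base file scan
  int best = -1;
  for (int r = 0; r < 32; ++r) {
    const Expr bit = Expr{1} << r;
    if (!(q & bit)) continue;
    // Top-down: q = (q without relation r) joined with relation r.
    const int cost = MinNewJoins(q & ~bit, existing) + 1;
    if (best < 0 || cost < best) best = cost;
  }
  return best;
}
```

With existing nodes $A \bowtie B$ and $(A \bowtie B) \bowtie C$, the query $B \bowtie C$ still needs one new join. This mirrors the regrouping example in slide 34.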

  40. Dynamic Regrouping Algorithm • Phase 1: construct links among existing nodes and new nodes. • Phase 2: find a minimal-weight solution from the current solution by removing redundant nodes. • [Figure: query graph with nodes ABC, AB, BC]

  41. Phase 1: constructing links among existing nodes and new nodes • Main idea: for any pair of nodes in the graph, if one node is a sub-query of the other, create a link between them if one does not already exist. • Relationships are evaluated only between existing nodes and nodes added since the last regrouping. • The level difference between a parent and a child is always 1.

  42. Phase 1 - Algorithm

```
for each level i, bottom-up:
  for each node in the level-i query log:
    if the node has parents in the level-(i+1) group table:
      connect the node to those parents
    if the node has children in the level-(i-1) group table:
      connect the node to those children
```
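Here is a C++ sketch of Phase 1. It assumes join expressions are bitmasks of base relations, so "sub-query one level below" becomes a subset test plus a join-count check. BuildLinks is hypothetical. It compares each query-log node with every node in the graph and returns the new (child, parent) links. Links between two existing nodes are left untouched.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Assumption: a join expression is a bitmask of base relations (A = 1, B = 2, C = 4, ...).
using Expr = std::uint32_t;

// Number of joins in an expression = number of relations - 1 (its level).
int Joins(Expr e) { return static_cast<int>(std::bitset<32>(e).count()) - 1; }

// True if `child` is a sub-query of `parent` exactly one level below it.
bool IsDirectSubQuery(Expr child, Expr parent) {
  return (child & parent) == child && Joins(parent) == Joins(child) + 1;
}

// Compare each query-log node with every node in the graph and collect
// (child, parent) links; pairs of two existing nodes are never compared.
std::set<std::pair<Expr, Expr>> BuildLinks(const std::vector<Expr>& existing,
                                           const std::vector<Expr>& query_log) {
  std::vector<Expr> all(existing);
  all.insert(all.end(), query_log.begin(), query_log.end());
  std::set<std::pair<Expr, Expr>> links;  // a set, so no link is added twice
  for (Expr node : query_log) {
    for (Expr other : all) {
      if (IsDirectSubQuery(node, other)) links.insert({node, other});  // link to parent
      if (IsDirectSubQuery(other, node)) links.insert({other, node});  // link to child
    }
  }
  return links;
}
```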

  43. Phase 2: a greedy algorithm for level-wise graph minimization • Main idea: traverse the query graph level by level and try to remove redundant nodes one level at a time. • Start from the second level from the top. • Retain a subset of the level-i nodes such that: • every node at level i+1 has at least one child in the subset; • the subset has minimal total cost. • Nodes that are not selected are removed permanently.

  44. Phase 2 - Algorithm

```
MinimizeGraph() {
  // L ranges from (maximum number of joins - 1) down to 1
  for each level L in the group table {
    clear final_set and remain_set
    for each node N in the level-L group table
      InitializeSet(N)
    for each node N in final_set
      PurgeSiblings(N)
    while (remain_set is not empty) {
      M = none
      Current_minimum = +infinity
      scan each node R in remain_set {
        if (R.reference_count == 0) {
          remove R from remain_set
          deleteNode(R)
        } else if (R.cost / R.reference_count < Current_minimum) {
          M = R
          Current_minimum = R.cost / R.reference_count
        }
      }
      if (M != none) {
        remove M from remain_set
        PurgeSiblings(M)
      }
    }
  }
}

InitializeSet(Node N) {
  if (N is a final node)
    add N to final_set
  else {
    add N to remain_set
    N.reference_count = number of parents of N
  }
  N.visited = false
}

PurgeSiblings(Node N) {
  for each parent P of N {
    if (!P.visited) {
      decrease the reference count of each sibling of N under P by 1
      P.visited = true
    }
  }
}
```
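Here is a runnable C++ sketch of the greedy step for one level, under simplifying assumptions. The level is given as a hypothetical LevelInput that lists each parent's child indices and each child's cost. Final nodes are always kept. reference_count is recomputed as the number of still-uncovered parents a child serves, which has the same effect as PurgeSiblings decrementing siblings once a parent is covered. It is essentially greedy weighted set cover, not the authors' code.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <set>
#include <vector>

// Hypothetical input for one level i of Phase 2.
struct LevelInput {
  std::vector<std::vector<std::size_t>> parents;  // level-(i+1) node -> its level-i children
  std::vector<double> cost;                       // cost of each level-i node
  std::vector<bool> is_final;                     // final nodes are always kept
};

// Greedy minimization of one level. Returns the indices of the retained
// level-i nodes; all other nodes of the level would be deleted.
std::set<std::size_t> MinimizeLevel(const LevelInput& in) {
  const std::size_t n = in.cost.size();
  std::vector<bool> covered(in.parents.size(), false);
  std::set<std::size_t> kept;

  // Keeping a node covers all of its parents (cf. PurgeSiblings).
  auto keep = [&](std::size_t node) {
    kept.insert(node);
    for (std::size_t p = 0; p < in.parents.size(); ++p)
      for (std::size_t c : in.parents[p])
        if (c == node) covered[p] = true;
  };

  for (std::size_t c = 0; c < n; ++c)
    if (in.is_final[c]) keep(c);

  while (true) {
    std::size_t best = n;
    double best_ratio = std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < n; ++c) {
      if (kept.count(c)) continue;
      std::size_t refs = 0;  // number of still-uncovered parents this node serves
      for (std::size_t p = 0; p < in.parents.size(); ++p)
        if (!covered[p])
          for (std::size_t pc : in.parents[p])
            if (pc == c) ++refs;
      if (refs > 0 && in.cost[c] / refs < best_ratio) {
        best_ratio = in.cost[c] / refs;
        best = c;
      }
    }
    if (best == n) break;  // no remaining node serves an uncovered parent
    keep(best);
  }
  return kept;
}
```

Consider two parents with children {0, 1} and {1, 2} and costs 1, 1.5, 1. Node 1 has the lowest cost per reference (0.75) and covers both parents, so it alone is kept.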

  45. Cost Analysis • $N$ = number of queries • The number of nodes is proportional to the number of queries: $CN$ • Each query contains no more than 10 joins, so each level contains about $CN/10$ nodes.

  46. Cost Analysis – Phase 1 • Regrouping is performed after every $R$ (or $KR$) new queries. • With interval $R$: • $N/R$ = number of regroupings • $CR$ = number of new nodes added between regroupings, which must be linked with existing nodes • $mCR$ = number of nodes at the $m$-th regrouping (after $m-1$ earlier regroupings) • $m(CR)^2$ = number of comparisons for the $m$-th regrouping (ignoring a constant reduction)

  47. Cost Analysis – Phase 1 (2) • Total number of comparisons with interval $R$: $(CR)^2 + 2(CR)^2 + \dots + \frac{N}{R}(CR)^2 = (CR)^2 \cdot \frac{(N/R)(N/R+1)}{2} = \frac{N(N+R)C^2}{2} = O(N^2)$ • Total number of comparisons with interval $KR$: $(CKR)^2 + 2(CKR)^2 + \dots + \frac{N}{KR}(CKR)^2 = \frac{N(N+KR)C^2}{2}$ • The ratio: $\frac{N(N+KR)C^2/2}{N(N+R)C^2/2} = \frac{N+KR}{N+R}$
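This arithmetic can be checked with a few lines of C++ that sum $m(CR)^2$ over the $N/R$ regroupings and compare the result with the closed form. TotalComparisons and ClosedForm are illustrative names. The closed form is exact only when $R$ divides $N$.

```cpp
#include <cassert>

// Sum of m * (C*R)^2 over the N/R regroupings (regrouping after every R queries).
long long TotalComparisons(long long N, long long R, long long C) {
  long long total = 0;
  for (long long m = 1; m <= N / R; ++m) total += m * (C * R) * (C * R);
  return total;
}

// Closed form N(N+R)C^2/2 (exact when R divides N).
long long ClosedForm(long long N, long long R, long long C) {
  return N * (N + R) * C * C / 2;
}
```

For $N = 100$, $R = 10$, $C = 3$, both give 49500. With $K = 5$ they give 67500, and $67500/49500 = 150/110 = (N+KR)/(N+R)$.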

  48. Cost Analysis – Phase 2 • Worst case: each pass removes one node. • Cost for one level: $\frac{CN}{10} + \left(\frac{CN}{10} - 1\right) + \dots + 1 = \frac{CN(CN+10)}{200} = O(N^2)$ • Purging siblings: $\frac{CN}{10} \cdot \frac{CN}{10} = \frac{(CN)^2}{100} = O(N^2)$ • All 9 levels: $O(N^2)$

  49. References • NiagaraCQ: A Scalable Continuous Query System for Internet Databases. http://www.cs.wisc.edu/niagara/papers/NiagaraCQ.pdf • Design and Evaluation of Alternative Selection Placement Strategies in Optimizing Continuous Queries. http://www.cs.wisc.edu/niagara/papers/Icde02.pdf • Dynamic Re-grouping of Continuous Queries. http://www.cs.wisc.edu/niagara/papers/507.pdf
