1 / 107

Continuous Queries and Publish/Subscribe

Continuous Queries and Publish/Subscribe. Aaron Hirshfield Salinee (NOI) Jencharat Ened Ketri Esther Ryvkina Miriam Speert. Presentation Outline. 1:40 - 1:50 Introduction to Pub/Sub 1:50 - 2:25 EXPRESSION/EVALUATE 2:25 - 3:10 TAPESTRY 3:10 - 3:20 Break

shandi
Download Presentation

Continuous Queries and Publish/Subscribe

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Continuous Queries and Publish/Subscribe Aaron Hirshfield Salinee (NOI) Jencharat Ened Ketri Esther Ryvkina Miriam Speert

  2. Presentation Outline • 1:40 - 1:50 Introduction to Pub/Sub • 1:50 - 2:25 EXPRESSION/EVALUATE • 2:25 - 3:10 TAPESTRY • 3:10 - 3:20 Break • 3:20 - 4:00 Efficient Filtering of XML • 4:00 - 4:30 Discussion

  3. References • Aguilera, M., Strom, R., Sturman, D., Astley, M., and Chandra, T. “Matching Events in a Content-based Subscription System”. 18th ACM Symposium on Principles of Distributed Computing, 1999: 53-61 • Fabret, F., Jacobsen, H.A., Llirbat, F., Pereira, J. “Filtering Algorithms and Implementation for Very Fast Publish/Subscribe System”. ACM SIGMOD 2001: 115-126. • Yalamanchi, A., Srinivasan, J., Gawlick, D. “Managing Expressions as Data in Relational Database Systems”. Proceedings of the 2003 CIDR Conference.

  4. References (Continued) • Douglas B. Terry, David Goldberg, David Nichols and Brian M. Oki. Continuous Queries over Append-Only Databases. SIGMOD Conference 1992: 321-330.

  5. What is Publish/Subscribe? • Connecting information providers to information consumers • Delivering published events to subscribers who expressed interests in such events.

  6. Group-Based • Publishers categorize events into fixed groups such as subjects, channels, or topics. • Subscribers choose certain a group(s) and then obtain every event in the specified group(s).

  7. Content-Based • No pre-defined groups - subscribers can filter criteria along multiple dimensions. • No administrative overhead for publishers to maintain groups.

  8. Issues • Large number of subscribers • Large number/high rates of events • Minimize delay • Simple and expressive subscription • Changing subscription

  9. How does it relate to us? • Subset of stream database with only select queries • Events in form of streams • Subscribers’ interests in form of continuous queries or possibly as another stream

  10. Managing Expressions as Data in Relational Database Systems Aravind Yalamanchi Jagannathan Srinivasan Dieter Gawlick

  11. First, a note on efficiency… Save: • time • space • computation • computation resources

  12. Motivation • Sophisticated Publish/Subscribe systems do not currently exist • Authors propose approach to Pub/Sub that allows greater expressiveness

  13. Idea: I want a list of movies meeting the following criteria: • G Rating • Cartoon • Shorter than 3 hours Could execute a query, or…

  14. Different Approach … we could store the query (as data)! • Arrival of new data  search DB for queries that are satisfied • Notify users who issued matching queries

  15. Definition: Expressions • Boolean conditions characterizing user’s interests • Syntax-equivalent to SQL-WHERE clause • Stored in a column of database table • Treated as data

  16. More on Expressions • can reference variables and functions • Example: Tivo/Digital Guide ratingReview(movie) = 4 and Day = 6 and Length(movie) < 200; • finds movies that received 4 stars, less than 200 minutes, and are being shown on day 6 (Saturday)

  17. Expression Data Type • Conditional expression stored as text • Associated with Expression Set (set of expressions in db) Metadata (variables and functions common to set)

  18. Expression Metadata • Expressions share a common list of variables • Metadata made up of these variables, their data types and functions • Expressions can use this Metadata in their predicates

  19. Expression Metadata Cont’d • Metadata determines evaluation context of expressions in a column • Insertion or update to expression: expression checked against metadata to ensure its validity

  20. Expression Metadata Variable names and their data type List of user-defined Function

  21. Definition: EVALUATE • An operation comparable to (eval) in LISP/Scheme • Equivalent SQL Query • metadata in FROM clause • expression in WHERE clause SELECT 1 FROM (SELECT :Genre as Genre, :Rating as Rating FROM <any_table_with_one_row>) WHERE Genre = ‘Comedy’ and Rating = ‘PG’

  22. EVALUATE Operator • Evaluates an expression into 0 or 1. • Takes 2 arguments • text representation of expression • Data Item (String or AnyData type) • SELECT * FROM consumer WHERE EVALUATE( consumer.interest, • AnyData.convertObject( • Movie(‘Comedy’,’PG’))= 1 • SELECT * FROM consumer WHERE EVALUATE( • consumer.interest, • ‘Genre => “COMEDY”, • Rating => “PG”’ ) = 1

  23. Use of EVALUTE in SQL • Use of aggregates to allow complex expressions. • using GROUP BY or ORDER BY on the customer’s other attributes • SELECT * FROM consumer • WHERE EVALUATE(consumer.interest, • <movie details> ) = 1 • Order BY consumer.age

  24. Use of EVALUTE (continued) • Can be combined with other predicates to allow “Mutual Filtering” (Publishers can filter data, not only Subscribers) • notifying only customers with age >= 17 about the R-rated movies

  25. Use of EVALUTE (continued) • Perform JOINS on multiple relations • order the movies in the newMovie table by the number of subscribers interested in the type of the movie (using count(*) in SQL) • SELECT DISTINCT (newMovie.id), count(*) as demand FROM consumer, newMovie • WHERE EVALUATE(consumer.interest, • <movie details from newMovie table> ) = 1 • GROUP BY newMove.id • ORDER BY demand DESC

  26. Use of EVALUTE (continued) • Maintain complex (many-to-many) relationship between multiple tables. • joining customer and movie tables to create a listing of all the actual movies from movie table that each customer is interested in.

  27. Expression Indexing • As in previously seen systems, Oracle developers exploit commonalities • Large set of expressions many will tend to share common parts of their predicates • Index defined on group of expressions in column to process more “efficiently” with EVALUATE

  28. Expression Indexing Cont’d • Exploitation of commonalities leads to more “efficient” processing of expressions • Processing costs shared across multiple predicates • Example: • Given 2 predicates: one wants movie rated PG and one wants movie rated R • If one predicate is satisfied, the other is not

  29. Exploitation of Commonalities • Logical grouping of predicates by their LHS • ( length(movie) < 2; group on length(movie)) LHS OP RHS • Example: want movies shorter than 3 hours and movies shorter than 2 hours, • movies shorter than 2 hours satisfy both predicates

  30. Predicate Table • Predicates are grouped by LHS • Contains 1 row for each expression

  31. Index Processing • Data’s attribute compared with the RHS and OP columns • Example: SELECT exp_id from predicate_table WHERE G2_OP = ‘=‘ and G2_RHS = :rhs_val OR G2_OP = ‘>’ and G2_RHS < :rhs_val OR …

  32. Index on Predicate Table • We can have an index on the Predicate Table. • Processing: 1. Indexed predicate group 2. Stored predicate group 3. Sparse predicate group Some expressions are filtered out after each step - Less to check with slower predicate group.

  33. Predicate Handling-Comparisons

  34. Performance Characterization • Oracle says: preliminary experiments  promising results for the EFI Scheme • Performed best when “fine-tuned” for given expression set-- Tunable Characteristics of an index: • list of common predicates • list of common ops for these preds • # of indexed predicates

  35. Future Work • Native support needed for: • Expression data type • EVALUATE operator

  36. Why Does It Matter?(What to take away) • Expression/EVALUATE allows us to list all interested subscribers for a data by just one query. • Makes it easier to scale for large number of subscribers. • Using Relational Database Management system for Pub/Sub.

  37. Discussion Sneak Preview • What if we don’t have all information? Must conform to metadata, so now what? • Have we seen Pub/Sub before? What about myYahoo portals? • So, if RDMS can be extended to handle Pub/Sub, is this the limit?

  38. Continuous Queries over Append-Only Databases Using the Tapestry System

  39. Talk Topics • Continuous Queries • TQL / Tapestry • Query Transformations • Results • Conclusions

  40. Continuous Queries • What does ‘continuous’ mean? (Continuous Semantics) • The results of a continuous query is the set of data that would be returned if the query were executed at every instant in time. • Why an ‘append-only’ database? • Users are being presented with answers, and we don’t want a data item to be an answer at one point, but not an answer later in time.

  41. Periodic Query Example • Show all e-mail messages that have received a reply.

  42. Problems with Periodic Queries • Non-deterministic results • If two people make this query, will they obtain the same results? • Duplicates can be returned • Every time the query is run, the same results might be returned. • Inefficiency of the system • The query is run over all data, not new data, resulting in slow processing time do to un-necessary work being done by the system.

  43. The Tapestry System • A system that keeps a database of all mail and news messages received. • Designed to filter mail and newsgroup messages. • Allows users to query over this database. • Since Tapestry doesn't use triggers (time is used), it can be implemented in any commercial database that supports SQL applications.

  44. TQL • TQL – Tapestry Query Language • Works much like SQL, with the addition of a timestamp. • Every query uses DISTINCT. • No aggregates at this time. • Does not support outer joins. • TQL  monotone  incremental  SQL

  45. TQL Advantages • Allows ad hoc queries. • Can be used in a conventional relational database. • Time-oriented queries are supported. • Flexible scheduling of query running order.

  46. Continuous Query Example • Show all e-mail messages that have received a reply, and that have not been returned by this query before. • Method: • Subtract old answers from new answers and return this new set of data. • This is done because some past answers might be included in the ‘new’ data.

  47. Incremental Query Example • Show all e-mail messages that have received a reply, and that have not been returned by this query before. (Same query as before) • Method: • Run the query on the new data that has arrived, not the entire database or by comparing to past answers.

  48. Solutions • Incremental Queries can eliminate duplicates. • Timestamp • Query Transformations • To run continuous queries and solve the problem of non-determinism, TQL was developed.

  49. Query Transformation • Q(t)  QM(t)  QI(t, τ) • QM(t) – monotone query • QI (t, τ) – incremental query • Monotone query: QM(t1)  QM(t2) when t1 <t2 • Minimum bounding monotone query • Incremental query: QM(t)-QM(τ)  QI(t, τ)  QM(t) • Should return enough, but not too much

  50. Monotone queries • QM(t) = Us<=t Q(s) • Monotone query result = • = regular query + continuous semantics • Not all queries are monotone, but for an append-only database many queries are

More Related