1 / 35

Answering Queries using views: A survey

Answering Queries using views: A survey. The problem…. We have a schema A and a set of materialized views V1, V2,…, Vn We are given a query Q over schema A Can we answer Q using another query Q’ referring only to (one or more) views? Depending on the specific context:

kolya
Download Presentation

Answering Queries using views: A survey

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Answering Queries using views: A survey

  2. The problem… • We have a schema A and a set of materialized views V1, V2,…, Vn • We are given a query Q over schema A • Can we answer Q using another query Q’ referring only to (one or more) views? • Depending on the specific context: • Schema A can be either virtual or materialized • Q’ may be required to be an equivalent rewriting or a maximally contained rewriting of Q

  3. Equivalent and Containment rewritings • Query Containment: Q1 is contained in Q2 (Q1Q2), if for all db instances D, Q1(D)Q2(D) • Query Containment: Q1 is equivalent to Q2 if Q1Q2 and Q2  Q1 Let A a schema and V={V1, V2, …, Vm} a set of views over A and Q a query over A • Q’ is a equivalent rewriting of Q, if • Q’ refers only to views in V • Q’ is equivalent to Q • Q’ is a maximally contained rewriting of Q, if • Q’ refers only to views in V • Q’  Q • There is no other rewriting Q1, such as Q’  Q1  Q

  4. When the problem arises? • Data integration • The sources are described as views over the mediated schema • Data warehouse • The schema of the warehouse can be defined as views over the schemata of the sources • Query Optimization • Computing the query using previously materialized views • semantic caching • Database design • Independence of the physical view of the data and logical view • Rewriting of a query over the logical schema into a query over the physical view

  5. The variations of the problem

  6. A running example… • Suppose we have the following schema Student(st_id, st_name, dp_id) Professor(pr_id, pr_name) Department(dp_id, dp_description) Course(cr_id, cr_title, cr_semester) Prof_Dep(pr_id, dp_id, university) Student_course(st_id, cr_id) Prof_course(pr_id, cr_id)

  7. Data Integration Mediator Student(st_id, st_name, dp_id) Professor(pr_id, pr_name) Department(dp_id, dp_description) Course(cr_id, cr_title, cr_semester, cr_field) Prof_Dep(pr_id, dp_id, university) Student_course(st_id, cr_id) Prof_course(pr_id, cr_id) Create View AuebProfs As Select cr_title, pr_name From Course, Prof_course, Professor Where Course.cr_id = Prof_course.cr_id and Prof_course.pr_id = Professor.pr_id and Course.univ = “AUEB” Create View DBProfs As Select cr_title, pr_name From Course, Prof_course, Professor Where Course.cr_id = Prof_course.cr_id and Prof_course.pr_id = Professor.pr_id and Course.cr_field = “DB” Source 1 Professors Of Databases Source 1 Professors Of AUEB

  8. Data Integration Mediator Student(st_id, st_name, dp_id) Professor(pr_id, pr_name) Department(dp_id, dp_description) Course(cr_id, cr_title, cr_semester, cr_field) Prof_Dep(pr_id, dp_id, university) Student_course(st_id, cr_id) Prof_course(pr_id, cr_id) Query Q: Select pr_name From Professor Query Q’: Select pr_name From DBProfs Union Select pr_name From AUEBProfs Source 1 Professors Of Databases Source 1 Professors Of AUEB Q’ is a maximally-contained Rewriting of Q (professors of Algorithms in the university of Piraeus cannot be retrieved…)

  9. Data Integration Mediator Query P: Select pr_name From Course, Prof_course, Professor Where Course.cr_id = Prof_course.cr_id and Prof_course.pr_id = Professor.pr_id and Course.cr_field = “DB” And Course.univ = “AUEB” Student(st_id, st_name, dp_id) Professor(pr_id, pr_name) Department(dp_id, dp_description) Course(cr_id, cr_title, cr_semester, cr_field) Prof_Dep(pr_id, dp_id, university) Student_course(st_id, cr_id) Prof_course(pr_id, cr_id) Query P’: Select pr_name From DBProfs, AUEBProfs Where DBProfs.pr_name = AUEBProfs.pr_name Source 1 Professors Of Databases Source 1 Professors Of AUEB P’ is an Equivalent Rewriting of P

  10. Query Optimization • Suppose that we have a database with the schema of our running example • Suppose that we have the following materialized views • create view StudentCoursesView As Select st_name, cr_id from Student, Stud_course where Student.st_id = Stud_course.st_id • create view ProfCoursesView As Select pr_name, pr_id from Professor, Prof_course where Professor.pr_id = Prof_course.pr_id

  11. Query Optimization • Suppose we are given the query Select st_name, prof_name from Student, Stud_course, Prof_course, Professor where Student.st_id = Stud_course.cr_id and Stud_course.cr_id = Prof_course.cr_id and Prof_course.pr_id = Professor.pr_id

  12. Query Optimization • We can exploit the materialized views in order to eliminate joins • There are many possible combinations • One could be: Select st_name, pr_name From StudentCoursesView, ProfessorCoursesView Where StudentCoursesView.cr_id = ProfessorCoursesView.cr_id

  13. Query Optimization • But: • Our aim here is not just to find an equivalent rewriting • We have to find all rewritings and evaluate their costs in order to select the cheapest plan • In some cases views may not help due to loss of indexes • Therefore, we need to make a cost-based rewriting • Similar case is the semantic caching • Client-server architecture • Query results can be temporary maintained as relations in the client side to be exploited for the evaluation of future queries

  14. A common question: When a view V is useful for a query Q? • There must be a mapping μ from the relations of V to the relations of Q • If so, if R1 is both in V and Q then: • If R1 participates in a join in Q with relation R2, then • Either R1 and R2 must be joined in the same attributes with the same conditions in V • Or R1 and R2 must be joined in V in the same attributes but with weaker conditions than those in Q, provided that the join attributes are projected in V • Or only R1 is in V and the attributes of R1 participating in the join in Q, are projected in V • If Q applies a selection predicate in an attribute A of R1, then V: • Either V must apply the same selection predicate in attribute A of R1 • Or V must apply a weaker selection predicate, provided that attribute A is projected in V • Or does not apply a predicate in attribute A of R1 at all and attribute A is projected in V • V must project at least all attributes projected in Q

  15. Examples Q: Select st_name, prof_name from Student, Stud_course, Prof_course, Professor where Student.st_id = Stud_course.st_id and Stud_course.cr_id = Prof_course.cr_id and Prof_course.pr_id = Professor.pr_id and Student.st_year > 2000 V1: Select st_name from Student where st_year>2000 • V1: Select st_name, cr_name • from Student, Stud_course, Course • where Student.st_id = Stud_course.st_id • and Stud_course.cr_id = Course.cr_id V1: Select st_name, st_id from Student where st_year> 2000 • V1: Select st_name, st_year, cr_name, • cr_id • from Student, Stud_course, Course • where Student.st_id = Stud_course.st_id • and Stud_course.cr_id = Course.cr_id V1: Select st_name, st_id from Student where st_year >1996 V1: Select st_name, st_id, ma from Student where am >1996

  16. Algorithms for finding query rewritings using views • In the query optimization context • A variation of the Selinger Algorithm • Cost-based equivalent rewritings • In the data integration context • The bucket algorithm • The inverse-rules algorithm • The MiniCon algorithm Maximally-contained rewritings

  17. Query Optimization – the Selinger Algorithm • Recall: the Selinger Algorithm chooses the best join order • A bottom-up dynamic programming algorithm • Each iteration explores all join combinations of certain length • For each combination of joins, it keeps only the cheapest plan, taking into account costs estimated in the previous iteration

  18. A variation of the Selinger Algorithm • The Selinger Algorithm can be modified in order to choose the cheapest equivalent rewriting • We start with views that seem useful for a certain query • We continue in a bottom-up fashion exploring all join combinations • For each combination we prune plans estimated to be more expensive or less contributing than others

  19. A variation of the Selinger Algorithm • First Iteration: • Find all views that are relevant (useful) to the query • Choose the best access path for each view and evaluate its cost • Second Iteration: • Find all pair combinations • For each pair find all possible Joins in all possible attributes that lead to useful plans • Plans that are complete (i.e. are sufficient to answer the query) are distinguished as final candidates • For the rest plans do the following: • Compare pairs of plans • For each pair (p, p’), if p’ is cheaper and has greater or equal contribution (covers more relations), then p is pruned • Third Iteration: • Find all possible pairs of plans derived from the previous iteration and the plans derived from the first iteration • …

  20. A variation of the Selinger Algorithm Query: R1 R2 R3 R4 Views: V1=R1 R2 , V2=R2 R3 R4 V3=R2 R3, V4=R3 R4, (V1-V3)-V2 (V1-V3)-V4 V1-V2 V1-V3 V1-V4 V2-V3 V2-V4 V3-V4 V1 V2 V3 V4

  21. Algorithms in the Data Integration Context • We will discuss 2 algorithms • The bucket algorithm • The inverse-rules algorithm • But first a short introduction to conjunctive queries

  22. Conjunctive Queries • Conjunctive queries are able to express select-project-join queries • h(X,…) :- a(Y,…), b(Z,…), … • Head and subgoals are atoms • An atom consists of a predicate and a tuple of arguments: a(Y, Z, ‘aueb’, W, 2006) • Predicates stand for relations body subgoals head constants variables

  23. Conjunctive Queries • Subgoals can also be arithmetic comparisons h(X,…) :- a(Y,…), b(Z,…), Z>3 • A variable that appears in the head is said to be distinguished ; otherwise nondistinguished. h(X,Y):-a(X,Z),b(Z,Y) X,Y: distinguished, Z:nondestinguished • project-select-join SQL queries as conjunctive queries: Select pr_name, dep_name From Professor, Department Where Professor.pr_dep = Department.dep_id and Department.dep_year>1996 and Professor.field = “DB” q(pr_name, dep_name) :- Professor(pr_name, dep_id, “DB”), Department(dep_id, dep_name, dep_year), dep_year>1996.

  24. Conjunctive Queries • The problem of rewriting queries using views in the notation of conjunctive queries: • He have • a set of Database Relations • a set of views declared as conjunctive queries over the Database Relations • a query q expressed as a conjunctive query over the Database Relations • We want to find: • A set of conjunctive queries over the view relations, the disjunction of which being a maximally contained rewriting of q

  25. example Relations: R(S,C,Q), CO(C,T), TE(P,C,Q) Views: V1(S, C, Q, T) :- R(S, C, Q), CO(C, T), C>500, Q>97 V2(S, P, C, Q) :- R(S, C, Q), TE(P, C, Q) V3(S, C) :- R(S, C, Q), Q<94 V4(P, C, T, Q) :- R(S, C, Q), CO(C, T), TE(P, C, Q), Q<97 Query: Q(S, C, P) :- TE(P,C,Q), R(S,C,Q), CO(C,T), C>300, Q>95

  26. The Bucket Algorithm • Create a ‘bucket’ for each subgoal of the query q • Fill each bucket with views that ‘cover’ the subgoal • A view V covers a subgoal p(A,B) of q if there is a subgoal in V with the same predicate p(X,Y) and: • there is a mapping from the variables A, B of the subgoal of q to the variables X, Y of the subgoal of the view • For variables of the subgoal of q appearing in the head of q, their mapping variables in the subgoal of the view must also appear in the head of the view • For each variable of the subgoal of q that also appear in arithmetic conditions C in q, if its mapping variable in the subgoal of the view also appear in arithmetic conditions C’ in the view, then C and C’ should not be mutually exclusive • A solution must include q view from each bucket

  27. The Bucket Algorithm V1(S, C, Q, T) :- R(S, C, Q), CO(C, T), C>500, Q>97 V2(S, P, C, Q) :- R(S, C, Q), TE(P, C, Q) V3(S, C) :- R(S, C, Q), Q<94 V4(P, C, T, Q) :- R(S, C, Q), CO(C, T), TE(P, C, Q), Q<97 Q(St, Cn, Pr) :- TE(Pr,Cn,Qu), R(St,Cn,Qu), CO(Cn,Tl), Cn>300, Qu>95 V4(P,C,T’, Q) V2(S,P’,C,Q) V4(S’, C, T, Q’) V2(S’,P,C, Q) V1(S,C,Q,T’) V1(S’, C, Q’, T) CO TE R

  28. The Bucket Algorithm • There are 8 combinations we have to check • For each, we equaling appropriate variables to enforce joins • We check for condition confligs • If there are duplicate subgoals we try to eliminate them V4(P,C,T’, Q) V2(S,P’,C,Q) V4(S’, C, T, Q’) V2(S’,P,C, Q) V1(S,C,Q,T’) V1(S’, C, Q’, T) TE R CO q’(S,C,P):-V2(S’,P,C,Q), V1(S,C,Q,T’), V1(S’,C,Q’,T) The first solution q’(S,C,P):-V2(S’,P,C,Q), V1(S,C,Q,T’)

  29. The inverse-rules algorithm • Key idea: invert the view definitions to give the global predicates (not only those appearing in the query) in terms of views • For each nondistinguished variable X, pick a new function symbol f Example: v(X,Y) :- p(X,Z), p(Z,Y) Replace Z by f(X,Y): v(X,Y) :- p(X,f(X,Y)), p(f(X,Y),Y)

  30. The inverse-rules algorithm • Replace a view definition by rules with: • A subgoal as the head, and • The view itself as the only subgoal of the body. • Example: v(X,Y) :- p(X,f(X,Y)), p(f(X,Y),Y) becomes: p(X,f(X,Y)) :- v(X,Y) p(f(X,Y),Y) :- v(X,Y)

  31. The inverse-rules algorithm - example View: v(D,C):-m(S,D), r(S, C) Query: q(D) :- m(S,D), r(S, 444) After applying the inverse-rules, we have: m(f1(D,C), D) :- v(D, C) r(f1(D,C), C) :- v(D,C)

  32. The inverse-rules algorithm - example • Supose that v has the following tuples: {(CS, 444), (EE, 444), (CS, 333)} • Applying the inverse-rules on these tuples we have: m: { (f1(CS, 444), CS), (f1(EE, 444),EE), (f1(CS, 333), CS)} r: { (f1(CS, 444), 444), (f1(EE, 444),444), (f1(CS, 333), 333)}

  33. The inverse-rules algorithm - example • We can now apply the query in this instance q: {(CS),(EE)}

  34. The MiniCon algorithm • A variation of the bucket algorithm • It takes into account the non-distinguished variables as well, which usually imply joins • If a view contains a ‘target’ subgoal, which participates in a join in the query, then the ‘join’ variables should appear in the head of the view (unless the join takes place also within the view) • It excludes much more views from participating in buckets (here called MCDs) • There are fewer combination to check at the end

  35. Discussion…

More Related