1 / 29

Answering Queries Using Views: A Survey

Answering Queries Using Views: A Survey. Paper by Alon Halevy Presentation by Rachel Pottinger. Reminders. A view is a stored query A Datalog query example: q(code):- Airport(code, city), Feature(city, “Beach”) Find all airport codes of cities that have beaches.

steffes
Download Presentation

Answering Queries Using Views: A Survey

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Answering Queries Using Views: A Survey Paper by Alon Halevy Presentation by Rachel Pottinger

  2. Reminders • A view is a stored query • A Datalog query example:q(code):- Airport(code, city), Feature(city, “Beach”)Find all airport codes of cities that have beaches

  3. Answering Queries Using Views – basic definition • Answer a query using a view rather than using the underlying base table • Query: q(code):- Airport(code, city), Feature(city, POI) • View:feature-code(code,POI):- Airport(code, city), Feature(city,POI) • Rewriting using view:q(code):-feature-code(code,POI)

  4. Two distinct problems: • Query optimization • Data integration • Physical Data Independence

  5. AQUV in Query Optimization Goals • Speed Query Processing • Still need exact answers

  6. AQUV in Query Optimization:Closed World Assumption • Closed World Assumption • Views are complete • Think of as “If and only if” • feature-code(code, POI):- Airport(code, city), Feature(city, POI)retrieves all airport codes for cities with beaches • How do we know this holds? Given from problem – can’t tell from view definition

  7. AQUV in Query Optimization:Looking for Equivalent Rewritings • Rewritings must be equivalent • Think of as “rewritten query must retrieve exactly the same answers as the original query” • Equivalent ex:Query: q(code):- Airport(code, city), Feature(city,POI)View: feature-code(code,POI):- Airport(code, city), Feature(city, POI)Equivalent Rewriting: q(code):-feature-code(code, POI) • Non-equivalent ex:Same QueryView: Beach-code(code):- Airport(code,city), Feature(city, “Beach”) • Non-equivalent (contained) rewriting: q(code):-beach-code(code)

  8. AQUV in Query Optimization:Can still access base relations • Can access views and base relations • Ex: • Query: q(code, URL):- Airport(code, city), Feature(city,POI), Webinfo(POI, URL) • View:feature-code(code,POI):- Airport(code, city), Feature(city, POI) • Rewriting:q(code,URL):-feature-code(code,POI), Webinfo(POI, URL)

  9. AQUV in Query Optimization: General Algorithm • Fold into System-R style optimizer • It’s just another access path

  10. AQUV in Query Optimization: Discussion • Imagine that you're building a query optimizer.  Would you consider it worthwhile to use views when answering queries?  Why or why not?  Would you try it only for certain kinds of queries or situations?  Which ones?

  11. AQUV in Data Integration:Example: Planning a Beach Vacation

  12. Potential Data Integration Architecture:Local-As-View (LAV) User Query Mediated Schema Local Schema 1 Local Schema N Local Database 1 Local Database N Expedia Orbitz Local sources are views on mediated schema

  13. Local As View (LAV) LAV: local source is materializedview over mediated schema Mediated Schema:Airport(code, city)Feature(city, attraction) Local Sources/Views:CAA-Air(code, city) :- Airport(code, city)Beaches(code) :-Airport(code, city), Feature(city, “Beach”) MediatedSchema … CAA-Air Beaches

  14. Local As View (LAV) LAV: local source is materializedview over mediated schema Mediated Schema:Airport(code, city)Feature(city, attraction) Local Sources/Views:CAA-Air(code, city) :- Airport(code, city)Beaches(code) :- Airport(code, city), Feature(city, “Beach”) • Adding new sources is easy • Rewriting queries is NP-complete MediatedSchema … CAA-Air Beaches Non deterministic polynomiali.e., likely exponential time: Have to try all combinations We’ll see how this works on the next slide

  15. AQUV in Data Integration:Assumptions • Open World Assumption • Each source only has some of the tuples • Read as “if  then” • Fodors(city, POI) :- Feature(city, POI)Fodors has some Features • This is an assumption – you can’t tell from view definition • Can’t access base relations • May not be able to find an equivalent rewriting

  16. AQUV in Data Integration:Maximally Contained Rewritings Query: Dest(code) :- Airport(code, city), Feature(city, “Beach”) Sources/Views: CAA-Air(code, city) :- Airport(code, city) Fodors(city, POI) :- Feature(city, POI) Rewriting:Dest(code):-CAA-Air(code, city), Fodors(city, “Beach”) Maximally Contained Rewriting: all answers to Query are a subset of those of Rewriting, and Rewriting contains all possible answers given local sources Q MS Next slide shows additional feature CAA … Fodors We’ll see an example of containment in specific algorithm

  17. Answering Queries Using Views Query: Dest(code) :- Airport(code, city), Feature(city, “Beach”) Sources/Views: CAA-Air(code, city) :- Airport(code, city) Fodors(city, POI) :- Feature(city, POI)Sun-Surf(city) :- Feature(city, “Beach”) Rewriting:Dest(code):-CAA-Air(code, city), Fodors(city, “Beach”) Dest(code):-CAA-Air(code, city), Sun-Surf(city) Maximally Contained Rewriting: all answers to Query are a subset of those of Rewriting, and Rewriting contains all possible answers given local sources Q MS CAA … Fodors How do we find the Maximally Contained Rewriting?

  18. AQUV in Data Integration: Discussion There are two assumptions that are made in maximally-contained rewritings: (a) the sources are incomplete and (b) contained rewritings are okay. • Are there data integration scenarios where you don’t think that this is true? If so, what? • Can you come up with any scenarios where only one of (a) or (b) are true? If so, what?

  19. Naïve Solution: Bucket Algorithm • Created as part of Information Manifold, Levy et al. • Algorithm: • Create a bucket for each query subgoal, place all relevant views into the bucket: Q(X):- g1(x1), …, gn(xn) • For each element in cross product of the buckets, check containment X X Don’t worry what containment is. Also NPC

  20. Subgoal Interaction The Bucket Algorithm does not recognize interactions: Query:Dest(code) :- Airport(code, city), Feature(city, “Beach”) Sources/Views:Orbitz(code):- Airport(code, city) Beaches(code) :- Airport(code, city), Feature(city, “Beach”) Frommers(city, POI):-Feature(city,POI) Bucket would check: Dest'(code):-Orbitz(code),Frommers(city,“Beach”) Expanding this gets:Dest'(code):-Airport(code,_),Feature(city, “Beach”) All answers to Dest' are not answers Dest (containment)

  21. The MiniCon Algorithm: Phase One[Pottinger & (Ha)Levy: VLDB] Query:Dest(code) :- Airport(code, city), Feature(city, “Beach”) Sources/Views:Orbitz(code) :- Airport(code, city)Beaches(code) :- Airport(code, city),Feature(city, “Beach”) Rewriting:Dest(code) :- Beaches(code) Create MiniConDescriptions (MCDs): View subgoals linked by existential variables must be mapped together What a difference datalog makes! What happens when we need two views?

  22. MiniCon Algorithm Phase Two: Combine MCDs with non-overlapping subgoals Combine MCDs with non-overlapping subgoals Query:Dest(code) :- Airport(code, city), Feature(city, “Beach”), Flight(“YVR”, code, airline, number) Sources/Views:Orbitz(code) :- Airport(code, city)Beaches(code) :- Airport(code, city),Feature(city, “Beach”)Expedia(orig, dest) :- Flight(orig, dest, airline, number) Rewriting:Dest(code) :- Beaches(code), Expedia(“YVR”, code) Fewer Combinations No Explicit Containment Check

  23. AQUV Algorithms: Discussion • Does the computational complexity of these problems surprise you? Do they seem harder or easier than expected? • How would you scale the complexity of each of the algorithms presented in terms of the completeness of the algorithms?

  24. What happened then?

  25. Schema mappings • Where do those mappings come from? What do they look like?

  26. Peer Data Management Systems • Rather than have a centralized authority, make things distributed

  27. Model Management • Most metadata applications are redone from scratch every time. • It would be nice to have an algebra (like relational algebra) only on the schema level so that these algorithms could be reused

  28. Data Spaces • Pay as you go data integration

  29. Industry: Data Integration  Enterprise Information Integration • Challenges: • Scale up and performance • Horizontal (general) vs. vertical (solving entire problem) • Integration with EAI and other middleware • But did make it

More Related