
Mediators, Wrappers, etc.


Presentation Transcript


  1. Mediators, Wrappers, etc. • Based on the TSIMMIS project at Stanford. • Concepts used in several other related projects. • Goal: integrate information in heterogeneous data sources, including flat files, spreadsheets, … . • Key idea: write “wrappers” for data sources that export relation-like (or similarly high-level) views. • BUT, remember: sources != DBs. • Exported views: sets of heterogeneous “lightweight objects”.

  2. II architecture. [Figure: queries arrive at mediators; mediators sit above the sources.] • No predefined hierarchy. • A mediator talks to sources via translators, and to other mediators.

  3. What data model is appropriate? • Remember the role played by the data model now: • In DB design, you model the application data first, develop a schema, create tables, and populate them. • Here, you are trying to abstract existing data and/or applications using wrappers, and would like to leverage the abstraction for querying (i.e., II) via mediators. • So, you don’t get to preach here! • Model as expressive as possible • Yet as flexible as possible • Handle missing, repeated (nested), and heterogeneous data • Support meta-data

  4. What are the architectural requirements? • Facilitate easy joining of new mediators and “registration” of new sources • Need for Mediator generator and wrapper generator

  5. What sort of query model/language is appropriate? • Must understand and be in sync with the expressive but permissive data model we sketched. • TSIMMIS uses LOREL. • But we will keep our discussion more general. • In principle, can use SchemaSQL, XQuery, etc.

  6. More on data model • Lightweight object model (OEM): an OEM object = OID: <label, type, value>. • Self-descriptive (i.e., schema along with data, and for every data item!). • Value – atomic or set-valued.

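  7. An example OEM database [Figure: a “guide” object o1 with three “resto” children o2, o3, o4; edge labels include c, n, near, a, and address, with s, c, z under an address; values shown include “gourmet”, “Three amigos”, “westmont”, “1650 ste catherine”, “montreal”, “H3G 1M7”.] • Not every resto may have an address of the same type. • Indeed, some may have no address!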
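The OEM object structure (OID, label, type, value) can be sketched in Python as follows. This is only an illustration: the class name `OEMObject`, the helper `children`, the OIDs beyond o1 to o4, the expanded label names, and the assignment of values to particular restos are all assumptions, since the original figure is not recoverable. The sketch shows the three cases the slide mentions: a set-valued address, an atomic address, and no address at all.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class OEMObject:
    oid: str                                # object identifier
    label: str                              # self-describing label
    type: str                               # "string", "set", ...
    value: Union[str, List["OEMObject"]]    # atomic value or set of sub-objects


def children(obj: OEMObject, label: str) -> List[OEMObject]:
    """Sub-objects of obj carrying the given label (none if obj is atomic)."""
    if obj.type != "set":
        return []
    return [c for c in obj.value if c.label == label]


# Hypothetical data: which values belong to which resto is assumed.
o2 = OEMObject("o2", "resto", "set", [
    OEMObject("o5", "name", "string", "Three amigos"),
    OEMObject("o6", "address", "set", [          # structured address
        OEMObject("o7", "street", "string", "1650 ste catherine"),
        OEMObject("o8", "city", "string", "montreal"),
        OEMObject("o9", "zip", "string", "H3G 1M7"),
    ]),
])
o3 = OEMObject("o3", "resto", "set", [
    OEMObject("o10", "address", "string", "westmont"),   # atomic address
])
o4 = OEMObject("o4", "resto", "set", [
    OEMObject("o11", "category", "string", "gourmet"),   # no address at all
])
guide = OEMObject("o1", "guide", "set", [o2, o3, o4])

# For each resto, the types of its address sub-objects.
address_types = [[a.type for a in children(r, "address")]
                 for r in children(guide, "resto")]
```

Because every object carries its own label and type, a query can inspect `address_types` at run time instead of relying on a fixed schema.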

  8. TSIMMIS Query model • Each mediator describes its concepts (whatever it can garner from the sources it talks to) using some logical rules. • TSIMMIS uses MSL, but we will see that SchemaLog can express it easily.

  9. Information Manifold Approach • First of two models: Local as View (LAV). • World view = global predicates (like base relations, but they do not exist). • Each source = a description of what info. it can contribute to the global predicates = a view over the global predicates (derived relations). • Query the global predicates. • Answer using the views (which are the only ones that hold the data!).

  10. IM approach • Alternative model: global predicates exported by sources as a view of the data they actually store • Global as View (GAV) • Query global predicates • Answer by expanding query using view defs. • IM follows LAV
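Answering a GAV query by expanding view definitions can be sketched as below. The slides give no GAV definitions, so the two rules in `GAV` are hypothetical, as are the function names; the sketch also assumes one defining rule per global predicate whose head arguments are distinct variables. Terms starting with an uppercase letter are treated as variables, everything else as constants.

```python
from itertools import count


def is_var(term: str) -> bool:
    """Convention assumed here: variables start with an uppercase letter."""
    return term[:1].isupper()


# Hypothetical GAV definitions: global predicate -> (head variables, body over sources).
GAV = {
    "phone":  (("E", "P"), [("s1", ("E", "P", "M"))]),
    "office": (("E", "O"), [("s2", ("E", "O", "D"))]),
}


def unfold(body, views):
    """Replace each global atom by the body of its defining view."""
    fresh = count(1)
    result = []
    for pred, args in body:
        head_vars, view_body = views[pred]
        n = next(fresh)
        sub = dict(zip(head_vars, args))        # bind head variables to query terms
        for p, terms in view_body:
            # Variables that occur only in the view body get a fresh name.
            result.append((p, tuple(
                sub.get(t, f"{t}_{n}") if is_var(t) else t for t in terms)))
    return result


# q1(O,P) <- phone(mary,P), office(mary,O).
q1_body = [("phone", ("mary", "P")), ("office", ("mary", "O"))]
unfolded = unfold(q1_body, GAV)
```

Under these assumptions `unfolded` is a query over source predicates only, which is why GAV query answering reduces to view expansion, whereas LAV needs a search for rewrites.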

  11. LAV example • Global predicates: emp(E), phone(E,P), office(E,O), mgr(E,M), dept(E,D) (remember they DON’T exist!) • source1(E,P,M) <- emp(E), phone(E,P), mgr(E,M). • source2(E,O,D) <- emp(E), office(E,O), dept(E,D). • source3(E,P) <- emp(E), phone(E,P), dept(E,'toy'). • Points to remember: • Views are descriptive, not prescriptive. • Completeness not guaranteed. • Consistency across sources not guaranteed. • Example query: q1(O,P) <- phone(mary,P), office(mary,O).

  12. Query answering • How can we answer such a query? • Must get all relevant info. from views. • I.e., rewrite query using ONLY source/view predicates. • More than one possible way. • Want ALL possible rewrites (to ensure (near) completeness). • Rewritten q1 (s1, s2, s3 abbreviate source1, source2, source3; the employee must be bound to mary, otherwise the expansion is not contained in q1): • r1q1(O,P) <- s1(mary,P,M), s2(mary,O,D). • r2q1(O,P) <- s3(mary,P), s2(mary,O,D). • There are other rewrites too (e.g., join all three sources), but they are contained in one of the above. So, the above rewrites are all the “minimal” ones. • Compare expanded r1q1 and r2q1 with q1 (w.r.t. containment). What can you say?
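Checking a candidate rewrite against q1 can be sketched as follows: expand the source atoms using the LAV descriptions, then search for a containment mapping from q1 into the expansion. The representation (atoms as `(predicate, terms)` tuples, uppercase strings as variables) and the function names are assumptions made for illustration, and the search is plain brute-force backtracking, not an optimized algorithm.

```python
def is_var(term: str) -> bool:
    """Convention assumed here: variables start with an uppercase letter."""
    return term[:1].isupper()


# LAV source descriptions: source -> (head variables, body over global predicates).
SOURCES = {
    "s1": (("E", "P", "M"), [("emp", ("E",)), ("phone", ("E", "P")), ("mgr", ("E", "M"))]),
    "s2": (("E", "O", "D"), [("emp", ("E",)), ("office", ("E", "O")), ("dept", ("E", "D"))]),
    "s3": (("E", "P"), [("emp", ("E",)), ("phone", ("E", "P")), ("dept", ("E", "toy"))]),
}


def expand(body):
    """E(r): replace every source atom by the body of its description."""
    out = []
    for i, (pred, args) in enumerate(body):
        head_vars, desc = SOURCES[pred]
        sub = dict(zip(head_vars, args))
        for p, terms in desc:
            out.append((p, tuple(
                sub.get(t, f"{t}_{i}") if is_var(t) else t for t in terms)))
    return out


def containment_mapping(q_head, q_body, r_head, r_body):
    """Return h with h(q_head) = r_head and h(each q subgoal) in r_body, or None."""
    def extend(h, src, dst):
        if len(src) != len(dst):
            return None
        h = dict(h)
        for s, d in zip(src, dst):
            if is_var(s):
                if h.setdefault(s, d) != d:     # conflicting binding
                    return None
            elif s != d:                        # constants map to themselves
                return None
        return h

    def search(h, goals):
        if not goals:
            return h
        (pred, args), rest = goals[0], goals[1:]
        for p, terms in r_body:
            if p == pred:
                h2 = extend(h, args, terms)
                if h2 is not None:
                    found = search(h2, rest)
                    if found is not None:
                        return found
        return None

    h0 = extend({}, q_head, r_head)
    return None if h0 is None else search(h0, list(q_body))


q1_head, q1_body = ("O", "P"), [("phone", ("mary", "P")), ("office", ("mary", "O"))]
r1_body = [("s1", ("mary", "P", "M")), ("s2", ("mary", "O", "D"))]
r2_body = [("s3", ("mary", "P")), ("s2", ("mary", "O", "D"))]
unbound = [("s1", ("E", "P", "M")), ("s2", ("E", "O", "D"))]   # employee left as a variable

r1_valid = containment_mapping(q1_head, q1_body, ("O", "P"), expand(r1_body)) is not None
r2_valid = containment_mapping(q1_head, q1_body, ("O", "P"), expand(r2_body)) is not None
unbound_valid = containment_mapping(q1_head, q1_body, ("O", "P"), expand(unbound)) is not None
```

In this sketch the two rewrites with the employee bound to mary pass the check, while the variant that leaves the employee as a variable does not, because the constant mary in q1 cannot be mapped onto a variable.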

  13. How do we get minimal rewrites? • q – original query given (CQ over global predicates). • r – a candidate rewrite. It’s valid provided r’s expansion (obtained by expanding the source definitions), say E(r), is contained in q. • A rewrite r is minimal if E(r) is NOT contained in E(r’) for any other valid rewrite r’. • What does minimality really mean? • Example: s1(X,Y) <- a(X,Y). s2(X,Y) <- a(X,Y). Query: q(X,Y) <- a(X,Y). Both r1q(X,Y) <- s1(X,Y) and r2q(X,Y) <- s2(X,Y) are needed to answer it. Why? (s1 and s2 do NOT necessarily provide the same set of tuples. Rules are descriptions, NOT prescriptions!) • How many rewrites should we try?
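The s1/s2 example can be made concrete with a toy instance. The tuples below are invented purely for illustration; the point is only that two sources with identical descriptions may still hold different data, so the answer is the union over both rewrites.

```python
# Hypothetical contents of two sources that are both described by a(X,Y).
s1 = {(1, "x"), (2, "y")}
s2 = {(2, "y"), (3, "z")}


def r1q():
    """r1q(X,Y) <- s1(X,Y)."""
    return set(s1)


def r2q():
    """r2q(X,Y) <- s2(X,Y)."""
    return set(s2)


# Using either rewrite alone misses tuples; the certain answers are the union.
answers = r1q() | r2q()
missed_by_r1 = answers - r1q()
missed_by_r2 = answers - r2q()
```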

  14. Levy et al. Theorem • Thm.: if a rewrite r of query q has more subgoals than q, then r can’t be minimal. Proof: assume r is valid (or it’s useless). So E(r) is contained in q. Let h be the containment mapping from q to E(r). If r has more subgoals than q, there must be a subgoal p in r such that h doesn’t map any subgoal of q to any subgoal in E(p). Get rid of all such subgoals to obtain a modified rewrite r’. r’ contains r (trivially). But r’ is still contained in q (just use the original containment mapping h). QED. • Given a q, only consider those sources whose body contains >= 1 global predicate appearing in q. • Still an exponential number of choices, but not too terrible in practice.
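The two pruning ideas above (at most as many source subgoals as q has subgoals, and only sources mentioning a predicate of q) can be sketched as a candidate enumerator. This is a simplified illustration: it enumerates only which sources are combined and ignores how variables are bound, the source `s4` is a hypothetical irrelevant source added to show the pruning, and the function names are assumptions.

```python
from itertools import combinations_with_replacement

# Global predicates mentioned in each source description (s4 is hypothetical).
SOURCE_PREDS = {
    "s1": {"emp", "phone", "mgr"},
    "s2": {"emp", "office", "dept"},
    "s3": {"emp", "phone", "dept"},
    "s4": {"salary"},
}


def relevant_sources(query_preds):
    """Sources whose description shares at least one predicate with the query."""
    return sorted(s for s, preds in SOURCE_PREDS.items() if preds & query_preds)


def candidate_source_bags(query_preds, n_subgoals):
    """Multisets of relevant sources with at most n_subgoals members."""
    rel = relevant_sources(query_preds)
    for k in range(1, n_subgoals + 1):
        yield from combinations_with_replacement(rel, k)


# q1(O,P) <- phone(mary,P), office(mary,O): two subgoals over phone and office.
candidates = list(candidate_source_bags({"phone", "office"}, 2))
```

Each candidate would still have to be turned into concrete rewrites by choosing variable bindings and then checked for containment in q, which is where the exponential number of choices comes from.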

  15. Example revisited & expanded. • Suppose source 1 instead exported s1(E,P) and source 2 exported s2(E,O). • Is q1 answerable using the views? • What about q2(E) <- emp(E), mgr(E,'john')? • What about q3(E1,E2) <- phone(E1,P), phone(E2,P)? • What about q4(E,M) <- emp(E), dept(E,'toy'), mgr(E,M)?

  16. QAV (AQUV) – general story • Why is QAV a worthwhile problem? • Speed up query processing. • Materialized views. • Can I answer this query using stored view(s)? • Information integration. • Sources store some data, and *describe* (usually using rules) how local data relates to the global schema (i.e., what are the contributions?) • Can I answer this query using available source data (i.e., views)? • How best can I answer?

  17. QAV – two models • Classic query optimization context: • Equivalent rewriting. • Used extensively in data warehousing/OLAP. • Information integration: • Maximally contained (also called minimal, maximally sound) rewriting. • Excellent survey: Alon Y. Halevy. Answering queries using views: A survey. VLDB Journal, 2001.
