Mediators, Wrappers, etc. • Based on the TSIMMIS project at Stanford. • Concepts used in several other related projects. • Goal: integrate info. in heterogeneous data sources, incl. flat files, spreadsheets, … . • Key idea: write “wrappers” for data sources that export relation-like (or similarly high-level) views. • BUT, remember: sources != DBs. • Exported views are sets of heterogeneous “lightweight objects”.
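To make the wrapper idea concrete, here is a minimal sketch, assuming a CSV flat file as the source; the function name `csv_wrapper`, the column names, and the row data are all hypothetical. Each row is exported as a self-describing object, and blank fields are dropped rather than stored as NULLs, so the exported objects are heterogeneous instead of rows of one fixed schema.

```python
import csv
import io

# Hypothetical flat-file source: a small restaurant list with blank fields.
RAW = """name,category,address
Three amigos,gourmet,1650 ste catherine
Chez X,,
"""


def csv_wrapper(text: str, label: str) -> list[dict]:
    """Export each CSV row as a lightweight, self-describing object.

    Blank fields are dropped instead of being stored as NULLs, so the
    exported objects may carry different sets of attributes.
    """
    objects = []
    for row in csv.DictReader(io.StringIO(text)):
        attrs = {k: v for k, v in row.items() if v}  # keep only present fields
        objects.append({"label": label, "value": attrs})
    return objects


for obj in csv_wrapper(RAW, "resto"):
    print(obj)
```

The second exported object carries only a name, which is exactly the kind of data a relational schema would force into NULL-padded rows.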
II architecture. • No predefined hierarchy. • A mediator talks to sources via translators (wrappers) and to other mediators. [Figure: queries enter mediators; mediators talk to other mediators and, via translators, to sources.]
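The "no predefined hierarchy" point can be sketched with two classes that share one `answer` method, so a mediator treats a wrapper and another mediator alike. This is a hypothetical simplification (a query is just a label to match, and answers are unioned), not TSIMMIS's actual interface; all class names and data are illustrative.

```python
from typing import Protocol


class Answerer(Protocol):
    """Anything a mediator can query: a wrapper or another mediator."""

    def answer(self, label: str) -> list[dict]: ...


class Wrapper:
    """Translator in front of one source; exports objects, filtered by label."""

    def __init__(self, name: str, objects: list[dict]):
        self.name = name
        self.objects = objects

    def answer(self, label: str) -> list[dict]:
        return [o for o in self.objects if o["label"] == label]


class Mediator:
    """Queries any mix of wrappers and mediators; no fixed hierarchy."""

    def __init__(self, name: str, children: list[Answerer]):
        self.name = name
        self.children = children

    def answer(self, label: str) -> list[dict]:
        results: list[dict] = []
        for child in self.children:
            results.extend(child.answer(label))  # union of the children's answers
        return results


w1 = Wrapper("w1", [{"label": "resto", "value": "Three amigos"}])
w2 = Wrapper("w2", [{"label": "resto", "value": "Chez X"},
                    {"label": "guide", "value": "city guide"}])
m1 = Mediator("m1", [w1])
m2 = Mediator("m2", [m1, w2])  # a mediator talking to another mediator
print(m2.answer("resto"))
```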
What data model is appropriate? • Remember the role played by the data model here: • In DB design, you model appln. data first, develop a schema, create tables, and populate ’em. • Here, you are trying to abstract existing data and/or applns. using wrappers, and would like to leverage the abstraction for querying (i.e., II) via mediators. • So, you don’t get to preach here! • Model as expressive as possible • Yet as flexible as possible • Handle missing, repeated (nested), and heterogeneous data • Support meta-data
What are the architectural requirements? • Facilitate easy joining of new mediators and “registration” of new sources • Need for a mediator generator and a wrapper generator
What sort of query model/language is appropriate? • Must understand, and be in sync with, the expressive but permissive data model sketched above. • TSIMMIS uses LOREL. • But we will keep our discussion more general. • In principle, can use SchemaSQL, XQuery, etc.
More on data model • Lightweight object model: OEM (Object Exchange Model). An OEM object = OID: <label, type, value>. • Self-descriptive (i.e., the schema travels along with the data, for every data item!). • Value: atomic or set-valued (a set of subobjects).
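A minimal sketch of an OEM object as a Python dataclass, assuming a string type tag and a list of subobjects for set values; the OIDs and labels in the usage lines are hypothetical. Because label and type are stored with every object, the object describes itself.

```python
from dataclasses import dataclass


@dataclass
class OEMObject:
    """OID: <label, type, value>; every object carries its own label and type."""
    oid: str
    label: str
    type: str      # e.g. "string", "integer", or "set"
    value: object  # an atomic value, or a list of OEMObject when type == "set"

    def __repr__(self) -> str:
        return f"{self.oid}: <{self.label}, {self.type}, {self.value!r}>"


name = OEMObject("o5", "name", "string", "Three amigos")
resto = OEMObject("o2", "resto", "set", [name])
print(resto)  # o2: <resto, set, [o5: <name, string, 'Three amigos'>]>
```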
An example OEM database [Figure: OEM graph rooted at guide (o1) with three resto subobjects (o2, o3, o4); edge labels include c, n, near, a, address, s, z; atomic values include gourmet, Three amigos, westmont, 1650 ste catherine, montreal, H3G 1M7.] • Not every resto may have an address of the same type. • Indeed, some may have no address!
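The heterogeneity in this example can be mimicked with nested (label, value) pairs. This is a hypothetical rendering: labels are spelled out (name, category, street, city, zip), and the second and third restos are invented to show a string-typed address and a missing one. The hypothetical `path` helper returns nothing where a label path does not exist, instead of failing.

```python
# Hypothetical rendering of the example: each object is (label, value), where
# value is atomic or a list of subobjects.
guide = ("guide", [
    ("resto", [("name", "Three amigos"), ("category", "gourmet"),
               ("address", [("street", "1650 ste catherine"),
                            ("city", "montreal"), ("zip", "H3G 1M7")])]),
    ("resto", [("name", "Resto B"), ("address", "12 hypothetical st")]),  # string address
    ("resto", [("name", "Resto C")]),                                     # no address
])


def path(obj, *labels):
    """All values reachable from obj by following labels; [] if the path is absent."""
    if not labels:
        return [obj[1]]
    _, value = obj
    if not isinstance(value, list):  # atomic value: nothing below it
        return []
    found = []
    for child in value:
        if child[0] == labels[0]:
            found.extend(path(child, *labels[1:]))
    return found


print(path(guide, "resto", "name"))             # ['Three amigos', 'Resto B', 'Resto C']
print(path(guide, "resto", "address", "city"))  # ['montreal']
```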
TSIMMIS query model • Each mediator describes its concepts (whatever it can garner from the sources it talks to) using logical rules. • TSIMMIS uses MSL (Mediator Specification Language), but we will see that SchemaLog can express it easily.
Information Manifold Approach • Two models. First, Local as View (LAV): • World view = global predicates (like base relations, but they do not actually exist) • Each source = a description of what info. it can contribute to the global predicates, i.e., a view over the global predicates (derived relations) • Query the global predicates • Answer using the views (which are the only ones that hold the data!)
IM approach • Alternative model: Global as View (GAV): each global predicate is defined as a view over the data the sources actually store. • Query the global predicates. • Answer by expanding (unfolding) the query using the view definitions. • IM follows LAV.
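Under GAV, answering reduces to unfolding. A minimal sketch, assuming each global predicate has exactly one definition whose head arguments are distinct variables (a union of definitions would yield a union of unfolded queries); the source relations `hr_db` and `dir_file` are hypothetical.

```python
from itertools import count

# Hypothetical GAV mapping: global predicate -> (head args, body over source relations).
# Assumes one definition per predicate and distinct variables in each head.
GAV = {
    "emp": (("E",), [("hr_db", ("E", "D"))]),
    "phone": (("E", "P"), [("dir_file", ("E", "P"))]),
}
_fresh = count()


def is_var(term: str) -> bool:
    return term[:1].isupper()  # convention: variables are capitalized


def unfold(body):
    """Replace each global atom by its definition's body (query unfolding)."""
    result = []
    for pred, args in body:
        head, defn = GAV[pred]
        n = next(_fresh)
        sub = dict(zip(head, args))  # head variables -> the query's arguments
        for src, src_args in defn:
            # variables not in the head are renamed apart with a fresh suffix
            result.append((src, tuple(
                sub.setdefault(a, f"{a}_{n}") if is_var(a) else a for a in src_args)))
    return result


# q(E,P) <- emp(E), phone(E,P)
print(unfold([("emp", ("E",)), ("phone", ("E", "P"))]))
# [('hr_db', ('E', 'D_0')), ('dir_file', ('E', 'P'))]
```

Contrast with LAV: here no search over rewrites is needed, because every global predicate already says where its tuples come from.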
LAV example • Global predicates: emp(E), phone(E,P), office(E,O), mgr(E,M), dept(E,D) (remember, they DON’T exist!) • source1(E,P,M) <- emp(E), phone(E,P), mgr(E,M). • source2(E,O,D) <- emp(E), office(E,O), dept(E,D). • source3(E,P) <- emp(E), phone(E,P), dept(E,'toy'). • Points to remember: • Views are descriptive, not prescriptive. • Completeness not guaranteed. • Consistency across sources not guaranteed. • Example query: q1(O,P) <- phone(mary,P), office(mary,O).
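One way to read "descriptive, not prescriptive": fix a hypothetical instance of the (virtual) global predicates and evaluate source3's rule on it; a sound source only has to store a subset of that result. The employees and phone numbers below are invented for illustration.

```python
# A hypothetical instance of the (virtual) global predicates.
emp = {"mary", "joe", "ann"}
phone = {("mary", "555-1"), ("joe", "555-2"), ("ann", "555-3")}
dept = {("mary", "toy"), ("joe", "toy"), ("ann", "shoe")}


def source3_view():
    """source3(E,P) <- emp(E), phone(E,P), dept(E,'toy'), evaluated on this instance."""
    return {(e, p) for (e, p) in phone if e in emp and (e, "toy") in dept}


# What source3 actually stores: a subset of its view (sound but incomplete).
source3_stored = {("joe", "555-2")}

print(sorted(source3_view()))            # [('joe', '555-2'), ('mary', '555-1')]
print(source3_stored <= source3_view())  # True: consistent with the description
```

Mary's phone is missing from source3 even though the rule "covers" her, which is why completeness is not guaranteed.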
Query answering • How can we answer such a query? • Must get all relevant info. from the views. • I.e., rewrite the query using ONLY source/view predicates (s1, s2, s3 abbreviate source1, source2, source3). • More than one possible way. • Want ALL possible rewrites (to ensure (near) completeness). • Rewritten q1: • r1q1(O,P) <- s1(mary,P,M), s2(mary,O,D). • r2q1(O,P) <- s3(mary,P), s2(mary,O,D). • There are other rewrites too (e.g., join all three sources), but they are contained in one of the above. So, the above rewrites are all “minimal” answers. • Compare expanded r1q1 and r2q1 with q1 (w.r.t. containment). What can you say?
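The containment comparison can be mechanized with a brute-force search for a containment mapping. A minimal sketch, writing s1, s2, s3 for source1, source2, source3 and assuming capitalized strings are variables and lowercase strings are constants; `expand` unfolds each source atom into its LAV definition with fresh names for existential variables. The last check shows why the join variable has to be bound to mary: without that binding, the expansion returns phones and offices of arbitrary employees and is not contained in q1.

```python
from itertools import count, product

# LAV descriptions (s1, s2, s3 = source1, source2, source3): name -> (head, body)
VIEWS = {
    "s1": (("E", "P", "M"), [("emp", ("E",)), ("phone", ("E", "P")), ("mgr", ("E", "M"))]),
    "s2": (("E", "O", "D"), [("emp", ("E",)), ("office", ("E", "O")), ("dept", ("E", "D"))]),
    "s3": (("E", "P"), [("emp", ("E",)), ("phone", ("E", "P")), ("dept", ("E", "toy"))]),
}
_fresh = count()


def is_var(t):
    return t[:1].isupper()  # capitalized = variable, otherwise constant


def expand(rewrite):
    """E(r): unfold each source atom, renaming existential variables apart."""
    head, body = rewrite
    out = []
    for name, args in body:
        vhead, vbody = VIEWS[name]
        n = next(_fresh)
        sub = dict(zip(vhead, args))
        for pred, vargs in vbody:
            out.append((pred, tuple(
                sub.setdefault(a, f"{a}_{n}") if is_var(a) else a for a in vargs)))
    return head, out


def extend(h, src, dst):
    """Extend mapping h so that src maps onto dst, or return None."""
    h = dict(h)
    for s, d in zip(src, dst):
        if is_var(s):
            if h.setdefault(s, d) != d:
                return None
        elif s != d:  # constants map only to themselves
            return None
    return h


def contained_in(q1, q2):
    """q1 is contained in q2 iff there is a containment mapping from q2 to q1."""
    (head1, body1), (head2, body2) = q1, q2
    h0 = extend({}, head2, head1)
    if h0 is None:
        return False
    for targets in product(body1, repeat=len(body2)):  # brute force
        h = h0
        for (p2, a2), (p1, a1) in zip(body2, targets):
            h = extend(h, a2, a1) if p2 == p1 and len(a2) == len(a1) else None
            if h is None:
                break
        if h is not None:
            return True
    return False


q1 = (("O", "P"), [("phone", ("mary", "P")), ("office", ("mary", "O"))])
r1q1 = (("O", "P"), [("s1", ("mary", "P", "M")), ("s2", ("mary", "O", "D"))])
r2q1 = (("O", "P"), [("s3", ("mary", "P")), ("s2", ("mary", "O", "D"))])
unbound = (("O", "P"), [("s1", ("E", "P", "M")), ("s2", ("E", "O", "D"))])

print(contained_in(expand(r1q1), q1), contained_in(q1, expand(r1q1)))  # True False
print(contained_in(expand(r2q1), q1))     # True
print(contained_in(expand(unbound), q1))  # False
```

The first line suggests one answer to the question above: each expansion is contained in q1, but not the other way round, so the rewrites are sound but not equivalent to q1.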
How do we get minimal rewrites? • q – the original query (a CQ over global predicates). • r – a candidate rewrite. It’s valid provided r’s expansion E(r) (obtained by expanding the source definitions) is contained in q. • A rewrite r is minimal if E(r) is NOT contained in E(r’) for any other rewrite r’. • What does minimality really mean? • Example: s1(X,Y) <- a(X,Y). s2(X,Y) <- a(X,Y). Query: q(X,Y) <- a(X,Y). Both r1q(X,Y) <- s1(X,Y) and r2q(X,Y) <- s2(X,Y) are needed to answer it. Why? (s1 and s2 do NOT necessarily provide the same set of tuples. Rules are descriptions, NOT prescriptions!) • How many rewrites should we try?
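A tiny data example of why both rewrites are kept: both sources satisfy the same description, a(X,Y), yet store different tuples (the tuples are hypothetical), so the answer is the union over the rewrites.

```python
# Both sources are described by the same rule, but each stores only the
# part of a(X,Y) it happens to know (hypothetical data).
s1 = {(1, 2), (3, 4)}
s2 = {(3, 4), (5, 6)}


def r1q():  # r1q(X,Y) <- s1(X,Y)
    return set(s1)


def r2q():  # r2q(X,Y) <- s2(X,Y)
    return set(s2)


# The answer to q(X,Y) <- a(X,Y) is the union over all rewrites.
answer = r1q() | r2q()
print(sorted(answer))  # [(1, 2), (3, 4), (5, 6)]
```

Dropping either rewrite would lose a tuple, even though the two expansions are identical as queries.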
Levy et al. Theorem • Thm.: if a rewrite r of query q has more subgoals than q, then r can’t be minimal. Proof: assume r is valid (otherwise it is useless), so E(r) is contained in q; let h be the containment mapping (c.m.) from q to E(r). If r has more subgoals than q, there must be a subgoal p in r s.t. h doesn’t map any subgoal of q to any subgoal in E(p). Get rid of all such subgoals, obtaining a modified rewrite r’. r’ contains r (trivially, since it only drops subgoals). But r’ is still contained in q (just use the original c.m. h). ∎ • Given q, only consider those sources whose body contains >= 1 global predicate appearing in q. • Still an exponential # of choices, but not too terrible in practice.
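The pruning rule and the subgoal bound together suggest a simple candidate generator, loosely in the spirit of the bucket algorithm: one bucket of relevant sources per query subgoal, and one source picked per bucket, so no candidate exceeds the query's subgoal count. This sketch looks only at predicate names and ignores variable bindings (e.g., source3's dept 'toy' condition), so every candidate still needs the containment check before it is accepted.

```python
from itertools import product

# Body predicates of the LAV sources from the running example.
SOURCES = {
    "source1": {"emp", "phone", "mgr"},
    "source2": {"emp", "office", "dept"},
    "source3": {"emp", "phone", "dept"},
}


def buckets(query_preds):
    """One bucket per query subgoal: the sources whose body mentions its predicate."""
    return [sorted(s for s, preds in SOURCES.items() if p in preds) for p in query_preds]


def candidates(query_preds):
    """Pick one source per bucket, so no candidate has more subgoals than the query."""
    return list(product(*buckets(query_preds)))


# q1(O,P) <- phone(mary,P), office(mary,O)
print(candidates(["phone", "office"]))  # [('source1', 'source2'), ('source3', 'source2')]
```

For q1 the two candidates are exactly the source combinations behind r1q1 and r2q1.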
Example revisited & expanded. • Suppose source1 instead exported s1(E,P) and source2 exported s2(E,O). • Is q1 answerable using the views? • What about q2(E) <- emp(E), mgr(E,'john')? • What about q3(E1,E2) <- phone(E1,P), phone(E2,P)? • What about q4(E,M) <- emp(E), dept(E,'toy'), mgr(E,M)?
QAV (AQUV) – general story • Why is QAV a worthwhile problem? • Speed up query processing. • Materialized views. • Can I answer this query using stored view(s)? • Information integration. • Sources store some data, and *describe* (usu. using rules) how local data relates to the global schema (i.e., what are their contributions?) • Can I answer this query using the available source data (i.e., views)? • How best can I answer it?
QAV – two models • Classic query optimization context: • Equivalent rewriting. • Used extensively in data warehousing/OLAP. • Information integration: • Maximally contained (also called minimal or maximally sound) rewriting. • Excellent survey: Alon Y. Halevy. Answering queries using views: a survey. VLDB Journal, 2001.