CSE 636 Data Integration

Limited Source Capabilities. Slides by Hector Garcia-Molina.

Presentation Transcript


  1. CSE 636 Data Integration: Limited Source Capabilities. Slides by Hector Garcia-Molina

  2. Heterogeneous Databases • [Figure: a distributed database system over four sources (DBMS 1, DBMS 2, a legacy system, and a web site), each holding its own data]

  3. Limited Capabilities

  4. Example: Amazon.com • [Figure: Amazon.com search form with fields author, title, subject, format, and price, annotated “must specify at least one of these,” “this attribute not returned,” and “menu of choices”] • Price: cannot query on this attribute

  5. Example: BarnesAndNoble.com • [Figure: BarnesAndNoble.com search form with fields author, title, subject, format, and price, annotated “must specify at least one of these” and “menu of choices”] • Price: can query only if one of the other attributes is specified

  6. Why Limited Capabilities? • Search forms • Security • Indexes • Legacy

  7. Capability vs. Content • Capability description • Can only search for subject = “art,” “history,” “science” • Content description • Source only contains subject = “art,” “history,” “science”

  8. Outline • Describing source capabilities • Extending source capabilities • How mediators cope with limited capabilities • Mediator capabilities • Other topics • [Figure: a mediator on top of two wrappers, each wrapping one source]

  9. Describing Query Capabilities R(X, Y, ..., Z) • Adornments: • f: may or may not be specified • u: cannot be specified • b: must be specified • c[S]: must be specified, from list S • o[S]: optional; if specified, chosen from S

  10. Describing Query Capabilities R(X, Y, ..., Z) • With output restriction (a primed adornment means the attribute is not returned in the answer): • f’ • u’ • b’ • c’[S] • o’[S] • Adornments: • f: may or may not be specified • u: cannot be specified • b: must be specified • c[S]: must be specified, from list S • o[S]: optional; if specified, chosen from S

  11. Example • Relation R(X, Y, Z) • Description Templates: bu’f, uf’c[z1, z2] • Answerable queries: R(x1, Y, Z), R(X, Y, z1) • Unanswerable queries: R(X, y1, Z), R(X, Y, z3)
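The adornment rules and the example above can be checked mechanically. The sketch below is illustrative only.

- **Encoding (hypothetical):** each attribute is one `(adornment, choices)` pair. A query is a list that holds `None` for an unspecified attribute. Primes are written as an ASCII apostrophe.
- **Names (hypothetical):** `attr_ok`, `matches`, `answerable`, and `R_TEMPLATES`. This is not the notation of Tsimmis or any other system.
- **Primes:** they are stripped before checking. An output restriction changes what comes back, not whether the query can be posed.

```python
from typing import Optional, Sequence

# Hypothetical encoding: one (adornment, choices) pair per attribute.
# `choices` is used only by c[S] and o[S]; it is None otherwise.
Template = Sequence[tuple[str, Optional[frozenset]]]


def attr_ok(adornment: str, choices: Optional[frozenset], value) -> bool:
    """Check one query attribute (a constant, or None if unspecified) against its adornment."""
    kind = adornment.rstrip("'")  # a prime restricts the output, not the input
    if kind == "f":   # may or may not be specified
        return True
    if kind == "u":   # cannot be specified
        return value is None
    if kind == "b":   # must be specified
        return value is not None
    if kind == "c":   # must be specified, from the list S
        return value is not None and value in choices
    if kind == "o":   # optional; if specified, from the list S
        return value is None or value in choices
    raise ValueError(f"unknown adornment: {adornment!r}")


def matches(template: Template, query: Sequence) -> bool:
    """A template accepts a query if every attribute satisfies its adornment."""
    return len(template) == len(query) and all(
        attr_ok(adornment, choices, value)
        for (adornment, choices), value in zip(template, query)
    )


def answerable(templates: Sequence[Template], query: Sequence) -> bool:
    """A query is answerable if at least one template accepts it."""
    return any(matches(t, query) for t in templates)


# The two templates for R(X, Y, Z) from the example: bu'f and uf'c[z1, z2].
R_TEMPLATES = [
    [("b", None), ("u'", None), ("f", None)],
    [("u", None), ("f'", None), ("c", frozenset({"z1", "z2"}))],
]

if __name__ == "__main__":
    for q in (["x1", None, None], [None, None, "z1"], [None, "y1", None], [None, None, "z3"]):
        print(q, answerable(R_TEMPLATES, q))
```

Run as a script, it prints True for R(x1, Y, Z) and R(X, Y, z1) and False for R(X, y1, Z) and R(X, Y, z3), matching the slide.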

  12. Other Description Mechanisms • Tsimmis • Query templates • Information Manifold • capability records (# bound attrs, conditions ok,...) • Disco • Garlic • black box • Context-free grammars

  13. Extending Source Capabilities • Query: author = “Freud” AND price > 10 • [Figure: the query is sent to the wrapper for the Amazon source] • Source: R(author, price, ...), template b, u, ...

  14. Extending Source Capabilities • Query: author = “Freud” AND price > 10 • Because price cannot be specified (u), the wrapper sends only author = “Freud” to the source and applies the filter price > 10 to the results itself • Source (Amazon): R(author, price, ...), template b, u, ...
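A minimal sketch of the wrapper on this slide follows. It assumes a toy in-memory stand-in for the Amazon source. The function names, the `(attribute, op, value)` condition encoding, and the sample rows are hypothetical. Only the template comes from the slide: author must be bound, and price cannot be specified.

```python
import operator

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

# Toy rows standing in for the source; the data is made up.
BOOKS = [
    {"author": "Freud", "title": "t1", "price": 8},
    {"author": "Freud", "title": "t2", "price": 15},
    {"author": "Jung", "title": "t3", "price": 12},
]


def source_query(bindings: dict) -> list[dict]:
    """Hypothetical source interface for template (b, u): author required, price not allowed."""
    if "author" not in bindings or "price" in bindings:
        raise ValueError("source requires author and rejects conditions on price")
    return [row for row in BOOKS if all(row[a] == v for a, v in bindings.items())]


def wrapper(conditions: list[tuple[str, str, object]]) -> list[dict]:
    """Push supported equality conditions to the source; apply the rest as a local filter."""
    pushable = {"author"}  # attributes the source accepts (equality only, by assumption)
    pushed = {a: v for a, op, v in conditions if op == "=" and a in pushable}
    residual = [(a, op, v) for a, op, v in conditions if not (op == "=" and a in pushable)]
    rows = source_query(pushed)
    return [row for row in rows if all(OPS[op](row[a], v) for a, op, v in residual)]


if __name__ == "__main__":
    # author = "Freud" AND price > 10
    print(wrapper([("author", "=", "Freud"), ("price", ">", 10)]))
```

The design choice is the split in `wrapper`. Whatever the template accepts is pushed to the source, and everything else becomes a local filter. For a conjunctive query, the answer is then the same as if the source had evaluated the whole query, provided the filtered attributes are returned by the source.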

  15. Another Example • Query: (author = “Freud” OR author = “Jung”) AND price < 10 • [Figure: the query is sent to the wrapper for Barnes&Noble] • Source: R(author, price, …) • No disjunctive conditions; price can only be specified with author

  16. Another Example • Query: (author = “Freud” OR author = “Jung”) AND price < 10 • The wrapper splits the query into Q1: author = “Freud” AND price < 10 and Q2: author = “Jung” AND price < 10, sends each to Barnes&Noble, and combines the answers with a union operation • Source: R(author, price, …) • No disjunctive conditions; price can only be specified with author
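The rewriting is valid because AND distributes over OR: (A1 OR A2) AND P is equivalent to (A1 AND P) OR (A2 AND P). A disjunction of queries is answered by the union of their answers. The sketch below assumes a toy Barnes&Noble stand-in whose call takes exactly one author and an optional upper price bound. The names, the rows, and the `id` field used to drop duplicates are hypothetical.

```python
# Toy catalog standing in for Barnes&Noble; the rows are made up.
CATALOG = [
    {"id": 1, "author": "Freud", "price": 8},
    {"id": 2, "author": "Freud", "price": 14},
    {"id": 3, "author": "Jung", "price": 9},
    {"id": 4, "author": "Adler", "price": 5},
]


def bn_source(author: str, max_price=None) -> list[dict]:
    """Hypothetical source call: exactly one author (no OR), optional price < max_price."""
    return [
        row for row in CATALOG
        if row["author"] == author and (max_price is None or row["price"] < max_price)
    ]


def wrapper_or(authors: list[str], max_price: float) -> list[dict]:
    """(author = a1 OR ... OR author = an) AND price < max_price, answered as a union
    of one conjunctive source query per author."""
    by_id = {}
    for author in authors:                    # Q1, Q2, ...: one query per disjunct
        for row in bn_source(author, max_price):
            by_id[row["id"]] = row            # union, dropping duplicates by id
    return sorted(by_id.values(), key=lambda row: row["id"])


if __name__ == "__main__":
    print(wrapper_or(["Freud", "Jung"], 10))
```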

  17. Extending Source Capabilities • General scheme: • try many query rewritings • check if query fragments are supported by the source • check if the wrapper can combine answer fragments • do all this very efficiently!! • H. Garcia-Molina, W. Labio, R. Yerneni: Capability-Sensitive Query Processing on Internet Sources, ICDE 1999 • Tsimmis, Info Manifold: no disjunctive queries • DISCO: no query splitting • Garlic: only CNF queries

  18. Mediator Processing • Query: M(5, Y, Z, W, 3) • Mediator: M(X, Y, Z, W, U) = Join(R, T) • [Figure: the mediator over two wrappers, each wrapping one source] • Sources: R(X, Y, Z) with template f, f, b; T(Z, W, U) with template f, u, b

  19. Plan 1 • Query: M(5, Y, Z, W, 3) • Mediator: M(X, Y, Z, W, U) = Join(R, T) • (1) R(5, Y, Z) • (2) T(Z, W, 3) • (3) Join answers • Sources: R(X, Y, Z) f, f, b; T(Z, W, U) f, u, b • Step (1) is infeasible: it leaves Z unspecified, but R's template requires Z to be bound

  20. Plan 2 • Query: M(5, Y, Z, W, 3) • Mediator: M(X, Y, Z, W, U) = Join(R, T) • (1) P = T(Z, W, 3) • (2) for each (z, w, u) ∈ P: R(5, Y, z) • (3) Join answers • Sources: R(X, Y, Z) f, f, b; T(Z, W, U) f, u, b • Feasible: T needs only U bound, and each tuple of P supplies the Z value that R requires
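A sketch of Plan 2 as a bind join over two toy sources follows. The functions `query_r` and `query_t` are hypothetical stand-ins that raise an error when a call violates their template (R: f, f, b; T: f, u, b). The tuples are made up. Step (2) binds Z, the join attribute, from each tuple of P.

```python
# Toy relations; the tuples are made up for illustration.
R_DATA = [(5, "y1", "z1"), (5, "y2", "z2"), (6, "y3", "z1")]   # R(X, Y, Z)
T_DATA = [("z1", "w1", 3), ("z2", "w2", 4), ("z3", "w3", 3)]   # T(Z, W, U)


def query_r(x=None, y=None, z=None):
    """Source R with template f, f, b: Z must be bound."""
    if z is None:
        raise ValueError("R requires Z to be bound")
    return [t for t in R_DATA
            if (x is None or t[0] == x) and (y is None or t[1] == y) and t[2] == z]


def query_t(z=None, w=None, u=None):
    """Source T with template f, u, b: W cannot be specified, U must be bound."""
    if w is not None:
        raise ValueError("T does not accept a condition on W")
    if u is None:
        raise ValueError("T requires U to be bound")
    return [t for t in T_DATA if (z is None or t[0] == z) and t[2] == u]


def plan2(x_const, u_const):
    """Answer M(x_const, Y, Z, W, u_const) with Plan 2 (a bind join driven by T)."""
    answers = []
    for z, w, u in query_t(u=u_const):             # (1) P = T(Z, W, u_const)
        for x, y, _ in query_r(x=x_const, z=z):    # (2) R(x_const, Y, z) for each tuple of P
            answers.append((x, y, z, w, u))        # (3) join on Z
    return answers


if __name__ == "__main__":
    print(plan2(5, 3))
    try:
        query_r(x=5)                               # Plan 1, step (1)
    except ValueError as err:
        print("Plan 1 fails:", err)
```

On this toy data, `plan2(5, 3)` returns [(5, 'y1', 'z1', 'w1', 3)]. The first call of Plan 1, `query_r(x=5)`, is rejected because Z is unbound.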

  21. Mediator Plan Generation • Need a feasible and efficient plan • Search space is huge • Tsimmis, Info Manifold, Garlic: exponential algorithms • Polynomial algorithms: often find the optimal or a near-optimal plan; bounded performance • R. Yerneni, C. Li, J. D. Ullman, H. Garcia-Molina: Optimizing Large Join Queries in Mediation Systems, ICDT 1999
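Plan 2 follows a general rule: call a source only once the attributes its template marks b are known, either from query constants or from earlier results. The sketch below is a simple greedy feasibility check, not the polynomial algorithms of the cited paper. It rests on several assumptions:

- It considers only b adornments.
- It assumes every attribute of a called source is returned.
- It ignores cost.
- The names `feasible_order` and `SOURCES` are hypothetical.

The set of known attributes only grows, so a source that becomes callable stays callable. If the greedy loop gets stuck, no feasible order exists.

```python
def feasible_order(sources: dict[str, tuple[set[str], set[str]]],
                   query_bound: set[str]):
    """sources maps a name to (its attributes, the attributes its template requires bound).
    Returns a feasible call order, or None if none exists."""
    bound = set(query_bound)
    remaining = dict(sources)
    order = []
    while remaining:
        ready = sorted(name for name, (_, required) in remaining.items() if required <= bound)
        if not ready:
            return None                # no remaining source can be called
        name = ready[0]                # a cost model would choose among `ready` here
        attributes, _ = remaining.pop(name)
        bound |= attributes            # its answers supply values for all its attributes
        order.append(name)
    return order


SOURCES = {
    "R": ({"X", "Y", "Z"}, {"Z"}),   # R(X, Y, Z): f, f, b
    "T": ({"Z", "W", "U"}, {"U"}),   # T(Z, W, U): f, u, b
}

if __name__ == "__main__":
    print(feasible_order(SOURCES, {"X", "U"}))   # M(5, Y, Z, W, 3)
    print(feasible_order(SOURCES, {"X"}))        # U not given
```

For the mediator example, it returns ['T', 'R'] when the query binds X and U, which is the order of Plan 2. It returns None when U is not given.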

  22. Conclusion • Not all sources are created equal! • Need to • describe what sources can do • efficiently process queries with limited sources • describe what mediators can do • exploit content information • deal with unavailable sources

  23. References • R. Yerneni, C. Li, H. Garcia-Molina, J. D. Ullman: Computing Capabilities of Mediators, SIGMOD Conference 1999 • V. Vassalos, Y. Papakonstantinou: Describing and Using Query Capabilities of Heterogeneous Sources, VLDB 1997
