
Information Integration

Information Integration. By Neel Bavishi. Mediator Introduction. A mediator supports a virtual view or collection of views that integrates several sources in much the same way as the materialized relation(s) in a warehouse integrate sources.

Presentation Transcript


  1. Information Integration By Neel Bavishi

  2. Mediator Introduction • A mediator supports a virtual view or collection of views that integrates several sources in much the same way as the materialized relation(s) in a warehouse integrate sources. • Since the mediator doesn't store any data, the mechanics of mediators and warehouses are rather different. • To begin, the user issues a query to the mediator. Since the mediator has no data of its own, it must get the relevant data from its sources and use that data to form the answer to the user's query.

  3. Flow using Mediator • A mediator and wrappers translate queries into the terms of the sources and combine the answers.

  4. Wrappers in Mediator-Based Systems • Mediator systems require more complex wrappers than do most warehouse systems. • The wrapper must be able to accept a variety of queries from the mediator and translate any of them to the terms of the source. • Of course, the wrapper must then communicate the result to the mediator, just as a wrapper in a warehouse system communicates with the warehouse. • A systematic way to design a wrapper that connects a mediator to a source is to classify the possible queries that the mediator can ask into templates, which are queries with parameters that represent constants.

  5. Templates for Query Patterns • The mediator provides the constants, and the wrapper executes the query with the given constants. • An example illustrates the idea, using the notation T => S to mean that the template T is turned by the wrapper into the source query S. • Example: Suppose we want to build a wrapper for the source of Dealer 1, which has the schema Cars(serialNo, model, color, autoTrans, cdPlayer, ... ), for use by a mediator with schema AutosMed(serialNo, model, color, autoTrans, dealer).

  6. Template T • SELECT * FROM AutosMed WHERE color = 'red'; • translates to source query S • SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars WHERE color = 'red';
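The T => S translation above can be sketched as plain parameter substitution. This is a minimal Python illustration under stated assumptions, not the wrapper's actual code: the `$c` placeholder syntax, the `SOURCE_TEMPLATE` name, and the `instantiate` helper are introduced here for illustration only.

```python
from string import Template

# Source query S with the color constant replaced by a parameter.
# The $c placeholder syntax is an assumption made for this sketch.
SOURCE_TEMPLATE = Template(
    "SELECT serialNo, model, color, autoTrans, 'dealer1' "
    "FROM Cars WHERE color = '$c';"
)

def instantiate(color):
    """Fill the parameter with the constant supplied by the mediator."""
    return SOURCE_TEMPLATE.substitute(c=color)

print(instantiate("red"))
```

A production wrapper would more likely pass the constant as a bound query parameter than splice it into the SQL text, since string substitution is open to SQL injection.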

  7. Wrapper Generators • A wrapper generator produces tables for a driver; the driver and tables constitute the wrapper.

  8. How Is a Wrapper Generated? • The templates defining a wrapper must be turned into code for the wrapper itself. • The software that creates the wrapper is called a wrapper generator; it is similar in spirit to the parser generators (e.g., YACC) that produce components of a compiler from high-level specifications. • The process, suggested in the above figure, begins when a specification, that is, a collection of templates, is given to the wrapper generator. • The wrapper generator creates a table that holds the various query patterns contained in the templates, and the source queries that are associated with each. • A driver is used in each wrapper; in general the driver can be the same for each generated wrapper.

  9. Driver’s Tasks • Accept a query from the mediator. The communication mechanism may be mediator-specific and is given to the driver as a "plug-in," so the same driver can be used in systems that communicate differently. • Search the table for a template that matches the query. If one is found, then the parameter values from the query are used to instantiate a source query. If there is no matching template, the wrapper responds negatively to the mediator. • The source query is sent to the source, again using a "plug-in" communication mechanism. The response is collected by the wrapper. • The response is processed by the wrapper, if necessary, and then returned to the mediator. The next sections discuss how wrappers can support a larger class of queries by processing results.
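The four driver tasks can be sketched as a small table-driven function. This is a hedged illustration only: `make_driver`, the regex-based table format, and the `send_to_source` and `postprocess` plug-ins are hypothetical names chosen here, and a generated wrapper could represent its table quite differently.

```python
import re

def make_driver(table, send_to_source, postprocess=lambda rows: rows):
    """Build a driver from a template table and a source plug-in.

    table: list of (compiled regex for the mediator query,
           source-query format string) pairs (hypothetical format).
    send_to_source: plug-in that runs a source query and returns rows.
    postprocess: optional processing of the rows before they are returned.
    """
    def handle(mediator_query):
        # Task 2: search the table for a template that matches the query.
        for pattern, source_query in table:
            match = pattern.fullmatch(mediator_query)
            if match:
                # Instantiate the source query with the captured constants.
                query = source_query.format(**match.groupdict())
                # Task 3: send the source query through the plug-in.
                rows = send_to_source(query)
                # Task 4: process the response and return it.
                return postprocess(rows)
        return None  # no matching template: negative response
    return handle

# A one-entry table for the color template of the earlier example.
TABLE = [(
    re.compile(r"SELECT \* FROM AutosMed WHERE color = '(?P<c>[^']*)';"),
    "SELECT serialNo, model, color, autoTrans, 'dealer1' "
    "FROM Cars WHERE color = '{c}';",
)]
```

Task 1, accepting the query, is the call to `handle` itself; the mediator-specific communication layer would sit in front of it.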

  10. Filters • An approach to supporting more queries is to have the wrapper filter the results of queries that it poses to the source. • As long as the wrapper has a template that (after proper substitution for the parameters) returns a superset of what the query wants, it is possible to filter the returned tuples at the wrapper and pass only the desired tuples to the mediator. • Example: SELECT * FROM AutosMed WHERE color = 'red' AND model = 'Gobi'; 1. Use the last template to find the cars with color = 'red'. 2. Store the result in a temporary relation TempAutos(serialNo, model, color, autoTrans, dealer). 3. Select from TempAutos the Gobis and return the result, as with the query • SELECT * FROM TempAutos WHERE model = 'Gobi';
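The three filtering steps can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: the sample rows, the 'Sahara' model name, and the `filter_red_gobis` helper are made up for illustration, and two in-memory databases stand in for the source and for the wrapper's temporary storage.

```python
import sqlite3

def filter_red_gobis():
    # Stand-in for the Dealer 1 source, with made-up sample rows.
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE Cars (serialNo, model, color, autoTrans, cdPlayer)"
    )
    source.executemany(
        "INSERT INTO Cars VALUES (?, ?, ?, ?, ?)",
        [
            ("1001", "Gobi", "red", "yes", "no"),
            ("1002", "Gobi", "blue", "no", "yes"),
            ("1003", "Sahara", "red", "yes", "yes"),
        ],
    )

    # Step 1: use the color template; it returns a superset of the answer.
    red_cars = source.execute(
        "SELECT serialNo, model, color, autoTrans, 'dealer1' "
        "FROM Cars WHERE color = ?",
        ("red",),
    ).fetchall()

    # Step 2: store the result in a temporary relation at the wrapper.
    wrapper = sqlite3.connect(":memory:")
    wrapper.execute(
        "CREATE TABLE TempAutos (serialNo, model, color, autoTrans, dealer)"
    )
    wrapper.executemany("INSERT INTO TempAutos VALUES (?, ?, ?, ?, ?)", red_cars)

    # Step 3: filter at the wrapper and return only the Gobis.
    return wrapper.execute(
        "SELECT * FROM TempAutos WHERE model = ?", ("Gobi",)
    ).fetchall()

print(filter_red_gobis())
```

Only the first sample row is both red and a Gobi, so it is the only tuple passed on to the mediator.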

  11. Capability-Based Optimization in Mediators

  12. Problems of Limited Source Capabilities • It is almost impossible to migrate the data from an old database system to a more modern system, because people rely on applications that run only on the legacy system. This problem of being "locked in" to an old system that no one likes is called the legacy database problem, and it is unlikely to be solved any time soon. • For reasons of security, a source may limit the kinds of queries that it will accept. Amazon's unwillingness to answer the query "tell me about all your books" is a rudimentary example. • Indexes on large databases may make certain kinds of queries feasible, while others are too expensive to execute because each would require examining millions of tuples.

  13. A Notation for Describing Source Capabilities • f (free) means that the attribute can be specified or not, as we choose. • b (bound) means that we must specify a value for the attribute, but any value is allowed. • u (unspecified) means that we are not permitted to specify a value for the attribute. • c[S] (choice from set S) means that a value must be specified, and that value must be one of the values in the finite set S. This option corresponds, for instance, to values that are specified from a pull-down menu in a Web interface. • o[S] (optional, from set S) means that we either do not specify a value, or we specify one of the values in the finite set S. • The codes we shall use for adornments reflect the most common capabilities of sources. • Example: ubbo[yes, no]o[yes, no]
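The adornment codes can be checked mechanically. The following Python sketch parses an adornment string and tests whether a proposed set of bindings respects it; `parse_adornment`, `satisfies`, and the convention of using `None` for "no value specified" are assumptions of this illustration, not part of the notation itself.

```python
import re

# One code per attribute: f, b, u, c[...] or o[...].
CODE = re.compile(r"([fbu])|([co])\[([^\]]*)\]")

def parse_adornment(text):
    """Split an adornment such as 'ubbo[yes, no]' into (code, set) pairs."""
    codes, pos = [], 0
    while pos < len(text):
        m = CODE.match(text, pos)
        if m is None:
            raise ValueError(f"bad adornment at position {pos}: {text!r}")
        if m.group(1):
            codes.append((m.group(1), None))
        else:
            values = {v.strip() for v in m.group(3).split(",")}
            codes.append((m.group(2), values))
        pos = m.end()
    return codes

def satisfies(codes, bindings):
    """bindings holds one entry per attribute; None means 'not specified'."""
    if len(codes) != len(bindings):
        return False
    for (kind, allowed), value in zip(codes, bindings):
        if kind == "b" and value is None:
            return False  # a value must be given
        if kind == "u" and value is not None:
            return False  # a value may not be given
        if kind == "c" and (value is None or value not in allowed):
            return False  # a value from the set must be given
        if kind == "o" and value is not None and value not in allowed:
            return False  # if given, the value must come from the set
        # kind == "f" accepts anything
    return True

codes = parse_adornment("ubbo[yes, no]o[yes, no]")
print(satisfies(codes, (None, "Gobi", "red", None, "yes")))
```

In the usage line, the first attribute is left unspecified, the next two are bound, and the last two are either omitted or drawn from {yes, no}, so the bindings respect the example adornment.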

  14. Capability-Based Query-Plan Selection • Given a query at the mediator, a capability-based query optimizer first considers what queries it can ask at the sources that will help answer the query. • If we imagine those queries asked and answered, then we have bindings for some more attributes, and these bindings may make some more queries at the sources possible. • We repeat this procedure until either: 1. We have asked enough queries at the sources to resolve all the conditions of the mediator query, and therefore we may answer that query. Such a plan is called feasible. 2. We can construct no more valid forms of source queries, yet we still cannot answer the mediator query, in which case the mediator must give up: it has been given an impossible query.
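The repeat-until-done procedure can be sketched as a fixed-point loop over the set of bound attributes. This Python illustration makes several simplifying assumptions that are not in the slides: adornments are written with one letter per attribute, codes b and c both just require a binding (set membership for c is not checked), attributes marked u are simply left out of the source query, and each relation is queried at most once. The names `can_ask`, `feasible_plan`, and `RELATIONS` are hypothetical; the relations and adornments mirror the Autos and Options example in the slides that follow.

```python
def can_ask(attrs, adornment, bound):
    """A source query is possible if every b or c attribute is already bound."""
    return all(
        attr in bound
        for attr, code in zip(attrs, adornment)
        if code in ("b", "c")
    )

def feasible_plan(relations, initially_bound):
    """Return an order in which the relations can be queried, or None.

    relations: dict mapping a relation name to (attributes, adornments).
    initially_bound: attributes given constants by the mediator query.
    """
    bound = set(initially_bound)
    remaining = dict(relations)
    plan = []
    progress = True
    while remaining and progress:
        progress = False
        for name, (attrs, adornments) in list(remaining.items()):
            if any(can_ask(attrs, a, bound) for a in adornments):
                plan.append(name)
                bound.update(attrs)  # the answer binds all its attributes
                del remaining[name]
                progress = True
    # Feasible only if every relation in the join could be queried.
    return plan if not remaining else None

RELATIONS = {
    "Autos": (("serial", "model", "color"), ["ubf"]),
    "Options": (("serial", "option"), ["bu", "uc"]),
}

print(feasible_plan(RELATIONS, {"model", "option"}))
```

When the loop stops with relations still unanswered, no further source query can be formed, which corresponds to the second stopping condition above.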

  15. The simplest form of mediator query for which we need to apply the above strategy is a join of relations, each of which is available, with certain adornments, at one or more sources. • If so, then the search strategy is to try to get tuples for each relation in the join, by providing enough argument bindings that some source allows a query about that relation to be asked and answered. • Example: • Let us suppose we have sources like the relations of Dealer 2: Autos(serial, model, color) Options(serial, option)

  16. However, let us assume that Autos and Options are relations representing the data at two different sources. • ubf -> sole adornment for Autos • bu and uc -> adornments for Options • Query -> find the serial numbers and colors of Gobi models with a CD player. • Suppose there are 3 different plans that a mediator can consider. A capability-based optimizer examines plans such as these, together with the adornments of the relations involved, and eliminates infeasible plans.

  17. Thank You
