
A Query Translation Scheme for Rapid Implementation of Wrappers

Yannis Papakonstantinou, Ashish Gupta, Hector Garcia-Molina, Jeffrey Ullman. Presented by Preetham Swaminathan, 03/22/2007.


Presentation Transcript


  1. A Query Translation Scheme for Rapid Implementation of Wrappers • Yannis Papakonstantinou, Ashish Gupta, Hector Garcia-Molina, Jeffrey Ullman • Presented by Preetham Swaminathan, 03/22/2007

  2. Introduction • As part of the TSIMMIS project, many hard-coded wrappers have been developed for a variety of sources, including legacy systems. • Some observations: • Only a small part of the code deals with the access details of the source. • Much of the code deals with communication, buffering, etc. • The rest implements query and data transformations that can be expressed in a high-level, declarative fashion.

  3. Introduction • Based on these observations, a wrapper implementation toolkit for rapid wrapper building was developed. • The toolkit contains: • A library of commonly used functions. • A facility to translate queries into source-specific commands and queries. • A facility to translate results into a model useful to the application. • The main focus here is the query translation component of the toolkit (the converter).

  4. Converter • The converter is the query translation component of the toolkit. • The implementor gives the converter a set of templates. • These templates describe the queries accepted by the wrapper. • For each template, the implementor also provides an action. • When an application query matches a template, the associated action is executed to produce the native query for the source that answers the application query.

  5. Example • Consider a data source that can only do selections on the attribute dept. • The source does not understand the notion of projecting attributes. • Template describing the source: select * from $X where $X.dept = 'toy' • The following query does not match this template because it contains a projection: select emp.name from emp where emp.dept = 'toy'

  6. Example • The wrapper could process the above query as follows: • Transform the query into one without a projection. • Perform the projection on the result of that query, a step known as filtering. • The wrapper toolkit can handle this type of query transformation. • The converter generates not only native queries for the source but also filters that describe additional processing of the results.
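The two steps on slide 6 can be sketched in Python. This is only an illustration under stated assumptions: the `Query` class, `split_projection`, and the in-memory `run_source` stand-in are hypothetical names, not part of the toolkit, and the source is modelled as a list of dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    columns: tuple  # ("*",) means "no projection"
    table: str
    dept: str       # the only selection the source supports


def split_projection(q):
    """Return (projection-free query, columns to keep afterwards or None)."""
    native = Query(("*",), q.table, q.dept)
    keep = None if q.columns == ("*",) else list(q.columns)
    return native, keep


def run_source(q, rows):
    # Hypothetical stand-in for the source: it can only select on dept.
    assert q.columns == ("*",)
    return [r for r in rows if r["dept"] == q.dept]


def apply_filter(rows, keep):
    # The filter performs the projection the source could not do.
    if keep is None:
        return rows
    return [{c: r[c] for c in keep} for r in rows]


EMP = [
    {"name": "Ann", "dept": "toy", "salary": 10},
    {"name": "Bob", "dept": "shoe", "salary": 12},
]
native, keep = split_projection(Query(("name",), "emp", "toy"))
result = apply_filter(run_source(native, EMP), keep)
```

The source only ever sees the projection-free query; the projection on `name` is applied by the wrapper afterwards.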

  7. Converter • The converter in the toolkit targets the MSL query language. • MSL is a logic-based language for a simple object-oriented data model called OEM. • The converter is configured with templates written in QDTL. • Each template is associated with an action. • The converter takes an MSL query as input and generates: • Commands for the source. • A filter to be applied to the results.

  8. Converter • The converter will process: • Directly supported queries: queries that syntactically match a template. • Logically supported queries: queries that are logically equivalent to a directly supported query. • Indirectly supported queries: queries that can be processed as a combination of a directly supported query and a filter.

  9. OEM Model • OEM stands for Object Exchange Model. • OEM does not support classes, methods, or inheritance. • Classes and methods can be emulated. • Example:
     <ob1, person, {sub1, sub2, sub3, sub4, sub5}>
     <sub1, last_name, 'Smith'>
     <sub2, first_name, 'John'>
     <sub3, role, 'faculty'>
     <sub4, department, 'CS'>
     <sub5, telephone, '415-514-1292'>

  10. OEM Model • At each source, top-level OEM objects are defined. • They provide entry points into the object structure. • Sub-objects can be requested with an MSL query such as: (Q1) *P :- <P person {<L last_name 'Smith'>}> • The tail is a pattern of the form <object-id label value>. • Matching: • When a field is a constant, the pattern binds only with objects that have the same constant in that field. • When a field is a variable, the pattern can bind with any OEM object.
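The matching rules on slide 10 can be illustrated with a small Python sketch. It is not the MSL implementation: `Var`, `Obj`, `Pattern`, `unify`, and `match` are hypothetical names, and set values are modelled as tuples of sub-objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Obj:
    oid: str
    label: str
    value: object  # atomic value, or a tuple of Obj for a set object


@dataclass(frozen=True)
class Pattern:
    oid: object    # constant or Var
    label: object  # constant or Var
    value: object  # constant, Var, or a tuple of Pattern (set pattern)


def unify(field, actual, env):
    """Bind one pattern field to an actual value; return the new env or None."""
    if isinstance(field, Var):
        if field.name in env:
            return env if env[field.name] == actual else None
        return {**env, field.name: actual}
    return env if field == actual else None  # constants must be equal


def match(pat, obj, env):
    """Yield every binding environment under which pat matches obj."""
    for field, actual in ((pat.oid, obj.oid), (pat.label, obj.label)):
        env = unify(field, actual, env)
        if env is None:
            return
    if isinstance(pat.value, tuple):  # each sub-pattern must match some sub-object
        if isinstance(obj.value, tuple):
            yield from match_all(pat.value, obj.value, env)
    else:
        env = unify(pat.value, obj.value, env)
        if env is not None:
            yield env


def match_all(pats, subs, env):
    if not pats:
        yield env
        return
    for sub in subs:
        for env2 in match(pats[0], sub, env):
            yield from match_all(pats[1:], subs, env2)


smith = Obj("ob1", "person", (
    Obj("sub1", "last_name", "Smith"),
    Obj("sub2", "first_name", "John"),
))
# (Q1) *P :- <P person {<L last_name 'Smith'>}>
q1 = Pattern(Var("P"), "person", (Pattern(Var("L"), "last_name", "Smith"),))
bindings = list(match(q1, smith, {}))
```

Here `P` binds to `ob1` and `L` to `sub1`; replacing the constant `'Smith'` with another name leaves no binding.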

  11. A Detailed Query Translation Example • Build a wrapper for a university "lookup" facility that contains information about employees and students. • The facility is accessed from the command line and offers limited query capabilities. • It can return only the full records of persons, including all fields such as first name, last name, and telephone. • There is no way for the user to retrieve just one field.

  12. Query Translation • The only queries accepted are: • Retrieve person records by specifying the last name: (L2) lookup -ln Smith • Retrieve person records by specifying the first and last name: (L3) lookup -ln Smith -fn John • Retrieve all person records: (L4) lookup

  13. Query Translation • Using the Query Description and Translation Language (QDTL), the description of the lookup facility can be written as follows:
     (D1)
     (QT1.1) Query ::= *O :- <O person {<last_name $LN>}>
     (QT1.2) Query ::= *O :- <O person {<last_name $LN> <first_name $FN>}>
     (QT1.3) Query ::= *O :- <O person V>
     • Identifiers preceded by $ are constant placeholders. • Upper-case identifiers are variable placeholders.

  14. Query Translation • Each template describes many more queries than those that match it syntactically. • Each template describes the following classes of queries: • Directly supported queries. • Logically supported queries. • Indirectly supported queries.

  15. Query Translation • Directly supported queries • A query q is directly supported by a template t if q can be derived by substituting the constant placeholders of t with constants and the variables of t with variables. • *P :- <P person {<last_name 'Smith'>}> is directly supported by template QT1.1: substitute O with P and $LN with 'Smith'.
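Direct support amounts to a token-by-token substitution check, which can be sketched in Python as follows. The tokenizer and the function `directly_supported` are illustrative assumptions, not the converter's actual algorithm; the sketch also assumes that variables start with an upper-case letter and that constants are single-quoted.

```python
import re

# Tokens: $placeholders, quoted constants, identifiers, ":-", and punctuation.
TOKEN = re.compile(r"\$\w+|'[^']*'|\w+|:-|[*<>{}]")


def tokens(text):
    return TOKEN.findall(text)


def directly_supported(template, query):
    """Return the substitution if query is an instance of template, else None."""
    t, q = tokens(template), tokens(query)
    if len(t) != len(q):
        return None
    subst = {}
    for tt, qt in zip(t, q):
        if tt.startswith("$"):       # constant placeholder -> constant
            ok = qt.startswith("'")
        elif tt[0].isupper():        # variable -> variable
            ok = qt[0].isupper()
        else:                        # literal token must be identical
            if tt != qt:
                return None
            continue
        # Each placeholder must be replaced consistently throughout.
        if not ok or subst.setdefault(tt, qt) != qt:
            return None
    return subst


QT1_1 = "*O :- <O person {<last_name $LN>}>"
Q1 = "*P :- <P person {<last_name 'Smith'>}>"
substitution = directly_supported(QT1_1, Q1)
```

For the slide's example the sketch returns the substitution of `O` by `P` and `$LN` by `'Smith'`; a query that conditions on a different label yields `None`.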

  16. Query Translation • Logically supported queries • A query q is logically supported by a template t if q is logically equivalent to some query q' directly supported by t. • Examples:
     *O :- <O person {<first_name 'John'> <last_name 'Smith'>}>
     *O :- <O person {<last_name 'Smith'> <first_name 'John'>}>
     *O :- <O person {<LO last_name 'Smith'>}> AND <O person {<LO L V> <first_name 'John'>}>
     • All these queries are equivalent to one another, and the second one is directly supported by QT1.2, so all of them are supported by QT1.2.

  17. Query Translation • Indirectly supported queries • A query q is indirectly supported by a template t if q can be broken down into a query directly supported by t and a filter that is applied to its results. (Q6) *Q :- <Q person {<last_name 'Smith'> <role 'student'>}> • The above query is not logically supported by any template in the description.

  18. Query Translation • The converter realizes that the answer to the following query contains the answer to the original query (the answer to Q6 is a subset of the answer to Q7): (Q7) *Q :- <Q person {<last_name 'Smith'>}> • Thus the converter matches Q6 to template QT1.1 as if it were Q7, binding $LN to 'Smith', and generates the filter *O :- <O person {<role 'student'>}> • The filter is an MSL query that is applied to the result of Q7 to produce the result of Q6.
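The split of Q6 into a native part (Q7) and a filter can be sketched in Python. This is a simplification under stated assumptions: queries are reduced to a mapping from label to required constant, templates to the labels they can evaluate, and the rule "pick the template that covers the most conditions" is a heuristic chosen for this sketch rather than the paper's algorithm. The names `TEMPLATES`, `plan`, and `apply_filter` are hypothetical.

```python
# Labels each template of description D1 can evaluate natively.
TEMPLATES = {
    "QT1.1": ("last_name",),
    "QT1.2": ("last_name", "first_name"),
    "QT1.3": (),
}


def plan(conditions):
    """Split conditions into (template name, native part, residual filter)."""
    usable = [(name, labels) for name, labels in TEMPLATES.items()
              if set(labels) <= set(conditions)]
    # Assumption: prefer the template that pushes the most work to the source.
    name, labels = max(usable, key=lambda item: len(item[1]))
    native = {label: conditions[label] for label in labels}
    residual = {l: v for l, v in conditions.items() if l not in labels}
    return name, native, residual


def apply_filter(records, residual):
    """Keep only the records that satisfy every residual condition."""
    return [r for r in records
            if all(r.get(l) == v for l, v in residual.items())]


Q6 = {"last_name": "Smith", "role": "student"}
template, native, residual = plan(Q6)

# Hypothetical answer of the source to the native query (Q7).
q7_answer = [
    {"last_name": "Smith", "first_name": "John", "role": "faculty"},
    {"last_name": "Smith", "first_name": "Mary", "role": "student"},
]
q6_answer = apply_filter(q7_answer, residual)
```

Because conditions are held in a mapping, their order does not matter, so the sketch also treats reordered conditions alike; it does not handle the AND form of logically supported queries.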

  19. Native Query Formulation
     (D2)
     (QT2.1) Query ::= *O :- <O person {<last_name $LN>}>
     (AC2.1) {sprintf(lookup_query, 'lookup -ln %s', $LN);}
     (QT2.2) Query ::= *O :- <O person {<last_name $LN> <first_name $FN>}>
     (AC2.2) {sprintf(lookup_query, 'lookup -ln %s -fn %s', $LN, $FN);}
     (QT2.3) Query ::= *O :- <O person V>
     (AC2.3) {sprintf(lookup_query, 'lookup');}
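The actions of description D2 can be mirrored in Python as a table from template name to a function of the placeholder bindings. This is a sketch, not the toolkit's action mechanism: `ACTIONS` and `native_query` are hypothetical names, and the use of `shlex.quote` to protect constants passed to a command line is an addition of this sketch that the slides do not mention.

```python
import shlex

# One action per template; each receives the placeholder bindings.
ACTIONS = {
    "QT2.1": lambda b: "lookup -ln {}".format(shlex.quote(b["$LN"])),
    "QT2.2": lambda b: "lookup -ln {} -fn {}".format(
        shlex.quote(b["$LN"]), shlex.quote(b["$FN"])),
    "QT2.3": lambda b: "lookup",
}


def native_query(template_name, bindings):
    """Run the action of the matched template to build the native command."""
    return ACTIONS[template_name](bindings)


command = native_query("QT2.2", {"$LN": "Smith", "$FN": "John"})
```

For plain alphabetic constants the quoting leaves the value unchanged, so the result has the same shape as command (L3) on slide 12.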

  20. Nonterminals
     (D4) /* A description with nonterminals */
     (QT4.1) Query ::= *OP :- <OP person {_OptLN _OptFN _OptRole}> /* query template */
     (NT4.2) _OptLN ::= <last_name $LN> /* nonterminal template */
     (NT4.3) _OptLN ::= /* empty nonterminal template */
     (NT4.4) _OptFN ::= <first_name $FN>
     (NT4.5) _OptFN ::= /* empty */
     (NT4.6) _OptRole ::= <role $R>
     (NT4.7) _OptRole ::= /* empty */

  21. Nonterminals - Actions
     (D5)
     (QT5.1) Query ::= *OP :- <OP person {_OptLN _OptFN _OptRole}>
     (AC5.1) {sprintf(lookup_query, 'lookup %s %s %s', $_OptLN, $_OptFN, $_OptRole);}
     (NT5.2) _OptLN ::= <last_name $LN>
     (AC5.2) {sprintf($_OptLN, '-ln %s', $LN);}
     (NT5.3) _OptLN ::=
     (AC5.3) {$_OptLN = '';}
     (NT5.4) _OptFN ::= <first_name $FN>
     (AC5.4) {sprintf($_OptFN, '-fn %s', $FN);}
     (NT5.5) _OptFN ::=
     (AC5.5) {$_OptFN = '';}
     (NT5.6) _OptRole ::= <role $R>
     (AC5.6) {sprintf($_OptRole, '-role %s', $R);}
     (NT5.7) _OptRole ::=
     (AC5.7) {$_OptRole = '';}
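Each of the three nonterminals in D5 has two alternatives (present or empty), so the single query template stands for 2 × 2 × 2 = 8 query shapes. The way the nonterminal actions feed the query action can be sketched in Python; `opt` and `lookup_command` are hypothetical names. Unlike the `sprintf` format in AC5.1, the sketch drops empty parts instead of leaving extra spaces.

```python
def opt(flag, value):
    """Action of an optional nonterminal: '' when the condition is absent."""
    return "" if value is None else "{} {}".format(flag, value)


def lookup_command(last_name=None, first_name=None, role=None):
    """Action of the query template: combine the nonterminal results."""
    parts = [opt("-ln", last_name), opt("-fn", first_name), opt("-role", role)]
    return " ".join(["lookup"] + [p for p in parts if p])


command = lookup_command(last_name="Smith", role="student")
```

With no conditions at all, the same function produces the bare `lookup` command.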

  22. Wrapper Architecture • The implementor: • Provides the driver, which has primary control of query processing. • Provides the QDTL description for the converter. • Provides the Data Extraction (DEX) template for the extractor component of the toolkit. • Main components: • Converter • Driver

  23. Wrapper Architecture

  24. Wrapper Architecture • Wrappers generated with the toolkit behave as servers in a client-server architecture. • Clients use the client support library to issue queries and receive OEM results. • The server support library component of the toolkit receives queries and sends them to the driver component for processing. • The driver invokes the converter, which finds a supporting query for the input query and returns the native queries.

  25. Wrapper Architecture • The driver submits the native queries to the information source and receives the result as OEM objects. • If a filter was generated during processing, the driver passes the OEM result and the filter to the filter processor. • The Data Extractor (DEX) is used to parse the result and identify the required data. • DEX is configured with a description of the source output and of the parts of that output that need to be extracted.
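The control flow described on slides 24 and 25 can be summarised in a short Python sketch. All function names and the comma-separated source output are hypothetical stand-ins; the real toolkit components are configured with QDTL and DEX descriptions rather than Python callables.

```python
def handle_query(query, convert, run_source, extract, apply_filter):
    """Driver: converter -> source -> extractor -> optional filter."""
    native, residual = convert(query)
    raw = run_source(native)
    objects = extract(raw)
    return apply_filter(objects, residual) if residual else objects


def convert(query):
    # Stand-in converter: push last_name to the source, filter the rest.
    native = "lookup -ln " + query["last_name"]
    residual = {k: v for k, v in query.items() if k != "last_name"}
    return native, residual


def run_source(native):
    # Stand-in source with made-up output; a real driver would run `native`.
    return "Smith,John,faculty\nSmith,Mary,student\n"


def extract(raw):
    # Stand-in extractor: turn each output line into a record.
    fields = ("last_name", "first_name", "role")
    return [dict(zip(fields, line.split(","))) for line in raw.splitlines()]


def apply_filter(objects, residual):
    return [o for o in objects
            if all(o.get(k) == v for k, v in residual.items())]


answer = handle_query({"last_name": "Smith", "role": "student"},
                      convert, run_source, extract, apply_filter)
```

Passing the components in as arguments keeps the driver itself free of source-specific code, which is the separation the architecture aims for.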

  26. Correspondence of OEM to Relational Models • OEM objects are represented relationally by flattening them into tuples of three relations: top, object, and member. • OEM objects can be converted using a few straightforward rules. • For an object o with object id oid, label l, and atomic value v, we introduce the tuple object(oid, l, v). • If o is a set object, the tuple becomes object(oid, l, set).

  27. OEM to SQL • If o has sub-objects o_i, 1 ≤ i ≤ n, identified by oid_i, then we introduce the tuples member(oid, oid_i). • Finally, if o is a top-level object identified by oid, then we introduce the tuple top(oid). • The relational representation of an MSL query is obtained by querying the top, object, and member relations that represent the object structure referenced in the query.

  28. Example • Consider the query *O :- <O person {<LM last_name 'Smith'>}> • This MSL query can be written as the following datalog query: answer(O) :- top(O), object(O, person, set), member(O, LM), object(LM, last_name, 'Smith') • The paper contains an algorithm that, for a given MSL query, finds supporting queries in the QDTL description and, if required, creates a filter to be applied to the OEM result objects.
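The flattening rules and the datalog query above can be tried out with a small Python sketch. The function names `flatten` and `answer`, the nested-tuple encoding of OEM objects, and the sample data are assumptions made for this illustration; the string "set" marks set objects, so the sketch assumes no atomic value equals "set".

```python
def flatten(obj, top, object_rel, member, is_top=True):
    """Flatten one OEM object (oid, label, value) into the three relations."""
    oid, label, value = obj
    if is_top:
        top.add(oid)                          # top(oid)
    if isinstance(value, list):               # set object
        object_rel.add((oid, label, "set"))   # object(oid, l, set)
        for child in value:
            member.add((oid, child[0]))       # member(oid, oid_i)
            flatten(child, top, object_rel, member, is_top=False)
    else:                                     # atomic object
        object_rel.add((oid, label, value))   # object(oid, l, v)


def answer(top, object_rel, member):
    """answer(O) :- top(O), object(O, person, set), member(O, LM),
                    object(LM, last_name, 'Smith')"""
    return sorted(
        o for o in top
        if (o, "person", "set") in object_rel
        and any(parent == o and (lm, "last_name", "Smith") in object_rel
                for parent, lm in member)
    )


top, object_rel, member = set(), set(), set()
flatten(("ob1", "person", [("sub1", "last_name", "Smith"),
                           ("sub2", "first_name", "John")]),
        top, object_rel, member)
flatten(("ob2", "person", [("sub3", "last_name", "Jones")]),
        top, object_rel, member)
result = answer(top, object_rel, member)
```

Only `ob1` satisfies all four conditions of the datalog rule, since `ob2` has no `last_name` sub-object with value 'Smith'.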

  29. Conclusions • A toolkit that facilitates the implementation of wrappers was developed. • The heart of the toolkit is the converter, which maps incoming queries into native commands of the source. • The converter provides the translation flexibility of systems like Yacc but gives substantially more power: it translates a wider class of queries.
