1 / 21

INTEGRATION

INTEGRATION . Ramon Lawrence University of Iowa ramon-lawrence@uiowa.edu. USING UNITY. Ken Barker University of Calgary barker@cpsc.ucalgary.ca. Summary.

morey
Download Presentation

INTEGRATION

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. INTEGRATION Ramon Lawrence University of Iowa ramon-lawrence@uiowa.edu

  2. USING UNITY Ken Barker University of Calgary barker@cpsc.ucalgary.ca

  3. Summary • The Unity prototype tackles the schema integration problem by constructing an integrated, global view in a bottom-up approach. • Constructing a global view in this manner requires describing data source semantics using a dictionary and a XML-based language. • The extraction process, which is semi-automatic in nature, is separated from the integration process. • Thus, the integration process is automatic, and there is no requirement for a global human integrator. • Systematic naming using a dictionary allows global queries to be graphically constructed without specifying joins between global relations. • The global view produced demonstrates properties similar to a dynamically constructed Universal Relation.

  4. Benefits and Contributions • The architecture automatically integrates relational schemas into a global view for querying. • Unique contributions: • Synthesizing a global view from the bottom-up instead of top-down improves integration scalability. • Organizing the global view as a hierarchy of concepts instead of relations or predicates simplifies querying as the user does not have to specify specific relations or join conditions. This is called Querying by Context (QBC). • Query processing is achieved by dynamically discoveringextraction rules based on the naming of fields and tables. • The discovered rules are similar to the extraction rules of global-as-view (GAV) systems.

  5. Unity Overview • Unity is a software package that performs bottom-up integration with a GUI. • Developed using Microsoft Visual C++ 6 and Microsoft Foundation Classes (MFC). • Unity allows the user to: • Construct and modify standard dictionaries. • Build X-Specs to describe data sources including extraction of metadata using ODBC and mapping system names to dictionary terms. • Integrate X-Specs into an integrated view. • Transparently query integrated systems using ODBC and automatically generate SQL queries.

  6. Architecture Components • The architecture consists of four components: • A standard dictionary (SD) to capture data semantics • SD terms are used to build semantic names describing semantics of schema elements. • X-Specs for storing data source descriptions • Relational database info. stored and transmitted using XML. • Stores semantic names to describe schema elements. • Integration Algorithm • Identical concepts in different databases are identified by similar semantic names. • Produces an integrated view of all database concepts. • Query Processor • Allows the user to formulate queries on the view. • Translates from semantic names in integrated view to SQL queries and integrates and formats results. • Involves determining correct field and table mappings • and discovery of join conditions and join paths.

  7. Querying by Context (QBC) • Querying by context (QBC) is a methodology for querying relational databases by semantics. • Querying is performed by selecting semantic names that represent query concepts from the integrated view. • The integrated, context view contains all concepts present in the databases referenced by semantic names. • Query by Context performs dynamic closure relating concepts for the user as they browse the integrated view. • This allows a limited form of recursive queries and eliminates the need for the user to specify joins. • The query processor maps the user’s selections and criteria to an actual SQL query.

  8. References • Publications: • Unity - A Database Integration Tool, R. Lawrence and K. Barker, TRLabs Emerging Technology Bulletin, Jan. 2000. • Multidatabase Querying by Context, R. Lawrence and K. Barker, DataSem2000, pages 127-136, Oct. 2000. • Integrating Relational Database Schemas using a Standardized Dictionary, SAC’2001 - ACM Symposium on Applied Computing, pages 225-230, March 2001. • Querying Relational Databases without Explicit Joins DASWIS 2001- International Workshop on Data Semantics in Web Information Systems (with ER'2001), Nov. 2001. • Further Information: • http://www.cs.uiowa.edu/~rlawrenc/

  9. Integration Example

  10. BodyWorks Systems Customer Order Database Invoice Database Shipment Database Web Server Custom Accounting Package Shipment Tracking Software Bodyworks is a fictional company with 3 legacy databases that must be integrated for management reporting.

  11. Query-Driven Data Extraction Integrated Context View X-Spec Editor Standard Dictionary Integration Algorithm Query Processor and ODBC Manager Invoice Database Order Database Shipment Database Unity Software ODBC Querying

  12. Integration Processes • Integration is performed with 3 separate processes: • Capture process: independently extract database schema information into a XML document called a X-Spec. • This process is a semi-automatic description using a dictionary. • Integration process: combines X-Specs into a structurally-neutral hierarchy of database concepts called an integrated context view. • This process performs automatic name matching, but imprecision may occur. • Query process: allows the user to formulate queries on the integrated view that are mapped by the query processor to structural queries (SQL) , executed using ODBC, and the results are combined using global keys. • Users do not have to specify joins when querying the global view.

  13. The Unity Prototype

  14. What is the open problem? • The GAV and LAV approaches are both viable methods for solving data integration. • However, the open problem is that neither approach performs schema integration - the construction of the global view itself. • GAV - GV constructed (schema integration performed) by global designer when specifying extraction rules. • LAV - GV is pre-defined using some previous integration process (most likely manual in nature). • Both methods rely on the concept of a global user to create the global schema.

  15. How Unity is Different • Our integration architecture called Unity is different because it approaches the integration problem from a different perspective: • Thus, the integration problem is tackled from a different set of starting assumptions: • Do not assume pre-existing or manually created GV. • However, assume we have a dictionary and a language for describing schema and data element semantics. • Attempt to automatically build a GV from source descriptions of each data source. How can we automate, or semi-automate, the construction of the global view by extracting information from the local data sources?

  16. The Unity Approach • Given a set of data sources and a dictionary and a language to describe data semantics: • 1) Semi-automatically extract and represent data source semantics in the language using the dictionary. • 2) Automatically match concepts across data sources by using the dictionary to determine related concepts. • This process effectively builds the global level relations or objects initially assumed or created in other approaches. • However, since there is no manual intervention, the precision of global view construction is affected by inconsistencies in the descriptions of the data sources and matching concepts. • 3) Automatically generate queries specified by the user using dictionary terms (not structures) and map the user's query to appropriate data elements in the local sources.

  17. What is wrong with SQL? • There is nothing wrong with SQL. However, SQL is not a simple query language for many reasons: • Querying by structure does not hide complexities introduced due to database normalization. • Structures (fields and tables) may be assigned poor names that do not adequately describe their semantics. • Notion of a “join” is confusing for beginner users especially when multiple joins are present. • SQL forces structural access which does not provide logical query transparency and restricts logical schema evolution. • Querying multiple databases (without a global view) using SQL-variants is complex because naming and structural conflicts must be resolved during query formulation.

More Related