
Research Topics in Computing Data Modelling for Data Schema Integration 1 March 2005 David George


Presentation Transcript


  1. Research Topics in Computing Data Modelling for Data Schema Integration 1 March 2005 David George

  2. Modelling & Data Integration: Key Elements of Today’s Presentation
     • Key Drivers for Data Integration
     • Dimensions and Issues in Integration
     • Three Integration Approaches
     Data Integration David George

  3. Drivers for Data Integration

  4. Drivers for Data Integration (1)
     • Organisations are evolving into global entities with distributed data.
     • Systems are characterised by a mix of legacy and new databases and applications.
     • Organisational change:
         • Organic growth (size and diversity).
         • Business re-engineering.
         • Corporate mergers and acquisitions.

  5. Drivers for Data Integration (2)
     • Organisations have evolved as collections of distinct, autonomous departments with disconnected systems, e.g. in financial services.
     • Trends in Business Intelligence initiatives:
         • Decision-making support.
         • Customer segmentation.
         • Marketing strategies.
     • Development of distributed or multidatabase systems.

  6. Dimensions and Issues in Integration

  7. Architecture & Design Issues
     • Multidatabase systems can be classified in two ways:
         • Homogeneous systems: the local databases share the same techniques and language.
         • Heterogeneous systems: the local databases use diverse data models and languages.
     • Key dimensions of system heterogeneity:
         • System heterogeneity: hardware, OS, DBMS.
         • Semantic heterogeneity: models and data.

  8. Why Heterogeneity/Conflict?
     [Diagram: "Design" and "Check" arrows running between the real world and its database representation]
     Conflicts arise when translating conceptualisations of the real world into database-world representations.

  9. Research Work Conceptualised
     • Books model (a): the data of interest is about Books, their Publishers and adopting Universities.
     • Publications model (b): the data of interest is about Publications and their Types.

  10. [ER diagrams: (a) Books, with entities Publisher, Book, University and Topics, relationships Published by, Adopted by and Refer to, and attributes Name, City, Title and Address; (b) Publications, with entities Publication and Keywords linked by contains, and labels Title, Code, Publisher, Word and Research Area]

  11. [ER diagrams: schema A (Books, as on slide 10) and schema B (Publications); B now shows Publisher as an entity linked to Publication by Published by, with a Topics label alongside Keywords; other labels include Title, Code, Name, Word and Research Area]

  12. Books and Publications Integrated
     [Integrated ER diagram: entities Publisher, Book, University, Publication and Topics; relationships Published by (twice), Adopted by, Refer to and contains; attributes include Name, City, Address, Title, Code and Research Area]
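The merge of the two schemas on slides 10 to 12 can be sketched in a few lines. This is a toy illustration, not the presenter's method: it assumes a hypothetical encoding of each schema as a dictionary from entity name to attribute names, and it assumes (from the Topics/Keywords labels on slide 11) that Keywords and Topics are synonyms. It resolves only two conflict types: naming conflicts, via a synonym map, and entity-versus-attribute conflicts, by promoting an attribute that names an existing entity (Publisher) into a relationship. Generalisation conflicts are not handled.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: entity name -> set of attribute names.
Schema = Dict[str, Set[str]]


def integrate(a: Schema, b: Schema,
              synonyms: Dict[str, str]) -> Tuple[Schema, List[Tuple[str, str]]]:
    """Merge schema b into schema a; return (merged schema, relationships)."""
    merged: Schema = {entity: set(attrs) for entity, attrs in a.items()}

    # Naming conflicts: rename b's entities that are synonyms of a's entities.
    for entity, attrs in b.items():
        target = synonyms.get(entity, entity)
        merged.setdefault(target, set()).update(attrs)

    # Entity vs attribute conflicts: an attribute that shares its name with an
    # entity is promoted to a relationship between the two entities.
    relationships: List[Tuple[str, str]] = []
    for entity, attrs in merged.items():
        for attr in sorted(attrs):
            if attr in merged and attr != entity:
                attrs.discard(attr)
                relationships.append((entity, attr))
    return merged, relationships


books: Schema = {
    "Publisher": {"Name", "City"},
    "Book": {"Title"},
    "University": {"Name", "Address"},
    "Topics": {"Name"},
}
publications: Schema = {
    "Publication": {"Title", "Code", "Publisher"},
    "Keywords": {"Word"},
}

merged, rels = integrate(books, publications, synonyms={"Keywords": "Topics"})
print(sorted(merged["Publication"]))  # ['Code', 'Title']
print(rels)                           # [('Publication', 'Publisher')]
```

Attribute-level synonyms (Word vs Name inside Topics) are left unresolved here; a fuller sketch would need a second correspondence table for attributes.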

  13. Semantic Heterogeneity/Conflict
     • Structural conflicts:
         • Generalisation versus specialisation conflicts.
         • Entity versus attribute conflicts.
         • Naming conflicts.
     • Attribute (domain) conflicts:
         • Data type conflicts.
         • Measure and scale conflicts.
         • Integrity, presence and absence conflicts.
         • Data value conflicts.

  14. Semantic Heterogeneity/Conflict
     • Generalisation/specialisation conflicts (i.e. structural).
     • Naming conflicts:
         • Synonyms, e.g. Customer vs Client.
         • Homonyms, e.g. Market (Products) vs Market (Customers).
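Naming conflicts are commonly resolved with a correspondence table from each source's local names to global names. The sketch below assumes hypothetical source names (`sales_db`, `crm_db`, `product_db`) and global names (`ProductMarket`, `CustomerMarket`). Synonyms map to the same global name; homonyms map to different ones, so the table must be keyed by source as well as by local name.

```python
from typing import Dict, Tuple

# Hypothetical correspondence table: (source, local name) -> global name.
CORRESPONDENCES: Dict[Tuple[str, str], str] = {
    ("sales_db", "Customer"): "Customer",
    ("crm_db", "Client"): "Customer",          # synonym of Customer
    ("product_db", "Market"): "ProductMarket",  # homonym: market for products
    ("crm_db", "Market"): "CustomerMarket",     # homonym: market of customers
}


def global_name(source: str, local_name: str) -> str:
    """Resolve a source-local name to its name in the global schema."""
    try:
        return CORRESPONDENCES[(source, local_name)]
    except KeyError:
        raise KeyError(f"no correspondence for {local_name!r} in {source!r}") from None


print(global_name("crm_db", "Client"))      # Customer
print(global_name("product_db", "Market"))  # ProductMarket
```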

  15. Semantic Heterogeneity/Conflict
     • Data type (representation) conflicts:
         • Student number 26254006 stored as an integer or as a string.
         • Student identified by number (integer) vs by name (string).
     • Measure and scale conflicts:
         • Dimension: volume vs weight.
         • Measure: light years vs miles.
         • Scale: miles vs kilometres.
         • Precision: a 1 to 100 scale vs an A to E scale.
         • Date: dd/mm/yyyy vs mm-dd-yy.
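Several of these conflicts are typically resolved by small conversion functions applied as data passes into the global schema. The sketch below assumes that student numbers have at most 8 digits, that the global schema uses kilometres, and that one source writes dates as dd/mm/yyyy while another writes mm-dd-yy. The function names are hypothetical. The precision conflict (1 to 100 vs A to E) is deliberately omitted, since any mapping between those scales would be an invented business rule.

```python
from datetime import date, datetime
from typing import Union

KM_PER_MILE = 1.609344  # exact, by definition of the international mile


def normalise_student_id(value: Union[int, str]) -> str:
    """Data type conflict: one source stores 26254006 as an int, another as a string.

    Assumes student numbers have at most 8 digits; pads with leading zeros.
    """
    return str(int(value)).zfill(8)


def miles_to_km(miles: float) -> float:
    """Scale conflict: convert a source's miles into the global schema's kilometres."""
    return miles * KM_PER_MILE


def parse_date(text: str) -> date:
    """Date conflict: source A writes dd/mm/yyyy, source B writes mm-dd-yy (assumed)."""
    if "/" in text:
        return datetime.strptime(text, "%d/%m/%Y").date()
    return datetime.strptime(text, "%m-%d-%y").date()  # %y maps 00-68 to 2000-2068


print(normalise_student_id(26254006))  # 26254006
print(round(miles_to_km(10), 3))       # 16.093
print(parse_date("01/03/2005"))        # 2005-03-01
print(parse_date("03-01-05"))          # 2005-03-01
```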

  16. Semantic Heterogeneity/Conflict
     • Integrity constraints, e.g.:
         • Age range: Age < 21 vs Age > 18.
         • Referential conflicts: 1:1 vs 1:M (e.g. one invoice per order vs one invoice for many orders).
     • Presence/absence:
         • No nulls allowed vs nulls allowed (e.g. an optional attribute).
         • No corresponding attribute in the other source.
     • Data values:
         • The same item has different values in different sources.
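Presence/absence and data-value conflicts show up when two sources describe the same real-world item. The following is a minimal sketch, assuming both records have already been matched on a shared key (here a hypothetical `id`). A value that is absent or null in one source is taken from the other. Differing non-null values are reported as conflicts, and the first source's value is kept; that precedence rule is an arbitrary assumption, not a recommendation.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def merge_records(a: Record, b: Record) -> Tuple[Record, List[str]]:
    """Merge two descriptions of the same item; return (merged, conflicting attrs)."""
    merged: Record = {}
    conflicts: List[str] = []
    for attr in sorted(set(a) | set(b)):
        va: Optional[Any] = a.get(attr)  # None if absent or null in source a
        vb: Optional[Any] = b.get(attr)
        if va is not None and vb is not None and va != vb:
            conflicts.append(attr)       # same item, different values
        merged[attr] = va if va is not None else vb
    return merged, conflicts


# Hypothetical sources: a student registry and a library system.
registry = {"id": 42, "name": "J. Smith", "age": 20, "phone": None}
library = {"id": 42, "name": "John Smith", "email": "js@example.org"}

merged, conflicts = merge_records(registry, library)
print(conflicts)        # ['name']
print(merged["email"])  # js@example.org
```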

  17. Integration Approaches Data Integration David George

  18. Integration Approaches
     • Federated database (multidatabase) systems.
     • Data warehouse (materialised in-house) systems.
     • Mediator (virtual integration) systems.

  19. Federated Database Systems

  20. Federated Databases (1)

  21. Federated Databases (2)
     • A class of heterogeneous databases whose components:
         • Consist of both new and old systems.
         • Previously existed in their own stand-alone (autonomous) environments.
     • Integration is a consequence of distribution.
     • Organisations can adopt different architectures, i.e. different ways of mapping the databases together:
         • Loosely coupled integration.
         • Tightly coupled integration.

  22. Federated Databases (3)
     • Tightly coupled federations:
         • A federation administrator determines the schema view for all component systems in the federation.
         • The administrator negotiates export schemas (tables and attributes) with the federation participants, who control what is exported from their local schemas.
         • The exported local schemas are integrated into a federated schema.
         • Federation users have less autonomy to create views.

  23. Federated Databases (4)
     • Loosely coupled federations:
         • The component databases have a greater degree of autonomy.
         • No central schema view is imposed on users.
         • Each federation user is effectively an administrator who creates their own views.
         • Users query with a multidatabase (MDB) query language, rather than through an integrated schema as in tightly coupled federations.

  24. Federated Databases (5)
     • Sharing is made explicit: each local (component) database exposes export schemas.
     • The export schemas are imported into the federation to represent the shareable federated database.
     • Each source can call on the others for information.
     • FDBMSs differ from homogeneous distributed DBMSs (DDBMSs), whose sites all use the same data model and DBMS.
     • In DDBMSs, sharing is therefore implicit.
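Explicit sharing through export schemas can be illustrated with a small sketch of the tightly coupled case. It assumes a hypothetical encoding of schemas as table-to-column sets, and hypothetical components `hr` and `payroll`. Each component exports only a subset of its local schema, and the federation combines the exports. Prefixing table names with the component name is a simplifying assumption that avoids name clashes; a real federated schema would also resolve the semantic conflicts discussed earlier.

```python
from typing import Dict, Set

Schema = Dict[str, Set[str]]  # table -> columns (hypothetical encoding)


def export_schema(local: Schema, exported: Dict[str, Set[str]]) -> Schema:
    """Return the part of a local schema that its owner agrees to share."""
    result: Schema = {}
    for table, cols in exported.items():
        if table not in local or not cols <= local[table]:
            raise ValueError(f"cannot export {table}{sorted(cols)}: not in local schema")
        result[table] = set(cols)
    return result


def federated_schema(exports: Dict[str, Schema]) -> Schema:
    """Combine export schemas, qualifying each table by its component name."""
    return {f"{component}.{table}": set(cols)
            for component, schema in exports.items()
            for table, cols in schema.items()}


hr_local: Schema = {"Employee": {"id", "name", "salary"}, "Review": {"id", "text"}}
payroll_local: Schema = {"Payment": {"emp_id", "amount", "date"}}

exports = {
    "hr": export_schema(hr_local, {"Employee": {"id", "name"}}),  # salary withheld
    "payroll": export_schema(payroll_local, {"Payment": {"emp_id", "amount"}}),
}
fed = federated_schema(exports)
print(sorted(fed))                     # ['hr.Employee', 'payroll.Payment']
print("salary" in fed["hr.Employee"])  # False
```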

  25. Data Warehousing Systems

  26. Data Warehousing (1)
     [Diagram: local operational sources (including R2 and R3), connected over a network or the Internet, feed an Integration & Storage layer (the warehouse), which supports Decision Support & Mining]

  27. Data Warehousing (2)
     • A warehouse represents the physical separation of the operational and decision support environments.
     • Operational data provides the raw material for:
         • Decision support systems.
         • Data mining (DM), e.g. identifying trends or characteristics.
     • DM is the process of the “non-trivial extraction of implicit, previously unknown, and potentially useful information”.

  28. Data Warehousing (3)
     • A warehouse integrates multiple, heterogeneous data sources, e.g. relational DBs and flat files.
     • Data is pre-fetched into a central or intermediate warehouse repository by a mediation process.
     • Data is “cleaned” and integration techniques are applied, e.g. filtering, joining or aggregation.
     • Data may be transformed to conform to the warehouse schema.
     • This provides consistency in naming conventions, data structures, attributes, etc.

  29. Data Warehousing (4)
     • Data is then stored (materialised) in the warehouse repository, possibly in separate data marts.
     • The result is a repository of synthesised data for management decision-making.
     • Queries are made over the repository’s global schema.
     • The stored information is independent of the source data.
     • Data extraction tends to be periodic.
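The extract, clean, transform and load steps on slides 28 and 29 can be sketched with Python's standard `sqlite3` and `csv` modules. Everything here is hypothetical: an in-memory SQLite table stands in for a relational source, a CSV string stands in for a flat file, and trimming and lower-casing customer names and summing amounts are illustrative cleaning and aggregation rules. The point is that the result is materialised in a separate warehouse table and queried there, not at the sources.

```python
import csv
import io
import sqlite3

# Source 1: a relational database (in-memory SQLite stands in for it).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (cust TEXT, amount_gbp REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?)",
                [("acme", 100.0), ("ACME ", 50.0), ("zenith", 75.0)])

# Source 2: a flat file (CSV text) with different column names.
flat_file = io.StringIO("customer,total\nZenith,25.0\nnova,10.0\n")


def extract():
    """Pull (customer, amount) pairs from both heterogeneous sources."""
    yield from src.execute("SELECT cust, amount_gbp FROM orders")
    for row in csv.DictReader(flat_file):
        yield row["customer"], float(row["total"])


def clean(name: str) -> str:
    """Illustrative cleaning rule: consistent naming of customers."""
    return name.strip().lower()


# Transform: aggregate cleaned data to match the warehouse schema.
totals = {}
for name, amount in extract():
    key = clean(name)
    totals[key] = totals.get(key, 0.0) + amount

# Load: materialise the result in the warehouse repository.
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE sales_by_customer (customer TEXT PRIMARY KEY, total REAL)")
wh.executemany("INSERT INTO sales_by_customer VALUES (?, ?)", totals.items())
wh.commit()

print(wh.execute("SELECT * FROM sales_by_customer ORDER BY customer").fetchall())
# [('acme', 150.0), ('nova', 10.0), ('zenith', 100.0)]
```

Running the extraction again on a schedule would correspond to the periodic refresh on slide 29.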

  30. Mediator (+Wrapper) Systems

  31. Mediator Systems (1)
     [Diagram: a Mediator performing query translation over sources reached via a network or the Internet]

  32. Mediator Systems (2)
     • A global schema is created and mapped to the source schemas.
     • The user queries the global (mediated) schema.
     • Mappings can be either:
         • Global-as-view (GAV): each global relation is defined as a view over the sources.
         • Local-as-view (LAV): each source is defined as a view over the global schema.
     • The mediator translates a query over the global schema and reformulates it into sub-queries over the local schemas.
     • Wrappers execute the sub-queries and return the results.
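A GAV mediator with wrappers can be sketched in a few lines. This is an illustrative toy, not a production design. It assumes two hypothetical sources with differently shaped data and a global relation `Book(title, publisher)`. Each wrapper adapts its source to the global representation. The GAV mapping defines `Book` as the union of the wrappers' outputs, so reformulating a global query amounts to unfolding that definition into one sub-query per source. For simplicity the selection is applied in the mediator; a real system would usually push it down to the wrappers.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # global relation Book(title, publisher)


def wrap_library_db() -> Iterable[Row]:
    """Wrapper for hypothetical source A, which stores (publisher, title)."""
    for publisher, title in [("Springer", "Databases"), ("ACM", "Query Languages")]:
        yield title, publisher  # adapt column order to the global schema


def wrap_catalogue_file() -> Iterable[Row]:
    """Wrapper for hypothetical source B, which upper-cases publisher names."""
    for title, publisher in [("Data Integration", "SPRINGER")]:
        yield title, publisher.title()  # adapt value format: "SPRINGER" -> "Springer"


# GAV mapping: the global relation Book is defined as the union of the
# relations exposed by the two wrappers.
GAV: Dict[str, List[Callable[[], Iterable[Row]]]] = {
    "Book": [wrap_library_db, wrap_catalogue_file],
}


def mediate(relation: str, publisher: str) -> List[Row]:
    """Answer "all rows of `relation` with this publisher" over the global schema.

    Under GAV, reformulation is unfolding: the global relation is replaced by
    its definition, giving one sub-query per source; the results are then
    filtered and merged into a single answer set.
    """
    answers: List[Row] = []
    for wrapper in GAV[relation]:  # one sub-query per source
        answers.extend(row for row in wrapper() if row[1] == publisher)
    return sorted(answers)


print(mediate("Book", "Springer"))
# [('Data Integration', 'Springer'), ('Databases', 'Springer')]
```

Under LAV the mapping would run the other way (each source described as a view over `Book`), and reformulation becomes the harder problem of answering queries using views.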

  33. Mediator Systems (3)
     • Wrappers standardise how source information is described and accessed (i.e. they translate or adapt).
     • Query answers are returned to the user on demand, after the sources are interrogated.
     • Thus data is always up to date (unlike warehousing).
     • Mediators integrate the information view without integrating the source data.

  34. Mediator Systems (4)
     • The result is a homogeneous information source, using views based on the mediated (global) schema.
     • Integration is virtual, i.e. data is retrieved by the mediator but not stored in any central repository.
     • This differs from warehousing, where queries are made against materialised data.
     • In short, mediation provides virtual source schema integration via schema mapping and an integrated view.

  35. Comparisons

  36. Federation versus Warehousing & Mediation
     • Federation represents a more “static” approach, using agreed couplings to allow view creation.
     • Warehousing and mediation address integration in a more “dynamic” way, using extraction, transformation and integration processes.

  37. Warehousing vs. Mediation
     • Warehouse:
         • Update-driven: integration takes place in the warehouse repository.
         • Heterogeneous data is integrated in advance and stored in-house for direct query and analysis.
     • Mediation:
         • A wrapper and mediator layer sits on top of the source DBs.
         • Query-driven: a query to the mediated schema is translated into queries appropriate to each source.
         • Results are integrated into a global answer set.
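The practical consequence of update-driven versus query-driven integration is data freshness. The toy sketch below assumes a single hypothetical live source holding stock levels. The warehouse answers from a copy taken at its last refresh, so it can be stale between refreshes. The mediator forwards each query to the source and always sees the current value, at the cost of contacting the source on every query.

```python
from typing import Dict

source: Dict[str, int] = {"widgets": 10}  # hypothetical live source: item -> stock


class Warehouse:
    """Update-driven: copies the source at load time and answers from the copy."""

    def __init__(self) -> None:
        self.snapshot: Dict[str, int] = {}

    def refresh(self) -> None:  # periodic extraction and load
        self.snapshot = dict(source)

    def query(self, item: str) -> int:
        return self.snapshot[item]


class Mediator:
    """Query-driven: forwards each query to the source when it is asked."""

    def query(self, item: str) -> int:
        return source[item]


wh, med = Warehouse(), Mediator()
wh.refresh()
source["widgets"] = 7  # the source changes after the warehouse load
print(wh.query("widgets"), med.query("widgets"))  # 10 7
wh.refresh()
print(wh.query("widgets"))  # 7
```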

  38. Summary

  39. Summary
     • Drivers for data integration:
         • Organisational change.
         • Business intelligence and strategies.
     • Integration issues:
         • Different conceptual model representations.
         • The resulting semantic heterogeneities.
     • Integration approaches:
         • Federated systems.
         • Data warehousing and mediator systems.

  40. Next step ……

  41. Research Resources
     • Reference material:
         • Journals
         • Books
         • Presentation slides
     • UCLAN website:
         • Internal: http://janus/dgeorge/integration/journals.asp
         • External: http://www.janus.computing.uclan.ac.uk/dgeorge/integration/journals.asp
