1 / 17

Using AutoMed Metadata in Data Warehousing Environments

Using AutoMed Metadata in Data Warehousing Environments. Hao Fan Alexandra Poulovassilis School of Computer Science & Information Systems Birkbeck college, University of London ACM International Workshop on Data Warehousing and OLAP 7 th November 2003. Outline. What is AutoMed?

Download Presentation

Using AutoMed Metadata in Data Warehousing Environments

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Using AutoMed Metadata in Data Warehousing Environments Hao Fan Alexandra Poulovassilis School of Computer Science & Information Systems Birkbeck college, University of London ACM International Workshop on Data Warehousing and OLAP 7th November 2003

  2. Outline • What is AutoMed? • Creating AutoMed DW Metadata • Using AutoMed DW Metadata • Comparison of AutoMed and Conceptual Data Model (CDM) approaches • Conclusion

  3. What is AutoMed • HDM (Hypergraph Data Model) schemas consist of a set of Nodes, Edges and Constraints • Transformation Pathways • add/extend • delete/contract • rename • IQL language (See http://www.doc.ic.ac.uk/automed/techreports/ for a technical report on “The Automed Intermediate Query Language”.)

  4. A Data Integration Example 7. contractHierarchy (<<Salary, Person,id,id>>); 8. contractHierarchy (<<Salary,Job,job_id,job_id>>); 9. contractAtt (<<Salary, salary>>); 10. contractAtt (<<Salary, dept_id>>); 11. contractFact (<<Salary, id, job_id>>); 12. contractAtt (<<Job, job_descr>>); 13. contractDim (<<Job, job_id>>); 14. contractAtt (<<Person, name>>); 15. contractDim (<<Person, id>>); 1. addRel (<<Dept,id>>, <<Dept_id>>); 2. addAtt (<<Dept,dept_name>>, <<_,Dept_id,name>>); 3. addAtt (<<Dept,total_salary>>, gc sum [(d,s)|(i,j,s)<-<<Salary,salary>>; (i',j',d)<-<Salary,dept_id>>; i=i'; j=j']); 4. delEdge (<<_,Dept_id,name>>, <<Dept,dept_name>>); 5. delNode (<<name>>,[n|(d,n)<-<<Dept,dept_name>>); 6. delNode (<<Dept_id>>, <<Dept,id>>);

  5. Data Transformation/Integration

  6. Create Automed metadata repository Any DBMS supporting JDBC Specify data models All data Models used in DW schemas e.g., RDB, XML, Multi-Dim, etc. Extract data source schemas Define transformation pathways Manually Automatically Creating AutoMed Metadata

  7. Transforming Single-source cleansing Multi-source cleansing Integration Summarizing Creating data marts Creating AutoMed Transformation Pathways AutoMed Transformation Pathways can be used for the following data warehousing activities:

  8. adds a new construct `temp’ to the schema, whose extent consists of clean data; contracts the dirty construct , C, which is being cleaned adds a new construct, C, derived from the the data in`temp’ ; deletes or contracts the `temp’ construct. Data Cleansing The general pathway used for Data Cleansing:

  9. Single-source Cleansing Person (id, name, address, zip, city, country) addRel (<<Temp,id,address,zip>>, toolCall 'QuickAddress Batch' '<<Person,address>>' '<<Person,zip>' ' <<Person,city>>'); contractAtt (<<Person,zip>>); contractAtt (<<Person,address>>); addAtt (<<Person,zip>>, [(i,z)|(i,a,z)<-<<Temp,id,address,zip>>]); addAtt (<<Person,address>>, [(i,a)|(i,a,z)<-<<Temp,id,address,zip>>]); deleteRel (<<Temp,id,address,zip>>, [(i,a,z)|(i,a)<-<<Person,address>>; (i',z)<-<<Person,zip>>;i=i']);

  10. Multi-source Cleansing Person (id, maritalStatus) Emp (id, name, maritalStatus) addAtt(<<Person,maritalStatus_new>>, <<Person,maritalStatus>>-- [(i,s)|(i,s)<-<<Person,maritalStatus>>; (i',s')<- <<Emp,maritalStatus>>; i = i'; not (s = s')]); contractAtt(<<Person,maritalStatus>>); renameAtt (<<Person,maritalStatus_new>>, <<Person,maritalStatus>>);

  11. Incremental View Maintenance Data Lineage Tracing Using AutoMed Metadata

  12. Incremental View Maintenance S GS TP = tp1, …, tpr D V i r 1 1 i r tp2, …, tpi tp1 tpi+1, …, tpr Using AutoMed Metadata for IVM • See H. Fan. Incremental view maintenance and data lineage tracing in heterogeneous database environments. In proc. BNCOD’02 PhD Summer school, Sheffied, 2002.

  13. Data Lineage Tracing Algorithms Fully Materialized Pathway Fully Virtual Pathway Partially Materialized Pathway Using AutoMed Metadata for DLT • Data Lineage • Affect-Pool • Origin-Pool • DLT formulae • qsAP(t) • qsOP(t) • SeeH. Fan and A. Poulovassilis. Tracing data lineage using schema transformation pathways. In knowledge Transformation for the Semantic Web, IOS Press, 2003.

  14. AutoMed vs. CDM approach

  15. Semantic mismatches Tightly coupled with the CDM Not straightforward to reuse the integration effort if a source schema is changed No semantic mismatch Possible to extend data warehouse views into a different data model Easily reuse the trans-formation and integration efforts if a source schema is changed - see Section 5 of the paper Discussion Conceptual Data Model: AutoMed:

  16. Conclusion • AutoMed metadata can be used for expressing data warehousing activities, including data cleansing; • AutoMed metadata can be used for incrementally maintaining the DW data and data lineage tracing; • Compared with CDM, AutoMed has several advantages; • In contrast to commercial ETL tools, AutoMed metadata provides sufficient information for IVM and DLT. Limitations: • Not all data warehouse metadata can be captured by AutoMed • Currently, transformation pathways are created manually. However, we are investigating automatic/semi-automatic generation techniques

  17. Acknowledge Thank you!

More Related