1 / 21

Rule-based Management of Schema Changes at ETL sources

Rule-based Management of Schema Changes at ETL sources. G. Papastefanatos 1 , P. Vassiliadis 2 , A. Simitsis 3 , T. Sellis 1,4 , Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) { gpapas , yv }@ dblab . ece . ntua . gr

Download Presentation

Rule-based Management of Schema Changes at ETL sources

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Rule-based Management of Schema Changes at ETL sources G. Papastefanatos1, P. Vassiliadis2, A. Simitsis3, T. Sellis1,4, Y. Vassiliou1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas, yv}@dblab.ece.ntua.gr (2) University of Ioannina, Ioannina, Hellas (Greece) pvassil@cs.uoi.gr (3) HP Labs, Palo Alto, California, USA alkis@hp.com (4) Institute for the Management of Information Systems (Greece) timos@imis.athena-innovation.gr

  2. Outline • Motivation • Graph-based representation of ETL processes • Regulating ETL Evolution • Hecataeus Internals • Conclusions MEDWa ‘09, Riga, September 2009

  3. Outline • Motivation • Graph-based representation of ETL processes • Regulating ETL Evolution • Hecataeus Internals • Conclusions MEDWa ‘09, Riga, September 2009

  4. Data Warehouse Environment MEDWa ‘09, Riga, September 2009

  5. Data Warehouse Schema Evolution • Data warehouses are evolving environments, e.g.: • A dimension is removed or renamed • The structure of a dimension table is updated • A fact table is completely decoupled from a dimension • The measures of a fact table change • An ETL source is modified, etc MEDWa ‘09, Riga, September 2009

  6. Evolving ETL sources… • Schema Changes on the sources of ETL processes. Design constructs are • Added, Removed, Modified • ETL processes affected: • Syntactically – i.e., become invalid • Semantically – i.e., must conform to the new source database semantics • Adaptation of ETL flows • time-consuming task, • treated in most of the cases manually by the administrators/developers MEDWa ‘09, Riga, September 2009

  7. We would like to know... • What part of the process is affected and how if e.g., an attribute is deleted? • Can we predict and handle the impact of changes? • To what extent can readjustment be automated? MEDWa ‘09, Riga, September 2009

  8. Hecataeus Framework • Graphbasedrepresentation of ETL workflows • Evolution events are mapped to changes on the graph constructs • Annotation of graph with rules for adapting ETL processes to source schema evolution • Mechanism for performing what-if analysis for potential changes of ETL sources MEDWa ‘09, Riga, September 2009

  9. Outline • Motivation • Graph-based representation of ETL processes • Regulating ETL Evolution • Hecataeus Internals • Conclusions MEDWa ‘09, Riga, September 2009

  10. ETL Workflow representation MEDWa ‘09, Riga, September 2009

  11. Q: SELECT EMP.Emp#, Sum(WORKS.Hours) as T_Hours FROM EMP, WORKS WHERE EMP.Emp# = WORKS.Emp# GROUP BY EMP.Emp# Join, GB Query representation MEDWa ‘09, Riga, September 2009

  12. Outline • Motivation • Graph-based representation of ETL processes • Regulating ETL Evolution • Hecataeus Internals • Conclusions MEDWa ‘09, Riga, September 2009

  13. Graph Annotation with rules According to prevailing policy, the proper action is taken  graph evolution MEDWa ‘09, Riga, September 2009

  14. Event Add attribute Phone to relation EMP Example Q: SELECT EMP.Emp#, EMP.Name FROM EMP Q: SELECT EMP.Emp#, EMP.Name, Phone FROM EMP MEDWa ‘09, Riga, September 2009

  15. Outline • Motivation • Graph-based representation of ETL processes • Regulating ETL Evolution • Hecataeus Internals • Conclusions MEDWa ‘09, Riga, September 2009

  16. System architecture XML , Graph Viewer Import / jpeg Export DDL files Graph Visualization Scenarios SQL scripts Evolution Manager DB Schema representation Metric Manager Workload representation Parser Evolution Semantics Create Validate DB Workload Schema DB Catalog MEDWa ‘09, Riga, September 2009

  17. Evolution Manager Architecture MEDWa ‘09, Riga, September 2009

  18. Outline • Motivation • Graph-based representation of ETL processes • Regulating ETL Evolution • Hecataeus Internals • Conclusions MEDWa ‘09, Riga, September 2009

  19. Research in DB Evolution • DB Schema Evolution • OODB evolution • Schema versioning • DW Schema Evolution • Taxonomy of evolution events • Versioning • Materialized Views Evolution • View adaptation & synchronization • Evolution wrt Model Mappings MEDWa ‘09, Riga, September 2009

  20. Summarizing • The problem of adaptation of ETL workflows to evolvable data sources • Graph –based representation of ETL activities • Graph enrichment with semantics for evolution events • Graph annotation with rules for handling a priori evolution events • Hecataeus: Framework for performing and evaluating evolution scenarios in DW environments MEDWa ‘09, Riga, September 2009

  21. Thank you ... Hecataeus: A tool for visualizing and performing what-if analysis for evolution scenarios http://www.cs.uoi.gr/~pvassil/projects/hecataeus/ MEDWa ‘09, Riga, September 2009

More Related