1 / 46

DWXML A Preservation Format for Data Warehouses

Carlos Aldeias Gabriel David Cristina Ribeiro. DWXML A Preservation Format for Data Warehouses. Introduction. Outline. Motivation. DW Preservation. DWXML. DBPreserve Suite. Conclusions. Introduction Motivation Data Warehouse Preservation DWXML Definition

euclid
Download Presentation

DWXML A Preservation Format for Data Warehouses

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Carlos Aldeias Gabriel David Cristina Ribeiro DWXMLA Preservation Format for Data Warehouses

  2. Introduction Outline Motivation DW Preservation DWXML DBPreserve Suite Conclusions Introduction Motivation Data Warehouse Preservation DWXML Definition DBPreserve Suite Application Conclusions

  3. Introduction Introduction Motivation DW Preservation DWXML DBPreserve Suite Conclusions • Companies, institutions and governments rely increasingly on On-Line Analytical Processing (OLAP) • Major benefits for analysis and decision support • Selective extraction and analysis of data from different perspectives • Most systems are structured using Data Warehouses • OLAP types: • ROLAP – Relational OLAP • MOLAP – Multidimensional OLAP • HOLAP – Hybrid OLAP

  4. Introduction Preservation Concern Motivation DW Preservation DWXML DBPreserve Suite Conclusions • Data Warehouse as a digital object • Different from conventional digital objects: data warehouses are complex digital objects • They are based on a dimensional model: Star schema, facts, dimensions with levels and hierarchies, bridges and datamarts • They are often implemented on relational databases (ROLAP), keeping data in tables, views and schemas • Data vs. Metadata • The primary data stored into tables must be archived as well as the metadata, both at the relational and dimensional levels • Technologies are evolving continually • Data Warehouses created with today’s technologies may not be accessible with the upcoming versions

  5. Introduction Relevant Works Motivation DW Preservation DWXML DBPreserve Suite Conclusions InterPARES

  6. Introduction Databases / Data Warehouses Motivation DW Preservation DWXML DBPreserve Suite Conclusions • Is Data Warehouse just another name for Database

  7. Introduction Data Warehouse:Dimensional Model Concepts Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  8. Introduction DBPreserve Project Motivation DW Preservation DWXML DBPreserve Suite Conclusions [CCSDS, 2002] [Rahman, 2010]

  9. Introduction Data Warehouse Preservation Motivation DW Preservation DWXML DBPreserve Suite Conclusions Existing preservation approaches don´t comply with data warehouse preservation requirements Regarding data warehouses implemented with relational database technologies, some efforts can be reused Although, they still lack an important metadata layer that describes the data warehouse structure and entities

  10. Introduction Data Warehouse Metadata Motivation DW Preservation DWXML DBPreserve Suite Conclusions Example from a case study, implemented using Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production • Star Schema - fact table is surrounded by dimensional tables • Bridge Tables

  11. Introduction DW Metadata – Fact Tables Motivation DW Preservation DWXML DBPreserve Suite Conclusions • A fact table is the center of a star schema • Consists of facts of a business process • Facts • Measures : • ADDITIVE • NON ADDITIVE • SEMI ADDITIVE

  12. Introduction DW Metadata – Dimensions Motivation DW Preservation DWXML DBPreserve Suite Conclusions • Usually represented by one or more dimensional tables • Levels • Hierarchies • Attributes They give the context and meaning to the facts Represent the relevant vectors of analysis of the business process facts

  13. Introduction DW Metadata: Create Dimension Motivation DW Preservation DWXML DBPreserve Suite Conclusions CREATEDIMENSIONclass_dim LEVEL class IS (IPDW_CLASS.CLASS_ID) LEVEL course IS (IPDW_CLASS.COURSE_ID) HIERARCHYclass_rollup( class CHILD OF course) ATTRIBUTE class DETERMINES (IPDW_CLASS.CODE, IPDW_CLASS.ACRONYM, IPDW_CLASS.NAME, IPDW_CLASS.TYPE) ATTRIBUTE course DETERMINES (IPDW_CLASS.COUR_CODE, IPDW_CLASS.COUR_ACRONYM, IPDW_CLASS.COUR_NAME, IPDW_CLASS.COUR_TYPE, IPDW_CLASS.COURSE_PREVIOUS_COD); Project’s case study implements the dimensional model using Oracle Database 11g

  14. Introduction DW Metadata – Bridge Table Motivation DW Preservation DWXML DBPreserve Suite Conclusions Bridge tables are used to resolve a many to many relationship between a fact and a dimension Also used to flatten out a hierarchy in a dimension

  15. Introduction DW Metadata – Snowflake Motivation DW Preservation DWXML DBPreserve Suite Conclusions • Snowflake schema is similar to a star schema, but one or more dimension tables are partially normalized • Sub-dimensions

  16. Introduction DW Metadata – Datamart Motivation DW Preservation DWXML DBPreserve Suite Conclusions Subset of a data warehouse Typically, a set of star and snowflake schemas

  17. Introduction Data Warehouse Preservation Format Proposal Motivation DW Preservation DWXML DBPreserve Suite Conclusions • Analysis of relational database preservation formats • DBML (Database Markup Language) [Ramalho, 2007] • SIARD Format (Software Independent Archiving of Relational Databases) [SFA, 2008] • Analysis on Data Warehouse XML representation • XCube (for multidimensional schemas) [Hummer, 2003]

  18. Introduction Data Warehouse Preservation Format Proposal Motivation DW Preservation DWXML DBPreserve Suite Conclusions • Decision on extending the SIARD Format • Separates metadata from primary data • Segmented representation of primary data • Ready to use application that creates a SIARD format from a relational database (MSAccess, MSSQL and Oracle) • Add a metadata layer regarding the dimensional model perspective • Extracting data warehouse metadata from data dictionary • Defining a XML structure for the dimensional model • Embedding it into the SIARD format

  19. Introduction Relational Database Preservation with SIARD Motivation DW Preservation DWXML DBPreserve Suite Conclusions • Header folder for metadata • Content folder for primary data • Organized in directories • Single XML file for each data table • SIARD Suite – set of tools formigrating, editing and reactivating databases

  20. Introduction Extending SIARD format Motivation DW Preservation DWXML DBPreserve Suite Conclusions Add a XML file with the extra metadata layer for data warehouse characterization Add the corresponding schema No action on the primary data Data in the DW ingested to the SIARD Suite as a relational database

  21. Introduction DWXML Definition Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  22. Introduction DWXML - Stars Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  23. Introduction DWXML - FactTable Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  24. Introduction DWXML - Dimension Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  25. Introduction DWXML – Level and Attribute Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  26. Introduction DWXML –Hierarchy Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  27. Introduction DWXML –Datamart Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  28. Introduction DWXML –Schema Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  29. Introduction DWXML –Table Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  30. Introduction DWXML –View Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  31. Introduction DWXML –Sample Motivation DW Preservation DWXML DBPreserve Suite Conclusions <?xml version="1.0"encoding="UTF-8"?> <dwxmlversion="1.0"xsi:noNamespaceSchemaLocation="dw.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <stars> <star> <name>IPDW_ANSWERS_STAR</name> <description>Star related to the answers</description> <factTable> <schema>CALDEIAS</schema> <name>IPDW_ANSWERS</name> <facts> <fact> <name>ANSWER</name> <column>ANSWER</column> <measure>ADDITIVE</measure> </fact> </facts> </factTable> <ray> <dimension> <schema>CALDEIAS</schema> <name>IPDW_QUESTION</name> </dimension> </ray> <ray> ... </ray> </star> </stars> ... </dwxml>

  32. Introduction DBPreserve Suite ApplicationFeatures Motivation DW Preservation DWXML DBPreserve Suite Conclusions • Integrates the SiardFromDb application to build the SIARD format of the data warehouse • Extracts metadata for characterization of the dimensional model • Schemas, dimensions, hierarchies, levels, attributes, tables, table comments, primary and foreign keys, views • Sorts the tables according to their role in the data warehouse • Proposes a DWXML description based on the extracted metadata • DWXML editing using GUI • Graphical representation of star schemas and dimensions and their relationships • Creates, views and embeds the DWXML file into the SIARD format • Access and retrieves the primary data

  33. Introduction DBPreserve Suite Architecture Motivation DW Preservation DWXML DBPreserve Suite Conclusions Metadata Module SIARD Module Output Module DWXML Module Connection Module SIARDfromDB OJDBC, … JDOM Netbeans Platform 7 RC1 | JDK 7

  34. Introduction DBPreserve Suite Connection to the DW Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  35. Introduction DBPreserve Suite SIARDfromDB Integration Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  36. Introduction DBPreserve Suite Metadata Extraction Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  37. Introduction DBPreserve Suite DWXML Proposal Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  38. Introduction DBPreserve Suite Schemas Viewer: Stars Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  39. Introduction DBPreserve Suite Schemas Viewer: Dimensions Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  40. Introduction DBPreserve Suite DWXML Editing Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  41. Introduction DBPreserve Suite DWXML Embedding and Viewing Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  42. Introduction DBPreserve Suite Primary Data retrieval from XML Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  43. Introduction DBPreserve Suite Case Study Results Motivation DW Preservation DWXML DBPreserve Suite Conclusions • Extraction times: • SIARD data: 2h30m • SIARD metadata : 4 min • DWXML metadata : 3 sec • Data Warehouse • 17 tables (one with more than 2M records) • Data size: 115 MB • SIARD Format • 17 XML files with primary data (one with 323 MB) • SIARD metadata size: 71 KB • DWXML metadata size: 86 KB • Total size: 360 MB

  44. Introduction Conclusions Motivation DW Preservation DWXML DBPreserve Suite Conclusions • Definition of DWXML, a representation of the dimensional model of a DW • Design and implementation of DBPreserve Suite • Extraction of the metadata that describes the dimensional model • Manual adjustments of the dimensional model • Generation of the XML file and embedding into SIARD format file • Primary data browse • The result is compliant with the SIARD Suite tools (just the relational level)

  45. Introduction References Motivation DW Preservation DWXML DBPreserve Suite Conclusions

  46. Introduction References Motivation DW Preservation DWXML DBPreserve Suite Conclusions

More Related