1 / 65

Data Warehouse

Data Warehouse. Objectives. Need for Data Warehouse. What is Data Warehouse? Data Warehouse Properties. Data Warehouse Architectures. Data Marts. Corporate Information Factory. Extraction, Transportation, Loading and Transformation. Design in Data Warehouses. Data Warehousing Schemas.

rburcham
Download Presentation

Data Warehouse

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehouse

  2. Objectives • Need for Data Warehouse. • What is Data Warehouse? • Data Warehouse Properties. • Data Warehouse Architectures. • Data Marts. • Corporate Information Factory. • Extraction, Transportation, Loading and Transformation. • Design in Data Warehouses. • Data Warehousing Schemas.

  3. Decision support questions that enterprises need to have answered • How did sales representatives perform over different periods of time? • What are the popular products? • What types of customers buy what types of products? • How much are the various internal organizations spending on what products?

  4. Cont. • What were the variances between the amounts budgeted and the amounts spent? • What positions are being filled by people with what types of background? • What is the average pay for people within different age brackets? • What is the average pay for people within different age brackets?

  5. What is a Data Warehouse? • A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing • A common way of introducing data warehousing is to refer to the characteristics of a data warehouse as set forth by “ William Inmon ”: • Subject Oriented • Integrated • Nonvolatile • Time Variant

  6. Subject Oriented Integrated Non Volatile Time Variant Data Warehouse Properties Data Warehouse

  7. Customer Financial Information Data Warehouse Subject Area Subject Oriented • Data is categorized and stored by business subject rather than by application. • For example, to learn more about your company’s sales data , "Who was our best customer for this item, in this region last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented. Product Region Time Customer Operational Systems

  8. Integrated Data warehouses must put data from disparate sources into a consistent format.

  9. Time Data Jan/03 Data for January Feb/03 Data for February Mar/03 Data for March Data Warehouse Time Variant (time series) • Data is stored as a series of snapshots, each representing a • period of time.

  10. Load Operational Databases Warehouse Database INSERT Read UPDATE DELETE Read Non Volatile • Typically data in the data warehouse is not updated or deleted. Nonvolatile means that, once entered into the warehouse, data should not change .This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.

  11. Other Characteristics of Data Warehouse • Summarized • Not Normalized • Meta Data • Sources (Both operational and external data are presents)

  12. Summary Data • Provide fast access to pre-computed data • Reduce use of • I/O • CPU • Memory • Distill from • Source systems - lightly summarized • Pre-calculated summaries - highly summarized • Determine requirements early

  13. Units Sold Sales($) Store Product A Total Product B Total Product C Total Fact Data Dimension Data Summary Data • Average • Maximum • Total • Percentage

  14. Summary Data Product Time Store Summary Fact (Derived)

  15. Normalization • Normalized data contains no • Redundancy. • Repeating data. • Key independent columns. • Denormalized data often • Improves efficiency in OLAP systems. • Exists in data warehouse databases. • Comprises derived or summary data. • Star and snowflake models are denormalized.

  16. Meta Data (Data about Data) Provides information about the content of the warehouse. Meta Data includes: • A guide to moving data to the warehouse • Rules for summarization • Business terms used to describe data • Technical terminology • Rules for data extractions

  17. Data Warehouse Architectures • Data Warehouse Architecture (Basic) • Data Warehouse Architecture (with a Staging Area) • Data Warehouse Architecture (with a Staging Area and Data Marts)

  18. Data Warehouse Architecture (Basic) • End users directly access data derived from several source systems through the data warehouse.

  19. Data Warehouse Architecture (with a Staging Area) you need to clean and process your operational data before putting it into the warehouse. You can do this programmatically, although most data warehouses use a staging area instead.

  20. Data Warehouse Architecture (with a Staging Area and Data Marts) you may want to customize your warehouse’s architecture for different groups within your organization. You can do this by adding data marts, which are systems designed for a particular line of business.

  21. Data Marts A Data Mart is a small warehouse designed for strategic business unit or a department. Data Mart Advantages: • The cost is low. • Implementation time is shorter. • They are controlled locally rather than centrally. • They contain less information than the data warehouse and hence have more rapid response. • They allow a business unit to build its own DSS without relying on a centralized IS department. Data Mart Types: • Replicated Data Marts. • Stand-alone Data Marts.

  22. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Corporate Information Factory

  23. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Major Business Functions Business Operations Business Intelligence Business Management

  24. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Operational Systems Operational Systems are the internal and external core systems that run the day-to-day business operations. They are accessed through application program interfaces (APIs) and are the source of data for the data warehouse and operational data store.

  25. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management External Data External Data is any data outside the normal data collected through an enterprise’s internal applications. Generally, external data, such as demographic, credit, competitor, and financial information, is purchased by the enterprise from a vendor of such information.

  26. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Data Acquisition Data Acquisition is the set of processes that capture, integrate, transform, cleanse, and load source data into the data warehouse and operational data store.

  27. Data Problems

  28. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Data Warehouse The Data Warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data used to support the strategic decision-making process for the enterprise.

  29. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Operational Data Store The Operational Data Store is an subject-oriented, integrated, current, volatile collection of data used to support the tactical decision-making process for the enterprise.

  30. Comparing an Operational Data Store and a Data Warehouse

  31. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management CIF Data Management CIF Data Management is the set of processes that protect the integrity and continuity of the data within and across the data warehouse and operational data store. It may employ a staging area for cleansing and synchronizing data.

  32. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Transactional Interface The Transactional Interface is an easy-to-use and intuitive interface for the end user to access and manipulate data in the operational data store.

  33. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Data Delivery Data Delivery is the set of processes that enables end users and their supporting IT groups to filter, format, and deliver data to data marts and oper-marts.

  34. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Exploration Warehouse The Exploration Warehouse is a data mart whose purpose is to provide a safe haven for exploratory and ad hoc processing. An exploration warehouse may utilize specialized technologies to provide fast response times with the ability to access the entire database.

  35. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Data Mining Warehouse The Data Mining Warehouse includes tasks known as knowledge extraction, data archaeology, data exploration, data pattern processing and data harvesting.

  36. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management OLAP Data Mart The OLAP (online analytical processing) Data Mart is aggregated and/or summarized data that is derived from the data warehouse and tailored to support the multidimensional requirements of a given business unit or business function.

  37. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Oper-Mart The Oper-Mart is a subset of data derived from of the operational data store used in tactical analysis and usually stored in a multidimensional manner (star schema or hypercube). They may be created in a temporary manner and dismantled when no longer needed.

  38. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Decision Support Interface The Decision Support Interface is an easy-to-use, intuitive tool to enable end user capabilities such as exploration, data mining, OLAP, query, and reporting to distill information from data.

  39. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Meta Data Management Meta Data Management is the set of processes for managing the information needed to promote data legibility, use, and administration.

  40. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Information Feedback Information Feedback is the set of processes that transmit the intelligence gained through usage of the Corporate Information Factory to appropriate data stores.

  41. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Information Workshop Information Workshop is the set of the facilities that optimize use of the Corporate Information Factory by organizing its capabilities and knowledge, and then assimilating them into the business process.

  42. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Library and Toolbox The Library and Toolbox is the collection of meta data and capabilities that provides information to effectively use and administer the Corporate Information Factory. The library provides the medium from which knowledge is enriched. The toolbox is a vehicle for organizing, locating, and accessing capabilities.

  43. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Workbench The Workbench is a strategic mechanism for automating the integration of capabilities and knowledge into the business process.

  44. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Operations and Administration Operation and Administration is the set of activities required to ensure smooth daily operations, to ensure that resources are optimized, and to ensure that growth is managed.

  45. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Systems Management Systems Management is the set of processes for maintaining, versioning, and upgrading the core technology on which the data, software, and tools operate.

  46. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Data Acquisition Management Data Acquisition Management is the set of processes that manage and maintain processes used to capture source data and its preparation for loading into the data warehouse or operational data store.

  47. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Service Management Service Management is the set of processes for promoting user satisfaction and productivity within the Corporate Information Factory. It includes processes that manage and maintain service level agreements, requests for change, user communications, and the data delivery mechanisms.

  48. Information Workshop Workbench Library & Toolbox Information Feedback Exploration Warehouse External DSI Data Warehouse API ERP Data Mining Warehouse DSI Data Acquisition CIF Data Management Data Delivery Internet API OLAP Data Mart DSI Operational Data Store API Legacy TrI Oper Mart DSI API Other Operational Systems Meta Data Management Operation & Administration Service Management Change Management Systems Management Data Acquisition Management Change Management Change Management is the set of processes coordinating modifications to the Corporate Information Factory.

  49. OLTP Databases Staging File Warehouse Database Extraction, Transportation, Loading and Transformation (ETL) • Purchase specialist tools, or develop programs • Extraction - select data using different methods • Transportation - move data into the warehouse • Loading and Transformation - validate, clean, integrate, and time stamp data

  50. Relevant Useful Quality Clean up Accurate Change Accessible Restructure Data Quality - Importance Ensure data is • Relevant • Useful • Quality • Accurate • Accessible Warehouse Operational systems • Large time consuming task

More Related