1 / 26

Data Warehousing – An Introductory Perspective

Data Warehousing – An Introductory Perspective. DWCC BBSR. Agenda. Why Data Warehouse Definition and Architecture Terminology. The Business Need. I think…. errrr, I guess so. Business Decisions Are not made by Rolling Dices. We Don’t know What we don’t know.

pascal
Download Presentation

Data Warehousing – An Introductory Perspective

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehousing – An Introductory Perspective DWCC BBSR

  2. Agenda • Why Data Warehouse • Definition and Architecture • Terminology

  3. The Business Need I think…. errrr, I guess so Business Decisions Are not made by Rolling Dices We Don’t know What we don’t know

  4. Current Business Environment • Competitive • Ever Changing • Chaotic • Global • Urgency to make decisions • Competitive advantages stems from well informed decisions • Based on an understanding of: • Your Products • Your Customers Preferences • The Competition • Your own company strengths

  5. The Value Pyramid Each layer provides Value en route to a targeted business Outcome Increased revenue Increased productivity Reduced costs Competitive advantage

  6. Definitions • A collection of integrated, subject oriented databases designed to support the DSS function where each unit of data is relevant at some moment of time (Inmon 1991) • A copy of transaction data specifically structured to Query and Analysis (Kimball 1996) • Data Warehouse is NOT a specific technology • It is a series of processes, procedures and tools that help the enterprise understand more about itself, its products, its customers and the market it services. • It is NOT possible to purchase a Data Warehouse • But, it is possible to build one.

  7. sachin_kambhoj: sachin_kambhoj: FEATURES • Non Volatile - Used mainly for reporting purpose and it is independent of transactional data. • Subject Orientation- All relevant data is stored together. Ex: Sales, Finance, Marketing, Customer data etc. • Historical data- Can contain data of several years depending on company requirements.

  8. Subject Orientation. • Operational Datawarehouse AUTO Customer HEALTH Policy LIFE Premium CASUALTY Claims Applications Subjects

  9. Goals and Applications • Goals of a Data Warehouse • Provide reliable, High performance access • Consistent view of Data: Same query, same data. All users should be warned if data load has not come in. • Slice and dice capability • Quality of data is a driver for business re-engineering. • Data Warehousing Applications: • Customer Profitability Analysis • Customer satisfaction and retention • Buyer behavior. • Pricing, Promotion Analysis • Market research • Inventory optimization

  10. OLTP v/s Data Warehouse OLTP system runs the business, Data Warehouses tell you how to run the business

  11. If most of your business needs are • To report on data in a single transaction processing system • All the historical data you need are in the system • Data in the system is clean • Your hardware can support reporting against the live system data • The structure of the system data is relatively simple • Your firm does not have much interest in end user adhoc query/report tools Data warehousing may not be for your business!!

  12. Modeling Constructs • Entity Relationship Diagram • Star schema • Snow flake schema Within the implementation of a warehouse, several of these constructs may be integrated to form an optimal design

  13. Entity Relationship Diagram • Based on set theory and SQL • Highly normalized • Optimized for update and fast transaction turnaround • Not suited for querying in a data warehouse environment • diagrams like these are very difficult for users to visualize and memorize.

  14. Star Schema A central fact table surrounded by a number of dimension tables. Dimensions are business entities on which calculations are done. They can be numeric or alphanumeric. Example: Product table comprising brand name, category, packaging type, size. Facts are numerical measurements of business with respect to dimensions.They are numeric and additive (summable across any combination) e.g. A sales fact table could contain time, product and store key along with dollars sold, units sold, dollars cost.

  15. Snow Flake Schema Normalized version of the star schema with the addition of normalized dimension tables. Normalization helps to reduce redundancy in the dimension tables, but affects performance and user comprehension.

  16. DW Terminology Granularity • Granularity (or grain) defines the level of detail stored in the physical warehouse • Low granularity indicates lot of detail while high granularity indicates less detail. • Example: A commercial airline is building a data warehouse. What will the granularity be? • Choice A: Each record represents a flight • Choice B: Each record represents the customer on a flight There is no correct answer. To a large extent, the granularity depends on the business User’s exploitation needs. However, you should be aware that the granularity of data affects • Volumes of Data, Data Maintenance, Indexing • Level of Data Exploration • Query and Reporting constraints

  17. DW Terminology Metadata • At all levels of the data warehouse, information is required to support the maintenance and use of the data warehouse. Metadata is data about data. There are two views of Metadata • Business – are warehouse attributes and properties for use by business users • Technical – describe data flow from Operational systems into the data warehouse OLAP • Online Analytical processing • Tool(s) for Analytical Reporting including Graphical capabilities.

  18. DW Terminology OLAP Tools available for exploring the information built in a DW : • Multi-dimensional On-line Analytical Processing (MOLAP) • The data from data warehouse is queried and dumped periodically on to a server on local network to a data storage called Multi-dimensional Database (MDDB) provided by the OLAP tool. This MDDB forms a Data Mart which is then used for querying and reporting. • Relational On-Line Analytical Processing (ROLAP) • Refers to the ability to conduct OLAP analysis directly against a relational warehouse without any constraints on the number of dimensions, database size, analytical complexity, or number and type of users. • Hybrid On-line Analytical Processing (HOLAP) • An environment with a combination of MOLAP and ROLAP data storage. Summarized information is typically stored in an MDDB and detailed data is stored in a Relational environment.

  19. Terminology Data Mart- Contains Data about a specific subject. Eg. Official data, Customer data, Campaign data etc. Metadata- Data about data. Describes the data stored in Data warehouse. Data Cubes- Central object of data containing information in a multidimensional structure. Data Cleansing- Regular cleaning of data. ETL- Extraction, Transformation and Loading of Data. Data Mining- A mechanism which uses intelligent algorithms to discover patterns, clusters and models from data.

  20. Stages Extraction, Transformation & Loading (ETL) Business Intelligence Heterogeneous Source Systems Query & Reporting Operational Staging Area Data Warehouse OLAP Legacy External Data Mining

  21. A Typical Data Warehouse Data Warehouse Summarized Data Facilitates in firing queries on detailed data. Meta Data Detailed Data Data Mart Data Mart Data Mart Data marts contain data specific to a subject.

  22. MOLAP/ROLAP/HOLAP MDD Proprietary API MDD Proprietary API Data Warehouse (RDBMS) SQL Custom Loader Query Tool by MDD Vendor OLAP Engine Rows Rows MDD Database Storage Cubes Periodic, Manual Data Load

  23. OLAP Terminology Region State • Analytical technique whereby the user navigates from the most summarized to the most detailed level. Region District Location Month Product

  24. OLAP Terminology • Rotation Or Dicing Month Region M O N T h P R O D U C t Region Product

  25. OLAP Terminology • Slicing Region M O N T h Product

  26. Products and Vendors • Data Warehouses • Oracle • Sybase • DB2 • OLAP tools • Oracle Express • Hyperion Essbase • Data Mining • Oracle Darwin • IBM Intelligent Data Miner • Querying & Reporting • Oracle Discoverer • Business Objects

More Related