1 / 29

Data Warehouse (II): Its Components

This article provides an overview of the functional components of a data warehouse, including ETL tools, enriched databases, data marts, metadata, and information delivery systems. It also discusses the architectural layers of a data warehouse and the administration and management tools required for its operation.

Download Presentation

Data Warehouse (II): Its Components

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehouse (II): Its Components Developed by: Dr Eddie Ip Modified By : Dr Arif Ansari

  2. Outline • Review • Functional view of DW: 7 components (tools & content) • Architectural view of DW: 6 layers

  3. 7 Functional Components • Extract, Transform, Load (ETL) tools • Enriched DW databases & DBMS tools • Meta data • Data marts • Access & Mining tools • DW administration & management tools • Information delivery system

  4. ETL: Extract/Source data • data from OLTP’s • Access, extract • Transform • Cleaning , e.g. validate • Integrate with external files for enrichment, then load into DW • e.g. business database

  5. ETL tools • Commercial tools: • Informatica • Information Builders (Copy Manager) • MS Data Transformation Services • Typical functions • Define source, query (run SQL), define transformation, define target, verify transformation, schedule run, audit report

  6. Enrichment : example • Dun & Bradstreet : 45m world business • American Business Information

  7. Enrichment :example • Match client’s file with proprietary DB • e.g. name, zip code • Append to matched record • sales volume • yellow page ad size • credit rating (in-house) • fax phone number

  8. Enrichment :example • A well-known life-style segmentation product: PRISM • Contains 62 clusters • E.g., the affluent executives of "Blue Blood Estates" • remote rural families of "Blue Highways“ • Zip code analysis http://cluster2.claritas.com/YAWYL/ziplookup.wjsp http://houseandhome.msn.com/pickaplace/nf_Overview.aspx

  9. Example: enrichment • ZIP code info may be useful to insurance agents • Credit rating info may be useful to bankers • Household info may be useful to direct marketing business

  10. Data mart • Data mart = subset of DW for community users, e.g. accounting department • Sometimes exist as Multidimensional Database • Info mart = summarized data + report for community users

  11. Metadata • Data about data • Field description, business rules (e.g. profit=? formula), log of file updates • Help users understand content & locate data

  12. DW Database • Almost always a relational DB • Oracle, DB2, Sybase, SQL Server • New DB design for special purpose of DW (e.g., scale up, speed up, parallel processing)

  13. Data access & mining • Tools • Query & reporting • OLAP • Data mining, visualization, segmentation, clustering • New developments: text mining, web mining & personalization • Mining multimedia data

  14. DW Administration • Security & priority • Keep track of updates • QC • Purging & copy to data mart

  15. DW Administration • Security issue critical (users at many levels) • Some security measures to protect a DW • Views = limit users to see certain rows/columns • Access control = grant rights to specific users to access selected data (can be created by DBA thro’ SQL commands such as Grant/Revoke) • Admin controls such as group access, firewall, encryption • Audit = track what users are doing

  16. Information Delivery Tools • Commercial tools • Crystal Report, Impromptu, WebFocus • Increasingly common mode of delivery: Web-enabled

  17. Architectural Model • 6 layers: Data flow • Transaction layer • Propagation layer • Database layer • Middleware layer • Application layer • Presentation layer

  18. Transaction layer • Programs are required to • Condition/scrub data = same naming convention, format e.g. MCI = MCI Worldcom=Worldcom

  19. OD1 Six-layer DW Architecture AP1 OD2 AP2 OD3 AP3 OD4 AP4 OD5 Transaction Propagation Database Middleware Application Presentation

  20. Transaction layer • Validate = is data up-to-date and accurate? ID invalid, missing, duplicate values • even name may have different format • First/Last, Last/Middle/First, …. • Define default for missing value • Map source to target = which field in operation data to which field in DW

  21. Propagation layer • Most common option is asynchronous propagation • Transfer : transaction system to DW platform • Load : actual loading • Time for loading depends on a factors such as number of indexes

  22. Database layer • DW is not just the database layer! • Integrated & transformed BEFORE • Analysis & presentation AFTER • Parallel processing capability often required • Symmetric Multi-processor(SMP) • Massively parallel processing (MPP)

  23. Middleware layer • Software that make access to database layer and application layer possible • LAN, WAN • Client server model : logical and physical separation of data from the application

  24. Application layer • Decision support applications • Presentational systems (pre-defined reports) • Interrogative systems (SQL) • Simulation systems (What-if analysis) • Functional systems (e.g. segmentation analysis) • Automatic systems (e.g. credit approval expert system) • Constrained by organizational design • System geographical oriented? Product oriented?

  25. Presentation layer • Interface • Command-line (e.g. unix) • Menu-driven (little interaction) • Query language (e.g. SQL) • Graphical (windows, pull-down menu, buttons) • Groupware

  26. Presentation layer • Multmedia (sound ,video, animation) • Hypertext (html)

  27. Summary

  28. OD1 Six-layer DW Architecture AP1 OD2 AP2 OD3 AP3 OD4 AP4 OD5 Transaction Propagation Database Middleware Application Presentation

  29. Summary • ETL (extract, transform, load) tools • Data Mart • Metadata • Data access & Mining • DW administration • Information Delivery Tools • Architectural Model (6 Layers)

More Related