1 / 29

Components and Architecture

Components and Architecture. CS 543 – Data Warehousing. Architecture. What are the key components of a data warehouse? Architecture is the structure that binds the components into an integrated whole DW architecture provides the overall framework for developing and deploying DW solutions.

jael-benson
Download Presentation

Components and Architecture

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Components and Architecture CS 543 – Data Warehousing

  2. Architecture • What are the key components of a data warehouse? • Architecture is the structure that binds the components into an integrated whole • DW architecture provides the overall framework for developing and deploying DW solutions CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  3. Architectural Components CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  4. Distinguishing Characteristics • Different objectives and scope • Data content • Complex analysis and quick response • Flexible and dynamic • Metadata driven CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  5. Architecture Supporting Flow of Data CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  6. Technical Architecture • The technical architecture of a DW is the complete set of functions and services provided within its components • Functions • Services • Rules and procedures • Data stores • Tools are the means to implement an architecture • Architecture comes first, then the tools; select the appropriate tools based on the architecture CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  7. Data Acquisition (1) • This component includes • Extraction • Transfer into staging area • Preparation for loading (transformation, cleansing, and integration) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  8. Data Acquisition (2) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  9. Data Acquisition – Functions and Services (1) • Data extraction • Select data sources and determine the types of filters to apply to individual sources • Generate automatic extract files from operational systems using replication and other techniques • Create intermediary files to store selected data to be merged later • Transport extracted files from multiple platforms • Provide automated job control services for creating extract files • Reformat input from outside sources, departmental files, databases, and spreadsheets • Resolve inconsistencies for common data elements from multiple sources • Generate common application code for data extraction CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  10. Data Acquisition – Functions and Services (2) • Data transformation • Map input data to data for DW repository • Clean data, remove duplicates, merge/purge • De-normalize extracted data structures as required by the dimensional model of the DW • Convert data types • Calculate and derive attribute values • Check for referential integrity • Aggregate data as needed • Resolve missing values • Consolidate and integrate data CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  11. Data Acquisition – Functions and Services (3) • Data staging • Provide backup and recovery for staging area repository • Sort and merge files • Create files as input to make changes to dimension tables • If staging area storage is a relational database, create and populate database CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  12. Data Storage • This architectural component covers the process of loading the prepared data from the data staging area into the data warehouse repository CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  13. Data Storage – Functions and Services • Load data for full refreshes of DW tables • Perform incremental loads at regular prescribed intervals • Support loading into multiple tables at the detailed and summarized levels • Optimize the loading process • Provide automated job control services for loading the data warehouse • Provide backup and recovery for the DW database • Provide security • Monitor and fine-tune the database • Periodically archive data from the database according to preset conditions CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  14. Information Delivery (1) • This architectural component spans a broad spectrum of many different methods of making information available to the users of the DW • To the users, information delivery is the DW; it is the front-end through which the users retrieve information from the DW • Information • Online queries and interactive analyses • Regular and ad-hoc reports • Specialized applications (e.g. executive information system) • Data mining CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  15. Information Delivery (2) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  16. Information Delivery – Functions and Services • Provide security to control information access • Monitor user access to improve service and for future enhancements • Allow users to browse data warehouse content • Simplify access by hiding internal complexities of data storage from users • Automatically reformat queries for optimal execution • Enable queries to be aware of aggregate tables for faster results • Govern queries and control runaway queries • Provide self-service report generation for users • Store result sets for queries and reports for future use • Provide multiple levels of data granularity • Provide event triggers to monitor data loading • Make provision for the users to perform complex analysis • Enable data feeds to downstream, specialized data support systems such as EIS and data mining CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  17. Infrastructure Supporting Architecture • The architecture defines the functions and services; the infrastructure defines the elements to support the architecture • Infrastructure is the foundation supporting the architecture • Hardware servers • OSs • Data management systems • Networking elements • Supporting tools and applications • People • Procedures CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  18. Operational Infrastructure • Operational infrastructure includes • People • Procedures • Training • Management software • Operational infrastructure are the people and procedures that keep the DW functioning, and not those who develop the DW CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  19. Physical Infrastructure (1) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  20. Physical Infrastructure (2) • Physical infrastructure includes • Computing hardware (e.g. server) • OS and utilities • Networking hardware and software • Software tools • Decisions about the physical infrastructure are critical for a DW. Two principles • Leverage as much of the existing physical infrastructure • Keep the infrastructure as modular as possible CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  21. Hardware and Operating System • Hardware • Scalability • Support • Vendor reference • Vendor stability • Operating system • Compatibility • Scalability • Security • Reliability • Availability • Preemptive multitasking • Multi-threaded approach • Memory protection CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  22. Single Platform Option • Simplest option, where all functions and services are performed by a single computing platform • Typically used by small to medium sized companies who have mainframes or large Unix servers already in use with capacity to spare • Some shortcomings of using mainframes • Stretched to capacity • Non availability of tools • Multiple legacy platforms • Company’s migration policy CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  23. Hybrid Option • Most companies opt for the hybrid option where multiple platforms are used for data warehousing (data acquisition, data storage, information delivery) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  24. Data Extraction • Data extraction • Best performed on each source system’s own computing platform • Initial reformatting and merging • Best performed on each source system’s own computing platform • Extract files are reformatted and merged into a smaller number of files performing verification against the source system • Initial data cleansing • Also performed on source system platform • Transformation and consolidation • Performed on the staging area platform • Validation and final quality check • Performed on the staging area platform • Creation of load images • Performed on the staging area platform CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  25. Options for the Data Staging Area • In one of the legacy platforms • On the data storage platform • On a separate optional platform • You can optimize the platform for complex transformations and cleaning • Install specialized tools for transformations and cleaning • Keep track of entire data content in the staging area CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  26. Data Movement CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  27. Client/Server Architecture (1) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  28. Client/Server Architecture (2) • Application server (middle tier) • To run middleware and establish connectivity • To execute management and control software • To handle data access from the Web • To manage metadata • For authentication • As front end • For managing and running standard reports • For sophisticated query management • For OLAP applications CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

  29. Maturing of the Infrastructure CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

More Related