1 / 43

BSAD 482: Business Analytics

Learn about the concept of data warehouse architecture and its various components, including data models and server architecture. Discover the importance of architectural choices and the benefits it brings to communication, planning, flexibility, and productivity. Explore architectural principles and their role in making well-informed decisions. Gain insights into the growth of data warehouses and the challenges they pose. Understand the key choices and tools needed for successful data architecture.

daleflores
Download Presentation

BSAD 482: Business Analytics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. BSAD 482: Business Analytics The Data Warehouse - Architecture

  2. Contents • Define the concept of ‘architecture’ • Discuss the various components of data warehouse architecture • Review different data architectures in a data warehouse • Data warehouse • Data Marts

  3. Information Systems Architecture • Information Systems Architecture definition is the process of making the key choices that are essential to the design of an information system (like a Data Warehouse). Business Strategy Line of Code / Process Step Employee / Computer Removal of Choices Architecture Design Implementation

  4. Architecture or Design? • The Customer Dimension will consist of the following 12 attributes… • We will use Microsoft SQL Server Integration Services (SSIS) as our ETL tool • All data marts will employ ‘star schema’ data models • The lowest level of granularity in the sales fact table will be daily sales by store by product.

  5. The Value of Architecture • Communication: • To business sponsors, and business users • Between members of the project team • Planning: • Ensure that all important components of the system are accounted for • Flexibility and Growth • Thinking about overall architecture will reduce risk associated with the ‘success’ of the data warehouse • Productivity and Reuse

  6. Architectural Principles • Mature IS organizations make architectural choices that are driven by well thought out Guiding Principles: “Truths” that guide decision making and hold true irrespective of changes in goals, strategy, and leadership. • Examples: • We will seek to buy solutions vs. build them where possible. • We strive for simplicity in our Technology Infrastructure Footprint. • Solutions must align/integrate with Corporate Security standards • We will implement solutions that emphasize ease of use for stakeholders. • We strive to implement solutions that can support needs across the enterprise and allow data sharing as needed, subject to appropriate governance

  7. Business Processes: support the strategy, Operational organization The Applications support the Business, implement the business functions in the IT systems The Information is key for the organization: It is the fuel that drives the architecture Infrastructure that supports the IS: • Technical components: servers, networks, etc. • Technology: platforms, etc. Architectural ‘Views’ Strategy Business Métier Our Focus…. Applications Information & Data Network & Infrastructure

  8. BI Application Architecture ODS

  9. What’s driving the explosive growth of data warehouses? • Business need…. • Organizations need great information to: • Sense what is happening (and predict what will happen) • Respond effectively by making fast, smart decisions • Technical “enablers” • Computing Power …. • Cost of Computing ….

  10. Moore’s Law and its counterparts… • Drivers: • Moore’s Law: computer ‘chip’ performance per dollar doubles every eighteen months to 2 years: • CPU, RAM, Flash • Data transmission rates: the number of bits that can be transmitted / second between computers: as of 2005: 9 Gbit/sec; 2008: 38 Gbit/second. • Data storage (magnetic disk drives): 1 TB drive… $100?

  11. What’s Special About Data Warehouse Architecture? • Transaction processing systems – growth is (relatively) predictable • Example: • A company uses SAP for order processing • They are opening a new retail store • They predict (based on experience) 2000 transactions per week • To process this volume, we need 3 workstations to capture the transactions • Peak time each day is 11-2 when 50% of transactions occur

  12. What’s Special About Data Warehouse Architecture? • Success drives explosive growth • More users • More (complex) queries • More data • Performance is non-deterministic • Unpredictable queries • Unpredictable use patterns Data Warehouse SAP Growth Banner Time

  13. Data Warehouse/BI Architecture • Choices need to be made regarding: • Choice of Approach/Tools for ETL, Quality Management, User Access • Data architecture: the way data is organized and managed in the data warehouse • Server Architecture/Processing Architecture …. Degree of “Parallelism” needed • Database Management Software • Metadata management: approach for managing technical, operational and user metadata • Network connectivity • BI Tools (the tools that the users access) • Many more • Our focus today: Data Architecture • But first a word on parallelism

  14. Parallelism • Essentially, parallel computing improves performance by breaking up the task and working on the pieces simultaneously. • Performance is improved either through speeding the process (doing the same amount of work faster) or scaling up (doing more work in the same amount of time). In most cases, a data warehouse needs both types of performance gains over its life cycle.

  15. Parallelism • Hardware Vendors offer different architectures to achieve parallelism: • Symmetric Multi-Processing (SMP) • Single computer, 2:N processors sharing a common system bus, memory and disk array • Massively Parallel Processing (MPP) • A series of independent ‘nodes’ interconnected by high speed switch • Each node is a computer controlling its own processor(s), memory and disk • Nodes work independently, activity coordinated by a master program • Hybrid (common today) • Applications: • BI, “Big Data” Analytics, Watson

  16. Build vs. Buy • MANY options today for acquiring Information Technology: • Build it your self • Buy a package (SAP) • Rent a package (SAAS) • There are also options about where you run the technology • Host it in your own data centre • Have someone else host it in the Cloud • With a data warehouse, we buy a set of tools, and typically have to build the DW AND the majority of our BI applications

  17. A Reference DW Data Architecture Build Today…often in Cloud… ODS Buy ETL Tools DBMS BI Tools Metadata Tools

  18. BSAD 482: Business Analytics The Data Warehouse - Architecture

  19. Data Architecture Databases Data Warehouse Data Architecture Data Mart Data Architecture

  20. What is a Database? • At the core of a data warehouse is a huge database • Databases are used to efficiently store and manage large amounts of data. • Managed by a special class of software called database management software (DBMS)

  21. Database Basics • Database tables store information and resemble Excel worksheets. • Tables consist of: • fields (columns) • records (rows) • Tables are connected (or related) through keys. • Tables normally store information about one specific type of data.

  22. Data Architecture Components • Enterprise data warehouse (EDW) A centralize data store for the enterprise. Goals: • Flexibility, detail, history • Feeds Data Marts • Data Mart A subset of “DW” data that stores only data relevant to a specific user group, department, business area • Dependent data mart A subset that is created directly from a data warehouse • Independent data mart A data mart created directly from source systems

  23. Data Architecture Components • Operational data stores (ODS) A type of data store used to provide access to information required in real time or near real time • Metadata Information about the data in a data warehouse. Metadata describes: Meaning Quality Lineage Access Methods (where, how, who)

  24. Data Warehouse Data Architecture • The purpose of the data warehouse is to be the primary repository for data that will feed the BI access layer (the data marts). • It is designed to provide the flexibility to meet both current and future data needs. • Generally, data is highly normalized as this provides the greatest flexibility. • The design based on an enterprise data model NOT any specific application.

  25. Data Warehouse Data Architecture (cont) • Key characteristics: • The data in the data warehouse should generally be stored at an ‘atomic’ (transaction) detail in order to provide flexibility to meet future needs. • The data warehouse is responsible for maintaining historical data required for decision support purposes. • The data warehouse is designed for flexibility and load performance, but not for direct access by end-users. The exception to this rule is for specialized analytics such as exploratory data mining.

  26. Data Mart Data Architecture Relational Star Schema Multi-dimensional

  27. Data Mart Data Model: “Star Schema” Facts contains information about things that an organization wants to measure (aka ‘measures’). A measure provides some indication of performance and on which calculations (e.g., sum, count, average, minimum, maximum) can be made Dimensions and Hierarchies A dimension is a structure that categorizes facts and measures in order to enable users to answer business questions. A hierarchy contains different levels for a dimension: Product Cat  Product Sub-Cat  Product VIDEO

  28. Star Schema Example Fact Table ( Instances of ordered tests ) Dimension Tables ( Descriptive attributes of ordered tests )

  29. Facts & Dimensions DIM_TEST FACT_ORDERED_TEST • Might be 1000s of fact rows referencing this specific type of test. All would have Test_ID = 424 • Test_ID is KEY that a) uniquely identifies each test in DIM_TEST and b) allows us to link Fact_Ordered_Test with Dim_Test to retrieve descriptive info about tests

  30. Simple Example • An auto company wants to analyze its auto service business. Key information needed includes: • Date of service (want to be able to categorize by week, month, year) • Customer • Vehicle (VIN, model, type, manufacturer) • Location of Service (assume many locations, want to know garage, city, province) • Service Technician • Service Reason (assume standard categories) • Tech time spent, in minutes • Customer wait time, in minutes • Total time vehicle in garage, in minutes • Cost of Service – Labour • Cost of Service - Parts

  31. Service Star Schema DIM_SERVICE_DATE DATE WEEK MONTH YEAR DIM_CUSTOMER FACT_SERVICE TECH_TIME_MIN WAIT_TIME_MIN TOTAL_TIME_MIN LABOUR_COST PARTS_COST DIM_VEHICLE VIN MODEL TYPE MANUFACTURER DIM_SERVICE_TECH DIM_LOCATION GARAGE CITY PROVINCE DIM_SERVICE_RSN

  32. Scenario • GASHA (the local health district) wants to build a data mart to analyze patient stays for surgery. The following is the key information needed: • Patient information (name, address, age, gender, etc.) • Total time patient waited for a bed in days • Total time of stay in hospital in days • Cost of Drugs • Cost of Nursing Care • Primary Physician and the specialty area of that physician • Cost of surgery rental • Fee paid to Physician • Hospital where surgery took place • Hospital Unit where patient bed located • Condition requiring surgery (assume we have a standard list) and the category for the condition • Start and End Date of Stay (want to analyze by day of the week, month, and year) • Draw a simple ‘star schema’ to represent this scenario – • What are the ‘measures’ (all in one table) and what are the dimensions (separate tables)?

  33. MDBMS • In most cases we use relational databases to store data warehouse data • Multidimensional database: A form of data mart implemented via a specialized multidimensional database software that pre-calculates / summarizes ALL measure/dimension intersections in advance (as part of ETL) Also called an OLAP database Pros: very, very fast answers Cons: lack of flexibility, limits on data volumes / level of detail VIDEO

  34. Detailed data Relational table: Assume millions of rows

  35. From a “star schema” perspective Dim_Model Dim_Color Type Model Color Fact_price Price_dollars

  36. OLAP representation Dimension: Color Measure: Price (PRE-CALCUATED) Dimension: Model. Has a hierarchy Type  Model

  37. OPTIMAL ARCHITECTURE?

  38. What is the ‘optimal’ DW architecture?

  39. What is the ‘optimal’ DW architecture? • Each architecture has advantages and disadvantages!

  40. DW The Great Data Architecture Debate Sources • Enterprise data warehousewith dependent marts • Pros: more scalable, more efficient in the long run • Cons: early implementations time consuming and expensive, complexity • Independent data marts • Pros: simple, faster to implement • Cons: not scalable, inefficient, difficult to maintain when large number. Sources

  41. Factors that impact data architecture…. • Information Interdependence between Organizational Units: Higher the need to share information across organizational units, greater likelihood of enterprise DW • Upper Management’s Information Needs: Senior management needs information from lower organizational levels to do their job. Higher this need, greater likelihood of enterprise DW. • Urgency of Need for a Data Warehouse: The more time pressures to implement, greater likelihood that there will NOT be enterprise DW.

  42. Factors that impact data architecture…. • Nature of End-User Tasks: Some users perform non-routine tasks. Structured queries and reports are insufficient for their needs. Some users need access to enterprise wide data (not just a specialized mart) • Constraints on Resources: Some data warehouse architectures require more resources (IT and business personnel, money) than others. Greater resource constraints, less likely enterprise DW built • Strategic View of the Data Warehouse Prior to Implementation: The more strategic the approach, the greater the likelihood of enterprise DW.

  43. Factors that impact data architecture…. • Compatibility with Existing Systems: Operational systems in place may dictate architecture. Example: SAP and SAP BW. • Perceived Ability of the In-House IT Staff: Building a data warehouse can challenging, and some architectures are more difficult to build. Staff skills influences ‘ambition’ of scope. Less skill, less likely to build enterprise DW • Technical Factors: Number of users, volume of data, query performance requirements, stability of source systems all influence choices. • Cultural/Political Factors: Influence of experts, culture (do we share data well?), influence of current technical team, etc.

More Related