260 likes | 449 Views
Chapter 14 – Physical Design. Summary of Topics Design Process The Foundation for Design Design Objectives Designing a Data Mart Components of the Data Architecture Design Example. Chapter 14 – Physical Design. Design Process Iterative process with following steps
E N D
Chapter 14 – Physical Design Summary of Topics • Design Process • The Foundation for Design • Design Objectives • Designing a Data Mart • Components of the Data Architecture • Design Example
Chapter 14 – Physical Design Design Process • Iterative process with following steps • ID user requirements and project scope • Develop subject area data model(s) • Develop data warehouse logical model • Develop data warehouse architecture • Design the physical database • Populate user-oriented repository/directory • Identify sources of data • Cleanse and integrate data from legacy systems • Populate the data warehouse • Test for user satisfaction (quality, performance) • Rework design as needed
Chapter 14 – Physical Design The Foundation for Design • Data model is the foundation • 3rd normal form model recommended for depth of understanding, though later de-normalization is common • Top-down, global model ideal, but • Lengthy duration • Corporate funding requirements • Bottom-up may be more practical • Build up ,one subject area at a time • Integrate into complete model over time
Chapter 14 – Physical Design Design Objectives • Define in terms of business objectives • Involves tradeoffs among • Performance • Flexibility • Scalability • Ease of Administration • Data Integrity • Data Consistency • Data Availability • User Satisfaction
Chapter 14 – Physical Design Design Objectives: Performance • Response time in DW typically > OLTP • Important to manage user expectations • Poor performance may result from • Inadequate hardware • Inflexible data architecture • Poor physical design • Unrealistic user expectations • Build performance bottom-up (Fig 14.1) • DBMS Installation parameters • Too few concurrent users • Poor selection of installation parameters • Database Design and Optimization • Application design, SQL quality/efficiency • Query efficiency • Tune performance from top-down
Chapter 14 – Physical Design Server Scalability • Defn: ability to handle increase in # users • Old mainframes known for poor scalability • Many adopt multi-server environment Server Flexibility • Achieve using data model and metadata as basis for physical design • May include giving users flexibility to handle analysis, query, reporting needs • Must accommodate change in today’s business environment
Chapter 14 – Physical Design Designing a Data Mart • Defn: A warehouse developed to meet requirements of a specific dept., group • A subset of a corporate warehouse • Caveat: may result in isolated islands of information
Chapter 14 – Physical Design Components of the Data Architecture • May include the following elements • Legacy Systems • Relational format, Model, Metadata questionable • Data Warehouse • Historical detail, summary, external data • Specialized data subsets, multidimensional data • Data Marts • Tailored to specific dept. or workgroup • Personal Data Warehouse • Draw from warehouse, but not part of it • May contain external, personal data • Operational Data Store (See Fig. 14.2) • Optional • Shared between operational and warehouse env’s • Source of detail data for warehouse • Typically updateable
Chapter 14 – Physical Design Types of Data Structures in the Warehouse • Detail Data • Summary Data • Multidimensional Data • Data Subsets • Specialized Data Caches • Replicated Data • Archived Data • Design choices depend on expected data usage, performance requirements. • Iterative process
Chapter 14 – Physical Design Detail Data • Most basic level of data • Subsets, summaries derived from detail • Should include only the data needed for decision support • Include the element of time • In theory, all queries & reports could be run against detail data but usually denormalized for performance. • Advantages of denormalization • Better performance • More accessible • Spreadsheet view of data (intuitive) • Downside: Implementation size
Chapter 14 – Physical Design Indexing for Performance • Warehouse typ. > #indexes than OLTP • Very beneficial to performance, but • Large # of indexes degrade load time • Place indexes on primary, foreign keys to start • Add others as needed & as feasible • Number of indexes varies with inversely with table size
Chapter 14 – Physical Design Partitioning Data • Can centralize on mainframe/server, or • May partition across multiple servers by workgroup, dept, subject area, application • Choice depends on expected pattern of use • Horizontal partitioning: departmental data on dept. servers • Vertical partitioning: implement columnar subset of normalized table across servers
Chapter 14 – Physical Design Parallel Processing • Key design issue • Appropriate database design to achieve potential for improved performance • Shared Resources • Disk • Memory • Advantages, Disadvantages • Choices based on expected use patterns
Chapter 14 – Physical Design Summary Levels • Summaries may offload detail processing • Separate tables or multidimensional D/B • Key design tradeoffs: • Minimize I/O • Provide access to right data @ right time • Reasonable cost • Recommend incremental/spiral approach to build critical summary levels first
Chapter 14 – Physical Design External Data • Example: government or industry-supplied demographic data • Individual queries may need to merge external, personal, and warehouse data • Document ext. data with metadata • Data quality may be questionable Data Replication • Can improve performance, availability in client/server environment • Reduce network traffic • Redundant availability if primary server down
Chapter 14 – Physical Design Data Placement Guidelines • Do • Place tables commonly joined on same server • Consider replicating tables across servers if joins will cause severe network traffic issues • Consider placing shared detail data on centralized source • Don’t • Place frequently joined tables on same device (e.g. same disk) • Place indexes on same devices as table on which they are built • All these point to reducing network traffic and device I/O.
Chapter 14 – Physical Design RAID Technology • Not a Dept. Of Defense Program!! • “Redundant Arrays of Inexpensive Disks” • Potential solution for Fault Tolerance • Techniques include: • Disk Mirroring (2 disks, same controller) • Disk Duplexing (2 disks, 2 controllers) • Parity/Error Detection and Correction • Disk Striping
Chapter 14 – Physical Design RAID Levels • RAID 0: Sector interleave, no error checking (no redundancy) • RAID 1: Mirroring (duplicate copy) • RAID 2: Bit interleave with error correction codes on multiple drives • RAID 3: Bit interleave with error correction on single drive • RAID 4: Sector interleave with dedicated parity drive • RAID 5: Sector interleave, parity stored on all drives
Chapter 14 – Physical Design Archived Data • Can’t store all the data all the time !! • Need to determine data retention needs • Issues with archiving and retrieval • What if data format changes? • What if we need archive data but no room in current table?
Chapter 14 – Physical Design OLAP and Multidimensional Data • Multidimensional Model • Data stored as facts and dimensions rather than rows and columns • A.K.A. “Cubic”, “Star” model • Major “Fact” tables, usually business related • Smaller “dimension” tables, descriptive data • Dimension table: perspectives on factual data • “by” criteria • Sales by salesperson vs. sales by region • Fact : quantitative or factual data • Usually a large denormalized table • Minimizes number of joins
Chapter 14 – Physical Design Designing Multidimensional Tables • Identify Requirements from analyst’s view (how will data be accessed?) • Identify “facts”, “dimensions” • Identify the roll-up (aggregation) levels • Design fact and dimension tables • Determine data retention, sizing requirements • Identify partitioning requirements • Validate the design • Modify as requirements changes
Chapter 14 – Physical Design Design Example • Figures 14.6 : Partial E-R Diagram
Chapter 14 – Physical Design Figure 14.7: Transformation of E-R Diagram into Multidimensional Model • Entities mapped into dimensions • Product • Location • Time • Entities that belong to more than one dimension become facts
Chapter 14 – Physical Design • Figure 14.10: Modified Star Schema • Note Level ID – tags NULL entries • e.g. regional level data -> STORE_ID = NULL • Includes attribute descriptions