1 / 44

Physical Design Patterns in Information Systems

Karim Ali & Sarah Nadi CS848 – Spring 2010 July 14 th , 2010. Physical Design Patterns in Information Systems. Outline. Database lifecycle Elements of Physical Design in Traditional Databases Physical Design of Different Systems Memory Based Relational Database Systems (MMDB)

illias
Download Presentation

Physical Design Patterns in Information Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Karim Ali & Sarah Nadi CS848 – Spring 2010 July 14th, 2010 Physical Design Patterns in Information Systems

  2. Outline • Database lifecycle • Elements of Physical Design in Traditional Databases • Physical Design of Different Systems • Memory Based Relational Database Systems (MMDB) • XML Databases • Data Warehouses • Conclusions & Future Work Karim Ali & Sarah Nadi

  3. Database Lifecycle Karim Ali & Sarah Nadi

  4. Elements of Physical Design Traditional Disk Based Relational Databases Karim Ali & Sarah Nadi

  5. Traditional database systems • Disk Based Relational Databases (DRDB) • Data is stored on disk as relations (i.e. tables) • Data is organized based on a relational model • Elements of physical design: • Index Structures • Materialized Views • Partitioning • Clustering • Data compression, striping, mirroring & denormalization Karim Ali & Sarah Nadi

  6. 1. Index Structures • Data needs to be organized for quick searching • Need to minimize expensive I/O operations • Examples: • B+trees • Bitmap Index • Hashtable • ... Karim Ali & Sarah Nadi

  7. 1. Index Structures Cont’d • B+tree Karim Ali & Sarah Nadi

  8. 1. Index Structures Cont’d Karim Ali & Sarah Nadi

  9. 2. Materialized Views • Repeated complicated queries should not have to be executed every time • Saves execution time & I/O reads by pre-computing the results • Materialized views are stored on disk • Rewriting queries using materialized views speeds up execution Karim Ali & Sarah Nadi

  10. 3. Partitioning • Divides the data into related partitions • Horizontal Partitioning: divides tables into sets of rows according to a specific attribute (E.g. Date ranges) • Single Vertical Partitioning: divides data into groups by attributes of the same type • Reduces table scan time • Improves performance Karim Ali & Sarah Nadi

  11. 3. Partitioning Cont’d Types of Horizontal Partitioning: Karim Ali & Sarah Nadi

  12. 4. Clustering • Grouping related items together for • Efficiency of access • Resource utilization • Achieved on the page level on disk • Pros • Very useful for multidimensional queries (e.g. group by) • Reduced I/O operations • Reduce CPU cost • Cons • Difficult to define clustering keys, clustering scheme, and the granularity of clustering Karim Ali & Sarah Nadi

  13. 5. Other Methods • Data Compression: • Fitting more data into a fixed amount of space • Striping: • Distribute data that is accessed together across multiple disks • Mirroring: • Duplicating the data to multiple disks • Denormalization: • Refine global schema to reflect query and transaction requirements Karim Ali & Sarah Nadi

  14. Physical Design of Different Information Systems Karim Ali & Sarah Nadi

  15. 1. Main Memory Databases (MMDB) Karim Ali & Sarah Nadi

  16. MMDBOverview • Primary copy of data resides in main memory • Cheaper to access main memory • MMDB has better performance • Usually has an archived copy of the data in case of crashes Karim Ali & Sarah Nadi

  17. MMDB1. Index Structures • Use: • Reduce overall computation time without using too much extra space • Factors to consider: • I/O operations are cheaper • Should be cache conscious • No need to store data in the index structure • Categories of indexes used: • B+trees • T Trees • Search Trees Karim Ali & Sarah Nadi

  18. MMDB1. Index Structures Cont’d Karim Ali & Sarah Nadi

  19. MMDB2. Materialized Views • Not beneficial to MMDB • Cost of computing complicated queries is much less • Maintenance costs will outweigh benefit Karim Ali & Sarah Nadi

  20. MMDB3. Partitioning • Not necessary in main memory • Used for the secondary storage on disk to speed up reload • Horizontal partitioning • Vertical partitioning Karim Ali & Sarah Nadi

  21. MMDB4. Clustering • Not applicable to MMDB • Sequential access in main memory is not cheaper than random or dispersed access Karim Ali & Sarah Nadi

  22. 2. XML Databases (XML DBs) Karim Ali & Sarah Nadi

  23. XML DBsOverview • Store, organize & query XML documents • XML-enabled DBs: • Maps XML documents to relational tables • Native XML DBs: • Data structures store actual XML • Semi-structured data is harder to handle Karim Ali & Sarah Nadi

  24. XML DBs1. Index Structures • XML-enabled DB (e.g. MS SQL Server 2005) • Shred XML data in a relational table with columns: ORDPATH, tag, node type, value, path ID • Use a B+ Tree index based on combination of primary key of base table & the ORDPATH Karim Ali & Sarah Nadi

  25. XML DBs1. Index Structures Cont’d • Native XML DB (E.g. eXist, TIMBER) • Numbering schema for the XML nodes • B+ Tree used on the numbered nodes Karim Ali & Sarah Nadi

  26. XML DB2. Materialized Views • Greatly enhances performance • XQuery and Xpath query results are materialized • Query rewriting is more tricky due to semi-structured nature & complicated querying languages Karim Ali & Sarah Nadi

  27. XML DB3. Partitioning • Horizontal partitioning • Based on node type • Through inlining Karim Ali & Sarah Nadi

  28. XML DBs4. Clustering • Elements & sub-elements are clustered together • XML documents are clustered based on structural similarity Karim Ali & Sarah Nadi

  29. 3. Data Warehouses Karim Ali & Sarah Nadi

  30. Data WarehousesOverview • Collection of data and decision support technologies • Used in: • Retail: user profiling • Finance: claims analysis, risk analysis, credit card analysis, and fraud detection • Healthcare: outcomes analysis Karim Ali & Sarah Nadi

  31. Data WarehousesDesign Karim Ali & Sarah Nadi

  32. Data WarehousesDesign Cont’d Karim Ali & Sarah Nadi

  33. Data WarehousesDesign Cont’d • Star Schema Karim Ali & Sarah Nadi

  34. Data WarehousesChallenges • Data is usually • Extremely large • Multi-dimensional • Priority for aggregated and summarized data • Ad-hoc and complex queries • Expensive operations: aggregation, and joins • The fact table participates in every join Karim Ali & Sarah Nadi

  35. Data Warehouses1. Index Structures Karim Ali & Sarah Nadi

  36. Data Warehouses2. Materialized Views • To materialize or not? • Workload characteristics • Cost for incremental update • Storage requirements • Pros • Behaves like an index • Improves performance through fast lookups • Useful for rollup and drilldown operations • Cons • Not applicable to all queries (e.g. ad-hoc queries) Karim Ali & Sarah Nadi

  37. Data Warehouses3. Partitioning • Dividing database objects into smaller more manageable pieces • Horizontal & Vertical partitioning are used • Pros • Ability to manage larger databases • Enhances query performance over large tables • Enables parallel processing • Facilitates data compression • Cons • Complexity: managing partitions • Efficiency: number of partitions affects the performance of meta data operations (e.g. browsing the data cube definition) • Might affect data refresh operations Karim Ali & Sarah Nadi

  38. Data Warehouses4. Clustering • Data is clustered by nature Karim Ali & Sarah Nadi

  39. Summary Karim Ali & Sarah Nadi

  40. Karim Ali & Sarah Nadi

  41. Conclusions & Future Work Karim Ali & Sarah Nadi

  42. Conclusions • Physical design patterns in traditional relational databases can be adapted to different systems • B+trees are the most popular index structure & have been adapted for the diff. Systems • Materialized views, partitioning & clustering are very important for performance except for MMDBs Karim Ali & Sarah Nadi

  43. Future Work • Physical design of industrial systems • Interplay between systems • Automating physical design • List of the open problems in physical design Karim Ali & Sarah Nadi

  44. Thank you Karim Ali & Sarah Nadi

More Related