Physical Design Patterns in Information Systems
400 likes | 543 Views
Karim Ali & Sarah Nadi CS848 – Spring 2010 July 14 th , 2010. Physical Design Patterns in Information Systems. Outline. Database lifecycle Elements of Physical Design in Traditional Databases Physical Design of Different Systems Memory Based Relational Database Systems (MMDB)
Physical Design Patterns in Information Systems
E N D
Presentation Transcript
Karim Ali & Sarah Nadi CS848 – Spring 2010 July 14th, 2010 Physical Design Patterns in Information Systems
Outline • Database lifecycle • Elements of Physical Design in Traditional Databases • Physical Design of Different Systems • Memory Based Relational Database Systems (MMDB) • XML Databases • Data Warehouses • Conclusions & Future Work Karim Ali & Sarah Nadi
Database Lifecycle Karim Ali & Sarah Nadi
Elements of Physical Design Traditional Disk Based Relational Databases Karim Ali & Sarah Nadi
Traditional database systems • Disk Based Relational Databases (RDBMS) • Data is stored on disk as relations (i.e. tables) • Data is organized based on a relational model • Elements of physical design: • Index Structures • Materialized Views • Partitioning • Clustering • Data compression, striping, mirroring & denormalization Karim Ali & Sarah Nadi
1. Index Structures • Data needs to be organized for quick searching • Need to minimize expensive I/O operations • Examples: • B+trees • Bitmap Index • Hashtable • ... Karim Ali & Sarah Nadi
1. Index Structures Cont’d • B+tree Karim Ali & Sarah Nadi
1. Index Structures Cont’d Karim Ali & Sarah Nadi
2. Materialized Views • Repeated complicated queries should not have to be executed every time • Saves execution time & I/O reads by pre-computing the results • Materialized views are stored on disk • Rewriting queries using materialized views speeds up execution Karim Ali & Sarah Nadi
3. Partitioning • Divides the data into related partitions • Horizontal Partitioning: divides tables into sets of rows according to a specific attribute (E.g. Date ranges) • Vertical Partitioning: divides tables into sets of columns • Reduces table scan time • Improves performance Karim Ali & Sarah Nadi
3. Partitioning Cont’d Types of Horizontal Partitioning: Karim Ali & Sarah Nadi
4. Clustering • Grouping related items together for • Efficiency of access • Resource utilization • Achieved on the page level on disk • Pros • Very useful for multidimensional queries (e.g. group by) • Reduced I/O operations • Reduced CPU cost • Cons • Difficult to define clustering keys, clustering scheme, and the granularity of clustering Karim Ali & Sarah Nadi
5. Other Methods • Data Compression: • Fitting more data into a fixed amount of space • Striping: • Distribute data that is accessed together across multiple disks • Mirroring: • Duplicating the data to multiple disks • Denormalization: • Refine global schema to reflect query and transaction requirements Karim Ali & Sarah Nadi
Physical Design of Different Information Systems Karim Ali & Sarah Nadi
1. Main Memory Databases (MMDB) Karim Ali & Sarah Nadi
MMDBOverview • Primary copy of data resides in main memory • Cheaper to access main memory • MMDB has better performance • Usually has an archived copy of the data in case of crashes Karim Ali & Sarah Nadi
MMDB1. Index Structures • Use: • Reduce overall computation time without using too much extra space • Factors to consider: • Read/Write operations are cheaper • Should be cache conscious • No need to store data in the index structure • Categories of indexes used: • B+trees • T Trees • Search Trees Karim Ali & Sarah Nadi
MMDB1. Index Structures Cont’d Karim Ali & Sarah Nadi
MMDB2. Materialized Views • Not beneficial to MMDB • Cost of computing complicated queries is much less • Maintenance costs will outweigh benefit Karim Ali & Sarah Nadi
MMDB3. Partitioning • Not necessary in main memory • Used for the secondary storage on disk to speed up reload • Horizontal partitioning • Vertical partitioning Karim Ali & Sarah Nadi
MMDB4. Clustering • Not applicable to MMDB • Sequential access in main memory is not cheaper than random or dispersed access Karim Ali & Sarah Nadi
2. XML Databases (XML DBs) Karim Ali & Sarah Nadi
XML DBsOverview • XML-enabled DBs (E.g. SQL Server, Oracle): • Maps XML documents to relational tables • Useful for data-centric XML documents • Native XML DBs (E.g. eXist, Sedna) • Useful for document-centric, semi-structured XML documents • Data structures store actual XML nodes with pointers between them Karim Ali & Sarah Nadi
XML DBs1. Index Structures • XML-enabled DB • Shred XML data in a relational table with columns: ORDPATH, tag, node type, value, path ID • Use a B+tree index based on combination of primary key of base table & the ORDPATH Karim Ali & Sarah Nadi
XML DBs1. Index Structures Cont’d • Native XML DB • Index for document collections • Index for nodes • Numbering schema for the XML nodes • B+tree (or variations) used on the numbered nodes Karim Ali & Sarah Nadi
XML DBs2. Materialized Views • Greatly enhances performance • XQuery and Xpath query results are materialized • Query rewriting is more tricky due to semi-structured nature & complicated querying languages Karim Ali & Sarah Nadi
XML DBs3. Partitioning • Horizontal partitioning • Based on node type • Through inlining Karim Ali & Sarah Nadi
XML DBs4. Clustering • Elements & sub-elements are clustered together • XML documents are clustered based on structural similarity Karim Ali & Sarah Nadi
3. Data Warehouses Karim Ali & Sarah Nadi
Data WarehousesOverview • Collection of data and decision support technologies • Used in: • Retail: user profiling • Finance: claims analysis, risk analysis, credit card analysis, and fraud detection • Healthcare: outcomes analysis Karim Ali & Sarah Nadi
Data WarehousesDesign Karim Ali & Sarah Nadi
Data WarehousesChallenges • Data is usually • Extremely large • Multi-dimensional • Priority for aggregated and summarized data • Ad-hoc and complex queries • Expensive operations: aggregation, and joins • The fact table in the star schema participates in every join Karim Ali & Sarah Nadi
Data Warehouses1. Index Structures Karim Ali & Sarah Nadi
Data Warehouses2. Materialized Views • To materialize or not? • Workload characteristics • Cost for incremental update • Storage requirements • Pros • Improves performance through fast lookups • Useful for rollup and drilldown operations • Cons • Not applicable to all queries (e.g. ad-hoc queries) Karim Ali & Sarah Nadi
Data Warehouses3. Partitioning • Horizontal & Vertical partitioning are used • Pros • Ability to manage larger databases • Enhances query performance over large tables • Enables parallel processing • Facilitates data compression • Cons • Complexity: managing partitions • Efficiency: number of partitions affects the performance of meta data operations (e.g. browsing the data cube definition) Karim Ali & Sarah Nadi
Data Warehouses4. Clustering • Data is clustered by nature Karim Ali & Sarah Nadi
Conclusions & Future Work Karim Ali & Sarah Nadi
Future Work • Physical design of industrial systems • Interplay between systems • Automating physical design • List of the open problems in physical design Karim Ali & Sarah Nadi
Thank you Karim Ali & Sarah Nadi