1 / 75

Multidimensional Databases

Multidimensional Databases. Prof. Navneet Goyal Computer Science & Information Systems Department BITS, Pilani. Database Evolution. Flat files Hierarchical and Network Relational Distributed Relational Multidimensional (MDDB). MDDB History.

mikasi
Download Presentation

Multidimensional Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multidimensional Databases Prof. Navneet Goyal Computer Science & Information Systems Department BITS, Pilani

  2. Database Evolution • Flat files • Hierarchical and Network • Relational • Distributed Relational • Multidimensional (MDDB)

  3. MDDB History • Multidimensional databases do not have their origin in DB technology but stem from multidimensional matrix algebra, which has been used for manual data analysis since the late 19th century

  4. MDDB History • Evolved from econometric research conducted at MIT in the 1960s, the MDDB has matured into the database engine of choice for data analysis applications

  5. MDDB History • During the late 1960s, IRI Software and Comshare independently began developing what later became MDDB systems • IRI Express, a popular tool for marketing analysis in the late 1970s & early 1980s, became a market-leading online analytical processing tool & was acquired by Oracle.

  6. MDDB History • Concurrently, the Comshare system developed into System W, which saw heavy use for financial planning, analysis, & reporting during the 1980s.

  7. MDDB History • In 1991, Arbor Software, now Hyperion Solutions, was formed withthe specific purpose of creating a multiuser, MDDB server, which resulted in the Essbase system. Arbor later licensed a basic version of Essbase to IBM for integration into DB2.

  8. MDDB History • In 1993, E.F. Codd coined the term OLAP1 • Another significant development in the early 1990s was the advent of data warehouses, which are typically based on relational star or snowflake schemas, an approach that uses relational database technology to handle multidimensional data 1http://dev.hyperion.com/resource_library/white_papers/

  9. Why Multi-Dimensional Databases? • No single "best" data structure for all applications within an enterprise • Organizations have abandoned the search for the HOLY GRAIL of globally accepted database • Select the most appropriate data structure on a case-by-case basis from a palette of standard database structures • Multidimensional Databases for OLAP?

  10. Why Multi-Dimensional Databases? • Inherent ability to integrate and analyze large volumes of enterprise data • Offers a good conceptual fit with the way end-users visualize business data • Most business people already think about their businesses in multidimensional terms • Managers tend to ask questions about product sales in different markets over specific time periods

  11. Why Multi-Dimensional Databases? • Multidimensional databases is a key technology in the enabling of interactive analyses of large amounts of data for decision-making purposes. • Multidimensional databases differ from previous technologies by viewing data specifically as multidimensional cubes, which have proven to be particularly well suited for data analyses.

  12. Multi-Dimensional Data Model • The multidimensional data model, has emerged that has taken over from the relational model when the objective is to analyze data, rather than to perform on-line transactions • Multidimensional models categorize data as being either facts with associated numerical measures, or as being dimensionsthat characterize the facts & are mostly textual.

  13. Multi-Dimensional Data Model 3Major Applications within data analysis: • Data Warehousing • OLAP • Data Mining

  14. Data Warehousing Information Operational data Knowledgeworkers Fig. taken from w w w . e f o r c e g l o b a l . c o m

  15. Operational Systems Data Warehouse Sales Customer Financial Geography Inventory Product Organized by processes or tasks Organized by subject A subject-Oriented Approach Fig. taken from w w w . e f o r c e g l o b a l . c o m

  16. Data Warehouse Process DATA SOURCES STAGING AREA DATA WAREHOUSE DECISION SUPPORT Application Databases ______________________________________________________ Reports Packaged application/ERP Data DATA MARTS INCOME ANNUAL REPORT ___ ___ ____ _____ ___ __ ___ ___ ____ _____ ___ __ ___ ___ ____ _____ ___ __ EXTRACTION TRANSFORMING CLEANING AGGREGATION DATA WAREHOUSE EIS Desktop Data OLAP External Data Statistical & Financial Analysis Web-based Data [Adapted from SunExpert Magazine, October 1998.]

  17. What is OLAP? • OLAP enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information • OLAP transforms raw data so that it reflects the real dimensionality of the enterprise as understood by the user

  18. What is OLAP? • OLAP is a decision support tool which enables the sophisticated analysis of an organization’s performance by providing access to views of data which characterize the multidimensional nature of the enterprise [Codd et at 1993]

  19. Data Mining • Discovery of useful summaries of data - Ullman • Extracting or “Mining” knowledge form large amounts of data • The efficient discovery of previously unknown patterns in large databases • Technology which predict future trends based on historical data • It helps businesses make proactive and knowledge-driven decisions • Data Mining vs. KDD • The name “Data Mining” a misnomer?

  20. Continuum of Analysis OLTP Specialized Algorithms SQL + Extensions SQL Data Mining DW/ OLAP Primitive & Canned Analysis Complex Ad-hoc Analysis Automated Analysis Role of MDDBs??

  21. SQL Extensions • Window Queries • Top N Queries • Rollup Operator • Cube

  22. Multidimensional Database • Spreadsheets – A 2D database? • Functionalities • What about a stack of similar spreadsheets for different times? • Limitations? We can not relate data in different sheets easily!

  23. Multidimensional Database An MDDB is a computer software system designed to allow for the efficient and convenient storage and retrieval system of large volumes of data that is • Intimately related & • Stored, viewed and analyzed form different perspectives These perspectives are called Dimensions

  24. A Motivating Example An automobile manufacturer wants to increase sale volumes by examining sales data collected throughout the organization. The evaluation would require viewing historical sales volume figures from multiple dimensions such as • Sales volume by model • Sales volume by color • Sales volume by dealer • Sales volume over time

  25. Relational Structure

  26. 6 5 4 M O D 3 5 5 E L 4 3 2 COLOR Multidimensional Array Structure Measurement Sales Volumes Dimension MiniVan Coupe Sedan Blue Red White Dimension Positions

  27. RDBMS vs. MDDB • Multidimensional array structure represents a higher level of organization than the relational table • Perspectives are embedded directly into the structure in the multidimensional model • All possible combinations of perspectives containing a specific attribute (the color BLUE, for example) line up along the dimension position for that attribute. • Perspectives are placed in fields in the relational model - tells us nothing about field contents.

  28. RDBMS vs. MDD • MDD makes data browsing and manipulation intuitive to the end-user • Any data manipulation action possible with a MDD is also possible using relational technology • Substantial cognitive advantages in query formulation • Substantial computational performance advantages in query processing when using MDD

  29. RDBMS vs. MDD

  30. Mini Van M Coupe O D Carr Sedan E Gleason L Clyde DEALERSHIP Blue Red White COLOR Mutlidimensional Representation Sales Volumes

  31. M O D E L DEALERSHIP COLOR Viewing Data - An Example Sales Volumes Assume that each dimension has 10 positions, as shown in the cube above

  32. Viewing Data - An Example • How many records would be there in a relational table? • Implications for viewing data from an end-user standpoint? SALES VOLUMES FOR ALL DEALERSHIPS MODEL COLOR DEALERSHIP VOLUME MINI VAN BLUE CLYDE 2 MINI VAN BLUE GLEASON 2 MINI VAN BLUE CARR 2 MINI VAN RED CLYDE 1 MINI VAN WHITE GLEASON 3 • • • RECORD NUMBER.... 998 RECORD NUMBER.... 999 RECORD NUMBER.... 1000

  33. Performance Advantages • Volume figure when car type = SEDAN, color=BLUE, & dealer=GLEASON? • RDBMS – all 1000 records might need to be searched to find the right record • MDDB has more ‘knowledge’ about where the data lies • Max. of 30 position searches!! • Average case 15 vs. 500

  34. Performance Advantages • Total Sales across all colors and dealers when model = SEDAN? • RDBMS – all 1000 records must be searched to get the answer • MDDB – Sum the contents of one 10x10 ‘slice’

  35. Performance Advantages • Data manipulation that requires a minute in RDBMS may require only a few seconds in MDDB • MDDBs are an order of magnitude faster than RDBMSs • Performance benefits are more for queries that generate cross-tab views of data • The performance advantages offered by multidimensional technology facilitates the development of interactive decision support applications like OLAP that can be impractical in a relational environment.

  36. RDBMS vs. MDDB • Any data manipulation action possible with a multidimensional database is also possible using relational technology • MDDBs however offer several advantages like: • Ease of data presentation and navigation • Ease of maintenance • Performance

  37. Ease of Data Presentation & Navigation • Intuitive spreadsheet like data views are natural output of MDBs • Obtaining the same views in a relational environment, requires either a complex SQL or a SQL generator against a RDB to convert the table outputs into a more intuitive format • Top N queries are not possible with SQL at all

  38. Ease of Maintenance • Ease of maintenance because data is stored as it is viewed • No additional overhead is required to translate user queries into requests for data • To provide same intuitiveness, RDBs use indexes and sophisticated joins which require significant maintenance and storage

  39. Performance • Performance of MDDBs can be matched by RDBs through database tuning • Not possible to tune the database for all possible adhoc queries • Tuning requires resources of an expensive DB specialist • Aggregate navigators are helping RDBs to catch up with MDBs as far as aggregation queries are concerned

  40. M Mini Van Mini Van Mini Van O D Coupe Coupe Coupe E L Carr Carr Carr Sedan Sedan Sedan Gleason Gleason Gleason DEALERSHIP Clyde Clyde Clyde Blue Red White Blue Red White Blue Red White COLOR COLOR COLOR JANUARY FEBRUARY MARCH Adding Dimension –An Example Sales Volumes

  41. When is MDD (In)appropriate? First, consider situation 1 PERSONNEL EMPLOYEE# LAST NAME EMPLOYEE AGE SMITH 01 21 REGAN 12 19 FOX 31 63 WELD 14 31 KELLY 54 27 LINK 03 56 KRANZ 41 45 LUCUS 33 41 WEISS 23 19

  42. When is MDD (In)appropriate? Now consider situation 2 1. Set up a MDD structure for situation 1, with LAST NAME & Employee# as dimensions, and AGE as the measurement. 2. Set up a MDD structure for situation 2, with MODEL & COLOR as dimensions, and SALES VOLUME as the measurement.

  43. EmployeeAge Smith 21 Regan 19 Sales Volumes Fox 63 L A 6 5 4 Mini Van Weld 31 M S O T D 3 5 5 Coupe Kelly 27 E N L A 4 3 2 Link M 56 Sedan E Red White Blue Kranz 45 COLOR Lucas 41 Weiss 19 31 41 23 01 14 54 03 12 33 EMPLOYEE # When is MDD (In)appropriate? MDD Structures for the Situations Note the sparse between the two MDD representations

  44. When is MDD (In)appropriate? • Our sales volume dataset has a great number of meaningful interrelationships • Interrelationships more meaningful than individual data elements themselves. • The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company. • Highly interrelated dataset types be placed in a multidimensional data structure for greatest ease of access and analysis

  45. When is MDD (In)appropriate? • No last name is matching with more than one emp # and no emp # is matching with more than one last name • In contrast, there is a sales figure associated with every combination of model and color resulting in a completed filled up 3x3 matrix • Performance suffers (RDB 9 vs. MDB 18)

  46. When is MDD (In)appropriate? • The relative performance advantages of storing multidimensional data in a multidimensional array increase as the size of the dataset increases • The relative performance disadvantages of storing non-multidimensional data in a multidimensional array increase as the size of the dataset increases. • NO inherent value of storing Non-multidimensional data (employee data) in multidimensional arrays

  47. When is MDD Appropriate? • The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company. • Most companies have limited time and resources to devote to analyzing data • It therefore becomes critical that these highly interrelated dataset types be placed in a multidimensional data structure for greatest ease of access and analysis.

  48. Queries • High degree of structure in MDB makes the query language very simple and efficient • Query language is intuitive • Output is immediately useful to end user

  49. Queries: Example • Display sales volume by model for each dealership PRINT TOTAL.(SALES_VOLUME KEEP MODEL DEALERSHIP) Trends emerge and comparisons are easily made DEALERSHIP MODELCLYDE GLEASON CARR MINI VAN 7 5 6 SPORTS COUPE 4 6 8 SEDAN 3 8 12

  50. Queries: Example • Corresponding SQL SELECT MODEL, DEALERSHIP, SUM(SALES_VOLUME) FROM SALES_VOLUME GROUP BY MODEL, DEALERSHIP ORDER BY MODEL, DEALERSHIP MODEL | DEALERSHIP | SUM(SALES_VOLUME) MINI VAN | CLYDE | 7 MINI VAN | GLEASON | 5 MINI VAN | CARR | 6 SPORTS COUPE | CLYDE | 4 SPORTS COUPE | GLEASON | 6 SPORTS COUPE| CARR | 8 SEDAN | CLYDE | 3 SEDAN | GLEASON | 8 SEDAN| CARR | 12

More Related