Building scalable olap applications with mondrian and mysql
Download
1 / 41

Building Scalable OLAP Applications with Mondrian and MySQL - PowerPoint PPT Presentation


  • 130 Views
  • Uploaded on

Building Scalable OLAP Applications with Mondrian and MySQL. Julian Hyde: Chief Architect, OLAP, at Pentaho and Mondrian Project Founder Tuesday, April 24th 2007. Agenda. Pentaho Introduction What is OLAP? Key features and architecture Getting started Schemas & queries

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Building Scalable OLAP Applications with Mondrian and MySQL' - allayna


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Building scalable olap applications with mondrian and mysql

Building Scalable OLAP Applications with Mondrian and MySQL

Julian Hyde: Chief Architect, OLAP, at Pentaho

and Mondrian Project Founder

Tuesday, April 24th 2007


Agenda
Agenda

  • Pentaho Introduction

  • What is OLAP?

  • Key features and architecture

  • Getting started

  • Schemas & queries

  • Tuning & scalability

  • Active Data Warehousing

  • Case Studies

  • Business Intelligence suite

  • Q & A


Pentaho introduction
Pentaho Introduction

  • World’s most popular enterprise open source BI Suite

    • 2 million lifetime downloads, averaging 100K / month

    • Founded in 2004: Pioneer in professional open source BI

  • Management - proven BI and open source veterans

    • from Business Objects, Cognos, Hyperion, JBoss, Oracle, Red Hat, SAS

  • Board of Directors – deep expertise and proven success in open source

    • Larry Augustin - founder, VA Software, helped coin the phrase “open source”

    • New Enterprise Associates – investors in SugarCRM, Xensource, others

    • Index Ventures – investors in MySQL, Zend, others

  • Widely recognized as the leader in open source BI

    • Distributed worldwide by Red Hat via the Red Hat Exchange

    • Embedded in next release of OpenOffice (40 million users worldwide)


What is olap
What is OLAP?

  • View data “dimensionally”

    • i.e. Sales by region, by channel, by time period

  • Navigate and explore

    • Ad Hoc analysis

    • “Drill-down” from year to quarter

    • Pivot

    • Select specific members for analysis

  • Interact with high performance

    • Technology optimized for rapid interactive response


A brief history of olap
A brief history of OLAP

2010

Pentaho

BEE

Open-source BI suites

JRubik

OpenI

JPivot

Palo

Open-source OLAP

Mondrian

MySQL

2000

Microsoft OLAP Services

Oracle partitions,bitmap indexes

RDBMS support for DW

Informix MetaCube

Codd’s “12Rules of OLAP”

Sybase IQ

Business Objects

1990

Desktop OLAP

MicroStrategy

Essbase

Cognos PowerPlay

Spreadsheets

1980

Comshare

Mainframe OLAP

Express

1970

APL



Key features
Key Features

  • On-Line Analytical Processing (OLAP) cubes

    • automated aggregation

    • speed-of-thought response times

  • Open Architecture

    • 100% Java

    • J2EE

    • Supports any JDBC data source

    • MDX and XML/A

  • Analysis Viewers

    • Enables ad-hoc, interactive data exploration

    • Ability to slice-and-dice, drill-down, and pivot

    • Provides insights into problems or successes


How mondrian extends mysql for olap applications

MySQL Provides

Data storage

SQL query execution

Heavy-duty sorting, correlation, aggregation

Integration point for all BI tools

Mondrian Provides

Dimensional view of data

MDX parsing

SQL generation

Caching

Higher-level calculations

Aggregate awareness

How Mondrian Extends MySQL for OLAP Applications


Open architecture

Microsoft Excel(via Spreadsheet Services)

Open Architecture

Viewers

  • Open Standards (Java, XML, MDX, XML/A, SQL)

  • Cross Platform (Windows & Unix/Linux)

  • J2EE Architecture

    • Server Clustering

    • Fault Tolerance

  • Data Sources

    • JDBC

    • JNDI

Web Server

JPivot servlet

J2EE Application Server

File or RDBMS Repository

JPivot servlet

XML/A servlet

Mondrian

Cube

Schema

XML

Cube

Schema

XML

Cube

Schema

XML

cube

cube

cube

JDBC

JDBC

JDBC

RDBMS

RDBMS



Getting started with mondrian
Getting started with Mondrian

  • Download mondrian from http://mondrian.pentaho.org and unzip

  • Copy mondrian.war to TOMCAT_HOME/webapps.

  • Create a database:

    $mysqladmin create foodmart$mysqlmysql>grant all privileges on *.* to 'foodmart'@'localhost' identified by 'foodmart';Query OK, 0 rows affected (0.00 sec)mysql>quitBye

  • Install demo dataset (300,000 rows):

    $ java -cp "/mondrian/lib/mondrian.jar:/mondrian/lib/log4j-1.2.9.jar:/mondrian/lib/eigenbase-xom.jar:/mondrian/lib/eigenbase-resgen.jar:/mondrian/lib/eigenbase-properties.jar:/usr/local/mysql/mysql-connector-java-3.1.12-bin.jar"     mondrian.test.loader.MondrianFoodMartLoader     -verbose -tables -data -indexes     -jdbcDrivers=com.mysql.jdbc.Driver     -inputFile=/mondrian/demo/FoodMartCreateData.sql      -outputJdbcURL= "jdbc:mysql://localhost/foodmart?user=foodmart&password=foodmart“

  • Start tomcat, and hit http://localhost:8080/mondrian




A mondrian schema consists of
A Mondrian schema consists of…

A dimensional model (logical)

  • Cubes & virtual cubes

  • Shared & private dimensions

  • Calculated measures in cube and in query language

  • Parent-child hierarchies

    … mapped onto a star/snowflake schema (physical)

  • Fact table

  • Dimension tables

  • Joined by foreign key relationships


Writing a mondrian schema
Writing a Mondrian Schema

  • Regular cubes, dimensions, hierarchies

  • Shared dimensions

  • Virtual cubes

  • Parent-child hierarchies

  • Custom readers

  • Access-control

<!-- Shared dimensions -->

<Dimension name="Region">

<Hierarchy hasAll="true“ allMemberName="All Regions">

<Table name="QUADRANT_ACTUALS"/>

<Level name="Region" column="REGION“ uniqueMembers="true"/>

</Hierarchy>

</Dimension>

<Dimension name="Department">

<Hierarchy hasAll="true“ allMemberName="All Departments">

<Table name="QUADRANT_ACTUALS"/>

<Level name="Department“ column="DEPARTMENT“ uniqueMembers="true"/>

</Hierarchy>

</Dimension>

  • (Refer to http://mondrian.pentaho.org/documentation/schema.php )


Tools
Tools

  • Schema Workbench

  • Pentaho cube designer

  • cmdrunner


Mdx multi dimensional expressions
MDX – Multi-Dimensional Expressions

  • A language for multidimensional queries

  • Plays the same role in Mondrian’s API as SQL does in JDBC

  • SQL-like syntax

  • … but un-SQL-like semantics

SELECT {[Measures].[Unit Sales]} ON COLUMNS, {[Store].[USA], [Store].[USA].[CA]} ON ROWS

FROM [Sales]

WHERE [Time].[1997].[Q1]

  • (Refer to http://mondrian.pentaho.org/documentation/mdx.php )



Tuning is a process
Tuning is a Process

Some techniques:

  • Test performance on a realistic data set!

  • Use tracing to identify long-running SQL statements

    • mondrian.trace.level=2

  • Tune SQL queries

  • Run the same query several times, to examine cache effectiveness

  • When Mondrian starts, prime the cache by running queries

  • Get to know Mondrian’s system properties


Tuning mysql for mondrian
Tuning MySQL for Mondrian

  • Fact table:

    • Index foreign keys

    • Try indexing multiple combinations of foreign keys

    • Use MyISAM or MERGE (also known as MRG_MyISAM)

    • Partitioned tables

  • Dimension tables:

    • Index primary and foreign keys

    • Use MyISAM for large dimension tables, MEMORY for smaller dimension tables

  • Create aggregate tables:

    • Contain pre-computed summaries

    • Prevent full table scan followed by massive GROUP BY

  • Parent-child hierarchies

    • Create closure tables


Scaling to large datasets
Scaling to large datasets

  • Tune MySQL

    • Partitions allow parallelism

  • Aggregate tables allow you to do the hard work to load-time, not runtime

  • Increase cache size


Scaling to large numbers of concurrent users

Viewers

Viewers

Viewers

Web Server

Web Server

JPivot servlet

JPivot servlet

Web Server

Web Server

Web Server

Web Server

Web Server

Web Server

Web Server

Web Server

JPivot servlet

JPivot servlet

JPivot servlet

JPivot servlet

JPivot servlet

JPivot servlet

JPivot servlet

JPivot servlet

J2EE Application Server

J2EE Application Server

J2EE Application Server

J2EE Application Server

XML/A servlet

JPivot servlet

J2EE Application Server

J2EE Application Server

XML/A servlet

XML/A servlet

Mondrian

Mondrian

XML/A servlet

XML/A servlet

cube

cube

Mondrian

Mondrian

Mondrian

Mondrian

cube

cube

cube

cube

JDBC

JDBC

JDBC

JDBC

JDBC

JDBC

RDBMS

RDBMS

RDBMS

RDBMS

RDBMS

Scaling to large numbers of concurrent users

  • Tune MySQL

  • Increase JVM memory

  • Make sure that connections are using the same cache

  • Multiple web servers

  • Multiple mondrian instances

  • Multiple DBMS instances with read-only copies of the data



Active data warehousing1
Active Data Warehousing

An Active Data Warehouse:

  • Extends the traditional DW to support operational processing

  • Provides a single view of the business

  • Is updated in near real-time

    Cost/benefit:

  • High cost (technical challenges)

  • High reward (time is money)


Technical challenges of active data warehousing
Technical challenges of Active Data Warehousing

  • DBMS has mixed work-load

    • Short, fast queries to query specific records

    • Longer queries to look at changing patterns

      • Emphasis on recent data

    • Background DML to keep database up to date

  • Continuous, incremental ETL

    • Continuous or near real-time

    • Incremental

    • Consider message-driven ETL

  • OLAP cache

    • OLAP servers use cache and/or MOLAP database for performance

    • Cache consistency


Mondrian and active data warehouse
Mondrian and Active Data Warehouse

  • “ROLAP with caching”

    • No MOLAP data store to maintain

  • Disable aggregate tables

    • Unless your ETL process can maintain these incrementally

  • Either turn off cache

    • mondrian.rolap.star.disableCaching=true

    • “Pure ROLAP” mode

    • Results are cached for the lifetime of a single MDX statement

  • Or use new CacheControl API to selectively flush mondrian’s cache


Cache control
Cache control

  • In an Active Warehouse, the most recent data are modified much more often than historic data

  • Cache control API allows the application to selectively flush the cache

  • (Refer to http://mondrian.pentaho.org/documentation/cache_control.php )


Cache control a simple example
Cache control: A simple example

  • Let’s run a simple MDX query:

SELECT{[Time].[1997], [Time].[1997].[Q1], [Time].[1997].[Q2]} ON COLUMNS,{[Customers].[USA], [Customers].[USA].[OR], [Customers].[USA].[WA]} ON ROWSFROM [Sales]


Cache contents after running query
Cache contents after running query

  • year=1997, nation=USA, measure=unit_sales

  • year=1997, nation=USA, state={OR, WA}, measure=unit_sales

  • year=1997, quarter=any, nation=USA, measure=unit_sales

  • year=1997, quarter=*, nation=USA,state={OR, WA}, measure=unit_sales


Flushing part of a segment
Flushing part of a segment

Cachemanager

  • year=1997, quarter=*, nation=USA,state={OR, WA}, measure=unit_sales

  • year=1997, quarter=*, nation=USA,state={OR, WA}, measure=unit_sales;exclude: state=WA and quarter=Q2

  • year=1997, quarter=*, nation=USA,state={CA, WA}, measure=unit_sales


Code using cachecontrol to flush part of a segment
Code using CacheControl to flush part of a segment

Connection connection;

CacheControl cacheControl = connection.getCacheControl(null);

Cube salesCube = connection.getSchema().lookupCube("Sales", true);

SchemaReader schemaReader = salesCube.getSchemaReader(null);

// Create region containing [Customer].[USA].[OR].

Member memberOregon = schemaReader.getMemberByUniqueName(

new String[]{“Customer", “USA", “OR"}, true);

CacheControl.CellRegion regionOregon =

cacheControl.createMemberRegion(memberOregon, true);

// Create region containing [Customer].[USA].[OR].

Member memberTimeQ2 = schemaReader.getMemberByUniqueName(

new String[]{“Time", “1997", “Q2"}, true);

CacheControl.CellRegion regionTimeQ2 =

cacheControl.createMemberRegion(memberTimeQ2, true);

// Create region containing ([Customer].[USA].[OR], [Time].[1997].[Q2])

CacheControl.CellRegion regionOregonQ2 =

cacheControl.createCrossjoinRegion(regionOregon, regionTimeQ2);

cacheControl.flush(regionOregonQ2);



Case study frontier airlines
Case Study: Frontier Airlines

Frontier Airlines

Key Challenges

  • Understanding and optimizing fares to ensure

    • Maximum occupancy (no empty seats)

    • Maximum profitability (revenue per seat)

      Pentaho Solution

  • Pentaho Analysis (Mondrian)

  • Chose Open Source RDBMS and Mondrian over Oracle

  • 500 GB of data, 6 server cluster

    Results

  • Comprehensive, integrated analysis to set strategic pricing

  • Improved per-seat profitability (amount not disclosed)

    Why Pentaho

  • Rich analytical and MDX functionality

  • Cost of ownership

“The competition is intense in the airline industry and Frontier is committed to staying ahead of the curve by leveraging technology that will help us offer the best prices and the best flight experience…. [the application] fits right in with our philosophy of providing world-class performance at a low price.”


Case study u s based public software company
Case Study: U.S.-based, Public Software Company

OEM Example

Key Requirements

  • Embedding analysis within a pricing planning application

    Incumbent Vendor

  • MicroStrategy

    Decision Criteria

  • Embeddability

  • Extensibility

  • Cost of ownership

    Best Practices

  • Evaluate replacement costs holistically

  • Treat migrations as an opportunity to improve a deployment, not just move it

  • Understand and prioritize requirements – functionality, usability, architecture, etc.

“The old solution (MicroStrategy) was built to be a standalone application. It expected to own the browser session, to have separate security, and its own user interface standards. We were able to make Mondrian a seamless part of the application, and it was much easier to do so.”



Business intelligence suite
Business Intelligence Suite

  • Mondrian OLAP

  • Analysis tools:

    • Pivot table

    • Charting

    • Dashboards

  • ETL (extract/transform/load)

  • Integration with operational reporting

  • Integration with data mining

  • Actions on operational data

  • Design/tuning tools


Pentaho open source bi offerings
Pentaho Open Source BI Offerings

All available in a Free Open Source license


A sample of joint mysql pentaho users
A Sample of Joint MySQL-Pentaho Users

“Pentaho provided a robust, open source platform for our sales reporting application, and the ongoing support we needed. The experts at OpenBI provided outstanding services and training, and allowed us to deploy and start generating results very quickly.”

“We selected Pentaho for its ease-of-use. Pentaho addressed many of our requirements -- from reporting and analysis to dashboards, OLAP and ETL, and offered our business users the Excel-based access that they wanted.”


Next steps and resources
Next Steps and Resources

  • Contact Information

    • Julian Hyde, Chief Architect, [email protected]

  • More information http://www.pentaho.org and http://mondrian.pentaho.org

  • Pentaho Community Forum http://community.pentaho.org

    • Go to Developer Zone

    • Discussions

  • Pentaho BI Platform including Mondrian http://www.pentaho.org/download/latest

  • Mondrian OLAP Library only

    http://sourceforge.net/project/showfiles.php?group_id=35302


Birds of a feather session mysql data warehousing and bi
Birds of a Feather session:MySQL Data Warehousing and BI

Speakers:

  • James Dixon, Senior Architect and Chief Technology (a.k.a. "Chief Geek"), Pentaho Corporation

  • Roland Bouman, Certification Developer, MySQL A.B.

  • Matt Casters, Data Integration Architect and Kettle Project Founder, Pentaho

  • Julian Hyde, OLAP Architect and Mondrian Project Founder, Pentaho

  • Brian Miezejewski, Principal Consultant, MySQL inc.

    Track: Data Warehousing and Business Intelligence

    Date: TONIGHT!!

    Time: 7:30pm - 9:00pm

    Location: Camino Real



ad