data warehousing with mysql by anand pandey l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Data warehousing with MySQL By Anand Pandey PowerPoint Presentation
Download Presentation
Data warehousing with MySQL By Anand Pandey

Loading in 2 Seconds...

play fullscreen
1 / 24

Data warehousing with MySQL By Anand Pandey - PowerPoint PPT Presentation


  • 473 Views
  • Uploaded on

Data warehousing with MySQL By Anand Pandey. MySQL. MS-SQL. Oracle. DB2. Flat Files. MySQL. Agenda. Introduction Free and Open Source Software Data Warehousing application Extraction, Transformation and Loading Partitioning and Storage Engine Configuration Parameters

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Data warehousing with MySQL By Anand Pandey' - Ava


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
data warehousing with mysql by anand pandey
Data warehousing with MySQLBy Anand Pandey

MySQL

MS-SQL

Oracle

DB2

Flat Files

MySQL

agenda
Agenda
  • Introduction
  • Free and Open Source Software
  • Data Warehousing application
  • Extraction, Transformation and Loading
  • Partitioning and Storage Engine
  • Configuration Parameters
  • Business Intelligence
  • Summary
  • Q & A
introduction
Introduction

MySQL AB develops and markets a family of high performance, affordable database servers and tools.

MySQL is a key part of LAMP (Linux, Apache, MySQL, PHP / Perl / Python), a fast growing open source enterprise software stack.

Anand Pandey, Senior Consultant, MySQL Inc.

Josh Chamas, Senior Consultant, MySQL Inc.

free and open source software

Freeand Open Source Software

MySQL is licensed under GPL.

The GPL is a Free and Open Source Software (FOSS) license that grants licensees many rights to the software under the condition that, if they choose to share the software, or software built with GPL-licensed software, they share it under the same liberal terms.

free and open source software5
Free and Open Source Software

Quid Pro Quo

MySQL has a dual license that works on a quid pro quo basis—i.e., if you're free, MySQL is free. If you're closed, you need a license.

free and open source software6

Freeand Open Source Software

Advantages of Open Source

MySQL has 5 million plus active installation base.

New releases immediately downloaded by users providing early feedback on bugs and features.

Access to source code

Write your own features/proprietary Storage Engine

Freedom !

data warehousing application

DataWarehousing application

Data Warehouse is a relational database.

It is designed for query and analysis rather than for transaction processing.

It enables an organization to consolidate data from several resources.

data warehousing application8
DataWarehousing application

Why DWH?

  • How to measure and manage your company's intangible assets?
  • How to leverage its data for competitive advantage ?
  • How to measure sales performance of previous year?
  • Which department produced the maximum profits in the current financial year?

SOLUTION: Create and Manage Data Warehouse.

data warehousing application10

A Typical Data Warehouse

Data

Source

Staging

Area

DWH

BI / OLTP

MySQL

Mining

Oracle

Meta

Data

Staging

Database

AWH

SWH

SWH

Analysis

MS-SQL

Flat

File

Reporting

DataWarehousing application
data warehousing application11
DataWarehousing application

DWH Design

  • Identification of important things (Entities), their properties (Attributes) and relationship among them (ER modeling ).
  • Summary data is more important than individual transactions (Physical and Logical Design).
  • Use tools for modeling like ERWin and many others.
data warehousing application12
DataWarehousing application

DWH Design

  • Most common schemas
    • Third Normal Form schema
    • Star schema
    • Snowflake schema
  • Most popular table structure
    • Fact Table
    • Dimensional tables
extraction transformation and loading

Transform

Extraction ,Transformation and Loading

Data

Source

Extract

Load

MERGE

& BULK

INSERT

MERGE

Tables

Storage

Indexes,

Memory

Views, Summary

Staging

Tables

Users

AWH

SWH

HEAP

Perfor-

mance

OLTP/ BI

extraction transformation and loading14
Extraction ,Transformation and Loading
  • Staging database
  • “LOAD DATA INFILE ….” Command.
  • Merging of SQLs
  • Segregating Informations
  • View enhancements
  • Index Enhancement
  • Memory Manipulation
extraction transformation and loading15
Extraction, Transformation and Loading

Staging Area and its benefits

Relational Table structures are flattened to support extract processes in Staging Area.

  • First data is loaded into the temporary table and then to the main DB tables.
  • Reduces the required space during ETL.
  • Data can be distributed to any number of data marts
partitioning and storage engine
Partitioning and Storage Engine

The MERGE Table

  • A collection of identical MyISAM tables used as one
  • You can use SELECT, DELETE, UPDATE, and INSERT on the collection of tables.
  • Use it when having large tables
  • DROP the MERGE table, you drop only the MERGE spec.
  • Advantage : manageability and performance

MERGESALES Table

Sales

for

Yr’04

Aug’04

Oct’04

Sep’04

partitioning and storage engine17
Partitioning and Storage Engine

MERGING based on month as Range

JUN2004

JUN2004

-

OCT2004

JUL2004

AUG2004

SEP2004

OCT2004

partitioning and storage engine18
Partitioning and Storage Engine

MERGE Table Example

mysql> CREATE TABLE jan04 ( -> a INT NOT NULL AUTO_INCREMENT PRIMARY KEY, -> message CHAR(20));

mysql> CREATE TABLE feb04 ( -> a INT NOT NULL AUTO_INCREMENT PRIMARY KEY, -> message CHAR(20));

mysql> CREATE TABLE year04 ( -> a INT NOT NULL AUTO_INCREMENT, -> message CHAR(20), INDEX(a)) -> TYPE=MERGE UNION=(jan04,feb04) INSERT_METHOD=LAST;

partitioning and storage engine19
Partitioning and Storage Engine

MyISAM Storage Engine

  • Supports MERGE table.
  • Support fulltext indexing
  • “INSERT DELAYED ...” option very useful when clients can't wait for the INSERT to complete. Many client bundled together and written in one block
  • Compress MyISAM tables with “myisampack” to take up much less space.
  • Benefit from higher performance on SELECT statements
partitioning and storage engine20
Partitioning and Storage Engine

Restrictions on MERGE tables

  • You can use only identical MyISAM tables for a MERGE table.
  • MERGE tables use more file descriptors. If 10 clients are using a MERGE table that maps to 10 tables, the server uses (10*10) + 10 file descriptors.
  • Key reads are slower. When you read a key, the MERGE storage engine needs to issue a read on all underlying tables to check which one most closely matches the given key.
partitioning and storage engine21
Partitioning and Storage Engine

my.cnf parameters for DWH (example)

  • key_buffer = 1G
  • myisam_sort_buffer_size = 256M
  • sort_buffer = 5M
  • query_cache_type = 1
  • query_cache_size = 100Mkey_buffer is the important one, this tells mysql how much memory to cap itself
business intelligence
Business Intelligence

Using MySQL database server

  • Drastically reduce information retrieval by distributing data into replicated clusters. This enables parallel processing.
  • Tighter storage format (3 TB squeezed to 1TB)
  • Aggregate huge amount of data and deliver reports for OLAP
  • Relieve overloaded OLTP databases
  • Availability, scalability and throughput for the most demanding applications, and of course affordability
summary
Summary
  • Free and Open Source under GPL
  • MyISAM Storage Engine
  • No Transactional Overhead
  • MERGE Table
  • Tighter storage format
  • Highly efficient