Data Warehouses and Analytical Data Processing
This presentation is the property of its rightful owner.
Sponsored Links
1 / 54

Data Warehouses and Analytical Data Processing in CERN’s Administrative Decision Making Support Systems PowerPoint PPT Presentation


  • 84 Views
  • Uploaded on
  • Presentation posted in: General

Data Warehouses and Analytical Data Processing in CERN’s Administrative Decision Making Support Systems. Jan Janke Software Engineer CERN / GS-AIS. October 25 - 29, 2010 JINR/CERN Grid and Management Information Systems. Agenda. Data Warehouses in Administrative Computing

Download Presentation

Data Warehouses and Analytical Data Processing in CERN’s Administrative Decision Making Support Systems

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Data warehouses and analytical data processing in cern s administrative decision making support systems

Data Warehouses and Analytical Data Processingin CERN’s Administrative Decision Making Support Systems

Jan JankeSoftware EngineerCERN / GS-AIS

October 25 - 29, 2010

JINR/CERN Grid and Management Information Systems


Agenda

Agenda

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • Data Warehouses in Administrative Computing

  • Recap: Data Warehouses Theory

  • Data Warehouses and Information Systems in AIS

    • Foundation, HR and FI Information Systems

    • Complex Data Extraction Processes

    • Pixel-Perfect Reporting

    • Dashboards

  • Detailed Data Warehouse Example

    • Management Data Layer (MDL)


Agenda1

Agenda

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • Data Warehouses in Administrative Computing

  • Recap: Data Warehouses Theory

  • Data Warehouses and Information Systems in AIS

    • Foundation, HR and FI Information Systems

    • Complex Data Extraction Processes

    • Pixel-Perfect Reporting

    • Dashboards

  • Detailed Data Warehouse Example

    • Management Data Layer (MDL)


Ca 16 000 people

Ca. 16,000 People

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Mankind s largest machine

Mankind’s Largest Machine

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Enormous amount of data

Enormous Amount of Data

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Administrative computing

Administrative Computing

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

Provides means to administrate CERN

Enables physicists to focus on their work

Allows management to make the right moves


Why data warehouses

Why Data Warehouses?

  • Support administrative staff

  • Enforce security and safety on site

  • Allow management to make decisions

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

Heterogeneous computing landscape

Various specialised OLTP systems

Planning needs

Legal Requirements


Example keep finances under control

Example: Keep Finances Under Control

Specialised small user groups

Distinct databases

Systems only accessible to authorised specialists

High availability

and performance,

real-time data

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • Specialised Systems

    • Accounting, ERP for CERN stores

    • External contracts management

    • Payroll, treasury management, …


Example keep finances under control1

Example: Keep Finances Under Control

Specialised small user groups

Distinct databases

Systems only accessible to authorised specialists

High availability

and performance,

real-time data

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • General Financial Information System

    • Single system

    • Access to data from multiple sources

    • Different levels of complexity


Example keep finances under control2

Example: Keep Finances Under Control

Users from all areas of CERN

Single data warehouse

Security is extremely important! System is accessible CERN wide.

High availability

and performance,

but no necessity for real-time data

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • General Financial Information System

    • Single system

    • Access to data from multiple sources

    • Different levels of complexity


Ais financial data warehouse

AIS’ Financial Data Warehouse

  • Technologies:

  • ORACLE RAC database

  • Java Enterprise web applications

  • In-house developed frameworks

  • Third-party BI and reporting tools

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

Keep data in sync with data providers

Master complex data extraction process

Ensure high query performance

Base for detailed data analysis


Agenda2

Agenda

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • Data Warehouses in Administrative Computing

  • Recap: Data Warehouses Theory

  • Data Warehouses and Information Systems in AIS

    • Foundation, HR and FI Information Systems

    • Complex Data Extraction Processes

    • Pixel-Perfect Reporting

    • Dashboards

  • Detailed Data Warehouse Example

    • Management Data Layer (MDL)


Find the needle in the hay

Find the Needle in the Hay …

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Oltp vs olap

OLTP vs OLAP

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Oltp vs olap1

OLTP vs OLAP

That’s theory!Real world is not that easy…

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Normalisation codd boyce

Normalisation (Codd/Boyce)

Not in 3NF, why ?

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • 1NF

    • 1 table = 1 relation, no repeating groups or duplicate rows

  • 2NF

    • All non prime attributes depend on all parts (attributes) of a composite key

  • 3NF

    • All non prime attributes depend only on the (whole) key


Star schema

Star Schema

item

time

item_key

item_name

brand

type

supplier_type

time_key

dayday_of_the_week

month

quarter

year

Sales Fact Table

time_key

item_key

branch_key

location

location_key

Branch

location_key

Street

city

state_or_province

country

units_sold

branch_key

branch_name

branch_type

dollars_sold

avg_sales

Measures

Source: http://www.executionmih.com/data-warehouse/star-snowflake-schema.php (16/10/2010)

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Snowflake schema

SnowflakeSchema

item

time

supplier

item_key

item_name

brand

type

supplier_key

time_key

dayday_of_the_week

month

quarter

year

Sales Fact Table

supplier_key

Supplier_type

time_key

item_key

branch_key

location

location_key

Branch

location_key

street

city_key

units_sold

branch_key

branch_name

branch_type

dollars_sold

city

avg_sales

city_key

city

state_or_province

country

Measures

Source: http://www.executionmih.com/data-warehouse/star-snowflake-schema.php (16/10/2010)

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


From operations to reporting

From Operations to Reporting

FI

ERP

HR

Source: http://www.deakin.edu.au/ddw/what-is.php (16/10/2010)

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Analysis

Analysis

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • Data Mining

  • Drilldown

    • Finer detail granularity (e.g. add a group-by column)

  • Slice & dice

    • Play with the dimensions

      • Combine different dimensions

      • Remove/add a dimension

      • Analyse fact changes


Agenda3

Agenda

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • Data Warehouses in Administrative Computing

  • Recap: Data Warehouses Theory

  • Data Warehouses and Information Systems in AIS

    • Foundation, HR and FI Information Systems

    • Complex Data Extraction Processes

    • Pixel-Perfect Reporting

    • Dashboards

  • Detailed Data Warehouse Example

    • Management Data Layer (MDL)


Cern ais business map

CERN/AIS Business Map

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Foundation

Foundation

Operative systems

HR Information System (HRT)

FI Information System (CET)

… more domain specific information systems

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

Common data layer for various AIS services

Data interfaces for other CERN services

Common applications (e.g. mgmt. of roles)


Various specialised systems

Various Specialised Systems

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

ORACLE HR

CERN Training Application

Safety & access systems

EDH (Electronic Document Handling)

Accounting Application

ERP system for CERN stores

Contract follow-up


Technical environment

Technical Environment

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • Source databases:

    • ORACLE 10g

    • Microsoft Excel

  • HR/FI Information Systems:

    • ORACLE 10g

    • Java Enterprise web applications

    • SAP Business Objects tool family


Data extractions

Data Extractions

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • Nightly scheduled batch jobs

  • Extractions organised in SQL scripts

  • Run by self-developed “batch runner”

    • Controls

      • Order of execution (sequential, parallel)

      • Criticality

      • Logging

      • Problem escalation (automatic emails)


Definition of extraction process 1

Definition of Extraction Process (1)

General definitions

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Definition of extraction process 2

Definition of Extraction Process (2)

Batches & commands

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Importance of monitoring

Importance of Monitoring

New hardware for DEV databases (gain > 1h)

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Turtle or leopard

Turtle or Leopard ?

The difference may be subtle …

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Oracle materialised views

ORACLE Materialised Views

Source: ORACLE 10g Documentation / Data Warehousing Guide

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

Pre-aggregated summaries

Benefit from query rewrite


Materialised summary views

Materialised (Summary) Views

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

Don’t use remote tables if you need query rewrite

Create materialized view log on all source tables


Snapshots

Snapshots

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • Use snapshots to efficiently access remote tables

    • Syntax: CREATE SNAPSHOT … AS [Your Query]

    • Refresh options:

      • FAST

      • COMPLETE

      • FORCE


Pipelined functions

Pipelined Functions

1

CREATE OR REPLACE TYPE myTableFormat

AS OBJECT(

col_a NUMBER,

col_b DATE,

col_c VARCHAR2(25) )

/

CREATE OR REPLACE TYPE myTableType

AS TABLE OF myTableFormat

/

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

PL/SQL is data source instead of a table

May increase performance in environments with heavy PL/SQL use


Pipelined functions1

Pipelined Functions

2

CREATE OR REPLACE FUNCTION myFunc

RETURN myTableType PIPELINED IS


BEGIN

FOR i in 1 .. 5
   

LOOP

PIPE ROW ( myTableFormat(

i, SYSDATE+i, 'Row '||i ) );

END LOOP;
   

RETURN;
 

END;


END;


/

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Pipelined functions2

Pipelined Functions

3

SELECT * FROM TABLE( myFunc() );col_a col_b col_c

--------- ---------- ----------

1 27/10/2010 Row 1

2 28/10/2010 Row 2

3 29/10/2010 Row 3

4 30/10/2010 Row 4

5 31/10/2010 Row 5

Use a pipelined function if you require a data source other than a table!

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Database design

Database Design

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • Star schema like

  • Highly de-normalised incl. duplication of data

  • Use single-attribute keys wherever possible

  • Performance matters!

    • Be careful when extracting over database links

    • Certain tables from operational systems are copied

    • Deletion & recreation of indexes

    • Use partitions

    • Manual control of statistics collection

    • Optimizing execution plans very time-consuming


Reporting application framework

Reporting Application Framework

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

Column and ordering selection

Sub reports

Various output formats (e.g. HTML, PDF)

Charts

Self-service reporting

Automated scheduled report execution

Row and column based access control


Data access

Data access

Which data (columns) am I allowed to see? As a supervisor I may not be entitled to see the health insurance category. A safety or medical officer may not see the salary, etc.

Which rows are visible to me? Unit leader of B only sees persons from Unit B.

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


User interface

User Interface

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Pixel perfect forms

Pixel Perfect Forms

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • Use of Apache FOP library

    • Examples:

      • Employment & training attestations

      • Swiss / French card application forms

  • Business Objects XI Enterprise

    • Direct use

    • Indirect use via Business Objects Java SDK

    • Examples:

      • Salary slips

      • Car stickers

      • Work orders


Business objects

Business Objects

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • Commercial tool family from SAP

  • Advantages

    • Rich reporting possibilities (interactive or via SDK)

    • Appealing dashboards using Xcelsius

    • Only a few users need the knowledge to design reports

  • Drawbacks

    • Two-way data storage (file system & database)

    • Sometimes stability problems

    • Time-intensive administration and maintenance

    • Expensive


Management dashboards

Management Dashboards

Designed locally using MS Office and Xcelsius.

Data comes from the MDL data warehouse.

Published as Flash to the BO Server.

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Agenda4

Agenda

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • Data Warehouses in Administrative Computing

  • Recap: Data Warehouses Theory

  • Data Warehouses and Information Systems in AIS

    • Foundation, HR and FI Information Systems

    • Complex Data Extraction Processes

    • Pixel-Perfect Reporting

    • Dashboards

  • Detailed Data Warehouse Example

    • Management Data Layer (MDL)


Management data layer mdl

Management Data Layer (MDL)

Performance: Currently ca. 170 GB data in two tables

Generality: Different forms of data sources, new sources are added and removed all the time.

Integration with existing tools and development frameworks (ORACLE, Excel, BO, …)

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

KPI data warehouse

Very extensible

Fixed generic schema

Feeds management dashboards


Mdl data model

MDL Data Model

n

MDL_HEADERS

MDL_DIMENSIONS

n

MDL_VALUES

describes

n

n

MDL_RAW_DATA

n

n

MDL_SUMMARY_DATA

MDL_LOOKUP_DATA

describes

MDL_LOOKUP_INFO

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Mdl data model1

MDL Data Model

n

MDL_HEADERS

MDL_DIMENSIONS

n

MDL_VALUES

describes

n

n

MDL_RAW_DATA

n

n

MDL_SUMMARY_DATA

MDL_LOOKUP_DATA

describes

MDL_LOOKUP_INFO

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Mdl data model2

MDL Data Model

n

MDL_HEADERS

MDL_DIMENSIONS

n

MDL_VALUES

describes

n

n

MDL_RAW_DATA

n

n

MDL_SUMMARY_DATA

MDL_LOOKUP_DATA

describes

MDL_LOOKUP_INFO

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Mdl data model3

MDL Data Model

n

MDL_HEADERS

MDL_DIMENSIONS

n

MDL_VALUES

describes

n

n

MDL_RAW_DATA

n

n

MDL_SUMMARY_DATA

MDL_LOOKUP_DATA

describes

MDL_LOOKUP_INFO

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Fact table partitioning

Fact Table Partitioning

Range Partitioning

Jan Janke: "Data Warehouses and Analytical Data Processing ..."


Query optimisation

Query optimisation

SELECT dimension1, dimension3, sum( value2)FROM mdl_raw_data

WHERE data_id = 45

AND value_date > 20100000

GROUP BY dimension1, dimension2

ORDER BY 1, 2;

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

Keep it simple

Redesign / add data source if required

Use partitions and indexes


Remember

Remember:

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • High data volumes + analysis = data warehouse

  • OLTP vs. OLAP

  • Use the facilities the tool provides

    • Materialized views, snapshots, pipelined functions

  • Keep things extensible and simple!

  • Partitions are very helpful


Thank you

Thank You!

Jan Janke: "Data Warehouses and Analytical Data Processing ..."

  • Data Warehouses in Administrative Computing

  • Recap: Data Warehouses Theory

  • Data Warehouses and Information Systems in AIS

    • Foundation, HR and FI Information Systems

    • Complex Data Extraction Processes

    • Pixel-Perfect Reporting

    • Dashboards

  • Detailed Data Warehouse Example

    • Management Data Layer (MDL)


  • Login