
Agenda

  • ODI Performance

  • ODI Scheduling

  • ODI Deployment/Release


Uli Bethke

  • Dublin based

  • Blog www.bi-q.ie

  • Using ODI since 2007

  • Reviewer of two ODI books

  • ODI articles on OTN

  • Deputy chair of the OUG BI SIG. Next event: 11th June

  • ODI advanced trainer


ODI performance

ODI is a metadata-driven (SQL) code generator that uses code templates (Knowledge Modules). It uses a Java agent to communicate with the source and target systems and the repository, and to move data between them over the network.
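The idea of "code template plus metadata equals SQL" can be sketched in a few lines of Python. This is a minimal illustration only. The template text, the `${...}` placeholders and the metadata dictionary are hypothetical stand-ins. Real Knowledge Modules use ODI's own substitution methods (odiRef), not Python's `string.Template`.

```python
from string import Template

# Hypothetical "knowledge module": a reusable SQL template with placeholders.
# Real ODI KMs use ODI's substitution API; string.Template is only a stand-in.
INSERT_KM = Template(
    "INSERT INTO ${target} (${columns})\n"
    "SELECT ${columns}\n"
    "FROM ${source}\n"
    "WHERE ${filter}"
)

# Hypothetical interface metadata, as a repository might describe one mapping.
interface_metadata = {
    "target": "DW.CUSTOMER_DIM",
    "source": "STG.CUSTOMER",
    "columns": ["CUSTOMER_ID", "NAME", "COUNTRY"],
    "filter": "COUNTRY IS NOT NULL",
}

def generate_sql(km: Template, meta: dict) -> str:
    """Render one SQL statement from a code template and mapping metadata."""
    return km.substitute(
        target=meta["target"],
        source=meta["source"],
        columns=", ".join(meta["columns"]),
        filter=meta["filter"],
    )

sql = generate_sql(INSERT_KM, interface_metadata)
print(sql)
```

Changing the template changes the SQL generated for every mapping that uses it. That is why the SQL the agent ends up running is only as good as the Knowledge Module behind it.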


SQL

  • More than 80% of ODI performance issues are SQL issues, so SQL is the main ODI skill

  • Perfect your SQL: advanced SQL, analytic functions

  • Know your database(s) inside out, in particular the target

  • Understand, write, and modify Knowledge Modules
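Here is one example of the analytic SQL meant here. The sketch deduplicates a staging table with `ROW_NUMBER()` and keeps the newest row per key in a single set-based statement. It runs against an in-memory SQLite database (3.25 or later is needed for window functions) so that it is self-contained. The table and column names are made up. The same query shape works on Oracle and other common targets.

```python
import sqlite3

# In-memory stand-in for a staging table holding duplicate customer rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_customer (customer_id INTEGER, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO stg_customer VALUES (?, ?, ?)",
    [
        (1, "Ann", "2012-01-01"),
        (1, "Ann B.", "2012-03-01"),  # newer version of customer 1
        (2, "Bob", "2012-02-01"),
    ],
)

# Analytic function: number the rows per customer, newest first,
# then keep only row 1. No procedural loop over the rows is needed.
LATEST_PER_KEY = """
SELECT customer_id, name
FROM (
    SELECT customer_id, name,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_at DESC) AS rn
    FROM stg_customer
)
WHERE rn = 1
ORDER BY customer_id
"""
latest = conn.execute(LATEST_PER_KEY).fetchall()
print(latest)  # [(1, 'Ann B.'), (2, 'Bob')]
```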


Agent

  • Lightweight Java-based application

  • Tied to host OS

  • Generates code based on ODI metadata.

  • Communicates with source, target and repository

  • JDBC data transport

  • XML

  • Jetty

  • Interpreters: Jython, JBS, JavaScript, Groovy

  • HSQLDB in-memory database

  • Scheduler

  • Sizing


Agent

Target

  • Fewest network round trips (JDBC, XML)

  • One target database server only (DW)

    Another Server

  • ODBC drivers

  • JEE agent on WebLogic

  • No support for target OS

  • Resources on target

  • DBA


Interfaces

  • No KMs that use row-by-row processing!

  • Use ODI functions rather than DB functions

  • Don’t overuse CKMs (especially for large data volumes)

  • Temporary indexes on I$ tables

  • Gather statistics (C$, I$ and target tables, when applicable)

  • Rule of thumb: use loader KMs or DB link KMs rather than JDBC KMs
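The rules above can be sketched in a few lines: set-based loading instead of row-by-row, a temporary index on the I$ table, and fresh statistics. The sketch uses an in-memory SQLite database purely so that it runs anywhere. The table names only mimic ODI's C$/I$ naming convention and are hypothetical. On Oracle the statistics step would be a `DBMS_STATS` call rather than `ANALYZE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables named after ODI conventions: C$ (loaded source rows),
# I$ (integration/flow table) and the target.
conn.execute("CREATE TABLE c_orders (order_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE i_orders (order_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE tgt_orders (order_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO c_orders VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 10001)])
conn.execute("INSERT INTO tgt_orders VALUES (1, 0.0)")  # one row already in target

def load_row_by_row(conn):
    """Anti-pattern: fetch every row and insert it with its own statement."""
    for order_id, amount in conn.execute("SELECT order_id, amount FROM c_orders").fetchall():
        conn.execute("INSERT INTO i_orders VALUES (?, ?)", (order_id, amount))

def load_set_based(conn):
    """Preferred: one INSERT ... SELECT, executed entirely inside the database."""
    conn.execute("INSERT INTO i_orders SELECT order_id, amount FROM c_orders")

load_set_based(conn)

# Temporary index on the I$ table, used by the correlated lookup below.
conn.execute("CREATE INDEX i_orders_idx ON i_orders (order_id)")
# Refresh optimizer statistics (SQLite's ANALYZE; on Oracle, DBMS_STATS).
conn.execute("ANALYZE")

# Update existing target rows, then insert the new ones, both set-based.
conn.execute("""
    UPDATE tgt_orders
    SET amount = (SELECT i.amount FROM i_orders i WHERE i.order_id = tgt_orders.order_id)
    WHERE order_id IN (SELECT order_id FROM i_orders)
""")
conn.execute("""
    INSERT INTO tgt_orders (order_id, amount)
    SELECT order_id, amount FROM i_orders
    WHERE order_id NOT IN (SELECT order_id FROM tgt_orders)
""")
conn.execute("DROP INDEX i_orders_idx")  # the temporary index is no longer needed
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM tgt_orders").fetchone()[0]
print(count)  # 10000
```

`load_row_by_row` is shown only for contrast. It produces the same I$ content but issues one statement per row, which is exactly the pattern to avoid in a KM.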


Source/target

  • Schemas on the same database server: declare them as physical schemas of one data server, not as separate data servers

  • Have sources physically close to target

  • Minimize impact on source

  • Chunking
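One common way to chunk is to split the source key range into bounded slices and extract one slice at a time. This keeps each source query short and limits its impact on the source. The Python sketch below assumes a numeric, reasonably dense key. The function and table names are hypothetical.

```python
from typing import Iterator, Tuple

def key_range_chunks(min_key: int, max_key: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (low, high) key ranges covering [min_key, max_key]."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    low = min_key
    while low <= max_key:
        high = min(low + chunk_size - 1, max_key)
        yield low, high
        low = high + 1

# Each range becomes one bounded extract. Text is built here only for display;
# a real extract would pass low/high as bind variables.
for low, high in key_range_chunks(1, 25, 10):
    print(f"SELECT * FROM src_orders WHERE order_id BETWEEN {low} AND {high}")
```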


CRITICAL PATH

Network paths and path durations:

  • B > E > H: 6 + 2 + 11 = 19

  • B > D > F: 6 + 4 + 14 = 24

  • B > D > G: 6 + 4 + 10 = 20

  • A > C > G: 9 + 8 + 10 = 27 (critical path)
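The numbers are consistent with per-process durations: B is 6 and G is 10 in every path where they appear. On that reading, the critical path is the longest path through the dependency graph. It bounds the total run time however much parallelism is available. Below is a small Python sketch. It assumes this reading and uses only the edges implied by the listed paths.

```python
from functools import lru_cache

# Durations read off the path sums above (B=6, E=2, H=11, ...).
DURATION = {"A": 9, "B": 6, "C": 8, "D": 4, "E": 2, "F": 14, "G": 10, "H": 11}
# Edges implied by the listed paths: B>E>H, B>D>F, B>D>G, A>C>G.
SUCCESSORS = {"A": ["C"], "B": ["E", "D"], "C": ["G"], "D": ["F", "G"],
              "E": ["H"], "F": [], "G": [], "H": []}

@lru_cache(maxsize=None)
def longest_from(node):
    """Return (duration, path) of the longest path starting at node."""
    best_len, best_path = 0, ()
    for nxt in SUCCESSORS[node]:
        length, path = longest_from(nxt)
        if length > best_len:
            best_len, best_path = length, path
    return DURATION[node] + best_len, (node,) + best_path

def critical_path():
    """Longest path over all start nodes (nodes nobody depends on being upstream of)."""
    has_predecessor = {s for succ in SUCCESSORS.values() for s in succ}
    starts = set(DURATION) - has_predecessor
    return max(longest_from(n) for n in starts)

print(critical_path())  # (27, ('A', 'C', 'G'))
```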


Micro Tuning

  • JDBC drivers

  • JVM

  • Type 4 or 5 JDBC drivers (DataDirect)

  • Array fetch size

  • DB packet size

  • Network packet size
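Array fetch size controls how many rows travel per round trip. With a network driver, fewer and larger fetches usually mean less per-row overhead. Python's DB-API exposes the same idea through `cursor.arraysize` and `fetchmany()`. The sketch below uses in-memory SQLite, so no real network is involved and it only shows the batching mechanics. In ODI the corresponding settings are the data server's Array Fetch Size and Batch Update Size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER)")
conn.executemany("INSERT INTO src VALUES (?)", [(i,) for i in range(1, 1001)])

cur = conn.cursor()
cur.arraysize = 250  # rows per fetchmany() call, analogous to an array fetch size
cur.execute("SELECT id FROM src ORDER BY id")

batches = 0
rows_seen = 0
while True:
    rows = cur.fetchmany()  # with no argument, fetches cur.arraysize rows
    if not rows:
        break
    batches += 1
    rows_seen += len(rows)

print(batches, rows_seen)  # 4 1000
```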


Performance Monitoring

  • ODI Log Data Mart

  • Facts

  • Dimensions

  • Metrics

  • Frontend
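A log data mart turns execution logs into facts (one row per session run) and dimensions (scenario, date and so on). The usual metrics then become simple aggregate queries. The star schema below is a hypothetical minimal design, not ODI's repository layout. In practice the facts would be extracted from the ODI log tables. SQLite is used only to keep the sketch self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical log data mart: one dimension and one fact table.
conn.executescript("""
CREATE TABLE dim_scenario (scenario_key INTEGER PRIMARY KEY, scenario_name TEXT);
CREATE TABLE fact_session (
    scenario_key INTEGER REFERENCES dim_scenario(scenario_key),
    run_date TEXT,
    duration_sec INTEGER,
    rows_inserted INTEGER
);
INSERT INTO dim_scenario VALUES (1, 'LOAD_STAGE'), (2, 'LOAD_DIMS');
INSERT INTO fact_session VALUES
    (1, '2012-06-01', 300, 10000),
    (1, '2012-06-02', 420, 12000),
    (2, '2012-06-01', 60, 500);
""")

# Metrics per scenario: run count, average and maximum duration, total rows.
METRICS = """
SELECT d.scenario_name,
       COUNT(*)             AS runs,
       AVG(f.duration_sec)  AS avg_sec,
       MAX(f.duration_sec)  AS max_sec,
       SUM(f.rows_inserted) AS total_rows
FROM fact_session f
JOIN dim_scenario d ON d.scenario_key = f.scenario_key
GROUP BY d.scenario_name
ORDER BY max_sec DESC
"""
for row in conn.execute(METRICS):
    print(row)
```

A frontend would sit on top of queries like `METRICS`. Sorting by the longest run puts the first tuning candidates at the top.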


DBMS_SQLTUNE_UTIL0

  • dbms_sqltune_util0.sqltext_to_sqlid

  • Link to Data Dictionary Tables
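`sqltext_to_sqlid` maps the exact text of a statement to its SQL_ID. Given that ID, SQL generated by an ODI step can be joined to data dictionary views that carry a SQL_ID column, such as V$SQL or DBA_HIST_SQLSTAT. The same hash can also be computed outside the database. The sketch below follows the algorithm as publicly described by the Oracle community: MD5 of the text plus a trailing null byte, with the last 64 bits rendered in a 32-character alphabet. It is a reimplementation, not Oracle code. Compare its output with DBMS_SQLTUNE_UTIL0.SQLTEXT_TO_SQLID on your own system before relying on it. The function name here is hypothetical.

```python
import hashlib
import struct

# Base-32 alphabet used for SQL_IDs (assumption: no 'e', 'i', 'l', 'o').
ALPHABET = "0123456789abcdfghjkmnpqrstuvwxyz"

def sqltext_to_sqlid(sql_text: str) -> str:
    """Compute a SQL_ID from exact SQL text (community-documented algorithm).

    The text must match byte for byte what the database parsed; whitespace
    and case differences give a different SQL_ID.
    """
    digest = hashlib.md5(sql_text.encode("utf-8") + b"\x00").digest()
    # The last 8 bytes, read as two little-endian 32-bit words (msb, lsb).
    _, _, msb, lsb = struct.unpack("<IIII", digest)
    value = (msb << 32) | lsb
    chars = []
    for _ in range(13):  # SQL_IDs are 13 base-32 characters, zero-padded
        value, rem = divmod(value, 32)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

print(sqltext_to_sqlid("select sysdate from dual"))
```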


Maciej Kocon

  • Dublin based

  • Using ODI since 2005 (Sunopsis)

  • Reviewer of two ODI books

  • Blog www.bi-q.ie

  • [email protected]


ORCHESTRATING DWH PROCESSES

  • Orchestration of Data Process Flow

    • Standard DWH Process flow orchestration

    • Packages in Oracle Data Integrator 10g

    • Load Plans in Oracle Data Integrator 11g

  • Process Flow use cases - efficiency analysis

  • Alternative scheduling

    • benefits




TYPICAL DATA FLOW in DWH

Figure: three-step E-LT flow, orchestrated with packages in ODI 10g and with load plans in ODI 11g.

  • Step 1, STAGE (DATA EXTRACT): loads data from sources

  • Step 2, DIMs (LABEL): provide structured labeling information

  • Step 3, FACTS: consist of measurements, metrics or facts

  • Between the steps: data transport & transform units




ORCHESTRATION – ODI PACKAGES

Figure: two ways of building packages.

  • Using objects directly: PKG_ABC contains INT_A, PRC_B and INT_C; PKG_DE contains INT_D and INT_E

  • Using scenarios (compiled code): PKG_ABCDE calls the scenarios of INT_A, PRC_B, INT_C and PKG_DE, either SYNCHRONOUS or ASYNCHRONOUS
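Synchronous versus asynchronous scenario calls work like blocking versus start-all-then-wait execution. The Python sketch below uses a hypothetical `run_scenario` stand-in that only sleeps. In an ODI package the analogous tools would be OdiStartScen, with its synchronous or asynchronous mode, and OdiWaitForChildSession. No ODI API is called here.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_scenario(name: str, seconds: float) -> str:
    """Hypothetical stand-in for starting one scenario and waiting for it."""
    time.sleep(seconds)
    return name

SCENARIOS = [("INT_A", 0.2), ("PRC_B", 0.2), ("INT_C", 0.2)]

def run_synchronous(scenarios):
    """Each call blocks until the previous scenario has finished."""
    return [run_scenario(name, secs) for name, secs in scenarios]

def run_asynchronous(scenarios):
    """Start all scenarios at once, then wait for every child to finish."""
    with ThreadPoolExecutor(max_workers=len(scenarios)) as pool:
        futures = [pool.submit(run_scenario, name, secs) for name, secs in scenarios]
        return [f.result() for f in futures]

t0 = time.perf_counter()
run_synchronous(SCENARIOS)
sync_elapsed = time.perf_counter() - t0

t0 = time.perf_counter()
run_asynchronous(SCENARIOS)
async_elapsed = time.perf_counter() - t0

print(f"synchronous: {sync_elapsed:.1f}s, asynchronous: {async_elapsed:.1f}s")
```

The asynchronous version takes roughly as long as the slowest scenario rather than the sum of all three. It also puts all three loads on the target at the same time.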




ODI 10g vs. ODI 11g

Figure: the same STAGE, DIMs and FACTS flow (processes A to G) built once as ODI 10g packages (PKG_DM, PKG_ABC, PKG_DE, PKG_FG, using INT_A, PRC_B, INT_C, PRC_D, INT_F and PRC_G) and once as ODI 11g load plans: same effect!




PROCESS FLOW EFFICIENCY ANALYSIS

Standard Flow Orchestration: Stage (stop) DIMs (stop) Facts

Figure: processes A to G (durations of 10 or 30) run in parallel within a phase, while the phases run sequentially, each waiting for the slowest process of the previous one: 30 + 30 + 10 = 70

  • DOWNSIDES:

  • POSSIBLE INEFFICIENCIES (IDLE RESOURCES)
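With phase barriers, the total is the sum of the longest duration in each phase. Every faster process in a phase leaves its slot idle until the barrier. The Python sketch below works through that arithmetic. The assignment of A to G to phases is hypothetical. It was chosen only so that the per-phase maxima match 30 + 30 + 10 = 70.

```python
# Hypothetical assignment of processes A to G to phases; only the per-phase
# maxima (30, 30, 10) correspond to the figure.
PHASES = {
    "STAGE": {"A": 30, "B": 10},
    "DIMs": {"C": 10, "D": 10, "G": 30},
    "FACTS": {"E": 10, "F": 10},
}

def barrier_makespan(phases):
    """Parallel within a phase; every phase waits for its slowest process."""
    return sum(max(durations.values()) for durations in phases.values())

def idle_time(phases):
    """Total time that finished processes' slots sit waiting at phase barriers."""
    return sum(max(d.values()) - v for d in phases.values() for v in d.values())

print(barrier_makespan(PHASES), idle_time(PHASES))  # 70 60
```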




PROCESS FLOW EFFICIENCY ANALYSIS

OPTIMIZATION ATTEMPT

Figure: the same processes A to G regrouped into two parallel chains, (30 + 10) and (10 + 30), followed by a final 10: 40 + 10 = 50

70 / 50 = 1.4 times quicker!

  • UPSIDE:

  • EFFICIENCY IMPROVED


ADVANCED Data Flow example


Enterprise DWH Data Flow example



PROCESS FLOW EFFICIENCY ANALYSIS

OPTIMIZATION ATTEMPT

Figure: processes A to G regrouped into two parallel chains, (30 + 10) and (10 + 30), followed by a final 10: 40 + 10 = 50

70 / 50 = 1.4 times quicker!

  • UPSIDE:

  • EFFICIENCY IMPROVED

  • DOWNSIDES:

  • TIMINGS KNOWLEDGE REQUIRED

  • OVERALL DEPENDENCY KNOWLEDGE REQUIRED


PROCESS FLOW EFFICIENCY ANALYSIS

OPTIMIZATION ATTEMPT

Figure: a different arrangement of processes A to G in which regrouping does not help; the total stays at 30 + 30 + 10 = 70

  • DOWNSIDE:

  • INEFFICIENCY EXISTS BUT CAN’T BE RESOLVED

  • CONSUMER WAITING & IMPACT




Traditional Scheduling - limitations

  • Possible inefficiencies (idle resources)

  • Timings knowledge required

  • Overall dependency knowledge required

  • Inefficiency exists but can’t be resolved

  • Consumer waiting & impact

SCHEDULER




DEPENDENCY DRIVEN Scheduling

Figure: processes A to E from several streams, arranged by their dependencies versus grouped into PACKAGES & LOAD PLANS



PROCESS FLOW EFFICIENCY ANALYSIS

Figure: processes A to G run with phase barriers, sequential and parallel (30 + 30 + 10 = 70), versus dependency-driven scheduling (30)

70 / 30 = 2.3 times faster!
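Dependency-driven scheduling can beat phase barriers this clearly because a process waits only for its own upstreams. The total becomes the longest dependency chain instead of the sum of per-phase maxima. The Python sketch below uses a hypothetical graph that happens to reproduce 70 versus 30. The dependencies are invented: G, for instance, is assumed to be a dimension with no staging input. The sketch also assumes enough capacity to run every ready process at once.

```python
from functools import lru_cache

# Hypothetical processes: (phase, duration, immediate upstream dependencies).
PROCESSES = {
    "A": ("STAGE", 30, []),
    "B": ("STAGE", 10, []),
    "C": ("DIMs", 10, ["B"]),
    "D": ("DIMs", 10, []),
    "G": ("DIMs", 30, []),
    "E": ("FACTS", 10, ["C"]),
    "F": ("FACTS", 10, ["D"]),
}

def barrier_total(procs):
    """Phase by phase: each phase waits for the slowest process of the previous one."""
    longest = {}
    for phase, duration, _ in procs.values():
        longest[phase] = max(longest.get(phase, 0), duration)
    return sum(longest.values())

def dependency_driven_total(procs):
    """Each process starts as soon as its own upstreams finish (unlimited slots)."""
    @lru_cache(maxsize=None)
    def finish(name):
        _, duration, upstream = procs[name]
        return max((finish(u) for u in upstream), default=0) + duration
    return max(finish(name) for name in procs)

b, d = barrier_total(PROCESSES), dependency_driven_total(PROCESSES)
print(b, d, round(b / d, 1))  # 70 30 2.3
```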


Dependency Driven Scheduling

  • Simplifies orchestrating the flow

    • only immediate upstream dependencies need to be defined

    • execution timings not relevant

    • self-adapts in the most effective way

  • Improves overall E-LT performance

    • fewer idle resources, better utilization

    • independence

    • unveils its full potential in complex enterprise-class DWHs (Inmon)
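The core loop of a dependency-driven scheduler is small. Each process lists only its immediate upstreams, and it starts the moment its last upstream finishes, with no timings involved. Below is a Python sketch using a thread pool. The process names, `run_process` and the worker limit are hypothetical. This is an illustration of the technique, not the presenters' scheduler.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# Hypothetical processes: each declares only its immediate upstream dependencies.
UPSTREAM = {"A": [], "B": [], "C": ["B"], "D": [], "G": [], "E": ["C"], "F": ["D"]}

def run_process(name: str) -> str:
    """Stand-in for executing one scenario."""
    time.sleep(0.01)
    return name

def run_dependency_driven(upstream, max_workers=4):
    """Start each process as soon as all of its upstream processes have finished."""
    remaining = {p: set(deps) for p, deps in upstream.items()}
    downstream = {p: [] for p in upstream}
    for p, deps in upstream.items():
        for dep in deps:
            downstream[dep].append(p)

    finished = []
    # max_workers caps concurrency: a crude form of system stress control.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {pool.submit(run_process, p): p
                   for p, deps in remaining.items() if not deps}
        while running:
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                fut.result()  # re-raise a failure instead of silently continuing
                finished.append(name)
                for child in downstream[name]:
                    remaining[child].discard(name)
                    if not remaining[child]:
                        running[pool.submit(run_process, child)] = child

    if len(finished) != len(upstream):
        raise ValueError("cyclic or unsatisfiable dependencies")
    return finished

print(run_dependency_driven(UPSTREAM))
```

The completion order varies from run to run. Only the dependency constraints are guaranteed, which is what lets the flow self-adapt when some processes run faster or slower than usual.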


Dependency Driven Scheduling

  • Notifications

    • errors (+auto-restartability)

    • finish summary

    • logging

  • Multiple/overlapping E-LT streams

    • load with different frequencies

  • Parameterization

    • improved system stress control

    • process prioritization
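Error notification and auto-restartability can be wrapped around each process run. In this minimal Python sketch, `run_with_restart`, the retry count and the `notify` callback are hypothetical. `notify` is just `print` here, standing in for e-mail or logging. Whether a failed ODI step is actually safe to restart depends on the process.

```python
import time
from typing import Callable

def run_with_restart(name: str, task: Callable[[], None], max_attempts: int = 3,
                     delay_sec: float = 0.0,
                     notify: Callable[[str], None] = print) -> bool:
    """Run task, restarting it after a failure; report every outcome via notify."""
    for attempt in range(1, max_attempts + 1):
        try:
            task()
        except Exception as exc:  # any failure is treated as restartable here
            notify(f"{name}: attempt {attempt} failed: {exc}")
            if attempt < max_attempts:
                time.sleep(delay_sec)  # back off before restarting
            continue
        notify(f"{name}: finished on attempt {attempt}")
        return True
    notify(f"{name}: giving up after {max_attempts} attempts")
    return False

# Usage: a task that fails twice, then succeeds.
attempts = {"count": 0}

def flaky_load():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient failure")

ok = run_with_restart("LOAD_DIMS", flaky_load)
```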




FIRST RUN vs. TODAY

  • First run: 10 processes

  • Today: 584 processes, 1389 dependencies

  • 132 231 scenarios run

  • Time: 12h43m with load plans vs. 4h21m with the scheduler, 2.9 times faster


Enterprise DWH Data Flow


Release 1.0


Release 2.0 TST


TESTING Release 2.0


Deploy Release 2.0 PRD


The hot fix situation


Release frequently


CI environment


CI environment


The build master


Automate stuff


ODI vs. Source control


ODI structure


Beyond intra-build dependencies

