agenda
Download
Skip this Video
Download Presentation
Agenda

Loading in 2 Seconds...

play fullscreen
1 / 62

Agenda - PowerPoint PPT Presentation


  • 86 Views
  • Uploaded on

Agenda. ODI Performance ODI Scheduling ODI Deployment /Release. Uli Bethke. Dublin based Blog www.bi-q.ie ODI 2007 Reviewer two ODI books ODI articles OTN Deputy chair OUG BI SIG. Next event 11 th June ODI advanced trainer. ODI performance.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Agenda' - simeon


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
agenda
Agenda
  • ODI Performance
  • ODI Scheduling
  • ODI Deployment/Release
uli bethke
Uli Bethke
  • Dublin based
  • Blog www.bi-q.ie
  • ODI 2007
  • Reviewer two ODI books
  • ODI articles OTN
  • Deputy chair OUG BI SIG. Next event 11thJune
  • ODI advanced trainer
odi performance
ODI performance

ODI is a metadata driven (SQL) code generator using code templates (knowledge modules). It uses a Java agent to communicate and send data between source and target systems and the repository over the network.

slide4
SQL
  • > 80%: ODI performance issues = SQL issues => SQL main ODI skill
  • Perfect your SQL. Advanced SQL. Analytic Functions
  • Know your database(s) inside out. In particular the target
  • Understand, write, and modify Knowledge Modules
agent
Agent
  • Light weight Java based application
  • Tied to host OS
  • Generates code based on ODI metadata.
  • Communicates source, target, repository.
  • JDBC data transport
  • XML
  • Jetty
  • Interpreters: Jython, JBS, JavaScript, Groovy
  • HSQLDB in memory database
  • Scheduler
  • Sizing
agent1
Agent

Target

  • Least amount of roundtrips. Network (JDBC, XML)
  • One target database server only (DW)

Another Server

  • ODBC drivers
  • JEE agent on Weblogic
  • No support for target OS
  • Resources on target
  • DBA
interfaces
interfaces
  • No!! KM using row by row processing
  • Use ODI functions rather than DB functions
  • Don’t overuse CKM (especially for large data volumes)
  • temp indexes (I$)
  • Gather statistics (C$, I$, TGT when applicable)
  • Rule of thumb: Use loader KMs or db link KMs rather than JDBC KMs
source target
Source/target
  • Schemas on same database server. Physical schema and not data server.
  • Have sources physically close to target
  • Minimize impact on source
  • Chunking
critical path
CRITICAL PATH

Network Paths: Path Durations:

B > E > H 6 + 2 + 11 = 19 B > D > F 6 + 4 + 14 = 24 B > D > G 6 + 4 + 10 = 20A > C > G 9 + 8 + 10 = 27 Critical Path

micro tuning
Micro Tuning
  • JDBC drivers
  • JVM
  • Type 4 or 5 JDBC drivers (Data Direct)
  • Array fetch size.
  • DB packet size.
  • Network packet size.
performance monitoring
Performance Monitoring
  • ODI Log Data Mart
  • Facts
  • Dimensions
  • Metrics
  • Frontend
dbms sqltune util0
Dbms_sqltune_util0
  • dbms_sqltune_util0.sqltext_to_sqlid
  • Link to Data Dictionary Tables
maciej kocon
maciEJKOCON
orchestrating dwh processes
ORCHESTRATING DWHPROCESSES
  • Orchestration of Data Process Flow
    • Standard DWH Process flow orchestration
    • Packages in Oracle Data Integrator 10g
    • Load Plans in Oracle Data Integrator 11g
  • Process Flow use cases - efficiency analysis
  • Alternative scheduling
    • benefits
slide15
TYPICAL DATA FLOW in DWH

1

step

STAGE

E-LT

DATA EXTRACT

loads data from sources

slide16
TYPICAL DATA FLOW in DWH

1

2

step

step

DIMs

STAGE

E-LT

LABEL

provides structured labeling

information

DATA EXTRACT

loads data from sources

slide17
TYPICAL DATA FLOW in DWH

1

2

3

step

step

step

FACTS

DIMs

STAGE

E-LT

FACTS

consists of measurements, metrics or facts

LABEL

provides structured labeling

information

DATA EXTRACT

loads data from sources

slide18
TYPICAL DATA FLOW in DWH

1

2

3

step

step

step

FACTS

DIMs

STAGE

E-LT

FACTS

consists of measurements, metrics or facts

LABEL

provides structured labeling

information

DATA EXTRACT

loads data from sources

data transport &

transform units

slide19
TYPICAL DATA FLOW in DWH

1

2

3

step

step

step

FACTS

DIMs

STAGE

E-LT

FACTS

consists of measurements, metrics or facts

LABEL

provides structured labeling

information

DATA EXTRACT

loads data from sources

data transport &

transform units

ODI 11

Load Plans

ODI 10g

Packages

orchestration

slide20
ORCHESTRATION – ODIPACKAGES

using object directly

PRC_B

PKG_ABC

PKG_DE

INT_A

INT_D

INT_C

INT_E

slide21
ORCHESTRATION – ODIPACKAGES

using object directly

using scenarios – compiled code

PRC_B

PRC_B

SYNCHRONOUS

PKG_ABC

PKG_DE

PKG_ABCDE

PKG_DE

INT_A

INT_C

INT_A

INT_D

INT_C

INT_E

slide22
ORCHESTRATION – ODIPACKAGES

using object directly

using scenarios – compiled code

PRC_B

PRC_B

PRC_B

SYNCHRONOUS

PKG_ABCDE

PKG_ABC

PKG_DE

PKG_ABCDE

PKG_DE

PKG_DE

INT_A

INT_A

INT_C

INT_C

INT_A

INT_D

INT_E

INT_C

ASYNCHRONOUS

slide23
ODI 10g vs. ODI 11

FACTS

DIMs

STAGE

PRC_D

PRC_G

PRC_B

PKG_DM

A

C

D

F

E

G

B

PKG_FG

PKG_DE

PKG_ABC

ODI 10g

Packages

INT_C

INT_F

INT_A

INT_C

slide24
ODI 10g vs. ODI 11

FACTS

DIMs

STAGE

PRC_G

PRC_D

PRC_B

PKG_DM

PKG_FG

PKG_DE

PKG_ABC

ODI 10g

Packages

INT_A

INT_C

INT_F

INT_C

ODI 11

Load plans

slide25
ODI 10g vs. ODI 11

FACTS

DIMs

STAGE

PRC_B

PRC_G

PRC_D

PKG_DM

C

A

D

F

B

G

E

PKG_FG

PKG_DE

PKG_ABC

ODI 10g

Packages

INT_F

INT_C

INT_A

INT_C

ODI 11

Load plans

same

effect!

slide26
PROCESS FLOW EFFICIENCY ANALYSIS

Standard Flow Orchestration: Stage-(stop)DIMs-(stop)Facts

30

A

B

G

C

F

D

E

sequential

10

30

10

10

10

10

parallel

10

30

10

10

30 + 30 + 10 = 70

G

A

B

F

D

E

C

10

30

10

slide27
PROCESS FLOW EFFICIENCY ANALYSIS

Standard Flow Orchestration: Stage-(stop)DIMs-(stop)Facts

30

A

B

G

C

F

D

E

sequential

10

10

10

10

10

30

parallel

10

30

10

10

30 + 30 + 10 = 70

B

D

E

C

G

A

F

10

30

10

  • DOWNSIDES:
  • POSSIBLE INEFFICIENCIES (IDLE RESOURCES)
slide28
PROCESS FLOW EFFICIENCY ANALYSIS

30

A

B

G

C

D

F

E

OPTIMIZATION ATTEMPT

10

10

10

30

10

10

slide29
PROCESS FLOW EFFICIENCY ANALYSIS

30

A

B

G

C

F

D

E

OPTIMIZATION ATTEMPT

sequential

30

10

10

10

10

10

30 + 10

10 + 30

+ 10 = 50

parallel

10

30

10

10

G

F

B

C

A

E

D

10

30

10

70  50 = 1.4 times quicker!

  • UPSIDE:
  • EFFICIENCY IMPROVED
slide33
PROCESS FLOW EFFICIENCY ANALYSIS

30

A

B

G

C

F

D

E

OPTIMIZATION ATTEMPT

sequential

30

10

10

10

10

10

30 + 10

10 + 30

+ 10 = 50

parallel

10

30

10

10

G

F

B

C

A

E

D

10

30

10

70  50 = 1.4 times quicker!

  • UPSIDE:
  • EFFICIENCY IMPROVED
  • DOWNSIDES:
  • TIMINGS KNOWLEDGE REQUIRED
  • OVERALL DEPENDECY KNOWLEDGE REQURED
slide34
PROCESS FLOW EFFICIENCY ANALYSIS

30

A

B

G

C

F

D

E

OPTIMIZATION ATTEMPT

sequential

30

10

10

10

10

10

parallel

10

30

10

70

10

30 + 30 + 10 = 70

A

B

C

E

F

G

D

10

30

10

70

  • DOWNSIDE:
  • INEFFICIENCY EXISTS BUT CAN’T BE RESOLVED
  • CONSUMER WAITING & IMPACT
slide35
TraditionalScheduling - limitations
  • Possible inefficiencies (idle resources)
  • Timings knowledge required
  • Overall dependecy knowledge requred
  • Inefficiency exists but can’t be resolved
  • Consumer waiting & impact
slide36
TraditionalScheduling - limitations
  • Possible inefficiencies (idle resources)
  • Timings knowledge required
  • Overall dependecy knowledge required
  • Inefficiency exists but can’t be resolved
  • Consumer waiting & impact

SCHEDULER

slide37
DEPENDENCY DRIVENScheduling

C

E

A

E

D

C

B

C

E

D

D

A

B

B

A

C

E

E

B

A

C

D

D

A

C

D

E

B

A

C

B

B

D

A

E

slide38
DEPENDENCY DRIVENScheduling

C

A

B

E

D

C

B

D

E

E

C

A

B

A

D

C

D

B

A

C

D

E

E

B

A

D

E

B

A

C

C

B

PACKGAGES&LOAD PLANS

D

A

E

slide39
PROCESS FLOW EFFICIENCY ANALYSIS

sequential

30

30

A

A

B

B

G

G

C

C

D

F

F

D

E

E

30

10

10

10

30

10

10

10

10

10

10

10

parallel

10

30

10

10

30 + 30 + 10 = 70

10

30

10

70

D

C

B

A

F

G

E

70

10

10

10

10

10

30

30

slide40
PROCESS FLOW EFFICIENCY ANALYSIS

sequential

30

30

A

A

B

B

G

G

C

C

F

D

D

F

E

E

30

10

10

30

10

10

10

10

10

10

10

10

parallel

10

30

10

10

30 + 30 + 10 = 70

10

30

10

30

70

G

D

C

E

F

A

B

70

70

10

10

10

10

10

30

30

70  30 = 2.3 times faster!

dependency driven scheduling
Dependency Driven Scheduling
  • Simplifies orchestrating the flow
    • only immediate upstream definition required
    • execution timings not relevant
    • self-adapts in the most effective way
  • Improves overall E-LT performance
    • Less idle resources – better utilization
    • Independency
    • unveilsits full potential in complex Enterprise class DWHs (Inmon)
dependency driven scheduling1
Dependency Driven Scheduling
  • Notifications
    • errors (+auto-restartability)
    • finish summary
    • logging
  • Multiple/overlapping E-LT streams
    • load with different frequencies
  • Parameterization
    • improved system stress control
    • process prioritization
slide43
FIRST RUN

10

processes

slide44
FIRST RUN

TODAY

10

584

1389

DEPENDENCIES

processes

processes

slide45
FIRST RUN

TODAY

10

584

1389

DEPENDENCIES

processes

processes

132 231 SCENARIOS RUN

slide46
FIRST RUN

TODAY

10

584

1389

DEPENDENCIES

processes

processes

132 231 SCENARIOS RUN

12h43m

TIME

LOAD PLANS

slide47
FIRST RUN

TODAY

10

584

1389

DEPENDENCIES

processes

processes

132 231 SCENARIOS RUN

2.9

12h43m

4h21m

TIME

TIMES

FASTER

LOAD PLANS

SCHEDULER

ad