1 / 62

Agenda

Agenda. ODI Performance ODI Scheduling ODI Deployment /Release. Uli Bethke. Dublin based Blog www.bi-q.ie ODI 2007 Reviewer two ODI books ODI articles OTN Deputy chair OUG BI SIG. Next event 11 th June ODI advanced trainer. ODI performance.

simeon
Download Presentation

Agenda

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Agenda • ODI Performance • ODI Scheduling • ODI Deployment/Release

  2. Uli Bethke • Dublin based • Blog www.bi-q.ie • ODI 2007 • Reviewer two ODI books • ODI articles OTN • Deputy chair OUG BI SIG. Next event 11thJune • ODI advanced trainer

  3. ODI performance ODI is a metadata driven (SQL) code generator using code templates (knowledge modules). It uses a Java agent to communicate and send data between source and target systems and the repository over the network.

  4. SQL • > 80%: ODI performance issues = SQL issues => SQL main ODI skill • Perfect your SQL. Advanced SQL. Analytic Functions • Know your database(s) inside out. In particular the target • Understand, write, and modify Knowledge Modules

  5. Agent • Light weight Java based application • Tied to host OS • Generates code based on ODI metadata. • Communicates source, target, repository. • JDBC data transport • XML • Jetty • Interpreters: Jython, JBS, JavaScript, Groovy • HSQLDB in memory database • Scheduler • Sizing

  6. Agent Target • Least amount of roundtrips. Network (JDBC, XML) • One target database server only (DW) Another Server • ODBC drivers • JEE agent on Weblogic • No support for target OS • Resources on target • DBA

  7. interfaces • No!! KM using row by row processing • Use ODI functions rather than DB functions • Don’t overuse CKM (especially for large data volumes) • temp indexes (I$) • Gather statistics (C$, I$, TGT when applicable) • Rule of thumb: Use loader KMs or db link KMs rather than JDBC KMs

  8. Source/target • Schemas on same database server. Physical schema and not data server. • Have sources physically close to target • Minimize impact on source • Chunking

  9. CRITICAL PATH Network Paths: Path Durations: B > E > H 6 + 2 + 11 = 19 B > D > F 6 + 4 + 14 = 24 B > D > G 6 + 4 + 10 = 20A > C > G 9 + 8 + 10 = 27 Critical Path

  10. Micro Tuning • JDBC drivers • JVM • Type 4 or 5 JDBC drivers (Data Direct) • Array fetch size. • DB packet size. • Network packet size.

  11. Performance Monitoring • ODI Log Data Mart • Facts • Dimensions • Metrics • Frontend

  12. Dbms_sqltune_util0 • dbms_sqltune_util0.sqltext_to_sqlid • Link to Data Dictionary Tables

  13. maciEJKOCON • Dublin based • ODI 2005 (Sunopsis) • Reviewer two ODI books • Blog www.bi-q.ie • maciek@bi-q.ie

  14. ORCHESTRATING DWHPROCESSES • Orchestration of Data Process Flow • Standard DWH Process flow orchestration • Packages in Oracle Data Integrator 10g • Load Plans in Oracle Data Integrator 11g • Process Flow use cases - efficiency analysis • Alternative scheduling • benefits

  15. TYPICAL DATA FLOW in DWH 1 step STAGE E-LT DATA EXTRACT loads data from sources

  16. TYPICAL DATA FLOW in DWH 1 2 step step DIMs STAGE E-LT LABEL provides structured labeling information DATA EXTRACT loads data from sources

  17. TYPICAL DATA FLOW in DWH 1 2 3 step step step FACTS DIMs STAGE E-LT FACTS consists of measurements, metrics or facts LABEL provides structured labeling information DATA EXTRACT loads data from sources

  18. TYPICAL DATA FLOW in DWH 1 2 3 step step step FACTS DIMs STAGE E-LT FACTS consists of measurements, metrics or facts LABEL provides structured labeling information DATA EXTRACT loads data from sources data transport & transform units

  19. TYPICAL DATA FLOW in DWH 1 2 3 step step step FACTS DIMs STAGE E-LT FACTS consists of measurements, metrics or facts LABEL provides structured labeling information DATA EXTRACT loads data from sources data transport & transform units ODI 11 Load Plans ODI 10g Packages orchestration

  20. ORCHESTRATION – ODIPACKAGES using object directly PRC_B PKG_ABC PKG_DE INT_A INT_D INT_C INT_E 

  21. ORCHESTRATION – ODIPACKAGES using object directly using scenarios – compiled code PRC_B PRC_B SYNCHRONOUS PKG_ABC PKG_DE PKG_ABCDE PKG_DE INT_A INT_C INT_A INT_D INT_C INT_E 

  22. ORCHESTRATION – ODIPACKAGES using object directly using scenarios – compiled code PRC_B PRC_B PRC_B SYNCHRONOUS PKG_ABCDE PKG_ABC PKG_DE PKG_ABCDE PKG_DE PKG_DE INT_A INT_A INT_C INT_C INT_A INT_D INT_E INT_C  ASYNCHRONOUS

  23. ODI 10g vs. ODI 11 FACTS DIMs STAGE PRC_D PRC_G PRC_B PKG_DM A C D F E G B PKG_FG PKG_DE PKG_ABC ODI 10g Packages INT_C INT_F INT_A INT_C

  24. ODI 10g vs. ODI 11 FACTS DIMs STAGE PRC_G PRC_D PRC_B PKG_DM PKG_FG PKG_DE PKG_ABC ODI 10g Packages INT_A INT_C INT_F INT_C ODI 11 Load plans

  25. ODI 10g vs. ODI 11 FACTS DIMs STAGE PRC_B PRC_G PRC_D PKG_DM C A D F B G E PKG_FG PKG_DE PKG_ABC ODI 10g Packages INT_F INT_C INT_A INT_C ODI 11 Load plans same effect!

  26. PROCESS FLOW EFFICIENCY ANALYSIS Standard Flow Orchestration: Stage-(stop)DIMs-(stop)Facts 30 A B G C F D E sequential 10 30 10 10 10 10 parallel 10 30 10 10 30 + 30 + 10 = 70 G A B F D E C 10 30 10

  27. PROCESS FLOW EFFICIENCY ANALYSIS Standard Flow Orchestration: Stage-(stop)DIMs-(stop)Facts 30 A B G C F D E sequential 10 10 10 10 10 30 parallel 10 30 10 10 30 + 30 + 10 = 70 B D E C G A F 10 30 10 • DOWNSIDES: • POSSIBLE INEFFICIENCIES (IDLE RESOURCES)

  28. PROCESS FLOW EFFICIENCY ANALYSIS 30 A B G C D F E OPTIMIZATION ATTEMPT 10 10 10 30 10 10

  29. PROCESS FLOW EFFICIENCY ANALYSIS 30 A B G C F D E OPTIMIZATION ATTEMPT sequential 30 10 10 10 10 10 30 + 10 10 + 30 + 10 = 50 parallel 10 30 10 10 G F B C A E D 10 30 10 70  50 = 1.4 times quicker! • UPSIDE: • EFFICIENCY IMPROVED

  30. ADVANCEDData Flow example

  31. Enterprise DWH Data Flow example

  32. Enterprise DWH Data Flow example

  33. PROCESS FLOW EFFICIENCY ANALYSIS 30 A B G C F D E OPTIMIZATION ATTEMPT sequential 30 10 10 10 10 10 30 + 10 10 + 30 + 10 = 50 parallel 10 30 10 10 G F B C A E D 10 30 10 70  50 = 1.4 times quicker! • UPSIDE: • EFFICIENCY IMPROVED • DOWNSIDES: • TIMINGS KNOWLEDGE REQUIRED • OVERALL DEPENDECY KNOWLEDGE REQURED

  34. PROCESS FLOW EFFICIENCY ANALYSIS 30 A B G C F D E OPTIMIZATION ATTEMPT sequential 30 10 10 10 10 10 parallel 10 30 10 70 10 30 + 30 + 10 = 70 A B C E F G D 10 30 10 70 • DOWNSIDE: • INEFFICIENCY EXISTS BUT CAN’T BE RESOLVED • CONSUMER WAITING & IMPACT

  35. TraditionalScheduling - limitations • Possible inefficiencies (idle resources) • Timings knowledge required • Overall dependecy knowledge requred • Inefficiency exists but can’t be resolved • Consumer waiting & impact

  36. TraditionalScheduling - limitations • Possible inefficiencies (idle resources) • Timings knowledge required • Overall dependecy knowledge required • Inefficiency exists but can’t be resolved • Consumer waiting & impact SCHEDULER

  37. DEPENDENCY DRIVENScheduling C E A E D C B C E D D A B B A C E E B A C D D A C D E B A C B B D A E

  38. DEPENDENCY DRIVENScheduling C A B E D C B D E E C A B A D C D B A C D E E B A D E B A C C B PACKGAGES&LOAD PLANS D A E

  39. PROCESS FLOW EFFICIENCY ANALYSIS sequential 30 30 A A B B G G C C D F F D E E 30 10 10 10 30 10 10 10 10 10 10 10 parallel 10 30 10 10 30 + 30 + 10 = 70 10 30 10 70 D C B A F G E 70 10 10 10 10 10 30 30

  40. PROCESS FLOW EFFICIENCY ANALYSIS sequential 30 30 A A B B G G C C F D D F E E 30 10 10 30 10 10 10 10 10 10 10 10 parallel 10 30 10 10 30 + 30 + 10 = 70 10 30 10 30 70 G D C E F A B 70 70 10 10 10 10 10 30 30 70  30 = 2.3 times faster!

  41. Dependency Driven Scheduling • Simplifies orchestrating the flow • only immediate upstream definition required • execution timings not relevant • self-adapts in the most effective way • Improves overall E-LT performance • Less idle resources – better utilization • Independency • unveilsits full potential in complex Enterprise class DWHs (Inmon)

  42. Dependency Driven Scheduling • Notifications • errors (+auto-restartability) • finish summary • logging • Multiple/overlapping E-LT streams • load with different frequencies • Parameterization • improved system stress control • process prioritization

  43. FIRST RUN 10 processes

  44. FIRST RUN TODAY 10 584 1389 DEPENDENCIES processes processes

  45. FIRST RUN TODAY 10 584 1389 DEPENDENCIES processes processes 132 231 SCENARIOS RUN

  46. FIRST RUN TODAY 10 584 1389 DEPENDENCIES processes processes 132 231 SCENARIOS RUN 12h43m TIME LOAD PLANS

  47. FIRST RUN TODAY 10 584 1389 DEPENDENCIES processes processes 132 231 SCENARIOS RUN 2.9 12h43m 4h21m TIME TIMES FASTER LOAD PLANS SCHEDULER

  48. Enterprise DWH Data FloW

  49. Release 1.0

More Related