1 / 34

Parallel Operations in Data Warehouses

Parallel Operations in Data Warehouses. Objectives. After completing this lesson, you should be able to do the following: Determine when partitionwise joins are used Use the ALTER SESSION command to force parallel operations Create indexes in parallel

Download Presentation

Parallel Operations in Data Warehouses

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Parallel Operations in Data Warehouses

  2. Objectives • After completing this lesson, you should be able to do the following: • Determine when partitionwise joins are used • Use the ALTER SESSION command to force parallel operations • Create indexes in parallel • Set initialization parameters relating to parallel operations tuning • Query dynamic performance views related to parallel operations tuning • Tune PDML operations

  3. Parallel Query • The various query types that can be parallelized are: • Access methods: • Table scans, fast full index scans • Partitioned index range scans • Various SQL operations:Sorts, aggregations, and set operations • Join methods: • Nested loop, sort merge • Hash, star transformation, partitionwise join

  4. Parallel Partitioned Table Scan • Scan by ROWID for partitioned tables. SELECT /*+ PARALLEL(SALES,9)*/ * FROM SALES; PQ6 PQ4 PQ5 PQ3 PQ7 PQ2 PQ8 PQ1 PQ9 SALES

  5. Parallel Partitioned Index Scan • Scans can be performed by partition for partitioned indexes and tables. The hint must contain the table name, index name, and degree of parallelism. SELECT /*+ PARALLEL_INDEX(c,ic,3)*/ * FROM customers c WHERE cust_city = 'MARSEILLE'; PQ1 Nonprefixed index IC Indexed column cust_city PQ2 PQ3

  6. Partitionwise Joins QS QS QS QS QS QS Partial partitionwise join Full partitionwise join Partitioned table Query slave Partition

  7. Parallel DDL • The parallel DDL statements for nonpartitioned tables and indexes are: • CREATE INDEX • CREATE TABLE ... AS SELECT • ALTER INDEX ... REBUILD • The parallel DDL statements for partitioned tables and indexes are: • CREATE INDEX • CREATE TABLE ... AS SELECT • ALTER TABLE ... MOVE PARTITION • ALTER TABLE ... SPLIT PARTITION • ALTER TABLE ... COALESCE PARTITION • ALTER INDEX ... REBUILD PARTITION • ALTER INDEX ... SPLIT PARTITION

  8. Creating Indexes in Parallel Nonpartitioned index Object Parallel execution server Index piece Data

  9. Creating Indexes in Parallel Global partitioned index Object Parallel execution server Index piece Data

  10. Creating Indexes in Parallel Local partitioned index Object Parallel execution server Index piece Data

  11. Parallel DDL: Example • DDL statements are parallelized by specifying a PARALLEL clause: CREATE BITMAP INDEX fk_sales_prod ON sales(prod_id) PARALLEL 16 LOCAL NOLOGGING;

  12. Parallel DML: Overview • Complement parallel query architecture by providing parallelization for: • INSERT • UPDATE • DELETE • MERGE • Useful when changing big tables

  13. Performance Benefits of Parallel DML • Can dramatically speed up DML transactions that can be parallelized Serial update: … UPDATE sales SET amount_sold=amount_sold*6.55957; … Parallel update: alter session enable parallel DML; UPDATE /*+PARALLEL(sales,12)*/ sales SET amount_sold=amount_sold*6.55957; … …

  14. Enabling Parallel DML • You must enable PDML. • The ALTER SESSION statement enables parallel DML mode: • Parallel queries are still parallelized, even if parallel DML is disabled. • The default mode of a session is DISABLE PARALLEL DML. PARALLEL DML ALTER SESSION ENABLE DISABLE PARALLEL DML ALTER SESSION FORCE PARALLEL n

  15. Parallel DML: Example MERGE /*+ PARALLEL(c,3) PARALLEL(d,3) */ INTO customers c USING diff_customers d ON (d.cust_id = c.cust_id) WHEN MATCHED THEN UPDATE SET c.cust_last_name = d.cust_last_name, c.cust_city = d.cust_city WHEN NOT MATCHED THEN INSERT (c.cust_id,c.cust_last_name) VALUES (d.cust_id,d.cust_last_name);

  16. When to Use Parallel DML • Scenarios where parallel DML is used include: • Refreshing large tables in a data warehouse • Creating intermediate summary tables • Using scoring tables • Updating historical tables • Running batch jobs

  17. Restrictions on Parallel DML • After a PDML statement modifies an object, it is no longer possible to query or modify this object in the same transaction (ORA-12838). • Limited integrity constraint support is provided. • Clustered tables are not supported. • PDML is not allowed: • On tables with enabled triggers • On remote objects • When the operation is part of a distributed transaction

  18. Tuning Parameters for Parallel Execution • The initial computed values of parallel execution parameters should be acceptable in most cases. • They are based on the values of CPU_COUNT and PARALLEL_THREADS_PER_CPU at database startup. • Oracle Corporation recommends that you use the default settings. • Manual tuning of parallel execution is more complex than using default settings. • Manual parallel execution tuning requires more attentive administration than automated tuning. • Manual tuning is prone to user-load and system resource miscalculations.

  19. Using Default Parameter Settings • Parameter • Defaults • Comments • PARALLEL_ADAPTIVE_ • MULTI_USER • TRUE • Throttles DOP requests to prevent system overload • PARALLEL_MAX_SERVERS • CPU_COUNT x PARALLEL_THREADS_PER_CPU x (1; 2 if PGA_AGGREGATE_TARGET > 0) x 5 • Maximizes the number of processes used by parallel execution • PARALLEL_EXECUTION_ MESSAGE_SIZE • 2 KB (port specific) • Increasing 4K to 8K improves parallel performance if SGA is large enough

  20. Balancing the Workload • To optimize performance, all parallel execution servers should have equal workloads. • For parallelization by block range or parallel execution servers, the workload is dynamically divided among the parallel execution servers. • By choosing an appropriate DOP, you can: • Minimize workload skew • Optimize performance

  21. Adaptive Multiuser and DOP • The adaptive multiuser feature adjusts the DOP on the basis of user load. • PARALLEL_ADAPTIVE_MULTI_USER set to: • TRUE improves performance in a multiuser environment (default) • FALSE is used for batch processing • PARALLEL_AUTOMATIC_TUNING has been deprecated in Oracle Database 10g. • Kept for backward compatibility only

  22. Resource Manager and the DOP • Resource plan: NIGHT_PLAN Resourceconsumer group OLTP DSS Allocation method parameters CPU = 25% Max Degree of parallelism = 2 CPU = 75% Max Degree of parallelism = 20

  23. Are There Execution Problems? • Use V$PQ_TQSTAT to find out whether there is data distribution skew or bad object statistics: • Contains traffic statistics between slaves at table queue level • Valid only when queried from the parallel session • Check I/O and CPU bound at the operating system level and decide on the parallelism. • Check for contention.

  24. Data Distribution and V$PQ_TQSTAT SELECT /*+PARALLEL*/ cust_city, sum(amount_sold) FROM sales s, customers c WHERE s.cust_id=c.cust_id GROUP BY cust_city; ... PX SEND QC (RANDOM) :TQ10002 P->S QC (RAND) HASH GROUP BY PCWP PX RECEIVE PCWP PX SEND HASH :TQ10001 P->P HASH HASH GROUP BY PCWP HASH JOIN PCWP PX RECEIVE PCWP PX SEND BROADCAST :TQ10000 P->P BROADCAST PX BLOCK ITERATOR PCWC TABLE ACCESS FULL CUSTOMERS PCWP PX BLOCK ITERATOR PCWC TABLE ACCESS FULL SALES PCWP

  25. Data Distribution and V$PQ_TQSTAT DFO_NUMBER TQ_ID SERVER_TYP PROCESS NUM_ROWS ---------- ----- ---------- ---------- --------- 1 0 Consumer P002 55500 1 0 Consumer P003 55500 1 0 Producer P000 55310 1 0 Producer P001 55690 1 1 Consumer P000 613 1 1 Consumer P001 589 1 1 Producer P002 602 1 1 Producer P003 600 1 2 Consumer QC 608 1 2 Consumer P001 309 1 2 Producer P002 299

  26. Using Other Dynamic Performance Views • General information: • V$FILESTAT • V$SESSTAT, V$SYSSTAT • Information about parallel execution: • V$PX_SESSION • V$PX_PROCESS • V$PX_PROCESS_SYSSTAT • V$PX_SESSTAT • V$PQ_SESSTAT • V$PX_BUFFER_ADVICE • TIMED_STATISTICS should be set to TRUE.

  27. Using V$PX_SESSION SELECT qcsid, sid, server_group "Group", server_set "Set", degree "Degree", req_degree "ReqDegree" FROM V$PX_SESSION ORDER BY 1,3,4; QCSID SID Group Set Degree ReqDegree ----- --- ----- --- ------ ---------- 9 9 9 7 1 1 2 2 9 21 1 1 2 2 9 18 1 2 2 2 9 20 1 2 2 2

  28. Using V$PX_SESSTAT SELECT qcsid, sid, server_group "Group", server_set "Set", name "StatName", VALUE FROM V$PX_SESSTAT A, V$STATNAME B WHERE A.statistic# = B.statistic# AND name = 'physical reads' AND value > 0 ORDER BY 1,3,4; QCSID SID Group Set StatName VALUE ----- --- ----- --- -------------- ----- 9 9 physical reads 3863 9 7 1 1 physical reads 2 9 21 1 1 physical reads 2 9 18 1 2 physical reads 2 9 20 1 2 physical reads 2

  29. Using V$PX_PROCESS SELECT * FROM V$PX_PROCESS; SERV STATUS PID SPID SID SERIAL ---- --------- ------ --------- ------ ------ P002 IN USE 16 16955 21 7729 P003 IN USE 17 16957 20 2921 P004 AVAILABLE 18 16959 P005 AVAILABLE 19 16962 P000 IN USE 12 6999 18 4720 P001 IN USE 13 7004 7 234

  30. Using V$SYSSTAT SELECT name, value FROM V$SYSSTAT WHERE UPPER(name) LIKE '%PARALLEL OPERATIONS%' OR UPPER(name) LIKE '%PARALLELIZED%' OR UPPER(name) LIKE '%PX%';

  31. Using V$SYSSTAT NAME VALUE ------------------------------------------- ----- queries parallelized 347 DML statements parallelized 0 DDL statements parallelized 0 DFO trees parallelized 463 Parallel operations not downgraded 28 Parallel operations downgraded to serial 31 Parallel operations downgraded 75 to 99 pct 252 Parallel operations downgraded 50 to 75 pct 128 Parallel operations downgraded 25 to 50 pct 43 Parallel operations downgraded 1 to 25 pct 12 PX local messages sent 74548 PX local messages recv'd 74128 PX remote messages sent 0 PX remote messages recv'd 0

  32. Tuning PDML • Using local striping for local indexes • Using global striping for global indexes • Increasing INITRANS for global indexes • Using NOLOGGING • Using multiple archivers • Using multiple DBWRs or I/O slaves

  33. Summary • In this lesson, you should have learned how to: • Determine when partitionwise joins are used • Use the ALTER SESSION command to force parallel operations • Create indexes in parallel • Set initialization parameters relating to parallel operations tuning • Query dynamic performance views related to parallel operations tuning • Tune PDML operations

  34. Practice 8: Overview • This practice covers the following topics: • Forcing parallelization of queries • Interpreting a partial parallel partitionwise join execution plan • Forcing parallelization of DMLs • Verifying the requested DOP for a parallel DDL operation

More Related