
Improve the Speed and Performance



Presentation Transcript


  1. Samar Desai (samadesa@in.ibm.com) Call-in details: Improve the Speed and Performance

  2. Talk Outline • Query Processing • Index Self Join • Update Statistics • Sorting • Index Optimization • Partial Column • Forest of Trees • Fragmentation Overview • Fragment by expression • Using Hash Functions • New Fragmentation techniques • Miscellaneous • External Tables • Pre-Load C-UDRs

  3. Query Processing

  4. Index Self Join

  5. Purpose of Index Self Join • Improve performance of range-based index scans whose lead keys are highly duplicated • Improve use of the subsequent parts of a composite index when used with range expressions

  6. Situation • Composite index defined on columns (DayOfWeek, FamilySize, part, partdesc) • (DayOfWeek, FamilySize) have many duplicates • Self joins are only considered on columns that have distributions Query: SELECT * FROM tab WHERE (DayOfWeek >= 1 AND DayOfWeek <= 5) AND (FamilySize >= 2 AND FamilySize <= 4) AND (part >= 100 AND part <= 102)

  7. Prior to Version 11 Lower filter: DayOfWeek >= 1 Upper filter: DayOfWeek <= 5 The following filters are applied only after reading the data row: (FamilySize >= 2 AND FamilySize <= 4) AND (part >= 100 AND part <= 102)

  8. Index Self Join • Original query filters: (DayOfWeek >= 1 AND DayOfWeek <= 5) AND (FamilySize >= 2 AND FamilySize <= 4) AND (part >= 100 AND part <= 102) • Logically transformed query filters: DayOfWeek = 1 AND FamilySize = 2 AND part >= 100 AND part <= 102 UNION DayOfWeek = 1 AND FamilySize = 3 AND part >= 100 AND part <= 102 UNION DayOfWeek = 1 AND FamilySize = 4 AND part >= 100 AND part <= 102 UNION DayOfWeek = 2 AND FamilySize = 2 AND part >= 100 AND part <= 102 UNION DayOfWeek = 2 AND FamilySize = 3 AND part >= 100 AND part <= 102 UNION … • The plan becomes a nested-loop join of an index scan on (DayOfWeek, FamilySize) with an index scan on (DayOfWeek, FamilySize, part)

  9. Using Index Self Join Lead keys: DayOfWeek, FamilySize Lower filter: DayOfWeek = DayOfWeek AND FamilySize = FamilySize AND part >= 100 Upper filter: part <= 102 Scanned regions: DayOfWeek = 1, FamilySize = 2, part >= 100 AND part <= 102 • DayOfWeek = 1, FamilySize = 3, part >= 100 AND part <= 102 • DayOfWeek = 1, FamilySize = 4, part >= 100 AND part <= 102 • DayOfWeek = 2, FamilySize = 2, part >= 100 AND part <= 102 • …

  10. Setup of Index Self Join Example create raw table customer ( dayofweek integer, familysize integer, part integer, partdesc char(60) ) lock mode row; INSERT INTO customer SELECT mod(A.tabid,7), mod(A.tabid,4), value+A.tabid, tabname FROM sysmaster:systables A, sysmaster:sysshmhdr B, sysmaster:syscolumns WHERE B.number=4; create index ix1 on customer (dayofweek,familysize,part,partdesc); insert into customer values (1,3,77,"JUICE"); insert into customer values (2,3,77,"JUICE"); insert into customer values (3,3,78,"MILK"); insert into customer values (4,3,78,"MILK"); insert into customer values (5,3,78,"MILK"); insert into customer values (1,4,77,"JUICE"); insert into customer values (2,4,77,"JUICE"); insert into customer values (3,4,78,"MILK"); insert into customer values (4,4,78,"MILK"); insert into customer values (5,4,78,"MILK");

  11. Sample Queries • The only difference is the directive select {+ AVOID_INDEX_SJ(customer ix1) } count(*) from customer where dayofweek >= 1 and dayofweek <= 5 and familysize >= 3 and familysize <= 4 and part > 76 and part < 80; select count(*) from customer where dayofweek >= 1 and dayofweek <= 5 and familysize >= 3 and familysize <= 4 and part > 76 and part < 80;

  12. Retrieving Query Results SELECT sql_runtime, sql_bfreads, sql_pgreads, sql_statement[1,30] FROM sysmaster:syssqltrace WHERE sql_sid = DBINFO('sessionid') AND lower(sql_tablelist) MATCHES "*customer*";

sql_runtime sql_bfreads sql_statement
1.0920813   303236      select {+ AVOID
0.3031994   92435       select count(*)

  13. Improving Results • Add distributions for the second column in the index update statistics high for table customer(familysize) distributions only;

sql_runtime sql_bfreads sql_statement
1.0920813   303236      select {+ AVOID
0.3031994   92435       select count(*)
0.0006977   142         select count(*)

  14. Automatic Read Ahead • What is read ahead? • Sequential scan read ahead • Index scan read ahead • Index/data scan read ahead • Benefits of automatic read ahead • Automatically detects when read ahead would improve the performance of a query • Automatically detects when read ahead would not improve the performance of a query • Read ahead daemon • A query detects that it waited on I/O, so it submits a read ahead request • Read ahead starts from the point in the scan where the I/O wait was detected • Leaf read ahead for index/data scans
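Automatic read ahead is typically controlled through an onconfig parameter; the fragment below is an illustrative sketch, not a tuning recommendation, and the exact parameter name and format should be verified against your server version's documentation:

```
# Illustrative onconfig fragment (assumed parameter AUTO_READAHEAD):
# mode 0 = disabled, 1 = standard automatic read ahead,
# second value = number of pages to read ahead per request
AUTO_READAHEAD 1,128
```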

  15. What are Light Scans (Recap)? • A sequential scan of large tables that reads pages in parallel from disk and stores them in private buffers in virtual memory • Advantages of light scans for sequential scans: • Bypass the overhead of the buffer pool when many pages are read • Prevent frequently accessed pages from being forced out of the buffer pool • Transfer larger blocks of data in one I/O operation (64K/128K, platform dependent) • Conditions to invoke light scans: • The optimizer chooses a sequential scan of the table • The isolation level is Dirty Read, or there is a shared lock on the table • Monitor using onstat -g scn
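As a sketch of the invocation conditions above, a session can make itself eligible for light scans by using Dirty Read isolation (the table and column names here are illustrative):

```sql
-- Dirty Read (or a shared table lock) is one of the conditions
-- under which the server may choose a light scan
SET ISOLATION TO DIRTY READ;

-- A full sequential scan of a large table is then a light-scan candidate
SELECT SUM(amount) FROM big_sales_history;

-- Verify with: onstat -g scn  (Scan Type column shows "Light")
```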

  16. Monitoring Scans • onstat -g scn • Displays the status of all scans using BATCHEDREAD

RSAM sequential scan info
SesID Thread Partnum Rowid Rows Scan'd Scan Type Lock Mode
16    96     20002   8301  2708        Buffpool  SLock+Test
16    97     30002   5b10  1898        Light     Table

RSAM index scan info
SesID Thread Partnum Scan Type Lock Mode
15    129    20008             SLock+Test
Start Key GTEQ :5:  Stop Key GT :10:  Current key :6:
Current position: buffp 0 pagenum 3 slot 6 rowoff 4 flags 0

  17. Update Statistics

  18. Improved Optimizer Statistics • Auto Statistics on index creation • Improved Tracking of Update Statistics • Improved Temp Table Statistics • Auto Update Statistics

  19. Create Index with Statistics • Statistics (i.e. update statistics high and low) are created automatically when an index is created, either implicitly or explicitly. • The sample size is the data seen during the static phase of the online index build; catch-up data is ignored. • Leverages the sorted data produced by create index • Little to no additional time is added to the create index

  20. Tracking of Update Statistics • Low / Statistics • A column added to systables tracks the last time update statistics low was run SELECT ustlowts FROM systables WHERE tabname = 'foo';

ustlowts
2006-09-26 11:42:50.00000

• Medium & High / Distributions • Four new columns added to sysdistrib track distribution creation SELECT constr_time, smplsize, rowssmpld, ustnrows FROM sysdistrib WHERE colno = 1 AND tabid = (SELECT tabid FROM systables WHERE tabname = 'foo');

constr_time               smplsize rowssmpld ustnrows
2007-09-07 13:43:59.00000 0.00     66.00000  66.00000

  21. Temp Table Statistics Improvements • Users are no longer required to run update statistics low on temp tables. • The number of rows and pages is updated every time the temp table's data dictionary entry is accessed. • Adding an index to a temp table automatically creates distributions and statistics for it.
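A minimal sketch of the temp-table behavior described above (the table and column names are illustrative):

```sql
-- No explicit UPDATE STATISTICS LOW needed: row/page counts are
-- refreshed whenever the temp table's dictionary entry is accessed
SELECT empnum, empname
  FROM employee
  INTO TEMP tmp_emp WITH NO LOG;

-- Creating an index also builds distributions and statistics
-- for the temp table automatically
CREATE INDEX tmp_emp_ix ON tmp_emp(empnum);
```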

  22. Improving Update Statistics • Fragment-level statistics • When attaching a new fragment, only the new fragment needs to be scanned, not the entire table • Only fragments whose statistics have expired are scanned • Statistics expiration policies are defined at the table level • Detailed tracking of modifications to each table and fragment • Tables or fragments whose statistics have not expired are skipped automatically • ANSI databases implicitly commit after each index/column statistic is updated

  23. Table Level Optimizer Statistics Policies CREATE TABLE …. STATLEVEL [ TABLE | FRAGMENT | AUTO ] STATCHANGE 1...100 • STATLEVEL • Defines the granularity of statistics created for a table • STATCHANGE • Percentage of the table modified before table/fragment statistics are considered expired
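A hedged sketch of the table-level policy syntax, reusing the fragmented employee table from the fragmentation section; the clause placement may vary by version, so verify against your server's CREATE TABLE documentation:

```sql
CREATE TABLE employee (
    empnum  SERIAL,
    empname CHAR(50)
)
FRAGMENT BY EXPRESSION
    empnum <= 5000 IN dbspace1,
    empnum >  5000 IN dbspace2
STATCHANGE 10          -- statistics expire after 10% of rows change
STATLEVEL FRAGMENT;    -- keep statistics per fragment, not per table
```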

  24. New Syntax for "Smarter" Update Statistics UPDATE STATISTICS FOR TABLE …. [ AUTO | FORCE ] • AUTO • Only tables or fragments having expired statistics are re-calculated • FORCE • All indexes and columns listed in the command have their statistics re-calculated • The default behavior is set by the AUTO_STAT_MODE parameter • Enabled by default (i.e. AUTO)
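For example, using the customer table from the earlier slides (a sketch of the new keywords):

```sql
-- Re-calculate only statistics that have expired under the
-- table's STATCHANGE policy
UPDATE STATISTICS MEDIUM FOR TABLE customer AUTO;

-- Unconditionally re-calculate, regardless of expiration
UPDATE STATISTICS HIGH FOR TABLE customer (familysize) FORCE;
```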

  25. Customer Comment about Implementing DS_NONPDQ_QUERY_MEM …CPU load average on the server dropped from 90% during peak to less than 50%. I also tracked the actual CPU seconds that oninit was using before and after the change. During peak times the oninit's for this instance was using 5 out of the 8 physical cpu's. After the change we consume less than 3 cpu's.

  26. Large Sort Tasks • PDQ • Enable PDQPRIORITY to allow more threads and resources • Ensure DS_TOTAL_MEMORY in the onconfig file is set properly • Make sure PSORT_NPROCS is set to the number of CPU VPs • Ensure DS_MAX_QUERIES is set • If large sorting batch jobs run at night, consider dynamically increasing these at the start of the evening and resetting them to their original values in the early morning with onmode (-Q for DS_MAX_QUERIES, -M for DS_TOTAL_MEMORY)
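A session-level sketch of the PDQ settings mentioned above (the priority value and table name are illustrative, not tuning advice):

```sql
-- Request a share of PDQ resources for this session's large sort
SET PDQPRIORITY 50;

SELECT * FROM big_table ORDER BY col1, col2;

-- Return to non-PDQ behavior afterwards
SET PDQPRIORITY 0;
```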

  27. Selecting only the First Set of Rows • First-rows queries can be optimized by not requiring the entire result set to be returned • SELECT FIRST N • Limits the number of rows returned from the query • SKIP & FIRST • Returns rows 8 through 17 SELECT FIRST 10 * FROM employees SELECT SKIP 7 FIRST 10 * FROM employees
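Combining SKIP and FIRST with ORDER BY gives simple pagination; a sketch using the employees table above (the sort column is illustrative):

```sql
-- Page 3 of a result set with 10 rows per page:
-- skip the first 20 rows, return the next 10 (rows 21-30)
SELECT SKIP 20 FIRST 10 *
  FROM employees
 ORDER BY empname;
```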

  28. First Rows Optimizer Directive • Directs the optimizer to choose the query plan that returns the first record as soon as possible • Always prefers pipelined operations over blocking operations • Pipelined operations • Nested-loop join • Index scans • Blocking operations • Sorts • Hash joins SELECT {+ first_rows } * FROM employees

  29. First Rows Optimization • Very fast to retrieve the first few rows • Processing the entire query can take more resources and time • Set in any of: ONCONFIG parameter, application statement, optimizer directive

  30. Index Optimization

  31. Partial Column Index • Ability to create an index on the first N bytes of a string column • Benefits • Smaller index size, fewer pages • Faster searching of long character strings that are unique in their leading bytes create index index1 on tab1(col1(3));
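A slightly fuller sketch of a partial-column index (the table definition and key length are illustrative):

```sql
CREATE TABLE tab1 (
    col1 VARCHAR(255),
    col2 INTEGER
);

-- Index only the first 3 bytes of col1: a smaller index that is
-- still effective when values differ in their leading bytes
CREATE INDEX index1 ON tab1(col1(3));
```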

  32. Create Index with a Specific Extent Size • The create index syntax has been enhanced to support specifying an extent size for indexes • Better sizing and utilization • The default index extent size is: (index key size / data row size) * data extent size CREATE INDEX index_1 ON tab_1(col_1) EXTENT SIZE 32 NEXT SIZE 32;

  33. Creating Constraints without an Index CREATE TABLE parent(c1 INT PRIMARY KEY CONSTRAINT parent_c1, c2 INT, c3 INT); CREATE TABLE child(x1 INT, x2 INT, x3 VARCHAR(32)); ALTER TABLE child ADD CONSTRAINT (FOREIGN KEY(x1) REFERENCES parent(c1) CONSTRAINT cons_child_x1 INDEX DISABLED); • Saves the overhead of the index for small child tables • Bang for the buck • When the ratio of rows between the parent and the child exceeds 10,000 • Removes indexes with a large number of duplicates

  34. Benefits of Constraints without an Index (Parent/Child) • Every child must have a parent • Parent: index to enforce the primary key • Child: index to enforce the referential constraint

  35. B-Tree Index The figure above shows one root node; access to the underlying twig and leaf pages is through this page, which is where mutex contention can occur

  36. New Index Type "Forest of Trees" • Traditional B-tree indexes suffer from performance issues when many concurrent users access the index • Root node contention can occur when many sessions read the same index at the same time • The depth of a large B-tree index increases the number of levels, which results in more buffer reads • Forest of Trees (FOT) reduces some of these B-tree index issues: • The index is larger but often not deeper • Reduces the time for index traversals to leaf nodes • The index has multiple subtrees (root nodes) called buckets • Reduces root node contention by enabling more concurrent users to access the index

  37. Forest of Trees Index (FOT Index) • Reduces contention on an index's root node • Several root nodes • Some B-tree functionality is NOT supported • max() and min() create index index_2 on TAB1( C1, C2 ) hash on ( C1 ) with 3 buckets;

  38. FOT - Determining Use - ONCHECK & SYSMASTER • Check oncheck -pT information • Check the sysmaster database: select nhashcols, nbuckets from sysindices

                Average   Average
Level  Total  No. Keys  Free Bytes
-----  -----  --------  ----------
    1    100        27       21350
    2    655        15        4535
-----  -----  --------  ----------
Total    755        42       25885

There are 100 level-1 buckets (root nodes)

  39. When to use a FOT Index • Lead index column(s) used with equality predicates • Contention on the root node of an index • onstat -g spi | sort -nr

Spin locks with waits:
Num Waits  Num Loops  Avg Loop/Wait  Name
121434     8283634    68.22          fast mutex, 0:bf[421226] 0x30000b 0x83d61000
9362       644934     68.89          fast mutex, 0:bf[388111] 0x300009 0x7fcb3800
7864       519351     66.04          fast mutex, 0:bf[410993] 0x30000a 0x82964800
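When root-node contention like this is observed, an existing conventional index can be rebuilt as a forest of trees; a hedged sketch using the FOT syntax shown on the earlier slide (index, table, and bucket count are illustrative):

```sql
-- Rebuild a contended conventional index as an FOT index
DROP INDEX index_2;

-- Hash on the lead column (used with equality predicates) and
-- split the root into multiple buckets to spread contention
CREATE INDEX index_2 ON tab1(c1, c2) HASH ON (c1) WITH 16 BUCKETS;
```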

  40. Fragmentation Overview • Introduction to Fragmentation • Defining Fragmented Tables • Altering/Attach/Detach • Monitoring/Guidelines • New Fragmentation Schemes

  41. What is Fragmentation? • Also called “Table Partitioning” • Fragmentation is a database server feature that allows you to • Distribute data for a table across separate dbspaces or partitions • Control where data is stored at the row level • Transparent to end users and client applications • No changes for applications whether the table is fragmented or not fragmented • Optimizer automatically takes advantage of the fragment expression

  42. Data in a non-fragmented table Jan Feb Mar

  43. Fragment by Expression Data in 3 table fragments/partitions/data ranges. Jan data Feb data Mar data

  44. Create a Fragmented Table • This syntax (fragments/partitions in different dbspaces) works on all IDS versions. CREATE TABLE employee ( empnum SERIAL, ... ) FRAGMENT BY EXPRESSION empnum <= 2500 IN dbspace1, empnum <= 5000 AND empnum > 2500 IN dbspace2, empnum > 5000 IN dbspace3 EXTENT SIZE 100 NEXT SIZE 60 LOCK MODE ROW;

  45. Using Hash Functions CREATE TABLE employee ( empnum serial, empname char(50) ) FRAGMENT BY EXPRESSION partition p1 mod(empnum,3) = 0 in rootdbs, partition p2 mod(empnum,3) = 1 in rootdbs, partition p3 mod(empnum,3) = 2 in rootdbs; • Hash expression allows fragment elimination when there is an equality search (including inserts and deletes). • Useful when data is accessed via a particular column but the distribution of values within the column is not known.

  46. Using Hash Functions • Only 2/3 of the table is scanned • Scans a single fragment at a time QUERY: (OPTIMIZATION TIMESTAMP: 09-12-2007 17:37:43) ------ select * from employee where empnum in( 6, 7) Estimated Cost: 3 Estimated # of Rows Returned: 1 1) informix.employee: SEQUENTIAL SCAN (Serial, fragments: 0, 1) Filters: informix.employee.empnum IN ( 6, 7 )

  47. Using Hash Functions with PDQ • One fragment is eliminated • Two fragments are scanned simultaneously QUERY: (OPTIMIZATION TIMESTAMP: 09-12-2007 17:37:43) ------ select * from employee where empnum in( 6, 7) Estimated Cost: 3 Estimated # of Rows Returned: 1 1) informix.employee: SEQUENTIAL SCAN (Parallel, fragments: 0, 1) Filters: informix.employee.empnum IN ( 6, 7 )

  48. Why Fragment? • Fragment Elimination • Improve query performance. • A large table can perform like a smaller one. • Larger Tables • Internally, each table fragment is treated like a table. So, some limits/properties that apply to a table (like maximum number of extents) apply to each fragment/partition. • Balanced I/O • If using PDQ, parallel scans, inserts, joins, sorts, aggregates, groups. (DSS systems)

  49. Indexes on Fragmented Tables • If you do not want to fragment your index, specify the dbspace to put the index in CREATE INDEX <idx1>... IN DBSPACE <dbspace1>; • If a dbspace is NOT specified, the index will be fragmented following the fragmentation scheme of the table -- BAD if the table is fragmented by ROUND ROBIN.

  50. Create Index Statement • Index fragmented by expression: CREATE INDEX idx1 ON table1(col1) FRAGMENT BY EXPRESSION col1 < 10000 IN dbspace1, col1 >= 10000 IN dbspace2; • Index not fragmented: CREATE INDEX idx1 ON table1(col1) IN dbspace1; • Index follows the table's fragmentation scheme: CREATE INDEX idx1 ON table1(col1);
