1 / 45

1, 2, 3… Scatter!

1, 2, 3… Scatter!. Getting Your Humpty-Dumpty Database in Order. Tom Bascom, White Star Software tom@wss.com. A Few Words about the Speaker. Tom Bascom ; Progress user & roaming DBA since 1987 VP, White Star Software, LLC

liana
Download Presentation

1, 2, 3… Scatter!

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 1, 2, 3…Scatter! Getting Your Humpty-Dumpty Database in Order. Tom Bascom, White Star Software tom@wss.com

  2. A Few Words about the Speaker • Tom Bascom; Progress user & roaming DBA since 1987 • VP, White Star Software, LLC • Expert consulting services related to all aspects of Progress and OpenEdge. • tom@wss.com • President, DBAppraise, LLC • Remote database management service for OpenEdge. • Simplifying the job of managing and monitoring the world’s best business applications. • tom@dbappraise.com

  3. “Fragmentation” vs “Scatter”

  4. Fragmentation $ proutil dbname –C dbanalys > dbname.dba … RECORD BLOCK SUMMARY FOR AREA "APP_FLAGS_Dat" : 95 ------------------------------------------------------- Record Size (B) -Fragments- Scatter Table Records Size Min Max Mean Count Factor Factor PUB.APP_FLAGS 1676180 47.9M 28 58 29 1676190 1.0 1.9 … “Fragmentation” is splitting records into multiple pieces.

  5. Fragmentation $ proutil dbname –C dbanalys > dbname.dba … RECORD BLOCK SUMMARY FOR AREA "APP_FLAGS_Dat" : 95 ------------------------------------------------------- Record Size (B) -Fragments- Scatter Table Records Size Min Max Mean Count Factor Factor PUB.APP_FLAGS 1676180 47.9M 28 58 29 1676190 1.0 1.9 … 10 additional fragments Beware! “Fragmentation” is splitting records into multiple pieces.

  6. Fragmentation Occurs When Record data is too big for the block: i.e. 16k of data going into a 4k block. Updated data needs more room to expand than is available. The “create limit” and the “toss limit” can be used to reserve more free space in blocks and control fragmentation. Progress will automatically “de-frag” when possible (10.1+).

  7. Scatter $ proutil dbname –C dbanalys > dbname.dba … RECORD BLOCK SUMMARY FOR AREA "APP_FLAGS_Dat" : 95 ------------------------------------------------------- Record Size (B) -Fragments- Scatter Table Records Size Min Max Mean Count Factor Factor PUB.APP_FLAGS 1676180 47.9M 28 58 29 1676190 1.0 1.9 … “Scatter” is a measure of the “sequentialness” of records.

  8. What is “Scatter Factor”? The “factor” does not care about any ordering – not even primary key. For type 1 areas it is a measure of how well the data fits into the minimum # of blocks that would be required to hold it with “distance” between blocks taken into account. For type 2 areas there is no distance penalty – but free space in a cluster can increase scatter. “Logically adjacent” isn’t really reported by dbanalys.

  9. Fragmentation and Scatter “Fragmentation” is splitting records into multiple pieces. “Scatter” is a measure of the “sequentialness” of records. The “scatter factor” that proutil reports might be better described as “density”. If you have two or more indexes at least one of them is probably scattered.

  10. Why is ThisImportant?

  11. Locality of Reference When data is referenced there is a high probability that it will be referenced again soon. If data is referenced there is a high probability that “nearby” data will be referenced soon. Locality of reference is why caching exists at all levels of computing.

  12. How Does Progress Help? • Temporal Locality (Will be reused “soon”) • -B (LRU Chain) • -mmax • -Bt • Spatial Locality (“nearby” data will be accessed) • Type 2 storage areas • -B2 (no LRU) • Memory mapped prolib

  13. Cache Effectiveness

  14. Logical Scatter

  15. Definition Logical Scatter is the probability that records in a given logical order are also in “physical” order (in the same block). Each index has its own ordering and thus its own logical scatter. It is very unlikely that more than one index will be well ordered. It is quite possible that all indexes might be scattered.

  16. Type 1 Storage Area

  17. Type 2 Storage Area

  18. Tangent… The preceding slides should be all you need to see in order to be convinced that type 1 areas are a bad place to be putting data. The schema area is always a type 1 area. Should it have data in it?

  19. How to Determine “Logical Scatter”? You could read the whole database… Multiple times… (Because every index must be considered) -or- For each table randomly choose a record. For each index of that table find the NEXT record. Is it in the same block? Lather, Rinse and Repeat.

  20. Type 2 Storage Area Id = 100% Name = 25% City 19%

  21. Which Index?

  22. 4GL Index Selection compile cust.pxref“cust.xrf” /* cust.p */ for each customer no-lock: display custNum. end. cust.p1 COMPILE cust.p cust.p1 CPINTERNAL ISO8859-1 cust.p1 CPSTREAM ISO8859-1 cust.p1 STRING "Customer" 8 NONE UNTRANSLATABLE cust.p1 SEARCH sports2000.Customer CustNumWHOLE-INDEX cust.p2 ACCESS sports2000.Customer CustNum cust.p2 STRING ">>>>9" 5 NONE TRANSLATABLE FORMAT cust.p3 STRING "CustNum" 8 LEFT TRANSLATABLE cust.p3 STRING "CustNum" 7 NONE UNTRANSLATABLE cust.p3 STRING "--------" 8 NONE UNTRANSLATABLE

  23. Amdahl’s Law The performance enhancement possible with a given improvement is limited by the fraction of the execution time that the improved feature is used.

  24. Compile Time is Not Enough Dynamic Queries SQL-92 Cost Based Optimizer Unreached Code Rarely Run Code Widespread, but Low Impact Code

  25. Execution Time Index Usage for each _indexStatno-lock: find _index no-lock where _index._idx-num = _indexStat._indexStat-id no-error. if available( _index ) then do: find _file no-lock where recid( _file ) = _index._file-recid no-error. display _indexStat._indexStat-id "*" when ( _file._prime-index = recid( _index )) "u" when ( _index._unique = true ) _file._file-name when available _file _index._index-name when available _index _indexStat._indexStat-read . end. end.

  26. Execution Time Index Usage

  27. VST Note define variable t as integer no-undo label “Tables”. define variable i as integer no-undo label “Indexes”. for each _file no-lock where _hidden = no: t = t + 1. end. for each _index no-lock: i = i + 1. end. display t i. -tablerangesize 1000 -indexrangesize 3000 The default db settings only collect statistics for the first 50 tables and indexes. To fix this:

  28. Case Study

  29. Logical Scatter Case Study • A process reading approximately 1,000,000 records. • An initial run time of 2 hours. • 139 records/sec. • Un-optimized database.

  30. Baseline 4k DB Block Type 1 Area • -B 25,000 • Hit Ratio 95% • 19,208 IO ops • Run time 2 hours

  31. Round 1 – Increase Big B 4k DB Block Type 1 Area • -B 100,000 • Hit Ratio 98% • 9,816 IO ops • Run time 60 minutes

  32. Round 2 – Increase Some More 4k DB Block Type 1 Area • -B 200,000 • Hit Ratio 99% • 6,416 IO ops • Run time 40 minutes

  33. Restructure DB Dump & Load Convert to 8KB DB Blocks Convert to Type 2 Storage Areas

  34. Round 3 – Back to Baseline –B 8k DB Block Type 2 Areas • -B 12,500 • Hit Ratio 95% • 9,417 IO ops • Run time 55 minutes

  35. Round 4 – Bump Big B 8k DB Block Type 2 Areas • -B 50,000 • Hit Ratio 98% • 4,746 IO ops • Run time 30 minutes

  36. Round 5 – Big B … Again 8k DB Block Type 2 Areas • -B 100,000 • Hit Ratio 99% • 3,192 IO ops • Run time 20 minutes

  37. Are We Done? 8k DB Block Type 2 Areas • The most used index is not the most sequential index!

  38. Restructure DB • Dump & Load • Dump Table 2 using the most used index: • t2_idx1 • Load Normally

  39. Why Not? • Return on Investment: • Pickups from improving %SEQ are less than those from improving Hit Ratio. • That last 15% is a drop in the bucket compared to the 6x improvement already gained. • Expected improvement would be about 2% -- of 20 minutes. Or around 24 seconds.

  40. However… • At low buffer hit ratios (95% or lower): • Restructuring to favor the most used index results in a 60% improvement in time. • And the hit ratio improves to 99.75%. • By eliminating 95% of the disk IO ops (112,247 -> 5196). • On the other hand… the system in question has grown again and it may now be worth revisiting.

  41. Conclusion Type 2 Storage Areas improve “logical scatter”. Addressing “logical scatter” can be a powerful performance improvement technique. Addressing “logical scatter” can be an alternative to increasing –B in environments where shared memory is constrained.

  42. Questions?

  43. Questions? Should I use USE-INDEX to force a “well ordered” index? Why might scatter grow over time? I have two (or more) conflicting, but very important, needs. What can I do?

  44. Thank You!

More Related