Db2 10 hash access access path or collision course
Download
1 / 40

DB2 10 Hash Access: Access Path or Collision Course - PowerPoint PPT Presentation


  • 136 Views
  • Uploaded on

DB2 10 Hash Access: Access Path or Collision Course?. Donna Di Carlo BMC Software Session Code: A13 Wednesday, 16 November 2011 | Platform: DB2 for z/OS. Objectives. An introduction to the new DB2 10 Hash Access See how a hashed table and index are physically organized 

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'DB2 10 Hash Access: Access Path or Collision Course' - juancarlos


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Db2 10 hash access access path or collision course

DB2 10 Hash Access: Access Path or Collision Course?

Donna Di Carlo

BMC Software

Session Code: A13

Wednesday, 16 November 2011 | Platform: DB2 for z/OS


Objectives
Objectives

  • An introduction to the new DB2 10 Hash Access

  • See how a hashed table and index are physically organized 

  • Determine if Hash Access is suitable for your application

  • Some rules of thumb on how to define hashed tables and indexes

  • What happens to performance when hashed objects are defined correctly; and worse, what happens when they are defined incorrectly


Db2 access paths
DB2 Access Paths

  • DB2 has several different methods to access data

  • DB2 10 has added hash access to its bag of tricks

    • Single row access

    • Unique non-updateable key

    • Fixed hash space size

Table space Scan

Rid List Scan

Index Only

Prefetch

ORGANIZE BY HASH

UNIQUE (column names) HASH SPACE size K|M|G



Implementing hash access
Implementing Hash Access

  • Create a new table with hash organization

    • Need to specify hash space size

    • Unique key can be comprised of multiple columns, but the length must be less than 256

    • Entire hash area is allocated, plus 1% - 3% for overflow

    • Recommendation

      • Set ZPARM IMPDSDEF=YES so the overflow index is instantiated during the CREATE – avoids penalty for first insert into the overflow


Implementing hash access cont
Implementing Hash Access (cont.)

  • Alter existing table to hash organization

    • Criteria

      • Must be UTS

      • Must contain a unique non-updatable key

      • Must have Reordered Row Format

      • Table size should be relatively static

      • Can’t be MEMBER CLUSTER

      • Can’t have clustering index


Implementing hash access cont1
Implementing Hash Access (cont.)

  • Alter existing table to hash organization (cont.)

    • Table space set in AREOR status

    • Overflow index is created and set in RBDP status

    • Updates and deletes allowed

    • Keys cannot be updated

    • Inserts allowed after overflow index is rebuilt

      • Not a sparse index at this time

    • Must REORG table space to instantiate hash access

      • SHRLEVEL NONE and REORG by part are not supported

    • REBIND


Implementing hash access cont2
Implementing Hash Access (cont.)

  • REORG

    • AUTOESTSPACE(YES/NO)

      • Estimate the hash space size

      • Targets about 8% - 15% overflow

      • Catalog value HASHSPACE remains unchanged

      • Catalog value HASHDATAPAGES updated with new allocation

    • FREEPAGE is ignored

    • PCTFREE is ignored for AUTOESTSPACE(NO)

    • PCTFREE has a new meaning for AUTOESTSPACE(YES)

      • Determines target hash space, then increases it by PCTFREE

      • PCTFREE is ignored for the overflow area


Hash organization cont
Hash Organization (cont.)

  • REORG (cont.)

    • Recommendation

      • Even though AUTOESTSPACE(YES) makes sizing the hash area easy, it’s not fool proof

        • Overflow estimate is less accurate for varying length rows

        • If DSSIZE is not large enough to accommodate the hash area and overflow, SQL and utilities can fail for a PBR

        • It’s expensive for overflows when the hash and overflow are in different data sets of a PBG


Implementing hash access cont3
Implementing Hash Access (cont.)

  • LOAD

    • Does not sort input

    • Hashes each row

    • FREEPAGE and PCTFREE are ignored

    • SLOWWWWWWWW

    • Recommendation

      • Create non-hashed table

      • LOAD

      • ALTER ADD HASHED

      • REORG

      • Instead of creating a new table, you can drop hash organization on an existing table, but…UH OH!


Implementing hash access cont4
Implementing Hash Access (cont.)

  • ALTER TABLE DROP HASH

    • Overflow index is immediately dropped

    • Table space is placed in REORP status – must REORG

    • Cannot recover to a point prior to the DROP

    • Recommendation

      • When defining the unique key think carefully - changing it requires a DROP

      • If you drop hash altogether, consider adding an index




Bird s eye view of hashing select cont
Bird’s Eye View of Hashing – SELECT (cont.)

  • Recommendation

    • Tables with multi-level indexes are good candidates for Hash Access

    • Keep the overflow index small – it will increase the odds of finding the index pages in the buffer pool


A bird s eye view of hashing insert
A Bird’s Eye View of Hashing - INSERT

  • Unique key and hash algorithm determine page number and page location

  • If another row is at this location, row is added to the collision chain, also known as the Anchor ID Map

    • There can be several collisions on a single chain

  • An overflow does not occur until the page is full

    • The row is stored in the overflow area and the overflow index is updated



A bird s eye view of hashing insert cont1
A Bird’s Eye View of Hashing – INSERT (cont.)

  • Recommendation

    • Tables with high insert activity may not be good candidates for Hash Access


Bird s eye view of hashing update
Bird’s Eye View of Hashing - UPDATE

  • Unique key and hash algorithm determines page number and page location

  • If row does not exist on page, access the overflow index

  • If row does exist on page

    • If row length decreased or stayed the same, update in place

    • If row length increased, move row to another place on the page

    • If row length increased, but no more room on page

      • Replace row with a pointer record

      • Move indirect reference to overflow

      • No need to index indirect reference


Bird s eye view of hashing update cont
Bird’s Eye View of Hashing – UPDATE (cont.)

  • Recommendation

    • Tables with high update activity and varying length rows may not be good candidates for Hash Activity

      • VARCHAR, VARGRAPHIC…

      • Compression



Bird s eye view of hashed data range predicate cont
Bird’s Eye View of Hashed Data – Range Predicate (cont.)

  • Recommendation

    • Tables that have queries with range predicates are poor choices for hashed tables

      • What used to be clustered will now be random

    • If you must use a query with a range predicate, consider creating an index to access the data

      • You may be able to force index access, but the query may still require a RID sort


Hash organization1
Hash Organization

  • Header page

    • HPGHASHMOD – number of pages in the Hash Area

      • Rounded down to the nearest prime number

    • HPGHASHFIXIDMAP – number of hash Anchor ID Map entries

      • More on that later

    • HPGLASTHASHPG_P - page number of last page in hash area


Hash organization cont1
Hash Organization (cont.)

  • Space Map

    • Hashed inserts do not have to use the space map to determine if a page has room

      • Hash Area segment entries are not chained - SEGNEXT always zero

      • Free space bit indicators are still maintained

    • During creation, all space maps are formatted for the Hash Area

    • There will always be a dictionary and system page segments, even if the space is not compressed or versioned

      • If more system pages are needed, extra segments will be allocated in the overflow


Data overflow page
Data/Overflow Page

  • The first x14 bytes contain header information

    • Page number, free space management, number of rows

  • The RID map is at the bottom of the page

    • Each RID position contains an offset to the row in the page


Hash home page
Hash Home Page

  • PGCOMB contains hash home page bit

  • Hash header

    • PGAPBEGIN - pointer to beginning of data rows

  • Anchor ID Map contains collision chains

    • The number of anchors, HPGHASHFIXIDMAP, is stored in the header page

    • The number of IDs vary from table to table based on page and row size


Non hash data row
Non-Hash Data Row

  • RID Map

    • Contains offset of row location

    • Can have up to 255 RID values on one page

  • Row header

    • PGSFLAGS

      • Pointer

      • Indirection

      • Compressed

      • Versioned

    • PGSLTH – row length

    • PGSOBD - OBID of table

    • PGSID - RID map ID or version number


  • Hash row
    Hash Row

    • PGSOBD is not needed for UTS

      • One table per table space

      • Hash home pages reuse this value to point to the hash anchor ID map or the next row in the collision chain

      • The first bit in the second byte indicates if this is the last entry in the chain


    Hash algorithm
    Hash Algorithm

    • Key is passed to the hash algorithm

      • Returns a double word integer - HASHEDINT

    • Page Number = HASHEDINT % HPGHASHMOD

    • Anchor ID = HASHEDINT % HPGHASHFIXIDMAP

      • If a row already exists at this anchor, the new row is chained



    Anchor id map and overflow
    Anchor ID Map and Overflow

    • Second byte of anchor ID is updated when overflow occurs

      • It doesn’t necessarily mean there was a collision


    Real time statistics rts
    Real Time Statistics (RTS)

    • SYSTABLESPACESTATS.DATASIZE

      • Total number of data bytes in the table

    • SYSTABLESPACESTATS.TOTALROWS

      • Total number of rows in the table

    • SYSINDEXSPACESTATS .TOTALENTRIES

      • Total number of overflow rows


    Real time statistics rts1
    Real Time Statistics (RTS)

    • SYSTABLESPACESTATS. HASHLASTUSED

      • Last time the table was accessed by hash

    • SYSTABLESPACESTATS. REORGHASHACCESS

      • Number of hash accesses

    • SYSTABLESPACESTATS. REORGSCANACCESS

      • Number of table space scans


    Runstats
    RUNSTATS

    • No new statistics collected

    • Columns for sparse index not valid (contains -2)

      • SYSIBM.SYSINDEXPART

        • LEAFDIST

        • FAROFFPOSF

        • NEAROFFPOSF

      • SYSIBM.SYSCOLUMNS

        • COLCARDF

        • HIGH2KEY

      • SYSIBM.INDEXES

        • CLUSTERRATIOF

    • LEAFNEAR

    • LEAFFAR 

    • LOW2KEY

    • LEAFDIST


    Explain plan table
    Explain & Plan Table

    • ACCESSTYPE

      • H – hash access

      • HN – hash access using an IN predicate

        • Some SQL may be converted by DB2 to IN predicate

      • MH – multiple index scan using the hash overflow index


    Sizing your hash area
    Sizing Your Hash Area

    • Delicate balance between storage utilization and GETPAGE activity

      • Slow response and high CPU if more than one GETPAGE is needed

      • How much are you willing to pay for storage to get excellent response?

    • All of this can be measured!

    GETPAGE

    STORAGE




    Hash vs non hash access getpage counts
    Hash vs. Non-hash Access GETPAGE Counts

    • Range predicates on a hashed table can be disastrous

      • SELECT *

        FROM DMD.DMDPBGTB01

        WHERE COL1_CHAR8 LIKE 'ABC%';

    • Hash table space

      • 32,049 table GETPAGE requests

    • Non-hash table space with clustering unique key

      • 6 table GETPAGE requests

      • 3 index GETPAGE requests

    • Can add extra index to hash


    Summary
    Summary

    • Choose hash candidates carefully

      • Tables with mostly random access

      • Tables with multi-level unique indexes

      • Tables with low insert activity

      • Tables with update activity should have static length rows

    • Monitor to ensure continued efficiency

      • As overflow area grows, so does GETPAGE counts

        • RTS TOTALENTRIES

      • Use REORG AUTOESTSPACE(YES)

    • Beware of table space scans

      • RTS REORGSCANACCESS


    Donna di carlo bmc software donna di carlo@bmc com

    Donna Di CarloBMC [email protected]

    Session

    DB2 10 Hash Access: Access Path or Collision Course?


    ad