Db2 10 hash access access path or collision course
1 / 40

DB2 10 Hash Access: Access Path or Collision Course - PowerPoint PPT Presentation

  • Uploaded on

DB2 10 Hash Access: Access Path or Collision Course?. Donna Di Carlo BMC Software Session Code: A13 Wednesday, 16 November 2011 | Platform: DB2 for z/OS. Objectives. An introduction to the new DB2 10 Hash Access See how a hashed table and index are physically organized 

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'DB2 10 Hash Access: Access Path or Collision Course' - juancarlos

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Db2 10 hash access access path or collision course

DB2 10 Hash Access: Access Path or Collision Course?

Donna Di Carlo

BMC Software

Session Code: A13

Wednesday, 16 November 2011 | Platform: DB2 for z/OS


  • An introduction to the new DB2 10 Hash Access

  • See how a hashed table and index are physically organized 

  • Determine if Hash Access is suitable for your application

  • Some rules of thumb on how to define hashed tables and indexes

  • What happens to performance when hashed objects are defined correctly; and worse, what happens when they are defined incorrectly

Db2 access paths
DB2 Access Paths

  • DB2 has several different methods to access data

  • DB2 10 has added hash access to its bag of tricks

    • Single row access

    • Unique non-updateable key

    • Fixed hash space size

Table space Scan

Rid List Scan

Index Only



UNIQUE (column names) HASH SPACE size K|M|G

Implementing hash access
Implementing Hash Access

  • Create a new table with hash organization

    • Need to specify hash space size

    • Unique key can be comprised of multiple columns, but the length must be less than 256

    • Entire hash area is allocated, plus 1% - 3% for overflow

    • Recommendation

      • Set ZPARM IMPDSDEF=YES so the overflow index is instantiated during the CREATE – avoids penalty for first insert into the overflow

Implementing hash access cont
Implementing Hash Access (cont.)

  • Alter existing table to hash organization

    • Criteria

      • Must be UTS

      • Must contain a unique non-updatable key

      • Must have Reordered Row Format

      • Table size should be relatively static

      • Can’t be MEMBER CLUSTER

      • Can’t have clustering index

Implementing hash access cont1
Implementing Hash Access (cont.)

  • Alter existing table to hash organization (cont.)

    • Table space set in AREOR status

    • Overflow index is created and set in RBDP status

    • Updates and deletes allowed

    • Keys cannot be updated

    • Inserts allowed after overflow index is rebuilt

      • Not a sparse index at this time

    • Must REORG table space to instantiate hash access

      • SHRLEVEL NONE and REORG by part are not supported

    • REBIND

Implementing hash access cont2
Implementing Hash Access (cont.)



      • Estimate the hash space size

      • Targets about 8% - 15% overflow

      • Catalog value HASHSPACE remains unchanged

      • Catalog value HASHDATAPAGES updated with new allocation

    • FREEPAGE is ignored

    • PCTFREE is ignored for AUTOESTSPACE(NO)

    • PCTFREE has a new meaning for AUTOESTSPACE(YES)

      • Determines target hash space, then increases it by PCTFREE

      • PCTFREE is ignored for the overflow area

Hash organization cont
Hash Organization (cont.)

  • REORG (cont.)

    • Recommendation

      • Even though AUTOESTSPACE(YES) makes sizing the hash area easy, it’s not fool proof

        • Overflow estimate is less accurate for varying length rows

        • If DSSIZE is not large enough to accommodate the hash area and overflow, SQL and utilities can fail for a PBR

        • It’s expensive for overflows when the hash and overflow are in different data sets of a PBG

Implementing hash access cont3
Implementing Hash Access (cont.)

  • LOAD

    • Does not sort input

    • Hashes each row

    • FREEPAGE and PCTFREE are ignored


    • Recommendation

      • Create non-hashed table

      • LOAD


      • REORG

      • Instead of creating a new table, you can drop hash organization on an existing table, but…UH OH!

Implementing hash access cont4
Implementing Hash Access (cont.)


    • Overflow index is immediately dropped

    • Table space is placed in REORP status – must REORG

    • Cannot recover to a point prior to the DROP

    • Recommendation

      • When defining the unique key think carefully - changing it requires a DROP

      • If you drop hash altogether, consider adding an index

Bird s eye view of hashing select cont
Bird’s Eye View of Hashing – SELECT (cont.)

  • Recommendation

    • Tables with multi-level indexes are good candidates for Hash Access

    • Keep the overflow index small – it will increase the odds of finding the index pages in the buffer pool

A bird s eye view of hashing insert
A Bird’s Eye View of Hashing - INSERT

  • Unique key and hash algorithm determine page number and page location

  • If another row is at this location, row is added to the collision chain, also known as the Anchor ID Map

    • There can be several collisions on a single chain

  • An overflow does not occur until the page is full

    • The row is stored in the overflow area and the overflow index is updated

A bird s eye view of hashing insert cont1
A Bird’s Eye View of Hashing – INSERT (cont.)

  • Recommendation

    • Tables with high insert activity may not be good candidates for Hash Access

Bird s eye view of hashing update
Bird’s Eye View of Hashing - UPDATE

  • Unique key and hash algorithm determines page number and page location

  • If row does not exist on page, access the overflow index

  • If row does exist on page

    • If row length decreased or stayed the same, update in place

    • If row length increased, move row to another place on the page

    • If row length increased, but no more room on page

      • Replace row with a pointer record

      • Move indirect reference to overflow

      • No need to index indirect reference

Bird s eye view of hashing update cont
Bird’s Eye View of Hashing – UPDATE (cont.)

  • Recommendation

    • Tables with high update activity and varying length rows may not be good candidates for Hash Activity


      • Compression

Bird s eye view of hashed data range predicate cont
Bird’s Eye View of Hashed Data – Range Predicate (cont.)

  • Recommendation

    • Tables that have queries with range predicates are poor choices for hashed tables

      • What used to be clustered will now be random

    • If you must use a query with a range predicate, consider creating an index to access the data

      • You may be able to force index access, but the query may still require a RID sort

Hash organization1
Hash Organization

  • Header page

    • HPGHASHMOD – number of pages in the Hash Area

      • Rounded down to the nearest prime number

    • HPGHASHFIXIDMAP – number of hash Anchor ID Map entries

      • More on that later

    • HPGLASTHASHPG_P - page number of last page in hash area

Hash organization cont1
Hash Organization (cont.)

  • Space Map

    • Hashed inserts do not have to use the space map to determine if a page has room

      • Hash Area segment entries are not chained - SEGNEXT always zero

      • Free space bit indicators are still maintained

    • During creation, all space maps are formatted for the Hash Area

    • There will always be a dictionary and system page segments, even if the space is not compressed or versioned

      • If more system pages are needed, extra segments will be allocated in the overflow

Data overflow page
Data/Overflow Page

  • The first x14 bytes contain header information

    • Page number, free space management, number of rows

  • The RID map is at the bottom of the page

    • Each RID position contains an offset to the row in the page

Hash home page
Hash Home Page

  • PGCOMB contains hash home page bit

  • Hash header

    • PGAPBEGIN - pointer to beginning of data rows

  • Anchor ID Map contains collision chains

    • The number of anchors, HPGHASHFIXIDMAP, is stored in the header page

    • The number of IDs vary from table to table based on page and row size

Non hash data row
Non-Hash Data Row

  • RID Map

    • Contains offset of row location

    • Can have up to 255 RID values on one page

  • Row header


      • Pointer

      • Indirection

      • Compressed

      • Versioned

    • PGSLTH – row length

    • PGSOBD - OBID of table

    • PGSID - RID map ID or version number

  • Hash row
    Hash Row

    • PGSOBD is not needed for UTS

      • One table per table space

      • Hash home pages reuse this value to point to the hash anchor ID map or the next row in the collision chain

      • The first bit in the second byte indicates if this is the last entry in the chain

    Hash algorithm
    Hash Algorithm

    • Key is passed to the hash algorithm

      • Returns a double word integer - HASHEDINT

    • Page Number = HASHEDINT % HPGHASHMOD


      • If a row already exists at this anchor, the new row is chained

    Anchor id map and overflow
    Anchor ID Map and Overflow

    • Second byte of anchor ID is updated when overflow occurs

      • It doesn’t necessarily mean there was a collision

    Real time statistics rts
    Real Time Statistics (RTS)


      • Total number of data bytes in the table


      • Total number of rows in the table


      • Total number of overflow rows

    Real time statistics rts1
    Real Time Statistics (RTS)


      • Last time the table was accessed by hash


      • Number of hash accesses


      • Number of table space scans


    • No new statistics collected

    • Columns for sparse index not valid (contains -2)


        • LEAFDIST

        • FAROFFPOSF



        • COLCARDF

        • HIGH2KEY




    • LEAFFAR 

    • LOW2KEY


    Explain plan table
    Explain & Plan Table


      • H – hash access

      • HN – hash access using an IN predicate

        • Some SQL may be converted by DB2 to IN predicate

      • MH – multiple index scan using the hash overflow index

    Sizing your hash area
    Sizing Your Hash Area

    • Delicate balance between storage utilization and GETPAGE activity

      • Slow response and high CPU if more than one GETPAGE is needed

      • How much are you willing to pay for storage to get excellent response?

    • All of this can be measured!



    Hash vs non hash access getpage counts
    Hash vs. Non-hash Access GETPAGE Counts

    • Range predicates on a hashed table can be disastrous

      • SELECT *



    • Hash table space

      • 32,049 table GETPAGE requests

    • Non-hash table space with clustering unique key

      • 6 table GETPAGE requests

      • 3 index GETPAGE requests

    • Can add extra index to hash


    • Choose hash candidates carefully

      • Tables with mostly random access

      • Tables with multi-level unique indexes

      • Tables with low insert activity

      • Tables with update activity should have static length rows

    • Monitor to ensure continued efficiency

      • As overflow area grows, so does GETPAGE counts



    • Beware of table space scans


    Donna di carlo bmc software donna di carlo@bmc com

    Donna Di CarloBMC [email protected]


    DB2 10 Hash Access: Access Path or Collision Course?