Db2 10 hash access access path or collision course
This presentation is the property of its rightful owner.
Sponsored Links
1 / 40

DB2 10 Hash Access: Access Path or Collision Course? PowerPoint PPT Presentation


  • 102 Views
  • Uploaded on
  • Presentation posted in: General

DB2 10 Hash Access: Access Path or Collision Course?. Donna Di Carlo BMC Software Session Code: A13 Wednesday, 16 November 2011 | Platform: DB2 for z/OS. Objectives. An introduction to the new DB2 10 Hash Access See how a hashed table and index are physically organized 

Download Presentation

DB2 10 Hash Access: Access Path or Collision Course?

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Db2 10 hash access access path or collision course

DB2 10 Hash Access: Access Path or Collision Course?

Donna Di Carlo

BMC Software

Session Code: A13

Wednesday, 16 November 2011 | Platform: DB2 for z/OS


Objectives

Objectives

  • An introduction to the new DB2 10 Hash Access

  • See how a hashed table and index are physically organized 

  • Determine if Hash Access is suitable for your application

  • Some rules of thumb on how to define hashed tables and indexes

  • What happens to performance when hashed objects are defined correctly; and worse, what happens when they are defined incorrectly


Db2 access paths

DB2 Access Paths

  • DB2 has several different methods to access data

  • DB2 10 has added hash access to its bag of tricks

    • Single row access

    • Unique non-updateable key

    • Fixed hash space size

Table space Scan

Rid List Scan

Index Only

Prefetch

ORGANIZE BY HASH

UNIQUE (column names) HASH SPACE size K|M|G


Hash organization

Hash Organization


Implementing hash access

Implementing Hash Access

  • Create a new table with hash organization

    • Need to specify hash space size

    • Unique key can be comprised of multiple columns, but the length must be less than 256

    • Entire hash area is allocated, plus 1% - 3% for overflow

    • Recommendation

      • Set ZPARM IMPDSDEF=YES so the overflow index is instantiated during the CREATE – avoids penalty for first insert into the overflow


Implementing hash access cont

Implementing Hash Access (cont.)

  • Alter existing table to hash organization

    • Criteria

      • Must be UTS

      • Must contain a unique non-updatable key

      • Must have Reordered Row Format

      • Table size should be relatively static

      • Can’t be MEMBER CLUSTER

      • Can’t have clustering index


Implementing hash access cont1

Implementing Hash Access (cont.)

  • Alter existing table to hash organization (cont.)

    • Table space set in AREOR status

    • Overflow index is created and set in RBDP status

    • Updates and deletes allowed

    • Keys cannot be updated

    • Inserts allowed after overflow index is rebuilt

      • Not a sparse index at this time

    • Must REORG table space to instantiate hash access

      • SHRLEVEL NONE and REORG by part are not supported

    • REBIND


Implementing hash access cont2

Implementing Hash Access (cont.)

  • REORG

    • AUTOESTSPACE(YES/NO)

      • Estimate the hash space size

      • Targets about 8% - 15% overflow

      • Catalog value HASHSPACE remains unchanged

      • Catalog value HASHDATAPAGES updated with new allocation

    • FREEPAGE is ignored

    • PCTFREE is ignored for AUTOESTSPACE(NO)

    • PCTFREE has a new meaning for AUTOESTSPACE(YES)

      • Determines target hash space, then increases it by PCTFREE

      • PCTFREE is ignored for the overflow area


Hash organization cont

Hash Organization (cont.)

  • REORG (cont.)

    • Recommendation

      • Even though AUTOESTSPACE(YES) makes sizing the hash area easy, it’s not fool proof

        • Overflow estimate is less accurate for varying length rows

        • If DSSIZE is not large enough to accommodate the hash area and overflow, SQL and utilities can fail for a PBR

        • It’s expensive for overflows when the hash and overflow are in different data sets of a PBG


Implementing hash access cont3

Implementing Hash Access (cont.)

  • LOAD

    • Does not sort input

    • Hashes each row

    • FREEPAGE and PCTFREE are ignored

    • SLOWWWWWWWW

    • Recommendation

      • Create non-hashed table

      • LOAD

      • ALTER ADD HASHED

      • REORG

      • Instead of creating a new table, you can drop hash organization on an existing table, but…UH OH!


Implementing hash access cont4

Implementing Hash Access (cont.)

  • ALTER TABLE DROP HASH

    • Overflow index is immediately dropped

    • Table space is placed in REORP status – must REORG

    • Cannot recover to a point prior to the DROP

    • Recommendation

      • When defining the unique key think carefully - changing it requires a DROP

      • If you drop hash altogether, consider adding an index


Birds eye view of index access select

Birds Eye View of Index Access - SELECT


Bird s eye view of hashing select

Bird’s Eye View of Hashing - SELECT


Bird s eye view of hashing select cont

Bird’s Eye View of Hashing – SELECT (cont.)

  • Recommendation

    • Tables with multi-level indexes are good candidates for Hash Access

    • Keep the overflow index small – it will increase the odds of finding the index pages in the buffer pool


A bird s eye view of hashing insert

A Bird’s Eye View of Hashing - INSERT

  • Unique key and hash algorithm determine page number and page location

  • If another row is at this location, row is added to the collision chain, also known as the Anchor ID Map

    • There can be several collisions on a single chain

  • An overflow does not occur until the page is full

    • The row is stored in the overflow area and the overflow index is updated


A bird s eye view of hashing insert cont

A Bird’s Eye View of Hashing – INSERT (cont.)


A bird s eye view of hashing insert cont1

A Bird’s Eye View of Hashing – INSERT (cont.)

  • Recommendation

    • Tables with high insert activity may not be good candidates for Hash Access


Bird s eye view of hashing update

Bird’s Eye View of Hashing - UPDATE

  • Unique key and hash algorithm determines page number and page location

  • If row does not exist on page, access the overflow index

  • If row does exist on page

    • If row length decreased or stayed the same, update in place

    • If row length increased, move row to another place on the page

    • If row length increased, but no more room on page

      • Replace row with a pointer record

      • Move indirect reference to overflow

      • No need to index indirect reference


Bird s eye view of hashing update cont

Bird’s Eye View of Hashing – UPDATE (cont.)

  • Recommendation

    • Tables with high update activity and varying length rows may not be good candidates for Hash Activity

      • VARCHAR, VARGRAPHIC…

      • Compression


Bird s eye view of hashing range predicate

Bird’s Eye View of Hashing – Range Predicate

Clustered Data

Hashed Data


Bird s eye view of hashed data range predicate cont

Bird’s Eye View of Hashed Data – Range Predicate (cont.)

  • Recommendation

    • Tables that have queries with range predicates are poor choices for hashed tables

      • What used to be clustered will now be random

    • If you must use a query with a range predicate, consider creating an index to access the data

      • You may be able to force index access, but the query may still require a RID sort


Hash organization1

Hash Organization

  • Header page

    • HPGHASHMOD – number of pages in the Hash Area

      • Rounded down to the nearest prime number

    • HPGHASHFIXIDMAP – number of hash Anchor ID Map entries

      • More on that later

    • HPGLASTHASHPG_P - page number of last page in hash area


Hash organization cont1

Hash Organization (cont.)

  • Space Map

    • Hashed inserts do not have to use the space map to determine if a page has room

      • Hash Area segment entries are not chained - SEGNEXT always zero

      • Free space bit indicators are still maintained

    • During creation, all space maps are formatted for the Hash Area

    • There will always be a dictionary and system page segments, even if the space is not compressed or versioned

      • If more system pages are needed, extra segments will be allocated in the overflow


Data overflow page

Data/Overflow Page

  • The first x14 bytes contain header information

    • Page number, free space management, number of rows

  • The RID map is at the bottom of the page

    • Each RID position contains an offset to the row in the page


Hash home page

Hash Home Page

  • PGCOMB contains hash home page bit

  • Hash header

    • PGAPBEGIN - pointer to beginning of data rows

  • Anchor ID Map contains collision chains

    • The number of anchors, HPGHASHFIXIDMAP, is stored in the header page

    • The number of IDs vary from table to table based on page and row size


Non hash data row

Non-Hash Data Row

  • RID Map

    • Contains offset of row location

    • Can have up to 255 RID values on one page

  • Row header

    • PGSFLAGS

      • Pointer

      • Indirection

      • Compressed

      • Versioned

    • PGSLTH – row length

    • PGSOBD - OBID of table

    • PGSID - RID map ID or version number


  • Hash row

    Hash Row

    • PGSOBD is not needed for UTS

      • One table per table space

      • Hash home pages reuse this value to point to the hash anchor ID map or the next row in the collision chain

      • The first bit in the second byte indicates if this is the last entry in the chain


    Hash algorithm

    Hash Algorithm

    • Key is passed to the hash algorithm

      • Returns a double word integer - HASHEDINT

    • Page Number = HASHEDINT % HPGHASHMOD

    • Anchor ID = HASHEDINT % HPGHASHFIXIDMAP

      • If a row already exists at this anchor, the new row is chained


    Accessing a hashed row

    Accessing a Hashed Row


    Anchor id map and overflow

    Anchor ID Map and Overflow

    • Second byte of anchor ID is updated when overflow occurs

      • It doesn’t necessarily mean there was a collision


    Real time statistics rts

    Real Time Statistics (RTS)

    • SYSTABLESPACESTATS.DATASIZE

      • Total number of data bytes in the table

    • SYSTABLESPACESTATS.TOTALROWS

      • Total number of rows in the table

    • SYSINDEXSPACESTATS .TOTALENTRIES

      • Total number of overflow rows


    Real time statistics rts1

    Real Time Statistics (RTS)

    • SYSTABLESPACESTATS. HASHLASTUSED

      • Last time the table was accessed by hash

    • SYSTABLESPACESTATS. REORGHASHACCESS

      • Number of hash accesses

    • SYSTABLESPACESTATS. REORGSCANACCESS

      • Number of table space scans


    Runstats

    RUNSTATS

    • No new statistics collected

    • Columns for sparse index not valid (contains -2)

      • SYSIBM.SYSINDEXPART

        • LEAFDIST

        • FAROFFPOSF

        • NEAROFFPOSF

      • SYSIBM.SYSCOLUMNS

        • COLCARDF

        • HIGH2KEY

      • SYSIBM.INDEXES

        • CLUSTERRATIOF

    • LEAFNEAR

    • LEAFFAR 

    • LOW2KEY

    • LEAFDIST


    Explain plan table

    Explain & Plan Table

    • ACCESSTYPE

      • H – hash access

      • HN – hash access using an IN predicate

        • Some SQL may be converted by DB2 to IN predicate

      • MH – multiple index scan using the hash overflow index


    Sizing your hash area

    Sizing Your Hash Area

    • Delicate balance between storage utilization and GETPAGE activity

      • Slow response and high CPU if more than one GETPAGE is needed

      • How much are you willing to pay for storage to get excellent response?

    • All of this can be measured!

    GETPAGE

    STORAGE


    Hash storage efficiency

    Hash Storage Efficiency


    Hash access getpage counts

    Hash Access GETPAGE Counts


    Hash vs non hash access getpage counts

    Hash vs. Non-hash Access GETPAGE Counts

    • Range predicates on a hashed table can be disastrous

      • SELECT *

        FROM DMD.DMDPBGTB01

        WHERE COL1_CHAR8 LIKE 'ABC%';

    • Hash table space

      • 32,049 table GETPAGE requests

    • Non-hash table space with clustering unique key

      • 6 table GETPAGE requests

      • 3 index GETPAGE requests

    • Can add extra index to hash


    Summary

    Summary

    • Choose hash candidates carefully

      • Tables with mostly random access

      • Tables with multi-level unique indexes

      • Tables with low insert activity

      • Tables with update activity should have static length rows

    • Monitor to ensure continued efficiency

      • As overflow area grows, so does GETPAGE counts

        • RTS TOTALENTRIES

      • Use REORG AUTOESTSPACE(YES)

    • Beware of table space scans

      • RTS REORGSCANACCESS


    Donna di carlo bmc software donna di carlo@bmc com

    Donna Di CarloBMC [email protected]

    Session

    DB2 10 Hash Access: Access Path or Collision Course?


  • Login