A primer on multidimensional clustering for udb luw
This presentation is the property of its rightful owner.
Sponsored Links
1 / 50

A Primer on Multidimensional Clustering for UDB LUW PowerPoint PPT Presentation


  • 69 Views
  • Uploaded on
  • Presentation posted in: General

A Primer on Multidimensional Clustering for UDB LUW. He once made an SQL statement run faster just by staring at it. He has never had a Network Security firewall rule refuse him access. He had the "Backspace" and "Delete" keys permanently removed from his keyboard.

Download Presentation

A Primer on Multidimensional Clustering for UDB LUW

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


A primer on multidimensional clustering for udb luw

A Primer on MultidimensionalClustering for UDB LUW


A primer on multidimensional clustering for udb luw

He once made an SQL statement run faster just by staring at it.

He has never had a Network Security firewall rule refuse him access.

He had the "Backspace" and "Delete" keys permanently removed from his keyboard.

On a slow day, he will reorg large tables completely in his mind.

His sysadmins call him daily to ask if they can give him more disk space.

He taught his dog to prefetch, so that when he throws one ball, the dog returns with 32.

He wrote an improved version of the DB2 Optimizer, using only 9 lines of code.

He can type all SQL syntax with 100% accuracy from memory.

The DB2 Optimizer asks him for the best access path.

He once made an SQL statement run faster just by staring at it.

He has never had a Network Security firewall rule refuse him access.

He had the "Backspace" and "Delete" keys permanently removed from his keyboard.

On a slow day, he will reorg large tables completely in his mind.

His sysadmins call him daily to ask if they can give him more disk space.

He taught his dog to prefetch, so that when he throws one ball, the dog returns with 32.

He wrote an improved version of the DB2 Optimizer, using only 9 lines of code.

He can type all SQL syntax with 100% accuracy from memory.

The DB2 Optimizer asks him for the best access path.

He once made an SQL statement run faster just by staring at it.

He has never had a Network Security firewall rule refuse him access.

On a slow day, he will reorg large tables completely in his mind.

He had the "Backspace" and "Delete" keys permanently removed from his keyboard.

His sysadmins call him daily to ask if they can give him more disk space.

He taught his dog to prefetch, so that when he throws one ball, the dog returns with 32.

He wrote an improved version of the DB2 Optimizer, using only 9 lines of code.

He can type all SQL syntax with 100% accuracy from memory.

The DB2 Optimizer asks him for the best access path.

He once made an SQL statement run faster just by staring at it.

He has never clicked on

the “undo” arrow.

He has never clicked on

the “undo” arrow.

He has never clicked on

the “undo” arrow.

He has never clicked on

the “undo” arrow.

He has never clicked on

the “undo” arrow.

He has never had a Network Security firewall rule refuse him access.

He had the "Backspace" and "Delete" keys permanently removed from his keyboard.

On a slow day, he will reorg large tables completely in his mind.

His sysadmins call him daily to ask if they can give him more disk space.

He taught his dog to prefetch, so that when he throws one ball, the dog returns with 32.

He wrote an improved version of the DB2 Optimizer, using only 9 lines of code.

He can type all SQL syntax with 100% accuracy from memory.

The DB2 Optimizer asks him for the best access path.

He once made an SQL statement run faster just by staring at it.

He has never had a Network Security firewall rule refuse him access.

He had the "Backspace" and "Delete" keys permanently removed from his keyboard.

On a slow day, he will reorg large tables completely in his mind.

His sysadmins call him daily to ask if they can give him more disk space.

He taught his dog to prefetch, so that when he throws one ball, the dog returns with 32.

He wrote an improved version of the DB2 Optimizer, using only 9 lines of code.

He can type all SQL syntax with 100% accuracy from memory.

The DB2 Optimizer asks him for the best access path.


A brief bio

A brief bio…

  • 29 years IT, 19 years of DBA experience

    • UDB LUW on AIX

    • DB2/ZOS

    • Oracle

  • Longest query I ever tuned was over 4 feet long when printed out

  • Favorite saying: “Even a blind squirrel finds a nut once in a while”


Agenda

Agenda

  • What is clustering?

  • What is multidimensional clustering (MDC)?

  • Some design guidelines for MDC


Left outer join

Left Outer Join

L

JOIN

T

E

F


Backup

Backup

Hint: The most important thing to a DBA

back


Create table in tablespace

Create table in tablespace

tablecreatetablespace


What is clustering

What is Clustering?

  • Physical sequence of rows in a DB2 table.

  • Determined by defining one index as the “clustering index”.

  • As rows are inserted, DB2 attempts to put them in correct clustering location

  • During Reorg, rows are sorted in clustering order before reloading back into table

  • Is the table clustered or is the index clustered??


Regular non clustering indexes

Regular non-clustering indexes

Index

On Region

Table

Index on

Year


Clustering index

Clustering Index

Clustering Index

On Region

Table

Index on

Year


Why are reads faster when a table is clustered

Why are reads faster when a table is clustered??

The first I/O reads a page into memory which contains many rows with the same key or a range of key values

Example: App needs 500 rows for a given region….

If the DBMS knows that it will need to fetch several or many consecutive pages, then it can begin “prefetching” extents (multiple pages) into memory before application needs it

18 IOs

vs

3 IOs

vs

1 IO


Sequential prefetch

Sequential Prefetch

  • “Holy Grail” when accessing large numbers of rows

  • Significant reduction in I/O

  • Physical reads vs Logical reads

  • Tablespace Page size (bytes)

  • Tablespace Extent size (pages)

  • Tablespace Prefetch size (pages)


How udb uses clustering

How UDB uses clustering

  • Sequential prefetch is turned on if UDB determines cost savings

  • Clustered data makes it more likely for sequential prefetch to be turned on

  • Optimizer looks at clusterratio and clusterfactor (on syscat.indexes)

  • Sequential detection can be turned on dynamically during query execution


So what s the shortfall with clustering

So what’s the shortfall with Clustering?

  • Clustering deteriorates over time (probably) – requiring reorgs

  • Record based indexes with a pointer for every single record, so can become very large in size

  • Only get one choice for the clustering index.

  • If Joe needs the table clustered by timestamp and Bill needs it clustered by policy #, one of them will probably be unhappy.


Partitioned database

Partitioned Database

DA TA BA SE


Multidimensional clustering

MultiDimensional Clustering

Dimensionalclustering

Dimensionalclustering

Dimensionalclustering


Multidimensional clustering mdc

MultiDimensional Clustering (MDC)

  • What if your data could be physically sequenced in more than one way at the same time??

  • Great in theory, but how do you make this happen in real life on a real table??


Multidimensional clustering1

MultiDimensional Clustering

  • Data is physically grouped together by “dimensions” into separate blocks, or extents

  • Each page belongs to exactly one block

  • All blocks are of equal size

  • Tablespace Page size (bytes)

  • Tablespace Extent size (pages)

  • Tablespace Prefetch size (pages)


What is an extent

What is an extent?

  • An extent is a set of contiguous data pages on disk, specified at tablespace creation time.

  • Physical size of an extent determined by:

    • Extent Size (# of pages)

    • Page Size (kb)


Mdc with three dimensions

MDC with Three Dimensions

Age

3

2

1

Red

Color

Blue

Green

2002

2003

2004

Year


What is a logical cell

What is a (logical) cell?

  • Contains all rows for a unique combination of dimension values

  • Physically made up of one or more blocks (extents)

  • Blocks are only allocated for logical cells which actually have records for a given combination of dimension values


A cell

A “Cell”

Age

3

2

1

2002, Red, 1

Red

Color

Blue

Green

2002

2003

2004

Year


What is a slice

What is a Slice?

  • A slice is a set of blocks having a particular dimension key.


A red slice of the color dimension

A Red “Slice” of the Color Dimension

Age

3

2

1

2002, Red, 1

2003, Red, 1

2004, Red, 1

Red

Color

Blue

Green

2002

2003

2004

Year


A 2004 slice of the year dimension

A 2004 “Slice” of the Year Dimension

Age

3

2

1

2004, Red, 1

Red

Color

Blue

2004, Blue, 1

Green

2004, Green, 1

2002

2003

2004

Year


A 1 slice of the age dimension

A 1 “Slice” of the Age Dimension

Age

3

2

1

2002, Red, 1

2003, Red, 1

2004, Red, 1

Red

Color

Blue

2002, Blue, 1

2003, Blue, 1

2004, Blue, 1

Green

2002, Green, 1

2003, Green, 1

2004, Green, 1

2002

2003

2004

Year


How mdc works

How MDC works

Red

Red

2000

2001

Red

Red

Red

2000

2001

2002

Blue

2000

Blue

Blue

Blue

Blue

2000

2001

2002

2003

Blue

Blue

2000

2001

Green

Green

Green

2000

2001

2002

Green

Green

Green

2000

2002

2003

Dimension Block

Index on Year

  • Rows are organized in extents based upon dimensions

Dimension Block

Index on

Color


Multidimensional clustering2

MultiDimensional Clustering

  • MDC introduces indexes that are block-based – much smaller than record-based

    • A pointer for each block instead of a pointer for each row

  • MDC allows a table to be physically clustered on more than one key or dimension

  • MDC table is able to maintain and guarantee clustering over all dimensions automatically and continuously


Mdc indexes

MDC Indexes

  • A dimension block index is automatically created for each dimension specified

  • A composite block index is automatically created containing all columns across all dimensions

  • Composite index used to maintain clustering

  • Much lower overhead for logging


Creating an mdc table

Creating an MDC table

Create table t1 (age int, color char(10), year char(4), c1 int, c2 int)

organize by dimensions (age, color, year)

  • Three dimension block indexes (one each for age, color and year).

  • A composite block index is also created which includes (age,color, year).

  • Traditional “RID” indexes can also be created on an MDC

  • Can logical AND/OR between BID and RID indexes


Select processing in mdc ex 1

Select Processing in MDC (ex #1)

Age

3

2

Select …

From Table

Where Age = ‘1’

1

2002, Red, 1

2003, Red, 1

2004, Red, 1

Red

Color

Blue

2002, Blue, 1

2003, Blue, 1

2004, Blue, 1

Green

2002, Green, 1

2003, Green, 1

2004, Green, 1

2002

2003

2004

Year


Select processing in mdc ex 2

Select Processing in MDC (ex #2)

Age

3

2

Select …

From Table

Where color = ‘Red’

1

2002, Red, 1

2003, Red, 1

2004, Red, 1

Red

Color

Blue

Green

2002

2003

2004

Year


Select processing in mdc ex 3

Select Processing in MDC (ex #3)

Age

3

2

Select …

From Table

Where color = ‘Red’

And Age = 1

And Year = ‘2002’

1

2002, Red, 1

Red

Color

Blue

Green

2002

2003

2004

Year


Insert processing in mdc

Insert Processing in MDC

  • Probe composite block index to see if this is a new combination of dimensions (new logical cell)

  • If existing, search list of BIDs to look for space to insert row

  • If new logical cell or all blocks full for an existing cell, then create a new block


Delete processing in mdc

Delete Processing in MDC

  • If the record being deleted is not the last record in block, UDB just deletes the record and removes its RID from any record based indexes

  • If deleting last record in block, UDB frees the block by changing its IN_USE status bit and removing the BID from all block indexes and also remove RID from record based indexes


Update processing in mdc

Update Processing in MDC

  • Updates on non-dimension values are done in place just as with regular tables

    • No need to update block indexes unless no space is found and a new block needs to be added to cell

  • Updates of dimension values are treated as delete/insert

    • Block indexes will need to be updated


Mdc benefits

MDC Benefits

  • Can cluster in multiple dimensions

  • Clustering is automatically and dynamically maintained over time.

  • Reorg not necessary for re-clustering

  • Block indexes are much smaller and have much less overhead for maintenance and logging


Design guidelines for mdc

Design Guidelines for MDC

  • MDC is great tool

  • But, used incorrectly, can make things worse just as much as it can make things better

  • Requires knowledge of data and data useage by users


Mdc design

MDC Design

  • Most important design criteria for MDC is to select proper dimension columns and appropriate exent size

  • Columns that are used in queries as equality or range predicates

  • Low cardinality

  • Desire high density – blocks are mostly full

  • Generally no more than 3 or 4 dimensions


Mdc size considerations

MDC Size Considerations

  • At least one extent will be allocated for every unique combination of dimensions in the data

  • Evaluate dimension volumetrics and row size to establish tablespace extent size

    • Select dimcol1, dimcol2, dimcol3, count(*) from table

  • Example: 8k page size * 32 page extent size gives 256k extent size

  • If you have 1 million unique dimension combinations – minimum table size of 256 GB!!


What happens if you choose wrong

What happens if you choose wrong??

  • A high cardinality column(s) will explode the size of your table and destroy performance!!!

  • Remember that a block is physically allocated for each unique combination of dimension key values

  • NEVER use a high cardinality column or a unique column for an MDC dimension


Down right stupid

Down Right Stupid

Choosing a unique column as a dimension is just:

Stupid

Stupid

Stupid

Stupid


Using column expressions with mdc

Using column expressions with MDC

  • What if a column is a good dimension candidate, but cardinality is way too high (ex: timestamp column)

  • Create table t1 (c1 timestamp, c2 int, c3 int generated always as year(c1)) organize by dimensions (c2, c3)

  • Monotonic – generated column increases/decreases the same as base column

  • A non-monotonic column will only allow equality or IN predicates on the base column to use the block index


Mdc tables and database partitioning

MDC tables and database partitioning

  • DB2 LUW DPF partitioning is just a way to spread the data across partitions (not range partitioning like DB2/ZOS

  • The reason for partitioning a table is independent of whether the table is an MDC table or a regular table

  • Can partition on a dimension column or a non-dimension column

    • However, partitioning on a dimension column means that all rows for a particular dimension value exist on only 1 partition

  • If partitioning, remember that logical cells can spread across partitions

    • Important for sizing of extents


Block index considerations

Block Index Considerations

  • Composite block index columns are ordered based upon “organize by dimensions” clause

  • Create table t1 (c1 int, c2 int, c3 int, c4 int) organize by dimensions (c1, c4, (c3,c1), c2)

    • Composite index will be (c1,c4,c3,c2)

  • Create table t1 (c1 int, c2 int, c3 int, c4 int) organize by dimensions (c1, c2, (c3,c1), c4)

    • Composite index will be (c1,c2,c3,c4)


The customer is always right

The Customer is always right

Everything

Else

Customer


A primer on multidimensional clustering for udb luw

STORY


To make a long story short

To make a long story short

STORY


Questions

Questions???


  • Login