B-Tree Indexes in Sybase and Other RDBMS Systems

6 IQ Indexes

Indexes • Typical RDBMS uses B-tree index • usually a separate structure in addition to data • consists of data pages with values and pointers • Sybase (and others) has special type which contains pointers and data pages • B-tree indexes are expensive • consume space and time (to build) • require maintenance after data refresh • not useful with low cardinality data columns

4 2 6 1 3 5 7 1 2 3 4 5 6 7 8 B-Tree – an example Let’s assume a table with key values from 1 to 8 – and build the B-Tree The decision tree is now very simple. The question asked is “Is the required value less than or equal to the value in the tree?”If it is take the left fork else go right Index Levels - Keys Leaf Level (Data)

IQ-M Indexes • Indexes are the data (a data access method) • no separate data store • A column is likely to have multiple indexes • always has at least one index • Index selection for a column is based upon • Number of discrete values (cardinality) • Usage in queries • To some extent, the column’s data type

Index Types – Now 10 (13 in 12.5)! 1 • Default Index (aka Fast Projection or FP) • Raw data compressed on disk • comes in 3 flavours • Low Fast (LF) • Bit map index • High Non Group (HNG) • Bit-wise index • High Group (HG) • G-Array (relative of a B-tree) • Comes in 3 flavours

Index Types – Now 10 (13 in 12.5)! 2 • Compare Index (CMP) • Column Compare Index • Word Index (WD) • Sort of “Free Text” Index • 12.5 DATE • 12.5 TIME • 12.5 DTTM (datetime index)

Default Index - FP • Created automatically by Create Table • Used for • Ad-Hoc joins • String searches • Certain calculations • Projection of data • This index cannot be created or dropped

IQ Unique Constraint - 1 • Create Table department(dept_idchar(4)Not Null IQ Unique(200),emp_lnamevarchar(25)Not Null IQ Unique(75000), • Improves load processing • IQ will construct 1-byte FP index for dept_id • Won’t try either a 1-byte or 2-byte FP on emp_lname • Too many values for either

IQ Unique Constraint - 2 • In terms of query performance the 1-byte and 2-byte FP indexes can speed up the server • The first method is that searching an FP (of any type) can be done in parallel, which may be faster than a so-called fast index (LF and HG) • The second speed up is that for a LIKE operator the 1-byte and 2-byte FP can be faster than an HNG index • Based on the width of the search predicate • And the width of the FP index (1 or 2 bytes)

IQ Unique Constraint - 3 • There is no mileage in setting IQ UNIQUE 255 set on every column at index/table create time. • The incremental cost of the rollovers to 2-byte FP and Flat FP is very high for larger table sizes (above 10-100 million rows). • Testing at 50 million rows indicates that • the 1-byte to 2-byte cost is very high (increase per index load time by 200 times) • the 2-byte to Flat FP transition increases per index load times by another 300 times. • But let’s think about the look up pages…

IQ UNIQUE Constraint – 4 • Consider using Minimize_Storage option in Sybase IQ 12.5. This will place an IQ UNIQUE(255) on every column for every table created and removes the need to use IQ UNIQUE • If the value is <= 255 then IQ will place a 1-byte FP index on the column – 1 byte of storage per row • If the value is > 255 but <= 65536 then IQ will place a 2-byte FP index on the column – 2 bytes of storage per row • May slightly hinder data loads, but improve query speeds • May incur onetime slight load slowdown while 2-byte FP is converted to flat FP, but this usually happens during the first load

FP Index Types • Depending on cardinality and the use of IQ UNIQUE, IQ-M will initially construct FP index in one of 3 ways • If > 65536 distinct values in column • Flat FP index • If < 256 distinct values • 1-byte FP • Between 256 and 65536 distinct values • 2-byte FP

More than 65536 values in a column The raw data is compressed (on disk) Flat FP Index Color Red Blue Green Red

The Flat FP Index Structure

Less than 256 values in a column A one byte lookup table is built 1 2 3 1 3 1 3 2 1-byte FP Index Data Lookup Table Data Values Red Color 1 Red Blue Green 2 Blue Green Red 3

Between 256 and 65536 values the data is stored in a two byte lookup table Data Values 1 0 Red 0 2 0 1 Blue 3 0 Green 0 2 0 1 0 3 Red 3 0 0 1 3 0 0 2 2-byte FP Index Data Lookup Table Color Red Blue Green

FP Indexes • During load, IQ-M will try to build Flat FP first, unless otherwise specified • If you specify a low cardinality 1-byte FP at the start (by using IQ UNIQUE) then this will • Resort to a 2-byte FP after 256 values… then … • Resort to Flat FP after approx. 65526 values or n lookup pages (see two slides on…) • If you specify a 2-byte FP at the start (by using IQ UNIQUE) then this will • Resort to Flat FP after approx. 65526 values or n lookup pages (see two slides on…)

FP Index Growth • Although an FP can, as the cardinality grows, change (1-byte FP>2-byte FP) or (2-byte FP>Flat FP) it can never revert • There is a high cost in conversion – either way • The only backwards conversion is to drop the column (not the index), recreate and reload (Expensive)

Limit for 2-byte FP - 1 • FP_Lookup_Size Def 32767 • This option controls the number of discrete values that a 2-byte FP can contain, as a maximum Note this is in Kbytes • This was new in 12.4.2 for constraining the size of a 1- or 2- byte FP • By default a 2-byte FP to Flat FP flip will occur when the lookup table grows beyond 32 MB Note – If you have a pre-12.4 1 byte FP it is constrained by FP_Number_Lookup_Pages

Limit for 2-byte FP - 2 • FP_Lookup_Size Def 32767 • For a bigint of 8 bytes, 65536 entries take only around 1+ MB. • For a max varchar of 255 bytes, 65536 entries take no more than 18 MB. • It is critical to keep the entire ByteStore (lookup table) in memory for performance reason.

Look up Pages • If we specified IQ UNIQUE 255 for all columns • Then all columns would have a pinned lookup page (or pages) in memory • If we don’t have a lot of memory for the caches then we could flood memory with lookup pages • This needs further thought…

Use of 1 and 2-byte FP Indexes • Access Paths • In 12.4.2 and beyond we can perform more operations on the lookup table of the FP – instead of the other indexes. • LIKE (simple and Complex) • Simple Predicate • YEAR(column_name) > ‘1995’ • SQRT(column_name) < 100 • DATEPART(hh,column_name) between 10 and 12 • Also the resulting “scan” is performed in parallel (if required)

12.6 - Optimized FP Indexes • The IQ Query Engine is taking even more advantage of the Optimized FP Indexes (1-2 byte FP Indexes) to improve Query Performance • The DBA may periodically rebuild the FP indexes • New System Procedure: sp_iqrebuildindex • Rebuilds the FP index(es) for a table or column

CASE - 1 • CASE • CASE RESPECT is the fastest • There is a 10-20% hit going to CASE IGNORE • Implications to FP Indexes • Regardless of RESPECT vs. IGNORE all 1-byte and 2-byte FP indexes store all the binary values for the data • So ABC, abc, Abc, Abc are all stored even for CASE IGNORE

CASE - 2 • Remember because we store all bitmaps we can go from 1-byte to 2-byte FP, or 2-byte to flat FP where we might not want to • A solution to this is to set the server in CASE RESPECT (because it is faster) • Then use an ETL tool to rtrim() and ucase() or lcase() all of the incoming character data

CASE - 3 • The HG index stores data in what is called “conditioned” mode. • For a CASE IGNORE database there is only one entry per logical value • ABC = abc = Abc etc. • For a CASE RESPECT database there has to be one entry per value • ABC != abc != Abc etc.

CASE – 4 • For an LF index we hold partially conditioned values • For CASE RESPECT and CASE IGNORE all values have a bit-map • This can be wasteful on space • The reason for having this is two fold • To allow for the recreation of the FP index from the LF • To allow for some rare cases (some group by’s) where we still project values from an LF index

Low Fast Index • Traditional Bit Map for Low Cardinality • Less than 10,000 unique values in a column • Can be unique • Required for performance involving • Joins • Group by • MIN, MAX, functions • Where clause predicates • Equality / Inequality, Ranges, IN lists

Bitmap Indexes • What are bitmaps? • Bitmaps are representations for each value in a field • True = 1 • False = 0 • Bit position corresponds to a fixed row ID • For each discrete value there is one bitmap – the length of which is number of rows in the table

Digression on Bitmaps - 1 • If there are 7,000,000 rows in the table each bitmap will be 7,000,000 bits long • This could be 1,000,000 bytes (almost a megabyte) • If there are 1000 possible values in the column this could mean 1 Gbyte (approx.) for the column • Is this correct ?

Digressions on Bitmaps - 2 • Yes, sort of… • In IQ-M there are 4 ways of holding a bitmap page • The conditions for the 4 types of page are • All Zero Bitmap • Few 1s • 20-80% 1s • Almost all 1s • All 1 bitmap

Bitmap “Types” • An all Zero bitmap page is not stored • just an entry in the block map • An all 1 bitmap page is also not stored • a similar entry in the block map • For the 20-80% 1s there is a real bitmap • For the nearly all 1s or nearly all zero pages the data in Run Length Encoded

Run Length Encoding • Used when there is a very sparse set in bits set (or not set) • Very efficient on storage 1-50,90,102-135,1090-4573,7833, 9011-11430,...

... Bitmap Indexes • Each unique value has it’s own bitmap • Designed for incremental additions of rows • Query: • select count(*) from customers where state =‘AL’

High Group Index - 1 • High Cardinality data columns • More than 1000 unique values • Can enforce uniqueness • Special internal structure for unique HG indexes • Automatically created by Create Table for columns with UNIQUE or PRIMARYKEYconstraint (regardless of cardinality)

a b c 4 High Group Index - 2 B-Tree Index 1,2 3,5,6 New Blocks can be added into the Linked List

a B(ptr) C = 4 High Group Index - 3 Much faster for load Much faster for Skewed data give bitmaps directly to optimizer B-Tree Index 1,2 When a page is completely filled With one value the array is converted to a bitmap 1011010101001 1010001001001 0010100101010

High Group Index - 4 • Required for performance on High Cardinality columns used for: • Joins • Select Distinct, Count Distinct • Group By • Takes up the most space in the database • Requires the longest time to load/delete • Cannot be used with certain data types

Page Size Transition Point 64 Kbytes 4,096 rowids 128 Kbytes 8,192 rowids 256 Kbytes 16,384 rowids 512 Kbytes 32,768 rowids High Group Transition - 1 • The point at which a High Group G-Array transitions from a list of rowids to a bitmap is dependant upon the IQ Page Size: • This is quite important when calculating how large indexes are likely to grow, and hence potentially which index to use for a given column (datatype/cardinality).

Page Size # rows in a 1 page bitmap 64 Kbytes 640,000 rowids 128 Kbytes 1.2 M rowids 256 Kbytes 2.4 M rowids 512 Kbytes 5.1 M rowids High Group Transition - 2 • Remember when the G-Array is a list or rowids it is, at most, one page long for each value. A bitmap is a lot bigger (for larger tables). • So as you can see - if you are running on a 64 KB database, and have a 2 million row table - when the G-Array entry flips to a bitmap – it will grow the G-Array part of the index size by 3 pages (for each and every value that flips).

Specialised High Group - 1 • When a column is created (or altered) to have the following constraints • UNIQUE • Primary Key • Then the column has a unique HG index created automatically • This is a HG without a G-Array

Specialised High Group - 2 • Provided the combination of two or more columns is unique, then you can generate a unique multi-column HG index on product of the two columns • You may still generate other indexes on the base columns • 12.5 Non-Unique Multi-Column Index (required for the Referential Integrity Process)

High Non Group Index (HNG) • Bit-Wise Index • data stored as binary • vertically partitioned • patented by Sybase • cannot be unique • cannot be used with certain data types • Used for • range searches for all cardinality columns • aggregation (sum and average functions)

HNG - High Card Bit-Wise Index • Data with large number of values stored in binary form • Data sliced vertically - each bit position can be manipulated separately • Many bit positions are either all on or all off so no storage space is required with compression

HNG Index Processing For the query: select sum(sales) from customers ASIQ performs the sum as follows : #1bits on*1 + #2bits on*2 + #4bits on*4 +#8bits on*8 (6*1) + (4*2) +(4*4)+ (4*8) = 62

HNG Indexes with Other Indexes • Any Cardinality columns also need an HNG Index • Columns used with aggregates (sum, avg) • Range searches • Root String searches • example: where cust_name like “Stan%” • all other string searches will use the FP index

Compare (CMP) Index • The CMP index is used for “comparing” 2 columns in the same table • It is really just 3 bitmaps • A “less than” bitmap • An “Equal to” bitmap • A “Greater than” bitmap • Performance of t1.col1 > t1.col2 is substantially improved • Load times are only marginally affected (<1%)

Word (WD) Index 12.4.3 • This is a specialised index that indexes each and every “word” in a column • Used for char() varchar() and long varchar() • Slightly faster to load than an equivalent column HG index • Accessed by the “contains” verb • where t1.col1 contains (‘Richard’, ‘Soundy’)

Word (WD) Index 12.5 • In 12.5 this will support the like clause – but it is only accelerated with the WRD index if the token is delimited. • Like “Richard” -> Handled by HG or LF • Like “%Richard%” -> Handled by FP • Like “Richard%” -> Handled by HNG/HG/LF (Range query) • Like “%ÿRichardÿ%” -> Handled by WRD (Note spaces)

Word Index Use • The delimiters and the length of the entries can be set for the WD index during index creation time. • By default the delimiters are all the ASCII characters not defined as number or alphabetic • The limit is the max size of each entry • CREATE WD INDEX earnings_wd ON earnings_report_table(earn_col) DELIMITED BY ‘ :;.’ LIMIT 25

B-Tree Indexes in Sybase and Other RDBMS Systems

B-Tree Indexes in Sybase and Other RDBMS Systems

Presentation Transcript

Indexes

Indexes

Indexes

Indexes

Variant Indexes

Indexes

Indexes

Indexes

IQ Indexes

Indexes

BITMAP INDEXES

Secure Indexes

BITMAP INDEXES

Indexes

Indexes

Primary Indexes

Indexes

Indexes:

Indexes