Index Internals

Agenda • Introduction • Block Structure • Block Compression • Insertion • Deletion • Coalesce / Rebuild • Freelists • Virtual Indexes

B*Tree Indexes • Based on modified B*Tree algorithm • Contain branch blocks and leaf blocks • Blocks contain keys and data • Keys maintained in sorted order within blocks • All leaf blocks are at the same depth • All blocks are on average 75% full

Index Types • There are several recent variants of B*tree indexes including

Limits • Maximum number of B*tree levels is 24 • Maximum number of columns is 16 in 7.3 and below; 32 in 8.0 and above • Maximum key lengths vary with release and block size

Leaf Blocks • Every index has a least one leaf block • Each leaf block contains 0 or more rows • Each row contains a key and data • Indexes can be unique or non-unique • Leaf row formats differ for unique and non-unique indexes

Leaf Block Structure Block Common Header 20 bytes 72 bytes Transaction Header 16 bytes Index Header 16 bytes Index Leaf Header 2 bytes Block Size 2 bytes per row Slot Array Free Space Index Leaf Rows 4 bytes Tail

Branch Blocks • Indexes may contain branch blocks • Branch blocks point to other branch blocks or leaf blocks • Branch blocks contain 0 or more rows • Each row has a suffix compressed key and a pointer to the next block • Compressed rows are terminated with 0xFE byte

Branch Block Structure Block Common Header 20 bytes 48 bytes Transaction Header 16 bytes Index Header 24 bytes Index Branch Header 2 bytes Block Size 2 bytes per row Slot Array Free Space Index Branch Rows 4 bytes Tail

S D E U AUSBELCAN SCOSPA USA DEN ENG Branch Blocks • Each block has a pointer to the left hand side of the tree. This is part of the header • A branch block containing N rows points to N+1 blocks. Root Block Level 2 Branch Blocks Level 1 Leaf Blocks Level 0

Root Block • Every index has one root block • May be a leaf block or a branch block • Can be an empty leaf block • Always the next block after the segment header in the first extent Segment Header RootBlock ….. First Extent

BLEVEL versus Height • BLEVEL is the number of branch block levels in the B*tree ANALYZE INDEX i1 COMPUTE STATISTICS; SELECT blevel FROM dba_indexesWHERE index_name = 'I1'; • Height is the total number of levels in the B*tree ANALYZE INDEX i1 VALIDATE STRUCTURE; SELECT height FROM index_stats; • Height = BLEVEL + 1

Internal versus External ROWIDs • Internal ROWIDs are used in branch blocks Always 4 bytes. • External ROWIDs are used in leaf blocks • In 7.3.4 and below all ROWIDs are 4 bytes • IOTs do not use external ROWIDs

FlagByte LockByte ROWID LengthByte(s) Column 1 LengthByte(s) Column 2 FlagByte LockByte LengthByte(s) Column 1 LengthByte(s) Column 2 LengthByte ROWID Leaf Rows • Unique indexes use one byte per row less than non-unique indexes • Each column has one length byte (< 128 bytes); two length bytes otherwise Unique Index Non-Unique Index

Non Unique versus Unique Indexes SELECT c01,c02FROM t1WHERE c01 IN( SELECT c01 FROM t2); Non UniqueIndex UniqueIndex CREATE INDEX i2 ON t2 (c01); CREATE UNIQUE INDEX i2 ON t2 (c01); SELECT STATEMENT MERGE JOIN SORT JOIN TABLE ACCESS (FULL) 'T1' SORT JOIN VIEW OF 'VW_NSO_1' SORT (UNIQUE) TABLE ACCESS (FULL) 'T2' SELECT STATEMENT NESTED LOOPS TABLE ACCESS (FULL) 'T1' INDEX (UNIQUE SCAN) 'I2'

Non Unique Leaf Rows • All leaf rows are stored in sorted order • For non-unique indexes the ROWID is appended to the key to create a unique key • Keys must be effectively unique so that updates can traverse the B*tree directly to the affected leaf block without requiring a scan • For concatenated indexes allows range scans of prefix columns

Y 01 41 E9 A7 Y 01 41 E9 A8 00 02 01 41 E9 A8 00 02 01 41 E9 A5 00 01 01 41 E9 A7 00 00 Y Y Y 01 41 E9 A5 00 02 01 41 E9 A9 00 00 01 41 E9 A7 00 01 Y Y Y 01 41 E9 A7 00 02 01 41 E9 A9 00 01 01 41 E9 A6 00 00 Y Y Y 01 41 E9 A8 00 00 01 41 E9 A6 00 01 01 41 E9 A9 00 02 Y Y Y 01 41 E9 A6 00 02 01 41 E9 AA 00 00 01 41 E9 A8 00 01 Y Y Y Non-Unique Leaf Rows • For non-unique indexes ROWIDs may be stored in branch blocks • ROWIDs are suffix compressed where possible

Branch Block Compression • Branch block rows are suffix compressed • Number of branch blocks in an index is determined by • Length of index key and data • Number of leaf blocks • Uniqueness of the leading edge of the key • Number of branch blocks affects index height

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 0 0 0 0 9 9 9 9 9 9 Branch Block Compression • A single column 1000000 row CHAR(N) index Monotonic values 000000 to 999999 Constant prefix

0 0 0 0 9 9 9 9 9 9 X X X X X X X X X X X X X X X X X X X X X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X 0 0 0 0 0 0 X 9 9 9 9 9 9 9 X 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 X X X Branch Block Compression • A single column 1000000 row CHAR(12) index Monotonic values 000000 to 999999 Constant prefix Filler

Leaf Block Compression • In Oracle 8.1.5 and above, index leaf blocks can be compressed e.g. CREATE INDEX i1 ON t1 (c01,c02,c03) COMPRESS 2; • Each leaf row is split into a prefix and a suffix • The number of columns in the prefix is specified using the COMPRESS clause • Repeating prefix columns within the block are held once in a prefix row • Suffix row has pointer to prefix row • Prefix row has implicit pointers to suffix rows

Compressed Leaf Block Structure 20 bytes Block Common Header 72 bytes Transaction Header 16 bytes Index Header Index Leaf Header 16 bytes Block Size 2 bytes Compression Header 4 bytes Suffix Slot Array 2 bytes per row 4 bytes per row Prefix Slot Array Free Space Prefix rows and Suffix rows 4 bytes Tail

Suffix Slot Array 1600 1400 1300 1200 1000 900 6 1500 1 1100 1700 4 Prefix Slot Array 900 Rome Suffix Row 1000 Milan Suffix Row 1100 Italy Prefix Row 1200 Munich Suffix Row 1300 Frankfurt Suffix Row 1400 Berlin Suffix Row 1500 Germany Prefix Row 1600 Paris Suffix Row 1700 France Prefix Row Leaf Block Compression

Leaf Block Compression • Example – single column index CREATE TABLE (c01 CHAR(5)); CREATE INDEX i1 ON t1 (c01) COMPRESS 1;

ENGSCOUSA Insertion Inserting 'ENG','SCO' and 'USA' Before After Index initially has single empty leaf block at root. Rows are inserted into leaf block

ENGSCOUSA U BELENGSCO USA Insertion Inserting 'BEL' Before After Root node is now a branch block. Two new leaf nodes created. Leaf rows split between the two new blocks

S U U BELENGSCO USA BELENG SCOSPA USA Insertion Inserting 'SPA' Before After New leaf node created. Leaf rows split between the two leaf nodes. New leaf block pointer added to branch block

S S U U BELENG SCOSPA USA AUSBELENG SCOSPA USA Insertion Inserting 'AUS' Before After New row added to leaf block. No other blocks affected

S U E S U AUSBELENG SCOSPA USA AUSBELCAN SCOSPA USA ENG Insertion Inserting 'CAN' Before After New leaf node created. Leaf rows split between the two leaf nodes. New leaf block pointer added to branch block

S D E U E S U AUSBELCAN SCOSPA USA ENG AUSBELCAN SCOSPA USA DEN ENG Insertion Inserting 'DEN' Before After New leaf node created. Branch block now full so new branch blocks created. Index now has three levels

PCTFREE • Only applies to CREATE INDEX statement. • Not used for subsequent inserts CREATE TABLE t1 (c01 NUMBER); Insert 100000 rows CREATE INDEX i1 ON t1 (c01)PCTFREE 50 CREATE TABLE t1 (c01 NUMBER); CREATE INDEX i1 ON t1 (c01)PCTFREE 50; Insert 100000 rows

S S D E U D E U AUSBELCAN SCOSPA USA AUSBELCAN SCOSPA USA DEN ENG DEN ENG Deletion Deleting 'CAN' Before After Deleted flag is set for leaf row. Leaf block free space count is adjusted. No effect on other blocks

S S D E U D E U AUSBELCAN SCOSPA USA AUSBELCAN SCOSPA USA DEN ENG DEN ENG Deletion Deleting 'ENG' Before After Deleted flag is set for leaf row. Leaf block free space count is adjusted. Leaf block pointer is not deleted from branch

S S D E U D E U AUSBELCAN SCOSPA USA AUSBELCAN SCOSPA USA DEN ENG DEN FRA Deletion Inserting 'FRA' Before After Leaf block is reused. Leaf row added to leaf block. No other blocks affected

S S D D E U AUSBELCAN SCOSPA USA AUSBELCAN SCOSPAUSA DEN ENG DENENG Coalesce ALTER INDEX index1 COALESCE; Before After Leaf rows are merged in leaf blocks. Branch blocks are updated where necessary

DEN S D SP D E U AUSBELCAN SPAUSA DENENGSCO AUSBELCAN SCOSPA USA ENG Rebuild ALTER INDEX index1 REBUILD; Before After Index is rebuilt in a new segment. All blocks are modified

Freelists • Index blocks can be returned to the freelist after all rows have been deleted • Empty index blocks can be recycled • Leaf blocks may be reused as branch blocks and vice versa • Blocks will not be reused until existing blocks on the freelist have been used • Empty blocks are initially returned to the transaction freelist. They are subsequently moved to the master freelist.

724 723 726 730 727 725 724 728 729 722 722 726 730 727 725 723 728 729 725 723 724 726 Freelists Deleting all rows from blocks 723 to 726 Before After Master Freelist Master Freelist 730 731 730 731 TransactionFreelist TransactionFreelist Blocks are inserted in transaction freelist. Branch block is not modified in case rows are re-inserted again

722 726 730 727 725 724 723 728 730 729 722 729 728 726 731 727 725 724 723 Freelists Inserting new rows Before After Master Freelist Master Freelist 730 731 731 TransactionFreelist TransactionFreelist 725 723 724 726 725 723 724 726 Block 730 is full and is removed from master freelist. New rows added to block 731

729 730 722 729 727 726 730 725 723 723 724 724 722 731 728 726 731 727 728 725 731 724 725 726 723 725 724 726 723 Freelists Inserting new rows Before After Master Freelist Master Freelist TransactionFreelist TransactionFreelist Block 731 is full and is removed from master freelist. Blocks 723 to 726 moved from transaction freelist to master freelist. New rows added to block 726

723 725 731 729 728 727 726 730 725 724 722 728 729 722 731 723 724 730 727 726 724 723 725 726 724 725 723 Freelists Inserting new rows Before After Master Freelist Master Freelist TransactionFreelist TransactionFreelist Block 726 is full and is removed from master freelist. New rows added to block 725Leaf block 724 becomes a new branch block

722 723 731 726 730 725 724 728 729 727 731 722 729 727 726 730 725 724 723 Master Freelist Master Freelist TransactionFreelist TransactionFreelist 723 725 728 723 724 Freelists Inserting new rows Before After Blocks 725 and 724 are removed from master freelist. Branch block 728 is moved to transaction freelistNew rows added to block 723

728 724 724 731 727 726 730 725 723 722 729 722 731 729 723 725 727 726 730 Master Freelist Master Freelist TransactionFreelist TransactionFreelist 723 728 728 Freelists Inserting new rows Before After Block 723 is full and is removed from master freelist. Branch block 728 becomes a leaf blockNew rows added to block 728

Virtual Indexes • In 8.1.5 and above it is possible to create virtual indexes • Virtual indexes have a data dictionary definition, but no associated segment • Effectiveness of new indexes can be tested by generating theoretical execution plans • The CBO will consider virtual indexes if the hidden parameter _use_nosegment_indexes is set to true

Virtual Indexes • Consider the following analysed table CREATE TABLE t1 ASSELECT * FROM dba_objectsWHERE ROWNUM < 1000; ANALYZE TABLE t1 COMPUTE STATISTICS; • A virtual index can be created using the NOSEGMENT keyword CREATE INDEX i1 ON t1 (owner, object_name) NOSEGMENT; • Statistics must be generated for the new index based on the existing statistics for the table EXECUTE DBMS_STATS.GENERATE_STATS (USER,'I1');

Virtual Indexes • The statement SELECT object_id FROM t1WHERE owner = USER AND object_name = 'T1'; • generates the plan SELECT STATEMENT Optimizer = CHOOSE TABLE ACCESS (FULL) OF 'T1' • With the hidden parameter enabled ALTER SESSION SET "_use_nosegment_indexes" = TRUE; • the same statement generates a different plan SELECT STATEMENT Optimizer = CHOOSE TABLE ACCESS (BY INDEX ROWID) INDEX (RANGE SCAN) OF 'I1'

Thank you for your interest For more information and to provide feedback please contact me My e-mail address is: info@juliandyke.com My website address is: www.juliandyke.com

Index Internals

Index Internals

Presentation Transcript

Lisp Internals

JVM Internals

Database Internals

CPU Internals

SQL Server Index Internals

Indexing Internals

Browser internals

Process Internals

Thread Internals

ROI_pac Internals

APST Internals

C++ internals

Database Internals

Windows internals

EPICS Internals

VM Internals

Latch Internals

K42 Internals

THE INTERNALS

JVM Internals

Router Internals