Database Tuning: Principles, Experiments, and Troubleshooting Techniques
Database Tuning: Principles, Experiments, and Troubleshooting Techniques

http://www.mkp.com/dbtune

Dennis Shasha ([email protected])

Philippe Bonnet ([email protected])


Availability of Materials

  • The PowerPoint presentation is available on my web site (and Philippe Bonnet’s); just type our names into Google.

  • Experiments are available from the site in Denmark maintained by Philippe (URL given a few slides ahead).

  • Book with same title available from Morgan Kaufmann.


Database Tuning

Database Tuning is the activity of making a database application run more quickly. “More quickly” usually means higher throughput, though it may mean lower response time for time-critical applications.


[Diagram -- who tunes what: the application programmer (e.g., business analyst, data architect) works at the application level; the sophisticated application programmer (e.g., SAP admin) works at the query processor, indexes, storage subsystem, concurrency control and recovery levels; the DBA/tuner works at the operating system and hardware (processors, disks, memory) levels.]


Outline of Tutorial

  • Basic Principles

  • Tuning the guts

  • Indexes

  • Relational Systems

  • Application Interface

  • E-commerce Applications

  • Data Warehouse Applications

  • Distributed Applications

  • Troubleshooting


Goal of the Tutorial

  • To show:

    • Tuning principles that port from one system to another and to new technologies

    • Experimental results to show the effect of these tuning principles.

    • Troubleshooting techniques for chasing down performance problems.


Tuning Principles Leitmotifs

  • Think globally, fix locally (does it matter?)

  • Partitioning breaks bottlenecks (temporal and spatial)

  • Start-up costs are high; running costs are low (disk transfer, cursors)

  • Be prepared for trade-offs (indexes and inserts)


Experiments -- why and where

  • Simple experiments to illustrate the performance impact of tuning principles.

  • Visit http://www.diku.dk/dbtune/experiments to get the SQL scripts, the data, and a tool to run the experiments.


Experimental DBMS and Hardware

  • Results presented throughout this tutorial obtained with:

    • SQL Server 7, SQL Server 2000, Oracle 8i, Oracle 9i, DB2 UDB 7.1

    • Three configurations:

    • Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb) 2 Ultra 160 channels, 4x18Gb drives (10000RPM), Windows 2000.

    • Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.

    • Pentium III (1 GHz, 256 Kb), 1Gb RAM, Adapter 39160 with 2 channels, 3x18Gb drives (10000RPM), Linux Debian 2.4.


Tuning the Guts

  • Concurrency Control

    • How to minimize lock contention?

  • Recovery

    • How to manage the writes to the log (to dumps)?

  • OS

    • How to optimize buffer size, process scheduling, …

  • Hardware

    • How to allocate CPU, RAM and disk subsystem resources?


Isolation

  • Correctness vs. Performance

    • Number of locks held by each transaction

    • Kind of locks

    • Length of time a transaction holds locks


Isolation Levels

  • Read Uncommitted (no lost updates)

    • Exclusive locks for write operations are held for the duration of the transaction

    • No locks for reads

  • Read Committed (no dirty retrieval)

    • Shared locks are released as soon as the read operation terminates.

  • Repeatable Read (no unrepeatable reads for read/write operations)

    • Two phase locking

  • Serializable (read/write/insert/delete model)

    • Table locking or index locking to avoid phantoms


Snapshot Isolation

Each transaction executes against the version of the data items that was committed when the transaction started:

No locks for read

Costs space (old copy of data must be kept)

Almost serializable level:

T1: x := y

T2: y := x

Initially x=3 and y=17

Serial execution: x=y=17 or x=y=3

Snapshot isolation: x=17, y=3 if both transactions start at the same time.
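The write-skew example above can be sketched in a few lines of Python (a toy model of snapshot reads, not a real MVCC engine): each transaction reads only from the snapshot of committed values taken when it started.

```python
# Toy model: under snapshot isolation, each transaction reads from the
# snapshot of committed values taken at its start time.
def concurrent_snapshot_run(db):
    snap_t1 = dict(db)  # T1 starts: sees x=3, y=17
    snap_t2 = dict(db)  # T2 starts at the same time: same snapshot
    db["x"] = snap_t1["y"]  # T1: x := y  (writes 17)
    db["y"] = snap_t2["x"]  # T2: y := x  (writes 3)
    return db

print(concurrent_snapshot_run({"x": 3, "y": 17}))
# {'x': 17, 'y': 3} -- a result no serial execution can produce
```

Since neither transaction's write set intersects the other's, snapshot isolation lets both commit, which is exactly the non-serializable outcome above.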

[Timeline diagram -- snapshot isolation: initially X=Y=Z=0; T1 performs W(Y:=1), T2 performs W(X:=2, Z:=3); a third transaction T3 reads Y=1, Z=0, X=0, i.e., only the values committed when it started.]


Value of Serializability -- data

Settings:

accounts( number, branchnum, balance);

create clustered index c on accounts(number);

  • 100000 rows

  • Cold buffer; same buffer size on all systems.

  • Row level locking

  • Isolation level (SERIALIZABLE or READ COMMITTED)

  • SQL Server 7, DB2 v7.1 and Oracle 8i on Windows 2000

  • Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.


Value of Serializability -- transactions

Concurrent Transactions:

  • T1: summation query [1 thread]

    select sum(balance) from accounts;

  • T2: swap balance between two account numbers (in order of scan to avoid deadlocks) [N threads]

    valX := select balance from accounts where number=X;
    valY := select balance from accounts where number=Y;
    update accounts set balance=valX where number=Y;
    update accounts set balance=valY where number=X;
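A toy interleaving shows why the scan can return an incorrect sum at READ COMMITTED (a hand-forced schedule, not a real lock manager): the scan releases each row's shared lock right after reading it, so a swap can commit between its two reads.

```python
# Accounts X and Y hold 50 and 70, so every serial execution of
# {scan, swap} sums to 120. At READ COMMITTED the scan releases each
# row's shared lock immediately after reading the row.
def scan_with_interleaved_swap(balances):
    total = balances["X"]                              # scan reads X, drops its lock
    balances["X"], balances["Y"] = balances["Y"], balances["X"]  # swap commits here
    total += balances["Y"]                             # scan reads Y: the OLD balance of X
    return total

print(scan_with_interleaved_swap({"X": 50, "Y": 70}))  # 100, not 120
```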


Value of Serializability -- results

With SQL Server and DB2, the scan returns incorrect answers if the read committed isolation level is used (the default setting).

With Oracle, correct answers are returned (snapshot isolation).


Cost of Serializability

Because the update conflicts with the scan, correct answers are obtained at the cost of decreased concurrency and thus decreased throughput.


Locking Overhead -- data

Settings:

accounts( number, branchnum, balance);

create clustered index c on accounts(number);

  • 100000 rows

  • Cold buffer

  • SQL Server 7, DB2 v7.1 and Oracle 8i on Windows 2000

  • No lock escalation on Oracle; Parameter set so that there is no lock escalation on DB2; no control on SQL Server.

  • Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.


Locking Overhead -- transactions

No Concurrent Transactions:

  • Update [10 000 updates]: update accounts set balance = Val;

  • Insert [10 000 transactions], e.g. a typical one: insert into accounts values (664366,72255,2296.12);


Locking Overhead

Row locking is barely more expensive than table locking, because recovery overhead is higher than row locking overhead.

The exception is updates on DB2, where table locking is distinctly less expensive than row locking.


Logical Bottleneck: Sequential Key Generation

  • Consider an application in which one needs a sequential number to act as a key in a table, e.g. invoice numbers for bills.

  • Ad hoc approach: a separate table holding the last invoice number. Fetch and update that number on each insert transaction.

  • Counter approach: use a facility such as Sequence (Oracle) / Identity (MSSQL).


Counter Facility -- data

Settings:

  • default isolation level: READ COMMITTED; Empty tables

  • Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

accounts( number, branchnum, balance);

create clustered index c on accounts(number);

counter ( nextkey );

insert into counter values (1);


Counter Facility -- transactions

No Concurrent Transactions:

  • System [100 000 inserts, N threads]

    • SQL Server 7 (uses Identity column)

      insert into accounts values (94496,2789);

    • Oracle 8i

      insert into accounts values (seq.nextval,94496,2789);

  • Ad hoc [100 000 inserts, N threads]

    begin transaction
    NextKey := select nextkey from counter;
    update counter set nextkey = NextKey+1;
    insert into accounts values (NextKey,?,?);
    commit transaction


Avoid Bottlenecks: Counters

A system-generated counter (system) performs much better than a counter managed as an attribute value within a table (ad hoc). The counter update is a separate transaction.

The Oracle counter can become a bottleneck if every update is logged to disk, but caching many counter numbers is possible.

Counters may miss ids.
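The caching remark can be made concrete with a toy model (illustrative, not Oracle's actual implementation): a sequence that logs only every Nth allocation does one disk write per N ids, and ids handed out above the last logged high-water mark are skipped after a crash.

```python
class CachedSequence:
    """Toy model of a system counter with caching: only one allocation
    in `cache_size` forces a log write; ids above the logged high-water
    mark are lost on a crash (hence 'counters may miss ids')."""
    def __init__(self, cache_size=20):
        self.cache_size = cache_size
        self.logged_high = 0   # highest value guaranteed on stable storage
        self.next_val = 1
        self.log_writes = 0
    def nextval(self):
        if self.next_val > self.logged_high:
            self.logged_high += self.cache_size
            self.log_writes += 1          # one write covers cache_size ids
        val = self.next_val
        self.next_val += 1
        return val

seq = CachedSequence(cache_size=20)
ids = [seq.nextval() for _ in range(100)]
print(ids[:3], seq.log_writes)  # [1, 2, 3] 5 -- 100 ids, only 5 log writes
```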


Semantics-Altering Chopping

You call up an airline and want to reserve a seat. You talk to the agent, find a seat and then reserve it.

Should all of this be a single transaction?


Semantics-Altering Chopping II

Probably not. You don’t want to hold locks through human interaction.

Transaction redesign: the read is its own transaction; getting the seat is another.

Consequences?


Atomicity and Durability

Every transaction either commits or aborts. It cannot change its mind.

Even in the face of failures:

Effects of committed transactions should be permanent;

Effects of aborted transactions should leave no trace.

[State diagram: BEGIN TRANS takes a transaction from Ø to ACTIVE (running, waiting); COMMIT takes it to COMMITTED; ROLLBACK takes it to ABORTED.]


[Diagram -- write-ahead logging: in unstable storage, the database buffer and the log buffer (holding log records lri, lrj); log records are written to the LOG on stable storage before commit; modified pages are written to the DATA disks after commit; recovery reads the stable log.]


Log IO -- data

Settings:

lineitem( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT );

  • READ COMMITTED isolation level

  • Empty table

  • Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.


Log IO -- transactions

No Concurrent Transactions:

Insertions [300 000 inserts, 10 threads], e.g.,

insert into lineitem values (1,7760,401,1,17,28351.92,0.04,0.02,'N','O','1996-03-13','1996-02-12','1996-03-22','DELIVER IN PERSON','TRUCK','blithely regular ideas caj');


Group Commits

DB2 UDB v7.1 on Windows 2000

Log records of many transactions are written together:

increases throughput by reducing the number of writes,

at the cost of increased minimum response time.
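The trade-off can be sketched with simple arithmetic (the group size is illustrative):

```python
def forced_log_writes(n_commits, group_size):
    """With group commit, one forced log write flushes the commit
    records of up to `group_size` transactions (ceiling division)."""
    return -(-n_commits // group_size)

# 300 000 committing transactions, as in the insert experiment above:
print(forced_log_writes(300_000, 1))  # 300000 -- one forced write per commit
print(forced_log_writes(300_000, 8))  # 37500 -- far fewer writes, higher throughput
```

The price is latency: an individual transaction may wait for its group to fill (or for a timer) before its commit record reaches disk.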


Put the Log on a Separate Disk

DB2 UDB v7.1 on Windows 2000

5% performance improvement if the log is located on a different disk.

The controller cache hides the negative impact.

Mid-range server, with Adaptec RAID controller (80 Mb RAM) and 2x18Gb disk drives.


Tuning Database Writes

  • Dirty data is written to disk

    • When the number of dirty pages is greater than a given parameter (Oracle 8)

    • When the number of dirty pages crosses a given threshold (less than 3% of free pages in the database buffer for SQL Server 7)

    • When the log is full, a checkpoint is forced. This can have a significant impact on performance.


Tune Checkpoint Intervals

Oracle 8i on Windows 2000

A checkpoint (partial flush of dirty pages to disk) occurs at regular intervals or when the log is full:

impacts the performance of on-line processing,

reduces the size of the log,

reduces the time to recover from a crash.


Database Buffer Size

If the buffer is too small, then the hit ratio is too small:

hit ratio = (logical acc. - physical acc.) / (logical acc.)

If the buffer is too large, paging occurs.

Recommended strategy: monitor hit ratio and increase buffer size until hit ratio flattens out. If there is still paging, then buy memory.
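The formula translates directly into code (the numbers are illustrative):

```python
def hit_ratio(logical_accesses, physical_accesses):
    """Fraction of page accesses served from the database buffer:
    (logical acc. - physical acc.) / (logical acc.)."""
    return (logical_accesses - physical_accesses) / logical_accesses

# e.g. 100,000 page requests of which 9,000 had to go to disk:
print(hit_ratio(100_000, 9_000))  # 0.91
```

Per the strategy above, one would grow the buffer while this value keeps rising, and stop once it flattens out.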

[Diagram: database processes access the database buffer in RAM, which is backed by a paging disk; the LOG and DATA files reside on their own disks.]


Buffer Size -- data

Settings:

employees(ssnum, name, lat, long, hundreds1,

hundreds2);

clustered index c on employees(lat); (unused)

  • 10 distinct values of lat and long, 100 distinct values of hundreds1 and hundreds2

  • 20000000 rows (630 Mb);

  • Warm Buffer

  • Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000 RPM), Windows 2000.


Buffer Size -- queries

Queries:

  • Scan Query

    select sum(long) from employees;

  • Multipoint query

    select * from employees where lat = ?;


Database Buffer Size

SQL Server 7 on Windows 2000

Scan query: LRU (least recently used) does badly when the table spills to disk, as Stonebraker observed 20 years ago.

Multipoint query: throughput increases with buffer size until all data is accessed from RAM.


Scan Performance -- data

Settings:

lineitem( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT );

  • 600 000 rows

  • Lineitem tuples are ~ 160 bytes long

  • Cold Buffer

  • Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.


Scan Performance -- queries

Queries:

select avg(l_discount) from lineitem;


Prefetching

DB2 UDB v7.1 on Windows 2000

Throughput increases up to a certain point when the prefetching size increases.


Usage Factor

DB2 UDB v7.1 on Windows 2000

The usage factor is the percentage of each page used by tuples and auxiliary data structures (the rest is reserved for future insertions).

Scan throughput increases with the usage factor.



Outline

  • Storage Subsystem Components

    • Moore’s law and consequences

    • Magnetic disk performances

  • From SCSI to SAN, NAS and Beyond

    • Storage virtualization

  • Tuning the Storage Subsystem

    • RAID levels

    • RAID controller cache


Exponential Growth

Moore’s law

  • Every 18 months:

    • New processing = sum of all existing processing

    • New storage = sum of all existing storage

  • 2x / 18 months ~ 100x / 10 years

    http://www.intel.com/research/silicon/moorespaper.pdf


Consequences of “Moore’s law”

  • Over the last decade:

    • 10x better access time

    • 10x more bandwidth

    • 100x more capacity

    • 4000x lower media price

    • Scan takes 10x longer (3 min vs 45 min)

    • Data on disk is accessed 25x less often (on average)


Memory Hierarchy

[Diagram -- memory hierarchy, access time vs. price ($/Mb): processor cache (1 ns, ~$100/Mb), then RAM/flash, then disks, then tapes/optical disks (~$0.2/Mb nearline); roughly a x10 step between successive levels.]


Magnetic Disks

1956: IBM RAMAC, the first disk drive: 5 Mb, 0.002 Mb/in², 35000 $/year, 9 Kb/sec.

1980: SEAGATE, the first 5.25’’ disk drive: 5 Mb, 1.96 Mb/in², 625 Kb/sec.

1999: IBM MICRODRIVE, the first 1’’ disk drive: 340 Mb, 6.1 MB/sec.

[Diagram: disk anatomy, showing platters, spindle, tracks, read/write head, disk arm, actuator, controller and disk interface.]


Access Time (2001)

Controller overhead (0.2 ms)

Seek Time (4 to 9 ms)

Rotational Delay (2 to 6 ms)

Read/Write Time (10 to 500 KB/ms)

Disk Interface

IDE (16 bits, Ultra DMA - 25 MHz)

SCSI: width (narrow 8 bits vs. wide 16 bits) - frequency (Ultra3 - 80 MHz).

http://www.pcguide.com/ref/hdd/

Magnetic Disks


RAID Levels

  • RAID 0: striping (no redundancy)

  • RAID 1: mirroring (2 disks)

  • RAID 5: parity checking

    • Read: stripes read from multiple disks (in parallel)

    • Write: 2 reads + 2 writes

  • RAID 10: striping and mirroring

  • Software vs. Hardware RAID:

    • Software RAID: run on the server’s CPU

    • Hardware RAID: run on the RAID controller’s CPU


Why 4 read/writes when updating a single stripe using RAID 5?

  • Read old data stripe; read parity stripe (2 reads)

  • XOR old data stripe with replacing one.

  • Take result of XOR and XOR with parity stripe.

  • Write new data stripe and new parity stripe (2 writes).
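The four steps can be verified with byte-level XOR arithmetic (the stripe values are illustrative):

```python
def new_parity_after_update(old_data, new_data, old_parity):
    """RAID 5 small write: XOR the old data with the new data to get a
    change mask, then XOR that mask into the old parity. This is why one
    logical write costs 2 reads (old data, old parity) + 2 writes."""
    change = bytes(a ^ b for a, b in zip(old_data, new_data))
    return bytes(p ^ c for p, c in zip(old_parity, change))

# Three data stripes; the parity stripe is the XOR of all of them.
d0, d1, d2 = b"\x0f", b"\xf0", b"\x33"
parity = bytes(a ^ b ^ c for a, b, c in zip(d0, d1, d2))
new_d1 = b"\x55"
parity2 = new_parity_after_update(d1, new_d1, parity)
# Incremental parity update matches recomputing the parity from scratch:
assert parity2 == bytes(a ^ b ^ c for a, b, c in zip(d0, new_d1, d2))
```

The point of the incremental update is that the other data stripes (d0, d2 here) never need to be read.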


RAID Levels -- data

Settings:

accounts( number, branchnum, balance);

create clustered index c on accounts(number);

  • 100000 rows

  • Cold Buffer

  • Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.


RAID Levels -- transactions

No Concurrent Transactions:

  • Read Intensive:

    select avg(balance) from accounts;

  • Write Intensive, e.g. typical insert:

    insert into accounts values (690466,6840,2272.76);

    Writes are uniformly distributed.


SQL Server 7 on Windows 2000 (SoftRAID means striping/parity at host)

Read-Intensive:

Using multiple disks (RAID0, RAID 10, RAID5) increases throughput significantly.

Write-Intensive:

Without cache, RAID 5 suffers. With cache, it is ok.

RAID Levels


RAID Levels

  • Log File

    • RAID 1 is appropriate

      • Fault tolerance with high write throughput. Writes are synchronous and sequential. No benefits in striping.

  • Temporary Files

    • RAID 0 is appropriate.

      • No fault tolerance. High throughput.

  • Data and Index Files

    • RAID 5 is best suited for read intensive apps or if the RAID controller cache is effective enough.

    • RAID 10 is best suited for write intensive apps.


Controller Prefetching? No. Write-back? Yes.

  • Read-ahead:

    • Prefetching at the disk controller level.

    • No information on access pattern.

    • Better to let database management system do it.

  • Write-back vs. write through:

    • Write back: transfer terminated as soon as data is written to cache.

      • Batteries to guarantee write back in case of power failure

    • Write through: transfer terminated as soon as data is written to disk.


SCSI Controller Cache -- data

Settings:

employees(ssnum, name, lat, long, hundreds1,

hundreds2);

create clustered index c on employees(hundreds2);

  • Employees table partitioned over two disks; Log on a separate disk; same controller (same channel).

  • 200 000 rows per table

  • Database buffer size limited to 400 Mb.

  • Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.


SCSI (not disk) Controller Cache -- transactions

No Concurrent Transactions:

update employees set lat = long, long = lat where hundreds2 = ?;

  • cache friendly: update of 20,000 rows (~90Mb)

  • cache unfriendly: update of 200,000 rows (~900Mb)


SQL Server 7 on Windows 2000.

Adaptec ServerRaid controller:

80 Mb RAM

Write-back mode

Updates

Controller cache increases throughput whether operation is cache friendly or not.

Efficient replacement policy!

SCSI Controller Cache


Index Tuning

  • Index issues

    • Indexes may be better or worse than scans

    • Multi-table joins that run for hours because the wrong indexes are defined

    • Concurrency control bottlenecks

    • Indexes that are maintained and never used


Index

An index is a data structure that supports efficient access to data.

[Diagram: a condition on an attribute value (the search key) goes through the index to retrieve the matching records from the set of records.]




Search Keys

  • A (search) key is a sequence of attributes.

    create index i1 on accounts(branchnum, balance);

  • Types of keys

    • Sequential: the value of the key is monotonic with the insertion order (e.g., counter or timestamp)

    • Non sequential: the value of the key is unrelated to the insertion order (e.g., social security number)


Data Structures

  • Most index data structures can be viewed as trees.

  • In general, the root of this tree will always be in main memory, while the leaves will be located on disk.

    • The performance of a data structure depends on the number of nodes in the average path from the root to the leaf.

    • Data structures with a high fan-out (maximum number of children of an internal node) are thus preferred.
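The effect of fan-out on path length follows from a back-of-envelope formula (the numbers are illustrative):

```python
import math

def levels(n_keys, fanout):
    """Approximate number of levels on the root-to-leaf path of a tree
    index: each level multiplies the reachable leaves by the fan-out."""
    return math.ceil(math.log(n_keys) / math.log(fanout))

# 100 million keys: fan-out 2 needs ~27 levels, fan-out 200 only ~4.
# With the root (and upper levels) in main memory, levels become disk reads.
print(levels(100_000_000, 2))    # 27
print(levels(100_000_000, 200))  # 4
```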


B+-Tree

  • A B+-Tree is a balanced tree whose leaves contain a sequence of key-pointer pairs.

[Diagram -- a B+-Tree: root (96); internal nodes (75 83) and (107); leaves (33 48 69), (75 80 81), (83 92 95), (96 98 103), (107 110 120).]


B+-Tree Performance

  • Tree levels

    • Tree Fanout

      • Size of key

      • Page utilization

  • Tree maintenance

    • Online

      • On inserts

      • On deletes

    • Offline

  • Tree locking

  • Tree root in main memory


B+-Tree Performance

  • Key length influences fanout

    • Choose small key when creating an index

    • Key compression

      • Prefix compression (Oracle 8, MySQL): only store that part of the key that is needed to distinguish it from its neighbors: Smi, Smo, Smy for Smith, Smoot, Smythe.

      • Front compression (Oracle 5): adjacent keys have their front portion factored out: Smi, (2)o, (2)y. There are problems with this approach:

        • Processor overhead for maintenance

        • Locking Smoot requires locking Smith too.
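The Smi/Smo/Smy example can be reproduced with a short prefix-compression sketch (illustrative; real servers compress within each page):

```python
def prefix_compress(sorted_keys):
    """Keep only the shortest prefix of each key that distinguishes it
    from its neighbors in the sorted node."""
    out = []
    for i, key in enumerate(sorted_keys):
        need = 1
        neighbors = sorted_keys[max(0, i - 1):i] + sorted_keys[i + 1:i + 2]
        for nb in neighbors:
            # grow the prefix until it no longer matches the neighbor
            while key[:need] == nb[:need] and need < len(key):
                need += 1
        out.append(key[:need])
    return out

print(prefix_compress(["Smith", "Smoot", "Smythe"]))  # ['Smi', 'Smo', 'Smy']
```

Shorter stored keys mean more keys per node, hence higher fan-out and fewer levels.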


Hash Index

A hash index stores key-value pairs based on a pseudo-randomizing function called a hash function.

[Diagram: each key is mapped by the hash function to one of the buckets 0..n; records hashing to the same bucket form a chain (e.g., R1 R5; R3 R6 R9; R14 R17 R21 R25). The length of these chains impacts performance.]
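The impact of skewed key values on chain lengths can be illustrated with a quick occupancy count (a sketch; real hash indexes page their buckets):

```python
from collections import Counter

def bucket_occupancy(key_values, n_buckets):
    """Number of index entries per hash bucket; entries beyond a
    bucket's capacity spill into an overflow chain."""
    return Counter(hash(v) % n_buckets for v in key_values)

# A column with only 100 distinct values hashed into 1000 buckets:
occ = bucket_occupancy([i % 100 for i in range(10_000)], 1000)
print(len(occ), max(occ.values()))  # at most 100 buckets used, so chains grow long
```

No matter how good the hash function, a key with few distinct values concentrates all entries in a few buckets, which foreshadows the hundreds2 experiment later in this tutorial.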


Clustered / Non-Clustered Index

Clustered index (primary index): a clustered index on attribute X co-locates records whose X values are near to one another.

Non-clustered index (secondary index): a non-clustered index does not constrain the table organization. There might be several non-clustered indexes per table.


Dense / Sparse Index

Sparse index: pointers are associated to pages.

Dense index: pointers are associated to records. Non-clustered indexes are dense.

[Diagram: a sparse index holds one pointer per page (P1, P2, …, Pi); a dense index holds one pointer per record.]


Index Implementations in Some Major DBMS

SQL Server

B+-Tree data structure.

Clustered indexes are sparse.

Indexes maintained as updates/insertions/deletes are performed.

DB2

B+-Tree data structure, spatial extender for R-tree.

Clustered indexes are dense.

Explicit command for index reorganization.

Oracle

B+-Tree, hash, bitmap, spatial extender for R-Tree.

Clustered index: index-organized table (unique/clustered); clusters used when creating tables.

Index-organized tables organize data according to the index.


Types of Queries

Point Query:
SELECT balance
FROM accounts
WHERE number = 1023;

Multipoint Query:
SELECT balance
FROM accounts
WHERE branchnum = 100;

Range Query:
SELECT number
FROM accounts
WHERE balance > 10000 and balance <= 20000;

Prefix Match Query:
SELECT *
FROM employees
WHERE name = ‘J*’;


More Types of Queries

Extremal Query:
SELECT *
FROM accounts
WHERE balance = max(select balance from accounts);

Ordering Query:
SELECT *
FROM accounts
ORDER BY balance;

Grouping Query:
SELECT branchnum, avg(balance)
FROM accounts
GROUP BY branchnum;

Join Query:
SELECT distinct branch.adresse
FROM accounts, branch
WHERE accounts.branchnum = branch.number
and accounts.balance > 10000;


Settings:

employees(ssnum, name, lat, long, hundreds1,

hundreds2);

clustered index c on employees(hundreds1) with fillfactor = 100;

nonclustered index nc on employees (hundreds2);

index nc3 on employees (ssnum, name, hundreds2);

index nc4 on employees (lat, ssnum, name);

1000000 rows ; Cold buffer

Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.

Index Tuning -- data


Operations:

Update: update employees set name = ‘XXX’ where ssnum = ?;

Insert: insert into employees values (1003505,'polo94064',97.48,84.03,4700.55,3987.2);

Multipoint query: select * from employees where hundreds1= ?; select * from employees where hundreds2= ?;

Covered query: select ssnum, name, lat from employees;

Range Query: select * from employees where long between ? and ?;

Point Query: select * from employees where ssnum = ?;

Index Tuning -- operations


Multipoint query that returns 100 records out of 1000000.

Cold buffer

Clustered index is twice as fast as non-clustered index and orders of magnitude faster than a scan.

Clustered Index


The index is created with fillfactor = 100.

Insertions cause page splits and extra I/O for each query.

Maintenance consists of dropping and recreating the index.

With maintenance, performance is constant; if no maintenance is performed, performance degrades significantly.

Index “Face Lifts”


In Oracle, clustered indexes are approximated by an index defined on a clustered table.

No automatic physical reorganization

Index defined with pctfree = 0

Overflow pages cause performance degradation

Index Maintenance


Covering Index -- defined

  • Select name from employee where department = “marketing”

  • Good covering index would be on (department, name)

  • Index on (name, department) less useful.

  • Index on department alone moderately useful.


A covering index performs better than a clustering index when the first attributes of the index are in the where clause and the last attributes are in the select.

When attributes are not in order then performance is much worse.

Covering Index - impact


IBM DB2 v7.1 on Windows 2000

Range Query

If a query retrieves 10% of the records or more, scanning is often better than using a non-clustering non-covering index. Crossover > 10% when records are large or table is fragmented on disk – scan cost increases.

Scan Can Sometimes Win
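A rough cost model behind this rule of thumb (all constants are illustrative): a non-clustering, non-covering index pays roughly one random page read per matching row, while a scan pays one sequential read per page.

```python
def cheaper_plan(selectivity, n_rows, rows_per_page, rand_seq_ratio=10):
    """Back-of-envelope crossover between index and scan (toy model):
    random page reads cost rand_seq_ratio times a sequential page read."""
    index_cost = selectivity * n_rows * rand_seq_ratio  # one random I/O per match
    scan_cost = n_rows / rows_per_page                  # sequential I/O per page
    return "index" if index_cost < scan_cost else "scan"

print(cheaper_plan(0.001, 600_000, 50))  # index
print(cheaper_plan(0.10, 600_000, 50))   # scan
```

Larger records (fewer rows per page) or fragmentation raise the scan cost, which is why the crossover can move above 10%.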


Small table: 100 records, i.e., a few pages.

Two concurrent processes perform updates (each process works for 10 ms before it commits).

No index: the table is scanned for each update. No concurrent updates.

A clustered index allows updates to take advantage of row locking.

Index on Small Tables


Bitmap vs. Hash vs. B+-Tree

Settings:

employees(ssnum, name, lat, long, hundreds1,

hundreds2);

create cluster c_hundreds (hundreds2 number(8)) PCTFREE 0;

create cluster c_ssnum(ssnum integer) PCTFREE 0 size 60;

create cluster c_hundreds(hundreds2 number(8)) PCTFREE 0 HASHKEYS 1000 size 600;

create cluster c_ssnum(ssnum integer) PCTFREE 0 HASHKEYS 1000000 SIZE 60;

create bitmap index b on employees (hundreds2);

create bitmap index b2 on employees (ssnum);

  • 1000000 rows ; Cold buffer

  • Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.


There is an overflow chain in the hash index because hundreds2 has few distinct values.

In a clustered B-Tree index, records are on contiguous pages.

A bitmap index's size is proportional to the size of the table, and record access through it is non-clustered.

Multipoint query: B-Tree, Hash Tree, Bitmap


Hash indexes don’t help when evaluating range queries

Hash index outperforms B-tree on point queries

B-Tree, Hash Tree, Bitmap


Tuning Relational Systems

  • Schema Tuning

    • Denormalization, Vertical Partitioning

  • Query Tuning

    • Query rewriting

    • Materialized views


Settings:

lineitem( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT );

region( R_REGIONKEY, R_NAME, R_COMMENT );

nation( N_NATIONKEY, N_NAME, N_REGIONKEY, N_COMMENT );

supplier( S_SUPPKEY, S_NAME, S_ADDRESS, S_NATIONKEY, S_PHONE, S_ACCTBAL, S_COMMENT );

600000 rows in lineitem, 25 nations, 5 regions, 500 suppliers

Denormalizing -- data


Denormalizing -- transactions

lineitemdenormalized ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE , L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS , L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT , L_SHIPMODE , L_COMMENT, L_REGIONNAME);

  • 600000 rows in lineitemdenormalized

  • Cold Buffer

  • Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.


Queries on Normalized vs. Denormalized Schemas

Queries:

select L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, R_NAME

from LINEITEM, REGION, SUPPLIER, NATION

where

L_SUPPKEY = S_SUPPKEY

and S_NATIONKEY = N_NATIONKEY

and N_REGIONKEY = R_REGIONKEY

and R_NAME = 'EUROPE';

select L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_REGIONNAME

from LINEITEMDENORMALIZED

where L_REGIONNAME = 'EUROPE';


TPC-H schema

Query: find all lineitems whose supplier is in Europe.

With a normalized schema this query is a 4-way join.

If we denormalize lineitem and add the name of the region for each lineitem (foreign-key denormalization), throughput improves by 30%.



Vertical Partitioning

  • Consider account(id, balance, homeaddress)

  • When might it be a good idea to do a “vertical partitioning” into account1(id,balance) and account2(id,homeaddress)?

  • Join vs. size.


Vertical Partitioning

Which design is better depends on the query pattern:

The application that sends a monthly statement is the principal user of the address of the owner of an account

The balance is updated or examined several times a day.

The second schema might be better because the relation (account_ID, balance) can be made smaller:

More account_ID, balance pairs fit in memory, thus increasing the hit ratio

A scan performs better because there are fewer pages.

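The trade-off can be sketched in a few lines of Python with sqlite3. The account tables follow the slide; the data and the id-42 lookup are made-up illustrations, not the book's experiment.

```python
import sqlite3

# In-memory sketch of the two account designs discussed above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Single-table design.
cur.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL, homeaddress TEXT)")

# Vertically partitioned design: hot (id, balance) pairs in a narrow
# table, the rarely-read addresses in another.
cur.execute("CREATE TABLE account1 (id INTEGER PRIMARY KEY, balance REAL)")
cur.execute("CREATE TABLE account2 (id INTEGER PRIMARY KEY, homeaddress TEXT)")

for i in range(100):
    cur.execute("INSERT INTO account VALUES (?, ?, ?)", (i, 100.0 * i, "addr %d" % i))
    cur.execute("INSERT INTO account1 VALUES (?, ?)", (i, 100.0 * i))
    cur.execute("INSERT INTO account2 VALUES (?, ?)", (i, "addr %d" % i))

# The frequent balance lookup touches only the narrow table ...
balance = cur.execute("SELECT balance FROM account1 WHERE id = 42").fetchone()[0]

# ... while the monthly statement reconstructs full rows with a join.
row = cur.execute(
    "SELECT a1.id, a1.balance, a2.homeaddress "
    "FROM account1 a1 JOIN account2 a2 ON a1.id = a2.id WHERE a1.id = 42"
).fetchone()
print(balance, row)
```

The narrow account1 table is what raises the buffer hit ratio in the experiment above; the join is the price paid by the monthly-statement application.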


Tuning Normalization

  • A single normalized relation XYZ is better than two normalized relations XY and XZ if the single relation design allows queries to access X, Y and Z together without requiring a join.

  • The two-relation design is better iff:

    • User accesses tend to partition between the two sets Y and Z most of the time

    • Attributes Y or Z have large values


Vertical Partitioning and Scan

R(X, Y, Z)

X is an integer

Y and Z are large strings

Scan Query

Vertical partitioning exhibits poor performance when all attributes are accessed.

Vertical partitioning provides a speed-up if only two of the attributes are accessed.



Vertical Partitioning and Point Queries

R(X, Y, Z)

X is an integer

Y and Z are large strings

A mix of point queries access either XYZ or XY.

Vertical partitioning gives a performance advantage if the proportion of queries accessing only XY is greater than 20%.

The join is not expensive compared to a simple look-up.



Queries

Settings:

employee(ssnum, name, dept, salary, numfriends);

student(ssnum, name, course, grade);

techdept(dept, manager, location);

clustered index i1 on employee (ssnum);

nonclustered index i2 on employee (name);

nonclustered index i3 on employee (dept);

clustered index i4 on student (ssnum);

nonclustered index i5 on student (name);

clustered index i6 on techdept (dept);

100000 rows in employee, 100000 students, 10 departments; Cold buffer

Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.



Queries – View on Join

View Techlocation:

create view techlocation as
select ssnum, techdept.dept, location
from employee, techdept
where employee.dept = techdept.dept;

Queries:

Original:

select dept from techlocation where ssnum = ?;

Rewritten:

select dept from employee where ssnum = ?;



Query Rewriting – Views

All systems expand the selection on a view into a join.

The difference between a plain selection and a join (on a primary key-foreign key) followed by a projection is greater on SQL Server than on Oracle and DB2 v7.1.



Queries – Correlated Subqueries

Queries:

Original:

select ssnum from employee e1 where salary = (select max(salary) from employee e2 where e2.dept = e1.dept);

Rewritten:

select max(salary) as bigsalary, dept

into TEMP

from employee group by dept;

select ssnum

from employee, TEMP

where salary = bigsalary

and employee.dept = temp.dept;

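The equivalence of the two formulations can be checked with sqlite3. The employee schema follows the settings slide; the rows are made up, and the TEMP table is named maxsal here to avoid clashing with SQLite's temp schema name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (ssnum INTEGER, name TEXT, dept TEXT, salary REAL, numfriends INTEGER)")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
                [(1, "a", "d1", 100, 0), (2, "b", "d1", 200, 0),
                 (3, "c", "d2", 150, 0), (4, "d", "d2", 150, 0)])

# Original: correlated subquery evaluated per outer row.
correlated = cur.execute(
    "SELECT ssnum FROM employee e1 WHERE salary = "
    "(SELECT max(salary) FROM employee e2 WHERE e2.dept = e1.dept)"
).fetchall()

# Rewritten: materialize the per-department maxima once, then join.
cur.execute("CREATE TABLE maxsal AS "
            "SELECT max(salary) AS bigsalary, dept FROM employee GROUP BY dept")
rewritten = cur.execute(
    "SELECT ssnum FROM employee, maxsal "
    "WHERE salary = bigsalary AND employee.dept = maxsal.dept"
).fetchall()
print(sorted(correlated), sorted(rewritten))
```

Both return every employee earning the maximum salary of his or her department; the rewrite simply trades repeated subquery evaluation for one grouped scan plus a join.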


Query Rewriting – Correlated Subqueries

SQL Server 2000 does a good job at handling the correlated subqueries (a hash join is used as opposed to a nested loop between query blocks)

The techniques implemented in SQL Server 2000 are described in “Orthogonal Optimization of Subqueries and Aggregates” by C. Galindo-Legaria and M. Joshi, SIGMOD 2001.



Eliminate Unneeded DISTINCTs

  • Query: Find employees who work in the information systems department. There should be no duplicates.

    SELECT DISTINCT ssnum
    FROM employee
    WHERE dept = 'information systems'

  • DISTINCT is unnecessary, since ssnum is a key of employee and hence a key of any subset of employee.


Eliminate Unneeded DISTINCTs

  • Query: Find social security numbers of employees in the technical departments. There should be no duplicates.

    SELECT DISTINCT ssnum
    FROM employee, tech
    WHERE employee.dept = tech.dept

  • Is DISTINCT needed?


DISTINCT Unnecessary Here Too

  • Since dept is a key of the tech table, each employee record will join with at most one record in tech.

  • Because ssnum is a key for employee, distinct is unnecessary.


Reaching

  • The relationship among DISTINCT, keys and joins can be generalized:

    • Call a table T privileged if the fields returned by the select contain a key of T

    • Let R be an unprivileged table. If R is joined on equality by its key field to some other table S, then we say R reaches S.

    • Now, define reaches to be transitive. So, if R1 reaches R2 and R2 reaches R3 then say that R1 reaches R3.


Reaches: Main Theorem

  • There will be no duplicates among the records returned by a selection, even in the absence of DISTINCT, if one of the two following conditions holds:

    • Every table mentioned in the FROM clause is privileged

    • Every unprivileged table reaches at least one privileged table.


Reaches: Proof Sketch

  • If every relation is privileged then there are no duplicates

    • The keys of those relations all appear in the select clause

  • Suppose some relation T is not privileged but reaches at least one privileged one, say R. Then the qualifications linking T with R ensure that each distinct combination of privileged records is joined with at most one record of T.


Reaches: Example 1

  • SELECT ssnum
    FROM employee, tech
    WHERE employee.manager = tech.manager

  • The same employee record may match several tech records (because manager is not a key of tech), so the ssnum of that employee record may appear several times.

  • Tech does not reach the privileged relation employee.


Reaches: Example 2

  • Each repetition of a given ssnum value would be accompanied by a new tech.dept, since tech.dept is a key of tech

  • Both relations are privileged.

SELECT ssnum, tech.dept
FROM employee, tech
WHERE employee.manager = tech.manager


Reaches: Example 3

SELECT student.ssnum
FROM student, employee, tech
WHERE student.name = employee.name
AND employee.dept = tech.dept;

  • Student is privileged

  • Employee does not reach student (name is not a key of employee)

  • DISTINCT is needed to avoid duplicates.
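Example 1 can be replayed with sqlite3; the tables are cut-down versions of the settings slide and the data is made up. A join on a non-key (manager) produces a duplicate ssnum, while a join on tech's key (dept) does not — exactly the reaching condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (ssnum INTEGER PRIMARY KEY, manager TEXT, dept TEXT)")
cur.execute("CREATE TABLE tech (dept TEXT PRIMARY KEY, manager TEXT)")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, "m1", "d1"), (2, "m2", "d2")])
# m1 manages two tech departments, so manager is not a key of tech.
cur.executemany("INSERT INTO tech VALUES (?, ?)",
                [("d1", "m1"), ("d2", "m1")])

# Example 1: join on manager (not a key of tech) -> duplicates.
dup = cur.execute(
    "SELECT ssnum FROM employee, tech WHERE employee.manager = tech.manager"
).fetchall()

# Joining tech on its key dept: tech reaches employee -> no duplicates.
nodup = cur.execute(
    "SELECT ssnum FROM employee, tech WHERE employee.dept = tech.dept"
).fetchall()
print(sorted(dup), sorted(nodup))
```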


Aggregate Maintenance -- data

Settings:

orders( ordernum, itemnum, quantity, storeid, vendorid );

create clustered index i_order on orders(itemnum);

store( storeid, name );

item(itemnum, price);

create clustered index i_item on item(itemnum);

vendorOutstanding( vendorid, amount);

storeOutstanding( storeid, amount);

  • 1000000 orders, 10000 stores, 400000 items; Cold buffer

  • Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.


Aggregate Maintenance -- triggers

Triggers for Aggregate Maintenance

create trigger updateVendorOutstanding on orders for insert as

update vendorOutstanding

set amount =

(select vendorOutstanding.amount+sum(inserted.quantity*item.price)

from inserted,item

where inserted.itemnum = item.itemnum

)

where vendorid = (select vendorid from inserted) ;

create trigger updateStoreOutstanding on orders for insert as

update storeOutstanding

set amount =

(select storeOutstanding.amount+sum(inserted.quantity*item.price)

from inserted,item

where inserted.itemnum = item.itemnum

)

where storeid = (select storeid from inserted)
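A row-level analogue of the first trigger above can be run anywhere with sqlite3. Table names follow the slide; prices and quantities are made up, and SQLite's trigger sees one inserted row (NEW) rather than SQL Server's inserted set, so this is an illustrative sketch, not the book's exact code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE orders (ordernum INTEGER, itemnum INTEGER, quantity INTEGER,
                     storeid TEXT, vendorid TEXT);
CREATE TABLE item (itemnum INTEGER PRIMARY KEY, price REAL);
CREATE TABLE vendorOutstanding (vendorid TEXT PRIMARY KEY, amount REAL);

-- On each insert into orders, fold the new order's value into the
-- vendor's running total (the redundant aggregate).
CREATE TRIGGER updateVendorOutstanding AFTER INSERT ON orders
BEGIN
  UPDATE vendorOutstanding
  SET amount = amount + NEW.quantity * (SELECT price FROM item
                                        WHERE itemnum = NEW.itemnum)
  WHERE vendorid = NEW.vendorid;
END;
""")
cur.execute("INSERT INTO item VALUES (7825, 2.0)")
cur.execute("INSERT INTO vendorOutstanding VALUES ('vendor4', 0.0)")
cur.execute("INSERT INTO orders VALUES (1000350, 7825, 562, 'store1', 'vendor4')")

amount = cur.execute(
    "SELECT amount FROM vendorOutstanding WHERE vendorid = 'vendor4'").fetchone()[0]
print(amount)
```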


Aggregate Maintenance -- transactions

Concurrent Transactions:

  • Insertions

    insert into orders values (1000350,7825,562,'xxxxxx6944','vendor4');

  • Queries (first without, then with redundant tables)

    select orders.vendorid, sum(orders.quantity*item.price)

    from orders,item

    where orders.itemnum = item.itemnum

    group by orders.vendorid;

    vs. select * from vendorOutstanding;

    select store.storeid, sum(orders.quantity*item.price)

    from orders,item, store

    where orders.itemnum = item.itemnum

    and orders.storename = store.name

    group by store.storeid;

    vs. select * from storeOutstanding;


Aggregate Maintenance

SQL Server 2000 on Windows 2000

Using triggers for view maintenance

If queries are frequent or important, then aggregate maintenance is good.



Superlinearity -- data

Settings:

sales( id, itemid, customerid, storeid, amount, quantity);

item (itemid);

customer (customerid);

store (storeid);

A sale is successful if all foreign keys are present.

successfulsales(id, itemid, customerid, storeid, amount, quantity);

unsuccessfulsales(id, itemid, customerid, storeid, amount, quantity);

tempsales( id, itemid, customerid, storeid, amount,quantity);


Superlinearity -- indexes

Settings (non-clustering, dense indexes):

index s1 on item(itemid);

index s2 on customer(customerid);

index s3 on store(storeid);

index succ on successfulsales(id);

  • 1000000 sales, 400000 customers, 40000 items, 1000 stores

  • Cold buffer

  • Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.


Superlinearity -- queries

Queries:

  • Insert / create index / delete

    insert into successfulsales

    select sales.id, sales.itemid, sales.customerid, sales.storeid, sales.amount, sales.quantity

    from sales, item, customer, store

    where sales.itemid = item.itemid

    and sales.customerid = customer.customerid

    and sales.storeid = store.storeid;

    insert into unsuccessfulsales

    select * from sales;

    go

    delete from unsuccessfulsales

    where id in (select id from successfulsales)


Superlinearity -- batch queries

Queries:

  • Small batches

    DECLARE @Nlow INT;

    DECLARE @Nhigh INT;

    DECLARE @INCR INT;

    set @INCR = 100000

    set @NLow = 0

    set @Nhigh = @INCR

    WHILE (@NLow <= 500000)

    BEGIN

    insert into tempsales

    select * from sales

    where id between @NLow and @Nhigh

    set @Nlow = @Nlow + @INCR

    set @Nhigh = @Nhigh + @INCR

    delete from tempsales

    where id in (select id from successfulsales);

    insert into unsuccessfulsales

    select * from tempsales;

    delete from tempsales;

    END


Superlinearity -- outer join

Queries:

  • Outer join

    insert into successfulsales

    select sales.id, item.itemid, customer.customerid, store.storeid, sales.amount, sales.quantity

    from

    ((sales left outer join item on sales.itemid = item.itemid)

    left outer join customer on sales.customerid = customer.customerid)

    left outer join store on sales.storeid = store.storeid;

    insert into unsuccessfulsales

    select *

    from successfulsales

    where itemid is null

    or customerid is null

    or storeid is null;

    go

    delete from successfulsales

    where itemid is null

    or customerid is null

    or storeid is null


Circumventing Superlinearity

SQL Server 2000

Outer join achieves the best response time.

Small batches do not help because overhead of crossing the application interface is higher than the benefit of joining with smaller tables.

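The outer-join trick can be sketched with sqlite3: every sale survives the left outer joins, and a NULL in a joined key column marks the sale unsuccessful, so one pass classifies all rows. Tables follow the settings slide; the two sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE sales (id INTEGER, itemid INTEGER, customerid INTEGER,
                    storeid INTEGER, amount REAL, quantity INTEGER);
CREATE TABLE item (itemid INTEGER PRIMARY KEY);
CREATE TABLE customer (customerid INTEGER PRIMARY KEY);
CREATE TABLE store (storeid INTEGER PRIMARY KEY);
""")
cur.executemany("INSERT INTO item VALUES (?)", [(1,), (2,)])
cur.execute("INSERT INTO customer VALUES (10)")
cur.execute("INSERT INTO store VALUES (100)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 10, 100, 5.0, 1),   # all foreign keys present -> successful
    (2, 9, 10, 100, 5.0, 1),   # unknown item -> unsuccessful
])

# Outer joins keep every sale; missing foreign keys surface as NULLs.
rows = cur.execute("""
SELECT sales.id, item.itemid, customer.customerid, store.storeid
FROM sales
LEFT OUTER JOIN item ON sales.itemid = item.itemid
LEFT OUTER JOIN customer ON sales.customerid = customer.customerid
LEFT OUTER JOIN store ON sales.storeid = store.storeid
""").fetchall()
successful = sorted(r[0] for r in rows if None not in r[1:])
unsuccessful = sorted(r[0] for r in rows if None in r[1:])
print(successful, unsuccessful)
```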


Tuning the Application Interface

  • 4GL

    • Power++, Visual Basic

  • Programming language + Call Level Interface

    • ODBC: Open DataBase Connectivity

    • JDBC: Java based API

    • OCI (C++/Oracle), CLI (C++/DB2), Perl/DBI

  • In the following experiments, the client program is located on the database server site. Overhead is due to crossing the application interface.


Looping can hurt -- data

Settings:

lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE , L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS , L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT , L_SHIPMODE , L_COMMENT );

  • 600 000 rows; warm buffer.

  • Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.


Looping can hurt -- queries

Queries:

No loop:

sqlStmt = “select * from lineitem where l_partkey <= 200;”

odbc->prepareStmt(sqlStmt);

odbc->execPrepared(sqlStmt);

Loop:

sqlStmt = “select * from lineitem where l_partkey = ?;”

odbc->prepareStmt(sqlStmt);

for (int i=1; i<100; i++)

{

odbc->bindParameter(1, SQL_INTEGER, i);

odbc->execPrepared(sqlStmt);

}

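The loop-versus-set contrast can be reproduced in miniature with sqlite3 (made-up data; 201 point queries against one range query). Even in-process, the loop crosses the query interface once per key value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE lineitem (l_partkey INTEGER, l_comment TEXT)")
cur.executemany("INSERT INTO lineitem VALUES (?, ?)",
                [(i % 500, "c") for i in range(50000)])

# Set-oriented: one statement fetches every qualifying row.
no_loop = cur.execute(
    "SELECT * FROM lineitem WHERE l_partkey <= 200").fetchall()

# Loop: one prepared statement execution per key, 201 API crossings.
looped = []
for i in range(201):
    looped.extend(cur.execute(
        "SELECT * FROM lineitem WHERE l_partkey = ?", (i,)).fetchall())

assert len(no_loop) == len(looped)
print(len(no_loop))
```

Both produce the same 20,100 rows; the difference the experiment measures is purely the per-call overhead multiplied by the number of iterations.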


Looping can Hurt

SQL Server 2000 on Windows 2000

Crossing the application interface has a significant impact on performance.

Why would a programmer use a loop instead of relying on set-oriented operations? One answer: object-orientation.



Cursors are Death -- data

Settings:

employees(ssnum, name, lat, long, hundreds1, hundreds2);

  • 100000 rows ; Cold buffer

  • Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.


Cursors are Death -- queries

Queries:

  • No cursor

    select * from employees;

  • Cursor

    DECLARE d_cursor CURSOR FOR select * from employees;

    OPEN d_cursor

    while (@@FETCH_STATUS = 0)

    BEGIN

    FETCH NEXT from d_cursor

    END

    CLOSE d_cursor

    go


Cursors are Death

SQL Server 2000 on Windows 2000

Response time is a few seconds with a SQL query and more than an hour iterating over a cursor.



Retrieve Needed Columns Only -- data

Settings:

lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE , L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS , L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT , L_SHIPMODE , L_COMMENT );

create index i_nc_lineitem on lineitem (l_orderkey, l_partkey, l_suppkey, l_shipdate, l_commitdate);

  • 600 000 rows; warm buffer.

  • Lineitem records are ~ 10 bytes long

  • Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.


Retrieve Needed Columns Only -- queries

Queries:

  • All

    Select * from lineitem;

  • Covered subset

    Select l_orderkey, l_partkey, l_suppkey, l_shipdate, l_commitdate from lineitem;


Retrieve Needed Columns Only

Avoid transferring unnecessary data.

May enable use of a covering index.

In the experiment the subset contains ¼ of the attributes.

Reducing the amount of data that crosses the application interface yields significant performance improvement.


Experiment performed on Oracle8iEE on Windows 2000.


Bulk Loading -- data

Settings:

lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE , L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS , L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT , L_SHIPMODE , L_COMMENT );

  • Initially the table is empty; 600 000 rows to be inserted (138Mb)

  • The table sits on one disk. No constraints or indexes are defined.

  • Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.


Bulk Loading -- queries

Oracle 8i

sqlldr directpath=true control=load_lineitem.ctl data=E:\Data\lineitem.tbl

load data

infile "lineitem.tbl"

into table LINEITEM append

fields terminated by '|'

(

L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE DATE "YYYY-MM-DD", L_COMMITDATE DATE "YYYY-MM-DD", L_RECEIPTDATE DATE "YYYY-MM-DD", L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT

)


Direct Path

Direct path loading bypasses the query engine and the storage manager. It is orders of magnitude faster than conventional bulk load (commit every 100 records) and individual inserts (commit for each record).


Experiment performed on Oracle8iEE on Windows 2000.


Batch Size

Throughput increases steadily as the batch size increases to 100,000 records. Throughput remains constant afterwards.

Trade-off between performance and amount of data that has to be reloaded in case of problem.


Experiment performed on SQL Server 2000 on Windows 2000.


Tuning E-Commerce Applications

Database-backed web-sites:

  • Online shops

  • Shop comparison portals

  • MS TerraServer


E-commerce Application Architecture

Tiers (each web server has a web cache; each application server has a DB cache):

  • Clients

  • Web servers (web cache)

  • Application servers (DB cache)

  • Database server


E-commerce Application Workload

  • Touristic searching (frequent, cached)

    • Access the top few pages. Pages may be personalized. Data may be out-of-date.

  • Category searching (frequent, partly cached and need for timeliness guarantees)

    • Down some hierarchy, e.g., men’s clothing.

  • Keyword searching (frequent, uncached, need for timeliness guarantees)

  • Shopping cart interactions (rare, but transactional)

  • Electronic purchasing (rare, but transactional)


Design Issues

  • Need to keep historic information

    • Electronic payment acknowledgements get lost.

  • Preparation for variable load

    • Regular patterns of web site accesses during the day, and within a week.

  • Possibility of disconnections

    • State information transmitted to the client (cookies)

  • Special consideration for low bandwidth

  • Schema evolution

    • Representing e-commerce data as attribute-value pairs (IBM WebSphere)


Caching

  • Web cache:

    • Static web pages

    • Caching fragments of dynamically created web pages

  • Database cache (Oracle9iAS, TimesTen’s FrontTier)

    • Materialized views to represent cached data.

    • Queries are executed either using the database cache or the database server. Updates are propagated to keep the cache(s) consistent.

    • Note to vendors: It would be good to have queries distributed between cache and server.


Ecommerce -- setting

Settings:

shoppingcart( shopperid, itemid, price, qty);

  • 500000 rows; warm buffer

  • Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.


Ecommerce -- transactions

Concurrent Transactions:

  • Mix

    insert into shoppingcart values (107999,914,870,214);

    update shoppingcart set Qty = 10 where shopperid = 95047 and itemid = 88636;

    delete from shoppingcart where shopperid = 86123 and itemid = 8321;

    select shopperid, itemid, qty, price from shoppingcart where shopperid = ?;

  • Queries Only

    select shopperid, itemid, qty, price from shoppingcart where shopperid = ?;


Connection Pooling (no refusals)

Each thread establishes a connection and performs 5 insert statements.

If a connection cannot be established the thread waits 15 secs before trying again.

The number of connections is limited to 60 on the database server.

Using connection pooling, the requests are queued and serviced when possible. There are no refused connections.


Experiment performed on Oracle8i on Windows 2000
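A minimal connection-pool sketch in Python: a fixed number of connections sits in a queue, and callers block until one is free instead of being refused. The ConnectionPool class, sizes, and sqlite3 backend are illustrative assumptions, not a specific driver's pooling API.

```python
import queue
import sqlite3
import threading

class ConnectionPool:
    """Fixed set of connections; requests queue instead of being refused."""

    def __init__(self, size, factory):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(factory())

    def acquire(self, timeout=15):
        return self._q.get(timeout=timeout)  # blocks until a connection frees up

    def release(self, conn):
        self._q.put(conn)

pool = ConnectionPool(
    3, lambda: sqlite3.connect(":memory:", check_same_thread=False))

results = []
def worker():
    conn = pool.acquire()
    try:
        results.append(conn.execute("SELECT 1").fetchone()[0])
    finally:
        pool.release(conn)

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))
```

Ten workers share three connections: the surplus requests wait in the queue, mirroring the "queued and serviced when possible" behavior measured above.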


Indexing

Using a clustered index on shopperid in the shopping cart provides:

Query speed-up

Update/Deletion speed-up


Experiment performed on SQL Server 2000 on Windows 2000


Capacity Planning

Arrival Rate (A)

A1 is given as an assumption

A2 = (0.4 A1) + (0.5 A2)

A3 = 0.1 A2

Service Time (S)

S1, S2, S3 are measured

Utilization

U = A x S

Response Time

R = U/(A(1-U)) = S/(1-U)

(assuming Poisson arrivals)

Flow of demand: Entry (S1) feeds Search (S2) with probability 0.4; Search re-enters Search with probability 0.5; Search feeds Checkout (S3) with probability 0.1.

Getting the demand assumptions right is what makes capacity planning hard.
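The arithmetic above can be written down directly. A1 and the service times below are made-up assumptions; A2's self-referential equation is solved in closed form, and the rest follows the slide's formulas.

```python
# Arrival rates: A2 = 0.4*A1 + 0.5*A2  =>  A2 = 0.4*A1 / (1 - 0.5)
A1 = 4.0                        # entry arrivals/sec (assumed input)
A2 = (0.4 * A1) / (1 - 0.5)
A3 = 0.1 * A2

S = {1: 0.05, 2: 0.1, 3: 0.2}   # measured service times (made up)
A = {1: A1, 2: A2, 3: A3}

U = {i: A[i] * S[i] for i in A}        # utilization: U = A x S
R = {i: S[i] / (1 - U[i]) for i in A}  # response time: R = S/(1-U), Poisson arrivals
print(A2, A3, U, R)
```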


How to Handle Multiple Servers

  • Suppose one has n servers for some task that requires S time for a single server to perform.

  • The perfect parallelism model is that it is as if one has a single server that is n times as fast.

  • However, this overstates the advantage of parallelism, because even if there were no waiting, single tasks require S time.


Rough Estimate for Multiple Servers

  • There are two components to response time: waiting time + service time.

  • In the parallel setting, the service time is still S.

  • The waiting time, however, can be well estimated by modeling a single server that is n times as fast.


Approximating Waiting Time for n Parallel Servers

  • Recall: R = U/(A(1-U)) = S/(1-U)

  • On an n-times faster server, the service time is divided by n, so the single-server utilization U is also divided by n. So we would get: R_ideal = (S/n)/(1 – (U/n)).

  • Then R_ideal = service_ideal + wait_ideal.

  • So wait_ideal = R_ideal – S/n.

  • Our assumption: wait_ideal ≈ the wait for n processors.


Approximating Response Time for n Parallel Servers

  • Waiting time for n parallel processors ~ (S/n)/(1 – (U/n)) – S/n = (S/n) ( 1/(1-(U/n)) – 1) =(S/(n(1 – U/n)))(U/n) = (S/(n – U))(U/n)

  • So, the response time for n parallel processors is the above waiting time + S.


Example

  • A = 8 per second.

  • S = 0.1 second.

  • U = 0.8.

  • Single server response time = S/(1-U) = 0.1/0.2 = 0.5 seconds.

  • If we have 2 servers, then we estimate waiting time to be (0.1/(2-0.8))(0.4) = 0.04/1.2 = 0.033. So the response time is 0.133.

  • For a 2-times faster server, S = 0.05, U = 0.4, so response time is 0.05/0.6 = 0.0833


Example -- continued

  • A = 8 per second.

  • S = 0.1 second.

  • U = 0.8.

  • If we have 4 servers, then we estimate waiting time to be (S/(n – U))(U/n) = 0.1/(3.2) * (0.8/4) = 0.02/3.2 = 0.00625 So response time is 0.10625.
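Both worked examples can be checked mechanically; the formulas are exactly those of the preceding slides.

```python
A, S = 8.0, 0.1
U = A * S  # utilization = 0.8

def response_n_servers(S, U, n):
    # waiting time approx (S/(n-U))*(U/n), plus the service time S
    return (S / (n - U)) * (U / n) + S

r1 = S / (1 - U)                  # single server: 0.5 seconds
r2 = response_n_servers(S, U, 2)  # two servers
r4 = response_n_servers(S, U, 4)  # four servers
print(round(r1, 3), round(r2, 3), round(r4, 5))
```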


Data Warehouse Tuning

  • Aggregate (strategic) targeting:

    • Aggregates flow up from a wide selection of data, and then

    • Targeted decisions flow down

  • Examples:

    • Riding the wave of clothing fads

    • Tracking delays for frequent-flyer customers


Data Warehouse Workload

  • Broad

    • Aggregate queries over ranges of values, e.g., find the total sales by region and quarter.

  • Deep

    • Queries that require precise individualized information, e.g., which frequent flyers have been delayed several times in the last month?

  • Dynamic (vs. Static)

    • Queries that require up-to-date information, e.g. which nodes have the highest traffic now?


Tuning Knobs

  • Indexes

  • Materialized views

  • Approximation


Bitmaps -- data

Settings:

lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE , L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS , L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT , L_SHIPMODE , L_COMMENT );

create bitmap index b_lin_2 on lineitem(l_returnflag);

create bitmap index b_lin_3 on lineitem(l_linestatus);

create bitmap index b_lin_4 on lineitem(l_linenumber);

  • 100000 rows ; cold buffer

  • Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.


Bitmaps -- queries

Queries:

  • 1 attribute

    select count(*) from lineitem where l_returnflag = 'N';

  • 2 attributes

    select count(*) from lineitem where l_returnflag = 'N' and l_linenumber > 3;

  • 3 attributes

    select count(*) from lineitem where l_returnflag = 'N' and l_linenumber > 3 and l_linestatus = 'F';


Bitmaps

Order of magnitude improvement compared to scan, because summing up the 1s is very fast.

Bitmaps are best suited for multiple conditions on several attributes, each having a low selectivity.

(Figure: bitmap vectors on l_returnflag (values A, N, R) and l_linestatus (values O, F).)
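The counting trick can be sketched in Python, using one arbitrary-precision integer per attribute value as a bit vector; AND-ing vectors and counting the 1s answers a multi-attribute count query without touching the rows. The attribute values mimic the lineitem columns above; the rows are random made-up data.

```python
import random

random.seed(0)
# (l_returnflag, l_linestatus, l_linenumber)
rows = [(random.choice("ANR"), random.choice("OF"), random.randint(1, 7))
        for _ in range(10000)]

def bitmap(pred):
    """Build a bit vector with bit i set iff row i satisfies pred."""
    v = 0
    for i, row in enumerate(rows):
        if pred(row):
            v |= 1 << i
    return v

flag_N = bitmap(lambda r: r[0] == "N")
status_F = bitmap(lambda r: r[1] == "F")
linenum_gt3 = bitmap(lambda r: r[2] > 3)

# count(*) where l_returnflag='N' and l_linenumber>3 and l_linestatus='F'
fast = bin(flag_N & linenum_gt3 & status_F).count("1")
slow = sum(1 for r in rows if r[0] == "N" and r[2] > 3 and r[1] == "F")
print(fast, slow)
```

Each added condition costs only one more AND over the precomputed vectors, which is why bitmaps shine when several low-selectivity conditions are combined.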


Vertical Partitioning Revisited

In the same settings where bit vectors are useful (lots of attributes, unpredictable combinations queried), vertical partitioning is also more attractive.

If a table has 200 attributes, there is no reason to retrieve 200-field rows when only three attributes are needed.

Sybase IQ, KDB, Vertica all use column-oriented tables.


Multidimensional Indexes -- data

Settings:

create table spatial_facts( a1 int, a2 int, a3 int, a4 int, a5 int, a6 int, a7 int, a8 int, a9 int, a10 int, geom_a3_a7 mdsys.sdo_geometry );

create index r_spatialfacts on spatial_facts(geom_a3_a7) indextype is mdsys.spatial_index;

create bitmap index b2_spatialfacts on spatial_facts(a3,a7);

  • 500000 rows ; cold buffer

  • Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.


Multidimensional Indexes -- queries

Queries:

  • Point Queries

    select count(*) from fact where a3 = 694014 and a7 = 928878;

    select count(*) from spatial_facts where SDO_RELATE(geom_a3_a7, MDSYS.SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(694014,928878, NULL), NULL, NULL), 'mask=equal querytype=WINDOW') = 'TRUE';

  • Range Queries

    select count(*) from spatial_facts where SDO_RELATE(geom_a3_a7, mdsys.sdo_geometry(2003,NULL,NULL, mdsys.sdo_elem_info_array(1,1003,3),mdsys.sdo_ordinate_array(10,800000,1000000,1000000)), 'mask=inside querytype=WINDOW') = 'TRUE';

    select count(*) from spatial_facts where a3 > 10 and a3 < 1000000 and a7 > 800000 and a7 < 1000000;


Multidimensional Indexes

Oracle 8i on Windows 2000

Spatial Extension:

2-dimensional data

Spatial functions used in the query

The R-tree does not perform well because of the overhead of the spatial extension.



Multidimensional Indexes -- query plans

R-Tree

SELECT STATEMENT

SORT AGGREGATE

TABLE ACCESS BY INDEX ROWID SPATIAL_FACTS

DOMAIN INDEX R_SPATIALFACTS

Bitmaps

SELECT STATEMENT

SORT AGGREGATE

BITMAP CONVERSION COUNT

BITMAP AND

BITMAP INDEX SINGLE VALUE B_FACT7

BITMAP INDEX SINGLE VALUE B_FACT3



Materialized Views -- data

Settings:

orders( ordernum, itemnum, quantity, storeid, vendor );

create clustered index i_order on orders(itemnum);

store( storeid, name );

item(itemnum, price);

create clustered index i_item on item(itemnum);

  • 1000000 orders, 10000 stores, 400000 items; Cold buffer

  • Oracle 9i

  • Pentium III (1 GHz, 256 Kb), 1Gb RAM, Adapter 39160 with 2 channels, 3x18Gb drives (10000RPM), Linux Debian 2.4.


Materialized Views -- data

Settings:

create materialized view vendorOutstanding

build immediate

refresh complete

enable query rewrite

as

select orders.vendor, sum(orders.quantity*item.price)

from orders,item

where orders.itemnum = item.itemnum

group by orders.vendor;


Materialized Views -- transactions

Concurrent Transactions:

  • Insertions

    insert into orders values (1000350,7825,562,'xxxxxx6944','vendor4');

  • Queries

    select orders.vendor, sum(orders.quantity*item.price)

    from orders,item

    where orders.itemnum = item.itemnum

    group by orders.vendor;

    select * from vendorOutstanding;


Materialized Views

Graph:

Oracle9i on Linux

Total sale by vendor is materialized

Trade-off between query speed-up and view maintenance:

The impact of incremental maintenance on performance is significant.

Rebuild maintenance achieves a good throughput.

A static data warehouse offers a good trade-off.



Materialized View Maintenance

A problem arises when there is a large number of views to maintain.

The order in which views are maintained is important:

A view can be computed from an existing view instead of being recomputed from the base relations (total per region can be computed from total per nation).

Let the views and base tables be nodes v_i

Let there be an edge from v_1 to v_2 if it is possible to compute the view v_2 from v_1. Associate with this edge the cost of computing v_2 from v_1.

Compute shortest paths where the start nodes are the set of base tables.

The result is an acyclic graph A. Take a topological sort of A and let that be the order of view construction.

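A sketch of the ordering computation: multi-source Dijkstra from the base tables, then maintain views in increasing distance (a topological order of the shortest-path tree). The graph, view names, and costs are made up for illustration; the expensive direct detail-to-region edge loses to the cheaper path through the per-nation view, as in the slide's example.

```python
import heapq

# Edge (u -> v, cost): v can be computed from u at that cost (made up).
edges = {
    "detail": [("total_per_nation", 10), ("total_per_region", 15)],
    "total_per_nation": [("total_per_region", 1)],  # cheaper than from detail
    "total_per_region": [],
}
base_tables = ["detail"]

# Multi-source Dijkstra: cheapest way to derive each view.
dist = {n: float("inf") for n in edges}
heap = []
for b in base_tables:
    dist[b] = 0
    heapq.heappush(heap, (0, b))
while heap:
    d, n = heapq.heappop(heap)
    if d > dist[n]:
        continue  # stale heap entry
    for m, cost in edges[n]:
        if d + cost < dist[m]:
            dist[m] = d + cost
            heapq.heappush(heap, (d + cost, m))

# Maintain views in increasing distance: each view is rebuilt from a
# source that is already fresh.
order = sorted((n for n in edges if n not in base_tables),
               key=lambda n: dist[n])
print(order, dist)
```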


Materialized View Example

  • Detail(storeid, item, qty, price, date)

  • Materialized view1(storeid, category, qty, month)

  • Materialized view2(city, category, qty, month)

  • Materialized view3(storeid, category, qty, year)
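As an illustration of the maintenance-ordering idea on this example, view3 (per store, category, year) can be rolled up from view1 (per store, category, month) instead of rescanning the much larger Detail table. The sqlite3 tables, sample rows, and the 'YYYY-MM' month encoding below are assumptions of this sketch:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("create table view1 (storeid, category, qty, month)")
# The 'YYYY-MM' month encoding is an assumption; the slide does not
# fix a date representation.
conn.executemany("insert into view1 values (?, ?, ?, ?)", [
    (1, 'toys', 5, '2002-01'),
    (1, 'toys', 7, '2002-03'),
    (1, 'toys', 2, '2003-01'),
])
# view3 (per year) is a rollup of view1 (per month): no need to
# touch the Detail table at all.
rows = conn.execute("""
    select storeid, category, sum(qty), substr(month, 1, 4) as year
    from view1
    group by storeid, category, year
""").fetchall()
```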


Settings:

TPC-H schema

Approximations

insert into approxlineitem

select top 6000 *

from lineitem

where l_linenumber = 4;

insert into approxorders

select O_ORDERKEY, O_CUSTKEY, O_ORDERSTATUS, O_TOTALPRICE, O_ORDERDATE, O_ORDERPRIORITY, O_CLERK, O_SHIPPRIORITY, O_COMMENT

from orders, approxlineitem

where o_orderkey = l_orderkey;

Approximations -- data


insert into approxsupplier

select distinct S_SUPPKEY,

S_NAME ,

S_ADDRESS,

S_NATIONKEY,

S_PHONE,

S_ACCTBAL,

S_COMMENT

from approxlineitem, supplier

where s_suppkey = l_suppkey;

insert into approxpart

select distinct P_PARTKEY,

P_NAME ,

P_MFGR ,

P_BRAND ,

P_TYPE ,

P_SIZE ,

P_CONTAINER ,

P_RETAILPRICE ,

P_COMMENT

from approxlineitem, part

where p_partkey = l_partkey;

insert into approxpartsupp

select distinct PS_PARTKEY,

PS_SUPPKEY,

PS_AVAILQTY,

PS_SUPPLYCOST,

PS_COMMENT

from partsupp, approxpart, approxsupplier

where ps_partkey = p_partkey and ps_suppkey = s_suppkey;

insert into approxcustomer

select distinct C_CUSTKEY,

C_NAME ,

C_ADDRESS,

C_NATIONKEY,

C_PHONE ,

C_ACCTBAL,

C_MKTSEGMENT,

C_COMMENT

from customer, approxorders

where o_custkey = c_custkey;

insert into approxregion select * from region;

insert into approxnation select * from nation;

Approximations -- queries


Approximations -- more queries

Queries:

  • Single table query on lineitem

    select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price,

    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,

    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,

    avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order

    from lineitem

    where datediff(day, l_shipdate, '1998-12-01') <= 120

    group by l_returnflag, l_linestatus

    order by l_returnflag, l_linestatus;


Approximations -- still more

Queries:

  • 6-way join

    select n_name, avg(l_extendedprice * (1 - l_discount)) as revenue

    from customer, orders, lineitem, supplier, nation, region

    where c_custkey = o_custkey

    and l_orderkey = o_orderkey

    and l_suppkey = s_suppkey

    and c_nationkey = s_nationkey

    and s_nationkey = n_nationkey

    and n_regionkey = r_regionkey

    and r_name = 'AFRICA'

    and o_orderdate >= '1993-01-01'

    and datediff(year, o_orderdate,'1993-01-01') < 1

    group by n_name

    order by revenue desc;


Good approximation for query Q1 on lineitem.

The aggregated values obtained on a query with a 6-way join are significantly different from the actual values -- for some applications they may still be good enough.

Approximation accuracy


Aqua approximation on the TPC-H schema.

1% and 10% lineitem sample propagated.

The query speed-up obtained with approximated relations is significant.

Approximation Speedup


Approximating Count Distinct

  • Suppose you have one or more columns with duplicates. You want to know how many distinct values there are.

  • Imagine that you could hash into (0,1). (Simulate this by a very large integer range).

  • Hashing will put duplicates in the same location and, if done right, will distribute values uniformly.

  • How can we use this?


Approximating Count Distinct

  • Suppose you took the minimum 1000 hash values.

  • Suppose that highest value of the 1000 is f.

  • What is a good approximation of the number of distinct values?
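One standard answer (not spelled out on the slide) is the k-minimum-values estimate: if the hashes are uniform in (0,1) and the largest of the k smallest hash values is f, the number of distinct values is roughly (k-1)/f. A sketch, with the hash function and k as assumptions:

```python
import hashlib

def estimate_distinct(values, k=1000):
    # Hash every value into (0,1); duplicates collide, so the set
    # below holds exactly one hash per distinct value.
    M = 2 ** 64
    hashes = {int(hashlib.sha256(repr(v).encode()).hexdigest()[:16], 16) / M
              for v in values}
    smallest = sorted(hashes)[:k]
    if len(smallest) < k:
        return len(smallest)      # fewer than k distinct: count is exact
    f = smallest[-1]              # the k-th smallest hash value
    return (k - 1) / f            # k-minimum-values estimate

# 50,000 distinct values, each appearing three times:
data = [i % 50_000 for i in range(150_000)]
est = estimate_distinct(data)
```

The intuition: k uniform hashes falling below f means the distinct values are spaced about f/k apart, so roughly k/f of them fill (0,1); the relative error shrinks like 1/sqrt(k).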


Tuning Distributed Applications

  • Queries across multiple databases

    • Federated Data Warehouse

    • IBM’s DataJoiner, now integrated in DB2 v7.2

      • The source should perform as much work as possible and return as little data as possible for processing at the federated server

  • Processing data across multiple databases


A Puzzle

  • Two databases X and Y

    • X records inventory data to be used for restocking

    • Y contains delivery data about shipments

  • Improve the speed of shipping by sharing data

    • Certain data of X should be postprocessed on Y shortly after it enters X

    • You want to avoid losing data from X and you want to avoid double-processing data on Y, even in the face of failures.


Two-Phase Commit

  • Commits are coordinated between the source and the destination

  • If one participant fails then blocking can occur

  • Not all db systems support a prepare-to-commit interface

[Diagram: commit flows from Source X (coordinator & participant) to Destination Y (participant)]
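A minimal sketch of the protocol's two phases (class and method names are invented; a real coordinator would also force log records and handle timeouts, which is where the blocking problem arises):

```python
class Participant:
    """Toy participant: votes in phase 1, obeys the decision in phase 2."""
    def __init__(self, name, healthy=True):
        self.name, self.healthy, self.state = name, healthy, 'active'

    def prepare(self):
        # Phase 1: force changes to the log, then promise to commit.
        if not self.healthy:
            return 'no'
        self.state = 'prepared'
        return 'yes'

    def decide(self, decision):
        # Phase 2: apply the coordinator's decision.
        self.state = decision

def two_phase_commit(participants):
    # Commit only if every participant votes yes; otherwise abort all.
    votes = [p.prepare() for p in participants]
    decision = 'committed' if all(v == 'yes' for v in votes) else 'aborted'
    for p in participants:
        p.decide(decision)
    return decision

source, dest = Participant('X'), Participant('Y')
outcome = two_phase_commit([source, dest])
# If any participant cannot prepare, everyone aborts:
failed = two_phase_commit([Participant('X'), Participant('Y', healthy=False)])
```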


Replication Server

Destination is within a few seconds of being up-to-date.

Decision support queries can be asked on the destination db.

An administrator is needed when the network connection breaks!

[Diagram: Source X replicates to Destination Y]


Staging Tables

No specific mechanism is necessary at the source or destination.

Coordination of transactions on X and Y.

[Diagram: Source X writes to a staging table that is read by Destination Y]


Issues in Solution

  • A single thread can invoke a transaction on X, Y or both, but the thread may fail in the middle. A new thread will regain state by looking at the database.

  • We want to delete data from the staging tables when we are finished.

  • The tuples in the staging tables will represent an operation as well as data.


States

  • A tuple (operation plus data) will start in state unprocessed, then state processed, and then deleted.

  • The same transaction that processes the tuple on the destination site also changes its state. This is important so each tuple’s operation is done exactly once.


Staging Tables

[Diagram, steps 1 and 2: thread M1 reads unprocessed tuples from staging table S on source database X; thread M2 writes them to table I on destination database Y, still marked unprocessed]


Staging Tables

[Diagram, steps 3 and 4: one transaction updates the destination site Y and marks the tuple processed in the same transaction; the final step then deletes the tuple at the source and afterwards at the destination]


What to Look For

  • Transactions are atomic but threads, remember, are not. So a replacement thread will look to the database for information.

  • In which order should the final step do its transactions? Should it delete the unprocessed tuple from X as the first transaction or as the second?
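The protocol can be made concrete with a sketch (not the book's exact solution; table and function names are invented, and two in-memory sqlite databases stand in for X and Y). The key points from the slides: applying an operation on Y and flipping its state to 'processed' happen in one transaction, and the final step deletes at the source before the destination, so a replacement thread can always re-run safely.

```python
import sqlite3

x = sqlite3.connect(':memory:')
y = sqlite3.connect(':memory:')
x.execute("create table stage_x (id primary key, op, state)")
y.execute("create table stage_y (id primary key, op, state)")
y.execute("create table applied (id primary key, op)")

def copy_to_y():
    # Steps 1-2: ship unprocessed tuples to Y's staging table.
    for id_, op in x.execute(
            "select id, op from stage_x where state = 'unprocessed'").fetchall():
        # 'insert or ignore' makes a re-run after a crash harmless.
        y.execute("insert or ignore into stage_y values (?, ?, 'unprocessed')",
                  (id_, op))
    y.commit()

def process_on_y():
    # Step 3: apply the operation AND mark the tuple processed in one
    # transaction, so each operation is done exactly once.
    for id_, op in y.execute(
            "select id, op from stage_y where state = 'unprocessed'").fetchall():
        y.execute("insert into applied values (?, ?)", (id_, op))
        y.execute("update stage_y set state = 'processed' where id = ?", (id_,))
    y.commit()

def cleanup():
    # Step 4: delete at the source first, then at the destination.
    # If we crashed between the two deletes, the 'processed' marker on Y
    # would still prevent double-processing on a re-run.
    for (id_,) in y.execute(
            "select id from stage_y where state = 'processed'").fetchall():
        x.execute("delete from stage_x where id = ?", (id_,))
        x.commit()
        y.execute("delete from stage_y where id = ?", (id_,))
        y.commit()

x.execute("insert into stage_x values (1, 'restock item 42', 'unprocessed')")
x.commit()
copy_to_y()
copy_to_y()      # a crashed thread retrying: still only one tuple on Y
process_on_y()
cleanup()
```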


Troubleshooting Techniques (*)

  • Extraction and Analysis of Performance Indicators

    • Consumer-producer chain framework

    • Tools

      • Query plan monitors

      • Performance monitors

      • Event monitors

        (*) From Alberto Lerner’s chapter


A Consumer-Producer Chain of a DBMS’s Resources

[Diagram, with probing spots for indicators at each level:]

High-Level Consumers: sql commands, PARSER/OPTIMIZER

Intermediate Resources/Consumers: EXECUTION SUBSYSTEM, DISK SUBSYSTEM, LOCKING SUBSYSTEM, CACHE MANAGER, LOGGING SUBSYSTEM

Primary Resources: MEMORY, CPU, DISK/CONTROLLER, NETWORK


Recurrent Patterns of Problems

Effects are not always felt first where the cause is!

  • An overloading high-level consumer

  • A poorly parameterized subsystem

  • An overloaded primary resource


A Systematic Approach to Monitoring

Extract indicators to answer the following questions

  • Question 1: Are critical queries being served in the most efficient manner?

  • Question 2: Are subsystems making optimal use of resources?

  • Question 3: Are there enough primary resources available?


Investigating High Level Consumers

  • Answer question 1: “Are critical queries being served in the most efficient manner?”

    • Identify the critical queries

    • Analyze their access plans

    • Profile their execution


Identifying Critical Queries

Critical queries are usually those that:

  • Take a long time

  • Are frequently executed

    Often, a user complaint will tip us off.


Event Monitors to Identify Critical Queries

  • If no user complains...

  • Capture usage measurements at specific events (e.g., end of each query) and then sort by usage

  • Less overhead than other types of tools because the indicators are usually a by-product of the events monitored

  • Typical measures include CPU used, IO used, locks obtained etc.


An Example Event Monitor

  • CPU indicators sorted by Oracle’s Trace Data Viewer

  • Similar tools: DB2’s Event Monitor and MSSQL’s Server Profiler


An Example Plan Explainer

  • Access plan according to MSSQL’s Query Analyzer

  • Similar tools: DB2’s Visual Explain and Oracle’s SQL Analyze Tool


Finding Strangeness in Access Plans

What to pay attention to in a plan

  • Access paths for each table

  • Sorts or intermediary results (join, group by, distinct, order by)

  • Order of operations

  • Algorithms used in the operators


To Index or Not to Index?

select c_name, n_name from CUSTOMER join NATION on c_nationkey = n_nationkey where c_acctbal > 0

Which plan performs best? (nation_pk is a non-clustered index over n_nationkey, and similarly acctbal_ix over c_acctbal)


Non-clustering indexes can be trouble

For a low-selectivity predicate, each access to the index generates a random access to the table -- possibly to a page already read! The number of pages read from the table ends up greater than the table's size, i.e., a table scan is way better.

                          Index Scan        Table Scan
CPU time                  76 sec            5 sec
data logical reads        272,618 pages     143,075 pages
data physical reads       131,425 pages     6,777 pages
index logical reads       273,173 pages     136,319 pages
index physical reads      552 pages         7 pages
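A back-of-the-envelope model of why this happens: the expected number of distinct pages touched when r rows are fetched at random from a P-page table is given by Cardenas' approximation (not stated on the slide; the page count below is hypothetical, while the row count matches the example query).

```python
def expected_pages_touched(P, r):
    # Cardenas' approximation: expected number of distinct pages hit
    # when r rows are fetched at random from a table of P pages.
    return P * (1 - (1 - 1 / P) ** r)

# Hypothetical page count; the row count matches the example query.
P, r = 10_000, 136_308
pages = expected_pages_touched(P, r)
# pages is essentially all of P, and the r page *requests* far exceed
# P itself -- so a sequential table scan is cheaper than the index plan.
```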


An Example Performance Monitor (query level)

Statement number: 1

select C_NAME, N_NAME

from DBA.CUSTOMER join DBA.NATION on C_NATIONKEY = N_NATIONKEY

where C_ACCTBAL > 0

Number of rows retrieved is: 136308

Number of rows sent to output is: 0

Elapsed Time is: 76.349 seconds

Buffer pool data logical reads = 272618

Buffer pool data physical reads = 131425

Buffer pool data writes = 0

Buffer pool index logical reads = 273173

Buffer pool index physical reads = 552

Buffer pool index writes = 0

Total buffer pool read time (ms) = 71352

Total buffer pool write time (ms) = 0

Summary of Results
==================
Statement #   Elapsed Time (s)   Agent CPU Time (s)   Rows Fetched   Rows Printed
1             76.349             6.670                136308         0

  • Buffer and CPU consumption for a query according to DB2’s Benchmark tool

  • Similar tools: MSSQL’s SET STATISTICS switch and Oracle’s SQL Analyze Tool


An Example Performance Monitor (system level)

  • An IO indicator’s consumption evolution (qualitative and quantitative) according to DB2’s System Monitor

  • Similar tools: Window’s Performance Monitor and Oracle’s Performance Manager


Investigating High Level Consumers: Summary

[Flowchart: find critical queries → found any? If no, investigate lower levels; if yes, answer Q1 over them → overconsumption? If yes, tune the problematic queries; if no, investigate lower levels]


Investigating Primary Resources

  • Answer question 3:“Are there enough primary resources available for a DBMS to consume?”

  • Primary resources are: CPU, disk & controllers, memory, and network

  • Analyze specific OS-level indicators to discover bottlenecks.

  • A system-level Performance Monitor is the right tool here


CPU Consumption Indicators at the OS Level

Sustained utilization over 70% should trigger the alert.

System utilization shouldn’t be more than 40%. The DBMS (on a non-dedicated machine) should be getting a decent time share.

[Graph: CPU % of utilization over time -- total usage against the 70% threshold, and system usage]


Disk Performance Indicators at the OS Level

Average queue size should be close to zero.

Wait times should also be close to zero.

Idle disk with pending requests? Check controller contention. Also, transfers should be balanced among disks/controllers.

[Diagram: new requests enter the wait queue; disk transfers/second are measured at the disk]


Memory Consumption Indicators at the OS Level

Page faults per unit of time should be close to zero. If paging happens, at least it should not hit DB cache pages.

% of pagefile in use will tell you how much memory is “needed.”

[Diagram: virtual memory backed by real memory and the pagefile]


Investigating Intermediate Resources/Consumers

  • Answer question 2:“Are subsystems making optimal use of resources?”

  • Main subsystems: Cache Manager, Disk subsystem, Lock subsystem, and Log/Recovery subsystem

  • Similarly to Q3, extract and analyze relevant Performance Indicators


Cache Manager Performance Indicators

If a page is not in the cache, readpage (logical) generates an actual IO (physical). The fraction of readpages that do not generate physical IO should be 90% or more (hit ratio).

Pages are regularly saved to disk to make free space. The # of free page slots should always be > 0.

[Diagram: a table scan calls readpage() on the Cache Manager, which manages data pages, free page slots, a pick-victim strategy, and page reads/writes to disk]
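The hit ratio can be computed directly from the logical/physical read counters a performance monitor reports, e.g. the numbers from the query-level monitor output earlier in the deck:

```python
def hit_ratio(logical_reads, physical_reads):
    # Fraction of logical reads served from the cache.
    return 1 - physical_reads / logical_reads

# Counters from the DB2 Benchmark tool output shown earlier:
data_hr = hit_ratio(272_618, 131_425)    # roughly 0.52 -- far below the 90% target
index_hr = hit_ratio(273_173, 552)       # roughly 0.998 -- healthy
```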


Disk Manager Performance Indicators

Storage hierarchy (simplified): row, page, extent, file, disk.

Rows: displaced rows (moved to other pages) should be kept under 5% of rows.

Pages: free space fragmentation -- pages with little space should not be in the free list.

Extents/files: data fragmentation -- ideally, files that store DB objects (table, index) should be in one or few (<5) contiguous extents.

Disks: file position should balance the workload evenly among all disks.


Lock Manager Performance Indicators

[Diagram: a lock request enters the pending list; the lock list records (Object, Lock Type, TXN ID)]

Lock wait time for a transaction should be a small fraction of the whole transaction time. The number of lock waits should be a small fraction of the number of locks on the lock list.

Deadlocks and timeouts should seldom happen (no more than 1% of the transactions).


Investigating Intermediate and Primary Resources: Summary

[Flowchart: answer Q3 → problems at OS level? If yes, tune low-level resources; if no, answer Q2 → problematic subsystems? If yes, tune the subsystems; if no, investigate the upper level]


Troubleshooting Techniques

  • Monitoring a DBMS’s performance should be based on queries and resources.

    • The consumption chain helps distinguish problems’ causes from their symptoms

    • Existing tools help extracting relevant performance indicators


Recall Tuning Principles

  • Think globally, fix locally (troubleshoot to see what matters)

  • Partitioning breaks bottlenecks (find parallelism in processors, controllers, caches, and disks)

  • Start-up costs are high; running costs are low (batch size, cursors)

  • Be prepared for trade-offs (unless you can rethink the queries)

