930 likes | 1.04k Views
Database Design for Success. Informix 101. #include <disclaimer.h>. Agenda. Sizing your machine Install and Configure the Engine Build a database Loading Data Monitoring, Tuning & Ongoing Administration Odds, Ends and other Pet Peeves. Sizing Your Machine. The Data Model
E N D
Database Design for Success Informix 101
Agenda • Sizing your machine • Install and Configure the Engine • Build a database • Loading Data • Monitoring, Tuning & Ongoing Administration • Odds, Ends and other Pet Peeves
Sizing Your Machine • The Data Model • The holy trinity • Disk, CPU, Memory • Raid, Mirroring, Striping
No plan of battle survives it’s first contact with the enemy. - Clausewitz
No System Design survives it’s first contact with the data. - Parker That’s no excuse for not making one.
The Logical Model (LDM) • An informal representation of data and it’s relationships. • Format of the model can be changed to fit the users understandings. • Used to communicate with users. • Is NOT suited to be the foundation of a database. • Entities, attributes
The Physical Data Model (PDM) • A formal description of a database and the links between tables. • Very rigid format. • Points out mistakes. • Can provide a sizing estimate. • Table, Columns. • A roadmap for new team members
Input: table, rowsize, nrows A Sizing Tool Outputs: extent size, estimated scan time
Sizing is not always possible • Have to order the machine before we know the full extent of the data. • Multiply the raw data size * 4 = disk • Divide by 25GB = CPU • round up to multiples of 4 • Multiply * 500MB = Memory • Make sure you can grow from there • Make sure to indicate this is an estimate.
To Raid or not to Raid(a battle you will lose) • Expensive in terms of performance. Your machine is doing extra work – depending on the Raid you choose. • Mirroring is best, but expensive in terms of $. • 500 disks, 50,000 MTBF = 1 failure every 100 hours.
Striping • Takes control out of the hands of the DBA and gives it to the sysadmin – who doesn’t have a clue. • Hard to say no to. • Mitigate it by striping small disk sets (groups of 4 or 8). • Worse comes to worse, benchmark.
You now have • A machine. • A bunch of raw disks that have been sliced up. • Some expectation of table and index size. • A Physical Data Model expressed in terms of a schema.
Setup • Getting the engine online • Layout of disks • root • logical logs • physical logs • data • temp
Install Informix • Use $INFORMIXDIR as a link, not a hard coded path. • Run as root • Everybody knows this, right?
Disks • mkdir /opt/server_1/dev (or somesuch) • ln –s all disks to this directory • Give them names that match underlying disks. • Be VERY careful not to overlay one link on top of another. (keep a map) Recovery note: • use a script to do the links. Save it. • $INFORMIXDIR/server_1/rebuild
Raw vs Cooked Disk • Cooked disk is a file • Uses Unix file system I/O • Risk • Raw disk is a character special device (/dev/rdsk/c0d3s2) • Uses Kernel I/O (KIO) or Asynchronous I/O (AIO) • Engine does the work • 15% faster
Disk Example • Raw Device • /dev/rdsk/c1d0s3 • Link to raw device • /opt/server_1/dev/c1d0s3 -> /dev/rdsk/c1d0s3 • onstat –d • dbsp1.1 /opt/server_1/dev/c1d0s3 • dbsp1.2 /opt/server_1/dev/c2d0s3 • dbsp1.3 /opt/server_1/dev/c3d0s3
oninit -ivy • Cheat and use onmonitor • $ONCONFIG • $INFORMIXDIR/etc/sqlhosts • /etc/services • Don’t forget: $INFORMIXDIR, $INFORMIXSERVER, $TERM, $ONCONFIG, ($SQLHOSTS)
The engine trinity • $ONCONFIG • DBSERVERNAME, ALIAS • SQLHOSTS • SERVERNAME, SERVICE • /etc/services • Service • $INFORMIXSERVER
Move logs • Logical and Physical logs can be high activity. Good practice to move them to their own disks. • Physical • Change name of PHYSDBS in $ONCONFIG • Logical • Add new logs to new DBS • Force to next log (onmode –l) • Drop original logs (onparams –d –l log_id)
Temp Disks • More is better • 2 or 3 per cpu • Many shallow is better than few deep. • Used to sort, build indices, overflows. • Size to largest index, or largest hash join.
What is Parallelism? Sort vertical parallelism horizontal parallelism Join Time to Process Scan ParallelTasks Parallel Subtasks Tasks
How do you turn it on? • PDQPRIORITY • Fragment your data
PDQPRIORITY • Turns on parallelism (=1) • Allocates Decision Support (DS) Memory • $PDQPRIORITY • SET PDQPRIORITY • Memory = DS_Memory * (MAX_PDQPRIORITY/100) * (PDQPRIORITY/100) • 1GB * 50 * 50 /10000 = 250MB
Fragmentation • Create table profile (……..) • Fragment by • round robin in dbslc1 • expression (col1 > abc in dbslc1.1, col1 < ghi and col1 > def in dbslc 1.2 • hash(col1) in dbslc1 • hybrid (col1) expression col2 > abc in dbslc1.1, col2 < ghi and col2 > def in dbslc1.2
Data Disks • Wide is better than deep • Break into logical groupings of dbspaces • Multiples of CPU’s – to a max of 3. • DO NOT name according to tables. • Wide.1…Wide.32 • Thin.1 … Thin.4 • Temp.1 … Temp.16 • (XPS DBslice)
Table layout • Size first and next extents properly • Spread across multiple disks • Across multiple controllers • Balance Fragments to CPUs
Balance • Horizontally • Even distributions of data across disks • Disks matched to CPUs (data and temp) • Vertically • Spread the load of checkpoints across Page cleaners and checkpoint writes
Loading data • Check out the loader.FAQ www.artentech.com/downloads.htm • Load • DBload • Insert Cursor • HPL • PLoader
Update Statistics • What does it do? • systables, sysindexes, • sysdistrib • Why do it? • How to make it faster • DBUPSPACE • Run in parallel • with LOCK MODE WAIT • Stored Procedures • Scripts at www.iiug.org
Building Indices • Why build an index? • Find specific rows • Resolve query with index only scans • Promote nested loop joins • Bitmaps and Generalized Key indices • Fill Factor, Clustered • Involves: • Scan, Sort, btree-appender threads • Write to disk
Attached, Detached, Fragmented • Detached • specify storage for the index • Otherwise uses the tablespace • Default in 7 is attached • Default in 9 is detached, 8 is detached • Detached means another 4 bytes • Why detach?
Things to tune • CPU • Memory • Disk • Network / Interconnect • Data Model • Processing
Things to tune The Engine 2x - 3x
Things to tune Disk 2x - 100x The Engine 2x - 3x
Things to tune Code 10x – 1000x Disk Engine
Things to tune Data Model Code Disk Engine 10x – 10000x
Tuning • Big knobs • Disk • Disk • Disk • Memory • Little knobs • The rest of $ONCONFIG
MonitorngSo many utilities… • onstat • sysmaster • OS utilities (iostat,mpstat) • Graphical (onperf, xtree) • Specific (ISpy) • Third Party (Patrol, Big Brother, etc) • Query Plan • Home grown
… It all boils down to • What do YOU care about. • Look for problems, and monitor events leading up to those.
Some things I watch • Disk fragmentation, size, up • onstat –p • Did expected event take place? • Long checkpoints • Logfiles • errors in online.log • Resource hogs
onstat –pdipstick & thermometer rolled into one • Cache, waits, compresses, rollbacks • Negative values • Capture them periodically and reset • if you care that much
Disk • Are table fragmented/sized properly • full scan of table – how long? • fragment elimination • < 4 extents per dbspace • Has someone dropped a new table somewhere? • Good indices? • Look at query plans
Different disk utilities report different things • Raw Data Size • From Unix 32,000,000,000 chars • Pages allocated • From oncheck –pe 10,000,000 pages • Created table as • First Size 100,000 kbytes • Pages used • From npused 7,928,642 pages
onchk.awk Chunk: rootdbs Size: 19.53 Used: 13.54 Ckfree: 5.99 PctFree: 30.69sysutils 0.62 Extents 39 *****LOGICAL 9.77 Extents 20 *****stores7 0.67 Extents 40 *****PHYSICAL 0.98 Extents 1sysmaster 1.17 Extents 73 ***** Chunk: testdbs Size: 9.77 Used: 0.12 Ckfree: 9.65 PctFree: 98.78 *** OVER 80% FREE SPACE ***stores7 0.02 Extents 1 Grand total disk use (MB) 27.19 See Handout
Memory • Is there enough? • What does overflow look like? • Is there enough temp space to cover it? • How are people joining tables?