Backup and recovery Basics of Backup and restoration Types of recovery Defining strategy Starting up and shutting down 80/20 rule SLA’s
Database backup restoration and recovery • Physical files that make up db need to be backed up • RMAN or user managed • Copy files to correct location and let db recover itself • Recovery process is starting the database and making it consistent with the backed up files
Types of failure • Non media or media/disk failure • Media failures are most critical. • To recover: • Determine what files need recovery • What type of media recovery is required • Restore backups • Apply offline redo logs (archive) • Restore db to desired point • Test process often!
Defining backup and recovery strategy • Has the following requirements / considerations • Amount of data that can be lost in event of failure • Length of time business can run without the DB • Can DB be offline to perform backup and how long for? • Types of resources available to perform backup/ recovery • Procedures for undoing changes to DB • Cost of buying and maintaining HW and additional backups V cost of replacing or recreating data
Losing data in DB failure The amount of data that can be lost in a failure helps determine the backup and recovery strategy that is implemented • Cold backups with no archive of redo logs • Updates since the last backup will be lost. The frequency of cold backup dictates the amount of data lost. • Hot or cold backup with archiving of redo logs • No updates lost as long as all redo logs are kept since the last backup.
Surviving without the DB in a DB failure • The amount of time a business can survive without the DB is dependent on the type of business. • Mean time to recover (MTTR) is obtained during testing of failure scenarios. • If MTTR > business survival time then backup and recovery time should be re-addressed.
Online backup • There are two types of backup: offline and online. They are referred to as cold and hot. • Offline (cold) are done when the database is closed. The downtime is related to database size and normally runs for several hours. • Online (hot) are done when the DB is open but can affect performance. • Archiving must be turned on for the hot backup.
Undoing changes to DB • There are 3 primary ways of undoing changes to the DB which do not involve backups • Manual • Re executing code to rebuild data, dependent on the sophistication of the code releases and the configuration of the management control of the application (used in warehousing systems). • Oracle log miner – querying the redo logs to recover the changed/dropped data. • Flash back query – query undo segments to recover the changed/dropped data.
Log miner • Oracle utility allows you to generate the insert, delete and update statements from the change vectors in redo logs • Logminer packages • Sys.dbms_logmnr_d.build – builds DD externally • Dbms_logmnr.add_logfile – access to desired redo log • Dbms_logmnr.start_logmnr – starts logminer session
DD access • To fully translate the contents of the redo logs, logminer requires access to data dictionary. • Dbms_logmnr_d.build extracts DD information to a flat file or the redo logs in 9i • More data can be found in oracle documentation e.g. Oracle9i Database Administrator's Guide – Chapter 9
Flash back • Works of the undo data and lets the user see a read consistent view of the database in the past. It is easier than logminer. • Only transactions committed can be viewed. • User can specify the read only view based on system time or a system change number (SCN). • Enables at session level. • Max of 5 days back.
Cont .. • DBA must set the under retention interval long enough to be able to reconstruct the snapshot. Alter system set undo_retention = <seconds>; • Max is 5 days • Requires execute permission on DBMS_flashback Execute dbms_flashback.enable_at_time(sysdate-1/24); • Just remember to disable afterwards
Setting archive mode • Alter database archivelog (or noarchivelog) • Sends files to location in init.ora file for log_archive_dest • Default is off • Select name, logmode from v$database;
Cont .. • Change or add init.ora parameter • Log_archive_start = true • Startup mount • Alter database archivelog • Alter database open Archive log list to check its logging
Starting up a database Startup [option] • Options available will determine state of database • Open enables users to access db • Mount mounts DB for certain DBA activities • Nomount starts BG processes but does not allow access • Exclusive permits only current instance to access the DB
Start up errors ORA-12547: TNS: lost contact Or ORA-09352:windows 32-bit two task driver unable to spawn new oracle task These are common errors and mean that the oracle services in the control panel have not started.
Shutdown Shutdown [option] • Options • Normal default –waits for current sessions & transactions to end forces a check point and closes files • Transactional – waits for current transaction to finish before closing. • Immediate – forces check point and closes files • Abort – shuts down no save or close
80/20 rule • Rule conceived by Vilfredo Pareto • Minority of causes produce the majority of results • Achieved by doing as much work as possible up front – in the design phase • Set realistic expectations • Reasonable response times • Majority of efforts in tuning improves response time • Response time = work time + wait time
Improving response time • Improve either work time or wait time • Improvement is driven by need to have transactions finish faster • Tune component with the longest time • E.g.: if 20% of the time is spent with the CPU processing data and 80% due to delay in processing then improving CPU time will have less effect than reducing processing time • Governed by an SLA (service level agreement)
SLA’s • How many transactions should be completed per min? • How many users should be connected at peak times? • How many concurrent transactions are expected at peak time? • What times of the day are considered for online use only? • What is that batch processing window? • What is the expected response time for online processes? • How much system resource should be available during peak time? – define margin of error