1 / 50

Oracle High Availability - A Case Study

Rama Balaji Senior Oracle Consultant. Oracle High Availability - A Case Study. Overview. This presentation is a case study of a customer whose storage array failed during heavy transaction processing period. Methods followed to quickly restore the database.

makana
Download Presentation

Oracle High Availability - A Case Study

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Rama Balaji Senior Oracle Consultant Oracle High Availability - A Case Study

  2. Overview • This presentation is a case study of a customer whose storage array failed during heavy transaction processing period. • Methods followed to quickly restore the database. • How the best practice configuration was arrived while maintaining at least two copies of production databases on-line. • How Oracle’s HA components were effectively explored to achieve the final configuration with only few minutes of production downtime.

  3. Agenda • Background • Approach • Initial Configuration • Disaster to Full recovery • High Availability Features • Conclusion and best practices • Question and Answer

  4. Oracle Features Used • RMAN compressed backups • Physical Standby using Data Guard • ASM to non-ASM and vice versa • Cascaded Destinations Standby • 2 node RAC • Failover and Switchover • Flash Recovery Area and Flash Back Logs

  5. Background • E-commerce client • Nature of business • Initial cost effective business approach • Business growth • IT infrastructure

  6. Approach Event Database Availability 1. Storage array failure 2. Restored database from tape using RMAN 3. Set up the Physical Standby 4. Second Physical Standby using cascaded destination feature 5. Failover from production to standby No Database One Two Three Two

  7. Approach Event Database Availability 6. 2-Node RAC with ASM on the new storage array as standby 7.Switched over from single node Db instance to 2 Node RAC as primary 8.Disconnected the second standby 9. 2 node RAC standby to 2 node RAC primary 10. Flashback Logs cleanup from FRA Three Three Two Two No Downtime

  8. Server - Linux Red Hat x86_64 Database – Oracle 10.2.0.3 ASM – two diskgroups DATA_DG – Database FLASH_DG – flash_recovery_area Daily full database backup to disk- RMAN, and to tape using third party software Hourly archivelog backups to disk – RMAN, and to tape using third party software No RMAN catalog 2 node RAC + ASM Storage Array Initial configuration

  9. RMAN • RMAN> show controlfile autobackup; RMAN configuration parameters are: CONFIGURE CONTROLFILE AUTOBACKUP ON; • RMAN> show controlfile autobackup format; RMAN configuration parameters are: CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/opt/oracle/admin/PROD/backup/cntl/%F';

  10. RMAN Backup script • RUN { CONFIGURE RETENTION POLICY TO REDUNDANCY 1; ALLOCATE CHANNEL ch1 DEVICE TYPE DISK MAXPIECESIZE 2G FORMAT '/opt/oracle/admin/PROD/backup/db/RCOMPRESSED_%U'; BACKUP as compressed backupset DATABASE plus archivelog channel ch1; }

  11. RMAN Archivelog backup script • RUN { ALLOCATE CHANNEL ch1 DEVICE TYPE DISK FORMAT '/opt/oracle/admin/PROD/backup/archive/ARCH_%U'; BACKUP as compressed backupset archivelog all; }

  12. 2 node RAC + ASM Storage Array Storage Array Failure • Storage admin determined that the SAN failure has caused loss of entire data including the database and backups. • Validated that the RMAN backup on tape from previous night as well as archivelog files from previous hour were intact.

  13. 2 node RAC + ASM Single DB Non- ASM Storage Array Restore using RMAN • Restored the tape backup and all archivelog file backups from tape to internal disks on one of the RAC node

  14. Restore from ASM to non-ASM • Restored the backup from tape to the original backup location. • Restored spfile from autobackup. RMAN> restore spfile from autobackup; • Created pfile from spfile;

  15. Restore from ASM to non-ASM (Continued…) • Edited the following parameters in pfile • control_files - Changed +DATA_DG and +FLASH_DG to file system • db_file_name_convert - Changed +DATA_DG to file system • log_file_name_convert - Changed +DATA_DG and +FLASH_DG to file system • startup nomount using pfile; • Using RMAN nocatalog, restore controlfile from autobackup. RMAN> restore controlfile from ‘/opt/oracle/admin/PROD/backup/cntl/c-662196802-20081230-0e’

  16. Restore from ASM to non-ASM (Continued…) • alter database mount; • restore database; • recover database; • Created temporary tablespaces • alter database open;

  17. Standby Database Primary Database Physical Standby PROD PRODS

  18. Physical Standby Steps • On the primary database (PROD) generated pfile. Create pfile=’/tmp/initPROD.ora’ from spfile; and copy that file to the standby server. • On the primary (PROD) backed up the current controlfile using RMAN. RMAN> backup current controlfile for standby; • RMAN> copy current controlfile for standby to '/tmp/sby_control01.ctl';

  19. Physical Standby Steps (Continued…) • Run a full RMAN backup on the primary (PROD). RMAN> RUN { ALLOCATE CHANNEL ch1 DEVICE TYPE DISK MAXPIECESIZE 2G FORMAT '/home/oracle/backup_standby/RCOMPRESSED_%U'; BACKUP as compressed backupset DATABASE plus archivelog channel ch1; } • Copied all backup files as well as controlfile backup to the standby (PRODS). Backup files needs to be in the same location as the primary backup location or create a symbolic link.

  20. Physical Standby Steps (Continued…) • Edited the parameter file on the standby (PRODS). • Changed control_files parameter to point to the controlfile on the standby server. • Changed db_file_name_convert to use the new location on the standby server. • Changed log_file_name_convert to use the new location on the standby server. • Start up the standby (PRODS)instance in nomount

  21. Physical Standby Steps (Continued…) • rman nocatalog RMAN> connect target username/password@PROD RMAN> connect auxiliary / RMAN> RUN { ALLOCATE auxiliary CHANNEL ch1 DEVICE TYPE DISK FORMAT '/home/oracle/backup_standby/RCOMPRESSED_%U'; duplicate target database for standby; } • Changed the following parameter on the primary (PROD). SQL> ALTER SYSTEM SET log_archive_dest_2= 'SERVICE=PRODS_XPT REOPEN=300'

  22. Physical Standby Steps (Continued…) • Changed the standby (PRODS) to Managed Recovery alter database recover managed standby database disconnect from session; • Issued the following command to make sure the archive log files are applied on the standby (PRODS) Select sequence#, applied from v$archived_log order by sequence#;

  23. Data Guard Cascaded Destinations PROD PRODS PRODC Standby Database 1 Data Guard Cascaded Destination Standby Database 2 Primary Database Data Guard LOG_ARCHIVE_DEST_2='SERVICE=PRODC_XPT REOPEN=300'; LOG_ARCHIVE_DEST_2='SERVICE=PRODS_XPT REOPEN=300';

  24. Data Guard Failover PROD PRODS PRODC Primary Database Standby Database Primary Database Data Guard Failover Data Guard

  25. Data Guard Failover Steps • On the primary (PROD)I did the following. After users were logged out, created a table called TEST. Create table test as select * from dba_users; This was a last operation on the primary database. • On the primary (PROD) issued the following command couple of times. Alter system switch logfile;

  26. Data Guard Failover Steps (Continued…) • On the primary (PROD), noted down the sequence# SQL> archive log list Database log mode Archive Mode Automatic archival Enabled Archive destination /d01/oracle/flashback Oldest online log sequence 6503 Next log sequence to archive 6504 Current log sequence 6504 • On the standby (PRODS) issued the following command to make sure standby is in maximum performance mode. SQL> alter database set standby database to maximize performance;

  27. Data Guard Failover Steps (Continued…) • On the standby(PRODS), issued the following command SQL> select thread#, low_sequence#, high_sequence# from v$archive_gap; no rows selected • On the standby (PRODS), check the last sequence#, SQL> select max(sequence#) from v$archived_log; MAX(SEQUENCE#) ------------- 6504

  28. Data Guard Failover Steps (Continued…) • On the standby (PRODS), SQL> alter database recover managed standby database finish force; Database altered On the standby (PRODS), SQL> alter database commit to switchover to primary; Database altered • On the standby instance (PRODS), SQL> shutdown immediate ORA-01507: database not mounted ORACLE instance shut down.

  29. Data Guard Failover Steps (Continued…) • SQL> startup ORACLE instance started. Total System Global Area 2147483648 bytes Fixed Size 2074152 bytes Variable Size 1056967128 bytes Database Buffers 1073741824 bytes Redo Buffers 14700544 bytes Database mounted. Database opened. At this point standby is opened as primary.

  30. Data Guard Failover Steps (Continued…) • On the current primary (PRODS), issued the following query to make sure the table was brought over. Select * from test; then drop the table. • On the second standby (PRODC) instance verified the cascaded standby destination by querying v$archived_log. • I had to bounce the cascaded standby instance (PRODC), and put it back in a recovery mode.

  31. PRODC PRODS Primary Database Standby Database 1 Data Guard Storage Array 2 node RAC + ASM Standby Database 2 PROD2 2-Node RAC as Standby PROD1

  32. Data Guard Switchover Steps • Make sure only one instance on the RAC system is mounted(PROD1). • Make sure the FAL_SERVER and FAL_CLIENT parameters are set correctly on the primary as well as standby. On the primary FAL_SERVER=standby instance FAL_CLIENT=primary instance On the Physical Standby FAL_SERVER=primary instance FAL_CLIENT=standby instance

  33. Data Guard Switchover Steps (Continued…) • On the primary (PRODS) issued the following SQL> select switchover_status from v$database; SWITCHOVER_STATUS -------------------------------- TO STANDBY • On the standby (PROD1) instance issued the following command SQL> select switchover_status from v$database; SWITCHOVER_STATUS -------------------------------- TO PRIMARY

  34. Data Guard Switchover Steps (Continued…) • On the primary (PRODS) issued the following command SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY; • SQL> ALTER SYSTEM SET log_archive_dest_state_2='DEFER'; • Shutdown the primary instance (PRODS) SQL> shutdown immediate; SQL> Startup nomount; SQL> alter database mount standby database;

  35. Data Guard Switchover Steps (Continued…) • On the standby (PROD1) issued the following command SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY; • On the standby (PROD1) SQL>shutdown immediate; SQL> startup open; • SQL>alter system switch logfile; • Make sure other instance(PROD2) comes up as well.

  36. 2 node RAC + ASM Primary Database 1 Storage Array Data Guard Switchover PROD PRODS PRODC Standby Database 2 Standby Database 1 Data Guard

  37. 2 node RAC + ASM Primary Database 2 node RAC + ASM Standby Database Storage Array Storage Array Final Configuration

  38. Real Application Clusters • RAC misconception. • Always register the database and instances to the cluster. • You have to use “netca” to register the listener to crs in 10g. • Srvctl is “case sensitive”. • Always use single parameter file for multiple RAC nodes. • Client side load balancing. • Server side load balancing.

  39. ASM Configuration Storage Group 2 Storage Group 1 ASM Disk Groups DATA FLASH LUN 1 LUN 2 LUN 5 LUN 6 ASM disks LUN 7 LUN 8 LUN 3 LUN 4 Archive log files RMAN backups Mirrored cntl and log Flashback Logs Data Files Control Files Online log files

  40. Flash Recovery Area • Show parameter db_recovery NAME TYPE VALUE ------------------------------------ ----------- --------------------------- db_recovery_file_dest string +FLASH_DG db_recovery_file_dest_size big integer 190000M • SQL> select space_used/(1024*1024*1024),space_limit/(1024*1024*1024) from v$recovery_file_dest; SPACE_USED (in GB) SPACE_LIMIT (in GB) --------------------------- ---------------------------- 55.6249833 185.546875

  41. Flash Recovery Area • SQL> select * from v$flash_recovery_area_usage; CONTROLFILE 0 0 0 ONLINELOG 0 0 0 ARCHIVELOG 18.17 6.03 4691 BACKUPPIECE 0 0 0 IMAGECOPY 0 0 0 FLASHBACKLOG 11.81 11.73 1077 • ORA-19815: WARNING: db_recovery_file_dest_size of 53687091200 bytes is 87.63% used, and has 6642196992 remaining bytes available

  42. Flash Recovery Area • SQL> alter system set db_recovery_file_dest_size=55G scope=memory; • SQL> select * from v$flash_recovery_area_usage; CONTROLFILE 0 0 0 ONLINELOG 0 0 0 ARCHIVELOG 61.3 20.34 4692 BACKUPPIECE 0 0 0 IMAGECOPY 0 0 0 FLASHBACKLOG 38.7 38.44 1046

  43. Cleanup of Archivelog files from Flash Recovery Area • Issue the following RMAN commands RMAN> crosscheck archivelog all; RMAN> delete expired archivelog all; • After “delete expired” command CONTROLFILE 0 0 0 ONLINELOG 0 0 0 ARCHIVELOG 3.83 0 1274 BACKUPPIECE 0 0 0 IMAGECOPY 0 0 0 FLASHBACKLOG 51.23 50.93 1007

  44. Flashback Logs • SQL> show parameter db_flashback NAME TYPE VALUE ------------------------------------ ------------------------------------------ ------------------ db_flashback_retention_target integer 1440 • SQL> show parameter log_archive_min_succeed_dest; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ log_archive_min_succeed_dest integer 1

  45. How to cleanup Flashback Logs from Flash Recovery Area • SQL> alter system set log_archive_dest_1='LOCATION=/home/oracle/temp_archivelog' scope=memory; • SQL> alter system set log_archive_dest_state_10=defer scope=memory; • SQL> alter system set db_recovery_file_dest_size=55G scope=memory; • Check the alert log for “deleted Oracle managed files” messages.

  46. How to cleanup Flashback Logs from Flash Recovery Area • SQL> select * from v$flash_recovery_area_usage; CONTROLFILE 0 0 0 ONLINELOG 0 0 0 ARCHIVELOG 92.19 0 1274 BACKUPPIECE 0 0 0 IMAGECOPY 0 0 0 FLASHBACKLOG 7.3 0 6 • SQL> alter system set db_recovery_file_dest_size=190G scope=memory;

  47. How to cleanup Flashback Logs from Flash Recovery Area • SQL> alter system set log_archive_dest_1='' scope=memory; • SQL> alter system set log_archive_dest_state_10=enable scope=memory

  48. Conclusion and Best Practices • All configuration changes were performed without any significant downtime outages to the production or standby databases. • Various Oracle technologies, including RMAN, RAC, ASM, and Data Guard, were successfully utilized together to achieve the final high-availability solution. • An awareness and understanding of the technologies available from Oracle, together with an innovative approach to implementation, were critical in building the environment while complying with the customer’s business needs.

  49. Question Answers Contact Information Rama Balaji balajir@TUSC.com (303) 985-2213 www.tusc.com

  50. Contact Information Rama Balaji balajir@TUSC.com (303) 985-2213 www.tusc.com Core Services Oracle E-Business Suite Enterprise Performance Management (EPM) Oracle DBA, Database, and Infrastructure Business Intelligence and Data Warehousing Managed Services – remote support & hosting Oracle Fusion Middleware Oracle Hyperion Software Training World-wide Team of IT Professionals

More Related