A Rough Guide to RAC

Julian Dyke

Independent Consultant

Web Version

juliandyke.com


Agenda

  • Introduction

  • Availability

  • Scalability

  • Manageability

  • Total Cost of Ownership

  • Conclusion


Introduction


Some RAC Terminology

OCRDUMP

SRVCTL

RAC

GCS

CLUVFY

CSS

LMD

CRSCTL

OCFS2

PI

LMS

OCR

OIFCFG

LCK

OCRCHECK

VIP

OCSSD

CRSD

GRD

CRS

FAN

DIAG

ONS

VIPCA

EVMD

LMON

OCRCONFIG

BAST

OCFS

GES

AST

ASM

TAF

GSD

LKDEBUG

FCF

CRS_STAT


What is RAC?

[Diagram: two nodes (Node 1 running Instance 1, Node 2 running Instance 2), each with its own local disk, linked by the interconnect and both attached to shared storage]

  • Multiple instances running on separate servers (nodes)

  • Single database on shared storage accessible to all nodes

  • Instances exchange information over an interconnect network


Instances versus Databases

  • A RAC cluster includes

    • one database

    • one or more instances

  • A database is a set of files

    • Located on shared storage

    • Contains all persistent resources

  • An instance is a set of memory structures and processes

    • Contains all transient resources

    • Can be started and stopped independently


Instances versus Databases

[Diagram: four nodes (Node 1 to Node 4), each running one instance (Instance 1 to Instance 4), attached to the public network, the private network (interconnect) and the storage network, with a single shared database behind the storage network]


What is a RAC Database?

  • Located on shared storage accessible by all instances

  • Includes

    • Control Files

    • Data Files

    • Online Redo Logs

    • Server Parameter File

  • May optionally include

    • Archived Redo Logs

    • Backups

    • Flashback Logs (Oracle 10.1 and above)

    • Block change tracking file (Oracle 10.1 and above)


What is a RAC Database?

  • Contents similar to single instance database except

    • One redo thread per instance

ALTER DATABASE ADD LOGFILE THREAD 2
  GROUP 3 SIZE 51200K,
  GROUP 4 SIZE 51200K;

ALTER DATABASE ENABLE PUBLIC THREAD 2;

  • If using Automatic Undo Management, one UNDO tablespace per instance is also required

CREATE UNDO TABLESPACE "UNDOTBS2" DATAFILE SIZE 25600K AUTOEXTEND ON MAXSIZE UNLIMITED EXTENT MANAGEMENT LOCAL;

  • Additional dynamic performance views (V$, GV$ but not X$) created by $ORACLE_HOME/rdbms/admin/catclust.sql
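
The per-instance additions above can be sketched as a small statement generator. This is only an illustration in Python: the function name, the assumption that redo groups are numbered consecutively with two groups per thread, and the default sizes are taken from or modelled on the slide examples, and nothing here connects to a database.

```python
def per_instance_ddl(instance_no, groups_per_thread=2,
                     log_size_kb=51200, undo_size_kb=25600):
    """Build the redo-thread and undo DDL text for one additional instance.

    Assumption: groups are numbered consecutively, so instance 2 with two
    groups per thread gets groups 3 and 4, as in the slide example.
    """
    first_group = (instance_no - 1) * groups_per_thread + 1
    groups = ", ".join(
        f"GROUP {g} SIZE {log_size_kb}K"
        for g in range(first_group, first_group + groups_per_thread)
    )
    return [
        f"ALTER DATABASE ADD LOGFILE THREAD {instance_no} {groups};",
        f"ALTER DATABASE ENABLE PUBLIC THREAD {instance_no};",
        f'CREATE UNDO TABLESPACE "UNDOTBS{instance_no}" DATAFILE '
        f"SIZE {undo_size_kb}K AUTOEXTEND ON MAXSIZE UNLIMITED "
        "EXTENT MANAGEMENT LOCAL;",
    ]


# Print the statements a DBA would review before running them by hand.
for statement in per_instance_ddl(2):
    print(statement)
```

Generating text rather than executing it keeps the sketch safe to run anywhere; the numbering scheme should be checked against the existing redo groups before any statement is used.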


What is the Interconnect?

  • Instances communicate with each other over the interconnect (network)

  • Information transferred between instances includes

    • data blocks

    • locks

    • SCNs

  • Typically 1Gb (Gigabit) Ethernet

    • UDP protocol

    • Often teamed in pairs to avoid SPOFs

  • Can also use Infiniband

    • Fewer levels in stack

  • Other proprietary protocols are available


Why Use Shared Storage?

  • Mandatory for

    • Database files

    • Control files

    • Online redo logs

    • Server Parameter file (if used)

  • Optional for

    • Archived redo logs (recommended)

    • Executables (Binaries)

      • Password files

      • Parameter files

      • Network configuration files

    • Administrative directories

      • Alert Log

      • Dump Files


What Shared Storage is Supported?

  • Oracle supplied options

    • Oracle Cluster File System (OCFS)

      • Version 1

        • Windows and Linux

        • Supports database and archived redo logs

        • No executables

      • Version 2 - August 2005

        • Linux, Windows and Solaris

        • As OCFS1 plus executables

    • Automatic Storage Management (ASM)

      • Oracle 10.1 and above

      • More transparent in Oracle 10.2 and above

    • Both require underlying SAN or NAS

      • Do not require LVM


What Shared Storage is Supported?

  • Can use (continued)

    • Network Attached Storage

      • NFS-based

      • Potentially lower cost - no fibre channel required

      • Easy to administer

    • Raw devices

      • Difficult to administer

      • Cannot be used with archived redo logs

    • Third-party Cluster File System

      • Still a popular choice with many sites

    • Others (not supported)

      • Firewire - maximum two nodes - recommended in 10g

      • NBD - Network Block Devices - Solaris and Linux

      • NFS - not supported, but might still work


What is a Shared Oracle Home?

  • Can install multiple copies of Oracle executables on local disks on each node

  • Can also install Shared Oracle Home

    • single copy of Oracle executables on shared storage

  • Oracle 9.2

    • Only Oracle database software

  • Oracle 10.1

    • Cluster Ready Services (CRS)

    • Oracle database software + ASM

  • Oracle 10.2

    • Oracle Clusterware (CRS)

    • ASM

    • Oracle database software


Internal Structures and Services

  • Global Resource Directory (GRD)

    • Records current state and owner of each resource

    • Contains convert and write queues

    • Distributed across all instances in cluster

  • Global Cache Services (GCS)

    • Implements cache coherency for database

    • Coordinates access to database blocks for instances

    • Maintains GRD

  • Global Enqueue Services (GES)

    • Controls access to other resources (locks) including

      • library cache

      • dictionary cache


Background Processes

  • Each RAC instance has set of standard background processes e.g.

    • PMON

    • SMON

    • LGWR

    • DBWn

    • ARCn

  • RAC instances use additional background processes to support GCS and GES including

    • LMON

    • LCK0

    • LMDn

    • LMSn

    • DIAG


Portability

  • Most single-instance applications should port to RAC

  • Some exceptions

    • Application must scale well on single instance

      • Can be difficult to evaluate

    • Some features do not work e.g.

      • DBMS_ALERT

      • DBMS_PIPE

    • External inputs/outputs may need modification

      • Flat files etc

    • Some RAC features require additional coding

      • TAF

    • Code may need upgrading to use RAC functionality e.g.

      • FCF requires JDBC Implicit Connection Cache


Why Do Users Deploy RAC?

  • Users may deploy RAC to achieve

    • Increased availability

    • Increased scalability

    • Improved maintainability

    • Reduced total cost of ownership


Why Do DBAs Deploy RAC?

  • DBAs may want to deploy RAC because:

    • Realistic next step for experienced Oracle DBAs

    • Intellectual challenge

    • Job protection - ties organisation to Oracle technology

    • Possible improved earnings

    • It looks good on their CV


Availability


What is Failover?

  • If one node or instance fails

    • Node detecting failure will

      • Read redo log of failed instance from last checkpoint

      • Apply redo to datafiles including undo segments (roll forward)

      • Roll back uncommitted transactions

    • Cluster is frozen during part of this process

[Diagram: Instance 1 on Node 1 and Instance 2 on Node 2, linked by the interconnect]


What are Database Services?

  • Database Services are logical groups of sessions

  • Can be configured using

    • DBCA

    • Enterprise Manager (10.2 and above)

  • Can also be configured using

    • SRVCTL (Oracle Cluster Registry only)

    • SQL*Plus (Data Dictionary only)

    • Text editor (Network Configuration)

  • In Oracle 10.1 and above, each service has

    • Preferred Nodes (used by default)

    • Available Nodes (used if preferred node fails)
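
The preferred / available distinction can be sketched as a simple placement rule. This is a toy model in Python, not Oracle Clusterware's actual relocation logic: the function name, the instance names and the one-for-one substitution of an available instance for each failed preferred instance are all assumptions made for illustration.

```python
def place_service(preferred, available, running):
    """Return the instances that should currently run a service.

    preferred: instances used by default
    available: instances used only when a preferred instance is down
    running:   set of instances that are currently up
    """
    active = [inst for inst in preferred if inst in running]
    failed = len(preferred) - len(active)
    # Assumption: one standby instance is brought in per failed preferred one.
    standby = [inst for inst in available if inst in running]
    return active + standby[:failed]


# Hypothetical three-instance cluster: RAC1 and RAC2 preferred, RAC3 available.
print(place_service(["RAC1", "RAC2"], ["RAC3"], {"RAC1", "RAC2", "RAC3"}))
print(place_service(["RAC1", "RAC2"], ["RAC3"], {"RAC1", "RAC3"}))
```

The second call models the loss of RAC2: the service keeps running on RAC1 and is additionally started on RAC3.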


What are Database Services?

  • Can be used with Resource Manager to control resource usage e.g.

    • CPU

    • Parallel execution

  • Can be used for monitoring

    • V$SERVICE_STATS

  • Can be used for diagnostics

    • DBMS_MONITOR

      • trace

      • statistics


    What is Oracle Clusterware?

    • Introduced in Oracle 10.1 (Cluster Ready Services - CRS)

    • Renamed in Oracle 10.2 to Oracle Clusterware

    • Cluster Manager providing

      • Node membership services

      • Global resource management

      • High availability functions

    • On Linux

      • Configured in /etc/inittab

      • Implemented using three daemons

        • CRS - Cluster Ready Service

        • CSS - Cluster Synchronization Service

        • EVM - Event Manager

    • In Oracle 10.2 includes High Availability framework

      • Allows non-Oracle applications to be managed


    What is the OCR?

    • Oracle Cluster Registry (OCR)

      • Configuration information for Oracle Clusterware / CRS

    • Introduced in Oracle 10.1

      • Replaced Server Management (SRVM) disk/file

    • Similar to Windows Registry

    • Located on shared storage

    • In Oracle 10.2 and above can be mirrored

      • Maximum two copies


    What is the OCR?

    • Defines cluster resources including:

      • Databases

      • Instances

        • RDBMS

        • ASM

      • Services

      • Node Applications

        • VIP

        • ONS

        • GSD

      • Listener Process


    What is a Voting Disk?

    • Known as Quorum Disk / File in Oracle 9i

    • Located on shared storage accessible to all instances

    • Used to determine RAC instance membership

    • In the event of node failure voting disk is used to determine which instance takes control of cluster

      • Avoids split brain

    • In Oracle 10.2 and above can be mirrored

      • Odd number of copies (1, 3, 5 etc)
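
One way to see why an odd number of copies is used is to count how many copies can be lost while a strict majority is still reachable. The sketch below assumes a strict-majority rule (a node must see more than half of the copies); the function names are hypothetical.

```python
def votes_needed(copies):
    """Smallest number of voting disk copies that forms a strict majority."""
    return copies // 2 + 1


def tolerated_failures(copies):
    """How many copies can be lost while a majority is still reachable."""
    return copies - votes_needed(copies)


# Compare odd and even copy counts side by side.
for copies in (1, 2, 3, 4, 5):
    print(copies, votes_needed(copies), tolerated_failures(copies))
```

Under this rule, going from three copies to four does not raise the number of tolerated failures, which is why only odd counts are worth configuring.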


    What is VIP?

    • Node application introduced in Oracle 10.1

    • Allows Virtual IP address to be defined for each node

    • All applications connect using Virtual IP addresses

    • If node fails Virtual IP address is automatically relocated to another node

    • Only applies to newly connecting sessions


    What is TAF?

    • TAF is Transparent Application Failover

    • Sessions connected to a failed instance will be terminated

      • Uncommitted transactions will be rolled back

    • Sessions can be reconnected to another instance automatically if using TAF

      • Can optionally re-execute in-progress SELECT statements

        • Statement re-executed with same SCN

        • Fetches resume at point of failure

      • Session state is lost including

        • Session parameters

        • Package variables

        • Class and ADT instantiations


    What is TAF?

    • TAF is Transparent Application Failover

    • Requires additional coding in client

    • Requires configuration in TNSNAMES.ORA

    RAC_FAILOVER =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (FAILOVER = ON)
          (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = RAC)
          (SERVER = DEDICATED)
          (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 30)(DELAY = 5))
        )
      )
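
For clusters with more than two nodes, an entry of this shape can be generated rather than typed by hand. The following is a minimal Python sketch that only assembles the descriptor text using the same keywords as the entry above; the function name and its defaults are assumptions, and the output should be reviewed before it goes into TNSNAMES.ORA.

```python
def taf_descriptor(alias, hosts, service, port=1521,
                   failover_type="SELECT", method="BASIC",
                   retries=30, delay=5):
    """Build a single-line TAF connect descriptor for the given hosts."""
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={h})(PORT={port}))" for h in hosts
    )
    return (
        f"{alias}=(DESCRIPTION="
        f"(ADDRESS_LIST=(FAILOVER=ON){addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service})(SERVER=DEDICATED)"
        f"(FAILOVER_MODE=(TYPE={failover_type})(METHOD={method})"
        f"(RETRIES={retries})(DELAY={delay}))))"
    )


# Reproduce the two-node entry from the slide.
entry = taf_descriptor("RAC_FAILOVER", ["node1", "node2"], "RAC")
print(entry)
# A cheap sanity check: every opening parenthesis must be closed.
print(entry.count("(") == entry.count(")"))
```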


    What is FAN?

    • Fast Application Notification (FAN)

    • Introduced in Oracle 10.1

    • Method by which applications can be informed of changes in cluster status

      • Handle node failures

      • Workload balancing

    • Applications must connect using services

    • Can be notified using

      • Server side callouts

      • Fast Connection Failover (FCF)

      • ONS API


    What is ONS?

    • Oracle Notification Service (ONS)

    • Introduced in Oracle 10.1

    • Allows out-of-band messages to be sent to

      • Nodes in cluster

      • Middle-tier application servers

      • Clients

    • Underlying mechanism for Fast Application Notification (FAN)


    Does RAC Increase Availability?

    • Depends on definition of availability

      • May achieve less unplanned downtime

      • May have more time to respond to failures

    • Instance failover means any node can fail without total loss of service

    • Must provide overcapacity in cluster to survive failover

      • Additional Oracle and RAC licenses

      • Load can be distributed over all running nodes

      • Can use Grid to provision additional nodes


    Does RAC Increase Availability?

    • Can still get data corruptions

      • Human errors / software errors

      • Only one logical copy of data

      • Only one logical copy of application / Oracle software

    • Lots of possibility for human errors

      • Power / network cabling / storage configuration

    • Upgrades and patches are more complex

      • Can upgrade software on subset of nodes

      • If database is affected then still need downtime


    Scalability


    What is Scalability?

    • RAC overhead means that linear scalability is difficult to achieve

      • Global Cache Services (blocks)

      • Global Enqueue Services (locks)

    • As number of instances increases, probability that instance is a resource master decreases

    • Scaling factor of 1.8 is considered good

    • Dependent on application design and implementation

    • Scaling factor improves with

      • Node affinity

      • Elimination of contention


    What is Scalability?

    • Scalability is the relationship between increments of resources and workloads

    • Can be any resource but with RAC normally refers to adding instances

    • Scalability can be

      • linear - optimal but rare

      • non-linear - suboptimal but normal

    [Graphs: workload plotted against resource, one curve linear and one non-linear]
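
The relationship between added instances and workload can be put into numbers. The sketch below assumes that a scaling factor is the ratio of cluster throughput to single-instance throughput, and that the figure of 1.8 quoted earlier refers to a two-instance cluster; both readings are assumptions, and the throughput values are hypothetical.

```python
def scaling_factor(single_throughput, cluster_throughput):
    """Throughput of the cluster relative to a single instance."""
    return cluster_throughput / single_throughput


def efficiency(single_throughput, cluster_throughput, instances):
    """Fraction of ideal (linear) scaling actually achieved."""
    return scaling_factor(single_throughput, cluster_throughput) / instances


# Hypothetical measurements in transactions per second.
single = 1000
two_nodes = 1800
print(scaling_factor(single, two_nodes))
print(efficiency(single, two_nodes, instances=2))
```

Linear scalability corresponds to an efficiency of 1.0; anything below that is the non-linear case shown in the graphs.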


    What is Workload Balancing?

    • Balancing of workload across available instances

    • Can have

      • Client-side connection balancing

      • Server-side connection balancing

    • Client-side connection balancing

      • Workload distributed randomly across nodes

    RAC =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1521))
        (LOAD_BALANCE = ON)
        (FAILOVER = ON)
        (CONNECT_DATA =
          (SERVICE_NAME = RAC)
          (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC))
        )
      )
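
The effect of combining LOAD_BALANCE and FAILOVER on the client side can be modelled in a few lines. This is a simplified Python simulation, not Oracle Net's implementation: the function names and the `is_up` callback are hypothetical, and a real client attempts network connections rather than calling a predicate.

```python
import random
from collections import Counter


def connect_order(addresses, load_balance=True, rng=random):
    """Order in which a client tries the listed addresses."""
    order = list(addresses)
    if load_balance:
        rng.shuffle(order)  # random order spreads new connections
    return order


def connect(addresses, is_up, load_balance=True, rng=random):
    """Return the first reachable address, trying the next one on failure."""
    for address in connect_order(addresses, load_balance, rng):
        if is_up(address):
            return address
    raise ConnectionError("no listener reachable")


# With both nodes up, connections split roughly evenly between them.
nodes = ["node1", "node2"]
print(Counter(connect(nodes, lambda a: True) for _ in range(1000)))
# With node1 down, every connection lands on node2.
print(Counter(connect(nodes, lambda a: a != "node1") for _ in range(1000)))
```

Because the choice is random and ignores how busy each node is, this approach balances connection counts, not actual load.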


    What is Workload Balancing?

    • Server-side connection balancing

    • Dependent on current workload on each node

    • PMON monitors workload and updates listeners

    • Depends on long or short connections

    • In Oracle 10.1

      • Set PREFER_LEAST_LOADED_NODE in listener.ora

        • OFF for long connections

        • ON for short connections (default)

    • In Oracle 10.2

      • Can specify load balancing goal for each service

        • NONE, SERVICE_TIME or THROUGHPUT

      • Can also specify connection load balancing goal

        • SHORT or LONG
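
Server-side balancing differs from the client-side approach in that the listener uses load information reported by the instances. The toy model below routes each new connection to the least-loaded instance; the class, its method names and the single numeric load figure are assumptions for illustration, and the real metrics and goals (SERVICE_TIME, THROUGHPUT, SHORT, LONG) are not modelled.

```python
class Listener:
    """Toy listener that routes to the instance with the lowest load."""

    def __init__(self):
        self.load = {}

    def register(self, instance, load):
        """Record the latest load reported for an instance."""
        self.load[instance] = load

    def route(self):
        """Pick the least-loaded instance; ties go to the lowest name."""
        if not self.load:
            raise LookupError("no instances registered")
        return min(sorted(self.load), key=self.load.get)


listener = Listener()
listener.register("RAC1", 0.7)  # hypothetical load figures
listener.register("RAC2", 0.2)
print(listener.route())
listener.register("RAC2", 0.9)  # RAC2 reports a higher load later
print(listener.route())
```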


    Increasing Scalability

    • If application scales well on a single-instance then it should scale well on RAC

    • Eliminate contention

      • Use sequences

      • Use locally partitioned tables and indexes

        • Attempt to achieve node affinity

      • Avoid contention for single blocks

        • Distribute rows for hot blocks

          • Small block size e.g. 2048 or 4096

          • ALTER TABLE MINIMIZE RECORDS PER BLOCK

          • High PCTFREE / Low PCTUSED

          • Filler columns e.g. CHAR (2000)


    Increasing Scalability

    • Use Automatic Segment Space Management

      • Default in Oracle 10.2

    • Use larger block size for read-only objects

      • Reduce number of GCS messages required

    • Minimize lock usage

      • Eliminate unnecessary parsing

        • Increase size of shared pool

        • Bind variables

        • Cursor sharing

    • Use optimistic locking

      • Eliminate unnecessary SELECT FOR UPDATE statements
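
Optimistic locking replaces a SELECT FOR UPDATE with a check at update time that the row has not changed since it was read. The sketch below shows the generic version-column pattern using Python's built-in sqlite3 module as a stand-in database; the table, the column names and the helper function are hypothetical, and the same UPDATE shape would need adapting to the application's own schema on Oracle.

```python
import sqlite3


def update_balance(conn, account_id, new_balance, expected_version):
    """Update the row only if nobody changed it since it was read."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    # Zero rows updated means another session got there first.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO accounts VALUES (1, 100, 1)")
conn.commit()

# Two writers both read version 1; only the first update succeeds.
print(update_balance(conn, 1, 150, expected_version=1))  # True
print(update_balance(conn, 1, 175, expected_version=1))  # False
```

The losing writer holds no lock while it works; it simply re-reads the row and retries, which avoids holding row locks across instances.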


    Manageability


    Server Parameter File

    • Introduced in Oracle 9.0.1

    • Must reside on shared storage

    • Shared by all RAC instances

    • Binary (not text) files

    • Parameters can be changed using ALTER SYSTEM

    • Can be backed up using the Recovery Manager (RMAN)

    • Created using

    CREATE SPFILE [ = 'SPFILE_NAME' ] FROM PFILE [ = 'PFILE_NAME' ];

    • init.ora file on each node must contain SPFILE parameter

    SPFILE = <pathname>


    Parameters

    • RAC uses same parameters as single-instance

      • Some must be different on each instance

      • Some must be same on each instance

    • Can be global or local

    [*.]<parameter_name> = <value>
    <sid>.<parameter_name> = <value>

    • Must be set using ALTER SYSTEM statement

    ALTER SYSTEM SET parameter = value [ SCOPE = MEMORY | SPFILE | BOTH ] [ SID = '<sid>' ]

    ALTER SYSTEM RESET parameter [ SCOPE = MEMORY | SPFILE | BOTH ] [ SID = '<sid>' ]
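
How global (`*.`) and instance-specific (`<sid>.`) entries combine can be sketched as a lookup in which the instance-specific entry wins. This is a toy Python model over a dictionary, not a parser for the binary server parameter file; the function name, the SIDs and the parameter values are hypothetical.

```python
def effective_value(entries, sid, name):
    """Resolve a parameter for one instance.

    entries maps "<prefix>.<parameter>" to a value, where the prefix is
    either "*" (all instances) or a SID. A SID-specific entry overrides
    the global one.
    """
    specific = entries.get(f"{sid}.{name}")
    if specific is not None:
        return specific
    return entries.get(f"*.{name}")


# Hypothetical entries for a two-instance database.
entries = {
    "*.undo_management": "AUTO",
    "*.open_cursors": "300",
    "RAC2.open_cursors": "500",
    "RAC1.undo_tablespace": "UNDOTBS1",
    "RAC2.undo_tablespace": "UNDOTBS2",
}
for sid in ("RAC1", "RAC2"):
    print(sid, effective_value(entries, sid, "open_cursors"),
          effective_value(entries, sid, "undo_tablespace"))
```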


    Parameters

    • Some parameters must be same on each instance including *:

      • ACTIVE_INSTANCE_COUNT

      • ARCHIVE_LAG_TARGET

      • CLUSTER_DATABASE

      • CONTROL_FILES

      • DB_BLOCK_SIZE

      • DB_DOMAIN

      • DB_FILES

      • DB_NAME

      • DB_RECOVERY_FILE_DEST

      • DB_RECOVERY_FILE_DEST_SIZE

      • DB_UNIQUE_NAME

      • MAX_COMMIT_PROPAGATION_DELAY

      • TRACE_ENABLED

      • UNDO_MANAGEMENT

    • * Correct for Oracle 10.1


    Parameters

    • Some parameters, if used, must be different on each instance including

      • THREAD

      • INSTANCE_NUMBER

      • INSTANCE_NAME

      • UNDO_TABLESPACE

      • ROLLBACK_SEGMENTS

    • DML_LOCKS must be identical on each instance if set to zero


    DBCA

    • Can be used to

      • Create RAC database and instances

      • Create ASM instance

      • Manage ASM instance (10.2)

      • Add RAC instances

      • Create RAC database templates

        • structure only

        • with data

      • Create clone RAC database (10.2)

      • Create, Manage and Drop Services

      • Drop instances and database


    What is SRVCTL?

    • Utility used to manage cluster database

    • Configured in Oracle Cluster Registry (OCR)

    • Controls

      • Database

      • Instance

      • ASM

      • Listener

      • Node Applications

      • Services

    • Options include

      • Start / Stop

      • Enable / Disable

      • Add / Delete

      • Show current configuration

      • Show current status


    SRVCTL - Examples

    • Starting and Stopping a Database

    srvctl start database -d RAC
    srvctl stop database -d RAC

    • Starting and Stopping an Instance

    srvctl start instance -d RAC -i RAC1
    srvctl stop instance -d RAC -i RAC1

    • Starting and Stopping a Service

    srvctl start service -d RAC -s SERVICE1
    srvctl stop service -d RAC -s SERVICE1

    • Starting and Stopping ASM on a specified node

    srvctl start asm -n node1
    srvctl stop asm -n node1
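
When these commands are driven from a script, building the argument list in one place helps keep the flags consistent. The Python sketch below only assembles argument lists using the four flags shown in the examples above (-d, -i, -s, -n); the wrapper function and its keyword names are hypothetical, and nothing is executed.

```python
# Flags taken from the slide examples; the keyword names are made up here.
FLAGS = {"database": "-d", "instance": "-i", "service": "-s", "node": "-n"}


def srvctl(action, target, **options):
    """Return the argument list for a srvctl command without running it."""
    args = ["srvctl", action, target]
    for key, value in options.items():
        args += [FLAGS[key], value]
    return args


# Rebuild two of the commands shown above.
print(" ".join(srvctl("start", "instance", database="RAC", instance="RAC1")))
print(" ".join(srvctl("stop", "asm", node="node1")))
```

Returning a list rather than a string means the result can be passed straight to `subprocess.run` later without shell quoting issues.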


    Enterprise Manager

    • In Oracle 10.1 and above

      • Database Control

        • Installed by DBCA

        • Controls single cluster

      • Grid Control

        • Uses separate repository

        • Oracle 10.2 version available

          • Requires Oracle 10.1 database

      • Fully supports RAC in both versions

      • Except

        • Oracle 10.1 cannot create / delete services

        • Oracle 10.2 better interconnect performance monitoring


    What is CLUVFY?

    • Introduced in Oracle 10.2

    • Supplied with Oracle Clusterware

      • Can be downloaded from OTN (Linux and Windows)

    • Written in Java - requires JRE (supplied)

    • Also works with 10.1 (specify -10gR1 option)

    • Checks cluster configuration

      • stages - verifies all steps for specified stage have been completed

      • components - verifies specified component has been correctly installed


    CLUVFY

    • Stages include


    CLUVFY

    • Components include


    CLUVFY

    • For example, to check configuration before installing Oracle Clusterware on node1 and node2 use:

    sh runcluvfy.sh stage -pre crsinst -n node1,node2

    • Checks:

      • node reachability

      • user equivalence

      • administrative privileges

      • node connectivity

      • shared storage accessibility

    • If any checks fail append -verbose to display more information


    Other Utilities

    • Additional RAC utilities and diagnostics include

      • OCRCONFIG

      • OCRCHECK

      • OCRDUMP

      • CRSCTL

      • CRS_STAT

    • Additional RAC diagnostics can be obtained using

      • ORADEBUG utility

        • DUMP option

        • LKDEBUG option

      • Events


    Does RAC Improve Manageability?

    • Advantages

      • Fewer databases to manage

      • Easier to monitor

      • Easier to upgrade

      • Easier to control resource allocation

      • Resources can be shared between applications

    • Disadvantages

      • Upgrades potentially more complex

      • Downtime may affect more applications

      • Requires more experienced operational staff

        • Higher cost / harder to replace


    Total Cost of Ownership


    Reduction in TCO?

    • Possible for sites with legacy systems

      • Mainframes / Minicomputers

      • Applications / Packages

    • RAC option adds 50% to licence costs except for

      • Users with site licences

      • Standard edition (10.1+, max 4 CPU with ASM)

    • Retrain existing staff or use dedicated staff

    • Consolidation may bring economies of scale

      • Monitoring

      • Backups

      • Disaster Recovery


    Reduction in TCO?

    • Additional resources required

      • Redundant hardware

        • Nodes

        • Network switches

        • SAN fabric

        • Hardware e.g. fibre channel cards

    • Reduction in hardware support costs

      • May not require 24 hour support

      • Viable to hold stock of spare components


    What are the Alternatives to RAC?

    • Data Guard

      • Physical Standby

        • Introduced in Oracle 7.3.4

        • Stable, well proven technology

        • Requires redundant hardware

        • Implemented by many sites

        • Can be used with RAC

      • Logical Standby

        • Introduced in Oracle 9.2

        • Still not widely adopted

      • Streams

        • Introduced in Oracle 9.2

        • Implemented by increasing number of sites

      • Advanced Replication


    What are the Alternatives to RAC?

    • Symmetric Multiprocessing (SMP) Systems

      • Single Point of Failure

      • Simplified configuration

      • Eliminate RAC overhead

    • Parallel systems

      • For systems with deterministic input

      • Messaging

      • Data Warehouses

    • Other Clustering Technologies

      • SAN

      • Operating System

      • etc


    Conclusion

    • Success of RAC deployments dependent on

      • Application design and implementation

      • Failover requirements

      • IT infrastructure

      • Flexibility and commitment of IT department(s)

    • Before deploying RAC

      • Investigate and reject alternatives

      • Perform proof of concept

        • Test application

        • Evaluate benefits and costs

        • Learn RAC concepts and administration

      • Buy a good book :)


    Thank you for your interest

    For more information and to provide feedback please contact me

    My e-mail address is:

    [email protected]

    My website address is:

    www.juliandyke.com

