
Using OpenVMS Clusters for Disaster Tolerance

Keith Parris

Systems/Software Engineer

June 16, 2008




“Among operating systems, OpenVMS and Unix seem to be favored more than others. Alpha/OpenVMS, for example, has built-in clustering technology that many companies use to mirror data between sites. Many financial institutions, including Commerzbank, the International Securities Exchange and Deutsche Borse AG, rely on VMS-based mirroring to protect their heavy-duty transaction-processing systems.”

“Disaster Recovery: Are you ready for trouble?” by Drew Robb

Computerworld, April 25, 2005

http://www.computerworld.com/securitytopics/security/recovery/story/0,10801,101249,00.html


“Deutsche Borse, a German exchange for stocks and derivatives, has deployed an OpenVMS cluster over two sites situated 5 kilometers apart. … ‘DR is not about cold or warm backups, it's about having your data active and online no matter what,’ says Michael Gruth, head of systems and network support at Deutsche Borse. ‘That requires cluster technology which is online at both sites.’"

“Disaster Recovery: Are you ready for trouble?” by Drew Robb

Computerworld, April 25, 2005

http://www.computerworld.com/securitytopics/security/recovery/story/0,10801,101249,00.html


Terminology


High Availability (HA)

  • Ability for application processing to continue with high probability in the face of common (mostly hardware) failures

  • Typical technique: redundancy

    • 2N: mirroring, duplexing

    • 3N: triplex

    • N+1: sparing


High Availability (HA)

  • Typical technologies:

    • Redundant power supplies and fans

    • RAID / mirroring for disks

    • Clusters of servers

    • Multiple NICs, redundant routers

    • Facilities: Dual power feeds, n+1 Air Conditioning units, UPS, generator


Fault Tolerance (FT)

  • Ability for a computer system to continue operating despite hardware and/or software failures

  • Typically requires:

    • Special hardware with full redundancy, error-checking, and hot-swap support

    • Special software

  • Provides the highest availability possible within a single datacenter

  • Examples: VAXft, NonStop


Disaster Recovery (DR)

  • Disaster Recovery is the ability to resume operations after a disaster

    • Disaster could be as bad as destruction of the entire datacenter site and everything in it

    • But many events short of total destruction can also disrupt service at a site:

      • Power loss in the area for an extended period of time

      • Bomb threat (or natural gas leak) prompting evacuation of everyone from the site

      • Water leak

      • Air conditioning failure


Disaster Recovery (DR)

  • Basic principle behind Disaster Recovery:

    • To be able to resume operations after a disaster implies off-site data storage of some sort


Disaster Recovery (DR)

  • Typically,

    • There is some delay before operations can continue (many hours, possibly days), and

    • Some transaction data may have been lost from IT systems and must be re-entered

  • Success hinges on ability to restore, replace, or re-create:

    • Data (and external data feeds)

    • Facilities

    • Systems

    • Networks

    • User access


DR Methods

  • Tape Backup

  • Expedited hardware replacement

  • Vendor Recovery Site

  • Data Vaulting

  • Hot Site


DR Methods: Tape Backup

  • Data is copied to tape, with off-site storage at a remote site

  • Very common method; inexpensive.

  • Data lost in a disaster is:

    • All the changes since the last tape backup that is safely located off-site

  • There may be significant delay before data can actually be used


DR Methods: Expedited Hardware Replacement

  • Vendor agrees that in the event of a disaster, a complete set of replacement hardware will be shipped to the customer within a specified (short) period of time

    • HP has Quick Ship program

  • Typically there would be at least several days of delay before data can be used


DR Methods: Vendor Recovery Site

  • Vendor provides datacenter space, compatible hardware, networking, and sometimes user work areas as well

    • When a disaster is declared, systems are configured and data is restored to them

  • Typically there are hours to days of delay before data can actually be used


DR Methods: Data Vaulting

  • Copy of data is saved at a remote site

    • Periodically or continuously, via network

    • Remote site may be own site or at a vendor location

  • Minimal or no data may be lost in a disaster

  • There is typically some delay before data can actually be used


DR Methods: Hot Site

  • Company itself (or a vendor) provides pre-configured compatible hardware, networking, and datacenter space

  • Systems are pre-configured, ready to go

    • Data may already be resident at the Hot Site thanks to Data Vaulting

  • Typically there are minutes to hours of delay before data can be used


Disaster Tolerance vs. Disaster Recovery

  • Disaster Recovery is the ability to resume operations after a disaster.

  • Disaster Tolerance is the ability to continue operations uninterrupted despite a disaster.


“Disaster Tolerance… Expensive? Yes. But if an hour of downtime costs you millions of dollars, or could result in loss of life, the price is worth paying. That's why big companies are willing to spend big to achieve disaster tolerance.”

Enterprise IT Planet, August 18, 2004

http://www.enterpriseitplanet.com/storage/features/article.php/3396941


Disaster-Tolerant HP Platforms

  • OpenVMS

  • HP-UX and Linux

  • Tru64

  • NonStop

  • Microsoft


OpenVMS Clusters

HP-UX and Linux

Tru64

NonStop

Microsoft


Disaster Tolerance Ideals

  • Ideally, Disaster Tolerance allows one to continue operations uninterrupted despite a disaster:

    • Without any appreciable delays

    • Without any lost transaction data


Disaster Tolerance vs. Disaster Recovery

  • Businesses vary in their requirements with respect to:

    • Acceptable recovery time

    • Allowable data loss

  • So some businesses need only Disaster Recovery, and some need Disaster Tolerance

    • And many need DR for some (less-critical) functions and DT for other (more-critical) functions


Disaster Tolerance vs. Disaster Recovery

  • Basic Principle:

    • Determine requirements based on business needs first,

      • Then find acceptable technologies to meet the needs of each area of the business


Disaster Tolerance and Business Needs

  • Even within the realm of businesses needing Disaster Tolerance, business requirements vary with respect to:

    • Acceptable recovery time

    • Allowable data loss

  • Technologies also vary in their ability to achieve the Disaster Tolerance ideals of no data loss and zero recovery time

  • So we need ways of measuring needs and comparing different solutions


Quantifying Disaster Tolerance and Disaster Recovery Requirements

  • Commonly-used metrics:

    • Recovery Point Objective (RPO):

      • Amount of data loss that is acceptable, if any

    • Recovery Time Objective (RTO):

      • Amount of downtime that is acceptable, if any


Recovery Point Objective (RPO)

  • Recovery Point Objective is measured in terms of time

  • RPO indicates the point in time to which one is able to recover the data after a failure, relative to the time of the failure itself

  • RPO effectively quantifies the amount of data loss permissible before the business is adversely affected

[Timeline diagram: the Recovery Point Objective is the interval between the last backup and the disaster]


Recovery Time Objective (RTO)

  • Recovery Time Objective is also measured in terms of time

  • Measures downtime:

    • from time of disaster until business can continue

  • Downtime costs vary with the nature of the business, and with outage length

[Timeline diagram: the Recovery Time Objective is the interval between the disaster and the point at which business resumes]


Disaster Tolerance vs. Disaster Recovery Based on RPO and RTO Metrics

[Chart: Recovery Point Objective on one axis (increasing data loss) and Recovery Time Objective on the other (increasing downtime); Disaster Tolerance sits at or near zero on both axes, while Disaster Recovery tolerates increasing data loss and downtime]


Examples of Business Requirements and RPO / RTO Values

  • Greeting card manufacturer

    • RPO = zero; RTO = 3 days

  • Online stock brokerage

    • RPO = zero; RTO = seconds

  • ATM machine

    • RPO = hours; RTO = minutes

  • Semiconductor fabrication plant

    • RPO = zero; RTO = minutes

      • but data protection by geographical separation is not needed


Recovery Point Objective (RPO)

  • RPO examples, and technologies to meet them:

    • RPO of 24 hours:

      • Backups at midnight every night to off-site tape drive, and recovery is to restore data from set of last backup tapes

    • RPO of 1 hour:

      • Ship database logs hourly to remote site; recover database to point of last log shipment

    • RPO of a few minutes:

      • Mirror data asynchronously to remote site

    • RPO of zero:

      • Shadow data to remote site (strictly synchronous replication)
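
As an illustrative summary of the examples above, here is a small Python sketch that picks the least demanding technique from that list whose worst-case data loss still fits a given RPO target; the numeric worst-case figures are assumptions drawn from the bullets, not product specifications:

    # Worst-case data loss (in seconds) assumed for each technique named above.
    TECHNIQUES = [
        ("nightly off-site tape backup",            24 * 3600),
        ("hourly database log shipping",                 3600),
        ("asynchronous mirroring to a remote site",    5 * 60),
        ("synchronous remote shadowing",                    0),
    ]

    def technique_for_rpo(rpo_seconds: float) -> str:
        # Return the least demanding technique whose worst-case loss fits the RPO.
        for name, worst_case_loss in TECHNIQUES:
            if worst_case_loss <= rpo_seconds:
                return name
        return "synchronous remote shadowing"   # an RPO of (effectively) zero

    print(technique_for_rpo(24 * 3600))   # -> nightly off-site tape backup
    print(technique_for_rpo(0))           # -> synchronous remote shadowing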


Recovery Time Objective (RTO)

  • RTO examples, and technologies to meet them:

    • RTO of 72 hours:

      • Restore tapes to configure-to-order systems at vendor DR site

    • RTO of 12 hours:

      • Restore tapes to system at hot site with systems already in place

    • RTO of 4 hours:

      • Data vaulting to hot site with systems already in place

    • RTO of 1 hour:

      • Disaster-tolerant cluster with controller-based cross-site disk mirroring


Recovery Time Objective (RTO)

  • RTO examples, and technologies to meet them:

    • RTO of 10 seconds:

      • Disaster-tolerant cluster with:

        • Redundant inter-site links, carefully configured

          • To avoid bridge Spanning Tree Reconfiguration delay

        • Host-based Volume Shadowing for data replication

          • To avoid time-consuming manual failover process with controller-based mirroring

        • Tie-breaking vote at a 3rd site

          • To avoid loss of quorum after site failure

        • Distributed Lock Manager and Cluster-Wide File System (or the equivalent in database software), allowing applications to run at both sites simultaneously

          • To avoid having to start applications at failover site after the failure


Cluster Technology


Clustering

  • Allows a set of individual computer systems to be used together in some coordinated fashion


Cluster Types

  • Different types of clusters meet different needs:

    • Scalability clusters allow multiple nodes to work on different portions of a sub-dividable problem

      • Workstation farms, compute clusters, Beowulf clusters

    • Availability clusters allow one node to take over application processing if another node fails

  • For Disaster Tolerance, we’re talking primarily about Availability clusters

    • (geographically dispersed)


High Availability Clusters

  • Transparency of failover and degrees of resource sharing differ:

    • “Shared-Nothing” clusters

    • “Shared-Storage” clusters

    • “Shared-Everything” clusters


“Shared-Nothing” Clusters

  • Data may be partitioned among nodes

  • Only one node is allowed to access a given disk or to run a specific instance of a given application at a time, so:

    • No simultaneous access (sharing) of disks or other resources is allowed (and this must be enforced in some way), and

    • No method of coordination of simultaneous access (such as a Distributed Lock Manager) exists, since simultaneous access is never allowed


“Shared-Storage” Clusters

  • In simple “Fail-over” clusters, one node runs an application and updates the data; another node stands idly by until needed, then takes over completely

  • In more-sophisticated clusters, multiple nodes may access data, but typically one node at a time “serves” a file system to the rest of the nodes, and performs all coordination for that file system


“Shared-Everything” Clusters

  • “Shared-Everything” clusters allow any application to run on any node or nodes

    • Disks are accessible to all nodes under a Cluster File System

    • File sharing and data updates are coordinated by a Lock Manager


Cluster File System

  • Allows multiple nodes in a cluster to access data in a shared file system simultaneously

  • View of file system is the same from any node in the cluster


Distributed Lock Manager

  • Allows systems in a cluster to coordinate their access to shared resources, such as:

    • Mass-storage devices (disks, tape drives)

    • File systems

    • Files, and specific data within files

    • Database tables


Multi-Site Clusters

  • Consist of multiple sites with one or more systems, in different locations

  • Systems at each site are all part of the same cluster

  • Sites are typically connected by bridges or bridge-routers

    • Pure routers don’t pass the SCS cluster protocol used within OpenVMS clusters

      • Work is underway to support IP as a cluster interconnect option, as noted in the OpenVMS Roadmap


Trends


Trends

  • Increase in disasters

  • Longer inter-site distances for better protection

  • Business pressures for shorter distances for performance

  • Increasing pressure not to bridge LANs between sites


Trends

  • Trends

    • Increase in Disasters


“Natural disasters have quadrupled over the last two decades, from an average of 120 a year in the early 1980s to as many as 500 today.”

Continuity Insights Magazine

Nov./Dec. 2007 issue, page 10


“There has been a six-fold increase in floods since 1980. The number of floods and wind-storms have increased from 60 in 1980 to 240 last year.”

Continuity Insights Magazine

Nov./Dec. 2007 issue, page 10


“Meanwhile the number of geothermal events, such as earthquakes and volcanic eruptions, has stayed relatively static.”

Continuity Insights Magazine

Nov./Dec. 2007 issue, page 10


Increase in Disasters

http://www.oxfam.org/en/files/bp108_climate_change_alarm_0711.pdf/download


Increase in Number Killed by Disasters

http://www.oxfam.org/en/files/bp108_climate_change_alarm_0711.pdf/download


Trends

  • Trends

    • Longer inter-site distances for better protection


“Some CIOs are imagining potential disasters that go well beyond the everyday hiccups that can disrupt applications and networks. Others, recognizing how integral IT is to business today, are focusing on the need to recover instantaneously from any unforeseen event.” …“It's a different world. There are so many more things to consider than the traditional fire, flood and theft.”

“Redefining Disaster“

Mary K. Pratt, Computerworld, June 20, 2005

http://www.computerworld.com/hardwaretopics/storage/story/0,10801,102576,00.html


Northeast US Before Blackout

Source: NOAA/DMSP


Northeast US After Blackout

Source: NOAA/DMSP


“The blackout has pushed many companies to expand their data center infrastructures to support data replication between two or even three IT facilities -- one of which may be located on a separate power grid.”

Computerworld, August 2, 2004

http://www.computerworld.com/securitytopics/security/recovery/story/0,10801,94944,00.html


“You have to be far enough apart to make sure that conditions in one place are not likely to be duplicated in the other.” … “A useful rule of thumb might be a minimum of about 50 km, the length of a MAN, though the other side of the continent might be necessary to play it safe.”

“Disaster Recovery Sites: How Far Away is Far Enough?”

Drew Robb, Datamation, October 4, 2005

http://www.enterprisestorageforum.com/continuity/features/article.php/3552971


Trends: Longer inter-site distances for better protection

  • In the past, protection was focused against risks like fires, floods, and tornadoes; 1 to 5 miles between sites was considered fine.

  • Right after 9/11, 60 to 100 miles looked much better.

  • After the Northeast Blackout of 2003, and with growing awareness that a terrorist group obtaining a nuclear device and wiping out an entire metropolitan area is no longer inconceivable:

    • Resulting pressure is for inter-site distances of 1,000 to 1,500 miles

  • Challenges:

    • Telecommunications links

    • Latency due to the speed of light adversely affects performance (a rough estimate follows below)
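
As a rough, back-of-the-envelope illustration of that latency point (light in fiber travels at roughly two-thirds of c, about 5 microseconds per kilometer; equipment and protocol overhead are ignored):

    US_PER_KM = 5.0   # approximate one-way propagation delay per km of fiber

    def round_trip_ms(distance_km: float) -> float:
        # Best-case round-trip propagation delay in milliseconds.
        return 2 * distance_km * US_PER_KM / 1000.0

    for km in (8, 80, 1600, 2400):   # roughly 5, 50, 1000, and 1500 miles
        print(f"{km:5d} km -> {round_trip_ms(km):6.2f} ms round trip")

At 1,000 to 1,500 miles, every synchronous cross-site write therefore pays on the order of 16 to 24 ms of propagation delay before any controller or host processing is counted.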


Trends

  • Trends

    • Business pressures for shorter distances for performance


“A 1-millisecond advantage in trading applications can be worth $100 million a year to a major brokerage firm, by one estimate.”

Richard Martin, InformationWeek,

April 23, 2007


“The fastest systems, running from traders' desks to exchange data centers, can execute transactions in a few milliseconds -- so fast, in fact, that the physical distance between two computers processing a transaction can slow down how fast it happens.”

Richard Martin, InformationWeek,

April 23, 2007


“This problem is called data latency -- delays measured in split seconds. To overcome it, many high-frequency algorithmic traders are moving their systems as close to the Wall Street exchanges as possible.”

Richard Martin, InformationWeek,

April 23, 2007


Trends

  • Trends

    • Increasing pressure not to bridge LANs between sites


Trends: Increasing Resistance to LAN Bridging

  • In the past, setting up a VLAN spanning sites for an OpenVMS disaster-tolerant cluster was common

  • Networks are now IP-centric

  • IP network mindset sees LAN bridging as “bad,” sometimes even “totally unacceptable”

  • Alternatives:

    • Separate, private link for OpenVMS Multi-site Cluster

    • Metropolitan Area Networks (MANs) using MPLS

    • Ethernet-over-IP (EoIP)

    • SCS-over-IP support planned for OpenVMS 8.4


Foundation for Disaster Tolerance


Disaster-Tolerant Clusters: Foundation

  • Goal: Survive loss of an entire datacenter (or 2)

  • Foundation:

    • Two or more datacenters a “safe” distance apart

    • Cluster software for coordination

    • Inter-site link for cluster interconnect

    • Data replication of some sort for 2 or more identical copies of data, one at each site


Disaster-Tolerant Clusters: Foundation

  • Foundation:

    Management and monitoring tools

    • Remote system console access or KVM system

    • Failure detection and alerting, for things like:

      • Network monitoring (especially for inter-site link)

      • Shadowset member loss

      • Node failure

    • Quorum recovery tool or mechanism (for 2-site clusters with balanced votes)

  • HP toolsets for managing OpenVMS DT Clusters:

    • Tools included in DTCS package (Windows-based)

    • CockpitMgr (OpenVMS-based)


Disaster-Tolerant Clusters: Foundation

  • Foundation:

    Knowledge and Implementation Assistance

    • Feasibility study, planning, configuration design, and implementation assistance, plus staff training

      • HP recommends HP Disaster Tolerant Services consulting services to meet this need:

        • http://h20219.www2.hp.com/services/cache/10597-0-0-225-121.aspx


Disaster-Tolerant Clusters: Foundation

  • Foundation:

    Procedures and Documentation

    • Carefully-planned (and documented) procedures for:

      • Normal operations

      • Scheduled downtime and outages

      • Detailed diagnostic and recovery action plans for various failure scenarios


Disaster-Tolerant Clusters: Foundation

  • Foundation:

    Data Replication

    • Data is constantly replicated to or copied to a 2nd site (& possibly a 3rd), so data is preserved in a disaster

    • Solution must also be able to redirect applications and users to the site with the up-to-date copy of the data

    • Examples:

      • OpenVMS Volume Shadowing Software

      • Continuous Access for EVA or XP

      • Database replication

      • Reliable Transaction Router


Disaster-Tolerant Clusters: Foundation

  • Foundation:

    Complete redundancy in facilities and hardware

    • Second site with its own storage, networking, computing hardware, and user access mechanisms in place

    • Sufficient computing capacity is in place at the alternate site(s) to handle expected workloads alone if one site is destroyed

    • Monitoring, management, and control mechanisms are in place to facilitate fail-over


Planning for Disaster Tolerance


Planning for Disaster Tolerance

  • Remembering that the goal is to continue operating despite loss of an entire datacenter,

    • All the pieces must be in place to allow that:

      • User access to both sites

      • Network connections to both sites

      • Operations staff at both sites

    • Business can’t depend on anything that is only at either site


Planning for DT: Site Selection

Sites must be carefully selected:

  • Avoid hazards

    • Especially hazards common to both (and the loss of both datacenters at once which might result from that)

  • Make them a “safe” distance apart

  • Select site separation in a “safe” direction


Planning for DT: What is a “Safe Distance”

Analyze likely hazards of proposed sites:

  • Natural hazards

    • Fire (building, forest, gas leak, explosive materials)

    • Storms (Tornado, Hurricane, Lightning, Hail, Ice)

    • Flooding (excess rainfall, dam breakage, storm surge, broken water pipe)

    • Earthquakes, Tsunamis


Planning for DT: What is a “Safe Distance”

Analyze likely hazards of proposed sites:

  • Man-made hazards

    • Nearby transportation of hazardous materials (highway, rail)

    • Terrorist with a bomb

    • Disgruntled customer with a weapon

    • Enemy attack in war (nearby military or industrial targets)

    • Civil unrest (riots, vandalism)


Former Atlas E Missile Silo Site in Kimball, Nebraska


Planning for DT: Site Separation Distance

  • Make sites a “safe” distance apart

  • This must be a compromise. Factors:

    • Risks

    • Performance (inter-site latency)

    • Interconnect costs

    • Ease of travel between sites

    • Availability of workforce


Planning for DT: Site Separation Distance

  • Select site separation distance:

    • 1-3 miles: protects against most building fires, natural gas leaks, armed intruders, terrorist bombs

    • 10-30 miles: protects against most tornadoes, floods, hazardous material spills, release of poisonous gas, non-nuclear military bomb strike

    • 100-300 miles: protects against most hurricanes, earthquakes, tsunamis, forest fires, most biological weapons, most power outages, suitcase-sized nuclear bomb

    • 1,000-3,000 miles: protects against “dirty” bombs, major region-wide power outages, and possibly military nuclear attacks

[Diagram: threat radius for each of the hazard categories above]


"You have to be far enough away to be beyond the immediate threat you are planning for.“…"At the same time, you have to be close enough for it to be practical to get to the remote facility rapidly.“

“Disaster Recovery Sites: How Far Away is Far Enough?” By Drew Robb

Enterprise Storage Forum, September 30, 2005

http://www.enterprisestorageforum.com/continuity/features/article.php/3552971


“Survivors of hurricanes, floods, and the London terrorist bombings offer best practices and advice on disaster recovery planning.”

“A Watertight Plan” By Penny Lunt Crosman, IT Architect, Sept. 1, 2005

http://www.itarchitect.com/showArticle.jhtml?articleID=169400810



Planning for DT: Site Separation Direction

  • Select site separation direction:

    • Not along same earthquake fault-line

    • Not along likely storm tracks

    • Not in same floodplain or downstream of same dam

    • Not on the same coastline

    • Not in line with prevailing winds (that might carry hazardous materials or radioactive fallout)


Planning for DT: Providing Total Redundancy

  • Redundancy must be provided for:

    • Datacenter and facilities (A/C, power, user workspace, etc.)

    • Data

      • And data feeds, if any

    • Systems

    • Network

    • User access and workspace

    • Workers themselves


Planning for DT: Life After a Disaster

  • Also plan for continued operation after a disaster

    • Surviving site will likely have to operate alone for a long period before the other site can be repaired or replaced

    • If surviving site was “lights-out”, it will now need to have staff on-site

    • Provide redundancy within each site

      • Facilities: Power feeds, A/C

      • Mirroring or RAID to protect disks

      • Clustering for servers

      • Network redundancy


Planning for DT: Life After a Disaster

  • Plan for continued operation after a disaster

    • Provide enough capacity within each site to run the business alone if the other site is lost

      • and handle normal workload growth rate

    • Having 3 full datacenters is an option to seriously consider:

      • Leaves two redundant sites after a disaster

      • Leaves 2/3 capacity instead of ½


Planning for DT: Life After a Disaster

  • When running workload at both sites, be careful to watch utilization.

  • Utilization over 35% will result in utilization over 70% if one site is lost

  • Utilization over 50% will mean there is no possible way one surviving site can handle all the workload
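
A quick arithmetic check of the two bullets above, assuming the workload is split evenly across a symmetric two-site cluster:

    def survivor_utilization(per_site_utilization: float) -> float:
        # The surviving site must absorb both halves of the workload.
        return 2 * per_site_utilization

    print(survivor_utilization(0.35))   # 0.70 -> heavily loaded, but workable
    print(survivor_utilization(0.50))   # 1.00 -> the survivor is at saturation
    print(survivor_utilization(0.55))   # 1.10 -> impossible for one site to handle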


Response time vs. Utilization


Response time vs. Utilization: Impact of losing 1 site


Planning for DT: Testing

  • Separate test environment is very helpful, and highly recommended

  • Spreading test environment across inter-site link is best

  • Good practices require periodic testing of a simulated disaster. Allows you to:

    • Validate your procedures

    • Train your people


Inter-Site Links


Inter-site Link(s)

  • Sites linked by:

    • DS-3/T3 (E3 in Europe) or ATM circuits from a TelCo

    • Microwave link

    • Radio Frequency link (e.g. UHF, wireless)

    • Free-Space Optics link (short distance, low cost)

    • “Dark fiber” where available (or lambdas over WDM):

      • Ethernet over fiber (10 Mb, Fast, Gigabit, 10-Gigabit)

      • Fibre Channel

      • FDDI

      • Fiber links between Memory Channel switches (up to 3 km)


Dark Fiber Availability Example

Source: AboveNet above.net


Dark Fiber Availability Example

Source: AboveNet above.net


Inter-site Link Options

  • Sites linked by:

    • Wave Division Multiplexing (WDM), in either Coarse (CWDM) or Dense (DWDM) Wave Division Multiplexing flavors

      • Can carry any of the types of traffic that can run over a single fiber

    • Individual WDM channel(s) from a vendor, rather than entire dark fibers


Bandwidth of Inter-Site Link(s)


Inter-Site Link Choices

  • Service type choices

    • Telco-provided data circuit service, own microwave link, FSO link, dark fiber, lambda?

    • Dedicated bandwidth, or shared pipe?

    • Single or multiple (redundant) links? If redundant links, then:

      • Diverse paths?

      • Multiple vendors?


Introduction to Data Replication


Cross-site Data Replication Methods

  • Hardware

    • Storage controller

  • Software

    • Host-based Volume Shadowing (mirroring) software

    • Database replication or log-shipping

    • Transaction-processing monitor or middleware with replication functionality, e.g. Reliable Transaction Router (RTR)


Host-Based Volume Shadowing

  • Host software keeps multiple disks identical

  • All writes go to all shadowset members

  • Reads can be directed to any one member

    • Different read operations can go to different members at once, helping throughput

  • Synchronization (or Re-synchronization after a failure) is done with a Copy operation

  • Re-synchronization after a node failure is done with a Merge operation
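
A minimal toy sketch of the read/write behavior just described (illustrative only; in a real cluster this is done inside OpenVMS by SHDRIVER, not by application code):

    class ShadowSetSketch:
        """Toy model: writes go to every member, a read can come from any one member."""

        def __init__(self, members):
            self.members = list(members)    # each member: dict of LBN -> data

        def write(self, lbn, data):
            for member in self.members:     # issued to all members (in parallel on real hardware)
                member[lbn] = data

        def read(self, lbn, hint=0):
            # Any single member can satisfy the read; different reads may hit different members.
            return self.members[hint % len(self.members)][lbn]

    shadowset = ShadowSetSketch([{}, {}, {}])
    shadowset.write(100, b"payload")
    assert shadowset.read(100, hint=2) == b"payload"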


Managing Replicated Data

  • If the inter-site link fails, both sites might conceivably continue to process transactions, and the copies of the data at each site would continue to diverge over time

  • This is called a “Partitioned Cluster” in the OpenVMS world, or “Split-Brain Syndrome” in the UNIX world

  • The most common solution to this potential problem is a Quorum-based scheme


Introduction to the Quorum Scheme


Quorum Schemes

  • Idea comes from familiar parliamentary procedures

  • Systems and/or disks are given votes

  • Quorum is defined to be a simple majority of the total votes


Quorum Schemes

  • In the event of a communications failure,

    • Systems in the minority voluntarily suspend processing, while

    • Systems in the majority can continue to process transactions


Quorum Schemes

  • To handle cases where there are an even number of votes:

    • For example, with only 2 systems, or

    • Where half of the votes are at each of 2 sites

  • Provision may be made for:

    • A tie-breaking vote, or

    • Human intervention


Quorum Schemes: Tie-breaking vote

  • This can be provided by a disk:

    • Quorum Disk

  • Or by a system with a vote, located at a 3rd site

    • Additional OpenVMS cluster member, called a “quorum node”


Quorum Scheme Under OpenVMS

  • Rule of Total Connectivity

  • VOTES

  • EXPECTED_VOTES

  • Quorum

  • Loss of Quorum

  • Selection of Optimal Sub-cluster after a failure


Quorum configurations in Multi-Site Clusters

  • 3 sites, equal votes in 2 sites

    • Intuitively ideal; easiest to manage & operate

    • 3rd site contains a “quorum node”, and serves as tie-breaker

[Diagram: Site A with 2 votes, Site B with 2 votes, and a 3rd site with 1 vote]


Quorum configurations in Multi-Site Clusters

  • 3 sites, equal votes in 2 sites

    • Hard to do in practice, due to cost of inter-site links beyond on-campus distances

      • Could use links to 3rd site as backup for main inter-site link if links are high-bandwidth and connected together

      • Could use 2 less-expensive, lower-bandwidth links to 3rd site, to lower cost


Quorum configurations in 3-Site Clusters

[Diagram: three-site cluster built from nodes (N) and bridges (B); inter-site links range from 10 megabit up to DS3, GbE, FC, or ATM]


Quorum configurations in Multi-Site Clusters

  • 2 sites:

    • Most common & most problematic:

      • How do you arrange votes? Balanced? Unbalanced?

      • If votes are balanced, how do you recover from loss of quorum which will result when either site or the inter-site link fails?

[Diagram: Site A and Site B]


Quorum configurations in Two-Site Clusters

  • One solution: Unbalanced Votes

    • More votes at one site

    • Site with more votes can continue without human intervention in the event of loss of either the other site or the inter-site link

    • Site with fewer votes pauses on a failure and requires manual action to continue after loss of the other site

[Diagram: Site A with 2 votes can continue automatically; Site B with 1 vote requires manual intervention to continue alone]


Quorum configurations in Two-Site Clusters

  • Unbalanced Votes

    • Common mistake:

      • Give more votes to Primary site

      • Leave Standby site unmanned

    • Result: cluster can’t run without Primary site, unless there is human intervention, which is unavailable at the (unmanned) Standby site

[Diagram: staffed Site A with 1 vote can continue automatically; lights-out Site B with 0 votes requires manual intervention to continue alone]


Quorum configurations in Two-Site Clusters

  • Unbalanced Votes

    • Also very common in remote-shadowing-only clusters (for data vaulting -- not full disaster-tolerant clusters)

      • 0 votes is a common choice for the remote site in this case

        • But that has its dangers

[Diagram: Site A with 1 vote can continue automatically; Site B with 0 votes requires manual intervention to continue alone]


Optimal Sub-cluster Selection

The connection manager compares the potential node subsets that could make up the surviving portion of the cluster (a simplified sketch of the comparison follows this list):

  • Pick sub-cluster with the most votes

  • If votes are tied, pick sub-cluster with the most nodes

  • If nodes are tied, arbitrarily pick a winner

    • based on comparing SCSSYSTEMID values of set of nodes with most-recent cluster software revision
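
A simplified sketch of that comparison (illustrative only; the real decision is made by the OpenVMS connection manager, and the final tie-break is reduced here to an arbitrary but deterministic comparison of node IDs):

    def pick_optimal_subcluster(candidates):
        # candidates: list of dicts like {"votes": 2, "nodes": 3, "node_ids": [1025, 1026, 1027]}
        return max(
            candidates,
            key=lambda c: (c["votes"],           # 1) most votes wins
                           c["nodes"],           # 2) then most nodes
                           min(c["node_ids"])),  # 3) then a deterministic tie-break
        )

    east = {"votes": 1, "nodes": 2, "node_ids": [1001, 1002]}
    west = {"votes": 1, "nodes": 3, "node_ids": [2001, 2002, 2003]}
    print(pick_optimal_subcluster([east, west]) is west)   # True: wins on node count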


Two-Site Cluster with Unbalanced Votes

[Diagram: two nodes with 1 vote each at one site and two nodes with 0 votes at the other site, sharing cross-site shadowsets]


Two-Site Cluster with Unbalanced Votes

[Diagram: the same two-site configuration (votes 1, 1, 0, 0) with cross-site shadowsets]

Which subset of nodes is selected as the optimal sub-cluster?


Two-Site Cluster with Unbalanced Votes

[Diagram: the nodes at the site holding the votes continue; the zero-vote nodes at the other site CLUEXIT]


Two-Site Cluster with Unbalanced Votes

One possible solution: [Diagram: give the nodes votes of 2, 2, 1, and 1 across the two sites, with cross-site shadowsets]


Quorum configurations in Two-Site Clusters

  • Balanced Votes

    • Equal votes at each site

    • Manual action required to restore quorum and continue processing in the event of either:

      • Site failure, or

      • Inter-site link failure

[Diagram: Site A and Site B each have 2 votes; either site requires manual intervention to continue alone]


Quorum Recovery Methods

  • Software interrupt at IPL 12 from console

    • IPC> Q

  • Availability Manager (or DECamds):

    • System Fix; Adjust Quorum

  • DTCS (or BRS) integrated tool, using same RMDRIVER (DECamds client) interface as DECamds / AM



Advanced Data Replication Discussion


Host-Based Volume Shadowing and StorageWorks Continuous Access

  • Fibre Channel introduces new capabilities into OpenVMS disaster-tolerant clusters


Fibre Channel and SCSI in Clusters

  • Fibre Channel and SCSI are Storage-Only Interconnects

    • Provide access to storage devices and controllers

    • Cannot carry SCS protocol (e.g. Connection Manager and Lock Manager traffic)

      • Need SCS-capable Cluster Interconnect also:

        • Memory Channel, Computer Interconnect (CI), DSSI, FDDI, Ethernet, or Galaxy Shared Memory Cluster Interconnect (SMCI)

    • Fail-over between a direct path and an MSCP-served path is first supported in OpenVMS version 7.3-1


Host-Based Volume Shadowing

[Diagram: two nodes joined by an SCS-capable interconnect; each node attaches through its local FC switch to an EVA array, and the two arrays form a host-based shadowset]


Host-Based Volume Shadowing with Inter-Site Fibre Channel Link

[Diagram: the same configuration, with an inter-site FC link joining the two FC switches]


Direct vs. MSCP-Served Paths

[Diagram: a node can reach the remote EVA either directly across the inter-site FC link or via the MSCP-served path through the remote node over the SCS-capable interconnect]




New Failure Scenarios: SCS link OK but FC link broken

(Direct-to-MSCP-served path failover provides protection)

[Diagram: the inter-site FC link is broken while the SCS-capable interconnect remains intact]


New Failure Scenarios: SCS link broken but FC link OK

(Quorum scheme provides protection)

[Diagram: the SCS-capable interconnect is broken while the inter-site FC link remains intact]


Cross-site Shadowed System Disk

  • With only an SCS link between sites, it was impractical to have a shadowed system disk and boot nodes from it at multiple sites

  • With a Fibre Channel inter-site link, it becomes possible to do this

    • but it is probably still not a good idea (single point of failure for the cluster)


Advanced Volume Shadowing Discussion


Shadowing Performance

  • Shadowing is primarily an availability tool, but:

    • Shadowing can often improve performance (e.g. reads), however:

    • Shadowing can often decrease performance (e.g. writes, and during merges and copies)


Shadowset State Hierarchy

  • Mini-Merge state

  • Copy state

    • Mini-Copy state

    • Full Copy state

  • Full Merge state

  • Steady state

(All of the states above Steady state are transient states.)


Copy and Merge

  • How a Copy or Merge gets started:

    • A $MOUNT command triggers a Copy

    • This implies a Copy is a Scheduled operation

    • Departure of a system from the cluster while it has a shadowset mounted triggers a Merge

    • This implies a Merge is an Unscheduled operation


I/O Algorithms Used by Shadowing

  • Variations:

    • Reads vs. Writes

    • Steady-State vs. Copy vs. Merge

      • And for Copies and Merges,

        • Full- or Mini- operation

        • Copy or Merge Thread I/Os

        • Application I/Os: Ahead of or Behind the Copy/Merge “Fence”


Application Read during Steady-State

  • Read from any single member

[Diagram: the application read passes through SHDRIVER to any one shadowset member]


Application Write during Steady-State

  • Write to all members in parallel

[Diagram: the application write passes through SHDRIVER to all shadowset members in parallel]


Full-Copy and Full-Merge Thread Algorithms

  • Action or Event triggers Copy or Merge:

    • $MOUNT command triggers Copy (scheduled operation)

    • System departure from cluster while it has shadowset mounted triggers Merge (unscheduled operation)

  • SHADOW_SERVER on one node picks up Thread from work queue and does I/Os for the Copy or Merge Thread

    • SHADOW_MAX_COPY parameter controls maximum number of Threads on a node

  • SHAD$COPY_BUFFER_SIZE logical name controls number of blocks handled at a time

    • No double-buffering or other speed-up tricks are used


Full-Copy and Full-Merge Thread Algorithms

  • Start at first Logical Block on disk (LBN zero)

  • Process 127 blocks at a time from beginning to end

  • Symbolic “Fence” separates processed area from un-processed area

[Diagram: the fence moves from the beginning of the disk toward the end as the thread makes progress, separating the processed area behind it from the un-processed area ahead of it]


Full-Copy Thread Algorithm

  • Read from source

  • Compare with target

  • If different, write data to target and start over at Step 1.

[Diagram: SHADOW_SERVER reads from the source member, compares, and writes to the target member]
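
A sketch of that per-segment loop (illustrative only; the read/compare/write helpers are hypothetical stand-ins for what SHADOW_SERVER does against the real members):

    SEGMENT_BLOCKS = 127   # blocks handled per pass, as described above

    def full_copy_segment(read_source, read_target, write_target, lbn):
        # Repeat read / compare / write until source and target agree for this segment.
        while True:
            source_data = read_source(lbn, SEGMENT_BLOCKS)
            if source_data == read_target(lbn, SEGMENT_BLOCKS):
                return                                     # identical: nothing more to do
            write_target(lbn, SEGMENT_BLOCKS, source_data)
            # Loop back and compare again, since an application write may have
            # changed the source between the read and the write.

Note how a segment that is already identical costs only a read and a compare, which is why the following slides stress that copies complete fastest when the data already matches.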


Full-Copy Thread Algorithm

Why the odd algorithm?

  • To ensure correct results in cases like:

    • With application writes occurring in parallel with the copy thread

      • application writes during a Full Copy go to the set of source disks first, then to the set of target disk(s)

    • On system disks with an OpenVMS node booting or dumping while shadow copy is going on

  • To avoid the overhead of having to use the Distributed Lock Manager for every I/O


Speeding Shadow Copies

  • Implications:

    • Shadow copy completes fastest if data is identical beforehand

      • Fortunately, this is the most common case: re-adding a member to a shadowset that it belonged to before


Full-Copy Thread Algorithm (Data Identical)

  • Read from source

  • Compare with target

[Diagram: SHADOW_SERVER reads from the source and compares with the target; no write is needed]


Full-Copy Thread Algorithm (Data Different)

  • Read from source

  • Compare with target (difference found)

  • Write to target

  • Read from source

  • Compare with target (difference seldom found). If different, write to target and start over at Step 4.

[Diagram: SHADOW_SERVER repeats the read / compare / write sequence against the source and target members until they match]


Speeding Shadow Copies

  • If data is very different, empirical tests have shown that it is faster to:

    • Do BACKUP/PHYSICAL from source shadowset to /FOREIGN-mounted target disk

    • Then do shadow copy afterward

  • than to simply initiate the shadow copy with differing data.


Why a Merge is Needed

  • If a system has a shadowset mounted, and may be writing data to the shadowset

  • If that system crashes, then:

    • The state of any “in-flight” Write I/Os is indeterminate

      • All, some, or none of the members may have been written to

    • An application Read I/O could potentially return different data for the same block on different disks

  • Since Shadowing guarantees that the data on all shadowset members must appear to be identical, this uncertainty must be removed

    • This is done by a Merge operation, coupled with special treatment of Reads until the Merge is complete


What Happens in Full Merge State

Full Merge operation will:

  • Read and compare data on all members

  • Fix any differences found

  • Moves across disk from start to end, 127 blocks at a time

  • Progress tracked by a “Fence” separating the merged (behind the fence) area from the unmerged (ahead of the fence) area

    Special treatment of Reads:

  • Every application read I/O must be merged:

    • Read data from each member (for just the area of the read)

    • Detect any differences

    • Overwrite any different data with data from Master member

    • Return the now-consistent read data to the application

  • This must be done for any Reads ahead of the Merge Fence


Hbvs merge operations
HBVS Merge Operations

Q: Why doesn’t it matter which member we choose to duplicate to the others on a merge? Don’t we need to ensure we replicate the newest data?

A: Shadowing reproduces the semantics of a regular disk:

1) Each time you read a block, you always get the same data returned

2) Once you do a write and get successful status back, you always expect to get the new data instead

3) If the system fails while you are doing a write, and the application doesn’t get successful status returned, you don’t know if the new or old data is on disk. You have to use journaling technology to handle this case.

Shadowing provides the same behavior.


Full merge thread algorithm
Full-Merge Thread Algorithm

  • Read from master

  • Compare with other member(s)

  • If different, write data to target(s) and start over at Step 1.

Shadow_Server

1. Read

2. Compare

2. Compare

3. Write

3. Write

Master

Member

Member


Speeding shadow copies mini copy operations
Speeding Shadow Copies: Mini-Copy Operations

  • If one knows ahead of time that one shadowset member will temporarily be removed, one can set things up to allow a Mini-Copy instead of a Full-Copy when the disk is returned to the shadowset

  • With OpenVMS Version 8.3, Automatic Mini-Copy on Volume Processing allows a Mini-Merge write bitmap to be designated as Multi-Use. Such a bitmap can be converted into a Mini-Copy bitmap at the moment a member must be removed from the shadowset, so that if the member is repaired and access to it is restored later, a Mini-Copy rather than a Full Copy restores redundancy to the shadowset.
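As a rough sketch of the planned-removal case described above (the device name, virtual-unit number, and volume label are placeholders, not values from the original slides; check DCL HELP for the exact qualifier syntax on your OpenVMS version):

    $ ! Remove a member and leave a Mini-Copy write bitmap behind
    $ DISMOUNT /POLICY=MINICOPY $1$DGA101:
    $ ! ... perform the planned maintenance on the removed member ...
    $ ! Re-add the member; only the areas the bitmap shows as written
    $ ! while it was out need to be copied
    $ MOUNT /SYSTEM DSA1: /SHADOW=$1$DGA101: DATA_DISK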


Mini copy thread algorithm
Mini-Copy Thread Algorithm

  • Only for areas the write bitmap indicates have changed:

  • Read from source

  • Write to target

  • Read from source

  • Compare with target

  • If different (quite unlikely), write data to target and start over at Step 3.

Shadow_Server

1. Read

3. Read

4. Compare

2. Write

5. Write

Source

Target


Mini copy vs full copy
Mini-Copy vs. Full-Copy

If data is different, Full-Copy algorithm requires 5 I/O operations per 127-block segment:

  • Read source

  • Compare target

  • Write target

  • Read source

  • Compare target

    Mini-Copy requires only 4 I/O operations per segment:

  • Read source

  • Write target

  • Read source

  • Compare target

    So Mini-Copy is faster even if 100% of data has changed


Compare operation
Compare Operation

  • On MSCP controllers (HSJ, HSD), this is an MSCP Compare Host Data operation

    • Data is passed to the controller, which compares it with the data on disk, returning status

      • Cache is explicitly bypassed for the comparison

        • So Read-Ahead and Read Cache do not help with Compares

      • But Read-Ahead can be very helpful with Source disk Reads, particularly when the disks are close to identical

  • On SCSI / Fibre Channel controllers (EVA, MSA, XP, HSG, HSZ), this is emulated in DKDRIVER

    • Data is read from the disk into a second buffer and compared with the data using the host CPU (in interrupt state, at IPL 8)

      • Read-ahead cache can be very helpful, particularly when disks are close to identical


Speeding shadow copies2
Speeding Shadow Copies

  • Read-Ahead cache in storage controller subsystems can assist with source read I/Os

  • By default, Shadowing reads from source members in a round-robin fashion

    • But with 3-member shadowsets, with two source members, I/Os are divided among 2 sources

  • $SET DEVICE /COPY_SOURCE device

    • Applied to shadowset member, makes the member the preferred source for copies & merges

    • Applied to Virtual Unit, current master member is used as source for copies & merges
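For example, the two forms described above would be issued like this (device names are placeholders):

    $ ! Make one specific member the preferred source for copies and merges
    $ SET DEVICE /COPY_SOURCE $1$DGA101:
    $ ! Or apply it to the virtual unit, so the current master member is used
    $ SET DEVICE /COPY_SOURCE DSA1: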


Full merge thread algorithm1
Full-Merge Thread Algorithm

  • Read from master

  • Compare with other member(s)

  • If different, write data to target(s) and start over at Step 1.

Shadow_Server

1. Read

2. Compare

2. Compare

3. Write

3. Write

Master

Member

Member


Host based mini merge
Host-Based Mini-Merge

  • One or more OpenVMS systems keep a bitmap of areas that have been written to recently

    • Other nodes notify (update) these nodes as new areas are written to

    • Bitmaps are cleared periodically

  • If a node crashes, only areas recently written to, as indicated by the bitmap, must be merged

  • If all the bitmaps are lost, a Full Merge is needed


Speeding shadow copies merges
Speeding Shadow Copies/Merges

  • Because of the sequential, non-pipelined nature of the shadow-copy algorithm, progressing across the entire LBN range of the disk, to speed shadow copies/merges:

    • Rather than forming controller-based stripesets and shadowing those, shadow individual disks in parallel, and combine them into RAID-0 (0+1) arrays with host-based RAID software


Speeding shadow copies merges1
Speeding Shadow Copies/Merges

  • Dividing a disk up into 4 partitions at the controller level, and shadow-copying all 4 in parallel takes only 40% of the time required to shadow-copy the entire disk as a whole


Application i os
Application I/Os

  • During Full-Copy

  • During Full-Merge


Application read during full copy
Application Read During Full-Copy

  • In Processed area:

  • Pick any member to satisfy a read from

Processed area

Fence

  • In Un-processed area:

  • Pick any source member to satisfy a read from (never read from target)

Un-processed area


Application read during full copy already copied area
Application Read during Full-Copy; already-copied area

Application

  • Read from any single member

  • Return data to application

SHDRIVER

1. Read

Member

Member

Member


Application read during full copy not yet copied area
Application Read during Full-Copy; not-yet-copied area

Application

  • Read from any source disk

  • Return data to application

SHDRIVER

1. Read

Source

Source

Target

Full-Copy underway


Application read during full merge
Application Read During Full-Merge

  • In Processed area:

  • Pick one member to read from

Processed area

Fence

  • In Un-processed area:

  • Merge the area covered by the read operation

  • Return data

Un-processed area


Application read during full merge not yet merged area
Application Read during Full-Merge; not-yet-merged area

Application

  • Read from source

  • Compare with target(s)

  • If different, write data to target(s) and start over at Step 1.

  • Return data to application

SHDRIVER

1. Read

2. Compare

2. Compare

3. Write

3. Write

Master

Member

Member


Application read during full merge already merged area
Application Read during Full-Merge; already-merged area

Application

  • Read from any single member

SHDRIVER

1. Read

Member

Member

Member


Application write during full copy
Application Write During Full-Copy

  • In Processed area:

  • Write to all members in parallel

Processed area

Fence

  • In Un-processed area:

  • Write to all source members in parallel

  • Write to target disk

Un-processed area


Application write during full copy already copied area
Application Write during Full-Copy; already-copied area

Application

  • Write to all members in parallel

SHDRIVER

1. Write

1. Write

1. Write

Source

Source

Target

Full-Copy underway


Application write during full copy not yet copied area
Application Write during Full-Copy; not-yet-copied area

Application

  • Write to all source member(s); wait until these writes have completed

  • Write to all target(s)

SHDRIVER

1. Write

1. Write

2. Write

Source

Source

Target

Full-Copy underway


Application writes during full merge
Application Writes During Full-Merge

  • Regardless of area:

  • Write to all members in parallel

Processed area

Fence

Un-processed area


Application write during full merge
Application Write during Full-Merge

Application

  • Write to all members in parallel

SHDRIVER

1. Write

1. Write

1. Write

Master

Member

Member

Full-Merge underway


Shadowing between sites in multi site clusters
Shadowing Between Sites in Multi-Site Clusters

Because Direct operations are lower in latency than MSCP-served operations, even when the inter-site distance is small:

It is generally best to have an inter-site Fibre Channel link, if possible.

And because inter-site latency can be much greater than intra-site latency, due to the speed of light:

If the inter-site distance is large, it is best to direct Read operations to the local disks, not remote disks

  • Write operations have to go to all disks in a shadowset, remote as well as local members

    If the inter-site distance is small, performance may be best if reads are spread across all members at both sites


Shadowset member selection for reads read cost
Shadowset Member Selection for Reads: Read Cost

  • Shadowing assigns a default “read cost” for each shadowset member, with values in this order (lowest cost to highest cost):

    • DECram device

    • Directly-connected device in the same physical location

    • Directly-connected device in a remote location

    • DECram MSCP-served device

    • All other MSCP-served devices

  • Shadowing adds the Read Cost to the Queue Length and picks the member with the lowest value to do the read

    • Round-robin distribution for disks with equal values


Shadowset member selection for reads read cost1
Shadowset Member Selection for Reads: Read Cost

  • For even greater control, one can override the default “read cost” to a given disk from a given node:

    $ SET DEVICE /READ_COST=nnn $1$DGAnnn:

    For reads, OpenVMS adds the Read Cost to the queue length and picks the member with the lowest value to do the read

  • To return all member disks of a shadowset to their default values after such changes have been done, one can use the command:

    • $ SET DEVICE /READ_COST = 1 DSAnnn:
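Putting these commands together, a sketch for a node at one site of a two-site shadowset (device names and cost values are illustrative only):

    $ ! Favor the local member for reads from this node
    $ SET DEVICE /READ_COST=1 $1$DGA101:     ! local member
    $ SET DEVICE /READ_COST=500 $1$DGA201:   ! remote member
    $ ! Later, return all members of the shadowset to default handling
    $ SET DEVICE /READ_COST=1 DSA1: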


Shadowing between sites local vs remote reads on fibre channel
Shadowing Between Sites: Local vs. Remote Reads on Fibre Channel

  • With an inter-site Fibre Channel link, Shadowing can’t tell which Fibre Channel disks are local and which are remote, so we have to give it some help:

    • To indicate which site a given disk is at, do:

      $ SET DEVICE/SITE=xxx $1$DGAnnn:

    • To indicate which site a given OpenVMS node is at, do:

      $ SET DEVICE/SITE=xxx DSAnn:

for each virtual unit, from each OpenVMS node, specifying the SITE value appropriate to that node.
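A minimal sketch for a two-site configuration (site numbers and device names are placeholders):

    $ ! Tell Shadowing where each member physically resides
    $ SET DEVICE /SITE=1 $1$DGA101:          ! disk located at Site 1
    $ SET DEVICE /SITE=2 $1$DGA201:          ! disk located at Site 2
    $ ! On each node, identify that node's site for the virtual unit
    $ SET DEVICE /SITE=1 DSA1:               ! issued on each Site-1 node
    $ SET DEVICE /SITE=2 DSA1:               ! issued on each Site-2 node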


Data protection mechanisms and scenarios
Data Protection Mechanisms and Scenarios

  • Protection of the data is obviously extremely important in a disaster-tolerant cluster

  • We’ll examine the mechanisms Volume Shadowing uses to protect the data

  • We’ll look at one scenario that has happened in real life and resulted in data loss:

    • “Wrong-way Shadow Copy”

  • We’ll also look at two obscure but potentially dangerous scenarios that theoretically could occur and would result in data loss:

    • “Creeping Doom” and “Rolling Disaster”


Protecting shadowed data
Protecting Shadowed Data

  • Shadowing keeps a “Generation Number” in the SCB on shadow member disks

  • Shadowing “Bumps” the Generation number at the time of various shadowset events, such as mounting, dismounting, or membership changes (addition of a member or loss of a member)


Protecting shadowed data1
Protecting Shadowed Data

  • Generation number is designed to monotonically increase over time, never decrease

  • Implementation is based on OpenVMS timestamp value

    • During a “Bump” operation it is increased

      • to the current time value, or

        • if the generation number already represents a time in the future for some reason (such as time skew among cluster member clocks), then it is simply incremented

    • The new value is stored on all shadowset members at the time of the Bump operation


Protecting shadowed data2
Protecting Shadowed Data

  • Generation number in SCB on removed members will thus gradually fall farther and farther behind that of current members

  • In comparing two disks, a later generation number should always be on the more up-to-date member, under normal circumstances


Wrong way shadow copy scenario
“Wrong-Way Shadow Copy” Scenario

  • Shadow-copy “nightmare scenario”:

    • Shadow copy in “wrong” direction copies old data over new

  • Real-life example:

    • Inter-site link failure occurs

    • Due to unbalanced votes, Site A continues to run

      • Shadowing increases generation numbers on Site A disks after removing Site B members from shadowset


Wrong way shadow copy
Wrong-Way Shadow Copy

Site B

Site A

Incoming transactions

(Site now inactive)

Inter-site link

Data being updated

Data becomes stale

Generation number still at old value

Generation number now higher


Wrong way shadow copy1
Wrong-Way Shadow Copy

  • Site B is brought up briefly by itself

    • Shadowing can’t see Site A disks. Shadowsets mount with Site B disks only. Shadowing bumps generation numbers on Site B disks. Generation number is now greater than on Site A disks.


Wrong way shadow copy2
Wrong-Way Shadow Copy

Site B

Site A

Isolated nodes rebooted just to check hardware; shadowsets mounted

Incoming transactions

Data being updated

Data still stale

Generation number now highest

Generation number unaffected


Wrong way shadow copy3
Wrong-Way Shadow Copy

  • Link gets fixed. “Just to be safe,” they decide to reboot the cluster from scratch. The main site is shut down. The remote site is shut down. Then both sites are rebooted at once.

  • Shadowing compares shadowset generation numbers and thinks the Site B disks are more current, and copies them over to Site A’s disks. Result: Data Loss.


Wrong way shadow copy4
Wrong-Way Shadow Copy

Site B

Site A

Before link is restored, entire cluster is taken down, “just in case”, then rebooted from scratch.

Inter-site link

Shadow Copy

Valid data overwritten

Data still stale

Generation number is highest


Protecting shadowed data3
Protecting Shadowed Data

  • If Shadowing can't "see" a later disk's SCB (e.g. because that site, or the link to it, is down), it may use an older member and then update the Generation number to a current timestamp value

  • New /POLICY=REQUIRE_MEMBERS qualifier on MOUNT command prevents a mount unless all of the listed members are present for Shadowing to compare Generation numbers on

  • New /POLICY=VERIFY_LABEL qualifier on MOUNT means volume label on member must be SCRATCH_DISK, or it won’t be added to the shadowset as a full-copy target
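Hedged examples of the two qualifiers (virtual unit, member names, and volume label are placeholders):

    $ ! Refuse to mount unless both listed members are present for the
    $ ! generation-number comparison
    $ MOUNT /SYSTEM DSA1: /SHADOW=($1$DGA101:,$1$DGA201:) DATA_DISK -
            /POLICY=REQUIRE_MEMBERS
    $ ! Only accept a disk labeled SCRATCH_DISK as a full-copy target
    $ MOUNT /SYSTEM DSA1: /SHADOW=($1$DGA101:,$1$DGA201:) DATA_DISK -
            /POLICY=VERIFY_LABEL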


Creeping doom scenario
“Creeping Doom” Scenario

Inter-site link

Shadowset


Creeping doom scenario1
“Creeping Doom” Scenario

A lightning strike hits the network room, taking out (all of) the inter-site link(s).

Inter-site link

Shadowset


Creeping doom scenario2
“Creeping Doom” Scenario

  • First symptom is failure of link(s) between two sites

    • Forces choice of which datacenter of the two will continue

  • Transactions then continue to be processed at chosen datacenter, updating the data


Creeping doom scenario3
“Creeping Doom” Scenario

Incoming transactions

(Site now inactive)

Inter-site link

Data being updated

Data becomes stale


Creeping doom scenario4
“Creeping Doom” Scenario

  • In this scenario, the same failure which caused the inter-site link(s) to go down expands to destroy the entire datacenter


Creeping doom scenario5
“Creeping Doom” Scenario

Inter-site link

Data with updates is destroyed

Stale data


Creeping doom scenario6
“Creeping Doom” Scenario

  • Transactions processed after “wrong” datacenter choice are thus lost

    • Commitments implied to customers by those transactions are also lost


Creeping doom scenario7
“Creeping Doom” Scenario

  • Techniques for avoiding data loss due to “Creeping Doom”:

    • Tie-breaker at 3rd site helps in many (but not all) cases

    • 3rd copy of data at 3rd site


Rolling disaster scenario
“Rolling Disaster” Scenario

  • Problem or scheduled outage makes one site’s data out-of-date

  • While doing a shadowing Full-Copy to update the disks at the formerly-down site, a disaster takes out the primary site


Rolling disaster scenario1
“Rolling Disaster” Scenario

Inter-site link

Shadow Copy operation

Target disks

Source disks


Rolling disaster scenario2
“Rolling Disaster” Scenario

Inter-site link

Shadow Copy interrupted

Partially-updated disks

Source disks destroyed


Rolling disaster scenario3
“Rolling Disaster” Scenario

  • Techniques for avoiding data loss due to “Rolling Disaster”:

    • Keep copy (backup, snapshot, clone) of out-of-date copy at target site instead of over-writing the only copy there. Perhaps perform shadow copy to a different set of disks.

      • The surviving copy will be out-of-date, but at least you’ll have some copy of the data

  • Keeping a 3rd copy of data at 3rd site is the only way to ensure there is no data lost


Advanced Quorum Discussion


Quorum configurations in two site clusters4
Quorum Configurations in Two-Site Clusters

  • Balanced Votes

    • Note: Using REMOVE_NODE option with SHUTDOWN.COM (post V6.2) when taking down a node effectively “unbalances” votes:

Node

1 Vote

Node

1 Vote

Node

1 Vote

Node

1 Vote

Total votes: 4 Quorum: 3

Shutdown with

REMOVE_NODE

Node

1 Vote

Node

1 Vote

Node

1 Vote

Total votes: 3 Quorum: 2


Advanced System Management Discussion


Easing system management of disaster tolerant clusters tips and techniques
Easing System Management of Disaster-Tolerant Clusters: Tips and Techniques

  • Create a “cluster-common” disk

  • Use AUTOGEN “include” files

  • Use “Cloning” technique to ease workload of maintaining multiple system disks


System management of disaster tolerant clusters cluster common disk
System Management of Disaster-Tolerant Clusters: Cluster-Common Disk

  • Create a “cluster-common” disk

    • Cross-site shadowset

    • Mount it in SYLOGICALS.COM

    • Put all cluster-common files there, and define logical names in SYLOGICALS.COM to point to them:

      • SYSUAF, RIGHTSLIST

      • Queue file, LMF database, etc.
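A sketch of the SYLOGICALS.COM fragment this implies (virtual unit, member names, volume label, and directory are placeholders):

    $ ! Mount the cross-site cluster-common shadowset
    $ MOUNT /SYSTEM DSA10: /SHADOW=($1$DGA110:,$1$DGA210:) CLUSTER_COMMON CLUSTER_COMMON
    $ ! Point the cluster-common files at it
    $ DEFINE /SYSTEM /EXEC SYSUAF      CLUSTER_COMMON:[VMS$COMMON]SYSUAF.DAT
    $ DEFINE /SYSTEM /EXEC RIGHTSLIST  CLUSTER_COMMON:[VMS$COMMON]RIGHTSLIST.DAT
    $ DEFINE /SYSTEM /EXEC QMAN$MASTER CLUSTER_COMMON:[VMS$COMMON]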


System management of disaster tolerant clusters cluster common disk1
System Management of Disaster-Tolerant Clusters: Cluster-Common Disk

  • Put startup files on cluster-common disk also; and replace startup files on all system disks with a pointer to the common one:

    • e.g. SYS$STARTUP:STARTUP_VMS.COM contains only:

    • $ @CLUSTER_COMMON:SYSTARTUP_VMS

  • To allow for differences between nodes, test for node name in common startup files, e.g.

    • $ NODE = F$GETSYI("NODENAME")

    • $ IF NODE .EQS. "GEORGE" THEN ...
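Pulled together, the common startup file on the cluster-common disk might look like this sketch (node and procedure names are placeholders):

    $ ! CLUSTER_COMMON:SYSTARTUP_VMS.COM -- executed by every node
    $ NODE = F$GETSYI("NODENAME")
    $ IF NODE .EQS. "GEORGE" THEN @CLUSTER_COMMON:GEORGE_SPECIFIC.COM
    $ IF NODE .EQS. "MARTHA" THEN @CLUSTER_COMMON:MARTHA_SPECIFIC.COM
    $ ! Commands below this point run identically on all nodes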


System management of disaster tolerant clusters autogen include files
System Management of Disaster-Tolerant Clusters: AUTOGEN “include” Files

  • Create a MODPARAMS_COMMON.DAT file on the cluster-common disk which contains system parameter settings common to all nodes

    • For multi-site or disaster-tolerant clusters, also create one of these for each site

  • Include an AGEN$INCLUDE_PARAMS line in each node-specific MODPARAMS.DAT to include the common parameter settings
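A sketch of a node-specific MODPARAMS.DAT using this approach (file locations and the sample parameter are placeholders):

    ! SYS$SPECIFIC:[SYSEXE]MODPARAMS.DAT on one node
    AGEN$INCLUDE_PARAMS CLUSTER_COMMON:[SYSMGR]MODPARAMS_COMMON.DAT
    AGEN$INCLUDE_PARAMS CLUSTER_COMMON:[SYSMGR]MODPARAMS_SITE_A.DAT
    ! Node-specific settings follow
    MIN_CHANNELCNT = 512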


System management of disaster tolerant clusters system disk cloning
System Management of Disaster-Tolerant Clusters: System Disk “Cloning”

  • Use “Cloning” technique to replicate system disks

  • Goal: Avoid doing “n” upgrades for “n” system disks


System management of disaster tolerant clusters system disk cloning1
System Management of Disaster-Tolerant Clusters: System Disk “Cloning”

  • Create “Master” system disk with roots for all nodes. Use Backup to create Clone system disks.

    • To minimize disk space, move dump files off system disk

  • Before an upgrade, save any important system-specific info from Clone system disks into the corresponding roots on the Master system disk

    • Basically anything that’s in SYS$SPECIFIC:[*]

    • Examples: ALPHAVMSSYS.PAR, MODPARAMS.DAT, AGEN$FEEDBACK.DAT, ERRLOG.SYS, OPERATOR.LOG, ACCOUNTNG.DAT

  • Perform upgrade on Master disk

  • Use BACKUP to copy Master disk to Clone disks again.



Background


Historical context
Historical Context

Example: New York City, USA

  • 1993 World Trade Center bombing raised awareness of DR and prompted some improvements

  • Sept. 11, 2001 has had dramatic and far-reaching effects

    • Scramble to find replacement office space

    • Many datacenters moved off Manhattan Island, some out of NYC entirely

    • Increased distances to DR sites

    • Induced regulatory responses (in USA & abroad)


Trends and driving forces in the us
Trends and Driving Forces in the US

  • BC, DR and DT in a post-9/11 world:

    • Recognition of greater risk to datacenters

      • Particularly in major metropolitan areas

    • Push toward greater distances between redundant datacenters

      • It is no longer inconceivable that, for example, terrorists might obtain a nuclear device and destroy the entire NYC metropolitan area


Trends and driving forces in the us1
Trends and Driving Forces in the US

  • "Draft Interagency White Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System“

    • http://www.sec.gov/news/studies/34-47638.htm

  • Agencies involved:

    Federal Reserve System

    Department of the Treasury

    Securities & Exchange Commission (SEC)

  • Applies to:

    Financial institutions critical to the US economy


Us draft interagency white paper
US Draft Interagency White Paper

The early “concept release” inviting input made mention of a 200-300 mile limit (only as part of an example when asking for feedback as to whether any minimum distance value should be specified or not):

“Sound practices. Have the agencies sufficiently described expectations regarding out-of-region back-up resources? Should some minimum distance from primary sites be specified for back-up facilities for core clearing and settlement organizations and firms that play significant roles in critical markets (e.g., 200 - 300 miles between primary and back-up sites)? What factors should be used to identify such a minimum distance?”


Us draft interagency white paper1
US Draft Interagency White Paper

This induced panic in several quarters:

  • NYC feared additional economic damage of companies moving out

  • Some pointed out the technology limitations of some synchronous mirroring products and of Fibre Channel at the time which typically limited them to a distance of 100 miles or 100 km

    Revised draft contained no specific distance numbers; just cautionary wording

    Ironically, that same non-specific wording now often results in DR datacenters 1,000 to 1,500 miles away


Us draft interagency white paper2
US Draft Interagency White Paper

“Maintain sufficient geographically dispersed resources to meet recovery and resumption objectives.”

“Long-standing principles of business continuity planning suggest that back-up arrangements should be as far away from the primary site as necessary to avoid being subject to the same set of risks as the primary location.”


Us draft interagency white paper3
US Draft Interagency White Paper

“Organizations should establish back-up facilities a significant distance away from their primary sites.”

“The agencies expect that, as technology and business processes … continue to improve and become increasingly cost effective, firms will take advantage of these developments to increase the geographic diversification of their back-up sites.”


Ripple effect of regulatory activity within the usa
Ripple Effect of Regulatory Activity Within the USA

  • National Association of Securities Dealers (NASD):

    • Rule 3510 & 3520

  • New York Stock Exchange (NYSE):

    • Rule 446


Ripple effect of regulatory activity outside the usa
Ripple Effect of Regulatory Activity Outside the USA

  • United Kingdom: Financial Services Authority:

    • Consultation Paper 142 – Operational Risk and Systems Control

  • Europe:

    • Basel II Accord

  • Australian Prudential Regulation Authority

    • Prudential Standard for business continuity management APS 232 and guidance note AGN 232.1

  • Monetary Authority of Singapore (MAS)

    • “Guidelines on Risk Management Practices – Business Continuity Management” affecting “Significantly Important Institutions” (SIIs)


Resiliency maturity model project
Resiliency Maturity Model Project

  • The Financial Services Technology Consortium (FTSC) has begun work on a Resiliency Maturity Model

    • Taking inspiration from the Carnegie Mellon Software Engineering Institute’s Capability Maturity Model (CMM) and Networked Systems Survivability Program

    • Intent is to develop industry standard metrics to evaluate an institution’s business continuity, disaster recovery, and crisis management capabilities


Long-distance Effects: Inter-site Latency


Long distance cluster issues
Long-distance Cluster Issues

  • Latency due to speed of light becomes significant at higher distances. Rules of thumb:

    • About 1 ms per 100 miles, one-way

    • About 1 ms per 50 miles of round-trip latency (a short worked example follows this list)

  • Actual circuit path length can be longer than highway mileage between sites

  • Latency can adversely affect performance of

    • Remote I/O operations

    • Remote locking operations
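As a worked example of these rules of thumb (the distance is illustrative): a 500-mile circuit adds roughly 500/100 = 5 ms of one-way latency, or about 500/50 = 10 ms per round trip, so an operation that needs two inter-site round trips (for example, a remote lock request followed by a remote write) picks up on the order of 20 ms before any queuing or processing time is counted.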



Inter site latency actual customer measurements
Inter-site Latency: Actual Customer Measurements


Differentiate between latency and bandwidth
Differentiate Between Latency and Bandwidth

  • Can’t get around the speed of light and its latency effects over long distances

    • Higher-bandwidth link doesn’t mean lower latency


Long-distance Techniques: SAN Extension


San extension
SAN Extension

  • Fibre Channel distance over fiber is limited to about 100 kilometers

    • Shortage of buffer-to-buffer credits adversely affects Fibre Channel performance above about 50 kilometers

  • Various vendors provide “SAN Extension” boxes to connect Fibre Channel SANs over an inter-site link

  • See SAN Design Reference Guide Vol. 4 “SAN extension and bridging”:

    • http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00310437/c00310437.pdf



Disk data replication
Disk Data Replication

  • Data mirroring schemes

    • Synchronous

      • Slower, but no chance of data loss in conjunction with a site loss

    • Asynchronous

      • Faster, and works for longer distances

        • but can lose seconds’ or minutes’ worth of data (more under high loads) in a site disaster


Continuous access synchronous replication
Continuous Access Synchronous Replication

Node

Node

Write

FC Switch

FC Switch

Controller in charge of mirrorset:

EVA

EVA

Mirrorset


Continuous access synchronous replication1
Continuous Access Synchronous Replication

Node

Node

Write

FC Switch

FC Switch

Controller in charge of mirrorset:

Write

EVA

EVA

Mirrorset


Continuous access synchronous replication2
Continuous Access Synchronous Replication

Node

Node

Write

FC Switch

FC Switch

Controller in charge of mirrorset:

Write

EVA

EVA

Success status

Mirrorset


Continuous access synchronous replication3
Continuous Access Synchronous Replication

Node

Node

Success status

Write

FC Switch

FC Switch

Controller in charge of mirrorset:

Write

EVA

EVA

Success status

Mirrorset


Continuous access synchronous replication4
Continuous Access Synchronous Replication

Node

Node

Application continues

Success status

Write

FC Switch

FC Switch

Controller in charge of mirrorset:

Write

EVA

EVA

Success status

Mirrorset


Continuous access asynchronous replication
Continuous Access Asynchronous Replication

Node

Node

Write

FC Switch

FC Switch

Controller in charge of mirrorset:

EVA

EVA

Mirrorset


Continuous access asynchronous replication1
Continuous Access Asynchronous Replication

Node

Node

Success status

Write

FC Switch

FC Switch

Controller in charge of mirrorset:

EVA

EVA

Mirrorset


Continuous access asynchronous replication2
Continuous Access Asynchronous Replication

Node

Node

Application continues

Success status

Write

FC Switch

FC Switch

Controller in charge of mirrorset:

EVA

EVA

Mirrorset


Continuous access asynchronous replication3
Continuous Access Asynchronous Replication

Node

Node

Application continues

Success status

Write

FC Switch

FC Switch

Controller in charge of mirrorset:

Write

EVA

EVA

Mirrorset


Synchronous versus asynchronous replication and link bandwidth

Synchronous versus Asynchronous Replication and Link Bandwidth

[Chart: application write bandwidth (MB/sec) over the course of a business day, comparing the link bandwidth implied by synchronous replication (RPO = 0), asynchronous replication with a 2-hour maximum RPO, and asynchronous replication with an RPO of many hours]


Data replication and long distances
Data Replication and Long Distances

  • Some vendors claim synchronous mirroring is impossible at a distance over 100 kilometers, 100 miles, or 200 miles, because their product cannot support synchronous mirroring over greater distances

  • OpenVMS Volume Shadowing does synchronous mirroring

    • Acceptable application performance is the only limit found so far on inter-site distance for HBVS


Long distance synchronous host based mirroring software tests
Long-distance Synchronous Host-based Mirroring Software Tests

  • OpenVMS Host-Based Volume Shadowing (HBVS) software (host-based mirroring software)

  • SAN Extension used to extend SAN using FCIP boxes

  • AdTech box used to simulate distance via introduced packet latency

  • No OpenVMS Cluster involved across this distance (no OpenVMS node at the remote end; just “data vaulting” to a “distant” disk controller)




Mitigating the impact of distance
Mitigating the Impact of Distance

  • Do transactions as much as possible in parallel rather than serially

    • May have to find and eliminate hidden serialization points in applications and system software

    • Have to avoid dependencies between parallel transactions in flight at one time

  • Minimize number of round trips between sites to complete a transaction


Minimizing round trips between sites
Minimizing Round Trips Between Sites

  • Some vendors have Fibre Channel SCSI-3 protocol tricks to do writes in 1 round trip vs. 2

    • e.g. Brocade’s “FastWrite” or Cisco’s “Write Acceleration”

  • Application design can also affect number of round-trips required between sites


Mitigating impact of inter site latency
Mitigating Impact of Inter-Site Latency

  • How applications are distributed across a multi-site OpenVMS cluster can affect performance

    • This represents a trade-off among performance, availability, and resource utilization


Application scheme 1 hot primary cold standby
Application Scheme 1: Hot Primary/Cold Standby

  • All applications normally run at the primary site

    • Second site is idle, except for data replication work, until primary site fails, then it takes over processing

  • Performance will be good (all-local locking)

  • Fail-over time will be poor, and risk high (standby systems not active and thus not being tested)

  • Wastes computing capacity at the remote site


Application scheme 2 hot hot but alternate workloads
Application Scheme 2: Hot/Hot but Alternate Workloads

  • All applications normally run at one site or the other, but not both; data is mirrored between sites, and the opposite site takes over upon a failure

  • Performance will be good (all-local locking)

  • Fail-over time will be poor, and risk moderate (standby systems in use, but specific applications not active and thus not being tested from that site)

  • Second site’s computing capacity is actively used


Application scheme 3 uniform workload across sites
Application Scheme 3: Uniform Workload Across Sites

  • All applications normally run at both sites simultaneously. (This would be considered the “norm” for most OpenVMS clusters.)

  • Surviving site takes all load upon failure

  • Performance may be impacted (some remote locking) if inter-site distance is large

  • “Fail-over” time will be excellent, and risk low (all systems are already in use running the same applications, thus constantly being tested)

  • Both sites’ computing capacity is actively used


Work arounds being used today
Work-arounds Being Used Today

  • Multi-hop replication

    • Synchronous to nearby site

    • Asynchronous to far-away site

  • Transaction-based replication

    • e.g. replicate transaction (a few hundred bytes) with Reliable Transaction Router instead of having to replicate all the database page updates (often 8 kilobytes or 64 kilobytes per page) and journal log file writes behind a database


Promising areas for investigation
Promising Areas for Investigation

Parallelism as a potential solution

  • Rationale:

    • Adding 30 milliseconds to a typical transaction for a human may not be noticeable

      • Having to wait for many 30-millisecond transactions in front of yours slows things down

        Applications in the future may have to be built with greater awareness of inter-site latency

  • Promising directions:

    • Allow many more transactions to be in flight in parallel

    • Each will take longer, but overall throughput (transaction rate) might be the same or even higher than now


Testing simulation
Testing / Simulation

  • Before incurring the risk and expense of site selection, datacenter construction, and inter-site link procurement:

  • Test within a single-datacenter test environment, with distance simulated by introducing packet latency

  • A couple of vendors / products:

    • Shunra STORM Network Emulator

    • Spirent AdTech


Continuous Access or Host-Based Volume Shadowing: Which Should I Use?


Contrast compare hbvs with ca
Contrast / Compare HBVS with CA

Implementation

  • HBVS:

    • HBVS is implemented in software at the host level

  • CA:

    • CA is implemented in software (firmware) at the controller subsystem level


Contrast compare hbvs with ca1
Contrast / Compare HBVS with CA

Writing Data

  • HBVS:

    • Host writes data to multiple disk LUNs in parallel

  • CA:

    • Host writes data to a LUN on one controller; that controller talks to other controller(s) and sends data to them so they can write it to their disk(s)


Host based volume shadowing2
Host-Based Volume Shadowing

SCS--capable interconnect

Node

Node

FC Switch

FC Switch

EVA

EVA

Host-Based

Shadowset


Host based volume shadowing writes to all members
Host-Based Volume Shadowing: Writes to All Members

Node

Node

Write

FC Switch

FC Switch

EVA

EVA

Host-Based

Shadowset


Continuous access
Continuous Access

Inter-site SCS Link

Node

Node

Inter-site FC Link

FC Switch

FC Switch

EVA

EVA

Controller-Based

Mirrorset


Continuous access writes thru master primary side controller
Continuous Access: Writes thru Master/Primary-side Controller

Node

Node

Write

FC Switch

FC Switch

Controller in charge of mirrorset:

Write

EVA

EVA

Mirrorset


Contrast compare hbvs with ca2
Contrast / Compare HBVS with CA

Reading Data

  • HBVS:

    • Host can read from any member (selection is based on Read_Cost criteria), so in a multi-site cluster all 2 or 3 sites can read from a local copy and avoid inter-site latency

  • CA:

    • Host must read a given LUN from a single controller at a time; selection of replication direction (and thus which controller to direct reads/writes to) is made at controller level

    • In multi-site configurations, reads will only be local from 1 site and remote from the other site(s)

      • Proxy reads are possible but discouraged for performance reasons; hosts might as well direct the I/O request to the controller in charge vs. asking another controller to relay the request


Host based volume shadowing reads from local member
Host-Based Volume Shadowing: Reads from Local Member

Node

Node

Read

Read

FC Switch

FC Switch

EVA

EVA

Host-Based

Shadowset


Continuous access reads from one site are remote one local
Continuous Access: Reads from one site are Remote, one Local

Node

Node

Read

Read

FC Switch

FC Switch

Controller in charge of mirrorset:

EVA

EVA

Mirrorset


Controller based data replication data access at remote site
Controller-based Data Replication: Data Access at Remote Site

  • All access to a disk unit is typically from one controller at a time

    • Failover involves controller commands

      • Manual, or scripted (but still take some time to perform)

    • Read-only access may be possible at remote site with some products; other controller may be able to redirect a request (“Proxy” I/O) to the primary controller in some products

Site A

Site B

No Access

Secondary

Primary

All Access

Replication


Continuous access failover required on controller failure
Continuous Access: Failover Required on Controller Failure

Node

Node

Nodes must now switch to access data through this controller

FC Switch

FC Switch

EVA

EVA

Mirrorset


Controller based data replication bi directional replication
Controller-Based Data Replication: “Bi-directional” Replication

  • CA can only replicate a given disk unit in one direction or the other

  • Some disk units may be replicated in one direction, and other disk units in the opposite direction at the same time; this may be called “bi-directional” replication even though it is just two “one-way” copies

Site B

Site A

Secondary

Primary

Transactions

Replication

Primary

Secondary

Replication

Transactions


Host based volume shadowing writes in both directions at once
Host-Based Volume Shadowing: Writes in Both Directions at Once

Node

Node

Write

Write

FC Switch

FC Switch

EVA

EVA

Host-Based

Shadowset


Contrast compare hbvs with ca3
Contrast / Compare HBVS with CA

Number of replicated copies:

  • HBVS:

    • Can support 2-member or 3-member shadowsets

      • Some customers have 3 sites with 1 shadowset member at each of the 3 sites, and can survive the destruction of any 2 sites and continue without data loss

  • CA:

    • Can support one remote copy (2-site protection)


Contrast compare hbvs with ca4
Contrast / Compare HBVS with CA

Host CPU Overhead:

  • HBVS:

    • Extremely low CPU overhead (typically un-measurable)

      • Reads: Compare queue length + Read_Cost for each disk and select one disk to read from

      • Writes: Clone I/O Request Packet and send to 2 (or 3) disks

      • Copy/Merge operations: Shadow_Server process issues 127-block I/Os

      • No RAID-5 parity calculation

  • CA:

    • No additional host CPU overhead

      • Controller CPU overload may be of concern under some circumstances


Contrast compare hbvs with ca5
Contrast / Compare HBVS with CA

Replication between different disk/controller types:

  • HBVS:

    • Can support shadowsets across different storage subsystems

      • Can be mix of technologies, e.g. local SCSI and Fibre Channel

      • Even supports shadowing different-sized disk units

  • CA:

    • Must replicate to a like storage controller

      • But CA-XP supports layering on top of other Fibre Channel storage and replicating it between XP boxes


Contrast compare hbvs with ca6
Contrast / Compare HBVS with CA

OS Platform Support:

  • HBVS:

    • Protects OpenVMS data

      • Can serve shadowed data to other platforms with NFS, Advanced Server / Samba, HTTP, FTP, etc.

  • CA:

    • Supports multiple OS platforms


Contrast compare hbvs with ca7
Contrast / Compare HBVS with CA

Type(s) of mirroring supported:

  • HBVS:

    • Does strictly synchronous mirroring

      • Advanced System Concepts, Inc. RemoteSHADOW product does asynchronous host-based replication

  • CA:

    • Can do either synchronous or asynchronous mirroring


Why hasn t ca been more widely used in openvms cluster configurations
Why hasn’t CA been more widely used in OpenVMS Cluster configurations?

  • Failover Time:

    • Failover with HBVS is automatic, and can happen within seconds

    • Failover with CA tends to be manual, or scripted at best, resulting in longer downtime in conjunction with a disaster

  • Mirrorsets under Continuous Access can only be accessed through one controller or the other at a time

    • With HBVS, reads can be local to a site, which are faster with long inter-site distances, and the capacity of both controllers can be used at once with heavy read rates

  • Inter-site link costs:

    • HBVS can use MSCP-serving over a LAN (or MC) link

    • CA requires an inter-site SAN link, maybe SAN Extension

  • License cost

    • HBVS licenses are often less expensive than CA


When is ca most popular
When is CA most popular?

  • Cases where the Recovery Point Objective (RPO, measuring allowable data loss) is zero or low, but where the Recovery Time Objective (RTO, measuring allowable downtime) is fairly large (for example, a day or more). In this case, a Remote Vaulting solution (constantly or periodically copying data to a remote site) may be all that is needed, rather than a full-blown multi-site disaster-tolerant cluster.


When is ca most popular1
When is CA most popular?

  • An OpenVMS environment where the distance between datacenter sites is on the order of 1,000 miles or more, and the business need for data replication is closer to Disaster Recovery than Disaster Tolerance, so that asynchronous replication is required to provide acceptable performance. This assumes a non-zero RPO requirement, allowing for some amount of data loss in connection with a disaster.


When is ca most popular2
When is CA most popular?

  • Customer sites which have a variety of different operating-system (OS) platforms, and where OpenVMS plays only a minor role; in this case, since Continuous Access may be the chosen solution for data replication for all of the other OS platforms, it may be simpler to use Continuous Access for OpenVMS as well, and follow the same recovery/failover procedures, rather than setting up and maintaining a unique HBVS-based solution for only the minor OpenVMS piece.


When is ca most popular3
When is CA most popular?

  • Customers who need both the zero RPO of synchronous replication and the protection of long inter-site distances. These customers may use a “Multi-Hop” configuration.


Data Replication over Long Distances: Multi-Hop Replication

  • It may be desirable to synchronously replicate data to a nearby “short-haul” site, and asynchronously replicate from there to a more-distant site

    • This is sometimes called “cascaded” data replication

Long-Haul

Short-Haul

100 miles

1,000 miles

Tertiary

Primary

Secondary

Synch

Asynch



Camden arkansas nts
Camden Arkansas NTS


The failover datacenter
The Failover Datacenter





Openvms disaster proof configuration application
OpenVMS Disaster-Proof Configuration & Application

Quorum

Integrity

rx2620

SDboom Integrity Superdome

Kaboom Alpha ES40

Stream of I/Os

Stream of I/Os

Stream of I/Os

All I/Os need to complete to all spindles before an I/O is considered done. When a spindle drops out, the shadowset is reformed with the remaining members. I/Os “in flight” wait for the shadowset to be reformed.

Shadow set

XP24000

XP12000

The longest outstanding request for an I/O during the DP demo was 13.71 seconds.



Openvms system parameter settings for the disaster proof demonstration
OpenVMS System Parameter Settings for the Disaster Proof Demonstration

  • SHADOW_MBR_TMO lowered from default of 120 down to 8 seconds

  • RECNXINTERVAL lowered from default of 20 down to 10 seconds

  • TIMVCFAIL lowered from default of 1600 to 400 (4 seconds, in 10-millisecond clock units) to detect node failure in 4 seconds, worst-case, (detecting failure at the SYSAP level)

  • LAN_FLAGS bit 12 set to enable Fast LAN Transmit Timeout (give up on a failed packet transmit in 1.25 seconds, worst case, instead of an order of magnitude more in some cases)

  • PE4 set to hexadecimal 0703 (Hello transmit interval of 0.7 seconds, nominal; Listen Timeout of 3 seconds), to detect node failure in 3-4 seconds at the PEDRIVER level
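Expressed as MODPARAMS.DAT entries, the settings above would look roughly like this (values come straight from the list; this assumes no other LAN_FLAGS bits are needed on the system):

    SHADOW_MBR_TMO = 8     ! shadow member timeout, in seconds
    RECNXINTERVAL = 10     ! reconnection interval, in seconds
    TIMVCFAIL = 400        ! 4 seconds, in 10-millisecond units
    LAN_FLAGS = %X1000     ! bit 12 set: Fast LAN Transmit Timeout
    PE4 = %X0703           ! 0.7-second Hello interval, 3-second Listen Timeout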


Disaster proof demo timeline
Disaster Proof Demo Timeline

  • Time = 0: Explosion occurs

  • Time around 3.5 seconds: Node failure detected, via either PEDRIVER Hello Listen Timeout or TIMVCFAIL mechanism. VC closed; Reconnection Interval starts.

  • Time = 8 seconds: Shadow Member Timeout expires; shadowset members removed.

  • Time around 13.5 seconds: Reconnection Interval expires; State Transition begins.

  • Time = 13.71 seconds: Recovery complete; Application processing resumes.


Disaster proof demo timeline1
Disaster Proof Demo Timeline

PEDRIVER Hello Listen Timeout or TIMVCFAIL Timeout

Explosion

Failure Detection Time

T = about 3.5 seconds

T = 0


Disaster proof demo timeline2
Disaster Proof Demo Timeline

Failed Shadowset Members Removed

Explosion

Shadow Member Timeout

T = 8 seconds

T = 0


Disaster proof demo timeline3
Disaster Proof Demo Timeline

PEDRIVER Hello Listen Timeout or TIMVCFAIL Timeout

State Transition Begins

Explosion

Reconnection Interval

T = about 3.5 seconds

T = 0

T = about 13.5 seconds


Disaster proof demo timeline4
Disaster Proof Demo Timeline

State Transition Begins

Node Removed from Cluster

Application Resumes

Explosion

Lock Database Rebuild

Cluster State Transition

T = 13.71 seconds

T = 0

T = about 13.5 seconds


Real-Life Examples


Real life example credit lyonnais paris
Real-Life Example: Credit Lyonnais, Paris

  • Credit Lyonnais fire in May 1996

  • OpenVMS multi-site cluster with data replication between sites (Volume Shadowing) saved the data

  • Fire occurred over a weekend, and DR site plus quick procurement of replacement hardware allowed bank to reopen on Monday

Source: Metropole Paris


“In any disaster, the key is to protect the data. If you lose your CPUs, you can replace them. If you lose your network, you can rebuild it. If you lose your data, you are down for several months. In the capital markets, that means you are dead. During the fire at our headquarters, the DIGITAL VMS Clusters were very effective at protecting the data.”

Patrick Hummel, IT Director, Capital Markets Division, Credit Lyonnais


Headquarters for Manhattan's Municipal Credit Union (MCU) were across the street from the World Trade Center, and were devastated on Sept. 11. "It took several days to salvage critical data from hard-drive arrays and back-up tapes and bring the system back up" ... "During those first few chaotic days after Sept. 11, MCU allowed customers to withdraw cash from its ATMs, even when account balances could not be verified. Unfortunately, up to 4,000 people fraudulently withdrew about $15 million."

Ann Silverthorn, Network World Fusion, 10/07/2002

http://www.nwfusion.com/research/2002/1007feat2.html


Real life examples commerzbank on 9 11
Real-Life Examples: Commerzbank on 9/11

  • Datacenter near WTC towers

  • Generators took over after power failure, but dust & debris eventually caused A/C units to fail

  • Data replicated to remote site 30 miles away

  • One AlphaServer continued to run despite 104° F temperatures, running off the copy of the data at the opposite site after the local disk drives had succumbed to the heat

  • See http://h71000.www7.hp.com/openvms/brochures/commerzbank/


“Because of the intense heat in our data center, all systems crashed except for our AlphaServer GS160... OpenVMS wide-area clustering and volume-shadowing technology kept our primary system running off the drives at our remote site 30 miles away.”

Werner Boensch, Executive Vice President, Commerzbank, North America


Real life examples of openvms international securities exchange
Real-Life Examples of OpenVMS: International Securities Exchange

  • All-electronic stock derivatives (options) exchange

  • First new stock exchange in the US in 26 years

  • Went from nothing to majority market share in 3 years

  • OpenVMS Disaster-Tolerant Cluster at the core, surrounded by other OpenVMS systems

  • See http://h71000.www7.hp.com/openvms/brochures/ise/


“OpenVMS is a proven product that’s been battle tested in the field. That’s why we were extremely confident in building the technology architecture of the ISE on OpenVMS AlphaServer systems.”

Danny Friel, Sr. Vice President, Technology / Chief Information Officer, International Securities Exchange


“ We just had a disaster at one of our 3 sites 4 hours ago. Both the site's 2 nodes and 78 shadow members dropped when outside contractors killed all power to the computer room during maintenance. Fortunately the mirrored site 8 miles away and a third quorum site in another direction kept the cluster up after a minute of cluster state transition.”

Lee Mah,Capital Health Authority

writing in comp.os.vms, Aug. 20, 2004


“I have lost an entire data center due to a combination of a faulty UPS combined with a car vs. powerpole, and again when we needed to do major power maintenance. Both times, the remaining half of the cluster kept us going.”

Ed Wilts, Merrill Corporation

writing in comp.os.vms, July 22, 2005


Business Continuity


Business continuity not just it
Business Continuity: Not Just IT

  • The goal of Business Continuity is the ability for the entire business, not just IT, to continue operating despite a disaster.

  • Not just computers and data:

    • People

    • Facilities

    • Communications: Data networks and voice

    • Transportation

    • Supply chain, distribution channels

    • etc.


Useful Resources


Business continuity resources
Business Continuity Resources

  • Disaster Recovery Journal:

    • http://www.drj.com/

  • Continuity Insights Magazine:

    • http://www.continuityinsights.com//

  • Contingency Planning & Management Magazine

    • http://www.contingencyplanning.com/

  • All are high-quality journals. The first two are available free to qualified subscribers

  • All hold conferences as well


Multi os disaster tolerant reference architectures whitepaper
Multi-OS Disaster-Tolerant Reference Architectures Whitepaper

  • Entitled “Delivering high availability and disaster tolerance in a multi-operating-system HP Integrity server environment”

  • Describes DT configurations across all of HP’s platforms: HP-UX, OpenVMS, Linux, Windows, and NonStop

  • http://h71028.www7.hp.com/ERC/downloads/4AA0-6737ENW.pdf


Tabb research report
Tabb Research Report

  • "Crisis in Continuity: Financial Markets Firms Tackle the 100 km Question"

    • available from https://h30046.www3.hp.com/campaigns/2005/promo/wwfsi/index.php?mcc=landing_page&jumpid=ex_R2548_promo/fsipaper_mcc%7Clanding_page


Draft interagency white paper
Draft Interagency White Paper

  • "Draft Interagency White Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System“

    • http://www.sec.gov/news/studies/34-47638.htm

  • Agencies involved:

    Federal Reserve System, Department of the Treasury, Securities & Exchange Commission (SEC)

  • Applies to:

    Financial institutions critical to the US economy

    • But many other agencies around the world are adopting similar rules


Business continuity and disaster tolerance services from hp
Business Continuity and Disaster Tolerance Services from HP

Web resources:

  • BC Services:

    • http://h20219.www2.hp.com/services/cache/10107-0-0-225-121.aspx

  • DT Services:

    • http://h20219.www2.hp.com/services/cache/10597-0-0-225-121.aspx


Openvms disaster tolerant cluster resources
OpenVMS Disaster-Tolerant Cluster Resources

  • OpenVMS Documentation at OpenVMS website:

    • OpenVMS Cluster Systems

    • HP Volume Shadowing for OpenVMS

    • Guidelines for OpenVMS Cluster Configurations

  • OpenVMS High-Availability and Disaster-Tolerant Cluster information at the HP corporate website: http://h71000.www7.hp.com/availability/index.html and http://h18002.www1.hp.com/alphaserver/ad/disastertolerance.html

  • More-detailed seminar and workshop notes at http://www2.openvms.org/kparris/ and http://www.geocities.com/keithparris/

  • Book “VAXcluster Principles” by Roy G. Davis, Digital Press, 1993, ISBN 1-55558-112-9


Disaster proof demonstration
Disaster-Proof Demonstration

  • HP proves disaster tolerance works by blowing up a datacenter’s worth of hardware

  • See video at http://hp.com/go/DisasterProof/

  • For the earlier Bulletproof XP video, see:

  • http://h71028.www7.hp.com/ERC/cache/320954-0-0-0-121.html?ERL=true


T4

  • T4 collects (and TLviz displays) performance data. To determine the inter-site bandwidth used, you can measure the write data rate with T4’s Fibre Channel Monitor (FCM) data collector, since application writes translate into remote shadowset writes. http://h71000.www7.hp.com/openvms/products/t4/index.html

    • But typically, bandwidth required to do a Shadowing Full-Copy operation in a reasonable amount of time swamps the peak application write rate and will be the major factor in inter-site link bandwidth selection


Availability manager
Availability Manager

  • Availability Manager:

  • http://h71000.www7.hp.com/openvms/products/availman/index.html

    • Useful for performance monitoring and can also do a quorum adjustment to recover quorum


Questions?


Speaker contact info
Speaker Contact Info:

  • Keith Parris

  • E-mail: [email protected] [email protected]

  • Web: http://www2.openvms.org/kparris/

Produced in cooperation with:

