East Carolina University. Information Technology & Computing Services. Planning for “What if” Events. Carol Davis, IT DRP Coordinator Jonathan Rose, Systems Programmer. Agenda. ITCS Disaster Recovery Planning Goals ITCS DRP Overview Activation of the Plan Review of Team Responsibilities


East Carolina University

Information Technology & Computing Services

Planning for “What if” Events

Carol Davis, IT DRP Coordinator

Jonathan Rose, Systems Programmer

Agenda


  • ITCS Disaster Recovery Planning Goals

  • ITCS DRP Overview

  • Activation of the Plan

  • Review of Team Responsibilities

  • ITCS and Departmental Testing

  • Recovering a Mission Critical System

  • ITDRP Centralized Sharepoint

  • Campus Disaster Planning

  • Other Discussion

Interesting Facts…

  • Nearly 60 percent of organizations don’t train employees about their roles and responsibilities in the event of a disaster. More than 80 percent of organizations have locally managed life safety plans in place, but only 20 percent of those respondents have evacuation and relocation plans.

  • Although 65 percent of respondents said business recovery plans are important, only 37 percent of organizations test their business recovery plans each year. Another 29 percent merely recognize the need for such plans.

  • More than 60 percent of organizations have plans for recovering key IT Assets such as mainframes and networks. Yet, more than 20 percent of respondents said these plans are focused solely on getting machines working again after a disaster. Only one-third of respondents said their organizations test telecommunications recovery plans annually.

McCollum, 2005 - ITAUDIT

Primary Goals of DRP

  • Details the correct course of action to follow in the event of a disaster

  • Planning helps to minimize confusion, errors, and expense

  • Quick and complete recovery of critically outlined services

  • Involves departments in business continuity

Secondary Goals of DRP

  • Reduce risks of loss of services

  • Provide ongoing protection of university assets

  • Learn departmental critical needs for recovery efforts

  • Ensure the continued viability of this Plan

  • Provide DR training in an annual disaster recovery retreat for staff to understand their recovery roles

Policy Statement

  • Identifying & protecting assets within their control

  • Ensuring employees understand their obligation to protect identified assets

  • Implementing security practices and procedures consistent with generally accepted practices

  • Assigning responsibilities for establishing, maintaining, and testing a Disaster Recovery Plan

What is COBIT?

  • COBIT stands for Control Objectives for Information and Related Technology

  • Issued by the IT Governance Institute and accepted internationally as good practice for control over information and IT related risks.

  • COBIT is a way to bridge the communication gap between IT functions, the business and auditors, by providing a common approach, understandable by all.

COBIT Framework

  • There are 34 high-level control objectives & 318 detailed control objectives

  • The objectives fall into four groups: planning & organization, acquisition & implementation, delivery & support, and monitoring

  • Addressing the high-level control objectives can ensure that an adequate control system is provided for the IT environment.


ITCS Disaster Recovery

Plan Overview

The Plan Components

Readiness Team - Responsible for constructing and maintaining the Disaster Recovery Plan, for managing the DR activities, and for the continued viability of the Plan

Major Services and Key Considerations - Descriptions of the critical applications, identification of users, and key considerations such as equipment configurations, user work schedules, and processing priorities

DRP Components (continued)

General Procedures for Potential Interruptions – Likely causes of service interruptions, instructions for handling the interruptions (e.g., fire, power outage, and telecommunications failure)

Policies for Reducing Risks – Policies that reduce the risk of:

  • Disasters that may occur

  • Excessive damage when they do occur

  • Failing to recover from a disaster

DRP Components (continued)

Contingency Site Description – The facilities provided and all requirements associated with the use of the site

Recovery Procedures for a Major Disaster - Instructions and procedures to be followed in the event of a major disaster (e.g., activating the emergency procedures, establishing operations at the contingency site, and restoring the university to normal operations)

DRP Components (continued)

Testing and Maintenance of the Plan - Policies and procedures for ensuring the Plan remains viable as the business environment evolves

Disaster Recovery Scenarios - Examples that illustrate differences in recovery steps and elapsed times for emergencies of minor, moderate, and major severity

Major Services - Critical Applications

  • Electronic Mail

  • Healthcare Applications

  • Financial Applications

  • Student Records/Registration

  • Academic Applications

  • Public Web Services

  • Phone Services

  • Banner transition items

  • Infrastructure systems

Major Services - Priorities

1. Healthcare Applications

2. Financial Accounting

3. Purchase Order

4. Student Records*

5. Fixed Asset

6. All Others

* May have a higher priority during registration
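
The ordering above, with its registration exception, can be sketched in a few lines. This is only an illustration: the function name `recovery_order` is hypothetical, and placing Student Records directly behind Healthcare Applications during registration is an assumption, since the slide says only that it may have a higher priority.

```python
# Recovery order taken from the slide; earlier entries are recovered first.
BASE_PRIORITY = [
    "Healthcare Applications",
    "Financial Accounting",
    "Purchase Order",
    "Student Records",
    "Fixed Asset",
    "All Others",
]


def recovery_order(registration_in_progress=False):
    """Return the service recovery order as a new list.

    Assumption: during registration, Student Records moves to second
    place. The slide does not state how far its priority rises.
    """
    order = list(BASE_PRIORITY)
    if registration_in_progress:
        order.remove("Student Records")
        order.insert(1, "Student Records")
    return order
```

Keeping the exception in one function means the base list stays identical to the published priorities.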

Systems Testing Schedule

  • Administrative Applications Testing Schedule was developed last year

  • This helps proactively plan by utilizing a testing rotation schedule

  • New applications must be added as needed

  • SCT Banner is requiring changes to this schedule
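
A testing rotation of the kind described above might be generated as follows. This is a sketch, not the actual ITCS schedule: round-robin assignment is an assumed policy, and the function name, application names, and period labels are placeholders.

```python
from itertools import cycle


def build_rotation(applications, periods):
    """Assign each application to a testing period, round-robin.

    Both arguments are hypothetical inputs: a list of application
    names and a list of period labels (for example, semesters).
    """
    if not periods:
        raise ValueError("at least one testing period is required")
    schedule = {period: [] for period in periods}
    # cycle() repeats the periods so every application gets a slot.
    for app, period in zip(applications, cycle(periods)):
        schedule[period].append(app)
    return schedule
```

For example, `build_rotation(["Financial", "Student Records", "Purchasing"], ["Spring", "Fall"])` places the first and third applications in the first period and the second in the other. A new application is added by appending it to the input list and rebuilding the schedule.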

General Procedures for Potential Interruptions

  • Fire (Prevention, Detection, Extinguishing, Evacuation)

    • Call the fire department immediately (911) and utilize a pull station. If the fire is small, use a fire extinguisher.

    • Fire extinguishers are located in the Operations Computer Room adjacent to each computer room exit and located throughout the computer room and building as per the fire inspector’s recommendations.

    • If employees need to evacuate the building and no alarm has sounded, utilize a pull station. If there is time, computer operations should power down the system(s) before cutting power. Trip the Emergency Power Off (EPO) or, if this fails, shut off the main breakers in the mechanical room.

General Procedures for Potential Interruptions

  • Electrical power outages

  • Network or telecommunications failure

  • Flooding

  • Hardware failure

  • Software failure

  • Major disasters

Emergency Procedure Goals

  • Protect the lives and health of employees

  • Protect essential documents, records, and data

  • Minimize damage to data processing equipment and other property

Policies for Reducing Risk

  • Protection of computer data

  • Backup of data, hardware, supplies, and documentation

  • Security of Data Center Operation

  • Offsite storage of tapes and materials

  • Insurance on equipment

  • Be prepared as much as possible!

Contingency Site Description

  • SunGard primary and secondary hotsite location with account manager information

  • The service arrangement, including machine configuration and facilities, is documented in SunGard Schedule A

  • Travel/Hotel accommodations for staff are made by the Administrative Staff

  • SunGard emergency numbers


ITCS Disaster Recovery

Readiness Team Responsibilities

DRP Readiness Team

[Organization chart not reproduced; surviving labels: Carol Davis, Offsite Emergency]

Readiness Team Roles

  • The “Disaster Management Team”

  • Purpose is to establish and direct plans of action

  • Maintain readiness for emergencies

  • Manage DR activities following a disaster

  • Administration of the Plan

  • Emergency Control Center

  • Offsite operations

Emergency Coordinators

  • Develop and coordinate the Readiness Team

  • Activate and direct all activities during disaster

  • Review and update DRP annually

  • Evaluate readiness of action teams

  • Maintain the Emergency Control Center

  • Liaise with local fire and police agencies and other involved parties

  • Assist with campus disaster recovery needs

Offsite Coordinators

  • Review the Plan and ensure adequacy of testing and contingency site procedures

  • Conduct periodic tests of contingency site

  • Communicate status of contingency operations via Emergency Control Center

  • Back up the Emergency Coordinators as needed

Action Team Leaders

  • Review the DR Plan with respect to recovery procedures, team responsibilities, changes in personnel, availability of resources

  • Recommend changes or improvements to the Plan

  • Assist in annual training and train other team members on disaster recovery efforts.


ITCS Disaster Recovery

Action Team Responsibilities

Action Teams

[Organization chart not reproduced; surviving labels: Action Team, Offsite Emergency]
Emergency Action Teams

Each of the following teams has a Team Leader:

  • Applications Team

  • Database Team

  • Infrastructure Wiring

  • Telecomm Team

  • Facilities Team

  • Operations Team

  • SysMain Team

  • Systech Team

  • Network Team

  • Administrative Team

- Individual teams and team leaders are responsible for ordering and tracking needed hardware.

- All ITCS employees are considered critical staff and may be asked to participate in one of the defined roles.

Action Team Responsibilities

  • Operations Team ensures the resumption of computer services following a disaster by restoring and continuing scheduled processing at the contingency site until such time that operations can resume at the original or replacement data center.

  • SysMain/SysTech is to restore or replace needed systems in the event of a disaster.

Action Team Responsibilities

  • Network/Telecom Team is to restore or replace the data or telecommunication systems.

  • Administrative Team is responsible for arranging transportation, housing, expense advances, shipping, etc., and performing clerical and other functions.

  • Applications Team ensures proper functioning of the applications at the contingency site and coordinates with users about how their applications should be operated during the contingency period.

Action Team Responsibilities

  • Database team is responsible for recovery of any and all database activities and works with the other teams as needed on recovery efforts.

  • Infrastructure Wiring is to restore or replace needed wiring in the event of a disaster.

  • Facilities Team is to restore or replace the Data Center and other data processing facilities following a disaster.


ITCS Disaster Recovery

Activation of the DRP

Readiness Team Notifications

  • Public Safety may contact the Emergency Coordinator

  • Readiness Team Leaders will assist in notifications to assemble the team at the Data Center or Emergency Control Center

  • Quick reaction of the readiness team is crucial

  • The situation will be assessed to determine the needed course of action

Readiness Team Notifications

  • Ensure the Emergency Coordinator or Alternate Emergency Coordinator is contacted if this hasn’t been completed.

  • If the situation is judged to be a major disaster:

    • Activate Emergency Control Center

    • Notify Top management

    • Notify Readiness and Action Teams

    • Notify the Offsite storage site

    • Notify the Offsite contingency site
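
The notification flow on this slide can be expressed as a short checklist builder. This is a minimal sketch: the function and argument names are hypothetical, and the step text is copied from the bullets above rather than from the Plan itself.

```python
# Steps listed on the slide for a situation judged to be a major disaster.
MAJOR_DISASTER_STEPS = [
    "Activate Emergency Control Center",
    "Notify top management",
    "Notify Readiness and Action Teams",
    "Notify the offsite storage site",
    "Notify the offsite contingency site",
]


def activation_checklist(coordinator_contacted, major_disaster):
    """Return the outstanding notification steps, in slide order."""
    steps = []
    if not coordinator_contacted:
        steps.append(
            "Contact the Emergency Coordinator or Alternate Emergency Coordinator"
        )
    if major_disaster:
        steps.extend(MAJOR_DISASTER_STEPS)
    return steps
```

A situation that is not judged to be a major disaster, with the coordinator already contacted, yields an empty list, which matches the slide: no further notifications are listed for that case.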

Emergency Control Center

  • Provide centralized and coordinated control of communications during emergencies

  • Primary site: should be designated

  • Secondary site: should be designated

  • Activated by Emergency Coordinator or Alternate Emergency Coordinator

  • Emergency Coordinators and Team Leaders to coordinate their actions with the Emergency Control Center

SunGard Alert Notification

  • Call SunGard NUMBER

  • Inform the operator whether you are calling in an alert notification or a disaster declaration.

  • Please provide the following information:

    • Your company’s full name

    • Your name and password (if applicable)

    • The address of the site affected

    • Primary and secondary phone numbers where you can be reached

    • The nature of the alert or disaster

    • The type of systems/servers that you are declaring or placing on alert

    • The SunGard facility your company utilizes for testing

  • A Crisis Management team member will access your Disaster Declaration Authorization (DDA) form to ensure you are authorized to provide an alert notification
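
Because the call must supply a fixed set of details, it may help to collect them beforehand and check that nothing is missing. The record below is a hypothetical sketch: the class, field names, and helper are invented for illustration and are not part of any SunGard interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertCall:
    """Details the slide says to have ready before calling (names assumed)."""
    company_name: str = ""
    caller_name: str = ""
    site_address: str = ""
    primary_phone: str = ""
    secondary_phone: str = ""
    nature_of_event: str = ""
    systems_affected: str = ""
    testing_facility: str = ""
    password: Optional[str] = None        # "if applicable" on the slide
    is_disaster_declaration: bool = False  # False means alert notification


# Every field except the optional password and the alert/declaration flag.
REQUIRED = (
    "company_name", "caller_name", "site_address", "primary_phone",
    "secondary_phone", "nature_of_event", "systems_affected",
    "testing_facility",
)


def missing_items(call):
    """Return the names of required fields that are still blank."""
    return [name for name in REQUIRED if not getattr(call, name).strip()]
```

An empty result from `missing_items` means every item on the slide's list has been filled in.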


ITCS Disaster Recovery

Annual Testing

DRP Testing & Maintenance

  • ITCS DR Plan is to be tested annually

  • The Plan is to be revised at least once every two years or as needed with technology updates

  • A hard copy and electronic copies are distributed to the readiness teams

  • MS SharePoint is used to maintain and update the IT DR Plan under the Master, Planning, and Testing sites; access to each site depends on assigned privileges
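
The two intervals above (annual testing, revision at least every two years) lend themselves to a simple overdue check. This sketch assumes 365 and 730 days as approximations of those intervals; the function name and return labels are hypothetical.

```python
from datetime import date

TEST_INTERVAL_DAYS = 365      # "tested annually" (approximation)
REVISION_INTERVAL_DAYS = 730  # "at least once every two years" (approximation)


def overdue_items(last_test, last_revision, today):
    """Return which plan maintenance tasks are overdue as of `today`."""
    overdue = []
    if (today - last_test).days > TEST_INTERVAL_DAYS:
        overdue.append("annual test")
    if (today - last_revision).days > REVISION_INTERVAL_DAYS:
        overdue.append("plan revision")
    return overdue
```

The check does not cover the "as needed with technology updates" trigger, which depends on judgment rather than dates.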

2005 Hotsite Testing

  • Recover the system & applications from backups to vendor supplied hardware at the “hot site” in Chicago

  • Allow system and departmental testers in Greenville to remotely test the applications running in Chicago

  • Complete testing recovery templates

  • Review the IT Disaster Recovery Plan for updates and suggestions


Recovering a “Mission Critical System”

ITCS Disaster Recovery

What is a “Mission Critical System”

  • A system so critical to the functioning of an organization that its destruction or loss would cause an extreme interruption to the business, have significant financial implications, and/or threaten the health or safety of a person

An Integrated Environment

  • “System” as it relates to recovery planning should include all business assets necessary to deliver the service






“What If” Planning: Data Center Destruction Scenario

  • It’s the weekend and you are at home enjoying a pizza and watching the NCAA tournament. Your boss calls and leaves a voice mail on your answering machine indicating that a tornado has struck your data center. The facility has suffered significant damage and your site’s critical systems have been damaged. He needs you to prepare for travel to the “hot site” and recover the systems.

Quiz: What Do You Do?

Multiple Choice: (Select all that apply)

  • Pretend that you didn’t get the message. Finish your pizza and enjoy the game

  • Fall out and dream you’re on The Apprentice, in the boardroom with “Donald”: “You’re fired!”

  • Confidently contact your boss to begin executing your thoroughly tested disaster recovery plans

3 Keys to a Successful Recovery

  • Backups

    • Without good backups you are rebuilding your system, not recovering it

  • Available Hardware

    • Can’t restore to what you don’t have

  • Procedures & Training

    • Document & Test your procedures

Backups (Data Protection)

  • Build in as much data redundancy as possible. (RAID, Shadowing, etc.)

  • Frequent Backups – The more the better

  • Randomly test restoring your data

  • Track the age of tapes used for backups

  • Adequate number of tapes in rotation

  • Offsite storage of recent backups
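
Tracking tape age, as recommended above, can be as simple as recording when each tape entered rotation and flagging those past a cutoff. The sketch below assumes a hypothetical inventory keyed by tape label; the cutoff is a parameter because the slides do not state a retirement age.

```python
from datetime import date


def tapes_to_retire(first_used, today, max_age_days):
    """Return labels of tapes in rotation longer than `max_age_days`.

    `first_used` maps a tape label to the date it entered rotation
    (a hypothetical inventory format).
    """
    return sorted(
        label
        for label, started in first_used.items()
        if (today - started).days > max_age_days
    )
```

Retiring a tape reduces the number in rotation, so the result is also a prompt to confirm that enough tapes remain.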

Available Hardware

  • Identify & Avoid single points of failure

  • Build in as much redundancy as possible (CPU, memory, power, NICs, disks, …)

  • Ensure Secondary Offsite Hardware

    • Option 1: Identical offsite system

    • Option 2: Offsite Cluster Member

    • Option 3: Contract with recovery company
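
Identifying single points of failure starts with an inventory of how many of each component a system has. The following sketch assumes a hypothetical inventory format (component name mapped to installed count) and treats anything with fewer than two units as a single point of failure.

```python
def single_points_of_failure(component_counts):
    """Return components with no redundant unit, sorted by name.

    `component_counts` maps a component name to the number installed
    (a hypothetical inventory format).
    """
    return sorted(
        name for name, count in component_counts.items() if count < 2
    )
```

For example, an inventory of `{"power supply": 2, "NIC": 1, "disk controller": 1}` would flag the NIC and the disk controller. The check covers only components within one system; offsite hardware is a separate question addressed by the three options above.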

Procedures & Training

  • Develop verbose procedures explaining the recovery process in your environment

  • Make sure your procedures are readily available to all necessary staff

  • Test your procedures – Practice makes perfect

“What If” Planning

  • At the start, focus your planning on scenarios that affect the critical three: data, hardware, and know-how

  • Be proactive and not reactive - “An ounce of prevention is worth a pound of cure”, so build in redundancy to avoid single points of failure

  • The old cliché holds true: if you fail to plan, then you plan to fail

What We Do at East Carolina

  • Data Redundancy

    • Nightly “Full” Backups

    • Monitor vintage of tapes and rotate backups offsite

    • Monthly restore of Live data to Development system

  • Hardware Availability

    • Redundant components on Live & Development systems

    • Development system capable of running Live

    • Contract with SunGard for recovery services

  • Know How

    • Verbose procedures on recovering the environment

    • Yearly offsite disaster recovery test


ITCS Disaster Recovery

ITCSDRP Sharepoint Site

ITCSDRP Sharepoint Site

  • (example)

ITDRP Sharepoint Site


    • The ITCSDRP top-level site is the central starting point for ITCS Disaster Recovery.


    • This site contains the MASTER IT Disaster Recovery Plan (DRP) manual in electronic format.


    • Those in ITCS who need modify access will have contributor rights to the PLANNING site.


    • The TESTING site is for those in ITCS and at the department level involved in annual testing. 


ITCS Disaster Recovery

Campus Disaster Planning

Campus Disaster Planning

  • The Crisis Decision Team addresses University-wide issues such as class cancellations or other mission-oriented issues.

  • Campus Operations organizes and prioritizes the physical response and recovery efforts

  • EH&S organizes the actual Emergency Operations Center to provide overall coordination of recovery efforts

  • ITCS and other critical departments operate their own EOCs, which coordinate their recovery efforts with the central EOC

Campus - Emergency Operations Center (EOC)

  • University Emergency Coordinator oversees campus emergencies

  • Key administrators form the Emergency Management Team

  • The Sweetheart Banquet Room in Todd Dining is the primary EOC location


ITCS Disaster Recovery

Questions & Answers