Disaster Recovery Technical Plan summary 2007
Presentation Purpose • Understand Impact of Various Disaster Situations • What it means to operations • Impact duration • Understand Alternate Work Options and Recovery Process • Understand Roles
DECLARATION OF A DISASTER, which activates all DR procedures, would be made in the event of a facility loss or regional disaster. Activation of subsections or of individual alternate work means, however, may occur in the event of specific service failures (example: phone services out of operation). Various Possible Disaster Scenarios
Aside from a regional disaster, the most significant impact would occur with the loss of the Farr Regional Library. Impact to Operations
In the event of the loss of a branch or member library, impact would be less extensive. Impact to Operations
1) Data (data only, not system configurations) • ILS • Email • Individual Shares • Department Shares • 2) Systems – all systems to be rebuilt • 3) Paper Copies – not included First, let’s confirm what we are trying to protect/manage with a disaster recovery plan. DR PLAN - What’s Included
What if we don’t do anything, is it worth the effort? Basically, we’re looking at insurance plans to cover a risk/investment of $1.5-2.5 million. Cost of Downtime
The optimal solution is one that determines the best fit for cost versus time to recover (and also the point in time to which data is recovered). Optimal Solution
Disaster Recovery (DR) • Disaster recovery. Typically this is associated with a technology recovery plan. • Business Continuity Plan • An overarching business disaster recovery plan which includes staffing, public communication, and more • Recovery time • Time for the system to be operational and available for use • Point in time recovery • The amount (in time) of data that may be lost as a result of the process There are a few basic terminology items to be aware of. Terminology
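As a rough illustration of the last two terms, the sketch below computes recovery time and the point-in-time data loss window from three timestamps. The function names and the sample times are hypothetical and are not taken from the plan.

```python
from datetime import datetime, timedelta


def recovery_time(failure_at: datetime, restored_at: datetime) -> timedelta:
    """Time from the failure until the system is operational and available again."""
    return restored_at - failure_at


def data_loss_window(last_backup_at: datetime, failure_at: datetime) -> timedelta:
    """Amount of data, measured in time, written after the last good backup and lost."""
    return failure_at - last_backup_at


# Hypothetical example: backup at 01:00, failure at 14:30 the same day,
# service restored at 09:00 the next morning.
last_backup = datetime(2007, 9, 12, 1, 0)
failure = datetime(2007, 9, 12, 14, 30)
restored = datetime(2007, 9, 13, 9, 0)

print(recovery_time(failure, restored))        # 18:30:00
print(data_loss_window(last_backup, failure))  # 13:30:00
```

In this made-up case the system is unavailable for 18.5 hours, and 13.5 hours of work entered since the last backup would need to be re-created from downtime records.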
The disaster recovery effort has been divided into three subject areas. The first effort defines the procedures to be followed during a disruption to technical services. The second effort covers the technical recovery design, which determines how long emergency procedures will need to be followed and how successfully data can be recovered to a specific point in time. Finally, the last area looks at all services to ensure the most cost-effective approaches are being taken for recovery.
Emergency Boxes at each location • Downtime Phones • Afterhours support information for facilities and IT • ILS downtime procedure • PC Res downtime procedure • All telephone and/or network downtime procedures • Filtering product downtime This section of a disaster recovery plan focuses on ensuring appropriate materials are available and staff are trained and can operate during downtime situations. I. Continue Operations
This table depicts at the highest level, the alternate means by which operations can be conducted in the event of a regional failure. Alternate Work Processes
Or simpler yet… • Know where the emergency box is at your location. • Immediately begin to use alternate work methods (to continue operations as normally as possible) • Check http://www.fred.sharepointspace.com for updates.
The speed of recovery is dependent upon • vendor response • resource time available (priorities) • equipment availability • complexity of the recovery II: Speed of Recovery
For the WLD, when backups occur, only the data of the system is retained. In some instances system configurations are backed up, but a full system installation is typically not captured. What this means: although the data is available, the hardware must be recovered and then all software reloaded. WLD is assessing when virtual machines can be created and easily backed up. Awareness point
Three point-in-time technical designs were considered. Note that all designs assume a baseline of primary equipment designed for high availability, with redundant power supplies, RAID, and other standard features inherent in business-class servers and equipment.
Hosted backup w/ tape archive • expert resources monitoring and tracking data backup process • additional capacity available to provide needed data in the event of an emergency (can work with other vendor partners) • best medium for ensuring data integrity (avoiding bad tapes, etc.) • estimated solution duration two, maybe three, years • Dependent on data and systems growth • Review annually to determine if best fit • What it looks like • 7 days of onsite and offsite data backup on disk (fast recovery) • After 7 days, historical data available on tape (at risk for older recovery) • Virtual Machines • HIP • Horizon if possible (testing to start now) • MyLibrary • Other… After reviewing options, WLD will be working with Iron Mountain. Best Fit
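The disk-then-tape retention described above can be sketched as a simple rule. This is only an illustration: the function name is hypothetical, and treating a backup that is exactly 7 days old as still on disk is an assumption, since the slide does not state the boundary.

```python
from datetime import date

DISK_RETENTION_DAYS = 7  # from the slide: 7 days of backup kept on disk


def restore_source(backup_date: date, today: date) -> str:
    """Return where a backup of the given date would be restored from.

    Assumption: a backup exactly DISK_RETENTION_DAYS old is still on disk.
    """
    age_days = (today - backup_date).days
    if age_days < 0:
        raise ValueError("backup_date is in the future")
    return "disk" if age_days <= DISK_RETENTION_DAYS else "tape"


# Hypothetical requests made on 12 Sept 2007
print(restore_source(date(2007, 9, 10), date(2007, 9, 12)))  # disk
print(restore_source(date(2007, 8, 1), date(2007, 9, 12)))   # tape
```

A request that resolves to "tape" is the slower path the slide flags as at risk for older recovery.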
The table to the right depicts the estimated length of time needed for various services to be available, both through temporary work means and through full recovery, where normal operations have resumed. Time to Recover
Continued data collection and research in 2007 • Share data • Email file size • Work processes (IT example) • What about personal folders, archives • Review findings and develop a recommendation in 2008 • Share policies/use • Email mailbox size rules • Other? Cost management includes efforts to help smartly manage data growth and use. Is money being spent backing up old or inappropriate data (mp3 files, family pictures, other?) Cost Management
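One way the share data collection could be approached is a small scan that totals the space used by file types that may not belong in backups. This is a minimal sketch, not the plan's method: the function name and the extension list are hypothetical (the slide mentions mp3 files and family pictures only as examples), and the path passed in would be whichever share is being reviewed.

```python
import os
from collections import defaultdict

# Hypothetical list of extensions to flag for review
FLAGGED_EXTENSIONS = {".mp3", ".jpg", ".jpeg"}


def flagged_bytes_by_extension(root: str) -> dict:
    """Walk a directory tree and total the bytes used per flagged extension."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext in FLAGGED_EXTENSIONS:
                try:
                    totals[ext] += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    # Skip files that vanish or cannot be read during the scan
                    continue
    return dict(totals)
```

Running it against a department share and comparing the totals to the size of the nightly backup would give a first estimate of how much backup cost goes to this kind of data.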
CONTINUE OPERATIONS: • Staff immediately shifts to downtime operations • RECOVER QUICKLY: • Director/associate director immediately updates FRED SharePoint with first conference call time • Site is http://fred.sharepointspace.com – updates will be posted on the DR page • All managers to participate • use 866-258-0959 meeting room ID *1338021* using 1857 • Daily meetings at 8:15 (target: brief, 15-minute information sharing) until recovery is completed • Managers to update staff after daily meeting • Communication to staff posted on DR site • IT will join all morning meetings to provide updates and will post specific information on the DR site as well Assuming the worst case disaster (Farr destroyed) short of a regional catastrophe
Staffing information • Do staff report, where, when? • Will staff be paid? • Public and board communication plan • How to keep public notified of the status • Peer communication plan • ILL services, other, how to operate • Actual physical recovery if location destroyed • Rebuild/other? • Insurance processes • timeline for recovery (and again, staff impact in the interim) • Other? What’s not defined, subjects for a business continuity plan
Next Steps as of Sept 12, 2007 • Present to branch managers for awareness of full plan • Complete testing of Horizon VM instance • Due in 30 days • Complete testing of Iron Mountain service • In process, decision due October • Complete migration of applicable services to virtual machines (includes installing separate copy • Q1 2008 • Finalize the archive configuration for the ILS • Work with Kari/Managers to train appropriate staff on store and forward uploads • Conduct DR test on Nov 5 Final Next Steps