IBM Data Center Operations & Management Services ECC – Enterprise Command Center Solutions


Presentation Transcript

  1. IBM GTS Site & Facilities Services (COE) – Center of Excellence, June 2013 • “By 2015, more than half of organizations that don’t yet have a data center operations command center will create one.” – Gartner, “Best Practices for a World-Class Data Center Operations Command Center” • IBM Data Center Operations & Management Services ECC – Enterprise Command Center Solutions • Providing efficiency savings, optimization of service delivery, and management of your data center operations environment

  2. IBM Offers the Full Spectrum of Site & Facilities Services Data Center and Facilities Strategy & Services Identify and analyze requirements, define best option IT Facilities Assessment, Design and Construction Data Center Operations and Management Services IT Facilities Consolidation and Relocation Build new data centers or optimize existing ones Higher Availability, Reduced Cost & Complexities, Optimized Command Center Design/Build & Operations Reduce risk, improve redundancy measures • Data Center build out- Engineer to order • Scalable Modular data center for SMB • Site Assessment Audit Services • Thermal analysis for high density computing – New • Data Center – Health check Program • High density computing data center readiness assessments • Data Center Energy Efficiency Assessment • Enterprise Command Center Health Check Services – ECC, NOC, EOC, SOC, etc. • Integrated DC/ECC-NOC Design/Build and IT Operations services • Data Center Integrated Monitoring & Managed Services • Integration of IT and Facilities Management • Data Center Automated Operations • Data center global consolidation and relocation enablement IBM has built over 3 million sq m of raised floor space globally North Shore LIJ and IBM Confidential

  3. Why do we need an operationally efficient ECC/NOC? Customers’ challenges are driven by the following… Reduction in Complexity Continued cost pressure Improved Availability 14% $2.8M/Hr 80% Of unplanned outages are caused by people and process failures The number of organizations surveyed that use extensive workflow and task automation Average cost of unplanned outages 70% 10% Of IT costs are on operations and firefighting The number of operations groups that are capable of defining their IT portfolio The number of organizations surveyed that make use of quality metrics 34%

  4. What is an ECC/NOC? The “place” where people, process, and technology come together to manage the data that drives the business PHYSICAL DESIGN • Physical Environment – Design and Build • Human Centered Design • Ergonomics • Circadian Rhythms • Acoustic theory / EMF • Ambient task lighting • Color theory • Aromatherapy • Large screen technologies • Temperature – Human/Machine • 24/7 Operator Seating • Audible Alarms • Critical Situation Room • Security (Attended and Unattended Zones) • Facilities – Environmental – Building Mgmt interface • Design/Build plans, architecture, + 3D snapshots LOGICAL DESIGN • Measurements & Metrics • Tools & Technologies • Process & Procedures • Skills & Resources • Organization & Governance • Knowledge Management • Automation & Metrics • Quantitative & Qualitative • Minimal Human Intervention & Error

  5. DC Operations – ECC/NOC “Best Practices” results

  6. Enterprise Command Center Logical and Physical Design: A major financial client expects to exceed $18M savings over a 3-year period Business challenge • Lower operating costs by 20% • Reduce the number of physical command centers • Improve operational efficiency Solution • Roadmap to consolidate from 7 to 2 command centers • Maintain availability throughout consolidation • Process enhancements to enable virtualization • Metrics to measure progress against service level agreements and cost targets Benefits • Overachieve expected savings: 30% of original budget • Expected improvements: • Mean Time to Recovery by 10 percent • Maintaining a defect rate below 1%

  7. IBM’s ECC solutions are based on best practices from more than 25 years of experience owning and operating enterprise command centers around the world. • Optimized service delivery environment • Deployed waste reduction initiatives on tools and processes • Implemented advanced automation • Created global service line for governance across the globe • Consolidated U.S. command centers • Relocated staff to major locations • Vacated remote command centers • Reduced underutilized skills • Exceeded all tool consolidation targets • Formed governance teams to focus on standardization • Deployed common enterprise operational procedures • Created a single governing body for standardization efforts in the Americas Virtualization Optimization Future state • Improve availability through tools and technology with global dashboards, consolidated to a single pane • Reduce complexity through utilization of any-to-any command center capabilities • Lower-cost delivery through physically consolidated global delivery facilities • Enhance efficiency through plug-and-play resource capabilities (labor) Consolidation Standardization

  8. IBM’s ECC Health Check Workshop focuses on a standard, modular, and scalable approach to Data Center operations optimization

  9. Around-the-clock command center evaluation Interviews Surveys Onsite observation Questionnaires Tool analysis Enterprise command center operations service delivery modular approach Organization and governance Measurement and metrics Tools and technology Process and procedure Skills and resources Physical environment Knowledge management Automated metrics tracking tool Organization maturity assessment Auto discovery tools Six Sigma/Lean methodology tool set Time-in-motion studies Command center virtual rendering Audit compliance verification Enterprise command center execution roadmap Baseline Tactical Roadmap Metrics scorecard Optimized staffing models Financial - cost justification Consolidation plans Strategic Roadmap Standardization Consolidation Optimization Virtualization IBM’s ECC Health Check Workshop helps organizations develop a cost-effective strategy to command center maturation through a scalable, modular, and integrated approach, resulting in higher availability and improved operational efficiencies • Reduce operating costs by up to 20 percent • Reduce operational complexity through improved resource utilization • Exceed availability requirements through reduction of outages and automation implementations (IBM has seen a reduction in high severity incidents resulting in a 49% improvement in availability) • Improve operational efficiency through improvements to metric tracking, tool automation, and enhanced skills and governance

  10. ECC Health Check Workshop – Modular, Scalable & Integrated Approach: The Workshop focuses on a four-phase approach – Planning, Data Collection, Analysis and Recommendations *Post-Workshop: Recommendations by priority and complexity; follow-on consulting activities can be performed as part of an additional phase of work Planning Data Collection Analysis Recommendations Execution GTS Services Gaps Implementation Consulting IBM Leading Practices IBM Collect Data Identify Data Sources IBM Research and Methodology Managed Services Needs Identify Interviews and Complete Questionnaire Final Report Analysis Meetings Research Starburst Customer Provides docs Define Tasks/Timeline Baselining Roadmaps and Execution Plans Findings & Observations Leading Practices Provide Thought Leadership 3 – 5 days 5 – 7 days 3 – 5 days 4 – 8 days GTS Services 4 – 6 weeks

  11. Logical Command Center Environment Maturity Chart EFFICIENCY EFFECTIVENESS

  12. Physical Command Center Environment Maturity Chart EFFICIENCY EFFECTIVENESS Management Hardening

  13. Measurements and metrics help improve availability and efficiency of your command center resources and identify areas for improvement. Activities • Develop value-add measurements and metrics • Utilize existing automation for gathering of key performance indicators • Evaluate performance targets against service level agreements • Assess existing targets for credibility and validity An automated metrics tracking tool targeting and tracking key performance indicators. Benefits • Established targets and goals • Communicate goals and progress • Provide tracking and improvement measurements *This represents a typical “new” metric tracking worksheet utilizing your targets. A comparison based on industry standards would portray additional areas in need of attention. IBM experience • IBM continues to drive down key metrics such as mean time to recovery and improve availability toward 99.999 percent requirements. Net effect: Thorough tracking and articulation of goals and achievements across your organization
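
To make an availability target such as the 99.999 percent figure on this slide concrete, it can help to convert it into an allowed-downtime budget. The following is a minimal sketch, not IBM's metrics tracking tool; the function name and the 365-day year are assumptions made for illustration.

```python
# Illustrative only: convert an availability target into a downtime budget.
# The helper name and the 365-day year are assumptions, not from the slides.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def allowed_downtime_minutes(availability_percent: float,
                             period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the minutes of downtime permitted over the period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = (100 - availability_percent) / 100
    return unavailable_fraction * period_hours * 60


five_nines = allowed_downtime_minutes(99.999)    # about 5.3 minutes per year
three_nines_five = allowed_downtime_minutes(99.95)  # about 4.4 hours per year

print(f"99.999% -> {five_nines:.2f} minutes/year")
print(f"99.95%  -> {three_nines_five / 60:.2f} hours/year")
```

Tracking mean time to recovery against a budget like this shows how little room a five-nines target leaves: a single incident lasting more than about five minutes uses up the whole year's allowance.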

  14. 1 Inventory & Lifecycle Utility Provider Layers Power Generation Utility Distribution 2 Substation Distrib’n Instrumented Metering M1 Data Center Facility Layers Primary Distribution M3 M2 M4 House System Distribution Mechanical Distribution Critical Distribution • Key Data Center metrics: • PUE (power usage effectiveness): Total kW / IT kW • DCiE (data center infrastructure efficiency): 1 / PUE (in %) • CPE (compute power efficiency): Utilization * DCiE • Capacity vs Utilization at all system component levels 3 M5 M8 PDU/RPP Cooling Tower Consolidation & Condition Monitoring M6 M9 Chiller/Handler Rack (circuits) 4 M7 M10 Integrated Analysis & Optimization CRAC/ACU Devices Importance of Instrumented Metering “What gets measured gets managed” -- Peter Drucker • If these systems fail, the IT environment will fail: • Power: utility feeds, transformers, generators, UPS, static switches, PDU, sub-panel, circuits • Cooling: chillers, pumps, condensers, air handlers (temperature, humidity, pressure, flow) • Subsystems: leak detection, security, fuel, battery monitoring, fire suppression, etc. Measured power & thermal utilization will drive both resiliency and efficiency at system component levels
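
The three metrics listed on this slide follow directly from metered power readings. The sketch below computes them as the slide defines them (PUE = Total kW / IT kW, DCiE = 1 / PUE as a percentage, CPE = Utilization * DCiE). The function names and the sample readings (1,500 kW total, 1,000 kW IT, 30% utilization) are hypothetical values chosen for illustration, not figures from the presentation.

```python
# Illustrative only: the slide's three metrics from metered power readings.
# Function names and sample readings are assumptions, not from the slides.

def pue(total_kw: float, it_kw: float) -> float:
    """Power usage effectiveness: total facility power over IT power."""
    if it_kw <= 0:
        raise ValueError("IT load must be positive")
    if total_kw < it_kw:
        raise ValueError("total facility power cannot be below IT power")
    return total_kw / it_kw


def dcie_percent(total_kw: float, it_kw: float) -> float:
    """DCiE is the reciprocal of PUE, expressed as a percentage."""
    return 100.0 / pue(total_kw, it_kw)


def cpe_percent(it_utilization: float, total_kw: float, it_kw: float) -> float:
    """Compute power efficiency: IT utilization (0..1) times DCiE."""
    if not 0 <= it_utilization <= 1:
        raise ValueError("utilization must be a fraction between 0 and 1")
    return it_utilization * dcie_percent(total_kw, it_kw)


# Hypothetical meter readings: 1,500 kW at the facility, 1,000 kW at the IT load.
sample_pue = pue(1500, 1000)
sample_dcie = dcie_percent(1500, 1000)
sample_cpe = cpe_percent(0.30, 1500, 1000)

print(f"PUE {sample_pue:.2f}, DCiE {sample_dcie:.1f}%, CPE {sample_cpe:.1f}%")
```

With these sample readings, one third of the facility power never reaches IT equipment, and only about a fifth of the total does useful compute work, which is why the slide pairs capacity with utilization at every component level.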

  15. By streamlining the physical command center environment, you gain a method to reduce complexity and cost of delivering services. Command Center consolidation Number of command centers Time to implement Net effect: Reduced complexity and cost through consolidation

  16. The ECC health check workshop leverages tools and technology to help reduce human error, drive down resource costs and improve productivity. Activities • Analyze the existing technology implementation • Identify improvement opportunities • Develop a tools and technology action plan focused on efficient use of labor and infrastructure resources Operations labor reduction through automation Benefits • Reduced operating cost by eliminating tools overlap and duplication • Reduced labor costs by reducing manual labor and technology requirements • Reduced complexity by simplifying the technology footprint Monitoring full-time equivalent Tools automated IBM experience • Agnostic evaluation of tools and technology to promote the greatest return on investment • Tools and technology standardization to drive automation and reduce manual labor Net effect: Augmented staffing through automation, with improved tools and technology

  17. Evaluate your available skills and resources to maximize efficiency and productivity Utilization of the Time In Motion Tool to align staffing models to ticket volumes Number of High Severity Incidents Net effect: Staffing levels that are adjusted to map to daily workload requirements *Time in Motion Studies are an available add-on option to the ECC Health Check Workshop

  18. A mature Organization and Governance model reinforces delivery excellence Automation Improvements Manager to FTE Ratios Consolidation Waste Elimination Net effect: A streamlined, cost effective Organization with a mature, delivery focused governance model

  19. Process and Procedures Activities • Review the existing process and procedures framework • Collect and validate a sample set of processes and procedures as they relate to core ITIL competencies • Document opportunities for streamlining or enhancement Benefits • Repeatable, reliable processes and procedures to manage delivery • Enhanced availability due to reduced mean time to recovery and operator error • Reduced cost of delivery through streamlining of processes and procedures IBM Experience • Reduced downtime through focused improvements to incident and change processes Creating robust, repeatable processes and procedures is foundational to increasing availability and driving down cost Number of Sev1 Incidents Handled within Criteria as part of the Incident Management Process Net Effect: Streamlined Enterprise Processes and Procedures improve key performance indicators

  20. IBM’s Data Center & Command Center Strategy • Simplify IT environment • Migrate applications into fewer images • Reduce operational resources • Improve application monitoring & tuning - IT Infrastructure Energy Efficiency Strategy • Remove physical resource boundaries • Increased hardware utilization • Allocate less than physical boundary • Reduce software licensing costs Application Integration • Consolidate many servers into fewer on physical resource boundaries • Reduce system management complexity • Reduce physical footprints • Consolidate many centers into fewer • Reduce infrastructure complexity • Improve facilities management • Reduce staffing requirements • Improve business resilience (manage fewer things better) • Improve operational costs Virtualization State-of-the-Art • Integrated power management • Direct liquid cooling • Combined heat and power Physical Consolidation Best Practices • Hot and cold aisles • Improved efficiency transformers, UPS, chillers, fans, and pumps • Free cooling Centralization Improved Operations • Conservation techniques • Infrastructure energy efficiency • Improved airflow management Facility Infrastructure Energy Efficiency Strategy

  21. Integration of Building, Environment, Facilities & IT Infrastructure IT Infrastructure Monitoring and Control Monitoring and Control Thermal Servers • Mainframe / Midrange • Distributed (Unix & Intel) • Blade Servers Facility Cooling Alternative Power Backup Battery Data Network IT Cooling IT Power Electric Services H2O Storage • SATA Disk Array • Tape & Optical • Blended Water Cooling Towers Economizer Pumps Network • Account / Corporate • Modem, Router, Hub • Bandwidth & Encryption • Integrated Blade/Switch Utility Chillers Computer Room Air Conditioning Rates, Incentives Substation Communicating Revenue Meter Servers Power • Floor Mount PDUs • Floor Mount PDU panels • Floor Mount PDU circuits • Rack Mount PDUs / IPDUs • Circuit Amps, Watts, Routing Thermal • Heat eXchangers • Liquid Cooled Racks • Perimeter & In-row AHU • Cooling Distribution Units • Liquid Cooled IT Equipment • MMT thermodynamic sensors Storage $ Network Generator Parallel or Transfer Eqpt Power Distribution Units Medium Voltage>600VAC Eqpt Data Center Raised Floor Central UPS Switchgear Low Voltage600VAC Eqpt Power CHP Fuel Cell, MicroTurbine or Turbine DC Power • Correlation of facility & IT assets with real-time monitoring enables space, power, and thermal optimization!

  22. Physical Command Center Design Considerations

  23. Natural Disasters and Conflict Create Great Risk

  24. 24 x 7 operational requirements must be supported to be effective • Monitoring infrastructure accelerates data protection capabilities • Shared large screen dashboard information enables team approach to proactive solutions • Improved verbal and visual communications promote faster solutions • During Abnormal Situations – Management should be Normal

  25. ECC Physical Environment and Design Considerations Optimal Planning of an ECC - DC Operations physical environment requires knowledge within multiple physical and logical components and disciplines that you might not expect. • New research and technology • Site selection criteria • Human centered design • Acoustic theory • Lighting • Interior design • Color theory • EMF • AutoCAD • Optical illusions • Aromatherapy • Circadian rhythms • Ergonomics • Psychology • Physical Security • Adjacency • Mind Maps • Facility Infrastructure • Computers • Servers • KVM • Audio • Video • Display technology • Communications workflow • Phone systems • Basic electronics theory • Architectural standards • Construction • HVAC • Sensor technology • Software • Dashboard design • Windows, Unix, Linux • Midrange, mainframe • ITIL • Metrics

  26. Basic Acoustics Sound waves bounce around a room and concentrate in corners, reflecting a stronger wave. This can happen hundreds of times until the sound wave’s energy is absorbed or dissipated. ECC Physical Boot Camp Training

  27. Ergonomic Studies • Console Design • Console Location • Screen Height Off Floor • Projected Light Path • Staff adjacency • Egress • Communications • Man-machine interface • Tabletop height • Keyboard/mouse location • View & distance to Information Display Wall Ergonomic Studies play an important role in maximizing the ECC

  28. Large Screen Displays Types • Projectors • Cubes • Flat screen • New technologies

  29. The Human Eye • The sense of vision in humans is only slightly faster than our hearing. • We can’t see in the dark. • Every human sees color differently, which makes sight a subjective reference. • Approximately 30% of the population has some form of color blindness. • Approximately 7% of the population is totally color blind.

  30. Seat Map IBM design incorporates seat mapping to enhance work productivity and related tasks Seat mapping related tasks and escalation paths provides better communication for more rapid issue resolution

  31. Technical Console Furniture What furniture provides complete support for your business and staff?

  32. Operator Console Design IBM has studied many areas of human engineering • IBM Console design is developed to maximize attentive productivity • Interface with logical tools and hardware • Design to the tasks and consolidation of any and all systems • Ensure proper screen size and glass area match the tasks • Hardware configuration and remote connectivity Technical Console Furniture, Operator hardware, and staff ergonomics are key factors in Operator Console Design

  33. Seating Systems Which provides the best support for staff productivity?

  34. Command Center Lighting Light needs to be controlled • Too bright reduces the visual communications ability of the large screen displays • Too dim creates environments prone to mistakes • Should you use direct or indirect lighting? • Should you use task lighting, and what type? • Should you consider LED lighting? • Let’s look at these rooms and consider the room lighting

  35. Acoustic Room Examples Let’s look at these rooms and consider the room acoustics • A perfect square does not promote communication • A parabolic dish focuses sound and amplifies noise back into the environment • Equipment rooms next door have noisy equipment, which causes communication issues • Noise masking systems add noise and are not effective in these environments

  36. Resolution The display system is the visual dashboard; it is a key component. The number of applications monitored will determine the number of screens and resolution required. Projectors make a big picture, but if you need to display 10 applications the resolution will not support a good image. Display Technologies Visual Acuity • Using a border around data, humans read it 26% faster. • Using a second color to display data, humans read it 78% faster. Based on a 1986 study by the Pennsylvania College of Optometry.

  37. Display Technologies What technologies provide the best systems and environmental information to my staff?

  38. IBM’s Global Command Centers History

  39. Global Systems Operations Command Center Monitoring System and application availability and recovery Production Services Library and Batch Management, Code Promotion and Softcopy Services Recovery Management Business Impact Mitigation – Recovery Leadership and Notification Tactical and Strategic Efficiencies Consolidation of Command Center Footprint Standardization across all Command Centers Global resource deployment Enterprise delivery optimization – Global Delivery Framework Global Command Center Operations Managed Customer/IBM Assets 470 Data Centers 1,200 Mainframe Servers 204,000 Open System Servers 1 Billion Tivoli Alert Notifications Per Month Dublin, Ireland Calgary, Canada Brno, Czech Rep Shenzhen, China Boulder, CO Chennai, India Tokyo, Japan Raleigh, NC Bangalore, India Johannesburg, South Africa Hortolandia, Brazil Buenos Aires, Argentina Sydney, Australia

  40. 2010 – Data Center Operations & Management Services Launched 2008 Global ECC Health Check Workshop Launched 2006-2007 ECC Consolidation reduces footprints Why IBM? 25+ Years ECC Solutions - Focused Global Service Line – One Governing Body for the entire world 2006 LEAN deployed, 20% Year over Year Savings Americas Service Line – Governance by Continent Roadmap to Future State Any-to-Any ECC functions & Capabilities 2002 Cross Key Enabler Teams Process, Tools, Skills, Metrics, etc. 1992 Design/Build/Automate => Boulder ECC Geoplex – ECC Solutions Team 2005 Cross Platform & Industry Stratified Resource Model 1989 – SDS - ISSC Clients Outsource Systems “Cost Savings” from Tower Support - DC-ECC Consulting 2000 WWMO => WWAO Automation & Focal Point Cross Platform, Process & Globally Shared Resources Service Delivery Centers – Consolidated Geographic Command Center Governance 1983 – FE Boulder Automated Operations 1996 Distributed Services (ECC Consolidations) ECC Design & Automation Team Geoplexes – Disparate Autonomous Command Centers

  41. The IBM Boulder ECC is one of several command centers in Boulder (others include Intrusion Detection, World Wide Command Center, and Production Control). All are configured similarly, but the others utilize alternate large screen technology. On average, we are monitoring over 20,000 servers, 8 million batch jobs per month, and 70 thousand MIPS for 200 commercial accounts with fewer than 90 operators. From a global perspective we are watching roughly seven times (7x) that number. 90% of the devices/applications monitored are not physically located on the Boulder site, but are geographically and globally dispersed (other IBM sites, customer data centers, etc.). If you enter any one of our global command centers you will see nearly identical process, technology, skills, and metrics. We are working towards global standardization and virtualization. This Command Center contains monitoring for AS400, DS, MF and e-business environments. Boulder Operations is the largest SSO Command Center in North America, and the first Command Center in North America to implement the GDF delivery model. Our Command Center is currently in a rapid growth cycle driven by new accounts, GDF implementation and site consolidation. The Command Center has been able to achieve double-digit productivity gains for the last 4 years. IBM Enterprise Command Centers: Leading the way through innovation

  42. Rationalize IBM’s Enterprise Command Centers: Smart transformation has delivered cost savings and operational efficiency • ECC Mega-Plex • Best in Class <5 hrs = 99.95% unplanned <12 hrs planned – Source Gartner 2009 • Total number of operators • Approximate average ratio • Instances reduced through Focal Point sign-on

  43. Clients experience dramatic improvements in availability and problem resolution driven by predictive analytics • 1 Billion events per month (sample events: File system warning, Web Server non-responsive, Failed network device, Server Host is down) • Policy database: 20,000 policies • 50% Filter Noise and Superfluous Events: Aggregate for future analysis • 10% Send to Operator • 20% Automatically Correct: System takes action to resolve incident • 20% Automatically Ticket: Send incident to specialist for disposition • Faster response to critical business issues impacting service • Prevented a large financial services company from risking personal investments of their clients • Poor performing server was not allowing transactions to complete • Automation identified and corrected the problem – without manual intervention
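
The event funnel on this slide (1 billion events per month, split 50% filtered, 20% automatically corrected, 20% automatically ticketed, 10% sent to an operator) can be turned into monthly volumes with a few lines of arithmetic. This is a minimal sketch of that arithmetic only; the dictionary keys and function name are hypothetical, and it does not represent how the policy database or the automation actually classifies events.

```python
# Illustrative only: monthly event volumes implied by the slide's percentages.
# Key names and the function name are assumptions, not from the slides.
DISPOSITION_PERCENT = {
    "filtered_as_noise": 50,   # aggregated for future analysis
    "auto_corrected": 20,      # system takes action to resolve the incident
    "auto_ticketed": 20,       # sent to a specialist for disposition
    "sent_to_operator": 10,    # needs a human in the command center
}


def split_events(total_events: int, shares: dict = DISPOSITION_PERCENT) -> dict:
    """Split a total event count by percentage share (integer arithmetic)."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must add up to 100 percent")
    return {name: total_events * pct // 100 for name, pct in shares.items()}


monthly = split_events(1_000_000_000)
for name, count in monthly.items():
    print(f"{name:>18}: {count:,}")
```

Read this way, roughly 100 million of the 1 billion monthly events reach an operator, while the remaining 900 million are filtered or handled automatically.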

  44. Client “Green” ECC

  45. IBM Boulder Command Center - 1992

  46. IBM Boulder Command Center - 2012

  47. By adopting enterprise command center health check workshop principles from IBM, many businesses have gained significant improvements for their command centers. • A major international energy company saved more than US$1.5 million through the initiation of global delivery. • A major IT company reduced complexity through the elimination of the need to log on to 89 separate ticketing tools, resulting in improved productivity. • A major financial institution increased their management span of control by nearly 60 percent, resulting in improved efficiency. • A major financial company was able to consolidate the number of command centers from seven to two, helping to reduce complexity within their environment. • IBM achieved over US$10 million in labor cost savings through consolidation, optimization and standardization of command centers. • IBM improved availability by 49 percent through a reduction in the number of high-severity incidents by optimizing tools, processes and governance.

  48. The Next Steps… • Identify executive champion to lead a Health Check Workshop engagement • Validate senior executive support for a data center operations focused optimization program • Assemble Data Center operations and Command Center support team: Command Center Manager, IT directors, and finance representation *IBM can assist you in identifying the appropriate personnel to participate in this process • Identify your Command Center operations goals • What are your biggest data center operations issues? • What direction would you like to take your enterprise command center? (consolidation, standardization, “Best Practices”, etc.) • Determine key stakeholders • Specify objectives of the ECC Health Check Workshop • Identify key success factors • Understand scope, level of effort and timeframe • Identify physical and logistical requirements

  50. Thank you for your time today. For any additional information please contact: • Rodney Ogata, ECC Global Offering Executive & Owner • Phone: 1-808-222-2827 • e-mail: