Problem Management - PowerPoint PPT Presentation



    Presentation Transcript
    1. AAA NCNU Information Technology, IT Operations: Problem Management • Jim Heronime, Manager, ITSM Program • Tanya Friehauf-Dungca, Manager, Problem Management • 2/17/11

    2. Agenda • PM Overview • History • Vision & Mission • Operational Level Agreement (OLA) • Action Items • Trending (Proactive Problem Management) • Facilitated Meetings (MIR & ToE) • KPIs and Metrics • Future Initiatives • Questions? (Problem Management Team Members)

    3. Problem Management Overview • Main goal of Problem Management: detecting the underlying causes of incidents, then resolving those causes and preventing the incidents from recurring • Problem Management ensures: the identification and classification of problems, root cause analysis, and the resolution of problems • The Problem Management process also includes: formulating recommendations for improvement, maintaining problem records, and reviewing the status of corrective actions

    4. History of PM at AAA • 2008: began our formal Problem Management practice: tracked major incidents, identified root cause for major incidents, and stored the information in a rudimentary MS Access database • June 2009: began formal implementation of ITSM. At that point, root cause was found for an average of 55.4% of problems, and mean time to close problems was 6 days • October 2009: implemented the current iteration of Problem Management. By January 2010, root cause was found for an average of 83% of problems, and mean time to close problems was 3 days • We continue to mature our process
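The two history figures (share of problems with a root cause found, and mean time to close) could be computed from problem records roughly as follows. This is a minimal sketch, assuming a hypothetical `ProblemRecord` with open and close dates plus a root-cause flag, and assuming time to close is counted in calendar days; the slides do not say how either figure was actually calculated.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class ProblemRecord:
    """Hypothetical problem record; the real record layout is not described in the slides."""
    opened: date
    closed: date
    root_cause_found: bool


def root_cause_rate(records: list[ProblemRecord]) -> float:
    """Percentage of problem records with an identified root cause."""
    return 100 * sum(r.root_cause_found for r in records) / len(records)


def mean_days_to_close(records: list[ProblemRecord]) -> float:
    """Mean calendar days from opening to closing (assumption: calendar, not business, days)."""
    return mean((r.closed - r.opened).days for r in records)
```

For two records closed after 2 and 4 days, with root cause found for one of them, this gives a 50% root cause rate and a mean time to close of 3 days.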

    5. Vision and Mission • VISION: to permanently eliminate problems in our production environment and prevent new problems from occurring • MISSION: to aggressively identify the root cause of problems and drive permanent solutions that stabilize our IT infrastructure • We do this by: • PROCESSES: ensuring IT support teams follow PM processes and procedures • ACTION ITEMS: managing assigned action items and their timeframes with support teams to drive permanent solutions • ROOT CAUSE: driving root cause identification within OLA timeframes

    6. OLAs for PM • Be aggressive: 3 business days to identify root cause • A report lets us track progress against the OLA daily
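The 3-business-day OLA can be expressed as a deadline calculation. This is a minimal sketch, assuming the count starts on the day after the problem is opened, that only Saturdays and Sundays plus an optional, caller-supplied holiday set are excluded, and that the function names are hypothetical; the slides do not define the exact counting rules.

```python
from datetime import date, timedelta

OLA_BUSINESS_DAYS = 3  # slide 6: 3 business days to identify root cause


def add_business_days(start: date, days: int, holidays: frozenset = frozenset()) -> date:
    """Step forward one day at a time, counting only Monday-Friday dates not in `holidays`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        # date.weekday(): Monday is 0, Sunday is 6
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


def root_cause_within_ola(opened: date, identified: date,
                          holidays: frozenset = frozenset()) -> bool:
    """True if root cause was identified on or before the OLA deadline."""
    return identified <= add_business_days(opened, OLA_BUSINESS_DAYS, holidays)
```

Under these assumptions, a problem opened on Thursday, 17 February 2011 is due by Tuesday, 22 February; if Monday, 21 February were treated as a holiday, the deadline would move to Wednesday, 23 February.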

    7. Action Items • Objective: identify and assign action items that drive permanent solutions • Types of action items: root cause identification for every problem created from an incident; areas of improvement (documentation, process improvement & training, vendor management, hardware replacement) • How are action items identified? Through incident management activities; problem management activities (root cause analysis); and meetings: the Daily IT Operations Meeting, Major Incident Reviews (MIR), and Team of Experts (ToE) sessions • How are they tracked? In Maximo, a system integrated with Change, Incident, and Asset management
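Tracking action items against their due dates comes down to a simple record plus an overdue check. This is a minimal sketch, assuming a hypothetical `ActionItem` record whose types mirror the slide's list; it is not Maximo's data model or API, and the field names are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class ActionType(Enum):
    """Action item types named on slide 7."""
    ROOT_CAUSE = "Root cause identification"
    DOCUMENTATION = "Documentation"
    PROCESS = "Process improvement & training"
    VENDOR = "Vendor management"
    HARDWARE = "Hardware replacement"


@dataclass
class ActionItem:
    """Hypothetical action item record (not Maximo's actual schema)."""
    problem_id: str
    kind: ActionType
    owner: str
    due: date
    completed: Optional[date] = None


def overdue(items: list[ActionItem], today: date) -> list[ActionItem]:
    """Return open action items whose due date is already past."""
    return [item for item in items if item.completed is None and item.due < today]
```

A daily run of `overdue` gives the Problem Management team a follow-up list for support teams whose action items have slipped.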

    8. Trend Analysis (Proactive Problem Management) • Objective: analyze related incidents for common root causes • Collaboration with the Operations Bridge: weekly work sessions to identify potential areas of concern, in which the Problem Management team reviews related incidents for common symptoms, causes, or conditions • Commonalities identified by trend analysis? A Global Problem record is created and assigned to the Service Owner, with action items assigned as appropriate • Service Owner analysis: the Service Owner prioritizes their efforts, decides whether to pursue root cause identification, and prioritizes the work and obtains business approval for funding and scheduling
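A first mechanical pass at trend analysis is to group incidents that share an attribute and flag groups that recur often enough to warrant a Global Problem record. This is a minimal sketch, assuming incidents are plain dicts with an `id` field and that exact matching on one field (such as a symptom or configuration item) is good enough for a first cut; the threshold of 3 is an assumption, and the weekly review described on the slide remains a human judgment.

```python
from collections import defaultdict


def find_trends(incidents: list[dict], key: str, threshold: int = 3) -> dict[str, list[str]]:
    """Group incident IDs by a shared attribute; keep groups with at least `threshold` incidents."""
    groups: dict[str, list[str]] = defaultdict(list)
    for incident in incidents:
        groups[incident[key]].append(incident["id"])
    # Each surviving group is a candidate for a Global Problem record
    return {value: ids for value, ids in groups.items() if len(ids) >= threshold}
```

Grouping on a single exact value is deliberately simple; near-duplicate symptoms worded differently would still need to be spotted in the work sessions.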

    9. Major Incident Review (MIR) • What is it? An evaluation of the incident process after a major incident • What is its purpose? To validate the details of the incident record; review incident handling and identify opportunities for improvement; identify lessons learned and share them across the enterprise; and identify action items • When is one required? Mandatory for all Severity 1 incidents; for lower severities, by request or as needed • Why does Problem Management facilitate the Major Incident Review? It provides an unbiased view of events, because Problem Management is not involved in the incident call
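The MIR trigger rule, and the related KPI of MIRs conducted for Sev1 incidents, can be written down directly. This is a minimal sketch, assuming severities are integers with 1 as the highest, that "by request or as needed" is collapsed into a single `requested` flag, and that incidents are dicts with hypothetical `severity` and `mir_held` fields.

```python
from typing import Optional


def mir_required(severity: int, requested: bool = False) -> bool:
    """Severity 1 always requires an MIR; lower severities only when requested or judged necessary."""
    return severity == 1 or requested


def mir_compliance(incidents: list[dict]) -> Optional[float]:
    """Percentage of Severity 1 incidents that had an MIR; None if there were no Sev1 incidents."""
    sev1 = [i for i in incidents if i["severity"] == 1]
    if not sev1:
        return None  # avoid dividing by zero when a period has no Sev1 incidents
    return 100 * sum(i["mir_held"] for i in sev1) / len(sev1)
```

Returning `None` rather than 0% or 100% for a period with no Severity 1 incidents keeps an empty period from distorting the KPI.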

    10. MIR Agenda

    11. MIR Template

    12. Team of Experts (ToE) • What is it? A special team of technical subject matter experts (SMEs) assembled to analyze and resolve critical problems at an accelerated pace, minimizing or eliminating exposure • How long has this process been in place? It is one of our newest additions, in place since December 2010 • Why are ToEs initiated? When teams are not engaging each other collaboratively; when root cause must be identified immediately, such as after back-to-back incidents; and when leadership requests information and status on critical or chronic problems

    13. ToE (cont.) • ToE Activities • Root cause analysis • Brainstorm solutions and permanent fixes • Assign action items and due dates • Where’s the template? • Currently under construction

    14. KPIs and Metrics • KPIs: root cause identified within OLA; MIRs conducted for Sev1 incidents • Operational metrics: total problems by severity; problems by causing party; outages by domain (Applications, Network, Security, Servers, Telecom, or Other)
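The operational metrics are straightforward group-and-count reports. This is a minimal sketch, assuming problem and outage records are dicts with hypothetical `severity`, `causing_party`, and `domain` fields, and assuming any domain outside the slide's list should be reported as "Other".

```python
from collections import Counter

# Outage domains listed on slide 14
DOMAINS = ("Applications", "Network", "Security", "Servers", "Telecom", "Other")


def count_by(records: list[dict], field: str) -> Counter:
    """Count records per value of `field`, e.g. 'severity' or 'causing_party'."""
    return Counter(record[field] for record in records)


def outages_by_domain(outages: list[dict]) -> Counter:
    """Count outages per domain, folding any unlisted domain into 'Other'."""
    return Counter(o["domain"] if o["domain"] in DOMAINS else "Other" for o in outages)
```

`Counter.most_common()` then orders any of these reports from the largest category down.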

    15. KPIs *Baseline, determined from internal historical data: 82% *No industry standard exists for this KPI

    16. KPI Details *2010 average for root cause (RC) identified within OLA: 85.7%
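The "root cause identified within OLA" KPI can be compared with the 82% baseline as follows. The slides do not say whether the 2010 figure of 85.7% pools all problems or averages monthly percentages, so this minimal sketch, which assumes one boolean "within OLA" flag per problem, shows both; the function names are hypothetical.

```python
from statistics import mean

BASELINE_PCT = 82.0  # slide 15: baseline from internal historical data


def pct_within_ola(flags: list[bool]) -> float:
    """Percentage of problems whose root cause was identified within the OLA (pooled)."""
    return 100 * sum(flags) / len(flags)


def average_of_months(monthly_flags: list[list[bool]]) -> float:
    """Average of monthly percentages, given one list of within-OLA flags per month."""
    return mean(pct_within_ola(month) for month in monthly_flags)


def meets_baseline(pct: float, baseline: float = BASELINE_PCT) -> bool:
    """True if a KPI value is at or above the baseline."""
    return pct >= baseline
```

The two methods can disagree when monthly volumes differ: months of 3/4 and 2/2 average to 87.5%, while pooling the same six problems gives about 83.3%.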

    17. Examples of Metrics (chart labels: Change Freeze; AT&T; AAA NCNU)

    18. Future Initiatives • Workarounds and defects – Known Error Database • Action item validation – quality check on completed actions • ToE template development

    19. Questions? • PROBLEM MANAGEMENT TEAM MEMBERS • Mark Hernandez - IT Service Transition Analyst V • Gessica Briggs-Sullivan – IT Service Transition Analyst III • Andrew Egan - Intern