AI Approaches to Network Fault Management
Andrew Learn, 29 Nov 2001

Outline
  • Fault Management Process
  • AI Approaches
    • Expert Systems
    • Neural Networks
    • Case-based Reasoning
Network Faults
  • Hardware
    • Wear and tear
    • Cut cables
    • Improper installation
  • Software
    • Incorrect design
    • Bugs
    • Incorrect data (e.g. routing tables)
Fault Management Process
  • Collect alarms
  • Filter and correlate alarms
  • Diagnose faults
  • Restoration and repair
  • Evaluate effectiveness
1. Collect Alarms
  • Types of alarms
    • Physical: Failure in communication
      • e.g. loss of signal, CRC failure
    • Logical: Statistical values exceed threshold
      • e.g. number of packets dropped
  • Communication with components
    • Control protocol: Simple Network Management Protocol (SNMP)
    • Data format: Management Information Base (MIB-II, 1990) has ~170 manageable objects
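A minimal Python sketch of the two alarm types, assuming a poller has already read a dropped-packet counter twice (for example over SNMP). The names `Alarm`, `logical_alarm`, `physical_alarm`, the source "if3", and the threshold are hypothetical; the one SNMP-specific detail is that a 32-bit Counter wraps back to zero, so the per-interval increase is taken modulo 2^32.

```python
from dataclasses import dataclass
from typing import Optional

COUNTER32_MODULUS = 2**32  # SNMP Counter (Counter32) values wrap around modulo 2^32


@dataclass
class Alarm:
    kind: str    # "physical" or "logical"
    source: str  # hypothetical component name, e.g. "if3"
    detail: str


def counter_delta(prev: int, curr: int) -> int:
    """Increase of a wrapping 32-bit counter between two polls."""
    return (curr - prev) % COUNTER32_MODULUS


def logical_alarm(source: str, prev: int, curr: int, threshold: int) -> Optional[Alarm]:
    """Logical alarm: raised when a statistic's increase over one poll interval exceeds a threshold."""
    delta = counter_delta(prev, curr)
    if delta > threshold:
        return Alarm("logical", source, f"{delta} packets dropped (threshold {threshold})")
    return None


def physical_alarm(source: str, signal_present: bool) -> Optional[Alarm]:
    """Physical alarm: raised on a communication failure such as loss of signal."""
    if signal_present:
        return None
    return Alarm("physical", source, "loss of signal")


# The counter wrapped between polls: 6 increments before the wrap plus 120 after it.
print(logical_alarm("if3", 4294967290, 120, threshold=100))
```

Without the modulo, the wrapped counter would produce a huge negative delta and the logical alarm would be silently missed.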
Sample MIB Entry
    ipInReceives OBJECT-TYPE
        SYNTAX  Counter
        ACCESS  read-only
        STATUS  mandatory
        DESCRIPTION
                "The total number of input datagrams received from
                interfaces, including those received in error."
        ::= { ip 3 }

  • Sample SNMP “get” call

    snmpget netdev-kbox.cc.cmu.edu public system.sysUpTime.0
    Name: system.sysUpTime.0 Timeticks: (2270351) 6:18:23
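The `Timeticks` value in the reply counts hundredths of a second since the agent's network management software was last (re)initialized, which is how 2270351 becomes 6:18:23. A small Python check of that arithmetic; the helper name `format_timeticks` is hypothetical, and it drops the fractional second and does not roll hours over into days.

```python
def format_timeticks(ticks: int) -> str:
    """Render SNMP TimeTicks (hundredths of a second) as H:MM:SS."""
    total_seconds = ticks // 100          # drop the hundredths
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


print(format_timeticks(2270351))  # prints 6:18:23
```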

2. Filter and Correlate Alarms
  • Filter
    • Eliminate redundant alarms
    • Suppress noncritical alarms
    • Inhibit low-priority alarms in presence of high-priority alarms
  • Correlate
    • Analyze and interpret multiple alarms to assign new meaning (derived alarm)
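A minimal sketch of the filter and correlate steps, under several assumptions that are not from the slides: alarms carry a numeric priority (higher is more important), "redundant" means the same source and kind, inhibition is applied per source, and a correlation rule is simply a set of required (source, kind) pairs plus the derived alarm it produces. The alarm labels BS1, BSC, and ML-1 are borrowed loosely from the cellular example later in the deck; the rule itself is hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Alarm:
    source: str
    kind: str
    priority: int  # hypothetical scale: higher number = more important


def filter_alarms(alarms: List[Alarm], min_priority: int = 2) -> List[Alarm]:
    # 1. Eliminate redundant alarms: keep the first alarm of each (source, kind) pair.
    seen, unique = set(), []
    for a in alarms:
        if (a.source, a.kind) not in seen:
            seen.add((a.source, a.kind))
            unique.append(a)
    # 2. Suppress noncritical alarms below a priority floor.
    critical = [a for a in unique if a.priority >= min_priority]
    # 3. Inhibit lower-priority alarms from a source that also has a higher-priority alarm.
    top = {}
    for a in critical:
        top[a.source] = max(top.get(a.source, a.priority), a.priority)
    return [a for a in critical if a.priority == top[a.source]]


Rule = Tuple[FrozenSet[Tuple[str, str]], Alarm]


def correlate(alarms: List[Alarm], rules: List[Rule]) -> List[Alarm]:
    """Emit a derived alarm for every rule whose required (source, kind) pairs are all present."""
    present = {(a.source, a.kind) for a in alarms}
    return [derived for required, derived in rules if required <= present]


raw = [
    Alarm("BS1", "loss of signal", 3),
    Alarm("BS1", "loss of signal", 3),    # redundant duplicate
    Alarm("BS1", "high temperature", 1),  # noncritical
    Alarm("BSC", "link down", 3),
    Alarm("BSC", "CRC errors", 2),        # inhibited by the BSC "link down" alarm
]
rules = [(frozenset({("BS1", "loss of signal"), ("BSC", "link down")}),
          Alarm("ML-1", "derived: ML-1 fault", 4))]
kept = filter_alarms(raw)
print(correlate(kept, rules))
```

Five raw alarms shrink to two after filtering, and correlation replaces those two with one derived alarm that names a probable cause.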
3. Diagnose Faults
  • May require additional tests/diagnostics on circuits or components
    • Automated or manual
  • Analyze all info from alarms, tests, performance monitoring
  • Identify smallest system module that needs to be repaired or replaced
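One simple way to make "smallest system module" concrete, assuming all implicated components stem from a single fault and the equipment forms a strict containment hierarchy, is to take the lowest common ancestor of the suspect components. This is an illustrative heuristic rather than a method from the slides; the hierarchy and the names `path_to_root` and `smallest_common_module` are hypothetical.

```python
from typing import Dict, List, Optional


def path_to_root(component: str, parent: Dict[str, str]) -> List[str]:
    """Component followed by each enclosing module up to the top of the hierarchy."""
    path = [component]
    while component in parent:
        component = parent[component]
        path.append(component)
    return path


def smallest_common_module(suspects: List[str], parent: Dict[str, str]) -> Optional[str]:
    """Deepest module that contains every suspect component (their lowest common ancestor)."""
    if not suspects:
        return None
    paths = [path_to_root(s, parent) for s in suspects]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first suspect, the first shared node is the smallest such module.
    for node in paths[0]:
        if node in common:
            return node
    return None


# Hypothetical equipment hierarchy: ports sit on line cards, line cards sit in a router.
parent = {
    "port1": "linecard-A", "port2": "linecard-A", "port3": "linecard-B",
    "linecard-A": "router-R1", "linecard-B": "router-R1",
}
print(smallest_common_module(["port1", "port2"], parent))  # prints linecard-A
```

If the single-fault assumption does not hold, the answer drifts toward a needlessly large module, which is one reason the additional tests mentioned above are still needed.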
4. Restoration and Repair
  • Restoration: Continue service in presence of fault
    • Switch over to spares
    • Reroute around trouble spot
    • Restore software or data from backup
  • Repair
    • Replace parts
    • Repair cables
    • Debug software
  • Retest to verify fault is eliminated
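"Reroute around trouble spot" can be sketched as a fewest-hop path search that ignores failed links. This is a minimal breadth-first-search version assuming undirected, unweighted links; real restoration would also weigh capacity and cost. The topology and the function name `reroute` are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Link = Tuple[str, str]


def reroute(links: Iterable[Link], src: str, dst: str, failed: Iterable[Link]) -> Optional[List[str]]:
    """Fewest-hop path from src to dst over undirected links, avoiding every failed link."""
    blocked = {frozenset(link) for link in failed}
    graph: Dict[str, List[str]] = {}
    for a, b in links:
        if frozenset((a, b)) in blocked:
            continue  # skip the trouble spot
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    prev = {src: None}  # breadth-first search tree
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # no surviving route: rerouting alone cannot restore service


links = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")]
print(reroute(links, "A", "D", failed=[]))            # prints ['A', 'B', 'D']
print(reroute(links, "A", "D", failed=[("D", "B")]))  # prints ['A', 'C', 'D']
```

A `None` result is the case where restoration must fall back to spares or wait for repair.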
5. Evaluate Effectiveness
  • Questions to answer:
    • How often do faults occur?
    • How many faults affect service?
    • How long is service interrupted?
    • How long does repair take?
  • Provides assessment of:
    • Performance of fault management system
    • Reliability of equipment
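The four questions map onto simple statistics over a fault log. A minimal sketch, assuming each hypothetical `FaultRecord` stores detection, restoration, and repair times in hours and a service-affecting flag, and that repair time is measured from detection (other conventions exist):

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class FaultRecord:
    detected_h: float        # hours from start of reporting period
    restored_h: float        # service restored (hypothetical field)
    repaired_h: float        # repair completed and retested (hypothetical field)
    service_affecting: bool


def evaluate(records: List[FaultRecord], period_h: float) -> Dict[str, float]:
    affecting = [r for r in records if r.service_affecting]
    return {
        # How often do faults occur?
        "faults_per_1000h": 1000 * len(records) / period_h,
        # How many faults affect service?
        "service_affecting": len(affecting),
        # How long is service interrupted? (mean over service-affecting faults)
        "mean_outage_h": mean(r.restored_h - r.detected_h for r in affecting) if affecting else 0.0,
        # How long to repair? (mean time to repair, measured from detection)
        "mttr_h": mean(r.repaired_h - r.detected_h for r in records) if records else 0.0,
    }


log = [
    FaultRecord(10.0, 11.0, 14.0, True),
    FaultRecord(200.0, 200.0, 202.0, False),
    FaultRecord(500.0, 503.0, 506.0, True),
]
print(evaluate(log, period_h=1000.0))
```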
AI Approaches to Fault Management
  • Well-developed approach:
    • Expert systems
  • New approaches:
    • Neural networks
    • Case-based reasoning
    • Other
Why AI?
  • Need for intelligence
    • Data analysis
    • Pattern recognition
    • Clustering and categorization
    • Problem solving
  • Need for automation
    • Manual analysis/solution takes time
    • Limited manpower
    • Limited expertise
Well-developed approach: Expert Systems
  • Expert systems = Rule-base + Working Memory
  • Three parts to rules:
    • Context trigger (when should rule be considered)
    • Condition (if X ...)
    • Conclusion (... then Y)
  • Used since the 1980s by major telecom companies
    • Bell: Automated Cable Expertise (ACE) system
    • GTE: Central Office Maintenance Printout Analysis & Suggestion System (COMPASS)
    • AT&T: Network Management Expert System (NEMESYS)
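The three-part rule structure and the working memory can be sketched as a small forward-chaining loop. This is a toy illustration under stated assumptions (facts are plain strings, each rule adds one fact), not how ACE, COMPASS, or NEMESYS were built; the rule base below is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass
class Rule:
    name: str
    context: str                           # context trigger: fact that makes the rule worth considering
    condition: Callable[[Set[str]], bool]  # "if X ..." evaluated against working memory
    conclusion: str                        # "... then Y": fact asserted into working memory


def forward_chain(rules: List[Rule], facts: Iterable[str]) -> Set[str]:
    """Fire eligible rules repeatedly until working memory stops changing."""
    memory = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.context in memory and rule.conclusion not in memory and rule.condition(memory):
                memory.add(rule.conclusion)
                changed = True
    return memory


# Hypothetical rule base
rules = [
    Rule("cable-cut", "alarm:loss-of-signal",
         lambda m: "test:loopback-fails" in m, "diagnosis:cable-cut"),
    Rule("dispatch", "diagnosis:cable-cut",
         lambda m: True, "action:dispatch-repair-crew"),
]
print(sorted(forward_chain(rules, {"alarm:loss-of-signal", "test:loopback-fails"})))
```

The sketch also shows the brittleness discussed next: a symptom that no rule's context or condition anticipates simply produces no conclusion.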
Need for New Approaches
  • Weaknesses of expert systems
    • Brittle in unforeseen situations
    • Cannot learn from experience
    • Hard to maintain (adding/deleting/modifying rules)
    • Knowledge acquisition bottleneck
    • Can’t handle incomplete or probabilistic data
  • Factors driving new approach
    • Rapidly changing technology
    • Dynamic network topology
    • Network complexity
    • Competition, demand for QoS
Neural Nets
  • Structure: input, hidden, output layers
  • Training
    • Supervised: Input pattern & desired output
    • Unsupervised: Clustering of similar inputs

[Figure: network with input, hidden, and output layers connected by weights]
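A minimal sketch of the input, hidden, and output structure as a forward pass with sigmoid units, written in plain Python. The weights are hand-set and hypothetical; in practice supervised training (for example backpropagation on input patterns paired with desired outputs) would learn them, which is not shown here.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs: List[float], weights: Matrix, biases: List[float]) -> List[float]:
    """One fully connected layer: each unit applies a sigmoid to a weighted sum of its inputs."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs: List[float], hidden_w: Matrix, hidden_b: List[float],
            out_w: Matrix, out_b: List[float]) -> List[float]:
    """Input layer -> hidden layer -> output layer."""
    return layer(layer(inputs, hidden_w, hidden_b), out_w, out_b)


# Hypothetical hand-set weights: 3 alarm inputs (1 = present), 2 hidden units, 1 output.
hidden_w = [[4.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
hidden_b = [-6.0, -2.0]
out_w = [[6.0, -3.0]]
out_b = [-2.0]
print(forward([1.0, 1.0, 0.0], hidden_w, hidden_b, out_w, out_b))
```

With these weights the output is high only when the first two alarms appear together, a small example of pattern matching on alarm combinations.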

Neural Nets
  • Advantages
    • Pattern matching & generalization
    • Fast & efficient
    • Trainable
    • Handles incomplete, ambiguous data
  • Disadvantages
    • Black box
    • Lack of training data
Neural Net Example
  • Example: Alarm correlation in cell phone networks (Univ of Hannover, Germany)

[Figure: cellular network showing the Maintenance Center (MC), Switching Centers, Base Station Controller (BSC), Base Stations (BS1, BS2), microwave links, and mobile units]

Neural Net Example
  • Test Results:
    • 94 alarms
    • 99.76% correct classification with up to 25% noise

[Figure: alarm patterns (BSC alarms, BS-1 alarms, BS-2 alarms, ...) mapped to their initial cause (ML-1 fault, ML-2 fault, ...)]

Case-Based Reasoning
  • Case-based reasoning = matching previous examples
    • Case library: Set of previous faults, diagnoses, solutions
    • Usually based on “trouble ticket” help-desk databases
  • Design considerations:
    • What are key attributes of a case?
    • What attributes will be used to index & access a case?
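The two design questions (which attributes describe a case, and which are used for retrieval) can be sketched as weighted attribute matching over a case library. This is one simple similarity measure chosen for illustration, not the measure behind the similarity score in the example that follows; the attributes, weights, and cases are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Case:
    case_id: int
    attributes: Dict[str, str]  # key attributes of the trouble ticket
    solution: str


def similarity(query: Dict[str, str], case: Case, weights: Dict[str, float]) -> float:
    """Weighted fraction of the query's attributes that the stored case matches exactly."""
    total = sum(weights.get(k, 1.0) for k in query)
    matched = sum(weights.get(k, 1.0) for k, v in query.items() if case.attributes.get(k) == v)
    return matched / total if total else 0.0


def retrieve(library: List[Case], query: Dict[str, str],
             weights: Dict[str, float]) -> Optional[Tuple[float, Case]]:
    """Return (score, case) for the most similar stored case, or None if the library is empty."""
    return max(((similarity(query, c, weights), c) for c in library),
               key=lambda pair: pair[0], default=None)


# Hypothetical case library and attribute weights
library = [
    Case(1, {"problem_type": "Performance", "symptom": "high error rate", "line": "64kb"},
         "replace cabling"),
    Case(2, {"problem_type": "Connectivity", "symptom": "no access", "line": "T1"},
         "reset router"),
]
weights = {"symptom": 2.0}  # the symptom counts twice as much as other attributes
query = {"problem_type": "Performance", "symptom": "high error rate", "line": "T1"}
score, best = retrieve(library, query, weights)
print(best.case_id, score)  # prints 1 0.75
```

Learning by adding cases is then just `library.append(...)` once a new fault is resolved; an empty or unrelated library returns nothing useful, which is the "no help for first-time problems" weakness.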
Case-Based Reasoning
  • Advantages
    • Easier knowledge acquisition than expert systems
    • Can learn by adding new cases
    • Doesn’t require extensive maintenance
  • Disadvantages
    • Requires time-consuming user interaction
    • No help for first-time problems
Case-Based Reasoning Example

Case 134
  • Problem Type: Performance
  • Description: High error rate in comm between POA-SP & DF
  • No access: Intermittent

Retrieval: Case 103 [Similarity = 0.69]
  • Description: 64kb line from VendorX drops big datagrams.
  • Additional Info requested: Is there loss of big datagrams in ping test? (Result: Yes)
  • Cause: Link 34 inside Bldg 207 was defective
  • Solution: Vendor replaced cabling.

Summary of 3 AI Methods
  • Expert systems
    • If / then rules
    • Well-developed technology
    • Brittle, hard to maintain
  • Neural networks
    • Output = weighted transform of inputs
    • Fast pattern matching, robust to noise
    • Black box, lack of training data
  • Case-based systems
    • Trouble-ticket retrieval
    • Easy to build, maintain
    • Slower diagnosis; case library takes time to build up
Other Approaches
  • Bayesian networks
    • Model statistical probabilities and dependence of faults
  • Mobile intelligent agents
    • Independent software agents cooperate to collect info, suggest solutions
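A minimal Bayesian-network sketch: one fault node with several alarm children, assuming the alarms are conditionally independent given the fault state (a naive-Bayes-shaped network). All probabilities are hypothetical; the point is how Bayes' rule turns alarm observations into a fault probability.

```python
from typing import Dict, Tuple


def fault_posterior(prior: float,
                    likelihoods: Dict[str, Tuple[float, float]],
                    observed: Dict[str, bool]) -> float:
    """P(fault | observed alarms) in a two-level network fault -> alarm_1..alarm_n.

    likelihoods[a] = (P(a | fault), P(a | no fault)); alarms are assumed
    conditionally independent given the fault state.
    """
    p_fault, p_ok = prior, 1.0 - prior
    for alarm, present in observed.items():
        given_fault, given_ok = likelihoods[alarm]
        if present:
            p_fault *= given_fault
            p_ok *= given_ok
        else:
            p_fault *= 1.0 - given_fault
            p_ok *= 1.0 - given_ok
    return p_fault / (p_fault + p_ok)  # normalise (Bayes' rule)


# Hypothetical numbers for a "cable cut" fault and two alarms
likelihoods = {"loss_of_signal": (0.95, 0.05), "crc_errors": (0.80, 0.10)}
one = fault_posterior(0.01, likelihoods, {"loss_of_signal": True})
both = fault_posterior(0.01, likelihoods, {"loss_of_signal": True, "crc_errors": True})
print(round(one, 3), round(both, 3))  # prints 0.161 0.606
```

Unlike a hard if/then rule, a single alarm here raises the fault probability only to about 16%, and corroborating evidence is what pushes it past 50%; this is how the approach copes with incomplete or noisy alarm data.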
Future Trends
  • Proactive fault detection
    • Recognizing trouble signs and taking corrective action before service degrades
  • Hybrid systems
    • Multiple AI methods integrated
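One simple way to recognize trouble signs early is to smooth a health metric and warn when the trend crosses a level set below the point where service degrades. The sketch uses an exponentially weighted moving average (EWMA), which is an illustrative choice rather than a technique named in the slides; the readings, `alpha`, and `warn_at` are hypothetical.

```python
from typing import List, Optional


def first_warning(samples: List[float], alpha: float = 0.5, warn_at: float = 0.6) -> Optional[int]:
    """Index of the first sample at which the EWMA of a metric reaches warn_at, or None."""
    ewma = None
    for i, x in enumerate(samples):
        # Exponentially weighted moving average smooths out one-off spikes.
        ewma = x if ewma is None else alpha * x + (1 - alpha) * ewma
        if ewma >= warn_at:
            return i
    return None


# Hypothetical normalised error-rate readings; 1.0 is the level at which service degrades.
error_rate = [0.2, 0.2, 0.5, 0.7, 0.8, 0.9, 1.1]
warn = first_warning(error_rate)
degrade = next(i for i, x in enumerate(error_rate) if x >= 1.0)
print(warn, degrade)  # prints 4 6
```

Here the warning fires two samples before the raw metric reaches the degradation level, leaving time for corrective action.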