
Automated fault diagnosis in voip

Automated Fault diagnosis in VoIP

31st March, 2006

Vishal Kumar Singh and Henning Schulzrinne


Voip diagnosis

VoIP Diagnosis

  • What is automated VoIP diagnosis

    • Detecting failures in the network

    • Automatically finding the root cause of the failure

  • Why VoIP diagnosis

    • Networks are complex, making it difficult to troubleshoot problems

    • Automatic fault diagnosis reduces human intervention

  • Issues in VoIP diagnosis

    • Detecting failures/faults

    • Finding the cause of a failure; determining the dependency relationships among components for diagnosis

  • Solution steps and approaches


Issues in automated voip diagnosis

Issues in Automated VoIP Diagnosis

  • Increasingly complex and diverse network elements

  • Complex interactions/relationships between different network elements

  • Different run-time bindings for each application usage instance, e.g., different calls may use different DNS servers, SIP proxy servers, and media paths

  • A problem in one network element may manifest itself as a user-perceived failure of another element


Fault identification

Fault Identification

  • Service unavailability reporting

    • A node/device/UA generates faults (failure events), e.g., SNMP traps or failure messages

    • A monitoring application, e.g., an SNMP-based application, detects service unavailability and reports the failure event

    • An affected user reports service unavailability, e.g., by e-mail, by calling the helpdesk, or automatically by pressing a button on the phone while in a call and experiencing echo

    • A dependent application detects service unavailability and generates a fault (failure event)


Fault localization determining the source of problem

Fault Localization: Determining the Source of the Problem

  • Fault Classification – Local vs. Global

    (Does it affect only me, or does it affect others as well?)

    • Global failures

      • Server failures, e.g., SIP proxy, DNS, or DB failures

      • Network failures

    • Local failures

      • Specific source failure, e.g., node A cannot call anyone

      • Specific destination or participant failure, e.g., no one can call node B

    • Locally observed but global failures, e.g., the DNS service failed, but only B observed it
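
The local vs. global question above can be sketched in code. This is a minimal illustration, not the authors' method: it assumes each peer re-runs the same check and answers with a boolean, and the names `Scope` and `classify_failure` are hypothetical.

```python
from enum import Enum
from typing import Dict


class Scope(Enum):
    LOCAL = "local"      # only the reporting node sees the failure
    GLOBAL = "global"    # at least one other peer sees it too
    UNKNOWN = "unknown"  # no peer answered, so nothing can be concluded


def classify_failure(peer_sees_failure: Dict[str, bool]) -> Scope:
    """Classify a reported failure from peers' answers.

    peer_sees_failure maps a peer id to True if that peer also
    observed the failure when it ran the same test (an assumption
    about how peers respond; the slides do not define the format).
    """
    if not peer_sees_failure:
        return Scope.UNKNOWN
    if any(peer_sees_failure.values()):
        return Scope.GLOBAL
    return Scope.LOCAL
```

A "locally observed but global" failure shows up here as a report from a single node that flips to `Scope.GLOBAL` once peers actively test the same element.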


Solution approach

Solution Approach

  • DYSWIS: “Do You See What I See” [1]

  • Peers (nodes) perform diagnostic tests when another peer reports or detects a failure

  • Nodes choose the diagnostic test based on the dependencies, which are encoded as a decision tree

  • Nodes (at least some) are initially preloaded with the dependency relationships in some format (e.g., XML-based)

  • Nodes (at least some) may build and update the dependency relationships through statistical and temporal analysis of the failure events they receive and the diagnostic tests they perform
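
As a rough sketch of what preloading dependency relationships from XML might look like: the slides do not give a schema, so the element names (`dependencies`, `service`, `depends-on`), the attribute names, and the test names below are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical schema: each service lists what it depends on and
# the name of the diagnostic test that checks that dependency.
DEPENDENCY_XML = """
<dependencies>
  <service name="sip-call">
    <depends-on name="sip-proxy" test="sip-ping"/>
    <depends-on name="dns" test="dns-lookup"/>
    <depends-on name="connectivity" test="ping"/>
  </service>
  <service name="sip-proxy">
    <depends-on name="database" test="db-query"/>
    <depends-on name="dns" test="dns-lookup"/>
  </service>
</dependencies>
"""


def load_dependencies(xml_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Parse the XML into {service: [(dependency, test), ...]}."""
    root = ET.fromstring(xml_text.strip())
    deps: Dict[str, List[Tuple[str, str]]] = {}
    for service in root.findall("service"):
        deps[service.get("name")] = [
            (d.get("name"), d.get("test"))
            for d in service.findall("depends-on")
        ]
    return deps
```

Keeping the document order of `depends-on` entries lets an expert express which dependency to test first simply by ordering the XML.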


Solution approach1

Solution Approach

  • Store context information of past failures experienced by each node

    • E.g., specific server that was acting as the proxy server (for my call which failed)

  • Store the locality of past failure instances

    • LAN, domain, subnet

    • First hop at each layer, e.g., switch (MAC), default gateway (IP), domain’s proxy (application layer)

  • Failure count for each network element (statistical)

  • Last failure timestamp for each network element

  • Timestamp at which each network element was last seen working (why do I need to test the proxy for you when my call just went through?)

  • Temporal correlation of past failures (e.g., the proxy seems to fail after DNS fails)

  • Each node has a runtime dependency list based on past failures and diagnostic tests
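
The per-element bookkeeping listed above could be held in a structure like the following. This is a sketch under assumptions: the class names and the 30-second freshness window are invented for illustration, and the slides do not say how long a successful observation stays trustworthy.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ElementHistory:
    failure_count: int = 0
    last_failure: Optional[float] = None   # timestamp of last failure
    last_success: Optional[float] = None   # timestamp last seen working


@dataclass
class FailureHistory:
    fresh_seconds: float = 30.0  # assumed freshness window
    elements: Dict[str, ElementHistory] = field(default_factory=dict)

    def record(self, element: str, ok: bool,
               now: Optional[float] = None) -> None:
        """Record one observation (success or failure) of an element."""
        now = time.time() if now is None else now
        hist = self.elements.setdefault(element, ElementHistory())
        if ok:
            hist.last_success = now
        else:
            hist.failure_count += 1
            hist.last_failure = now

    def needs_test(self, element: str,
                   now: Optional[float] = None) -> bool:
        """Skip the test if the element was recently seen working."""
        now = time.time() if now is None else now
        hist = self.elements.get(element)
        if hist is None or hist.last_success is None:
            return True
        if hist.last_failure is not None and hist.last_failure > hist.last_success:
            return True
        return now - hist.last_success > self.fresh_seconds
```

`needs_test` captures the "my call just went through" shortcut: a peer whose own call recently used the proxy can answer without running a new probe.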


Solution architecture

Solution Architecture

[Figure: peers P1 through P8 connected by P2P links across Domain A, Service Provider 1, and Service Provider 2. Other labels: SIP Server, DNS Server, DNS Test, SIP Test, PESQ Test, "Call Failed at P1".]

Nodes in different domains cooperate to determine the cause of a failure


Solution architecture logical view

Solution Architecture: Logical View

[Figure labels: "Failures in Network"; "Alerts"; "Admin input"; "Dependency graph generation [Bayesian network based, inference, other models]"; "Dependencies encoded as decision tree, static and dynamic rules"; "[Dependency relationships and tests (XML)]"; "Triggers to perform tests (peer selection and probe selection)"; "Test results"; "Decision tree updates".]

The figure shows the logical entities and the separation between dependency graph generation and the distributed diagnostic infrastructure (enclosed in blue).


Solution requirements

Solution Requirements

  • Request-Response protocol between the node which experiences the failure and the peer nodes

  • Nodes' capability to perform diagnostic tests (probes), with probe selection based on cost/result

  • Encoding the dependency relationships into a decision tree (given as input by an expert, e.g., as XML)

  • Peer node discovery, based on

    • Location (local network, domain)

    • Capability to perform tests (based on specific tests)

  • Dependency graph generation and updating, based on

    • Network failure events

    • Diagnostic test results correlated with failures
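
The request-response exchange between the failing node and a peer might look like the sketch below. The slides do not define a wire format, so the JSON encoding, the field names, and the `handle_request` helper are hypothetical; a peer that lacks the requested test capability simply says so.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, Dict


@dataclass
class DiagRequest:
    request_id: str      # lets the reporter match responses to requests
    reporter: str        # node that experienced the failure
    target: str          # element the peer is asked to test
    test: str            # name of the diagnostic test to run


def encode_request(req: DiagRequest) -> str:
    return json.dumps(asdict(req))


def handle_request(raw: str,
                   tests: Dict[str, Callable[[str], bool]]) -> str:
    """Peer side: run the requested test if supported and reply.

    tests maps a test name to a callable that returns True when
    the target passes the test.
    """
    req = DiagRequest(**json.loads(raw))
    test_fn = tests.get(req.test)
    if test_fn is None:
        result = "unsupported"
    else:
        result = "ok" if test_fn(req.target) else "failed"
    return json.dumps({
        "request_id": req.request_id,
        "test": req.test,
        "target": req.target,
        "result": result,
    })
```

The "unsupported" reply is one way to feed capability-based peer discovery: the reporter can drop peers that cannot run a given test from its candidate list.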


Test probe selection

Test/ Probe Selection

  • Which diagnostic probe to run – network layer or application layer and for what kind of failures.

    • A probe covering a broad range of failures gives faster but cruder, less accurate results

      • E.g. PING vs TCP Connect vs. SIP PING tests

    • Cost of Probe
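
One way to act on the cost trade-off is to run probes from cheapest to most expensive and stop at the first failure. The sketch below assumes relative cost values and a `Probe` record that are not in the slides; only `tcp_connect_probe` touches the network, using the standard `socket.create_connection` call.

```python
import socket
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Probe:
    name: str
    cost: float                 # assumed relative cost, lower is cheaper
    layer: str                  # e.g. "network", "transport", "application"
    run: Callable[[], bool]     # returns True if the probe succeeds


def tcp_connect_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Transport-layer probe: can a TCP connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failing_probe(probes: List[Probe]) -> Optional[Probe]:
    """Run probes in order of increasing cost; return the first that fails.

    Returns None if every probe succeeds.
    """
    for probe in sorted(probes, key=lambda p: p.cost):
        if not probe.run():
            return probe
    return None
```

With PING, TCP connect, and SIP PING ordered by cost, a failure of the cheap network-layer probe ends the search early, while a SIP PING failure after the other two pass points at the application layer.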


Dependency classifications

Dependency Classifications

  • Functional dependency:

    • At generic service level e.g. SIP proxy depends on DB service, DNS service

  • Structural dependency

    • Configuration-time dependencies, e.g., the Columbia CS SIP proxy is configured to use the MySQL database on metro-north

  • Operational dependency

    • Run-time dependencies or run-time bindings, e.g., the call that failed was using a failover SIP server, obtained from DNS, which was running on host a.b.c.d in the IRT lab


Dependency classifications layered approach

Dependency Classifications: Layered Approach

  • Vertical and lateral dependencies: applications depend on other application-layer services (e.g., the SIP service depends on the DB and DNS services) as well as on lower-layer services

    • OSI layers as service dependency layers

      • An application-layer service also depends on a transport-layer service, which in turn depends on a network-layer service

        • MAC layer: Access point, Switch

        • Network layer: Router

        • Application layer: DNS, SIP, Database

  • Topology based dependency

    • e.g., calls from the CS domain depend on a specific SIP server; calls from lab phones depend on specific switches and routers


Dependency graph

Dependency Graph


Dependency graph encoded to decision tree

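Dependency Graph Encoded to Decision Tree

[Figure: a dependency graph for A with nodes B, C, and D, shown next to the decision tree used when A fails. The tree has decision nodes C, B, and D with Yes/No branches; branch outcomes are "Invokes Decision Tree for C", "Invokes Decision Tree for B", "Invokes Decision Tree for D", and "Cause Not Known: Report, Add New Dependency".]

Legend: A = SIP Call, B = DNS Server, C = SIP Proxy, D = Connectivity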
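
A minimal sketch of walking such a decision tree, using the legend A = SIP Call, B = DNS Server, C = SIP Proxy, D = Connectivity. The check order (C, then B, then D) is read off the figure labels and may not match the original drawing exactly; the entries for the proxy's own dependencies, the `diagnose` function, and the `has_failed` callback are assumptions, and the tree is assumed to be acyclic.

```python
from typing import Callable, Dict, List

# element -> dependencies to check, in order, when that element fails
DECISION_TREE: Dict[str, List[str]] = {
    "sip-call": ["sip-proxy", "dns", "connectivity"],   # A -> C, B, D
    "sip-proxy": ["database", "dns"],                   # hypothetical
}


def diagnose(element: str,
             tree: Dict[str, List[str]],
             has_failed: Callable[[str], bool]) -> List[str]:
    """Return the chain of failed elements, starting at `element`.

    For each dependency in order, run its test; on the first one
    that has failed, invoke that dependency's own decision tree.
    A chain of length 1 means the cause is not known and should be
    reported so that a new dependency can be added.
    """
    chain = [element]
    for dep in tree.get(element, []):
        if has_failed(dep):
            return chain + diagnose(dep, tree, has_failed)
    return chain
```

For example, if both the proxy and its database are down, `diagnose("sip-call", ...)` yields the chain call, proxy, database, and the last entry is the deepest cause the tree can name.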


Diagnostic tests

Diagnostic Tests

  • SIP proxy

    • Proxy server availability

      • SIP PING

    • Call Routing availability

      • Invite tests

    • Call Path determination

      • SIP TraceRoute

  • Media path

    • Quality related

      • Speech quality degradation - MOS

      • Echo

      • Jitter - MOS, PESQ

      • QoS – RTCP

    • NAT/Firewall

      • Checking binding expiration

      • Firewall failing to open a port, resulting in one-way media

        • How to determine which firewall in the path is at fault? Through SIP signaling?
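
The slides name a "SIP PING" test without defining it. A common way to check proxy availability is to send a SIP OPTIONS request and look at the status code of the reply; the sketch below assumes that interpretation. It only builds the request text and parses a status line, and the function names and the `diag` user are hypothetical.

```python
import uuid
from typing import Optional


def build_sip_options(target_host: str, local_host: str,
                      local_port: int = 5060, user: str = "diag") -> str:
    """Build a minimal SIP OPTIONS request (UDP transport assumed)."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # magic-cookie prefix per RFC 3261
    lines = [
        f"OPTIONS sip:{target_host} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{local_host}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target_host}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_host}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_code(response: str) -> Optional[int]:
    """Extract the status code from a SIP response, or None if malformed."""
    status_line = response.split("\r\n", 1)[0]
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("SIP/"):
        return None
    return int(parts[1]) if parts[1].isdigit() else None
```

Any well-formed response, even an error status, shows that the proxy is reachable and parsing SIP; no response within a timeout is the availability failure this test looks for.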


Diagnostic tests1

Diagnostic Tests

  • DNS tests

  • DHCP

  • Switch/Router

    • ARP/RARP/Multicast

    • BGP failures

  • Conference mixers

  • Gateway

    • Echo return loss (ERL) readings and analysis

  • DB

  • XCAP server tests

  • Presence service availability tests


Example

Example

  • Call Failure – Possible Causes

    • SIP Proxy server

      • Database

      • Authentication

    • Media path failure

      • Gateway

        • Specific call legs – ERL, Authentication, etc.

    • DNS server failure

    • End station failure

    • Network failure, e.g., router, switch failure

  • Different calls will have different run time dependencies


Mapping to a human medical system

Mapping to a Human Medical System

  • Doctors perform diagnostic tests to find the cause of a disease when symptoms are reported; they may learn new things about the disease in the course of those tests

    • Failures and triggered tests update the dependency graph

  • Medical researchers run different types of tests to learn about new diseases and to determine the cause of a disease and its relationship with other physiological systems

    • A set of tests that run periodically can be used to build the dependency graph independently of failures


Solution evolution

Solution Evolution

  • Learning the dependency graph from failure events and diagnostic tests

  • Learning using random/periodic testing to identify failures and determine relationships
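
As one possible starting point for learning relationships from failure events, a node could count how often a failure of one element is followed shortly by a failure of another (for instance, the proxy failing after DNS fails). This is a simple frequency heuristic chosen for illustration, not the Bayesian-network approach mentioned in the logical view; the function name and the time window are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def follow_ratios(events: List[Tuple[float, str]],
                  window: float) -> Dict[Tuple[str, str], float]:
    """Estimate how often a failure of `a` is followed by one of `b`.

    events is a list of (timestamp, element) failure events.
    Returns {(a, b): fraction of a's failures that were followed by
    a failure of b within `window` seconds}.
    """
    events = sorted(events)
    totals = Counter(element for _, element in events)
    followed: Counter = Counter()
    for i, (t_a, a) in enumerate(events):
        seen = set()  # count each follower at most once per failure of a
        for t_b, b in events[i + 1:]:
            if t_b - t_a > window:
                break
            if b != a and b not in seen:
                seen.add(b)
                followed[(a, b)] += 1
    return {pair: n / totals[pair[0]] for pair, n in followed.items()}
```

A high ratio for a pair such as (dns, proxy) is only a hint that the proxy may depend on DNS; a node would still want to confirm it with diagnostic tests before adding the edge to its dependency graph.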


Future directions

Future Directions

  • Self healing

  • Predicting failures

  • Protocols for labeling failure events, which would enable new devices/applications to be incorporated automatically into the dependency system

  • Decision tree (dependency graph) based event correlation


Reference

Reference

  • [1] User-oriented Management of VoIP Applications (http://www.ibr.cs.tu-bs.de/projects/nmrg/meetings/2005/nancy/dyswis.pdf)
