
Network Management

Network Management. Lecture 4. Performance Management. The practice of optimizing network service response time. It also entails managing the consistency and quality of individual and overall network services.



  1. Network Management Lecture 4

  2. Performance Management • The practice of optimizing network service response time. • It also entails managing the consistency and quality of individual and overall network services. • The most important task is measuring user/application response time. • For most users, response time is the critical performance success factor; it shapes how both your users and application administrators perceive the network's success. (Cisco)

  3. What is Performance Management • Quantification of performance indicators on • Server • Network • Workstation • Applications • Standard performance goals are: • Response time • Utilization • Throughput • Capacity

  4. In Performance Management • Need to • Maintain continuous indicator for performance evaluation • Verify what levels of service need to be maintained • Identify actual and potential bottlenecks • Establish and report on usage trends

  5. Objectives of Performance Management • Need to ensure that the network highway remains accessible and uncongested • Provide a consistent level of service • Avoid degradation of performance • Provide proactive management

  6. Performance Indicators Required • Transmission capacity • Expressed in bits per second • Signal propagation delay • Time required for a signal to reach its destination • The longer the propagation distance, the longer the delay
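The propagation-delay point above can be made concrete with a small calculation. This is an illustrative sketch: the velocity factor of 0.67 is a typical value for fibre and copper media, not a measured one.

```python
# Illustrative propagation-delay calculation (hypothetical link parameters).
SPEED_OF_LIGHT_M_S = 3e8
VELOCITY_FACTOR = 0.67  # signals in fibre/copper travel at roughly 2/3 c

def propagation_delay_ms(distance_m: float) -> float:
    """Return one-way signal propagation delay in milliseconds."""
    return distance_m / (SPEED_OF_LIGHT_M_S * VELOCITY_FACTOR) * 1000

# A 100 km link adds roughly half a millisecond of one-way delay,
# regardless of how much transmission capacity (bps) the link has.
print(round(propagation_delay_ms(100_000), 3))
```

Note that this delay is independent of bandwidth: it is the distance, not the bit rate, that sets the floor.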

  7. Performance Indicators Required • Topology • Star, tree, ring, bus, or a combination of star and ring • Limits the number of workstations or hosts per cable segment that can be attached to the network • The higher the number of nodes, the lower the performance

  8. Performance Indicators Required • Frame/Packet Size • Most LANs are designed to support only a specific, fixed frame or packet size • If a message is larger than the frame size, it must be broken into smaller units • An increase in the number of frames per message adds delay
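The fragmentation effect above can be sketched numerically. The payload and header sizes below are hypothetical round numbers, chosen only to show how per-frame overhead grows with frame count.

```python
import math

# Sketch: a message larger than the frame payload is split into frames,
# and each frame carries fixed header overhead (hypothetical sizes).
def frames_needed(message_bytes: int, payload_per_frame: int) -> int:
    """Number of frames required to carry the message."""
    return math.ceil(message_bytes / payload_per_frame)

def total_bytes_on_wire(message_bytes: int, payload_per_frame: int,
                        header_bytes: int) -> int:
    """Message bytes plus the per-frame header overhead."""
    n = frames_needed(message_bytes, payload_per_frame)
    return message_bytes + n * header_bytes

# A 10 000-byte message with a 1500-byte payload needs 7 frames;
# each extra frame adds header bytes and per-frame processing delay.
print(frames_needed(10_000, 1500))            # 7
print(total_bytes_on_wire(10_000, 1500, 26))  # 10_182
```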

  9. Performance Indicators Required • Access protocols • The most influential metric • e.g. CSMA/CD, Token Ring • User traffic profile • Time of use • Type of message generated by the user (single, broadcast) • Number of users online

  10. Performance Indicators Required • Buffer Size • A piece of memory used to receive, store, process and forward messages • If the buffer is too small, delays or discarding of packets may occur

  11. Performance Indicators Required • Data Collision and Retransmission • Collision is inevitable • Factors to be considered • Time it takes to detect collision • Transmission time of collided messages

  12. Performance Indicators Required • Resource usage • How much of a resource is used by a user or application • How much reserve is left • Processing delays • Can be caused by both host and network • Host delays - divided into system and application processing delays

  13. Performance Indicators Required • Processing Delays (cont'd) • Network delays - caused by both hardware and software • (e.g. network card vs. network driver) • Throughput • A measurement of transmission capacity • A statistical measurement over time
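Since throughput is a statistical measurement over time, a common technique is to compute it from periodic byte-counter samples, as an SNMP poller would collect from an interface counter such as ifInOctets. The sketch below assumes cumulative counters and ignores counter wraparound.

```python
# Sketch: throughput as a statistical measurement over a sampling window,
# computed from (timestamp, cumulative byte count) pairs.
def throughput_bps(samples):
    """Average throughput in bits per second across the whole window.

    samples: list of (timestamp_seconds, cumulative_byte_count) pairs,
    e.g. from periodic polls of an interface's byte counter.
    """
    (t0, b0) = samples[0]
    (t1, b1) = samples[-1]
    return (b1 - b0) * 8 / (t1 - t0)

# Three polls, one minute apart: 18 MB transferred over 120 seconds.
samples = [(0, 0), (60, 7_500_000), (120, 18_000_000)]
print(throughput_bps(samples))  # 1_200_000.0 bits per second
```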

  14. Performance Indicators Required • Availability • Service availability from an end user's point of view • If delays are long, then even if the network is up, as far as the end user is concerned it is virtually unavailable

  15. Performance Indicators Required • Fairness of measured data • Important to measure at both peak and average levels (the peak-to-average ratio) • Collect data at known high-usage and average-usage periods • Sample measurement • Measurement of traffic volume • Ensure the sampling interval is the same as above

  16. Performance Management Measurement Methods • Collect data on current utilisation of network devices and links • Static vs dynamic • One-off or continuous sampling • Event reporting or polling • Analyse the relevant data • Set utilisation thresholds • Simulate the network
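The polling-and-threshold step can be sketched minimally. In a real deployment the utilisation readings would come from SNMP polls of interface counters; here the readings dictionary is a hypothetical stand-in, and the 80% threshold is an illustrative choice, not a vendor guideline.

```python
# Minimal polling-style sketch: compare sampled link utilisation against a
# configured threshold and report the links that exceed it.
THRESHOLD = 0.80  # illustrative: flag links running above 80% utilisation

def over_threshold(readings: dict[str, float]) -> dict[str, float]:
    """Return only the links whose utilisation exceeds the threshold."""
    return {link: u for link, u in readings.items() if u > THRESHOLD}

# Hypothetical utilisation samples (fraction of capacity in use).
readings = {"eth0": 0.35, "eth1": 0.92, "serial0": 0.81}
print(over_threshold(readings))  # {'eth1': 0.92, 'serial0': 0.81}
```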

  17. Performance Management Measurement Methods • Collect a good sample size • Do not rely on just one measurement • Take several and average them • Ensure samples are representative • Take measurements at different times of the day/week • Compare loads (e.g. lunchtime load vs. end-of-month load)

  18. Performance Management Measurement Methods • Beware of the unexpected • Unusual use on the day of the test • Backups at 3 a.m.

  19. Threshold and Exception Reporting • Define indicators • Determine the frequency of measurements • Define a threshold for each indicator • Get guidelines from vendors

  20. Threshold and Exception Reporting • Design reporting systems • Determine information areas and indicators • Which equipment, networks or objects are monitored • Determine the distribution matrix • Who gets reports • How often and at what level of detail • Presentation
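The per-indicator thresholds and the distribution matrix can be combined into a simple exception-reporting step. All indicator names, threshold values and recipient groups below are illustrative assumptions, not vendor guidelines.

```python
# Sketch of exception reporting: each indicator has a threshold, and a
# distribution matrix decides which group receives each exception report.
THRESHOLDS = {"cpu_util": 0.85, "packet_loss": 0.05, "response_time_ms": 500}
DISTRIBUTION = {
    "cpu_util": ["ops"],
    "packet_loss": ["ops", "noc-manager"],
    "response_time_ms": ["service-desk"],
}

def exceptions(measurements: dict[str, float]) -> list[tuple[str, list[str]]]:
    """Return (indicator, recipients) for every indicator over its threshold."""
    return [(k, DISTRIBUTION[k]) for k, v in measurements.items()
            if v > THRESHOLDS[k]]

print(exceptions({"cpu_util": 0.91, "packet_loss": 0.01,
                  "response_time_ms": 620}))
```

Only the indicators that breach their thresholds generate reports, which is the point of exception reporting: routine measurements stay out of the distribution.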

  21. Network Performance Analysis • Data Analysis • What are the effects of hardware/software on the network? • Dependent on • Network type/protocols • Packet size • Buffer size • Processes running • Routing algorithms

  22. Network Performance Tuning • Tune to service requirements • Calculate payback in advance • Observe the 80-20 rule and 1:4 internet traffic rule • Focus on critical resources • Determine when capacity is exhausted • Define objectives • Determine time frames

  23. System Design for Better Performance • CPU speed is more important than network speed • Speeding up the network has no effect if processing is the bottleneck • Reduce packet count to reduce software overheads • Each packet has its associated overheads • The more packets, the more overhead • Increase packet size to reduce per-packet overheads

  24. System Design for Better Performance • Minimise context switching • e.g. kernel to user mode • Wastes processing time and power • Reduced by having library procedures buffer outgoing data internally until a substantial amount has been collected before sending • Minimise copying • e.g. copying from a device buffer to a kernel buffer to a network-layer buffer to a transport-layer buffer • Copy steps should be eliminated where not required
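The internal-buffering idea above can be sketched as a small wrapper: many cheap writes are accumulated, and the expensive send path (in practice, a system call and its context switch) is crossed only when enough data has been collected. The class and sizes are hypothetical, chosen for illustration.

```python
# Sketch: a library-level wrapper that buffers small writes and only calls
# the expensive send function once a substantial amount has accumulated,
# reducing per-call overhead such as context switches.
class BufferedSender:
    def __init__(self, send_fn, flush_at: int = 1500):
        self.send_fn = send_fn    # the expensive call (stands in for a syscall)
        self.flush_at = flush_at  # hypothetical flush threshold in bytes
        self.buf = bytearray()

    def write(self, data: bytes):
        self.buf += data
        if len(self.buf) >= self.flush_at:
            self.flush()

    def flush(self):
        if self.buf:
            self.send_fn(bytes(self.buf))
            self.buf.clear()

sent = []
s = BufferedSender(sent.append, flush_at=10)
for chunk in (b"abc", b"defg", b"hij", b"k"):  # four application writes...
    s.write(chunk)
s.flush()
print(len(sent))  # ...but only two calls into the expensive send path
```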

  25. Event Correlation Techniques • Basic elements • Detection and filtering of events • Correlation of observed events using AI • Localize the source of the problem • Identify the cause of the problem • Techniques • Rule-based reasoning • Model-based reasoning • Case-based reasoning • Codebook correlation model • State transition graph model • Finite state machine model

  26. Rule-Based Reasoning

  27. Rule-Based Reasoning • The rule-based paradigm is an iterative process • RBR is "brittle" if no precedent exists • Exponential growth of the knowledge base poses a scalability problem • Problem with instability around thresholds: if packet loss < 10%, alarm green; if 10% <= packet loss < 15%, alarm yellow; if packet loss >= 15%, alarm red • Solution: use fuzzy logic
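The instability of the crisp rules can be seen directly: a packet-loss reading hovering around 10% flips the alarm between green and yellow on every sample. A fuzzy membership function degrades gracefully instead. The crisp rules below come from the slide; the fuzzy membership ranges (8-17%) are an illustrative assumption.

```python
# Crisp threshold rules from the slide: unstable near the 10% boundary.
def crisp_alarm(loss_pct: float) -> str:
    if loss_pct < 10:
        return "green"
    if loss_pct < 15:
        return "yellow"
    return "red"

# Sketch of a fuzzy alternative: degree of membership in "yellow",
# rising linearly over 8-12% loss and falling over 13-17% (assumed ranges).
def fuzzy_yellow(loss_pct: float) -> float:
    if loss_pct <= 8 or loss_pct >= 17:
        return 0.0
    if loss_pct < 12:
        return (loss_pct - 8) / 4
    if loss_pct <= 13:
        return 1.0
    return (17 - loss_pct) / 4

print(crisp_alarm(9.9), crisp_alarm(10.1))    # flips: green -> yellow
print(fuzzy_yellow(9.9), fuzzy_yellow(10.1))  # changes only slightly
```

A reading oscillating between 9.9% and 10.1% changes the fuzzy membership by only 0.05 rather than flipping the alarm colour outright.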

  28. Configuration for RBR Example

  29. RBR Example

  30. Model-Based Reasoning • Object-oriented model • Model is a representation of the component it models • Model has attributes and relations to other models • Relationship between objects reflected in a similar relationship between models

  31. MBR Event Correlator Example: Hub 1 fails • The failure is recognized by the Hub 1 model • The Hub 1 model queries the router model • If the router model declares a failure, the Hub 1 model declares NO failure (the fault lies upstream) • If the router model declares no failure, the Hub 1 model declares a failure

  32. Case-Based Reasoning • Unit of knowledge • RBR: rule • CBR: case • CBR is based on cases experienced before, extended to the current situation by adaptation • Three adaptation schemes • Parameterized adaptation • Abstraction / re-specialization adaptation • Critic-based adaptation

  33. CBR: Matching Trouble Ticket Example: File transfer throughput problem

  34. CBR: Parameterized Adaptation • A = f(F) • A' = f(F') • The functional relationship f remains the same
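Parameterized adaptation can be sketched in a few lines: the stored case fixes the functional relationship f, and a new problem with a different parameter value F' reuses the same f to propose an adapted solution A'. The function f below (relating file size to acceptable transfer time, echoing the trouble-ticket example) is a hypothetical illustration.

```python
# Sketch of CBR parameterized adaptation: same functional relationship f,
# new parameter value. f is an assumed, illustrative case relationship.
def f(file_size_mb: float) -> float:
    """Hypothetical case knowledge: expected transfer time grows with size."""
    return 2.0 * file_size_mb  # assumed 2 seconds per MB from the stored case

stored_case = {"F": 100, "A": f(100)}  # original case: A = f(F)
new_F = 250                            # new problem parameter F'
adapted_A = f(new_F)                   # adapted solution: A' = f(F')

print(stored_case["A"], adapted_A)  # 200.0 500.0
```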

  35. CBR: Abstraction / Re-specialization • Two possible resolutions • A = f(F): adjust the network load level • B = g(F): adjust the bandwidth • Resolution chosen based on the constraint imposed

  36. CBR: Critic-Based Adaptation • Human expertise introduces a new case • N (network load) is an additional parameter added to the functional relationship

  37. CBR-Based Critter

  38. Codebook Correlation Model: Generic Architecture • Yemini et al. proposed this model • Monitors capture alarm events • The configuration model contains the configuration of the network • The event model represents events and their causal relationships • The correlator correlates alarm events with the event model and determines the problem that caused the events

  39. Codebook Approach Approach: • Correlation algorithms are based on a coding approach to event correlation • Problem events are viewed as messages generated by the system and encoded in sets of alarms • The correlator decodes the problem messages to identify the problems Two phases: 1. Codebook selection phase: the problems to be monitored are identified and the symptoms they generate are associated with each problem; this produces the codebook (problem-symptom matrix) 2. The correlator compares alarm events with the codebook and identifies the problem.

  40. Causality Graph • Each node is an event • An event may cause other events • Directed edges start at a causing event and terminate at a resulting event • Picture causing events as problems and resulting events as symptoms

  41. Labeled Causality Graph • Ps are problems and Ss are symptoms • P1 causes S1 and S2 • Note the directed edge from S1 to S2 is removed; S2 is caused directly, or indirectly via S1, by P1 • S2 could also be caused by either P2 or P3

  42. Codebook • The codebook is the problem-symptom matrix • It is derived from the causality graph after removing the directed edges representing propagation between symptoms • Number of symptoms ≥ number of problems • 2 rows are adequate to uniquely identify 3 problems

  43. Correlation Matrix • Correlation matrix is reduced codebook

  44. State Transition Model • Used in Seagate’s NerveCenter correlation system • Integrated in NMS, such as OpenView • Used to determine the status of a node

  45. State Transition Model Example • NMS pings hubs every minute • Failure indicated by the absence of a response
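The ping-driven status tracking above fits naturally into a state transition table: each one-minute ping cycle feeds the hub's state machine, and a missed response moves it toward "down". The states and the two-strikes rule below are illustrative assumptions, not NerveCenter's actual configuration.

```python
# Sketch of a state transition model for hub status, driven by the outcome
# of each one-minute ping (True = response received, False = no response).
# States and transition rules are illustrative.
TRANSITIONS = {
    ("up", True): "up",
    ("up", False): "suspect",     # first missed ping
    ("suspect", True): "up",      # response returned: recover
    ("suspect", False): "down",   # second consecutive miss: declare down
    ("down", True): "up",
    ("down", False): "down",
}

def run(responses, state="up"):
    """Feed a sequence of ping outcomes through the state machine."""
    for ok in responses:
        state = TRANSITIONS[(state, ok)]
    return state

print(run([True, False, True]))   # transient miss: ends back at "up"
print(run([True, False, False]))  # two consecutive misses: ends "down"
```

The intermediate "suspect" state is what keeps a single lost ping from immediately raising a failure event.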

  46. State Transition Graph

  47. Finite State Machine Model • The finite state machine model is a passive system; the state transition graph model is an active system • An observer agent present in each node reports abnormalities (e.g. a Web agent) • A central system correlates the events reported by the agents • A failure is detected when a node enters an illegal state
