Network Troubleshooting

Network Troubleshooting Accessing the WAN– Chapter 8 Modified by Mike Haines 09/20/2008

Objectives • In this chapter, you will learn to: • Establish and document a network baseline. • Describe the various troubleshooting methodologies and troubleshooting tools. • Describe the common issues that occur during WAN implementation. • Identify and troubleshoot common enterprise network implementation issues using a layered model approach.

Documenting Your Network • To efficiently diagnose and correct network problems, a network engineer needs to know the network baseline. • This information is captured in documentation. • Network documentation includes three components: 1. Network configuration table 2. End-system configuration table 3. Network topology diagram 1. Network Configuration Table • Contains up-to-date records of the hardware and software used in the network: • Type of device, model designation • IOS image name • Device network hostname • Location of the device (building, floor, room, rack, panel) • If it is a modular device, all module types and the module slot in which each is located • Data link layer addresses • Network layer addresses • Any additional important information about physical aspects of the device

Documenting Your Network 2. End-system Configuration Table • Contains baseline records of the hardware and software used in end-system devices such as servers and desktop workstations. • Device name (purpose) • Operating system and version • IP address • Subnet mask • Default gateway, DNS server, and WINS server addresses • Any high-bandwidth network applications that the end system runs 3. Network Topology Diagram • Graphical representation of a network, which illustrates how each device in the network is connected and its logical architecture. • Routing protocols can also be shown. • Symbols for all devices and how they are connected • Interface types and numbers • IP addresses • Subnet masks

Network Documentation Process • When you document your network, you may have to gather information directly from routers and switches. • Commands that are useful to the network documentation process include: • The ping command is used to test connectivity with neighboring devices. Pinging other PCs in the network also initiates the MAC address auto-discovery process. • The telnet command is used to log in remotely to a device to access configuration information. • The show ip interface brief command is used to display the up or down status and IP address of all interfaces. • The show ip route command is used to display the routing table in a router to learn the directly connected neighbors, more remote devices (through learned routes), and the routing protocols. • The show cdp neighbor detail command is used to obtain detailed information about directly connected Cisco neighbor devices.

Why is Establishing a Baseline Important? • Establishing a network performance baseline requires collecting key performance data from the ports and devices that are essential to network operation. • How does the network perform during a normal or average day? • Measuring the initial performance allows a network administrator to determine the difference between abnormal behavior and proper network performance. • Where are the underutilized and over-utilized areas? • It may also reveal areas in the network that are underutilized and quite often can lead to network redesign efforts based on quality and capacity observations. • Where are the most errors occurring? • In addition, analysis after an initial baseline tends to reveal hidden problems. • What thresholds should be set for the devices that need to be monitored? • Can the network deliver the identified policies? • The baseline also provides insight into whether the current network design can deliver the required policies.

Steps for Establishing a Network Baseline Three steps for planning the first baseline: • Step 1. Determine what types of data to collect • When conducting the initial baseline, start by selecting a few variables that represent the defined policies. If too many data points are selected, the amount of data can be overwhelming. • Generally, some good measures are interface utilization and CPU utilization. • Step 2. Identify devices and ports of interest • Devices and ports of interest include: • Network device ports that connect to other network devices • Servers • Key users • Anything else considered critical to operations. • By narrowing the ports polled, the results are concise, and network management load is minimized.

Steps for Establishing a Network Baseline Step 3. Determine the baseline duration This period should be at least seven days to capture any daily or weekly trends. A baseline needs to last no more than six weeks. Generally, a two-to-four-week baseline is adequate. The figure shows examples of several screenshots of CPU utilization trends captured over a daily, weekly, monthly, and yearly period. The work week trends are too short to accurately reveal the recurring nature of the utilization surge that occurs every weekend when a database backup operation consumes network bandwidth. The yearly trend shown in the example is too long a duration to provide meaningful baseline performance details. Baseline analysis of the network should be conducted on a regular basis to understand how the network is affected by growth and other changes.
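The three steps above can be sketched in a few lines of Python. This is only an illustration, not a method from the course material: the function names, the sample readings, and the "mean plus three standard deviations" threshold are all assumptions chosen for the example.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize utilization samples (percent) collected during the baseline period."""
    return {"mean": mean(samples), "stdev": stdev(samples), "peak": max(samples)}

def is_abnormal(value, baseline, k=3.0):
    """Flag a reading more than k standard deviations above the baseline mean.

    The factor k is an assumed threshold; choose it to suit your own policies.
    """
    return value > baseline["mean"] + k * baseline["stdev"]

# Hypothetical CPU utilization readings (percent) polled from one device.
cpu_samples = [22, 25, 19, 30, 27, 24, 21, 26, 23, 28]
baseline = build_baseline(cpu_samples)

print(baseline)
print(is_abnormal(85, baseline))  # a reading far above the usual range
print(is_abnormal(29, baseline))  # a reading within the usual range
```

In practice the samples would cover the two-to-four-week period described above rather than ten readings.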

Measuring Network Performance Data • Sophisticated network management software is often used to baseline large networks. • For example, the Fluke Network SuperAgent module enables administrators to automatically create reports using the Intelligent Baselines feature. • This feature compares current performance levels with historical observations and can automatically identify performance problems and applications that do not provide expected levels of service. • In simpler networks, the baseline tasks may require a combination of manual data collection and simple network protocol inspectors. • Hand collection using show commands on individual network devices is extremely time consuming and should be limited to mission-critical network devices.

General Approach to Troubleshooting • Using efficient troubleshooting techniques shortens overall troubleshooting time. • Two extreme approaches to troubleshooting almost always result in disappointment, delay, or failure. • At one extreme is the theorist, or rocket scientist, approach. • The rocket scientist analyzes and reanalyzes the situation until the exact cause at the root of the problem has been identified. • While this process is fairly reliable, few companies can afford to have their networks down for the hours or days it can take. • At the other extreme is the impractical, or caveman, approach. • The caveman's first instinct is to start swapping cards, cables, and software until miraculously the network begins operating again. • Although this approach may achieve a change in symptoms faster, it is not reliable. • The better approach is somewhere in the middle, using elements of both. • It is important to analyze the network as a whole rather than in a piecemeal fashion. • A systematic approach minimizes confusion and cuts down on time otherwise wasted with trial and error.

Using Layered Models for Troubleshooting OSI Versus TCP/IP Layered Models • OSI Reference Model • The upper layers (5-7) deal with application issues and are implemented only in software. • The lower layers (1-4) handle data-transport issues. • Layers 3 and 4 are generally implemented only in software. • The physical layer (Layer 1) and data link layer (Layer 2) are implemented in hardware and software. • TCP/IP Model • The application layer in the TCP/IP suite combines the functions of three OSI model layers: session, presentation, and application. • The transport layer of TCP/IP is responsible for exchanging segments between devices. • The Internet layer is responsible for placing messages in a fixed format that allows devices to handle them. • The network access layer communicates directly with the network media and provides an interface between the architecture of the network and the Internet layer.

General Troubleshooting Procedures • The stages of the general troubleshooting process are: • Stage 1 Gather symptoms - Troubleshooting begins with the process of gathering and documenting symptoms from the network, end systems, and users. • Symptoms may appear in many different forms, including alerts from the network management system, console messages, and user complaints. • Stage 2 Isolate the problem - The problem is not isolated until a single problem, or a set of related problems, is identified. • Stage 3 Correct the problem - Having isolated and identified the cause of the problem, the network administrator works to correct the problem by implementing, testing, and documenting a solution. • If the network administrator determines that the corrective action has created another problem, the attempted solution is documented, the changes are removed, and the network administrator returns to gathering symptoms and isolating the problem.

Troubleshooting Methods • There are three main methods for troubleshooting: • Bottom-Up Troubleshooting Method • In bottom-up troubleshooting, you start with the physical components of the network and move up through the layers. • Bottom-up troubleshooting is a good approach to use when the problem is suspected to be a physical one. • Top-Down Troubleshooting Method • In top-down troubleshooting, you start with the end-user applications and move down the layers of the OSI model. • Use this approach for simpler problems or when you think the problem is with a piece of software. • Divide-and-Conquer Troubleshooting Method • In divide-and-conquer troubleshooting, you start by collecting the users' experience of the problem, document the symptoms, and then, using that information, make an informed guess as to the OSI layer at which to start your investigation. • For example, if users can't access the web server and you can ping the server, then you know that the problem is above Layer 3. • If you can't ping the server, then you know the problem is likely at a lower OSI layer.
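The divide-and-conquer example above (ping succeeds, so look above Layer 3; ping fails, so look lower) can be written as a small decision function. This is a minimal sketch; the function name and the wording of the returned hints are made up for illustration.

```python
def suggest_starting_layer(can_ping_server, app_works):
    """Return a hint about where to begin, based on two quick observations.

    can_ping_server: True if an ICMP echo to the server succeeds.
    app_works: True if users can reach the application (for example, the web server).
    """
    if app_works:
        return "No fault observed; gather more symptoms"
    if can_ping_server:
        # Layer 3 connectivity is proven, so the fault is higher in the stack.
        return "Start above Layer 3 (transport or application)"
    # No Layer 3 connectivity, so the fault is at Layer 3 or lower.
    return "Start at Layer 3 or below"

print(suggest_starting_layer(can_ping_server=True, app_works=False))
print(suggest_starting_layer(can_ping_server=False, app_works=False))
```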

Guidelines for Selecting a Troubleshooting Method • To quickly resolve network problems, take the time to select the most effective troubleshooting method. • Use the process shown in the figure to help you select the most efficient troubleshooting method. • For example: Two IP routers are not exchanging routing information. The last time this type of problem occurred it was a protocol issue. So you choose the divide-and-conquer troubleshooting method.

Gathering Symptoms • Step 1. Analyze existing symptoms • Analyze symptoms gathered from the trouble ticket or users to form a definition of the problem. • Step 2. Determine ownership • If the problem is within your system, move on to the next stage. • If the problem is outside the boundary of your control (for example, lost Internet connectivity), you need to contact an administrator for the external system. • Step 3. Narrow the scope • Determine if the problem is at the core, distribution, or access layer of the network. • Step 4. Gather symptoms from suspect devices • Use knowledge and experience to determine if the problem is a hardware or software problem. • Step 5. Document symptoms • Sometimes the problem can be solved using the documented symptoms. If not, begin the isolating phase of the general troubleshooting process.

Gathering Symptoms • Use Cisco IOS commands to gather symptoms about the network. • Although the debug command is an important tool for gathering symptoms, it generates a large amount of console message traffic, and the performance of a network device can be noticeably affected. • Make sure you warn network users that a troubleshooting effort is underway and that network performance may be affected. • Remember to disable debugging when you are done.

Gathering Symptoms: Questioning End Users • When you question end users about a network problem they may be experiencing, use effective questioning techniques. • This way you will get the information you need to effectively document the symptoms of a problem. • The table in the figure provides some guidelines and end-user example questions.

Software Troubleshooting Tools • NMS Tools • Network management system (NMS) tools include device-level monitoring, configuration, and fault management tools. • Network monitoring software graphically displays a physical view of network devices, allowing network managers to monitor remote devices without physically checking them. • Examples are CiscoView, HP OpenView, SolarWinds, and WhatsUp Gold. • Knowledge Bases • Online network device vendor knowledge bases have become indispensable sources of information. • When vendor-based knowledge bases are combined with Internet search engines like Google, a network administrator has access to a vast pool of experience-based information.

Software Troubleshooting Tools • Baselining Tools • Many tools for automating the network documentation and baselining process are available. • For example, they can help you draw network diagrams, keep network software and hardware documentation up to date, and cost-effectively measure baseline network bandwidth use. • The figure shows a screen capture of the SolarWinds LANsurveyor and CyberGauge software. • Protocol Analyzers • A protocol analyzer decodes the various protocol layers in a recorded frame and presents this information in a relatively easy-to-use format. • The figure shows a screen capture of the Wireshark protocol analyzer. • Most protocol analyzers can filter traffic that meets certain criteria so that, for example, all traffic to and from a particular device can be captured.

Hardware Troubleshooting Tools • Network Analysis Module • A network analysis module (NAM) can be installed in Cisco Catalyst 6500 series switches and Cisco 7600 series routers to provide a graphical representation of traffic. • Digital Multimeters • Digital multimeters (DMMs) are test instruments that are used to directly measure electrical values of voltage, current, and resistance. • Cable Testers • Cabling testers can be used to detect broken wires, crossed-over wiring, shorted connections, and improperly paired connections. • These devices can be inexpensive continuity testers, moderately priced data cabling testers, or expensive time-domain reflectometers (TDRs). • TDRs are used to test the distance to a break in a cable. • TDRs used to test fiber optic cables are known as optical time-domain reflectometers (OTDRs).
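As background to the TDR bullet above: a TDR estimates the distance to a break by timing a pulse's reflection, then halving the round-trip distance. The sketch below shows that arithmetic. The nominal velocity of propagation (NVP) of 0.69 and the 500 ns round-trip time are assumed example values, not figures from the slides; the real NVP comes from the cable's datasheet.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # speed of light in a vacuum

def distance_to_fault_m(round_trip_s, nvp):
    """Estimate the distance to a cable fault from a reflection's round-trip time.

    round_trip_s: time between sending the pulse and seeing the reflection, in seconds.
    nvp: nominal velocity of propagation, as a fraction of the speed of light.
    The pulse travels to the fault and back, so the one-way distance is half.
    """
    return nvp * SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2

# Assumed example: reflection seen after 500 ns on a cable with NVP 0.69.
print(f"{distance_to_fault_m(500e-9, 0.69):.1f} m")
```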

Hardware Troubleshooting Tools • Cable Analyzers • Cable analyzers are multifunctional handheld devices that are used to test and certify copper and fiber cables for different services and standards. • The more sophisticated tools include advanced troubleshooting diagnostics that measure distance to performance defect (NEXT, RL), identify corrective actions, and graphically display crosstalk and impedance behavior. • Portable Network Analyzers • Portable devices that are used for troubleshooting switched networks and VLANs. • By plugging the network analyzer in anywhere on the network, a network engineer can see the switch port to which the device is connected and the average and peak utilization. • The analyzer can also be used to discover VLAN configuration, identify top network talkers, analyze network traffic, and view interface details.

Troubleshooting Tools: Research Activity • The following are links to various troubleshooting tools. • Software Tools • Network Management Systems: • http://www.ipswitch.com/products/whatsup/index.asp?t=demo • http://www.solarwinds.com/products/network_tools.aspx • Baselining Tools: • http://www.networkuptime.com/tools/enterprise • Knowledge Bases: • http://www.cisco.com • Protocol Analyzers: • http://www.flukenetworks.com/fnet/en-us/products/OptiView+Protocol+Expert/ • Hardware Tools • Cisco Network Analyzer Module (NAM): • http://www.cisco.com/en/US/docs/net_mgmt/network_analysis_module_software/3.5/user/guide/user.html • Cable Testers: • http://www.flukenetworks.com/fnet/en-us/products/CableIQ+Qualification+Tester/Demo.htm • Cable Analyzers: • http://www.flukenetworks.com/fnet/en-us/products/DTX+CableAnalyzer+Series/Demo.htm • Network Analyzers: • http://www.flukenetworks.com/fnet/en-us/products/OptiView+Series+III+Integrated+Network+Analyzer/Demos.htm

WAN Communications • WAN technologies function at the lower three layers of the OSI reference model. • A communications provider normally owns the data links that make up a WAN. • The links are made available to subscribers for a fee and are used to interconnect LANs or connect to remote networks. • WAN data transfer speed (bandwidth) is considerably slower than common LAN bandwidth. • The charges for link provision are the major cost element; therefore, the WAN implementation must aim to provide maximum bandwidth at acceptable cost.

Steps in WAN Design • Because WAN connectivity is both important to the business and expensive, follow these steps when designing or modifying a WAN: • Step 1. Locate LANs - Establish the source and destination endpoints that will connect through the WAN. • Step 2. Analyze traffic - Know what data traffic must be carried, its origin, and its destination. • Step 3. Plan the topology - A high requirement for availability requires extra links that provide alternative data paths for redundancy and load balancing. • Step 4. Estimate the required bandwidth - Traffic on the links may have varying requirements for latency and jitter. • Step 5. Choose the WAN technology - Suitable link technologies must be selected. • Step 6. Evaluate costs - When all the requirements are established, installation and operational costs for the WAN can be determined and compared with the business need driving the WAN implementation.

WAN Traffic Considerations • The table in the figure shows the wide variety of traffic types and their varying requirements of bandwidth, latency, and jitter that WAN links are required to carry. • To determine traffic flow conditions and timing of a WAN link, you need to analyze the traffic characteristics specific to each LAN that is connected to the WAN.

WAN Topology Considerations • Designing a WAN topology consists of the following: • Selecting an interconnection pattern or layout for the links between the various locations • Selecting the technologies for those links to meet the enterprise requirements at an acceptable cost • More links increase the cost of the network services, but having multiple paths between destinations increases reliability. • Adding more network devices to the data path increases latency and decreases reliability. • Many WANs use a star topology. • As the enterprise grows and new branches are added, the branches are connected back to the head office, producing a traditional star topology. • Star endpoints are sometimes cross-connected, creating a mesh or partial mesh topology. • This provides for many possible combinations for interconnections.

WAN Topology Considerations - Hierarchical When many locations must be joined, a hierarchical solution is recommended. For example, imagine an enterprise that is operational in every country of the European Union and has a branch in every town with a population over 10,000. Each branch has a LAN, and it has been decided to interconnect the branches. A mesh network is clearly not feasible because there would be hundreds of thousands of links. A three-layer hierarchy is often useful when the network traffic mirrors the enterprise branch structure and is divided into regions, areas, and branches. Group the LANs in each area and interconnect them. The area could be based on the number of locations to be connected, with an upper limit of between 30 and 50. The area would have a star topology, with the hubs of the stars linked to form the region. Interconnect the regions to form the core of the WAN. Regions could be geographic, connecting between three and 10 areas, and the hub of each region could be linked point-to-point.
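The "hundreds of thousands of links" figure follows from the full-mesh formula n(n-1)/2, where n is the number of sites. A short sketch makes the comparison with a star concrete. The site count of 1,000 is an assumed round number for illustration; the slides do not say how many branches the example enterprise has.

```python
def full_mesh_links(sites):
    """Links needed to connect every site directly to every other site."""
    return sites * (sites - 1) // 2

def star_links(sites):
    """Links needed when every site connects to one hub site."""
    return sites - 1

# Assumed site count for illustration only.
sites = 1000
print(f"full mesh: {full_mesh_links(sites):,} links")
print(f"star:      {star_links(sites):,} links")
```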

WAN Connection Technologies • A typical private WAN uses a combination of technologies that are usually chosen based on traffic type and volume. • ISDN, DSL, Frame Relay, or leased lines are used to connect individual branches into an area. • Frame Relay, ATM, or leased lines are used to connect external areas back to the backbone. • ATM or leased lines form the WAN backbone. • Technologies that require the establishment of a connection before data can be transmitted, such as basic telephone, ISDN, or X.25, are not suitable for WANs that require rapid response time or low latency.

WAN Connection Technologies • Frame Relay and ATM are examples of shared networks. • Because several customers are sharing the link, the cost to each is generally less than the cost of a direct link of the same capacity. • Although ATM is a shared network, it has been designed to produce minimal latency and jitter through high-speed internal links sending easily manageable units of data, called cells. • ATM cells have a fixed length of 53 bytes: 48 bytes for data and 5 bytes for the header. ATM is widely used for carrying delay-sensitive traffic. • Frame Relay may also be used for delay-sensitive traffic, often using QoS mechanisms to give priority to the more sensitive data. • Leased lines are typically more expensive than access links but are available at virtually any bandwidth and, because they are not shared, provide very low latency and jitter.
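The fixed 53-byte cell (48 data bytes plus a 5-byte header) means part of every ATM link's capacity is spent on headers and padding. The sketch below works through that arithmetic. It is deliberately simplified: it ignores any adaptation-layer overhead and assumes the final cell is padded to 48 bytes. The function names are made up for this example.

```python
import math

CELL_BYTES = 53      # total cell length
PAYLOAD_BYTES = 48   # data carried per cell
HEADER_BYTES = 5     # header per cell

def cells_needed(data_bytes):
    """Cells required to carry data_bytes, padding the last cell if needed."""
    return math.ceil(data_bytes / PAYLOAD_BYTES)

def bytes_on_wire(data_bytes):
    """Total bytes transmitted, including headers and padding."""
    return cells_needed(data_bytes) * CELL_BYTES

data = 1500  # an assumed packet size in bytes
print(cells_needed(data), "cells,", bytes_on_wire(data), "bytes on the wire")
print(f"header share of each cell: {HEADER_BYTES / CELL_BYTES:.1%}")
```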

WAN Topology Considerations • Although the Internet may pose a security problem, it does provide an alternative for inter-branch traffic. • Part of the traffic that must be considered during design is going to or coming from the Internet. • Common implementations are to have each network in the company connect to a different ISP, or to have all company networks connect to a single ISP from a core layer connection.

WAN Bandwidth Considerations • Many companies rely on the high-speed transfer of data between remote locations. • Consequently, higher bandwidth is crucial because it allows more data to be transmitted in a given time. • When bandwidth is inadequate, competition between various types of traffic causes response times to increase, which reduces employee productivity and slows down critical web-based business processes.

Common WAN Implementation Issues • The figure summarizes the common WAN implementation issues and the questions you need to answer before you can effectively implement a WAN.

Case Study: WAN Troubleshooting From an ISP’s Perspective • A significant proportion of the support calls received by an ISP refer to slowness of the network. To troubleshoot, you have to isolate the components: • Individual PC host • A large number of user applications open on the PC can cause the slowness. • Tools like Task Manager can help determine CPU utilization. • LAN • If the customer LAN is frequently reaching 100 percent utilization, this is an internal customer problem. • This is why a network baseline is so important. • Link from the edge of the user network to the edge of the ISP • A problem on this link is the ISP's responsibility. • Backbone of the ISP • The ISP can determine which link is causing the problem. • Server being accessed • In some cases the slowness attributed to the network may be caused by server congestion. This problem is the hardest to diagnose, and it should be the last option pursued after all other options have been eliminated.

Interpreting Network Diagrams to Identify Problems • It is nearly impossible to troubleshoot any type of network connectivity issue without a network diagram. • Physical Network Diagram • A physical network diagram shows the physical layout of the devices connected to the network. • Device type • Model and manufacturer • Operating system version • Cable type and identifier • Cable specification • Connector type • Cabling endpoints • Logical Network Diagram • A logical network diagram shows how data is transferred on the network. • Device identifiers • IP address and subnet • Interface identifiers • Connection type • DLCI for virtual circuits • Site-to-site VPNs • Routing protocols • Static routes • Data-link protocols • WAN technologies used

Symptoms of Physical Layer Problems • The physical layer transmits bits from one computer to another and regulates the transmission of a stream of bits over the physical medium. • Common symptoms of problems at the physical layer: • Performance lower than baseline - If performance is unsatisfactory all the time, the problem is probably related to inadequate capacity. • Loss of connectivity - If a cable or device fails, the most obvious symptom is a loss of connectivity. • High collision counts - Collisions are normally a more significant problem on shared media. Collision-based problems may be caused by a bad cable to a single station on a hub. • Network bottlenecks or congestion - If an interface fails, routing protocols may redirect traffic to other routes that are not designed to carry the extra load. • High CPU utilization rates - High CPU utilization rates are a symptom that a device, such as a router, switch, or server, is operating at or exceeding its design limits. • Console error messages - Error messages reported on the device console can indicate a physical layer problem.

Causes of Physical Layer Problems • Power-related • Power-related issues are the most fundamental reason for network failure. Check the operation of the fans. • Hardware faults • Faulty NICs can be the cause of transmission errors due to late collisions, short frames, and jabber. • Cabling faults • Many problems can be corrected by simply reseating cables that have become partially disconnected. • Look for damaged cables, improper cable types, and poorly crimped RJ-45s. • Attenuation • Attenuation can be caused if a cable length exceeds the design limit for the media (for example, an Ethernet cable is limited to 100 meters (328 feet)). • Interface configuration errors • Many things can be misconfigured on an interface to cause it to go down: • Serial links reconfigured as asynchronous instead of synchronous • Incorrect clock rate • Incorrect clock source • Interface not turned on

Causes of Physical Layer Problems • Noise • Local electromagnetic interference (EMI) is known as noise. There are four types of noise that are significant to networks: • Impulse noise that is caused by voltage fluctuations or current spikes induced on the cabling. • Random (white) noise that is generated by many sources, such as FM radio stations, police radio, and building security. • Alien crosstalk, which is noise induced by other cables in the same pathway. • Near end crosstalk (NEXT), which is noise originating from crosstalk from other adjacent cables or noise from nearby electric cables, devices with large electric motors, or anything that includes a transmitter more powerful than a cell phone. • Exceeding design limits • A component may be operating suboptimally at the physical layer because it is being utilized at a higher average rate than it is configured to operate. • CPU overload • One of the causes of CPU overload in a router is high traffic. If some interfaces are regularly overloaded with traffic, consider redesigning the traffic flow in the network or upgrading the hardware.

To isolate problems at the physical layer • Check for bad cables or connections • Verify that the cable is properly connected and is in good condition. Your cable tester might reveal an open wire. • Check that the correct cabling standard is adhered to throughout the network • Verify that the proper cable is being used. For example, in the figure, the Fluke meter detected that although a cable was good for Fast Ethernet, it is not qualified to support 1000BASE-T. • Check that devices are cabled correctly • Check that cables are connected to their correct ports. • This is where having a neat and organized wiring closet saves you a great deal of time. • Verify proper interface configurations • Check that all switch ports are set in the correct VLAN and that speed and duplex settings are correctly configured. • Check operational statistics and data error rates • Use Cisco show commands to check for statistics such as collisions and input and output errors.

Symptoms of Data Link Layer Problems • Common symptoms at the data link layer include: • No functionality or connectivity at the network layer or above • Some Layer 2 problems can stop the exchange of frames across a link. • Network is operating below baseline performance levels • There are two types of suboptimal Layer 2 operation: • Frames take an illogical path to their destination but do arrive. An example of a problem that could cause frames to take a suboptimal path is a poorly designed Layer 2 spanning-tree topology. • Some frames are dropped. An extended or continuous ping also reveals if frames are being dropped. • Excessive broadcasts • Excessive broadcasts result from one of the following: • Poorly programmed or configured applications • Large Layer 2 broadcast domains • Underlying network problems, such as STP loops. • Console messages • In some instances, a router recognizes that a Layer 2 problem has occurred and sends alert messages to the console. • The most common console message is the line protocol down message.

Causes of Data Link Layer Problems • Encapsulation errors • This condition occurs when the encapsulation at one end of a WAN link is configured differently from the encapsulation used at the other end. • Address mapping errors • When using static maps in Frame Relay, an incorrect map is a common mistake. • Simple configuration errors can result in a mismatch of Layer 2 and Layer 3 addressing information. • In a dynamic environment, the mapping of Layer 2 and Layer 3 information can fail for the following reasons: • Devices may have been specifically configured not to respond to ARP or Inverse-ARP requests. • The Layer 2 or Layer 3 information that is cached may have physically changed. • Invalid ARP replies are received because of a misconfiguration or a security attack.

Causes of Data Link Layer Problems • Framing errors • Frames usually work in groups of 8-bit bytes. • A framing error occurs when a frame does not end on an 8-bit byte boundary. • When this happens, the receiver may have problems determining where one frame ends and another frame starts. • Framing errors can be caused by a noisy serial line, an improperly designed cable (too long), or an incorrectly configured CSU line clock. • STP failures or loops • Most STP problems revolve around these issues: • Forwarding loops that occur when no port in a redundant topology is blocked and traffic is forwarded in circles indefinitely. • Excessive flooding because of a high rate of STP topology changes. • Slow STP convergence, which can be caused by a mismatch between the real and documented topology or by a configuration error, such as an inconsistent configuration of STP timers.

Troubleshooting Layer 2 - PPP • Most problems with PPP involve link negotiation. • The steps for troubleshooting PPP are as follows: • Step 1. Check that the appropriate encapsulation is in use at both ends, using the show interfaces serial command. • In the figure, the output reveals that R2 has been incorrectly configured to use HDLC encapsulation. • Step 2. Confirm that the Link Control Protocol (LCP) negotiations have succeeded by checking the output for the LCP Open message. • In the figure, the encapsulation on R2 has been changed to PPP. The output shows the LCP Open message, so the LCP negotiations have succeeded. • Step 3. Verify authentication on both sides of the link using the debug ppp authentication command. • In the figure, the output of the debug ppp authentication command shows that R1 is unable to authenticate R2 using CHAP, because the username and password have not been configured on R1.

Troubleshooting Layer 2 - Frame Relay • Step 1. Verify the physical connection between the CSU/DSU and the router. • Step 2. Verify that the router and Frame Relay provider are properly exchanging LMI by using the show frame-relay lmi command. • In the figure, the output of R2 shows no errors. This indicates that R2 and the Frame Relay switch are properly exchanging LMI information. • Step 3. Verify that the PVC status is active by using the show frame-relay pvc command. • In the figure, the output of R2 verifies that the PVC status is active. • Step 4. Verify that the Frame Relay encapsulation matches on both routers with the show interfaces serial command. • In the figure, the output of routers R2 and R3 shows that there is an encapsulation mismatch. • R3 has been incorrectly configured to use HDLC encapsulation instead of Frame Relay.
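The encapsulation check in Step 4 (and in Step 1 of the PPP procedure) can be automated once the show interfaces serial output from both routers has been captured. The sketch below is one possible approach, not part of the course material. It assumes the output contains a line beginning with "Encapsulation" followed by the encapsulation name; the sample text is abbreviated and illustrative, and the function names are hypothetical.

```python
import re

def encapsulation_of(show_output):
    """Extract the encapsulation name from captured 'show interfaces serial' text."""
    match = re.search(r"Encapsulation\s+([A-Za-z0-9-]+)", show_output)
    return match.group(1).upper() if match else None

def encapsulations_match(output_a, output_b):
    """Return True if both ends report the same encapsulation."""
    enc_a, enc_b = encapsulation_of(output_a), encapsulation_of(output_b)
    if enc_a is None or enc_b is None:
        raise ValueError("no Encapsulation line found in one of the outputs")
    return enc_a == enc_b

# Abbreviated, illustrative captures from the two ends of a link.
r2_output = "Serial0/0/1 is up, line protocol is down\n  Encapsulation FRAME-RELAY, loopback not set"
r3_output = "Serial0/0/1 is up, line protocol is down\n  Encapsulation HDLC, loopback not set"

print(encapsulation_of(r2_output), encapsulation_of(r3_output))
print("match" if encapsulations_match(r2_output, r3_output) else "mismatch")
```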

Troubleshooting Layer 2 - STP Loops • If you suspect an STP loop is causing a Layer 2 problem, verify that STP is running on each of the switches. • Step 1. Identify that an STP loop is occurring. • When a forwarding loop has developed in the network, these are the usual symptoms: • Loss of connectivity to, from, and through the affected network • High CPU utilization on routers connected to affected VLANs • High link utilization (often 100 percent) • High switch backplane utilization • Syslog messages that indicate packet looping in the network (for example, HSRP duplicate IP address messages) • Syslog messages that indicate constant address relearning or MAC address flapping • An increasing number of output drops on many interfaces • Step 2. Discover the topology (scope) of the loop. • The highest priority is to stop the loop and restore network operation. • To stop the loop, you must know which ports are involved. Look at the ports with the highest link utilization. The show interface command displays the utilization for each interface.
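For Step 2, ranking ports by utilization can be sketched in Python by reading the txload and rxload counters (reported as fractions of 255) from saved show interface output. The sample text is a shortened, made-up excerpt and the function name is hypothetical.

```python
import re

# Shortened, illustrative excerpt; real output has many more lines per port.
SAMPLE = """\
GigabitEthernet0/1 is up, line protocol is up (connected)
     reliability 255/255, txload 250/255, rxload 248/255
GigabitEthernet0/2 is up, line protocol is up (connected)
     reliability 255/255, txload 3/255, rxload 2/255
GigabitEthernet0/3 is up, line protocol is up (connected)
     reliability 255/255, txload 255/255, rxload 251/255
"""

LOAD_RE = re.compile(r"txload (\d+)/255, rxload (\d+)/255")


def busiest_ports(output):
    """Return (interface, percent load) pairs, highest load first."""
    loads = {}
    current = None
    for line in output.splitlines():
        # An unindented line starts a new interface section.
        if line and not line[0].isspace():
            current = line.split()[0]
        match = LOAD_RE.search(line)
        if match and current:
            tx, rx = int(match.group(1)), int(match.group(2))
            # Use the heavier direction as the port's utilization.
            loads[current] = round(max(tx, rx) / 255 * 100)
    return sorted(loads.items(), key=lambda item: item[1], reverse=True)


for name, percent in busiest_ports(SAMPLE):
    print(f"{name}: {percent}%")
```

Ports near the top of the list are the first candidates to examine when determining which links take part in the loop.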

Troubleshooting Layer 2 - STP Loops • Step 3. Break the loop. • Shut down or disconnect the involved ports one at a time. • After you disable or disconnect each port, check whether the switch backplane utilization is back to a normal level. • Document your findings. • Step 4. Find and fix the cause of the loop. • Investigate the topology diagram to find a redundant path. • For every switch on the redundant path, check these issues: • Does the switch know the correct STP root? • Is the root port identified correctly? • Are Bridge Protocol Data Units (BPDUs) received regularly on the root port and on ports that are supposed to be blocking? • Are BPDUs sent regularly on non-root, designated ports? • Step 5. Restore the redundancy. • After the device or link that is causing the loop has been found and the problem has been resolved, restore the redundant links that were disconnected. • http://cisco.com/en/US/tech/tk389/tk621/technologies_tech_note09186a0080136673.shtml#troubleshoot.
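For Step 4, finding the redundant path in a documented topology can be sketched in Python as a search for links whose endpoints stay connected even when that link is removed. The switch names and links are a hypothetical topology, and the sketch assumes at most one link between any pair of switches.

```python
from collections import deque


def still_connected(adjacency, src, dst, skipped_link):
    """Breadth-first search from src to dst that ignores one link."""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in adjacency.get(node, ()):
            if {node, neighbor} == set(skipped_link):
                continue  # pretend this link is disconnected
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def redundant_links(links):
    """Return the links that lie on a redundant path (a cycle)."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return [
        (a, b) for a, b in links if still_connected(adjacency, a, b, (a, b))
    ]


# Hypothetical topology: S1, S2, and S3 form a triangle; S4 hangs off S3.
LINKS = [("S1", "S2"), ("S2", "S3"), ("S3", "S1"), ("S3", "S4")]
print(redundant_links(LINKS))
```

Every switch that appears in the returned links sits on the redundant path and is a candidate for the root, root port, and BPDU checks listed in Step 4.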

Symptoms of Network Layer Problems • Network layer problems include any problem that involves a Layer 3 protocol, both routed protocols and routing protocols. • This topic focuses primarily on IP routing protocols. • Problems at the network layer: • Network failure • The network is nearly or completely nonfunctional, affecting all users and applications using the network. • These failures are usually noticed quickly by users and network administrators, and are obviously critical to the productivity of a company. • Network optimization problems • These usually involve a subset of users, applications, destinations, or a particular type of traffic. • Optimization issues in general can be more difficult to detect and even harder to isolate and diagnose because they usually involve multiple layers or even the host computer itself. • Determining that the problem is a network layer problem can take time.

Troubleshooting Layer 3 Problems • In most networks, static routes are used in combination with dynamic routing protocols. • Improper configuration of static routes can lead to less than optimal routing and, in some cases, can cause the network to become unreachable. • Here are some possible problems involving routing protocols: • General network issues • Often a change in the topology, such as a down link, may have effects on other areas that might not be obvious at the time. • Connectivity issues • Check for any equipment problems, cabling problems, and ISP problems. • Neighbor issues • Check whether there are any problems with the routers forming neighbor relationships. • Topology database • Check the topology table for any missing or unexpected entries. • Routing table • Check the routing table for any missing or unexpected routes. • Use debug commands to view routing updates and routing table maintenance.
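The routing table check can be sketched in Python as a comparison between the prefixes recorded in the network baseline and the prefixes currently installed. The prefix lists are hypothetical examples and would normally be taken from the documentation and from show ip route output.

```python
import ipaddress


def diff_routes(expected, actual):
    """Return (missing, unexpected) prefixes as sorted lists of strings."""
    expected_set = {ipaddress.ip_network(p) for p in expected}
    actual_set = {ipaddress.ip_network(p) for p in actual}
    missing = [str(n) for n in sorted(expected_set - actual_set)]
    unexpected = [str(n) for n in sorted(actual_set - expected_set)]
    return missing, unexpected


# Hypothetical baseline and current routing table contents.
BASELINE = ["192.168.1.0/24", "192.168.2.0/24", "192.168.3.0/24"]
CURRENT = ["192.168.1.0/24", "192.168.2.0/24", "10.9.9.0/24"]

missing, unexpected = diff_routes(BASELINE, CURRENT)
print("missing:", missing)
print("unexpected:", unexpected)
```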

Transport Layer Troubleshooting: Access List Issues 1. Selection of traffic flow • An ACL must be applied to the correct interface, and the correct traffic direction must be selected, for it to function properly. • If the router is running both ACLs and NAT, the order in which each of these technologies is applied is important: • Inbound traffic is processed by the inbound ACL before being processed by outside-to-inside NAT. • Outbound traffic is processed by the outbound ACL after being processed by inside-to-outside NAT. 2. Order of access control elements • The elements in an ACL should be ordered from specific to general. 3. Implicit deny all • Every ACL ends with an implicit deny all. Forgetting about this implicit access control element may be the cause of an ACL misconfiguration. 4. Addresses and wildcard masks • Complex wildcard masks provide significant improvements in efficiency, but are more subject to configuration errors. • For example, the address 10.0.32.0 with wildcard mask 0.0.32.15 selects the first 15 host addresses in either the 10.0.0.0 or 10.0.32.0 network.
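Items 2 through 4 can be illustrated with a minimal Python sketch of wildcard matching and first-match evaluation. It is a simplified model that assumes source-address-only entries. The function names and the sample ACL are hypothetical and do not represent router code.

```python
import ipaddress


def wildcard_match(address, base, wildcard):
    """True when address equals base in every bit the wildcard cares about.

    A 0 bit in the wildcard mask means "must match"; a 1 bit means "ignore".
    """
    addr = int(ipaddress.IPv4Address(address))
    ref = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (addr & care) == (ref & care)


def evaluate(acl, address):
    """Return the action of the first matching entry, top to bottom."""
    for action, base, wildcard in acl:
        if wildcard_match(address, base, wildcard):
            return action
    return "deny"  # the implicit deny all at the end of every ACL


# Specific entry first, general entry second (hypothetical ACL).
ACL = [
    ("deny", "10.0.32.5", "0.0.0.0"),
    ("permit", "10.0.32.0", "0.0.32.15"),
]

for host in ("10.0.32.5", "10.0.0.9", "10.0.32.16", "192.168.1.1"):
    print(host, evaluate(ACL, host))
```

If the two entries were swapped, the general permit would match 10.0.32.5 first and the specific deny would never be reached, which is why ordering from specific to general matters.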

Transport Layer Troubleshooting: Access List Issues 5. Selection of transport layer protocol • When configuring ACLs, it is important that only the correct transport layer protocols (TCP or UDP) be specified. 6. Source and destination ports • The address and port information for traffic generated by a replying host is the mirror image of the address and port information for traffic from the originating host. 7. Use of the established keyword • If the keyword is applied to an outbound ACL, unexpected results may occur. 8. Uncommon protocols • Uncommon protocols that are gaining popularity are VPN and encryption protocols. • Troubleshooting Access Control Lists • A useful tool for viewing ACL operation is the log keyword on ACL entries. • This keyword instructs the router to place an entry in the system log whenever that entry condition is matched.
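Item 6 can be illustrated with a short Python sketch that derives the reply flow from a request flow by swapping source and destination. The Flow type, the function name, and the sample addresses and ports are hypothetical.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def reply_flow(flow: Flow) -> Flow:
    """Swap source and destination to describe the returning traffic."""
    return Flow(
        flow.protocol, flow.dst_ip, flow.dst_port, flow.src_ip, flow.src_port
    )


# Hypothetical client request to a web server.
request = Flow("tcp", "10.1.1.10", 49152, "192.168.2.5", 80)
reply = reply_flow(request)
print(reply)
```

An ACL entry written for the request direction therefore does not match the reply; the entry for return traffic needs the server port as the source port.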

Transport Layer Troubleshooting: NAT Issues • The biggest problem with all NAT technologies is interoperability with other network technologies: • BOOTP and DHCP - NAT requires both a valid destination and source IP address, but the first request from a new client is a broadcast with a source address of 0.0.0.0. • Configuring the IP helper feature can help solve this problem. • DNS and WINS - NAT changes the relationship between inside and outside addresses, so a DNS or WINS server outside the NAT router may not have an accurate view of the network inside it. • Configuring the IP helper feature can help solve this problem. • SNMP - NAT is not able to alter the addressing information stored in the data payload of the packet. • Configuring the IP helper feature can help solve this problem. • Tunneling and encryption protocols - Encryption and tunneling protocols often require that traffic be sourced from a specific UDP or TCP port. • If encryption or tunneling protocols must be run through a NAT router, the network administrator can create a static NAT entry for the required port for a single IP address on the inside of the NAT router. • Improperly configured NAT timers can also result in unexpected operation. • If NAT timers are too short, entries in the NAT table may expire before replies are received, so packets are discarded. • If timers are too long, entries may stay in the NAT table longer than necessary, consuming the available connection pool.
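The two timer failure modes can be illustrated with a simplified Python model. It assumes a fixed entry lifetime that is not refreshed by traffic and a fixed-size pool. The function names and all numbers are hypothetical and do not reflect actual router behavior or defaults.

```python
def reply_translated(created_at, timeout, reply_at):
    """True if the NAT entry still exists when the reply arrives."""
    return reply_at - created_at <= timeout


def rejected_connections(arrivals, timeout, pool_size):
    """Count new connections refused because the pool is exhausted."""
    expiries = []  # expiry time of each live NAT entry
    rejected = 0
    for now in arrivals:
        # Drop entries whose timer has run out.
        expiries = [t for t in expiries if t > now]
        if len(expiries) >= pool_size:
            rejected += 1
        else:
            expiries.append(now + timeout)
    return rejected


# Timer too short: a reply after 5 seconds finds no entry and is discarded.
print(reply_translated(created_at=0, timeout=3, reply_at=5))

# Timer too long: one new connection per second against a pool of 3.
print(rejected_connections(range(10), timeout=2, pool_size=3))
print(rejected_connections(range(10), timeout=60, pool_size=3))
```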

Network Troubleshooting