Thursday, May 26, 2011

Chapter 03: Using Maintenance and Troubleshooting Tools and Applications (Part01)

Troubleshooting can be a time-consuming process. While the network is down, productivity and revenue are lost, and reputations can be ruined. Tools that enable you to diagnose and resolve problems quickly recoup their acquisition and maintenance costs. Some diagnostic tools are built in to Cisco IOS Software, so learning those tools and optimizing their use should be a top priority for any engineer who performs troubleshooting. Furthermore, Cisco IOS Software supports many technologies and protocols that can be used in combination with other specialized tools and applications to support troubleshooting and maintenance processes such as fault notification and baseline creation. This chapter reviews the built-in Cisco IOS tools and commands as well as specialized tools and applications.

Using Cisco IOS Software for Maintenance and Troubleshooting

As covered previously in Chapter 2, “Troubleshooting Processes for Complex Enterprise Networks,” much of the total time spent on troubleshooting processes is usually spent on the information-gathering stage. One of the challenges during this process is how to gather only the relevant information. Collecting and processing a lot of irrelevant information is distracting and a waste of time. Learning how to efficiently and effectively apply the basic tools that support the elementary diagnostic processes that you repeatedly exercise is worthwhile. Learning the Cisco IOS show commands used for collecting and filtering information and the commands used to test connectivity problems is vital to the support staff’s troubleshooting strength. Other relevant and beneficial skills are collecting real-time information using Cisco IOS debug commands and diagnosing basic hardware-related problems.

Collecting and Filtering Information Using Cisco IOS show Commands

You must learn how to apply filtering to Cisco IOS show commands to optimize your information gathering. During troubleshooting, you are often looking for specific information. For example, you might be looking for a particular prefix in the routing table, or you might want to verify whether a specific MAC address has been learned on an interface. Sometimes you need to find out the percentage of CPU time that is being used by a process such as the IP Input process. Using the show ip route command and the show mac-address-table command, you can display the IP routing table and the MAC address table, and using the show processes cpu command, you can check the CPU utilization for all processes on a Cisco router or switch. However, because the routing table and MAC address table can contain thousands to tens of thousands of entries, scanning through these tables to find a particular entry is neither viable nor realistic. Also, if you cannot find the entry that you are searching for, does it really mean that it is not in the table or that you simply did not spot it? Repeating the command and not seeing what you are looking for again still does not guarantee that you did not simply miss it. The list of processes on a router or switch is not hundreds or thousands of entries long; you could indeed just look through the full list and find a single process such as the IP Input process. But if you want to repeat the command every minute to see how the CPU usage for the IP Input process changes over time, displaying the whole table might not be desirable. In all these cases, you are interested in only a small subset of the information that the commands can provide. Cisco IOS Software provides options to limit or filter the output that displays.

To limit the output of the show ip route command, you can optionally enter a specific IP address on the command line. Doing so causes the router to execute a routing table lookup for that specific IP address and see whether it finds a match. If the router finds a match in the routing table, it displays the corresponding entry with all its details. If the router does not find a match in the routing table, it displays the % Subnet not in table message (see Example 3-1). Keep in mind that if a gateway of last resort (default route) is present in the IP routing table, but no entry matches the IP address you entered, the router again responds with the % Subnet not in table message even though packets for that destination are forwarded using the gateway of last resort.

Example 3-1: Filtering Output of the show ip route Command

RO1# show ip route 10.1.193.3
Routing entry for 10.1.193.0/30
Known via "connected", distance 0, metric 0 (connected, via interface)
Redistributing via eigrp 1
Routing Descriptor Blocks:
* directly connected, via Serial0/0/1
Route metric is 0, traffic share count is 1

RO1# show ip route 10.1.193.10
% Subnet not in table
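
The lookup behavior described above can be approximated in a few lines of Python. This is only an illustrative sketch and not how IOS implements the command: the `lookup` function and the `routes` list are hypothetical, and the default route (0.0.0.0/0) is deliberately skipped to mirror the % Subnet not in table response described in the text.

```python
import ipaddress

def lookup(routes, address):
    """Return the most specific non-default prefix that contains address, or None."""
    addr = ipaddress.ip_address(address)
    candidates = [
        net for net in map(ipaddress.ip_network, routes)
        # Skip 0.0.0.0/0 on purpose (assumption that models the behavior in the text).
        if net.prefixlen > 0 and addr in net
    ]
    return max(candidates, key=lambda net: net.prefixlen, default=None)

# Hypothetical table: one connected subnet from Example 3-1 plus a default route.
routes = ["0.0.0.0/0", "10.1.193.0/30", "10.1.192.0/30"]

for target in ("10.1.193.3", "10.1.193.10"):
    match = lookup(routes, target)
    print(target, "->", f"Routing entry for {match}" if match else "% Subnet not in table")
```

The second address falls outside every listed subnet, so the sketch reports it as missing even though a default route is present in the hypothetical table.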

Another option to limit the output of the show ip route command to a particular subset of routing information that you are interested in is typing a prefix followed by the optional longer-prefixes keyword, as demonstrated in Example 3-2. The router will then list all subnets that fall within the prefix that you have specified (including that prefix itself, if it is present in the routing table). If the network that you are troubleshooting has a good hierarchical IP numbering plan, the longer-prefixes command option can prove useful for displaying addresses from a particular part of the network. You can display all subnets of a particular branch office or data center, for example, using the summary address for these blocks and the longer-prefixes keyword.

Example 3-2: Using the longer-prefixes Keyword with show ip route

CRO1# show ip route 10.1.193.0 255.255.255.0 longer-prefixes
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 46 subnets, 6 masks
C 10.1.193.2/32 is directly connected, Serial0/0/1
C 10.1.193.0/30 is directly connected, Serial0/0/1
D 10.1.193.6/32 [90/20517120] via 10.1.192.9, 2d01h, FastEthernet0/1
[90/20517120] via 10.1.192.1, 2d01h, FastEthernet0/0
D 10.1.193.4/30 [90/20517120] via 10.1.192.9, 2d01h, FastEthernet0/1
[90/20517120] via 10.1.192.1, 2d01h, FastEthernet0/0
D 10.1.193.5/32 [90/41024000] via 10.1.194.6, 2d01h, Serial0/0/0.122
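
As a rough illustration of what the longer-prefixes keyword selects, the following Python sketch keeps only the table entries that fall within a given prefix. The `longer_prefixes` helper and the `table` list are hypothetical names; the prefixes are taken from Example 3-2, with two extra entries outside 10.1.193.0/24 added for contrast.

```python
import ipaddress

def longer_prefixes(routes, prefix):
    """Return the routes that are equal to, or more specific than, prefix."""
    parent = ipaddress.ip_network(prefix)
    return [r for r in routes if ipaddress.ip_network(r).subnet_of(parent)]

table = [
    "10.1.192.0/30",   # outside the requested block (added for contrast)
    "10.1.193.2/32",
    "10.1.193.0/30",
    "10.1.193.6/32",
    "10.1.193.4/30",
    "10.1.193.5/32",
    "10.1.194.4/30",   # outside the requested block (added for contrast)
]

for route in longer_prefixes(table, "10.1.193.0/24"):
    print(route)
```

With a hierarchical numbering plan, the same call with a branch office summary address would list every subnet of that branch.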

Unfortunately, show commands do not always have an option that allows you to filter the output down to exactly what you need. You can still apply a more generic form of filtering. The output of Cisco IOS show commands can be filtered by appending a pipe character (|) to the show command followed by one of the keywords include, exclude, or begin, and then a regular expression. Regular expressions are patterns that can be used to match strings in a piece of text. In its simplest form, a regular expression matches words or text fragments in a line of text, but full use of the regular expression syntax allows you to build complex expressions that match specific text patterns. Example 3-3 shows usage of the include, exclude, and begin keywords with the show processes cpu, show ip interface brief, and show running-config commands, respectively.

Example 3-3: Using include, exclude, and begin Keywords with show Commands

RO1# show processes cpu | include IP Input
71 3149172 7922812 397 0.24% 0.15% 0.05% 0 IP Input

SW1# show ip interface brief | exclude unassigned
Interface IP-Address OK? Method Status Protocol
Vlan128 10.1.156.1 YES NVRAM up up

SW1# show running-config | begin line vty
line vty 0 4
transport input telnet ssh
line vty 5 15
transport input telnet ssh
!
end

In Example 3-3 you are only interested in the IP Input process in the output of the show processes cpu command, so you select only the lines that contain the string “IP Input” by using the command show processes cpu | include IP Input.

You can exclude lines from the output through use of the | exclude option, which, for example, can be useful on a switch where you are trying to obtain all of the IP addresses on the interfaces with the show ip interface brief command. On a switch that has many interfaces (ports), the output of this command will also list all the interfaces that have no IP address assigned. If you are looking for the interfaces that have an IP address only, these lines obscure the output. If you know that all interfaces without an IP address have the string “unassigned” in place of the IP address, as you can see in Example 3-3, you can exclude those lines from the output by issuing the command show ip interface brief | exclude unassigned.

Finally, using | begin allows you to skip all command output up to the first occurrence of the regular expression pattern. In Example 3-3, you are only interested in checking the configuration for the vty lines and you know that the vty configuration commands are at the bottom of the router’s running configuration file. So, you jump straight to the vty configuration point by issuing the command show running-config | begin line vty.
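
The three keywords can be modeled as simple line filters. The Python sketch below is an approximation for illustration only: `ios_filter` is a hypothetical helper, Python's `re` syntax is not identical to the IOS regular expression syntax, and the sample lines are abbreviated (the Vlan1 line is invented to give the exclude filter something to remove).

```python
import re

def ios_filter(lines, keyword, pattern):
    """Mimic '| include', '| exclude', and '| begin' on a list of output lines."""
    regex = re.compile(pattern)
    if keyword == "include":
        return [line for line in lines if regex.search(line)]
    if keyword == "exclude":
        return [line for line in lines if not regex.search(line)]
    if keyword == "begin":
        # Skip everything before the first matching line, then keep the rest.
        for index, line in enumerate(lines):
            if regex.search(line):
                return lines[index:]
        return []
    raise ValueError(f"unsupported keyword: {keyword}")

brief = [
    "Interface IP-Address OK? Method Status Protocol",
    "Vlan1 unassigned YES NVRAM administratively down down",  # hypothetical line
    "Vlan128 10.1.156.1 YES NVRAM up up",
]
config = ["hostname SW1", "line vty 0 4", " transport input telnet ssh", "end"]

print(ios_filter(brief, "exclude", "unassigned"))
print(ios_filter(config, "begin", "line vty"))
```

Note that include and exclude judge each line on its own, whereas begin only uses the pattern to find a starting point and then prints everything after it.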

Cisco IOS Software Release 12.3(2)T introduced the section option, which allows you to select and display specific sections of the configuration: the lines that match a particular regular expression together with any associated lines that follow them. Example 3-4 demonstrates using the command show running-config | section router eigrp to display only the EIGRP configuration section.

Example 3-4: Using the | section and ^ Options to Filter Output of show Commands

RO1# show running-config | section router eigrp
router eigrp 1
network 10.1.192.2 0.0.0.0
network 10.1.192.10 0.0.0.0
network 10.1.193.1 0.0.0.0
no auto-summary

RO1# show processes cpu | include ^CPU|IP Input
CPU utilization for five seconds: 1%/0%; one minute: 1%; five minutes: 1%

71 3149424 7923898 397 0.24% 0.04% 0.00% 0 IP Input

If you used show running-config | section router instead, all lines that include the expression router, along with the configuration section that follows each of those lines, would be displayed. In other words, all routing protocol configuration sections would be displayed and the rest of the configuration would not. This makes | section more restrictive than the | begin option, but more useful than the | include option when you want to select sections instead of only lines that contain a specific expression. Although the show running-config command is the most obvious candidate for the use of the | section option, this option can also be applied to any show command that separates its output into sections. For example, if you want to display only the standard access lists in the output of the show access-lists command, you could achieve that by issuing the command show access-lists | section standard.
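
The difference between selecting lines and selecting sections can be sketched in Python. This is a simplified model and not the IOS implementation: `ios_section` is a hypothetical helper, and it assumes that the lines belonging to a section are indented with a leading space, as subcommands are in a real running configuration (the indentation is not visible in the examples reproduced here). The sample configuration is abbreviated, and the OSPF section is invented for contrast.

```python
import re

def ios_section(lines, pattern):
    """Keep each line matching pattern plus the indented lines that follow it."""
    regex = re.compile(pattern)
    selected, in_section = [], False
    for line in lines:
        if in_section and line.startswith(" "):
            selected.append(line)          # subcommand of a selected section
        else:
            in_section = bool(regex.search(line))
            if in_section:
                selected.append(line)      # line that opens a selected section
    return selected

config = [
    "interface FastEthernet0/0",
    " ip address 10.1.192.2 255.255.255.252",
    "router eigrp 1",
    " network 10.1.192.2 0.0.0.0",
    " no auto-summary",
    "router ospf 1",                        # hypothetical section
    " network 10.1.220.1 0.0.0.0 area 0",   # hypothetical line
    "line vty 0 4",
    " transport input telnet ssh",
]

print(ios_section(config, "router eigrp"))
print(ios_section(config, "router"))
```

The first call returns only the EIGRP section, while the second returns both routing protocol sections and nothing else.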

The include, exclude, begin, and section options are usually followed by just a word or text fragment, but it is possible to use regular expressions for more granular filtering. For example, the second command used in Example 3-4 uses the caret (^) character, which is used to denote that a particular string will be matched only if it occurs at the beginning of a line. The expression ^CPU will therefore only match lines that start with the characters “CPU” and not any line that contains the string “CPU”. The same line uses the pipe character (|) (without a preceding and following space) as part of a regular expression to signify a logical OR. As a result, the show processes cpu | include ^CPU|IP Input command displays only the lines that start with the string “CPU” or contain the string “IP Input”.
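
The effect of the caret anchor and the OR operator can be checked with Python's `re` module, which treats ^ and | the same way for this simple pattern (the two regular expression dialects differ in other respects). The sample lines come from Example 3-4, except for the "CPU Monitor" line, which is a made-up process name added to show that an unanchored "CPU" elsewhere in a line does not match.

```python
import re

# Same expression as in Example 3-4: lines starting with "CPU" OR containing "IP Input".
pattern = re.compile(r"^CPU|IP Input")

output = [
    "CPU utilization for five seconds: 1%/0%; one minute: 1%; five minutes: 1%",
    "12 100 200 500 0.00% 0.00% 0.00% 0 CPU Monitor",   # hypothetical line
    "71 3149424 7923898 397 0.24% 0.04% 0.00% 0 IP Input",
]

matched = [line for line in output if pattern.search(line)]
for line in matched:
    print(line)
```

Only the first and third lines are printed: the second line contains "CPU", but not at the beginning of the line.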

Other useful options that can be used with the pipe character after the show command are redirect, tee, and append. The output of a show command can be redirected, copied or appended to a file by using the pipe character, followed by the options redirect, tee, or append and a URL that denotes the file. Example 3-5 depicts sample usage of these options with the show tech-support, show ip interface brief and the show version commands.

Example 3-5: Using the redirect, append, and tee options with show Commands

RO1# show tech-support | redirect tftp://192.168.37.2/show-tech.txt
! The redirect option does not display the output on the console
RO1# show ip interface brief | tee flash:show-int-brief.txt
! The tee option displays the output on the console and sends it to the file
Interface IP-Address OK? Method Status Protocol
FastEthernet0/0 10.1.192.2 YES manual up up
FastEthernet0/1 10.1.192.10 YES manual up up
Loopback0 10.1.220.1 YES manual up up

RO1# dir flash:
Directory of flash:/
1 -rw- 23361156 Mar 2 2009 16:25:54 -08:00 c1841-advipservicesk9mz.1243.bin
2 -rw- 680 Mar 7 2009 02:16:56 -08:00 show-int-brief.txt

RO1# show version | append flash:show-commands.txt
RO1# show ip interface brief | append flash:show-commands.txt
! The append option allows you to add the command output to an existing file
RO1# more flash:show-commands.txt
Cisco IOS Software, 1841 Software (C1841-ADVIPSERVICESK9-M), Version 12.4(23),
RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2008 by Cisco Systems, Inc.
Compiled Sat 08-Nov-08 20:07 by prod_rel_team
ROM: System Bootstrap, Version 12.3(8r)T9, RELEASE SOFTWARE (fc1)
RO1 uptime is 3 days, 1 hour, 22 minutes
<...output omitted...>
Interface IP-Address OK? Method Status Protocol
FastEthernet0/0 10.1.192.2 YES manual up up
FastEthernet0/1 10.1.192.10 YES manual up up
Loopback0 10.1.220.1 YES manual up up

When you use the | redirect option on a show command, the output will not display on the screen, but will be redirected to a text file instead. This file can be stored locally on the device’s flash memory or it can be stored on a network server such as a TFTP or FTP server. The | tee option is similar to the | redirect option, but this command both displays the output on your screen and copies it to a text file. Finally, the | append option is analogous to the | redirect option, but it allows you to append the output to a file instead of replacing that file. The use of this command option makes it easy to collect the output of several show commands in a text file. A prerequisite for this option is that the file system that you are writing to must support “append” operations; so for instance, a TFTP server cannot be used in this case.

Testing Network Connectivity Using ping and Telnet

The ping utility is a popular network connectivity testing tool that has been part of Cisco IOS Software since the first version of IOS. The ping utility has some extended options that are useful for testing specific conditions, including the following:

  • repeat repeat-count: By default, the Cisco IOS ping command sends out five ICMP echo-request packets. The repeat option allows you to specify how many echo-request packets are sent. This proves particularly useful when you are troubleshooting a packet-loss situation. The repeat option enables you to send out hundreds to thousands of packets to help pinpoint a pattern in the occurrence of the packet loss. For example, if you see a pattern where every other packet is lost, resulting in exactly 50 percent packet loss, you might have a load-balancing situation with packet loss on one path.

  • size datagram-size: This option allows you to specify the total size of the ping packet (including headers) in bytes that will be sent. In combination with the repeat option, you can send a steady stream of large packets and generate some load. The quickest way to generate a heavy load using the ping command is to combine a very large repeat number, a size set to 1500 bytes, and the timeout option set to 0 seconds. When used with the Don’t Fragment (df-bit) option (discussed after Example 3-6), the size option allows you to determine the maximum transmission unit (MTU) along the path to a particular destination IP address.

  • source [address | interface]: This option allows you to set the source IP address or interface of the ping packet. The IP address has to be one of the local device’s own IP addresses. If this option is not used, the router will select the IP address of the egress interface as the source of the ping packets.

    Example 3-6 shows a case where a simple ping succeeds, but the ping with the source IP address set to the IP address of the FastEthernet 0/0 interface fails. You can conclude from the successful initial ping that the local router has a working path to the destination IP address 10.1.156.1. For the second ping, because a different source address is used, the return packets will have a different destination IP address. The most likely explanation for the failure of the second ping is that at least one of the routers on the return path does not have a route to the address/subnet of the FastEthernet 0/0 interface (used as source in the second ping). There might be several other reasons for this, too. For example, an access list on one of the transit routers might be blocking the IP address of the Fa 0/0 interface. Specifying the source IP address or interface proves useful when you want to check two-way reachability to and from an address or network other than that of the router’s egress interface.

    Example 3-6: ping Extended Option: Source

    RO1# ping 10.1.156.1
    Type escape sequence to abort.
    Sending 5, 100-byte ICMP Echos to 10.1.156.1, timeout is 2 seconds:
    !!!!!
    Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

    RO1# ping 10.1.156.1 source FastEthernet 0/0
    Type escape sequence to abort.
    Sending 5, 100-byte ICMP Echos to 10.1.156.1, timeout is 2 seconds:
    Packet sent with a source address of 10.1.192.2
    .....
    Success rate is 0 percent (0/5)

  • df-bit: This option sets the Don’t Fragment bit in the IP header to indicate that routers should not fragment this packet. If the packet is larger than the MTU of the outbound interface, the router should drop the packet and send an ICMP Fragmentation needed and DF bit set message back to the source. This option can be very useful when you are troubleshooting MTU-related problems. By setting the df-bit option and combining it with the size option, you can force routers along the path to drop the packets if they would have to fragment them. By varying the size and looking at which point the packets start being dropped, you can determine the MTU.

Example 3-7 shows successful ping results when a packet size of 1476 bytes is used; however, ping packets with a size of 1477 bytes are not successful. The M in the output of the ping command signifies that an ICMP Fragmentation needed and DF bit set message was received. From this, you can conclude that somewhere along the path to the destination there must be a link that has an MTU of 1476 bytes. A possible explanation for this could be usage of a generic routing encapsulation (GRE) tunnel, which typically has an MTU of 1476 bytes (1500 bytes default MTU minus 24 bytes for the GRE and IP headers).

Example 3-7: ping Extended Option: df-bit

RO1# ping 10.1.221.1 size 1476 df-bit
Type escape sequence to abort.
Sending 5, 1476-byte ICMP Echos to 10.1.221.1, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 184/189/193 ms

RO1# ping 10.1.221.1 size 1477 df-bit
Type escape sequence to abort.
Sending 5, 1477-byte ICMP Echos to 10.1.221.1, timeout is 2 seconds:
Packet sent with the DF bit set
M.M.M
Success rate is 0 percent (0/5)

There are more extended options available for ping through the interactive dialog. If you type ping without any additional options and press Enter, you will be prompted with a series of questions regarding the source and destination and all the ping options. Example 3-8 uses the Sweep range of sizes option. This option allows you to send a series of packets that increase in size and can be useful to determine the MTU along a path, similar to the previous example.

Example 3-8: ping Option: Sweep Range of Sizes

RO1# ping
Protocol [ip]:
Target IP address: 10.1.221.1
Repeat count [5]: 1
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface:
Type of service [0]:
Set DF bit in IP header? [no]: yes
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Time stamp, Verbose[none]:
Sweep range of sizes [n]: y
Sweep min size [36]: 1400
Sweep max size [18024]: 1500
Sweep interval [1]:
Type escape sequence to abort.
Sending 101, [1400..1500]-byte ICMP Echos to 10.1.221.1, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!M.M.M.M.M.M.M.M.M.M.M.M.
Success rate is 76 percent (77/101), round-trip min/avg/max = 176/184/193 ms

When you want to determine the MTU of a particular path, you often do not have a good initial guess, and it might take you many tries to find the exact MTU. In Example 3-8, the router is instructed to send packets starting at a size of 1400 bytes, sending a single packet per size and increasing the size one byte at a time until a size of 1500 bytes is reached. Again, the DF bit is set on the packets. The result is that the router sent out 101 consecutive packets: the first one was 1400 bytes, the last one was 1500 bytes, 77 of the pings were successful, and 24 failed. Again, this means that there must be a link along the path that has an MTU of 1476 bytes.
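
The numbers in Example 3-8 can be reproduced with a short calculation. The Python sketch below is illustrative only: `sweep_results` is a hypothetical helper, and it assumes the GRE explanation offered for Example 3-7 (a 1500-byte MTU reduced by 24 bytes of GRE and IP headers) and that every packet at or below the path MTU succeeds while every larger packet fails.

```python
def sweep_results(min_size, max_size, path_mtu, interval=1):
    """Return (sent, succeeded, failed) for a DF-bit sweep across a path MTU."""
    sizes = range(min_size, max_size + 1, interval)
    succeeded = sum(1 for size in sizes if size <= path_mtu)
    return len(sizes), succeeded, len(sizes) - succeeded

GRE_OVERHEAD = 24                      # 4-byte GRE header + 20-byte IP header
path_mtu = 1500 - GRE_OVERHEAD         # 1476 bytes

sent, succeeded, failed = sweep_results(1400, 1500, path_mtu)
print(f"sent={sent} succeeded={succeeded} failed={failed}")
print(f"success rate: {100 * succeeded // sent} percent")
```

Sizes 1400 through 1476 account for the 77 successful packets, and sizes 1477 through 1500 account for the 24 failures, which matches the 76 percent success rate in the example.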


Note

Because some applications cannot reassemble fragmented packets, if the network fragments that application’s packet, the application will fail. Sometimes by discovering the MTU of a path, the application can be configured to not send packets larger than the MTU so that fragmentation does not happen. That is why it is sometimes necessary to find out the MTU of a path.


Note

The various symbols generated in ping output are described as follows:

  • !: Each exclamation point indicates receipt of a reply.

  • .: Each period indicates the network server timed out while waiting for a reply.

  • U: A destination unreachable error PDU was received.

  • Q: Source quench (destination too busy).

  • M: Could not fragment.

  • ?: Unknown packet type.

  • &: Packet lifetime exceeded.
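
When a ping runs to hundreds of packets, counting these symbols by eye becomes error-prone. The following Python sketch tallies them from a captured result string; `summarize` and `SYMBOLS` are hypothetical names, and the sample string is rebuilt from the counts reported in Example 3-8 (77 replies followed by 12 "M." pairs).

```python
from collections import Counter

SYMBOLS = {
    "!": "reply received",
    ".": "timed out waiting for a reply",
    "U": "destination unreachable",
    "Q": "source quench",
    "M": "could not fragment",
    "?": "unknown packet type",
    "&": "packet lifetime exceeded",
}

def summarize(result):
    """Count ping result symbols and compute the integer success rate."""
    counts = Counter(ch for ch in result if ch in SYMBOLS)  # ignores line breaks
    sent = sum(counts.values())
    rate = 100 * counts["!"] // sent if sent else 0
    return counts, sent, rate

result = "!" * 77 + "M." * 12
counts, sent, rate = summarize(result)
for symbol, count in counts.items():
    print(f"{symbol} x{count}: {SYMBOLS[symbol]}")
print(f"Success rate is {rate} percent ({counts['!']}/{sent})")
```

The same tally could be extended to look for patterns, such as every other packet being lost, which the repeat option is meant to expose.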

Telnet is an excellent companion to ping for testing transport layer connections from the command line. Assume that you are troubleshooting a problem where someone has trouble sending e-mail through a particular SMTP server. Taking a divide-and-conquer approach, you ping the server, and the ping succeeds. This means that the network layer between your device and the server is operational. Now you have to investigate the transport layer. You could configure a client and start a top-down troubleshooting procedure, but it is more convenient if you first establish whether Layer 4 is operational. The Telnet protocol can prove useful in this situation. If you want to determine whether a particular TCP-based application is active on a server, you can attempt a Telnet connection to the TCP port of that application. In Example 3-9, a Telnet connection to port 80 (HTTP) on a server succeeds, and a Telnet connection to port 25 (SMTP) is unsuccessful.

Example 3-9: Using Telnet to Test the Transport Layer

RO1# telnet 192.168.37.2 80
Trying 192.168.37.2, 80 ... Open
GET

It works!

[Connection to 192.168.37.2 closed by foreign host]


RO1# telnet 192.168.37.2 25
Trying 192.168.37.2, 25 ...
% Connection refused by remote host

Even though the Telnet server application uses the TCP well-known port number 23 and Telnet clients connect to that port by default, you can specify a specific port number on the client and connect to any TCP port that you want to test. The connection is either accepted (as indicated by the word Open in Example 3-9), or it is refused, or times out. The Open response indicates that the port (application) you attempted is active, and the other results require further investigation. For applications that use an ASCII-based session protocol, you might even see an application banner or you might be able to trigger some responses from the server by typing in some keywords (as in Example 3-9). Good examples of these types of protocols are SMTP, FTP, and HTTP.
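
The same transport layer check can be made from a management workstation. The Python sketch below is an illustration using the standard `socket` module, not a replacement for the IOS telnet command; `probe_tcp` is a hypothetical helper that distinguishes the three outcomes named above. To keep the example self-contained, it probes a temporary listener on the local machine rather than the server from Example 3-9.

```python
import socket

def probe_tcp(host, port, timeout=2.0):
    """Attempt a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timed out"
    except OSError as exc:
        return f"error: {exc}"

# Temporary local listener standing in for a server application (assumption).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))        # port 0 lets the OS choose a free port
listener.listen(1)
port = listener.getsockname()[1]

print(port, probe_tcp("127.0.0.1", port))
listener.close()
```

Against a real server, a call such as `probe_tcp("192.168.37.2", 25)` would report whether the SMTP port accepts connections; a "refused" or "timed out" result calls for further investigation, just as in the Telnet case.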

Collecting Real-time Information Using Cisco IOS debug Commands

First, it is important to caution readers that because debugging output is assigned high priority in the CPU process, it can render the system unusable. For this reason, use debug commands only to troubleshoot specific problems or during troubleshooting sessions with Cisco technical support staff. Moreover, it is best to use debug commands during periods of lower network traffic and fewer users.

All debug commands are entered in privileged EXEC mode, and most debug commands take no arguments. Any debug command can be turned off by retyping the command and preceding it with a no. To display the state of each debugging option, enter the show debugging command in the Cisco IOS privileged EXEC mode. The no debug all command turns off all diagnostic output. Using the no debug all command is a convenient way to ensure that you have not accidentally left any debug commands turned on. To list and see a brief description of all the debugging command options, enter the debug ? command. Because there are numerous useful Cisco IOS debug commands, only two of the debug ip options are discussed here.

debug ip packet [access-list-number][detail]

To display general IP debugging information and IP security option (IPSO) security transactions, use the debug ip packet command. The option to use an access list with the debug ip packet command enables you to limit the scope of the debug ip packet command to those packets that match the access list. The detail option of this debug displays detailed IP packet-debugging information. This information includes the packet types and codes and source and destination port numbers.

If a communication session is closing when it should not be, an end-to-end connection problem may be the cause. The debug ip packet command is useful for analyzing the messages traveling between the local and remote hosts. IP packet debugging captures the packets that are process switched including received, generated, and forwarded packets. Example 3-10 shows sample output from the debug ip packet command.

Example 3-10: debug ip packet Sample Output

IP: s=172.69.13.44 (Fddi0), d=10.125.254.1 (Serial2), g=172.69.16.2, forward
IP: s=172.69.1.57 (Ethernet4), d=10.36.125.2 (Serial2), g=172.69.16.2, forward
IP: s=172.69.1.6 (Ethernet4), d=255.255.255.255, rcvd 2
IP: s=172.69.1.55 (Ethernet4), d=172.69.2.42 (Fddi0), g=172.69.13.6, forward
IP: s=172.69.89.33 (Ethernet2), d=10.130.2.156 (Serial2), g=172.69.16.2, forward
IP: s=172.69.1.27 (Ethernet4), d=172.69.43.126 (Fddi1), g=172.69.23.5, forward
IP: s=172.69.1.27 (Ethernet4), d=172.69.43.126 (Fddi0), g=172.69.13.6, forward
IP: s=172.69.20.32 (Ethernet2), d=255.255.255.255, rcvd 2
IP: s=172.69.1.57 (Ethernet4), d=10.36.125.2 (Serial2), g=172.69.16.2, access denied
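
Debug output of this kind is easier to analyze once it is captured to a file and parsed. The Python sketch below shows one possible way to split the lines of Example 3-10 into fields; the regular expression and the field names (`src`, `in_if`, `dst`, `out_if`, `gw`, `action`) are assumptions based only on the sample lines shown here, and other IOS releases may format the output differently.

```python
import re
from collections import Counter

LINE = re.compile(
    r"^IP: s=(?P<src>[\d.]+)(?: \((?P<in_if>[^)]+)\))?, "
    r"d=(?P<dst>[\d.]+)(?: \((?P<out_if>[^)]+)\))?, "
    r"(?:g=(?P<gw>[\d.]+), )?(?P<action>.+)$"
)

def parse(line):
    """Return a dict of fields for one debug ip packet line, or None."""
    match = LINE.match(line.strip())
    return match.groupdict() if match else None

sample = [
    "IP: s=172.69.13.44 (Fddi0), d=10.125.254.1 (Serial2), g=172.69.16.2, forward",
    "IP: s=172.69.1.6 (Ethernet4), d=255.255.255.255, rcvd 2",
    "IP: s=172.69.1.57 (Ethernet4), d=10.36.125.2 (Serial2), g=172.69.16.2, access denied",
]

records = [parse(line) for line in sample]
print(Counter(record["action"] for record in records))
for record in records:
    if record["action"] == "access denied":
        print("denied:", record["src"], "->", record["dst"])
```

Filtering on the action field makes it straightforward to pick out denied packets from a long capture.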

debug ip rip

To display information on Routing Information Protocol (RIP) routing transactions, use the debug ip rip command. Example 3-11 shows a sample output from the debug ip rip command.

Example 3-11: debug ip rip Sample Output

RIP: received v2 update from 10.1.1.2 on Serial0/0/0
30.0.0.0/8 via 0.0.0.0 in 1 hops
RIP: sending v2 update to 224.0.0.9 via FastEthernet0/0 (20.1.1.1)
RIP: build update entries
10.0.0.0/8 via 0.0.0.0, metric 1, tag 0
30.0.0.0/8 via 0.0.0.0, metric 2, tag 0
RIP: sending v2 update to 224.0.0.9 via Serial0/0/0 (10.1.1.1)
RIP: build update entries
20.0.0.0/8 via 0.0.0.0, metric 1, tag 0
RIP: received v2 update from 10.1.1.2 on Serial0/0/0
30.0.0.0/8 via 0.0.0.0 in 1 hops
RIP: sending v2 update to 224.0.0.9 via FastEthernet0/0 (20.1.1.1)

Example 3-11 shows that the router being debugged has received a RIPv2 update from a router with the address 10.1.1.2. That router sent an update about network 30.0.0.0/8 being one hop away from it. If a destination is reported as more than 15 hops away, it is considered inaccessible. The router being debugged also sent updates to the multicast address 224.0.0.9, as opposed to RIPv1, which sends updates to the broadcast address 255.255.255.255.

Diagnosing Hardware Issues Using Cisco IOS Commands

The three main categories of failure causes in a network are as follows: hardware failures, software failures (bugs), and configuration errors. One could argue that performance problems form a fourth category, but performance problems are symptoms rather than failure causes. Having a performance problem means that there is a difference between the expected behavior and the observed behavior of a system. Sometimes the system is functioning as it should, but the results are not what were expected or promised. In this case, the problem is not technical, but organizational in nature and cannot be resolved through technical means. On the other hand, there are situations where the system is not functioning as it should. In this case, the system behaves differently than expected, but the underlying cause is a hardware failure, a software failure, or a configuration error. The focus here is on diagnosing and resolving configuration errors. There are a number of reasons for this focus. When hardware or software is suspected to be the cause of a problem, about the only remedy is to swap it out, so the actions that can be taken to resolve such a problem are limited.

Add a note hereThe detailed information necessary to pinpoint a specific hardware or software problem is often not publicly available, and therefore hardware and software troubleshooting are processes that are generally executed as a joint effort with a vendor (or a reseller or partner for that vendor). Documentation of the configuration and operation of software features is generally publicly available, and therefore configuration problems can often be diagnosed without the need for direct assistance from the vendor or reseller. However, even if you decide to focus your troubleshooting effort on configuration errors initially, as your work progresses and you eliminate common configuration problems from the equation, you might pick up clues that hardware components are the root cause of the problem. You will then need to do an initial analysis and diagnosis of the problem, before it is escalated to the vendor. The move the problem method is an obvious candidate to approach suspected hardware problems, but this method works well only if the problem is strictly due to a broken piece of hardware. Performance problems that might be caused by hardware failures generally require a more subtle approach and require more detailed information gathering. When hardware problems are intermittent, they are harder to diagnose and isolate.

By its nature, diagnosing hardware problems is highly product and platform dependent. However, you can use a number of generic commands to diagnose performance-related hardware issues on all Cisco IOS platforms. Essentially, a network device is a specialized computer with, at the very least, a CPU, RAM, and storage. These components allow the network device to boot and run the operating system. Next, interfaces are initialized and started, which allows for reception and transmission of network traffic. Therefore, when you decide that a problem you are observing on a given device may be hardware related, it is important that you verify the operation of these generic components. The Cisco IOS commands most commonly used for this purpose are the show processes cpu, show memory, and show interfaces commands, as covered in the sections that follow.

Checking CPU Utilization

Both routers and switches have a main CPU that executes the processes that constitute Cisco IOS Software. Processes are scheduled to share the available CPU cycles and take turns executing their code. The show processes cpu command provides you with an overview of all processes currently running on the router, including the total CPU time that each process has consumed over its lifetime and its CPU usage over the last 5 seconds, 1 minute, and 5 minutes. The first line of output from the show processes cpu command displays the overall CPU utilization percentages for those three intervals. From this information, you can see whether the total CPU usage is high or low and which processes might be causing the CPU load. By default, the processes are sorted by process ID, but they can be sorted based on the 5-second, 1-minute, and 5-minute averages. Figure 3-1 shows a sample output of the show processes cpu command entered with the 1-minute sort option.

Figure 3-1: The show processes cpu Command Output Example

The example depicted in Figure 3-1 shows that over the past minute 31 percent of the available CPU has been used and that the "SSH Process" was responsible for roughly half of these CPU cycles (15.67 percent) over that period. However, the next process in this sorted list is the "Check heaps" process, which has consumed only 0.78 percent of the total available CPU time over the last minute, and the list quickly drops off after that. You might wonder what the remaining 15 percent of CPU cycles recorded over the last minute were spent on. On the router used to generate the output depicted in Figure 3-1, the same CPU that is used to run the operating system processes is also responsible for packet switching. The CPU is interrupted to suspend the current process that it is executing, switch one or more packets, and resume the execution of scheduled processes. The CPU time spent on interrupt-driven tasks can be calculated by adding the CPU percentages for all processes and then subtracting that total from the total CPU percentage listed at the top. For the 5-second CPU usage, this figure is listed separately after the slash. This means that in the example shown in Figure 3-1, 30 percent of the total available CPU cycles over the past 5 seconds were used, of which 26 percent were spent in interrupt mode and 4 percent on the execution of scheduled processes.
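The arithmetic described above can be sketched in a few lines of Python. This is only an illustration: it assumes the first line of the output has the usual "five seconds: total%/interrupt%" form, the five-minute value in the sample string is invented (the figure is not reproduced here), and the function names are hypothetical.

```python
import re

# Assumed header line of "show processes cpu". The 5-second and 1-minute
# values come from the discussion of Figure 3-1; the 5-minute value is
# made up for illustration.
HEADER = "CPU utilization for five seconds: 30%/26%; one minute: 31%; five minutes: 30%"


def split_five_second_cpu(header: str) -> tuple[int, int, int]:
    """Return (total, interrupt, process) CPU percentages for the last 5 seconds."""
    match = re.search(r"five seconds: (\d+)%/(\d+)%", header)
    if match is None:
        raise ValueError("unrecognized header line")
    total, interrupt = int(match.group(1)), int(match.group(2))
    return total, interrupt, total - interrupt


def interrupt_share(total_pct: float, process_pcts: list[float]) -> float:
    """Estimate interrupt CPU for an interval: total minus the sum of all processes."""
    return total_pct - sum(process_pcts)


total, interrupt, process = split_five_second_cpu(HEADER)
print(f"5 s: total {total}%, interrupt {interrupt}%, processes {process}%")

# Only the two largest 1-minute process values are known from the text, so
# this gives an upper bound on the interrupt share rather than an exact figure.
print(f"1 min interrupt share is at most {interrupt_share(31, [15.67, 0.78]):.2f}%")
```

For the 1-minute and 5-minute intervals there is no value after a slash, so the subtraction over the full process list is the only way to estimate the interrupt share.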

Because the CPU also switches packets, it is quite normal for such routers to run at high CPU loads during peaks in network traffic. In those cases, most of the CPU cycles will be consumed in interrupt mode. If particular processes consistently use large chunks of the available CPU time, however, this could be a clue that a problem exists with that particular process. To be able to draw any definitive conclusions, you need to have a baseline of the CPU usage over time. Keep in mind that better caching mechanisms reduce the number of CPU interrupts and, consequently, the CPU utilization attributable to interrupts. For example, Cisco Express Forwarding (CEF) in distributed mode allows most packet switching to happen on the line card without causing any CPU interrupts.

On LAN switches, the essential elements of the show processes cpu command output are the same as on routers, but the interpretation of the numbers tends to be a bit different. Switches have specialized hardware that handles the switching task, so the main CPU should in general not be involved in it. When you see a high percentage of the CPU time being spent in interrupt mode, this usually indicates that traffic is being punted, that is, forwarded in software instead of in hardware using the ternary content-addressable memory (TCAM). Punted traffic is traffic that is processed and forwarded through less-efficient means for a reason, such as tunneling or encryption. Once you have determined that the CPU load is abnormally high and you decide to investigate further, you generally have to resort to platform-specific troubleshooting commands to gain more insight into what is happening.

Checking Memory Utilization

Similar to CPU cycles, memory is a finite resource shared by the various processes that together form the Cisco IOS operating system. Memory is divided into different pools that are used for different purposes: the processor pool contains memory that can be used by the scheduled processes, and the I/O pool is used to temporarily buffer packets during packet switching. Processes allocate and release memory, as needed, from the processor pool, and generally there is more than enough free memory for all the processes to share. Example 3-12 shows sample output from the show memory command. In this example, the processor memory is shown on the first line, and the I/O memory is shown on the second line. Each row shows the total memory available, used memory, and free memory. The lowest amount of free memory recorded since the device booted (Lowest) and the size of the largest contiguous free block (Largest) are also displayed in each row.

Example 3-12: show memory Command Output

RO1# show memory
                 Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
Processor    820B1DB4    26534476    19686964     6847512     6288260     6712884
      I/O     3A00000     6291456     3702900     2588556     2511168     2577468
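To turn these raw byte counts into something easier to compare against a baseline, the used percentage of each pool can be computed as Used divided by Total. The Python sketch below does this for the two summary rows of Example 3-12; the function name is hypothetical, and the parsing assumes each pool row consists of exactly seven whitespace-separated fields.

```python
# Summary rows copied from Example 3-12.
SHOW_MEMORY = """\
Processor 820B1DB4 26534476 19686964 6847512 6288260 6712884
I/O 3A00000 6291456 3702900 2588556 2511168 2577468
"""


def pool_utilization(output: str) -> dict[str, float]:
    """Map each memory pool name to its used percentage (Used / Total)."""
    result = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip the column header or any unrelated line
        name, total, used = fields[0], int(fields[2]), int(fields[3])
        result[name] = round(100 * used / total, 1)
    return result


for pool, pct in pool_utilization(SHOW_MEMORY).items():
    print(f"{pool}: {pct}% used")
```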

Typically, the memory on routers and switches is more than enough to do what they were designed for. However, in particular deployment scenarios, for example, if you decide to run Border Gateway Protocol (BGP) on your router and carry the full Internet routing table, you might need more memory than the typical amount recommended for the router. Also, whenever you decide to upgrade Cisco IOS Software on your router, you should verify the recommended amount of memory for the new software version.

As with CPU usage, it is useful to create a baseline of the memory usage on your routers and switches and graph the utilization over time. You should monitor memory utilization over time and be able to anticipate when your devices need a memory upgrade or a complete system upgrade. If a router or switch does not have enough free memory to satisfy the request of a process, it will log a memory allocation failure, signified by a %SYS-2-MALLOCFAIL message. The result is that the process cannot get the memory that it requires, which might lead to unpredictable disruptions or failures. Apart from processes using up the memory through normal use, there is the possibility of a memory leak. A memory leak is caused by a software defect: a process that does not properly release memory (causing memory to "leak" away) eventually leads to memory exhaustion and memory-allocation failures. Creating a baseline and graphing memory usage over time allows you to monitor for these types of failures, too.
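One simple way to look for the steady decline that a leak produces is to fit a straight line through periodic free-memory readings and inspect its slope. The following Python sketch does this with an ordinary least-squares fit. The readings are hypothetical values chosen for illustration, and the function name is an assumption; how the samples are collected (for example, by polling the device) is outside the scope of the sketch.

```python
def free_memory_slope(samples: list[int]) -> float:
    """Least-squares slope of free memory, in bytes per sampling interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2  # sample indexes are 0, 1, ..., n-1
    mean_y = sum(samples) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical free-memory readings in bytes, one per polling interval.
readings = [6_847_512, 6_801_000, 6_760_300, 6_712_884, 6_670_100]
slope = free_memory_slope(readings)
print(f"free memory changes by about {slope:.0f} bytes per interval")
```

A persistently negative slope over a long period is only a hint, not proof, of a leak: normal activity also makes free memory fluctuate, which is why the trend should be judged against the baseline.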

Checking Interfaces

Checking the performance of the device interfaces while troubleshooting, especially when hardware faults are suspected, is as important as checking your device's CPU and memory utilization. The show interfaces command is a valuable Cisco IOS troubleshooting command. Example 3-13 shows sample output from the show interfaces command for a FastEthernet interface.

Example 3-13: show interfaces Command Output

RO1# show interfaces FastEthernet 0/0
FastEthernet0/0 is up, line protocol is up
<...output omitted...>
Last input 00:00:00, output 00:00:01, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/1120/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 2000 bits/sec, 3 packets/sec
5 minute output rate 0 bits/sec, 1 packets/sec
110834589 packets input, 1698341767 bytes
Received 61734527 broadcasts, 0 runts, 0 giants, 565 throttles
30 input errors, 5 CRC, 1 frame, 0 overrun, 25 ignored
0 watchdog
0 input packets with dribble condition detected
35616938 packets output, 526385834 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier
0 output buffer failures, 0 output buffers swapped out

The output of this command lists a number of key statistics, which are briefly described as follows:

  • Input queue drops: Input queue drops (and the related ignored and throttle counters) signify that at some point more traffic was delivered to the router than it could process. This does not necessarily indicate a problem, because it could be normal during traffic peaks. However, it may indicate that the CPU cannot process packets in time. If this number is consistently high and the dropped packets are causing application failures, the cause must be identified and resolved.

  • Output queue drops: Output queue drops indicate congestion on the interface. Seeing output drops is normal when the aggregate input traffic rate is higher than the output traffic rate on an interface. However, even if this is considered normal behavior, it leads to packet drops and queuing delays. Applications that are sensitive to delay and packet loss, such as Voice over IP, will have serious quality issues in those situations. This counter is a good indicator that you need to implement a congestion management mechanism to provide good quality of service (QoS) to your applications.

  • Input errors: This counter indicates the number of errors, such as cyclic redundancy check (CRC) errors, experienced during the reception of frames. High numbers of CRC errors could indicate cabling problems, interface hardware problems, or, in an Ethernet-based network, duplex mismatches.

  • Output errors: This counter indicates the number of errors, such as collisions, experienced during the transmission of frames. In most Ethernet-based networks today, full-duplex transmission is the norm, and half-duplex is the exception. In full-duplex operation, collisions cannot occur, and therefore collisions, and especially late collisions, often indicate duplex mismatches.

The absolute number of drops or errors in the output of the show interfaces command is not very significant. The error counters should be evaluated against the total number of input and output packets. For example, a total of 25 CRC errors in relation to 123 input packets is reason for concern, whereas 25 CRC errors for 1,458,349 packets is not a problem at all. Furthermore, note that these counters accumulate from the time the router boots, so the numbers displayed in the output might have accumulated over months. Therefore, it is difficult to diagnose a problem that has been happening over the last 2 days based on these statistics. After you have decided that you need to investigate the interface counters in more detail, it is good practice to reset the interface counters by using the clear counters command, let them accumulate statistics for a specific period, and then reevaluate the outcome. If you repeatedly want to display selected statistics to see how the counters are increasing, it is useful to filter the output. Using a regular expression to include only the lines in which you are interested can prove quite useful in this case. In Example 3-14, the output is limited to the lines that start with the word Fast, include the word errors, or include the word packets.

Example 3-14: Filtering the Output of the show interfaces Command

RO1# show interfaces FastEthernet 0/0 | include ^Fast|errors|packets
FastEthernet0/0 is up, line protocol is up
5 minute input rate 3000 bits/sec, 5 packets/sec
5 minute output rate 2000 bits/sec, 1 packets/sec
2548 packets input, 257209 bytes
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 input packets with dribble condition detected
610 packets output, 73509 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
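The same filtering and the "errors relative to total packets" evaluation can also be done offline on captured output. The Python sketch below applies the regular expression from Example 3-14 to a few lines taken from Example 3-13 and then computes the input error ratio. Python's re module is used here as a stand-in for the Cisco IOS regular expression engine; the two are not identical, but for a simple alternation like this one they should behave alike. The function names are hypothetical.

```python
import re

# A subset of the Example 3-13 output, used as sample input.
SHOW_INTERFACES = """\
FastEthernet0/0 is up, line protocol is up
Queueing strategy: fifo
110834589 packets input, 1698341767 bytes
30 input errors, 5 CRC, 1 frame, 0 overrun, 25 ignored
35616938 packets output, 526385834 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
"""

# Same expression as in Example 3-14: starts with "Fast", or contains
# "errors", or contains "packets".
INCLUDE = re.compile(r"^Fast|errors|packets")


def include(output: str) -> list[str]:
    """Keep only the lines that match the expression."""
    return [line for line in output.splitlines() if INCLUDE.search(line)]


def error_ratio(errors: int, packets: int) -> float:
    """Errors as a fraction of the total number of packets."""
    return errors / packets


kept = include(SHOW_INTERFACES)
packets_in = int(re.search(r"(\d+) packets input", SHOW_INTERFACES).group(1))
errors_in = int(re.search(r"(\d+) input errors", SHOW_INTERFACES).group(1))

print("\n".join(kept))
print(f"input error ratio: {error_ratio(errors_in, packets_in):.2e}")
```

Expressed as a ratio, the two cases from the text are easy to tell apart: 25 errors in 123 packets is roughly one error in every five packets, whereas 25 errors in 1,458,349 packets is negligible.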

The show processes cpu, show memory, and show interfaces commands form a limited toolkit of hardware troubleshooting commands, but they are a good starting point for collecting initial clues, either to confirm that the problem may be hardware related or to eliminate hardware problems from the list of potential causes. Once you have decided that the cause of the problem might be hardware related, you should research the more specific hardware troubleshooting tools that are available for the platform that you are working with. Many additional hardware troubleshooting features and commands are supported in Cisco IOS Software, including the following:

  • show controllers: The output of this command varies based on interface hardware type. In general, this command provides more detailed packet and error statistics for each type of hardware and interface.

  • show platform: On many Cisco LAN switches, this command can be used to examine the TCAM and other specialized switch hardware components.

  • show inventory: This command lists the hardware components of a router or switch. The output includes the product code and serial number for each component. This is very useful for documenting your device and for ordering replacement or spare parts.

  • show diag: On routers, you can use this command to gather even more detailed information about the hardware than the show inventory command provides. For example, the output of this command includes the hardware revision of the individual components. In case of known hardware issues, this command can be used to determine whether a component is susceptible to a particular hardware fault.

  • Generic Online Diagnostics (GOLD): GOLD is a platform-independent framework for runtime diagnostics. It includes command-line interface (CLI)-based access to boot and health monitoring, plus on-demand and scheduled diagnostics. GOLD is available on many of the mid-range and high-end Catalyst LAN switches and on high-end routers such as the 7600 series and CRS-1 routers.

  • Time Domain Reflectometer (TDR): Some of the Catalyst LAN switches support the TDR feature. This feature enables you to detect cabling problems such as open or shorted UTP wire pairs.


