Troubleshooting Performance Issues on Routers
Diagnosing and resolving router performance problems is an important skill set for network support engineers. Common causes of performance problems on routers are high CPU utilization and memory-allocation problems. Therefore, it is important to be able to recognize the typical symptoms associated with CPU or memory issues and to know the typical causes of these types of issues. This section prepares you to diagnose problems caused by high CPU utilization on routers using the Cisco IOS CLI, explains the typical symptoms and possible causes of memory-allocation failures, and offers guidelines for troubleshooting memory problems.
Troubleshooting High CPU Usage Issues on Routers
The CPU on a router performs two major tasks: forwarding packets and executing management and control plane processes. The CPU can become too busy either when it has many packets to forward or when a system process consumes a large amount of CPU time. For example, if the CPU is receiving many SNMP packets because of intensive network monitoring, it can become so busy processing all those packets that the other system processes cannot get access to CPU resources.
It is very important to understand when high CPU utilization is at a problematic level and when it is considered to be normal. In some cases, high CPU utilization is normal and does not cause network problems. If CPU utilization is high for a short period of time, it does not necessarily cause a problem, because it may merely be due to a short burst of network management requests or expected peaks of network traffic. If CPU utilization is consistently very high and packet forwarding or process performance on the router degrades, however, it is usually considered to be a problem and needs to be investigated.
When the router CPU is too busy to forward all packets as they arrive, the router might start to buffer packets, increasing latency, or even drop packets. This affects the application traffic passing through the router, and as a result, network performance will suffer. Also, because the CPU is spending most of its time on packet forwarding, control plane processes may not be able to get sufficient access to the CPU, which could lead to further disruptions because of failing routing or other control plane protocols.
A common symptom of a router CPU that is too busy is that the router fails to respond to certain service requests. In those situations, the router might exhibit the following behaviors:
-
Slow response to Telnet requests or to the commands that are issued in active Telnet sessions
-
Slow response to commands issued on the console
-
High latency on ping responses or too many ping timeouts
-
Failure to send routing protocol packets to other routers
The following are some of the most common router processes that could cause high CPU utilization:
-
ARP Input: High CPU utilization by the ARP Input process occurs if the router has to originate an excessive number of ARP requests. Multiple ARP requests for the same IP address are rate-limited to one request every 2 seconds, so excessive numbers of ARP requests can occur only if the router needs to originate ARP requests for many different IP addresses. This can happen if an IP route has been configured pointing to a broadcast interface, which causes the router to generate an ARP request for each IP address that is not reachable through a more specific route. An excessive number of ARP requests can also be caused by malicious network traffic. An indication of such traffic is the presence of a high number of incomplete ARP entries in the ARP table, similar to the ones shown in Example 7-47.
Example 7-47: The Output of show arp Has Several Incomplete Entries
Router# show arp
Protocol Address Age (min) Hardware Addr Type Interface
Internet 10.10.10.1 - 0013.1918.caae ARPA FastEthernet0/0
Internet 10.16.243.249 0 Incomplete ARPA
Internet 10.16.243.250 0 Incomplete ARPA
Internet 10.16.243.251 0 Incomplete ARPA
Internet 10.16.243.252 0 Incomplete ARPA
Internet 10.16.243.253 0 Incomplete ARPA
Internet 10.16.243.254 0 Incomplete ARPA
-
Net Background: The Net Background process runs whenever a buffer is required but is not available to a process or an interface. It uses the main buffer pool to provide the requested buffers. Net Background also manages the memory used by each process and cleans up freed-up memory. The symptoms of high CPU are increases in throttles, ignores, overruns, and resets on an interface; you can see these in the output of the show interfaces command.
-
TCP Timer: The TCP Timer process is responsible for TCP sessions running on the router. When the TCP timer process uses a lot of CPU resources, this indicates that there are too many TCP peers (such as Border Gateway Protocol [BGP] peers). The show tcp statistics command (a sample is shown in Example 7-48) displays detailed TCP information.
Example 7-48: The Output of show tcp statistics Displays Detailed TCP-Related Information
Router# show tcp statistics
Rcvd: 22771 Total, 152 no port
0 checksum error, 0 bad offset, 0 too short
4661 packets (357163 bytes) in sequence
7 dup packets (860 bytes)
0 partially dup packets (0 bytes)
0 out-of-order packets (0 bytes)
0 packets (0 bytes) with data after window
0 packets after close
0 window probe packets, 0 window update packets
4 dup ack packets, 0 ack packets with unsend data
4228 ack packets (383828 bytes)
Sent: 22490 Total, 0 urgent packets
16278 control packets (including 17 retransmitted)
5058 data packets (383831 bytes)
7 data packets (630 bytes) retransmitted
0 data packets (0 bytes) fastretransmitted
1146 ack only packets (818 delayed)
0 window probe packets, 1 window update packets
8 Connections initiated, 82 connections accepted, 65 connections established
32046 Connections closed (including 27 dropped, 15979 embryonic dropped)
24 total rxmt timeout, 0 connections dropped in rxmt timeout
0 Keepalive timeout, 0 keepalive probe, 0 Connections dropped in keepalive
-
IP Background: This process is responsible for encapsulation type changes on an interface, the move of an interface to a new state (up or down), and change of IP address on an interface. The IP Background process modifies the routing table in accordance with the status of the interfaces and notifies all routing protocols of the status change of each IP interface.
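The incomplete-entry check described for the ARP Input process can be automated once the show arp output has been captured as text. The following is a minimal Python sketch, not a Cisco tool: the function name count_incomplete_arp is hypothetical, and it assumes the column layout shown in Example 7-47, where the fourth field of an unresolved entry reads Incomplete.

```python
def count_incomplete_arp(show_arp_output: str) -> int:
    """Count ARP entries whose hardware address is still unresolved."""
    count = 0
    for line in show_arp_output.splitlines():
        fields = line.split()
        # Data rows start with "Internet"; the header row starts with "Protocol".
        if len(fields) >= 4 and fields[0] == "Internet" and fields[3] == "Incomplete":
            count += 1
    return count


# A shortened copy of the output in Example 7-47.
sample = """\
Protocol Address Age (min) Hardware Addr Type Interface
Internet 10.10.10.1 - 0013.1918.caae ARPA FastEthernet0/0
Internet 10.16.243.249 0 Incomplete ARPA
Internet 10.16.243.250 0 Incomplete ARPA
"""

print(count_incomplete_arp(sample))  # prints 2
```

What count should be treated as suspicious depends on the network; the sketch only reports the number and leaves the judgment to the engineer.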
To determine the CPU utilization on a router, issue the show processes cpu command. The output of this command shows how busy the CPU has been in the past 5 seconds, the past 1 minute, and the past 5 minutes. The output also shows the percentage of the available CPU time that each system process has used during these periods. In the output shown in Example 7-49, the CPU utilization for the last 5 seconds was 72 percent. Out of this total of 72 percent, 23 percent of the CPU time was spent in interrupt mode, which corresponds to switching of packets. On the same line of output, you can also see the average utilization for the last 1 minute (74 percent in this example), and the average utilization for the past 5 minutes (71 percent in this example).
Router# show processes cpu sorted
CPU utilization for five seconds: 72%/23%; one minute: 74%; five minutes: 71%
! 72%, 74%, and 71% indicate total CPU spent on processes and interrupts
! (packet switching). 23% indicates CPU spent on interrupts (packet switching)
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
62 3218415936 162259897 8149 65.08% 72.01% 68.00% 0 IP Input
183 47280 35989616 1 0.16% 0.08% 0.08% 0 RADIUS
47 432 223 2385 0.24% 0.03% 0.06% 0 SSH Process
2 9864 232359 42 0.08% 0.00% 0.00% 0 Load Meter
61 6752 139374 48 0.08% 0.00% 0.00% 0 CDP Protocol
33 14736 1161808 12 0.08% 0.01% 0.00% 0 Per-Second Jobs
73 12200 4538259 2 0.08% 0.01% 0.00% 0 SSS Feature Time
! Output omitted for brevity
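The arithmetic behind the first line of this output can be made explicit. In the pair 72%/23%, the first value is total utilization and the second is the interrupt (packet switching) share, so the difference is the time spent in processes. The Python sketch below extracts these values; the function name parse_cpu_summary and the dictionary keys are illustrative choices, not part of Cisco IOS.

```python
import re


def parse_cpu_summary(line: str) -> dict:
    """Split the 'total%/interrupt%' pair and the 1- and 5-minute averages."""
    match = re.search(
        r"five seconds:\s*(\d+)%/(\d+)%;\s*one minute:\s*(\d+)%;"
        r"\s*five minutes:\s*(\d+)%",
        line,
    )
    if match is None:
        raise ValueError("not a CPU utilization summary line")
    total, interrupt, one_min, five_min = (int(g) for g in match.groups())
    return {
        "total_5s": total,
        "interrupt_5s": interrupt,
        # Process-level load is whatever is left after interrupt-level switching.
        "process_5s": total - interrupt,
        "one_minute": one_min,
        "five_minutes": five_min,
    }


summary = parse_cpu_summary(
    "CPU utilization for five seconds: 72%/23%; one minute: 74%; five minutes: 71%"
)
print(summary["process_5s"])  # prints 49
```

For this example, 49 percent of the CPU time in the last 5 seconds went to processes, which is consistent with the IP Input process dominating the per-process list.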
Issue the show processes cpu history command to see the CPU utilization for the last 60 seconds, 60 minutes, and 72 hours. The output provides ASCII graphical views of how busy the CPU has been. You can see whether the CPU has been constantly busy or whether utilization has been spiking. CPU utilization spikes caused by a known network event or activity do not indicate problems, but if you see prolonged spikes that do not seem to correspond to any known network activity, you should investigate.
Troubleshooting Switching Paths
To understand the different switching options and how they work, it is necessary to understand that there are different types of router platforms and that each of these platforms has its own behavior. For example, 2800 series routers are based on a single CPU, and all functions of the router can be executed by the Cisco IOS Software running on the main CPU. However, many of the functions can be offloaded to separate network modules that can be installed into these routers. 7600 series routers are based on special hardware that is responsible for all packet-forwarding actions, which means that the main CPU is not involved in processing of most packets. The task of packet forwarding (data plane) consists of two steps:
Step 1 | Making a routing decision: The routing decision is made based on network topology information and all the configured policies. Information about network destinations, gathered by a routing protocol, and possible restrictions like access lists or policy-based routing (PBR) are used to decide where to send each packet. |
Step 2 | Switching the packet: Switching packets on a router (not to be confused with Layer 2 switching) involves moving a packet from an input buffer to an output buffer and rewriting the data link layer header of the frame to forward the packet to the next hop toward the final destination. |
The data link layer addresses necessary to rewrite the frame are stored in different tables such as the ARP table, which lists the MAC addresses for known IP devices reachable via Ethernet interfaces. Usually routers discover the data link layer addresses to be used for a destination through an address resolution process that matches the Layer 3 address to the Layer 2 address of a next hop device.
There are three types of packet switching modes supported by Cisco routers:
-
Process switching
-
Fast switching
-
Cisco Express Forwarding (CEF)
The newest switching mode is CEF, and it is the default, preferred, and recommended switching mode. It is important to remember that the switching method used affects the router’s performance. To successfully troubleshoot problems related to the switching path it is essential to understand which method is used and how it works. The switching method might be altered globally or per interface for several reasons:
-
During troubleshooting, to verify if the observed behavior is caused by the switching method
-
During debugging, to direct all packets to CPU for processing
-
Because some IOS features require a specific switching method
Process Switching
Process switching is the oldest mode. When using process switching to forward packets, the router strips the Layer 2 header from an incoming frame, looks up the Layer 3 network address of the packet in the routing table, and then sends the frame with a rewritten Layer 2 header, including a newly computed cyclic redundancy check (CRC), to the outgoing interface. All these operations are performed for each individual frame by the IP Input process that is running on the central CPU. Process switching is configured on an interface by disabling fast switching (and CEF) on that interface. Process switching is the most CPU-intensive method available on Cisco routers. It greatly degrades performance figures such as throughput, jitter, and latency. This method should be used only temporarily as a last resort during troubleshooting.
Note | To use process switching, fast switching must be disabled using this command: Router(config-if)# no ip route-cache |
Fast Switching
After performing a routing table lookup for the first packet destined for a particular IP network, the router also initializes the fast-switching cache that is used by the fast-switching process. When subsequent frames to that same destination arrive, a cache lookup is performed and the destination is found in the fast-switching cache. Then the frame is rewritten with the corresponding data link layer header that was stored in the cache, and the frame is sent to the outgoing interface. The interface processor computes the CRC for the frame. Because the cache is destination based, fast switching can provide load sharing on a per-destination basis. Fast switching is less processor intensive than process switching because it uses a cache entry created by the first packet sent to a particular destination. CPU utilization can still become high when fast switching is used if there is a high number of new flows per second, because the first packet of each new flow must be process switched. This can happen when a network attack rapidly generates many new flows.
Note | Fast switching is enabled using the following command: Router(config-if)# ip route-cache |
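The cache behavior described for fast switching can be illustrated with a toy model. The Python sketch below is a simplification under stated assumptions: the class FastSwitchingRouter and its counters are hypothetical, the cache is keyed by individual destination address rather than by prefix as in a real route cache, and the single route is borrowed from the sample addresses used in this chapter. It only shows why many new destinations push work back to the CPU.

```python
import ipaddress


class FastSwitchingRouter:
    """Toy model: the first packet per destination is process switched
    and seeds the cache; later packets to that destination hit the cache."""

    def __init__(self, routes):
        # routes: list of (prefix, next_hop, interface) tuples
        self.routes = [(ipaddress.ip_network(p), nh, ifc) for p, nh, ifc in routes]
        self.cache = {}
        self.process_switched = 0
        self.fast_switched = 0

    def _routing_table_lookup(self, dst):
        # Longest-prefix match over the routing table (the slow path).
        matches = [r for r in self.routes if dst in r[0]]
        if not matches:
            return None
        return max(matches, key=lambda r: r[0].prefixlen)

    def forward(self, dst_ip):
        dst = ipaddress.ip_address(dst_ip)
        if dst in self.cache:
            self.fast_switched += 1
            return self.cache[dst]
        self.process_switched += 1
        route = self._routing_table_lookup(dst)
        if route is None:
            return None
        result = (route[1], route[2])
        self.cache[dst] = result
        return result


router = FastSwitchingRouter([("10.11.1.0/24", "10.2.1.1", "FastEthernet0/1")])

# Three packets to one destination: one slow-path lookup, two cache hits.
for _ in range(3):
    router.forward("10.11.1.1")

# Fifty packets to fifty new destinations: every one takes the slow path.
for host in range(2, 52):
    router.forward(f"10.11.1.{host}")

print(router.process_switched, router.fast_switched)  # prints 51 2
```

The second loop mimics a burst of new flows: the cache offers no help, so the process-switched counter grows with every packet.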
Cisco Express Forwarding
Cisco Express Forwarding (CEF) is the default switching mode on Cisco routers. CEF is less CPU-intensive than fast switching or process switching. CEF is a highly scalable and resilient switching technique. When CEF is enabled, information used for packet forwarding purposes resides in the following two tables:
-
CEF Forwarding Information Base (FIB): A router that has CEF enabled uses the FIB to make IP destination prefix-based switching decisions. This table is updated after each network change, but only once, and contains all known routes. There is no need to build a route cache by first using process switching for some of the packets. Each change in the IP routing table triggers a similar change in FIB table because it contains all next-hop addresses associated with all network destinations.
-
CEF adjacency table: The adjacency table contains Layer 2 frame headers for all next hops used by the FIB. These addresses are used to rewrite frame headers for packets that are forwarded by a router.
Both tables are built independently, and a change in one table does not lead to a change in the other. CEF is an efficient mechanism for traffic load balancing. In this case, both the FIB and the adjacency table contain multiple entries for a single network destination to reflect the multiple network paths toward it. It is important to note that there are several Cisco IOS features that require CEF to be enabled for their operation because they rely on the data structures that are built and maintained by CEF operation. Some of those features are as follows:
-
Network-Based Application Recognition (NBAR)
-
AutoQoS and Modular QoS CLI (MQC)
-
Frame Relay traffic shaping
-
Multiprotocol Label Switching (MPLS)
-
Class-based weighted random early detection
Note | CEF can be enabled and disabled globally using the command: Router(config)# [no] ip cef You can also enable or disable CEF on each interface individually using the command: Router(config-if)# [no] ip route-cache cef Generally, if CEF is disabled globally, it cannot be enabled on an interface, but if it is enabled globally, it can be disabled on a single interface. |
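The relationship between the FIB and the adjacency table can be sketched as two separate lookups. The Python model below is a simplified illustration, not how Cisco IOS implements CEF: the function cef_lookup, the result labels, and the dictionary layout are assumptions, and the prefixes, next hops, and rewrite string are sample values taken from the command outputs later in this section. One FIB entry is deliberately left without an adjacency to show the case in which a packet cannot be CEF switched.

```python
import ipaddress

# FIB: destination prefix -> (next hop, outgoing interface)
fib = {
    ipaddress.ip_network("10.11.1.1/32"): ("10.2.1.1", "FastEthernet0/1"),
    ipaddress.ip_network("10.10.1.1/32"): ("10.1.1.1", "FastEthernet0/0"),
    ipaddress.ip_network("10.2.1.0/24"): ("10.2.1.1", "FastEthernet0/1"),
}

# Adjacency table: (interface, next hop) -> prebuilt Layer 2 header (hex)
adjacency = {
    ("FastEthernet0/1", "10.2.1.1"): "C40202640000C4010F5C00010800",
    # No entry for ("FastEthernet0/0", "10.1.1.1") in this sketch.
}


def cef_lookup(dst_ip, fib, adjacency):
    """Return (action, detail) for a destination address."""
    dst = ipaddress.ip_address(dst_ip)
    candidates = [prefix for prefix in fib if dst in prefix]
    if not candidates:
        return ("no_fib_entry", None)
    # Longest-prefix match selects the most specific FIB entry.
    best = max(candidates, key=lambda prefix: prefix.prefixlen)
    next_hop, interface = fib[best]
    rewrite = adjacency.get((interface, next_hop))
    if rewrite is None:
        # Without a Layer 2 rewrite the packet cannot be CEF switched.
        return ("no_adjacency", (interface, next_hop))
    return ("forward", (interface, rewrite))


print(cef_lookup("10.11.1.1", fib, adjacency)[0])  # prints forward
print(cef_lookup("10.10.1.1", fib, adjacency)[0])  # prints no_adjacency
print(cef_lookup("192.0.2.1", fib, adjacency)[0])  # prints no_fib_entry
```

The three outcomes correspond to the questions asked when troubleshooting CEF: whether a FIB entry exists, whether it has a next hop, and whether that next hop has an adjacency.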
Troubleshooting Process and Fast Switching
Example 7-50 shows sample output from the show ip interface command after disabling the default CEF packet-switching mode using the no ip cef command. In the output, you can see that fast switching is enabled for all packets (except for packets that are sent back to the same interface that they came in on), but CEF switching is disabled.
Router# show ip interface GigabitEthernet 0/0
GigabitEthernet0/0 is up, line protocol is up
<...output omitted...>
IP fast switching is enabled
IP fast switching on the same interface is disabled
IP Flow switching is disabled
IP CEF switching is disabled
IP Fast switching turbo vector
IP multicast fast switching is enabled
IP multicast distributed fast switching is disabled
IP route-cache flags are Fast
! Output omitted for brevity
If you also turn fast switching off, using the no ip route-cache command, and repeat the show ip interface command, the output will look similar to the one shown in Example 7-51. As you can see, however, multicast fast switching is still enabled. This is because IP multicast routing is configured entirely separately from IP unicast routing, and there are separate configuration statements related to unicast and multicast operations. The no ip route-cache command applies only to unicast packets. To disable fast switching for multicast packets, the no ip mroute-cache command is used.
Router# show ip interface GigabitEthernet 0/0
GigabitEthernet0/0 is up, line protocol is up
<... output omitted ...>
IP fast switching is disabled
IP fast switching on the same interface is disabled
IP Flow switching is disabled
IP CEF switching is disabled
IP Fast switching turbo vector
IP multicast fast switching is enabled
IP multicast distributed fast switching is disabled
IP route-cache flags are Fast
! Output omitted for brevity
Disabling fast switching increases the load on the system CPU because every packet is processed by the IP Input process on the router CPU. In some situations however, disabling fast switching might be necessary (for example, during troubleshooting of connectivity problems) to eliminate the use of the fast-switching cache and to allow processing of all packets by the router CPU.
The show ip cache command displays the content of the fast-switching cache, as shown in Example 7-52. If fast switching is disabled on a particular interface, this cache will not have any network entries for that interface. The route cache is periodically cleared to remove stale entries and make room for new entries. This command is useful when troubleshooting because it shows that the fast-switching cache is initialized and populated with information for different network prefixes and associated outgoing interfaces.
Router# show ip cache
IP routing cache 4 entries, 784 bytes
5 adds, 1 invalidates, 0 refcounts
Minimum invalidation interval 2 seconds, maximum interval 5 seconds,
quiet interval 3 seconds, threshold 0 requests
Invalidation rate 0 in last second, 0 in last 3 seconds
Last full cache invalidation occurred 00:11:31 ago
Prefix/Length Age Interface Next Hop
10.1.1.1/32 00:07:20 FastEthernet0/0 10.1.1.1
10.2.1.1/32 00:04:18 FastEthernet0/1 10.2.1.1
10.10.1.0/24 00:01:06 FastEthernet0/0 10.1.1.1
10.11.1.0/24 00:01:20 FastEthernet0/1 10.2.1.1
Troubleshooting CEF
CEF builds two main data structures for its operation: the FIB and the adjacency table. When troubleshooting CEF, you have to check both tables and correlate entries between them. The items that you should check and verify when troubleshooting CEF are as follows:
-
Is CEF enabled globally and per interface?
-
Is there a FIB entry for a given network destination?
-
Is there a next hop associated with this entry?
-
Is there an adjacency entry for this next hop?
To find out whether CEF is enabled on a particular interface, issue the show ip interface command. As you can see in Example 7-53, the output clearly states whether CEF switching is enabled.
Router# show ip interface GigabitEthernet 0/0
GigabitEthernet0/0 is up, line protocol is up
<... output omitted ...>
IP fast switching is enabled
IP fast switching on the same interface is disabled
IP Flow switching is disabled
IP CEF switching is disabled
IP Fast switching turbo vector
IP multicast fast switching is enabled
IP multicast distributed fast switching is disabled
IP route-cache flags are Fast
! Output omitted for brevity
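When many interfaces have to be checked, the unicast switching path can be derived from captured show ip interface output. The following Python sketch is a hedged illustration: the function name switching_path is hypothetical, and it matches only the exact status lines that appear in the examples in this section, so other IOS versions may need different strings.

```python
def switching_path(show_ip_interface_output: str) -> str:
    """Classify the unicast switching path from 'show ip interface' text."""
    cef = False
    fast = False
    for raw in show_ip_interface_output.splitlines():
        line = raw.strip()
        if line == "IP CEF switching is enabled":
            cef = True
        elif line == "IP fast switching is enabled":
            fast = True
    if cef:
        return "CEF"
    if fast:
        return "fast switching"
    # Neither CEF nor fast switching: packets go to the IP Input process.
    return "process switching"


sample = """\
IP fast switching is enabled
IP fast switching on the same interface is disabled
IP Flow switching is disabled
IP CEF switching is disabled
"""

print(switching_path(sample))  # prints fast switching
```

The line about fast switching on the same interface is ignored on purpose, because it describes a special case rather than the general switching mode of the interface.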
If CEF is enabled on the router, you will see output similar to that shown in Example 7-54 after issuing the show ip cef command. This command displays the content of the FIB table, and it also reveals whether CEF is globally enabled or disabled on the router. All directly connected networks in the output are marked as attached in the Next Hop field. Network prefixes that are local to the router are marked as receive. The show ip cef command does not display the interfaces on which CEF is explicitly disabled.
Router# show ip cef
Prefix Next Hop Interface
0.0.0.0/0 10.14.14.19 GigabitEthernet0/0
0.0.0.0/32 receive
10.14.14.0/24 attached GigabitEthernet0/0
10.14.14.0/32 receive
! Output omitted for brevity
10.14.14.252/32 receive
224.0.0.0/4 drop
224.0.0.0/24 receive
255.255.255.255/32 receive
In Example 7-54, the output shows that the router uses output interface GigabitEthernet0/0 and next hop 10.14.14.19/32 to reach 0.0.0.0/0 (the default route). You can also see what other destinations are associated with this interface/next-hop pair, using the show ip cef adjacency command for this interface and next-hop value, as shown in Example 7-55. This specific combination of output interface and next hop is used to reach two network destinations: the default route and a specific host destination (10.14.14.19/32), in this example.
Router# show ip cef adjacency GigabitEthernet0/0 10.14.14.19 detail
IP CEF with switching (Table Version 24), flags=0x0
23 routes, 0 reresolve, 0 unresolved (0 old, 0 new), peak 0
2 instant recursive resolutions, 0 used background process
28 leaves, 22 nodes, 26516 bytes, 79 inserts, 51 invalidations
0 load sharing elements, 0 bytes, 0 references
universal per-destination load sharing algorithm, id 56F4BAB5
4(1) CEF resets, 2 revisions of existing leaves
Resolution Timer: Exponential (currently 1s, peak 1s)
1 in-place/0 aborted modifications
refcounts: 6223 leaf, 6144 node
Table epoch: 0 (23 entries at this epoch)
Adjacency Table has 13 adjacencies
0.0.0.0/0, version 22, epoch 0, cached adjacency 10.14.14.19
0 packets, 0 bytes
via 10.14.14.19, 0 dependencies, recursive
next hop 10.14.14.19, GigabitEthernet0/0 via 10.14.14.19/32
valid cached adjacency
10.14.14.19/32, version 11, epoch 0, cached adjacency 10.14.14.19
0 packets, 0 bytes
via 10.14.14.19, GigabitEthernet0/0, 1 dependency
next hop 10.14.14.19, GigabitEthernet0/0
valid cached adjacency
To see the adjacency table entries for this next hop, you use the show adjacency command. Note the difference that there is no ip in this command. The output of the show adjacency command for the Gi0/0 interface, beginning with the next-hop value of 10.14.14.19, is shown in Example 7-56. In this entry, you can see the full Layer 2 frame header associated with this next hop, which has been built through ARP. The Layer 2 MAC address for this next-hop IP address can also be checked in the ARP cache using the show ip arp command for the specific 10.14.14.19 address (also shown in Example 7-56).
Router# show adjacency GigabitEthernet 0/0 detail | begin 10.14.14.19
Protocol Interface Address
IP GigabitEthernet0/0 10.14.14.19(5)
0 packets, 0 bytes
001200A2BC41001BD5F9E7C00800
ARP 03:19:39
Epoch: 0
[...]
Router# show ip arp 10.14.14.19
Protocol Address Age (min) Hardware Addr Type Interface
Internet 10.14.14.19 4 0012.009a.0c42 ARPA GigabitEthernet0/0
Be aware that the CPU might still process some packets even if CEF is enabled. This can happen for reasons such as an incomplete adjacency table or when processing packets that need special handling by the main processor. You can gather information about the packets that are not switched with CEF by using the show cef not-cef-switched command, as shown in Example 7-57.
Router# show cef not-cef-switched
CEF Packets passed on to next switching layer
Slot No_adj No_encap Unsupp'ted Redirect Receive Options Access Frag
RP 424260 0 5227416 67416 2746773 9 15620 0
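To see at a glance which reason accounts for most of the packets that were not CEF switched, the counters can be paired with their column names. The Python sketch below assumes the two-line header and row layout shown in this output; the function name parse_not_cef_switched is an illustrative choice, not an IOS feature.

```python
def parse_not_cef_switched(output: str) -> dict:
    """Map each slot to a {reason: packet_count} dictionary."""
    rows = {}
    header = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Slot":
            # Column names after "Slot" are the punt reasons.
            header = fields[1:]
        elif header is not None and len(fields) == len(header) + 1:
            rows[fields[0]] = dict(zip(header, (int(v) for v in fields[1:])))
    return rows


sample = """\
CEF Packets passed on to next switching layer
Slot No_adj No_encap Unsupp'ted Redirect Receive Options Access Frag
RP 424260 0 5227416 67416 2746773 9 15620 0
"""

counters = parse_not_cef_switched(sample)["RP"]
top_reason = max(counters, key=counters.get)
print(top_reason, counters[top_reason])  # prints Unsupp'ted 5227416
```

In this sample, the largest counter is the unsupported-feature column, followed by the Receive column, which counts packets addressed to the router itself.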
IOS Tools to Analyze Packet Forwarding
Cisco IOS Software is a powerful operating system that has an embedded set of tools to assist in troubleshooting various networking problems. These tools enable network administrators to quickly and effectively find, isolate, and repair IP communication problems. The following series of steps shows you an example of a troubleshooting process that could be used to find problems related to the switching path used by a router. The example is based on the network shown in Figure 7-15. Be aware that the actual routers used for command outputs in this example do not have any problems. The aim is to show the Cisco IOS commands in action.
Step 1 | First try to find the problematic router along the path with the traceroute utility as demonstrated in Example 7-58. Although the output seems normal, suppose that the traceroute command would have shown a much higher delay or packet loss on router R2 compared to router R3. Such symptoms can lead you to suspect problems in router R2. |
R1# traceroute 10.11.1.1
Type escape sequence to abort.
Tracing the route to 10.11.1.1
1 10.1.1.2 72 msec 56 msec 64 msec
2 10.2.1.1 76 msec 104 msec *
Step 2 | Check the CPU utilization on router R2 for load due to packet processing, using the show processes cpu command, as shown in Example 7-59. In this example, there are no problems related to packet processing. |
R2# show processes cpu | exclude 0.00
CPU utilization for five seconds: 4%/0%; one minute: 1%; five minutes: 1%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
2 3396 650 5224 0.08% 0.07% 0.10% 0 Load Meter
3 11048 474 23308 3.27% 0.51% 0.37% 0 Exec
99 13964 6458 2162 0.90% 0.66% 0.71% 0 DHCPD Receive
154 348 437 796 0.08% 0.09% 0.08% 0 CEF process
Step 3 | Check the routing table for the corresponding destination prefix (in this example, 10.11.1.1), as shown in Example 7-60. In this example, the routing information is present. |
R2# show ip route 10.11.1.1
Routing entry for 10.11.1.1/32
Known via "ospf 1", distance 110, metric 11, type intra area
Last update from 10.2.1.1 on FastEthernet0/1, 00:29:20 ago
Routing Descriptor Blocks:
* 10.2.1.1, from 10.11.1.1, 00:29:20 ago, via FastEthernet0/1
Route metric is 11, traffic share count is 1
Step 4 | Find which switching mode is used by the router and on the interfaces involved in packet forwarding. Use the show ip cef command to find out whether CEF is enabled and to discover the egress interface for the destination under investigation, and then use the show ip interface command for that interface to see what type of switching is operational on it. This work is shown in Example 7-61 for the current example. In this example, CEF is enabled globally, and all involved interfaces are enabled for CEF switching. |
R2# show ip cef
Prefix Next Hop Interface
0.0.0.0/0 drop Null0 (default route handler entry)
0.0.0.0/32 receive
10.1.1.0/24 attached FastEthernet0/0
10.1.1.0/32 receive
10.1.1.1/32 10.1.1.1 FastEthernet0/0
10.1.1.2/32 receive
10.1.1.255/32 receive
10.2.1.0/24 attached FastEthernet0/1
10.2.1.0/32 receive
10.2.1.1/32 10.2.1.1 FastEthernet0/1
10.2.1.2/32 receive
10.2.1.255/32 receive
10.10.1.1/32 10.1.1.1 FastEthernet0/0
10.11.1.1/32 10.2.1.1 FastEthernet0/1
224.0.0.0/4 drop
224.0.0.0/24 receive
255.255.255.255/32 receive
R2# show ip interface FastEthernet 0/0 | include CEF
IP CEF switching is enabled
IP CEF Fast switching turbo vector
IP route-cache flags are Fast, CEF
R2# show ip interface FastEthernet 0/1 | include CEF
IP CEF switching is enabled
IP CEF Fast switching turbo vector
IP route-cache flags are Fast, CEF
Step 5 | Check the FIB entry for the routing information under investigation (in this case, 10.11.1.1), as shown in Example 7-62. The related adjacency entry shows interface FastEthernet0/1 with next hop 10.2.1.1. |
R2# show ip cef 10.11.1.1 255.255.255.255
10.11.1.1/32, version 13, epoch 0, cached adjacency 10.2.1.1
0 packets, 0 bytes
via 10.2.1.1, FastEthernet0/1, 0 dependencies
next hop 10.2.1.1, FastEthernet0/1
valid cached adjacency
Step 6 | Check the adjacency table for the next-hop value of the destination you are investigating, as shown in Example 7-63. In this case, the relevant adjacency is built using ARP. |
R2# show adjacency FastEthernet0/1 detail
Protocol Interface Address
IP FastEthernet0/1 10.2.1.1(7)
203 packets, 307342 bytes
C40202640000C4010F5C00010800
ARP 02:57:43
Epoch: 0
Step 7 | Check the ARP cache entry for the next hop, as shown in Example 7-64. You see that the MAC address information is present in the router. Based on this verification process, you can conclude that the routers in this example do not have any switching-related problems. |
R2# show ip arp
Protocol Address Age (min) Hardware Addr Type Interface
Internet 10.2.1.1 67 c402.0264.0000 ARPA FastEthernet0/1
Internet 10.1.1.2 - c401.0f5c.0000 ARPA FastEthernet0/0
Internet 10.1.1.1 67 c400.0fe4.0000 ARPA FastEthernet0/0
Internet 10.2.1.2 - c401.0f5c.0001 ARPA FastEthernet0/1
The steps shown can be used as a generic procedure for finding issues with CEF switching.
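Steps 6 and 7 rely on reading the adjacency rewrite string by eye and comparing it with the ARP cache. That comparison can be sketched in Python, assuming an Ethernet II header of 14 bytes (6-byte destination MAC, 6-byte source MAC, 2-byte EtherType). The function name decode_ethernet_rewrite is hypothetical; the hex string and MAC addresses are the ones shown for router R2 in the outputs above.

```python
def decode_ethernet_rewrite(hex_header: str) -> dict:
    """Split a 14-byte Ethernet II rewrite string into its three fields."""
    if len(hex_header) != 28:
        raise ValueError("expected 28 hex digits (14 bytes)")

    def cisco_mac(hex12: str) -> str:
        # Format 12 hex digits in the dotted style used by 'show ip arp'.
        hex12 = hex12.lower()
        return ".".join(hex12[i:i + 4] for i in range(0, 12, 4))

    return {
        "dst_mac": cisco_mac(hex_header[0:12]),
        "src_mac": cisco_mac(hex_header[12:24]),
        "ethertype": hex_header[24:28].lower(),
    }


header = decode_ethernet_rewrite("C40202640000C4010F5C00010800")

# ARP cache entries for FastEthernet0/1, as listed by 'show ip arp' on R2.
arp_cache = {"10.2.1.1": "c402.0264.0000", "10.2.1.2": "c401.0f5c.0001"}

print(header["dst_mac"] == arp_cache["10.2.1.1"])  # prints True
print(header["src_mac"] == arp_cache["10.2.1.2"])  # prints True
print(header["ethertype"])  # prints 0800
```

The destination MAC matches the ARP entry for the next hop, the source MAC matches the router's own interface address, and EtherType 0800 identifies IPv4, so the adjacency is consistent with the ARP cache.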
Troubleshooting Router Memory Issues
Memory-allocation failure is the most common router memory issue. Memory-allocation failures happen when the router has used all available memory (temporarily or permanently), or the memory has been fragmented into such small pieces that the router cannot find a usable available block. This can happen to the processor memory, which is used by Cisco IOS Software, or to the packet memory, which is used to buffer incoming and outgoing packets. Symptoms of memory allocation failures include the following:
-
Messages such as %SYS-2-MALLOCFAIL: Memory allocation of 1028 bytes failed from 0x6015EC84, Pool Processor, alignment 0 appear in the router logs.
-
show commands generate no output.
-
Receiving Low on memory messages.
-
Receiving the message Unable to create EXEC - no memory or too many processes on the console.
When a router is low on memory, in some instances it is not even possible to use Telnet to connect to the router. When you get to this point, you need to get access to the console port to collect data for troubleshooting. When connecting to the console port, however, you might see the Unable to create EXEC - no memory or too many processes message. If you see this message, there is not even enough available memory to allow for a console connection.
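When MALLOCFAIL messages are collected from a syslog server rather than from the console, the affected pool can be extracted automatically, which indicates whether processor memory or packet memory is failing. The Python sketch below is a minimal example under assumptions: the pattern covers only the message layout quoted in the symptom list above, and the names MALLOCFAIL and parse_mallocfail are hypothetical.

```python
import re

# Matches the fields of the message format quoted in this section.
MALLOCFAIL = re.compile(
    r"%SYS-2-MALLOCFAIL: Memory allocation of (?P<size>\d+) bytes failed "
    r"from (?P<caller>0x[0-9A-Fa-f]+), [Pp]ool (?P<pool>[^,]+), "
    r"alignment (?P<alignment>\d+)"
)


def parse_mallocfail(log_line: str):
    """Return the requested size, caller address, and pool, or None."""
    match = MALLOCFAIL.search(log_line)
    if match is None:
        return None
    return {
        "size": int(match.group("size")),
        "caller": match.group("caller"),
        "pool": match.group("pool").strip(),
        "alignment": int(match.group("alignment")),
    }


event = parse_mallocfail(
    "%SYS-2-MALLOCFAIL: Memory allocation of 1028 bytes failed "
    "from 0x6015EC84, Pool Processor, alignment 0"
)
print(event["pool"], event["size"])  # prints Processor 1028
```

Grouping parsed events by pool and caller address is one way to see whether repeated failures come from the same place in the code.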
Some of the main reasons for memory problems are as follows:
-
Memory size does not support the Cisco IOS Software image: First, check the Release Notes (available to registered customers only) or the IOS Upgrade Planner (available to registered customers only) for the minimum memory size for the Cisco IOS Software feature set and version that you are running. Make sure that you have sufficient memory in your router to support the software image. The actual memory requirements will vary based on protocols used, routing tables, and traffic patterns on the network.
-
Memory-leak bug: A memory leak occurs when a process requests or allocates memory and then forgets to free (de-allocate) the memory when it is finished with that task. As a result, the memory block stays reserved until the router is reloaded. The show memory allocating-process totals command will help you to identify how much memory is used and is free, and the per-process memory utilization of the router. Example 7-65 shows sample output from this command. Memory leaks are caused by bugs in the Cisco IOS code, and the only solution is to upgrade Cisco IOS Software on the device to a version that fixes the issue.
Example 7-65: show memory allocating-process totals Command Output
Router# show memory allocating-process totals
Head Total (b) Used(b) Free(b) Lowest(b) Largest(b)
Processor 62A2B2D0 183323952 26507580 156816372 155132764 154650100
I/O ED900000 40894464 4957092 35937372 35887920 3590524
Allocator PC Summary for: Processor
PC Total Count Name
0x6136A5A8 5234828 1 Init
0x608E2208 3576048 812 TTY data
0x6053ECEC 1557568 184 Process Stack
0x61356928 1365448 99 Init
! Output omitted for brevity
-
Security-related problems: MALLOCFAIL errors can also be caused by a security issue, such as a worm or virus operating in your network. This is likely the cause, especially if there have not been any recent changes to the network, such as router IOS upgrades or configuration changes. You can often mitigate the effect of this type of problem by adding a number of configuration statements to your router, such as an access list that drops the traffic generated by the worm or virus. The Cisco Product Security Advisories and Notices page contains information on detection of the most likely causes and specific workarounds.
-
Memory-allocation failure at process = interrupt level: The error message identifies the cause. If the process is listed as <interrupt level>, as shown in the message that follows, the memory-allocation failure is being caused by a software problem:
%SYS-2-MALLOCFAIL: Memory allocation of 68 bytes failed from 0x604CEF48, pool Processor, alignment 0 -Process= <interrupt level>, ipl= 3
You can use the Bug Toolkit to search for a matching software bug ID (unique bug identification) for this issue. After you have identified the software bug, upgrade to a Cisco IOS Software version that contains the fix to resolve the problem.
-
Buffer-leak bug: When a process is finished using a buffer, the process should free the buffer. A buffer leak occurs when the code forgets to free it. As a result, the buffer pool continues to grow as more and more packets are stuck in the buffers.
The show interfaces command displays statistics for all interfaces configured on the router. Example 7-66 displays sample output from this command. The output shows a full input queue (76/75), where 76 is the number of packets in the input queue and 75 is the maximum size of the input queue. The number of packets in the input queue is larger than the queue depth. This condition is called a wedged interface, and it is a symptom of a buffer leak. When the input queue of an interface is wedged, the router no longer forwards traffic that enters the affected interface.
Router# show interfaces
<...output omitted...>
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:58, output never, output hang never
Last clearing of "show interface" counters never
input queue 76/75, 1250 drops
Output queue 0/40, 0 drops;
! Output omitted for brevity
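The wedged-queue check described above can be automated when you collect show interfaces output from many routers. The following Python sketch is one possible approach, not a Cisco tool: the function name wedged_input_queues is hypothetical, and the regular expression assumes the queue line looks like the one in the sample output, which can differ between IOS releases.

```python
import re

# Assumed line format, taken from the sample output above:
#   "input queue 76/75, 1250 drops"
INPUT_QUEUE_RE = re.compile(
    r"input queue:?\s+(\d+)/(\d+),\s+(\d+)\s+drops", re.IGNORECASE
)

def wedged_input_queues(show_interfaces_text):
    """Return (size, max_size, drops) for every input queue that holds
    more packets than its configured maximum (a wedged queue)."""
    wedged = []
    for match in INPUT_QUEUE_RE.finditer(show_interfaces_text):
        size, max_size, drops = (int(group) for group in match.groups())
        if size > max_size:
            wedged.append((size, max_size, drops))
    return wedged

sample = """\
Last clearing of "show interface" counters never
input queue 76/75, 1250 drops
Output queue 0/40, 0 drops;
"""
print(wedged_input_queues(sample))  # [(76, 75, 1250)]
```

The sketch reports only the queue counters. Mapping each hit back to an interface name would require also tracking the interface header lines, which are omitted from the sample.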
The show buffers command displays statistics for the buffer pools on the router. The output in Example 7-67 reveals a buffer leak in the middle buffers pool. There are a total of 17602 middle buffers in the router, and only 11 are in the free list. This implies that some process takes all the buffers, but does not return them. Other symptoms of this type of buffer leak are %SYS-2-MALLOCFAIL error messages for the pool “processor” or “input/output (I/O),” based on the platform. Similar to a generic memory leak, a buffer leak is caused by a software bug, and the only solution is to upgrade Cisco IOS Software on the device to a version that fixes the issue.
Router# show buffers
<...output omitted...>
Middle buffers, 600 bytes (total 17602, permanent 170):
11 in free list (10 min, 400 max allowed)
498598 hits, 148 misses, 671 trims, 657 created
0 failures (0 no memory)
! Output omitted for brevity
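The reasoning applied to the middle buffers pool (a total far above the permanent count, with almost nothing in the free list) can be expressed as a simple check. The Python sketch below is illustrative only: the function name suspect_pools and both thresholds (growth_factor and free_fraction) are assumptions chosen for the example, not Cisco-recommended values, and the second pool in the sample data is hypothetical.

```python
import re

# A pool header followed by its free-list line, as laid out in the sample output.
POOL_RE = re.compile(
    r"^\s*(?P<name>\w[\w ]*?) buffers, (?P<size>\d+) bytes "
    r"\(total (?P<total>\d+), permanent (?P<permanent>\d+)\):\s*\n"
    r"\s*(?P<free>\d+) in free list",
    re.MULTILINE,
)

def suspect_pools(show_buffers_text, growth_factor=10, free_fraction=0.01):
    """Flag pools that have grown far beyond their permanent size
    while keeping almost no buffers in the free list."""
    suspects = []
    for match in POOL_RE.finditer(show_buffers_text):
        total = int(match["total"])
        permanent = int(match["permanent"])
        free = int(match["free"])
        grown = total > growth_factor * permanent      # assumed threshold
        starved = free < free_fraction * total         # assumed threshold
        if grown and starved:
            suspects.append(
                {"name": match["name"], "total": total,
                 "permanent": permanent, "free": free}
            )
    return suspects

sample = """\
Middle buffers, 600 bytes (total 17602, permanent 170):
11 in free list (10 min, 400 max allowed)
498598 hits, 148 misses, 671 trims, 657 created
0 failures (0 no memory)
Small buffers, 104 bytes (total 50, permanent 50):
49 in free list (20 min, 150 max allowed)
"""
for pool in suspect_pools(sample):
    print(pool["name"], pool["total"], pool["free"])
```

A flagged pool is a hint rather than proof of a leak. Comparing the counters over time, as you would for any baseline, gives a more reliable indication.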
BGP Memory Use
Cisco IOS has three main processes used by the Border Gateway Protocol (BGP):
-
BGP I/O: This process handles reading, writing, and executing of all BGP messages. This process is also the interface between TCP and BGP.
-
BGP router: This process is responsible for initiation of a BGP process, session maintenance, processing of incoming updates, sending of BGP updates, and updating the IP RIB (Routing Information Base) with BGP entries.
-
BGP scanner: This process performs periodic scans of the BGP RIB to update it as necessary, and it scans the IP RIB to ensure that all BGP next hops are valid.
The BGP router process consumes the majority of the memory used by BGP. The BGP router process uses memory to store the BGP RIB, IP RIB for BGP prefixes, and IP switching data structures for BGP prefixes. If you do not have enough memory to store this information, BGP cannot operate in a stable manner, and network reliability will be compromised. If you are using chassis-based routers, which distribute routing information to the line cards, you should not only check the memory availability for the route processor, but also the memory availability on the line cards. The show diag command displays the different types of cards present in your router and their respective amounts of memory, as demonstrated in Example 7-68. This command is useful to identify a lack of memory on the line cards when the router runs BGP.
Router# show diag | i (DRAM|SLOT)
SLOT 0 (RP/LC 0 ): 1 Port SONET based SRP OC-12c/STM-4 Single Mode
DRAM size: 268435456 bytes
FrFab SDRAM size: 134217728 bytes, SDRAM pagesize: 8192 bytes
ToFab SDRAM size: 134217728 bytes, SDRAM pagesize: 8192 bytes
SLOT 2 (RP/LC 2 ): 12 Port Packet over E3
DRAM size: 67108864 bytes
FrFab SDRAM size: 67108864 bytes
ToFab SDRAM size: 67108864 bytes
SLOT 3 (RP/LC 3 ): 1 Port Gigabit Ethernet
DRAM size: 134217728 bytes
FrFab SDRAM size: 134217728 bytes, SDRAM pagesize: 8192 bytes
ToFab SDRAM size: 134217728 bytes, SDRAM pagesize: 8192 bytes
SLOT 5 (RP/LC 5 ): Route Processor
DRAM size: 268435456 bytes
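When you need to compare line-card memory across a chassis, the filtered show diag output can be reduced to one DRAM figure per slot. The following Python sketch assumes the output format shown above. The function names dram_by_slot and slots_below and the 128-MB threshold are hypothetical and serve only as an illustration, because the actual memory requirement depends on the platform and the size of the BGP table.

```python
import re

SLOT_RE = re.compile(r"^SLOT (\d+)\s+\(.*?\): (.+)$")
# Anchored at the start of the line so that the "FrFab SDRAM size" and
# "ToFab SDRAM size" lines are not mistaken for the main DRAM line.
DRAM_RE = re.compile(r"^DRAM size: (\d+) bytes")

def dram_by_slot(show_diag_text):
    """Map slot number -> (card description, DRAM size in MB)."""
    slots = {}
    current = None
    for raw_line in show_diag_text.splitlines():
        line = raw_line.strip()
        slot = SLOT_RE.match(line)
        if slot:
            current = (int(slot.group(1)), slot.group(2).strip())
            continue
        dram = DRAM_RE.match(line)
        if dram and current is not None:
            number, description = current
            slots[number] = (description, int(dram.group(1)) // (1024 * 1024))
    return slots

def slots_below(slots, min_mb):
    """Return the slot numbers whose DRAM is smaller than min_mb."""
    return sorted(n for n, (_desc, mb) in slots.items() if mb < min_mb)

sample = """\
SLOT 0 (RP/LC 0 ): 1 Port SONET based SRP OC-12c/STM-4 Single Mode
DRAM size: 268435456 bytes
FrFab SDRAM size: 134217728 bytes, SDRAM pagesize: 8192 bytes
SLOT 2 (RP/LC 2 ): 12 Port Packet over E3
DRAM size: 67108864 bytes
FrFab SDRAM size: 67108864 bytes
SLOT 5 (RP/LC 5 ): Route Processor
DRAM size: 268435456 bytes
"""
slots = dram_by_slot(sample)
print(slots_below(slots, 128))  # [2]
```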
Summary
The main categories of application services are as follows:
-
Network classification
-
Application scalability
-
Application networking
-
Application acceleration
-
WAN acceleration
-
Application optimization
The recipe for application optimization is a four-step cycle that incrementally increases your understanding of network applications and allows you to progressively deploy measurable improvements and adjustments as required, as follows:
Step 1: Baseline application traffic.
Step 2: Optimize the network.
Step 3: Measure, adjust, and verify.
Step 4: Deploy new applications.
NetFlow efficiently provides a vital set of services for IP applications, including network traffic accounting, usage-based network billing, network planning, security DoS monitoring, and overall network monitoring. A flow is a unidirectional stream of packets, between a given source and a destination, that have several components in common. The seven fields that need to match for packets to be considered part of the same flow are as follows:
-
Source IP Address
-
Destination IP Address
-
Source Port (protocol dependent)
-
Destination Port (protocol dependent)
-
Protocol (Layer 3 or 4)
-
Type of Service (ToS) Value (differentiated services code point [DSCP])
-
Input logical interface
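To make the idea of a flow concrete, the key fields can be modeled as a tuple, with packets grouped by that tuple. The Python sketch below is a conceptual illustration only and does not represent how Cisco IOS implements the NetFlow cache. The FlowKey type, the count_flows function, and all addresses and interface names are hypothetical. The seventh key field used here is the input interface, which completes the seven-field flow key.

```python
from collections import Counter
from typing import NamedTuple

class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int          # IP protocol number, for example 6 for TCP
    tos: int
    input_interface: str

def count_flows(packets):
    """Count packets per flow. Each packet is a dict holding the key fields."""
    return Counter(
        FlowKey(**{field: packet[field] for field in FlowKey._fields})
        for packet in packets
    )

request = {"src_ip": "10.1.1.10", "dst_ip": "10.2.2.20", "src_port": 51000,
           "dst_port": 80, "protocol": 6, "tos": 0, "input_interface": "Gi0/0"}
# The reply travels in the opposite direction, so it belongs to a second flow.
reply = {"src_ip": "10.2.2.20", "dst_ip": "10.1.1.10", "src_port": 80,
         "dst_port": 51000, "protocol": 6, "tos": 0, "input_interface": "Gi0/1"}
packets = [request, dict(request), reply]

flows = count_flows(packets)
print(len(flows))  # 2
```

Because a flow is unidirectional, the two request packets share one entry while the reply creates a separate one, even though all three packets belong to the same TCP conversation.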
IP SLA is useful for performance measurement, monitoring, and network baselining. You can tie the results of the IP SLA operations to other features of your router, and trigger action based on the results of the probe. To implement IP SLA network performance measurement, you need to perform the following tasks:
-
Enable the IP SLA responder, if required.
-
Configure the required IP SLA operation type.
-
Configure any options available for the specified operation type.
-
Configure threshold conditions, if required.
-
Schedule the operation to run, and then let the operation run for a period of time to gather statistics.
-
Display and interpret the results of the operation using the Cisco IOS CLI or an NMS, with SNMP.
NBAR is another important tool for baselining and traffic classification purposes. NBAR is a classification engine that recognizes a wide variety of applications, including web-based and other difficult-to-classify protocols that utilize dynamic TCP/UDP port assignments. The simplest use of NBAR is baselining through protocol discovery.
The Cisco IOS SLB feature is a Cisco IOS-based solution that provides server load balancing. This feature allows you to define a virtual server that represents a cluster of real servers, known as a server farm. When a client initiates a connection to the virtual server, the Cisco IOS SLB load balances the connection to a chosen real server based on the configured load-balance algorithm or predictor.
Cisco AutoQoS is an automation tool for deploying QoS policies. The newer versions of Cisco AutoQoS have two phases. In the first phase, information is gathered and traffic is baselined to define traffic classes and volumes; this is called autodiscovery. The command auto discovery qos is entered at the interface configuration mode. You must let discovery run for a period of time that is appropriate for your baselining or monitoring needs. The auto qos command, which is also an interface configuration mode command, uses the information gathered by autodiscovery to apply QoS policies accordingly. The autodiscovery phase generates templates on the basis of the data collected. These templates are then used to create QoS policies. Finally, the policies are installed by AutoQoS on the interface.
For Cisco AutoQoS to work, certain requirements must be met, as follows:
-
CEF must be enabled on the interface.
-
The interface (or subinterface) must have an IP address configured.
-
For serial interfaces (or subinterfaces), configure the appropriate bandwidth.
-
On point-to-point serial interfaces, both sides must be configured with AutoQoS.
Some useful NetFlow troubleshooting commands are the following:
-
show ip cache flow
-
show ip flow export
-
show ip flow interface
-
debug ip flow export
Useful IP SLA troubleshooting commands include the following:
-
show ip sla monitor statistics
-
show ip sla monitor collection-statistics
-
show ip sla monitor configuration
-
debug ip sla monitor trace
Some useful NBAR troubleshooting commands are these:
-
show ip nbar port-map
-
show ip nbar protocol-discovery
-
debug ip nbar unclassified-port-stats
Some of the useful AutoQoS troubleshooting commands are as follows:
-
show auto qos interface
-
show auto discovery qos
Troubleshooting performance problems is a three-step process:
Step 1: Assessing whether the problem is technical in nature
Step 2: Isolating the performance problem to a device, link, or component
Step 3: Diagnosing and resolving the performance degradation at the component level
The following events cause spikes in the CPU utilization:
-
Processor-intensive Cisco IOS commands
-
Routing protocol update processing
-
SNMP polling
Some common interface and wiring problems are as follows:
-
No cable connected
-
Wrong port
-
Wrong cable type
-
Bad cable
-
Loose connections
-
Patch panels
-
Faulty media converters
-
Bad or wrong GBIC
A common symptom of a router CPU that is too busy is that the router fails to respond to certain service requests. In those situations, the router might exhibit the following behaviors:
-
Slow response to Telnet requests or to the commands issued in active Telnet sessions
-
Slow response to commands issued on the console
-
High latency on ping responses or too many ping timeouts
-
Failure to send routing protocol packets to other routers
When troubleshooting CEF, you always want to check and verify the following:
-
Is CEF enabled globally and per interface?
-
Is there a FIB entry for a given network destination?
-
Is there a next hop associated with this entry?
-
Is there an adjacency entry for this next hop?
Symptoms of memory-allocation failures include the following:
-
Messages such as %SYS-2-MALLOCFAIL: Memory allocation of 1028 bytes failed from 0x6015EC84, Pool Processor, alignment 0 appear in the router logs.
-
Not getting any output from show commands.
-
Receiving Low on memory messages.
-
Receiving the message Unable to create EXEC – no memory or too many processes on the console.
Some of the main reasons for memory problems are as follows:
-
Memory size does not support the Cisco IOS Software image
-
Memory-leak bug
-
Memory-allocation failure at process = interrupt level error message
-
Buffer-leak bug
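For the memory-leak case in particular, one way to narrow down the culprit is to capture the allocator summary from show memory allocating-process totals at two points in time and compare the per-allocator totals. The Python sketch below assumes the "PC Total Count Name" line format shown earlier in this section. The function names are hypothetical, and the second snapshot is invented data for illustration, not output from a real router.

```python
import re

# Assumed line format: "0x608E2208 3576048 812 TTY data"
ALLOC_RE = re.compile(r"^(0x[0-9A-Fa-f]+)\s+(\d+)\s+(\d+)\s+(.+)$")

def parse_allocators(text):
    """Map allocator PC -> (total bytes, block count, name)."""
    table = {}
    for raw_line in text.splitlines():
        match = ALLOC_RE.match(raw_line.strip())
        if match:
            pc, total, count, name = match.groups()
            table[pc] = (int(total), int(count), name.strip())
    return table

def growth(before, after):
    """Return (pc, name, bytes_grown) for allocators whose total increased,
    largest increase first."""
    grown = []
    for pc, (total, _count, name) in after.items():
        previous_total = before.get(pc, (0, 0, name))[0]
        if total > previous_total:
            grown.append((pc, name, total - previous_total))
    return sorted(grown, key=lambda item: item[2], reverse=True)

first = """\
0x608E2208 3576048 812 TTY data
0x6053ECEC 1557568 184 Process Stack
"""
# Hypothetical later snapshot: only the "TTY data" allocator has grown.
later = """\
0x608E2208 5576048 1266 TTY data
0x6053ECEC 1557568 184 Process Stack
"""
for pc, name, delta in growth(parse_allocators(first), parse_allocators(later)):
    print(pc, name, delta)
```

An allocator whose total keeps climbing across several snapshots, without a matching increase in load, is a reasonable candidate to look up in the Bug Toolkit.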