Troubleshooting Spanning Tree
High availability is an important requirement for today’s campus LANs. The more dependent enterprises have become on their networks to support their business, the more important it is that those networks are highly available and have minimized downtime. One primary way to build highly available networks is through usage of redundant devices and links. However, when you introduce redundancy in a Layer 2 switched network, you can introduce bridging loops, resulting in broadcast storms that render the network unusable. The IEEE 802.1D Spanning Tree Protocol eliminates active bridging loops and thereby prevents broadcast storms. This is why a good understanding of the operation of STP is essential to any network engineer. It is important to know how to predict the spanning-tree topology or, in the absence of spanning-tree documentation, determine the spanning-tree topology using the appropriate Cisco IOS commands. Spanning-tree failures can be catastrophic when they happen. Therefore, recognizing the symptoms and having an action plan for these types of failures is an essential skill in reducing network downtime.
Spanning-Tree Operation
The IEEE 802.1D Spanning Tree Protocol is one of the most important protocols within a LAN switching environment, such as a campus network. It is uncommon for a LAN not to run this protocol in some form or another. If you hear someone say that they do not use STP in their LAN, what they probably mean is that in their network spanning tree is not actively blocking ports or involved in the reconvergence process when a failure occurs. In most of those instances, STP is running in the background, in case a topological loop is created. The main purpose of STP is to prevent bridging loops and packet storms that might stem from loop conditions. Figure 4-9 shows a LAN with four LAN switches and many topological loops. Anyone who has been involved in a situation where loops were introduced in the switching topology while STP was not running, or not functioning correctly, knows how badly this type of failure can affect the network. The LAN can become fully saturated, and switches might become entirely unresponsive until the loops are eliminated. Therefore, it is crucial for any engineer who implements or supports switched LANs to understand STP, recognize the symptoms of a spanning-tree failure, and know how to resolve those issues.
The text that follows reviews spanning-tree operation making use of the topology depicted in Figure 4-9. Building the spanning tree requires the following four main steps:
Electing a Root Bridge
The first step in the spanning-tree algorithm is the election of a root bridge. Initially, all switches assume that they are the root. They transmit bridge protocol data units (BPDUs) with the Root ID field containing the same value as the Bridge ID field. This implies that each switch nominates itself as the root bridge on the network. As soon as the switches start receiving BPDUs from the other switches, each switch compares the root ID in the received BPDUs against the value that it currently has recorded as the root ID. If the received value is lower than the recorded value (which was originally the switch’s own bridge ID), the switch replaces the recorded value with the received value and starts transmitting this in the Root ID field in its own BPDUs. As a result, all switches will shortly have learned and recorded the bridge ID of the switch that has the lowest bridge ID of all switches, and they will all be transmitting this ID in the Root ID field of their BPDUs. At that point, the root bridge has been elected. Figure 4-10 shows the result of this method: Switch B is elected as the root bridge.
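The election described above can be sketched in a few lines of Python. This is only an illustration, not switch code: the bridge priorities, MAC addresses, and links are made-up values (chosen so that Switch B wins, as in Figure 4-10), and a bridge ID is modeled as a (priority, MAC) tuple so that ordinary tuple comparison mirrors the "lowest bridge ID wins" rule.

```python
# Hypothetical bridge IDs: (priority, MAC address). Values are illustrative only.
# MACs share one lowercase format, so string comparison matches numeric order.
BRIDGE_IDS = {
    "A": (32768, "0019.e7a0.1000"),
    "B": (24576, "0019.e7a0.2000"),
    "C": (32768, "0019.e7a0.3000"),
    "D": (32768, "0019.e7a0.0500"),
}

# Assumed links between the four switches (a square with one diagonal).
LINKS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]


def elect_root(bridge_ids, links):
    """Simulate BPDU exchange until every switch records the same root ID."""
    # Each switch starts by nominating itself as the root.
    believed_root = dict(bridge_ids)
    changed = True
    while changed:
        changed = False
        for a, b in links:
            # BPDUs flow both ways on a link; the lower root ID replaces the higher.
            best = min(believed_root[a], believed_root[b])
            for switch in (a, b):
                if believed_root[switch] != best:
                    believed_root[switch] = best
                    changed = True
    return believed_root


roots = elect_root(BRIDGE_IDS, LINKS)
# The root is the one switch whose recorded root ID is still its own bridge ID.
root_switch = next(name for name, bid in BRIDGE_IDS.items() if bid == roots[name])
print(root_switch)  # prints B
```

Note that Switch D has the numerically lowest MAC address in this made-up data set, but Switch B still wins because the priority is the most significant part of the bridge ID.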
Electing a Root Port
As soon as a switch receives BPDUs that show a root ID different from its own bridge ID, the switch recognizes that it is not the root, and it will mark the port on which it is receiving those BPDUs as its root port. If these types of BPDUs are received on multiple ports, the switch selects the port that has the lowest cost path to the root as its root port. If two ports have an equal path cost to the root, the switch looks at the bridge ID values in the received BPDUs and selects the port that is receiving the BPDU with the lower bridge ID. If the root path cost and the bridge ID in the received BPDUs are the same (because both ports are connected to the same upstream switch), the switch selects the port that is receiving the BPDU with the lower port ID to be the root port. Note that the port ID consists of two parts: port priority and port index (number). The port ID on the received BPDU is the port ID of the sending (upstream) switch. Based on these steps, Figure 4-11 shows the ports selected as root ports on each switch (except the root bridge).
The way a switch determines the path cost is by adding the cost associated with the port on which it receives the BPDU to the value in the Root Path Cost field of the received BPDU. The lowest value determines the switch’s root port, and this value in turn is transmitted in the switch’s own BPDUs. In short, this means that the root bridge starts sending BPDUs with the root path cost set to zero, and then each switch adds the cost of its root port to the received cost when it sends BPDUs to neighboring switches. The cost associated with each port is, by default, inversely related to its speed, but can be manually changed.
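The root port decision, including its tie-breakers, can be expressed as a single sort key. The following Python sketch assumes the classic short-mode port costs (19 for 100 Mbps, 4 for 1 Gbps); the port names, bridge IDs, and port IDs are hypothetical and serve only to exercise each comparison step.

```python
from typing import NamedTuple


class ReceivedBpdu(NamedTuple):
    local_port: str          # port on this switch where the BPDU arrived
    local_port_cost: int     # STP cost of that local port
    root_path_cost: int      # Root Path Cost field in the received BPDU
    sender_bridge_id: tuple  # (priority, MAC) of the upstream switch
    sender_port_id: tuple    # (port priority, port number) of the upstream port


def select_root_port(bpdus):
    """Lowest total path cost wins, then sender bridge ID, then sender port ID."""
    def key(bpdu):
        total_cost = bpdu.root_path_cost + bpdu.local_port_cost
        return (total_cost, bpdu.sender_bridge_id, bpdu.sender_port_id)
    return min(bpdus, key=key)


# Hypothetical BPDUs received by one nonroot switch.
candidates = [
    # Gigabit link to a neighbor that is itself 19 away from the root: 19 + 4 = 23.
    ReceivedBpdu("Gi0/1", 4, 19, (32768, "0019.e7a0.3000"), (128, 1)),
    # Two FastEthernet links straight to the root bridge: 0 + 19 = 19 each.
    ReceivedBpdu("Fa0/1", 19, 0, (24576, "0019.e7a0.2000"), (128, 1)),
    ReceivedBpdu("Fa0/2", 19, 0, (24576, "0019.e7a0.2000"), (128, 2)),
]

root_port = select_root_port(candidates)
print(root_port.local_port)  # prints Fa0/1
```

Fa0/1 and Fa0/2 tie on cost and on the sender's bridge ID, so the lower port ID of the upstream switch decides in favor of Fa0/1.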
Electing Designated Ports
After electing the root bridge and root ports, the switches determine which interface/port in each Ethernet segment must be the designated port. This process has similarities to both root bridge and root port elections. Each switch connected to a segment will send BPDUs out of its port connected to that segment, essentially claiming that port to be the designated port for that segment. At this point, each switch considers its port to be the designated port for that Ethernet segment. However, as soon as a switch starts receiving BPDUs from other switches on that segment, it compares the received values of the Root Path Cost, Bridge ID, and Port ID fields (in that order) against the values in its own BPDUs that it is sending out that port. If it turns out that the other switch has lower values than this switch, it stops transmitting BPDUs on the port and marks it as a nondesignated port (in modern spanning-tree protocols, this port can be either an alternate or a backup port). In the rare case that a switch has multiple ports on the same Ethernet segment, the tie is broken using the local port ID. Note that a single port is designated for an Ethernet segment. This is emphasized because it is theoretically possible for two ports of the same switch to be connected to the same Ethernet segment. Figure 4-12 shows the designated ports on each Ethernet segment marked as DP.
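The comparison on a single segment can be sketched as follows. Each switch's claim is modeled as a (root path cost, bridge ID, port ID) tuple, which Python compares field by field in exactly the order given above. The switch names and values are hypothetical.

```python
def elect_designated_port(claims):
    """Return the role of each claimant's port on one Ethernet segment.

    claims maps a switch name to the (root path cost, bridge ID, port ID)
    tuple that the switch advertises in its BPDUs on that segment.
    """
    winner = min(claims, key=claims.get)
    return {
        name: "designated" if name == winner else "nondesignated"
        for name in claims
    }


# Hypothetical segment between switches C and D, both 19 away from the root.
segment_claims = {
    "C": (19, (32768, "0019.e7a0.3000"), (128, 3)),
    "D": (19, (32768, "0019.e7a0.0500"), (128, 3)),
}

roles = elect_designated_port(segment_claims)
print(roles)  # prints {'C': 'nondesignated', 'D': 'designated'}
```

Both claims advertise the same root path cost and priority, so the lower MAC address in the bridge ID makes Switch D's port the designated port in this example.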
Ports Going into Blocking, or Learning, and Forwarding State
To prevent bridging loops during the time it takes STP to execute its algorithm, all ports are in a state called listening. While in the listening state, the port is busy building the spanning tree, and it does not forward any traffic. After forward_delay seconds, if the switch has marked a port as either a root port or a designated port, that port transitions to the learning state. The port learns MAC addresses and records them in the MAC address table for another forward_delay period, and then the port proceeds to the forwarding state and starts to forward traffic. The ports that ended up as neither designated nor root ports transition into the blocking state. Figure 4-13 shows the ports in the blocking state with the letter B and an X marker. The designated or root ports that have completed the learning state and are now in the forwarding state have the letter F. From this moment going forward, the designated ports send a BPDU periodically based on a timer called the hello timer.
Although the order of the steps listed suggests that STP goes through these steps in a coordinated sequential manner, this is not actually the case. If you look back at the descriptions of each of the steps in the process, you will see that each switch is going through these steps in parallel and that it might adapt its selection of root bridge, root port, and designated ports as new BPDUs are received. As the BPDUs are propagated through the network, all switches will eventually have a consistent view of the topology of the network. Finally, notice that up to this point no distinction has been made between the classical (802.1D) and Rapid (802.1w) versions of STP. Both versions execute the same algorithm when it comes to the decision-making process. On the other hand, when it comes to the process of transitioning a port from the blocking (or discarding in RSTP terms) to the forwarding state, a significant difference exists between these two STP versions. Classical 802.1D can take up to 50 seconds to transition a port to forwarding, whereas RSTP can leverage additional mechanisms to transition a port in blocking state to the forwarding state in less than a second.
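The 50-second figure for classical 802.1D follows directly from the default timers (max age 20 seconds, forward delay 15 seconds). A minimal sketch of that arithmetic, assuming default timer values and distinguishing a failure the switch must wait to age out from one it detects directly on its own link:

```python
MAX_AGE = 20        # seconds a stored BPDU stays valid without being refreshed
FORWARD_DELAY = 15  # seconds spent in listening, and again in learning


def dot1d_time_to_forwarding(must_age_out_bpdu):
    """Seconds before a blocked 802.1D port forwards, using default timers."""
    aging = MAX_AGE if must_age_out_bpdu else 0
    listening = FORWARD_DELAY
    learning = FORWARD_DELAY
    return aging + listening + learning


print(dot1d_time_to_forwarding(True))   # prints 50
print(dot1d_time_to_forwarding(False))  # prints 30
```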
Analyzing the Spanning-Tree Topology
In many networks, the optimal spanning-tree topology is determined as part of the network design and then implemented through manipulation of spanning-tree priority and cost values. Sometimes you might run into situations where spanning tree was not considered in the initial design and implementation. In other situations, the spanning-tree topology might have been considered initially, but the network has undergone significant growth and changes since then. In these types of situations, it is important that an engineer know how to analyze the actual spanning-tree topology in the operational network. Troubleshooting includes comparing the actual state of the network against the expected state of the network and spotting the differences to gather clues about the problem. To do that, you should be able to examine the switches and determine the actual topology and compare that with the spanning-tree topology as per design. Important commands for gathering information about the status of STP and the corresponding topology include the following:
Note: The original spanning-tree timers are based on the assumption that the network diameter is up to seven bridges/switches long.
-
show spanning-tree [vlan vlan-id]: This command, without specifying any additional options, is useful if you want a quick overview of the status of STP for all VLANs that are defined on a switch. If you are interested in only a particular VLAN, you can limit the scope of this command by specifying the VLAN number as an option. Figure 4-14 shows sample output from this command.
Figure 4-14: Sample Output from the show spanning-tree Command
-
show spanning-tree interface interface-id detail: This command is useful if you need to see the status of an interface plus all the STP-related parameters on that interface and the BPDUs on this interface. It will either give you the BPDU content of the BPDUs received from the upstream switch (if that switch’s port is the designated port) or the content of the BPDUs that are sent out by this switch (if this switch’s port is the designated port for the segment connected to the interface). Figure 4-15 shows sample output from this command.
Figure 4-15: Sample Output from the show spanning-tree interface Command
In the example shown in Figure 4-15, you can see that port 88 (TenGigabitEthernet9/1) is a root port and the upstream switch’s port is the designated port. This is also reflected by the fact that this switch is receiving BPDUs (it received 670 BPDUs), but not transmitting them (it sent 10 BPDUs during initial spanning-tree convergence and stopped after that). You can also see that the upstream switch is the root bridge. This can be concluded from the fact that the designated bridge ID and the root bridge ID are the same. This is further confirmed by the fact that the designated path cost is reported as a cost of 0.
Spanning-Tree Failures
The biggest problem with STP is not the fact that it can fail, as any protocol can. In fact, STP is one of the most reliable protocols available. The main concern is that when a problem related to STP exists, there are usually major negative consequences. For many protocols, when they malfunction, all that happens is that you lose some of the functionality that was gained through this protocol. For instance, if Open Shortest Path First (OSPF) Protocol is malfunctioning on one of your routers, you might lose connectivity to networks that are reachable through that particular router. However, this generally does not affect the rest of your OSPF network. If you have some way to connect to that router, you can still perform your troubleshooting routines to diagnose and fix the problem.
With STP, there are two different types of failures. The first one is similar to the OSPF problem just described. STP may erroneously block certain ports that should have gone to the forwarding state. This will cause problems that are similar to the OSPF problem: You might lose connectivity to certain parts of your network, but the rest of the network is unaffected. If you are able to access the switch, you can perform some troubleshooting, and even fix the problem. The second type of failure is a lot more disruptive, and it happens when STP erroneously moves one or more ports to the forwarding state.
An Ethernet frame header does not include a Time To Live (TTL) field; therefore, any frame that enters a bridging loop will continue to be forwarded by the switches indefinitely. The only exceptions are the frames that have their destination address recorded in the MAC address table of the switches. These frames will simply be forwarded to the port that the MAC address is associated with and will not go into an endless loop. However, any frame that is flooded by a switch, such as broadcasts, multicasts, and unicasts with an unknown destination MAC address, will go into an endless loop and start circling. The consequences and corresponding symptoms of this behavior are as follows:
-
The load on all links in the switched LAN will quickly start increasing as more and more frames enter the loop. Note that this is not limited to just the links that form the loop, but also any other links in the switched domain, because some frames are flooded on all links. Naturally, when the spanning-tree failure is limited to a single VLAN, then only links in that VLAN will be affected and switches and trunks that do not carry that VLAN will operate normally.
-
If the spanning-tree failure has caused more than one bridging loop to form, traffic will increase exponentially (because frames will not only start circling, but will also start getting duplicated). This happens because, in the case of multiple loops, there will be switches that receive a frame on a port and then flood it out on multiple ports, essentially creating a copy of the frame every time they forward it.
-
When control plane traffic, such as Hot Standby Router Protocol (HSRP), OSPF, or Enhanced Interior Gateway Routing Protocol (EIGRP) hellos, starts entering the loop, the devices that are running these protocols will quickly start getting overloaded. Their CPU will approach 100 percent utilization while they are trying to process an ever-increasing load of control plane traffic. In many cases, the earliest indication of a broadcast storm in progress is that routers or Layer 3 switches are reporting control plane failures, such as continual HSRP state changes, or that they are running at a very high CPU utilization load.
-
Switches will experience very frequent MAC address table changes. This happens because frames might loop in both directions, causing a switch to see a frame with a certain source MAC address enter through one port and then see a frame with the same source MAC address enter through a different port shortly afterward.
-
Because of the combination of very high load on all links and the CPU running at maximum load on Layer 3 switches or routers, these devices typically become unreachable, making it nearly impossible to diagnose the problem while it is in progress.
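The difference between a single loop (frames circle) and multiple loops (frames multiply) can be seen in a toy flooding model. The sketch below is a deliberate simplification: it ignores MAC learning, link capacity, and host ports, and simply counts how many copies of one broadcast frame are in flight after each forwarding step when no port is blocking. The topologies are hypothetical.

```python
def flooded_copies(links, source, steps):
    """Count in-flight copies of one broadcast after each forwarding step."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Each copy is tracked as (switch holding it, neighbor it arrived from).
    in_flight = [(source, None)]
    counts = []
    for _ in range(steps):
        next_in_flight = []
        for switch, came_from in in_flight:
            # Flood out of every port except the one the frame arrived on.
            for neighbor in neighbors[switch]:
                if neighbor != came_from:
                    next_in_flight.append((neighbor, switch))
        in_flight = next_in_flight
        counts.append(len(in_flight))
    return counts


# Three switches in a triangle: a single loop.
triangle = [("A", "B"), ("B", "C"), ("C", "A")]
# Four fully meshed switches: several overlapping loops.
full_mesh = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]

print(flooded_copies(triangle, "A", 5))   # prints [2, 2, 2, 2, 2]
print(flooded_copies(full_mesh, "A", 5))  # prints [3, 6, 12, 24, 48]
```

In the triangle, two copies circle forever in opposite directions. In the full mesh, every switch floods each copy out of two ports, so the number of copies doubles at every step.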
Spanning-tree problems, especially when they result in bridging loops and broadcast storms, are severe. During these periods, proper troubleshooting is severely hindered because some links and devices are overloaded. One intrusive but effective way to start tackling severe spanning-tree problems is to eliminate topological loops, either by physically disconnecting links or by shutting down interfaces if that is still possible. Once the loops are broken, the traffic and CPU loads should quickly drop to normal levels, and you should regain connectivity to your devices. Although this restores connectivity to the network, you cannot consider this the end of your troubleshooting process. You have removed all redundancy from your switched network, and you need to restore the redundant links, because redundancy is what provides fault tolerance and load sharing within the network. Naturally, if the underlying cause of the spanning-tree failure has not been fixed, restoring the redundant links will trigger a new broadcast storm. Therefore, you should identify and correct the root cause of the failure before you restore the redundant links. An example of a failure that causes spanning-tree problems is a unidirectional link. If you find that a faulty cable caused the unidirectional link, replace it with a new cable before bringing the link back into service. After restoring the redundant links, always carefully monitor the network and have an emergency plan to fall back on if you see a new broadcast storm developing.
EtherChannel Operation
EtherChannel is a technology that bundles multiple physical Ethernet links (100 Mbps, 1 Gbps, or 10 Gbps) into a single logical link and distributes the traffic across these links. This logical link is represented in Cisco IOS syntax as a port-channel interface. Control protocols like STP or routing protocols will only interact with this single port-channel interface and not with the associated physical interfaces. Packets and frames are routed or switched to the port-channel interface, and then a hashing mechanism determines which physical link will be used to transmit them. There are three common EtherChannel problems:
-
Inconsistencies between the physical ports that are members of the channel: The physical links in an EtherChannel must have the same operational characteristics. For example, they must have the same speed, duplex, trunk, or access port status, native VLAN when trunking, and same access VLAN when they are access ports. As a rule, it is therefore recommended that the configuration of all physical links in the channel be identical. If at a certain point in time (usually due to misconfiguration), one of the physical links changes its operational status in such a way that a mismatch with the other physical links is created, this port will be suspended and removed from the EtherChannel bundle until consistency is restored. When the switch suspends a physical link in the channel because of incompatibilities, it generates a %EC-5-CANNOT_BUNDLE2 log message.
-
Inconsistencies between the ports on the opposite sides of the EtherChannel link: If the switch on one side of a few links is configured to bundle these links into an EtherChannel and the switch on the other side is not, the switch that is configured for EtherChannel will detect this (by detecting inconsistencies in the spanning-tree behavior) and move the port to an error-disabled state. The switch will generate a %SPANTREE-2-CHNL_MISCFG message when it “error disables” the port. The use of an EtherChannel negotiation protocol like the 802.3ad Link Aggregation Control Protocol (LACP) or the Port Aggregation Protocol (PAgP) prevents this situation from happening because both sides must first agree to form the channel.
-
Uneven distribution of traffic between EtherChannel bundle members: Some people expect that when EtherChannel is used, the traffic is equally balanced across all physical links in the bundle. You must realize, however, that the method used to distribute traffic over the physical links is to calculate a hash of a combination of fields in the Ethernet and IP headers of a frame and then send the frame to a physical interface based on the hash result. Therefore, the distribution of traffic depends on two things: The distribution of hash values over the physical links, and the header fields that are used as a key into the hash calculation. The Cisco EtherChannel hash algorithm results in a value between 0 and 7. This means that in case of an eight-port EtherChannel, one hash value is assigned to each of the links, and (assuming a random traffic mix) traffic is equally balanced across all eight links. However, if the channel consists of six links, the distribution will be 2:2:1:1:1:1 instead, meaning that the first two links in the channel will each handle twice as much traffic as the other links. The second factor in EtherChannel load balancing is which header fields are used as the base of the hash value. If you could assume those fields in the traffic to be entirely random, it would not matter what hashing mechanism were used; however, because header fields are typically not random, the choice of header fields to be hashed does affect the distribution. For example, when only the destination MAC address is used as the input for the hash calculation, if 90 percent of all frames are destined for a single MAC address (for instance, the MAC address of the default gateway), all of that traffic would end up on the same physical link. Therefore, if you see an uneven distribution of traffic over the links in the channel, you should examine the hashing method and the traffic mix to determine the cause.
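The effect of spreading eight hash values over a given number of links can be reproduced with simple integer division. The sketch below assumes, as the text above states, that the leftover hash values go to the first links in the bundle; it models only the assignment of hash values to links, not the hash calculation itself.

```python
def hash_buckets_per_link(num_links, num_buckets=8):
    """Return how many of the hash values each link in the bundle receives."""
    base, extra = divmod(num_buckets, num_links)
    # The first `extra` links each take one additional hash value.
    return [base + 1 if index < extra else base for index in range(num_links)]


print(hash_buckets_per_link(8))  # prints [1, 1, 1, 1, 1, 1, 1, 1]
print(hash_buckets_per_link(6))  # prints [2, 2, 1, 1, 1, 1]
print(hash_buckets_per_link(3))  # prints [3, 3, 2]
```

Only bundles of two, four, or eight links divide the eight hash values evenly, which is worth keeping in mind when you compare per-link utilization in a channel.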
Troubleshooting Example: Switch Replacement Gone Bad
A broken access switch has been replaced by a new access switch ASW1 (see Figure 4-16). The junior support staff has configured ASW1 to the best of his knowledge and using the documentation that exists. After the switch booted and its physical connections to the two other switches (CSW1 and CSW2) were restored, the junior support staff reported three problems that he could not solve:
-
On CSW2, port channel 1, which connects to ASW1, is down.
-
On ASW1, the following log message on the console indicates a spanning-tree problem on Po2, which connects to CSW1: %SPANTREE-2-PVSTSIM_FAIL: Blocking designated port Po2: Inconsistent superior PVST BPDU received on VLAN 17, claiming root 24593:001f.2721.8400
-
On ASW1, interface VLAN 128 is down.
A key command used to troubleshoot problems with EtherChannel bundles is the show etherchannel summary command. The output from this command presents a concise overview of all links that are configured for EtherChannel, the status of the individual physical interfaces, and the logical port-channel interfaces. Starting with the first problem, you would use the show etherchannel summary command (see results in Example 4-1) and see the letters SD beside the Po1 interface. S means that it is configured as a Layer 2 interface, but D means that Po1 is down, as reported by the junior staff. Also, the small letter s is present beside the physical interfaces Fa0/5 and Fa0/6. Small s, marking the physical interfaces, indicates that those interfaces have been suspended.
CSW2# show etherchannel summary
Flags:  D - down        P - bundled in port-channel
        I - stand-alone s - suspended
        H - Hot-standby (LACP only)
        R - Layer3      S - Layer2
        U - in use      f - failed to allocate aggregator
        M - not in use, minimum links not met
        u - unsuitable for bundling
        w - waiting to be aggregated
        d - default port
Number of channel-groups in use: 2
Number of aggregators:           2
Group  Port-channel  Protocol    Ports
------+-------------+-----------+-----------------------------------------------
1      Po1(SD)          -        Fa0/5(s)    Fa0/6(s)
2      Po2(SU)          -        Fa0/3(P)    Fa0/4(P)
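When you have to check many switches, the flag reading described above can be automated. The following Python sketch parses the table rows of the summary output and reports port channels that are down together with their suspended members. The regular expressions are assumptions based solely on the sample output shown here and may need adjusting for other platforms or software releases.

```python
import re

# Table rows copied from the sample output above.
SUMMARY_ROWS = """\
1      Po1(SD)          -        Fa0/5(s)    Fa0/6(s)
2      Po2(SU)          -        Fa0/3(P)    Fa0/4(P)
"""

# Group number, port-channel name with flags, protocol column, member ports.
ROW = re.compile(r"^(\d+)\s+(Po\d+)\((\w+)\)\s+(\S+)\s+(.*)$")
# One member port followed by its single-letter flag, for example Fa0/5(s).
PORT = re.compile(r"(\S+)\((\w)\)")


def down_channels_with_suspended_ports(summary_text):
    """Return {port-channel: [suspended members]} for channels flagged D."""
    problems = {}
    for line in summary_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skip headers, flag legend, and separator lines
        _group, channel, flags, _protocol, members = match.groups()
        suspended = [name for name, flag in PORT.findall(members) if flag == "s"]
        if "D" in flags and suspended:
            problems[channel] = suspended
    return problems


print(down_channels_with_suspended_ports(SUMMARY_ROWS))
# prints {'Po1': ['Fa0/5', 'Fa0/6']}
```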
When you see physical interfaces in an EtherChannel that are marked as suspended, this usually indicates a configuration mismatch, either between interfaces in the channel itself or between the configuration on this end of the EtherChannel and the configuration at the other end. To find more detail about what exactly caused the problem, you can use the show etherchannel number detail command, or search the log for a message that tells you why the links were suspended. Issuing the show etherchannel 1 detail command provides the output shown in Example 4-2.
CSW2# show etherchannel 1 detail
Group state = L2
Ports: 2   Maxports = 8
Port-channels: 1 Max Port-channels = 1
Protocol:   -
Minimum Links: 0
        Ports in the group:
        -------------------
Port: Fa0/5
------------
Port state    = Up Cnt-bndl Suspend Not-in-Bndl
Channel group = 1        Mode = On      Gcchange = -
Port-channel  = null     GC   = -       Pseudo port-channel = Po1
Port index    = 0        Load = 0x00    Protocol = -
Age of the port in the current state: 0d:00h:25m:13s
        Probable reason: vlan mask is different
Port: Fa0/6
------------
Port state    = Up Cnt-bndl Suspend Not-in-Bndl
Channel group = 1        Mode = On      Gcchange = -
Port-channel  = null     GC   = -       Pseudo port-channel = Po1
Port index    = 0        Load = 0x00    Protocol = -
Age of the port in the current state: 0d:00h:25m:14s
        Probable reason: vlan mask is different
        Port-channels in the group:
        ---------------------------
Port-channel: Po1
------------
Age of the Port-channel   = 0d:00h:24m:48s
Logical slot/port   = 2/1          Number of ports = 0
GC                  = 0x00000000   HotStandBy port = null
Port state          = Port-channel Ag-Not-Inuse
Protocol            = -
Port security       = Disabled
The output shown in Example 4-2 indicates that the cause of the problem is the VLAN mask, which means that there must be a mismatch between the VLANs allowed on the port channel versus the VLANs allowed on the physical interfaces. You could also find the problem indication from the log, which contains the messages shown in Example 4-3.
Mar 20 08:12:39 PDT: %EC-5-CANNOT_BUNDLE2: Fa0/5 is not compatible with Po1 and
will be suspended (vlan mask is different)
Mar 20 08:12:39 PDT: %EC-5-CANNOT_BUNDLE2: Fa0/6 is not compatible with Po1 and
will be suspended (vlan mask is different)
These findings lead you to compare the port-channel interface to the physical interfaces and find that the VLAN allowed list is missing on physical interfaces Fa0/5 and Fa0/6 of ASW1. You notify the junior staff member of the problem, and he changes the configuration by adding the allowed VLAN list to physical interfaces Fa0/5 and Fa0/6. He then saves both pre- and post-change configurations in flash on the switch and communicates the change and the results to other team members.
To investigate the second problem, use the show spanning-tree command on VLAN 17, as demonstrated in Example 4-4.
ASW1# show spanning-tree vlan 17
MST0
Spanning tree enabled protocol mstp
Root ID Priority 32768
Address 001e.79a9.b580
This bridge is the root
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Bridge ID Priority 32768 (priority 32768 sys-id-ext 0)
Address 001e.79a9.b580
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Fa0/7            Desg FWD 200000    128.9    P2p Edge
Po1              Desg BLK 100000    128.56   P2p
Po2              Desg BKN*100000    128.64   P2p Bound(PVST) *PVST_Inc
The output of this command has two elements that clearly point to a spanning-tree configuration issue. The BKN* and *PVST_Inc elements in the output point toward a spanning-tree inconsistency, while the Bound (PVST) element points toward a boundary between two different spanning-tree varieties. Because all other switches run Rapid Per-VLAN Spanning Tree (R-PVST+), it is reasonably safe to assume that switch ASW1 should not be running Multiple Spanning Tree (MST), but should be running R-PVST+. Note that the third line in Example 4-4 clearly indicates that ASW1 is running MST.
After verifying that CSW2 uses R-PVST+, and ASW1 uses MST, you check the baseline documentation and confirm that all switches should run R-PVST+. You notify the junior staff, and he makes the required configuration change on ASW1 by entering the command spanning-tree mode rapid-pvst. He also saves both pre- and post-change configurations in flash on the switch and communicates the problem and the resolution to team members.
Before starting to troubleshoot the third problem as reported, you check the status of the VLAN interface VLAN128, and based on the results shown in Example 4-5, you notice that the VLAN interface 128 is indeed down (and not administratively down).
ASW1# show ip interface brief | exclude unassigned
Interface              IP-Address      OK? Method Status                Protocol
Vlan128                10.1.156.1      YES NVRAM  up                    down
A VLAN interface is up as long as the VLAN exists and there is an active port in that VLAN that is in spanning-tree forwarding state. Therefore, when you discover that a VLAN interface is down, it is a good idea to first check the spanning-tree status for that VLAN. You enter the show spanning-tree vlan 128 and the show vlan id 128 commands and discover, as shown in the output of Example 4-6, that spanning tree is not running for VLAN 128. That leads to the hypothesis that VLAN 128 does not exist on ASW1, which is confirmed immediately.
ASW1# show spanning-tree vlan 128
Spanning tree instance(s) for vlan 128 does not exist.
ASW1# show vlan id 128
VLAN id 128 not found in current VLAN database
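The rule that a VLAN interface is up only when its VLAN exists and has at least one port in the spanning-tree forwarding state can be written as a small predicate. This sketch only restates that rule; the VLAN database contents and port lists are hypothetical stand-ins for what you would collect from the show commands.

```python
def vlan_interface_is_up(vlan_id, vlan_database, forwarding_ports_by_vlan):
    """Apply the rule: the VLAN must exist and have a forwarding port."""
    if vlan_id not in vlan_database:
        return False
    return len(forwarding_ports_by_vlan.get(vlan_id, [])) > 0


# Hypothetical state of ASW1 before the fix: VLAN 128 is not in the database.
before = vlan_interface_is_up(128, {1, 17}, {17: ["Fa0/7"]})
# Hypothetical state after VLAN 128 is created and a port in it is forwarding.
after = vlan_interface_is_up(128, {1, 17, 128}, {17: ["Fa0/7"], 128: ["Po1"]})

print(before, after)  # prints False True
```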
You notify the junior staff, and he adds VLAN 128 to the configuration of ASW1. He then saves pre- and post-change configurations in flash on the switch and communicates both the problem and the solution to the other team members.