Cisco Ebook: Chapter 03: Implementing Spanning Tree (Part 04)



Recommended Spanning Tree Practices

There are many arguments in favor of using large Layer 2 domains in a corporate network. There are also good reasons to avoid extending Layer 2 across the network. The traditional way of doing transparent bridging requires computing a spanning tree for the data plane. Spanning means that there is connectivity between any two devices that have at least one physical path between them in the network. Tree means that the active topology uses a subset of the available links so that there is a single path between any two devices. (In other words, there is no loop in the network.) Note that this requirement comes from the way bridges forward frames, not from STP, which is just the control protocol in charge of building the tree. As a result, a flooded frame is delivered to every node in the network exactly once, without duplicates. This approach has the following two main drawbacks:

Networkwide failure domain: A single source can send traffic that is propagated to all the links in the network. If an error condition occurs and the active topology includes a loop, because Ethernet frames do not include a Time-To-Live (TTL) field, traffic might circle around endlessly, resulting in networkwide flooding and link saturation.
No multipathing: Because the forwarding paradigm requires the active topology to be a tree, only one path between any two nodes is used. If there are N redundant paths between two devices, all but one are simply ignored. Running a separate tree per VLAN works around this constraint to a certain extent, because different VLANs can be forwarded over different paths.

To limit the impact of these drawbacks, the general recommendation is to use Layer 3 connectivity at the distribution or core layer of the network and keep Layer 2 for the access layer, as shown in Figure 3-30. Using Layer 3 between the distribution and core layers allows multipathing (up to 16 paths) using Equal-Cost Multipathing (ECMP) without depending on STP. This design is strongly preferred unless Layer 2 must be extended across a data center pod (distribution block). ECMP refers to the situation in which a router has multiple equal-cost paths to a prefix and load-balances traffic across them. Newer technologies, such as the Catalyst 6500 Virtual Switching System (VSS) or the Nexus 7000 virtual Port Channel (vPC), enable multipathing at Layer 2.
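The fragment below is a minimal sketch of the routed alternative. It turns a distribution-layer uplink into a Layer 3 port and lets OSPF install several equal-cost routes. The interface name, addresses, OSPF process ID, and `maximum-paths` value are hypothetical, and the highest supported `maximum-paths` value depends on the platform and software release.

```
! Hypothetical distribution switch: enable routing on the multilayer switch
ip routing
!
! Hypothetical routed uplink toward the core
interface TenGigabitEthernet1/1
 no switchport
 ip address 10.0.1.1 255.255.255.252
!
! OSPF installs up to "maximum-paths" equal-cost routes (ECMP)
router ospf 1
 network 10.0.0.0 0.0.255.255 area 0
 maximum-paths 4
```

With two or more such uplinks, `show ip route` lists one next hop per equal-cost path for prefixes that are reachable over all of them.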

Figure 3-30: Avoiding Spanning Layer 2 Domain in an Enterprise Network

In modern networks, a 50-second convergence time is usually not acceptable. For this reason, Rapid Spanning Tree is widely preferred over legacy 802.1D implementations. In networks where a large number of VLANs are configured over many switches, it might be necessary to group STP instances with the Multiple Spanning Tree (MST) protocol. In most designs, however, a given VLAN is not configured on many switches. VLANs are local to a floor and therefore span a limited number of switches. In this configuration, RSTP provides the best efficiency.
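On Cisco IOS-based Catalyst switches, the STP mode is selected globally. The sketch below assumes a switch that supports both Rapid PVST+ and MST, and only one mode should be configured. The MST region name, revision number, and VLAN-to-instance mapping are hypothetical, and they must be identical on every switch in the same MST region.

```
! Option A: Rapid PVST+ (one RSTP instance per VLAN)
spanning-tree mode rapid-pvst
!
! Option B: MST, mapping many VLANs onto a few instances
! (region name, revision, and VLAN ranges are hypothetical)
spanning-tree mst configuration
 name CAMPUS
 revision 1
 instance 1 vlan 10-50
 instance 2 vlan 51-100
 exit
spanning-tree mode mst
```

`show spanning-tree summary` reports which mode the switch is running.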

RSTP is far superior to 802.1D STP and even PVST+ from a convergence perspective. It greatly improves the restoration times for any VLAN that requires a topology convergence due to link up, and it also greatly improves the convergence time over BackboneFast for any indirect link failures.

Note

If a network includes other vendor switches, you should isolate the different STP domains with Layer 3 routing to avoid STP compatibility issues.

Even if the recommended design does not depend on STP to resolve link or node failure events, STP is required to protect against user-side loops. A loop can be introduced on the user-facing access layer ports in many ways. Wiring mistakes, misconfigured end stations, or malicious users can create a loop. STP is required to ensure a loop-free topology and to protect the rest of the network from problems created in the access layer.

Note

Some security personnel have recommended disabling STP at the network edge. This practice is not recommended, because the risk of lost connectivity without STP far outweighs the risk of revealing STP information.

Use spanning tree, and control its topology by manually designating the root bridge. When the tree is built, use the STP toolkit to improve the overall performance of the mechanism and reduce the time lost during topology changes.

To make a switch the root bridge for a VLAN instance, enter the spanning-tree vlan vlan_ID root primary command. This command lowers the bridge priority from the default value (32768) to 24576, or to 4096 below the current root's priority if that priority is already lower. Use root secondary on the backup root bridge. Manually placing the primary and secondary root bridges and enabling STP toolkit options gives you a deterministic configuration in which you know which ports should be forwarding and which ports should be blocking.
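A minimal sketch of deterministic root placement, assuming two hypothetical distribution switches (DSW1 and DSW2) that share the root role for two hypothetical VLANs, 10 and 20. Splitting the roots this way lets each uplink forward for one VLAN.

```
! Hypothetical DSW1: root for VLAN 10, backup root for VLAN 20
spanning-tree vlan 10 root primary
spanning-tree vlan 20 root secondary
!
! Hypothetical DSW2: the mirror image
spanning-tree vlan 20 root primary
spanning-tree vlan 10 root secondary
!
! Explicit alternative: set the priority directly (multiple of 4096)
! spanning-tree vlan 10 priority 4096
```

`show spanning-tree root` then displays the root bridge ID and root port for each VLAN, which you can compare against the intended design.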

Figure 3-31 illustrates recommended placements for STP toolkit features:

Loop guard is implemented on the Layer 2 ports between distribution switches and on the uplink ports from the access switches to the distribution switches.
Root guard is configured on the distribution switch ports facing the access switches.
UplinkFast is implemented on the uplink ports from the access switches to the distribution switches.
BPDU guard or root guard is configured on ports from the access switches to the end devices, as is PortFast.
The UDLD protocol enables devices to monitor the physical configuration of the cables and detect when a unidirectional link exists. When a unidirectional link is detected, UDLD shuts down the affected LAN port. UDLD is often configured on ports linking switches.
Depending on the security requirements of an organization, the port security feature can be used to restrict a port’s ingress traffic by limiting the MAC addresses that are allowed to send traffic into the port.

Figure 3-31: STP Toolkit Recommendation
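The fragment below sketches the placements in Figure 3-31 on Cisco IOS-based Catalyst switches. All interface names and the VLAN number are hypothetical. Some platforms also require `switchport trunk encapsulation dot1q` before `switchport mode trunk`. UplinkFast applies only to legacy 802.1D/PVST+, because RSTP provides equivalent behavior natively.

```
! ---- Access switch (hypothetical interfaces) ----
! Edge port toward an end device
interface GigabitEthernet0/10
 switchport mode access
 switchport access vlan 10
 spanning-tree portfast
 spanning-tree bpduguard enable
 switchport port-security
 switchport port-security maximum 2
!
! Uplink toward a distribution switch
interface GigabitEthernet0/49
 switchport mode trunk
 spanning-tree guard loop
 udld port aggressive
!
! UplinkFast is a global command (legacy PVST+ only)
spanning-tree uplinkfast
!
! ---- Distribution switch (hypothetical interfaces) ----
! Downlink toward an access switch
interface GigabitEthernet1/1
 switchport mode trunk
 spanning-tree guard root
 udld port aggressive
!
! Layer 2 link toward the peer distribution switch
interface GigabitEthernet1/48
 switchport mode trunk
 spanning-tree guard loop
 udld port aggressive
```

Root guard and loop guard are kept on different ports because the two features are mutually exclusive on the same interface.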

Troubleshooting STP

Bridging loops generally characterize STP problems. Troubleshooting STP involves identifying and preventing such loops.

The primary function of STP is to prevent loops created by redundant links in bridged networks. STP operates at Layer 2 of the OSI model. STP fails in specific cases, such as hardware or software anomalies. Troubleshooting these situations is typically difficult depending on the design of the network.

Potential STP Problems

The following subsections highlight common network conditions that lead to STP problems:

Duplex mismatch
Unidirectional link failure
Frame corruption
Resource errors
PortFast configuration error
Inappropriate STP diameter parameter tuning

Duplex Mismatch

Duplex mismatch on point-to-point links is a common configuration error. It typically occurs when one side of the link is manually configured as full duplex and the other side uses the default autonegotiation setting. Because the manually configured side no longer negotiates, the autonegotiating side cannot learn the peer's setting and falls back to half duplex.

The worst-case scenario for a duplex mismatch is when a bridge that is sending BPDUs is configured for half duplex on a link while its peer is configured for full duplex. In Figure 3-32, the duplex mismatch on the link between Switch A and Switch B could potentially lead to a bridging loop. Because Switch B is configured for full duplex, it starts forwarding frames even if Switch A is already using the link. This is a problem for Switch A, which detects a collision and runs the back-off algorithm before attempting another transmission of its frame. If there is enough traffic from Switch B to Switch A, every packet (including the BPDUs) sent by Switch A is deferred or has a collision and is subsequently dropped. From an STP point of view, because Switch B no longer receives BPDUs from Switch A, it assumes the root bridge is no longer present. Consequently, Switch B moves its port to Switch C into the forwarding state, creating a Layer 2 loop.

Figure 3-32: Duplex Mismatch
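The safest way to avoid the mismatch is to configure both link partners the same way, either both autonegotiating or both hard-coded. In the sketch below, the interface names and the 100-Mbps speed are hypothetical.

```
! Option 1: both link partners autonegotiate (hypothetical port)
interface GigabitEthernet0/1
 speed auto
 duplex auto
!
! Option 2: hard-code BOTH ends with identical settings (hypothetical port)
interface FastEthernet0/1
 speed 100
 duplex full
```

On each side, `show interfaces status` shows the negotiated or configured duplex per port. Both ends of the link should report the same value.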

Unidirectional Link Failure

A unidirectional link is a frequent cause for a bridging loop. An undetected failure on a fiber link or a problem with a transceiver usually causes unidirectional links. With STP enabled to provide redundancy, any condition that results in a link maintaining a physical link connected status on both link partners but operating in a one-way communication state is detrimental to network stability because it could lead to bridging loops and routing black holes. Figure 3-33 shows such an example of a unidirectional link failure affecting STP.

Figure 3-33: Unidirectional Link Failure

The link between Switch A and Switch B is unidirectional: it drops traffic from Switch A to Switch B but still carries traffic from Switch B to Switch A. Suppose that the interface on Switch B should be blocking. An interface blocks only if it receives BPDUs from a bridge that has a better priority. In this case, all the BPDUs coming from Switch A are lost, so Switch B eventually moves the port to the forwarding state and creates a loop. Note that if the failure exists at startup, STP does not converge correctly. In addition, rebooting the bridges has no effect in this scenario.

To resolve this problem, configure aggressive mode UDLD to detect incorrect cabling or unidirectional links and automatically put the affected port in err-disable state. The general recommended practice is to use aggressive mode UDLD on all point-to-point interfaces in any multilayer switched network.
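A minimal sketch of aggressive UDLD on a Cisco IOS-based Catalyst switch. The global command covers fiber-optic ports only, so copper point-to-point links need the interface command. The interface name and the 300-second recovery interval are hypothetical.

```
! Global: aggressive UDLD on all fiber-optic interfaces
udld aggressive
!
! Per interface, also covering copper links (hypothetical port)
interface GigabitEthernet0/49
 udld port aggressive
!
! Optional: re-enable UDLD err-disabled ports after 300 seconds
errdisable recovery cause udld
errdisable recovery interval 300
```

`show udld GigabitEthernet0/49` displays the UDLD neighbor state, and `show interfaces status err-disabled` lists ports that UDLD has shut down. Automatic recovery is a design choice. If the fiber fault persists, the port is simply err-disabled again after each recovery attempt.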

Frame Corruption

Frame corruption is another cause for STP failure. If an interface is experiencing a high rate of physical errors, the result may be lost BPDUs, which may lead to an interface in the blocking state moving to the forwarding state. However, this case is rare because STP default parameters are conservative. The blocking port needs to miss consecutive BPDUs for 50 seconds before transitioning to the forwarding state. In addition, any single BPDU that is successfully received by the switch breaks the loop. This case is more common for nondefault STP parameters and aggressive STP timer values. Frame corruption is generally a result of a duplex mismatch, bad cable, or incorrect cable length.
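The 50-second figure follows from the 802.1D default timers, assuming Max Age = 20 s and Forward Delay = 15 s. The blocked port first ages out the last stored BPDU, and then it passes through the listening and learning states before it forwards:

```latex
t_{\text{forward}} = \text{MaxAge} + 2 \times \text{ForwardDelay} = 20\,\text{s} + 2 \times 15\,\text{s} = 50\,\text{s}
```

Tuning these timers lower shrinks this window, which is why aggressive timer values make lost BPDUs more dangerous.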

Resource Errors

Even on high-end switches that perform most of their switching functions in hardware with specialized application-specific integrated circuits (ASIC), STP is performed by the CPU (software-based). This means that if the CPU of the bridge is over-utilized for any reason, it might lack the resources to send out BPDUs. STP is generally not a processor-intensive application and has priority over other processes; therefore, a resource problem is unlikely to arise. However, you need to exercise caution when multiple VLANs in PVST+ mode exist. Consult the product documentation for the recommended number of VLANs and STP instances on any specific Catalyst switch to avoid exhausting resources.
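One way to gauge the STP load is to compare the number of active instances and logical ports against the limits in the platform documentation. In PVST+, the logical port count is roughly the sum, over all trunks, of the VLANs each trunk carries, plus the access ports. The exact output columns vary by software release.

```
! Totals of STP instances and port states across all VLANs
show spanning-tree summary totals
```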

PortFast Configuration Error

As discussed in the previous “PortFast” section, the PortFast feature, when enabled on a port, bypasses the listening and learning states of STP, and the port transitions to the forwarding mode on linkup. The fast transition can lead to bridging loops if configured on incorrect ports.

In Figure 3-34, Switch A has Port p1 in the forwarding state and Port p2 configured for PortFast. Device B is a hub. As soon as the second cable is plugged in to Switch A, Port p2 goes to forwarding and creates a loop between p1 and p2. The loop ceases as soon as p1 or p2 receives a BPDU that transitions one of the two ports to the blocking state. The problem with this type of transient loop is that, if the looping traffic is intense, the bridge might have trouble sending the BPDU that stops the loop. BPDU guard prevents this type of event.

Figure 3-34: PortFast Configuration Error
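A minimal sketch of protecting PortFast ports with BPDU guard on a Cisco IOS-based Catalyst switch. The interface name standing in for Port p2 is hypothetical. Some newer IOS XE releases use the `spanning-tree portfast edge ...` form of these commands instead.

```
! Global: BPDU guard on every PortFast-enabled port
spanning-tree portfast bpduguard default
!
! Per interface: a single edge port such as p2 (hypothetical name)
interface GigabitEthernet0/2
 spanning-tree portfast
 spanning-tree bpduguard enable
```

With BPDU guard enabled, a BPDU arriving on p2 from the hub segment err-disables the port instead of allowing it to keep forwarding. `show interfaces status err-disabled` then lists the port.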

Troubleshooting Methodology

Troubleshooting STP issues can be difficult if logical troubleshooting procedures are not planned in advance. Rebooting the switches occasionally resolves the problem temporarily, but without determining the underlying cause, the problem is likely to return.

The following steps provide a general overview of a methodology for troubleshooting STP:

Step 1	Develop a plan.
Step 2	Isolate the cause and correct an STP problem.
Step 3	Document findings.

The following subsections explain the approach to troubleshooting Layer 2 bridging loops in more detail.

Develop a Plan

It is critical to develop a plan of action for potential STP issues. To create a plan, you must understand the following basic characteristics of your network:

Topology of the bridged network
Location of the root bridge
Location of the blocked ports and, therefore, the redundant links

Knowing these basic characteristics is essential when troubleshooting any Layer 2 issue. Knowledge of the network also helps you focus on the critical ports on key devices, because most STP troubleshooting steps simply involve using show commands to identify error conditions. Knowing which links on each device are redundant helps you stop a bridging loop quickly by disabling those links.
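The commands below are one way to capture that baseline while the network is healthy, so the output can be stored with the plan. They exist on Cisco IOS-based Catalyst switches. The output format varies by release.

```
! Root bridge and root port for every VLAN
show spanning-tree root
!
! Ports currently blocked by STP (the redundant links)
show spanning-tree blockedports
!
! Directly connected Cisco neighbors, to map local ports to the topology
show cdp neighbors
```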

Isolate the Cause and Correct an STP Problem

If there is an STP loop in your network, follow these steps:

Step 1	Identify a bridging loop.
Step 2	Restore connectivity.
Step 3	Check the port status.
Step 4	Check for resource errors.
Step 5	Disable unneeded features.

Identify a Bridging Loop

The best way to identify a bridging loop is to capture the traffic on a saturated link and determine whether duplicate packets are propagating. If all users in a specific bridging domain have connectivity issues at the same time, a bridging loop is a possible cause. Check the port utilization on devices and look for abnormal values. You might also see other protocols break down because of the loop. For example, HSRP might report duplicate IP addresses if a loop causes it to see its own packets. Another common symptom during a loop is constant flapping of MAC addresses between interfaces. In a stable network, MAC addresses do not flap. Finally, be careful not to confuse a bridging loop with a packet storm caused by another anomalous event, such as an Internet worm or virus.
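The show commands below are a quick way to look for these symptoms on a Cisco IOS-based Catalyst switch. VLAN 10 is hypothetical. Older releases use `show mac-address-table` instead of `show mac address-table`. Some platforms log MAC flapping as a `%SW_MATM-4-MACFLAP_NOTIF` message, but the message name varies by platform.

```
! Line protocol and traffic rates on every interface; look for abnormal rates
show interfaces | include line protocol|input rate|output rate
!
! Recent log messages, including any MAC address flapping notifications
show logging
!
! MAC addresses learned in a VLAN (hypothetical VLAN 10)
show mac address-table dynamic vlan 10
```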

Restore Connectivity

Bridging loops have severe consequences on a bridged network. Administrators generally do not have time to look for the cause of a loop. Instead, they restore connectivity as soon as possible and identify potential issues later. Restoring connectivity consists of the following two actions:

Breaking the loop: A simple solution is to manually disable every port that provides redundancy in the network. Identify the part of the network that is most affected, and start disabling ports in that area. If possible, start by disabling ports that should be in the blocking state. Disable one port at a time, and check whether network connectivity is restored after each one.
Logging events: If it is not possible to identify the source of the problem or if the problem is transient, enable logging and increase the logging level of STP events on the switches experiencing the failure. At a minimum, enable logging on switches with blocked ports because the transition of a blocked port to forwarding state creates the loop.
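A minimal sketch of breaking the loop by administratively disabling one redundant port, and of re-enabling it once the root cause is fixed. The interface name is hypothetical.

```
! Break the loop: shut down one redundant link (hypothetical port)
configure terminal
interface GigabitEthernet0/2
 shutdown
end
!
! After the root cause is fixed, restore the link
configure terminal
interface GigabitEthernet0/2
 no shutdown
end
```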

To log detailed events or to identify STP problems, use debug commands on Cisco IOS–based Catalyst switches. Debugging commands, if used with care, can help identify the source of the problem.

Use the following command to enable STP debugging:


debug spanning-tree events

Example 3-19 shows sample debug output for spanning-tree events.

Use the following command from global configuration mode to capture debug information into the logging buffer of a Catalyst switch.


logging buffered

Example 3-19: Spanning-Tree Events Debug on Cisco IOS–Based Catalyst Switches

Switch# debug spanning-tree events
Spanning Tree event debugging is on
Switch#
*Mar  5 21:23:14.994: STP: VLAN0013 sent Topology Change Notice on Gi0/3
*Mar  5 21:23:14.994: STP: VLAN0014 sent Topology Change Notice on Gi0/4
*Mar  5 21:23:14.994: STP: VLAN0051 sent Topology Change Notice on Po3
*Mar  5 21:23:14.994: STP: VLAN0052 sent Topology Change Notice on Po4
*Mar  5 21:23:15.982: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/1, changed state to down
*Mar  5 21:23:16.958: STP: VLAN0001 Topology Change rcvd on Po1
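A sketch of a complete debug session on a Cisco IOS-based Catalyst switch follows. The buffer size of 64000 bytes is a hypothetical value. Disabling console logging is optional, but it keeps a slow console from becoming a bottleneck while debug output is being generated.

```
! Configuration mode: millisecond timestamps, as in Example 3-19
configure terminal
service timestamps debug datetime msec
! Buffer debug-level messages (hypothetical 64000-byte buffer)
logging buffered 64000 debugging
! Optional: keep debug output off the console
no logging console
end
!
! Privileged EXEC mode: collect, review, then stop debugging
debug spanning-tree events
show logging
undebug all
```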

Note

When troubleshooting an IP subnet that spans multiple switches, it might be efficient to check the syslog server so that you can look at the logged messages from all the switches together. However, if connectivity to the syslog server is lost, some messages might not be available.
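A minimal sketch of sending switch logs to a central syslog server. The server address 192.0.2.10 and the Loopback0 source interface are hypothetical. Older IOS releases use `logging 192.0.2.10` instead of `logging host`.

```
! Hypothetical syslog server address
logging host 192.0.2.10
! Forward messages of severity "informational" and more severe
logging trap informational
! Optional: send log messages from a stable source address (hypothetical)
logging source-interface Loopback0
```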

Check Port Status

Investigate the blocking ports first and then the other ports. The following are several guidelines for troubleshooting port status:

Blocked ports: Check to make sure the switch reports receiving BPDUs periodically on root and blocked ports. Issue the following command on Cisco IOS–based Catalyst switches to display the number of BPDUs received on each interface:
```
show spanning-tree vlan vlan-id detail
```
Issue the command multiple times to determine whether the device is receiving BPDUs.
Duplex mismatch: To look for a duplex mismatch, check on each side of a point-to-point link. Simply use the show interface command to check the speed and duplex status of the specified ports.
Port utilization: An overloaded interface may fail to transmit vital BPDUs and is also an indication of a possible bridging loop. Use the show interface command to determine interface utilization using the load of the interface and packet input and output rates.
Frame corruption: Look for increases in the input error fields of the show interface command.
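The filtered show commands below are one way to run these checks quickly on a Cisco IOS-based Catalyst switch. VLAN 10 and the interface name are hypothetical. Run the first command several times and confirm that the "received" BPDU counter keeps increasing on root and blocked ports.

```
! Port role/state header lines and BPDU counters (hypothetical VLAN 10)
show spanning-tree vlan 10 detail | include of VLAN|BPDU
!
! Duplex, load, and traffic rates on one side of a link (hypothetical port)
show interfaces GigabitEthernet0/1 | include duplex|load|rate
!
! Input error counters (CRC, frame), which indicate frame corruption
show interfaces GigabitEthernet0/1 | include input errors
```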

Look for Resource Errors

High CPU utilization can lead to network instability on switches running STP. Use the show processes cpu command to check whether CPU utilization is approaching 100 percent. Cisco Catalyst switches prioritize control packets such as BPDUs over lower-priority traffic. A switch can therefore remain stable at high CPU utilization if the load comes only from low-priority traffic. As a general rule of thumb, if CPU utilization exceeds 70 percent, take action to fix the underlying problem, or consider re-architecting the network to prevent future problems.
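Two variants of the command help with this check on Cisco IOS-based Catalyst switches. One shows which processes consume the CPU, and the other shows whether high utilization is sustained or a spike.

```
! Processes sorted by CPU usage, highest first
show processes cpu sorted
!
! ASCII graphs of CPU usage over the last 60 seconds, 60 minutes, and 72 hours
show processes cpu history
```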

Disable Unneeded Features

Disabling as many features as possible reduces troubleshooting complexity. EtherChannel, for example, bundles several links into a single logical port, and it might be helpful to disable it while troubleshooting. In general, simplifying the network configuration reduces the troubleshooting effort. If you make configuration changes during troubleshooting, note them. Alternatively, save a copy of the configuration in bootflash or on a TFTP server. After the root cause is found and fixed, the removed configuration can easily be reapplied.
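A minimal sketch of saving and reapplying the configuration on a Cisco IOS-based Catalyst switch. The file name is hypothetical, and platforms without a `bootflash:` file system typically use `flash:`.

```
! Save a known-good copy before simplifying (hypothetical file name)
copy running-config bootflash:pre-troubleshooting.cfg
!
! Or copy it to a TFTP server (the switch prompts for the address)
copy running-config tftp:
!
! Afterward, merge the saved configuration back into the running one
copy bootflash:pre-troubleshooting.cfg running-config
```

Copying a file to `running-config` merges it into the running configuration. It restores removed commands but does not delete commands added during troubleshooting, so review the result with `show running-config`.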

Document Findings

When the STP issue is isolated and resolved, it is important to document the lessons learned from the incident as part of improving the plan for future issues. If configuration or network design changes are not recorded in the plan, the next STP issue might be harder to troubleshoot. Network documentation is critical to business uptime, and many outages can be prevented by planning ahead. In some cases, links can be disabled to break the loop without affecting the business during business hours, and troubleshooting can be performed after hours. Without clear documentation, a network problem can spread to all critical functions, so network administrators need proper documentation to reduce the time it takes to stabilize the network. Documentation includes the IP addresses of all devices, passwords, the root and secondary root bridges, and the proper configuration of all switch-to-switch and switch-to-router links. A network topology diagram with port number information also helps you determine quickly how the problem manifests itself in the network. Having a known good configuration is also essential to recover the network quickly.

Summary

The Spanning Tree Protocol is a fundamental protocol to prevent Layer 2 loops and at the same time provide redundancy in the network. This chapter covered the basic operation and configuration of RSTP and MST. Enhancements now enable STP to converge more quickly and run more efficiently.

RSTP provides faster convergence than 802.1D when topology changes occur.
RSTP enables several additional port roles to increase the overall mechanism’s efficiency.
show spanning-tree is the main family of commands used to verify RSTP operations.
MST reduces the encumbrance of PVRST+ by allowing a single instance of spanning tree to run for multiple VLANs.

The Cisco STP enhancements make the protocol more robust and resilient and increase the availability of the multilayer switched network. They not only isolate bridging loops but also prevent bridging loops from occurring.

To protect STP operations, several features are available that control the way BPDUs are sent and received:

BPDU guard protects the operation of STP on PortFast-configured ports.
BPDU filter is a variant that prevents BPDUs from being sent and received while leaving the port in forwarding state.
Root guard prevents a switch from becoming the root bridge through BPDUs received on a root-guard-configured port.
Loop guard detects Layer 2 unidirectional connectivity and places the affected interface in a loop-inconsistent (blocking) state, protecting the network from anomalous STP conditions.
UDLD detects and disables an interface with unidirectional connectivity, protecting the network from anomalous STP conditions.
In most implementations, the STP toolkit should be used in combination with additional features, such as Flex Links.

Successful Spanning Tree Protocol troubleshooting depends on careful planning and documentation before a problem occurs, followed by a set of logical troubleshooting steps to identify and correct the problem. The troubleshooting exercise must conclude with documenting the findings and making appropriate changes to the planning document.


About Me

This is a personal blog about IT network infrastructure, written for learning, fun, sharing, and reference. It covers IT network infrastructures and all of their related components, such as new software and hardware from vendors like Cisco Systems, Microsoft, IBM, HP, CheckPoint, and Juniper. Some posts also contain configuration examples and articles about different network components that I have found useful. I created this blog to share my knowledge with other people, and I hope others will share their knowledge with me. Most of the posts are based on my own experience at work.

Who am I? My name is Huynh Phi Long, and I currently work as an IT network administrator at PPF - Homecredit.

You can contact me by email: longhp@live.com
