
Recommended Spanning Tree Practices

There are many arguments in favor of using large Layer 2 domains in a corporate network. There are also good reasons to avoid Layer 2 in the network. The traditional way of doing transparent bridging requires the computation of a spanning tree for the data plane. Spanning means that there will be connectivity between any two devices that have at least one path physically available between them in the network. Tree means that the active topology uses a subset of the physically available links so that there is a single path between any two devices. (That is, there is no loop in the network.) Note that this requirement is related to the way frames are forwarded by bridges, not to STP, which is just a control protocol in charge of building such a tree. This behavior allows a single copy of a frame to be delivered to all the nodes in the network without any duplicate frames. This approach has the following two main drawbacks:

  • Networkwide failure domain: A single source can send traffic that is propagated to all the links in the network. Because Ethernet frames do not include a Time-To-Live (TTL) field, if an error condition occurs and the active topology includes a loop, traffic might circle around endlessly, resulting in networkwide flooding and link saturation.

  • No multipathing: Because the forwarding paradigm requires the active topology to be a tree, only one path between any two nodes is used. That means that if there are N redundant paths between two devices, all but one are simply ignored. Note that the introduction of a per-VLAN tree allows working around this constraint to a certain extent.

To limit the impact of these drawbacks, the general recommendation is to use Layer 3 connectivity at the distribution or core layer of the network, keeping Layer 2 for the access layer, as shown in Figure 3-30. Using Layer 3 between the distribution and core layers allows multipathing (up to 16 paths) using Equal-Cost Multipathing (ECMP) without depending on STP and is strongly preferred unless there is a need to extend Layer 2 across a data center pod (distribution block). ECMP refers to the situation in which a router has multiple equal-cost paths to a prefix and thus load-balances traffic over each path. Newer technologies, such as Catalyst 6500 Virtual Switching System or Nexus 7000 virtual Port Channel (vPC), enable multipathing at Layer 2.

Figure 3-30: Avoiding Spanning Layer 2 Domain in an Enterprise Network
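
As a minimal sketch of the Layer 3 multipathing described above, the following fragment raises the number of equal-cost paths that a routing process may install. The choice of OSPF, the process ID 1, and the value 4 are assumptions for illustration only; the supported maximum depends on the platform and software release.

```
! Hypothetical distribution switch routing toward the core
router ospf 1
 maximum-paths 4
```

When several equal-cost routes exist, show ip route for the prefix in question should list more than one next hop.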

In modern networks, a 50-second convergence time (with default 802.1D timers, 20 seconds of max age plus 15 seconds each in the listening and learning states) is usually not acceptable. For this reason, Rapid Spanning Tree Protocol (RSTP) is widely preferred over legacy 802.1D implementations. In networks where a large number of VLANs are configured over many switches, it might be necessary to group STP instances with Multiple Spanning Tree (MST). Most of the time, however, the same VLAN is not configured over many switches. VLANs are local to a floor and thus span a limited number of switches. In this configuration, RSTP provides the best efficiency.

RSTP is far superior to 802.1D STP and even PVST+ from a convergence perspective. It greatly improves the restoration times for any VLAN that requires a topology convergence due to link up, and it also greatly improves the convergence time over BackboneFast for any indirect link failures.
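
A minimal sketch of selecting the rapid per-VLAN mode on a Cisco IOS–based Catalyst switch follows. It assumes that every switch in the Layer 2 domain is migrated, because a port facing a legacy 802.1D neighbor falls back to the slower behavior.

```
! Global configuration mode
spanning-tree mode rapid-pvst
```

The show spanning-tree summary command reports the mode the switch is currently running.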


Note

If a network includes switches from other vendors, you should isolate the different STP domains with Layer 3 routing to avoid STP compatibility issues.

Even if the recommended design does not depend on STP to resolve link or node failure events, STP is required to protect against user-side loops. A loop can be introduced on the user-facing access layer ports in many ways: wiring mistakes, misconfigured end stations, or malicious users can all create one. STP is required to ensure a loop-free topology and to protect the rest of the network from problems created in the access layer.


Note

Some security personnel have recommended disabling STP at the network edge. This practice is not recommended because the risk of lost connectivity without STP is far greater than the risk posed by any STP information that might be revealed.

Spanning tree should be used, and its topology should be controlled by manually designating the root bridge. When the tree is created, use the STP toolkit to enhance the overall performance of the mechanism and reduce the time lost during topology changes.

To configure a switch to become the root bridge for a VLAN, enter the spanning-tree vlan vlan_ID root primary command to modify the bridge priority from the default value (32768) to a significantly lower value. Manually placing the primary and secondary root bridges and enabling STP toolkit options gives you a deterministic configuration in which you know which ports should be forwarding and which ports should be blocking.
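
The root placement described above can be sketched as follows. The two distribution switches and the VLAN numbers 10 and 20 are hypothetical; splitting the root role per VLAN is one way to use both uplinks, as noted earlier for per-VLAN trees.

```
! Distribution switch A: primary root for VLAN 10, backup for VLAN 20
spanning-tree vlan 10 root primary
spanning-tree vlan 20 root secondary

! Distribution switch B: the mirror image
spanning-tree vlan 10 root secondary
spanning-tree vlan 20 root primary
```

The root keyword is a macro that lowers the bridge priority for you. Setting an explicit value with spanning-tree vlan vlan_ID priority is an alternative when you want the number to be visible in the configuration. Use show spanning-tree vlan vlan_ID to confirm which switch was elected.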

Figure 3-31 illustrates recommended placements for STP toolkit features:

  • Loop guard is implemented on the Layer 2 ports between distribution switches and on the uplink ports from the access switches to the distribution switches.

  • Root guard is configured on the distribution switch ports facing the access switches.

  • UplinkFast is implemented on the uplink ports from the access switches to the distribution switches.

  • BPDU guard or root guard is configured on ports from the access switches to the end devices, as is PortFast.

  • The UDLD protocol enables devices to monitor the physical configuration of the cables and detect when a unidirectional link exists. When a unidirectional link is detected, UDLD shuts down the affected LAN port. UDLD is often configured on ports linking switches.

  • Depending on the security requirements of an organization, the port security feature can be used to restrict a port’s ingress traffic by limiting the MAC addresses that are allowed to send traffic into the port.

Figure 3-31: STP Toolkit Recommendation
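
The placements listed above might translate into configuration along the following lines. This is only a sketch: the interface names, descriptions, and the port-security limit of 2 are assumptions, and exact command availability varies by platform and software release.

```
! --- Distribution switch ---
interface GigabitEthernet1/1
 description Downlink to access switch
 spanning-tree guard root
!
interface GigabitEthernet1/2
 description Layer 2 link to peer distribution switch
 spanning-tree guard loop
 udld port aggressive

! --- Access switch ---
! UplinkFast is enabled globally and applies to legacy 802.1D/PVST+ operation
spanning-tree uplinkfast
!
interface GigabitEthernet0/1
 description Uplink to distribution switch
 spanning-tree guard loop
 udld port aggressive
!
interface FastEthernet0/10
 description End-device port
 switchport mode access
 spanning-tree portfast
 spanning-tree bpduguard enable
 switchport port-security
 switchport port-security maximum 2
```

Root guard and loop guard cannot be active on the same port at the same time, which is why the sketch applies them to different interfaces.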

Troubleshooting STP

STP problems generally manifest as bridging loops. Troubleshooting STP involves identifying and preventing such loops.

The primary function of STP is to prevent loops created by redundant links in bridged networks. STP operates at Layer 2 of the OSI model. STP fails in specific cases, such as hardware or software anomalies. Troubleshooting these situations is typically difficult, depending on the design of the network.

Potential STP Problems

The following subsections highlight common network conditions that lead to STP problems.

Duplex Mismatch

Duplex mismatch on point-to-point links is a common configuration error. It occurs specifically when one side of the link is manually configured as full duplex and the other side is using the default configuration for autonegotiation. In that case, the autonegotiating side cannot determine the duplex setting of its peer and falls back to half duplex.

The worst-case scenario for a duplex mismatch is when a bridge that is sending BPDUs is configured for half duplex on a link while its peer is configured for full duplex. In Figure 3-32, the duplex mismatch on the link between Switch A and Switch B could potentially lead to a bridging loop. Because Switch B is configured for full duplex, it transmits frames even if Switch A is already using the link. This is a problem for Switch A, which detects a collision and runs the back-off algorithm before attempting another transmission of its frame. If there is enough traffic from Switch B to Switch A, every frame (including the BPDUs) sent by Switch A is deferred or has a collision and is subsequently dropped. From an STP point of view, because Switch B no longer receives BPDUs from Switch A, it assumes the root bridge is no longer present. Consequently, Switch B moves its port to Switch C into the forwarding state, creating a Layer 2 loop.

Figure 3-32: Duplex Mismatch
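
One way to avoid the mismatch is to make sure both ends of the link use the same method. The following sketch shows the two consistent options; the interface name and the speed of 100 Mbps are assumptions, and whichever option you choose must be applied on both link partners.

```
! Option 1: hard-code the same values on both ends
interface FastEthernet0/1
 speed 100
 duplex full

! Option 2: leave both ends at autonegotiation
interface FastEthernet0/1
 speed auto
 duplex auto
```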

Unidirectional Link Failure

A unidirectional link is a frequent cause of a bridging loop. Unidirectional links are usually caused by an undetected failure on a fiber link or a problem with a transceiver. With STP enabled to provide redundancy, any condition in which both link partners report the link as physically connected but traffic passes in only one direction is detrimental to network stability because it could lead to bridging loops and routing black holes. Figure 3-33 shows an example of a unidirectional link failure affecting STP.

Figure 3-33: Unidirectional Link Failure

The link between Switch A and Switch B is unidirectional: it drops traffic from Switch A to Switch B while transmitting traffic from Switch B to Switch A. Suppose that the interface on Switch B should be blocking. An interface blocks only if it receives BPDUs from a bridge that has a better priority. In this case, all the BPDUs coming from Switch A are lost, and Switch B eventually moves its interface to the forwarding state, creating a loop. Note that if the failure exists at startup, STP does not converge correctly. In addition, rebooting the bridges has absolutely no effect on this scenario.

To resolve this problem, configure aggressive mode UDLD to detect incorrect cabling or unidirectional links and automatically put the affected port in err-disable state. The general recommended practice is to use aggressive mode UDLD on all point-to-point interfaces in any multilayer switched network.
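
A minimal sketch of enabling aggressive mode UDLD follows. The interface name and the 300-second recovery interval are assumptions, and automatic recovery is optional. UDLD must be enabled on both link partners to be effective.

```
! Global: aggressive mode on all fiber-optic ports
udld aggressive
!
! Per interface: needed for ports the global command does not cover
interface GigabitEthernet0/1
 udld port aggressive
!
! Optional: let ports disabled by UDLD recover automatically
errdisable recovery cause udld
errdisable recovery interval 300
```

Use show udld to check the state of each port. The udld reset command in privileged EXEC mode re-enables ports that UDLD has shut down.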

Frame Corruption

Frame corruption is another cause of STP failure. If an interface is experiencing a high rate of physical errors, the result may be lost BPDUs, which may lead to an interface in the blocking state moving to the forwarding state. However, this case is rare because STP default parameters are conservative. The blocking port needs to miss consecutive BPDUs for 50 seconds before transitioning to the forwarding state. In addition, any single BPDU that is successfully received by the switch breaks the loop. This case is more common with nondefault STP parameters and aggressive STP timer values. Frame corruption is generally a result of a duplex mismatch, bad cable, or incorrect cable length.

Resource Errors

Even on high-end switches that perform most of their switching functions in hardware with specialized application-specific integrated circuits (ASICs), STP is performed by the CPU (software-based). This means that if the CPU of the bridge is overutilized for any reason, it might lack the resources to send out BPDUs. STP is generally not a processor-intensive application and has priority over other processes; therefore, a resource problem is unlikely to arise. However, you need to exercise caution when multiple VLANs in PVST+ mode exist. Consult the product documentation for the recommended number of VLANs and STP instances on any specific Catalyst switch to avoid exhausting resources.

PortFast Configuration Error

As discussed in the previous “PortFast” section, the PortFast feature, when enabled on a port, bypasses the listening and learning states of STP, and the port transitions to the forwarding state on link up. The fast transition can lead to bridging loops if PortFast is configured on the wrong ports.

In Figure 3-34, Switch A has Port p1 in the forwarding state and Port p2 configured for PortFast. Device B is a hub. Port p2 goes to forwarding and creates a loop between p1 and p2 as soon as the second cable is plugged in to Switch A. The loop ceases as soon as p1 or p2 receives a BPDU that transitions one of these two ports into blocking mode. The problem with this type of transient loop condition is that if the looping traffic is intensive, the bridge might have trouble successfully sending the BPDU that stops the loop. BPDU guard prevents this type of event from occurring.

Figure 3-34: PortFast Configuration Error
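
As a hedged sketch, BPDU guard can be turned on for every PortFast-enabled port at once from global configuration mode, so that a port such as p2 is err-disabled as soon as it receives a BPDU. The automatic recovery line is an optional assumption, not a requirement.

```
! Enable BPDU guard on all ports that have PortFast enabled
spanning-tree portfast bpduguard default
!
! Optional: automatically recover ports that BPDU guard has disabled
errdisable recovery cause bpduguard
```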

Troubleshooting Methodology

Troubleshooting STP issues can be difficult if logical troubleshooting procedures are not established in advance. Occasionally, rebooting the switches might resolve the problem temporarily, but unless the underlying cause is determined, the problem is likely to return.

The following steps provide a general overview of a methodology for troubleshooting STP:

Step 1. Develop a plan.

Step 2. Isolate the cause and correct the STP problem.

Step 3. Document findings.

The following subsections explain the approach to troubleshooting Layer 2 bridging loops in more detail.

Develop a Plan

It is critical to develop a plan of action for potential STP issues. To create a plan, you must understand the following basic characteristics of your network:

  • Topology of the bridged network

  • Location of the root bridge

  • Location of the blocked ports and, therefore, the redundant links

Knowing these basic characteristics is essential in troubleshooting any Layer 2 issue. In addition, knowledge of the network helps to focus attention on the critical ports on key devices, because most of the STP troubleshooting steps simply involve using show commands to identify error conditions. Knowing which links on each device are redundant helps to quickly stop a bridging loop by disabling those links.

Isolate the Cause and Correct an STP Problem

If there is an STP loop in your network, follow these steps:

Identify a Bridging Loop

The best way to identify a bridging loop is to capture the traffic on a saturated link and to determine whether duplicate packets are propagating. If all users in a specific bridging domain have connectivity issues at the same time, a bridging loop is a possible cause. Check the port utilization on devices and look for abnormal values. In addition, you might see other protocols break down because of the bridging loop. For example, HSRP might complain of duplicate IP addresses if a loop causes it to see its own packets. Another common symptom is log messages reporting constant flapping of MAC addresses between interfaces. In a stable network, MAC addresses do not flap. Finally, be careful not to mistake a packet storm caused by another anomalous event, such as an Internet worm or virus, for a bridging loop.
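
The checks described above can be sketched with a few show commands. The MAC address and interface name are hypothetical, and older software releases spell the first command show mac-address-table.

```
! Run several times: an address that keeps moving between ports suggests a loop
show mac address-table address 0000.0c07.ac0a
!
! Look for abnormally high input and output rates on a suspect link
show interfaces GigabitEthernet0/1 | include rate
```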

Restore Connectivity

Bridging loops have severe consequences on a bridged network. Administrators generally do not have time to look for the cause of a loop while it is occurring and prefer to restore connectivity as soon as possible and identify potential issues later. Restoring connectivity consists of the following two actions:

  • Breaking the loop: A simple solution is to manually disable every port that is providing redundancy in the network. Identify the part of the network that is most affected and start disabling ports in that area. If possible, start by disabling ports that should be in the blocking state. Disable one port at a time, checking after each one whether network connectivity is restored.

  • Logging events: If it is not possible to identify the source of the problem or if the problem is transient, enable logging and increase the logging level of STP events on the switches experiencing the failure. At a minimum, enable logging on switches with blocked ports because the transition of a blocked port to forwarding state creates the loop.
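
The first action above, breaking the loop, amounts to shutting down one redundant port at a time. A minimal sketch, assuming GigabitEthernet0/2 is a port that should normally be blocking:

```
configure terminal
 interface GigabitEthernet0/2
  shutdown
 end
! Check whether connectivity returns before disabling the next port
```

Keep a list of every port you disable so that each one can be restored with no shutdown after the root cause is fixed.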

To log detailed events or to identify STP problems, use debug commands on Cisco IOS–based Catalyst switches. Debugging commands, if used with care, can help identify the source of the problem.

Use the following command to enable STP debugging:

debug spanning-tree events

Use the following command from global configuration mode to capture debug information into the logging buffer of a Catalyst switch:

logging buffered

Example 3-19 shows sample debug output for spanning-tree events.

Example 3-19: Spanning-Tree Events Debug on Cisco IOS–Based Catalyst Switches

Switch# debug spanning-tree events
Spanning Tree event debugging is on
Switch#
*Mar 5 21:23:14.994: STP: VLAN0013 sent Topology Change Notice on Gi0/3
*Mar 5 21:23:14.994: STP: VLAN0014 sent Topology Change Notice on Gi0/4
*Mar 5 21:23:14.994: STP: VLAN0051 sent Topology Change Notice on Po3
*Mar 5 21:23:14.994: STP: VLAN0052 sent Topology Change Notice on Po4
*Mar 5 21:23:15.982: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/1, changed state to down
*Mar 5 21:23:16.958: STP: VLAN0001 Topology Change rcvd on Po1
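
Putting the two commands together, a capture session might look like the following sketch. The buffer size of 64000 bytes and the debugging severity level are assumptions; choose values that suit the memory available on the switch.

```
configure terminal
 logging buffered 64000 debugging
 end
debug spanning-tree events
! ...wait for the problem to recur...
show logging
undebug all
```

Turning debugging off with undebug all as soon as the data is captured limits the extra load on the CPU.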


Note

When troubleshooting an IP subnet that spans multiple switches, it might be efficient to check the syslog server to look at the logged messages from all the switches together. However, if network connectivity to the syslog server was lost, some messages might not be available.

Check Port Status

Investigate the blocking ports first and then the other ports. The following are several guidelines for troubleshooting port status:

  • Blocked ports: Check to make sure the switch reports receiving BPDUs periodically on root and blocked ports. Issue the following command on Cisco IOS–based Catalyst switches to display the number of BPDUs received on each interface:

    show spanning-tree vlan vlan-id detail

    Issue the command multiple times to determine whether the device is receiving BPDUs.

  • Duplex mismatch: To look for a duplex mismatch, check each side of a point-to-point link. Use the show interface command to check the speed and duplex status of the specified ports.

  • Port utilization: An overloaded interface may fail to transmit vital BPDUs and is also an indication of a possible bridging loop. Use the show interface command to determine interface utilization from the load of the interface and the packet input and output rates.

  • Frame corruption: Look for increases in the input error fields of the show interface command.
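
The guidelines above can be sketched as a short command sequence. VLAN 10 and the interface name are hypothetical, and the output filter in the first command is an assumption about the output format, so drop it if it hides the lines you need.

```
! Run twice, a few seconds apart, and compare the received BPDU counters
show spanning-tree vlan 10 detail | include Port|BPDU
!
! Speed and duplex of every port in one table
show interfaces status
!
! Load, input and output rates, and input error counters for one port
show interfaces GigabitEthernet0/1
```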

Look for Resource Errors

High CPU utilization can lead to network instability for switches running STP. Use the show processes cpu command to check whether the CPU utilization is approaching 100 percent. Cisco Catalyst switches prioritize control packets such as BPDUs over any lower-priority traffic; hence, a switch can remain stable at higher CPU utilization if it is just processing low-priority traffic. As a general rule of thumb, if CPU utilization exceeds 70 percent, take action to rectify the problem or consider re-architecting the network to prevent future problems.
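
A few commands that can help with this check are sketched below. Availability of the sorted and history keywords is assumed and may vary by platform and software release.

```
! Busiest processes first
show processes cpu sorted
!
! CPU utilization over the last minutes and hours
show processes cpu history
!
! Number of STP instances and ports in each state
show spanning-tree summary totals
```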

Disable Unneeded Features

Disabling as many features as possible reduces troubleshooting complexity. EtherChannel, for example, is a feature that bundles several different links into a single logical port. It might be helpful to disable this feature while troubleshooting. In general, simplifying the network configuration reduces the troubleshooting effort. If configuration changes are made during the troubleshooting effort, note the changes. An alternative is to save the configuration beforehand by keeping a copy of it in bootflash or on a TFTP server. After the root cause is found and fixed, the removed configurations can be easily reapplied.
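
Saving a copy before removing features might look like the following sketch. The file names and the TFTP server address 192.0.2.10 are hypothetical, and some platforms name the local file system flash: rather than bootflash:.

```
copy running-config bootflash:pre-stp-troubleshooting.cfg
copy running-config tftp://192.0.2.10/sw1-pre-stp-troubleshooting.cfg
```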

Document Findings

When the STP issue is isolated and resolved, it is important to document the lessons learned from the incident as part of improving the plan for future issues. Failing to document configuration or network design changes to the previous plan might make troubleshooting more difficult during the next STP issue. Documentation of the network is critical for the uptime of the business, and a significant number of outages can be prevented by planning ahead. In some cases, some links can be disabled to break the loop without impacting the business during business hours, and troubleshooting can be performed after hours. Without clear documentation, an outage can end up affecting all critical functions, so network administrators need the proper documentation to reduce the time it takes to stabilize the network. Documentation includes the IP addresses of all the devices, passwords, the root and the secondary root, and the proper configuration of all switch-to-switch or switch-to-router links. A network topology diagram with port number information also helps to determine quickly how the problem manifests itself in the network. Having a known good configuration is also essential to recover the network quickly.


Summary

The Spanning Tree Protocol is a fundamental protocol to prevent Layer 2 loops and at the same time provide redundancy in the network. This chapter covered the basic operation and configuration of RSTP and MST. Enhancements now enable STP to converge more quickly and run more efficiently.

  • RSTP provides faster convergence than 802.1D when topology changes occur.

  • RSTP enables several additional port roles to increase the overall mechanism’s efficiency.

  • show spanning-tree is the main family of commands used to verify RSTP operations.

  • MST reduces the encumbrance of PVRST+ by allowing a single instance of spanning tree to run for multiple VLANs.

The Cisco STP enhancements provide robustness and resiliency to the protocol and add availability to the multilayer switched network. These enhancements not only isolate bridging loops but also prevent bridging loops from occurring.

To protect STP operations, several features are available that control the way BPDUs are sent and received:

  • BPDU guard protects the operation of STP on PortFast-configured ports.

  • BPDU filter is a variant that prevents BPDUs from being sent and received while leaving the port in forwarding state.

  • Root guard prevents a new root switch from being elected via BPDUs received on a root-guard-configured port.

  • Loop guard detects and disables an interface with Layer 2 unidirectional connectivity, protecting the network from anomalous STP conditions.

  • UDLD detects and disables an interface with unidirectional connectivity, protecting the network from anomalous STP conditions.

  • In most implementations, the STP toolkit should be used in combination with additional features, such as Flex Links.

Spanning Tree Protocol troubleshooting depends on careful planning and documentation before a problem occurs and on following a set of logical troubleshooting steps to identify and correct the problem. The troubleshooting exercise is complete only when the findings are documented and the appropriate changes are made to the planning document.