High-Availability Considerations
In the campus, high availability is concerned with minimizing link and node failures and optimizing recovery times to minimize convergence time and downtime.
Implement Optimal Redundancy
The recommended design is redundant distribution layer switches and redundant connections to the core with a Layer 3 link between the distribution switches. Access switches should have redundant connections to redundant distribution switches, as illustrated in Figure 2-5.
As a recommended practice, the core and distribution layers are built with redundant switches and fully meshed links to provide maximum redundancy and optimal convergence. The network bandwidth and capacity are engineered to withstand a switch or link failure, with the network converging around most events in 120 to 200 ms. Open Shortest Path First (OSPF) and Enhanced Interior Gateway Routing Protocol (EIGRP) timer manipulation attempts to quickly redirect the flow of traffic away from a router that has experienced a failure toward an alternate path.
In a fully redundant topology with tuned IGP timers, adding redundant supervisors with Cisco NSF and SSO may cause longer convergence times than single supervisors with tuned IGP timers. NSF attempts to maintain the flow of traffic through a router that has experienced a failure. NSF with SSO is designed to maintain a link-up, Layer 3-up state during a routing convergence event. However, because the IGP timers and the NSF timers interact, the tuned IGP timers can cause NSF-aware neighbors to reset the neighbor relationships.
Note | Combining OSPF and EIGRP timer manipulation with Cisco NSF might not be the most common deployment environment. OSPF and EIGRP timer manipulation is designed to improve convergence time in a multiaccess network (where several IGP routing peers share a common broadcast media, such as Ethernet). The primary deployment scenario for Cisco NSF with SSO is in the enterprise network edge. Here, the data link layer generally consists of point-to-point links either to service providers or redundant Gigabit Ethernet point-to-point links to the campus infrastructure. |
In nonredundant topologies, using Cisco NSF with SSO and redundant supervisors can provide significant resiliency improvements.
Provide Alternate Paths
The recommended distribution layer design is redundant distribution layer switches and redundant connections to the core with a Layer 3 link between the distribution switches, as illustrated in Figure 2-6.
Although dual distribution switches connected individually to separate core switches will reduce peer relationships and port counts in the core layer, this design does not provide sufficient redundancy. In the event of a link or core switch failure, traffic will be dropped.
An additional link providing an alternate path to a second core switch from each distribution switch offers redundancy to support a single link or node failure. A link between the two distribution switches is needed to support summarization of routing information from the distribution layer to the core.
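The difference between the two designs can be checked with a small reachability model. The following is a minimal sketch, not a design tool: the node names (dist1, core1, upstream, and so on) are hypothetical, and the single-homed model is assumed to have no link between the distribution switches. It tests whether traffic from one distribution switch still reaches a destination behind the core after any single link or core switch failure.

```python
from collections import deque

def reachable(links, src, dst, failed_nodes=frozenset(), failed_links=frozenset()):
    """Breadth-first search over undirected links, skipping failed elements."""
    if src in failed_nodes or dst in failed_nodes:
        return False
    adj = {}
    for a, b in links:
        if a in failed_nodes or b in failed_nodes:
            continue
        if frozenset((a, b)) in failed_links:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def survives_any_single_failure(links, src, dst, nodes_to_fail):
    """True if src still reaches dst after every single link or listed node failure."""
    for link in links:
        if not reachable(links, src, dst, failed_links={frozenset(link)}):
            return False
    for node in nodes_to_fail:
        if not reachable(links, src, dst, failed_nodes={node}):
            return False
    return True

# Hypothetical topologies; "upstream" stands for any destination behind the core.
core = [("core1", "core2"), ("core1", "upstream"), ("core2", "upstream")]
single_homed = core + [("dist1", "core1"), ("dist2", "core2")]
dual_homed = single_homed + [("dist1", "core2"), ("dist2", "core1"), ("dist1", "dist2")]

print(survives_any_single_failure(single_homed, "dist1", "upstream", ["core1", "core2"]))
print(survives_any_single_failure(dual_homed, "dist1", "upstream", ["core1", "core2"]))
```

The model covers only physical reachability. It says nothing about routing summarization, which is the reason the text gives for the link between the distribution switches.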
Avoid Single Points of Failure
Cisco NSF with SSO and redundant supervisors has the most impact in the campus in the access layer. An access switch failure is a single point of failure that causes an outage for the end devices connected to it. You can reduce the outage to one to three seconds in the access layer, as shown in Figure 2-7, by using SSO in a Layer 2 environment or Cisco NSF with SSO in a Layer 3 environment.
Note | The SSO feature is available on the Catalyst 4500 and 6500/7600 switches. |
Cisco NSF with SSO
Cisco NSF with SSO is a supervisor redundancy mechanism in Cisco IOS Software that allows extremely fast supervisor switchover at Layers 2 to 4.
SSO allows the standby route processor (RP) to take control of the device after a hardware or software fault on the active RP. SSO synchronizes the startup configuration, startup variables, and running configuration, as well as dynamic runtime data, including Layer 2 protocol states for trunks and ports; hardware Layer 2 and Layer 3 tables (MAC, Forwarding Information Base [FIB], and adjacency tables); and access control list (ACL) and QoS tables.
Cisco NSF is a Layer 3 function that works with SSO to minimize the amount of time a network is unavailable to its users following a switchover. The main objective of Cisco NSF is to continue forwarding IP packets following an RP switchover. Cisco NSF is supported by the EIGRP, OSPF, Intermediate System-to-Intermediate System (IS-IS), and Border Gateway Protocol (BGP) for routing. A router running these protocols can detect an internal switchover and take the necessary actions to continue forwarding network traffic using Cisco Express Forwarding while recovering route information from the peer devices. With Cisco NSF, peer networking devices continue to forward packets while route convergence completes and do not experience routing flaps.
Routing Protocol Requirements for Cisco NSF
Usually, when a router restarts, all its routing peers detect that routing adjacency went down and then came back up. This transition is called a routing flap, and the protocol state is not maintained. Routing flaps create routing instabilities, which are detrimental to overall network performance. Cisco NSF helps to suppress routing flaps.
Cisco NSF allows for the continued forwarding of data packets along known routes while the routing protocol information is being restored following a switchover. With Cisco NSF, peer Cisco NSF devices do not experience routing flaps because the interfaces remain up during a switchover and adjacencies are not reset. Data traffic is forwarded while the standby RP assumes control from the failed active RP during a switchover. User sessions established before the switchover are maintained.
The ability of the intelligent line cards to remain up through a switchover and to be kept current with the FIB on the active RP is crucial to Cisco NSF operation. While the control plane builds a new routing protocol database and restarts peering agreements, the data plane relies on pre-switchover forwarding-table synchronization to continue forwarding traffic. After the routing protocols have converged, Cisco Express Forwarding updates the FIB table and removes stale route entries, and then it updates the line cards with the refreshed FIB information.
Note | Transient routing loops or black holes may be introduced if the network topology changes before the FIB is updated. |
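The stale-entry handling described above can be pictured with a toy model. This is a minimal sketch under stated assumptions, not Cisco Express Forwarding itself: the FIB is a plain dictionary from prefix to next hop, and the function name and the prefixes are hypothetical.

```python
def refresh_fib_after_switchover(pre_switchover_fib, converged_routes):
    """Return the FIB that remains once routing has reconverged.

    pre_switchover_fib: prefix -> next hop, as synchronized before the switchover.
    converged_routes:   prefix -> next hop, as relearned from the peers afterward.
    """
    # On switchover, every synchronized entry keeps forwarding but is flagged stale.
    table = {
        prefix: {"next_hop": next_hop, "stale": True}
        for prefix, next_hop in pre_switchover_fib.items()
    }
    # After convergence, relearned routes refresh or add entries.
    for prefix, next_hop in converged_routes.items():
        table[prefix] = {"next_hop": next_hop, "stale": False}
    # Entries that were never refreshed are removed before the line cards are updated.
    return {p: e["next_hop"] for p, e in table.items() if not e["stale"]}

old_fib = {"10.1.0.0/16": "core1", "10.2.0.0/16": "core2"}
relearned = {"10.1.0.0/16": "core1", "10.3.0.0/16": "core2"}
print(refresh_fib_after_switchover(old_fib, relearned))
```

In this model 10.2.0.0/16 keeps forwarding on stale information until the refresh and is then dropped, which is the window in which the transient loops or black holes mentioned in the note can occur.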
The switchover must be completed before the Cisco NSF dead and hold timers expire; otherwise, the peers will reset the adjacency and reroute the traffic.
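The timing condition above reduces to a simple comparison per neighbor. The sketch below is illustrative only: the neighbor names and timer values are hypothetical, and it assumes each neighbor's dead or hold timer starts counting at the moment of the switchover.

```python
def neighbors_that_reset(switchover_seconds, neighbor_timers):
    """Return the neighbors whose dead/hold timer expires before the switchover completes.

    neighbor_timers: neighbor name -> dead or hold timer in seconds.
    """
    return sorted(
        name
        for name, timer in neighbor_timers.items()
        if timer <= switchover_seconds
    )

# Hypothetical values: an aggressively tuned peer and a peer with a longer timer.
timers = {"tuned-peer": 1.0, "relaxed-peer": 15.0}
print(neighbors_that_reset(2.0, timers))
```

With a two-second switchover, only the peer with the one-second timer resets its adjacency, which is the interaction between tuned IGP timers and Cisco NSF that this chapter warns about.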
Cisco NSF protocol enhancements enable a Cisco NSF-capable router to signal neighboring Cisco NSF-aware devices during switchover.
A Cisco NSF-aware neighbor is needed so that Cisco NSF-capable systems can rebuild their databases and maintain their neighbor adjacencies across a switchover.
Following a switchover, the Cisco NSF-capable device requests that the Cisco NSF-aware neighbor devices send state information to help rebuild the routing tables as a Cisco NSF reset.
The Cisco NSF protocol enhancements allow a Cisco NSF-capable router to signal neighboring Cisco NSF-aware devices. The signal asks that the neighbor relationship not be reset. As the Cisco NSF-capable router receives and communicates with other routers on the network, it can begin to rebuild its neighbor list. After neighbor relationships are reestablished, the Cisco NSF-capable router begins to resynchronize its database with all of its Cisco NSF-aware neighbors.
Based on platform and Cisco IOS Software release, Cisco NSF with SSO support is available for many routing protocols:
-
EIGRP
-
OSPF
-
BGP
-
IS-IS
Cisco IOS Software Modularity Architecture
The Cisco Catalyst 6500 series with Cisco IOS Software Modularity supports high availability in the enterprise. Figure 2-8 illustrates the key elements and components of the Cisco Software Modularity Architecture.
When Cisco IOS Software patches are needed on systems without Cisco IOS Software Modularity, the new image must be loaded on the active and redundant supervisors, and the supervisor must be reloaded or the switchover to the standby completed to load the patch.
The control plane functions (that manage routing protocol updates and management traffic) on the Catalyst 6500 series run on dedicated CPUs on the multilayer switch forwarding card complex (MSFC). A completely separate data plane is responsible for traffic forwarding. When the hardware is programmed for nonstop operation, the data plane continues forwarding traffic even if there is a disruption in the control plane. The Catalyst 6500 series switches benefit from the more resilient control plane offered by Cisco IOS Software Modularity.
Note | Catalyst switch forwarding fabrics are broken down into three planes or functional areas, as follows:
|
The Cisco Catalyst 6500 series with Cisco IOS Software Modularity enables several Cisco IOS control plane subsystems to run in independent processes. Cisco IOS Software Modularity boosts operational efficiency and minimizes downtime:
-
It minimizes unplanned downtime through fault containment and stateful process restarts, raising the availability of converged applications.
-
It simplifies software changes through subsystem in-service software upgrades (ISSU), significantly reducing code certification and deployment times and decreasing business risks.
-
It enables process-level, automated policy control by integrating Cisco IOS Embedded Event Manager (EEM), offloading time-consuming tasks to the network and accelerating the resolution of network issues. EEM is a combination of processes designed to monitor key system parameters such as CPU utilization, interface counters, Simple Network Management Protocol (SNMP), and syslog events. It acts on specific events or threshold counters that are exceeded.
Note | Embedded Event Manager is discussed in more detail in Chapter 12, “Network Management Capabilities with Cisco IOS Software.” |
Example—Software Modularity Benefits
Cisco IOS Software Modularity on the Cisco Catalyst 6500 series provides these benefits:
-
Operational consistency: Cisco IOS Software Modularity does not change the operational point of view. Command-line interfaces (CLI) and management interfaces such as SNMP or syslog are the same as before. New commands to EXEC and configuration mode and new show commands have been added to support the new functionality.
-
Protected memory: Cisco IOS Software Modularity enables a memory architecture where processes make use of a protected address space. Each process and its associated subsystems live in an individual memory space. Using this model, memory corruption across process boundaries becomes nearly impossible.
-
Fault containment: The benefit of protected memory space is increased availability because problems occurring in one process cannot affect other parts of the system. For example, if a less-critical system process fails or is not operating as expected, critical functions required to maintain packet forwarding are not affected.
-
Process restartability: Building on the protected memory space and fault containment, the modular processes are now individually restartable. For test purposes or nonresponding processes, the process restart process-name command is provided to manually restart processes. Restarting a process allows fast recovery from transient errors without the need to disrupt forwarding. Integrated high-availability infrastructure constantly checks the state of processes and keeps track of how many times a process restarted in a defined time interval. If a process restart does not restore the system, the high-availability infrastructure will take more drastic actions, such as initiating a supervisor engine switchover or a system restart.
Note | Although a process restart can be initiated by the user, it should be done with caution. |
-
Modularized processes: Several control plane functions have been modularized to cover the most commonly used features. Examples of modular processes include but are not limited to these:
-
Subsystem ISSU: Cisco IOS Software Modularity allows selective system maintenance during runtime through individual patches. By providing versioning and patch-management capabilities, Cisco IOS Software Modularity allows patches to be downloaded, verified, installed, and activated without the need to restart the system. Because data plane packet forwarding is not affected during the patch process, the network operator now has the flexibility to introduce software changes at any time through ISSU. A patch affects only the software components associated with the update.
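The restart tracking described under process restartability can be sketched as a sliding-window counter. This is a simplified illustration, not the Cisco high-availability infrastructure: the class name, the thresholds, and the action labels are all hypothetical, and the escalation rule (too many restarts inside the window) is an assumption about what "does not restore the system" means in practice.

```python
from collections import deque

class RestartMonitor:
    """Count process restarts in a time window and escalate when there are too many."""

    def __init__(self, max_restarts, window_seconds):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._restart_times = deque()

    def record_failure(self, now):
        """Record a process failure at time `now` (seconds) and return the action."""
        # Forget restarts that have aged out of the window.
        while self._restart_times and now - self._restart_times[0] >= self.window_seconds:
            self._restart_times.popleft()
        # Restarting has not helped: take a more drastic action instead.
        if len(self._restart_times) >= self.max_restarts:
            return "switchover"
        self._restart_times.append(now)
        return "restart"

monitor = RestartMonitor(max_restarts=3, window_seconds=60)
for t in (0, 10, 20, 30, 100):
    print(t, monitor.record_failure(t))
```

The first three failures are answered with a restart, the fourth within the same minute escalates, and a failure long after the window has passed is treated as a fresh transient error.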
Designing an Optimum Design for Layer 2
Layer 2 architectures rely on the following technologies to create a highly available, deterministic topology: Spanning Tree Protocol (STP), trunking (ISL/802.1Q), Unidirectional Link Detection (UDLD), and EtherChannel.
The following section reviews design models and recommended practices for Layer 2 high availability and optimum convergence of the Cisco Enterprise Campus Infrastructure.
Recommended Practices for Spanning-Tree Configuration
For the most deterministic and highly available network topology, the requirement to support STP convergence should be avoided by design. Even so, you may need to implement STP for several reasons:
-
When a VLAN spans access layer switches to support business applications.
-
To protect against user-side loops. Even if the recommended design does not depend on STP to resolve link or node failure events, STP is required to protect against user-side loops. There are many ways that a loop can be introduced on the user-facing access layer ports. Wiring mistakes, misconfigured end stations, or malicious users can create a loop. STP is required to ensure a loop-free topology and to protect the rest of the network from problems created in the access layer.
-
To support data center applications on a server farm.
Note | Some security personnel have recommended disabling STP at the network edge. This practice is not recommended because the risk of lost connectivity without STP is far greater than any STP information that might be revealed. |
If you need to implement STP, use Rapid Per-VLAN Spanning-Tree Plus (RPVST+). You should also take advantage of the Cisco enhancements to STP known as the Cisco STP toolkit.
Cisco STP Toolkit
The Cisco enhancements to STP include the following. (Note that the enhancements marked with an * are also supported with Rapid Per-VLAN Spanning-Tree Plus [RPVST+].)
-
PortFast*: Causes a Layer 2 LAN interface configured as an access port to enter the forwarding state immediately, bypassing the listening and learning states. Use PortFast only when connecting a single end station to a Layer 2 access port.
-
UplinkFast: Provides from three to five seconds convergence after a direct link failure and achieves load balancing between redundant Layer 2 links using uplink groups.
-
BackboneFast: Cuts convergence time by max_age for indirect failure. BackboneFast is initiated when a root port or blocked port on a network device receives inferior bridge protocol data units (BPDU) from its designated bridge.
-
Loop guard*: Prevents an alternate or root port from becoming designated in the absence of BPDUs. Loop guard helps prevent bridging loops that could occur because of a unidirectional link failure on a point-to-point link.
-
Root guard*: Secures the root on a specific switch by preventing external switches from becoming roots.
-
BPDU guard*: When configured on a PortFast-enabled port, BPDU guard shuts down the port that receives a BPDU.
-
Unidirectional Link Detection (UDLD): UDLD monitors the physical configuration of fiber-optic and copper connections and detects when a one-way connection exists. When a unidirectional link is detected, the interface is shut down and the system alerted.
Note | The STP toolkit also supports the BPDU filter option, which prevents PortFast-enabled ports from sending or receiving BPDUs. This feature effectively disables STP at the edge and can lead to STP loops. It is not recommended. |
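How PortFast and BPDU guard combine on an edge port can be shown with a small state model. This is a minimal sketch, not switch software: the class and the state labels are chosen for illustration, and the behavior of a port that receives a BPDU without BPDU guard is deliberately left unmodeled.

```python
class AccessPort:
    """Toy model of an access port with optional PortFast and BPDU guard."""

    def __init__(self, portfast=False, bpdu_guard=False):
        self.portfast = portfast
        self.bpdu_guard = bpdu_guard
        self.state = "down"

    def link_up(self):
        # PortFast bypasses listening and learning and forwards immediately.
        self.state = "forwarding" if self.portfast else "listening"
        return self.state

    def receive_bpdu(self):
        # BPDU guard on a PortFast-enabled port shuts the port down.
        if self.portfast and self.bpdu_guard:
            self.state = "err-disabled"
        # Without BPDU guard the state is left unchanged in this sketch.
        return self.state

edge = AccessPort(portfast=True, bpdu_guard=True)
print(edge.link_up())
print(edge.receive_bpdu())
```

An end station never sends BPDUs, so the shutdown only triggers when something that speaks STP, such as an unauthorized switch, is attached to the edge port.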
STP Standards and Features
STP enables the network to deterministically block interfaces and provide a loop-free topology in a network with redundant links. There are several varieties of STP:
-
STP is the original IEEE 802.1D version (802.1D-1998) that provides a loop-free topology in a network with redundant links.
-
Common Spanning Tree (CST) assumes one spanning-tree instance for the entire bridged network, regardless of the number of VLANs.
-
Per VLAN Spanning Tree Plus (PVST+) is a Cisco enhancement of STP that provides a separate 802.1D spanning tree instance for each VLAN configured in the network. The separate instance supports PortFast, UplinkFast, BackboneFast, BPDU guard, BPDU filter, root guard, and loop guard.
-
The 802.1D-2004 version is an updated version of the STP standard.
-
Multiple Spanning Tree (MST) is an IEEE standard inspired by the earlier Cisco proprietary Multi-Instance Spanning Tree Protocol (MISTP) implementation. MST maps multiple VLANs into the same spanning-tree instance. The Cisco implementation of MSTP is MST, which provides up to 16 instances of Rapid Spanning Tree Protocol (RSTP, 802.1w) and combines many VLANs with the same physical and logical topology into a common RSTP instance. Each instance supports PortFast, UplinkFast, BackboneFast, BPDU guard, BPDU filter, root guard, and loop guard.
-
RSTP, or IEEE 802.1w, is an evolution of STP that provides faster convergence of STP.
-
Rapid PVST+ (RPVST+) is a Cisco enhancement of RSTP that uses PVST+. It provides a separate instance of 802.1w per VLAN. The separate instance supports PortFast, UplinkFast, BackboneFast, BPDU guard, BPDU filter, root guard, and loop guard.
Note | When Cisco documentation and this course refer to implementing RSTP, they are referring to the Cisco RSTP implementation, or RPVST+ (also written PVRST+). |
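The practical difference between these varieties is how many spanning-tree instances a switch runs for a given set of VLANs. The following sketch counts them under stated assumptions: the function name, mode labels, and VLAN-to-instance map are hypothetical, unmapped VLANs are assumed to fall into instance 0, and the limit of 16 instances is taken from the MST description above.

```python
def spanning_tree_instances(vlans, mode, mst_map=None):
    """Return how many spanning-tree instances run for the given VLANs."""
    vlans = set(vlans)
    if mode == "cst":
        # One instance for the whole bridged network.
        return 1
    if mode in ("pvst+", "rpvst+"):
        # One instance per VLAN.
        return len(vlans)
    if mode == "mst":
        # Many VLANs share an instance; unmapped VLANs use instance 0 (assumption).
        mst_map = mst_map or {}
        instances = {mst_map.get(vlan, 0) for vlan in vlans}
        if len(instances) > 16:
            raise ValueError("more than 16 MST instances requested")
        return len(instances)
    raise ValueError(f"unknown mode: {mode}")

vlans = range(10, 110)  # 100 hypothetical VLANs
mst_map = {vlan: 1 if vlan < 60 else 2 for vlan in vlans}
print(spanning_tree_instances(vlans, "rpvst+"))
print(spanning_tree_instances(vlans, "mst", mst_map))
```

For the same 100 VLANs, the per-VLAN modes run 100 instances while the MST mapping runs two, which is the scaling argument for MST in large Layer 2 domains.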
To configure a VLAN instance to become the root bridge, enter the spanning-tree vlan vlan_ID root primary command to modify the bridge priority from the default value (32768) to a significantly lower value. The bridge priority for the specified VLAN is set to 8192 if this value will cause the switch to become the root for the VLAN. If any bridge for the VLAN has a priority lower than 8192, the switch sets the priority to one less than the lowest bridge priority.
Manually placing the primary and secondary bridges along with enabling STP toolkit options enables you to support a deterministic configuration where you know which ports should be forwarding and which ports should be blocking.
Note | Defining the root bridge under MST is done using the spanning-tree mst instance_id root primary command. When you use this command, the switch reviews all bridge ID values it receives from other root bridges. If any root bridge has a bridge ID equal to or less than 24576, the switch sets its own bridge priority to 4096 less than the lowest bridge priority. To ensure that it will retain its position as the root bridge, you must also enable root guard. |
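The two priority rules described above, one for the per-VLAN command and one for the MST command, can be written out as arithmetic. This sketch only restates the rules as this section gives them; the function names are hypothetical, and raising an error when no lower priority exists is an assumption made for the sketch.

```python
def _lower_than(lowest_seen, step):
    """Return a priority `step` below the lowest one seen, if that is possible."""
    candidate = lowest_seen - step
    if candidate < 0:
        # Assumption for this sketch: refuse rather than pick an invalid priority.
        raise ValueError("no lower bridge priority is available")
    return candidate

def vlan_root_primary_priority(lowest_seen):
    """Per-VLAN rule: 8192, or one less than the lowest priority if that is below 8192."""
    if lowest_seen < 8192:
        return _lower_than(lowest_seen, 1)
    return 8192

def mst_root_primary_priority(lowest_seen):
    """MST rule: 24576, or 4096 below the lowest priority if that is at or below 24576."""
    if lowest_seen <= 24576:
        return _lower_than(lowest_seen, 4096)
    return 24576

print(vlan_root_primary_priority(32768))  # all other bridges at the default
print(vlan_root_primary_priority(4096))   # another bridge already below 8192
print(mst_root_primary_priority(24576))   # another MST root at 24576
```

Either rule only reacts to the priorities seen when the command is entered. A bridge added later with a lower priority can still take over as root, which is why the note pairs the command with root guard.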
Figure 2-9 illustrates recommended placements for STP toolkit features:
-
Loop guard is implemented on the Layer 2 ports between distribution switches, and on the uplink ports from the access switches to the distribution switches.
-
Root guard is configured on the distribution switch ports facing the access switches.
-
UplinkFast is implemented on the uplink ports from the access switches to the distribution switches.
-
BPDU guard or root guard is configured on ports from the access switches to the end devices, as is PortFast.
-
The UDLD protocol allows devices to monitor the physical configuration of the cables and detect when a unidirectional link exists. When a unidirectional link is detected, UDLD shuts down the affected LAN port. UDLD is often configured on ports linking switches.
-
Depending on the security requirements of an organization, the port security feature can be used to restrict a port’s ingress traffic by limiting the MAC addresses allowed to send traffic into the port.
Note | When you are configuring MST, UplinkFast is not required as a feature on dual-homed switches. Rapid root port failover occurs as part of the default MST protocol implementation. |