Friday, June 17, 2011

Chapter 2: Enterprise Campus Network Design (Part06)

Supporting a Layer 2 to Layer 3 Boundary Design

The following section reviews design models and recommended practices for supporting the Layer 2 to Layer 3 boundary in highly available enterprise campus networks.


Layer 2 to Layer 3 Boundary Design Models

There are several design models for placement of the Layer 2 to Layer 3 boundary in the enterprise campus.

Layer 2 Distribution Switch Interconnection

If the enterprise campus requirements must support VLANs spanning multiple access layer switches, the design model uses a Layer 2 link for interconnecting the distribution switches.

The design, illustrated here in Figure 2-22, is more complex than the Layer 3 interconnection of the distribution switches. The STP convergence process will be initiated for uplink failures and recoveries.

Figure 2-22: Layer 2 Distribution Switch Interconnection

You can improve this suboptimal design as follows:

  • Use RSTP as the version of STP.


    Note

    RPVST+ is a Cisco enhancement of RSTP that uses PVST+. It provides a separate instance of 802.1w per VLAN. The separate instance supports PortFast, UplinkFast, BackboneFast, BPDU guard, BPDU filter, root guard, and loop guard. (RPVST+ is also known as PVRST+.)

  • Provide a Layer 2 trunk between the two distribution switches to avoid unexpected traffic paths and multiple convergence events.

  • If you choose to load balance VLANs across uplinks, be sure to place the HSRP primary and the STP root for each VLAN on the same distribution layer switch. Collocating the HSRP primary and the RSTP root avoids using the interdistribution link for transit.
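
As a minimal sketch of the last recommendation, the following fragment makes one distribution switch both the RPVST+ root and the HSRP active peer for a single VLAN. The VLAN number, HSRP group number, priority value, and IP addresses are illustrative assumptions, not values from the text.

```
! Distribution A (hypothetical values: VLAN 10, subnet 10.1.10.0/24)
spanning-tree mode rapid-pvst
spanning-tree vlan 10 root primary
!
! 10.1.10.1 is the virtual gateway address shared with Distribution B.
! Priority 110 is above the HSRP default of 100, so this peer goes active.
interface Vlan10
 ip address 10.1.10.2 255.255.255.0
 standby 1 ip 10.1.10.1
 standby 1 priority 110
 standby 1 preempt
```

Distribution B would mirror this with `spanning-tree vlan 10 root secondary` and the default HSRP priority. When VLANs are load balanced across uplinks, the two roles are swapped together for the other half of the VLANs.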

Layer 3 Distribution Switch Interconnection (HSRP)

Figure 2-23 shows the model that supports a Layer 3 interconnection between distribution switches using HSRP as the FHRP.

Figure 2-23: Layer 3 Distribution Switch Interconnection

In this time-proven topology, no VLANs span between access layer switches across the distribution switches. A subnet equals a VLAN, which equals an access switch. The root for each VLAN is aligned with the active HSRP instance. From an STP perspective, both access layer uplinks are forwarding, so the only convergence dependencies are the default gateway and return-path route selection across the distribution-to-distribution link.

This recommended design provides the highest availability.

With this design, a distribution-to-distribution link is required for route summarization. A recommended practice is to map the Layer 2 VLAN number to the Layer 3 subnet for ease of use and management.
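
A brief sketch of these two practices on one distribution switch: a routed point-to-point link toward the peer distribution switch, and a VLAN whose number is mirrored in the third octet of its subnet. The interface names and addresses are assumptions chosen for illustration.

```
! Routed link to the other distribution switch (hypothetical /30 subnet)
interface TenGigabitEthernet4/1
 no switchport
 ip address 10.1.0.1 255.255.255.252
!
! VLAN 120 maps to subnet 10.1.120.0/24
interface Vlan120
 ip address 10.1.120.2 255.255.255.0
```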

Layer 3 Distribution Switch Interconnection (GLBP)

GLBP can also be used as the FHRP with the Layer 3 distribution layer interconnection model, as shown in Figure 2-24.

Figure 2-24: Layer 3 Distribution Switch Interconnection with GLBP

GLBP allows full utilization of the uplinks from the access layer. However, because the distribution of ARP responses is random, it is less deterministic than the design with HSRP. The distribution-to-distribution link is still required for route summarization. Because the VLANs do not span access switches, STP convergence is not required for uplink failure and recovery.
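
For comparison with HSRP, a GLBP gateway on a distribution switch could be sketched as follows. The VLAN, group number, addresses, and priority are assumptions, and the load-balancing method shown is simply one of the selectable options, not a recommendation from the text.

```
! Distribution A, GLBP gateway for a hypothetical VLAN 20
interface Vlan20
 ip address 10.1.20.2 255.255.255.0
 glbp 1 ip 10.1.20.1
 glbp 1 priority 110
 glbp 1 preempt
 glbp 1 load-balancing round-robin
```

Hosts that send ARP requests for 10.1.20.1 receive different virtual MAC addresses in reply, which is what spreads their traffic across both uplinks.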

Layer 3 Access to Distribution Interconnection

The design extending Layer 3 to the access layer, shown here in Figure 2-25, provides the fastest network convergence.

Figure 2-25: Layer 3 Access to Distribution Interconnection

A routing protocol such as EIGRP, when properly tuned, can achieve better convergence results than designs that rely on STP to resolve convergence events. A routing protocol can even achieve better convergence results than the time-tested design placing the Layer 2 to Layer 3 boundary at the distribution layer. The design is easier to implement than configuring Layer 2 in the distribution layer because you do not need to align STP with HSRP or GLBP.

This design supports equal-cost Layer 3 load balancing on all links between the network switches. No HSRP or GLBP configuration is needed because the access switch is the default gateway for the end users. VLANs cannot span access switches in this design.
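
A minimal sketch of what this looks like on an access switch: both uplinks are routed point-to-point interfaces, and the local VLAN interface is the default gateway for attached hosts. All interface names, VLAN numbers, and addresses are assumptions, and IP routing is assumed to be enabled on the switch.

```
! Access switch with routed uplinks (hypothetical addressing)
interface GigabitEthernet1/1
 description Uplink to Distribution A
 no switchport
 ip address 10.120.0.2 255.255.255.252
!
interface GigabitEthernet1/2
 description Uplink to Distribution B
 no switchport
 ip address 10.120.0.6 255.255.255.252
!
! Local VLAN only; this address is the hosts' default gateway
interface Vlan20
 ip address 10.120.20.1 255.255.255.0
```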

The convergence time required to reroute around a failed access-to-distribution layer uplink is reliably under 200 ms, compared to 900 ms for the design placing the Layer 2 to Layer 3 boundary at the distribution layer. Return-path traffic also converges in under 200 ms for an EIGRP reroute, again compared to 900 ms for the traditional Layer 2 to Layer 3 distribution layer model.

Because both EIGRP and OSPF load share over equal-cost paths, this design provides a convergence benefit similar to GLBP. Approximately 50 percent of the hosts are not affected by a convergence event because their traffic is not flowing over the failed link or through the failed node.

However, this design alternative brings some additional complexity in uplink IP addressing and subnetting, as well as a loss of flexibility.

Routing in the access layer is not as widely deployed in the enterprise environment as the Layer 2 to Layer 3 distribution layer boundary model.


Note

Deploying a Layer 3 access layer may be ruled out by the need to conform with the existing architecture, the price of multilayer switches, or application or service requirements.

EIGRP Access Design Recommendations

When EIGRP is used as the routing protocol for a fully routed or routed access layer solution, it can achieve sub-200 ms convergence with tuning.

EIGRP to the distribution layer is similar to EIGRP in the branch, but it is optimized for fast convergence using these design rules:

  • Limit scope of queries to a single neighbor:

    Summarize at the distribution layer to the core as is done in the traditional Layer 2 to Layer 3 border at the distribution layer. This confines impact of an individual access link failure to the distribution pair by stopping EIGRP queries from propagating beyond the core of the network. When the distribution layer summarizes toward the core, queries are limited to one hop from the distribution switches, which optimizes EIGRP convergence.

    Configure all access switches as EIGRP stub nodes so that the access devices are not queried by the distribution switches for routes. EIGRP stub nodes cannot act as transit nodes and do not participate in EIGRP query processing. When the distribution node learns through the EIGRP hello packets that it is talking to a stub node, it does not flood queries to that node.

  • Control route propagation to access switches using distribution lists. The access switches need only a default route to the distribution switches. An outbound distribution list applied to all interfaces facing the access layer from the distribution switch will conserve memory and optimize performance at the access layer.

  • Set hello and dead (hold) timers to 1 and 3 seconds, respectively, as a secondary mechanism to speed up convergence. A link failure or node failure should be the primary trigger for convergence events. The tuned timers protect against a soft failure in which the physical links remain active but hello and route processing has stopped.

    EIGRP optimized configuration example:

    interface GigabitEthernet1/1
     ip hello-interval eigrp 100 1
     ip hold-time eigrp 100 3
    !
    router eigrp 100
     eigrp stub connected
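
The distribution-side counterpart of these rules might look like the following sketch. The summary prefix, the interface names, and the access list name Default are assumptions for illustration, and the sketch assumes the distribution switch already has a default route to advertise.

```
! Core-facing uplink: advertise one summary for the distribution block
interface TenGigabitEthernet4/1
 ip summary-address eigrp 100 10.120.0.0 255.255.0.0
!
! Access-facing interface: send only the default route to the access switch
router eigrp 100
 network 10.0.0.0
 distribute-list Default out GigabitEthernet3/3
!
ip access-list standard Default
 permit 0.0.0.0
```

The summary keeps queries from spreading past the core, and the outbound distribute list keeps the access switch routing table down to a default route.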

Note

EIGRP stub support is included in the base image of all Cisco multilayer Catalyst switches.

OSPF Access Design Recommendations

When OSPF is used as the routing protocol for a fully routed or routed access layer solution, it can also achieve sub-200 ms convergence with tuning.

OSPF to the distribution layer is similar to OSPF in the branch, but it is optimized for fast convergence. With OSPF, summarization and limits to the diameter of OSPF LSA propagation are provided through implementation of Layer 2 to Layer 3 boundaries or Area Border Routers (ABR). It follows these design rules:

  • Control the number of routes and routers in each area:

    • Configure each distribution block as a separate, totally stubby OSPF area. The distribution switches become ABRs with their core-facing interfaces in area 0, and the access layer interfaces in unique, totally stubby areas for each access layer switch. Do not extend area 0 to the access switch because the access layer is not used as a transit area in a campus environment. Each access layer switch is configured into its own unique, totally stubby area. In this configuration, LSAs are isolated to each access layer switch so that a link flap for one access layer switch is not communicated beyond the distribution pairs.

    • Tune OSPF millisecond hello, dead-interval, SPF, and LSA throttle timers as a secondary mechanism to improve convergence. Because CPU resources are not as scarce in a campus environment as they might be in a WAN environment, and the media types common in the access layer are not susceptible to the same half-up or rapid transitions as are those commonly found in the WAN, OSPF timers can safely be tuned, as shown in the configuration snippet here:

      interface GigabitEthernet1/1
       ip ospf dead-interval minimal hello-multiplier 4
      !
      router ospf 100
       area 120 stub no-summary
       timers throttle spf 10 100 5000
       timers throttle lsa all 10 100 5000
       timers lsa arrival 80
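
The no-summary keyword only has an effect on an ABR, so an access switch inside the same area needs just the stub flag. A sketch of such an access switch, assuming area 120 and a hypothetical 10.120.0.0/16 address block, could be:

```
! Access switch inside totally stubby area 120 (hypothetical addressing)
interface GigabitEthernet1/1
 ip ospf dead-interval minimal hello-multiplier 4
!
router ospf 100
 area 120 stub
 network 10.120.0.0 0.0.255.255 area 120
```

The dead interval must match on both ends of the link for the adjacency to form, so the fast-hello setting belongs on the distribution-side interface as well.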

Note

OSPF support is not included in the base image of all Cisco multilayer Catalyst switches, but it is available with the IP Services upgrade.



Chapter 2 - Enterprise Campus Network Design
Designing Cisco Network Service Architectures (ARCH) (Authorized Self-Study Guide), Second Edition
by Keith Hutton, Mark Schofield and Diane Teare
Cisco Press © 2009

Potential Design Issues

The following sections discuss potential design issues for placement of the Layer 2 to Layer 3 boundary in the enterprise campus.

Daisy Chaining Access Layer Switches

If multiple fixed-configuration switches are daisy chained in the access layer of the network, there is a danger that black holes will occur in the event of a link or node failure.

In the topology in Figure 2-26, before failures no links are blocking from an STP or RSTP perspective, so both uplinks are available to actively forward and receive traffic. Both distribution nodes can forward return-path traffic from the rest of the network toward the access layer for devices attached to all members of the stack or chain.

Figure 2-26: Daisy Chaining Access Layer Switches

Two scenarios can occur if a link or node in the middle of the chain or stack fails. In the first case, the standby HSRP peer can go active as it loses connectivity to its primary peer, forwarding traffic outbound for the devices that still have connectivity to it. The primary HSRP peer remains active and also forwards outbound traffic for its half of the stack. Although this is not optimum, it is not detrimental from the perspective of outbound traffic.

The second scenario is the issue. Return-path traffic has a 50 percent chance of arriving on a distribution switch that does not have physical connectivity to the half of the stack where the traffic is destined. The traffic that arrives on the wrong distribution switch is dropped.

The solution to this issue with this design is to provide alternate connectivity across the stack in the form of a loop-back cable running from the top to the bottom of the stack. This link needs to be carefully deployed so that the appropriate STP behavior will occur in the access layer.

An alternate design uses a Layer 2 link between the distribution switches.

Cisco StackWise Technology in the Access Layer

Cisco StackWise technology can eliminate the danger that black holes occur in the access layer in the event of a link or node failure. It can eliminate the need for loop-back cables in the access layer or Layer 2 links between distribution nodes.

StackWise technology, shown in the access layer in Figure 2-27, supports the recommended practice of using a Layer 3 connection between the distribution switches without having to use a loop-back cable or perform extra configuration.

Figure 2-27: StackWise Technology

The true stack creation provided by the Cisco Catalyst 3750 series switches makes using stacks in the access layer much less complex than chains or stacks of other models. A stack of 3750 switches appears as one node from the network topology perspective.

If you use a modular chassis switch to support ports in the aggregation layer, such as the Cisco Catalyst 4500 or Catalyst 6500 family of switches, these design considerations are not required.

Too Much Redundancy

Be aware that even if some redundancy is good, more redundancy is not necessarily better.

In Figure 2-28, a third switch is added to the distribution switches in the center. This extra switch adds unneeded complexity to the design and leads to these design questions:

  • Where should the root switch be placed? With this design, it is not easy to determine where the root switch is located.

  • What links should be in a blocking state? It is very hard to determine how many ports will be in a blocking state.

  • What are the implications of STP and RSTP convergence? The network convergence is definitely not deterministic.

  • When something goes wrong, how do you find the source of the problem? The design is much harder to troubleshoot.

Figure 2-28: Too Much Redundancy

Too Little Redundancy

For most designs, a link between the distribution layer switches is required for redundancy.

Figure 2-29 shows a less-than-optimal design where VLANs span multiple access layer switches. Without a Layer 2 link between the distribution switches, the design is a looped figure-eight topology. One access layer uplink will be blocking. HSRP hellos are exchanged by transiting the access switches.

Figure 2-29: Too Little Redundancy

Initially, traffic is forwarded from both access switches to the Distribution A switch that supports the STP root and the primary or active HSRP peer for VLAN 2. However, this design will black-hole traffic and be affected by multiple convergence events with a single network failure.

Example: Impact of an Uplink Failure

This example looks at the impact of an uplink failure on the design when there is no link between the distribution layer switches.

In Figure 2-30, when the uplink from Access A to Distribution A fails, three convergence events occur:

  1. Access A sends traffic across its active uplink to Distribution B to get to its default gateway. The traffic is black-holed at Distribution B because Distribution B does not initially have a path to the primary or active HSRP peer on Distribution A because of the STP blocking. The traffic is dropped until the standby HSRP peer takes over as the default gateway after not receiving HSRP hellos from Distribution A.

  2. The indirect link failure is eventually detected by Access B after the maximum-age (max_age) timer expires, and Access B removes blocking on the uplink to Distribution B. With standard STP, transitioning to forwarding can take as long as 50 seconds. If BackboneFast is enabled with PVST+, this time can be reduced to 30 seconds, and RSTP can reduce this interval to as little as 1 second.

  3. After STP or RSTP converges, the distribution nodes reestablish their HSRP relationships and Distribution A (the primary HSRP peer) preempts. This causes yet another convergence event when Access A endpoints start forwarding traffic to the primary HSRP peer. The unexpected side effect is that Access A traffic goes through Access B to reach its default gateway. The Access B uplink to Distribution B is now a transit link for Access A traffic, and the Access B uplink to Distribution A must now carry traffic for both the originally intended Access B and for Access A.

Figure 2-30: Impact of an Uplink Failure


Note

With aggressive HSRP timers, you can minimize this period of traffic loss to approximately 900 ms.
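
Sub-second HSRP timers are configured per group with the msec keyword. In the sketch below, VLAN 2 comes from the example, while the group number and the hello and hold values are assumptions picked to stay under the 900 ms figure, not values given in the text.

```
! Distribution switch SVI for VLAN 2 (hypothetical group number and timers)
interface Vlan2
 standby 1 timers msec 250 msec 750
```

The same timers would normally be set on both HSRP peers.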

Example: Impact on Return-Path Traffic

Because the distribution layer in Figure 2-31 is routing with equal-cost load balancing, up to 50 percent of the return-path traffic arrives at Distribution A and is forwarded to Access B. Access B drops this traffic until the uplink to Distribution B is forwarding. This indirect link-failure convergence can take as long as 50 seconds. PVST+ with UplinkFast reduces the time to three to five seconds, and RSTP further reduces the outage to one second. After the STP or RSTP convergence, the Access B uplink to Distribution B is used as a transit link for Access A return-path traffic.

Figure 2-31: Impact on Return Path Traffic

These significant outages could affect the performance of mission-critical applications, such as voice or video. Traffic engineering or link-capacity planning for both outbound and return-path traffic is difficult and complex, and must support the traffic for at least one additional access layer switch.

The conclusion is that if VLANs must span the access switches, a Layer 2 link is needed either between the distribution layer switches or between the access switches.
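
A Layer 2 link between the distribution switches is typically an 802.1Q trunk carrying the spanned VLANs. A minimal sketch for one side follows; the interface name is an assumption, and VLAN 2 is taken from the example above.

```
! Distribution A, link to Distribution B (hypothetical interface)
interface TenGigabitEthernet4/2
 switchport
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 2
```

Platforms that support only 802.1Q trunking do not accept the encapsulation command, in which case that line is simply left out.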

Asymmetric Routing (Unicast Flooding)

When VLANs span access switches, an asymmetric routing situation can result because of equal-cost load balancing between the distribution and core layers.

Up to 50 percent of the return-path traffic with equal-cost routing arrives at the standby HSRP, VRRP, or alternate, nonforwarding GLBP peer. If the content-addressable memory (CAM) table entry ages out before the ARP entry for the end node, the peer may need to flood the traffic to all access layer switches and endpoints in the VLAN.

In Figure 2-32, the CAM table entry ages out on the standby HSRP router because the default ARP timers are four hours and the CAM aging timers are five minutes. The CAM timer expires because no traffic is sent upstream by the endpoint toward the standby HSRP peer after the endpoint initially uses ARP to determine its default gateway. When the CAM entry has aged out and is removed from the CAM table, the standby HSRP peer must forward the return-path traffic to all ports in the common VLAN. The majority of the access layer switches do not have a CAM entry for the target MAC, and they flood the return traffic on all ports in the common VLAN. This unicast traffic flooding can have a significant performance impact on the connected end stations because they may receive a large amount of traffic that is not intended for them.

Figure 2-32: Asymmetric Routing

Unicast Flooding Prevention

The unicast flooding situation can be easily avoided by not spanning VLANs across access layer switches.

Unicast flooding is not an issue when VLANs are not present across multiple access layer switches because the flooding occurs only to switches supporting the VLAN where the traffic would have normally been switched. If the VLANs are local to individual access layer switches, asymmetric routing traffic is flooded on only the one VLAN interface on the distribution switch. Traffic is flooded out the same interface that would be used normally to forward to the appropriate access switch. In addition, the access layer switch receiving the flooded traffic has a CAM table entry for the host because the host is directly attached, so traffic is switched only to the intended host. As a result, no additional end stations are affected by the flooded traffic.

If you must implement a topology where VLANs span more than one access layer switch, the recommended workaround is to tune the ARP timer so that it is equal to or less than the CAM aging timer. A shorter ARP cache timer causes the standby HSRP peer to use ARP for the target IP address before the CAM entry timer expires and the MAC entry is removed. The subsequent ARP response repopulates the CAM table before the CAM entry is aged out and removed. This removes the possibility of flooding asymmetrically routed return-path traffic to all ports. You can also consider biasing the routing metrics to remove the equal cost routes.
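
A sketch of the ARP timer workaround on a distribution switch VLAN interface, assuming the CAM aging timer is left at its five-minute (300-second) default. The VLAN number and the 270-second value are assumptions for illustration.

```
! ARP entries expire after 270 seconds, before the 300-second CAM aging timer
interface Vlan2
 arp timeout 270
```

The trade-off is more frequent ARP refresh traffic on that VLAN in exchange for avoiding the flooding.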

