Dynamic Host Configuration Protocol (DHCP) is a widely used technique for configuring the IP address and other parameters of IP devices. Usage of this protocol involves servers, relay agents, routers, and clients. Troubleshooting DHCP requires knowledge of the protocol, of common configuration errors related to it, and of the relevant troubleshooting tools and commands.
IPv6 deployment has started in many parts of the world, to different extents. Most vendor devices support all aspects of IP Version 6 (IPv6), including addressing, routing, filtering, and translations. Understanding IPv6 and troubleshooting IPv6 are progressively becoming more important and more in demand.
Identify Common IPv4 Addressing Service Issues
This section reviews NAT and DHCP and highlights common troubleshooting issues with respect to each one. For both of these topics, a troubleshooting example is provided as a case study practice.
NAT/PAT Operation
NAT was designed for IP Version 4 (IPv4) address conservation. Today, NAT is also used for address hiding, with security implications. NAT usually operates at the border of a network and translates the private source addresses of outgoing IP packets to public addresses before the packets are forwarded, as illustrated in Figure 6-1. The packet header information and the corresponding translated IP address are kept in a NAT table. NAT does the reverse for the destination address of the responding IP packets based on the content of the NAT table. NAT can be used in multiple scenarios, not just in the classic situation of connecting to the public Internet. For example, NAT can be used to renumber your global address space when you switch between service providers. Also, in virtual private network (VPN) connectivity situations, you frequently find remote locations that have overlapping address spaces, and NAT can overcome the resulting connectivity issues by translating the overlapping address spaces to nonoverlapping addresses.
Figure 6-1: NAT Operates at the Border of the Network, and It Generally Translates the Source IP Address of the Outgoing Packets and the Destination IP Address of the Incoming IP Packets
In troubleshooting NAT, you need to be aware that NAT is used in different ways, each of which uses different resources and has its own limitations and barriers. Over the years, there have been multiple semantics and terminologies applied to these NAT types by different vendors, but in their simplest and most popular forms they fall within these three categories:
- Static NAT: In this case, inside local (locally significant) and inside global (globally significant) addresses are mapped one to one. This mapping is particularly useful when an inside device must be accessible from the outside network, such as the case of web servers in an Internet data center. In troubleshooting this type of NAT, you must be aware of its static nature, and how IP address changes might affect an existing static configuration. If the server changes inside or outside addresses, the static NAT entry will have the wrong settings and there will be connectivity problems.
- Dynamic NAT: Dynamic NAT translates addresses using the same underlying technology as static NAT; however, local addresses are translated to a group or pool of global addresses. This way of translating opens the door to issues related to the size of that global pool, because you are still dealing with one-to-one translation once a global address has been selected. If the pool is too small, some inside hosts might be left without a valid global address, causing connectivity problems. Dynamic NAT also raises management, tracking, and audit issues because of the dynamic nature of the translation: one time the host matches a certain translation entry, and the next time it could be translated to a different address.
- NAT overloading: This type of NAT is a special type of dynamic NAT in which addresses are translated in a many-to-one fashion. This type is also known as Port Address Translation (PAT) because global addresses can be reused and the differentiator between many inside local addresses sharing the same global address is the port number. NAT overloading suffers from some application support issues.
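The difference between the three types can be made concrete with a small Python sketch. It is a toy model rather than a description of any vendor's implementation: the class names, the two-address pool, and the starting port 1024 are assumptions chosen for illustration. It shows a dynamic pool running out of addresses and PAT letting several hosts share one global address.

```python
from __future__ import annotations

from typing import Optional


class DynamicNat:
    """One-to-one translation drawn from a finite pool of global addresses."""

    def __init__(self, pool: list[str]) -> None:
        self.free = list(pool)             # global addresses not yet handed out
        self.table: dict[str, str] = {}    # inside local -> inside global

    def translate(self, inside_local: str) -> Optional[str]:
        if inside_local in self.table:     # reuse the existing entry
            return self.table[inside_local]
        if not self.free:                  # pool exhausted: host gets no address
            return None
        self.table[inside_local] = self.free.pop(0)
        return self.table[inside_local]


class Pat:
    """Hosts share one global address and are told apart by port number."""

    def __init__(self, global_ip: str, first_port: int = 1024) -> None:
        self.global_ip = global_ip
        self.next_port = first_port        # hypothetical starting port
        self.table: dict[tuple[str, int], tuple[str, int]] = {}

    def translate(self, inside_local: str, port: int) -> tuple[str, int]:
        key = (inside_local, port)
        if key not in self.table:
            self.table[key] = (self.global_ip, self.next_port)
            self.next_port += 1
        return self.table[key]


# Static NAT is simply a fixed one-to-one mapping.
static_nat = {"10.10.10.1": "172.16.6.1"}

# Dynamic NAT with a deliberately tiny pool (two addresses, three hosts).
dynamic = DynamicNat(pool=["172.16.6.129", "172.16.6.130"])
results = [dynamic.translate(h) for h in ("10.10.10.2", "10.10.10.3", "10.10.10.4")]
print(results)    # ['172.16.6.129', '172.16.6.130', None]

# PAT: two hosts using the same source port still get distinct translations.
pat = Pat("172.16.6.129")
print(pat.translate("10.10.10.2", 5000))    # ('172.16.6.129', 1024)
print(pat.translate("10.10.10.3", 5000))    # ('172.16.6.129', 1025)
```

The `None` result for the third host corresponds to the pool-exhaustion symptom described for dynamic NAT.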
The NAT types have slight design variations, and each has its own benefits and limitations. Understanding the specific implementation method of each NAT type and its unique limitations is very useful when troubleshooting cases that involve NAT. The main advantages of NAT are well known: it keeps IP addresses from being depleted and provides some security by concealing the actual addresses of internal resources. However, the limitations of NAT add several issues that you need to consider. One example is the incompatibility of certain protocols and applications with NAT's address or port translation. In some cases, end-to-end traceability is lost and you must change, or at least clear, translations at the NAT device, which requires you to have administrative access and to know the translation rules. You must also consider the performance degradation caused by NAT. Packets are disassembled and reassembled to carry out the address translations, which consumes device resources to a certain extent. Table 6-1 summarizes the main advantages and disadvantages of implementing NAT.
Advantages | Disadvantages |
---|---|
Conserves registered addresses | Translation introduces processing delays |
Hides the actual address of internal hosts and services | Loss of end-to-end IP reachability |
Increases flexibility when connecting to Internet | Certain applications will not function with NAT enabled |
Eliminates address renumbering as the network changes from one ISP to another | Considerations are needed when working with VPNs |
Some applications or protocols conflict directly with NAT or PAT. For example, imagine a case involving IPsec VPN. IPsec protocols encapsulate the original IP packet, and therefore the protocol type on the IP header changes (to Encapsulating Security Payload [ESP] or Authentication Header [AH]) and there is no TCP or UDP header next to the IP header. A lack of TCP or UDP header means that there is no port number for NAT/PAT to translate. If IPsec is not used in tunnel mode, meaning that the original IP header is not encapsulated in a new one, there is a chance that NAT/PAT will conflict with the integrity checks of IPsec protocols (ESP or AH). Some mechanisms have been invented to allow IPsec and NAT to coexist. Those mechanisms include NAT Transparency or NAT Traversal, IPsec over TCP, and IPsec over UDP. In certain cases, however, you might still be required to disable NAT for VPN traffic, or create exceptions for it. For some protocols, the reason for the conflict with NAT is the way those protocols function at the application layer. For example, some Internet Control Message Protocol (ICMP) packets make a reference to the IP address, which might not match the IP packet’s header due to NAT’s translation. Other applications related to multimedia traffic, such as voice and video, negotiate ports at the moment of connection, or have IP addresses embedded in the payload of the packets, forcing NAT to be application aware and capable of changing its traditional behavior to allow those applications to function with no problems. Such applications and protocols might be labeled as NAT sensitive; examples include Kerberos, X Window System (remote-access application), remote shell (rsh), Session Initiation Protocol (SIP), Simple Network Management Protocol (SNMP), File Transfer Protocol (FTP), and Domain Name System (DNS). Table 6-2 lists some well-known NAT-sensitive protocols and explains why they conflict with NAT.
Protocol | Behavior |
---|---|
IPsec | NAT changes certain IP header fields such as the IP address and the IP header checksum, and this can conflict with IPsec integrity. |
ICMP | Many ICMP packets, such as Destination Unreachable, carry embedded IP header information inside the ICMP message payload, which does not match the translated address in the IP packet’s header. |
Session Initiation Protocol (SIP) | Protocols such as SIP negotiate addresses and port numbers at the application layer, which can become invalid through a NAT device. |
Note | For more information about conflicts with NAT, see the Cisco.com article “The Trouble with NAT,” which you can find at http://tinyurl.com/re2g7. |
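The embedded-address problem can be illustrated with FTP, one of the NAT-sensitive protocols named above. The following Python sketch is a simplified model: the `Packet` type and the `naive_nat` function are hypothetical, and the PORT argument is an invented sample in the standard `h1,h2,h3,h4,p1,p2` form. It shows that translating only the IP header leaves the address advertised inside the payload unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str     # source address in the IP header
    payload: str    # application data, here an FTP control command


def naive_nat(pkt: Packet, mapping: dict) -> Packet:
    """Translate only the IP header source address; the payload is untouched."""
    return replace(pkt, src_ip=mapping.get(pkt.src_ip, pkt.src_ip))


def ftp_port_address(payload: str) -> str:
    """Extract the IP address advertised in an FTP PORT command."""
    fields = payload.split()[1].split(",")
    return ".".join(fields[:4])


mapping = {"10.10.10.1": "172.16.6.1"}
before = Packet(src_ip="10.10.10.1", payload="PORT 10,10,10,1,116,136")
after = naive_nat(before, mapping)

print(after.src_ip)                       # 172.16.6.1
print(ftp_port_address(after.payload))    # 10.10.10.1
```

After translation the header says 172.16.6.1 while the payload still advertises 10.10.10.1, so the remote side would try to reach an address it cannot route to unless the NAT device is application aware and rewrites the payload as well.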
In many cases, NAT is implemented on a device that has several other services active, too. This makes the troubleshooting task more complicated. Figure 6-2 lists several features that can be enabled for inbound and outbound traffic on each router interface. Note that, as Figure 6-2 shows, the features are enforced in a specific order as per the IOS rules, and the order makes a significant difference in the outcome.
Some of the features and services shown in Figure 6-2 are as follows:
- Security through access control lists (ACLs)
- Quality of service (QoS) through rate limiting and queuing
- Encryption through VPN technologies
It is important to understand the impact of NAT on all those services. You must also think about the order in which the enabled services are enforced before you begin troubleshooting. Furthermore, you must note that this order changes depending on the direction of the traffic flow, whether it is entering an interface or leaving the interface. Notice, for example, that for outbound traffic (inside to outside), address translation occurs before output access lists are evaluated. This changing order means that in building your output access policy, you must consider “post-NAT” addresses.
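The effect of this ordering on an output access policy can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `outbound` function and the set-based "ACL" are hypothetical simplifications, and the addresses are borrowed from the examples later in this section.

```python
def outbound(src_ip: str, nat_table: dict, permitted_sources: set) -> bool:
    """Model inside-to-outside traffic: translate first, then check the output ACL."""
    translated = nat_table.get(src_ip, src_ip)    # step 1: NAT
    return translated in permitted_sources        # step 2: output access list


nat_table = {"10.10.10.1": "172.16.6.1"}

acl_written_for_pre_nat = {"10.10.10.1"}     # policy written against inside local
acl_written_for_post_nat = {"172.16.6.1"}    # policy written against inside global

print(outbound("10.10.10.1", nat_table, acl_written_for_pre_nat))     # False
print(outbound("10.10.10.1", nat_table, acl_written_for_post_nat))    # True
```

The policy written against the untranslated address never matches, because by the time the output access list is evaluated the packet already carries its post-NAT source address.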
Troubleshooting Common NAT/PAT Issues
Similar to any other troubleshooting case, NAT troubleshooting requires you to be familiar with the relevant tools and commands at each step. This includes useful show commands for information gathering and specific considerations when analyzing symptoms and eliminating possible hypotheses. One important consideration, for example, is the impact of routing in NAT configurations. Global addresses are not usually applied to the inside physical network segments, but they must be advertised to the outside world so that outside devices know where and how to send return packets. Another consideration is the size of the NAT pools, which has a critical effect on troubleshooting scenarios where dynamic NAT is involved. Finally, during every stage of the troubleshooting process, you must consider configuration errors of all sorts.
Some of the important NAT issues and considerations to keep in mind are as follows:
- Having a diagram for the NAT configuration is always helpful and should be a standard practice. Do not just start configuring without a good drawing or diagram that shows or explains each item involved.
- ACLs are used to tell the NAT device “what source IP addresses are to be translated,” and IP NAT pools are used to specify “to what those addresses translate,” as packets go from IP NAT inside to IP NAT outside.
- Marking the IP NAT inside interfaces and the IP NAT outside interfaces correctly is very important; otherwise, NAT could have unpredictable and undesirable effects.
- NAT packets still have to obey routing protocols and reachability rules, so make sure that every router knows how to reach the desired destinations. Make sure that the public addresses to which inside addresses are translated are advertised to the outside neighbors and autonomous systems.
Some helpful commands that enable you to determine the correct or incorrect functioning of NAT are as follows:
- clear ip nat translation: In the case of a change or inaccuracy, this command enables you to clear translation entries; you can specify exactly which entries you want to clear by supplying more parameters. Clearing all translations might cause a disruption until new translations are re-created.
- show ip nat translations: Allows you to see all the translations that are currently installed and active on the router.
- show ip nat statistics: This command displays NAT statistics such as number of translations (static, dynamic, extended), number of expired translations, number of hits (matches), number of misses (no matches), and so on.
- debug ip nat: Use this command to verify the operation of the NAT feature by displaying information about each packet that the router translates. The debug ip nat detailed command generates a description of each packet considered for translation. This command also displays information about certain errors or exception conditions, such as the failure to allocate a global address.
- debug ip packet [access-list]: Use this command to display general IP debugging information and IP security option (IPSO) security transactions. If a communication session is closing when it should not be, an end-to-end connection problem can be the cause. The debug ip packet command is useful for analyzing the messages traveling between the local and remote hosts. IP packet debugging captures the packets that are process switched including received, generated, and forwarded packets. IP packets that are switched in the fast path are not captured. The access-list option allows you to narrow down the scope of your debugging.
- debug condition interface interface: This is called conditionally triggered debugging. When this feature is enabled, the router generates debugging messages for packets entering or leaving the router on the specified interface; the router will not generate debugging output for packets entering or leaving through a different interface.
- Always be careful when running debug commands on a production or critical network; the more specific your debug statement, the better. Extensive debug commands typically impair or overload resources in the routers. Again, you can further scope the debug command with additional keywords and access lists.
For example, the debug ip packet command is fairly dangerous because it generates a lot of information that might clutter your console and even cause critical performance degradation to the router. You can, however, use access lists to filter the output. For example, you can use access list 100 to narrow down the scope of the debug ip packet by typing debug ip packet 100, and filter the output to what access list 100 permits. Some commands do not have the access-list option, so a more recent approach is to use conditionally triggered debugs. To use a conditionally triggered debug, first define your condition with the debug condition command. For example, you can define a condition of interface serial 0/0 by typing debug condition interface serial 0/0. This definition means that all debug output will be limited only to that particular interface. The condition remains there until it is removed. You can check the active debug conditions using the show debug condition command.
Troubleshooting Example: NAT/PAT Problem Caused by a Routing Issue
The first NAT troubleshooting example to be discussed here is based on the diagram shown in Figure 6-3. In this case, router R1 can ping R4, but router R1 cannot ping R3. You do not have much more information, except that there are no routing protocols running in any of the routers and R1 uses R2 as its gateway of last resort. Your objective is to restore end-to-end connectivity from R1 to all destinations.
When troubleshooting NAT, the structured approach can begin with learning where the NAT boundaries are located. Among the very few commands used for NAT monitoring and verification, the show ip nat statistics command is informative about all basic components of a NAT configuration. Using this command on router R2 generated the output shown in Example 6-1.
R2# sh ip nat statistics
Total active translations: 1 (1 static, 0 dynamic, 0 extended)
Outside interfaces:
FastEthernet0/1, Serial0/1/0, Loopback0
Inside interfaces:
FastEthernet0/0
Hits: 39 Misses: 6
CEF Translated packets: 45, CEF Punted packets: 49
Expired translations: 6
Dynamic mappings:
-- Inside Source
[Id: 1] access-list 10 pool NAT_OUT refcount 0
pool NAT_OUT: netmask 255.255.255.0
start 172.16.6.129 end 172.16.6.240
type generic, total addresses 112, allocated 0 (0%), misses 0
Appl doors: 0
Normal doors: 0
Queued Packets: 0
R2#
On the top lines of the output shown in Example 6-1, the outside and inside NAT interfaces are listed as per R2’s current configuration: Fa0/0 is correctly configured as inside interface, while s0/1/0 and Fa0/1 are correctly configured as outside interfaces. Notice that this command also shows the full NAT configuration. In this instance, you have both dynamic and static NAT configured. The only indication that there are static translations configured is the reference to “1 static” in the “Total active translations” line. To see the static translation, use the show ip nat translations command; its output is shown in Example 6-2.
R2# sh ip nat translations
Pro Inside global Inside local Outside local Outside global
--- 172.16.6.1 10.10.10.1 --- ---
R2#
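Two quick sanity checks follow from the outputs of Examples 6-1 and 6-2: whether the reported pool size agrees with the pool range, and whether the static global address falls inside the dynamic pool. The Python sketch below is only an illustration that uses values copied from those outputs; the variable names are arbitrary.

```python
import ipaddress
import re

# Values copied from the show ip nat statistics and show ip nat translations output.
pool_start = ipaddress.IPv4Address("172.16.6.129")
pool_end = ipaddress.IPv4Address("172.16.6.240")
static_global = ipaddress.IPv4Address("172.16.6.1")
pool_line = "type generic, total addresses 112, allocated 0 (0%), misses 0"

# An inclusive range holds end - start + 1 addresses.
pool_size = int(pool_end) - int(pool_start) + 1

# Pull the counters out of the statistics line.
match = re.search(r"total addresses (\d+), allocated (\d+)", pool_line)
total, allocated = int(match.group(1)), int(match.group(2))

# A static global address inside the dynamic pool would be a conflict.
overlaps = pool_start <= static_global <= pool_end

print(pool_size, total, total - allocated, overlaps)    # 112 112 112 False
```

The range 172.16.6.129 through 172.16.6.240 does contain 112 addresses, none are allocated, and 172.16.6.1 lies outside the pool, so neither pool exhaustion nor a static/dynamic overlap explains the symptom.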
The only entry in the NAT translation table is the static translation for 10.10.10.1 into 172.16.6.1. The address 10.10.10.1 is the IP address of R1’s Fast Ethernet interface (fa0/0). This translation might be causing the problem. Typical issues with static translations occur when there is no route back to the statically translated address, or when the statically selected global address overlaps with an available address in the dynamic address pool. Hence, as the next step, you can verify whether packets leaving R1 actually reach R3. This can help you discover whether the problem is with NAT or if it is a routing problem. To find out whether packets reach R3 but the replies do not return, or whether they never reach R3, you can enable ICMP debugging on R3 with the debug ip icmp command. Next, ping R3 from R1 and observe the debug output on R3. The results are shown in the output of Example 6-3.
R3# debug ip icmp
ICMP packet debugging is on
R3#
*Aug 23 13:54:00.556: ICMP: echo reply sent, src 172.16.11.3, dst 172.16.6.1
*Aug 23 13:54:02.552: ICMP: echo reply sent, src 172.16.11.3, dst 172.16.6.1
*Aug 23 13:54:04.552: ICMP: echo reply sent, src 172.16.11.3, dst 172.16.6.1
*Aug 23 13:54:06.552: ICMP: echo reply sent, src 172.16.11.3, dst 172.16.6.1
*Aug 23 13:54:07.552: ICMP: echo reply sent, src 172.16.11.3, dst 172.16.6.1
Based on the output of the debug ip icmp command shown in Example 6-3, the ICMP echo requests reach R3, but R3’s ICMP echo replies do not reach R1. Notice that the debug output shown in Example 6-3 is R3’s echo reply that is not reaching R1. You can conclude that the NAT translation is working but there is a routing issue on R3 toward the 172.16.6.0 destination, which you can verify with the show ip route command on R3 as demonstrated in Example 6-4.
R3#
R3# show ip route 172.16.6.0
% Subnet not in table
R3#
R3# configure terminal
R3(config)# ip route 172.16.6.0 255.255.255.0 172.16.11.2
R3(config)# exit
Because there are no routing protocols in use, you can only fix the routing problem by entering a static route in R3’s routing table, so that it sends packets destined to 172.16.6.0/24 to R2 (also shown in Example 6-4). Following the addition of the static route to R3’s routing table, you can check to see whether the connectivity problem between R1 and R3 has been corrected. Example 6-5 shows the ping result from R1 to R3.
R1# ping 172.16.11.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.11.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms
R1#
The source of the problem has been discovered, and it has been corrected. The problem was not directly a NAT problem; it was a routing issue. When NAT is deployed, a set of inside local addresses is normally translated to a set of inside global addresses. The inside global addresses might not be assigned to any physical networks, yet they need to be advertised so that outside devices know about those addresses and know how to forward packets to those networks. In this example, because no routing protocols were used, a static route was added to router R3’s routing table so that it can send packets to devices such as R1 that are behind the NAT device (R2).
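The routing side of this case can be reproduced with a small longest-prefix-match sketch in Python. It is a simplified model, not R3's actual forwarding logic: the `lookup` helper is hypothetical, and R3's table is assumed to hold only its connected 172.16.11.0/24 network before the fix.

```python
import ipaddress


def lookup(routes: dict, destination: str):
    """Return the next hop of the longest matching prefix, or None if no route."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes.items() if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda item: item[0].prefixlen)[1]


# Assumed state of R3's table before the fix: only the connected network.
r3_routes = {ipaddress.ip_network("172.16.11.0/24"): "connected"}
before = lookup(r3_routes, "172.16.6.1")

# Equivalent of: ip route 172.16.6.0 255.255.255.0 172.16.11.2
r3_routes[ipaddress.ip_network("172.16.6.0/24")] = "172.16.11.2"
after = lookup(r3_routes, "172.16.6.1")

print(before, after)    # None 172.16.11.2
```

Before the static route is added, the echo reply addressed to the inside global address 172.16.6.1 has no matching route and is dropped; afterward it is forwarded to R2, which reverses the translation.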
Troubleshooting Example: NAT Problem Caused by an Inaccurate Access List
The second NAT troubleshooting example is based on the diagram shown in Figure 6-4. In this scenario, administrators are reporting that they are unable to use Secure Shell (SSH) from the 10.10.10.0/24 network to routers R3 or R4, but they can connect from the R1 loopbacks. In addition, the risk management team recently performed an upgrade to router and firewall security policies, and some of the changes might have affected the NAT configuration and operations. The routing protocol used is the Open Shortest Path First (OSPF) Protocol, single-area model. Your mission is to restore end-to-end connectivity and make sure SSH is operational to support management processes.
The indication of an upgrade in security policies draws attention to some sort of blocking or filtering as the problem source. You definitely need to consider whether SSH traffic is being filtered and, if so, determine on which devices it is being filtered. On the other hand, the problem could just be a routing problem (traffic is either not reaching the destination, or it is not coming back). There might be other problems, too, so the best course of action is to apply a structured approach. With the security filter possibility in mind, you can start with a follow-the-path approach, testing connectivity from R1 to R3. Using the R1 console, you try to verify the symptoms by pinging the destination 172.16.11.3 (R3’s serial 0/1/0 interface) from the loopback interfaces using extended ping. This ping is successful as expected. Next, you do a ping from R1, but this time sourcing it from the fa0/0 interface, and the result is 100 percent success again. The results are shown in the output of Example 6-6. You can conclude that connectivity is not an issue and no routing problems exist. Next, try SSH from R1 to R3 using the command ssh -l user 172.16.11.3. The username is just the word user, which has already been created on R3. The SSH connection fails as reported. The results are shown in the third section of Example 6-6.
R1# ping 172.16.11.3 source 10.10.50.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.11.3, timeout is 2 seconds:
Packet sent with a source address of 10.10.50.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/4 ms
R1#
R1# ping 172.16.11.3 source 10.10.10.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.11.3, timeout is 2 seconds:
Packet sent with a source address of 10.10.10.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/4 ms
R1#
R1# ssh -l user 172.16.11.3
% Connection refused by remote host
R1#
Given the recent security policy updates, the next logical step is to review possible access lists or security controls. The follow-the-path strategy points to looking at the intermediate routers all the way up to the destination (R3). In this instance, however, you will use a clever tool to discover the potential filter: debug ip tcp transactions, instead of trying to find ACLs in each router along the path.
Tip | You must always be careful about using debug. For example, if the router's CPU or memory utilization is sustained at a high level, or if the router handles a high volume of TCP traffic, turning this debug on might not be a good idea. |
At this point, you can try the SSH connection again and observe the debug output. The results, as shown in Example 6-7, indicate that the attempt made by R1 to set up a TCP session with R3 failed because the remote device (R3) responded with a RST (reset).
R1# debug ip tcp transactions
TCP special event debugging is on
R1# ssh -l user 172.16.11.3
% Connection refused by remote host
R1#
*Aug 23 14:59:42.636: TCP: Random local port generated 42115, network 1
*Aug 23 14:59:42.636: TCB63BF854C created
*Aug 23 14:59:42.636: TCB63BF854C bound to UNKNOWN.42115
*Aug 23 14:59:42.636: TCB63BF854C setting property TCP_TOS (11) 62AAF6D55
*Aug 23 14:59:42.636: Reserved port 42115 in Transport Port Agent for TCP IP type 1
*Aug 23 14:59:42.640: TCP: sending SYN, seq 1491927624, ack 0
*Aug 23 14:59:42.640: TCP0: Connection to 172.16.11.3:22, advertising MSS 536
*Aug 23 14:59:42.640: TCP0: state was CLOSED -> SYNSENT [42115 ->
172.16.11.3(22)]
*Aug 23 14:59:42.640: TCP0: state was SYNSENT -> CLOSED [42115 ->
172.16.11.3(22)]
*Aug 23 14:59:42.640: Released port 42115 in Transport Port Agent for TCP IP
type 1 delay 240000
*Aug 23 14:59:42.640: TCP0: bad seg from 172.16.11.3 -- closing connection:
port 42115 seq 0 ack 1491927625 rcvnxt 0 rcvwnd 0 len 0
*Aug 23 14:59:42.640: TCP0: connection closed - remote sent RST
*Aug 23 14:59:42.640: TCB 0x63BF854C destroyed
Now you have to focus on R3. The output of show ip int serial 0/1/0 shows that an access list called FIREWALL-INBOUND is applied to the serial 0/1/0 interface in the inbound direction. Next, you look at the content of the FIREWALL-INBOUND access list using the show access-lists command, and the access list looks correct: statement number 30 permits TCP connections to 172.16.11.3 on TCP port number 22 (SSH). Example 6-8 shows the output from these two commands.
R3# sh ip int s0/1/0
Serial 0/1/0 is up, line protocol is up
Internet address is 172.16.11.3/24
Broadcast address is 255.255.255.255
Address determined by nonvolatile memory
MTU is 1500 bytes
Helper address is not set
Directed broadcast forwarding is disabled
Multicast reserved groups joined: 224.0.0.5
Outgoing access list is not set
Inbound access list is FIREWALL-INBOUND
Proxy ARP is enabled
Local Proxy ARP is disabled
Security level is default
Split horizon is enabled
ICMP redirects are always sent
ICMP unreachables are always sent
ICMP mask replies are never sent
IP fast switching is enabled
IP fast switching on the same interface is enabled
IP Flow switching is disabled
IP CEF switching is enabled
IP CEF Feature Fast switching turbo vector
IP multicast fast switching is enabled
R3# sh access-lists
Standard IP access list 11
10 permit any
Extended IP access list FIREWALL-INBOUND
10 permit tcp any host 172.16.11.3 eq www
20 permit tcp any host 172.16.11.3 eq telnet
30 permit tcp any host 172.16.11.3 eq 22
40 permit tcp any host 172.16.11.3 eq ftp
50 permit tcp any host 172.16.11.3 eq ftp-data
60 permit ospf any any (20 matches)
70 deny ip any any (1 match)
R3#
As an attempt to find out why the SSH packets from R1 are rejected by R3, you cautiously make use of debug ip packet on R3. You re-attempt the SSH session from R1 to R3 and observe the debug output on R3, as shown in the output of Example 6-9.
R1# ssh -l user 172.16.11.3
% Connection refused by remote host
R1#
R3# debug ip packet
IP packet debugging is on
R3#
R3#
*Aug 23 16:32:42.711: IP: s=172.16.11.2 (Serial0/1/0), d=224.0.0.5, len 80,
rcvd 0
*Aug 23 16:32:49.883: %SEC-6-IPACCESSLOGP: list FIREWALL-INBOUND denied tcp
10.10.10.1(29832) -> 172.16.11.3(2222), 1 packet
*Aug 23 16:32:49.883: IP: s=10.10.10.1 (Serial0/1/0), d=172.16.11.3, len 44,
access denied
*Aug 23 16:32:49.883: IP: tableid=0, s=172.16.11.3 (local), d=10.10.10.1
(Serial0/1/0), routed via FIB
*Aug 23 16:32:49.883: IP: s=172.16.11.3 (local), d=10.10.10.1 (Serial0/1/0),
len 56, sending
*Aug 23 16:32:50.067: IP: s=172.16.11.3 (local), d=224.0.0.5 (Serial0/1/0),
len 80, sending broad/multicast
The SSH attempt from R1 fails again, but the security message (%SEC-6-IPACCESSLOGP) in the debug output on R3 states that the denied TCP packet has the source IP address 10.10.10.1 and port number 29832 and destination IP address 172.16.11.3 and port number 2222! The destination port number is 2222 instead of 22, which is the allowed port (SSH) in access list FIREWALL-INBOUND that is applied inbound to R3’s serial 0/1/0 interface. Now you know why the packet is denied, but you have to find out why the destination port number is 2222 rather than 22. In other words, you have to determine which device has translated the port number from 22 to 2222. Based on the network topology, the prime suspect is NAT on R2. Once again, cautiously, you use debug ip nat on R2 and observe the results as you re-attempt SSH from R1 to R3. To confirm your findings, you would also enter the show ip nat translations command on R2. Example 6-10 shows the results.
R2# debug ip nat
IP NAT debugging is on
R2#
R2#
R2#
R2#
*Aug 23 16:28:31.731: NAT*: TCP s=55587, d=22->2222
R1# ssh -l user 172.16.11.3
% Destination unreachable; gateway or host down
R1#
R2# sh ip nat translations
Pro Inside global Inside local Outside local Outside global
tcp --- --- 172.16.11.3:22 172.16.11.3:2222
tcp 10.10.10.1:29832 10.10.10.1:29832 172.16.11.3:22 172.16.11.3:2222
tcp 10.10.10.1:43907 10.10.10.1:43907 172.16.11.3:22 172.16.11.3:2222
tcp 10.10.10.1:55587 10.10.10.1:55587 172.16.11.3:22 172.16.11.3:2222
tcp 10.10.10.1:60089 10.10.10.1:60089 172.16.11.3:22 172.16.11.3:2222
tcp 10.10.10.1:62936 10.10.10.1:62936 172.16.11.3:22 172.16.11.3:2222
R2#
The output in Example 6-10 clearly shows that R2 is performing port mapping. It is translating port 22 to port 2222, and that is the problem. It seems that the risk management team updated the security policies, but did not update the access lists for the custom ports being used. The traffic arrives on TCP 2222, but the access list on R3 permits TCP 22. The next step is to correct the FIREWALL-INBOUND access list on R3 and re-attempt SSH from R1 to R3. The results are shown in Example 6-11.
R3# conf t
Enter configuration commands, one per line. End with CNTL/Z.
R3(config)# ip access-list exten FIREWALL-INBOUND
R3(config-ext-nacl)# 35 permit tcp any host 172.16.11.3 eq 2222
R3(config-ext-nacl)# end
R3#
R1# ssh -l user 172.16.11.3
Password:
*Aug 23 16:30:42.604: TCP: Random local port generated 43884, network 1
*Aug 23 16:30:26.604: TCB63BF854C created
*Aug 23 16:30:26.604: TCB63BF854C bound to UNKNOWN.43884
*Aug 23 16:30:26.604: TCB63BF854C setting property TCP_TOS (11) 62AF6D55
*Aug 23 16:30:26.604: Reserved port 43884 in Transport Port Agent for TCP IP type 1
*Aug 23 16:30:26.604: TCP: sending SYN, seq 1505095793, ack 0
*Aug 23 16:30:26.604: TCP0: Connection to 172.16.11.3:22, advertising MSS 536
*Aug 23 16:30:26.608: TCP0: state was CLOSED -> SYNSENT [43884 ->
172.16.11.3(22)]
*Aug 23 16:30:26.608: TCP0: state was SYNSENT -> ESTAB [43884 ->
172.16.11.3(22)]
*Aug 23 16:30:26.608: TCP: tcb 63BF854C connection to 172.16.11.3:22, peer MSS
536, MSS is 536
*Aug 23 16:30:26.608: TCB63BF854C connected to 172.16.11.3.22
As shown in Example 6-11, the SSH attempt is successful now. In summary, the problem was not the NAT configuration, but a lack of synchronization between the configuration teams: The configuration on R2 is doing port mapping to a custom port (2222), but the access list configuration on R3 did not consider or account for the custom port.
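The first-match behavior that made R3 drop the translated traffic can be sketched in Python. This is a simplified model, not IOS itself: only the TCP destination-port entries of FIREWALL-INBOUND from Example 6-8 are represented, and the sequence numbers 35 and 80 in the last two calls are hypothetical placements, chosen to show why an added permit has to sit ahead of the catch-all deny.

```python
# Simplified model of FIREWALL-INBOUND: (sequence, action, TCP destination port).
# A port of None stands for "deny ip any any", which matches everything.
FIREWALL_INBOUND = [
    (10, "permit", 80),    # www
    (20, "permit", 23),    # telnet
    (30, "permit", 22),    # SSH
    (40, "permit", 21),    # ftp
    (50, "permit", 20),    # ftp-data
    (70, "deny", None),    # catch-all (entry 60, OSPF, is not TCP and is omitted)
]


def evaluate(acl, dst_port):
    """Return (sequence, action) of the first entry that matches dst_port."""
    for seq, action, port in sorted(acl, key=lambda entry: entry[0]):
        if port is None or port == dst_port:
            return seq, action
    return None, "deny"    # implicit deny when nothing matches


print(evaluate(FIREWALL_INBOUND, 22))      # (30, 'permit')
print(evaluate(FIREWALL_INBOUND, 2222))    # (70, 'deny')

# Hypothetical placements of a new "permit ... eq 2222" entry:
print(evaluate(FIREWALL_INBOUND + [(80, "permit", 2222)], 2222))    # (70, 'deny')
print(evaluate(FIREWALL_INBOUND + [(35, "permit", 2222)], 2222))    # (35, 'permit')
```

Because evaluation stops at the first match, the position of the new entry relative to the explicit deny decides whether the post-NAT port 2222 is ever permitted.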
Reviewing DHCP Operation
DHCP is a client/server protocol. The DHCP client acquires IP configuration parameters, such as IP address, subnet mask, and default gateway, from a DHCP server. The DHCP server is typically centrally located and operated by the network administrator; therefore, DHCP clients can be reliably and dynamically configured with parameters appropriate to the current network architecture. Because the DHCP client initially does not have IP configuration, it must find a DHCP server and obtain its IP configuration based on broadcast communication.
Most enterprise networks are divided into sub-networks (subnets). Each subnet usually maps to a virtual LAN (VLAN) and routers or multilayer switches route between the subnets. Because routers do not pass broadcasts by default, a DHCP server would be needed on each subnet. To address this issue, you can configure a router’s interface with a feature called DHCP relay agent, using the ip helper-address server-address command. When configured with this command, the router interface will forward the client’s DHCP messages to the configured server-address. When the server sends its reply back to the router interface, the router interface forwards it to the client subnet.
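The relay behavior can be sketched as follows. This Python model is a simplification under stated assumptions: the `DhcpMessage` type and `relay` function are hypothetical, the interface and server addresses are invented for illustration, and the `giaddr` (gateway IP address) field comes from the DHCP specification rather than from the description above. A relay agent records its own interface address there so that the server can tell which subnet the request came from.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DhcpMessage:
    msg_type: str
    dst_ip: str
    giaddr: str = "0.0.0.0"    # relay agent address; zero when not relayed


def relay(msg: DhcpMessage, interface_ip: str, helper_address: str) -> DhcpMessage:
    """Turn a client broadcast into a unicast toward the configured DHCP server."""
    if msg.dst_ip != "255.255.255.255":
        return msg    # not a broadcast, so there is nothing to relay
    return replace(msg, dst_ip=helper_address, giaddr=interface_ip)


# Hypothetical addresses: router interface on the client subnet, remote server.
discover = DhcpMessage("DISCOVER", "255.255.255.255")
relayed = relay(discover, interface_ip="10.10.10.254", helper_address="192.168.1.10")

print(relayed.dst_ip, relayed.giaddr)    # 192.168.1.10 10.10.10.254
```

When troubleshooting, a missing or wrong helper address shows up as discover messages that never leave the client subnet or that arrive at the wrong server.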
DHCP has historically been used for automatic provisioning of IP parameters. Increased mobility and usage of laptop computers have made DHCP even more popular. The DHCP server offers more than just the IP address, subnet mask, and the default gateway. Cisco IP phones, for example, require a TFTP server address to download their configuration files and become active in the network. IP phones obtain the TFTP server’s IP address from the DHCP server through DHCP options and extensions. These options allow the protocol to expand the number and nature of parameters that can be delivered to hosts, including vendor specific parameters.
To troubleshoot DHCP, it is important to understand the nature and semantics of the various transactions that happen between servers and clients. Figure 6-5 illustrates the first set of actions that must complete so that a client with no IP configuration finds a DHCP server and obtains its IP configuration.
As shown in Figure 6-5, the client starts with a DHCP discover message with the source IP address 0.0.0.0, because it does not yet have an address of its own (this address is sometimes referred to as the alien address). One or more servers reply with a DHCP offer. The client responds, typically to the first offer it receives, using the DHCP request message. Multiple offer messages could be sent, but only one is accepted by the client, as indicated by the request message. The request message is a broadcast and reaches all offering servers. The servers whose offers are not selected withdraw their offers by putting the offered IP addresses back into the pool of available addresses. The selected server responds with a DHCP ack and confirms the transaction by delivering all configured parameters. Table 6-3 lists these and other DHCP packets (message types) involved in DHCP transactions, some issued by servers and some issued by clients.
Packet Type | Description |
---|---|
DHCP discover | Client looking for available DHCP servers. It is a UDP broadcast (source port is 68, and the destination port is 67). |
DHCP offer | This is the server’s response to the client’s discover message. This is also a UDP broadcast (source port is 67, and the destination port is 68). |
DHCP request | This is the client’s response to one specific DHCP offer. |
DHCP decline | Client-to-server communication, indicating that the IP address is already in use. |
DHCP ack | Server-to-client communication. This is the server’s response to a client request. This message includes all configuration parameters. |
DHCP nak | Server-to-client communication. This is the server’s negative response to a client’s request, indicating the original offer is no longer available. |
DHCP release | Client-to-server communication. The client relinquishes its IP address and other parameters. |
DHCP inform | Client-to-server communication. Using this message, the client asks for local configuration parameters such as DNS server’s IP address, but it has its IP address externally configured. |
The DHCP messages listed in Table 6-3 are most helpful to know during the troubleshooting process. For example, the DHCP decline message, which is a client-to-server message, indicates that the IP address is already in use. The DHCP nak message is useful in determining whether the DHCP server refuses the request for a certain configuration parameter.
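To make the message types concrete, the following Python sketch builds a minimal DHCP discover and reads the message type back out of it. It is an illustration only: the function names are hypothetical, the broadcast flag is set here as a sample choice, and a real client would add more options. The numeric message type codes and the magic cookie are the standard values carried in option 53 and at the start of the options field.

```python
import struct

# DHCP message types as carried in option 53.
MESSAGE_TYPES = {
    1: "DHCPDISCOVER", 2: "DHCPOFFER", 3: "DHCPREQUEST", 4: "DHCPDECLINE",
    5: "DHCPACK", 6: "DHCPNAK", 7: "DHCPRELEASE", 8: "DHCPINFORM",
}
MAGIC_COOKIE = bytes([99, 130, 83, 99])  # marks the start of the options field


def build_discover(mac: bytes, xid: int) -> bytes:
    """Build a minimal DHCP discover payload (hypothetical helper)."""
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,         # op: BOOTREQUEST
        1,         # htype: Ethernet
        6,         # hlen: MAC address length
        0,         # hops
        xid,       # transaction ID
        0,         # secs
        0x8000,    # flags: broadcast bit set (sample choice)
        bytes(4),  # ciaddr: the client has no address yet
        bytes(4),  # yiaddr
        bytes(4),  # siaddr
        bytes(4),  # giaddr: filled in by a relay agent, if any
        mac,       # chaddr, padded to 16 bytes by struct
        b"",       # sname
        b"",       # file
    )
    options = bytes([53, 1, 1, 255])  # option 53 = discover, then end option
    return header + MAGIC_COOKIE + options


def message_type(packet: bytes) -> str:
    """Walk the options field and return the name of the DHCP message type."""
    i = 240  # 236-byte header plus 4-byte magic cookie
    while i < len(packet):
        code = packet[i]
        if code == 255:      # end option
            break
        if code == 0:        # pad option has no length byte
            i += 1
            continue
        length = packet[i + 1]
        if code == 53:
            return MESSAGE_TYPES[packet[i + 2]]
        i += 2 + length
    raise ValueError("no DHCP message type option found")
```

On the wire this payload would be sent from UDP port 68 to UDP port 67, as described in Table 6-3.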
Common DHCP Troubleshooting Issues
For troubleshooting purposes, the DHCP problems can be divided into three categories with relation to the router’s role in the DHCP process. As Figure 6-6 displays, a router can play three roles: DHCP server, DHCP client, or DHCP relay agent.
In branch office scenarios, it is typical to have Cisco IOS routers or firewalls acting as the DHCP server. In that case, troubleshooting DHCP is critical because it will affect the operations of all hosts in the branch, including IP phones and video devices.
In some other scenarios, routers can act as DHCP clients. An example of this is a branch or home office router using broadband connectivity such as digital subscriber line (DSL) or cable and obtaining IP address and IP parameters from the service provider’s DHCP server.
The router can also act as a broker for DHCP transactions, located in the path of DHCP requests. This is a typical scenario in networks where DHCP servers are centralized and serve multiple network segments and the router acts as a DHCP relay agent.
In all three scenarios, your typical DHCP issues can come from multiple sources. It is crucial to apply a structured troubleshooting method and to know the inner workings of the protocol.
The most common reasons for problems are configuration issues. These can result in a multitude of symptoms, such as clients not obtaining IP information from the server, client requests not reaching the server across a DHCP relay agent, or clients failing to obtain DHCP options and extensions. Poor capacity planning and security issues might also be a source of problems. For example, DHCP scope exhaustion is becoming increasingly common, in part due to malicious attacks. In some scenarios, the infrastructure and requirements lead to a combination of static and dynamic IP address assignments. In those scenarios, it is possible that a DHCP server grants an IP address that is already in use. Different implementations of servers and clients react differently to this, which is important to know for troubleshooting purposes. Furthermore, if you have multiple DHCP servers, or even rogue DHCP servers in your network, there is a chance that duplicate IP addresses will be assigned to hosts, because there is no stateful communication between DHCP servers. Other management issues arise due to the “pull” nature of DHCP. There are no provisions in the protocol to allow the DHCP server to push configuration parameters or control messages to DHCP clients. A good example, with critical implications in IP address renumbering, is that IP addresses must be renewed from the client side. There is no server-side, push-type renewal process. This means that during renumbering, all clients would need to reboot or manually renew their IP addresses. Otherwise, you need to wait until the clients’ leases expire, which might not be a viable option.
Note | The phrase “no stateful communication between the DHCP servers” means that the servers do not communicate about the addresses they have assigned and have not assigned. In other words, they do not exchange state information. |
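Because renewal is driven by the client, it helps to know when a client will next contact the server on its own. The sketch below computes the default renewal (T1) and rebinding (T2) times, which the DHCP specification sets at 50 percent and 87.5 percent of the lease unless the server overrides them. The function name is hypothetical, and the 3-day lease is only a sample value.

```python
from typing import Tuple


def lease_timers(lease_seconds: int) -> Tuple[int, int]:
    """Return the default (T1, T2) timers in seconds for a given lease.

    T1 is when the client first tries to renew with its own server;
    T2 is when it falls back to broadcasting to any server.
    """
    t1 = lease_seconds // 2        # 50 percent of the lease
    t2 = lease_seconds * 7 // 8    # 87.5 percent of the lease
    return t1, t2


three_days = 3 * 24 * 3600
t1, t2 = lease_timers(three_days)
print(f"lease={three_days}s renew at {t1}s, rebind at {t2}s")
```

With a 3-day lease, a client that is left alone first attempts renewal after 36 hours, which gives a rough idea of how long a renumbering change can take to reach clients that are never rebooted.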
Problems related to the DHCP relay agent require special consideration. The Cisco IOS command that makes a router a DHCP relay agent is the ip helper-address command. This is an interface configuration command that makes the router forward the BOOTP/DHCP requests from clients to the DHCP server. However, if the DHCP server’s IP address changes, you must reconfigure all the interfaces of all the routers with the new IP helper address (DHCP server’s new IP address). Another issue related to DHCP relay agent is that enabling a router interface with the ip helper-address command makes the interface forward UDP broadcasts for six protocols (not just DHCP) to the IP address configured using the ip helper-address command. Those protocols are as follows:
- TFTP (port 69)
- DNS (port 53)
- Time Service (port 37)
- NetBIOS Name Service and Datagram Service (ports 137 and 138)
- TACACS (port 49)
- DHCP/BOOTP client and server (ports 67 and 68)
If other protocols do not require this service, forwarding their requests must be disabled manually on all routers using the Cisco IOS no ip forward-protocol udp port-number global configuration mode command.
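As a rough aid, the following Python sketch models the default forwarded ports listed above and generates the commands needed to turn off the ones you do not want relayed. The function name and the choice of ports to keep are assumptions for illustration; the command syntax is the one given in the text.

```python
# UDP ports forwarded by default once ip helper-address is configured,
# as listed in the text above.
DEFAULT_FORWARDED_UDP_PORTS = {
    37: "Time Service",
    49: "TACACS",
    53: "DNS",
    67: "DHCP/BOOTP server",
    68: "DHCP/BOOTP client",
    69: "TFTP",
    137: "NetBIOS Name Service",
    138: "NetBIOS Datagram Service",
}


def no_forward_commands(keep):
    """Return global configuration commands that disable forwarding for
    every default port not in `keep` (hypothetical helper)."""
    return [
        f"no ip forward-protocol udp {port}"
        for port in sorted(DEFAULT_FORWARDED_UDP_PORTS)
        if port not in keep
    ]


# Keep only DHCP/BOOTP relaying.
for command in no_forward_commands({67, 68}):
    print(command)
```

The generated lines would have to be applied on every router that acts as a relay agent, which is the manual step the paragraph above warns about.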
Another area of DHCP troubleshooting is the vendor-specific and function-specific DHCP options.
DHCP options have historically been an integral part of the protocol, because they are used to deliver parameters in addition to the traditional IP address, subnet mask, default gateway, and DNS server address. Some options aim at tuning the DHCP operation, or even the TCP/IP operation. Table 6-4 lists some important DHCP configured parameters and options.
DHCP Option | Code | Description |
---|---|---|
Subnet mask | 1 | Specifies the subnet mask for the client to use (as per RFC 950) |
Router | 3 | The list of routers the client can use (usually, in order of preference) |
Domain name server | 6 | The list of DNS servers the client can use (usually, in order of preference) |
ARP cache timeout | 35 | Specifies the timeout (seconds) for ARP cache entries |
IP address lease time | 51 | Specifies the period over which the IP address is leased for (it must be renewed) |
Relay agent information | 82 | Information about the port from which the DHCP request originates |
TFTP server IP address | 150 | Typically used by devices such as IP phones to download their configuration files |
Some DHCP options are required for certain other protocols or applications to function properly. For example, the relay agent information option (number 82) makes DHCP relay agents add source port (or interface or circuit) information to the forwarded DHCP requests. The DHCP server obtains and stores that information, which can be used to identify the physical location of a DHCP client. For enhanced 911 purposes, for example, it is important to know the switch port that an IP phone connects to, to determine the 911 caller’s location. Another example is option number 150, which is used by IP phones requesting DHCP information from the network. Option 150 provides the IP address of the TFTP server to the IP phones so that they can download their firmware and configuration files, which is a critical step in the IP phone boot process. The bottom line is that you need to ensure that the DHCP servers support and are configured with the needed options. Unfortunately, sometimes those options are vendor-specific and might not be supported by all devices. In that case, such options would have to be entered manually, provided in a configuration file (through TFTP, for example), or delivered using another method.
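DHCP options travel as simple code, length, value triplets, so checking whether a needed option is present in a captured packet is mostly a matter of walking that list. The following Python sketch encodes and parses a few of the options from Table 6-4. The helper names are hypothetical, and the addresses (including the TFTP server address) are sample values, not taken from a real deployment.

```python
import socket


def encode_option(code: int, value: bytes) -> bytes:
    """Encode one option as code, length, value."""
    if not 0 < code < 255:
        raise ValueError("codes 0 and 255 are reserved for pad and end")
    if len(value) > 255:
        raise ValueError("option value too long for a single option")
    return bytes([code, len(value)]) + value


def encode_ip_list(code: int, addresses) -> bytes:
    """Encode an option whose value is a list of IPv4 addresses."""
    return encode_option(code, b"".join(socket.inet_aton(a) for a in addresses))


def parse_options(data: bytes) -> dict:
    """Parse an options field (after the magic cookie) into {code: value}."""
    options = {}
    i = 0
    while i < len(data):
        code = data[i]
        if code == 255:   # end option
            break
        if code == 0:     # pad option has no length byte
            i += 1
            continue
        length = data[i + 1]
        options[code] = data[i + 2:i + 2 + length]
        i += 2 + length
    return options


# Sample values only: mask, router, a 3-day lease, and a TFTP server.
blob = (
    encode_ip_list(1, ["255.255.255.0"])
    + encode_ip_list(3, ["10.1.1.1"])
    + encode_option(51, (259200).to_bytes(4, "big"))
    + encode_ip_list(150, ["10.1.1.50"])
    + b"\xff"
)
missing = {150, 3} - parse_options(blob).keys()
print("missing options:", sorted(missing))
```

A check like the last two lines is one way to confirm, from a capture, that a server is actually delivering the options a device such as an IP phone depends on.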
Other troubleshooting scenarios might be related to DHCP security efforts. This is a perfect example of how multiple services act in an integrated fashion: Automatic addressing is accomplished through DHCP, and security is accomplished through DHCP snooping. A misconfigured service can and will affect other services. The following are some specific issues related to DHCP snooping:
- Improper configuration of the DHCP snooping trust boundaries
- Failure to configure DHCP snooping on certain VLANs
- Improper configuration of the DHCP snooping rate limits
- Performance degradation
These issues illustrate the impact of poor planning of DHCP snooping, which can result in DHCP transactions being blocked or affected. Improper location of DHCP snooping boundaries, low DHCP snooping rate limits, and other configurations related to security must be part of your considerations during the DHCP troubleshooting process.
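The effect of a misplaced trust boundary can be illustrated with a deliberately simplified Python model. It captures only one rule of DHCP snooping, namely that server-originated messages are accepted only on trusted ports; real implementations also validate bindings and enforce rate limits. The function name is hypothetical.

```python
# Messages that only a DHCP server should originate.
SERVER_MESSAGES = {"DHCPOFFER", "DHCPACK", "DHCPNAK"}


def snooping_permits(message: str, port_trusted: bool) -> bool:
    """Simplified trust check: server messages need a trusted port,
    client messages are allowed on any port."""
    if message in SERVER_MESSAGES:
        return port_trusted
    return True


# If the uplink toward the legitimate server is left untrusted,
# its offers are dropped and clients never get an address.
print(snooping_permits("DHCPOFFER", port_trusted=False))
```

In this model a discover still passes through an untrusted port, so the symptom looks like a silent server even though the server is answering.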
DHCP Troubleshooting Tips and Commands
For troubleshooting purposes, it is important that you can answer the following questions:
- Where are the DHCP servers and clients located? Are they co-located in the same IP subnet, or do you need to configure relay agents?
- Are DHCP relay agents configured? (They are most likely necessary.)
- What are the DHCP pool sizes? Are they sufficient? (Otherwise you’ll run out of addresses.)
- Are there any DHCP option compatibility issues? (Some applications will fail if all necessary options are not supplied.)
You must also investigate the following possibilities:
- Are there any ACLs or firewalls filtering UDP port 67 or UDP port 68?
- Is the ip helper-address command applied to the correct router interfaces?
Some of the Cisco IOS commands that can prove helpful for DHCP troubleshooting are as follows:
- show ip dhcp server statistics: Because this command displays count information about server statistics and messages sent and received, it can be very helpful during troubleshooting of an IOS-based DHCP server.
- show ip dhcp binding: This command is used to display DHCP binding information for IP address assignment and subnet allocation.
- show ip dhcp conflict: Use this command to display address conflicts found by a Cisco IOS DHCP server when addresses are offered to the client. The server uses ping to detect conflicts. The client uses gratuitous Address Resolution Protocol (ARP) to detect conflicts. If an address conflict is detected, the address is removed from the pool and is not assigned until an administrator resolves the conflict.
- show ip dhcp database: This command is used to display Cisco IOS DHCP server database agent information, such as the following:
  - URL: Specifies the remote file used to store automatic DHCP bindings
  - Read/written: The last date and time bindings were read/written from the file server
  - Status: Indication of whether the last read or write of host bindings was successful
  - Delay: The amount of time (in seconds) to wait before updating the database
  - Timeout: The amount of time (in seconds) before the file transfer is aborted
  - Failures/Successes: The number of failed/successful file transfers
- show ip dhcp pool: This command is used to determine the subnets allocated and to examine the current utilization level for the pool, or for all the pools if the name argument is not used.
- debug ip udp: This debug displays all UDP packets sent and received, and it can use considerable CPU cycles on the device.
- debug ip dhcp server [packets | events]: This command enables DHCP server debugging, which can be helpful during relevant troubleshooting exercises. The events option reports server events such as address assignments and database updates, and the packets option decodes DHCP receptions and transmissions.
- clear ip dhcp binding {* | address}: Use this command to delete an address binding from the Cisco IOS DHCP server database. The address denotes the IP address of the client. If the asterisk (*) character is used as the address parameter, DHCP clears all automatic bindings.
- clear ip dhcp conflict {* | address}: Use this command to clear an address conflict (or all address conflicts, with the * option) from the Cisco IOS DHCP server database.
DHCP Troubleshooting Example: Problems After a Security Audit
This troubleshooting example is based on the diagram depicted in Figure 6-7. Router R1 provides DHCP services to the clients in the 10.1.1.0 subnet. The DHCP clients in this example are routers R2 and R3. A security audit has recently been performed on router R1, and you receive reports that R1 is no longer providing reliable DHCP services: The clients are unable to renew their IP addresses.
As the first step in fact gathering, you can try to determine whether the problem affects all clients in the LAN, or if only some clients are experiencing the reported symptoms. You check R2 to make sure that it is configured as a DHCP client. The output of the show ip interface brief command, displayed in Example 6-12, shows that interface fa0/0 is configured as a DHCP client and that its IP address is “unassigned.” Next, you check router R3 and find the same situation (also displayed in Example 6-12). Because multiple clients are having the same problem, it is reasonable to suspect that the problem originates elsewhere. You can check the configuration of the DHCP server next.
R2# show ip int brief
Interface IP-Address OK? Method Status Protocol
FastEthernet0/0 unassigned YES DHCP up up
FastEthernet0/1 unassigned YES NVRAM administratively down down
Serial0/0/0 unassigned YES NVRAM administratively down down
R2#
R3# show ip int brief
Interface IP-Address OK? Method Status Protocol
FastEthernet0/0 unassigned YES DHCP up up
FastEthernet0/1 unassigned YES unset administratively down down
Serial0/0/0 unassigned YES unset administratively down down
Serial0/1/0 unassigned YES unset administratively down down
R3#
To investigate the interaction between the clients and server, you can check whether the clients are issuing DHCP discover messages. Using the debug dhcp detail command on R3, you find that the DHCP discover messages are sent out of interface fa0/0, but no DHCP offers are received back from the DHCP server. The client retries, and after three attempts, it times out with the Timed out Selecting state message, followed by No allocation possible. The results are shown in Example 6-13. Using the same debug command on the second client, R2, provides the exact same results.
R3# debug dhcp detail
DHCP client activity debugging is on (detailed)
R3#
R3#
*Aug 23 17:32:37.107: Retry count: 1 Client-ID: cisco-0019.5592.a442-Fa0/0
*Aug 23 17:32:37.107: Client-ID hex dump: 636973636F2D303031392E353539322E
*Aug 23 17:32:37.107: 613434322D4551302F30
*Aug 23 17:32:37.107: Hostname: R3
*Aug 23 17:32:37.107: DHCP: SDiscover: sending 291 byte length DHCP packet
*Aug 23 17:32:37.107: DHCP: SDiscover 291 bytes
*Aug 23 17:32:37.107: B cast on FastEthernet0/0 interface
from 0.0.0.0
*Aug 23 17:32:40.395: DHCP: SDiscover attempt #2 for entry:
*Aug 23 17:32:40.395: Temp IP addr: 0.0.0.0 for peer on Interface:
FastEthernet0/0
*Aug 23 17:32:40.395: Temp sub net mask: 0.0.0.0
*Aug 23 17:32:40.395: DHCP Lease server: 0.0.0.0, state: 1 Selecting
*Aug 23 17:32:40.395: DHCP transaction id: 13BA
*Aug 23 17:32:40.395: Lease: 0 secs, Renewal: 0 secs, Rebind: 0 secs
*Aug 23 17:32:40.395: Next timer fires after: 00:00:04
*Aug 23 17:32:40.395: Retry count: 2 Client-ID: cisco-0019.5592.a442-Fa0/0
*Aug 23 17:32:40.395: Client-ID hex dump: 636973636F2D303031392E353539322E
*Aug 23 17:32:40.395: 613434322D4551302F30
*Aug 23 17:32:40.395: Hostname: R3
*Aug 23 17:32:40.395: DHCP: SDiscover: sending 291 byte length DHCP packet
*Aug 23 17:32:40.395: DHCP: SDiscover 291 bytes
*Aug 23 17:32:40.395: B cast on FastEthernet0/0 interface
from 0.0.0.0
*Aug 23 17:32:44.395: DHCP: SDiscover attempt #3 for entry:
*Aug 23 17:32:44.395: Temp IP addr: 0.0.0.0 for peer on Interface:
FastEthernet0/0
*Aug 23 17:32:44.395: Temp sub net mask: 0.0.0.0
*Aug 23 17:32:44.395: DHCP Lease server: 0.0.0.0, state: 1 Selecting
*Aug 23 17:32:44.395: DHCP transaction id: 13BA
*Aug 23 17:32:44.395: Lease: 0 secs, Renewal: 0 secs, Rebind: 0 secs
*Aug 23 17:32:44.395: Next timer fires after: 00:00:04
*Aug 23 17:32:44.395: Retry count: 3 Client-ID: cisco-0019.5592.a442-Fa0/0
*Aug 23 17:32:44.395: Client-ID hex dump: 636973636F2D303031392E353539322E
*Aug 23 17:32:44.395: 613434322D4551302F30
*Aug 23 17:32:44.395: Hostname: R3
*Aug 23 17:32:44.395: DHCP: SDiscover: sending 291 byte length DHCP packet
*Aug 23 17:32:44.395: DHCP: SDiscover 291 bytes
*Aug 23 17:32:44.395: B cast on FastEthernet0/0 interface
from 0.0.0.0
*Aug 23 17:32:48.395: DHCP: Qscan: Timed out Selecting state%Unknown
DHCP problem... No allocation possible
*Aug 23 17:32:57.587: DHCP: waiting for 60 seconds on interface FastEthernet0/0
Now you can start to check router R1, the DHCP server. Applying a bottom-up approach and using the show ip int brief command to verify that Layers 1 and 2 are operational on the documented DHCP server, you find that R1’s fa0/0 interface is up, with the IP address 10.1.1.1/24. Next, you check the DHCP server information. Two commands are useful. The first command, show ip dhcp server statistics, displays useful counters; counters that keep changing are a sign of good operation. The output of this command on R1 is shown in Example 6-14: one address pool is configured as part of the service, and very few DHCP messages have been sent and received. So the server looks to be in good shape, just not very active. Next, you must verify the address pool configuration. In case the IP address scope, known as the address pool in Cisco IOS DHCP, is either misconfigured or exhausted, you check it using the second command, show ip dhcp pool. The results, also displayed in Example 6-14, show one address pool, vlan10. The number of leased addresses is zero: the pool still has 254 addresses, but none are allocated at the moment.
R1# show ip dhcp server statistics
Memory usage 9106
Address pools 1
Database agents 0
Automatic bindings 0
Manual bindings 0
Expired bindings 0
Malformed messages 0
Secure arp entries 0
Message Received
BOOTREQUEST 0
DHCPDISCOVER 1
DHCPREQUEST 1
DHCPDECLINE 0
DHCPRELEASE 0
DHCPINFORM 0
Message Sent
BOOTREPLY 0
DHCPOFFER 1
DHCPACK 1
DHCPNAK 0
R1#
R1# sh ip dhcp pool
Pool vlan10 :
Utilization mark (high/low) : 100/0
Subnet size (first/next) : 0/0
Total addresses : 254
Leased addresses : 0
Pending event : none
1 subnet is currently in the pool :
Current index IP address range Leased addresses
10.1.1.12 10.1.1.1 - 10.1.1.254 0
R1#
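Since the statistics command is most useful when its counters are compared over time, a small script can take two snapshots and report what changed. The Python sketch below is one way to do that; the function names are hypothetical, and the parsing assumes the simple "name value" line layout shown in the output above.

```python
def parse_counters(output: str) -> dict:
    """Turn 'name value' lines from show ip dhcp server statistics
    into a dictionary; lines without a trailing number are skipped."""
    counters = {}
    for line in output.splitlines():
        parts = line.rsplit(None, 1)
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0].strip()] = int(parts[1])
    return counters


def changed_counters(before: dict, after: dict) -> dict:
    """Return the increase of every counter that differs between snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


first = parse_counters("DHCPDISCOVER 1\nDHCPREQUEST 1\nDHCPOFFER 1\nDHCPACK 1")
second = parse_counters("DHCPDISCOVER 1\nDHCPREQUEST 1\nDHCPOFFER 1\nDHCPACK 1")
print(changed_counters(first, second) or "no counters changed")
```

If clients are visibly sending discover messages while the DHCPDISCOVER counter on the server stays flat between snapshots, the server is not processing them, which points away from pool configuration and toward the service itself or the path to it.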
Now you know both the DHCP server and the DHCP clients have the correct configurations, and are operationally UP at physical and data link layers. Remembering that there has been a security audit recently (including one on R1), it is a good idea to verify whether any changes were made by security auditors that affect DHCP. One hardening method that security experts use to make a device less vulnerable to security incidents is shutting down unused services. It is possible that the auditors have shut down the DHCP service on R1. This sounds like a plausible conjecture because the rest of the configuration looks fine. The output of the show ip sockets command in Example 6-15 displays the active ports on R1, the DHCP server. The show ip sockets command is not frequently used by network administrators, but it is handy in monitoring the open ports on a router.
R1# show ip sockets
Proto Remote Port Local Port In Out Stat TTY OutputIF
88 --listen-- 10.1.1.1 10 0 0 0 0
17 --listen-- 10.1.1.1 161 0 0 1001 0
17 --listen-- 10.1.1.1 162 0 0 1011 0
17 --listen-- 10.1.1.1 57767 0 0 1011 0
17 --listen-- --any-- 161 0 0 20001 0
17 --listen-- --any-- 162 0 0 20011 0
17 --listen-- --any-- 60739 0 0 20011 0
R1#
In the output of the show ip sockets command, you see a series of services functioning on certain open ports; however, DHCP/BOOTP is nowhere to be found. You would need to see UDP 67 if the DHCP service was running. This is certainly a problem, so you would enable the DHCP service using the service dhcp command. After enabling this service, you must wait a few seconds, because DHCP clients retry at different intervals. Meanwhile, you would use the show ip sockets command again and now see port 67 as an active port on R1. The results are shown in Example 6-16.
Note | The show ip sockets command has been replaced by the show udp and show sockets commands as of Cisco IOS Software Release 12.4(11)T. |
R1# conf t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)# service dhcp
R1(config)#
R1(config)# end
R1#
R1#
R1# show ip socket
Proto Remote Port Local Port In Out Stat TTY OutputIF
88 --listen-- 10.1.1.1 10 0 0 0 0
17 --listen-- 10.1.1.1 161 0 0 1001 0
17 --listen-- 10.1.1.1 162 0 0 1011 0
17 --listen-- 10.1.1.1 57767 0 0 1011 0
17 --listen-- --any-- 161 0 0 20001 0
17 --listen-- --any-- 162 0 0 20011 0
17 --listen-- --any-- 60739 0 0 20011 0
17 0.0.0.0 0 10.1.1.1 67 0 0 2211 0
R1#
Finally, you would check routers R2 and R3 (the DHCP clients) and see that they successfully obtained IP addresses. The problem was that the security auditors had shut down the DHCP service on the DHCP server.
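The check for UDP port 67 in the socket listing can also be scripted. The Python sketch below is a rough parser that assumes the column layout seen in Examples 6-15 and 6-16 (listener rows carry the local port in the fourth column, rows with a remote address carry it in the fifth); the function name is hypothetical, and the layout may differ on releases where the command has been replaced.

```python
def udp_local_ports(output: str) -> set:
    """Collect local UDP ports (IP protocol 17) from show ip sockets output."""
    ports = set()
    for line in output.splitlines():
        tokens = line.split()
        if len(tokens) < 4 or tokens[0] != "17":
            continue
        # Listener rows: proto, --listen--, local address, local port, ...
        # Other rows:    proto, remote address, remote port, local address, local port, ...
        index = 3 if tokens[1] == "--listen--" else 4
        if len(tokens) > index and tokens[index].isdigit():
            ports.add(int(tokens[index]))
    return ports


before = """88 --listen-- 10.1.1.1 10 0 0 0 0
17 --listen-- 10.1.1.1 161 0 0 1001 0
17 --listen-- --any-- 162 0 0 20011 0"""
after = before + "\n17 0.0.0.0 0 10.1.1.1 67 0 0 2211 0"

print("DHCP server port open before:", 67 in udp_local_ports(before))
print("DHCP server port open after: ", 67 in udp_local_ports(after))
```

The sample rows are copied from the outputs above, so the two print statements mirror the state of R1 before and after the service dhcp command.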
DHCP Troubleshooting Example: Duplicate Client IP Addresses
The second DHCP troubleshooting example is based on the topology shown in Figure 6-8. The IP address of router R1 on the Fast Ethernet interface was changed from 10.1.1.100 to 10.1.1.1 to comply with the new addressing scheme and policies of the network. This policy states that all branch routers will have the first IP address on any subnet that is being assigned to a network segment. After the change, some DHCP clients are reporting duplicated IP addresses. Clients state that this happens sporadically, a few times a week.
Because you know that the change happened at the router acting as the DHCP server (R1), it is good to get the troubleshooting process started there, while you ask the questions and try to gather more information on what the exact symptoms are. One piece of information that you have is that the IP address duplication happens sporadically, and one host at a time. Knowing that, perhaps your first order of business is to look at the lease times, and see whether they match the frequency of the symptoms. If the lease time of a Cisco IOS DHCP server is set to default values, there is really no command to display it other than show running-config | begin ip dhcp pool. The output of this command, displayed in Example 6-17, shows the vlan10 DHCP pool with a lease time of 3 days.
R1# show running-config | beg ip dhcp pool
ip dhcp pool vlan10
network 10.1.1.0 255.255.255.0
default-router 10.1.1.1
lease 3
The DHCP pool seems to be correct. Based on the reported symptoms, it seems logical to check whether there are statically assigned IP addresses conflicting with some of the addresses that are being delivered to the clients as part of the DHCP dynamic address pool. The show ip dhcp conflict command will verify whether the DHCP server has found overlap or duplication in the IP addresses that it has assigned. Example 6-18 shows the output of this command, entered on R1. One of the many conflicting addresses is 10.1.1.1, which is the new IP address of router R1 (the DHCP server itself) on interface fa0/0. However, you know that the DHCP server will not provide its own IP address to its clients.
R1# show ip dhcp conflict
IP address Detection method Detection time VRF
10.1.1.1 Gratuitous ARP Aug 23 2009 06:28 PM
10.1.1.3 Gratuitous ARP Aug 23 2009 06:29 PM
10.1.1.3 Gratuitous ARP Aug 23 2009 06:29 PM
10.1.1.4 Gratuitous ARP Aug 23 2009 06:29 PM
10.1.1.5 Gratuitous ARP Aug 23 2009 06:29 PM
10.1.1.6 Gratuitous ARP Aug 23 2009 06:29 PM
10.1.1.7 Gratuitous ARP Aug 23 2009 06:29 PM
10.1.1.8 Gratuitous ARP Aug 23 2009 06:29 PM
10.1.1.9 Gratuitous ARP Aug 23 2009 06:29 PM
10.1.1.10 Gratuitous ARP Aug 23 2009 06:29 PM
10.1.1.11 Gratuitous ARP Aug 23 2009 06:29 PM
10.1.1.12 Gratuitous ARP Aug 23 2009 06:29 PM
10.1.1.13 Gratuitous ARP Aug 23 2009 06:29 PM
--More--
Many devices such as servers and printers are usually not configured as DHCP clients and have static IP addresses instead. If their addresses are not excluded from the DHCP dynamic pool, there will definitely be conflict problems. You must check and verify which IP addresses are being excluded on R1, the DHCP server. You do that using the show running-config | include excluded command, as demonstrated in Example 6-19.
R1# show running-config | inc excluded
ip dhcp excluded-address 10.1.1.100
R1#
R1#
As per the output shown in Example 6-19, the only IP address excluded from the DHCP dynamic pool is 10.1.1.100, which is R1’s old address. The static addresses assigned to devices such as printers and servers are not excluded and are handed out to DHCP clients. The result is the address conflicts the users are experiencing. What you need to do is put 10.1.1.100 back in the pool, because it no longer needs to be excluded. You also need to exclude the range of addresses that are meant to be statically assigned. This range is 10.1.1.1 to 10.1.1.20. Example 6-20 shows this work.
R1#
R1# conf t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)# no ip dhcp excluded-address 10.1.1.100
R1(config)# ip dhcp excluded-address 10.1.1.1 10.1.1.20
R1(config)#
R1(config)# end
To ensure that the users will receive unique addresses from the DHCP server and will not incur any more address conflicts, you must renew IP address leases on all DHCP clients, especially those that have experienced conflicts before.
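The effect of the exclusion change can be verified offline with Python's ipaddress module. In the sketch below, the function names are hypothetical, and the list of static hosts is a sample (10.1.1.1 is R1 itself, 10.1.1.5 stands in for a printer or server); the subnet and exclusion ranges are the ones from this example.

```python
import ipaddress


def dynamic_addresses(network: str, excluded_ranges):
    """Return the host addresses a DHCP server could still hand out
    after removing the excluded (first, last) ranges."""
    excluded = set()
    for first, last in excluded_ranges:
        low = int(ipaddress.ip_address(first))
        high = int(ipaddress.ip_address(last))
        excluded.update(ipaddress.ip_address(n) for n in range(low, high + 1))
    return [h for h in ipaddress.ip_network(network).hosts() if h not in excluded]


def unprotected_statics(static_hosts, network: str, excluded_ranges):
    """Return static addresses that are still inside the dynamic pool."""
    pool = set(dynamic_addresses(network, excluded_ranges))
    return [h for h in static_hosts if ipaddress.ip_address(h) in pool]


statics = ["10.1.1.1", "10.1.1.5"]                 # sample static hosts
old_exclusions = [("10.1.1.100", "10.1.1.100")]
new_exclusions = [("10.1.1.1", "10.1.1.20")]

print(unprotected_statics(statics, "10.1.1.0/24", old_exclusions))
print(unprotected_statics(statics, "10.1.1.0/24", new_exclusions))
print(len(dynamic_addresses("10.1.1.0/24", new_exclusions)), "addresses left for clients")
```

Running the same check against the real list of statically addressed devices is a quick way to confirm that none of them remain exposed to the dynamic pool after the change.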
DHCP Troubleshooting Example: Relay Agent Issue
The final DHCP troubleshooting example is based on the diagram depicted in Figure 6-9. In this case, there is a centrally located DHCP server, represented by R4 in the topology diagram. The DHCP clients in network segment 10.1.1.0 are unable to obtain IP addresses and other parameters from the central DHCP server. R2 is a DHCP client that is having trouble acquiring an IP address, and R1 is the router that is supposed to act as a relay agent and forward DHCP messages between local clients and the DHCP server (R4).
Based on the symptom and the network topology, there are several possible causes:
- The clients could be misconfigured or faulty.
- The relay agent could be not configured or misconfigured.
- The server could be misconfigured or exhausted (or faulty/disabled).
- There could be network problems or filtering/security barriers.
Using a structured approach, you should start at the most simple and direct place, where you can start eliminating hypotheses quickly. If you start at the client side, you might need to renew DHCP addresses in multiple clients to prove the point. Because multiple clients are having the same problem, it is possible that they are all misconfigured, but that is not likely. It is simpler to check the relay agent. If you find no problem with the relay agent, you can then check the DHCP server. One of the quickest ways to verify DHCP relay agent operations is using the debug ip udp command. You use this debug on R1 and observe the results, as shown in Example 6-21.
R1#
R1# debug ip udp
UDP packet debugging is on
R1#
R1#
*Aug 23 19:01:05.303: UDP: rcvd src=0.0.0.0(68), dst=255.255.255.255(67),
length=584
*Aug 23 19:01:05.303: UDP: broadcast packet dropped, src=0.0.0.0,
dst=192.168.1.255
*Aug 23 19:01:08.911: UDP: rcvd src=0.0.0.0(68), dst=255.255.255.255(67),
length=584
*Aug 23 19:01:08.911: UDP: broadcast packet dropped, src=0.0.0.0,
dst=192.168.1.255
*Aug 23 19:01:12.911: UDP: rcvd src=0.0.0.0(68), dst=255.255.255.255(67),
length=584
*Aug 23 19:01:12.911: UDP: broadcast packet dropped, src=0.0.0.0,
dst=192.168.1.255
*Aug 23 19:01:35.795: UDP: rcvd src=0.0.0.0(68), dst=255.255.255.255(67),
length=584
*Aug 23 19:01:35.795: UDP: broadcast packet dropped, src=0.0.0.0,
dst=192.168.1.255
*Aug 23 19:01:38.911: UDP: rcvd src=0.0.0.0(68), dst=255.255.255.255(67),
length=584
*Aug 23 19:01:38.911: UDP: broadcast packet dropped, src=0.0.0.0,
dst=192.168.1.255
*Aug 23 19:01:42.911: UDP: rcvd src=0.0.0.0(68), dst=255.255.255.255(67),
length=584
*Aug 23 19:01:42.911: UDP: broadcast packet dropped, src=0.0.0.0,
dst=192.168.1.255
As you can see in the output shown in Example 6-21, R1 is certainly receiving DHCP requests. The UDP/IP packets shown have a source address of 0.0.0.0, destination address of 255.255.255.255 with source UDP port of 68 (DHCP client) and destination UDP port of 67 (DHCP server). The problem could be that the fa0/0 interface facing the DHCP client is missing the ip helper-address command pointing to 192.168.1.4. Checking the configuration reveals that this command is indeed missing, so adding this command is the first thing you have to do. Next, you can use the debug ip udp command on R4, the DHCP server. Example 6-22 shows the result.
R1(config)# int fa0/0
R1(config-if)# ip helper-address 192.168.1.4
R1(config-if)# end
R4#
R4#
R4#
*Aug 23 19:31:39.303: UDP: sent src=0.0.0.0(67), dst=255.255.255.255(68),
length=308
*Aug 23 19:31:39.303: UDP: rcvd src=0.0.0.0(68), dst=255.255.255.255(67),
length=584
*Aug 23 19:31:39.303: UDP: sent src=0.0.0.0(67), dst=255.255.255.255(68),
length=308
*Aug 23 19:31:40.159: UDP: rcvd src=0.0.0.0(68), dst=192.168.1.4(67), length=584
*Aug 23 19:31:44.159: UDP: rcvd src=0.0.0.0(68), dst=192.168.1.4(67), length=584
*Aug 23 19:31:46.307: UDP: rcvd src=10.1.1.11(53470), dst=255.255.255.255(69),
length=30
*Aug 23 19:31:49.307: UDP: rcvd src=10.1.1.11(53470), dst=255.255.255.255(69),
length=30
*Aug 23 19:31:53.307: UDP: rcvd src=10.1.1.11(53470), dst=255.255.255.255(69),
length=30
*Aug 23 19:31:58.307: UDP: rcvd src=10.1.1.11(53470), dst=255.255.255.255(69),
length=30
*Aug 23 19:32:04.307: UDP: rcvd src=10.1.1.11(53470), dst=255.255.255.255(69),
length=30
*Aug 23 19:32:11.307: UDP: rcvd src=10.1.1.11(53470), dst=255.255.255.255(69),
length=30
*Aug 23 19:32:19.307: UDP: rcvd src=10.1.1.11(53470), dst=255.255.255.255(69),
length=30
*Aug 23 19:32:28.439: UDP: rcvd src=10.1.1.11(53470), dst=255.255.255.255(69),
length=29
*Aug 23 19:32:31.439: UDP: rcvd src=10.1.1.11(53470), dst=255.255.255.255(69),
length=29
*Aug 23 19:32:35.439: UDP: rcvd src=10.1.1.11(53470), dst=255.255.255.255(69),
length=29
*Aug 23 19:32:37.011: UDP: rcvd src=0.0.0.0(68), dst=192.168.1.4(67), length=584
Finally, you verify the status of the DHCP clients, such as R2, in the 10.1.1.0 subnet, and see that they are acquiring IP addresses and other parameters from the DHCP server.
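When debug ip udp produces a lot of output, a short script can pick out the DHCP-related lines. The Python sketch below classifies lines by their UDP ports; the function name and the category labels are hypothetical, and the regular expression assumes the line format shown in Example 6-22.

```python
import re

UDP_LINE = re.compile(
    r"UDP: (?P<direction>rcvd|sent) src=(?P<src>[\d.]+)\((?P<sport>\d+)\), "
    r"dst=(?P<dst>[\d.]+)\((?P<dport>\d+)\)"
)


def classify(line: str):
    """Return (direction, kind) for a debug ip udp line, or None if the
    line is not a sent/received record."""
    match = UDP_LINE.search(line)
    if match is None:
        return None
    sport, dport = int(match.group("sport")), int(match.group("dport"))
    if sport == 68 and dport == 67:
        if match.group("dst") == "255.255.255.255":
            kind = "client broadcast"
        else:
            kind = "relayed client request"
    elif sport == 67 and dport == 68:
        kind = "server reply"
    else:
        kind = "other UDP"
    return match.group("direction"), kind


sample = "*Aug 23 19:31:40.159: UDP: rcvd src=0.0.0.0(68), dst=192.168.1.4(67), length=584"
print(classify(sample))
```

On the server, the appearance of client requests addressed to the server's own unicast address (192.168.1.4 here) is the sign that the relay agent is now forwarding them.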