Troubleshooting Secure Networks
To troubleshoot effectively and efficiently in a secure infrastructure, it is important to understand which features are deployed and how they operate. Most security features operate at the transport layer and above. Therefore, it is important to be familiar with the generic principles of troubleshooting Layer 4 connectivity. Using a generic troubleshooting process can help to determine whether the problems are likely to be related to security features or caused by underlying Layer 1, 2, or 3 connectivity issues. After you have determined that the problem must be related to a security feature, you need to have detailed knowledge of the specific feature to diagnose and resolve the issue. Depending on your organization, you might need to escalate the issue to a security specialist once you arrive at this point in the troubleshooting process.
Troubleshooting Challenges in Secured Networks
The objective of building a network infrastructure is to enable connectivity between devices or between different parts of a network. The objective of security, however, is the complete opposite. Security features usually aim to restrict connectivity and only allow traffic that is specifically permitted by the security policy to support the organization’s business processes. This adds another dimension to network troubleshooting. In a network that is completely open and has been designed to enable connectivity from any point to any point in the network, it is easy to validate a reported problem. If there is no connectivity between any two hosts or applications, there is definitely a problem. In that case, you initiate a troubleshooting process to isolate the problem, find the point where connectivity is failing, and restore the connectivity by implementing the necessary changes.
In a secured environment, a reported connectivity problem does not automatically translate to a valid problem that needs to be resolved. First, you need to determine whether the reported lack of connectivity actually concerns traffic that is authorized by the security policy of your organization. If it does not, the problem is not a technical problem and should be resolved at the business level. If there is a valid business reason to grant a user access to a resource that the user is currently not allowed to reach, the security policy may need to be reevaluated and changed. Eventually, changes in the policy should result in changes in the implementation, such as the addition of a new firewall rule. After you have validated the problem and determined that the reported lack of connectivity concerns traffic that should be allowed according to the policy, the implemented security features complicate the troubleshooting process, because they add more potential causes that need to be eliminated to diagnose the problem. Therefore, it is vital that you know which security features have been implemented at each point in your network, because that will help you quickly assess whether a misconfigured security feature might be causing the problem.
Once you have partially diagnosed a problem and want to establish whether it is caused by a security feature, a useful diagnostic technique is to temporarily disable the security feature and see whether that fixes the problem. However, you should realize that by doing so, you are creating a situation that is not compliant with the security policy, and therefore you are taking the risk of opening the network to an attack. Always weigh the potential risk of the change against the criticality of the problem that you are troubleshooting. For example, imagine that you are troubleshooting a network problem caused by the fact that two Open Shortest Path First (OSPF) routers do not establish an OSPF adjacency, and the security policy states that Message Digest 5 (MD5) authentication should be used between OSPF routers. You might consider authentication as a possible cause of the problem. In that case, it might be permissible to temporarily remove the OSPF authentication between the two routers to confirm or eliminate it as the source of the problem, because the criticality and urgency of the problem are high, and the risk of an actual security incident if you remove OSPF authentication for a few minutes is low. Now imagine that you are troubleshooting a connectivity problem of a user who cannot reach a particular site on the Internet. You suspect that this might be caused by an access list on the perimeter router. In this case, it might not be safe to temporarily remove the access list to verify that it is indeed the cause of the problem, because removing it would expose the network to unfiltered traffic from the Internet.
In conclusion, if you find that disabling a security feature restores connectivity, you cannot consider disabling it an actual solution to the problem. Even though it is beneficial to establish that the security feature is causing the problem, a change that is not compliant with the security policy is not an acceptable solution; it is a workaround at best. In those situations, you must roll back the change and continue your troubleshooting process until you find a solution that complies with the security policy.
Security Features Review
In the past, network security was often implemented as an additional layer on top of the network infrastructure. Special devices, such as firewalls, VPN concentrators, and intrusion detection systems (IDS), were added to the network to implement specific security features, while the routers and switches would provide the basic network connectivity and not be involved in the security aspects of the network. A common approach was to design and implement a network infrastructure to provide connectivity and then design and implement a security solution on top of that. Over the years, it was realized that a more holistic approach to security is necessary to build secure networks. A system is only as secure as its weakest component, and therefore, the security risks and vulnerabilities of each component and layer of the network should be evaluated and addressed. In addition to using specialized security devices, such as firewalls, intrusion prevention systems (IPS), and VPN concentrators, network devices such as routers and switches and the protocols that are used between these devices should be secured. If the network infrastructure itself is compromised, the entire system can be compromised. In addition, in smaller networks, the router might have a dual role, functioning as both a router and a security device by providing firewall, IPS, or VPN services. Finally, infrastructure devices such as routers and switches can function as components in a distributed security system.
The implementation of security features can affect router and switch operation on different planes. On a network device, there are three fundamental categories of functionality, also called functional planes. These planes and their corresponding types of traffic must be secured. The three main functional planes are as follows:
-
Management plane: The management plane represents all the functions and protocols involved in managing the device. This includes accessing information about the device configuration, device operation, and statistics. It also includes changing the device configuration to alter its behavior. Securing this plane is vital to the overall security of the device. If the management plane is compromised, the other planes are also exposed.
-
Control plane: The control plane represents all the functions and protocols that are used between network devices to control the operation of the network, such as routing protocols, the Spanning Tree Protocol (STP), or other signaling and control protocols. Because the control plane affects the behavior of the data plane, the control plane protocols need to be secured. If unauthorized devices are allowed to participate in the control plane protocols, this opens up possibilities for an attacker to block or divert traffic.
-
Data plane: The data plane represents all the functions involved in forwarding traffic through the device. Routers and switches can inspect and filter traffic as part of the implementation of a security policy. It is important to note that all management and control plane traffic also flows through the data plane. Consequently, security features on the data plane can be used to secure the management and control planes as well. This also implies that failures on the management and control planes may be caused by the implementation of security features on the data plane.
Routers and switches are commonly accessed using three different methods:
-
The Cisco IOS command-line interface (CLI)
-
Web-based device management
-
A network management platform based on Simple Network Management Protocol (SNMP)
Each of these methods should be used in the most secure way possible, based on the device type, the capabilities of its operating system (Cisco IOS), and the security policies of the organization.
The Management Plane
The CLI that is part of Cisco IOS Software is the most common and most powerful method to manage routers and switches. Commands can be entered directly through a serial connection to the console of the device, or remotely through Telnet or Secure Shell (SSH). For all these methods, at the very least, some form of authentication should be implemented to ensure that only authorized personnel can access and configure the network devices. Furthermore, it is recommended to restrict the network locations from which these devices can be accessed. Moreover, because it is more secure, SSH should be used as the access method rather than Telnet whenever possible. Telnet transmissions contain unencrypted data (including the password), whereas SSH uses encryption to secure its transmissions. If it is not possible to use SSH to the devices themselves and Telnet is the only option, additional precautions need to be taken to secure the management traffic. For example, Telnet access could be restricted to a secure part of the network that is used only for management, and access to this network itself could be secured through VPN techniques or through SSH via a bastion host (a host that may be exposed but is hardened to withstand attacks). Finally, you should be aware that the physical security of the device itself is also vital to the security of the management plane. The CLI can always be accessed through the serial console of the device itself, and therefore having physical access to the device implies having access to the command line. Again, authentication can limit access, but you have to be aware that if someone has access to the console of the device and the ability to power cycle the device, that person can perform the password recovery procedure and gain control of the device.
Note | On some devices, the impact of a successful password recovery procedure can be limited by use of the no service password-recovery command. When this command is enabled, the original configuration and passwords of the device cannot be recovered using the password recovery procedure. However, the device configuration can still be erased, which resets the device to factory defaults. If no additional control plane and data plane security measures have been implemented and the attacker has sufficient knowledge of the network, it may be possible to rebuild the configuration and gain access to the network. |
An alternative method to manage routers and switches is to use a web-based device manager, such as Cisco Configuration Professional (CCP) or Security Device Manager (SDM), which is installed either on the device itself or on a PC. The protocol used by these web-based device managers is either HTTP or HTTPS. HTTPS is more secure than HTTP because it uses encryption to secure its transmissions, while HTTP transmits unencrypted data. The arguments used in the discussion of Telnet versus SSH apply equally to HTTP versus HTTPS. Authentication should be implemented to restrict web-based access to authorized users only, and it is safer to restrict the locations from which these devices can be accessed.
A third method to access the management functions of the device is by use of a network management platform using SNMP. Most commonly, this method is only used to access operational parameters and statistics of the device, not to change the configuration. In that case, devices are configured for read-only access, and the configuration cannot be changed through SNMP. From a security standpoint, this means that the associated security risk is generally lower. If the devices have been configured for read-write access, the configuration can be changed, and the same level of security should be applied as for command-line or web-based access.
Authentication, authorization, and accounting (AAA) is a major component of the network security of the organization. AAA is the basis for providing secure remote access to the network and remote management of network devices. Network devices can use a centralized security server containing all the security policies that define the list of users and what they are allowed to do. TACACS+ and RADIUS are the commonly used protocols to communicate with a centralized (AAA) security server such as Cisco Secure Access Control Server (ACS). The following are some of the main characteristics of these protocols:
-
RADIUS combines authentication and authorization. The access-accept packets sent by the RADIUS server to the client contain authorization information. This makes it difficult to decouple authentication and authorization. On the other hand, TACACS+ uses the AAA architecture, which decouples authentication and authorization.
-
RADIUS uses UDP, whereas TACACS+ uses TCP (port 49). RADIUS uses UDP port 1812 (or 1645) for authentication, and UDP port 1813 (or 1646) for accounting messages.
-
RADIUS encrypts only the password in the access-request packet, from the client to the server. The remainder of the packet is unencrypted. Other information, such as username, authorized services, and accounting, can be captured by a third party. TACACS+ encrypts the entire body of the packet but leaves a standard TACACS+ header. Within this header, a field indicates whether the body is encrypted. For debugging purposes, it is useful to have the body of the packets unencrypted; however, during normal operation, the body of the packet is fully encrypted for more secure communications.
-
RADIUS does not allow specification (or enforcement) of which commands can and cannot be executed on a router on a per-user basis. Therefore, RADIUS is less useful for router management and less flexible for terminal services. TACACS+ provides two methods to control the authorization of router commands on a per-user or per-group basis:
-
Assign privilege levels to commands and have the router verify with the TACACS+ server whether the user is authorized at the specified privilege level
-
Explicitly specify in the TACACS+ server, on a per-user or per-group basis, the commands that are allowed
-
RADIUS has extensive accounting capabilities, whereas TACACS+ has limited accounting capabilities.
-
RADIUS is based on an open standard (RFC 2865); TACACS+ was developed by Cisco Systems.
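To make the point about RADIUS encrypting only the password concrete, the following minimal Python sketch follows the User-Password hiding procedure of RFC 2865 (section 5.2) and the Response Authenticator calculation of RFC 2865 (section 3). Both depend only on MD5 and the shared secret. Every other attribute, including the username, travels in the clear. The function names, keys, and password are hypothetical and chosen only for illustration. This is a sketch of the protocol mechanics, not production RADIUS code.

```python
import hashlib
import os
import struct


def hide_password(password: bytes, secret: bytes, request_auth: bytes) -> bytes:
    """Hide a User-Password value as in RFC 2865, section 5.2 (MD5 chaining, not real encryption)."""
    # Pad with NUL bytes to a multiple of 16 octets (at least one block).
    padded = password.ljust(max(16, -(-len(password) // 16) * 16), b"\x00")
    hidden = b""
    prev = request_auth  # the first block is chained to the Request Authenticator
    for i in range(0, len(padded), 16):
        digest = hashlib.md5(secret + prev).digest()
        block = bytes(p ^ d for p, d in zip(padded[i:i + 16], digest))
        hidden += block
        prev = block  # later blocks are chained to the previous hidden block
    return hidden


def reveal_password(hidden: bytes, secret: bytes, request_auth: bytes) -> bytes:
    """What the server does: undo the XOR chain and strip the NUL padding."""
    plain = b""
    prev = request_auth
    for i in range(0, len(hidden), 16):
        block = hidden[i:i + 16]
        digest = hashlib.md5(secret + prev).digest()
        plain += bytes(c ^ d for c, d in zip(block, digest))
        prev = block
    return plain.rstrip(b"\x00")


def response_authenticator(code: int, identifier: int, request_auth: bytes,
                           attributes: bytes, secret: bytes) -> bytes:
    """MD5(Code + Identifier + Length + Request Authenticator + Attributes + Secret)."""
    length = 20 + len(attributes)  # 20-byte fixed header plus attributes
    header = struct.pack("!BBH", code, identifier, length)
    return hashlib.md5(header + request_auth + attributes + secret).digest()


if __name__ == "__main__":
    request_auth = os.urandom(16)  # random for every Access-Request
    hidden = hide_password(b"s3cret", b"client-key", request_auth)
    print(reveal_password(hidden, b"client-key", request_auth))  # the original password
    print(reveal_password(hidden, b"server-key", request_auth))  # garbage: the keys differ
```

A mismatched shared secret breaks both calculations. The server recovers a garbage password, so it answers with Access-Reject. The client also cannot validate the Response Authenticator of the reply. This combination is consistent with the shared-key mismatch symptoms that RADIUS debug output typically shows.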
Securing the Management Plane
There are two common techniques to secure management access to network devices. First, access to the management plane can be allowed only from specific source IP addresses or networks, using packet or session filters. Second, users must be authenticated before they are granted access. Source-based restriction can be implemented in several ways. Because all access to the management plane goes through the data plane, generic access lists can be applied to the interfaces of the device to restrict access to the management IP addresses and interfaces of the device. If packet filtering using access lists or the Cisco IOS firewall feature has already been implemented, this is just a small addition to the existing filtering policies. If packet filtering or firewalling has not been configured yet, this might not be the best way to secure the management plane, because it also affects the forwarding of all data plane traffic. Alternatively, all the management protocols, such as Telnet, SSH, HTTP, HTTPS, and SNMP, allow access lists to be applied specifically to the sessions directed to these management plane applications. Incoming access requests are evaluated against an access list and permitted or denied based on the source of the session. The advantage of this method is that only management plane traffic is affected, and the forwarding of data plane traffic is unaffected.
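Source-based filtering of management sessions behaves like an IOS standard access list. Entries are evaluated from top to bottom, the first match decides, and an implicit deny ends the list. The following Python sketch models that decision. The addresses, the MGMT_ACL list, and the management_access_allowed function are hypothetical. On a real device, the equivalent would be an access list applied to the management service itself, such as the VTY lines used for Telnet and SSH.

```python
import ipaddress

# Hypothetical management policy, evaluated top-down like an IOS standard ACL:
# the first matching entry wins, and anything that matches no entry is denied.
MGMT_ACL = [
    ("permit", ipaddress.ip_network("10.1.100.0/24")),     # assumed management subnet
    ("deny", ipaddress.ip_network("10.1.0.0/16")),         # rest of the internal network
    ("permit", ipaddress.ip_network("192.168.50.10/32")),  # assumed bastion host
]


def management_access_allowed(source_ip: str, acl=MGMT_ACL) -> bool:
    """Return True if a management session from source_ip would be permitted."""
    addr = ipaddress.ip_address(source_ip)
    for action, network in acl:
        if addr in network:
            return action == "permit"
    return False  # implicit deny at the end of every access list
```

During troubleshooting, the same first-match reasoning explains why a session from an unexpected source address is refused, even though the user's credentials are valid.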
After a management session is established from an authorized source IP address, it is necessary to authenticate the user that is attempting to access the device. This can be done through a simple password authentication or an authentication based on a username and password combination. These usernames and passwords can be stored locally on the device itself or, for a more scalable solution, they can be stored in a central database. The router or switch can verify these credentials against the central database through use of the TACACS+ or RADIUS protocols. In addition to centralized authentication, the use of a TACACS+ or RADIUS server also allows for added authorization and accounting capabilities. Furthermore, the use of a TACACS+ or RADIUS server also opens up the possible use of more sophisticated and secure authentication methods such as token card services.
From a troubleshooting standpoint, it is important to know the answer to the following questions:
-
What security policies have been implemented for management access to the devices?
-
From which IP addresses or networks can the network devices be accessed?
-
What type of authentication, authorization, and accounting is used on the network?
-
If centralized AAA services are deployed, what happens when these servers fail or become unreachable?
-
Are there any backdoors or fallback mechanisms to access the devices?
Clearly, the more restrictive the policy is, the more secure the network will be. On the other hand, you have to be careful not to create a situation where devices cannot be accessed during a network outage. In those cases, a very dangerous catch-22 situation can arise: a network problem causes you to lose management access to the devices. To diagnose and resolve the problem, you need access to the devices, but to regain access to the devices, you first need to solve the problem. Eventually, you might have no option other than to gain physical access to the device and perform a password recovery procedure to reach its command line.
Cisco Secure ACS offers centralized command and control for all user authentication, authorization, and accounting from a web-based, graphical interface (see Figure 9-1).
Under the Reports and Activity option, Cisco Secure ACS provides the following mechanisms to troubleshoot AAA-related problems:
-
Accounting Reports: The TACACS+ and RADIUS accounting reports contain a record of all successful authentications during the period covered by the report. Information captured includes time, date, username, type of connection, amount of time logged in, and bytes transferred.
-
Administration Reports: This report contains all TACACS+ commands requested during the period covered by the report. This is typically used when Cisco Secure ACS is being used to manage access to routers.
-
Passed Authentications: This report lists successful authentications during the period covered by the report.
-
Failed Attempts: This report contains a record of all unsuccessful authentications during the period covered by the report for both TACACS+ and RADIUS. The reports capture the username attempted, time, date, and cause of failure.
-
Logged-in Users: This report shows all users currently logged in, grouped by AAA client.
-
Disabled Accounts: This list contains accounts that have been disabled. They might have been manually disabled or disabled automatically based on the aging information defined under User Setup.
Troubleshooting Security Implementations in the Management Plane
Authentication provides a method for identifying users based on credentials such as a username and password. Access to network devices or network services should be granted only after the user’s identity has been verified. AAA authentication on routers and switches is configured by defining a named list of authentication methods to be used for a specific service, such as logging in to the device or connecting to an interface using PPP. This list, also called a method list, determines which types of authentication will be used and the sequence in which the different authentication methods will be tried. If a default method list is defined for a particular service, that list is used unless a different method list is specifically applied. In other words, the default list can be overruled by specifying an alternative named method list, which must then be explicitly assigned to a line (for logins) or an interface (for network-based authentication). The output of the debug aaa authentication command can be very useful for troubleshooting AAA authentication problems. Example 9-1 shows the output of debug aaa authentication, together with debug tacacs, for a successful login attempt.
Router# debug tacacs
Router# debug aaa authentication
13:21:20: AAA/AUTHEN: create_user user='' ruser='' port='tty6'
rem_addr='10.0.0.32' authen_type=1 service=1 priv=1
13:21:20: AAA/AUTHEN/START (0): port= 'tty6' list='' action=LOGIN service=LOGIN
13:21:20: AAA/AUTHEN/START (0): using "default" list
13:21:20: AAA/AUTHEN/START (70215483): Method=TACACS+
13:21:20: TAC+ (70215483): received authen response status = GETUSER
13:21:20: AAA/AUTHEN (70215483): status = GETUSER
13:21:23: AAA/AUTHEN/CONT (70215483): continue_login
13:21:23: AAA/AUTHEN (70215483): status = GETUSER
13:21:23: AAA/AUTHEN (70215483): Method=TACACS+
13:21:23: TAC+ : send AUTHEN/CONT packet
13:21:23: TAC+ (70215483): received authen response status = GETPASS
13:21:23: AAA/AUTHEN (70215483): status = GETPASS
13:21:27: AAA/AUTHEN/CONT (70215483): continue_login
13:21:27: AAA/AUTHEN (70215483): status = GETPASS
13:21:27: AAA/AUTHEN (70215483): Method=TACACS+
13:21:27: TAC+ : send AUTHEN/CONT packet
13:21:27: TAC+ (70215483): received authen response status = PASS
13:21:27: AAA/AUTHEN (70215483): status = PASS
The debug aaa authentication output captured in Example 9-1 shows the following events:
-
A remote user with the IP address 10.0.0.32 attempts to log in to the router.
-
The router checks to see whether AAA authentication for the LOGIN service is enabled (and it is).
-
Because no other authentication method list is applied, the router uses the default method list, whose first method is TACACS+.
-
The authentication process then prompts the user for a username and password; when the router receives these credentials, it sends them to the TACACS+ server for verification.
-
The TACACS+ server checks the credentials against its database and responds with a PASS status as a sign of successful authentication.
Note | In Example 9-1, the lines in the output that are preceded by AAA are generated by the debug aaa authentication command, while the lines preceded by TAC+ are generated by the debug tacacs command. |
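One rule about method lists matters a great deal when troubleshooting. IOS moves to the next method in the list only when the current method returns an error, for example when the server does not respond. If the server answers and rejects the user, that FAIL is final, and later methods such as the local database are never tried. In Example 9-4, this rule shows up as status = ERROR followed by Method=LOCAL. The following Python sketch models the rule. The Status enum, run_method_list, and the sample methods and local account are hypothetical.

```python
from enum import Enum


class Status(Enum):
    PASS = "PASS"
    FAIL = "FAIL"    # the method answered and rejected the user
    ERROR = "ERROR"  # the method could not answer, e.g. server unreachable


def run_method_list(methods, user, password):
    """Try (name, check) pairs in order; fall through to the next one only on ERROR."""
    name, status = "none", Status.ERROR
    for name, check in methods:
        status = check(user, password)
        if status is not Status.ERROR:
            break  # PASS or FAIL is final; later methods are never consulted
    return name, status


# Hypothetical methods for illustration.
LOCAL_USERS = {"admin": "localpass"}  # assumed local fallback account


def tacacs_down(user, password):
    return Status.ERROR  # e.g. "Connection refused by remote host"


def tacacs_rejects(user, password):
    return Status.FAIL  # server reachable, credentials rejected


def local_db(user, password):
    return Status.PASS if LOCAL_USERS.get(user) == password else Status.FAIL
```

With [("TACACS+", tacacs_down), ("LOCAL", local_db)], the local fallback account can still log in. With tacacs_rejects instead, the same account is refused, because the server gave a definitive answer. This rule is why a local fallback account helps during a server outage but not when the server itself is misconfigured.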
Authorization determines what resources the user has access to (on the router, for example). In other words, AAA authorization assembles a set of attributes that describe what tasks the user is authorized to perform. These attributes are compared to the information contained in a database for a given user, and the result is returned to AAA to determine the user’s actual capabilities and restrictions. The database can be located locally on the device or it can be hosted remotely on a RADIUS or TACACS+ security server.
As with authentication, you configure AAA authorization by defining a named list of authorization methods, and then applying that list to various interfaces. As demonstrated in Example 9-2, the output of the debug aaa authorization command can prove quite useful to troubleshoot AAA authorization problems.
Router# debug aaa authorization
2:23:21: AAA/AUTHOR (0): user= 'admin1'
2:23:21: AAA/AUTHOR (0): send AV service=shell
2:23:21: AAA/AUTHOR (0): send AV cmd*
2:23:21: AAA/AUTHOR (342885561): Method=TACACS+
2:23:21: AAA/AUTHOR/TAC+ (342885561): user=admin1
2:23:21: AAA/AUTHOR/TAC+ (342885561): send AV service=shell
2:23:21: AAA/AUTHOR/TAC+ (342885561): send AV cmd*
2:23:21: AAA/AUTHOR (342885561): Post authorization status = FAIL
The debug aaa authorization output captured in Example 9-2 shows the following events:
-
The user (admin1) is attempting to do something that requires authorization (in Example 9-2, the user attempts to gain an EXEC shell service).
-
The cmd parameter specifies a command that the user is trying to execute. If * is listed, it refers to plain EXEC access.
-
The method used to authorize this access is TACACS+.
-
The router sends the necessary information (the username and the requested service and command attributes) through TACACS+ to the security server.
-
The security server verifies the authorization, determines that the user is not authorized to perform this function, and sends back a FAIL status response.
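The PASS or FAIL decision returned by the security server can be pictured with the two TACACS+ command-authorization approaches listed earlier: a privilege level per group, and an optional explicit list of permitted commands. The following Python sketch models only that decision. It does not reflect how Cisco Secure ACS actually stores its policy. The GROUPS and USER_GROUP tables and the authorize_command function are hypothetical.

```python
# Hypothetical server-side policy. Each group gets a maximum privilege level and,
# optionally, an explicit set of permitted commands (None means "any command").
GROUPS = {
    "netadmins": {"max_priv": 15, "commands": None},
    "helpdesk": {"max_priv": 1, "commands": {"show", "ping", "traceroute"}},
}
USER_GROUP = {"alice": "netadmins", "admin1": "helpdesk"}


def authorize_command(user: str, command: str, priv: int) -> str:
    """Return "PASS" or "FAIL", the two outcomes reported by debug aaa authorization."""
    group = GROUPS.get(USER_GROUP.get(user))
    if group is None or priv > group["max_priv"]:
        return "FAIL"  # unknown user, or the command needs a higher privilege level
    words = command.split()
    if group["commands"] is not None and (not words or words[0] not in group["commands"]):
        return "FAIL"  # command not in the group's explicit permit list
    return "PASS"
```

When debug aaa authorization reports a FAIL status for a user who is convinced the command should work, checking the group membership and the per-group limits on the server is usually the next step.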
Accounting provides a method for collecting and sending information about user activities to the security server for billing, auditing, and reporting purposes. This information includes user identities, start and stop times, executed commands (such as PPP), and the number of packets and bytes sent and received. When AAA accounting is activated, the network access server reports user activity to the RADIUS or TACACS+ security server in the form of accounting records. This data can then be analyzed for network management, client billing, or auditing purposes. All accounting methods must be defined through AAA. As with authentication and authorization, you configure AAA accounting by defining a named list of accounting methods and applying that list to various interfaces. As demonstrated in Example 9-3, the output of the debug aaa accounting command is very informative and can help you troubleshoot AAA accounting problems.
Router# debug aaa accounting
May 10 14:48:33.011: AAA/ACCT/EXEC(00000005): Pick method list 'default'
May 10 14:48:33.011: AAA/ACCT/SETMLIST(00000005): Handle 0, mlist 81CA79CC,
Name default
May 10 14:48:33.011: Getting session id for EXEC (00000005): db=82099258
May 10 14:48:33.011: AAA/ACCT/EXEC(00000005): add, count 2
May 10 14:48:33.011: AAA/ACCT/EVENT(00000005): EXEC UP
The debug aaa accounting output captured in Example 9-3 shows a scenario where the default method was used, and that a user successfully gained access to the router’s EXEC shell.
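Accounting reports derive values such as the amount of time logged in by pairing the start and stop records of each session. The following Python sketch shows that pairing. The (session_id, user, record_type, timestamp) tuple format and the pair_accounting_records function are hypothetical and do not reflect the actual record format of IOS or Cisco Secure ACS.

```python
from datetime import datetime


def pair_accounting_records(records):
    """Match start and stop records by session ID.

    Returns (completed, still_open): completed maps session ID -> (user, duration),
    still_open maps session ID -> (user, start time) for sessions with no stop record yet.
    A stop record with no matching start record is ignored.
    """
    still_open = {}
    completed = {}
    for session_id, user, record_type, stamp in records:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if record_type == "start":
            still_open[session_id] = (user, when)
        elif record_type == "stop" and session_id in still_open:
            start_user, started = still_open.pop(session_id)
            completed[session_id] = (start_user, when - started)
    return completed, still_open
```

A session that stays in still_open long after the user has logged out, or a stop record with no matching start, may indicate that accounting records are not reaching the server.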
There are a number of common TACACS+ failures, including the following:
-
A TACACS+ server goes down or a device (TACACS client) cannot connect to the server. To be ready for a situation like this, you may want to configure the network device to use the local database for authenticating critical users.
-
A device (client) shared key and the server’s shared key do not match.
-
User credentials (username, password, or both) are rejected by the server.
Example 9-4 shows TACACS messages regarding these common cases.
Router# debug tacacs
Router# debug aaa authentication
! The TACACS+ server is down or the device has no connectivity to the server:
TAC+: TCP/IP open to 171.68.118.101/49 failed-
Connection refused by remote host
AAA/AUTHEN (2546660185): status = ERROR
AAA/AUTHEN/START (2546660185): Method=LOCAL
AAA/AUTHEN (2546660185): status = FAIL
As1 CHAP: Unable to validate response. Username chapuser: Authentication failure
! The key on the device and TACACS+ server do not match:
TAC+: received bad AUTHEN packet: length = 68, expected 67857
TAC+: Invalid AUTHEN/START packet (check keys)
AAA/AUTHEN (1771887965): status = ERROR
! Bad user name, bad password, or both:
AAA/AUTHEN: free_user (0x170940) user= 'chapuser' ruser= ''
Port= 'Async1' rem_addr= 'async' authen_type=CHAP service=PPP priv=1
TAC+: Closing TCP/IP 0x16EF4C connection to 171.68.118.101/49
AAA/AUTHEN (2082151566): status = FAIL
As1 CHAP: Unable to validate Response. Username papuser: Authentication failure
Note | In Example 9-4, both the debug tacacs and the debug aaa authentication commands have been enabled. The lines in the output that are preceded by AAA are generated by the debug aaa authentication command, while the lines preceded by TAC+ are generated by the debug tacacs command. |
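The "check keys" hint in Example 9-4 makes sense once you see how TACACS+ hides the packet body. RFC 8907 describes XORing the body with a pseudo-pad built from chained MD5 hashes of the session ID, the shared key, the version, and the sequence number. The header carries a flag that marks an unobfuscated body. The following Python sketch implements that pad and a simplified authentication START body. The helper names and sample values are hypothetical.

```python
import hashlib
import struct

TAC_PLUS_UNENCRYPTED_FLAG = 0x01  # header flag bit: body is sent in the clear
TAC_PLUS_VERSION = 0xC0           # major version 0xc, minor version 0


def body_is_obfuscated(flags: int) -> bool:
    return not flags & TAC_PLUS_UNENCRYPTED_FLAG


def pseudo_pad(session_id: int, key: bytes, version: int, seq_no: int, length: int) -> bytes:
    """MD5_1 = MD5(session_id, key, version, seq_no); MD5_n = MD5(same fields, MD5_n-1)."""
    prefix = struct.pack("!I", session_id) + key + bytes([version, seq_no])
    pad, prev = b"", b""
    while len(pad) < length:
        prev = hashlib.md5(prefix + prev).digest()
        pad += prev
    return pad[:length]


def crypt_body(body: bytes, session_id: int, key: bytes, seq_no: int,
               version: int = TAC_PLUS_VERSION) -> bytes:
    """XOR the body with the pseudo-pad; the same call obfuscates and restores it."""
    pad = pseudo_pad(session_id, key, version, seq_no, len(body))
    return bytes(b ^ p for b, p in zip(body, pad))


def authen_start_body(user: bytes, port: bytes, rem_addr: bytes) -> bytes:
    # action=LOGIN, priv_lvl=1, authen_type=ASCII, authen_service=LOGIN,
    # then the user, port, rem_addr, and data length fields (data is empty).
    header = bytes([0x01, 0x01, 0x01, 0x01, len(user), len(port), len(rem_addr), 0])
    return header + user + port + rem_addr


def start_body_is_consistent(body: bytes) -> bool:
    """Receiver sanity check: the embedded length fields must add up to the body length."""
    if len(body) < 8:
        return False
    user_len, port_len, rem_addr_len, data_len = body[4:8]
    return 8 + user_len + port_len + rem_addr_len + data_len == len(body)
```

When the two ends use different keys, the receiver XORs with the wrong pad, and the length fields it reads back are effectively random. This is consistent with the bad packet and implausible expected length that IOS reports together with its "check keys" hint.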
The common RADIUS problems are similar to the common TACACS+ problems. Example 9-5 shows cases of RADIUS server’s failure or loss of network connectivity, mismatch of the shared key between the RADIUS server and the network device (RADIUS client), user authorization failure, and finally, a case of bad username, password, or both.
Router# debug radius
Router# debug aaa authentication
! The RADIUS server is down or the device has no connectivity to the server:
As1 CHAP: I RESPONSE id 12 len 28 from "chapadd"
RADIUS: id 15, requestor hung up.
RADIUS: No response for id 15
RADIUS: No response from server
AAA/AUTHEN (1866705040): status = ERROR
AAA/AUTHEN/START (1866705040): Method=LOCAL
AAA/AUTHEN (1866705040): status = FAIL
As1 CHAP: Unable to validate Response. Username chapadd: Authentication failure
As1 CHAP: O FAILURE id 13 len 26 msg is "Authentication failure"
! The key on the device and RADIUS server do not match:
RADIUS: received from id 21 171.68.118.101:1645, Access-Reject, len 20
RADIUS: Reply for 21 fails decrypt
! NT client sends 'DOMAIN\user' and RADIUS server expects 'user':
RADIUS: received from id 16 171.68.118.101:1645, Access-Reject, len 20
AAA/AUTHEN (2974782384): status = FAIL
As1 CHAP: Unable to validate Response. Username CISCO\chapadd: Authentication failure
As1 CHAP: O FAILURE id 13 len 26 msg is "Authentication failure"
! Username and password are correct, but authorization failed:
RADIUS: received from id 19 171.68.118.101:1645, Access-Accept, len 20
RADIUS: no appropriate authorization type for user
AAA/AUTHOR (2370106832): Post authorization status = FAIL
AAA/AUTHOR/LCP As1: Denied
! Bad username, bad password, or both:
RADIUS: received from id 17 171.68.118.101:1645, Access-Reject, len 20
AAA/AUTHEN (3898168391): status = FAIL
As1 CHAP: Unable to validate Response. Username ddunlap: Authentication failure
As1 CHAP: O FAILURE id 14 len 26 msg is "Authentication failure"
As1 PPP: Phase is TERMINATING
If a user tries to obtain a service for which he or she is not authorized, a failed authorization status message (Post authorization status = FAIL) appears in the log. RADIUS server misconfiguration is a common mistake and must be investigated. Misconfigured usernames, passwords, or both, as in the last case of Example 9-5, are equally common.
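The failure signatures in Examples 9-4 and 9-5 are distinctive enough to triage a captured debug output automatically. The following Python sketch uses patterns taken only from those two examples. It is not an exhaustive list, and the exact message text can differ between IOS releases. The rule order matters: a key mismatch or an unreachable server can also produce Access-Reject or FAIL lines, so those causes are checked before bad credentials. The RULES table and the classify function are hypothetical.

```python
import re

# Ordered from most to least specific; the first matching rule wins.
RULES = [
    ("server unreachable", re.compile(r"Connection refused|No response from server|open to \S+ failed")),
    ("shared key mismatch", re.compile(r"check keys|fails decrypt")),
    ("authorization failure", re.compile(r"no appropriate authorization type|Post authorization status = FAIL")),
    ("credentials rejected", re.compile(r"Access-Reject|AUTHEN \(\d+\): status = FAIL")),
]


def classify(debug_output: str) -> str:
    """Return the most likely cause for a block of debug aaa/tacacs/radius output."""
    for cause, pattern in RULES:
        if pattern.search(debug_output):
            return cause
    return "unknown"
```

Feeding it the key-mismatch lines from Example 9-5 ("Access-Reject" followed by "Reply for 21 fails decrypt") yields "shared key mismatch" rather than "credentials rejected", because the key rule is checked first.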