Tuesday, March 07, 2006
TCP/IP keep alive parameter
[Note: in our testing on an IBM blade with a Linux 2.4 kernel, we somehow could not get the expected behavior.]
Use case.
A and B have set up a TCP/IP connection. At some point, B is powered off, or the network cable is unplugged.
In this case, the TCP/IP stack depends on three parameters, Ti, Nm, and Td (defined in the article quoted below), to do liveness checking.
On Linux, you can read the Ti value (tcp_keepalive_time, in seconds) with
[puliu@ib04b14 ~]$ /sbin/sysctl -a | grep tcp_keepalive_time
net.ipv4.tcp_keepalive_time = 7200
and set it with
[puliu@ib04b14 ~]$ sudo /sbin/sysctl -w net.ipv4.tcp_keepalive_time=10
net.ipv4.tcp_keepalive_time = 10
[puliu@ib04b14 ~]$ sudo /sbin/sysctl -a -e | grep 'tcp_keepalive_time'
net.ipv4.tcp_keepalive_time = 10
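The same settings can also be read without the sysctl binary, since Linux exposes them as files under /proc/sys/net/ipv4. Here is a minimal Python sketch, assuming a Linux host. The function name read_keepalive_settings and its proc_dir parameter are made up for this example; tcp_keepalive_time, tcp_keepalive_intvl, and tcp_keepalive_probes are the kernel's own names for Ti, Td, and Nm.

```python
from pathlib import Path

# Linux sysctl names for the three keep-alive parameters. The first two are
# in seconds; tcp_keepalive_probes is a count.
KEEPALIVE_SYSCTLS = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive_settings(proc_dir="/proc/sys/net/ipv4"):
    """Return {name: int} for each keep-alive sysctl file found in proc_dir."""
    settings = {}
    for name in KEEPALIVE_SYSCTLS:
        path = Path(proc_dir) / name
        if path.is_file():  # files are simply absent on non-Linux systems
            settings[name] = int(path.read_text().strip())
    return settings


if __name__ == "__main__":
    for name, value in read_keepalive_settings().items():
        print(f"net.ipv4.{name} = {value}")
```

Reading needs no special permissions; writing new values still requires root, as with sysctl -w.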
The detailed explanation below comes from http://www.ianywhere.com/developer/technotes/mobilink_tcpip.html
I quote it as follows:
MobiLink & TCP/IP Keep-Alive
The MobiLink TCP/IP-based communications streams have a "keep_alive" option. This article discusses how to set this option effectively.
What Is the keep_alive Option?
The keep_alive option enables or disables liveness checking over a TCP/IP connection. Liveness checking is the process of checking whether or not the other end of a network connection is still connected ("alive"). It only applies to the end of a connection that is waiting to receive data, since the sending end finds out about a lost connection as soon as it tries to send data.
Why Liveness Checking Is Important
Liveness checking prevents the listening side of a connection from waiting indefinitely when a connection is lost. This lets servers like MobiLink discard client connections that have disappeared. Without liveness checking, a server can dedicate too many resources to lost connections it thinks are still alive. On the client side, liveness checking helps detect failed connections so that the user can be notified, and perhaps try again later or try a different server.
Without liveness checking, a dropped connection will either hang the client or the MobiLink worker thread to which the client was connected. If MobiLink was at a point in the synchronization where it was waiting for data from the client, then that worker thread will continue to wait and never be available for other synchronizations. If the client was waiting for data from MobiLink, then the client would continue to wait and never complete the synchronization. Since MobiLink synchronizations have communications in both directions, you need to enable liveness checking on both the MobiLink server and the client to prevent either side from hanging on lost communications.
How To Set the keep_alive Option In MobiLink Servers and Clients
The TCP/IP-based streams that are used during MobiLink synchronization now accept a new parameter, both on the client and server side:
keep_alive=[0|1]
To enable liveness checking in 7.0.x MobiLink, for example, your command line should contain the following:
dbmlsrv7 -x tcpip{keep_alive=1} ....
0 turns off TCP/IP liveness checking, and 1 enables it. The default is 1 (liveness detection enabled). If the network layer doesn't support TCP/IP liveness checking, this parameter is ignored. The default was initially 0, but our customers kept running into problems that were solved by having the option enabled, so the default was changed to 1. When in doubt, set the keep_alive option explicitly.
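At the socket level, an option like keep_alive=1 typically corresponds to turning on the standard SO_KEEPALIVE socket option. Whether MobiLink does exactly this internally is an assumption; the Python sketch below only illustrates the general mechanism, and the helper name enable_keepalive is made up for this example.

```python
import socket


def enable_keepalive(sock, enabled=True):
    """Turn OS-level TCP keep-alive probing on or off for one socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1 if enabled else 0)
    # Read the option back; the OS reports a non-zero value when it is on.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("keep-alive enabled:", enable_keepalive(s))
```

Turning the option on only asks the OS to probe idle connections; when and how often it probes is still governed by the OS settings discussed in the rest of this article.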
Consult your UltraLite and ASA client for MobiLink documentation for details on setting TCP/IP stream options in MobiLink clients.
The following SQL Anywhere Studio releases (and EBFs), and all later releases, have the keep_alive option available:
7.0.0.455
7.0.1.1103
7.0.2
Setting the keep_alive option to 1 isn't usually enough, however. The operating system's settings for TCP/IP liveness checking may need to be altered to suit your application, but first you need to understand some of the liveness-checking details.
How TCP/IP Liveness Checking Works
The OS network layer's implementation of liveness checking is slightly different for each OS, but the following basic algorithm applies:
IF( waiting for data on TCP/IP connection for at least time Ti ) {
    // Begin liveness checking:
    Repeat <- Nm
    WHILE( there is no response from the other side AND Repeat > 0 ) {
        Send a keep-alive message
        Wait for reply up to time Td
        Repeat <- Repeat - 1
    }
    IF( there was no response ) {
        Declare the connection dead
    } ELSE {
        Reset the count of time waiting for data
    }
}
NOTES:
The values Ti, Nm, and Td are example names, created for the purposes of this discussion, that represent OS/network settings.
* Ti is the idle time threshold. Any connection idle for at least this time is a candidate for liveness checking.
* Nm is the maximum number (unsigned integer) of keep-alive messages that can be sent before a connection is declared dead.
* Td is the minimum delay that is applied between keep-alive messages.
* Once a connection is declared dead, any current or subsequent attempt to communicate on that connection will fail immediately with an error.
* Some TCP/IP implementations may use a progression of geometric delays by applying a formula to Td, but that isn't discussed here.
* The keep-alive message is a simple, low-level message. If the connection is still present, the other end will use fundamental TCP/IP functionality to reply, without affecting the sending application. The cost of a liveness check is negligible; because it is implemented via an exchange of TCP/IP ACKs there is minimal network or processor utilization.
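The probing loop described above can be sketched as a small Python simulation. This is an illustration of the algorithm only, not how any OS implements it: check_liveness and make_fake_peer are hypothetical names, and the probe callable stands in for "send a keep-alive message and wait up to Td for a reply".

```python
def check_liveness(probe, nm, td):
    """Run one round of liveness checking on a connection idle for at least Ti.

    probe(timeout) is a hypothetical callable that sends one keep-alive
    message, waits up to `timeout` seconds, and returns True on a reply.
    Returns True if the peer answered, False if it should be declared dead.
    """
    for _ in range(nm):       # at most Nm keep-alive messages
        if probe(td):         # wait for a reply up to time Td
            return True       # alive: the caller resets its idle timer
    return False              # Nm probes went unanswered: connection is dead


def make_fake_peer(replies_after):
    """Fake peer that ignores the first `replies_after` probes (None = never replies)."""
    state = {"sent": 0}

    def probe(timeout):
        state["sent"] += 1
        return replies_after is not None and state["sent"] > replies_after

    return probe, state


if __name__ == "__main__":
    probe, state = make_fake_peer(replies_after=2)
    print(check_liveness(probe, nm=5, td=5), state["sent"])   # True 3
    probe, state = make_fake_peer(replies_after=None)
    print(check_liveness(probe, nm=5, td=5), state["sent"])   # False 5
```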
The key to setting up liveness checking is to choose and set the idle time threshold (Ti) that must elapse before the checks actually take place on an idle connection. Usually you should not need to change the other parameters.
What Idle Threshold Should You Use?
Your choice for Ti really depends on your deployment, and your network characteristics in particular. You want an interval that is long enough to prevent falsely declaring a connection to be dead, but shorter than the time you are willing to wait before detecting a dropped connection. Note that the total time before a dead connection is declared is Nm * Td + Ti, not just Ti. To avoid false detections, you need to make sure that the idle and delay time settings are significantly longer than the round-trip time for the network connection. You can use a "ping" utility (included with Windows and Unix) to determine round-trip times for network connections. If you are using a network with low bandwidth or high latency, such as a dialup, wireless WAN or satellite network, then you would probably need a larger idle time than with a low-latency, high-bandwidth network.
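To make the Nm * Td + Ti formula concrete, here is a small worked example in Python. The function name detection_time is made up for this sketch; the first set of numbers is the Windows defaults as listed further down in this article, and the second is a hypothetical tuned setting.

```python
def detection_time(ti, nm, td):
    """Worst-case seconds before an idle, dead connection is declared dead."""
    return ti + nm * td


if __name__ == "__main__":
    # Windows defaults as quoted in this article: Ti = 2 hours, Nm = 5, Td = 5 s.
    print(detection_time(ti=2 * 60 * 60, nm=5, td=5))  # 7225
    # Hypothetical tuned setup: Ti = 60 s, with Nm and Td left unchanged.
    print(detection_time(ti=60, nm=5, td=5))           # 85
```

With the defaults, Ti dominates the total almost entirely, which is why the article focuses on tuning Ti and leaving the other two parameters alone.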
Also, if your deployment relies on repeated, scheduled synchronizations, you probably want to set the interval to a value that detects dropped connections well before the next synchronization is attempted.
Now that you know what to set the interval to, let's look at how you can set it on various operating systems.
How To Set the Idle Threshold on Windows
The Windows default settings are as follows:
Ti = 2 hours
Nm = 5
Td = 5 seconds
Windows relies on system-wide registry settings to set a single check interval for an entire machine. Even worse, the default is to wait two (2) hours before checking for a dropped connection! This is usually inadequate for responsive liveness checking.
NOTE: This setting is shared by all applications that request TCP/IP keep-alive. Any such applications want to detect lost connections, so picking parameters appropriate for your network will likely benefit those applications as well as MobiLink. However, be aware that the Nm parameter is also used for data retransmissions irrespective of TCP/IP keep-alive, so it should only be changed after careful consideration. Future releases of ASA Studio may include the ability to let you control the keep-alive behaviour without affecting other applications.
Windows 95/98/ME
Some of the registry entries below may not already exist. You may need to create some of them. Once modified, you will have to reboot for the new settings to take effect.
WARNING: Modifying your registry is DANGEROUS. Modify your registry at your own risk.
The registry entry for Ti is a DWORD value with millisecond units:
\HKEY_LOCAL_MACHINE
\System
\CurrentControlSet
\Services
\VxD
\MSTCP
\KeepAliveTime
The registry entry for Nm is a String value:
\HKEY_LOCAL_MACHINE
\System
\CurrentControlSet
\Services
\VxD
\MSTCP
\MaxDataRetries
The registry entry for Td is a DWORD value with millisecond units:
\HKEY_LOCAL_MACHINE
\System
\CurrentControlSet
\Services
\VxD
\MSTCP
\KeepAliveInterval
Windows NT/2000/XP
Some of the registry entries below may not already exist. You may need to create some of them. Once modified, you will have to reboot for the new settings to take effect.
WARNING: Modifying your registry is DANGEROUS. Modify your registry at your own risk.
The registry entry for Ti is a DWORD value with millisecond units:
\HKEY_LOCAL_MACHINE
\System
\CurrentControlSet
\Services
\Tcpip
\Parameters
\KeepAliveTime
The registry entry for Nm is a DWORD value:
\HKEY_LOCAL_MACHINE
\System
\CurrentControlSet
\Services
\Tcpip
\Parameters
\TcpMaxDataRetransmissions
The registry entry for Td is a DWORD value with millisecond units:
\HKEY_LOCAL_MACHINE
\System
\CurrentControlSet
\Services
\Tcpip
\Parameters
\KeepAliveInterval
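Because Ti and Td are stored in milliseconds, it is easy to be off by a factor of 1000 when entering them by hand. The Python sketch below only builds and prints command lines for the Windows reg utility; it does not touch the registry. It assumes reg.exe is available on the target machine, which is not guaranteed on every Windows version covered here. The helper name reg_add_command and the 5-minute and 5-second targets are hypothetical.

```python
# Registry key from the NT/2000/XP section above, in reg.exe notation.
TCPIP_PARAMETERS_KEY = r"HKLM\System\CurrentControlSet\Services\Tcpip\Parameters"


def reg_add_command(value_name, seconds):
    """Build a `reg add` command that stores `seconds` as a DWORD in milliseconds."""
    milliseconds = int(seconds * 1000)
    return (f'reg add "{TCPIP_PARAMETERS_KEY}" /v {value_name} '
            f"/t REG_DWORD /d {milliseconds} /f")


if __name__ == "__main__":
    # Hypothetical targets: Ti = 5 minutes, Td = 5 seconds.
    print(reg_add_command("KeepAliveTime", 5 * 60))
    print(reg_add_command("KeepAliveInterval", 5))
```

The printed commands would still have to be run with administrator rights, and the reboot requirement described above applies.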
Windows CE
Some of the registry entries below may not already exist. You may need to create some of them. Once modified, you will have to soft reset for the new settings to take effect.
WARNING: Modifying your registry is DANGEROUS. Modify your registry at your own risk.
The registry entry for Ti is a DWORD value with millisecond units:
\HKEY_LOCAL_MACHINE
\Comm
\Tcpip
\Parms
\KeepAliveTime
The registry entry for Nm is a String value:
\HKEY_LOCAL_MACHINE
\Comm
\Tcpip
\Parms
\TcpMaxDataRetransmissions
The registry entry for Td is a DWORD value with millisecond units:
\HKEY_LOCAL_MACHINE
\Comm
\Tcpip
\Parms
\KeepAliveInterval
How to Set the Liveness-Checking Interval on Sun Solaris
Use the ndd utility to set the tcp_keepalive_interval (Ti), in milliseconds. The following sets the value to the default of 2 hours:
ndd -set /dev/tcp tcp_keepalive_interval 7200000
You can inspect the current value as follows:
ndd -get /dev/tcp tcp_keepalive_interval
You must have the proper permissions (root will do) to be able to set this value. Consult your system administrator for details.
How to Set the Liveness-Checking Interval on Linux
You must write a program, or find a utility, that makes use of the sysctl function to set system parameters. The details are too tedious to go into here, but the relevant parameters are tcp_keepalive_time (Ti), in seconds, tcp_keepalive_intvl (Td), in seconds, and tcp_keepalive_probes (Nm).
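The system-wide parameters affect every application on the machine. Linux also lets a program override them for a single socket through the TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT socket options. The Python sketch below assumes the application controls its own sockets, which may not be the case for a packaged product like MobiLink. The helper name set_socket_keepalive is made up, and the options are applied only where the platform defines them.

```python
import socket


def set_socket_keepalive(sock, ti, td, nm):
    """Enable keep-alive on one socket and override Ti/Td/Nm where supported.

    Returns the names of the per-socket options that were applied.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = []
    # These constants exist only on platforms that support them (e.g. Linux).
    for name, value in (("TCP_KEEPIDLE", ti), ("TCP_KEEPINTVL", td), ("TCP_KEEPCNT", nm)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied.append(name)
    return applied


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Hypothetical values: probe after 60 s idle, every 5 s, up to 5 times.
        print(set_socket_keepalive(s, ti=60, td=5, nm=5))
```

Settings applied this way need no root access and leave other applications on the machine untouched.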
Sources
- The Microsoft Developers Network (MSDN), October 2000.
- Sybase Technical News, Volume 7, Number 8, August 1998 (from ISUG): http://www.isug.com/Sybase_FAQ/ASE/Section10/5/Q10.5.8.html