TCP/IP Connection Primer

Linux Kernel Tuning for High Performance Networking Series

John H Patton
Level Up Coding

--

TCP/IP Socket and Open Connection Primer

Every TCP/IP connection begins with what’s called a 3-way handshake. The server listens on a welcome socket associated with a listener port.

Server Welcome Socket: server IP address + listener port

When a client connects to a server, the client uses a previously unused port from the ephemeral port range, if possible. The client creates a socket from that ephemeral port and the IP address of the network interface that can reach the server.

Client Socket: client IP address + port from the ephemeral port range.

1. The client initiates the TCP connection by sending a synchronization packet with a sequence number, or SYN(1), to the server’s IP address and the port the server is listening on.

2. The server’s kernel reads the packet from the RX ring on the NIC and puts it in the receive backlog queue. When the network stack processes the packet from the receive queue, it determines it’s a SYN packet, creates a new connection in the “SYN_RECV” state, and puts this connection in the SYN backlog queue. The server then sends a synchronization+acknowledgement packet with its own sequence number, or SYN+ACK(2), back to the client.

3. The client completes the handshake by acknowledging the SYN+ACK packet, sending an acknowledgement, or ACK(3), back to the server. Upon receipt of the ACK packet, the server creates a connection socket, the connection is removed from the SYN backlog queue and moved to the accept queue, and its state changes from “SYN_RECV” to “ESTABLISHED.”

The application receives the client socket from the established connection on the next call to accept() and can now communicate with the client.

Server Connection Socket: mapped to Server Welcome Socket, but dedicated to the established connection.
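
As a rough sketch of how the welcome socket and the connection socket relate, here is a minimal Python example; the 0.0.0.0 address, port 8080, and backlog value are arbitrary choices for illustration:

    import socket

    # Welcome socket: server IP address + listener port.
    welcome = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    welcome.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    welcome.bind(("0.0.0.0", 8080))   # example address/port
    welcome.listen(128)               # backlog hint for the accept queue

    # accept() returns a *new* connection socket once the 3-way
    # handshake has completed and the connection is ESTABLISHED.
    conn, client_addr = welcome.accept()
    print("connection socket for", client_addr)
    conn.close()
    welcome.close()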

Open Connection Sequence

[Figure: TCP/IP 3-Way Handshake to Establish a Connection]

TCP Connection:
client IP & ephemeral port + server IP & listener port
or
client socket + server socket
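
From the client side, both halves of that pairing can be inspected once connect() returns. This sketch assumes some server is reachable at example.com on port 80:

    import socket

    # connect() triggers the 3-way handshake; the kernel picks an
    # ephemeral port for the client side of the 4-tuple.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(("example.com", 80))              # assumed reachable server

    print("client socket:", sock.getsockname())    # client IP + ephemeral port
    print("server socket:", sock.getpeername())    # server IP + listener port
    sock.close()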

TCP/IP Close Connection Primer

When the client and server are done sending and receiving data, the connection needs to be closed in a manner similar to how it was opened. The client usually controls when the connection close is triggered. However, the protocol gives the server application time to do some cleanup before the connection closes, which adds an additional state on both the client and the server.

1. The client application initiates closing the TCP connection by sending a finish packet, or FIN(1), to the server socket and moving the client connection into the “FIN_WAIT1” state.

2. The server receives the FIN packet, moves the connection to the “CLOSE_WAIT” state, and sends an ACK(2) packet back to the client. The server waits for the server application to call close().

The client receives the ACK packet, moves the connection to the “FIN_WAIT2” state, and waits for the application on the server to finish and for the server to send a FIN packet indicating everything is complete.

3. The server application calls close() and the server sends a FIN(3) packet and moves the connection to the “LAST_ACK” state.

4. The client receives the FIN packet, sends a final ACK(4) packet to the server, and moves the connection to the “TIME_WAIT” state. The client connection closes after double the Maximum Segment Lifetime (MSL).

MSL defaults to 60 seconds on many systems, so client connections can sit in TIME_WAIT for up to 120 seconds (on Linux the TIME_WAIT period is hard-coded to 60 seconds via the TCP_TIMEWAIT_LEN constant).
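
At the socket API level, the same close sequence looks roughly like the sketch below; the function names are illustrative only, and which side is the “client” simply depends on who calls close() first:

    import socket

    def client_close(sock: socket.socket) -> None:
        # Sends FIN(1); the client connection enters FIN_WAIT1, then
        # FIN_WAIT2 after the server's ACK(2), and finally TIME_WAIT
        # once the server's FIN(3) has been ACKed(4).
        sock.close()

    def server_side(conn: socket.socket) -> None:
        # recv() returning b"" means the peer's FIN arrived; the server
        # connection is now in CLOSE_WAIT until close() is called here,
        # which sends FIN(3) and moves the connection to LAST_ACK.
        while conn.recv(4096):
            pass
        conn.close()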

Close Connection Sequence

[Figure: TCP/IP Close Connection Sequence]

Impact of Heavy Traffic Spikes

Websites can become overloaded when an event draws a lot of traffic all at once, like a product launch, a concert ticket sale, or breaking news. The reasons vary, but a poorly configured web server is often one of the first in a long list of potential bottlenecks, and one of the primary causes is a poorly tuned network stack, or worse… a default network stack configuration.

Half-Open Connections

Clients both initiate and complete the 3-way handshake that creates an established connection. As a result, the receiving server can sit in limbo while it waits for partial connections to complete. In addition, most TCP applications have no visibility into this state, since the details are handled by the kernel’s network driver and network stack. TCP/IP server applications create a listener bound to a welcoming socket (listen()), retrieve a client socket from the established connection queue on the TCP stack (accept()), perform reads/writes from/to the client socket (recv() / send()), and finally close the connection (close()).

The kernel limits the maximum number of packets a CPU can hold in its receive backlog for all listeners, including packets used in 3-way handshakes. This backlog sits between the network interface card (NIC) and the protocol stack’s processing. Packets arriving in the receive backlog faster than they can be processed will be dropped once the queue is full.

The kernel configuration also controls how many half-open connections can be waiting on an ACK. If the SYN backlog queue size is 128, each listener can hold at most 128 pending SYN packets, meaning only 128 clients can be mid-handshake at a time, or 128 half-open connections. There are a few factors that need attention at this point in the handshake (a sketch for inspecting the related kernel limits follows the list):

  • Any server+port can receive a flood of SYN packets from many sources, and not all of them may be legitimate.
  • Clients do not have to complete the handshake, leaving many SYN packets in the SYN backlog queue until they time out.
  • A full backlog queue will cause any additional client connection attempts to fail.
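
To see where the SYN backlog and receive backlog limits sit on a particular Linux host, the values can be read straight out of /proc/sys. This is a read-only sketch, and the exact defaults vary by kernel version and distribution:

    from pathlib import Path

    # Kernel settings that bound half-open connections and the
    # per-CPU receive backlog (standard Linux sysctl names).
    SETTINGS = [
        "/proc/sys/net/ipv4/tcp_max_syn_backlog",  # SYN backlog queue size
        "/proc/sys/net/core/netdev_max_backlog",   # packets queued per CPU from the NIC
    ]

    for path in SETTINGS:
        p = Path(path)
        if p.exists():
            print(f"{p.name} = {p.read_text().strip()}")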

Established Connections Waiting for the Application

When a connection state changes to “ESTABLISHED” it is moved to the accept queue where it will be picked up by the server application when it calls accept(). The size of this queue is set by both the kernel and the application. The application requests the size when it calls listen() and the kernel sets the maximum size limit of the queue. If the application requests an accept queue backlog larger than the kernel limit, this is silently truncated to the kernel limit.

The effect is that a full accept queue causes the kernel to drop incoming SYN packets, throttling new connections to the rate at which the application drains the accept queue. Increasing both the SYN backlog queue and the accept backlog queue is an effective remedy.
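
A small sketch of how the two limits interact on Linux: the backlog passed to listen() is silently capped at net.core.somaxconn, so raising the application’s request without raising the kernel limit has no effect (the address and port are arbitrary examples):

    import socket
    from pathlib import Path

    # Kernel-side ceiling for the accept queue (net.core.somaxconn).
    somaxconn = int(Path("/proc/sys/net/core/somaxconn").read_text())

    requested = 4096                       # what the application asks for
    effective = min(requested, somaxconn)  # what the kernel silently grants

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("0.0.0.0", 8080))         # example address/port
    server.listen(requested)               # capped at somaxconn by the kernel
    print(f"requested={requested} somaxconn={somaxconn} effective={effective}")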

Half-Closed Connections

Another sign of trouble can appear on either side of the connection when it sits in a half-closed state. “TIME_WAIT” on client connections indicates the connection is closed but must wait two times the Maximum Segment Lifetime before it can be reclaimed. This is typically seen on reverse proxy servers that are not reusing connections in “TIME_WAIT” or are not using keepalive. “CLOSE_WAIT” on server connections indicates that the connection is waiting on the application to finish cleaning up.
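
One way to spot either condition on a Linux host is to count sockets by state. This sketch parses /proc/net/tcp, where the kernel encodes states in hex (06 is TIME_WAIT, 08 is CLOSE_WAIT); tools like ss or netstat report the same information:

    from collections import Counter
    from pathlib import Path

    # Hex state codes used in /proc/net/tcp (include/net/tcp_states.h).
    STATES = {"01": "ESTABLISHED", "04": "FIN_WAIT1", "05": "FIN_WAIT2",
              "06": "TIME_WAIT", "08": "CLOSE_WAIT", "0A": "LISTEN"}

    counts = Counter()
    for line in Path("/proc/net/tcp").read_text().splitlines()[1:]:
        state = line.split()[3]
        counts[STATES.get(state, state)] += 1

    for state, count in counts.most_common():
        print(f"{state:12} {count}")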

TIME_WAIT Starvation
For HTTP reverse proxies, a high number of connections in “TIME_WAIT” can be reduced by enabling keepalive if the proxy target supports it; keep the keepalive timeout in the 3-to-5 second range so idle connections are closed quickly once they’re no longer needed. Persistent connections are an HTTP/1.1 feature and are commonly used by HTTP reverse proxies.

The kernel can be adjusted to reuse connections that are in the “TIME_WAIT” state (net.ipv4.tcp_tw_reuse). Reusing “TIME_WAIT” connections comes with some minor risk, especially when operating behind a NAT or with non-compliant client operating systems. In addition, Linux kernels prior to v4.12 allow “TIME_WAIT” connections to be recycled instead of closed (net.ipv4.tcp_tw_recycle, removed in v4.12). Recycling is not a recommended setting: it should be used with care and never enabled on systems behind a NAT or with non-compliant operating systems, but in some environments it can provide some value. Check for increased RST packets or connection resets on these systems to validate these settings.
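
A read-only sketch for checking these settings on a Linux host (tcp_tw_recycle will simply be absent on kernels v4.12 and later):

    from pathlib import Path

    # TIME_WAIT related sysctls; tcp_tw_recycle only exists on kernels < 4.12.
    for name in ("tcp_tw_reuse", "tcp_tw_recycle"):
        p = Path(f"/proc/sys/net/ipv4/{name}")
        value = p.read_text().strip() if p.exists() else "not present on this kernel"
        print(f"net.ipv4.{name} = {value}")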

The purpose of the 2-minute wait (MSL x 2) before reusing a connection is to ensure that delayed segments from the previous connection, which might still exist in the network, are not mistaken for segments belonging to a new connection on the same address and port pair.

Alternatively, the MSL can also be lowered on some systems to allow for faster reclamation of “TIME_WAIT” connections, but this should be done with some care. On reverse proxy servers under heavy load, lowering the MSL might be a good way to increase performance on proxied connections.

CLOSE_WAIT Starvation
On servers with a high number of “CLOSE_WAIT” connections, the application has not yet finished with the socket and called close(). These connections should clear quickly under normal conditions, and reverse proxy web servers tend to handle this well. However, when a large volume of “CLOSE_WAIT” connections lingers, it is because the application is unable to call close() in a timely fashion. If these connections are locked up, the only way to clear them is to wait or to restart the application. This can cause memory pressure, since each connection consumes memory; however, the larger issue is whatever the application is doing that causes these to pile up. There’s not much that can be done about a growing volume of “CLOSE_WAIT” connections other than tuning or fixing the server application that’s holding the connections open.
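
Since the fix lives in the application, the main defense is making sure every accepted socket is closed even when request handling fails. A minimal sketch, where handle_request() stands in for whatever hypothetical application logic might be stalling:

    import socket

    def handle_request(data: bytes) -> bytes:
        # Hypothetical application logic; a slow or stuck handler here is
        # what leaves connections parked in CLOSE_WAIT.
        return data

    def serve_one(conn: socket.socket) -> None:
        try:
            data = conn.recv(4096)
            if data:
                conn.sendall(handle_request(data))
        finally:
            # Always close, even on errors, so the FIN/LAST_ACK sequence
            # can finish and the connection leaves CLOSE_WAIT.
            conn.close()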

Conclusion

The overview in this article is meant to provide a basic understanding of the underlying elements of opening and closing TCP connections. There are a lot of details not covered in this article, but this should provide a good foundation for further exploration of the TCP protocol.

If any of the information in this article is inaccurate, please post a comment and I’ll update the article to correct the information.
