Peculiarities of c10k client

July 04, 2018

There is a well-known problem called c10k. The essence of it is to handle 10000 concurrent clients on a single server. The problem was posed in 1999 by Dan Kegel, and at the time it made the industry rethink the way web servers handled connections. The then-state-of-the-art solution of allocating a thread per client started to fall apart in the face of the upcoming web scale. Nginx was born to solve this problem by embracing the event-driven I/O model provided by the then shiny new epoll system call (on Linux).

Times were different back then. Nowadays a really beefy server with a 10G network, 32 cores and 256 GiB RAM can easily handle that amount of clients, so c10k is not much of a problem even with threaded I/O. But I still wanted to check how various solutions like threads and non-blocking async I/O handle it, so I started to write some silly servers in my c10k repo, and then I got stuck because I needed some tools to test my implementations.

Basically, I needed a c10k client. I actually wrote a couple – one in Go and the other in C with libuv. I’m also going to write one in Python 3 with asyncio.

While writing each client I found two peculiarities – how to make it bad and how to make it slow.

How to make it bad

By making it bad I mean making it really c10k – creating a lot of connections to the server, thus saturating its resources.

Go client

I started with the client in Go and quickly stumbled upon the first roadblock. When I was making 10 concurrent HTTP requests with plain "net/http" calls, there were only 2 TCP connections:

$ lsof -p $(pgrep go-client) -n -P
COMMAND     PID USER   FD      TYPE DEVICE SIZE/OFF    NODE NAME
go-client 11959  avd  cwd       DIR  253,0     4096 1183846 /home/avd/go/src/github.com/dzeban/c10k
go-client 11959  avd  rtd       DIR  253,0     4096       2 /
go-client 11959  avd  txt       REG  253,0  6240125 1186984 /home/avd/go/src/github.com/dzeban/c10k/go-client
go-client 11959  avd  mem       REG  253,0  2066456 3151328 /usr/lib64/libc-2.26.so
go-client 11959  avd  mem       REG  253,0   149360 3152802 /usr/lib64/libpthread-2.26.so
go-client 11959  avd  mem       REG  253,0   178464 3151302 /usr/lib64/ld-2.26.so
go-client 11959  avd    0u      CHR  136,0      0t0       3 /dev/pts/0
go-client 11959  avd    1u      CHR  136,0      0t0       3 /dev/pts/0
go-client 11959  avd    2u      CHR  136,0      0t0       3 /dev/pts/0
go-client 11959  avd    4u  a_inode   0,13        0   12735 [eventpoll]
go-client 11959  avd    8u     IPv4  68232      0t0     TCP 127.0.0.1:55224->127.0.0.1:80 (ESTABLISHED)
go-client 11959  avd   10u     IPv4  68235      0t0     TCP 127.0.0.1:55230->127.0.0.1:80 (ESTABLISHED)

The same with ss [1]:

$ ss -tnp dst 127.0.0.1:80
State  Recv-Q  Send-Q   Local Address:Port     Peer Address:Port
ESTAB  0       0         127.0.0.1:55224       127.0.0.1:80       users:(("go-client",pid=11959,fd=8))
ESTAB  0       0         127.0.0.1:55230       127.0.0.1:80       users:(("go-client",pid=11959,fd=10))

The reason for this is quite simple – HTTP/1.1 uses persistent connections (keep-alive) so that clients avoid the overhead of a TCP handshake on every HTTP request. Go’s "net/http" fully implements this logic – it multiplexes multiple requests over a handful of TCP connections. This behavior can be tuned via Transport.
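For reference, here is what that tuning looks like. This is a minimal sketch using real Transport fields; the limits are arbitrary examples I picked for illustration, not values from my client:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// transport caps the connection pooling that "net/http" does by default.
// The numbers below are arbitrary, not recommendations.
var transport = &http.Transport{
	MaxIdleConns:        100,              // idle connections kept in total
	MaxIdleConnsPerHost: 10,               // idle connections kept per host
	IdleConnTimeout:     30 * time.Second, // how long an idle connection lives
	DisableKeepAlives:   false,            // true would force a new connection per request
}

var client = &http.Client{Transport: transport}

func main() {
	fmt.Println(transport.MaxIdleConnsPerHost)
}
```

Setting DisableKeepAlives to true would be one way to defeat the pooling, but as shown next, I went a level lower instead.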

But I don’t need to tune it, I need to avoid it. And we can avoid it by explicitly creating a TCP connection via net.Dial and then sending a single request over that connection. Here is the function that does it; it runs concurrently inside a dedicated goroutine.

func request(addr string, delay int, wg *sync.WaitGroup) {
	// Dial an explicit TCP connection instead of going through
	// http.Client, so each request gets its own connection.
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		log.Fatal("dial error ", err)
	}

	req, err := http.NewRequest("GET", "/index.html", nil)
	if err != nil {
		log.Fatal("failed to create http request")
	}

	req.Host = "localhost"

	// Serialize the request directly onto our connection.
	err = req.Write(conn)
	if err != nil {
		log.Fatal("failed to send http request")
	}

	// Read the status line of the response.
	_, err = bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		log.Fatal("read error ", err)
	}

	wg.Done()
}
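
For completeness, driving such a function looks roughly like the following. This is a hedged, self-contained sketch: it spins up a throwaway local server (so the example runs without nginx) and fires N goroutines, each with its own TCP connection. The runDemo name and the toy server are mine, not part of the actual client:

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"net/http"
	"sync"
)

// runDemo starts a throwaway local server, then fires n concurrent
// one-connection-per-request clients at it, returning how many succeeded.
func runDemo(n int) int {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	// Toy server: drain the request headers, answer with an empty 200.
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			go func(c net.Conn) {
				defer c.Close()
				r := bufio.NewReader(c)
				for {
					line, err := r.ReadString('\n')
					if err != nil || line == "\r\n" {
						break
					}
				}
				fmt.Fprint(c, "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
			}(c)
		}
	}()

	var (
		wg sync.WaitGroup
		mu sync.Mutex
		ok int
	)
	wg.Add(n)
	for i := 0; i < n; i++ {
		// One goroutine and one dedicated TCP connection per request,
		// mirroring the request function above.
		go func() {
			defer wg.Done()
			conn, err := net.Dial("tcp", ln.Addr().String())
			if err != nil {
				return
			}
			defer conn.Close()
			req, _ := http.NewRequest("GET", "/index.html", nil)
			req.Host = "localhost"
			if err := req.Write(conn); err != nil {
				return
			}
			if _, err := bufio.NewReader(conn).ReadString('\n'); err != nil {
				return
			}
			mu.Lock()
			ok++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return ok
}

func main() {
	fmt.Println(runDemo(10))
}
```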

Let’s check that it’s working:

$ lsof -p $(pgrep go-client) -n -P
COMMAND     PID USER   FD      TYPE DEVICE SIZE/OFF    NODE NAME
go-client 12231  avd  cwd       DIR  253,0     4096 1183846 /home/avd/go/src/github.com/dzeban/c10k
go-client 12231  avd  rtd       DIR  253,0     4096       2 /
go-client 12231  avd  txt       REG  253,0  6167884 1186984 /home/avd/go/src/github.com/dzeban/c10k/go-client
go-client 12231  avd  mem       REG  253,0  2066456 3151328 /usr/lib64/libc-2.26.so
go-client 12231  avd  mem       REG  253,0   149360 3152802 /usr/lib64/libpthread-2.26.so
go-client 12231  avd  mem       REG  253,0   178464 3151302 /usr/lib64/ld-2.26.so
go-client 12231  avd    0u      CHR  136,0      0t0       3 /dev/pts/0
go-client 12231  avd    1u      CHR  136,0      0t0       3 /dev/pts/0
go-client 12231  avd    2u      CHR  136,0      0t0       3 /dev/pts/0
go-client 12231  avd    3u     IPv4  71768      0t0     TCP 127.0.0.1:55256->127.0.0.1:80 (ESTABLISHED)
go-client 12231  avd    4u  a_inode   0,13        0   12735 [eventpoll]
go-client 12231  avd    5u     IPv4  73753      0t0     TCP 127.0.0.1:55258->127.0.0.1:80 (ESTABLISHED)
go-client 12231  avd    6u     IPv4  71769      0t0     TCP 127.0.0.1:55266->127.0.0.1:80 (ESTABLISHED)
go-client 12231  avd    7u     IPv4  71770      0t0     TCP 127.0.0.1:55264->127.0.0.1:80 (ESTABLISHED)
go-client 12231  avd    8u     IPv4  73754      0t0     TCP 127.0.0.1:55260->127.0.0.1:80 (ESTABLISHED)
go-client 12231  avd    9u     IPv4  71771      0t0     TCP 127.0.0.1:55262->127.0.0.1:80 (ESTABLISHED)
go-client 12231  avd   10u     IPv4  71774      0t0     TCP 127.0.0.1:55268->127.0.0.1:80 (ESTABLISHED)
go-client 12231  avd   11u     IPv4  73755      0t0     TCP 127.0.0.1:55270->127.0.0.1:80 (ESTABLISHED)
go-client 12231  avd   12u     IPv4  71775      0t0     TCP 127.0.0.1:55272->127.0.0.1:80 (ESTABLISHED)
go-client 12231  avd   13u     IPv4  73758      0t0     TCP 127.0.0.1:55274->127.0.0.1:80 (ESTABLISHED)

$ ss -tnp dst 127.0.0.1:80
State  Recv-Q  Send-Q   Local Address:Port     Peer Address:Port
ESTAB  0       0         127.0.0.1:55260       127.0.0.1:80     users:(("go-client",pid=12231,fd=8))
ESTAB  0       0         127.0.0.1:55262       127.0.0.1:80     users:(("go-client",pid=12231,fd=9))
ESTAB  0       0         127.0.0.1:55270       127.0.0.1:80     users:(("go-client",pid=12231,fd=11))
ESTAB  0       0         127.0.0.1:55266       127.0.0.1:80     users:(("go-client",pid=12231,fd=6))
ESTAB  0       0         127.0.0.1:55256       127.0.0.1:80     users:(("go-client",pid=12231,fd=3))
ESTAB  0       0         127.0.0.1:55272       127.0.0.1:80     users:(("go-client",pid=12231,fd=12))
ESTAB  0       0         127.0.0.1:55258       127.0.0.1:80     users:(("go-client",pid=12231,fd=5))
ESTAB  0       0         127.0.0.1:55268       127.0.0.1:80     users:(("go-client",pid=12231,fd=10))
ESTAB  0       0         127.0.0.1:55264       127.0.0.1:80     users:(("go-client",pid=12231,fd=7))
ESTAB  0       0         127.0.0.1:55274       127.0.0.1:80     users:(("go-client",pid=12231,fd=13))

C client

I also decided to make a C client built on top of libuv for a convenient event loop.

In my C client there is no HTTP library, so we’re making raw TCP connections from the start. It works well, creating a connection for each request, so it doesn’t have the problem (more like a feature :-) of the Go client. But when it finishes reading the response, it gets stuck and doesn’t return control to the event loop until a very long timeout.

Here is the response reading callback that seems stuck:

/* Read callback registered with uv_read_start() */
static void on_read(uv_stream_t* stream, ssize_t nread, const uv_buf_t* buf)
{
    if (nread > 0) {
        printf("%s", buf->base);
    } else if (nread == UV_EOF) {
        /* Server closed the connection -- tear down our side. */
        log("close stream");
        uv_connect_t *conn = uv_handle_get_data((uv_handle_t *)stream);
        uv_close((uv_handle_t *)stream, free_close_cb);
        free(conn);
    } else if (nread < 0) {
        /* Real read error; nread == 0 just means "no data", not an error. */
        return_uv_err(nread);
    }

    free(buf->base);
}

It appears that we’re stuck here, waiting for quite a long time until we finally get EOF.

This “quite long time” is actually the HTTP keep-alive timeout set in nginx – by default it’s 75 seconds.

We can control it from the client, though, with the Connection and Keep-Alive HTTP headers, which are part of HTTP 1.1.

And that’s the only sane solution, because on the libuv side I had no way to close the connection – I don’t receive EOF, because it is sent only when the connection is actually closed.

So what is happening is that my client creates a connection and sends a request, nginx replies and then keeps the connection open, waiting for subsequent requests. Tinkering with libuv showed me that, and that’s why I love making things in C – you have to dig really deep and really understand how things work.

So to get rid of these hanging requests, I just set the Connection: close header, which disables HTTP keep-alive and enforces a new connection for each request from the same client. As an alternative, I could have insisted on HTTP 1.0, which has no keep-alive.
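For the Go client, the same fix is a one-liner on the request. A small sketch – req.Close and the Connection header are standard net/http, everything else here is illustrative:

```go
package main

import (
	"fmt"
	"net/http"
)

// newCloseRequest builds a request like the Go client does, but with
// keep-alive disabled; a sketch, not the client's actual code.
func newCloseRequest() *http.Request {
	req, err := http.NewRequest("GET", "/index.html", nil)
	if err != nil {
		panic(err)
	}
	req.Host = "localhost"

	// Either of these makes req.Write emit a "Connection: close" header,
	// telling the server to drop the connection after a single response.
	req.Header.Set("Connection", "close")
	req.Close = true

	return req
}

func main() {
	fmt.Println(newCloseRequest().Header.Get("Connection"))
}
```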

Now that it’s creating lots of connections, let’s make it keep those connections open for a client-specified delay, to appear as a slow client.

How to make it slow

I needed to make it slow because I wanted my server to spend some time handling requests, while avoiding putting sleeps in the server code.

Initially, I thought of making reading on the client side slow, i.e. reading one byte at a time or delaying reading of the server response. Interestingly, neither of these solutions worked.

I tested my client against nginx by watching the access log with the $request_time variable. Needless to say, all of my requests were served in 0.000 seconds. Whatever delay I inserted, nginx seemed to ignore it.

I started to figure out why by tweaking various parts of the request-response pipeline – the number of connections, the response size, etc.

Finally, I was able to see my delay only when nginx was serving a really big file, like 30 MB – and that’s when it clicked.

The whole reason for this delay-ignoring behavior was socket buffers. Socket buffers are, well, buffers for sockets – the pieces of kernel memory where Linux buffers network data for performance reasons: to send data over the network in big chunks, to mitigate slow clients, and for other things like TCP retransmission. Socket buffers are like the page cache – all network I/O goes through them (as all disk I/O goes through the page cache) unless explicitly bypassed.

So in my case, when nginx received a request, the response written by the send/write syscall was merely stored in the socket buffer, but from nginx’s point of view it was done. Only when the response was too large to fit in the socket buffer would nginx block in the syscall and wait until the client’s delay elapsed and the socket buffer was drained, freeing room for the next portion of data.

You can check and tune the size of the socket buffers in /proc/sys/net/ipv4/tcp_rmem and /proc/sys/net/ipv4/tcp_wmem.
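Those sysctls are system-wide. To experiment per socket from the client side, the buffers can also be shrunk with setsockopt; in Go that’s SetReadBuffer/SetWriteBuffer (which map to SO_RCVBUF/SO_SNDBUF). A sketch with an arbitrary 4 KiB value and a throwaway local listener so the dial succeeds:

```go
package main

import (
	"fmt"
	"net"
)

// demo dials a throwaway local listener and shrinks the per-socket kernel
// buffers; 4 KiB is an arbitrary small value chosen for experimentation.
func demo() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go func() {
		c, _ := ln.Accept() // hold the server side open
		_ = c
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()

	tcp := conn.(*net.TCPConn)
	// SetReadBuffer/SetWriteBuffer map to SO_RCVBUF/SO_SNDBUF.
	if err := tcp.SetReadBuffer(4096); err != nil {
		return "", err
	}
	if err := tcp.SetWriteBuffer(4096); err != nil {
		return "", err
	}
	return "buffers set", nil
}

func main() {
	s, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Note that the kernel applies its own rounding and minimums to these values, so the effective sizes may differ from what you ask for.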

So after figuring this out, I inserted the delay after establishing the connection and before sending the request.

This way the server has to keep client connections around (yay, c10k!) for a client-specified delay.

Recap

So in the end, I have two c10k clients – one written in Go and the other written in C with libuv. The Python 3 client is on its way.

Each of these clients connects to the HTTP server, waits for a specified delay and then sends a GET request with the Connection: close header.

This makes the HTTP server keep a dedicated connection for each request and spend some time waiting, emulating I/O.

That’s how my c10k clients work.


  1. ss stands for socket stats; it’s a more versatile tool for inspecting sockets than netstat.