Problem supporting keep-alive sockets on a home-grown HTTP server in C

I am currently experimenting with building an HTTP server. The server is multi-threaded, with one listening thread using select(...) and four worker threads managed by a thread pool. I'm currently managing around 14k-16k requests per second with a document length of 70 bytes and a response time of 6-10ms, on a Core i3 330M. But this is without keep-alive: I close every socket I serve immediately once the work is done.
EDIT: The worker threads process 'jobs' that are dispatched when activity is detected on a socket, i.e. service requests. After a 'job' is completed, we start processing the next one if any are available; otherwise we sleep until more 'jobs' are dispatched.
My problems started when I began trying to implement keep-alive support. With keep-alive activated I only manage 1.5k-2.2k requests per second with 100 open sockets. This number grows to around 12k with 1000 open sockets. In both cases the response time is somewhere around 60-90ms. I find this quite odd, since my assumption was that requests per second should go up, not down, and that response time should hopefully go down, but certainly not up.
I've tried several different strategies for fixing the low performance:
1. Call select(...)/pselect(...) with a timeout value so that we can rebuild our FD_SET structure and listen to any additional sockets that arrived after we blocked, and service any detected socket activity.
(Aside from the low performance, there's also the problem of sockets being closed while we're blocked, resulting in select(...)/pselect(...) reporting a bad file descriptor.)
2. Have one listening thread that only accepts new connections, and one keep-alive thread that is notified via a pipe of any new sockets that arrived after we blocked and of any new socket activity, and rebuilds the FD_SET.
(same additional problem here as in '1.').
3. select(...)/pselect(...) with a timeout, when new work is to be done, detach the linked-list entry for the socket that has activity, and add it back when the request has been serviced. Rebuilding the FD_SET will hopefully be faster. This way we also avoid trying to listen to any bad file descriptors.
4. Combined (2.) and (3.).
-. Probably a few more, but they escape me atm.
The keep-alive sockets are stored in a simple linked list whose add/remove functions are protected by a pthread_mutex; the function responsible for rebuilding the FD_SET takes the same lock.
I suspect the constant locking/unlocking of the mutex is the main culprit here. I've tried to profile the problem, but neither gprof nor google-perftools has been very cooperative, either introducing extreme instability or plainly refusing to gather any data at all (this could be me not knowing how to use the tools properly, though). But removing the locks risks leaving the linked list in an inconsistent state, which would probably crash the program or send it into an infinite loop.
I've also suspected the select(...)/pselect(...) timeout when I've used it, but I'm pretty confident that this was not the problem since the low performance is maintained even without it.
I'm at a loss as to how I should handle keep-alive sockets, and I'm therefore wondering if anyone has suggestions on how to fix the low performance, or alternative methods I could use to support keep-alive sockets.
If you need any more information to be able to answer my question properly, don't hesitate to ask for it and I shall try my best to provide you with the necessary information and update the question with this new information.

Try to get rid of select completely. You can find some kind of event notification on every popular platform: kqueue/kevent on FreeBSD, epoll on Linux, etc. This way you do not need to rebuild the FD_SET and can add/remove watched fds at any time.
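For illustration, a minimal epoll-based loop on Linux might look something like the sketch below. The function name, constants, and error handling are illustrative assumptions, not code from the question:

    /* Sketch only: a single epoll instance replaces the select() loop, so there
     * is no FD_SET to rebuild and fds can be added or removed at any time. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/epoll.h>

    #define MAX_EVENTS 64

    void event_loop(int listen_fd)
    {
        int epfd = epoll_create1(0);
        if (epfd < 0) { perror("epoll_create1"); exit(1); }

        struct epoll_event ev = { 0 };
        ev.events = EPOLLIN;
        ev.data.fd = listen_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);      /* watch the listener */

        struct epoll_event events[MAX_EVENTS];
        for (;;) {
            int n = epoll_wait(epfd, events, MAX_EVENTS, -1);    /* blocks until activity */
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == listen_fd) {
                    /* accept() here and register the new socket with EPOLL_CTL_ADD */
                } else {
                    /* dispatch a job to a worker thread; when the connection is
                     * finally closed, remove the fd with EPOLL_CTL_DEL */
                }
            }
        }
    }

Because the kernel keeps the interest set, worker threads can add or remove sockets with epoll_ctl without forcing the listening thread to rebuild anything.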

The time increase will be more visible when the client uses your socket for more than one request. If you are merely opening and closing, yet still telling the client to keep the connection alive, then you have the same scenario as you did without keep-alive, except now you also have the overhead of the sockets sticking around.
If, however, you are using the same socket for multiple requests from the same client, then you save the TCP connection-setup overhead and gain performance that way.
Make sure your client is using keep-alive properly, and consider a better way to get notification of socket state and data. Perhaps a poll device, or queuing the requests.
http://www.techrepublic.com/article/using-the-select-and-poll-methods/1044098
This page has a patch for Linux to handle a poll device. With some understanding of how it works, you could use the same technique in your application rather than relying on a device that may not be installed.

There are many alternatives:
Use processes instead of threads, and pass file descriptors via Unix sockets (see the sketch after this list).
Maintain per-thread lists of sockets. You could even accept() directly on the worker threads.
etc...
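For the file-descriptor-passing option mentioned above, the usual mechanism is sendmsg() with SCM_RIGHTS ancillary data over a Unix-domain socket. A rough sketch (the function name is invented for illustration):

    /* Hedged sketch: send one open file descriptor to another process over a
     * connected AF_UNIX socket. The receiving side does the mirror-image
     * recvmsg() and reads the fd out of the control message. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int send_fd(int unix_sock, int fd_to_pass)
    {
        char dummy = '*';
        struct iovec iov = { &dummy, 1 };          /* must send at least one byte */

        union {
            struct cmsghdr hdr;
            char buf[CMSG_SPACE(sizeof(int))];
        } control;
        memset(&control, 0, sizeof(control));

        struct msghdr msg;
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = control.buf;
        msg.msg_controllen = sizeof(control.buf);

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;              /* this is what transfers the fd */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

        return sendmsg(unix_sock, &msg, 0);        /* < 0 on error */
    }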

Are your test clients reusing the sockets? Are they correctly handling keep-alive?
I could see a case where you make the minimum possible change in your benchmarking code by just passing the keep-alive header, but then don't change the client code, so the socket is still closed at the client end once the payload is received.
That would incur all the costs of keep-alive with none of the benefits.

What you are trying to do has been done before. Consider reading about the Leader-Follower network server pattern, http://www.kircher-schwanninger.de/michael/publications/lf.pdf

Related

No threads and blocking sockets - is it possible to handle several connections?

I have a program that needs to:
Handle 20 connections. My program will act as client in every connection, each client connecting to a different server.
Once connected my client should send a request to the server every second and wait for a response. If no request is sent within 9 seconds, the server will time out the client.
It is unacceptable for one connection to cause problems for the rest of the connections.
I do not have access to threads and I do not have access to non-blocking sockets. I have a single-threaded program with blocking sockets.
Edit: The reason I cannot use threads and non-blocking sockets is that I am on a non-standard system. I have a single RTOS (Real-Time Operating System) task available.
To solve this, use of select is necessary but I am not sure if it is sufficient.
Initially I connect all the clients. But select can only be used to see whether a read or write will block, not whether a connect will.
So when I have connected, say, 2 clients and they are waiting to be served, and the 3rd connect does not work, that connect will block and cause the first 2 connections to time out as well.
Can this be solved?
I think the connection issue can be solved by setting a timeout on the connect operation, so that it fails fast enough. Of course that will limit you if the network really is working but you have a very long (slow) path to some of the servers. That's bad design, but your requirements are pretty harsh.
See this answer for details on connection-timeouts.
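One classic way to bound a blocking connect() without non-blocking sockets is to interrupt it with SIGALRM. Whether signals and alarm() are even available on the RTOS is an assumption, so treat this as a sketch of the idea rather than a drop-in solution:

    /* Sketch: arm an alarm before a blocking connect() so it returns with EINTR
     * after `seconds` instead of hanging for the full TCP timeout. */
    #include <errno.h>
    #include <signal.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>

    static void alarm_handler(int sig) { (void)sig; /* only here to interrupt connect() */ }

    int connect_with_timeout(int fd, const struct sockaddr *addr,
                             socklen_t len, unsigned seconds)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = alarm_handler;     /* note: no SA_RESTART, so connect() is not retried */
        sigaction(SIGALRM, &sa, NULL);

        alarm(seconds);                    /* arm the timeout */
        int rc = connect(fd, addr, len);
        int saved_errno = errno;
        alarm(0);                          /* cancel any pending alarm */

        if (rc < 0 && saved_errno == EINTR)
            errno = ETIMEDOUT;             /* surface the interruption as a timeout */
        else
            errno = saved_errno;
        return rc;
    }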
It seems you need to isolate the connections. Well, if you cannot use threads you can always resort to good-old-processes.
Spawn each client by forking your server process and use traditional IPC mechanisms if communication between them is required.
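A bare-bones sketch of that fork-per-client idea, where run_one_client() is a hypothetical per-connection loop (connect, send a request every second, read the reply):

    #include <sys/types.h>
    #include <unistd.h>

    void run_one_client(int id);        /* hypothetical: one blocking connection per child */

    void spawn_clients(void)
    {
        for (int i = 0; i < 20; i++) {
            pid_t pid = fork();
            if (pid == 0) {             /* child: owns exactly one connection */
                run_one_client(i);
                _exit(0);
            }
            /* parent: remember pid if needed, set up a pipe for IPC, continue */
        }
    }

Since each child blocks independently, one stalled server can no longer delay the other nineteen connections.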
If you cannot use a multi-process approach either, I'm afraid you'll have a hard time.

Libevent: multithreading to handle HTTP keep-alive connections

I am writing an HTTP reverse-proxy in C using Libevent and I would like to implement multithreading to make use of all available CPU cores. I had a look at this example: http://roncemer.com/software-development/multi-threaded-libevent-server-example/
In this example it appears that one thread is used for the full duration of a connection, but for HTTP/1.1 I don't think this will be the most effective solution, as connections are kept alive by default after each request so that they can be reused later. I have noticed that even one browser tab can open several connections to one server and keep them open until the tab is closed, which would immediately exhaust the thread pool. For an HTTP/1.1 proxy there will be many open connections, but only very few of them actively transferring data at a given moment.
So I was thinking of an alternative, to have one event base for all incoming connections and have the event callback functions delegate to worker threads. This way we could have many open connections and make use of a thread only when data arrives on a connection, returning it back to the pool once the data has been dealt with.
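Roughly, the shape I have in mind is something like the sketch below. This assumes Libevent 2.x; enqueue_job() and the worker pool itself are placeholders I haven't written yet:

    /* Sketch only: one event_base for every connection; read callbacks hand
     * work to a thread pool instead of processing it inline. */
    #include <event2/event.h>
    #include <event2/bufferevent.h>
    #include <event2/listener.h>
    #include <event2/thread.h>

    static void enqueue_job(struct bufferevent *bev)
    {
        /* placeholder: push bev onto the worker queue; a worker thread calls
         * bufferevent_read()/bufferevent_write() (safe with BEV_OPT_THREADSAFE)
         * and goes back to the pool once the chunk has been dealt with */
        (void)bev;
    }

    static void read_cb(struct bufferevent *bev, void *ctx)
    {
        (void)ctx;
        enqueue_job(bev);               /* never block the event loop: hand off and return */
    }

    static void accept_cb(struct evconnlistener *lis, evutil_socket_t fd,
                          struct sockaddr *addr, int socklen, void *ctx)
    {
        (void)addr; (void)socklen; (void)ctx;
        struct event_base *base = evconnlistener_get_base(lis);
        struct bufferevent *bev = bufferevent_socket_new(
            base, fd, BEV_OPT_CLOSE_ON_FREE | BEV_OPT_THREADSAFE);
        bufferevent_setcb(bev, read_cb, NULL, NULL, NULL);
        bufferevent_setwatermark(bev, EV_READ, 1, 64 * 1024);   /* cap buffered input */
        bufferevent_enable(bev, EV_READ);
    }

    int main(void)
    {
        evthread_use_pthreads();        /* make the base safe to touch from worker threads */
        struct event_base *base = event_base_new();
        /* ... evconnlistener_new_bind(base, accept_cb, ...), start the worker
         * threads, then event_base_dispatch(base) ... */
        (void)base;
        return 0;
    }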
My question is: is this a suitable implementation of threads with Libevent?
Specifically – is there any need to have one event base per connection as in the example or is one for all connections sufficient?
Also – are there any other issues I should be aware of?
Currently the only problem I can see is with burstiness: when data is received in many small chunks, many read events are triggered per HTTP response, which would lead to a lot of handing off to worker threads. Would this be a problem? If so, it could be somewhat negated using Libevent's watermarking, although I'm not sure how that works if a request arrives in two chunks and the second chunk is sufficiently small to leave the buffer size below the watermark. Would it then stay there until more data arrives?
Also, I would need to implement scheduling so that a chunk is only sent once the previous chunk has been fully sent.
The second problem I thought of is when the thread pool is exhausted, i.e. all threads are currently doing something, and another read event occurs – this would lead to the read event callback blocking. Does that matter? I thought of putting these into another queue, but surely that's exactly what happens internally in the event base. On the other hand, a second queue might be a good way to organise scheduling of the chunks without blocking worker threads.

HTTP Persistent connection

While trying to implement a simple HTTP server in C using the Linux socket interface, I have encountered some difficulties with a certain feature I'd like it to have, namely persistent connections. It is relatively easy to send one file at a time over separate TCP connections, but that doesn't seem to be a very efficient solution (considering the multiple handshakes, for instance). Anyway, the server should handle several requests (HTML, CSS, images) during one TCP connection. Could you give me some clues on how to approach the problem?
It is pretty easy - just don't close the TCP connection after you write the reply.
There are two ways to do this: pipelined and non-pipelined.
In a non-pipelined implementation you read one HTTP request from the socket, process it, write the response back out to the socket, and then try to read another one. Keep doing that until the remote party closes the socket, or close it yourself once you have stopped getting requests on the socket for about 10 seconds.
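A bare-bones sketch of that non-pipelined loop with a roughly 10-second idle timeout; read_request() and send_response() are placeholders for your own parsing and reply code:

    #include <sys/select.h>
    #include <sys/time.h>
    #include <unistd.h>

    int  read_request(int fd);      /* placeholder: returns <= 0 on close or error */
    void send_response(int fd);     /* placeholder: writes one reply */

    void serve_connection(int fd)
    {
        for (;;) {
            fd_set rfds;
            FD_ZERO(&rfds);
            FD_SET(fd, &rfds);
            struct timeval tv = { 10, 0 };          /* 10-second idle timeout */

            int r = select(fd + 1, &rfds, NULL, NULL, &tv);
            if (r <= 0)                             /* idle timeout or error: give up */
                break;
            if (read_request(fd) <= 0)              /* peer closed, or read/parse error */
                break;
            send_response(fd);                      /* reply, then wait for the next request */
        }
        close(fd);
    }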
In a pipelined implementation, read as many requests as are on the socket, process them all in parallel, and then write them all back out on the socket, in the same order as you received them. You have one thread reading requests all the time, and another one writing the replies back out.
You don't have to do it, but you can advertise that you support persistent connections and pipelining by adding the following header in your replies:
Connection: Keep-Alive
Read this:
http://en.wikipedia.org/wiki/HTTP_persistent_connection
By the way, in practice there aren't huge advantages to persistent connections. The overhead of managing the handshake is very small compared to the time taken to read and write data to network sockets. There is some debate about the performance advantages of persistent connections. On the one hand, under heavy load, keeping connections open means many fewer sockets on your system in TIME_WAIT. On the other hand, because you keep each socket open for 10 seconds, you'll have many more sockets open at any given time than you would in non-persistent mode.
If you're interested in improving the performance of a self-written server, the best thing you can do for the network "front end" is to implement an event-based socket management system. Look into libev and eventlib.

Lost messages with non-blocking OpenSSL in C

Context: I'm developing a client-server application that is fairly solid most of the time, despite frequent network problems, outages, broken pipes, and so on. I use non-blocking sockets, select(), and OpenSSL to deliver messages between one or more nodes in a cluster, contingent on application-level heartbeats. Messages are queued and not removed from the queue until the entire message has been transferred and all the SSL_write()s return successfully. I maintain two sockets for each relationship, one incoming and one outgoing. I do this for a reason, and that's because it's much easier to detect a failed connection (very frequent) on a write than it is on a read. If a client is connecting, and I already have a connection, I replace it. Basically, the client performing the write is responsible for detecting errors and initiating a new connection (which will then replace the existing (dead) read connection on the server). This has worked well for me with one exception.
Alas, I'm losing messages. 99.9% of the time, the messages go through fine. But every now and then, I'll send, and I have no errors detected on either side for a few minutes... and then I'll get an error on the socket. The problem is that SSL_write has already returned successfully.
Let me guess: if I was blocking this would be fine, but since I'm non-blocking, I don't wait for the read on my remote end. As long as my TCP buffer can fit more, I keep stuffing things in the pipe. And when my socket goes poof, I lose anything in that buffer yet to be delivered?
How can I deal with this? Are application-level acks really necessary? (I'd rather not travel down the long road of complicated lost-acks and duplicate message complexity) Is there an elegant way to know what message I've lost? Or is there a way I can delay removal from my queue until I know it has been delivered? (Without an ack, how?)
Thanks for any help in advance.

Is there any benefit to using epoll with a very small number of file descriptors?

Would the following single threaded UDP client application see a performance benefit from using epoll over simply calling recvfrom/sendto on non-blocking sockets?
Let me explain the client.
I am writing single threaded UDP based client (custom protocol) that both sends and receives data using non-blocking I/O and my colleague suggested I use epoll for this.
The client sends and receives multiple packets of information that are all associated with a unique session id and multiple sessions can be run simultaneously.
If I use epoll, there will be a limited number of maybe 10-20 file descriptors that epoll_wait could wait on. Each file descriptor would be associated with one session. So that's a maximum of 10-20 sessions, and this number will be enforced.
Each session has its own state machine. From a single thread I need to run each state machine reasonably frequently and poll the associated socket as well.
In my case, I'd have to use epoll_wait with a timeout of zero or some very small value so that I can give CPU time to run the state machines for each session.
If there is data for a session then it needs to be directed to the associated state machine.
However, I can't really see much benefit of this design with such a small number of file descriptors.
The way I see it is I have two design options:
1. In my main loop using epoll I can poll the descriptors using epoll_wait with either a small timeout or no timeout.
How to handle the data at this point is where I'm getting a bit stuck... either I read it right away and throw it into a queue for each state machine to pick up when it runs, or I set a flag on the state machine to tell it that data is waiting, and it picks the data up with a call to recvfrom when it runs. Or I read the data, handle it right away, and run the state machine for it.
Or...
2. Just run each state machine from the main loop and call recvfrom. If I get some data, handle it. If I don't, do whatever else the state machine requires. Is there huge overhead in calling recvfrom when there is no data?
Going the epoll route means coding in some extra complexity. If there is a strong likelihood of it being faster in my case, then I will start doing it. However, if the second way, which is really simple, works just as well, then I would not use epoll.
Any thoughts?
No, and in fact performance will be much worse using epoll if adding and removing file descriptors from the set to poll is anything but an extremely rare event. With poll, a single syscall performs the entire operation. With epoll, you need multiple syscalls to modify the set and then wait on it.
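For a handful of sockets, a plain poll() loop is about as simple as it gets; the array handling and per-session dispatch below are illustrative:

    #include <poll.h>

    /* Sketch: one pollfd per session (10-20 of them). A zero timeout means we
     * just check for readable sockets and immediately return control to the
     * state machines. */
    void service_sessions(struct pollfd *fds, nfds_t nfds)
    {
        int ready = poll(fds, nfds, 0);     /* 0 ms: check and return immediately */
        if (ready <= 0)
            return;                         /* nothing readable (or an error) */

        for (nfds_t i = 0; i < nfds; i++) {
            if (fds[i].revents & POLLIN) {
                /* recvfrom() on fds[i].fd and hand the datagram to session i's
                 * state machine (or queue it for the next state-machine run) */
            }
        }
    }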
Unless you're writing a server that's intended to scale to tens of thousands, hundreds of thousands, or more long-term persistent connections, epoll is not only premature optimization, but actually a pessimization. It's also completely nonstandard and non-portable.
