Programmatically detect if local web server has hung - c

I realise that I'll get at least one answer along the lines of "(re)write the code so it doesn't hang" but let's assume we don't live in that shiny happy utopia just yet...
In our embedded system we have a big SDK including a web-server (Boa) which is the primary method of user interaction.
It's possible, during certain phases of the moon, that something can cause the web server to hang or become otherwise stuck in such a way that the process appears running normally (not crashed/dead/using 100% CPU) but does not serve any web pages.
So, the question is, how do we test/detect this situation?

To test whether the server is hung, create a TCP socket and connect to port 80 on IP address 127.0.0.1 (loopback address). Then send the following text over the socket
GET / HTTP/1.1\r\n\r\n
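Most servers will interpret that as a request for index.html. (Strictly speaking, HTTP/1.1 requires a Host header, so a strict server may answer this request with 400 Bad Request. For a liveness check any response will do, or you can send HTTP/1.0 instead.) Alternatively, you could implement an undocumented URL for testing (which allows for a shorter, predetermined response), e.g.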
GET /test/fdoaoqfaf12491r2h1rfda HTTP/1.1\r\n\r\n
You then need to read the response from the server. This involves using select with a reasonable timeout to determine whether any data came back from the server, and if so, use recv to read the data. The response from the server will consist of a header followed by content. The header consists of lines of text, with a blank line at the end of the header. Lines end with \r\n, so the end of the header is \r\n\r\n.
Getting the content involves calling select and recv until recv returns 0. This assumes that the server will send the response and then close the socket. Some sophisticated servers will leave a socket open to allow multiple requests over the same socket. A simple embedded server should not be doing that. (If your server is trying to use the same socket for multiple requests, then you need to figure out how to turn that feature off.)
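Here is a minimal sketch of the check described above, assuming a POSIX/Linux system. The names http_server_alive and wait_fd are made up for illustration. The non-blocking connect is my addition (so that a full accept backlog can't stall the checker), and MSG_NOSIGNAL is a Linux-ism. It only reads far enough to see the start of the status line rather than the whole response, and it doesn't retry select on EINTR.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until fd is readable (want_write == 0) or writable (want_write != 0).
 * Returns 1 if ready, 0 on timeout, -1 on error. */
static int wait_fd(int fd, int want_write, int timeout_sec)
{
    fd_set set;
    struct timeval tv;

    FD_ZERO(&set);
    FD_SET(fd, &set);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    return select(fd + 1, want_write ? NULL : &set,
                  want_write ? &set : NULL, NULL, &tv);
}

/* Returns 0 if the server sent something that starts like an HTTP status
 * line within the timeout, -1 otherwise (refused, hung, closed early). */
int http_server_alive(const char *ip, unsigned short port, int timeout_sec)
{
    static const char request[] = "GET / HTTP/1.1\r\n\r\n";
    char buf[16];
    size_t got = 0;
    struct sockaddr_in addr;
    int result = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        goto out;

    /* Non-blocking connect: a stuck server may never complete the handshake. */
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        int err = 0;
        socklen_t len = sizeof err;
        if (errno != EINPROGRESS || wait_fd(fd, 1, timeout_sec) != 1)
            goto out;
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
            goto out;
    }

    if (send(fd, request, sizeof request - 1, MSG_NOSIGNAL)
            != (ssize_t)(sizeof request - 1))
        goto out;

    /* Read just enough to see "HTTP/" at the start of the response. */
    while (got < 5) {
        ssize_t n;
        if (wait_fd(fd, 0, timeout_sec) != 1)
            goto out;               /* timeout or select error: treat as hung */
        n = recv(fd, buf + got, sizeof buf - got, 0);
        if (n <= 0)
            goto out;               /* closed or failed before answering */
        got += (size_t)n;
    }
    if (memcmp(buf, "HTTP/", 5) == 0)
        result = 0;
out:
    close(fd);
    return result;
}
```

A watchdog could call http_server_alive("127.0.0.1", 80, 5) every so often and restart the server after a few consecutive failures. The timeout and the number of retries are things you would have to tune for your system.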
That's all very well and good, but you really need to rewrite your code so it doesn't hang.
The most likely cause of the problem is that the server has a bunch of dangling sockets, i.e. connections from clients that were never properly cleaned up. Dangling sockets will eventually prevent the server from accepting more connections, either because the server has a limit on the number of open connections, or because the process that's running the server uses up all of its file descriptors.
The first thing to check is the TCP timeout value. One project that I worked on had a default timeout of 5 hours, which meant that dangling sockets stayed open for 5 hours. A reasonable timeout is 1 minute.
Then you need to create a client that deliberately misbehaves. Clients can misbehave by
leaving a socket open without reading the server's response
abruptly closing the socket while reading the response
gracefully closing the socket while reading the response
The first situation should be handled by the TCP timeout. The other two need to be properly handled by the server code. Graceful and abrupt socket closure is controlled via the SO_LINGER socket option (set with setsockopt) and the shutdown function. After the client misbehaves, check the number of open file descriptors in the server process, to verify that the server has handled the situation correctly.
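Two helpers for such a test client, as a rough sketch assuming Linux. abort_connection and count_open_fds are hypothetical names. The first sets SO_LINGER with a zero linger time, so close sends a RST on a connected TCP socket instead of the normal FIN handshake. The second counts the entries in /proc/<pid>/fd, which only exists on Linux and needs permission to read the target process.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Abruptly close a socket: linger on, timeout 0 means "reset on close".
 * Returns 0 on success, -1 if setsockopt or close failed. */
int abort_connection(int fd)
{
    struct linger lg;
    int rc;

    lg.l_onoff = 1;
    lg.l_linger = 0;
    rc = setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
    if (close(fd) < 0)
        return -1;
    return rc;
}

/* Count open file descriptors of a process via /proc (Linux only).
 * Returns -1 if the directory cannot be read. When asked about the
 * calling process, the count includes the descriptor used by opendir. */
int count_open_fds(pid_t pid)
{
    char path[64];
    struct dirent *entry;
    int count = 0;
    DIR *dir;

    snprintf(path, sizeof path, "/proc/%ld/fd", (long)pid);
    dir = opendir(path);
    if (dir == NULL)
        return -1;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.')    /* skip "." and ".." */
            count++;
    }
    closedir(dir);
    return count;
}
```

The idea would be to record count_open_fds(server_pid) before the misbehaving client runs and compare it with the value some time afterwards. If the number keeps growing, the server is leaking sockets.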

Related

is there a way to see if a client socket on my serverside is dead?

When a client connects to my server and then switches to a VPN or something similar, the server side still says the socket is alive and still tries to read from it. I tried using another thread to constantly check all my sockets with read and close any for which it returns -1, but it still doesn't do anything.
It depends very much on the protocol you use, but the general answer is: yes and no. You have to learn the network protocol stack to know what you can do in your situation, the details of which you did not disclose.
The usual way to solve this problem is to establish some policy or two-way communication. E.g. if there was no data or "I'm alive" message sent from client X for a duration of time Y, we close the connection. Or, send a regular "ping" message to client C and expect a response before period Y expires.
If we're talking TCP, and if the client's connection is properly closed, a message is sent to the server, so the server will know the connection is closed, so read/recv will return 0 bytes indicating EOF.
But you're asking about the times when the client becomes unable to communicate with the server. Detecting an absence of messages is necessarily done using a timeout.
You can have the server "ping" the client (send a message to which the client must respond) periodically.
You can have the client send a message periodically (a "heartbeat") when idle.
Either way, no message (of any kind) for X seconds indicates a broken connection.
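The bookkeeping for "no message for X seconds" can be sketched roughly like this. The struct and function names are invented for illustration, and how you store your connections is up to you. The server calls conn_touch whenever anything arrives on a connection, and periodically asks which connections have been silent for too long.

```c
#include <stddef.h>
#include <time.h>

/* One entry per client connection (hypothetical layout). */
struct conn {
    int fd;
    time_t last_seen;   /* time of the last message of any kind */
};

/* Record that something was received on this connection. */
void conn_touch(struct conn *c, time_t now)
{
    c->last_seen = now;
}

/* Store in silent_fds the descriptors of all connections that have been
 * silent for more than limit_sec seconds and return how many there are.
 * silent_fds must have room for count entries. */
size_t find_silent(const struct conn *conns, size_t count, time_t now,
                   double limit_sec, int *silent_fds)
{
    size_t i, found = 0;

    for (i = 0; i < count; i++) {
        if (difftime(now, conns[i].last_seen) > limit_sec)
            silent_fds[found++] = conns[i].fd;
    }
    return found;
}
```

The caller would then close each descriptor returned in silent_fds and remove it from its own table.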
If you enable the SO_KEEPALIVE socket option on each new TCP connection, the OS will automatically ping the remote side periodically to see if it still responds, and close the connection if it doesn't. The default timeout is several hours, but many OSes allow you to configure a lower timeout on a per-socket basis. Unfortunately, each one is different in how to do this. Linux, for example, uses the TCP_KEEPIDLE socket option. NetBSD (And probably other BSDs) uses TCP_KEEPALIVE. And so on.
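On Linux this could look roughly like the sketch below. enable_keepalive is a made-up helper name. Besides TCP_KEEPIDLE it also sets TCP_KEEPINTVL and TCP_KEEPCNT, the other two Linux-specific knobs (time between probes and number of unanswered probes before the connection is dropped). Other systems use different option names, so treat this as Linux-only.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn on TCP keepalive for one socket and shorten its timers (Linux).
 * idle_sec:     idle time before the first probe is sent
 * interval_sec: time between probes
 * probes:       unanswered probes before the connection is dropped
 * Returns 0 on success, -1 if any setsockopt call fails. */
int enable_keepalive(int fd, int idle_sec, int interval_sec, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_sec, sizeof idle_sec) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_sec, sizeof interval_sec) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probes, sizeof probes) < 0)
        return -1;
    return 0;
}
```

With enable_keepalive(fd, 60, 10, 3), for example, a dead peer would be noticed after roughly 60 + 3 * 10 seconds of silence, at which point read/recv on the socket fails.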

Deny a client's TCP connect request before accept()

I'm trying to code a TCP server in C. I just noticed that the accept() function returns only when the connection is already established.
Some clients flood the server with random data, and some send random data just once. After that I want to close their current connection and refuse future connections for a few minutes (or more, depending on how much load the program has).
I can save bad client IP addresses in an array, and can save timings too, but I can't find any function to abort the current connection or deny future connections from bad clients.
I found a Windows function called WSAAccept that lets you deny connections by your own choice, but I don't use Windows.
I tried coding a raw TCP server, which gives you access to whole TCP packets including the TCP header and does not accept connections automatically. I tried handling connections in the program, including SYN ACK and other TCP signals. It worked, but then I noticed that the raw TCP server receives all packets on my network interface, so when other programs generate heavy traffic my program becomes laggy too.
I tried libnetfilter, which lets you filter all traffic on your network interface. It works too, but like the raw TCP server it receives all of the interface's packets, which makes it slow when there is a lot of traffic. I also compared libnetfilter with iptables: libnetfilter is slower than iptables.
So in summary, how can I abort a client's current and future connections without hurting other clients' connections?
I'm on Linux (Debian 10).
Once you do blacklisting at the packet level, you can very quickly become vulnerable to trivial attacks based on IP spoofing. With a very basic implementation, an attacker could use your packet-level blacklisting to blacklist anyone he wants just by sending you many packets with a fake source IP address. Usually you don't want to touch this kind of filtering (unless you really know what you are doing); you just trust your firewall etc.
So I recommend simply closing the file descriptor immediately after getting it from accept.
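A rough sketch of what that could look like, using the array of addresses and timings mentioned in the question. All names here (ban_add, ban_check, accept_unless_banned, MAX_BANS) are invented, the table is a fixed-size array for brevity, and the listening socket is assumed to be IPv4.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define MAX_BANS 64

struct ban {
    in_addr_t addr;     /* IPv4 address in network byte order */
    time_t until;       /* ban is active while now < until */
};
static struct ban bans[MAX_BANS];

/* Ban addr for `seconds` starting at `now`; extends an existing ban.
 * Returns 0 on success, -1 if the table has no free slot. */
int ban_add(in_addr_t addr, time_t now, int seconds)
{
    int i, free_slot = -1;

    for (i = 0; i < MAX_BANS; i++) {
        if (bans[i].until > now && bans[i].addr == addr) {
            bans[i].until = now + seconds;
            return 0;
        }
        if (bans[i].until <= now && free_slot < 0)
            free_slot = i;              /* expired or never used */
    }
    if (free_slot < 0)
        return -1;
    bans[free_slot].addr = addr;
    bans[free_slot].until = now + seconds;
    return 0;
}

/* Returns 1 if addr is banned at time `now`, 0 otherwise. */
int ban_check(in_addr_t addr, time_t now)
{
    int i;

    for (i = 0; i < MAX_BANS; i++) {
        if (bans[i].addr == addr && bans[i].until > now)
            return 1;
    }
    return 0;
}

/* accept() a connection and close it right away if the peer is banned.
 * Returns the new descriptor, -1 on accept error, -2 if it was dropped. */
int accept_unless_banned(int listen_fd)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof peer;
    int fd = accept(listen_fd, (struct sockaddr *)&peer, &len);

    if (fd >= 0 && ban_check(peer.sin_addr.s_addr, time(NULL))) {
        close(fd);
        return -2;
    }
    return fd;
}
```

The server would call ban_add when a client sends garbage, and use accept_unless_banned in place of accept. The connection is still established at the TCP level before it is closed, which is the trade-off of doing this in user space.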

How to notify server that client is closing connection

I have a server that is running a select() loop that sometimes continues blocking when the client closes the connection from its side. The select() loop handles all other read/write operations correctly and sets the correct file descriptor in the fd_set, leading me to believe that it is not an issue with the file descriptor setup on the server-side.
The way I planned on handling the client closing the connection was to have the select() break due to activity on the socket (closing it from the client-side), see that the fd was set for that socket, and then try to read from it - and if the read returned 0, then close the connection. However, because the select() doesn't always return when the client side closes the connection, there is no attempt to check the fd_set and subsequently try to read from the socket.
As a workaround, I implemented a "stop code" that the client writes to the server just before closing the connection, and this write causes the select() to break and the server reads the "stop code" and knows to close the socket. The only problem with this solution is the "stop code" is an arbitrary string of bytes that could potentially appear in regular traffic, as the normal data being written can contain random strings that could potentially contain the "stop code". Is there a better way to handle the client closing the connection from its end? Or is the method I described the general "best practice"?
I think my issue has something to do with OpenSSL, as the connection in question is an OpenSSL tunnel, and it is the only file descriptor in the set giving me issues.
The way I planned on handling the client closing the connection was to have the select() break due to activity on the socket (closing it from the client-side), see that the fd was set for that socket, and then try to read from it - and if the read returned 0, then close the connection. However, because the select() doesn't always return when the client side closes the connection, there is no attempt to check the fd_set and subsequently try to read from the socket.
Regardless of whether you are using SSL or not, select() can tell you when the socket is readable (has data available to read), and a graceful closure is a readable condition (a subsequent read operation reports 0 bytes read). It is only abnormal disconnects that select() can't report (unless you use the exceptfds parameter, but even that is not always guaranteed). The best way to handle abnormal disconnects is to simply use timeouts in your own code. If you don't receive data from the client for a while, just close the connection. The client will have to send data periodically, such as a small heartbeat command, if it wants to stay connected.
Also, when using OpenSSL, if you are using the older SSL_... API functions (SSL_new(), SSL_set_fd(), SSL_read(), SSL_write(), etc), make sure you are NOT just blindly calling select() whenever you want; call it ONLY when OpenSSL tells you to (when an SSL read/write operation reports an SSL_ERROR_WANT_(READ|WRITE) error). This is an area where a lot of OpenSSL newbies tend to make the same mistake. They try to use OpenSSL on top of pre-existing socket logic that waits for a readable notification before then reading data. This is the wrong way to use the SSL_... API. You are expected to ask OpenSSL to perform a read/write operation unconditionally, and then if it needs to wait for new data to arrive, or pending data to send out, it will tell you and you can then call select() accordingly before retrying the SSL read/write operation again.
On the other hand, if you are using the newer BIO_... API functions (BIO_new(), BIO_read(), BIO_write(), etc), you can take control of the underlying socket I/O and not let OpenSSL manage it for you, thus you can do whatever you want with select() (or any other socket API you want).
As a workaround, I implemented a "stop code" that the client writes to the server just before closing the connection, and this write causes the select() to break and the server reads the "stop code" and knows to close the socket.
That is a very common approach in many Internet protocols, regardless of whether SSL is used or not. It is a very distinct and explicit way for the client to say "I'm done" and both parties can then close their respective sockets.
The only problem with this solution is the "stop code" is an arbitrary string of bytes that could potentially appear in regular traffic, as the normal data being written can contain random strings that could potentially contain the "stop code".
Then either your communication protocol is not designed properly, or your code is not processing the protocol correctly. In a properly-designed and correctly-processed protocol, there will not be any such ambiguity. There needs to be a clear distinction between the various commands that your protocol defines. Your "stop code" would be one such command amongst other commands. Random data in one command should not be mistakenly treated as a different command. If you are experiencing that problem, you need to fix it.
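One common way to get that distinction is to put a small header in front of every message: a command byte followed by the payload length. The sketch below is just one possible layout (1 type byte plus a 4-byte big-endian length), and all names in it are invented. Because the receiver knows exactly how many payload bytes belong to a data command, those bytes can never be mistaken for a stop command, whatever they contain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CMD_DATA = 1, CMD_STOP = 2 };    /* hypothetical command codes */
#define HEADER_LEN 5                    /* 1 type byte + 4 length bytes */

/* Write one frame into out. Returns the frame size, or 0 if it won't fit. */
size_t frame_encode(uint8_t type, const void *payload, uint32_t len,
                    uint8_t *out, size_t out_cap)
{
    if (out_cap < HEADER_LEN || len > out_cap - HEADER_LEN)
        return 0;
    out[0] = type;
    out[1] = (uint8_t)(len >> 24);
    out[2] = (uint8_t)(len >> 16);
    out[3] = (uint8_t)(len >> 8);
    out[4] = (uint8_t)len;
    if (len > 0)
        memcpy(out + HEADER_LEN, payload, len);
    return HEADER_LEN + (size_t)len;
}

/* Try to parse one complete frame from the first in_len bytes of in.
 * Returns the number of bytes consumed, or 0 if more bytes are needed.
 * On success *payload points into the input buffer. */
size_t frame_decode(const uint8_t *in, size_t in_len, uint8_t *type,
                    const uint8_t **payload, uint32_t *len)
{
    uint32_t n;

    if (in_len < HEADER_LEN)
        return 0;
    n = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16) |
        ((uint32_t)in[3] << 8) | (uint32_t)in[4];
    if (in_len - HEADER_LEN < n)
        return 0;
    *type = in[0];
    *payload = in + HEADER_LEN;
    *len = n;
    return HEADER_LEN + (size_t)n;
}
```

The server appends received bytes to a buffer, calls frame_decode in a loop, and closes the connection when it decodes a frame whose type is CMD_STOP.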

Minimising client processing - c socket programming

I am working on a client/server model based on Berkeley sockets and have almost finished but I'm stuck with a way to know that all of the data has been received whilst minimising the processing being executed on the client side.
The client I am working with has very little memory and battery and is to be deployed in remote conditions. This means that wherever possible I am trying to avoid processing (and therefore battery loss) on the client side. The following conditions on the client are outside of my control:
The client sends its data 1056 bytes at a time until it has run out of data to send (I have no idea where the number 1056 came from but if you think that you know I would be very interested)
The client is very unpredictable in when it will send the data (it is attached to a wild animal and sends data determined by connection strength and battery life)
The client has an unknown amount of data to send at any given time
The data is transmitted through a GRSM enabled phone tag (Not sure that this is relevant but I'm assuming that extra information could only help)
(I am emulating the data I am expecting to receive from the client through localhost, if it seems to work I will ask the company where I am interning to invest in a static ip address to allow "real" tcp transfers, if it doesn't I won't. I don't think this is relevant but, again, I would rather provide too much information than too little)
At the moment I am using a while loop and incrementing the number of bytes received in order to "recv()" each of the 1056 byte sections. My problem is that the server needs to receive an unknown number of these. To me, the most obvious solutions are to send the number of sections to be received in an initial header from the client or to mark the last section being sent in some way. However, both of these approaches would require processing on the client side, I was wondering if there was a way to check whether the client has closed its socket from the server side? Or even whether something like closing the connection from the server after a pre-determined period of time without information from the client would be feasible? If these aren't possible then I would love to hear any other suggestions.
TLDR: What condition can I use here to minimise client-side processing?
while (!(/* Client has run out of data to send */)) {
    receive1056Section();
}
Also, I know that it is bad practise to make a stackOverflow account and immediately ask a question, I didn't know what else to do, I'm sorry. Please don't hesitate to be mean if I've missed something very obvious.
Here is a suggestion for how to do the interaction:
The client:
Client connects to server via tcp.
Client sends chunks of data until all data has been sent. Flush the send buffer after each chunk.
When it is done the client issues a shutdown on the socket, sleeps for a couple of seconds and then closes the connection.
The client then sleeps until the next transmission. If the transmission was unsuccessful, the sleep time should be shorter to prevent unsent data from overflowing the available memory.
If the client is unable to connect for an extended period of time, you would have to discard data that doesn't fit in the memory.
I am assuming that sleep reduces power consumption.
The server:
The server program can be single-threaded unless you need massive scalability. It listens for incoming connections on the agreed port.
Whenever a client connects, a new socket is created.
Use select() to see which sockets have data (don't forget to include the listening socket!), and non-blocking reads to read from the sockets.
When recv() returns 0 (no more data to read and the other side has shut down its side of the connection), you can close that socket.
This should work fine up to a couple of thousand simultaneous connections.
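The end-of-data condition from the question could then look something like this on the server side. It is a blocking, single-connection sketch to show the idea, not the select()-based server described above. receive_until_eof is an invented name, and the place where you would actually store the data is only marked by a comment. On the client, the only extra work is one shutdown(fd, SHUT_WR) call after the last section.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

#define SECTION_SIZE 1056   /* section size taken from the question */

/* Read from fd until the client shuts down its sending side.
 * Returns the total number of bytes received, or -1 on error.
 * recv may return less than a full section, so bytes are counted
 * rather than sections. */
long receive_until_eof(int fd)
{
    char section[SECTION_SIZE];
    long total = 0;

    for (;;) {
        ssize_t n = recv(fd, section, sizeof section, 0);
        if (n == 0)
            return total;       /* client has no more data to send */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, try again */
            return -1;
        }
        /* ... store or process the n bytes in section here ... */
        total += n;
    }
}
```

So the loop condition from the question becomes "recv returned 0", and the client needs neither a header nor an end marker.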
Example that handles many of the difficulties of implementing a server

SO_LINGER and closing sockets(WINSOCK)

I'm writing a multithreaded Winsock application and I'm having some issues with closing the sockets.
First of all, is there a limit on the number of simultaneously open sockets? Let's say 32 sockets all at once.
I establish a connection on one of the sockets and pass information, and it all goes right.
The problem is that when I disconnect the socket and then reconnect to the same destination, I get a RST from the server after my SYN.
I don't have the code for the server app so I can't debug it.
When I used SO_LINGER and it sent a RST flag at the end of each session, it worked.
But I don't want to end my connections this way.
When not using SO_LINGER a FIN flag was sent, but it seems the connection was not really closed.
Any help?
Thanks
On Unix there's a file descriptor limit per process - I'm guessing on Windows it's "handles".
You are probably bind()-ing your client socket to a fixed port. That might be the reason the server is rejecting your subsequent connection. Try normal ephemeral ports.
Firstly, I agree with Nikolai, are you binding your client socket?
If so it sounds like the socket on the server side is still in TIME_WAIT and is discarding the new connection attempt. By binding the client socket you're forcing the server to try and reuse the exact same connection that is currently in the 2MSL wait period, it can't be reused at this point in time and so you're seeing what you're seeing. There's usually no need to bind the client port, stop doing it and your problem will likely go away.
Secondly, yes, there are limits to the number of open sockets on Windows platforms but they're resource related rather than some hard coded number.
Each open socket uses some 'non paged pool' memory and each pending read or write request on a socket is also likely to use both 'non paged pool' and have pages of memory locked in memory during I/O (there's a limit to the number of pages that can be locked). That said, on Vista and later there's much more 'non paged pool' available than on earlier versions of Windows, and even then I've managed to achieve more than 70,000 concurrent active connections on a pretty low spec XP box (see here: http://www.lenholgate.com/blog/2005/11/windows-tcpip-server-performance.html). Note that there are some separate limits on the number of outbound connections that you can establish (which is more likely to be of interest to you) but that's around 4000 by default and can be tuned by setting the MaxUserPort registry value; see here: Maximum number of concurrent TCP/IP connections - Win XP SP3 for more details.
