I'm writing a non-forking server that uses poll() for multiple simultaneous connections. It works properly, except that I'm not sure how to detect timeouts the right way.
Let's say I have the following code:
#define POLL_SIZE 512
struct pollfd poll_set[POLL_SIZE];

timeout = 60000; // 60 secs

// set up server_sockfd with socket(), bind(), listen(), ...
poll_set[0].fd = server_sockfd;
poll_set[0].events = POLLIN;
numfds = 1;

while (1) {
    rc = poll(poll_set, numfds, timeout);
    if (rc == 0) {
        // handle timeout
    }
    for (fd_index = 0; fd_index < numfds; fd_index++) {
        if (poll_set[fd_index].revents & POLLIN) {
            // accept new connection or handle established connections
        }
    }
}
Let's assume I have 15 clients connected. 14 of them are sending and receiving data, but one client is silent, with no data flowing to or from it, i.e. it is just occupying a socket on the server.
Now, the problem is that poll() can't spot this one specific client, because the other 14 clients keep providing data, so poll() never reports a timeout.
How would you detect this silent client and close its connection?
Currently, I have nothing better than creating a time_t lastseen[POLL_SIZE] array and keeping a timestamp per connection, updated whenever data is read from or sent to that client.
Then I use an alarm signal every 60 seconds, run through the lastseen array, compare each timestamp with the current time, and tear down every connection that has been idle for more than 60 seconds.
Or perhaps a thread could do the same, to avoid signaling. What do you suggest?
(Note that I experimented with libevent, and it's very nice. However, I had to abandon it, because I couldn't find support for adding SSL/TLS to an already-connected socket. Think of STARTTLS.)
Detecting socket-related errors is not poll's job. All it does is indicate whether one or more sockets are ready for read/write operations. If an error occurs on any awaited socket, poll marks that socket as ready (really it is marked by the OS) and the POLLERR flag is set in the revents field.
As for timeouts: in general, an idle timeout is not a transport-layer error (and therefore is not tracked by sockets). You need to track it yourself. For example, you can remember the timestamp of the last read on each socket (see clock_gettime(CLOCK_MONOTONIC, ...)) and set the timeout passed to poll to the minimum of the remaining timeouts across all sockets. After poll returns, check for each socket whether its timeout has actually expired.
Also consider using epoll; it is much faster when one poll covers a large number of sockets. For selecting the nearest timeout you can use a heap data structure, so you can manage all sockets with O(log n) work per operation.
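Here is a minimal sketch of that per-socket idle tracking, sticking with the poll()-based layout from the question; IDLE_LIMIT, the lastseen array, and the slot layout are assumptions made for illustration:

#include <poll.h>
#include <time.h>
#include <unistd.h>

#define IDLE_LIMIT 60   /* seconds a client may stay silent */

static time_t now_monotonic(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);   /* immune to wall-clock jumps */
    return ts.tv_sec;
}

/* Close every client slot (index 0 is the listening socket) that has been
 * idle longer than IDLE_LIMIT. Returns the timeout (in ms) to pass to the
 * next poll() call: the time until the nearest per-connection deadline. */
static int reap_idle(struct pollfd *poll_set, time_t *lastseen, int numfds)
{
    time_t now = now_monotonic();
    time_t nearest = IDLE_LIMIT;

    for (int i = 1; i < numfds; i++) {
        time_t idle = now - lastseen[i];
        if (idle >= IDLE_LIMIT) {
            close(poll_set[i].fd);      /* tear down the silent client */
            poll_set[i].fd = -1;        /* poll() ignores negative fds */
        } else if (IDLE_LIMIT - idle < nearest) {
            nearest = IDLE_LIMIT - idle;
        }
    }
    return (int)(nearest * 1000);
}

In the main loop you would call timeout = reap_idle(poll_set, lastseen, numfds); before each poll(), and refresh lastseen[i] = now_monotonic(); whenever you read from or write to client i. Compacting the poll_set array after closing entries is left out for brevity.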
I am transferring files from server to clients using nonblocking sockets. Some files may be quite big (hundreds of megabytes). A separate thread handles each client.
while (more_to_read)
{
    written = write(fd, ...);
    if (written == 0)
    {
        struct pollfd wait = {
            .fd = fd,
            .events = POLLOUT,
            .revents = 0
        };
        int status = poll(&wait, 1, TIMEOUT);
        // ...
    }
    // ...
}
Sometimes poll() times out and the server closes the connection (to prevent malicious clients from using up server resources). I discovered this can happen even when the client is still trying to read the data. In such cases, the client gets ECONNRESET when calling read().
I figured out that the server sends the file in repeated chunks, and at some point the socket may remain not ready for writing for several seconds. I assume this is because the data hasn't been sent out yet and the corresponding kernel buffer is full. However, this triggers the timeout, which leads the server to close the connection.
When I increase the value of TIMEOUT, the error happens less often. What is the best way to make sure unexpected timeouts don't happen?
The server is supposed to run on multiple platforms so I cannot really use an OS-specific solution.
You can never protect yourself fully against both DoS attacks and unexpected timeouts.
I suggest that you scale the allowed timeout with the amount of data already sent. Start with a short timeout and increase it as you send more and more data.
DoS attackers and defunct clients will likely open many connections but won't bother reading from them. You want to close this kind of connection fast.
Clients who have received tens or hundreds of MB of data should be allowed much greater timeouts.
Example:
Initial: 500 ms
After 1 MB: 2 s
After 100 MB: 10 s
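A minimal sketch of that scaling rule, using the example thresholds above (scaled_timeout_ms() is a hypothetical helper; the byte counts are just the values from the example):

/* Pick the poll() timeout based on how much data the client has already
 * accepted; slow but genuine clients earn longer grace periods. */
static int scaled_timeout_ms(unsigned long long bytes_sent)
{
    if (bytes_sent >= 100ULL * 1024 * 1024)  /* after 100 MB */
        return 10000;                        /* 10 s */
    if (bytes_sent >= 1ULL * 1024 * 1024)    /* after 1 MB */
        return 2000;                         /* 2 s */
    return 500;                              /* initial: 500 ms */
}

In the sender loop you would then call poll(&wait, 1, scaled_timeout_ms(total_written)), where total_written counts the bytes written to this client so far.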
I am doing some tests with a TCP client application on a Raspberry Pi (the server runs on a PC), over PPP (Point-to-Point Protocol) using an LTE modem. I wrote a C program with sockets, checking each system call's return value. I wanted to test how the socket behaves in an area with bad coverage, so I ran some tests with the antenna removed.
I followed these steps:
Connect to the server --> OK
Start sending data (write system call) --> OK (I also check on the server)
Remove the LTE modem's antenna (there is no network; it can't even ping)
Continue sending data (write system call) --> OK (the server does not receive anything!)
Finish sending data and close the socket --> OK (on the server the connection is still open and no data has arrived since the antenna was removed)
The program finishes
I reattach the antenna
Some time later, the data was uploaded and the connection closed. But I did another test following the same steps with more data, and that data was never uploaded...
I do not know if there is any way to ensure that the data written to the TCP socket is actually received by the server (I thought the TCP layer ensured this). I could do it manually with an application-level ACK, but I guess there has to be a better way.
Sending part code:
while (i < 100)
{
    sprintf(buf, "Message %d\n", i);
    Return = write(Sock_Fd, buf, strlen(buf));
    if (Return != strlen(buf))
    {
        printf("Error sending data to TCP server.\n");
        printf("Error str: %s\n", strerror(errno));
    }
    else
    {
        printf("write successful %d\n", i);
        i++;
    }
    sleep(2);
}
Many thanks for your help.
The write() syscall returns success because the kernel buffers the data and puts it in the socket's out-queue. The data is removed from this queue only once it has been sent and ACKed by the peer. When the out-queue is full, the write() syscall will block.
To determine whether data has not yet been ACKed by the peer, you have to look at the size of the out-queue. On Linux, you can use an ioctl() for this:
ioctl(fd, SIOCOUTQ, &outqlen);
However, it would be cleaner and more portable to use an in-band method to determine whether the data has been received.
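For illustration, here is a minimal sketch of the Linux-only SIOCOUTQ check (it assumes fd is a connected TCP socket; SIOCOUTQ reports the bytes still queued, i.e. written by the application but not yet ACKed by the peer):

#include <sys/ioctl.h>
#include <linux/sockios.h>   /* SIOCOUTQ (Linux-specific) */

/* Returns the number of bytes still sitting in the socket's send queue,
 * or -1 on error. */
static int unacked_bytes(int fd)
{
    int outqlen = 0;
    if (ioctl(fd, SIOCOUTQ, &outqlen) < 0)
        return -1;
    return outqlen;
}

You could poll unacked_bytes(fd) after the last write and before close(); once it drops to 0, the peer's TCP stack has acknowledged everything (which still doesn't prove the peer application has read it).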
TCP/IP is rather primitive technology. The Internet may sound newish, but this is really antique stuff. TCP is needed because IP gives almost no guarantees, but TCP doesn't actually add that many guarantees. Its chief function is to turn a packet protocol into a stream protocol. That means TCP guarantees byte order: no bytes will arrive out of order. Don't count on more than that.
You'll notice that protocols on top of TCP add extra checks. E.g. HTTP has the famous HTTP error codes, precisely because it can't rely on the error state from TCP. You probably have to do the same, or you can consider implementing your service as an HTTP service. "RESTful" refers to an API design methodology which closely follows the HTTP philosophy; this might be relevant to you.
The short answer to your 4th and 5th points is taken as a shortcut from this answer (read the whole answer for more info):
A socket has a send buffer and if a call to the send() function succeeds, it does not mean that the requested data has actually really been sent out, it only means the data has been added to the send buffer. For UDP sockets, the data is usually sent pretty soon, if not immediately, but for TCP sockets, there can be a relatively long delay between adding data to the send buffer and having the TCP implementation really send that data.
As a result, when you close a TCP socket, there may still be pending data in the send buffer, which has not been sent yet but your code considers it as sent, since the send() call succeeded. If the TCP implementation was closing the socket immediately on your request, all of this data would be lost and your code wouldn't even know about that.
TCP is said to be a reliable protocol and losing data just like that is not very reliable. That's why a socket that still has data to send will go into a state called TIME_WAIT when you close it. In that state it will wait until all pending data has been successfully sent or until a timeout is hit, in which case the socket is closed forcefully.
The amount of time the kernel will wait before it closes the socket, regardless of whether it still has pending send data or not, is called the Linger Time.
BTW: that answer also refers to the docs where you can see more detailed info
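For reference, a minimal sketch of tuning that behaviour with the SO_LINGER socket option (an assumption that this is the knob you want to adjust; with l_onoff set and a non-zero l_linger, close() blocks until the queued data is sent and ACKed or the timeout expires, after which the connection is reset):

#include <sys/socket.h>

/* Make close() wait up to `seconds` for queued data to be sent and ACKed.
 * Returns 0 on success, -1 on error (check errno). */
static int set_linger(int fd, int seconds)
{
    struct linger lg;
    lg.l_onoff  = 1;        /* enable lingering on close() */
    lg.l_linger = seconds;  /* how long close() may block  */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}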
I'm writing a UDP server/client application in which the server sends data and the client receives it. When a packet is lost, the client should send a NACK to the server. I set the socket to O_NONBLOCK so that I can notice if the client does not receive a packet:
if ((bytes = recvfrom(....)) != -1) {
    // do something
} else {
    // send nack
}
My problem is that if the server has not started sending packets, the client behaves as if a packet had been lost and starts sending NACKs to the server (recvfrom fails when no data is available). I would like some advice on how to distinguish between the two cases: the server has not started sending packets yet, versus the server is sending but a packet was really lost.
You are using UDP. For this protocol it is perfectly OK to throw away packets if there is a need to do so, so it's not reliable in terms of "what is sent will arrive". What you have to do in your client is check whether all the packets you need have arrived, and if not, politely ask your server to resend the ones you did not receive. Implementing this is not that easy.
If you have to use UDP to transfer a largish chunk of data, then design a small application-level protocol that would handle possible packet loss and re-ordering (that's part of what TCP does for you). I would go with something like this:
Datagrams less than MTU (plus IP and UDP headers) in size (say 1024 bytes) to avoid IP fragmentation.
Fixed-length header for each datagram that includes data length and a sequence number, so you can stitch data back together, and detect missed, duplicate, and re-ordered parts (a sketch of such a header follows this list).
Acknowledgements from the receiving side of what has been successfully received and put together.
Timeout and retransmission on the sending side when these acks don't come within appropriate time.
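As an illustration only, here is one possible shape for that fixed-length header (the field names, sizes, and CHUNK_SIZE are assumptions, not a standard layout); fields are converted to network byte order before sending:

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl/htons for network byte order */

#define CHUNK_SIZE 1016  /* payload per datagram: 8-byte header + 1016 = 1024 bytes */

/* Hypothetical fixed-length datagram header. */
struct chunk_header {
    uint32_t seq;    /* sequence number: reorder data, detect gaps and duplicates */
    uint16_t len;    /* number of valid payload bytes that follow                 */
    uint16_t flags;  /* e.g. "last chunk" marker, ACK/NACK type, ...              */
};

/* Serialize header + payload into a datagram buffer; returns the total size. */
static size_t pack_chunk(uint8_t *out, uint32_t seq, uint16_t flags,
                         const void *data, uint16_t len)
{
    struct chunk_header h;
    h.seq   = htonl(seq);
    h.len   = htons(len);
    h.flags = htons(flags);
    memcpy(out, &h, sizeof(h));
    memcpy(out + sizeof(h), data, len);
    return sizeof(h) + len;
}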
You have a loop calling either select() or poll() to determine whether data has arrived; if so, you then call recvfrom() to read the data.
You can set a timeout for receiving data as follows (config.idletimeout and log_message() are assumed to be defined elsewhere in the program):
ssize_t
recv_timeout(int fd, void *buf, size_t len, int flags)
{
    ssize_t ret;
    struct timeval tv;
    fd_set rset;

    // init the descriptor set
    FD_ZERO(&rset);
    // add our socket to the set
    FD_SET(fd, &rset);

    // this is set to 60 seconds
    tv.tv_sec = config.idletimeout;
    tv.tv_usec = 0;

    // wait until the fd is readable or the timeout expires
    ret = select(fd + 1, &rset, NULL, NULL, &tv);
    if (ret == 0) {
        log_message(LOG_INFO, "Idle Timeout (after select)");
        return 0;
    } else if (ret < 0) {
        log_message(LOG_ERR,
                    "recv_timeout: select() error \"%s\". Closing connection (fd:%d)",
                    strerror(errno), fd);
        return -1;
    }

    ret = recv(fd, buf, len, flags);
    return ret;
}
This says that if there is data ready, then normally read() should return up to the maximum number of bytes that you've specified, possibly including zero bytes (this is actually a valid thing to happen!), but it should never block after previously having reported readiness.
Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.
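A minimal sketch of doing exactly that with fcntl() (assuming a POSIX system):

#include <fcntl.h>

/* Put a socket into non-blocking mode so that a spurious "ready" report
 * from select()/poll() results in EAGAIN instead of a blocking read(). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}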
Look up the sliding window protocol.
The idea is that you divide your payload into packets that each fit in a physical UDP packet, then number them. You can visualize the buffers as a ring of slots, numbered sequentially in some fashion, e.g. clockwise.
Then you start sending from 12 o'clock, moving to 1, 2, 3, ... In the process, you may (or may not) receive ACK packets from the server that contain the slot number of a packet you sent.
If you receive an ACK, then you can remove that packet from the ring and place the next unsent packet (one not already in the ring) there.
If you receive a NAK for a packet you sent, it means that packet was received by the server with corrupted data, and you resend it from the ring slot reported in the NAK.
This protocol class allows transmission over channels with data or packet loss (like RS232, UDP, etc). If your underlying data transmission protocol does not provide checksums, then you need to add a checksum for each ring packet you send, so the server can check its integrity, and report back to you.
ACK and NAK packets from the server can also be lost. To handle this, you need to associate a timer with each ring slot, and if you don't receive either an ACK or a NAK for a slot before the timer reaches the timeout limit you set, then you retransmit the packet and reset the timer.
Finally, to detect fatal connection loss (i.e. server went down), you can establish a maximum timeout value for all your packets in the ring. To evaluate this, you just count how many consecutive timeouts you have for single slots. If this value exceeds the maximum you have set, then you can consider the connection lost.
Obviously, this protocol class requires dataset assembly on both sides based on packet numbers, since packets may not be sent or received in sequence. The "ring" helps with this, since packets are removed only after successful transmission, and on the receiving side, only when the previous packet number has already been removed and appended to the growing dataset. However, this is only one strategy; there are others.
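As a rough illustration only (the names, sizes, and fields are made up, not part of any standard), the per-slot bookkeeping on the sending side might look like this:

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define RING_SLOTS   16    /* number of slots on the "clock face" */
#define PAYLOAD_MAX  1024  /* bytes of payload per UDP packet     */

/* One slot in the sender's ring: a packet that has been sent but not yet
 * ACKed. When an ACK arrives for `seq`, the slot is freed and the next
 * unsent packet takes its place. */
struct ring_slot {
    bool     in_use;              /* slot currently holds an unACKed packet      */
    uint32_t seq;                 /* sequence number carried in the packet       */
    uint16_t len;                 /* payload length                              */
    uint8_t  data[PAYLOAD_MAX];   /* copy kept for possible retransmission       */
    time_t   sent_at;             /* when it was (re)sent, drives the slot timer */
    int      timeouts;            /* consecutive timeouts, for dead-link check   */
};

struct ring_slot ring[RING_SLOTS];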
Hope this helps.
So the basic premise of my program is that I'm supposed to create a TCP session, direct traffic through it, and detect any connection losses. If the connection does break, I need to close the sockets and reopen them (using the same ports) in such a way that it will seem like the connection (almost) never died. It should also be noted that the two programs will be treated as proxies (data gets sent to them; if the connection breaks, it gets stored until the connection is restored, then the data is sent off).
I've done some research and gone ahead and used setsockopt() with the SO_REUSEADDR option to set the socket options so that I can reuse the address.
Here's the basic algorithm I use to detect a connection break, using signals:
After initial setup of sockets, begin sending data
After x seconds, set a flag to false, which will prevent all other data from being sent
Send a single piece of data to let the other program know the connection is still open, reset timer to x seconds
If I receive same piece of data from the program, set the flag to true to continue sending
If I don't receive the data after x seconds, close the socket and attempt to reconnect
(step 5 is where I'm getting the error).
Essentially one program is a client(on one VM) and one program is a server(on another VM), each sending and receiving data to/from each other and to/from another program on each VM.
My question is: Given that I'm still getting this error after setting the socket options, why am I not allowed to re-bind the address when a disconnect has been detected?
The server is the one complaining when a disconnect is detected (I close the socket, open a new one, set the option, and attempt to bind the port with the same information).
One other thing of note is the way I'm receiving the data from the sockets. If I have a socket open, I'm basically reading it by doing the following:
while ((x = recv(socket, buff, 1, 0)) >= 0) {
    // add to buffer
    // send out to other program if connection is alive
}
Since I'm using the timer to close/reopen the socket, and this is in a different thread, will this prevent the socket from closing?
SO_REUSEADDR only allows limited reuse of ports. Specifically, it does not allow reuse of a port that some other socket is currently actively listening for incoming connections on.
There seems to be an epidemic here of people calling bind() and then setsockopt() and wondering why the setsockopt() doesn't fix an error that had already happened on bind().
You have to call setsockopt() first.
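In other words, the option has to be set on the new socket before it is bound again. A minimal sketch of the order (port number and error reporting simplified):

#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Re-create the listening socket: set SO_REUSEADDR *before* bind(). */
static int relisten(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)); /* before bind()! */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}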
But I don't understand your problem. Why do you think you need to use the same ports? Why are you setting a flag preventing you from sending data? You don't need any of this. Just handle the errors on send() when and if they arise, creating a new connection when necessary. Don't try to out-think TCP. Many have tried, few if any have succeeded.
I am debugging a C-based Linux socket program. Like all the examples available on various websites, I applied the following structure:
sockfd= socket(AF_INET, SOCK_STREAM, 0);
connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));
send_bytes = send(sockfd, sock_buff, (size_t)buff_bytes, MSG_DONTWAIT);
I can detect the disconnection when the remote server closes its server program. But if I unplug the Ethernet cable, the send function still returns positive values rather than -1.
How can I check the network connection in a client program assuming that I can not change server side?
But if I unplug the Ethernet cable, the send function still returns positive values rather than -1.
First of all you should know send doesn't actually send anything, it's just a memory-copying function/system call. It copies data from your process to the kernel - sometime later the kernel will fetch that data and send it to the other side after packaging it in segments and packets. Therefore send can only return an error if:
The socket is invalid (for example bogus file descriptor)
The connection is clearly invalid, for example it hasn't been established or has already been terminated in some way (FIN, RST, timeout - see below)
There's no more room to copy the data
The main point is that send doesn't send anything and therefore its return code doesn't tell you anything about data actually reaching the other side.
Back to your question, when TCP sends data it expects a valid acknowledgement in a reasonable amount of time. If it doesn't get one, it resends. How often does it resend? Each TCP stack does things differently, but the norm is to use exponential backoff. That is, first wait 1 second, then 2, then 4 and so on. On some stacks this process can take minutes.
The main point is that in the case of an interruption TCP will declare a connection dead only after a seriously large period of silence (on Linux it does something like 15 retries - more than 5 minutes).
One way to solve this is to implement some acknowledgement mechanism in your application. You could for example send a request to the server "reply within 5 seconds or I'll declare this connection dead" and then recv with a timeout.
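A minimal sketch of the "recv with a timeout" part, using SO_RCVTIMEO (the request/reply format itself is up to your application; only the receive deadline is shown here):

#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Wait at most `seconds` for a reply. Returns the number of bytes received,
 * 0 if the peer closed the connection, or -1 on error; EAGAIN/EWOULDBLOCK
 * in errno means the deadline passed with no reply. */
static ssize_t recv_reply(int fd, void *buf, size_t len, int seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    return recv(fd, buf, len, 0);
}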
To detect a remote-disconnect, do a read()
Check this thread for more info:
Can read() function on a connected socket return zero bytes?
You can't detect an unplugged Ethernet cable just by calling the write() function.
That's because of TCP retransmission, performed by the TCP stack without your program being aware of it.
Here are some solutions.
Even if you have already set the keepalive option on your application socket, you can't detect the dead-connection state of the socket in time if your app keeps writing to the socket.
That's because of TCP retransmission by the kernel TCP stack.
tcp_retries1 and tcp_retries2 are kernel parameters for configuring the TCP retransmission timeout.
It's hard to predict the precise retransmission timeout because it's calculated by the RTT mechanism.
You can see this computation in RFC 793 (3.7, Data Communication):
https://www.rfc-editor.org/rfc/rfc793.txt
Each platform has kernel configuration parameters for TCP retransmission:
Linux : tcp_retries1, tcp_retries2 : (exist in /proc/sys/net/ipv4)
http://linux.die.net/man/7/tcp
HPUX : tcp_ip_notify_interval, tcp_ip_abort_interval
http://www.hpuxtips.es/?q=node/53
AIX : rto_low, rto_high, rto_length, rto_limit
http://www-903.ibm.com/kr/event/download/200804_324_swma/socket.pdf
You should set a lower value for tcp_retries2 (default 15) if you want to detect a dead connection earlier, but it's not a precise time, as I already said.
In addition, you currently can't set those values for a single socket only; they are global kernel parameters.
There was an attempt to add a per-socket TCP retransmission option (http://patchwork.ozlabs.org/patch/55236/), but I don't think it was merged into the mainline kernel. I can't find those option definitions in the system header files.
For reference, you can monitor your keepalive socket option through 'netstat --timers' as below:
https://stackoverflow.com/questions/34914278
netstat -c --timer | grep "192.0.0.1:43245 192.0.68.1:49742"
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (1.92/0/0)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (0.71/0/0)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (9.46/0/1)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (8.30/0/1)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (7.14/0/1)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (5.98/0/1)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (4.82/0/1)
In addition, when the keepalive timeout occurs, you may see different events depending on the platform you use, so you must not determine dead-connection status from the returned events alone.
For example, HP-UX returns a POLLERR event and AIX returns just a POLLIN event when the keepalive timeout occurs.
You will get an ETIMEDOUT error from the recv() call at that time.
In recent kernel versions (since 2.6.37), you can use the TCP_USER_TIMEOUT option, which works well. This option can be set for a single socket.
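A minimal sketch of setting it (assuming Linux >= 2.6.37 and a libc whose <netinet/tcp.h> declares TCP_USER_TIMEOUT; the value is the number of milliseconds transmitted data may stay unacknowledged before the connection is aborted):

#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_USER_TIMEOUT, IPPROTO_TCP */
#include <sys/socket.h>

/* Abort the connection with ETIMEDOUT if sent data stays unACKed
 * for longer than `ms` milliseconds. Returns 0 on success, -1 on error. */
static int set_user_timeout(int fd, unsigned int ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &ms, sizeof(ms));
}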
Finally, you can use recv() with the MSG_PEEK flag, which lets you check that the socket is okay. (MSG_PEEK just peeks at data that has arrived in the kernel's receive buffer without removing it from the queue.)
So you can use this flag just to check that the socket is okay, without any side effects.
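A sketch of such a non-destructive probe, combining MSG_PEEK with MSG_DONTWAIT so the check itself never blocks (an assumption; drop MSG_DONTWAIT if your socket is already non-blocking):

#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Probe the socket without consuming data.
 * Returns 1 if the connection looks alive (data pending, or simply idle),
 * 0 if the peer performed an orderly shutdown, and -1 on a real error
 * (e.g. ETIMEDOUT after a keepalive/user-timeout abort, or ECONNRESET). */
static int socket_alive(int fd)
{
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n > 0)
        return 1;   /* data is waiting; it stays readable for the real recv() */
    if (n == 0)
        return 0;   /* peer closed the connection */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 1;   /* no data right now, but no error either */
    return -1;      /* genuine error */
}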
Check the return value of send(): if it is -1, see whether errno is equal to this value:
EPIPE
This socket was connected but the connection is now broken. In this case, send generates a SIGPIPE signal first; if that signal is ignored or blocked, or if its handler returns, then send fails with EPIPE.
Also install a handler for (or ignore) the SIGPIPE signal, to make this more controllable.
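For example, a common pattern (a sketch assuming Linux/POSIX) is to ignore SIGPIPE process-wide, or to suppress it per call with MSG_NOSIGNAL, so that a broken connection shows up as an EPIPE error instead of a fatal signal:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Option 1: ignore SIGPIPE for the whole process (call once at startup), so a
 * write to a broken connection fails with EPIPE instead of killing the program. */
static void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Option 2 (Linux): suppress SIGPIPE for a single call with MSG_NOSIGNAL. */
static ssize_t send_checked(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
    if (n == -1 && errno == EPIPE)
        fprintf(stderr, "peer closed the connection (EPIPE)\n");
    return n;
}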