Using C sockets: Address already in use - c

So the basic premise of my program is that I'm supposed to create a tcp session, direct traffic through it, and detect any connection losses. If the connection does break, I need to close the sockets and reopen them (using the same ports) in such a way that it will seem like the connection (almost) never died. It should also be noted that the two programs will be treated as proxies (data gets sent to them, if the connection breaks it gets stored until connection is fixed, then data is sent off).
I've done some research and gone ahead and used setsockopt() with the SO_REUSEADDR option to set the socket options so that I can reuse the address.
Here's the basic algorithm I do to detect a connection break using signals:
After initial setup of sockets, begin sending data
After x seconds, set a flag to false, which will prevent all other data from being sent
Send a single piece of data to let the other program know the connection is still open, reset timer to x seconds
If I receive same piece of data from the program, set the flag to true to continue sending
If I don't receive the data after x seconds, close the socket and attempt to reconnect
(step 5 is where I'm getting the error).
Essentially one program is a client(on one VM) and one program is a server(on another VM), each sending and receiving data to/from each other and to/from another program on each VM.
My question is: Given that I'm still getting this error after setting the socket options, why am I not allowed to re-bind the address when a connection has been detected?
The server is the one complaining when a disconnect is detected (I close the socket, open a new one, set the option, and attempt to bind the port with the same information).
One other thing of note is the way I'm receiving the data from the sockets. If I have a socket open, I'm basically reading it by doing the following:
while((x = recv(socket, buff, 1, 0)>=0){
//add to buffer
// send out to other program if connection is alive
}
Since I'm using the timer to close/reopen the socket, and this is in a different thread, will this prevent the socket from closing?

SO_REUSEADDR only allows limited reuse of ports. Specifically, it does not allow reuse of a port that some other socket is currently actively listening for incoming connections on.

There seems to be an epidemic here of people calling bind() and then setsockopt() and wondering why the setsockopt() doesn't fix an error that had already happened on bind().
You have to call setsockopt() first.
But I don't understand your problem. Why do you think you need to use the same ports? Why are you setting a flag preventing you from sending data? You don't need any of this. Just handle the errors on send() when and if they arise, creating a new connection when necessary. Don't try to out-think TCP. Many have tried, few if any have succeeded.

Related

TCP Sockets in C with bad network

I am doing some test with TCP client application in a Raspberry Pi (server in the PC), with PPP (Point to Point Protocol) using a LTE Modem. I have used C program with sockets, checking system call's response. I wanted to test how socket works in a bad coverage area so I did some test removing the antenna.
I have followed the next steps:
Connect to server --> OK
Start sending data (write system call) --> OK (I also check in the server)
I removed the LTE modem's antenna (There is no network, it can't do ping)
Continue sending data (write system call) --> OK (server does not receive anything!!!)
It finished sending data and closed socket --> OK (connection is still opened and there is no data since the antenna was removed)
Program was finished
I put the antenna again
Some time later, the data has been uploaded and the connection closed. But I did another test following this steps but with more data, and it did not upload this data...
I do not know if there any way to ensure that the data written to TCP server is received by the server (I thought that TCP layer ensured this..). I could do it manually using an ACK but I guess that it has to be a better way to do.
Sending part code:
while(i<100)
{
sprintf(buf, "Message %d\n", i);
Return = write(Sock_Fd, buf, strlen(buf));
if(Return!=strlen(buf))
{
printf("Error sending data to TCP server. \n");
printf("Error str: %s \n", strerror(errno));
}
else
{
printf("write successful %d\n", i);
i++;
}
sleep(2);
}
Many thanks for your help.
The write()-syscall returns true, since the kernel buffers the data and puts it in the out-queue of the socket. It is removed from this queue when the data was sent and acked from the peer. When the OutQueue is full, the write-syscall will block.
To determine, if data has not been acked by the peer, you have to look at the size of the outqueue. With linux, you can use an ioctl() for this:
ioctl(fd, SIOCOUTQ, &outqlen);
However, it would be more clean and portable to use an inband method for determining if the data has been received.
TCP/IP is rather primitive technology. Internet may sound newish, but this is really antique stuff. TCP is needed because IP gives almost no guarantees, but TCP doesn't actually add that many guarantees. Its chief function is to turn a packet protocol into a stream protocol. That means TCP guarantees a byte order; no bytes will arrive out of order. Don't count on more than that.
You see that protocols on top of TCP add extra checks. E.g. HTTP has the famous HTTP error codes, precisely because it can't rely on the error state from TCP. You probably have to do the same - or you can consider implementing your service as a HTTP service. "RESTful" refers to an API design methodology which closely follows the HTTP philosophy; this might be relevant to you.
The short answer to your 4th and 5th topics was taken as a shortcut from this answer (read the whole answer to get more info)
A socket has a send buffer and if a call to the send() function succeeds, it does not mean that the requested data has actually really been sent out, it only means the data has been added to the send buffer. For UDP sockets, the data is usually sent pretty soon, if not immediately, but for TCP sockets, there can be a relatively long delay between adding data to the send buffer and having the TCP implementation really send that data. As a result, when you close a TCP socket, there may still be pending data in the send buffer, which has not been sent yet but your code considers it as sent, since the send() call succeeded. If the TCP implementation was closing the socket immediately on your request, all of this data would be lost and your code wouldn't even know about that. TCP is said to be a reliable protocol and losing data just like that is not very reliable. That's why a socket that still has data to send will go into a state called TIME_WAIT when you close it. In that state it will wait until all pending data has been successfully sent or until a timeout is hit, in which case the socket is closed forcefully.
The amount of time the kernel will wait before it closes the socket,
regardless if it still has pending send data or not, is called the
Linger Time.
BTW: that answer also refers to the docs where you can see more detailed info

Writing on a TCP socket closed by the peer

I have a client-server application where each side communicate with the other via TCP socket.
I properly establish the connection and then I crash the server BEFORE any data is written on the socket by the client.
What I see is that the first write() attempt (client-side) is successful and it returns the actual number of written bytes, while the following ones return (as I expected) -1 (receiving a SIGPIPE) and errno=EPIPE.
Why the first write() is successful even if the socket is already closed?
EDIT
Sometimes also the following write() have a positive return values, as if everything goes well.
You're confused by what the return value of write() means. It doesn't mean, "the peer got the data and acknowledged it". Instead, it means, "I buffered so-many bytes to send to the peer and they're my responsibility now, so you can forget about them (and I don't have any pending errors)".
That is, if the TCP stack accepts the write and returns n bytes, that doesn't mean they've been written yet, just queued for writing. It'll take some time, perhaps 30s after it starts sending network traffic, before the stack gives up and returns an error to you. During that time, you could have done several calls to write() which were successful at queueing data for sending. (The write error will be returned in c.30s if the peer has vanished, or immediately if the peer can be contacted and sends a RST packet straight away to indicate the connection is dead.)
This has to do with how TCP/IP works, that can be roughly described as two mostly independent half-connections. When you close the socket at the server, the client is told that it will not receive further data from the C<-S half-connection, waking up read() immediatly, but not about the C->S direction. It only gets a reply resetting the connection after it tries to send some data. I recommend the TCP/IP Guide for further details.
The reason why sometimes you can write() twice is that you write faster than the round-trip time and can squeeze a second write() before the reply to the first one.
I'm using the following method to detect a disconnected server condition:
After getting the select() timeout on a socket (nothing was received, though was supposed to),
the 'system("ping -c 1 -w 1 server");' command is activated.
If the server is up and just lagging, the ping command will return in less than 0.1 seconds.
Otherwise (the server is down), the ping command will return in 1 second.

how to simulate abnormal case for socket/tcp programming in linux, such as terminating one side of connection?

i am learning to use SO_SNDTIMEO and SO_RCVTIMEO to check the timeout.
It is easy to use with read socket. But when i want to check write timeout, it always return successful. Here is what i did:(all in blocking mode)
close the client read socket and exit before server start write
terminate the client before server start write
unplug the cable of server after accept but before write
well, it seems all these case write just return sucessfully.
I think the reason should be that port is resource managed by os, and at the client side, after program gone, the tcp connection still shows FIN_WAIT2 state.
so, is there any convenient way to simulate some cases that write can receive errors such as EPIPE, EAGAIN?
How to get the error EAGAIN?
To get the error EAGAIN, you need to be using Non-Blocking Sockets. With Non-Blocking sockets, you need to write huge amounts of data (and stop receiving data on the peer side), so that your internal TCP buffer gets filled and returns this error.
How to get the error EPIPE?
To get the error EPIPE, you need to send large amount of data after closing the socket on the peer side. You can get more info about EPIPE error from this SO Link. I had asked a question about Broken Pipe Error in the link provided and the accepted answer gives a detailed explanation. It is important to note that to get EPIPE error you should have set the flags parameter of send to MSG_NOSIGNAL. Without that, an abnormal send can generate SIGPIPE signal.
Additional Note
Please note that it is difficult to simulate a write failure, as TCP generally stores the data that you are trying to write into it's internal buffer. So, if the internal buffer has sufficient space, then you won't get an error immediately. The best way is to try to write huge amounts of data. You can also try setting a smaller buffer size for send by using setsockopt function with SO_SNDBUF option
You can simulate errors using fault injection. For example, libfiu is a fault injection library that comes with an example project that allows you to simulate errors from POSIX functions. Basically it uses LD_PRELOAD to inject a wrapper around the regular system calls (including write), and then the wrapper can be configured to either pass through to the real system call, or return whatever error you like.
You could set the receive buffer size to be really small on one side, and send a large buffer on the other. Or on the one side set the send buffer small and try to send a large message.
Otherwise the most common test (I think) is to let the server and client talk for a while, and then remove a network cable.

Data transfer over TCP-IPv6 connection

I am working on a client-server application in C and on Linux platform. What I am trying to achieve is to change the socket id over a TCP connection on both client and server without data loss where in the client sends the data from a file to the server in the main thread. The application is multithreaded where the other threads change the socket id based on some global flags set.
Problem: The application has two TCP socket connections established, over both IPv4 and IPv6 paths. I am transferring a file over the TCP-IPv4 connection first in the main thread. The other thread is checking on some global flags and has access to/share the socket IDs created for each protocol in the main thread. The send and recv use a pointer variable in its call to point to the socket ID to be used for the data transfer. The data is transferred initially over TCP-Ipv4. Once the global flags are set and few other checks are made the other thread changes the socket ID used in send call to point to IPv6 socket. This thread also takes care of communicating the change between the two hosts.I am getting all the data over IPv4 sent completely before switching. Also I am getting data sent over Ipv6 after the socket ID is just switched. But down the transfer there is loss of data over IPv6 connection.(I am using a pointer variable in send function on server side send(*p_dataSocket.socket_id,sentence,p_size,0); to change the pointer to IPv6 socket ID on the fly)
The error after recv and send call on both side respectively is says ESPIPE:Illegal seek, but this error exists even before switching. So I am pretty much sure this is nothing to do with the data loss
I am using pselect() to check for the available data for each socket. I can somehow understand the data loss while switching(if not properly handled) but I am not able to figure out why the data loss is occurring down the transfer after switching. I hope I am clear on what the issue is. I have also checked to send the data individually over each protocol without switching and there is no data loss.It I initially transfer the data over Ipv6 and then switch to IPv4, there is no data loss. Also would really appreciate to know to how to investigate in this issue apart from using errno or netstat.
When you are using TCP to send data you just can't loose a part of the information in between. You either receive the byte stream the way it was sent or receive nothing at all - provided that you are using the socket-related functions correctly.
There are several points you may want to investigate.
First of all you must make sure that you are really sending the data which is lost. Add some logging on the server side application: dump anything that you transmit witn send() into some file. Include some extra info as well, like:
Data packet no.==1234, *p_dataSocket.socket_id==11, Data=="data_contents_here", 22 bytes total; send() return==22
The important thing here is to watch the contents of *p_dataSocket.socket_id. Make sure that you are using mutex or something like that cause you have a thread which regularly reads socket_id contents and another thread which occasionally changes it. You are not guranteed against the getting of a wrong value from that address unless your threads have monopoly access to it while reading/writing. It is important both for normal program operation and for the debugging information generation.
Another possible problem here is the logic which selects sentence to send. Corruption of this variable may be hard to track in multithreaded program. The logging of transmitted information will help you here too.
Use any TCP sniffer to check what TCP stack really transmits. Are there packets with lost data? If there are no those packets, try to find out which send() call was responsible for sending that data. If those packets exist, check the receiving side for bugs.
errno value should not be used alone. Its value has meaning only when you get an erroneous return from a function. Try to find out when exactly errno becomes ESPIPE That may happen when any of API functions return something like -1 (depends on function). When you find out where it happens you should find out what is wrong in that particular piece of code (debugger is your friend). Have in mind that errno behavior in multithreaded environment depends on your system implementation. Make sure that you use -pthread option (gcc) or at least compile with -D_REENTRANT to minimize the risks.
Check this question for some info about the possible cause of your situation with errno==ESPIPE. Try some debuggin techniques, as suggested there. Errno value of ESPIPE gives a hint that you are using file descriptors incorrectly somewhere in your program. Maybe somewhere you are using a socket fd as regular file or something like that. This may be caused by some race condition (simultaneous access to one object from several threads).

Socket read() hangs for a while when there is no data to read

Hi' I'm writing a simple http port forwarder. I read data from port 80, and pass the data to my lighttpd server, on port 8080.
As long as I write() data on the socket on port 8080 (forwarding the request) there's no problem, but when I read() data from that socket (forwarding the response), the last read() hangs a lot (about 1 or 2 seconds) before realizing there's no more data and returning 0.
I tried to set the socket to non-blocking, but this doesn't work, as sometimes it returns EWOULDBLOCKING even if there's some data left (lighttpd + cgi can be quite slow).
I tried to set a timeout with select(), but, as above, a slow cgi could timeout the socket when there's actually some data to transmit.
Update: SOLVED. It was the keepalive after all. After I disabled it in my lighttpd configuration file, the whole thing runs flawlessly.
Well, for the sake of completion, and as per my comment:
It is likely that the HTTP server itself (lighttpd in your case) is maintaining a persistent connection to your proxy because your proxy relayed a header containing “Connection: keep-alive”. This header aids when the client wants to make multiple requests over the same connection. So, because lighttpd received this header, it assumed it was going to receive further requests and kept the socket open, causing read to block in your proxy.
Disabling keep-alive in your lighttpd configuration is one way to fix it, but also you could also strip the “Connection: keep-alive“ from the header before you relay it to your web server.
Using both non-blocking sockets and select is the right way to go. Returning EWLOULDBLOCK doesn't mean that the entire stream of data is finished being received, it means that, instantaneously, there is nothing to read right now. That's exactly what you want, because it means that read won't wait even half a second for more data to show up. If the data isn't immediately available it will return.
Now, obviously, this means you will need to call read multiple times to get the complete data. The general format for doing this is a select loop. In pseudocode:
do
select ( my_sockets )
if ( select error )
handle_error
else
for each ( socket in my_sockets ) do
if ( socket is ready ) then
nonblocking read from socket
if ( no data was read ) then
close socket
remove socket from my_sockets
endif
endif
loop
endif
loop
The idea is that select will tell you which sockets have data available for reading right now. If you read one of those sockets, you are guaranteed either to get data or to get a return value of 0, indicating that the remote end closed the socket.
If you use this method, you will never be stuck in a read call that is not reading data, for any length of time. The blocking operation is the select call, and you can also select over writeable sockets if you need to write, and set a timeout if you need to do things periodically.
Don't do that!
Keepalives boost performance from other clients. Instead, fix your client. Send a Connection: close header in your client and make sure your request doesn't claim HTTP/1.1 compliance. (If for no other reason than that you probably don't handle chunked encoding either.)
I guess that I would use non-blocking I/O to full extend. Instead of setting timeouts I'd rather wait for event's:
while(select(...)) {
switch(...) {
case ...: // Handle accepting new connection
case ...: // Handle reading from socket
...
}
}
Sinle-thread, blocking forwarder will cause problems anyway with multiple clients.
Sorry - I don't remember exact calls. Also it can be strange in some cases (IIRC - you need to handle write), but there are libraries which simplify the task.

Resources