I have a program that consists of a master server and distributed slave servers. The slave servers send status updates to the server, and if the server hasn't heard from a specific slave in a fixed period, it marks the slave as down. This is happening consistently.
From inspecting logs, I have found that the slave is only able to send one status update to the server, and then is never able to send another update, always failing on the call to connect() with "Cannot assign requested address" (errno 99).
Oddly enough, the slave is able to send several other updates to the server, and all of the connections are happening on the same port. It seems that the most common cause of this failure is that connections are left open, but I'm having trouble finding anything left open. Are there other possible explanations?
To clarify, here's how I'm connecting:
struct sockaddr *sa; // parameter
size_t sa_size; //parameter
int i = 1;
int stream;
stream = socket(AF_INET,SOCK_STREAM,0);
setsockopt(stream,SOL_SOCKET,SO_REUSEADDR,&i,sizeof(i));
bindresvport(stream,NULL);
connect(stream,sa,sa_size);
This code is in a function to obtain a connection to another server, and a failure on any of those 4 calls causes the function to fail.
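For reference, here is a minimal sketch of such a function with every call checked and the descriptor closed on each failure path, so that failed attempts cannot pile up as open sockets. The name open_connection is hypothetical, and the sketch assumes a reserved source port is not actually required, so it leaves out bindresvport() and lets the kernel choose the local port:

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>   /* struct sockaddr_in, for callers building `sa` */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a connected descriptor, or -1 with errno set. */
int open_connection(const struct sockaddr *sa, socklen_t sa_size)
{
    assert(sa != NULL);

    int one = 1;
    int fd = socket(sa->sa_family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
        connect(fd, sa, sa_size) < 0) {
        int saved = errno;   /* close() may overwrite errno */
        fprintf(stderr, "open_connection: %s (%d)\n", strerror(saved), saved);
        close(fd);           /* do not leak the descriptor on failure */
        errno = saved;
        return -1;
    }
    return fd;
}
```

The caller owns the returned descriptor and has to close() it once the status update has been sent.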
It turns out that the problem really was that the address was busy - the busyness was caused by some other problems in how we are handling network communications. Your inputs have helped me figure this out. Thank you.
EDIT: to be specific, the problems in handling our network communications were that these status updates would be constantly re-sent if the first failed. It was only a matter of time until we had every distributed slave trying to send its status update at the same time, which was over-saturating our network.
Maybe SO_REUSEADDR helps here?
http://www.unixguide.net/network/socketfaq/4.5.shtml
This is just a shot in the dark: when you call connect() without a bind() first, the system allocates your local port, and if you have multiple threads connecting and disconnecting, it could possibly try to allocate a port already in use. The kernel source file inet_connection_sock.c hints at this condition. Just as an experiment, try doing a bind() to a local port first, making sure each bind/connect uses a different local port number.
Okay, my problem wasn't the port, but the binding address. My server has an internal address (10.0.0.4) and an external address (52.175.223.XX). When I tried connecting with:
$sock = @stream_socket_server('tcp://52.175.223.XX:123', $errNo, $errStr, STREAM_SERVER_BIND|STREAM_SERVER_LISTEN);
It failed because the local address was 10.0.0.4 and not the external 52.175.223.XX. You can check the locally available interfaces with sudo ifconfig.
sysctl -w net.ipv4.tcp_timestamps=1
sysctl -w net.ipv4.tcp_tw_recycle=1
I am developing a 9p server, which is pretty much like an NFS server. Repeated mounting and unmounting leaks no socket descriptors, because I am able to close the socket. However, in the following scenario the server does not clean up properly and close the socket: the client on Machine A mounts a file system from the server machine, and then, for some reason, Machine A restarts or is shut down. When this happens, I expect the server to clean up and close the socket, but instead it blocks on read(). I thought read() should return 0 when a connection is closed, but it doesn't. I assume that's because a proper TCP termination has not occurred, so the server is still waiting for data from the client. Here is pseudocode for my server:
while (1) {
    n = read(sockfd, buffer, 4); // 4 is protocol header that specifies the size
    if (n == 0) break;
    /* iteratively read the rest of bytes until the incoming message ends */
}
cleanup(); // close socket and some other tasks
However, when the client restarts while the server is blocking on read(), nothing happens. What is the best and easiest way to solve this? Some people suggest running a separate thread that checks connections but this is too involved. I am sure there must be a faster way.
When the client does a clean shutdown, the client's OS terminates all of its TCP connections. But when the client crashes, is switched off, or a network problem occurs somewhere on the path between the client and the server, there is no way to deliver that information to the server, and the server may stay blocked in the read() call forever.
There are two possible solutions. Either you can use standard TCP keep alive probes or you can implement an application level health-check.
TCP keep alive
TCP keep-alive is well described for example at http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html :
In order to understand what TCP keepalive (which we will just call keepalive) does, you need do nothing more than read the name: keep TCP alive. This means that you will be able to check your connected socket (also known as TCP sockets), and determine whether the connection is still up and running or if it has broken...
When you want your application to use TCP keep-alive, just set the socket option (error checking is omitted):
int optval = 1;
socklen_t optlen = sizeof(optval);
setsockopt(socket, SOL_SOCKET, SO_KEEPALIVE, &optval, optlen);
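TCP keep-alive is easy to use, but its default timings come from the OS configuration, which is system-wide. Portably, an application cannot set its own timeouts; Linux does allow per-socket overrides through the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options.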
Application level health check
Use an application-level mechanism when you need application-specific timeouts for detecting disconnection. There are plenty of ways to implement it. The idea is to periodically send a small piece of otherwise useless data and to assume the connection is dead when it stops arriving.
I want to amend Zaboj Campula's good answer with the most important way to deal with this: Timeouts. Normally, you would assign a timeout to any socket operation. A typical value is 30 seconds. That way there is no need for keep alives most of the time. Connection failure will be detected within 30 seconds.
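One way to get such a timeout on a blocking socket is the SO_RCVTIMEO socket option. The sketch below is only an illustration: set_recv_timeout and read_exact are hypothetical helper names, and the 4-byte header comes from the pseudocode in the question.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make blocking reads on fd give up after `seconds`.
 * A value of 0 means "no timeout" (block indefinitely). */
int set_recv_timeout(int fd, int seconds)
{
    assert(seconds >= 0);
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: read exactly `len` bytes.
 * Returns 1 on success, 0 if the peer closed the connection first,
 * and -1 on error or timeout (errno is EAGAIN or EWOULDBLOCK when
 * the receive timeout expired). */
int read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == 0)
            return 0;
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 1;
}
```

In the question's loop, the header read would become read_exact(sockfd, buffer, 4) after a call such as set_recv_timeout(sockfd, 30), with cleanup() run on any result other than 1. Note that an idle but healthy client also hits the timeout, so this fits best when the protocol has regular traffic.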
Some people suggest running a separate thread that checks connections but this is too involved.
That does not work because your machine does not know that the connection is gone. There is nothing to check.
I am attempting to bind a socket to a port below:
if( bind(socket_desc,(struct sockaddr *) &server, sizeof(server)) < 0)
{
perror("bind failed. Error");
return 1;
}
puts("bind done");
But it gives:
$ ./serve
Socket created
bind failed. Error: Address already in use
Why does this error occur?
Everyone is correct. However, if you're also busy testing your code your own application might still "own" the socket if it starts and stops relatively quickly. Try SO_REUSEADDR as a socket option:
What exactly does SO_REUSEADDR do?
This socket option tells the kernel that even if this port is busy (in
the TIME_WAIT state), go ahead and reuse it anyway. If it is busy,
but with another state, you will still get an address already in use
error. It is useful if your server has been shut down, and then
restarted right away while sockets are still active on its port. You
should be aware that if any unexpected data comes in, it may confuse
your server, but while this is possible, it is not likely.
It has been pointed out that "A socket is a 5 tuple (proto, local
addr, local port, remote addr, remote port). SO_REUSEADDR just says
that you can reuse local addresses. The 5 tuple still must be
unique!" by Michael Hunter (mphunter@qnx.com). This is true, and this
is why it is very unlikely that unexpected data will ever be seen by
your server. The danger is that such a 5 tuple is still floating
around on the net, and while it is bouncing around, a new connection
from the same client, on the same system, happens to get the same
remote port. This is explained by Richard Stevens in ``2.7 Please
explain the TIME_WAIT state.''.
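As a rough sketch of where the option goes: it has to be set after socket() and before bind(). The helper name make_listener and its parameters are made up for this example, and it binds to all IPv4 interfaces for simplicity.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening TCP socket on `port`
 * (all IPv4 interfaces), or -1 on failure. */
int make_listener(unsigned short port, int backlog)
{
    assert(backlog > 0);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    /* SO_REUSEADDR must be set before bind() to affect it. */
    int one = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Consistent with the quote, this only helps with leftovers in TIME_WAIT: if another socket is still actively listening on the port, bind() fails with "Address already in use" regardless.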
You have a process that is already using that port. netstat -tulpn will let you find the ID of the process that is using a particular port.
Address already in use means that the port you are trying to allocate for your current execution is already occupied/allocated to some other process.
If you are a developer working on an application that requires a lot of testing, you might have another instance of the same application running in the background (maybe you forgot to stop it properly).
So if you encounter this error, just see which application/process is using the port.
On Linux, try netstat -tulpn. This command lists the listening TCP and UDP sockets together with the PID and name of the process that owns each one.
Check whether an application is using your port. If that application or process is an important one, you might want to use another port that is not used by any process/application.
Alternatively, you can stop the process that uses your port and let your application take it.
If you are in linux environment try,
Use netstat -tulpn to display the processes
kill <pid> This will terminate the process
If you are using windows,
Use netstat -a -o -n to check for the port usages
Use taskkill /F /PID <pid> to kill that process
The error usually means that the port you are trying to open is being already used by another application. Try using netstat to see which ports are open and then use an available port.
Also check if you are binding to the right ip address (I am assuming it would be localhost)
If the address is already in use and you just want to kill whichever process is using the port, you can use
lsof -ti:PortNumberGoesHere | xargs kill -9
source and inspiration this.
PS: I could not use netstat because it was not installed.
As mentioned above the port is in use already.
This could be due to several reasons
some other application is already using it.
The port is in the CLOSE_WAIT state: the other end has closed the connection, but the local program has not yet closed its socket. Refer to https://unix.stackexchange.com/questions/10106/orphaned-connections-in-close-wait-state.
The port might be in the TIME_WAIT state. You can wait, or use the socket option SO_REUSEADDR as mentioned in another post.
Do netstat -a | grep <portno> to check the port state.
It also happens when you have not given enough permissions (read and write) to your sock file!
Just add the expected permissions to the folder containing your sock file and to the sock file itself:
chmod ug+rw /path/to/your/
chmod ug+rw /path/to/your/file.sock
Then have fun!
I was also facing that problem, but I resolved it.
Make sure that the client-side and server-side programs are in different projects in your IDE, in my case NetBeans. Assuming you're using localhost, I recommend implementing the two programs as two separate projects.
To terminate all node processes:
killall -9 node
First, check which ports are listening:
netstat -tlpn
Then select an available port to connect to:
sudo netstat -tlpn | grep ':port'
Set it in your server and client interfaces as well: go to the Barrier tab -> change settings -> type the port value -> save/ok
Check that both the clients and the server have the same port value.
Then Reload.
Now it should be ok.
Check for running process pid:
pidof <process-name>
Kill processes:
sudo kill -9 process_id_1 process_id_2 process_id_3
I was trying to learn the usage of option SO_KEEPALIVE in socket programming in C language under Linux environment.
I created a server socket and used my browser to connect to it. It was successful and I was able to read the GET request, but I got stuck on the usage of SO_KEEPALIVE.
I checked this link keepalive_description@tldg.org but I could not find any example which shows how to use it.
As soon as I detect the client's request on accept() function I set the SO_KEEPALIVE option value 1 on the client socket. Now I don't know, how to check if the client is down, how to change the time interval between the probes sent etc.
I mean, how will I get the signal that the client is down? (Without reading or writing at the client - I thought I will get some signal when probes are not replied back from client), how should I program it after setting the option SO_KEEPALIVE on).
Also if suppose the probes are sent every 3 secs and the client goes down in between I will not get to know that client is down and I may get SIGPIPE.
Anyways importantly I wanna know how to use SO_KEEPALIVE in the code.
To modify the number of probes or the probe intervals, you write values to the /proc filesystem like
echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 60 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 20 > /proc/sys/net/ipv4/tcp_keepalive_probes
Note that these values are global for all keepalive-enabled sockets on the system. You can also override these settings on a per-socket basis with setsockopt(); see section 4.2 of the document you linked.
You can't "check" the status of the socket from userspace with keepalive. Instead, the kernel is simply more aggressive about forcing the remote end to acknowledge packets, and determining if the socket has gone bad. When you attempt to write to the socket, you will get a SIGPIPE if keepalive has determined remote end is down.
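If you would rather get an error return than a signal, one option on Linux is to pass MSG_NOSIGNAL to send(). A small sketch follows; the helper name send_all is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send all `len` bytes without ever raising SIGPIPE.
 * Returns 0 on success, -1 on error with errno set; EPIPE or ECONNRESET
 * indicate that the connection is gone. */
int send_all(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The alternative is to ignore the signal process-wide with signal(SIGPIPE, SIG_IGN), which has the same effect on every descriptor in the program.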
You'll get the same result if you enable SO_KEEPALIVE, as if you don't enable SO_KEEPALIVE - typically you'll find the socket ready and get an error when you read from it.
You can set the keepalive timeout on a per-socket basis under Linux (this may be a Linux-specific feature). I'd recommend this rather than changing the system-wide setting. See the man page for tcp for more info.
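A minimal sketch of the per-socket override on Linux, for illustration. The helper name enable_keepalive and its parameters are hypothetical; TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are the Linux options described in tcp(7) and are not available under those names on every OS.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (Linux-specific options).
 * idle_secs:     idle time before the first probe is sent
 * interval_secs: time between unanswered probes
 * probes:        unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on error with errno set. */
int enable_keepalive(int fd, int idle_secs, int interval_secs, int probes)
{
    assert(idle_secs > 0 && interval_secs > 0 && probes > 0);

    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_secs, sizeof(interval_secs)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probes, sizeof(probes)) < 0)
        return -1;
    return 0;
}
```

With enable_keepalive(fd, 60, 10, 5), a silent peer would be probed after 60 seconds of idleness and given up on after 5 unanswered probes sent 10 seconds apart, i.e. roughly 110 seconds after the last traffic.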
Finally, if your client is a web browser, it's quite likely that it will close the socket fairly quickly anyway - most of them will only hold keepalive (HTTP 1.1) connections open for a relatively short time (30s, 1 min etc). Of course if the client machine has disappeared or network down (which is what SO_KEEPALIVE is really useful for detecting), then it won't be able to actively close the socket.
As already discussed, SO_KEEPALIVE makes the kernel more aggressive about continually verifying the connection even when you're not doing anything, but does not change or enhance the way the information is delivered to you. You'll find out when you try to actually do something (for example "write"), and you'll find out right away since the kernel is now just reporting the status of a previously set flag, rather than having to wait a few seconds (or much longer in some cases) for network activity to fail. The exact same code logic you had for handling the "other side went away unexpectedly" condition will still be used; what changes is the timing (not the method).
Virtually every "practical" sockets program in some way provides non-blocking access to the sockets during the data phase (maybe with select()/poll(), or maybe with fcntl()/O_NONBLOCK/EINPROGRESS&EWOULDBLOCK, or if your kernel supports it maybe with MSG_DONTWAIT). Assuming this is already done for other reasons, it's trivial (sometimes requiring no code at all) to in addition find out right away about a connection dropping. But if the data phase does not already somehow provide non-blocking access to the sockets, you won't find out about the connection dropping until the next time you try to do something.
(A TCP socket connection without some sort of non-blocking behaviour during the data phase is notoriously fragile, as if the wrong packet encounters a network problem it's very easy for the program to then "hang" indefinitely, and there's not a whole lot you can do about it.)
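To illustrate the poll() variant, here is a sketch of a read that cannot hang forever. The helper name read_with_timeout is hypothetical, and reporting the timeout as errno == ETIMEDOUT is a choice made for this example rather than anything poll() does itself.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable,
 * then read. Returns the byte count, 0 on end-of-file, or -1 on error;
 * a timeout is reported as -1 with errno set to ETIMEDOUT. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL && timeout_ms >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;
    do {
        /* Simplification: after EINTR the full timeout is waited again. */
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    /* POLLIN, POLLHUP or POLLERR: read() returns the data, EOF or error. */
    return read(fd, buf, len);
}
```

With keepalive enabled on the socket, a connection that the kernel has declared dead makes the descriptor readable, so the read() here returns the error instead of blocking.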
Short answer, add
int flags = 1;
if (setsockopt(sfd, SOL_SOCKET, SO_KEEPALIVE, (void *)&flags, sizeof(flags))) {
    perror("ERROR: setsockopt(), SO_KEEPALIVE");
    exit(EXIT_FAILURE);
}
on the server side, and read() will be unblocked when the client is down.
A full explanation can be found here.
I'm running a game website where users connect using an Adobe Flash client to a C server running on a Fedora Linux box.
Often users complain about disconnects. Usually they're "Connection reset by peer"-disconnects.
Is there any way to make the connection more stable or does it all depend on the route from the user host to my server?
One thing I tried is to make it more stable by sending PING in clear text every other minute to avoid timeout problems.
Anyone got more ideas?
You are not exhausting the sockets, memory, or CPU that the server process is allowed on the server, are you?
Do check with ulimit.
Also, if possible, try to trace the error in the source code (at the point where an RST packet is sent), i.e. where a send() or accept() returns an error value. In such cases, print a debug message to the logs; if you really fancy debugging it, do a simulation of the server:
run it in debug mode on a separate machine (possibly a clone of the server)
simulate thousands of connections (or find a network harnessing program)
backtrace the call and/or sniff the connection
Where are you running the server?
At home? At work? At a hosting facility?
This will make a very big difference.
Can you design your app to connect to two sockets on the server and then load balance or make it active/passive (or active/active)?
You can use SO_KEEPALIVE TCP socket option.
How, in C, can I detect whether a program is connecting to itself?
For example, I've set up a listener on port 1234, then I set up another socket to connect to an arbitrary address on port 1234. I want to detect whether I'm connecting to my own program. Is there any way?
Thanks,
Dave
Linux provides tools that I think can solve this problem. If the connection is to the same machine, you can run
fuser -n tcp <port-number>
and get back a list of processes listening to that port. You can then look in /proc and find out if there is a process with a pid not your own which is running the same binary you are. A bit of chewing gum and baling wire will help keep the whole contraption together.
I don't think you can easily ask questions about a process on another machine.
One of the parameters to the accept() function is a pointer to a struct sockaddr.
When you call accept() on the server side it will fill in the address of the remote machine connecting to your server socket.
If that address matches the address of any of the interfaces on that machine then that indicates that the client is on the same machine as the server.
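A rough sketch of that comparison using getifaddrs(), which is available on Linux and the BSDs. The helper name is_local_peer is hypothetical, the sketch handles IPv4 only, and it additionally treats all of 127.0.0.0/8 as local. Keep in mind that it can only tell you the peer is on the same machine, not that it is the same process.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if the peer address is loopback or is
 * assigned to one of this machine's interfaces, 0 if it is not, and -1
 * if the interface list could not be read. IPv4 only. */
int is_local_peer(const struct sockaddr_in *peer)
{
    assert(peer != NULL);

    /* Anything in 127.0.0.0/8 is the local host. */
    if ((ntohl(peer->sin_addr.s_addr) >> 24) == 127)
        return 1;

    struct ifaddrs *list = NULL;
    if (getifaddrs(&list) < 0)
        return -1;

    int found = 0;
    for (struct ifaddrs *it = list; it != NULL; it = it->ifa_next) {
        /* ifa_addr may be NULL for interfaces without an address. */
        if (it->ifa_addr == NULL || it->ifa_addr->sa_family != AF_INET)
            continue;
        const struct sockaddr_in *sin =
            (const struct sockaddr_in *)it->ifa_addr;
        if (sin->sin_addr.s_addr == peer->sin_addr.s_addr) {
            found = 1;
            break;
        }
    }
    freeifaddrs(list);
    return found;
}
```

The struct sockaddr filled in by accept() can be cast to struct sockaddr_in and passed in once its sa_family has been checked to be AF_INET.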
You could send a sequence of magic packets upon connection, which is calculated in a deterministic way. The trick is how to do this in a way that sender and receiver will always calculate the same packet contents if they are from the same instance of the program. A little more information on what your program is would be helpful here, but most likely you can do some sort of hash on a bunch of program state and come up with something fairly unique to that instance of the program.
I assume you mean not just the same program, but the same instance of it running on the same machine.
Do you care about the case where you're connecting back to yourself via the network (perhaps you have two network cards, or a port-forwarding router, or some unusual routing out on the internet somewhere)?
If not, you could check whether the arbitrary address resolves to loopback (127.0.0.1), or any of the other IP addresses you know are you. I'm not a networking expert, so I may have missed some possibilities.
If you do care about that "indirect loopback" case, do some handshaking including a randomly-generated number which the two endpoints share via memory. I don't know whether there are security concerns in your situation: if so bear in mind that this is almost certainly subject to MITM unless you also secure the connection.