listen() ignores the backlog argument?

I have the following problem:
I have sockfd = socket(AF_INET, SOCK_STREAM, 0).
After I set up and bind the socket (say, with serv_addr.sin_port = htons(666)), I immediately do:
listen(sockfd, 3);
sleep(50); // for test purposes
I'm sleeping for 50 seconds to test the backlog argument, which seems to be ignored, because I can establish a connection* more than 3 times on port 666.
*: What I mean is that I get a SYN/ACK for each nth SYN (n > 3) sent from the client, and the connection is placed in the listen queue instead of being dropped. What could be wrong? I've read the man pages of listen(2) and tcp(7) and found:
The behavior of the backlog argument on TCP sockets changed with Linux 2.2. Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests. The maximum length of the queue for incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog. When syncookies are enabled there is no logical maximum length and this setting is ignored. See tcp(7) for more information.
But even with sysctl -w net.ipv4.tcp_max_syn_backlog=2 and sysctl -w net.ipv4.tcp_syncookies=0, I still get the same results! I must be missing something or completely misunderstand the purpose of listen()'s backlog.

The backlog argument to listen() is only advisory.
POSIX says:
The backlog argument provides a hint to the implementation which the implementation shall use to limit the number of outstanding connections in the socket's listen queue.
Current versions of the Linux kernel round it up to the next power of two, with a minimum of 16. The relevant code is in reqsk_queue_alloc().
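For reference, here is a minimal sketch of the test described in the question (error handling omitted): with the rounding above, even listen(sockfd, 3) will complete well over 3 handshakes, which you can watch with tcpdump while the program sleeps.

#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(666);   /* the question's test port */
    bind(sockfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(sockfd, 3);            /* rounded up by the kernel, minimum 16 */
    sleep(50);                    /* never accept(); the queue fills anyway */
    return 0;
}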

Different operating systems queue different numbers of connections for the same backlog value. FreeBSD appears to be one of the few OSes that actually has a 1-to-1 mapping. (source: http://books.google.com/books?id=ptSC4LpwGA0C&lpg=PA108&ots=Kq9FQogkTr&dq=berkeley%20listen%20backlog%20ack&pg=PA108#v=onepage&q=berkeley%20listen%20backlog%20ack&f=false )

Is the new connected socket returned by accept() always bound to the same port as the listening socket?

I tested this on my machine by creating new connections until failure. On my machine, new connect()/accept() requests fail* at around 700 socket connections (SOCK_STREAM), at the client and server respectively, on the loopback address. However, the socket file descriptor returned by accept() has, so far, always been bound to the same port as the listening socket.
My question is: if this behaviour is true for all machines, why does accept() limit connections by creating connected sockets bound only to the same port as the listening socket? Couldn't the number of connections the server can handle be increased greatly if the new sockets were bound to random ports (like connect() does)?
Also, why does accept(sock_fd, NULL, NULL) fail with "EFAULT - The addr argument is not in a writable part of the user address space." after nearly 700 successful iterations of the same call?
Similarly, why does connect() fail with "EFAULT - The socket structure address is outside the user's address space." after nearly 700 successful iterations of the same call?
* EFAULT - Bad Address (after both accept()/connect()).
When you are listening, all connections will have the same port on the accepting end of the connection (that port is what is used as an identifier to establish the connection in the first place).
The local port number on the connecting side, if not fixed with bind(), can be anything. For the loopback device, the numbers can probably be recycled very fast on some OSes, since there is no real need for lingering TCP state.
When it comes to having MANY connections at the same time, the number of connections possible is limited by per-process resources in your operating system. On Unix/Linux this limit can be adjusted, but it is not advisable to raise the number of FDs above the default if you use select(), since the size of libc's fd_set usually matches the default number of file descriptors available per process. One trick around this is to create the socket, fork off children and let the children call accept(); each child can then hold many connections (Apache and Squid use this kind of model), increasing the total number of connections possible on the same server port. A sketch of this pre-fork pattern follows.
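A minimal sketch of that pre-fork pattern (error handling and child reaping omitted; lfd is assumed to be an already-created, bound socket):

#include <sys/socket.h>
#include <unistd.h>

/* Pre-fork model: the parent listens, then forks workers that all
   block in accept() on the same inherited listening socket. */
void prefork_serve(int lfd, int nworkers)
{
    listen(lfd, 128);
    for (int i = 0; i < nworkers; i++) {
        if (fork() == 0) {                  /* child */
            for (;;) {
                int cfd = accept(lfd, NULL, NULL);
                if (cfd == -1)
                    continue;
                /* ... serve the client here ... */
                close(cfd);
            }
        }
    }
    for (;;)
        pause();                            /* parent just waits */
}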
why is accept() limiting connections by creating connected sockets bound only to the same port as the listening socket? Couldn't the number of connections the server can make be increased greatly if the new sockets were bound to random ports (like connect() does)?
It doesn't impose any limitation. A connected TCP socket is identified by four parameters: srcip, srcport, dstip, dstport. So there's absolutely no need to bind an accept()ed socket to a random port; the client end of each connection already makes the 4-tuple unique.
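One way to see this directly is a sketch (lfd is assumed to be an already-listening socket) that prints both ends of an accepted connection with getsockname() and getpeername(): the local port always matches the listener's, while each peer's (IP, port) pair differs.

#include <stdio.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Accept one connection and print its 4-tuple endpoints. */
void show_tuple(int lfd)
{
    struct sockaddr_in local, peer;
    socklen_t llen = sizeof(local), plen = sizeof(peer);
    int cfd = accept(lfd, NULL, NULL);
    if (cfd == -1)
        return;
    getsockname(cfd, (struct sockaddr *)&local, &llen); /* listener's port */
    getpeername(cfd, (struct sockaddr *)&peer, &plen);  /* unique per client */
    printf("local port %d <- peer port %d\n",
           ntohs(local.sin_port), ntohs(peer.sin_port));
    close(cfd);
}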
Also, why is accept(sock_fd, NULL, NULL) failing with "EFAULT - The addr argument is not in a writable part of the user address space." after nearly 700 successful iterations of the same call?
Well, that's about OS internals. The amount of resources available to any user process is (and should be) limited. Read your OS developer manual or similar documentation.

listen() backlog upper limits

Even though a lot has been said on this topic, I am still stumped.
I am experimenting with a monster Linux server capable of handling proper load ramps, presumably many thousands of connections per second. Now, if I check the default listen() queue:
# cat /proc/sys/net/core/somaxconn
128
which can't be the actual queue size at all. I suspect it might be legacy, and the actual size is given by this:
# cat /proc/sys/net/ipv4/tcp_max_syn_backlog
2048
However, man tcp says the latter is the number of connections awaiting an ACK from clients, which is different from the total number of established connections not yet accepted, which is what the listen() backlog is.
So my question is: how can I increase the listen() backlog, and how do I get/set its upper limit (short of recompiling the kernel)?
somaxconn is the number of complete connections waiting.
tcp_max_syn_backlog is the number of incomplete connections waiting.
They aren't the same thing. It's all described in the man page.
You increase it by following these instructions: https://serverfault.com/questions/271380/how-can-i-increase-the-value-of-somaxconn - basically by using sysctl.
And yes, somaxconn is the cap on listen backlog.
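As a quick sanity check, here is a minimal sketch (error handling omitted, port number arbitrary) that asks for a huge backlog; the kernel silently clamps the effective queue length to somaxconn, which is why raising the sysctl has to come first:

#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(8080);
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, 65535);  /* no error: silently capped at somaxconn */
    pause();            /* keep the queue alive for inspection */
    return 0;
}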

Linux Socket: How to detect disconnected network in a client program?

I am debugging a C-based Linux socket program. Following the examples available on various websites, I applied the following structure:
sockfd= socket(AF_INET, SOCK_STREAM, 0);
connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));
send_bytes = send(sockfd, sock_buff, (size_t)buff_bytes, MSG_DONTWAIT);
I can detect the disconnection when the remote server closes its server program. But if I unplug the Ethernet cable, send() still returns positive values rather than -1.
How can I check the network connection in a client program, assuming that I cannot change the server side?
But if I unplug the Ethernet cable, send() still returns positive values rather than -1.
First of all, you should know that send doesn't actually send anything; it's just a memory-copying function/system call. It copies data from your process into the kernel; some time later the kernel will fetch that data and send it to the other side, after packaging it into segments and packets. Therefore send can only return an error if:
The socket is invalid (for example bogus file descriptor)
The connection is clearly invalid, for example it hasn't been established or has already been terminated in some way (FIN, RST, timeout - see below)
There's no more room to copy the data
The main point is that send doesn't send anything and therefore its return code doesn't tell you anything about data actually reaching the other side.
Back to your question: when TCP sends data, it expects a valid acknowledgement within a reasonable amount of time. If it doesn't get one, it resends. How often does it resend? Each TCP stack does things differently, but the norm is exponential backoff: wait 1 second, then 2, then 4, and so on. On some stacks this process can take minutes.
The main point is that in the case of an interruption, TCP will declare a connection dead only after a seriously long period of silence (on Linux it does something like 15 retries, more than 5 minutes).
One way to solve this is to implement an acknowledgement mechanism in your application. You could, for example, send the server a request meaning "reply within 5 seconds or I'll declare this connection dead" and then recv with a timeout, as in the sketch below.
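A minimal sketch of such a check, assuming a made-up "PING\n" request that the server is expected to answer; the protocol is hypothetical, only the timeout mechanics matter:

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Returns 1 if the peer answered within 5 seconds, 0 otherwise. */
int peer_alive(int sockfd)
{
    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };
    setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    if (send(sockfd, "PING\n", 5, 0) == -1)
        return 0;
    char buf[64];
    ssize_t n = recv(sockfd, buf, sizeof(buf), 0);
    /* n == 0: peer closed; n < 0: error, or timeout (EAGAIN). */
    return n > 0;
}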
To detect a remote disconnect, do a read().
Check this thread for more info:
Can read() function on a connected socket return zero bytes?
You can't detect the unplugged Ethernet cable just by calling the write() function. That's because of TCP retransmission, performed by the kernel TCP stack without your program's knowledge.
Here are some solutions.
Even if you have already set the keepalive option on your application socket, you can't detect the socket's dead-connection state in time if your app keeps writing to it. That, again, is because of TCP retransmission by the kernel TCP stack.
tcp_retries1 and tcp_retries2 are kernel parameters for configuring the TCP retransmission timeout.
It's hard to predict the precise retransmission timeout because it is calculated by the RTT mechanism.
You can see this computation in RFC 793 (section 3.7, Data Communication).
https://www.rfc-editor.org/rfc/rfc793.txt
Each platform has kernel configuration parameters for TCP retransmission:
Linux: tcp_retries1, tcp_retries2 (in /proc/sys/net/ipv4)
http://linux.die.net/man/7/tcp
HP-UX: tcp_ip_notify_interval, tcp_ip_abort_interval
http://www.hpuxtips.es/?q=node/53
AIX: rto_low, rto_high, rto_length, rto_limit
http://www-903.ibm.com/kr/event/download/200804_324_swma/socket.pdf
You should set a lower value for tcp_retries2 (default 15) if you want to detect a dead connection earlier, but as said above, the exact timing is not predictable.
In addition, you currently can't set those values for a single socket; they are global kernel parameters. There was an attempt to add a per-socket TCP retransmission option (http://patchwork.ozlabs.org/patch/55236/), but I don't think it was merged into the kernel mainline; I can't find those option definitions in the system header files.
For reference, you can monitor your keepalive socket option with 'netstat --timers', like below.
https://stackoverflow.com/questions/34914278
netstat -c --timers | grep "192.0.0.1:43245 192.0.68.1:49742"
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (1.92/0/0)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (0.71/0/0)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (9.46/0/1)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (8.30/0/1)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (7.14/0/1)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (5.98/0/1)
tcp 0 0 192.0.0.1:43245 192.0.68.1:49742 ESTABLISHED keepalive (4.82/0/1)
In addition, when a keepalive timeout occurs, you may see different events depending on the platform, so you must not decide that the connection is dead from the returned event alone.
For example, HP-UX returns a POLLERR event and AIX returns just a POLLIN event when a keepalive timeout occurs. (You will get an ETIMEDOUT error from the recv() call at that time.)
In recent kernel versions (since 2.6.37), you can use the TCP_USER_TIMEOUT option, which works well. This option can be set on a single socket.
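A minimal sketch of setting it (Linux-specific; the value is in milliseconds):

#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_USER_TIMEOUT, Linux >= 2.6.37 */
#include <sys/socket.h>

/* Declare the connection dead if transmitted data stays unacknowledged
   for timeout_ms milliseconds; affects only this one socket. */
int set_user_timeout(int sockfd, unsigned int timeout_ms)
{
    return setsockopt(sockfd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}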
Finally, you can use recv() with the MSG_PEEK flag to check whether the socket is okay. (MSG_PEEK reads data that has arrived in the kernel's receive buffer without removing it from the queue, so a subsequent read still sees it.)
So you can use this flag to check that the socket is okay without any side effect on the data stream.
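A minimal sketch of such a probe, using a non-blocking peek so the check itself never stalls:

#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Probe socket health without consuming any queued data.
   Returns 1 if alive, 0 if the peer closed or the socket is broken. */
int socket_ok(int sockfd)
{
    char probe;
    ssize_t n = recv(sockfd, &probe, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n > 0)
        return 1;                        /* data is waiting */
    if (n == 0)
        return 0;                        /* orderly close by the peer */
    return errno == EAGAIN || errno == EWOULDBLOCK;  /* alive but idle */
}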
Check errno and see if it's equal to this value:
EPIPE
This socket was connected, but the connection is now broken. In this case, send generates a SIGPIPE signal first; if that signal is ignored or blocked, or if its handler returns, then send fails with EPIPE.
Also install a handler for the SIGPIPE signal (or ignore it) to make this more controllable.

What value of backlog should I use?

I read man 2 listen.
I don't understand the backlog value; the man page says:
The backlog argument defines the maximum length to which the queue of pending connections for sockfd may grow.
Right, but how can I determine the best value?
Thanks
Basically, what the listen() backlog affects is how many incoming connections can queue up if your application isn't accept()ing connections as soon as they come in. It's not particularly important to most applications. The maximum value used by most systems is 128, and passing that is generally safe.
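If the exact value doesn't matter, one common pattern (a sketch, not a rule) is to pass the SOMAXCONN constant from <sys/socket.h>, which tracks the system cap:

#include <sys/socket.h>

/* After socket() and bind(): let the OS use its maximum queue. */
if (listen(sockfd, SOMAXCONN) == -1) {
    /* handle the error */
}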
It's a race between clients pushing connection requests onto the queue and the accepting thread(s) pulling them off. Usually, the threads win. I usually set it to 32, but it's not usually an important parameter.

Making a reliable UDP with the socket function in C

I have a question about socket programming that I could not resolve by reading the man pages.
In C, the declaration of the socket function is int socket(int domain, int type, int protocol);
The Linux man page says that the type decides the communication semantics, while the protocol number decides the protocol to be used.
So my question is: suppose I give SOCK_STREAM as the type parameter, which is reliable, and the protocol number for UDP. Would that give me a reliable UDP, i.e. the same as TCP but without flow control and congestion control?
Unfortunately I can't test this myself, as I only have a single machine, so no packet loss occurs.
Could anyone clear this up? Thanks a lot...
UDP cannot be made reliable at the protocol level. Transmission of the packets is done on a "best effort" basis, and any router/host along the chain is free to drop a packet and NOT inform the sender that it has done so.
That's not to say you can't impose extra semantics on the sending and receiving ends, expecting packets within a certain time frame and saying "hey, we didn't get anything in the last X seconds". But that can't be done at the protocol level; UDP is a "dump it into the outbox and hope it gets there" protocol.
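As an illustration of imposing such semantics in the application, here is a minimal stop-and-wait sketch over a connected UDP socket; the one-byte ACK convention is invented purely for this example:

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Send a datagram, wait up to 1 second for a 1-byte ACK, retry up
   to 5 times. Returns 0 on acknowledged delivery, -1 on give-up. */
int send_reliably(int sockfd, const void *buf, size_t len)
{
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    for (int attempt = 0; attempt < 5; attempt++) {
        char ack;
        send(sockfd, buf, len, 0);          /* may be silently lost */
        if (recv(sockfd, &ack, 1, 0) == 1)
            return 0;                       /* peer acknowledged */
    }
    return -1;
}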
No. For an IPv4 or IPv6 protocol stack, SOCK_STREAM is going to get you TCP, and the type SOCK_DGRAM is going to give you UDP. The protocol parameter is not needed for either of those choices, and the socket library typically expects a value of 0 there.
If you do this:
socket(AF_INET, SOCK_STREAM, IPPROTO_UDP);
socket() will return -1 and set errno to:
EPROTONOSUPPORT
The protocol type or the specified protocol is not supported within this domain.
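A quick sketch to confirm this on a Linux box:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    /* SOCK_STREAM with IPPROTO_UDP is an invalid combination. */
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_UDP);
    if (fd == -1)
        printf("socket: %s\n", strerror(errno)); /* "Protocol not supported" */
    return 0;
}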
