I have a question about the backlog value in the listen() system call. From the listen(2) man page:
If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn, then it is silently truncated to that value; the default value in this file is 128.
Does that mean my server can accept fewer than 128 connections at once? What if I want to accept more than 128 connections? Can I simply set the value to the maximum possible so that I can accept more connections?
That number is only the size of the connection queue, where new connections wait for somebody to accept them. As soon as your application calls accept(), a waiting connection is removed from that queue. So, you can definitely handle more than 128 simultaneous connections because they usually only spend a short time in the queue.
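To see the queue in action, here is a minimal C sketch, assuming a POSIX system; open_listener and connect_then_accept are hypothetical helper names. A few clients finish connect() before the server has called accept() even once, and the server then drains the waiting connections one accept() at a time. It assumes the number of clients stays well below the backlog, otherwise a blocking connect() may stall.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_CLIENTS 64

/* Open a TCP listener on 127.0.0.1 with a kernel-chosen port.
 * Stores the port in *port and returns the listening fd, or -1 on error. */
static int open_listener(int backlog, unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* port 0: let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Connect n clients before anyone calls accept(), then drain the listen
 * queue with accept(). Returns the number of accepted connections. */
static int connect_then_accept(int lfd, unsigned short port, int n)
{
    int clients[MAX_CLIENTS];
    int opened = 0, accepted = 0;
    struct sockaddr_in addr;

    if (n > MAX_CLIENTS)
        n = MAX_CLIENTS;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    /* The kernel completes each handshake on its own; the connection then
     * waits in the listen queue until the application accepts it. */
    for (; opened < n; opened++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            break;
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
            close(fd);
            break;
        }
        clients[opened] = fd;
    }

    /* Each accept() removes one waiting connection from the queue. */
    for (int i = 0; i < opened; i++) {
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0)
            break;
        accepted++;
        close(cfd);
    }

    for (int i = 0; i < opened; i++)
        close(clients[i]);
    return accepted;
}
```

For instance, open_listener(128, &port) followed by connect_then_accept(lfd, port, 5) should accept all five queued connections.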
Yes. Use a command such as
$ echo 1000 >/proc/sys/net/core/somaxconn
to set the limit higher (writing to /proc/sys requires root). See, for instance, this page for more tuning tips.
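The truncation rule quoted from the man page can be sketched in C as follows. read_proc_long and effective_backlog are hypothetical helpers, the /proc path is Linux-specific, and effective_backlog only mirrors the documented rule rather than reproducing kernel code.

```c
#include <stdio.h>

/* Read a single integer from a /proc/sys file such as
 * /proc/sys/net/core/somaxconn. Returns -1 if it cannot be read. */
static long read_proc_long(const char *path)
{
    long value = -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Mirror the man-page rule: a backlog larger than somaxconn is silently
 * truncated to somaxconn; smaller values are used as given. */
static long effective_backlog(long requested, long somaxconn)
{
    return requested > somaxconn ? somaxconn : requested;
}
```

With the default of 128, effective_backlog(1000, 128) yields 128, which is why raising somaxconn is needed before a larger listen() backlog has any effect.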
The backlog value is not the maximum number of connections; it's the number of outstanding connections, i.e. connections which you haven't accept()ed yet.
I tested this on my machine by creating new connections until failure. On my machine, new connect()/accept() requests fail* at around 700 socket connections (SOCK_STREAM), at the client and server respectively, on the loopback IP address. However, the socket file descriptor returned by accept() is, so far, always bound to the same port as the listening socket.
My question is - If this behaviour is true for all machines then why is accept() limiting connections by creating connected sockets bound only to the same port as the listening socket? Couldn't the number of connections the server can make be increased greatly if the new sockets were bound to random ports (like connect() does)?
Also, why is accept(sock_fd, NULL, NULL) failing with "EFAULT - The addr argument is not in a writable part of the user address space." after nearly 700 successful iterations of the same call?
Similarly, why does connect() fail with "EFAULT - The socket structure address is outside the user's address space." after nearly 700 successful iterations of the same call?
* EFAULT - Bad address (for both accept() and connect()).
When you are listening, all connections have the same port on the accepting end (that port is what clients use to identify the server when the connection is first established).
The local port on the connecting side, if not set with bind(), is chosen by the OS and can be any free port. On the loopback device, some operating systems can probably recycle these numbers very quickly, since there is no real need for lingering TCP state.
When it comes to having MANY connections at the same time, the number of possible connections is limited by per-process resources in your operating system, chiefly open file descriptors. On Unix/Linux this limit can be adjusted, but it is not advisable to raise the number of FDs above the default if you use select(), since the libc's FD_SETSIZE usually matches the default number of file descriptors available per process. One trick around this is to create the listening socket, fork children, and let the children call accept(). Each child can then hold many connections (Apache and Squid use this kind of model), increasing the total number of connections possible on the same server port.
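The per-process descriptor limit is RLIMIT_NOFILE. A hedged C sketch, with illustrative helper names: raise_fd_limit lifts the soft limit up to the hard limit (something an unprivileged process is allowed to do), and fd_usable_with_select guards against passing a descriptor numbered FD_SETSIZE or higher to select().

```c
#include <stdbool.h>
#include <sys/resource.h>
#include <sys/select.h>

/* Raise the soft RLIMIT_NOFILE limit to the hard limit.
 * Returns 0 on success, -1 on error. */
static int raise_fd_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;
    /* Some systems reject an unlimited soft descriptor limit, so an
     * RLIM_INFINITY hard limit is left alone in this sketch. */
    if (rl.rlim_max == RLIM_INFINITY || rl.rlim_cur == rl.rlim_max)
        return 0;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &rl);
}

/* select() can only watch descriptors numbered below FD_SETSIZE. */
static bool fd_usable_with_select(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}
```

Raising the hard limit itself needs privileges (or a change to the system's limits configuration), and servers that go past FD_SETSIZE typically switch from select() to poll() or an OS-specific mechanism.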
why is accept() limiting connections by creating connected sockets bound only to the same port as the listening socket? Couldn't the number of connections the server can make be increased greatly if the new sockets were bound to random ports (like connect() does)?
It doesn't impose any limitations. A connected TCP socket's "address" should be viewed as four parameters: srcip, srcport, dstip, dstport. So there's absolutely no need to bind accept()'ed socket to a random port.
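The four-tuple point can be checked directly. In this hedged C sketch (two_accepted, port_of and struct conn_ports are illustrative names), two clients connect to one listening port on 127.0.0.1. getsockname() on each accepted socket reports the listening port, while getpeername() shows a different client port for each, and that difference is what keeps the connections apart.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

struct conn_ports {
    unsigned short local; /* port on the accepting side */
    unsigned short peer;  /* client's port */
};

/* Local (peer == 0) or remote (peer != 0) port of a socket, 0 on error. */
static unsigned short port_of(int fd, int peer)
{
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    int rc = peer ? getpeername(fd, (struct sockaddr *)&a, &len)
                  : getsockname(fd, (struct sockaddr *)&a, &len);
    return rc == 0 ? ntohs(a.sin_port) : 0;
}

/* Accept two loopback connections on one listening port and record the
 * ports of each accepted socket. Returns 0 on success, -1 on error. */
static int two_accepted(struct conn_ports out[2], unsigned short *listen_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int cli[2] = { -1, -1 };
    int rc = -1, lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 16) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto done;
    *listen_port = ntohs(addr.sin_port);

    for (int i = 0; i < 2; i++) {
        cli[i] = socket(AF_INET, SOCK_STREAM, 0);
        if (cli[i] < 0 ||
            connect(cli[i], (struct sockaddr *)&addr, sizeof addr) < 0)
            goto done;
        int afd = accept(lfd, NULL, NULL);
        if (afd < 0)
            goto done;
        out[i].local = port_of(afd, 0); /* same as the listening port */
        out[i].peer = port_of(afd, 1);  /* differs per client */
        close(afd);
    }
    rc = 0;
done:
    for (int i = 0; i < 2; i++)
        if (cli[i] >= 0)
            close(cli[i]);
    close(lfd);
    return rc;
}
```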
Also, why is accept(sock_fd, NULL, NULL) failing with "EFAULT - The addr argument is not in a writable part of the user address space." after nearly 700 successful iterations of the same call?
Well, that's a matter of OS internals. The resources available to any user process may be (and should be) limited. Read your OS developer manual or similar.
Even though a lot has been said on this topic, I am still stumped.
I'm experimenting with a monster Linux server capable of handling proper load ramps, presumably many thousands of connections a second. Now, if I check the default listen() queue:
#cat /proc/sys/net/core/somaxconn
128
which can't be the actual queue size at all. I suspect it might be a legacy setting, and that the actual size is given by this:
#cat /proc/sys/net/ipv4/tcp_max_syn_backlog
2048
However, tcp(7) says the latter is the number of connections awaiting an ACK from clients, which is different from the total number of connections that have not yet been accepted, which is what the listen() backlog is.
So my question is: how can I increase the listen() backlog, and how do I get/set its upper limit (short of recompiling the kernel)?
somaxconn is the number of complete connections waiting.
tcp_max_syn_backlog is the number of incomplete connections waiting.
They aren't the same thing. It's all described in the man page.
You increase it by following these instructions: https://serverfault.com/questions/271380/how-can-i-increase-the-value-of-somaxconn - basically by using sysctl.
And yes, somaxconn is the cap on listen backlog.
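Because the cap is applied silently, listen() does not fail when the requested backlog is too large. A minimal C check of that, assuming a POSIX system (try_listen is a hypothetical helper). SOMAXCONN is the compile-time constant from <sys/socket.h>, whereas on Linux the runtime cap is the somaxconn sysctl, and the two need not agree.

```c
#include <arpa/inet.h>
#include <limits.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a fresh TCP socket to 127.0.0.1 (any port) and call listen() with
 * the given backlog. Returns listen()'s result: 0 on success, -1 on error. */
static int try_listen(int backlog)
{
    struct sockaddr_in addr;
    int rc, fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    rc = bind(fd, (struct sockaddr *)&addr, sizeof addr);
    if (rc == 0)
        rc = listen(fd, backlog); /* oversized values are truncated, not rejected */
    close(fd);
    return rc;
}
```

On systems that truncate this way, try_listen(INT_MAX) returns 0, so asking for a huge backlog effectively means "as much as somaxconn allows".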
I read man 2 listen.
I don't understand what the backlog value is. It says:
The backlog argument defines the maximum length to which the queue of pending connections for sockfd may grow
So how can I decide what the best value is?
Thanks
Basically, what the listen() backlog affects is how many incoming connections can queue up if your application isn't accept()ing connections as soon as they come in. It's not particularly important to most applications. The maximum value used by most systems is 128, and passing that is generally safe.
It's a race between clients trying to connect, which push connections onto the queue, and the accepting thread(s), which pull them off. Usually the threads win. I usually set it to 32, but it's not usually an important parameter.
I have the following problem:
I have sockfd = socket(AF_INET, SOCK_STREAM, 0);
After I set up and bind the socket (say, with sin_port = htons(666) in its struct sockaddr_in), I immediately do:
listen(sockfd, 3);
sleep(50); // for test purposes
I'm sleeping for 50 seconds to test the backlog argument, which seems to be ignored because I can establish a connection* more than 3 times on port 666.
*: What I mean is that I get a SYN/ACK for the Nth SYN (N > 3) sent from the client, i.e. it is placed in the listen queue instead of being dropped. What could be wrong? I've read the man pages of listen(2) and tcp(7) and found:
The behavior of the backlog argument on TCP sockets changed with Linux 2.2. Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests. The maximum length of the queue for incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog. When syncookies are enabled there is no logical maximum length and this setting is ignored. See tcp(7) for more information.
But even with sysctl -w net.ipv4.tcp_max_syn_backlog=2 and sysctl -w net.ipv4.tcp_syncookies=0, I still get the same results! I must be missing something or completely misunderstand the purpose of listen()'s backlog.
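The experiment can be reproduced on loopback with a hedged C sketch (count_completed_connects is a hypothetical helper). It listens with a small backlog, never calls accept(), fires off non-blocking connect() calls, and after a short wait counts the client sockets that report a completed connection. The count depends on the kernel version, its settings and timing, so treat it as an observation rather than a guaranteed function of backlog.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_ATTEMPTS 64

/* Returns how many of `attempts` client connects completed within wait_ms
 * while the server never called accept(), or -1 on setup failure. */
static int count_completed_connects(int backlog, int attempts, int wait_ms)
{
    int clients[MAX_ATTEMPTS];
    int opened = 0, completed = 0;
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    if (attempts > MAX_ATTEMPTS)
        attempts = MAX_ATTEMPTS;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, backlog) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }

    /* Start non-blocking connects; nobody ever calls accept(). */
    for (; opened < attempts; opened++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            break;
        int flags = fcntl(fd, F_GETFL);
        if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
            (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0 &&
             errno != EINPROGRESS)) {
            close(fd);
            break;
        }
        clients[opened] = fd;
    }

    poll(NULL, 0, wait_ms); /* give the handshakes time to finish or stall */

    /* A client socket whose handshake completed is writable with no error. */
    for (int i = 0; i < opened; i++) {
        struct pollfd p = { .fd = clients[i], .events = POLLOUT };
        int err = 0;
        socklen_t elen = sizeof err;
        if (poll(&p, 1, 0) == 1 && (p.revents & POLLOUT) &&
            getsockopt(clients[i], SOL_SOCKET, SO_ERROR, &err, &elen) == 0 &&
            err == 0)
            completed++;
        close(clients[i]);
    }
    close(lfd);
    return completed;
}
```

For example, count_completed_connects(3, 10, 500) mirrors the test above with backlog 3 and ten clients.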
The backlog argument to listen() is only advisory.
POSIX says:
The backlog argument provides a hint to the implementation which the implementation shall use to limit the number of outstanding connections in the socket's listen queue.
Current versions of the Linux kernel round it up to the next highest power of two, with a minimum of 16. The relevant code is in reqsk_queue_alloc().
Different operating systems provide different numbers of queued connections with different backlog numbers. FreeBSD appears to be one of the few OSes that actually has a 1-to-1 mapping. (source: http://books.google.com/books?id=ptSC4LpwGA0C&lpg=PA108&ots=Kq9FQogkTr&dq=berkeley%20listen%20backlog%20ack&pg=PA108#v=onepage&q=berkeley%20listen%20backlog%20ack&f=false )
In standard TCP implementations (say, on BSD), does anybody know whether it's possible to find out how many bytes have been ACKed by the remote host? Calling write() on a socket returns the number of bytes written, but I believe this actually means the number of bytes that fit into the TCP buffer (not the number of bytes written to the network, or the number of bytes ACKed). Or maybe I'm wrong...
thanks!
When you have NODELAY=false (TCP_NODELAY off, which is the default, i.e. Nagle's algorithm is enabled) and you call send() with less than a full segment of data while earlier data is still unacknowledged, the bytes are not actually sent immediately, so you're right. The OS waits a little to see whether you call send() again, so that it can transmit the combined data in one packet and avoid wasting a TCP header.
When NODELAY=true (TCP_NODELAY set), the data is transmitted when you call send(), so you can (theoretically) count on the returned value. But this is not recommended because of the added network inefficiency.
All in all, if you don't need absolute precision, you can use the value returned by send() even when NODELAY=true. The value will not reflect reality immediately, but a few milliseconds later it will (though you should also check for lost connections, since the last data block you sent could have been lost). Once the connection has been gracefully terminated, you can trust that all the data was transmitted. If it wasn't, you'll know before then, either because the connection was abruptly dropped or because you received a data-retention-related error (or some other error).
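In the sockets API the NODELAY flag discussed here is the TCP_NODELAY option, set with setsockopt() at the IPPROTO_TCP level. A minimal C sketch of toggling and reading it (set_nodelay and get_nodelay are illustrative names):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable (on != 0) or disable Nagle's algorithm bypass on a TCP socket.
 * Returns 0 on success, -1 on error. */
static int set_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}

/* Returns 1 if TCP_NODELAY is set, 0 if not, -1 on error. Some systems
 * report a nonzero flag value other than 1, so normalize it. */
static int get_nodelay(int fd)
{
    int on = 0;
    socklen_t len = sizeof on;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, &len) < 0)
        return -1;
    return on != 0;
}
```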
I don't know of any way to get this, and it's probably not useful to you anyway.
Presumably you want to know how much data was received by the host so that, after the connection is lost and re-established, you can resume sending from there. But ACKed data has only been ACKed by the OS! It doesn't indicate what data your program on the other side has received; depending on the size of the TCP receive buffer there, your program could be hundreds of KB behind. If you want to know how much data has been received and 'used' by the program there, have it send application-level ACKs.
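A hedged C sketch of such application-level ACKs. The wire format (an 8-byte big-endian running total sent back by the receiver after it has consumed the data) and all function names are made up for illustration; demo_roundtrip uses a local socketpair() as a stand-in for a TCP connection so the round trip can run in one process.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; returns 0 on success, -1 on error or EOF. */
static int read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Receiver: consume len bytes, then ACK the running total of bytes the
 * application has processed, encoded as 8 big-endian bytes. */
static int receive_and_ack(int fd, size_t len, uint64_t *total)
{
    unsigned char buf[4096], ack[8];
    while (len > 0) {
        size_t chunk = len < sizeof buf ? len : sizeof buf;
        if (read_all(fd, buf, chunk) < 0)
            return -1;
        /* ...the application would process buf here... */
        *total += chunk;
        len -= chunk;
    }
    for (int i = 0; i < 8; i++)
        ack[i] = (unsigned char)(*total >> (56 - 8 * i));
    return write_all(fd, ack, sizeof ack);
}

/* Sender: read one ACK and decode the receiver's byte count. */
static int read_ack(int fd, uint64_t *acked)
{
    unsigned char ack[8];
    if (read_all(fd, ack, sizeof ack) < 0)
        return -1;
    *acked = 0;
    for (int i = 0; i < 8; i++)
        *acked = (*acked << 8) | ack[i];
    return 0;
}

/* Send n bytes, let the peer consume and ACK them, and return the ACKed
 * count (UINT64_MAX on error). */
static uint64_t demo_roundtrip(size_t n)
{
    int sv[2];
    unsigned char data[8192] = {0};
    uint64_t consumed = 0, acked = UINT64_MAX;
    if (n > sizeof data || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return UINT64_MAX;
    if (write_all(sv[0], data, n) == 0 &&
        receive_and_ack(sv[1], n, &consumed) == 0)
        read_ack(sv[0], &acked);
    close(sv[0]);
    close(sv[1]);
    return acked;
}
```

After a reconnect, the sender would resume from the last count it received in an ACK rather than from what write() reported.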
I think you're wrong, although it's one of those places where I'd want to look at the specific implementation before betting serious money on it. Consider, though, the case of a TCP connection that is dropped immediately after the initial handshake. If the number of bytes returned were just the number buffered, it would be possible to have apparently written some bytes that nevertheless remain undelivered; that would violate TCP's guarantee-of-delivery property.
Note, though, that this is only true of TCP; not all protocols within IP provide the same guarantee.
You might have some luck for TCP by using ioctl(fd, TIOCOUTQ, &intval); to get the outgoing queue into intval. This will be the total length kept in the queue, including "written by the app" but not yet sent. It's still the best approximation I can think of at the moment.
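A hedged, Linux-oriented C sketch of that idea (outq_bytes, loopback_pair and wait_for_drain are illustrative names). On Linux, TIOCOUTQ on a connected TCP socket reports bytes written by the application that the peer has not yet acknowledged, so polling it until it reaches zero tells you the peer's TCP stack has ACKed everything. That is still not the same as the remote application having read the data. Other systems may use a different ioctl for this.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bytes still in the socket's send queue, or -1 on error. */
static int outq_bytes(int fd)
{
    int n = 0;
    if (ioctl(fd, TIOCOUTQ, &n) < 0)
        return -1;
    return n;
}

/* Create a connected TCP pair over 127.0.0.1. Returns 0 on success. */
static int loopback_pair(int *client, int *server)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0 ||
        (*client = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        close(lfd);
        return -1;
    }
    if (connect(*client, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        (*server = accept(lfd, NULL, NULL)) < 0) {
        close(*client);
        close(lfd);
        return -1;
    }
    close(lfd);
    return 0;
}

/* Poll the send queue every 10 ms until it is empty or timeout_ms passes.
 * Returns the last observed queue length (0 means everything was ACKed). */
static int wait_for_drain(int fd, int timeout_ms)
{
    int left = outq_bytes(fd);
    for (int waited = 0; left > 0 && waited < timeout_ms; waited += 10) {
        poll(NULL, 0, 10);
        left = outq_bytes(fd);
    }
    return left;
}
```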