I read the man 2 listen.
I don't understand what is the backlog value, it says
The backlog argument defines the maximum length to which the queue of pending connections for sockfd may grow
Right, how can I define what is the best value?
Thanks
Basically, what the listen() backlog affects is how many incoming connections can queue up if your application isn't accept()ing connections as soon as they come in. It's not particularly important to most applications. The maximum value used by most systems is 128, and passing that is generally safe.
It's a fight between clients trying to connect. pushing accept requests onto the queue, and the accept thread/s sucking them off. Usually, the threads win. I usually set at 32, but it's not usually an important parameter.
Related
even though a lot was said on the topic, I am still stumped.
I experiment with a monster linux server capable of handling proper load ramps, presumably many thousand connections a second. Now, if i check default listen() queue:
#cat /proc/sys/net/core/somaxconn
128
which couldn't be actual queue size at all. I suspect it might be a legacy, and actual size is given by this:
#cat /proc/sys/net/ipv4/tcp_max_syn_backlog
2048
However, man tcp says the latter is connections awaiting ACK from clients, which is different from total number of connections having not yet been accepted, which is what listen() backlog is.
So my question is how can I increase listen() backlog, and how to get/set upper limit of it (right before kernel recompilation)?
somaxconn is the number of complete connections waiting.
tcp_max_syn_backlog is the number of incomplete connections waiting.
They aren't the same thing. It's all described in the man page.
You increase it by following these instructions: https://serverfault.com/questions/271380/how-can-i-increase-the-value-of-somaxconn - basically by using sysctl.
And yes, somaxconn is the cap on listen backlog.
This question already has an answer here:
Why is it assumed that send may return with less than requested data transmitted on a blocking socket?
(1 answer)
Closed 9 years ago.
After select returns with write fd set for a tcp socket. If I try to send data on that socket, what is the minimum guaranteed size of data to be sent at once using send api? I understand that I have to run a loop to make sure all the data is sent. Still i want to understand what is the minimum guaranteed data sent and why?
This has come up before. I'm still searching for the referenced answer.
Let's start with the function prototype for send()
ssize_t send(int sockfd, const void *buf, size_t len, int flags);
For blocking TCP sockets - all the documentation will suggest that send() and write() will return a value between [1..len], unless there was an error. However, the reality is that no one I know has ever observed send() returning something other than -1 (error) or just "len" in the success case to indicate all of "buf" was sent in one call. I've never felt good about this, so I code defensively and just put my blocking send calls in a loop until the entire buffer is sent.
For non-blocking TCP sockets - you should just code as if the minimum was "1" (or -1 on error). Don't make any assumptions about a minimum data size.
And for recv(), you should always assume recv() will return some random value between 1..len in the success case, or 0 (closed), or -1 (error). Don't EVER assume recv will return a full buffer.
From the official POSIX reference:
A descriptor shall be considered ready for writing when a call to an output function with O_NONBLOCK clear would not block, whether or not the function would transfer data successfully.
As you see it doesn't actually mentions any size, or that the write will even be successful, just that you will be able to write to the socket without blocking.
So the answer is that there is no minimum guaranteed size.
When select() indicates that a socket is writable it is guaranteed you can transfer at least one byte without incurring EWOULDBLOCK or EAGAIN. Nothing to say you won't incur a different error though :-)
I dont think that on a POSIX layer you have any control over that. send() will accept anything you pass into internal buffers of TCP/IP stack implementation. And what happens next with it it's already another story which you can watch over using same select() call.
If you are asking about size which will fit in one packet that's MTU (but this assumption also should be used with care in regards of fragmentation and reassembling of the packets on the way)
UPD:
I will answer your comment here. No, you shouldn't bother about fragmentation at all. Leave it to TCP/IP stack. There are a lot of reasons why you shouldn't do this.. One as an example. Your application works on Application(7) layer of OSI model (although I consider OSI model in most cases to be an evil thing, it's really applicable for this example). And from this layer you trying to affect functionality/properties of a logic which is on much lower layers (Session/Transport). You shouldn't do this. POSIX calls like send(), recv() are designed to give your application an ability to instruct underneath layers that you need to pass certain amount of data and you have a way to monitor an execution of a command ( select()), that's all you have to do. And lower layers suppose to do their best to deliver data you instruct them to do in most optimal way depending on OS network settings/etc.
UPD2: everything above mostly consider NON_BLOCKING sockets. Sorry forgot to mention this, I don't use blocking sockets in my projects for ages.. in case your socket is blocking I still would consider to pass everything at once and just wait for operation result in another thread for example, because trying to optimise this could lead to very OS/drivers dependant code.
I have the following problem:
I have sockfd = socket(AF_INET, SOCK_STREAM, 0)
After I set up and bind the socket (let's say with sockfd.sin_port = htons(666)), I immediately do:
listen(sockfd, 3);
sleep(50); // for test purposes
I'm sleeping for 50 seconds to test the backlog argument, which seems to be ignored because I can establish a connection* more than 3 times on port 666.
*: What I mean is that I get a syn/ack for each Nth SYN (n>3) sent from the client and placed in the listen queue, instead of being dropped. What could be wrong? I've read the man pages of listen(2) and tcp(7) and found:
The behavior of the backlog argument on TCP sockets changed with Linux 2.2.
Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests. The maximum length of the queue for incomplete sockets can
be set using
/proc/sys/net/ipv4/tcp_max_syn_backlog.
When syncookies are enabled there is
no logical maximum length and this
setting is ignored. See tcp(7) for
more
information.
, but even with sysctl -w sys.net.ipv4.tcp_max_syn_backlog=2 and sysctl -w net.ipv4.tcp_syncookies=0, I still get the same results! I must be missing something or completely missunderstand listen()'s backlog purpose.
The backlog argument to listen() is only advisory.
POSIX says:
The backlog argument provides a hint
to the implementation which the
implementation shall use to limit the
number of outstanding connections in
the socket's listen queue.
Current versions of the Linux kernel round it up to the next highest power of two, with a minimum of 16. The revelant code is in reqsk_queue_alloc().
Different operating systems provide different numbers of queued connections with different backlog numbers. FreeBSD appears to be one of the few OSes that actually has a 1-to-1 mapping. (source: http://books.google.com/books?id=ptSC4LpwGA0C&lpg=PA108&ots=Kq9FQogkTr&dq=berkeley%20listen%20backlog%20ack&pg=PA108#v=onepage&q=berkeley%20listen%20backlog%20ack&f=false )
I have a doubt regarding the backlog value in listen system call. From man page of listen system call.
If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn, then it is silently truncated to that value; the default value in this file is 128.
It means my server can accept only <128 connections at once. What if I want to accept more connection >128 ?? Can I simply set the value to the possible maximum number so that I can access more number of connection ??
That number is only the size of the connection queue, where new connections wait for somebody to accept them. As soon as your application calls accept(), a waiting connection is removed from that queue. So, you can definitely handle more than 128 simultaneous connections because they usually only spend a short time in the queue.
Yes. Use a command such as
$ echo 1000 >/proc/sys/net/core/somaxconn
To set the limit higher. See, for instance, this page for more tuning tips.
The backlog value is not the number of maximum connections, it's the number of outstanding connections, i.e connections which you havn't accept():ed.
In standard tcp implementations (say, on bsd), does anybody know if it's possible to find out how many bytes have been ack-ed by the remote host? Calling write() on a socket returns the number of bytes written, but I believe this actually means the number of bytes that could fit into the tcp buffer (not the number of bytes written to the network, or the number of bytes acked). Or maybe I'm wrong...
thanks!
When you have NODELAY=false (which is the default), when you call send() with less bytes than the TCP window, the bytes are not really sent immediately, so you're right. The OS will wait a little to see if you call another send(), in order to use only one packet to transmit the combined data, and avoid wasting a TCP header.
When NODELAY=true the data is transmitted when you call send(), so you can (theoretically) count on the returned value. But this is not recommended due to the added network inefficiency.
All in all, if you don't need absolute precision, you can use the value returned by send() even when NODELAY=true. The value will not reflect immediate reality, but some miliseconds later it will (but also check for lost connections, since the last data block you sent could have been lost). Once the connection is gracefully terminated, you can trust all the data was transmitted. If it wasn't, you'll know before - either because the connection was abruptly dropped or because you received a data retention related error (or any other).
I don't know of any way to get this and its probably not useful to you anyway.
Assuming you want to know how much data was received by the host so that after connection lost and re-connection you can start sending from there again. So, the ACK'd data has only been ACK'd by the OS! It doesn't indicate what data has been received by your program on the other side; depending on the size of the TCP receive buffer there, your program could be hundreds of KB behind. If you want to know how much data has been received and 'used' by the program there, then get it to send application-level ACKs
I think you're wrong, although its one of those places where I'd want to look at the specific implementation before I would bet serious money on it. Consider, though, the case of a TCP connection where the connection is dropped immediately after the original handshake. If the number-of-bytes returned were just the number of buffered, it would be possible to apparently have written a number of bytes, but have them remain undelivered; that would violate TCP's guarantee-of-delivery property.
Note, though, that this is only true of TCP; not all protocols within IP provide the same guarantee.
You might have some luck for TCP by using ioctl(fd, TIOCOUTQ, &intval); to get the outgoing queue into intval. This will be the total length kept in the queue, including "written by the app" but not yet sent. It's still the best approximation I can think of at the moment.