Linux read() minimum data amount - c

When using the read() syscall on Linux to read from any source (file, socket, pipe), is there a minimum amount of data that can be returned (in blocking mode)? Or can the syscall return as little as 1 byte?
When I want to read a single int (4 or 8 bytes) from a pipe, do I still need to check the return value of read() to see whether I received fewer than sizeof(int) bytes?

There is no minimum, except on block-mode devices, where the minimum is the block size.
You should always check the return value: things can break, so plan for breakage and handle short reads and errors appropriately instead of assuming the other side is always perfect.
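For the pipe case in the question, a minimal sketch of a loop that retries until all sizeof(int) bytes have arrived (the helper name read_int and its error convention are illustrative, not part of the answer above):

#include <errno.h>
#include <unistd.h>

/* Read exactly sizeof(*value) bytes from fd, handling short reads
 * and interruption by signals. Returns 0 on success, -1 on error
 * or premature EOF. */
int read_int(int fd, int *value)
{
    char *p = (char *)value;
    size_t remaining = sizeof(*value);

    while (remaining > 0) {
        ssize_t n = read(fd, p, remaining);
        if (n > 0) {
            p += n;
            remaining -= (size_t)n;
        } else if (n == 0) {
            return -1;              /* EOF before a full int arrived */
        } else if (errno != EINTR) {
            return -1;              /* real error; errno is set */
        }
        /* n == -1 && errno == EINTR: interrupted before any data, retry */
    }
    return 0;
}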

Related

What happens when you call read() with length that is too large?

What happens when you call read() (or recv()) on an open socket and you specify a length that is more than the number of bytes ready to be read in the buffer (TCP), or the length of the next datagram (UDP)?
In both cases, if the size of the buffer is larger than the amount of available data, whatever data is available is read, and the number of bytes actually read is returned from the function. That return value is what you should use when operating on the data.
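As a sketch of that, using recv() on a TCP socket (handle_data is a hypothetical placeholder for whatever your program does with the bytes):

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

void handle_data(const char *data, size_t len);   /* hypothetical handler */

void drain_once(int sock)
{
    char buf[4096];
    ssize_t n = recv(sock, buf, sizeof(buf), 0);

    if (n > 0)
        handle_data(buf, (size_t)n);   /* only the first n bytes are valid */
    else if (n == 0)
        puts("peer performed an orderly shutdown");   /* TCP only */
    else
        perror("recv");
}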

Can I free my data immediately after send() in blocking mode?

The docs for send() say:
When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in non-blocking I/O mode. In non-blocking mode it would return EAGAIN in this case. The select(2) call may be used to determine when it is possible to send more data.
I am in blocking mode, doing something along the lines of:
buf = malloc(size);
send(socket, buf, size, 0);
free(buf);
Assume buf is very large, larger than the send buffer can hold at a time (so it would need to go into the buffer as two chunks, let's say). Anyway, in blocking mode, which I'm in, after send() returns, can I feel safe that the data has been fully copied or otherwise dealt with, and is thus deletable?
In blocking mode, send blocks until the I/O is complete or an error is triggered. You should check the returned value, because a send operation does not guarantee that the number of bytes sent equals the number passed as the third argument.
Only when send returns a value equal to the size of the buffer can you be sure that the whole block has been copied into kernel memory, passed through device memory, or sent to the destination.
The short answer is: Yes, you can free the buffer after the send() call successfully returns (without errors) when the file descriptor is in blocking mode.
The reason for this lies in the blocking concept itself: the send() call (on a blocking file descriptor) only returns when an error occurs or when all size bytes of buf have been buffered or transmitted by the underlying layer of the operating system (typically the kernel).
Also note that a successful return of send() doesn't mean that the data was transmitted. It means that it was, at least, buffered by the underlying layer.
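If you want a hard guarantee before calling free(), a common pattern is a small wrapper that loops until everything has been handed to the kernel; a minimal sketch (the name send_all is illustrative, assuming a blocking TCP socket):

#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Keep calling send() until every byte has been accepted by the kernel.
 * Returns 0 on success, -1 on error (errno set). */
int send_all(int sock, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0) {
        ssize_t n = send(sock, p, len, 0);
        if (n >= 0) {
            p += n;
            len -= (size_t)n;
        } else if (errno != EINTR) {
            return -1;              /* real error; errno is set */
        }
    }
    return 0;
}

Once send_all() returns 0, the kernel holds its own copy of every byte, so freeing buf immediately afterwards is safe.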

How many bytes should I read/write to a socket?

I'm having some doubts about the number of bytes I should write/read through a socket in C on Unix. I'm used to sending 1024 bytes, but this is really too much sometimes when I send short strings.
I read a string from a file, and I don't know how many bytes it is; it can vary every time, say 10, 20 or 1000 bytes. I only know for sure that it's < 1024. So when I write the code, I don't know how many bytes to read on the client side (on the server I can use strlen()). So is the only solution to always read the maximum number of bytes (1024 in this case), regardless of the length of the string I read from the file?
For instance, with this code:
read(socket, stringBuff, SIZE);
wouldn't it be better if SIZE is 10 instead of 1024 if I want to read a 10 byte string?
In the code in your question, if there are only 10 bytes to be read, then it makes no difference whether SIZE is 10 bytes, 1,024 bytes, or 1,000,024 bytes - it'll still just read 10 bytes. The only difference is how much memory you set aside for it, and if it's possible for you to receive a string up to 1,024 bytes, then you're going to have to set aside that much memory anyway.
However, regardless of how many bytes you are trying to read in, you always have to be prepared for the possibility that read() will actually read a different number of them. Particularly on a network, when you can get delays in transmission, even if your server is sending a 1,024 byte string, less than that number of bytes may have arrived by the time your client calls read(), in which case you'll read less than 1,024.
So, you always have to be prepared for the need to get your input in more than one read() call. This means you need to be able to tell when you're done reading input - you can't rely alone on the fact that read() has returned to tell you that you're done. If your server might send more than one message before you've read the first one, then you obviously can't hope to rely on this.
You have three main options:
Always send messages which are the same size, perhaps padding smaller strings with zeros if necessary. This is usually suboptimal for a TCP stream. Just read until you've received exactly this number of bytes.
Have some kind of sentinel mechanism for telling you when a message is over. This might be a newline character, a CRLF, a blank line, or a single dot on a line followed by a blank line, or whatever works for your protocol. Keep reading until you have received this sentinel. To avoid making inefficient system calls of one character at a time, you need to implement some kind of buffering mechanism to make this work well. If you can be sure that your server is sending you lines terminated with a single '\n' character, then using fdopen() and the standard C I/O library may be an option.
Have your server tell you how big the message is (either in an initial fixed-length field, or using the same kind of sentinel mechanism from point 2), and then keep reading until you've got that number of bytes; see the sketch below.
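As an illustration of the third option, here is a minimal sketch of reading one length-prefixed message. The 4-byte big-endian header and the helper names read_exact and recv_message are assumptions for the example, not a standard protocol:

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>   /* ntohl() */

/* Read exactly len bytes, looping over short reads.
 * Returns 0 on success, -1 on error or premature EOF. */
static int read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n > 0) {
            p += n;
            len -= (size_t)n;
        } else if (n == 0) {
            return -1;              /* EOF in the middle of a message */
        } else if (errno != EINTR) {
            return -1;              /* real error */
        }
    }
    return 0;
}

/* Receive one message framed as a 4-byte big-endian length prefix
 * followed by the payload. Caller frees the result; NULL on error.
 * A real protocol would also sanity-check the decoded length. */
char *recv_message(int fd, uint32_t *out_len)
{
    uint32_t netlen;

    if (read_exact(fd, &netlen, sizeof(netlen)) < 0)
        return NULL;

    *out_len = ntohl(netlen);
    char *msg = malloc(*out_len);
    if (msg == NULL || read_exact(fd, msg, *out_len) < 0) {
        free(msg);
        return NULL;
    }
    return msg;
}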
The read() system call blocks until it can read one or more bytes, or until an error occurs.
It DOESN'T guarantee that it will read the number of bytes you request! With TCP sockets, it's very common that read() returns less than you request, because it can't return bytes that are still propagating through the network.
So, you'll have to check the return value of read() and call it again to get more data if you didn't get everything you wanted, and again, and again, until you have everything.

What are the conditions under which a short read/write can occur?

The read and write functions (and relatives like send, recv, readv, ...) can return a number of bytes less than the requested read/write length if interrupted by a signal (under certain circumstances), and perhaps in other cases too. Is there a well-defined set of conditions for when this can happen, or is it largely up to the implementation? Here are some particular questions I'm interested in the answers to:
If a signal handler is non-interrupting (SA_RESTART), I/O operations interrupted before any data is transferred will be restarted after the signal handler returns. But if a partial read/write has already occurred and the signal handler is non-interrupting, will the syscall return immediately with the partial length, or will it resume and attempt to read/write the remainder?
Obviously read functions can return short reads on network, pipe, and terminal file descriptors when less data than the requested amount is available. But can write functions return short writes in these cases due to limited buffer size, or will they block until all the data can be written?
I'd be interested in all three of standards-required, common, and Linux-specific behavior.
For your second question: write() can return a short write due to a limited buffer size if the descriptor is non-blocking.
There's at least one standard condition that can cause write on a regular file to return a short size:
If a write() requests that more bytes be written than there is room for (for example, [XSI] the file size limit of the process or the physical end of a medium), only as many bytes as there is room for shall be written. For example, suppose there is space for 20 bytes more in a file before reaching a limit. A write of 512 bytes will return 20. The next write of a non-zero number of bytes would give a failure return (except as noted below).
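As a Linux-specific illustration of the non-blocking case, this sketch fills a pipe until write() first goes short and then fails with EAGAIN. The 64 KiB figure assumes the default pipe capacity on modern Linux; exact numbers vary:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static char chunk[100000];   /* bigger than the default 64 KiB pipe buffer */

int main(void)
{
    int fds[2];

    if (pipe(fds) < 0)
        return 1;
    fcntl(fds[1], F_SETFL, O_NONBLOCK);

    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof(chunk));
        if (n < 0) {
            if (errno == EAGAIN)
                puts("pipe full: EAGAIN");   /* nothing could be written */
            break;
        }
        if ((size_t)n < sizeof(chunk))
            printf("short write: %zd of %zu bytes\n", n, sizeof(chunk));
    }
    return 0;
}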

With sendfile(), is it possible to tell when in_fd is at EOF?

Reading through the man page of the Linux system call sendfile, I am wondering whether it is possible for the calling program to know when in_fd is at EOF. Presumably, this could be signaled by a return value of 0, but this leads to the question of what a return value of 0 actually means. If sendfile is like write, then a return value of 0 would just mean that 0 bytes were copied. But, if sendfile is like read, then a return value of 0 would mean EOF. Must one know in advance how many bytes that are to be copied from in_fd to out_fd in order to use sendfile? What does it mean when sendfile returns 0?
I don't think there is any direct way to know that, but it shouldn't really matter. Normally you would find the input file size via stat()/fstat() and use that to count out your transfer. The socket end isn't going to matter to you.
The only situation where this should be problematic is if you want to transfer a file that is growing or shrinking. Given that the input file has to be mmap-ed, and the bad things that can happen (without some clever code) with mmap in those situations, you should probably just not employ sendfile for those cases.
You can use the offset parameter to keep track of how much has been read.
According to the man page:
If offset is not NULL, then it points to a variable holding the file offset from which sendfile() will start reading data from in_fd. When sendfile() returns, this variable will be set to the offset of the byte following the last byte that was read. If offset is not NULL, then sendfile() does not modify the current file offset of in_fd; otherwise the current file offset is adjusted to reflect the number of bytes read from in_fd.
count is the number of bytes to copy between the file descriptors.
RETURN VALUE
If the transfer was successful, the number of bytes written to out_fd is returned. On error, -1 is returned, and errno is set appropriately.
And yes, that means a return value of 0 means no data was copied to the output socket.
You can assume EOF has been reached when the number of bytes sent is 0:
sent = sendfile(out_fd, in_fd, &offset, nbytes);
if (sent == 0) {
    // EOF
    ...
}
This assumption also holds for non-blocking sockets.
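Putting that together, a minimal sketch of draining a file to a socket until sendfile() returns 0 (blocking descriptors assumed; a non-blocking out_fd would also need EAGAIN handling):

#include <stdio.h>
#include <sys/types.h>
#include <sys/sendfile.h>

/* Copy all of in_fd to out_fd with sendfile(), treating a return
 * of 0 as EOF. Returns 0 on success, -1 on error. */
int send_whole_file(int out_fd, int in_fd)
{
    off_t offset = 0;

    for (;;) {
        ssize_t sent = sendfile(out_fd, in_fd, &offset, 1 << 16);
        if (sent < 0) {
            perror("sendfile");
            return -1;
        }
        if (sent == 0)
            break;      /* nothing left to read from in_fd: EOF */
    }
    return 0;
}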
In my case, a file was being truncated by rsync while the app was using sendfile() to transmit it at the same time. Under that condition the app ate 100% CPU. I fixed my code following the article below, and the problem disappeared.
http://www.linuxjournal.com/article/6345
The key point is to use F_SETLEASE to take a file lease for your app.
