C Unix socket programming, ensuring read/write byte counts? - c

I'm writing client and server programs and I'm looking for a way to ensure all bytes are read and all bytes are sent when using read() or write() to/from the sockets opened by the client/server.
I'm assuming I'll have to use a loop to check the number of bytes returned from the read or write functions.
Something like this probably:
#define BUFFER 20
char buffer[BUFFER];
while (/* not all bytes have been read */) {
    int bytesRead = read(theSocket, buffer, BUFFER);
}
And how would I ensure that all the bytes I am trying to transmit using write() have been transmitted?
Thanks for any help!

Yes, exactly like that. Typical read logic goes like this:
1) Call read.
2) Did we get EOF or error? If so, return.
3) Did we receive all the bytes? If so, return.
4) Go to step 1.
Note that when you call read, you'll need to pass it a pointer to the position in the buffer just past the data that was already read, and you'll need to request no more bytes than still fit, so the buffer cannot overflow. Also, how you tell whether you have received all the bytes depends on the protocol.
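The read steps above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the name read_all is hypothetical, and it assumes a blocking descriptor and a protocol in which the receiver already knows how many bytes to expect.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try to read exactly `len` bytes from `fd` into `buf`.
 * Returns the number of bytes read: `len` on success, fewer if EOF arrived
 * first, or -1 on error (errno is left as set by read). */
ssize_t read_all(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    char *p = buf;
    size_t got = 0;

    while (got < len) {
        /* Point past the data already received; ask only for what is left. */
        ssize_t n = read(fd, p + got, len - got);
        if (n == 0)
            break;              /* EOF: the peer closed before sending it all */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error */
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

A caller would compare the return value with len: a smaller non-negative value means the stream ended early.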
To write:
1) Call write, passing it a pointer to the first unwritten byte and the number of unwritten bytes.
2) Did we get zero or error? If so, return.
3) Did we write all the bytes? If so, return.
4) Go to step 1.
Note that you have to adjust appropriately for blocking or non-blocking sockets. For example, for non-blocking sockets, you have to handle EWOULDBLOCK.
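The write steps can be sketched the same way. The name write_all is hypothetical, and the sketch assumes a blocking descriptor. On a non-blocking one it would return -1 with errno set to EAGAIN or EWOULDBLOCK, and a real version would also need to report how far it got so the caller can resume later.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes of `buf` to `fd`.
 * Returns `len` on success, or -1 on error or if no progress can be made. */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    const char *p = buf;
    size_t sent = 0;

    while (sent < len) {
        /* Start at the first unwritten byte; offer only the unwritten rest. */
        ssize_t n = write(fd, p + sent, len - sent);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error (or EAGAIN when non-blocking) */
        }
        if (n == 0)
            return -1;          /* zero progress: give up, as in step 2 */
        sent += (size_t)n;
    }
    return (ssize_t)sent;
}
```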

Related

The magic of STREAMS in Linux. When to finish?

Today at 5am I read an article about the read system call, and things became significantly clearer for me.
ssize_t read(int fd, void *buf, size_t count);
The design of *nix-like operating systems is amazing in its simplicity. There is a file interface for any entity: you just ask for some data from the fd to be written into the memory that buf points to. It works the same way for network connections, files, and streams.
But some questions arise.
How do I distinguish these two cases?
1) The stream is empty and I need to wait for new data. 2) The stream is closed and I need to exit the program.
Here is a scenario:
Data is read from STDIN in a loop, and STDIN is redirected from a pipe.
Some text data appears.
Do I just read byte by byte until EOF shows up in memory, or until the read call returns 0?
How will the program know whether to wait for new input or to exit?
This is unclear in the case of endless or continuous streams.
UPD After speaking with #Bailey Kocin and reading some docs, I have this understanding. Correct me if I'm wrong.
read blocks program execution and waits for count bytes.
When count bytes are available, read writes them into buf and execution continues.
When the stream is closed, read returns 0, which signals that the program may finish.
Question: Does EOF appear in buf?
UPD2 EOF is a constant that the getc function can return:
while (ch != EOF) {
/* display contents of file on screen */
putchar(ch);
ch = getc(fp);
}
But in the case of read, the EOF value does not appear in buf. The read system call signals the end of the file by returning 0, rather than delivering an EOF constant along with the data as getc does.
EOF is a constant whose value can vary between systems, and it is used with getc.
Let's deal first with your original question. Note that man 7 pipe should give some useful information on this.
Say we have the standard input redirected to the input side of a descriptor created by a pipe call, as in:
pipe(p);
// ... fork a child to write to the output side of the pipe ...
dup2(p[0], 0); // redirect standard input to input side
and we call:
bytes = read(0, buf, 100);
First, note that this behaves no differently than simply reading directly from p[0], so we could have just done:
pipe(p);
// fork child
bytes = read(p[0], buf, 100);
Then, there are essentially three cases:
1) If there are bytes in the pipe (i.e., at least one byte has been written but not yet read), then the read call will return immediately, and it will return all bytes available up to a maximum of 100 bytes. The return value will be the number of bytes read, and it will always be a positive number between 1 and 100.
2) If the pipe is empty (no bytes) and the output side has been closed, the buffer won't be touched, and the call will return immediately with return value of 0.
3) Otherwise, the read call will block until something is written to the pipe or the output side is closed, and then the read call will return immediately using the rules in cases 1 and 2.
So, if a read() call returns 0, that means the end-of-file was reached, and no more bytes are expected. Waiting for additional data happens automatically, and after the wait, you'll either get data (positive return value) or an end-of-file signal (zero return value). In the special case that another process writes some bytes and then immediately closes (the output side of) the pipe, the next read() call will return a positive value up to the specified count. Subsequent read() calls will continue to return positive values as long as there's more data to read. When the data are exhausted, the read() call will return 0 (since the pipe is closed).
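The two non-blocking cases can be observed inside a single process, without forking. The sketch below is only an illustration under Linux pipe semantics. The function name pipe_read_cases and the '#' sentinel are hypothetical, and the blocking case is left out because it needs a second process or thread to do the writing.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: run two reads against a fresh pipe.
 * results[0]: return value of read when 5 bytes are waiting (100 requested)
 * results[1]: return value of read when the pipe is empty and the write
 *             side has been closed
 * *first_byte_after_eof: buf[0] after the second read, to show that the
 *             buffer was left alone.
 * Returns 0 on success, -1 if pipe() or write() failed. */
int pipe_read_cases(ssize_t results[2], char *first_byte_after_eof)
{
    assert(results != NULL && first_byte_after_eof != NULL);
    int p[2];
    char buf[100];

    if (pipe(p) == -1)
        return -1;
    if (write(p[1], "hello", 5) != 5) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* Data is available: read returns at once with what is there. */
    results[0] = read(p[0], buf, sizeof buf);

    /* Pipe empty and write side closed: read returns 0 immediately. */
    close(p[1]);
    buf[0] = '#';               /* sentinel, expected to survive the read */
    results[1] = read(p[0], buf, sizeof buf);
    *first_byte_after_eof = buf[0];

    close(p[0]);
    return 0;
}
```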
On Linux, the above is always true for pipes and any positive count. There can be differences for things other than pipes. Also, if the count is 0, the read() call will always return immediately with return value 0. Note that, if you are trying to write code that runs on platforms other than Linux, you may have to be more careful. An implementation is allowed to return a non-zero number of bytes less than the number requested, even if more bytes are available in the pipe -- this might mean that there's an implementation-defined limit (so you never get more than 4096 bytes, no matter how many you request, for example) or that this implementation-defined limit changes from call to call (so if you request bytes over a page boundary in a kernel buffer, you only get the end of the page or something). On Linux, there's no limit -- the read call will always return everything available up to count, no matter how big count is.
Anyway, the idea is that something like the following code should reliably read all bytes from a pipe until the output side is closed, even on platforms other than Linux:
#define _GNU_SOURCE 1
#include <errno.h>
#include <unistd.h>
/* ... */
while ((count = TEMP_FAILURE_RETRY(read(fd, buffer, sizeof(buffer)))) > 0) {
    // process "count" bytes in "buffer"
}
if (count == -1) {
    // handle error
}
// otherwise, end of data reached
If the pipe is never closed ("endless" or "continuous" stream), the while loop will run forever because read will block until it can return a non-zero byte count.
Note that the pipe can also be put into a non-blocking mode which changes the behavior substantially, but the above is the default blocking mode behavior.
With respect to your UPD questions:
Yes, read holds the program execution until data is available, but no, it doesn't necessarily wait for count bytes. It will wait for at least one non-empty write to the pipe, and that will wake the process; when the process gets a chance to run, read will return whatever's available, up to but not necessarily equal to count bytes. Usually, this means that if another process writes 5 bytes, a blocked read(fd, buffer, 100) call will return 5 and execution will continue.
Yes, if read returns 0, it's a signal that there's no more data to be read and the write side of the pipe has been closed (so no more data will ever be available).
No, an EOF value does not appear in the buffer. Only bytes read will appear there, and the buffer won't be touched when read() returns 0, so it'll contain whatever was there before the read() call.
With respect to your UPD2 comment:
Yes, on Linux, EOF is a constant equal to the integer -1. (Technically, according to the C99 standard, it is an integer constant equal to a negative value; maybe someone knows of a platform where it's something other than -1.) This constant is not used by the read() interface, and it is certainly not written into the buffer. While read() returns -1 in case of error, it would be considered bad practice to compare the return value from read() with EOF instead of -1. As you note, the EOF value is really only used for C library functions like getc() and getchar() to distinguish the end of file from a successfully read character.
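To illustrate where EOF does show up, here is a small sketch that wraps a descriptor in a stdio stream and counts bytes with getc. The function name count_bytes_with_getc is hypothetical. Note that the result of getc is stored in an int, which is what keeps EOF distinguishable from a data byte with value 0xFF.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: count the bytes readable from `fd` using stdio.
 * Takes ownership of `fd` (it is closed before returning).
 * Returns the byte count, or -1 if the stream could not be created. */
long count_bytes_with_getc(int fd)
{
    assert(fd >= 0);
    FILE *fp = fdopen(fd, "r");
    if (fp == NULL) {
        close(fd);
        return -1;
    }

    long n = 0;
    int ch;                     /* int, not char: EOF must stay distinct from 0xFF */
    while ((ch = getc(fp)) != EOF)
        n++;                    /* EOF is a return value here, never a stored byte */

    fclose(fp);                 /* also closes fd */
    return n;
}
```

Underneath, the stdio layer makes read() calls and turns the final return value of 0 into the EOF result that getc hands back.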

How does the statements inside this IF statement work?

I recently started learning about inter-process communication, and this piece of code was written within the parent process's code section. From what I have read about write(), it returns -1 if it failed, 0 if nothing was written to the pipe, and a positive integer if successful. How exactly does sizeof(value) help us identify this? Isn't if(write(request[WRITE],&value,sizeof(value)) < 1) a much more readable alternative to comparing against sizeof(value)?
if(sizeof(value)!=write(request[WRITE],&value,sizeof(value)))
{
perror("Cannot write thru pipe.\n");
return 1;
}
Code clarification: the variable value holds a digit entered in the parent process, which the parent then sends through a pipe to the child process so that the child can do some arithmetic operation on it.
Any help or clarification on the subject is very much appreciated.
Edit: How do I highlight my system functions here when asking questions?
Comparing against sizeof(value) also catches a successful but partial write, which the application wants to treat as a failure.
It's slightly easier to read without the pointless parentheses:
if(write(request[WRITE], &value, sizeof value) != sizeof value)
So, for instance, if value is an int, it might occupy 4 bytes, but if the write() just writes 2 of those it will return 2 which is captured by this test.
At least in my opinion. Remember that sizeof is not a function.
That's not a read, that's a write. The principle is almost the same, but there's a bit of a twist.
As a general rule you are correct: write() could return a "short count", indicating a partial write. For instance, you might ask to write 2000 bytes to some file descriptor, and write might return a value like 1024 instead, indicating that 976 (2000 - 1024) bytes were not written but no actual error occurred. (This occurs, for instance, when receiving a signal while writing on a "slow" device like a tty or pty.) Of course, the application must decide what to do about the partial write: should it consider this an error? Should it retry the remaining bytes? It's pretty common to wrap the write in a loop that retries the remaining bytes in case of short counts; the stdio fwrite code does this, for instance.
With pipes, however, there's a special case: writes of sufficiently small size (less than or equal to PIPE_BUF) are atomic. So assuming sizeof(value) <= PIPE_BUF, and that this really is writing on a pipe, this code is correct: write will return either sizeof(value) or -1.
(If sizeof(value) is 1, the code is correct—albeit misleading—for any descriptor: write never returns zero. The only possible return values are -1 and some positive value between 1 and the number of bytes requested-to-write, inclusive. This is where read and write are not symmetric with respect to return values: read can, and does, return zero.)
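As a sketch of the check being discussed, the helper below sends one int and treats anything other than a full-size write as failure. The name send_value is hypothetical, and the single-call approach leans on the assumption stated above: the descriptor is a pipe and sizeof(int) is far below PIPE_BUF. For other descriptors a retry loop would be the safer choice.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send one int through a pipe.
 * Returns 0 if all sizeof(int) bytes were written, -1 otherwise
 * (either write failed or, on a non-pipe, the write was short). */
int send_value(int fd, int value)
{
    assert(sizeof value > 0);
    ssize_t n = write(fd, &value, sizeof value);

    /* One comparison covers both -1 (error) and a partial write. */
    if (n != (ssize_t)sizeof value)
        return -1;
    return 0;
}
```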

recv - filling always the first bytes

recv (sh , buff , 5000, 0 ).
Let's assume that my buff is allocated with malloc for x bytes. How can I always write the received bytes from the beginning? I mean, I wish to start at buff and not at buff+x.
recv always starts writing the received data at the address given (buff).
To make your code robust, you should read in a loop until all data has arrived (this implies that subsequent calls should write to buff plus the number of bytes already received).
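A minimal sketch of such a loop follows. The name recv_exact is hypothetical, and it assumes a blocking stream socket and a caller who knows how many bytes the message contains.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: receive exactly `want` bytes into `buf`.
 * The first recv writes at buf; each later one writes at buf + received.
 * Returns the bytes received (fewer than `want` if the peer closed early),
 * or -1 on error. */
ssize_t recv_exact(int sock, void *buf, size_t want)
{
    assert(buf != NULL);
    char *p = buf;
    size_t received = 0;

    while (received < want) {
        ssize_t n = recv(sock, p + received, want - received, 0);
        if (n == 0)
            break;              /* orderly shutdown by the peer */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        received += (size_t)n;
    }
    return (ssize_t)received;
}
```

Each new message would start a fresh call with the plain buf pointer, so the data lands at the beginning of the buffer again.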
See also Handling partial return from recv() TCP in C

Read inside an if statement and buffer type

I just want to know how the following code works. I am trying to open a file and read it using the read() function. However, I also want to check whether the read is successful.
Does the code below execute read() twice? By that I mean, does the buffer have 1024 bytes of data in it after the code executes? Or does it have only 512 bytes because only the first read executes, and the read inside the if statement only checks the value without really writing into the buffer? Also, I'm reading raw bits into the buffer. Am I using the right buffer type? I'm sorry if I'm not being clear. Thanks in advance!
//Read the file
void *buffer;
read(fd,buffer,512);
if (read(fd,buffer,512) < 0){
printf("Error: Read was unsuccessful \n");
}
else{
printf("Read is successful \n" );
}
Yes, read(fd,buffer,512) is called twice. If you wish to use the return value more than once, store it in a variable – calling the same function again, even with the same arguments, is distinct from the previous call. In general, functions in C can have side-effects so multiple calls to the same function with the same arguments can not be optimized away (e.g., the side-effect of read is reading data from the descriptor into the buffer, and indeed calling read is done more for the fact that it reads than for sake of the value it returns).
Meanwhile if your code is complete as shown, the use of buffer is invalid since it's uninitialized.
Assuming buffer does point to an actual buffer capable of holding the data, a maximum of 512 bytes would be stored there by read. Since you don't change where buffer points to between the calls to read, the same buffer would be overwritten by the second read. (Note that there is no guarantee that all 512 bytes requested, or indeed any bytes, will actually be read - the return value of read tells you how many bytes were actually read.)
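One way to apply this advice is to call read once, keep its return value in a variable, and test that variable. In the sketch below the name read_block is hypothetical, and the caller is assumed to pass real storage such as a char buffer[512] array.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `size` bytes into caller-provided storage.
 * read is called exactly once; the result is kept so it can be both
 * checked and used. Returns what read returned. */
ssize_t read_block(int fd, char *buf, size_t size)
{
    assert(buf != NULL);
    ssize_t n = read(fd, buf, size);

    if (n < 0)
        perror("read");         /* error: errno says why */
    /* n == 0 means end of file; n > 0 is the number of bytes now in buf,
     * which may be fewer than `size`. */
    return n;
}
```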
Yes. The read() function gets executed twice. When the condition in the if statement is checked,
if (read(fd,buffer,512) < 0){
...
read(fd,buffer,512) reads data again, ignoring the result of the previous call.
does the buffer have 1024 bytes of data inside it after the code executes
No. Second call to read will overwrite the data inside the buffer.
One more thing: your program's behavior is undefined, because you are using buffer without initializing it. You can use the malloc function to allocate memory for your buffer:
buffer = malloc(512);
You use the buffer pointer without initializing it, which is undefined behaviour. read() executes twice, so up to 1024 bytes are consumed from the file, but because both calls write to the same address, the buffer holds at most the 512 bytes from the second call. It's also possible that read() reads fewer than 512 bytes.

Does a "UDP Client For the TIME Service" need to check the length of the read data before converting?

I'm in the middle of reading Internetworking with TCP/IP Vol III, by Comer.
I am looking at some sample code for a "TIME" client for UDP.
The code gets to the point where it reads the response, and it takes what should be 4 bytes and converts them to a 32-bit unsigned integer, so it can be converted to UNIX time.
"s" is a file descriptor that refers to a UDP socket, and "n" holds the return value of read.
n = read (s, (char *)&now, sizeof(now));
if (n < 0)
errexit("read failed: %s\n", strerror(errno));
now = ntohl((u_long)now); /* put in host byte order */
What I am wondering is:
Are there some assumptions that should be checked before making the conversion? This is in C, and I am wondering if there are situations where read would pass a number of bytes that is not 4. If so, it seems like "now" would be a mess.
"Now" is defined as:
time_t now; /* 32-bit integer to hold time */
So maybe I don't understand the nature of "time_t", or how the bytes are passed around in C, or what situations UDP would return the wrong number of bytes to the file descriptor...
Thanks in advance.
With UDP, as long as the receive buffer you pass to read is long enough, a single UDP packet won't be broken up between read calls.
However, there's no guarantee that the other side sent a packet of at least 4 bytes - you're quite right, if a server sent only a 2 byte response then that code would leave now containing garbage.
That probably doesn't matter too much in this precise situation - after all, the server is just as free to send 4 bytes of garbage as it is to send only 2 bytes. If you want to check for it, just check that the n returned by read is as long as you were expecting.
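A length check along those lines might look like the sketch below. The name decode_time_reply is hypothetical. It reads the reply into a uint32_t rather than a time_t, on the assumption that time_t may be wider than 32 bits on the target platform, in which case reading 4 bytes straight into it would leave the remaining bytes unset.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: validate and decode a TIME reply.
 * `reply` holds the datagram, `n` is what read() returned for it.
 * On success stores the 32-bit value in host byte order and returns 0;
 * returns -1 if the reply is not exactly 4 bytes (including n < 0). */
int decode_time_reply(const void *reply, ssize_t n, uint32_t *seconds)
{
    assert(reply != NULL && seconds != NULL);
    uint32_t net;

    if (n != (ssize_t)sizeof net)
        return -1;              /* short, oversized, or failed read */

    memcpy(&net, reply, sizeof net);
    *seconds = ntohl(net);      /* network to host byte order */
    return 0;
}
```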
