C & Linux: Waiting for when a file has been written to

I'm currently working on a project that will read data from a micro-controller via serial communications.
As of right now, the program (on my computer), opens a /dev/tty* file and is able to read/write to it. The micro-controller will send a packet of n bytes at any time. I want to know if there is any way I can tell when all of the bytes have been written to the file?
I've been looking at the select() and poll() functions, but they seem to be only able to tell when a byte is ready, but not when every byte has been written.
Any help is appreciated. Thanks!

If your n is fixed (hardcoded), you can just do something like this (pseudocode):
received_data = leftover_from_last_round
while( received_data < n )
{
    use select() to wait for data to arrive
    read() as much data as you can (don't forget to check for buffer overflow here)
    received_data += number of bytes read
}
full_message = buffer[ 0 .. n - 1 ]
leftover_for_next_round = buffer[ n .. received_data - 1 ]
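The pseudocode above could look roughly like this in C. This is a sketch that assumes the serial port is already open and configured in raw (non-canonical) mode, e.g. with cfmakeraw() and tcsetattr(), and that every message is exactly n bytes. struct rxbuf and read_message() are hypothetical names. Bytes that arrive beyond the current message stay in the buffer for the next call, which plays the role of the leftover in the pseudocode.

```c
#include <errno.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-port state: bytes received but not yet consumed. */
struct rxbuf {
    unsigned char data[4096];
    size_t len;
};

/* Wait with select() until n bytes are buffered, copy them to msg and
 * shift any extra bytes (start of the next message) to the front.
 * Returns 0 on success, -1 on error, end of file, or if n is too large. */
int read_message(int fd, struct rxbuf *rx, unsigned char *msg, size_t n)
{
    if (n > sizeof rx->data)
        return -1;
    while (rx->len < n) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal: wait again */
            return -1;
        }
        /* read as much as is available, but never past the end of the buffer */
        ssize_t r = read(fd, rx->data + rx->len, sizeof rx->data - rx->len);
        if (r < 0 && (errno == EINTR || errno == EAGAIN))
            continue;
        if (r <= 0)
            return -1;                /* error or end of file */
        rx->len += (size_t)r;
    }
    memcpy(msg, rx->data, n);
    /* keep the leftover bytes for the next call */
    memmove(rx->data, rx->data + n, rx->len - n);
    rx->len -= n;
    return 0;
}
```

Passing a struct timeval instead of NULL as the last select() argument would let the caller give up on a device that stops sending mid-message.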
If n is not fixed, you need to do something like what @Golgauth suggested, or add some "end of transmission" byte or sequence to your message (which is tricky if you are transmitting binary data, since the marker byte can also appear inside the payload). In short: you need some sort of protocol.
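One common way to make an end-of-message byte safe for binary data is byte stuffing: every occurrence of the marker inside the payload is escaped. The sketch below borrows the byte values from SLIP (RFC 1055): END = 0xC0, ESC = 0xDB, ESC_END = 0xDC, ESC_ESC = 0xDD. slip_encode(), slip_feed() and the 1024-byte decoder buffer are assumptions for illustration, not something from the answer.

```c
#include <stddef.h>

enum { SLIP_END = 0xC0, SLIP_ESC = 0xDB, SLIP_ESC_END = 0xDC, SLIP_ESC_ESC = 0xDD };

/* Encode len bytes of in into out, escaping END/ESC and appending END.
 * out must hold at least 2 * len + 1 bytes. Returns the encoded length. */
size_t slip_encode(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == SLIP_END) {
            out[o++] = SLIP_ESC;
            out[o++] = SLIP_ESC_END;
        } else if (in[i] == SLIP_ESC) {
            out[o++] = SLIP_ESC;
            out[o++] = SLIP_ESC_ESC;
        } else {
            out[o++] = in[i];
        }
    }
    out[o++] = SLIP_END;              /* end-of-message marker */
    return o;
}

/* Incremental decoder state; feed received bytes one at a time. */
struct slip_decoder {
    unsigned char buf[1024];
    size_t len;
    int escaped;                      /* previous byte was ESC */
};

/* Returns 1 when c completes a message (d->buf / d->len hold it; the caller
 * resets d->len to 0 afterwards), otherwise 0. Bytes beyond the buffer
 * capacity are dropped. */
int slip_feed(struct slip_decoder *d, unsigned char c)
{
    if (d->escaped) {
        d->escaped = 0;
        if (c == SLIP_ESC_END)
            c = SLIP_END;
        else if (c == SLIP_ESC_ESC)
            c = SLIP_ESC;
        /* any other byte after ESC is a protocol error; kept as-is here */
    } else if (c == SLIP_ESC) {
        d->escaped = 1;
        return 0;
    } else if (c == SLIP_END) {
        return 1;                     /* unescaped END: message complete */
    }
    if (d->len < sizeof d->buf)
        d->buf[d->len++] = c;
    return 0;
}
```

Because the decoder works byte by byte, it does not care how read() happens to split the incoming data.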

Well, the idea is that your binary file should start with a few bytes that give the size of what has to be read next:
Read N bytes => gives the DATASIZE, i.e. how many bytes remain: (FILESIZE - N)
Read DATASIZE bytes => gives the data itself (readable in blocks/packets of size n)
This is actually the kind of discussion we were having here (that one was about interpreting a raw PCM WAV sound file, but the point is the same: read the number N of samples first, to determine how many blocks must be read next to get the whole file intact).
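A header-then-data format like this is easiest to handle by accumulating the received bytes and only parsing once enough of them have arrived. The sketch below assumes a 2-byte big-endian length header (the header size and byte order are assumptions, not part of the answer); parse_frame() is a hypothetical name. It does no I/O itself: it only reports whether the bytes received so far contain a complete frame.

```c
#include <stddef.h>

#define HDR_LEN 2   /* assumption: 2-byte big-endian DATASIZE header */

/* Try to extract one frame from the first avail bytes of buf.
 * On success, points *payload at the data, stores its size in *size and
 * returns the total number of bytes consumed (header + data).
 * Returns 0 if the frame is not complete yet (read more and retry). */
size_t parse_frame(const unsigned char *buf, size_t avail,
                   const unsigned char **payload, size_t *size)
{
    if (avail < HDR_LEN)
        return 0;                                   /* header incomplete */
    size_t datasize = ((size_t)buf[0] << 8) | buf[1];
    if (avail < HDR_LEN + datasize)
        return 0;                                   /* data incomplete */
    *payload = buf + HDR_LEN;
    *size = datasize;
    return HDR_LEN + datasize;
}
```

After a successful call the caller discards the consumed bytes and calls parse_frame() again, since one read() may have delivered more than one frame.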

Related

Reading from Socket in a loop?

I am creating a server/client TCP in C.
The idea is for the server to send a relatively large amount of information. However, the buffer in the client has a size of only 512 (I don't want to increase this size), and obviously, the information sent by the server is larger than this. Let's imagine 812 bytes.
What I want to do is, in the client, read 512 bytes, print them on the client's console, and then read the remaining bytes, and print them as well.
Here's what should happen:
1) Create server, and block in the read() system call (waiting for the client to write something);
2) Create the client, write something to the socket, and then block on read(), waiting for the server to respond;
3) The server's read() call returns, and now server has to send that large amount of data, using the following code (after creating a new process):
dup2(new_socketfd, STDOUT_FILENO); // Redirect to socket
execlp("/application", "application", NULL); // Application that prints the information to send to the client
Let's imagine "application" printed 812 bytes of data to the socket.
4) Now the client has to read 812 bytes, with a buffer size of 512. That's my problem.
How can I approach this problem? I was wondering if I could make a loop, and read until there's nothing to read, 512 by 512 bytes. But as soon as there's nothing to read, client will block on read().
Any ideas?
recv() blocks when there is no data in the stream. When it does return data, its return value is the number of bytes extracted from the stream, which may be fewer than you asked for.
You can write a simple function to extract the full data just by using an offset variable and checking the return value.
A simple function like this will do.
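#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
ssize_t readfull(int descriptor, char *buffer, ssize_t sizetoread) {
    ssize_t offset = 0;
    while (offset < sizetoread) {
        // recv() may return fewer bytes than requested; keep going
        ssize_t n = recv(descriptor, buffer + offset, sizetoread - offset, 0);
        if (n < 0 && errno == EINTR)
            continue;          // interrupted by a signal: retry
        if (n < 1)
            return offset;     // 0 = peer closed, -1 = error
        offset += n;
    }
    return offset;
}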
Servers also typically indicate in some way when the data is finished. Either the server first sends the length of the message as a fixed-size field (four or eight bytes) and then sends the data, so you know ahead of time how much to read, or, as in HTTP for example, there is the Content-Length field as well as the "\r\n" delimiters.
Realistically, the client has no practical way of knowing how much data the server has available to send. The server has to tell you how much data there is through some kind of indicator.
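Since you're writing the server yourself, you can first send a four-byte value holding the number of bytes the client should read.
So your server can look like this:
uint32_t sizetosend = arbitrarysize;        // payload length in bytes
uint32_t netsize = htonl(sizetosend);       // fixed 4 bytes, network byte order (<arpa/inet.h>)
send(descriptor, &netsize, sizeof netsize, 0);
send(descriptor, buffer, sizetosend, 0);    // send() can also be partial; loop in real code
Then on your client side, read four bytes, then the buffer.
uint32_t netsize = 0;
if (readfull(descriptor, (char*)&netsize, sizeof netsize) < (ssize_t)sizeof netsize)
    return;                                 // connection closed or error before the header arrived
uint32_t sizetoread = ntohl(netsize);
// Now use readfull() above to read sizetoread bytes (in 512-byte chunks if needed)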
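Putting both sides together, here is a sketch assuming POSIX sockets. The length is a uint32_t converted with htonl()/ntohl(), so both ends agree on its size and byte order regardless of platform, and send() is looped as well because it may also transfer fewer bytes than requested. send_all(), recv_all(), send_msg() and recv_msg_len() are hypothetical names.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send all len bytes, looping over partial send() results. */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Receive exactly len bytes; -1 on error or if the peer closes early. */
int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Message = 4-byte length in network byte order, then the payload. */
int send_msg(int fd, const void *data, uint32_t len)
{
    uint32_t netlen = htonl(len);
    if (send_all(fd, &netlen, sizeof netlen) < 0)
        return -1;
    return send_all(fd, data, len);
}

/* Reads the 4-byte header; returns the payload length, or -1. */
int64_t recv_msg_len(int fd)
{
    uint32_t netlen;
    if (recv_all(fd, &netlen, sizeof netlen) < 0)
        return -1;
    return (int64_t)ntohl(netlen);
}
```

With a 512-byte buffer, the client calls recv_msg_len() once and then recv_all() with at most 512 bytes per iteration, printing each chunk, until the announced length has been consumed.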

What is the best way to read input of unpredictable and indeterminate (ie no EOF) size from stdin in C?

This must be a stupid question because this should be a very common and simple problem, but I haven't been able to find an answer anywhere, so I'll bite the bullet and ask.
How on earth should I go about reading from the standard input when there is no way of determining the size of the data? Obviously if the data ends in some kind of terminator like a NUL or EOF then this is quite trivial, but my data does not. This is simple IPC: the two programs need to talk back and forth and ending the file streams with EOF would break everything.
I thought this should be fairly simple. Clearly programs talk to each other over pipes all the time without needing any arcane tricks, so I hope there is a simple answer that I'm too stupid to have thought of. Nothing I've tried has worked.
Something obvious like (ignoring necessary realloc's for brevity):
int size = 0, max = 8192;
unsigned char *buf = malloc(max);
while (fread((buf + size), 1, 1, stdin) == 1)
++size;
won't work since fread() blocks and waits for data, so this loop won't terminate. As far as I know nothing in stdio allows nonblocking input, so I didn't even try any such function. Something like this is the best I could come up with:
struct mydata {
unsigned char *data;
int slen; /* size of data */
int mlen; /* maximum allocated size */
};
...
struct mydata *buf = xmalloc(sizeof *buf);
buf->data = xmalloc((buf->mlen = 8192));
buf->slen = 0;
int nread = read(0, buf->data, 1);
if (nread == (-1))
err(1, "read error");
buf->slen += nread;
fcntl(0, F_SETFL, oflags | O_NONBLOCK);
do {
if (buf->slen >= (buf->mlen - 32))
buf->data = xrealloc(buf->data, (buf->mlen *= 2));
nread = read(0, (buf->data + buf->slen), 1);
if (nread > 0)
buf->slen += nread;
} while (nread == 1);
fcntl(0, F_SETFL, oflags);
where oflags is a global variable containing the original flags for stdin (cached at the start of the program, just in case). This dumb way of doing it works as long as all of the data is present immediately, but fails otherwise. Because this sets read() to be non-blocking, it just returns -1 if there is no data. The program communicating with mine generally sends responses whenever it feels like it, and not all at once, so if the data is at all large this exits too early and fails.
How on earth should I go about reading from the standard input when there is no way of determining the size of the data?
There always has to be a way to determine the size. Otherwise, the program would require infinite memory, and would thus be impossible to run on a physical computer.
Think about it this way: even with a never-ending stream of data, there must be chunks or points at which you process it. For instance, a live video stream is decoded a portion (e.g. a frame) at a time, and a video game processes messages one by one even though the game has no predetermined length.
This holds true regardless of the type of I/O you decide to use (blocking/non-blocking, synchronous/asynchronous...). For instance, if you want to use typical blocking synchronous I/O, what you have to do is process the data in a loop: each iteration, you read as much data as is available, and process as much as you can. Whatever you can not process (because you have not received enough yet), you keep for the next iteration. Then, the rest of the loop is the rest of the logic of the program.
In the end, regardless of what you do, you (or someone else, e.g. a library, the operating system, the hardware buffers...) have to buffer incoming data until it can be processed.
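The blocking loop described above can be sketched like this, assuming newline-terminated text messages (the delimiter is an assumption; a length prefix works the same way). struct linebuf, fill_buffer() and take_line() are hypothetical names.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-stream state: bytes read but not yet processed. */
struct linebuf {
    char data[4096];
    size_t len;
};

/* Read whatever is available (blocking only if nothing is).
 * Returns bytes read, 0 on EOF, -1 on error or if the buffer is full
 * without a complete message. */
ssize_t fill_buffer(int fd, struct linebuf *lb)
{
    if (lb->len == sizeof lb->data)
        return -1;
    ssize_t n;
    do
        n = read(fd, lb->data + lb->len, sizeof lb->data - lb->len);
    while (n < 0 && errno == EINTR);
    if (n > 0)
        lb->len += (size_t)n;
    return n;
}

/* Remove one complete line (without the '\n') from lb and copy it into out
 * as a C string, truncated to cap - 1 characters (cap must be >= 1).
 * Returns its length, or -1 if no complete line is buffered yet. */
long take_line(struct linebuf *lb, char *out, size_t cap)
{
    char *nl = memchr(lb->data, '\n', lb->len);
    if (nl == NULL)
        return -1;                        /* keep partial data for later */
    size_t n = (size_t)(nl - lb->data);
    size_t copy = n < cap - 1 ? n : cap - 1;
    memcpy(out, lb->data, copy);
    out[copy] = '\0';
    memmove(lb->data, nl + 1, lb->len - n - 1);
    lb->len -= n + 1;
    return (long)copy;
}
```

Each iteration of the main loop calls fill_buffer() once, then take_line() repeatedly until it returns -1, and then does the rest of the program's work; fill_buffer() returning 0 means the other side closed its end.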
Basically, you have two choices -- synchronous or asynchronous -- and both have their advantages and disadvantages.
For synchronous, you need either delimiters or a length field embedded in the record (or fixed-length records, but that is pretty inflexible). This works best for synchronous protocols like synchronous RPC or simplex client-server interactions, where only one side talks at a time while the other side waits. For ASCII/text-based protocols, it is common to use a control-character delimiter like NL/EOL, NUL or ETX to mark the end of messages. Binary protocols more commonly use an embedded length field: the receiver first reads the length and then reads the full amount of (expected) data.
For asynchronous, you use non-blocking mode. It IS possible to use non-blocking mode with stdio streams; it just requires some care. Out-of-data conditions show up in stdio as error conditions, so you need to use ferror() and clearerr() on the FILE * as appropriate.
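A minimal sketch of the non-blocking stdio approach, assuming a POSIX system where a read that would block makes getc() return EOF with the stream's error indicator set and errno equal to EAGAIN (this is how glibc behaves). set_nonblocking() and read_available() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Put the descriptor underlying fp into non-blocking mode. Returns 0 or -1. */
int set_nonblocking(FILE *fp)
{
    int fd = fileno(fp);
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Copy whatever is available right now from fp into buf (up to cap bytes)
 * and return the count. *at_eof is set to 1 on a real end of file.
 * "No data yet" makes getc() report an error with errno == EAGAIN, so the
 * error indicator is cleared and the stream stays usable for the next call.
 * Any other error is left set for the caller to check with ferror(). */
size_t read_available(FILE *fp, char *buf, size_t cap, int *at_eof)
{
    size_t n = 0;
    *at_eof = 0;
    while (n < cap) {
        int c = getc(fp);
        if (c != EOF) {
            buf[n++] = (char)c;
            continue;
        }
        if (ferror(fp) && (errno == EAGAIN || errno == EWOULDBLOCK))
            clearerr(fp);             /* just out of data for now */
        else if (feof(fp))
            *at_eof = 1;
        break;
    }
    return n;
}
```

In a real program, read_available() would be called when poll() or select() reports the descriptor readable, rather than in a busy loop.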
It's possible to use both: for example, in client-server interactions the clients may use synchronous I/O (they send a request and wait for a reply) while the server uses asynchronous I/O (to be robust in the presence of misbehaving clients).
The read() call on Linux and the ReadFile API on Windows do not wait for the specified number of bytes to fill the buffer when reading from a pipe or socket: they return as soon as some data is available, and read() returns the number of bytes actually read.
This means that when reading from a pipe, you set a buffer size, read as much as is returned, and then process it. Then you read the next bit. The only time you are blocked is when there is no data available at all.
This differs from fread(), which only returns once the requested number of bytes has been read or the stream determines that doing so is impossible (e.g. on EOF or an error).

Sockets - Reading and writing [duplicate]

I'm very new to C++, but I'm trying to learn some basics of TCP socket coding. Anyway, I've been able to send and receive messages, but I want to prefix my packets with the length of the packet (like I did in C# apps I made in the past) so when my window gets the FD_READ command, I have the following code to read just the first two bytes of the packet to use as a short int.
char lengthBuffer[2];
int rec = recv(sck, lengthBuffer, sizeof(lengthBuffer), 0);
short unsigned int toRec = lengthBuffer[1] << 8 | lengthBuffer[0];
What's confusing me is that after a packet comes in the 'rec' variable, which says how many bytes were read is one, not two, and if I make the lengthBuffer three chars instead of two, it reads three bytes, but if it's four, it also reads three (only odd numbers). I can't tell if I'm making some really stupid mistake here, or fundamentally misunderstanding some part of the language or the API. I'm aware that recv doesn't guarantee any number of bytes will be read, but if it's just two, it shouldn't take multiple reads.
Because you cannot assume how much data will be available, you'll need to continuously read from the socket until you have the amount you want. Something like this should work:
ssize_t rec = 0;
do {
int result = recv(sck, &lengthBuffer[rec], sizeof(lengthBuffer) - rec, 0);
if (result == -1) {
// Handle error ...
break;
}
else if (result == 0) {
// Handle disconnect ...
break;
}
else {
rec += result;
}
}
while (rec < sizeof(lengthBuffer));
Streamed sockets:
The sockets are generally used in a streamed way: you'll receive all the data sent, but not necessarily all at once. You may just as well receive it in pieces.
Your approach of sending the length is hence valid: once you've received the length, you can then fill a buffer, if needed across successive reads, until you have everything you expected. So you have to loop on receives, and define a strategy for handling any extra bytes received.
Datagram (packet-oriented) sockets:
If your application is really packet oriented, you may consider creating a datagram socket by requesting, in the Linux or Windows socket() call, the SOCK_DGRAM or, better, the SOCK_SEQPACKET socket type.
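A quick way to see the difference on Linux is a local socketpair() of type SOCK_SEQPACKET, where each recv() returns exactly one message. This sketch covers AF_UNIX only; for IP networking, Linux provides SOCK_SEQPACKET through SCTP (IPPROTO_SCTP), not TCP. seqpacket_demo() is a hypothetical name.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends "hi" and "there" as two messages on a SOCK_SEQPACKET pair and
 * stores the size returned by each recv(). Returns 0 or -1. */
int seqpacket_demo(ssize_t *first, ssize_t *second)
{
    int sv[2];
    char buf[64];
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
        return -1;
    /* two separate sends = two separate messages */
    if (send(sv[0], "hi", 2, 0) != 2 || send(sv[0], "there", 5, 0) != 5) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    /* each recv() returns one whole message, even though buf has room
     * for both; with SOCK_STREAM the first call could return all 7 bytes */
    *first = recv(sv[1], buf, sizeof buf, 0);
    *second = recv(sv[1], buf, sizeof buf, 0);
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```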
Risk with your binary size data:
Be aware that the way you send and receive your size data appears to be asymmetric. You hence have a major risk if sending and receiving happen between machines whose CPUs/architectures do not use the same endianness. You can find here some hints on how to make your code platform/endian-independent.
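One way to make the length encoding explicit is to build and parse the bytes by hand instead of relying on the host's integer layout. This sketch keeps the little-endian order implied by the question's lengthBuffer[1] << 8 | lengthBuffer[0]; put_u16le() and get_u16le() are hypothetical names. It also reads the bytes as unsigned char: with a signed char, a first byte of 0x80 becomes -128, is sign-extended when promoted to int, and turns a length of 0x1280 into 0xFF80.

```c
#include <stdint.h>

/* Sender: write v as 2 bytes, least significant first. */
void put_u16le(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)(v >> 8);
}

/* Receiver: rebuild the value from the bytes, independent of the host CPU.
 * unsigned char avoids sign extension when a byte is >= 0x80. */
uint16_t get_u16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Both sides then agree on the byte layout no matter which CPU they run on; declaring the receive buffer as unsigned char lengthBuffer[2] is enough to pass it directly to get_u16le().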
A TCP socket is stream based, not packet based (I assume you use TCP, as sending the packet length in the data does not make much sense with UDP). The number of bytes you receive at once does not have to match the amount that was sent. For example, you may send 10 bytes, but the receiver may receive 1 + 2 + 1 + 6 or any other combination. Your code has to handle that: it must be able to receive data partially and react when it has enough data (that's why you send the packet length, for example).

Reading wrong data from TCP socket

I'm trying to send data blockwise over a TCP socket. The server code does the following:
#define CHECK(n) if((r=n) <= 0) { perror("Socket error\n"); exit(-1); }
int r;
//send the number of blocks
CHECK(write(sockfd, &(storage->length), 8)); //p->length is uint64_t
for(p=storage->first; p!=NULL; p=p->next) {
//send the size of this block
CHECK(write(sockfd, &(p->blocksize), 8)); //p->blocksize is uint64_t
//send data
CHECK(write(sockfd, &(p->data), p->blocksize));
}
On the client side, I read the size and then the data (same CHECK macro):
CHECK(read(sockfd, &block_count, 8));
for(i=0; i<block_count; i++) {
uint64_t block_size;
CHECK(read(sockfd, &block_size, 8));
uint64_t read_in=0;
while(read_in < block_size) {
r = read(sockfd, data+read_in, block_size-read_in); //assume data was previously allocated as char*
read_in += r;
}
}
This works perfectly fine as long as both client and server run on the same machine, but as soon as I try this over the network, it fails at some point. In particular, the first 300-400 blocks (of ~587 bytes each) or so work fine, but then I get an incorrect block_size reading:
received block #372 size : 586
read_in: 586 of 586
received block #373 size : 2526107515908
And then it crashes, obviously.
I was under the impression that the TCP protocol ensures no data is lost and everything is received in correct order, but then how is this possible and what's my mistake here, considering that it already works locally?
There's no guarantee that when you read block_count and block_size that you will read all 8 bytes in one go.
"I was under the impression that the TCP protocol ensures no data is lost and everything is received in correct order"
Yes, but that's all that TCP guarantees. It does not guarantee that the data is sent and received in a single packet. You need to gather the data and piece them together in a buffer until you get the block size you want before copying the data out.
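Applied to the client in the question, that means wrapping every read() in a loop until the expected count has arrived, including the 8-byte block_count and block_size fields. A sketch assuming POSIX read(); read_exact(), receive_blocks() and the max_block sanity limit are hypothetical, and, like the original, it assumes both hosts use the same byte order for uint64_t. As in the question, data is reused for each block.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling read() until exactly len bytes have arrived.
 * Returns 0 on success, -1 on error or if the peer closed early. */
int read_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t r = read(fd, p, len);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            return -1;
        p += r;
        len -= (size_t)r;
    }
    return 0;
}

/* 8-byte block count, then per block an 8-byte size and that many bytes.
 * *total receives the sum of all block sizes. Returns 0 or -1. */
int receive_blocks(int fd, unsigned char *data, uint64_t max_block, uint64_t *total)
{
    uint64_t block_count;
    *total = 0;
    if (read_exact(fd, &block_count, sizeof block_count) < 0)
        return -1;
    for (uint64_t i = 0; i < block_count; i++) {
        uint64_t block_size;
        if (read_exact(fd, &block_size, sizeof block_size) < 0)
            return -1;
        if (block_size > max_block)
            return -1;                /* stream out of sync or corrupted */
        if (read_exact(fd, data, (size_t)block_size) < 0)
            return -1;
        *total += block_size;         /* process the block here */
    }
    return 0;
}
```

The max_block check turns a desynchronized stream, like the 2526107515908-byte size in the question, into a clean error instead of a crash.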
Perhaps the read calls are returning without reading the full 8 bytes. I'd check what length they report they've read.
You might also find valgrind or strace informative for better understanding why your code is behaving this way. If you're getting short reads, strace will tell you what the syscalls returned, and valgrind will tell you that you're reading uninitialized bytes in your length variables.
The reason why it works on the same machine is that the block_size and block_count are sent as binary values and when they are received and interpreted by the client, they have same values.
However, if two machines communicating have different byte order for representing integers, e.g. x86 versus SPARC, or sizeof(int) is different, e.g. 64 bit versus 32 bit, then the code will not work correctly.
You need to verify that sizeof(int) and byte order of both machines is identical. On the server side, print out sizeof(int) and values of storage->length and p->blocksize. On the client side print out sizeof(int) and values of block_count and block_size.
When it doesn't work correctly, I think you will find that they are not the same. If this is true, then the contents of data will also be misinterpreted if they contain any binary data.

C unix socket programming read() issue

I'm using C to implement a client server application. The client sends info to the server and the server uses it to send information back. I'm currently in the process of writing the code to handle the receiving of data to ensure all of it is, in fact, received.
The issue I'm having is best explained after showing some code:
int totalRead = 0;
char *pos = pBuffer;
while(totalRead < 6){
if(int byteCount = read(hSocket, pos, BUFFER_SIZE - (pos-pBuffer)>0)){
printf("Read %d bytes from client\n", byteCount);
pos += byteCount;
totalRead += byteCount;
}else return -1;
}
The code above runs on the server side and will print out "Read 1 bytes from client" 6 times and the program will continue working fine. I've hard-coded 6 here knowing I'm writing 6 bytes from the client side but I'll make my protocol require the first byte sent to be the length of rest of the buffer.
int byteCount = read(hSocket, pBuffer, BUFFER_SIZE);
printf("Read %d bytes from client", byteCount);
The code above, used in place of the first code segment, will print "Read 6 bytes from client" and continue working fine but it doesn't guarantee I've received every byte if only 5 were read for instance.
Can anyone explain to me why this is happening and a possible solution? I guess the first method ensures all bytes are being delivered but it seems inefficient reading one byte at a time...
Oh and this is taking place in a forked child process and I'm using tcp/ip.
Note: My goal is to implement the first code segment successfully so I can ensure I'm reading all bytes, I'm having trouble implementing it correctly.
Basically, the right way to do this is a hybrid of your two code snippets. Do the first one, but don't just read one byte at a time; ask for all the bytes you're expecting. But look at byteCount, and if it's less than you expected, advance your destination pointer, reduce the number of bytes you still expect, and call read() again. This is just how it works: sometimes the data you're expecting is split across packets and isn't all available at the same time.
Reading your comment below and looking at your code, I was puzzled, because yeah, that is what you're trying to do. But then I looked very closely at your code:
read(hSocket, pos, BUFFER_SIZE - (pos-pBuffer)>0)){
                                              ^^ THIS
That "> 0" is inside the parentheses enclosing read's arguments, not outside; that means it's part of the arguments! In fact, your last argument is interpreted as
(BUFFER_SIZE - (pos-pBuffer)) > 0
which is 1, until the end, when it becomes 0.
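With the parentheses fixed, the loop can ask for all missing bytes at once. This sketch also implements the protocol the question plans (first byte = length of the rest); read_length_prefixed() is a hypothetical name, and BUFFER_SIZE = 256 is an assumption chosen so that a 1-byte length always fits.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define BUFFER_SIZE 256   /* assumption: 1 length byte + up to 255 bytes */

/* Reads one message whose first byte is the length of the rest into
 * pBuffer. Returns the payload length (at pBuffer + 1), or -1. */
int read_length_prefixed(int hSocket, unsigned char pBuffer[BUFFER_SIZE])
{
    size_t totalRead = 0;
    size_t expected = 1;              /* first just the length byte */
    int have_length = 0;
    while (totalRead < expected) {
        /* ask for everything still missing, not one byte at a time */
        ssize_t byteCount = read(hSocket, pBuffer + totalRead, expected - totalRead);
        if (byteCount < 0 && errno == EINTR)
            continue;
        if (byteCount <= 0)
            return -1;                /* error, or the client closed */
        totalRead += (size_t)byteCount;
        if (!have_length && totalRead == 1) {
            have_length = 1;
            expected = 1 + (size_t)pBuffer[0];   /* now the full size is known */
        }
    }
    return (int)(expected - 1);
}
```

Because it never requests more than the current message needs, a second message that is already waiting stays in the socket for the next call.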
Your code isn't quite right. read and write may not read or write the total amount of data you've requested. Instead, you should advance the read or write pointer after each call and keep count of how many bytes are still left in the transmission.
If you get back -1 from either read or write, you've gotten an error. For read, a return value of 0 means end of stream: the peer has closed the connection and there is nothing more to read. Any number above zero indicates how many bytes were transferred by that call to read or write.
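The same pointer-advancing pattern applies to writing. A minimal sketch for a POSIX descriptor; write_all() is a hypothetical name.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes, advancing the pointer after each partial write().
 * Returns 0 on success, -1 on error (errno is set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;                 /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;                /* error (0 should not happen for len > 0) */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```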
