System call read result display random characters after the result - c

I've received an assignment where I have to read from a file.txt (max size 4096 B) four times, basically splitting it into 4 strings of equal size. I have to fill this structure (just consider the field 'msg'; I think the problem is there):
struct message {
long mtype;
int nclient;
int pid;
char path[151];
char msg[1025];
};
I used an array of 4 struct message to store all 4 parts
This is my read:
struct message msgs[4];
for (int i = 0; i < 4; i++) {
msgs[i].nclient=pos+1;
msgs[i].mtype = 42;
msgs[i].pid = getpid();
strcpy(msgs[i].path, filespath[pos]);
if (read(fd, msgs[i].msg, nMsgSize[i]) == -1)
ErrExit("read failed");
printf("I've read: %s\nMSGSize: %d\nPath: %s\n",msgs[i].msg, nMsgSize[i], msgs[i].path);
}
I tested it on a file "sendme_5.txt" that has this text in it:
ABCD
And this is my output:
I've read: A MSGSize: 1 Path:
/home/luca/Desktop/system_call_meh/myDir/joe_bastianich/bruno_barbieri/sendme_5.txt
I've read: BP"�> MSGSize: 1 Path:
/home/luca/Desktop/system_call_meh/myDir/joe_bastianich/bruno_barbieri/sendme_5.txt
I've read: C#��;�U MSGSize: 1 Path:
/home/luca/Desktop/system_call_meh/myDir/joe_bastianich/bruno_barbieri/sendme_5.txt
I've read: D�.�>� MSGSize: 1 Path:
/home/luca/Desktop/system_call_meh/myDir/joe_bastianich/bruno_barbieri/sendme_5.txt
If I try to read the full file without dividing it in 4 (with only one read), it displays correctly.
The problem started when I changed the field char path[151]. We had to reduce the max size from PATH_MAX (4096) to 151 after a change in the assignment, but I don't know if it's related.
What is the problem here?

read does not know what a null-terminated string is. It deals with raw bytes, making no assumptions about the data it reads.
As is, your strings are possibly not null-terminated. printf("%s", msgs[i].msg) might continue past the end of the read data, possibly past the end of the buffer, searching for a null-terminating byte. Unless the data read happens to contain a null-terminating byte, or the buffer was zeroed-out beforehand (and not completely filled by read), this is Undefined Behaviour.
On success, read returns the number of bytes read into the buffer. This may be less than requested. The return value is of type ssize_t.
When using this system call to populate string buffers, the return value can be used to index and place the null-terminating byte. An additional byte should always be reserved for this case (that is, always read at most the size of the buffer minus one: char buf[256]; read(fd, buf, 255)).
Always check for error, or the return value of -1 will index the buffer out-of-bounds.
Assuming nMsgSize[i] is the exact size of the msgs[i].msg buffer:
ssize_t n;
if (-1 == (n = read(fd, msgs[i].msg, nMsgSize[i] - 1)))
ErrExit("read failed");
msgs[i].msg[n] = 0;
printf("READ:%zd/%d expected bytes, MSG:<<%s>>\n", n, nMsgSize[i] - 1, msgs[i].msg);
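The same idea can be wrapped in a small helper. This is only a minimal sketch: the name read_cstr and its parameters are hypothetical and not part of the question's code, and it assumes the caller passes the full size of the destination buffer.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read at most cap - 1 bytes from fd into buf and
 * null-terminate the result. Returns the number of bytes read, or -1 on
 * error (buf is left as an empty string in that case). cap must be >= 1. */
ssize_t read_cstr(int fd, char *buf, size_t cap)
{
    ssize_t n = read(fd, buf, cap - 1);
    if (n < 0) {
        buf[0] = '\0';
        return -1;
    }
    buf[n] = '\0'; /* n is in [0, cap - 1], so this index is in bounds */
    return n;
}
```

With a helper like this, each part could be read as read_cstr(fd, msgs[i].msg, sizeof msgs[i].msg), and the result is safe to print with %s.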

Related

reading in from socket until I get enough bytes

I'm trying to read in an integer which is going to let the server know the message length. I read until I reach sizeof(int) bytes using a while loop. I'm following the same convention for the message and length using a while loop to call recv multiple times. If all I'm doing is reading an int can I just call recv directly and expect all the bytes?
If not then how should I read in an integer using a while loop.
struct CONN_STAT {
int size; // length function should return the length into this field
int nRecv; // bytes received of message
int nSent; // bytes sent of message
int lRecv; // bytes received of length
int lSent; // bytes sent of length
};
server : How I'm reading length
I copied the logic from my message function it reads similar to this
but info is replaced by a char array and the (info + pStat->lRecv) works for it
int readLength(int sockfd, int * info, struct CONN_STAT * pStat){
int infoSize = sizeof(int);
// I copied the logic from my message function it reads similar to this
// but info is replaced by a char array and the (info + pStat->lRecv) works for it
while(pStat->lRecv < infoSize){
int n = recv(sockfd, info + pStat->lRecv, infoSize - pStat->lRecv, 0);
if (n > 0) {
pStat->lRecv += n;
}
else if (n == 0 || (n < 0 && errno == ECONNRESET)) {
close(sockfd);
return -1;
}else if (n < 0 && (errno == EWOULDBLOCK)) {
//The socket becomes non-readable. Exit now to prevent blocking.
//OS will notify us when we can read
return 0;
}else {
printf("Unexpected recv error.");
}
}
return 0;
}
Calling it like this
readLength(sockfd, (int*)pStat->size, pStat);
error: warning: cast to 'int *' from smaller integer type 'int' [-Wint-to-pointer-cast]
If all I'm doing is reading a int can I just call recv directly and expect all the bytes?
Generally speaking, no. TCP is a byte-streaming protocol, so it doesn't guarantee anything about how many bytes will be delivered by any one call to recv(). It's entirely possible (and therefore, given enough time, inevitable) that you'll recv() only the first part of the integer from a given recv() call, and you'll need to save the bytes you've received into a buffer somewhere and plan to append the rest of the bytes to that buffer later on. You can only actually parse/use the received integer after you've collected the whole set of bytes that were used to represent it.
If not then how should I read in a integer using a while loop.
Pretty much the same way you are (presumably) reading in the data-payload that follows the integer: write any received bytes into an array until the array has the number of bytes in it that are required to parse it. (In this case, you need to have sizeof(int) bytes in your array before you can read the integer as an integer... and don't forget that sizeof(int) may be a different value on different machines, and that an int may be represented in either big-endian or little-endian form. You might want to use int32_t instead of int, and htonl() and ntohl() to handle any necessary endian-conversion)
Since you're using non-blocking I/O, I suggest putting your collection-buffer into the CONN_STAT struct, so that a given call to readLength() can update the array with any received bytes and then a subsequent call can update the array some more, and so on.
The way I think about it is to see it as receiving two data-buffers: the first buffer I can assume the size of -- it will always be sizeof(int) bytes long. The second buffer I will know the size of as soon as I have received the entire first buffer and can read what it contains. So I can use (almost) the same logic for both of the two buffers, and then repeat as necessary.
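Here is a rough sketch of how the length part could look with the collection buffer kept in per-connection state. The names len_state and read_length are hypothetical, and it assumes the sender transmits the length as a 32-bit value in network byte order (htonl), which is a suggestion from this answer and not something the question's code does.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical per-connection state: the partially received length prefix
 * is kept here so it survives across calls on a non-blocking socket. */
struct len_state {
    unsigned char raw[sizeof(uint32_t)]; /* prefix bytes received so far */
    size_t got;                          /* how many of those bytes are valid */
};

/* Returns 1 when the full prefix has arrived (*out holds the host-order
 * value), 0 if the socket would block and more bytes are still needed,
 * and -1 on a closed connection or any other error. */
int read_length(int sockfd, struct len_state *st, uint32_t *out)
{
    while (st->got < sizeof st->raw) {
        /* raw is a byte array, so raw + got advances by bytes, not ints */
        ssize_t n = recv(sockfd, st->raw + st->got, sizeof st->raw - st->got, 0);
        if (n > 0) {
            st->got += (size_t)n;
        } else if (n == 0) {
            return -1; /* peer closed the connection */
        } else if (errno == EINTR) {
            continue; /* interrupted by a signal: retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return 0; /* nothing more to read right now */
        } else {
            return -1;
        }
    }
    uint32_t net;
    memcpy(&net, st->raw, sizeof net); /* avoids unaligned access */
    *out = ntohl(net);
    return 1;
}
```

The caller would zero the state when a new message starts, call read_length whenever the socket becomes readable, and switch to the payload buffer once it returns 1.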

How does fread deal with trailing garbage when reaching the end of the file?

I recently started dabbling in C again, a language I'm not particularly proficient at and, in fact, keep forgetting (I mostly code in Python). My idea here is to read data from a hypothetically large file in chunks and then process the data accordingly. For now, I'm simulating this by actually loading the whole file into a buffer of type short with fread. This method will be changed, since it would be a very bad idea for, say, a file that's 1 GB, I'd think. The end goal is to read a chunk, process it, move the cursor, read another chunk and so on.
The file in question is 43 bytes and has the phrase "The quick brown fox jumps over the lazy dog". This size is convenient because it's a prime number, so no matter how many bytes I split it into, there will always be trailing garbage (due to the buffer having leftover space?). Data processing in this case is just printing out the shorts as two chars after byte manipulation (see code below)
#include <stdio.h>
#include <stdlib.h>
#define MAX_BUFF_SIZE 1024
long file_size(FILE *f)
{
if (fseek(f, 0, SEEK_END) != 0) exit(EXIT_FAILURE); // Move cursor to the end
long file_size = ftell(f); // Determine position to get file size
rewind(f);
return file_size;
}
int main(int argc, char* argv[])
{
short buff[MAX_BUFF_SIZE] = {0}; // Initialize to 0 remove trailing garbage
char* filename = argv[1];
FILE* fp = fopen(filename, "r");
if (fp)
{
size_t size = sizeof(buff[0]); // Size in bytes of each chunk. Fixed to 2 bytes
int nmemb = (file_size(fp) + size - 1) / size; // Number of chunks to read from file
// (ceil f_size/size)
printf("Should read at most %d chunks\n", nmemb);
short mask = 0xFF; // Mask to take first or second byte
size_t num_read = fread(buff, size, nmemb, fp);
printf("Read %zu chunks\n\n", num_read); // Seems to have read more? Look into.
for (int i=0; i<nmemb; i++) {
char first_byte = buff[i] & mask;
char second_byte = (buff[i] >> 8) & mask; // Identity for 2 bytes. Keep mask for consistency
printf("Chunk %02d: 0x%04x | %c %c\n", // Remember little endian (bytes reversed)
i, buff[i], first_byte, second_byte);
}
fclose(fp);
} else
{
printf("File %s not found\n", filename);
return 1;
}
return 0;
}
Now yesterday, on printing out the last chunk of data I was getting "Chunk 21: 0xffff9567 | g". The last (first?) byte (0x67) is g, and I did expect some trailing garbage, but I don't understand why it was printing out so many bytes when the variable buff has shorts in it. At that point I was just printing the hex as %x, not %04x, and buff was not initialized to 0. Today, I decided to initialize it to 0 and not only did the garbage disappear, but I can't recreate the problem even after leaving buff uninitialized again.
So here are my questions that hopefully aren't too abstract:
Does fread look beyond the file when reading data and does it remove trailing garbage itself, or is it up to us?
Why was printf showing an int when the buffer is a short? (I assume %x is for ints) and why can't I replicate the behaviour even after leaving buff without initialization?
Should I always initialize the buffer to zero to remove trailing garbage? What's the usual approach here?
I hope these aren't too many, or too vague, questions, and that I was clear enough. Like I said, I don't know much about C but find low-mid level programming very interesting, especially when it comes to direct data bit/byte manipulation.
Hope you have a great day!
EDIT 1:
Some of you wisely suggested I use num_read instead of nmemb on the loop, since that's the return value of fread, but that means I'll discard the rest of the file (nmemb is 22 but num_read is 21). Is that the usual approach? Also, thank you for pointing out that %x was casting to unsigned int, hence the 4 bytes instead of 2.
EDIT 2:
For clarification, and since I misspoke in a comment, I'd like to keep the remaining byte (or data), while discarding the rest, which is undefined. I don't know if this is the usual approach since if I use num_read in the loop, whatever is left over at the end is discarded, data or not. I'm more interested in knowing what the usual approach is: discard leftover data or remove anything that we know is undefined, in this case one of the bytes.
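One possible way to keep the leftover byte, sketched under assumptions: read with an element size of 1 so that fread's return value is a byte count, then combine the bytes into 16-bit values yourself. The names read_chunk and pack_le16 are hypothetical, and the little-endian pairing simply mirrors the byte order the code above prints.

```c
#include <stdio.h>

/* Hypothetical helper: read up to cap bytes and report how many arrived.
 * With an element size of 1, fread's return value is a byte count, so a
 * trailing odd byte is neither lost nor mixed with stale buffer contents. */
size_t read_chunk(FILE *fp, unsigned char *buf, size_t cap)
{
    return fread(buf, 1, cap, fp);
}

/* Combine bytes pairwise into little-endian 16-bit values. A lone final
 * byte becomes a value with a zero high byte. Returns the number of values
 * written to out, which must have room for (nbytes + 1) / 2 entries. */
size_t pack_le16(const unsigned char *buf, size_t nbytes, unsigned short *out)
{
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i += 2) {
        unsigned lo = buf[i];
        unsigned hi = (i + 1 < nbytes) ? buf[i + 1] : 0u;
        out[count++] = (unsigned short)(lo | (hi << 8));
    }
    return count;
}
```

Using unsigned types here also sidesteps the sign extension that turned 0x9567 into 0xffff9567 when a negative short was promoted for %x.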

How to properly print file content to the command line in C?

I want to print the contents of a .txt file to the command line like this:
main() {
int fd;
char buffer[1000];
fd = open("testfile.txt", O_RDONLY);
read(fd, buffer, strlen(buffer));
printf("%s\n", buffer);
close(fd);
}
The file testfile.txt looks like this:
line1
line2
line3
line4
The function prints only the first 4 letters line.
When using sizeof instead of strlen the whole file is printed.
Why is strlen not working?
It is incorrect to use strlen at all in this program. Before the call to read, the buffer is uninitialized and applying strlen to it has undefined behavior. After the call to read, some number of bytes of the buffer are initialized, but the buffer is not necessarily a proper C string; strlen(buffer) may return a number having no relationship to the amount of data you should print out, or may still have UB (if read initialized the full length of the array with non-nul bytes, strlen will walk off the end). For the same reason, printf("%s\n", buffer) is wrong.
Your program also can't handle files larger than the buffer at all.
The right way to do this is by using the return value of read, and write, in a loop. To tell read how big the buffer is, you use sizeof. (Note: if you had allocated the buffer with malloc rather than as a local variable, then you could not use sizeof to get its size; you would have to remember the size yourself.)
#include <unistd.h>
#include <stdio.h>
int main(void)
{
char buf[1024];
ssize_t n;
while ((n = read(0, buf, sizeof buf)) > 0)
write(1, buf, n);
if (n < 0) {
perror("read");
return 1;
}
return 0;
}
Exercise: cope with short writes and write errors.
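For reference, one way the exercise might be approached is a helper that keeps calling write until everything has been written. This is a sketch, not the only solution; the name write_all is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write until all len bytes are written.
 * Returns 0 on success, -1 on a write error (errno is set by write). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything: retry */
            return -1;
        }
        buf += w;          /* skip the part that was written */
        len -= (size_t)w;  /* and try again with the remainder */
    }
    return 0;
}
```

In the loop above, write(1, buf, n) would then become a call to write_all(1, buf, (size_t)n) with its return value checked.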
When using sizeof instead of strlen the whole file is printed. Why is strlen not working?
Because strlen walks through the char array passed in and counts characters until it encounters a 0. In your case, buffer is not initialized, so strlen reads indeterminate values while looking for that 0, which is undefined behavior.
sizeof works differently and returns the number of bytes of the passed object directly without looking for a 0 inside the array as strlen does.
As correctly noted in other answers read will not null terminate the string for you so you have to do it manually or declare buffer as:
char buffer[1000] = {0};
In this case, printing the buffer with printf and %s after reading the file will work, but only if read did not fill the entire array with non-zero bytes.
Extra:
Null terminating a string means you append a 0 to it somewhere. This is how most of the string related functions guess where the string ends.
Why is strlen not working?
Because when you call it in read(fd, buffer, strlen(buffer));, you haven't yet assigned a valid string to buffer. It contains some indeterminate data which may or may not have a 0-valued element. Based on the behavior you report, buffer just so happens to have a 0 at element 4, but that's not reliable.
The third parameter tells read how many bytes to read from the file descriptor - if you want to read as many bytes as buffer is sized to hold, use sizeof buffer. read will return the number of bytes read from fd (0 for EOF, -1 for an error). If I'm not mistaken, read will not zero-terminate the input, so using strlen on buffer after calling read would still be an error.

Using fread() to parse data through a file and stdin

I am trying to write an algorithm that takes input from a file and builds what is called an "s1 record". (The functionality of this function is not important) Depending on the command line arguments, the program will set the inputFile to the specified file, or stdin if no file is provided.
The algorithm needs to be structured in a way that can handle both file patterns.
The idea of this is to take FILE* data and read it into a buffer of size 16 bytes. Every 16 bytes of data, an s1 record will be built. As long as there are 16 bytes to read then it works fine and dandy. Once there is a line with less than 16 bytes, it doesn't create an s1 record.
I've tested the output and these are some of the things I noticed:
When I run the program using "stdin", I am prompted for user input. I enter 20 characters (which should print 16 in 1 srecord, and 4 in another) and my output is as follows:
12345678901234567890
Buffer: 1234567890123456
S113000031323334353637383930313233343536AA
When I run the program using a file (record.dat) with one single line with the characters of the alphabet on it, I get this:
Buffer: ABCDEFGHIJKLMNOP
Buffer: QRSTUVWXYZKLMNOP
This is not valid either, as it prints the "KLMNOP" at the end of the line as well.
My question is: How can I structure this to accept the input from either a file or stdin using the same algorithm, and what exactly am I doing wrong in my algorithm? I have tried providing all the useful information I can, and can specify more detail if requested. Below is the code for the algorithm I am trying to write.
inputFile is set to stdin if no file is specified
char buffer[kMaxLineSize + 1] = { '\0' };
char byte = 0;
int count = 1;
while((fread(buffer, 1, kMaxLineSize, inputFile)))
{
printf("%c", byte);
clearCRLF(buffer);
printf("Buffer: %s\n", buffer);
if(outputFormat == 1)
{
char s1Record[kMaxSRecordSize] = { 0 };
buildS1Record(addressField, s1Record, buffer);
fprintf(outputFile, "%s\n", s1Record);
addressField += strlen(buffer);
s1Count++;
}
else
{
char asmRecord[kMaxASMRecordSize] = { 0 };
buildAssemblyRecord(asmRecord, buffer);
fprintf(outputFile, "%s\n", asmRecord);
}
}
I'll try to combine the comments in this answer.
But first, what you call an s1 "record" is not a record. It is a string of maximum 16 characters and a terminating null character. A record, in my understanding, is a struct with fields of possibly different types, one of which could be a string.
The code fixes are as follows:
char buffer[kMaxLineSize + 1] = { '\0' };
size_t len;
while ((len = fread(buffer, 1, kMaxLineSize, inputFile)) > 0)
{
...
char s1Record[kMaxSRecordSize] = { 0 };
buildS1Record(addressField, s1Record, buffer, len);
so you pass the length read to the function. Now it can copy the characters read and terminate with a '\0' character. Note also that there can be a discrepancy between kMaxLineSize and kMaxSRecordSize: they must be the same size (plus 1 for \0), so better use a single variable.
I hope this late answer can still be of use to you.
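To illustrate the point about using the returned length, here is a small self-contained sketch. CHUNK_SIZE stands in for kMaxLineSize and next_chunk is a hypothetical name; neither comes from the original program.

```c
#include <stdio.h>

#define CHUNK_SIZE 16 /* stands in for kMaxLineSize; an assumption */

/* Hypothetical helper: read the next chunk of up to CHUNK_SIZE bytes and
 * null-terminate it at the number of bytes actually read, so leftovers from
 * the previous chunk can never show up. buf must hold CHUNK_SIZE + 1 bytes.
 * Returns the chunk length, or 0 at end of input. */
size_t next_chunk(FILE *in, char buf[CHUNK_SIZE + 1])
{
    size_t len = fread(buf, 1, CHUNK_SIZE, in);
    buf[len] = '\0';
    return len;
}
```

The same function works for a file and for stdin. Note that on an interactive terminal fread keeps waiting until it has CHUNK_SIZE bytes or sees end of input, which would explain why the last 4 characters of a 20-character line only appear after more input or EOF.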

Arduino serial message with unknown length

How can I store the result of Serial.readBytesUntil(character, buffer, length) in a buffer when I don't know the length of the incoming message?
Here is a little code that makes use of realloc() to keep growing your buffer. You will have to free() when you're done with buf.
int length = 8;
char * buf = malloc(length);
int total_read = 0;
total_read = Serial.readBytesUntil(character, buf, length);
while(length == total_read) {
length *= 2;
buf = realloc(buf, length);
// Bug in this line:
// total_read += Serial.readBytesUntil(character, buf+total_read, length);
// Should be
total_read += Serial.readBytesUntil(character, buf+total_read, length-total_read);
}
*Edit: fixed a bug where readBytesUntil would have read off the end of buf by reading length bytes instead of length-total_read bytes.
Make the buffer big enough for the message. Don't know the maximum length of the message? Use length to control the number of characters read, then continue reading until character is encountered.
int bytesRead = Serial.readBytesUntil(character, buffer, length);
You could create a buffer that is just smaller than the remaining RAM and use that. The call to find the remaining ram (as I've posted elsewhere) is:
int freeRam () {
extern int __heap_start, *__brkval;
int v;
// Distance between the top of the stack (&v) and the end of the heap
int fr = (int) &v - (__brkval == 0 ? (int) &__heap_start : (int) __brkval);
Serial.print("Free ram: ");
Serial.println(fr);
return fr;
}
Regardless, you should make sure you only read into as much RAM as you actually have.
One answer is that when a program reads serial bytes it typically does NOT store them verbatim. Rather, the program examines each byte and determines what action to take next. This logic is typically implemented as a finite state machine.
So, what does your specific serial stream represent? Can it be analyzed in sequential chunks? For example: "0008ABCDEFGH" says that 8 chars follow the 4 character length field. In this silly example your code would read 4 chars, then know how much space to allocate for the rest of the serial stream!
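As a sketch of that length-field idea, assuming a fixed 4-character decimal prefix as in the example above (parse_len_field and LEN_FIELD are hypothetical names), the header could be decoded like this:

```c
#include <stddef.h>

#define LEN_FIELD 4 /* width of the decimal length prefix, as in "0008" */

/* Hypothetical helper: convert a fixed-width decimal length field to a
 * number. Only the first LEN_FIELD characters are examined, so the field
 * does not need to be null-terminated. Returns -1 if any of them is not
 * a digit. */
int parse_len_field(const char *field)
{
    int value = 0;
    for (size_t i = 0; i < LEN_FIELD; i++) {
        if (field[i] < '0' || field[i] > '9')
            return -1;
        value = value * 10 + (field[i] - '0');
    }
    return value;
}
```

On the Arduino side you would first read exactly LEN_FIELD bytes (for example with Serial.readBytes), pass them to this function, and then allocate and read that many bytes for the payload.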
