How to properly print file content to the command line in C?

I want to print the contents of a .txt file to the command line like this:
main() {
int fd;
char buffer[1000];
fd = open("testfile.txt", O_RDONLY);
read(fd, buffer, strlen(buffer));
printf("%s\n", buffer);
close(fd);
}
The file testfile.txt looks like this:
line1
line2
line3
line4
The program prints only the first 4 letters, "line".
When using sizeof instead of strlen the whole file is printed.
Why is strlen not working?

It is incorrect to use strlen at all in this program. Before the call to read, the buffer is uninitialized and applying strlen to it has undefined behavior. After the call to read, some number of bytes of the buffer are initialized, but the buffer is not necessarily a proper C string; strlen(buffer) may return a number having no relationship to the amount of data you should print out, or may still have UB (if read initialized the full length of the array with non-nul bytes, strlen will walk off the end). For the same reason, printf("%s\n", buffer) is wrong.
Your program also can't handle files larger than the buffer at all.
The right way to do this is by using the return value of read, and write, in a loop. To tell read how big the buffer is, you use sizeof. (Note: if you had allocated the buffer with malloc rather than as a local variable, then you could not use sizeof to get its size; you would have to remember the size yourself.)
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    char buf[1024];
    ssize_t n;

    /* read returns the number of bytes actually read; pass that count to write */
    while ((n = read(0, buf, sizeof buf)) > 0)
        write(1, buf, n);
    if (n < 0) {
        perror("read");
        return 1;
    }
    return 0;
}
Exercise: cope with short writes and write errors.
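One possible take on the exercise, as a sketch rather than the definitive solution: wrap write in a loop that keeps going until every byte has been accepted. The names write_all and copy_fd are made up for this illustration, and retrying on EINTR is an assumption about how you want interrupted calls handled.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes, retrying after short writes.
   Returns 0 on success, -1 on error (errno is set by write). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            return -1;
        }
        buf += n;               /* skip the part that was accepted */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: copy everything from in_fd to out_fd.
   Returns 0 on success, -1 on a read or write error. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[1024];
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0)
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
    return n < 0 ? -1 : 0;
}
```

With these, the body of main shrinks to something like: if (copy_fd(0, 1) < 0) { perror("copy"); return 1; }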

When using sizeof instead of strlen the whole file is printed. Why is
strlen not working?
Because strlen walks the char array it is given and counts characters until it encounters a 0 byte. In your case buffer is not initialized, so strlen reads indeterminate values while looking for that 0, and nothing guarantees it finds one inside the array. That is undefined behavior.
sizeof works differently: it yields the size in bytes of the object it is applied to, without inspecting the contents. For char buffer[1000] that is always 1000.
As correctly noted in other answers read will not null terminate the string for you so you have to do it manually or declare buffer as:
char buffer[1000] = {0};
In that case, printing the buffer with printf and %s after reading the file works, but only as long as read did not fill the whole array with non-zero bytes.
Extra:
Null terminating a string means you append a 0 to it somewhere. This is how most of the string-related functions determine where the string ends.
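To make the difference concrete, here is a small sketch. The three function names are invented for the illustration; each one builds the same zero-initialized array and reports a different measurement.

```c
#include <stddef.h>
#include <string.h>

/* sizeof reports the capacity of the array, whatever it contains. */
size_t capacity_of_local_array(void)
{
    char buffer[1000] = {0};
    return sizeof buffer;          /* 1000, fixed at compile time */
}

/* strlen counts bytes up to the first 0; here that is the very first byte. */
size_t length_of_empty_buffer(void)
{
    char buffer[1000] = {0};
    return strlen(buffer);         /* 0 */
}

/* After copying 6 bytes without a terminator, the untouched zero
   bytes from the initializer are what ends the string. */
size_t length_after_copy(void)
{
    char buffer[1000] = {0};
    memcpy(buffer, "line1\n", 6);
    return strlen(buffer);         /* 6 */
}
```

Note that sizeof only gives the capacity because buffer is an array here. Applied to a char * returned by malloc, it would give the size of the pointer instead.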

Why is strlen not working?
Because when you call it in read(fd, buffer, strlen(buffer));, you haven't yet assigned a valid string to buffer. It contains some indeterminate data which may or may not have a 0-valued element. Based on the behavior you report, buffer just so happens to have a 0 at element 4, but that's not reliable.
The third parameter tells read how many bytes to read from the file descriptor - if you want to read as many bytes as buffer is sized to hold, use sizeof buffer. read will return the number of bytes read from fd (0 for EOF, -1 for an error). read does not zero-terminate the input, so using strlen on buffer after calling read would still be an error.

Related

System call read result display random characters after the result

I've received this assignment where I have to read from a file.txt (max size 4096 B) four times, basically splitting it into 4 strings of equal size. I have to fill this structure (just consider the field 'msg'; I think the problem is there):
struct message {
long mtype;
int nclient;
int pid;
char path[151];
char msg[1025];
};
I used an array of 4 struct message to store all 4 parts
This is my read:
struct message msgs[4];
for (int i = 0; i < 4; i++) {
msgs[i].nclient=pos+1;
msgs[i].mtype = 42;
msgs[i].pid = getpid();
strcpy(msgs[i].path, filespath[pos]);
if (read(fd, msgs[i].msg, nMsgSize[i]) == -1)
ErrExit("read failed");
printf("I've read: %s\nMSGSize: %d\nPath: %s\n",msgs[i].msg, nMsgSize[i], msgs[i].path);
}
I tested it on a file "sendme_5.txt" that has this text in it:
ABCD
And this is my output:
I've read: A
MSGSize: 1
Path: /home/luca/Desktop/system_call_meh/myDir/joe_bastianich/bruno_barbieri/sendme_5.txt
I've read: BP"�>
MSGSize: 1
Path: /home/luca/Desktop/system_call_meh/myDir/joe_bastianich/bruno_barbieri/sendme_5.txt
I've read: C#��;�U
MSGSize: 1
Path: /home/luca/Desktop/system_call_meh/myDir/joe_bastianich/bruno_barbieri/sendme_5.txt
I've read: D�.�>�
MSGSize: 1
Path: /home/luca/Desktop/system_call_meh/myDir/joe_bastianich/bruno_barbieri/sendme_5.txt
If I try to read the full file without dividing it in 4 (with only one read), it displays it correctly.
The problem started when I changed the field char path[151]. We had to set the max size to 151 from PATH_MAX (4096) after a change in the assignment, but I don't know if it's related.
What is the problem here?
As stated above, read does not know what a null-terminated string is. It deals with raw bytes, making no assumptions about the data it reads.
As is, your strings are possibly not null-terminated. printf("%s", msgs[i].msg) might continue past the end of the read data, possibly past the end of the buffer, searching for a null-terminating byte. Unless the data read happens to contain a null-terminating byte, or the buffer was zeroed-out beforehand (and not completely filled by read), this is Undefined Behaviour.
On success, read returns the number of bytes read into the buffer. This may be less than requested. The return value is of type ssize_t.
When using this system call to populate string buffers, the return value can be used to index and place the null-terminating byte. An additional byte should always be reserved for this case (that is, always read at most the size of the buffer minus one: char buf[256]; read(fd, buf, 255)).
Always check for error, or the return value of -1 will index the buffer out-of-bounds.
Assuming nMsgSize[i] is the exact size of the msgs[i].msg buffer:
ssize_t n;
if (-1 == (n = read(fd, msgs[i].msg, nMsgSize[i] - 1)))
ErrExit("read failed");
msgs[i].msg[n] = 0;
printf("READ:%zd/%d expected bytes, MSG:<<%s>>\n", n, nMsgSize[i] - 1, msgs[i].msg);
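Since read may return fewer bytes than requested, one way to package the advice above is a helper that loops until the buffer is full or the file ends, and then places the terminator. This is only a sketch: the name read_string is invented here, and retrying on EINTR is an assumption about the desired behavior.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read up to cap - 1 bytes into buf, looping over
   short reads, then null-terminate. Returns the number of bytes stored,
   or -1 on a read error or if cap is 0. */
ssize_t read_string(int fd, char *buf, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;                  /* no room even for the terminator */
    while (used < cap - 1) {
        ssize_t n = read(fd, buf + used, cap - 1 - used);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: try again */
            return -1;
        }
        if (n == 0)
            break;                  /* end of file */
        used += (size_t)n;
    }
    buf[used] = '\0';               /* one byte was reserved for this */
    return (ssize_t)used;
}
```

In the loop from the question this could be called as read_string(fd, msgs[i].msg, nMsgSize[i] + 1) if nMsgSize[i] is the number of data bytes wanted, provided that value stays below sizeof msgs[i].msg.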

Why does the buffer contain more data when using the fread function (C programming)

I wrote a program that copies the content of one file to another, but when I use fread() to read data from the file into a buffer, the buffer turns out to hold more data than the text file.
Here's my code
char *buffer;
int size;
FILE *fp1;
fp1 = fopen(src, "r");
if (fp1 == NULL) {
err = errno;
fprintf(stderr, "Value of errno: %d\n", errno);
fprintf(stderr, "Error opening file: %s\n", strerror( err ));
return 0;
}else{
fseek(fp1, 0, SEEK_END);
size = ftell(fp1);
buffer = (char *) malloc(size +1 );
printf("data in Buffer : %s\n",buffer);
printf("size : %d\n",size);
fseek(fp1, 0, SEEK_SET);
fread(buffer,size,1,fp1);
strcat(buffer,"\0");
printf("data in Buffer after fread(): %s\n",buffer);
int a = strlen(buffer);
printf("strlen in Buffer : %d\n",a);
fclose(fp1);
}
FILE *fp2;
fp2 = fopen("disk1.img", "a");
if (fp2 == NULL) {
err = errno;
fprintf(stderr, "Value of errno: %d\n", errno);
fprintf(stderr, "Error opening file: %s\n", strerror( err ));
}else{
rewind(fp2);
printf("data in Buffer before write to destination : %s\n",buffer);
fclose(fp2);
}
The source file contains:
test kub test
ah hahaha 5
Result
data in Buffer : �
size : 26
data in Buffer after fread(): test kub test
ah hahaha 5
U*
strlen in Buffer : 30
data in Buffer before write to destination : test kub test
ah hahaha 5
U*
The file size is 26 bytes and I specify 26 bytes in fread(), but it turns out the buffer contains 30 characters.
I use fread() because I have to write data at a specific position in the destination file. I also added "\0" after fread() because I thought it could help, but it didn't work.
This is the second time I have faced this problem. The first time I worked around it by specifying the number of bytes when reading data from the buffer, but now I want to know why the buffer holds more data than the source file and how to fix it.
--------------------Update----------------------------
I read all the comments, followed user2225104's suggestion, and it worked!
I replaced strcat(buffer,"\0"); with buffer[size] = '\0';
Thank you all for your answers; they helped me understand C programming better.
Result
data in Buffer : 0u
size : 26
data in Buffer after fread(): test kub test
ah hahaha 5
strlen in Buffer : 26
data in Buffer before write to destination : test kub test
ah hahaha 5
The problem is your attempt to 0-terminate and turn the block of chars into a c-string.
strcat(buffer,"\0");
only works if the first string is already 0-terminated. If it were, you would not need it. As you say yourself, the reported string length is larger than your buffer. So strcat() scans past the end of your buffer until it happens to find a 0 byte and then writes a terminator at that same spot, outside the memory you allocated, which leaves the garbage in between in place.
buffer[size] = '\0';
This way of doing it does not assume buffer is a 0-terminated string and will not tamper with memory outside buffer.
On a side note, malloc() can return NULL. Best make it a habit to ALWAYS check the results of heap operation functions, just as checking results on file operations (e.g. fopen()). Basically anything which can go wrong at run-time and is not an invariant should be checked.
There's two kinds of strings in the programming world:
the Pascal kind of string (used by managed languages like C# and Java), where the size of the string is stored as an integer separately
the C kind of strings, where the size is indicated by a terminating "special" character
There's pros and cons for each of them, but the most important thing is that C style strings can't hold binary data -- the terminating character chosen by C is a valid character in a file (obviously).
So instead you emulate Pascal strings and call them "buffers", basically vectors of characters of some kind, with the size stored manually. You can see it in your malloc call, and again in your fread. Then you sort of black out and forget you wrote it and stop using it, but the size is still there, it's not part of the string.
Instead of printing it with printf (which expects null terminated C strings), you should use a character buffer function like fwrite to write it, and give it the size as an argument. Instead you're printing memory past what you allocated (since it doesn't end with 0), overrunning your own buffer. Generally hackers don't need your help, if they put their mind to it, they'll do it themselves :)
As a side note, you don't need size+1 characters -- there's no terminator as explained.
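A rough sketch of that buffer-plus-size approach, under some assumptions: the helper names load_file and dump_buffer are invented for this example, the file is small enough for ftell to report its size in a long, and it is opened in binary mode so the size matches the bytes read.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: load a whole file into a malloc'd buffer and store
   the number of bytes actually read in *len. The caller frees the result.
   Returns NULL if the file cannot be opened, measured, or allocated. */
char *load_file(const char *path, size_t *len)
{
    FILE *fp = fopen(path, "rb");
    char *buf = NULL;
    long size;

    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) == 0 && (size = ftell(fp)) >= 0 &&
        fseek(fp, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)size + 1);  /* +1 only so an empty file still allocates */
        if (buf != NULL)
            *len = fread(buf, 1, (size_t)size, fp);  /* the count that matters */
    }
    fclose(fp);
    return buf;
}

/* Write exactly len bytes; no terminator is needed or looked for.
   Returns 0 on success, -1 if fewer bytes were written. */
int dump_buffer(const char *buf, size_t len, FILE *out)
{
    return fwrite(buf, 1, len, out) == len ? 0 : -1;
}
```

Used as dump_buffer(buf, len, stdout), this prints exactly the bytes that were read, including any 0 bytes in the middle of the data.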
It's because your code is invalid.
fread(buffer,size,1,fp1);
Here you are ignoring the value returned by fread(). Called as fread(buffer, 1, size, fp1), it tells you how many bytes have just been read into the buffer. (With the arguments in your order it counts items of size bytes each, so it can only return 0 or 1.)
strcat(buffer,"\0");
Here you are pointlessly appending a null character after the first null character in the buffer. Remove it.
printf("data in Buffer after fread(): %s\n",buffer);
Here again you are ignoring the count. Assuming you used int count = fread(buffer, 1, size, fp1), this line should be
printf("data in Buffer after fread(): %.*s\n",count,buffer);
Then:
int a = strlen(buffer);
This line is pointless. You shouldn't assume that I/O operations result in null-terminated C strings. There's nothing anywhere that guarantees that. Instead, you should use the count again. So
printf("strlen in Buffer : %d\n",a);
should be
printf("byte count in Buffer : %d\n",count);

Reading strings and integers from one binary text in C

I'm using C and I want to read from a binary file.
I know that it contains strings in the following format: the length of a string, the string itself, the length of the next string, that string itself, and so on...
I want to count the number of times the string Str appears in the binary file.
So I want to do something like this:
int N;
while (!feof(file)){
if (fread(&N, sizeof(int), 1, file)==1)
...
Now I need to get the string itself. I know it's length. Should I do a 'for'
loop and get with fgetc char by char? I know I'm not allowed to use fscanf since
it's not a text file, but can I use fgetc? And would I get what I'm expecting for
my string? (To use dynamic allocation for char* for it with the size of the length
and use strcpy to add it to the current string?)
You could allocate some memory with malloc then fread into that buffer:
char *str;
/* ... */
if (fread(&N, sizeof(int), 1, file)==1)
{
/* check that N > 0 */
str = malloc(N+1);
if (fread(str, sizeof(char), N, file) == N)
{
str[N] = '\0'; /* terminate str */
printf("Read %d chars: %s\n", N, str);
}
free(str);
}
You should probably loop on:
while (fread(&N, sizeof(int), 1, file) == 1)
{
    // Check N for sanity
    char *buffer = malloc(N+1);
    // Check malloc succeeded
    if (fread(buffer, N, 1, file) != 1)
    {
        /* ...process error... */
    }
    buffer[N] = '\0'; // Null terminate for sanity's sake
    /* ...store buffer (the pointer) for later processing so you aren't leaking... */
    /* ...or free it if you won't need it later... */
}
You could use getc() or fgetc() in a loop; that would work. However, the direct fread() is much simpler (and is coded as if it uses getc() in a loop).
You might want to do some sanity checking on N before blindly using it with malloc(). In particular, negative values are likely to lead to much unhappiness.
The file format as written is tied to one class of machine — either big-endian or little-endian, and with the fixed size of int (probably 32-bits). Writing more portable data is slightly fiddlier, but eminently doable — but probably not relevant to you just yet.
Using feof() is seldom the correct way to test for whether to continue with a loop. Indeed, there is not often a need to use feof() in code. When it is used, it is because an I/O operation 'failed' and you need to disambiguate between 'it was not an error — just EOF' and 'there was some sort of error on the device'.
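Putting those points together, a counting loop might look roughly like the sketch below. Several things in it are assumptions, not facts from the question: the function name count_matches, the MAX_RECORD sanity limit, and the choice to treat end of file as a normal finish while reporting everything else as -1. It also assumes the lengths were written as native int values on the same kind of machine.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RECORD 4096   /* assumed upper bound on a record length */

/* Hypothetical helper: count the records equal to target in a file of
   (int length, bytes) pairs. Returns the count, or -1 on an implausible
   length, a truncated record, an allocation failure, or an I/O error. */
long count_matches(FILE *file, const char *target)
{
    size_t target_len = strlen(target);
    long matches = 0;
    int n;

    /* Loop on the result of fread itself instead of testing feof first. */
    while (fread(&n, sizeof n, 1, file) == 1) {
        if (n < 0 || n > MAX_RECORD)
            return -1;                      /* sanity check before malloc */
        char *buf = malloc((size_t)n + 1);
        if (buf == NULL)
            return -1;
        if (fread(buf, 1, (size_t)n, file) != (size_t)n) {
            free(buf);
            return -1;                      /* record shorter than announced */
        }
        /* Compare by length and bytes; no terminator is required. */
        if ((size_t)n == target_len && memcmp(buf, target, target_len) == 0)
            matches++;
        free(buf);
    }
    /* No further length could be read: ferror tells an I/O error from EOF. */
    return ferror(file) ? -1 : matches;
}
```

Comparing with memcmp and the stored length avoids strcpy and strcmp entirely, so records containing 0 bytes are handled too.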

C reading from file, it reads "#"

I am trying to read a file in C. But when I read it and write it to stdout, it also prints # characters, which are not in my file. What is the reason?
#include <stdio.h>
int main() {
FILE *fp;
int br;
char buffer[10];
int i;
fp = fopen("a.txt","r");
while(1) {
br = fread(buffer,1,10,fp);
printf("%s",buffer);
if (br==0)
break;
}
}
Output:
1234567891#2345678912#3456789
12#3456789
12#
The file:
123456789123456789123456789
Your fread call reads up to 10 bytes correctly, but printf with %s requires the string to be null-terminated. You can fix it by increasing the size of the buffer to 11 bytes and writing a zero right after the data following every call to fread, i.e. buffer[br] = 0;.
The other way to go is to tell printf the size of your data by calling printf("%.*s", br, buffer);. You don't need to modify your buffer array then.
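For illustration, the loop from the question could be reshaped along the lines of the second suggestion. This is a sketch: the function name print_stream and the idea of passing the streams as parameters are additions made for the example.

```c
#include <stdio.h>

/* Hypothetical helper: copy text from in to out in 10-byte chunks,
   printing only as many characters as fread reported.
   Returns the total number of bytes read. */
size_t print_stream(FILE *in, FILE *out)
{
    char buffer[10];
    size_t total = 0, br;

    /* The loop ends when fread returns 0 (end of file or error). */
    while ((br = fread(buffer, 1, sizeof buffer, in)) > 0) {
        /* The precision limits %s to br characters, so no terminator is needed. */
        fprintf(out, "%.*s", (int)br, buffer);
        total += br;
    }
    return total;
}
```

Calling print_stream(fp, stdout) after a successful fopen would replace the while(1) loop. Note that %s still stops early at a 0 byte inside the data, so for binary files fwrite(buffer, 1, br, out) is the safer choice.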
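Dynamically allocate your buffer and have it initialized to zeros like this:
char *buffer = calloc(1, 11);
/* do your read loop, reading at most 10 bytes at a time */
free(buffer);
This way you get the zero byte at the end, which terminates the string when printing it. When C prints a string it expects it to be terminated by a NUL (0) byte. Note that this only guarantees a terminator after the 10th byte: a final read shorter than 10 bytes leaves bytes from the previous read in the buffer, and they would be printed again.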

Terminate string full of garbage?

Does C allow placing a string terminator at the end of read bytes that are full of garbage, or is that only guaranteed if the bytes read are chars?
I need to read something like this from stdin but I do not know how many chars to read and EOF is not guaranteed:
Hello World!---full of garbage until 100th byte---
char *var = malloc(100 + 1);
read(0, var, 100); // read from stdin. Unfortunately, I do not know how many bytes to read and stdin is not guaranteed to hold an EOF. (I chose 100 as an educated guess.)
var[100] = '\0'; // Is it possible to place a terminator at the end if most of the read bytes are garbage ?
read() returns the number of bytes that were actually read into the buffer (or -1 in the case of an error). Hence the following should work:
ssize_t n; /* the return type of read, declared in <unistd.h> */
char *var = malloc(100 + 1);
n = read(0, var, 100);
if (n >= 0)
    var[n] = '\0';
else
    perror("read"); /* error */
It is possible to place a terminator at the end, but the end result might be Hello World! and a long string of garbage after that.
Bytes are always chars. If you wanted to accept only printable characters (which the garbage at the end might contain, anyway) you could read the input one character at a time and check if each byte's value is between 0x20 and 0x7E.
Although that's only guaranteed to work with ASCII strings...
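A sketch of that filtering idea, applied to a buffer that has already been read rather than one character at a time. The function name printable_prefix is made up for this example; it uses isprint from <ctype.h>, which in the default "C" locale accepts the range 0x20 to 0x7E mentioned above.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return the length of the leading run of printable
   characters in the first n bytes of buf. */
size_t printable_prefix(const char *buf, size_t n)
{
    size_t i = 0;

    /* The cast matters: passing a negative char value to isprint is undefined. */
    while (i < n && isprint((unsigned char)buf[i]))
        i++;
    return i;
}
```

After n = read(0, var, 100) has succeeded, var[printable_prefix(var, (size_t)n)] = '\0'; would cut the string at the first non-printable byte. Garbage that happens to be printable still gets through, as noted above.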
