Segfault during a sprintf() - c

So, I am currently working on a systems programming assignment for my Unix OS class. All that this program should do is read a binary file and output the lines to a CSV file. I feel like I'm almost done, but for some reason I keep getting a segfault.
To clarify:
fd1 = input file,
fd2 = output file,
numrecs = number of records from input file.
Somewhere in main():
for(i=0;i<numrecs;i++){
if((bin2csv(fd1, fd2)) == -1){
printf("Error converting data.\n");
}
}
int bin2csv(fd1, fd2){
bin_record rec;
char buffer[100];
int buflen;
strncpy(buffer,"\0", 100); /* fill buffer with NULL */
recs = &rec;
/* read in a record */
if((buflen = read(fd1, &recs, sizeof(recs))) < 0){
printf("Fatal Error: Data could not be read.\n");
return -1;
}
sprintf(buffer, "%d, %s, %s, %f, %d\n", recs->id, recs->lname, recs->fname, recs->gpa, recs->iq);
printf("%s\n", buffer);
write(fd2, buffer, sizeof(buffer));
return 0;
}
The segfault is occurring on the line "sprintf(buffer, etc..);" however, I cannot figure out why that is happening.
This is the error gdb spits out:
Program received signal SIGSEGV, Segmentation fault.
0x0000000100000c87 in bin2csv (fd1=3, fd2=4) at bin2csv.c:25
25 sprintf(buffer, "%d, %s, %s, %f, %d\n", recs->id, recs->lname,
recs->fname, recs->gpa, recs->iq);
Hopefully this is enough info. Thanks!

It looks like recs is a pointer. You are reading bytes directly into the pointer variable itself, so raw bytes from the file overwrite the address it holds:
read(fd1, &recs, sizeof(recs))
And then you start using it in the call to sprintf... BOOM!
There is actually no reason to use it at all (is it a global?)... Even though you initialised it with recs = &rec, and assuming you don't trash it, it still will not contain a valid address outside of that function. That's because rec is a local variable.
So, just read directly into rec like this:
read(fd1, &rec, sizeof(rec))
And then on the sprintf line, you use rec.id instead of recs->id (etc).
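A minimal sketch of that fix, for illustration only. The question never shows bin_record, so the field names and sizes below are assumptions, and the return convention (1 per record, 0 at end of file, -1 on error) is my own choice rather than the asker's:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical record layout: the real bin_record is not shown in the
   question, so these field sizes are assumptions. */
typedef struct {
    int   id;
    char  lname[20];
    char  fname[20];
    float gpa;
    int   iq;
} bin_record;

/* Convert one record from fd_in into one CSV line on fd_out.
   Returns 1 on success, 0 at end of file, -1 on error. */
int bin2csv(int fd_in, int fd_out)
{
    bin_record rec;              /* the record itself, not a pointer to it */
    char line[128];
    ssize_t got;
    int len;

    got = read(fd_in, &rec, sizeof rec);   /* fill the struct's own bytes */
    if (got == 0)
        return 0;                          /* clean end of file */
    if (got != (ssize_t)sizeof rec)
        return -1;                         /* read error or truncated record */

    /* Do not trust the file to have terminated the name fields. */
    rec.lname[sizeof rec.lname - 1] = '\0';
    rec.fname[sizeof rec.fname - 1] = '\0';

    len = snprintf(line, sizeof line, "%d, %s, %s, %f, %d\n",
                   rec.id, rec.lname, rec.fname, rec.gpa, rec.iq);
    if (len < 0 || (size_t)len >= sizeof line)
        return -1;                         /* formatting failed or truncated */

    /* Write only the formatted bytes, not the whole buffer. */
    if (write(fd_out, line, (size_t)len) != (ssize_t)len)
        return -1;
    return 1;
}
```

With this convention the loop in main() can stop as soon as bin2csv() returns 0, instead of trusting numrecs.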

I see a few issues here:
sprintf does nothing to prevent writing past the end of the string buffer. In fact, it has no knowledge of the length of that buffer (100 bytes in your case). Since you have set up the buffer on the stack, if sprintf overruns it (which it could do with long first or last names, or with garbage strings as input), your stack will be corrupted and a seg fault is likely. You may want to add logic to ensure that sprintf cannot exceed the buffer space you have. Or better yet, avoid sprintf altogether (more on that below).
You are not handling end-of-file in the code provided. At end of file, read returns 0 and stores nothing, so the record you then format is stale or uninitialized. If you pass bad pointers to sprintf, it will fail.
The functions that you are using are the UNIX derived ones (part of POSIX but decidedly low level) that use small integers as file descriptors. I would recommend using the FILE * based ones instead. The I/O functions of interest would be fopen, fclose, fprintf, fwrite, etc. This would eliminate the need to use sprintf.
See this previous question for more information.
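A rough sketch of what the FILE * version could look like, under the same caveat as any example here: the bin_record layout is assumed, not taken from the question, and bin2csv_stdio is a made-up name. fprintf formats straight into the output stream, so there is no intermediate buffer to overrun:

```c
#include <stdio.h>

/* Hypothetical record layout (the question does not show the real one). */
typedef struct {
    int   id;
    char  lname[20];
    char  fname[20];
    float gpa;
    int   iq;
} bin_record;

/* Convert one record from 'in' into one CSV line on 'out'.
   Returns 1 on success, 0 at end of file, -1 on error. */
int bin2csv_stdio(FILE *in, FILE *out)
{
    bin_record rec;

    /* fread returns the number of complete items read: 1 or 0 here. */
    if (fread(&rec, sizeof rec, 1, in) != 1)
        return ferror(in) ? -1 : 0;

    /* The %.*s precision caps each name at its field size, so a name
       field without a terminating zero byte cannot be overread. */
    if (fprintf(out, "%d, %.*s, %.*s, %f, %d\n",
                rec.id,
                (int)sizeof rec.lname, rec.lname,
                (int)sizeof rec.fname, rec.fname,
                rec.gpa, rec.iq) < 0)
        return -1;
    return 1;
}
```

The streams would come from fopen(path, "rb") and fopen(path, "w"), and the caller loops until the function stops returning 1.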

if((buflen = read(fd1, &recs, sizeof(recs))) < 0){
Use <= 0 rather than < 0, else when the return value is 0, sprintf(buffer ... may seg fault as it tries to de-reference recs->id which has an uninitialized value.

You have some problems:
1) The structure of bin_record. It has char[] fields, and it is possible to overflow them.
2) With sprintf you cannot set a maximum buffer size. It is better to use snprintf, like this:
snprintf(buffer, 100, "%d, %s, %s, %f, %d\n", recs->id, recs->lname, recs->fname, recs->gpa, recs->iq);
3) To fill the buffer with null bytes, use this:
memset (buffer,'\0',100);

Related

How to avoid calling fopen() with a buffer that is not null-terminated in C?

Let's look at this example:
static FILE *open_file(const char *file_path)
{
char buf[80];
size_t n = snprintf(buf, sizeof (buf), "%s", file_path);
assert(n < sizeof (buf));
return fopen(buf, "r");
}
Here, the assert() is off-by-one. From the manpage for snprintf:
"Upon successful return, these functions return the number of characters printed (excluding the null byte used to end output to strings)."
So, if it returns 80, then the string will fill the buffer, and won't be terminated by \0. This will cause a problem because fopen() assumes it is null terminated.
What is the best way to prevent this?
So, if it returns 80, then the string will fill the buffer, and won't be terminated by \0
That is incorrect: the string would be null-terminated no matter what you pass for file_path. Obviously, the string would be cut off at the sizeof(buf)-1.
Note that snprintf could return a number above 80 as well. This would mean that the string you wanted to print was longer than the buffer you have provided.
What is the best way to prevent this?
You are already doing it: the assert is not necessary for preventing unterminated strings. You can use the return value to decide if any truncation has happened, and pass a larger buffer to compensate:
// Figure out the size
size_t n = snprintf(NULL, 0, "%s", file_path);
// Allocate the buffer and print into it
char *tmpBuf = malloc(n+1);
snprintf(tmpBuf, n+1, "%s", file_path);
// Prepare the file to return
FILE *res = fopen(tmpBuf, "r");
// Free the temporary buffer
free(tmpBuf);
return res;
What is the best way to prevent this?
Simple, don't give it a non-null terminated string. Academic questions aside, you are in control of the code you write. You don't have to protect against yourself sabotaging the project in every conceivable way, you just have to not sabotage yourself.
If everyone checked and double checked everything in code, the performance loss would be incredible. There's a reason why fopen doesn't do it.
There are a couple of issues here.
First assert() is used to catch issues as a part of designer testing. It is not meant to be used in production code.
Secondly if the file path is not complete then do you really want to call fopen()?
Normally what is done is to add one to the expected number of characters.
static FILE *open_file(const char *file_path)
{
char buf[80 + 1] = {0};
size_t n = snprintf(buf, sizeof (buf), "%s", file_path);
assert(n < sizeof (buf));
return fopen(buf, "r");
}
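If the path must not be silently cut off, one option is to check the snprintf return value at run time and refuse to open anything that did not fit. This is only a sketch: open_file_checked is a hypothetical name, and reporting the problem through errno = ENAMETOOLONG is my own choice of convention.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical variant of open_file(): returns NULL instead of opening
   a truncated path. */
static FILE *open_file_checked(const char *file_path)
{
    char buf[80];
    int n = snprintf(buf, sizeof buf, "%s", file_path);

    /* n is the length the full string needs, excluding the terminator.
       Anything >= sizeof buf means the copy in buf was cut short. */
    if (n < 0 || (size_t)n >= sizeof buf) {
        errno = ENAMETOOLONG;
        return NULL;
    }
    return fopen(buf, "r");
}
```

Unlike assert(), this check stays active in builds compiled with NDEBUG.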

How to properly print file content to the command line in C?

I want to print the contents of a .txt file to the command line like this:
main() {
int fd;
char buffer[1000];
fd = open("testfile.txt", O_RDONLY);
read(fd, buffer, strlen(buffer));
printf("%s\n", buffer);
close(fd);
}
The file testfile.txt looks like this:
line1
line2
line3
line4
The program prints only the first 4 letters: "line".
When using sizeof instead of strlen the whole file is printed.
Why is strlen not working?
It is incorrect to use strlen at all in this program. Before the call to read, the buffer is uninitialized and applying strlen to it has undefined behavior. After the call to read, some number of bytes of the buffer are initialized, but the buffer is not necessarily a proper C string; strlen(buffer) may return a number having no relationship to the amount of data you should print out, or may still have UB (if read initialized the full length of the array with non-nul bytes, strlen will walk off the end). For the same reason, printf("%s\n", buffer) is wrong.
Your program also can't handle files larger than the buffer at all.
The right way to do this is by using the return value of read, and write, in a loop. To tell read how big the buffer is, you use sizeof. (Note: if you had allocated the buffer with malloc rather than as a local variable, then you could not use sizeof to get its size; you would have to remember the size yourself.)
#include <unistd.h>
#include <stdio.h>
int main(void)
{
char buf[1024];
ssize_t n;
while ((n = read(0, buf, sizeof buf)) > 0)
write(1, buf, n);
if (n < 0) {
perror("read");
return 1;
}
return 0;
}
Exercise: cope with short writes and write errors.
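One possible take on that exercise, as a sketch rather than a reference solution. The helper names write_all and copy_fd are made up here; the idea is simply to keep calling write until every byte has been accepted, and to retry when a call is interrupted by a signal (EINTR):

```c
#include <errno.h>
#include <unistd.h>

/* Write all len bytes of data to fd, retrying after short writes.
   Returns 0 on success, -1 on error (errno is set by write). */
int write_all(int fd, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            return -1;
        }
        p += n;                    /* skip the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything from in_fd to out_fd.
   Returns 0 on success, -1 on a read or write error. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[1024];
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
    }
    return 0;
}
```

Calling copy_fd(0, 1) reproduces the program above with the error handling filled in.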
When using sizeof instead of strlen the whole file is printed. Why is
strlen not working?
Because how strlen works is it goes through the char array passed in and counts characters till it encounters 0. In your case, buffer is not initialized - hence it will try to access elements of uninitialized array (buffer) to look for 0, but reading uninitialized memory is not allowed in C. Actually you get undefined behavior.
sizeof works differently and returns the number of bytes of the passed object directly without looking for a 0 inside the array as strlen does.
As correctly noted in other answers read will not null terminate the string for you so you have to do it manually or declare buffer as:
char buffer[1000] = {0};
In this case, printing the buffer with %s and printf after reading the file will work, but only if read did not fill the entire array with non-zero bytes.
Extra:
Null terminating a string means you append a 0 to it somewhere. This is how most of the string related functions guess where the string ends.
Why is strlen not working?
Because when you call it in read(fd, buffer, strlen(buffer));, you haven't yet assigned a valid string to buffer. It contains some indeterminate data which may or may not have a 0-valued element. Based on the behavior you report, buffer just so happens to have a 0 at element 4, but that's not reliable.
The third parameter tells read how many bytes to read from the file descriptor - if you want to read as many bytes as buffer is sized to hold, use sizeof buffer. read will return the number of bytes read from fd (0 for EOF, -1 for an error). read does not zero-terminate the input, so using strlen on buffer after calling read would still be an error.
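If the goal is a buffer that is safe to hand to printf("%s"), a small sketch of the pattern is to leave one byte free and terminate at the position read reports. read_string is a hypothetical helper name, not a library function:

```c
#include <unistd.h>

/* Read at most cap - 1 bytes from fd into buf and zero-terminate the
   result. Returns the number of bytes stored, or -1 on error. */
ssize_t read_string(int fd, char *buf, size_t cap)
{
    ssize_t n;

    if (cap == 0)
        return -1;                 /* no room even for the terminator */

    n = read(fd, buf, cap - 1);    /* keep one byte for the '\0' */
    if (n < 0)
        return -1;

    buf[n] = '\0';                 /* terminate exactly after the data */
    return n;
}
```

This still reads only one chunk, and a zero byte inside the file would cut the printed text short, so it only suits small text files.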

Why does the buffer contain more data after using fread()? (C programming)

I wrote a program that copies the content of one file to another, but when I use fread() to read data from the file into a buffer, the buffer turns out to contain more data than the text file.
Here's my code
char *buffer;
int size;
FILE *fp1;
fp1 = fopen(src, "r");
if (fp1 == NULL) {
err = errno;
fprintf(stderr, "Value of errno: %d\n", errno);
fprintf(stderr, "Error opening file: %s\n", strerror( err ));
return 0;
}else{
fseek(fp1, 0, SEEK_END);
size = ftell(fp1);
buffer = (char *) malloc(size +1 );
printf("data in Buffer : %s\n",buffer);
printf("size : %d\n",size);
fseek(fp1, 0, SEEK_SET);
fread(buffer,size,1,fp1);
strcat(buffer,"\0");
printf("data in Buffer after fread(): %s\n",buffer);
int a = strlen(buffer);
printf("strlen in Buffer : %d\n",a);
fclose(fp1);
}
FILE *fp2;
fp2 = fopen("disk1.img", "a");
if (fp2 == NULL) {
err = errno;
fprintf(stderr, "Value of errno: %d\n", errno);
fprintf(stderr, "Error opening file: %s\n", strerror( err ));
}else{
rewind(fp2);
printf("data in Buffer before write to destination : %s\n",buffer);
fclose(fp2);
}
The source file contains
test kub test ah hahaha 5
Result
data in Buffer : �
size : 26
data in Buffer after fread(): test kub test
ah hahaha 5
U*
strlen in Buffer : 30
data in Buffer before write to destination : test kub test
ah hahaha 5
U*
The file size is 26 bytes and I specify 26 bytes in fread(), but it turns out the buffer contains 30 characters.
I use fread() because I have to write data at a specific position in the destination file. I also added "\0" after fread() because I thought it could help, but it didn't work.
This is the second time I have faced this problem. The first time, I specified the number of bytes when reading data from the buffer to work around it, but now I want to know
why the buffer holds more data than the source file and how to fix it.
--------------------Update----------------------------
I read all the comments, then
I followed user2225104's suggestion and it worked!
I replaced strcat(buffer,"\0"); with buffer[size] = '\0';
Thank you all for your answers; they helped me understand C programming better.
Result
data in Buffer : 0u
size : 26
data in Buffer after fread(): test kub test
ah hahaha 5
strlen in Buffer : 26
data in Buffer before write to destination : test kub test
ah hahaha 5
The problem is your attempt to 0-terminate and turn the block of chars into a c-string.
strcat(buffer,"\0");
only works if the first string is already 0-terminated. If it were, you would not need it. As you say yourself, your supposed string length is larger than your buffer. So strcat() scans past the end of your buffer until it happens to find a 0 byte and then writes its terminator there, in memory you do not own.
buffer[size] = '\0';
This way of doing it does not assume buffer is a 0-terminated string and will not tamper with memory outside buffer.
On a side note, malloc() can return NULL. Best make it a habit to ALWAYS check the results of heap operation functions, just as checking results on file operations (e.g. fopen()). Basically anything which can go wrong at run-time and is not an invariant should be checked.
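Putting those points together, a minimal sketch of reading a whole file into a terminated buffer might look like this. read_whole_file is a made-up helper name; it terminates at the number of bytes fread actually delivered, which can be smaller than what ftell reported (for example on a text-mode stream on some platforms):

```c
#include <stdio.h>
#include <stdlib.h>

/* Read the whole stream into a freshly allocated, zero-terminated buffer.
   Stores the byte count in *out_len when out_len is not NULL.
   Returns NULL on any failure; the caller frees the result. */
char *read_whole_file(FILE *fp, size_t *out_len)
{
    long size;
    char *buffer;
    size_t got;

    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;
    size = ftell(fp);
    if (size < 0)
        return NULL;
    if (fseek(fp, 0, SEEK_SET) != 0)
        return NULL;

    buffer = malloc((size_t)size + 1);   /* +1 for the terminator */
    if (buffer == NULL)
        return NULL;

    /* Item size 1, so the return value is a byte count. */
    got = fread(buffer, 1, (size_t)size, fp);
    if (ferror(fp)) {
        free(buffer);
        return NULL;
    }

    buffer[got] = '\0';                  /* terminate after the real data */
    if (out_len != NULL)
        *out_len = got;
    return buffer;
}
```

For the 26-byte file in the question this would give a length of 26 and a buffer that strlen and printf can use safely, as long as the file itself contains no zero bytes.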
There are two kinds of strings in the programming world:
the Pascal kind of string (used by managed languages like C# and Java), where the size of the string is stored as an integer separately
the C kind of string, where the size is indicated by a terminating "special" character
There are pros and cons for each of them, but the most important thing is that C style strings can't hold binary data -- the terminating character chosen by C is a valid character in a file (obviously).
So instead you emulate Pascal strings and call them "buffers", basically vectors of characters of some kind, with the size stored manually. You can see it in your malloc call, and again in your fread. Then you sort of black out and forget you wrote it and stop using it, but the size is still there, it's not part of the string.
Instead of printing it with printf (which expects null-terminated C strings), you should use a character buffer function like fwrite to write it, and give it the size as an argument. As it stands, you're printing memory past what you allocated (since it doesn't end with 0), overrunning your own buffer. Generally hackers don't need your help; if they put their mind to it, they'll do it themselves :)
As a side note, you don't need size+1 characters -- there's no terminator as explained.
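A tiny sketch of that idea: carry the length alongside the buffer and let fwrite use it. put_bytes is a hypothetical wrapper name, added only to make the success check explicit:

```c
#include <stdio.h>

/* Write exactly len bytes of buf to out, zero bytes included.
   Returns 0 on success, -1 if fewer bytes were written. */
int put_bytes(FILE *out, const char *buf, size_t len)
{
    return fwrite(buf, 1, len, out) == len ? 0 : -1;
}
```

Something like put_bytes(stdout, buffer, size) would print the 26 bytes from the question and nothing more, whether or not the buffer is terminated.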
It's because your code is invalid.
fread(buffer,size,1,fp1);
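Here you are ignoring the count returned by fread(). With the arguments in this order it only reports whether the single size-byte item was read; call it as fread(buffer,1,size,fp1) and the return value tells you how many bytes have just been read into the buffer.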
strcat(buffer,"\0");
Here you are pointlessly appending a null character after the first null character in the buffer. Remove it.
printf("data in Buffer after fread(): %s\n",buffer);
Here again you are ignoring the count. Assuming you used int count = fread(buffer,1,size,fp1), this line should be
printf("data in Buffer after fread(): %.*s\n",count,buffer);
Then:
int a = strlen(buffer);
This line is pointless. You shouldn't assume that I/O operations result in null-terminated C strings. There's nothing anywhere that guarantees that. Instead, you should use the count again. So
printf("strlen in Buffer : %d\n",a);
should be
printf("byte count in Buffer : %d\n",count);

Reading strings and integers from one binary file in C

I'm using C and I want to read from a binary file.
I know that it contains strings in the following way: the length of a string, the string itself, the length of the next string, that string itself, and so on...
I want to count the number of times the string Str appears in the binary file.
So I want to do something like this:
int N;
while (!feof(file)){
if (fread(&N, sizeof(int), 1, file)==1)
...
Now I need to get the string itself. I know it's length. Should I do a 'for'
loop and get with fgetc char by char? I know I'm not allowed to use fscanf since
it's not a text file, but can I use fgetc? And would I get what I'm expecting for
my string? (To use dynamic allocation for char* for it with the size of the length
and use strcpy to add it to the current string?)
You could allocate some memory with malloc then fread into that buffer:
char *str;
/* ... */
if (fread(&N, sizeof(int), 1, file)==1)
{
/* check that N > 0 */
str = malloc(N+1);
if (fread(str, sizeof(char), N, file) == N)
{
str[N] = '\0'; /* terminate str */
printf("Read %d chars: %s\n", N, str);
}
free(str);
}
You should probably loop on:
while (fread(&N, sizeof(int), 1, file) == 1)
{
// Check N for sanity
char *buffer = malloc(N+1);
// Check malloc succeeded
if (fread(buffer, N, 1, file) != 1)
...process error...
buffer[N] = '\0'; // Null terminate for sanity's sake
...store buffer (the pointer) for later processing so you aren't leaking...
...or free it if you won't need it later...
}
You could use getc() or fgetc() in a loop; that would work. However, the direct fread() is much simpler (and is coded as if it uses getc() in a loop).
You might want to do some sanity checking on N before blindly using it with malloc(). In particular, negative values are likely to lead to much unhappiness.
The file format as written is tied to one class of machine — either big-endian or little-endian, and with the fixed size of int (probably 32-bits). Writing more portable data is slightly fiddlier, but eminently doable — but probably not relevant to you just yet.
Using feof() is seldom the correct way to test for whether to continue with a loop. Indeed, there is not often a need to use feof() in code. When it is used, it is because an I/O operation 'failed' and you need to disambiguate between 'it was not an error — just EOF' and 'there was some sort of error on the device'.
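For the original goal of counting how often Str appears, the pieces above could be combined roughly as follows. This is a sketch under stated assumptions: each record is a native int length followed by that many bytes with no stored terminator, a match means the whole record equals the search string, and MAX_STRING_LEN is an arbitrary sanity cap that I made up.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LEN (1 << 20)   /* assumed sanity limit, not from the format */

/* Count the records in 'file' whose string equals 'needle'.
   Returns the count, or -1 on a malformed file or I/O error. */
long count_occurrences(FILE *file, const char *needle)
{
    size_t needle_len = strlen(needle);
    long count = 0;
    int n;

    /* Loop on the read itself instead of on feof(). */
    while (fread(&n, sizeof n, 1, file) == 1) {
        char *str;

        if (n < 0 || n > MAX_STRING_LEN)
            return -1;                       /* implausible length */

        str = malloc((size_t)n + 1);
        if (str == NULL)
            return -1;

        if (n > 0 && fread(str, 1, (size_t)n, file) != (size_t)n) {
            free(str);
            return -1;                       /* file ends inside a string */
        }
        str[n] = '\0';                       /* terminate for sanity's sake */

        if ((size_t)n == needle_len && memcmp(str, needle, needle_len) == 0)
            count++;
        free(str);
    }
    return ferror(file) ? -1 : count;
}
```

The file should be opened with fopen(path, "rb") so that no newline translation disturbs the length fields.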

Cannot read binary video files in GNU/Linux

I'm stuck with an apparently harmless piece of code. I'm trying to read a whole flv video file into a uint8_t array, but for no apparent reason only the first 10 bytes are read.
contents = malloc(size + 1);
if (read(fd, contents, size) < 0)
{
free(contents);
log_message(WARNING, __func__, EMSG_READFILE);
return (NULL);
}
I've tried with fopen and "rb" also, but it seems that Glibc ignores that extra 'b' or something. Any clues?
Thanks in advance.
Edit: Maybe it reads a EOF character?
PS. 'size' is a variable containing the actual file size using stat().
It seems the original code correctly reads the entire content.
The problem seems to be in making use of that binary data - printing it out will truncate at the first null, making it appear that only 10 bytes are present.
You can't use any methods intended for strings or character arrays to output binary data, as they will truncate at the first null byte, making it appear the array is shorter than it really is.
Check out some other questions related to viewing hex data:
how do I print an unsigned char as hex in c++ using ostream?
Converting binary data to printable hex
If you want to append this to a string - in what format? hex? base64? Raw bytes won't work.
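If hex is acceptable, a small sketch of turning raw bytes into a printable string could look like this. hex_encode is a hypothetical helper, not something from the asker's code base:

```c
#include <stddef.h>
#include <stdint.h>

/* Encode len bytes of data as uppercase hex into out, zero-terminated.
   out_cap must be at least 2 * len + 1. Returns 0 on success, -1 if
   the output buffer is too small. */
int hex_encode(const uint8_t *data, size_t len, char *out, size_t out_cap)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t i;

    if (out_cap == 0 || len > (out_cap - 1) / 2)
        return -1;

    for (i = 0; i < len; i++) {
        out[2 * i]     = digits[data[i] >> 4];     /* high nibble */
        out[2 * i + 1] = digits[data[i] & 0x0F];   /* low nibble */
    }
    out[2 * len] = '\0';
    return 0;
}
```

Because every input byte becomes two printable characters, zero bytes in the video data no longer end the string early.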
Here's the original code I posted. A few minor improvements, plus some better diagnostic code:
int i, ret, size = 4096; /* Probably needs to be much bigger */
uint8_t *contents;
contents = malloc(size + 1);
if(contents == NULL)
{
log_message(WARNING, __func__, EMSG_MEMORY);
return (NULL);
}
ret = read(fd, contents, size);
if(ret < 0)
{
/* Error reading file */
free(contents);
log_message(WARNING, __func__, EMSG_READFILE);
return (NULL);
}
for(i = 0;i < ret;++i)
{
printf("%c", contents[i]);
/* printf("%02X", contents[i]); alternatively, print in hex */
}
Now, is ret really 10? Or do you just get 10 bytes when you try to print the output?
The 'read()' function in the C library doesn't necessarily return the whole read in one shot. In fact, if you're reading very much data at all, it usually doesn't give it to you in a single call.
The solution to this is to call read() in a loop, continuing to ask for more data until you've got it all, or until read returns an error, indicated by a negative return value, or end-of-file, indicated by a zero return value.
Something like the following (untested):
contents = malloc(size + 1);
bytesread = 0;
pos = 0;
while (pos < size && (bytesread = read(fd, contents + pos, size - pos)) > 0)
{
pos += bytesread;
}
if (bytesread < 0)
{
free(contents);
log_message(WARNING, __func__, EMSG_READFILE);
return (NULL);
}
/* Go on to use 'contents' now, since it's been filled. Should probably
check that pos == size to make sure the file was the size you expected. */
Note that most C programmers would do this a little differently, probably making 'pos' a pointer which gets moved along, rather than offsetting from 'contents' each time through the loop. But I thought this approach might be clearer.
On success, read() returns the number of bytes read (which may be less than what you asked for, at which point you should ask for the rest). On EOF it will return 0 and on error it will return -1. There are some errors for which you might want to consider re-issuing the read (e.g. EINTR, which happens when you get a signal during a read).
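A sketch of a read loop that also handles the EINTR case mentioned above. read_full is a made-up helper name; it keeps asking for the remainder until the buffer is full, end of file is reached, or a real error occurs:

```c
#include <errno.h>
#include <unistd.h>

/* Read up to size bytes from fd into buf, looping over short reads.
   Returns the number of bytes read (less than size only at end of
   file), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t size)
{
    char *p = buf;
    size_t pos = 0;

    while (pos < size) {
        ssize_t n = read(fd, p + pos, size - pos);
        if (n == 0)
            break;                 /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        pos += (size_t)n;
    }
    return (ssize_t)pos;
}
```

The caller can then compare the result with the size obtained from stat() to confirm the whole file arrived.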

Resources