Why this fscanf() segfaults when a big file is used? - c

I have a function that receives the name of a file as an argument.
The idea is to read each word in the given file and save each one in a linked list (as a struct with a value and a pointer to the next struct).
I could get it working for small files, but when I give a big .txt file I get a segmentation fault.
Using gdb I could figure out that this happens at the while(fscanf(fi, "%s", value) != EOF){ line.
For some reason when the file is bigger the fscanf() segfaults.
As I could figure out the linked list part, here I pasted just enough code to compile and for you to see my problem.
So my question are:
Why fscanf() segfauts with big .txt files (thousands of words), but not with small file (ten words)?
By the way, is there a better way to check for the end of the file?
Thanks in advance.
bool read(const char* file){
// open file
FILE* fi = fopen(file, "r"); //file is a variable that contains the name of the file to be opened
if (fi == NULL)
{
return false;
}
// malloc for value
char* value = malloc(sizeof(int));
// fscanf() until the end of the file
while(fscanf(fi, "%s", value) != EOF){ // HERE IS MY PROBLEM
// some code for the linked list
// where the value will be saved at the linked list
}
// free space
free(value);
// close the file
fclose(fi);
return true;
}

No, here is your problem:
char* value = malloc(sizeof(int)); // <<<<<<< You allocate only place for an int
while(fscanf(fi, "%s", value) != EOF){ // <<<<<<< but you read a huge string
So you end up with a buffer overflow !
You have to make sure that you never overflow the size of your buffer by setting some limits. For example by using the width field of fscanf() to indicate max size of chars to be read for the string:
char* value = malloc(512); // Allocate your buffer
while(fscanf(fi, "%511s", value) != EOF){ // read max 511 chars + 1 char for terminating 0
...

(disclaimer: simplified explanation)
A char* is a pointer to an address of memory. It specifies that it points to an array of characters. A malloc call reserves a block of memory of a certain size.
Your line
char* value = malloc(sizeof(int));
creates a character array that can hold 4 characters (as an int is 4 bytes long generally). And for it to be a complete string the last character has to be a NULL terminator '\0', So really it can only hold 3 readable characters.
You should make that malloc create a block of memory that is larger than the biggest string in the file. Or you could use another safer method such as fgets : http://www.cplusplus.com/reference/cstdio/fgets/

Related

How can I print as strings the content of a .txt file?

Like I said in the title I don't know how to print all the content of a .txt file in C.
Here's an incomplete function that I did:
void
print_from_file(items_t *ptr,char filemane[25]){
char *string_temp;
FILE *fptr;
fptr=fopen(filemane, "r");
if(fptr){
while(!feof(fptr)){
string_temp=malloc(sizeof(char*));
fscanf(fptr,"\n %[a-z | A-Z | 0-9/,.€#*]",string_temp);
printf("%s\n",string_temp);
string_temp=NULL;
}
}
fclose(fptr);
}
I'm pretty sure that there's errors in the fscanf because sometimes it doesn't exit the loop.
Can anyone please correct this?
You're using malloc wrong. Passing sizeof(char*) to malloc means you are only giving your string the amount of memory it would take to hold a pointer to a character(array). So currently, by writing into memory you have not allocated, you have undefined behavior. It is also highly advisable to perform checks on file lenght and otherwise make sure that you do not write more into the string then you allocated to it.
Instead, do something like this:
string_temp=malloc(100*sizeof(char)); // Enough space for 99 characters (99 chars + '\0' terminator)
There are several things to fix in your code.
First of all, you should always check if a file has been opened correctly.
Example:
FILE *fp; //file pointer
if((fp = fopen("file.txt", "r") == NULL) { //check opening
printf("Could not open file"); //or use perror()
exit(0);
}
Also, remember that scanf() and fscanf() return the number of elements they have read. So, for example, if you scan the file one word at a time, you could simplify your program by looping while fscanf(..) == 1.
As a final note, remember to allocate dynamic memory correctly.
You do not want to allocate memory based on a pointer to char size, in fact, you're gonna wanna allocate 1 byte for each character of the string, + 1 for the terminator.
Example:
char name[55];
char * name2;
//To make them of the same size:
name2 = malloc(sizeof(*char)); **WRONG**
name2 = malloc(sizeof(char) * 55); //OK

How do I read lines from a file in C?

I need to read in a file. The first line of the file is the number of lines in the file and it returns an array of strings, with the last element being a NULL indicating the end of the array.
char **read_file(char *fname)
{
char **dict;
printf("Reading %s\n", fname);
FILE *d = fopen(fname, "r");
if (! d) return NULL;
// Get the number of lines in the file
//the first line in the file is the number of lines, so I have to get 0th element
char *size;
fscanf(d, "%s[^\n]", size);
int filesize = atoi(size);
// Allocate memory for the array of character pointers
dict = NULL; // Change this
// Read in the rest of the file, allocting memory for each string
// as we go.
// NULL termination. Last entry in the array should be NULL.
printf("Done\n");
return dict;
}
I put some comments because I know that's what I'm to do, but I can't seem to figure out how to put it in actual code.
To solve this problem you need to do one of two things.
Read the file as characters then convert to integers.
Read the file directly as integers.
For the first, you would use freed into a char array and then use atoi to convert to integer.
For the second, you would use fscanf and use the %d specify to read directly into an int variable;
fscanf does not allocate memory for you. Passing it a random pointer as you have will only cause trouble. (I recommend avoid fscanf).
The question code has a flaw:
char *size;
fscanf(d, "%s[^\n]", size);
Although the above may compile, it will not function as expected at runtime. The problem is that fscanf() needs the memory address of where to write the parsed value. While size is a pointer that can store a memory address, it is uninitialized, and points to no specific memory in the process' memory map.
The following may be a better replacement:
fscanf(d, " %d%*c", &filesize);
See my version of the spoiler code here

Storing a string in char* in C

In the code below, I hope you can see that I have a char* variable and that I want to read in a string from a file. I then want to pass this string back from the function. I'm rather confused by pointers so I'm not too sure what I'm supposed to do really.
The purpose of this is to then pass the array to another function to be searched for a name.
Unfortunately the program crashes as a result and I've no idea why.
char* ObtainName(FILE *fp)
{
char* temp;
int i = 0;
temp = fgetc(fp);
while(temp != '\n')
{
temp = fgetc(fp);
i++;
}
printf("%s", temp);
return temp;
}
Any help would be vastly appreciated.
fgetc returns an int, not a char*. This int is a character from the stream, or EOF if you reach the end of the file.
You're implicitly casting the int to a char*, i.e., interpreting it as an address (turn your warnings on.) When you call printf it reads that address and continues to read a character at a time looking for the null terminator which ends the string, but that address is almost certainly invalid. This is undefined behavior.
I've taken some liberties with what you wanted to accomplish. Rather that deal with pointers, you can just use a fixed sized array as long as you can set a maximum length. I've also included several checks so that you don't run off the end of the buffer or the end of the file. Also important is to make sure that you have a null termination '\0' at the end of the string.
#define MAX_LEN 100
char* ObtainName(FILE *fp)
{
static char temp[MAX_LEN];
int i = 0;
while(i < MAX_LEN-1)
{
if (feof(fp))
{
break;
}
temp[i] = fgetc(fp);
if (temp[i] == '\n')
{
break;
}
i++;
}
temp[i] = '\0';
printf("%s", temp);
return temp;
}
So, there are several problems here:
You're not setting aside any storage for the string contents;
You're not storing the string contents correctly;
You're attempting to read memory that doesn't belong to you;
The way you're attempting to return the string is going to give you heartburn.
1. You're not setting aside storage for the string contents
The line
char *temp;
declares temp as a pointer to char; its value will be the address of a single character value. Since it's declared at local scope without the static keyword, its initial value will be indeterminate, and that value may not correspond to a valid memory address.
It does not set aside any storage for the string contents read from fp; that would have to be done as a separate step, which I'll get to below.
2. You're not storing the string contents correctly
The line
temp = fgetc(fp);
reads the next character from fp and assigns it to temp. First of all, this means you're only storing the last character read from the stream, not the whole string. Secondly, and more importantly, you're assigning the result of fgetc() (which returns a value of type int) to an object of type char * (which is treated as an address). You're basically saying "I want to treat the letter 'a' as an address into memory." This brings us to...
3. You're attempting to read memory that doesn't belong to you
In the line
printf("%s", temp);
you're attempting to print out the string beginning at the address stored in temp. Since the last thing you wrote to temp was most likely a character whose value is < 127, you're telling printf to start at a very low and most likely not accessible address, hence the crash.
4. The way you're attempting to return the string is guaranteed to give you heartburn
Since you've defined the function to return a char *, you're going to need to do one of the following:
Allocate memory dynamically to store the string contents, and then pass the responsibility of freeing that memory on to the function calling this one;
Declare an array with the static keyword so that the array doesn't "go away" after the function exits; however, this approach has serious drawbacks;
Change the function definition;
Allocate memory dynamically
You could use dynamic memory allocation routines to set aside a region of storage for the string contents, like so:
char *temp = malloc( MAX_STRING_LENGTH * sizeof *temp );
or
char *temp = calloc( MAX_STRING_LENGTH, sizeof *temp );
and then return temp as you've written.
Both malloc and calloc set aside the number of bytes you specify; calloc will initialize all those bytes to 0, which takes a little more time, but can save your bacon, especially when dealing with text.
The problem is that somebody has to deallocate this memory when its no longer needed; since you return the pointer, whoever calls this function now has the responsibility to call free() when it's done with that string, something like:
void Caller( FILE *fp )
{
...
char *name = ObtainName( fo );
...
free( name );
...
}
This spreads the responsibility for memory management around the program, increasing the chances that somebody will forget to release that memory, leading to memory leaks. Ideally, you'd like to have the same function that allocates the memory free it.
Use a static array
You could declare temp as an array of char and use the static keyword:
static char temp[MAX_STRING_SIZE];
This will set aside MAX_STRING_SIZE characters in the array when the program starts up, and it will be preserved between calls to ObtainName. No need to call free when you're done.
The problem with this approach is that by creating a static buffer, the code is not re-entrant; if ObtainName called another function which in turn called ObtainName again, that new call will clobber whatever was in the buffer before.
Why not just declare temp as
char temp[MAX_STRING_SIZE];
without the static keyword? The problem is that when ObtainName exits, the temp array ceases to exist (or rather, the memory it was using is available for someone else to use). That pointer you return is no longer valid, and the contents of the array may be overwritten before you can access it again.
Change the function definition
Ideally, you'd like for ObtainName to not have to worry about the memory it has to write to. The best way to achieve that is for the caller to pass target buffer as a parameter, along with the buffer's size:
int ObtainName( FILE *fp, char *buffer, size_t bufferSize )
{
...
}
This way, ObtainName writes data into the location that the caller specifies (useful if you want to obtain multiple names for different purposes). The function will return an integer value, which can be a simple success or failure, or an error code indicating why the function failed, etc.
Note that if you're reading text, you don't have to read character by character; you can use functions like fgets() or fscanf() to read an entire string at a time.
Use fscanf if you want to read whitespace-delimited strings (i.e., if the input file contains "This is a test", fscanf( fp, "%s", temp); will only read "This"). If you want to read an entire line (delimited by a newline character), use fgets().
Assuming you want to read an individual string at a time, you'd use something like the following (assumes C99):
#define FMT_SIZE 20
...
int ObtainName( FILE *fp, char *buffer, size_t bufsize )
{
int result = 1; // assume success
int scanfResult = 0;
char fmt[FMT_SIZE];
sprintf( fmt, "%%%zus", bufsize - 1 );
scanfResult = fscanf( fp, fmt, buffer );
if ( scanfResult == EOF )
{
// hit end-of-file before reading any text
result = 0;
}
else if ( scanfResult == 0 )
{
// did not read anything from input stream
result = 0;
}
else
{
result = 1;
}
return result;
}
So what's this noise
char fmt[FMT_SIZE];
sprintf( fmt, "%%%zus", bufsize - 1 );
about? There is a very nasty security hole in fscanf() when you use the %s or %[ conversion specifiers without a maximum length specifier. The %s conversion specifier tells fscanf to read characters until it sees a whitespace character; if there are more non-whitespace characters in the stream than the buffer is sized to hold, fscanf will store those extra characters past the end of the buffer, clobbering whatever memory is following it. This is a common malware exploit. So we want to specify a maximum length for the input; for example, %20s says to read no more than 20 characters from the stream and store them to the buffer.
Unfortunately, since the buffer length is passed in as an argument, we can't write something like %20s, and fscanf doesn't give us a way to specify the length as an argument the way fprintf does. So we have to create a separate format string, which we store in fmt. If the input buffer length is 10, then the format string will be %10s. If the input buffer length is 1000, then the format string will be %1000s.
The following code expands on that in your question, and returns the string in allocated storage:
char* ObtainName(FILE *fp)
{
int temp;
int i = 1;
char *string = malloc(i);
if(NULL == string)
{
fprintf(stderr, "malloc() failed\n");
goto CLEANUP;
}
*string = '\0';
temp = fgetc(fp);
while(temp != '\n')
{
char *newMem;
++i;
newMem=realloc(string, i);
if(NULL==newMem)
{
fprintf(stderr, "realloc() failed.\n");
goto CLEANUP;
}
string=newMem;
string[i-1] = temp;
string[i] = '\0';
temp = fgetc(fp);
}
CLEANUP:
printf("%s", string);
return(string);
}
Take care to 'free()' the string returned by this function, or a memory leak will occur.

Reading string input from file in C

I have a text file called "graphics" which contains the words "deoxyribonucleic acid".
When I run this code it works and it returns the first character. "d"
int main(){
FILE *fileptr;
fileptr = fopen("graphics.txt", "r");
char name;
if(fileptr != NULL){ printf("hey \n"); }
else{ printf("Error"); }
fscanf( fileptr, "%c", &name);
printf("%c\n", name);
fclose( fileptr );
return 0;
}
When I am using the fscanf function the parameters I am sending are the name of the FILE object, the type of data the function will read, and the name of the object it is going to store said data, correct? Also, why is it that I have to put an & in front of name in fscanf but not in printf?
Now, I want to have the program read the file and grab the first word and store it in name.
I understand that this will have to be a string (An array of characters).
So what I did was this:
I made name into an array of characters that can store 20 elements.
char name[20];
And changed the parameters in fscanf and printf to this, respectively:
fscanf( fileptr, "%s", name);
printf("%s\n", name);
Doing so produces no errors from the compiler but the program crashes and I don't understand why. I am letting fscanf know that I want it to read a string and I am also letting printf know that it will output a string. Where did I go wrong? How would I accomplish said task?
This is a very common problem. fscanf reads data and copies it into a location you provide; so first of all, you need the & because you provide the address of the variable (not the value) - that way fscanf knows where to copy TO.
But you really want to use functions that copy "only as many characters as I have space". This is for example fgets(), which includes a "max byte count" parameter:
char * fgets ( char * str, int num, FILE * stream );
Now, if you know that you only allocated 20 bytes to str, you can prevent reading more than 20 bytes and overwriting other memory.
Very important concept!
A couple of other points. A variable declaration like
char myString[20];
results in myString being a pointer to 20 bytes of memory where you can put a string (remember to leave space for the terminating '\0'!). So you can usemyStringas thechar *argument infscanforfgets`. But when you try to read a single character, and that characters was declared as
char myChar;
You must create the pointer to the memory location "manually", which is why you end up with &myChar.
Note - if you want to read up to white space, fscanf is the better function to use; but it will be a problem if you don't make sure you have the right amount of space allocated. As was suggested in a comment, you could do something like this:
char myBuffer[20];
int count = fscanf(fileptr, "%19s ", myBuffer);
if(count != 1) {
printf("failed to read a string - maybe the name is too long?\n");
}
Here you are using the return value of fscanf (the number of arguments correctly converted). You are expecting to convert exactly one; if that doesn't work, it will print the message (and obviously you will want to do more than print a message…)
Not answer of your question but;
for more efficient memory usage use malloc instead of a static declaration.
char *myName // declara as pointer
myName = malloc(20) // same as char[20] above on your example, but it is dynamic allocation
... // do your stuff
free(myName) // lastly free up your allocated memory for myName

How to read the standard input into string variable until EOF in C?

I am getting "Bus Error" trying to read stdin into a char* variable.
I just want to read whole stuff coming over stdin and put it first into a variable, then continue working on the variable.
My Code is as follows:
char* content;
char* c;
while( scanf( "%c", c)) {
strcat( content, c);
}
fprintf( stdout, "Size: %d", strlen( content));
But somehow I always get "Bus error" returned by calling cat test.txt | myapp, where myapp is the compiled code above.
My question is how do i read stdin until EOF into a variable? As you see in the code, I just want to print the size of input coming over stdin, in this case it should be equal to the size of the file test.txt.
I thought just using scanf would be enough, maybe buffered way to read stdin?
First, you're passing uninitialized pointers, which means scanf and strcat will write memory you don't own. Second, strcat expects two null-terminated strings, while c is just a character. This will again cause it to read memory you don't own. You don't need scanf, because you're not doing any real processing. Finally, reading one character at a time is needlessly slow. Here's the beginning of a solution, using a resizable buffer for the final string, and a fixed buffer for the fgets call
#define BUF_SIZE 1024
char buffer[BUF_SIZE];
size_t contentSize = 1; // includes NULL
/* Preallocate space. We could just allocate one char here,
but that wouldn't be efficient. */
char *content = malloc(sizeof(char) * BUF_SIZE);
if(content == NULL)
{
perror("Failed to allocate content");
exit(1);
}
content[0] = '\0'; // make null-terminated
while(fgets(buffer, BUF_SIZE, stdin))
{
char *old = content;
contentSize += strlen(buffer);
content = realloc(content, contentSize);
if(content == NULL)
{
perror("Failed to reallocate content");
free(old);
exit(2);
}
strcat(content, buffer);
}
if(ferror(stdin))
{
free(content);
perror("Error reading from stdin.");
exit(3);
}
EDIT: As Wolfer alluded to, a NULL in your input will cause the string to be terminated prematurely when using fgets. getline is a better choice if available, since it handles memory allocation and does not have issues with NUL input.
Since you don't care about the actual content, why bother building a string? I'd also use getchar():
int c;
size_t s = 0;
while ((c = getchar()) != EOF)
{
s++;
}
printf("Size: %z\n", s);
This code will correctly handle cases where your file has '\0' characters in it.
Your problem is that you've never allocated c and content, so they're not pointing anywhere defined -- they're likely pointing to some unallocated memory, or something that doesn't exist at all. And then you're putting data into them. You need to allocate them first. (That's what a bus error typically means; you've tried to do a memory access that's not valid.)
(Alternately, since c is always holding just a single character, you can declare it as char c and pass &c to scanf. No need to declare a string of characters when one will do.)
Once you do that, you'll run into the issue of making sure that content is long enough to hold all the input. Either you need to have a guess of how much input you expect and allocate it at least that long (and then error out if you exceed that), or you need a strategy to reallocate it in a larger size if it's not long enough.
Oh, and you'll also run into the problem that strcat expects a string, not a single character. Even if you leave c as a char*, the scanf call doesn't make it a string. A single-character string is (in memory) a character followed by a null character to indicate the end of the string. scanf, when scanning for a single character, isn't going to put in the null character after it. As a result, strcpy isn't going to know where the end of the string is, and will go wandering off through memory looking for the null character.
The problem here is that you are referencing a pointer variable that no memory allocated via malloc, hence the results would be undefined, and not alone that, by using strcat on a undefined pointer that could be pointing to anything, you ended up with a bus error!
This would be the fixed code required....
char* content = malloc (100 * sizeof(char));
char c;
if (content != NULL){
content[0] = '\0'; // Thanks David!
while ((c = getchar()) != EOF)
{
if (strlen(content) < 100){
strcat(content, c);
content[strlen(content)-1] = '\0';
}
}
}
/* When done with the variable */
free(content);
The code highlights the programmer's responsibility to manage the memory - for every malloc there's a free if not, you have a memory leak!
Edit: Thanks to David Gelhar for his point-out at my glitch! I have fixed up the code above to reflect the fixes...of course in a real-life situation, perhaps the fixed value of 100 could be changed to perhaps a #define to make it easy to expand the buffer by doubling over the amount of memory via realloc and trim it to size...
Assuming that you want to get (shorter than MAXL-1 chars) strings and not to process your file char by char, I did as follows:
#include <stdio.h>
#include <string.h>
#define MAXL 256
main(){
char s[MAXL];
s[0]=0;
scanf("%s",s);
while(strlen(s)>0){
printf("Size of %s : %d\n",s,strlen(s));
s[0]=0;
scanf("%s",s);
};
}

Resources