fgets() not reading from a text file? - c

I have a function loadsets() (short for load settings) which is supposed to load settings from a text file named Progsets.txt. loadsets() returns 0 on success and -1 when a fatal error is detected. However, the three fgets() calls that actually read from Progsets.txt all seem to fail and return a null pointer, so nothing is loaded and the buffers stay filled with null characters. Is there something wrong with my code? fp was a valid pointer when I ran the code, so the file was opened for reading successfully. So what's wrong?
This code is for loading the default text color of my very basic text editor program using cmd.
headers:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <Windows.h>
#define ARR_SIZE 100
struct FINSETS
{
char color[ARR_SIZE + 1];
char title[ARR_SIZE + 1];
char maxchars[ARR_SIZE + 1];
} SETTINGS;
loadsets():
int loadsets(int* pMAXCHARS) // load settings from a text file
{
FILE *fp = fopen("C:\\Typify\\Settings (do not modify)\\Progsets.txt", "r");
char *color = (char*) malloc(sizeof(char*) * ARR_SIZE);
char *title = (char*) malloc(sizeof(char*) * ARR_SIZE);
char *maxchars = (char*) malloc(sizeof(char*) * ARR_SIZE);
char com1[ARR_SIZE + 1] = "color ";
char com2[ARR_SIZE + 1] = "title ";
int i = 0;
int j = 0;
int k = 0;
int found = 0;
while (k < ARR_SIZE + 1) // fill strings with '\0'
{
color[k] = title[k] = maxchars[k] = '\0';
SETTINGS.color[k] = SETTINGS.maxchars[k] = SETTINGS.title[k] = '\0';
k++;
}
if (!fp) // check for reading errors
{
fprintf(stderr, "Error: Unable to load settings. Make sure that Progsets.txt exists and has not been modified.\a\n\n");
return -1; // fatal error
}
if (!size(fp)) // see if Progsets.txt is not a zero-byte file (it shouldn't be)
{
fprintf(stderr, "Error: Progsets.txt has been modified. Please copy the contents of Defsets.txt to Progsets.txt to manually reset to default settings.\a\n\n");
free(color);
free(title);
free(maxchars);
return -1; // fatal error
}
// PROBLEMATIC CODE:
fgets(color, ARR_SIZE, fp); // RETURNS NULL (INSTEAD OF READING FROM THE FILE)
fgets(title, ARR_SIZE, fp); // RETURNS NULL (INSTEAD OF READING FROM THE FILE)
fgets(maxchars, ARR_SIZE, fp); // RETURNS NULL (INSTEAD OF READING FROM THE FILE)
// END OF PROBLEMATIC CODE:
system(strcat(com1, SETTINGS.color)); // set color of cmd
system(strcat(com2, SETTINGS.title)); // set title of cmd
*pMAXCHARS = atoi(SETTINGS.maxchars);
// cleanup
fclose(fp);
free(color);
free(title);
free(maxchars);
return 0; // success
}
Progsets.txt:
COLOR=$0a;
TITLE=$Typify!;
MAXCHARS=$10000;
EDIT: Here is the definition of the size() function. Since I'm just working with ASCII text files, I assume that every character is one byte and the file size in bytes can be worked out by counting the number of characters. Anything suspicious?
size():
int size(FILE* fp)
{
int size = 0;
int c;
while ((c = fgetc(fp)) != EOF)
{
size++;
}
return size;
}

The problem lies in your use of the size() function. It repeatedly calls fgetc() on the file handle until it gets to the end of the file, incrementing a value to track the number of bytes in the file.
That's not a bad approach (though I'm sure there are better ones that don't involve inefficient character-based I/O) but it does have one fatal flaw that you seem to have overlooked.
After you've called it, you've read the file all the way to the end so that any further reads, such as:
fgets(color, ARR_SIZE, fp);
will simply fail since you're already at the end of the file. You may want to consider something like rewind() before returning from size() - that will put the file pointer back to the start of the file so that you can read it again.
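A minimal sketch of that fix, assuming the byte-counting approach from the question is kept. The name size_then_rewind is made up for this example; the part that matters is the rewind() call before returning.

```c
#include <stdio.h>

/* Count the bytes in an open stream, then move the read position back
 * to the start so that later fgets() calls see the data again. */
long size_then_rewind(FILE *fp)
{
    long n = 0;
    int c; /* int, not char, so that EOF can be told apart from data */

    rewind(fp); /* always count from the beginning */
    while ((c = fgetc(fp)) != EOF)
        n++;
    rewind(fp); /* also clears the end-of-file indicator */
    return n;
}
```

With this in place, the three fgets() calls in loadsets() start reading from the first line of the file again instead of from the end.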

Related

How do I run C code on linux with input file from command line?

I'm trying to do some simple tasks in C and run them from the command line in Linux.
I'm having some problems with both C and running the code from the command line with a given filename given as a parameter. I've never written code in C before.
Remove the even numbers from a file. The file name is transferred to
the program as a parameter in the command line. The program changes
this file.
How do I do these?
read from a file and write the results over the same file
read numbers and not digits from the file (ex: I need to be able to read "22" as a single input, not two separate chars containing "2")
give the filename through a parameter in Linux. (ex: ./main.c file.txt)
my attempt at writing the c code:
#include <stdio.h>
int main ()
{
FILE *f = fopen ("arr.txt", "r");
char c = getc (f);
int count = 0;
int arr[20];
while (c != EOF)
{
if(c % 2 != 0){
arr[count] = c;
count = count + 1;
}
c = getc (f);
}
for (int i=0; i<count; i++){
putchar(arr[i]);
}
fclose (f);
getchar ();
return 0;
}
Here's a complete program which meets your requirements:
write the results over the same file - It keeps a read and write position in the file and copies characters towards the file beginning in case numbers have been removed; at the end, the now shorter file has to be truncated. (Note that with large files, it will be more efficient to write to a second file.)
read numbers and not digits from the file - It is not necessary to read whole numbers, it suffices to store the write start position of a number (this can be done at every non-digit) and the parity of the last digit.
give the filename through a parameter - If you define int main(int argc, char *argv[]), the first parameter is in argv[1] if argc is at least 2.
#include <stdio.h>
#include <ctype.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
if (argc < 2) return 1; // no argument given
FILE *f = fopen(argv[1], "rb+");
if (!f) return 1; // if fopen failed
// read, write and number position
long rpos = 0, wpos = 0, npos = 0;
int even = 0, c; // int to hold EOF
while (c = getc(f), c != EOF)
{
if (isdigit(c)) even = c%2 == 0;
else
{
if (even) wpos = npos, even = 0;
npos = wpos+1; // next may be number
}
fseek(f, wpos++, SEEK_SET);
putc(c, f);
fseek(f, ++rpos, SEEK_SET);
}
ftruncate(fileno(f), wpos); // shorten the file
}
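The note above about writing to a second file could be sketched roughly as follows. This is not the in-place algorithm from the program above: it assumes the input holds whitespace-separated integers, and copy_odd_numbers is a hypothetical helper name.

```c
#include <stdio.h>

/* Copy every odd integer from in to out, one number per line.
 * Returns how many numbers were written. */
int copy_odd_numbers(FILE *in, FILE *out)
{
    int value;
    int written = 0;

    /* %d skips leading whitespace and reads "22" as one number,
     * not as two separate digits. */
    while (fscanf(in, "%d", &value) == 1) {
        if (value % 2 != 0) { /* also true for negative odd numbers */
            fprintf(out, "%d\n", value);
            written++;
        }
    }
    return written;
}
```

A caller would open argv[1] for reading and a temporary file for writing, call the function, close both streams, and then use rename() to move the temporary file over the original.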
I'd do that like this (removing extra declarations => micro optimizations)
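/**
* Check if the file is available.
*/
if (f == NULL)
{
    printf("File is not available\n");
}
else
{
    /**
    * Populate the array with the odd values (the even ones are dropped).
    */
    while ((ch = fgetc(f)) != EOF)
        if (ch % 2 != 0)
            push(ch, &arr); // push() takes the element first, then the array's address
    /**
    * Write those values back to the file. push() prepends, so walk the
    * array backwards to keep the original order (n is the element count).
    */
    rewind(f); // reposition before switching from reading to writing
    for (size_t i = n; i-- > 0; )
        fprintf(f, "%c", arr[i]);
    // the file still has to be truncated to the new, shorter length afterwards
}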
Push implementation:
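size_t n = 0; // number of elements currently stored in the array
void push(int el, int **arr)
{
    int *arr_temp = *arr; // old contents (NULL while the array is still empty)
    *arr = malloc(sizeof(int) * (n + 1)); // room for one more element
    if (*arr == NULL) { *arr = arr_temp; return; } // allocation failed: keep the old array
    (*arr)[0] = el; // the new element goes first
    for (size_t i = 0; i < n; i++)
    {
        (*arr)[i + 1] = arr_temp[i]; // shift the old elements up by one
    }
    free(arr_temp);
    n++;
}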
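To read from and write to the same file without closing and reopening it, open it in an update mode. Be careful with "w+" (writing and reading): it clears the file's content as soon as the file is opened, so the existing numbers would be gone before you could read them. "r+" opens an existing file for both reading and writing and keeps its content.
So, change the line where you open the file to this:
FILE *f = fopen ("arr.txt", "r+");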
You should look for ways of implementing dynamic arrays (pointers and memory management).
With this example you could simply go ahead and write yourself, inside the main loop, a temporary variable that stores a sequence of numbers, and stack those values
Something like this (pseudocode, have fun :)):
DELIMITER one of (',' | '|' | '.' | etc);
char[] temp;
if(ch not DELIMITER)
push ch on temp;
else
push temp to arr and clear its content;
Hope this was useful.
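One possible C rendering of the pseudocode above, as a sketch only. It treats every non-digit character as a delimiter, handles non-negative numbers only, and does not guard against integer overflow. split_numbers is a hypothetical name, and a fixed-size output array stands in for the dynamic array.

```c
#include <ctype.h>
#include <stddef.h>

/* Split text into non-negative integers. Digits accumulate in value
 * until a non-digit delimiter ends the number. Returns how many
 * numbers were stored in out (at most max). */
size_t split_numbers(const char *text, int *out, size_t max)
{
    size_t count = 0;
    int value = 0;
    int in_number = 0; /* 1 while we are in the middle of a number */

    for (const char *p = text; ; p++) {
        if (isdigit((unsigned char)*p)) {
            value = value * 10 + (*p - '0'); /* append the digit */
            in_number = 1;
        } else {
            if (in_number && count < max)
                out[count++] = value; /* delimiter: store the number */
            value = 0;
            in_number = 0;
            if (*p == '\0')
                break; /* end of input */
        }
    }
    return count;
}
```

Calling it on the text "22,7|105.4" with room for eight numbers stores 22, 7, 105 and 4, so "22" arrives as a single value rather than two digits.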

Trying to read an unknown string length from a file using fgetc()

So yeah, I saw many similar questions to this one, but thought I'd try solving it my way. I get a huge number of text blocks after running it (it compiles fine).
I'm trying to read a string of unknown size from a file. I thought about allocating pts with a size of 2 (1 char and the null terminator) and then using malloc to increase the size of the char array for every char that exceeds the size of the array.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
char *pts = NULL;
int temp = 0;
pts = malloc(2 * sizeof(char));
FILE *fp = fopen("txtfile", "r");
while (fgetc(fp) != EOF) {
if (strlen(pts) == temp) {
pts = realloc(pts, sizeof(char));
}
pts[temp] = fgetc(fp);
temp++;
}
printf("the full string is a s follows : %s\n", pts);
free(pts);
fclose(fp);
return 0;
}
You probably want something like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define CHUNK_SIZE 1000 // initial buffer size
int main()
{
int ch; // you need int, not char for EOF
int size = CHUNK_SIZE;
char *pts = malloc(CHUNK_SIZE);
FILE* fp = fopen("txtfile", "r");
int i = 0;
while ((ch = fgetc(fp)) != EOF) // read one char until EOF
{
pts[i++] = ch; // add char into buffer
if (i == size) // if buffer full ...
{
size += CHUNK_SIZE; // increase buffer size
pts = realloc(pts, size); // reallocate new size
}
}
pts[i] = 0; // add NUL terminator
printf("the full string is a s follows : %s\n", pts);
free(pts);
fclose(fp);
return 0;
}
Disclaimers:
this is untested code, it may not work, but it shows the idea
there is absolutely no error checking for brevity, you should add this.
there is room for other improvements, it can probably be done even more elegantly
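As a rough sketch of the error checking those disclaimers ask for, the same chunked-growth idea can be wrapped in a function. read_all is a hypothetical name; CHUNK_SIZE is reused from the code above.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE 1000 /* initial buffer size and growth step */

/* Read the rest of fp into a NUL-terminated heap buffer.
 * Returns NULL on allocation failure; the caller frees the result. */
char *read_all(FILE *fp)
{
    size_t size = CHUNK_SIZE;
    size_t len = 0;
    char *buf = malloc(size);
    int ch; /* int, not char, so that EOF can be detected */

    if (buf == NULL)
        return NULL;
    while ((ch = fgetc(fp)) != EOF) {
        buf[len++] = (char)ch;
        if (len == size) { /* buffer full: grow by one chunk */
            char *tmp = realloc(buf, size + CHUNK_SIZE);
            if (tmp == NULL) { /* keep the old pointer so it can be freed */
                free(buf);
                return NULL;
            }
            buf = tmp;
            size += CHUNK_SIZE;
        }
    }
    buf[len] = '\0'; /* there is always room: the buffer grows when full */
    return buf;
}
```

Assigning the result of realloc to a temporary first avoids losing the original pointer, and with it the memory, if the reallocation fails.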
Leaving aside for now the question of whether you should do this at all:
You're pretty close with this solution, but there are a few mistakes.
while (fgetc(fp) != EOF) {
This line is going to read one char from the file and then discard it after comparing it against EOF. You'll need to save that byte to add to your buffer. A type of syntax like while ((tmp=fgetc(fp)) != EOF) should work.
pts = realloc(pts, sizeof(char));
Check the documentation for realloc, you'll need to pass in the new size in the second parameter.
pts = malloc(2 * sizeof(char));
You'll need to zero this memory after acquiring it. You probably also want to zero any memory given to you by realloc, or you may lose the null off the end of your string and strlen will be incorrect.
But as I alluded to earlier, using realloc in a loop like this when you've got a fair idea of the size of the buffer already is generally going to be non-idiomatic C design. Get the size of the file ahead of time and allocate enough space for all the data in your buffer. You can still realloc if you go over the size of the buffer, but do so using chunks of memory instead of one byte at a time.
Probably the most efficient way (as mentioned in the comment by Fiddling Bits) is to read the whole file in one go (after first getting the file's size):
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/stat.h>
int main()
{
size_t nchars = 0; // Declare here and set to zero...
// ... so we can optionally try using the "stat" function, if the O/S supports it...
struct stat st;
if (stat("txtfile", &st) == 0) nchars = st.st_size;
FILE* fp = fopen("txtfile", "rb"); // Make sure we open in BINARY mode!
if (nchars == 0) // This code will be used if the "stat" function is unavailable or failed ...
{
fseek(fp, 0, SEEK_END); // Go to end of file (NOTE: SEEK_END may not be implemented - but PROBABLY is!)
// If your system doesn't implement SEEK_END, you can use this loop instead of the fseek() above:
// while (fgetc(fp) != EOF) {}
nchars = (size_t)(ftell(fp)); // Position at end of file == size in bytes (the +1 for the NUL terminator is added in the calloc call)
}
char* pts = calloc(nchars + 1, sizeof(char));
if (pts != NULL)
{
fseek(fp, 0, SEEK_SET); // Return to start of file...
fread(pts, sizeof(char), nchars, fp); // ... and read one great big chunk!
printf("the full string is a s follows : %s\n", pts);
free(pts);
}
else
{
printf("the file is too big for me to handle (%zu bytes)!", nchars);
}
fclose(fp);
return 0;
}
On the issue of the use of SEEK_END, see this cppreference page, where it states:
Library implementations are allowed to not meaningfully support SEEK_END (therefore, code using it has no real standard portability).
On whether or not you will be able to use the stat function, see this Wikipedia page. (But it is now available in MSVC on Windows!)

code in C being killed when reading a 250MB file

I am trying to process a 250MB file using a script in C.
The file is basically a dataset and I want to read just some of the columns and (more importantly) break one of them (which is originally a string) into a sequence of characters.
However, even though I have plenty of RAM available, the code is killed by konsole (using KDE Neon) everytime I run it.
The source is available below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
FILE *arquivo;
char *line = NULL;
size_t len = 0;
int i = 0;
int j;
int k;
char *vetor[500];
int acertos[45];
FILE *licmat = fopen("licmat.csv", "w");
//creating the header
fprintf(licmat,"CO_CATEGAD,CO_UF_CURSO,ACERTO09,ACERTO10,ACERTO11,ACERTO12,ACERTO13,ACERTO14,ACERTO15,ACERTO16,ACERTO17,ACERTO18,ACERTO19,ACERTO20,ACERTO21,ACERTO22,ACERTO23,ACERTO24,ACERTO25,ACERTO26,ACERTO27,ACERTO28,ACERTO29,ACERTO30,ACERTO31,ACERTO32,ACERTO33,ACERTO34,ACERTO35\n");
if ((arquivo = fopen("MICRODADOS_ENADE_2017.csv", "r")) == NULL) {
printf ("\nError");
exit(0);
}
//reading one line at a time
while (getline(&line, &len, arquivo)) {
char *ptr = strsep(&line,";");
j=0;
//breaking the line into a vector based on ;
while(ptr != NULL)
{
vetor[j]=ptr;
j=j+1;
ptr = strsep(&line,";");
}
//filtering based on content
if (strcmp(vetor[4],"702")==0 && strcmp(vetor[33],"555")==0) {
//copying some info
fprintf(licmat,"%s,%s,",vetor[2],vetor[8]);
//breaking the string (32) into isolated characters
for (k=0;k<27;k=k+1) {
fprintf(licmat,"%c", vetor[32][k]);
if (k<26) {
fprintf(licmat,",");
}
}
fprintf(licmat,"\n");
}
i=i+1;
}
free(line);
fclose(arquivo);
fclose(licmat);
}
The output is perfect up to the point when the script is killed. The output file is just 640KB long and has about 10000 lines only.
What could be the issue?
It looks to me like you're mishandling the memory buffer managed by getline() - which allocates/reallocates as needed - by the use of strsep(), which seems to manipulate that same pointer value.
Once line has been updated to reflect some other element on the line, it's no longer pointing to the start of allocated memory, and then boom the next time getline() needs to do anything with it.
Use a different variable to pass to strsep():
while (getline(&line, &len, arquivo) > 0) { // getline() returns -1 at end of file or on error; a blank line still returns 1 for its '\n'
char *parseline = line;
char *ptr = strsep(&parseline,";");
// do the same thing later
The key thing here: you're not allowed to muck with the value of line other than to free() it at the end (which you do), and you can't let any other routine do it either.
Edit: updated to reflect getline() returning <0 on error (h/t to @user3121023)
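A self-contained sketch of the same pattern, assuming a glibc-style system where getline() and strsep() are available. max_fields is a hypothetical helper that only counts fields, but it shows line staying untouched while a separate cursor is handed to strsep().

```c
#define _GNU_SOURCE /* for getline() and strsep() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the largest number of ';'-separated fields found on any line. */
size_t max_fields(FILE *fp)
{
    char *line = NULL; /* owned by getline(); only ever passed to free() */
    size_t cap = 0;
    size_t most = 0;

    while (getline(&line, &cap, fp) != -1) {
        char *cursor = line; /* strsep() advances this copy, not line */
        size_t fields = 0;

        while (strsep(&cursor, ";") != NULL)
            fields++;
        if (fields > most)
            most = fields;
    }
    free(line); /* still points at the start of the allocation */
    return most;
}
```

Because line never changes between calls, getline() can safely reuse or reallocate its buffer on every iteration.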

C, Segmentation fault parsing large csv file

I wrote a simple program that would open a csv file, read it, make a new csv file, and only write some of the columns (I don't want all of the columns and am hoping removing some will make the file more manageable). The file is 1.15GB, but fopen() doesn't have a problem with it. The segmentation fault happens in my while loop shortly after the first progress printf().
I tested on just the first few lines of the csv and the logic below does what I want. The strange section for when index == 0 is due to the last column being in the form (xxx, yyy)\n (the , in a comma separated value file is just ridiculous).
Here is the code, the while loop is the problem:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char** argv) {
long size;
FILE* inF = fopen("allCrimes.csv", "rb");
if (!inF) {
puts("fopen() error");
return 0;
}
fseek(inF, 0, SEEK_END);
size = ftell(inF);
rewind(inF);
printf("In file size = %ld bytes.\n", size);
char* buf = malloc((size+1)*sizeof(char));
if (fread(buf, 1, size, inF) != size) {
puts("fread() error");
return 0;
}
fclose(inF);
buf[size] = '\0';
FILE *outF = fopen("lessColumns.csv", "w");
if (!outF) {
puts("fopen() error");
return 0;
}
int index = 0;
char* currComma = strchr(buf, ',');
fwrite(buf, 1, (int)(currComma-buf), outF);
int progress = 0;
while (currComma != NULL) {
index++;
index = (index%14 == 0) ? 0 : index;
progress++;
if (progress%1000 == 0) printf("%d\n", progress/1000);
int start = (int)(currComma-buf);
currComma = strchr(currComma+1, ',');
if (!currComma) break;
if ((index >= 3 && index <= 10) || index == 13) continue;
int end = (int)(currComma-buf);
int endMinusStart = end-start;
char* newEntry = malloc((endMinusStart+1)*sizeof(char));
strncpy(newEntry, buf+start, endMinusStart);
newEntry[end+1] = '\0';
if (index == 0) {
char* findNewLine = strchr(newEntry, '\n');
int newLinePos = (int)(findNewLine-newEntry);
char* modifiedNewEntry = malloc((strlen(newEntry)-newLinePos+1)*sizeof(char));
strcpy(modifiedNewEntry, newEntry+newLinePos);
fwrite(modifiedNewEntry, 1, strlen(modifiedNewEntry), outF);
}
else fwrite(newEntry, 1, end-start, outF);
}
fclose(outF);
return 0;
}
Edit: It turned out the problem was that the csv file had , in places I was not expecting which caused the logic to fail. I ended up writing a new parser that removes lines with the incorrect number of commas. It removed 243,875 lines (about 4% of the file). I'll post that code instead as it at least reflects some of the comments about free():
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char** argv) {
long size;
FILE* inF = fopen("allCrimes.csv", "rb");
if (!inF) {
puts("fopen() error");
return 0;
}
fseek(inF, 0, SEEK_END);
size = ftell(inF);
rewind(inF);
printf("In file size = %ld bytes.\n", size);
char* buf = malloc((size+1)*sizeof(char));
if (fread(buf, 1, size, inF) != size) {
puts("fread() error");
return 0;
}
fclose(inF);
buf[size] = '\0';
FILE *outF = fopen("uniformCommaCount.csv", "w");
if (!outF) {
puts("fopen() error");
return 0;
}
int numOmitted = 0;
int start = 0;
while (1) {
char* currNewLine = strchr(buf+start, '\n');
if (!currNewLine) {
puts("Done");
break;
}
int end = (int)(currNewLine-buf);
char* entry = malloc((end-start+2)*sizeof(char));
strncpy(entry, buf+start, end-start+1);
entry[end-start+1] = '\0';
int commaCount = 0;
char* commaPointer = entry;
for (; *commaPointer; commaPointer++) if (*commaPointer == ',') commaCount++;
if (commaCount == 14) fwrite(entry, 1, end-start+1, outF);
else numOmitted++;
free(entry);
start = end+1;
}
fclose(outF);
printf("Omitted %d lines\n", numOmitted);
return 0;
}
You're malloc'ing but never freeing. Possibly you run out of memory, one of your mallocs returns NULL, and the subsequent call to str(n)cpy segfaults.
Adding free(newEntry); and free(modifiedNewEntry); immediately after the respective fwrite calls should solve your memory shortage.
Also note that inside your loop you compute offsets into the buffer buf, which contains the whole file. These offsets are held in variables of type int, whose maximum value on your system may be too small for the numbers you are handling. Adding large ints may also overflow into a negative value, which is another possible cause of the segfault (negative offsets into buf take you to some address outside the buffer, possibly not even readable).
The malloc(3) function can (and sometimes does) fail.
At least code something like
char* buf = malloc(size+1);
if (!buf) {
fprintf(stderr, "failed to malloc %ld bytes - %s\n", // size is a long, so use %ld; needs <errno.h> and <string.h>
size+1, strerror(errno));
exit (EXIT_FAILURE);
}
I also strongly suggest clearing the successful result of a malloc with memset(buf, 0, size+1) (or otherwise using calloc), not only because the following fread could fail (which you are testing) but also to ease debugging and reproducibility.
The same goes for every other call to malloc or calloc: you should always test them for failure.
Notice that by definition sizeof(char) is always 1. Hence I removed it.
As others pointed out, you have a memory leak because you don't call free appropriately. A tool like valgrind could help.
You need to learn how to use the debugger (e.g. gdb). Don't forget to compile with all warnings and debugging information (e.g. gcc -Wall -g). And improve your code till you get no warnings.
Knowing how to use a debugger is an essential required skill when programming (particularly in C or C++). That debugging skill (and ability to use the debugger) will be useful in every C or C++ program you contribute to.
BTW, you could read your file line by line with getline(3) (which can also fail and you should test that).
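A sketch of that line-by-line approach, assuming a POSIX system with getline(). filter_by_comma_count is a hypothetical name; it keeps only lines with the expected number of commas, so the whole file never has to sit in memory and offsets stay small.

```c
#define _GNU_SOURCE /* exposes getline() on glibc; it is POSIX.1-2008 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h> /* ssize_t */

/* Copy lines with exactly `expected` commas from in to out.
 * Returns the number of lines omitted, or -1 if a read error occurred. */
long filter_by_comma_count(FILE *in, FILE *out, int expected)
{
    char *line = NULL; /* buffer managed by getline() */
    size_t cap = 0;
    ssize_t len;
    long omitted = 0;

    while ((len = getline(&line, &cap, in)) != -1) {
        int commas = 0;

        for (ssize_t i = 0; i < len; i++)
            if (line[i] == ',')
                commas++;
        if (commas == expected)
            fwrite(line, 1, (size_t)len, out);
        else
            omitted++;
    }
    int failed = ferror(in); /* getline() returns -1 for both EOF and errors */
    free(line);
    return failed ? -1 : omitted;
}
```

Only one line is held in memory at a time, and the single buffer that getline() manages is freed once at the end.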

c - get file into array of chars

Hi, I have the code below, where I try to get all the lines of a file into an array. For example, if the file data.txt contains the following:
first line
second line
then with the code below I want the data array to contain the following:
data[0] = "first line";
data[1] = "second line"
My first question: currently I am getting a "Segmentation fault". Why?
This is the exact output I get:
Number of lines is 7475613
Segmentation fault
My second question: Is there any better way to do what i am trying do?
Thanks!!!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char* argv[])
{
FILE *f = fopen("data.txt", "rb");
fseek(f, 0, SEEK_END);
long pos = ftell(f);
fseek(f, 0, SEEK_SET);
char *bytes = malloc(pos);
fread(bytes, pos, 1, f);
int i =0;
int counter = 0;
for(; i<pos; i++)
{
if(*(bytes+i)=='\n') counter++;
}
printf("\nNumber of lines is %d\n", counter);
char* data[counter];
int start=0, end=0;
counter = 0;
int length;
for(i=0; i<pos; i++)
{
if(*(bytes+i)=='\n')
{
end = i;
length =end-start;
data[counter]=(char*)malloc(sizeof(char)*(length));
strncpy(data[counter],
bytes+start,
length);
counter = counter+1;
start = end+1;
}
}
free(bytes);
return 0;
}
First line of the data.txt in this case is not '\n' it is: "23454555 6346346 3463463".
Thanks!
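You need to malloc 1 more char for data[counter], for the terminating NUL.
After strncpy, you need to terminate the destination string yourself, because strncpy does not append a NUL when it copies exactly length characters from a longer source.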
Edit after edit of original question
Number of lines is 7475613
Whooooooaaaaaa, that's a bit too much for your computer!
If the size of a char * is 4, you want to reserve 29902452 bytes (30M) of automatic memory in the allocation of data.
You can allocate that memory dynamically instead:
/* char *data[counter]; */
char **data = malloc(counter * sizeof *data);
/* don't forget to free the memory when you no longer need it */
Edit: second question
My second question: Is there any
better way to do what i am trying do?
Not really; you're doing it right. But maybe you can code without the need to have all that data in memory at the same time.
Read and deal with a single line at a time.
You also need to free(data[counter]); in a loop ... and free(data); before the "you're doing it right" above is correct :)
And you need to check if each of the several malloc() calls succeeded LOL
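Putting those points together, a minimal sketch might look like this. split_lines and free_lines are hypothetical names; the input is assumed to be the bytes buffer already read from the file, with every line ending in '\n'.

```c
#include <stdlib.h>
#include <string.h>

/* Free count strings and the array that holds them. */
void free_lines(char **data, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(data[i]);
    free(data);
}

/* Copy each '\n'-terminated line of bytes[0..len) into its own
 * NUL-terminated string. Returns a heap array of *count strings,
 * or NULL on allocation failure. */
char **split_lines(const char *bytes, size_t len, size_t *count)
{
    size_t lines = 0, start = 0, n = 0;

    for (size_t i = 0; i < len; i++)
        if (bytes[i] == '\n')
            lines++;

    /* heap allocation instead of a huge array on the stack */
    char **data = malloc((lines ? lines : 1) * sizeof *data);
    if (data == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        if (bytes[i] != '\n')
            continue;
        size_t length = i - start;
        data[n] = malloc(length + 1); /* +1 for the terminating NUL */
        if (data[n] == NULL) {
            free_lines(data, n);
            return NULL;
        }
        memcpy(data[n], bytes + start, length);
        data[n][length] = '\0'; /* terminate the copy explicitly */
        n++;
        start = i + 1;
    }
    *count = n;
    return data;
}
```

The caller passes bytes and pos from the question's code, uses the returned strings, and then calls free_lines() once with the same count.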
First of all you need to check if the file got opened correctly or not:
FILE *f = fopen("data.txt", "rb");
if(!f)
{
fprintf(stderr,"Error opening file");
exit (1);
}
If there is error opening the file and you don't check it, you'll get a seg fault when you try to fseek on an invalid file pointer.
Apart from that I see no errors. Tried running the program, by printing the value of the data array at the end, it ran as expected.
One thing to note is that you're opening your file as binary - line termination disciplines may not work as you expect on your platform (UNIX is lf, Windows is cr-lf, some versions of MacOS are cr).
