Loading from file to linked list - c

I am trying to load an unknown amount of data from a file into a linked list.
Load Function
void load(FILE *file, Node **head) // while feof somewhere
{
    char tempArtist[30]={'\0'}, tempAlbum[30]={'\0'}, tempTitle[30]={'\0'}, tempGenre[30]={'\0'}, tempSpace='\0';
    char tempPlay[100]={'\0'}, tempRating[6]={'\0'}, tempMins[8]={'\0'}, tempSecs[8]={'\0'};
    Data *temp;
    temp=(Data*)malloc(sizeof(Data));
    while(!feof(file))
    {
        fscanf(file,"%s",tempArtist);
        fscanf(file,"%s",tempAlbum);
        fscanf(file,"%s",tempTitle);
        fscanf(file,"%s",tempGenre);
        fscanf(file,"%s",tempMins);
        fscanf(file,"%s",tempSecs);
        fscanf(file,"%s",tempPlay);
        fscanf(file,"%s",tempRating);
        temp->mins=strdup(tempMins);
        temp->secs=strdup(tempSecs);
        temp->album=strdup(tempAlbum);
        temp->artist=strdup(tempArtist);
        temp->genre=strdup(tempGenre);
        temp->song=strdup(tempTitle);
        temp->played=strdup(tempPlay);
        temp->rating=strdup(tempRating);
        insertFront(head,temp);
    }
}
The problem I am having is that when I go to print out the list, all the entries are the same as the last one read from the file. I think this has something to do with the strdup(), but I can't get the strings copied into the data type without getting access violations.
What is another (proper) method that I could use to copy the strings read from the file and pass them to the insert?
InsertFront
void insertFront(Node **head, Data *data)
{
    Node *temp=makeNode(data);
    if ((*head) == NULL)
    {
        *head=temp;
    }
    else
    {
        temp->pNext=*head;
        *head=temp;
    }
}
Data struct
typedef struct data
{
    char *artist;
    char *album;
    char *song;
    char *genre;
    char *played;
    char *rating;
    char *mins;
    char *secs;
} Data;
testing file
snoop
heartbeat
swiggity
rap
03
10
25
4
hoodie
cakeboy
birthday
hiphop
02
53
12
5

You use the same instance of temp for all your lines. The allocation should go inside the loop, ideally after you have established that the whole entry was read in successfully.
By the way, feof is not a good way to control input loops. You should check the return value of fscanf instead.
while (1) {
    if (fscanf(file,"%s",tempAlbum) < 1) break;
    if (fscanf(file,"%s",tempArtist) < 1) break;
    // ...
    temp = (Data *) malloc(sizeof(Data));
    temp->album=strdup(tempAlbum);
    temp->artist=strdup(tempArtist);
    // ...
    insertFront(head, temp);
}
Note how the allocation of a new node happens only after a whole record was read.
There are other ways to improve the code. For example, you have very short buffers for the strings that contain numerical data. What if there is a line slip-up and you read a longer token into such a short buffer? Also, your input seems to be line-wise, so it might be better to use fgets instead of fscanf(f, "%s"), which will only read "words" and will have problems with lines that contain spaces.
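For illustration, here is a minimal sketch of a line-oriented reader for this record layout (one field per line, as in the testing file above). It reuses the question's Data, Node, insertFront and strdup; the read_field helper, the uniform buffer sizes, and the minimal error handling are my own assumptions, and <string.h> is needed for strcspn/strdup.
/* Sketch only: read one field per line with fgets, strip the newline,
   and build one Data record per eight fields read. */
static int read_field(FILE *file, char *buf, size_t size)
{
    if (fgets(buf, (int)size, file) == NULL)
        return 0;                       /* EOF or read error */
    buf[strcspn(buf, "\r\n")] = '\0';   /* drop the trailing newline */
    return 1;
}

void load(FILE *file, Node **head)
{
    char artist[100], album[100], title[100], genre[100];
    char mins[100], secs[100], play[100], rating[100];

    while (read_field(file, artist, sizeof artist) &&
           read_field(file, album,  sizeof album)  &&
           read_field(file, title,  sizeof title)  &&
           read_field(file, genre,  sizeof genre)  &&
           read_field(file, mins,   sizeof mins)   &&
           read_field(file, secs,   sizeof secs)   &&
           read_field(file, play,   sizeof play)   &&
           read_field(file, rating, sizeof rating))
    {
        Data *temp = malloc(sizeof *temp);   /* one allocation per record */
        if (temp == NULL)
            break;
        temp->artist = strdup(artist);
        temp->album  = strdup(album);
        temp->song   = strdup(title);
        temp->genre  = strdup(genre);
        temp->mins   = strdup(mins);
        temp->secs   = strdup(secs);
        temp->played = strdup(play);
        temp->rating = strdup(rating);
        insertFront(head, temp);
    }
}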

while (!feof(file)) is wrong. You can do something like:
for (;;) {
    int ret = fscanf(...);   /* one of the fscanf calls shown above */
    if (ret == EOF)
        break;
    /* ... */
}


My professor ran this code successfully, but I cannot. Seems to be an issue with the fopen() and the pointers

Edit: If it matters, I'm using Dev C++ with -std=c99 as an option.
My professor was able to run this code in class and successfully open a file that eventually reads the data into a linked list. When running the exact same code, my program abruptly exits despite the file being successfully opened.
All I'm trying to do is get this code to run. I've solved what he wants us to solve in my own example, but I can't figure out why his code doesn't run on my machine.
I did add in a puts("Success") line to verify the file has been opened, and I've gone through the methods it calls to see if I can find an error, but I cannot.
Here is the method with the issue (I'm assuming)
int ReadFileStoreInList(void)
{
    FILE *cfPtr;
    if ((cfPtr = fopen("Hertz-Homework-9-List.txt", "r")) !=NULL)
    {
        char make[TWELVE]={""}, model[TWELVE]={""}, size[TWELVE]={""}, color[TWELVE]={""}, power[TWELVE]={""};
        char rented = 'A';
        float daily_rate=0.0;
        char dAilyRate[TWELVE];
        fscanf(cfPtr, "%s%s%s%s%s%s", make, model, size, color, power, dAilyRate);
        while (!feof(cfPtr))
        {
            fscanf(cfPtr, "%s%s%s%s%s%f", make, model, size, color, power, &daily_rate);
            rented = 'A';
            add_at_end(make, model, size, color, power, daily_rate, rented);
        }
        printScreenTitleAndHeaderForCars();
        traverse_in_order();
    }
    else
    {
        puts("Input data file could not be opened, I have no new inventory of cars from headquarters\n\n\n");
    }
    fclose(cfPtr);
}
This calls traverse_in_order(); and printScreenTitleAndHeaderForCars();, which I will list below.
void traverse_in_order()
{
    node *ptr;
    if(start==NULL)
    {
        printf("list is empty\n");
        return;
    }
    printScreenTitleAndHeaderForCars();
    for(ptr=start; ptr!=NULL; ptr=(*ptr).next)
        printf("%-12s%-12s%-12s%-12s%-12s%9.2f%12c\n", ptr->make, ptr->model, ptr->size, ptr->color, ptr->power, ptr->daily_rate, ptr->rented);
}
void printScreenTitleAndHeaderForCars()
{
    system("Cls");
    printf("%35s\n\n","Hertz Rental Cars");
    printf("%79s\n","Avail");
    printf("%-12s%-12s%-12s%-12s%-12s%-12s%8s\n", "Make", "Model", "Size", "Color", "Power", "Daily_Rate", "Rented");
    for (int x=0; x< 7; x++)
        printf("----------- ");
    printf("\n");
}
In case the structure/header file was also needed:
#define TWELVE 12
typedef struct node_type
{
    char make[TWELVE];
    char model[TWELVE];
    char size[TWELVE];
    char color[TWELVE];
    char power[TWELVE];
    float daily_rate;
    char rented;
    struct node_type *next;
} node;
node *start=NULL;
int ReadFileStoreInList(void);
void add_at_beginning();
void add_at_end(char make[TWELVE], char model[TWELVE], char size[TWELVE], char color[TWELVE], char power[TWELVE], float daily_rate, char rented);
void add_after_element();
void add_before_element();
void traverse_in_order();
void traverse_in_reverse_order(node *);
void delete_at_beginning();
void delete_at_end();
void delete_after_element();
void delete_before_element();
void sort();
void doSomething(); // menu to ask operator what to do
void RentaCar();
void FindCarAndUpdateAsRented(char *modelSelected);
void ReturnCar();
void toTitleCase(char *aString);
void printScreenTitleAndHeaderForCars();
Requested add_at_end function:
void add_at_end(char make[TWELVE], char model[TWELVE], char size[TWELVE], char color[TWELVE], char power[TWELVE], float daily_rate, char rented)
{
    node *ptr, *loc;
    ptr = (node *) malloc(sizeof(node));
    if(ptr==NULL)
    {
        printf("no space\n");
        return;
    }
    strcpy((*ptr).make,make);
    strcpy((*ptr).model,model);
    strcpy((*ptr).size,size);
    strcpy((*ptr).color,color);
    strcpy((*ptr).power,power);
    (*ptr).daily_rate = daily_rate;
    (*ptr).rented = rented;
    if(start==NULL)
    {
        start=ptr;
        (*start).next=NULL;
    }
    else
    {
        loc = start;
        while((*loc).next != NULL)
            loc=(*loc).next;
        (*loc).next=ptr;
        (*ptr).next=NULL;
    }
}
Interestingly enough, if I rename the file to something that isn't correct, the proper printScreenTitleAndHeaderForCars() text executes and shows on screen. That's why I believe it has something to do with ReadFileStoreInList().
I've tried to debug this for a few hours now, but with the knowledge I've learned in class I just cannot figure out why this doesn't run.
I expect the output to have the header information given from printScreenTitleAndHeaderForCars(), and then the data from the file I'm reading to appear on screen.
When the file is named improperly in my code, it runs this:
Hertz Rental Cars
Avail
Make Model Size Color Power Daily_Rate Rented
----------- ----------- ----------- ----------- ----------- ----------- -----------
list is empty
enter choice
1. Select Model and Rent
2. Select Model and Return
5. traverse in order
11. sort
12. exit
Where "list is empty" is situated, it should be populating the data from the text file and putting it there.
Instead, I just get:
--------------------------------
Process exited after 1.433 seconds with return value 3221225477
Press any key to continue . . .
I have a feeling this has to do with the way the pointers are written, but I'm struggling to understand how he could run the code and I couldn't.
Any knowledge as to why this might happen would be appreciated!
Edit: Text file contents:
make, model, size, color, power,daily rate.
Mazda,3,4-door,Black,4-Cyl,$99.73
Jeep,Cherokee,4-door,Blue,8-Cyl,$131.92
Buick,Regal,4-door,Purple,6-Cyl,$125.19
Fullsize,SUV,5-door,Brown,8-Cyl,$163.94
Chrysler,Pacifica,4-door,Green,6-Cyl,$127.49
Ford,Focus,2-door,Red,4-Cyl,$99.73
VW,Jetta,2-door,Orange,4-Cyl,$94.91
Chevrolet,Suburban,4-door,Yellow,8-Cyl,$204.92
Nissan,Pathfinder,4-door,White,6-Cyl,$145.11
Chevrolet,Spark,2-door,Teal,4-Cyl,$99.55
You concluded in comments that you thought the data file contents were to blame for your issue. Now that you have posted those, I can confirm that they are indeed a contributing factor. Imprudent details of your scanf format also contribute.
In the first place, the call for reading the header is dangerous:
fscanf(cfPtr, "%s%s%s%s%s%s", make, model, size, color, power, dAilyRate);
Since the data are just going to be discarded, there is no point in storing them. Moreover, it's not necessarily safe to assume that the header data will have characteristics matched to those of the associated data. It would be better to read and not assign the whole line:
fscanf(cfPtr, "%*[^\n]");
The * in the format says that the field directive is not to be assigned, only read. Overall, that reads (and ignores) everything up to but not including the first newline. That also allows you to get rid of the abominably-named dAilyRate variable.
Then there is the format for reading the actual data:
fscanf(cfPtr, "%s%s%s%s%s%f", make, model, size, color, power, &daily_rate);
It simply does not match the data. Specifically, the %s field descriptor skips leading whitespace and matches a whitespace-delimited string. Your data are comma-delimited, not whitespace-delimited except for line terminators. As a result, that scanf call will try to write a whole line's worth of data into each of the first five strings, thus overrunning each of their bounds. That's a plausible reason for a segfault.
What's more, the read of daily_rate will fail, since the next data available at that point will be non-numeric. Yet even if the commas were changed to spaces, the rate data would still not be read correctly, because $ is not a valid part of a number. And that, in turn, will throw off the reads for the second and subsequent lines.
The field overruns could have been avoided by specifying maximum field widths in the format. It would, moreover, be prudent to check the return value of scanf() to verify that all fields were read, as @Achal demonstrated in his answer, before relying on that data.
Here's a data file in a format compatible with the formats you're actually using:
make model size color power rate
Mazda 3 4-door Black 4-Cyl 99.73
Jeep Cherokee 4-door Blue 8-Cyl 131.92
Buick Regal 4-door Purple 6-Cyl 125.19
Fullsize SUV 5-door Brown 8-Cyl 163.94
Chrysler Pacifica 4-door Green 6-Cyl 127.49
Ford Focus 2-door Red 4-Cyl 99.73
VW Jetta 2-door Orange 4-Cyl 94.91
Chevrolet Suburban 4-door Yellow 8-Cyl 204.92
Nissan Pathfinder 4-door White 6-Cyl 145.11
Chevrolet Spark 2-door Teal 4-Cyl 99.55
And here's a safer way to read it:
#define STR_SIZE 12

char make[STR_SIZE]={""}, model[STR_SIZE]={""}, size[STR_SIZE]={""}, color[STR_SIZE]={""},
     power[STR_SIZE]={""};
float daily_rate;

fscanf(cfPtr, "%*[^\n]");
while (fscanf(cfPtr, "%11s%11s%11s%11s%11s%f", make, model, size, color, power, &daily_rate) == 6) {
    add_at_end(make, model, size, color, power, daily_rate, 'A');
}
The %11s fields will read up to 11 characters into your 12-character arrays, leaving room for the string terminator that fscanf() will append to each. This will still run into trouble if there is overlong data in the file, but it should not segfault.
I should say also that scanf is difficult to use safely and properly, and that there are other, better alternatives for parsing and consuming the data -- either in the original or in the modified format.
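For instance, a sketch that keeps the original comma-separated file and parses one line at a time with fgets() plus sscanf() scansets might look like the following, reusing the buffers, daily_rate, cfPtr and add_at_end from above. The line buffer size, the %11 field widths (matching the 12-byte arrays), and the handling of the leading $ are assumptions based on the sample data:
char line[256];

/* discard the header line */
if (fgets(line, sizeof line, cfPtr) != NULL)
{
    while (fgets(line, sizeof line, cfPtr) != NULL)
    {
        /* each %11[^,] reads up to 11 characters that are not a comma */
        if (sscanf(line, " %11[^,], %11[^,], %11[^,], %11[^,], %11[^,],$%f",
                   make, model, size, color, power, &daily_rate) == 6)
        {
            add_at_end(make, model, size, color, power, daily_rate, 'A');
        }
    }
}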
Here are a few observations.
Firstly, give your code more readability by using a meaningful macro name instead of TWELVE, e.g. #define NUMBER_OF_ITEMS 12.
Secondly, in the function definition the formal argument make[TWELVE] doesn't look good: you are passing a char array, and it decays to char*, so char *make is enough.
This
void add_at_end(char make[TWELVE], char model[TWELVE], char size[TWELVE], char color[TWELVE], char power[TWELVE], float daily_rate, char rented) { }
can be replaced as
void add_at_end(char *make, char *model, char *size, char *color, char *power, float daily_rate, char rented) { }
And most importantly, this
fscanf(cfPtr, "%s%s%s%s%s%s", make, model, size, color, power, dAilyRate); /* Just Remove it */
just before
while (!feof(cfPtr))
creates an issue: that information is never used, and it gets overwritten by the second fscanf() statement inside the loop. Also do read Why is “while (!feof(file))” always wrong?
Sample code
void add_at_end(char *make, char *model, char *size, char *color, char *power, float daily_rate, char rented)
{
/* same code */
}
And
int ReadFileStoreInList(void)
{
    FILE *cfPtr;
    if ((cfPtr = fopen("input", "r")) !=NULL)
    {
        char make[TWELVE]={""}, model[TWELVE]={""}, size[TWELVE]={""}, color[TWELVE]={""}, power[TWELVE]={""};
        char rented = 'A';
        float daily_rate=0.0;
        char dAilyRate[TWELVE];
        while (fscanf(cfPtr, "%s%s%s%s%s%f", make, model, size, color, power, &daily_rate) == 6)
        {
            rented = 'A';
            add_at_end(make, model, size, color, power, daily_rate, rented);
        }
        printScreenTitleAndHeaderForCars();
        traverse_in_order();
    }
    else
    {
        puts("Input data file could not be opened, I have no new inventory of cars from headquarters\n\n\n");
        return 0; /* in case of fopen failed */
    }
    fclose(cfPtr);
}

Unexpected Output - Storing into 2D array in c

I am reading data from a number of files, each containing a list of words. I am trying to display the number of words in each file, but I am running into issues. For example, when I run my code, I receive the output as shown below.
Almost every amount is correctly displayed with the exception of two files, each containing word counts in the thousands. Every other file only has three digits worth of words, and they seem just fine.
I can only guess what this problem could be (not enough space allocated somewhere?) and I do not know how to solve it. I apologize if this is all poorly worded. My brain is fried and I am struggling. Any help would be appreciated.
I've tried to keep my example code as brief as possible. I've cut out a lot of error checking and other tasks related to the full program. I've also added comments where I can. Thanks.
StopWords.c
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
    char stopwords[2000][60];
    int wordcount;
} LangData;

typedef struct
{
    int languageCount;
    LangData languages[];
} AllData;
int main(int argc, char **argv)
{
    //Initialize data structures and open path directory
    int langCount = 0;
    DIR *d;
    struct dirent *ep;
    d = opendir(argv[1]);
    //Count the number of language files in the directory
    while(readdir(d))
        langCount++;
    //Account for "." and ".." in directory
    //langCount = langCount - 2 THIS MAKES SENSE RIGHT?
    langCount = langCount + 1; //The program crashes if I don't do this, which doesn't make sense to me.
    //Allocate space in AllData for languageCount
    AllData *data = malloc(sizeof(AllData) + sizeof(LangData)*langCount); //Unsure? Seems to work.
    //Reset the directory in preparation for reading data
    rewinddir(d);
    //Copy all words into respective arrays.
    char word[60];
    int i = 0;
    int k = 0;
    int j = 0;
    while((ep = readdir(d)) != NULL) //Probably could've used for loops to make this cleaner. Oh well.
    {
        if (!strcmp(ep->d_name, ".") || !strcmp(ep->d_name, ".."))
        {
            //Filtering "." and ".."
        }
        else
        {
            FILE *entry;
            //Get string for path (i should make this a function)
            char fullpath[100];
            strcpy(fullpath, path); //'path' is set in code omitted from this excerpt
            strcat(fullpath, "\\");
            strcat(fullpath, ep->d_name);
            entry = fopen(fullpath, "r");
            //Read all words from file
            while(fgets(word, 60, entry) != NULL)
            {
                j = 0;
                //Store each word one character at a time (better way?)
                while(word[j] != '\0') //Check for end of word
                {
                    data->languages[i].stopwords[k][j] = word[j];
                    j++; //Move onto next character
                }
                k++; //Move onto next word
                data->languages[i].wordcount++;
            }
            //Display number of words in file
            printf("%d\n", data->languages[i].wordcount);
            i++; //Increment index in preparation for next language file.
            fclose(entry);
        }
    }
}
Output
256 //czech.txt: Correct
101 //danish.txt: Correct
101 //dutch.txt: Correct
547 //english.txt: Correct
1835363006 //finnish.txt: Should be 1337. Of course it's 1337.
436 //french.txt: Correct
576 //german.txt: Correct
737 //hungarian.txt: Correct
683853 //icelandic.txt: Should be 1000.
399 //italian.txt: Correct
172 //norwegian.txt: Correct
269 //polish.txt: Correct
437 //portugese.txt: Correct
282 //romanian.txt: Correct
472 //spanish.txt: Correct
386 //swedish.txt: Correct
209 //turkish.txt: Correct
Do the files have more than 2000 words? You have only allocated space for 2000 words so once your program tries to copy over word 2001 it will be doing it outside of the memory allocated for that array, possibly into the space allocated for "wordcount".
Also I want to point out that fgets reads up to the end of the line or at most n-1 characters (59 in your case), whichever comes first. This will work fine if there is only one word per line in the files you are reading from; otherwise you will have to locate spaces within the string and count words from there.
If you are simply trying to get a word count, then there is no need to store all the words in an array in the first place. Assuming one word per line, the following should work just as well:
char word[60];
while(fgets(word, 60, entry) != NULL)
{
    data->languages[i].wordcount++;
}
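If the files can contain several words per line (the caveat noted above), a small sketch that splits each buffered line on whitespace with strtok() could look like this; the delimiter set is an assumption, <string.h> is required, and lines longer than the buffer would still need a bigger buffer:
char word[60];
while (fgets(word, sizeof word, entry) != NULL)
{
    /* count every whitespace-separated token on the line */
    for (char *tok = strtok(word, " \t\r\n"); tok != NULL; tok = strtok(NULL, " \t\r\n"))
        data->languages[i].wordcount++;
}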
fgets reference- http://www.cplusplus.com/reference/cstdio/
Update
I took another look and you might want to try allocating data as follows:
typedef struct
{
    char stopwords[2000][60];
    int wordcount;
} LangData;

typedef struct
{
    int languageCount;
    LangData *languages;
} AllData;

AllData *data = malloc(sizeof(AllData));
data->languages = malloc(sizeof(LangData)*langCount);
This way memory is being specifically allocated for the languages array.
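As a side note, a variation using calloc() would also zero the memory, so each wordcount starts at 0 rather than at whatever value happened to be in the allocated block (a sketch, not a drop-in replacement):
/* calloc zero-fills both allocations */
AllData *data = calloc(1, sizeof(AllData));
data->languages = calloc(langCount, sizeof(LangData));
data->languageCount = langCount;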
I agree that langCount = langCount - 2 makes sense. What error are you getting?

Two-dimensional char array too large exit code 139

Hey guys, I'm attempting to read in workersinfo.txt and store it into a two-dimensional char array. The file is around 4,000,000 lines with around 100 characters per line. I want to store each file line in the array. Unfortunately, I get exit code 139 (not enough memory). I'm aware I have to use malloc() and free(), but I've tried a couple of things and I haven't been able to make them work. Eventually I have to sort the array by ID number, but I'm stuck on declaring the array.
The file looks something like this:
First Name, Last Name,Age, ID
Carlos,Lopez,,10568
Brad, Patterson,,20586
Zack, Morris,42,05689
This is my code so far:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *ptr_file;
    char workers[4000000][1000];

    ptr_file =fopen("workersinfo.txt","r");
    if (!ptr_file)
        perror("Error");

    int i = 0;
    while (fgets(workers[i],1000, ptr_file)!=NULL){
        i++;
    }

    int n;
    for(n = 0; n < 4000000; n++)
    {
        printf("%s", workers[n]);
    }

    fclose(ptr_file);
    return 0;
}
The Stack memory is limited. As you pointed out in your question, you MUST use malloc to allocate such a big (need I say HUGE) chunk of memory, as the stack cannot contain it.
You can use ulimit to review the limits of your system (usually including the stack size limit).
On my Mac, the limit is 8Mb. After running ulimit -a I get:
...
stack size (kbytes, -s) 8192
...
Or, test the limit using:
#include <sys/resource.h>

struct rlimit rlim;
getrlimit(RLIMIT_STACK, &rlim);
/* rlim.rlim_cur holds the current (soft) stack limit */
I truly recommend you process each database entry separately.
As mentioned in the comments, assigning the memory as static memory would, in most implementations, circumvent the stack.
Still, IMHO, allocating 400MB of memory (or 4GB, depending on which part of your question I look at) is bad form unless totally required - especially for a single function.
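For reference, the static-storage variant mentioned above would simply be the following (same dimensions as in the question, so still roughly 4 GB):
/* static storage duration: placed outside the stack in typical implementations */
static char workers[4000000][1000];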
Follow-up Q1: How to deal with each DB entry separately
I hope I'm not doing your homework or anything... but I doubt your homework would include an assignment to load 400Mb of data into the computer's memory... so... to answer the question in your comment:
The following sketch of single-entry processing isn't perfect - it's limited to 1Kb of data per entry (which I thought to be more than enough for such simple data).
Also, I didn't allow for UTF-8 encoding or anything like that (I followed the assumption that English would be used).
As you can see from the code, we read each line separately and perform error checks to check that the data is valid.
To sort the file by ID, you might consider either reading two lines at a time and sorting them (this would be a slow sort), or creating a sorted node tree with the ID data and the position of the line in the file (get the position before reading the line). Once you have sorted the binary tree, you can sort the data...
... The binary tree might get a bit big. Did you look up sorting algorithms?
#include <stdio.h>
// assuming this is the file structure:
//
// First Name, Last Name,Age, ID
// Carlos,Lopez,,10568
// Brad, Patterson,,20586
// Zack, Morris,42,05689
//
// Then this might be your data structure per line:
struct DBEntry {
char* last_name; // a pointer to the last name
char* age; // a pointer to the name - could probably be an int
char* id; // a pointer to the ID
char first_name[1024]; // the actual buffer...
// I unified the first name and the buffer since the first name is first.
};
// each time you read only a single line, perform an error check for overflow
// and return the parsed data.
//
// return 1 on success or 0 on failure.
int read_db_line(FILE* fp, struct DBEntry* line) {
if (!fgets(line->first_name, 1024, fp))
return 0;
// parse data and review for possible overflow.
// first, zero out data
int pos = 0;
line->age = NULL;
line->id = NULL;
line->last_name = NULL;
// read each byte, looking for the EOL marker and the ',' separators
while (pos < 1024) {
if (line->first_name[pos] == ',') {
// we encountered a divider. we should handle it.
// if the ID field's location is already known, we have an excess comma.
if (line->id) {
fprintf(stderr, "Parsing error, invalid data - too many fields.\n");
return 0;
}
// replace the comma with 0 (separate the strings)
line->first_name[pos] = 0;
if (line->age)
line->id = line->first_name + pos + 1;
else if (line->last_name)
line->age = line->first_name + pos + 1;
else
line->last_name = line->first_name + pos + 1;
} else if (line->first_name[pos] == '\n') {
// we encountered a terminator. we should handle it.
if (line->id) {
// if we have the id string's position (the start marker), this is a
// valid entry and we should process the data.
line->first_name[pos] = 0;
return 1;
} else {
// we reached an EOL without enough ',' separators, this is an invalid
// line.
fprintf(stderr, "Parsing error, invalid data - not enough fields.\n");
return 0;
}
}
pos++;
}
// we ran through all the data but there was no EOL marker...
fprintf(stderr,
"Parsing error, invalid data (data overflow or data too large).\n");
return 0;
}
// the main program
int main(int argc, char const* argv[]) {
// open file
FILE* ptr_file;
ptr_file = fopen("workersinfo.txt", "r");
if (!ptr_file)
perror("File Error");
struct DBEntry line;
while (read_db_line(ptr_file, &line)) {
// do what you want with the data... print it?
printf(
"First name:\t%s\n"
"Last name:\t%s\n"
"Age:\t\t%s\n"
"ID:\t\t%s\n"
"--------\n",
line.first_name, line.last_name, line.age, line.id);
}
// close file
fclose(ptr_file);
return 0;
}
Followup Q2: Sorting array for 400MB-4GB of data
IMHO, 400MB is already touching on the issues related to big data. For example, implementing a bubble sort on your database would be agonizing as far as performance goes (unless it's a one-time task, where performance might not matter).
Creating an array of DBEntry objects will eventually give you a larger memory footprint than the actual data.
This will not be the optimal way to sort large data.
The correct approach will depend on your sorting algorithm. Wikipedia has a decent primer on sorting algorithms.
Since we are handling a large amount of data, there are a few things to consider:
It would make sense to partition the work, so different threads/processes sort a different section of the data.
We will need to minimize IO to the hard drive (as it will slow the sorting significantly and prevent parallel processing on the same machine/disk).
One possible approach is to create a heap for a heap sort, but only storing a priority value and storing the original position in the file.
Another option would probably be to employ a divide and conquer algorithm, such as quicksort, again, only sorting a computed sort value and the entry's position in the original file.
Either way, writing a decent sorting method will be a complicated task, probably involving threading, forking, tempfiles or other techniques.
Here's a simplified demo code... it is far from optimized, but it demonstrates the idea of keeping a sorted structure (implemented below as a sorted linked list of nodes) that holds only the sorting value and the position of the data in the file.
Be aware that using this code will be both relatively slow (although not that slow) and memory intensive...
On the other hand, it will require about 24 bytes per entry. For 4 million entries, that's 96MB - somewhat better than 400MB and definitely better than the 4GB.
#include <stdlib.h>
#include <stdio.h>
// assuming this is the file structure:
//
// First Name, Last Name,Age, ID
// Carlos,Lopez,,10568
// Brad, Patterson,,20586
// Zack, Morris,42,05689
//
// Then this might be your data structure per line:
struct DBEntry {
char* last_name; // a pointer to the last name
char* age; // a pointer to the name - could probably be an int
char* id; // a pointer to the ID
char first_name[1024]; // the actual buffer...
// I unified the first name and the buffer since the first name is first.
};
// this might be a sorting node for a sorted bin-tree:
struct SortNode {
struct SortNode* next; // a pointer to the next node
fpos_t position; // the DB entry's position in the file
long value; // The computed sorting value
}* top_sorting_node = NULL;
// this function will free all the memory used by the global Sorting tree
void clear_sort_heap(void) {
struct SortNode* node;
// as long as there is a first node...
while ((node = top_sorting_node)) {
// step forward.
top_sorting_node = top_sorting_node->next;
// free the original first node's memory
free(node);
}
}
// each time you read only a single line, perform an error check for overflow
// and return the parsed data.
//
// return 0 on success, or -1 on failure.
int read_db_line(FILE* fp, struct DBEntry* line) {
if (!fgets(line->first_name, 1024, fp))
return -1;
// parse data and review for possible overflow.
// first, zero out data
int pos = 0;
line->age = NULL;
line->id = NULL;
line->last_name = NULL;
// read each byte, looking for the EOL marker and the ',' separators
while (pos < 1024) {
if (line->first_name[pos] == ',') {
// we encountered a divider. we should handle it.
// if the ID field's location is already known, we have an excess comma.
if (line->id) {
fprintf(stderr, "Parsing error, invalid data - too many fields.\n");
clear_sort_heap();
exit(2);
}
// replace the comma with 0 (separate the strings)
line->first_name[pos] = 0;
if (line->age)
line->id = line->first_name + pos + 1;
else if (line->last_name)
line->age = line->first_name + pos + 1;
else
line->last_name = line->first_name + pos + 1;
} else if (line->first_name[pos] == '\n') {
// we encountered a terminator. we should handle it.
if (line->id) {
// if we have the id string's position (the start marker), this is a
// valid entry and we should process the data.
line->first_name[pos] = 0;
return 0;
} else {
// we reached an EOL without enough ',' separators, this is an invalid
// line.
fprintf(stderr, "Parsing error, invalid data - not enough fields.\n");
clear_sort_heap();
exit(1);
}
}
pos++;
}
// we ran through all the data but there was no EOL marker...
fprintf(stderr,
"Parsing error, invalid data (data overflow or data too large).\n");
return -1; /* signal failure so the caller stops instead of using an incomplete entry */
}
// read and sort a single line from the database.
// return 0 if there was no data to sort. return 1 if data was read and sorted.
int sort_line(FILE* fp) {
// allocate the memory for the node - use calloc for zero-out data
struct SortNode* node = calloc(sizeof(*node), 1);
// store the position on file
fgetpos(fp, &node->position);
// use a stack allocated DBEntry for processing
struct DBEntry line;
// check that the read succeeded (read_db_line will return -1 on error)
if (read_db_line(fp, &line)) {
// free the node's memory
free(node);
// return no data (0)
return 0;
}
// compute sorting value - I'll assume all IDs are numbers up to long size.
sscanf(line.id, "%ld", &node->value);
// heap sort?
// This is a questionable sort algorithm... or a questionable implementation.
// Also, I'll be using pointers to pointers, so it might be a headache to read
// (it's a headache to write, too...) ;-)
struct SortNode** tmp = &top_sorting_node;
// walk down the list until we find a node whose value is larger than ours,
// OR until the list is finished.
while (*tmp && (*tmp)->value <= node->value)
tmp = &((*tmp)->next);
// update the node's `next` value.
node->next = *tmp;
// inject the new node into the tree at the position we found
*tmp = node;
// return 1 (data was read and sorted)
return 1;
}
// writes the next line in the sorting
int write_line(FILE* to, FILE* from) {
struct SortNode* node = top_sorting_node;
if (!node) // are we done? top_sorting_node == NULL ?
return 0; // return 0 - no data to write
// step top_sorting_node forward
top_sorting_node = top_sorting_node->next;
// read data from one file to the other
fsetpos(from, &node->position);
char* buffer = NULL;
ssize_t length;
size_t buff_size = 0;
length = getline(&buffer, &buff_size, from);
if (length <= 0) {
perror("Line Copy Error - Couldn't read data");
return 0;
}
fwrite(buffer, 1, length, to);
free(buffer); // getline allocates memory that we're in charge of freeing.
return 1;
}
// the main program
int main(int argc, char const* argv[]) {
// open file
FILE *fp_read, *fp_write;
fp_read = fopen("workersinfo.txt", "r");
fp_write = fopen("sorted_workersinfo.txt", "w+");
if (!fp_read) {
perror("File Error");
goto cleanup;
}
if (!fp_write) {
perror("File Error");
goto cleanup;
}
printf("\nSorting");
while (sort_line(fp_read))
printf(".");
// write all sorted data to a new file
printf("\n\nWriting sorted data");
while (write_line(fp_write, fp_read))
printf(".");
// clean up - close files and make sure the sorting tree is cleared
cleanup:
printf("\n");
fclose(fp_read);
fclose(fp_write);
clear_sort_heap();
return 0;
}

lexical analyser correct output but pointer "filename" contains wrong name

The purpose of this code is to read the following text files (d.txt, e.txt, f.txt) and perform the actions required to put the alphabet, in the correct order, into output.txt. The code is supposed to work, since in output.txt I get the correct results, but there is a problem with the test I did using the printf (it's at the end of the newfile function). To run it, I give d.txt and output.txt as input.
It should print
top->prev points to file :d
top->prev points to file :e
but instead it prints the following, and I can't find the reason:
top->prev points to file :d
top->prev points to file :f
d.txt:
abc
#include e.txt
mno
e.txt:
def
#include f.txt
jkl
f.txt:
ghi
code:
%{
#include <stdio.h>
#include <stdlib.h>
struct yyfilebuffer{
YY_BUFFER_STATE bs;
struct yyfilebuffer *prev;
FILE *f;
char *filename;
}*top;
int i;
char temporal[7];
void newfile(char *filename);
void popfile();
void create();
%}
%s INC
%option noyywrap
%%
"#include " {BEGIN INC;}
<INC>.*$ {for(i=1;i<strlen(yytext)-2;i++)
{
temporal[i-1]=yytext[i];
}
newfile(temporal);
BEGIN INITIAL;
}
<<EOF>> {popfile();
BEGIN INITIAL;
}
%%
void main(int argc, char **argv)
{
if ( argc < 3 )
{
printf("\nUsage yybuferstate <filenamein> <filenameout>");
exit(1);
}
else
{
create();
newfile(argv[1]);
yyout = fopen(argv[2], "w");
yylex();
}
system("pause");
}
void create()
{
top = NULL;
}
void newfile(char *filename)
{
struct yyfilebuffer *newptr;
if(top == NULL)
{
newptr = malloc(1*sizeof(struct yyfilebuffer));
newptr->prev = NULL;
newptr->filename = filename;
newptr->f = fopen(filename,"r");
newptr->bs = yy_create_buffer(newptr->f, YY_BUF_SIZE);
top = newptr;
yy_switch_to_buffer(top->bs);
}
else
{
newptr = malloc(1*sizeof(struct yyfilebuffer));
newptr->prev = top;
newptr->filename = filename;
newptr->f = fopen(filename,"r");
newptr->bs = yy_create_buffer(newptr->f, YY_BUF_SIZE);
top = newptr;
yy_switch_to_buffer(top->bs); //edw
}
if(top->prev != NULL)
{
printf("top->prev points to file : %s\n",top->prev->filename);
}
}
void popfile()
{
struct yyfilebuffer *temp;
temp = NULL;
if(top->prev == NULL)
{
printf("\n Error : Trying to pop from empty stack");
exit(1);
}
else
{
temp = top;
top = temp->prev;
yy_switch_to_buffer(top->bs);
system("pause");
}
}
You need to think about how you manage memory, remembering that C does not really have a string type in the way you might be used to from other languages.
You define a global variable:
char temporal[7];
(which has an odd name, since globals are anything but temporary), and then fill in its value in your lexer:
for(i=1;i<strlen(yytext)-2;i++) {
temporal[i-1]=yytext[i];
}
There are at least three problems with the above code:
temporal only has room for a six-character filename, but nowhere do you check to make sure that yyleng is not greater than 6. If it is, you will overwrite random memory. (The flex-generated scanner sets yyleng to the length of the token whose starting address is yytext. So you might as well use that value instead of computing strlen(yytext), which involves a scan over the text.)
You never null-terminate temporal. That's OK the first time, because it has static lifetime and will therefore be filled with zeros at program initialization. But the second and subsequent times you are counting on the new filename to not be shorter than the previous one; otherwise, you'll end up with part of the previous name at the end of the new name.
You could have made much better use of the standard C library. Although for reasons I will note below, this does not solve the problem you observe, it would have been better to use the following instead of the loop, after checking that yyleng is not too big:
memcpy(temporal, yytext + 1, yyleng - 2); /* Copy the filename */
temporal[yyleng - 2] = '\0'; /* NUL-terminate the copy */
Once you make the copy in temporal, you give that to newfile:
newfile(temporal);
And in newfile, what we see is:
newptr->filename = filename;
That does not copy filename. The call to newfile passed the address of temporal as an argument, so within newfile, the value of the parameter filename is the address of temporal. You then store that address in newptr->filename, so newptr->filename is also the address of temporal.
But, as noted above, temporal is not temporary. It is a global variable whose lifetime is the entire lifetime of the program. So the next time your lexical scanner encounters an include directive, it will put it into temporal, overwriting the previous contents. So what then happens to the filename member in the yyfilebuffer structure? Answer: nothing. It still points to the same place, temporal, but the contents of that place have changed. So when you later print out the contents of the string pointed to by that filename field, you'll get a different string from the one which happened to be in temporal when you first created that yyfilebuffer structure.
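To see the effect in isolation, here is a tiny standalone illustration of that aliasing; the file names and buffer size are just examples:
#include <stdio.h>
#include <string.h>

/* Illustration only: both "remembered" pointers refer to the same buffer. */
int main(void)
{
    static char temporal[7];
    const char *first, *second;

    strcpy(temporal, "d.txt");
    first = temporal;            /* "remember" the first name           */
    strcpy(temporal, "f.txt");   /* the buffer is reused for the next   */
    second = temporal;

    printf("%s %s\n", first, second);   /* prints: f.txt f.txt */
    return 0;
}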
On the whole, you'll find it easier to manage memory if newfile and popfile "own" the memory in the filebuffer stack. That means that newfile should make a copy of its argument into freshly-allocated storage, and popfile should free that storage, since it is no longer needed. If newfile makes a copy, then it is not necessary for the lexical-scanner action which calls newfile to make a copy; it is only necessary for it to make sure that the string is correctly NUL-terminated when it calls newfile.
In short, the code might look like this:
/* Changed parameter to const, since we are not modifying its contents */
void newfile(const char *filename) {
    /* Eliminated this check as obviously unnecessary: if(top == NULL) */
    struct yyfilebuffer *newptr = malloc(sizeof(struct yyfilebuffer));
    newptr->prev = top;
    // Here we copy filename. Since I suspect that you are on Windows,
    // I'll write it out in full. Normally, I'd use strdup.
    newptr->filename = malloc(strlen(filename) + 1);
    strcpy(newptr->filename, filename);
    newptr->f = fopen(filename,"r");
    newptr->bs = yy_create_buffer(newptr->f, YY_BUF_SIZE);
    top = newptr;
    yy_switch_to_buffer(top->bs); //edw
    if(top->prev != NULL) {
        printf("top->prev points to file : %s\n",top->prev->filename);
    }
}
void popfile() {
    if(top->prev == NULL) {
        fprintf(stderr, "Error : Trying to pop from empty stack\n");
        exit(1);
    }
    struct yyfilebuffer *temp = top;
    top = temp->prev;
    /* Reclaim memory */
    free(temp->filename);
    free(temp);
    yy_switch_to_buffer(top->bs);
    system("pause");
}
Now that newfile takes ownership of the string passed to it, we no longer need to make a copy. Since the action clearly indicates that you expect the argument to the #include to be something like a C #include directive (surrounded either by "..." or <...>), it is better to make that explicit:
<INC>\".+\"$|"<".+">"$ {
/* NUL-terminate the filename by overwriting the trailing "*/
yytext[yyleng - 1] = '\0';
newfile(yytext + 1);
BEGIN INITIAL;
}

C - segmentation fault using struct member values

I'm running headlong into a segmentation fault, and I'm not sure of the reason behind it.
Short story... I store file names into members of a struct, then use those members to open files to load their data into linked lists. This works fine when I only have two files, but when I go to add a third, I get a segmentation fault opening the first file.
Code will hopefully illustrate better...
int main(int argc, char* argv[])
{
    /* Initialise tennisStore struct */
    TennisStoreType *ts;
    systemInit(ts);

    /* Variables */
    ts->stockFile = "stock.csv";
    ts->custFile = "customer.csv";
    ts->salesFile = "sales.csv";

    /* Load data from files */
    loadData(ts, ts->custFile, ts->stockFile);
    ...
}
The struct details for ts...
typedef struct tennisStore
{
    CustomerNodePtr headCust;
    unsigned customerCount;
    StockNodePtr headStock;
    unsigned stockCount;
    char *custFile;
    char *stockFile;
    char *salesFile;
} TennisStoreType;
systemInit() seems pretty innocuous, but here's the code just in case...
void systemInit(TennisStoreType *ts)
{
    /* Set ts options to be ready */
    ts->headCust = NULL;
    ts->headStock = NULL;
    ts->customerCount = 0;
    ts->stockCount = 0;
}
loadData()...
void loadData(TennisStoreType* ts, char* customerFile, char* stockFile)
{
    /* Load customer data */
    addCustNode(ts, customerFile);

    /* Load stock data */
    addStockNode(ts, stockFile);
}
Here's where the problem occurs...
void addStockNode(TennisStoreType* ts, char* stockFile)
{
    /* Variables */
    StockNodePtr head, new, current, previous;
    unsigned stkLevel;
    char *stkTok1, *stkTok2, *stkTok3, *stkTok4;
    char buf[BUFSIZ];
    float stkPrice;
    FILE *stream;

    /* Set head */
    head = NULL;

    /* Open stock file */
    stream = fopen(stockFile, "r"); /* <-- segmentation fault when the sales.csv line is included */
    assert(stream);

    while (fgets(buf, BUFSIZ, stream))
    {
        ...
    }
    ...
}
As above, when the ts->salesFile = "sales.csv" line is included in main, the segmentation fault occurs. When it isn't, all is fine (the file opens, I can read from it, write to it, etc.). I cannot for the life of me understand why, so I'm appealing to your good nature and superior knowledge of C for potential causes of this problem.
Thanks!
ts is uninitialized and used as-is in systemInit().
It should be malloc()ed.
change
TennisStoreType *ts;
to
TennisStoreType *ts=malloc(sizeof(TennisStoreType));
or
change
TennisStoreType *ts;
systemInit(ts);
to
TennisStoreType ts;
systemInit(&ts);
You never actually created your TennisStoreType object.
int main(int argc, char* argv[])
{
    TennisStoreType *ts;   // <-- allocates 4 bytes for a pointer
    systemInit(ts);        // <-- pass the pointer to nowhere around.
Try inserting ts = malloc(sizeof(TennisStoreType)) in between those two lines.
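A minimal sketch of that fix inside main(), with an allocation check added; the error message and the use of EXIT_FAILURE are illustrative, and <stdio.h>/<stdlib.h> are assumed to be included:
TennisStoreType *ts = malloc(sizeof(*ts));
if (ts == NULL)
{
    fprintf(stderr, "Could not allocate the tennis store\n");
    return EXIT_FAILURE;
}
systemInit(ts);

ts->stockFile = "stock.csv";
ts->custFile  = "customer.csv";
ts->salesFile = "sales.csv";

loadData(ts, ts->custFile, ts->stockFile);
/* ... */
free(ts);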
