I am currently working on creating a dictionary using a binary search tree-like structure we designed in class.
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strcmp, strcpy (strings.h does not declare them) */
struct entry
{
char* word;
unsigned int n; /* n is the number of times the word appears in the source. */
struct entry *left;
struct entry *right;
};
/*input_from_args: if no additional argument is given, return stdin. Else, open the text file and read it.*/
FILE*
input_from_args(int argc, const char *argv[]){
if(argc==1){
return stdin;
}else{
return fopen(argv[1],"r");
}
}
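Below is the insert function that we also wrote in class. Given a new word, it increments the count if the word is already in the tree; otherwise it recurses left or right and creates a new node in the correct position.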
struct entry*
insert(struct entry *table, char* str)
{
if(table == NULL){
table = (struct entry *)malloc(sizeof(struct entry));
strcpy(table->word,str);
table -> n = 1;
table -> left = NULL;
table -> right = NULL;
}else if(strcmp(str, table->word) == 0){
table -> n = (table ->n)+1;
}else if(strcmp(str, table->word) <0){
table->left = insert(table->left, str);
}else if(strcmp(str, table->word) >0){
table ->right = insert(table->right, str);
}
return table;
}
Below is a print function I wrote myself. It prints every word in the table along with n, the number of times it occurs.
void
print_table(struct entry *table){
if(table!=NULL){
print_table(table->left);
printf("%s ", table->word);
printf("%u\n", table->n); /* n is unsigned, so %u rather than %d */
print_table(table->right);
}
}
And finally, below is the main function.
int
main(int argc, const char *argv[])
{
FILE *src = input_from_args(argc, argv);
if(src == NULL){
fprintf(stderr, "%s: unable to open %s\n", argv[0], argv[1]);
exit(EXIT_FAILURE);
}
char str[1024];
struct entry *table = NULL; /* start with an empty tree */
while(fscanf(src, "%1023s", str) == 1){ /* the width keeps fscanf from overflowing str */
table = insert(table, str);
}
print_table(table);
return 0;
}
I'm seeing some very odd behavior when I run this program. It only seems to happen with longer input.
When I run it with this input (in a .txt file):
This is a test.
This is a test.
This is a test.
I get the following output:
This 3
a 3
is 3
test 3
This is what I should be getting. However, when I give it slightly longer input, such as:
Apple Apple
Blue Blue
Cat Cat
Dog Dog
Elder Elder
Funions Funions
Gosh Gosh
Hairy Hairy
I get the following output:
Appme 2
Blue 2
Cat 2
Dog 2
Elder 2
Funions 2
Gosi 2
Hairy 2
The counts are clearly correct, but why is it changing some of the letters in my words? I gave it Apple and it returned Appme; I gave it Gosh and it gave me Gosi. What am I missing in my code?
This line in the insert function is very problematic:
strcpy(table->word,str);
It's problematic because you don't actually allocate memory for the string. That means that table->word is uninitialized and its value will be indeterminate, so the strcpy call will lead to undefined behavior.
The simple solution? Use strdup to duplicate the string:
table->word = strdup(str);
The strdup function was not part of standard C before C23 (it comes from POSIX), but just about all platforms have it.
In your insert function, you do not allocate/malloc() space for the word pointer you are trying to strcpy() to:
if(table == NULL){
table = (struct entry *)malloc(sizeof(struct entry));
strcpy(table->word,str);
table -> n = 1;
table -> left = NULL;
table -> right = NULL;
}
This code can crash with a segmentation fault, or silently corrupt memory as you are seeing, because you are copying data to memory you don't own. It is easy to fix:
table->word = malloc(strlen(str) + 1);
strcpy(table->word, str);
You'll want to allocate one extra byte above the string length, to allow for the null terminator.
You do not need or want to cast the result of malloc(). In other words, this is fine:
table = malloc(sizeof(struct entry));
Get into the habit of using free() on any pointers you have malloc()-ed, when you are done with them. Otherwise, you end up with a memory leak.
Also, compile with -Wall -Wextra (or, with clang, -Weverything) to enable more warnings.
Note: If strdup() is not available on your platform, it is easy to write a custom function that does the same thing:
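#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* my_very_own_strdup(const char* src)
{
    char* dest = NULL;
    if (!src)
        return dest;
    size_t src_len = strlen(src) + 1; /* include the '\0' terminator */
    dest = malloc(src_len);
    if (!dest) {
        /* perror appends ": <reason>\n" itself, so no trailing newline here */
        perror("Error: Could not allocate space for string copy");
        exit(EXIT_FAILURE);
    }
    memcpy(dest, src, src_len);
    return dest;
}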
On the line strcpy(table->word,str); where is table->word allocated?
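It isn't. malloc(sizeof(struct entry)) only allocates room for the pointer member itself (4 or 8 bytes, depending on your machine), not for the characters it should point to. table->word is left uninitialized, so strcpy writes through a garbage pointer. You must allocate storage for table->word before copying into it.
I would replace the strcpy call with table->word = strdup(str);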
I have a text file of the following form. The left column holds players' names and the right column holds their score in each game they played.
john 40
mary 50
john 30
kevin 88
kevin 29
joe 102
david 11
mary 134
I want to sum up the scores of the players. So, I want to print output of the form
john 70
mary 184
kevin 117
joe 102
david 11
I know that this can easily be done in R or Python, but I want to do it in C. So I declared an array of structures and try to read each line of the file into it. The array is defined as a global variable, so by default the struct members are initialized to zero (or to null characters, in the case of a char array). I then try to read each row into a struct, which is itself an element of the array. But while implementing this, I got stuck at the point where new rows have to be read and stored into structs. Is there an efficient way to do this? Since R and pandas are built on C, their underlying code is probably written in C. How is it done there?
Thanks
Typically you'd read a line, split it up on whitespace, see if an entry with that name already exists in a hash table or tree or other map data structure, and if so, add the current value to it, and if not, insert it using the current value. Then at the end traverse the map printing out the entries. Basically, the same approach you'd take with any language.
However, those other languages often have things like map data structures and high-level abstractions for reading files and parsing text, so a task like this can be done in a few lines (shoot, awk can do it in one). With C, you have to write most of that yourself or use add-on libraries; the C standard library, for example, has no hash tables or trees. You basically have to do manually everything that languages like Python do for you under the hood.
Here's an example that uses the POSIX binary search tree functions (an awkward but portable API):
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <search.h>
struct record {
int num;
char name[];
};
struct record *make_record(const char *name, int num) {
size_t len = strlen(name);
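struct record *r = malloc(sizeof *r + len + 1);
if (!r) { // out of memory: nothing sensible to do in this small program
perror("malloc");
exit(EXIT_FAILURE);
}
r->num = num;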
memcpy(r->name, name, len);
r->name[len] = 0;
return r;
}
int reccmp(const void *va, const void *vb) {
const struct record *a = va, *b = vb;
return strcmp(a->name, b->name);
}
void print_rec(const void *nodep, VISIT which, int depth) {
(void)depth;
// Print records in sorted order.
if (which == postorder || which == leaf) {
const struct record *r = *(const struct record **)nodep;
printf("%s\t%d\n", r->name, r->num);
}
}
int main(int argc, char **argv) {
if (argc != 2) {
fprintf(stderr, "Usage: %s filename\n", argc > 0 ? argv[0] : "program");
return EXIT_FAILURE;
}
FILE *fp = fopen(argv[1], "r");
if (!fp) {
fprintf(stderr, "%s: Unable to open %s: %s\n", argv[0], argv[1],
strerror(errno));
return EXIT_FAILURE;
}
void *counts = NULL; // Opaque pointer to the root of the tree
int lineno = 0;
char *line = NULL;
size_t line_len = 0;
while (getline(&line, &line_len, fp) > 0) {
lineno += 1;
char *saveptr = NULL;
char *name = strtok_r(line, " ", &saveptr);
char *numstr = strtok_r(NULL, " ", &saveptr);
if (!name || !*name || !numstr || !*numstr) {
fprintf(stderr, "Line %d of input is malformed!\n", lineno);
continue;
}
int num = atoi(numstr);
struct record *new_rec = make_record(name, num);
// tsearch() either inserts a new node and returns a pointer to it,
// or returns a pointer to an existing matching node.
// It returns NULL if a new node could not be allocated.
struct record **slot = tsearch(new_rec, &counts, reccmp);
if (!slot) {
perror("tsearch");
return EXIT_FAILURE;
}
struct record *found_rec = *slot;
if (new_rec != found_rec) {
// If it's the latter, update its number sum and free the struct used
// to look it up.
found_rec->num += num;
free(new_rec);
}
}
free(line);
fclose(fp);
twalk(counts, print_rec);
#ifdef __GLIBC__
// Prevent spurious warnings from tools like ASan and valgrind about
// memory leaks.
tdestroy(counts, free);
#endif
return 0;
}
Example usage:
$ gcc -g -O -Wall -Wextra group.c
$ ./a.out input.txt
david 11
joe 102
john 70
kevin 117
mary 184
Here are the important parts of my code with unhelpful portions commented out:
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include "hmap.h"
struct val_word{
char *final_word;
struct val_word* next;
};
int main (int argc, char **argv){
//Check if dictionary file is given
FILE *fp1;
char key [125];
char val [125];
char temp;
struct val_word *storage;
char c;
int i;
int j;
int l;
HMAP_PTR dictionary = hmap_create(0, 0.75);
fp1 = fopen(argv[1], "r");
do{
c = fscanf(fp1, "%s", key);
// Convert string to lowercase
strcpy(val, key);
//Alphabetically sort string
struct val_word* word_node = malloc(sizeof(struct val_word));
word_node->final_word = val;
word_node->next = NULL;
storage = hmap_get(dictionary, key);
if(storage == NULL){
hmap_set(dictionary, key, word_node);
}
else{
struct val_word *temp2 = storage;
while(temp2->next != NULL){
temp2 = temp2->next;
}
word_node->final_word = val;
word_node->next = NULL;
temp2->next = word_node;
hmap_set(dictionary, key, storage);
}
} while (c != EOF);
fclose(fp1);
while(storage->next != NULL){
printf("The list is %s\n", storage->final_word);
storage = storage->next;
}
return 0;
}
I am given a dictionary file of unknown length, as well as a hash table implementation file that I cannot touch. The hash table stores jumbled versions of words, with the key being the alphabetically sorted version of the word. For example:
Part of the dictionary contains: leloh, hello, elloh, holel
key would be: ehllo
val would be a linked list storing the aforementioned 4 words.
hmap_get gets the value at the given key, and hmap_set sets the value at the given key.
My code processes everything fine, until I try to print the list located at a key.
The list will be of the correct size, but only stores the LAST value that it took as input. So adding onto the example above, my list would be (in chronological order):
leloh
elloh -> elloh
holel -> holel -> holel
ehllo -> ehllo -> ehllo -> ehllo
For some reason it also stores the correctly alphabetized string as the last entry, even though I never passed it to hmap_set. I'm very confused about that.
Still, the list itself makes sense: I only have one node variable, and it is inside a loop. I never change the variable name, so the pointers all point to the same node, and that node's string changes on every iteration of the loop.
So, I am wondering how I would fix this.
I can't dynamically name variables, and I can't just create a dynamic array of linked lists, because I feel that would defeat the purpose of having the hash table.
I don't know what sort of data type I would use to store this.
Any help is appreciated, thank you!
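Transferring comments into an answer, where the code is easier to read.
The problem is, I think, that you keep reading new values into val (copying from key), but you only have one variable, so every node's final_word points at the same array and all of them show whatever was copied into it last.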
You need to duplicate the strings before stashing them in your hash map. So, look up the strdup() function and make a copy of the string in key using strdup() instead of strcpy(). Assign the value returned from strdup() to word_node->final_word.
If you're not allowed to use strdup(), write your own variant:
char *dup_str(const char *str)
{
size_t len = strlen(str) + 1;
char *dup = malloc(len);
if (dup != 0)
memmove(dup, str, len);
return dup;
}
I have a text file:
In 0 John 66
In 1 May 77
In 0 Eliz 88
Out 0
Out 0
I'm trying to parse this text file using scanf, and at the moment send the values after "In" to the add function, however I'm getting a seg fault when trying to do this.
I have some code here:
A struct in a separate header file:
typedef Officer test;
typedef struct {
test tests[6];
int s;
} copList;
And this one:
typedef struct {
char name[25];
int id;
} Officer;
Then I have my main function:
int main(void) {
FILE * ptr;
char buffer [500];
char * temp;
int pos;
int grade;
char * name;
copList * L;
ptr = fopen("test.txt","r");
if(ptr == NULL)
exit(1);
temp = malloc(sizeof(char)*10);
name = malloc(sizeof(char)*10);
L = malloc(sizeof(copList));
while(fgets(buffer,500,ptr) != NULL) {
sscanf(buffer,"%s %d %s %d\n",temp,&pos,name,&grade);
add(L->tests[pos],pos,L); //this gives me a seg fault
}
free(name);
free(temp);
free(L);
fclose(ptr);
return 0;
}
In a separate C file I have the add function (it can't be changed):
void add(Test b, int pos, copList * L) {
//code to be added here later...
}
I've tried allocating different amounts of memory, but that didn't help. I also noticed that if I assign a value to pos inside the while loop, before the add call, I don't get a seg fault, but obviously that's not what I want, because then the value would never change. Any help would be much appreciated.
The main problem I see with your code is that it does not check the return value of sscanf -- if sscanf returns something other than 2 or 4, your input is not what you say it is. In addition, the arrays temp and name might overflow (on inputs other than what you show), which would cause undefined behavior. Finally, the spaces and \n in the sscanf format are unnecessary and can be removed (they don't cause any problems here, since %s and %d already skip leading whitespace).
So your code should be something like:
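while(fgets(buffer,500,ptr) != NULL) {
    int count = sscanf(buffer,"%9s%d%9s%d",temp,&pos,name,&grade); /* 9 chars + '\0' fit the 10-byte buffers */
    if (count != 2 && count != 4) {
        fprintf(stderr, "Invalid input line: %s", buffer);
        continue;
    }
    if (pos < 0 || pos >= 6) { /* tests[] has only 6 elements */
        fprintf(stderr, "Position out of range: %s", buffer);
        continue;
    }
    /* ... do stuff with temp and pos (only use name and grade if count == 4) */
}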
In this line:
add(L->tests[pos],pos,L);
the first parameter is a copy of the 'test' struct.
Passing a whole struct by value is almost always a bad idea; it is much better to pass a pointer to the struct:
add( &(L->tests[pos]), pos, L );
Then, this line has a couple of problems:
void add(Test b, int pos, copList * L) {
1) 'Test' is a nonexistent type; perhaps you meant 'test'.
2) 'b' expects the whole struct to be passed by value. As mentioned above, it is (almost) always better to pass a pointer to a struct.
So I have the following question:
I have this struct ListAut
struct ListAut{
char* biggestn;
int sizeof_biggestn;
int total_len;
struct node* avl;
};
Its typedef is as follows:
typedef struct ListAut *IndexOfAuts;
IndexOfAuts *newIndexOfAuts()
{
int i;
IndexOfAuts *ioa = malloc(27 * sizeof(struct ListAut));
for (i = 0; i < 27; i++)
{
ioa[i]->biggestn = "";
ioa[i]->sizeof_biggestn = 0;
ioa[i]->total_len = 0;
ioa[i]->avl = NULL;
}
return ioa;
}
void insertName(IndexOfAuts * ioa, char *nome)
{
char *aux = malloc(sizeof(nome));
aux = trim(nome);
int index = getIndexOfLetter(aux);
if (nameLen(aux) > getSizeOfLongName(ioa[index]))
{
strcpy(ioa[index]->biggestn, aux);
ioa[index]->sizeof_biggestn = nameLen(aux);
}
ioa[index]->total_len += nameLen(aux);
insert(ioa[index]->avl, aux);
}
This is an important part of a module I need for a project, and when I run it from main it segfaults. I suspect the problem is in the creation of the "object" in newIndexOfAuts().
The idea of this module is to have an array of 27 pointers to those structures: one for each letter and one more for special characters.
Now I'm confused, because the crash might come from the problem above or from a module loader I made:
void loadModules(char *filename, IndexOfAuts * ioa, StatTable st)
{
char *nameofile = malloc(20);
strcpy(nameofile, filename);
FILE *file = fopen(nameofile, "r");
if (file != NULL)
{
int counter, ano;
char *buff, *field, *auxil;
buff = malloc(1024);
field = malloc(200);
auxil = malloc(200);
while (fgets(buff, 1024, file))
{
counter = 0;
field = strtok(buff, ",");
printf("inserting 1st name\n");
insertName(ioa, field);
counter++;
while (!atoi(field))
{
if ((auxil = strtok(NULL, ",")) != NULL)
{
counter++;
field = auxil;
insertName(ioa, field);
}
}
ano = atoi(field);
incPub(st, ano, counter - 1);
}
fclose(file);
}
}
When I run this in a main that has the following lines:
printf("Creating Stat Table");
StatTable st=newStatTable(); // This line is correct, I checked it, I hope
printf("Creating index");
IndexOfAuts* ioa=newIndexOfAuts();
printf("Loading Modules");
loadModules(filename,ioa,st);
I added those prints to see where the seg fault happens, but the last one printed was "Creating index".
There are several cases of undefined behavior and one memory leak (and a possible case of undefined behavior too):
You have this initialization: ioa[i]->biggestn=""; It makes the biggestn member point to a string literal, an array of one character (the '\0' character) that must not be modified. You then do strcpy(ioa[index]->biggestn,aux); which writes over that read-only character and then continues past the end of the array into unknown memory.
You have this: char* aux=malloc(sizeof(nome)); That allocates only 4 or 8 bytes, which is the size of the pointer, not of what the pointer points to. Use strlen to get the length of a string.
For the above allocation you also need to allocate a byte extra, as strlen only returns the length of the string without the terminator.
You have aux=trim(nome); This overwrites the pointer you just allocated, leading to a memory leak.
The above call might also lead to undefined behavior if you return a pointer to a local variable or array.
There are probably other problematic lines, these were just the ones I found on a quick glance.
And a general tip: Learn to use a debugger! The debugger is a programmer's best tool next to the compiler. If you run your program in a debugger, it will stop at the location of the crash and let you examine (and walk up) the function call stack, as well as examine the values of variables.
I am having a problem writing a function that extracts strings from a file as part of a bigger program. Everything seems to work fine, except when I use memset or bzero to clear the character arrays I have been using. I have been stuck on this for more than an hour and I keep getting seg faults whatever I do. I get this error with both bzero and memset. Please help me out.
I am attaching my code below. The statement "Come out of addfront" is printed but none of the "Done with all bzero" statements are printing. I get a segmentation fault at that point. Thank you
void extractFileData(FILE *fp , char clientName[])
{
char tempFileName[50], tempFilePath[100], tempFileSize[50], *newLinePos;
struct stat fileDetails;
while(fgets(tempFileName, sizeof(tempFileName), fp)!= NULL)
{
if((newLinePos = strchr(tempFileName, '\n')) != NULL)
{
*newLinePos = '\0';
}
strcat(tempFilePath, "SharedFiles/");
strcat(tempFilePath, tempFileName);
if(stat(tempFilePath, &fileDetails) < 0)
{
perror("Stat error");
exit(1);
}
//Copy it into a string
sprintf(tempFileSize, "%zu", fileDetails.st_size);
printf("temp file size: %s\n", tempFileSize);
//Add all these details to the file list by creating a new node
addFront(tempFileName, tempFileSize, clientName);
printf("Come out of addfront\n");
memset(&tempFileName, 0, 45);
printf("Done with all bzero\n");
memset(&tempFileSize, 0, sizeof(tempFileSize));
memset(&tempFilePath, 0, sizeof(tempFilePath));
printf("Done with all bzero\n");
}
}
EDIT:
void addFront(char fileName[], char fileSize[], char clientName[])
{
FILENODE* n;
printf("Inside add front function\n");
strcpy(n->fileName, fileName);
printf("n->filename: %s\n", n->fileName);
strcpy(n->fileSize, fileSize);
printf("n->filesize: %s\n", n->fileSize);
strcpy(n->ownerName, clientName);
printf("n->ownername: %s\n", n->ownerName);
myFileList.head = n;
printf("Did it go past myfilelist head = n\n");
myFileList.numOfNodes++;
printf("num of nodes: %d\n", myFileList.numOfNodes);
}
I have added my code for the addFront function. It adds the details to myFileList, a struct that implements a linked list. Each FILENODE represents one entry in the list.
EDIT:
Adding the structs I am using
struct fileNode
{
char fileName[50];
char fileSize[50];
char ownerName[25];
struct fileNode* next;
};
struct fileList
{
struct fileNode* head;
struct fileNode* tail;
int numOfNodes;
};
typedef struct fileList FILELIST;
typedef struct fileNode FILENODE;
I don't know why your program would crash there, but I can see another error in the program. Fix that error first, then see whether you still have problems.
This is wrong:
strcat(tempFilePath, "SharedFiles/");
strcat(tempFilePath, tempFileName);
The tempFilePath variable is uninitialized. This may coincidentally not crash, but you cannot rely on it not to crash. It may scribble on your stack.
Do this instead:
snprintf(tempFilePath, sizeof(tempFilePath), "SharedFiles/%s", tempFileName);
Finally, there is no need to zero the arrays. The contents of the arrays are not used in the next loop iteration, so you might as well ignore them.
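#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

void addFront(char fileName[], char fileSize[], char clientName[]); /* from the question */

void extractFileData(FILE *fp, char clientName[])
{
    char tempFileName[50], tempFilePath[100], tempFileSize[50], *newLinePos;
    struct stat fileDetails;
    while (fgets(tempFileName, sizeof(tempFileName), fp)) {
        if ((newLinePos = strchr(tempFileName, '\n')))
            *newLinePos = '\0';
        snprintf(tempFilePath, sizeof(tempFilePath),
                 "SharedFiles/%s", tempFileName);
        if (stat(tempFilePath, &fileDetails) < 0) {
            perror("Stat error");
            exit(1);
        }
        /* st_size is an off_t, not a size_t, so convert it explicitly */
        snprintf(tempFileSize, sizeof(tempFileSize), "%lld",
                 (long long)fileDetails.st_size);
        printf("temp file size: %s\n", tempFileSize);
        addFront(tempFileName, tempFileSize, clientName);
    }
}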
The snprintf() function is really the number one choice for doing work like this in C. It's easy to write code with snprintf() that "obviously won't crash", as opposed to code that "won't obviously crash".
If your code still crashes, there is an error somewhere else.
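addFront() never allocates n: FILENODE* n; is an uninitialized pointer, so strcpy(n->fileName, fileName) writes to an arbitrary address. That is undefined behavior; it can corrupt memory and make the program crash later, away from the real bug, which could explain the crash you see at the memset calls. Add n = malloc(sizeof *n); (and check the result for NULL) before you do anything with it.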