parsing into a struct - c

trying to parse a csv file containing lines in this format
1001,Kauri tree,1002,-1,1001,1001
and the struct node has attributes: id, name and then paths[4] // (4 paths)
My code is not working because it segfaults. I'm on a Mac, so valgrind doesn't work for me. Can anyone help with my code, or better yet, suggest another option for debugging? I am using the Geany IDE.
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <ctype.h>
#include <string.h>
#define MAX_NAME_LENGTH 20
#define MAX_LINE_LENGTH 100
typedef struct node_s {
int id;
char name[MAX_NAME_LENGTH + 1];
int paths[4];
} Node;
Node readNode(FILE *infile)
{
char buffer[MAX_LINE_LENGTH];
Node *node = NULL;
char* tok;
const char comma[2] = ",";
char* inputLine = fgets(buffer, MAX_LINE_LENGTH, infile);
if (inputLine != NULL) {
tok = strtok(buffer, comma);
if (tok == NULL) {
node->id = -1;
} else {
node->id = *tok;
tok = strtok(NULL, comma);
strncpy(node->name, tok, MAX_NAME_LENGTH);
int i = 0;
while(tok != NULL || i < 4) {
tok = strtok(NULL, comma);
node->paths[i] = atoi(tok);
i++;
}
}
}
return *node;
}
int main(void)
{
FILE* infile = fopen("new_list.txt", "r");
Node node = readNode(infile);
while (node.id >= 0) {
printf("Node: id = %d, name = '%s', neighbours = [%d, %d, %d, %d]\n",
node.id, node.name,
node.paths[0], node.paths[1], node.paths[2], node.paths[3]);
node = readNode(infile);
}
}

Your primary problem in readNode() is that Node *node = NULL; declares a pointer that is initialized to NULL and points to no valid storage. You then attempt to assign values to the memory pointed to by node (i.e. NULL), invoking Undefined Behavior and almost guaranteeing a SegFault.
You have several options to handle the issue:
provide storage for node in main() and pass it as a parameter -- declare node in main() with automatic storage and pass the address of the struct as an additional parameter to readNode(), filling the values within the function (readNode can be declared as void in this case). You can also dynamically allocate node in main() and simply pass the pointer as an additional parameter,
allocate storage for node in readNode() and return a pointer -- dynamically allocate storage for node within readNode() using malloc or calloc (don't forget to free the storage at the end of each loop iteration to prevent a memory leak), or
declare readNode() as type Node -- the easiest way is simply to declare node (a Node, not a pointer) within readNode and rely on the fact that a function can return a struct by value to hand a filled struct back to main(), as noted in the comments.
Any of these will eliminate your Segmentation Fault, as valid storage will be provided for node both in readNode and in main().
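As a rough sketch of the first option (caller-provided storage), something like the following could work. The name read_node_into and its 1/0 return convention are my own assumptions, not part of your code; the struct and constants mirror the ones used further down in this answer.

```c
#include <assert.h>
#include <stdio.h>

#define NPATHS 4
#define MAX_NAME_LENGTH 64
#define MAX_LINE_LENGTH 512

typedef struct {
    int id;
    char name[MAX_NAME_LENGTH + 1];
    int paths[NPATHS];
} node_t;

/* Hypothetical helper: fill *out from the next line of infile.
 * Returns 1 on success, 0 on end-of-file or a malformed line.
 * *out is only modified on success. */
int read_node_into (FILE *infile, node_t *out)
{
    char buffer[MAX_LINE_LENGTH];
    node_t tmp = { .id = 0 };           /* parse into a temporary first */

    assert (infile != NULL && out != NULL);

    if (!fgets (buffer, sizeof buffer, infile))
        return 0;                       /* EOF or read error */

    if (sscanf (buffer, "%d,%64[^,],%d,%d,%d,%d", &tmp.id, tmp.name,
                &tmp.paths[0], &tmp.paths[1],
                &tmp.paths[2], &tmp.paths[3]) != 6)
        return 0;                       /* not all 6 fields converted */

    *out = tmp;                         /* structs can be assigned */
    return 1;
}
```

In main() you would then declare node_t node; with automatic storage and loop with while (read_node_into (infile, &node)), so the return value, rather than a sentinel id, tells you when to stop.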
While not an error, the standard coding style for C avoids the use of camelCase or MixedCase variable names in favor of all lower-case while reserving upper-case names for use with macros and constants. It is a matter of style -- so it is completely up to you, but failing to follow it can lead to the wrong first impression in some circles. readnode is just as readable as readNode, etc.
While it is perfectly fine to pass FILE *infile as a parameter to readnode(), it does you absolutely no good, and invites Undefined Behavior, if you fail to validate that infile is open for reading before calling readnode() in main(). Library functions provide (and your functions should provide) a meaningful return to allow you to determine whether the function call succeeded or failed -- use them, always.
Don't hard-code numbers or strings in your code (this is called using magic numbers -- don't do it). Instead, if you need a constant #define one (or more) as you have with MAX_NAME_LENGTH and MAX_LINE_LENGTH, and don't skimp on buffer sizes, e.g.
#include <stdio.h>
#include <string.h>
#define NPATHS 4
#define MAX_NAME_LENGTH 64 /* don't skimp on buffer size */
#define MAX_LINE_LENGTH 512
typedef struct node_s {
int id;
char name[MAX_NAME_LENGTH + 1];
int paths[NPATHS];
} node_t;
(note: an exception is when numeric values are required in your code such as when specifying the scanf field-width modifier, etc. where a defined constant or variable is not allowed)
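For the scanf field-width case mentioned in the note, one common workaround (offered only as a sketch, not something the code in this answer uses) is preprocessor stringification, which pastes the constant into the format string at compile time. The macro names STR_ and STR and the helper parse_id_name are hypothetical, and the trick only works when the constant expands to a plain decimal literal such as 64, not an expression like (32 * 2).

```c
#include <assert.h>
#include <stdio.h>

#define MAX_NAME_LENGTH 64

/* two levels so MAX_NAME_LENGTH is expanded to 64 before being stringified */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical helper: parse "id,name" from line.
 * name must have room for MAX_NAME_LENGTH + 1 chars.
 * Returns 1 if both fields were converted, 0 otherwise. */
int parse_id_name (const char *line, int *id, char *name)
{
    assert (line != NULL && id != NULL && name != NULL);

    /* adjacent string literals are concatenated: "%d,%64[^,]" */
    return sscanf (line, "%d,%" STR(MAX_NAME_LENGTH) "[^,]", id, name) == 2;
}
```

With this, changing MAX_NAME_LENGTH updates both the array size and the field width together.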
Reading with fgets is a good approach, but you need to validate that a complete line was read by checking the length against MAX_LINE_LENGTH - 1 and that the last character in the buffer is the newline character. Further, while parsing with strtok is fine, when reading formatted input, calling sscanf on the buffer filled by fgets and validating the number of conversions simplifies the conversion process. It is also helpful to fill a temporary struct with values when parsing with sscanf to protect against some number of members less than all being filled and rendering your node.id >= 0 check invalid in main(). You could do something like the following:
node_t readnode (FILE *infile)
{
char buffer[MAX_LINE_LENGTH];
node_t node = {.id = -1}; /* initialize node to indicate error */
if (fgets (buffer, MAX_LINE_LENGTH, infile)) { /* validate line read */
node_t tmp = {.id = 0}; /* parse to temporary node */
size_t len = strlen (buffer); /* get length */
/* validate complete line read */
if (len == MAX_LINE_LENGTH - 1 && buffer[len-1] != '\n') {
/* handle line too long */
fputs ("error: line too long.\n", stderr);
/* discard remaining characters in line */
while (fgets (buffer, MAX_LINE_LENGTH, infile)) {
len = strlen (buffer);
if (len && buffer[len-1] == '\n')
break;
}
return node; /* buffer now holds only the tail of the line, do not parse it */
}
/* parse csv values using sscanf, validate return */
if (sscanf (buffer, "%d,%64[^,],%d,%d,%d,%d", &tmp.id, tmp.name,
&tmp.paths[0], &tmp.paths[1], &tmp.paths[2],
&tmp.paths[3]) == 6)
node = tmp; /* good parse, assign tmp to node */
else /* parse failed, issue error, node keeps id = -1 */
fputs ("readnode() error: parse of line failed.\n", stderr);
}
return node; /* return filled node on success, node with id = -1 otherwise */
}
Putting all the pieces together in a short example based on your code, you could do something similar to the following, which reads from the filename provided as the first argument to your program (or from stdin by default if no argument is given):
#include <stdio.h>
#include <string.h>
#define NPATHS 4
#define MAX_NAME_LENGTH 64 /* don't skimp on buffer size */
#define MAX_LINE_LENGTH 512
typedef struct node_s {
int id;
char name[MAX_NAME_LENGTH + 1];
int paths[NPATHS];
} node_t;
node_t readnode (FILE *infile)
{
char buffer[MAX_LINE_LENGTH];
node_t node = {.id = -1}; /* initialize node to indicate error */
if (fgets (buffer, MAX_LINE_LENGTH, infile)) { /* validate line read */
node_t tmp = {.id = 0}; /* parse to temporary node */
size_t len = strlen (buffer); /* get length */
/* validate complete line read */
if (len == MAX_LINE_LENGTH - 1 && buffer[len-1] != '\n') {
/* handle line too long */
fputs ("error: line too long.\n", stderr);
/* discard remaining characters in line */
while (fgets (buffer, MAX_LINE_LENGTH, infile)) {
len = strlen (buffer);
if (len && buffer[len-1] == '\n')
break;
}
return node; /* buffer now holds only the tail of the line, do not parse it */
}
/* parse csv values using sscanf, validate return */
if (sscanf (buffer, "%d,%64[^,],%d,%d,%d,%d", &tmp.id, tmp.name,
&tmp.paths[0], &tmp.paths[1], &tmp.paths[2],
&tmp.paths[3]) == 6)
node = tmp; /* good parse, assign tmp to node */
else /* parse failed, issue error, node keeps id = -1 */
fputs ("readnode() error: parse of line failed.\n", stderr);
}
return node; /* return filled node on success, node with id = -1 otherwise */
}
int main (int argc, char **argv)
{
node_t node = {.id = 0};
FILE *infile = argc > 1 ? fopen (argv[1], "r") : stdin;
/* validate file open for reading */
if (infile == NULL) {
perror ("fopen-infile");
return 1;
}
node = readnode (infile); /* struct can be assigned */
while (node.id >= 0) {
printf ("Node: id = %4d, name = '%s',%*s"
"neighbours = [%d, % 4d, %d, %d]\n",
node.id, node.name, 11 - (int)strlen(node.name), " ",
node.paths[0], node.paths[1], node.paths[2], node.paths[3]);
node = readnode (infile);
}
if (infile != stdin) fclose (infile); /* if not stdin, close file */
return 0;
}
Example Input File
$ cat dat/struct_node.csv
1001,Kauri tree,1002,-1,1001,1001
1002,Beach tree,1003,-2,1002,1002
1003,Pine tree,1004,-10,1003,1003
1004,Elm tree,1005,-100,1004,1004
Example Use/Output
$ ./bin/struct_rd_csv_node <dat/struct_node.csv
Node: id = 1001, name = 'Kauri tree', neighbours = [1002, -1, 1001, 1001]
Node: id = 1002, name = 'Beach tree', neighbours = [1003, -2, 1002, 1002]
Node: id = 1003, name = 'Pine tree', neighbours = [1004, -10, 1003, 1003]
Node: id = 1004, name = 'Elm tree', neighbours = [1005, -100, 1004, 1004]
Look things over and let me know if you have further questions.

Related

How to load multiple "clones" of structure from FILE? C

I want to learn how to load multiple structures (many students: name, surname, index, address...) from a text file looking like:
Achilles, 9999
Hector, 9998
Menelaos, 9997
... and so on
Struct can be like:
struct student_t {
char *name;
int index;
}
My attempt (doesn't work; I'm not even sure if fgets+sscanf is a considerable option here):
int numStudents=3; //to simplify... I'd need a function to count num of lines, I imagine
int x, y=1000, err_code=1;
FILE *pfile = fopen("file.txt", "r");
if(pfile==0) {return 2;}
STUDENT* students = malloc(numStudents * sizeof *students);
char buffer[1024];
char *ptr[numStudents];
for (x = 0; x < numStudents; x++){ //loop for each student
students[x].name=malloc(100); //allocation of each *name field
fgets(buffer, 100, pfile); //reads 1 line containing data of 1 student, to buffer
if(x==0) *ptr[x] = strtok(buffer, ",");//cuts buffer into tokens: ptr[x] for *name
else *ptr[x] = strtok(NULL, ","); //cuts next part of buffer
sscanf(ptr[x], "%19s", students[x].name); //loads the token to struct field
*ptr[y] = strtok(NULL, ","); //cuts next part of the buffer
students[y].index = (int)strtol(ptr[y], NULL, 10); //loads int token to struct field
*buffer='\0';//resets buffer to the beginning for the next line from x++ fgets...
y++;//the idea with y=1000 is that I need another pointer to each struct field right?
}
for (x = 0; x < numStudents; x++)
printf("first name: %s, index: %d\n",students[x].name, students[x].index);
return students;
Then printf it to see what was loaded (this is simplified; my real structure has 6 fields). I know a nice method to load 1 student from user input (How to scanf commas, but with commas not assigned to a structure? C); however, to load multiple, I have this idea, but I'm not sure if it's too clumsy to work or just terribly written.
Later I'd try to sort students by name, and perhaps even try a realloc'd buffer that increases its size as new students are loaded, and then sort what has been loaded. But I imagine that first I need to load from the file into a buffer and from the buffer into the structure, to be able to sort it afterwards?
Thanks A LOT for all the help!
C is a little harsh. I use GNU getline below, which may not be portable, so you might end up implementing it yourself. I use stdin as the input FILE * just for simplicity.
The program reads the student list into the students array. It then sorts the students by index, then by name, printing the array after each sort.
Your code is a bit of a mishmash. Try to write a separate function for loading a single student; you don't need char *ptr[numStudents], just a single char *ptr for the strtok function. strtok is a little tricky, so I prefer using strchr multiple times. I used memcpy to copy the name out of the line, remembering to null-terminate it.
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>
struct student_s {
char *name;
int index;
};
static int students_name_cmp(const void *a, const void *b)
{
const struct student_s *s1 = a;
const struct student_s *s2 = b;
return strcmp(s1->name, s2->name);
}
static int students_index_cmp(const void *a, const void *b)
{
const struct student_s *s1 = a;
const struct student_s *s2 = b;
return (s1->index > s2->index) - (s1->index < s2->index); // avoids int overflow of a plain subtraction
}
int main()
{
struct student_s *students = NULL;
size_t students_cnt = 0;
FILE *fp = stdin;
ssize_t read; // getline returns ssize_t, -1 on EOF or error
char *line = NULL;
size_t len = 0;
// for each line
while ((read = getline(&line, &len, fp)) != -1) {
// resize students!
students = realloc(students, (students_cnt + 1) * sizeof(*students));
// handle errors
if (students == NULL) {
fprintf(stderr, "ERROR allocating students!\n");
exit(-1);
}
// find the comma in the line
const char * const commapos = strchr(line, ',');
if (commapos == NULL) {
fprintf(stderr, "ERROR file is badly formatted!\n");
exit(-1);
}
// the name runs from the start of the line up to the comma; add 1 for the null terminator
const size_t namelen = (commapos - line) + 1;
// allocate memory for the name, copy it and null-terminate it
students[students_cnt].name = malloc(namelen * sizeof(char));
// handle errors
if (students[students_cnt].name == NULL) {
fprintf(stderr, "ERROR allocating students name!\n");
exit(-1);
}
memcpy(students[students_cnt].name, line, namelen - 1);
students[students_cnt].name[namelen - 1] = '\0'; // last valid index is namelen - 1
// convert the string after the comma to the number
// strtol (sadly) discards whitespaces before it, but in this case it's lucky
// we can start after the comma
errno = 0;
char *endptr;
const long int tmp = strtol(&line[namelen], &endptr, 10);
// handle strtol errors
if (errno) {
fprintf(stderr, "ERROR converting student index into number\n");
exit(-1);
}
// handle out-of-range values; index is an int, so the value must fit in INT_MIN..INT_MAX
if (tmp < INT_MIN || INT_MAX < tmp) {
fprintf(stderr, "ERROR index number is out of allowed range\n");
exit(-1);
}
students[students_cnt].index = tmp;
// handle the case when the line contains anything more than a name and a number
if (*endptr != '\n' && *endptr != '\0') {
fprintf(stderr, "ERROR there are some rubbish characters after the index!\n");
exit(-1);
}
// finally, increment students count
students_cnt++;
}
if (line) {
free(line);
}
// sort by index
qsort(students, students_cnt, sizeof(*students), students_index_cmp);
// print students out sorted by index
printf("Students sorted by index:\n");
for (size_t i = 0; i < students_cnt; ++i) {
printf("student[%zu] = '%s', %d\n", i, students[i].name, students[i].index);
}
// now we have students. We can sort them.
qsort(students, students_cnt, sizeof(*students), students_name_cmp);
// print students out sorted by name
printf("Students sorted by name:\n");
for (size_t i = 0; i < students_cnt; ++i) {
printf("student[%zu] = '%s', %d\n", i, students[i].name, students[i].index);
}
// free students, lucky them!
for (size_t i = 0; i < students_cnt; ++i) {
free(students[i].name);
}
free(students);
return 0;
}
For the following input on stdin:
Achilles, 9999
Hector, 9998
Menelaos, 9997
the program outputs:
Students sorted by index:
student[0] = 'Menelaos', 9997
student[1] = 'Hector', 9998
student[2] = 'Achilles', 9999
Students sorted by name:
student[0] = 'Achilles', 9999
student[1] = 'Hector', 9998
student[2] = 'Menelaos', 9997
A test version available here on onlinegdb.
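The advice above about writing a separate function for loading a single student could be sketched roughly as follows. The name parse_student and its 0/-1 return convention are assumptions of mine; the parsing itself follows the same strchr + memcpy + strtol steps as the program above.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

struct student_s {
    char *name;
    int index;
};

/* Hypothetical helper: parse one "name, index" line into *out.
 * Returns 0 on success, -1 on a malformed line or allocation failure.
 * On success the caller owns out->name and must free() it. */
int parse_student (const char *line, struct student_s *out)
{
    assert (line != NULL && out != NULL);

    /* the name is everything before the first comma */
    const char *comma = strchr (line, ',');
    if (comma == NULL)
        return -1;

    /* the index follows the comma; strtol skips leading whitespace */
    char *endptr;
    errno = 0;
    long value = strtol (comma + 1, &endptr, 10);
    if (errno != 0 || endptr == comma + 1)      /* range error or no digits */
        return -1;
    if (value < INT_MIN || value > INT_MAX)     /* must fit in an int */
        return -1;
    if (*endptr != '\n' && *endptr != '\0')     /* trailing rubbish */
        return -1;

    /* copy the name only once the whole line is known to be valid */
    size_t namelen = (size_t)(comma - line);
    char *name = malloc (namelen + 1);
    if (name == NULL)
        return -1;
    memcpy (name, line, namelen);
    name[namelen] = '\0';

    out->name = name;
    out->index = (int)value;
    return 0;
}
```

The getline loop in main() would then shrink to growing the array and calling parse_student (line, &students[students_cnt]) for each line, with the error handling kept in one place.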

Printing the most frequent occurring words in a given text file, unable to sort by frequency in C

I am working on an assignment that requires me to print the top 10 most occurring words in a given text file. My code is printing the words from the file, but it is not sorting them according to their frequency.
Here is some of my code below. I use a hashtable to store each unique word and its frequency. I am currently sorting the words using the wordcmp function I wrote, passing it to the built-in qsort function in main.
If anyone can guide me to fix my error, I'd be very grateful.
My current output:
the top 10 words (out of 10) are:
1 im
1 are
1 again
3 happy
2 hello
1 how
1 lets
1 you
1 try
1 this
Expected output (what I want):
The top 10 words (out of 10) are:
3 happy
2 hello
1 you
1 try
1 this
1 lets
1 im
1 how
1 are
1 again
Here is some of my code:
typedef struct word
{
char *s; /* the word */
int count; /* number of times word occurs */
struct word* next;
}word;
struct hashtable
{
word **table;
int tablesize;
int currentsize;
};
typedef struct hashtable hashtable;
int main(int argc, char *argv[])
{
int top_words = 10;
word *word = NULL;
hashtable *hash = ht_create(5000);
char *file_name;
char *file_word;
FILE *fp;
struct word *present = NULL;
fp = fopen (file_name, "r");
if (fp == NULL)
{
fprintf (stderr,"%s: No such file or directory\n", file_name);
fprintf(stderr,"The top %d words (out of 0) are:\n", top_words);
exit(-1);
}
continue_program:
while ((file_word = getWord(fp)))
{
word = add(hash, file_word, 1);
}
fclose(fp);
qsort((void*)hash->table, hash->currentsize, sizeof(word),(int (*)(const void *, const void *)) wordcmp);
if(top_words > total_unique_words)
top_words = total_unique_words;
printf("the top %d words (out of %d) are:\n", top_words, total_unique_words);
int iterations =0;
for(i =0; i <= hash->tablesize && iterations< top_words; i++)
{
present = hash->table[i];
if(present != NULL)
{
printf(" %4d %s\n", present->count, present->s);
present = present->next;
iterations++;
}
}
freetable(hash);
return 0;
}
int wordcmp (word *a, word *b)
{
if (a != NULL && b!= NULL) {
if (a->count < b->count)
{
return +1;
}
else if (a->count > b->count)
{
return -1;
}
else if (a->count == b->count)
{
/*return strcmp(b->s, a->s);*/
return 0;
}
}
return 0;
}
/* Create a new hashtable. */
struct hashtable *ht_create( int size )
{
int i;
if( size < 1 )
return NULL;
hashtable *table = (hashtable *) malloc(sizeof(hashtable));
table->table = (word **) malloc(sizeof(word *) * size);
if(table != NULL)
{
table->currentsize = 0;
table->tablesize = size;
}
for( i = 0; i < size; i++ )
{
table->table[i] = NULL;
}
return table;
}
/* Adds a new node to the hash table*/
word * add(hashtable *h, char *key, int freq)
{
int index = hashcode(key) % h->tablesize;
word *current = h->table[index];
/* Search for duplicate value */
while(current != NULL) {
if(contains(h, key) == 1){
current->count++;
return current;
}
current = current->next;
}
/* Create new node if no duplicate is found */
word *newnode = (struct word*)malloc(sizeof(struct word));
if(newnode!=NULL){
newnode->s =strdup(key);
newnode-> count = freq;
newnode-> next = NULL;
}
h->table[index] = newnode;
h->currentsize = h->currentsize + 1;
total_unique_words++;
return newnode;
}
The primary problem you are facing is attempting to sort a hashtable with linked-list chaining of buckets. When a hash collision occurs, your table is not resized, you simply use a linked-list to store the word causing the collision at the same table[index] linked to the word already stored there. That is what add does.
This can easily result in the contents of your hashtable looking like this:
table[ 0] = NULL
table[ 1] = foo
table[ 2] = NULL
table[ 3] = |some|->|words|->|that|->|collided| /* chained bucket */
table[ 4] = other
table[ 5] = words
table[ 6] = NULL
table[ 7] = NULL
...
You cannot simply qsort table and hope to get the correct word frequencies. qsort has no way to know that "some" is just the beginning word in a linked-list, all qsort gets is a pointer to "some" and sizeof(word).
To make life much easier, simply forget the hashtable and use a dynamically allocated array of structs. You can use a similar add that increments the number of occurrences for duplicates, and you avoid all problems with chained buckets. (And if you give each struct a fixed-size array for its word, a single free() of the array is all the cleanup you need.)
The following example takes 2 arguments: the first is the filename to read words from, and the (optional) second is an integer limiting the sorted output to that number of top words. The words_t struct uses a fixed-size array for word limited to 32 chars (the largest word in the unabridged dictionary is 28 characters). You can change the way words are read to parse the input and ignore punctuation and plurals as desired. The following delimits words on all punctuation (except the hyphen), and discards the possessive form of words (e.g. it stores "Mike" when "Mike's" is encountered, discarding the "'s").
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>
#define MAXC 32 /* max word length is 28-char, 29-char is sufficient */
#define MAXW 128 /* initial maximum number of words to allocate */
typedef struct {
char word[MAXC]; /* struct holding individual words */
size_t ninst; /* and the number of times they occur */
} words_t;
/* function prototypes */
void *addword (words_t *words, const char *word, size_t *wc, size_t *maxw);
void *xrealloc (void *ptr, size_t psz, size_t *nelem);
/* qsort compare function for words_t (alphabetical) */
int cmpwrds (const void *a, const void *b)
{
return strcmp (((words_t *)a)->word, ((words_t *)b)->word);
}
/* qsort compare function for words_t (by occurrence - descending)
* and alphabetical (ascending) if occurrences are equal)
*/
int cmpinst (const void *a, const void *b)
{
int ndiff = (((words_t *)a)->ninst < ((words_t *)b)->ninst) -
(((words_t *)a)->ninst > ((words_t *)b)->ninst);
if (ndiff)
return ndiff;
return strcmp (((words_t *)a)->word, ((words_t *)b)->word);
}
int main (int argc, char **argv) {
int c = 0, nc = 0, prev = ' ', total = 0;
size_t maxw = MAXW, wc = 0, top = 0;
char buf[MAXC] = "";
words_t *words = NULL;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin; /* read stdin if no file given */
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
if (argc > 2) { /* if 2 args, convert argv[2] to number of top words */
char *p = argv[2];
errno = 0; /* reset errno so the check below reflects strtoul only */
size_t tmp = strtoul (argv[2], &p, 0);
if (p != argv[2] && !errno)
top = tmp;
}
/* allocate/validate initial words */
if (!(words = calloc (maxw, sizeof *words))) {
perror ("calloc-words");
return 1;
}
while ((c = fgetc(fp)) != EOF) { /* read each character in file */
if (c != '-' && (isspace (c) || ispunct (c))) { /* word-end found */
if (!isspace (prev) && !ispunct (prev) && /* multiple ws/punct */
!(prev == 's' && nc == 1)) { /* exclude "'s" */
buf[nc] = 0; /* nul-terminate */
words = addword (words, buf, &wc, &maxw); /* add word */
nc = 0; /* reset char count */
}
else if (prev == 's' && nc == 1)
nc = 0; /* discard the lone 's' so it does not prefix the next word */
}
else if (nc < MAXC - 1) { /* add char to buf */
buf[nc++] = c;
}
else { /* chars exceed MAXC - 1; storage capability of struct */
fprintf (stderr, "error: characters exceed %d.\n", MAXC);
return 1;
}
prev = c; /* save previous char */
}
if (!isspace (prev) && !ispunct (prev)) { /* handle non-POSIX end */
buf[nc] = 0; /* nul-terminate the final word */
words = addword (words, buf, &wc, &maxw);
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
qsort (words, wc, sizeof *words, cmpinst); /* sort words by frequency */
printf ("'%s' contained '%zu' words.\n\n", /* output total No. words */
fp == stdin ? "stdin" : argv[1], wc);
/* output top words (or all words in descending order if top not given) */
for (size_t i = 0; i < (top != 0 && top < wc ? top : wc); i++) { /* never read past wc */
printf (" %-28s %5zu\n", words[i].word, words[i].ninst);
total += words[i].ninst;
}
printf ("%33s------\n%34s%5d\n", " ", "Total: ", total);
free (words);
return 0;
}
/** add word to words, updating pointer to word-count 'wc' and
* the maximum words allocated 'maxw' as needed. returns pointer
* to words (which must be assigned back in the caller).
*/
void *addword (words_t *words, const char *word, size_t *wc, size_t *maxw)
{
size_t i;
for (i = 0; i < *wc; i++)
if (strcmp (words[i].word, word) == 0) {
words[i].ninst++;
return words;
}
if (*wc == *maxw)
words = xrealloc (words, sizeof *words, maxw);
strcpy (words[*wc].word, word);
words[(*wc)++].ninst++;
return words;
}
/** realloc 'ptr' of 'nelem' of 'psz' to 'nelem * 2' of 'psz'.
* returns pointer to reallocated block of memory with new
* memory initialized to 0/NULL. return must be assigned to
* original pointer in caller.
*/
void *xrealloc (void *ptr, size_t psz, size_t *nelem)
{
void *memptr = realloc ((char *)ptr, *nelem * 2 * psz);
if (!memptr) {
perror ("realloc(): virtual memory exhausted.");
exit (EXIT_FAILURE);
} /* zero new memory (optional) */
memset ((char *)memptr + *nelem * psz, 0, *nelem * psz);
*nelem *= 2;
return memptr;
}
(note: the output is sorted in descending order of occurrence, and in alphabetical order if words have the same number of occurrences)
Example Use/Output
$ ./bin/getchar_wordcnt_top dat/damages.txt 10
'dat/damages.txt' contained '109' words.
the 12
a 10
in 7
of 7
and 5
anguish 4
injury 4
jury 4
mental 4
that 4
------
Total: 61
Note: to use your hashtable as the basis for storage, you would have to, at minimum, create an array of pointers to each word in your hashtable (including the words chained within a bucket), and then sort the array of pointers. Otherwise you would need to duplicate storage and copy the words to a new array to sort, which is a somewhat memory-inefficient approach. Creating a separate array of pointers to each word in your hashtable is about the only way to call qsort and avoid the chained-bucket problem.
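If you do want to keep the hashtable, the array-of-pointers idea from the note might look roughly like this. The struct definitions are copied from your question; the names sorted_words and cmp_word_ptrs are hypothetical, and the tie-break (reverse alphabetical) is only a guess based on your expected output.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct word {
    char *s;               /* the word */
    int count;             /* number of times word occurs */
    struct word *next;
} word;

typedef struct hashtable {
    word **table;
    int tablesize;
    int currentsize;
} hashtable;

/* qsort hands the comparator pointers to array elements; each element
 * here is itself a word *, so the arguments are really word ** */
static int cmp_word_ptrs (const void *a, const void *b)
{
    const word *wa = *(const word * const *)a;
    const word *wb = *(const word * const *)b;

    if (wa->count != wb->count)                 /* descending by count */
        return (wa->count < wb->count) - (wa->count > wb->count);
    return strcmp (wb->s, wa->s);               /* ties: reverse alphabetical */
}

/* Collect every node, including chained ones, into a newly allocated
 * array of pointers sorted by count. Stores the number of entries in *n.
 * Returns NULL if the table is empty or allocation fails. The caller
 * frees only the returned array; the nodes still belong to the table. */
word **sorted_words (const hashtable *h, size_t *n)
{
    assert (h != NULL && n != NULL);

    size_t total = 0;                           /* count nodes by walking chains */
    for (int i = 0; i < h->tablesize; i++)
        for (const word *w = h->table[i]; w != NULL; w = w->next)
            total++;

    *n = 0;
    if (total == 0)
        return NULL;

    word **all = malloc (total * sizeof *all);
    if (all == NULL)
        return NULL;

    for (int i = 0; i < h->tablesize; i++)
        for (word *w = h->table[i]; w != NULL; w = w->next)
            all[(*n)++] = w;

    qsort (all, *n, sizeof *all, cmp_word_ptrs);
    return all;
}
```

Printing the top 10 is then a plain loop over the first min(10, n) entries of the returned array, and the table itself is left untouched.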

how to scan words and store which line scanned from in C?

I'm trying to make a program that reads words from a file, stores each word and the lines it appears on in a list, and then prints the words alphabetically along with the lines they appear on. Any guidance on how to do that?
So far I've put in two arrays, words and lines, to test my code, but I'm confused about how to make it read from a file, getting each word and the line it appears in.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define LEN 7
/* Struct for word and lines that appears in */
struct wordStruct {
char *word;
char *lines;
struct wordStruct *next;
};
static int compare_words(const struct wordStruct *a, const struct wordStruct *b) {
return strcmp(a->word, b->word);
}
static struct wordStruct *insert_sorted(struct wordStruct *headptr, char *word, char *lines) {
/* Struct head */
struct wordStruct **pp = &headptr;
/* Allocate heap space for a record */
struct wordStruct *ptr = malloc(sizeof(struct wordStruct));
if (ptr == NULL) {
abort();
}
/* Assign to structure fields */
ptr->word = word;
ptr->lines = lines;
ptr->next = NULL;
/* Store words in alphabetic order */
while (*pp != NULL && compare_words(ptr, *pp) >= 0) {
pp = &(*pp)->next;
}
ptr->next = *pp;
*pp = ptr;
return headptr;
}
int main(int argc, char **argv) {
char *Arr[LEN] = { "jack", "and", "jill", "went", "up", "the", "hill" };
char *Arr2[LEN] = { "22,1,5", "24,7,3", "50", "26,66", "18,23", "32,22", "24,8" };
int i;
/* Initialize empty list */
struct wordStruct *headptr = NULL;
/* Initialize current */
struct wordStruct *current;
/* Insert words in list */
for (i = 0; i < LEN; i++) {
headptr = insert_sorted(headptr, Arr[i], Arr2[i]);
}
current = headptr;
while (current != NULL) {
printf("%s appears in lines %s.\n", current->word, current->lines);
current = current->next;
}
return 0;
}
I thought about this too, but I'm not sure how to merge it with my code so that it gets the lines where each word was found and updates lines in wordStruct.
void read_words (FILE *f) {
char x[1024];
/* assumes no word exceeds length of 1023 */
while (fscanf(f, " %1023s", x) == 1) {
puts(x);
}
}
I'm confused about how to make it read from a file, getting each word and the line it appears in.
Let us define a line: All the characters up to and including a potential terminating '\n'. The first line is line 1. The last line may or may not end with a '\n'.
Let us define a word: A string consisting of non-white-space characters. For practical and security concerns, limit its size.
Using fscanf(..., "%1023s", ...) works for reading words, but since "%s" consumes leading white-space, any '\n' characters are lost for counting lines. Simply read ahead of fscanf, one character at a time, looking for '\n'.
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
char *GetWord1024(FILE *ifile, char *dest, uintmax_t *linefeed_count) {
// test for bad parameters
assert(ifile && dest && linefeed_count);
// consume leading white space and update count of leading line-feeds
int ch;
while (isspace(ch = fgetc(ifile))) {
if (ch == '\n') {
(*linefeed_count)++;
}
}
ungetc(ch, ifile); // put back non-whitespace character or EOF
if (fscanf(ifile, "%1023s", dest) == 1) {
return dest;
}
return NULL; // No word
}
Sample usage
int main(void) {
uintmax_t linefeed_count = 0;
char word[1024];
while (GetWord1024(stdin, word, &linefeed_count)) {
printf("Line:%ju <%s>\n", linefeed_count + 1, word);
}
return 0;
}
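To connect this back to the linked list in the question, here is one possible sketch of storing each word together with the line numbers it was seen on. The struct word_lines and the function record_word are hypothetical names; the sketch keeps line numbers in a growable array rather than the char *lines string from the question, which is a design choice of mine, not something the question requires.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical record: one word plus the line numbers it appeared on */
struct word_lines {
    char *word;
    uintmax_t *lines;
    size_t nlines;
    struct word_lines *next;
};

/* Record that word occurs on line, keeping the list in alphabetical order.
 * Returns 0 on success, -1 on allocation failure. */
int record_word (struct word_lines **head, const char *word, uintmax_t line)
{
    assert (head != NULL && word != NULL);

    /* walk to the first node that is not alphabetically before word */
    struct word_lines **pp = head;
    int cmp = 1;
    while (*pp != NULL && (cmp = strcmp (word, (*pp)->word)) > 0)
        pp = &(*pp)->next;

    if (*pp == NULL || cmp != 0) {      /* word not in list: insert new node */
        struct word_lines *node = malloc (sizeof *node);
        if (node == NULL)
            return -1;
        size_t len = strlen (word) + 1;
        node->word = malloc (len);      /* own copy; caller's buffer is reused */
        if (node->word == NULL) {
            free (node);
            return -1;
        }
        memcpy (node->word, word, len);
        node->lines = NULL;
        node->nlines = 0;
        node->next = *pp;
        *pp = node;
    }

    /* append the line number (grows by one element; fine for a sketch) */
    uintmax_t *tmp = realloc ((*pp)->lines, ((*pp)->nlines + 1) * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    tmp[(*pp)->nlines++] = line;
    (*pp)->lines = tmp;
    return 0;
}
```

Inside the GetWord1024 loop you would call record_word (&head, word, linefeed_count + 1) instead of printf, then walk the list afterwards to print each word with its lines.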

reading large lists through stdin in C

If my program is going to have large lists of numbers passed in through stdin, what would be the most efficient way of reading this in?
The input I'm going to be passing into the program is going to be of the following format:
3,5;6,7;8,9;11,4;;
I need to process the input so that I can use the numbers between the semicolons (i.e. I want to be able to use 3 and 5, 6 and 7, etc.). The ;; indicates the end of the line.
I was thinking of using a buffered reader to read entire lines and then using parseInt.
Would this be the most efficient way of doing it?
This is a working solution
One way to do this is to use strtok() and store the values in an array. Ideally, dynamically allocated.
int main(int argc, char *argv[])
{
int lst_size=100;
int line_size=255;
int lst[lst_size];
int count=0;
char buff[line_size];
char * token=NULL;
fgets (buff, line_size, stdin); //Get input
Use strtok, passing ',' and ';' as the delimiters.
token=strtok(buff, ";,");
lst[count++]=atoi(token);
while(token=strtok(NULL, ";,")){
lst[count++]=atoi(token);
}
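Finally, drop the last stored value by reducing count by 1. strtok treats the consecutive ";;" as a single delimiter, so it produces no empty token; the extra token is the "\n" that fgets leaves at the end of the buffer, for which atoi returns 0. (If the input has no trailing newline, skip this step, or you will drop a real value.)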
count--;
}
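A slightly more defensive variant of the strtok approach above, sketched under my own assumptions, adds the line-ending characters to the delimiter set so there is no spurious last token to remove, and stops before the array overflows. The function name parse_numbers and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split buf on ',', ';' and line endings, storing
 * at most cap integers in out. buf is modified by strtok.
 * Returns the number of values stored. */
size_t parse_numbers (char *buf, int *out, size_t cap)
{
    assert (buf != NULL && out != NULL);

    size_t count = 0;
    for (char *token = strtok (buf, ",;\r\n");
         token != NULL && count < cap;
         token = strtok (NULL, ",;\r\n"))
        out[count++] = (int) strtol (token, NULL, 10);

    return count;
}
```

Called on the buffer filled by fgets, for example parse_numbers (buff, lst, lst_size), it makes the final count-- unnecessary whether or not the line ends with a newline.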
One other fairly elegant way to handle this is to allow strtol to parse the input by advancing the string to be read to endptr as returned by strtol. Combined with an array allocated/reallocated as needed, you should be able to handle lines of any length (up to memory exhaustion). The example below uses a single array for the data. If you want to store multiple lines, each as a separate array, you can use the same approach, but start with a pointer to array of pointers to int. (i.e. int **numbers and allocate the pointers and then each array). Let me know if you have questions:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#define NMAX 256
int main () {
char *ln = NULL; /* NULL forces getline to allocate */
size_t n = 0; /* buffer size, set by getline when it allocates */
ssize_t nchr = 0; /* number of chars actually read */
int *numbers = NULL; /* array to hold numbers */
size_t nmax = NMAX; /* check for reallocation */
size_t idx = 0; /* numbers array index */
if (!(numbers = calloc (NMAX, sizeof *numbers))) {
fprintf (stderr, "error: memory allocation failed.");
return 1;
}
/* read each line from stdin - dynamically allocated */
while ((nchr = getline (&ln, &n, stdin)) != -1)
{
char *p = ln; /* pointer for use with strtol */
char *ep = NULL;
errno = 0;
while (errno == 0)
{
/* parse/convert each number on stdin */
numbers[idx] = strtol (p, &ep, 10);
/* note: overflow/underflow checks omitted */
/* if valid conversion to number */
if (errno == 0 && p != ep)
{
idx++; /* increment index */
/* reallocate numbers if idx = nmax, before the next store */
if (idx == nmax)
{
int *tmp = realloc (numbers, 2 * nmax * sizeof *numbers);
if (!tmp) {
fprintf (stderr, "Error: numbers reallocation failure.\n");
exit (EXIT_FAILURE);
}
numbers = tmp;
memset (numbers + nmax, 0, nmax * sizeof *numbers);
nmax *= 2;
}
}
/* skip delimiters/move pointer to next digit ('0' and '9' are digits too) */
while (*ep && (*ep < '0' || *ep > '9')) ep++;
if (*ep)
p = ep;
else
break;
}
}
/* free mem allocated by getline */
if (ln) free (ln);
/* show values stored in array */
size_t i = 0;
for (i = 0; i < idx; i++)
printf (" numbers[%2zu] %d\n", i, numbers[i]);
/* free mem allocated to numbers */
if (numbers) free (numbers);
return 0;
}
Output
$ echo "3,5;6,7;8,9;11,4;;" | ./bin/prsistdin
numbers[ 0] 3
numbers[ 1] 5
numbers[ 2] 6
numbers[ 3] 7
numbers[ 4] 8
numbers[ 5] 9
numbers[ 6] 11
numbers[ 7] 4
Also works where the string is stored in a file as:
$ cat dat/numsemic.csv | ./bin/prsistdin
or
$ ./bin/prsistdin < dat/numsemic.csv
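The example above notes that overflow/underflow checks are omitted. A minimal sketch of what such a check might look like follows. The helper name parse_int is made up for illustration and is not part of the code above; it assumes you want to reject any value that does not fit in an int.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert the number at *pp into *out.
 * Returns 1 on success and advances *pp past the number,
 * 0 if no digits were found or the value does not fit in an int. */
int parse_int (const char **pp, int *out)
{
    char *ep = NULL;
    long v;

    errno = 0;
    v = strtol (*pp, &ep, 10);
    if (ep == *pp)                          /* no digits converted */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                           /* out of range for int */
    *out = (int) v;
    *pp = ep;                               /* continue after the number */
    return 1;
}
```

On failure the position in *pp is left unchanged, so the caller can decide whether to skip the offending text or stop.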
Using fgets and without size_t
It took a little reworking to come up with a revision I was happy with that eliminated getline and substituted fgets. getline is far more flexible because it handles the allocation of space for you; with fgets that is up to you (not to mention that getline returns the actual number of chars read, so you don't have to call strlen).
My goal here was to preserve the ability to read a line of any length to meet your requirement. That meant either initially allocating some huge line buffer (wasteful) or coming up with a scheme that would reallocate the input line buffer as needed if a line was longer than the space initially allocated to ln (this is what getline does so well). I'm reasonably happy with the results. Note: I put the reallocation code in functions to keep main reasonably clean. [footnote 2]
Take a look at the following code. Note, I have left the DEBUG preprocessor directives in the code allowing you to compile with the -DDEBUG flag if you want to have it spit out each time it allocates. [footnote 1] You can compile the code with:
gcc -Wall -Wextra -o yourexename yourfilename.c
or if you want the debugging output (e.g. set LMAX to 2 or something less than the line length), use the following:
gcc -Wall -Wextra -o yourexename yourfilename.c -DDEBUG
Let me know if you have questions:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#define NMAX 256
#define LMAX 1024
char *realloc_char (char *sp, unsigned int *n); /* reallocate char array */
int *realloc_int (int *sp, unsigned int *n); /* reallocate int array */
char *fixshortread (FILE *fp, char **s, unsigned int *n); /* read all stdin */
int main () {
char *ln = NULL; /* dynamically allocated for fgets */
int *numbers = NULL; /* array to hold numbers */
unsigned int nmax = NMAX; /* numbers check for reallocation */
unsigned int lmax = LMAX; /* ln check for reallocation */
unsigned int idx = 0; /* numbers array index */
unsigned int i = 0; /* simple counter variable */
char *nl = NULL;
/* initial allocation for numbers */
if (!(numbers = calloc (NMAX, sizeof *numbers))) {
fprintf (stderr, "error: memory allocation failed (numbers).");
return 1;
}
/* initial allocation for ln */
if (!(ln = calloc (LMAX, sizeof *ln))) {
fprintf (stderr, "error: memory allocation failed (ln).");
return 1;
}
/* read each line from stdin - dynamically allocated */
while (fgets (ln, lmax, stdin) != NULL)
{
/* provide a fallback to read remainder of line
if the line length exceeds lmax */
if (!(nl = strchr (ln, '\n')))
fixshortread (stdin, &ln, &lmax);
else
*nl = 0;
char *p = ln; /* pointer for use with strtol */
char *ep = NULL;
errno = 0;
while (errno == 0)
{
/* reallocate numbers before storing if the array is full;
keep the pointer returned, the block may have moved */
if (idx == nmax)
numbers = realloc_int (numbers, &nmax);
/* parse/convert each number on stdin */
numbers[idx] = strtol (p, &ep, 10);
/* note: overflow/underflow checks omitted */
/* if valid conversion to number */
if (errno == 0 && p != ep)
idx++; /* increment index */
/* skip delimiters/move pointer to next digit */
while (*ep && (*ep < '0' || *ep > '9')) ep++;
if (*ep)
p = ep;
else
break;
}
}
/* free mem allocated for ln */
if (ln) free (ln);
/* show values stored in array */
for (i = 0; i < idx; i++)
printf (" numbers[%2u] %d\n", (unsigned int)i, numbers[i]);
/* free mem allocated to numbers */
if (numbers) free (numbers);
return 0;
}
/* reallocate character pointer memory */
char *realloc_char (char *sp, unsigned int *n)
{
char *tmp = realloc (sp, 2 * *n * sizeof *sp);
#ifdef DEBUG
printf ("\n reallocating %u to %u\n", *n, *n * 2);
#endif
if (!tmp) {
fprintf (stderr, "Error: char pointer reallocation failure.\n");
exit (EXIT_FAILURE);
}
sp = tmp;
memset (sp + *n, 0, *n * sizeof *sp); /* memset new ptrs 0 */
*n *= 2;
return sp;
}
/* reallocate integer pointer memory */
int *realloc_int (int *sp, unsigned int *n)
{
int *tmp = realloc (sp, 2 * *n * sizeof *sp);
#ifdef DEBUG
printf ("\n reallocating %u to %u\n", *n, *n * 2);
#endif
if (!tmp) {
fprintf (stderr, "Error: int pointer reallocation failure.\n");
exit (EXIT_FAILURE);
}
sp = tmp;
memset (sp + *n, 0, *n * sizeof *sp); /* memset new ptrs 0 */
*n *= 2;
return sp;
}
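/* if fgets fails to read entire line, fix short read */
char *fixshortread (FILE *fp, char **s, unsigned int *n)
{
unsigned int i = 0;
int c = 0;
i = (unsigned int) strlen (*s); /* continue where fgets stopped */
*s = realloc_char (*s, n); /* keep the pointer realloc returns */
do
{
c = fgetc (fp);
(*s)[i] = c;
i++;
if (i == *n)
*s = realloc_char (*s, n);
} while (c != '\n' && c != EOF);
(*s)[i-1] = 0; /* overwrite the newline (or EOF) with the terminator */
return *s;
}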
footnote 1
nothing special about the choice of the word DEBUG (it could have been DOG, etc..), the point to take away is if you want to conditionally include/exclude code, you can simply use preprocessor flags to do that. You just add -Dflagname to pass flagname to the compiler.
footnote 2
you can combine the reallocation functions into a single void* function that accepts a void pointer as its argument along with the size of the type to be reallocated and returns a void pointer to the reallocated space -- but we will leave that for a later date.
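As a rough sketch of what footnote 2 describes, the two reallocation functions could be folded into one along these lines. The name realloc_double and the use of size_t for the count (instead of unsigned int) are assumptions made for this illustration, not part of the code above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic version of realloc_char/realloc_int:
 * doubles the block at ptr from *n to 2 * *n elements of elem_size bytes,
 * zeroes the new half, updates *n and returns the (possibly moved) block.
 * Overflow of the size computation is not checked in this sketch. */
void *realloc_double (void *ptr, size_t elem_size, size_t *n)
{
    void *tmp = realloc (ptr, 2 * *n * elem_size);
    if (!tmp) {
        fprintf (stderr, "error: reallocation failure.\n");
        exit (EXIT_FAILURE);
    }
    /* zero only the newly added elements */
    memset ((char *) tmp + *n * elem_size, 0, *n * elem_size);
    *n *= 2;
    return tmp;
}
```

A call would look like numbers = realloc_double (numbers, sizeof *numbers, &nmax); the caller has to assign the return value, because realloc may move the block.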
What you could do is read in from stdin using fgets or fgetc. You could also use getline() since you're reading in from stdin.
Once you have read in the line, you can use strtok() with ";" as the delimiter to split the string into pieces at the semicolons. Loop until strtok() returns NULL. You can then convert each piece to an integer with atoi(), or with strtol() if you need to detect invalid input.
For Example:
size_t length = 256; /* getline expects a size_t for the buffer size */
char *str = malloc(length);
ssize_t nread = getline(&str, &length, stdin); /* -1 on error or end of file */
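To round out the fragment above, here is a sketch of the strtok() step. The helper name split_ints is hypothetical, and it uses strtol() rather than atoi() so that tokens without digits can be detected and skipped.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split line on ',' ';' and newline and store up to
 * max integers in out. Returns how many were stored.
 * Note: strtok() modifies line and collapses runs of delimiters. */
size_t split_ints (char *line, int *out, size_t max)
{
    size_t count = 0;
    char *tok = strtok (line, ",;\n");

    while (tok != NULL && count < max) {
        char *end = NULL;
        long v = strtol (tok, &end, 10);
        if (end != tok)                 /* keep only tokens that start with a number */
            out[count++] = (int) v;
        tok = strtok (NULL, ",;\n");
    }
    return count;
}
```

Because strtok() writes into the string it splits, pass it the buffer filled by getline() or fgets(), never a string literal.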
I would read in the command args, then parse using the strtok() library method
http://man7.org/linux/man-pages/man3/strtok.3.html
(The web page referenced by the URL above even has a code sample of how to use it.)
I'm a little rusty at C, but could this work for you?
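char line[1000];
int first, second, used;
FILE *fp = fopen("C:\\file.txt", "r");
while (fp != NULL && fgets(line, sizeof line, fp) != NULL) { // Get a line.
const char *p = line;
for (;;) {
used = 0;
// %n stores how many characters were consumed so far
if (sscanf(p, "%d,%d;%n", &first, &second, &used) != 2) break;
// place first and second into a struct or something
if (used == 0) break; // no ';' after the pair: end of line
p += used;
}
}
if (fp != NULL) fclose(fp);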
getchar_unlocked() is what you are looking for.
Here is the code:
#include <stdio.h>
static inline int fastRead_int(int * x) /* static: a plain C99 inline may fail to link */
{
register int c = getchar_unlocked();
*x = 0;
// clean stuff in front of + look for EOF
for(; ((c<48 || c>57) && c != EOF); c = getchar_unlocked());
if(c == EOF)
return 0;
// build int
for(; c>47 && c<58 ; c = getchar_unlocked()) {
*x = (*x<<1) + (*x<<3) + c - 48;
}
return 1;
}
int main()
{
int x;
while(fastRead_int(&x))
printf("%d ",x);
return 0;
}
For input 1;2;2;;3;;4;;;;;54;;;; the code above produces 1 2 2 3 4 54.
This solution should be a lot faster than the others presented in this topic. It not only uses getchar_unlocked() (a POSIX function that skips stream locking, so it is not thread-safe), but also register, inline, and a tricky way of multiplying by 10: (*x<<1) + (*x<<3).
I wish you good luck in finding a better solution.

Linked list gives same results C

At the end of the function, all my test printfs print the same result: the last line of the file. But the printf for current inside the while loop works correctly. For some reason all my nodes hold the same value. How can I fix it?
This is my struct unit:
struct unit
{
struct unit * next;
char *name;
};
This is my function that adds lines one by one to the linked list:
void readFile(char fileName[], struct unit * units)
{
FILE * fp;
char *line = NULL;
int length = 1000;
fp = fopen(fileName, "r");
int counter = 0;
int strLength = 0;
struct unit * current;
units = (struct units*)malloc(sizeof(struct unit));
current = units;
while ( getline(&line, &length, fp) != -1)
{
strLength = strlen(&line);
current->name = (char*)malloc(sizeof(char)* strLength);
current->next = (struct units*)malloc (sizeof(struct unit));
strcpy(&current->name, &line);
printf("\nCurrent: %s",current->name);
current = current->next;
counter++;
}
printf("\nTest %s", units->name);
printf("\nTest %s", units->next->name);
printf("\nTest %s", units->next->next->name);
printf("\nTest %s", units->next->next->next->name);
}
Why are you passing in &line into strlen and strcpy? If I remember correctly, you should just pass in line and current->name into these functions. (I don't know about getline though; maybe that's fine as-is.)
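To illustrate the point about passing line rather than &line, here is a sketch of copying a line into freshly allocated storage. The helper name copy_line is made up for this example; note the extra byte for the terminator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of line without its trailing
 * newline, or NULL if allocation fails. The caller frees the result. */
char *copy_line (const char *line)
{
    size_t len = strcspn (line, "\n");    /* length up to the newline, if any */
    char *copy = malloc (len + 1);        /* +1 for the terminating '\0' */

    if (copy != NULL) {
        memcpy (copy, line, len);
        copy[len] = '\0';
    }
    return copy;
}
```

In the loop this would be used as current->name = copy_line(line); so each node owns its own copy instead of sharing one buffer.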
This worked for me. I built and ran it with a file containing several lines. I had to replace the getline call with fgets for my compiler, and I changed several occurrences of "units" to "unit", which is the name of the struct. The line buffer is also statically reserved as a 255-byte array:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct unit{
struct unit * next;
char *name;
};
void readFile(char fileName[], struct unit * units){
FILE * fp;
char line[255];
int length = 1000;
fp = fopen(fileName, "r");
int counter = 0;
int strLength = 0;
struct unit * current;
units = (struct unit*)malloc(sizeof(struct unit));
current = units;
while ( fgets ( line, sizeof line, fp ) != NULL ) /* read a line */
{
strLength = strlen(line);
current->name = (char*)malloc(sizeof(char) * (strLength + 1)); /* +1 for the terminating '\0' */
current->next = (struct unit*)malloc (sizeof(struct unit));
strcpy(current->name, line);
printf("\nCurrent: %s",current->name);
current = current->next;
counter++;
}
fclose ( fp );
printf("\nTest %s", units->name);
printf("\nTest %s", units->next->name);
printf("\nTest %s", units->next->next->name);
printf("\nTest %s", units->next->next->next->name);
}
int main(){
readFile("filename.txt", NULL);
}
Your code has several bad practices and several bugs. You do not need to preallocate a node before entering your loop; you can simply allocate as needed. There are a number of ways to ensure a newly allocated node is added to the end of the list; one is a technique called forward-chaining, which I'll get to in a minute. The following list is in no particular order, but I at least tried to work through it top-down.
Major: Your readFile() function should return the head of the list it is allocating. If you want to join this to some other list after that, feel free, but the function should start with this:
struct unit* readFile(const char fileName[])
Minor: Note also that we're not modifying the file name, so there is no reason to pass it as mutable; thus it is const.
Major: Check your file open operation for success before using it:
fp = fopen(fileName, "r");
if (fp == NULL)
{
perror("Failed to open file.");
return NULL;
}
Major: Use properly typed variables for the API calls you're making. The function getline(), which is part of POSIX rather than standard C, is prototyped as:
ssize_t getline(char ** , size_t * , FILE *)
It returns a ssize_t (a "signed" size type) and takes a size_t* as the second parameter. You're passing the address of an int variable, length, as the second parameter. There is no guarantee the two are compatible types. Fix this by declaring length with the proper type, size_t:
size_t length = 0;
Minor: The same issue happens with the return value type of strlen(), which is also size_t, but that will become unimportant in a moment as you'll soon see.
Major: Your use of getline(), apart from the second parameter mentioned before, is almost correct. On the first iteration it receives the address of a NULL pointer and a 0-valued length, so it allocates the buffer itself. On each later iteration, if the buffer allocated previously is big enough, it is reused. Unfortunately, reading a shorter line, then a longer one, then a shorter one, and then a longer one again introduces extra allocations that aren't needed. In fact, you can forgo your malloc() logic entirely and just let getline() allocate each buffer, since it is documented to use malloc()-compatible allocation. Therefore, using your existing logic (which we will go over last):
ssize_t nread; // number of characters getline() read
while ((nread = getline(&line, &length, fp)) != -1)
{
// note: added to null out trailing newline. length is the buffer
// size, not the line length, so use the return value here. see
// the documentation of getline() for more info.
if (nread > 0 && line[nread-1] == '\n')
line[nread-1] = 0;
// note: added to throw out empty lines
if (line[0] != 0)
{
// other code here
current->name = line;
}
else
{ // not using this. release it.
free(line);
}
// reset line and length for next iteration
line = NULL;
length = 0;
}
Major: Your original algorithm never free()d the line buffer once you were done with it, thereby introducing a one-time memory leak. Using the above alternative, you need not worry about it.
Alternate: Finally, the list population loop can be made more robust by applying everything discussed so far and adding a technique called forward-chaining. This technique uses a pointer-to-pointer pp that always holds the address of the pointer that will receive the next node allocation. If the list is initially empty (and it is), pp holds the address of the head pointer. With each new node added, pp is assigned the address of that node's next member. When the loop is complete (even if it didn't add any nodes), we finish by setting *pp = NULL to terminate the list.
This is the final code base for readFile. I hope you find it useful:
struct unit* readFile(const char fileName[])
{
FILE * fp;
char *line = NULL;
size_t length = 0;
ssize_t nread; // number of characters getline() read
// used for populating the list
struct unit *head = NULL;
struct unit **pp = &head;
// open file
fp = fopen(fileName, "r");
if (fp == NULL)
{
perror("Failed to open file");
return NULL;
}
while ((nread = getline(&line, &length, fp)) != -1)
{
// note: added to null out trailing newline. length is the buffer
// size, not the line length, so use the return value here. see
// the documentation of getline() for more info.
if (nread > 0 && line[nread-1] == '\n')
line[nread-1] = 0;
// note: added to throw out empty lines
if (line[0] != 0)
{
// allocate new node
*pp = malloc(sizeof(**pp));
if (*pp != NULL)
{
(*pp)->name = line;
pp = &(*pp)->next;
}
else
{ // could not allocate a new node. uh oh.
perror("Failed to allocate new node");
break; // line is released after the loop
}
}
else
{ // not using this. release it.
free(line);
}
// reset line and length for next iteration
line = NULL;
length = 0;
}
// getline() may leave a buffer allocated even when it fails
free(line);
fclose(fp);
*pp = NULL;
return head;
}
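To see forward-chaining in isolation, without the file handling, here is a small sketch that builds the same kind of list from an array of strings. The helper names list_from_names and list_free are hypothetical, and the nodes only borrow the strings rather than owning them as readFile's nodes do.

```c
#include <stdlib.h>

struct unit
{
    struct unit *next;
    char *name;
};

/* Hypothetical helper: build a list from an array of strings using
 * forward-chaining. The nodes borrow the strings; nothing is copied. */
struct unit *list_from_names (char *names[], size_t count)
{
    struct unit *head = NULL;
    struct unit **pp = &head;          /* where the next node pointer goes */
    size_t i;

    for (i = 0; i < count; i++) {
        *pp = malloc (sizeof **pp);
        if (*pp == NULL)
            break;                     /* keep what was built so far */
        (*pp)->name = names[i];
        pp = &(*pp)->next;             /* advance to the new tail's next */
    }
    *pp = NULL;                        /* terminate the list */
    return head;
}

/* Release the nodes (not the borrowed strings). */
void list_free (struct unit *head)
{
    while (head != NULL) {
        struct unit *next = head->next;
        free (head);
        head = next;
    }
}
```

Because pp starts at the address of head, the empty case needs no special handling: with zero names the loop never runs and *pp = NULL simply sets head to NULL.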

Resources