Counting Syllables in an Array - c

I am trying to count all syllables in each word of a string that I passed into an array.
A syllable counts as two vowels adjacent to each other (a, e, i, o, u, y). For example, the "ee" in "peel" counts as 1 syllable. But the "u" and "e" in "juked" count as 2 syllables. The "e" at the end of a word does not count as a syllable. Also, each word has at least 1 syllable even if the previous rules don't apply.
I have a file that contained a large string (words, spaces, and newlines) that I passed into an array.
I have code that counts each word by counting the whitespace between them and newlines. See below:
for (i = 0; i < lengthOfFile; i++)
{
if (charArray[i] == ' ' || charArray[i] == '\n')
{
wordCount++;
}
}
Where charArray is the file that is passed into an array (freads) and lengthOfFile is the total bytes in the file counted by (fseek) and wordCount is total words counted.
From here, I need to somehow count the syllables in each word in the array but don't know where to start.

If you are still having trouble, it's only because you are overthinking the problem. Whenever you are counting, determining frequency, etc., you can normally simplify things by using a "State-Loop". A state-loop is nothing more than a loop where you loop over each character (or whatever) and handle whatever state you find yourself in, like:
have I read any characters? (if not, handle that state);
is the current character a space? (if so, assuming no multiple-spaces for simplicity, you have reached the end of a word, handle that state);
is the current character a non-space and non-vowel? (if so, if my last character was a vowel, increment my syllable count); and
what do I need to do regardless of the classification of the current char? (output it, set last = current, etc.)
That's basically it, and it can be translated into a single loop with a number of tests to handle each state. You can also add a check to ensure that words like "my" and "he" are counted as a single syllable by checking if your syllable count is zero when you reach the end of the word.
Putting it all together, you could write a basic implementation like:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main (void) {
int c, last = 0; /* current & last char (int so EOF can be detected) */
const char *vowels = "AEIOUYaeiouy"; /* vowels (plus Yy) */
size_t syllcnt = 0, totalcnt = 0; /* word syllable cnt & total */
while ((c = getchar()) != EOF) { /* read each character */
if (!last) { /* if 1st char (no last) */
putchar (c); /* just output it */
last = c; /* set last */
continue; /* go get next */
}
if (isspace (c)) { /* if space, end of word */
if (!syllcnt) /* if no syll, it's 1 (he, my) */
syllcnt = 1;
printf (" - %zu\n", syllcnt); /* output syll cnt and '\n' */
totalcnt += syllcnt; /* add to total */
syllcnt = 0; /* reset syllcnt to zero */
} /* otherwise */
else if (!strchr (vowels, c)) /* if not vowel */
if (strchr (vowels, last)) /* and last was vowel */
syllcnt++; /* increment syllcnt */
if (!isspace (c)) /* if not space */
putchar (c); /* output it */
last = c; /* set last = c */
}
printf ("\n total syllables: %zu\n", totalcnt);
}
(note: as mentioned above, this simple example implementation does not consider multiple spaces between words -- which you can simply add as another needed condition by checking whether !isspace (last). Can you figure out where that check should be added, hint: it's added to an existing check with && -- fine tuning is left to you)
Example Use/Output
$ echo "my dog eats banannas he peels while getting juked" | ./syllablecnt
my - 1
dog - 1
eats - 1
banannas - 3
he - 1
peels - 1
while - 1
getting - 2
juked - 2
total syllables: 13
If you need to read words from a file, simply redirect the file as input to the program on stdin, e.g.
./syllablecnt < inputfile
Edit - Reading from File into Dynamically Allocated Buffer
Following on from the comments about wanting to read from a file (or stdin) into a dynamically sized buffer and then traversing the buffer to output the syllables per word and the total, you could do something like the following. It reads all characters from the file into a buffer that initially holds 8 characters and is reallocated as needed, doubling the allocation size each time a realloc is required. That is a fairly standard and reasonably efficient buffer growth strategy. You are free to grow it by any size you like, but avoid many small rabbit-pellet reallocations, as memory allocation is relatively expensive from a computing standpoint.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define NCHAR 8 /* initial characters to allocate */
int main (int argc, char **argv) {
int c, last = 0; /* current & last char (int so EOF can be detected) */
char *buffer; /* pointer to allocated buffer */
const char *vowels = "AEIOUYaeiouy"; /* vowels */
size_t syllcnt = 0, totalcnt = 0, /* word syllable cnt & total */
n = 0, size = NCHAR;
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("fopen-file");
return 1;
}
/* allocate/validate initial NCHAR buffer size */
if (!(buffer = malloc (size))) {
perror ("malloc-buffer");
return 1;
}
while ((c = fgetc(fp)) != EOF) { /* read each character */
buffer[n++] = c; /* store, increment count */
if (n == size) { /* reallocate as required */
void *tmp = realloc (buffer, 2 * size);
if (!tmp) { /* validate realloc */
perror ("realloc-tmp");
break; /* still n good chars in buffer */
}
buffer = tmp; /* assign reallocated block to buffer */
size *= 2; /* update allocated size */
}
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
for (size_t i = 0; i < n; i++) { /* loop over all characters */
c = buffer[i]; /* set to c to reuse code */
if (!last) { /* if 1st char (no last) */
putchar (c); /* just output it */
last = c; /* set last */
continue; /* go get next */
}
if (isspace(c) && !isspace(last)) { /* if space, end of word */
if (!syllcnt) /* if no syll, it's 1 (he, my) */
syllcnt = 1;
printf (" - %zu\n", syllcnt); /* output syll cnt and '\n' */
totalcnt += syllcnt; /* add to total */
syllcnt = 0; /* reset syllcnt to zero */
} /* otherwise */
else if (!strchr (vowels, c)) /* if not vowel */
if (strchr (vowels, last)) /* and last was vowel */
syllcnt++; /* increment syllcnt */
if (!isspace (c)) /* if not space */
putchar (c); /* output it */
last = c; /* set last = c */
}
free (buffer); /* don't forget to free what you allocate */
printf ("\n total syllables: %zu\n", totalcnt);
}
(you can do the same using fgets or using POSIX getline, or allocate all at once with your fseek/ftell or stat and then fread the entire file into a buffer in a single call -- up to you)
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/syllablecnt_array dat/syllables.txt
==19517== Memcheck, a memory error detector
==19517== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==19517== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==19517== Command: ./bin/syllablecnt_array dat/syllables.txt
==19517==
my - 1
dog - 1
eats - 1
banannas - 3
he - 1
peels - 1
while - 1
getting - 2
juked - 2
total syllables: 13
==19517==
==19517== HEAP SUMMARY:
==19517== in use at exit: 0 bytes in 0 blocks
==19517== total heap usage: 5 allocs, 5 frees, 672 bytes allocated
==19517==
==19517== All heap blocks were freed -- no leaks are possible
==19517==
==19517== For counts of detected and suppressed errors, rerun with: -v
==19517== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.

Related

Dynamically allocated string in dynamic structure array(seg fault)

I want to read the entire file (line by line) into the char pointer "name" in the struct array. I want to keep the names (which can be of arbitrary length) in a dynamically allocated string. Then I will divide the string read into name into chunks (age, name, score) in the struct. I get a seg fault. The file format is:
age name score
25,Rameiro Rodriguez,3
30,Anatoliy Stephanos,0
19,Vahan: Bohuslav,4.2
struct try{
double age;
char *name;
double score;
};
void allocate_struct_array(struct try **parr,int total_line);
int main(){
int count=0,i=0;
char ch;
fileptr = fopen("book.txt", "r");
//total line in the file is calculated
struct try *parr;
allocate_struct_array(&parr,count_lines);
//i got segmentation fault at below.(parsing code is not writed yet just trying to read the file)
while((ch=fgetc(fileptr))!=EOF) {
count++;
if(ch=='\n'){
parr->name=malloc(sizeof(char*)*count+1);
parr[i].name[count+1]='\0';
parr+=1;
count=0;
}
}
fclose(fileptr);
}
void allocate_struct_array(struct try **parr,int total_line){
*parr = malloc(total_line * sizeof(struct try));
}
Continuing from my comment, in allocate_struct_array(struct try **parr,int total_line), you allocate a block of struct try, not a block of pointers (e.g. struct try*). Your allocation parr->name=malloc(sizeof(char*)*count+1); sizes the block in pointers (sizeof(char*) * count bytes, plus one) when what you need is count + 1 characters. Moreover, because you advance parr itself with parr+=1, you lose the address returned by malloc for the block of structs, creating a memory leak because that block can no longer be freed.
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
A better approach to your problem is to read each line into a simple character array (of sufficient size to hold each line). You can then separate age, name and score and determine the number of characters in name so you can properly allocate for parr[i].name, and then you can copy the name after you have allocated. If you are careful about it, you can simply locate both ',' in the buffer, allocate for parr[i].name and then use sscanf() with a proper format-string to separate, convert and copy all values to your struct parr[i] in a single call.
Since you have given no way to determine how //total line in the file is calculated, we will just presume a number large enough to accommodate your example file for purposes of discussion. Finding that number is left to you.
To read each line into an array, simply declare a buffer (character array) large enough to hold each line. Take your longest expected line and multiply by 2 or 4, or if on a typical PC, just use a buffer of 1024 or 2048 bytes that will accommodate all but the obscure file with lines longer than that. (Rule: Don't Skimp On Buffer Size!!) You can do that with, e.g.
#define COUNTLINES 10 /* if you need a constant, #define one (or more) */
#define MAXC 1024
#define NUMSZ 64
...
int main (int argc, char **argv) {
char buf[MAXC]; /* temporary array to hold each line */
...
When reading until '\n' or EOF in a loop, it is easier to loop continually and check for EOF within the loop. That way the final line is handled as a normal part of your read loop and you don't need a special final code block to handle the last line, e.g.
while (nparr < count_lines) { /* protect your allocation bounds */
int ch = fgetc (fileptr); /* ch must be type int */
if (ch != '\n' && ch != EOF) { /* if not \n and not EOF */
...
}
else if (count) { /* only process buf if chars present */
...
}
if (ch == EOF) { /* if EOF, now break */
break;
}
}
(note: for your example we have continued to read with the fgetc() you used, but in normal practice you would simply use fgets() to fill the character array with the line)
To find the first and last ',' in the array, you can simply #include <string.h> and use strchr() to find the first and strrchr() to find the last. Using a pointer and end-pointer set to the first and last ',', the number of characters in name becomes ep - p - 1. You can find the ','s and find the length of name with:
char *p = buf, *ep; /* pointer & end-pointer */
...
/* locate 1st ',' with p and last ',' with ep */
if ((p = strchr (buf, ',')) && (ep = strrchr (buf, ',')) &&
p != ep) { /* confirm pointers don't point to same ',' */
size_t len = ep - p - 1; /* get length of name */
Once you have found the first ',' and second ',' and determined the number of characters in name, you allocate characters, not pointers, e.g. with len characters in name and nparr as the struct index (instead of your i) you would do:
parr[nparr].name = malloc (len + 1); /* allocate */
if (!parr[nparr].name) { /* validate */
perror ("malloc-parr[nparr].name");
break;
}
(note: you break instead of exit on allocation error as all prior structs allocated for and filled will still contain valid data that you can use)
Now you can craft a sscanf() format string and separate age, name and score in a single call, e.g.
/* separate buf & convert into age, name, score -- validate */
if (sscanf (buf, "%d,%[^,],%lf", &parr[nparr].age,
parr[nparr].name, &parr[nparr].score) != 3) {
fputs ("error: invalid line format.\n", stderr);
...
}
Putting it all together into a short program to read and separate your example file, you could do:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define COUNTLINES 10 /* if you need a constant, #define one (or more) */
#define MAXC 1024
#define NUMSZ 64
typedef struct { /* typedef for convenient use as type */
int age; /* age is generally an integer, not double */
char *name;
double score;
} try;
/* always provide a meaningful return when function can
* succeed or fail. Return result of malloc.
*/
try *allocate_struct_array (try **parr, int total_line)
{
return *parr = malloc (total_line * sizeof **parr);
}
int main (int argc, char **argv) {
char buf[MAXC]; /* temporary array to hold each line */
int count = 0,
nparr = 0,
count_lines = COUNTLINES;
try *parr = NULL;
/* use filename provided as 1st argument (book.txt by default) */
FILE *fileptr = fopen (argc > 1 ? argv[1] : "book.txt", "r");
if (!fileptr) { /* always validate file open for reading */
perror ("fopen-fileptr");
return 1;
}
if (!fgets (buf, MAXC, fileptr)) { /* read/discard header line */
fputs ("file-empty\n", stderr);
return 1;
}
/* validate every allocation */
if (allocate_struct_array (&parr, count_lines) == NULL) {
perror ("malloc-parr");
return 1;
}
while (nparr < count_lines) { /* protect your allocation bounds */
int ch = fgetc (fileptr); /* ch must be type int */
if (ch != '\n' && ch != EOF) { /* if not \n and not EOF */
buf[count++] = ch; /* add char to buf */
if (count + 1 == MAXC) { /* validate buf not full */
fputs ("error: line too long.\n", stderr);
count = 0;
continue;
}
}
else if (count) { /* only process buf if chars present */
char *p = buf, *ep; /* pointer & end-pointer */
buf[count] = 0; /* nul-terminate buf */
/* locate 1st ',' with p and last ',' with ep */
if ((p = strchr (buf, ',')) && (ep = strrchr (buf, ',')) &&
p != ep) { /* confirm pointers don't point to same ',' */
size_t len = ep - p - 1; /* get length of name */
parr[nparr].name = malloc (len + 1); /* allocate */
if (!parr[nparr].name) { /* validate */
perror ("malloc-parr[nparr].name");
break;
}
/* separate buf & convert into age, name, score -- validate */
if (sscanf (buf, "%d,%[^,],%lf", &parr[nparr].age,
parr[nparr].name, &parr[nparr].score) != 3) {
fputs ("error: invalid line format.\n", stderr);
free (parr[nparr].name); /* release name allocated for the bad line */
if (ch == EOF) /* if at EOF on failure */
break; /* break read loop */
else {
count = 0; /* otherwise reset count */
continue; /* start read of next line */
}
}
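}
else { /* fewer than two ',' found, skip the line */
fputs ("error: invalid line format.\n", stderr);
count = 0; /* reset count zero */
if (ch == EOF) /* if at EOF, break read loop */
break;
continue; /* start read of next line */
}
nparr += 1; /* increment array index */
count=0; /* reset count zero */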
}
if (ch == EOF) { /* if EOF, now break */
break;
}
}
fclose(fileptr); /* close file */
for (int i = 0; i < nparr; i++) {
printf ("%3d %-20s %5.1lf\n",
parr[i].age, parr[i].name, parr[i].score);
free (parr[i].name); /* free strings when done */
}
free (parr); /* free structs */
}
(note: Never Hardcode Filenames or use Magic-Numbers in your code. If you need a constant, #define ... one. Pass the filename to read as the first argument to your program or take the filename as input. You shouldn't have to recompile your code just to read from a different filename)
Example Use/Output
With your example data in dat/parr_name.txt, you would have:
$ ./bin/parr_name dat/parr_name.txt
25 Rameiro Rodriguez 3.0
30 Anatoliy Stephanos 0.0
19 Vahan: Bohuslav 4.2
Memory Use/Error Check
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/parr_name dat/parr_name.txt
==17385== Memcheck, a memory error detector
==17385== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17385== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==17385== Command: ./bin/parr_name dat/parr_name.txt
==17385==
25 Rameiro Rodriguez 3.0
30 Anatoliy Stephanos 0.0
19 Vahan: Bohuslav 4.2
==17385==
==17385== HEAP SUMMARY:
==17385== in use at exit: 0 bytes in 0 blocks
==17385== total heap usage: 7 allocs, 7 frees, 5,965 bytes allocated
==17385==
==17385== All heap blocks were freed -- no leaks are possible
==17385==
==17385== For counts of detected and suppressed errors, rerun with: -v
==17385== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Using fgets() To Read Each Line And A Temp Array For name
To not leave you with the wrong impression, this problem can be simplified substantially by reading each line into a character array using fgets() and separating the needed values with sscanf(), saving name into a temporary array of sufficient size. Now all that is needed is to allocate for parr[nparr].name and then copy the temporary name to parr[nparr].name.
By doing it this way you substantially reduce the complexity of reading character-by-character and by using a temporary array for name, you eliminate having to locate the ',' in order to obtain the length of the name.
The only changes needed are to add a new constant for the temporary name array and then you can replace the entire read-loop with:
#define NAMSZ 256
...
/* protect memory bounds, read each line into buf */
while (nparr < count_lines && fgets (buf, MAXC, fileptr)) {
char name[NAMSZ]; /* temporary array for name */
size_t len; /* length of name */
/* separate buf into age, temp name, score & validate
 * (%255[^,] limits name to NAMSZ - 1 characters)
 */
if (sscanf (buf, "%d,%255[^,],%lf", &parr[nparr].age, name,
&parr[nparr].score) != 3) {
fputs ("error: invalid line format.\n", stderr);
continue;
}
len = strlen (name); /* get length of name */
parr[nparr].name = malloc (len + 1); /* allocate for name */
if (!parr[nparr].name) { /* validate allocation */
perror ("malloc-parr[nparr].name");
break;
}
memcpy (parr[nparr].name, name, len + 1);
nparr += 1;
}
fclose(fileptr); /* close file */
...
(same output and same memory check)
Also note you can allocate and copy as a single operation if your compiler provides strdup(). That would reduce the allocation and copy of name to a single call, e.g.
parr[nparr].name = strdup (name);
Since strdup() allocates memory (and can fail), you must validate the allocation just as you would if you were using malloc() and memcpy(). But, understand, strdup() was not part of standard C before C23. It is a POSIX function that older standard libraries are not required to provide.
The other improvement you can make is adding logic to call realloc() when your block of struct (parr) is full. That way you can start with some reasonably anticipated number of struct and then reallocate more whenever you run out. This will eliminate the artificial limit on the number of lines you can store -- and remove the need to know count_lines. (There are numerous examples on this site of how to use realloc(); the implementation is left to you.)
Look things over and let me know if you have further questions.

Saving values to 2D array

According to my task, I need to read a file passed as a command-line argument using C and store its content (each character) in a 2D array, so that I can change the array's values later and save the changed content to another file. Never mind the custom functions.
Here is an example of a file I need to read:
#,#,#,#,#,#,.,#,.,.,.$
#,.,#,.,.,#,.,#,#,#,#$
#,.,#,.,.,.,.,.,.,#,#$
#,.,#,.,.,#,#,#,#,#,#$
#,.,.,#,.,.,.,.,.,.,#$
#,.,.,.,#,.,#,#,.,.,#$
#,.,.,.,.,#,.,.,.,.,#$
#,.,.,.,.,#,.,.,.,.,#$
#,.,.,.,.,.,.,.,.,.,#$
#,#,#,#,#,#,#,#,#,.,#$
Here is what I've tried:
int main(int argc, char *argv[]) {
int startX = 3;
int startY = 3;
int endX = 6;
int endY = 6;
int count = 0;
int x = 0;
int y = 0;
int fd = open(argv[1], O_RDONLY);
char ch;
if (fd == -1) {
mx_printerr("map does not exist\n");
exit(-1);
}
int targetFile =
open("path.txt", O_CREAT | O_EXCL | O_WRONLY, S_IWUSR | S_IRUSR);
while (read(fd, &ch, 1)) {
if (ch == '\n') {
x++;
}
if (ch != ',') {
count++;
}
}
fd = open(argv[1], O_RDONLY);
y = (count - x) / x;
char **arr;
arr = malloc(sizeof(char *) * x);
for (int i = 0; i < x; i++) arr[i] = malloc(y);
int tempX = 0, tempY = 0, tempCount = 0;
char tempString[count - x];
// the loop in question >>>>>
for (int i = 0; i < 10; i++) {
for (int j = 0; j < 11; j++) {
while (read(fd, &ch, 1)) {
if (ch != ',') {
arr[i][j] = ch;
// mx_printchar(arr[i][j]);
}
}
}
}
for (int i = 0; i < 10; i++) {
for (int j = 0; j < 11; j++) {
mx_printchar(arr[i][j]);
}
}
for (int i = 0; i < x; i++) free(arr[i]);
free(arr);
close(fd);
close(targetFile);
exit(0);
}
The last while loop should be saving the file's content to an array. However, when I try to print the array's content to console, I get some garbage values:
���pp
����8��
Please help me understand what is wrong here or should I use another approach to save the data to the array.
You have started off well, but then strayed into an awkward way of handling your read and allocations. There are a number of ways you can approach a flexible read of any number of characters and any number of rows into a dynamically allocated pointer-to-pointer-to char object that you can index like a 2D array (often incorrectly referred to as a "dynamic 2D array"). There is no array involved at all. You have a single pointer to more pointers: you allocate a block of storage for your pointers (rows), then allocate separate blocks of memory to hold each row's worth of data and assign the beginning address of each such block to one of the pointers in turn.
An easy way to eliminate having to pre-read each row of characters to determine the number is to simply buffer the characters for each row and then allocate storage for and copy that number of characters to their final location. This provides the advantage of not having to allocate/reallocate each row starting from some anticipated number of characters. (as there is no guarantee that all rows won't have a stray character somewhere)
The other approach, equally efficient but requiring the pre-read of the first row, is to read the first row to determine the number of characters, allocate that number of characters for each row and then enforce that number of characters on every subsequent row (handling the error if additional characters are found). There are other options if you want to treat each row as a line and then read and create an array of strings, but your requirements appear to simply be a grid of characters. You can store your lines as strings at this point simply by adding a nul-terminating character.
Below we will use a fixed buffer to hold the characters until a '\n' is found marking the end of the row (or you run out of fixed storage) and then dynamically allocate storage for each row and copy the characters from the fixed buffer to your row-storage. This is generally a workable solution as you will know some outer bound of the max number of characters that can occur per-line (don't skimp). A 2K buffer is cheap security even if you think you are reading a max of 100 chars per-line. (If you are on an embedded system with limited memory, then I would reduce the buffer to 2X the anticipated max number of chars.) If you define a constant up top for the fixed buffer size and later find you need more, it's a simple change in one location at the top of your file.
How does it work?
Let's start with declaring the counter variables to track the number of pointers available (avail), a row counter (row) a column counter (col) and a fixed number of columns we can use to compare against the number of columns in all subsequent rows (cols). Declare your fixed buffer (buf) and your pointer-to-pointer to dynamically allocate, and a FILE* pointer to handle the file, e.g.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NCHARS 2048 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
size_t avail = 2, /* initial no. of available pointers to allocate */
row = 0, /* row counter */
col = 0, /* column counter */
cols = 0; /* fixed no. of columns based on 1st row */
char buf[NCHARS], /* temporary buffer to hold characters */
**arr = NULL; /* pointer-to-pointer-to char to hold grid */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
(note: if no argument is provided, the program will read from stdin by default)
Next we validate the file is open for reading and we allocate an initial avail number of pointers:
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
/* allocate/validate initial avail no. of pointers */
if (!(arr = malloc (avail * sizeof *arr))) {
perror ("malloc-arr");
return 1;
}
Next, rather than looping while ((c = fgetc(fp)) != EOF), just continually loop - that will allow you to treat a '\n' or EOF within the loop and not have to handle the storage of the last line separately after the loop exits. Begin by reading the next character from the file and checking if you have used all your available pointers (indicating you need to realloc() more before proceeding):
while (1) { /* loop continually */
int c = fgetc(fp); /* read each char in file */
if (row == avail) { /* if all pointers used */
/* realloc 2X no. of pointers using temporary pointer */
void *tmp = realloc (arr, 2 * avail * sizeof *arr);
if (!tmp) { /* validate reallocation */
perror ("realloc-arr");
return 1; /* return failure */
}
arr = tmp; /* assign new block to arr */
avail *= 2; /* update available pointers */
}
(note: always realloc() using a temporary pointer. When realloc() fails (not if it fails) it returns NULL and if you reallocate using arr = realloc (arr, ..) you have just overwritten your pointer to your current block of memory with NULL causing the loss of the pointer and inability to free() the prior allocated block resulting in a memory-leak)
Now check whether you have reached the end of a line or EOF. In the case of EOF, if your col count is zero, you know you reached EOF after a previous '\n', so you can simply break the loop at that point. Otherwise, if you reach EOF with a full column count, you know your file lacks a POSIX end-of-line (a final '\n') and you need to store the last line of characters, e.g.
if (c == '\n' || c == EOF) { /* if end of line or EOF*/
if (c == EOF && !col) /* EOF after \n - break */
break;
if (!(arr[row] = malloc (col))) { /* allocate/validate col chars */
perror ("malloc-arr[row]");
return 1;
}
memcpy (arr[row++], buf, col); /* copy buf to arr[row], increment */
if (!cols) /* if cols not set */
cols = col; /* set cols to enforce cols per-row */
if (col != cols) { /* validate cols per-row */
fprintf (stderr, "error: invalid no. of cols - row %zu\n", row);
return 1;
}
if (c == EOF) /* break after non-POSIX eof */
break;
col = 0; /* reset col counter zero */
}
If your character isn't a '\n' or EOF it's just a normal character, so add it to your buffer, check your buffer has room for the next and keep going:
else { /* reading in line */
buf[col++] = c; /* add char to buffer */
if (col == NCHARS) { /* if buffer full, handle error */
fputs ("error: line exceeds maximum.\n", stderr);
return 1;
}
}
}
At this point you have all of your characters stored in a dynamically allocated object you can index as a 2D array. (you also know it is just storage of characters that are not nul-terminated so you cannot treat each line as a string). You are free to add a nul-terminating character if you like, but then you might as well just read each line into buf with fgets() and trim the trailing newline, if present. Depends on your requirements.
The example just closes the file (if not reading from stdin), outputs the stored characters and frees all allocated memory, e.g.
if (fp != stdin) /* close file if not stdin */
fclose (fp);
for (size_t i = 0; i < row; i++) { /* loop over rows */
for (size_t j = 0; j < cols; j++) /* loop over cols */
putchar (arr[i][j]); /* output char */
putchar ('\n'); /* tidy up with newline */
free (arr[i]); /* free row */
}
free (arr); /* free pointers */
}
(that's the whole program, you can just cut/paste the parts together)
Example Input File
$ cat dat/gridofchars.txt
#,#,#,#,#,#,.,#,.,.,.$
#,.,#,.,.,#,.,#,#,#,#$
#,.,#,.,.,.,.,.,.,#,#$
#,.,#,.,.,#,#,#,#,#,#$
#,.,.,#,.,.,.,.,.,.,#$
#,.,.,.,#,.,#,#,.,.,#$
#,.,.,.,.,#,.,.,.,.,#$
#,.,.,.,.,#,.,.,.,.,#$
#,.,.,.,.,.,.,.,.,.,#$
#,#,#,#,#,#,#,#,#,.,#$
Example Use/Output
$ ./bin/read_dyn_grid dat/gridofchars.txt
#,#,#,#,#,#,.,#,.,.,.$
#,.,#,.,.,#,.,#,#,#,#$
#,.,#,.,.,.,.,.,.,#,#$
#,.,#,.,.,#,#,#,#,#,#$
#,.,.,#,.,.,.,.,.,.,#$
#,.,.,.,#,.,#,#,.,.,#$
#,.,.,.,.,#,.,.,.,.,#$
#,.,.,.,.,#,.,.,.,.,#$
#,.,.,.,.,.,.,.,.,.,#$
#,#,#,#,#,#,#,#,#,.,#$
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/read_dyn_grid dat/gridofchars.txt
==29391== Memcheck, a memory error detector
==29391== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==29391== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==29391== Command: ./bin/read_dyn_grid dat/gridofchars.txt
==29391==
#,#,#,#,#,#,.,#,.,.,.$
#,.,#,.,.,#,.,#,#,#,#$
#,.,#,.,.,.,.,.,.,#,#$
#,.,#,.,.,#,#,#,#,#,#$
#,.,.,#,.,.,.,.,.,.,#$
#,.,.,.,#,.,#,#,.,.,#$
#,.,.,.,.,#,.,.,.,.,#$
#,.,.,.,.,#,.,.,.,.,#$
#,.,.,.,.,.,.,.,.,.,#$
#,#,#,#,#,#,#,#,#,.,#$
==29391==
==29391== HEAP SUMMARY:
==29391== in use at exit: 0 bytes in 0 blocks
==29391== total heap usage: 17 allocs, 17 frees, 6,132 bytes allocated
==29391==
==29391== All heap blocks were freed -- no leaks are possible
==29391==
==29391== For counts of detected and suppressed errors, rerun with: -v
==29391== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.

Split a string on the occurrence of a particular character [duplicate]

I am trying to split a string on every occurrence of a closing bracket and store each piece in a character array, one line at a time, in a while loop.
This is the input I am reading in a char * input
(000,P,ray ),(100,D,ray ),(009,L,art ),(0000,C,max ),(0000,S,ben ),(020,P,kay ),(040,L,photography ),(001,C,max ),(0001,S,ben ),(0001,P,kay )
This is the output I am trying to produce in a char each[30] = {}
(000,P,ray ),
(100,D,ray ),
(009,L,art ),
(000,C,max ),
(000,S,ben ),
(020,P,kay ),
(040,L,photography ),
(001,C,max ),
(201,S,ben ),
(301,P,kay )
I copied the input to a char * temp so that strtok() does not change the input. But I do not understand how to use strtok() inside the while loop condition. Does anyone know how to do it?
Thanks,
UPDATE:
Sorry if I have violated the rules.
Here's my code -
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
int main(int argc, char *argv[]){
size_t len = 0;
ssize_t read;
char * line = NULL;
char *eacharray;
FILE *fp = fopen(argv[1], "r");
char * each = NULL;
while ((read = getline(&line, &len, fp)) != -1) {
printf("%s\n", line);
eacharray = strtok(line, ")");
// printf("%s +\n", eacharray);
while(eacharray != NULL){
printf("%s\n", eacharray);
eacharray = strtok(NULL, ")");
}
}
return 0;
}
It produces an output like this -
(000,P,ray
,(100,D,ray
,(009,L,art
,(0000,C,max
,(0000,S,ben
,(020,P,kay
,(040,L,photography
,(001,C,max
,(0001,S,ben
,(0001,P,kay
I would not use strtok, because your simple parser should first detect an opening brace and then search for a closing one. With strtok, you could just split at a closing brace; then you could not check if there was an opening one, and you'd have to skip the characters until the next opening brace "manually".
BTW: you probably meant each[10][30], not each[30].
See the following code looking for opening and closing braces and copying the content in between (including the braces):
#include <stdio.h>
#include <string.h>
#include <stddef.h> /* for ptrdiff_t */
int main(int argc, char *argv[]) {
const char* source ="(000,P,ray ),"
"(100,D,ray ),"
"(009,L,art ),"
"(0000,C,max ),"
"(0000,S,ben ),"
"(020,P,kay ),"
"(040,L,photography ),"
"(001,C,max ),"
"(0001,S,ben ),"
"(0001,P,kay )";
char each[10][30] = {{ 0 }};
const char *str = source;
int i;
for (i=0; i<10; i++) {
const char* begin = strchr(str, '(');
if (!begin)
break;
const char* end = strchr(begin,')');
if (!end)
break;
end++;
ptrdiff_t length = end - begin;
if (length >= 30)
break;
memcpy(each[i], begin, length);
str = end;
}
for (int l=0; l<i; l++) {
printf("%s", each[l]);
if (l!=i-1)
printf(",\n");
}
putchar ('\n');
}
Hope it helps.
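For comparison, the strtok() loop the question asks about can be made to produce the desired pieces, with the limitation described above: strtok() discards the ')' it splits on, so the code has to put it back and cannot tell whether the last piece really had one. The following is only a sketch under those assumptions; the function name split_records and the limits MAXS and MAXL are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAXS 10   /* assumed maximum number of records */
#define MAXL 30   /* assumed maximum record length, including the nul */

/* Split 'line' on ')' with strtok (which modifies 'line' in place) and
 * store each "(...)" record in 'each'. Returns the number stored. */
int split_records (char *line, char each[][MAXL], int maxrec)
{
    int n = 0;

    for (char *tok = strtok (line, ")"); tok && n < maxrec;
         tok = strtok (NULL, ")")) {
        char *open = strchr (tok, '(');   /* skip the ',' before '(' */
        if (!open)                        /* no '(' (e.g. trailing "\n") */
            continue;
        size_t len = strlen (open);
        if (len + 2 > MAXL)               /* need room for ')' and nul */
            continue;
        memcpy (each[n], open, len);
        each[n][len] = ')';               /* restore the delimiter */
        each[n][len + 1] = '\0';
        n++;
    }
    return n;
}
```

You would call it on the buffer filled by getline, e.g. n = split_records (line, each, MAXS); and then print each[0] through each[n-1].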
There are many ways to approach this problem. Stephan has a good approach using the functions available in string.h (and kindly contributed the example source string). Another basic way to approach this problem (or any string parsing problem) is to simply walk-a-pointer down the string, comparing characters as you go and taking the appropriate action.
When doing so with multiple delimiters (e.g. ',' and (...)), it is often helpful to track the "state" of your position within the original string. Here a simple flag, in (for inside or outside (...)), will let you control whether you copy characters to your array or skip them.
The rest is just keeping track of your indexes and protecting your array bounds as you loop over each character (more of an accounting problem from a memory standpoint -- which you should do regardless).
Putting the pieces together, and providing additional details in comments in-line below, you could do something like the following:
#include <stdio.h>
#define MAXS 10 /* if you need constants -- declare them */
#define MAXL 30 /* (don't use 'magic numbers' in code) */
int main (void) {
const char* source ="(000,P,ray ),"
"(100,D,ray ),"
"(009,L,art ),"
"(0000,C,max ),"
"(0000,S,ben ),"
"(020,P,kay ),"
"(040,L,photography ),"
"(001,C,max ),"
"(0001,S,ben ),"
"(0001,P,kay )";
char each[MAXS][MAXL] = {{0}},
*p = (char *)source;
int i = 0, in = 0, ndx = 0; /* in - state flag, ndx - row index */
while (ndx < MAXS && *p) { /* loop over all chars filling 'each' */
if (*p == '(') { /* (while protecting your row bounds) */
each[ndx][i++] = *p; /* copy opening '(' */
in = 1; /* set flag 'in'side record */
}
else if (*p == ')') {
each[ndx][i++] = *p; /* copy closing ')' */
each[ndx++][i] = 0; /* nul-terminate */
i = in = 0; /* reset 'i' and 'in' */
}
else if (in) { /* if we are 'in', copy char */
each[ndx][i++] = *p;
}
if (i + 1 == MAXL) { /* protect column bounds */
fprintf (stderr, "record exceeds %d chars.\n", MAXL);
return 1;
}
p++; /* increment pointer */
}
for (i = 0; i < ndx; i++) /* display results */
printf ("each[%2d] : %s\n", i, each[i]);
return 0;
}
(note: above, each row in each will be nul-terminated by default as a result of initializing all characters in each to zero at declaration, but it is still good practice to affirmatively nul-terminate all strings)
Example Use/Output
$ ./bin/testparse
each[ 0] : (000,P,ray )
each[ 1] : (100,D,ray )
each[ 2] : (009,L,art )
each[ 3] : (0000,C,max )
each[ 4] : (0000,S,ben )
each[ 5] : (020,P,kay )
each[ 6] : (040,L,photography )
each[ 7] : (001,C,max )
each[ 8] : (0001,S,ben )
each[ 9] : (0001,P,kay )
Get comfortable using either method. You can experiment whether using if.. else if.. or a switch best fits any parsing problem. The functions in string.h can be the better choice. It all depends on your input. Being comfortable with both approaches helps you better tailor your code to the problem at hand.
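To make the if.. else if.. versus switch comparison concrete, here is roughly what the same state loop could look like with a switch. This is a sketch, not a drop-in replacement: the function name parse_records is invented, and unlike the loop above it ignores a stray ')' found outside a record and restarts the record if a second '(' appears.

```c
#define MAXS 10   /* assumed maximum number of records */
#define MAXL 30   /* assumed maximum record length, including the nul */

/* Copy each "(...)" record from 's' into 'each'.
 * Returns the number of records stored, or -1 if a record is too long. */
int parse_records (const char *s, char each[][MAXL], int maxrec)
{
    int i = 0, in = 0, ndx = 0;   /* column index, state flag, row index */

    for (; ndx < maxrec && *s; s++) {
        switch (*s) {
        case '(':                         /* start (or restart) a record */
            i = 0;
            in = 1;
            each[ndx][i++] = *s;
            break;
        case ')':                         /* end of record, if inside one */
            if (in) {
                each[ndx][i++] = *s;
                each[ndx++][i] = 0;       /* nul-terminate, next row */
                i = in = 0;
            }
            break;
        default:                          /* copy only while inside (...) */
            if (in)
                each[ndx][i++] = *s;
            break;
        }
        if (i + 1 == MAXL)                /* protect column bounds */
            return -1;
    }
    return ndx;
}
```

Whether this reads better than the if.. else if.. chain is a matter of taste. The switch tends to pay off once there are more than two or three delimiter characters to handle.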
Example with getline and realloc of Rows
Since you are using getline to read each line, it will potentially read and allocate storage for an unlimited number of records (e.g. (...)). The way to handle this is to allocate storage for your records (pointers) dynamically, keep track of the number of pointers used, and realloc to allocate more pointers when you reach your record limit. You will need to validate each allocation, and understand you allocate each as a pointer-to-pointer-to-char (e.g. char **each) instead of each being a 2D array (e.g. char each[rows][cols]). (though you will still access and use the string held with each the same way (e.g. each[0], each[1], ...))
The code below will read from the filename given as the first argument (or from stdin if no argument is given). The approach is a standard approach for handling this type problem. each is declared as char **each = NULL; (a pointer-to-pointer-to-char). You then allocate an initial number of pointers (rows) for each with:
each = malloc (rows * sizeof *each); /* allocate rows no. of pointers */
if (!each) { /* validate allocation */
perror ("each - memory exhausted"); /* throw error */
return 1;
}
You then use getline to read each line into a buffer (buf) and pass a pointer to buf to the logic we used above. (NOTE, you must preserve a pointer to buf as buf points to storage dynamically allocated by getline that you must free later.)
The only addition to the normal parsing logic is we now need to allocate storage for each of the records we parse, and assign the address of the block of memory holding each record to each[x]. (we use strcpy for that purpose after allocating the storage for each record).
To simplify parsing, we originally parse each record into a fixed size buffer (rec) since we do not know the length of each record ahead of time. You can dynamically allocate/reallocate for rec as well, but that adds an additional level of complexity -- and I suspect you will struggle with the additions as they stand now. Just understand we parse each record from buf into rec (which we set at 256 chars #define MAXR 256 -- more than ample for the expected 30-31 char record size) Even though we use a fixed length rec, we still check i against MAXR to protect the fixed array bounds.
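If you later want to drop the fixed-size rec, the dynamic alternative mentioned above might look something like the sketch below. It is not used in the program that follows; the helper name rec_append and the starting size of 32 are assumptions made for illustration only.

```c
#include <stdlib.h>

/* Append character 'c' to the heap buffer '*rec', which currently holds
 * '*len' characters in '*cap' bytes of storage. The buffer is doubled
 * with realloc when full and is always left nul-terminated.
 * Returns 1 on success, 0 on allocation failure (the old buffer is left
 * intact so the caller can still free it). */
int rec_append (char **rec, size_t *len, size_t *cap, char c)
{
    if (*len + 2 > *cap) {                    /* room for c and the nul? */
        size_t newcap = *cap ? *cap * 2 : 32; /* 32 is an arbitrary start */
        void *tmp = realloc (*rec, newcap);   /* realloc via tmp pointer */
        if (!tmp)
            return 0;
        *rec = tmp;
        *cap = newcap;
    }
    (*rec)[(*len)++] = c;                     /* store char */
    (*rec)[*len] = '\0';                      /* keep string terminated */
    return 1;
}
```

Start with char *rec = NULL; size_t len = 0, cap = 0; call rec_append (&rec, &len, &cap, *p) wherever the loop would do rec[i++] = *p, set len = 0 to begin the next record, and free (rec) once at the end.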
The storage for each record and copy of parsed records from rec to each[ndx] is handled when a closing ) is encountered as follows:
(note - storage for the nul-character is included in 'i' where you would normally see 'i + 1')
each[ndx] = malloc (i); /* allocate storage for rec */
if (!each[ndx]) { /* validate allocation */
perror ("each[ndx] - memory exhausted");
return 1;
}
strcpy (each[ndx], rec);/* copy rec to each[ndx] */
(note: by approaching allocation in this manner, you allocate the exact amount of storage you need for each record. There is no wasted space. You can handle 1 record or 10,000,000 records (to the extent of the memory on your computer))
Here is your example. Take time to understand what every line does and why. Ask questions if you do not understand. This is the meat-and-potatoes of dynamic allocation and once you get it -- you will have a firm understanding of the basics for handling any of your storage needs.
#include <stdio.h>
#include <stdlib.h> /* for malloc, realloc */
#include <string.h> /* for strcpy */
#define ROWS 10 /* initial number of rows to allocate */
#define MAXR 256 /* maximum record length between (...) */
int main (int argc, char **argv) {
int in = 0; /* in - state flag */
char **each = NULL, /* pointer to pointer to char */
*buf = NULL; /* buffer for getline */
size_t rows = ROWS, /* currently allocated row pointers */
ndx = 0, /* ndx - row index */
n = 0, /* buf size (0 - getline decides) */
i = 0; /* loop counter */
ssize_t nchr = 0; /* num chars read by getline (return) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
each = malloc (rows * sizeof *each); /* allocate rows no. of pointers */
if (!each) { /* validate allocation */
perror ("each - memory exhausted"); /* throw error */
return 1;
}
while ((nchr = getline (&buf, &n, fp)) != -1) { /* read line into buf */
char *p = buf, /* pointer to buf */
rec[MAXR] = ""; /* temp buffer to hold record */
while (*p) { /* loop over all chars filling 'each' */
if (*p == '(') { /* (while protecting your row bounds) */
rec[i++] = *p; /* copy opening '(' */
in = 1; /* set flag 'in'side record */
}
else if (*p == ')') {
rec[i++] = *p; /* copy closing ')' */
rec[i++] = 0; /* nul-terminate */
each[ndx] = malloc (i); /* allocate storage for rec */
if (!each[ndx]) { /* validate allocation */
perror ("each[ndx] - memory exhausted");
return 1;
}
strcpy (each[ndx], rec);/* copy rec to each[ndx] */
i = in = 0; /* reset 'i' and 'in' */
ndx++; /* increment row index */
if (ndx == rows) { /* check if rows limit reached */
/* reallocate 2X number of pointers using tmp pointer */
void *tmp = realloc (each, rows * sizeof *each * 2);
if (!tmp) { /* validate realloc succeeded */
perror ("realloc each - memory exhausted");
goto memfull; /* each still contains original recs */
}
each = tmp; /* assign reallocated block to each */
rows *= 2; /* update rows with current pointers */
}
}
else if (in) { /* if we are 'in', copy char */
rec[i++] = *p;
}
if (i + 1 == MAXR) { /* protect column bounds */
fprintf (stderr, "record exceeds %d chars.\n", MAXR);
return 1;
}
p++; /* increment pointer */
}
}
memfull:; /* label for goto */
free (buf); /* free memory allocated by getline */
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (i = 0; i < ndx; i++) { /* display results */
printf ("each[%2zu] : %s\n", i, each[i]);
free (each[i]); /* free memory for each record */
}
free (each); /* free pointers */
return 0;
}
(note: since nchr isn't used to trim the '\n' from the end of the buffer read by getline, you can eliminate that variable. Just note that there is no need to call strlen on the buffer returned by getline as the number of characters read is the return value)
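If you do want to trim the '\n', the return value of getline gives you the length directly. A minimal sketch, assuming a POSIX system that provides getline (the helper name read_trimmed is made up here):

```c
#define _POSIX_C_SOURCE 200809L   /* request getline() from <stdio.h> */
#include <stdio.h>
#include <sys/types.h>            /* ssize_t */

/* Read one line with getline() and strip the trailing '\n' using the
 * returned length instead of calling strlen().
 * Returns the trimmed length, or -1 at end of input or on error. */
ssize_t read_trimmed (char **buf, size_t *n, FILE *fp)
{
    ssize_t nchr = getline (buf, n, fp);    /* chars read, incl. '\n' */

    if (nchr == -1)
        return -1;
    if (nchr > 0 && (*buf)[nchr - 1] == '\n')
        (*buf)[--nchr] = '\0';              /* overwrite '\n' with nul */
    return nchr;
}
```

The caller still owns the buffer, so free (*buf) once after the last call, exactly as with getline itself.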
Example Use/Output
Note: for the input test, I just put your line of records in the file dat/delimrecs.txt and copied it 4 times giving a total of 40 records in 4 lines.
$ ./bin/parse_str_state_getline <dat/delimrecs.txt
each[ 0] : (000,P,ray )
each[ 1] : (100,D,ray )
each[ 2] : (009,L,art )
each[ 3] : (0000,C,max )
each[ 4] : (0000,S,ben )
<snip 5 - 34>
each[35] : (020,P,kay )
each[36] : (040,L,photography )
each[37] : (001,C,max )
each[38] : (0001,S,ben )
each[39] : (0001,P,kay )
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/parse_str_state_getline <dat/delimrecs.txt
==13035== Memcheck, a memory error detector
==13035== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==13035== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==13035== Command: ./bin/parse_str_state_getline
==13035==
each[ 0] : (000,P,ray )
each[ 1] : (100,D,ray )
each[ 2] : (009,L,art )
each[ 3] : (0000,C,max )
each[ 4] : (0000,S,ben )
<snip 5 - 34>
each[35] : (020,P,kay )
each[36] : (040,L,photography )
each[37] : (001,C,max )
each[38] : (0001,S,ben )
each[39] : (0001,P,kay )
==13035==
==13035== HEAP SUMMARY:
==13035== in use at exit: 0 bytes in 0 blocks
==13035== total heap usage: 46 allocs, 46 frees, 2,541 bytes allocated
==13035==
==13035== All heap blocks were freed -- no leaks are possible
==13035==
==13035== For counts of detected and suppressed errors, rerun with: -v
==13035== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
This is a lot to take in, but this is a basic minimal example of the framework for handling an unknown number of objects.

Difficulty splitting strings read from a file in C

I need to read input from a file, then split the word in capitals from its definition. My trouble is that I need multiple lines from the file to be in one variable so I can pass it to another function.
The file I want to read from looks like this
ACHROMATIC. An optical term applied to those telescopes in which
aberration of the rays of light, and the colours dependent thereon, are
partially corrected. (See APLANATIC.)
ACHRONICAL. An ancient term, signifying the rising of the heavenly
bodies at sunset, or setting at sunrise.
ACROSS THE TIDE. A ship riding across tide, with the wind in the
direction of the tide, would tend to leeward of her anchor; but with a
weather tide, or that running against the wind, if the tide be strong,
would tend to windward. A ship under sail should prefer the tack that
stems the tide, with the wind across the stream, when the anchor is
let go.
Right now my code splits the word from the rest, but I'm having difficulty getting the rest of the input into one variable.
while(fgets(line, sizeof(line), mFile) != NULL){
if (strlen(line) != 2){
if (isupper(line[0]) && isupper(line[1])){
word = strtok(line, ".");
temp = strtok(NULL, "\n");
len = strlen(temp);
for (i=0; i < len; i++){
*(defn+i) = *(temp+i);
}
printf("Word: %s\n", word);
}
else{
temp = strtok(line, "\n");
for (i=len; i < strlen(temp) + len; i++);
*(defn+i) = *(temp+i-len);
len = len + strlen(temp);
//printf(" %s\n", temp);
}
}
else{
len = 0;
printf("%s\n", defn);
index = 0;
}
}
like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>
//another function
void func(char *word, char *defs){
printf("<%s>\n", word);
if(defs){
printf("%s", defs);
}
}
int main(void){
char buffer[4096], *curr = buffer;
size_t len, buf_size = sizeof buffer;
FILE *fp = fopen("dic.txt", "r");
if(!fp){//validate the file opened
perror("dic.txt");
return 1;
}
while(fgets(curr, buf_size, fp)){
//check definition line
if(*curr == '\n' || !isupper(*curr)){
continue;//printf("invalid format\n");
}
len = strlen(curr);
curr += len;
buf_size -= len;
//read rest line
while(1){
curr = fgets(curr, buf_size, fp);
if(!curr || *curr == '\n'){//upto EOF or blank line
char *word, *defs;
char *p = strchr(buffer, '.');
if(p)
*p++ = 0;
word = buffer;
defs = p;
func(word, defs);
break;
}
len = strlen(curr);
curr += len;
buf_size -= len;
assert(buf_size >= 2 || (fprintf(stderr, "small buffer\n"), 0));
}
curr = buffer;
buf_size = sizeof buffer;
}
fclose(fp);
return 0;
}
It appears you need to first pull a string of uppercase letters from the beginning of the line, up to the first period, then concatenate the remainder of that line with subsequent lines until a blank line is found. Lather, rinse, repeat as needed.
While this task would be MUCH easier in Perl, if you need to do it in C, for one thing I recommend using the built-in string functions instead of constructing your own for-loops to copy the data. Perhaps something like the following:
while(fgets(line, sizeof(line), mFile) != NULL) {
if (strlen(line) > 2) {
if (isupper(line[0]) && isupper(line[1])) {
word = strtok(line, ".");
strcpy(defn,strtok(NULL, "\n"));
printf("Word: %s\n", word);
} else {
strcat(defn,strtok(line, "\n"));
}
} else {
printf("%s\n", defn);
defn[0] = 0;
}
}
When I put this in a properly structured C program, with appropriate include files, it works fine. I personally would have approached the problem differently, but hopefully this gets you going.
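Two things to watch with the strcpy/strcat version: strtok() returns NULL when a line has no token, and nothing stops defn from overflowing or separates the joined lines with a space. One possible way to handle that is a small bounded append helper. The name append_text and its interface are assumptions for illustration, not part of the code above.

```c
#include <stdio.h>
#include <string.h>

/* Append 'text' to the string already in 'defn' (capacity 'cap' bytes),
 * putting a single space between pieces and skipping leading spaces in
 * 'text'. A NULL 'text' (what strtok returns when there is no token)
 * is ignored. Returns 1 on success, 0 if the result would not fit, in
 * which case 'defn' is left unchanged. */
int append_text (char *defn, size_t cap, const char *text)
{
    if (!text)
        return 1;
    while (*text == ' ')                  /* skip leading spaces */
        text++;

    size_t used = strlen (defn);
    int need = snprintf (defn + used, cap - used, "%s%s",
                         used ? " " : "", text);
    if (need < 0 || (size_t)need >= cap - used) {
        defn[used] = '\0';                /* undo the partial write */
        return 0;
    }
    return 1;
}
```

With that, strcpy(defn,strtok(NULL, "\n")) becomes defn[0] = 0; append_text (defn, sizeof defn, strtok (NULL, "\n")); and the strcat call becomes append_text (defn, sizeof defn, strtok (line, "\n")); assuming defn is declared as an array.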
There are several areas that can be addressed. Given your example input and description, it appears your goal is to develop a function that will read and separate each word (or phrase) and associated definition, return a pointer to the collection of words/definitions, while also updating a pointer to the number of words and definitions read so that number is available back in the calling function (main here).
While your data suggests that the word and definition are both contained within a single line of text, with the word (or phrase) written in all upper-case, it is unclear whether you will have to address the case where the definition can span multiple lines (essentially causing you to read multiple lines and combine them to form the complete definition).
Whenever you need to maintain relationships between multiple variables within a single object, then a struct is a good choice for the base data object. Using an array of struct allows you access to each word and its associated definition once all have been read into memory. Now your example has 3 words and definitions. (each separated by a '\n'). Creating an array of 3 struct to hold the data is trivial, but when reading data, like a dictionary, you rarely know exactly how many words you will have to read.
To handle this situation, a dynamic array of structs is a proper data structure. You essentially allocate space for some reasonable number of words/definitions, and then if you reach that limit, you simply realloc the array containing your data, update your limit to reflect the new size allocated, and continue on.
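The grow-when-full step on its own can be sketched as below. The struct and the helper name ensure_room are placeholders for illustration; the complete program later in this answer does the same thing inline.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {          /* placeholder record type for illustration */
    char word[64];
    char *def;
} entry;

/* Make sure '*arr' has room for element number 'used'. '*max' is the
 * number of elements currently allocated and grows by 'step' (which
 * must be greater than 0) when the limit is reached. New elements are
 * zeroed. Returns 1 on success, 0 on failure, in which case the
 * original block is still valid and must still be freed. */
int ensure_room (entry **arr, size_t used, size_t *max, size_t step)
{
    if (used < *max)                      /* still room, nothing to do */
        return 1;

    void *tmp = realloc (*arr, (*max + step) * sizeof **arr);
    if (!tmp)                             /* never realloc to *arr itself */
        return 0;
    *arr = tmp;
    memset (*arr + *max, 0, step * sizeof **arr);   /* zero new memory */
    *max += step;
    return 1;
}
```

Starting from entry *arr = NULL; size_t max = 0; you call ensure_room (&arr, n, &max, 128) before filling arr[n]. Assigning realloc's result to a temporary first is what keeps the existing data reachable if the call fails.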
While you can use strtok to separate the word (or phrase) by looking for the first '.', that is a bit of overkill. You need to traverse each char anyway to check that they are all caps, so you may as well just iterate until you find the '.', use that character index to store your word, and set a pointer to the next char after the '.'. You will begin looking for the start of the definition from there (you basically want to skip any character that is not an [a-zA-Z]). Once you locate the beginning of the definition, you can simply get the length of the rest of the line, and copy that as the definition (or the first part of it if the definition is spread over multiple separate lines).
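Pulled out into a function of its own, that scan might look like the following sketch. The name split_word and its interface are invented here, and it uses isupper/isalpha from ctype.h rather than explicit character ranges.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* If 'line' starts with an all-caps word or phrase (upper-case letters
 * and spaces) terminated by '.', copy it to 'word' (capacity 'cap') and
 * return a pointer to the first letter after the '.', or to the
 * terminating nul if no letter follows. Returns NULL if the line does
 * not start with such a word or the word does not fit in 'word'. */
const char *split_word (const char *line, char *word, size_t cap)
{
    size_t i;

    for (i = 0; line[i] && line[i] != '.'; i++)     /* scan up to '.' */
        if (line[i] != ' ' && !isupper ((unsigned char)line[i]))
            return NULL;                  /* not caps or space: no word */

    if (line[i] != '.' || i == 0 || i >= cap)
        return NULL;                      /* no '.', empty, or too long */

    memcpy (word, line, i);               /* copy i chars to word */
    word[i] = '\0';

    const char *p = line + i + 1;         /* first char after the '.' */
    while (*p && !isalpha ((unsigned char)*p))
        p++;                              /* skip to start of definition */
    return p;
}
```

A continuation line such as "aberration of the rays of light, ..." returns NULL, which is one way to decide that the line belongs to the definition currently being read.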
After the file is read and the pointer returned and the pointer for the number of words updated, you can then use the array of structs back in main as you like. Once you are done using the information, you should free all the memory you have allocated.
Since the maximum size of a word or phrase is generally known, the struct used provides static storage for the word. Given that the definitions can vary wildly in length and are much longer, the struct simply contains a pointer-to-char for the definition. So you will have to allocate storage for each struct, and then allocate storage for each definition within each struct.
The following code does just that. It will take the filename to read as the first argument (or it will read from stdin by default if no filename is given). The code then outputs each word and its definition on a single line. The code is heavily commented to help you follow along and explain the logic, e.g.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum {MAXW = 64, NDEF = 128};
typedef struct { /* struct holding words/definitions */
char word[MAXW],
*def; /* you must allocate space for def */
} defn;
defn *readdict (FILE *fp, size_t *n);
int main (int argc, char **argv) {
defn *defs = NULL;
size_t n = 0;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
if (!(defs = readdict (fp, &n))) { /* read words/defs into defs */
fprintf (stderr, "readdict() error: no words read from file.\n");
return 1;
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (size_t i = 0; i < n; i++) {
printf ("\nword: %s\n\ndefinition: %s\n", defs[i].word, defs[i].def);
free (defs[i].def); /* free allocated definitions */
}
free (defs); /* free array of structs */
return 0;
}
/** read word and associated definition from open file stream 'fp'
* into dynamic array of struct, updating pointer 'n' to contain
* the total number of defn structs filled.
*/
defn *readdict (FILE *fp, size_t *n)
{
defn *defs = NULL; /* pointer to array of structs */
char buf[BUFSIZ] = ""; /* buffer to hold each line read */
size_t max = NDEF, haveword = 0, offset = 0; /* allocated size & flags */
/* allocate, initialize & validate memory to hold 'max' structs */
if (!(defs = calloc (max, sizeof *defs))) {
fprintf (stderr, "error: virtual memory exhausted.\n");
return NULL;
}
while (fgets (buf, BUFSIZ, fp)) /* read each line of input */
{
if (*buf == '\n') { /* check for blank line */
if (haveword) (*n)++; /* if word/def already read, increment n */
haveword = 0; /* reset haveword flag */
if (*n == max) {
void *tmp = NULL; /* tmp ptr to realloc defs */
if (!(tmp = realloc (defs, sizeof *defs * (max + NDEF)))) {
fprintf (stderr, "error: memory exhausted, realloc defs.\n");
break;
}
defs = tmp; /* assign new block to defs */
memset (defs + max, 0, NDEF * sizeof *defs); /* zero new mem */
max += NDEF; /* update max with current allocation size */
}
continue; /* get next line */
}
if (haveword) { /* word already stored in defs[n].word */
void *tmp = NULL; /* tmp pointer to realloc */
size_t dlen = strlen (buf); /* get line/buf length */
if (buf[dlen - 1] == '\n') /* trim '\n' from end */
buf[--dlen] = 0; /* realloc & validate */
if (!(tmp = realloc (defs[*n].def, offset + dlen + 2))) {
fprintf (stderr,
"error: memory exhausted, realloc defs[%zu].def.\n", *n);
break;
}
defs[*n].def = tmp; /* assign new block, fill with definition */
sprintf (defs[*n].def + offset, offset ? " %s" : "%s", buf);
offset += offset ? dlen + 1 : dlen; /* update offset (+1 only if a space was added) */
}
else { /* no current word being defined */
char *p = NULL;
size_t i;
for (i = 0; buf[i] && i < MAXW; i++) { /* check first MAXW chars */
if (buf[i] == '.') { /* if a '.' is found, end of word */
size_t dlen = 0;
if (i + 1 == MAXW) { /* check one char available for '\0' */
fprintf (stderr,
"error: 'word' exceeds MAXW, skipping.\n");
goto next;
}
strncpy (defs[*n].word, buf, i); /* copy i chars to .word */
haveword = 1; /* set haveword flag */
offset = 0; /* reset offset left over from previous word */
p = buf + i + 1; /* set p to next char in buf after '.' */
while (*p && (*p == ' ' || *p < 'A' || /* find def start */
('Z' < *p && *p < 'a') || 'z' < *p))
p++; /* increment p and check again */
if ((dlen = strlen (p))) { /* get definition length */
if (p[dlen - 1] == '\n') /* trim trailing '\n' */
p[--dlen] = 0;
if (!(defs[*n].def = malloc (dlen + 1))) { /* allocate */
fprintf (stderr,
"error: virtual memory exhausted.\n");
goto done; /* bail if allocation failed */
}
strcpy (defs[*n].def, p); /* copy definition to .def */
offset = dlen; /* set offset in .def buf to be */
} /* used if def continues on a */
break; /* new or separate line */
} /* check word is all upper-case or a ' ' */
else if (buf[i] != ' ' && (buf[i] < 'A' || 'Z' < buf[i]))
break;
}
}
next:;
}
done:;
if (haveword) (*n)++; /* account for last word/definition */
return defs; /* return pointer to array of struct */
}
Example Use/Output
$ ./bin/dict_read <dat/dict.txt
word: ACHROMATIC
definition: An optical term applied to those telescopes in which
aberration of the rays of light, and the colours dependent thereon,
are partially corrected. (See APLANATIC.)
word: ACHRONICAL
definition: An ancient term, signifying the rising of the heavenly
bodies at sunset, or setting at sunrise.
word: ACROSS THE TIDE
definition: A ship riding across tide, with the wind in the direction
of the tide, would tend to leeward of her anchor; but with a weather tide,
or that running against the wind, if the tide be strong, would tend to
windward. A ship under sail should prefer the tack that stems the tide,
with the wind across the stream, when the anchor is let go.
(line breaks were manually inserted to keep the results tidy here).
Memory Use/Error Check
You should also run any code that dynamically allocates memory through a memory use and error checking program like valgrind on Linux. Just run the code through it and confirm you free all memory you allocate and that there are no memory errors, e.g.
$ valgrind ./bin/dict_read <dat/dict.txt
==31380== Memcheck, a memory error detector
==31380== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==31380== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==31380== Command: ./bin/dict_read
==31380==
word: ACHROMATIC
<snip output>
==31380==
==31380== HEAP SUMMARY:
==31380== in use at exit: 0 bytes in 0 blocks
==31380== total heap usage: 4 allocs, 4 frees, 9,811 bytes allocated
==31380==
==31380== All heap blocks were freed -- no leaks are possible
==31380==
==31380== For counts of detected and suppressed errors, rerun with: -v
==31380== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Look things over and let me know if you have further questions.

Count the reoccurrence of words in text file

Expanding on a previous exercise, I have a text file that is filled with one word per line.
hello
hi
hello
bonjour
bonjour
hello
As I read these words from the file I would like to compare them to an array of struct pointers (created from the text file). If the word does not exist within the array, the word should be stored into a struct pointer with a count of 1. If the word already exists in the array the count should increase by 1. I will write the outcome into a new file (that already exists).
hello = 3
hi = 1
bonjour = 2
this is my code
#include <stdio.h>
#include <stdlib.h>
struct wordfreq{
int count;
char *word;
};
int main(int argc, char * argv[]) {
struct wordfreq *words[1000] = {NULL};
int i, j, f = 0;
for(i=0; i <1000; i++)
words[i] = (struct wordfreq*)malloc(sizeof(struct wordfreq));
FILE *input = fopen(argv[1], "r");
FILE *output = fopen(argv[2], "w");
if(input == NULL){
printf("Error! Can't open file.\n");
exit(0);
}
char str[20];
i=0;
while(fscanf(input, "%s[^\n]", &str) ==1){
//fprintf(output, "%s:\n", str);
for(j=0; j<i; j++){
//fprintf(output, "\t%s == %s\n", str, words[j] -> word);
if(str == words[j]->word){
words[j] ->count ++;
f = 1;
}
}
if(f==0){
words[i]->word = str;
words[i]->count = 1;
}
//fprintf(output, "\t%s = %d\n", words[i]->word, words[i]->count);
i++;
}
for(j=0; j< i; j++)
fprintf(output, "%s = %d\n", words[j]->word, words[j]->count);
for(i=0; i<1000; i++){
free(words[i]);
}
return 0;
}
I used several fprintf statements to look at my values and I can see that while str is right, when I reach the line that compares str to the other array struct pointers (str == words[j]->word) during the traversal, words[0]->word is always the same as str and the rest of the words[i]->word values are (null). I am still trying to completely understand mixing pointers and structs; with that said, any thoughts, comments, complaints?
You may be making things a bit harder than necessary, and you are certainly allocating 997 more structures than necessary in the case of your input file. There is no need to allocate all 1000 structs up front. (you are free to do so, it's just a memory management issue). The key is that you only need allocate a new struct each time a unique word is encountered. (in the case of your data file, 3-times). For all other cases, you are simply updating count to add the occurrence for a word you have already stored.
Also, if there is no compelling reason to use a struct, it is just as easy to use an array of pointers-to-char as your pointers to each word, and then a simple array of int [1000] as your count (or frequency) array. Your choice. In the case of two arrays, you only need to allocate for each unique word and never need a separate allocation for each struct.
Putting those pieces together, you could reduce your code (not including the file -- which can be handled by simple redirection) to the following:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum { MAXC = 128, MAXW = 1000 };
struct wordfreq{
int count;
char *word;
};
int main (void) {
struct wordfreq *words[MAXW] = {0};
char tmp[MAXC] = "";
int n = 0;
/* while < MAXW unique words, read each word in file */
while (n < MAXW && fscanf (stdin, " %127s", tmp) == 1) { /* MAXC - 1 chars max */
int i;
for (i = 0; i < n; i++) /* check against existing words */
if (strcmp (words[i]->word, tmp) == 0) /* if exists, break */
break;
if (i < n) { /* if exists */
words[i]->count++; /* update frequency */
continue; /* get next word */
}
/* new word found, allocate struct and
* allocate storage for word (+ space for nul-byte)
*/
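words[n] = malloc (sizeof *words[n]);
if (!words[n]) { /* validate struct allocation before using it */
fprintf (stderr, "error: memory exhausted, words[%d].\n", n);
break;
}
words[n]->word = malloc (strlen (tmp) + 1);
if (!words[n]->word) { /* validate ALL allocations */
fprintf (stderr, "error: memory exhausted, words[%d]->word.\n", n);
free (words[n]); /* release the struct that cannot be filled */
break;
}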
words[n]->count = 0; /* initialize count */
strcpy (words[n]->word, tmp); /* copy new word to words[n] */
words[n]->count++; /* update frequency to 1 */
n++; /* increment word count */
}
for (int i = 0; i < n; i++) { /* for each word */
printf ("%s = %d\n", words[i]->word, words[i]->count);
free (words[i]->word); /* free memory when no longer needed */
free (words[i]);
}
return 0;
}
Example Input File
$ cat dat/wordfile.txt
hello
hi
hello
bonjour
bonjour
hello
Example Use/Output
$ ./bin/filewordfreq <dat/wordfile.txt
hello = 3
hi = 1
bonjour = 2
As with any code that dynamically allocates memory, you will want to validate your use of the memory to ensure you have not written beyond the bounds or based a conditional move or jump on an uninitialized value. In Linux, valgrind is the natural choice (there are similar programs for each OS). Just run your program through it, e.g.:
$ valgrind ./bin/filewordfreqstruct <dat/wordfile.txt
==2000== Memcheck, a memory error detector
==2000== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==2000== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==2000== Command: ./bin/filewordfreqstruct
==2000==
hello = 3
hi = 1
bonjour = 2
==2000==
==2000== HEAP SUMMARY:
==2000== in use at exit: 0 bytes in 0 blocks
==2000== total heap usage: 6 allocs, 6 frees, 65 bytes allocated
==2000==
==2000== All heap blocks were freed -- no leaks are possible
==2000==
==2000== For counts of detected and suppressed errors, rerun with: -v
==2000== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Verify that you free all memory you allocate and that there are no memory errors.
Look things over and let me know if you have any further questions.
Using 2-Arrays Instead of a struct
As mentioned above, sometimes using a storage array and a frequency array can simplify accomplishing the same thing. Whenever you need the frequency of the members of any "set", your first thought should be a frequency array. It is nothing more than an array with one element per item in your "set", initialized to 0 at the beginning. The same approach applies: when you add an element to your storage array (or find a duplicate of an existing one), you increment the corresponding element in your frequency array by 1. When you are done, each element of the frequency array holds the number of times the corresponding element of the storage array appeared.
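To see the frequency-array idea in isolation, here is a minimal sketch for a small, fixed set: the 26 letters a-z. The names count_letters and NLETTERS are made up for this illustration and are not part of the programs in this answer. The sketch also assumes the letters 'a' through 'z' are contiguous in the character set, which holds for ASCII.

```c
#include <ctype.h>

enum { NLETTERS = 26 };     /* size of the "set": letters a-z (hypothetical name) */

/* Count how often each letter a-z occurs in s, ignoring case.
 * freq must have room for NLETTERS ints; it is zeroed here first.
 * Assumes 'a'..'z' are contiguous (true for ASCII).
 */
void count_letters (const char *s, int freq[NLETTERS])
{
    for (int i = 0; i < NLETTERS; i++)      /* initialize all counts to 0 */
        freq[i] = 0;

    for (; *s; s++) {                       /* for each character in s */
        int c = tolower ((unsigned char)*s);
        if (c >= 'a' && c <= 'z')           /* member of the set? */
            freq[c - 'a']++;                /* increment its slot */
    }
}
```

Here the index into the frequency array comes straight from the character (c - 'a'). With words there is no such direct mapping, so the word version has to search the storage array to find the index first.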
Here is an equivalent to the struct-based program above.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAXC = 128, MAXW = 1000 };

int main (void) {

    char *words[MAXW] = {NULL},     /* storage array of pointers to char */
         tmp[MAXC] = "";
    int freq[MAXW] = {0}, n = 0;    /* simple integer frequency array */

    /* while < MAXW unique words, read each word in file
     * (field width 127 = MAXC - 1 keeps fscanf from overflowing tmp)
     */
    while (n < MAXW && fscanf (stdin, " %127s", tmp) == 1) {
        int i;

        for (i = 0; words[i]; i++)              /* check against existing words */
            if (strcmp (words[i], tmp) == 0)    /* if exists, break */
                break;

        if (words[i]) {     /* if exists */
            freq[i]++;      /* update frequency */
            continue;       /* get next word */
        }

        /* new word found, allocate storage (+ space for nul-byte) */
        words[n] = malloc (strlen (tmp) + 1);
        if (!words[n]) {    /* validate ALL allocations */
            fprintf (stderr, "error: memory exhausted, words[%d].\n", n);
            break;
        }

        strcpy (words[n], tmp);     /* copy new word to words[n] */
        freq[n]++;                  /* update frequency to 1 */
        n++;                        /* increment word count */
    }

    for (int i = 0; i < n; i++) {                   /* for each word */
        printf ("%s = %d\n", words[i], freq[i]);    /* output word + freq */
        free (words[i]);            /* free memory when no longer needed */
    }

    return 0;
}
Using this approach, you eliminate half of your memory allocations by using a fixed-size frequency array for the counts instead of allocating a struct per word. Either way is fine; the choice is largely up to you.
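If you want to reuse the two-array bookkeeping outside of main, the lookup-or-insert step can be pulled into a small helper. The following is only a sketch under assumptions: the function name add_word and its return convention (index on success, -1 on failure) are invented here for illustration. The caller still owns the arrays and must free each stored word.

```c
#include <stdlib.h>
#include <string.h>

enum { MAXW = 1000 };   /* same limit as the programs above */

/* Record one occurrence of word in the parallel arrays words[] / freq[].
 * *n is the number of unique words stored so far.
 * Returns the index of the word, or -1 if the arrays are full or the
 * allocation fails (in which case nothing is changed).
 * (add_word is a hypothetical helper, not part of the original programs.)
 */
int add_word (char *words[], int freq[], int *n, const char *word)
{
    for (int i = 0; i < *n; i++)            /* check against existing words */
        if (strcmp (words[i], word) == 0) {
            freq[i]++;                      /* duplicate, update frequency */
            return i;
        }

    if (*n >= MAXW)                         /* no room for a new word */
        return -1;

    size_t len = strlen (word) + 1;         /* + space for nul-byte */
    char *copy = malloc (len);
    if (!copy)                              /* validate the allocation */
        return -1;
    memcpy (copy, word, len);

    words[*n] = copy;                       /* store new word */
    freq[*n] = 1;                           /* first occurrence */
    return (*n)++;                          /* return index, then bump count */
}
```

With this helper, the read loop reduces to calling add_word (words, freq, &n, tmp) for each word read and stopping when it returns -1.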

Resources