I want to read the entire file(line by line)into a char pointer"name" in the struct array.(Wanna keep the names (can be of arbitrary length) in a dynamically allocated string Then I will divide the readed string(name) into chunks(age name score) in struct.I get seg fault.(file format is:
age name score
25,Rameiro Rodriguez,3
30,Anatoliy Stephanos,0
19,Vahan: Bohuslav,4.2
struct try{
double age;
char *name;
double score;
};
void allocate_struct_array(struct try **parr,int total_line);
int main(){
int count=0,i=0;
char ch;
fileptr = fopen("book.txt", "r");
//total line in the file is calculated
struct try *parr;
allocate_struct_array(&parr,count_lines);
//i got segmentation fault at below.(parsing code is not writed yet just trying to read the file)
while((ch=fgetc(fileptr))!=EOF) {
count++;
if(ch=='\n'){
parr->name=malloc(sizeof(char*)*count+1);
parr[i].name[count+1]='\0';
parr+=1;
count=0;
}
}
fclose(fileptr);
}
void allocate_struct_array(struct try **parr,int total_line){
*parr = malloc(total_line * sizeof(struct try));
}
Continuing from my comment, in allocate_struct_array(struct try **parr,int total_line), you allocate a block of struct try not a block of pointers (e.g. struct try*). Your allocation parr->name=malloc(sizeof(char*)*count+1); attempts to allocate count + 1 pointers. Moreover, on each iteration, you overwrite the address held by parr->name creating a memory leak because the pointer to the prior allocation is lost and cannot be freed.
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
A better approach to your problem is to read each line into a simply character array (of sufficient size to hold each line). You can then separate age, name and score and determine the number of characters in name so you can properly allocate for parr[i].name and then you can copy the name after you have allocated. If you are careful about it, you can simply locate both ',' in the buffer, allocate for parr[i].name and then use sscanf() with a proper format-string to separate, convert and copy all values to your struct parr[i] in a single call.
Since you have given no way to determine how //total line in the file is calculated, we will just presume a number large enough to accommodate your example file for purposes of discussion. Finding that number is left to you.
To read each line into an array, simply declare a buffer (character array) large enough to hold each line (take your longest expected line and multiply by 2 or 4, or if on a typical PC, just use a buffer of 1024 or 2048 bytes that will accommodate all but the obscure file with lines longer than that. (Rule: Don't Skimp On Buffer Size!!) You can do that with, e.g.
#define COUNTLINES 10 /* if you need a constant, #define one (or more) */
#define MAXC 1024
#define NUMSZ 64
...
int main (int argc, char **argv) {
char buf[MAXC]; /* temporary array to hold each line */
...
When reading until '\n' or EOF in a loop, it is easier to loop continually and check for EOF within the loop. That way the final line is handled as a normal part of your read loop and you don't need a special final code block to handle the last line, e.g.
while (nparr < count_lines) { /* protect your allocation bounds */
int ch = fgetc (fileptr); /* ch must be type int */
if (ch != '\n' && ch != EOF) { /* if not \n and not EOF */
...
}
else if (count) { /* only process buf if chars present */
...
}
if (ch == EOF) { /* if EOF, now break */
break;
}
}
(note: for your example we have continued to read with the fgetc() you used, but in normal practice you would simply use fgets() to fill the character array with the line)
To find the first and last ',' in the array, you can simply #include <string.h> and use strchar() to find the first and strrchr() to find the last. Using a pointer and end-pointer set to the first and last ',' the number of characters in name becomes ep - p - 1;. You can find the ','s and find the length of name with:
char *p = buf, *ep; /* pointer & end-pointer */
...
/* locate 1st ',' with p and last ',' with ep */
if ((p = strchr (buf, ',')) && (ep = strrchr (buf, ',')) &&
p != ep) { /* confirm pointers don't point to same ',' */
size_t len = ep - p - 1; /* get length of name */
Once you have found the first ',' and second ',' and determined the number of characters in name, you allocate characters, not pointers, e.g. with len characters in name and nparr as the struct index (instead of your i) you would do:
parr[nparr].name = malloc (len + 1); /* allocate */
if (!parr[nparr].name) { /* validate */
perror ("malloc-parr[nparr].name");
break;
}
(note: you break instead of exit on allocation error as all prior structs allocated for and filled will still contain valid data that you can use)
Now you can craft a sscanf() format string and separate age, name and score in a single call, e.g.
/* separate buf & convert into age, name, score -- validate */
if (sscanf (buf, "%d,%[^,],%lf", &parr[nparr].age,
parr[nparr].name, &parr[nparr].score) != 3) {
fputs ("error: invalid line format.\n", stderr);
...
}
Putting it altogether into a short program to read and separate your exmaple file, you could do:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define COUNTLINES 10 /* if you need a constant, #define one (or more) */
#define MAXC 1024
#define NUMSZ 64
typedef struct { /* typedef for convenient use as type */
int age; /* age is generally an integer, not double */
char *name;
double score;
} try;
/* always provde a meaningful return when function can
* succeed or fail. Return result of malloc.
*/
try *allocate_struct_array (try **parr, int total_line)
{
return *parr = malloc (total_line * sizeof **parr);
}
int main (int argc, char **argv) {
char buf[MAXC]; /* temporary array to hold each line */
int count = 0,
nparr = 0,
count_lines = COUNTLINES;
try *parr = NULL;
/* use filename provided as 1st argument (book.txt by default) */
FILE *fileptr = fopen (argc > 1 ? argv[1] : "book.txt", "r");
if (!fileptr) { /* always validate file open for reading */
perror ("fopen-fileptr");
return 1;
}
if (!fgets (buf, MAXC, fileptr)) { /* read/discard header line */
fputs ("file-empty\n", stderr);
return 1;
}
/* validate every allocation */
if (allocate_struct_array (&parr, count_lines) == NULL) {
perror ("malloc-parr");
return 1;
}
while (nparr < count_lines) { /* protect your allocation bounds */
int ch = fgetc (fileptr); /* ch must be type int */
if (ch != '\n' && ch != EOF) { /* if not \n and not EOF */
buf[count++] = ch; /* add char to buf */
if (count + 1 == MAXC) { /* validate buf not full */
fputs ("error: line too long.\n", stderr);
count = 0;
continue;
}
}
else if (count) { /* only process buf if chars present */
char *p = buf, *ep; /* pointer & end-pointer */
buf[count] = 0; /* nul-terminate buf */
/* locate 1st ',' with p and last ',' with ep */
if ((p = strchr (buf, ',')) && (ep = strrchr (buf, ',')) &&
p != ep) { /* confirm pointers don't point to same ',' */
size_t len = ep - p - 1; /* get length of name */
parr[nparr].name = malloc (len + 1); /* allocate */
if (!parr[nparr].name) { /* validate */
perror ("malloc-parr[nparr].name");
break;
}
/* separate buf & convert into age, name, score -- validate */
if (sscanf (buf, "%d,%[^,],%lf", &parr[nparr].age,
parr[nparr].name, &parr[nparr].score) != 3) {
fputs ("error: invalid line format.\n", stderr);
if (ch == EOF) /* if at EOF on failure */
break; /* break read loop */
else {
count = 0; /* otherwise reset count */
continue; /* start read of next line */
}
}
}
nparr += 1; /* increment array index */
count=0; /* reset count zero */
}
if (ch == EOF) { /* if EOF, now break */
break;
}
}
fclose(fileptr); /* close file */
for (int i = 0; i < nparr; i++) {
printf ("%3d %-20s %5.1lf\n",
parr[i].age, parr[i].name, parr[i].score);
free (parr[i].name); /* free strings when done */
}
free (parr); /* free struxts */
}
(note: Never Hardcode Filenames or use Magic-Numbers in your code. If you need a constant, #define ... one. Pass the filename to read as the first argument to your program or take the filename as input. You shouldn't have to recompile your code just to read from a different filename)
Example Use/Output
With your example data in dat/parr_name.txt, you would have:
$ ./bin/parr_name dat/parr_name.txt
25 Rameiro Rodriguez 3.0
30 Anatoliy Stephanos 0.0
19 Vahan: Bohuslav 4.2
Memory Use/Error Check
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/parr_name dat/parr_name.txt
==17385== Memcheck, a memory error detector
==17385== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17385== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==17385== Command: ./bin/parr_name dat/parr_name.txt
==17385==
25 Rameiro Rodriguez 3.0
30 Anatoliy Stephanos 0.0
19 Vahan: Bohuslav 4.2
==17385==
==17385== HEAP SUMMARY:
==17385== in use at exit: 0 bytes in 0 blocks
==17385== total heap usage: 7 allocs, 7 frees, 5,965 bytes allocated
==17385==
==17385== All heap blocks were freed -- no leaks are possible
==17385==
==17385== For counts of detected and suppressed errors, rerun with: -v
==17385== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Using fgets() To Read Each Line And A Temp Array For name
To not leave you with the wrong impression, this problem can be simplified substantially by reading each line into a character array using fgets() and separating the needed values with sscanf(), saving name into a temporary array of sufficient size. Now all that is needed is to allocate for parr[nparr].name and then copy the temporary name to parr[nparr].name.
By doing it this way you substantially reduce the complexity of reading character-by-character and by using a temporary array for name, you eliminate having to locate the ',' in order to obtain the length of the name.
The only changes needed are to add a new constant for the temporary name array and then you can replace the entire read-loop with:
#define NAMSZ 256
...
/* protect memory bounds, read each line into buf */
while (nparr < count_lines && fgets (buf, MAXC, fileptr)) {
char name[NAMSZ]; /* temporary array for name */
size_t len; /* length of name */
/* separate buf into age, temp name, score & validate */
if (sscanf (buf, "%d,%[^,],%lf", &parr[nparr].age, name,
&parr[nparr].score) != 3) {
fputs ("error: invalid line format.\n", stderr);
continue;
}
len = strlen (name); /* get length of name */
parr[nparr].name = malloc (len + 1); /* allocate for name */
if (!parr[nparr].name) { /* validate allocation */
perror ("malloc-parr[nparr].name");
break;
}
memcpy (parr[nparr].name, name, len + 1);
nparr += 1;
}
fclose(fileptr); /* close file */
...
(same output and same memory check)
Also note you can allocate and copy as a single operation if your compiler provides strdup(). That would reduce the allocation and copy of name to a single call, e.g.
parr[nparr].name = strdup (name);
Since strdup() allocates memory (and can fail), you must validate the allocation just as you would if you were using malloc() amd memcpy(). But, understand, strdup() is not standard C. It is a POSIX function that isn't part of the standard library.
The other improvement you can make is adding logic to call realloc() when your block of struct (parr) is full. That way you can start with some reasonably anticipated number of struct and then reallocate more whenever you run out. This will eliminate the artificial limit on the number of lines you can store -- and remove the need to know count_lines. (there are numerous examples on this site of how to use realloc(), the implementation is left to you.
Look things over and let me know if you have further questions.
Related
I basically want to print out all of the lines in a file i made
but then it just loops over and over back to the start
basically cuz in the function i set fseek(fp, 0, SEEK_SET); this part
but idk how otherwise i would place it to get through all the other lines
im basically going back to the start every time.
#include<stdio.h>
#include <stdlib.h>
char *freadline(FILE *fp);
int main(){
FILE *fp = fopen("story.txt", "r");
if(fp == NULL){
printf("Error!");
}else{
char *pInput;
while(!feof(fp)){
pInput = freadline(fp);
printf("%s\n", pInput); // outpu
}
}
return 0;
}
char *freadline(FILE *fp){
int i;
for(i = 0; !feof(fp); i++){
getc(fp);
}
fseek(fp, 0, SEEK_SET); // resets to origin (?)
char *pBuffer = (char*)malloc(sizeof(char)*i);
pBuffer = fgets(pBuffer, i, fp);
return pBuffer;
}
this is my work so far
Continuing from my comments, you are thinking along the correct lines, you just haven't put the pieces together in the right order. In main() you are looping calling your function, allocating for a single line, and then outputting a line of text and doing it over and over again. (and without freeing any of the memory you allocate -- creating a memory leak of every line read)
If you are allocating storage to hold the line, you will generally want to read the entire file in your function in a single pass though the file allocating for, and storing all lines, and returning a pointer to your collection of lines for printing in main() (or whatever the calling function is).
You do that by adding one additional level of indirection and having your function return char **. This is a simple two-step process where you:
allocate a block of memory containing pointers (one pointer for each line). Since you will not know how many lines beforehand, you simply allocate some number of pointers and then realloc() more pointers when you run out;
for each line you read, you allocate length + 1 characters of storage and copy the current line to that block of memory assigning the address for the line to your next available pointer.
(you either keep track of the number of pointers and lines allocated, or provide an additional pointer set to NULL as a Sentinel after the last pointer assigned a line -- up to you, simply keeping track with a counter is likely conceptually easier)
After reading your last line, you simply return the pointer to the collection of pointers which is assigned for use back in the caller. (you can also pass the address of a char ** as a parameter to your function, resulting in the type being char ***, but being a Three-Star Programmer isn't always a compliment). However, there is nothing wrong with doing it that way, and in some cases, it will be required, but if you have an alternative, that is generally the preferred route.
So how would this work in practice?
Simply change your function return type to char ** and pass an additional pointer to a counting variable so you can update the value at that address with the number of lines read before you return from your function. E.g., you could do:
char **readfile (FILE *fp, size_t *n);
Which will take your file pointer to an open file stream and then read each line from it, allocating storage for the line and assigning the address for that allocation to one of your pointers. Within the function, you would use a sufficiently sized character array to hold each line you read with fgets(). Trim the '\n' from the end and get the length and then allocate length + 1 bytes to hold the line. Assign the address for the newly allocated block to a pointer and copy from your array to the newly allocated block.
(strdup() can both allocate and copy, but it is not part of the standard library, it's POSIX -- though most compilers support it as an extension if you provide the proper options)
Below the readfile() function puts it altogether, starting with a single pointer and reallocation twice the current number when you run out (that provides a reasonable trade-off between the number of allocations needed and the number of pointers. (after just 20 calls to realloc(), you would have 1M pointers). You can choose any reallocation and growth scheme you like, but you want to avoid calling realloc() for every line -- realloc() is still a relatively expensive call.
#define MAXC 1024 /* if you need a constant, #define one (or more) */
#define NPTR 1 /* initial no. of pointers to allocate */
/* readfile reads all lines from fp, updating the value at the address
* provided by 'n'. On success returns pointer to allocated block of pointers
* with each of *n pointers holding the address of an allocated block of
* memory containing a line from the file. On allocation failure, the number
* of lines successfully read prior to failure is returned. Caller is
* responsible for freeing all memory when done with it.
*/
char **readfile (FILE *fp, size_t *n)
{
char buffer[MAXC], **lines; /* buffer to hold each line, pointer */
size_t allocated = NPTR, used = 0; /* allocated and used pointers */
lines = malloc (allocated * sizeof *lines); /* allocate initial pointer(s) */
if (lines == NULL) { /* validate EVERY allocation */
perror ("malloc-lines");
return NULL;
}
while (fgets (buffer, MAXC, fp)) { /* read each line from file */
size_t len; /* variable to hold line-length */
if (used == allocated) { /* is pointer reallocation needed */
/* always realloc to a temporary pointer to avoid memory leak if
* realloc fails returning NULL.
*/
void *tmp = realloc (lines, 2 * allocated * sizeof *lines);
if (!tmp) { /* validate EVERY reallocation */
perror ("realloc-lines");
break; /* lines before failure still good */
}
lines = tmp; /* assign reallocted block to lines */
allocated *= 2; /* update no. of allocated pointers */
}
buffer[(len = strcspn (buffer, "\n"))] = 0; /* trim \n, save length */
lines[used] = malloc (len + 1); /* allocate storage for line */
if (!lines[used]) { /* validate EVERY allocation */
perror ("malloc-lines[used]");
break;
}
memcpy (lines[used], buffer, len + 1); /* copy buffer to lines[used] */
used++; /* increment used no. of pointers */
}
*n = used; /* update value at address provided by n */
/* can do final realloc() here to resize exactly to used no. of pointers */
return lines; /* return pointer to allocated block of pointers */
}
In main(), you simply pass your file pointer and the address of a size_t variable and check the return before iterating through the pointers making whatever use of the line you need (they are simply printed below), e.g.
int main (int argc, char **argv) {
char **lines; /* pointer to allocated block of pointers and lines */
size_t n; /* number of lines read */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
lines = readfile (fp, &n);
if (fp != stdin) /* close file if not stdin */
fclose (fp);
if (!lines) { /* validate readfile() return */
fputs ("error: no lines read from file.\n", stderr);
return 1;
}
for (size_t i = 0; i < n; i++) { /* loop outputting all lines read */
puts (lines[i]);
free (lines[i]); /* don't forget to free lines */
}
free (lines); /* and free pointers */
return 0;
}
(note: don't forget to free the memory you allocated when you are done. That become critical when you are calling functions that allocate within other functions. In main(), the memory will be automatically released on exit, but build good habits.)
Example Use/Output
$ ./bin/readfile_allocate dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
The program will read any file, no matter if it has 4-lines or 400,000 lines up to the physical limit of your system memory (adjust MAXC if your lines are longer than 1023 characters).
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/readfile_allocate dat/captnjack.txt
==4801== Memcheck, a memory error detector
==4801== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4801== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==4801== Command: ./bin/readfile_allocate dat/captnjack.txt
==4801==
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
==4801==
==4801== HEAP SUMMARY:
==4801== in use at exit: 0 bytes in 0 blocks
==4801== total heap usage: 10 allocs, 10 frees, 5,804 bytes allocated
==4801==
==4801== All heap blocks were freed -- no leaks are possible
==4801==
==4801== For counts of detected and suppressed errors, rerun with: -v
==4801== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
The full code used for the example simply includes the headers, but is included below for completeness:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
#define NPTR 1 /* initial no. of pointers to allocate */
/* readfile reads all lines from fp, updating the value at the address
* provided by 'n'. On success returns pointer to allocated block of pointers
* with each of *n pointers holding the address of an allocated block of
* memory containing a line from the file. On allocation failure, the number
* of lines successfully read prior to failure is returned. Caller is
* responsible for freeing all memory when done with it.
*/
char **readfile (FILE *fp, size_t *n)
{
char buffer[MAXC], **lines; /* buffer to hold each line, pointer */
size_t allocated = NPTR, used = 0; /* allocated and used pointers */
lines = malloc (allocated * sizeof *lines); /* allocate initial pointer(s) */
if (lines == NULL) { /* validate EVERY allocation */
perror ("malloc-lines");
return NULL;
}
while (fgets (buffer, MAXC, fp)) { /* read each line from file */
size_t len; /* variable to hold line-length */
if (used == allocated) { /* is pointer reallocation needed */
/* always realloc to a temporary pointer to avoid memory leak if
* realloc fails returning NULL.
*/
void *tmp = realloc (lines, 2 * allocated * sizeof *lines);
if (!tmp) { /* validate EVERY reallocation */
perror ("realloc-lines");
break; /* lines before failure still good */
}
lines = tmp; /* assign reallocted block to lines */
allocated *= 2; /* update no. of allocated pointers */
}
buffer[(len = strcspn (buffer, "\n"))] = 0; /* trim \n, save length */
lines[used] = malloc (len + 1); /* allocate storage for line */
if (!lines[used]) { /* validate EVERY allocation */
perror ("malloc-lines[used]");
break;
}
memcpy (lines[used], buffer, len + 1); /* copy buffer to lines[used] */
used++; /* increment used no. of pointers */
}
*n = used; /* update value at address provided by n */
/* can do final realloc() here to resize exactly to used no. of pointers */
return lines; /* return pointer to allocated block of pointers */
}
int main (int argc, char **argv) {
char **lines; /* pointer to allocated block of pointers and lines */
size_t n; /* number of lines read */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
lines = readfile (fp, &n);
if (fp != stdin) /* close file if not stdin */
fclose (fp);
if (!lines) { /* validate readfile() return */
fputs ("error: no lines read from file.\n", stderr);
return 1;
}
for (size_t i = 0; i < n; i++) { /* loop outputting all lines read */
puts (lines[i]);
free (lines[i]); /* don't forget to free lines */
}
free (lines); /* and free pointers */
return 0;
}
Let me know if you have further questions.
If you want to read a fine till the last line, you can simply use getc(). It returns EOF when end of file is reached or it fails to read. So, if using getc(), to make sure the end of file is reached, it's better to use feof(), which returns a non-zero value if end of file is reached, else 0.
Example :-
int main()
{
FILE *fp = fopen("story.txt", "r");
int ch = getc(fp);
while (ch != EOF)
{
/* display contents of file on screen */
putchar(ch);
ch = getc(fp);
}
if (feof(fp))
printf("\n End of file reached.");
else
printf("\nError while reading!");
fclose(fp);
getchar();
return 0;
}
you can also fgets(), below is the example :-
#define MAX_LEN 256
int main(void)
{
FILE *fp = fopen("story.txt", "r");
if (fp == NULL) {
perror("Failed: "); //prints a descriptive error message to stderr.
return 1;
}
char buffer[MAX_LEN];
// -1 to allow room for NULL terminator for really long string
while (fgets(buffer, MAX_LEN - 1, fp))
{
// Remove trailing newline
buffer[strcspn(buffer, "\n")] = 0;
printf("%s\n", buffer);
}
fclose(fp);
return 0;
}
alternatively,
int main() {
int MAX_LEN = 255;
char buffer[MAX_LEN];
FILE *fp = fopen("story.txt", "r");
while(fgets(buffer, MAX_LEN, fp)) {
printf("%s\n", buffer);
}
fclose(fp);
return 0;
}
For more details, refer C read file line by line
Your approach is wrong and it's very likely that it will generate an endless loop. I'll explain why using the original code and inline comments:
char *freadline(FILE *fp){
int i;
// This part attempts to count the number of characters
// in the whole file by reading char-by-char until EOF is set
for(i = 0; !feof(fp); i++){
getc(fp);
}
// Here EOF is set
// This returns to the start and clears EOF
fseek(fp, 0, SEEK_SET);
// Here EOF is cleared
char *pBuffer = (char*)malloc(sizeof(char)*i);
// Here you read a line, i.e. you read characters until (and including) the
// first newline in the file.
pBuffer = fgets(pBuffer, i, fp);
// Here EOF is still cleared as you only read the first line of the file
return pBuffer;
}
So in main when you do
while(!feof(fp)){
...
}
you have an endless loop as feof is false. Your program will print the same line again and again and you have memory leaks as you never call free(pInput).
So you need to redesign your code. Read what fgets do, e.g. here https://man7.org/linux/man-pages/man3/fgets.3p.html
A number of issues to address:
Using fgets does not guarantee that you read a line after the function returns. So if you really want to check whether you've read a complete line, check the number of characters in the returned string, and also check for the presence of a new-line character at the end of the string.
Your use of fseek is interesting here because what it does is to tell the stream pointer to go back to the start of the file, and start reading from there. This means that after the first time the freadline function is called, you will continue reading the first byte from the file each time.
Lastly, your program is hoarding memory like a greedy baby! You never free any of those allocations you did!
With that being said, here is an improved freadline implementation:
char *freadline(FILE *fp) {
/* initializations */
char buf[BUFSIZ + 1];
char *pBuffer;
size_t size = 0, tmp_size;
/* fgets returns NULL when it reaches EOF, so our loop is conditional
* on that
*/
while (fgets (buf, BUFSIZ + 1, fp) != NULL) {
tmp_size = strlen (buf);
size += tmp_size;
if (tmp_size != BUFSIZ || buf[BUFSIZ] == '\n')
break;
}
/* after breaking from loop, check that size is not zero.
* this should only happen if we reach EOF, so return NULL
*/
if (size == 0)
return NULL;
/* Allocate memory for the line plus one extra for the null byte */
pBuffer = malloc (size + 1);
/* reads the contents of the file into pBuffer */
if (size <= BUFSIZ) {
/* Optimization: use memcpy rather than reading
* from disk if the line is small enough
*/
memcpy (pBuffer, buf, size);
} else {
fseek (fp, ftell(fp) - size, SEEK_SET);
fread (pBuffer, 1, size, fp);
}
pBuffer[size] = '\0'; /* set the null terminator byte */
return pBuffer; /* remember to free () this when you are done! */
}
This way will not need the additional call to feof (which is often a hit and miss), and instead relies on what fgets returns to determine if we have reached the end of file.
With these changes, it should be enough to change main to:
int main() {
FILE *fp = fopen("story.txt", "r");
if(fp == NULL){
printf("Error!");
}else{
char *pInput;
/* here we just keep reading until a NULL string is returned */
for (pInput = freadline(fp); pInput != NULL; pInput = freadline(fp))) {
printf("%s", pInput); // output
free (pInput);
}
fclose(fp);
}
return 0;
}
I am trying to parse a csv into a dynamically allocated array of structures, however my attempt crashes with a segmentation fault.
Here is the structure of my data:
SO02773202,5087001,0
SO02773203,5087001,0
SO02773204,5087001,0
SO02773205,5087001,0
SO02773206,5087001,14
This is the struct I am parsing the data into:
typedef struct saleslines{
char* salesid;
char* smmcampaignid;
int numberofbottles;
} saleslines_t;
Here is my attempt at parsing the file:
int read_saleslines(saleslines_t* saleslines, int number_of_lines){
char c;
FILE* fp;
fp = fopen(FILENAME, "r"); /* Open the saleslines file */
if(fp == NULL){ /* Crash if file not found */
printf("Error - file not found\n");
return 0;
}
c = getc(fp);
while (c != EOF){
if (c == '\n'){
number_of_lines += 1;
}
c = getc(fp);
}
printf("Number of lines is %d\n", number_of_lines);
saleslines = (saleslines_t*) malloc((number_of_lines * 2) * sizeof(saleslines_t));
/* allocation of the buffer for every line in the File */
char *buf = (char*) malloc(1000);
char *tmp;
if ( ( fp = fopen(FILENAME, "r" ) ) == NULL )
{
printf( "File could not be opened.\n" );
}
int i = 0;
while (fgets(buf, 255, fp) != NULL){
if ((strlen(buf)>0) && (buf[strlen (buf) - 1] == '\n'))
buf[strlen (buf) - 1] = '\0';
tmp = strtok(buf, ",");
saleslines[i].salesid = strdup(tmp);
tmp = strtok(NULL, ",");
saleslines[i].smmcampaignid = strdup(tmp);
tmp = strtok(NULL, ",");
saleslines[i].numberofbottles = atoi(tmp);
printf("Salesid: %s\nCampaign: %s\nBottles: %i\n\n", saleslines[i].salesid , saleslines[i].smmcampaignid, saleslines[i].numberofbottles);
i++;
}
free(buf);
fclose(fp);
printf("Number of lines is %i\n", number_of_lines);
return number_of_lines;
}
For some reason it parses the file and prints the resulting array of structs, however when I call this function immediately after, it crashes with a segfault:
void print_saleslines_struct(saleslines_t* saleslines, int number_of_lines{
int i;
printf("Number of lines is %i", number_of_lines);
for(i = 0; i < number_of_lines; i++){
printf("Salesid:\t %s\n", saleslines[i].salesid);
printf("Campaign:\t %s\n", saleslines[i].smmcampaignid);
printf("# of Bottles:\t %d\n", saleslines[i].numberofbottles);
}
}
I can't seem to find where this memory bug is.
Here is the initialization and main:
saleslines_t* saleslines;
saleslines_summary_t* saleslines_summary;
saleslines_grouped_t* saleslines_grouped;
int number_of_lines = 0;
int* number_of_linesp = &number_of_lines;
/* Main */
int main(){
int chosen_option;
while(1){
printf("What would you like to do?\n");
printf("1. Read saleslines.txt\n");
printf("2. Print saleslines\n");
printf("3. Summarise saleslines\n");
printf("4. Exit the program\n");
scanf("%d", &chosen_option);
switch(chosen_option){
/* case 1 : number_of_lines = read_saleslines_file(saleslines, number_of_lines); break; */
case 1 : number_of_lines = read_saleslines(saleslines, number_of_lines); break;
case 2 : printf("Number of lines is %i", number_of_lines); print_saleslines_struct(saleslines, number_of_lines); break;
case 3 : summarise_saleslines(saleslines, number_of_linesp, saleslines_summary, saleslines_grouped); break;
case 4 : free(saleslines); free(saleslines_summary); free(saleslines_grouped); return 0;
}
}
return 0;
}
Update
The issue seems to be with my initialization of the array of structures.
When I initialize it like this: saleslines_t* saleslines;
and then malloc like this: saleslines = malloc(number_of_lines + 1 * sizeof(saleslines_t);
I get a segfault.
But if I initialize like this: saleslines[600]; (allocating more than the number of lines in the file), everything works.
How can I get around this? I would like to be able to dynamically allocate the number of entries within the struct array.
Edit 2
Here are the changes as suggested:
int read_saleslines(saleslines_t** saleslines, int number_of_lines);
saleslines_t* saleslines;
int number_of_lines = 0;
int main(){
while(1){
printf("What would you like to do?\n");
printf("1. Read saleslines.txt\n");
printf("2. Print saleslines\n");
printf("3. Summarise saleslines\n");
printf("4. Exit the program\n");
printf("Number of saleslines = %i\n", number_of_lines);
scanf("%d", &chosen_option);
switch(chosen_option){
/* case 1 : number_of_lines = read_saleslines_file(saleslines, number_of_lines); break; */
case 1 : number_of_lines = read_saleslines(&saleslines, number_of_lines); break;
case 2 : printf("Number of lines is %i", number_of_lines); print_saleslines_struct(saleslines, number_of_lines); break;
case 3 : summarise_saleslines(saleslines, number_of_linesp, saleslines_summary, saleslines_grouped); break;
case 4 : free(saleslines); free(saleslines_summary); free(saleslines_grouped); return 0;
}
}
return 0;
}
int read_saleslines(saleslines_t** saleslines, int number_of_lines)
{
char c;
FILE* fp;
fp = fopen(FILENAME, "r"); /* Open the saleslines file */
if(fp == NULL){ /* Crash if file not found */
printf("Error - file not found\n");
return 0;
}
c = getc(fp);
while (c != EOF){
if (c == '\n'){
number_of_lines += 1;
}
c = getc(fp);
}
fclose(fp);
printf("Number of lines is %d\n", number_of_lines);
*saleslines = (saleslines_t*) malloc((number_of_lines + 1) * sizeof(saleslines_t));
/* allocation of the buffer for every line in the File */
char *buf = malloc(25);
char *tmp;
if ( ( fp = fopen(FILENAME, "r" ) ) == NULL )
{
printf( "File could not be opened.\n" );
}
int i = 0;
while (fgets(buf, 25, fp) != NULL){
if ((strlen(buf)>0) && (buf[strlen (buf) - 1] == '\n'))
buf[strlen (buf) - 1] = '\0';
tmp = strtok(buf, ",");
(*saleslines)[i].salesid = strdup(tmp);
tmp = strtok(NULL, ",");
(*saleslines)[i].smmcampaignid = strdup(tmp);
tmp = strtok(NULL, ",");
(*saleslines)[i].numberofbottles = atoi(tmp);
printf("Salesid: %s\nCampaign: %s\nBottles: %i\n\n", saleslines[i]->salesid , saleslines[i]->smmcampaignid, saleslines[i]->numberofbottles);
i++;
}
free(buf);
fclose(fp);
printf("Number of lines is %i\n", number_of_lines);
return number_of_lines;
}
The program now segfaults after reading the first element in the struct array.
You have a problem with the arguments of read_saleslines(). The first argument should be a pointer to an array of your structs, meaning a double pointer.
In
int read_saleslines(saleslines_t* saleslines, int number_of_lines){
you want to modify where saleslines is pointing. saleslines is a local variable of the function, and the scope is that function. Once you exit read_saleslines(), the variable is "destroyed", meaning that the value it holds it is not accessible anymore. Adding another level of indirection, a pointer, you can modify the variable that's defined outside the function, being that (ugly) global or other. So, change that argument so that the function prototype matches
int read_saleslines(saleslines_t** saleslines, int *);
and change the places where you access it inside the function (adding an * to access it, for example:
saleslines = (saleslines_t*) malloc((number_of_lines * ...
to
*saleslines = (saleslines_t*) malloc((number_of_lines * ...
and
saleslines[i].salesid = strdup(tmp);
to
(*saleslines)[i].salesid = strdup(tmp);
Then add an & where you use the variable outside the function:
number_of_lines = read_saleslines(saleslines, number_of_lines);
changes to
some_var = read_saleslines(&saleslines, &number_of_lines);
That will make you code work.
You have a large number of errors in your code, and with your approach in general. There is no need to make two-passes over the file to determine the number of lines before allocating and then re-reading the file in an attempt to parse the data. Further, there is no need to tokenize each line to separate the comma-separated-values, sscanf() to parse the two strings and one int is sufficient here after reading each line with fgets.
While you are free to pass any mix of parameters you like and return whatever you like, since you are allocating for an array of struct and reading values into the array, it makes sense to return a pointer to the allocated array from your function (or NULL on failure) and simply update a parameter passed as a pointer to make the total number of lines read available back in the caller.
Further, generally you want to open and validate the file in the caller and pass a FILE* parameter passing the open file stream to your function. With that in mind, you could refactor your function as:
/* read saleslines into array of saleslines_t, allocating for
* salesid, and smmcampaignid within each struct. Return pointer
* to allocated array on success with lines updated to hold the
* number of elements, or NULL otherwise.
*/
saleslines_t *read_saleslines (FILE *fp, size_t *lines)
{
Within your function, you simply need a buffer to hold each line read, a counter to track the number of elements allocated in your array, and a pointer to your array to return. For example, you could do something like the following to handle all three:
char buf[MAXC]; /* buffer to hold line */
size_t maxlines = MINL; /* maxlines allocated */
saleslines_t *sales = NULL; /* pointer to array of struct */
(note: since you are tracking the number of lines read through the pointer lines passed as a parameter, it would make sense to initialize the value at that address to zero)
Now the work of your function begins, you want to read each line into buf and parse the needed information from each line. Since salesid and smmcampaignid are both pointers-to-char in your struct, you will need to allocate a block of memory for each string parsed from the line, copy the string to the new block of memory, and then assign the beginning address for the bock to each of your pointers. To "dynamically" handle allocating elements for your struct, you simply check if the number of lines (*lines) filled equals against the number allocated (maxlines), (or if *lines is zero indicating a need for an initial allocation), and realloc in both cases to either realloc (or newly allocate) storage for your array of struct.
When you realloc you always realloc using a temporary pointer so if realloc fails and returns NULL, you don't overwrite your pointer to the currently allocated block with NULL thereby creating a memory leak.
Putting all that together at the beginning of your function may seem daunting, but it is actually straight forward, e.g.
while (fgets (buf, MAXC, fp)) { /* read each line in file */
char id[MAXC], cid[MAXC]; /* temp arrays to hold strings */
int bottles; /* temp int for numberofbottles */
if (*lines == maxlines || !*lines) { /* check if realloc req'd */
/* always realloc with a temp pointer */
void *tmp = realloc (sales, 2 * maxlines * sizeof *sales);
if (!tmp) { /* if realloc fails, original pointer still valid */
perror ("realloc-sales"); /* throw error */
return sales; /* return current pointer */
} /* (don't exit or return NULL) */
sales = tmp; /* assign reallocated block to sales */
/* (optional) zero newly allocated memory */
memset (sales + *lines, 0, maxlines * sizeof *sales);
maxlines *= 2; /* update maxlines allocated */
}
Now you are ready to parse the wanted information from your line with sscanf, and then following a successful parse of information, you can allocate for each of your salesid and smmcampaignid pointers, copy the parsed information to the new blocks of memory assigning the beginning address to each pointer, respectively, e.g.
/* parse needed data from line (sscanf is fine here) */
if (sscanf (buf, "%1023[^,],%1023[^,],%d", id, cid, &bottles) == 3) {
size_t idlen = strlen (id), /* get lengths of strings */
cidlen = strlen (cid);
sales[*lines].salesid = malloc (idlen + 1); /* allocate string */
if (!sales[*lines].salesid) { /* validate! */
perror ("malloc-sales[*lines].salesid");
break;
}
sales[*lines].smmcampaignid = malloc (cidlen + 1); /* ditto */
if (!sales[*lines].smmcampaignid) {
perror ("malloc-sales[*lines].smmcampaignid");
break;
}
memcpy (sales[*lines].salesid, id, idlen + 1); /* copy strings */
memcpy (sales[*lines].smmcampaignid, cid, cidlen + 1);
sales[(*lines)++].numberofbottles = bottles; /* assign int */
} /* (note lines counter updated in last assignment) */
(note: you can use strdup to both get the length of each string parsed and allocate sufficient memory to hold the string and assign that to your pointer in one-shot, e.g. sales[*lines].salesid = strdup (id);, but... strdup is not required to be included in C99 or later, so it is just as simple to get the length, allocate length + 1 bytes and then memcpy your string manually to ensure portability. Further, since strdup allocates memory, you must validate the pointer returned -- something overlooked by 99% of those using it.)
That's it, when fgets() fails, you have reached EOF, now simply:
return sales; /* return dynamically allocated array of struct */
}
Putting it altogether in a short, working example that takes the filename to read as the first argument to your program (or reads from stdin by default if no argument is given), you could do:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
#define MINL 2
typedef struct saleslines{
char *salesid;
char *smmcampaignid;
int numberofbottles;
} saleslines_t;
/* read saleslines into array of saleslines_t, allocating for
* salesid, and smmcampaignid within each struct. Return pointer
* to allocated array on success with lines updated to hold the
* number of elements, or NULL otherwise.
*/
saleslines_t *read_saleslines (FILE *fp, size_t *lines)
{
char buf[MAXC]; /* buffer to hold line */
size_t maxlines = MINL; /* maxlines allocated */
saleslines_t *sales = NULL; /* pointer to array of struct */
*lines = 0; /* zero lines */
while (fgets (buf, MAXC, fp)) { /* read each line in file */
char id[MAXC], cid[MAXC]; /* temp arrays to hold strings */
int bottles; /* temp int for numberofbottles */
if (*lines == maxlines || !*lines) { /* check if realloc req'd */
/* always realloc with a temp pointer */
void *tmp = realloc (sales, 2 * maxlines * sizeof *sales);
if (!tmp) { /* if realloc fails, original pointer still valid */
perror ("realloc-sales"); /* throw error */
return sales; /* return current pointer */
} /* (don't exit or return NULL) */
sales = tmp; /* assign reallocated block to sales */
/* (optional) zero newly allocated memory */
memset (sales + *lines, 0, maxlines * sizeof *sales);
maxlines *= 2; /* update maxlines allocated */
}
/* parse needed data from line (sscanf is fine here) */
if (sscanf (buf, "%1023[^,],%1023[^,],%d", id, cid, &bottles) == 3) {
size_t idlen = strlen (id), /* get lengths of strings */
cidlen = strlen (cid);
sales[*lines].salesid = malloc (idlen + 1); /* allocate string */
if (!sales[*lines].salesid) { /* validate! */
perror ("malloc-sales[*lines].salesid");
break;
}
sales[*lines].smmcampaignid = malloc (cidlen + 1); /* ditto */
if (!sales[*lines].smmcampaignid) {
perror ("malloc-sales[*lines].smmcampaignid");
break;
}
memcpy (sales[*lines].salesid, id, idlen + 1); /* copy strings */
memcpy (sales[*lines].smmcampaignid, cid, cidlen + 1);
sales[(*lines)++].numberofbottles = bottles; /* assign int */
} /* (note lines counter updated in last assignment) */
}
return sales; /* return dynamically allocated array of struct */
}
int main (int argc, char **argv) {
saleslines_t *sales = NULL; /* pointer to saleslines_t */
size_t nlines;
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
sales = read_saleslines (fp, &nlines); /* read saleslines */
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (size_t i = 0; i < nlines; i++) { /* loop over each */
printf ("sales[%2zu]: %s %s %2d\n", i, sales[i].salesid,
sales[i].smmcampaignid, sales[i].numberofbottles);
free (sales[i].salesid); /* free salesid */
free (sales[i].smmcampaignid); /* free smmcampaignid */
}
free (sales); /* free sales */
return 0;
}
Example Use/Output
$ ./bin/saleslines dat/saleslines.txt
sales[ 0]: SO02773202 5087001 0
sales[ 1]: SO02773203 5087001 0
sales[ 2]: SO02773204 5087001 0
sales[ 3]: SO02773205 5087001 0
sales[ 4]: SO02773206 5087001 14
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/saleslines dat/saleslines.txt
==19819== Memcheck, a memory error detector
==19819== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==19819== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==19819== Command: ./bin/saleslines dat/saleslines.txt
==19819==
sales[ 0]: SO02773202 5087001 0
sales[ 1]: SO02773203 5087001 0
sales[ 2]: SO02773204 5087001 0
sales[ 3]: SO02773205 5087001 0
sales[ 4]: SO02773206 5087001 14
==19819==
==19819== HEAP SUMMARY:
==19819== in use at exit: 0 bytes in 0 blocks
==19819== total heap usage: 13 allocs, 13 frees, 935 bytes allocated
==19819==
==19819== All heap blocks were freed -- no leaks are possible
==19819==
==19819== For counts of detected and suppressed errors, rerun with: -v
==19819== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
There is nothing difficult in dynamically allocating for anything. Just take it in small enough pieces that you dot all "I's" and cross all "T's" for each pointer requiring allocation. Look things over and let me know if you have further questions.
I am trying to count all syllables in each word of a string that I passed into an array.
A syllable counts as two vowels adjacent to each other (a, e, i , o ,u, y). For example, the "ee" in "peel" counts as 1 syllable. But, the "u" and "e" in "juked" count as 2 syllables. The "e" at the end of a word does not count as a syllable. Also, each word has at least 1 syllable even if the previous rules don't apply.
I have a file that contained a large string (words, spaces, and newlines) that I passed into an array.
I have code that counts each word by counting the whitespace between them and newlines. See below:
for (i = 0; i < lengthOfFile; i++)
{
if (charArray[i] == ' ' || charArray[i] == '\n')
{
wordCount++;
}
}
Where charArray is the file that is passed into an array (freads) and lengthOfFile is the total bytes in the file counted by (fseek) and wordCount is total words counted.
From here, I need to somehow count the syllables in each word in the array but don't know where to start.
If you are still having trouble, it's only because you are overthinking the problem. Whenever you are counting, determining frequency, etc.., you can normally simplify things by using a "State-Loop". A state-loop is nothing more than a loop where you loop over each character (or whatever) and handle whatever state you find yourself in, like:
have I read any characters? (if not, handle that state);
is the current character a space? (if so, assuming no multiple-spaces for simplicity, you have reached the end of a word, handle that state);
is the current character a non-space and non-vowel? (if so, if my last character was a vowel, increment my syllable count); and
what do I need to do regardless of the classification of the current char? (output it, set last = current, etc..)
That's basically it and can be translated into a single loop with a number of tests to handle each state. You can also add a check to insure that words like "my" and "he" are counted as a single syllable by checking if your syllable count is zero when you reach the end of the word.
Putting it altogether, you could write a basic implementation like:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main (void) {
char c, last = 0; /* current & last char */
const char *vowels = "AEIOUYaeiouy"; /* vowels (plus Yy) */
size_t syllcnt = 0, totalcnt = 0; /* word syllable cnt & total */
while ((c = getchar()) != EOF) { /* read each character */
if (!last) { /* if 1st char (no last) */
putchar (c); /* just output it */
last = c; /* set last */
continue; /* go get next */
}
if (isspace (c)) { /* if space, end of word */
if (!syllcnt) /* if no syll, it's 1 (he, my) */
syllcnt = 1;
printf (" - %zu\n", syllcnt); /* output syll cnt and '\n' */
totalcnt += syllcnt; /* add to total */
syllcnt = 0; /* reset syllcnt to zero */
} /* otherwise */
else if (!strchr (vowels, c)) /* if not vowel */
if (strchr (vowels, last)) /* and last was vowel */
syllcnt++; /* increment syllcnt */
if (!isspace (c)) /* if not space */
putchar (c); /* output it */
last = c; /* set last = c */
}
printf ("\n total syllables: %zu\n", totalcnt);
}
(note: as mentioned above, this simple example implementation does not consider multiple spaces between words -- which you can simply add as another needed condition by checking whether !isspace (last). Can you figure out where that check should be added, hint: it's added to an existing check with && -- fine tuning is left to you)
Example Use/Output
$ echo "my dog eats banannas he peels while getting juked" | ./syllablecnt
my - 1
dog - 1
eats - 1
banannas - 3
he - 1
peels - 1
while - 1
getting - 2
juked - 2
total syllables: 13
If you need to read words from a file, simply redirect the file as input to the program on stdin, e.g.
./syllablecnt < inputfile
Edit - Reading from File into Dynamically Allocated Buffer
Following on from the comments about wanting to read from a file (or stdin) into a dynamically sized buffer and then traversing the buffer to output the syllables per-word and total syllables, you could do something like the following that simply reads all characters from a file into a buffer initially allocated holding 8-characters and is reallocated as needed (doubling the allocation size each time a realloc is needed). That is a fairly standard and reasonably efficient buffer growth strategy. You are free to grow it by any size you like, but avoid many small rabbit-pellet reallocations as memory allocation is relatively expensive from a computing standpoint.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define NCHAR 8 /* initial characters to allocate */
int main (int argc, char **argv) {
char c, last = 0, *buffer; /* current, last & pointer */
const char *vowels = "AEIOUYaeiouy"; /* vowels */
size_t syllcnt = 0, totalcnt = 0, /* word syllable cnt & total */
n = 0, size = NCHAR;
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("fopen-file");
return 1;
}
/* allocate/validate initial NCHAR buffer size */
if (!(buffer = malloc (size))) {
perror ("malloc-buffer");
return 1;
}
while ((c = fgetc(fp)) != EOF) { /* read each character */
buffer[n++] = c; /* store, increment count */
if (n == size) { /* reallocate as required */
void *tmp = realloc (buffer, 2 * size);
if (!tmp) { /* validate realloc */
perror ("realloc-tmp");
break; /* still n good chars in buffer */
}
buffer = tmp; /* assign reallocated block to buffer */
size *= 2; /* update allocated size */
}
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
for (size_t i = 0; i < n; i++) { /* loop over all characters */
c = buffer[i]; /* set to c to reuse code */
if (!last) { /* if 1st char (no last) */
putchar (c); /* just output it */
last = c; /* set last */
continue; /* go get next */
}
if (isspace(c) && !isspace(last)) { /* if space, end of word */
if (!syllcnt) /* if no syll, it's 1 (he, my) */
syllcnt = 1;
printf (" - %zu\n", syllcnt); /* output syll cnt and '\n' */
totalcnt += syllcnt; /* add to total */
syllcnt = 0; /* reset syllcnt to zero */
} /* otherwise */
else if (!strchr (vowels, c)) /* if not vowel */
if (strchr (vowels, last)) /* and last was vowel */
syllcnt++; /* increment syllcnt */
if (!isspace (c)) /* if not space */
putchar (c); /* output it */
last = c; /* set last = c */
}
free (buffer); /* don't forget to free what you allocate */
printf ("\n total syllables: %zu\n", totalcnt);
}
(you can do the same using fgets or using POSIX getline, or allocate all at once with your fseek/ftell or stat and then fread the entire file into a buffer in a single call -- up to you)
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/syllablecnt_array dat/syllables.txt
==19517== Memcheck, a memory error detector
==19517== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==19517== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==19517== Command: ./bin/syllablecnt_array dat/syllables.txt
==19517==
my - 1
dog - 1
eats - 1
banannas - 3
he - 1
peels - 1
while - 1
getting - 2
juked - 2
total syllables: 13
==19517==
==19517== HEAP SUMMARY:
==19517== in use at exit: 0 bytes in 0 blocks
==19517== total heap usage: 5 allocs, 5 frees, 672 bytes allocated
==19517==
==19517== All heap blocks were freed -- no leaks are possible
==19517==
==19517== For counts of detected and suppressed errors, rerun with: -v
==19517== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.
I'm relatively new to programming in C and am trying to read input from stdin using fgets.
To begin with I thought about reading max 50 lines, max 50 characters each, and had something like:
int max_length = 50;
char lines[max_length][max_length];
char current_line[max_length];
int idx = 0;
while(fgets(current_line, max_length, stdin) != NULL) {
strcopy(lines[idx], current_line);
idx++;
}
The snippet above successfully reads the input and stores it into the lines array where I can sort and print it.
My question is how do I deal with an unknown number of lines, with an unknown number of characters on each line? (bearing in mind that I will have to sort the lines and print them out).
While there are a number of different variations of this problem already answered, the considerations of how to go about it could use a paragraph. When faced with this problem, the approach is the same regardless of which combination of library or POSIX functions you use to do it.
Essentially, you will dynamically allocate a reasonable number of characters to hold each line. POSIX getline will do this for you automatically, using fgets you can simply read a fixed buffer full of chars and append them (reallocating storage as necessary) until the '\n' character is read (or EOF is reached)
If you use getline, then you must allocate memory for, and copy the buffer filled. Otherwise, you will overwrite previous lines with each new line read, and when you attempt to free each line, you will likely SegFault with double-free or corruption as you repeatedly attempt to free the same block of memory.
You can use strdup to simply copy the buffer. However, since strdup allocates storage, you should validate successful allocation before assigned a pointer to the new block of memory to your collection of lines.
To access each line, you need a pointer to the beginning of each (the block of memory holding each line). A pointer to pointer to char is generally used. (e.g. char **lines;) Memory allocation is generally handled by allocating some reasonable number of pointers to begin with, keeping track of the number you use, and when you reach the number you have allocated, you realloc and double the number of pointers.
As with each read, you need to validate each memory allocation. (each malloc, calloc, or realloc) You also need to validate the way your program uses the memory you allocate by running the program through a memory error check program (such as valgrind for Linux). They are simple to use, just valgrind yourexename.
Putting those pieces together, you can do something similar to the following. The following code will read all lines from the filename provided as the first argument to the program (or from stdin by default if no argument is provided) and print the line number and line to stdout (keep that in mind if you run it on a 50,000 line file)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NPTR 8
int main (int argc, char **argv) {
size_t ndx = 0, /* line index */
nptrs = NPTR, /* initial number of pointers */
n = 0; /* line alloc size (0, getline decides) */
ssize_t nchr = 0; /* return (no. of chars read by getline) */
char *line = NULL, /* buffer to read each line */
**lines = NULL; /* pointer to pointer to each line */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
/* allocate/validate initial 'nptrs' pointers */
if (!(lines = calloc (nptrs, sizeof *lines))) {
fprintf (stderr, "error: memory exhausted - lines.\n");
return 1;
}
/* read each line with POSIX getline */
while ((nchr = getline (&line, &n, fp)) != -1) {
if (nchr && line[nchr - 1] == '\n') /* check trailing '\n' */
line[--nchr] = 0; /* overwrite with nul-char */
char *buf = strdup (line); /* allocate/copy line */
if (!buf) { /* strdup allocates, so validate */
fprintf (stderr, "error: strdup allocation failed.\n");
break;
}
lines[ndx++] = buf; /* assign start address for buf to lines */
if (ndx == nptrs) { /* if pointer limit reached, realloc */
/* always realloc to temporary pointer, to validate success */
void *tmp = realloc (lines, sizeof *lines * nptrs * 2);
if (!tmp) { /* if realloc fails, bail with lines intact */
fprintf (stderr, "read_input: memory exhausted - realloc.\n");
break;
}
lines = tmp; /* assign reallocted block to lines */
/* zero all new memory (optional) */
memset (lines + nptrs, 0, nptrs * sizeof *lines);
nptrs *= 2; /* increment number of allocated pointers */
}
}
free (line); /* free memory allocated by getline */
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (size_t i = 0; i < ndx; i++) {
printf ("line[%3zu] : %s\n", i, lines[i]);
free (lines[i]); /* free memory for each line */
}
free (lines); /* free pointers */
return 0;
}
If you don't have getline, or strdup, you can easily implement each. There are multiple examples of each on the site. If you cannot find one, let me know. If you have further questions, let me know as well.
Check out the GetString function found here.
I need to read input from a file, then split the word in capitals from it's definition. My trouble being that I need multiple lines from the file to be in one variable to pass it to another function.
The file I want to read from looks like this
ACHROMATIC. An optical term applied to those telescopes in which
aberration of the rays of light, and the colours dependent thereon, are
partially corrected. (See APLANATIC.)
ACHRONICAL. An ancient term, signifying the rising of the heavenly
bodies at sunset, or setting at sunrise.
ACROSS THE TIDE. A ship riding across tide, with the wind in the
direction of the tide, would tend to leeward of her anchor; but with a
weather tide, or that running against the wind, if the tide be strong,
would tend to windward. A ship under sail should prefer the tack that
stems the tide, with the wind across the stream, when the anchor is
let go.
Right now my code splits the word from the rest, but I'm having difficulty getting the rest of the input into one variable.
while(fgets(line, sizeof(line), mFile) != NULL){
if (strlen(line) != 2){
if (isupper(line[0]) && isupper(line[1])){
word = strtok(line, ".");
temp = strtok(NULL, "\n");
len = strlen(temp);
for (i=0; i < len; i++){
*(defn+i) = *(temp+i);
}
printf("Word: %s\n", word);
}
else{
temp = strtok(line, "\n");
for (i=len; i < strlen(temp) + len; i++);
*(defn+i) = *(temp+i-len);
len = len + strlen(temp);
//printf(" %s\n", temp);
}
}
else{
len = 0;
printf("%s\n", defn);
index = 0;
}
}
like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>
//another function
void func(char *word, char *defs){
printf("<%s>\n", word);
if(defs){
printf("%s", defs);
}
}
int main(void){
char buffer[4096], *curr = buffer;
size_t len, buf_size = sizeof buffer;
FILE *fp = fopen("dic.txt", "r");
while(fgets(curr, buf_size, fp)){
//check definition line
if(*curr == '\n' || !isupper(*curr)){
continue;//printf("invalid format\n");
}
len = strlen(curr);
curr += len;
buf_size -= len;
//read rest line
while(1){
curr = fgets(curr, buf_size, fp);
if(!curr || *curr == '\n'){//upto EOF or blank line
char *word, *defs;
char *p = strchr(buffer, '.');
if(p)
*p++ = 0;
word = buffer;
defs = p;
func(word, defs);
break;
}
len = strlen(curr);
curr += len;
buf_size -= len;
assert(buf_size >= 2 || (fprintf(stderr, "small buffer\n"), 0));
}
curr = buffer;
buf_size = sizeof buffer;
}
fclose(fp);
return 0;
}
It appears you need to first pull a string of uppercase letters from the beginning of the line, up to the first period, then concatenate the remainder of that line with subsequent lines until a blank line is found. Lather, rinse, repeat as needed.
While this task would be MUCH easier in Perl, if you need to do it in C, for one thing I recommend using the built-in string functions instead of constructing your own for-loops to copy the data. Perhaps something like the following:
while(fgets(line, sizeof(line), mFile) != NULL) {
if (strlen(line) > 2) {
if (isupper(line[0]) && isupper(line[1])) {
word = strtok(line, ".");
strcpy(defn,strtok(NULL, "\n"));
printf("Word: %s\n", word);
} else {
strcat(defn,strtok(line, "\n"));
}
} else {
printf("%s\n", defn);
defn[0] = 0;
}
}
When I put this in a properly structured C program, with appropriate include files, it works fine. I personally would have approached the problem differently, but hopefully this gets you going.
There are several areas that can be addressed. Given your example input and description, it appears your goal is to develop a function that will read and separate each word (or phrase) and associated definition, return a pointer to the collection of words/definitions, while also updating a pointer to the number of words and definitions read so that number is available back in the calling function (main here).
While your data suggests that the word and definition are both contained within a single line of text with the word (or phrase written in all upper-case), it is unclear whether you will have to address the case where the definition can span multiple lines (essentially causing you to potentially read multiple lines and combine them to form the complete definition.
Whenever you need to maintain relationships between multiple variables within a single object, then a struct is a good choice for the base data object. Using an array of struct allows you access to each word and its associated definition once all have been read into memory. Now your example has 3 words and definitions. (each separated by a '\n'). Creating an array of 3 struct to hold the data is trivial, but when reading data, like a dictionary, you rarely know exactly how many words you will have to read.
To handle this situation, a dynamic array of structs is a proper data structure. You essentially allocate space for some reasonable number of words/definitions, and then if you reach that limit, you simply realloc the array containing your data, update your limit to reflect the new size allocated, and continue on.
While you can use strtok to separate the word (or phrase) by looking for the first '.', that is a bit of an overkill. You will need to traverse over each char anyway to check if they are all caps anyway, you may as well just iterate until you find the '.' and use the number for that character index to store your word and set a pointer to the next char after the '.'. You will begin looking for the start of the definition from there (you basically want to skip any character that is not an [a-zA-Z]). Once you locate the beginning of the definition, you can simply get the length of the rest of the line, and copy that as the definition (or the first part of it if the definition is contained in multiple-separate lines).
After the file is read and the pointer returned and the pointer for the number of words updated, you can then use the array of structs back in main as you like. Once you are done using the information, you should free all the memory you have allocated.
Since the size of the maximum word or phrase is generally know, the struct used provides static storage for the word. Give the definitions can vary wildly in length and are much longer, the struct simply contains a pointer-to-char*. So you will have to allocate storage for each struct, and then allocates storage for each definition within each struct.
The following code does just that. It will take the filename to read as the first argument (or it will read from stdin by default if no filename is given). The code the output the words and definitions on single lines. The code is heavily commented to help you follow along and explain the logic e.g.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum {MAXW = 64, NDEF = 128};
typedef struct { /* struct holding words/definitions */
char word[MAXW],
*def; /* you must allocate space for def */
} defn;
defn *readdict (FILE *fp, size_t *n);
int main (int argc, char **argv) {
defn *defs = NULL;
size_t n = 0;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
if (!(defs = readdict (fp, &n))) { /* read words/defs into defs */
fprintf (stderr, "readdict() error: no words read from file.\n");
return 1;
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (size_t i = 0; i < n; i++) {
printf ("\nword: %s\n\ndefinition: %s\n", defs[i].word, defs[i].def);
free (defs[i].def); /* free allocated definitions */
}
free (defs); /* free array of structs */
return 0;
}
/** read word and associated definition from open file stream 'fp'
* into dynamic array of struct, updating pointer 'n' to contain
* the total number of defn structs filled.
*/
defn *readdict (FILE *fp, size_t *n)
{
defn *defs = NULL; /* pointer to array of structs */
char buf[BUFSIZ] = ""; /* buffer to hold each line read */
size_t max = NDEF, haveword = 0, offset = 0; /* allocated size & flags */
/* allocate, initialize & validate memory to hold 'max' structs */
if (!(defs = calloc (max, sizeof *defs))) {
fprintf (stderr, "error: virtual memory exhausted.\n");
return NULL;
}
while (fgets (buf, BUFSIZ, fp)) /* read each line of input */
{
if (*buf == '\n') { /* check for blank line */
if (haveword) (*n)++; /* if word/def already read, increment n */
haveword = 0; /* reset haveword flag */
if (*n == max) {
void *tmp = NULL; /* tmp ptr to realloc defs */
if (!(tmp = realloc (defs, sizeof *defs * (max + NDEF)))) {
fprintf (stderr, "error: memory exhaused, realloc defs.\n");
break;
}
defs = tmp; /* assign new block to defs */
memset (defs + max, 0, NDEF * sizeof *defs); /* zero new mem */
max += NDEF; /* update max with current allocation size */
}
continue; /* get next line */
}
if (haveword) { /* word already stored in defs[n].word */
void *tmp = NULL; /* tmp pointer to realloc */
size_t dlen = strlen (buf); /* get line/buf length */
if (buf[dlen - 1] == '\n') /* trim '\n' from end */
buf[--dlen] = 0; /* realloc & validate */
if (!(tmp = realloc (defs[*n].def, offset + dlen + 2))) {
fprintf (stderr,
"error: memory exhaused, realloc defs[%zu].def.\n", *n);
break;
}
defs[*n].def = tmp; /* assign new block, fill with definition */
sprintf (defs[*n].def + offset, offset ? " %s" : "%s", buf);
offset += dlen + 1; /* update offset for rest (if required) */
}
else { /* no current word being defined */
char *p = NULL;
size_t i;
for (i = 0; buf[i] && i < MAXW; i++) { /* check first MAXW chars */
if (buf[i] == '.') { /* if a '.' is found, end of word */
size_t dlen = 0;
if (i + 1 == MAXW) { /* check one char available for '\0' */
fprintf (stderr,
"error: 'word' exceeds MAXW, skipping.\n");
goto next;
}
strncpy (defs[*n].word, buf, i); /* copy i chars to .word */
haveword = 1; /* set haveword flag */
p = buf + i + 1; /* set p to next char in buf after '.' */
while (*p && (*p == ' ' || *p < 'A' || /* find def start */
('Z' < *p && *p < 'a') || 'z' < *p))
p++; /* increment p and check again */
if ((dlen = strlen (p))) { /* get definition length */
if (p[dlen - 1] == '\n') /* trim trailing '\n' */
p[--dlen] = 0;
if (!(defs[*n].def = malloc (dlen + 1))) { /* allocate */
fprintf (stderr,
"error: virtual memory exhausted.\n");
goto done; /* bail if allocation failed */
}
strcpy (defs[*n].def, p); /* copy definition to .def */
offset = dlen; /* set offset in .def buf to be */
} /* used if def continues on a */
break; /* new or separae line */
} /* check word is all upper-case or a ' ' */
else if (buf[i] != ' ' && (buf[i] < 'A' || 'Z' < buf[i]))
break;
}
}
next:;
}
done:;
if (haveword) (*n)++; /* account for last word/definition */
return defs; /* return pointer to array of struct */
}
Example Use/Output
$ ./bin/dict_read <dat/dict.txt
word: ACHROMATIC
definition: An optical term applied to those telescopes in which
aberration of the rays of light, and the colours dependent thereon,
are partially corrected. (See APLANATIC.)
word: ACHRONICAL
definition: An ancient term, signifying the rising of the heavenly
bodies at sunset, or setting at sunrise.
word: ACROSS THE TIDE
definition: A ship riding across tide, with the wind in the direction
of the tide, would tend to leeward of her anchor; but with a weather tide,
or that running against the wind, if the tide be strong, would tend to
windward. A ship under sail should prefer the tack that stems the tide,
with the wind across the stream, when the anchor is let go.
(line breaks were manually inserted to keep the results tidy here).
Memory Use/Error Check
You should also run any code that dynamically allocates memory though a memory use and error checking program like valgrind on linux. Just run the code though it and confirm you free all memory you allocate and that there are no memory errors, e.g.
$ valgrind ./bin/dict_read <dat/dict.txt
==31380== Memcheck, a memory error detector
==31380== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==31380== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==31380== Command: ./bin/dict_read
==31380==
word: ACHROMATIC
<snip output>
==31380==
==31380== HEAP SUMMARY:
==31380== in use at exit: 0 bytes in 0 blocks
==31380== total heap usage: 4 allocs, 4 frees, 9,811 bytes allocated
==31380==
==31380== All heap blocks were freed -- no leaks are possible
==31380==
==31380== For counts of detected and suppressed errors, rerun with: -v
==31380== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Look things over and let me know if you have further questions.