I'm getting a segmentation fault while parsing data from a CSV file in C.
I believe the error occurs while reading the last field, person[i].status, because if I comment out that line the code runs perfectly.
Contents of CSV file:
1;A;John Mott;D;30;Z
2;B;Judy Moor;S;60;X
3;A;Kae Blanchett;S;42;y
4;B;Jair Tade;S;21;W
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct Person
{
int id;
char key;
char name[16];
char rel;
int age;
char status;
} Person;
int main()
{
Person person[12];
FILE *f = fopen("data.csv", "r");
char buffer[256];
if (f != NULL)
{
int i = 0;
printf("\nFile OK!\n");
printf("Printing persons:\n");
while (fgets(buffer, 256, f))
{
person[i].id = atoi(strtok(buffer, ";"));
person[i].key = strtok(NULL, ";")[0];
strcpy(person[i].name, strtok(NULL, ";"));
person[i].rel = strtok(NULL, ";")[0];
person[i].age = atoi(strtok(buffer, ";"));
person[i].status = strtok(NULL, ";")[0]; // error: segmentation fault
printf("id: %d\n", person[i].id);
printf("key: %c\n", person[i].key);
printf("name: %s\n", person[i].name);
printf("rel: %c\n", person[i].rel);
printf("age: %d\n", person[i].age);
printf("status: %c\n", person[i].status);
i++;
}
}
else
{
printf("\nFile BAD!\n");
}
return 0;
}
Thank you for your help!
While you have a good answer addressing your problems with strtok(), you may be over-complicating your code by using strtok() to begin with. When reading a delimited file with a fixed delimiter, reading a line-at-a-time into a sufficiently sized buffer and then separating the buffer into the needed values with sscanf() can provide a succinct (and in the case of your use of atoi() a more robust) solution.
Your fields are easily separated in this case using a carefully crafted format-string. For example, reading each line into a buffer (buf in this case) you can separate each of the lines into the needed values with:
if (sscanf (buf, "%d;%c;%15[^;];%c;%d;%c", /* convert to person/VALIDATE */
&person[n].id, &person[n].key, person[n].name,
&person[n].rel, &person[n].age, &person[n].status) == 6)
The conversion to int by sscanf() at least minimally validates the integer conversion. Not so with atoi() which will happily take atoi ("my cow") and fail silently returning zero without any indication things have gone wrong.
Note, in every conversion to string, you must provide a field-width modifier to limit the number of characters stored to one less than your array can hold (saving room for the '\0' nul-terminating character). Otherwise the use of the scanf() family "%s" or "%[..]" is no safer than using gets(). See Why gets() is so dangerous it should never be used!
The same protection of your array bounds for person[] applies on your read loop. Simply keeping a count of the successful conversions and testing before the next read is all you need, e.g.
#define NPERSONS 12 /* if you need a constant, #define one (or more) */
#define MAXNAME 16
#define MAXC 1024
...
char buf[MAXC]; /* buffer to hold each line */
size_t n = 0; /* person counter/index */
Person person[NPERSONS] = {{ .id = 0 }}; /* initialize all elements */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
...
while (n < NPERSONS && fgets (buf, MAXC, fp)) { /* protect array, read line */
if (sscanf (buf, "%d;%c;%15[^;];%c;%d;%c", /* convert to person/VALIDATE */
&person[n].id, &person[n].key, person[n].name,
&person[n].rel, &person[n].age, &person[n].status) == 6)
n++; /* increment count on good conversion */
}
As shown with the #defines above, don't use MagicNumbers in your code. (e.g. 12, 16). Instead declare a constant at the top of your code that provides a convenient single-location to change if your limits later need adjustment.
In the same vein, do not hardcode filenames. There is no reason you should have to re-compile your code just to read from a different file. Pass the filename as the first argument to your program (that's what argc and argv are for), or prompt the user and take the filename as input. Above, the code takes the filename as the first argument, or reads from stdin by default if no argument is provided (like most Unix utilities do).
Putting that all together, you could do something similar to:
#include <stdio.h>
#define NPERSONS 12 /* if you need a constant, #define one (or more) */
#define MAXNAME 16
#define MAXC 1024
typedef struct Person {
int id;
char key;
char name[MAXNAME];
char rel;
int age;
char status;
} Person;
int main (int argc, char **argv) {
char buf[MAXC]; /* buffer to hold each line */
size_t n = 0; /* person counter/index */
Person person[NPERSONS] = {{ .id = 0 }}; /* initialize all elements */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (n < NPERSONS && fgets (buf, MAXC, fp)) { /* protect array, read line */
if (sscanf (buf, "%d;%c;%15[^;];%c;%d;%c", /* convert to person/VALIDATE */
&person[n].id, &person[n].key, person[n].name,
&person[n].rel, &person[n].age, &person[n].status) == 6)
n++; /* increment count on good conversion */
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
for (size_t i = 0; i < n; i++) /* output results */
printf ("person[%zu] %3d %c %-15s %c %3d %c\n", i,
person[i].id, person[i].key, person[i].name,
person[i].rel, person[i].age, person[i].status);
}
(note: you only need one call to printf() to output any contiguous block of output with conversions. If you have no conversions required, use puts() or fputs() if end-of-line control is needed)
Lastly, do not skimp on buffer size. 16 seems horribly short for a name field (64 is still pushing it). By using the field-width modifier you are protected against Undefined Behavior due to overwriting your array bounds (the code will simply skip the line), but you should add an else { ... } condition to output an error in that case. 16 is sufficient for your example data, but for general use, you would want to adjust that to a larger value.
Example Use/Output
With your sample input in the file named dat/person_id-status.txt, you could do:
$ ./bin/person_id-status dat/person_id-status.txt
person[0] 1 A John Mott D 30 Z
person[1] 2 B Judy Moor S 60 X
person[2] 3 A Kae Blanchett S 42 y
person[3] 4 B Jair Tade S 21 W
Those were the main points that struck me looking over your code. (I'm sure I've forgotten to mention one or two more.) Look things over and let me know if you have further questions.
Related
I'm having some troubles using strtok function.
As an exercise I have to deal with a text file by ruling out white spaces, transforming initials into capital letters and printing no more than 20 characters in a line.
Here is a fragment of my code:
fgets(sentence, SIZE, f1_ptr);
char *tok_ptr = strtok(sentence, " \n"); //tokenazing each line read
tok_ptr[0] = toupper(tok_ptr[0]); //initials to capital letters
int num = 0, i;
while (!feof(f1_ptr)) {
while (tok_ptr != NULL) {
for (i = num; i < strlen(tok_ptr) + num; i++) {
if (i % 20 == 0 && i != 0) //maximum of 20 char per line
fputc('\n', stdout);
fputc(tok_ptr[i - num], stdout);
}
num = i;
tok_ptr = strtok(NULL, " \n");
if (tok_ptr != NULL)
tok_ptr[0] = toupper(tok_ptr[0]);
}
fgets(sentence, SIZE + 1, f1_ptr);
tok_ptr = strtok(sentence, " \n");
if (tok_ptr != NULL)
tok_ptr[0] = toupper(tok_ptr[0]);
}
The text is just a bunch of lines I just show as a reference:
Watch your thoughts ; they become words .
Watch your words ; they become actions .
Watch your actions ; they become habits .
Watch your habits ; they become character .
Watch your character ; it becomes your destiny .
Here is what I obtain in the end:
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;THeyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacteR.Wat
chYourCharacter;ItBe
comesYourDEstiny.Lao
-Tze
The final result is mostly correct, but sometimes (for example "they" in they become (and only in that case) or "destiny") words are not correctly tokenized. So for example "they" is split into "t" and "hey" resulting in THey (DEstiny in the other instance) after the manipulations I made.
Is it some bug or am I missing something? Probably my code is not that efficient and some condition may end up being critical...
Thank you for the help, it's not that big of a deal, I just don't understand why such a behaviour is occurring.
You have a large number of errors in your code and you are over-complicating the problem. The most pressing error is explained in Why is while ( !feof (file) ) always wrong? Trace the execution path within your loop: you attempt to read with fgets(), and then you use sentence without knowing whether EOF was reached, calling tok_ptr = strtok(sentence, " \n"); before you ever get around to checking feof(f1_ptr).
What happens when you actually reach EOF? The failed fgets() leaves sentence holding whatever was left from the previous line, and you tokenize it anyway. That IS why while ( !feof (file) ) is always wrong. Instead, you always want to control your read-loop with the return of the read function you are using, e.g. while (fgets(sentence, SIZE, f1_ptr) != NULL)
What is it you actually need your code to do?
The larger question is why are you over-complicating the problem with strtok, and arrays (and fgets() for that matter)? Think about what you need to do:
read each character in the file,
if it is whitespace, ignore it, set the in-word flag false,
if a non-whitespace, if 1st char in word, capitalize it, output the char, set the in-word flag true and increment the number of chars output to the current line, and finally
if it is the 20th character output, output a newline and reset the counter to zero.
The bare-minimum tools you need from your C-toolbox are fgetc(), isspace() and toupper() from ctype.h, a counter for the number of characters output, and a flag to know if the character is the first non-whitespace character after a whitespace.
Implementing the logic
That makes the problem very simple. Read a character. Is it whitespace? If so, set your in-word flag false. Otherwise, if your in-word flag is false, capitalize the character; then output it, set your in-word flag true, and increment your character count. The last thing you need to do is check whether your character count has reached the limit; if so, output a '\n' and reset your character count to zero. Repeat until you run out of characters.
You can turn that into code with something similar to the following:
#include <stdio.h>
#include <ctype.h>
#define CPL 20 /* chars per-line, if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
int c, in = 0, n = 0; /* char, in-word flag, no. of chars output in line */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while ((c = fgetc(fp)) != EOF) { /* read / validate each char in file */
if (isspace(c)) /* char is whitespace? */
in = 0; /* set in-word flag false */
else { /* otherwise, not whitespace */
putchar (in ? c : toupper(c)); /* output char, capitalize 1st in word */
in = 1; /* set in-word flag true */
n++; /* increment character count */
}
if (n == CPL) { /* CPL limit reached? */
putchar ('\n'); /* output newline */
n = 0; /* reset cpl counter */
}
}
putchar ('\n'); /* tidy up with newline */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
}
Example Use/Output
Given your input file stored on my computer in dat/text220.txt, you can produce the output you are looking for with:
$ ./bin/text220 dat/text220.txt
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.
(the executable for the code was compiled to bin/text220, I usually keep separate dat, obj, and bin directories for data, object files and executables to keep my source code directory clean)
note: by reading from stdin by default if no filename is provided as the first argument to the program, you can use your program to read input directly, e.g.
$ echo "my dog has fleas - bummer!" | ./bin/text220
MyDogHasFleas-Bummer
!
No fancy string functions required, just a loop, a character, a flag and a counter -- the rest is just arithmetic. It's always worth trying to boil your programming problems down to basic steps and then look around your C-toolbox and find the right tool for each basic step.
Using strtok
Don't get me wrong, there is nothing wrong with using strtok, and it makes for a fairly simple solution in this case. The point I was making is that for simple character-oriented string processing, it's often just as simple to loop over the characters in the line. You don't gain any efficiency by using fgets() with an array and strtok(); the read from the file is already placed into a buffer of BUFSIZ bytes (see footnote 1).
If you did want to use strtok(), you should control your read-loop with the return from fgets(), and then you can tokenize with strtok(), also checking its return at each point: a read-loop with fgets() and a tokenization loop with strtok(). Then you handle first-character capitalization and limit your output to 20 chars per line.
You could do something like the following:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define CPL 20 /* chars per-line, if you need a constant, #define one (or more) */
#define MAXC 1024
#define DELIM " \t\r\n"
void putcharCPL (int c, int *n)
{
if (*n == CPL) { /* if n == limit */
putchar ('\n'); /* output '\n' */
*n = 0; /* reset value at mem address to 0 */
}
putchar (c); /* output character */
(*n)++; /* increment value at mem address */
}
int main (int argc, char **argv) {
char line[MAXC]; /* buffer to hold each line */
int n = 0; /* no. of chars output in line */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (fgets (line, MAXC, fp)) /* read each line and tokenize line */
for (char *tok = strtok (line, DELIM); tok; tok = strtok (NULL, DELIM)) {
putcharCPL (toupper(*tok), &n); /* convert 1st char to upper */
for (int i = 1; tok[i]; i++) /* output rest unchanged */
putcharCPL (tok[i], &n);
}
putchar ('\n'); /* tidy up with newline */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
}
(same output)
The putcharCPL() function is just a helper that checks if 20 characters have been output and if so outputs a '\n' and resets the counter. It then outputs the current character and increments the counter by one. A pointer to the counter is passed so it can be updated within the function making the updated value available back in main().
Look things over and let me know if you have further questions.
footnotes:
1. Depending on your version of glibc, the constant in the source setting the read-buffer size may be _IO_BUFSIZ. _IO_BUFSIZ was changed to BUFSIZ here: glibc commit 9964a14579e5eef9. For Linux, BUFSIZ is defined as 8192 (512 on Windows).
This is actually a much more interesting OP from a professional point of view than some of the comments may suggest, despite the 'newcomer' aspect of the question, which may sometimes raise fairly deep, underestimated issues.
The fun thing is that on my platform (W10, MSYS2, gcc v.10.2), your code runs fine with correct results:
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.
So first, congratulations, newcomer: your coding is not that bad.
This shows how different compilers and runtimes may or may not tolerate minor coding errors or misuse of the specification, and may or may not protect stacks or heaps.
That said, the comment by @Andrew Henle pointing to an illuminating answer about feof is quite relevant.
If you follow it and keep your feof test, but move it down so that it comes after the read check rather than before (as below), your code should yield better results (note: I will alter your code only minimally, deliberately ignoring lesser issues):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <ctype.h>
#define SIZE 100 // add some leeway to avoid off-by-one issues
int main()
{
FILE* f1_ptr = fopen("C:\\Users\\Public\\Dev\\test_strtok", "r");
if (! f1_ptr)
{
perror("Open issue");
exit(EXIT_FAILURE);
}
char sentence[SIZE] = {0};
if (NULL == fgets(sentence, SIZE, f1_ptr))
{
perror("fgets issue"); // implementation-dependent
exit(EXIT_FAILURE);
}
errno = 0;
char *tok_ptr = strtok(sentence, " \n"); //tokenizing each line read
if (tok_ptr == NULL || errno)
{
perror("first strtok parse issue");
exit(EXIT_FAILURE);
}
tok_ptr[0] = toupper(tok_ptr[0]); //initials to capital letters
int num = 0;
size_t i = 0;
while (1) {
while (1) {
for (i = num; i < strlen(tok_ptr) + num; i++) {
if (i % 20 == 0 && i != 0) //maximum of 20 char per line
fputc('\n', stdout);
fputc(tok_ptr[i - num], stdout);
}
num = i;
tok_ptr = strtok(NULL, " \n");
if (tok_ptr == NULL) break;
tok_ptr[0] = toupper(tok_ptr[0]);
}
if (NULL == fgets(sentence, SIZE, f1_ptr)) // let's get rid of the annoying +1,
// we have enough headroom
{
if (feof(f1_ptr))
{
fprintf(stderr, "\n%s\n", "Found EOF");
break;
}
else
{
perror("Unexpected fgets issue in loop"); // implementation-dependent
exit(EXIT_FAILURE);
}
}
errno = 0;
tok_ptr = strtok(sentence, " \n");
if (tok_ptr == NULL)
{
if (errno)
{
perror("strtok issue in loop");
exit(EXIT_FAILURE);
}
break;
}
tok_ptr[0] = toupper(tok_ptr[0]);
}
return 0;
}
$ ./test
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.
Found EOF
This is my CSV file. I want to get only those rows which start with the character "A". I got my output, but with an additional column of '0'. Please help me find where I went wrong.
One more thing: I want to remove specific columns such as bread, anName and Ot.
Name,id,bread,anName,Ot,number
A,1,animal,tiger,op,8.1
M,2,animal,toper,ip,9.1
A1,7,animal,dog,cp,Na11
A2,9,animal,mouse,ap,0
A23,9,animal,pouch,gp,Na11
#include <stdio.h>
#include <stdlib.h>
#define NUMLETTERS 100
typedef struct {
char Name[100];
int id;
char number[100];
} record_t;
int main(void) {
FILE *fp;
record_t records[NUMLETTERS];
int count = 0, i;
fp = fopen("letter.csv", "r");
if (fp == NULL) {
fprintf(stderr, "Error reading file\n");
return 1;
}
while (fscanf(fp, "%s,%d,%s", records[count].Name, &records[count].id, records[count].number) == 1)
count++;
for (i = 0; i < count; i++) {
if(records[i].Name[0] == 'A'){
printf("%s,%d,%s\n", records[i].Name, records[i].id, records[i].number);
}
}
fclose(fp);
return 0;
}
i want output as:
A,1,8.1
A1,7,Na11
A2,9,0
A23,9,Na11
You have two problems:
The %s format specifier tells fscanf to read a space-delimited string. Since the records aren't space-delimited, the first %s will read the whole line.
The fscanf function returns the number of successfully parsed elements it handled. Since you attempt to read three values you should compare with 3 instead of 1.
Now for one way to solve the first problem: use the %[ format specifier. It can handle simple patterns and, most importantly, negative patterns (read while input does not match).
So you could tell fscanf to read a string until it finds a comma by using %[^,]:
fscanf(fp, " %[^,],%d,%s", records[count].Refdes, &records[count].pin, records[count].NetName)
The use of the %[ specifier is only needed for the first string, as the second will be space-delimited (the newline).
Also note that there's a space before the %[ format, to read and ignore leading white-space, like for example the newline from the previous line.
i want to get only those row which start with character "A"
i want to remove the number which coming between A and tiger,
If I understand you correctly and you only want to store rows beginning with 'A', then I would adjust your approach to read each line with fgets() and then check whether the first character in the buffer is 'A'. If it is not, simply continue and get the next line. Then, for those lines that do start with 'A', simply use sscanf to parse the data into your array of struct records.
For your second part of removing the number between 'A' and "tiger", there is a difference between what you store and what you output (this comes into play in storing only records beginning with 'A' as well), but for those structs stored where the line starts with 'A', you can simply not-output the pin struct member to get the output you want.
The approach to reading a line at a time will simply require that you declare an additional character array (buffer), called buf below, to read each line into with fgets(), e.g.
char buf[3 * NUMLETTERS] = "";
...
/* read each line into buf until a max of NUMLETTERS struct filled */
while (count < NUMLETTERS && fgets (buf, sizeof buf, fp)) {
record_t tmp = { .Refdes = "" }; /* temporary struct to read into */
if (*buf != 'A') /* if doesn't start with A get next */
continue;
/* separate lines beginning with 'A' into struct members */
if (sscanf (buf, " %99[^,],%d,%99[^\n]",
tmp.Refdes, &tmp.pin, tmp.NetName) == 3)
records[count++] = tmp; /* assign tmp, increment count */
else
fprintf (stderr, "%d A record - invalid format.\n", count + 1);
}
Below is a short example putting that to use. Since we are not sure what "remove" is intended to mean, it includes a preprocessor conditional that outputs only the .Refdes and .NetName members by default; if you either #define WITHPIN or include the define in your compile string (e.g. -DWITHPIN), it will output the .pin member as well.
#include <stdio.h>
#include <stdlib.h>
#define NUMLETTERS 100
typedef struct {
char Refdes[NUMLETTERS];
int pin;
char NetName[NUMLETTERS];
} record_t;
int main (int argc, char **argv) {
record_t records[NUMLETTERS];
char buf[3 * NUMLETTERS] = "";
int count = 0, i;
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
/* read each line into buf until a max of NUMLETTERS struct filled */
while (count < NUMLETTERS && fgets (buf, sizeof buf, fp)) {
record_t tmp = { .Refdes = "" }; /* temporary struct to read into */
if (*buf != 'A') /* if doesn't start with A get next */
continue;
/* separate lines beginning with 'A' into struct members */
if (sscanf (buf, " %99[^,],%d,%99[^\n]",
tmp.Refdes, &tmp.pin, tmp.NetName) == 3)
records[count++] = tmp; /* assign tmp, increment count */
else
fprintf (stderr, "%d A record - invalid format.\n", count + 1);
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
for (i = 0; i < count; i++)
#ifdef WITHPIN
printf ("%-8s %2d %s\n",
records[i].Refdes, records[i].pin, records[i].NetName);
#else
printf ("%-8s %s\n", records[i].Refdes, records[i].NetName);
#endif
}
Example Use/Output
$ ./bin/getaonly dat/getasonly.txt
A tiger
A1 dog
A2 mouse
A23 pouch
If you define -DWITHPIN in your compile string, then you will get all three outputs:
$ ./bin/getaonly dat/getasonly.txt
A 1 tiger
A1 7 dog
A2 9 mouse
A23 9 pouch
(note: with the data stored in your array, you can adjust the output format to anything you need)
Since there is some uncertainty whether you want to store all and output only records beginning with 'A' or only want to store records beginning with 'A' -- let me know if I need to make changes and I'm happy to help further.
The code I'm working on involves reading a file w/ input structured as the following:
(spaces)name(spaces) val (whatever) \n
(spaces)name(spaces) val (whatever) \n
(spaces)name(spaces) val (whatever) \n
Where spaces denotes an arbitrary amount of white space. My code is supposed to give both the name and the value. There is another condition: everything on the line after a '#' is ignored (treated like a comment). The output is supposed to be:
"name: (name) value: val \n"
For the most part the code is working, except that it adds an extra line where it prints an empty name and sets val to whatever the last number read was. For example, with my test file:
a 12
b 33
#c 15
nice 6#9
The output is:
Line after: a 12
name: a value: 12 :
Line after: b 33
name: b value: 33 :
Line after: # c 15
Line after: nice 6#9
name: nice value: 6 :
Line after:
name: value: 6 : //why is this happening
The code is here.
void readLine(char *filename)
{
FILE *pf;
char name[10000];
char value[20];
pf = fopen(filename, "r");
char line[10000];
if (pf){
while (fgets(line, sizeof(line), pf) != NULL) {
//printf("Line: %s\n",line);
printf("Line after: %s\n",line);
while(true){
int i=0;
char c=line[i]; //parse every char of the line
//assert(c==' ');
int locationS=0; //index in the name
int locationV=0; //index in the value
while((c==' ')&& i<sizeof(line)){
//look for next sequence of chars
++i;
c=line[i];
if(c=='#'){
break;
}
}
if(c=='#'){ break;}
assert(c!=' ');
while (c!=' '&&i<sizeof(line))
{
name[locationS]=c;
locationS++;
//printf("%d",locationS);
++i;
c=line[i];if(c=='#'){
break;
}
}
if(c=='#'){ break;}
assert(c==' ');
while(c==' '&&i<sizeof(line)){
//look for next sequence of chars
++i;
c=line[i];
if(c=='#'){
break;
}
}
if(c=='#'){ break;}
assert(c!=' ');
printf("\n");
while ((c!=' '&& c!='\n')&&i<sizeof(line))
{
value[locationV]=c;
locationV++;
++i;
c=line[i];if(c=='#'){
break;
}
}
printf("name: %s value: %s : \n",name, value);
memset(&name[0], 0, sizeof(name));
memset(&value[0], 0, sizeof(value));
break; //nothing interesting left
}
}
fclose(pf);
}else{
printf("Error in file\n");
exit(EXIT_FAILURE);
}
}
Pasha, you are doing some things correctly, but then you are making what you are trying to do much more difficult than need be. First, avoid using magic-numbers in your code, such as char name[10000];. Instead:
...
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char line[MAXC];
...
(you did very well in following the rule Don't skimp on Buffer Size :)
Likewise, you have done well in opening the file and validating that it is open for reading before attempting to read from it with fgets(). You can do that validation in a single block and handle the error at that time, which has the effect of removing one level of indentation throughout the rest of your code, e.g.
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
Now with the file open and validated that it is open for reading and any error handled, you can proceed to reading each line in your file. Unless you are storing the names in an array that needs to survive your read loop, you can simply declare name[MAXC]; within the read-loop block, e.g.
while (fgets (line, MAXC, fp)) { /* read each line of input */
char name[MAXC]; /* storage for name */
int val; /* integer value for val */
(note: rather than declare another array to hold value, we have simply declared val as an int and will use sscanf to parse name and val converting the value directly to int at that time)
Anytime you are using a line-oriented input function (like fgets() or POSIX getline()), you will want to trim the '\n' read and included in the buffer that is filled. You can do that easily with strcspn (see the strspn(3) Linux manual page, which covers both functions). It is a simple, single call where you use the return from strcspn as the index of the '\n' in order to overwrite the '\n' with the nul-terminating character (which is '\0', or simply 0):
line[strcspn (line, "\n")] = 0; /* trim '\n' from end of line */
Now all you need to do is check for the presence of the first '#' in line that marks the beginning of a comment. If found, you will simply overwrite '#' with the nul-terminating character as you did for the '\n', e.g.
line[strcspn (line, "#")] = 0; /* overwrite '#' with nul-char */
Now that you have your line and have removed the '\n' and any comment that may be present, you can check that line isn't empty (meaning it began with a '#' or was simply an empty line containing only a '\n')
if (!*line) /* if empty-string */
continue; /* get next line */
(note: if (!*line) is simply shorthand for if (line[0] == 0). When you dereference your buffer, e.g. *line, you are simply returning the first element (first char), as *line == *(line + 0) in pointer notation, which is equivalent to *(line + 0) == line[0] in array-index notation. The [] operates as a dereference as well.)
Now simply parse for the name and val directly from line using sscanf. Both the "%s" and "%d" conversion specifiers will ignore all leading whitespace before the conversion specifier. You can use this simple method so long as name itself does not contain whitespace. Just as you validate the return of your file opening, you will validate the return of sscanf to determine if the number of conversions you specified successfully took place. For example:
if (sscanf (line, "%1023s %d", name, &val) == 2) /* have name/value? */
printf ("\nline: %s\nname: %s\nval : %d\n", line, name, val);
else
printf ("\nline: %s (doesn't contain name/value)\n", line);
(note: by using the field-width modifier for your string, e.g. "%1023s", you protect your array bounds for name. The field width limits sscanf to writing no more than 1023 chars + \0 to name. The width cannot be passed as a run-time argument, and a macro name is not expanded inside a string literal, so this is one of the occasions where you end up sticking a magic-number in your code... For every rule there is generally a caveat or two...)
If you asked for 2 conversions, and sscanf returned 2, then you know that both the requested conversions were successful. Further, since for val you have specified an integer conversion, you are guaranteed that val will contain an integer.
That's all there is to it. All that remains is closing the file (if not reading from stdin) and you are done. A full example could be:
#include <stdio.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char line[MAXC];
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (fgets (line, MAXC, fp)) { /* read each line of input */
char name[MAXC]; /* storage for name */
int val; /* integer value for val */
line[strcspn (line, "\n")] = 0; /* trim '\n' from end of line */
line[strcspn (line, "#")] = 0; /* overwrite '#' with nul-char */
if (!*line) /* if empty-string */
continue; /* get next line */
if (sscanf (line, "%1023s %d", name, &val) == 2) /* have name/value? */
printf ("\nline: %s\nname: %s\nval : %d\n", line, name, val);
else
printf ("\nline: %s (doesn't contain name/value)\n", line);
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
}
(note: if you want to print the raw line before trimming the '\n' and comments, just move the printing of line before the calls to strcspn. Above line is printed showing the final state of line before the call to sscanf)
Example Use/Output
Using your input file stored in dat/nameval.txt on my system, you could simply do the following to read values redirected from stdin:
$ ./bin/parsenameval <dat/nameval.txt
line: a 12
name: a
val : 12
line: b 33
name: b
val : 33
line: nice 6
name: nice
val : 6
(note: to have the program open and read the file itself rather than having the shell do it for you, just remove the redirection <. Six of one, half a dozen of the other.)
Look things over and let me know if you have further questions. If for some reason you cannot use any function to help you parse the line and must use only pointers or array-indexing, let me know. Following the approach above, it takes only a little effort to replace each of the operations with its manual equivalent.
I have a file with a series of words separated by a white space. For example file.txt contains this: "this is the file". How can I use fscanf to take word by word and put each word in an array of strings?
Then I did this but I don't know if it's correct:
char *words[100];
int i=0;
while(!feof(file)){
fscanf(file, "%s", words[i]);
i++;
fscanf(file, " ");
}
When reading repeated input, you control the input loop with the input function itself (fscanf in your case). While you can also loop continually (e.g. for (;;) { ... }) and check independently whether the return is EOF, whether a matching failure occurred, or whether the return matches the number of conversion specifiers (success), in your case simply checking that the return matches the single "%s" conversion specifier is fine (e.g. that the return is 1).
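To illustrate the continual-loop form with the three independent checks, here is a rough sketch. It assumes integer input with "%d" rather than "%s", purely because "%s" cannot produce a matching failure; the name sum_ints is hypothetical:

```c
#include <stdio.h>

/* hypothetical helper: sum every integer on fp, skipping any
 * whitespace-separated token that is not an integer. */
long sum_ints (FILE *fp)
{
    long sum = 0;

    for (;;) {
        int v, rtn = fscanf (fp, "%d", &v);

        if (rtn == EOF)             /* end of file or input failure */
            break;
        if (rtn == 0) {             /* matching failure, nothing stored */
            if (fscanf (fp, "%*s") == EOF)  /* discard offending token */
                break;
            continue;
        }
        sum += v;                   /* rtn == 1, successful conversion */
    }
    return sum;
}
```

With a single "%s" conversion the middle case never arises, which is why comparing the return against 1 in the loop condition is sufficient for reading words.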
Storing each word in an array, you have several options. The simplest is a 2D array of char with automatic storage. Since the longest non-medical word in the Unabridged Dictionary is 29 characters (requiring a total of 30 characters with the nul-terminating character), a 2D array with a fixed number of rows and a fixed number of columns of at least 30 is fine. (Dynamically allocating allows you to read and allocate memory for as many words as may be required -- but that is left for later.)
So to set up storage for 128 words, you could do something similar to the following:
#include <stdio.h>
#define MAXW 32 /* if you need a constant, #define one (or more) */
#define MAXA 128
int main (int argc, char **argv) {
char array[MAXA][MAXW] = {{""}}; /* array to store up to 128 words */
size_t n = 0; /* word index */
Now simply open your filename provided as the first argument to the program (or read from stdin by default if no argument is given), and then validate that your file is open for reading, e.g.
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
Now to the crux of your read-loop. Simply loop checking the return of fscanf to determine success/failure of the read, adding words to your array and incrementing your index on each successful read. You must also include in your loop-control a check of your index against your array bounds to ensure you do not attempt to write more words to your array than it can hold, e.g.
while (n < MAXA && fscanf (fp, "%31s", array[n]) == 1) /* field-width protects array bounds */
n++;
That's it, now just close the file and use your words stored in your array as needed. For example just printing the stored words you could do:
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (size_t i = 0; i < n; i++)
printf ("array[%3zu] : %s\n", i, array[i]);
return 0;
}
Now just compile it with warnings enabled (e.g. -Wall -Wextra -pedantic for gcc/clang, or /W3 for VS cl.exe) and then test on your file. The full code is:
#include <stdio.h>
#define MAXW 32 /* if you need a constant, #define one (or more) */
#define MAXA 128
int main (int argc, char **argv) {
char array[MAXA][MAXW] = {{""}}; /* array to store up to 128 words */
size_t n = 0; /* word index */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (n < MAXA && fscanf (fp, "%31s", array[n]) == 1) /* field-width protects array bounds */
n++;
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (size_t i = 0; i < n; i++)
printf ("array[%3zu] : %s\n", i, array[i]);
return 0;
}
Example Input File
$ cat dat/thefile.txt
this is the file
Example Use/Output
$ ./bin/fscanfsimple dat/thefile.txt
array[ 0] : this
array[ 1] : is
array[ 2] : the
array[ 3] : file
Look things over and let me know if you have further questions.
strtok() might be a function that can help you here.
If you know that the words will be separated by whitespace, then calling strtok will return the char pointer to the start of the next word.
Sample code from https://www.systutorials.com/docs/linux/man/3p-strtok/
#include <string.h>
...
char *token;
char line[] = "LINE TO BE SEPARATED"; /* array, not a string literal: strtok writes to it */
char *search = " ";
/* Token will point to "LINE". */
token = strtok(line, search);
/* Token will point to "TO". */
token = strtok(NULL, search);
In your case, the space character would also act as a delimiter in the line.
Note that strtok modifies the string passed in (it overwrites each delimiter it finds with a nul-character), so if you need the original intact, tokenize a copy instead, e.g. one allocated with malloc.
It might also be easier to use fread() to read a block from a file
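As a rough sketch of the copy-then-tokenize idea mentioned above (the name count_tokens and the delimiter set " \t\n" are my own assumptions, not part of the man page sample):

```c
#include <stdlib.h>
#include <string.h>

/* hypothetical helper: count the whitespace-separated words in 'text'
 * without modifying it, by tokenizing a heap copy.
 * returns -1 on allocation failure. */
int count_tokens (const char *text)
{
    const char *delims = " \t\n";       /* assumed delimiter set */
    size_t len = strlen (text) + 1;     /* +1 for the nul-char */
    char *copy = malloc (len);
    int n = 0;

    if (!copy)
        return -1;
    memcpy (copy, text, len);

    /* each tok points into copy and is only valid until free() */
    for (char *tok = strtok (copy, delims); tok; tok = strtok (NULL, delims))
        n++;

    free (copy);
    return n;
}
```

Because the tokens point into the copy, anything you want to keep past the free() has to be copied out first.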
As mentioned in comments, using feof() does not work as would be expected. And, as described in this answer unless the content of the file is formatted with very predictable content, using any of the scanf family to parse out the words is overly complicated. I do not recommend using it for that purpose.
There are many other, better ways to read content of a file, word by word. My preference is to read each line into a buffer, then parse the buffer to extract the words. This requires determining those characters that may be in the file, but would not be considered part of a word. Characters such as \n,\t, (space), -, etc. should be considered delimiters, and can be used to extract the words. The following is a recipe for extracting words from a file: (example code for a few of the items is included below these steps.)
1. Read file to count words, and get the length of the longest word.
2. Use count, and longest values from 1st step to allocate memory for words.
3. Rewind the file.
4. Read file line by line into a line buffer using while(fgets(line, size, fp))
5. Parse each new line into words using delimiters and store each word into arrays of step 2.
6. Use resulting array of words as necessary.
7. free all memory allocated when finished with arrays
Some example of code to do some of these tasks:
// Get count of words, and longest word in file
// (requires <stdio.h> and <ctype.h>)
int longestWord(char *file, int *nWords)
{
FILE *fp=0;
int cnt=0, longest=0, numWords=0;
int c;
fp = fopen(file, "r");
if(fp)
{
while ( (c = fgetc(fp) ) != EOF )
{
if ( isalnum (c) ) cnt++;
// a delimiter ends a word only if a word is in progress
else if ( cnt > 0 && ( ispunct (c) || isspace(c) || c == '\0' ) )
{
if (cnt > longest) longest = cnt;
cnt = 0;
numWords++;
}
}
if (cnt > 0) // count a final word that ends at EOF
{
if (cnt > longest) longest = cnt;
numWords++;
}
*nWords = numWords;
fclose(fp);
}
else return -1;
return longest;
}
// Create indexable memory for word arrays
char ** Create2DStr(ssize_t numStrings, ssize_t maxStrLen)
{
int i;
char **a = {0};
a = calloc(numStrings, sizeof(char *));
for(i=0;i<numStrings; i++)
{
a[i] = calloc(maxStrLen + 1, 1);
}
return a;
}
Usage: For a file with 25 words, the longest being 80 bytes:
char **strArray = Create2DStr(25, 80+1);//creates 25 array locations
//each 80+1 characters long
//(+1 is room for null terminator.)
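Steps 4 and 5 of the recipe are not shown above, so here is a minimal sketch of how they might look. The name load_words, the LINESZ constant and the delimiter set are assumptions for illustration; words is expected to be storage such as that returned by Create2DStr, where each string has room for maxlen characters plus the null terminator:

```c
#include <stdio.h>
#include <string.h>

#define LINESZ 1024     /* assumed maximum line length */

/* hypothetical helper: read fp line by line, split each line into
 * words and store up to maxwords of them in words[].
 * returns the number of words stored. */
size_t load_words (FILE *fp, char **words, size_t maxwords, size_t maxlen)
{
    const char *delims = " \t\n-.,;:!?";    /* assumed delimiter set */
    char line[LINESZ];
    size_t n = 0;

    while (n < maxwords && fgets (line, sizeof line, fp)) {
        for (char *tok = strtok (line, delims);
             tok && n < maxwords;
             tok = strtok (NULL, delims)) {
            strncpy (words[n], tok, maxlen);    /* truncates overlong words */
            words[n][maxlen] = 0;               /* always terminate */
            n++;
        }
    }
    return n;
}
```

A line longer than LINESZ - 1 characters is read in pieces here, so a word that straddles the boundary would be split in two. Size the buffer for your data or add a short-read check.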
int i=0;
char words[50][50];
while(i < 50 && fscanf(file, "%49s", words[i]) == 1)
i++;
I wouldn't entirely recommend doing it this way, because of the unknown amount of words in the file, and the unknown length of a "word". Either can be over the size of '50'. Just do it dynamically, instead. Still, this should show you how it works.
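A rough sketch of what doing it dynamically could look like is below. The names read_words and free_words are hypothetical, and the 127-character scratch buffer is an assumption: a longer "word" would be split across two entries rather than overflow.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical helper: read every word from fp into a growing array.
 * stores the number of words in *count and returns the array
 * (NULL if nothing was stored). */
char **read_words (FILE *fp, size_t *count)
{
    char word[128], **words = NULL;
    size_t n = 0, cap = 0;

    while (fscanf (fp, "%127s", word) == 1) {
        if (n == cap) {                         /* grow the pointer array */
            size_t newcap = cap ? cap * 2 : 8;
            void *tmp = realloc (words, newcap * sizeof *words);
            if (!tmp)
                break;                          /* keep what we have so far */
            words = tmp;
            cap = newcap;
        }
        size_t len = strlen (word) + 1;         /* +1 for the nul-char */
        words[n] = malloc (len);
        if (!words[n])
            break;
        memcpy (words[n++], word, len);
    }
    *count = n;
    return words;
}

/* hypothetical helper: release everything read_words allocated */
void free_words (char **words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free (words[i]);
    free (words);
}
```

Doubling the capacity keeps the number of realloc calls small. Growing by one element per word also works, it just reallocates more often.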
How can I use fscanf to take word by word and put each word in an array of strings?
Read each word twice: first to find length via "%n". 2nd time, save it. (Inefficient yet simple)
Re-size strings as you go. Again inefficient, yet simple.
// Rough untested sample code - still need to add error checking.
size_t string_count = 0;
char **strings = NULL;
for (;;) {
long pos = ftell(file);
int n = 0;
fscanf(file, "%*s%n", &n); // record where scanning a "word" stopped
if (n == 0) break;
fseek(file, pos, SEEK_SET); // go back;
strings = realloc(strings, sizeof *strings * (string_count+1));// increase array size
strings[string_count] = malloc(n + 1u); // Get enough memory for the word
fscanf(file, "%s ", strings[string_count] ); // read/save word
string_count++; // advance to the next slot
}
// use strings[], string_count
// When done, free each strings[] and then strings
I have a very strange problem, I'm trying to read a .txt file with C, and the data is structured like this:
%s
%s
%d %d
Since I have to read the strings all the way to \n I'm reading it like this:
while(!feof(file)){
fgets(s[i].title,MAX_TITLE,file);
fgets(s[i].artist,MAX_ARTIST,file);
char a[10];
fgets(a,10,file);
sscanf(a,"%d %d",&s[i].time.min,&s[i++].time.sec);
}
However, the very first integer I read in s.time.min shows a random big number.
I'm using the sscanf right now since a few people had a similar issue, but it doesn't help.
Thanks!
EDIT: The integers represent time, they will never exceed 5 characters combined, including the white space between.
Note, I take your post to be reading values from 3 different lines, e.g.:
%s
%s
%d %d
(primarily evidenced by your use of fgets, a line-oriented input function, which reads a line of input (up to and including the '\n') each time it is called.) If that is not the case, then the following does not apply (and can be greatly simplified)
Since you are reading multiple values into a single element in an array of struct, you may find it better (and more robust), to read each value and validate each value using temporary values before you start copying information into your structure members themselves. This allows you to (1) validate the read of all values, and (2) validate the parse, or conversion, of all required values before storing members in your struct and incrementing your array index.
Additionally, you will need to remove the trailing '\n' from both title and artist to prevent having embedded newlines dangling off the end of your strings (which will cause havoc with searching for either a title or artist). For instance, putting it all together, you could do something like:
void rmlf (char *s);
....
char title[MAX_TITLE] = "";
char artist[MAX_ARTIST] = "";
char a[10] = "";
int min, sec;
...
while (fgets (title, MAX_TITLE, file) && /* validate read of values */
fgets (artist, MAX_ARTIST, file) &&
fgets (a, 10, file)) {
if (sscanf (a, "%d %d", &min, &sec) != 2) { /* validate conversion */
fprintf (stderr, "error: failed to parse 'min' 'sec'.\n");
continue; /* skip line - tailor to your needs */
}
rmlf (title); /* remove trailing newline */
rmlf (artist);
s[i].time.min = min; /* copy to struct members & increment index */
s[i].time.sec = sec;
strncpy (s[i].title, title, MAX_TITLE);
strncpy (s[i++].artist, artist, MAX_ARTIST);
}
/** remove trailing newline from 's'. */
void rmlf (char *s)
{
if (!s || !*s) return;
for (; *s && *s != '\n'; s++) {}
*s = 0;
}
(note: this will also read all values until an EOF is encountered without using feof (see Related link: Why is “while ( !feof (file) )” always wrong?))
Protecting Against a Short-Read with fgets
Following on from Jonathan's comment, when using fgets you should really check to ensure you have actually read the entire line, and have not experienced a short read where the maximum character count you supply is not sufficient to read the entire line (e.g. a short read because characters in that line remain unread).
If a short read occurs, that will completely destroy your ability to read any further lines from the file, unless you handle the failure correctly. This is because the next attempt to read will NOT start reading on the line you think it is reading and instead attempt to read the remaining characters of the line where the short read occurred.
You can validate a read by fgets by validating the last character read into your buffer is in fact a '\n' character. (if the line is longer than the max you specify, the last character before the nul-terminating character will be an ordinary character instead.) If a short read is encountered, you must then read and discard the remaining characters in the long line before continuing with your next read. (unless you are using a dynamically allocated buffer where you can simply realloc as required to read the remainder of the line, and your data structure)
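For the dynamically allocated case mentioned in passing, a minimal sketch might look like the following. The name read_line and the 64-byte chunk size are assumptions for illustration. The idea is simply to keep calling fgets and growing the buffer with realloc until the '\n' shows up:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical helper: return one complete line (without the '\n')
 * in allocated storage, or NULL at EOF or on allocation failure.
 * the caller is responsible for calling free() on the result. */
char *read_line (FILE *fp)
{
    char chunk[64], *line = NULL;       /* assumed chunk size */
    size_t len = 0;

    while (fgets (chunk, sizeof chunk, fp)) {
        size_t n = strcspn (chunk, "\n");       /* chars before '\n' */
        char *tmp = realloc (line, len + n + 1);

        if (!tmp) {                             /* allocation failed */
            free (line);
            return NULL;
        }
        line = tmp;
        memcpy (line + len, chunk, n);          /* append this chunk */
        len += n;
        line[len] = 0;

        if (chunk[n] == '\n')                   /* whole line now read */
            break;
    }
    return line;
}
```

With this approach there is no short read to recover from, at the cost of having to free each line (or copy it into the struct and free it) on every iteration.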
Your situation complicates the validation by requiring data from 3 lines from the input file for each struct element. You must always maintain your 3-line read in sync, reading all 3 lines as a group during each iteration of your read loop (even if a short read occurs). That means you must validate that all 3 lines were read and that no short read occurred in order to handle any one short read without exiting your input loop. (You can validate each individually if you just want to terminate input on any one short read, but that leads to a very inflexible input routine.)
You can tweak the rmlf function above to a function that validates each read by fgets in addition to removing the trailing newline from the input. I have done that below in a function called, surprisingly, shortread. The tweaks to the original function and read loop could be coded something like this:
int shortread (char *s, FILE *fp);
...
for (idx = 0; idx < MAX_SONGS;) {
int t, a, b;
t = a = b = 0;
/* validate fgets read of complete line */
if (!fgets (title, MAX_TITLE, fp)) break;
t = shortread (title, fp);
if (!fgets (artist, MAX_ARTIST, fp)) break;
a = shortread (artist, fp);
if (!fgets (buf, MAX_MINSEC, fp)) break;
b = shortread (buf, fp);
if (t || a || b) continue; /* if any shortread, skip */
if (sscanf (buf, "%d %d", &min, &sec) != 2) { /* validate conversion */
fprintf (stderr, "error: failed to parse 'min' 'sec'.\n");
continue; /* skip line - tailor to your needs */
}
s[idx].time.min = min; /* copy to struct members & increment index */
s[idx].time.sec = sec;
strncpy (s[idx].title, title, MAX_TITLE);
strncpy (s[idx].artist, artist, MAX_ARTIST);
idx++;
}
...
/** validate complete line read, remove trailing newline from 's'.
* returns 1 on shortread, 0 - valid read, -1 invalid/empty string.
* if shortread, read/discard remainder of long line.
*/
int shortread (char *s, FILE *fp)
{
if (!s || !*s) return -1;
for (; *s && *s != '\n'; s++) {}
if (*s != '\n') {
int c;
while ((c = fgetc (fp)) != '\n' && c != EOF) {}
return 1;
}
*s = 0;
return 0;
}
(note: in the example above, the result of the shortread check is saved for each of the lines that make up a title, artist, time group, and the whole group is skipped if any one of them was a short read.)
To validate the approach I put together a short example that will help put it all in context. Look over the example and let me know if you have any further questions.
#include <stdio.h>
#include <string.h>
/* constant definitions */
enum { MAX_MINSEC = 10, MAX_ARTIST = 32, MAX_TITLE = 48, MAX_SONGS = 64 };
typedef struct {
int min;
int sec;
} stime;
typedef struct {
char title[MAX_TITLE];
char artist[MAX_ARTIST];
stime time;
} songs;
int shortread (char *s, FILE *fp);
int main (int argc, char **argv) {
char title[MAX_TITLE] = "";
char artist[MAX_ARTIST] = "";
char buf[MAX_MINSEC] = "";
int i, idx, min, sec;
songs s[MAX_SONGS] = {{ .title = "", .artist = "" }};
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
for (idx = 0; idx < MAX_SONGS;) {
int t, a, b;
t = a = b = 0;
/* validate fgets read of complete line */
if (!fgets (title, MAX_TITLE, fp)) break;
t = shortread (title, fp);
if (!fgets (artist, MAX_ARTIST, fp)) break;
a = shortread (artist, fp);
if (!fgets (buf, MAX_MINSEC, fp)) break;
b = shortread (buf, fp);
if (t || a || b) continue; /* if any shortread, skip */
if (sscanf (buf, "%d %d", &min, &sec) != 2) { /* validate conversion */
fprintf (stderr, "error: failed to parse 'min' 'sec'.\n");
continue; /* skip line - tailor to your needs */
}
s[idx].time.min = min; /* copy to struct members & increment index */
s[idx].time.sec = sec;
strncpy (s[idx].title, title, MAX_TITLE);
strncpy (s[idx].artist, artist, MAX_ARTIST);
idx++;
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (i = 0; i < idx; i++)
printf (" %2d:%2d %-32s %s\n", s[i].time.min, s[i].time.sec,
s[i].artist, s[i].title);
return 0;
}
/** validate complete line read, remove trailing newline from 's'.
* returns 1 on shortread, 0 - valid read, -1 invalid/empty string.
* if shortread, read/discard remainder of long line.
*/
int shortread (char *s, FILE *fp)
{
if (!s || !*s) return -1;
for (; *s && *s != '\n'; s++) {}
if (*s != '\n') {
int c;
while ((c = fgetc (fp)) != '\n' && c != EOF) {}
return 1;
}
*s = 0;
return 0;
}
Example Input
$ cat ../dat/titleartist.txt
First Title I Like
First Artist I Like
3 40
Second Title That Is Way Way Too Long To Fit In MAX_TITLE Characters
Second Artist is Fine
12 43
Third Title is Fine
Third Artist is Way Way Too Long To Fit in MAX_ARTIST
3 23
Fourth Title is Good
Fourth Artist is Good
32274 558212 (too long for MAX_MINSEC)
Fifth Title is Good
Fifth Artist is Good
4 27
Example Use/Output
$ ./bin/titleartist <../dat/titleartist.txt
3:40 First Artist I Like First Title I Like
4:27 Fifth Artist is Good Fifth Title is Good
Instead of sscanf(), I would use strtok() and atoi().
Just curious, why only 10 bytes for the two integers? Are you sure they are always that small?
By the way, I apologize for such a short answer. I'm sure there is a way to get sscanf() to work for you, but in my experience sscanf() can be rather finicky so I'm not a big fan. When parsing input with C, I have just found it a lot more efficient (in terms of how long it takes to write and debug the code) to just tokenize the input with strtok() and convert each piece individually with the various ato? functions (atoi, atof, atol, strtod, etc.; see stdlib.h). It keeps things simpler, because each piece of input is handled individually, which makes debugging any problems (should they arise) much easier. In the end I typically spend a lot less time getting such code to work reliably than I did when I used to try to use sscanf().
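To make that concrete, here is a rough sketch of the tokenize-then-convert approach for the "min sec" line. The name parse_minsec is hypothetical, and I have swapped in strtol for atoi because atoi gives no way to tell a failed conversion from a legitimate 0:

```c
#include <stdlib.h>
#include <string.h>

/* hypothetical helper: parse "min sec" from buf (buf is modified by
 * strtok). returns 1 on success, 0 if either field is missing or is
 * not a plain integer. */
int parse_minsec (char *buf, int *min, int *sec)
{
    long v[2];

    for (int i = 0; i < 2; i++) {
        char *end;
        char *tok = strtok (i ? NULL : buf, " \t\n");

        if (!tok)                       /* field missing */
            return 0;
        v[i] = strtol (tok, &end, 10);
        if (end == tok || *end)         /* no digits, or trailing junk */
            return 0;
    }
    *min = (int)v[0];                   /* no range check in this sketch */
    *sec = (int)v[1];
    return 1;
}
```

Called as parse_minsec (a, &s[i].time.min, &s[i].time.sec) right after the fgets that fills a, each field can be checked on its own.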
Use "%*s %*s %d %d" as your format string, instead...
You seem to be expecting sscanf to automagically skip the two tokens leading up to the decimal digit fields. It doesn't do that unless you explicitly tell it to (hence the pair of %*s).
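A small sketch of what the suppressed conversions do is below. It assumes the two leading fields are single whitespace-delimited tokens sitting in the same string as the numbers, which is not the case if each fgets call hands you a separate line. The name parse_after_two_words is hypothetical:

```c
#include <stdio.h>

/* hypothetical helper: skip two whitespace-delimited tokens, then
 * convert the two integers that follow.
 * returns 1 on success, 0 otherwise. */
int parse_after_two_words (const char *s, int *min, int *sec)
{
    /* %*s reads and discards a token. discarded fields are not
     * counted in the return value, so success is 2, not 4. */
    return sscanf (s, "%*s %*s %d %d", min, sec) == 2;
}
```

Note that a title or artist containing spaces is more than one token to %s, so this only works for one-word fields.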
You can't expect the people who designed C to have designed it the same way as you would. You NEED to check the return value, as iharob said.
That's not all. You NEED to read (and understand relatively well) the entire scanf manual (the one written by OpenGroup is okay). That way you know how to use the function (including all of the subtle nuances of format strings) and what to do with the return value.
As a programmer, you need to read. Remember that well.