Pattern Recognition for File input in C


I was trying to get input from a file in C using scanf. The data in the file is as follows:
223234 <Justin> <Riverside>
This is the format string I tried:
FILE* fid;
int id;
char name[100], city[100];
char dontcare1[40], dontcare3[40];
char dontcare2,dontcare4[40],dontcare5;
fid = fopen("test.txt", "r");
fscanf(fid,"%d%[^<]%c%[^<]%c%[>]%c ",&id,&dontcare1[0],
&dontcare2,&dontcare3[0],&dontcare4[0],
&city[0],&dontcare5);
I was wondering if there is a better way to do this. How would I account for whitespace in the file without creating extra variables? This doesn't seem to pick up the city name enclosed in the brackets.

In the *scanf() functions, the format string can contain literal characters that must match the input, and a single space in the format matches any amount of whitespace (including none).
My example is simplified with sscanf() in order to avoid dealing with a file, but it works the same with fscanf().
The trick here is to use %n to obtain the number of characters read up to that point; this way, we ensure the last > literal has actually been read
(we cannot know that from the return value of *scanf(), which only counts conversions).
/**
gcc -std=c99 -o prog_c prog_c.c \
-pedantic -Wall -Wextra -Wconversion \
-Wc++-compat -Wwrite-strings -Wold-style-definition -Wvla \
-g -O0 -UNDEBUG -fsanitize=address,undefined
**/
#include <stdio.h>
int
main(void)
{
const char *line="223234 <Justin> <Riverside>";
int id;
char name[100], city[100];
int n_read=-1;
sscanf(line, "%d <%[^>]> <%[^>]>%n",
&id, name, city, &n_read);
if(n_read!=-1) // read till the end then updated
{
printf("id=%d\n", id);
printf("name=%s\n", name);
printf("city=%s\n", city);
}
return 0;
}
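As a variation on the program above, here is a minimal sketch that wraps the same format string in a function and adds field widths so the arrays cannot overflow. The name parse_record and its signature are my own assumptions for illustration, not part of the original answer.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 only if the whole pattern
 * "<number> <name> <city>" matched, including the final '>'. */
int parse_record(const char *line, int *id, char name[100], char city[100])
{
    int n_read = -1;   /* stays -1 unless sscanf() reaches %n */
    int n_conv = sscanf(line, "%d <%99[^>]> <%99[^>]>%n",
                        id, name, city, &n_read);

    /* Three conversions alone do not prove the last '>' was present;
     * %n is only stored once everything before it has matched. */
    return n_conv == 3 && n_read != -1;
}
```

With the sample line this returns 1. If the closing > after the city is missing, sscanf() still reports three conversions, but n_read stays at -1 and the function returns 0.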

When trying to open the file, it's useful to ensure that the file was actually opened successfully.
FILE *fid;
fid = fopen("path/to/file", "r");
if (fid == NULL){
printf("Unable to open file. \n");
return -1;
}
Actually addressing your problem, I'd probably just use string.h's strtok function, then use a space as a delimiter.
Also, I wouldn't use scanf, but rather fgets... The reasons for this can be found in various other SO articles. The following is an untested solution.
char line[100], line_parse[100]; // Buffers to store lines of up to 99 characters
char *ret; // Current token returned by strtok
// Read the file one line at a time until fgets reports EOF or an error.
while (fgets(line, sizeof(line), fid) != NULL)
{
// Copy the line for parsing, as strtok changes original string
strcpy(line_parse, line);
// Separate the line into tokens
ret = strtok(line_parse, " ");
while (ret != NULL)
{/*do something with current field*/
ret = strtok(NULL, " "); // Move onto next field
}
}
Please be aware that strtok is not thread-safe. In multi-threaded code, you should therefore not use this function. Unfortunately, the ISO C standard itself does not provide a thread-safe version of the function. But many platforms provide such a function as an extension: On POSIX-compliant platforms (such as Linux), you can use the function strtok_r. On Microsoft Windows, you can use the function strtok_s. Both of these functions are thread-safe.
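For the POSIX case, a minimal sketch of the same tokenizing loop with strtok_r might look like the following. The helper name split_fields and the feature-test macro line are my own assumptions; on Windows you would swap in strtok_s, which I have not shown here.

```c
#define _POSIX_C_SOURCE 200809L   /* assumption: needed to expose strtok_r under strict -std=c99 */
#include <string.h>

/* Hypothetical helper: split buf in place on spaces and newlines,
 * store up to max token pointers in out[], return the token count. */
size_t split_fields(char *buf, char *out[], size_t max)
{
    size_t n = 0;
    char *save = NULL;   /* caller-owned state instead of strtok's hidden static */

    for (char *tok = strtok_r(buf, " \n", &save);
         tok != NULL && n < max;
         tok = strtok_r(NULL, " \n", &save))
        out[n++] = tok;

    return n;
}
```

Because the position is kept in save rather than in a static variable inside the library, two threads (or two nested loops) can tokenize different strings at the same time.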

You can actually do this quite simply by reading the line into an array (buffer) and then parsing what you need from the line with sscanf(). Don't use scanf() directly as that opens you up to a whole array of pitfalls related to what characters remain unread in your input stream. Instead, do all input by reading a line at a time and then use sscanf() to parse the values from the buffer, just as you would with scanf(), but by using fgets() to read, you consume an entire line at a time, and what remains in your input stream does not depend on the success or failure of your conversions.
For example, you could do:
#include <stdio.h>
#define MAXC 1024
#define NCMAX 100
int main (void) {
char buf[MAXC],
name[NCMAX],
city[NCMAX];
unsigned n;
if (!fgets (buf, MAXC, stdin))
return 1;
if (sscanf (buf, "%u <%99[^>]> <%99[^>]>", &n, name, city) != 3) {
fputs ("error: invalid format\n", stderr);
return 1;
}
printf ("no. : %u\nname : %s\ncity : %s\n", n, name, city);
}
The sscanf() format string is key. "%u <%99[^>]> <%99[^>]>" reads the number as an unsigned value. <%99[^>]> consumes the '<', then the character class %99[^>] uses the field-width modifier of 99 to protect your array bounds, and the class [^>] reads all characters up to, but not including, the '>' (it does the same for the city next). The conversion is validated by checking the return to ensure three valid conversions took place. If not, the error is handled.
Example Use/Output
With your input in the file dat/no_name_place.txt, the file is simply redirected on stdin and read by the program resulting in:
$ ./bin/no_name_city < dat/no_name_place.txt
no. : 223234
name : Justin
city : Riverside
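To see how the field width and the return check work together, here is a small sketch with the arrays shrunk to 8 bytes. The size, the helper name parse_small and the sample names are assumptions chosen only to make the effect visible.

```c
#include <stdio.h>

#define NCMAX 8   /* deliberately tiny so an over-long name is easy to trigger */

/* Hypothetical helper: returns the raw sscanf() conversion count. */
int parse_small(const char *buf, unsigned *n, char name[NCMAX], char city[NCMAX])
{
    /* %7[^>] stores at most 7 characters plus the terminating '\0'. */
    return sscanf(buf, "%u <%7[^>]> <%7[^>]>", n, name, city);
}
```

For "1 <Justin> <Rome>" the count is 3. For "1 <Bartholomew> <Rome>" the name conversion stops after "Barthol", the next input character is not the expected '>', matching ends there and the count is 2, so the != 3 test rejects the line instead of writing past the end of the array.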

If you have to use scanf(), the other answers seem to cover every aspect. This is an alternative, getting input character by character using fgetc() and strcpy().
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAX_SIZE 100
int main(void)
{
int id = 0, c = 0;
char buff1[MAX_SIZE], buff2[MAX_SIZE];
size_t i = 0U;
FILE *fptr = NULL;
if (!(fptr = fopen("test.txt", "r")))
{
perror("error opening file");
return -1;
}
while ((c = fgetc(fptr)) != EOF)
{
if (isdigit(c)) /* maybe check INT_MAX here if you are planning to scan big numbers */
{
id = (id * 10) + (c - '0');
}
if (i != 0 && c == ' ')
{
buff2[i] = '\0';
strcpy(buff1, buff2);
i = 0U;
}
if (isalpha(c))
{
if (i < MAX_SIZE - 1)
{
buff2[i++] = c;
}
else
{
fputs("Buff full", stderr);
return -1;
}
}
}
buff2[i] = '\0';
return 0;
}

Related

Segmentation fault while reading data from file

I'm getting a segmentation fault error while parsing data from a CSV file in C Language.
I believe the error is given while reading the last line <person[i].status> as if i comment the same line the code runs perfectly.
Contents of CSV file:
1;A;John Mott;D;30;Z
2;B;Judy Moor;S;60;X
3;A;Kae Blanchett;S;42;y
4;B;Jair Tade;S;21;W
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct Person
{
int id;
char key;
char name[16];
char rel;
int age;
char status;
} Person;
int main()
{
Person person[12];
FILE *f = fopen("data.csv", "r");
char buffer[256];
if (f != NULL)
{
int i = 0;
printf("\nFile OK!\n");
printf("Printing persons:\n");
while (fgets(buffer, 256, f))
{
person[i].id = atoi(strtok(buffer, ";"));
person[i].key = strtok(NULL, ";")[0];
strcpy(person[i].name, strtok(NULL, ";"));
person[i].rel = strtok(NULL, ";")[0];
person[i].age = atoi(strtok(buffer, ";"));
person[i].status = strtok(NULL, ";")[0]; // error: segmentation fault
printf("id: %d\n", person[i].id);
printf("key: %c\n", person[i].key);
printf("name: %s\n", person[i].name);
printf("rel: %c\n", person[i].rel);
printf("age: %d\n", person[i].age);
printf("status: %c\n", person[i].status);
i++;
}
}
else
{
printf("\nFile BAD!\n");
}
return 0;
}
Thank you for your help!

While you have a good answer addressing your problems with strtok(), you may be over-complicating your code by using strtok() to begin with. When reading a delimited file with a fixed delimiter, reading a line-at-a-time into a sufficiently sized buffer and then separating the buffer into the needed values with sscanf() can provide a succinct (and in the case of your use of atoi() a more robust) solution.
Your fields are easily separated in this case using a carefully crafted format-string. For example, reading each line into a buffer (buf in this case) you can separate each of the lines into the needed values with:
if (sscanf (buf, "%d;%c;%15[^;];%c;%d;%c", /* convert to person/VALIDATE */
&person[n].id, &person[n].key, person[n].name,
&person[n].rel, &person[n].age, &person[n].status) == 6)
The conversion to int by sscanf() at least minimally validates the integer conversion. Not so with atoi() which will happily take atoi ("my cow") and fail silently returning zero without any indication things have gone wrong.
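If you do stay with strtok() and need to convert a token yourself, strtol() lets you detect the failures that atoi() hides. This is a swapped-in alternative rather than what the code in this answer uses, and the helper name parse_int is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert the whole string s to int.
 * Returns 1 on success and stores the value in *out, 0 on failure. */
int parse_int(const char *s, int *out)
{
    char *end;

    errno = 0;
    long v = strtol(s, &end, 10);

    if (end == s || *end != '\0')   /* no digits, or trailing junk */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)   /* out of range */
        return 0;

    *out = (int)v;
    return 1;
}
```

Here parse_int("my cow", &v) returns 0 and leaves v alone, whereas atoi("my cow") just returns 0 as if the input had been the number zero.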
Note, in every conversion to string, you must provide a field-width modifier to limit the number of characters stored to one less than your array can hold (saving room for the '\0' nul-terminating character). Otherwise the use of the scanf() family "%s" or "%[..]" is no safer than using gets(). See Why gets() is so dangerous it should never be used!
The same protection of your array bounds for person[] applies on your read loop. Simply keeping a count of the successful conversions and testing before the next read is all you need, e.g.
#define NPERSONS 12 /* if you need a constant, #define one (or more) */
#define MAXNAME 16
#define MAXC 1024
...
char buf[MAXC]; /* buffer to hold each line */
size_t n = 0; /* person counter/index */
Person person[NPERSONS] = {{ .id = 0 }}; /* initialize all elements */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
...
while (n < NPERSONS && fgets (buf, MAXC, fp)) { /* protect array, read line */
if (sscanf (buf, "%d;%c;%15[^;];%c;%d;%c", /* convert to person/VALIDATE */
&person[n].id, &person[n].key, person[n].name,
&person[n].rel, &person[n].age, &person[n].status) == 6)
n++; /* increment count on good conversion */
}
As shown with the #defines above, don't use MagicNumbers in your code. (e.g. 12, 16). Instead declare a constant at the top of your code that provides a convenient single-location to change if your limits later need adjustment.
In the same vein, do not hardcode filenames. There is no reason you should have to re-compile your code just to read from a different file. Pass the filename as the first argument to your program (that's what argc and argv are for), or prompt the user and take the filename as input. Above, the code takes the filename as the first argument, or reads from stdin by default if no argument is provided (like most Unix utilities do).
Putting it all together, you could do something similar to:
#include <stdio.h>
#define NPERSONS 12 /* if you need a constant, #define one (or more) */
#define MAXNAME 16
#define MAXC 1024
typedef struct Person {
int id;
char key;
char name[MAXNAME];
char rel;
int age;
char status;
} Person;
int main (int argc, char **argv) {
char buf[MAXC]; /* buffer to hold each line */
size_t n = 0; /* person counter/index */
Person person[NPERSONS] = {{ .id = 0 }}; /* initialize all elements */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (n < NPERSONS && fgets (buf, MAXC, fp)) { /* protect array, read line */
if (sscanf (buf, "%d;%c;%15[^;];%c;%d;%c", /* convert to person/VALIDATE */
&person[n].id, &person[n].key, person[n].name,
&person[n].rel, &person[n].age, &person[n].status) == 6)
n++; /* increment count on good conversion */
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
for (size_t i = 0; i < n; i++) /* output results */
printf ("person[%zu] %3d %c %-15s %c %3d %c\n", i,
person[i].id, person[i].key, person[i].name,
person[i].rel, person[i].age, person[i].status);
}
(note: you only need one call to printf() to output any contiguous block of output with conversions. If you have no conversions required, use puts() or fputs() if end-of-line control is needed)
Lastly, do not skimp on buffer size. 16 seems horribly short for a name field (64 is still pushing it). By using the field-width modifier you are protected against Undefined Behavior due to overwriting your array bounds (the code will simply skip the line), but you should add an else { ... } condition to output an error in that case. 16 is sufficient for your example data, but for general use, you would want to adjust that to a larger value.
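A minimal sketch of that else branch, pulled out into a function so it can be tried on its own. The function name load_persons and the message wording are my assumptions; the struct and format string are the ones from the program above.

```c
#include <stdio.h>

#define MAXNAME 16
#define MAXC 1024

typedef struct Person {
    int id;
    char key;
    char name[MAXNAME];
    char rel;
    int age;
    char status;
} Person;

/* Hypothetical helper: fill p[] from fp, report every rejected line on
 * stderr, and return the number of records actually stored. */
size_t load_persons(FILE *fp, Person *p, size_t max)
{
    char buf[MAXC];
    size_t n = 0, lineno = 0;

    while (n < max && fgets(buf, MAXC, fp)) {
        lineno++;
        if (sscanf(buf, "%d;%c;%15[^;];%c;%d;%c",
                   &p[n].id, &p[n].key, p[n].name,
                   &p[n].rel, &p[n].age, &p[n].status) == 6)
            n++;                                   /* keep the record */
        else                                       /* e.g. name longer than 15 chars */
            fprintf(stderr, "line %zu skipped: %s", lineno, buf);
    }
    return n;
}
```

A rejected line may leave p[n] partially filled, but since n is not incremented the next good line simply overwrites it.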
Example Use/Output
With your sample input in the file named dat/person_id-status.txt, you could do:
$ ./bin/person_id-status dat/person_id-status.txt
person[0] 1 A John Mott D 30 Z
person[1] 2 B Judy Moor S 60 X
person[2] 3 A Kae Blanchett S 42 y
person[3] 4 B Jair Tade S 21 W
Those are the main points that struck me looking over your code. (I'm sure I've forgotten to mention one or two more.) Look things over and let me know if you have further questions.

Strtok strange behaviour

I'm having some troubles using strtok function.
As an exercise I have to deal with a text file by ruling out white spaces, transforming initials into capital letters and printing no more than 20 characters in a line.
Here is a fragment of my code:
fgets(sentence, SIZE, f1_ptr);
char *tok_ptr = strtok(sentence, " \n"); //tokenizing each line read
tok_ptr[0] = toupper(tok_ptr[0]); //initials to capital letters
int num = 0, i;
while (!feof(f1_ptr)) {
while (tok_ptr != NULL) {
for (i = num; i < strlen(tok_ptr) + num; i++) {
if (i % 20 == 0 && i != 0) //maximum of 20 char per line
fputc('\n', stdout);
fputc(tok_ptr[i - num], stdout);
}
num = i;
tok_ptr = strtok(NULL, " \n");
if (tok_ptr != NULL)
tok_ptr[0] = toupper(tok_ptr[0]);
}
fgets(sentence, SIZE + 1, f1_ptr);
tok_ptr = strtok(sentence, " \n");
if (tok_ptr != NULL)
tok_ptr[0] = toupper(tok_ptr[0]);
}
The text is just a bunch of lines I just show as a reference:
Watch your thoughts ; they become words .
Watch your words ; they become actions .
Watch your actions ; they become habits .
Watch your habits ; they become character .
Watch your character ; it becomes your destiny .
Here is what I obtain in the end:
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;THeyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacteR.Wat
chYourCharacter;ItBe
comesYourDEstiny.Lao
-Tze
The final result is mostly correct, but sometimes (for example "they" in they become (and only in that case) or "destiny") words are not correctly tokenized. So for example "they" is split into "t" and "hey" resulting in THey (DEstiny in the other instance) after the manipulations I made.
Is it some bug or am I missing something? Probably my code is not that efficient and some condition may end up being critical...
Thank you for the help, it's not that big of a deal, I just don't understand why such a behaviour is occurring.

You have a large number of errors in your code and you are over-complicating the problem. The most pressing error is the one covered in "Why is while ( !feof (file) ) always wrong?". Why? Trace the execution path within your loop. You attempt to read with fgets(), and then you use sentence without knowing whether EOF was reached, calling tok_ptr = strtok(sentence, " \n"); before you ever get around to checking feof(f1_ptr).
What happens when you actually reach EOF? That is why while ( !feof (file) ) is always wrong. Instead, you always want to control your read-loop with the return of the read function you are using, e.g. while (fgets(sentence, SIZE, f1_ptr) != NULL)
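The difference between the two loop conditions can be made visible by counting how often the loop body runs. This is only a sketch; both function names are hypothetical and the buffer size is arbitrary.

```c
#include <stdio.h>

/* Broken pattern: feof() only becomes true after a read has already
 * failed, so for a file ending in '\n' the body runs one extra time
 * with stale data still sitting in buf. */
size_t count_with_feof(FILE *fp)
{
    char buf[128];
    size_t n = 0;

    while (!feof(fp)) {
        char *r = fgets(buf, sizeof buf, fp);   /* result ignored on purpose */
        (void)r;
        n++;
    }
    return n;
}

/* Recommended pattern: the return of fgets() controls the loop. */
size_t count_with_fgets(FILE *fp)
{
    char buf[128];
    size_t n = 0;

    while (fgets(buf, sizeof buf, fp) != NULL)
        n++;
    return n;
}
```

On a three-line file that ends with a newline, the first function counts 4 iterations and the second counts 3. That fourth pass is where your code calls strtok() on a buffer that was never refreshed.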
What is it you actually need your code to do?
The larger question is why are you over-complicating the problem with strtok, and arrays (and fgets() for that matter)? Think about what you need to do:
read each character in the file,
if it is whitespace, ignore it, set the in-word flag false,
if it is non-whitespace: capitalize it if it is the 1st char in a word, output the char, set the in-word flag true and increment the number of chars output to the current line, and finally
if it is the 20th character output, output a newline and reset the counter to zero.
The bare-minimum tools you need from your C-toolbox are fgetc(), isspace() and toupper() from ctype.h, a counter for the number of characters output, and a flag to know if the character is the first non-whitespace character after a whitespace.
Implementing the logic
That makes the problem very simple. Read a character. Is it whitespace? If so, set your in-word flag false. Otherwise, if your in-word flag is false, capitalize the character; then output it, set your in-word flag true and increment your character count. The last thing you need to do is check whether your character count has reached the limit; if so, output a '\n' and reset your character count to zero. Repeat until you run out of characters.
You can turn that into code with something similar to the following:
#include <stdio.h>
#include <ctype.h>
#define CPL 20 /* chars per-line, if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
int c, in = 0, n = 0; /* char, in-word flag, no. of chars output in line */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while ((c = fgetc(fp)) != EOF) { /* read / validate each char in file */
if (isspace(c)) /* char is whitespace? */
in = 0; /* set in-word flag false */
else { /* otherwise, not whitespace */
putchar (in ? c : toupper(c)); /* output char, capitalize 1st in word */
in = 1; /* set in-word flag true */
n++; /* increment character count */
}
if (n == CPL) { /* CPL limit reached? */
putchar ('\n'); /* output newline */
n = 0; /* reset cpl counter */
}
}
putchar ('\n'); /* tidy up with newline */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
}
Example Use/Output
Given your input file stored on my computer in dat/text220.txt, you can produce the output you are looking for with:
$ ./bin/text220 dat/text220.txt
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.
(the executable for the code was compiled to bin/text220, I usually keep separate dat, obj, and bin directories for data, object files and executables to keep my source code directory clean)
note: by reading from stdin by default if no filename is provided as the first argument to the program, you can use your program to read input directly, e.g.
$ echo "my dog has fleas - bummer!" | ./bin/text220
MyDogHasFleas-Bummer
!
No fancy string functions required, just a loop, a character, a flag and a counter -- the rest is just arithmetic. It's always worth trying to boil your programming problems down to basic steps and then look around your C-toolbox and find the right tool for each basic step.
Using strtok
Don't get me wrong, there is nothing wrong with using strtok and it makes a fairly simple solution in this case -- the point I was making is that for simple character-oriented string-processing, it's often just as simple to loop over the characters in the line. You don't gain any efficiencies using fgets() with an array and strtok(), the read from the file is already placed into a buffer of BUFSIZ (see footnote 1).
If you did want to use strtok(), you should control your read-loop with the return from fgets() and then you can tokenize with strtok(), also checking its return at each point: a read-loop with fgets() and a tokenization loop with strtok(). Then you handle first-character capitalization and limit your output to 20-chars per-line.
You could do something like the following:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define CPL 20 /* chars per-line, if you need a constant, #define one (or more) */
#define MAXC 1024
#define DELIM " \t\r\n"
void putcharCPL (int c, int *n)
{
if (*n == CPL) { /* if n == limit */
putchar ('\n'); /* output '\n' */
*n = 0; /* reset value at mem address to 0 */
}
putchar (c); /* output character */
(*n)++; /* increment value at mem address */
}
int main (int argc, char **argv) {
char line[MAXC]; /* buffer to hold each line */
int n = 0; /* no. of chars output in line */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (fgets (line, MAXC, fp)) /* read each line and tokenize line */
for (char *tok = strtok (line, DELIM); tok; tok = strtok (NULL, DELIM)) {
putcharCPL (toupper(*tok), &n); /* convert 1st char to upper */
for (int i = 1; tok[i]; i++) /* output rest unchanged */
putcharCPL (tok[i], &n);
}
putchar ('\n'); /* tidy up with newline */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
}
(same output)
The putcharCPL() function is just a helper that checks if 20 characters have been output and if so outputs a '\n' and resets the counter. It then outputs the current character and increments the counter by one. A pointer to the counter is passed so it can be updated within the function making the updated value available back in main().
Look things over and let me know if you have further questions.
footnotes:
1. Depending on your version of glibc, the constant in the source setting the read-buffer size may be _IO_BUFSIZ. _IO_BUFSIZ was changed to BUFSIZ here: glibc commit 9964a14579e5eef9. For Linux, BUFSIZ is defined as 8192 (512 on Windows).

This is actually a much more interesting OP from a professional point of view than some of the comments may suggest, despite the 'newcomer' aspect of the question, which may sometimes raise fairly deep, underestimated issues.
The fun thing is that on my platform (W10, MSYS2, gcc v.10.2), your code runs fine with correct results:
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.
So first, congratulations, newcomer: your coding is not that bad.
This points to how different compilers may or may not protect against limited inappropriate coding or specification misuse, may or may not protect stacks or heaps.
This said, the comment by @Andrew Henle pointing to an illuminating answer about feof is quite relevant.
If you follow it and keep your feof test, but move it down so that it runs after the read check, not before (as below), your code should yield better results (note: I will just alter your code minimally, deliberately ignoring lesser issues):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <ctype.h>
#define SIZE 100 // add some leeway to avoid off-by-one issues
int main()
{
FILE* f1_ptr = fopen("C:\\Users\\Public\\Dev\\test_strtok", "r");
if (! f1_ptr)
{
perror("Open issue");
exit(EXIT_FAILURE);
}
char sentence[SIZE] = {0};
if (NULL == fgets(sentence, SIZE, f1_ptr))
{
perror("fgets issue"); // implementation-dependent
exit(EXIT_FAILURE);
}
errno = 0;
char *tok_ptr = strtok(sentence, " \n"); //tokenizing each line read
if (tok_ptr == NULL || errno)
{
perror("first strtok parse issue");
exit(EXIT_FAILURE);
}
tok_ptr[0] = toupper(tok_ptr[0]); //initials to capital letters
int num = 0;
size_t i = 0;
while (1) {
while (1) {
for (i = num; i < strlen(tok_ptr) + num; i++) {
if (i % 20 == 0 && i != 0) //maximum of 20 char per line
fputc('\n', stdout);
fputc(tok_ptr[i - num], stdout);
}
num = i;
tok_ptr = strtok(NULL, " \n");
if (tok_ptr == NULL) break;
tok_ptr[0] = toupper(tok_ptr[0]);
}
if (NULL == fgets(sentence, SIZE, f1_ptr)) // let's do away with the annoying +1,
// we have enough headroom
{
if (feof(f1_ptr))
{
fprintf(stderr, "\n%s\n", "Found EOF");
break;
}
else
{
perror("Unexpected fgets issue in loop"); // implementation-dependent
exit(EXIT_FAILURE);
}
}
errno = 0;
tok_ptr = strtok(sentence, " \n");
if (tok_ptr == NULL)
{
if (errno)
{
perror("strtok issue in loop");
exit(EXIT_FAILURE);
}
break;
}
tok_ptr[0] = toupper(tok_ptr[0]);
}
return 0;
}
$ ./test
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.
Found EOF

Reading multiple lines with different data types in C

I have a very strange problem, I'm trying to read a .txt file with C, and the data is structured like this:
%s
%s
%d %d
Since I have to read the strings all the way to \n I'm reading it like this:
while(!feof(file)){
fgets(s[i].title,MAX_TITLE,file);
fgets(s[i].artist,MAX_ARTIST,file);
char a[10];
fgets(a,10,file);
sscanf(a,"%d %d",&s[i].time.min,&s[i++].time.sec);
}
However, the very first integer I read in s.time.min shows a random big number.
I'm using the sscanf right now since a few people had a similar issue, but it doesn't help.
Thanks!
EDIT: The integers represent time, they will never exceed 5 characters combined, including the white space between.

Note, I take your post to be reading values from 3 different lines, e.g.:
%s
%s
%d %d
(primarily evidenced by your use of fgets, a line-oriented input function, which reads a line of input (up to and including the '\n') each time it is called.) If that is not the case, then the following does not apply (and can be greatly simplified)
Since you are reading multiple values into a single element in an array of struct, you may find it better (and more robust), to read each value and validate each value using temporary values before you start copying information into your structure members themselves. This allows you to (1) validate the read of all values, and (2) validate the parse, or conversion, of all required values before storing members in your struct and incrementing your array index.
Additionally, you will need to remove the trailing '\n' from both title and artist to prevent having embedded newlines dangling off the end of your strings (which will cause havoc with searching for either a title or artist). For instance, putting it all together, you could do something like:
void rmlf (char *s);
....
char title[MAX_TITLE] = "";
char artist[MAX_ARTIST] = "";
char a[10] = "";
int min, sec;
...
while (fgets (title, MAX_TITLE, file) && /* validate read of values */
fgets (artist, MAX_ARTIST, file) &&
fgets (a, 10, file)) {
if (sscanf (a, "%d %d", &min, &sec) != 2) { /* validate conversion */
fprintf (stderr, "error: failed to parse 'min' 'sec'.\n");
continue; /* skip line - tailor to your needs */
}
rmlf (title); /* remove trailing newline */
rmlf (artist);
s[i].time.min = min; /* copy to struct members & increment index */
s[i].time.sec = sec;
strncpy (s[i].title, title, MAX_TITLE);
strncpy (s[i++].artist, artist, MAX_ARTIST);
}
/** remove trailing newline from 's'. */
void rmlf (char *s)
{
if (!s || !*s) return;
for (; *s && *s != '\n'; s++) {}
*s = 0;
}
(note: this will also read all values until an EOF is encountered without using feof (see Related link: Why is “while ( !feof (file) )” always wrong?))
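As an aside, a common alternative to the hand-written loop in rmlf is the strcspn() idiom. This is a swapped-in technique, not the function used in this answer, and the name trim_newline is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: cut s at its first newline, if there is one. */
void trim_newline(char *s)
{
    /* strcspn() returns the length of the initial part of s that contains
     * no '\n'. With no newline present that is strlen(s), so the
     * assignment just rewrites the existing terminator. */
    s[strcspn(s, "\n")] = '\0';
}
```

It behaves the same as rmlf for a non-NULL string, including the empty string, but it does not tell you whether a newline was actually found.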
Protecting Against a Short-Read with fgets
Following on from Jonathan's comment, when using fgets you should really check to ensure you have actually read the entire line, and not experienced a short read where the maximum character value you supply is not sufficient to read the entire line (e.g. a short read because characters in that line remain unread).
If a short read occurs, that will completely destroy your ability to read any further lines from the file, unless you handle the failure correctly. This is because the next attempt to read will NOT start reading on the line you think it is reading and instead attempt to read the remaining characters of the line where the short read occurred.
You can validate a read by fgets by validating the last character read into your buffer is in fact a '\n' character. (if the line is longer than the max you specify, the last character before the nul-terminating character will be an ordinary character instead.) If a short read is encountered, you must then read and discard the remaining characters in the long line before continuing with your next read. (unless you are using a dynamically allocated buffer where you can simply realloc as required to read the remainder of the line, and your data structure)
Your situation complicates the validation by requiring data from 3 lines from the input file for each struct element. You must always maintain your 3-line read in sync reading all 3 lines as a group during each iteration of your read loop (even if a short read occurs). That means you must validate that all 3 lines were read and that no short read occurred in order to handle any one short read without exiting your input loop. (you can validate each individually if you just want to terminate input on any one short read, but that leads to a very inflexible input routine.)
You can tweak the rmlf function above to a function that validates each read by fgets in addition to removing the trailing newline from the input. I have done that below in a function called, surprisingly, shortread. The tweaks to the original function and read loop could be coded something like this:
int shortread (char *s, FILE *fp);
...
for (idx = 0; idx < MAX_SONGS;) {
int t, a, b;
t = a = b = 0;
/* validate fgets read of complete line */
if (!fgets (title, MAX_TITLE, fp)) break;
t = shortread (title, fp);
if (!fgets (artist, MAX_ARTIST, fp)) break;
a = shortread (artist, fp);
if (!fgets (buf, MAX_MINSEC, fp)) break;
b = shortread (buf, fp);
if (t || a || b) continue; /* if any shortread, skip */
if (sscanf (buf, "%d %d", &min, &sec) != 2) { /* validate conversion */
fprintf (stderr, "error: failed to parse 'min' 'sec'.\n");
continue; /* skip line - tailor to your needs */
}
s[idx].time.min = min; /* copy to struct members & increment index */
s[idx].time.sec = sec;
strncpy (s[idx].title, title, MAX_TITLE);
strncpy (s[idx].artist, artist, MAX_ARTIST);
idx++;
}
...
/** validate complete line read, remove trailing newline from 's'.
* returns 1 on shortread, 0 - valid read, -1 invalid/empty string.
* if shortread, read/discard remainder of long line.
*/
int shortread (char *s, FILE *fp)
{
if (!s || !*s) return -1;
for (; *s && *s != '\n'; s++) {}
if (*s != '\n') {
int c;
while ((c = fgetc (fp)) != '\n' && c != EOF) {}
return 1;
}
*s = 0;
return 0;
}
(note: in the example above, the result of the shortread check is saved for each of the lines that make up a title, artist, time group, so all three lines are consumed before the group is skipped.)
To validate the approach I put together a short example that will help put it all in context. Look over the example and let me know if you have any further questions.
#include <stdio.h>
#include <string.h>
/* constant definitions */
enum { MAX_MINSEC = 10, MAX_ARTIST = 32, MAX_TITLE = 48, MAX_SONGS = 64 };
typedef struct {
int min;
int sec;
} stime;
typedef struct {
char title[MAX_TITLE];
char artist[MAX_ARTIST];
stime time;
} songs;
int shortread (char *s, FILE *fp);
int main (int argc, char **argv) {
char title[MAX_TITLE] = "";
char artist[MAX_ARTIST] = "";
char buf[MAX_MINSEC] = "";
int i, idx, min, sec;
songs s[MAX_SONGS] = {{ .title = "", .artist = "" }};
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
for (idx = 0; idx < MAX_SONGS;) {
int t, a, b;
t = a = b = 0;
/* validate fgets read of complete line */
if (!fgets (title, MAX_TITLE, fp)) break;
t = shortread (title, fp);
if (!fgets (artist, MAX_ARTIST, fp)) break;
a = shortread (artist, fp);
if (!fgets (buf, MAX_MINSEC, fp)) break;
b = shortread (buf, fp);
if (t || a || b) continue; /* if any shortread, skip */
if (sscanf (buf, "%d %d", &min, &sec) != 2) { /* validate conversion */
fprintf (stderr, "error: failed to parse 'min' 'sec'.\n");
continue; /* skip line - tailor to your needs */
}
s[idx].time.min = min; /* copy to struct members & increment index */
s[idx].time.sec = sec;
strncpy (s[idx].title, title, MAX_TITLE);
strncpy (s[idx].artist, artist, MAX_ARTIST);
idx++;
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (i = 0; i < idx; i++)
printf (" %2d:%2d %-32s %s\n", s[i].time.min, s[i].time.sec,
s[i].artist, s[i].title);
return 0;
}
/** validate complete line read, remove trailing newline from 's'.
* returns 1 on shortread, 0 - valid read, -1 invalid/empty string.
* if shortread, read/discard remainder of long line.
*/
int shortread (char *s, FILE *fp)
{
if (!s || !*s) return -1;
for (; *s && *s != '\n'; s++) {}
if (*s != '\n') {
int c;
while ((c = fgetc (fp)) != '\n' && c != EOF) {}
return 1;
}
*s = 0;
return 0;
}
Example Input
$ cat ../dat/titleartist.txt
First Title I Like
First Artist I Like
3 40
Second Title That Is Way Way Too Long To Fit In MAX_TITLE Characters
Second Artist is Fine
12 43
Third Title is Fine
Third Artist is Way Way Too Long To Fit in MAX_ARTIST
3 23
Fourth Title is Good
Fourth Artist is Good
32274 558212 (too long for MAX_MINSEC)
Fifth Title is Good
Fifth Artist is Good
4 27
Example Use/Output
$ ./bin/titleartist <../dat/titleartist.txt
3:40 First Artist I Like First Title I Like
4:27 Fifth Artist is Good Fifth Title is Good

Instead of sscanf(), I would use strtok() and atoi().
Just curious, why only 10 bytes for the two integers? Are you sure they are always that small?
By the way, I apologize for such a short answer. I'm sure there is a way to get sscanf() to work for you, but in my experience sscanf() can be rather finicky, so I'm not a big fan. When parsing input in C, I have found it a lot more efficient (in terms of how long it takes to write and debug the code) to tokenize the input with strtok() and convert each piece individually with the conversion functions in stdlib.h (atoi, atof, atol, strtol, strtod, etc.). This keeps things simpler because each piece of input is handled individually, which makes debugging any problems (should they arise) much easier. In the end I typically spend a lot less time getting such code to work reliably than I did when I tried to use sscanf().
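To make the strtok() approach concrete, here is a minimal sketch, assuming the line holds two whitespace-separated integers such as "3 40". The helper names token_to_int and parse_min_sec are hypothetical, not from the question. It uses strtol() rather than atoi() because atoi() gives no way to detect a failed conversion.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert one token to int with strtol, rejecting
 * missing tokens, trailing junk, and out-of-range values.
 * Returns 1 on success, 0 on failure. */
static int token_to_int(const char *tok, int *out)
{
    char *end;
    long v;

    if (tok == NULL || *tok == '\0')
        return 0;
    errno = 0;
    v = strtol(tok, &end, 10);
    if (end == tok || *end != '\0')     /* no digits, or junk after them */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;
    *out = (int)v;
    return 1;
}

/* Hypothetical helper: split 'line' on whitespace (strtok modifies it in
 * place) and convert the first two tokens to minutes and seconds.
 * Returns 1 on success, 0 on failure. */
int parse_min_sec(char *line, int *min, int *sec)
{
    const char *delims = " \t\r\n";
    char *tok_min = strtok(line, delims);
    char *tok_sec = strtok(NULL, delims);

    return token_to_int(tok_min, min) && token_to_int(tok_sec, sec);
}
```

Call it on a writable buffer filled by fgets(), for example parse_min_sec(buf, &min, &sec), and skip the record when it returns 0.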

Use "%*s %*s %d %d" as your format string, instead...
You seem to be expecting sscanf to automagically skip the two tokens leading up to the decimal digit fields. It doesn't do that unless you explicitly tell it to (hence the pair of %*s).
You can't expect the people who designed C to have designed it the same way as you would. You NEED to check the return value, as iharob said.
That's not all. You NEED to read (and understand relatively well) the entire scanf manual (the one written by the Open Group is okay). That way you know how to use the function (including all of the subtle nuances of format strings) and what to do with the return value.
As a programmer, you need to read. Remember that well.
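As a rough illustration of the format string suggested in this answer, the sketch below wraps it in a hypothetical helper (parse_after_two_tokens is my own name) and checks the return value. It assumes each line looks like two words followed by two integers. The point it shows is that suppressed conversions (%*s) are not counted in the return value, so success means exactly 2.

```c
#include <stdio.h>

/* Hypothetical helper: skip two whitespace-delimited tokens, then read
 * two integers. Returns 1 only if both integers were converted. */
int parse_after_two_tokens(const char *line, int *a, int *b)
{
    /* %*s matches a token but assigns nothing, so it does not add to
     * the count that sscanf returns. */
    return sscanf(line, "%*s %*s %d %d", a, b) == 2;
}
```

A line with too few tokens, or with a non-numeric third token, makes the helper return 0, and the caller can then report or skip that line.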

C How to ignore empty lines in user input?

here is my current code:
int num = 0;
char c = '#';
scanf("%d",&num);
do{
for (int i=0;i<num;i++){
printf("%c",c);
}
printf("\n");
}
while (scanf("%d", &num) == 1);
How would I have it so that if the user doesn't enter anything, that the program won't spit out a newline?
Any help is appreciated, thank you!

This code should work for what you want to do :
#include <stdio.h>
int main()
{
int num = 0;
char c = '#';
char readLine[50];
while ((fgets(readLine, sizeof readLine, stdin) != NULL) && sscanf(readLine, "%d", &num) == 1)
{
for (int i=0;i<num;i++){
printf("%c",c);
}
printf("\n");
fflush(stdout);
}
return 0;
}
This code behaves as follows: fgets reads one line from the standard input stream (stdin) into the readLine array. sscanf then tries to parse a number from readLine into num. If a number is read, the program does what the code in your question does (writes the # character num times) and goes back to the top of the loop. If anything other than a number is read, including a blank line, the loop stops.
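If you would rather skip blank or non-numeric lines than stop at the first one, a variant along these lines might do. It is only a sketch: next_number is a hypothetical helper, and the 50-byte buffer is carried over from the code above, so longer lines would be read in several pieces.

```c
#include <stdio.h>

/* Hypothetical helper: read lines from 'in' until one begins with an
 * integer. Blank or non-numeric lines are skipped instead of ending
 * the loop. Returns 1 with *num set, or 0 at end of input. */
int next_number(FILE *in, int *num)
{
    char line[50];

    while (fgets(line, sizeof line, in) != NULL) {
        if (sscanf(line, "%d", num) == 1)
            return 1;
        /* nothing numeric on this line: read the next one */
    }
    return 0;
}
```

The drawing loop then becomes while (next_number(stdin, &num)) { ... }, and it ends only at end of input.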

In general, avoid scanf. It's very easy to leave yourself with unprocessed cruft on the input stream. Instead, read the whole line and then use sscanf (or something else) to process it. This guarantees that you won't get stuck with a partially read line, which is hard to debug.
I prefer getline to fgets to read lines. fgets requires you to guess how long the input might be, and input might get truncated. getline will allocate the memory to read the line for you avoiding buffer overflow or truncation problems.
NOTE: getline is not a C standard function but a POSIX one, and a fairly recent one (2008), though it was a GNU extension well before that. Some older systems may not have it.
#include <stdio.h>
#include <stdlib.h>
int main()
{
char c = '#';
char *line = NULL;
size_t linelen = 0;
/* First read the whole line */
while( getline(&line, &linelen, stdin) > 0 ) {
/* Then figure out what's in it */
long num = 0;
if( sscanf(line, "%ld", &num) > 0 ) {
for( int i = 0; i < num; i++ ) {
printf("%c", c);
}
printf("\n");
}
}
free(line);
return 0;
}
if( sscanf(line, "%ld", &num) > 0 ) { will ignore any line that does not match any part of the pattern, such as a blank line or a line full of words, by checking how many things matched. Yet it will still handle 0 as a valid input.
$ ./test
foo
bar
foo123
12
############
1
#
0
2
##
I also moved num inside the loop to guarantee it's reinitialized each iteration, and on the general principle of putting your variables in minimum scopes to avoid interference. And I upgraded it to a long int better able to handle the unpredictably large numbers users might type in.

Here is how I have done input parsing over the years using the fgets() and sscanf() functions. I don't write C++ much, and if I can keep code within old-style ANSI C, I do.
The fgets and sscanf functions are part of the standard C library (stdio.h), so they are available on any hosted C implementation.
For a character array used to read in anything, I generally set LINE_SIZE to 256 or 512 even if I know the line to be read is typically 80 characters or less. With any computer today having over 1GB of RAM, it is not worth worrying about allocating an extra 500 or so bytes. Obviously, if you have no idea how long the input line is then you either have to:
guess at what LINE_SIZE should be set to and not worry about it
or verify a newline character is present in line[] prior to a null character after calling fgets().
# include <stdio.h>
# define LINE_SIZE 256
int main ( int argc, char *argv[] )
{
FILE *fp;
char line[LINE_SIZE];
int nn;
int num;
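fp = fopen( "somefile", "r" );
if ( fp == NULL )
{
/* could not open the file, so there is nothing to read */
perror( "somefile" );
return 1;
}
if ( fgets( line, LINE_SIZE, fp ) == NULL )
{
/* empty file or read error */
fclose( fp );
return 1;
}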
/*
this way to read from standard input (i.e. the keyboard)
using fgets with stdin prevents compiler warning when using
deprecated gets function
fgets( line, LINE_SIZE, stdin );
*/
if ( line[0] != '\n' )
{
/* definitely not a blank line */
nn = sscanf( line, "%d", &num );
if ( nn == 1 )
{
/* some number placed into num variable that met the
%d conversion for the sscanf function
*/
}
}
fclose( fp );
return 0;
}
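The second option in the list above (verifying that a newline is present before the null character after calling fgets) could be sketched as follows. read_full_line is a hypothetical helper name, and discarding the rest of an overlong line is just one possible policy.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (size bytes) and strip the
 * newline. Returns 1 for a complete line, 0 if the line was truncated
 * (the rest of it is discarded), and -1 at end of input or on error. */
int read_full_line(char *buf, size_t size, FILE *fp)
{
    char *nl;
    int c;

    if (fgets(buf, (int)size, fp) == NULL)
        return -1;
    nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';             /* newline found: the whole line fit */
        return 1;
    }
    /* No newline in buf. Look at the next character to find out why. */
    c = fgetc(fp);
    if (c == EOF || c == '\n')
        return 1;               /* last line without newline, or exact fit */
    while ((c = fgetc(fp)) != '\n' && c != EOF) {
        /* discard the remainder of the overlong line */
    }
    return 0;
}
```

With this in place, a return value of 0 tells you that LINE_SIZE was too small for that particular line, so the guess can be checked instead of trusted.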

Jumping to next line with fscanf()

I have two .csv files and I need to read each file in full, but it has to be read field by field. CSV files hold data separated by commas, so I can't simply use fgets.
I need to read all the data but I don't know how to jump to the next line.
Here is what I've done so far:
int main()
{
FILE *arq_file;
arq_file = fopen("file.csv", "r");
if(arq_file == NULL){
printf("Not possible to read the file.");
exit(0);
}
while( !feof(arq_file) ){
fscanf(arq_file, "%i %lf", &myStruct[i+1].Field1, &myStruct[i+1].Field2);
}
fclose(arq_file);
return 0;
}
It gets stuck in an infinite loop because it never moves on to the next line.
How could I reach the line below the one I just read?
Update: File 01 Example
1,Alan,123,
2,Alan Harper,321
3,Jose Rendeks,32132
4,Maria da graça,822282
5,Charlie Harper,9999999999
File 02 Example
1,320,123
2,444,321
3,250,123,321
3,3,250,373,451
2,126,621
1,120,320
2,453,1230
3,12345,0432,1830

I think an example is better than hints. This one combines fgets() and strtok(). Other functions, such as strchr(), would also work, but this way is easier, and since I just wanted to point you in the right direction, I did it like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int
main(void)
{
FILE *file;
char buffer[256];
char *pointer;
size_t line;
file = fopen("data.dat", "r");
if (file == NULL)
{
perror("fopen()");
return -1;
}
line = 0;
while ((pointer = fgets(buffer, sizeof(buffer), file)) != NULL)
{
size_t field;
char *token;
field = 0;
while ((token = strtok(pointer, ",\n")) != NULL) /* '\n' keeps the newline out of the last field */
{
printf("line %zu, field %zu -> %s\n", line, field, token);
field += 1;
pointer = NULL;
}
line += 1;
}
fclose(file);
return 0;
}
I think it's very clear how the code works and I hope you can understand.

If the same code has to handle both data files, then you're stuck with reading the fields into a string, and subsequently converting the string into a number.
It is not clear from your description whether you need to do something special at the end of line or not — but because only one of the data lines ends with a comma, you do have to allow for fields to be separated by a comma or a newline.
Frankly, you'd probably do OK with using getchar() or equivalent; it is simple.
char buffer[4096];
char *bufend = buffer + sizeof(buffer) - 1;
char *curfld = buffer;
int c;
while ((c = getc(arq_file)) != EOF)
{
if (curfld == bufend)
…process overlong field…
else if (c == ',' || c == '\n')
{
*curfld = '\0';
process(buffer);
curfld = buffer;
}
else
*curfld++ = c;
}
if (c == EOF && curfld != buffer)
{
*curfld = '\0';
process(buffer);
}
However, if you want to go with higher level functions, then you do want to use fgets() to read lines (unless you need to worry about deviant line endings, such as DOS vs Unix vs old-style Mac (CR-only) line endings). Or use POSIX getline() to read arbitrarily long lines. Then split the lines using strtok_r() or equivalent.
char *buffer = 0;
size_t buflen = 0;
while (getline(&buffer, &buflen, arq_file) != -1)
{
char *posn = buffer;
char *epos;
char *token;
while ((token = strtok_r(posn, ",\n", &epos)) != 0)
{
process(token);
posn = 0;
}
/* Do anything special for end of line */
}
free(buffer);
If you think you must use scanf(), then you need to use something like:
char buffer[4096];
char c;
while (fscanf(arq_file, "%4095[^,\n]%c", buffer, &c) == 2)
process(buffer);
The %4095[^,\n] scan set reads up to 4095 characters that are neither comma nor newline into buffer, and then reads the next character (which must, therefore, either be comma or newline — or conceivably EOF, but that causes problems) into c. If the last character in the file is neither comma nor newline, then you will skip the last field.
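Two edge cases follow from that description: a final field with no delimiter after it makes fscanf() return 1 instead of 2, and an empty field (such as the one after the trailing comma in the first sample line) makes the scan set match nothing, so fscanf() returns 0 and the loop above stops. Here is a sketch of one way to cope with both, assuming an empty field should be reported as an empty string; next_field is a hypothetical helper, not part of the code above.

```c
#include <stdio.h>

/* Hypothetical helper: read the next comma- or newline-terminated field.
 * Returns 1 if a field (possibly empty) was stored in buf, 0 at end of
 * input. Fields longer than 4095 characters are split. */
int next_field(FILE *fp, char buf[4096])
{
    char c;
    int n;

    buf[0] = '\0';
    n = fscanf(fp, "%4095[^,\n]%c", buf, &c);
    if (n >= 1)
        return 1;       /* n == 1 means last field with no delimiter after it */
    if (n == 0) {
        /* The scan set matched nothing, so the next character is a
         * delimiter: consume it and report an empty field. */
        return getc(fp) != EOF;
    }
    return 0;           /* EOF before anything was read */
}
```

The loop then becomes while (next_field(arq_file, buffer)) process(buffer);, where process() is still the placeholder from the snippets above. Note that an empty line also shows up as an empty field with this approach.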
