How to properly fetch data from tab separated fields in text file - c

I am trying to learn how to import data from tab-separated fields in a text file. Here is an example of what I am trying to fetch from an external file called users.in:
1 joshmith mypwd John Smith Awesome Road 103
2 jane_doe strongpwd Jane Doe Lucky Street 201
3 august84 goodpwd August May Red Boulevard 24
Here is the structure that is supposed to hold the data...
typedef struct User
{
int id;
char username[20];
char password[40];
char firstname[20];
char lastname[20];
char address[120];
} User;
... and of course the code that should handle the operation:
User *u = (User *)malloc(sizeof(User)*4);
int i = 0;
while (6 == fscanf(data_file, "%d\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\n", &(u+i)->id, (u+i)->username, (u+i)->password, (u+i)->firstname, (u+i)->lastname, (u+i)->address))
{
fprintf(stdout, "%d %s %s %s %s %s\n", (u+i)->id, (u+i)->username, (u+i)->password, (u+i)->firstname, (u+i)->lastname, (u+i)->address);
i++;
}
The loop manages to go through the first iteration... and then it stops. Here is the output:
1 joshmith mypwd John Smith Awesome Road 103
2
Can anyone help me figure out why this is happening? What is the proper way to import data formatted like this?

I would use fgets to read each line into a string and then use strtok with \t as the delimiter to extract the tokens; the first token in each line can be converted to a number using atoi.
NOTE: atoi() returns 0 for an invalid number, so you cannot distinguish an invalid number from a genuine 0 without extra logic (strtol() lets you check).
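A minimal sketch of the fgets/strtok/atoi approach described above, assuming the User struct from the question and exactly six tab-separated fields per line. parse_user_line and read_users are hypothetical helper names. Keep in mind that strtok treats consecutive tabs as a single delimiter, so an empty field would shift the remaining ones; snprintf is used so that over-long fields are truncated instead of overflowing the buffers.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The struct from the question. */
typedef struct User {
    int id;
    char username[20];
    char password[40];
    char firstname[20];
    char lastname[20];
    char address[120];
} User;

/* Split one line (modified in place by strtok) on tabs and line endings.
   Returns 1 if all six fields were found, 0 otherwise. */
int parse_user_line(char *line, User *u)
{
    const char *delims = "\t\r\n";
    char *tok = strtok(line, delims);
    if (tok == NULL)
        return 0;
    u->id = atoi(tok); /* a non-numeric id silently becomes 0 */

    char *fields[5] = { u->username, u->password, u->firstname,
                        u->lastname, u->address };
    size_t sizes[5] = { sizeof u->username, sizeof u->password,
                        sizeof u->firstname, sizeof u->lastname,
                        sizeof u->address };
    for (int k = 0; k < 5; k++) {
        tok = strtok(NULL, delims);
        if (tok == NULL)
            return 0;
        /* snprintf truncates over-long fields and always adds '\0' */
        snprintf(fields[k], sizes[k], "%s", tok);
    }
    return 1;
}

/* Read lines from f until EOF or until max users are stored;
   malformed lines are skipped. Returns how many users were parsed. */
int read_users(FILE *f, User *users, int max)
{
    char line[256];
    int n = 0;
    while (n < max && fgets(line, sizeof line, f) != NULL) {
        if (parse_user_line(line, &users[n]))
            n++;
    }
    return n;
}
```

With the question's variables, the loop becomes a single call such as read_users(data_file, u, 4).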

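The problem with your format string is that the last scanset you're using is %[^\t], while each line most likely ends with a \n (although, of course, it could end with a \t). Because %[^\t] stops only at a tab, it reads "Awesome Road 103\n2", swallowing the newline and the next line's id; that is why 2 is printed on its own line, and the next %d then fails on jane_doe. If it is certain that each line ends with a \n, then simply changing that last scanset should suffice: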
"%d\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\n]\n"
// changed the last scanset from %[^\t] to %[^\n]
If it may also be a \t, then you may use the following:
"%d\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\n\t]%*[\n\t]"
// %[^\n\t] assigns whatever is found until a '\t' or '\n' is encountered
// %*[\n\t] matches and discards only '\n's and '\t's
// ... until something else is encountered
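As additional information, a space ' ' inside a format string matches zero or more whitespace characters of any kind and discards them; the same is true of any whitespace character in the format, including the \t and \n used above. It is essentially like %*[ \t\n] (except that a scanset must match at least one character), telling the scanf family to match any ' ', '\t' and '\n' until something else is encountered, and discard them.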
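A sketch of the corrected fscanf loop, assuming the User struct from the question and that each line ends with \n. The field widths (19, 39, 119) are an addition, not part of the answer: they keep each %[...] inside its buffer, leaving room for the '\0'. read_users_fscanf is a hypothetical name.

```c
#include <stdio.h>

/* The struct from the question. */
typedef struct User {
    int id;
    char username[20];
    char password[40];
    char firstname[20];
    char lastname[20];
    char address[120];
} User;

/* Read up to max tab-separated users from f; returns how many were read.
   Each \t in the format matches any amount of whitespace. The last field
   uses %[^\n], so it stops at the end of the line instead of running
   into the next line's id. */
int read_users_fscanf(FILE *f, User *u, int max)
{
    int i = 0;
    while (i < max &&
           fscanf(f, "%d\t%19[^\t]\t%39[^\t]\t%19[^\t]\t%19[^\t]\t%119[^\n]\n",
                  &u[i].id, u[i].username, u[i].password,
                  u[i].firstname, u[i].lastname, u[i].address) == 6)
        i++;
    return i;
}
```

Bounding i by max also protects the malloc(sizeof(User)*4) buffer from the question against files with more lines.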

Related

`C`: String not being updated properly when reading from a CSV file

I am given a text file of movie showtime information. I have to format the information in a clean way. Right now I'm just trying to save each line's information into strings. However, when getting the movie's rating, the array won't save the rating properly.
This is the main code.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
const int MAX_TITLE_CHARS = 44; // Maximum length of movie titles
const int LINE_LIMIT = 100; // Maximum length of each line in the text file
char line[LINE_LIMIT];
char inputFileName[25];
FILE *file;
file = fopen("D:\\movies.txt", "r");
char currentLine[LINE_LIMIT];
char movieTitle[MAX_TITLE_CHARS];
char movieTime[10];
char movieRating[10];
fgets(currentLine, LINE_LIMIT, file); // Get first file
while(!feof(file)){
sscanf(currentLine, "%[^,],%44[^,],%s", movieTime, movieTitle, movieRating);
printf("%s\n", movieRating);
fgets(currentLine, LINE_LIMIT, file); // Get next file
}
return 0;
}
This is the CSV file:
16:40,Wonders of the World,G
20:00,Wonders of the World,G
19:00,Journey to Space ,PG-13
12:45,Buffalo Bill And The Indians or Sitting Bull's History Lesson,PG
15:00,Buffalo Bill And The Indians or Sitting Bull's History Lesson,PG
19:30,Buffalo Bill And The Indians or Sitting Bull's History Lesson,PG
10:00,Adventure of Lewis and Clark,PG-13
14:30,Adventure of Lewis and Clark,PG-13
19:00,Halloween,R
This prints out
G
G
PG-13
PG-13
PG-13
PG-13
PG-13
PG-13
R
I need it to be
G
G
PG-13
PG
PG
PG
PG-13
PG-13
R
I use Eclipse, and in the debugger I see that when it encounters the first PG-13, the rating doesn't update at all until the R. I'm thinking that since PG and PG-13 have the same first two characters, perhaps it gets confused? I'm not sure. Any help is appreciated.
You are converting the line using the following line:
sscanf(currentLine, "%[^,],%44[^,],%s", movieTime, movieTitle, movieRating);
The function will read a string into movieTime until a ',' appears in the input, then it will read another string until either a ',' appears or 44 characters are read. This behavior is explained in the manual for sscanf:
...
An optional decimal integer which specifies the maximum field width.
Reading of characters stops either when this maximum is reached or when
a nonmatching character is found, whichever happens first...
The lines with PG ratings have titles with 61 characters. Thus, sscanf does not read the entire title, does not find the comma, and stops without assigning movieRating. To fix this issue, you can either set MAX_TITLE_CHARS to a greater value or use the %m modifier (a POSIX extension, not ISO C; you must free() the result) to have sscanf dynamically allocate the string for you.
OP's code had undefined behavior (UB), as movieTitle[] was only big enough for 43 characters plus the terminating null character, and OP used "%44[^,]" rather than the correct width limit of 43.
const int MAX_TITLE_CHARS = 44; // Maximum length of movie
...
char movieTitle[MAX_TITLE_CHARS];
There are other problems as well that followed this UB:
Account for the '\n' of the line and a '\0' to form a string.
Never use while (!feof(...)) to control a read loop; test the return value of fgets() instead.
Test the results of sscanf().
Limit the printed title width with a precision.
const int LINE_LIMIT = 100; // Maximum length of each line in the text file
char currentLine[LINE_LIMIT + 2 /* room for \n and \0 */];
while (fgets(currentLine, sizeof currentLine, file)) {
// Either use a _width limit_ with `"%s"`, `"%[]"` or use a worst-case size.
char movieTime[10 + 1]; // 10 characters + \0
char movieTitle[sizeof currentLine];
char movieRating[sizeof currentLine];
// Examples:
// Use a 10 width limit for the 11 char movieTime
// Others: use a worst case size.
if (sscanf(currentLine, " %10[^,], %[^,], %[^\n]",
movieTime, movieTitle, movieRating) != 3) {
fprintf(stderr, "Failed to parse <%s>\n", currentLine);
break;
}
// Maximum length of movie titles _to print_
const int MAX_TITLE_CHARS = 44;
printf("Title: %-.*s\n", MAX_TITLE_CHARS, movieTitle);
printf("Rating: %s\n", movieRating);
}
Note that it is unclear whether "Maximum length of each line" includes the ending '\n'. In the C library, a line includes the '\n':
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a
terminating new-line character is implementation-defined. C17dr § 7.21.2 2
Your string 'Buffalo Bill...' is more than 44 characters. Thus, the sscanf call reads up to that limit, then looks for a ',', which isn't there 44 characters into the string, and returns.
Because your new movieRating isn't being set, it just prints the previous value.
Hint: if you are looking for a workaround, you can parse your string with something like strsep() (a BSD/glibc extension, not ISO C). You can also just increase the size of your movie title buffer.
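To see the diagnosis above in action, here is a small sketch that keeps the question's 44-character title limit (but with a 45-byte buffer, so there is no overflow) and simply reports how many fields sscanf assigned. count_fields is a hypothetical helper, and the 9-character limits for the time and rating are assumptions.

```c
#include <stdio.h>

/* Return the number of fields sscanf assigns for one "time,title,rating"
   line when the title is limited to 44 characters. */
int count_fields(const char *line)
{
    char showtime[10], title[45], rating[10];
    return sscanf(line, "%9[^,],%44[^,],%9s", showtime, title, rating);
}
```

For the Buffalo Bill lines this returns 2: the title stops at 44 characters, the literal ',' then fails to match, and rating is never written. Checking for 3 catches this instead of silently reusing the previous rating.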

C split specific File input into variables

I'm working on a phone book assignment.
I have to get data from a provided file that is in a specific format:
Name | Surname | Phone number
I'm using the code below:
while(!feof(file)){
int result = fscanf(file, "%19s | %39s | %d", p->name, p->last_name, &p->number);
if (result == 3){
p++;
counter++;
if (counter > size + 1){
break;
}
}
}
It works OK for simple cases like:
Jim | Carrey | 123456
but it breaks when the input is
Louis | Gossett Jr. | 502521950
Then this function stores the name Louis and the surname Gossett (without " Jr."), the phone number is left empty, and the result is 2, which makes the input invalid...
How can I fix this?
I tried to play a little with the [^...] format specifiers, but I can't really figure out whether that's the correct way of thinking or how they actually work. Whenever I added something, everything broke completely. I'll add that the requirement is not to use arrays or functions that allocate memory :(
fscanf with the %s format specifier reads a word and stops when it encounters whitespace (a newline, space or tab). That is why the surname is not read completely: you read Louis and Gossett, and then the literal | in the format doesn't match Jr., so fscanf stops before it ever reaches %d.
You could use %[^|]| to read the whole string (name or surname).
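Try fscanf(file, " %19[^|]| %39[^|]|%d", p->name, p->last_name, &p->number); (the leading spaces skip the newline left over from the previous line and the blank after each |).
Something like this after the fscanf call should remove a single trailing space (do the same for p->last_name), but there are probably other ways to do it: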
if (p->name[strlen(p->name)-1] == ' ')
p->name[strlen(p->name)-1] = '\0';
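Putting the scanset suggestion together, here is a sketch that reads one record at a time and trims all trailing whitespace from both names. The Person struct and the read_person/rtrim names are hypothetical (the question doesn't show its struct); the member sizes are inferred from the %19 and %39 widths, and nothing is allocated dynamically.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record type; sizes match the %19 and %39 widths. */
typedef struct {
    char name[20];
    char last_name[40];
    int number;
} Person;

/* Remove trailing whitespace from s in place. */
static void rtrim(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        s[--len] = '\0';
}

/* Read one "Name | Surname | Phone" record from f; returns 1 on success.
   The leading spaces in the format skip the previous line's '\n' and the
   blank after each '|'; %[^|] keeps inner spaces such as "Gossett Jr.". */
int read_person(FILE *f, Person *p)
{
    if (fscanf(f, " %19[^|]| %39[^|]|%d", p->name, p->last_name, &p->number) != 3)
        return 0;
    rtrim(p->name);
    rtrim(p->last_name);
    return 1;
}
```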

Parsing file txt C

Hi guys, I have this part of my code:
void token(){
FILE *pointer;
user record;
pointer = fopen("utente_da_file.txt","r+");
printf("OK");
fscanf(pointer , "%s, %s, %s, %s, %s \n" , record.nome_utente , record.nome , record.cognome , record.data_di_nascita , record.data_di_iscrizione);
fclose(pointer);
printf("TEXT -> %s \n" , record.nome_utente);
}
This is utente_da_file.txt
cocco,ananas,banana,ciao,miao
This is my output:
TEXT -> cocco,ananas,banana,ciao,miao
I don't understand why.
Greetings :)
This is due to the nature of the %s conversion in the scanf family: it consumes all characters up to the first whitespace character it encounters, or up to the end of input, whichever comes first (see the scanf reference; it is C++ documentation, but it applies to C alike). As there is no whitespace in your file, the entire content is consumed at once, including the commas, before you can scan for them in your format string...
You would have got a hint if you had checked the return value of (f)scanf: it returns the number of variables filled, so you would have got 1 as the return value.
The problem with %s is that you cannot specify the delimiters at which your strings should stop (a scanset such as %[^,] can do that). So if you stick with %s, you will have to put whitespace between the words in the file. Be aware that if you only add whitespace after the commas, the commas become part of the strings; you would have to add whitespace before them as well so that your format string can consume them. That might make your file ugly, though, so you might prefer to drop the commas entirely (but then drop them from the format string, too!).
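As a different technique from the %s approach, a scanset lets fscanf itself stop at the commas. A sketch, assuming hypothetical 32-byte fields because the question doesn't show the user struct; read_user is also a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical stand-in for the question's user struct (sizes assumed). */
typedef struct {
    char nome_utente[32];
    char nome[32];
    char cognome[32];
    char data_di_nascita[32];
    char data_di_iscrizione[32];
} user;

/* Read one comma-separated record from f; returns 1 if all five fields
   were read. Each %31[^,] stops at a comma (or after 31 characters), and
   the last field stops at the end of the line. The leading space skips
   the newline left over from a previous record. */
int read_user(FILE *f, user *r)
{
    int n = fscanf(f, " %31[^,],%31[^,],%31[^,],%31[^,],%31[^\n]",
                   r->nome_utente, r->nome, r->cognome,
                   r->data_di_nascita, r->data_di_iscrizione);
    return n == 5;
}
```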
Alternatively, you can read the entire line at once using fgets and then parse it using strtok. The whole procedure could look similar to the following piece of code:
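char buffer[256];
if (fgets(buffer, sizeof(buffer), pointer))
{
    char const* delimiters = ", \t\n\r";
    char* token = strtok(buffer, delimiters);
    if(token)
    {
        // strncpy does not add '\0' when token is too long, so terminate explicitly
        strncpy(record.nome_utente, token, sizeof(record.nome_utente) - 1);
        record.nome_utente[sizeof(record.nome_utente) - 1] = '\0';
        if((token = strtok(NULL, delimiters)))
        {
            strncpy(record.nome, token, sizeof(record.nome) - 1);
            record.nome[sizeof(record.nome) - 1] = '\0';
            // rest alike...
        }
    }
}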
For me, the best solution is to write the C code this way (a space between the %s conversions):
fscanf(pointer , "%s %s %s %s %s \n" , record.nome_utente , record.nome , record.cognome , record.data_di_nascita , record.data_di_iscrizione);
and to write your text file this way (a space between the fields):
cocco ananas banana ciao miao
This way, I'm sure it works well.
Bye, and good luck.

fscanf () read error

I have a text file from which I'm reading first and last names into a character array of size 20. I'm creating a struct array that holds the "people's" information.
The text file looks as follows:
John Robbins
Teresa Jones
my struct is defined as such:
struct people {
char name[20];
};
Declaration of Persons struct:
struct people *persons[2];
After declaring a new struct I read in the names with the following code:
for(i=0; i<2; i++)
{
fscanf(dp, "%[^\n]s", &persons[i].name[20]);
}
However, once I output the names to the console I receive the following:
hn Robbins
sa Jones
I've done extensive research and cannot find the solution to this problem. Has anyone experienced this problem?
fscanf(dp, "%[^\n]s", &persons[i].name[20]);
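This reads the line up to a newline and then attempts to match a literal s, which will fail. The line up to the newline is stored starting just past the end of the name array in your people struct (&persons[i].name[20]), which means it overwrites whatever follows, such as the next element of the persons array. Note also that persons is declared as an array of pointers, so persons[i].name would not even compile; declare an array of structs instead, as below.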
You want something like:
fscanf(dp, " %19[^\n]%*[^\n]", persons[i].name);
instead: the initial space skips leading whitespace (including any newline from a previous line). The %19[^\n] reads up to 19 characters or up to a newline and stores them, followed by a terminating null character ('\0'), so it will use up the entire 20-byte name array, but no more. The %*[^\n] will read any additional characters on the line up to (but not including) the newline and throw them away.
You also want to check the return value of the fscanf call to make sure it doesn't get an error or end of file:
#define MAX_PEOPLE 2
struct people persons[MAX_PEOPLE];
i = 0;
while (i < MAX_PEOPLE && fscanf(dp, " %19[^\n]%*[^\n]", persons[i].name) > 0)
i++;
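The same loop wrapped in a function, so it can be called with any open FILE *. read_people is a hypothetical name; struct people and MAX_PEOPLE come from the question and the answer above.

```c
#include <stdio.h>

#define MAX_PEOPLE 2

struct people {
    char name[20];
};

/* Read up to max names from dp, one per line. Lines longer than 19
   characters are truncated; the rest of such a line is discarded by
   %*[^\n]. Returns the number of names stored. */
int read_people(FILE *dp, struct people *persons, int max)
{
    int i = 0;
    while (i < max && fscanf(dp, " %19[^\n]%*[^\n]", persons[i].name) == 1)
        i++;
    return i;
}
```

fscanf returns 1 even when %*[^\n] finds nothing to discard, because suppressed conversions are not counted.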

Ignoring separating character using scanf

The problem:
I am attempting to use scanf to read a sentence with fields separated by |, so naturally I use scanf's features to ignore this symbol, but then it also ignores everything that has a | in it.
The code, simplified:
int main(){
char* a=malloc(8);
char* b=malloc(8);
scanf("%s | %s",a,b);
printf("%s %s",a,b);
}
When I attempt the input:
TEST | ME
it works as intended, but in the following case:
TEST ME|
it reads TEST but ignores ME|. Is there any way around this?
scanf("%7[^ \t|]%*[ \t|]%7[^ \t\n|]", a, b); // widths of 7 fit the 8-byte buffers
printf("%s %s",a,b);
Annotation:
%* : read this element but don't store it (it is not counted in the return value).
E.g. %*s // read a string and discard it
%[character set (allowed)] : read only characters from the set that you specify.
E.g. %[0123456789] or %[0-9] // read only digits as a string (ranges such as 0-9 are implementation-defined)
%[^character set (denied)] : when ^ is the first character of the set, read any characters other than those in the set.
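A small sketch of the three annotated forms working together: %*[^...] skips (without storing) everything up to the first digit, then %15[0123456789] stores up to 15 digits. The explicit digit list is used instead of 0-9 because ranges in a scanset are implementation-defined in ISO C. first_digit_run is a hypothetical name.

```c
#include <stdio.h>

/* Store the first run of digits in s into out (up to 15 digits + '\0').
   %*[^0123456789] consumes the non-digits before it without storing them;
   it must match at least one character, so s must not start with a digit.
   Returns 1 on success, 0 otherwise. */
int first_digit_run(const char *s, char out[16])
{
    return sscanf(s, "%*[^0123456789]%15[0123456789]", out) == 1;
}
```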
Yes, you can scan for a character set. The problem you're seeing is not related to the vertical bar, it's the fact that a string stops at the first whitespace character, i.e. the space between "TEST" and "ME|".
So, do something like:
if(scanf("%7[^|] | %7[^|]", a, b) == 2)
{
a[7] = b[7] = '\0';
printf("got '%s' and '%s'\n", a, b);
}
See the manual page for scanf() for details on the [ conversion specifier.
This one should work.
char a[200], b[200];
scanf ("%[^|]| %[^\n]", a, b); // Use it exactly
printf ("a = %s\nb = %s\n", a, b);
Here is what this format means; I split the format string into 3 parts:
"%[^|]" - Scan everything into the 1st string until the bar character ('|') appears.
"| " - Read the '|' and ignore it, then read all whitespace characters and ignore them.
"%[^\n]" - Read the remainder of the line into the 2nd string.
Test case
first string is this | 2nd is this
a = first string is this
b = 2nd is this
no space|between bar
a = no space
b = between bar
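The same format wrapped in a function with width limits, so it can't overflow the 200-byte buffers; sscanf on a string stands in for scanf on stdin, and split_bar is a hypothetical name. Note that the first string keeps any space that precedes the '|': in the first test case, a is actually "first string is this " with a trailing space. Only the second string has its leading spaces skipped.

```c
#include <stdio.h>

/* Split s at the first '|' into a and b (each at least 200 bytes).
   Returns the number of strings assigned. */
int split_bar(const char *s, char a[200], char b[200])
{
    return sscanf(s, "%199[^|]| %199[^\n]", a, b);
}
```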
Leading spaces can be stripped by using an extra local variable to store them.
Use %[ ] in the scanf format to capture the leading spaces.
The format "%[ ]%[^\n]" with the arguments first_string, second_string reads two strings:
first_string contains the leading spaces from the input string;
second_string contains the actual data without the leading spaces.
Here is the sample code:
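#include <stdio.h>
#include <string.h>

int main()
{
    char lVar[30];
    char lPlaceHolder[30];
    printf("\n Enter any string with leading spaces : ");
    memset(lVar, '\0', 30);
    memset(lPlaceHolder, '\0', 30);
    // Widths of 29 leave room for the '\0'. %[ ] must match at least one space,
    // so input without leading spaces makes scanf stop before filling lVar.
    scanf("%29[ ]%29[^\n]", lPlaceHolder, lVar);
    printf("\n lPlaceHolder is :%s:\n", lPlaceHolder);
    printf("\n lVar is :%s:\n", lVar);
    return(0);
}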
Input:
" hello world"
Output:
lPlaceHolder is : :
lVar is :hello world:
Note: the leading spaces stored in lPlaceHolder may not display properly in the output above.
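A sketch of the same format as a function operating on a string, to show one caveat: %[ ] must match at least one space, so for input with no leading spaces sscanf stops at the first directive and returns 0, leaving both strings empty. split_leading_spaces is a hypothetical name; the 29-character widths match the 30-byte buffers.

```c
#include <stdio.h>

/* Split s into its leading spaces and the rest of the line.
   Both buffers must hold 30 bytes; they are emptied first so that
   nothing stale remains if sscanf stops early. Returns sscanf's result. */
int split_leading_spaces(const char *s, char spaces[30], char rest[30])
{
    spaces[0] = '\0';
    rest[0] = '\0';
    return sscanf(s, "%29[ ]%29[^\n]", spaces, rest);
}
```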
I'd say instead of messing with scanf(), try using saner functions - those that work as per the (intuitive) expectations:
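#define _POSIX_C_SOURCE 200809L // strtok_r is POSIX, not ISO C
#include <stdio.h>
#include <string.h>
char s1[] = "FOO | BAR";
char s2[] = "FOO BAR |";
// Print the '|'-separated fields of in (strtok_r modifies in, so it must be writable).
void print_sep(char *in)
{
    char *endp;
    char *sep = strtok_r(in, "|", &endp);
    if (sep == NULL)
        return;
    printf("%s\n", sep);
    if ((sep = strtok_r(NULL, "|", &endp)) != NULL)
        printf("%s\n", sep);
}
int main(void)
{
    print_sep(s1);
    print_sep(s2);
    return 0;
}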
