Parsing a txt file in C

Hi guys, I have this part of my code:
void token(){
FILE *pointer;
user record;
pointer = fopen("utente_da_file.txt","r+");
printf("OK");
fscanf(pointer , "%s, %s, %s, %s, %s \n" , record.nome_utente , record.nome , record.cognome , record.data_di_nascita , record.data_di_iscrizione);
fclose(pointer);
printf("TEXT -> %s \n" , record.nome_utente);
}
This is utente_da_file.txt
cocco,ananas,banana,ciao,miao
This is my output:
TEXT -> cocco,ananas,banana,ciao,miao
I don't understand why.
Greetings :)

This is due to the nature of the %s conversion in the scanf family: it consumes all characters up to the first whitespace character it encounters, or up to the end of input, whichever comes first (see the scanf documentation; the linked page is C++ documentation, but it applies to C as well). As you do not have any whitespace in your file, the entire content is consumed at once, including the commas, before the format string gets a chance to match them.
You would have got a hint had you checked the return value of (f)scanf: it returns the number of variables filled, so you would have got 1 as the return value.
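A small sketch of that check, using sscanf on an in-memory copy of the line so that it runs without the file. The helper name count_fields_with_s and the buffer sizes are assumptions made for illustration; they are not part of the question's code.

```c
#include <stdio.h>

/* Hypothetical helper: applies a two-field version of the question's
   format to one line and returns how many fields sscanf filled. */
int count_fields_with_s(const char *line)
{
    char first[64] = "";
    char second[64] = "";

    /* %63s stops only at whitespace, so it swallows the commas too;
       the literal ',' in the format then has nothing left to match. */
    return sscanf(line, "%63s, %63s", first, second);
}
```

Called with the line from utente_da_file.txt, the helper reports that only one field was filled, which is the hint the return value gives.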
The problem with %s is that you cannot specify the delimiters at which the string should stop (that takes a scanset such as %[^,]). So if you stay with %s, you will have to add whitespace between the words in the file. Be aware that the commas will then be part of the strings if you only add whitespace after them; you would have to add whitespace before them as well, and put a space before each comma in the format string ("%s , %s"), so that the format can consume them. This might make your file ugly, though, so you might prefer dropping the commas entirely (but then drop them from the format string, too!).
Alternatively, you can read the entire line at once using fgets and then parse it using strtok. The whole procedure could look similar to the following piece of code:
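char buffer[256];
if(fgets(buffer, sizeof(buffer), pointer))
{
    char const* delimiters = ", \t\n\r";
    char* token = strtok(buffer, delimiters);
    if(token)
    {
        // strncpy does not null-terminate on truncation, so do it explicitly
        strncpy(record.nome_utente, token, sizeof(record.nome_utente) - 1);
        record.nome_utente[sizeof(record.nome_utente) - 1] = '\0';
        if((token = strtok(NULL, delimiters)))
        {
            strncpy(record.nome, token, sizeof(record.nome) - 1);
            record.nome[sizeof(record.nome) - 1] = '\0';
            // rest alike...
        }
    }
}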
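For all five fields, the same idea can be written as a loop. This is only a sketch: the question does not show the definition of user, so the struct layout and field sizes below are assumptions, and parse_user_line and copy_field are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Assumed layout: the question does not show the definition of `user`,
   so the field sizes here are made up for the example. */
typedef struct {
    char nome_utente[32];
    char nome[32];
    char cognome[32];
    char data_di_nascita[16];
    char data_di_iscrizione[16];
} user;

/* Copies a token into a fixed-size field, always null-terminating. */
static void copy_field(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

/* Splits one line in place and fills the record.
   Returns the number of fields found (5 means the line was complete). */
int parse_user_line(char *line, user *record)
{
    const char *delimiters = ", \t\r\n";
    char *fields[5] = { record->nome_utente, record->nome, record->cognome,
                        record->data_di_nascita, record->data_di_iscrizione };
    size_t sizes[5] = { sizeof record->nome_utente, sizeof record->nome,
                        sizeof record->cognome, sizeof record->data_di_nascita,
                        sizeof record->data_di_iscrizione };
    int count = 0;

    for (char *token = strtok(line, delimiters);
         token != NULL && count < 5;
         token = strtok(NULL, delimiters)) {
        copy_field(fields[count], sizes[count], token);
        count++;
    }
    return count;
}
```

The line has to be a writable buffer (for example the one filled by fgets), because strtok overwrites the delimiters. Note also that strtok treats a run of delimiters as one, so an empty field between two commas would be skipped rather than stored as an empty string.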

For me, the best solution is to write the C code this way (a space between each pair of %s):
fscanf(pointer , "%s %s %s %s %s \n" , record.nome_utente , record.nome , record.cognome , record.data_di_nascita , record.data_di_iscrizione);
and write your text file this way (a space between two fields):
cocco ananas banana ciao miao
This way I'm sure it works well.
Bye and good luck.

Related

How to scan strings from a text file with delimiter and then display it?

So I tried to write a program that scans strings from a text file and then displays them in a loop. But somehow my program does not work and displays weird symbols. I am new to text files and would appreciate it a lot if someone could explain what is wrong with my code.
My code :
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
FILE *fPtr;
fPtr = fopen("alumni.txt", "r");
if (fPtr == NULL) {
printf("There is a error opening the file.");
exit(-1);
}
char name[20], design[50], category[20], location[20];
while (fscanf(fPtr, "%s:%[^\n]:%[^\n]:%[^\n]", &name, &design, &category, &location) != EOF) {
printf("Name : %s\n", name);
printf("Designation : %s\n", design);
printf("Category : %s\n", category);
printf("Location : %s\n", location);
}
}
and this is my text file,
Shanie:Programmer:Full Time:Kuala Lumpur
Andy:Sales Agent:Part Time:Johor Bahru
Elaine:Database Administrator Full Time Melaka
Stephanie:MIS manager:Full Time:Penang
You have two problems: The first is that %s will read space delimited "words", it won't stop at the :. The second problem is that the format %[^\n] reads all until newline.
So you need a scanset format for the name as well, one that tells fscanf to read until the next :, which is done with the format %[^:].
So please change to:
while (fscanf(fPtr, " %19[^:]:%49[^:]:%19[^:]:%19[^\n]", name, design, category, location) == 4) {
...
}
Please note a couple of other changes I made to your call and loop condition: First of all, I have added length specifiers to the formats, so fscanf will not write out of bounds of your arrays.
Secondly, both the %s and %[] formats expect a char * argument, while you provided pointers to arrays (&name will be of type char (*)[20], not char *). Arrays naturally decay to pointers to their first element, so e.g. name will decay to &name[0], which will be of the correct type char *.
Thirdly I changed the comparison to compare against 4, which is what fscanf will return if it successfully parsed the input.
Lastly I added a space before the first format, to skip any leading space (like the newline from the previous line).
To be sure to be able to continue even in the case of malformed input, I recommend you read full lines instead (using e.g. fgets), and then possibly use sscanf to parse each line.
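A minimal sketch of that fgets plus sscanf approach follows. The struct alumni, the helper names parse_alumni_line and print_alumni, and the 256-byte line buffer are assumptions for illustration; the field sizes follow the arrays declared in the question.

```c
#include <stdio.h>

/* Field sizes follow the arrays declared in the question. */
struct alumni {
    char name[20];
    char design[50];
    char category[20];
    char location[20];
};

/* Parses one "name:designation:category:location" line.
   Returns 1 when all four fields were found, 0 otherwise. */
int parse_alumni_line(const char *line, struct alumni *out)
{
    int fields = sscanf(line, " %19[^:]:%49[^:]:%19[^:]:%19[^\n]",
                        out->name, out->design, out->category, out->location);
    return fields == 4;
}

/* Reads the stream line by line; a malformed line is reported and skipped
   instead of stopping the whole loop. Returns the number of good records. */
int print_alumni(FILE *in)
{
    char line[256];
    int good = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        struct alumni a;
        if (!parse_alumni_line(line, &a)) {
            fprintf(stderr, "skipping malformed line: %s", line);
            continue;
        }
        printf("Name : %s\nDesignation : %s\nCategory : %s\nLocation : %s\n",
               a.name, a.design, a.category, a.location);
        good++;
    }
    return good;
}
```

Because each line is parsed on its own, the third line of the sample file (which is missing its colons) is rejected without dragging the following line into the same record.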

Reading Strings/words and integers from input file

I want to write a little program to read lines from a given .csv/.txt file and print out specific details based on user input.
I'm currently working with a
FILE *input = fopen("Example.csv", "r");
and the input looks like this:
Test000, 40, 0, empty
Test001, 0, -41, empty
Now if I try to fscanf() from input, it only sets the first char[] and ignores the other variables.
My fscanf() call looks like this:
fscanf(input, "%s , %d , %d , %s", name, &timeA, &timeB, info);
# I'm calling fscanf(...) inside of while()-condition.
# while (fscanf(...) == 4) { *apply logic here* }
So, with this code, fscanf() only ever sets name to 'Test000,', then '40', '0', 'empty' etc., but ignores timeA, timeB, and info.
They are defined as:
char name[51];
int timeA = 0;
int timeB = 0;
char info[51];
I really don't know how to circumvent this problem. Any kind of help will be appreciated!
Thank you for your time.
A scanset could be used. %50[^,] will read up to 50 characters or to a comma.
fscanf(input, " %50[^,], %d , %d , %50s", name, &timeA, &timeB, info);
Note the space before %50[^,] to consume leading whitespace.
Check the return of fscanf. In this case 4 will be returned if all four items are successfully scanned.
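As a rough sketch, the same format can be tried on single lines with sscanf; the wrapper name parse_csv_line is made up for this example, and the buffers are assumed to be 51 bytes as in the question.

```c
#include <stdio.h>

/* Parses one "name, timeA, timeB, info" line with the scanset format.
   name and info must each have room for 51 bytes, as in the question.
   Returns the sscanf result: 4 means every field was converted. */
int parse_csv_line(const char *line, char *name, int *timeA, int *timeB, char *info)
{
    return sscanf(line, " %50[^,], %d , %d , %50s", name, timeA, timeB, info);
}
```

A return value below 4 tells the caller how far the conversion got, so a loop can stop or skip the line instead of using half-filled variables.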
fscanf() treats consecutive characters until it encounters white-space as part of a single string (char[]) - so the best option for you would be to remove the commas in your .txt file, and make your fscanf the following: fscanf(input, "%s %d %d %s", name, &timeA, &timeB, info); - your data should look like: Test000 40 0 empty. That's the most straightforward way of making it work.
If you want it to work with your current data format, fscanf() may not be the best option. You would be better off using some functions from <string.h>.
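char data[512];
if (fgets(data, sizeof (data), input) != NULL) {
    // the first strtok call takes the buffer, later calls take NULL to continue in it;
    // space and newline are delimiters too, so the tokens come out trimmed
    // (this assumes all four fields are present, strtok returns NULL otherwise)
    strcpy(name, strtok(data, ", \n"));
    timeA = (int) strtol(strtok(NULL, ", \n"), NULL, 10);
    timeB = (int) strtol(strtok(NULL, ", \n"), NULL, 10);
    strcpy(info, strtok(NULL, ", \n"));
}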
(strcpy and strtok are both available in <string.h>; strtol() is available in <stdlib.h>)
strcpy is used to copy "strings".
strtok splits a string (Note that it modifies the string it is passed!).
strtol converts a string to a long (which we cast to int).
There are alternative versions of some of these functions: strtok_r() is a reentrant version of strtok(), and atoi() returns an int directly (so you don't need to cast its return value), but unlike strtol() it cannot report conversion errors.
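A more defensive sketch of the same steps checks every strtok result before using it. The function name split_record and its size parameter are assumptions for this example, not part of the question.

```c
#include <stdlib.h>
#include <string.h>

/* Splits "name, timeA, timeB, info" in place with strtok.
   Returns 0 on success, -1 if a field is missing.
   Assumes name and info point to buffers of at least `size` bytes. */
int split_record(char *line, char *name, size_t size,
                 int *timeA, int *timeB, char *info)
{
    const char *delims = ", \r\n";   /* comma, space and line endings */
    char *tok;

    if ((tok = strtok(line, delims)) == NULL) return -1;
    strncpy(name, tok, size - 1);
    name[size - 1] = '\0';

    if ((tok = strtok(NULL, delims)) == NULL) return -1;
    *timeA = (int) strtol(tok, NULL, 10);

    if ((tok = strtok(NULL, delims)) == NULL) return -1;
    *timeB = (int) strtol(tok, NULL, 10);

    if ((tok = strtok(NULL, delims)) == NULL) return -1;
    strncpy(info, tok, size - 1);
    info[size - 1] = '\0';
    return 0;
}
```

Treating the space as a delimiter trims the fields, but it also means an info field that contains spaces would be cut at the first one; if that matters, use only "," and "\n" as delimiters and trim the leading space separately.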
If you are on a *nix system, it would be a good idea to run man function_name (e.g. man strtok) to get a better idea of the function prototype and what it does/how it behaves etc. Or you can always read the man pages online, for example the FreeBSD Online Manual Pages, where you can search for the function name and read the relevant man page.

How to properly fetch data from tab separated fields in text file

I am trying to learn how to import data from tab separated fields in a text file. Here is an example of what I am trying to fetch from an external file called users.in:
1 joshmith mypwd John Smith Awesome Road 103
2 jane_doe strongpwd Jane Doe Lucky Street 201
3 august84 goodpwd August May Red Boulevard 24
Here is the structure that is supposed to hold the data...
typedef struct User
{
int id;
char username[20];
char password[40];
char firstname[20];
char lastname[20];
char address[120];
} User;
... and of course the code that should handle the operation:
User *u = (User *)malloc(sizeof(User)*4);
int i = 0;
while (6 == fscanf(data_file, "%d\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\n", &(u+i)->id, (u+i)->username, (u+i)->password, (u+i)->firstname, (u+i)->lastname, (u+i)->address))
{
fprintf(stdout, "%d %s %s %s %s %s\n", (u+i)->id, (u+i)->username, (u+i)->password, (u+i)->firstname, (u+i)->lastname, (u+i)->address);
i++;
}
The loop manages to go through the first iteration... and then it stops. Here is the output:
1 joshmith mypwd John Smith Awesome Road 103
2
Can anyone help me figure out why this is happening? What is the proper way to import data formatted like this?
I would use fgets to read each line into a string and then use strtok with \t as a delimiter character to extract the tokens; the first token in each line can be converted to a number using atoi.
NOTE: using atoi() means that an invalid number will be returned as a zero value, so you cannot distinguish it from a genuine 0 without extra logic.
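One way to get that extra logic is strtol with an end pointer. The sketch below is an illustration only; parse_int is a hypothetical helper name.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Converts text to int. Returns 1 on success, 0 if the text is not a
   complete number or does not fit in an int. */
int parse_int(const char *text, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);

    if (end == text || *end != '\0')      /* no digits, or trailing junk */
        return 0;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;

    *out = (int) value;
    return 1;
}
```

With this helper a token "0" succeeds and stores 0, while a token such as "abc" is rejected, which atoi cannot tell apart.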
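The problem with your format string is that the last scanset you're using is %[^\t], while the line most likely ends with a \n (although of course it could possibly end with a \t). The last field therefore runs on across the newline and swallows the id of the next record, after which %d fails on the second iteration. If it is certain that the line ends with a \n, then simply changing that last scanset should suffice: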
"%d\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\n]\n"
// changed this ^ from t to n
If it may also be a \t, then you may use the following:
"%d\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\n\t]%*[\n\t]"
// %[^\n\t] reads and assigns whatever is found until a '\t' or '\n' is encountered
// %*[\n\t] reads '\n's and '\t's and discards them
// ... until something else is encountered
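A sketch of the corrected format applied to one line at a time with sscanf. The field widths are an addition derived from the array sizes in the question's struct, and parse_user_tsv is a hypothetical helper name.

```c
#include <stdio.h>

typedef struct User {
    int id;
    char username[20];
    char password[40];
    char firstname[20];
    char lastname[20];
    char address[120];
} User;

/* Parses one tab-separated line; returns 1 if all six fields were read.
   Each width is the array size minus one, leaving room for the '\0'. */
int parse_user_tsv(const char *line, User *u)
{
    int n = sscanf(line, "%d\t%19[^\t]\t%39[^\t]\t%19[^\t]\t%19[^\t]\t%119[^\t\n]",
                   &u->id, u->username, u->password,
                   u->firstname, u->lastname, u->address);
    return n == 6;
}
```

The last scanset excludes both the tab and the newline, so the address stops at the end of the line whichever character terminates it.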
As additional information, a space ' ' inside a format string matches zero or more whitespace characters of any kind and discards them. It is essentially like %*[ \t\n] (except that a scanset has to match at least one character), telling scanf to: match any (if any) ' ', '\t' and '\n' until you encounter something else, and discard them.
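The two forms differ in one detail: a scanset fails if it cannot match at least one character, whereas a space in the format is happy with none. A small sketch comparing them (the helper names are made up for this example):

```c
#include <stdio.h>

/* Each helper returns how many integers sscanf converted. */

/* A space in the format matches zero or more whitespace characters. */
int scan_with_space(const char *text, int *a, int *b)
{
    return sscanf(text, "%d ,%d", a, b);
}

/* A scanset must match at least one character, so this version
   fails when no whitespace precedes the comma. */
int scan_with_scanset(const char *text, int *a, int *b)
{
    return sscanf(text, "%d%*[ \t\n],%d", a, b);
}
```

For the input "1,2" the first helper converts both numbers, while the second stops after the first one because %*[ \t\n] finds nothing to match.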

Ignoring separating character using scanf

The problem:
I am attempting to use scanf to read a sentence with fields separated by |, so naturally I use scanf's built-in matching to skip this symbol, but it then also ignores everything that has a | in it.
The code, simplified:
int main(){
char* a=malloc(8);
char* b=malloc(8);
scanf("%s | %s",a,b);
printf("%s %s",a,b);
}
when i attempt the input:
TEST | ME
it works as intended, but when i have the following case:
TEST ME|
it naturally reads the test, but ignores the ME|, is there any way around this?
scanf("%7[^ \t|]%*[ \t|]%7[^ \t\n|]", a,b); // widths match the 8-byte buffers from the question
printf("%s %s",a,b);
Annotation:
%* : ignore this element.
E.g. %*s //skip the reading of the text of this one
%[character set(allow)] : Read only character set that you specify.
E.g. %[0123456789] or %[0-9] //Read as a string only numeric characters
%[^character set(denied)] : A ^ at the beginning of the character set inverts it: read only characters that are not in the set.
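The three notations can be tried out on strings with sscanf. This is a sketch with made-up helper names; the digits are listed explicitly instead of using a 0-9 range, since the meaning of a range inside a scanset is left to the implementation.

```c
#include <stdio.h>

/* Skips the first word, then reads the run of digits that follows.
   `digits` must have room for 16 bytes. Returns 1 if digits were found. */
int digits_after_first_word(const char *text, char *digits)
{
    return sscanf(text, "%*s %15[0123456789]", digits) == 1;
}

/* Reads everything up to (not including) the first '|'.
   `field` must have room for 32 bytes. Returns 1 on success. */
int field_before_bar(const char *text, char *field)
{
    return sscanf(text, "%31[^|]", field) == 1;
}
```

Note that the suppressed %*s conversion does not count towards the return value, which is why the first helper compares against 1.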
Yes, you can scan for a character set. The problem you're seeing is not related to the vertical bar, it's the fact that a string stops at the first whitespace character, i.e. the space between "TEST" and "ME|".
So, do something like:
if(scanf("%7[^|] | %7[^|]", a, b) == 2)
{
a[7] = b[7] = '\0';
printf("got '%s' and '%s'\n", a, b);
}
See the manual page for scanf() for details on the [ conversion specifier.
This one should work.
char a[200], b[200];
scanf ("%[^|]| %[^\n]", a, b); // Use it exactly
printf ("a = %s\nb = %s\n", a, b);
Meaning of the format string. I separate it into 3 parts and explain each.
"%[^|]" - Scan everything into the 1st string, until the bar character ('|') appears.
"| " - Read the '|' and ignore it. Read all whitespace characters and ignore them.
"%[^\n]" - Read the remainder of the line into the 2nd string.
Test case
first string is this | 2nd is this
a = first string is this
b = 2nd is this
no space|between bar
a = no space
b = between bar
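The same format can be wrapped in a function that also checks the result. In this sketch the widths, the leading space in the format, and the trimming of trailing spaces are additions; split_on_bar is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Splits "left | right" at the first '|'. Both buffers need 200 bytes.
   Trailing spaces of the left part are trimmed, since %[^|] keeps them.
   Returns 1 if both parts were read, 0 otherwise. */
int split_on_bar(const char *line, char *left, char *right)
{
    if (sscanf(line, " %199[^|]| %199[^\n]", left, right) != 2)
        return 0;

    size_t len = strlen(left);
    while (len > 0 && left[len - 1] == ' ')
        left[--len] = '\0';
    return 1;
}
```

A line without a bar, or with nothing after the bar, makes the function return 0 instead of leaving the second buffer unset.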
Leading spaces can be stripped by reading them into an extra local variable.
A %[ ] conversion in the scanf format stores the leading spaces.
The format "%[ ]%[^\n]", with the arguments first_string and second_string, reads two strings:
first_string receives the leading spaces of the input string,
second_string receives the actual data without the leading spaces.
Note that %[ ] has to match at least one space, so input without leading spaces makes scanf stop before second_string is filled.
Following is the sample code
#include <stdio.h>
#include <string.h>
int main()
{
    char lVar[30];
    char lPlaceHolder[30];
    printf("\n Enter any string with leading spaces : ");
    memset(lVar,'\0',30);
    memset(lPlaceHolder,'\0',30);
    /* widths keep scanf from writing past the 30-byte arrays */
    scanf("%29[ ]%29[^\n]",lPlaceHolder,lVar);
    printf("\n lPlaceHolder is :%s:\n",lPlaceHolder);
    printf("\n lVar is :%s:\n",lVar);
    return(0);
}
Input:
" hello world"
Output:
lPlaceHolder is : :
lVar is :hello world:
Note: the spaces in lPlaceHolder are not displayed properly after posting to the Stack Overflow website.
I'd say instead of messing with scanf(), try using saner functions - those that work as per the (intuitive) expectations:
#include <stdio.h>
#include <string.h>

char s1[] = "FOO | BAR";
char s2[] = "FOO BAR |";

void print_sep(char *in)
{
    char *endp;
    char *sep = strtok_r(in, "|", &endp);
    if (sep != NULL)
        printf("%s\n", sep);
    if ((sep = strtok_r(NULL, "|", &endp)) != NULL)
        printf("%s\n", sep);
}

int main(void)
{
    print_sep(s1);
    print_sep(s2);
    return 0;
}

Parsing .txt file in C code

I've got to parse a .txt file like this
autore: sempronio, caio; titolo: ; editore: ; luogo_pubblicazione: ; anno: 0; prestito: 0-1-1900; collocazione: ; descrizione_fisica: ; nota: ;
with fscanf in C code.
I tried with some formats in fscanf call, but none of them worked...
EDIT:
a = fscanf(fp, "autore: %s");
This is the first try I did; the patterns 'autore', 'titolo', 'editore', etc. must not be caught by fscanf().
Generally speaking, trying to parse input with fscanf is not a good idea, as it is difficult to recover gracefully if the input does not match expectations. It is generally better to read the input into an internal buffer (with fread or fgets), and parse it there (with sscanf, strtok, strtol etc.). Details on which functions are best depend on the definition of the input format (which you did not give us; example input is no replacement for a formal specification).
The following shows how to use strtok:
char* item;
char input[512]; // fill it with fgets
for (item = strtok(input, ";"); item != NULL; item = strtok(NULL, ";"))
{
// item loops through the following:
// "autore: sempronio, caio"
// " titolo: "
// " editore: "
// ...
}
The following shows how to use sscanf:
char tag[20];
int chars = -1;
if (sscanf(item, " %19[^:]: %n", tag, &chars) == 1 && chars >= 0)
{
printf("%s is %s\n", tag, item + chars);
}
Here, the format string consists of the following:
(space) - tells the parser to discard whitespace
19 - maximum number of bytes/chars in the tag
[^:] - tells the parser to read until it meets the colon character
: - tells the parser to discard the colon character
(whitespace) - as above
%n - tells the parser to report the number of bytes it read (check &chars)
If there was an unexpected input, the number of chars is not updated, so you have to set it to -1 before parsing each item.
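Putting the two pieces together, a sketch of a full line parser might look as follows. The struct field, the 64-byte value buffer, and the function name parse_fields are assumptions for illustration; the input format is taken from the single example line in the question, not from a specification.

```c
#include <stdio.h>
#include <string.h>

struct field {
    char tag[20];
    char value[64];
};

/* Splits `line` in place at ';' and parses each "tag: value" item.
   Returns the number of fields stored (at most `max`). */
int parse_fields(char *line, struct field *out, int max)
{
    int count = 0;

    for (char *item = strtok(line, ";");
         item != NULL && count < max;
         item = strtok(NULL, ";")) {
        int chars = -1;   /* reset for every item, %n may not be reached */

        if (sscanf(item, " %19[^:]: %n", out[count].tag, &chars) == 1 && chars >= 0) {
            /* the value is whatever follows the tag, colon and whitespace */
            strncpy(out[count].value, item + chars, sizeof out[count].value - 1);
            out[count].value[sizeof out[count].value - 1] = '\0';
            count++;
        }
    }
    return count;
}
```

Items that do not look like "tag: value", such as the lone newline left after the final semicolon, are simply skipped. Empty values come out as empty strings, so the caller can tell "titolo" was present but blank.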
