C split specific File input into variables


I'm working on a phone book assignment.
I have to read data from a provided file that is given in a specific format:
Name | Surname | Phone number
I'm using the code below:
while(!feof(file)){
int result = fscanf(file, "%19s | %39s | %d", p->name, p->last_name, &p->number);
if (result == 3){
p++;
counter++;
if (counter > size + 1){
break;
}
}
}
It works OK for simple cases like:
Jim | Carrey | 123456
but it breaks when the input is
Louis | Gossett Jr. | 502521950
In that case the function stores the name Louis and the surname Gossett (without "Jr."), the phone number is left empty, and fscanf returns 2, which makes the input invalid...
How can I fix this?
I tried to play a little with the [^...] format specifier, but I can't figure out whether that's the right way of thinking or how it actually works. Whenever I added something, everything broke completely. I should add that the requirement is not to use arrays or functions that allocate memory :(

fscanf with the %s format specifier reads a single word and stops at the first whitespace character (newline, space or tab). That is why the surname is not read completely: %s reads Louis and then Gossett, after which the literal | in your format fails to match Jr., so %d is never reached.
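You could use a scanset such as %[^|] to read everything up to the next | (the whole name or surname, spaces included).
Try fscanf(file, " %19[^|]| %39[^|]|%d", p->name, p->last_name, &p->number);
The space before each %[ matters: unlike %s and %d, %[ does not skip leading whitespace, so without it the newline left over from the previous line (and the space after each |) would end up in the field.
%[^|] does keep the spaces in front of each |, so something like this after the fscanf call clears the trailing spaces (do the same for p->last_name); there are probably other ways to do it.
size_t len = strlen(p->name); // strlen needs <string.h>
while (len > 0 && p->name[len - 1] == ' ')
p->name[--len] = '\0';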
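Putting both suggestions together, here is a minimal sketch of a reader for one line. It assumes a hypothetical `struct entry` with `name[20]`, `last_name[40]` and an `int number` (sizes chosen to match the `%19`/`%39` widths; the question does not show its struct) and a hypothetical function name `read_entry`. It returns 1 only when all three fields were converted, so a caller can loop on its return value instead of on `feof()`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record layout; sizes match the %19 / %39 widths below. */
struct entry {
    char name[20];
    char last_name[40];
    int number;
};

/* Removes trailing spaces in place. */
static void trim_right(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && s[len - 1] == ' ')
        s[--len] = '\0';
}

/* Reads one "Name | Surname | Phone" line.
   Returns 1 on success, 0 on a malformed line or end of file. */
int read_entry(FILE *f, struct entry *e)
{
    /* " %19[^|]" skips leading whitespace (including the previous '\n'),
       then reads up to 19 characters that are not '|'. */
    if (fscanf(f, " %19[^|]| %39[^|]|%d", e->name, e->last_name, &e->number) != 3)
        return 0;
    trim_right(e->name);
    trim_right(e->last_name);
    return 1;
}
```

For example, `while (counter < size && read_entry(file, p)) { p++; counter++; }` (with `p` pointing into the caller's records) stops at the first malformed line, at end of file, or when the buffer is full.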

Related

count the total no. of keywords in the file

I want to count the total number of keywords in the file, but the code only counts the keywords that are used to declare variables.
void main()
{
//2d array used to store the keywords but few of them are used.
char key[32][12]={"int","char","while","for","if","else"};
//cnt is used to count the occurrence of the keyword in the file.
int cnt=0,i;
//used to store the string that is read line by line.
char ch[100];
FILE *fp=fopen("key.c","r");
//to check whether file exists or not
if(fp=='\0')
{
printf("file not found..\n");
exit(0);
}
//to extract the word till it don't reach the end of file
while((fscanf(fp,"%s",ch))!=EOF)
{
//compare the keyword with the word present in the file.
for(i=0;i<32;i++)
{
// compare the keyword with the string in ch.
if(strcmp(key[i],ch)==0) {
//just to check which keyword is printed.
printf("\nkeyword is : %s",ch);
cnt++;
}
}
}
printf("\n Total no. of keywords are : %d", cnt);
fclose(fp);
}
Expected output should be:
Total no. of keywords are : 7
Actual output is coming :
Total no. of keywords are : 3

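fscanf(fp,"%s",ch) matches a sequence of non-whitespace characters (see the cppreference page for fscanf), so in your case for, while and if won't be matched as separate words: in code like for(i=0;i<32;i++) there is no space after the keyword, so the whole run of characters is read as one word and the strcmp fails.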

In my opinion, although it diverts a little from your intention, you had better use flex(1) for this purpose, as it will scan the file more efficiently than comparing each sequence with the set of words you have. A line-based approach would require more processing, as several keywords can be on the same line, and it only tells you which lines have keywords on them.
Using flex(1) will also give you more efficient C source code. A sample input for flex(1) would be:
%{
unsigned long count = 0;
%}
%%
int |
char |
unsigned |
signed |
static |
auto |
do |
while |
if |
else |
/* ... add more keywords as you want here */
return |
break |
continue |
volatile { printf("keyword is = %s\n", yytext);
count++;
}
\n |
. ;
%%
int yywrap()
{
return 1;
}
int main()
{
yylex();
printf("count = %lu\n", count);
}
The efficiency comes from the fact that flex(1) uses an algorithm that finds the right match while scanning the source file only once (one decision per character, with all the patterns tested in parallel).

The problem in your code comes from the fact that the %s format has its own notion of what a word is, different from the one defined by the C language: for scanf(), a word is anything surrounded by whitespace (spaces, \n or \t), so it will match something like while(a==b) as a single word if you don't put spaces around your keywords. Also, if you compare each input word with each of the words in your set, the work per input word grows with N = nw * awl, where nw is the number of words in your set and awl is their average length; flex makes one decision per input character instead.

By the way, keywords should not be recognised inside comments or string literals. It is easy to adapt the code above to reject those and scan correctly. For example, the following flex file does this:
%{
unsigned long count = 0;
%}
%x COMM1
%x COMM2
%x STRLIT
%x CHRLIT
%%
int |
char |
unsigned |
signed |
static |
auto |
do |
while |
if |
else |
/* ... */
return |
break |
continue |
volatile { printf("kw is %s\n", yytext);
count++;
}
[a-zA-Z_][a-zA-Z0-9_]* |
^[\ \t]*#.* |
"/*"([^*]|\*+[^*/])*\*+"/" |
"//".* |
\"(\\.|[^"\\\n])*\" |
\'(\\.|[^'\\\n])*\' |
. |
\n ;
%%
int yywrap()
{
return 1;
}
int main()
{
yylex();
printf("count = %lu\n", count);
}
It allows different regular expressions to be recognised as language tokens, so provision is also made to match C language constructs in which keywords must not be counted: identifiers ([a-zA-Z_][a-zA-Z0-9_]*), preprocessor directives (^[\ \t]*#.*), old-style C comments ("/*"([^*]|\*+[^*/])*\*+"/", which also handles comments ending in **/), C++-style comments ("//".*), string literals (\"(\\.|[^"\\\n])*\", where a backslash always escapes the next character) and character literals (\'(\\.|[^'\\\n])*\').
flex(1) is worth learning, as it greatly simplifies reading structured data into a program. I suggest you study it.
note
You had better write if (fp == NULL), or even if (!fp). You are not doing anything incorrect in your statement if (fp == '\0'): '\0' is an integer constant with value 0, so it works as a null pointer constant. But as '\0' is the character representation of the null character, comparing a pointer value with a character literal is strange and imprecise, and suggests you are interpreting the FILE * not as a pointer but as an integer (or char) value. But I repeat, it's perfectly legal in C.
note 2
The flex sample code posted above doesn't consider the possibility of running out of buffer space due to very long input tokens (like multi-line comments overflowing the internal buffer space). This is done on purpose, to simplify the description and the code. Of course, a professional scanner must account for all of these.

Parsing file txt C

Hi guys, I have this part of my code:
void token(){
FILE *pointer;
user record;
pointer = fopen("utente_da_file.txt","r+");
printf("OK");
fscanf(pointer , "%s, %s, %s, %s, %s \n" , record.nome_utente , record.nome , record.cognome , record.data_di_nascita , record.data_di_iscrizione);
fclose(pointer);
printf("TEXT -> %s \n" , record.nome_utente);
}
This is utente_da_file.txt
cocco,ananas,banana,ciao,miao
This is my output:
TEXT -> cocco,ananas,banana,ciao,miao
I don't understand why.
Greetings :)

This is due to the nature of the %s conversion in the scanf family: it consumes all characters up to the first whitespace character it encounters, or up to the end of input, whichever comes first (the scanf reference I linked is C++ documentation, but it applies to C alike). As you do not have any whitespace in your file, the entire content is consumed at once, including the commas, before your format string gets a chance to match them...
You would have got a hint if you had checked the return value of (f)scanf: it returns the number of variables filled, so you would have seen 1 as the return value.
The problem with %s is that you cannot specify the delimiters at which the string should stop (a scanset such as %[^,] can do that). If you stick with %s, you will have to put whitespace between the words in your file. But be aware that a comma directly after a word then becomes part of that string; to keep the commas out, you would have to put whitespace before them as well, so that the literal commas in your format string can consume them. This might make your file ugly, so you might prefer dropping the commas entirely (but then drop them from the format string, too!).
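For comparison, here is a minimal sketch of the scanset approach for this file. It assumes a hypothetical `user` layout with five 32-byte `char` arrays (the question does not show the real sizes, so the `%31` widths would need to match them) and a hypothetical `read_user` function:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the real field sizes are not shown in the question. */
typedef struct {
    char nome_utente[32];
    char nome[32];
    char cognome[32];
    char data_di_nascita[32];
    char data_di_iscrizione[32];
} user;

/* Reads one comma-separated record; returns 1 if all five fields were filled. */
int read_user(FILE *f, user *r)
{
    /* %31[^,] stops at the next comma; the last field stops at the newline.
       The leading space skips the newline left by the previous record. */
    int n = fscanf(f, " %31[^,],%31[^,],%31[^,],%31[^,],%31[^\n]",
                   r->nome_utente, r->nome, r->cognome,
                   r->data_di_nascita, r->data_di_iscrizione);
    if (n != 5)
        return 0;
    /* Drop a '\r' left by Windows-style line endings, if present. */
    r->data_di_iscrizione[strcspn(r->data_di_iscrizione, "\r")] = '\0';
    return 1;
}
```

`read_user(pointer, &record)` returns 1 only when all five fields were filled, which is exactly the return-value check mentioned above.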
Alternatively, you can read the entire line at once using fgets and then parse it using strtok. The whole procedure could look similar to the following piece of code:
char buffer[256];
if (fgets(buffer, sizeof(buffer), pointer) != NULL)
{
char const* delimiters = ", \t\n\r";
char* token = strtok(buffer, delimiters);
if(token)
{
// snprintf always null-terminates; strncpy does not when the token fills the field
snprintf(record.nome_utente, sizeof(record.nome_utente), "%s", token);
if((token = strtok(NULL, delimiters)))
{
snprintf(record.nome, sizeof(record.nome), "%s", token);
// rest alike...
}
}
}
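The "rest alike..." part can also be written as a loop over the destination fields. A sketch assuming the five field names from the question, a hypothetical `FIELD_LEN` of 32 for every field and a hypothetical `parse_user` function; it takes a line already read with `fgets`:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FIELD_LEN 32   /* hypothetical: the real sizes are not shown */

typedef struct {
    char nome_utente[FIELD_LEN];
    char nome[FIELD_LEN];
    char cognome[FIELD_LEN];
    char data_di_nascita[FIELD_LEN];
    char data_di_iscrizione[FIELD_LEN];
} user;

/* Splits line on commas and whitespace and copies the five tokens into *r.
   Returns 1 on success, 0 if the line has fewer than five fields.
   strtok modifies line and treats a run of delimiters as one separator,
   so a value containing a space would be split in two. */
int parse_user(char *line, user *r)
{
    char *fields[5] = { r->nome_utente, r->nome, r->cognome,
                        r->data_di_nascita, r->data_di_iscrizione };
    const char *delimiters = ", \t\n\r";
    char *token = strtok(line, delimiters);

    for (int i = 0; i < 5; i++) {
        if (token == NULL)
            return 0;
        /* snprintf always null-terminates, unlike strncpy */
        snprintf(fields[i], FIELD_LEN, "%s", token);
        token = strtok(NULL, delimiters);
    }
    return 1;
}
```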

For me, the best solution is to write the C code this way (a space between each %s):
fscanf(pointer , "%s %s %s %s %s \n" , record.nome_utente , record.nome , record.cognome , record.data_di_nascita , record.data_di_iscrizione);
and write your text file this way (a space between two fields):
cocco ananas banana ciao miao
This way I'm sure it works well.
Bye, and good luck.

How to properly fetch data from tab separated fields in text file

I am trying to learn how to import data from tab-separated fields in a text file. Here is an example of what I am trying to read from an external file called users.in:
1 joshmith mypwd John Smith Awesome Road 103
2 jane_doe strongpwd Jane Doe Lucky Street 201
3 august84 goodpwd August May Red Boulevard 24
Here is the structure that is supposed to hold the data...
typedef struct User
{
int id;
char username[20];
char password[40];
char firstname[20];
char lastname[20];
char address[120];
} User;
... and of course the code that should handle the operation:
User *u = (User *)malloc(sizeof(User)*4);
int i = 0;
while (6 == fscanf(data_file, "%d\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\n", &(u+i)->id, (u+i)->username, (u+i)->password, (u+i)->firstname, (u+i)->lastname, (u+i)->address))
{
fprintf(stdout, "%d %s %s %s %s %s\n", (u+i)->id, (u+i)->username, (u+i)->password, (u+i)->firstname, (u+i)->lastname, (u+i)->address);
i++;
}
The loop gets through the first iteration... and then it stops. Here is the output:
1 joshmith mypwd John Smith Awesome Road 103
2
Can anyone help me figure out why this is happening? What is the proper way to import such formatted data?

I would use fgets to read each line into a string and then use strtok with \t as the delimiter character to extract the tokens; the first token in each line can be converted to a number using atoi.
NOTE: atoi() returns 0 for input that is not a number, so you cannot distinguish an invalid number from a real 0 without extra logic (strtol() reports where parsing stopped, which makes that check possible).
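A sketch of that approach using the question's `User` struct and a hypothetical `parse_user_line` function. It uses `strtol` instead of `atoi` so that a non-numeric id is rejected rather than read as 0:

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct User
{
    int id;
    char username[20];
    char password[40];
    char firstname[20];
    char lastname[20];
    char address[120];
} User;

/* Parses one tab-separated line (as read by fgets) into *u.
   Returns 1 on success, 0 on a malformed line.
   strtok modifies line and treats consecutive tabs as one separator,
   so empty fields are not supported. */
int parse_user_line(char *line, User *u)
{
    const char *delims = "\t\r\n";
    char *tok = strtok(line, delims);
    char *end;
    long id;

    if (tok == NULL)
        return 0;
    errno = 0;
    id = strtol(tok, &end, 10);
    /* reject "abc", "12x" and values that do not fit in an int */
    if (errno != 0 || end == tok || *end != '\0' || id < INT_MIN || id > INT_MAX)
        return 0;
    u->id = (int)id;

    char *dest[5] = { u->username, u->password, u->firstname,
                      u->lastname, u->address };
    size_t size[5] = { sizeof u->username, sizeof u->password,
                       sizeof u->firstname, sizeof u->lastname,
                       sizeof u->address };
    for (int i = 0; i < 5; i++) {
        tok = strtok(NULL, delims);
        if (tok == NULL)
            return 0;
        snprintf(dest[i], size[i], "%s", tok);   /* truncates, always terminates */
    }
    return 1;
}
```

Each line would be read with `fgets(line, sizeof line, data_file)` and passed to `parse_user_line`.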

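The problem with your format string is that the last scanset you're using is %[^\t], while each line most likely ends with a \n (although of course it could possibly end with a \t). With %[^\t], the address keeps being read past the end of the line, so it swallows the newline and the 2 from the next line: that is the 2 in your output, and the following %d then fails on jane_doe, which ends the loop. If it is certain that each line ends with a \n, then simply changing that last scanset should suffice: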
"%d\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\n]\n"
// changed this ^ from t to n
If it may also be a \t, then you may use the following:
"%d\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\n\t]%*[\n\t]"
// %[^\n\t] reads and assigns everything up to a '\t' or '\n'
// %*[\n\t] matches and discards any run of '\n' and '\t' characters,
// ... until something else is encountered
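As additional information: a space ' ' inside a format string matches zero or more whitespace characters of any kind and discards them. It is essentially like %*[ \t\n], telling scanf to match any ' ', '\t' and '\n' (if there are any) until it encounters something else, and to discard them; the difference is that %*[ \t\n] fails when there is no whitespace at all. The same applies to every whitespace character in a format string, so the \t and \n in the formats above also match any amount of whitespace, not just a single tab or newline.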
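For completeness, a sketch of the corrected `fscanf` loop, using the question's `User` struct and a hypothetical `read_users` helper. It adds field widths one less than each array size so a long field cannot overflow its buffer, stops after `max` records, and ends the format at the last scanset, since the `%d` at the start of the next call skips the leftover newline anyway:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct User
{
    int id;
    char username[20];
    char password[40];
    char firstname[20];
    char lastname[20];
    char address[120];
} User;

/* Reads up to max tab-separated records from f into u.
   Returns the number of records read. */
int read_users(FILE *f, User *u, int max)
{
    int i = 0;
    /* The last field is read up to the newline, so spaces in the address are kept. */
    while (i < max &&
           fscanf(f, "%d\t%19[^\t]\t%39[^\t]\t%19[^\t]\t%19[^\t]\t%119[^\n]",
                  &u[i].id, u[i].username, u[i].password,
                  u[i].firstname, u[i].lastname, u[i].address) == 6)
        i++;
    return i;
}
```

With the question's `malloc(sizeof(User)*4)`, the call would be `read_users(data_file, u, 4)`.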

creating a string by mixing char[]s and ints in C

I am new to C and I am trying to figure out how to create separate strings (char[]) made up of a mix of char[]s and ints that line up in length.
For example, if I have a char[] first name, a char[] last name and an int age, I need them all on one line, each field the same length, for example:
Joe |smith |45
Amy |Footh |2
with each line being its own char[] and lining up.
This is the code I have so far -
while(temp != NULL)
{
char listLine[IDLEN + FOODNAMELEN + DESCLEN + 10];
char * id = temp->data->id;
char * name = temp->data->name;
int dollars = temp->data->price.dollars;
int cents = temp->data->price.cents;
sprintf(listLine,"%s |%s |$%d.%d\n", id, name, dollars, cents);
printf("%s",listLine);
temp = temp->next;
}
This works OK, but I can't seem to line up the | characters with each other.
I'm still new to Stack Exchange, so I am not sure how to mark this as homework... but yes, this is homework.
Any help would be great.
Thanks

You can use width specifiers with the printf format specifiers.
Example:
printf("%3d | %10s", 32, "Hello");
will print
_32 | _____Hello
where _(underscore) represents a space
You can also pass the width as an argument using printf("%*d", width, 32);. The width can be determined from the lengths of the strings you are printing: the maximum length of a string can be your desired width. For numbers, a 32-bit int never needs more than 10 digits (2^32 = 4294967296 has 10 digits in decimal notation), plus one character for a minus sign.
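Applied to the question's listing, a sketch of building each line into a `char[]` with `snprintf` and widths passed as arguments. `%-*s` pads on the right (left-justifies), which keeps the `|` characters in one column, and `%02d` prints cents as two digits. `format_item` is a hypothetical helper and the widths would be chosen by the caller:

```c
#include <stdio.h>
#include <stdlib.h>

/* Writes "id |name |$dollars.cc" into out, padding id and name to fixed widths.
   Returns the length snprintf reports. */
int format_item(char *out, size_t size, const char *id, int id_width,
                const char *name, int name_width, int dollars, int cents)
{
    return snprintf(out, size, "%-*s |%-*s |$%d.%02d",
                    id_width, id, name_width, name, dollars, cents);
}
```

A value longer than its width is not truncated, so the widths must be at least as large as the longest field; a precision such as `%-*.*s` can be used to cut longer values off.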

If you're trying to line these up on the pipe characters ('|'), you could make a first pass over your data to determine the maximum length of each field in characters, and then, on the second pass, space-pad each field that is shorter than its maximum, so that when all have been formatted, each field is exactly the same length.
Or, you could determine the max length you want for each field and then format those using a printf() statement.
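The two-pass idea, sketched for a hypothetical array of `struct row` (the question walks a linked list, where the first pass would walk the list instead):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct row {              /* hypothetical row type for illustration */
    const char *first;
    const char *last;
    int age;
};

/* First pass: find the widest first and last name. */
void max_widths(const struct row *rows, size_t n, int *first_w, int *last_w)
{
    *first_w = 0;
    *last_w = 0;
    for (size_t i = 0; i < n; i++) {
        int f = (int)strlen(rows[i].first);
        int l = (int)strlen(rows[i].last);
        if (f > *first_w) *first_w = f;
        if (l > *last_w)  *last_w = l;
    }
}

/* Second pass: format each row into a char[] padded to those widths, then print it. */
void print_rows(const struct row *rows, size_t n)
{
    int first_w, last_w;
    char line[256];

    max_widths(rows, n, &first_w, &last_w);
    for (size_t i = 0; i < n; i++) {
        snprintf(line, sizeof line, "%-*s |%-*s |%d",
                 first_w, rows[i].first, last_w, rows[i].last, rows[i].age);
        puts(line);
    }
}
```

With the rows `{"Joe", "smith", 45}` and `{"Amy", "Footh", 2}`, `print_rows` prints the two lines from the question, `Joe |smith |45` and `Amy |Footh |2`.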

It sounds to me like you want to add tabs? You can add a tab with \t:
"%s \t|%s \t|$%d.%d\n"

How to assign a string value

Can I do something like the code below to get a persons name into the firstname string?
printf("First Name? ");
scanf("%s", &firstname[11]);

yup. This stores the input in firstname, starting at index 11.

No. You should not do it this way. It is unsafe, in exactly the same way gets is unsafe.
You should instead do something like this. (I am assuming that you really do want to write to firstname starting at character position 11, which is what &firstname[11] does. If firstname is 11 bytes long and you want to write starting at position 0, you would simply use firstname and remove the various occurrences of 11 + below.)
char inbuf[80];
size_t n;
fputs("First Name? ", stdout);
if (fgets(inbuf, sizeof inbuf, stdin) == NULL) {
fputs("No name entered\n", stderr);
exit(1);
}
n = strlen(inbuf);
if (n == 0) {
fputs("No name entered\n", stderr);
exit(1);
}
if (inbuf[n-1] != '\n') {
fputs("Name too long\n", stderr);
exit(1);
}
inbuf[--n] = '\0';
if (n == 0) {
fputs("No name entered\n", stderr);
exit(1);
}
if (11 + n >= sizeof firstname) {
fputs("Name too long\n", stderr);
exit(1);
}
memcpy(&firstname[11], inbuf, n);
firstname[11 + n] = '\0';
If your reaction is that this looks like a giant pain in the behind, all I can say is: welcome to C programming. All of that is in fact necessary for robustness in the face of arbitrarily malformed input.
You should also reflect upon Falsehoods Programmers Believe About Names and then redesign your database accordingly. (Most importantly in this context: people do not necessarily divide their names into "first" and "last" and "middle" components; many people have first names that require more than 10 bytes to represent.)
EDIT: Bugs in the example code should now all be corrected. Serves me right for doing memory arithmetic in my head without testing it.

The short answer to your question is, yes you can.
The long answer (and questions that go with it)...
What's the declaration of firstname?
What's in the first 11 places of firstname?
Are you trying to limit the length of the user input to 11 characters, and was that the syntax you thought would work?
I suspect that you are trying to limit the firstname to be at most 11 characters. The way to solve this is:
char firstname[12];
scanf("%11s", firstname);
However, as Zach pointed out, this will be problematic if you have a first name like "Mary Kay". scanf will stop at the first white space and you will end up with a first name that contains just "Mary". The better approach is to use fgets.
fgets(firstname, 12, stdin);
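That works if the line contains nothing but the name. Note that fgets also stores the trailing newline when it fits in the buffer, so you will usually want to strip it.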
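Because `fgets` keeps the newline, a common idiom is to cut it off with `strcspn`. A small sketch with a hypothetical `read_name` helper:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one line from in into buf and removes the trailing newline, if any.
   Returns buf, or NULL on end of file or read error. */
char *read_name(char *buf, int size, FILE *in)
{
    if (fgets(buf, size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* strcspn gives the index of '\n' or of the terminator */
    return buf;
}
```

If the name is longer than the buffer, the rest of the line stays in the stream and will be returned by the next read.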
If you expect to read the data from a file and there are other fields in a line, then you will have to deal with that additional complexity. Say you have first name, last name, and age in a line. Your input file could look something like:
John, Deer, 59
Mary Kay, Smith, 42
If your input file contains data like the above, you can't pick out the first name using fgets(firstname, 12, infile);. You will have to use fgets to read the entire line, then parse the line to extract all the relevant data.
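For the comma-separated example, one way to do that parsing is `sscanf` with scansets on the line returned by `fgets`. A sketch with a hypothetical `struct person` (the field sizes are assumptions) and a hypothetical `parse_person` function:

```c
#include <stdio.h>
#include <stdlib.h>

struct person {            /* hypothetical record for illustration */
    char firstname[12];
    char lastname[20];
    int age;
};

/* Parses "first, last, age" from a line already read with fgets.
   Returns 1 if all three fields were found. */
int parse_person(const char *line, struct person *p)
{
    /* %11[^,] keeps spaces inside the name ("Mary Kay") and stops at the comma;
       the space after each ',' skips the blanks before the next field. */
    return sscanf(line, " %11[^,], %19[^,], %d",
                  p->firstname, p->lastname, &p->age) == 3;
}
```

Spaces before a comma stay in the field (`Mary Kay , Smith` would give `"Mary Kay "`), so trimming may still be needed.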