Ignoring separating character using scanf - c

The problem:
I am attempting to use scanf to read a sentence whose fields are separated by |, so naturally I use scanf's built-in features to ignore this symbol, but it then also ignores everything that has a | in it.
The code, simplified:
int main(){
char* a=malloc(8);
char* b=malloc(8);
scanf("%s | %s",a,b);
printf("%s %s",a,b);
}
When I try the input:
TEST | ME
it works as intended, but when I have the following case:
TEST ME|
it naturally reads the TEST, but ignores the ME|. Is there any way around this?

scanf("%[^ \t|]%*[ \t|]%[^ \t\n|]", a,b);
printf("%s %s",a,b);
Annotation:
%* : ignore this element.
E.g. %*s //skip the reading of the text of this one
%[character set(allow)] : Read only character set that you specify.
E.g. %[0123456789] or %[0-9] //Read as a string only numeric characters
%[^character set(denied)] : A ^ at the beginning of the character set inverts it: read any characters other than the ones listed.
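As a hedged sketch of the annotation above, the same format can be wrapped in a small function that also bounds each field and checks the return value. The name split_fields and the 32-byte buffers are assumptions made for this example, not part of the original code.

```c
#include <stdio.h>

/* Hypothetical helper: splits "left | right" style input into two fields.
   Each buffer must hold at least 32 bytes. Returns 1 if both fields were
   read, 0 otherwise. */
int split_fields(const char *line, char *left, char *right)
{
    /* " %31[^ \t|]"   skip leading whitespace, read up to a blank or '|'
       "%*[ \t|]"      read and discard the run of blanks and bars
       "%31[^ \t\n|]"  read the second field up to a blank, newline or '|' */
    return sscanf(line, " %31[^ \t|]%*[ \t|]%31[^ \t\n|]", left, right) == 2;
}
```

Both "TEST | ME" and "TEST ME|" yield the fields TEST and ME with this format; an input with a single field makes the function return 0.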

Yes, you can scan for a character set. The problem you're seeing is not related to the vertical bar, it's the fact that a string stops at the first whitespace character, i.e. the space between "TEST" and "ME|".
So, do something like:
if(scanf("%7[^|] | %7[^|]", a, b) == 2)
{
a[7] = b[7] = '\0';
printf("got '%s' and '%s'\n", a, b);
}
See the manual page for scanf() for details on the [ conversion specifier.

This one should work.
char a[200], b[200];
scanf ("%[^|]| %[^\n]", a, b); // Use it exactly
printf ("a = %s\nb = %s\n", a, b);
Meaning of this format string. I separate it into 3 parts and explain each.
"%[^|]" - Scan everything into 1st string, until the bar character('|') appears.
"| " - Read the '|' and ignore it. Read all white space characters and ignore them.
"%[^\n]" - Read remainder of the line into the 2nd string.
Test case
first string is this | 2nd is this
a = first string is this
b = 2nd is this
no space|between bar
a = no space
b = between bar
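One detail the test case hides is that "%[^|]" also stores the blank that precedes the bar. A minimal sketch that bounds both fields and trims that blank might look as follows; rtrim and split_on_bar are hypothetical names, and the 200-byte buffers mirror the arrays above.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: removes trailing whitespace in place. */
static void rtrim(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        s[--len] = '\0';
}

/* Splits "first | second" on the bar. Both buffers must hold 200 bytes.
   Returns 1 if both fields were read, 0 otherwise. */
int split_on_bar(const char *line, char *a, char *b)
{
    /* The leading space skips initial whitespace; %199 keeps each field
       within its buffer; the space after '|' skips blanks before field 2. */
    if (sscanf(line, " %199[^|]| %199[^\n]", a, b) != 2)
        return 0;
    rtrim(a); /* "%[^|]" keeps the blank that precedes the bar */
    rtrim(b);
    return 1;
}
```

A line without a bar makes sscanf stop after the first field, so the function reports failure instead of leaving b unset.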

Leading spaces can be stripped by using an extra local variable to store them.
%[ ] needs to be present in the scanf format to capture the leading spaces.
The format "%[ ]%[^\n]" with the arguments first_string, second_string reads two strings:
first_string contains the leading spaces of the input string,
second_string contains the actual data without leading spaces.
The following is sample code:
int main()
{
char lVar[30];
char lPlaceHolder[30];
printf("\n Enter any string with leading spaces : ");
memset(lVar,'\0',30);
memset(lPlaceHolder,'\0',30);
scanf("%[ ]%[^\n]",lPlaceHolder,lVar);
printf("\n lPlaceHolder is :%s:\n",lPlaceHolder);
printf("\n lVar is :%s:\n",lVar);
return(0);
}
Input:
" hello world"
Output:
lPlaceHolder is : :
lVar is :hello world:
Note: the spaces in lPlaceHolder are not displayed properly after uploading to the Stack Overflow website.
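Note that %[ ] must match at least one space, so the format above stores nothing when the input has no leading spaces. A hedged alternative is a single space in the format, which matches zero or more whitespace characters and needs no placeholder variable. The function name skip_leading_blanks is an assumption for this sketch; the 30-byte buffer follows the sample code.

```c
#include <stdio.h>

/* Hypothetical helper: copies `in` into `out` without leading whitespace.
   `out` must hold at least 30 bytes. Returns 1 on success, 0 if the input
   contains nothing but whitespace. */
int skip_leading_blanks(const char *in, char *out)
{
    /* The space in the format matches zero or more whitespace characters,
       so this also works when the input has no leading blanks at all. */
    return sscanf(in, " %29[^\n]", out) == 1;
}
```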

I'd say instead of messing with scanf(), try using saner functions - those that work as per the (intuitive) expectations:
char s1[] = "FOO | BAR";
char s2[] = "FOO BAR |";
void print_sep(char *in)
{
char *endp;
char *sep = strtok_r(in, "|", &endp);
printf("%s\n", sep);
if (sep = strtok_r(NULL, "|", &endp))
printf("%s\n", sep);
}
print_sep(s1);
print_sep(s2);

Related

Parsing file txt C

Hi guys, I have this part of my code:
void token(){
FILE *pointer;
user record;
pointer = fopen("utente_da_file.txt","r+");
printf("OK");
fscanf(pointer , "%s, %s, %s, %s, %s \n" , record.nome_utente , record.nome , record.cognome , record.data_di_nascita , record.data_di_iscrizione);
fclose(pointer);
printf("TEXT -> %s \n" , record.nome_utente);
}
This is utente_da_file.txt
cocco,ananas,banana,ciao,miao
This is my output:
TEXT -> cocco,ananas,banana,ciao,miao
I don't understand why.
Greetings :)
This is due to the nature of %s parameter in scanf family: it consumes all characters up to the first white space character it encounters – or up to the end of input, whichever comes first (scanf - OK, C++ documentation, but applies for C alike). As you do not have any whitespace in your file, the entire content is consumed at once, including the commas, before you can scan for them in your format string...
You would get a hint if you checked the return value of (f)scanf: it returns the number of variables filled, so you should have got 1 as the return value.
The problem with %s in the (f)scanf family is that you cannot specify the delimiters at which your strings stop. So in your case, you would have to add whitespace between the words of the file. But be aware that the commas will then be part of the strings; if you add whitespace after them, you would have to add whitespace before them too, so that your format string can consume them. This might make your file ugly, though, so you might prefer dropping the commas entirely (but then drop them from the format string, too!).
Alternatively, you can read the entire line at once using fgets and then parse it using strtok. The whole procedure could look similar to the following piece of code:
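char buffer[256];
if(fgets(buffer, sizeof(buffer), pointer))
{
    char const* delimiters = ", \t\n\r";
    char* token = strtok(buffer, delimiters);
    if(token)
    {
        // strncpy does not null-terminate on truncation, so terminate explicitly
        strncpy(record.nome_utente, token, sizeof(record.nome_utente) - 1);
        record.nome_utente[sizeof(record.nome_utente) - 1] = '\0';
        if((token = strtok(NULL, delimiters)))
        {
            strncpy(record.nome, token, sizeof(record.nome) - 1);
            record.nome[sizeof(record.nome) - 1] = '\0';
            // rest alike...
        }
    }
}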
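A third option is a scanset: unlike %s, the %[^,] conversion lets the format name the characters at which a field stops, so the file can keep its commas. The sketch below is only an illustration; the question does not show the user type, so the user_record layout and its 32-byte fields are assumptions, as is the function name.

```c
#include <stdio.h>

/* Assumed layout: the question does not show the `user` type, so the
   field sizes here are placeholders. */
typedef struct {
    char nome_utente[32];
    char nome[32];
    char cognome[32];
    char data_di_nascita[32];
    char data_di_iscrizione[32];
} user_record;

/* Parses one comma-separated line. Returns the number of fields stored
   (5 on success). */
int parse_user_line(const char *line, user_record *r)
{
    /* " %31[^,\n]" skips leading whitespace, then reads up to 31
       characters that are neither a comma nor a newline; the literal
       ',' consumes the separator. */
    return sscanf(line,
                  " %31[^,\n], %31[^,\n], %31[^,\n], %31[^,\n], %31[^,\n]",
                  r->nome_utente, r->nome, r->cognome,
                  r->data_di_nascita, r->data_di_iscrizione);
}
```

Comparing the return value against 5 catches short or malformed lines, which is the same check that would have revealed the original problem.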
For me, the best solution is to write the C code this way (a space between each pair of %s):
fscanf(pointer , "%s %s %s %s %s \n" , record.nome_utente , record.nome , record.cognome , record.data_di_nascita , record.data_di_iscrizione);
and write your text file this way (a space between two fields):
cocco ananas banana ciao miao
This way I'm sure it works well.
Bye and good luck.

Using sscanf to read strings

I am trying to save one character and 2 strings into variables.
I use sscanf to read strings with the following form :
N "OldName" "NewName"
What I want : char character = 'N' , char* old_name = "OldName" , char* new_name = "NewName" .
This is how I am trying to do it :
sscanf(mystring,"%c %s %s",&character,old_name,new_name);
printf("%c %s %s",character,old_name,new_name);
The problem is, my program stops working without any output.
(I want to ignore the quotation marks too and save only its content)
When you do
char* new_name = "NewName";
you make the pointer new_name point to the read-only string array containing the constant string literal. The array contains exactly 8 characters (the letters of the string plus the terminator).
First of all, using that pointer as a destination for scanf will cause scanf to write to the read-only array, which leads to undefined behavior. And if you give a string longer than 7 characters, then scanf will also attempt to write out of bounds, again leading to undefined behavior.
The simple solution is to use actual arrays, and not pointers, and to also tell scanf to not read more than can fit in the arrays. Like this:
char old_name[64]; // Space for 63 characters plus string terminator
char new_name[64];
sscanf(mystring,"%c %63s %63s",&character,old_name,new_name);
To skip the quotation marks you have a couple of choices: Either use pointers and pointer arithmetic to skip the leading quote, and then set the string terminator at the place of the last quote to "remove" it. Another solution is to move the string to overwrite the leading quote, and then do as the previous solution to remove the last quote.
Or you could rely on the limited pattern-matching capabilities of scanf (and family):
sscanf(mystring,"%c \"%63s\" \"%63s\"",&character,old_name,new_name);
Note that the above sscanf call will work iff the string actually includes the quotes.
Second note: As said in the comment by Cool Guy, the above won't actually work since scanf is greedy. It will read until the end of the file/string or a white-space, so it won't actually stop reading at the closing double quote. The only working solution using scanf and family is the one below.
Also note that scanf and family, when reading string using "%s" stops reading on white-space, so if the string is "New Name" then it won't work either. If this is the case, then you either need to manually parse the string, or use the odd "%[" format, something like
sscanf(mystring,"%c \"%63[^\"]\" \"%63[^\"]\"",&character,old_name,new_name);
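As a hedged illustration of that last format, the call can be wrapped so that the caller learns whether all three items were actually read. The name parse_rename is hypothetical; the 64-byte buffers match the arrays declared above.

```c
#include <stdio.h>

/* Hypothetical wrapper. Both name buffers must hold at least 64 bytes.
   Returns 1 only if the letter and both quoted names were read. */
int parse_rename(const char *s, char *letter, char *old_name, char *new_name)
{
    /* \"%63[^\"]\" reads everything between a pair of double quotes,
       embedded spaces included. A conversion that fails (for example
       input without quotes) stops the scan early, hence the check
       against 3. */
    return sscanf(s, " %c \"%63[^\"]\" \"%63[^\"]\"",
                  letter, old_name, new_name) == 3;
}
```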
You must allocate space for your strings, e.g:
char* old_name = malloc(128);
char* new_name = malloc(128);
Or using arrays
char old_name[128] = {0};
char new_name[128] = {0};
In case of malloc you also have to free the space before the end of your program.
free(old_name);
free(new_name);
Updated:...
The other answers provide good methods of creating memory as well as how to read the example input into buffers. There are two additional items that may help:
1) You expressed that you want to ignore the quotation marks too.
2) Reading first & last names when separated with space. (example input is not)
As @Joachim points out, because scanf and family stop scanning on a space with the %s format specifier, a name that includes a space such as "firstname lastname" will not be read in completely. There are several ways to address this. Here are two:
Method 1: tokenizing your input.
Tokenizing a string breaks it into sections separated by delimiters. Your string input examples for instance are separated by at least 3 usable delimiters: space: " ", double quote: ", and newline: \n characters. fgets() and strtok() can be used to read in the desired content while at the same time strip off any undesired characters. If done correctly, this method can preserve the content (even spaces) while removing delimiters such as ". A very simple example of the concept below includes the following steps:
1) reading stdin into a line buffer with fgets(...)
2) parse the input using strtok(...).
Note: This is an illustrative, bare-bones implementation, sequentially coded to match your input examples (with spaces) and includes none of the error checking/handling that would normally be included.
int main(void)
{
char line[128];
char delim[] = {"\n\""};//parse using only newline and double quote
char *tok;
char letter;
char old_name[64]; // Space for 63 characters plus string terminator
char new_name[64];
fgets(line, 128, stdin);
tok = strtok(line, delim); //consume 1st " and get token 1
if(tok) letter = tok[0]; //assign letter
tok = strtok(NULL, delim); //consume 2nd " and get token 2
if(tok) strcpy(old_name, tok); //copy tok to old name
tok = strtok(NULL, delim); //consume 3rd " throw away token 3
tok = strtok(NULL, delim); //consume 4th " and get token 4
if(tok) strcpy(new_name, tok); //copy tok to new name
printf("%c %s %s\n", letter, old_name, new_name);
return 0;
}
Note: as written, this example (like most uses of strtok(...)) requires very narrowly defined input. In this case the input must be no longer than 127 characters and consist of a single character followed by space(s), then a double-quoted string, followed by more space(s), then another double-quoted string, as defined by your example:
N "OldName" "NewName"
The following input will also work in the above example:
N "old name" "new name"
N "old name" "new name"
Note also about this example, some consider strtok() broken, while others suggest avoiding its use. I suggest using it sparingly, and only in single threaded applications.
Method 2: walking the string.
A C string is just an array of char terminated with a null character ('\0'). By selectively copying some characters into another string, while bypassing the ones you do not want (such as the "), you can effectively strip unwanted characters from your input. Here is an example function that will do this:
char * strip_ch(char *str, char ch)
{
char *from, *to;
char *dup = strdup(str);//make a copy of input
if(dup)
{
from = to = dup;//set working pointers equal to pointer to input
for (from; *from != '\0'; from++)//walk through input string
{
*to = *from;//set destination pointer to original pointer
if (*to != ch) to++;//test - increment only if not char to strip
//otherwise, leave it so next char will replace
}
*to = '\0';//replace the NULL terminator
strcpy(str, dup);
free(dup);
}
return str;
}
Example use case:
int main(void)
{
char line[128] = {"start"};
while(strstr(line, "quit") == NULL)
{
printf("Enter string (\"quit\" to leave) and hit <ENTER>:");
fgets(line, 128, stdin);
strip_ch(line, '"');//strips in place; writing line back into itself with sprintf is undefined behavior
printf("%s", line);
}
return 0;
}
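The same walk can also be done without the temporary copy, since the write position never overtakes the read position. This is a minimal sketch of that variant; strip_ch_inplace is a hypothetical name, not a function from the answer above.

```c
#include <stddef.h>

/* Hypothetical variant of strip_ch: removes every occurrence of `ch`
   from `str` in place, without a temporary copy. Returns `str`. */
char *strip_ch_inplace(char *str, char ch)
{
    char *from = str; /* next character to examine */
    char *to = str;   /* next position to write    */

    if (str == NULL)
        return NULL;

    for (; *from != '\0'; from++) {
        if (*from != ch)
            *to++ = *from; /* keep it; `to` never overtakes `from` */
    }
    *to = '\0'; /* terminate the shortened string */
    return str;
}
```

Because nothing is allocated, there is no failure path to handle and no need for strdup(), which is not part of standard C before C23.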

Hex calculator, parsing string using scanf

Here is my Code :
char a[18], b[18];
char oper, clear;
char *test;
init_8051();
test="0x1234567890123456 + 0x1234567890123456\0";
printf("Please enter an equation: %s \n",test );
sscanf(test,"0x%s %c 0x%s",a,&oper,b);
printf(" a= %s \n" ,a);
printf(" oper= %s \n" ,oper);
printf(" b= %s \n" ,b);
I want to accept two hex numbers with an operation as a string and to be able to separate those 2 numbers into 2 separate char arrays, but it doesn't work. Here is the output of the code above:
Please enter an equation: 0x1234567890123456 + 0x1234567890123456
a= 1234567890123456
oper= Ò
b= 1234567890123456
As you can see, the operation is not recognized, and I also have to use spaces, which I would rather not: I want the format to be 0x1234567890123456+0x1234567890123456
with no spaces between the plus and the numbers.
Thanks
From the sscanf manual
s Matches a sequence of non-white-space characters; the next pointer must be a pointer to character array that is long enough to hold the input sequence and the
terminating null byte ('\0'), which is added automatically. The input string stops at white space or at the maximum field width, whichever occurs first.
It means that, when there is no space after the first operand, %s consumes the + and the rest of the characters, leaving b and oper uninitialized and overflowing a, since it only has space for 18 characters.
So when the input string lacks the space after the first operand, sscanf keeps reading until it finds a whitespace character. Hence, when the string does not contain the separating spaces between the operands and the operator, the first %s consumes all the input.
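For illustration, a scanset restricted to hex digits stops at the operator on its own, so the spaces become optional. The sketch below assumes operands of at most 16 hex digits, as in the example input; split_hex_equation is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: splits "0x<hex> <op> 0x<hex>" into its parts, with
   or without spaces around the operator. `a` and `b` must each hold at
   least 17 bytes (16 hex digits plus the terminator). Returns 1 on success. */
int split_hex_equation(const char *s, char *a, char *oper, char *b)
{
    /* The scanset accepts only hex digits, so the first operand stops at
       the operator instead of swallowing it as %s would. The spaces in the
       format match zero or more whitespace characters. */
    return sscanf(s,
                  " 0x%16[0123456789abcdefABCDEF] %c 0x%16[0123456789abcdefABCDEF]",
                  a, oper, b) == 3;
}
```

The digits are listed explicitly because the meaning of a range such as 0-9 inside a scanset is implementation-defined. Note that the operator must be printed with %c, not %s.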
I'll add a different approach to solving your problem.
We copy the string; this is required by strtok, since you can't pass it an immutable string. There are plenty of methods to copy this string, you just have to pick the appropriate one for your case.
input = strdup("0x1234567890123456 + 0x1234567890123456\0");
Now, we use strpbrk to find the operator
pointer = strpbrk(input, "+-*/" /* here go the operators */);
if (pointer != NULL)
oper = *pointer; /* this will contain the operator ascii value */
Create a string containing the operator as a delimiter
operstr[0] = oper;
operstr[1] = '\0'; /* strings must be null terminated */
Now, we use strtok to tokenize the string, and find the operands
pointer = strtok(input, operstr);
if (pointer != NULL)
fprintf(stderr, "first operand: %s\n", pointer); /* you can copy this string if you need to */
printf("Operator: %s \n", operstr);
Second call to strtok needs NULL first argument
pointer = strtok(NULL, operstr);
if (pointer != NULL)
fprintf(stderr, "second operand: %s\n", pointer); /* you can copy this string if you need to */
And finally free our copy of the input string.
free(input);
It is better to use strtok_r, the reentrant version. But for now you could test my suggestions; maybe this is what you need.
Even though this will work for this particular situation, it is not the preferred way of doing this kind of thing. You can try writing a parser and using Reverse Polish Notation, or you can try a lexical analyzer and a parser generator like flex and bison.
My previous answer was downvoted, and didn't address all of OP's requirements, so I have rewritten this answer.
OP wants flexible input, either spaces or no spaces. I suggest not using sscanf() but the methods below. First the program finds a valid operator by using strcspn(), then breaks the string using strtok() on operators and whitespace. But using strtok() on a string literal is UB so I copy the "equation" to another string first.
I also corrected the printf() field spec for the operator, and made a and b different - it's always a bad idea using the same values for different variables in an example.
#include <stdio.h>
#include <string.h>
#define OPERATORS "+-*/"
#define DELIMS " \t\n" OPERATORS
int parse (char *test)
// return 1 if parsed successfully
{
char a[50], b[50];
char oper;
char *ptr;
size_t opind;
opind = strcspn (test, OPERATORS); // find operator
if (opind < 1 || test[opind] == '\0') return 0; // fail: no left operand or no operator
oper = test[opind]; // collect operator
ptr = strtok (test, DELIMS); // find a
if (ptr == NULL) return 0; // fail
strcpy (a, ptr); // collect 1st arg
ptr = strtok (NULL, DELIMS); // find b
if (ptr == NULL) return 0; // fail
strcpy (b, ptr); // collect 2nd arg
printf(" a %s \n" ,a);
printf(" oper %c \n" ,oper); // corrected format
printf(" b %s \n" ,b);
return 1;
}
int main (void)
{
char test[100];
strcpy (test, "0x123456789ABCDEF0+0xFEDCBA9876543210");
if (!parse (test))
printf("Failed\n");
printf("\n");
strcpy (test, "0x123456789ABCDEF0 + 0xFEDCBA9876543210");
if (!parse (test))
printf("Failed\n");
return 0;
}
Program output
a 0x123456789ABCDEF0
oper +
b 0xFEDCBA9876543210
a 0x123456789ABCDEF0
oper +
b 0xFEDCBA9876543210
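If the operands are needed as numbers rather than strings, strtoull() can do the splitting and the conversion in one pass, because it reports where it stopped. The following is a rough sketch under the assumption that each operand fits in an unsigned long long; eval_hex is a hypothetical name, and overflow of the result is not detected.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical evaluator for "<hex> <op> <hex>". Stores the result in
   *result and returns 1 on success, 0 on malformed input or division by
   zero. Overflow is not detected: unsigned arithmetic simply wraps. */
int eval_hex(const char *s, unsigned long long *result)
{
    char *end;
    unsigned long long a, b;
    char oper;

    a = strtoull(s, &end, 16);        /* base 16 accepts an optional 0x */
    if (end == s) return 0;           /* no digits were converted */

    while (isspace((unsigned char)*end)) end++;
    oper = *end;
    if (oper == '\0') return 0;       /* operator missing */
    s = end + 1;

    b = strtoull(s, &end, 16);        /* skips leading whitespace itself */
    if (end == s) return 0;

    switch (oper) {
    case '+': *result = a + b; return 1;
    case '-': *result = a - b; return 1;
    case '*': *result = a * b; return 1;
    case '/':
        if (b == 0) return 0;
        *result = a / b;
        return 1;
    default:
        return 0;
    }
}
```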

How to properly fetch data from tab separated fields in text file

I am trying to learn how to import data from tab-separated fields in a text file. Here is an example of what I am trying to read from an external file called users.in:
1 joshmith mypwd John Smith Awesome Road 103
2 jane_doe strongpwd Jane Doe Lucky Street 201
3 august84 goodpwd August May Red Boulevard 24
Here is the structure that is supposed to hold the data...
typedef struct User
{
int id;
char username[20];
char password[40];
char firstname[20];
char lastname[20];
char address[120];
} User;
... and of course the code that should handle the operation:
User *u = (User *)malloc(sizeof(User)*4);
int i = 0;
while (6 == fscanf(data_file, "%d\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\n", &(u+i)->id, (u+i)->username, (u+i)->password, (u+i)->firstname, (u+i)->lastname, (u+i)->address))
{
fprintf(stdout, "%d %s %s %s %s %s\n", (u+i)->id, (u+i)->username, (u+i)->password, (u+i)->firstname, (u+i)->lastname, (u+i)->address);
i++;
}
The loop manages to get through the first iteration... and then it stops. Here is the output:
1 joshmith mypwd John Smith Awesome Road 103
2
Can anyone help me figure out why this is happening? What is the proper way to import data formatted like this?
I would use fgets to read each line into a string and then use strtok with \t as a delimiter character to extract the tokens; the first token in each line can be converted to a number using atoi.
NOTE: atoi() returns zero for an invalid number, so you cannot distinguish a genuine 0 from a parse failure without extra logic.
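One common way to get that extra logic is strtol(), which reports both where parsing stopped and whether the value was out of range. A minimal sketch, with parse_id as a hypothetical name:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical replacement for atoi() on the first token of a line.
   Returns 1 and stores the value in *out if `tok` is a complete, in-range
   decimal integer; returns 0 otherwise. */
int parse_id(const char *tok, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(tok, &end, 10);
    if (end == tok || *end != '\0')   /* no digits, or trailing junk */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                     /* does not fit in an int */
    *out = (int)v;
    return 1;
}
```

With this, the token "0" succeeds with the value 0, while "abc" is rejected.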
The problem with your format string is that the last scanset you're using is %[^\t], while the line most likely ends with a \n, although of course it could possibly end with a \t. If it is certain that it ends with a \n, then simply changing that last one should suffice:
"%d\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\n]\n"
// changed this ^ from t to n
If it may also be a \t, then you may use the following:
"%d\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\t]\t%[^\n\t]%*[\n\t]"
// %[^\n\t] reads and assigns whatever is found until a '\t' or '\n' is encountered
// %*[\n\t] reads and discards '\n's and '\t's
// ... until something else is encountered
As additional information, a space ' ' inside a format string matches zero or more whitespace characters and discards them. It is essentially like %*[ \t\n], except that it also succeeds when no whitespace is present: it tells scanf to match any ' ', '\t' and '\n' (if any) until something else is encountered, and to discard them.
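Putting these points together, one possible line-based variant reads each record with fgets() and parses it with sscanf(), adding field widths so that no member of the struct can overflow. This is only a sketch: parse_user is a hypothetical name, and the widths are derived from the User struct shown in the question.

```c
#include <stdio.h>

typedef struct User {  /* same layout as in the question */
    int id;
    char username[20];
    char password[40];
    char firstname[20];
    char lastname[20];
    char address[120];
} User;

/* Hypothetical helper: parses one tab-separated line, e.g. one obtained
   with fgets(). Returns 1 if all six fields were read, 0 otherwise. */
int parse_user(const char *line, User *u)
{
    /* Each width is the buffer size minus one. A space in the format
       skips the tab between fields; the last scanset also stops at '\n'
       so the address cannot run into the next record. */
    return sscanf(line, "%d %19[^\t] %39[^\t] %19[^\t] %19[^\t] %119[^\t\n]",
                  &u->id, u->username, u->password,
                  u->firstname, u->lastname, u->address) == 6;
}
```

Because each call sees exactly one line, a malformed record fails on its own instead of shifting every field that follows it.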

String manipulation using strtok/ sscanf in C

I'm trying to separate the following string into three separate variables, i.e., a, b and c.:
" mov/1/1/1,0 STR{7}, r7"
each need to hold a different segment of the string, e.g:
a = "mov/1/1/1,0"
b = "STR{7}"
c = "r7"
There may be a space or a tab between each command; this is what makes this part of the code trickier.
I tried to use strtok, for the string manipulation, but it didn't work out.
char command[50] = " mov/1/1/1,0 STR{7}, r7";
char a[10], b[10], c[10];
char * ptr = strtok(command, "\t");
strcpy(a, ptr);
ptr = strtok(NULL, "\t");
strcpy(b, ptr);
ptr = strtok(NULL, ", ");
strcpy(c, ptr);
but this gets really messy, as the variables a, b and c end up holding more characters than they should, which makes the program crash.
Input may vary from:
" mov/1/1/1,0 STR{7}, r7"
"jsr /0,0 PRTSTR"
"mov/1/1/0,0 STRADD{5}, LASTCHAR {r3} "
in which the values of a, b and c change to different parts of the given string.
I was told it is safer to use sscanf for this kind of task than strtok, but I'm not sure why or how it could help me.
I would be more than glad to hear your opinion!
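This should do the trick, provided a, b and c are large enough for the longest field (a[10] cannot hold "mov/1/1/1,0"):
sscanf(command, "%s %s %s", a, b, c)
From the scanf manpage, %s skips leading whitespace, be it spaces or tabs, and then reads up to the next one. Commas are not whitespace, so a trailing comma stays attached to its field: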
s : Matches a sequence of non-white-space characters; the next pointer
must be a pointer to character array that is long enough to hold the
input sequence and the terminating null byte ('\0'), which is added
automatically. The input string stops at white space or at the
maximum field width, whichever occurs first.
As you may know, you can use sscanf() the same way as scanf(); the difference is that sscanf scans from a string, while scanf scans from standard input.
For this problem you can give scanf a set of characters to "always skip", as done in this link.
Since each of the three strings has a different set of constraints, you can express these constraints with %*[^...] before every %s inside sscanf().
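As a rough sketch of the sscanf() route, the three fields can be read with bounded %s conversions and any trailing commas removed afterwards. The names split_command and drop_trailing_commas and the 20-byte buffers are assumptions for this example, not something the question prescribes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: drops any commas at the end of `s`. */
static void drop_trailing_commas(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && s[len - 1] == ',')
        s[--len] = '\0';
}

/* Splits `command` into its first three whitespace-separated fields.
   Each output buffer must hold at least 20 bytes. Unlike strtok(), this
   leaves `command` untouched. Returns 1 if three fields were found. */
int split_command(const char *command, char *a, char *b, char *c)
{
    /* %19s skips leading spaces and tabs, then reads at most 19
       non-whitespace characters. */
    if (sscanf(command, "%19s %19s %19s", a, b, c) != 3)
        return 0;
    drop_trailing_commas(a);
    drop_trailing_commas(b);
    drop_trailing_commas(c);
    return 1;
}
```

The field widths are what make this safer than unbounded strcpy() calls: a field longer than 19 characters is cut short instead of overflowing its buffer.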
I have reservations about using strtok(), but this code using it seems to do what you need. As I noted in a comment, the sample string "jsr /0,0 PRTSTR" throws a spanner in the works; it has a significant comma in the second field, whereas in the other two example strings, the comma in the second field is not significant. If you need to remove trailing commas, you can do that after the space-based splitting — as shown in this code. The second loop tests the zap_trailing_commas() function to ensure that it behaves under degenerate cases, zapping trailing commas but not underflowing the start of the buffer or anything horrid.
#include <stdio.h>
#include <string.h>
static void zap_trailing_commas(char *str)
{
size_t len = strlen(str);
while (len-- > 0 && str[len] == ',')
str[len] = '\0';
}
static void splitter(char *command)
{
char a[20], b[20], c[20];
char *ptr = strtok(command, " \t");
strcpy(a, ptr);
zap_trailing_commas(a);
ptr = strtok(NULL, " \t");
strcpy(b, ptr);
zap_trailing_commas(b);
ptr = strtok(NULL, " \t");
strcpy(c, ptr);
zap_trailing_commas(c);
printf("<<%s>> <<%s>> <<%s>>\n", a, b, c);
}
int main(void)
{
char data[][50] =
{
" mov/1/1/1,0 STR{7}, r7",
"jsr /0,0 PRTSTR",
"mov/1/1/0,0 STRADD{5}, LASTCHAR {r3} ",
};
for (size_t i = 0; i < sizeof(data)/sizeof(data[0]); i++)
splitter(data[i]);
char commas[][10] = { "X,,,", "X,,", "X,", "X" };
for (size_t i = 0; i < sizeof(commas)/sizeof(commas[0]); i++)
{
printf("<<%s>> ", commas[i]);
zap_trailing_commas(&commas[i][1]);
printf("<<%s>>\n", commas[i]);
}
return 0;
}
Sample output:
<<mov/1/1/1,0>> <<STR{7}>> <<r7>>
<<jsr>> <</0,0>> <<PRTSTR>>
<<mov/1/1/0,0>> <<STRADD{5}>> <<LASTCHAR>>
<<X,,,>> <<X>>
<<X,,>> <<X>>
<<X,>> <<X>>
<<X>> <<X>>
I also tested a variant with commas in place of the X's and that left the single comma alone.
