Shall conversion specifier "%%" match white spaces

According to the C standard the conversion specifier % is defined as:
% Matches a single % character; no conversion or assignment occurs. The
complete conversion specification shall be %%.
However this code:
#include <stdio.h>
int main(void)
{
    int n;
    /* sscanf returns the number of successful assignments */
    printf("%d\n", sscanf(" %123", "%% %d", &n));
    return 0;
}
compiled with gcc-11.1.0 gives the output 1 so apparently %% matched the " %" of the string.
This seems to be a violation of "Matches a single % character" as it also accepted the spaces in front of the % character.
Question: Is it correct according to the standard to accept white spaces as part of %% directive?

According to the C89 Standard, at least, "Input white-space characters [...] are skipped, unless the specification includes a [, c, or n specifier." (That's an old version of the Standard, but it's the one I had handy. But I don't imagine this has changed in more recent versions.)

I looked at the final C17 draft and there's actually a specific example showing that %% skips whitespace:
EXAMPLE 5 The call:
#include <stdio.h>
/* ... */
int n, i;
n = sscanf("foo %bar 42", "foo%%bar%d", &i);
will assign to n the value 1 and to i the value 42 because input
white-space characters are skipped for both the % and d conversion
specifiers.

Related

Printing Japanese characters in C program

I want to print Japanese characters using the C program.
I've found the Unicode range of some Japanese characters, converted them to decimal and used the for loop to print them:
setlocale(LC_ALL, "ja_JP.UTF8");
for (int i = 12784; i <= 12799; i++) {
printf("%c\n",i);
}
locale.h and wchar.h are included at the top of the file.
The output gives me only ?????????? characters.
Please let me know how it could be resolved.

%c is only able to print characters from 0 to 127, for extended characters use:
printf("%lc\n", i);
or better yet
wprintf(L"%lc\n", i);

In addition to @David Ranieri's fine answer, I wanted to explain about the "output gives me only ?????????? characters."
"%c" accepts an int argument. Recall a char passed to a ... function is converted to an int. Then
the int argument is converted to an unsigned char, and the resulting character is written. C17dr § 7.21.6.1 8.
Thus printf("%c" ... handles values 0-255. Values outside that range are converted into that range.
OP's code below re-written in hex.
// for (int i = 12784; i <= 12799; i++) {
for (int i = 0x31F0; i <= 0x31FF; i++) {
printf("%c\n",i);
}
With OP locale setting and implementation, printing values [0xF0 - 0XFF] resulted in '?'. I am confident that is true for [0x80 - 0xFF] for OP. Other possibilities exist. I received �.
Had OP done the below, more familiar output would be seen, though not the Hiragana characters desired.
for (int i = 0x3041; i <= 0x307E; i++) {
printf("%c",i);
}
ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

Lex/flex program to count ids, statements, keywords, operators etc

%{
#undef yywrap
#define yywrap() 1
#include<stdio.h>
int statements = 0;
int ids = 0;
int assign = 0;
int rel = 0;
int keywords = 0;
int integers = 0;
%}
DIGIT [0-9]
LETTER [A-Za-z]
TYPE int|char|bool|float|void|for|do|while|if|else|return|void
%option yylineno
%option noyywrap
%%
\n {statements++;}
{TYPE} {/*printf("%s\n",yytext);*/keywords++;}
(<|>|<=|>=|==) {rel++;}
'#'/[a-zA-Z0-9]* {;}
[a-zA-Z]+[a-zA-Z0-9]* {printf("%s\n",yytext);ids++;}
= {assign++;}
[0-9]+ {integers++;}
. {;}
%%
void main(int argc, char **argv)
{
FILE *fh;
if (argc == 2 && (fh = fopen(argv[1], "r"))) {
yyin = fh;
}
yylex();
printf("statements = %d ids = %d assign = %d rel = %d keywords = %d integers = %d \n",statements,ids,assign,rel,keywords,integers);
}
//Input file.c
#include<stdio.h>
void main(){
float a123;
char a;
char b123;
char c;
int ab[5];
int bc[2];
int ca[7];
int ds[4];
for( a = 0; a < 5 ;a++)
printf("%d ", a);
return 0;
}
output:
include
stdio
h
main
a123
a
b123
c
ab
bc
ca
ds
a
a
a
printf
d
a
statements = 14 ids = 18 assign = 1 rel = 3 keywords = 11 integers = 7
I am printing the identifiers on the go. #include<stdio.h> is being counted as identifier. How do I avoid this?
I have tried '#'/[a-zA-Z0-9]* {;} rule:action pair but it is still being counted as identifier. How is the file being tokenized?
Also the %d string in printf is being counted as an identifier. I have explicitly written that identifiers should only begin with letters, then why is %d being inferred as identifier?

I have tried '#'/[a-zA-Z0-9]* {;} rule:action pair but it [include] is still being counted as identifier. How is the file being tokenized?
Tokens are recognized one at a time. Each token starts where the previous token finished.
'#'/[a-zA-Z0-9]* matches the three-character sequence '#' (apostrophe, #, apostrophe) provided it is followed by [a-zA-Z0-9]*. You probably meant "#"/[a-zA-Z0-9]* (with double quotes), which would match a #, again provided it is followed by zero or more letters or digits. Note that only the # is matched; the pattern after the / is "trailing context", which is basically a lookahead assertion. In this case, the lookahead is pointless because [a-zA-Z0-9]* can match the empty string, so any # would be matched. In any event, after the # is matched as a token, the scan continues at the next character. So the next token would be include.
Because of the typo, that pattern does not match. (There are no apostrophes in the source.) So what actually matches is your "fallback" rule: the rule whose pattern is a single dot (.). (We call this a fallback rule because it matches anything. Really, it should be .|\n, since . matches anything but a newline, but as long as you have some rule which matches a newline character, it's acceptable to use a bare dot. If you don't supply a fallback rule, one will be inserted automatically by flex with the action ECHO.)
Thus, the # is ignored (just as it would have been if you'd written the rule as intended) and again the scan continues with the token include.
If you wanted to ignore the entire preprocessor directive, you could do something like
^[[:blank:]]*#.* { ; }
(from a comment) I am getting stdio and h as keywords, how does that fit the definition that I have given? What happened to the . in between?
After the < is ignored by the fallback rule, stdio is matched. Since [a-zA-Z]+[a-zA-Z0-9]* doesn't match anything other than letters and digits, the . is not considered part of the token. Then the . is matched and ignored by the fallback rule, and then h is matched.
Also the %d string in printf is being counted as an identifier.
Not really. The % is explicitly ignored by the fallback rule (as was the ") and then the d is matched as an identifier. If you want to ignore words in string literals, you will have to recognise and ignore string literals.

The #include directive is a preprocessor directive and is thus handled by the preprocessor. The preprocessor includes the header file and removes the #include directive, so after preprocessing, when the program is given to the compiler as input, it no longer contains any preprocessor directives like #include.
So you don't need to write code to detect #include, because the compiler never sees it and is not designed to tokenize the #include directive.
References: Is #include a token of type keyword?

Adding the following line in the rules section works for me:
#.* ;
Here rule is #.* and action is ;. The #.* will catch the line starting with # and ; will just do nothing so basically this would ignore the line starting with #.

What does the n stand for in `sscanf(s, "%d %n", &i, &n)`?

The man page states that the signature of sscanf is
int sscanf(const char *restrict s, const char *restrict format, ...);
I have seen an answer on SO in which sscanf is used like this to check whether an input was an integer.
bool is_int(char const* s) {
int n;
int i;
return sscanf(s, "%d %n", &i, &n) == 1 && !s[n];
}
Looking at !s[n] it seems to suggest that we check if sscanf scanned the character sequence until the termination character \0. So I assume n stands for the index where sscanf will be in the string s when the function ends.
But what about the variable i? What does it mean?
Edit:
To be more explicit: I see the signature of sscanf wants a pointer of type char * as its first parameter, a format string as its second parameter so it knows how to parse the character sequence, and as many variables as there are conversion specifiers as the remaining parameters. I understand now that i is for holding the parsed integer.
Since there is only one format specifier, I tried to deduce the function of n.
Is my assumption above for n correct?

Looks like the op has his answer already, but since I bothered to look this up for myself and run the code...
From "C The Pocket Reference" (2nd Ed by Herbert Schildt) scanf() section:
%n Receives an integer of value equal to the number of characters read so far
and for the return value:
The scanf() function returns a number equal to the number of fields
that were successfully assigned values
The sscanf() function works the same; it just takes its input from the supplied buffer argument ( s in this case ). The "== 1" test makes sure that only one integer was parsed and the !s[n] makes sure the input buffer is well terminated after the parsed integer and/or that there's really only one integer in the string.
Running this code, an s value like "32" gives a "true" value ( we don't have bool defined as a type on our system ) but s as "3 2" gives a "false" value because s[n] in that case is "2" and n has the value 2 ( "3 " is parsed to create the int in that case ). If s is " 3 " this function will still return true as all that white space is ignored and n has the value of 3.
Another example input, "3m", gives a "false" value as you'd expect.

Verbatim from sscanf()'s man page:
Conversions
[...]
n
Nothing is expected; instead, the number of characters
consumed thus far from the input is stored through the next pointer,
which must be a pointer to int. This is not a
conversion, although it can be suppressed with the * assignment-suppression character. The C
standard says: "Execution of
a %n directive does not increment the assignment count returned at the completion of
execution" but the Corrigendum seems to contradict this. Probably it is wise not
to make any assumptions on the effect of %n conversions on the return value.

I would like to examine whether the original code is safe, since n is left uninitialized:
bool is_int(char const* s) {
int n;
int i;
return sscanf(s, "%d %n", &i, &n) == 1 && !s[n];
}
I will explain why it works. And I will interpret the sscanf format string.
First, the uninitialized n:
Given input "1", which is the integer one, sscanf will store 1 into i. A white-space directive in the format matches zero or more white-space characters, so it succeeds even though nothing follows the 1, and %n then stores 1 into n. Because sscanf set i, the value returned by sscanf will be 1, meaning 1 field assigned (%n does not count). Since sscanf returns 1, the part of the expression
sscanf(s, "%d %n", &i, &n) == 1
will be true. Therefore the other part of the && expression will execute, and s[n] is s[1], the terminating null character. If %d fails instead (input such as "m"), n is never written, but sscanf does not return 1 either, so the && short-circuits and s[n] is never evaluated. Initializing n is still a cheap safeguard.
Interpreting the format:
"%d %n"
%d skips leading white space and then attempts to scan an optionally signed decimal integer; it does not accept a decimal point or scientific notation. The space that follows skips any amount of white space, including none. White space would be a space, \n, \t, and certain other non-printable characters. %n then sets n to the number of characters scanned to that point, including any white space skipped.
A simpler variant, if trailing characters after the number do not need to be rejected:
#include <stdbool.h>
#include <stdio.h>
static bool is_int(char const* s)
{
    int i;
    int fld;
    return (fld = sscanf(s, "%i", &i)) == 1;
}
int main(void)
{
    bool ans = false;
    ans = is_int("1"); /* true */
    ans = is_int("m"); /* false */
    (void)ans;
    return 0;
}
The idea is that if s begins with an integer, sscanf will scan it and fld will be exactly one. If it does not, fld will be zero or EOF (-1 on typical implementations): zero if something else is there, like a word, and EOF if there is nothing but an empty string.

The variable i receives the integer value that was read.
What are you trying to ask, though? It's not too clear! The code will (try to) read an integer from the string into 'i'.

sscanf function usage in c

I'm trying to parse xxxxxx(xxxxx) format string using sscanf as following:
sscanf(command, "%s(%s)", part1, part2)
but it seems like sscanf does not support this format and as a result, part1 actually contains the whole string.
If anyone has experience with this, please share.
Thank you

Converting your code into a program:
#include <stdio.h>
int main(void)
{
char part1[32];
char part2[32];
char command[32] = "xxxxx(yyyy)";
int n;
if ((n = sscanf(command, "%s(%s)", part1, part2)) != 2)
printf("Problem! n = %d\n", n);
else
printf("Part1 = <<%s>>; Part2 = <<%s>>\n", part1, part2);
return 0;
}
When run, it produces 'Problem! n = 1'.
This is because the first %s conversion specifier skips leading white space and then scans for 'non white-space' characters up to the next white space character (or, in this case, end of string).
You would need to use (negated) character classes or scansets to get the result you want:
#include <stdio.h>
int main(void)
{
char part1[32];
char part2[32];
char command[32] = "xxxxx(yyyy)";
int n;
if ((n = sscanf(command, "%31[^(](%31[^)])", part1, part2)) != 2)
printf("Problem! n = %d\n", n);
else
printf("Part1 = <<%s>>; Part2 = <<%s>>\n", part1, part2);
return 0;
}
This produces:
Part1 = <<xxxxx>>; Part2 = <<yyyy>>
Note the 31's in the format; they prevent overflows.
I'm wondering how %31 works. Does it work as %s and prevent overflow or does it just prevent overflow?
With the given data, these two lines are equivalent and both safe enough:
if ((n = sscanf(command, "%31[^(](%31[^)])", part1, part2)) != 2)
if ((n = sscanf(command, "%[^(](%[^)])", part1, part2)) != 2)
The %[...] notation is a conversion specification; so is %31[...].
The C standard says:
Each conversion specification is introduced by the character %.
After the %, the following appear in sequence:
An optional assignment-suppressing character *.
An optional decimal integer greater than zero that specifies the maximum field width
(in characters).
An optional length modifier that specifies the size of the receiving object.
A conversion specifier character that specifies the type of conversion to be applied.
The 31 is an example of the (optional) maximum field width. The [...] part is a scanset, which could perhaps be regarded as a special case of the s conversion specifier. The %s conversion specifier is approximately equivalent to %[^ \t\n].
The 31 is one less than the size of the array; the null at the end is not counted in the field width. Since part1 and part2 are each an array of 32 char, the %31[^(] or %31[^)] conversion specifiers prevent buffer overflows. If the first string of characters was more than 31 characters before the (, you'd get a return value of 1 because of a mismatch on the literal open parenthesis. Similarly, the second string would be limited to 31 characters, but you'd not easily be able to tell whether the ) was in the correct place or not.

If you know exactly how long are the parts of your "command", then the simplest option is:
sscanf(command, "%6s(%5s)", part1, part2);
This assumes that 'part1' is always 6 characters long and 'part2' is always 5 characters long (as in your code sample).

Try this instead:
#include <stdio.h>
int main(void)
{
char str1[20];
char str2[20];
sscanf("Hello(World!)", "%[^(](%[^)])", str1, str2);
printf("str1=\"%s\", str2=\"%s\"\n", str1, str2);
return 0;
}
Output (ideone):
str1="Hello", str2="World!"

In gcc How do I pad a numeric string with leading 0's?

I have the following code
#include <stdio.h>
#include <stdlib.h>
char UPC[]="123456789ABC";
main()
{
int rc=0;
printf("%016s\n",UPC);
exit(rc);
}
On AIX using the xlC compiler this code prints out with leading 0's
0000123456789ABC
On Sles 11 it prints leading spaces using gcc version 4.3.2 [gcc-4_3-branch revision 141291]
    123456789ABC
Is there some format specifier I could use for strings to print out with leading 0's?
I know that it works for numeric types.

printf("%.*d%s", (int)(w-strlen(s)), 0, s);

This behaviour is undefined, implementation specific (glibc, rather than gcc). It's bad practice to rely on it, IMO.
If you know for certain that your string is numeric (hexadecimal here), you could write:
printf("%016llX\n",strtoull(UPC,NULL,16));
But be aware of errors and overflows.
Edit by original poster:
For decimal numbers use the following:
printf("%016llu\n",strtoull(UPC2,NULL,10));

As far as printf is concerned, the effect of the 0 flag on the s conversion specifier is undefined. Are you restricted to printf?
[edit]
Here's an alternative, if you'd like (most error checks missing..):
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv)
{
char UPC[] = "1234567890ABCDEF",
out[sizeof UPC + 12]; /* substitute size for whatever you wish here, just make sure the original string plus NUL fits */
if(sizeof out <= sizeof UPC) {
return -1; /* you bad, bad man */
}
memset(out, '0', sizeof out);
memcpy(out + sizeof out - sizeof UPC - 1, UPC, sizeof UPC);
printf("%s\n", out);
return 0;
}

%0*s isn't standard. You can do your own padding though:
char buf[16];
memset(buf, '0', sizeof(buf));
printf("%.*s%s", (int)(sizeof(buf) - strlen(UPC)), buf, UPC); /* the * precision must be an int */

According to the printf man page, the 0 flag has undefined behavior for string arguments. You'd need to write your own code to pad out the right number of '0' characters.

The short answer is no.
Your code, when compiled, will result in : "warning: '0' flag used with ‘%s’"
The man page for printf lists the format specifiers that may be used after a '0' flag and they are all numeric.
You could, however, create a string with the appropriate number of zeros and print that ahead of your UPC.

char spbuf[30];
sprintf(spbuf, "%%.%ds%%s\n", (int)(16 - strlen(UPC))); /* %d needs an int, not a size_t */
printf(spbuf,"0000000000000000",UPC); /* 16 '0' characters */
