Macro function for printing with UB - c

I'm learning how to use macro functions and now faced some (most likely undefined) behavior. Here is an example:
#include <stdio.h>
#define FOO(a, b) { \
printf("%s%s\n", #a #b); \
}
int main(int argc, char * argv[]){
{ printf("%s%s\n", 1 2); } //compile error
FOO(1, 2); //prints 12 with some garbage
}
I'm most likely experiencing UB, but digging into N1570 did not give a clear explanation of this. The closest thing I found was 5.1.1.2(p4):
Preprocessing directives are executed, macro invocations are expanded,
and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal character name is
produced by token concatenation (6.10.3.3), the behavior is undefined.
Probably tokens "1" "2" were concatenated yielding UB, but I'm not sure.

Probably tokens "1" "2" were concatenated yielding UB, but I'm not sure.
You are correct.
"1" and "2" became "12", which was consumed by the first %s in printf(). The second %s then has no corresponding argument, hence the garbage values.
The compiler warnings agree too (of course):
prog.cc:4:12: warning: format '%s' expects a matching 'char*' argument [-Wformat=]
4 | printf("%s%s\n", #a #b); \
| ^~~~~~~~
prog.cc:9:5: note: in expansion of macro 'FOO'
9 | FOO(1, 2); //prints 12 with some garbage
| ^~~
prog.cc:4:16: note: format string is defined here
4 | printf("%s%s\n", #a #b); \
| ~^
| |
| char*
In your macro, change this:
printf("%s%s\n", #a #b);
to this:
printf("%s%s\n", #a, #b);
where the comma will do the trick, as @Blaze commented.
Note: For the hardcoded printf() call to work, you would need to make 1 and 2 strings; using a comma would not suffice. Example: printf("%s%s\n", "1", "2");.
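To make the effect of the comma concrete, here is a minimal sketch. FOO_TO_BUF and check_foo are hypothetical names, not from the question; the macro formats into a buffer instead of printing so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical variant of FOO that writes into a caller-supplied buffer.
   The comma between #a and #b makes them two separate arguments. */
#define FOO_TO_BUF(buf, size, a, b) \
    snprintf((buf), (size), "%s%s\n", #a, #b)

void check_foo(void)
{
    char buf[8];
    /* Expands to snprintf(buf, sizeof buf, "%s%s\n", "1", "2") */
    int written = FOO_TO_BUF(buf, sizeof buf, 1, 2);

    assert(written == 3);               /* '1', '2', '\n' */
    assert(strcmp(buf, "12\n") == 0);
}
```

Both %s conversions now have a matching argument, so the call is well defined.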

FOO(1, 2) expands to { printf("%s%s\n", "1" "2"); }. Adjacent string literals are then concatenated (translation phase 6), yielding printf("%s%s\n", "12").
This is not a correct call to printf, and the behavior is undefined. The relevant part of the standard is this:
7.21.6.1 The fprintf function
...
2 ... If there are insufficient arguments for the format, the behavior is undefined.
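The concatenation itself can be observed without calling printf at all. The sketch below assumes a hypothetical macro JOINED that, like the original FOO, puts no comma between the stringified operands.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro: no comma between the two stringified operands,
   so the result is "1" "2", which the compiler merges into "12". */
#define JOINED(a, b) #a #b

void check_joined(void)
{
    static const char text[] = JOINED(1, 2);

    assert(sizeof text == 3);           /* '1', '2', '\0': one literal */
    assert(strcmp(text, "12") == 0);
}
```

Since only one string reaches printf, the format "%s%s\n" has one argument too few, which is the situation 7.21.6.1 describes.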

Related

C lexer: token concatenation of unterminated string literals

Consider the following c code:
#include <stdio.h>
#define PRE(a) " ## a
int main() {
printf("%s\n", PRE("));
return 0;
}
If we adhere strictly to the tokenization rules of c99, I would expect it to break up as:
...
[#] [define] [PRE] [(] [a] [)] ["]* [##] [a]
...
[printf] [(] ["%s\n"] [,] [PRE] [(] ["]* [)] [)] [;]
...
* A single non-whitespace character that does not match any preprocessing-token pattern
And thus, after running preprocessing directives, the printf line should become:
printf("%s\n", "");
And parse normally. But instead, it throws an error when compiled with gcc, even when using the flags -std=c99 -pedantic. What am I missing?
From C11 6.4. Lexical elements:
preprocessing-token:
header-name
identifier
pp-number
character-constant
string-literal
punctuator
each non-white-space character that cannot be one of the above
3 [...] The categories of preprocessing tokens are: header names,
identifiers, preprocessing numbers, character constants, string
literals, punctuators, and single non-white-space characters that do
not lexically match the other preprocessing token categories.69) If a
' or a " character matches the last category, the behavior is
undefined. [...]
So if " is not part of a string-literal, but is a non-white-space character, the behavior is undefined. I do not know why it's undefined and not a hard error - I think it's to allow compilers to parse multiline string literals.
it throws an error
But on godbolt:
<source>:3:16: warning: missing terminating " character
3 | #define PRE(a) " ## a
| ^
<source>:6:24: warning: missing terminating " character
6 | printf("%s\n", PRE("));
| ^
<source>:8: error: unterminated argument list invoking macro "PRE"
8 | }
|
<source>: In function 'int main()':
<source>:6:20: error: 'PRE' was not declared in this scope
6 | printf("%s\n", PRE("));
| ^~~
<source>:6:20: error: expected '}' at end of input
<source>:5:12: note: to match this '{'
5 | int main() {
| ^
The error is reported not on the #define PRE line (it could be), but on the PRE(") line. Tokens are recognized before macro substitution (translation phase 3 versus phase 4), so whatever you do, you cannot create new string literals as a result of macro substitution, for example by gluing two pieces together the way you are trying to. Note that -pedantic will not turn the warning into an error: -pedantic only issues the diagnostics the standard requires, and here the standard says the behavior is undefined, so no diagnostic is required.
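If the goal is simply to produce a string literal from a macro argument, the # operator is the supported route. A minimal sketch, assuming a hypothetical macro named QUOTE (C99 allows the argument to be empty, in which case the result is ""):

```c
#include <assert.h>
#include <string.h>

#define QUOTE(a) #a   /* hypothetical name; stringifies its argument */

void check_quote(void)
{
    /* An empty argument stringifies to the empty string literal. */
    assert(strcmp(QUOTE(), "") == 0);

    /* Ordinary tokens are spelled out, separated by single spaces. */
    assert(strcmp(QUOTE(a b), "a b") == 0);

    /* A complete string literal argument keeps its quotes, escaped. */
    assert(strcmp(QUOTE("x"), "\"x\"") == 0);
}
```

Every argument here is made of complete preprocessing tokens, so no lone " character is ever involved.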

C macro stringifying a variable

When can I pass variable's value to a macro for stringifying?
For example the code taken from this post works with a constant-defined macro.
#define MAX_STRING_LENGTH 20
#define STRINGIFY(x) STRINGIFY2(x)
#define STRINGIFY2(x) #x
{
...
char word[MAX_STRING_LENGTH+1];
scanf("%" STRINGIFY(MAX_STRING_LENGTH) "s", word);
...
}
However I cannot use it with a variable such as:
{
...
int val = 20;
char word[MAX_STRING_LENGTH+1];
scanf("%" STRINGIFY(val) "s", word);
...
}
since the compilation is successful with this warning:
warning: invalid conversion specifier 'v' [-Wformat-invalid-specifier]
scanf("%" STRINGIFY(var) "s", word);
~~~^~~~~~~~~~~~~
test2.c:4:22: note: expanded from macro 'STRINGIFY'
#define STRINGIFY(x) STRINGIFY2(x)
^
test2.c:5:23: note: expanded from macro 'STRINGIFY2'
#define STRINGIFY2(x) #x
^
<scratch space>:466:2: note: expanded from here
"var"
^
1 warning generated
but the run of the code does not wait for any input.
On the contrary in this other post it was possible to pass a variable to this macro:
#define PRINT(int) printf(#int "%d\n",int)
...
int var =8;
PRINT(var);
What is the difference between the two cases? How can I modify the first one so that it accepts also variables?
I tried using %d inside the macro but I was not successful.
The preprocessor always operates on tokens only.
A macro is not a function. You don't pass it a variable (by value). You pass a token sequence. In STRINGIFY(MAX_STRING_LENGTH) the token sequence is MAX_STRING_LENGTH, and in STRINGIFY(val) it's the token sequence val.
MAX_STRING_LENGTH is itself a macro, and due to how STRINGIFY is defined to work, the macro will be expanded by the preprocessor before it is turned into a string literal. So 20 is in turn the token which gets # applied to it, and it produces "20" as a string literal.
On the other hand, val is not a macro, so the preprocessor is not going to expand it. It's going to keep the token sequence as val. The fact that val is the name of a variable with some value means nothing to the preprocessor; it only cares about tokens. So val is transformed into the literal "val".
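The three cases can be checked directly. This sketch reuses the macros from the question; check_stringify is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>

#define MAX_STRING_LENGTH 20
#define STRINGIFY(x) STRINGIFY2(x)
#define STRINGIFY2(x) #x

void check_stringify(void)
{
    int val = 20;
    (void)val;  /* the preprocessor never sees this value */

    /* Two levels: the macro argument is expanded to 20 first. */
    assert(strcmp(STRINGIFY(MAX_STRING_LENGTH), "20") == 0);

    /* One level: # is applied to the unexpanded argument. */
    assert(strcmp(STRINGIFY2(MAX_STRING_LENGTH), "MAX_STRING_LENGTH") == 0);

    /* val is not a macro, so its spelling is all there is to stringify. */
    assert(strcmp(STRINGIFY(val), "val") == 0);
}
```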
The example you brought from another post worked because it expanded to this:
printf("var" "%d\n", var);
The variable name in #int turns into a literal; there is no magic that lets the preprocessor read a variable's value. The fact that var8 is printed is only because var is passed as an argument to printf! Its value is printed at run time by the %d specifier.
Finally, when experimenting with the preprocessor, it's always helpful to look at the source file after preprocessing is done, but before the file is compiled. The gcc -E flag (or the equivalent for your compiler) can help you do that.
STRINGIFY(val) will result in "val", not the value you wanted to stringify, so you get a final format string of "%vals" ("%" "val" "s"). That's how the C preprocessor works, it does just text replacements, nothing more.
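If the width really is only known at run time, one option is to build the format string at run time instead of asking the preprocessor for it. This is a sketch under assumptions: read_word and check_read_word are hypothetical helpers, and sscanf is used in place of scanf so the example does not depend on terminal input.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads at most `width` non-whitespace characters
   from `input` into `out`, which must hold at least width + 1 bytes.
   Returns 1 on success, 0 otherwise. */
int read_word(const char *input, int width, char *out, size_t out_size)
{
    char format[32];
    int len;

    if (width < 1 || (size_t)width >= out_size)
        return 0;

    /* Build "%<width>s" at run time, e.g. "%4s" for width == 4. */
    len = snprintf(format, sizeof format, "%%%ds", width);
    if (len < 0 || (size_t)len >= sizeof format)
        return 0;

    return sscanf(input, format, out) == 1;
}

void check_read_word(void)
{
    char word[21];
    int val = 4;
    int ok = read_word("abcdefgh", val, word, sizeof word);

    assert(ok == 1);
    assert(strcmp(word, "abcd") == 0);  /* limited to val characters */
}
```

The width check against out_size is what keeps the conversion from overrunning the buffer; the macro version gets the same guarantee by sizing the array from the same constant.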
The PRINT example:
#define PRINT(int) printf(#int "%d\n", int)
PRINT(var); // to be resolved
printf(#var "%d\n", var); // intermediate result
printf("var" "%d\n", var); // final result, this is what the C compiler sees
But why did it work with MAX_STRING_LENGTH?
#define MAX_STRING_LENGTH 20
#define STRINGIFY(x) STRINGIFY2(x)
#define STRINGIFY2(x) #x
STRINGIFY(MAX_STRING_LENGTH) // to be resolved
STRINGIFY2(20) // intermediate step; STRINGIFY2 known as macro, thus:
#20 // another intermediate step
"20" // final result

Printf format specifier and fields outside quotes

I came across the following code:
#define ERROR 0
#define WARN 1
#define INFO 2
#define DEBUG 3
extern int log_level;
char const *LEVEL_TO_STRING[] = { "ERROR", "WARN", "INFO", "DEBUG" };
#define LOG(level, s, ...) \
do \
{ \
if(level <= log_level) \
printf( "[%s] " s "\n", LEVEL_TO_STRING[level], ##__VA_ARGS__); \
} \
while(0)
I do not understand what the s is doing outside the quotes in the printf statement. I tried searching for what this is and how it works, but I'm not sure what to look for. Could someone explain to me how this code works?
As a follow-up, is it possible to write code like the example above outside a macro? The closest I've seen to this is using format specifiers:
#define FORMAT "ld"
long num = 1000000;
printf("%" FORMAT "\n", num);
It would help to understand how these two cases work internally, and why C does not let me do something like, printf("%s" s "\n", string1, string2) as is done in the macro above.
EDIT : Not a clean dup of How does concatenation of two string literals work? because this post is specific to printf (and format specifiers) as it relates to macros. Also, there is useful information in the responses to this post that isn't available in the other.
I do not understand what the s is doing outside the quotes in the printf statement
In order to see what happens you need to recall that s is replaced with the second parameter of LOG macro in the text of the program. The only way that this could work is when s is a string literal, because C merges them. In other words, there is no difference between
"quick brown fox"
and
"quick" " brown " "fox"
These two forms of writing a string literal are equivalent.
In the same way, defining FORMAT as "ld" in
printf("%" FORMAT "\n", num);
is equivalent to
printf("%ld\n", num);
and is legal.
why C does not let me do something like, printf("%s" s "\n", string1, string2) as is done in the macro above?
Passing anything other than a string literal is illegal:
char FORMAT[] = "ld";
printf("%" FORMAT "\n", num); // <<== Does not compile
s and FORMAT in your code must be not just strings, but string literals:
#define s "[%s]"
...
printf("%s" s "\n", string1, string2); // This compiles
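The standard library relies on the same mechanism: the PRI* macros from <inttypes.h> expand to string literals that are meant to be pasted into a format string. In this sketch SUFFIX and check_literal_concat are hypothetical names; PRId64 is the standard macro.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

#define SUFFIX " items"   /* hypothetical macro expanding to a literal */

void check_literal_concat(void)
{
    char buf[32];
    int64_t num = 1000000;

    /* PRId64 expands to a string literal such as "ld" or "lld", so the
       three adjacent literals below merge into one format string. */
    snprintf(buf, sizeof buf, "%" PRId64 SUFFIX, num);

    assert(strcmp(buf, "1000000 items") == 0);
}
```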
"[%s] " s "\n"
When s is defined as a macro (i.e. using #define) that expands to a string literal, everything is concatenated together into one string.
Since the substitution happens during preprocessing, the compiler only ever sees adjacent string literals, so it won't be flagged as an error. In all other cases, you should get a syntax error.
The key is the line continuation '\' at the end of each line of the definition. The code defines a function-like macro LOG which does the specified logging.
Apparently the user of the macro can specify their own format string in s and give the arguments in ... -> ##__VA_ARGS__
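A minimal sketch of how such a macro behaves when used. LOG_TO_BUF and check_log are hypothetical: the variant writes into a buffer rather than stdout so the output can be inspected, and log_level is defined locally for the demo. Note that ##__VA_ARGS__ is a GNU extension (accepted by GCC and Clang), not ISO C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ERROR 0
#define WARN 1
#define INFO 2
#define DEBUG 3

static const char *const LEVEL_TO_STRING[] = { "ERROR", "WARN", "INFO", "DEBUG" };
static int log_level = INFO;   /* assumed setting for this demo */

/* Hypothetical buffer-writing variant of LOG. `s` must be a string
   literal, because it is pasted between "[%s] " and "\n". */
#define LOG_TO_BUF(buf, size, level, s, ...)                      \
    do {                                                          \
        if ((level) <= log_level)                                 \
            snprintf((buf), (size), "[%s] " s "\n",               \
                     LEVEL_TO_STRING[(level)], ##__VA_ARGS__);    \
    } while (0)

void check_log(void)
{
    char buf[64] = "";

    /* Format becomes "[%s] disk %d%% full\n" after concatenation. */
    LOG_TO_BUF(buf, sizeof buf, WARN, "disk %d%% full", 91);
    assert(strcmp(buf, "[WARN] disk 91% full\n") == 0);

    /* DEBUG is above log_level, so nothing is written. */
    buf[0] = '\0';
    LOG_TO_BUF(buf, sizeof buf, DEBUG, "not shown");
    assert(buf[0] == '\0');
}
```

The do { ... } while (0) wrapper lets the macro be used like a single statement, including after an if without braces.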

Printing define name c

#define Page 5
void printSystemInfo() {
printf ("%i", Page);
}
That's my code. Can anyone explain to me how to print Page 5 in the console?
For now my console shows "5", but I want to have "Page 5".
Thanks for helping!
You can use a little preprocessor trick. We have the # operator, which will convert a symbol into a string.
#define _(a) #a
When you call _(foo), it translates it as "foo". So, in your case, you could do something like:
#include <stdio.h>
#define _(a) # a
#define PAGE 5
int main(int argc, char *argv[])
{
printf("%s: %i\n", _(PAGE), PAGE);
return 0;
}
What this will do is:
We define a macro named _ that takes one parameter a. This macro uses the preprocessor's # operator (called stringification). This will cause a name passed to the macro to be converted into a string. Example: _(foo) gets translated to "foo".
In main, the printf() call is then translated as printf("%s: %i\n", "PAGE", 5);. In a stepwise way, when the preprocessor sees _(PAGE), it translates it to "PAGE".
The inner workings of this are explained in the stringification section of the C Preprocessor manual, which I quote:
Sometimes you may want to convert a macro argument into a string constant. Parameters are not replaced inside string constants, but you can use the ‘#’ preprocessing operator instead. When a macro parameter is used with a leading ‘#’, the preprocessor replaces it with the literal text of the actual argument, converted to a string constant. Unlike normal parameter replacement, the argument is not macro-expanded first. This is called stringification.
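Applied to the macro from the question, the same idea gives the "Page 5" output that was asked for. NAME_OF and check_page are hypothetical names; snprintf is used instead of printf only so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define Page 5
#define NAME_OF(a) #a   /* hypothetical name; one level, so no expansion */

void check_page(void)
{
    char buf[16];

    /* NAME_OF(Page) becomes "Page"; the bare Page becomes 5. */
    snprintf(buf, sizeof buf, "%s %i", NAME_OF(Page), Page);

    assert(strcmp(buf, "Page 5") == 0);
}
```

A single-level macro is the right choice here precisely because the argument must not be expanded before # is applied.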
Here you go. This is very trivial stuff, but please ask if something is unclear.
#define Page 5
void printSystemInfo()
{
printf((char const[])??<0120,0141,0147,0145,0040,0045,0151,!"bad"??>,Page);
}

Having troubles formatting scanf while keeping my code understandable

I need a function to read a file name, with a max length of MAX_FILE_NAME_SIZE, which is a symbolic constant, I did this the following way:
char * readFileName()
{
char format[6];
char * fileName = malloc(MAX_FILE_NAME_SIZE * sizeof(fileName[0]));
if(fileName== NULL)
return NULL;
sprintf(format, "%%%ds", MAX_FILE_NAME_SIZE-1);
scanf(format, fileName);
fileName= realloc(fileName, (strlen(fileName) + 1)*sizeof(fileName[0])); /* + 1 keeps room for the terminating null */
return fileName;
}
I'd really like to get rid of the sprintf part (and also the format array). What's the cleanest and most efficient way to do this?
Solution
You can make a little Preprocessor hack:
#include <stdio.h>
#include <string.h>
#define MAX_BUFFER 30
#define FORMAT(s) "%" #s "s"
#define FMT(s) FORMAT(s)
int main(void)
{
char buffer[MAX_BUFFER + 1];
scanf(FMT(MAX_BUFFER), buffer);
printf("string: %s\n", buffer);
printf("length: %zu\n", strlen(buffer)); /* strlen returns size_t */
return 0;
}
Both the FORMAT and FMT macros are necessary for the preprocessor to translate this correctly. If you call FORMAT directly with FORMAT(MAX_BUFFER), it will translate into "%" "MAX_BUFFER" "s", which is no good.
You can verify that using gcc -E scanf.c. However, if you call it through another macro, that macro will effectively resolve the macro names for you, and the result translates to "%" "30" "s", which is a fine format string for scanf.
Edit
As correctly pointed out by @Jonathan Leffler in the comments, you can't do any math on that macro, so you need to declare buffer with one extra character for the terminating null byte, since the macro expands to %30s, which will read up to 30 characters and then append the null byte.
So the correct buffer declaration should be char buffer[MAX_BUFFER + 1];.
Requested Explanation
As asked in the comments, the one-macro version won't work because the preprocessor operator # turns an argument into a string (stringification, see below). So, when you call it with FORMAT(MAX_BUFFER), it just stringifies MAX_BUFFER instead of macro-expanding it, giving you the result: "%" "MAX_BUFFER" "s".
Section 3.4 Stringification of the C Preprocessor Manual says this:
Sometimes you may want to convert a macro argument into a string constant. Parameters are not replaced inside string constants, but you can use the ‘#’ preprocessing operator instead. When a macro parameter is used with a leading ‘#’, the preprocessor replaces it with the literal text of the actual argument, converted to a string constant. Unlike normal parameter replacement, the argument is not macro-expanded first. This is called stringification.
This is the output of the gcc -E scanf.c command on a file with the one macro version (the last part of it):
int main(void)
{
char buffer[30 + 1];
scanf("%" "MAX_BUFFER" "s", buffer);
printf("string: %s\n", buffer);
printf("length: %d\n", strlen(buffer));
return 0;
}
As expected. Now, for the two levels, I couldn't explain better than the documentation itself, and in the last part of it there's an actual example of this specific case (two macros):
If you want to stringify the result of expansion of a macro argument, you have to use two levels of macros.
#define xstr(s) str(s)
#define str(s) #s
#define foo 4
str (foo)
==> "foo"
xstr (foo)
==> xstr (4)
==> str (4)
==> "4"
s is stringified when it is used in str, so it is not macro-expanded first. But s is an ordinary argument to xstr, so it is completely macro-expanded before xstr itself is expanded (see Argument Prescan). Therefore, by the time str gets to its argument, it has already been macro-expanded.
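The difference between the one-level and two-level versions, and the reason the buffer needs one extra byte, can be checked in a few lines. This sketch assumes a hypothetical MAX_WORD constant (kept small for the demo) and uses sscanf in place of scanf so it needs no input.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 5               /* hypothetical; small for the demo */
#define FORMAT(s) "%" #s "s"
#define FMT(s) FORMAT(s)

void check_fmt(void)
{
    char buffer[MAX_WORD + 1];   /* + 1 for the terminating null */
    int converted;

    /* Two levels: MAX_WORD is expanded to 5 before # is applied. */
    assert(strcmp(FMT(MAX_WORD), "%5s") == 0);

    /* One level: the name itself is stringified. */
    assert(strcmp(FORMAT(MAX_WORD), "%MAX_WORDs") == 0);

    /* %5s stores at most 5 characters plus the terminating null. */
    converted = sscanf("abcdefgh", FMT(MAX_WORD), buffer);
    assert(converted == 1);
    assert(strcmp(buffer, "abcde") == 0);
}
```

This only works while the constant is defined as a plain decimal number; a definition such as (5) or 2+3 would be stringified as written and produce an invalid conversion specification.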
Resource
The C Preprocessor
