Meaning of %! and %K with printf - c

I am trying to rewrite the printf function, and I found a strange result when using the format specifier (%) with ! or K. I want to understand why I get this result.
printf("%!");
printf("%K");
I get the output ! and K.
Thanks for all responses.

According to §7.21.6.1 ¶9 of the ISO C11 standard, using an invalid conversion specification will result in undefined behavior.
Therefore, you cannot rely on any specific behavior. Anything may happen. The behavior may differ between compilers. Even on the same compiler, the behavior may change if, for example, you update the compiler to a different version or compile with a different optimization level.
If you want the behavior to be well-defined, so that you can rely on a specific behavior, then you should only use valid conversion specifiers. You can use the %% conversion specification to print a literal %.

printf("%!"); and printf("%K"); both have undefined behavior. Any result is possible.
As part of the OP's rewrite, the code may do anything in these cases.
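To make that freedom concrete, here is a minimal sketch of a printf-like formatter. The names mini_format and put_char are hypothetical, and the formatter only supports %d, %s and %%. Echoing an unknown specifier character is just one choice an implementation might make, which happens to match the output reported in the question; it is not behavior the standard requires.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: append one character if there is room,
 * always leaving space for the terminating '\0'. */
static void put_char(char *out, size_t cap, size_t *len, char c)
{
    if (*len + 1 < cap)
        out[(*len)++] = c;
}

/* Hypothetical mini_format(): supports only %d, %s and %%.
 * For any other character after '%', it copies that character through,
 * which is one possible outcome the standard allows, not a required one. */
size_t mini_format(char *out, size_t cap, const char *fmt, ...)
{
    assert(out != NULL && fmt != NULL && cap > 0);

    va_list ap;
    size_t len = 0;
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%') {
            put_char(out, cap, &len, *p);
            continue;
        }
        p++;                       /* look at the conversion specifier */
        if (*p == '\0')
            break;                 /* lone '%' at the end: stop */
        switch (*p) {
        case 'd': {
            char tmp[32];
            snprintf(tmp, sizeof tmp, "%d", va_arg(ap, int));
            for (const char *t = tmp; *t != '\0'; t++)
                put_char(out, cap, &len, *t);
            break;
        }
        case 's':
            for (const char *s = va_arg(ap, const char *); *s != '\0'; s++)
                put_char(out, cap, &len, *s);
            break;
        case '%':
            put_char(out, cap, &len, '%');
            break;
        default:                   /* unknown specifier such as '!' or 'K' */
            put_char(out, cap, &len, *p);
            break;
        }
    }
    va_end(ap);
    out[len] = '\0';
    return len;
}
```

Calling mini_format(buf, sizeof buf, "%!") on a char array buf stores "!" in it. A different rewrite could just as legitimately print nothing or report an error for the same input.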

Related

Choose the lesser evil incorrect Printf() statements: Fewer parameters vs extra parameters

A. printf("Values: X=%s Y=%s\n", x,y,z);
B. printf("Values: x=%s, Y=%s\n", x);
Both of the above printf() statements are incorrect: one has extra parameters, other has fewer parameters. I would like to choose between the lesser evil with an explanation. Can a modern C compiler help catch such problems? If yes, how does printf() implementor need to assist the compiler?
Both of the above printf() statements are incorrect: one has extra parameters, other has fewer parameters.
The first one is not incorrect according to the C standard. The rules for function calls in general, in C 2018 6.5.2.2, do not make it an error to pass unused arguments for a ... in the function prototype. For printf specifically, C 2018 7.21.6.1 2 (about fprintf, which the specification for printf refers to) says extra arguments are harmless:
… If the format is exhausted while arguments remain, the excess arguments are evaluated (as always) but are otherwise ignored…
Certainly if a programmer writes printf("Values: X=%s. Y=%s.\n", x, y, z);, they might have made a mistake, and a compiler would be reasonable in pointing out this possibility. However, consider code such as:
printf(ComputedFormat, x, y, z);
Here it is reasonable that we wish to print different numbers of values in different circumstances, and the ComputedFormat reflects this. It would be tedious to write code for each case and dispatch to them with a switch statement. It is simpler to write one call and let the computed format determine how many values are printed. So it is not always an error to have more arguments than the conversion specifications use.
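A minimal sketch of that pattern, assuming a hypothetical helper format_coords that always passes three values and lets the selected format decide how many are consumed (snprintf is used instead of printf only so the result can be inspected):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the first `dims` coordinates (1 to 3).
 * All three values are always passed; the chosen format decides how
 * many are consumed, and the excess ones are evaluated but ignored. */
int format_coords(char *buf, size_t cap, int dims, int x, int y, int z)
{
    assert(buf != NULL && dims >= 1 && dims <= 3);

    static const char *const formats[] = { "%d", "%d,%d", "%d,%d,%d" };
    return snprintf(buf, cap, formats[dims - 1], x, y, z);
}
```

With dims equal to 2 and the values 1, 2, 3, the buffer receives "1,2" and the third value is simply ignored. Because the format is not a string literal here, the compiler generally cannot check it against the arguments, so this convenience trades away some diagnostics.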
I would like to choose between the lesser evil with an explanation.
The behavior of the latter code is not defined by the C standard. C 2018 7.21.6.1 2 also says:
… If there are insufficient arguments for the format, the behavior is undefined…
Thus, no behavior may be relied on from the latter code, unless there is some guarantee from the C implementation.
Can a modern C compiler help catch such problems?
Good modern C compilers have information about the specification of printf and, when the format argument is a string literal, they compare the number and types of the arguments to the conversion specifications in the string.
If yes, how does printf() implementor need to assist the compiler?
The implementor of printf does not need to do anything except conform to the specification of printf in the C standard. The aid described above is performed by the C compiler with reference to the C standard; it does not rely on features of the particular printf implementation.
In some platforms, information about the number of arguments passed is provided to the called routine. In such platforms, a printf implementor could check whether too few arguments are provided and signal an error in some method.
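For a user-written printf-like function, the situation is a little different: the compiler has no built-in knowledge of it. GCC and Clang offer the format function attribute (a compiler extension, not part of the C standard) to request the same checking. The sketch below assumes a hypothetical wrapper named format_message and a hypothetical macro PRINTF_LIKE that hides the attribute on other compilers.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* The format attribute is a GCC/Clang extension, so guard it. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* Hypothetical wrapper: parameter 3 is the format, varargs start at 4. */
int format_message(char *buf, size_t cap, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

int format_message(char *buf, size_t cap, const char *fmt, ...)
{
    assert(buf != NULL && fmt != NULL);

    va_list ap;
    va_start(ap, fmt);
    int written = vsnprintf(buf, cap, fmt, ap);
    va_end(ap);
    return written;
}
```

With the attribute in place and format warnings enabled (for example through -Wall in GCC), a call such as format_message(buf, sizeof buf, "x=%s, Y=%s\n", x) should be diagnosed in the same way as the equivalent printf call.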
Eric Postpischil has already made a great answer that uses the most reliable source (the C standard), but I just want to post my own answer about why printf may behave as it does in both cases.
printf is a variadic function which can take a variable number of arguments. The way it knows how many you have passed is solely through the format string; every time it finds a format specifier, it takes the next argument out of the list (and assumes its type from which specifier has been used). Nothing would really happen to any extra arguments because since there is no specifier for them, the function will not even try to take them and they will not be printed. So you may be warned about the extra arguments by the compiler, but the behavior in the first example is well-defined.
The second, on the other hand, is definitely undefined behavior. Since there are not enough arguments to match the number of format specifiers in the string, eventually when it finds the second %s, it will try to take the next variadic argument, but the issue is that you haven't passed any. When this happens for me, it prints some garbage value in place of the format specifier that doesn't look too nice. Anything could happen in undefined behavior though. In this case, the function seems to try to take the next variadic argument from a CPU register / the stack (memory) and fetches some garbage value that happened to be there (though again, anything could happen with undefined behavior).
So in short:
printf("%s\n", "Hello", "World");
        |       |       ^^^^^^^ Ignored
        ---------
and
printf("%s\n"); ?
        |       |
        ---------
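Since the format string is the only source of the argument count, a rough estimate of how many arguments a format would consume can be computed by scanning it. The helper below, count_format_args, is a hypothetical and deliberately simplified sketch: it does not validate conversion specifiers and ignores POSIX positional arguments such as %1$d.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: estimates how many variadic arguments a printf-style
 * format would consume. Simplified: it does not validate the specifier and
 * ignores POSIX positional arguments such as %1$d. */
size_t count_format_args(const char *fmt)
{
    assert(fmt != NULL);

    size_t count = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;                       /* "%%" consumes no argument */
        /* Skip flags, width, precision and length modifiers. */
        while (*p != '\0' && strchr("-+ #0123456789.*hlLjzt", *p) != NULL) {
            if (*p == '*')
                count++;                    /* '*' takes an int argument */
            p++;
        }
        if (*p == '\0')
            break;                          /* incomplete specification */
        count++;                            /* the conversion itself */
    }
    return count;
}
```

For the format "Values: X=%s Y=%s\n" this returns 2, so a call that passes only one argument after the format is one short, which is the undefined case described above.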

Printing statement in c

What happens if we give printf a format string with more conversion specifiers than arguments, like this:
printf("%d, %d, %d", a, b);
What does the third %d print?
I tried it, but I am not able to understand the output of the code.
The third %d results in undefined behavior because there is no corresponding argument.
From the C Standard (7.21.6.1 The fprintf function)
9 If a conversion specification is invalid, the behavior is
undefined.275) If any argument is not the correct type for the
corresponding conversion specification, the behavior is undefined.
Note that the name of the standard function is printf, not Printf.
In C, functions that take variadic arguments (i.e. the ... parameter) have no way of knowing beforehand the number or type of the arguments. They must be kept track of in some way. A separate length parameter is one way, but the printf-family functions use the number of format specifiers in the format string to keep track.
If you tell the function "hey, there's a third parameter" when you don't pass one, this is undefined behavior. Anything can happen. It may appear to not print anything. It may read a garbage value from the memory location or the register where it expects to find the value. It may crash.
Reasoning about what might happen when your code invokes undefined behavior is a waste of time. Just make sure your code is free of it.
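The "separate length parameter" approach mentioned above can be sketched as follows. The function name sum_ints is hypothetical. The point is that the callee trusts the count in exactly the way printf trusts its format string.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: the caller states how many int arguments follow. */
int sum_ints(int count, ...)
{
    assert(count >= 0);

    va_list ap;
    int total = 0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* reading more than were passed is undefined */
    va_end(ap);
    return total;
}
```

sum_ints(3, 1, 2, 3) yields 6. Calling sum_ints(3, 1, 2) would make the loop read an argument that was never passed, which is the same kind of undefined behavior as the missing argument for the third %d.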
You've told printf to expect 3 additional int arguments, but only passed 2. It's going to look for that third int argument somewhere, and depending on how printf is implemented on your system, you may get a runtime error, or you may see garbage output, or you may see no output, or something entirely different may happen.
Officially, the behavior is left undefined - neither the compiler nor the runtime environment the program is executing in are required to handle the situation in any particular way. The result isn't guaranteed to be predictable or repeatable.
There is no "should" here - any result is "correct" as far as the language definition is concerned.

Why printf is not able to handle flags, field width and precisions properly?

I'm trying to discover all the capabilities of printf, and I have tried this:
printf("Test:%+*0d", 10, 20);
which prints
Test:%+100d
I first used the flag +, then the width *, and then another flag, 0.
Why does it produce this output? I purposely used printf() in a bad way, but I wonder why it shows me the number 100.
This is because you're supplying syntactic nonsense in the format string, so the implementation is free to do whatever it wants. Related reading: undefined behavior.
Compile your code with warnings enabled and it will tell you something like
warning: unknown conversion type character ‘0’ in format [-Wformat=]
printf("Test:%+*0d", 10, 20);
^
To be correct, the statement should be either of
printf("Test:%+*.0d", 10, 20); // note the '.'
where, the 0 is used as a precision
Related, quoting the C11, chapter §7.21.6.1, (emphasis mine)
An optional precision that gives the minimum number of digits to appear for the d, i,
o, u, x, and X conversions, the number of digits to appear after the decimal-point
character for a, A, e, E, f, and F conversions, the maximum number of significant
digits for the g and G conversions, or the maximum number of bytes to be written for s conversions. The precision takes the form of a period (.) followed either by an
asterisk * (described later) or by an optional decimal integer; if only the period is
specified, the precision is taken as zero. If a precision appears with any other
conversion specifier, the behavior is undefined.
printf("Test:%+0*d", 10, 20);
where, the 0 is used as a flag. As per the syntax, all the flags should appear together, before any other conversion specification entry, you cannot just put it anywhere in the conversion specification and expect the compiler to follow your intention.
Again, to quote, (and my emphasis)
Each conversion specification is introduced by the character %. After the %, the following
appear in sequence:
Zero or more flags (in any order) [...]
An optional minimum field width [...]
An optional precision [...]
An optional length modifier [...]
A conversion specifier [....]
Your printf format is incorrect: the flags must precede the width specifier.
After it handles * as the width specifier, printf expects either a . or a length modifier or a conversion specifier, 0 being none of these, the behavior is undefined.
Your library's implementation of printf does something bizarre, it seems to handle * by replacing it with the actual width argument... A side effect of the implementation. Others may do something else, including aborting the program. Such a format error would be especially risky if followed by a %s conversion.
Changing your code to printf("Test:%+0*d", 10, 20); should produce the expected output:
Test:+000000020
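The two valid orderings can be compared side by side with snprintf. The helper names below are hypothetical; the formats are the corrected ones discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helpers wrapping the two corrected formats. */

/* '0' as a flag: flags come first, then '*' takes the width argument. */
int format_zero_padded(char *buf, size_t cap, int width, int value)
{
    assert(buf != NULL);
    return snprintf(buf, cap, "Test:%+0*d", width, value);
}

/* '0' as a precision: it follows the '.', after the '*' width. */
int format_zero_precision(char *buf, size_t cap, int width, int value)
{
    assert(buf != NULL);
    return snprintf(buf, cap, "Test:%+*.0d", width, value);
}
```

With a width of 10 and a value of 20, the first helper produces "Test:+000000020" (zero padding), while the second pads with spaces and produces "Test:" followed by seven spaces and "+20".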
In complement of Sourav Ghosh's answer; an important notion is that of undefined behavior, which is tricky. Be sure to read Lattner's blog: What Every C Programmer Should Know About Undefined Behavior. See also this.
So, leaving on purpose (or perhaps depending upon) some undefined behavior in your code is intentional malpractice. Don't do that. In the very rare cases you want to do that (I cannot see any), please document it and justify yourself in some comment.
Be aware that even though printf is implemented by the C standard library, it can be (and often is) specially handled by the compiler (with GCC and GNU libc, that magic might happen internally using __builtin_printf).
The C99 and C11 standards partially specify the behavior of printf but leave some cases undefined to ease the implementation. You are unlikely to fully understand or be able to mimic these cases. And the implementation itself could change (for example, on my Debian Linux, an upgrade of libc might change the undefined behavior of printf).
If you want to understand more about printf, study the source of some C standard library implementation (e.g. musl-libc, whose code is quite readable) and of the GCC implementation (assuming a Linux operating system).
But the maintainers of GNU libc and of GCC (and even of the Linux kernel, through syscalls) remain free to change the undefined behavior (of printf and anything else).
In practice, always compile with gcc -Wall (and probably also -g) if using GCC. Don't accept any warnings (so improve your own code till you get none).

Can code that will never be executed invoke undefined behavior?

If the code that would invoke undefined behavior (in this example, division by zero) never gets executed, does the program still have undefined behavior?
int main(void)
{
    int i;
    if(0)
    {
        i = 1/0;
    }
    return 0;
}
I think it still is undefined behavior, but I can't find any evidence in the standard to support or deny me.
So, any ideas?
Let's look at how the C standard defines the terms "behavior" and "undefined behavior".
References are to the N1570 draft of the ISO C 2011 standard; I'm not aware of any relevant differences in any of the three published ISO C standards (1990, 1999, and 2011).
Section 3.4:
behavior
external appearance or action
Ok, that's a bit vague, but I'd argue that a given statement has no "appearance", and certainly no "action", unless it's actually executed.
Section 3.4.3:
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
It says "upon use" of such a construct. The word "use" is not defined by the standard, so we fall back to the common English meaning. A construct is not "used" if it's never executed.
There's a note under that definition:
NOTE Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during translation
or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of a
diagnostic message).
So a compiler is permitted to reject your program at compile time if its behavior is undefined. But my interpretation of that is that it can do so only if it can prove that every execution of the program will encounter undefined behavior. Which implies, I think, that this:
if (rand() % 2 == 0) {
    i = i / 0;
}
which certainly can have undefined behavior, cannot be rejected at compile time.
As a practical matter, programs have to be able to perform runtime tests to guard against invoking undefined behavior, and the standard has to permit them to do so.
Your example was:
if (0) {
    i = 1/0;
}
which never executes the division by 0. A very common idiom is:
int x, y;
/* set values for x and y */
if (y != 0) {
    x = x / y;
}
The division certainly has undefined behavior if y == 0, but it's never executed if y == 0. The behavior is well defined, and for the same reason that your example is well defined: because the potential undefined behavior can never actually happen.
(Unless INT_MIN < -INT_MAX && x == INT_MIN && y == -1 (yes, integer division can overflow), but that's a separate issue.)
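A guard that covers both cases, division by zero and the overflowing quotient, could look like the sketch below. The function name checked_div is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores x / y in *out and returns true only when
 * the division is well defined. */
bool checked_div(int x, int y, int *out)
{
    assert(out != NULL);

    if (y == 0)
        return false;               /* division by zero */
    if (INT_MIN < -INT_MAX && x == INT_MIN && y == -1)
        return false;               /* quotient would overflow int */
    *out = x / y;
    return true;
}
```

The division expression is still present in the function, but it is only reached when its behavior is defined, which is the same reasoning that makes the if (0) example well defined.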
In a comment (since deleted), somebody pointed out that the compiler may evaluate constant expressions at compile time. Which is true, but not relevant in this case, because in the context of
i = 1/0;
1/0 is not a constant expression.
A constant-expression is a syntactic category that reduces to conditional-expression (which excludes assignments and comma expressions). The production constant-expression appears in the grammar only in contexts that actually require a constant expression, such as case labels. So if you write:
switch (...) {
    case 1/0:
    ...
}
then 1/0 is a constant expression -- and one that violates the constraint in 6.6p4: "Each constant expression shall evaluate to a constant that is in the range of representable values for its type.", so a diagnostic is required. But the right hand side of an assignment does not require a constant-expression, merely a conditional-expression, so the constraints on constant expressions don't apply. A compiler can evaluate any expression that it's able to at compile time, but only if the behavior is the same as if it were evaluated during execution (or, in the context of if (0), not evaluated during execution).
(Something that looks exactly like a constant-expression is not necessarily a constant-expression, just as, in x + y * z, the sequence x + y is not an additive-expression because of the context in which it appears.)
Which means the footnote in N1570 section 6.6 that I was going to cite:
Thus, in the following initialization,
static int i = 2 || 1 / 0;
the expression is a valid integer constant expression with value one.
isn't actually relevant to this question.
Finally, there are a few things that are defined to cause undefined behavior that aren't about what happens during execution. Annex J, section 2 of the C standard (again, see the N1570 draft) lists things that cause undefined behavior, gathered from the rest of the standard. Some examples (I don't claim this is an exhaustive list) are:
A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial
preprocessing token or comment
Token concatenation produces a character sequence matching the syntax of a universal character name
A character not in the basic source character set is encountered in a source file, except in an identifier, a character constant, a string
literal, a header name, a comment, or a preprocessing token that is
never converted to a token
An identifier, comment, string literal, character constant, or header name contains an invalid multibyte character or does not begin
and end in the initial shift state
The same identifier has both internal and external linkage in the same translation unit
These particular cases are things that a compiler could detect. I think their behavior is undefined because the committee didn't want to, or couldn't, impose the same behavior on all implementations, and defining a range of permitted behaviors just wasn't worth the effort. They don't really fall into the category of "code that will never be executed", but I mention them here for completeness.
This article discusses this question in section 2.6:
int main(void){
    guard();
    5 / 0;
}
The authors consider that the program is defined when guard() does not terminate. They also find themselves distinguishing notions of “statically undefined” and “dynamically undefined”, e.g.:
The intention behind the standard [11] appears to be that, in general, situations are made statically undefined if it is not easy to generate code for them. Only when code can be generated, then the situation can be undefined dynamically.
[11] Private correspondence with committee member.
I would recommend looking at the entire article. Taken together, it paints a consistent picture.
The fact that the authors of the article had to discuss the question with a committee member confirms that the standard is currently fuzzy on the answer to your question.
In this case the undefined behavior is the result of executing the code. So if the code is not executed, there is no undefined behavior.
Non executed code could invoke undefined behavior if the undefined behavior was the result of solely the declaration of the code (e.g. if some case of variable shadowing was undefined).
I'd go with the last paragraph of this answer: https://stackoverflow.com/a/18384176/694576
... UB is a runtime issue, not a compiletime issue ...
So, no, there is no UB invoked.
Only if the standard made breaking changes such that your code was suddenly no longer "never executed". But I don't see any logical way in which this can cause 'undefined behaviour'. It's not causing anything.
On the subject of undefined behaviour it is often hard to separate the formal aspects from the practical ones. This is the definition of undefined behaviour in the 1989 standard (I don't have a more recent version at hand, but I don't expect this to have changed substantially):
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of
erroneous data, for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely
with unpredictable results, to behaving during translation or program execution
in a documented manner characteristic of the environment (with or without the
issuance of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
From a formal point of view I'd say your program does invoke undefined behaviour, which means that the standard places no requirement whatsoever on what it will do when run, just because it contains division by zero.
On the other hand, from a practical point of view I'd be surprised to find a compiler that didn't behave as you intuitively expect.
As far as I remember, the standard says an implementation is allowed to do anything from the moment a rule gets broken. Maybe there are some special cases with a kind of global flavour (but I have never heard or read about anything like that). So I would say: no, this can't be UB, because as long as the behavior is well defined, 0 is always false, so the rule can't get broken at runtime.
I think it still is undefined behavior, but I can't find any evidence in the standard to support or deny me.
I think the program does not invoke undefined behavior.
Defect Report #109 addresses a similar question and says:
Furthermore, if every possible execution of a given program would result in undefined behavior, the given program is not strictly conforming.
A conforming implementation must not fail to translate a strictly conforming program simply because some possible execution of that program would result in undefined behavior. Because foo might never be called, the example given must be successfully translated by a conforming implementation.
It depends on how the expression "undefined behavior" is defined, and whether "undefined behavior" of a statement is the same as "undefined behavior" for a program.
This program looks like C, so a deeper analysis of what the C standard used by the compiler says (as some answers did) is appropriate.
In the absence of a specified standard, the correct answer is "it depends". In some languages, compilers try to guess what the programmer might have meant after the first error and still generate some code according to that guess. In other, purer languages, once something is undefined, the undefinedness propagates to the whole program.
Other languages have a concept of "bounded errors". For some limited kinds of errors, these languages define how much damage an error can produce. In particular, languages with implicit garbage collection frequently distinguish between errors that invalidate the type system and errors that do not.

Is there specific documentation for the behavior of "i=i--" in gcc?

Once again, our best loved "i=i--" -like issues. In C99 we have:
6.5 Expressions #2: Between the previous and next sequence point an
object shall have its stored value
modified at most once
70) This paragraph renders
!!undefined!! statement expressions
such as
i = ++i + 1;
But for undefined behavior there can be variants, from random output to "program execution in a documented manner" (C99 3.4.3).
So, the question:
Does gcc document the behavior for i=i++, i=i--, and so on statements?
Actual code is
int main(){int i=2;i=i--;return i;}
GCC does not document this behaviour. The Warning Options page mentions sequence point issues under -Wsequence-point, but does not hint at well-defined semantics for violations.
GCC does have a nice list of C Implementation Defined Behaviour, but I could not find any reference to this issue here either.
It's left to the back-end implementation to decide what it does. You can use -S and inspect the generated code to determine the exact sequence of events.
It is not documented, but even if it were, I wouldn't want to read it. You should never rely on what a particular implementation does when running into undefined behavior.
Why on earth would you want to do that? Seriously. I'm curious.

Resources