Printf format specifiers - c

I've been trying to cobble together the format of printf into a sort of linear format. Is the following a correct understanding of the possible printf formats?
% <justification: [-]?> <sign: [ +]?> <alternate: [#]?>
<padding: [0? num]?> <precision: [.num]?> <modifier: [h|hh|l|ll|L|z|t|j]?>
<format: [c|d(i)|e(E)|f|o|p|x(X)|u|s|g(G)]>
Are the order and meanings in the above correct? A couple of examples:
printf(" %-10.3s %-+20ld", "Hello!", 14L);

Is the following a correct understanding of the possible printf formats?
"Generally" yes, but for example you "can't" do %jg or like %0#p.
There is also %n.
Both "precision" and "padding" may be asterisks, like %*s or %.*s (but you could have defined num as ([0-9]+|\*)...).
Also . is optionally followed by a number. So it's more like <precision: [. num? ]> - if only . is specified, precision is taken as zero.
Is the order...
The order of the flags -, +, space, # and 0 is irrelevant, and they may be repeated, so %-+020d and %+0-+++000----20d have the same meaning (0 is ignored when used together with -, so there are corner cases as well).
...and meanings correct in the above?
The above contains no explanation of meanings. - is not "justification" (taken literally, is that even a word?); it is a flag that makes the output left-justified within the field. Meaning also depends on context: "precision" for floating-point conversions can be understood as the number of digits after the decimal point, but "precision of a string" (for %s, the maximum number of characters written) sounds strange. But generally, yes.

Your specification is too restrictive:
the flags +, -, #, 0 and space can appear in any order, but some combinations are meaningless, such as %+s.
width and precision can be specified as *
a and A were introduced to produce hexadecimal floating point representations
F is available and different from f for NaNs and infinities.
%% and %n should be recognised too.
Here is a regular expression (POSIX extended syntax) to match all valid printf conversion specifications; it will not detect invalid combinations:
%[-+ #0]*([*]|[1-9][0-9]*)?([.]([*]|[0-9]*))?(hh|h|ll|l|L|z|t|j)?[%naAcdieEfFopxXusgG]
You might refine it to reject any flags for %% and restrict other cases too, but it will become quite complicated to express as a regex.

Related

What does scanf("%f%c", ...) do against input `100e`?

Consider the following C code (online available io.c):
#include <stdio.h>
int main () {
    float f;
    char c;
    scanf ("%f%c", &f, &c);
    printf ("%f \t %c", f, c);
    return 0;
}
When the input is 100f, it outputs 100.000000 f.
However, when the input is 100e, it outputs only 100.000000, not followed by e. What is going on here? Isn't 100e an invalid floating-point number?
This is (arguably) a glibc bug.
This behaviour clearly goes against the standard. However it is exhibited by other implementations. Some people consider it a bug in the standard instead.
Per the standard, "An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence." So 100e is an input item because it is a prefix of a matching input sequence, say, 100e1, but no longer sequence of characters from the input is. Further, "If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure." 100e is not a matching sequence, so the standard requires the directive to fail.
The standard cannot tell scanf to accept 100 and continue scanning from e, as some people would expect, because stdio has a limited push-back of just one character. So having read 100e, the implementation would have to read at least one more character, say a newline to be specific, and then push back both newline and e, which it cannot always do.
I'd say this is pretty clearly a pretty unclear, gray area.
If you're an implementor of a C library (or a member of the X3J11 committee), you have to worry about this sort of thing — sometimes a lot. You have to worry about the edge cases, and sometimes the edge cases can be particularly edgy.
However, you did not tag your question with the "language lawyer" tag, so perhaps you're not worried about a scrupulously correct, official interpretation.
If you're not an implementor of a C library or a member of the X3J11 committee, I'd say: don't worry what the "right" answer here is! You don't have to worry, because you don't care, because you'd be crazy to write code which is sensitive to this question — precisely because it's such an obvious gray area. (Even if you do figure out what the right behavior here is, do you trust every implementor of every C library in the world to always implement that behavior?)
I'd say there are three things you can do in the category of "not worrying", and not writing code which is sensitive to this question.
Don't use scanf at all (for anything). It's an odious, imprecise, imperfect function, that's not good for anything except — perhaps — getting numbers into the first few programs you ever write while you're first learning C. After that, scanf has no use in any serious program.
Don't arrange your code and data such that it has to confront ambiguous input like "100e" in the first place. Where is it coming from, anyway? Is it input the user might type? Data being read in from a data file? Is it expected or unexpected, correct or incorrect input? If you're reading a data file, do you have control over the code that writes the data file? Can you guarantee that floating-point fields will always be delimited appropriately, will not occasionally have random alphabetic characters appended?
If you do have to parse input that might contain a valid floating-point number, might have random alphabetic characters appended, and might therefore be ambiguous like this, I'd encourage you to use strtod instead, which is likely to be both better-defined and better-implemented.
Put a space in the format, as in "%f %c", and when you enter the input, make sure there is a space between the two values.
I am assuming you just want to print a character.
From the C Standard (6.4.4.2 Floating constants)
decimal-floating-constant:
    fractional-constant exponent-part(opt) floating-suffix(opt)
    digit-sequence exponent-part floating-suffix(opt)
and
exponent-part:
    e sign(opt) digit-sequence
    E sign(opt) digit-sequence
If you change the call of printf as follows
printf ("%e \t %d\n", f, c);
you will get the output
1.000000e+02 10
that is the variable c has gotten the new line character '\n'.
It seems that the implementation of scanf is made such a way that the symbol e is interpreted as a part of a floating number though there is no digit after the symbol.
According to the C Standard (7.21.6.2 The fscanf function)
"An input item is read from the stream, unless the specification includes an n specifier. An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence. The first character, if any, after the input item remains unread." (paragraph 9)
So 100e is consumed as the input item: it is not itself a matching sequence for a floating number, but it is a prefix of one (such as 100e1).

What is % in format specifiers?

What is the proper name for the % operator in any format specifier? Like what does % in %d stand for? I have searched over the internet to help me figure out the solution but unable to find any. Any help?
% means that what follows it is a placeholder, which will be replaced by the respective argument.
The character % does not have a special name other than "the character %". In the context of a format it is not a C operator, but simply a character used by two families of functions (printf and scanf), as they read the format, to introduce a conversion specification.
Pulling verbiage from the C spec...
For both the printf() and scanf() family of functions there is a format.
The format is composed of zero or more directives.
1) one or more white-space characters (scanf)
2) ordinary characters (not %)
3) conversion specifications
Each conversion specification is introduced by the character %.

C printf formatting: What does "." and "|" mean in this context?

I'm taking a security course and am having trouble understanding this code due to a lack of understanding of the C programming language.
printf ("%08x.%08x.%08x.%08x|%s|");
I was told that this code should move along the stack until a pointer to a function is found.
I thought the . was just an indicator of output precision, so I don't know what it means in this context. Are these indicators of precision?
Also, I don't understand what the | means, and I can't find it in the C documentation.
The symbols have no special meaning here since they are outside of a conversion specification; they are simply output literally. Note, however, that you haven't provided any of the arguments that printf expects, so the behavior is undefined; in practice it will typically print five values that happen to be where the arguments would have been, such as on the stack, with the last one treated as a pointer to a string.
In this string the . and | characters are simply output. The dots act as separators between the hex values, and the pipes delimit the string.
The dot is only treated as a precision indicator when it appears after the % sign and before the conversion character, for example %4.2f.

Why are hexadecimal numbers prefixed with 0x?

Why are hexadecimal numbers prefixed as 0x?
I understand the usage of the prefix but I don't understand the significance of why 0x was chosen.
Short story: The 0 tells the parser it's dealing with a constant (and not an identifier/reserved word). Something is still needed to specify the number base: the x is an arbitrary choice.
Long story: In the 60s, the prevalent programming number systems were decimal and octal; mainframes had 12, 24 or 36 bits per word, which is nicely divisible by 3 = log2(8).
The BCPL language used the syntax 8 1234 for octal numbers. When Ken Thompson created B from BCPL, he used the 0 prefix instead. This is great because
an integer constant now always consists of a single token,
the parser can still tell right away it's got a constant,
the parser can immediately tell the base (0 is the same in both bases),
it's mathematically sane (00005 == 05), and
no precious special characters are needed (as in #123).
When C was created from B, the need for hexadecimal numbers arose (the PDP-11 had 16-bit words) and all of the points above were still valid. Since octals were still needed for other machines, 0x was arbitrarily chosen (00 was probably ruled out as awkward).
C# is a descendant of C, so it inherits the syntax.
Note: I don't know the correct answer, but the below is just my personal speculation!
As has been mentioned a 0 before a number means it's octal:
04524 // octal, leading 0
Imagine needing to come up with a system to denote hexadecimal numbers, and note we're working in a C-style environment. How about ending with h like assembly? Unfortunately you can't: it would allow you to make tokens which are valid identifiers (e.g. you could name a variable the same thing), which would make for some nasty ambiguities.
8000h // hex
FF00h // oops - valid identifier! Hex or a variable or type named FF00h?
You can't lead with a character for the same reason:
xFF00 // also valid identifier
Using a hash was probably thrown out because it conflicts with the preprocessor:
#define ...
#FF00 // invalid preprocessor token?
In the end, for whatever reason, they decided to put an x after a leading 0 to denote hexadecimal. It is unambiguous since it still starts with a number character so can't be a valid identifier, and is probably based off the octal convention of a leading 0.
0xFF00 // definitely not an identifier!
It's a prefix to indicate the number is in hexadecimal rather than in some other base. The programming language uses it to tell the compiler.
Example:
0x6400 translates to 6*16^3 + 4*16^2 + 0*16^1 + 0*16^0 = 25600.
When the compiler reads 0x6400, it understands from the 0x prefix that the number is hexadecimal. On paper we would usually write (6400)₁₆ or (6400)₈ or whatever base applies.
For binary it would be:
0b00000001
The preceding 0 is used to indicate a number in base 2, 8, or 16.
In my opinion, 0x was chosen to indicate hex because 'x' sounds like hex.
Just my opinion, but I think it makes sense.
I don't know the historical reasons behind 0x as a prefix to denote hexadecimal numbers - as it certainly could have taken many forms. This particular prefix style is from the early days of computer science.
As we are used to decimal numbers there is usually no need to indicate the base/radix. However, for programming purposes we often need to distinguish the bases from binary (base-2), octal (base-8), decimal (base-10) and hexadecimal (base-16) - as the most commonly used number bases.
At this point in time it is a convention used to denote the base of a number. I've written the number 29 in all of the above bases with their prefixes:
0b11101: Binary
0o35: Octal, denoted by an o
0d29: Decimal, this is unusual because we assume numbers without a prefix are decimal
0x1D: Hexadecimal
Basically, a letter we most commonly associate with a base (e.g. b for binary) is combined with 0 to easily distinguish a number's base.
This is especially helpful because smaller numbers can confusingly appear the same in all the bases: 0b1, 0o1, 0d1, 0x1.
If you were using a rich text editor, though, you could alternatively use subscripts to denote bases: 1₂, 1₈, 1₁₀, 1₁₆

Value of C define changes unexpectedly

I have a lot of #define's in my code. Now a weird problem has crept up.
I have this:
#define _ImmSign 010100
(I'm trying to simulate a binary number)
Obviously, I expect the number to become 10100. But when I use the number it has changed into 4160.
What is happening here? And how do I stop it?
ADDITIONAL
Okay, so this is due to the language interpreting this as an octal. Is there some smart way however to force the language to interpret the numbers as integers? If a leading 0 defines octal, and 0x defines hexadecimal now that I think of it...
Integer literals starting with a 0 are interpreted as octal, not decimal, in the same way that integer literals starting with 0x are interpreted as hexadecimal.
Remove the leading zero and you should be good to go.
Note also that identifiers beginning with an underscore followed by a capital letter or another underscore are reserved for the implementation, so you shouldn't define them in your code.
Prefixing an integer with 0 makes it an octal number instead of decimal, and 010100 in octal is 4160 in decimal.
There is no binary number syntax in C, at least without some compiler extension. What you see is 010100 interpreted as an octal (base 8) number: it is done when a numeric literal begins with 0.
010100 is treated as octal by C because of the leading 0. Octal 10100 is 4160.
Check this out; it has some macros for using binary numbers in C:
http://www.velocityreviews.com/forums/t318127-using-binary-numbers-in-c.html
There is another thread that has this also
Can I use a binary literal in C or C++?
If you are willing to write non-portable code and use gcc, you can use the binary constants extension (binary constants were later standardized in C23):
#define _ImmSign 0b010100
Octal :-)
You may find these macros helpful to represent binary numbers with decimal or octal numbers in the form of 1's and 0's. They do handle leading zeros, but unfortunately you have to pick the correct macro name depending on whether you have a leading zero or not. Not perfect, but hopefully helpful.

Resources