C printf formatting: What does "." and "|" mean in this context? - c

I'm taking a security course and am having trouble understanding this code due to a lack of understanding of the C programming language.
printf ("%08x.%08x.%08x.%08x|%s|");
I was told that this code should move along the stack until a pointer to a function is found.
I thought the . was just an indicator of precision of output, so I don't know what this means in this context since there are indicators of precision?
Also, I don't understand what the | means, and I can't find it in the C documentation.

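The symbols have no special meaning here because they are outside any conversion specification; they are simply output literally. Note, however, that you haven't provided the arguments that printf expects, so the behavior is undefined. In practice it typically prints four values that happen to be where the arguments would have been (often the stack) and then treats a fifth value as a pointer to a string.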

In this string the . and | characters are simply output as they are. The dots act as separators between the hex values, and the pipes delimit the string.
A dot is only treated as a precision indicator if it appears after the % sign and before the conversion specifier, for example %4.2f.

Related

What does scanf("%f%c", ...) do against input `100e`?

Consider the following C code (online available io.c):
#include <stdio.h>
int main () {
float f;
char c;
scanf ("%f%c", &f, &c);
printf ("%f \t %c", f, c);
return 0;
}
When the input is 100f, it outputs 100.000000 f.
However, when the input is 100e, it outputs only 100.000000, with no e after it. What is going on here? Isn't 100e an invalid floating-point number?
This is (arguably) a glibc bug.
This behaviour clearly goes against the standard. However it is exhibited by other implementations. Some people consider it a bug in the standard instead.
Per the standard, "An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence." So 100e is an input item because it is a prefix of a matching input sequence, say, 100e1, but any longer sequence of characters from the input isn't. Further, "If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure." 100e is not a matching sequence, so the standard requires the directive to fail.
The standard cannot tell scanf to accept 100 and continue scanning from e, as some people would expect, because stdio has a limited push-back of just one character. So having read 100e, the implementation would have to read at least one more character, say a newline to be specific, and then push back both newline and e, which it cannot always do.
I'd say this is pretty clearly a pretty unclear, gray area.
If you're an implementor of a C library (or a member of the X3J11 committee), you have to worry about this sort of thing — sometimes a lot. You have to worry about the edge cases, and sometimes the edge cases can be particularly edgy.
However, you did not tag your question with the "language lawyer" tag, so perhaps you're not worried about a scrupulously correct, official interpretation.
If you're not an implementor of a C library or a member of the X3J11 committee, I'd say: don't worry what the "right" answer here is! You don't have to worry, because you don't care, because you'd be crazy to write code which is sensitive to this question — precisely because it's such an obvious gray area. (Even if you do figure out what the right behavior here is, do you trust every implementor of every C library in the world to always implement that behavior?)
I'd say there are three things you can do in the category of "not worrying", and not writing code which is sensitive to this question.
1) Don't use scanf at all (for anything). It's an odious, imprecise, imperfect function that's not good for anything except, perhaps, getting numbers into the first few programs you ever write while you're first learning C. After that, scanf has no use in any serious program.
2) Don't arrange your code and data such that it has to confront ambiguous input like "100e" in the first place. Where is it coming from, anyway? Is it input the user might type? Data being read in from a data file? Is it expected or unexpected, correct or incorrect input? If you're reading a data file, do you have control over the code that writes the data file? Can you guarantee that floating-point fields will always be delimited appropriately, will not occasionally have random alphabetic characters appended?
3) If you do have to parse input that might contain a valid floating-point number, might have random alphabetic characters appended, and might therefore be ambiguous like this, I'd encourage you to use strtod instead, which is likely to be both better-defined and better-implemented.
Put a space between the two conversions, as in "%f %c", and when you enter the input, make sure there is a space between the two values.
I am assuming you just want to print a character.
From the C Standard (6.4.4.2 Floating constants)
decimal-floating-constant:
    fractional-constant exponent-part(opt) floating-suffix(opt)
    digit-sequence exponent-part floating-suffix(opt)
and
exponent-part:
    e sign(opt) digit-sequence
    E sign(opt) digit-sequence
If you change the call of printf as follows
printf ("%e \t %d\n", f, c);
you will get the output
1.000000e+02 10
That is, the variable c has received the newline character '\n'.
It seems that scanf is implemented in such a way that the symbol e is interpreted as part of a floating-point number even though no digit follows it.
According to the C Standard (7.21.6.2 The fscanf function)
9 An input item is read from the stream, unless the specification
includes an n specifier. An input item is defined as the longest
sequence of input characters which does not exceed any specified
field width and which is, or is a prefix of, a matching input
sequence. The first character, if any, after the input item
remains unread.
So 100e is read as an input item for a floating-point number, since it is a prefix of a matching input sequence.

Why does this scanf() conversion actually work?

Ah, the age old tale of a programmer incrementally writing some code that they aren't expecting to do anything more than expected, but the code unexpectedly does everything, and correctly, too.
I'm working on some C programming practice problems, and one was to redirect stdin to a text file that had some lines of code in it, then print it to the console with scanf() and printf(). I was having trouble getting the newline characters to print as well (since scanf typically eats up whitespace characters) and had typed up a jumbled mess of code involving multiple conditionals and flags when I decided to start over and ended up typing this:
(where c is a character buffer large enough to hold the entirety of the text file's contents)
scanf("%[a-zA-Z -[\n]]", c);
printf("%s", c);
And, voila, this worked perfectly. I tried to figure out why by creating variations on the character class (between the outside brackets), such as:
[\w\W -[\n]]
[\w\d -[\n]]
[. -[\n]]
[.* -[\n]]
[^\n]
but none of those worked. They all ended up reading either just one character or producing a jumbled mess of random characters. '[^\n]' doesn't work because the text file contains newline characters, so it only prints out a single line.
Since I still haven't figured it out, I'm hoping someone out there would know the answer to these two questions:
Why does "[a-zA-Z -[\n]]" work as expected?
The text file contains letters, numbers, and symbols (':', '-', '>', maybe some others); if 'a-z' is supposed to mean "all characters from unicode 'a' to unicode 'z'", how does 'a-zA-Z' also include numbers?
It seems like the syntax for what you can enter inside the brackets is a lot like regex (which I'm familiar with from Python), but not exactly. I've read up on what can be used from trying to figure out this problem, but I haven't been able to find any info comparing whatever this syntax is to regex. So: how are they similar and different?
I know this probably isn't a good usage for scanf, but since it comes from a practice problem, real world convention has to be temporarily ignored for this usage.
Thanks!
You are picking up numbers because you have " -[" in your character set. This means all characters from space (32) to open-bracket (91), which includes numbers in ASCII (48-57).
Your other examples include this as well, but they are missing the "a-zA-Z", which lets you pick up the lower-case letters (97-122). Sequences like '\w' are treated as unknown escape sequences in the string itself, so \w just becomes a single w. . and * are taken literally. They don't have a special meaning like in a regular expression.
If you include - inside the [ (other than at the beginning or end) then the behaviour is implementation-defined.
This means that your compiler documentation must describe the behaviour, so you should consult that documentation to see what the defined behaviour is, which would explain why some of your code worked and some didn't.
If you want to write portable code then you can't use - as anything other than matching a hyphen.

What is % in format specifiers?

What is the proper name for the % operator in any format specifier? Like what does % in %d stand for? I have searched over the internet to help me figure out the solution but unable to find any. Any help?
Maybe you can learn something from this TOPIC.
Please find time to read and understand.
% means that the characters after it form a placeholder, which will be replaced by the corresponding argument.
The character % does not have a special name other than "the character %". It is not a C operator in the context of a format; it is simply a character that two families of functions use, as they read the format, to introduce a conversion specification.
Pulling verbiage from the C spec...
For both the printf() and scanf() family of functions there is a format.
The format is composed of zero or more directives. Each directive is one of the following:
1) one or more white-space characters (scanf)
2) characters (not %)
3) conversion specifications
Each conversion specification is introduced by the character %.

What is the name given to those 3 character constructs that represent another character or characters

Apologies for the vagueness; I barely know how to pose this question.
Can anyone tell me the name of that family of 3 character constructs that represent another character or characters?
I think they were used in the old VT100 terminal days.
I know C supports them.
They are called trigraphs. There are also two-character codes called digraphs.
They are called trigraph sequences. E.g. ??/ maps to \. You have to take care to remember this when building regular expression-type parsers for C code.

kprintf printing out block letters

In my C program, which is part of an operating system's code (on the kernel side), I am trying to use kprintf to print a character, but whenever I do, it prints the character followed by a block character with four small circles in it.
kprintf(&ch);
Does anyone know what's going on here?
The printf() family of functions take a format string which tells what you want to print. You cannot print a character directly as you are doing, because printf() (or kprintf() as the case may be) will continue to read as if it were a string. You want something like:
kprintf("%c", ch);
The format string tells printf() what additional arguments to expect. In this case, %c indicates a character argument.

Resources