sprintf: printing a percent followed by 0-padded hex

sprintf: printing a percent followed by 0-padded hex - c

I thought I understand printf, but I guess not. I have:
char sTemp[100];
sprintf(sTemp, "%%%02x", (unsigned)c);
I think that c is an unsigned char and I think a linefeed, but for some reason, what I get coming out is
0x0.000000000000ap-1022
If I make the 'x' in the format string an 'X', then an 'X' appears in the output string.

I completely misinterpreted the results of my experiments in the first version of this answer; apologies all around.
The result of that sprintf() call when c is '\n' is this string:
"%0a"
I believe you are then doing:
printf(sTemp);
Which is the same as:
printf("%0a");
Which is a valid format string for hexadecimal float output. You aren't passing a float variable, however, so printf() pulls whatever happens to be on the stack nearby and uses that as the value to format.
Instead, do:
printf( "%s", sTemp );
and you should see your expected "%0a".
Note that clang, and probably other compilers, give you a warning when you use printf(sTemp):
so.c:9:12: warning: format string is not a string literal (potentially
insecure) [-Wformat-security]
Because of precisely this sort of thing: memory on the stack is accessed that wasn't supposed to be.

Related

Why can a null character be embedded in a conversion specifier for scanf?

Perhaps I'm misinterpreting my results, but:
#include <stdio.h>
int
main(void)
{
char buf[32] = "";
int x;
x = scanf("%31[^\0]", buf);
printf("x = %d, buf=%s", x, buf);
}
$ printf 'foo\n\0bar' | ./a.out
x = 1, buf=foo
Since the string literal "%31[^\0]" contains an embedded null, it seems that it should be treated the same as "%31[^", and the compiler should complain that the [ is unmatched. Indeed, if you replace the string literal, clang gives:
warning: no closing ']' for '%[' in scanf format string [-Wformat]
Why does it work to embed a null character in the string literal passed to scanf?
-- EDIT --
The above is undefined behavior and merely happens to "work".

First of all, Clang totally fails to output any meaningful diagnostics here, whereas GCC knows exactly what is happening - so yet again GCC 1 - 0 Clang.
And as for the format string - well, it doesn't work. The format argument to scanf is a string. The string ends at terminating null, i.e. the format string you're giving to scanf is
scanf("%31[^", buf);
On my computer, compiling the program gives
% gcc scanf.c
scanf.c: In function ‘main’:
scanf.c:8:20: warning: no closing ‘]’ for ‘%[’ format [-Wformat=]
8 | x = scanf("%31[^\0]", buf);
| ^
scanf.c:8:21: warning: embedded ‘\0’ in format [-Wformat-contains-nul]
8 | x = scanf("%31[^\0]", buf);
| ^~
The scanset must have the closing right bracket ], otherwise the conversion specifier is invalid. If conversion specifier is invalid, the behaviour is undefined.
And, on my computer running it,
% printf 'foo\n\0bar' | ./a.out
x = 0, buf=
Q.E.D.

This is a rather strange situation. I think there are a couple of things going on.
First of all, a string in C ends by definition at the first \0. You can always scoff at this rule, for example by writing a string literal with an explicit \0 in the middle of it. When you do, though, the characters after the \0 are mostly invisible. Very few standard library functions are able to see them, because of course just about everything that interprets a C string will stop at the first \0 it finds.
However: the string you pass as the first argument to scanf is typically parsed twice -- and by "parsed" I mean actually interpreted as a scanf format string possibly containing special % sequences. It's always going to be parsed at run time, by the actual copy of scanf in your C run-time library. But it's typically also parsed by the compiler, at compile time, so that the compiler can warn you if the % sequences don't match the actual arguments you call it with. (The run-time library code for scanf, of course, is unable to perform this checking.)
Now, of course, there's a pretty significant potential problem here: what if the parsing performed by the compiler is in some way different than the parsing performed by the actual scanf code in the run-time library? That might lead to confusing results.
And, to my considerable surprise, it looks like the scanf format parsing code in compilers can (and in some cases does) do something special and unexpected. clang doesn't (it doesn't complain about the malformed string at all), but gcc says both "no closing ‘]’ for ‘%[’ format" and "embedded ‘\0’ in format". So it's noticing.
This is possible (though still surprising) because the compiler, at least, can see the whole string literal, and is in a position to notice that the null character is an explicit one inserted by the programmer, not the more usual implicit one appended by the compiler. And indeed the warning "embedded ‘\0’ in format" emitted by gcc proves that gcc, at least, is quite definitely written to accommodate this possibility. (See the footnote below for a bit more on the compiler's ability to "see" the whole string literal.)
But the second question is, why does it (seem to) work at runtime? What is the actual scanf code in the C library doing?
That code, at least, has no way of knowing that the \0 was explicit and that there are "real" characters following it. That code simply must stop at the first \0 that it finds. So it's operating as if the format string was
"%31[^"
That's a malformed format string, of course. The run-time library code isn't required to do anything reasonable. But my copy, like yours, is able to read the full string "foo". What's up with that?
My guess is that after seeing the % and the [ and the ^, and deciding that it's going to scan characters not matching some set, it's perfectly willing to, in effect, infer the missing ], and sail on matching characters from the scanset, which ends up having no excluded characters.
I tested this by trying the variant
x = scanf("%31[^\0o]", buf);
This also matched and printed "foo", not "f".
Obviously things are nothing like guaranteed to work like this, of course. #AnttiHaapala has already posted an answer showing that his C RTL declines to scan "foo" with the malformed scan string at all.
Footnote:
Most of the time, an embedded in \0 in a string truly, prematurely ends it. Most of the time, everything following the \0 is effectively invisible, because at run time, every piece of string interpreting code will stop at the first \0 it finds, with no way to know whether it was one explicitly inserted by the programmer or implicitly appended by the compiler. But as we've seen, the compiler can tell the difference, because the compiler (obviously) can see the entire string literal, exactly as entered by the programmer. Here's proof:
char str1[] = "Hello, world!";
char str2[] = "Hello\0world!";
printf("sizeof(str1) = %zu, strlen(str1) = %zu\n", sizeof(str1), strlen(str1));
printf("sizeof(str2) = %zu, strlen(str2) = %zu\n", sizeof(str2), strlen(str2));
Normally, sizeof on a string literal gives you a number one bigger than strlen. But this code prints:
sizeof(str1) = 14, strlen(str1) = 13
sizeof(str2) = 13, strlen(str2) = 5
Just for fun I also tried:
char str3[5] = "Hello";
This time, though, strlen gave a larger number:
sizeof(str3) = 5, strlen(str3) = 10
I was mildly lucky. str3 has no trailing \0, neither one inserted by me nor appended by the compiler, so strlen sails off the end, and could easily have counted hundreds or thousands of characters before finding a random \0 somewhere in memory, or crashing.

Why can a null character be embedded in a conversion specifier for scanf?
A null character cannot directly be specified as part of a scanset as in "%31[^\0]" as the parsing of the string ends with the first null character.
"%31[^\0]" is parsed by scanf() as if it was "%31[^". As it is an invalid scanf()
specifier, UB will likely follow. A compiler may provide diagnostics on more than what scanf() sees.
A null character can be part of a scanset as in "%31[^\n]". This will read in all characters, including the null character, other than '\n'.
In the unusual case of reading null chracters, to determine the number of characters read scanned, use "%n".
int n = 0;
scanf("%31[^\n]%n", buf, &n);
scanf("%*1[\n]"); // Consume any 1 trailing \n
if (n) {
printf("First part of buf=%s, %d characters read ", buf, n);
}

printf() working without double quotes, prints random characters [duplicate]

This question already has answers here:
Behaviour of printf when printing a %d without supplying variable name
(6 answers)
Closed 4 years ago.
I stumbled upon this C code in a college test and when I tested it on Dev-C++ 5.11 compiler, it printed random characters. I can't understand how or why this code works. Can someone enlighten me?
int main() {
char a[10] = "%s" ;
printf( a ) ;
}

This question has two parts: the missing quotes and the random characters.
printf() is just a function. You can pass strings and other values to functions as arguments. You don't have to use a literal. You can use both char *a = "something"; printf(a) (passing a variable as an argument) and printf("something") (passing a string literal as an argument).
printf() is also a variadic function. This means it can accept any number of arguments. You can use printf("hello world"), printf("%s", "hello world") and even printf("%s %s", "hello", "world"). Some older compilers don't verify you actually passed the right number of arguments based on the first argument which is the format string. This is why your code compiles even though it's missing an argument. When the program runs the code goes over the format string, sees "%s" and looks for the second argument to print it as a string. Since there is no second argument it basically reads random memory and you get garbage characters.

printf function signature is:
int printf(const char *format, ...);
It expects format string as the first argument and variable number of arguments that are handled and printed based on the format specifiers in the format string. variable a in your question is providing it the format string. Reason for random characters is that the argument for format specifier %s is missing. Following will correctly print a string:
printf( a, "Hello World!" );
A list of format specifiers can be seen here https://en.wikipedia.org/wiki/Printf_format_string
Why does it compile?
Because variadic arguments accepted by printf are processed at run time. Not all compilers do compile time checks for validating arguments against the format string. Even if they do they would at most throw a warning, but still compile the program.

It's using the string "%s" as a format string, and using uninitialized memory as the "data".
The only reason it does "something" is because the compiler was apparently not smart enough to recognize that the format string required one parameter and zero parameters were supplied. Or because compiler warnings were ignored and/or errors were turned off.
Just an FYI for anybody who bumps into this: "Always leave all warnings and errors enabled and fix your code until they're gone" This doesn't guarantee correct behaviour but does make "mysterious" problems less likely.

read 2 bytes in hexadecimal base and convert into decimal using C language fscanf

Well as said Im using C language and fscanf for this task but it seems to make the program crash each time then its surely that I did something wrong here, I havent dealed a lot with this type of input read so even after reading several topics here I still cant find the right way, I have this array to read the 2 bytes
char p[2];
and this line to read them, of course fopen was called earlier with file pointer fp, I used "rb" as read mode but tried other options too when I noticed this was crashing, Im just saving space and focusing in the trouble itself.
fscanf(fp,"%x%x",p[0],p[1]);
later to convert into decimal I have this line (if its not the EOF that we reached)
v = strtol(p, 0, 10);
Well v is mere integer to store the final value we are seeking. But the program keeps crashing when scanf is called or I think thats the case, Im not compiling to console so its a pitty that I cant output what has been done and what hasnt but in debugger it seems like crashing there
Well I hope you can help me out in this, Im a bit lost regarding this type of read/conversion any clue will help me greatly, thanks =).
PS forgot to add that this is not homework, a friend want to make some file conversion for a game and this code will manipulate the files needed alone, so while I could be using any language or environment for this, I always feel better in C language

char strings in C are really called null-terminated byte strings. That null-terminated part is important, as it means a string of two characters needs space for three characters to include the null-terminator character '\0'. Not having the terminator means string functions will go out of bounds in their search for it, leading to undefined behavior.
Furthermore the "%x" format is to read a heaxadecimal integer number and store it in an int. Mismatching format specifiers and arguments leads to undefined behavior.
Lastly and probably what's causing the crash: The scanf family of function expects pointers as their arguments. Not providing pointers will again lead to undefined behavior.
There are two solutions to the above problems:
Going with code similar to what you already use, first of all you must make space for the terminator in the array. Then you need to read two characters. Lastly you need to add the terminator:
char p[3] = { 0 }; // String for two characters, initialized to zero
// The initialization means that we don't need to explicitly add the terminator
// Read two characters, skipping possible leading white-space
fscanf(fp," %c%c",p[0],p[1]);
// Now convert the string to an integer value
// The string is in base-16 (two hexadecimal characters)
v = strtol(p, 0, 16);
Read the hexadecimal value into an integer directly:
unsigned int v;
fscanf(fp, "%2x", &v); // Read as hexadecimal
The second alternative is what I would recommend. It reads two characters and parses it as a hexadecimal value, and stores the result into the variable v. It's important to note that the value in v is stored in binary! Hexadecimal, decimal or octal are just presentation formats, internally in the computer it will still be stored in binary ones and zeros (which is true for the first alternative as well). To print it as decimal use e.g.
printf("%d\n", v);

You need to pass to fscanf() the address of a the variable(s) to scan into.
Also the conversion specifier need to suite the variable provided. In your case those are chars. x expects an int, to scan into a char use the appropriate length modifiers, two times h here:
fscanf(fp, "%hhx%hhx", &p[0], &p[1]);
strtol() expects a C-string as 1st parameter.
What you pass isn't a C-string, as a C-string ought to be 0-terminated, which p isn't.
To fix this you could do the following:
char p[3];
fscanf(fp, "%x%x", p[0], p[1]);
p[2] = '\0';
long v = strtol(p, 0, 10);

XCode saying something is potentially insecure for a C program? What exactly does it mean?

Say we are working in C.
if I go ahead and do this:
char *word;
word = "Hello friends";
printf(word);
then XCode tells me that because I'm not using a string literal, that I might have something that is potentially insecure. Does that mean an opening for something to hack my program? If so, how could that happen?
Alternatively, if I do this:
char *word;
word = "Hello friends";
printf("%s", word);
Then XCode raises no flags and I'm fine. What exactly is the difference?

The first argument to printf is not the string you want to print. It's a format string. It can contain formatting instructions which results in the string that is printed once it is combined with further arguments.
Your first example is an uncontrolled format string vulnerability.

The issue is that in the first case if word contains formatting specs (%d, %f, %s, etc.) then printf() will assume those values are on the stack, but in fact they aren't, which could lead to a crash.
Use puts() or fputs() instead if you don't care about formatting.

Why does C's printf format string have both %c and %s?

Why does C's printf format string have both %c and %s?
I know that %c represents a single character and %s represents a null-terminated string of characters, but wouldn't the string representation alone be enough?

Probably to distinguish between null terminated string and a character. If they just had %s, then every single character must also be null terminated.
char c = 'a';
In the above case, c must be null terminated. This is my assumption though :)

%s prints out chars until it reaches a 0 (or '\0', same thing).
If you just have a char x;, printing it with printf("%s", &x); - you'd have to provide the address, since %s expects a char* - would yield unexpected results, as &x + 1 might not be 0.
So you couldn't just print a single character unless it was null-terminated (very inefficent).
EDIT: As other have pointed out, the two expect different things in the var args parameters - one a pointer, the other a single char. But that difference is somewhat clear.

The issue that is mentioned by others that a single character would have to be null terminated isn't a real one. This could be dealt with by providing a precision to the format %.1s would do the trick.
What is more important in my view is that for %s in any of its forms you'd have to provide a pointer to one or several characters. That would mean that you wouldn't be able to print rvalues (computed expressions, function returns etc) or register variables.
Edit: I am really pissed off by the reaction to this answer, so I will probably delete this, this is really not worth it. It seems that people react on this without even having read the question or knowing how to appreciate the technicality of the question.
To make that clear: I don't say that you should prefer %.1s over %c. I only say that reasons why %c cannot be replaced by that are different than the other answer pretend to tell. These other answers are just technically wrong. Null termination is not an issue with %s.

The printf function is a variadic function, meaning that it has variable number of arguments. Arguments are pushed on the stack before the function (printf) is called. In order for the function printf to use the stack, it needs to know information about what is in the stack, the format string is used for that purpose.
e.g.
printf( "%c", ch ); tells the function the argument 'ch'
is to be interpreted as a character and sizeof(char)
whereas
printf( "%s", s ); tells the function the argument 's' is a pointer
to a null terminated string sizeof(char*)
it is not possible inside the printf function to otherwise determine stack contents e.g. distinguishing between 'ch' and 's' because in C there is no type checking during runtime.

%s says print all the characters until you find a null (treat the variable as a pointer).
%c says print just one character (treat the variable as a character code)
Using %s for a character doesn't work because the character is going to be treated like a pointer, then it's going to try to print all the characters following that place in memory until it finds a null
Stealing from the other answers to explain it in a different way.
If you wanted to print a character using %s, you could use the following to properly pass it an address of a char and to keep it from writing garbage on the screen until finding a null.
char c = 'c';
printf('%.1s', &c);

For %s, we need provide the address of string, not its value.
For %c, we provide the value of characters.
If we used the %s instead of %c, how would we provide a '\0' after the characters?

Id like to add another point of perspective to this fun question.
Really this comes down to data typing. I have seen answers on here that state that you could provide a pointer to the char, and provide a
"%.1s"
This could indeed be true. But the answer lies in the C designer's trying to provide flexibility to the programmer, and indeed a (albeit small) way of decreasing footprint of your application.
Sometimes a programmer might like to run a series of if-else statements or a switch-case, where the need is to simply output a character based upon the state. For this, hard coding the the characters could indeed take less actual space in memory as the single characters are 8 bits versus the pointer which is 32 or 64 bits (for 64 bit computers). A pointer will take up more space in memory.
If you would like to decrease the size through using actual chars versus pointers to chars, then there are two ways one could think to do this within printf types of operators. One would be to key off of the .1s, but how is the routine supposed to know for certain that you are truly providing a char type versus a pointer to a char or pointer to a string (array of chars)? This is why they went with the "%c", as it is different.
Fun Question :-)

C has the %c and %s format specifiers because they handle different types.
A char and a string are about as different as night and 1.

%c expects a char, which is an integer value and prints it according to encoding rules.
%s expects a pointer to a location of memory that contains char values, and prints the characters in that location according to encoding rules until it finds a 0 (null) character.
So you see, under the hood, the two cases while they look alike they have not much in common, as one works with values and the other with pointers. One is instructions for interpreting a specific integer value as an ascii char, and the other is iterating the contents of a memory location char by char and interpreting them until a zero value is encountered.

I have done a experiment with printf("%.1s", &c) and printf("%c", c).
I used the code below to test, and the bash's time utility the get the runing time.
#include<stdio.h>
int main(){
char c = 'a';
int i;
for(i = 0; i < 40000000; i++){
//printf("%.1s", &c); get a result of 4.3s
//printf("%c", c); get a result of 0.67s
}
return 0;
}
The result says that using %c is 10 times faster than %.1s. So, althought %s can do the job of %c, %c is still needed for performance.

Since no one has provided an answer with ANY reference whatsoever, here is a printf specification from pubs.opengroup.com which is similar to the format definition from IBM
%c
The int argument shall be converted to an unsigned char, and the resulting byte shall be written.
%s
The argument shall be a pointer to an array of char. Bytes from the array shall be written up to (but not including) any terminating null byte. If the precision is specified, no more than that many bytes shall be written. If the precision is not specified or is greater than the size of the array, the application shall ensure that the array contains a null byte.