What is wrong with this code?
#include <stdio.h>
#include <stdarg.h>
void myprintf(const char * format, ...) __printflike(1, 2);
int main(int argc, const char * argv[]) {
    printf("%s\n");
    myprintf("%s\n");
    return 0;
}
void myprintf(const char * format, ...) {
    if (format) {
        va_list arguments;
        va_start(arguments, format);
        vprintf(format, arguments);
        va_end(arguments);
    }
}
By using __printflike I get a nice warning, just like with printf. But unlike printf, which at least prints trash, myprintf crashes with EXC_BAD_ACCESS on the call to vprintf.
Is there any way I can make this work?
Thanks!
UPDATE: I understand that by calling a function with the wrong number of arguments I get undefined behavior, but I'd like myprintf to behave just like printf does (without crashing). Is that possible? Is there any way I can check the arguments before calling vprintf to avoid the exception?
UPDATE 2: I think I've got it now; thanks for all the comments and answers. For this very simple example I think it is better to use a macro, which fails fast and crashes at the calling point.
Undefined means unpredictable. In one run printf may produce trash; in another it may crash with EXC_BAD_ACCESS instead. You cannot rely on undefined behavior being reproducible. In this particular case, the %s in the format string tells printf that it needs to find a pointer to a C string. Since you did not pass a second argument, printf (depending on your libc implementation and calling convention) picks up whatever value happens to be where that argument would have been. If that value points to readable memory with a null character not far away, you get trash output. If not, printf keeps searching for the end of the string until it runs out of the memory assigned to your program, and you get EXC_BAD_ACCESS.
It is not possible - at least not in a portable way - to determine how many arguments are passed to a function. The format specifier is the only way for printf to determine how many values are to be popped from the stack so entering a wrong format specifier gives you undefined behavior. It is just one of those C things you need to learn and then move on.
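A common workaround, when a variadic function really needs to know how many arguments it got, is to make the caller state it explicitly, which is exactly the role printf's format string plays. A minimal sketch using a hypothetical sum_ints function (not from any library); note that the callee still has to trust the count, so passing a wrong count is undefined behavior, just like a wrong format specifier:

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: the first parameter says how many ints follow,
   much as printf's format string says how many values to fetch. */
int sum_ints(int count, ...)
{
    assert(count >= 0);
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* trusts count: passing fewer ints is UB */
    va_end(ap);
    return total;
}
```

For example, sum_ints(3, 1, 2, 3) returns 6.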
By trying to "correct" something like this you may just make the code more unreadable and hard for other people to understand.
Is there any way I can check the arguments before calling vprintf to avoid the exception?
Just one: take the compiler's warning seriously and eliminate the programming bug it points you to.
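Applied to the question, the fix is at the call site: give every %s its argument. A minimal sketch of the wrapper, assuming GCC or Clang; it spells the check as __attribute__((format(printf, 1, 2))), which is essentially what the macOS __printflike(1, 2) macro expands to, and (as a hypothetical change from the question) it returns vprintf's result instead of void:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Argument 1 is the format; checking of the variadic arguments starts at 2. */
int myprintf(const char *format, ...) __attribute__((format(printf, 1, 2)));

int myprintf(const char *format, ...)
{
    assert(format != NULL);                     /* a NULL format is a caller bug */
    va_list arguments;
    va_start(arguments, format);
    int written = vprintf(format, arguments);   /* chars printed, or negative on error */
    va_end(arguments);
    return written;
}
```

A call such as myprintf("%s\n", "hello") prints hello and returns 6, while myprintf("%s\n") still compiles but draws the format warning (with -Wall), which is your cue to fix the call rather than to guard against it at run time.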
Look at it this way: it's becoming winter now, there will be mud and snow on the streets (in Europe at least), so you have your favorite garage mount winter tires on your Porsche. When you get this nice car back, you find the following sticker on the dashboard (put there by the garage):
(approx. 100 miles/h)
This sticker reminds you, that the freshly mounted winter tires do not support the maximum speed of the car.
You wouldn't drive faster now, expecting the car to stop the moment the tires are about to break, would you?
It's up to you, the driver, to respect this warning!
;-)
According to my knowledge and some threads like this, if you want to print strings in C you have to do something like this:
printf("%s some text", value);
And the value will be displayed instead of %s.
I wrote this code:
char password[] = "default";
printf("Enter name: \n");
scanf("%s", password);
printf("%s is your password", password); // All good - the print is as expected
But I noticed that I can do the exact same thing without the value part and it will still work:
printf("%s is your password");
So my question is why does the %s placeholder get a value without me giving it one, and how does it know what value to give it?
This is undefined behavior: anything can happen, including something that looks correct. But it is incorrect.
Your compiler can probably tell you about the problem if you enable the right warnings (for example -Wall with GCC or Clang).
The standard says:
7.21.6.1 The fprintf function
The fprintf function writes output to the stream pointed to by stream,
under control of the string pointed to by format that specifies how
subsequent arguments are converted for output. If there are
insufficient arguments for the format, the behavior is undefined. If
the format is exhausted while arguments remain, the excess arguments
are evaluated (as always) but are otherwise ignored. The fprintf
function returns when the end of the format string is encountered.
The printf() function uses a C language feature that lets you pass a variable number of arguments to a function. (Technically called 'variadic functions' - https://en.cppreference.com/w/c/variadic - I'll just say 'varargs' for short.)
When a function is called in C, the arguments to the function are pushed onto the stack(*) - but the design of the varargs feature provides no way for the called function to know how many parameters were passed in.
When the printf() function executes, it scans the format string, and the %s tells it to look for a string in the next position in the variable argument list. Since there are no more arguments in the list, the code 'walks off the end of the array' and grabs the next thing it sees in memory. I suspect what's happening is that the next location in memory still has the address of password from your prior call to scanf, and since that address points to a string, and you told printf to print a string, you got lucky, and it worked.
Try putting another function call (for example printf("%s %s %s\n", "X", "Y", "Z")) between your call to scanf("%s", password); and printf("%s is your password");, and you will almost certainly see different behavior.
Free Advice: C has a lot of sharp corners and undefined bits, but a good compiler (and static analysis or 'lint' tool) can warn you about a lot of common errors. If you are going to work in C, learn how to crank your compiler warnings to the max, learn what all the errors and warnings mean (as they happen, not all at once!) and force yourself to write C code that compiles without any warnings. It will save you a lot of unnecessary hassle.
(*) generalizing here for simplicity - sometimes arguments can be passed in registers, sometimes things are inlined, blah blah blah.
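To illustrate why the callee needs help: since it cannot count its arguments, some convention has to tell it when to stop. printf uses the format string; another common convention is a terminating NULL, as with the POSIX execl function. A minimal sketch with a hypothetical count_strings function; the sentinel attribute is a GCC/Clang extension that warns when a call forgets the trailing NULL:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical example: walks the variadic list until it meets NULL. */
int count_strings(const char *first, ...) __attribute__((sentinel));

int count_strings(const char *first, ...)
{
    int n = 0;
    va_list ap;
    va_start(ap, first);
    /* Each va_arg call simply takes "the next argument"; nothing tells it
       whether the caller actually passed one. */
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *))
        n++;
    va_end(ap);
    return n;
}
```

count_strings("a", "b", "c", (char *)NULL) returns 3. Leaving out the NULL is the same class of bug as leaving out printf's argument: va_arg just keeps reading whatever comes next.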
So, there are a lot of posts telling that you shouldn't do printf("%s is your password");, and that you were just lucky. I guess from your question that you somewhat knew that. But few are telling you the probable reason for why you were lucky.
To understand what probably happened, we have to understand how function parameters are passed. The caller of a function must put the parameters in an agreed-upon place for the function to find them. So for parameters 1...N we call these places r1 ... rN. (This kind of agreement is part of something we call a "Function Calling Convention".)
That means that this code:
scanf("%s", password);
printf("%s is your password",password);
may be turned into this pseudo-code by the compiler
r1="%s";
r2=password;
call scanf;
r1="%s is your password";
r2=password;
call printf;
If you now remove the second parameter from the printf call, your pseudo-code will look like this:
r1="%s";
r2=password;
call scanf;
r1="%s is your password";
call printf;
Be aware that after call scanf;, r2 might be unmodified and still hold password, which is why call printf; appears to "work".
You might think that you have discovered a new way to optimize code, by eliminating one of the r2=password; assignments. This might be true for old "dumb" compilers, but not for modern ones.
Modern compilers already do this when it is safe, and it is not always safe. Reasons why it isn't safe might be that scanf and printf have different calling conventions, that r2 might have been modified behind your back, etc.
To get a better feeling for what the compiler is doing, I recommend looking at the assembler output from your compiler at different optimization levels.
And please, always compile with -Wall. The compiler is often good at telling you when you are doing dumb stuff.
In a book that I'm reading, it's written that printf with a single argument (without conversion specifiers) is deprecated. It recommends to substitute
printf("Hello World!");
with
puts("Hello World!");
or
printf("%s", "Hello World!");
Can someone tell me why printf("Hello World!"); is wrong? It is written in the book that it contains vulnerabilities. What are these vulnerabilities?
printf("Hello World!"); is IMHO not vulnerable but consider this:
const char *str;
...
printf(str);
If str happens to point to a string containing %s format specifiers, your program will exhibit undefined behaviour (mostly a crash), whereas puts(str) will just display the string as is.
Example:
printf("%s"); //undefined behaviour (mostly crash)
puts("%s"); // displays "%s\n"
printf("Hello world");
is fine and has no security vulnerability.
The problem lies with:
printf(p);
where p is a pointer to input that is controlled by the user. It is prone to format string attacks: a user can insert conversion specifications to take control of the program, e.g., %x to dump memory or %n to overwrite memory.
Note that puts("Hello world") is not equivalent in behavior to printf("Hello world") but to printf("Hello world\n"). Compilers usually are smart enough to optimize the latter call to replace it with puts.
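Since puts appends a newline and printf does not, a stream version of puts makes the equivalence explicit. In this minimal sketch (puts_to is a made-up name, not a standard function), puts(s) behaves like fputs(s, stdout) followed by a newline, i.e. like printf("%s\n", s), and none of them interprets % characters in s:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: what puts does, but for any stream. */
int puts_to(FILE *stream, const char *s)
{
    assert(stream != NULL && s != NULL);
    if (fputs(s, stream) == EOF || fputc('\n', stream) == EOF)
        return EOF;
    return 0;   /* puts itself only promises a nonnegative value on success */
}
```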
Further to the other answers, printf("Hello world! I am 50% happy today") is an easy bug to make, potentially causing all manner of nasty memory problems (it's UB!).
It's just simpler, easier and more robust to "require" programmers to be absolutely clear when they want a verbatim string and nothing else.
And that's what printf("%s", "Hello world! I am 50% happy today") gets you. It's entirely foolproof.
(Steve, of course printf("He has %d cherries\n", ncherries) is absolutely not the same thing; in this case, the programmer is not in "verbatim string" mindset; she is in "format string" mindset.)
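When you are in "format string" mindset and actually want a percent sign in the output, the fix for the 50% example is to escape it as %%. A minimal sketch using snprintf into a caller-supplied buffer (format_mood is a hypothetical name):

```c
#include <assert.h>
#include <stdio.h>

/* A lone % would start a conversion specification; %% prints one '%'. */
int format_mood(char *buf, size_t size, int percent)
{
    assert(buf != NULL);
    return snprintf(buf, size, "I am %d%% happy today", percent);
}
```

With percent set to 50, the buffer holds "I am 50% happy today" and snprintf returns 20.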
I'll just add a bit of information regarding the vulnerability part here.
It's said to be vulnerable because of the printf format string vulnerability. In your example, where the string is hardcoded, it's harmless (even if hardcoding strings like this is never fully recommended). But specifying the argument types explicitly is a good habit to get into. Take this example:
If someone puts format characters into the string you pass to printf instead of a regular string (say, if you print text read from stdin), printf will take whatever it can find on the stack.
This was (and still is) widely used to exploit programs, for example to explore the stack for hidden information or to bypass authentication.
Example (C):
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf(argv[argc - 1]); // prints the last argument (or the program name if none was given)
}
If I pass "%08x %08x %08x %08x %08x\n" as the argument to this program, the call is effectively:
printf ("%08x %08x %08x %08x %08x\n");
This instructs the printf-function to retrieve five parameters from the stack and display them as 8-digit padded hexadecimal numbers. So a possible output may look like:
40012980 080628c4 bffff7a4 00000005 08059c04
See this for a more complete explanation and other examples.
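The fix for the example program above is to make the user's text the data for a "%s" conversion rather than the format itself. A minimal sketch, written against a caller-supplied buffer so it is easy to test (echo_last_arg is a hypothetical name); in the real program the equivalent line is simply printf("%s", argv[argc - 1]):

```c
#include <assert.h>
#include <stdio.h>

/* The argument is copied literally: "%08x" or "%n" inside it are never
   interpreted, so no extra arguments are fetched or written through. */
int echo_last_arg(char *buf, size_t size, int argc, char *argv[])
{
    assert(buf != NULL && argc >= 1 && argv != NULL);
    return snprintf(buf, size, "%s", argv[argc - 1]);
}
```

Given the argument "%08x %08x %n", the buffer receives exactly those 12 characters.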
This is misguided advice. Yes, if you have a run-time string to print,
printf(str);
is quite dangerous, and you should always use
printf("%s", str);
instead, because in general you can never know whether str might contain a % sign. However, if you have a compile-time constant string, there's nothing whatsoever wrong with
printf("Hello, world!\n");
(Among other things, that is the most classic C program ever, literally from the C programming book of Genesis. So anyone deprecating that usage is being rather heretical, and I for one would be somewhat offended!)
Calling printf with literal format strings is safe and efficient, and there
exist tools to automatically warn you if your invocation of printf with user
provided format strings is unsafe.
The most severe attacks on printf take advantage of the %n format
specifier. In contrast to all other format specifiers, e.g. %d, %n actually
writes a value to a memory address provided in one of the format arguments.
This means that an attacker can overwrite memory and thus potentially take
control of your program. Wikipedia
provides more detail.
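To see what %n does when used legitimately, here is a minimal sketch with a literal format string (chars_before_marker is a hypothetical name). %n prints nothing; it stores the number of characters produced so far into the int its argument points to. Some C libraries restrict it (Microsoft's CRT, for instance, disables %n by default), so treat this as an illustration rather than something to rely on:

```c
#include <assert.h>
#include <stdio.h>

/* Returns how many characters were produced before the %n directive.
   Safe here only because the format string is a literal we control. */
int chars_before_marker(void)
{
    char buf[32];
    int count = -1;
    snprintf(buf, sizeof buf, "hello%n, world", &count);
    return count;
}
```

With glibc this returns 5, the length of "hello". An attacker who controls the format string can aim that same write at any address that happens to be in the argument slot.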
If you call printf with a literal format string, an attacker cannot sneak
a %n into your format string, and you are thus safe. In fact,
gcc will change a call such as printf("Hello, world!\n") into a call to puts, so
there literally isn't any difference (test this by running gcc -O3 -S).
If you call printf with a user provided format string, an attacker can
potentially sneak a %n into your format string, and take control of your
program. Your compiler will usually warn you that this is unsafe; see
-Wformat-security. There are also more advanced tools that ensure that
an invocation of printf is safe even with user provided format strings, and
they might even check that you pass the right number and type of arguments to
printf. For example, for Java there is Google's Error Prone
and the Checker Framework.
A rather nasty aspect of printf is that even on platforms where the stray memory reads could only cause limited (and acceptable) harm, one of the formatting characters, %n, causes the next argument to be interpreted as a pointer to a writable integer, and causes the number of characters output thus far to be stored to the variable identified thereby. I've never used that feature myself, and sometimes I use lightweight printf-style methods which I've written to include only the features I actually use (and don't include that one or anything similar) but feeding standard printf functions strings received from untrustworthy sources may expose security vulnerabilities beyond the ability to read arbitrary storage.
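In the spirit of those lightweight printf-style methods, here is a hypothetical sketch (mini_format is not a real library function) of a formatter that supports only %s, %d and %%. Anything else, including %n, is copied to the output literally, so no format string can make it write through a pointer. It still trusts the caller to pass a matching argument for each %s and %d, so it is not a defense against missing arguments:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Supports %s, %d and %% only; unknown directives are copied literally.
   Output is truncated to fit and always NUL-terminated.
   Returns the number of characters stored, excluding the NUL. */
size_t mini_format(char *out, size_t size, const char *fmt, ...)
{
    assert(out != NULL && size > 0 && fmt != NULL);
    size_t len = 0;
    va_list ap;
    va_start(ap, fmt);

    for (const char *p = fmt; *p != '\0'; p++) {
        char tmp[24];               /* sign plus digits of any int up to 64 bits */
        const char *piece = tmp;
        size_t piece_len = 0;

        if (p[0] == '%' && p[1] == '%') {
            tmp[piece_len++] = '%';
            p++;
        } else if (p[0] == '%' && p[1] == 's') {
            piece = va_arg(ap, const char *);
            piece_len = strlen(piece);
            p++;
        } else if (p[0] == '%' && p[1] == 'd') {
            int v = va_arg(ap, int);
            /* Use the unsigned magnitude so INT_MIN does not overflow. */
            unsigned int mag = v < 0 ? 0u - (unsigned int)v : (unsigned int)v;
            char digits[24];
            size_t nd = 0;
            do {
                digits[nd++] = (char)('0' + mag % 10u);
                mag /= 10u;
            } while (mag != 0u);
            if (v < 0)
                tmp[piece_len++] = '-';
            while (nd > 0)
                tmp[piece_len++] = digits[--nd];
            p++;
        } else {
            tmp[piece_len++] = *p;  /* ordinary character or unsupported '%' */
        }

        for (size_t i = 0; i < piece_len && len + 1 < size; i++)
            out[len++] = piece[i];
    }

    va_end(ap);
    out[len] = '\0';
    return len;
}
```

For example, mini_format(buf, sizeof buf, "%s is %d%%", "x", 50) stores "x is 50%" and returns 8, while a stray "%n" in the format simply appears as the two characters %n.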
Since no one has mentioned it, I'll add a note regarding their performance.
Under normal circumstances, assuming no compiler optimisations are used (i.e. printf() actually calls printf() and not fputs()), I would expect printf() to perform less efficiently, especially for long strings. This is because printf() has to parse the string to check if there are any conversion specifiers.
To confirm this, I have run some tests. The testing is performed on Ubuntu 14.04, with gcc 4.8.4. My machine uses an Intel i5 CPU. The program being tested is as follows:
#include <stdio.h>

int main() {
    int count = 10000000;
    while(count--) {
        // either
        printf("qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM");
        // or
        fputs("qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM", stdout);
    }
    fflush(stdout);
    return 0;
}
Both are compiled with gcc -Wall -O0. Time is measured using time ./a.out > /dev/null. The following is the result of a typical run (I've run them five times, all results are within 0.002 seconds).
For the printf() variant:
real 0m0.416s
user 0m0.384s
sys 0m0.033s
For the fputs() variant:
real 0m0.297s
user 0m0.265s
sys 0m0.032s
This effect is amplified if you have a very long string.
#include <stdio.h>
#define STR "qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM"
#define STR2 STR STR
#define STR4 STR2 STR2
#define STR8 STR4 STR4
#define STR16 STR8 STR8
#define STR32 STR16 STR16
#define STR64 STR32 STR32
#define STR128 STR64 STR64
#define STR256 STR128 STR128
#define STR512 STR256 STR256
#define STR1024 STR512 STR512
int main() {
    int count = 10000000;
    while(count--) {
        // either
        printf(STR1024);
        // or
        fputs(STR1024, stdout);
    }
    fflush(stdout);
    return 0;
}
For the printf() variant (ran three times, real plus/minus 1.5s):
real 0m39.259s
user 0m34.445s
sys 0m4.839s
For the fputs() variant (ran three times, real plus/minus 0.2s):
real 0m12.726s
user 0m8.152s
sys 0m4.581s
Note: After inspecting the assembly generated by gcc, I realised that gcc optimises the fputs() call to an fwrite() call, even with -O0. (The printf() call remains unchanged.) I am not sure whether this will invalidate my test, as the compiler calculates the string length for fwrite() at compile-time.
For gcc it is possible to enable specific warnings for checking printf() and scanf().
The gcc documentation states:
-Wformat is included in -Wall. For more control over some aspects
of format checking, the options -Wformat-y2k,
-Wno-format-extra-args, -Wno-format-zero-length,
-Wformat-nonliteral, -Wformat-security, and -Wformat=2 are
available, but are not included in -Wall.
The -Wformat which is enabled within the -Wall option does not enable several special warnings that help to find these cases:
-Wformat-nonliteral will warn if you do not pass a string literal as the format string.
-Wformat-security will warn if you pass a string that might contain a dangerous construct. It's a subset of -Wformat-nonliteral.
I have to admit that enabling -Wformat-security revealed several bugs we had in our codebase (logging module, error handling module, xml output module, all had some functions that could do undefined things if they had been called with % characters in their parameter. For info, our codebase is now around 20 years old and even if we were aware of these kind of problems, we were extremely surprised when we enabled these warnings how many of these bugs were still in the codebase).
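For wrappers like those logging functions, GCC and Clang can extend the same checks to your own code through the format attribute. A minimal sketch with hypothetical function names: the variadic front end is declared format(printf, 3, 4) so each caller's arguments are checked, while the va_list back end uses 0 as the last number, which per the GCC documentation means only the format string itself can be checked. A message that is already plain text goes through "%s":

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* va_list back end: arguments are not visible, so the third number is 0. */
int vlog_message(char *buf, size_t size, const char *fmt, va_list ap)
    __attribute__((format(printf, 3, 0)));

/* Variadic front end: the compiler checks every caller's arguments. */
int log_message(char *buf, size_t size, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

int vlog_message(char *buf, size_t size, const char *fmt, va_list ap)
{
    assert(buf != NULL && fmt != NULL);
    return vsnprintf(buf, size, fmt, ap);
}

int log_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vlog_message(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

/* Text that may contain '%' is data for "%s", never the format. */
int log_plain(char *buf, size_t size, const char *msg)
{
    return log_message(buf, size, "%s", msg);
}
```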
With GCC, printf("Hello World\n")
typically compiles to the equivalent of
puts("Hello World")
You can check this by disassembling your executable:
push rbp
mov rbp,rsp
mov edi,str.Helloworld!
call dword imp.puts
mov eax,0x0
pop rbp
ret
using
char *variable;
...
printf(variable)
can lead to security issues; don't ever use printf that way!
So your book is actually correct: using printf with a single variable argument is deprecated, but you can still use printf("my string\n") because it will automatically become puts.
Besides the other well-explained answers, which cover all the side concerns, I would like to give a precise and concise answer to the question as asked.
Why is printf with a single argument (without conversion specifiers) deprecated?
A printf call with a single argument is in general not deprecated, and it has no vulnerabilities when used properly, as you always should code.
C users all over the world, from beginners to experts, use printf that way to print a simple text phrase to the console.
Furthermore, one has to distinguish whether this one and only argument is a string literal or a pointer to a string; the latter is valid but less common. For the latter, of course, inconvenient output or any kind of undefined behavior can occur when the pointer does not point to a valid string, but these things can also occur when the format specifiers do not match the respective arguments in a call with multiple arguments.
Of course, it is also not right and proper for the string provided as the one and only argument to contain any format or conversion specifiers, since no conversion is going to happen.
That said, giving a simple string literal like "Hello World!" as only argument without any format specifiers inside that string like you provided it in the question:
printf("Hello World!");
is not deprecated or "bad practice" at all nor has any vulnerabilities.
In fact, many programmers began learning C, or even programming in general, with that Hello World program, with this printf statement as the first of its kind.
That would hardly be the case if it were deprecated.
In a book that I'm reading, it's written that printf with a single argument (without conversion specifiers) is deprecated.
Well, then I would question the book or its author. If an author really makes such assertions, which are in my opinion incorrect, and even teaches them without explicitly explaining why (if those assertions really are stated that way in the book), I would consider it a bad book. A good book, by contrast, explains why certain programming methods or functions should be avoided.
According to what I said above, using printf with only one argument (a string literal) and without any format specifiers is not in any case deprecated or considered as "bad practice".
You should ask the author what he meant by that, or even better, ask him to clarify or correct the relevant section in the next edition or printing.
I understand that if printf is given no arguments it outputs an unexpected value.
Example:
#include <stdio.h>

int main() {
    int test = 4 * 4;
    printf("The answer is: %d\n");
    return 0;
}
This prints a random number. After playing around with different formats such as %p, %x etc., it doesn't print 16 (because I didn't pass the variable as an argument). What I'd like to know is: where are these values being taken from? Is it the top of the stack? It's not a new value every time I compile, which is weird; it's like a fixed value.
printf("The answer is: %d\n");
invokes undefined behavior. C requires a conversion specifier to have an associated argument. While it is undefined behavior and anything can happen, on most systems you end up dumping the stack. It's the kind of trick used in format string attacks.
It is called undefined behavior and it is scary (see this answer).
If you want an explanation, you need to dive into implementation-specific details. So study the generated assembly code (e.g. compile with gcc -S -Wall -Wextra -fverbose-asm plus your optimization flags, then look into the generated .s assembly file) and the ABI of your system.
The printf function will go looking for the argument on the stack, even if you don't supply one. Anything that's on there will be used, if it can't find an integer argument. Most times, you will get nonsensical data. The data chosen varies depending on the settings of your compiler. On some compilers, you may even get 16 as a result.
For example:
int printf(char*, int d){...}
This would be how printf works (not really, just an example). It doesn't return an error if d is null or empty; it just looks on the stack for the argument that's supposed to be there to display.
printf is a variadic function. Most compilers push arguments onto the stack and then call the function, but, depending on machine, operating system, calling convention, number of arguments, etc., there are also other values pushed onto the stack, which might be constant in your function.
printf reads this area of memory and prints it.
I recently came across a rather unusual coding convention wherein the call for a function returning "void" is prefixed with (void).
e.g.
(void) MyFunction();
Is it any different from the function call like:
MyFunction();
Has it got any advantage or is it yet another needless but there coding convention of some sort?
Some functions like printf() return a value that is almost never used in real code (in the case of printf, the number of characters printed). However, some tools, like lint, expect that if a function returns a value it must be used, and will complain unless you write something like:
int n = printf( "hello" );
using the void cast:
(void) printf( "hello" );
is a way of telling such tools you really don't want to use the return value, thus keeping them quiet. If you don't use such tools, you don't need to bother, and in any case most tools allow you to configure them to ignore return values from specific functions.
No, there isn't any difference -- what's being cast to void is the return value of the function.
I'd say it could make sense if you wanted to make explicit that you're not using the return value (you're calling it for the side effects), but as the function already has a void return type, it doesn't make much sense.
If the function returns something, the (void) cast could avoid (!) a warning on some compilers or lint tools (though I was not able to make gcc warn me that the return value is lost); but more importantly, it makes clear that the return value is thrown away on purpose (and not by mistake).
Academically: a "function" always returns something, otherwise it would be a procedure. So the author of this code wants to say "I know this naming is wrong, but I will not change the name, so I make this disturbance visible".
Has it got any advantage or is it yet another needless but there coding convention of some sort?
No difference. It is a quite common convention e.g. in software testing to highlight the fact that in context the function return, if any, is safe to be discarded.
In HPUX man pages it is quite common in example code to see a cast to void to get around lint warnings.
fprintf(mystream, "%s\n", "foo");
vs.
(void)fprintf(mystream, "%s\n", "foo");
That may be where the author of the code is coming from. IMO, discarding the result is not a great idea, because most of the sprintf family, for example, can call malloc internally, and malloc will fail when there is not enough memory. For printf() family members, a signal such as SIGINT can also interrupt the underlying write() system call so that not all of the buffer gets written.
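If the return value does matter, as this answer argues, the alternative to a (void) cast is to check it. A minimal sketch (write_foo is a hypothetical name): fprintf returns a negative value on an output error, and because output is buffered, some errors only show up when the stream is flushed:

```c
#include <assert.h>
#include <stdio.h>

/* Returns 0 on success, -1 if writing or flushing failed. */
int write_foo(FILE *stream)
{
    assert(stream != NULL);
    if (fprintf(stream, "%s\n", "foo") < 0)
        return -1;
    if (fflush(stream) == EOF)   /* buffered errors may surface only here */
        return -1;
    return 0;
}
```

When you genuinely do not care about the outcome, (void)fprintf(...) remains a reasonable way to say so explicitly.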
Is there any way to access the command line arguments, without using the argument to main? I need to access it in another function, and I would prefer not passing it in.
I need a solution that only necessarily works on Mac OS and Linux with GCC.
I don't know how to do it on MacOS, but I suspect the trick I will describe here can be ported to MacOS with a bit of cross-reading.
On Linux you can use the so-called ".init_array" section of the ELF binary to register a function which gets called during program initialization (before main() is called). This function has the same signature as the normal main() function, except that it returns "void".
Thus, you can use this function to remember or process argc, argv[] and envp[].
Here is some code you can use:
static void my_cool_main(int argc, char* argv[], char* envp[])
{
    // your code goes here
}

__attribute__((section(".init_array"))) void (* p_my_cool_main)(int,char*[],char*[]) = &my_cool_main;
PS: This code can also be put in a library, so it should fit your case.
It even works when your program is run under valgrind: valgrind does not fork a new process, so /proc/self/cmdline would show the original valgrind command line, whereas this function receives your program's own arguments.
PPS: Keep in mind that during this very early stage of program execution many subsystems are not yet fully initialized. I tried libc I/O routines and they seem to work, but don't rely on it; even global variables might not yet be constructed, etc.
In Linux, you can open /proc/self/cmdline (assuming that /proc is present) and parse manually (this is only required if you need argc/argv before main() - e.g. in a global constructor - as otherwise it's better to pass them via global vars).
More solutions are available here: http://blog.linuxgamepublishing.com/2009/10/12/argv-and-argc-and-just-how-to-get-them/
Yeah, it's gross and unportable, but if you are solving practical problems you may not care.
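For the /proc/self/cmdline route, a minimal Linux-only sketch (count_cmdline_args is a hypothetical name). The file holds the arguments as NUL-terminated strings laid end to end, so counting the NUL bytes gives argc; splitting on them would give argv:

```c
#include <assert.h>
#include <stdio.h>

/* Returns the number of command-line arguments (including the program
   name), or -1 if /proc/self/cmdline cannot be opened. */
int count_cmdline_args(void)
{
    FILE *f = fopen("/proc/self/cmdline", "rb");
    if (f == NULL)
        return -1;              /* e.g. /proc is not mounted */
    int count = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\0')
            count++;            /* each argument ends with a NUL byte */
    fclose(f);
    return count;
}
```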
You can copy them into global variables if you want.
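Copying them into globals could look like this minimal, portable sketch; the function names are hypothetical, and main() is expected to call save_args(argc, argv) once at startup. The C standard guarantees that the argv strings keep their values until the program terminates, so storing the pointers is enough:

```c
#include <assert.h>
#include <stddef.h>

static int g_argc;
static char **g_argv;

/* Call once from main(): save_args(argc, argv); */
void save_args(int argc, char *argv[])
{
    assert(argc >= 0 && argv != NULL);
    g_argc = argc;
    g_argv = argv;
}

int saved_argc(void) { return g_argc; }

/* Returns argument i, or NULL if i is out of range. */
char *saved_arg(int i)
{
    return (i >= 0 && i < g_argc) ? g_argv[i] : NULL;
}
```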
I do not think you should do this. The C runtime prepares the arguments and passes them to main via int argc, char **argv; do not attempt to change that behaviour by hacking around it, as the result would largely be unportable or possibly undefined behaviour! Stick to the rules and you will have portability; there is no other way of doing it than breaking them.
You can. Some platforms (notably the Microsoft C runtime on Windows) provide global variables __argc and __argv. But again, I support zneak's comment.
P.S. Use boost::program_options to parse them. Please do not do it any other way in C++.
Is there some reason why passing a pointer to space that is already consumed is so bad? You won't be getting any real savings out of eliminating the argument to the function in question and you could set off an interesting display of fireworks. Skirting around main()'s call stack with creative hackery usually ends up in undefined behavior, or reliance on compiler specific behavior. Both are bad for functionality and portability respectively.
Keep in mind the arguments in question are pointers to arguments, they are going to consume space no matter what you do. The convenience of an index of them is as cheap as sizeof(int), I don't see any reason not to use it.
It sounds like you are optimizing rather aggressively and prematurely, or you are stuck with having to add features into code that you really don't want to mess with. In either case, doing things conventionally will save both time and trouble.