Determine if a string is a valid wchar_t* in C

I'm trying to recode a part of printf.
setlocale(LC_ALL, "en_US.UTF-8");
int ret = printf("%S\n", "我是一只猫。");
printf("Printf returned %d\n", ret);
If the format is %s, printf writes the wide characters and returns 19.
If the format is %S, printf returns -1 because the argument is not a wide string (no L before "").
In my own implementation of printf, how can I determine whether the string passed as an argument is wide, so I can return -1 if it isn't?
Edit
I'm programming on OS X El Capitan (but I would have liked a portable solution, if one were possible).
In my programming environment, %S and %ls are the same; it doesn't really matter for my question here.
printf also returns -1 when I don't set a locale for the example with format %s. This is the only reason why I've set a locale.
I'm compiling with clang (Apple LLVM version 7.0.0 (clang-700.1.76))

Basically, you can't. Passing something that is not a wide string for %S is undefined behaviour: anything can happen, including dæmons flying out of your nose. You are lucky that printf catches it here. Most likely it detects that the contents of "我是一只猫。", when interpreted as an array of wchar_t, aren't all valid code points (when that happens, printf sets errno to EILSEQ).
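To make the EILSEQ remark concrete, here is a minimal sketch of the kind of conversion a printf implementation has to perform for %ls, written with the standard wcrtomb. The function name wide_to_mb_length is made up for illustration and is not taken from any particular libc. It can only reject wide characters that the current locale cannot encode; it has no way to tell whether the pointer really referred to a wide string in the first place.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns the number of bytes the wide string needs
 * in the current locale's multibyte encoding, or -1 if some character
 * cannot be converted (wcrtomb sets errno to EILSEQ in that case). */
long wide_to_mb_length(const wchar_t *ws)
{
    mbstate_t state;
    char buf[MB_LEN_MAX];
    long total = 0;

    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    for (; *ws != L'\0'; ++ws) {
        size_t n = wcrtomb(buf, *ws, &state);
        if (n == (size_t)-1)
            return -1;                 /* not representable in this locale */
        total += (long)n;
    }
    return total;
}
```

A printf-like function could return -1 when this helper does, which would mirror the behaviour seen in the question for wide characters the locale cannot encode.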

In my own implementation of printf, how can I determine if the string passed in parameter is wide, so I can return -1 if it isn't?
You cannot. The %S format specifier is documented in printf(3) as
(Not in C99 or C11, but in SUSv2, SUSv3, and SUSv4.) Synonym
for %ls. Don't use.
so you should probably not use it (it is not in the C11 standard, only in SUSv4). And if you did support it in your own printf, using it would be a promise by the caller that the corresponding actual argument is a wide string.
You might, however, if your C compiler is a recent GCC, use an appropriate format function attribute (a GCC extension) in the declaration of your printf (or printf-like) function. This would warn the users of your function about ill-typed arguments. You could even customize GCC (e.g. using MELT) by defining your own function attribute, which would enable extra type-checking at compile time. But there is no portable way, given a pointer to something, to check at runtime whether it is a pointer to a string or to something else (like an array of integers).
At runtime, your printf would use the stdarg(3) facilities, so it would have to "interpret" the format string to handle the various format specifiers appropriately. Without compiler support (à la __attribute__((format(printf,1,2))) in GCC, also supported by Clang, or with your own function attribute) you cannot get any compile-time type checking for variadic functions. And the type information is erased in C at runtime.
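As a rough sketch of the attribute approach, the wrapper below declares a printf-like function so that GCC and Clang check its arguments against the format string at compile time. The names my_snprintf and MY_PRINTF_LIKE are invented for this example, and the body simply forwards to vsnprintf rather than parsing the format itself.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical macro: expands to the format attribute on GCC/Clang only. */
#if defined(__GNUC__) || defined(__clang__)
#define MY_PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define MY_PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* Argument 3 is the format string, checked arguments start at position 4. */
int my_snprintf(char *buf, size_t size, const char *fmt, ...)
    MY_PRINTF_LIKE(3, 4);

int my_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);  /* the callee still trusts the format */
    va_end(ap);
    return n;
}
```

With this declaration, a call such as my_snprintf(buf, sizeof buf, "%ls", "narrow") should draw a -Wformat warning at compile time, but nothing is checked at runtime.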
Look also at existing implementations of printf-like functions in free software implementations of the C standard library. The stdio/vfprintf.c file of musl libc is quite readable.
Also, GNU libunistring has some elementary string-checking functions, e.g. u16_check, which checks whether an array (whose size is given) of 16-bit integers is a valid UTF-16 string. Notice that "我是一只猫。" in UTF-8 is not a zero-doublebyte or zero-widechar terminated UTF-16 string (so simply computing its length as a wchar_t* wide string is undefined behavior, because of buffer overflow!) and might not even have the required alignment for wide strings.

Related

Format overflow warning when trying to store a wide string

I'm currently learning C and lately, I have been focusing on the topic of character encoding. Note that I'm a Windows programmer. While I currently test my code only on Windows, I want to eventually port it to Linux and macOS, so I'm trying to learn the best practices right now.
In the example below, I store a file path in a wchar_t array to be opened later on with _wfopen. I need to use _wfopen because my file path may contain chars not in my default codepage. Afterwards, the file path and a text literal are stored inside a char array named message for further use. My understanding is that you can store a wide string into a multibyte string with the %ls modifier.
char message[8094] = "";
wchar_t file_path[4096] = L"C:\\test\\test.html";
sprintf(message, "Accessing: %ls\n", file_path);
While the code works, GCC/MinGW outputs the following warning and notes:
warning: '%ls' directive writing up to 49146 bytes into a region of size 8083 [-Wformat-overflow=]
note: assuming directive output of 16382 bytes
note: 'sprintf' output between 13 and 49159 bytes into a destination of size 8094
My issue is that I simply do not understand how sprintf could output up to 49159 bytes into the message variable. I output the Accessing: string literal, the file_path variable, the \n char and the \0 char. What else is there to output?
Sure, I could declare message as a wchar_t variable and use wsprintf instead of sprintf, but my understanding is that wchar_t does not make for nice portable code. As such, I'm trying to avoid using it unless it's required by a specific API.
So, what am I missing?
The warning doesn't take into account the actual contents of file_path; it is calculated on the assumption that file_path could hold any possible content. There would be an overflow if file_path consisted of 4095 emoji and a null terminator.
Using %ls with the narrow printf family converts the source to multibyte characters, and each wide character can take several bytes.
To avoid this warning you could:
disable it with -Wno-format-overflow
use snprintf instead of sprintf
The latter is always a good idea IMHO, it is always good to have a second line of defence against mistakes introduced in code maintenance later (e.g. someone comes along and changes the code to grab a path from user input instead of hardcoded value).
Afterword: be very careful using wide characters and the printf family in MinGW, which implements the printf family by calling MSVCRT, which does not follow the C Standard.
To get closer to standard behaviour, use a build of MinGW-w64 that attempts to implement the stdio library functions itself instead of deferring to MSVCRT (e.g. the MSYS2 build).
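The snprintf suggestion could look roughly like the sketch below. The helper name format_access_message and its return convention are assumptions made for this example. It treats both a conversion failure and a truncated result as errors, relying on the standard rule that snprintf returns the length the full output would have had.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: formats "Accessing: <path>\n" into dst.
 * Returns the number of bytes written (excluding the terminator),
 * or -1 if the conversion failed or the output did not fit. */
int format_access_message(char *dst, size_t size, const wchar_t *path)
{
    int n = snprintf(dst, size, "Accessing: %ls\n", path);

    if (n < 0)
        return -1;              /* e.g. a wide character had no multibyte form */
    if ((size_t)n >= size)
        return -1;              /* output was truncated to fit dst */
    return n;
}
```

In the question's code this would be called as format_access_message(message, sizeof message, file_path), and the caller decides what to do when it reports -1.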

Why printf is not able to handle flags, field width and precisions properly?

I'm trying to discover all the capabilities of printf, and I have tried this:
printf("Test:%+*0d", 10, 20);
that prints
Test:%+100d
I first used the flag +, then the width *, and then the flag 0.
Why does it produce this output? I purposely used printf() in a bad way, but I wonder why it shows me the number 100.
This is because you're supplying an invalid format string, so the implementation is free to do whatever it wants. Related reading: undefined behavior.
Compile your code with warnings enabled and it will tell you something like
warning: unknown conversion type character ‘0’ in format [-Wformat=]
printf("Test:%+*0d", 10, 20);
^
To be correct, the statement should be either of
printf("Test:%+*.0d", 10, 20); // note the '.'
where, the 0 is used as a precision
Related, quoting the C11, chapter §7.21.6.1, (emphasis mine)
An optional precision that gives the minimum number of digits to appear for the d, i,
o, u, x, and X conversions, the number of digits to appear after the decimal-point
character for a, A, e, E, f, and F conversions, the maximum number of significant
digits for the g and G conversions, or the maximum number of bytes to be written for s conversions. The precision takes the form of a period (.) followed either by an
asterisk * (described later) or by an optional decimal integer; if only the period is
specified, the precision is taken as zero. If a precision appears with any other
conversion specifier, the behavior is undefined.
printf("Test:%+0*d", 10, 20);
where, the 0 is used as a flag. As per the syntax, all the flags should appear together, before any other conversion specification entry, you cannot just put it anywhere in the conversion specification and expect the compiler to follow your intention.
Again, to quote, (and my emphasis)
Each conversion specification is introduced by the character %. After the %, the following
appear in sequence:
Zero or more flags (in any order) [...]
An optional minimum field width [...]
An optional precision [...]
An optional length modifier [...]
A conversion specifier [....]
Your printf format is incorrect: the flags must precede the width specifier.
After it handles * as the width specifier, printf expects a ., a length modifier, or a conversion specifier. 0 is none of these, so the behavior is undefined.
Your library's implementation of printf does something bizarre: it seems to handle * by replacing it with the actual width argument, a side effect of the implementation. Others may do something else, including aborting the program. Such a format error would be especially risky if followed by a %s conversion.
Changing your code to printf("Test:%+0*d", 10, 20); should produce the expected output:
Test:+000000020
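The two valid orderings discussed in these answers can be compared side by side with a small sketch like the following. The function names are made up for this example, and snprintf is used instead of printf only so that the result can be inspected.

```c
#include <stdio.h>

/* Flags (+ and 0) first, then the '*' width: sign shown, zero padded. */
int format_zero_padded(char *buf, size_t size, int width, int value)
{
    return snprintf(buf, size, "Test:%+0*d", width, value);
}

/* '*' width followed by a ".0" precision: sign shown, space padded. */
int format_with_precision(char *buf, size_t size, int width, int value)
{
    return snprintf(buf, size, "Test:%+*.0d", width, value);
}
```

With a width of 10 and a value of 20, the first variant produces "Test:+000000020" and the second pads with spaces instead, giving "Test:" followed by seven spaces and "+20".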
As a complement to Sourav Ghosh's answer: an important notion is that of undefined behavior, which is tricky. Be sure to read Lattner's blog post What Every C Programmer Should Know About Undefined Behavior. See also this.
So, leaving (or perhaps depending upon) some undefined behavior in your code on purpose is intentional malpractice. Don't do that. In the very rare cases where you want to do that (I cannot see any), please document it and justify yourself in some comment.
Be aware that even though printf is implemented by the C standard library, it can be (and often is) specially handled by the compiler (with GCC and GNU libc, that magic might happen internally through __builtin_printf).
The C99 and C11 standards partially specify the behavior of printf but leave some cases undefined to ease the implementation. You are unlikely to fully understand or be able to mimic these cases. And the implementation itself could change (for example, on my Debian Linux, an upgrade of libc might change the undefined behavior of printf).
If you want to understand printf better, study the source of some C standard library implementation (e.g. musl libc, whose code is quite readable) and of the GCC implementation (assuming a Linux operating system).
But the maintainers of GNU libc and of GCC (and even of the Linux kernel, through syscalls) remain free to change the undefined behavior (of printf and anything else).
In practice, always compile with gcc -Wall (and probably also -g) if using GCC. Don't accept any warnings (improve your own code until you get none).

What is the underlying difference between printf(s) and printf("%s", s)?

The question is plain and simple, s is a string, I suddenly got the idea to try to use printf(s) to see if it would work and I got a warning in one case and none in the other.
char* s = "abcdefghij\n";
printf(s);
// Warning raised with gcc -std=c11:
// format not a string literal and no format arguments [-Wformat-security]
// On the other hand, if I use
char* s = "abc %d efg\n";
printf(s, 99);
// I get no warning whatsoever, why is that?
// Update, I've tested this:
char* s = "random %d string\n";
printf(s, 99, 50);
// Results: no warning, output "random 99 string".
So what's the underlying difference between printf(s) and printf("%s", s) and why do I get a warning in just one case?
In the first case, the non-literal format string could perhaps come from user code or user-supplied (run-time) data, in which case it might contain %s or other conversion specifications, for which you've not passed the data. This can lead to all sorts of reading problems (and writing problems if the string includes %n — see printf() or your C library's manual pages).
In the second case, the format string controls the output and it doesn't matter whether any string to be printed contains conversion specifications or not (though the code shown prints an integer, not a string). The compiler (GCC or Clang is used in the question) assumes that because there are arguments after the (non-literal) format string, the programmer knows what they're up to.
The first is a 'format string' vulnerability. You can search for more information on the topic.
GCC knows that most times the single argument printf() with a non-literal format string is an invitation to trouble. You could use puts() or fputs() instead. It is sufficiently dangerous that GCC generates the warnings with the minimum of provocation.
The more general problem of a non-literal format string can also be problematic if you are not careful — but extremely useful assuming you are careful. You have to work harder to get GCC to complain: it requires both -Wformat and -Wformat-nonliteral to get the complaint.
From the comments:
So ignoring the warning, as if I really know what I am doing and there will be no errors, is one or another more efficient to use or are they the same? Considering both space and time.
Of your three printf() statements, given the tight context that the variable s is as assigned immediately above the call, there is no actual problem. But you could use puts(s) if you omitted the newline from the string or fputs(s, stdout) as it is and get the same result, without the overhead of printf() parsing the entire string to find out that it is all simple characters to be printed.
The second printf() statement is also safe as written; the format string matches the data passed. There is no significant difference between that and simply passing the format string as a literal — except that the compiler can do more checking if the format string is a literal. The run-time result is the same.
The third printf() passes more data arguments than the format string needs, but that is benign. It isn't ideal, though. Again, the compiler can check better if the format string is a literal, but the run-time effect is practically the same.
From the printf() specification linked to at the top:
Each of these functions converts, formats, and prints its arguments under control of the format. The format is a character string, beginning and ending in its initial shift state, if any. The format is composed of zero or more directives: ordinary characters, which are simply copied to the output stream, and conversion specifications, each of which shall result in the fetching of zero or more arguments. The results are undefined if there are insufficient arguments for the format. If the format is exhausted while arguments remain, the excess arguments shall be evaluated but are otherwise ignored.
In all these cases, there is no strong indication of why the format string is not a literal. However, one reason for wanting a non-literal format string might be that sometimes you print the floating point numbers in %f notation and sometimes in %e notation, and you need to choose which at run-time. (If it is simply based on value, %g might be appropriate, but there are times when you want the explicit control — always %e or always %f.)
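The %f versus %e case mentioned above might be sketched as follows. The function name format_double and its scientific flag are invented for illustration. The format is non-literal at the call site, but it can only ever be one of two literals the program itself supplies, which is what keeps it safe.

```c
#include <stdio.h>

/* Hypothetical helper: picks the notation at run time.
 * The format string always comes from this fixed set, never from input. */
int format_double(char *buf, size_t size, double value, int scientific)
{
    const char *fmt = scientific ? "%e" : "%f";

    return snprintf(buf, size, fmt, value);
}
```

Both candidate formats consume exactly one double, so whichever is chosen still matches the argument that is passed.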
The warning says it all.
First, to discuss the issue: as per the signature, the first parameter to printf() is a format string, which can contain format specifiers (conversion specifiers). If the string contains a format specifier and the corresponding argument is not supplied, it invokes undefined behavior.
So, a cleaner (or safer) approach (for printing a string that needs no format specification) would be puts(s); rather than printf(s); (the former does not process s for any conversion specifiers, removing the reason for the possible UB in the latter case). You can choose fputs() if you're worried about the trailing newline that puts() adds automatically.
That said, regarding the warning option, -Wformat-security from the online gcc manual
At present, this warns about calls to printf and scanf functions where the format string is not a string literal and there are no format arguments, as in printf (foo);. This may be a security hole if the format string came from untrusted input and contains %n.
In your first case, there's only one argument supplied to printf(), and it is not a string literal but a variable, which may well be generated or populated at run time; if it contains unexpected format specifiers, it may invoke UB. The compiler has no way to check for the presence of any format specifier in it. That is the security problem there.
In the second case, an accompanying argument is supplied, so the format string is not the only argument passed to printf() and the first argument does not need to be verified. Hence the warning is not there.
Update:
Regarding the third one, with more arguments than the supplied format string requires
printf(s, 99, 50);
quoting from C11, chapter §7.21.6.1
[...] If the format is exhausted while arguments remain, the excess arguments are
evaluated (as always) but are otherwise ignored. [...]
So, passing excess arguments is not a problem (from the compiler's perspective) at all, and it is well defined. There is no scope for any warning there.
There are two things in play in your question.
The first is covered succinctly by Jonathan Leffler - the warning you're getting is because the string isn't literal and doesn't have any format specifiers in it.
The other is the mystery of why the compiler doesn't issue a warning that your number of arguments doesn't match the number of specifiers. The short answer is "because it doesn't," but more specifically, printf is a variadic function. It takes any number of arguments after the initial format string, from 0 on up. When the format is not a string literal, the compiler can't check whether you gave the right number; that's up to the printf function itself, and it leads to the undefined behavior that Joachim mentioned in comments.
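To illustrate what "variadic" means for the callee, here is a small sketch with a made-up function, sum_ints, that plays the role of printf. It has no way of knowing how many arguments were really passed; it simply trusts its first parameter, much as printf trusts its format string.

```c
#include <stdarg.h>

/* Hypothetical variadic function: adds up 'count' int arguments.
 * The callee cannot verify that 'count' matches what the caller passed. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; ++i)
        total += va_arg(ap, int);   /* reading more than were passed is UB */
    va_end(ap);
    return total;
}
```

Calling sum_ints(2, 10, 20, 30) ignores the extra argument, much like surplus printf arguments, while sum_ints(3, 10, 20) would read an argument that does not exist.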
EDIT:
I'm going to give further answer to your question, as a means of getting on a small soapbox.
What's the difference between printf(s) and printf("%s", s)? Simple - in the latter, you're using printf as it's declared. "%s" is a const char *, and it will subsequently not generate the warning message.
In your comments to other answers, you mentioned "Ignoring the warning...". Don't do this. Warnings exist for a reason, and should be resolved (otherwise they're just noise, and you'll miss warnings that actually matter among the cruft of all the ones that don't.)
Your issue can be resolved in several ways.
const char* s = "abcdefghij\n";
printf(s);
will resolve the warning, because you're now using a const pointer, and there are none of the dangers that Jonathan mentioned. (You could also declare it as const char* const s, but don't have to. The first const is important, because it then matches the declaration of printf, and because const char* s means that characters pointed to by s can't change, i.e. the string is a literal.)
Or, even simpler, just do:
printf("abcdefghij\n");
This is implicitly a const pointer, and also not a problem.
So what's the underlying difference between printf(s) and printf("%s", s)
"printf(s)" will treat s as a format string. If s contains format specifiers then printf will interpret them and go looking for varargs. Since no varargs actually exist this will likely trigger undefined behaviour.
If an attacker controls "s" then this is likely to be a security hole.
printf("%s",s) will just print what is in the string.
and why do I get a warning in just one case?
Warnings are a balance between catching dangerous stupidity and not creating too much noise.
C programmers are in the habit of using printf and various printf-like functions* as generic print functions even when they don't actually need formatting. In this environment it's easy for someone to make the mistake of writing printf(s) without thinking about where s came from. Since formatting is pretty useless without any data to format, printf(s) has little legitimate use.
printf(s,format,arguments) on the other hand indicates that the programmer deliberately intended formatting to take place.
Afaict this warning is not turned on by default in upstream gcc, but some distros are turning it on as part of their efforts to reduce security holes.
* Both standard C functions like sprintf and fprintf and functions in third party libraries.
The underlying reason: printf is declared like:
int printf(const char *fmt, ...) __attribute__ ((format(printf, 1, 2)));
This tells gcc that printf is a function with a printf-style interface where the format string comes first. IMHO it must be literal; I don't think there's a way to tell the good compiler that s is actually a pointer to a literal string it had seen before.
Read more about __attribute__ here.

Why is printf with a single argument (without conversion specifiers) deprecated?

In a book that I'm reading, it's written that printf with a single argument (without conversion specifiers) is deprecated. It recommends to substitute
printf("Hello World!");
with
puts("Hello World!");
or
printf("%s", "Hello World!");
Can someone tell me why printf("Hello World!"); is wrong? It is written in the book that it contains vulnerabilities. What are these vulnerabilities?
printf("Hello World!"); is IMHO not vulnerable but consider this:
const char *str;
...
printf(str);
If str happens to point to a string containing %s format specifiers, your program will exhibit undefined behaviour (most likely a crash), whereas puts(str) will just display the string as is.
Example:
printf("%s"); // undefined behaviour (most likely a crash)
puts("%s"); // displays "%s\n"
printf("Hello world");
is fine and has no security vulnerability.
The problem lies with:
printf(p);
where p is a pointer to an input that is controlled by the user. It is prone to format string attacks: the user can insert conversion specifications to take control of the program, e.g., %x to dump memory or %n to overwrite memory.
Note that puts("Hello world") is not equivalent in behavior to printf("Hello world") but to printf("Hello world\n"). Compilers usually are smart enough to optimize the latter call to replace it with puts.
Further to the other answers, printf("Hello world! I am 50% happy today") is an easy bug to make, potentially causing all manner of nasty memory problems (it's UB!).
It's just simpler, easier and more robust to "require" programmers to be absolutely clear when they want a verbatim string and nothing else.
And that's what printf("%s", "Hello world! I am 50% happy today") gets you. It's entirely foolproof.
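A quick way to convince yourself of this is the sketch below, which routes the text through a "%s" conversion and checks that the percent sign survives untouched. The function name format_verbatim is made up for this example, and snprintf stands in for printf so the result can be examined.

```c
#include <stdio.h>

/* Hypothetical helper: copies text verbatim through a "%s" conversion.
 * Any '%' inside text is treated as data, never as a conversion. */
int format_verbatim(char *buf, size_t size, const char *text)
{
    return snprintf(buf, size, "%s", text);
}
```

For direct output, fputs(text, stdout) has the same verbatim behaviour without any format parsing.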
(Steve, of course printf("He has %d cherries\n", ncherries) is absolutely not the same thing; in this case, the programmer is not in "verbatim string" mindset; she is in "format string" mindset.)
I'll just add a bit of information regarding the vulnerability part here.
It's said to be vulnerable because of the printf format string vulnerability. In your example, where the string is hardcoded, it's harmless (even if hardcoding strings like this is never fully recommended). But specifying the parameters' types is a good habit to take.
If someone puts format specifiers in your printf instead of a regular string (say, if you want to print the program's stdin), printf will read whatever it can from the stack.
This was (and still is) widely used to exploit programs, exploring the stack to access hidden information or bypass authentication, for example.
Example (C):
#include <stdio.h>
int main(int argc, char *argv[])
{
    printf(argv[argc - 1]); // deliberately unsafe: the last command-line argument is used as the format string
    return 0;
}
If I give this program "%08x %08x %08x %08x %08x\n" as input, the call becomes:
printf ("%08x %08x %08x %08x %08x\n");
This instructs the printf-function to retrieve five parameters from the stack and display them as 8-digit padded hexadecimal numbers. So a possible output may look like:
40012980 080628c4 bffff7a4 00000005 08059c04
See this for a more complete explanation and other examples.
This is misguided advice. Yes, if you have a run-time string to print,
printf(str);
is quite dangerous, and you should always use
printf("%s", str);
instead, because in general you can never know whether str might contain a % sign. However, if you have a compile-time constant string, there's nothing whatsoever wrong with
printf("Hello, world!\n");
(Among other things, that is the most classic C program ever, literally from the C programming book of Genesis. So anyone deprecating that usage is being rather heretical, and I for one would be somewhat offended!)
Calling printf with literal format strings is safe and efficient, and there
exist tools to automatically warn you if your invocation of printf with user
provided format strings is unsafe.
The most severe attacks on printf take advantage of the %n format
specifier. In contrast to all other format specifiers, e.g. %d, %n actually
writes a value to a memory address provided in one of the format arguments.
This means that an attacker can overwrite memory and thus potentially take
control of your program. Wikipedia
provides more detail.
If you call printf with a literal format string, an attacker cannot sneak
a %n into your format string, and you are thus safe. In fact,
gcc may change your call to printf into a call to puts, so there literally
isn't any difference (test this by running gcc -O3 -S).
If you call printf with a user provided format string, an attacker can
potentially sneak a %n into your format string, and take control of your
program. Your compiler will usually warn you that this is unsafe, see
-Wformat-security. There are also more advanced tools that ensure that
an invocation of printf is safe even with user provided format strings, and
they might even check that you pass the right number and type of arguments to
printf. For example, for Java there is Google's Error Prone
and the Checker Framework.
A rather nasty aspect of printf is that even on platforms where the stray memory reads could only cause limited (and acceptable) harm, one of the formatting characters, %n, causes the next argument to be interpreted as a pointer to a writable integer, and causes the number of characters output thus far to be stored to the variable identified thereby. I've never used that feature myself, and sometimes I use lightweight printf-style methods which I've written to include only the features I actually use (and don't include that one or anything similar) but feeding standard printf functions strings received from untrustworthy sources may expose security vulnerabilities beyond the ability to read arbitrary storage.
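For readers who have never seen %n in action, here is a harmless sketch of what it does with a format string the program itself controls. The function name prefix_length_via_n is invented for this example. Some C libraries disable or restrict %n, so treat this as an illustration of the standard behaviour rather than something to rely on everywhere.

```c
#include <stdio.h>

/* Hypothetical demo: %n stores the number of characters produced so far
 * into the int that the matching argument points to. */
int prefix_length_via_n(char *buf, size_t size)
{
    int written_so_far = -1;

    /* "hello" is 5 characters, so %n should store 5 at this point. */
    snprintf(buf, size, "hello%n world", &written_so_far);
    return written_so_far;
}
```

The danger described above arises when the attacker chooses the format string: the pointer that %n writes through is then whatever value happens to sit where printf looks for its next argument.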
Since no one has mentioned it, I'd add a note regarding their performance.
Under normal circumstances, assuming no compiler optimisations are used (i.e. printf() actually calls printf() and not fputs()), I would expect printf() to be less efficient, especially for long strings. This is because printf() has to parse the string to check whether there are any conversion specifiers.
To confirm this, I have run some tests. The testing is performed on Ubuntu 14.04, with gcc 4.8.4. My machine uses an Intel i5 cpu. The program being tested is as follows:
#include <stdio.h>
int main() {
    int count = 10000000;
    while(count--) {
        // either
        printf("qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM");
        // or
        fputs("qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM", stdout);
    }
    fflush(stdout);
    return 0;
}
Both are compiled with gcc -Wall -O0. Time is measured using time ./a.out > /dev/null. The following is the result of a typical run (I've run them five times, all results are within 0.002 seconds).
For the printf() variant:
real 0m0.416s
user 0m0.384s
sys 0m0.033s
For the fputs() variant:
real 0m0.297s
user 0m0.265s
sys 0m0.032s
This effect is amplified if you have a very long string.
#include <stdio.h>
#define STR "qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM"
#define STR2 STR STR
#define STR4 STR2 STR2
#define STR8 STR4 STR4
#define STR16 STR8 STR8
#define STR32 STR16 STR16
#define STR64 STR32 STR32
#define STR128 STR64 STR64
#define STR256 STR128 STR128
#define STR512 STR256 STR256
#define STR1024 STR512 STR512
int main() {
    int count = 10000000;
    while(count--) {
        // either
        printf(STR1024);
        // or
        fputs(STR1024, stdout);
    }
    fflush(stdout);
    return 0;
}
For the printf() variant (ran three times, real plus/minus 1.5s):
real 0m39.259s
user 0m34.445s
sys 0m4.839s
For the fputs() variant (ran three times, real plus/minus 0.2s):
real 0m12.726s
user 0m8.152s
sys 0m4.581s
Note: After inspecting the assembly generated by gcc, I realised that gcc optimises the fputs() call to an fwrite() call, even with -O0. (The printf() call remains unchanged.) I am not sure whether this will invalidate my test, as the compiler calculates the string length for fwrite() at compile-time.
For gcc it is possible to enable specific warnings for checking printf() and scanf().
The gcc documentation states:
-Wformat is included in -Wall. For more control over some aspects
of format checking, the options -Wformat-y2k,
-Wno-format-extra-args, -Wno-format-zero-length,
-Wformat-nonliteral, -Wformat-security, and -Wformat=2 are
available, but are not included in -Wall.
The -Wformat which is enabled within the -Wall option does not enable several special warnings that help to find these cases:
-Wformat-nonliteral will warn if you do not pass a string literal as the format string.
-Wformat-security will warn if you pass a string that might contain a dangerous construct. It's a subset of -Wformat-nonliteral.
I have to admit that enabling -Wformat-security revealed several bugs we had in our codebase (the logging module, error handling module and XML output module all had some functions that could do undefined things if they were called with % characters in their parameters). For info, our codebase is now around 20 years old and, even though we were aware of these kinds of problems, we were extremely surprised when we enabled these warnings by how many of these bugs were still in the codebase.
printf("Hello World\n")
automatically compiles to the equivalent
puts("Hello World")
You can check this by disassembling your executable:
push rbp
mov rbp,rsp
mov edi,str.Helloworld!
call dword imp.puts
mov eax,0x0
pop rbp
ret
using
char *variable;
...
printf(variable)
will lead to security issues; don't ever use printf that way!
So your book is actually correct: using printf with one variable is deprecated, but you can still use printf("my string\n") because it will automatically become puts.
Beside the other well-explained answers with any side-concerns covered, I would like to give a precise and concise answer to the provided question.
Why is printf with a single argument (without conversion specifiers) deprecated?
A printf call with a single argument is, in general, not deprecated, and it has no vulnerabilities when used properly, as you should always code.
C users all over the world, from beginners to experts, use printf that way to print a simple text phrase to the console.
Furthermore, one has to distinguish whether this one and only argument is a string literal or a pointer to a string, which is valid but not commonly used. For the latter, of course, you can get unwanted output or any kind of undefined behavior when the pointer does not point to a valid string, but the same can happen when several arguments are given and the format specifiers do not match them.
Of course, it is also not right for the string provided as the one and only argument to contain any format or conversion specifiers, since no conversion is going to happen.
That said, giving a simple string literal like "Hello World!" as the only argument, without any format specifiers inside that string, as you did in the question:
printf("Hello World!");
is not deprecated or "bad practice" at all, nor does it have any vulnerabilities.
In fact, many C programmers began, and still begin, to learn C or even programming in general with that Hello World program and this printf statement as the first of its kind.
That would not be the case if it were deprecated.
In a book that I'm reading, it's written that printf with a single argument (without conversion specifiers) is deprecated.
Well, then I would turn the focus to the book or the author. If an author really makes such, in my opinion, incorrect assertions, and even teaches them without explicitly explaining why (assuming the book really says literally that), I would consider it a bad book. A good book, by contrast, explains why certain programming methods or functions should be avoided.
As I said above, using printf with only one argument (a string literal) and without any format specifiers is in no way deprecated or considered "bad practice".
You should ask the author what he meant by that or, even better, ask him to clarify or correct the relevant section in the next edition or printing.

Valid printf() statements in C

Given that:
char *message = "Hello, World";
char *format = "x=%i\n";
int x = 10;
Why is printf (message); invalid (i.e. rejected by compiler for being potentially insecure) and printf (format, x); isn't?
Is format treated as a string literal in this case and message as a format string? If so, why?
Update
I know why printf (message); is rejected. My question is, why is printf (format, x); not rejected too.
I'm using clang. The error message for printf (message); is format string is not string literal (potentially insecure).
It compiles fine under gcc. So it does appear to be compiler specific and to do with how clang sets its warnings.
It is a compiler limitation.
If it is known at compile time that the pointer is pointing at a string literal, then the compiler could check the specifiers and omit the warning.
There is no special reason why you are getting a warning for one and not the other. The standard doesn't specify anything relevant to this issue. It is just how the compiler is implemented. A different one might warn for both cases, or for neither.
You can get a warning in both cases by enabling the -Wformat-nonliteral option, which is not included in either -Wall or -Wextra (but it is in -Weverything).
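As a rough illustration, the three call shapes discussed in this thread can be isolated into separate functions (the function names are made up for this sketch). The comments describe which option is expected to report each call under clang, based on the behavior described in this thread; other compilers or versions may differ:

```c
#include <stdio.h>

/* Literal format: the compiler can check the specifiers against the
 * arguments, so no "non-literal" warning applies. */
int print_literal(int x)
{
    return printf("x=%i\n", x);
}

/* Non-literal format with NO data arguments: the shape that
 * -Wformat-security reports ("format string is not a string literal").
 * Deliberately left in the risky form for demonstration only. */
int print_nonliteral_noargs(const char *message)
{
    return printf(message);
}

/* Non-literal format WITH data arguments: expected to be reported only
 * when -Wformat-nonliteral is enabled explicitly. */
int print_nonliteral_args(const char *format, int x)
{
    return printf(format, x);
}
```

Compiling this once with the default options and once with -Wformat-nonliteral added should show the difference between the second and third function.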
For whatever reason, this seems like an intentional design decision to emit a security warning only when the non-literal printf statement takes no additional arguments. The source code which emits this warning can be found in lib/Sema/SemaChecking.cpp:
// If there are no arguments specified, warn with -Wformat-security, otherwise
// warn only with -Wformat-nonliteral.
if (Args.size() == firstDataArg)
  Diag(Args[format_idx]->getLocStart(),
       diag::warn_format_nonliteral_noargs)
    << OrigFormatExpr->getSourceRange();
else
  Diag(Args[format_idx]->getLocStart(),
       diag::warn_format_nonliteral)
    << OrigFormatExpr->getSourceRange();
I'd guess that this is for compatibility with existing legacy code, but that's pure speculation.
If you pass arguments to be formatted to printf, it expects that you know the first argument is going to be a format string. If it weren’t a format string, well, what would it do with all those extra arguments? It’s a reasonable inference.
On the other hand, say we specified no data to be formatted beyond the format string itself. Then what if it did have format specifiers in it? That would always be an error, so the call is only correct if the string contains no format specifiers at all, and since there is a safe alternative for this situation, it’s warning you.
Clang has -Wformat-security on by default. You can suppress the warning by passing the option -Wno-format-security while compiling. That should let you compile printf(message);.
You can find the description here.
From the gcc docpage:
-Wformat-security:
If -Wformat is specified, also warn about uses of format functions that represent possible security problems. At present, this warns about calls to printf and scanf functions where the format string is not a string literal and there are no format arguments, as in printf (foo);. This may be a security hole if the format string came from untrusted input and contains `%n'. (This is currently a subset of what -Wformat-nonliteral warns about, but in future warnings may be added to -Wformat-security that are not included in -Wformat-nonliteral.)
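So, if there are no format arguments, the compiler simply checks whether the format string is a string literal. Strictly speaking, a variable such as message is never a string literal itself; only the quoted text used to initialize it is. (Separately, starting with C++11, a string literal no longer converts implicitly to a plain char *.) Note that you can usually avoid the warning by declaring message as a const char[], because the compiler can then see that the format always equals the literal it was initialized with.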
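A minimal sketch of the const-qualified declaration mentioned above. The variable and function names are invented for illustration, and the claim in the comments is an assumption about how clang's format check treats such variables, not something the standard guarantees:

```c
#include <stdio.h>

/* Assumption: because the array elements are const and the initializer is
 * visible, the compiler can check this format as if it were the literal. */
static const char message_array[] = "Hello, World\n";

/* Assumption: the same may hold for a pointer when both the pointer and
 * the characters it points to are const. */
static const char *const message_pointer = "Hello, World\n";

int print_const_array(void)
{
    return printf(message_array);
}

int print_const_pointer(void)
{
    return printf(message_pointer);
}
```

If your compiler still warns for either form, printf("%s", message) remains the form that does not depend on this analysis.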
Coming to the question you raised in the comment, printf("%fHello") will generate a warning if -Wformat is set.
The function prototype for printf is declared as:
int printf (const char * format, ...);
where ... is a variable argument list (i.e. zero or more additional arguments).
format is always treated as a format string. If you want to print message using printf(), you can use printf (format, message); where format is "%s".
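For a function of your own with the same shape as this prototype, a sketch along the following lines may help. The names format_into and PRINTF_LIKE are hypothetical; the format attribute is a GCC extension that clang also understands, and the macro expands to nothing on other compilers:

```c
#include <stdarg.h>
#include <stdio.h>

/* Ask GCC/clang to check calls to our function the way they check printf.
 * The two numbers are the 1-based positions of the format parameter and of
 * the first variadic argument. */
#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

int format_into(char *out, size_t out_size, const char *format, ...)
    PRINTF_LIKE(3, 4);

/* Hypothetical wrapper: formats into `out` and returns vsnprintf's result. */
int format_into(char *out, size_t out_size, const char *format, ...)
{
    va_list args;
    int written;

    va_start(args, format);   /* start reading the arguments after `format` */
    written = vsnprintf(out, out_size, format, args);
    va_end(args);
    return written;
}
```

A call such as format_into(buf, sizeof buf, "%s", message) then treats message purely as data, and a mismatched call such as format_into(buf, sizeof buf, "%i", message) should draw a -Wformat warning on compilers that support the attribute.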

Resources