will 'printf' always do its job? - c

printf("/*something else*/"); /*note that:without using \n in printf*/
I know printf() writes into a buffer, and that a line-buffered stream is flushed when "\n" appears in it. So when we forget to use "\n" in printf(), occasionally the line buffer will not be emptied, and printf() won't do its job. Am I wrong?

The example you gave above is safe, as there are no variable arguments to printf. However, it is possible to specify a format string and supply variables that do not match up with the format, which can deliver unexpected (and unsafe) results. Some compilers take a more proactive approach and analyse printf calls, but even then one should be very, very careful when printf is used.

From my man page:
These functions return the number of characters printed (not including
the trailing \0 used to end output to strings) or a negative value
if an output error occurs, except for snprintf() and vsnprintf(), which
return the number of characters that would have been printed if the n
were unlimited (again, not including the final \0).
So, it sounds like they can fail, signalled by a negative return value.

Yes, output to stdout in C (using printf) is normally line buffered when stdout is connected to a terminal (it is fully buffered otherwise, e.g. when redirected to a file). Line buffered means that printf() will collect output until either:
the buffer is full, or
the output contains a \n newline
If you want to force the output of the buffer, call fflush(stdout). This will work even if you have printed something without a newline.
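For example, a minimal sketch:

#include <stdio.h>

int main(void)
{
    printf("working...");   /* no newline - may sit in the buffer */
    fflush(stdout);         /* force the buffered output out now */
    /* ... long-running work ... */
    printf(" done\n");      /* newline flushes a line-buffered stream */
    return 0;
}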

Also printf and friends can fail.
Common C library implementations call malloc() inside the printf family of functions.
malloc can fail, and then so can printf. On UNIX the underlying write() call can be interrupted by a signal delivered mid-call and fail with EINTR. Windows can and will do similar things.
And... Although you do not see it posted here often you should always check the return code from any system or library function that returns a value.
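For example, a minimal check might look like this:

int n = printf("hello, world\n");
if (n < 0) {
    /* output error: stdout closed, disk full, interrupted write, ... */
    fprintf(stderr, "printf failed\n");
}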

Like that, no. It won't always work as you expect, especially if you're using user input as the format string. If the first argument has %s or %d or other format specifiers in it, they will be parsed and replaced with values from the stack, which can easily break if it's expecting a pointer and gets an int instead.
This way tends to be a lot safer:
printf("%s", "....");
The output buffer will be flushed before exit, or before you get input, so the data will make it regardless of whether you send a \n.

printf could fail for any number of reasons. If you're deep in recursion, calling printf may blow your stack. The C and C++ standards have little to say on threading issues and calling printf while printf is executing in another thread may fail. It could fail because stdout is attached to a file and you just filled your filesystem, in which case the return value tells you there was a problem. If you call printf with a string that isn't zero terminated then bad things could happen. And printf can apparently fail if you're using buffered I/O and your buffer hasn't been flushed yet.

Will printf still have a cost even if I redirect output to /dev/null?

We have a daemon that contains a lot of print messages. Since we are working on an embedded device with a weak CPU and otherwise constrained hardware, we want to minimize any kind of cost (IO, CPU, etc.) of the printf messages in our final version. (Users don't have a console.)
My teammate and I have a disagreement. He thinks we can just redirect everything to /dev/null. It won't cost any IO, so the effect will be minimal. But I think it will still cost CPU, and we had better define a macro for printf so we can replace it (maybe with something that just returns).
So I need some opinions on who is right. Will Linux be smart enough to optimize the printf away? I really doubt it.
Pretty much.
When you redirect the stdout of the program to /dev/null, any call to printf(3) will still evaluate all the arguments, and the string formatting process will still take place before calling write(2), which writes the full formatted string to the standard output of the process. It's at the kernel level that the data isn't written to disk, but discarded by the handler associated with the special device /dev/null.
So by redirecting stdout to /dev/null you won't avoid the overhead of evaluating the arguments and passing them to printf, the string formatting job behind printf, or at least one system call to actually write the data. Where it does make a real difference on Linux is in the kernel: the /dev/null implementation just returns the number of bytes you asked to write (specified by the 3rd argument of your call to write(2)) and ignores everything else (see this answer). Depending on the amount of data you're writing and the speed of the target device (disk or terminal), the difference in performance may vary a lot. On embedded systems, generally speaking, cutting off the disk write by redirecting to /dev/null can save quite some system resources for a non-trivial amount of written data.
Although in theory a program could detect /dev/null and perform some optimizations within the restrictions of the standards it complies with (ISO C and POSIX), based on a general understanding of common implementations, in practice they don't (i.e. I am unaware of any Unix or Linux system doing so).
The POSIX standard mandates writing to the standard output for any call to printf(3), so it's not standard-conforming to suppress the call to write(2) depending on the associated file descriptors. For more details about POSIX requirements, you can read Damon's answer. Oh, and a quick note: All Linux distros are practically POSIX-compliant, despite not being certified to be so.
Be aware that if you replace printf completely, some side effects may go wrong, for example printf("%d%n", a++, &b). If you really need to suppress the output depending on the program's execution environment, consider setting a global flag and wrapping printf in a function that checks the flag before printing - it isn't going to slow the program down to an extent where the performance loss is visible, as a single condition check is much faster than calling printf and doing all the string formatting.
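A minimal sketch of that idea (the flag and function names here are made up):

#include <stdarg.h>
#include <stdio.h>

int g_quiet = 0;  /* set to 1 to suppress all output */

void log_printf(const char *fmt, ...)
{
    va_list ap;

    if (g_quiet)
        return;            /* skip all formatting and output work */

    va_start(ap, fmt);
    vprintf(fmt, ap);      /* forward to the real printf machinery */
    va_end(ap);
}

Because the arguments are still evaluated at the call site, side effects such as a++ are preserved; only the formatting and the write are skipped.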
The printf function will write to stdout. It would not be conforming to optimize it away for /dev/null.
Therefore, you will have the overhead of parsing the format string and evaluating any necessary arguments, and you will have at least one syscall, plus the copy of a buffer into kernel address space (which, compared to the cost of the syscall, is negligible).
This answer is based on the specific documentation of POSIX.
System Interfaces
dprintf, fprintf, printf, snprintf, sprintf - print formatted output
The fprintf() function shall place output on the named output stream. The printf() function shall place output on the standard output stream stdout. The sprintf() function shall place output followed by the null byte, '\0', in consecutive bytes starting at *s; it is the user's responsibility to ensure that enough space is available.
Base Definitions
shall
For an implementation that conforms to POSIX.1-2017, describes a feature or behavior that is mandatory. An application can rely on the existence of the feature or behavior.
The printf function writes to stdout. If the file descriptor connected to stdout is redirected to /dev/null, the output goes nowhere (though the write still happens), but the call to printf itself and the formatting it does will still take place.
Write your own wrapper around printf(), using the printf() source as a guideline, and return immediately if a noprint flag is set. The downside is that when it actually prints it will consume more resources, because the format string has to be parsed twice. But it uses negligible resources when not printing. You can't simply replace printf(), because the underlying calls inside printf() can change with a newer version of the stdio library.
void printf2(const char *formatstring, ...);
Generally speaking, an implementation is permitted to perform such optimisations if they do not affect the observable (functional) outputs of the program. In the case of printf(), that would mean that if the program doesn't use the return value, and if there are no %n conversions, then the implementation would be allowed to do nothing.
In practice, I'm not aware of any implementation on Linux that currently (early 2019) performs such an optimisation - the compilers and libraries I'm familiar with will format the output and write the result to the null device, relying on the kernel to ignore it.
You may want to write a forwarding function of your own if you really need to save the cost of formatting when the output is not used - you'll want it to return void, and you should check the format string for %n. (You could use snprintf with a NULL and 0 buffer if you need those side effects, but the savings are unlikely to repay the effort invested.)
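A sketch of that snprintf fallback, assuming count and name are defined elsewhere:

/* Performs all the conversions but writes nothing; returns the number
   of characters that a real printf would have produced. */
int len = snprintf(NULL, 0, "%d: %s", count, name);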
In C, the statement 0; executes and does nothing, which is similar to the empty statement ;.
That means you can write a macro like
#if DEBUG
#define devlognix(frmt,...) fprintf(stderr,(frmt).UTF8String,##__VA_ARGS__)
#else
#define nadanix 0
#define devlognix(frmt,...) nadanix
#endif
#define XYZKitLogError(frmt, ...) devlognix(frmt, ##__VA_ARGS__)
where XYZKitLogError would be your Log command.
or even
#define nadanix ;
which will compile out all log calls, replacing them with 0; or ; so they are parsed away.
You will get unused-variable warnings, but it does what you want, and this side effect can even be helpful because it tells you about computation that is not needed in your release build.
.UTF8String is an Objective-C method converting NSString to const char* - you don't need it in plain C.
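In plain C the same pattern might look like this (a sketch; ##__VA_ARGS__ is a GNU extension, as in the original):

#if DEBUG
#define devlognix(frmt, ...) fprintf(stderr, (frmt), ##__VA_ARGS__)
#else
#define devlognix(frmt, ...) ((void)0)  /* whole call compiles away */
#endif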

printf and %s in C

I understand that this code segment is supposed to have a buffer overflow vulnerability problem:
printf("File %s", my_file_name);
printf("File %s");
However, I don't get exactly why it is considered risky. Would anyone be able to shed some light on this?
The code below outputs the contents of my_file_name to the standard output:
printf("File %s", my_file_name);
If my_file_name is received from a malicious source and the program outputs to the terminal, it is possible for the malicious source to have put escape sequences in my_file_name that tell the terminal to perform non-trivial tasks, such as sending terminal contents back through standard input. It is difficult but conceivable that an attacker may derive useful information from such an attack, or even corrupt data by executing commands as if they were typed by the user.
Of course the second call invokes undefined behavior as you are not passing a valid string pointer as a second argument to printf.
The above scenario is probably not what you are referring to by buffer overflow vulnerability. There is no such vulnerability in the printf code itself, but if a buffer overflow flaw exists somewhere else in your code, and the actual format string can be patched via this overflow, an attacker can take advantage of printf's capabilities, especially the %n conversion, to poke almost any value into almost any location in the program's memory. This is the rationale for removing %n from printf_s, as explained in a Microsoft security paper.
The first call is fine. (Issues exist when you use a user provided format string, as in printf(s), where s is under the influence of the user. Here you use a hard-coded format string "File %s", which is not vulnerable. The contents of the string my_file_name will be treated as a regular C string and just be copied to the standard output. Of course it must be null terminated, and if the output is redirected to something else there can be side effects there, but that's not a printf issue.)
The second call is simply undefined behavior, because the number of parameters after the format string (0) does not match the number which the format string demands (1).
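To make the distinction concrete (my_file_name as in the question):

printf(my_file_name);             /* risky: any % in the name is interpreted as a conversion */
printf("File %s", my_file_name);  /* safe: the contents are printed verbatim */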

Implementing scanf using read()

I want to implement the scanf function using the read system call, but I have a problem - when you use read you have to specify how many bytes it will read, and I am not sure what number to use. I have, for example, implemented the conversion %b for binary numbers, %x for hex, etc.; should I just read a constant number of bytes, like 256?
You will probably have to implement a buffering layer. When you call read(), you get a number of bytes from the stream. From the point of view of the system, those bytes are now gone and will not be returned by a subsequent call to read(). The problem is, if you have a format specifier like %d, you can't know when you are done reading the number until you actually read the next non-digit byte. You'll have to store that already-read input somewhere so you can use it later, for example for a following %s.
You'll basically have to do your own buffering, since any number you choose may be too large in some cases (unless you choose something ridiculously small) and too small in others (for example, lots of whitespace), so there is no right number.
read(), without a layer of buffering code, will not work to implement scanf() as scanf() needs the ability to "unget" characters back to stdin.
Instead, scanf() could be implemented with fgetc()/ungetc(), but it is a great deal of work to implement a complete scanf(), no matter what route is taken.
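A minimal sketch of such a buffering layer over read() (names are made up; error handling kept minimal):

#include <unistd.h>

static char buf[256];    /* backing store filled by read(2) */
static ssize_t len = 0;  /* bytes currently in buf */
static ssize_t pos = 0;  /* index of the next unread byte */

/* Return the next byte from stdin (fd 0), or -1 on EOF/error. */
static int my_getc(void)
{
    if (pos >= len) {
        len = read(0, buf, sizeof buf);
        if (len <= 0)
            return -1;
        pos = 0;
    }
    return (unsigned char)buf[pos++];
}

/* Step back one byte so the next my_getc() returns it again. */
static void my_ungetc(void)
{
    if (pos > 0)
        pos--;
}

With these, a %d conversion can consume digits until it sees a non-digit, then push that byte back for the next conversion to read.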

What's the difference between gets and scanf?

If the code is
scanf("%s\n",message)
vs
gets(message)
what's the difference? It seems that both of them read input into message.
The basic difference [in reference to your particular scenario],
scanf() ends taking input upon encountering a whitespace, newline or EOF
gets() considers a whitespace as a part of the input string and ends the input upon encountering newline or EOF.
However, to avoid buffer overflow errors and to avoid security risks, it's safer to use fgets().
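For example (buffer size chosen arbitrarily):

#include <stdio.h>
#include <string.h>

char message[100];

if (fgets(message, sizeof message, stdin) != NULL)
    message[strcspn(message, "\n")] = '\0';  /* strip the trailing newline */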
Disambiguation: in the following I'd consider something "safe" if it does not lead to trouble when used correctly, and "unsafe" if its unsafety cannot be worked around.
scanf("%s\n",message)
vs
gets(message)
What's the difference?
In terms of safety there is no difference: both read from standard input and might very well overflow message if the user enters more data than message provides memory for.
However, scanf() can be used safely by specifying the maximum amount of data to be scanned in:
char message[42];
...
scanf("%41s", message); /* Only read in one few then the buffer (messega here)
provides as one byte is necessary to store the
C-"string"'s 0-terminator. */
With gets() it is not possible to specify the maximum number of characters to be read in; that's why it shall not be used!
The main difference is that gets reads until EOF or \n, while scanf("%s") reads until any whitespace has been encountered. scanf also provides more formatting options, but at the same time it has worse type safety than gets.
Another big difference is that scanf is a standard C function, while gets has been removed from the language, since it was both superfluous and dangerous: there was no protection against buffer overruns. The very same security flaw exists with scanf however, so neither of those two functions should be used in production code.
You should always use fgets; the C standard itself even recommends this, see C11 K.3.5.4.1:
Recommended practice
6 The fgets function allows properly-written
programs to safely process input lines too long to store in the result
array. In general this requires that callers of fgets pay attention to
the presence or absence of a new-line character in the result array.
Consider using fgets (along with any needed processing based on
new-line characters) instead of gets_s.
(emphasis mine)
There are several. One is that gets() will only get character string data. Another is that gets() will get only one variable at a time. scanf() on the other hand is a much, much more flexible tool. It can read multiple items of different data types.
In the particular example you have picked, there is not much of a difference.
gets - Reads characters from stdin and stores them as a string.
scanf - Reads data from stdin and stores it according to the format specified in the scanf statement, like %d, %f, %s, etc.
gets:->
gets() reads a line from stdin into the buffer pointed to by s until either a terminating newline or EOF, which it replaces with a null byte ('\0').
BUGS:->
Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
scanf:->
The scanf() function reads input from the standard input stream stdin;
BUG:->
Some times scanf makes boundary problems when deals with array and string concepts.
With scanf you need the format mentioned, unlike with gets. So with gets you enter characters, strings, numbers and spaces.
With scanf, your input ends as soon as whitespace is encountered.
But in your example you are using '%s', so neither gets() nor scanf() checks that the destination is a valid pointer to an array long enough to hold the characters you are sending to it. Hence either can easily cause a buffer overflow.
Tip: use fgets(), but it all depends on the use case.
The notion that scanf does not take whitespace is completely wrong. If you use this piece of code it will take whitespace as well:
#include<stdio.h>
int main()
{
char name[25];
printf("Enter your name :\n");
scanf("%[^\n]s",name);
printf("%s",name);
return 0;
}
Here only a newline stops the input; that is, it keeps taking input until you press enter.
So, used this way, there is basically no difference between the scanf and gets functions - it is just a trickier way of getting the same behaviour.
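Note that, like gets, this trick reads without any bound; adding a field width makes it safe for the 25-byte buffer above:

scanf("%24[^\n]", name);  /* at most 24 characters plus the 0-terminator */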
scanf() is a much more flexible tool, while gets() only gets one variable at a time.
gets() is unsafe, for example: char str[1]; gets(str)
if you input more than the length, it will end with SIGSEGV.
If you must use gets, use a generously sized malloc'd buffer as the destination.

if one complains about gets(), why not do the same with scanf("%s",...)?

From man gets:
Never use gets(). Because it is
impossible to tell without knowing the
data in advance how many
characters gets() will read, and
because gets() will continue to store
characters past the end of the buffer,
it is extremely dangerous to use.
It has been used to break computer
security. Use fgets() instead.
Almost everywhere I see scanf being used in a way that should have the same problem (buffer overflow/buffer overrun): scanf("%s", string). Does this problem exist in this case? Why are there no references to it in the scanf man page? Why does gcc not warn when compiling this with -Wall?
ps: I know that there is a way to specify in the format string the maximum length of the string with scanf:
char str[10];
scanf("%9s",str);
edit: I am not asking you to determine whether the preceding code is right or not. My question is: if scanf("%s", string) is always wrong, why are there no warnings, and why is there nothing about it in the man page?
The answer is simply that no-one has written the code in GCC to produce that warning.
As you point out, a warning for the specific case of "%s" (with no field width) is quite appropriate.
However, bear in mind that this is only the case for scanf(), vscanf(), fscanf() and vfscanf(). This format specifier can be perfectly safe with sscanf() and vsscanf(), so the warning should not be issued in that case. This means that you cannot simply add it to the existing "scanf-style-format-string" analysis code; you will have to split that into "fscanf-style-format-string" and "sscanf-style-format-string" options.
I'm sure if you produce a patch for the latest version of GCC it stands a good chance of being accepted (and of course, you will need to submit patches for the glibc header files too).
Using gets() is never safe. scanf() can be used safely, as you said in your question. However, determining if you're using it safely is a more difficult problem for the compiler to work out (e.g. if you're calling scanf() in a function where you pass in the buffer and a character count as arguments, it won't be able to tell); in that case, it has to assume that you know what you're doing.
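For example, the compiler has no way to bound a call like this (a hypothetical helper):

#include <stdio.h>

/* The buffer and its size arrive at run time, and the format string is
   built dynamically, so no static format-string analysis can check it. */
void read_name(char *buf, size_t size)
{
    char fmt[16];
    snprintf(fmt, sizeof fmt, "%%%zus", size - 1);  /* e.g. "%41s" */
    scanf(fmt, buf);
}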
When the compiler looks at the format string of scanf, it sees just a string! That's assuming the format string is not supplied at run time. Some compilers, like GCC, have extra functionality to analyze the format string if it is known at compile time. That extra functionality is not comprehensive, because in some situations a run-time overhead would be needed, which is a no-no for a language like C. For example, can you detect an unsafe usage without inserting some extra hidden code in this case:
char* str;
size_t size;
scanf("%z", &size);
str = malloc(size);
scanf("%9s"); // how can the compiler determine if this is a safe call?!
Of course, there are ways to write safe code with scanf, if you specify the number of characters to read and make sure there is enough memory to hold the string. In the case of gets, there is no way to specify the number of characters to read.
I am not sure why the man page for scanf doesn't mention the possibility of a buffer overrun, but vanilla scanf is not a secure option. A rather dated link - Link - shows this to be the case. Also check this (not gcc, but informative nevertheless) - Link
It may be that you are thinking of the POSIX %ms modifier (as in scanf("%ms", &str)), with which scanf allocates space itself based on how much data is read in; since it sizes the buffer to the input, it doesn't risk overwriting a caller-supplied buffer. Plain %s has no such behaviour - it writes into the buffer you pass, with no bound at all.
