Runtime typechecking for pointers - c

I want to know how the scanf function is implemented (just for fun, of course). The number of arguments is variable, so it's certainly implemented with the va_list and va_arg macros.
It also produces warnings when the number of arguments does not match the format string. This could be done by parsing the format string and comparing it with the number of arguments. No magic.
The only thing I can't see how to implement is the type checking. When the type of an argument (a pointer to some data) does not match the corresponding conversion in the format literal, scanf produces a warning. How can one check the type of the data that a pointer points to?
Example:
#include<stdio.h>
int main()
{
char buffer1[32], buffer2[32];
int n;
double x;
scanf("%s %s %d",buffer1, buffer2, &x); // warning
scanf("%s %s %d",buffer1, buffer2, &n); // ok
}
Output:
warning: format ‘%d’ expects argument of type ‘int *’,
but argument 4 has type ‘double *’ [-Wformat]
AFAIK the C library is not a part of C language/Compiler, so there is nothing language-related in <stdio.h>. I'm assuming the warning is produced by implementation of scanf, not by compiler [?]. (Maybe using #warning)
If I want to do something similar in some code, how do I know which data type a pointer is pointing to?
Note: I have downloaded source code of GNU C library and looked at scanf.c. I can't find my way through the very complicated code. There is a lot of #ifndef s and calls to other functions with strange names and structure...

This check is handled by the gcc compiler, specifically for scanf/printf functions.
It's such a common error that it's worth adding special case code to the compiler for these functions.
See the GCC -Wformat flag here: http://gcc.gnu.org/onlinedocs/gcc-3.4.4/gcc/Warning-Options.html
-Wformat : Check calls to printf and scanf, etc., to make sure that the arguments supplied have types appropriate to the format string
specified, and that the conversions specified in the format string
make sense. This includes standard functions, and others specified by
format attributes (see Function Attributes), in the printf, scanf,
strftime and strfmon (an X/Open extension, not in the C standard)
families.
These checks are not implemented by all compilers, so it's certainly not something to rely on.
With GCC you can use the function attributes 'format' and 'format_arg' to tell the compiler to apply the same checks to your own functions.
format (archetype, string-index, first-to-check) The format attribute
specifies that a function takes printf, scanf, strftime or strfmon
style arguments which should be type-checked against a format string.
For example, the declaration:
extern int my_printf (void *my_object, const char *my_format, ...)
__attribute__ ((format (printf, 2, 3)));
...causes the compiler to check the arguments in calls to my_printf for consistency with the printf style format string argument
my_format.
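As a rough illustration of the attribute described above, here is a minimal sketch of a printf-like wrapper. The names log_to_buffer and PRINTF_LIKE are made up for this example; only the format attribute itself comes from the GCC documentation, and the macro assumes a GCC-compatible compiler (elsewhere it expands to nothing).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Only GCC-compatible compilers understand __attribute__; elsewhere the
   macro expands to nothing and the checks are simply not performed. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* log_to_buffer is a hypothetical helper that formats into buf like snprintf.
   Parameter 3 is the format string, parameter 4 is the first checked argument. */
int log_to_buffer(char *buf, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

int log_to_buffer(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int written;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    written = vsnprintf(buf, size, fmt, ap);   /* the real work is delegated */
    va_end(ap);
    return written;
}
```

With this declaration in scope, a call such as log_to_buffer(buf, sizeof buf, "%d", "text") should draw the same kind of -Wformat warning that printf would, provided warnings are enabled (for example with -Wall).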

The warning is generated by the compiler. You declared x as a double, so it knows &x is a double*. Then it scans the format string and sees that the format requires an int* there, hence it warns.
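The point that the compiler already knows the static type of &x can be made visible from inside a program. The sketch below assumes a C11 compiler and uses _Generic, which selects a result from the compile-time type of its operand. IS_INT_PTR is a hypothetical macro invented for this example, and this is not how GCC implements -Wformat.

```c
#include <assert.h>   /* static_assert (C11) */

/* IS_INT_PTR is a hypothetical macro: it yields 1 when the static type of
   its argument is int *, and 0 for every other type. */
#define IS_INT_PTR(p) _Generic((p), int *: 1, default: 0)

/* Both checks are resolved entirely at compile time; nothing is inspected
   at run time, because a pointer carries no type information at run time. */
static_assert(IS_INT_PTR((int *)0), "int * is accepted");
static_assert(!IS_INT_PTR((double *)0), "double * is rejected");
```

Given int n and double x, IS_INT_PTR(&n) evaluates to 1 and IS_INT_PTR(&x) to 0, which is the information the compiler needs to compare against a %d conversion in a scanf format.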

Related

How is the type check in printf implemented?

I'm writing a compiler and I want to implement the type check in printf:
printf("%f", i);
warning: format specifies type 'double' but the argument has type 'int' [-Wformat]
printf("%f", 1);
~~ ^~
%d
As you can see, gcc knows what %f means and tries to warn me about the type mismatch.
So how can I implement this?
P.S. Is there any chance that there is no mystery, just a special rule that gcc has for printf?
It's a combination of two special rules, described in the documentation of -Wformat.
The printf function is built in, which allows GCC both to optimize it and to warn about its misuse. About the specific case of printf, there's a note in the documentation:
In addition, when a function is recognized as a built-in function, GCC may use information about that function to warn about problems with calls to that function, or to generate more efficient code, even if the resulting code still contains calls to that function. For example, warnings are given with -Wformat for bad calls to printf when printf is built in and strlen is known not to modify global memory.
In addition, you can declare your own functions as printf-like for warning purposes with the format attribute.
__attribute__((__format__(__printf__, 2, 3))) /*printf-like function, the format string is parameter 2, the first element is parameter 3*/
int myprintf(int stuff, const char *format, ...);
With a built-in function, GCC tries to replace the function call by something more efficient. For printf, this generally means that when the format argument is a string literal, GCC may replace it by a series of calls to print the individual elements. For example, it could compile printf("%d %s\n", x, s); as if the program contained __some_internal_function_to_print_a_decimal_integer__(x); putchar(' '); puts(s);. While the compiler is analyzing the call, it'll notice any mismatch between the format string and the argument types and warn accordingly. If the function isn't built in but has a format attribute, you just get the warnings.
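For someone writing their own compiler, the check boils down to walking the format literal and comparing each conversion with the static type of the matching argument. The following is only a sketch under simplifying assumptions: the enum arg_type tags and the function check_printf_format are hypothetical, flags, widths and length modifiers are not handled, and a real compiler would work on its own type representation rather than on an array of tags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags standing in for the compiler's view of each argument. */
enum arg_type { ARG_INT, ARG_DOUBLE, ARG_CHAR_PTR, ARG_VOID_PTR };

/* Map a conversion character to the type it expects; returns 0 if unsupported. */
static int expected_type(char conv, enum arg_type *out)
{
    switch (conv) {
    case 'd': case 'i': *out = ARG_INT;      return 1;
    case 'e': case 'f': case 'g': *out = ARG_DOUBLE; return 1;
    case 's': *out = ARG_CHAR_PTR; return 1;
    case 'p': *out = ARG_VOID_PTR; return 1;
    default:  return 0;
    }
}

/* Returns the number of problems found: type mismatches, missing arguments,
   unused arguments and unsupported conversions. */
int check_printf_format(const char *fmt, const enum arg_type *args, size_t nargs)
{
    size_t next = 0;      /* index of the next argument to be consumed */
    int problems = 0;

    assert(fmt != NULL);
    for (const char *p = fmt; *p != '\0'; p++) {
        enum arg_type want;

        if (*p != '%')
            continue;
        p++;
        if (*p == '\0') {             /* lone '%' at the end of the format */
            problems++;
            break;
        }
        if (*p == '%')                /* "%%" consumes no argument */
            continue;
        if (!expected_type(*p, &want)) {
            problems++;
            continue;
        }
        if (next >= nargs)
            problems++;               /* more conversions than arguments */
        else if (args[next] != want)
            problems++;               /* type mismatch */
        next++;
    }
    if (next < nargs)
        problems += (int)(nargs - next);   /* arguments the format never uses */
    return problems;
}
```

In this model, the call printf("%f", i) with an int i corresponds to check_printf_format("%f", tags, 1) with tags[0] == ARG_INT, which reports one problem.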

I wanted to know how the "printf" function works in C. For example, why are the outputs of the two print statements in the code below different?

How come the pointer gets printed even with %d?
int main()
{
int s=5 ,t ,**p ,*n;
n=&s;
p=&n;
printf("%d",n);
printf("\n%p",*p);
return 0;
}
//answer for first print statement is purely integers
//answer for second one is hexadecimal
This is undefined behavior. You must match the type required by the format specifier to the type of the value passed.
In a comment you made it clear that you wanted to know "why it didn't show any kind of warning or anything".
printf is an example -- the best-known example -- of a variadic or "varargs" function. It does not accept one, fixed list of arguments with an exact number of predefined types. Every time you call printf, you call it with a different number of arguments, and the only way to determine how many arguments and what types they should be is to go through the format string looking for the % signs.
The printf function itself knows how to do this. At run time, when you call it, it goes through the format string looking for % signs, and each time it finds one, it looks at the following letter to determine what type of argument it should fetch next. But at run time, there's no mechanism to double-check what type of argument has actually been passed -- printf just blindly fetches a value (typically from the stack) that's assumed to be of the correct type, and prints it out assuming it had the correct type.
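The run-time half of the story can be sketched with a heavily reduced formatter. mini_format and append are hypothetical names, only %d, %s and %% are understood, and real printf implementations are far more elaborate. The point is that va_arg is told which type to fetch by the format string alone, so a wrong specifier makes it fetch the wrong thing without any check.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Append s to out without overflowing; *used counts characters stored so far. */
static void append(char *out, size_t size, size_t *used, const char *s)
{
    while (*s != '\0' && *used + 1 < size)
        out[(*used)++] = *s++;
}

/* mini_format is a hypothetical, reduced snprintf. It returns the number of
   characters stored, not counting the terminating '\0'. */
int mini_format(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t used = 0;

    assert(out != NULL && fmt != NULL && size > 0);
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        char tmp[32];

        if (*p != '%') {              /* ordinary character: copy it */
            tmp[0] = *p;
            tmp[1] = '\0';
            append(out, size, &used, tmp);
            continue;
        }
        p++;
        if (*p == '\0')
            break;
        if (*p == 'd') {
            /* No check is possible here: va_arg trusts the format string. */
            snprintf(tmp, sizeof tmp, "%d", va_arg(ap, int));
            append(out, size, &used, tmp);
        } else if (*p == 's') {
            append(out, size, &used, va_arg(ap, char *));
        } else if (*p == '%') {
            append(out, size, &used, "%");
        }                             /* unknown conversions are skipped */
    }
    va_end(ap);
    out[used] = '\0';
    return (int)used;
}
```

Calling mini_format(buf, sizeof buf, "%s is %d%%", "load", 75) stores "load is 75%". Swapping the two arguments would compile just as well, and the behavior would then be undefined, which is exactly the situation described above.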
But that was at run time. The only way you could get a warning or error message at compile time would be for the compiler to go through the format string looking for % signs, and match them up with the actual arguments it can see you're passing.
Once upon a time, that wasn't a reasonable thing for a compiler to do. But times have changed, and today this is actually something that a good compiler does do! Here's what I got when I compiled your code using clang on my Mac:
warning: format specifies type 'int' but the argument has type 'int *'
And here's what I got when I compiled it with gcc, using the -Wall option:
warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘int *’
warning: unused variable ‘t’
So, the bottom line is: use a modern compiler (and remember to request warnings), and it should tell you about problems like these.
P.S. In a comment I said, "%p is for pointers, but you gave it an int." But I misread your code. Your first printf call uses %d, and passes n, but n is int * which is a pointer, so that's wrong.
Your second printf call uses %p, and passes *p. In your code, p is an int **, a pointer to pointer to int, so *p is a pointer-to-int, and that call is therefore (mostly) correct.

Lack of access specifier & in C does not result in a compile error

In C, when & is absent for an argument in scanf(), no compilation error is produced; instead, the displayed results are wrong (i.e. a semantic error occurs).
Consider the following code:
char str[30];
int a;
printf("Enter the value");
scanf("%s %d", str, a); // This is the statement in question.
printf("You entered %s %d", str, a);
Here I know str is a character array so it will have a base address, and thus will not produce a compilation error. But why does the absence of & for the argument a not result in a compilation error?
Also str gives correct output, but the integer is always producing the value -28770 as the output. Why is this?
scanf has the prototype:
int scanf(const char *fmt, ...);
The ... means that any number of arguments of any type may be specified after fmt. Hence it is the responsibility of the caller, not the compiler, to ensure that the provided arguments match what the function will be expecting.
int scanf(const char *format, ...);
This is the prototype for scanf. The first argument is a const char * and the remaining arguments are variable in number and type, so no error will be generated.
As Umamahesh P wrote, you won't get a compilation error, because scanf is a variable-argument function. As to what happens when you run the program, it is what the C standard calls "undefined behaviour". Anything can happen. You need to look at what happens at the machine-instruction and memory address level to see what exactly scanf does with the integer value you gave it instead of a pointer.
You can get a compiler warning, depending on what compiler you use and what options you specify.
Also, even if your compiler tries to show those warnings, it will be unable to do so if you use vscanf() or if the format parameter is a string built at run time.
For instance, this is the output I get using mingw-64 gcc 4.5.4:
gcc -Wall -o small.exe small.c
small.c:7:3: warning: format '%d' expects type 'int *', but argument 3 has type 'int'
small.c:7:8: warning: 'a' is used uninitialized in this function
Just add the -Werror option to turn these warnings into errors.
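For completeness, a corrected form of the call from the question might look like the sketch below. parse_name_and_number is a hypothetical helper, and it reads from a string with sscanf rather than from stdin so that it is easy to try out. The two changes that matter are passing a pointer to the int and checking the return value.

```c
#include <assert.h>
#include <stdio.h>

/* parse_name_and_number is a hypothetical helper. It returns 1 when both
   fields were converted and 0 otherwise. */
int parse_name_and_number(const char *line, char name[30], int *number)
{
    assert(line != NULL && name != NULL && number != NULL);
    /* %29s leaves room for the terminator in a 30-byte array.
       number is already an int *, so no extra & is needed here;
       a caller holding an int a passes &a. */
    return sscanf(line, "%29s %d", name, number) == 2;
}
```

With int a, the call parse_name_and_number("abc 42", str, &a) succeeds and stores 42 in a, whereas passing a itself would be rejected by the compiler here because the parameter has a declared type, unlike the variadic arguments of scanf.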

Does sprintf() require format specifiers to work properly?

I have read the post sprintf format specifier replace by nothing, and others related, but have not seen this addressed specifically.
Until today, I have never seen sprintf used with only 2 arguments.
The prototype my system uses for sprintf() is:
int sprintf (char Target_String[], const char Format_String[], ...);
While working with some legacy code I ran across this: (simplified for illustration)
char toStr[30];
char fromStr[]={"this is the in string"};
sprintf(toStr, fromStr);
My interpretation of the prototype is that the second argument should be comprised of a const char[], and accepting standard ansi C format specifiers such as these.
But the above example seems to work just fine with the string fromStr as the 2nd argument.
Is it purely by undefined behavior that this works, or is this usage perfectly legal?
I am working on Windows 7, using a C99 compiler.
Perfectly legal. The variadic arguments are optional.
In this case sprintf serves as strcpy, but it still parses the format string for % specifiers.
I'd write sprintf(toStr,"%s",fromStr); so it doesn't have to parse that long string.
The behavior you are observing is correct, a format string is not required to have any conversion specifiers. In this case the variable-length argument list, represented by ..., has length of zero. This is perfectly legal, although it's definitely less efficient than its equivalent
strcpy(toStr, fromStr);
It's perfectly legal code, but
If you just want to copy a string, use strcpy() instead.
If you are working with user input, you could be making yourself vulnerable to a format string attack.
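One way to sidestep the format string problem is to keep the format fixed and pass the text as data. The sketch below assumes snprintf is available (C99); copy_text is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* copy_text is a hypothetical helper. It returns 1 if src fit completely
   into dst and 0 if the copy had to be truncated. */
int copy_text(char *dst, size_t size, const char *src)
{
    int needed;

    assert(dst != NULL && src != NULL && size > 0);
    /* "%s" is the format; src is only ever treated as data, so any '%'
       characters it contains are copied literally. */
    needed = snprintf(dst, size, "%s", src);
    return needed >= 0 && (size_t)needed < size;
}
```

With a 30-byte toStr, copy_text(toStr, sizeof toStr, "100%s done") copies the text unchanged, whereas sprintf(toStr, "100%s done") would try to fetch a string argument that was never passed.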
Synopsis for sprintf is:
int sprintf(char *str, const char *format, ...);
That means calling it with 2 arguments is a legal option.
It works because the format contains no conversion (i.e. no %) that would need a further parameter.
It's no different from printf without a second parameter:
int printf ( const char * format, ... );
It also works if you don't have any second parameter:
printf(fromStr);
the second argument should be comprised of a const char[]
A const qualifier on a function parameter guarantees that the function does not change the value of that argument (which matters when it could, as with arrays, because they are passed by address). It does not require a const value to be used in the actual call.
The code you posted does not use a const string as the second argument to sprintf(), but the conversion from non-const to const is implicit; there is no need to worry there.
accepting standard ansi C format specifiers
"accepting" does not mean "requiring". The format string you specified does not contain any format specifier. Accordingly, the function is called with only 2 arguments (no values to format). A third argument would be ignored by sprintf() anyway, and many modern compilers would issue a warning about it.
Update: I don't want to start a debate about which compilers are modern and which are not.
It happens that I'm using the default compiler on OSX 10.11 and this is what it outputs:
axiac: ~/test$ cc -v
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.3.0
Thread model: posix
axiac: ~/test$ cc -o 1 1.c
1.c:8:25: warning: data argument not used by format string [-Wformat-extra-args]
sprintf(x, "abc\n", n);
~~~~~~~ ^

Passing too many arguments to printf

Any C programmer who's been working for more than a week has encountered crashes that result from calling printf with more format specifiers than actual arguments, e.g.:
printf("Gonna %s and %s, %s!", "crash", "burn");
However, are there any similar bad things that can happen when you pass too many arguments to printf?
printf("Gonna %s and %s!", "crash", "burn", "dude");
My knowledge of x86/x64 assembly leads me to believe that this is harmless, though I'm not convinced that there's not some edge condition I'm missing, and I have no idea about other architectures. Is this condition guaranteed to be harmless, or is there a potentially crash-inducing pitfall here, too?
Online C Draft Standard (n1256), section 7.19.6.1, paragraph 2:
The fprintf function writes output to the stream pointed to by stream, under control of the string pointed to by format that specifies how subsequent arguments are
converted for output. If there are insufficient arguments for the format, the behavior is
undefined. If the format is exhausted while arguments remain, the excess arguments are
evaluated (as always) but are otherwise ignored. The fprintf function returns when
the end of the format string is encountered.
Behavior for all the other *printf() functions is the same wrt excess arguments except for vprintf() (obviously).
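The "evaluated but otherwise ignored" wording can be observed directly. In this sketch, next_id, format_with_extra and times_evaluated are hypothetical names. The format is passed in through a parameter rather than written as a literal at the call to snprintf, because with a literal format gcc and clang may warn about the unused argument.

```c
#include <assert.h>
#include <stdio.h>

static int calls = 0;

/* next_id is a hypothetical function with a visible side effect. */
static int next_id(void)
{
    return ++calls;
}

/* fmt is expected to consume only the first int; the second argument is
   excess. It is still evaluated, so next_id() runs, but its value is unused. */
int format_with_extra(char *buf, size_t size, const char *fmt)
{
    assert(buf != NULL && fmt != NULL);
    return snprintf(buf, size, fmt, 7, next_id());
}

/* Reports how many times the excess argument has been evaluated. */
int times_evaluated(void)
{
    return calls;
}
```

After format_with_extra(buf, sizeof buf, "%d"), buf holds "7" and times_evaluated() returns 1: the extra argument was computed and then ignored by the formatting code.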
You probably know the prototype for the printf function as something like this
int printf(const char *format, ...);
A more complete version of that would actually be
int __cdecl printf(const char *format, ...);
The __cdecl defines the "calling convention" which, along with other things, describes how arguments are handled. In this case it means that args are pushed onto the stack and that the stack is cleaned by the function making the call.
One alternative to __cdecl is __stdcall; there are others. With __stdcall the convention is that arguments are pushed onto the stack and cleaned by the function that is called. However, as far as I know, it isn't possible for a __stdcall function to accept a variable number of arguments. That makes sense, since it wouldn't know how much stack to clean.
The long and the short of it is that in the case of __cdecl functions it's safe to pass however many args you want, since the cleanup is performed in the code making the call. If you were to somehow pass too many arguments to a __stdcall function, it would result in corruption of the stack. One example of where this could happen is if you had the wrong prototype.
More information on calling conventions can be found on Wikipedia.
All the arguments will be pushed on the stack and removed when the stack frame is removed. This behaviour is independent of the specific processor. (I only remember one mainframe, designed in the 70s, that had no stack.) So yes, the second example won't fail.
printf is designed to accept any number of arguments. printf then reads the format specifier (first argument), and pulls arguments from the argument list as needed. This is why too few arguments crash: the code simply starts using non-existent arguments, accessing memory that doesn't exist, or some other bad thing. But with too many arguments, the extra arguments will simply be ignored. The format specifier will use fewer arguments than have been passed in.
Comment: both gcc and clang produce warnings:
$ clang main.c
main.c:4:29: warning: more '%' conversions than data arguments [-Wformat]
printf("Gonna %s and %s, %s!", "crash", "burn");
~^
main.c:5:47: warning: data argument not used by format string
[-Wformat-extra-args]
printf("Gonna %s and %s!", "crash", "burn", "dude");
~~~~~~~~~~~~~~~~~~ ^
2 warnings generated.
