Difference between division and shift in C

I was reading this question on SO. After reading the first answer I was unable to understand why -5 >> 1 = -3. I also experimented with it a bit more.
You can also see the code and output here.
Here is what I did:
#include <stdio.h>

int main() {
    printf("5/2 = %d\n", 5/2);
    printf("5 >> 1 = %d\n", 5 >> 1);
    printf("5/2 = %lf\n", 5/2);
    printf("5 >> 1 = %f\n", 5 >> 1);
    printf("-5/2 = %d\n", -5/2);
    printf("-5 >> 1 = %d\n", -5 >> 1);
    printf("-5/2 = %f\n", -5/2);
    printf("-5 >> 1 = %f\n", -5 >> 1);
    return 0;
}
Output :
5/2 = 2
5 >> 1 = 2
5/2 = 2.168831
5 >> 1 = 2.168831
-5/2 = -2
-5 >> 1 = -3
-5/2 = 2.168833
-5 >> 1 = 2.168833
I am unable to understand 5/2 == 2.168831, 5 >> 1 == 2.168831, and -5 >> 1 == -3.
Why is this happening? (It may be that the answer is very basic and I am missing something obvious, so please guide me.)

The result of -5 / 2 is an int, not a float or a double. However, your format specifier is %f, so printf expects a double and interprets whatever it receives as one, which makes no sense for an int, hence the erratic values. What you are doing is called undefined behavior: anything can happen.
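For illustration, one way to print the same quantities without any specifier mismatch is to keep %d for the int results and convert explicitly when a floating-point result is wanted. This is a minimal sketch of corrected calls, not the original code:
#include <stdio.h>

int main(void) {
    printf("-5/2 = %d\n", -5 / 2);          /* -2: integer division truncates toward zero */
    printf("-5 >> 1 = %d\n", -5 >> 1);      /* -3 on typical two's complement machines; right-shifting
                                               a negative value is implementation-defined */
    printf("-5.0/2 = %f\n", -5.0 / 2);      /* -2.500000: a double operand, so %f matches */
    printf("(double)(-5 >> 1) = %f\n", (double)(-5 >> 1));  /* -3.000000 */
    return 0;
}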

The reason you see the results you do is:
When you pass an int argument but use a printf specifier for double (remember that a float is converted to a double in this situation), then most C implementations pass the int argument according to their usual rules for passing an int argument to a variadic function (a function that accepts a variable number of arguments of varying types), but the printf routine interprets the machine state as if it were passed a double argument, as described below. (This is not necessarily what always happens; once you leave the behavior defined by the C standard, a C implementation may do other things. In particular, there can be complex interactions with the optimizer that cause surprising results. However, this is what happens most commonly. You cannot rely on it.)
Each computing platform has some rules for how arguments are passed. One platform might specify that all arguments are pushed onto the stack, from right to left, and that each argument is put onto the stack using only as many bytes as it needs. Another platform might specify that arguments are pushed onto the stack left to right or that arguments are padded up to the next multiple of four bytes, to keep the stack nicely aligned. Many modern platforms specify that integer arguments under a certain size are passed in general registers, floating-point arguments are passed in floating-point registers, and other arguments are passed on the stack.
When printf sees %f and looks for a double argument, but you have passed an int, what does printf find? If this platform pushes both arguments onto the stack, then printf finds the bits for your int, but it interprets those bits as if they were a double. This results in printf printing a value determined by your int, but it bears no obvious relationship to your int because the bits have entirely different meanings in the encodings for int and for double.
If this platform puts an int argument in one place but a double argument in another place, then printf finds some bits that have nothing at all to do with your int argument. They are bits that just happened to be left over in, for example, the floating-point register where the double argument should be. Those bits are just the residue of previous work. The value you get will be essentially random with respect to the int you have passed. You can also get a mix, with printf looking for eight bytes of a double by taking four bytes of the int you passed along with four bytes of whatever else was nearby.
When you run the program multiple times, you will often see the same value printed. This happens for two reasons. First, computers are mechanical. They operate in largely deterministic ways, so they do the same things over and over again, even if those things were not particularly designed to be used the way you are using them. Second, the environment the operating system passes to the program when it starts is largely the same each time you start the program. Most of its memory is either cleared or is initialized from the program file. Some of the memory or other program state is initialized from other parts of the computer's environment. That data can be different from run to run of the program. For example, the current time obviously changes from run to run. So does your command history, as do values the command shell has placed in its environment variables. Sometimes running a program multiple times will produce different results.
When you use code whose behavior is not defined by some specification (which may be the C specification, a compiler specification, a machine and operating system specification, or other documents), then you cannot rely on the behavior of that code. (It is possible to rely on the behavior of code compiled by a particular C compiler that is specified by that C compiler even though it is not fully specified by the C standard.)
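If you actually want to see what an int's bits mean when read as a floating-point object, the defined way to do it is to copy the bytes yourself rather than lie to printf. A minimal sketch (it assumes int and float are the same size, which is common but not guaranteed):
#include <stdio.h>
#include <string.h>

int main(void) {
    int i = -5 >> 1;   /* typically -3, bit pattern 0xFFFFFFFD on a 32-bit two's complement int */
    float f;

    /* Reinterpret the int's bytes as a float in a well-defined way. */
    memcpy(&f, &i, sizeof f);
    printf("int %d reinterpreted as float: %g\n", i, f);  /* a NaN on common IEEE-754 systems */
    return 0;
}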

Related

Why can I printf with the wrong specifier and still get output?

My question involves the memory layout and mechanics behind the C printf() function. Say I have the following code:
#include <stdio.h>

int main()
{
    short m_short;
    int m_int;
    m_int = -5339876;
    m_short = m_int;
    printf("%x\n", m_int);
    printf("%x\n", m_short);
    return 0;
}
On GCC 7.5.0 this program outputs:
ffae851c
ffff851c
My question is, where is the ffff actually coming from in the second hex number? If I'm correct, those fs should be outside the bounds of the short, but printf is getting them from somewhere.
When I properly format with specifier %hx, the output is rightly:
ffae851c
851c
As far as I have studied, the compiler simply truncates the top half of the number, as shown in the second output. So in the first output, do the first four f's come from the program reading memory it shouldn't? Or does the C compiler behind the scenes still reserve a full integer even for a short, sign-extended, with the high half being undefined if it is used?
Note: I am performing research; in a real-world application, I would never try to abuse the language.
When a char or short (including signed and unsigned versions) is used as a function argument where there is no specific type (as with the ... arguments to printf(format,...))1, it is automatically promoted to an int (assuming it is not already as wide as an int2).
So printf("%x\n", m_short); has an int argument. What is the value of that argument? In the assignment m_short = m_int;, you attempted to assign it the value −5339876 (represented with bytes 0xffae851c). However, −5339876 will not fit in this 16-bit short. In assignments, a conversion is automatically performed, and, when a conversion of an integer to a signed integer type does not fit, the result is implementation-defined. It appears your implementation, as many do, uses two’s complement and simply takes the low bits of the integer. Thus, it puts the bytes 0x851c in m_short, representing the value −31460.
Recall that this is being promoted back to int for use as the argument to printf. In this case, it fits in an int, so the result is still −31460. In a two’s complement int, that is represented with the bytes 0xffff851c.
Now we know what is being passed to printf: An int with bytes 0xffff851c representing the value −31460. However, you are printing it with %x, which is supposed to receive an unsigned int. With this mismatch, the behavior is not defined by the C standard. However, it is a relatively minor mismatch, and many C implementations let it slide. (GCC and Clang do not warn even with -Wall.)
Let’s suppose your C implementation does not treat printf as a special known function and simply generates code for the call as you have written it, and that you later link this program with a C library. In this case, the compiler must pass the argument according to the specification of the Application Binary Interface (ABI) for your platform. (The ABI specifies, among other things, how arguments are passed to functions.) To conform to the ABI, the C compiler will put the address of the format string in one place and the bits of the int in another, and then it will call printf.
The printf routine will read the format string, see %x, and look for the corresponding argument, which should be an unsigned int. In every C implementation and ABI I know of, an int and an unsigned int are passed in the same place. It may be a processor register or a place on the stack. Let’s say it is in register r13. So the compiler designed your calling routine to put the int with bytes 0xffff851c in r13, and the printf routine looked for an unsigned int in r13 and found bytes 0xffff851c.
So the result is that printf interprets the bytes 0xffff851c as if they were an unsigned int, formats them with %x, and prints “ffff851c”.
Essentially, you got away with this because (a) a short is promoted to an int, which is the same size as the unsigned int that printf was expecting, and (b) most C implementations are not strict about mismatching integer types of the same width with printf. If you had instead tried printing an int using %ld, you might have gotten different results, such as “garbage” bits in the high bits of the printed value. Or you might have a case where the argument you passed is supposed to be in a completely different place from the argument printf expected, so none of the bits are correct. In some architectures, passing arguments incorrectly could corrupt the stack and break the program in a variety of ways.
Footnotes
1 This automatic promotion happens in many other expressions too.
2 There are some technical details regarding these automatic integer promotions that need not concern us at the moment.
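The same promotion and truncation can be shown with explicit casts and matching specifiers. A small sketch, assuming a 16-bit short, a 32-bit int, and two's complement:
#include <stdio.h>

int main(void) {
    int   m_int   = -5339876;       /* bytes 0xffae851c as a 32-bit two's complement int */
    short m_short = (short)m_int;   /* implementation-defined; commonly keeps the low 16 bits, 0x851c = -31460 */

    printf("%d\n", (int)m_short);               /* -31460: the value after promotion back to int */
    printf("%x\n", (unsigned)m_int);            /* ffae851c */
    printf("%hx\n", (unsigned short)m_short);   /* 851c: matching specifier for a short */
    return 0;
}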

Use of format specifiers for conversions

I am unable to deduce the internal happenings inside the machine when we print data using format specifiers.
I was trying to understand the concept of signed and unsigned integers and the found the following:
unsigned int b=-12;
printf("%d\n",b); //prints -12
printf("%u\n\n",b); //prints 4294967284
I am guessing that b actually stores the binary version of -12 as 11111111111111111111111111110100.
So, since b is unsigned, b technically stores 4294967284.
But still the format specifier %d causes the binary value of b to be printed as its signed version, i.e., -12.
However,
printf("%f\n",2); //prints 0.000000
printf("%f\n",100); //prints 0.000000
printf("%d\n",3.2); //prints 2147483639
printf("%d\n",3.1); //prints 2147483637
I kind of expected the 2 to be printed as 2.00000 and 3.2 to be printed as 3 as per type conversion norms.
Why does this not happen, and what exactly takes place at the machine level?
Mismatching format specifier and argument type (like using the floating point specifier "%f" to print an int value) leads to undefined behavior.
Remember that 2 is an integer value, and vararg functions (like printf) don't really know the types of the arguments. The printf function has to rely on the format specifier and assume the argument is of the specified type.
To better understand how you get the results you get, to understand "the internal happenings", we first must make two assumptions:
The system uses 32 bits for the int type
The system uses 64 bits for the double type
Now what happens with
printf("%f\n",2); //prints 0.000000
is that the printf function sees the "%f" specifier and fetches the next argument as a 64-bit double value. Since the int value you provided in the argument list is only 32 bits, half of the bits in the double value will be unknown. The printf function will then print the (invalid) double value. If you're unlucky, some of the unknown bits might make the value a trap representation, which can cause a crash.
Similarly with
printf("%d\n",3.2); //prints 2147483639
the printf function fetches the next argument as a 32-bit int value, losing half of the bits in the 64-bit double value provided as the actual argument. Exactly which 32 bits are copied into the internal int value depends on endianness. Integers don't have trap values, so no crash happens; just an unexpected value will be printed.
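For comparison, the defined versions of those calls pass an argument of the type each specifier expects; a minimal sketch:
#include <stdio.h>

int main(void) {
    printf("%f\n", 2.0);          /* pass a double when using %f: prints 2.000000 */
    printf("%f\n", (double)100);  /* or convert explicitly */
    printf("%d\n", (int)3.2);     /* convert the double first: prints 3 */
    return 0;
}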
what exactly takes place at the machine level?
The stdio.h functions are quite far from the machine level. They provide a standardized abstraction layer on top of various OS APIs, whereas "machine level" would refer to the generated assembly. The behavior you experience is mostly related to details of the C language rather than the machine.
On the machine level, there are no signed numbers; everything is treated as raw binary data. The compiler can turn raw binary data into a signed number by using an instruction that tells the CPU: "use what's stored at this location and treat it as a signed number". Specifically, as a two's complement signed number on all common computers. But this is irrelevant when explaining why your code misbehaves.
The integer constant 12 is of type int. When we write -12 we apply the unary - operator on that. The result is still of type int but now of value -12.
Then you attempt to store this negative number in an unsigned int. This triggers an implicit conversion to unsigned int, which should be carried out according to the C standard:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
The maximum value of a 32-bit unsigned int is 2^32 - 1 = 4294967295. "One more than the maximum" gives 2^32 = 4294967296. If we calculate -12 + 4294967296 we get 4294967284. This is in range of an unsigned int and is the result you see later.
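The same arithmetic can be checked directly; a short sketch, assuming a 32-bit unsigned int:
#include <stdio.h>
#include <limits.h>

int main(void) {
    unsigned int b = -12;                 /* converted by adding UINT_MAX + 1, i.e. 2^32 here */
    printf("%u\n", b);                    /* 4294967284 */
    printf("%u\n", UINT_MAX - 12u + 1u);  /* the same value, computed without the implicit conversion */
    return 0;
}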
Now as it happens, the printf family of functions is very unsafe. If you provide a wrong format specifier which doesn't match the type, they might crash, display the wrong result, etc.; the program invokes undefined behavior.
So when you use %d or %i, reserved for signed int, but pass an unsigned int, anything can happen. "Anything" includes the bits you passed simply being reinterpreted according to the format specifier, which is what happened when you used %d.
When you pass values of types that completely mismatch the format specifier, the program just prints gibberish, because you are still invoking undefined behavior.
I kind of expected the 2 to be printed as 2.00000 and 3.2 to be printed as 3 as per type conversion norms.
The reason why the printf family can't do anything intelligent, like assuming that 2 should be converted to 2.0, is that they are variadic (variable argument) functions, meaning they can take any number of arguments. In order to make that possible, the parameters are essentially passed as raw binary through something called va_list, and all type information is lost. The printf implementation is therefore left with no type information but the format string you gave it. This is why variadic functions are so unsafe to use.
A regular function has more type safety: if you declare void foo(float f) and pass the integer constant 2 (type int), the compiler will implicitly convert it from int to float, and perhaps also give a conversion warning.
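As an illustration of why the type information is gone, here is a hypothetical variadic helper: it cannot see what was actually passed and must trust a count and an assumed type, exactly as printf must trust its format string. A minimal sketch:
#include <stdio.h>
#include <stdarg.h>

/* Hypothetical helper: sums 'count' arguments, assuming each one is a double.
   If a caller passes an int instead, the function has no way to detect it. */
static double sum_doubles(int count, ...) {
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);   /* trusts the caller, just as printf trusts %f */
    va_end(ap);

    return total;
}

int main(void) {
    printf("%f\n", sum_doubles(2, 1.5, 2.5));   /* 4.000000 */
    /* sum_doubles(2, 1, 2) would be undefined behavior: ints where doubles are expected */
    return 0;
}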
The behaviors you observe are the result of printf interpreting the bits given to it as the type specified by the format specifier. In particular, at least for your system:
The bits for an int argument and an unsigned argument in the same position within the argument list would be passed in the same place, so when you give printf one and tell it to format the other, it uses the bits you give it as if they were the bits of the other.
The bits for an int argument and a double argument would be passed in different places—possibly a general register for the int argument and a special floating-point register for the double argument, so when you give printf one and tell it to format the other, it does not get the bits for the double to use for the int; it gets completely unrelated bits that were left lying around by previous operations.
Whenever a function is called, values for its arguments must be placed in certain places. These places vary according to the software and hardware used, and they vary by the type and number of arguments. However, for any particular argument type, argument position, and specific software and hardware used, there is a specific place (or combination of places) where the bits of that argument should be stored to be passed to the function. The rules for this are part of the Application Binary Interface (ABI) for the software and hardware being used.
First, let us neglect any compiler optimization or transformation and examine what happens when the compiler implements a function call in source code directly as a function call in assembly language. The compiler will take the arguments you provide for printf and write them to the places designated for those types of arguments. When printf executes, it examines the format string. When it sees a format specifier, it figures out what type of argument it should have, and it looks for the value of that argument in the place for that type of argument.
Now, there are two things that can happen. Say you passed an unsigned but used a format specifier for int, like %d. In every ABI I have seen, an unsigned and an int argument (in the same position within the list of arguments) are passed in the same place. So, when printf looks for the bits of the int it expects, it will get the bits of the unsigned you passed.
Then printf will interpret those bits as if they encoded the value for an int, and it will print the results. In other words, the bits of your unsigned value are reinterpreted as the bits of an int.1
This explains why you see “-12” when you pass the unsigned value 4,294,967,284 to printf to be formatted with %d. When the bits 11111111111111111111111111110100 are interpreted as an unsigned, they represent the value 4,294,967,284. When they are interpreted as an int, they represent the value −12 on your system. (This encoding system is called two’s complement. Other encoding systems include one’s complement and sign-and-magnitude, in which these bits would represent −11 and −2,147,483,636, respectively. Those systems are rare for plain integer types these days.)
That is the first of two things that can happen, and it is common when the type you pass is similar to the correct type in size and nature: it is passed in the same place the correct type would be. The second thing that can happen is that the argument you pass is passed in a different place than the argument that is expected. For example, if you pass a double as an argument, it is, in many systems, placed in a separate set of registers for floating-point values. When printf goes looking for an int argument for %d, it will not find the bits of your double at all. Instead, what it finds in the place where it looks for an int argument might be whatever bits happened to be left in a register or memory location from previous operations, or it might be the bits of the next argument in the list of arguments. In any case, this means that the value printf prints for the %d will have nothing to do with the double value you passed, because the bits of the double are not involved in any way; a completely different set of bits is used.
This is also part of the reason the C standard says it does not define the behavior when the wrong argument type is passed for a printf conversion. Once you have messed up the argument list by passing double where an int should have been, all the following arguments may be in the wrong places too. They might be in different registers from where they are expected, or they might be in different stack locations from where they are expected. printf has no way to recover from this mistake.
As stated, all of the above neglects compiler optimization. The rules of C arose out of various needs, such as accommodating the problems above and making C portable to a variety of systems. However, once those rules are written, compilers can take advantage of them to allow optimization. The C standard permits a compiler to make any transformation of a program as long as the changed program has the same behavior as the original program under the rules of the C standard. This permission allows compilers to speed up programs tremendously in some circumstances. But a consequence is that, if your program has behavior not defined by the C standard (and not defined by any other rules the compiler follows), it is allowed to transform your program into anything. Over the years, compilers have grown increasingly aggressive about their optimizations, and they continue to grow. This means, aside from the simple behaviors described above, when you pass incorrect arguments to printf, the compiler is allowed to produce completely different results. Therefore, although you may commonly see the behaviors I describe above, you may not rely on them.
Footnote
1 Note that this is not a conversion. A conversion is an operation whose input is one type and whose output is another type but has the same value (or as nearly the same as is possible, in some sense, as when we convert a double 3.5 to an int 3). In some cases, a conversion does not require any change to the bits—an unsigned 3 and an int 3 use the same bits to represent 3, so the conversion does not change the bits, and the result is the same as a reinterpretation. But they are conceptually different.
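The footnote's distinction can be made concrete: a cast performs a conversion (new type, as nearly the same value as possible), while copying the bytes into an object of another type performs a reinterpretation (same bits, possibly a very different value). A small sketch, assuming a 64-bit IEEE-754 double:
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    double d = 3.5;

    int converted = (int)d;   /* conversion: different type, (nearly) the same value -> 3 */

    uint64_t reinterpreted;
    memcpy(&reinterpreted, &d, sizeof reinterpreted);   /* reinterpretation: the same 8 bytes viewed as an integer */

    printf("converted: %d\n", converted);
    printf("reinterpreted bits: 0x%016llx\n", (unsigned long long)reinterpreted);   /* 0x400c000000000000 on IEEE-754 */
    return 0;
}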

printf with unmatched format and parameters

I'm trying to understand the printf function.
I know, after reading about this function, that the C compiler automatically promotes all parameters smaller than int, like chars and shorts, to int.
I also know that a long long int (8 bytes) is not converted and is pushed to the stack as it is.
so i wrote this simple c code:
#include <stdio.h>

int main()
{
    long long int a = 0x4444444443434343LL;
    // note that 0x44444444 is 4 times 0x44, which is D in ASCII,
    // and 0x43434343 is 4 times 0x43, which is C in ASCII.
    printf("%c %c\n", a);
    return 0;
}
This creates the variable a, whose size is 8 bytes, and pushes it to the stack.
I also know that printf loops through the format string, and when it sees %c it will increment the pointer by 4 (because it knows that a char was promoted to int; example below),
something like:
char c = (char) va_arg(list, int);
// which might expand to something like:
// (*(int *)((pointer += sizeof(int)) - sizeof(int)))
As you can see, it gets the 4 bytes the pointer currently points to, and increments the pointer by 4.
My question is:
By my logic, on little-endian machines it should print C D.
This is not what happens, and I ask why. I'm sure some of you know more than me about the implementation, and that's why I ask this question.
EDIT: the actual result is C with some garbage character following it.
I know some might say that it's undefined behavior; it really depends on the implementation, and I just want to know the logic of the implementation.
Your logic would have explained the behavior of early C compilers in the 70s and 80s. Newer ABIs use a variety of methods to pass arguments to functions, including variable argument functions. You have to study your system ABI to understand how parameters are passed in your case, inferring from constructions that have explicit undefined behavior does not help.
By the way, types shorter than int are not cast; they are promoted to int. Note that float values are converted to double when passed to variable argument functions. Non-integer types and integer types larger than int are passed according to the ABI, which means they may be passed in regular registers or even special registers, not necessarily on the stack.
printf relies on macros defined in <stdarg.h> to hide these implementation details, and thus can be written in a portable manner for architectures with different ABIs and different standard type sizes.
There is a fundamental misunderstanding here, as revealed by the comment
according to the format string here the compiler should know that 4 bytes were pushed, convert 4 bytes to char and print it...
But the problem is that there is no rule saying that C uses a single, byte-addressed stack for everything.
Different processor architectures can -- and do -- use a variety of techniques for passing arguments to functions. Some arguments may be passed on a conventional stack, but others may be passed in registers, or via other techniques. Arguments of different types may be passed in different types of registers (32 vs. 64 bit, integer vs. floating point, etc.).
Obviously a C compiler has to know how to properly pass arguments for the platform it's compiling for. Obviously a variadic function like printf has to be carefully written to fetch its variable arguments correctly, based on the platform it's being used on. But a format specifier like %d does not, repeat not, simply mean "pop 4 bytes from the stack and treat them as an int". Similarly, %c does not mean "pop 4 bytes and print the resulting integer as a character". When printf encounters the format specifier %c or %d, it needs to arrange to fetch the next argument of type int, whatever it takes to do that. And if, in fact, the next argument actually passed by the calling code was not of type int -- for example if, as here, the next argument was actually of type long long int -- there's just no way of knowing in general what might happen.
Specifically, when printf has just seen a %d or %c specifier, what it does internally is the equivalent of calling
va_arg(argp, int)
And this literally says, "fetch the next argument of type int". And then it's actually up to the author of va_arg (and the rest of the functions and macros declared in <stdarg.h>) to know exactly what it takes to fetch the next argument of type int on this particular platform.
Clearly it is possible to know what will actually happen on a particular platform. (Obviously the author of va_arg had to know!) But you won't figure it out based on the C language itself, or by making guesses about what you think ought to happen. You're going to have to read about the ABI -- the Application Binary Interface -- that specifies the details of function calling conventions on your platform. These details can be hard to find, because very few programmers actually care about them.
I said that "printf has to be carefully written to fetch its variable arguments correctly", but actually I misspoke slightly, because as I said later, "it's actually up to the author of va_arg to know exactly what it takes". You're right, it is possible to write a reasonably portable implementation of printf. There's an example in the C FAQ list.
If you want to know more about function calling conventions, another interesting topic to read about is Foreign Function Interfaces or FFI. (For example, there's another library libffi that helps you to -- portably! -- perform some more exotic tasks involved in manipulating function arguments.)
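A portable way to see the char-to-int promotion the question asks about, without guessing at printf's internals, is to write a tiny variadic function with <stdarg.h>. A minimal sketch:
#include <stdio.h>
#include <stdarg.h>

/* Hypothetical helper: prints 'count' characters passed as variadic arguments.
   A char argument is promoted to int, so va_arg must ask for int, never char. */
static void print_chars(int count, ...) {
    va_list ap;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        int c = va_arg(ap, int);   /* fetch the promoted int, however the ABI passed it */
        putchar(c);
    }
    va_end(ap);
    putchar('\n');
}

int main(void) {
    print_chars(2, 'D', 'C');   /* prints "DC"; each char argument arrives as an int */
    return 0;
}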
There are simply too many types
C specifies 11 integer types signed char, char, … unsigned long long as distinct types. Aside from char, which must match either signed char or unsigned char, these could be implemented as 10 different encodings or as few as 2 (use 64-bit signed or unsigned for all).
The standard library has printf() specifiers for each of those 11. (Due to sub-int promotions, there are additional concerns.)
So far no real issues.
Yet C has lots of other types with printf() specifiers:
ju: uintmax_t
jd: intmax_t
zu: size_t
td: ptrdiff_t
PRIdLEASTN: int_leastN_t, where N is 8, 16, 32, 64
PRIuLEASTN: uint_leastN_t
PRIdN: intN_t
PRIuN: uintN_t
Many others
In general,1 these additional types could be distinct from or compatible with the 11 above.
Any time code uses these other types in a printf(), the distinct/compatible issue will arise and prevent many compilers from detecting/providing the best suggested matching print specifier.
1 Various conditions/limitations exist.
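For example, the matching specifiers for a few of these types look like this (a small sketch; the PRI macros come from <inttypes.h>):
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    size_t    n = sizeof(int);
    ptrdiff_t d = 4;
    intmax_t  m = -123456789;
    int32_t   i = 42;

    printf("%zu\n", n);           /* size_t    */
    printf("%td\n", d);           /* ptrdiff_t */
    printf("%jd\n", m);           /* intmax_t  */
    printf("%" PRId32 "\n", i);   /* int32_t   */
    return 0;
}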

Automatic typecasting discrepancy

#include <stdio.h>

int main() {
    float a, b;
    a = 5;
    b = 12;
    printf("Result:%f", a + b);
    return 0;
}
If I display the result as a float, I get 17.0. No problem.
But if I display a+b as an int, I get the result as 0.
I tried different values of a & b to look for some pattern. No matter what the values of a & b are, a+b is displayed as 0 when displayed as int.
No problems when displayed as float.
My reasoning says that if I try to print a float as an int, the decimal part will be truncated.
Where am I wrong? I searched through typecasting tutorials but couldn't interpret this discrepancy.
Maybe this is a very elementary doubt, but I couldn't reason out the causes behind the discrepancy. I do not know if the title is appropriate. Sorry for that.
I'm a beginner, so along with the answer, if you could provide a source I can refer to for these kinds of doubts, I'll be grateful and also won't bother the community with basic questions.
Automatic typecasting discrepancy
There is no automatic conversion of the arguments of printf() based on the format string. The arguments of printf() after the first one are promoted according to the default argument promotions, which involve promoting float to double but, to reiterate, do not take the format string into account.
No matter what the values of a & b are, a+b is displayed as 0 when displayed as int.
If you did the equivalent of printf("%d", a + b);, it is normal for it not to work, because it is not supposed to. Technically, it invokes undefined behavior. The actual effects vary depending on the compilation platform and in particular the argument-passing conventions. Printing 0 is one of the possibilities.
What would be supposed to work would be printf("%d", (int) (a + b));, which you can fully expect to print 17.
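Putting the two working variants together, a minimal sketch of a corrected program:
#include <stdio.h>

int main(void) {
    float a = 5, b = 12;

    printf("Result: %f\n", a + b);          /* float promoted to double, matches %f: 17.000000 */
    printf("Result: %d\n", (int)(a + b));   /* explicit conversion, matches %d: 17 */
    return 0;
}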

Why does %d show two different values for *b and *c in the code below [b and c point to the same address]

Consider the code below:
{
    float a = 7.999, *b, *c;
    b = &a;
    c = b;
    printf("%d-b\n%d-c\n%d-a\n", *b, *c, a);
}
OUTPUT:
-536870912-b
1075838713-c
-536870912-a
I know we are not allowed to use %d instead of %f, but why do *b and *c give two different values?
Both have the same address; can someone explain?
I want to know the logic behind it.
Here is a simplified example of your ordeal:
#include <stdio.h>

int main() {
    float a = 7.999, b = 7.999;
    printf("%d-a\n%d-b\n", a, b);
}
What's happening is that a and b are converted to doubles (8 bytes each) for the call to printf (since it is variadic). Inside the printf function, the given data, 16 bytes, is printed as two 4-byte ints. So you're printing the first and second half of one of the given double's data as ints.
Try changing the printf call to this and you'll see both halves of both doubles as ints:
printf("%d-a1\n%d-a2\n%d-b1\n%d-b2\n",a,b);
I should add that as far as the standard is concerned, this is simply "undefined behavior", so the above interpretation (tested on gcc) would apply only to certain implementations.
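The "two halves" interpretation can be checked in a defined way by copying the double's bytes into two 32-bit integers instead of misdescribing them to printf. A sketch (it assumes a 64-bit double; which half comes first depends on endianness):
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    double a = 7.999;
    uint32_t half[2];

    /* Copy the 8 bytes of the double into two 32-bit integers. */
    memcpy(half, &a, sizeof half);
    printf("%u %u\n", (unsigned)half[0], (unsigned)half[1]);
    return 0;
}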
There can be any number of reasons.
The most obvious -- your platform passes some integers in integer registers and some floating point numbers in floating point registers, causing printf to look in registers that have never been set to any particular value.
Another possibility -- the variables are different sizes. So printf is looking in data that wasn't set or was set as part of some other operation.
Because printf takes its parameters through ..., type agreement is essential to ensure the implementation of the function is even looking in the right places for its parameters.
You would have to have deep knowledge of your platform and/or dig into the generated assembly code to know for sure.
Using the wrong conversion specification invokes undefined behavior. You may get either an expected or an unexpected value. Your program may crash, may give different results on different compilers, or show any other unexpected behavior.
C11 7.21.6, Formatted input/output functions:
If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
// Bad
float a = 7.999, *b, *c;
b = &a;
c = b;
printf("%d-b\n%d-c\n%d-a\n", *b, *c, a);

// Good
float a = 7.999, *b, *c;
b = &a;
c = b;
printf("%f-b\n%f-c\n%f-a\n", *b, *c, a);
Using an integer format specifier "%d" instead of correctly using a floating-point specifier "%f" is what alk elliptically fails to explain as "undefined behavior".
You need to use the correct format specifier.
