What is the purpose of requiring a data type specification of the returned value in a function definition or prototype? Type casting?

Consider the following snippet of code:
#include <stdio.h>

int demo_function(float a);

int main(void)
{
    int b;
    float fraction_number = 3.15f;

    b = demo_function(fraction_number);
    printf("The number returned is %d", b);
    return 0;
}

int demo_function(float a)
{
    float c;

    c = a;
    printf("The number passed is %.2f \n", c);
    return c;
}
The output is:
The number passed is 3.15
The number returned is 3
From this little test code, it seems like the actual purpose of writing int as the data type of demo_function's returned value is to type cast.
Firstly, is that the correct interpretation of what is going on?
Secondly, why does the compiler actually need this information? (Or perhaps a better question is, "How does the compiler use this information?"). Specifically, if the compiler sees that variable b is declared as an int, why does it need to explicitly know that the returned value is of type int?
If we end up storing the returned value of variable c into b, what issues would arise if the compiler did NOT require the explicit mention that the returned value is of type int? Would there be information loss as the float variable c gets squeezed into the smaller memory allocated to the int variable b?
Thanks!

As a practical matter, the compiler needs to know what type a function returns because it needs to know how to interpret the bits a function returns and it needs to know where those bits are.
Suppose that when a function returns, the register used for the return value contains 0x80. If the function is supposed to return an 8-bit unsigned char, then the return value is 128. If the function is supposed to return an 8-bit two’s complement signed char, then the return value is −128. Knowing the type is necessary to know the interpretation of the bits.
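As a small illustration of that idea, here is a sketch assuming an 8-bit char and a two's complement system, where converting 0x80 to signed char yields −128:

#include <stdio.h>

int main(void)
{
    unsigned char u = 0x80;               /* the bit pattern 1000 0000 */
    signed char   s = (signed char)0x80;  /* implementation-defined; commonly -128 on two's complement systems */

    /* Both variables hold the same bits; the declared type decides the value. */
    printf("as unsigned char: %d\n", u);  /* 128 */
    printf("as signed char:   %d\n", s);  /* -128 */
    return 0;
}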
In many systems, a function that returns an integer is supposed to put the bits in a certain general register of the processor, but a function that returns a floating-point value is supposed to put the bits in a certain floating-point register. In this case, the caller needs to know the return type of the function in order to know where the bits of the return value are.
At a more abstract level, the compiler needs to know the return type of the function so that it can interpret the expression the function call appears in. Consider that, in C, the expression 5 / 4 performs integer division with truncation and produces 1, but the expression 5. / 4 performs floating-point division and produces 1.25. So, in the expression f(x) / 4, the compiler needs to know what type f returns so that it knows whether to perform integer division or floating-point division.
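A short sketch of that difference (f_int and f_dbl are hypothetical functions invented just for the illustration):

#include <stdio.h>

int    f_int(int x) { return x + 1; }    /* hypothetical: returns an int */
double f_dbl(int x) { return x + 1; }    /* hypothetical: returns a double */

int main(void)
{
    printf("%d\n", f_int(4) / 4);   /* integer division: prints 1 */
    printf("%f\n", f_dbl(4) / 4);   /* floating-point division: prints 1.250000 */
    return 0;
}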
For another example, suppose f returns a pointer, and the program uses y = *f(x). To execute this code, the compiler has to take the value returned by f and use it as an address to fetch something from memory. But what does it fetch? Does the address point to a one-byte char, an eight-byte double, or a 100-byte structure? The compiler needs to know the type of the pointer returned by f so that it knows what type of object it points to.
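And a sketch of the pointer case (get_char_ptr and get_double_ptr are made up here just to show the point):

#include <stdio.h>

char   *get_char_ptr(char *p)     { return p; }   /* hypothetical */
double *get_double_ptr(double *p) { return p; }   /* hypothetical */

int main(void)
{
    char   ch = 'A';
    double d  = 2.5;

    /* The declared return type tells the compiler how many bytes to fetch
       when the returned pointer is dereferenced, and how to interpret them. */
    printf("%c\n", *get_char_ptr(&ch));    /* fetches 1 byte: prints A */
    printf("%f\n", *get_double_ptr(&d));   /* fetches sizeof(double) bytes: prints 2.500000 */
    return 0;
}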
From this little test code, it seems like the actual purpose of writing int as the data type of demo_function's returned value is to type cast.
The primary purpose is as described above. The fact that the value in a return statement is converted to the return type of the function is a secondary effect; it is merely a convenience of the language, not a necessary effect. (The implicit conversion is not necessary because we could effect the conversion by specifying it explicitly.)
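For example, the question's demo_function could spell the conversion out explicitly; this sketch behaves identically to the original:

#include <stdio.h>

int demo_function(float a)
{
    float c = a;

    printf("The number passed is %.2f \n", c);
    return (int)c;   /* explicit cast; the original "return c;" performs the same conversion implicitly */
}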
Also note that a cast is an explicit operator. It is not the operation. For example, + and * are operators; they are things that appear in source code that say we want to do certain operations. The actual operations are addition and multiplication. Similarly, a cast is a type name in parentheses; it is some text that appears in source code that specifies we want to perform a conversion.

Related

What is the order in which C expressions are evaluated

int main(){
    char a = 5;
    float b = 6.0;
    int c = a + b;
    return c;
}
Looking at the generated instructions with gcc, the above code is evaluated like this:
Load 5 and convert it to float as a
Load 6 as b
Add a and b
Convert the result to an integer and return
Does gcc not care about the return type while it's dealing with the expression?
It could have converted b to an int right off the bat as everything else is an integer type.
Is there a rule which explains how one side of an expression is evaluated first regardless of what the other side is?
You ask "Is there a rule?" Of course there is a rule. Any widely used programming language will have a huge set of rules.
You have the expression a + b, where a has type char and b has type float. There's a rule in the C language (the "usual arithmetic conversions") that the compiler has to find a common type and convert both operands to it. Since one of the operands has a floating-point type, the common type must be a floating-point type: float, double, or long double. If you look closer at the rules, it turns out the common type must be float or double, and the compiler must document its choice. It seems the compiler chose float as the common type.
So a is converted from char to float, b is already float, both are added, the result has type float. And then there's a rule that to assign float to int, a conversion takes place according to very specific rules.
Any C compiler must follow these rules exactly. There is one exception: If the compiler can produce the results that it is supposed to produce then it doesn't matter how. As long as you can't distinguish it from the outside. So the compiler can change the whole code to "return 11;".
In the C language, partial expressions are evaluated without regard to how they are used later. Whether a+b is assigned to an int, a char, or a double, it is always evaluated in the same way. There are other languages with different rules, where the fact that a+b is assigned to an int could change how it is evaluated. C is not one of those languages.
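A small sketch of that rule: the addition below is carried out in float every time, and only the conversion of the result differs.

#include <stdio.h>

int main(void)
{
    char  a = 5;
    float b = 6.0f;

    int    i = a + b;   /* float result 11.0f converted to int:    11   */
    double d = a + b;   /* float result 11.0f converted to double: 11.0 */
    char   c = a + b;   /* float result 11.0f converted to char:   11   */

    printf("%d %f %d\n", i, d, c);   /* prints 11 11.000000 11 */
    return 0;
}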
If you change it to:
int main(){
    char a = 5;
    float b = 6.6;
    int c = a + 2*b;
    return c;
}
then it becomes clear that you have to keep the float 18.2 until the end.
With no optimizations, gcc acts as if this could happen and does a lot of conversions.
With just -O it already does the math itself and directly returns the final integer, even in the above example.
There is no in-between reasoning or shortcut here. Why would the compiler simplify 5+6.0 to 5+6 but not go all the way to 11? It either performs the conversions literally (cvtsi2ss xmm1,eax and back, etc.) or it just returns 11 directly.

Why can I printf with the wrong specifier and still get output?

My question involves the memory layout and mechanics behind the C printf() function. Say I have the following code:
#include <stdio.h>

int main()
{
    short m_short;
    int m_int;

    m_int = -5339876;
    m_short = m_int;
    printf("%x\n", m_int);
    printf("%x\n", m_short);
    return 0;
}
On GCC 7.5.0 this program outputs:
ffae851c
ffff851c
My question is, where is the ffff actually coming from in the second hex number? If I'm correct, those fs should be outside the bounds of the short, but printf is getting them from somewhere.
When I properly format with specifier %hx, the output is rightly:
ffae851c
851c
As far as I have studied, the compiler simply truncates the top half of the number, as shown in the second output. So in the first output, is the program actually reading memory it shouldn't to get those first four fs? Or does the C compiler behind the scenes still reserve a full int even for a short, sign-extended, with the high half being undefined behavior if used?
Note: I am performing research, in a real-world application, I would never try to abuse the language.
When a char or short (including signed and unsigned versions) is used as a function argument where there is no specific type (as with the ... arguments to printf(format,...))1, it is automatically promoted to an int (assuming it is not already as wide as an int2).
So printf("%x\n", m_short); has an int argument. What is the value of that argument? In the assignment m_short = m_int;, you attempted to assign it the value −5339876 (represented with bytes 0xffae851c). However, −5339876 will not fit in this 16-bit short. In assignments, a conversion is automatically performed, and, when a conversion of an integer to a signed integer type does not fit, the result is implementation-defined. It appears your implementation, as many do, uses two’s complement and simply takes the low bits of the integer. Thus, it puts the bytes 0x851c in m_short, representing the value −31460.
Recall that this is being promoted back to int for use as the argument to printf. In this case, it fits in an int, so the result is still −31460. In a two’s complement int, that is represented with the bytes 0xffff851c.
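A sketch of those two steps (assuming a 16-bit short, a 32-bit int, and two's complement, as in the question):

#include <stdio.h>

int main(void)
{
    int   m_int    = -5339876;      /* bytes 0xffae851c in two's complement */
    short m_short  = (short)m_int;  /* implementation-defined; commonly keeps the low 16 bits: 0x851c, i.e. -31460 */
    int   promoted = m_short;       /* the default argument promotion does the same thing: back to int, value -31460 */

    printf("%d\n", promoted);               /* -31460 */
    printf("%x\n", (unsigned)promoted);     /* ffff851c, matching the question's second output */
    return 0;
}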
Now we know what is being passed to printf: An int with bytes 0xffff851c representing the value −31460. However, you are printing it with %x, which is supposed to receive an unsigned int. With this mismatch, the behavior is not defined by the C standard. However, it is a relatively minor mismatch, and many C implementations let it slide. (GCC and Clang do not warn even with -Wall.)
Let’s suppose your C implementation does not treat printf as a special known function and simply generates code for the call as you have written it, and that you later link this program with a C library. In this case, the compiler must pass the argument according to the specification of the Application Binary Interface (ABI) for your platform. (The ABI specifies, among other things, how arguments are passed to functions.) To conform to the ABI, the C compiler will put the address of the format string in one place and the bits of the int in another, and then it will call printf.
The printf routine will read the format string, see %x, and look for the corresponding argument, which should be an unsigned int. In every C implementation and ABI I know of, an int and an unsigned int are passed in the same place. It may be a processor register or a place on the stack. Let’s say it is in register r13. So the compiler designed your calling routine to put the int with bytes 0xffff851c in r13, and the printf routine looked for an unsigned int in r13 and found bytes 0xffff851c.
So the result is that printf interprets the bytes 0xffff851c as if they were an unsigned int, formats them with %x, and prints “ffff851c”.
Essentially, you got away with this because (a) a short is promoted to an int, which is the same size as the unsigned int that printf was expecting, and (b) most C implementations are not strict about mismatching integer types of the same width with printf. If you had instead tried printing an int using %ld, you might have gotten different results, such as “garbage” bits in the high bits of the printed value. Or you might have a case where the argument you passed is supposed to be in a completely different place from the argument printf expected, so none of the bits are correct. In some architectures, passing arguments incorrectly could corrupt the stack and break the program in a variety of ways.
Footnotes
1 This automatic promotion happens in many other expressions too.
2 There are some technical details regarding these automatic integer promotions that need not concern us at the moment.

Use of format specifiers for conversions

I am unable to deduce the internal happenings inside the machine when we print data using format specifiers.
I was trying to understand the concept of signed and unsigned integers and found the following:
unsigned int b=-12;
printf("%d\n",b); //prints -12
printf("%u\n\n",b); //prints 4294967284
I am guessing that b actually stores the binary version of -12 as 11111111111111111111111111110100.
So, since b is unsigned , b technically stores 4294967284.
But still the format specifier %d causes the binary value of b to be printed as its signed version i,e, -12.
However,
printf("%f\n",2); //prints 0.000000
printf("%f\n",100); //prints 0.000000
printf("%d\n",3.2); //prints 2147483639
printf("%d\n",3.1); //prints 2147483637
I kind of expected the 2 to be printed as 2.00000 and 3.2 to be printed as 3 as per type conversion norms.
Why does this not happen and what exactly takes place at machine level ?
Mismatching the format specifier and the argument type (like using the floating-point specifier "%f" to print an int value) leads to undefined behavior.
Remember that 2 is an integer value, and variadic functions (like printf) don't really know the types of their arguments. The printf function has to rely on the format specifier and assume the argument is of the specified type.
To better understand how you get the results you get, to understand "the internal happenings", we first must make two assumptions:
The system uses 32 bits for the int type
The system uses 64 bits for the double type
Now what happens with
printf("%f\n",2); //prints 0.000000
is that the printf function sees the "%f" specifier and fetches the next argument as a 64-bit double value. Since the int value you provided in the argument list is only 32 bits, half of the bits in the double value will be unknown. The printf function will then print the (invalid) double value. If you're unlucky, some of the unknown bits might turn the value into a trap representation, which can cause a crash.
Similarly with
printf("%d\n",3.2); //prints 2147483639
the printf function fetches the next argument as a 32-bit int value, losing half of the bits of the 64-bit double value provided as the actual argument. Exactly which 32 bits are copied into the internal int value depends on endianness. Integers don't have trap representations, so no crash happens; an unexpected value is simply printed.
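For comparison, here is a sketch with the types and specifiers matched, which gives the results the question expected:

#include <stdio.h>

int main(void)
{
    printf("%f\n", 2.0);          /* pass a double when printing with %f: prints 2.000000 */
    printf("%f\n", (double)100);  /* or convert explicitly: prints 100.000000 */
    printf("%d\n", (int)3.2);     /* convert before printing: prints 3 */
    return 0;
}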
what exactly takes place at machine level ?
The stdio.h functions are quite far from the machine level. They provide a standardized abstraction layer on top of various OS API. Whereas "machine level" would refer to the generated assembler. The behavior you experience is mostly related to details of the C language rather than the machine.
On the machine level, there are no signed numbers; everything is treated as raw binary data. The compiler can turn raw binary data into a signed number by using an instruction that tells the CPU: "use what's stored at this location and treat it as a signed number". Specifically, as a two's complement signed number on all common computers. But this is irrelevant when explaining why your code misbehaves.
The integer constant 12 is of type int. When we write -12 we apply the unary - operator on that. The result is still of type int but now of value -12.
Then you attempt to store this negative number in an unsigned int. This triggers an implicit conversion to unsigned int, which should be carried out according to the C standard:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
The maximum value of a 32-bit unsigned int is 2^32 - 1 = 4294967295. "One more than the maximum" gives 2^32 = 4294967296. If we calculate -12 + 4294967296 we get 4294967284. This is in range of an unsigned int and is the result you see later.
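A minimal sketch of that conversion, assuming a 32-bit unsigned int:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* With a 32-bit unsigned int, UINT_MAX is 4294967295,
       so "one more than the maximum" is 4294967296. */
    unsigned int b = -12;            /* -12 + 4294967296 = 4294967284 */

    printf("%u\n", UINT_MAX);        /* 4294967295 */
    printf("%u\n", b);               /* 4294967284 */
    return 0;
}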
Now as it happens, the printf family of functions is very unsafe. If you provide a format specifier which doesn't match the type of the argument, they might crash, display the wrong result, and so on: the program invokes undefined behavior.
So when you use %d or %i, which are reserved for signed int, but pass an unsigned int, anything can happen. "Anything" includes the passed bits simply being reinterpreted as the type named by the format specifier. That's what happened when you used %d.
When you pass values of types completely mismatching the format specifier, the program just prints gibberish though. Because you are still invoking undefined behavior.
I kind of expected the 2 to be printed as 2.00000 and 3.2 to be printed as 3 as per type conversion norms.
The reason the printf family can't do anything intelligent, like assuming that 2 should be converted to 2.0, is that they are variadic (variable-argument) functions, meaning they can take any number of arguments. In order to make that possible, the parameters are essentially passed as raw binary through something called a va_list, and all type information is lost. The printf implementation is therefore left with no type information but the format string you gave it. This is why variadic functions are so unsafe to use.
Contrast this with a regular function, which has more type safety: if you declare void foo(float f) and pass the integer constant 2 (type int), the compiler will implicitly convert from int to float, and perhaps also give a conversion warning.
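To see why the type information is gone in the variadic case, here is a sketch of a hypothetical variadic function (sum_ints is invented for this example); like printf, it has to trust the caller about the argument types:

#include <stdarg.h>
#include <stdio.h>

/* Hypothetical: sums 'count' arguments, trusting that each one really is an int. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* no way to check the argument's real type here */
    va_end(ap);

    return total;
}

int main(void)
{
    printf("%d\n", sum_ints(3, 1, 2, 3));   /* 6 */
    printf("%d\n", sum_ints(2, 1, 2.0));    /* undefined behavior: 2.0 is a double, not an int */
    return 0;
}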
The behaviors you observe are the result of printf interpreting the bits given to it as the type specified by the format specifier. In particular, at least for your system:
The bits for an int argument and an unsigned argument in the same position within the argument list would be passed in the same place, so when you give printf one and tell it to format the other, it uses the bits you give it as if they were the bits of the other.
The bits for an int argument and a double argument would be passed in different places—possibly a general register for the int argument and a special floating-point register for the double argument, so when you give printf one and tell it to format the other, it does not get the bits for the double to use for the int; it gets completely unrelated bits that were left lying around by previous operations.
Whenever a function is called, values for its arguments must be placed in certain places. These places vary according to the software and hardware used, and they vary by the type and number of arguments. However, for any particular argument type, argument position, and specific software and hardware used, there is a specific place (or combination of places) where the bits of that argument should be stored to be passed to the function. The rules for this are part of the Application Binary Interface (ABI) for the software and hardware being used.
First, let us neglect any compiler optimization or transformation and examine what happens when the compiler implements a function call in source code directly as a function call in assembly language. The compiler will take the arguments you provide for printf and write them to the places designated for those types of arguments. When printf executes, it examines the format string. When it sees a format specifier, it figures out what type of argument it should have, and it looks for the value of that argument in the place for that type of argument.
Now, there are two things that can happen. Say you passed an unsigned but used a format specifier for int, like %d. In every ABI I have seen, an unsigned and an int argument (in the same position within the list of arguments) are passed in the same place. So, when printf looks for the bits of the int it is expecting, it will get the bits of the unsigned you passed.
Then printf will interpret those bits as if they encoded the value for an int, and it will print the results. In other words, the bits of your unsigned value are reinterpreted as the bits of an int.1
This explains why you see “-12” when you pass the unsigned value 4,294,967,284 to printf to be formatted with %d. When the bits 11111111111111111111111111110100 are interpreted as an unsigned, they represent the value 4,294,967,284. When they are interpreted as an int, they represent the value −12 on your system. (This encoding system is called two’s complement. Other encoding systems include one’s complement and sign-and-magnitude, in which these bits would represent −11 and −2,147,483,636, respectively. Those systems are rare for plain integer types these days.)
That is the first of two things that can happen, and it is common when you pass the wrong type but it is similar to the correct type in size and nature—it is passed in the same place as the correct type would be. The second thing that can happen is that the argument you pass is passed in a different place than the argument that is expected. For example, if you pass a double as an argument, it is, on many systems, placed in a separate set of registers for floating-point values. When printf goes looking for an int argument for %d, it will not find the bits of your double at all. Instead, what it finds in the place where it looks for an int argument might be whatever bits happened to be left in a register or memory location from previous operations, or it might be the bits of the next argument in the list of arguments. In any case, this means that the value printf prints for the %d will have nothing to do with the double value you passed, because the bits of the double are not involved in any way—a completely different set of bits is used.
This is also part of the reason the C standard says it does not define the behavior when the wrong argument type is passed for a printf conversion. Once you have messed up the argument list by passing double where an int should have been, all the following arguments may be in the wrong places too. They might be in different registers from where they are expected, or they might be in different stack locations from where they are expected. printf has no way to recover from this mistake.
As stated, all of the above neglects compiler optimization. The rules of C arose out of various needs, such as accommodating the problems above and making C portable to a variety of systems. However, once those rules are written, compilers can take advantage of them to allow optimization. The C standard permits a compiler to make any transformation of a program as long as the changed program has the same behavior as the original program under the rules of the C standard. This permission allows compilers to speed up programs tremendously in some circumstances. But a consequence is that, if your program has behavior not defined by the C standard (and not defined by any other rules the compiler follows), it is allowed to transform your program into anything. Over the years, compilers have grown increasingly aggressive about their optimizations, and they continue to grow. This means, aside from the simple behaviors described above, when you pass incorrect arguments to printf, the compiler is allowed to produce completely different results. Therefore, although you may commonly see the behaviors I describe above, you may not rely on them.
Footnote
1 Note that this is not a conversion. A conversion is an operation whose input is one type and whose output is another type but has the same value (or as nearly the same as is possible, in some sense, as when we convert a double 3.5 to an int 3). In some cases, a conversion does not require any change to the bits—an unsigned 3 and an int 3 use the same bits to represent 3, so the conversion does not change the bits, and the result is the same as a reinterpretation. But they are conceptually different.
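A tiny sketch of that distinction, using the footnote's own examples:

#include <stdio.h>

int main(void)
{
    double d = 3.5;
    int    i = (int)d;        /* a conversion: the value is preserved as nearly as possible, giving 3 */

    unsigned int u = 3u;
    int          j = (int)u;  /* a conversion that happens to leave the bits unchanged: still 3 */

    printf("%d %d\n", i, j);  /* prints 3 3 */
    return 0;
}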

Ternary operator in C

Why is this program giving unexpected numbers (e.g., 2040866504, -786655336)?
#include <stdio.h>

int main()
{
    int test = 0;
    float fvalue = 3.111f;

    printf("%d", test ? fvalue : 0);
    return 0;
}
Why is it printing unexpected numbers instead of 0? Is it supposed to do an implicit typecast? This program is for learning purposes, nothing serious.
Most likely, your platform passes floating point values in a floating point register and integer values in a different register (or on the stack). You told printf to look for an integer, so it's looking in the register integers are passed in (or on the stack). But you passed it a float, so the zero was placed in the floating point register that printf never looked at.
The ternary operator follows language rules to decide the type of its result. It can't sometimes be an integer and sometimes be a float. Those could be different sizes, stored in different places, and so on, which would make it impossible to generate sane code to handle both possible result types.
This is a guess. Perhaps something completely different is happening. Undefined behavior is undefined for a reason. These kinds of things can be impossible to predict and very difficult to understand without lots of experience and knowledge of platform and compiler details. Never let someone convince you that UB is okay or safe because it seems to work on their system.
Because you are using %d for printing a float value. Use %f. Using %d to print a float value invokes undefined behavior.
EDIT:
Regarding OP's comments;
Why is it printing random numbers instead of 0?
When you compile this code, the compiler should give you a warning:
[Warning] format '%d' expects argument of type 'int', but argument 2 has type 'double' [-Wformat]
This warning is self-explanatory: this line of code invokes undefined behavior. The conversion specification %d tells printf to convert an int value from binary to a string of decimal digits, while %f does the same for a double value (a float argument is promoted to double). When you pass fvalue, the compiler knows that it is of float type, but it also sees that printf expects an argument of type int. In such cases, sometimes it does what you expect, sometimes it does what I expect. Sometimes it does what nobody expects (nice comment by David Schwartz).
It works fine with %f.
should it supposed to do implicit typecast?
No.
Although the existing upvoted answers are correct, I think they are far too technical and ignore the logic a beginner programmer might have:
Let's look at the statement causing confusion in some heads:
printf("%d", test? fvalue : 0);
^ ^ ^ ^
| | | |
| | | - the value we expect, an integral constant, hooray!
| | - a float value, this won't be printed as the test doesn't evaluate to true
| - an integral value of 0, will evaluate to false
- We will print an integer!
What the compiler sees is a bit different. It agrees on the value of test meaning false. It agrees on fvalue being a float and 0 an int. However, it has learned that the possible outcomes of the ternary operator must be of the same type! int and float aren't. In this case, "float wins", and 0 becomes 0.0f!
Now printf isn't type safe. This means you can falsely say "print me an integer" and pass a float without the compiler complaining. Exactly that happened here. No matter what the value of test is, the compiler deduced that the result will be of type float. Hence, your code is equivalent to:
float x = 0.0f;
printf("%d", x);
At this point, you experience undefined behaviour. A float (promoted to double when passed) simply isn't the integral value that %d expects.
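Two ways to make the example from the question well defined (a sketch; pick whichever matches what you actually want printed):

#include <stdio.h>

int main(void)
{
    int   test   = 0;
    float fvalue = 3.111f;

    /* The conditional expression has type float (promoted to double when passed), so use %f ... */
    printf("%f\n", test ? fvalue : 0);        /* prints 0.000000 */

    /* ... or force an int result explicitly if %d is what you want. */
    printf("%d\n", test ? (int)fvalue : 0);   /* prints 0 */
    return 0;
}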
The observed behaviour is dependent on the compiler and machine you're using. You may see dancing elephants, although most terminals don't support that afaik.
When we have the expression E1 ? E2 : E3, there are four types involved. Expressions E1, E2 and E3 each have a type (and the types of E2 and E3 can be different). Furthermore, the whole expression E1 ? E2 : E3 has a type.
If E2 and E3 have the same type, then this is easy: the overall expression has that type. We can express this in a meta-notation like this:
(T1 ? T2 : T2) -> T2
"The type of a ternary expression whose alterantives are both of the same type T2
is just T2."
If they don't have the same type, things get somewhat interesting, and the situation is quite similar to E2 and E3 being involved together in an arithmetic operation. For instance if you add together an int and float, the int operand is converted to float. That is what is happening in your program. The type situation is:
(int ? float : int) -> float
The test fails, and so the int value 0 is converted to the float value 0.0.
This float value is not compatible with the %d conversion specifier of printf, which requires an int.
More precisely, the float value undergoes one more conversion: when a float is passed as one of the trailing arguments to a variadic function, it is converted to double.
So in fact the double value 0.0 is being passed to printf where it expects int.
In any case, it is undefined behavior: it is nonportable code for which the ISO standard definition of the C language doesn't offer a meaning.
From here on, we can apply platform-specific reasoning about why we don't just see 0. Suppose int is a 32-bit, four-byte type, and double is the common 64-bit, 8-byte IEEE 754 representation, and that all-bits-zero is used for 0.0. So then, why isn't a 32-bit portion of this all-bits-zero treated by printf as the int value 0?
Quite possibly, the 64 bit double argument value forces 8 byte alignment when it is put onto the stack, possibly moving the stack pointer by four bytes. And then printf pulls out garbage from those four bytes, rather than zero bits from the double value.
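One way to confirm the type of the conditional expression directly (a sketch assuming a C11 compiler; _Generic is used purely for illustration):

#include <stdio.h>

#define TYPE_NAME(x) _Generic((x), int: "int", float: "float", double: "double", default: "other")

int main(void)
{
    int   test   = 0;
    float fvalue = 3.111f;

    /* The type of the whole conditional expression is float, whichever branch is taken. */
    printf("%s\n", TYPE_NAME(test ? fvalue : 0));   /* prints "float" */
    printf("%s\n", TYPE_NAME(test ? 1 : 0));        /* prints "int" */
    return 0;
}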

Why am I not getting the expected output?

#include <stdio.h>

int main()
{
    int x;
    float y;
    char c;

    x = -4443;
    y = 24.25;
    c = 'M';

    printf("\nThe value of integer variable x is %f", (float)x);
    printf("\nThe value of float variable y is %d", y);
    printf("\nThe value of character variable c is %f\n", c);
    return 0;
}
Output:
The value of integer variable x is -4443.000000
The value of float variable y is 0
The value of character variable c is 24.250000
Why am I not getting the expected output?
But when I am using external casting I am getting expected output which is:
The value of integer variable x is -4443.000000
The value of float variable y is 24
The value of character variable c is 77.000000
Why am I not getting the expected output?
Short answer: Because your expectations are wrong.
You're instructing printf to read an integer from where y is. Which is wrong. Format specifiers don't tell the compiler to do casts, just what type to expect, and printf trusts you to provide the right type.
The behaviour can be due to the fact that, for example, the float is promoted to double and passed in 8 bytes, and for 24.25 the low-order 4 of those bytes (the ones that come first in memory on a little-endian machine) are all 0, while an int is read from only 4 bytes. So when you tell printf to read an int from where y is, it reads those 4 bytes, which are 0, and prints 0...
EDIT: As John pointed out in the comments, this is UB, which means that anything can happen:
7.21.6.1/9
If a conversion specification is invalid, the behavior is undefined.282) If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
Many computing platforms pass different types of arguments in different ways. On some platforms, floating-point arguments are passed in special floating-point registers. On most platforms, integer arguments are passed in general processor registers. Large arguments, such as structures, are stored somewhere in memory, and a pointer is passed instead (invisibly to the C source code). Once the few registers available for arguments are used, the remaining arguments are typically passed on the stack.
When you call printf, the compiler does not match the arguments you pass to the conversion specifiers in the format string. (Except that a good compiler will check and issue a warning if the types do not match.) In order to operate, the printf routine reads the format string and, when it finds a conversion specifier, it reads data from where the corresponding argument should be. If you specify “%d” but pass a float, the printf routine may read data from a general processor register, but the float value is in a floating-point register. Therefore, the value that is printed will be whatever data happened to be in the general processor register.
Similarly, when you specify “%f” but pass an integer, the printf routine may read from a floating-point register, but the integer value is in a general processor register.
The compiler will not convert printf arguments to the target type and might not warn you about the mismatches. You must match the conversion specifiers in the format string to the argument types.
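For reference, here is a sketch of the question's program with each argument cast to match its conversion specifier, which produces the "expected" output listed above:

#include <stdio.h>

int main()
{
    int x = -4443;
    float y = 24.25;
    char c = 'M';

    printf("\nThe value of integer variable x is %f", (float)x);     /* -4443.000000 */
    printf("\nThe value of float variable y is %d", (int)y);         /* 24 */
    printf("\nThe value of character variable c is %f\n", (float)c); /* 77.000000 */
    return 0;
}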
Bonus: Here are documents describing how arguments are passed to subroutines on one platform (Mac OS X).
You cannot format a char as a float with "%f"; use "%c" or "%d" instead. I find that http://www.cplusplus.com/reference/clibrary/cstdio/printf/ is a good reference.
The format specifiers and the types of the arguments don't match, which I believe causes undefined behavior. printf doesn't do casting for you, so you have to explicitly cast the arguments.
