Integer Overflow and the difference between pow() and multiplication - c

When I tried this multiplication, the compiler gave an integer overflow error:
int main(){
long long int x;
x = 55201 * 55201;
printf("%lld", x);
return 0;
}
But when I do the same operation with the pow() function, I do not get any error:
int main(){
long long int x;
x = pow(55201, 2);
printf("%lld", x);
return 0;
}
Why is that so? I must use the first code.

You need to change your code like this
int main(){
long long int x;
x = 55201LL * 55201LL; // <--- notice the LL
printf("%lld", x);
return 0;
}
so that the multiplication is performed in long long.
When you use the pow function you don't see any problem, because pow does its calculations in floating point.
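A minimal sketch of the same fix for variables, assuming a hypothetical helper named square_ll (the name is not from the question). Casting one operand is enough, because the other operand is then converted to long long as well:

```c
/* Hypothetical helper: squares n without overflowing a narrower type.
   The cast converts the first operand to long long, so the usual
   arithmetic conversions make the whole multiplication long long. */
long long square_ll(long n)
{
    return (long long)n * n;
}
```

Calling printf("%lld\n", square_ll(55201)); should then print 3047150401.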

Here (64-bit Linux, gcc 5.2.1), 55201 is an integer literal of type int (4 bytes), and the expression 55201 * 55201 is evaluated as a 4-byte int before being assigned to your long long int.
One option is to store the factor in a long long variable before multiplying, so that the multiplication is done in the wider type.
int main(){
long long int x, factor;
factor = 55201;
x = factor * factor;
printf("%lld", x);
return 0;
}

In the code below, 55201 is an int by default, so the multiplication is done in int and the result is also an int. The compiler evaluates the constant expression at compile time (constant folding), sees that the product exceeds the range of int, and therefore emits the integer overflow warning.
int main(){
long long int x;
x = 55201 * 55201;
printf("%lld", x);
return 0;
}
The declaration of pow is:
double pow(double x, double y);
In the second case, pow takes both arguments as double, so 55201 and 2 are implicitly converted to double and the calculation is done in double precision. The result is well within the range of double, so the compiler does not generate any overflow message.
The same result can be obtained with method 1 like this:
long long int result, number;
number = 55201;
result = number * number;
// Print the result
printf("%lld\n", result);
That's it. I hope this helps.

The problem is that the operation is performed in the larger of the operand types, and at least in int (C standard, 6.3.1.8). Assignment is just another expression in C, and the type of the left-hand side of = is irrelevant to the operation on the right-hand side.
On your platform, both constants fit into an int, so the expression 55201 * 55201 is evaluated as int. The result, however, does not fit into an int, so it overflows.
Signed integer overflow is undefined behaviour. This means anything can happen. Luckily your compiler is clever enough to detect this and warn you instead of the computer jumping out of the window. Briefly: avoid it!
The solution is to perform the operation in a type that can hold the full result. A short calculation shows that the product requires 32 bits to represent its value, so an unsigned long would be sufficient. If you want a signed integer, you need another bit for the sign, i.e. 33 bits. Such a type is very rare nowadays, so you have to use long long, which has at least 64 bits. (Don't be tempted to use long, even if it has 64 bits on your platform; that makes your code depend on implementation-defined behaviour and thus non-portable, without any benefit.)
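The bit count can be checked with a small helper. This is only a sketch; bits_needed is a hypothetical name, and it counts the bits of the unsigned binary representation (so a signed type needs one more):

```c
/* Hypothetical helper: number of bits needed to represent v as an
   unsigned binary number (0 needs 0 bits under this definition). */
int bits_needed(unsigned long long v)
{
    int bits = 0;
    while (v != 0) {
        ++bits;      /* one more bit position is occupied */
        v >>= 1;     /* drop the lowest bit */
    }
    return bits;
}
```

For the product in question, bits_needed(55201ULL * 55201ULL) gives 32, since 3047150401 lies between 2^31 and 2^32.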
To do this, at least one of the operands must have the result type:
x = 55201LL * 55201; // first factor is a long long constant
If variables are involved use a cast:
long f1 = 55201; // an int is not guaranteed to hold this value!
x = (long long)f1 * 55201;
Note that the constants need no L suffix here. An unsuffixed decimal constant automatically gets the smallest type (at least int) that can represent its value, and the usual arithmetic conversions then convert it to long long for the multiplication.
The other expression x = pow(55201, 2) uses a floating point function. Thus the arguments are converted to double before pow is called. The double result is converted by the assignment operator to the left hand side type.
This has two problems:
A double is not guaranteed to have a mantissa of at least 63 bits (excluding the sign), which it would need to represent every long long value. The common IEEE 754 implementations have this problem, with only 53 bits. (This is not relevant for this specific calculation.)
All floating point arithmetic may include rounding errors, so the result might deviate from the exact result. That's why you have to use the first version.
As a general rule one should never use floating point arithmetic if an exact result is required. And mixing floating point and integer should be done very cautiously.
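A small sketch of the first problem, assuming an IEEE 754 double with a 53-bit significand and the default round-to-nearest mode. The helper name through_double is hypothetical:

```c
/* Hypothetical helper: converts v to double and back.
   The volatile store forces the value to be rounded to double.
   Assumes the rounded value is still within the range of long long. */
long long through_double(long long v)
{
    volatile double d = (double)v;
    return (long long)d;
}
```

Under those assumptions, 3047150401 survives the round trip unchanged, while 9007199254740993 (2^53 + 1) comes back as 9007199254740992.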

Related

Is type casting always necessary for all arithmetic operations?

I have studied that when a division involves two data types, for example a float value divided by an int value, the int is converted to the higher-precision type, i.e. float.
So is it always the case with arithmetic operations that the compiler does an implicit type conversion, or do we have to cast explicitly in some places to avoid an error?
In general, you are not required to cast operands when performing arithmetic. C has a number of rules for automatically converting operands, and they serve well in many situations.
In many current C implementations, float is not more precise than int. int is commonly 32 bits, and float has 24 bits for the significand (fraction portion of the floating-point number), along with about eight for the exponent and one for the sign. This gives float wider range but less precision. The conversion rules give a ranking of types used for the conversions, but it is not strictly from less precise to more precise.
The automatic conversions do not serve all situations, and C programmers need to become familiar with the rules so they know when to add casts. These include:
1. When the result is not representable, or is not representable with desired accuracy, in the default type.
2. When the automatic conversions would cause errors in one of the operands.
An example of 1 is when we want to divide two integers and get a floating-point result:
float x = 1/3; // Wrong, integer division is performed, yielding zero, but we want (approximately) ⅓.
float x = (float) 1 / 3; // Right, convert at least one operand to float (or use 1.f, a float constant).
Another example is when two integers might overflow:
int x = Some large integer;
int y = Some large integer;
long z = x*y; // Wrong, result may overflow.
long z = (long) x * y; // Possibly right, long may be wide enough to represent product.
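The same two cases as compilable functions, as a rough sketch. The function names are hypothetical, and long long is used for the product instead of long because long may be only 32 bits wide:

```c
/* Integer division: both operands are int, so the quotient is truncated
   to 0 before it is converted to float. */
float third_truncated(void)
{
    return 1 / 3;
}

/* One operand is float, so the other is converted and the division is
   done in floating point. */
float third_float(void)
{
    return (float)1 / 3;
}

/* Widening one operand before multiplying makes the multiplication
   itself happen in long long. */
long long product_wide(int x, int y)
{
    return (long long)x * y;
}
```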
An example of 2 is when conversion from int to float may lose precision:
float x = 2;
int y = 123456789;
double z = x*y; // Wrong, converting 123456789 to float loses precision and produces 123456792 in many C implementations.
double z = x * (double) y; // Right, double has enough precision in many C implementations.
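A sketch of the difference, assuming IEEE 754 float and double with round-to-nearest. The function names are hypothetical:

```c
/* y is converted to float before the multiplication, so low-order
   digits of large values can be lost. The volatile store forces the
   product to be rounded to float. */
double scale_via_float(float x, int y)
{
    volatile float product = x * y;
    return product;
}

/* Converting y to double first keeps all of its digits when double
   has enough significand bits for every int value. */
double scale_via_double(float x, int y)
{
    return x * (double)y;
}
```

With x = 2 and y = 123456789, the float version yields 246913584 and the double version yields 246913578 under those assumptions.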

Difference between the result after dividing by 2 and multiplying with 0.5

#include <stdio.h>
int main() {
unsigned long long int c = 9999999999999999999U / 2;
unsigned long long int d = 9999999999999999999U * 0.5;
unsigned long long int e = 9999999999999999999U >> 1;
printf("%llu\n%llu\n%llu\n", c, d, e);
return 0;
}
So the output of that is:
4999999999999999999
5000000000000000000
4999999999999999999
Why is there a difference when multiplied by 0.5?
and why doesn't this difference show up when the numbers are small?
In the case of d, 9999999999999999999 is converted to a double, which, if your C implementation uses IEEE 754 doubles, yields 10000000000000000000 (if I did my calculations correctly), because they only have 53 bits available in the significand, one of which is an implied 1. Multiplying 10000000000000000000 by 0.5 gives 5000000000000000000. Floating point is weird. Read up on it at https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html.
9999999999999999999U is a large number. It requires 64 bits to represent in binary. Type unsigned long long int is guaranteed by the C Standard to have at least 64 value bits, so depending on the actual range of smaller integer types, it is an integer constant with type unsigned int, unsigned long int or at most unsigned long long int.
The expressions 9999999999999999999U / 2 and 9999999999999999999U >> 1 are thus fully defined and evaluate to 4999999999999999999, typically at compile time through constant folding, with the same type. This value can be stored into c and e and output correctly by printf with a format %llu as expected.
Conversely, 9999999999999999999U * 0.5 (or similarly 9999999999999999999U / 2.0) is evaluated as a floating point expression: (double)9999999999999999999U * 0.5. The floating point result of type double is converted to an unsigned long long int when assigned to d.
The double type is only guaranteed to provide enough precision for converting numbers up to 10 decimal digits without loss, a lot less than required for your number. Most C implementations use IEEE-754 representation for the double type that has exactly 53 bits of precision. The value 9999999999999999999 is thus rounded as 1E19 when converted to a double. Multiplying by 0.5 or dividing by 2.0 is performed exactly as it only changes the binary exponent part. The result 5E18 is converted to unsigned long long int and printed as 5000000000000000000 as you see on your system.
The differences are explained with type propagation.
First example, dividing an integer by an integer. Dividing by two and right-shifting are equivalent here; they are done on the operands as-is.
Second example, multiplying an integer by a double. Here, the compiler will first convert the integer operand to a double (which only guarantees ten decimal digits, I think) and then perform the multiplication. In order to store the result in an integer again, it is truncated.
I hope that illustrates that different operations are going on, caused by the different types of the operands, even though they seem similar from a mathematical point of view.
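The two paths can be put side by side in a short sketch, assuming an IEEE 754 double with a 53-bit significand. The function names are hypothetical:

```c
/* Integer arithmetic: exact, truncating division. */
unsigned long long half_int(unsigned long long v)
{
    return v / 2;
}

/* v is converted to double first, so it is rounded to the nearest
   representable double before the multiplication takes place. */
unsigned long long half_via_double(unsigned long long v)
{
    volatile double d = (double)v;   /* rounding happens here */
    return (unsigned long long)(d * 0.5);
}
```

For small values both functions agree, because every small integer is exactly representable as a double. For 9999999999999999999 the conversion rounds up to 1E19, which is where the extra 1 comes from.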

Multiplying two long numbers

I have tried to multiply two numbers, i.e. 100000 and 100000 + 1, in a C program. But I am not getting the correct output.
printf("%lld",(100000)*(100001));
I have tried the above code on different compilers, but I always get 1410165408 instead of 10000100000.
Well, let's multiply
int64_t a = 100000;
int64_t b = 100001;
int64_t c = a * b;
And we'll get (binary)
1001010100000011010110101010100000 /* 10000100000 decimal */
but if you convert it to int32_t
int32_t d = (int32_t) c;
you'll get the last 32 bits only (and throw away the top two bits, "10"):
01010100000011010110101010100000 /* 1410165408 decimal */
The simplest way out, probably, is to declare both constants as 64-bit values (the LL suffix stands for long long):
printf("%lld",(100000LL)*(100001LL));
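The truncation step can be reproduced portably with an unsigned 32-bit type, since conversion to an unsigned type is defined as reduction modulo 2^32. This is a sketch; low32 is a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical helper: keeps only the low 32 bits of v. */
uint32_t low32(int64_t v)
{
    return (uint32_t)v;
}
```

low32(INT64_C(10000100000)) gives 1410165408, the value seen in the question.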
In C, the type which is used for a calculation is determined from the type of the operands, not from the type where you store the result in.
Plain integer constants such as 100000 are of type int, because they will fit inside one. The product 100000 * 100001 will not fit, however, so you get integer overflow and undefined behavior. Switching to long won't necessarily solve anything, because it might be 32 bits too.
In addition, printing an int with the %lld format specifier is also undefined behavior on most systems.
The root of all evil here is the crappy default types in C (called "primitive data types" for a reason). Simply get rid of them and all their uncertainties, and all your bugs will go away with them:
#include <stdio.h>
#include <inttypes.h>
int main(void)
{
printf("%"PRIu64, (uint64_t)100000 * (uint64_t)100001);
return 0;
}
Or equivalent: UINT64_C(100000) * UINT64_C(100001).
Your two integers are int, that will make the result int too. That the printf() format specifier says %lld, which needs long long int, doesn't matter.
You can cast or use suffixes:
printf("%lld", 100000LL * 100001LL);
This prints 10000100000. Of course there's still a limit, since the number of bits in a long long int is still constant.
You can do it like this
long long int a = 100000;
long long int b = 100001;
printf("%lld",(a)*(b));
this will give the correct answer.
What you are doing is (100000)*(100001): by default the compiler treats 100000 and 100001 as int, multiplies them, and produces an int result.
printf then tries to print that int as a long long int.

adding and subtracting float from unsigned short in C

I ran into a problem and it is driving me nuts.
I have code like this:
float a;
unsigned short b;
b += a;
When a is negative, b is going bananas.
I even did a cast
b += (unsigned short) a;
but it doesn't work.
What did I do wrong? How can I add a float to an unsigned short?
FYI:
When a is -1 and b is 0, b += a gives b = 65535.
The way to add a float to an unsigned short is simply to add it, exactly as you've done. The operands of the addition will undergo conversions, as I'll describe below.
A simple example, based on your code, is:
#include <stdio.h>
int main(void) {
float a = 7.5;
unsigned short b = 42;
b += a;
printf("b = %hu\n", b);
return 0;
}
The output, unsurprisingly, is:
b = 49
The statement
b += a;
is equivalent to:
b = b + a;
(except that b is only evaluated once). When operands of different types are added (or subtracted, or ...), they're converted to a common type based on a set of rules you can find in the C standard section 6.3.1.8. In this case, b is converted from unsigned short to float. The addition is equivalent to 42.0f + 7.5f, which yields 49.5f. The assignment then converts this result from float to unsigned short, and the result, 49, is stored in b.
If the mathematical result of the addition is outside the range of float (which is unlikely), or if it's outside the range of unsigned short (which is much more likely), then the program will have undefined behavior. You might see some garbage value stored in b, your program might crash, or in principle quite literally anything else could happen. When you convert a signed or unsigned integer to an unsigned integer type, the result is wrapped around; this does not happen when converting a floating-point value to an unsigned type.
Without more information, it's impossible to tell what problem you're actually having or how to fix it.
But it does seem that adding an unsigned short and a float and storing the result in an unsigned short is an unusual thing to do. There could be situations where it's exactly what you need (if so you need to avoid overflow), but it's possible that you'd be better off storing the result in something other than an unsigned short, perhaps in a float or double. (Incidentally, double is used more often than float for floating-point data; float is useful mostly for saving space when you have a lot of data.)
If you're doing numeric conversions, even implicit ones, it's often (but by no means always) an indication that you should have used a variable of a different type in the first place.
Your question would be improved by showing actual values you have trouble with, and explaining what value you expected to get.
But in the meantime, the definition of floating to integer conversion in C11 6.3.1.4/1 is:
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
This comes into play at the point where the result of b + a, which is a float, is assigned back to b. Recall that b += a is equivalent to b = b + a.
If b + a is a negative number of -1 or greater magnitude, then its integral part is out of range for unsigned short so the code causes undefined behaviour which means anything can happen; including but not limited to going bananas.
A footnote repeats the point that the float is not first converted to a signed integer and then to unsigned short:
The remaindering operation performed when a value of integer type is converted to unsigned type need not be performed when a value of real floating type is converted to unsigned type. Thus, the range of portable real floating values is (−1, Utype_MAX+1)
As an improvement you could write:
b += (long long)a;
which will at least not cause UB so long as a > LLONG_MIN.
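Two ways of keeping the conversion defined, as a rough sketch. Both helper names are hypothetical; the first clamps the sum into the range of unsigned short, the second wraps around by going through long long as suggested above:

```c
#include <limits.h>

/* Hypothetical helper: adds a to b and clamps the sum to the range of
   unsigned short, so the float-to-integer conversion is always defined. */
unsigned short add_clamped(unsigned short b, float a)
{
    float sum = (float)b + a;
    if (!(sum > 0.0f))              /* negative, zero or NaN */
        return 0;
    if (sum >= (float)USHRT_MAX)    /* too large (or +infinity) */
        return USHRT_MAX;
    return (unsigned short)sum;     /* fraction is discarded */
}

/* Hypothetical helper: wraps around instead. a is truncated to a
   long long first, and integer-to-unsigned conversion is modular.
   Assumes a is within the range of long long. */
unsigned short add_wrapped(unsigned short b, float a)
{
    return (unsigned short)(b + (long long)a);
}
```

With b = 0 and a = -1, add_clamped returns 0 and add_wrapped returns USHRT_MAX (65535 with a 16-bit unsigned short). Which of the two is appropriate depends on what the program is supposed to do with negative sums.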
You want b to be positive (it is unsigned), but a can be negative. That is OK as long as the magnitude of a is not larger than b. This is the first point.
Second, when you cast a negative value to unsigned, what is the result actually supposed to be? In two's complement, the sign of a number is stored in the most significant bit, which is 1 for negative values. When the same bits are read as unsigned, a most significant bit of 1 means the value is really high and has nothing in common with the negative one.
Maybe try b -= fabs(a) for negative a. Isn't that what you are looking for?
You are observing the combination of the float being converted to an integer, and unsigned integer wrap-around ( https://stackoverflow.com/a/9052112/1149664 ).
Consider
b += a
for example with a = -100.67: you add a negative value to an unsigned data type, and depending on the initial value of b the result ought to be negative, which b cannot represent. How did you come to use an unsigned short and not just float or double for this task?

Weirdness with unsigned int, float data types and multiplication

I am not very good at C language and just met a problem I don't understand. The code is:
int main()
{
unsigned int a = 100;
unsigned int b = 200;
float c = 2;
int result_i;
unsigned int result_u;
float result_f;
result_i = (a - b)*2;
result_u = (a - b);
result_f = (a-b)*c;
printf("%d\n", result_i);
printf("%d\n", result_u);
printf("%f\n", result_f);
return 0;
}
And the output is:
-200
-100
8589934592.000000
Program ended with exit code: 0
Since (a-b) is negative and a, b are of unsigned int type, (a-b) is trivial. And after multiplying by the float number c, the result is 8589934592.000000. I have two questions:
First, why is the result non-trivial after multiplying by the int number 2 and assigning it to an int variable?
Second, why is result_u non-trivial even though (a-b) is negative and result_u is of unsigned int type?
I am using Xcode to test this code, and the compiler is the default APPLE LLVM 6.0.
Thanks!
Your assumption that a - b is negative is completely incorrect.
Since a and b have unsigned int type all arithmetic operations with these two variables are performed in the domain of unsigned int type. The same applies to mixed "unsigned int with int" arithmetic as well. Such operations implement modulo arithmetic, with the modulo being equal to UINT_MAX + 1.
This means that expression a - b produces a result of type unsigned int. It is a large positive value equal to UINT_MAX + 1 - 100. On a typical platform with 32-bit int it is 4294967296 - 100 = 4294967196.
Expression (a - b) * 2 also produces a result of type unsigned int. It is also a large positive value (UINT_MAX + 1 - 100 multiplied by 2 and taken modulo UINT_MAX + 1). On a typical platform it is 4294967096.
This latter value is too large for type int, so when you force it into the variable result_i, an out-of-range conversion to a signed integer type occurs. The result of such a conversion is implementation-defined (or an implementation-defined signal is raised). In your case result_i ended up being -200. It looks "correct", but this is not guaranteed by the language. (Albeit it might be guaranteed by your implementation.)
Variable result_u receives the correct unsigned result - a positive value UINT_MAX + 1 - 100. But you print that result using %d format specifier in printf, instead of the proper %u. It is illegal to print unsigned int values that do not fit into the range of int using %d specifier. The behavior of your code is undefined for that reason. The -100 value you see in the output is just a manifestation of that undefined behavior. This output is formally meaningless, even though it appears "correct" at the first sight.
Finally, variable result_f receives the "proper" result of the (a-b)*c expression, calculated without overflows, since the multiplication is performed in the float domain. What you see is that large positive value I mentioned above, multiplied by 2. It is rounded to the precision of the float type though, which is implementation-defined. The exact value would be 4294967196 * 2 = 8589934392; with a 24-bit float significand, 4294967196 is rounded to 4294967296 when it is converted, which is why 8589934592 is printed.
One can argue that the last value you printed is the only one that properly reflects the properties of unsigned arithmetic, i.e. it is "naturally" derived from the actual result of a - b.
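A sketch of the three computations as functions, assuming a 32-bit unsigned int, an IEEE 754 float, and a long long wider than unsigned int. The function names are hypothetical:

```c
/* Unsigned subtraction wraps modulo UINT_MAX + 1. */
unsigned int diff_unsigned(unsigned int a, unsigned int b)
{
    return a - b;
}

/* Converting both operands to a wider signed type first gives the
   mathematical difference. */
long long diff_signed(unsigned int a, unsigned int b)
{
    return (long long)a - (long long)b;
}

/* The wrapped unsigned difference is converted to float and scaled.
   The volatile store forces the result to be rounded to float. */
float diff_scaled(unsigned int a, unsigned int b, float c)
{
    volatile float r = (a - b) * c;
    return r;
}
```

Under those assumptions, diff_unsigned(100, 200) is 4294967196, diff_signed(100, 200) is -100, and diff_scaled(100, 200, 2) is 8589934592. Printing the unsigned result requires %u, and the long long result requires %lld.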
You get negative numbers in the printf because you've asked it to print a signed integer with %d. Use %u if you want to see the actual value you ended up with. That will also show you how you ended up with the output for the float multiplication.

Resources