Assigning a value to a float variable changes the value in C

I am trying to calculate an average of integer numbers and assign it to a float variable. When I debug it with cgdb and print the right side of the average calculation, it gives me the right number. However, when I assign it to the (float*)payload the value changes from 401850471 to 401850464.00.
float sum = 0.0;
for (int i = 0; i < avg_operator->data_source->column_pointer.result->num_tuples; i++) {
    sum += ((int*)avg_operator->data_source->column_pointer.result->payload)[i];
}
((float*)avg_operator->result->payload)[0] =
    sum / (float)avg_operator->data_source->column_pointer.result->num_tuples;

You cannot convert an int to a float by casting their pointers, that gives a random / undefined value. You need to dereference the float pointer, and assign the value.

that:
((float*)avg_operator->result->payload)[0]= sum/(float)avg_operator->data_source->column_pointer.result->num_tuples;
isn't casting, it's lying to the compiler. You should dereference, and there is no need to cast to float, as the conversion to integer is done automatically:
avg_operator->result->payload[0]= sum/(float)avg_operator->data_source->column_pointer.result->num_tuples;
(well, maybe you need to round the value instead of truncating, though)
also, since payload is an integer, no need to cast to integer pointer as well, just do:
sum+= avg_operator->data_source->column_pointer.result->payload[i];
and define sum as an integer; one never knows with floating-point accumulation error (as long as the sum isn't too big for an integer, that is)

When I debug it with cgdb and print the right side of the average calculation, it gives me the right number.
The debugger is showing the quotient using double math. C allows float division to use wider types. But once the quotient is assigned to a float, precision narrowing may occur.
401850471 is a 29 bit value. A float typically has 24 bits of precision. Something must give.
401850464.0 is the closest representable float to 401850471, so that speaks well to that at least there is a reasonable result.
OP is also doing other strange code manipulations. A recommended solution begins with a wider sum type and more precise division and storage.
long long sum = 0;
int n = avg_operator->data_source->column_pointer.result->num_tuples;
int *data = (int*)avg_operator->data_source->column_pointer.result->payload;
for (int i = 0; i < n; i++) {
sum += data[i];
}
double average = 1.0 * sum / n;
printf("Average %f\n", average);
If the answer must be a float, code must live with a rounded (in a binary sense) average.
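The precision limit described above can be checked directly. This is a minimal sketch, assuming a 32-bit int and an IEEE-754 binary32 float (24-bit significand); float_roundtrip_error is a hypothetical helper name, not something from the question.

```c
/* Hypothetical helper: how far does an int move when it is stored
   in a float and read back? Assumes float is IEEE-754 binary32. */
long long float_roundtrip_error(int v)
{
    float f = (float)v;   /* rounds to the nearest representable float */
    return (long long)f - (long long)v;
}
```

For 401850471 the helper returns -7, which matches the 401850464.00 seen in the question. Values of magnitude up to 2^24 (16777216) come back unchanged.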

Related

Is it defined what will happen if you shift a float?

I am following This video tutorial to implement a raycaster. It contains this code:
if(ra > PI) { ry = (((int)py>>6)<<6)-0.0001; rx=(py-ry)*aTan+px; yo=-64; xo=-yo*aTan; }//looking up
I hope I have transcribed this correctly. In particular, my question is about casting py (it's declared as float) to integer, shifting it back and forth, subtracting something, and then assigning it to ry (also a float). This line of code is entered at time 7:24, where he also explains that he wants to
round the y position to the nearest 64th value
(I'm unsure if that means the nearest multiple of 64 or the nearest (1/64), but I know that the 6 in the source is derived from the number 64, being 2⁶)
For one thing, I think that it would be valid for the compiler to load (say) a 32-bit float into a machine register, and then shift that value down by six spaces, and then shift it back up by six spaces (these two operations could interfere with the mantissa, or the exponent, or maybe something else, or these two operations could be deleted by a peephole optimisation step.)
Also I think it would be valid for the compiler to make demons fly out of your nose when this statement is executed.
So my question is, is (((int)py>>6)<<6) defined in C when py is float?
is (((int)py>>6)<<6) defined in C when py is float?
It is certainly undefined behavior (UB) for many float. The cast to an int is UB for float with a whole number value outside the [INT_MIN ... INT_MAX] range.
So code is UB for about 38% of all typical float - the large valued ones, NaNs and infinities.
For typical float, a cast to int128_t is defined for nearly all float.
To get to OP's goal, code could use the below, which I believe to be well defined for all float.
If anything, use the below to assess the correctness of one's crafted code.
// round the y position to the nearest 64th value
float round_to_64th(float x) {
if (isfinite(x)) {
float ipart;
// The modf functions break the argument value into integral and fractional parts
float frac = modff(x, &ipart);
x = ipart + roundf(frac*64)/64;
}
return x;
}
"I'm unsure if that means the nearest multiple of 64 or the nearest (1/64)"
On review, OP's code is attempting to truncate to the nearest multiple of 64 or 2⁶.
It is still UB for many float.
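If the int-based truncation is kept, the cast can be guarded so that it only runs when it is defined. The following is a sketch under that assumption; floor_to_multiple_of_64 is a hypothetical name, and it uses a remainder instead of shifts so that negative values do not depend on implementation-defined shifting.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: truncate y to int, then floor to a multiple of 64.
   Returns false (leaving *out untouched) when (int)y would be undefined. */
bool floor_to_multiple_of_64(float y, int *out)
{
    /* (float)INT_MIN is exact (a power of two); its negation is INT_MAX + 1.
       NaN fails both comparisons, each infinity fails one of them. */
    if (!(y >= (float)INT_MIN && y < -(float)INT_MIN))
        return false;

    int i = (int)y;   /* defined: the truncated value fits in int */
    int r = i % 64;   /* remainder has the sign of i */
    if (r < 0)
        r += 64;
    *out = i - r;     /* no shift of a negative int needed */
    return true;
}
```

For non-negative inputs this gives the same value as (((int)py>>6)<<6); out-of-range inputs are reported to the caller instead of invoking UB.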
That code doesn't shift a float because the bitshift operators aren't defined for floating-point types. If you try it you will get a compiler error.
Notice that the code is (int)py >> 6, the float is cast to an int before the shift operation. The integer value is what is being shifted.
If your question is "what will happen if you shift a float?", the answer is it won't compile. Example on Compiler Explorer.
The best possible recreation of the shift ops for floating points, in short, without using additional functions are the following:
Left shift:
ShiftFloat(py,6,1);
Right shift:
ShiftFloat(py,6,0);
float ShiftFloat(float x, int count, int ismultiplication)
{
    float value = x;
    for (int i = 0; i < count; ++i)
    {
        // Multiply by 2 per step for a "left shift", by 0.5 for a "right shift".
        // (The original powf expression evaluated to 0.5 in both cases.)
        value *= ismultiplication ? 2.0f : 0.5f;
    }
    return value;
}

floating point exception(core dumped)

I tried putting puts("..") in the code to find where the mistake is, but it didn't help.
This is my third function; the first and second are working.
I translated it into English, I hope it's understandable.
void write3(sth_st*E, int n, char* Typ){
int i;
int sum=0;
int count=0;
float result;
for(i=0; i<n; i++){
if(strcmp(Typ, E[i].typ)==0){
sum=sum+E[i].time;
count++;
}
}
FILE*write3;
write3=open("xD", "w");
puts("rand");
result=sum/count;
fprintf(write3, "%f", result);
return;
}
count is most likely integer and 0. The system error message is misleading, especially since dividing by 0 is perfectly valid for floating point values.
With the extra context, we can infer that Typ was not found in the E array or maybe n is too small, so count and sum both stay at 0 and the division sum/count invokes undefined behavior because it is an integer division by zero.
If you convert one or the other to double, you will get a floating point division, which is undoubtedly what you expect and printf will print nan for this case.
result = (double)sum / count;
There is also a possibility that the sum of all times may overflow the int type. You should make sum a double to avoid that.
Note that %f is the printf format for double. result is a float, but luckily floats are silently converted to double when passed to printf. There is no benefit in using float types, use double instead.
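Rather than printing nan when nothing matches, the empty case can be checked before dividing. This is a minimal sketch, assuming the caller wants to handle "no matching entries" itself; mean_of is a hypothetical helper, not part of the question's code.

```c
#include <stdbool.h>

/* Hypothetical helper: compute sum / count in double.
   Returns false when count is 0 so the caller decides what to report. */
bool mean_of(long long sum, int count, double *out)
{
    if (count == 0)
        return false;            /* no matching entries: nothing to average */
    *out = (double)sum / count;  /* floating-point division, no truncation */
    return true;
}
```

In write3 this would replace result=sum/count; and let the function skip the fprintf, or write a message of its choice, when the call returns false.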

Trying to print answer to equation and getting zero in C.

printf("Percent decrease: ");
printf("%.2f", (float)((orgChar-codeChar)/orgChar));
I'm using this statement to print some results to my command console, however, I end up with zero. Putting the equation into another variable doesn't work either.
orgChar = 91 and codeChar = 13, how do I print out this equation?
Integer division gives the result 0 here, and you only cast that result to float afterwards, so you end up with 0.
Make any one of the operands a float before the division:
(orgChar-codeChar)/(float)orgChar
As others have mentioned, the subtraction and division are done using integer math before the cast to (float). By that point, the integer division has a truncated result of 0. Instead:
// (float)((orgChar-codeChar)/orgChar)
((float) orgChar - codeChar)/orgChar
// or
(orgChar - codeChar)/ (float) orgChar
As the float argument gets converted to double as part of the "usual argument promotion" of arguments to a variadic function like printf(), might as well do
printf("%.2f", (orgChar-codeChar)/ (double) orgChar);
Casting, in general, should be avoided, because some casts unintentionally narrow the operation. In the snippet below, if unsigned is 32-bit and a1 is uint64_t, then a1 is narrowed before the shift and unexpected results may occur. If a1 is a char, it is converted to an unsigned without trouble.
The second method, multiplying by 1u, will not narrow. It ensures a2*1u is at least the width of an unsigned.
unsigned sh1 = (unsigned) a1 >> b1; // avoid
unsigned sh2 = a2*1u >> b2; // better
So I recommend, rather than (float) or (double), the idiom of multiplying by 1.0.
printf("%.2f", (orgChar - codeChar) * 1.0 / orgChar);
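To see the difference between casting too late and converting an operand first, the two expressions can be put side by side. This is only a sketch; the function names are made up for illustration, while orgChar and codeChar are the names from the question.

```c
/* Integer math first, cast afterwards: the fraction is already lost. */
double decrease_cast_too_late(int orgChar, int codeChar)
{
    return (double)((orgChar - codeChar) / orgChar);
}

/* One operand converted before dividing: floating-point division. */
double decrease_fraction(int orgChar, int codeChar)
{
    return (orgChar - codeChar) / (double)orgChar;
}
```

With orgChar = 91 and codeChar = 13, the first returns 0 and the second returns about 0.857, which "%.2f" prints as 0.86.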
You don't need to typecast the whole expression. You can simply cast either the numerator or the denominator to get the float result, printed with 2 decimal places.
For example, defining a variable c as float doesn't guarantee that the result is computed as a float. To get the precise result, you need to cast either the numerator or the denominator.
You shouldn't need to cast to float at all. Simply make sure both variables are of type float or double before attempting to print them as floats. This means either declaring the variables as floats, or using the correct function, such as atof () when converting the data to floats (normally this is done when you get the data from the command-line or a file.)
This should work...
#include <stdio.h>
int
main (void)
{
float orgChar = 91;
float codeChar = 13;
printf ("%.2f\n", (orgChar - codeChar) / orgChar);
return 0;
}

Average of an array displays correctly only if casted to float

So I have a pretty noobish question. Although I declare the average as a float, when I calculate it avg = sum / counter;, where counter is the number of elements bigger than 0 in an array, and then print it, I get only 0s after the decimal point.
However if I calculate it by casting to a float, avg = (float) sum/counter;, the average is printed out correctly.
Shouldn't the first one be correct? If I declare a variable as a float, why should I cast it later to a float again?
When you declare
int sum;
int counter;
...
then sum / counter performs an integer division, resulting in an integer value. You can still assign that result to a float variable, but the value will remain the integer part only.
To solve this, you need to cast either sum or counter to a float - only then you are getting the float value also as a result:
float result = (float) sum / counter;
This is, by the way, the same as ((float) sum) / counter; that is, the cast as you wrote it applies to sum.
The cast as you wrote it applies to sum, not to avg, the result you obtain is perfectly normal if sum is an integer type.
The operator / applied to two integers performs integer division. The operator / applied to two floating point values performs floating point division. If one value is floating point, the other is promoted to floating point. But if both are integer, integer division is performed.
Later, the result of this operation, which already exists in memory, is assigned to another variable. But that is another story. You need to get the correct result in the first place, and then you can assign it to a variable which will store it.
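The point that the division is finished before the assignment happens can be shown with two tiny functions. This is an illustrative sketch; the function names are hypothetical, and sum and counter are assumed to be int as in the answers above.

```c
/* int / int is evaluated first; the float variable only receives
   the already truncated quotient. */
float avg_assigned(int sum, int counter)
{
    float avg = sum / counter;
    return avg;
}

/* Converting one operand makes the division itself floating-point. */
float avg_converted(int sum, int counter)
{
    float avg = (float)sum / counter;
    return avg;
}
```

For sum = 7 and counter = 2, avg_assigned gives 3.0 and avg_converted gives 3.5.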
The problem is that sum is of type integer, so the division truncates and you get only 0s after the decimal point. It is not converted automatically. So either declare sum as float or manually cast sum to float before dividing.

Floating point rounding in C

I've run into some weird rounding behaviour with floats. The code below demonstrates the problem. What is the best way to solve this? I've been looking for solutions but haven't had much luck.
#include<stdio.h>
int main(void)
{
float t;
t = 5592411;
printf("%f\n", 1.5*t);
t *= 1.5;
printf("%f\n", t);
return 0;
}
The code above should print out the same value, but I get this on my setup using GCC 4.7.2:
8388616.500000
8388616.000000
If I use a calculator, I get the first value, so I assume the second is being rounded somehow. I have identical Fortran code which does not round the value(has the 0.5).
1.5 is a double constant rather than a float and C has automatic promotion rules. So when you perform 1.5*t what happens is (i) t is converted to a double; (ii) that double is multiplied by the double 1.5; and (iii) the double is printed (as %f is the formatter for a double).
Conversely, t *= 1.5 promotes t to a double, performs a double multiplication and then rounds the result to store it back into a [single precision] float.
For evidence, try either:
float t;
t = 5592411;
printf("%f\n", 1.5f*t); // multiply a float by a float, for no promotion
t *= 1.5;
printf("%f\n", t);
return 0;
Or:
double t; // store our intermediate results in a double
t = 5592411;
printf("%f\n", 1.5f*t);
t *= 1.5;
printf("%f\n", t);
return 0;
The first calculation is done with double precision, the second is calculated the same, but truncated to single precision in the assignment to float.
If you use double for your variable, you'll get the same result. It's a good idea to use this type over float whenever accuracy may be a concern.
In the first case, the result is a double which can precisely represent the desired value.
In the second case, the result is a float which can't precisely represent the desired value.
Try the same with double and you'll end up with the same results either way.
#include<stdio.h>
int main(void)
{
double t;
t = 5592411;
printf("%f\n", 1.5*t);
t *= 1.5;
printf("%f\n", t);
return 0;
}
Writing 1.5 in C code is interpreted as a double, which has more precision than the float type.
The first case,
printf("%f\n", 1.5*t);
results in t being implicitly converted to a double (with greater precision) and then multiplied. The printf function, which receives a double for %f anyway, prints the result, which is also a double.
In the second case the product is likewise computed as a double, but it is then converted back to the float type of t, which has less precision and cannot store such fine detail.
If you want to avoid this effect, use 1.5f instead of 1.5 to use floats, or change the type of t to double.
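The two cases can be isolated in a pair of small functions. This is a sketch, assuming IEEE-754 float and double with the default round-to-nearest mode; the function names are illustrative only.

```c
/* Product kept as double: 1.5 is a double constant, so t is converted up. */
double scaled_as_double(float t)
{
    return 1.5 * t;
}

/* Same double product, then rounded to float when stored back into t. */
float scaled_stored_in_float(float t)
{
    t *= 1.5;
    return t;
}
```

For t = 5592411 the first returns 8388616.5 and the second returns 8388616.0, the same pair of values printed in the question.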
Whether this would work at all depends on the machine representation of floats and doubles. Passing a float on a typical 32 bit architecture pushes 4 bytes on the argument stack. Passing a double would push 8 bytes. Passing a double but using %f is asking to treat it as a float which will look at the first 4 bytes pushed in our typical case. Depending on machine representation this might be close to the intended result or might be way out in left field.

Resources