MISRA violation "441 - Float cast to non-float " - c

I am trying to correct the MISRA violation "441 - Float cast to non-float" that is occurring with the following code:
tULong frames = (tULong)(runTimeSeconds * 40.0f);
runTimeSeconds is a float, and obviously 40.0f is a float literal. Any ideas?

There is a rule (MISRA-C:2004 10.4) stating that the value of a complex expression of floating type may only be cast to a narrower floating type.
(runTimeSeconds * 40.0f) is such a so-called complex expression (a MISRA-C:2004 term). To dodge the MISRA violation, you can introduce a temporary variable:
float tmp = runTimeSeconds * 40.0f;
tULong frames = (tULong)tmp; // no complex expression, this is fine
The rationale for this rule is that complex expressions could potentially contain implicit type promotions and similar dangerous things.
MISRA-C:2004 is also worried/paranoid about incompetent programmers who think that changing code like uint8_t u8a, u8b; ... u8a + u8b into (uint32_t)(u8a + u8b) would somehow cause the addition to get carried out as an unsigned 32 bit type.
These rules have been improved in MISRA-C:2012 and are more reasonable there. A cast from a float expression to an unsigned one is fine as per MISRA-C:2012 10.5.
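In other words, under MISRA-C:2012 the original line should be acceptable as written. A minimal sketch (assuming tULong is a typedef for an unsigned integer type):
tULong frames = (tULong)(runTimeSeconds * 40.0f); /* acceptable per MISRA-C:2012 10.5 */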

<math.h> has a nice family of functions that round and convert in one call, so no cast is needed to convert from float to tULong. The code below keeps a (tULong) cast to handle the remaining integer-to-integer conversion; whether it can be eliminated depends on unposted details about the required range and about tULong itself.
#include <math.h>
// long int lrintf(float x);
// long long int llrintf(float x);
// 4 others
tULong frames = (tULong) llrintf(runTimeSeconds * 40.0f);
Note that this rounds, rather than truncating as OP's original code does.

If the idea is to truncate the result, use the truncf function:
tULong frames = (tULong)truncf(runTimeSeconds * 40.0f);
That way, your intention is made explicit.
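To see the difference between truncating and rounding, here is a small self-contained demo (assuming, since the typedef was not posted, that tULong is unsigned long):
#include <math.h>
#include <stdio.h>
typedef unsigned long tULong; /* assumption: actual typedef not posted */
int main(void)
{
    float runTimeSeconds = 1.99f;
    tULong truncated = (tULong)(runTimeSeconds * 40.0f);        /* 79.6f truncated: 79 */
    tULong rounded   = (tULong)llrintf(runTimeSeconds * 40.0f); /* 79.6f rounded: 80 */
    printf("truncated = %lu, rounded = %lu\n", truncated, rounded);
    return 0;
}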

long double in fabs, range and overflow errors

At wiki.sei.cmu.edu, they claim the following code is error-free for out-of-range floating-point errors during assignment; I've narrowed it down to the long double case:
Compliant Solution (Narrowing Conversion)
This compliant solution checks whether the values to be stored can be represented in the new type:
#include <float.h>
#include <math.h>
void func(double d_a, long double big_d) {
    double d_b;
    // ...
    if (big_d != 0.0 &&
        (isnan(big_d) ||
         isgreater(fabs(big_d), DBL_MAX) ||
         isless(fabs(big_d), DBL_MIN))) {
        /* Handle error */
    } else {
        d_b = (double)big_d;
    }
}
Unless I'm missing something, the declaration of fabs according to the C99 and C11 standards is double fabs(double x), which means it takes a double, so this code isn't compliant; instead, long double fabsl(long double x) should be used.
Further, I believe isgreater and isless should be declared as taking a long double as their first parameters (since that's what fabsl returns).
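Following that reasoning, a corrected sketch of the CERT example might use fabsl (isnan, isgreater, and isless are type-generic macros in C99, so they accept a long double directly):
#include <float.h>
#include <math.h>
void func(long double big_d) {
    double d_b;
    if (big_d != 0.0L &&
        (isnan(big_d) ||
         isgreater(fabsl(big_d), DBL_MAX) ||
         isless(fabsl(big_d), DBL_MIN))) {
        /* Handle error: big_d cannot be represented as a double */
    } else {
        d_b = (double)big_d; /* narrowed value, now known to be in range */
    }
}
The test program below shows what goes wrong when fabs is used instead: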
#include <stdio.h>
#include <math.h>
int main(void)
{
    long double ld = 1.12345e506L;
    printf("%lg\n", fabs(ld));  // UB: ld is outside the range of double (~1e308)
    printf("%Lg\n", fabsl(ld)); // OK
    return 0;
}
On my machine, this produces the following output:
inf
1.12345e+506
along with a warning (GCC):
warning: conversion from 'long double' to 'double' may change value [-Wfloat-conversion]
printf("%lg\n", fabs(ld));
^~
Am I therefore correct in saying their code results in undefined behavior?
On p. 211 of the C99 standard there's a footnote that reads:
Particularly on systems with wide expression evaluation, a <math.h> function might pass arguments
and return values in wider format than the synopsis prototype indicates.
and on some systems long double has the exact same value range, representation, etc. as double, but this doesn't mean the code above is portable.
Now I have a related question here, and I'd just like to ask for confirmation. (I've read through dozens of questions and answers here, but I'm still a little confused, because they often deal with specific examples and specific types, not all of them are sourced, or they're about C++, and I think it'd be a waste of time to ask each of these questions as a separate, "formal" question on Stack Overflow.) According to the C99 and C11 standards, there's a difference between overflow, which occurs during an arithmetic operation, and a range error, which occurs when a value is too large to be represented in a given type. I've provided excerpts from the C99 standard that talk about this, and I'd appreciate it if someone could confirm that my interpretation is correct. (I'm aware of the fact that certain implementations define what happens when undefined behavior occurs, e.g. as explained here, but that's not what I'm interested in right now.)
for floating-point types, overflow results in some representation of a "large value" (i.e. as defined by the HUGE_VAL* macro definition as per 7.12.1):
A floating result overflows if the magnitude of the mathematical result is finite but so
large that the mathematical result cannot be represented without extraordinary roundoff
error in an object of the specified type. If a floating result overflows and default rounding
is in effect, or if the mathematical result is an exact infinity (for example log(0.0)),
then the function returns the value of the macro HUGE_VAL, HUGE_VALF, or HUGE_VALL according to the return type, with the same sign as the correct value of the
function;
On my system, HUGE_VAL* is defined as INFINITY cast to the appropriate floating-point type. So this is completely legal, notwithstanding that the exact value of HUGE_VAL* is implementation-defined (see the sketch after the next excerpt).
for floating-point types, a range error results in undefined behavior (6.3.1.5):
When a double is demoted to float, a long double is demoted to double or
float, or a value being represented in greater precision and range than required by its
semantic type (see 6.3.1.8) is explicitly converted to its semantic type [...]. If the value being converted is outside the range of values that can be represented, the behavior is undefined.
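For contrast with the undefined narrowing case, here is a sketch of the defined overflow behavior described in 7.12.1 (whether errno is set depends on math_errhandling):
#include <errno.h>
#include <math.h>
#include <stdio.h>
int main(void)
{
    errno = 0;
    double r = exp(1000.0); /* mathematical result ~1.97e434, far beyond DBL_MAX */
    if (r == HUGE_VAL) {
        /* Defined behavior: overflow is reported through the return value */
        printf("overflow, errno == ERANGE: %d\n", errno == ERANGE);
    }
    return 0;
}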

When to use casting in C

I have a statement in C code which I suspect may be giving me periodic errors, so I want to make sure I am doing the right thing, as it mixes types.
Objective is to change timebase from 1/32768 seconds to 1/1024, with all times 32 bit integers.
What I have is this:
ts_sys = latest_timestamp * VELO_TICKS_FROM_RTC;
Where ts_sys and latest_timestamp are both unsigned 32 bit integers.
VELO_TICKS_FROM_RTC is a define as follows:
#define VELO_TICKS_PER_SECOND 1024
#define VELO_TICKS_FROM_RTC (VELO_TICKS_PER_SECOND / 32768.0f)
Should I be using a cast here to make sure the division doesn't return an integer (which would be zero) and therefore return the wrong thing? For example would this be better:
ts_sys = (uint32_t) ((float)latest_timestamp * VELO_TICKS_FROM_RTC);
but that seems like overkill...
Should I be using a cast here to make sure the division doesn't return an integer (which would be zero) and therefore return the wrong thing?
No: you are computing A/B where B is a float, so the compiler promotes A to float and the result is a float!
"Should I be using a cast here to make sure the division doesn't return an integer?"
No, 1024 is of type int and latest_timestamp is of type uint32_t. Both get converted to float in the arithmetic expressions before the respective calculation is done:
Otherwise, if the corresponding real type of either operand is float, the other operand is converted, without change of type domain, to a type whose corresponding real type is float.
C18, §6.3.1.8/1, "Usual arithmetic conversions"
"but that seems like an overkill..."
It is.
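A compilable sketch of that reasoning, with values chosen so the float result is exact:
#include <stdint.h>
#include <stdio.h>
#define VELO_TICKS_PER_SECOND 1024
#define VELO_TICKS_FROM_RTC (VELO_TICKS_PER_SECOND / 32768.0f)
int main(void)
{
    uint32_t latest_timestamp = 65536u; /* two seconds' worth of 32768 Hz ticks */
    /* latest_timestamp is converted to float automatically; the assignment
       back to uint32_t truncates the (here exact) float result. */
    uint32_t ts_sys = latest_timestamp * VELO_TICKS_FROM_RTC;
    printf("%u\n", (unsigned)ts_sys); /* prints 2048 */
    return 0;
}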

MISRA warning 12.4: integer conversion resulted in truncation (negation operation)

In a huge macro I have in a program aimed at a 16-bit processor, the following code (simplified) appears several times:
typedef unsigned short int uint16_t;
uint16_t var;
var = ~0xFFFF;
MISRA complains with the warning 12.4: integer conversion resulted in truncation. The tool used to get this is Coverity.
I have checked the forum, but I really need a solution (rather than replacing the negation with the actual value), as this line is inside a macro with varying parameters.
I have tried many things and here is the final attempt which fails also:
var = (uint16_t)((~(uint16_t)(0xFFFFu))&(uint16_t)0xFFFFu);
(The value 0xFFFF is just an example; in the actual code, the value is a variable which can take any 16-bit value.)
Do you have any other idea please? Thanks.
EDIT:
I have since tried using a 32-bit value, and the result is the same with the following code:
typedef unsigned int uint32_t;
uint32_t var;
var = (uint32_t)(~(uint32_t)(0xFFFF0000u));
Summary:
Assuming you are using a static analyser for MISRA-C:2012, you should have gotten warnings for violations of rules 10.3 and 7.2.
Rule 12.4 is only concerned with wrap-around of unsigned integer constants, which can only occur with the binary + and - operators. It seems irrelevant here.
The warning text doesn't seem to make sense for either MISRA-C:2004 12.4 or MISRA-C:2012 12.4. Possibly, the tool is displaying the wrong warning.
There is, however, a MISRA-C:2012 rule 10.3 that forbids assigning a value to a variable of a narrower essential type than that of the expression.
To use MISRA terms, the essential type of ~0xFFFF is unsigned, because the hex literal is of type unsigned int. On your system, unsigned int is apparently larger than uint16_t (int is a "greater ranked" integer type than short in the standard 6.3.1.1, even if they are of the same size). That is, uint16_t is of a narrower essential type than unsigned int, so your code does not conform to rule 10.3. This is what your tool should have reported.
The actual technical issue, hidden behind the MISRA terms, is that the ~ operator is dangerous because it comes with an implicit integer promotion. This in turn causes code such as
uint8_t x=0xFF;
~x << n; // BAD, always a bug
to invoke undefined behavior when the value 0xFFFFFF00 is left shifted.
It is therefore always good practice to cast the result of the ~ operator to the correct, intended type. There was even an explicit rule about this in MISRA 2004, which has now merged into the "essential type" rules.
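For instance, a sketch of that practice applied to the shift example above:
#include <stdint.h>
uint32_t shift_demo(uint8_t x, uint8_t n)
{
    /* Casting the result of ~ back to uint8_t discards the promoted high
       bits, so the value being shifted is non-negative and the left shift
       is well-defined (for n < 32). */
    return (uint32_t)((uint8_t)~x) << n;
}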
In addition, MISRA (7.2) states that all integer constants should have a u or U suffix.
MISRA-C:2012 compliant code would look like this:
uint16_t var;
var = (uint16_t)~0xFFFFu;
or overly pedantic:
var = (uint16_t)~(uint16_t)0xFFFFu;
When the compiler looks at the right-hand side, it first sees the literal 0xFFFF, which is automatically promoted to an int that is (as the warning shows) 32-bit on your system. We can picture that value as 0x0000FFFF (the whole 32 bits). When the compiler applies the ~ operation, it becomes 0xFFFF0000 (again 32 bits). So when you write var = ~0xFFFF;, the compiler in fact sees var = 0xFFFF0000; just before the assignment, and of course a truncation happens during that assignment...
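A sketch that makes those intermediate values visible (assuming a 32-bit unsigned int, as the warning indicates):
#include <stdint.h>
#include <stdio.h>
int main(void)
{
    printf("%08X\n", 0xFFFFu);         /* 0000FFFF after promotion */
    printf("%08X\n", ~0xFFFFu);        /* FFFF0000 after ~ */
    uint16_t var = (uint16_t)~0xFFFFu; /* explicit truncation to 0000 */
    printf("%04X\n", (unsigned)var);
    return 0;
}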

Casting in mixed type calculations in C?

If I define these variables:
double x0, xn, h;
int n;
and I have this mathematical expression:
h = (xn - x0)/n;
Is it necessary to cast n to double prior to doing the division, for maximum accuracy, as in
h = (xn - x0)/ (double) n;
I wrote a program to check the above, but both expressions give the same answers. I understand that C will promote the integer to double, since the variables xn and x0 are of type double, but strangely enough, in a book the second expression, with the cast, was emphasized.
My question would be if I'm thinking right.
Thanks a lot...
Your understanding is correct, and the book you read is either mistaken or being over-cautious (like people who claim that you should always test 0 == x instead of x == 0). The expression without the cast should always give precisely the same result as the expression with the cast.
No, this conversion is unnecessary because the numerator is a double. This promotes n to a double as well. The book probably mentions the explicit cast as a good habit, because if xn and x0 were ints then the expression would only use integer division.
It's unnecessary in this situation. It's typically needed in situations where you want a result that's different in type from the operands. A typical one is timing. You often end up with code like (double)(end_time - start_time) / CLOCKS_PER_SEC; and in this case the cast really is needed, because all the inputs are (typically) integer types, but you want a floating-point result.
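A compilable sketch of that timing case:
#include <stdio.h>
#include <time.h>
int main(void)
{
    clock_t start_time = clock();
    /* ... work being timed ... */
    clock_t end_time = clock();
    /* clock_t is an integer type on typical implementations, so without the
       cast the division could be carried out in integer arithmetic. */
    double elapsed = (double)(end_time - start_time) / CLOCKS_PER_SEC;
    printf("elapsed: %f seconds\n", elapsed);
    return 0;
}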
In C, each operation is carried out at the precision of its highest-precision operand (per the usual arithmetic conversions); it is the operands of each operation, not the expression as a whole, that determine this.
Explicit casts are useful:
As a precaution. Tomorrow you may edit the variables in the expression to use ints; the cast would still produce the proper value (see the sketch below).
As a guide. Someone else referring to or modifying your code will easily see that you are using a double. In other words: let your code be its own comment!

Can I compare and add a floating-point number to an integer in C?

Can I compare a floating-point number to an integer?
Will the float compare to integers in code?
float f; // f holds a predetermined floating-point value
if (f >= 100) { __asm__reset... etc }
Also, could I...
float f;
int x = 100;
x+=f;
I have to use the floating point value f received from an attitude reference system to adjust a position value x that controls a PWM signal to correct for attitude.
The first one will work fine. 100 will be converted to a float, and IEEE 754 can represent all integers exactly as floats, up to about 2^23.
The second one will also work, but the result will be converted back into an integer, so you'll lose precision (that's unavoidable if you're turning floats into integers).
Since you've identified yourself as unfamiliar with the subtleties of floating point numbers, I'll refer you to this fine paper by David Goldberg: What Every Computer Scientist Should Know About Floating-Point Arithmetic (reprint at Sun).
After you've been scared by that, the reality is that most of the time floating point is a huge boon to getting calculations done. And modern compilers and languages (including C) handle conversions sensibly so that you don't have to worry about them. Unless you do.
The points raised about precision are certainly valid. An IEEE float effectively has only 24 bits of precision, which is less than a 32-bit integer. Use of double for intermediate calculations will push all rounding and precision loss out to the conversion back to float or int.
Mixed-mode arithmetic (arithmetic between operands of different types and/or sizes) is legal but fragile. The C standard defines rules for type promotion in order to convert the operands to a common representation. Automatic type promotion allows the compiler to do something sensible for mixed-mode operations, but "sensible" does not necessarily mean "correct."
To really know whether or not the behavior is correct you must first understand the rules for promotion and then understand the representation of the data types. In very general terms:
shorter types are converted to longer types (float to double, short to int, etc.)
integer types are converted to floating-point types
signed/unsigned conversions favor avoiding data loss (whether signed is converted to
unsigned or vice-versa depends on the size of the respective types)
Whether code like x > y (where x and y have different types) is right or wrong depends on the values that x and y can take. In my experience it's common practice to prohibit (via the coding standard) implicit type conversions. The programmer must consider the context and explicitly perform any type conversions necessary.
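As a sketch of why such coding standards exist, here is a comparison whose implicit conversion surprises (assuming a 32-bit int):
#include <stdint.h>
#include <stdio.h>
int main(void)
{
    int32_t  x = -1;
    uint32_t y = 1u;
    /* Implicit conversion: x becomes a huge unsigned value, so x > y is 1. */
    printf("implicit: %d\n", x > y);
    /* An explicit conversion to a wider signed type states the intent and
       gives the mathematically expected answer, 0. */
    printf("explicit: %d\n", (int64_t)x > (int64_t)y);
    return 0;
}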
Can you compare a float and an integer? Sure. But the problem you will run into is precision. On most C/C++ implementations, float and int have the same size (4 bytes) but wildly different precision levels, and neither type can hold all values of the other. Since one type cannot be converted to the other without loss of precision, and the types cannot be compared natively, comparing them without taking the conversion into account will lose precision in some scenarios.
What you can do to avoid precision loss is to convert both types to a type which has enough precision to represent all values of float and int. On most systems, double will do just that. So the following usually does a non-lossy comparison
float f = getSomeFloat();
int i = getSomeInt();
if ( (double)i == (double)f ) {
    ...
}
In an assignment, the LHS defines the precision: if your LHS is int and your RHS is float, the result is truncated to int and precision is lost.
Also take a look at the floating-point-related entries in the C FAQ.
Yes, you can compare them, you can do math on them without terribly much regard for which is which, in most cases. But only most. The big bugaboo is that you can check for f<i etc. but should not check for f==i. An integer and a float that 'should' be identical in value are not necessarily identical.
Yeah, it'll work fine. Specifically, the int will be converted to float for the purposes of the comparison. In the second one, the result is converted back to int (an explicit cast makes that clearer), but it should be fine otherwise.
Yes, and sometimes it'll do exactly what you expect.
As the others have pointed out, comparing, e.g., 1.0 == 1 will work out, because the integer 1 is converted to double (not float) before the comparison.
However, other comparisons may not.
About that: the notation 1.0 is of type double, so the comparison is made in double by the promotion rules mentioned before. 1.f or 1.0f is of type float, and the comparison would then have been made in float. It would have worked as well, since, as stated above, the first 2^23 integers are representable in a float.
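A sketch of where that representability gives out (2^24 + 1 is the first positive integer a float cannot hold exactly):
#include <stdio.h>
int main(void)
{
    int   big = 16777217;   /* 2^24 + 1 */
    float g   = (float)big; /* rounds to 16777216.0f */
    /* In g == big, big is converted to float again, so two different
       values compare equal: */
    printf("%d\n", g == big); /* 1, despite the values differing */
    /* Comparing in double preserves both values and detects the difference: */
    printf("%d\n", (double)g == (double)big); /* 0 */
    return 0;
}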
