MISRA warning 12.4: integer conversion resulted in truncation (negation operation) - c

In a huge macro I have in a program aimed for a 16-bit processor, the following code (simplified) appears several times:
typedef unsigned short int uint16_t;
uint16_t var;
var = ~0xFFFF;
MISRA complains with the warning 12.4: integer conversion resulted in truncation. The tool used to get this is Coverity.
I have checked the forum but I really need a solution (instead of replacing the negation with the actual value), as this line is inside a macro with varying parameters.
I have tried many things and here is the final attempt which fails also:
var = (uint16_t)((~(uint16_t)(0xFFFFu))&(uint16_t)0xFFFFu);
(The value 0xFFFF is just an example. In the actual code, the value is a variable which can take any 16-bit value.)
Do you have any other idea please? Thanks.
EDIT:
I then tried to use a 32-bit value, and the result is the same with the following code:
typedef unsigned int uint32_t;
uint32_t var;
var = (uint32_t)(~(uint32_t)(0xFFFF0000u));

Summary:
Assuming you are using a static analyser for MISRA-C:2012, you should have gotten warnings for violations of rules 10.3 and 7.2.
Rule 12.4 is only concerned with wrap-around of unsigned integer constants, which can only occur with the binary + and - operators. It seems irrelevant here.
The warning text doesn't seem to make sense for either MISRA-C:2004 12.4 or MISRA-C:2012 12.4. Possibly, the tool is displaying the wrong warning.
There is, however, a MISRA-C:2012 rule 10.3 that forbids assigning a value to a variable of a narrower essential type than that used in the expression.
To use MISRA terms, the essential type of ~0xFFFF is unsigned, because the hex literal is of type unsigned int. On your system, unsigned int is apparently larger than uint16_t (int is a "greater ranked" integer type than short in the standard 6.3.1.1, even if they are of the same size). That is, uint16_t is of a narrower essential type than unsigned int, so your code does not conform to rule 10.3. This is what your tool should have reported.
The actual technical issue, hidden behind the MISRA terms, is that the ~ operator is dangerous because it comes with an implicit integer promotion. This in turn causes code like, for example,
uint8_t x = 0xFF;
~x << n; // BAD, always a bug
to invoke undefined behavior: ~x is promoted to int with the value 0xFFFFFF00, which is negative, and left-shifting a negative value is undefined.
It is therefore always good practice to cast the result of the ~ operator to the correct, intended type. There was even an explicit rule about this in MISRA 2004, which has now merged into the "essential type" rules.
In addition, MISRA rule 7.2 states that all integer constants of unsigned type shall have a u or U suffix.
MISRA-C:2012 compliant code would look like this:
uint16_t var;
var = (uint16_t)~0xFFFFu;
or overly pedantic:
var = (uint16_t)~(uint16_t)0xFFFFu;

When the compiler looks at the right-hand side, it first sees the literal 0xFFFF. That literal has type int, which (as the warning makes obvious) is 32-bit on your system, so we can picture the value as 0x0000FFFF across the whole 32 bits. When the compiler applies the ~ operation to it, it becomes 0xFFFF0000 (still 32-bit). So when you write var = ~0xFFFF;, the compiler in fact sees var = 0xFFFF0000; just before the assignment, and of course a truncation happens when that 32-bit value is stored in the 16-bit variable.
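For illustration, here is a minimal compilable sketch of that sequence (assuming 32-bit int; the commented values are what such a system prints):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t var;

    /* 0xFFFF has type int (32-bit assumed), so ~0xFFFF is 0xFFFF0000 */
    printf("sizeof(0xFFFF) = %zu\n", sizeof(0xFFFF));  /* 4 */
    printf("~0xFFFF = 0x%X\n", (unsigned int)~0xFFFF); /* 0xFFFF0000 */

    var = (uint16_t)~0xFFFFu; /* the explicit cast makes the truncation intentional */
    printf("var = 0x%X\n", (unsigned int)var);         /* 0x0 */
    return 0;
}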

Related

MISRA: Compound Assignment operators

I have taken over code that has a lot of compound assignment operators in it.
I think that compound operators are not 'really' MISRA compliant.
I can't seem to find any reference to them.
I believe I understand what is happening and that it should actually be separated.
UINT16 A, B;
A |= B;                                /* Incorrect */
A = (UINT16)((UINT32)A | (UINT32)B);   /* Correct */
A <<= 1u;                              /* Incorrect */
A = (UINT16)((UINT32)A << (UINT32)1u); /* Correct */
So, my questions are:
Does MISRA frown upon compound assignments?
Is there any kind of quick fix, instead of ignoring the warning?
Thank you,
Does MISRA frown upon compound assignments?
Not as such. The rules for compound assignment are similar to the rules for simple assignment. MISRA speaks of assignment operators in general terms, including all of them.
MISRA does however frown upon implicit type promotions; see Implicit type promotion rules. You can't understand these MISRA warnings unless you understand implicit promotions.
Is there any kind of quick fix, instead of ignoring the warning?
You can't really fix this without understanding the warning. The only quick fix is to use uint32_t everywhere and never signed or small integer types, but that isn't always feasible. Your original code would have been compliant if all variables were uint32_t, as sketched below.
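For instance (a minimal sketch, assuming a 32-bit platform where unsigned int and uint32_t have the same width; names and values are illustrative only):

#include <stdint.h>

static uint32_t A = 0x8000u;
static uint32_t B = 0x0001u;

void update(void)
{
    /* neither operand is promoted to a signed or wider type here,
       so the compound assignments raise no conversion issues */
    A |= B;
    A <<= 1u;
}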
In general, the latest MISRA only allows various conversions between types of the same essential type category. Unsigned integers are one such category, signed integers another, and so on.
It isn't easy to tell how exactly your code violates MISRA without knowing the type of B and sizeof(int). This has nothing to do with compound assignment as such, except that compound assignment operators are a bit cumbersome to work with when it comes to implicit promotions.
MISRA frowns upon assigning the value of an expression (after implicit promotion) to a narrower essential type, or to a different essential type category. In plain English, you shouldn't, for example, assign the result of a uint32_t operation to a uint16_t variable, or a signed result to an unsigned variable. This is most often fixed with casting at appropriate places.
Regarding your specific examples, assuming B is uint16_t and the CPU is 32-bit, you get problems with implicit type promotion.
Since A |= B is equivalent to A = A | B, the usual arithmetic conversions promote both operands to int on a 32-bit CPU, so the operation is carried out in a signed 32-bit type.
Assume you had A << 31u instead: that would actually have invoked undefined behavior (the promoted operand is a signed int, and the shift reaches its sign bit), which is exactly what the rule seeks to protect against.
Sufficient fixes for MISRA-C compliance:
A = (uint16_t)(A | B); // compliant
A = (uint16_t)((uint32_t)A << 1u); // compliant
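Put together as a small compilable example (variable values are illustrative only):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t A = 0x8000u;
    uint16_t B = 0x0001u;

    /* A | B is computed in (signed) int on a 32-bit CPU; the cast
       makes the narrowing back to uint16_t explicit */
    A = (uint16_t)(A | B);             /* A == 0x8001 */

    /* widen explicitly to an unsigned type before shifting, so the
       shift can never reach the sign bit of int */
    A = (uint16_t)((uint32_t)A << 1u); /* A == 0x0002 */

    printf("A = 0x%04X\n", (unsigned int)A);
    return 0;
}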

Defending "U" suffix after Hex literals

There is some debate between my colleague and me about the U suffix after hexadecimally represented literals. Note: this is not a question about the meaning of this suffix or about what it does. I have found several of those topics here, but I have not found an answer to my question.
Some background information:
We're trying to come to a set of rules that we both agree on, to use as our style from that point on. We have a copy of the 2004 MISRA C rules and decided to use those as a starting point. We're not interested in being fully MISRA C compliant; we're cherry-picking the rules that we think will most increase efficiency and robustness.
Rule 10.6 from the aforementioned guidelines states:
A ā€œUā€ suffix shall be applied to all constants of unsigned type.
I personally think this is a good rule. It takes little effort, looks better than explicit casts, and shows the intention of a constant more explicitly. To me it makes sense to use it for all unsigned constants, not just the decimal ones, since a rule isn't enforced by allowing exceptions, especially not for a commonly used representation of constants.
My colleague, however, feels that the hexadecimal representation doesn't need the suffix. Mostly because we almost exclusively use it to set micro-controller registers, and signedness doesn't matter when setting registers to hex constants.
My Question
My question is not one about who is right or wrong. It is about finding out whether there are cases where the absence or presence of the suffix changes the outcome of an operation. Are there any such cases, or is it a matter of consistency?
Edit: for clarification, this is specifically about setting micro-controller registers by assigning hexadecimal values to them. Would there be a case where the suffix could make a difference there? I feel like it wouldn't. As an example, the Freescale Processor Expert generates all register assignments as unsigned.
Appending a U suffix to all hexadecimal constants makes them unsigned as you already mentioned. This may have undesirable side-effects when these constants are used in operations along with signed values, especially comparisons.
Here is a pathological example:
#define MY_INT_MAX 0x7FFFFFFFU // blindly applying the rule

if (-1 < MY_INT_MAX) {
    printf("OK\n");
} else {
    printf("OOPS!\n");
}
The C rules for signed/unsigned conversions are precisely specified, but somewhat counter-intuitive, so the above code will indeed print OOPS: the comparison is performed in unsigned int, and -1 converts to 0xFFFFFFFF, which is larger than MY_INT_MAX.
The MISRA-C rule is precise, as it states: A "U" suffix shall be applied to all constants of unsigned type. The word unsigned has far-reaching consequences, and indeed most constants should not really be considered unsigned.
Furthermore, the C Standard makes a subtle difference between decimal and hexadecimal constants:
A hexadecimal constant is considered unsigned if its value can be represented by the unsigned integer type and not the signed integer type of the same size for types int and larger.
This means that on 32-bit 2's complement systems, 2147483648 is a long or a long long, whereas 0x80000000 is an unsigned int. Appending a U suffix may make this more explicit in this case, but the real precaution to avoid potential problems is to have the compiler reject signed/unsigned comparisons altogether: gcc -Wall -Wextra -Werror or clang -Weverything -Werror are life savers.
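You can see these types directly with a small C11 sketch using _Generic (assuming 32-bit int; on a 64-bit long platform 2147483648 is a long, on a 32-bit long platform a long long):

#include <stdio.h>

/* map a constant to the name of the type it actually gets */
#define TYPE_NAME(x) _Generic((x),            \
    int: "int",                               \
    unsigned int: "unsigned int",             \
    long: "long",                             \
    unsigned long: "unsigned long",           \
    long long: "long long",                   \
    unsigned long long: "unsigned long long", \
    default: "other")

int main(void)
{
    printf("2147483648  -> %s\n", TYPE_NAME(2147483648));  /* long or long long */
    printf("0x80000000  -> %s\n", TYPE_NAME(0x80000000));  /* unsigned int */
    printf("0x80000000u -> %s\n", TYPE_NAME(0x80000000u)); /* unsigned int */
    return 0;
}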
Here is how bad it can get:
if (-1 < 0x8000) {
    printf("OK\n");
} else {
    printf("OOPS!\n");
}
The above code should print OK on 32-bit systems and OOPS on 16-bit systems (where 0x8000 does not fit in int and therefore has type unsigned int, so -1 is converted to 0xFFFF). To make things even worse, it is still quite common to see embedded projects use obsolete compilers which do not even implement the Standard semantics for this issue.
For your specific question: the defined values for micro-controller registers that are used purely to set them via assignment (assuming these registers are memory-mapped) need not have the U suffix at all. The register lvalue should have an unsigned type, and the hex value will be signed or unsigned depending on its value, but the operation will proceed the same. The opcode for storing a signed number or an unsigned number is the same on your target architecture and on any architecture I have ever seen.
With all integer constants:
Appending u/U ensures the integer constant will have some unsigned type.
Without a u/U:
For a decimal constant, the integer constant will have some signed type.
For a hexadecimal/octal constant, the integer constant will have a signed or unsigned type, depending on its value and the integer type ranges.
Note: all integer constants have positive values.
// +-------- unary operator
// |+-+----- integer-constant
int x = -123;
Does the absence or presence of the suffix change the outcome of an operation?
When is this important?
In various expressions, the signedness and width of the math need to be controlled, and preferably not be surprising.
// Examples: assume 32-bit `unsigned`, `long`, 64-bit `long long`

// Bad: signed int overflow (UB)
unsigned a = 4000 * 1000 * 1000;
// OK
unsigned b = 4000u * 1000 * 1000;

// undefined behavior
unsigned c = 1 << 31;
// OK
unsigned d = 1u << 31;

// (decimal constants never become unsigned without the suffix)
printf("Size %zu\n", sizeof(4294967295));  // 8, type is `long long`
printf("Size %zu\n", sizeof(4294967295u)); // 4, type is `unsigned`

// 2 ** 63
long long e = -9223372036854775808;     // C99: bad, "9223372036854775808" not representable
long long f = -9223372036854775807 - 1; // OK
long long g = -9223372036854775808u;    // implementation-defined behavior **

some_unsigned_type h_max = -1;  // OK, max value for the target type
some_unsigned_type i_max = -1u; // OK, but not the max value for wide unsigned types

// when negating a negative `int`
unsigned j = 0 - INT_MIN;  // typically int overflow, i.e. UB
unsigned k = 0u - INT_MIN; // never UB

** or an implementation-defined signal is raised.
For the specific question, which was loading register(s): the U makes it an unsigned value, but whether the compiler treats the n-bit word pattern as a signed or unsigned value, it will move the same bit pattern, assuming there isn't any size extension that would propagate an MSB.

The difference that might matter is whether the register load operation will set any processor condition flags based on a signed or unsigned loading. As an overall guide, if the processor supports storing a constant to a configuration register or a memory address, then loading a peripheral register is unlikely to set the processor's NEG condition flag. Loading a general-purpose register connected to an ALU, one that can be the target of an arithmetic operation like add, increment or decrement, might set a negative flag on loading, so that e.g. a trailing "branch (if) negative" opcode would execute the branch. You would want to check the processor's references to be sure. Small instruction-set processors tend to have only a load-register instruction, while larger instruction sets are more likely to have a load-unsigned variant that doesn't set the NEG bit in the processor's flags; again, check the processor's references, and the processor's errata (the boo-boo list) if you need a specific flag state.

All of this only tends to come up when an optimizing compiler rearranges code around an inline assembly instruction, and in other uncommon situations. Examine the generated assembly code, and turn off some or all compiler optimizations for the module when needed.

ANSI C All-Bits-Set Type-Promotion Safe Literal with -Wsign-conversion

Can I set all bits in an unsigned variable of any width to 1s without triggering a sign conversion error (-Wsign-conversion) using the same literal?
Without -Wsign-conversion I could:
#define ALL_BITS_SET (-1)
uint32_t mask_32 = ALL_BITS_SET;
uint64_t mask_64 = ALL_BITS_SET;
uintptr_t mask_ptr = ALL_BITS_SET << 12; // here's the narrow problem!
But with -Wsign-conversion I'm stumped.
error: negative integer implicitly converted to unsigned type [-Werror=sign-conversion]
I've tried (~0) and (~0U) but no dice. The first has type int, which triggers -Wsign-conversion, and the second doesn't promote past 32 bits and only sets the lower 32 bits of the 64-bit variable.
Am I out of luck?
EDIT: Just to clarify, I'm using the defined ALL_BITS_SET in many places throughout the project, so I hesitate to litter the source with things like (~(uint32_t)0) and (~(uintptr_t)0).
One's complement changes all zeros to ones, and vice versa,
so try
#define ALL_BITS_SET (~(0))
uint32_t mask_32 = ALL_BITS_SET;
uint64_t mask_64 = ALL_BITS_SET;
Try
uint32_t mask_32 = ~((uint32_t)0);
uint64_t mask_64 = ~((uint64_t)0);
uintptr_t mask_ptr = ~((uintptr_t)0);
Maybe clearer solutions exist; this one is a bit pedantic, but I'm confident it meets your needs.
The reason you're getting the warning "negative integer implicitly converted to unsigned type" is that 0 is a literal integer value. As a literal integer value, it is of type int, which is a signed type, so (~(0)), as an all-bits-one value of type int, has the value (int)-1. The only way to convert a negative value to an unsigned value non-implicitly is, of course, to do it explicitly, but you appear to have already rejected the suggestion of using a type-appropriate cast. Alternative options:
Obviously, you can also eliminate the implicit conversion to unsigned type by complementing an unsigned 0... (~(0U)), but then you'd only have as many bits as are in an unsigned int.
Write a slightly different macro, and use the macro to declare your variables
#define ALL_BITS_VAR(type,name) type name = ~(type)0
ALL_BITS_VAR(uint64_t, mask_64);
But that still only works for declarations.
Someone already suggested defining ALL_BITS_SET using the widest available type, which you rejected on the grounds of having an absurdly strict dev environment, but honestly, that's by far the best way to do it. If your development environment is really so strict as to forbid assignment of an unsigned value to an unsigned variable of a smaller type (the result of which is very clearly defined and perfectly valid), then you really don't have a choice anymore and have to do something type-specific:
#define ALL_BITS_SET(type) (~(type)0)
uint32_t mask_32 = ALL_BITS_SET(uint32_t);
uint64_t mask_64 = ALL_BITS_SET(uint64_t);
uintptr_t mask_ptr = ALL_BITS_SET(uintptr_t) << 12;
That's all.
(Actually, that's not quite all... since you said that you're using GCC, there's some stuff you could do with GCC's typeof extension, but I still don't see how to make it work without a function macro that you pass a variable to.)
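As a concrete sketch of that typeof idea (SET_ALL_BITS is a hypothetical name; typeof is a GCC extension, only standardized in C23), a function-like macro that you pass the variable to:

#include <stdint.h>
#include <stdio.h>

#define SET_ALL_BITS(var) ((var) = (typeof(var))~(typeof(var))0)

int main(void)
{
    uint32_t mask_32;
    uint64_t mask_64;

    SET_ALL_BITS(mask_32); /* 0xFFFFFFFF */
    SET_ALL_BITS(mask_64); /* 0xFFFFFFFFFFFFFFFF */

    printf("%lX %llX\n", (unsigned long)mask_32, (unsigned long long)mask_64);
    return 0;
}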

will ~ operator change the data type?

When I read someone's code, I found that they bothered to write an explicit type cast:
#define ULONG_MAX ((unsigned long int) ~(unsigned long int) 0)
When I write code
#include <stdio.h>
int main(void)
{
    unsigned long int max;
    max = ~(unsigned long int)0;
    printf("%lx", max);
    return 0;
}
it works as well. Is it just a meaningless coding style?
The code you read is very bad, for several reasons.
First of all, user code should never define ULONG_MAX. This is a reserved identifier and must be provided by the compiler implementation.
That definition is not suitable for use in a preprocessor #if. The _MAX macros for the basic integer types must be usable there.
(unsigned long)0 is just crap. Everybody should just use 0UL, unless you know that you have a compiler that is not compliant with the recent C standards in that respect. (I don't know of any.)
Even ~0UL should not be used for that value, since unsigned long may (theoretically) have padding bits. -1UL is more appropriate, because it doesn't deal with the bit pattern of the value; it uses the guaranteed arithmetic properties of unsigned integer types: -1 converted to an unsigned type is always the maximum value of that type. So ~ may only be used in a context where you are absolutely certain that unsigned long has no padding bits, and as such using it makes no sense; -1 serves better.
"recasting" an expression that is known to be unsigned long is just superfluous, as you observed. I can't imagine any compiler that bugs on that.
Recasting expressions may make sense when they are used in the preprocessor, but only under very restricted circumstances, and they are interpreted differently there:
#if ((uintmax_t)-1UL) == SOMETHING
..
#endif
Here the value on the left evaluates to UINTMAX_MAX both in the preprocessor and in later compiler phases. So
#define UINTMAX_MAX ((uintmax_t)-1UL)
would be an appropriate definition for a compiler implementation.
To see the value for the preprocessor, observe that there (uintmax_t) is not a cast but an unknown identifier token inside (), and that it evaluates to 0. The minus sign is then interpreted as binary minus, so we have 0-1UL, which is unsigned and thus the maximum value of the type. But that trick only works if the cast contains a single identifier token (not three, as in your example) and if the integer constant has a - or + sign.
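A minimal sketch of that preprocessor view (assuming a 64-bit uintmax_t, so UINTMAX_MAX is 0xFFFFFFFFFFFFFFFF):

#include <stdio.h>

int main(void)
{
/* in an #if line, the token uintmax_t is an unknown identifier replaced
   by 0, and #if arithmetic is done in (u)intmax_t, so (0)-1UL below
   evaluates to UINTMAX_MAX */
#if ((uintmax_t)-1UL) == 0xFFFFFFFFFFFFFFFFu
    puts("the preprocessor evaluated (uintmax_t)-1UL as UINTMAX_MAX");
#else
    puts("uintmax_t is wider than 64 bits here");
#endif
    return 0;
}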
They are trying to ensure that the type of the value 0 is unsigned long. When you assign zero to a variable, it gets converted to the appropriate type.
In this case, if 0 didn't happen to be an unsigned long, then the ~ operator would be applied to whatever other type it happened to be, and the result of that would be converted.
This would be a problem if the compiler decided that 0 were a short or a char (in fact, an unsuffixed 0 always has type int).
However, once the operand is unsigned long, the ~ operator does not change the type. So they are being overly cautious with the outer cast, but perhaps the inner cast is justified.
They could of course have specified the correct zero type to begin with by writing ~0UL.
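A quick sketch confirming the equivalences discussed above (on implementations without padding bits in unsigned long, all three expressions have the same value):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* -1 converted to an unsigned type is always that type's maximum */
    printf("%d\n", -1UL == ULONG_MAX);                 /* 1 */
    printf("%d\n", ~0UL == ULONG_MAX);                 /* 1, given no padding bits */
    printf("%d\n", (unsigned long)0 - 1 == ULONG_MAX); /* 1 */
    return 0;
}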

doubt regarding operations on "int" flavors

I have the following doubt regarding "int" flavors (unsigned int, long int, long long int).
When we do some operations (*, /, +, -) between an int and one of its flavors (let's say long int), on a 32-bit system or a 64-bit system, does an implicit typecast happen for the "int"?
For example:
int x;
long long int y = 2000;
x = y; /* the wider value is assigned to the narrower one; data truncation may happen */
I expected the compiler to give a warning for this, but I am not getting any such warning.
Is this due to an implicit typecast happening for "x" here?
I am using gcc with the -Wall option. Will the behavior change between 32-bit and 64-bit?
Thanks,
Arpit
-Wall does not activate all possible warnings; -Wextra enables further warnings. Anyway, what you are doing is a perfectly "legal" operation, and since the compiler can't always know at compile time the value of the datum that could be "truncated", it is OK that it does not warn: the programmer should already be aware that a "large" integer may not fit into a "small" one, so it is usually up to the programmer. If you think your program was written without awareness of this, add -Wconversion.
Conversion without an explicit cast operator is perfectly legal in C, but the result may be implementation-defined. In your case, int x; is signed, so if you try to store a value in it that's outside the range of int, the result is implementation-defined (or an implementation-defined signal is raised). On the other hand, if x were declared as unsigned x;, the behavior would be fully defined: the conversion is reduction modulo UINT_MAX+1.
As for arithmetic, when you perform arithmetic between integers of different types, the 'smaller' type is converted to the 'larger' type prior to the arithmetic. The compiler is free to optimize out this conversion, of course, if it does not affect the result, which leads to idioms like casting a 32-bit integer to 64 bits before multiplying to get a full 64-bit result. Promotion gets a bit confusing and can have unexpected results when signed and unsigned values are mixed; you should look it up if you care to know, since it's hard to explain informally.
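A small sketch of that widening idiom (values chosen so that the 32-bit product wraps):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t a = 0xFFFFFFFFu;
    uint32_t b = 2u;

    uint64_t wrong = a * b;           /* multiplied in 32 bits: wraps, then widens */
    uint64_t right = (uint64_t)a * b; /* widened first: full 64-bit product */

    printf("wrong = 0x%llX\n", (unsigned long long)wrong); /* 0xFFFFFFFE */
    printf("right = 0x%llX\n", (unsigned long long)right); /* 0x1FFFFFFFE */
    return 0;
}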
If you are worried, you can include <stdint.h> and use types with defined lengths, such as uint16_t for a 16-bit unsigned integer.
Your code is perfectly valid (as already said by others). If you want to program in a portable way, in most cases you should not use the bare C types int, long or unsigned int, but types that tell a bit more about what you are planning to do with them.
E.g. for indices of arrays, always use size_t. Regardless of whether you are on a 32-bit or a 64-bit system, this will be the right type. Or if you want the integer of maximal width on whatever platform you happen to land on, use intmax_t or uintmax_t.
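For instance (a minimal sketch of both suggestions):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int data[] = { 1, 2, 3, 4 };

    /* size_t is the right type for array indices on any platform */
    for (size_t i = 0; i < sizeof data / sizeof data[0]; i++) {
        printf("%d ", data[i]);
    }
    putchar('\n');

    /* uintmax_t is the widest unsigned type the implementation offers */
    uintmax_t big = UINTMAX_MAX;
    printf("%ju\n", big);
    return 0;
}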
See http://gcc.gnu.org/ml/gcc-help/2003-06/msg00086.html -- the code is perfectly valid C/C++.
You might want to look at static analysis tools (sparse, llvm, etc.) to check for this type of truncation.
