When declaring an integer constant of a type defined in stdint.h, such as uint64_t, or using one in an arithmetic expression, one can either cast the constant to the desired type, as in (uint64_t)x, or use one of the macros for integer constant expressions, such as UINT64_C(x) (where x is an integer constant expression).
I'm more inclined to use the macro; however, I'm wondering in which cases the two approaches are equivalent, in which they differ, and what could go wrong. More precisely: is there a case where using one would lead to a bug, but not the other?
Thanks!
More precisely: is there a case where using one would lead to a bug, but not with the other?
Yes, there are such cases, though they are rather contrived. Unary operators such as cast operators have high precedence, but all the postfix operators have higher. Of those, the indexing operator, [], can be applied to an integer constant when the expression inside is a pointer to a complete type. Thus, given this declaration in scope:
int a[4] = { 1, 2, 3, 4 };
... the expression (uint64_t) 1[a] evaluates to a uint64_t with value 2, whereas the expression UINT64_C(1)[a] evaluates to an int with value 2. The type difference can cause different behavior to manifest. That can arise from different implicit conversion behavior, which is generally a subtle effect, or, if these are used as the controlling expression of a generic selection, the overall expression can evaluate to wildly different things depending on which variation you use.
However, I think there is no practical difference if you put the cast expression in parentheses: ((uint64_t) 1).
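A minimal sketch that makes the type difference visible with a generic selection (this assumes a C11 compiler and that UINT64_C expands to a suffixed literal such as 1ULL, which is typical):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int a[4] = { 1, 2, 3, 4 };

    /* Postfix [] binds tighter than the cast, so this is (uint64_t)(1[a]):
       value 2, type uint64_t. */
    printf("%s\n", _Generic((uint64_t) 1[a],
                            uint64_t: "uint64_t", default: "something else"));

    /* The macro expands to a suffixed constant before [] applies, so the
       indexing still yields an int: value 2, type int. */
    printf("%s\n", _Generic(UINT64_C(1)[a],
                            uint64_t: "uint64_t", default: "something else"));

    /* Parenthesizing the cast makes it behave like the macro here: the
       indexing applies to the whole constant, so this is an int again. */
    printf("%s\n", _Generic(((uint64_t) 1)[a],
                            uint64_t: "uint64_t", default: "something else"));
    return 0;
}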
An esoteric but possible case: if the system has no exact 64-bit type (e.g. 32-bit int, 128-bit long), then (uint64_t)1 will fail to compile, whereas UINT64_C(1) will still compile and give something of type uint_least64_t, the smallest unsigned integer type with at least 64 bits (which in this scenario is wider than 64 bits).
The macro forms must expand to something usable in #if preprocessing directives, so they expand to (possibly suffixed) constants rather than casts. But I can't think of any other situation where a conforming program would behave differently (other than the syntax precedence issue, of course).
If the program is non-conforming then there are various possibilities, e.g. UINT64_C(-1) is undefined behaviour (no diagnostic required), as is UINT8_C(256). The macro argument must be an unsuffixed integer constant that is in range for the target type.
Keep it simple and make everything clear and obvious to the reader. I.e. avoid the preprocessor as much as possible, and only introduce a cast where absolutely necessary.
I'll quote from N1570, but the C11 standard has similar wording:
The fpclassify macro classifies its argument value as NaN, infinite, normal,
subnormal, zero, or into another implementation-defined category. First, an argument
represented in a format wider than its semantic type is converted to its semantic type.
Then classification is based on the type of the argument.
(my emphasis)
And a footnote:
Since an expression can be evaluated with more range and precision than its type has, it is important to
know the type that classification is based on. For example, a normal long double value might
become subnormal when converted to double, and zero when converted to float.
What does it mean for the argument to be "converted to its semantic type"? There is no definition of "semantic type" evident anywhere.
My understanding is that any excess precision is removed, as if the expression's value were stored in a variable of type float, double or long double, resulting in a value of the precision the programmer expected. In that case, using fpclassify() and friends on an lvalue would require no conversion, at least for a non-optimising compiler. Am I correct, or are these functions much less useful than advertised?
(This question arises from comments to a Code Review answer)
The semantic type is simply the type of the expression as described elsewhere in the C standard, disregarding the fact that the value is permitted to be represented with excess precision and range. Equivalently, the semantic type is the type of the expression if clause 5.2.4.2.2 paragraph 9 (which says that floating-point values may be evaluated with excess range and precision) were not in the standard.
Converting an argument to its semantic type means discarding the excess precision and range (by rounding the value to the semantic type using whatever rounding rule is in effect for the operation).
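A sketch of why the conversion matters (assumes an implementation where FLT_EVAL_METHOD is 2, e.g. classic x87, so that double expressions may carry long double range):

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    volatile double d = DBL_MAX;

    /* d * d may be evaluated with long double range, where it is a finite,
       normal value. fpclassify first converts the argument to its semantic
       type (double), so a conforming implementation reports FP_INFINITE
       rather than FP_NORMAL. */
    printf("%s\n", fpclassify(d * d) == FP_INFINITE ? "infinite" : "finite");
    return 0;
}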
Regarding your hypothesis that applying fpclassify to an lvalue does not require any conversion (because the value stored in an object designated by an lvalue must have already been converted to its semantic type when it was assigned), I am not sure that holds formally. Certainly when the object’s value is updated by assignment, 5.2.4.2.2 9 requires that excess range and precision be removed. But consider alternate ways of modifying the value, such as the postfix increment operator. Does that count as an assignment? Its specification in 6.5.2.4 2 says to see the discussion of compound assignment for information on its conversions and effects. That is a bit vague. One would have to consider all possible ways of modifying an object and evaluate what the C standard says about them.
In the following C snippet that checks if the first two bits of a 16-bit sequence are set:
#include <stdbool.h>

bool is_pointer(unsigned short int sequence) {
    return (sequence >> 14) == 3;
}
CLion's Clang-Tidy is giving me a "Use of a signed integer operand with a binary bitwise operator" warning, and I can't understand why. Is unsigned short not unsigned enough?
The code for this warning checks if either operand to the bitwise operator is signed. It is not sequence causing the warning, but 14, and you can alleviate the problem by making 14 unsigned by appending a u to the end.
(sequence >> 14u)
This warning is bad. As Roland's answer describes, CLion is fixing this.
There is a check in clang-tidy that is called hicpp-signed-bitwise. This check follows the wording of the HIC++ standard. That standard is freely available and says:
5.6.1. Do not use bitwise operators with signed operands
Use of signed operands with bitwise operators is in some cases subject to undefined or implementation defined behavior. Therefore, bitwise operators should only be used with operands of unsigned integral types.
The authors of the HIC++ coding standard misinterpreted the intention of the C and C++ standards and either accidentally or intentionally focused on the type of the operands instead of the value of the operands.
The check in clang-tidy implements exactly this wording, in order to conform to that standard. That check is not intended to be generally useful; its only purpose is to help the poor souls whose programs have to conform to that one stupid rule from the HIC++ standard.
The crucial point is that by definition integer literals without any suffix are of type int, and that type is defined as being a signed type. HIC++ now wrongly concludes that positive integer literals might be negative and thus could invoke undefined behavior.
For comparison, the C11 standard says:
6.5.7 Bitwise shift operators
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
This wording is carefully chosen and emphasises that the value of the right operand is important, not its type. It also covers the case of a too large value, while the HIC++ standard simply forgot that case. Therefore, saying 1u << 1000u is ok in HIC++, while 1 << 3 isn't.
The best strategy is to explicitly disable this single check. There are several bug reports for CLion mentioning this, and it is getting fixed there.
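For example, a NOLINT comment naming the check (standard clang-tidy suppression syntax) silences it at the use site while leaving the rest of the profile active:

#include <stdbool.h>

bool is_pointer(unsigned short int sequence) {
    /* Naming the check restricts the suppression to hicpp-signed-bitwise;
       all other clang-tidy diagnostics still apply to this line. */
    return (sequence >> 14) == 3;  // NOLINT(hicpp-signed-bitwise)
}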
Update 2019-12-16: I asked Perforce what the motivation behind this exact wording was and whether the wording was intentional. Here is their response:
Our C++ team who were involved in creating the HIC++ standard have taken a look at the Stack Overflow question you mentioned.
In short, referring to the object type in the HIC++ rule instead of the value is an intentional choice to allow easier automated checking of the code. The type of an object is always known, while the value is not.
HIC++ rules in general aim to be "decidable". Enforcing against the type ensures that a decidable check is always possible, ie. directly where the operator is used or where a signed type is converted to unsigned.
The rationale explicitly refers to "possible" undefined behavior, therefore a sensible implementation can exclude:
constants unless there is definitely an issue and,
unsigned types that are promoted to signed types.
The best operation is therefore for CLion to limit the checking to non-constant types before promotion.
I think the integer promotion causes the warning here. Operands narrower than int are widened to int for the arithmetic expression, and int is signed. So your code is effectively return ((int)sequence >> 14) == 3;, which leads to the warning. Try return ((unsigned)sequence >> 14) == 3; or return (sequence & 0xC000) == 0xC000;.
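A compilable version of those suggestions (it assumes the intent is to test the two most significant bits; whether either form fully satisfies hicpp-signed-bitwise depends on how the check treats promoted operands):

#include <stdbool.h>

/* Shift variant: converting to unsigned before the shift keeps the left
   operand unsigned after promotion, and 14u keeps the right operand unsigned. */
bool is_pointer_shift(unsigned short int sequence) {
    return ((unsigned)sequence >> 14u) == 3u;
}

/* Mask variant: no shift at all, and the u suffix keeps the literal unsigned. */
bool is_pointer_mask(unsigned short int sequence) {
    return (sequence & 0xC000u) == 0xC000u;
}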
How do I implement basic type inference? Nothing fancy, just inferring whether a given value is an integer, double, or float. For instance, if I had a token for each type (WHOLE_NUMBER, FLOAT_NUMBER, DOUBLE_NUMBER) and an expression like 4f + 2 + 5f, how would I deduce what type that is? My current idea was to just use the first type as the inferred type, so that would be a float. However, this doesn't work in most cases. What would I have to do?
My current idea was to just use the first type as the inferred type
No. Usually, the expression's type is that of its "widest" term. If it contains a double, then it's a double. If not but contains a float, then it's a float. If it has only integers then it is integer...
This applies to each parenthesized sub-expression.
Unless you make an explicit cast.
In your example above, there are 2 floats and an int, so it is a float. The compiler should warn you though, as any implicit conversion it has to make may result in a loss of data.
The way I would do it would be to cast into the most "accurate" or specific type. For example, if you add a bunch of integers together, the result can always be represented by an integer. The moment a floating-point value is included in the expression, the result must be a float, as the result of the calculation might be fractional due to the floating-point term in the addition.
Similarly, if there are any doubles in the expression, the answer must be a double, as down-casting to a float might result in loss of precision. So, the steps required to infer the type are:
Does the expression contain any doubles? If so, the result is a double - cast any integers or floats to double as appropriate. If not...
Does the expression contain any floats? If so, the result is a float - cast any integers to float as appropriate. If not...
The result is an integer, as the expression is entirely in terms of integers.
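A minimal sketch in C of the widest-type rule above, using hypothetical token names taken from the question (a real implementation would apply it to each parenthesized sub-expression):

#include <stddef.h>
#include <stdio.h>

/* Hypothetical token kinds, ordered so that a "wider" type always has a
   larger enumerator value. */
typedef enum { WHOLE_NUMBER, FLOAT_NUMBER, DOUBLE_NUMBER } NumType;

/* The inferred type of an expression is the widest type among its terms,
   not the type of its first term. */
static NumType infer_type(const NumType *terms, size_t count) {
    NumType widest = WHOLE_NUMBER;
    for (size_t i = 0; i < count; i++) {
        if (terms[i] > widest)
            widest = terms[i];
    }
    return widest;
}

int main(void) {
    /* 4f + 2 + 5f from the question: two floats and an integer -> float. */
    NumType expr[] = { FLOAT_NUMBER, WHOLE_NUMBER, FLOAT_NUMBER };
    printf("%d\n", infer_type(expr, 3) == FLOAT_NUMBER);  /* prints 1 */
    return 0;
}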
Different programming languages handle these sorts of situations differently, and it might be appropriate to add compiler warnings in situations where these automatic casts could cause a precision error. In general, make sure the behaviour of your compiler/interpreter is well-defined and predictable, such that any developer needing alternate behaviour can (and knows when to) use explicit casts if they need to preserve the accuracy of a calculation.
The following is an extract from the book MISRA C:2012 which I am unable to understand:
The value of a composite expression (+, -, *, /, &, |, ^, <<, >>, ?:) shall not be assigned to an object with a wider type.
Example:
u32a = u16a + u16b; // non-compliant with MISRA C guidelines
where u16a and u16b are of type uint16_t and u32a is of type uint32_t.
Will this cause any issue when the program runs? We might need to use u32a, since storing the result in u16a might cause integer overflow in this case.
This is because you never know how your compiler will deal with the (u16a + u16b) expression: it may compute and store the result in 16 bits before assigning it to the target variable (consider the 8- and 16-bit targets used in embedded systems).
You expect (wrongly) that no overflow is possible, while it absolutely is. Writing the assignment as MISRA recommends simply makes this overflow problem plainly visible.
Edited to include proper standard reference thanks to Carl Norum.
The C standard states
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
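A small sketch of what the MISRA rule guards against, using the names from the question (the printed values assume a host with 32-bit int; on a 16-bit-int target the non-compliant sum would wrap to 14464 before the assignment):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t u16a = 40000u, u16b = 40000u;

    /* Non-compliant: the addition is performed in whatever type the operands
       promote to, so the stored result depends on the width of int. */
    uint32_t u32a = u16a + u16b;

    /* Common compliant rewrite: widen one operand first, so the sum is
       computed in uint32_t on every target. */
    uint32_t u32b = (uint32_t)u16a + u16b;

    printf("%lu %lu\n", (unsigned long)u32a, (unsigned long)u32b);
    return 0;
}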
Are there compiler switches that guarantee certain behaviors on integer overflow? I'd like to avoid nasal demons. In particular, I'd like to force the compiler to wrap on overflow.
For the sake of uniqueness, let's take the standard to be C99 and the compiler to be gcc. But I would be interested in answers for other compilers (icc, cl) and other standards (C1x, C89). In fact, just to annoy the C/C++ crowd, I'd even appreciate answers for C++0x, C++03, and C++98.
Note: International standard ISO/IEC 10967-1 may be relevant here, but as far as I could tell it was mentioned only in the informative annex.
Take a look at -ftrapv and -fwrapv:
-ftrapv
This option generates traps for signed overflow on addition, subtraction, multiplication operations.
-fwrapv
This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation. This flag enables some optimizations and disables others. This option is enabled by default for the Java front-end, as required by the Java language specification.
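A tiny sketch of the difference these flags make (the overflow below is undefined behaviour in plain C; the command lines assume GCC):

#include <limits.h>
#include <stdio.h>

int main(void) {
    volatile int x = INT_MAX;  /* volatile keeps the overflow at run time */

    /* Compiled with gcc -fwrapv, this is defined to wrap to INT_MIN;
       with gcc -ftrapv it traps (aborts) at run time;
       with neither flag the behaviour is undefined. */
    printf("%d\n", x + 1);
    return 0;
}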
For your C99 answer, I think 6.5 Expressions, paragraph 5 is what you're looking for:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
That means if you get an overflow, you're out of luck - no behaviour of any kind guaranteed. Unsigned types are a special case, and never overflow (6.2.5 Types, paragraph 9):
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
C++ has the same statements, worded a bit differently:
5 Expressions, paragraph 4:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. —endnote]
3.9.1 Fundamental types, paragraph 4:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
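And a one-line illustration of the unsigned guarantee quoted above:

#include <limits.h>
#include <stdio.h>

int main(void) {
    unsigned int u = UINT_MAX;

    /* Well-defined: the result is reduced modulo UINT_MAX + 1, so this
       prints 0 on every conforming implementation. */
    printf("%u\n", u + 1u);
    return 0;
}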
In C99 the general behavior is described in 6.5/5:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
The behavior of unsigned types is described in 6.2.5/9, which basically states that operations on unsigned types never lead to an exceptional condition:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
The GCC compiler has a special option, -ftrapv, which is intended to catch run-time overflow of signed integer operations.
For completeness, I'd like to add that Clang now has "checked arithmetic builtins" as a language extension. Here is an example using checked unsigned multiplication:
unsigned x, y, result;
...
if (__builtin_umul_overflow(x, y, &result)) {
    /* overflow occurred */
    ...
}
...
http://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins
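A self-contained version of the snippet above (assumes a compiler that provides these builtins, such as recent Clang or GCC):

#include <stdio.h>

int main(void) {
    unsigned x = 4000000000u, y = 3u, result;

    /* Returns nonzero if x * y does not fit in an unsigned int; the wrapped
       (modulo 2^N) product is stored in result either way. */
    if (__builtin_umul_overflow(x, y, &result)) {
        printf("overflow occurred, wrapped result = %u\n", result);
    } else {
        printf("no overflow, result = %u\n", result);
    }
    return 0;
}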
6.2.5 paragraph 9 is what you're looking for:
The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same. A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
The previous postings all commented on the C99 standard, but in fact this guarantee was already available earlier.
The 5th paragraph of Section 6.1.2.5 Types of the C89 standard states:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
Note that this allows C programmers to replace exact unsigned divisions by an odd constant with a multiplication by that constant's inverse element in the ring formed by C's modulo-2^N arithmetic.
And this can be done without any "correction", as would be necessary when approximating the division by a fixed-point multiplication with the reciprocal value.
Instead, the extended Euclidean algorithm can be used to find the inverse element and use it as the multiplier. (Of course, for the sake of staying portable, bitwise AND operations should also be applied in order to ensure the results have the same bit widths.)
It may be worthwhile to comment that most C compilers already implement this as an optimization. However, such optimizations are not guaranteed, and therefore it might still be interesting for programmers to perform such optimizations manually in situations where speed matters, but the capabilities of the C optimizer are either unknown or particularly weak.
And as a final remark, the reason for why trying to do so at all: The machine-level instructions for multiplication are typically much faster than those for division, especially on high-performance CPUs.
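A sketch of the exact-division trick (this finds the inverse by Newton iteration instead of the extended Euclidean algorithm, which would work equally well; it applies only to odd divisors and to dividends that are exact multiples of the divisor):

#include <stdint.h>
#include <stdio.h>

/* Multiplicative inverse of an odd value modulo 2^32 via Newton iteration;
   each step doubles the number of correct low-order bits (3 -> 6 -> 12 ->
   24 -> 48), so four steps cover all 32 bits. */
static uint32_t inverse_mod_2_32(uint32_t d) {
    uint32_t x = d;              /* an odd d is its own inverse modulo 8 */
    for (int i = 0; i < 4; i++)
        x *= 2u - d * x;
    return x;
}

int main(void) {
    const uint32_t d = 7u;                  /* odd divisor */
    const uint32_t dinv = inverse_mod_2_32(d);
    const uint32_t n = 7u * 123456u;        /* exact multiple of d */

    /* For exact divisions, n / d and n * dinv give the same result. */
    printf("%lu %lu\n", (unsigned long)(n / d), (unsigned long)(n * dinv));
    return 0;
}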
I'm not sure if there are any compiler switches you can use to enforce uniform behavior for overflows in C/C++. Another option is to use the SafeInt<T> template. It's a cross platform C++ template that provides definitive overflow / underflow checks for all types of integer operations.
http://safeint.codeplex.com/