There is a question already answering the particular case of variable declaration, but what about other literal constant uses?
For example:
uint64_t a;
...
int32_t b = a / 1000000000;
Is the last piece of code equivalent to the next one for any standard C compiler?
uint64_t a;
...
int32_t b = (int32_t)(a / UINT64_C(1000000000));
In other words, are the xINTn_C macros needed at all (assuming we use explicit casts in the cases where the implicit conversion would be wrong)?
EDIT
When the compiler reads 1000000000, is it allowed to store it as an int in its internal representation (dropping all overflowing bits), or must it keep it at the highest possible precision (long long) until it resolves the type of the whole expression? Is this implementation-defined behavior, or is it mandated by the standard?
Your second example isn't valid C99 and looks like C++. Perhaps what you want is a cast, i.e. (int32_t)(a / UINT64_C(1000000000))?
Is there a difference between a / UINT64_C(1000000000) and a / 1000000000? No, they'll end up with the same operation. But I don't think that's really your question.
I think your question boils down to what will the type of the integer literal "1000000000" be? Will it be an int32_t or an int64_t? The answer in C99 comes from §6.4.4.1 paragraph 5:
The type of an integer constant is the first of the corresponding list in which its value can be represented.
For decimal constants with no suffix, the list is int, long int, long long int. So the first literal will almost certainly be an int (depending on the size of an int, which will likely be 32 bits and therefore large enough to hold one billion). The second literal with the UINT64_C macro will likely be either an unsigned long or an unsigned long long, depending on the platform. It will be whatever type corresponds to uint64_t.
So the types of the constants are not the same. The first will be signed while the second is unsigned. And the second will most likely have more "longs", depending on the compiler's sizes of the basic int types.
In your example, it makes no difference that the literals have different types because the / operator will need to promote the literal to the type of a (because a will be of equal or greater rank than the literal in any case). Which is why I didn't think that was really your question.
For an example of why UINT64_C() would matter, consider an expression where the result changes if the literals are promoted to a larger type. I.e., overflow will occur in the literals' native types.
int32_t a = 10;
uint64_t b = 1000000000 * a; // overflows 32-bits
uint64_t c = UINT64_C(1000000000) * a; // constant is 64-bit, no overflow
To compute c, the compiler will need to promote a to uint64_t and perform a 64-bit multiplication. But to compute b the compiler will use 32-bit multiplication since both values are 32-bits.
In the last example, one could use a cast instead of the macro:
uint64_t c = (uint_least64_t)(1000000000) * a;
That would also force the multiplication to be at least 64 bits.
Why would you ever use the macro instead of casting a literal? One possibility is because decimal literals are signed. Suppose you want a constant that isn't representable as a signed value? For example:
uint64_t x = (uint64_t)9888777666555444333; // warning, literal is too large
uint64_t y = UINT64_C(9888777666555444333); // works
uint64_t z = (uint64_t)(9888777666555444333U); // also works
Another possibility is for preprocessor expressions. A cast isn't legal syntax for use in the expression of a #if directive. But the UINTxx_C() macros are.
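For instance, a sketch (UINT32_MAX also comes from <stdint.h>; the macro name HAVE_WIDE_CONSTANTS is only illustrative):
#include <stdint.h>

#if (UINT64_C(1) << 40) > UINT32_MAX   /* fine: the macro expands to a plain integer constant */
#define HAVE_WIDE_CONSTANTS 1
#endif

/* #if ((uint64_t)1 << 40) > UINT32_MAX */   /* would not compile: casts are not allowed in #if */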
Since the macros use suffixes pasted onto literals and there is no suffix for a short, one will likely find that UINT16_C(x) and UINT32_C(x) are identical. This gives the result that (uint_least16_t)(65537) != UINT16_C(65537). Not what one might expect. In fact, I have a hard time seeing how this complies with C99 §7.18.4.1:
The macro UINTN_C(value) shall expand to an integer constant expression corresponding to the type uint_leastN_t.
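To make that concrete, here is a sketch assuming a typical implementation where uint_least16_t is 16 bits wide and UINT16_C() pastes no suffix at all (glibc does this):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    printf("%lu\n", (unsigned long)UINT16_C(65537));        /* likely prints 65537: still an int */
    printf("%lu\n", (unsigned long)(uint_least16_t)65537);  /* prints 1: reduced modulo 65536 */
    return 0;
}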
Related
assuming two arbitrary timestamps:
uint32_t timestamp1;
uint32_t timestamp2;
Is there a standard-conforming way to get a signed difference of the two, besides the obvious variant of converting into a bigger signed type and the rather verbose if-else?
It is not known beforehand which one is larger, but it is known that the difference is at most 20 bits, so it will fit into a 32-bit signed type.
int32_t difference = (int32_t)( (int64_t)timestamp1 - (int64_t)timestamp2 );
This variant has the disadvantage that 64-bit arithmetic may not be supported by the hardware, and it is of course only possible if a larger type exists (what if the timestamps are already 64-bit?).
The other version
int32_t difference;
if (timestamp1 > timestamp2) {
    difference = (int32_t)(timestamp1 - timestamp2);
} else {
    difference = -((int32_t)(timestamp2 - timestamp1));
}
is quite verbose and involves conditional jumps.
So, what about simply
int32_t difference = (int32_t)(timestamp1 - timestamp2);
Is this guaranteed to work from a standards perspective?
You can use a union type pun based on
typedef union
{
    int32_t  _signed;
    uint32_t _unsigned;
} u;
Perform the calculation in unsigned arithmetic, assign the result to the _unsigned member, then read the _signed member of the union as the result:
u result = {._unsigned = timestamp1 - timestamp2};
result._signed; // yields the result
This is portable to any platform that implements the fixed width types upon which we are relying (they don't need to). 2's complement is guaranteed for the signed member and, at the "machine" level, 2's complement signed arithmetic is indistinguishable from unsigned arithmetic. There's no conversion or memcpy-type overhead here: a good compiler will compile out what's essentially standardese syntactic sugar.
(Note that this is undefined behaviour in C++.)
Bathsheba's answer is correct but for completeness here are two more ways (which happen to work in C++ as well):
uint32_t u_diff = timestamp1 - timestamp2;
int32_t difference;
memcpy(&difference, &u_diff, sizeof difference);
and
uint32_t u_diff = timestamp1 - timestamp2;
int32_t difference = *(int32_t *)&u_diff;
The latter is not a strict aliasing violation because that rule explicitly allows punning between signed and unsigned versions of an integer type.
The suggestion:
int32_t difference = (int32_t)(timestamp1 - timestamp2);
will work on any actual machine that exists and offers the int32_t type, but technically is not guaranteed by the standard (the result is implementation-defined).
The conversion of an unsigned integer value to a signed integer is implementation defined. This is spelled out in section 6.3.1.3 of the C standard regarding integer conversions:
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. 60)
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
On implementations people are most likely to use, the conversion will occur the way you expect, i.e. the representation of the unsigned value will be reinterpreted as a signed value.
Specifically GCC does the following:
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object
of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N
to be within range of the type; no signal is raised.
MSVC:
When a long integer is cast to a short, or a short is cast to a char,
the least-significant bytes are retained.
For example, this line
short x = (short)0x12345678L;
assigns the value 0x5678 to x, and this line
char y = (char)0x1234;
assigns the value 0x34 to y.
When signed variables are converted to unsigned and vice versa, the
bit patterns remain the same. For example, casting -2 (0xFE) to an
unsigned value yields 254 (also 0xFE).
So for these implementations, what you proposed will work.
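As a quick illustration of the behaviour those implementations document (this relies on the implementation-defined conversion, so it is a sketch, not a guarantee):
uint32_t t1 = 5, t2 = 10;
int32_t d = (int32_t)(t1 - t2);   /* t1 - t2 wraps to 0xFFFFFFFB */
/* On GCC and MSVC as quoted above, d == -5. */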
Rebranding Ian Abbott's macro-packaging of Bathsheba's answer as an answer:
#define UTOS32(a) ((union { uint32_t u; int32_t i; }){ .u = (a) }.i)
int32_t difference = UTOS32(timestamp1 - timestamp2);
Summarizing the discussions on why this is more portable than a simple typecast: The C standard (back to C99, at least) specifies the representation of int32_t (it must be two's complement), but not in all cases how it should be cast from uint32_t.
Finally, note that Ian's macro, Bathsheba's answer, and M.M's answers all also work in the more general case where the counters are allowed to wrap around 0, as is the case, for example, with TCP sequence numbers.
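For instance, with hypothetical values where the first counter has just wrapped past 0:
uint32_t timestamp1 = 5;            /* just wrapped around */
uint32_t timestamp2 = 0xFFFFFFF0u;  /* shortly before the wrap */
int32_t difference = UTOS32(timestamp1 - timestamp2);   /* unsigned subtraction wraps to 21 */
/* difference == 21, as desired */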
From an example:
unsigned long x = 12345678UL;
We have always learnt that the compiler needs to see only "long" in the above example to set aside 4 bytes (on a 32-bit system) of memory. The question is: why should we use L/UL in long constants even after declaring the variable to be a long?
When a suffix L or UL is not used, the compiler uses the first type that can contain the constant from a list (see details in the C99 standard, clause 6.4.4.1, paragraph 5; for a decimal constant, the list is int, long int, long long int).
As a consequence, most of the times, it is not necessary to use the suffix. It does not change the meaning of the program. It does not change the meaning of your example initialization of x for most architectures, although it would if you had chosen a number that could not be represented as a long long. See also codebauer's answer for an example where the U part of the suffix is necessary.
There are a couple of circumstances when the programmer may want to set the type of the constant explicitly. One example is when using a variadic function:
printf("%lld", 1LL); // correct, because 1LL has type long long
printf("%lld", 1); // undefined behavior, because 1 has type int
A common reason to use a suffix is ensuring that the result of a computation doesn't overflow. Two examples are:
long x = 10000L * 4096L;
unsigned long long y = 1ULL << 36;
In both examples, without suffixes, the constants would have type int and the computation would be made as int. In each example this incurs a risk of overflow. Using the suffixes means that the computation will be done in a larger type instead, which has sufficient range for the result.
As Lightness Races in Orbit puts it, the literal's suffix comes before the assignment. In the two examples above, simply declaring x as long and y as unsigned long long is not enough to prevent the overflow in the computation of the expressions assigned to them.
Another example is the comparison x < 12U where variable x has type int. Without the U suffix, the compiler types the constant 12 as an int, and the comparison is therefore a comparison of signed ints.
int x = -3;
printf("%d\n", x < 12); // prints 1 because it's true that -3 < 12
With the U suffix, the comparison becomes a comparison of unsigned ints. “Usual arithmetic conversions” mean that -3 is converted to a large unsigned int:
printf("%d\n", x < 12U); // prints 0 because (unsigned int)-3 is large
In fact, the type of a constant may even change the result of an arithmetic computation, again because of the way “usual arithmetic conversions” work.
Note that, for decimal constants, the list of types suggested by C99 does not contain unsigned long long. In C90, the list ended with the largest standardized unsigned integer type at the time (which was unsigned long). A consequence was that the meaning of some programs was changed by adding the standard type long long to C99: the same constant that was typed as unsigned long in C90 could now be typed as a signed long long instead. I believe this is the reason why in C99, it was decided not to have unsigned long long in the list of types for decimal constants.
See this and this blog posts for an example.
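A minimal sketch of the kind of change involved (assuming a 32-bit long; this is my own illustration, not taken from those posts):
/* 4294967294 fits in neither int nor long (both 32-bit here), so:
   C90: the next candidate type is unsigned long -> the constant is unsigned
   C99: the next candidate type is long long     -> the constant stays signed */
printf("%d\n", -1 > 4294967294);
/* Under C90 rules, -1 is converted to unsigned long (4294967295) and the line prints 1.
   Under C99 rules, the comparison is done in long long and the line prints 0. */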
Because numeric literals are typically of type int. The UL/L tells the compiler that they are not of type int, e.g. assuming a 32-bit int and a 64-bit long:
long i = 0xffffffff + 1;
long j = 0xffffffffUL + 1;
Here both right-hand sides end up stored in a signed long (64-bit), but the constants have different types:
"0xffffffff" has type unsigned int (32 bits), so the addition wraps around to 0 before the conversion to long; i becomes 0.
"0xffffffffUL" has type unsigned long (64 bits), so the addition is carried out in 64 bits; j becomes 0x100000000.
The question is why is should we use L/UL in long constants even after declaring it to be a long.
Because it's not "after"; it's "before".
First you have the literal, then it is converted to whatever the type is of the variable you're trying to squeeze it into.
They are two objects. The type of the target is designated by the unsigned long keywords, as you've said. The type of the source is designated by this suffix because that's the only way to specify the type of a literal.
Related to this post is the question of why one would use a u at all.
A reason for u is to allow an integer constant greater than LLONG_MAX in decimal form.
// Likely to generate a warning.
unsigned long long limit63bit = 18446744073709551615; // 2^64 - 1
// OK
unsigned long long limit63bit = 18446744073709551615u;
I'm currently fixing a legacy bug in C code. In the process of fixing this bug, I stored an unsigned int into an unsigned long long. But to my surprise, math stopped working when I compiled this code on a 64 bit version of GCC. I discovered that the problem was that when I assigned a long long an int value, then I got a number that looked like 0x0000000012345678, but on the 64-bit machine, that number became 0xFFFFFFFF12345678.
Can someone explain to me or point me to some sort of spec or documentation on what is supposed to happen when storing a smaller data type in a larger one and perhaps what the appropriate pattern for doing this in C is?
Update - Code Sample
Here's what I'm doing:
// Results in 0xFFFFFFFFC0000000 in 64 bit gcc 4.1.2
// Results in 0x00000000C0000000 in 32 bit gcc 3.4.6
u_long foo = 3 * 1024 * 1024 * 1024;
I think you have to tell the compiler that the number on the right is unsigned. Otherwise it thinks it's a normal signed int, and since the sign bit is set, it thinks it's negative, and then it sign-extends it into the receiver.
So do some unsigned casting on the right.
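Something along these lines (a sketch; either a suffix or a cast on the first operand does the job, since * groups left to right):
u_long foo = 3UL * 1024 * 1024 * 1024;         /* arithmetic carried out in unsigned long */
u_long bar = (u_long)3 * 1024 * 1024 * 1024;   /* same effect, using a cast */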
Expressions are generally evaluated independently; their results are not affected by the context in which they appear.
An integer constant like 1024 has the first of the types int, long int, long long int in which its value fits; in the particular case of 1024 that's always int.
I'll assume here that u_long is a typedef for unsigned long (though you also mentioned long long in your question).
So given:
unsigned long foo = 3 * 1024 * 1024 * 1024;
the 4 constants in the initialization expression are all of type int, and all three multiplications are int-by-int. The result happens to be greater (by a factor of 1.5) than 2^31, which means it won't fit in an int on a system where int is 32 bits. The int result, whatever it is, will be implicitly converted to the target type unsigned long, but by that time it's too late; the overflow has already occurred.
The overflow means that your code has undefined behavior (and since this can be determined at compile time, I'd expect your compiler to warn about it). In practice, signed overflow typically wraps around, so the above will typically set foo to -1073741824. You can't count on that (and it's not what you want anyway).
The ideal solution is to avoid the implicit conversions by ensuring that everything is of the target type in the first place:
unsigned long foo = 3UL * 1024UL * 1024UL * 1024UL;
(Strictly speaking only the first operand needs to be of type unsigned long, but it's simpler to be consistent.)
Let's look at the more general case:
int a, b, c, d; /* assume these are initialized */
unsigned long foo = a * b * c * d;
You can't add a UL suffix to a variable. If possible, you should change the declarations of a, b, c, and d so they're of type unsigned long, but perhaps there's some other reason they need to be of type int. You can add casts to explicitly convert each one to the correct type. By using casts, you can control exactly when the conversions are performed:
unsigned long foo = (unsigned long)a *
                    (unsigned long)b *
                    (unsigned long)c *
                    (unsigned long)d;
This gets a bit verbose; you might consider applying the cast only to the leftmost operand (after making sure you understand how the expression is parsed).
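That is, something like this, relying on * grouping left to right:
unsigned long foo = (unsigned long)a * b * c * d;
/* (unsigned long)a * b is evaluated first, as unsigned long, and each further
   multiplication is then also carried out in unsigned long. */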
NOTE: This will not work:
unsigned long foo = (unsigned long)(a * b * c * d);
The cast converts the int result to unsigned long, but only after the overflow has already occurred. It merely specifies explicitly the cast that would have been performed implicitly.
Integer literals without a suffix are typed int if they can fit; in your case 3 and 1024 can definitely fit. This is covered in the draft C99 standard, section 6.4.4.1 Integer constants; a quote of this section can be found in my answer to Are C macros implicitly cast?.
Next we have the multiplication, which performs the usual arithmetic conversions on its operands, but since they are all int, the result is too large to fit in a signed int, and so it overflows. This is undefined behavior as per section 6.5 paragraph 5, which says:
If an exceptional condition occurs during the evaluation of an expression (that is, if the
result is not mathematically defined or not in the range of representable values for its
type), the behavior is undefined.
We can discover this undefined behavior empirically using clang and the -fsanitize=undefined flags (see it live) which says:
runtime error: signed integer overflow: 3145728 * 1024 cannot be represented in type 'int'
Although in two's complement this will typically just end up being a negative number. One way to fix this would be to use the ul suffix:
3ul * 1024ul * 1024ul * 1024ul
So why does a negative number converted to an unsigned value give a very large unsigned value? This is covered in section 6.3.1.3 Signed and unsigned integers which says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.49)
which basically means unsigned long max + 1 is added to the negative number, which results in a very large unsigned value.
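Applied to the original example (assuming a 64-bit unsigned long): the int multiplication wraps to -1073741824, and the conversion to unsigned long adds ULONG_MAX + 1 = 2^64 to it, giving -1073741824 + 2^64 = 18446744072635809792 = 0xFFFFFFFFC0000000, exactly the value observed on the 64-bit build.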
I've read and wondered about this function in the SQLite source code:
static int strlen30(const char *z){
  const char *z2 = z;
  while( *z2 ){ z2++; }
  return 0x3fffffff & (int)(z2 - z);
}
Why use strlen30() instead of strlen() (in string.h)??
The commit message that went in with this change states:
[793aaebd8024896c] part of check-in [c872d55493] Never use strlen(). Use our own internal sqlite3Strlen30() which is guaranteed to never overflow an integer. Additional explicit casts to avoid nuisance warning messages. (CVS 6007) (user: drh branch: trunk)
(this is my answer from Why reimplement strlen as loop+subtraction? , but it was closed)
I can't tell you the reason why they had to re-implement it, and why they chose int instead of size_t as the return type. But about the function:
/*
** Compute a string length that is limited to what can be stored in
** lower 30 bits of a 32-bit signed integer.
*/
static int strlen30(const char *z){
  const char *z2 = z;
  while( *z2 ){ z2++; }
  return 0x3fffffff & (int)(z2 - z);
}
Standard References
The standard says in (ISO/IEC 14882:2003(E)) 3.9.1 Fundamental Types, 4.:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer. 41)
...
41): This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer
type
That part of the standard does not define overflow-behaviour for signed integers. If we look at 5. Expressions, 5.:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined, unless such an expression is a constant expression
(5.19), in which case the program is ill-formed. [Note: most existing implementations of C++ ignore integer
overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point
exceptions vary among machines, and is usually adjustable by a library function. ]
So much for overflow.
As for subtracting two pointers to array elements, 5.7 Additive operators, 6.:
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as ptrdiff_t in the cstddef header (18.1). [...]
Looking at 18.1:
The contents are the same as the Standard C library header stddef.h
So let's look at the C standard (I only have a copy of C99, though), 7.17 Common Definitions :
The types used for size_t and ptrdiff_t should not have an integer conversion rank
greater than that of signed long int unless the implementation supports objects
large enough to make this necessary.
No further guarantee made about ptrdiff_t. Then, Annex E (still in ISO/IEC 9899:TC2) gives the minimum magnitude for signed long int, but not a maximum:
#define LONG_MAX +2147483647
Now what are the maxima for int, the return type of sqlite's strlen30()? Let's skip the C++ quotation that forwards us to the C standard once again, and we'll see in C99, Annex E, the minimum maximum for int:
#define INT_MAX +32767
Summary
Usually, ptrdiff_t is not bigger than signed long, which is not smaller than 32 bits.
int is just defined to be at least 16 bits long.
Therefore, subtracting two pointers may give a result that does not fit into the int of your platform.
We remember from above that for signed types, a result that does not fit yields undefined behaviour.
strlen30() applies a bitwise AND to the pointer-subtraction result:
         |             32 bit             |
ptr_diff |10111101111110011110111110011111| // could be even larger
&        |00111111111111111111111111111111| // == 0x3FFFFFFF
         ----------------------------------
=        |00111101111110011110111110011111| // truncated
That prevents undefined behaviour by truncating the pointer-subtraction result to a maximum value of 0x3FFFFFFF (1073741823 in decimal).
I am not sure why they chose exactly that value, because on most machines only the most significant bit tells the signedness. It could have made sense, relative to the standard, to choose the minimum INT_MAX, but 1073741823 is indeed slightly strange without knowing more details (though it of course does exactly what the comment above their function says: truncate to 30 bits and prevent overflow).
The CVS commit message says:
Never use strlen(). Use our own internal sqlite3Strlen30() which is guaranteed to never overflow an integer. Additional explicit casts to avoid nuisance warning messages. (CVS 6007)
I couldn't find any further reference to this commit or explanation how they got an overflow in that place. I believe that it was an error reported by some static code analysis tool.
I understand typecasting... but only in retrospect. My process for figuring out what requires typecasting in expressions is usually retroactive: I can't predict when it will be required because I don't know how the compiler steps through them. A somewhat trite example:
int8_t x = -50;
uint16_t y = 50;
int32_t z = x * y;
On my 8-bit processor (Freescale HCS08), this sets z to 63036 (2^16 - 50^2). I can see how that would be one possible answer (out of maybe 4 others), but I would not have guessed it would be the one.
A better way to ask might be: when types interact with operators (+-*/), what happens?
The compiler is supposed to upcast to the largest type in the expression and then place the result into the size of the location. If you were to look at the assembler output of the above, you could see exactly how the types are being read in native format from memory. Upcasting from a smaller to a larger size is safe and won't generate warnings. It's when you go from a larger type into a smaller type that precision may be lost and the compiler is supposed to warn or error.
There are cases where you want the information to be lost though. Say you are working with a sin/cos lookup table that is 256 entries long. It's very convenient and common (at least in embedded land) to use a u8 value to access the table, so that the index wraps naturally to the table size while preserving the circular nature of sin/cos. Then a typecast back into a u8 is required, but it is exactly what you want.
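A sketch of that idiom (the table contents and names here are illustrative, not taken from any particular project):
#include <stdint.h>

static const int16_t sine_table[256] = { 0 /* ... 255 more precomputed samples ... */ };

int16_t sine_from_phase(uint16_t phase)
{
    uint8_t index = (uint8_t)phase;   /* deliberate truncation: wraps the index to 0..255 */
    return sine_table[index];
}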
The folks here that say that values are always converted to the larger type are wrong. We cannot talk about anything if we don't know your platform (I see you have provided some information now). Some examples
int = 32bits, uint16_t = unsigned short, int8_t = signed char
This results in value -2500 because both operands are converted to int, and the operation is carried out signed and the signed result is written to an int32_t.
int = 16bits, uint16_t = unsigned int, int8_t = signed char
This results in the value 63036 because the int8_t operand is first converted to unsigned int, resulting in 65536 - 50 = 65486. That is then multiplied by 50, resulting in 3274300 % 65536 (unsigned is modulo arithmetic), which is 63036. That result is then written to the int32_t.
Notice that the minimum int bit-size is 16 bits. So on your 8-bit platform, this second scenario is what likely happens.
I'm not going to try and explain the rules here because it doesn't make sense to me to repeat what is written in the Standard / Draft (which is freely available) in great detail and which is usually easily understandable.
You will need a type cast when you are downcasting.
Upcasting is automatic and safe, which is why the compiler never issues a warning/error. But when you are downcasting you are placing a value which has higher precision than the type of the variable you are storing it in, so the compiler wants you to be sure, and you need to downcast explicitly.
If you want a complete answer, look at other people's suggestions. Read the C standard regarding implicit type conversion. And write test cases for your code...
It is interesting that you say this, because this code:
#include <stdio.h>
#include <stdint.h>
int main(int argc, char* argv[])
{
    int8_t x = -50;
    uint16_t y = 50;
    int32_t z = x * y;
    printf("%i\n", z);
    return 0;
}
Is giving me the answer -2500.
See: http://codepad.org/JbSR3x4s
This happens for me, both on Codepad.org, and Visual Studio 2010
When the compiler does implicit casting, it follows a standard set of arithmetic conversions. These are documented in the C standard in section 6.3. If you happen to own the K&R book, there is a good summary in appendix section A6.5.
What happens to you here is integer promotion. Basically, before the computation takes place, all operands whose rank is smaller than int are promoted to int or unsigned int; the usual arithmetic conversions then make the two operands agree, here on unsigned, since one of your types is an unsigned type of the same rank as int.
The computation is then performed with that width and signedness, and the result is finally assigned.
On your architecture unsigned is probably 16 bits wide, which corresponds to the value that you see. Then, for the assignment, the computed value fits in the target type, which is even wider, so the value remains the same.
To explain what happens in your example, you've got a signed 8-bit type multiplied by an unsigned 16-bit type, and so the smaller signed type is promoted to the larger unsigned type. Once this value is created, it's assigned to the 32-bit type.
If you're just working with signed or unsigned integer types, it's pretty simple. The system can always convert a smaller integer type to a larger without loss of precision, so it will convert the smaller value to the larger type in an operation. In mixed floating-point and integer calculations, it will convert the integer to the floating-point type, perhaps losing some precision.
It appears you're being confused by mixing signed and unsigned types. The system will convert to the larger type. If that larger type is signed, and can hold all the values of the unsigned type in the operation, then the operation is done as signed, otherwise as unsigned. In general, the system prefers to interpret mixed mode as unsigned.
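The classic illustration of that preference (a minimal sketch):
unsigned int u = 1;
int i = -1;
if (i < u) {
    /* never reached: i is converted to unsigned int and becomes UINT_MAX, so i < u is false */
}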
This can be the cause of confusion (it confused you, for example), and is why I'm not entirely fond of unsigned types and arithmetic in C. I'd advise sticking to signed types when practical, and not trying to control the type size as closely as you're doing.