I like to initialize my variables to some "dummy" value and have started to use int64_t and uint64_t. So far, it looks like there are at least three ways I could initialize an int64_t to a particular value (and with slight changes for the unsigned equivalent):
int64_t method_one = 0;
int64_t method_two = 0LL;
int64_t method_three = INT64_C(0);
I use GCC and target OS X and Linux. I'd like to pick a method that aims for ease of portability and clarity — but correctness, above all. Am I overthinking this, or is there a "best" or "most recommended" approach for initializing this variable type, for any particular value I throw at it (which is within its bounds, of course)?
int64_t method_one = 0;
...is perfectly reasonable. C99 (see e.g. draft here; yes, I know it's not the most recent standard any more, but it's the one that introduced the int<N>_t types) says that:
the 0 has type int (§6.4.4.1 para.5);
the type of the expression is int64_t (§6.5.16 para.3);
the type of the right-hand side will be converted to the type of the expression (§6.5.16.1 para.2);
this conversion will not change the value (§6.3.1.3 para.1).
So there's nothing wrong with that at all, and the lack of additional clutter makes it the most readable of the options when initialising to 0 or anything else in the range of an int.
int64_t method_two = 0LL;
int64_t is not guaranteed to be the same as long long; however, this should in fact work portably for any signed 64-bit value as well (and similarly ULL for unsigned 64-bit values): long long (and unsigned long long) should be at least 64 bits in a C99-compliant implementation (§5.2.4.2.1), so LL (and ULL) should always be safe for initialising 64-bit values.
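For instance, a minimal sketch (the values are just INT64_MIN and UINT64_MAX written out; the variable names are illustrative):
#include <stdint.h>
/* LL/ULL are safe for any 64-bit value: long long and unsigned
   long long are at least 64 bits wide in C99 (§5.2.4.2.1). */
int64_t most_negative = -9223372036854775807LL - 1; /* INT64_MIN, written to avoid overflowing the literal */
uint64_t largest = 18446744073709551615ULL;         /* UINT64_MAX */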
int64_t method_three = INT64_C(0);
This is arguably a better option for values which may be outside the range of an int, as it expresses the intent more clearly: INT64_C(n) will expand to something appropriate for any n in (at least) a 64-bit range (see §7.18 in general, and particularly §7.18.4.1).
In practice, I might well use any of the above, depending on context. For example:
uint64_t counter = 0;
(Why add unnecessary clutter?)
uint64_t some_bit = 1ULL << 40;
(1 << 40 simply won't work unless int is unusually wide; and UINT64_C(1) << 40 seems less readable to me here.)
uint64_t some_mask = UINT64_C(0xFF00FF00FF00FF00);
(In this case, explicitly calling out the value as a 64-bit constant seems more readable to me than writing 0xFF00FF00FF00FF00ULL.)
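Putting the three styles together in one compilable sketch (same variables as above):
#include <stdint.h>
int main(void)
{
    uint64_t counter = 0;                              /* plain 0: no clutter needed */
    uint64_t some_bit = 1ULL << 40;                    /* ULL makes the shift 64-bit */
    uint64_t some_mask = UINT64_C(0xFF00FF00FF00FF00); /* explicitly a 64-bit constant */
    (void)counter; (void)some_bit; (void)some_mask;    /* silence unused-variable warnings */
    return 0;
}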
Personally, I would use the third, which is the most portable way to achieve this.
#include <stdint.h>
int64_t method_three = INT64_C(0);
uint64_t method_three = UINT64_C(0);
Anyway, I don't think it's a very important thing.
According to the ANSI C standard, the suffixes for long long int and unsigned long long int are LL and ULL respectively:
- octal or hexadecimal constants suffixed by ll or LL: long long int or unsigned long long int
- decimal, octal, or hexadecimal constants suffixed by both u or U and ll or LL: unsigned long long int
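As a quick sketch of what those rows mean in practice (values chosen for illustration):
long long a = 0x7FFFFFFFFFFFFFFFLL;             /* hex constant with LL: long long int */
unsigned long long b = 18446744073709551615ULL; /* decimal with both U and LL: unsigned long long int */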
If you know that int64_t is defined as:
typedef signed long long int int64_t;
Then method two is most definitely the correct one:
int64_t method_two = 0LL;
uint64_t method_two = 0ULL;
Edit:
Keeping in mind the portability issues, and the fact that it's not guaranteed to be defined as long long, then it would be better to use the third method:
INT64_C()
UINT64_C()
I am trying to learn C after working with Java for a couple of years.
I've found some code that I wanted to reproduce which looked something like this:
U64 attack_table[...]; // ~840 KiB
struct SMagic {
U64* ptr; // pointer to attack_table for each particular square
U64 mask; // to mask relevant squares of both lines (no outer squares)
U64 magic; // magic 64-bit factor
int shift; // shift right
};
SMagic mBishopTbl[64];
SMagic mRookTbl[64];
U64 bishopAttacks(U64 occ, enumSquare sq) {
U64* aptr = mBishopTbl[sq].ptr;
occ &= mBishopTbl[sq].mask;
occ *= mBishopTbl[sq].magic;
occ >>= mBishopTbl[sq].shift;
return aptr[occ];
}
U64 rookAttacks(U64 occ, enumSquare sq) {
U64* aptr = mRookTbl[sq].ptr;
occ &= mRookTbl[sq].mask;
occ *= mRookTbl[sq].magic;
occ >>= mRookTbl[sq].shift;
return aptr[occ];
}
The code itself is not that important, but I already failed at using the same data type: U64; I only found uint64_t. Now I would like to know what the difference is between U64, uint64_t and long.
I would be very happy if someone could briefly explain this to me, including the advantage of each of them.
Greetings,
Finn
TL;DR - for a 64-bit exact width unsigned integer, #include <stdint.h> and use uint64_t.
Presumably, U64 is a custom typedef for a 64-bit wide unsigned integer.
If you're using at least a C99-compliant compiler, it will have <stdint.h> with a typedef for a 64-bit wide unsigned integer with no padding bits: uint64_t. However, it might be that the code targets a compiler that doesn't have the standard uint64_t defined. In that case, there might be some configuration header where a type is chosen for U64. Perhaps you can grep the files for typedef.*U64;
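If <stdint.h> is not available, such a configuration header might look roughly like this (a hypothetical sketch; the exact conditions depend on the project):
/* hypothetical config header choosing a 64-bit type for U64 */
#if defined(_MSC_VER)
typedef unsigned __int64 U64;   /* older MSVC without <stdint.h> */
#else
typedef unsigned long long U64; /* at least 64 bits in C99 */
#endif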
long on the other hand, is a signed type. Due to various undefined and implementation-defined aspects of signed math, you wouldn't want to use a signed type for bit-twiddling at all. Another complication is that unlike in Java, the C long doesn't have a standardized width; instead long is allowed to be only 32 bits wide - and it is so on most 32-bit platforms, and even on 64-bit Windows. If you ever need exact width types, you wouldn't use int or long. Only long long and unsigned long long are guaranteed to be at least 64 bits wide.
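If you are unsure what your platform provides, a quick sketch like this prints the storage widths (in bits, via CHAR_BIT):
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
int main(void)
{
    printf("long:      %zu bits\n", sizeof(long) * CHAR_BIT);
    printf("long long: %zu bits\n", sizeof(long long) * CHAR_BIT);
    printf("uint64_t:  %zu bits\n", sizeof(uint64_t) * CHAR_BIT);
    return 0;
}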
From an Example
unsigned long x = 12345678UL;
We have always learnt that the compiler needs to see only "long" in the above example to set aside 4 bytes (on a 32-bit system) of memory. The question is why should we use L/UL in long constants even after declaring the variable to be a long.
When a suffix L or UL is not used, the compiler uses the first type that can contain the constant from a list (see details in the C99 standard, clause 6.4.4.1:5; for a decimal constant, the list is int, long int, long long int).
As a consequence, most of the times, it is not necessary to use the suffix. It does not change the meaning of the program. It does not change the meaning of your example initialization of x for most architectures, although it would if you had chosen a number that could not be represented as a long long. See also codebauer's answer for an example where the U part of the suffix is necessary.
There are a couple of circumstances when the programmer may want to set the type of the constant explicitly. One example is when using a variadic function:
printf("%lld", 1LL); // correct, because 1LL has type long long
printf("%lld", 1); // undefined behavior, because 1 has type int
A common reason to use a suffix is ensuring that the result of a computation doesn't overflow. Two examples are:
long x = 10000L * 4096L;
unsigned long long y = 1ULL << 36;
In both examples, without suffixes, the constants would have type int and the computation would be made as int. In each example this incurs a risk of overflow. Using the suffixes means that the computation will be done in a larger type instead, which has sufficient range for the result.
As Lightness Races in Orbit puts it, the literal's suffix comes before the assignment. In the two examples above, simply declaring x as long and y as unsigned long long is not enough to prevent the overflow in the computation of the expressions assigned to them.
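A well-defined variant of the same pitfall, assuming a 32-bit unsigned int (this sketch uses unsigned arithmetic, which wraps instead of overflowing, so it is safe to run):
#include <stdio.h>
int main(void)
{
    unsigned long long wrapped = 0xFFFFFFFFu * 2u; /* computed in 32-bit unsigned int: wraps to 4294967294 */
    unsigned long long wide = 0xFFFFFFFFULL * 2u;  /* computed in unsigned long long: 8589934590 */
    printf("%llu\n%llu\n", wrapped, wide);
    return 0;
}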
Another example is the comparison x < 12U where variable x has type int. Without the U suffix, the compiler types the constant 12 as an int, and the comparison is therefore a comparison of signed ints.
int x = -3;
printf("%d\n", x < 12); // prints 1 because it's true that -3 < 12
With the U suffix, the comparison becomes a comparison of unsigned ints. “Usual arithmetic conversions” mean that -3 is converted to a large unsigned int:
printf("%d\n", x < 12U); // prints 0 because (unsigned int)-3 is large
In fact, the type of a constant may even change the result of an arithmetic computation, again because of the way “usual arithmetic conversions” work.
Note that, for decimal constants, the list of types suggested by C99 does not contain unsigned long long. In C90, the list ended with the largest standardized unsigned integer type at the time (which was unsigned long). A consequence was that the meaning of some programs was changed by adding the standard type long long to C99: the same constant that was typed as unsigned long in C90 could now be typed as a signed long long instead. I believe this is the reason why in C99, it was decided not to have unsigned long long in the list of types for decimal constants.
See this and this blog post for examples.
Because numerical literals are typically of type int. The UL/L suffix tells the compiler that they are not of type int, e.g. assuming a 32-bit int and a 64-bit long:
long i = 0xffff;
long j = 0xffffUL;
Here the values on the right are converted to signed long (32-bit -> 64-bit).
0xffff fits comfortably in an int, so in this particular case both i and j end up holding 65535: the suffix changes the constant's type (int vs. unsigned long), but not the stored value.
The type starts to matter when the constant is larger or takes part in a computation: with a 32-bit int, 0xffffffff has type unsigned int, so an expression like 0xffffffff << 4 is evaluated in 32 bits and loses its high bits, whereas 0xffffffffUL << 4 is evaluated in the full 64-bit width of unsigned long.
The question is why should we use L/UL in long constants even after declaring the variable to be a long.
Because it's not "after"; it's "before".
First you have the literal, then it is converted to whatever the type is of the variable you're trying to squeeze it into.
They are two objects. The type of the target is designated by the unsigned long keywords, as you've said. The type of the source is designated by this suffix because that's the only way to specify the type of a literal.
Related to this post is the question of why a u suffix is needed.
A reason for u is to allow an integer constant greater than LLONG_MAX in decimal form.
// Likely to generate a warning.
unsigned long long limit63bit = 18446744073709551615; // 2^64 - 1
// OK
unsigned long long limit63bit = 18446744073709551615u;
I implemented a fast-power algorithm to perform the operation (x^y) % m.
Since m is very large (4294434817), I used long long to store the result. However, long long still does not seem to be enough during the operation. For example, I got a negative number for (3623752876 * 3623752876) % 4294434817.
Is there any way to fix this?
All three of those constants are between 2^31 and 2^32.
The type unsigned long long is guaranteed to be able to store values up to at least 2^64 - 1, which exceeds the product 3623752876 * 3623752876.
So just use unsigned long long for the calculation. long long is wide enough to hold the individual constants, but not the product.
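Applied to the numbers in the question, that is a one-liner (the result matches the program output shown in the next answer):
unsigned long long r = (3623752876ULL * 3623752876ULL) % 4294434817ULL; /* 3896043471 */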
You could also use uint64_t, defined in <stdint.h>. Unlike unsigned long long, it's guaranteed to be exactly 64 bits wide. Since you don't really need an exact width of 64 bits (128-bit arithmetic would work just as well), uint_least64_t or uint_fast64_t is probably more suitable. But unsigned long long is arguably simpler, and in this case it will work correctly. (uint64_t is not guaranteed to exist, though on any C99 or later implementation it almost certainly will.)
For larger values, including intermediate results, you'll likely need to use something wider than unsigned long long, which is likely to require some kind of multi-precision arithmetic. The GNU GMP library is one possibility. Another is to use a language that has built-in support for arbitrary-width integer arithmetic (such as Python).
This answer is based on the calculation (x * x) % y although the question is not entirely clear.
Use uint64_t because, although unsigned int is large enough to hold the operands and the result, it won't hold the product.
#include <stdio.h>
#include <stdint.h>
int main(void)
{
unsigned x = 3623752876;
unsigned m = 4294434817;
uint64_t r;
r = ((uint64_t)x * x) % m;
printf("%u\n", (unsigned)r);
return 0;
}
Program output:
3896043471
We can use the power of modulus arithmetic to do such calculations. The fundamental property of multiplication in modulus arithmetic for two numbers a and b states:
(a*b)%m = ((a%m)*(b%m))%m
If the factors are reduced modulo m before multiplying, each is at most m - 1, so the product is at most (m-1)^2. As long as (m-1)^2 fits into the type used for the computation (for m = 4294434817 < 2^32, that means a 64-bit unsigned type such as unsigned long long), the calculation will never overflow.
Since you are trying to do modular exponentiation, you can read more about it here.
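A minimal sketch of the fast-power loop under the constraint just described (m < 2^32, so residue products fit in a 64-bit unsigned type; the function name powmod is illustrative):
#include <stdint.h>
/* computes (x^y) % m by repeated squaring */
uint64_t powmod(uint64_t x, uint64_t y, uint64_t m)
{
    uint64_t result = 1 % m; /* handles the degenerate case m == 1 */
    x %= m;
    while (y > 0) {
        if (y & 1)
            result = (result * x) % m; /* both operands < m < 2^32: no overflow */
        x = (x * x) % m;
        y >>= 1;
    }
    return result;
}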
Why are constants in C terminated by use of L or UL etc.? For example
unsigned long x = 12345678UL;
My question is: what is the significance of this, and are there any advantages in doing it?
Because any number such as 12345 is treated as an int in C. The problem comes when you try to do bitwise or arithmetic operations on it: the intermediate result can overflow.
This can cause serious, untraceable errors and bugs. To avoid that, when a larger constant is to be assigned to an (unsigned) long variable, the UL and L suffixes are used.
UL is to tell the compiler to treat the integer token as an unsigned long rather than int
L is to tell the compiler to treat the integer token as long rather than int.
The suffix of an integer constant forces a minimum integer type, but the compiler will choose a larger one (consistent with some constraints on signedness) if the number cannot be represented in it (see C11 6.4.4.1, in particular the table in paragraph 5).
If all you do is use the constant to initialize a variable, you don't need any suffix (except for the edge case of a number that is in range of unsigned long long, but not long long - in that case, any of the unsigned suffixes u, ul or ull as well as octal or hexadecimal representation can be used - decimal integer constants without suffix only promote to signed types).
Suffixes become important if you use the constants in more complex expressions because they will determine the result of it, eg
32u << 30
has type unsigned and will truncate the value, whereas
32ull << 30
won't.
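Made observable in a short sketch (assuming a 32-bit unsigned int):
#include <stdio.h>
int main(void)
{
    printf("%u\n", 32u << 30);     /* computed in 32 bits: truncates to 0 */
    printf("%llu\n", 32ull << 30); /* computed in 64 bits: 34359738368 */
    return 0;
}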
I saw the question that was linked in the comments, but my question is different: I want to know the reason behind doing this.
The reason is overflow:
#include <stdio.h>
int main(void)
{
unsigned long a = 1U << 32; // (1U = unsigned int)
unsigned long b = 1UL << 32; // (1UL = unsigned long int)
printf("%lu %lu\n", a, b);
return 0;
}
On my computer int is 32 bits and long is 64; this is the output:
0 4294967296
This is what happens:
a  00000000000000000000000000000001  (1U, 32 bits)
   ^-- << 32: shift count >= width of type, undefined behavior (here: 0)
b  0000000000000000000000000000000000000000000000000000000000000001  (1UL, 64 bits)
   ^-- << 32: well-defined, the set bit ends up at position 32 (4294967296)
The reason is that C is very badly designed, and will do arithmetic on numeric constants as plain int by default, even when that type is too narrow for the computation at hand, unless the programmer has explicitly told the compiler not to be incredibly stupid via various "numerical constant suffixes" (like UL).
Note that these "numerical constant suffixes" only really push the problem to larger numbers, while also causing additional portability problems. For a simple example, consider uint64_t myNumber = 0xFDECBA9876543210UL; (under C90 this breaks on a compiler where long int is only 32 bits; under C99 the constant quietly gets a larger type such as unsigned long long, so ULL is the safer suffix here).
For another example; consider uint128_t myNumber = 0xFDECBA9876543210FDECBA9876543210ULL; which is broken on virtually all compilers (where long long is 64-bit or smaller), and there is no way to do it correctly (as there simply isn't any "128-bit suffix" that can be used).
I'm compiling the code below and for some reason I can't assign -2147483648 to the variable which is 8 bytes long and signed.
long long x = -2147483648;
When I step over this line, the value of x is 2147483648 and the 'Watch' window in MS Visual Studio shows that the type of x is __int64. A sizeof(x) also returns 8.
According to limit.h the limits for a signed long long are:
#define LLONG_MAX 9223372036854775807i64 /* maximum signed long long int value */
#define LLONG_MIN (-9223372036854775807i64 - 1) /* minimum signed long long int value */
and:
/* minimum signed 64 bit value */
#define _I64_MIN (-9223372036854775807i64 - 1)
/* maximum signed 64 bit value */
#define _I64_MAX 9223372036854775807i64
I just don't get it!!!
Can somebody please shed some light on this?
Without the LL, the compiler appears to deduce that 2147483648 is a 32-bit unsigned long. Then it applies the - operator. The result is 0 - 2147483648. Since this is less than 0 and the type is unsigned long, 4294967296 is added, which gives 2147483648 again. This value is then assigned to the long long x.
Suggest:
long long x = -2147483648LL;
// or
long long x = -2147483647 - 1;
Try assigning -2147483648LL instead.
see Integer constants here
Your code compiles and executes fine on my GCC 4.6.3 compiler, with --std=c99. I suspect that you are using the rather hopeless so-called C compiler that Microsoft supply. It obviously isn't very clever. Use a long long suffix (i64, ll or LL) to trick it into behaving.
Interestingly the MS C++ compiler cannot get this right either:
#include <iostream>
int main()
{
long long x = -2147483647;
std::cout << x << std::endl;
x = -2147483648;
std::cout << x << std::endl;
x = -2147483649;
std::cout << x << std::endl;
return 0;
}
Output
-2147483647
2147483648
2147483647
I compiled this with the x86 C++ compiler from VS2013.
And I get the same output from my g++ 4.6.3.
So I think there is more to this than meets the eye. I hope somebody that knows more than me could explain all this.
In answer to some of the other comments (sorry, can't reply to each of them as I don't have enough rep yet):
In C and C++ the type of an expression doesn't depend on its context. In this case, the type of -2147483648 is defined by the language rules; the fact that you later assign it to a long long doesn't affect this.
Actually this way of doing things makes the language much simpler than the alternative, it's one of the things that attracted me to C in the first place.
In David Heffernan's example,
x = -2147483648;
std::cout << x << std::endl; // gives 2147483648
x = -2147483649;
std::cout << x << std::endl; // gives 2147483647
The important thing is that the - sign is NOT part of an integer literal. The expression 2147483648 is an integer constant whose type is determined according to a set of rules in the standard; and then the unary minus operator is applied to the value (which does not change its type).
Unfortunately, C90, C99, C++98 and C++11 all have different rules for the types of integer literals. Further, the rules are different for decimal constants than for hex or octal constants! You can look them up in the relevant standards documents.
If you have 32-bit ints, then 2147483648 is too large to be an int. In all dialects the next possible type for it is long int. If you also have 32-bit long ints, then in C99 or C++11 it has type long long int. In C90 or C++98 it has type unsigned long int. (Those languages do not have a long long type).
Back to David Heffernan's example. C++98 does not have long long, so either you're using a C++11 compiler, or using Microsoft extensions. Assuming the latter; who knows what they've decided to do for integer constants, but if they have retained the C++98 definition that 2147483648 has type unsigned long int, that would explain the results.
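If you have a C11 compiler at hand, _Generic (which postdates the dialects discussed above) lets you ask the compiler directly which type it assigns to an unsuffixed constant; a sketch:
#include <stdio.h>
#define TYPE_NAME(x) _Generic((x), \
    int: "int", \
    long: "long", \
    long long: "long long", \
    unsigned int: "unsigned int", \
    unsigned long: "unsigned long", \
    unsigned long long: "unsigned long long", \
    default: "other")
int main(void)
{
    printf("%s\n", TYPE_NAME(2147483647)); /* "int", given 32-bit int */
    printf("%s\n", TYPE_NAME(2147483648)); /* "long" with 64-bit long, "long long" with 32-bit long */
    return 0;
}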
Turns out that I just had to write it like this:
long long x = -2147483648i64;
Why is the compiler not able to figure it out? I already spelled out the type, so why do I have to put 'i64' after the number???