Why are constants in C suffixed with L, UL, etc.? For example:
unsigned long x = 12345678UL;
My question is: what is the significance of this, and are there any advantages to doing it?
Because any number such as 12345 is treated as an int in C. The problem comes when you do arithmetic or bitwise operations on it: the computation is carried out at int width, and it can overflow.
This can cause serious, hard-to-trace errors and bugs. To avoid that, when a large constant is to be assigned to an (unsigned) long variable, the UL and L suffixes are used.
UL tells the compiler to treat the integer token as unsigned long rather than int.
L tells the compiler to treat the integer token as long rather than int.
The suffix of an integer constant forces a minimum integer type, but the compiler will choose a larger one (consistent with some constraints on signedness) if the number cannot be represented in it (see C11 6.4.4.1, in particular the table after §5).
If all you do is use the constant to initialize a variable, you don't need any suffix (except for the edge case of a number that is in range of unsigned long long, but not long long - in that case, any of the unsigned suffixes u, ul or ull as well as octal or hexadecimal representation can be used - decimal integer constants without suffix only promote to signed types).
Suffixes become important if you use the constants in more complex expressions, because they determine the type in which the expression is evaluated, e.g.
32u << 30
has type unsigned and will truncate the value, whereas
32ull << 30
won't.
I saw the question which is commented. But my question is different. I want to know the reason behind doing this.
The reason is overflow:
#include <stdio.h>
int main(void)
{
unsigned long a = 1U << 32; // (1U = unsigned int)
unsigned long b = 1UL << 32; // (1UL = unsigned long int)
printf("%lu %lu\n", a, b);
return 0;
}
On my computer, int is 32 bits and long is 64; this is the output:
0 4294967296
This is what happens:
a 00000000000000000000000000000001 1U   (32-bit unsigned int)
  ^------------------------------- << 32: shift count >= width of type
b 0000000000000000000000000000000000000000000000000000000000000001 1UL  (64-bit unsigned long)
                                 ^------------------------------- << 32: the set bit lands on bit 32 (4294967296)
The reason is that C is very badly designed, and will forcibly convert numbers into an int unless the programmer has explicitly told the compiler not to be incredibly stupid via various "numerical constant suffixes" (like UL).
Note that these "numerical constant suffixes" only really push the problem to larger numbers, while also causing portability concerns. For a simple example, consider uint64_t myNumber = 0xFDECBA9876543210UL; on a compiler where long int is only 32 bits, the UL suffix documents the wrong width (the constant still gets a large enough type, since for hexadecimal constants the compiler keeps searching larger unsigned types, but the suffix itself is misleading).
For another example; consider uint128_t myNumber = 0xFDECBA9876543210FDECBA9876543210ULL; which is broken on virtually all compilers (where long long is 64-bit or smaller), and there is no way to do it correctly (as there simply isn't any "128-bit suffix" that can be used).
From an example:
unsigned long x = 12345678UL
We have always learnt that the compiler only needs to see long in the above example to reserve 4 bytes (on a 32-bit system) of memory. The question is: why should we use L/UL in long constants even after declaring the variable to be a long?
When a suffix L or UL is not used, the compiler uses the first type that can contain the constant from a list (see details in C99 standard, clause 6.4.4:5. For a decimal constant, the list is int, long int, long long int).
As a consequence, most of the times, it is not necessary to use the suffix. It does not change the meaning of the program. It does not change the meaning of your example initialization of x for most architectures, although it would if you had chosen a number that could not be represented as a long long. See also codebauer's answer for an example where the U part of the suffix is necessary.
There are a couple of circumstances when the programmer may want to set the type of the constant explicitly. One example is when using a variadic function:
printf("%lld", 1LL); // correct, because 1LL has type long long
printf("%lld", 1); // undefined behavior, because 1 has type int
A common reason to use a suffix is ensuring that the result of a computation doesn't overflow. Two examples are:
long x = 10000L * 4096L;
unsigned long long y = 1ULL << 36;
In both examples, without suffixes, the constants would have type int and the computation would be made as int. In each example this incurs a risk of overflow. Using the suffixes means that the computation will be done in a larger type instead, which has sufficient range for the result.
As Lightness Races in Orbit puts it, the literal's suffix comes before the assignment. In the two examples above, simply declaring x as long and y as unsigned long long is not enough to prevent the overflow in the computation of the expressions assigned to them.
Another example is the comparison x < 12U where variable x has type int. Without the U suffix, the compiler types the constant 12 as an int, and the comparison is therefore a comparison of signed ints.
int x = -3;
printf("%d\n", x < 12); // prints 1 because it's true that -3 < 12
With the U suffix, the comparison becomes a comparison of unsigned ints. “Usual arithmetic conversions” mean that -3 is converted to a large unsigned int:
printf("%d\n", x < 12U); // prints 0 because (unsigned int)-3 is large
In fact, the type of a constant may even change the result of an arithmetic computation, again because of the way “usual arithmetic conversions” work.
Note that, for decimal constants, the list of types suggested by C99 does not contain unsigned long long. In C90, the list ended with the largest standardized unsigned integer type at the time (which was unsigned long). A consequence was that the meaning of some programs was changed by adding the standard type long long to C99: the same constant that was typed as unsigned long in C90 could now be typed as a signed long long instead. I believe this is the reason why in C99, it was decided not to have unsigned long long in the list of types for decimal constants.
See this and this blog posts for an example.
Because numeric literals are typically of type int. The UL/L suffix tells the compiler that the constant (and any expression built from it) is not an int, e.g. assuming 32-bit int and 64-bit long:
long i = 1 << 31;
long j = 1L << 31;
The "1 << 31" is computed as int; shifting into the sign bit overflows (formally undefined behaviour, in practice yielding INT_MIN), and converting that negative int to long sign-extends it, resulting in the negative value 0xFFFFFFFF80000000.
The "1L << 31" is computed as long, resulting in the positive value 0x0000000080000000.
The question is: why should we use L/UL in long constants even after declaring the variable to be a long?
Because it's not "after"; it's "before".
First you have the literal, then it is converted to whatever the type is of the variable you're trying to squeeze it into.
They are two objects. The type of the target is designated by the unsigned long keywords, as you've said. The type of the source is designated by this suffix because that's the only way to specify the type of a literal.
Related to this post is the question of why a u suffix at all.
A reason for u is to allow an integer constant greater than LLONG_MAX in decimal form.
// Likely to generate a warning.
unsigned long long limit63bit = 18446744073709551615; // 2^64 - 1
// OK
unsigned long long limit63bit = 18446744073709551615u;
I implemented a fast-power algorithm to perform the operation (x^y) % m.
Since m is very large (4294434817), I used long long to store the result. However, long long still seems not enough during the operation. For example, I got a negative number for (3623752876 * 3623752876) % 4294434817.
Is there anyway to figure it out?
All three of those constants are between 2^31 and 2^32.
The type unsigned long long is guaranteed to be able to store values up to at least 2^64 - 1, which exceeds the product 3623752876 * 3623752876.
So just use unsigned long long for the calculation. long long is wide enough to hold the individual constants, but not the product.
You could also use uint64_t, defined in <stdint.h>. Unlike unsigned long long, it's guaranteed to be exactly 64 bits wide. Since you don't really need an exact width of 64 bits (128-bit arithmetic would work just as well), uint_least64_t or uint_fast64_t is probably more suitable. But unsigned long long is arguably simpler, and in this case it will work correctly. (uint64_t is not guaranteed to exist, though on any C99 or later implementation it almost certainly will.)
For larger values, including intermediate results, you'll likely need to use something wider than unsigned long long, which is likely to require some kind of multi-precision arithmetic. The GNU GMP library is one possibility. Another is to use a language that has built-in support for arbitrary-width integer arithmetic (such as Python).
This answer is based on the calculation (x * x) % y although the question is not entirely clear.
Use uint64_t: although unsigned int is large enough to hold the operands and the result, it won't hold the product.
#include <stdio.h>
#include <stdint.h>
int main(void)
{
unsigned x = 3623752876;
unsigned m = 4294434817;
uint64_t r;
r = ((uint64_t)x * x) % m;
printf("%u\n", (unsigned)r);
return 0;
}
Program output:
3896043471
We can use the power of modulus arithmetic to do such calculations. The fundamental property of multiplication in modulus arithmetic for two numbers a and b states:
(a*b)%m = ((a%m)*(b%m))%m
If (m-1)*(m-1) fits into the unsigned long long type (which is true for any m up to 2^32), then the above calculation will never overflow.
Since you are trying to do modular exponentiation, you can read more about it here.
With my compiler, c is 54464 (truncated to 16 bits) and d is 10176.
But with gcc, c is 120000 and d is 600000.
What is the true behavior? Is the behavior undefined? Or is my compiler wrong?
unsigned short a = 60000;
unsigned short b = 60000;
unsigned long c = a + b;
unsigned long d = a * 10;
Is there an option to alert on these cases?
-Wconversion warns on:
void foo(unsigned long a);
foo(a+b);
but doesn't warn on:
unsigned long c = a + b
First, you should know that the C standard does not mandate a specific precision (number of representable values) for the standard integer types; it only requires a minimal range for each type. This results in the following typical bit sizes (the standard allows for more complex representations):
char: 8 bits
short: 16 bits
int: 16 (!) bits
long: 32 bits
long long (since C99): 64 bits
Note: The actual limits (which imply a certain precision) of an implementation are given in limits.h.
Second, the type in which an operation is performed is determined by the types of the operands, not by the type of the left side of an assignment (because assignments are also just expressions). For this, the types given above are sorted by conversion rank. Operands with smaller rank than int are converted to int first. For other operands, the one with smaller rank is converted to the type of the other operand. These are the usual arithmetic conversions.
Your implementation seems to use a 16-bit unsigned int, the same size as unsigned short, so a and b are converted to unsigned int and the operation is performed with 16 bits. For unsigned types, the operation is performed modulo 65536 (2 to the power of 16) - this is called wrap-around (it is not guaranteed for signed types!). The result is then converted to unsigned long and assigned to the variables.
For gcc, I assume this compiles for a PC or a 32-bit CPU. There, (unsigned) int typically has 32 bits, while (unsigned) long has at least 32 bits (as required). So there is no wrap-around in the operations.
Note: For the PC, the operands are converted to int, not unsigned int, because int can already represent all values of unsigned short; unsigned int is not required. This can result in unexpected (actually: undefined) behaviour if the result of the operation overflows a signed int!
If you need types of defined size, see stdint.h (since C99) for uint16_t, uint32_t. These are typedefs to types with the appropriate size for your implementation.
You can also cast one of the operands (not the whole expression!) to the type of the result:
unsigned long c = (unsigned long)a + b;
or, using types of known size:
#include <stdint.h>
...
uint16_t a = 60000, b = 60000;
uint32_t c = (uint32_t)a + b;
Note that due to the conversion rules, casting one operand is sufficient.
Update (thanks to @chux):
The cast shown above works without problems. However, if a has a larger conversion rank than the type in the cast, the cast might truncate its value to the smaller type. While this can easily be avoided, as all types are known at compile-time (static typing), an alternative is to multiply by 1 of the wanted type:
unsigned long c = ((unsigned long)1U * a) + b;
This way the larger rank of the type given in the cast or a (or b) is used. The multiplication will be eliminated by any reasonable compiler.
Another approach, which avoids even having to know the target type's name, uses the typeof() gcc extension:
unsigned long c;
... many lines of code
c = ((typeof(c))1U * a) + b;
a + b will be computed as an unsigned int (the fact that it is assigned to an unsigned long is not relevant). The C standard mandates that this sum wraps around modulo one plus the largest value representable in unsigned int. On your system, it looks like unsigned int is 16 bits, so the result is computed modulo 65536.
On the other system, it looks like int and unsigned int are larger, and therefore capable of holding the larger numbers. What happens now is quite subtle (acknowledgement to @PascalCuoq): because all values of unsigned short are representable in int, a + b will be computed as an int. (Only if short and int have the same width, or if for some other reason some values of unsigned short cannot be represented as int, will the sum be computed as unsigned int.)
Although the C standard does not specify a fixed size for either unsigned short or unsigned int, your program's behaviour is well-defined. Note that this is not true for signed types, though.
As a final remark, you can use the sized types uint16_t, uint32_t etc. which, if supported by your compiler, are guaranteed to have the specified size.
In C, the types char, short (and their unsigned counterparts) and float can be considered "storage" types: they are designed to save space, but they are not the "native" size the CPU prefers, and they are not used directly for computations.
For example, when two char values appear in an expression, they are first converted to int and then the operation is performed; the reason is that the CPU works better with int. The same traditionally happened with float, which old C implementations always converted to double for computations.
In your code, the computation a+b is a sum of two (unsigned) ints; in C there is no way to compute the sum of two unsigned shorts directly. What you can do is store the final result in an unsigned short, which, thanks to the properties of modular arithmetic, gives the same wrapped value.
I'm compiling the code below and for some reason I can't assign -2147483648 to the variable which is 8 bytes long and signed.
long long x = -2147483648;
When I step over this line, the value of x is 2147483648 and the 'Watch' window in MS Visual Studio shows that the type of x is __int64. A sizeof(x) also returns 8.
According to limit.h the limits for a signed long long are:
#define LLONG_MAX 9223372036854775807i64 /* maximum signed long long int value */
#define LLONG_MIN (-9223372036854775807i64 - 1) /* minimum signed long long int value */
and:
/* minimum signed 64 bit value */
#define _I64_MIN (-9223372036854775807i64 - 1)
/* maximum signed 64 bit value */
#define _I64_MAX 9223372036854775807i64
I just don't get it!!!
Can somebody please shed some light on this?
Without the LL, the compiler appears to deduce that 2147483648 is a 32-bit unsigned long. Then it applies the - operator. The result is 0 - 2147483648. Since this is less than 0 and the type is unsigned long, 4294967296 is added, which gives 2147483648 again. This value is then assigned to long long x.
Suggest:
long long x = -2147483648LL;
// or
long long x = -2147483647 - 1;
Try assigning -2147483648LL instead;
see Integer constants here
Your code compiles and executes fine on my GCC 4.6.3 compiler, with --std=c99. I suspect that you are using the rather hopeless so-called C compiler that Microsoft supply. It obviously isn't very clever. Use a long long suffix (i64, ll or LL) to trick it into behaving.
Interestingly the MS C++ compiler cannot get this right either:
#include <iostream>
int main()
{
long long x = -2147483647;
std::cout << x << std::endl;
x = -2147483648;
std::cout << x << std::endl;
x = -2147483649;
std::cout << x << std::endl;
return 0;
}
Output
-2147483647
2147483648
2147483647
I compiled this with the x86 C++ compiler from VS2013.
And I get the same output from my g++ 4.6.3.
So I think there is more to this than meets the eye. I hope somebody that knows more than me could explain all this.
In answer to some of the other comments (sorry, can't reply to each of them as I don't have enough rep yet):
In C and C++ the type of an expression doesn't depend on its context. In this case, the type of -2147483648 is defined by the language rules; the fact that you later assign it to a long long doesn't affect this.
Actually this way of doing things makes the language much simpler than the alternative, it's one of the things that attracted me to C in the first place.
In David Heffernan's example,
x = -2147483648;
std::cout << x << std::endl; // gives 2147483648
x = -2147483649;
std::cout << x << std::endl; // gives 2147483647
The important thing is that the - sign is NOT part of an integer literal. The expression 2147483648 is an integer constant whose type is determined according to a set of rules in the standard; and then the unary minus operator is applied to the value (which does not change its type).
Unfortunately, C90, C99, C++98 and C++11 all have different rules for the types of integer literals. Further, the rules are different for decimal constants than for hex or octal constants! You can look them up in the relevant standards documents.
If you have 32-bit ints, then 2147483648 is too large to be an int. In all dialects the next possible type for it is long int. If you also have 32-bit long ints, then in C99 or C++11 it has type long long int. In C90 or C++98 it has type unsigned long int. (Those languages do not have a long long type).
Back to David Heffernan's example. C++98 does not have long long, so either you're using a C++11 compiler, or using Microsoft extensions. Assuming the latter; who knows what they've decided to do for integer constants, but if they have retained the C++98 definition that 2147483648 has type unsigned long int, that would explain the results.
Turns out that I just had to write it like this:
long long x = -2147483648i64;
Why is the compiler not able to figure it out? I already spelled out the type, so why do I have to put 'i64' after the number???
Can you tell me what exactly does the u after a number, for example:
#define NAME_DEFINE 1u
Integer literals like 1 in C code are of type int by default. int is the same thing as signed int. One adds u or U (they are equivalent) to the literal to make it unsigned int, to prevent various unexpected bugs and strange behavior.
One example of such a bug:
On a 16-bit machine where int is 16 bits, this expression will result in a negative value:
long x = 30000 + 30000;
Both 30000 literals are int, and since both operands are int, the result will be int. A 16-bit signed int can only contain values up to 32767, so it will overflow. x will get a strange, negative value because of this, rather than 60000 as expected.
The code
long x = 30000u + 30000u;
will however behave as expected.
It is a way to define unsigned literal integer constants.
It is a way of telling the compiler that the constant 1 is meant to be used as an unsigned integer. The compiler gives any decimal number without a suffix a signed type; to avoid confusion, it is recommended to use a suffix like 'u' when a constant is meant to be unsigned. Other similar suffixes also exist, e.g. 'f' for float.
it means "unsigned int"; it ensures the numeric constant has an unsigned type from the start, at compile time, rather than being converted later.
A decimal literal in the code (rules for octal and hexadecimal literals are different, see https://en.cppreference.com/w/c/language/integer_constant) has one of the types int, long or long long. From these, the compiler has to choose the smallest type that is large enough to hold the value. Note that the types char, signed char and short are not considered. For example:
0 // this is a zero of type int
32767 // type int
32768 // could be int or long: On systems with 16 bit integers
// the type will be long, because the value does not fit in an int there.
If you add a u suffix to such a number (a capital U will also do), the compiler will instead have to choose the smallest type from unsigned int, unsigned long and unsigned long long. For example:
0u // a zero of type unsigned int
32768u // type unsigned int: always fits into an unsigned int
100000u // unsigned int or unsigned long
The last example can be used to show the difference to a cast:
100000u // always 100000, but may be unsigned int or unsigned long
(unsigned int)100000 // always unsigned int, but not always 100000
// (e.g. if int has only 16 bit)
On a side note: There are situations, where adding a u suffix is the right thing to ensure correctness of computations, as Lundin's answer demonstrates. However, there are also coding guidelines that strictly forbid mixing of signed and unsigned types, even to the extent that the following statement
unsigned int x = 0;
is classified as non-conforming and has to be written as
unsigned int x = 0u;
This can lead to a situation where developers that deal a lot with unsigned values develop the habit of adding u suffixes to literals everywhere. But, be aware that changing signedness can lead to different behavior in various contexts, for example:
(x > 0)
can (depending on the type of x) mean something different than
(x > 0u)
Luckily, the compiler / code checker will typically warn you about suspicious cases. Nevertheless, adding a u suffix should be done with consideration.