I've got a question about using suffix for numbers in C.
Example:
long long c;
The variable c is of long long type. To initialize its value, I (usually) do
c = 12;
When done like that, the compiler recognizes c as a long long type.
Then, if I do
printf("%zu",sizeof(c));
the result is 8 - which of course is 64 bit. So the compiler remembers that c is of long long type.
But I've seen some examples where I need to force the type to be long long, by doing
c = 12LL;
Why is that?
You're declaring the variable c as a long long, so it's a long long int. The type of the variable is not dependent on its value; rather, the range of possible values for c is dependent on the type of c.
On the other hand, for an integer constant/literal, the type is determined by its value and suffix (if any). 12 has no prefix, so it's a decimal constant. It has no suffix either, so its type is int, since 12 is guaranteed to be within the range of int. 12LL has no prefix, so it's also a decimal constant. It has a suffix of LL, meaning it has a type of long long int. It's safe to assign 12 to the variable c, because an int can safely be converted to a long long int.
Hope that helps.
long long c;
c = 12;
c is of type long long but 12 is of type int. When 12 is assigned to long long object c it is first converted to long long and then assigned to c.
c = 12LL;
does exactly the same assignment, only there is no need to implicitly convert it first. Both assignments are equivalent and no sane compiler will make a difference.
Note that some coding guides, for example MISRA (for automotive embedded code), require constants assigned to unsigned types to be suffixed with U:
Example, in C both assignments (here unsigned int x;) are equivalent:
x = 0; /* non-MISRA compliant */
x = 0U;
but MISRA requires the second form (MISRA-C:2004, rule 10.6).
From an Example
unsigned long x = 12345678UL
We have always learnt that the compiler needs to see only "long" in the above example to set aside 4 bytes (on a 32-bit system) of memory. The question is: why should we use L/UL in long constants even after declaring the variable to be a long?
When a suffix L or UL is not used, the compiler uses the first type from a list that can contain the constant (see the C99 standard, clause 6.4.4.1:5, for details; for an unsuffixed decimal constant, the list is int, long int, long long int).
As a consequence, most of the time it is not necessary to use the suffix, because it does not change the meaning of the program. It does not change the meaning of your example initialization of x for most architectures, although it would if you had chosen a number that could not be represented as a long long. See also codebauer's answer for an example where the U part of the suffix is necessary.
There are a couple of circumstances when the programmer may want to set the type of the constant explicitly. One example is when using a variadic function:
printf("%lld", 1LL); // correct, because 1LL has type long long
printf("%lld", 1); // undefined behavior, because 1 has type int
A common reason to use a suffix is ensuring that the result of a computation doesn't overflow. Two examples are:
long x = 10000L * 4096L;
unsigned long long y = 1ULL << 36;
In both examples, without suffixes, the constants would have type int and the computation would be made as int. In each example this incurs a risk of overflow. Using the suffixes means that the computation will be done in a larger type instead, which has sufficient range for the result.
As Lightness Races in Orbit puts it, the literal's suffix comes before the assignment. In the two examples above, simply declaring x as long and y as unsigned long long is not enough to prevent the overflow in the computation of the expressions assigned to them.
Another example is the comparison x < 12U where variable x has type int. Without the U suffix, the compiler types the constant 12 as an int, and the comparison is therefore a comparison of signed ints.
int x = -3;
printf("%d\n", x < 12); // prints 1 because it's true that -3 < 12
With the U suffix, the comparison becomes a comparison of unsigned ints. “Usual arithmetic conversions” mean that -3 is converted to a large unsigned int:
printf("%d\n", x < 12U); // prints 0 because (unsigned int)-3 is large
In fact, the type of a constant may even change the result of an arithmetic computation, again because of the way “usual arithmetic conversions” work.
Note that, for decimal constants, the list of types suggested by C99 does not contain unsigned long long. In C90, the list ended with the largest standardized unsigned integer type at the time (which was unsigned long). A consequence was that the meaning of some programs was changed by adding the standard type long long to C99: the same constant that was typed as unsigned long in C90 could now be typed as a signed long long instead. I believe this is the reason why in C99, it was decided not to have unsigned long long in the list of types for decimal constants.
See this and this blog posts for an example.
Because numerical literals are typically of type int. The UL/L tells the compiler that they are not of type int, e.g. assuming 32bit int and 64bit long
long i = 0xffff;
long j = 0xffffUL;
Here the values on the right must be converted to signed longs (32bit -> 64bit)
The "0xffff", an int with the value 65535, is converted to a long by value, so i is 65535 (0x000000000000ffff); an integer conversion never changes a value that the target type can represent
The "0xffffUL", an unsigned long, is likewise converted to a long, resulting in the same positive value (0x000000000000ffff)
The question is why should we use L/UL in long constants even after declaring it to be a long.
Because it's not "after"; it's "before".
First you have the literal, then it is converted to whatever the type is of the variable you're trying to squeeze it into.
They are two objects. The type of the target is designated by the unsigned long keywords, as you've said. The type of the source is designated by this suffix because that's the only way to specify the type of a literal.
Related to this post is the question of why one would use a u suffix.
A reason for u is to allow an integer constant greater than LLONG_MAX in decimal form.
// Likely to generate a warning.
unsigned long long limit63bit = 18446744073709551615; // 2^64 - 1
// OK
unsigned long long limit63bit = 18446744073709551615u;
I'm currently fixing a legacy bug in C code. In the process of fixing this bug, I stored an unsigned int into an unsigned long long. But to my surprise, math stopped working when I compiled this code on a 64 bit version of GCC. I discovered that the problem was that when I assigned a long long an int value, then I got a number that looked like 0x0000000012345678, but on the 64-bit machine, that number became 0xFFFFFFFF12345678.
Can someone explain to me or point me to some sort of spec or documentation on what is supposed to happen when storing a smaller data type in a larger one and perhaps what the appropriate pattern for doing this in C is?
Update - Code Sample
Here's what I'm doing:
// Results in 0xFFFFFFFFC0000000 in 64 bit gcc 4.1.2
// Results in 0x00000000C0000000 in 32 bit gcc 3.4.6
u_long foo = 3 * 1024 * 1024 * 1024;
I think you have to tell the compiler that the number on the right is unsigned. Otherwise it thinks it's a normal signed int, and since the sign bit is set, it thinks it's negative, and then it sign-extends it into the receiver.
So do some unsigned casting on the right.
Expressions are generally evaluated independently; their results are not affected by the context in which they appear.
An unsuffixed decimal integer constant like 1024 has the first of the types int, long int, long long int in which its value will fit; in the particular case of 1024 that's always int.
I'll assume here that u_long is a typedef for unsigned long (though you also mentioned long long in your question).
So given:
unsigned long foo = 3 * 1024 * 1024 * 1024;
the 4 constants in the initialization expression are all of type int, and all three multiplications are int-by-int. The result happens to be greater (by a factor of 1.5) than 2^31, which means it won't fit in an int on a system where int is 32 bits. The int result, whatever it is, will be implicitly converted to the target type unsigned long, but by that time it's too late; the overflow has already occurred.
The overflow means that your code has undefined behavior (and since this can be determined at compile time, I'd expect your compiler to warn about it). In practice, signed overflow typically wraps around, so the multiplication will typically yield -1073741824, which is then converted to unsigned long (giving 0xFFFFFFFFC0000000 when unsigned long is 64 bits). You can't count on that (and it's not what you want anyway).
The ideal solution is to avoid the implicit conversions by ensuring that everything is of the target type in the first place:
unsigned long foo = 3UL * 1024UL * 1024UL * 1024UL;
(Strictly speaking only the first operand needs to be of type unsigned long, but it's simpler to be consistent.)
Let's look at the more general case:
int a, b, c, d; /* assume these are initialized */
unsigned long foo = a * b * c * d;
You can't add a UL suffix to a variable. If possible, you should change the declarations of a, b, c, and d so they're of type unsigned long, but perhaps there's some other reason they need to be of type int. You can add casts to explicitly convert each one to the correct type. By using casts, you can control exactly when the conversions are performed:
unsigned long foo = (unsigned long)a *
(unsigned long)b *
(unsigned long)c *
(unsigned long)d;
This gets a bit verbose; you might consider applying the cast only to the leftmost operand (after making sure you understand how the expression is parsed).
NOTE: This will not work:
unsigned long foo = (unsigned long)(a * b * c * d);
The cast converts the int result to unsigned long, but only after the overflow has already occurred. It merely specifies explicitly the cast that would have been performed implicitly.
Integral literals without a suffix are int if they can fit, in your case 3 and 1024 can definitely fit. This is covered in the draft C99 standard section 6.4.4.1 Integer constants, a quote of this section can be found in my answer to Are C macros implicitly cast?.
Next we have the multiplication, which performs the usual arithmetic conversions on its operands. Since they are all int, the multiplication is done in int, and the result is too large to fit in a signed int, which results in overflow. This is undefined behavior as per section 6.5, paragraph 5, which says:
If an exceptional condition occurs during the evaluation of an expression (that is, if the
result is not mathematically defined or not in the range of representable values for its
type), the behavior is undefined.
We can discover this undefined behavior empirically using clang and the -fsanitize=undefined flag (see it live), which says:
runtime error: signed integer overflow: 3145728 * 1024 cannot be represented in type 'int'
On two's complement machines this will usually just end up being a negative number. One way to fix this would be to use the ul suffix:
3ul * 1024ul * 1024ul * 1024ul
So why does a negative number converted to an unsigned value give a very large unsigned value? This is covered in section 6.3.1.3 Signed and unsigned integers which says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
which basically means unsigned long max + 1 is added to the negative number which results in very large unsigned value.
If I write this declaration:
unsigned ux = 2147483648;
(2^31), will the C compiler treat 2147483648 as an unsigned or signed value?
I've heard that constant values are always treated as signed, but I don't think that's always right.
The type of an unsuffixed decimal constant such as 2147483648 depends on the value of the constant, the ranges of the predefined types, and, in some cases, on the version of the C standard you're using.
In C89/C90, the type is the first of:
int
long int
unsigned long int
in which it fits.
In C99 and later, it's the first of:
int
long int
long long int
in which it fits.
You didn't tell us what implementation you're using, but if long int is 32 bits on your system, then 2147483648 will be of type unsigned long int if you have a pre-C99 compiler, or (signed) long long int if you have a C99 or later compiler.
But in your particular case:
unsigned ux = 2147483648;
it doesn't matter. If the constant is of type unsigned int, then it's already of the right type, and no conversion is necessary. If it's of type long long int (as it must be in C99 or later, given 32-bit long), then the value must be converted from that type to unsigned. Conversion from a signed type to an unsigned type is well defined.
So if unsigned is wide enough to represent the value 2147483648, then that's the value that will be stored in ux. And if it isn't (if unsigned int is 16 bits, for example), then the conversion will result in 0 being stored in ux.
You can exercise some control over the type of a constant by appending a suffix to it. For example, 2147483648UL is guaranteed to be of some unsigned type (it could be either unsigned long int or unsigned long long int).
Incidentally, your question's title is currently "About Class Cast.(if I write unsigned ux=2147483648(2 to the 31 st))", but your question has nothing to do with classes (which don't exist in C) or with casts. I'll edit the question.
Can you tell me what exactly the u after a number does, for example:
#define NAME_DEFINE 1u
Integer literals like 1 in C code are of the type int, as is any unsuffixed decimal literal whose value fits in an int. int is the same thing as signed int. One adds u or U (equivalent) to the literal to ensure it is unsigned int, to prevent various unexpected bugs and strange behavior.
One example of such a bug:
On a 16-bit machine where int is 16 bits, this expression will result in a negative value:
long x = 30000 + 30000;
Both 30000 literals are int, and since both operands are int, the result will be int. A 16-bit signed int can only contain values up to 32767, so it will overflow. x will get a strange, negative value because of this, rather than 60000 as expected.
The code
long x = 30000u + 30000u;
will however behave as expected.
It is a way to define unsigned literal integer constants.
It is a way of telling the compiler that the constant 1 is meant to be used as an unsigned integer. The compiler gives any decimal constant without a suffix like 'u' a signed type (int, if the value fits). To avoid surprises, it is recommended to use a suffix like 'u' when using a constant as an unsigned integer. Other similar suffixes also exist. For example, for float 'f' is used.
It means "unsigned int"; basically it functions like a cast to make sure that numeric constants are converted to the appropriate type at compile-time.
A decimal literal in the code (rules for octal and hexadecimal literals are different, see https://en.cppreference.com/w/c/language/integer_constant) has one of the types int, long or long long. From these, the compiler has to choose the smallest type that is large enough to hold the value. Note that the types char, signed char and short are not considered. For example:
0 // this is a zero of type int
32767 // type int
32768 // could be int or long: On systems with 16 bit integers
// the type will be long, because the value does not fit in an int there.
If you add a u suffix to such a number (a capital U will also do), the compiler will instead have to choose the smallest type from unsigned int, unsigned long and unsigned long long. For example:
0u // a zero of type unsigned int
32768u // type unsigned int: always fits into an unsigned int
100000u // unsigned int or unsigned long
The last example can be used to show the difference to a cast:
100000u // always 100000, but may be unsigned int or unsigned long
(unsigned int)100000 // always unsigned int, but not always 100000
// (e.g. if int has only 16 bit)
On a side note: There are situations, where adding a u suffix is the right thing to ensure correctness of computations, as Lundin's answer demonstrates. However, there are also coding guidelines that strictly forbid mixing of signed and unsigned types, even to the extent that the following statement
unsigned int x = 0;
is classified as non-conforming and has to be written as
unsigned int x = 0u;
This can lead to a situation where developers that deal a lot with unsigned values develop the habit of adding u suffixes to literals everywhere. But, be aware that changing signedness can lead to different behavior in various contexts, for example:
(x > 0)
can (depending on the type of x) mean something different than
(x > 0u)
Luckily, the compiler / code checker will typically warn you about suspicious cases. Nevertheless, adding a u suffix should be done with consideration.
In C, are the following equivalent:
long int x = 3L; (notice the L)
and
long int x = 3;
They seem to be the same. In case they are, which one should be used? Should the L be specified explicitly?
If they are different, what is the difference?
3.14L is a long double literal, while 3.14 is a double literal. It won't make much difference in this case, since both are being used to initialize a long int. The result will be 3.
EDIT:
Ok, 3L is a long literal, while 3 is an int literal. It still won't make much difference, since the int will be "promoted" to a long. The result will be the same in both cases.
EDIT 2:
One place it might make a difference is something like this:
printf("%ld\n", 123);
This is undefined behavior, since the format string specifies a long and only an int is being passed. This would be correct:
printf("%ld\n", 123L);
A decimal integer constant without suffix has - depending on its value - the type int, long, long long, or possibly an implementation-defined extended signed integer type with range greater than long long.
Adding the L suffix means the type will be at least long, the LL suffix means that the type will be at least long long.
If you use the constant to initialize a variable, adding a suffix makes no difference, as the value will be converted to the target-type anyway. However, the type of the constant may well be relevant in more complex expressions as it affects operator semantics, argument promotion and possibly other things I didn't think of right now. For example, assuming a 16-bit int type,
long foo = 42 << 20;
invokes undefined behaviour, whereas
long bar = 42L << 20;
is well-defined.