I am pretty sure this question has been answered before, though I didn't manage to find it. I know the rules for type conversions like this: even if we assign 1 (which is by default of type signed int) to an unsigned int variable, the variable of type unsigned int will have the value of 1 in either case. In other words, why would I want to put a U suffix, other than to avoid a type conversion (if I intend to assign that value mostly to unsigned ints)?
Literal value suffixes are most important when you need to control the type precisely. For example, 4 billion fits in an unsigned 32-bit int but not in a signed one. So if you do this, your compiler may complain:
printf("%u", 4000000000);
warning: format specifies type 'unsigned int' but the argument has type 'long'
One may also use the float suffix f to ensure a value is treated as floating point in arithmetic, such as 1.0f / x (which could also be written 1. / x or 1.0 / x to get a double). This is important if x might be an integral type but the result is meant to be floating point.
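A short sketch of that difference (the variable names here are just for illustration):
int x = 4;
double a = 1 / x;     // 0.0: both operands are integers, so integer division happens first
double b = 1.0f / x;  // 0.25: x converts to float, so the division is floating-point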
An integer constant does not need a suffix to exist with a given value (aside from values representable in decimal as some unsigned type but not as any signed type). The trick is what type that constant has and how it is used.
A suffix can suppress a warning when a decimal integer constant cannot be represented as a signed long long:
// pow(2,64) - 1
unsigned long long x1 = 18446744073709551615; // warning
unsigned long long x2 = 18446744073709551615u;// no warning
Consider @Eugene Sh.'s example:
-1 >> 1 // Right-shifting negative values is implementation-defined behavior
// versus
-1U >> 1 // Well defined. -1U has the positive value of `UINT_MAX`
Sometimes simple constants like 1u are used for a gentle type conversion:
// The product is the wider of the type of `x` or `unsigned`
x*1u
@John Zwinck provides a good printf() example above.
The suffixes u and U ensure the type is some unsigned integer, like unsigned or wider.
There is no suffix to ensure the type is signed. Use decimal constants.
The suffixes l and L ensure the type is at least long/unsigned long without changing its signedness.
The suffixes ll and LL ensure the type is at least long long/unsigned long long without changing its signedness.
There is no suffix to ensure the type is narrower than int/unsigned.
There is no standard suffix to ensure the type is intmax_t/uintmax_t.
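To see these rules in action, here is a minimal sketch (assuming a C11 compiler; the TYPE_NAME macro is purely illustrative and not from any answer above) that prints the type each suffix produces:
#include <stdio.h>
// Illustrative helper: maps an expression to a type-name string via C11 _Generic
#define TYPE_NAME(x) _Generic((x), \
    int: "int", \
    unsigned int: "unsigned int", \
    long: "long", \
    unsigned long: "unsigned long", \
    long long: "long long", \
    unsigned long long: "unsigned long long", \
    default: "other")

int main(void) {
    puts(TYPE_NAME(1));    // int
    puts(TYPE_NAME(1u));   // unsigned int
    puts(TYPE_NAME(1L));   // long
    puts(TYPE_NAME(1UL));  // unsigned long
    puts(TYPE_NAME(1LL));  // long long
    puts(TYPE_NAME(1ULL)); // unsigned long long
    return 0;
}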
Related
What's the difference? Both of them give the same output when using printf("%ld"):
long x = 1024;
long y = 1024L;
In C source code, 1024 is an int, and 1024L is a long int. During an assignment, the value on the right is converted to the type of the left operand. As long as the rules about which combinations of operands are allowed are obeyed and the value on the right is in the range of the left operand, there is no difference: the value remains unchanged.
In general, a decimal constant without a suffix is an int, and a decimal constant with an L is a long int. However, if its value is too big to be represented in the usual type, it will automatically be the next larger type. For example, in a C implementation where the maximum int is 2147483647, the constant 3000000000 in source code will be a long int even though it has no suffix. (Note that this rule means the same constant in source code can have different types in different C implementations.) If a long int is not big enough, it will be long long int. If that is not big enough, it can be a signed extended integer type, if the implementation supports one.
The rules above are for decimal constants. There are also hexadecimal constants (which begin with 0x or 0X) and octal constants (which begin with 0—020 is octal for sixteen, unlike 20 which is decimal for twenty), which may have signed or unsigned types. The different integer types are important because overflow and conversions behave differently depending on type. It is easy to take integer operations as a matter of course and assume they work, but it is important to learn the details to avoid problems.
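For example, a sketch assuming 32-bit int and 64-bit long, where the same mathematical value gets a signed type in decimal but an unsigned type in hexadecimal:
printf("%d\n", -1 < 4294967295);  // 1: 4294967295 is a (signed) long here, ordinary comparison
printf("%d\n", -1 < 0xFFFFFFFF);  // 0: 0xFFFFFFFF is unsigned int, so -1 converts to 0xFFFFFFFF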
There doesn't seem to be a 'J' suffix (a la printf's %jd).
So, is it guaranteed that the LL and ULL suffixes are going to work with intmax_t and uintmax_t types?
#include <stdint.h>
intmax_t yuuge = 123456789101112131411516LL;
or is it possible that there are literals that are too big for the LL suffix? Say, a (hypothetical) system with 32 bit int, 32 bit long, 64 bit long long, 128 bit intmax_t.
No suffix is needed if you just want the value to be faithfully represented. The C language automatically gives integer literals the right type. Suffixes are only needed if you want to force a literal to have higher-rank type than it would naturally have due to its value (e.g. 1UL to get the value 1 as unsigned long rather than int, or -1UL as an alternate expression for ULONG_MAX).
If you do want to force a literal to have type intmax_t, use the INTMAX_C() macro from stdint.h.
Is it possible that there are literals that are too big for the LL suffix?
Yes, if the integer constant exceeds the range of (u)intmax_t, it is too big, with or without the LL.
See Assigning 128 bit integer in C for a similar problem.
LL and LLU are not for types. They are for integer constants.
An L or LL ensures the minimum type of a constant. There is no suffix for intmax_t.
123 is an `int`
123L is a `long`
123LL is a `long long`
123456789012345 is a `long long` on OP's hypothetical system even without LL
intmax_t may have the same range as long long - or it may be wider. Both intmax_t and long long are at least 64-bit.
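A quick way to check which case applies on a given implementation (a small sketch using the standard INTMAX_MAX and LLONG_MAX macros):
#include <stdint.h>
#include <limits.h>
#include <stdio.h>

int main(void) {
    printf("%jd\n", INTMAX_MAX);  // widest signed integer type
    printf("%lld\n", LLONG_MAX);  // long long; INTMAX_MAX is at least this large
    return 0;
}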
With a compiler that has warnings well enabled, a constant that exceeds the intmax_t range will produce a warning. Examples:
// warning: integer overflow in expression
intmax_t yuuge1 = (intmax_t)123456*1000000000000000000 + 789101112131411516;
// warning: overflow in implicit constant conversion [-Woverflow]
intmax_t yuuge2 = 123456789101112131411516;
C provides macros for greatest-width integer constants
The following macro expands to an integer constant expression having the value specified by its argument and the type intmax_t: C11 §7.20.4.2 1
INTMAX_C(value)
The INTMAX_C(value) macro does have a limitation:
The argument in any instance of these macros shall be an unsuffixed integer constant ... with a value that does not exceed the limits for the corresponding type.
The following does not meet that requirement on machines with 64-bit intmax_t.
// Not so portable code
intmax_t yuuge = INTMAX_C(123456789101112131411516);
Note that pre-processing (#if arithmetic) is also limited to intmax_t.
Code that attempts to create a constant outside the (u)int64_t range can easily have portability problems. For portability, another coding approach is advised (Avoid such large constants).
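One possible shape of such an alternative (a sketch, not from the answer above; unsigned __int128 is a GCC/Clang extension, not standard C): build the wide value from two in-range 64-bit halves instead of writing a single oversized constant.
#include <stdint.h>

// hi * 2^64 + lo, assembled at run time from two representable constants
unsigned __int128 make_u128(uint64_t hi, uint64_t lo) {
    return ((unsigned __int128)hi << 64) | lo;
}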
From an example:
unsigned long x = 12345678UL;
We have always learnt that the compiler needs to see only "long" in the above example to set aside 4 bytes (on 32-bit) of memory. The question is why we should use L/UL in long constants even after declaring the variable to be a long.
When a suffix L or UL is not used, the compiler uses the first type that can contain the constant from a list (see details in C99 standard, clause 6.4.4:5. For a decimal constant, the list is int, long int, long long int).
As a consequence, most of the times, it is not necessary to use the suffix. It does not change the meaning of the program. It does not change the meaning of your example initialization of x for most architectures, although it would if you had chosen a number that could not be represented as a long long. See also codebauer's answer for an example where the U part of the suffix is necessary.
There are a couple of circumstances when the programmer may want to set the type of the constant explicitly. One example is when using a variadic function:
printf("%lld", 1LL); // correct, because 1LL has type long long
printf("%lld", 1); // undefined behavior, because 1 has type int
A common reason to use a suffix is ensuring that the result of a computation doesn't overflow. Two examples are:
long x = 10000L * 4096L;
unsigned long long y = 1ULL << 36;
In both examples, without suffixes, the constants would have type int and the computation would be made as int. In each example this incurs a risk of overflow. Using the suffixes means that the computation will be done in a larger type instead, which has sufficient range for the result.
As Lightness Races in Orbit puts it, the literal's suffix comes before the assignment. In the two examples above, simply declaring x as long and y as unsigned long long is not enough to prevent the overflow in the computation of the expressions assigned to them.
Another example is the comparison x < 12U where variable x has type int. Without the U suffix, the compiler types the constant 12 as an int, and the comparison is therefore a comparison of signed ints.
int x = -3;
printf("%d\n", x < 12); // prints 1 because it's true that -3 < 12
With the U suffix, the comparison becomes a comparison of unsigned ints. “Usual arithmetic conversions” mean that -3 is converted to a large unsigned int:
printf("%d\n", x < 12U); // prints 0 because (unsigned int)-3 is large
In fact, the type of a constant may even change the result of an arithmetic computation, again because of the way “usual arithmetic conversions” work.
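For example, a small sketch (assuming 32-bit int) where the u on the divisor changes the numeric result, not just the type:
int x = -7;
printf("%d\n", x / 2);   // -3: ordinary signed division
printf("%u\n", x / 2u);  // 2147483644: x is first converted to a huge unsigned int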
Note that, for decimal constants, the list of types suggested by C99 does not contain unsigned long long. In C90, the list ended with the largest standardized unsigned integer type at the time (which was unsigned long). A consequence was that the meaning of some programs was changed by adding the standard type long long to C99: the same constant that was typed as unsigned long in C90 could now be typed as a signed long long instead. I believe this is the reason why in C99, it was decided not to have unsigned long long in the list of types for decimal constants.
See this and this blog post for examples.
Because numerical literals are typically of type int. The UL/L tells the compiler that they are not of type int, e.g. assuming 32-bit int and 64-bit long:
long i = 0xffff;
long j = 0xffffUL;
Here the values on the right are converted to long (32-bit int -> 64-bit long).
The 0xffff, an int with the positive value 65535, converts to a long with the same value (0x000000000000ffff); no sign extension to a negative value occurs, because a hexadecimal constant on its own is never negative.
The 0xffffUL, an unsigned long, needs no conversion and has the same value (0x000000000000ffff). In this particular example both initializations therefore give the same result; the suffix matters when the constant's value or the surrounding arithmetic would otherwise be handled as signed int.
The question is why we should use L/UL in long constants even after declaring the variable to be a long.
Because it's not "after"; it's "before".
First you have the literal, then it is converted to whatever the type is of the variable you're trying to squeeze it into.
They are two objects. The type of the target is designated by the unsigned long keywords, as you've said. The type of the source is designated by this suffix because that's the only way to specify the type of a literal.
Related to this post is the question of why a u is needed at all.
A reason for u is to allow an integer constant greater than LLONG_MAX in decimal form.
// Likely to generate a warning.
unsigned long long limit63bit = 18446744073709551615; // 2^64 - 1
// OK
unsigned long long limit63bit = 18446744073709551615u;
I'm currently fixing a legacy bug in C code. In the process of fixing this bug, I stored an unsigned int into an unsigned long long. But to my surprise, the math stopped working when I compiled this code with a 64-bit version of GCC. I discovered that the problem was that when I assigned an int value to a long long, I got a number that looked like 0x0000000012345678, but on the 64-bit machine that number became 0xFFFFFFFF12345678.
Can someone explain to me or point me to some sort of spec or documentation on what is supposed to happen when storing a smaller data type in a larger one and perhaps what the appropriate pattern for doing this in C is?
Update - Code Sample
Here's what I'm doing:
// Results in 0xFFFFFFFFC0000000 in 64 bit gcc 4.1.2
// Results in 0x00000000C0000000 in 32 bit gcc 3.4.6
u_long foo = 3 * 1024 * 1024 * 1024;
I think you have to tell the compiler that the numbers on the right are unsigned. Otherwise the multiplication is done in ordinary signed int; the product overflows and wraps to a negative value, and that negative value is then sign-extended into the wider receiver.
So do some unsigned casting on the right.
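A sketch of that advice, assuming u_long is a typedef for unsigned long as in the question: making the first operand unsigned forces the whole product to be computed in unsigned long.
u_long foo = (u_long)3 * 1024 * 1024 * 1024;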
Expressions are generally evaluated independently; their results are not affected by the context in which they appear.
An integer constant like 1024 has the smallest of the types int, long int, and long long int into which its value will fit; in the particular case of 1024 that's always int.
I'll assume here that u_long is a typedef for unsigned long (though you also mentioned long long in your question).
So given:
unsigned long foo = 3 * 1024 * 1024 * 1024;
the 4 constants in the initialization expression are all of type int, and all three multiplications are int-by-int. The result happens to be greater (by a factor of 1.5) than 2^31, which means it won't fit in an int on a system where int is 32 bits. The int result, whatever it is, will be implicitly converted to the target type unsigned long, but by that time it's too late; the overflow has already occurred.
The overflow means that your code has undefined behavior (and since this can be determined at compile time, I'd expect your compiler to warn about it). In practice, signed overflow typically wraps around, so the above will typically compute the int value -1073741824, which is then converted to unsigned long for foo. You can't count on that (and it's not what you want anyway).
The ideal solution is to avoid the implicit conversions by ensuring that everything is of the target type in the first place:
unsigned long foo = 3UL * 1024UL * 1024UL * 1024UL;
(Strictly speaking only the first operand needs to be of type unsigned long, but it's simpler to be consistent.)
Let's look at the more general case:
int a, b, c, d; /* assume these are initialized */
unsigned long foo = a * b * c * d;
You can't add a UL suffix to a variable. If possible, you should change the declarations of a, b, c, and d so they're of type unsigned long, but perhaps there's some other reason they need to be of type int. You can add casts to explicitly convert each one to the correct type. By using casts, you can control exactly when the conversions are performed:
unsigned long foo = (unsigned long)a *
(unsigned long)b *
(unsigned long)c *
(unsigned long)d;
This gets a bit verbose; you might consider applying the cast only to the leftmost operand (after making sure you understand how the expression is parsed).
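That shorter form would look like the sketch below; since * associates left to right, casting only the leftmost operand makes every intermediate product an unsigned long.
unsigned long foo = (unsigned long)a * b * c * d;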
NOTE: This will not work:
unsigned long foo = (unsigned long)(a * b * c * d);
The cast converts the int result to unsigned long, but only after the overflow has already occurred. It merely specifies explicitly the cast that would have been performed implicitly.
Integer literals without a suffix are typed int if the value can fit; in your case 3 and 1024 can definitely fit. This is covered in the draft C99 standard section 6.4.4.1 Integer constants; a quote of this section can be found in my answer to Are C macros implicitly cast?.
Next we have the multiplication, which performs the usual arithmetic conversions on its operands, but since they are all int the result is too large to fit in a signed int, which results in overflow. This is undefined behavior as per section 6.5, which says:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
We can discover this undefined behavior empirically using clang and the -fsanitize=undefined flags (see it live) which says:
runtime error: signed integer overflow: 3145728 * 1024 cannot be represented in type 'int'
Although in two's complement this will just end up being a negative number. One way to fix this would be to use the ul suffix:
3ul * 1024ul * 1024ul * 1024ul
So why does a negative number converted to an unsigned value give a very large unsigned value? This is covered in section 6.3.1.3 Signed and unsigned integers which says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. 49)
which basically means unsigned long max + 1 is added to the negative number, which results in a very large unsigned value.
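Worked through for the example in this question (assuming 32-bit int and 64-bit unsigned long, which matches the 64-bit GCC output shown above):
// 3 * 1024 * 1024 * 1024 wraps in 32-bit int arithmetic to -1073741824
// converting that to 64-bit unsigned long adds ULONG_MAX + 1 = 2^64:
//   -1073741824 + 18446744073709551616 = 18446744072635809792
//   = 0xFFFFFFFFC0000000, the value observed with 64-bit GCC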
Can you tell me what exactly the u after a number does? For example:
#define NAME_DEFINE 1u
Integer literals like 1 in C code are always of the type int. int is the same thing as signed int. One adds u or U (equivalent) to the literal to ensure it is unsigned int, to prevent various unexpected bugs and strange behavior.
One example of such a bug:
On a 16-bit machine where int is 16 bits, this expression will result in a negative value:
long x = 30000 + 30000;
Both 30000 literals are int, and since both operands are int, the result will be int. A 16-bit signed int can only contain values up to 32767, so it will overflow. x will get a strange, negative value because of this, rather than 60000 as expected.
The code
long x = 30000u + 30000u;
will however behave as expected.
It is a way to define unsigned literal integer constants.
It is a way of telling the compiler that the constant 1 is meant to be used as an unsigned integer. The compiler treats any number without a suffix like 'u' as having a signed integer type. To avoid this confusion, it is recommended to use a suffix like 'u' when using a constant as an unsigned integer. Other similar suffixes also exist; for example, 'f' is used for float.
It means "unsigned int"; basically, it functions like a cast to make sure that numeric constants are converted to the appropriate type at compile time.
A decimal literal in the code (rules for octal and hexadecimal literals are different, see https://en.cppreference.com/w/c/language/integer_constant) has one of the types int, long or long long. From these, the compiler has to choose the smallest type that is large enough to hold the value. Note that the types char, signed char and short are not considered. For example:
0 // this is a zero of type int
32767 // type int
32768 // could be int or long: On systems with 16 bit integers
// the type will be long, because the value does not fit in an int there.
If you add a u suffix to such a number (a capital U will also do), the compiler will instead have to choose the smallest type from unsigned int, unsigned long and unsigned long long. For example:
0u // a zero of type unsigned int
32768u // type unsigned int: always fits into an unsigned int
100000u // unsigned int or unsigned long
The last example can be used to show the difference to a cast:
100000u // always 100000, but may be unsigned int or unsigned long
(unsigned int)100000 // always unsigned int, but not always 100000
// (e.g. if int has only 16 bit)
On a side note: There are situations where adding a u suffix is the right thing to do to ensure correctness of computations, as Lundin's answer demonstrates. However, there are also coding guidelines that strictly forbid mixing of signed and unsigned types, even to the extent that the following statement
unsigned int x = 0;
is classified as non-conforming and has to be written as
unsigned int x = 0u;
This can lead to a situation where developers who deal a lot with unsigned values develop the habit of adding u suffixes to literals everywhere. But be aware that changing signedness can lead to different behavior in various contexts, for example:
(x > 0)
can (depending on the type of x) mean something different than
(x > 0u)
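A concrete sketch of such a difference, assuming x is a plain int holding -1:
int x = -1;
printf("%d\n", x > 0);   // 0: ordinary signed comparison
printf("%d\n", x > 0u);  // 1: x converts to UINT_MAX, which is greater than 0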
Luckily, the compiler / code checker will typically warn you about suspicious cases. Nevertheless, adding a u suffix should be done with consideration.