Declaration of Long (Modifier) in c - c

Original question
I have a piece of code here:
unsigned long int a =100000;
int a =100000UL;
Do the above two lines represent the same thing?
Revised question
#include <stdio.h>
int main(void)
{
long int x=50000*1024000;
printf("%ld\n",x);
return 0;
}
For a long int, my compiler uses 8 bytes, so the maximum value is 2^63-1. So here 50000*1024000 results in something which is definitely less than the maximum value of long int. So why does my compiler warn of overflow and give the wrong output?

Original question
The two definitions are not the same.
The types of the variables are different — unsigned long versus (signed) int. The behaviour of these types is quite different because of the difference in signedness. They also may have quite different ranges of valid values.
Technically, the numeric constants are different too; the first is a (signed) int unless int cannot hold the value 100,000, in which case it will be (signed) long instead. That will be converted to unsigned long and assigned to the first a. The other constant is an unsigned long value because of the UL integer suffix, and will be converted to int using the normal rules. If int cannot hold the value 100,000, the result of that conversion is implementation-defined. It is legitimate, though very unusual these days, for sizeof(int) * CHAR_BIT == 16 (for example, sizeof(int) == 2 where CHAR_BIT is 8), so int is a 16-bit signed type. That width is normally associated with short, and normally int is a 32-bit signed type, but the standard does not rule out the alternative.
Most likely, the two variants of a both end up holding the value 100,000, but they are not the same because of the difference in signedness.
Revised question
The arithmetic is done in terms of the two operands of the * operator, and those are 50000 and 1024000. Each of those fits in a 32-bit int, so the calculation is done as int — and the result would be 51200000000, but that requires at least 36 bits to represent the value, so you have 32-bit arithmetic overflow, and the result is undefined behaviour.
After the arithmetic is complete, the int result is converted to 64-bit long — not before.
The compiler is correct to warn, and because you invoked undefined behaviour, anything that is printed is 'correct'.
To fix the code, you can write:
#include <stdio.h>
int main(void)
{
long x = 50000L * 1024000L;
printf("%ld\n", x);
return 0;
}
Strictly, you only need one of the two L suffixes, but symmetry suggests using both. You could use one or two (long) casts instead if you prefer. You can save on spaces too, if you wish, but they help the readability of the code.

The types long int and int are not necessarily the same, but they might be. Unsigned and signed are not the same thing. Numerical constants can represent the same value without being the same thing, as in 100000 and 100000UL (the former being a signed int, the latter being unsigned long).

Related

Adding or assigning an integer literal to a size_t

In C I see a lot of code that adds or assigns an integer literal to a size_t variable.
size_t foo = 1;
foo += 1;
What conversion takes place here, and can it ever happen that a size_t is "upgraded" to an int and then converted back to a size_t? Would that still wrap around if I was at the max?
size_t foo = SIZE_MAX;
foo += 1;
Is that defined behavior? It's an unsigned type size_t which is having a signed int added to it (that may be a larger type?) and then converted back to a size_t. Is there risk of signed integer overflow?
Would it make sense to write something like foo + bar + (size_t)1 instead of foo + bar + 1? I never see code like that, but I'm wondering if it's necessary if integer promotions are troublesome.
C89 doesn't say how a size_t will be ranked or what exactly it is:
The value of the result is implementation-defined, and its type (an unsigned integral type) is size_t defined in the <stddef.h> header.
The current C standard allows for a possibility of an implementation that would cause undefined behavior when executing the following code, however such implementation does not exist, and probably never will:
size_t foo = SIZE_MAX;
foo += 1;
The type size_t is an unsigned type (note 1), with a minimum range (note 2) of [0,65535].
The type size_t may be defined as a synonym for the type unsigned short. The type unsigned short may be defined having 16 precision bits, with the range: [0,65535]. In that case the value of SIZE_MAX is 65535.
The type int may be defined having 16 precision bits (plus one sign bit), two's complement representation, and range: [-65536,65535].
The expression foo += 1 is equivalent to foo = foo + 1 (except that foo is evaluated only once, but that is irrelevant here). The variable foo will get promoted using integer promotions (note 3). It will get promoted to type int because type int can represent all values of type size_t and the rank of size_t, being a synonym for unsigned short, is lower than the rank of int. Since the maximum values of size_t and int are the same, the computation causes a signed overflow, causing undefined behavior.
This holds for the current standard, and it should also hold for C89 since it doesn't have any stricter restrictions on types.
A solution that avoids signed overflow on any imaginable implementation is to use an unsigned int integer constant:
foo += 1u;
In that case if foo has a lower rank than int, it will be promoted to unsigned int using usual arithmetic conversions.
1 (Quoted from ISO/IEC 9899/201x 7.19 Common definitions 2)
size_t
which is the unsigned integer type of the result of the sizeof operator;
2 (Quoted from ISO/IEC 9899/201x 7.20.3 Limits of other integer types 2)
limit of size_t
SIZE_MAX 65535
3 (Quoted from ISO/IEC 9899/201x 6.3.1.1 Boolean, characters, and integers 2)
The following may be used in an expression wherever an int or unsigned int may
be used:
An object or expression with an integer type (other than int or unsigned int)
whose integer conversion rank is less than or equal to the rank of int and
unsigned int.
If an int can represent all values of the original type (as restricted by the width, for a
bit-field), the value is converted to an int; otherwise, it is converted to an unsigned
int. These are called the integer promotions. All other types are unchanged by the
integer promotions.
It depends, since size_t is an implementation-defined unsigned integral type.
Operations involving a size_t will therefore introduce promotions, but these depend on what size_t actually is, and what other types involved in the expression actually are.
If size_t was equivalent to an unsigned short (e.g. a 16-bit type) then
size_t foo = 1;
foo += 1;
would (semantically) promote foo to an int, add 1, and then convert the result back to size_t for storing in foo. (I say "semantically", because that is the meaning of the code according to the standard. A compiler is free to apply the "as if" rule - i.e. do anything it likes, as long as it delivers the same net effect).
On the other hand, if size_t was equivalent to a long long unsigned (e.g. a 64-bit unsigned type), then the same code would convert 1 to type long long unsigned, add that to the value of foo, and store the result back into foo.
In both cases, the net result is the same unless an overflow occurs. In this case, there is no overflow, since both int and size_t are guaranteed able to represent the values 1 and 2.
If an overflow occurs (e.g. adding a larger integral value), then the behaviour can vary. Overflow of a signed integral type (e.g. int) results in undefined behaviour. Overflow of an unsigned integral type uses modulo arithmetic.
As to the code
size_t foo = SIZE_MAX;
foo += 1;
it is possible to do the same sort of analysis.
If size_t is equivalent to an unsigned short then foo is subject to the integer promotions. If int has the same width as short, it cannot represent the value of SIZE_MAX, so foo is promoted to unsigned int instead, and the addition uses modulo arithmetic and produces 0. If int is able to represent a larger range than short int (e.g. it is equivalent to long), then foo is promoted to int, incrementing that value will succeed, and storing back to size_t will use modulo arithmetic and produce the result of 0.
If size_t is equivalent to unsigned long, then the value 1 will be converted to unsigned long, adding that to foo will use modulo arithmetic (i.e. produce a result of zero), and that will be stored into foo.
It is possible to do similar analyses assuming that size_t is actually other unsigned integral types.
Note: In modern systems, a size_t that is the same size or smaller than an int is unusual. However, such systems have existed (e.g. Microsoft and Borland C compilers targeting 16-bit MS-DOS on hardware with an 80286 CPU). There are also 16-bit microprocessors still in production, mostly for use in embedded systems with lower power usage and low throughput requirements, and C compilers that target them (e.g. Keil C166 compiler which targets the Infineon XE166 microprocessor family). [Note: I've never had reason to use the Keil compiler but, given its target platform, it would not be a surprise if it supports a 16-bit size_t that is the same size or smaller than the native int type on that platform].
foo += 1 means foo = foo + 1. If size_t is narrower than int (that is, int can represent all values of size_t), then foo is promoted to int in the expression foo + 1.
The only way this could overflow is if INT_MAX == SIZE_MAX. Theoretically that is possible, e.g. 16-bit int and 15-bit size_t. (The latter probably would have 1 padding bit).
More likely, SIZE_MAX will be less than INT_MAX, so the sum is out of range for size_t when it is assigned back. Conversion to an unsigned type is defined to reduce the value modulo SIZE_MAX + 1, so the result will be 0.
As a practical decision I would not recommend mangling your code to cater to the 15-bit size_t case, which has probably never happened and never will. Instead, you could do a compile-time test that will give an error if it does occur. A compile-time assertion that INT_MAX < SIZE_MAX would be practical in this day and age.

Are the L and LL integer suffixes ever needed? [duplicate]

From an Example
unsigned long x = 12345678UL;
We have always learnt that the compiler needs to see only "long" in the above example to set 4 bytes (in 32 bit) of memory. The question is why should we use L/UL in long constants even after declaring it to be a long.
When a suffix L or UL is not used, the compiler uses the first type that can contain the constant from a list (see details in the C99 standard, clause 6.4.4.1:5). For a decimal constant, the list is int, long int, long long int.
As a consequence, most of the time it is not necessary to use the suffix, because it does not change the meaning of the program. It does not change the meaning of your example initialization of x for most architectures, although it would if you had chosen a number that could not be represented as a long long. See also codebauer's answer for an example where the U part of the suffix is necessary.
There are a couple of circumstances when the programmer may want to set the type of the constant explicitly. One example is when using a variadic function:
printf("%lld", 1LL); // correct, because 1LL has type long long
printf("%lld", 1); // undefined behavior, because 1 has type int
A common reason to use a suffix is ensuring that the result of a computation doesn't overflow. Two examples are:
long x = 10000L * 4096L;
unsigned long long y = 1ULL << 36;
In both examples, without suffixes, the constants would have type int and the computation would be made as int. In each example this incurs a risk of overflow. Using the suffixes means that the computation will be done in a larger type instead, which has sufficient range for the result.
As Lightness Races in Orbit puts it, the literal's suffix comes before the assignment. In the two examples above, simply declaring x as long and y as unsigned long long is not enough to prevent the overflow in the computation of the expressions assigned to them.
Another example is the comparison x < 12U where variable x has type int. Without the U suffix, the compiler types the constant 12 as an int, and the comparison is therefore a comparison of signed ints.
int x = -3;
printf("%d\n", x < 12); // prints 1 because it's true that -3 < 12
With the U suffix, the comparison becomes a comparison of unsigned ints. “Usual arithmetic conversions” mean that -3 is converted to a large unsigned int:
printf("%d\n", x < 12U); // prints 0 because (unsigned int)-3 is large
In fact, the type of a constant may even change the result of an arithmetic computation, again because of the way “usual arithmetic conversions” work.
Note that, for decimal constants, the list of types suggested by C99 does not contain unsigned long long. In C90, the list ended with the largest standardized unsigned integer type at the time (which was unsigned long). A consequence was that the meaning of some programs was changed by adding the standard type long long to C99: the same constant that was typed as unsigned long in C90 could now be typed as a signed long long instead. I believe this is the reason why in C99, it was decided not to have unsigned long long in the list of types for decimal constants.
See this and this blog posts for an example.
Because numerical literals are typically of type int. The UL/L tells the compiler that they are not of type int, e.g. assuming 32-bit int and 64-bit long
long i = 0xffff;
long j = 0xffffUL;
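Here the values on the right are converted to long.
The constant 0xffff fits in an int, so it has type int and the value 65535; converting it to long preserves that value, so sign extension does not produce a negative result.
The constant 0xffffUL has type unsigned long; its value 65535 also fits in long, so j is 65535 as well. The suffix changes the type of the constant, which matters once the constant takes part in arithmetic or a comparison before the conversion.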
The question is why should we use L/UL in long constants even after declaring it to be a long.
Because it's not "after"; it's "before".
First you have the literal, then it is converted to whatever the type is of the variable you're trying to squeeze it into.
They are two objects. The type of the target is designated by the unsigned long keywords, as you've said. The type of the source is designated by this suffix because that's the only way to specify the type of a literal.
Related to this post is the question of why a u suffix is needed.
A reason for u is to allow an integer constant greater than LLONG_MAX in decimal form.
// Likely to generate a warning.
unsigned long long limit64bit = 18446744073709551615; // 2^64 - 1
// OK
unsigned long long limit64bit = 18446744073709551615u;

What is the proper way to store narrower data types into a wider data type in the C language?

I'm currently fixing a legacy bug in C code. In the process of fixing this bug, I stored an unsigned int into an unsigned long long. But to my surprise, math stopped working when I compiled this code on a 64 bit version of GCC. I discovered that the problem was that when I assigned a long long an int value, then I got a number that looked like 0x0000000012345678, but on the 64-bit machine, that number became 0xFFFFFFFF12345678.
Can someone explain to me or point me to some sort of spec or documentation on what is supposed to happen when storing a smaller data type in a larger one and perhaps what the appropriate pattern for doing this in C is?
Update - Code Sample
Here's what I'm doing:
// Results in 0xFFFFFFFFC0000000 in 64 bit gcc 4.1.2
// Results in 0x00000000C0000000 in 32 bit gcc 3.4.6
u_long foo = 3 * 1024 * 1024 * 1024;
I think you have to tell the compiler that the number on the right is unsigned. Otherwise it thinks it's a normal signed int, and since the sign bit is set, it thinks it's negative, and then it sign-extends it into the receiver.
So do some unsigned casting on the right.
Expressions are generally evaluated independently; their results are not affected by the context in which they appear.
An integer constant like 1024 is of the smallest of int, long int, long long int into which its value will fit; in the particular case of 1024 that's always int.
I'll assume here that u_long is a typedef for unsigned long (though you also mentioned long long in your question).
So given:
unsigned long foo = 3 * 1024 * 1024 * 1024;
the 4 constants in the initialization expression are all of type int, and all three multiplications are int-by-int. The result happens to be greater (by a factor of 1.5) than 2^31, which means it won't fit in an int on a system where int is 32 bits. The int result, whatever it is, will be implicitly converted to the target type unsigned long, but by that time it's too late; the overflow has already occurred.
The overflow means that your code has undefined behavior (and since this can be determined at compile time, I'd expect your compiler to warn about it). In practice, signed overflow typically wraps around, so the int result will typically be -1073741824, which then converts to 0xFFFFFFFFC0000000 in a 64-bit unsigned long. You can't count on that (and it's not what you want anyway).
The ideal solution is to avoid the implicit conversions by ensuring that everything is of the target type in the first place:
unsigned long foo = 3UL * 1024UL * 1024UL * 1024UL;
(Strictly speaking only the first operand needs to be of type unsigned long, but it's simpler to be consistent.)
Let's look at the more general case:
int a, b, c, d; /* assume these are initialized */
unsigned long foo = a * b * c * d;
You can't add a UL suffix to a variable. If possible, you should change the declarations of a, b, c, and d so they're of type unsigned long, but perhaps there's some other reason they need to be of type int. You can add casts to explicitly convert each one to the correct type. By using casts, you can control exactly when the conversions are performed:
unsigned long foo = (unsigned long)a *
(unsigned long)b *
(unsigned long)c *
(unsigned long)d;
This gets a bit verbose; you might consider applying the cast only to the leftmost operand (after making sure you understand how the expression is parsed).
NOTE: This will not work:
unsigned long foo = (unsigned long)(a * b * c * d);
The cast converts the int result to unsigned long, but only after the overflow has already occurred. It merely specifies explicitly the cast that would have been performed implicitly.
Integer literals without a suffix are int if they can fit; in your case 3 and 1024 can definitely fit. This is covered in the draft C99 standard section 6.4.4.1 Integer constants; a quote of this section can be found in my answer to Are C macros implicitly cast?.
Next we have the multiplication, which performs the usual arithmetic conversions on its operands, but since they are all int the operation is done in int, and the result is too large to fit in a signed int, which results in overflow. This is undefined behavior as per section 6.5 paragraph 5, which says:
If an exceptional condition occurs during the evaluation of an expression (that is, if the
result is not mathematically defined or not in the range of representable values for its
type), the behavior is undefined.
We can discover this undefined behavior empirically using clang and the -fsanitize=undefined flag, which says:
runtime error: signed integer overflow: 3145728 * 1024 cannot be represented in type 'int'
Although in two's complement this will typically just end up being a negative number. One way to fix this would be to use the ul suffix:
3ul * 1024ul * 1024ul * 1024ul
So why does a negative number converted to an unsigned value give a very large unsigned value? This is covered in section 6.3.1.3 Signed and unsigned integers which says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
which basically means unsigned long max + 1 is added to the negative number which results in very large unsigned value.

What does 'u' mean after a number?

Can you tell me what exactly does the u after a number, for example:
#define NAME_DEFINE 1u
Integer literals like 1 in C code are always of the type int. int is the same thing as signed int. One adds u or U (equivalent) to the literal to ensure it is unsigned int, to prevent various unexpected bugs and strange behavior.
One example of such a bug:
On a 16-bit machine where int is 16 bits, this expression will result in a negative value:
long x = 30000 + 30000;
Both 30000 literals are int, and since both operands are int, the result will be int. A 16-bit signed int can only contain values up to 32767, so it will overflow. x will get a strange, negative value because of this, rather than 60000 as expected.
The code
long x = 30000u + 30000u;
will however behave as expected.
It is a way to define unsigned literal integer constants.
It is a way of telling the compiler that the constant 1 is meant to be used as an unsigned integer. The compiler gives a decimal constant without a suffix like 'u' a signed type (int, if the value fits). To avoid confusion, it is recommended to use a suffix like 'u' when using a constant as an unsigned integer. Other similar suffixes also exist. For example, 'f' is used for float.
It means "unsigned int"; basically it functions like a cast to make sure that numeric constants have the appropriate type at compile time.
A decimal literal in the code (rules for octal and hexadecimal literals are different, see https://en.cppreference.com/w/c/language/integer_constant) has one of the types int, long or long long. From these, the compiler has to choose the smallest type that is large enough to hold the value. Note that the types char, signed char and short are not considered. For example:
0 // this is a zero of type int
32767 // type int
32768 // could be int or long: On systems with 16 bit integers
// the type will be long, because the value does not fit in an int there.
If you add a u suffix to such a number (a capital U will also do), the compiler will instead have to choose the smallest type from unsigned int, unsigned long and unsigned long long. For example:
0u // a zero of type unsigned int
32768u // type unsigned int: always fits into an unsigned int
100000u // unsigned int or unsigned long
The last example can be used to show the difference to a cast:
100000u // always 100000, but may be unsigned int or unsigned long
(unsigned int)100000 // always unsigned int, but not always 100000
// (e.g. if int has only 16 bit)
On a side note: There are situations, where adding a u suffix is the right thing to ensure correctness of computations, as Lundin's answer demonstrates. However, there are also coding guidelines that strictly forbid mixing of signed and unsigned types, even to the extent that the following statement
unsigned int x = 0;
is classified as non-conforming and has to be written as
unsigned int x = 0u;
This can lead to a situation where developers that deal a lot with unsigned values develop the habit of adding u suffixes to literals everywhere. But, be aware that changing signedness can lead to different behavior in various contexts, for example:
(x > 0)
can (depending on the type of x) mean something different than
(x > 0u)
Luckily, the compiler / code checker will typically warn you about suspicious cases. Nevertheless, adding a u suffix should be done with consideration.

How to cast or convert an unsigned int to int in C?

My apologies if the question seems weird. I'm debugging my code and this seems to be the problem, but I'm not sure.
Thanks!
It depends on what you want the behaviour to be. An int cannot hold many of the values that an unsigned int can.
You can cast as usual:
int signedInt = (int) myUnsigned;
but this will cause problems if the unsigned value is past the max int can hold. This means half of the possible unsigned values will result in erroneous behaviour unless you specifically watch out for it.
You should probably reexamine how you store values in the first place if you're having to convert for no good reason.
EDIT: As mentioned by ProdigySim in the comments, the maximum value is platform dependent. But you can access it with INT_MAX and UINT_MAX.
For the usual 4-byte types:
4 bytes = (4*8) bits = 32 bits
If all 32 bits are used, as in unsigned, the maximum value will be 2^32 - 1, or 4,294,967,295.
A signed int effectively sacrifices one bit for the sign, so the maximum value will be 2^31 - 1, or 2,147,483,647. Note that this is half of the other value.
Unsigned int can be converted to signed (or vice-versa) by simple expression as shown below :
unsigned int z;
int y=5;
z= (unsigned int)y;
Though not targeted at the question, you may like to read the following links:
signed to unsigned conversion in C - is it always safe?
performance of unsigned vs signed integers
Unsigned and signed values in C
What type-conversions are happening?
IMHO this question is an evergreen. As stated in various answers, the assignment of an unsigned value that is not in the range [0,INT_MAX] is implementation defined and might even raise a signal. If the unsigned value is considered to be a two's complement representation of a signed number, the probably most portable way is IMHO the way shown in the following code snippet:
#include <limits.h>
unsigned int u;
int i;
if (u <= (unsigned int)INT_MAX)
i = (int)u; /*(1)*/
else if (u >= (unsigned int)INT_MIN)
i = -(int)~u - 1; /*(2)*/
else
i = INT_MIN; /*(3)*/
Branch (1) is obvious and cannot invoke overflow or traps, since it
is value-preserving.
Branch (2) goes through some pains to avoid signed integer overflow
by taking the one's complement of the value by bit-wise NOT, casts it
to 'int' (which cannot overflow now), negates the value and subtracts
one, which can also not overflow here.
Branch (3) provides the poison we have to take on one's complement or
sign/magnitude targets, because the signed integer representation
range is smaller than the two's complement representation range.
This is likely to boil down to a simple move on a two's complement target; at least I've observed such with GCC and CLANG. Also branch (3) is unreachable on such a target -- if one wants to limit the execution to two's complement targets, the code could be condensed to
#include <limits.h>
unsigned int u;
int i;
if (u <= (unsigned int)INT_MAX)
i = (int)u; /*(1)*/
else
i = -(int)~u - 1; /*(2)*/
The recipe works with any signed/unsigned type pair, and the code is best put into a macro or inline function so the compiler/optimizer can sort it out. (In which case rewriting the recipe with a ternary operator is helpful. But it's less readable and therefore not a good way to explain the strategy.)
And yes, some of the casts to 'unsigned int' are redundant, but they might help the casual reader, and some compilers issue warnings on signed/unsigned compares, because the implicit conversion causes some non-intuitive behavior by language design.
If you have a variable unsigned int x;, you can convert it to an int using (int)x.
It's as simple as this:
unsigned int foo;
int bar = 10;
foo = (unsigned int)bar;
Or vice versa...
If an unsigned int and a (signed) int are used in the same expression, the signed int gets implicitly converted to unsigned. This is a rather dangerous feature of the C language, and one you therefore need to be aware of. It may or may not be the cause of your bug. If you want a more detailed answer, you'll have to post some code.
Some explanation from C++ Primer, 5th edition, page 35:
If we assign an out-of-range value to an object of unsigned type, the result is the remainder of the value modulo the number of values the target type can hold.
For example, an 8-bit unsigned char can hold values from 0 through 255, inclusive. If we assign a value outside the range, the compiler assigns the remainder of that value modulo 256.
unsigned char c = -1; // assuming 8-bit chars, c has value 255
If we assign an out-of-range value to an object of signed type, the result is undefined. The program might appear to work, it might crash, or it might produce garbage values.
Page 160:
If any operand is an unsigned type, the type to which the operands are converted depends on the relative sizes of the integral types on the machine.
...
When the signedness differs and the type of the unsigned operand is the same as or larger than that of the signed operand, the signed operand is converted to unsigned.
The remaining case is when the signed operand has a larger type than the unsigned operand. In this case, the result is machine dependent. If all values in the unsigned type fit in the large type, then the unsigned operand is converted to the signed type. If the values don't fit, then the signed operand is converted to the unsigned type.
For example, if the operands are long and unsigned int, and int and long have the same size, the long will be converted to unsigned int. If the long type has more bits, then the unsigned int will be converted to long.
I found reading this book very helpful.
