With my compiler, c is 54464 (the result truncated to 16 bits) and d is 10176. But with gcc, c is 120000 and d is 600000.
What is the correct behavior? Is the behavior undefined, or is my compiler wrong? Here is the code:
unsigned short a = 60000;
unsigned short b = 60000;
unsigned long c = a + b;
unsigned long d = a * 10;
Is there an option to warn about these cases?
-Wconversion warns on:
void foo(unsigned long a);
foo(a+b);
but doesn't warn on:
unsigned long c = a + b;
First, you should know that the C standard does not require a specific precision (number of representable values) for the standard integer types; it only requires a minimal precision for each type. This results in the following typical bit sizes (the standard allows for more complex representations):
char: 8 bits
short: 16 bits
int: 16 (!) bits
long: 32 bits
long long (since C99): 64 bits
Note: The actual limits (which imply a certain precision) of an implementation are given in limits.h.
Second, the type in which an operation is performed is determined by the types of its operands, not by the type of the left-hand side of an assignment (because assignments are also just expressions). The types listed above are sorted by conversion rank. Operands with a smaller rank than int are first converted to int. For other operands, the one with the smaller rank is converted to the type of the other operand. These are the usual arithmetic conversions.
Your implementation seems to use a 16-bit unsigned int with the same size as unsigned short, so a and b are converted to unsigned int and the operation is performed in 16 bits. For unsigned types, the operation is performed modulo 65536 (2^16); this is called wrap-around (it is not guaranteed for signed types!). The result is then converted to unsigned long and assigned to the variables.
For gcc, I assume this was compiled for a PC or a 32-bit CPU. There, (unsigned) int typically has 32 bits, while (unsigned) long has at least 32 bits (as required). So there is no wrap-around for these operations.
Note: For the PC, the operands are converted to int, not unsigned int. This is because int can already represent all values of unsigned short; unsigned int is not required. This can result in unexpected (actually: undefined) behaviour if the result of the operation overflows a signed int!
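To see both results on one machine, here is a minimal sketch (assuming a C99 compiler with stdint.h and a 32-bit int) that simulates the 16-bit wrap-around with an explicit cast:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t a = 60000, b = 60000;
    /* a and b are promoted to (32-bit) int, so a + b is 120000 ... */
    printf("%lu\n", (unsigned long)(a + b));      /* prints 120000 */
    /* ... and reducing the sum back to 16 bits reproduces the wrap-around. */
    printf("%u\n", (unsigned)(uint16_t)(a + b));  /* prints 54464 */
    return 0;
}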
If you need types of defined size, see stdint.h (since C99) for uint16_t, uint32_t. These are typedefs to types with the appropriate size for your implementation.
You can also cast one of the operands (not the whole expression!) to the type of the result:
unsigned long c = (unsigned long)a + b;
or, using types of known size:
#include <stdint.h>
...
uint16_t a = 60000, b = 60000;
uint32_t c = (uint32_t)a + b;
Note that due to the conversion rules, casting one operand is sufficient.
Update (thanks to @chux):
The cast shown above works without problems. However, if a has a larger conversion rank than the type in the cast, the cast might truncate its value to the smaller type. While this can easily be avoided, since all types are known at compile time (static typing), an alternative is to multiply by 1 of the wanted type:
unsigned long c = ((unsigned long)1U * a) + b;
This way, the larger of the ranks of the type given in the cast and of a (or b) is used. The multiplication will be eliminated by any reasonable compiler.
Another approach, which avoids even needing to know the name of the target type, uses the typeof() gcc extension:
unsigned long c;
... many lines of code
c = ((typeof(c))1U * a) + b;
a + b will be computed as an unsigned int (the fact that it is assigned to an unsigned long is not relevant). The C standard mandates that this sum wraps around modulo "one plus the largest representable unsigned value". On your system, it looks like an unsigned int is 16 bits, so the result is computed modulo 65536.
On the other system, it looks like int and unsigned int are larger, and therefore capable of holding the larger numbers. What happens now is quite subtle (acknowledgment to @PascalCuoq): because all values of unsigned short are representable in int, a + b will be computed as an int. (Only if short and int are the same width, or if in some other way some values of unsigned short cannot be represented as int, will the sum be computed as unsigned int.)
Although the C standard does not specify a fixed size for either an unsigned short or an unsigned int, your program's behaviour is well-defined. Note that this is not true for a signed type, though.
As a final remark, you can use the sized types uint16_t, uint32_t etc. which, if supported by your compiler, are guaranteed to have the specified size.
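For instance, a minimal sketch using the sized types (assuming your implementation provides inttypes.h, which any C99 compiler does):
#include <inttypes.h>
#include <stdio.h>

int main(void) {
    uint16_t a = 60000, b = 60000;
    uint32_t c = (uint32_t)a + b;  /* casting one operand is enough: b follows */
    printf("%" PRIu32 "\n", c);    /* prints 120000 */
    return 0;
}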
In C, the types char and short (and their unsigned counterparts) and float should be considered "storage" types: they are designed to optimize storage, but they are not the "native" size the CPU prefers, and they are not used directly for computations.
For example, when you place two char values in an expression, they are first converted to int, and then the operation is performed; the reason is that the CPU works better with int. Something similar happens with float, which is promoted to double in variadic arguments and, on some implementations, is evaluated in double precision for computations.
In your code, the computation a+b is a sum of two values promoted to (unsigned) int; in C there is no way to compute a sum directly in unsigned short. What you can do is store the final result in an unsigned short, which, thanks to the properties of modular arithmetic, will hold the same value.
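A small illustration of that promotion (a minimal sketch, assuming the usual 8-bit unsigned char):
#include <stdio.h>

int main(void) {
    unsigned char x = 200, y = 100;
    /* x and y are promoted to int, so the sum is 300, not 300 % 256. */
    printf("%d\n", x + y);      /* prints 300 */
    unsigned char z = x + y;    /* stored back: 300 % 256 == 44 */
    printf("%d\n", z);          /* prints 44 */
    return 0;
}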
I'm currently fixing a legacy bug in C code. In the process of fixing this bug, I stored an unsigned int into an unsigned long long. But to my surprise, the math stopped working when I compiled this code on a 64-bit version of GCC. I discovered that the problem was that when I assigned an int value to a long long, I got a number that looked like 0x0000000012345678, but on the 64-bit machine, that number became 0xFFFFFFFF12345678.
Can someone explain to me or point me to some sort of spec or documentation on what is supposed to happen when storing a smaller data type in a larger one and perhaps what the appropriate pattern for doing this in C is?
Update - Code Sample
Here's what I'm doing:
// Results in 0xFFFFFFFFC0000000 in 64 bit gcc 4.1.2
// Results in 0x00000000C0000000 in 32 bit gcc 3.4.6
u_long foo = 3 * 1024 * 1024 * 1024;
I think you have to tell the compiler that the number on the right is unsigned. Otherwise it thinks it's a normal signed int, and since the sign bit is set, it thinks the value is negative and sign-extends it into the receiver.
So do some unsigned casting on the right.
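For example (a sketch; it assumes u_long is a typedef for unsigned long, as in the question):
typedef unsigned long u_long;  /* matching the question's typedef (assumed) */

u_long foo = 3UL * 1024 * 1024 * 1024;        /* suffix keeps the math unsigned */
u_long bar = (u_long)3 * 1024 * 1024 * 1024;  /* an explicit cast works as well */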
Expressions are generally evaluated independently; their results are not affected by the context in which they appear.
An integer constant like 1024 has the smallest of the types int, long int, and long long int into which its value fits; in the particular case of 1024, that's always int.
I'll assume here that u_long is a typedef for unsigned long (though you also mentioned long long in your question).
So given:
unsigned long foo = 3 * 1024 * 1024 * 1024;
the 4 constants in the initialization expression are all of type int, and all three multiplications are int-by-int. The result happens to be greater (by a factor of 1.5) than 2^31, which means it won't fit in an int on a system where int is 32 bits. The int result, whatever it is, will be implicitly converted to the target type unsigned long, but by that time it's too late; the overflow has already occurred.
The overflow means that your code has undefined behavior (and since this can be determined at compile time, I'd expect your compiler to warn about it). In practice, signed overflow typically wraps around, so the above will typically set foo to -1073741824. You can't count on that (and it's not what you want anyway).
The ideal solution is to avoid the implicit conversions by ensuring that everything is of the target type in the first place:
unsigned long foo = 3UL * 1024UL * 1024UL * 1024UL;
(Strictly speaking only the first operand needs to be of type unsigned long, but it's simpler to be consistent.)
Let's look at the more general case:
int a, b, c, d; /* assume these are initialized */
unsigned long foo = a * b * c * d;
You can't add a UL suffix to a variable. If possible, you should change the declarations of a, b, c, and d so they're of type unsigned long, but perhaps there's some other reason they need to be of type int. You can add casts to explicitly convert each one to the correct type. By using casts, you can control exactly when the conversions are performed:
unsigned long foo = (unsigned long)a *
                    (unsigned long)b *
                    (unsigned long)c *
                    (unsigned long)d;
This gets a bit verbose; you might consider applying the cast only to the leftmost operand (after making sure you understand how the expression is parsed).
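For example, relying on the left-to-right grouping of * (a sketch, using the declarations from above):
/* (unsigned long)a * b groups first, so every subsequent multiplication
 * is also performed in unsigned long. */
unsigned long foo = (unsigned long)a * b * c * d;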
NOTE: This will not work:
unsigned long foo = (unsigned long)(a * b * c * d);
The cast converts the int result to unsigned long, but only after the overflow has already occurred. It merely specifies explicitly the cast that would have been performed implicitly.
Integer literals without a suffix have type int if their value can fit; in your case 3 and 1024 definitely fit. This is covered in the draft C99 standard, section 6.4.4.1 Integer constants; a quote of this section can be found in my answer to Are C macros implicitly cast?.
Next we have the multiplication, which performs the usual arithmetic conversions on its operands, but since they are all int, the result is too large to fit in a signed int, which results in overflow. This is undefined behavior, as per section 6.5 paragraph 5, which says:
If an exceptional condition occurs during the evaluation of an expression (that is, if the
result is not mathematically defined or not in the range of representable values for its
type), the behavior is undefined.
We can discover this undefined behavior empirically using clang and the -fsanitize=undefined flags (see it live) which says:
runtime error: signed integer overflow: 3145728 * 1024 cannot be represented in type 'int'
Although in two's complement this will just end up being a negative number. One way to fix this would be to use the ul suffix:
3ul * 1024ul * 1024ul * 1024ul
So why does a negative number converted to an unsigned value give a very large unsigned value? This is covered in section 6.3.1.3 Signed and unsigned integers which says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.49)
which basically means that one more than the maximum value of unsigned long is added to the negative number, which results in a very large unsigned value.
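A minimal sketch of that rule (assuming two's complement and a 64-bit unsigned long):
#include <stdio.h>

int main(void) {
    int n = -1073741824;   /* the wrapped-around product from the question */
    unsigned long u = n;   /* ULONG_MAX + 1 is added once: 2^64 - 2^30 */
    printf("%#lx\n", u);   /* prints 0xffffffffc0000000 */
    return 0;
}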
I need to do modulo 256 arithmetic in C. So can I simply do
unsigned char i;
i++;
instead of
int i;
i=(i+1)%256;
No. There is nothing that guarantees that unsigned char has eight bits. Use uint8_t from <stdint.h>, and you'll be perfectly fine. This requires an implementation that supports stdint.h: any C99-compliant compiler does, but older compilers may not provide it.
Note: unsigned arithmetic never overflows, and behaves as "modulo 2^n". Signed arithmetic overflows with undefined behavior.
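A minimal sketch of the wrap-around (assuming stdint.h is available):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t i = 255;
    i++;                          /* net effect: wraps around modulo 256 */
    printf("%u\n", (unsigned)i);  /* prints 0 */
    return 0;
}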
Yes, the behavior of both of your examples is the same. See C99 6.2.5 §9:
A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type.
unsigned char c = UCHAR_MAX;
c++;
Basically yes, there is no overflow, but not because c is of an unsigned type. There is a hidden promotion of c to int here, followed by an integer conversion from int back to unsigned char, and it is all perfectly defined.
For example,
signed char c = SCHAR_MAX;
c++;
is also not undefined behavior, because it is actually equivalent to:
c = (int) c + 1;
and the conversion from int to signed char is implementation-defined here (see C99, 6.3.1.3p3 on integer conversions). For simplicity, CHAR_BIT == 8 is assumed.
For more information on the example above, I suggest reading this post:
"The Little C Function From Hell"
http://blog.regehr.org/archives/482
Very probably yes, but the reasons for it in this case are actually fairly complicated.
unsigned char i = 255;
i++;
The i++ is equivalent to i = i + 1.
(Well, almost. i++ yields the value of i before it was incremented, so it's really equivalent to (tmp=i; i = i + 1; tmp). But since the result is discarded in this case, that doesn't raise any additional issues.)
Since unsigned char is a narrow type, an unsigned char operand to the + operator is promoted to int (assuming int can hold all possible values in the range of unsigned char). So if i == 255, and UCHAR_MAX == 255, then the result of the addition is 256, and is of type (signed) int.
The assignment implicitly converts the value 256 from int back to unsigned char. Conversion to an unsigned type is well defined; the result is reduced modulo MAX+1, where MAX is the maximum value of the target unsigned type.
If i were declared as an unsigned int:
unsigned int i = UINT_MAX;
i++;
there would be no type conversion, but the semantics of the + operator for unsigned types also specify reduction modulo MAX+1.
Keep in mind that the value assigned to i is mathematically equivalent to (i+1) % (UCHAR_MAX+1). UCHAR_MAX is usually 255, and is guaranteed to be at least 255, but it can legally be bigger.
There could be an exotic system on which UCHAR_MAX is too big to be stored in a signed int object. This would require UCHAR_MAX > INT_MAX, which means the system would have to have at least 16-bit bytes. On such a system, the promotion would be from unsigned char to unsigned int. The final result would be the same. You're not likely to encounter such a system. I think there are C implementations for some DSPs that have bytes bigger than 8 bits. The number of bits in a byte is specified by CHAR_BIT, defined in <limits.h>.
CHAR_BIT > 8 does not necessarily imply UCHAR_MAX > INT_MAX. For example, you could have CHAR_BIT == 16 and sizeof (int) == 2 (i.e., 16-bit bytes and 32-bit ints).
There's another alternative that hasn't been mentioned, if you don't want to use another data type.
unsigned int i;
// ...
i = (i+1) & 0xFF; // 0xFF == 255
This works because the modulus is a power of two (2^n), so the range will be [0, 2^n-1], and a bitmask easily keeps the value within your desired range. This method may be no less efficient than the unsigned char/uint8_t version, depending on what magic your compiler does behind the scenes and how the targeted system handles non-word loads (for example, some RISC architectures require additional operations to load non-word-size values). This also assumes that your compiler won't detect power-of-two modulo arithmetic on unsigned values and substitute a bitmask for you, of course; in such cases the modulo form has greater semantic value (though relying on that for your decision is not exactly portable, of course).
An advantage of this method is that you can use it for powers of two that are not also the size of a data type, e.g.
i = (i+1) & 0x1FF; // i %= 512
i = (i+1) & 0x3FF; // i %= 1024
// etc.
This should work fine because it will just wrap around back to 0. As was pointed out in a comment on a different answer, you should only do this with unsigned values, as you may get undefined behavior with a signed value.
It is probably best to keep using modulo, however: the code will be better understood by other people maintaining it, and a smart compiler may do this optimization anyway, which may make it pointless. Besides, the performance difference will probably be too small to matter.
It will work if the number of bits used to represent the value equals the number of bits in the binary (unsigned) representation of the divisor (100000000 for 256) minus 1, which in this case is 9 - 1 = 8 (a char).
My apologies if the question seems weird. I'm debugging my code and this seems to be the problem, but I'm not sure.
Thanks!
It depends on what you want the behaviour to be. An int cannot hold many of the values that an unsigned int can.
You can cast as usual:
int signedInt = (int) myUnsigned;
but this will cause problems if the unsigned value exceeds the maximum value an int can hold. This means half of the possible unsigned values will result in erroneous behaviour unless you specifically watch out for it.
You should probably reexamine how you store values in the first place if you're having to convert for no good reason.
EDIT: As mentioned by ProdigySim in the comments, the maximum value is platform dependent. But you can access it with INT_MAX and UINT_MAX.
For the usual 4-byte types:
4 bytes = (4*8) bits = 32 bits
If all 32 bits are used, as in unsigned, the maximum value will be 2^32 - 1, or 4,294,967,295.
A signed int effectively sacrifices one bit for the sign, so the maximum value will be 2^31 - 1, or 2,147,483,647. Note that this is half of the other value.
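You can print the actual limits for your platform (a minimal sketch; the values in the comments assume a 32-bit int):
#include <limits.h>
#include <stdio.h>

int main(void) {
    printf("INT_MAX  = %d\n", INT_MAX);    /* 2147483647 with 32-bit int */
    printf("UINT_MAX = %u\n", UINT_MAX);   /* 4294967295 with 32-bit int */
    return 0;
}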
An unsigned int can be converted to signed (or vice versa) with a simple cast, as shown below:
unsigned int z;
int y=5;
z= (unsigned int)y;
Though not directly targeted at the question, you might like to read the following links:
signed to unsigned conversion in C - is it always safe?
performance of unsigned vs signed integers
Unsigned and signed values in C
What type-conversions are happening?
IMHO this question is an evergreen. As stated in various answers, the conversion of an unsigned value that is not in the range [0, INT_MAX] is implementation-defined and might even raise a signal. If the unsigned value is considered to be a two's complement representation of a signed number, the most portable way is IMHO the one shown in the following code snippet:
#include <limits.h>

unsigned int u;
int i;

if (u <= (unsigned int)INT_MAX)
    i = (int)u;          /* (1) */
else if (u >= (unsigned int)INT_MIN)
    i = -(int)~u - 1;    /* (2) */
else
    i = INT_MIN;         /* (3) */
Branch (1) is obvious and cannot invoke overflow or traps, since it is value-preserving.
Branch (2) goes through some pains to avoid signed integer overflow: it takes the one's complement of the value by bitwise NOT, casts it to int (which cannot overflow now), negates the value, and subtracts one, which also cannot overflow here.
Branch (3) provides the poison we have to take on one's complement or sign/magnitude targets, because there the signed integer representation range is smaller than the two's complement representation range.
This is likely to boil down to a simple move on a two's complement target; at least I've observed such with GCC and CLANG. Also branch (3) is unreachable on such a target -- if one wants to limit the execution to two's complement targets, the code could be condensed to
#include <limits.h>

unsigned int u;
int i;

if (u <= (unsigned int)INT_MAX)
    i = (int)u;          /* (1) */
else
    i = -(int)~u - 1;    /* (2) */
The recipe works with any signed/unsigned type pair, and the code is best put into a macro or inline function so the compiler/optimizer can sort it out. (Rewriting the recipe with a ternary operator helps there, but it's less readable and therefore not as good a way to explain the strategy.)
And yes, some of the casts to unsigned int are redundant, but they might help the casual reader, and some compilers issue warnings on signed/unsigned compares, because the implicit conversion causes some non-intuitive behavior by language design.
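As a sketch of that macro/inline suggestion (the helper name u2i is invented here; this uses the condensed variant, so it assumes a two's complement target):
#include <limits.h>

/* Reinterpret the bits of an unsigned int as a two's complement signed
 * value. Hypothetical helper; valid only on two's complement targets. */
static inline int u2i(unsigned int u)
{
    return (u <= (unsigned int)INT_MAX) ? (int)u : -(int)~u - 1;
}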
If you have a variable unsigned int x;, you can convert it to an int using (int)x.
It's as simple as this:
unsigned int foo;
int bar = 10;
foo = (unsigned int)bar;
Or vice versa...
If an unsigned int and a (signed) int are used in the same expression, the signed int gets implicitly converted to unsigned. This is a rather dangerous feature of the C language, and one you therefore need to be aware of. It may or may not be the cause of your bug. If you want a more detailed answer, you'll have to post some code.
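A classic example of this danger (a minimal sketch; compilers with signed/unsigned comparison warnings will flag the test):
#include <stdio.h>

int main(void) {
    int s = -1;
    unsigned int u = 1;
    /* s is converted to unsigned int, becoming UINT_MAX, so s < u is false. */
    if (s < u)
        printf("-1 < 1u, as expected\n");
    else
        printf("surprise: -1 >= 1u\n");   /* this branch is taken */
    return 0;
}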
Some explanation from C++ Primer, 5th edition, page 35:
If we assign an out-of-range value to an object of unsigned type, the result is the remainder of the value modulo the number of values the target type can hold.
For example, an 8-bit unsigned char can hold values from 0 through 255, inclusive. If we assign a value outside the range, the compiler assigns the remainder of that value modulo 256.
unsigned char c = -1; // assuming 8-bit chars, c has value 255
If we assign an out-of-range value to an object of signed type, the result is undefined. The program might appear to work, it might crash, or it might produce garbage values.
Page 160:
If any operand is an unsigned type, the type to which the operands are converted depends on the relative sizes of the integral types on the machine.
...
When the signedness differs and the type of the unsigned operand is the same as or larger than that of the signed operand, the signed operand is converted to unsigned.
The remaining case is when the signed operand has a larger type than the unsigned operand. In this case, the result is machine-dependent. If all values in the unsigned type fit in the larger type, then the unsigned operand is converted to the signed type. If the values don't fit, then the signed operand is converted to the unsigned type.
For example, if the operands are long and unsigned int, and int and long have the same size, the long will be converted to unsigned int. If the long type has more bits, then the unsigned int will be converted to long.
I found reading this book very helpful.
I'm trying to figure out if the C Standard (C90, though I'm working off Derek Jones' annotated C99 book) guarantees that I will not lose precision multiplying two unsigned 8-bit values and storing to a 16-bit result. An example statement is as follows:
unsigned char foo;
unsigned int foo_u16 = foo * 10;
Our Keil 8051 compiler (v7.50 at present) will generate a MUL AB instruction, which stores the MSB in the B register and the LSB in the accumulator. If I cast foo to unsigned int first:
unsigned int foo_u16 = (unsigned int)foo * 10;
then the compiler correctly decides I want an unsigned int there and generates an expensive call to a 16x16-bit integer multiply routine. I would like to argue, beyond reasonable doubt, that this defensive measure is not necessary. As I read the integer promotions described in 6.3.1.1, the effect of the first line shall be as if foo and 10 were promoted to unsigned int, the multiplication performed, and the result stored as unsigned int in foo_u16. If the compiler knows an instruction that does 8x8->16-bit multiplication without loss of precision, so much the better; but the precision is guaranteed. Am I reading this correctly?
Best regards,
Craig Blome
The promotion is guaranteed, but the promotion is to the signed int type if the range of unsigned char fits into the range of signed int. So (assuming it fits), from the language point of view, your
unsigned int foo_u16 = foo * 10;
is equivalent to
unsigned int foo_u16 = (signed) foo * 10;
while what you apparently want is
unsigned int foo_u16 = (unsigned) foo * 10;
The result of the multiplication can be different if it (the result) doesn't fit into the signed int range.
If your compiler interprets it differently, it could be a bug in the compiler (again, under the assumption that the range of unsigned char fits into the range of signed int).
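To see when the difference matters, here is a sketch (the values are hypothetical; the first multiplication has undefined behavior on a 16-bit-int target such as the 8051, while both lines behave identically when int is 32 bits wide):
#include <stdio.h>

int main(void) {
    unsigned char foo = 255;
    /* foo is promoted to signed int; 255 * 255 == 65025 exceeds a 16-bit
     * INT_MAX of 32767, which is signed overflow: undefined behavior. */
    unsigned int bad = foo * foo;
    /* Casting one operand first keeps the whole multiplication unsigned. */
    unsigned int good = (unsigned int)foo * foo;
    printf("%u %u\n", bad, good);   /* 65025 65025 where int is 32 bits */
    return 0;
}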