So I have a concern about this code:
uint32_t s1 = 0xFFFFFFFFU;
uint32_t s2 = 0xFFFFFFFFU;
uint32_t v;
...
v = s1 * s2; /* Only need the low 32 bits of the result */
In all that follows, I assume the compiler has no preconceptions about the range of s1 or s2; the initializers above serve only as an example.
If I compiled this on a compiler with an integer size of 32 bits (such as when compiling for x86), no problem. The compiler would simply use s1 and s2 as uint32_t typed values (not being able to promote them further), and the multiplication would give exactly the result the comment describes (modulo UINT_MAX + 1, which is 0x100000000 in this case).
However, if I compiled this on a compiler with an integer size of 64 bits (such as for x86-64), there might be undefined behavior from what I can deduce from the C standard. Integer promotion would promote uint32_t to int (64-bit signed), the multiplication would then be performed on two ints, and if they happened to have the values shown in the example, that would cause a signed integer overflow, which is undefined behavior.
Am I correct about this, and if so, how would you avoid it in a sane way?
I spotted this similar question, but it covers C++: What's the best C++ way to multiply unsigned integers modularly safely?. Here I would like an answer applicable to C (preferably C89 compatible). I wouldn't consider it an acceptable answer to make a poor 32-bit machine potentially execute a 64-bit multiply, though (usually in code where this is a concern, 32-bit performance is more critical, as those are typically the slower machines).
Note that the same problem can apply to 16 bit unsigned ints when compiled with a compiler having a 32 bit int size, or unsigned chars when compiled with a compiler having a 16 bit int size (the latter might be common with compilers for 8 bit CPUs: the C standard requires integers to be at least 16 bits, so a conforming compiler is likely affected).
The simplest way to get the multiplication to happen in an unsigned type that is at least uint32_t, and also at least unsigned int, is to involve an expression of type unsigned int.
v = 1U * s1 * s2;
This either converts 1U to uint32_t, or s1 and s2 to unsigned int, depending on what's appropriate for your particular platform.
@Deduplicator comments that some compilers, where uint32_t is narrower than unsigned int, may warn about the implicit conversion in the assignment, and notes that such warnings can likely be suppressed by making the conversion explicit:
v = (uint32_t) (1U * s1 * s2);
It looks a bit less elegant, in my opinion, though.
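To see the trick in action, here is a minimal sketch of a test program (hypothetical, and it uses <stdint.h>, which is a C99 addition; on a strictly C89 setup you would substitute your own 32-bit typedef):
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t s1 = 0xFFFFFFFFU;
    uint32_t s2 = 0xFFFFFFFFU;
    uint32_t v;

    /* 1U drags the arithmetic into an unsigned type at least as wide
       as unsigned int; converting back to uint32_t reduces the result
       modulo 2^32, so no signed overflow can occur on any platform. */
    v = (uint32_t)(1U * s1 * s2);

    /* (2^32 - 1)^2 mod 2^32 == 1 */
    printf("v = 0x%08lX\n", (unsigned long)v);
    return 0;
}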
Congratulations on finding a friction point.
A possible way:
v = (uint32_t) (UINT_MAX <= 0xffffffff
                ? s1 * s2
                : (unsigned)s1 * (unsigned)s2);
Anyway, looks like adding some typedefs to <stdint.h> for types guaranteed to be no smaller than int would be in order ;-).
Related
Take this code:
unsigned short a = 60000;
unsigned short b = 60000;
unsigned long c = a + b;
unsigned long d = a * 10;
With my compiler, c is 54464 (truncated to 16 bits) and d is 10176. But with gcc, c is 120000 and d is 600000. What is the true behavior? Is the behavior undefined, or is my compiler wrong?
Is there an option to alert on these cases?
-Wconversion warns on:
void foo(unsigned long a);
foo(a+b);
but doesn't warn on:
unsigned long c = a + b;
First, you should know that in C the standard integer types do not have a fixed precision (number of representable values); the standard only requires a minimal precision for each type. This results in the following minimum bit sizes (the standard also allows for more complex representations):
char: 8 bits
short: 16 bits
int: 16 (!) bits
long: 32 bits
long long (since C99): 64 bits
Note: The actual limits (which imply a certain precision) of an implementation are given in limits.h.
Second, the type an operation is performed in is determined by the types of its operands, not by the type of the left side of an assignment (because assignments are also just expressions). The types given above are sorted by conversion rank. Operands with smaller rank than int are converted to int first. For other operands, the one with smaller rank is converted to the type of the other operand. These are the usual arithmetic conversions.
Your implementation seems to use a 16-bit unsigned int, the same size as unsigned short, so a and b are converted to unsigned int and the operation is performed with 16 bits. For unsigned types, the operation is performed modulo 65536 (2 to the power of 16); this is called wrap-around (it is not guaranteed for signed types!). The result is then converted to unsigned long and assigned to the variables.
For gcc, I assume this compiles for a PC or a 32-bit CPU. There, (unsigned) int typically has 32 bits, while (unsigned) long has at least 32 bits (as required). So there is no wrap-around for the operations.
Note: On the PC, the operands are converted to int, not unsigned int, because int can already represent all values of unsigned short; unsigned int is not required. This can result in unexpected (actually: undefined) behaviour if the result of the operation overflows a signed int!
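To make that hazard concrete, here is a small, hypothetical illustration for a platform with 32-bit int:
#include <stdio.h>

int main(void)
{
    unsigned short a = 60000, b = 60000;
    unsigned long good;

    /* DON'T: a * b promotes both operands to (signed) int on a
       32-bit-int platform, and 60000 * 60000 = 3600000000 exceeds
       INT_MAX, which is undefined behaviour:
           unsigned long bad = a * b;                              */

    /* DO: widen one operand first, so the multiplication is
       carried out entirely in unsigned long.                      */
    good = (unsigned long)a * b;

    printf("%lu\n", good); /* prints 3600000000 */
    return 0;
}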
If you need types of defined size, see stdint.h (since C99) for uint16_t, uint32_t. These are typedefs to types with the appropriate size for your implementation.
You can also cast one of the operands (not the whole expression!) to the type of the result:
unsigned long c = (unsigned long)a + b;
or, using types of known size:
#include <stdint.h>
...
uint16_t a = 60000, b = 60000;
uint32_t c = (uint32_t)a + b;
Note that due to the conversion rules, casting one operand is sufficient.
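A quick sketch of why the placement of the cast matters (the 54464 result assumes a 16-bit int, as on the asker's platform):
#include <stdio.h>

int main(void)
{
    unsigned short a = 60000, b = 60000;
    unsigned long late, early;

    /* Casting the whole expression is too late: the sum has already
       been computed (and possibly wrapped) in the narrower type. */
    late = (unsigned long)(a + b);

    /* Casting one operand makes the usual arithmetic conversions
       widen the other, so the sum is computed in unsigned long. */
    early = (unsigned long)a + b;

    printf("late:  %lu\n", late);  /* 54464 with 16-bit int */
    printf("early: %lu\n", early); /* 120000 everywhere */
    return 0;
}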
Update (thanks to @chux):
The cast shown above works without problems. However, if a has a larger conversion rank than the type named in the cast, the cast might truncate its value to the smaller type. While this can easily be avoided, as all types are known at compile time (static typing), an alternative is to multiply by 1 of the wanted type:
unsigned long c = ((unsigned long)1U * a) + b;
This way the larger rank of the type given in the cast or a (or b) is used. The multiplication will be eliminated by any reasonable compiler.
Another approach, which avoids even having to know the name of the target type, uses the typeof() gcc extension:
unsigned long c;
... many lines of code
c = ((typeof(c))1U * a) + b;
a + b will be computed as an unsigned int (the fact that it is assigned to an unsigned long is not relevant). The C standard mandates that this sum will wrap around modulo "one plus the largest unsigned possible". On your system, it looks like an unsigned int is 16 bit, so the result is computed modulo 65536.
On the other system, it looks like int and unsigned int are larger, and therefore capable of holding the larger numbers. What happens now is quite subtle (acknowledgement to @PascalCuoq): because all values of unsigned short are representable in int, a + b will be computed as an int. (Only if short and int are the same width, or if in some other way some values of unsigned short cannot be represented as int, will the sum be computed as unsigned int.)
Although the C standard does not specify a fixed size for either an unsigned short or an unsigned int, your program's behaviour is well-defined. Note that this is not true for a signed type, though.
As a final remark, you can use the sized types uint16_t, uint32_t etc. which, if supported by your compiler, are guaranteed to have the specified size.
In C, the types char, short (and their unsigned counterparts) and float should be considered "storage" types: they are designed to optimize storage, but they are not the "native" size the CPU prefers, and they are not used for computations.
For example, when you have two char values in an expression, they are first converted to int, and then the operation is performed; the reason is that the CPU works better with int. The same happens with float, which is implicitly converted to double for computations.
In your code the computation a+b is a sum of two unsigned integers; in C there is no way to compute the sum of two unsigned shorts directly. What you can do is store the final result in an unsigned short, which, thanks to the properties of modular arithmetic, will hold the same low-order result.
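A small, hypothetical demonstration of that last point, assuming USHRT_MAX == 65535: the sum is computed in the promoted type, but storing it back into an unsigned short reduces it modulo 65536, recovering the wrapped result:
#include <stdio.h>

int main(void)
{
    unsigned short a = 60000, b = 60000;
    unsigned short s;
    unsigned long wide;

    wide = (unsigned long)a + b; /* full result: 120000 */
    s = a + b;                   /* stored modulo 65536 */

    printf("wide:  %lu\n", wide);       /* 120000 */
    printf("short: %u\n", (unsigned)s); /* 54464 == 120000 % 65536 */
    return 0;
}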
This is just basic stuff, but I'm still confused by it.
My code consists of a line
signed char var = 0x80;
unsigned short temp_val;
temp_val = (unsigned short)var;
When I compiled this code using XC32 compiler and executed the program the result obtained was 0xFF80 as expected.
But when compiled using the CCS compiler for MSP430, the result obtained was 0x0080.
Why this difference between compilers?
Can somebody please explain this from a processor's view of type casting?
This is an out-of-range conversion, because 0x80 equals 128 and, if CHAR_BIT is 8 as you say, signed char only goes up to 127:
signed char var = 0x80;
Converting a value that does not fit into a signed type gives an implementation-defined result (C99 also allows an implementation-defined signal to be raised), so two compilers may legitimately differ here.
Try with signed char var = -128 instead.
If that produces the same result, it's a compiler bug. Converting a negative number to an unsigned type adds the corresponding modulus (one greater than the maximum value of that type) until the value is in range, so -128 becomes 65536 - 128 = 65408 = 0xFF80 for a 16-bit unsigned short.
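As a sanity check, a minimal sketch of what a conforming compiler should print (assuming a two's complement 8-bit char and a 16-bit unsigned short):
#include <stdio.h>

int main(void)
{
    signed char var = -128;
    unsigned short temp_val;

    /* -128 is brought into range by adding USHRT_MAX + 1 (65536):
       -128 + 65536 = 65408 = 0xFF80. */
    temp_val = (unsigned short)var;

    printf("0x%04X\n", (unsigned)temp_val); /* 0xFF80 */
    return 0;
}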
That (the 0x0080 result) will happen if char has more than 8 bits, which it is certainly allowed to have. (It can't have fewer.) It would also happen if short turned out to be only 8 bits, which wouldn't conform to standard C, but the CCS compiler only claims to be "97%" compliant.
The size of the various integer datatypes is conditioned by the processor, but ultimately it is defined by the compiler, which might even have options to change the sizes.
I don't know anything about the MSP430, but Google told me it is a 16-bit chip. It is possible that its memory is not byte-addressable, in which case a char would have to occupy the full 16 bits. (Edit: apparently, it is byte-addressable, so scratch that theory.)
To the compiler, a char can have various widths depending on the machine and compiler settings: 8 bits (what most people assume it is), 16 bits (getting more common, especially on DSPs and with Unicode), and I have even come across machine/compiler combinations that stored each char in a 64-bit location. (The standard requires CHAR_BIT to be at least 8, so a narrower char would not conform.)
This sort of discrepancy between compilers are why MISRA C has rules against:
Base Types
Bit operations on signed types
Just because the chip is byte-addressable doesn't mean any given compiler uses it that way by default. Byte operations on systems with 16-bit (or bigger) storage can incur a processing overhead, so some compilers default to memory-size operations to optimise for speed rather than storage.
I would suggest searching the compilers documentation for pack.
A few years ago the default Solaris compiler used a 64 bit location for each field in a bit field - luckily for us Solaris gcc didn't so we were able to complete the project.
It would also be interesting to try:
signed char var = 0x80;
signed short temp_val1;
unsigned short temp_val2;
temp_val1 = (signed short)var;
temp_val2 = (unsigned short)temp_val1;
This might highlight problems with sign extending in that compiler.
I'm trying to do what I imagined to be a fairly basic task. I have two unsigned char variables and I'm trying to combine them into a single signed int. The problem here is that the unsigned chars start as signed chars, so I have to cast them to unsigned first.
I've done this task in three IDEs (MPLAB, as this is an embedded application; MATLAB; and now Visual Studio). Visual Studio is the only one having problems with the casting.
For an example, two signed chars are -5 and 94. In MPLAB I first cast the two chars into unsigned chars:
unsigned char a = (unsigned char)-5;
unsigned char b = (unsigned char)94;
This gives me 251 and 94 respectively. I then want to do some bitshifting and concat:
int c = (int)((((unsigned int) a) << 8) | (unsigned int) b);
In MPLAB and MATLAB this gives me the right signed value of -1186. However, the exact same code in Visual Studio refuses to output the result as a signed value, only unsigned (64350). I checked this both by stepping through the code in the debugger and by printing the result:
printf("%d\n", c);
What am I doing wrong? This is driving me insane. The application is an electronic device that collects sensor data, then stores it on an SD card for later decoding using a program written in C. I technically could do all the calculations in MPLAB and then store those on the SDCARD, but I refuse to let Microsoft win.
I understand my method of casting is very unoptimised and you could probably do it in one line, but having had this problem for a couple of days now I've tried to break the steps down as much as possible.
Any help is most appreciated!
The problem is that an int on most systems is 32 bits. If you concatenate two 8-bit quantities and store the result in a 32-bit quantity, you get a positive integer because you are not setting the sign bit, which is the most significant bit. More specifically, you are only populating the lower 16 bits of a 32-bit integer, which is naturally interpreted as a positive number.
You can fix this by explicitly using a 16-bit signed int.
#include <stdio.h>
#include <stdint.h>
int main() {
    unsigned char a = (unsigned char)-5;
    unsigned char b = (unsigned char)94;
    int16_t c = (int16_t)((((unsigned int) a) << 8) | (unsigned int) b);
    printf("%d\n", c);
}
Note that I am on a Linux system, so you will probably have to change stdint.h to the Microsoft equivalent, and possibly change int16_t to whatever Microsoft calls their 16-bit signed integer type, if it is different, but this should work with those modifications.
This is the correct behavior of standard C. When you convert an unsigned value to a wider signed type, the language does not perform sign extension, i.e. it does not propagate the highest bit of the unsigned value into the sign bit of the signed type.
You can fix your problem by casting a to a signed char, like this:
unsigned char a = (unsigned char)-5;
unsigned char b = (unsigned char)94;
int c = (signed char)a << 8 | b;
printf("%d\n", c); // Prints -1186
Now that a is treated as signed, the language propagates its top bit into the sign bit of the 32-bit int, making the result negative.
Demo on ideone.
Converting an out-of-range unsigned value to a signed value causes implementation-defined behaviour, which means that the compiler must document what it does in this situation; and different compilers can do different things.
In C99 there is also a provision that the compiler may instead raise a signal in this case (terminating the program if you don't have a signal handler). The result was implementation-defined in C89 as well; the signal provision is what C99 added.
Is there some reason you can't go:
signed char x = -5;
signed char y = 94;
int c = x * 256 + y;
?
BTW, if you are OK with implementation-defined behaviour, and your system has a 16-bit type, then you can just write (with #include <stdint.h>):
int c = (int16_t)(x * 256 + y);
To explain: C deals in values. In math, 251 * 256 + 94 is a positive number, and C is no exception. The bit-shift operators are just multiplication and division by powers of two in disguise. If you want your value to be reduced (mod 65536), you have to specifically request that.
If you also think in terms of values rather than representations, you don't have to worry about things like sign bits and sign extension.
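For completeness, here is a sketch of a fully portable, value-based reduction that avoids the implementation-defined conversion entirely (it assumes 8-bit chars, so a holds 251):
#include <stdio.h>

int main(void)
{
    unsigned char a = (unsigned char)-5; /* 251 with 8-bit char */
    unsigned char b = 94;
    long v, c;

    v = (long)a * 256 + b; /* 64350, always non-negative */

    /* Reduce modulo 65536 into [-32768, 32767] by value, with no
       implementation-defined signed conversion involved. */
    c = (v >= 32768L) ? v - 65536L : v;

    printf("%ld\n", c); /* -1186 */
    return 0;
}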
I need to do modulo 256 arithmetic in C. So can I simply do
unsigned char i;
i++;
instead of
int i;
i=(i+1)%256;
No. There is nothing that guarantees that unsigned char has eight bits. Use uint8_t from <stdint.h>, and you'll be perfectly fine. This requires an implementation which supports stdint.h: any C99 compliant compiler does, but older compilers may not provide it.
Note: unsigned arithmetic never overflows, and behaves as "modulo 2^n". Signed arithmetic overflows with undefined behavior.
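A minimal demonstration of the wrap-around (uint8_t needs a C99 <stdint.h>):
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t i = 255;

    i++; /* unsigned arithmetic wraps: 255 + 1 becomes 0 */

    printf("%u\n", (unsigned)i); /* prints 0 */
    return 0;
}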
Yes, the behavior of both of your examples is the same. See C99 6.2.5 §9 :
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
unsigned char c = UCHAR_MAX;
c++;
Basically yes, there is no overflow, but not because c is of an unsigned type. There is a hidden promotion of c to int here, and an integer conversion from int back to unsigned char, and it is perfectly defined.
For example,
signed char c = SCHAR_MAX;
c++;
is also not undefined behavior, because it is actually equivalent to:
c = (int) c + 1;
and the conversion from int to signed char is implementation-defined here (see C99 6.3.1.3p3 on integer conversions). To simplify, CHAR_BIT == 8 is assumed.
For more information on the example above, I suggest reading this post:
"The Little C Function From Hell"
http://blog.regehr.org/archives/482
Very probably yes, but the reasons for it in this case are actually fairly complicated.
unsigned char i = 255;
i++;
The i++ is equivalent to i = i + 1.
(Well, almost. i++ yields the value of i before it was incremented, so it's really equivalent to (tmp=i; i = i + 1; tmp). But since the result is discarded in this case, that doesn't raise any additional issues.)
Since unsigned char is a narrow type, an unsigned char operand to the + operator is promoted to int (assuming int can hold all possible values in the range of unsigned char). So if i == 255, and UCHAR_MAX == 255, then the result of the addition is 256, and is of type (signed) int.
The assignment implicitly converts the value 256 from int back to unsigned char. Conversion to an unsigned type is well defined; the result is reduced modulo MAX+1, where MAX is the maximum value of the target unsigned type.
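A small sketch that makes the promote-then-convert sequence visible (assuming UCHAR_MAX == 255):
#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned char i = UCHAR_MAX;
    int promoted;

    promoted = i + 1; /* computed in int: no wrap yet */
    i = i + 1;        /* same sum, then reduced modulo UCHAR_MAX + 1 */

    printf("promoted: %d\n", promoted);    /* 256 when UCHAR_MAX == 255 */
    printf("i:        %u\n", (unsigned)i); /* 0 */
    return 0;
}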
If i were declared as an unsigned int:
unsigned int i = UINT_MAX;
i++;
there would be no type conversion, but the semantics of the + operator for unsigned types also specify reduction modulo MAX+1.
Keep in mind that the value assigned to i is mathematically equivalent to (i+1) % (UCHAR_MAX+1). UCHAR_MAX is usually 255, and is guaranteed to be at least 255, but it can legally be bigger.
There could be an exotic system on which UCHAR_MAX is too big to be stored in a signed int object. This would require UCHAR_MAX > INT_MAX, which means the system would have to have at least 16-bit bytes. On such a system, the promotion would be from unsigned char to unsigned int. The final result would be the same. You're not likely to encounter such a system, though I think there are C implementations for some DSPs that have bytes bigger than 8 bits. The number of bits in a byte is specified by CHAR_BIT, defined in <limits.h>.
(Note that CHAR_BIT > 8 does not necessarily imply UCHAR_MAX > INT_MAX. For example, you could have CHAR_BIT == 16 and sizeof (int) == 2, i.e., 16-bit bytes and 32-bit ints.)
There's another alternative that hasn't been mentioned, if you don't want to use another data type.
unsigned int i;
// ...
i = (i+1) & 0xFF; // 0xFF == 255
This works because the modulus is a power of two (2^n), meaning the range will be [0, 2^n - 1], so a bitmask easily keeps the value within your desired range. This method may well be no less efficient than the unsigned char/uint8_t version, depending on what magic your compiler does behind the scenes and how the targeted system handles non-word loads (for example, some RISC architectures require additional operations to load non-word-size values). It also assumes that your compiler won't detect power-of-two modulo arithmetic on unsigned values and substitute a bitmask for you; in such cases the modulo would have greater semantic value (though using that as the basis for your decision is not exactly portable, of course).
An advantage of this method is that you can use it for powers of two that are not also the size of a data type, e.g.
i = (i+1) & 0x1FF; // i %= 512
i = (i+1) & 0x3FF; // i %= 1024
// etc.
This should work fine because it will simply wrap around back to 0. As was pointed out in a comment on a different answer, you should only do this when the value is unsigned, as you may get undefined behavior with a signed value.
It is probably best to stick with modulo, however, because the code will be better understood by other people maintaining it, and a smart compiler may do this optimization anyway, making the bitmask pointless. Besides, the performance difference will probably be too small to matter.
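A quick sketch confirming that, for unsigned operands, the bitmask and the modulo agree across the wrap point:
#include <stdio.h>

int main(void)
{
    unsigned int i;
    unsigned int masked, modded;

    for (i = 250; i < 260; i++) {
        masked = (i + 1) & 0xFF; /* bitmask version */
        modded = (i + 1) % 256;  /* modulo version */
        printf("%u %u\n", masked, modded); /* always the same pair */
    }
    return 0;
}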
It will work if the number of bits used to represent the number equals the number of bits in the binary (unsigned) representation of the divisor (256 is 100000000 in binary, nine bits) minus one, which in this case is 9 - 1 = 8 bits (a char).
I have a lot of code that performs bitwise operations on unsigned integers. I wrote my code with the assumption that those operations were on integers of fixed width without any padding bits. For example an array of 32-bit unsigned integers of which all 32 bits available for each integer.
I'm looking to make my code more portable and I'm focused on making sure I'm C89 compliant (in this case). One of the issues that I've come across is the possibility of padded integers. Take this extreme example, taken from the GMP manual:
However on Cray vector systems it may be noted that short and int are always stored in 8 bytes (and with sizeof indicating that) but use only 32 or 46 bits. The nails feature can account for this, by passing for instance 8*sizeof(int)-INT_BIT.
I've also read about this type of padding in other places. I actually read a post on SO last night (forgive me, I don't have the link and I'm citing something similar from memory) where, if you have, say, a double with 60 usable bits, the other 4 could be used for padding, and those padding bits could serve some internal purpose so they cannot be modified.
So let's say, for example, that my code is compiled on a platform where an unsigned int is sized at 4 bytes, each byte being 8 bits, but the most significant 2 bits are padding bits. Would UINT_MAX in that case be 0x3FFFFFFF (1073741823)?
#include <stdio.h>
#include <stdlib.h>

/* padding bits represented by underscores */
int main( int argc, char **argv )
{
    unsigned int a = 0x2AAAAAAA; /* __101010101010101010101010101010 */
    unsigned int b = 0x15555555; /* __010101010101010101010101010101 */
    unsigned int c = a ^ b;      /* ?? __111111111111111111111111111111 */
    unsigned int d = c << 5;     /* ?? __111111111111111111111111100000 */
    unsigned int e = d >> 5;     /* ?? __000001111111111111111111111111 */
    printf( "a: %X\nb: %X\nc: %X\nd: %X\ne: %X\n", a, b, c, d, e );
    return 0;
}
Is it safe to XOR two integers with padding bits?
Wouldn't I XOR whatever the padding bits are?
I can't find this behavior covered in C89.
Furthermore, is the c variable guaranteed to be 0x3FFFFFFF? Or, if for example the two padding bits were both set in a or b, would c be 0xFFFFFFFF?
Same question with d and e. Am I manipulating the padding bits by shifting?
I would expect to see this below, assuming 32 bits with the 2 most significant bits used for padding, but I want to know if something like this is guaranteed:
a: 2AAAAAAA
b: 15555555
c: 3FFFFFFF
d: 3FFFFFE0
e: 01FFFFFF
Also are padding bits always the most significant bits or could they be the least significant bits?
EDIT 12/19/2010 5PM EST: Christoph has answered my question. Thanks!
I had also asked (above) whether padding bits are always the most significant bits. This is cited in the rationale for the C99 standard, and the answer is no. I am playing it safe and assuming the same for C89. Here is specifically what the C99 rationale says for §6.2.6.2 (Representation of Integer Types):
Padding bits are user-accessible in an unsigned integer type. For example, suppose a machine uses a pair of 16-bit shorts (each with its own sign bit) to make up a 32-bit int and the sign bit of the lower short is ignored when used in this 32-bit int. Then, as a 32-bit signed int, there is a padding bit (in the middle of the 32 bits) that is ignored in determining the value of the 32-bit signed int. But, if this 32-bit item is treated as a 32-bit unsigned int, then that padding bit is visible to the user’s program. The C committee was told that there is a machine that works this way, and that is one reason that padding bits were added to C99.
Footnotes 44 and 45 mention that parity bits might be padding bits. The committee does not know of any machines with user-accessible parity bits within an integer. Therefore, the committee is not aware of any machines that treat parity bits as padding bits.
EDIT 12/28/2010 3PM EST: I found an interesting discussion on comp.lang.c from a few months ago.
Bitwise Operator Effects on Padding Bits (VelocityReviews reader)
Bitwise Operator Effects on Padding Bits (Google Groups alternate link)
One point made by Dietmar which I found interesting:
Let's note that padding bits are not necessary for the existence of trap representations; combinations of value bits which do not represent a value of the object type would also do.
Bitwise operations (like arithmetic operations) operate on values and ignore padding. The implementation may or may not modify padding bits (or use them internally, e.g. as parity bits), but portable C code will never be able to detect this. Any value (including UINT_MAX) will not include the padding.
Where integer padding might lead to problems is if you use things like sizeof (int) * CHAR_BIT and then try to use shifts to access all of these bits. If you want to be portable, either use only (unsigned) char, fixed-size integers (a C99 addition), or determine the number of value bits programmatically. This can be done at compile time with the preprocessor by comparing UINT_MAX against powers of 2, or at runtime by using bit operations.
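A sketch of the runtime variant (the function name is illustrative): count the value bits of unsigned int by shifting through UINT_MAX, which by definition contains no padding:
#include <stdio.h>
#include <limits.h>

/* Count the value bits of unsigned int at runtime; padding bits,
   if any, never show up in UINT_MAX. */
static int uint_value_bits(void)
{
    int bits = 0;
    unsigned int max = UINT_MAX;

    while (max != 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}

int main(void)
{
    /* On an implementation with padding, these two numbers differ. */
    printf("value bits:             %d\n", uint_value_bits());
    printf("sizeof(int) * CHAR_BIT: %d\n", (int)(sizeof(int) * CHAR_BIT));
    return 0;
}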
edit:
C90 does not mention integer padding at all, but as far as I can tell, 'invisible' leading or trailing integer padding bits shouldn't violate the standard (I didn't go through all relevant sections to make sure this is really the case, though); there probably are problems with mixed padding and value bits, as mentioned in the C99 rationale, because otherwise the standard would not have needed to be changed.
As to the meaning of user-accessible: padding bits are accessible insofar as you can always get at any bit of foo (including padding) by using bit operations on ((unsigned char *)&foo)[…]. Be careful when modifying the padding bits, though: the result won't change the value of the integer, but might create a trap representation nevertheless. In the case of C90, this is implicitly unspecified (as in: not mentioned at all); in the case of C99, it's implementation-defined.
This was not what the rationale quotation was about, though: the cited architecture represents 32-bit integers via two 16-bit integers. In case of unsigned types, the resulting integer has 32 value bits and a precision of 32; in case of signed integers, it only has 31 value bits and a precision of 30: one of the sign bits of the 16-bit integers is used as the sign bit of the 32-bit integer, the other one is ignored, thus creating a padding bit surrounded by value bits. Now, if you access a 32-bit signed integer as an unsigned integer (which is explicitly allowed and does not violate the C99 aliasing rules), the padding bit becomes a (user-accessible) value bit.
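To round this off, here is a sketch of the unsigned char access described above: dumping the object representation of an unsigned int byte by byte (the output naturally depends on endianness and on whether the implementation has padding bits at all):
#include <stdio.h>

int main(void)
{
    unsigned int foo = 0x3FFFFFFFU;
    unsigned char *p = (unsigned char *)&foo;
    size_t n;

    /* Bit operations on unsigned char are the one portable way to
       inspect every bit of foo, padding included. */
    for (n = 0; n < sizeof foo; n++)
        printf("byte %u: 0x%02X\n", (unsigned)n, (unsigned)p[n]);

    return 0;
}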