Store signed 32-bit in unsigned 64-bit int - c

Basically, what I want is to "store" a signed 32-bit int inside (in the 32 rightmost bits) an unsigned 64-bit int - since I want to use the leftmost 32 bits for other purposes.
What I'm doing right now is a simple cast and mask:
#define packInt32(X) ((uint64_t)X | INT_MASK)
But this approach has an obvious issue: If X is a positive int (the first bit is not set), everything goes fine. If it's negative, it becomes messy.
The question is:
How to achieve the above, also supporting negative numbers, in the fastest and most-efficient way?

The "mess" you mention happens because you cast a small signed type to a large unsigned type.
During this conversion the value is first widened to 64 bits with sign extension, and only then treated as unsigned. For a negative input that sets all of the upper 32 bits, which is what causes your trouble.
You can simply cast the (signed) integer to an unsigned type of same size first. Then casting to 64 bit will not trigger sign extension:
#define packInt32(X) ((uint64_t)(uint32_t)(X) | INT_MASK)
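A minimal sketch of the same idea as functions, with the upper half carrying an unrelated 32-bit value. The names pack_pair, unpack_low and unpack_high are hypothetical and not from the question. Converting an out-of-range uint32_t back to int32_t is implementation-defined before C23, although mainstream compilers wrap as expected.

```c
#include <stdint.h>

/* Pack an unrelated 32-bit value into the high half and a signed
   32-bit value into the low half of one uint64_t. */
static inline uint64_t pack_pair(uint32_t high, int32_t low)
{
    /* (uint32_t)low keeps the low 32 bits and prevents sign extension. */
    return ((uint64_t)high << 32) | (uint32_t)low;
}

/* Recover the signed low half. */
static inline int32_t unpack_low(uint64_t packed)
{
    /* Truncate to 32 bits, then reinterpret as signed
       (implementation-defined before C23; wraps on mainstream compilers). */
    return (int32_t)(uint32_t)packed;
}

/* Recover the unsigned high half. */
static inline uint32_t unpack_high(uint64_t packed)
{
    return (uint32_t)(packed >> 32);
}
```

Because the low half is zero-extended before the OR, a negative value can no longer overwrite whatever is stored in the upper 32 bits.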

You need to mask out any bits besides the low order 32 bits. You can do that with a bitwise AND:
#define packInt32(X) (((uint64_t)(X) & 0xFFFFFFFF) | INT_MASK)

A negative 32-bit integer will get sign-extended into 64-bits.
#include <stdint.h>
uint64_t movsx(int32_t X) { return X; }
movsx on x86-64:
movsx:
movsx rax, edi
ret
Masking out the upper 32 bits causes it to be just zero-extended:
#include <stdint.h>
uint64_t mov(int32_t X) { return (uint64_t)X & 0xFFFFFFFF; }
//or uint64_t mov(int32_t X) { return (uint64_t)(uint32_t)X; }
mov on x86-64:
mov:
mov eax, edi
ret
https://gcc.godbolt.org/z/fihCmt
Neither method loses any info from the lower 32-bits, so either method is a valid way of storing a 32-bit integer into a 64-bit one.
The x86-64 code for a plain mov is one byte shorter (3 bytes vs 4). I don't think there should be much of a speed difference, but if there is one, I'd expect the plain mov to win by a tiny bit.

One option is to untangle the sign-extension and the upper value when it is read back, but that can be messy.
Another option is to construct a union with a bit-packed word. This then defers the problem to the compiler to optimise:
union {
int64_t merged;
struct {
int64_t field1:32,
field2:32;
};
};
A third option is to deal with the sign bit yourself. Store a 31-bit absolute value and a 1-bit sign. Not super-efficient, but more likely to be legal if you should ever encounter a non-2's-complement processor where negative signed values can't be safely cast to unsigned. They are as rare as hen's teeth, so I wouldn't worry about this myself.

Assuming that the only operation on the 64-bit value will be to convert it back to 32 bits (and potentially to store or display it), there is no need to apply a mask. The compiler will sign-extend the 32-bit value when casting it to 64 bits, and will pick the lowest 32 bits when casting the 64-bit value back to 32 bits.
#define packInt32(X) ((uint64_t)(X))
#define unpackInt32(X) ((int)(X))
Or better, using (inline) functions:
inline uint64_t packInt32(int x) { return ((uint64_t) x) ; }
inline int unpackInt32(uint64_t x) { return ((int) x) ; }

Related

C bitfield with assigned value 1 shows -1

I was playing with bit-fields and got stuck on something strange:
#include <stdio.h>
struct lol {
int a;
int b:1,
c:1,
d:1,
e:1;
char f;
};
int main(void) {
struct lol l = {0};
l.a = 123;
l.c = 1; // -1 ???
l.f = 'A';
printf("%d %d %d %d %d %c\n", l.a, l.b, l.c, l.d, l.e, l.f);
return 0;
}
The output is:
123 0 -1 0 0 A
Somehow the value of l.c is -1. What is the reason?
Sorry if obvious.
Use unsigned bitfields if you don't want sign-extension.
What you're getting is your 1 bit being interpreted as the sign bit in a two's complement representation. In two's complement, the sign-bit is the highest bit and it's interpreted as -(2^(width_of_the_number-1)), in your case -(2^(1-1)) == -(2^0) == -1. Normally all other bits offset this (because they're interpreted as positive) but a 1-bit number doesn't and can't have other bits, so you just get -1.
Take for example 0b10000000 as an int8_t in two's complement.
(For the record, 0b10000000 == 0x80 && 0x80 == (1<<7)). It's the highest bit so it's interpreted as -(2^7) (==-128)
and there's no positive bits to offset it, so you get printf("%d\n", (int8_t)0x80); /*-128*/
Now if you set all bits on, you get -1, because -128 + (128-1) == -1. This (all bits on == -1) holds true for any width interpreted as two's complement, even for width 1, where you get -1 + (1-1) == -1.
When such a signed integer gets extended into a wider width, it undergoes so called sign extension.
Sign extension means that the highest bit gets copied into all the newly added higher bits.
If the highest bit is 0, then it's trivial to see that sign extension doesn't change the value (take for example 0x01 extended into 0x00000001).
When the highest bit is 1 as in (int8_t)0xff (all 8 bits 1), then sign extension copies the sign bit into all the new bits: ((int32_t)(int8_t)0xff == (int32_t)0xffffffff). ((int32_t)(int8_t)0x80 == (int32_t)0xffffff80) might be a better example as it more clearly shows the 1 bits are added at the high end (try _Static_assert-ing either of these).
This doesn't change the value either as long as you assume two's complement, because if you start at:
-(2^n) (value of sign bit) + X (all the positive bits) //^ means exponentiation here
and add one more 1-bit to the highest position, then you get:
-(2^(n+1)) + 2^(n) + X
which is
2*-(2^(n)) + 2^(n) + X == -(2^n) + X //same as original
//inductively, you can add any number of 1 bits
Sign extension normally happens when you width-extend a native signed integer into a native wider width (signed or unsigned), either with casts or implicit conversions.
For the native widths, platforms usually have an instruction for it.
Example:
int32_t signExtend8(int8_t X) { return X; }
Example's disassembly on x86_64:
signExtend8:
movsx eax, dil //the sx stands for Sign-eXtending
ret
If you want to make it work for nonstandard widths, you can usually utilize the fact that signed right shifts normally copy the sign bit alongside the shifted range (what a signed right shift of a negative value does is really implementation-defined)
and so you can unsigned-left-shift into the sign bit and then back to get sign-extension artificially for non-native width such as 2:
#include <stdint.h>
#define MC_signExtendIn32(X,Width) ((int32_t)((uint32_t)(X)<<(32-(Width)))>>(32-(Width)))
_Static_assert( MC_signExtendIn32(3,2 /*width 2*/)==-1,"");
int32_t signExtend2(int8_t X) { return MC_signExtendIn32(X,2); }
Disassembly (x86_64):
signExtend2:
mov eax, edi
sal eax, 30
sar eax, 30
ret
Signed bitfields essentially make the compiler generate (hidden) macros like the above for you:
struct bits2 { int bits2:2; };
int32_t signExtend2_via_bitfield(struct bits2 X) { return X.bits2; }
Disassembly (x86_64) on clang:
signExtend2_via_bitfield: # #signExtend2_via_bitfield
mov eax, edi
shl eax, 30
sar eax, 30
ret
Example code on godbolt: https://godbolt.org/z/qxd5o8 .
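As an alternative to the shift-based macro above, here is a sketch that avoids signed right shifts altogether. It swaps in a different, well-known technique (XOR with the sign bit, then subtract it) and does the subtraction in 64 bits so that no step is implementation-defined. The name sign_extend32 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `width` bits of x (1 <= width <= 32) to int32_t. */
int32_t sign_extend32(uint32_t x, unsigned width)
{
    assert(width >= 1 && width <= 32);
    uint32_t sign = UINT32_C(1) << (width - 1); /* sign bit of the field */
    uint32_t mask = (sign << 1) - 1;            /* low `width` bits set  */
    x &= mask;                                  /* drop bits above field */
    /* Flipping the sign bit and subtracting it maps the field value
       into [-2^(width-1), 2^(width-1) - 1], which always fits in int32_t. */
    return (int32_t)((int64_t)(x ^ sign) - (int64_t)sign);
}
```

For example, sign_extend32(3, 2) yields -1, matching the _Static_assert on the macro version.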
Bit-fields are very poorly standardized and they are generally not guaranteed to behave predictably. The standard just vaguely states (6.7.2.1/10):
A bit-field is interpreted as having a signed or unsigned integer type consisting of the
specified number of bits.125)
Where the informative note 125) says:
125) As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int,
then it is implementation-defined whether the bit-field is signed or unsigned.
So we can't know if int b:1 gives a signed type or unsigned type, it's up to the compiler. Your compiler apparently decided that it would be a great idea to have signed bits. So it treats your 1 bit as binary translated into a two's complement 1 bit number, where binary 1 is decimal -1 and zero is zero.
Furthermore, we can't know where b in your code ends up in memory; it could be anywhere, and it also depends on endianness. What we do know is that you save absolutely no memory from using a bit-field here, since at least 16 bits for an int will get allocated anyway.
General good advice:
Never use bit-fields for any purpose.
Use the bit-wise operators << >> | & ^ ~ and named bit-masks instead, for 100% portable and well-defined code.
Use the stdint.h types or at least unsigned ones whenever dealing with raw binary.
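The advice above about bit-wise operators and named bit-masks could look roughly like this. It is only a sketch: the flag names and helper functions are hypothetical stand-ins for the b, c, d, e bit-fields in the question.

```c
#include <stdbool.h>
#include <stdint.h>

/* One named mask per flag, built on a fixed-width unsigned type. */
#define FLAG_B (UINT32_C(1) << 0)
#define FLAG_C (UINT32_C(1) << 1)
#define FLAG_D (UINT32_C(1) << 2)
#define FLAG_E (UINT32_C(1) << 3)

/* Return flags with the bits in mask switched on. */
uint32_t flag_set(uint32_t flags, uint32_t mask)   { return flags | mask; }

/* Return flags with the bits in mask switched off. */
uint32_t flag_clear(uint32_t flags, uint32_t mask) { return flags & ~mask; }

/* True if any bit in mask is set in flags. */
bool flag_test(uint32_t flags, uint32_t mask)      { return (flags & mask) != 0; }
```

With this layout, testing a flag always yields a plain true or false, so the -1 surprise from a signed 1-bit field cannot occur.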
You are using a signed integer. A 1-bit field has only one bit, so the bit you set to 1 is also the sign bit, and in a signed (two's complement) representation that pattern reads as -1. As other comments suggest, use the unsigned keyword to remove the possibility of representing negative integers.
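A small sketch of the difference, with the signedness spelled out explicitly instead of relying on what plain int means for a bit-field. The struct and function names are made up for illustration.

```c
/* A signed 1-bit field can represent only 0 and -1;
   an unsigned 1-bit field can represent 0 and 1. */
struct signed_bit   { signed int   c : 1; };
struct unsigned_bit { unsigned int c : 1; };

/* Store the lowest bit of v in an unsigned 1-bit field and read it back. */
int store_unsigned_bit(unsigned v)
{
    struct unsigned_bit f = {0};
    f.c = v & 1u;
    return f.c;            /* 0 or 1 */
}

/* Store v in a signed 1-bit field and read it back. */
int store_signed_bit(int v)
{
    struct signed_bit f = {0};
    f.c = v;               /* storing 1 is out of range: the result is
                              implementation-defined, typically -1 */
    return f.c;
}
```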

How to get the mantissa of an 80-bit long double as an int on x86-64

frexpl won't work because it keeps the mantissa as part of a long double. Can I use type punning, or would that be dangerous? Is there another way?
x86's float and integer endianness is little-endian, so the significand (aka mantissa) is the low 64 bits of an 80-bit x87 long double.
In assembly, you just load the normal way, like mov rax, [rdi].
Unlike IEEE binary32 (float) or binary64 (double), 80-bit long double stores the leading 1 in the significand explicitly. (Or 0 for subnormal). https://en.wikipedia.org/wiki/Extended_precision#x86_extended_precision_format
So the unsigned integer value (magnitude) of the true significand is the same as what's actually stored in the object-representation.
If you want signed int, too bad; including the sign bit it would be 65 bits but int is only 32-bit on any x86 C implementation.
If you want int64_t, you could maybe right shift by 1 to discard the low bit, making room for a sign bit. Then do 2's complement negation if the sign bit was set, leaving you with a signed 2's complement representation of the significand value divided by 2. (IEEE FP uses sign/magnitude with a sign bit at the top of the bit-pattern)
In C/C++, yes you need to type-pun, e.g. with a union or memcpy. All C implementations on x86 / x86-64 that expose 80-bit floating point at all use a 12 or 16-byte type with the 10-byte value at the bottom.
Beware that MSVC uses long double = double, a 64-bit float, so check LDBL_MANT_DIG from float.h, or sizeof(long double). All 3 static_assert() statements trigger on MSVC, so they all did their job and saved us from copying a whole binary64 double (sign/exp/mantissa) into our uint64_t.
// valid C11 and C++11
#include <float.h> // float numeric-limit macros
#include <stdint.h>
#include <assert.h> // C11 static assert
#include <string.h> // memcpy
// inline
uint64_t ldbl_mant(long double x)
{
// we can assume CHAR_BIT = 8 when targeting x86, unless you care about DeathStation 9000 implementations.
static_assert( sizeof(long double) >= 10, "x87 long double must be >= 10 bytes" );
static_assert( LDBL_MANT_DIG == 64, "x87 long double significand must be 64 bits" );
uint64_t retval;
memcpy(&retval, &x, sizeof(retval));
static_assert( sizeof(retval) < sizeof(x), "uint64_t should be strictly smaller than long double" ); // sanity check for wrong types
return retval;
}
This compiles efficiently on gcc/clang/ICC (on Godbolt) to just one instruction as a stand-alone function (because the calling convention passes long double in memory). After inlining into code with a long double in an x87 register, it will presumably compile to a TBYTE x87 store and an integer reload.
## gcc/clang/ICC -O3 for x86-64
ldbl_mant:
mov rax, QWORD PTR [rsp+8]
ret
For 32-bit, gcc has a weird redundant-copy missed-optimization bug which ICC and clang don't have; they just do the 2 loads from the function arg without copying first.
# GCC -m32 -O3 copies for no reason
ldbl_mant:
sub esp, 28
fld TBYTE PTR [esp+32] # load the stack arg
fstp TBYTE PTR [esp] # store a local
mov eax, DWORD PTR [esp]
mov edx, DWORD PTR [esp+4] # return uint64_t in edx:eax
add esp, 28
ret
C99 makes union type-punning well-defined behaviour, and so does GNU C++. I think MSVC defines it too.
But memcpy is always portable so that might be an even better choice, and it's easier to read in this case where we just want one element.
If you also want the exponent and sign bit, a union between a struct and long double might be good, except that padding for alignment at the end of the struct will make it bigger. It's unlikely that there'd be padding after a uint64_t member before a uint16_t member, though. But I'd worry about :1 and :15 bitfields, because IIRC it's implementation-defined which order the members of a bitfield are stored in.
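To get the exponent and sign without a union or bit-fields, memcpy can be used for those pieces as well. The following is a sketch that assumes the little-endian x87 layout described above (64-bit significand in bytes 0-7, then 15 exponent bits and the sign bit in bytes 8-9). The struct ldbl_parts and the function ldbl_split are hypothetical names, and on targets where long double is not the x87 format the function only reports valid == 0.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

struct ldbl_parts {
    uint64_t mantissa;  /* 64-bit significand, explicit leading 1 */
    uint16_t exponent;  /* 15-bit biased exponent (bias 16383) */
    int      sign;      /* 1 if negative */
    int      valid;     /* 0 if long double is not x87 80-bit here */
};

struct ldbl_parts ldbl_split(long double x)
{
    struct ldbl_parts out = {0, 0, 0, 0};
#if LDBL_MANT_DIG == 64
    static_assert(sizeof(long double) >= 10, "x87 long double must be >= 10 bytes");
    unsigned char raw[sizeof x];
    uint16_t top;
    memcpy(raw, &x, sizeof x);           /* object representation */
    memcpy(&out.mantissa, raw, 8);       /* low 64 bits: significand */
    memcpy(&top, raw + 8, 2);            /* next 16 bits: sign + exponent */
    out.exponent = top & 0x7FFF;
    out.sign     = top >> 15;
    out.valid    = 1;
#else
    (void)x;                             /* not an x87 long double */
#endif
    return out;
}
```

For 1.0L this gives a significand of 0x8000000000000000 (just the explicit leading 1) and a biased exponent of 16383.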

C literal suffix U, UL problems

Could someone explain to me what can happen if I forget the suffix (postfix) for constants (literals) in ANSI C?
For example I saw for bit shift operations such defines:
#define AAR_INTENSET_NOTRESOLVED_Pos (2UL) /*!< Position of NOTRESOLVED field. */
#define AAR_INTENSET_NOTRESOLVED_Msk (0x1UL << AAR_INTENSET_NOTRESOLVED_Pos) /*!< Bit mask of NOTRESOLVED field. */
#define AAR_INTENSET_NOTRESOLVED_Disabled (0UL) /*!< Interrupt disabled. */
#define AAR_INTENSET_NOTRESOLVED_Enabled (1UL) /*!< Interrupt enabled. */
#define AAR_INTENSET_NOTRESOLVED_Set (1UL) /*!< Enable interrupt on write. */
It's used in 32bit architecture. But it could be ported to 16bit or 8bit.
What can happen if the UL postfix is not used and I use these macros for bit shift operations as intended?
I just assume that e.g. on an 8-bit architecture (1<<30) can lead to overflow.
EDIT: I have found nice link: http://dystopiancode.blogspot.cz/2012/08/constant-suffixes-and-prefixes-in-ansi-c.html
But is it safe to use suffixes if the code is supposed to be ported on various architectures?
For instance, if suffix U represents unsigned int, then on an 8-bit architecture it's usually 16 bits but on a 32-bit one it's a 32-bit variable, so 0xFFFFAAAAU is OK for a 32-bit compiler but not for an 8-bit compiler, right?
A decimal constant like 1, 2, 12345678, etc. without any suffix gets the first type in which its value fits, trying int, long, long long in that order. (In -1, the constant is 1 and the minus sign is a unary operator applied to it.)
An octal or hex constant like 0, 0123, 0x123, 0X123 without any suffix gets the first type in which its value fits, trying int, unsigned, long, unsigned long, long long, unsigned long long in that order.
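One way to see which type a given constant ends up with on a particular compiler is C11's _Generic. This is a sketch only; the enum const_type and the macro CONST_TYPE are invented names, and results for large constants depend on the widths of int and long on the target.

```c
#include <assert.h>

enum const_type { T_INT, T_UINT, T_LONG, T_ULONG, T_LLONG, T_ULLONG, T_OTHER };

/* Map the type of an expression to an enum value at compile time. */
#define CONST_TYPE(x) _Generic((x),            \
    int:                T_INT,                 \
    unsigned int:       T_UINT,                \
    long:               T_LONG,                \
    unsigned long:      T_ULONG,               \
    long long:          T_LLONG,               \
    unsigned long long: T_ULLONG,              \
    default:            T_OTHER)

/* These hold on every conforming implementation. */
static_assert(CONST_TYPE(1)   == T_INT,   "small decimal constant is int");
static_assert(CONST_TYPE(0x1) == T_INT,   "small hex constant is int");
static_assert(CONST_TYPE(1U)  == T_UINT,  "U suffix gives unsigned int");
static_assert(CONST_TYPE(1UL) == T_ULONG, "UL suffix gives unsigned long");
```

Checking something like CONST_TYPE(0x8000) or CONST_TYPE(32768) on a 16-bit-int target would show the unsigned int versus long split described above.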
The following is a potential problem should AAR_INTENSET_NOTRESOLVED_Pos exceed 31. Note: unsigned long must be at least 32 bits. It would result in 0 ** if unsigned long was 32 bits, but non-zero if longer.
(0x1UL << AAR_INTENSET_NOTRESOLVED_Pos)
The following is a similar potential problem should AAR_INTENSET_NOTRESOLVED_Pos exceed 15. 0x1 is an int, which is only guaranteed to be at least 16 bits wide. If int is 16 bits (the minimum), then even AAR_INTENSET_NOTRESOLVED_Pos == 15 is a problem, because the result 0x8000 is not representable in a signed 16-bit int; writing 0x1U explicitly avoids that case. [#Matt McNabb]
(0x1 << AAR_INTENSET_NOTRESOLVED_Pos)
Bitwise shift operators
"The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined." C11dr §6.5.7 3
Machine width is not the key issue. An 8bit or 16bit machine could use 16, 32, etc. bit size int. Again, 16 bit is the minimum size for a compliant C compiler.
[Edit] **
I should have said " It (shifting more than 31 bits) would result in Undefined behavior, UB, if unsigned long was 32 bits."
It can break.
It might be better to include the cast in the code itself, i.e.
uint32_t something = (uint32_t) AAR_INTENSET_NOTRESOLVED_Set << 30;
This makes it work even if the #define for the constant is simply an integer.
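Putting the cast and the shift-count rule together, a mask helper might look like the sketch below. The function name bit_mask32 is hypothetical; it returns 0 for out-of-range positions rather than invoking undefined behavior, which is a design choice for this sketch and not something the original macros do.

```c
#include <stdint.h>

/* Return a 32-bit mask with only bit `pos` set, or 0 if pos is out of range. */
uint32_t bit_mask32(unsigned pos)
{
    if (pos >= 32)
        return 0;               /* shifting by >= width would be undefined */
    /* Cast first so the shift happens in at least 32 unsigned bits,
       regardless of how wide int is on the target. */
    return (uint32_t)1 << pos;
}
```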

Clarification needed on (u/i)int_fastN_t

I have read many explanations of the fastest minimum-width integer types, but I couldn't understand when to use these data types.
My understanding :
On 32-bit machine,
uint_least16_t could be typedef to an unsigned short.
1. uint_least16_t small = 38;
2. unsigned short will be of 16 bits so the value 38 will be stored using 16 bits. And this will take up 16 bits of memory.
3. The range for this data type will be 0 to (2^N)-1 , here N=16.
uint_fast16_t could be typedef to an unsigned int.
1. uint_fast16_t fast = 38;
2. unsigned int will be of 32 bits so the value 38 will be stored using 32 bits. And this will take up 32 bits of memory.
3. what will be the range for this data type ?
uint_fast16_t => uint_fastN_t , here N = 16
but the value can be stored in 32 bits so IS it 0 to (2^16)-1 OR 0 to (2^32)-1 ?
how can we make sure that its not overflowing ?
Since its a 32 bit, Can we assign >65535 to it ?
If it is a signed integer, how signedness is maintained.
For example int_fast16_t = 32768;
since the value falls within the signed int range, it'll be a positive value.
A uint_fast16_t is just the fastest unsigned data type that has at least 16 bits. On some machines it will be 16 bits and on others it could be more. If you use it, you should be careful because arithmetic operations that give results above 0xFFFF could have different results on different machines.
On some machines, yes, you will be able to store numbers larger than 0xFFFF in it, but you should not rely on that being true in your design because on other machines it won't be possible.
Generally the uint_fast16_t type will either be an alias for uint16_t, uint32_t, or uint64_t, and you should make sure the behavior of your code doesn't depend on which type is used.
I would say you should only use uint_fast16_t if you need to write code that is both fast and cross-platform. Most people should stick to uint16_t, uint32_t, and uint64_t so that there are fewer potential issues to worry about when porting code to another platform.
An example
Here is an example of how you might get into trouble:
#include <stdbool.h>
#include <stdint.h>
bool bad_foo(uint_fast16_t a, uint_fast16_t b)
{
uint_fast16_t sum = a + b;
return sum > 0x8000;
}
If you call the function above with a as 0x8000 and b as 0x8000, then on some machines the sum will be 0 and on others it will be 0x10000, so the function could return true or false. Now, if you can prove that a and b will never sum to a number larger than 0xFFFF, or if you can prove that the result of bad_foo is ignored in those cases, then this code would be OK.
A safer implementation of the same code, which (I think) should behave the same way on all machines, would be:
bool good_foo(uint_fast16_t a, uint_fast16_t b)
{
uint_fast16_t sum = a + b;
return (sum & 0xFFFF) > 0x8000;
}
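On the question of how to make sure a uint_fast16_t is not overflowing: one possible approach is to treat 0xFFFF as the limit no matter how wide the type really is. The helpers below are a sketch with made-up names (sum_fits_u16, add_wrap16), not part of any standard header.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a + b stays within the guaranteed 16-bit range 0..0xFFFF. */
bool sum_fits_u16(uint_fast16_t a, uint_fast16_t b)
{
    /* Compare against the remaining headroom instead of adding first,
       so the check itself cannot wrap. */
    return a <= 0xFFFF && b <= 0xFFFF - a;
}

/* Add with explicit 16-bit wraparound, identical on every platform. */
uint_fast16_t add_wrap16(uint_fast16_t a, uint_fast16_t b)
{
    return (a + b) & 0xFFFF;
}
```

Code that only ever stores values accepted by sum_fits_u16 behaves the same whether uint_fast16_t is 16, 32 or 64 bits wide.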

Can the type difference between constants 32768 and 0x8000 make a difference?

The Standard specifies that hexadecimal constants like 0x8000 (larger than fits in a signed integer) are unsigned (just like octal constants), whereas decimal constants like 32768 are signed long. (The exact types assume a 16-bit integer and a 32-bit long.) However, in regular C environments both will have the same representation, in binary 1000 0000 0000 0000.
Is a situation possible where this difference really produces a different outcome? In other words, is a situation possible where this difference matters at all?
Yes, it can matter. If your processor has a 16-bit int and a 32-bit long type, 32768 has the type long (since 32767 is the largest positive value fitting in a signed 16-bit int), whereas 0x8000 (since it is also considered for unsigned int) still fits in a 16-bit unsigned int.
Now consider the following program:
int main(int argc, char *argv[])
{
volatile long long_dec = ((long)~32768);
volatile long long_hex = ((long)~0x8000);
return 0;
}
When 32768 is considered long, the bitwise complement (~) will invert 32 bits,
resulting in a representation 0xFFFF7FFF with type long; the cast is
superfluous.
When 0x8000 is considered unsigned int, the complement will invert
16 bits, resulting in a representation 0x7FFF with type unsigned int;
the cast will then zero-extend to a long value of 0x00007FFF.
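The same effect can be reproduced on today's more common platforms by scaling the constants up by 16 bits. This is a sketch that assumes a 32-bit int, so that 2147483648 gets a 64-bit signed type while 0x80000000 is unsigned int; the function names are illustrative.

```c
#include <assert.h>
#include <limits.h>

static_assert(INT_MAX == 2147483647, "sketch assumes a 32-bit int");

/* Decimal constant: does not fit int, so it is long or long long (64 bits).
   The complement inverts all 64 bits. */
long long complement_dec(void)
{
    return (long long)~2147483648;   /* 0xFFFFFFFF7FFFFFFF */
}

/* Hex constant: fits unsigned int, so the complement inverts only 32 bits
   and the cast then zero-extends. */
long long complement_hex(void)
{
    return (long long)~0x80000000;   /* 0x000000007FFFFFFF */
}
```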
Look at H&S5, section 2.7.1 page 24ff.
It is best to augment the constants with U, UL or L as appropriate.
On a platform with 32-bit int and 64-bit long, a and b in the following code will have different values:
int x = 2;
long a = x * 0x80000000; /* multiplication done in unsigned -> 0 */
long b = x * 2147483648; /* multiplication done in long -> 0x100000000 */
Another example not yet given: compare (with the greater-than or less-than operators) -1 to both 32768 and 0x8000. Or, for that matter, try comparing each of them for equality with an 'int' variable equal to -32768.
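A sketch of that comparison, again scaled up so that it shows the effect with a 32-bit int (2147483648 and 0x80000000 play the roles of 32768 and 0x8000). The function names are invented for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

static_assert(INT_MAX == 2147483647, "sketch assumes a 32-bit int");

/* The decimal constant has a wider signed type, so x is converted to that
   type and keeps its value: every int compares less. */
bool less_than_dec(int x)
{
    return x < 2147483648;
}

/* The hex constant is unsigned int, so x is converted to unsigned int:
   a negative x becomes a huge value and no longer compares less. */
bool less_than_hex(int x)
{
    return x < 0x80000000;
}
```

Compilers may warn about the mixed-sign comparison in less_than_hex, which is exactly the situation being illustrated.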
Assuming int is 16 bits and long is 32 bits (which is actually fairly unusual these days; int is more commonly 32 bits):
printf("%ld\n", 32768); // prints "32768"
printf("%ld\n", 0x8000); // has undefined behavior
In most contexts, a numeric expression will be implicitly converted to an appropriate type determined by the context. (That's not always the type you want, though.) This doesn't apply to non-fixed arguments to variadic functions, such as any argument to one of the *printf() functions following the format string.
The difference would show up if you tried to add a value to the 16-bit int: the result could exceed the bounds of the type, whereas with a 32-bit long you could add any number less than 2^16 to it without overflow.

Resources