#define SCALE (1 << 31)
#define fix_Q31_80(x) ( (int) ( (float)(x)*(float)0x80000000 ) )
#define fix_Q31_SC(x) ( (int) ( (float)(x)*(float)SCALE ) )
int main()
{
int fix_80 = fix_Q31_80(0.5f);
int fix_sc = fix_Q31_SC(0.5f);
}
Why are the values fix_80 and fix_sc different?
fix_80 == Hex:0x40000000
fix_sc == Hex:0xc0000000
1 << 31 is undefined behavior on most platforms (e.g., systems with 16-bit or 32-bit int) because its result cannot be represented in an int (the resulting type of the expression). Don't use that expression in code. On the other hand, 1U << 31 is a valid expression on systems with 32-bit int, as its result is representable in an unsigned int (the resulting type of the expression).
On a 32-bit int system, 0x80000000 is a (relatively) big positive integer number of type unsigned int. If you are lucky (or unlucky) enough not to have demons fly out of your nose when using the 1 << 31 expression, the most likely result of this expression is INT_MIN, which is a (relatively) big negative integer number of type int.
All integer constants have a type. In the case of 1, the type is int. On a system with 32-bit integers, 1 << 31 gives a number which is too large to be represented as an int. This is undefined behavior and therefore a bug.
But 0x80000000 will work as expected, because on a 32-bit system it happens to have the type unsigned int. This is because decimal constants and hexadecimal constants behave differently when the compiler goes looking for what type they should have, as explained here.
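As a quick illustration (a minimal C11 sketch using _Generic, assuming a typical platform with 32-bit int), you can ask the compiler which type each constant was given:
#include <stdio.h>

#define TYPE_NAME(x) _Generic((x), \
    int: "int", \
    unsigned int: "unsigned int", \
    long: "long", \
    unsigned long: "unsigned long", \
    long long: "long long", \
    unsigned long long: "unsigned long long", \
    default: "other")

int main(void)
{
    /* Hex constants may become unsigned (int -> unsigned int -> long -> ...) */
    printf("0x80000000 has type %s\n", TYPE_NAME(0x80000000));
    /* Decimal constants stay signed (int -> long -> long long) */
    printf("2147483648 has type %s\n", TYPE_NAME(2147483648));
}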
As several people have mentioned, don't use bitwise operators on signed types.
0x80000000 is a big number which needs 32 bits to represent. This means that on a 32-bit system (or in a 32-bit compatible application) an int is too small. So use unsigned long instead:
#define SCALE (1u << 31)
#define fix_Q31_80(x) ( (unsigned long) ( (float)(x)*(float)0x80000000u ) )
#define fix_Q31_SC(x) ( (unsigned long) ( (float)(x)*(float)SCALE ) )
int main()
{
unsigned long fix_80 = fix_Q31_80(0.5f);
unsigned long fix_sc = fix_Q31_SC(0.5f);
}
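With the unsigned constants, both macros now scale 0.5f by 2^31 and should agree: fix_80 == fix_sc == 0x40000000.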
Related
I need to find the most significant bit of signed int N and save it in signBitN. I want to do this using only bitwise operations.
Also, how would I make signBitN extend so that all its bits are equal to its most significant bit?
I.e., if its most significant bit was zero, how would I extend that to be 00000...00?
The closest I've gotten is signBitN=1&(N>>(sizeof(int)-1));
Portable expression:
1 & (x >> (CHAR_BIT * sizeof(int) - 1))
The latest C standards allow 3 representations of signed ints:
sign and magnitude
ones' complement
two's complement
See section 6.2.6.2 Integer types of the C11 standard.
Only the third option is relevant in practice for modern machines.
As specified in 6.2.6.1:
Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object of that type,
in bytes.
Therefore int will consist of sizeof(int) * CHAR_BIT bits, likely 32.
Thus the highest bit of int can be read by shifting right by sizeof(int) * CHAR_BIT - 1 bits and reading the last bit with bitwise & operator.
Note that the exact value of the int after the shift is implementation-defined, as stated in 6.5.7.5.
On sane machines it would be:
int y = x < 0 ? -1 : 0;
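Putting the two pieces together (a minimal sketch; spreading via -signBitN assumes two's complement, where -1 is the all-ones pattern):
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int N = -42;
    /* Isolate the most significant (sign) bit; yields 0 or 1 whether the
       implementation-defined right shift is arithmetic or logical. */
    int signBitN = 1 & (N >> (CHAR_BIT * sizeof(int) - 1));
    /* Spread it across all bits: all ones for negative N, else all zeros. */
    int signMask = -signBitN;
    printf("%d 0x%x\n", signBitN, (unsigned)signMask);  /* 1 0xffffffff */
}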
The portable way would be casting between int and an array of unsigned char and setting all bytes to -1.
See 6.3.1.3.2:
if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum
value that can be represented in the new type until the value
is in the range of the new type.
And 6.2.6.1.2
Values stored in unsigned bit-fields and objects of type
unsigned char shall be represented using a pure binary notation.
You can use memset() for that.
#include <string.h>

int x = -42;  /* x holds some value */
memset(&x, (x < 0 ? -1 : 0), sizeof x);
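This works because memset converts its fill argument to unsigned char; per the quoted conversion rule, -1 becomes UCHAR_MAX, the all-ones byte, so every byte of x is set regardless of the sign representation.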
If the question is how to check the MSB bit of the integer (for example the 31st bit of a 32-bit integer) then IMO this is portable.
#define MSB(i) ((i) & (((~0U) >> 1) ^ (~0U)))
#define issetMSB(i) (!!MSB(i))
#include <stdio.h>

int main(void)
{
    printf("%x\n", MSB(-1));       /* 80000000 with 32-bit unsigned int */
    printf("%x\n", issetMSB(-1));  /* 1 */
}
If CHAR_BIT == 8 on your target system (most cases), it's very easy to mask out a single byte:
unsigned char lsb = foo & 0xFF;
However, there are a few systems and C implementations out there where CHAR_BIT is neither 8 nor a multiple thereof. Since the C standard only mandates a minimum range for char values, there is no guarantee that masking with 0xFF will isolate an entire byte for you.
I've searched around trying to find information about a generic "byte mask", but so far haven't found anything.
There is always the O(n) solution:
unsigned char mask = 1;
size_t i;
for (i = 0; i < CHAR_BIT; i++)
{
mask |= (mask << i);
}
However, I'm wondering if there is any O(1) macro or line of code somewhere that can accomplish this, given how important this task is in many system-level programming scenarios.
The easiest way to extract an unsigned char from an integer value is simply to cast it to unsigned char:
(unsigned char) SomeInteger
Per C 2018 6.3.1.3 2, the result is the remainder of SomeInteger modulo UCHAR_MAX+1. (This is a non-negative remainder; it is always adjusted to be greater than or equal to zero and less than UCHAR_MAX+1.)
Assigning to an unsigned char has the same effect, as assignment performs a conversion (and initializing works too):
unsigned char x;
…
x = SomeInteger;
If you want an explicit bit mask, UCHAR_MAX is such a mask. This is so because unsigned integers are pure binary in C, and the maximum value of an unsigned integer has all value bits set. (Unsigned integers in general may also have padding bits, but unsigned char may not.)
One difference can occur in very old or esoteric systems: If a signed integer is represented with sign-and-magnitude or one’s complement instead of today’s ubiquitous two’s complement, then the results of extracting an unsigned char from a negative value will differ depending on whether you use the conversion method or the bit-mask method.
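To make that concrete, here is a small comparison sketch; on today's two's complement machines the two methods agree, so any difference would only show on the older representations mentioned above:
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int v = -300;
    unsigned char by_conversion = (unsigned char)v;  /* v modulo UCHAR_MAX+1 */
    unsigned char by_mask = v & UCHAR_MAX;           /* mask the representation */
    /* Both print 212 with CHAR_BIT == 8 on two's complement. */
    printf("%d %d\n", by_conversion, by_mask);
}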
On review (after accept), @Eric Postpischil's point about UCHAR_MAX makes for a preferable mask.
#define BYTE_MASK UCHAR_MAX
The value UCHAR_MAX shall equal 2^CHAR_BIT − 1. C11dr §5.2.4.2.1 2
As unsigned char cannot have padding, UCHAR_MAX is always the all-bits-set pattern in a character type and hence in a C "byte".
some_signed & some_unsigned is a problem on non-2's complement systems, as some_signed is converted to unsigned before the &, thus changing the bit pattern on negative values. To avoid that, the all-ones mask needs to be signed when masking signed types. This is usually the case with foo & UINT_MAX.
Conclusion
Assume: foo is of some integer type.
If only 2's complement is of concern, use a cast - it does not change the bit pattern.
unsigned char lsb = (unsigned char) foo;
Otherwise, with any integer encoding and UCHAR_MAX <= INT_MAX
unsigned char lsb = foo & UCHAR_MAX;
Otherwise TBD
Shifting an unsigned 1 by CHAR_BIT and then subtracting 1 will work even on esoteric non-2's complement systems, per @Some programmer dude. Be sure to use unsigned math.
On such systems, this preserves the bit pattern, unlike the (unsigned char) cast on negative integers.
unsigned char mask = (1u << CHAR_BIT) - 1u;
unsigned char lsb = foo & mask;
Or make a define
#define BYTE_MASK ((1u << CHAR_BIT) - 1u)
unsigned char lsb = foo & BYTE_MASK;
To also handle those pesky cases where UINT_MAX == UCHAR_MAX, in which 1u << CHAR_BIT would be UB, shift in 2 steps.
#define BYTE_MASK (((1u << (CHAR_BIT - 1)) << 1u) - 1u)
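For example, a quick sanity check of the two-step mask (a sketch; the printed value assumes two's complement and CHAR_BIT == 8):
#include <limits.h>
#include <stdio.h>

#define BYTE_MASK (((1u << (CHAR_BIT - 1)) << 1) - 1u)

int main(void)
{
    int foo = -300;
    unsigned char lsb = foo & BYTE_MASK;  /* foo converts to unsigned, then masked */
    printf("%d\n", lsb);                  /* 212 */
}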
UCHAR_MAX does not have to be equal to (1U << CHAR_BIT) - 1U;
you actually need to AND with that calculated value, not with UCHAR_MAX:
value & ((1U << CHAR_BIT) - 1U).
Many real implementations (for example TI) define UCHAR_MAX as 255 and emit code which behaves like that on machines having 8-bit bytes. It is done to preserve compatibility with code written for other targets.
For example
unsigned char x;
x++;
will generate code which checks whether the value of x is larger than UCHAR_MAX and, if it is, zeroes x.
I'm looking here to understand sign extension:
http://www.shrubbery.net/solaris9ab/SUNWdev/SOL64TRANS/p8.html
struct foo {
unsigned int base:19, rehash:13;
};
main(int argc, char *argv[])
{
struct foo a;
unsigned long addr;
a.base = 0x40000;
addr = a.base << 13; /* Sign extension here! */
printf("addr 0x%lx\n", addr);
addr = (unsigned int)(a.base << 13); /* No sign extension here! */
printf("addr 0x%lx\n", addr);
}
They claim this:
------------------ 64 bit:
% cc -o test64 -xarch=v9 test.c
% ./test64
addr 0xffffffff80000000
addr 0x80000000
%
------------------ 32 bit:
% cc -o test32 test.c
% ./test32
addr 0x80000000
addr 0x80000000
%
I have 3 questions:
What is sign extension? Yes, I read the wiki, but I didn't understand: when type promotion occurs, what's going on with sign extension?
Why ffff... in 64-bit (referring to addr)?
When I do a type cast, why is there no sign extension?
EDIT:
4. Why is this not an issue on a 32-bit system?
The left operand of the << operator undergoes standard promotions, so in your case it is promoted to int -- so far so good. Next, the int of value 0x40000 is multiplied by 2^13, which causes overflow and thus undefined behaviour. However, we can see what's happening: the value of the expression is now simply INT_MIN, the smallest representable int. Finally, when you convert that to an unsigned 64-bit integer, the usual modular arithmetic rules entail that the resulting value is 0xffffffff80000000. Similarly, converting to an unsigned 32-bit integer gives the value 0x80000000.
To perform the operation on unsigned values, you need to control the conversions with a cast:
(unsigned int)(a.base) << 13
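A compilable version of that fix might look like this (a sketch built from the question's code; it assumes LP64, i.e. 64-bit unsigned long):
#include <stdio.h>

struct foo {
    unsigned int base:19, rehash:13;
};

int main(void)
{
    struct foo a;
    unsigned long addr;
    a.base = 0x40000;
    /* Shift an unsigned int, not the int the bit-field promotes to:
       no overflow, no sign extension. */
    addr = (unsigned int)a.base << 13;
    printf("addr 0x%lx\n", addr);  /* addr 0x80000000 */
}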
a.base << 13
The bitwise shift operator performs the integer promotions on both its operands.
So this is equivalent to:
(int) a.base << 13
which is a negative value of type int.
Then:
addr = (int) a.base << 13;
converts this signed negative value ((int) a.base << 13) to the type of addr, which is unsigned long, through the integer conversions.
The integer conversions (C99, 6.3.1.3p2) rule that this is the same as doing:
addr = (long) ((int) a.base << 13);
The conversion to long performs the sign extension here because ((int) a.base << 13) is a negative signed number.
On the other case, with a cast you have something equivalent to:
addr = (unsigned long) (unsigned int) ((int) a.base << 13);
so no sign extension is performed in your second case because (unsigned int) ((int) a.base << 13) is an unsigned (and positive of course) value.
EDIT: as KerrekSB mentioned in his answer, a.base << 13 is actually not representable in an int (I assume a 32-bit int), so this expression invokes undefined behavior and the implementation has the right to behave in any other way, for example crashing.
For information, this is definitely not portable but if you are using gcc, gcc does not consider a.base << 13 here as undefined behavior. From gcc documentation:
"GCC does not use the latitude given in C99 only to treat certain aspects of signed '<<' as undefined, but this is subject to change."
in http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
This is more of a question about bit-fields. Note that if you change the struct to
struct foo {
unsigned int base, rehash;
};
you get very different results.
As #JensGustedt noted in Type of unsigned bit-fields: int or unsigned int the specification says:
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int;
Even though you've specified that base is unsigned, the compiler converts it to a signed int when you read it. That's why you don't get sign extension when you cast it to unsigned int.
Sign extension has to do with how negative numbers are represented in binary. The most common scheme is 2s complement. In this scheme, -1 is represented in 32 bits as 0xFFFFFFFF, -2 is 0xFFFFFFFE, etc. So what should be done when we want to convert a 32-bit number to a 64-bit number, for example? If we convert 0xFFFFFFFF to 0x00000000FFFFFFFF, the numbers will have the same unsigned value (about 4 billion), but different signed values (-1 vs. 4 billion). On the other hand, if we convert 0xFFFFFFFF to 0xFFFFFFFFFFFFFFFF, the numbers will have the same signed value (-1) but different unsigned values. The former is called zero-extension (and is appropriate for unsigned numbers) and the latter is called sign-extension (and is appropriate for signed numbers). It's called "sign-extension" because the "sign bit" (the most significant, or left-most bit) is extended, or copied, to make the number wider.
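A short program makes both extensions visible (a sketch with fixed-width types from <stdint.h>):
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int32_t  s = -1;            /* bit pattern 0xFFFFFFFF */
    uint32_t u = 0xFFFFFFFFu;   /* same bit pattern */
    int64_t  ws = s;            /* sign-extended: 0xFFFFFFFFFFFFFFFF, still -1 */
    uint64_t wu = u;            /* zero-extended: 0x00000000FFFFFFFF */
    printf("0x%" PRIx64 " 0x%" PRIx64 "\n", (uint64_t)ws, wu);
}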
It took me a while and a lot of reading/testing.
Maybe my beginner's way of understanding what's going on will get through to you (as it did to me).
a.base = 0x40000 (binary 1,(0)x18, i.e. a 1 followed by 18 zeros) -> a 19-bit bitfield
addr = a.base << 13.
Any value a.base can hold, int can hold too, so there is a conversion from the 19-bit unsigned int bitfield to a 32-bit signed integer (a.base is now (0)x13,1,(0)x18).
Now (a.base converted to signed int) << 13 results in 1,(0)x31. Remember, it's a signed int now.
addr = 1,(0)x31. addr is of unsigned long type (64-bit), so to do the assignment the right-hand value is converted to long int. The conversion from signed int to long int makes addr (1)x33,(0)x31.
And that's what is being printed after all of those conversions you weren't even aware of:
0xffffffff80000000.
Why the second line prints 0x80000000 is because of the cast to (unsigned int) before the conversion to long int. When converting unsigned int to long int there is no sign bit to copy, so the value is just padded with leading 0's to match the size, and that's all.
What's different with 32-bit is that during the conversion from 32-bit signed int to 32-bit unsigned long their sizes match, so no sign bits are added:
1,(0)x31 will stay 1,(0)x31
even after the conversion from int to long int (they have the same size; the value is interpreted differently but the bits are intact).
Quotation from your link:
Any code that makes this assumption must be changed to work for both
ILP32 and LP64. While an int and a long are both 32-bits in the ILP32
data model, in the LP64 data model, a long is 64-bits.
In the code here, when converting from a byte buffer back to an unsigned long int:
unsigned long int anotherLongInt;
anotherLongInt = ( (byteArray[0] << 24)
+ (byteArray[1] << 16)
+ (byteArray[2] << 8)
+ (byteArray[3] ) );
where byteArray is declared as unsigned char byteArray[4];
Question:
I thought byteArray[1] would be just one unsigned char (8 bits). When left-shifting by 16, wouldn't that shift all the meaningful bits out and fill the entire byte with 0? Apparently it is not 8 bits. Perhaps it's shifting the entire byteArray, which is 4 consecutive bytes? But I don't see how that works.
In that arithmetic context byteArray[0] is promoted to either int or unsigned int, so the shift is legal and maybe even sensible (I like to deal only with unsigned types when doing bitwise stuff).
6.5.7 Bitwise shift operators
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand.
And integer promotions:
6.3.1.1
If an int can represent all values of the original type the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the integer promotions.
The unsigned chars are implicitly converted to ints when shifting. (Not sure to what type exactly they are converted; I think that depends on the platform and the compiler.) To get what you intend, it is safer to explicitly cast the bytes; that also makes it more portable, and the reader immediately sees what you intend to do:
unsigned long int anotherLongInt;
anotherLongInt = ( ((unsigned long)byteArray[0] << 24)
+ ((unsigned long)byteArray[1] << 16)
+ ((unsigned long)byteArray[2] << 8)
+ ((unsigned long)byteArray[3] ) );
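For example, a quick check of the cast version (assuming the usual 8-bit bytes; it should print 0x12345678):
#include <stdio.h>

int main(void)
{
    unsigned char byteArray[4] = { 0x12, 0x34, 0x56, 0x78 };
    unsigned long int anotherLongInt;
    anotherLongInt = ( ((unsigned long)byteArray[0] << 24)
                     + ((unsigned long)byteArray[1] << 16)
                     + ((unsigned long)byteArray[2] << 8)
                     + ((unsigned long)byteArray[3] ) );
    printf("0x%lx\n", anotherLongInt);  /* 0x12345678 */
}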
I am using the following code to simplify assigning large values to specific locations in memory:
int buffer_address = virtual_to_physical(malloc(BUFFER_SIZE));
unsigned long int ring_slot = buffer_address << 32 | BUFFER_SIZE;
However, the compiler complains "warning: left shift count >= width of type". But an unsigned long int in C is 64 bits, so bit-shifting an int (32 bits) to the left 32 bits should yield a 64 bit value, and hence the compiler shouldn't complain. But it does.
Is there something obvious I'm missing, or otherwise is there a simple workaround?
An unsigned long int is not necessarily 64 bits, but for simplicity let's assume it is.
buffer_address is of type int. Any expression on buffer_address without any "higher" types yields an int. Therefore buffer_address << 32 has type int, not unsigned long. Thus the compiler complains.
This should solve your issue though:
unsigned long ring_slot = ((unsigned long) buffer_address) << 32 | BUFFER_SIZE;
Please note, an unsigned long is not necessarily 64 bits; that depends on the implementation. Use this instead:
#include <stdint.h> // introduced in C99
uint64_t ring_slot = ((uint64_t) buffer_address) << 32 | BUFFER_SIZE;
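Note the layout this packs: the address ends up in the high 32 bits and BUFFER_SIZE in the low 32 bits, so the scheme only round-trips if BUFFER_SIZE fits in 32 bits.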
buffer_address is a (32-bit) int, so buffer_address << 32 is shifting it by an amount greater than or equal to its size.
unsigned long ring_slot = ((unsigned long) buffer_address << 32) | BUFFER_SIZE:
Note that 'unsigned long' need not be 64 bits (it is not on Windows - 32-bit (ILP32) or 64-bit (LLP64); nor is it on a 32-bit Unix machine (ILP32)). To get a guaranteed (at least) 64-bit integer, you need unsigned long long.
There are few machines where int is a 64-bit quantity (ILP64); the DEC Alpha was one such, and I believe some Cray machines also used that (and the Crays also used 'big' char types - more than 8 bits per char).
The result of the expression on the right side of the = sign does not depend on what it's assigned to. You must cast to unsigned long first.
unsigned long int ring_slot = (unsigned long)buffer_address << 32 | BUFFER_SIZE;