I am using the following code to simplify assigning large values to specific locations in memory:
int buffer_address = virtual_to_physical(malloc(BUFFER_SIZE));
unsigned long int ring_slot = buffer_address << 32 | BUFFER_SIZE;
However, the compiler complains "warning: left shift count >= width of type". But an unsigned long int in C is 64 bits, so bit-shifting an int (32 bits) to the left 32 bits should yield a 64 bit value, and hence the compiler shouldn't complain. But it does.
Is there something obvious I'm missing, or otherwise is there a simple workaround?
An unsigned long int is not necessarily 64 bits, but for the simplicity let's assume it is.
buffer_address is of type int. Any expression on buffer_address that involves no "higher" types also has type int. Therefore buffer_address << 32 has type int, not unsigned long, and the compiler complains.
This should solve your issue though:
unsigned long ring_slot = ((unsigned long) buffer_address) << 32 | BUFFER_SIZE;
Please note, an unsigned long is not necessarily 64 bits, this depends on the implementation. Use this instead:
#include <stdint.h> // introduced in C99
uint64_t ring_slot = ((uint64_t) buffer_address) << 32 | BUFFER_SIZE;
buffer_address is a (32-bit) int, so buffer_address << 32 shifts it by an amount greater than or equal to its width.
unsigned long ring_slot = ((unsigned long) buffer_address << 32) | BUFFER_SIZE;
Note that 'unsigned long' need not be 64 bits (it is not on Windows, whether 32-bit (ILP32) or 64-bit (LLP64); nor is it on a 32-bit Unix machine (ILP32)). To get a guaranteed (at least) 64-bit integer, you need unsigned long long.
There are few machines where int is a 64-bit quantity (ILP64); the DEC Alpha was one such, and I believe some Cray machines also used that (and the Crays also used 'big' char types - more than 8 bits per char).
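If you are unsure which data model you are building for, a minimal sketch like the following (just a sanity check, nothing specific to the question) prints the widths directly:
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* Widths differ between data models (ILP32, LP64, LLP64, ...). */
    printf("int:       %zu bits\n", sizeof(int) * CHAR_BIT);
    printf("long:      %zu bits\n", sizeof(long) * CHAR_BIT);
    printf("long long: %zu bits\n", sizeof(long long) * CHAR_BIT);
    return 0;
}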
The result of the expression on the right side of the = sign does not depend on what it's assigned to. You must cast to unsigned long first.
unsigned long int ring_slot = (unsigned long)buffer_address << 32 | BUFFER_SIZE;
I'm a bit troubled by this code:
typedef struct _slink {
    struct _slink *next;
    char type;
    void *data;
};
assuming what this describes is a link in a file, where data is 4 bytes long, representing either an address or an integer (depending on the type of the link).
Now I'm looking at reformatting numbers in the file from little-endian to big-endian, so what I want to do is change the order of the bytes before writing back to the file, i.e.
for 0x01020304, I want to convert it to 0x04030201 so that when I write it back, its little-endian representation will look like the big-endian representation of 0x01020304. I do that by multiplying the i'th byte by 2^(8*(3-i)), where i is between 0 and 3. Now this is one way it was implemented, and what troubles me here is that this is shifting bytes by more than 8 bits. (L is of type _slink*)
int data = (((unsigned char*)&L->data)[0]<<24) + (((unsigned char*)&L->data)[1]<<16) +
           (((unsigned char*)&L->data)[2]<<8) + (((unsigned char*)&L->data)[3]<<0);
Can anyone please explain why this actually works, without these bytes having been explicitly cast to integers to begin with (since they're only 1 byte each but are shifted by up to 24 bits)?
Thanks in advance.
Any integer type smaller than int is promoted to type int when used in an expression.
So the shift is actually applied to an expression of type int instead of type char.
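A minimal sketch showing that promotion at work (the 4-byte result assumes the usual 32-bit int):
#include <stdio.h>

int main(void)
{
    unsigned char b = 0xC4;
    /* b is promoted to int before the shift, so the expression has type int,
       not unsigned char. */
    printf("sizeof b         = %zu\n", sizeof b);           /* 1 */
    printf("sizeof (b << 24) = %zu\n", sizeof (b << 24));   /* typically 4 */
    return 0;
}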
Can anyone please explain why this actually works?
The shift does not occur on an unsigned char but on a value promoted to int¹ (see @dbush).
Reasons why the code still has issues:
32-bit int
Shifting a 1 into the sign bit is undefined behavior (UB). See also @Eric Postpischil.
(((unsigned char*)&L->data)[0]<<24) // UB
16-bit int
With a 16-bit int, shifting by 24 is a shift by the bit width or more, and even if the type were unsigned there would not be enough precision to hold the result; as int it is UB like above. Perhaps OP would then have wanted only a 2-byte endian swap?
Alternative
const uint8_t *p = (const uint8_t *)&L->data;   /* uint8_t, uint32_t come from <stdint.h> */
uint32_t data = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                (uint32_t)p[2] << 8  | (uint32_t)p[3] << 0;
For the pedantic
Had int used non-2's complement, the addition of a negative value from (((unsigned char*)&L->data)[0]<<24) would have messed up the data pattern. Endian manipulations are best done using unsigned types.
from little-endian to big-endian
This code does not swap between those 2 endians. It is a big-endian to native-endian conversion. When this code is run on a machine with a 32-bit unsigned and little-endian byte order, it is effectively a big/little swap. On a machine with a 32-bit unsigned and big-endian byte order, it could have been a no-op.
¹ ... or possibly unsigned int on select platforms where UCHAR_MAX > INT_MAX.
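For contrast, a byte reversal that always swaps, regardless of the host's endianness, could be sketched roughly like this (swap32 is a hypothetical helper, not part of the question's code):
#include <stdint.h>

/* Hypothetical helper: unconditionally reverses the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}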
When coding in C, I accidentally found that for non-ASCII characters, after they are converted from char (1 byte) to int (4 bytes), the extra bits (3 bytes) are filled with 1s rather than 0s. (For ASCII characters, the extra bits are filled with 0s.) For example:
char c[] = "ā";
int i = c[0];
printf("%x\n", i);
And the result is ffffffc4, rather than c4 itself. (The UTF-8 code for ā is \xc4\x81.)
Another related issue is that when performing right-shift operations (>>) on a non-ASCII character, the extra bits on the left end are also filled with 1s rather than 0s, even though the char variable is explicitly converted to unsigned int (whereas for a signed int, the extra bits are filled with 1s on my OS). For example:
char c[] = "ā";
unsigned int u_c;
int i = c[0];
unsigned int u_i = c[0];
c[0] = (unsigned int)c[0] >> 1;
u_c = (unsigned int)c[0] >> 1;
i = i >> 1;
u_i = u_i >> 1;
printf("c=%x\n", (unsigned int)c[0]); // result: ffffffe2. The same with the signed int i.
printf("u_c=%x\n", u_c); // result: 7fffffe2.
printf("i=%x\n", i); // result: ffffffe2.
printf("u_i=%x\n", u_i); // result: 7fffffe2.
Now I am confused by these results... Are they determined by the representations of char, int and unsigned int, by my operating system (Ubuntu 14.04), or by the ANSI C requirements? I have tried to compile this program with both gcc (4.8.4) and clang (3.4), but there is no difference.
Thank you so much!
It is implementation-defined whether char is signed or unsigned. On x86 computers, char is customarily a signed integer type; and on ARM it is customarily an unsigned integer type.
A signed integer will be sign-extended when converted to a larger signed type;
a signed integer converted to unsigned integer will use the modulo arithmetic to wrap the signed value into the range of the unsigned type as if by repeatedly adding or subtracting the maximum value of the unsigned type + 1.
The solution is to use/cast to unsigned char if you want the value to be portably zero-extended, or for storing small integers in range 0..255.
Likewise, if you want to store signed integers in range -127..127/128, use signed char.
Use char if the signedness doesn't matter - the implementation will probably have chosen the type that is the most efficient for the platform.
Likewise, for the assignment
to a narrower unsigned type, say uint16_t u16 = c[0];:
Since -0x3c, or -60, is not in the range of uint16_t, the actual value is the value (mod UINT16_MAX + 1) that falls in the range of uint16_t; in other words, we add or subtract UINT16_MAX + 1 (note that the integer promotions can play a trick here, so you might need casts in actual C code) until the value is in range. UINT16_MAX is always 0xFFFF; add 1 to it to get 0x10000. 0x10000 - 0x3C is 0xFFC4, the 16-bit analogue of the 0xffffffc4 that you saw with unsigned int. The uint16_t value is then zero-extended to the uint32_t value.
Had you run this on a platform where char is unsigned, the result would have been 0xC4!
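Applied to the code in the question, a minimal sketch of the suggested fix (casting through unsigned char) could look like this:
#include <stdio.h>

int main(void)
{
    char c[] = "ā";               /* UTF-8 encoding: 0xC4 0x81 */
    int i = (unsigned char)c[0];  /* zero-extended no matter whether char is signed */
    printf("%x\n", i);            /* prints c4 */
    return 0;
}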
BTW in i = i >> 1;, i is a signed integer with a negative value; C11 says that the value is implementation-defined, so the actual behaviour can change from compiler to compiler. The GCC manuals state that
Signed >> acts on negative numbers by sign extension.
However a strictly-conforming program should not rely on this.
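If a zero-filling (logical) right shift is what you are after, convert to an unsigned type of known width before shifting; a minimal sketch using the -60 value from the question (assuming signed char and 32-bit int):
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int i = -60;                          /* the 0xC4 byte read through a signed char */
    uint32_t logical = (uint32_t)i >> 1;  /* convert first, then shift: well defined */
    printf("%" PRIx32 "\n", logical);     /* 7fffffe2 */
    return 0;
}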
#define SCALE (1 << 31)
#define fix_Q31_80(x) ( (int) ( (float)(x)*(float)0x80000000 ) )
#define fix_Q31_SC(x) ( (int) ( (float)(x)*(float)SCALE ) )
int main()
{
    int fix_80 = fix_Q31_80(0.5f);
    int fix_sc = fix_Q31_SC(0.5f);
}
Why are the values fix_80 and fix_sc different?
fix_80 == Hex:0x40000000
fix_sc == Hex:0xc0000000
1 << 31 is undefined behavior on most platforms (e.g., systems with 16-bit or 32-bit int) as its result cannot be represented in an int (the resulting type of the expression). Don't use that expression in code. On the other hand, 1U << 31 is a valid expression on systems with 32-bit int, as its result is representable in an unsigned int (the resulting type of the expression).
On a 32-bit int system, 0x80000000 is a (relatively) big positive integer number of type unsigned int. If you are lucky (or unlucky) enough not to have demons fly out of your nose when using the 1 << 31 expression, the most likely result of that expression is INT_MIN, which is a (relatively) big negative integer number of type int.
All integer constants have a type. In the case of 1, the type is int. On a system with 32 bit integers, 1 << 31 gives a number which is too large to be represented as an int. This is undefined behavior and therefore a bug.
But 0x80000000 will work as expected, because on a 32-bit system it happens to have the type unsigned int. This is because decimal constants and hexadecimal constants behave differently when the compiler goes looking for what type they should have, as explained here.
As several people have mentioned, don't use bitwise operators on signed types.
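A hedged illustration of that difference, using C11's _Generic to report which type the compiler actually chose (the exact results depend on the platform's integer widths):
#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),             \
    int: "int",                                \
    unsigned int: "unsigned int",              \
    long: "long",                              \
    unsigned long: "unsigned long",            \
    long long: "long long",                    \
    unsigned long long: "unsigned long long",  \
    default: "other")

int main(void)
{
    /* With a 32-bit int, the hex constant typically becomes unsigned int,
       while the decimal constant climbs to long or long long. */
    printf("0x80000000 has type %s\n", TYPE_NAME(0x80000000));
    printf("2147483648 has type %s\n", TYPE_NAME(2147483648));
    return 0;
}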
0x80000000 is a big number which needs 32 bits to represent. This means that on a 32-bit system (or in a 32-bit compatible application) an int is too small. So use unsigned long instead:
#define SCALE (1u << 31)
#define fix_Q31_80(x) ( (unsigned long) ( (float)(x)*(float)0x80000000u ) )
#define fix_Q31_SC(x) ( (unsigned long) ( (float)(x)*(float)SCALE ) )
int main()
{
    unsigned long fix_80 = fix_Q31_80(0.5f);
    unsigned long fix_sc = fix_Q31_SC(0.5f);
}
I am trying to set the most significant bit in a long long unsigned, x.
To do that I am using this line of code:
x |= 1<<((sizeof(x)*8)-1);
I thought this should work, because sizeof gives the size in bytes, so I multiplied by 8 and subtracted one to set the final bit. Whenever I do that, the compiler gives this warning: "warning: left shift count >= width of type"
I don't understand why this error is occurring.
The 1 that you are shifting is a constant of type int, which means that you are shifting an int value by (sizeof(unsigned long long) * 8) - 1 bits. This shift can easily be more than the width of int, which is apparently what happened in your case.
If you want to obtain a bit-mask of unsigned long long type, you should start with an initial bit-mask of unsigned long long type, not of int type.
1ull << (sizeof(x) * CHAR_BIT - 1)
An arguably better way to build the same mask would be
~(-1ull >> 1)
or
~(~0ull >> 1)
use 1ULL << instead of 1 <<
Using just "1" makes you shift an integer. 1ULL will be an unsigned long long which is what you need.
An integer will probably be 32 bits and long long probably 64 bits wide. So shifting:
1 << ((sizeof(long long)*8)-1)
will be (most probably):
1 << 63
Since 1 is an integer which is (most probably) 32 bits you get a warning because you are trying to shift past the MSB of a 32 bit value.
The literal 1 you are shifting is not automatically an unsigned long long (but an int) and thus does not have as many bits as you need. Suffix it with ULL (i.e., 1ULL), or cast it to unsigned long long before shifting to make it the correct type.
Also, to be a bit safer for strange platforms, replace 8 with CHAR_BIT. Note that this is still not necessarily the best way to set the most significant bit, see, e.g., this question for alternatives.
You should also consider using a type such as uint64_t if you're assuming unsigned long long to be a certain width, or uint_fast64_t/uint_least64_t if you need at least a certain width, or uintmax_t if you need the largest available type.
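Putting those pieces together, a minimal complete sketch (assuming the usual 64-bit unsigned long long) could look like:
#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned long long x = 0;
    x |= 1ULL << (sizeof x * CHAR_BIT - 1);  /* shift an unsigned long long, not an int */
    printf("%llx\n", x);                     /* 8000000000000000 with a 64-bit unsigned long long */
    return 0;
}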
Thanks to the 2's complement representation of negative integers, the most negative integer is exactly the desired bit pattern, with only the MSB set. So x |= (unsigned long long)LLONG_MIN; should work too.
I'm looking here to understand sign extension:
http://www.shrubbery.net/solaris9ab/SUNWdev/SOL64TRANS/p8.html
struct foo {
    unsigned int base:19, rehash:13;
};

main(int argc, char *argv[])
{
    struct foo a;
    unsigned long addr;

    a.base = 0x40000;
    addr = a.base << 13;                    /* Sign extension here! */
    printf("addr 0x%lx\n", addr);

    addr = (unsigned int)(a.base << 13);    /* No sign extension here! */
    printf("addr 0x%lx\n", addr);
}
They claim this:
------------------ 64 bit:
% cc -o test64 -xarch=v9 test.c
% ./test64
addr 0xffffffff80000000
addr 0x80000000
%
------------------ 32 bit:
% cc -o test32 test.c
% ./test32
addr 0x80000000
addr 0x80000000
%
I have 3 questions:
What is sign extension? Yes, I read the wiki, but I didn't understand what happens with sign extension when type promotion occurs.
Why the ffff... in the 64-bit case (referring to addr)?
When I do the type cast, why is there no sign extension?
EDIT:
4. Why is it not an issue on a 32-bit system?
The left operand of the << operator undergoes the standard promotions, so in your case it is promoted to int -- so far so good. Next, the int of value 0x40000 is multiplied by 2^13, which causes overflow and thus undefined behaviour. However, we can see what's happening: the value of the expression is now simply INT_MIN, the smallest representable int. Finally, when you convert that to an unsigned 64-bit integer, the usual modular arithmetic rules entail that the resulting value is 0xffffffff80000000. Similarly, converting to an unsigned 32-bit integer gives the value 0x80000000.
To perform the operation on unsigned values, you need to control the conversions with a cast:
(unsigned int)(a.base) << 13
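For completeness, a minimal sketch of the test program with the cast applied, which should print 0x80000000 on both the 32-bit and 64-bit builds:
#include <stdio.h>

struct foo {
    unsigned int base:19, rehash:13;
};

int main(void)
{
    struct foo a;
    unsigned long addr;

    a.base = 0x40000;
    /* The cast keeps the shift in unsigned arithmetic, so there is no overflow
       and no sign extension when widening to unsigned long. */
    addr = (unsigned int)a.base << 13;
    printf("addr 0x%lx\n", addr);   /* 0x80000000 */
    return 0;
}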
a.base << 13
The bitwise operator performs integer promotions on both its operands.
So this is equivalent to:
(int) a.base << 13
which is a negative value of type int.
Then:
addr = (int) a.base << 13;
converts this signed negative value ((int) a.base << 13) to the type of addr which is unsigned long through integer conversions.
The integer conversions (C99, 6.3.1.3p2) rule that this is the same as doing:
addr = (long) ((int) a.base << 13);
The conversion to long performs the sign extension here because ((int) a.base << 13) is a negative signed number.
On the other case, with a cast you have something equivalent to:
addr = (unsigned long) (unsigned int) ((int) a.base << 13);
so no sign extension is performed in your second case because (unsigned int) ((int) a.base << 13) is an unsigned (and positive of course) value.
EDIT: as KerrekSB mentioned in his answer, a.base << 13 is actually not representable in an int (I assume a 32-bit int), so this expression invokes undefined behavior and the implementation has the right to behave in any other way, for example crashing.
For information, this is definitely not portable but if you are using gcc, gcc does not consider a.base << 13 here as undefined behavior. From gcc documentation:
"GCC does not use the latitude given in C99 only to treat certain aspects of signed '<<' as undefined, but this is subject to change."
in http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
This is more of a question about bit-fields. Note that if you change the struct to
struct foo {
    unsigned int base, rehash;
};
you get very different results.
As @JensGustedt noted in Type of unsigned bit-fields: int or unsigned int, the specification says:
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int;
Even though you've specified that base is unsigned, the compiler converts it to a signed int when you read it. That's why you don't get sign extension when you cast it to unsigned int.
Sign extension has to do with how negative numbers are represented in binary. The most common scheme is 2s complement. In this scheme, -1 is represented in 32 bits as 0xFFFFFFFF, -2 is 0xFFFFFFFE, etc. So what should be done when we want to convert a 32-bit number to a 64-bit number, for example? If we convert 0xFFFFFFFF to 0x00000000FFFFFFFF, the numbers will have the same unsigned value (about 4 billion), but different signed values (-1 vs. 4 billion). On the other hand, if we convert 0xFFFFFFFF to 0xFFFFFFFFFFFFFFFF, the numbers will have the same signed value (-1) but different unsigned values. The former is called zero-extension (and is appropriate for unsigned numbers) and the latter is called sign-extension (and is appropriate for signed numbers). It's called "sign-extension" because the "sign bit" (the most significant, or left-most bit) is extended, or copied, to make the number wider.
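A small sketch of those two widening behaviours side by side (using the fixed-width types for clarity):
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int32_t  s = -1;             /* bit pattern 0xFFFFFFFF */
    uint32_t u = 0xFFFFFFFFu;    /* same bit pattern, unsigned */

    int64_t  wide_s = s;         /* sign-extended: 0xFFFFFFFFFFFFFFFF */
    uint64_t wide_u = u;         /* zero-extended: 0x00000000FFFFFFFF */

    printf("%" PRIx64 "\n", (uint64_t)wide_s);
    printf("%" PRIx64 "\n", wide_u);
    return 0;
}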
It took me a while and a lot of reading/testing.
Maybe my, beginner way to understand what's going on will get to you (as I got it)
a.base = 0x40000 (1(0)x18) -> a 19-bit bit-field
addr = a.base << 13.
Any value a.base can hold, an int can hold too, so a.base is converted from a 19-bit unsigned int bit-field to a 32-bit signed integer (a.base is now (0)x13,1,(0)x18).
Now (a.base converted to signed int) << 13 results in 1(0)x31. Remember it's a signed int now.
addr = (1(0)x31). addr is of unsigned long type (64 bit), so to do the assignment the right-hand value is converted to long int. The conversion from signed int to long int makes addr (1)x33,(0)x31.
And that's what is printed after all of those conversions you weren't even aware of:
0xffffffff80000000.
Why the second line prints 0x80000000 is because of that cast to (unsigned int) before the conversion to long int. When converting an unsigned int to a long int there is no sign bit to copy, so the value is just padded with leading 0's to match the size, and that's all.
What's different on 32-bit is that during the conversion from 32-bit signed int to 32-bit unsigned long their sizes match, so no sign bits are added:
1(0)x31 will stay 1(0)x31
even after the conversion from int to long int (they have the same size; the value is interpreted differently, but the bits are intact).
Quotation from your link:
Any code that makes this assumption must be changed to work for both
ILP32 and LP64. While an int and a long are both 32-bits in the ILP32
data model, in the LP64 data model, a long is 64-bits.