I thought I'd found something similar in this answer but in that case they weren't assigning the result of the expression to the variable. In my case I am assigning it but the bitshift part of the expression has no effect.
unsigned leftmost1 = ((~0)>>20);
printf("leftmost1 %u\n", leftmost1);
Returns
leftmost1 4294967295
Whereas
unsigned leftmost1 = ~0;
leftmost1 = leftmost1 >> 20;
printf("leftmost1 %u\n", leftmost1);
Gives me
leftmost1 4095
I would expect separating the logic into two lines would have no impact, why are the results different?
In the first case, you are doing a signed right shift, because ~0 results in a signed value. The exact behavior of signed right shifts is implementation-defined, but most platforms, including yours, extend the sign bit, so the shift is a no-op for your input of "all ones".
In the second case, you are doing an unsigned right shift, since leftmost1 is an unsigned value. So you shift in zeros from the left.
If you wanted to do an unsigned shift without the intermediate assignment, you can do:
(~0u) >> 20
Where the u suffix indicates an unsigned literal.
~0 is an int. So your first piece of code isn't equivalent to the second, it's equivalent to
int tmp = ~0;
tmp = tmp >> 20;
unsigned leftmost1 = tmp;
You're seeing the results of sign extension when you right-shift a negative number.
0 has type int. ~0 is -1 on a typical two's complement machine. Right-shifting a negative number has implementation-defined results, but a common choice is to shift in 1 bits, which for -1 leaves the number unchanged (i.e. -1 >> anything is -1 again).
You can fix this by writing 0u (which is a literal of type unsigned int). This forces the operations to be done in unsigned int, as in your second example:
unsigned leftmost1 = ~0;
This line is equivalent to unsigned leftmost1 = -1, which implicitly converts -1 (a signed int) to UINT_MAX. The following operation (leftmost1 >> 20) then uses unsigned arithmetic.
Try casting like this. In ~0, the 0 is an int, which is signed, so the result carries the sign bit when you shift:
unsigned leftmost1 = ((unsigned)(~0)>>20);
printf("leftmost1 %u\n", leftmost1);
Related
I feel like a complete beginner. Why does the following not work:
// declarations
unsigned short currentAddr= 0x0000;
unsigned short addr[20] = {1, 0};
// main
addr[1] = (~currentAddr)/2+1;
printf("addr[1] value: %hu\n", addr[1]); // equals 1, expected 0x8000
addr[1] = ~currentAddr>>1;
printf("addr[1] value: %hu\n", addr[1]); // equals 65535, expected 0x7FFF
In printf and also in my debugger's watchlist the value for addr[1] is not as expected. My aim is to have half the maximum of the variable, here 0x8000.
Info: I am doing ~currentAddr to get the maximum value (0xFFFF) in case short has a different width on my embedded platform than here on my PC.
cheers, Stefan
What went wrong
The integer promotions are performed on the operand of the unary ~.
On many systems int is larger than short. On such systems, for unsigned short currentAddr = 0, the value of currentAddr is first promoted to int in the expression ~currentAddr. Then ~currentAddr evaluates to -1 (assuming two's complement representation).
On some systems int and short may be the same size (though int must be at least as large as short); here currentAddr would instead be promoted to unsigned int since an int cannot hold all values of an unsigned integer type of the same size. In such a case, ~currentAddr would evaluate to UINT_MAX. For 16-bit int (short must be at least 16-bit, so here int and short would be the same size) the result of ~currentAddr would be 65,535.
The OP's system must have int larger than short. In the case of addr[1] = (~currentAddr)/2+1; this becomes addr[1] = (-1)/2+1; which evaluates to 1.
In the second case, addr[1] = ~currentAddr>>1; evaluates to addr[1] = (-1)>>1;. Here, the result of right-shifting a negative value is implementation-defined. In the present case, the result appears to be INT_MAX, which is converted to unsigned short in the assignment to addr[1], which takes the value USHRT_MAX in the conversion. This value is 65,535 on OP's system.
What to do about it
To obtain maximum and minimum values for standard integer types clearly and reliably, use the macros found in limits.h instead of attempting bit manipulations. This method will not disappoint:
#include <stdio.h>
#include <limits.h>
int main(void)
{
    unsigned short val;

    val = (USHRT_MAX / 2) + 1;
    printf("(USHRT_MAX / 2) + 1: %#hx\n", val);

    val = USHRT_MAX >> 1;
    printf("     USHRT_MAX >> 1: %#hx\n", val);

    return 0;
}
Program output:
(USHRT_MAX / 2) + 1: 0x8000
USHRT_MAX >> 1: 0x7fff
The problem lies here:
addr[1] = (~currentAddr)/2+1;
You expect ~currentAddr to be 0xFFFF, which is partially right. What you might have missed is the integer promotion rule, which makes it 0xFFFFFFFF, the 32-bit hexadecimal representation of -1.
Now it is simple math:
(~currentAddr)/2+1 is nothing but 0x01, or 1. When you do the shift in ~currentAddr>>1;, the result again becomes -1.
From
My aim is to have half the maximum of the variable, here 0x8000
If I understand you correctly, what you are trying to do is get the value which is equal to (Maximum value of Unsigned short)/2. If it is so, the proper way of doing it will be using USHRT_MAX. Of course, you'll need to include limits.h file in your source code.
Update:
Referring to your comments on David's answer, the following change works as expected. (You have tested it; I haven't.)
unsigned short c;
c = ~currentAddr;
unsigned short c_z = sizeof (c);
unsigned short ci;
ci = (c >> 1) + 1;
unsigned short ci_z = sizeof (ci);
addr[1] = ci;
Now, why isn't this promoted to an integer, as in the previous case?
c = ~currentAddr;
It is promoted, but it yields the expected result because, as chux explained (which I couldn't have done better), it is temporarily promoted to int during the operation, but converted back to unsigned short when it is stored in the memory allocated to c.
The C standard answers the question:
From the C99 standard: 6.5.16.1 Simple assignment
In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.
In your case the LHS and RHS have different types (unsigned short and int), so the value of ~currentAddr is converted to unsigned short, which discards the high-order bits.
Also, it says:
The type of an assignment expression is the type the left operand would have after lvalue conversion.
The same is specified by C11 6.5.16.1/2:
In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.
Try this yourself:
int main(void)
{
    unsigned short c;
    unsigned short currentAddr = 0x0000;

    c = ~currentAddr;
    printf("\n0x%x", c);
    printf("\n0x%x", (~currentAddr));
    return 0;
}
This should print:
0xffff
0xffffffff
addr[1] = (~currentAddr)/2+1;
Let us break it down: currentAddr is an unsigned short involved in a computation so the value/type is first promoted to int or unsigned. In C this is integer promotion.
If an int can represent all values of the original type ..., the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions. C11dr §6.3.1.1 2
When USHRT_MAX <= INT_MAX, (e.g. 16 bit short int/unsigned, 32-bit int/unsigned), code is like below. With currentAddr == 0 and typical 2's complement behavior, ~0 --> -1 and addr[1] --> 1.
int tmp = currentAddr;
addr[1] = (~tmp)/2+1;
When USHRT_MAX > INT_MAX, (e.g. 16 bit short int/unsigned, 16-bit int/unsigned), code is like below. With currentAddr == 0 and unsigned behavior, ~0 --> 0xFFFF and addr[1] --> 0x8000.
unsigned tmp = currentAddr;
addr[1] = (~tmp)/2+1;
My aim is to have half the maximum of the variable
The best way to get the maximum of an unsigned short is to use USHRT_MAX and skip the ~ code. It will work as expected regardless of the unsigned short, int, unsigned ranges. It also better documents code intent.
#include <limits.h>
addr[1] = USHRT_MAX/2+1;
Because the literal 2 is an int and int can hold all values of unsigned short, the actual operation is addr[1] = (unsigned short)(((int)(~currentAddr))/2+1)
When coding in C, I accidentally found that for non-ASCII characters, after they are converted from char (1 byte) to int (4 bytes), the extra bits (3 bytes) are filled with 1s rather than 0s. (For ASCII characters, the extra bits are filled with 0s.) For example:
char c[] = "ā";
int i = c[0];
printf("%x\n", i);
And the result is ffffffc4, rather than c4 itself. (The UTF-8 code for ā is \xc4\x81.)
Another related issue is that when performing a right shift >> on a non-ASCII character, the bits shifted in at the left end are also 1s rather than 0s, even though the char variable is explicitly converted to unsigned int (for a signed int, the shifted-in bits are 1s on my OS). For example:
char c[] = "ā";
unsigned int u_c;
int i = c[0];
unsigned int u_i = c[0];
c[0] = (unsigned int)c[0] >> 1;
u_c = (unsigned int)c[0] >> 1;
i = i >> 1;
u_i = u_i >> 1;
printf("c=%x\n", (unsigned int)c[0]); // result: ffffffe2. The same with the signed int i.
printf("u_c=%x\n", u_c); // result: 7fffffe2.
printf("i=%x\n", i); // result: ffffffe2.
printf("u_i=%x\n", u_i); // result: 7fffffe2.
Now I am confused by these results... Are they related to the representations of char, int and unsigned int, to my operating system (Ubuntu 14.04), or to the ANSI C requirements? I have tried compiling this program with both gcc (4.8.4) and clang (3.4), but there is no difference.
Thank you so much!
It is implementation-defined whether char is signed or unsigned. On x86 computers, char is customarily a signed integer type; and on ARM it is customarily an unsigned integer type.
A signed integer will be sign-extended when converted to a larger signed type;
a signed integer converted to unsigned integer will use the modulo arithmetic to wrap the signed value into the range of the unsigned type as if by repeatedly adding or subtracting the maximum value of the unsigned type + 1.
The solution is to use/cast to unsigned char if you want the value to be portably zero-extended, or for storing small integers in range 0..255.
Likewise, if you want to store signed integers in the range -127..127 (or -128..127 on two's complement), use signed char.
Use char if the signedness doesn't matter - the implementation will probably have chosen the type that is the most efficient for the platform.
Likewise, for the assignment
unsigned int u_c; u_c = (uint16_t)c[0];,
since -0x3C (i.e. -60) is not in the range of uint16_t, the actual value is the value (mod UINT16_MAX + 1) that falls in the range of uint16_t; in other words, we add or subtract UINT16_MAX + 1 (notice that the integer promotions can play a trick here, so you might need casts in C code) until the value is in range. UINT16_MAX is naturally always 0xFFFF; add 1 to it to get 0x10000. 0x10000 - 0x3C is the 0xFFC4 that you saw. Then the uint16_t value is zero-extended to the uint32_t value.
Had you run this on a platform where char is unsigned, the result would have been 0xC4!
BTW in i = i >> 1;, i is a signed integer with a negative value; C11 says that the value is implementation-defined, so the actual behaviour can change from compiler to compiler. The GCC manuals state that
Signed >> acts on negative numbers by sign extension.
However a strictly-conforming program should not rely on this.
What I'm trying to do is make a mask with a 1 bit all the way to the left side of the set of bits with the rest being zero, irrespective of variable size. I tried the following:
unsigned char x = ~(~0 >> 1);
which, to me, should work whether it's done on a char or an int, but it doesn't!
To me, the manipulation looks like this:
||||||||
0|||||||
|0000000
This is what it appears it should look like, and on a 16-bit integer:
|||||||| ||||||||
0||||||| ||||||||
|0000000 00000000
Why doesn't this construct work? It's giving me zero whether I try to assign it to an unsigned char, or an int.
I'm on about page 50 of K&R, so I'm pretty new. I don't know what a literal means, I'm not sure what an "arithmetic" shift is, I don't know how to use suffixes, and I damn sure can't use a structure.
~0 is the int zero with all bits inverted, which is the int consisting of all ones. On a two's complement machine, this is -1. Right-shifting -1 will cause sign extension, so ~0 >> 1 is still all ones.
What you want is to right shift an unsigned quantity, which will not invoke sign extension.
~0u >> 1
is an unsigned integer with the high order bit zero and all others set to 1, so
~(~0u >> 1)
is an unsigned integer with the high-order bit one and all others set to zero.
Now getting this to work for all data sizes is nontrivial because C converts the operands of integer arithmetic to int or unsigned int beforehand. For example,
~(unsigned char)0 >> 1
produces an int result of -1 because the unsigned char is "promoted" to int before the ~ is applied.
So to get what you want with all data types, the only way I can see is to use sizeof to see how many bytes (or octets) are in the data.
#include <stdio.h>
#include <limits.h>

/* 1u keeps the shift in unsigned arithmetic, avoiding signed-overflow
   undefined behavior when the top bit of an int would be set. */
#define LEADING_ONE(X) (1u << (CHAR_BIT * sizeof(X) - 1))

int main(void) {
    printf("%x\n", LEADING_ONE(char));
    printf("%x\n", LEADING_ONE(int));
    return 0;
}
The general rule in C is that expressions are evaluated in a common type, in this case (signed) int. The results of (~0) and (~0 >> 1) are signed integers, and the shift is an arithmetic shift. In your case that is implemented with sign extension, so:
(0xffffffff >> 1) => (0xffffffff)
A logical shift will inject the zero on the left that you were expecting, so your problem is how to make the compiler do a logical shift. Try:
unsigned char a = ~0;
unsigned char b = a >> 1; // this should do a logical shift
unsigned char c = ~b;
There are better ways to do what you are trying, but this should get you over the current problem.
There are two things that are giving you the unexpected result.
You are starting out with 0, which is treated as a signed int.
The intermediate results get converted to int.
If you work with unsigned char at strategic points, you should be OK.
unsigned char c = ((unsigned char)~0 >> 1);
c = ~c;
I'm looking here to understand sign extension:
http://www.shrubbery.net/solaris9ab/SUNWdev/SOL64TRANS/p8.html
struct foo {
    unsigned int base:19, rehash:13;
};

main(int argc, char *argv[])
{
    struct foo a;
    unsigned long addr;

    a.base = 0x40000;
    addr = a.base << 13; /* Sign extension here! */
    printf("addr 0x%lx\n", addr);

    addr = (unsigned int)(a.base << 13); /* No sign extension here! */
    printf("addr 0x%lx\n", addr);
}
They claim this:
------------------ 64 bit:
% cc -o test64 -xarch=v9 test.c
% ./test64
addr 0xffffffff80000000
addr 0x80000000
%
------------------ 32 bit:
% cc -o test32 test.c
% ./test32
addr 0x80000000
addr 0x80000000
%
I have 3 questions:
What is sign extension? Yes, I read the wiki article, but I didn't understand what happens with sign extension when type promotion occurs.
Why the ffff... in the 64-bit case (referring to addr)?
When I do the type cast, why is there no sign extension?
EDIT:
4. Why is this not an issue on a 32-bit system?
The left operand of the << operator undergoes the standard promotions, so in your case it is promoted to int -- so far so good. Next, the int of value 0x40000 is multiplied by 2^13, which causes overflow and thus undefined behaviour. However, we can see what's happening: the value of the expression is now simply INT_MIN, the smallest representable int. Finally, when you convert that to an unsigned 64-bit integer, the usual modular arithmetic rules entail that the resulting value is 0xffffffff80000000. Similarly, converting to an unsigned 32-bit integer gives the value 0x80000000.
To perform the operation on unsigned values, you need to control the conversions with a cast:
(unsigned int)(a.base) << 13
a.base << 13
The shift operator performs the integer promotions on both of its operands.
So this is equivalent to:
(int) a.base << 13
which is a negative value of type int.
Then:
addr = (int) a.base << 13;
converts this signed negative value ((int) a.base << 13) to the type of addr which is unsigned long through integer conversions.
Integer conversions (C99, 6.3.1.3p2) rules that is the same as doing:
addr = (long) ((int) a.base << 13);
The conversion to long performs the sign extension here because ((int) a.base << 13) is a negative signed number.
On the other case, with a cast you have something equivalent to:
addr = (unsigned long) (unsigned int) ((int) a.base << 13);
so no sign extension is performed in your second case because (unsigned int) ((int) a.base << 13) is an unsigned (and positive of course) value.
EDIT: as KerrekSB mentioned in his answer, a.base << 13 is actually not representable in an int (I assume 32-bit int), so this expression invokes undefined behavior and the implementation has the right to behave in any way, for example crashing.
For information, this is definitely not portable, but if you are using gcc, gcc does not consider a.base << 13 here to be undefined behavior. From the gcc documentation:
"GCC does not use the latitude given in C99 only to treat certain aspects of signed '<<' as undefined, but this is subject to change."
in http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
This is more of a question about bit-fields. Note that if you change the struct to
struct foo {
unsigned int base, rehash;
};
you get very different results.
As #JensGustedt noted in Type of unsigned bit-fields: int or unsigned int the specification says:
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int;
Even though you've specified that base is unsigned, the compiler converts it to a signed int when you read it. That's why you don't get sign extension when you cast it to unsigned int.
Sign extension has to do with how negative numbers are represented in binary. The most common scheme is 2s complement. In this scheme, -1 is represented in 32 bits as 0xFFFFFFFF, -2 is 0xFFFFFFFE, etc. So what should be done when we want to convert a 32-bit number to a 64-bit number, for example? If we convert 0xFFFFFFFF to 0x00000000FFFFFFFF, the numbers will have the same unsigned value (about 4 billion), but different signed values (-1 vs. 4 billion). On the other hand, if we convert 0xFFFFFFFF to 0xFFFFFFFFFFFFFFFF, the numbers will have the same signed value (-1) but different unsigned values. The former is called zero-extension (and is appropriate for unsigned numbers) and the latter is called sign-extension (and is appropriate for signed numbers). It's called "sign-extension" because the "sign bit" (the most significant, or left-most bit) is extended, or copied, to make the number wider.
It took me a while and a lot of reading/testing.
Maybe my, beginner way to understand what's going on will get to you (as I got it)
a.base = 0x40000 (1,(0)x18) -> a 19-bit bit-field
addr = a.base << 13.
Any value a.base can hold, int can hold too, so a.base is converted from a 19-bit unsigned bit-field to a 32-bit signed int (a.base is now (0)x13,1,(0)x18).
Now (a.base converted to signed int) << 13, which results in 1,(0)x31. Remember it's a signed int now.
addr = (1,(0)x31). addr is of unsigned long type (64-bit), so to do the assignment the right-hand value is converted to 64 bits. Conversion of the negative signed int to 64 bits sign-extends, making addr (1)x33,(0)x31.
And that's what gets printed after all of those conversions you weren't even aware of:
0xffffffff80000000.
The second line prints 0x80000000 because of the cast to (unsigned int) before the widening. When converting unsigned int to unsigned long there is no sign bit, so the value is just padded with leading 0s to match the size, and that's all.
What's different on 32-bit is that during the conversion from the 32-bit signed int to a 32-bit unsigned long the sizes match, so no sign bits are added:
1,(0)x31 will stay 1,(0)x31
even after the conversion from int to unsigned long (they have the same size; the value is interpreted differently, but the bits are intact).
Quotation from your link:
Any code that makes this assumption must be changed to work for both
ILP32 and LP64. While an int and a long are both 32-bits in the ILP32
data model, in the LP64 data model, a long is 64-bits.
I have two chars that I want my program to interpret as one two's complement value. For example if I have:
char i = 0xFF;
char j = 0xF0;
int k = ((i<<8) | j);
Then I want C to interpret k as two's complement (so -16 instead of 65520). How do I do this?
int variables, in contrast to unsigned int, are interpreted as two's complement (on virtually all modern platforms). Your value is just not -16 :)
After you run your code, k will be (assuming a 32-bit integer width)
k == 0x0000FFF0 // k == 65520
whereas:
-16 == 0xFFFFFFF0
what you can do, to overcome this, is setting all bits of k to 1 beforehand
int k = -1; // k == 0xFFFFFFFF
k &= ((i << 8) | j); // k == 0xFFFFFFF0
You want to set all the most significant bits, except the lower 16 to 1. Something like this should do it.
k |= (-1&~0xFFFF);
That said, if your compiler interprets chars as signed (as I think most do) k is already -16.
Furthermore, with signed chars your result will typically be incorrect if j has its most significant bit set (as it does in this case). During the evaluation of the expression, j is promoted to a negative int with all the most significant bits set. When such a number is ORed with the rest of the expression, those bits override everything else. It only works in this case because i already has all its bits set, so it makes no difference either way.
In general, bit operations in C/C++ on signed values may have undefined results (the specific representation of numbers is not specified; see the wording in the section about shifts); for details see the C99 standard. While most architectures currently use two's complement and most compilers will generate correct code, it is unwise to rely on such an assumption: compilers are known to introduce new optimizations that break incorrect code, even if said code has a 'trivial' meaning (for a human).
unsigned char i = 0xFF; // Char might be either signed or unsigned by default
unsigned char j = 0xF0;
uint16_t bit_result = (i << 8) | j; // 0XFFF0
int32_t sign = (bit_result & (1U << 15)) ? -(int32_t)(1U << 15) : 0;
int32_t result = sign + (bit_result & ((1U << 15) - 1));
The above code has no jumps after optimization (I prevented constant propagation of i and j), so it should be nearly as quick as the code below:
// WARNING: Undefined behaviour. Might return wrong value (depending on compiler, processor etc.)
unsigned char i = 0xFF;
unsigned char j = 0xF0;
uint16_t bit_result = (i << 8) | j; // 0xFFF0
int16_t result = bit_result;
In the unlikely event that this is performance-critical code AND the second version is faster, you might consider the second one. Otherwise I would use the first one as more correct.
You are compiling your code with a compiler that takes an unqualified char as unsigned. On my system it is taken as signed and I do get -16. If you really want a 2's complement char, that is, a signed one, then you can write that:
#include <stdio.h>

int main(void)
{
    signed char i = 0xFF, j = 0xF0;

    printf("%d\n", ((i<<8) | j));
    return 0;
}
Just for reference, Appendix J.3.4 Implementation-defined behavior Characters
Which of signed char or unsigned char has the same range, representation,
and behavior as ‘‘plain’’ char (6.2.5, 6.3.1.1).
And in J.3.5 Implementation-defined behavior Integers
Whether signed integer types are represented using sign and magnitude, two’s
complement, or ones’ complement, and whether the extraordinary value is a trap
representation or an ordinary value (6.2.6.2).
As Maciej correctly points out, it should be noted that left-shifting negative values is undefined behavior and thus should be avoided, as compilers may assume you will never shift a negative value to the left.
6.5.7 Bitwise shift operators ad 4
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2 , reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.