Chars to 2's complement in C

I have two chars that I want my program to interpret as one 2's complement value. For example, if I have:
char i = 0xFF;
char j = 0xF0;
int k = ((i<<8) | j);
Then I want C to interpret k as 2's complement (so -16 instead of 65520). How do I do this?

int variables, in contrast to unsigned int, are always interpreted as two's complement. Your value is just not -16 :)
after you run your code, k will be (assuming a 32-bit integer width)
k == 0x0000FFF0 // k == 65520
whereas:
-16 == 0xFFFFFFF0
what you can do to overcome this is set the upper 16 bits of k to 1 afterwards (this works here because the 16-bit result is negative, i.e. bit 15 is set):
int k = (i << 8) | j; // k == 0x0000FFF0
k |= ~0xFFFF;         // k == 0xFFFFFFF0 == -16

You want to set all the most significant bits, except the lower 16 to 1. Something like this should do it.
k |= (-1&~0xFFFF);
That said, if your compiler interprets chars as signed (as I think most do) k is already -16.
Furthermore, with signed chars your result will typically be incorrect if j has its most significant bit set (as it does in this case). During the evaluation of the expression, j is going to be type promoted to a negative number with all the most significant bits set. When such a number is ORed with the rest of the expression, those bits are going to override everything else. It only works in this case because i already has all its bits set so it makes no difference either way.

In general, bit operations on signed values in C/C++ may have implementation-defined or undefined results (the specific representation of signed numbers is not mandated; see the specific wording in the section about shifts) - for details see the C99 standard. While most current architectures use two's complement and most compilers will generate correct code, it is unwise to rely on such an assumption - compilers are known to introduce new optimizations that break incorrect code, even if said code has a 'trivial' meaning (to a human).
unsigned char i = 0xFF; // Char might be either signed or unsigned by default
unsigned char j = 0xF0;
uint16_t bit_result = (i << 8) | j; // 0xFFF0
int32_t sign = (bit_result & (1U << 15)) ? -(1U << 15) : 0;
int32_t result = sign + (bit_result & ((1U << 15) - 1));
The above code has no jumps after optimization (with constant propagation of i and j prevented), so it should be nearly as quick as the code below:
// WARNING: Undefined behaviour. Might return wrong value (depending on compiler, processor etc.)
unsigned char i = 0xFF;
unsigned char j = 0xF0;
uint16_t bit_result = (i << 8) | j; // 0xFFF0
int16_t result = bit_result;
In the unlikely event that this is performance-critical code AND the second version is faster, you might consider it. Otherwise I would use the first one as the more correct.

You are compiling your code with a compiler that treats an unqualified char as unsigned. On my system it is treated as signed, and I do get -16. If you really want a 2's complement char, that is, a signed one, then you can write that:
#include <stdio.h>
int main(void)
{
signed char i = 0xFF, j = 0xF0;
printf("%d\n", ((i<<8) | j));
return 0;
}
Just for reference, Annex J.3.4, Implementation-defined behavior, Characters:
Which of signed char or unsigned char has the same range, representation,
and behavior as ‘‘plain’’ char (6.2.5, 6.3.1.1).
And in J.3.5 Implementation-defined behavior Integers
Whether signed integer types are represented using sign and magnitude, two’s
complement, or ones’ complement, and whether the extraordinary value is a trap
representation or an ordinary value (6.2.6.2).
As Maciej correctly points out, it should be noted that shifting negative values to the left is undefined behavior and should thus be avoided, as compilers may assume you will never shift a negative value to the left.
6.5.7 Bitwise shift operators ad 4
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2 , reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

Is there a generic "isolate a single byte" bit mask for all systems, irrespective of CHAR_BIT?

If CHAR_BIT == 8 on your target system (most cases), it's very easy to mask out a single byte:
unsigned char lsb = foo & 0xFF;
However, there are a few systems and C implementations out there where CHAR_BIT is neither 8 nor a multiple thereof. Since the C standard only mandates a minimum range for char values, there is no guarantee that masking with 0xFF will isolate an entire byte for you.
I've searched around trying to find information about a generic "byte mask", but so far haven't found anything.
There is always the O(n) solution:
unsigned char mask = 1;
size_t i;
for (i = 0; i < CHAR_BIT; i++)
{
mask |= (mask << i);
}
However, I'm wondering if there is any O(1) macro or line of code somewhere that can accomplish this, given how important this task is in many system-level programming scenarios.
The easiest way to extract an unsigned char from an integer value is simply to cast it to unsigned char:
(unsigned char) SomeInteger
Per C 2018 6.3.1.3 2, the result is the remainder of SomeInteger modulo UCHAR_MAX+1. (This is a non-negative remainder; it is always adjusted to be greater than or equal to zero and less than UCHAR_MAX+1.)
Assigning to an unsigned char has the same effect, as assignment performs a conversion (and initializing works too):
unsigned char x;
…
x = SomeInteger;
If you want an explicit bit mask, UCHAR_MAX is such a mask. This is so because unsigned integers are pure binary in C, and the maximum value of an unsigned integer has all value bits set. (Unsigned integers in general may also have padding bits, but unsigned char may not.)
One difference can occur in very old or esoteric systems: If a signed integer is represented with sign-and-magnitude or one’s complement instead of today’s ubiquitous two’s complement, then the results of extracting an unsigned char from a negative value will differ depending on whether you use the conversion method or the bit-mask method.
On review (after accept), @Eric Postpischil's point about UCHAR_MAX makes for a preferable mask.
#define BYTE_MASK UCHAR_MAX
The value UCHAR_MAX shall equal 2^CHAR_BIT − 1. C11dr §5.2.4.2.1 2
An unsigned char cannot have padding bits, so UCHAR_MAX is always the all-bits-set pattern in a character type, and hence in a C "byte".
some_signed & some_unsigned is a problem on non-2's-complement systems, as some_signed is converted to unsigned before the &, thus changing the bit pattern on negative values. To avoid this, the all-ones mask needs to be signed when masking signed types. This concern typically arises with foo & UINT_MAX.
Conclusion
Assume: foo is of some integer type.
If only 2's complement is of concern, use a cast - it does not change the bit pattern.
unsigned char lsb = (unsigned char) foo;
Otherwise, with any integer encoding and UCHAR_MAX <= INT_MAX:
unsigned char lsb = foo & UCHAR_MAX;
Otherwise TBD
Shifting an unsigned 1 by CHAR_BIT and then subtracting 1 will work even on esoteric non-2's-complement systems (@Some programmer dude). Be sure to use unsigned math.
On such systems, this preserves the bit pattern, unlike an (unsigned char) cast on negative integers.
unsigned char mask = (1u << CHAR_BIT) - 1u;
unsigned char lsb = foo & mask;
Or make a define
#define BYTE_MASK ((1u << CHAR_BIT) - 1u)
unsigned char lsb = foo & BYTE_MASK;
To also handle those pesky cases where UINT_MAX == UCHAR_MAX where 1u << CHAR_BIT would be UB, shift in 2 steps.
#define BYTE_MASK (((1u << (CHAR_BIT - 1)) << 1u) - 1u)
UCHAR_MAX does not have to be equal to (1U << CHAR_BIT) - 1U;
you actually need to AND with that calculated value, not with UCHAR_MAX:
value & ((1U << CHAR_BIT) - 1U)
Many real implementations (for example TI's) define UCHAR_MAX as 255 and emit code which behaves as it would on a machine with 8-bit bytes. This is done to preserve compatibility with code written for other targets.
For example
unsigned char x;
x++;
will generate code which checks whether the value of x is larger than UCHAR_MAX and, if so, zeroes x.

can't shift negative numbers to the right in c

I am going through 'The C Programming Language' by K&R. Right now I am doing the bitwise section. I am having a hard time understanding the following code.
int mask = ~0 >> n;
I was planning on using this to mask the n left-side bits of another binary number, like this:
0000 1111
1010 0101 // random number
My problem is that when I print the mask variable it is still -1. Assuming n is 4, I thought shifting ~0, which is -1, would give 15 (0000 1111).
thanks for the answers
Performing a right shift on a negative value yields an implementation defined value. Most hosted implementations will shift in 1 bits on the left, as you've seen in your case, however that doesn't necessarily have to be the case.
Unsigned types as well as positive values of signed types always shift in 0 bits on the left when shifting right. So you can get the desired behavior by using unsigned values:
unsigned int mask = ~0u >> n;
This behavior is documented in section 6.5.7 of the C standard:
5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
Right-shifting negative signed integers is an implementation-defined behavior, which is usually (but not always) filling the left with ones instead of zeros. That's why no matter how many bits you've shifted, it's always -1, as the left is always filled by ones.
When you shift unsigned integers, the left will always be filled by zeros. So you can do this:
unsigned int mask = ~0U >> n;
(note the U suffix)
You should also note that int is typically 2 or 4 bytes, meaning if you want to get 15, you need to right-shift 12 or 28 bits instead of only 4. You can use a char instead:
unsigned char mask = ~0U;
mask >>= 4;
In C, and many other languages, >> is (usually) an arithmetic right shift when performed on signed variables (like int). This means that the new bit shifted in from the left is a copy of the previous most-significant bit (MSB). This has the effect of preserving the sign of a two's complement negative number (and in this case the value).
This is in contrast to a logical right shift, where the MSB is always replaced with a zero bit. This is applied when your variable is unsigned (e.g. unsigned int).
From Wikipedia:
The >> operator in C and C++ is not necessarily an arithmetic shift. Usually it is only an arithmetic shift if used with a signed integer type on its left-hand side. If it is used on an unsigned integer type instead, it will be a logical shift.
In your case, if you plan to be working at a bit level (i.e. using masks, etc.) I would strongly recommend two things:
Use unsigned values.
Use types with specific sizes from <stdint.h> like uint32_t

Bitshift on assignment has no effect on the variable

I thought I'd found something similar in this answer but in that case they weren't assigning the result of the expression to the variable. In my case I am assigning it but the bitshift part of the expression has no effect.
unsigned leftmost1 = ((~0)>>20);
printf("leftmost1 %u\n", leftmost1);
Returns
leftmost1 4294967295
Whereas
unsigned leftmost1 = ~0;
leftmost1 = leftmost1 >> 20;
printf("leftmost1 %u\n", leftmost1);
Gives me
leftmost1 4095
I would expect separating the logic into two lines would have no impact, why are the results different?
In the first case, you are doing a signed right shift, because ~0 results in a signed value. The exact behavior of signed right shifts is implementation-defined, but most platforms, including yours, extend the sign bit, so the shift is a no-op for your input of "all ones".
In the second case, you are doing an unsigned right shift, since leftmost1 is an unsigned value. So you shift in zeros from the left.
If you want to do an unsigned shift without the intermediate assignment, you can write:
(~0u) >> 20
Where the u suffix indicates an unsigned literal.
~0 is an int. So your first piece of code isn't equivalent to the second, it's equivalent to
int tmp = ~0;
tmp = tmp >> 20;
unsigned leftmost1 = tmp;
You're seeing the results of sign extension when you right-shift a negative number.
0 has type int. ~0 is -1 on a typical two's complement machine. Right-shifting a negative number has implementation-defined results, but a common choice is to shift in 1 bits, which for -1 leaves the number unchanged (i.e. -1 >> anything is -1 again).
You can fix this by writing 0u (which is a literal of type unsigned int). This forces the operations to be done in unsigned int, as in your second example:
unsigned leftmost1 = ~0;
This line is equivalent to unsigned leftmost1 = -1, which implicitly converts -1 (a signed int) to UINT_MAX. The following operation (leftmost1 >> 20) then uses unsigned arithmetic.
Try casting, like this. ~0 has type int, which is signed, so it carries the sign bit when you shift.
unsigned leftmost1 = ((unsigned)(~0)>>20);
printf("leftmost1 %u\n", leftmost1);

Why first bitshifting left and then right, instead of AND-ing?

I came across this piece of C code:
typedef int gint;
// ...
gint a, b;
// ...
a = (b << 16) >> 16;
For ease of notation let's assume that b = 0x11223344 at this point. As far as I can see it does the following:
b << 16 will give 0x33440000
>> 16 will give 0x00003344
So, the 16 highest bits are discarded.
Why would anyone write (b << 16) >> 16 if b & 0x0000ffff would work as well? Isn't the latter form more understandable? Is there any reason to use bitshifts in a case like this? Is there any edge-case where the two could not be the same?
Assuming that the size of int is 32 bits, then there is no need to use shifts. Indeed, bitwise & with a mask would be more readable, more portable and safer.
It should be noted that left-shifting on negative signed integers invokes undefined behavior, and that left-shifting things into the sign bits of a signed integer could also invoke undefined behavior. C11 6.5.7 (emphasis mine):
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
(The only possible rationale I can think of is some premature optimization for a 16-bit CPU that comes with a poor compiler. Then the code would be more efficient if you broke up the arithmetic into 16-bit chunks. But on such a system, int would most likely be 16 bits, so the code wouldn't make any sense then.)
As a side note, it doesn't make any sense to use the signed int type either. The most correct and safe type for this code would have been uint32_t.
So, the 16 highest bits are discarded.
They are not. Though it is formally implementation-defined how right-shift operation is performed on signed types, most compilers do it so as to replicate the sign bit.
Thus, in the result of this expression, the 16 highest bits are filled with copies of bit 15.
For an unsigned integral type (eg, the uint32_t we first thought was being used),
(b << 16) >> 16
is identical to b & ((1 << 16) - 1).
For a signed integral type though,
(b << 16)
could become negative (ie, the low int16_t would have been considered negative when taken on its own), in which case
(b << 16) >> 16
will (probably) still be negative due to sign extension. In that case, it isn't the same as the & mask, because the top bits will be set instead of zero.
Either this behaviour is deliberate (in which case the commented-out typedef is misleading), or it's a bug. I can't tell without reading the code.
Oh, and the shift behaviour in both directions is how I'd expect gcc to behave on x86, but I can't comment on how portable it is outside that. The left-shift may be UB as Lundin points out, and sign extension on the right-shift is implementation defined.

Assigning bits to a 64-bit variable

I am kinda new to bit operations. I am trying to store information in an int64_t variable like this:
int64_t u = 0;
for(i=0;i<44;i++)
u |= 1 << i;
for(;i<64;i++)
u |= 0 << i;
int t = __builtin_popcountl(u);
and what I intended with this was to store 44 1s in variable u and make sure that the remaining positions are all 0, so "t" returns 44. However, it always returns 64. With other variables, e.g. int32, it also fails. Why?
The type of an expression is generally determined by the expression itself, not by the context in which it appears.
Your variable u is of type int64_t (incidentally, uint64_t would be better since you're performing bitwise operations).
In this line:
u |= 1 << i;
since 1 is of type int, 1 << i is also of type int. If, as is typical, int is 32 bits, this has undefined behavior for larger values of i.
If you change this line to:
u |= (uint64_t)1 << i;
it should do what you want.
You could also change the 1 to 1ULL. That gives it a type of unsigned long long, which is guaranteed to be at least 64 bits but is not necessarily the same type as uint64_t.
__builtin_popcountl takes unsigned long as its parameter, which is not always a 64-bit integer. I personally use __builtin_popcountll, which takes unsigned long long. It looks like that's not the problem in your case, though.
Integer literals have type int by default, and by shifting an int by anything greater than or equal to its width in bits (typically 32), you get undefined behavior. Correct usage: u |= 1LL << i; Here LL stands for long long.
ORing with zero does nothing. You can't just set a bit to a particular value with OR: either OR with a mask (if you want to set some bits to 1) or AND with the mask's negation (if you want to set some bits to 0); negation is done with tilde (~).
When you shift into the high bit of the 32-bit integer and then convert to 64 bits, the sign bit extends through the upper 32 bits, which you then OR in, setting all 64 bits, because your literal '1' is a signed 32-bit int by default. The shift itself does not affect the upper 32 bits, since the value is only 32 bits wide; the conversion to 64 bits does, when the value being converted is negative.
This can be fixed by writing your first loop like this:
for(i=0;i<44;i++)
u |= (int64_t)1 << i;
Moreover, this loop does nothing since ORing with 0 will not alter the value:
for(;i<64;i++)
u |= 0 << i;