Arithmetic right-shift of signed integer - c

The C99 spec states:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
I'm curious to know, which implementations/compilers will not treat a signed E1 >> 31 as a bunch of 11111....?

Most embedded compilers for microcontrollers tend to favour logical shift (shift in zeroes) instead of arithmetic shift (shift in the sign bit).
This is probably because signed numbers are somewhat rare in embedded systems: such programming sits much closer to the hardware and further away from users than, for example, desktop programming with screens.
Signed numbers are largely a matter of user presentation, after all. If you have no need to print numbers to a user, then you rarely need signed numbers at all.
And of course, it rarely makes sense to use shifts on signed numbers to begin with. I have never in my programming career encountered a scenario where I needed to do that, which means that in most cases such shifts are just accidental bugs.

You can simulate a signed, 2's complement arithmetic right shift using unsigned types without the use of if statements. For example:
#include <limits.h>

/* Shift logically, then OR 1s over the vacated high bits if the original
   sign bit (MSB) was set, giving an arithmetic (sign-extending) right shift. */
unsigned int asr(unsigned int x, unsigned int shift)
{
    return (x >> shift) | -((x & ~(UINT_MAX >> 1)) >> shift);
}
You might need to use a different unsigned type and its associated max value in your code.
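For illustration, here is a rough sketch of how the helper above might be used with a signed value (the variable names and the final cast back to int are my own; converting an out-of-range unsigned result back to int is itself implementation-defined, so this only illustrates typical two's complement platforms):

#include <stdio.h>
#include <limits.h>

/* The asr() helper from above, repeated so the example compiles on its own. */
static unsigned int asr(unsigned int x, unsigned int shift)
{
    return (x >> shift) | -((x & ~(UINT_MAX >> 1)) >> shift);
}

int main(void)
{
    int value = -100;

    /* Converting int to unsigned int is well defined (modulo UINT_MAX + 1);
       on a two's complement machine the bit pattern is unchanged. */
    unsigned int result = asr((unsigned int)value, 3);

    /* Converting the result back to int is implementation-defined when it
       exceeds INT_MAX; on common platforms this prints -13. */
    printf("%d arithmetically shifted right by 3 = %d\n", value, (int)result);
    return 0;
}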

Related

What is the safest cross-platform way to get the low byte or the high byte of a 16-bit integer?

While looking at various SDKs, it seems LOBYTE and HIBYTE are rarely consistent, as shown below.
Windows
#define LOBYTE(w) ((BYTE)(((DWORD_PTR)(w)) & 0xff))
#define HIBYTE(w) ((BYTE)((((DWORD_PTR)(w)) >> 8) & 0xff))
Various Linux Headers
#define HIBYTE(w) ((u8)(((u16)(w) >> 8) & 0xff))
#define LOBYTE(w) ((u8)(w))
Why is & 0xff needed if it's cast to a u8? Why wouldn't the following be the way to go? (assuming uint8_t and uint16_t are defined)
#define HIBYTE(w) ((uint8_t)(((uint16_t)(w) >> 8)))
#define LOBYTE(w) ((uint8_t)(w))
From ISO/IEC 9899:TC3, 6.3.1.3 Signed and unsigned integers (under 6.3 Conversions):
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
While that sounds a little convoluted, it answers the following question.
Why is & 0xff needed if it's cast to a u8?
It is not needed, because the cast does the masking automatically.
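A small sketch (my own, assuming <stdint.h> types are available) showing that the cast alone gives the same result as masking first:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t w = 0xABCD;

    /* Per 6.3.1.3, the conversion to uint8_t reduces the value modulo 256,
       so the explicit "& 0xff" is redundant. */
    uint8_t lo_with_mask    = (uint8_t)(w & 0xff);
    uint8_t lo_without_mask = (uint8_t)w;

    printf("%02X %02X\n", (unsigned)lo_with_mask, (unsigned)lo_without_mask);  /* both print CD */
    return 0;
}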
When it comes to the question in the topic, the OP's last suggestion is:
#define HIBYTE(w) ((uint8_t)(((uint16_t)(w) >> 8)))
#define LOBYTE(w) ((uint8_t)(w))
That will work as expected for all unsigned values. Signed values will always be converted to unsigned values by the macros, which in the case of two's complement will not change the representation, so the results of the calculations are well defined. Assuming two's complement, however, is not portable, so the solution is not strictly portable for signed integers.
Implementing a portable solution for signed integers would be quite difficult, and one could even question the meaning of such an implementation:
Is the result supposed to be signed or unsigned?
If the result is supposed to be unsigned, it does not really qualify as the high/low byte of the initial number, since a change of representation might be necessary to obtain it.
If the result is supposed to be signed, it would have to be further specified. The result of >> for negative values, for instance, is implementation-defined, so getting a portable, well-defined "high byte" sounds challenging. One should really question the purpose of such a calculation.
And since we are playing language lawyer, we might want to wonder about the signedness of the left operand of (uint16_t)(w) >> 8. Unsigned might seem like the obvious answer, but it is not, because of the integer promotion rules.
Integer promotion applies, among others, to objects or expressions specified as follows.
An object or expression with an integer type whose integer conversion rank is less than or equal to the rank of int and unsigned int.
The integer promotion rule in such a case is specified as:
If an int can represent all values of the original type, the value is converted to an int;
That will be the case for the left operand on a typical 32-bit or 64-bit machine.
Fortunately in such a case, the left operand after conversion will still be nonnegative, which makes the result of >> well defined:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2.
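To make the promotion visible, here is a small sketch (my own, assuming a typical platform where int is wider than 16 bits): the shifted operand has the size of int, yet the shift remains well defined because the promoted value is nonnegative.

#include <stdio.h>
#include <stdint.h>

#define HIBYTE(w) ((uint8_t)(((uint16_t)(w) >> 8)))

int main(void)
{
    uint16_t w = 0xFF00;

    /* (uint16_t)(w) is promoted to int before the shift, so the expression
       has type int; its value (65280) is nonnegative, so >> is well defined. */
    printf("size of the shift expression: %zu\n", sizeof((uint16_t)w >> 8));
    printf("HIBYTE: %X\n", (unsigned)HIBYTE(w));  /* FF */
    return 0;
}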

can't shift negative numbers to the right in c

I am going through 'The C Programming Language' by K&R. Right now I am working through the bitwise section, and I am having a hard time understanding the following code.
int mask = ~0 >> n;
I was planning on using this to mask the n leftmost bits of another binary number, like this:
0000 1111
1010 0101 // random number
My problem is that when I print the mask variable, it is still -1. Assuming n is 4, I thought shifting ~0 (which is -1) would give 15 (0000 1111).
thanks for the answers
Performing a right shift on a negative value yields an implementation defined value. Most hosted implementations will shift in 1 bits on the left, as you've seen in your case, however that doesn't necessarily have to be the case.
Unsigned types as well as positive values of signed types always shift in 0 bits on the left when shifting right. So you can get the desired behavior by using unsigned values:
unsigned int mask = ~0u >> n;
This behavior is documented in section 6.5.7 of the C standard:
5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
Right-shifting negative signed integers is an implementation-defined behavior, which is usually (but not always) filling the left with ones instead of zeros. That's why no matter how many bits you've shifted, it's always -1, as the left is always filled by ones.
When you shift unsigned integers, the left will always be filled by zeros. So you can do this:
unsigned int mask = ~0U >> n;   /* note the U suffix, which makes the operand unsigned */
You should also note that int is typically 2 or 4 bytes, meaning if you want to get 15, you need to right-shift 12 or 28 bits instead of only 4. You can use a char instead:
unsigned char mask = ~0U;
mask >>= 4;
In C, and many other languages, >> is (usually) an arithmetic right shift when performed on signed variables (like int). This means that the new bit shifted in from the left is a copy of the previous most-significant bit (MSB). This has the effect of preserving the sign of a two's complement negative number (and in this case the value).
This is in contrast to a logical right shift, where the MSB is always replaced with a zero bit. This is applied when your variable is unsigned (e.g. unsigned int).
From Wikipeda:
The >> operator in C and C++ is not necessarily an arithmetic shift. Usually it is only an arithmetic shift if used with a signed integer type on its left-hand side. If it is used on an unsigned integer type instead, it will be a logical shift.
In your case, if you plan to be working at a bit level (i.e. using masks, etc.) I would strongly recommend two things:
Use unsigned values.
Use types with specific sizes from <stdint.h> like uint32_t
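Following those two recommendations, here is a rough sketch of building bit masks with a fixed-width unsigned type (the helper name low_bits_mask is made up for illustration):

#include <stdio.h>
#include <stdint.h>

/* Build a mask with the n lowest bits set. Using uint32_t guarantees that
   >> is a logical shift regardless of the value. */
static uint32_t low_bits_mask(unsigned n)
{
    if (n == 0)
        return 0;
    if (n >= 32)
        return UINT32_MAX;
    /* Shifting by the full width (32) would be undefined, hence the guards above. */
    return UINT32_MAX >> (32u - n);
}

int main(void)
{
    printf("0x%08X\n", (unsigned)low_bits_mask(4));   /* 0x0000000F */
    printf("0x%08X\n", (unsigned)low_bits_mask(16));  /* 0x0000FFFF */
    return 0;
}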

Why first bitshifting left and then right, instead of AND-ing?

I came across this piece of C code:
typedef int gint;
// ...
gint a, b;
// ...
a = (b << 16) >> 16;
For ease of notation let's assume that b = 0x11223344 at this point. As far as I can see it does the following:
b << 16 will give 0x33440000
>> 16 will give 0x00003344
So, the 16 highest bits are discarded.
Why would anyone write (b << 16) >> 16 if b & 0x0000ffff would work as well? Isn't the latter form more understandable? Is there any reason to use bitshifts in a case like this? Is there any edge-case where the two could not be the same?
Assuming that the size of int is 32 bits, then there is no need to use shifts. Indeed, bitwise & with a mask would be more readable, more portable and safer.
It should be noted that left-shifting on negative signed integers invokes undefined behavior, and that left-shifting things into the sign bits of a signed integer could also invoke undefined behavior. C11 6.5.7 (emphasis mine):
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
(The only possible rationale I can think of is some premature optimization for a 16-bit CPU with a poor compiler. The code would then be more efficient if the arithmetic were broken up into 16-bit chunks. But on such a system, int would most likely be 16 bits, so the code wouldn't make any sense.)
As a side note, it doesn't make any sense to use the signed int type either. The most correct and safe type for this code would have been uint32_t.
So, the 16 highest bits are discarded.
They are not. Though it is formally implementation-defined how the right-shift operation is performed on signed types, most compilers implement it so as to replicate the sign bit.
Thus, as a result of this expression, the 16 highest bits are filled with copies of bit 15 (the sign bit of the low 16-bit value).
For an unsigned integral type (eg, the uint32_t we first thought was being used),
(b << 16) >> 16
is identical to b & ((1 << 16) - 1).
For a signed integral type though,
(b << 16)
could become negative (ie, the low int16_t would have been considered negative when taken on its own), in which case
(b << 16) >> 16
will (probably) still be negative due to sign extension. In that case, it isn't the same as the & mask, because the top bits will be set instead of zero.
Either this behaviour is deliberate (in which case the commented-out typedef is misleading), or it's a bug. I can't tell without reading the code.
Oh, and the shift behaviour in both directions is how I'd expect gcc to behave on x86, but I can't comment on how portable it is outside that. The left-shift may be UB as Lundin points out, and sign extension on the right-shift is implementation defined.
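To make the difference concrete, here is a hedged sketch (the test value is made up) contrasting the zero-extending mask with a portable way of getting the sign-extending effect that the shift pair typically produces:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t b = 0x1122F344;  /* bit 15 of the low half is set */

    /* Well defined and portable: zero-extends the low 16 bits. */
    uint32_t masked = b & 0xFFFFu;

    /* What (b << 16) >> 16 usually does on a signed 32-bit int is sign-extend
       the low 16 bits. A portable equivalent avoids the shifts entirely. */
    int32_t sign_extended = (int32_t)(b & 0xFFFFu);
    if (sign_extended >= 0x8000)
        sign_extended -= 0x10000;

    printf("masked        = 0x%08X\n", (unsigned)masked);           /* 0x0000F344 */
    printf("sign-extended = %d (0x%08X)\n", (int)sign_extended,
           (unsigned)sign_extended);                                 /* -3260 (0xFFFFF344) */
    return 0;
}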

Why doesn't this swap macro using shifts not work for negative numbers?

I found some code in a built in library that I have and need to extend.
But it appears to be broken:
#define BSWAP16(__x) ((((__x) >> 8) | ((__x) << 8)))
Does not function the same as:
__builtin_bswap16()
This program proves it.
#include <stdio.h>

#define BSWAP16(__x) ((((__x) >> 8) | ((__x) << 8)))

int main(int argc, char* argv[])
{
    unsigned short a = (unsigned short)BSWAP16(0xff00);
    unsigned short b = __builtin_bswap16(0xff00);
    short c = (short)BSWAP16(-8);
    short d = __builtin_bswap16(-8);
    printf("a=%04x, b=%04x, c=%04x, d=%04x\n", a, b, c, d);
    return 0;
}
Output:
a=00ff, b=00ff, c=ffffffff, d=fffff8ff
I don't want answers telling me to use endian.h or __builtin_bswap16. On the target platform/compiler configuration used by this in-house library, the default code path falls back to the above macro.
So my question is. Why doesn't it work for negative numbers?
If I represent -8 as a short value of 0xfff8 it works.
So I'm guessing it has something to do with internal conversion to int.
How do I fix this macro to work properly?
It is undefined behavior to left shift a negative number, from the draft C99 standard section 6.5.7 Bitwise shift operators which says (emphasis mine going forward):
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
So the result of the left shift of -8 is unpredictable. Casting to unsigned short should fix the issue:
BSWAP16((unsigned short)-8)
-8 is an integer constant (literal), and since it has no suffix it will be an int, because int can represent its value. Assuming a 32-bit int and two's complement, it will have the following representation:
FFFFFFF8
Casting to unsigned short will remove the unwanted higher bits. Casting to unsigned int won't help, since it will preserve the higher bits.
Also right shifting a negative number is implementation defined:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
You should mask out the unwanted bits before shifting.
#define BSWAP16(__x) (((((__x) & 0xFF00) >> 8) | (((__x) & 0xFF) << 8)))
Regarding why it doesn't work without masking, note that -8 is indeed 0xFFFFFFF8. Without masking, there are plenty of higher 1 bits, since int is used during the computation.
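A quick check of the masking version against the values from the question (a sketch; output shown assuming a typical two's complement platform):

#include <stdio.h>

#define BSWAP16(__x) (((((__x) & 0xFF00) >> 8) | (((__x) & 0xFF) << 8)))

int main(void)
{
    /* With masking, negative inputs behave like their low 16 bits:
       -8 (0xFFFFFFF8 as a 32-bit int) swaps to F8FF, matching __builtin_bswap16. */
    unsigned short a = (unsigned short)BSWAP16(0xff00);
    unsigned short c = (unsigned short)BSWAP16(-8);

    printf("a=%04x, c=%04x\n", a, c);  /* a=00ff, c=f8ff */
    return 0;
}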

Why does right shifting negative numbers in C bring 1 on the left-most bits? [duplicate]

The book "C The Complete Reference" by Herbert Schildt says that "(In the case of a signed, negative integer, a right shift will cause a 1 to be brought in so that the sign bit is preserved.)"
What's the point of preserving the sign bit?
Moreover, I think the book is referring to the case where negative numbers are represented with a sign-and-magnitude representation rather than two's complement. But even in that case the reasoning doesn't seem to make any sense.
The Schildt book is widely acknowledged to be exceptionally poor.
In fact, C doesn't guarantee that a 1 will be shifted in when you right-shift a negative signed number; the result of right-shifting a negative value is implementation-defined.
However, if right-shift of a negative number is defined to shift in 1s to the highest bit positions, then on a 2s complement representation it will behave as an arithmetic shift - the result of right-shifting by N will be the same as dividing by 2^N, rounding toward negative infinity.
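A tiny example (mine) of that rounding difference, on an implementation where >> on negative values is an arithmetic shift:

#include <stdio.h>

int main(void)
{
    int x = -7;

    /* The arithmetic shift rounds toward negative infinity,
       while integer division rounds toward zero (C99 and later). */
    printf("-7 >> 1 = %d\n", x >> 1);  /* typically -4 */
    printf("-7 / 2  = %d\n", x / 2);   /* always -3 */
    return 0;
}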
The statement is sweeping and inaccurate, like many a statement by Mr Schildt. Many people recommend throwing his books away. (Amongst other places, see The Annotated Annotated C Standard, and ACCU Reviews — do an author search on Schildt; see also the Definitive List of C Books on Stack Overflow).
It is implementation-defined whether right-shifting a negative (necessarily signed) integer shifts zeros or ones into the high-order bits. The underlying CPUs (for instance, ARM) often have two different instructions: ASR (arithmetic shift right), which preserves the sign bit, and LSR (logical shift right), which does not. The compiler writer is allowed to choose either, and may do so for reasons of compatibility, speed or whimsy.
ISO/IEC 9899:2011 §6.5.7 Bitwise shift operators
¶5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
The point is that the C >> (right shift) operator preserves[1] the sign for a (signed) int.
For example:
#include <stdio.h>

int main() {
    int a;
    unsigned int b;

    a = -8;
    printf("%d (0x%X) >> 1 = %d (0x%X)\n", a, a, a >> 1, a >> 1);

    b = 0xFFEEDDCC;
    printf("%d (0x%X) >> 1 = %d (0x%X)\n", b, b, b >> 1, b >> 1);
    return 0;
}
Output:
-8 (0xFFFFFFF8) >> 1 = -4 (0xFFFFFFFC) [sign preserved, LSB=1]
-1122868 (0xFFEEDDCC) >> 1 = 2146922214 (0x7FF76EE6) [MSB = 0]
If it didn't preserve the sign, the result would make absolutely no sense. You would take a small negative number, and by shifting right one (dividing by two), you would end up with a large positive number instead.
[1] This is implementation-defined, but in my experience, most compilers choose an arithmetic (sign-preserving) shift instruction.
In the case of a signed, negative integer, a right shift will cause a 1 to be brought in so that the sign bit is preserved
Not necessarily. See the C standard C11 6.5.7:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
This means that the compiler is free to shift in whatever it likes (0 or 1), as long as it documents it.
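Since the choice is documented per implementation, one can also probe it at run time; a minimal sketch:

#include <stdio.h>

int main(void)
{
    /* -1 >> 1 stays -1 under an arithmetic shift,
       but becomes a large positive value under a logical shift. */
    if ((-1 >> 1) == -1)
        puts("right shift of negative values is arithmetic (shifts in 1s)");
    else
        puts("right shift of negative values is logical (shifts in 0s)");
    return 0;
}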
