Same-line vs. multi-line bitwise operation discrepancy - C

While writing a program for uni, I noticed that
unsigned char byte_to_write_1 = (0xFF << 2) >> 2; ==> 0xFF (wrong)
unsigned char byte_to_write_2 = (0xFF << 2);
byte_to_write_2 = byte_to_write_2 >> 2; ==> 0x3F (correct)
I don't understand what's causing the discrepancy. My best guess is that when a byte is modified by multiple operations in one statement, C "holds onto" the extra bits in a slightly larger data type until the statement ends, so 0xFF << 2 is held as 11[1111 1100] instead of 1111 1100, and the same-line shift back therefore gives 1111 1111 instead of 0011 1111.
What causes the difference in results? Thanks in advance...
I first noticed the issue in a larger code project but have been able to recreate it with a much simpler program.

unsigned char byte_to_write_1 = (0xFF << 2) >> 2; // 0xFF
0xFF is an int. (Obtaining 0xFF from an unsigned char wouldn't change anything since it would get promoted to an int.) 0xFF << 2 is 0x3FC. 0x3FC >> 2 is 0xFF.
unsigned char byte_to_write_2 = 0xFF << 2; // 0xFC
byte_to_write_2 >>= 2; // 0x3F
We've already established that 0xFF << 2 is 0x3FC. But you assign that to an unsigned char which is presumably only 8 bits in size. So you end up assigning 0xFC instead. (gcc warns about this if you enable warnings as you should.)
And of course, you get the desired value when you right-shift that.
Solutions:
(unsigned char)( 0xFF << 2 ) >> 2
( ( 0xFF << 2 ) & 0xFF ) >> 2
0xFF & 0x3F
Demo on Compiler Explorer.
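To see both behaviours side by side, here is a minimal complete program (a sketch assuming the usual 8-bit unsigned char):

#include <stdio.h>

int main(void)
{
    /* 0xFF << 2 is evaluated as an int (0x3FC); the shift back happens
       inside the int, and only then is the value truncated to 8 bits */
    unsigned char byte_to_write_1 = (0xFF << 2) >> 2;

    /* here the intermediate 0x3FC is truncated to 0xFC on assignment,
       so the shift back yields 0x3F */
    unsigned char byte_to_write_2 = 0xFF << 2;
    byte_to_write_2 >>= 2;

    printf("byte_to_write_1 = 0x%02X\n", byte_to_write_1); /* 0xFF */
    printf("byte_to_write_2 = 0x%02X\n", byte_to_write_2); /* 0x3F */
    return 0;
}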

The difference between these two code snippets
unsigned char byte_to_write_1 = (0xFF << 2) >> 2; ==> 0xFF (wrong)
and
unsigned char byte_to_write_2 = (0xFF << 2);
byte_to_write_2 = byte_to_write_2 >> 2; ==> 0x3F (correct)
is that the second code snippet uses the variable byte_to_write_2 to store the intermediate result of the expression (0xFF << 2). The variable cannot hold the full integer result, so the integer result is converted to the type unsigned char, which can store only one byte.
The maximum value that can be stored in an object of the type unsigned char is 0xFF
— maximum value for an object of type unsigned char
UCHAR_MAX 255 // 2⁸ − 1
that is, 255. The result of the expression 0xFF << 2, which is equivalent to 255 * 4, cannot fit in an object of the type unsigned char.
In the first code snippet the intermediate result of the expression (0xFF << 2) has the type int (the hexadecimal constant 0xFF already has type int) and is used unchanged in the full expression (0xFF << 2) >> 2.
Consider the output of these two printf calls:
printf( "0xFF << 2 = %d\n", 0xFF << 2 );
printf( "( unsigned char )( 0xFF << 2 ) = %d\n", ( unsigned char )( 0xFF << 2 ) );
They are
0xFF << 2 = 1020
( unsigned char )( 0xFF << 2 ) = 252

Related

Arduino - Convert long to byte and back to long

I took this example from the following page. I am trying to convert a long into a 4-byte array. This is the original code from the page.
long n;
byte buf[4];
buf[0] = (byte) n;
buf[1] = (byte) n >> 8;
buf[2] = (byte) n >> 16;
buf[3] = (byte) n >> 24;
long value = (unsigned long)(buf[4] << 24) | (buf[3] << 16) | (buf[2] << 8) | buf[1];
I modified the code replacing
long value = (unsigned long)(buf[4] << 24) | (buf[3] << 16) | (buf[2] << 8) | buf[1];
with
long value = (unsigned long)(buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0];
I tried the original code where n is 15000, and value would return 0. After modifying the line in question (I think there was an error in the indices in the original post?), value returns 152.
The objective is to have value return the same number as n. Also, n can be negative, so value should also return the same negative number.
Not sure what I am doing wrong. Thanks!
You were correct that the indices were wrong. A 4-byte array indexes from 0 to 3, not 1 to 4.
The rest of the issues were because you were using a signed 'long' type. Doing bit manipulations on signed data types is not well defined, since it assumes something about how signed integers are stored (two's complement on most systems, though the C standard didn't mandate it until C23).
e.g. see here
You're then assigning between signed 'longs' and unsigned 'bytes'.
Someone else has posted an answer (possibly abusing casts) that I'm sure works. But without any explanation I feel it doesn't help much.
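For completeness, here is a sketch of the full corrected round trip in plain C, using uint8_t/uint32_t from <stdint.h> in place of Arduino's byte and long. Note that the original pack step also casts n to byte before shifting, which zeroes buf[1] through buf[3] and explains the observed 152 (the low byte of 15000):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int32_t n = 15000;              /* also works for negative n */
    uint8_t buf[4];

    /* pack: shift first, then truncate to a byte */
    uint32_t u = (uint32_t)n;
    buf[0] = (uint8_t)u;
    buf[1] = (uint8_t)(u >> 8);
    buf[2] = (uint8_t)(u >> 16);
    buf[3] = (uint8_t)(u >> 24);

    /* unpack: indices 0..3, widening each byte before the shift */
    uint32_t v = (uint32_t)buf[0]
               | ((uint32_t)buf[1] << 8)
               | ((uint32_t)buf[2] << 16)
               | ((uint32_t)buf[3] << 24);

    /* converting back to int32_t is implementation-defined on paper for
       values above INT32_MAX, but round-trips on common platforms */
    int32_t value = (int32_t)v;

    printf("%ld -> %ld\n", (long)n, (long)value);
    return 0;
}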

Converting Char array to Long in C error

I am trying to convert an unsigned long int to chars and back,
but I get an incorrect result:
#include <stdio.h>

int main(void)
{
    unsigned char pdest[4];
    unsigned long l = 0xFFFFFFFF;

    pdest[0] = l & 0xFF;
    pdest[1] = (l >> 8) & 0xFF;
    pdest[2] = (l >> 16) & 0xFF;
    pdest[3] = (l >> 24) & 0xFF;

    unsigned long int l1 = 0;
    l1 |= (pdest[0]);
    l1 |= (pdest[1] << 8);
    l1 |= (pdest[2] << 16);
    l1 |= (pdest[3] << 24);

    printf ("%lu", l1);
}
and output is
18446744073709551615
not 4294967295?
How do I do it correctly?
Read this:
https://en.wikipedia.org/wiki/C_data_types
...Long unsigned integer type. Capable of containing at least the [0,
4,294,967,295] range;
You should write the last 4 lines as:
l1 |= ((unsigned long) pdest[0]);
l1 |= (((unsigned long) pdest[1]) << 8);
l1 |= (((unsigned long) pdest[2]) << 16);
l1 |= (((unsigned long) pdest[3]) << 24);
You should cast each byte to unsigned long before shifting.
pdest[3] << 24 is of type signed int.
Change your code to
unsigned long int l1=0;
l1 |= (pdest[0]);
l1 |= (pdest[1] << 8);
l1 |= (pdest[2] << 16);
l1 |= ((unsigned int)pdest[3] << 24);
The problem is the shifting of a char. You must force a conversion to unsigned long before shifting beyond the 8 bits of a char. Moreover, the signedness of the promoted value further alters the result.
Try
unsigned long int l1=0;
l1 |= ((unsigned long)pdest[0]);
l1 |= ((unsigned long)pdest[1] << 8);
l1 |= ((unsigned long)pdest[2] << 16);
l1 |= ((unsigned long)pdest[3] << 24);
Note the use of a cast to force the compiler to convert the char to an unsigned long before the shift takes place.
Your unsigned long does not have to be 4 bytes long.
#include <stdio.h>

int main(void) {
    size_t index;
    unsigned char pdest[sizeof(unsigned long)];
    unsigned long l = 0xFFFFFFFFUL;

    for (index = 0; index < sizeof(unsigned long); index++)
    {
        pdest[index] = l & 0xff;
        l >>= 8;
    }

    unsigned long l1 = 0;
    for (index = 0; index < sizeof(unsigned long); index++)
    {
        l1 |= (unsigned long)pdest[index] << (8 * index);
    }

    printf("%lx\n", l1);
}
First of all, to name a type that is exactly 32 bits wide, use uint32_t, not unsigned long int. unsigned long int is generally 64 bits wide on 64-bit *nixes (so-called LP64), whereas it is 32 bits on Windows (LLP64).
Anyway, the problem is with integer promotions. An operand to an arithmetic operation with conversion rank less than int or unsigned int will be converted to int or unsigned int, whichever its range fits into. Since all unsigned chars are representable as signed ints, pdest[3] is converted to signed int, and the result of pdest[3] << 24 is also of type signed int! Now, if that has the most significant bit set, the bit is shifted into the sign bit of the integer, and the behaviour is, according to the C standard, undefined.
However, GCC has defined behaviour for this case; there, the result is just a negative integer with 2's complement representation. Therefore the result of (unsigned char)0xFF << 24 is (int)-16777216. Now, for the | operation this then needs to be promoted to the rank of the other operand, which is unsigned. The unsigned conversion happens as if by repeatedly adding or subtracting one more than the maximum (i.e. repeatedly adding or subtracting 2⁶⁴) until the value fits in the range of the type. Since unsigned long is 64 bits on your platform, the result of this conversion is 2⁶⁴ - 16777216, or 18446744073692774400, which is then ORed with the bits from the previous steps.
How to fix? Easy, just prior to shifts, cast each shifted number to uint32_t. Print with the help of PRIu32 macro:
#include <inttypes.h>
...
uint32_t l1=0;
l1 |= (uint32_t)pdest[0];
l1 |= (uint32_t)pdest[1] << 8;
l1 |= (uint32_t)pdest[2] << 16;
l1 |= (uint32_t)pdest[3] << 24;
printf ("%" PRIu32, l1);
The problem with your code is the implicit type conversion of unsigned char to int and then to unsigned long, with sign extension, for the bitwise OR operation. The corresponding values for each line are as commented below:
l1 |= (pdest[0]); //dec = 255 hex = 0xFF
l1 |= (pdest[1] << 8); //dec = 65535 hex = 0xFFFF
l1 |= (pdest[2] << 16); //dec = 16777215 hex =0xFFFFFF
l1 |= (pdest[3] << 24); //here is the problem
In the last line, pdest[3] << 24 = 0xFF000000, which is equivalent to -16777216 due to the implicit conversion to int. It is then converted to unsigned long for the bitwise OR, where sign extension happens, so l1 |= (pdest[3] << 24) is equivalent to 0x0000000000FFFFFF | 0xFFFFFFFFFF000000.
As many people have suggested, you can use explicit type conversions, or you can use the code snippet below; a complete runnable version follows it.
l1 = (l1 << 0) | pdest[3];
l1 = (l1 << 8) | pdest[2];
l1 = (l1 << 8) | pdest[1];
l1 = (l1 << 8) | pdest[0];
I hope this solves your problem and explains such a huge output.
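For reference, a minimal runnable version built on that shift-and-accumulate idea; no casts are needed because the shifts are applied to l1 itself, which is already unsigned long:

#include <stdio.h>

int main(void)
{
    unsigned char pdest[4];
    unsigned long l = 0xFFFFFFFFUL;

    pdest[0] = l & 0xFF;
    pdest[1] = (l >> 8) & 0xFF;
    pdest[2] = (l >> 16) & 0xFF;
    pdest[3] = (l >> 24) & 0xFF;

    /* accumulate from the most significant byte down; each OR operand
       is an int no larger than 0xFF, so the sign bit is never touched */
    unsigned long l1 = 0;
    l1 = (l1 << 8) | pdest[3];
    l1 = (l1 << 8) | pdest[2];
    l1 = (l1 << 8) | pdest[1];
    l1 = (l1 << 8) | pdest[0];

    printf("%lu\n", l1); /* 4294967295 */
    return 0;
}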

How to shift bytes from char array into int

I would like to make an int variable out of a char array in C.
The char array looks like this:
buffer[0] = 0xcf
buffer[1] = 0x04
buffer[2] = 0x00
buffer[3] = 0x00
The shifting looks like this
x = (buffer[1] << 8 )| (buffer[0] << 0) ;
After that x looks like this:
x = 0xffff04cf
Everything would be fine if the first two bytes weren't ff.
If I try this line
x = (buffer[3] << 24 )| (buffer[2] << 16)| (buffer[1] << 8)| (buffer[0] << 0) ;
the result still looks like
x = 0xffff04cf
Even when I try to shift in the zeros before or after shifting in 04cf, the result stays the same.
Is this the right idea, or what am I doing wrong?
The issue is that you declared buffer by means of a signed type, probably (signed) char. When applying operator <<, integral promotions will be performed, and as the value 0xcf in an 8-bit signed type represents a negative value (i.e. -49), it will remain a negative value (yet represented by more bits, i.e. 0xffffffcf). Note that -1 is represented as 0xFFFFFFFF and vice versa.
To overcome this issue, simply define buffer as
unsigned char buffer[4]
And if you weren't allowed to change the data type of buffer, you could write...
unsigned x = ((unsigned char)buffer[1] << 8) | ((unsigned char)buffer[0] << 0);
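Extended to all four bytes, one safe pattern is to widen each byte to unsigned before it is shifted, so that even a shift by 24 never touches a signed int's sign bit (a sketch; buffer as in the question):

unsigned x = ((unsigned)(unsigned char)buffer[3] << 24)
           | ((unsigned)(unsigned char)buffer[2] << 16)
           | ((unsigned)(unsigned char)buffer[1] << 8)
           |  (unsigned)(unsigned char)buffer[0];
/* the inner cast stops the sign extension; the outer one
   keeps the << 24 out of signed int territory */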
For tasks like this I like using unions, for example:
#include <stdint.h>

union tag_int_chars {
    char buffer[sizeof(int32_t)];
    int32_t value;
} int_chars;

int_chars.value = 0x01234567;
int_chars.buffer[0] = 0xff;
This will automate the memory overlay without the need to shift. Set the value of the int and voila, the chars have changed; change a char value and voila, the int has changed.
The example will leave the int at value = 0x012345ff on a little-endian machine.
Another easy way is to use memcpy():
#include <string.h>
#include <stdint.h>

char buffer[sizeof(int32_t)];
int32_t value;

memcpy(&value, buffer, sizeof(int32_t)); // chars to int
memcpy(buffer, &value, sizeof(int32_t)); // int to chars
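Note that unlike the shift approach, the byte order inside buffer follows the host CPU. A compilable round trip (the value is arbitrary):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    char buffer[sizeof(int32_t)];
    int32_t value = 0x000004cf;              /* 1231 */

    memcpy(buffer, &value, sizeof value);    /* int to chars */

    int32_t copy;
    memcpy(&copy, buffer, sizeof copy);      /* chars to int */

    printf("%ld\n", (long)copy);             /* 1231 */
    return 0;
}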

What does a[0] = addr & 0xff do?

I'm currently learning from the book "The Shellcoder's Handbook". I have a strong understanding of C, but recently I came across a piece of code that I can't grasp.
Here is the piece of code:
char a[4];
unsigned int addr = 0x0806d3b0;
a[0] = addr & 0xff;
a[1] = (addr & 0xff00) >> 8;
a[2] = (addr & 0xff0000) >> 16;
a[3] = (addr) >> 24;
So the question is: what does this do? What is addr & 0xff (and the three lines below it), and what does >> 8 do to it (I know that it divides by 2 eight times)?
PS: don't hesitate to tell me if you have ideas for the tags that I should use.
The variable addr is 32 bits of data, while each element in the array a is 8 bits. What the code does is copy the 32 bits of addr into the array a, one byte at a time.
Let's take this line:
a[1] = (addr & 0xff00) >> 8;
and go through it step by step.
addr & 0xff00: this gets bits 8 to 15 of the value in addr; the result after the operation is 0x0000d300.
>> 8: this shifts the bits to the right, so 0x0000d300 becomes 0x000000d3.
Assign the resulting value of the mask and shift to a[1].
The code is trying to enforce endianness on the data input. Specifically, it is trying to enforce little-endian behavior on the data. Here is the explanation:
a[0] = addr & 0xff; /* gets the LSB 0xb0 */
a[1] = (addr & 0xff00) >> 8; /* gets the 2nd LSB 0xd3 */
a[2] = (addr & 0xff0000) >> 16; /* gets 2nd MSB 0x06 */
a[3] = (addr) >> 24; /* gets the MSB 0x08 */
So basically, the code is masking and separating out every byte of data and storing it in the array "a" in little-endian format.
unsigned char a[4]; /* I think using unsigned char is better in this case */
unsigned int addr = 0x0806d3b0;
a[0] = addr & 0xff; /* get the least significant byte 0xb0 */
a[1] = (addr & 0xff00) >> 8; /* get the second least significant byte 0xd3 */
a[2] = (addr & 0xff0000) >> 16; /* get the second most significant byte 0x06 */
a[3] = (addr) >> 24; /* get the most significant byte 0x08 */
Apparently, the code isolates the individual bytes from addr to store them in the array a so they can be indexed. The first line
a[0] = addr & 0xff;
masks out the byte of lowest value by using 0xff as a bit mask; the subsequent lines do the same, but in addition shift the result to the rightmost position. Finally, in the last line
a[3] = (addr) >> 24;
no masking is necessary anymore, as all unnecessary information is discarded by the shift.
The code is effectively storing a 32-bit address in a four-char array. As you may know, a char holds one byte (8 bits). It first copies the lowest byte of the address, then shifts, copies the next byte, then shifts, and so on. You get the gist.
It enforces endianness, and stores the integer in little-endian format in a.
See the illustration on Wikipedia.
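To see both views on your own machine, here is a small hypothetical check; the second output line depends on the host byte order:

#include <stdio.h>

int main(void)
{
    unsigned int addr = 0x0806d3b0;
    unsigned char a[4];

    /* shift-based extraction: a[] is little-endian by construction,
       regardless of how the host stores addr in memory */
    a[0] = addr & 0xff;
    a[1] = (addr & 0xff00) >> 8;
    a[2] = (addr & 0xff0000) >> 16;
    a[3] = addr >> 24;

    /* direct memory view: order depends on the host CPU */
    const unsigned char *p = (const unsigned char *)&addr;

    printf("shifted: %02x %02x %02x %02x\n", a[0], a[1], a[2], a[3]);
    printf("memory : %02x %02x %02x %02x\n", p[0], p[1], p[2], p[3]);
    return 0;
}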
Also, why not visualize the bit-shifting results:
#include <stdio.h>

int main(void)
{
    char a[4];
    unsigned int addr = 0x0806d3b0;

    a[0] = addr & 0xff;
    a[1] = (addr & 0xff00) >> 8;
    a[2] = (addr & 0xff0000) >> 16;
    a[3] = (addr) >> 24;

    int i = 0;
    for( ; i < 4; i++ )
    {
        printf( "a[%d] = %02x\t", i, (unsigned char)a[i] );
    }
    printf("\n");
    return 0;
}
Output:
a[0] = b0 a[1] = d3 a[2] = 06 a[3] = 08
In addition to the multiple answers given, the code has some flaws that need to be fixed to make it portable. In particular, the char type is very dangerous to use for storing values, because of its implementation-defined signedness. A very classic C bug. If the code was taken from a book, then you should read that book sceptically.
While we are at it, we can also tidy up the code, make it overly explicit to avoid potential future maintenance bugs, remove some implicit type promotions of integer literals etc.
#include <stdint.h>
uint8_t a[4];
uint32_t addr = 0x0806d3b0UL;
a[0] = addr & 0xFFu;
a[1] = (addr >> 8) & 0xFFu;
a[2] = (addr >> 16) & 0xFFu;
a[3] = (addr >> 24) & 0xFFu;
The masks & 0xFFu are strictly speaking not needed, but they might save you from some false positive compiler warnings about wrong integer types. Alternatively, each shift result could be cast to uint8_t and that would have been fine too.
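The matching reassembly in the same defensive style widens each byte to uint32_t before it is shifted, so no signed int is ever involved (a sketch with a hypothetical helper name):

#include <stdint.h>

static uint32_t load_le32(const uint8_t a[4])
{
    /* widen first, then shift: (uint32_t)a[3] << 24 cannot
       spill into a sign bit the way an int-promoted byte can */
    return (uint32_t)a[0]
         | ((uint32_t)a[1] << 8)
         | ((uint32_t)a[2] << 16)
         | ((uint32_t)a[3] << 24);
}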

Signed right shift = strange result?

I was helping someone with their homework and ran into this strange issue. The problem is to write a function that reverses the order of the bytes of a signed integer (that's how the function was specified, anyway), and this is the solution I came up with:
int reverse(int x)
{
    int reversed = 0;
    reversed = (x & (0xFF << 24)) >> 24;
    reversed |= (x & (0xFF << 16)) >> 8;
    reversed |= (x & (0xFF << 8)) << 8;
    reversed |= (x & 0xFF) << 24;
    return reversed;
}
If you pass 0xFF000000 to this function, the first assignment will result in 0xFFFFFFFF. I don't really understand what is going on, but I know it has something to do with conversions back and forth between signed and unsigned, or something like that.
If I append ul to 0xFF, it works fine, which I assume is because it's forced to unsigned and then converted to signed, or something in that direction. The resulting code also changes; without the ul suffix it uses sar (shift arithmetic right), but as unsigned it uses shr as intended.
I would really appreciate it if someone could shed some light on this for me. I'm supposed to know this stuff, and I thought I did, but I'm really not sure what's going on here.
Thanks in advance!
Since x is a signed quantity, the result of (x & (0xFF << 24)) is 0xFF000000, which is also signed and thus a negative number since the top (sign) bit is set. The >> operator on an int (a signed value) performs sign extension (Edit: though this behaviour is implementation-defined rather than guaranteed) and propagates the sign-bit value of 1 as the value is shifted to the right.
You should rewrite the function as follows to work exclusively on unsigned values:
unsigned reverse(unsigned x)
{
unsigned int reversed = 0;
reversed = (x & (0xFF << 24)) >> 24;
reversed |= (x & (0xFF << 16)) >> 8;
reversed |= (x & (0xFF << 8)) << 8;
reversed |= (x & 0xFF) << 24;
return reversed;
}
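A quick sanity check, assuming a 32-bit unsigned int, with the reverse() above in scope:

#include <stdio.h>

int main(void)
{
    printf("%08X\n", reverse(0xFF000000u)); /* 000000FF */
    printf("%08X\n", reverse(0x12345678u)); /* 78563412 */
    return 0;
}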
From your results we can deduce that you are on a 32-bit machine.
(x & (0xFF << 24)) >> 24
In this expression 0xFF is an int, so 0xFF << 24 is also an int, as is x.
When you perform the bitwise & between two int, the result is also an int and in this case the value is 0xFF000000 which on a 32-bit machine means that the sign bit is set, so you have a negative number.
The result of performing a right-shift on an object of signed type with a negative value is implementation-defined. In your case, a sign-preserving arithmetic shift right is performed.
If you right-shift an unsigned type, then you would get the results that you were expecting for a byte-reversal function. You could achieve this by making either operand of the bitwise & an unsigned type, forcing conversion of both operands to the unsigned type. (This is true on any implementation where a signed int can't hold the full range of positive values of an unsigned int, which is nearly all implementations.)
Right shift on signed types is implementation-defined; in particular, the compiler is free to do an arithmetic or a logical shift as it pleases. This is something you will not notice if the concrete value you are treating is positive, but as soon as it is negative you may fall into a trap.
Just don't do it, this is not portable.
x is signed, so the highest bit is used for the sign; 0xFF000000 therefore represents a negative value. When you do the shift, the result is "sign extended": the binary digit that is added on the left, to replace the former MSB that was shifted right, is always the same as the sign of the value. So
0xFF000000 >> 1 == 0xFF800000
0xFF000000 >> 2 == 0xFFC00000
0xFF000000 >> 3 == 0xFFE00000
0xFF000000 >> 4 == 0xFFF00000
If the value being shifted is unsigned, or if the shift is toward the left, the new bit would be 0. It's only in right-shifts of signed values that sign extension comes into play.
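A tiny demo of the two behaviours (the signed result is implementation-defined; the values shown are the usual arithmetic-shift outcome on two's-complement machines):

#include <stdio.h>

int main(void)
{
    int s = -16777216;      /* bit pattern 0xFF000000 on two's complement */
    unsigned u = 0xFF000000u;

    printf("%08X\n", (unsigned)(s >> 4)); /* typically FFF00000: sign bits shifted in */
    printf("%08X\n", u >> 4);             /* 0FF00000: zeros shifted in */
    return 0;
}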
If you want it to work the same on all platforms with both signed and unsigned integers, change
(x & (0xFF << 24)) >> 24
into
(x >> 24) & 0xFF
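Applied to the whole function, that shift-first-then-mask pattern gives a version with no negative intermediates at all (a sketch assuming a 32-bit unsigned):

unsigned reverse_bytes(unsigned x)
{
    /* shift first, then mask: every intermediate fits in the
       low bytes, so no sign bit is ever involved */
    return ((x >> 24) & 0xFFu)
         | ((x >> 8)  & 0xFF00u)
         | ((x << 8)  & 0xFF0000u)
         | ((x << 24) & 0xFF000000u);
}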
If this were Java code, you would use '>>>', which is an unsigned right shift; otherwise the shift will sign-extend the value.
