I think I confused myself with endianness and bit-shifting, please help.
I have 4 8-bit ints which I want to convert to a 32-bit int. This is what I am doing:
uint h;
t_uint8 ff[4] = {1,2,3,4};
if (BIG_ENDIAN) {
    h = ((int)ff[0] << 24) | ((int)ff[1] << 16) | ((int)ff[2] << 8) | ((int)ff[3]);
}
else {
    h = ((int)ff[0] >> 24) | ((int)ff[1] >> 16) | ((int)ff[2] >> 8) | ((int)ff[3]);
}
However, this seems to produce a wrong result. With a little experimentation I realised that it should be the other way round: in the case of big endian I am supposed to shift bits to the right, and otherwise to the left. However, I don't understand WHY.
This is how I understand it. Big endian means most significant byte first (first means leftmost, right? perhaps this is where I am wrong). So, converting an 8-bit int to a 32-bit int would prepend 24 zero bits to my existing 8 bits. So, to make it the 1st byte I need to shift the bits 24 to the left.
Please point out where I am wrong.
You always have to shift the 8-bit values left. But in the little-endian case, you have to change the order of the indices, so that the fourth byte goes into the most significant position and the first byte into the least significant.
if (BIG_ENDIAN) {
    h = ((int)ff[0] << 24) | ((int)ff[1] << 16) | ((int)ff[2] << 8) | ((int)ff[3]);
}
else {
    h = ((int)ff[3] << 24) | ((int)ff[2] << 16) | ((int)ff[1] << 8) | ((int)ff[0]);
}
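For what it's worth, here is a minimal standalone sketch (my own test program, not your exact t_uint8/uint types) showing that both branches shift left and only the index order changes; compiled anywhere, it prints 01020304 for the first composition and 04030201 for the second.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t ff[4] = {1, 2, 3, 4};
    /* ff[0] taken as the most significant byte */
    uint32_t a = ((uint32_t)ff[0] << 24) | ((uint32_t)ff[1] << 16) | ((uint32_t)ff[2] << 8) | ff[3];
    /* ff[3] taken as the most significant byte */
    uint32_t b = ((uint32_t)ff[3] << 24) | ((uint32_t)ff[2] << 16) | ((uint32_t)ff[1] << 8) | ff[0];
    printf("%08x %08x\n", a, b);   /* prints 01020304 04030201 */
    return 0;
}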
I took this example from the following page. I am trying to convert a long into a 4-byte array. This is the original code from the page:
long n;
byte buf[4];
buf[0] = (byte) n;
buf[1] = (byte) n >> 8;
buf[2] = (byte) n >> 16;
buf[3] = (byte) n >> 24;
long value = (unsigned long)(buf[4] << 24) | (buf[3] << 16) | (buf[2] << 8) | buf[1];
I modified the code replacing
long value = (unsigned long)(buf[4] << 24) | (buf[3] << 16) | (buf[2] << 8) | buf[1];
with
long value = (unsigned long)(buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0];
I tried the original code where n is 15000 and value would return 0. After modifying the line in question (I think there was an error in the indexes in the original post?) value returns 152.
The objective is to have value return the same number as n. Also, n can be negative, so value should also return the same negative number.
Not sure what I am doing wrong. Thanks!
You were correct that the indices were wrong. A 4-byte array indexes from 0 to 3, not 1 to 4.
The rest of the issues were because you were using a signed 'long' type. Doing bit manipulations on signed datatypes is not well defined, since it assumes something about how signed integers are stored (two's complement on most systems, although I don't think any standard requires it).
e.g. see here
You're then assigning between signed 'longs' and unsigned 'bytes'.
Someone else has posted an answer (possibly abusing casts) that I'm sure works. But without any explanation I feel it doesn't help much.
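One more thing worth checking in the snippet you quoted: (byte) n >> 8 binds the cast before the shift, so n is truncated to a single byte and then shifted, leaving buf[1]..buf[3] as zero; (byte)(n >> 8) is almost certainly what was meant, and it explains why you saw 152 (the low byte of 15000). Here is a sketch of the whole round trip using unsigned fixed-width types and parenthesised shifts (plain C types, since I don't know which byte/long typedefs your platform uses):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int32_t n = 15000;                       /* also try a negative value */
    uint32_t u = (uint32_t)n;                /* work on the unsigned bit pattern */
    uint8_t buf[4];

    buf[0] = (uint8_t)(u);                   /* least significant byte */
    buf[1] = (uint8_t)(u >> 8);              /* shift first, then truncate */
    buf[2] = (uint8_t)(u >> 16);
    buf[3] = (uint8_t)(u >> 24);

    int32_t value = (int32_t)(((uint32_t)buf[3] << 24) |
                              ((uint32_t)buf[2] << 16) |
                              ((uint32_t)buf[1] << 8)  |
                               (uint32_t)buf[0]);

    printf("%ld %ld\n", (long)n, (long)value);   /* both print 15000 */
    return 0;
}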
I have to say what the output of the program is for w = 33. I do not know how to do it. Does anyone have an idea how to solve this without writing out the binary representation of each number?
void notChicken(int w)
{
    unsigned int v1 = 0x12345678;
    unsigned int v2 = 0x87654785;
    unsigned int v3 = 0xffffffff;
    unsigned int tmp;
    tmp = (v1 >> 3) | (v2 << 3);
    tmp &= v3 & ~(v3 << (w >> 1));
    printf("%8x\n", tmp);
}
Thanks
Although doing this by hand is not a good idea, let's try to break down your operation.
You have given w = 33
The last part,
v3 & ~(v3 << (w >> 1)), is going to evaluate as v3 & ~(v3 << 16).
v3 << 16 is 0xffff0000 and ~ of that is 0x0000ffff.
Since v3 is all ones, ANDing it with 0x0000ffff still gives 0x0000ffff. This will mask off the upper 16 bits of the previous computation.
Now (v1 >> 3) | (v2 << 3);
We care only about the lower 16 bits.
>> 3 is dividing by 8 and << 3 is multiplying by 8 (the top 3 bits of v2 simply fall off).
So the result of the first part will be
0x2468ACF | 0x3B2A3C28
Keeping only the lower 16 bits
0x8ACF | 0x3C28
Finally, I don't know how you are going to do the OR without writing out the bitwise representation. I can help with the last hex digit: it will be F.
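If you want to check the hand computation rather than take it on faith, here is a quick standalone harness (just my own wrapper around the same expressions) that prints every intermediate value:

#include <stdio.h>

int main(void)
{
    unsigned int v1 = 0x12345678;
    unsigned int v2 = 0x87654785;
    unsigned int v3 = 0xffffffff;
    int w = 33;
    unsigned int tmp;

    printf("%08x\n", v1 >> 3);                 /* 02468acf */
    printf("%08x\n", v2 << 3);                 /* 3b2a3c28, the top 3 bits fall off */
    tmp = (v1 >> 3) | (v2 << 3);
    printf("%08x\n", tmp);                     /* 3b6ebeef */
    printf("%08x\n", v3 & ~(v3 << (w >> 1)));  /* 0000ffff */
    tmp &= v3 & ~(v3 << (w >> 1));
    printf("%8x\n", tmp);                      /* the final output: beef */
    return 0;
}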
I have some undefined behaviour in a seemingly innocuous function which is parsing a double value from a buffer. I read the double in two halves, because I am reasonably certain the language standard says that shifting char values is only valid in a 32-bit context.
inline double ReadLittleEndianDouble( const unsigned char *buf )
{
    uint64_t lo = (buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0];
    uint64_t hi = (buf[7] << 24) | (buf[6] << 16) | (buf[5] << 8) | buf[4];
    uint64_t val = (hi << 32) | lo;
    return *(double*)&val;
}
Since I am storing 32-bit values into the 64-bit variables lo and hi, I reasonably expect that the high-order 32 bits of these variables will always be 0x00000000. But sometimes they contain 0xffffffff or other non-zero rubbish.
The fix is to mask it like this:
uint64_t val = ((hi & 0xffffffffULL) << 32) | (lo & 0xffffffffULL);
Alternatively, it seems to work if I mask during the assignment instead:
uint64_t lo = ((buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0]) & 0xffffffff;
uint64_t hi = ((buf[7] << 24) | (buf[6] << 16) | (buf[5] << 8) | buf[4]) & 0xffffffff;
I would like to know why this is necessary. All I can think of to explain this is that my compiler is doing all the shifting and combining for lo and hi directly on 64-bit registers, and I might expect undefined behaviour in the high-order 32 bits if this is the case.
Can someone please confirm my suspicions or otherwise explain what is happening here, and comment on which (if any) of my two solutions is preferable?
If you try to shift a char or unsigned char you're leaving yourself at the mercy of the standard integer promotions. You're better off casting the values yourself, before you try to shift them. You don't have to separate the lower and upper halves if you do so.
inline double ReadLittleEndianDouble( const unsigned char *buf )
{
    uint64_t val = ((uint64_t)buf[7] << 56) | ((uint64_t)buf[6] << 48) | ((uint64_t)buf[5] << 40) | ((uint64_t)buf[4] << 32) |
                   ((uint64_t)buf[3] << 24) | ((uint64_t)buf[2] << 16) | ((uint64_t)buf[1] << 8)  | (uint64_t)buf[0];
    return *(double*)&val;
}
All this is necessary only if the CPU is big-endian or if the buffer might not be properly aligned for the CPU architecture; otherwise you can simplify this greatly:
return *(double*)buf;
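As a side note, the final *(double*)&val (and the *(double*)buf shortcut) technically breaks the strict-aliasing rules; if that ever bites you, a memcpy-based variant of the same assembly (a sketch, assuming <string.h> and <stdint.h> are available) keeps the byte handling identical but copies the bits into the double instead of casting the pointer:

#include <stdint.h>
#include <string.h>

inline double ReadLittleEndianDouble( const unsigned char *buf )
{
    uint64_t val = 0;
    for ( int i = 7; i >= 0; --i )
        val = (val << 8) | buf[i];    /* widen before shifting, most significant byte first */
    double d;
    memcpy( &d, &val, sizeof d );     /* reinterpret the bits without a pointer cast */
    return d;
}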
I was looking at this function online and am wondering how it works:
/*
* reverseBytes - reverse bytes
* Example: reverseBytes(0x12345678) = 0x78563412
* Legal ops: ! ~ & ^ | + << >>
*/
int reverseBytes(int x)
{
    int newbyte0 = (x >> 24) & 0xff;
    int newbyte1 = (x >> 8) & 0xff00;
    int newbyte2 = (x << 8) & 0xff0000;
    int newbyte3 = x << 24;
    return newbyte0 | newbyte1 | newbyte2 | newbyte3;
}
Here's what I think I understand:
0xff, 0xff00, and 0xff0000 in binary are 1111 1111, 1111 1111 0000 0000, and 1111 1111 0000 0000 0000 0000 respectively
The method creates four new bytes with masks (0xff, etc), and then adds their values together using the | operator
I really don't get how this reverses the bytes though. I would appreciate a detailed explanation. Thanks!
The code assumes a 32 bit integer and 8 bit bytes. A 32 bit integer is made up of 4 bytes:
Let's say these 4 bytes are laid out in memory like so:
+---------------------------------+
|Byte 4 | Byte 3 | Byte 2 | Byte 1|
+---------------------------------+
This could relate to the endianness of a given CPU type. When interpreting an integer that is made up of several bytes, some CPU families treat the leftmost byte, the one with the lowest memory address, as the most significant byte of the integer; such CPUs are called big-endian. Other CPUs do the reverse: they treat the rightmost byte within an integer, the byte with the largest memory address, as the most significant byte; these are little-endian CPUs. So your function converts an integer from one endianness to the other.
int newbyte0 = (x >> 24) & 0xff;
This takes the integer (the 4 bytes) depicted above, shifts it 24 bits to the right, and masks off everything but the lower 8 bits. newbyte0 now looks like this, where Byte 4 is the original Byte 4 of x and the other 3 bytes have all bits set to zero:
+---------------------------------+
| 0 | 0 | 0 | Byte 4 |
+---------------------------------+
Similarly,
int newbyte1 = (x >> 8) & 0xff00;
Shifts the bits 8 positions to the right and masks off everything but the 8 bits in the second byte from the right. The result looks like this, with only Byte 3 remaining from the original value of x:
+---------------------------------+
| 0 | 0 | Byte 3 | 0 |
+---------------------------------+
The 2 leftmost bytes of the result are produced similarly; x is just shifted left instead, to move the remaining two bytes into place.
Finally you have
newbyte0 | newbyte1 | newbyte2 | newbyte3;
This combines all the integers you created above, each with only 8 bits remaining from the original x. Do a bitwise OR of them, and you end up with:
+---------------------------------+
|Byte 1 | Byte 2 | Byte 3 | Byte 4|
+---------------------------------+
int newbyte0 = (x >> 24) & 0xff;
Shifts the number 24 bits to the right, so that the left-most byte will now be the right-most byte. It then uses a mask (0xff) to zero out the rest of the bytes; this is not actually redundant here, because x is a signed int and a right shift of a negative value usually sign-extends (the exact behaviour is implementation-defined), so the mask is what guarantees the upper bytes end up zero.
int newbyte1 = (x >> 8) & 0xff00;
Shifts the number 8 bits to the right, so that the second byte from the left is now the second byte from the right, and the rest of the bytes are zeroed out with a mask.
int newbyte2 = (x << 8) & 0xff0000;
Shifts the number 8 bits to the left this time - essentially the same thing as the last line, only now the second byte from the right becomes the second byte from the left.
int newbyte3 = x << 24;
The same idea as the first line, but here no mask is needed at all: after shifting left by 24, only the top byte can be non-zero. The right-most byte becomes the left-most byte.
return newbyte0 | newbyte1 | newbyte2 | newbyte3;
And finally you just OR all the bytes to finish the reversal.
You can actually follow this process step-by-step in code by using printf("%x", newbyte) to print each of the bytes - the %x format allows you to print in hexadecimal.
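For example, a throwaway main like this (my own, just to act on that suggestion) prints the four pieces and the final result for x = 0x12345678:

#include <stdio.h>

int main(void)
{
    int x = 0x12345678;
    int newbyte0 = (x >> 24) & 0xff;
    int newbyte1 = (x >> 8) & 0xff00;
    int newbyte2 = (x << 8) & 0xff0000;
    int newbyte3 = x << 24;

    printf("%x %x %x %x\n", newbyte0, newbyte1, newbyte2, newbyte3);   /* 12 3400 560000 78000000 */
    printf("%x\n", newbyte0 | newbyte1 | newbyte2 | newbyte3);         /* 78563412 */
    return 0;
}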
Let's assume a 32-bit system and that you have passed 0x12345678 to the function.
int newbyte0 = (x >> 24) & 0xff; //will be 0x00000012
int newbyte1 = (x >> 8) & 0xff00; //will be 0x00003400
int newbyte2 = (x << 8) & 0xff0000; //will be 0x00560000
int newbyte3 = x << 24; //will be 0x78000000
return newbyte0 | newbyte1 | newbyte2 | newbyte3; //will be 0x78563412
This function just shifts each byte to the right position in the integer and then ORs all of them together.
For example, if x is 0xAABBCCDD:
For the first byte we shift everything 24 bits to the right, so we have 0x000000AA & 0xFF, which is 0xAA.
For the second byte we have 0x00AABBCC & 0xFF00, which is 0x0000BB00.
And so on.
We just shift bits to the right position and erase all other bits.
Yes, your understanding of the code is correct, but of course it assumes int is a 32-bit value.
int newbyte0 = (x >> 24) & 0xff; // Shift the bits 24~31 to 0~7
int newbyte1 = (x >> 8) & 0xff00; // Shift the bits 16~23 to 8~15
int newbyte2 = (x << 8) & 0xff0000; // Shift the bits 8~15 to 16~23
int newbyte3 = x << 24; // Shift bits 0~7 to 24~31
return newbyte0 | newbyte1 | newbyte2 | newbyte3; // Join all the bits
I am receiving a 3-byte integer, which I'm storing in an array. For now, assume the array is unsigned char myarray[3]
Normally, I would convert this into a standard int using:
int mynum = ((myarray[2] << 16) | (myarray[1] << 8) | (myarray[0]));
However, before I can do this, I need to convert the data from network to host byte ordering.
So, I change the above to (it comes in 0-1-2, but it's n to h, so 0-2-1 is what I want):
int mynum = ((myarray[1] << 16) | (myarray[2] << 8) | (myarray[0]));
However, this does not seem to work. For the life of me I can't figure this out. I've looked at it so much that at this point I think I'm fried and just confusing myself. Is what I am doing correct? Is there a better way? Would the following work?
int mynum = ((myarray[2] << 16) | (myarray[1] << 8) | (myarray[0]));
int correctnum = ntohl(mynum);
Here's an alternate idea. Why not just make it structured and make explicit what you're doing? Some of the confusion you're having may be rooted in the "I'm storing in an array" premise. If instead you defined:
typedef struct {
    u8 highByte;
    u8 midByte;
    u8 lowByte;
} ThreeByteInt;
To turn it into an int, you just do
u32 ThreeByteTo32(ThreeByteInt *bytes) {
    return (bytes->highByte << 16) + (bytes->midByte << 8) + (bytes->lowByte);
}
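Usage would look something like this (a sketch; I'm assuming u8/u32 are typedefs for uint8_t/uint32_t and that the three bytes arrive most significant first, i.e. network order):

#include <stdio.h>
#include <stdint.h>

typedef uint8_t  u8;
typedef uint32_t u32;

typedef struct {
    u8 highByte;
    u8 midByte;
    u8 lowByte;
} ThreeByteInt;

u32 ThreeByteTo32(ThreeByteInt *bytes) {
    return (bytes->highByte << 16) + (bytes->midByte << 8) + (bytes->lowByte);
}

int main(void)
{
    u8 received[3] = { 0x01, 0x02, 0x03 };              /* bytes as they came off the wire */
    ThreeByteInt v = { received[0], received[1], received[2] };
    printf("%x\n", ThreeByteTo32(&v));                  /* prints 10203 */
    return 0;
}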
If you receive the value in network ordering (that is, big-endian), you have this situation:
myarray[0] = most significant byte
myarray[1] = middle byte
myarray[2] = least significant byte
so this should work:
int result = (((int) myarray[0]) << 16) | (((int) myarray[1]) << 8) | ((int) myarray[2]);
Besides using structures / unions with byte-sized members, you have two other ways:
Using ntoh / hton and masking out the high byte of the 4-byte integer before or after the conversion with a bitwise AND (see the sketch after this list).
Doing the bitshift operations contained in other answers.
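A sketch of the first option (my own helper name; it assumes a POSIX system for <arpa/inet.h> and that myarray holds the three bytes in the order they arrived, most significant first):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl */

uint32_t three_bytes_to_host(const unsigned char myarray[3])
{
    uint32_t be = 0;
    memcpy((unsigned char *)&be + 1, myarray, 3);   /* build the big-endian image 00 b0 b1 b2 */
    return ntohl(be) & 0x00ffffffu;                 /* convert, then mask the (already zero) high byte */
}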
At any rate you should not rely on implicit integer promotions when you shift data beyond the size of its type.
A shift by 16 is beyond the size of unsigned char and can cause problems depending on compiler, flags, platform endianness and byte order. So always do the proper cast before the bitwise operations to make it work on any compiler / platform:
int result = (((int) myarray[0]) << 16) | (((int) myarray[1]) << 8) | ((int) myarray[2]);
Why not just receive into the top 3 bytes of a 4-byte buffer? After that you could use ntohl, which is just a byte-swap instruction on most architectures. At some optimization levels it'll be faster than simple bit shifts and ORs:
union
{
    int32_t val;
    unsigned char myarray[4];
} data;
memcpy(&data.myarray[1], buffer, 3);
data.myarray[0] = 0;
data.val = ntohl(data.val);
or in case you have copied it into the bottom 3 bytes, then an extra shift is needed:
memcpy(&data, buffer, 3);
data.myarray[3] = 0;
data.val = ntohl(data.val) >> 8;
unsigned char myarray[3] = { 1, 2, 3 };
# if LITTLE_ENDIAN // you figure out a way to express this on your platform
int mynum = (myarray[0] << 0) | (myarray[1] << 8) | (myarray[2] << 16);
# else
int mynum = (myarray[0] << 16) | (myarray[1] << 8) | (myarray[2] << 0);
# endif
printf("%x\n", mynum);
That prints 30201 which I think is what you want. The key is to realize that you have to shift the bytes differently per-platform: you can't easily use ntohl() because you don't know where to put the extra zero byte.