Bitwise operation in C language (0x80, 0xFF, << ) - c

I have a problem understanding this code. What I know is that we have passed code to an assembler that has converted it into "byte code". Now I have a virtual machine that is supposed to read this code. This function is supposed to read the first bytecode instruction. I guess we are trying to read this bytecode, but I don't understand how it is done.
static int32_t bytecode_to_int32(const uint8_t *bytecode, size_t size)
{
int32_t result;
t_bool sign;
int i;
result = 0;
sign = (t_bool)(bytecode[0] & 0x80);
i = 0;
while (size)
{
if (sign)
result += ((bytecode[size - 1] ^ 0xFF) << (i++ * 8));
else
result += bytecode[size - 1] << (i++ * 8);
size--;
}
if (sign)
result = ~(result);
return (result);
}

This code is somewhat badly written, lots of operations on a single line and therefore containing various potential bugs. It looks brittle.
bytecode[0] & 0x80 Simply reads the MSB sign bit, assuming it's 2's complement or similar, then converts it to a boolean.
The loop iterates backwards through the array, from the last byte (the least significant one, since the data is big-endian) to the first (the most significant).
If the sign was negative, the code will perform an XOR of the data byte with 0xFF. Basically inverting all bits in the data. The result of the XOR is an int.
The data byte (or the result of the above XOR) is then bit shifted i * 8 bits to the left. The data is always implicitly promoted to int, so if the shifted value does not fit in an int (for example a byte of 0x80 or more shifted left by 24 bits on a system with 32-bit int), there's a fat undefined behavior bug here. It would be much safer practice to cast to uint32_t before the shift, carry out the shift, then convert to a signed type afterwards.
The resulting int is converted to int32_t - these could be the same type or different types depending on system.
i is incremented by 1, size is decremented by 1.
If sign was negative, the int32_t is inverted: all the data bits are inverted once more, back to their original values, and all the zeros in the upper bytes that were never written are replaced by ones. The net effect is that the value gets sign extended to 32 bits. So for example if you started with the single byte 0x81 you now have 0xFFFFFF81. The code gives no hint that this is the intention.
My take is that the bytecode[size - 1] ^ 0xFF (which is equivalent to ~) was made to toggle the data bits, so that they would later toggle back to their original values when ~ is called later. A programmer has to document such tricks with comments, if they are anything close to competent.
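To see what the XOR / ~ pair amounts to, here is a minimal sketch of the same steps accumulated in a uint32_t so that no shift can overflow a signed int. The name invert_trick and the limit of 1 to 4 bytes are assumptions made for this illustration, not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical re-statement of the question's function: same steps,
   but accumulated in an unsigned type. Assumes 1 <= size <= 4. */
static int32_t invert_trick(const uint8_t *bytecode, size_t size)
{
    assert(size >= 1 && size <= 4);

    uint32_t result = 0;
    int sign = (bytecode[0] & 0x80) != 0;  /* top bit of the first byte */
    unsigned shift = 0;

    while (size) {
        uint32_t b = bytecode[size - 1];   /* last byte first */
        if (sign)
            b ^= 0xFF;                     /* toggle the 8 data bits */
        result |= b << shift;
        shift += 8;
        size--;
    }
    if (sign)
        result = ~result;  /* toggle back; untouched upper bits become 1 */

    /* Converting a value above INT32_MAX is implementation-defined;
       common compilers wrap to the two's complement value. */
    return (int32_t)result;
}
```

With this version the one-byte input 0x81 comes out as -127 and the two-byte input 0xFF 0xFE as -2, while an input whose first byte has the top bit clear is returned unchanged (0x00 0x81 gives 129).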
Anyway, don't use this code. If the intention was merely to swap the byte order (endianness) of a 4-byte integer, then this code must be rewritten from scratch.
That's properly done as:
static int32_t big32_to_little32 (const uint8_t* bytes)
{
uint32_t result = (uint32_t)bytes[0] << 24 |
(uint32_t)bytes[1] << 16 |
(uint32_t)bytes[2] << 8 |
(uint32_t)bytes[3] << 0 ;
return (int32_t)result;
}
Anything more complicated than the above is highly questionable code. We need not worry about signs being a special case, the above code preserves the original signedness format.

So the A^0xFF toggles the bits set in A, so if you have 10101100 xored with 11111111.. it will become 01010011. I am not sure why they didn't use ~ here. The ^ is a xor operator, so you are xoring with 0xFF.
The << is a bitshift "up" or left. In other words, A<<1 is equivalent to multiplying A by 2.
The >> is a bitshift "down" or right, so A>>1 is equivalent to dividing A by 2 (for non-negative values).
The ~ inverts all the bits of its operand.
Note it's better to initialise variables at declaration; it costs no additional processing to do it that way.
sign = (t_bool)(bytecode[0] & 0x80); the sign in the number is stored in the 8th bit (or position 7 counting from 0), which is where the 0x80 is coming from. So it's literally checking if the signed bit is set in the first byte of bytecode, and if so then it stores it in the sign variable.
Essentially, if the value is non-negative then it's copying the bytes from bytecode into result one byte at a time.
If the value is negative then it flips the bits, copies the bytes, and when it's done copying, flips the bits back.
Personally, with this kind of thing I prefer to get the data, put it in htons() format (network byte order) and then memcpy it to an allocated array, so it is stored in an endian-agnostic way; when I retrieve the data I use ntohs() to convert it back to the format used by the computer. htons() and ntohs() are POSIX functions (not part of standard C) and are used in networking and platform-agnostic data formatting / storage / communication all the time.
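A minimal sketch of that approach for a 16-bit value, assuming a POSIX system that provides <arpa/inet.h>. The helper names store_u16 and load_u16 are made up for this example.

```c
#include <arpa/inet.h>  /* htons(), ntohs() (POSIX) */
#include <stdint.h>
#include <string.h>

/* Write v into buf in network (big-endian) byte order. */
static void store_u16(uint8_t *buf, uint16_t v)
{
    uint16_t net = htons(v);
    memcpy(buf, &net, sizeof net);
}

/* Read a network-order 16-bit value from buf and return it in host order. */
static uint16_t load_u16(const uint8_t *buf)
{
    uint16_t net;
    memcpy(&net, buf, sizeof net);
    return ntohs(net);
}
```

Whatever the host's byte order, the buffer then always holds the more significant byte first.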

This function is a very naive version of a function that converts from big endian to little endian.
The parameter size is not needed as it works only with 4-byte data.
It can be achieved much more easily with union punning (which also allows compilers to optimize it, in this case to a single instruction):
#include <stdint.h>
#include <stdio.h>
#define SWAP(a,b,t) do{t c = (a); (a) = (b); (b) = c;}while(0)
int32_t my_bytecode_to_int32(const uint8_t *bytecode)
{
union
{
int32_t i32;
uint8_t b8[4];
}i32;
uint8_t b;
i32.b8[3] = *bytecode++;
i32.b8[2] = *bytecode++;
i32.b8[1] = *bytecode++;
i32.b8[0] = *bytecode++;
return i32.i32;
}
int main()
{
union {
int32_t i32;
uint8_t b8[4];
}i32;
uint8_t b;
i32.i32 = -4567;
SWAP(i32.b8[0], i32.b8[3], uint8_t);
SWAP(i32.b8[1], i32.b8[2], uint8_t);
printf("%d\n", bytecode_to_int32(i32.b8, 4));
i32.i32 = -34;
SWAP(i32.b8[0], i32.b8[3], uint8_t);
SWAP(i32.b8[1], i32.b8[2], uint8_t);
printf("%d\n", my_bytecode_to_int32(i32.b8));
}
https://godbolt.org/z/rb6Na5

If the purpose of the code is to sign-extend a 1-, 2-, 3-, or 4-byte sequence in network/big-endian byte order to a signed 32-bit int value, it's doing things the hard way and reimplementing the wheel along the way.
This can be broken down into a three-step process: convert the proper number of bytes to a 32-bit integer value, sign-extend bytes out to 32 bits, then convert that 32-bit value from big-endian to the host's byte order.
The "wheel" being reimplemented in this case is the POSIX-standard ntohl() function that converts a 32-bit unsigned integer value in big-endian/network byte order to the local host's native byte order.
The first step I'd do is to convert 1, 2, 3, or 4 bytes into a uint32_t:
#include <stdint.h>
#include <limits.h>
#include <arpa/inet.h>
#include <errno.h>
// convert the `size` number of bytes starting at the `bytecode` address
// to a uint32_t value
static uint32_t bytecode_to_uint32( const uint8_t *bytecode, size_t size )
{
uint32_t result = 0;
switch ( size )
{
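case 4:
result = (uint32_t)bytecode[ size - 4 ] << 24;
// fall through
case 3:
result += (uint32_t)bytecode[ size - 3 ] << 16;
// fall through
case 2:
result += (uint32_t)bytecode[ size - 2 ] << 8;
// fall through
case 1:
result += bytecode[ size - 1 ];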
break;
default:
// error handling here
break;
}
return( result );
}
Then, sign-extend it (borrowing from this answer):
static uint32_t sign_extend_uint32( uint32_t in, size_t size )
{
if ( size == 4 )
{
return( in );
}
// being pedantic here - the existence of `[u]int32_t` pretty
// much ensures 8 bits/byte
size_t bits = size * CHAR_BIT;
uint32_t m = 1U << ( bits - 1 );
uint32_t result = ( in ^ m ) - m;
return ( result );
}
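To make the ( in ^ m ) - m step concrete, here is a small sketch of the same idea with a couple of inputs worked by hand in the comments. The function name sign_extend_demo is hypothetical, and size is assumed to be between 1 and 4.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low `size` bytes of `in` to 32 bits.
   Assumes 1 <= size <= 4 (an assumption of this sketch). */
static uint32_t sign_extend_demo(uint32_t in, size_t size)
{
    if (size >= 4)
        return in;

    /* m has only the sign bit of the narrow value set,
       e.g. 0x80 for size 1 and 0x8000 for size 2. */
    uint32_t m = UINT32_C(1) << (size * CHAR_BIT - 1);

    /* size 1, in = 0x81: 0x81 ^ 0x80 = 0x01, 0x01 - 0x80 wraps to 0xFFFFFF81
       size 1, in = 0x7F: 0x7F ^ 0x80 = 0xFF, 0xFF - 0x80 = 0x7F (unchanged) */
    return (in ^ m) - m;
}
```

The subtraction relies on unsigned wrap-around, which is well defined in C, so no signed overflow is involved.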
Put it all together:
static int32_t bytecode_to_int32( const uint8_t *bytecode, size_t size )
{
uint32_t result = bytecode_to_uint32( bytecode, size );
result = sign_extend_uint32( result, size );
// the shifts in bytecode_to_uint32() already produce the value in
// the host's byte order, so calling ntohl() here would wrongly swap
// the bytes again on a little-endian host
// converting uint32_t here to signed int32_t
// can be subject to implementation-defined
// behavior
return( result );
}
Note that the conversion from uint32_t to int32_t implicitly performed by the return statement in the above code can result in implementation-defined behavior as there can be uint32_t values that cannot be mapped to int32_t values. See this answer.
Any decent compiler should optimize that well into inline functions.
I personally think this also needs much better error handling/input validation.

Related

How to combine two hex value(High Value & Low Value) at two different array positions?

I received two hex values where at array[1] = lowbyte and at array[2] = highbyte where for my example lowbyte = 0xF4 and highbyte = 0x01 so the value will be in my example 1F4(500). So I want to combine these two values and compare but how do I do that without any library function?
Please help and sorry for my bad English.
I did some research and I found this as my solution and it seems to be working fine:
int temp = (short)(((HIGHBYTE) & 0xFF) << 8 | (LOWBYTE) & 0xFF);
Just a basic example showing how to combine values of two different variables into one:
#include <stdio.h>
int main (void)
{
char highbyte = 0x01;
unsigned char lowbyte = 0xF4; //Edited as per comments from @Fe2O3,
short int val = 0;
val = (highbyte << 8) | lowbyte; // If lowbyte declared as signed, then masking is required `lowbyte & 0xFF`
printf("0x%hx\n", val);
return 0;
}
Tested this on Linux PC.
Based on the answer where you converted to short, it seems you may want to combine the two bytes to produce a 16-bit two’s complement integer. This answer shows how to do that in three ways for which the behavior is fully defined by the C standard, as well as a fourth way that requires knowledge of the C implementation being used. Methods 1 and 3 are also defined in C++.
Given two eight-bit unsigned bytes with the more significant byte in highbyte and the less significant byte in lowbyte, four options for constructing the 16-bit two’s complement value they represent are:
1. Assemble the bytes in the desired order and copy them into an int16_t: uint16_t t = (uint16_t) highbyte << 8 | lowbyte; int16_t result; memcpy(&result, &t, sizeof result);.
2. Assemble the bytes in the desired order and use a union to reinterpret them: int16_t result = (union { uint16_t u; int16_t i; }) { (uint16_t) highbyte << 8 | lowbyte } .i;.
3. Construct the result arithmetically: int16_t result = ((highbyte ^ 128) - 128) * 256 + lowbyte;.
4. If it is given that the code will be used only with C implementations that define conversion to a signed integer to wrap, then a conversion may be used: int16_t result = (int16_t) ((uint16_t) highbyte << 8 | lowbyte);.
(In the last, the conversion to int16_t is implicit in the initialization, but a cast is used because, without it, some compilers will produce a warning or error, depending on switches.)
Note: int16_t and uint16_t are defined by including <stdint.h>. Alternatively, if it is given that short is 16 bits, then short and unsigned short may be used in place of int16_t and uint16_t.
Here is more information about the first three of these.
1. Assemble the bytes and copy
(uint16_t) highbyte << 8 | lowbyte converts to a type suitable for shifting without sign-bit issues, moves the more significant byte into the upper 8 bits of 16, and puts the less significant byte into the lower 8 bits.
Then uint16_t t = …; puts those bits into a uint16_t.
memcpy(&result, &t, sizeof result); copies those bits into an int16_t. C 2018 7.20.1.1 1 guarantees that int16_t uses two’s complement. C 2018 6.2.6.2 2 guarantees that the value bits in int16_t have the same position values as their counterparts in uint16_t, so the copy produces the desired arrangement in result.
2. Assemble the bytes and use a union
(type) { initial value } is a compound literal. (union { uint16_t u; int16_t i; }) { (uint16_t) highbyte << 8 | lowbyte } makes a compound literal that is a union and initializes its u member to have the value described above. Then .i reads the i member of the union, which reinterprets the bits using the type int16_t, which is two’s complement as described above. Then int16_t result = …; initializes result to this value.
3. Construct the result arithmetically
Here we start with the more significant byte separately, interpreting the eight bits of highbyte as two’s complement. In eight-bit two’s complement, the sign bit represents 0 if it is off and −128 if it is on. (For example, 11111100 as unsigned binary represents 128+64+32+16+8+4 = 252, but, in two’s complement, it is −128+64+32+16+8+4 = −4.)
Consider (highbyte ^ 128) - 128. If the first bit is off, ^ 128 turns it on, which adds 128 to its unsigned binary meaning. Then - 128 subtracts 128, producing a net effect of zero. If the first bit is on, ^ 128 turns it off, which cancels its unsigned binary meaning. Then - 128 gives the desired value. Thus (highbyte ^ 128) - 128 reinterprets the first bit to have a value of 0 if it is off and −128 if it is on.
Then ((highbyte ^ 128) - 128) * 256 moves this to the more significant byte of 16 bits (in an int type at this point), and + lowbyte puts the less significant byte in the less significant position. And of course int16_t result = …; initializes result to this computed value.
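The copy-based and arithmetic options can be sketched side by side as below. The function names combine_memcpy and combine_arith are invented for this illustration; the expressions themselves are the ones given above.

```c
#include <stdint.h>
#include <string.h>

/* Assemble the 16 bits in a uint16_t, then copy them into an int16_t. */
static int16_t combine_memcpy(uint8_t highbyte, uint8_t lowbyte)
{
    uint16_t t = (uint16_t)((uint16_t)highbyte << 8 | lowbyte);
    int16_t result;
    memcpy(&result, &t, sizeof result);
    return result;
}

/* Build the value arithmetically; (highbyte ^ 128) - 128 maps
   0..127 to 0..127 and 128..255 to -128..-1. */
static int16_t combine_arith(uint8_t highbyte, uint8_t lowbyte)
{
    return (int16_t)(((highbyte ^ 128) - 128) * 256 + lowbyte);
}
```

For the bytes from the question (high 0x01, low 0xF4) both return 500, and for high 0xFF, low 0xFC both return -4.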

Is reading one byte at a time endianness agnostic regardless of value size?

Say I am reading and writing uint32_t values to and from a stream. If I read/write one byte at a time to/from a stream and shift each byte like the below examples, will the results be consistent regardless of machine endianness?
In the examples here the stream is a buffer in memory called p.
static uint32_t s_read_uint32(uint8_t** p)
{
uint32_t value;
value = (*p)[0];
value |= (((uint32_t)((*p)[1])) << 8);
value |= (((uint32_t)((*p)[2])) << 16);
value |= (((uint32_t)((*p)[3])) << 24);
*p += 4;
return value;
}
static void s_write_uint32(uint8_t** p, uint32_t value)
{
(*p)[0] = value & 0xFF;
(*p)[1] = (value >> 8 ) & 0xFF;
(*p)[2] = (value >> 16) & 0xFF;
(*p)[3] = value >> 24;
*p += 4;
}
I don't currently have access to a big-endian machine to test this out, but the idea is if each byte is written one at a time each individual byte can be independently written or read from the stream. Then the CPU can handle endianness by hiding these details behind the shifting operations. Is this true, and if not could anyone please explain why not?
If I read/write one byte at a time to/from a stream and shift each byte like the below examples, will the results be consistent regardless of machine endianness?
Yes. Your s_write_uint32() function stores the bytes of the input value in order from least significant to most significant, regardless of their order in the native representation of that value. Your s_read_uint32() correctly reverses this process, regardless of the underlying representation of uint32_t. These work because
the behavior of the shift operators (<<, >>) is defined in terms of the value of the left operand, not its representation
the & 0xff masks off all bits of the left operand but those of its least-significant byte, regardless of the value's representation (because 0xff has a matching representation), and
the |= operations just put the bytes into the result; the positions are selected, appropriately, by the preceding left shift. This might be more clear if += were used instead, but the result would be no different.
Note, however, that to some extent, you are reinventing the wheel. POSIX defines a function pair htonl() and ntohl() -- supported also on many non-POSIX systems -- for dealing with byte-order issues in four-byte numbers. The idea is that when sending, everyone uses htonl() to convert from host byte order (whatever that is) to network byte order (big endian) and sends the resulting four-byte buffer. On receipt, everyone accepts four bytes into one number, then uses ntohl() to convert from network to host byte order.
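As a quick check of the shift-based approach, the sketch below writes a value and inspects the individual bytes. The helper names put_u32_le and get_u32_le are assumptions for this example; they take a plain pointer rather than the question's uint8_t** to keep the sketch short.

```c
#include <stdint.h>

/* Store v least significant byte first, whatever the host byte order. */
static void put_u32_le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)(v >> 24);
}

/* Rebuild the value from bytes stored least significant byte first. */
static uint32_t get_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

Because only values are shifted and masked, the buffer for 0x11223344 is 44 33 22 11 on any host, and reading it back yields the original value.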
It'll work but a memcpy followed by a conditional byteswap will give you much better codegen for the write function.
#include <stdint.h>
#include <string.h>
#define LE (((char*)&(uint_least32_t){1})[0]) // little endian ?
void byteswap(void*,size_t); // assumed to be defined elsewhere; reverses the given number of bytes in place
uint32_t s2_read_uint32(uint8_t** p)
{
uint32_t value;
memcpy(&value,*p,sizeof(value));
if(!LE) byteswap(&value,4);
return *p+=4, value;
}
void s2_write_uint32(uint8_t** p, uint32_t value)
{
memcpy(*p,&value,sizeof(value));
if(!LE) byteswap(*p,4);
*p+=4;
}
GCC since version 8 (but not Clang) can eliminate these shifts on little-endian platforms, but you should help it by restrict-qualifying the doubly-indirect pointer to the destination; otherwise it might think that a write to (*p)[0] can invalidate *p (uint8_t is a character type and therefore permitted to alias anything).
void s_write_uint32(uint8_t** restrict p, uint32_t value)
{
(*p)[0] = value & 0xFF;
(*p)[1] = (value >> 8 ) & 0xFF;
(*p)[2] = (value >> 16) & 0xFF;
(*p)[3] = value >> 24;
*p += 4;
}

Copy 6 byte array to long long integer variable

I have read from memory a 6 byte unsigned char array.
The byte order is big-endian here.
Now I want to assign the value that is stored in the array to an integer variable. I assume this has to be long long since it must contain up to 6 bytes.
At the moment I am assigning it this way:
unsigned char aFoo[6];
long long nBar;
// read values to aFoo[]...
// aFoo[0]: 0x00
// aFoo[1]: 0x00
// aFoo[2]: 0x00
// aFoo[3]: 0x00
// aFoo[4]: 0x26
// aFoo[5]: 0x8e
nBar = (aFoo[0] << 64) + (aFoo[1] << 32) +(aFoo[2] << 24) + (aFoo[3] << 16) + (aFoo[4] << 8) + (aFoo[5]);
A memcpy approach would be neat, but when I do this
memcpy(&nBar, &aFoo, 6);
the 6 bytes are being copied to the long long from the start and thus have padding zeros at the end.
Is there a better way than my assignment with the shifting?
What you want to accomplish is called de-serialisation or de-marshalling.
For values that wide, using a loop is a good idea, unless you really need the max. speed and your compiler does not vectorise loops:
uint8_t array[6];
...
uint64_t value = 0;
uint8_t *p = array;
for ( int i = (sizeof(array) - 1) * 8 ; i >= 0 ; i -= 8 )
value |= (uint64_t)*p++ << i;
// left-align
value <<= 64 - (sizeof(array) * 8);
Note the use of stdint.h types; only these are guaranteed to have the expected bit widths, and sizeof(uint8_t) cannot differ from 1. Also use unsigned integers when shifting values: right-shifting a negative value is implementation-defined, while left-shifting one invokes undefined behaviour.
Iff you need a signed value, just
int64_t final_value = (int64_t)value;
after the shifting. This is still implementation defined, but all modern implementations (and likely the older) just copy the value without modifications. A modern compiler likely will optimize this, so there is no penalty.
The declarations can be moved, of course. I just put them before where they are used for completeness.
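A variant of the loop that keeps the value right-aligned (so the bytes from the question give 0x268e) might look like the sketch below. The function name be_bytes_to_u64 is an assumption for this example, and it expects at most 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine n big-endian bytes into one unsigned value.
   Assumes n <= 8 (an assumption of this sketch). */
static uint64_t be_bytes_to_u64(const uint8_t *bytes, size_t n)
{
    uint64_t value = 0;
    for (size_t i = 0; i < n; i++)
        value = (value << 8) | bytes[i];  /* make room, then append the next byte */
    return value;
}
```

Calling it with the six bytes 00 00 00 00 26 8e returns 0x268e, i.e. 9870.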
You might try
nBar = 0;
memcpy((unsigned char*)&nBar + 2, aFoo, 6);
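No & is needed before an array name because it already converts to a pointer to its first element. Note that this puts the bytes in the right place only on a big-endian host where long long is 8 bytes wide; on a little-endian host the bytes land in the wrong positions.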
The correct way to do what you need is to use a union:
#include <stdio.h>
typedef union {
struct {
char padding[2];
char aFoo[6];
} chars;
long long nBar;
} Combined;
int main ()
{
Combined x;
// reset the content of "x"
x.nBar = 0; // or memset(&x, 0, sizeof(x));
// put values directly in x.chars.aFoo[]...
x.chars.aFoo[0] = 0x00;
x.chars.aFoo[1] = 0x00;
x.chars.aFoo[2] = 0x00;
x.chars.aFoo[3] = 0x00;
x.chars.aFoo[4] = 0x26;
x.chars.aFoo[5] = 0x8e;
printf("nBar: %llx\n", x.nBar);
return 0;
}
The advantage: the code is more clear, there is no need to juggle with bits, shifts, masks etc.
However, you have to be aware that, for speed optimization and hardware reasons, the compiler might squeeze padding bytes into the struct, leading to aFoo not sharing the desired bytes of nBar. This minor disadvantage can be solved by telling the compiler to align the members of the union at byte-boundaries (as opposed to the default which is the alignment at word-boundaries, the word being 32-bit or 64-bit, depending on the hardware architecture).
This used to be achieved using a #pragma directive and its exact syntax depends on the compiler you use.
Since C11/C++11, the alignas() specifier became the standard way to specify the alignment of struct/union members (given your compiler already supports it).

Shifting bit values in C

Say I have the following code:
uint32_t fillThisNum(int16_t a, int16_t b, int16_t c){
uint32_t x = 0;
uint16_t temp_a = 0, temp_b = 0, temp_c = 0;
temp_a = a << 24;
temp_b = b << 4;
temp_c = c << 4;
x = temp_a|temp_b|temp_c;
return x;
}
Essentially what I'm trying to do is fill the 32-bit number with bit information that I can extract at a later time to perform different operations.
Parameter a would hold the first 24 bits of "data", b would hold the next 4 bits of "data" and c would hold the final 4 bits of "data".
I have a couple questions:
Do the parameters have to be the same bit length as the function type, and must they be unsigned?
Can I assign an unsigned int to a signed int? (i.e. uint32_t a = int32_t b;)
Can I fill a 32-bit number with the 16-bit parameters so long they don't exceed the length of the 32-bit return value.
Any advice/tips/hints would be much appreciated, thank you.
A correct way to write this code is:
uint32_t fillThisNum(uint32_t a, uint32_t b, uint32_t c)
{
// mask out the bits we are not interested in
a &= 0xFFFFFF; // save lowest 24 bits
b &= 0xF; // save lowest 4 bits
c &= 0xF; // save lowest 4 bits
// arrange a,b,c within a 32-bit unit so that they do not overlap
return (a << 8) + (b << 4) + c;
}
By using an unsigned type for the parameters, you avoid any issues with signed arithmetic overflow, sign extension, etc.
It's OK to pass signed values as arguments when calling the function, those values will be converted to unsigned.
By using uint32_t as the parameter type then you avoid having to declare any temporary variables or worry about type width when doing your casting. It makes it easier for you to write clear code, this way.
You don't have to do it this way but this is a simple way to make sure you don't make any mistakes.
Do the parameters have to be the same bit length as the function type, and must they be unsigned?
No, the arguments and the return value can be different types.
Can I assign an unsigned int to a signed int? (i.e. uint32_t a = int32_t b;)
Yes, the value will be converted from a signed to an unsigned value. The bits in "b" will stay the same, so while "b" is in 2's complement, "a" will be a positive 32-bit number.
So, for example, let int8_t c = -127. If you perform an assignment uint8_t d = c, then "d" will be 129.
Can I fill a 32-bit number with the 16-bit parameters so long they don't exceed the length of the 32-bit return value.
If by that, you mean the way that you did in your code:
x = temp_a|temp_b|temp_c;
Yes, that is fine, with the caveat that @chux mentioned: you can't shift an n-bit value by n or more bits. If you wanted to set bits more significant than bit 15 in x, a way to do this would be to set up one of the temp masks with a 32-bit value instead of a 16-bit one.
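Since the goal is to extract the fields again later, here is a short sketch of packing together with the matching extraction. The names pack_fields, field_a, field_b and field_c are hypothetical; the layout (24 bits, then 4, then 4) follows the description in the question.

```c
#include <stdint.h>

/* Layout: bits 31..8 hold a (24 bits), bits 7..4 hold b, bits 3..0 hold c. */
static uint32_t pack_fields(uint32_t a, uint32_t b, uint32_t c)
{
    return ((a & 0xFFFFFF) << 8) | ((b & 0xF) << 4) | (c & 0xF);
}

/* Each extractor shifts its field down to bit 0 and masks off the rest. */
static uint32_t field_a(uint32_t x) { return x >> 8; }
static uint32_t field_b(uint32_t x) { return (x >> 4) & 0xF; }
static uint32_t field_c(uint32_t x) { return x & 0xF; }
```

For example, pack_fields(0xABCDEF, 0x1, 0x2) gives 0xABCDEF12, and the three extractors return 0xABCDEF, 1 and 2 again.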

fetch 32bit instruction from binary file in C

I need to read 32bit instructions from a binary file.
so what i have right now is:
unsigned char buffer[4];
fread(buffer,sizeof(buffer),1,file);
which will put 4 bytes in an array
how should I approach that to connect those 4 bytes together in order to process 32bit instruction later?
Or should I even start in a different way and not use fread?
my weird method right now is to create an array of ints of size 32 and the fill it with bits from buffer array
The answer depends on how the 32-bit integer is stored in the binary file. (I'll assume that the integer is unsigned, because it really is an id, and use the type uint32_t from <stdint.h>.)
Native byte order: the data was written out as an integer on this machine. Just read the integer with fread:
uint32_t op;
fread(&op, sizeof(op), 1, file);
Rationale: fread reads the raw representation of the integer into memory. The matching fwrite does the reverse: It writes the raw representation to the file. If you don't need to exchange the file between platforms, this is a good method to store and read data.
Little-endian byte order: the data is stored as four bytes, least significant byte first:
uint32_t op = 0u;
op |= (uint32_t)getc(file); // 0x000000AA
op |= (uint32_t)getc(file) << 8; // 0x0000BBaa
op |= (uint32_t)getc(file) << 16; // 0x00CCbbaa
op |= (uint32_t)getc(file) << 24; // 0xDDccbbaa
Rationale: getc reads a char and returns an integer between 0 and 255. (The case where the stream runs out and getc returns the negative value EOF is not considered here for brevity, viz laziness.) Build your integer by shifting each byte you read by multiples of 8 and or them with the existing value. The comments sketch how it works. The capital letters are being read, the lower-case letters were already there. Zeros have not yet been assigned.
Big-endian byte order: the data is stored as four bytes, least significant byte last:
uint32_t op = 0u;
op |= (uint32_t)getc(file) << 24; // 0xAA000000
op |= (uint32_t)getc(file) << 16; // 0xaaBB0000
op |= (uint32_t)getc(file) << 8; // 0xaabbCC00
op |= (uint32_t)getc(file); // 0xaabbccDD
Rationale: Pretty much the same as above, only that you shift the bytes in another order.
You can imagine little-endian and big-endian as writing the number one hundred and twenty-three (CXXIII) as either 321 or 123. The bit-shifting is similar to shifting decimal digits when dividing by or multiplying with powers of 10, only that you shift by 8 bits to multiply with 2^8 = 256 here.
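The EOF case that was skipped above can be handled with a small wrapper. This is only a sketch for the big-endian layout; the function name read_u32_be and its return convention (1 on success, 0 if the stream ends early) are assumptions, not an established API.

```c
#include <stdint.h>
#include <stdio.h>

/* Read four bytes, most significant first, into *out.
   Returns 1 on success and 0 if the stream ends or fails early. */
static int read_u32_be(FILE *file, uint32_t *out)
{
    uint32_t op = 0;
    for (int i = 0; i < 4; i++) {
        int c = getc(file);           /* 0..255, or EOF */
        if (c == EOF)
            return 0;
        op = (op << 8) | (uint32_t)c; /* shift earlier bytes up, append this one */
    }
    *out = op;
    return 1;
}
```

Checking the return value of getc before using it keeps the negative EOF value from being merged into the result.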
Add
unsigned int instruction;
memcpy(&instruction,buffer,4);
to your code. This will copy the 4 bytes of buffer to a single 32-bit variable. Hence you will get connected 4 bytes :)
If you know that the int in the file is the same endian as the machine the program's running on, then you can read straight into the int. No need for a char buffer.
unsigned int instruction;
fread(&instruction,sizeof(instruction),1,file);
If you know the endianness of the int in the file, but not the machine the program's running on, then you'll need to add and shift the bytes together.
unsigned char buffer[4];
unsigned int instruction;
fread(buffer,sizeof(buffer),1,file);
//big-endian
instruction = ((unsigned int)buffer[0]<<24) + ((unsigned int)buffer[1]<<16) + ((unsigned int)buffer[2]<<8) + buffer[3];
//little-endian
instruction = ((unsigned int)buffer[3]<<24) + ((unsigned int)buffer[2]<<16) + ((unsigned int)buffer[1]<<8) + buffer[0];
Another way to think of this is that it's a positional number system in base-256. So just like you combine digits in a base-10.
257
= 2*100 + 5*10 + 7
= 2*10^2 + 5*10^1 + 7*10^0
So you can also combine them using Horner's rule.
//big-endian
instruction = ((buffer[0]*256u + buffer[1])*256u + buffer[2])*256u + buffer[3];
//little-endian
instruction = ((buffer[3]*256u + buffer[2])*256u + buffer[1])*256u + buffer[0];
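Horner's rule also generalizes to a loop over any number of base-256 digits. The sketch below shows both digit orders; the function names horner_be and horner_le are made up for this example, and at most 4 digits are assumed so the result fits in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Most significant digit first: multiply the running total by the base,
   then add the next digit. Assumes n <= 4. */
static uint32_t horner_be(const unsigned char *digits, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = acc * 256u + digits[i];
    return acc;
}

/* Least significant digit first: same rule, walking the array backwards. */
static uint32_t horner_le(const unsigned char *digits, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = n; i > 0; i--)
        acc = acc * 256u + digits[i - 1];
    return acc;
}
```

With the digits 0, 0, 1, 1 the big-endian version returns 1*256 + 1 = 257, mirroring the decimal example above.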
@luser droog
There are two bugs in your code.
The size of the variable "instruction" need not be 4 bytes: for example, Turbo C assumes sizeof(int) to be 2. Obviously, your program fails in this case. But, what is much more important and not so obvious: your program will also fail if the result is stored in a type wider than 4 bytes! To understand this, consider the following example:
#include <stdio.h>
int main(void)
{ const unsigned char a[4] = {0x21,0x43,0x65,0x87};
const unsigned char* p = a;
unsigned long x = (((((p[3] << 8) + p[2]) << 8) + p[1]) << 8) + p[0];
printf("%08lX\n", x);
return 0;
}
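This program prints "FFFFFFFF87654321" under amd64, because an unsigned char operand is promoted to (signed) int before the shift; the intermediate result overflows int (formally undefined behaviour) and is then sign-extended when converted to unsigned long. So, changing the type of the variable "instruction" from "int" to "long" does not solve the problem.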
The only way is to write something like:
unsigned long instruction;
instruction = 0;
for (int i = 3; i >= 0; i--) {
instruction <<= 8;
instruction += buffer[i];
}
