build int32_t from 4 uint8_t values - c

I have a function which constructs an int32_t from 4 uint8_t values, and I use the following test to gain some confidence that my results are what they're expected to be, because I'm depending on (what I believe is) implementation-defined behaviour.
Does it make sense what I'm doing?
Are there better ways to construct the int32_t?
int32_t expected = -1;
int32_t res = 0;
uint8_t b0 = 0xFF;
uint8_t b1 = 0xFF;
uint8_t b2 = 0xFF;
uint8_t b3 = 0xFF;
res |= b0;
res |= b1 << 8;
res |= b2 << 16;
/* This is IDB, this value cannot be represented in int32_t */
res |= ((uint32_t) b3) << 24;
ck_assert(res == expected);

This is pretty much the best way, except you should have a cast to uint32_t on every line, to avoid implicit conversions and signedness issues.
The main concern here is that performing bit-wise operations on signed operands tends to invoke poorly-defined behavior. In particular, look out for this:
Left-shifting a bit into the sign bit of a signed number invokes undefined behavior. This includes shifting more bits than the type can hold.
Left-shifting a signed variable which has a negative value invokes undefined behavior.
Right-shifting a signed variable which has a negative value invokes implementation-defined behavior (you end up with either arithmetic or logical shift).
These concerns can all be avoided by making sure that you always shift using an unsigned type, like a uint32_t. For example:
res |= (int32_t)((uint32_t)b1 << 8);
The above code is rugged and good practice, since it doesn't contain any implicit promotions.
Endianness is no concern in this case since you use bit shifts. Had you used any other method (type punning, pointer arithmetic, etc.), it would have been a concern.
Signedness format is no concern in this case. The fixed-width stdint.h types are guaranteed to use 2's complement. They will never use 1's complement or sign & magnitude.
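A complete version with a cast on every line might look like this (a sketch based on the code above, using plain assert instead of the Check framework's ck_assert):

#include <stdint.h>
#include <assert.h>

int main(void)
{
    int32_t expected = -1;
    uint8_t b0 = 0xFF, b1 = 0xFF, b2 = 0xFF, b3 = 0xFF;

    /* Assemble in an unsigned type so no shift ever touches a sign bit. */
    uint32_t u = (uint32_t)b0
               | ((uint32_t)b1 << 8)
               | ((uint32_t)b2 << 16)
               | ((uint32_t)b3 << 24);

    /* The final uint32_t -> int32_t conversion is still implementation-defined
       for values above INT32_MAX; on a 2's complement implementation it simply
       reinterprets the bits, giving -1 here. */
    int32_t res = (int32_t)u;

    assert(res == expected);
    return 0;
}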

typedef union
{
uint32_t u32;
int32_t i32;
float f;
uint16_t u16[2];
int16_t i16[2];
uint8_t u8[4];
int8_t i8[4];
char c[4];
} any32;
I keep that in my back pocket for all of my embedded system projects. Aside from needing to understand the endianness of your system, you can build the 32-bit values rather easily from 8-bit pieces. This is very useful if you are shuttling out bytes on a serial line or I2C or SPI. It's also useful if you are working with 8.24 (or 16.16 or 24.8) fixed-point math. I generally supplement this with some #defines to help with any endian headaches:
//!\todo add 16-bit boundary endian-ness options
#if (__LITTLE_ENDIAN)
#define FP_824_INTEGER (3)
#define FP_824_FRAC_HI (2)
#define FP_824_FRAC_MID (1)
#define FP_824_FRAC_LOW (0)
#elif (__BIG_ENDIAN)
#define FP_824_INTEGER (0)
#define FP_824_FRAC_HI (1)
#define FP_824_FRAC_MID (2)
#define FP_824_FRAC_LOW (3)
#else
#error undefined endian implementation
#endif
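For example, here is a hypothetical sketch of pulling apart an 8.24 fixed-point value with the union and macros above (assuming a little-endian target, so __LITTLE_ENDIAN is set, and assuming the any32 and FP_824_* definitions are in scope):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    any32 v;
    v.u32 = 0x02400000;                            /* 2.25 in 8.24 fixed point */

    /* Index into the byte view using the endian-aware macros. */
    uint8_t integer_part = v.u8[FP_824_INTEGER];   /* 0x02 */
    uint8_t frac_hi      = v.u8[FP_824_FRAC_HI];   /* 0x40 */

    printf("integer part: %d, high fraction byte: 0x%02X\n",
           integer_part, frac_hi);
    return 0;
}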

If the implementation supports int32_t, uint32_t, and uint8_t, the following is guaranteed not to produce an implementation-defined value or raise an implementation-defined signal (see C11 §6.3.1.3):
#include <stdint.h>
void foo(void)
{
uint8_t b0 = 0xef;
uint8_t b1 = 0xbe;
uint8_t b2 = 0xad;
uint8_t b3 = 0xde;
uint32_t ures = b0 | ((uint32_t)b1 << 8) |
((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
// Avoid implementation-defined value or signal...
int32_t sres = (ures < (uint32_t)INT32_MIN) ?
(int32_t)ures : INT32_MIN + (int32_t)(ures & INT32_MAX);
}
The fixed-width signed integer types are guaranteed to have a 2's complement representation, but have no specific rules for converting out-of-range values (unlike the unsigned fixed-width integer types).
EDIT: Perhaps (ures <= INT32_MAX) would be more intuitive than (ures < (uint32_t)INT32_MIN) in the above code.

Related

How to combine two hex value(High Value & Low Value) at two different array positions?

I received two hex values, where array[1] = lowbyte and array[2] = highbyte; in my example lowbyte = 0xF4 and highbyte = 0x01, so the combined value will be 0x1F4 (500). I want to combine these two values and compare them, but how do I do that without any library function?
Please help and sorry for my bad English.
I did some research and I found this as my solution and it seems to be working fine:
int temp = (short)(((HIGHBYTE) & 0xFF) << 8 | (LOWBYTE) & 0xFF);
Just a basic example showing how to combine values of two different variables into one:
#include <stdio.h>
int main (void)
{
char highbyte = 0x01;
unsigned char lowbyte = 0xF4; //Edited as per comments from #Fe2O3,
short int val = 0;
val = (highbyte << 8) | lowbyte; // If lowbyte declared as signed, then masking is required `lowbyte & 0xFF`
printf("0x%hx\n", val);
return 0;
}
Tested this on Linux PC.
Based on the answer where you converted to short, it seems you may want to combine the two bytes to produce a 16-bit two’s complement integer. This answer shows how to do that in three ways for which the behavior is fully defined by the C standard, as well as a fourth way that requires knowledge of the C implementation being used. Methods 1 and 3 are also defined in C++.
Given two eight-bit unsigned bytes with the more significant byte in highbyte and the less significant byte in lowbyte, four options for constructing the 16-bit two’s complement value they represent are:
Assemble the bytes in the desired order and copy them into an int16_t: uint16_t t = (uint16_t) highbyte << 8 | lowbyte; int16_t result; memcpy(&result, &t, sizeof result);.
Assemble the bytes in the desired order and use a union to reinterpret them: int16_t result = (union { uint16_t u; int16_t i; }) { (uint16_t) highbyte << 8 | lowbyte } .i;.
Construct the result arithmetically: int16_t result = ((highbyte ^ 128) - 128) * 256 + lowbyte;.
If it is given that the code will be used only with C implementations that define conversion to a signed integer to wrap, then a conversion may be used: int16_t result = (int16_t) ((uint16_t) highbyte << 8 | lowbyte);.
(In the last, the conversion to int16_t is implicit in the initialization, but a cast is used because, without it, some compilers will produce a warning or error, depending on switches.)
Note: int16_t and uint16_t are defined by including <stdint.h>. Alternatively, if it is given that short is 16 bits, then short and unsigned short may be used in place of int16_t and uint16_t.
Here is more information about the first three of these.
1. Assemble the bytes and copy
(uint16_t) highbyte << 8 | lowbyte converts to a type suitable for shifting without sign-bit issues, moves the more significant byte into the upper 8 bits of 16, and puts the less significant byte into the lower 8 bits.
Then uint16_t t = …; puts those bits into a uint16_t.
memcpy(&result, &t, sizeof result); copies those bits into an int16_t. C 2018 7.20.1.1 1 guarantees that int16_t uses two’s complement. C 2018 6.2.6.2 2 guarantees that the value bits in int16_t have the same position values as their counterparts in uint16_t, so the copy produces the desired arrangement in result.
2. Assemble the bytes and use a union
(type) { initial value } is a compound literal. (union { uint16_t u; int16_t i; }) { (uint16_t) highbyte << 8 | lowbyte } makes a compound literal that is a union and initializes its u member to have the value described above. Then .i reads the i member of the union, which reinterprets the bits using the type int16_t, which is two's complement as described above. Then int16_t result = …; initializes result to this value.
3. Construct the result arithmetically
Here we start with the more significant byte separately, interpreting the eight bits of highbyte as two's complement. In eight-bit two's complement, the sign bit represents 0 if it is off and −128 if it is on. (For example, 11111100₂ as unsigned binary represents 128+64+32+16+8+4 = 252, but, in two's complement, it is −128+64+32+16+8+4 = −4.)
Consider (highbyte ^ 128) - 128. If the first bit is off, ^ 128 turns it on, which adds 128 to its unsigned binary meaning. Then - 128 subtracts 128, producing a net effect of zero. If the first bit is on, ^ 128 turns it off, which cancels its unsigned binary meaning. Then - 128 gives the desired value. Thus (highbyte ^ 128) - 128 reinterprets the first bit to have a value of 0 if it is off and −128 if it is on.
Then ((highbyte ^ 128) - 128) * 256 moves this to the more significant byte of 16 bits (in an int type at this point), and + lowbyte puts the less significant byte in the less significant position. And of course int16_t result = …; initializes result to this computed value.
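A small self-contained check (a sketch, not from the original answer) that the three fully defined methods agree, using the 0x01F4 example from the question:

#include <stdint.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
    uint8_t highbyte = 0x01, lowbyte = 0xF4;

    /* 1. Assemble into a uint16_t, then copy the bits into an int16_t. */
    uint16_t t = (uint16_t)((uint16_t)highbyte << 8 | lowbyte);
    int16_t r1;
    memcpy(&r1, &t, sizeof r1);

    /* 2. Reinterpret the bits through a union compound literal. */
    int16_t r2 = (union { uint16_t u; int16_t i; })
                 { (uint16_t)((uint16_t)highbyte << 8 | lowbyte) }.i;

    /* 3. Construct the value arithmetically. */
    int16_t r3 = (int16_t)(((highbyte ^ 128) - 128) * 256 + lowbyte);

    printf("%d %d %d\n", r1, r2, r3);   /* all three print 500 */
    return 0;
}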

Convert 8 bit signed integer to unsigned and then convert to int32

I have a signed 8-bit integer (int8_t) -- which can be any value from -5 to 5 -- and need to convert it to an unsigned 8-bit integer (uint8_t).
This uint8_t value then gets passed to another piece of hardware (which can only handle 32-bit types) and needs to be converted to a int32_t.
How can I do this?
Example code:
#include <stdio.h>
#include <stdint.h>
void main() {
int8_t input;
uint8_t package;
int32_t output;
input = -5;
package = (uint8_t)input;
output = (int32_t)package;
printf("output = %d",output);
}
In this example, I start with -5. It temporarily gets cast to 251 so it can be packaged as a uint8_t. This data then gets sent to another piece of hardware where I can't use (int8_t) to cast the 8-bit unsigned integer back to signed before casting to int32_t. Ultimately, I want to be able to obtain the original -5 value.
For more info, the receiving hardware is a SHARC processor which doesn't allow int8_t - see https://ez.analog.com/dsp/sharc-processors/f/q-a/118470/error-using-stdint-h-types
The smallest addressable memory unit on the SHARC processor is 32 bits, which means that the minimum size of any data type is 32 bits. This applies to the native C types like char and short. Because the types "int8_t", "uint16_t" specify that the size of the type must be 8 bits and 16 bits respectively, they cannot be supported for SHARC.
Here is one possible branch-free conversion:
output = package; // range 0 to 255
output -= (output & 0x80) << 1;
The second line will subtract 256 if bit 7 is set, e.g.:
251 has bit 7 set, 251 - 256 = -5
5 has bit 7 clear, 5 - 0 = 5
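Wrapped up as a function, the trick might look like this (a sketch; the function name is made up here):

#include <stdint.h>
#include <stdio.h>

static int32_t sign_extend_u8(uint8_t package)
{
    int32_t output = package;           /* range 0 to 255 */
    output -= (output & 0x80) << 1;     /* subtract 256 if bit 7 is set */
    return output;
}

int main(void)
{
    printf("%d\n", sign_extend_u8(251));   /* prints -5 */
    printf("%d\n", sign_extend_u8(5));     /* prints 5 */
    return 0;
}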
If you want to get the negative sign back using 32-bit operations, you could do something like this:
output = (int32_t)package;
if (output & 0x80) { /* char sign bit set */
output |= 0xffffff00;
}
printf("output = %d",output);
Since your receiver platform does not have types that are less than 32 bits wide, your simplest option is to solve this problem on the sender:
int8_t input = -5;
int32_t input_extended = input;
uint8_t buffer[4];
memcpy(buffer, &input_extended, 4);
send_data(buffer, 4);
Then on the receiving end you can simply treat the data as a single int32_t:
int32_t received_data;
receive_data(&received_data, 4);
All of this is assuming that your sender and receiver share the same endianness. If not, you will have to flip the endianness in the sender before sending:
int8_t input = -5;
int32_t input_extended = input;
uint32_t tmp = (uint32_t)input_extended;
tmp = ((tmp >> 24) & 0x000000ff)
| ((tmp >> 8) & 0x0000ff00)
| ((tmp << 8) & 0x00ff0000)
| ((tmp << 24) & 0xff000000);
uint8_t buffer[4];
memcpy(buffer, &tmp, 4);
send_data(buffer, 4);
Just subtract 256 from the value, because in 2's complement an n-bit negative value v is stored as 2^n − |v|:
input = -5;
package = (uint8_t)input;
output = package > 127 ? (int32_t)package - 256 : package;
EDIT:
If the issue is that your code has if statements for values of -5 to 5, then the simplest solution might be to test result + 5 and change the if statements to values between 0 and 10, as sketched below.
This is probably what the compiler will do when optimizing (since values of 0-10 can be turned into a lookup table, avoiding if statements and reducing pipeline flushes from branch misprediction).
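A minimal sketch of that idea (the table contents here are placeholders, just to illustrate mapping -5..5 onto indices 0..10):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    static const char *labels[11] = {
        "-5", "-4", "-3", "-2", "-1", "0", "+1", "+2", "+3", "+4", "+5"
    };
    int32_t output = -5;            /* result of the conversion */
    int32_t index = output + 5;     /* -5..5 maps onto 0..10 */

    if (index >= 0 && index <= 10)
        printf("value is %s\n", labels[index]);
    return 0;
}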
Original:
Type casting will work if first cast to uint8_t and then uint32_t...
output = (int32_t)(uint32_t)(uint8_t)input;
Of course, if the 8th bit is set it will remain set, but the sign won't be extended since the type casting operation is telling the compiler to treat the 8th bit as a regular bit (it is unsigned).
Of course, you can always have fun with bit masking if you want to be even more strict, but that's essentially a waste of CPU cycles.
The code:
#include <stdint.h>
#include <stdio.h>
void main() {
int8_t input;
int32_t output;
input = -5;
output = (int32_t)(uint32_t)(uint8_t)input;
printf("output = %d\n", output);
}
Results in "output = 251".

Difference between two C code reading register

I would like to know the difference between these two pieces of code. The first:
int32_t lsm6dso32_temperature_raw_get(stmdev_ctx_t *ctx, int16_t *val)
{
uint8_t buff[2];
int32_t ret;
ret = lsm6dso32_read_reg(ctx, LSM6DSO32_OUT_TEMP_L, buff, 2);
val[0] = (int16_t)buff[1];
val[0] = (val[0] * 256) + (int16_t)buff[0];
return ret;
}
And this one
//some code here
uint8_t buffer[14];
data_reg.read(buffer, 14);
rawTemp = buffer[1] << 8 | buffer[0];
//some other code here
The first code comes from ST and the second comes from Adafruit; both are written for the LSM6DSO32 6-DOF sensor. I'm only interested in reading the bytes. What is the difference between
val[0] = (int16_t)buff[1];
val[0] = (val[0] * 256) + (int16_t)buff[0];
And this
rawTemp = buffer[1] << 8 | buffer[0];
Is one solution better than the other? Both of them read the raw temperature from the sensor.
Edit: rawTemp is int16_t.
The two methods may be identical in their defined behavior, depending on the type of rawTemp, which we are not shown. However, neither is a generally correct way to decode an int16_t value from two eight-bit bytes.
Considering this code first:
val[0] = (int16_t)buff[1];
val[0] = (val[0] * 256) + (int16_t)buff[0];
buff[1] is uint8_t, so its values range from 0 to 255, so val[0]*256 can range from 0 to 255•256 = 65,280. val[0] is int16_t, so its values range from −32,768 to +32,767. So this assignment may exceed what can be stored in val[0]. In that case, the result of the conversion is implementation-defined.
Further, the type of val[0]*256 is int (due to the usual arithmetic conversions). So, if int is 16 bit, then val[0]*256 may overflow what is representable in int, in which case the behavior is undefined.
Considering this code:
int16_t rawTemp;
rawTemp = buffer[1] << 8 | buffer[0];
buffer[1] << 8 may also overflow int. Since the behavior is undefined, an overflow is not required to produce the same result as the above code, particularly since the operation is different (shift versus multiplication) and it is combined with the other byte in a different way (bitwise OR instead of addition).
As with the first code, an implementation-defined conversion may occur in the assignment. If so, and the right side of the assignment does not overflow (perhaps because int is 32 bits), then the conversion will produce the same result as in the first code, since the same implementation-defined conversion is applied to the same value.
In the absence of assurance that the first byte is less than 128, or that int is more than 16 bits and the implementation-defined conversion wraps modulo 2^16, neither of these is a correct way to decode an int16_t from two eight-bit bytes.
C is deficient in good ways to reassemble a signed integer from bits. The various arithmetic and bit operators get hung up on the sign bit and overflows. One method is to assemble an unsigned integer and copy its bits into a signed integer:
uint16_t t = (uint16_t) buffer[1] << 8 | buffer[0];
int16_t ret;
memcpy(&ret, &t, sizeof ret);
A good compiler will generate code that eliminates the memcpy.
(Note that all of the above assumes a little-endian representation: the earlier byte, buffer[0], goes into the low bits (the “little end”) of the value.)

Bitwise operation in C language (0x80, 0xFF, << )

I have a problem understanding this code. What I know is that we have passed code through an assembler that has converted it into "byte code". Now I have a virtual machine that is supposed to read this code. This function is supposed to read the first byte-code instruction. I don't understand what is happening in this code. I guess we are trying to read the byte code, but I don't understand how it is done.
static int32_t bytecode_to_int32(const uint8_t *bytecode, size_t size)
{
int32_t result;
t_bool sign;
int i;
result = 0;
sign = (t_bool)(bytecode[0] & 0x80);
i = 0;
while (size)
{
if (sign)
result += ((bytecode[size - 1] ^ 0xFF) << (i++ * 8));
else
result += bytecode[size - 1] << (i++ * 8);
size--;
}
if (sign)
result = ~(result);
return (result);
}
This code is somewhat badly written, with lots of operations on a single line, and it therefore contains various potential bugs. It looks brittle.
bytecode[0] & 0x80 simply reads the MSB sign bit, assuming 2's complement or similar, then converts it to a boolean.
The loop iterates backwards from most significant byte to least significant.
If the sign was negative, the code will perform an XOR of the data byte with 0xFF. Basically inverting all bits in the data. The result of the XOR is an int.
The data byte (or the result of the above XOR) is then bit-shifted i * 8 bits to the left. The data is always implicitly promoted to int, so if the shifted result happens to be larger than INT_MAX, there's a fat undefined behavior bug here. It would be much safer practice to cast to uint32_t before the shift, carry out the shift, and then convert to a signed type afterwards.
The resulting int is converted to int32_t - these could be the same type or different types depending on system.
i is incremented by 1, size is decremented by 1.
If sign was negative, the int32_t is inverted to some 2's complement negative number that's sign-extended, and all the data bits are inverted once more. Except that all zeros that got shifted in with the left shift are also replaced by ones. Whether this is intentional or not, I cannot tell. So, for example, if you started with something like 0x0081 you now have something like 0xFFFF01FF. How that format makes sense, I have no idea.
My take is that the bytecode[size - 1] ^ 0xFF (which is equivalent to ~) was made to toggle the data bits, so that they would later toggle back to their original values when ~ is called later. A programmer has to document such tricks with comments, if they are anything close to competent.
Anyway, don't use this code. If the intention was merely to swap the byte order (endianness) of a 4-byte integer, then this code must be rewritten from scratch.
That's properly done as:
static int32_t big32_to_little32 (const uint8_t* bytes)
{
uint32_t result = (uint32_t)bytes[0] << 24 |
(uint32_t)bytes[1] << 16 |
(uint32_t)bytes[2] << 8 |
(uint32_t)bytes[3] << 0 ;
return (int32_t)result;
}
Anything more complicated than the above is highly questionable code. We need not worry about signs being a special case, the above code preserves the original signedness format.
So A ^ 0xFF toggles the bits in A: if you have 10101100 XORed with 11111111, it becomes 01010011. I am not sure why they didn't use ~ here. The ^ is the XOR operator, so you are XORing with 0xFF.
The << is a bitshift "up" or left. In other words, A<<1 is equivalent to multiplying A by 2.
The >> moves down, so it is equivalent to bit-shifting right, or dividing by 2.
The ~ inverts the bits in a byte.
Note that it's better to initialise variables at declaration; it costs no additional processing whatsoever to do it that way.
sign = (t_bool)(bytecode[0] & 0x80); the sign of the number is stored in the 8th bit (position 7 counting from 0), which is where the 0x80 comes from. So it's literally checking whether the sign bit is set in the first byte of bytecode, and if so it stores that in the sign variable.
Essentially, if the sign bit is not set, it copies the bytes from bytecode into result one byte at a time.
If the data is signed then it flips the bits then copies the bytes, then when it's done copying, it flips the bits back.
Personally, with this kind of thing I prefer to take the data, put it in htons() format (network byte order), and then memcpy it to an allocated array, storing it in an endian-agnostic way; when I retrieve the data I use ntohs() to convert it back to the format used by the computer. htons() and ntohs() are standard POSIX functions and are used all the time in networking and in platform-agnostic data formatting / storage / communication.
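A minimal sketch of that approach for one 16-bit value, assuming a POSIX system that provides <arpa/inet.h> (the buffer and variable names here are made up):

#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <arpa/inet.h>

int main(void)
{
    uint16_t host_value = 0xBEEF;
    uint8_t buffer[sizeof(uint16_t)];

    /* Sender side: convert to network byte order, then copy into the byte buffer. */
    uint16_t wire_value = htons(host_value);
    memcpy(buffer, &wire_value, sizeof wire_value);

    /* Receiver side: copy out of the buffer, then convert back to host order. */
    uint16_t received;
    memcpy(&received, buffer, sizeof received);
    printf("0x%X\n", (unsigned)ntohs(received));   /* prints 0xBEEF regardless of host endianness */
    return 0;
}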
This function is a very naive version of a function which converts from big endian to little endian.
The parameter size is not needed, as it works only with 4-byte data.
It can be achieved much more easily by union punning (and that allows compilers to optimize it, in this case down to a simple instruction):
#include <stdio.h>
#include <stdint.h>

#define SWAP(a,b,t) do{t c = (a); (a) = (b); (b) = c;}while(0)
int32_t my_bytecode_to_int32(const uint8_t *bytecode)
{
union
{
int32_t i32;
uint8_t b8[4];
}i32;
i32.b8[3] = *bytecode++;
i32.b8[2] = *bytecode++;
i32.b8[1] = *bytecode++;
i32.b8[0] = *bytecode++;
return i32.i32;
}
int main()
{
union {
int32_t i32;
uint8_t b8[4];
}i32;
i32.i32 = -4567;
SWAP(i32.b8[0], i32.b8[3], uint8_t);
SWAP(i32.b8[1], i32.b8[2], uint8_t);
printf("%d\n", bytecode_to_int32(i32.b8, 4));
i32.i32 = -34;
SWAP(i32.b8[0], i32.b8[3], uint8_t);
SWAP(i32.b8[1], i32.b8[2], uint8_t);
printf("%d\n", my_bytecode_to_int32(i32.b8));
}
https://godbolt.org/z/rb6Na5
If the purpose of the code is to sign-extend a 1-, 2-, 3-, or 4-byte sequence in network/big-endian byte order to a signed 32-bit int value, it's doing things the hard way and reimplementing the wheel along the way.
This can be broken down into a three-step process: convert the proper number of bytes to a 32-bit integer value, sign-extend bytes out to 32 bits, then convert that 32-bit value from big-endian to the host's byte order.
The "wheel" being reimplemented in this case is the POSIX-standard ntohl() function that converts a 32-bit unsigned integer value in big-endian/network byte order to the local host's native byte order.
The first step I'd do is to convert 1, 2, 3, or 4 bytes into a uint32_t:
#include <stdint.h>
#include <limits.h>
#include <arpa/inet.h>
#include <errno.h>
// convert the `size` number of bytes starting at the `bytecode` address
// to a uint32_t value
static uint32_t bytecode_to_uint32( const uint8_t *bytecode, size_t size )
{
uint32_t result = 0;
switch ( size )
{
case 4:
result = (uint32_t)bytecode[ size - 4 ] << 24; /* fall through */
case 3:
result += (uint32_t)bytecode[ size - 3 ] << 16; /* fall through */
case 2:
result += (uint32_t)bytecode[ size - 2 ] << 8; /* fall through */
case 1:
result += bytecode[ size - 1 ];
break;
default:
// error handling here
break;
}
return( result );
}
Then, sign-extend it (borrowing from this answer):
static uint32_t sign_extend_uint32( uint32_t in, size_t size )
{
if ( size == 4 )
{
return( in );
}
// being pedantic here - the existence of `[u]int32_t` pretty
// much ensures 8 bits/byte
size_t bits = size * CHAR_BIT;
uint32_t m = 1U << ( bits - 1 );
uint32_t result = ( in ^ m ) - m;
return ( result );
}
Put it all together:
static int32_t bytecode_to_int32( const uint8_t *bytecode, size_t size )
{
uint32_t result = bytecode_to_uint32( bytecode, size );
result = sign_extend_uint32( result, size );
// set endianness from network/big-endian to
// whatever this host's endianness is
result = ntohl( result );
// converting uint32_t here to signed int32_t
// can be subject to implementation-defined
// behavior
return( result );
}
Note that the conversion from uint32_t to int32_t implicitly performed by the return statement in the above code can result in implementation-defined behavior, as there can be uint32_t values that cannot be mapped to int32_t values. See this answer.
Any decent compiler should optimize that well into inline functions.
I personally think this also needs much better error handling/input validation.
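If that implementation-defined conversion is a concern, one fully defined way to finish the job is the same trick shown in the first answer on this page (a sketch, not part of this answer):

#include <stdint.h>

/* Map a uint32_t onto int32_t with no implementation-defined conversion,
   relying only on int32_t being 2's complement (which the standard guarantees). */
static int32_t uint32_to_int32(uint32_t u)
{
    return (u <= (uint32_t)INT32_MAX) ? (int32_t)u
                                      : INT32_MIN + (int32_t)(u & INT32_MAX);
}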

Store an int in a char buffer in C and then retrieve the same

I am writing a socket client-server application where the server needs to send a large buffer to a client and all buffers should be processed separately, so I want to put the buffer length in the buffer so that the client can read the length of data from the buffer and process accordingly.
To put the length value in, I need to split an integer value into individual bytes and store them in a buffer to be sent over the socket. I am able to break the integer into four parts, but when joining them I am not able to retrieve the correct value. To demonstrate my problem I have written a sample program where I split an int into four char variables and then join them back into another integer. The goal is that after joining I should get the same result.
Here is my small program.
#include <stdio.h>
int main ()
{
int inVal = 0, outVal =0;
char buf[5] = {0};
inVal = 67502978;
printf ("inVal: %d\n", inVal);
buf[0] = inVal & 0xff;
buf[1] = (inVal >> 8) & 0xff;
buf[2] = (inVal >> 16) & 0xff;
buf[3] = (inVal >> 24) & 0xff;
outVal = buf[3];
outVal = outVal << 8;
outVal |= buf[2];
outVal = outVal << 8;
outVal |= buf[1];
outVal = outVal << 8;
outVal |= buf[0];
printf ("outVal: %d\n",outVal);
return 0;
}
Output
inVal: 67502978
outVal: -126
What am I doing wrong?
One problem is that you are using bit-wise operators on signed numbers. This is always a bad idea and almost always incorrect. Please note that char has implementation-defined signedness, unlike int which is always signed.
Therefore you should replace int with uint32_t and char with uint8_t. With such unsigned types you eliminate the possibility of using bit shifts on negative numbers, which would be a bug. Similarly, if you shift data into the sign bits of a signed number, you will get bugs.
And needless to say, the code will not work if integers are not 4 bytes large.
Your method has potential implementation defined behavior as well as undefined behavior:
storing values into an array of type char beyond the range of type char has implementation-defined behavior: buf[0] = inVal & 0xff; and the next 3 statements (inVal & 0xff might be larger than CHAR_MAX if char is signed by default).
left-shifting negative values invokes undefined behavior: if any of the bytes buf[1], buf[2] or buf[3] becomes negative as the implementation-defined result of storing a value larger than CHAR_MAX into it, the resulting outVal becomes negative, and left-shifting it is undefined.
In your specific example, your architecture uses 2's complement representation for negative values and the type char is signed. The value stored into buf[0] is 67502978 & 0xff = 130, which becomes -126. The last statement outVal |= buf[0]; sets bits 7 through 31 of outVal and the result is -126.
You can avoid these issues by using an array of unsigned char and values of type unsigned int:
#include <stdio.h>
int main(void) {
unsigned int inVal = 0, outVal = 0;
unsigned char buf[4] = { 0 };
inVal = 67502978;
printf("inVal: %u\n", inVal);
buf[0] = inVal & 0xff;
buf[1] = (inVal >> 8) & 0xff;
buf[2] = (inVal >> 16) & 0xff;
buf[3] = (inVal >> 24) & 0xff;
outVal = buf[3];
outVal <<= 8;
outVal |= buf[2];
outVal <<= 8;
outVal |= buf[1];
outVal <<= 8;
outVal |= buf[0];
printf("outVal: %u\n", outVal);
return 0;
}
Note that the above code still assumes 32-bit ints.
While bit shifts of signed values can be a problem, this is not the case here (all left-hand values are positive, and all results are within the range of a 32-bit int).
The problematic expression with somewhat unintuitive semantics is the last bitwise OR:
outVal |= buf[0];
buf[0] is a (on your and my architecture) signed char with the value -126, simply because the most significant bit in the least significant byte of 67502978 is set. In C all operands in an arithmetic expression are subject to the arithmetic conversions. Specifically, they undergo integer promotion which states: "If an int can represent all values of the original type [...], the value is converted to an int". Accordingly, the signed character buf[0] is converted to a (signed) int, preserving its value of -126. A negative signed int has the sign bit set. ORing that with another signed int sets the result's sign bit as well, making that value negative. That is exactly what we are seeing.
Making the bytes unsigned chars fixes the issue because the value of the temporary integer to which the unsigned char is converted is then a simple 8 bit value of 130.
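A minimal demonstration of that promotion difference, using the values from the question (assuming plain char is signed, as on the asker's platform):

#include <stdio.h>

int main(void)
{
    signed char   s = -126;       /* what buf[0] holds when char is signed */
    unsigned char u = 130;        /* the same bit pattern, but unsigned */
    int acc = 0x04060300;         /* outVal just before the final OR (67502978 with its low byte cleared) */

    printf("%d\n", acc | s);      /* s promotes to int -126 (0xFFFFFF82): prints -126 */
    printf("%d\n", acc | u);      /* u promotes to int 130: prints 67502978 */
    return 0;
}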
The C++ standard draft N3936 says this about the shift operators:
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled.
If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type.
Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
So, to avoid undefined behaviour, it is recommended to use unsigned data types and to make sure the data type is wide enough to hold the shifted values.
Use unsigned char buf[5] = {0}; and unsigned int for inVal and outVal, and it should work.
When using signed integral types, there arise two sorts of problems:
First, if buf[3] is negative, then due to outVal = buf[3] the variable outVal becomes negative; subsequent bit-shift operations on outVal are then undefined behaviour (see cppreference.com on bit-shift operators):
For signed and positive a, the value of a << b is a × 2^b if it is representable in the return type, otherwise the behavior is undefined (until C++14); the value of a << b is a × 2^b if it is representable in the unsigned version of the return type (which is then converted to signed: this makes it legal to create INT_MIN as 1 << 31), otherwise the behavior is undefined (since C++14).
For negative a, the behavior of a << b is undefined.
Note that with the OP's inVal = 67502978 this does not occur, since buf[3] = 4; but for other inVal values it may occur and then may bring problems due to undefined behaviour.
The second problem is that with operation outVal |= buf[0] with buf[0]=-126, the value (char)-126, which in binary format is 10000010, is converted to (int)-126, which in binary format is 11111111111111111111111110000010 before operator |= is applied, and this then will fill up outVal with a lot of 1-bits. The reason for conversion is defined at conversion rules for arithmetic operations (cppreference.com):
If both operands are signed or both are unsigned, the operand with lesser conversion rank is converted to the operand with the greater integer conversion rank.
So the problem in the OP's case is actually not due to any undefined behaviour, but due to the character buf[0] having a negative value, which is converted to int before the |= operation.
Note, however, that if either buf[2] or buf[1] had been negative, this would have made outVal negative and would have led to undefined behaviour on the subsequent shift operations, too.
This may be a terrible idea but I'll post it here for interest - you can use a union:
union my_data
{
uint32_t one_int;
struct
{
uint8_t byte3;
uint8_t byte2;
uint8_t byte1;
uint8_t byte0;
}bytes;
};
// Your original code modified to use union my_data
#include <stdio.h>
#include <stdint.h>
int main(void) {
union my_data data;
uint32_t inVal = 0, outVal = 0;
uint8_t buf[4] = {0};
inVal = 67502978;
printf("inVal: %u\n", inVal);
data.one_int = inVal;
// Populate bytes into buff
buf[3] = data.bytes.byte3;
buf[2] = data.bytes.byte2;
buf[1] = data.bytes.byte1;
buf[0] = data.bytes.byte0;
return 0;
}
I don't know if this would also work, can't see why not:
union my_data
{
uint32_t one_int;
uint8_t bytes[4];
};
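A quick hypothetical check of that variant (which bytes[] index holds which byte depends on the machine's endianness):

#include <stdint.h>
#include <stdio.h>

union my_data
{
    uint32_t one_int;
    uint8_t bytes[4];
};

int main(void)
{
    union my_data d;
    d.one_int = 67502978;                 /* 0x04060382 */
    printf("%02X %02X %02X %02X\n",
           d.bytes[0], d.bytes[1], d.bytes[2], d.bytes[3]);
    /* A little-endian host prints "82 03 06 04"; a big-endian host prints "04 06 03 82". */
    return 0;
}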
Because of endian differences between architectures, it is best practice to convert numeric values to network order, which is big-endian. On receipt, they can then be converted to the native host order. We can do this in a portable way by using htonl() (host to network "long" = uint32_t), and convert to host order on receipt with ntohl(). Example:
#include <stdio.h>
#include <arpa/inet.h>
int main(int argc, char **argv) {
uint32_t inval = 67502978, outval, backinval;
outval = htonl(inval);
printf("outval: %d\n", outval);
backinval = ntohl(outval);
printf("backinval: %d\n", backinval);
return 0;
}
This gives the following result on my 64 bit x86 which is little endian:
$ gcc -Wall example.c
$ ./a.out
outval: -2113731068
backinval: 67502978
$
