char[] to uint32_t not working properly - c

I am using the following to convert a char[4] to a uint32_t.
frameSize = (uint32_t)(frameSizeBytes[0] << 24) | (frameSizeBytes[1] << 16) | (frameSizeBytes[2] << 8) | frameSizeBytes[3];
frameSize is a uint32_t variable, and frameSizeBytes is a char[4] array. When the array contains, for example, the following values (in hex)
00 00 02 7b
frameSize is set to 635, which is the correct value. This method also works for other combinations of bytes, with the exception of the following
00 00 9e ba
for this case, frameSize is set to 4294967226, which, according to this website, is incorrect, as it should be 40634 instead. Why is this behavior happening?

Your char type is signed in your specific implementation and undergoes integer promotion with most operators. Use a cast to unsigned char where the signed array elements are used.
EDIT: as pointed out by Olaf in the comments, you should prefer casts to unsigned int (assuming a common 32-bit unsigned int) or uint32_t, to avoid potential undefined behavior in the << 24 shift operation.
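For reference, a minimal corrected version of the expression from the question (assuming frameSizeBytes is the char[4] described above):
/* Convert each (possibly signed) char to unsigned char first, then widen to
   uint32_t before shifting, so nothing is sign-extended and << 24 cannot
   overflow a signed int. */
frameSize = ((uint32_t)(unsigned char)frameSizeBytes[0] << 24) |
            ((uint32_t)(unsigned char)frameSizeBytes[1] << 16) |
            ((uint32_t)(unsigned char)frameSizeBytes[2] << 8)  |
             (uint32_t)(unsigned char)frameSizeBytes[3];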

To keep things tidy I'd suggest an inline function along the lines of:
static inline uint32_t be_to_uint32(void const *ptr)
{
    unsigned char const *p = ptr;
    return p[0] * 0x1000000ul + p[1] * 0x10000 + p[2] * 0x100 + p[3];
}
Note: by using an unsigned long constant this code avoids the problem of unsigned char being promoted to signed int and then having the multiplication/shift cause integer overflow (an annoying historical feature of C). Of course you could also use ((uint32_t)p[0]) << 24 as suggested by Olaf.
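Usage with the question's variables would then look like this (my example, assuming frameSizeBytes is the char[4] from the question):
frameSize = be_to_uint32(frameSizeBytes);  /* 00 00 9e ba -> 40634 */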

Related

Cast char subarray to integer

unsigned int b;
unsigned char a[] =
{0x00,0x00,0x00,0x12,0x00,0x00,0x81,0x03,0x00,0x00,0x00,0x00,0x01,0x91,0x01,0x01,0xb1,0x04,0x47,0x86,0x8f,0xf8,0x00};
I'm a newbie in C programming.
I need to take the 4-byte subarray starting at a[18], which is 0x47868ff8,
and convert it into the corresponding decimal integer: 1200001016.
I tried memcpy(&b, a+18, 4), but it does not seem to work.
Could anyone give me some hints to work this out?
And if I want to read a char pointer message and convert every 4 bytes, in order, into an integer array,
what is the best way to do that? Thanks.
Copying like that has implementation-defined behavior, and you'll get different results depending on the endianness of the CPU.
To do it portably you can use bitwise operations.
b = (unsigned int)a[18] << 24 | (unsigned int)a[19] << 16 | (unsigned int)a[20] << 8 | a[21];
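For the second part of the question, a small sketch (my illustration, not part of the original answer) that applies the same portable bitwise approach to every 4-byte group of a buffer; the function name and parameters are made up for the example:
#include <stddef.h>

/* Interpret buf as consecutive big-endian 4-byte groups and store the
   resulting values in out; assumes len is a multiple of 4. */
static void unpack_be32_array(const unsigned char *buf, size_t len, unsigned int *out)
{
    for (size_t i = 0; i + 3 < len; i += 4) {
        out[i / 4] = (unsigned int)buf[i] << 24 |
                     (unsigned int)buf[i + 1] << 16 |
                     (unsigned int)buf[i + 2] << 8 |
                     buf[i + 3];
    }
}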

Problem of converting byte order for unsigned 64-bit number in C

I am playing with little-endian/big-endian conversion and found something that is a bit confusing but also interesting.
In the first example, there is no problem using bit shifts to convert the byte order of a uint32_t. It basically treats a uint32_t integer as an array of uint8_t, accesses each byte, and bit-shifts.
Example #1:
uint32_t htonl(uint32_t x)
{
    uint8_t *s = (uint8_t*)&x;
    return (uint32_t)(s[0] << 24 | s[1] << 16 | s[2] << 8 | s[3]);
}
However, if I try to do something similar on a uint64_t, the compiler throws a warning that the width of s[0] is less than 56 bits, as in Example #2 below.
Example #2:
uint64_t htonl(uint64_t x)
{
    uint8_t *s = (uint8_t*)&x;
    return (uint64_t)(s[0] << 56 ......);
}
To make it work, I have to fetch each byte into a uint64_t so that I can do the bit shifts without any errors, as in Example #3 below.
Example #3:
uint64_t htonll2(uint64_t x)
{
    uint64_t byte1 = x & 0xff00000000000000;
    uint64_t byte2 = x & 0x00ff000000000000;
    uint64_t byte3 = x & 0x0000ff0000000000;
    uint64_t byte4 = x & 0x000000ff00000000;
    uint64_t byte5 = x & 0x00000000ff000000;
    uint64_t byte6 = x & 0x0000000000ff0000;
    uint64_t byte7 = x & 0x000000000000ff00;
    uint64_t byte8 = x & 0x00000000000000ff;
    return (uint64_t)(byte1 >> 56 | byte2 >> 40 | byte3 >> 24 | byte4 >> 8 |
                      byte5 << 8 | byte6 << 24 | byte7 << 40 | byte8 << 56);
}
I am a little bit confused by Example #1 and Example #2: as far as I understand, s[i] has the same uint8_t size in both, but somehow shifting by 32 bits or less is no problem at all, while shifting by something like 56 bits is. I am running this program on Ubuntu with GCC 8.3.0.
Does the compiler implicitly convert s[i] into a 32-bit number in this case? sizeof(s[0]) is 1 when I add a debug message for it.
Values with a type smaller than int are promoted to int when used in an expression. Assuming int is 32 bits on your platform, this works in most cases when converting a 32-bit value. The case where it won't work is when you shift a 1 bit into the sign bit.
In the 64-bit case this means you're attempting to shift a value by more than its bit length, which is undefined behavior.
You need to cast each byte to a uint64_t in both cases to allow the shifts to work properly.
The s[0] expression has an 8-bit wide integral type, which is promoted to a 32-bit int when operated on by the shift operator – so s[0] << 24 in the first example works OK, as the shift by 24 does not exceed the int width.
OTOH the shift by 56 bits moves data outside the result's width, as the offset exceeds the width of int, so it certainly causes a loss of information, hence the warning.
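For completeness, a minimal corrected sketch of Example #2 (not from the original answers): each byte is cast to uint64_t before shifting, so no shift exceeds the width of its operand.
#include <stdint.h>

uint64_t htonll_fixed(uint64_t x)
{
    uint8_t *s = (uint8_t *)&x;
    /* Each operand is widened to 64 bits before the shift. */
    return (uint64_t)s[0] << 56 | (uint64_t)s[1] << 48 |
           (uint64_t)s[2] << 40 | (uint64_t)s[3] << 32 |
           (uint64_t)s[4] << 24 | (uint64_t)s[5] << 16 |
           (uint64_t)s[6] << 8  | (uint64_t)s[7];
}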

build int32_t from 4 uint8_t values

I have a function which constructs an int32_t from 4 uint8_t values, and I use the following test to gain some confidence that my results are what I expect, because I'm depending on (what I believe is) implementation-defined behaviour.
Does what I'm doing make sense?
Are there better ways to construct the int32_t?
int32_t expected = -1;
int32_t res = 0;
uint8_t b0 = 0xFF;
uint8_t b1 = 0xFF;
uint8_t b2 = 0xFF;
uint8_t b3 = 0xFF;
res |= b0;
res |= b1 << 8;
res |= b2 << 16;
/* This is IDB, this value cannot be represented in int32_t */
res |= ((uint32_t) b3) << 24;
ck_assert(res == expected);
This is pretty much the best way, except you should have a cast to uint32_t on every line, to avoid implicit conversions and signedness issues.
The main concern here is that performing bit-wise operations on signed operands tends to invoke poorly-defined behavior. In particular, look out for this:
Left-shifting a bit into the sign bit of a signed number invokes undefined behavior. This includes shifting more bits than the type can hold.
Left-shifting a signed variable which has a negative value invokes undefined behavior.
Right-shifting a signed variable which has a negative value invokes implementation-defined behavior (you end up with either arithmetic or logical shift).
These concerns can all be avoided by making sure that you always shift using an unsigned type, like a uint32_t. For example:
res |= (int32_t)((uint32_t)b1 << 8);
The above code is rugged and good practice, since it doesn't contain any implicit promotions.
Endianness is no concern in this case since you use bit shifts. Had you used any other method (type punning, pointer arithmetic, etc.) it would have been a concern.
Signedness format is no concern in this case. The stdint.h types are guaranteed to use 2's complement. They will never use 1's complement or sign & magnitude.
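For reference, a minimal sketch of the question's test rewritten with the casts suggested in this answer (ck_assert comes from the Check framework used in the question):
int32_t expected = -1;
int32_t res = 0;
uint8_t b0 = 0xFF, b1 = 0xFF, b2 = 0xFF, b3 = 0xFF;
res |= (int32_t)((uint32_t)b0);
res |= (int32_t)((uint32_t)b1 << 8);
res |= (int32_t)((uint32_t)b2 << 16);
/* The shift itself is now well-defined; converting the out-of-range result
   back to int32_t is still the implementation-defined step discussed in the
   question (and addressed by the final answer below). */
res |= (int32_t)((uint32_t)b3 << 24);
ck_assert(res == expected);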
typedef union
{
    uint32_t u32;
    int32_t i32;
    float f;
    uint16_t u16[2];
    int16_t i16[2];
    uint8_t u8[4];
    int8_t i8[4];
    char c[4];
} any32;
I keep that in my back pocket for all of my embedded-system projects. Aside from needing to understand the endianness of your system, you can build the 32-bit values rather easily from 8-bit pieces. This is very useful if you are shuttling bytes out over a serial line, I2C, or SPI. It's also useful if you are working with 8.24 (or 16.16 or 24.8) fixed-point math. I generally supplement this with some #defines to help with any endian headaches:
//!\todo add 16-bit boundary endian-ness options
#if (__LITTLE_ENDIAN)
#define FP_824_INTEGER (3)
#define FP_824_FRAC_HI (2)
#define FP_824_FRAC_MID (1)
#define FP_824_FRAC_LOW (0)
#elif (__BIG_ENDIAN)
#define FP_824_INTEGER (0)
#define FP_824_FRAC_HI (1)
#define FP_824_FRAC_MID (2)
#define FP_824_FRAC_LOW (3)
#else
#error undefined endian implementation
#endif
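A small usage sketch (my illustration, not part of the original answer), assuming a little-endian target and the any32 typedef and macros above:
any32 fp;
fp.u8[FP_824_INTEGER]  = 0x03; /* integer part: 3  */
fp.u8[FP_824_FRAC_HI]  = 0x80; /* fraction: 0.5    */
fp.u8[FP_824_FRAC_MID] = 0x00;
fp.u8[FP_824_FRAC_LOW] = 0x00;
/* fp.u32 now holds 0x03800000, i.e. 3.5 in 8.24 fixed point. */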
If the implementation supports int32_t, uint32_t, and uint8_t, the following is guaranteed neither to produce an implementation-defined value nor to raise an implementation-defined signal (see C11 §6.3.1.3):
#include <stdint.h>

void foo(void)
{
    uint8_t b0 = 0xef;
    uint8_t b1 = 0xbe;
    uint8_t b2 = 0xad;
    uint8_t b3 = 0xde;
    uint32_t ures = b0 | ((uint32_t)b1 << 8) |
                    ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
    // Avoid implementation-defined value or signal...
    int32_t sres = (ures < (uint32_t)INT32_MIN) ?
                   (int32_t)ures : INT32_MIN + (int32_t)(ures & INT32_MAX);
}
The fixed-width signed integer types are guaranteed to have a 2's complement representation, but have no specific rules for converting out-of-range values (unlike the unsigned fixed-width integer types).
EDIT: Perhaps (ures <= INT32_MAX) would be more intuitive than (ures < (uint32_t)INT32_MIN) in the above code.

Combine 3 unsigned chars into an unsigned int

I have a requirement for an unsigned 3-byte type in C. I am looking for a way to pack the 3 bytes into a single unsigned int.
Is this safe, or do they need to be stored inside an array/structure because of the 24-bit size?
unsigned int pack_3Byte(unsigned char b1, unsigned char b2, unsigned char b3)
{
    return (b1 << 16) | (b2 << 8) | (b3);
}
Your code is correct, but as Olaf says, you should use the types uint8_t and uint32_t to ensure that your types are really the width you expect them to be.
This may not be a problem right now, but you should also be aware that the bytes in an integer are stored in different order on different processors. This is called endianness.
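A minimal sketch of the same function using the fixed-width types suggested above (my illustration, not from the original answer):
#include <stdint.h>

uint32_t pack_3Byte_fixed(uint8_t b1, uint8_t b2, uint8_t b3)
{
    /* Widen before shifting so the arithmetic happens in uint32_t. */
    return ((uint32_t)b1 << 16) | ((uint32_t)b2 << 8) | (uint32_t)b3;
}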

Store a signed long int (32-bit) as 4 octets?

I managed to get an unsigned long int into a big-endian octet representation by reading IPv4 methods, and I read about how signed integers use the MSB as the sign indicator, which makes 00 00 00 00 equal to 0, while 7F FF FF FF is 2147483647.
But I can't figure out how to do the same for signed long integers.
#include <stdio.h>
#include <string.h>

int main (void)
{
    unsigned long int intu32;
    unsigned char octets[4];
    intu32 = 255;
    octets[3] = (intu32) & 255;
    octets[2] = (intu32 >> 8) & 255;
    octets[1] = (intu32 >> 16) & 255;
    octets[0] = (intu32 >> 24) & 255;
    printf("(%d)(%d)(%d)(%d)\n", octets[0], octets[1], octets[2], octets[3]);
    intu32 = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3];
    printf("intu32:%lu\n", intu32);
    return 0;
}
Thanks in advance,
Doori bar
There is no difference. You can always serialize/deserialize signed integers as if they are unsigned, the difference is only in the interpretation of the bits, not in the bits themselves.
Of course, this only holds true if you know that the unsigned and signed integers are of the same size, so that no bits get lost.
Also, you need to be careful (as you are) that no intermediate stage does any unplanned sign extension or the like; the use of unsigned char for individual bytes is a good idea.
You may be confused because it is common practice (and used by x86 processors) to encode negative values using two's complement. This means that the hex notation of a 4-byte -1 is 0xffffffff. The reason this encoding is used is that, thanks to automatic wrap-around, adding 2 (0x00000002) to -1 (0xffffffff) yields the correct result (0x00000001).
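To make the first answer concrete, here is a minimal sketch (my illustration, using the fixed-width types) that serializes and deserializes a signed 32-bit value by treating its bits as unsigned:
#include <stdint.h>

/* Serialize: convert to uint32_t (keeps the two's complement bit pattern),
   then emit big-endian octets exactly as in the unsigned case. */
void int32_to_octets(int32_t value, unsigned char octets[4])
{
    uint32_t u = (uint32_t)value;
    octets[0] = (u >> 24) & 255;
    octets[1] = (u >> 16) & 255;
    octets[2] = (u >> 8) & 255;
    octets[3] = u & 255;
}

/* Deserialize: rebuild the uint32_t, then reinterpret it as int32_t. The
   final conversion is implementation-defined for values above INT32_MAX,
   but yields the original value on two's complement implementations. */
int32_t octets_to_int32(const unsigned char octets[4])
{
    uint32_t u = ((uint32_t)octets[0] << 24) | ((uint32_t)octets[1] << 16) |
                 ((uint32_t)octets[2] << 8) | (uint32_t)octets[3];
    return (int32_t)u;
}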
Do you want something like this? It would be helpful (as Vicki asked) if you could provide what you have and what you want to get.
#include <stdio.h>
#include <string.h>

int main (void)
{
    union {
        long int intu32;
        char octets[4];
    } u;
    u.intu32 = 255;
    printf("(%d)(%d)(%d)(%d)\n", (int) u.octets[3], (int) u.octets[2], (int) u.octets[1], (int) u.octets[0]);
    printf("intu32:%ld\n", u.intu32);
    return 0;
}
