Converting byte array to integer - C

I have a 4-byte array (data) of type uint8_t, which represents a speed value. I'm trying to cast this array to a uint32_t integer (speed), multiply that speed by 10, and then store it back into the 4-byte array (data). The data format is clear in the code below.
I always get the error:
"assignment to expression with array type"
The code:
volatile uint8_t data[4] = {0x00 , 0x00, 0x00, 0x00};
volatile uint32_t speed;
speed=( uint32_t)*data;
speed=speed*10;
data=(uint8_t*)speed;

To be portable and safe with respect to endianness, you should reconstruct your data byte by byte:
speed = ((uint32_t)data[0]) << 24
      | ((uint32_t)data[1]) << 16
      | ((uint32_t)data[2]) << 8
      | ((uint32_t)data[3]);
or
speed = ((uint32_t)data[3]) << 24
      | ((uint32_t)data[2]) << 16
      | ((uint32_t)data[1]) << 8
      | ((uint32_t)data[0]);
Choose the version that matches the position of the most significant byte in your data.
You get an "assignment to expression with array type" error because you can't assign directly an array: data=(uint8_t*)speed; is totally forbidden in C, you definitively can't have an array for lvalue. You have to do inverse operation:
data[0] = (uint8_t)((speed >> 24) & 0x00FF);
data[1] = (uint8_t)((speed >> 16) & 0x00FF);
data[2] = (uint8_t)((speed >> 8) & 0x00FF);
data[3] = (uint8_t)(speed & 0x00FF);
or, depending on the position of the most significant byte:
data[3] = (uint8_t)((speed >> 24) & 0x00FF);
data[2] = (uint8_t)((speed >> 16) & 0x00FF);
data[1] = (uint8_t)((speed >> 8) & 0x00FF);
data[0] = (uint8_t)(speed & 0x00FF);
EDIT
Don't use a pointer cast (or memcpy) as mentioned in the comments and in the original answer: in addition to the portability issues, you can run into trouble with alignment restrictions and aliasing rules, where on some platforms the compiler can generate incorrect code (thanks to user694733 and to Lundin).
speed = *((uint32_t *)data); // DANGEROUS NEVER USE IT
*((uint32_t *)data) = speed; // DANGEROUS NEVER USE IT

Your code doesn't work because in data=(uint8_t*)speed; the array data is not a valid lvalue: an array type cannot be assigned to or used in any form of arithmetic. Similarly, speed=(uint32_t)*data; is a bug, because that only reads the first item in the array.
The only correct way to do this:
volatile uint8_t data[4] = {0x00 , 0x00, 0x00, 0x00};
volatile uint32_t speed;
speed = (uint32_t)data[0] << 24 |
        (uint32_t)data[1] << 16 |
        (uint32_t)data[2] << 8  |
        (uint32_t)data[3] << 0;
speed=speed*10;
data[0] = (uint8_t) ((speed >> 24) & 0xFFu);
data[1] = (uint8_t) ((speed >> 16) & 0xFFu);
data[2] = (uint8_t) ((speed >> 8) & 0xFFu);
data[3] = (uint8_t) ((speed >> 0) & 0xFFu);
This is 100% portable and well-defined code. No implicit promotions take place. This code does not rely on endianness or other implementation-defined behavior. Why write code that does, when you can write code that doesn't?
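For completeness, here is a minimal, runnable round-trip sketch of the code above (my assumptions: the wire format is big-endian, a test value of 300, and volatile dropped for brevity):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t data[4] = {0x00, 0x00, 0x01, 0x2C};   /* 300 in big-endian */
    uint32_t speed = (uint32_t)data[0] << 24 |
                     (uint32_t)data[1] << 16 |
                     (uint32_t)data[2] << 8  |
                     (uint32_t)data[3] << 0;

    speed = speed * 10;                           /* 3000 == 0x00000BB8 */

    data[0] = (uint8_t)((speed >> 24) & 0xFFu);
    data[1] = (uint8_t)((speed >> 16) & 0xFFu);
    data[2] = (uint8_t)((speed >> 8) & 0xFFu);
    data[3] = (uint8_t)((speed >> 0) & 0xFFu);

    /* prints 00 00 0B B8 */
    printf("%02X %02X %02X %02X\n", (unsigned)data[0], (unsigned)data[1],
           (unsigned)data[2], (unsigned)data[3]);
    return 0;
}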

Behaviour of Type promotion in C (from lower signed to higher unsigned)

I have some undefined behaviour in a seemingly innocuous function which is parsing a double value from a buffer. I read the double in two halves, because I am reasonably certain the language standard says that shifting char values is only valid in a 32-bit context.
inline double ReadLittleEndianDouble( const unsigned char *buf )
{
uint64_t lo = (buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0];
uint64_t hi = (buf[7] << 24) | (buf[6] << 16) | (buf[5] << 8) | buf[4];
uint64_t val = (hi << 32) | lo;
return *(double*)&val;
}
Since I am storing 32-bit values into 64-bit variables lo and hi, I reasonably expect that the high-order 32-bits of these variables will always be 0x00000000. But sometimes they contain 0xffffffff or other non-zero rubbish.
The fix is to mask it like this:
uint64_t val = ((hi & 0xffffffffULL) << 32) | (lo & 0xffffffffULL);
Alternatively, it seems to work if I mask during the assignment instead:
uint64_t lo = ((buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0]) & 0xffffffff;
uint64_t hi = ((buf[7] << 24) | (buf[6] << 16) | (buf[5] << 8) | buf[4]) & 0xffffffff;
I would like to know why this is necessary. All I can think of to explain this is that my compiler is doing all the shifting and combining for lo and hi directly on 64-bit registers, and I might expect undefined behaviour in the high-order 32-bits if this is the case.
Can someone please confirm my suspicions or otherwise explain what is happening here, and comment on which (if any) of my two solutions is preferable?
If you try to shift a char or unsigned char you're leaving yourself at the mercy of the standard integer promotions. You're better off casting the values yourself, before you try to shift them. You don't have to separate the lower and upper halves if you do so.
inline double ReadLittleEndianDouble( const unsigned char *buf )
{
    uint64_t val = ((uint64_t)buf[7] << 56) | ((uint64_t)buf[6] << 48) | ((uint64_t)buf[5] << 40) | ((uint64_t)buf[4] << 32) |
                   ((uint64_t)buf[3] << 24) | ((uint64_t)buf[2] << 16) | ((uint64_t)buf[1] << 8) | (uint64_t)buf[0];
    double d;
    memcpy(&d, &val, sizeof d);   /* needs <string.h>; avoids the strict-aliasing cast */
    return d;
}
All this is necessary only if the CPU is big-endian or if the buffer might not be properly aligned for the CPU architecture; otherwise you can simplify this greatly (with the same aliasing caveat as above):
return *(double*)buf;
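To make the promotion visible, here is a minimal sketch (the test value 0x80 is my own) showing where the 0xffffffff rubbish comes from: buf[n] is promoted to (signed) int before the shift, and a negative int sign-extends when converted to uint64_t.

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    unsigned char b = 0x80;
    uint64_t bad  = b << 24;           /* b promotes to int; shifting a 1 into the sign
                                          bit is undefined, and on typical two's-complement
                                          machines the negative int sign-extends here */
    uint64_t good = (uint64_t)b << 24; /* shift performed in uint64_t: well defined */

    printf("bad  = %016" PRIx64 "\n", bad);   /* typically ffffffff80000000 */
    printf("good = %016" PRIx64 "\n", good);  /* 0000000080000000 */
    return 0;
}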

what does a[0] = addr & 0xff?

I'm currently learning from the book "The Shellcoder's Handbook". I have a strong understanding of C, but recently I came across a piece of code that I can't grasp.
Here is the piece of code:
char a[4];
unsigned int addr = 0x0806d3b0;
a[0] = addr & 0xff;
a[1] = (addr & 0xff00) >> 8;
a[2] = (addr & 0xff0000) >> 16;
a[3] = (addr) >> 24;
So the question is: what does this do? What is addr & 0xff (and the three lines below it), and what does >> 8 do to it (I know that it divides by 2 eight times)?
The variable addr is 32 bits of data, while each element in the array a is 8 bits. What the code does is copy the 32 bits of addr into the array a, one byte at a time.
Let's take this line:
a[1] = (addr & 0xff00) >> 8;
And then go through it step by step.
addr & 0xff00 — this gets bits 8 to 15 of the value in addr; the result after the operation is 0x0000d300.
>> 8 — this shifts the bits to the right, so 0x0000d300 becomes 0x000000d3.
The resulting value of the mask and shift is assigned to a[1].
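If it helps, here is a minimal sketch (a hypothetical standalone program) that prints both intermediate steps:

#include <stdio.h>

int main(void)
{
    unsigned int addr = 0x0806d3b0;
    unsigned int masked  = addr & 0xff00;   /* 0x0000d300 */
    unsigned int shifted = masked >> 8;     /* 0x000000d3 */

    printf("masked  = 0x%08x\n", masked);
    printf("shifted = 0x%08x\n", shifted);
    return 0;
}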
The code is trying to enforce endianness on the data input. Specifically, it is trying to enforce little-endian behavior on the data. Here is the explanation:
a[0] = addr & 0xff; /* gets the LSB 0xb0 */
a[1] = (addr & 0xff00) >> 8; /* gets the 2nd LSB 0xd3 */
a[2] = (addr & 0xff0000) >> 16; /* gets 2nd MSB 0x06 */
a[3] = (addr) >> 24; /* gets the MSB 0x08 */
So basically, the code is masking and separating out every byte of data and storing it in the array "a" in the little endian format.
unsigned char a[4]; /* I think using unsigned char is better in this case */
unsigned int addr = 0x0806d3b0;
a[0] = addr & 0xff; /* get the least significant byte 0xb0 */
a[1] = (addr & 0xff00) >> 8; /* get the second least significant byte 0xd3 */
a[2] = (addr & 0xff0000) >> 16; /* get the second most significant byte 0x06 */
a[3] = (addr) >> 24; /* get the most significant byte 0x08 */
Apparently, the code isolates the individual bytes from addr and stores them in the array a so they can be indexed. The first line
a[0] = addr & 0xff;
masks out the lowest-valued byte by using 0xff as a bit mask; the subsequent lines do the same, but in addition shift the result to the rightmost position. Finally, in the last line
a[3] = (addr) >> 24;
no masking is necessary anymore, as all unnecessary information is discarded by the shift.
The code is effectively storing a 32-bit address in an array of 4 chars. As you may know, a char is one byte (8 bits). The code copies the first byte of the address, then shifts, copies the second byte, then shifts, and so on. You get the gist.
It enforces endianness, and stores the integer in little-endian format in a.
See the illustration on Wikipedia.
Also, why not visualize the bit-shifting results:
#include <stdio.h>

int main(void)
{
    char a[4];
    unsigned int addr = 0x0806d3b0;

    a[0] = addr & 0xff;
    a[1] = (addr & 0xff00) >> 8;
    a[2] = (addr & 0xff0000) >> 16;
    a[3] = (addr) >> 24;

    for (int i = 0; i < 4; i++)
    {
        printf("a[%d] = %02x\t", i, (unsigned char)a[i]);
    }
    printf("\n");
    return 0;
}
Output:
a[0] = b0 a[1] = d3 a[2] = 06 a[3] = 08
In addition to the multiple answers given, the code has some flaws that need to be fixed to make it portable. In particular, the char type is very dangerous to use for storing values, because of its implementation-defined signedness. A very classic C bug. If the code was taken from a book, then you should read that book sceptically.
While we are at it, we can also tidy up the code, make it overly explicit to avoid potential future maintenance bugs, remove some implicit type promotions of integer literals, etc.
#include <stdint.h>
uint8_t a[4];
uint32_t addr = 0x0806d3b0UL;
a[0] = addr & 0xFFu;
a[1] = (addr >> 8) & 0xFFu;
a[2] = (addr >> 16) & 0xFFu;
a[3] = (addr >> 24) & 0xFFu;
The masks & 0xFFu are strictly speaking not needed, but they might save you from some false positive compiler warnings about wrong integer types. Alternatively, each shift result could be cast to uint8_t and that would have been fine too.
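To convince yourself that the bytes round-trip, here is a minimal sketch (the reassembly is my own addition) that rebuilds addr from a:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint8_t a[4];
    uint32_t addr = 0x0806d3b0UL;

    a[0] = addr & 0xFFu;
    a[1] = (addr >> 8) & 0xFFu;
    a[2] = (addr >> 16) & 0xFFu;
    a[3] = (addr >> 24) & 0xFFu;

    /* reassemble in the same order: a[0] is the least significant byte */
    uint32_t back = (uint32_t)a[0]
                  | (uint32_t)a[1] << 8
                  | (uint32_t)a[2] << 16
                  | (uint32_t)a[3] << 24;

    printf("back = 0x%08" PRIx32 "\n", back);   /* 0x0806d3b0 */
    return 0;
}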

Changing endianness on 3 byte integer

I am receiving a 3-byte integer, which I'm storing in an array. For now, assume the array is unsigned char myarray[3]
Normally, I would convert this into a standard int using:
int mynum = ((myarray[2] << 16) | (myarray[1] << 8) | (myarray[0]));
However, before I can do this, I need to convert the data from network to host byte ordering.
So, I change the above to this (the bytes come in as 0-1-2, but converting network to host order I want 0-2-1):
int mynum = ((myarray[1] << 16) | (myarray[2] << 8) | (myarray[0]));
However, this does not seem to work. For the life of me, I can't figure this out. I've looked at it so much that at this point I think I'm fried and just confusing myself. Is what I am doing correct? Is there a better way? Would the following work?
int mynum = ((myarray[2] << 16) | (myarray[1] << 8) | (myarray[0]));
int correctnum = ntohl(mynum);
Here's an alternate idea: why not just make it structured, and make explicit what you're doing? Some of the confusion you're having may be rooted in the "I'm storing it in an array" premise. If instead you defined
#include <stdint.h>

typedef struct {
    uint8_t highByte;
    uint8_t midByte;
    uint8_t lowByte;
} ThreeByteInt;
To turn it into an int, you just do
uint32_t ThreeByteTo32(const ThreeByteInt *bytes) {
    return ((uint32_t)bytes->highByte << 16) + ((uint32_t)bytes->midByte << 8) + bytes->lowByte;
}
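A short usage sketch, assuming the typedef and function above are in scope (the wire bytes here are hypothetical, most significant first as received from the network):

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    const uint8_t wire[3] = { 0x08, 0x06, 0xd3 };       /* hypothetical received bytes */
    ThreeByteInt tbi = { wire[0], wire[1], wire[2] };   /* MSB first: network order */

    printf("0x%06" PRIx32 "\n", ThreeByteTo32(&tbi));   /* prints 0x0806d3 */
    return 0;
}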
If you receive the value in network ordering (that is, big-endian), you have this situation:
myarray[0] = most significant byte
myarray[1] = middle byte
myarray[2] = least significant byte
so this should work:
int result = (((int) myarray[0]) << 16) | (((int) myarray[1]) << 8) | ((int) myarray[2]);
Besides using structures/unions with byte-sized members, you have two other options:
1. Using ntohl/htonl and masking out the high byte of the 4-byte integer, with a bitwise AND, before or after the conversion (see the sketch after the cast example below).
2. Doing the bit-shift operations contained in other answers.
In any case, you should not rely on side effects when shifting data beyond the size of its type. A shift by 16 is beyond the size of unsigned char; the operand is first promoted to int, and leaning on that implicit promotion can cause problems depending on compiler, flags, and platform byte order. So always do the proper cast before the bitwise operations to make it work on any compiler/platform:
int result = (((int) myarray[0]) << 16) | (((int) myarray[1]) << 8) | ((int) myarray[2]);
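Here is a minimal sketch of option 1 above (the helper name read_u24_be is my own, and <arpa/inet.h> is assumed for ntohl): put the three network-order bytes in the last three bytes of a 4-byte buffer, convert, and mask off whatever the pad byte held.

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl */

uint32_t read_u24_be(const unsigned char myarray[3])
{
    unsigned char buf[4];
    uint32_t n;

    buf[0] = 0;                      /* pad byte; the mask below removes it anyway */
    memcpy(&buf[1], myarray, 3);     /* network-order bytes go last */
    memcpy(&n, buf, sizeof n);
    return ntohl(n) & 0x00FFFFFFu;   /* mask out the high (pad) byte */
}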
Why not just receive into the first 3 bytes of a 4-byte buffer? After that you can use ntohl, which is just a byte-swap instruction on most architectures. At some optimization levels it'll be faster than plain bit shifts and ORs:
union
{
    uint32_t val;
    unsigned char myarray[4];
} data;

memcpy(&data, buffer, 3);          /* buffer holds the 3 received network-order bytes */
data.myarray[3] = 0;
data.val = ntohl(data.val) >> 8;   /* byte-swap, then drop the padding byte */
or, in case you have copied it to the last 3 bytes instead, no shift is needed:
memcpy(&data.myarray[1], buffer, 3);
data.myarray[0] = 0;
data.val = ntohl(data.val);
#include <stdio.h>

unsigned char myarray[3] = { 1, 2, 3 };

int main(void)
{
#if LITTLE_ENDIAN   /* you figure out a way to express this on your platform */
    int mynum = (myarray[0] << 0) | (myarray[1] << 8) | (myarray[2] << 16);
#else
    int mynum = (myarray[0] << 16) | (myarray[1] << 8) | (myarray[2] << 0);
#endif
    printf("%x\n", mynum);
    return 0;
}
That prints 30201, which I think is what you want. The key is to realize that you have to shift the bytes differently per platform: you can't easily use ntohl() because you don't know where to put the extra zero byte.

Type conversion

Can someone please tell me what these lines of code do?
*(a++) = (int)((value >> 16) & 0xFF) ;
*(a++) = (int)((value >> 8) & 0xFF) ;
*(a++) = (int)((value & 0xFF)) ;
I understand that it checks the value: if it is much greater than 16 it converts it to type int, and if it is much smaller than 8 it does the same. But what do the & 0xFF and *(a++) do?
I understand that it checks the value
It doesn't check anything; it's not like the >> symbol in math, which means "much greater than". To break down this line:
*(a++) = (int)((value >> 16) & 0xFF);
(>>) shifts value right by 16 bits
(&) ANDs the result with 0xFF, thereby discarding everything to the left
Stores the result at the address pointed to by a
Increments the pointer, making a point to the "next" element
(value >> 16)
No, it is not "much greater".
It is a shift right by 16 bits, and dividing by 2 exactly 16 times does make the value much smaller than before.
value & 0xff tells you whether the value is divisible by 256: if value & 0xff is different from zero, then it is not divisible by 256. For example, 0x200 & 0xff is 0 (512 is divisible by 256), while 0x201 & 0xff is 1 (513 is not).
Given:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    char data[10];
    uint32_t value = 0x61626364; // 'abcd'
    char *a = data;

    *(a++) = (int)((value >> 24) & 0xFF);
    *(a++) = (int)((value >> 16) & 0xFF);
    *(a++) = (int)((value >> 8) & 0xFF);
    *(a++) = (int)(value & 0xFF);
    *(a++) = ':';
    *((uint32_t *)a) = value;   // raw, non-portable store: exposes the host's own byte order
    a += 4;
    *(a++) = 0;
    printf("%s\n", data);
    return 0;
}
I get (on my Intel box, which is a little-endian system):
abcd:dcba
So this is ensuring that the bytes of an integer are in a platform-independent form (choosing big-endian as the byte format).
Now, for:
*(a++) = (int)((value >> 16) & 0xFF);
we have:
0x61626364 -- value
0x00006162 -- value >> 16 : shifted 2 bytes
0x00000062 -- (value >> 16) & 0xFF : last byte only
*(a++) = (int)((value >> 16) & 0xFF) ;
is like:
aIntValue = value / 65536;
aIntValue = aIntValue % 256;
*(a++) = (int)((value >> 8) & 0xFF) ;
is like:
aIntValue = value / 256;
aIntValue = aIntValue % 256;
*(a++) = (int)((value & 0xFF)) ;
is like:
aIntValue = value % 256;
In each case, the code assigns aIntValue to the location pointed to by the pointer 'a', and then the pointer is moved to the next element.
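To check that the shift-and-mask form and the divide-and-modulo form really agree, a minimal sketch (unsigned arithmetic assumed; the test value reuses the 'abcd' example above):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t value = 0x61626364;

    printf("%u %u\n", (unsigned)((value >> 16) & 0xFF), (unsigned)((value / 65536) % 256)); /* 98 98 */
    printf("%u %u\n", (unsigned)((value >> 8) & 0xFF),  (unsigned)((value / 256) % 256));   /* 99 99 */
    printf("%u %u\n", (unsigned)(value & 0xFF),         (unsigned)(value % 256));           /* 100 100 */
    return 0;
}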

is it possible to bit mask a char array

Let's say I have the following char array
char array[32];
I want to use only the 10 most significant bits of this array as a hash value.
Is it possible to use bitwise operation on this char array?
If so, how should I do it?
Assuming your implementation has an 8-bit char, and that you have a 256-bit number stored big-endian in this array, here is how to access the 10 most significant bits of the 256-bit number.
uint16_t a;
a = (array[0] << 2 | (array[1] & 0xC0) >> 6) & 0x3FF;
I'm pretty sure you want something like this (again assuming 8-bit chars stored big endian in array):
uint16_t a = (((uint16_t)array[0] & 0xFF) << 2 | ((uint16_t)array[1] & 0xFF) >> 6) & 0x3FF;
To break that down a bit:
uint16_t byte0 = (uint16_t)array[0] & 0xFF;
uint16_t byte1 = (uint16_t)array[1] & 0xFF;
uint16_t a = (byte0 << 2 | byte1 >> 6) & 0x3FF;
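A quick test of that last form (the sample bytes 0xAB 0xCD are my own; their top 10 bits are 10 1010 1111, i.e. 0x2AF):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    char array[32] = { (char)0xAB, (char)0xCD };   /* only the first two bytes matter here */

    uint16_t byte0 = (uint16_t)array[0] & 0xFF;
    uint16_t byte1 = (uint16_t)array[1] & 0xFF;
    uint16_t a = (byte0 << 2 | byte1 >> 6) & 0x3FF;

    printf("0x%03x\n", (unsigned)a);   /* prints 0x2af */
    return 0;
}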
