casting pointers in a buffer - c

Say I have a buffer filled with data and that I got it off the network.
uint8_t buffer[100];
Now imagine that this buffer has different fields. Some are 1 byte, some 2 bytes, and some 4 bytes. All these fields are packed in the buffer.
Now pretend that I want to grab the value of one of the 16 bit fields. Say that in the buffer, the field is stored like so:
buffer[2] = one byte of two byte field
buffer[3] = second byte of two byte field
I could grab that value like this:
uint16_t* p_val;
p_val = (int16_t*) &buffer[2];
or
p_val = (int16_t*) (buffer + 2);
printf("value: %d\n", ntohs(*p_val));
Is there anything wrong with this approach? Or alignment issues I should watch out for?

As has come out in commentary, yes, there are issues with your proposed approach. Although it might work on the target machine, or it might happen to work in a given case, it is not, in general, safe to cast between different pointer types. (There are exceptions.)
To properly take alignment and byte order into consideration, you could do this:
union convert {
uint32_t word;
uint16_t halfword[2];
uint8_t bytes[4];
} convert;
uint16_t result16;
memcpy(convert.bytes, buffer + offset, 2);
/* assuming network byte order: */
result16 = ntohs(convert.halfword[0]);
If you are in control of the data format, then network byte order is a good choice, as the program doesn't then need explicitly to determine, assume, or know the byte order of the machine on which it is running.

Related

memcpy misses one byte when copying to struct

I am trying to extract a particular region of my message and interpret it as a struct.
void app_main(void)
{
esp_err_t err;
uint8_t injected_input[]={0xCE,0x33,0xE1,0x00,0x11,0x22,0x33,0x44,0x55,0x66};
model_sensor_data_t stuff = {0};
model_sensor_data_t* sensor_buf = &stuff;
if (extract_sensor_data_msgA(injected_input, sensor_buf) == -1)
{
ESP_LOGE(TAG, "Error in extract_sensor_data_msgA");
}
ESP_LOGI(TAG, "extracted sensor data is 0x%12x", *sensor_buf);
}
typedef struct __attribute__((packed))
{
uint8_t byte0;
uint8_t byte1;
uint8_t byte2;
uint8_t byte3;
uint8_t byte4;
} model_sensor_data_t;
int32_t extract_sensor_data_msgA(uint8_t *buf, model_sensor_data_t *sensor_buf)
{
if (buf == NULL || sensor_buf == NULL)
{
return -1;
}
//do other checks, blah blah
memcpy(sensor_buf, buf + 5, sizeof(model_sensor_data_t)); //problem lies here
return 0;
}
I expect to get CLIENT: extracted sensor data is 0x2233445566 but i am getting CLIENT: extracted sensor data is 0x 55443322
It seems to me there are two problems i need to fix. First one is the endianness issue as the extracted values are all flipped. The second problem is the memcpy with padding(?) concern. I thought the second problem would be fixed if i use attribute((packed)) but it doesn't seem to fix the second problem. Any kind soul can provide an alternative way for me to go about this so as to resolve it? I have referred to https://electronics.stackexchange.com/questions/617711/problems-casting-a-uint8-t-array-to-a-struct and C memcpy copies bytes with little endianness but i am still unsure how to resolve the issue.
ESP_LOGI(TAG, "extracted sensor data is 0x%12x", *sensor_buf)
assuming this is going to a printf-family function (seems likely), it will be expecting a unsigned int as the argument, but you're passing a model_sensor_data_t, so you get undefined behavior.
What is probably happening is that an unsigned int is a 32-bit little-endian value being accessed in the bottom 32 bits of a register, while your calling convention will pass the model_sensor_data_t in a 64-bit register, so you're seeing the first 4 bytes as a little-endian unsigned. Alternately, printf is expecting a 32-bit value on the stack, and you are passing a 40-bit value (probably padded out to 8 bytes for alignment). Either way, it seems almost certain you're using a little-endian machine, such as an x86 of some flavor.
To print this properly, you need to print each byte. Something like
ESP_LOGI(TAG, "extracted sensor data is 0x%02x%02x%02x%02x%02x", sensor_buf->byte0,
sensor_buf->byte1, sensor_buf->byte2, sensor_buf->byte3, sensor_buf->byte4);
will print the extracted data as a 40-bit big-endian hex value.

How to convert to integer a char[4] of "hexadecimal" numbers [C/Linux]

So I'm working with system calls in Linux. I'm using "lseek" to navigate through the file and "read" to read. I'm also using Midnight Commander to see the file in hexadecimal. The next 4 bytes I have to read are in little-endian , and look like this : "2A 00 00 00". But of course, the bytes can be something like "2A 5F B3 00". I have to convert those bytes to an integer. How do I approach this? My initial thought was to read them into a vector of 4 chars, and then to build my integer from there, but I don't know how. Any ideas?
Let me give you an example of what I've tried. I have the following bytes in file "44 00". I have to convert that into the value 68 (4 + 4*16):
char value[2];
read(fd, value, 2);
int i = (value[0] << 8) | value[1];
The variable i is 17480 insead of 68.
UPDATE: Nvm. I solved it. I mixed the indexes when I shift. It shoud've been value[1] << 8 ... | value[0]
General considerations
There seem to be several pieces to the question -- at least how to read the data, what data type to use to hold the intermediate result, and how to perform the conversion. If indeed you are assuming that the on-file representation consists of the bytes of a 32-bit integer in little-endian order, with all bits significant, then I probably would not use a char[] as the intermediate, but rather a uint32_t or an int32_t. If you know or assume that the endianness of the data is the same as the machine's native endianness, then you don't need any other.
Determining native endianness
If you need to compute the host machine's native endianness, then this will do it:
static const uint32_t test = 1;
_Bool host_is_little_endian = *(char *)&test;
It is worthwhile doing that, because it may well be the case that you don't need to do any conversion at all.
Reading the data
I would read the data into a uint32_t (or possibly an int32_t), not into a char array. Possibly I would read it into an array of uint8_t.
uint32_t data;
int num_read = fread(&data, 4, 1, my_file);
if (num_read != 1) { /* ... handle error ... */ }
Converting the data
It is worthwhile knowing whether the on-file representation matches the host's endianness, because if it does, you don't need to do any transformation (that is, you're done at this point in that case). If you do need to swap endianness, however, then you can use ntohl() or htonl():
if (!host_is_little_endian) {
data = ntohl(data);
}
(This assumes that little- and big-endian are the only host byte orders you need to be concerned with. Historically, there have been others, which is why the byte-reorder functions come in pairs, but you are extremely unlikely ever to see one of the others.)
Signed integers
If you need a signed instead of unsigned integer, then you can do the same, but use a union:
union {
uint32_t unsigned;
int32_t signed;
} data;
In all of the preceding, use data.unsigned in place of plain data, and at the end, read out the signed result from data.signed.
Suppose you point into your buffer:
unsigned char *p = &buf[20];
and you want to see the next 4 bytes as an integer and assign them to your integer, then you can cast it:
int i;
i = *(int *)p;
You just said that p is now a pointer to an int, you de-referenced that pointer and assigned it to i.
However, this depends on the endianness of your platform. If your platform has a different endianness, you may first have to reverse-copy the bytes to a small buffer and then use this technique. For example:
unsigned char ibuf[4];
for (i=3; i>=0; i--) ibuf[i]= *p++;
i = *(int *)ibuf;
EDIT
The suggestions and comments of Andrew Henle and Bodo could give:
unsigned char *p = &buf[20];
int i, j;
unsigned char *pi= &(unsigned char)i;
for (j=3; j>=0; j--) *pi++= *p++;
// and the other endian:
int i, j;
unsigned char *pi= (&(unsigned char)i)+3;
for (j=3; j>=0; j--) *pi--= *p++;

Reverse the Endianness of a C structure

I have a structure in C that looks like this:
typedef u_int8_t NN;
typedef u_int8_t X;
typedef int16_t S;
typedef u_int16_t U;
typedef char C;
typedef struct{
X test;
NN test2[2];
C test3[4];
U test4;
} Test;
I have declared the structure and written values to the fields as follows:
Test t;
int t_buflen = sizeof(t);
memset( &t, 0, t_buflen);
t.test = 0xde;
t.test2[0]=0xad; t.test2[1]=0x00;
t.test3[0]=0xbe; t.test3[1]=0xef; t.test3[2]=0x00; t.test3[3]=0xde;
t.test4=0xdeca;
I am sending this structure via UDP to a server. At present this works fine when I test locally, however I now need to send this structure from my little-endian machine to a big-endian machine. I'm not really sure how to do this.
I've looked into using htons but I'm not sure if that's applicable in this situation as it seem to only be defined for unsigned ints of 16 or 32 bits, if I understood correctly.
I think there may be two issues here depending on how you're sending this data over TCP.
Issue 1: Endianness
As, you've said endianness is an issue. You're right when you mention using htons and ntohs for shorts. You may also find htonl and its opposite useful too.
Endianness has to do with the byte ordering of multiple-byte data types in memory. Therefore, for single byte-width data types you do not have to worry. In your case is is the 2-byte data that I guess you're questioning.
To use these functions you will need to do something like the following...
Sender:
-------
t.test = 0xde; // Does not need to be swapped
t.test2[0] = 0xad; ... // Does not need to be swapped
t.test3[0] = 0xbe; ... // Does not need to be swapped
t.test4 = htons(0xdeca); // Needs to be swapped
...
sendto(..., &t, ...);
Receiver:
---------
recvfrom(..., &t, ...);
t.test4 = ntohs(0xdeca); // Needs to be swapped
Using htons() and ntohs() use the Ethernet byte ordering... big endian. Therefore your little-endian machine byte swaps t.test4 and on receipt the big-endian machine just uses that value read (ntohs() is a noop effectively).
The following diagram will make this more clear...
If you did not want to use the htons() function and its variants then you could just define the buffer format at the byte level. This diagram make's this more clear...
In this case your code might look something like
Sender:
-------
uint8_t buffer[SOME SIZE];
t.test = 0xde;
t.test2[0] = 0xad; ...
t.test3[0] = 0xbe; ...
t.test4 = 0xdeca;
buffer[0] = t.test;
buffer[1] = t.test2[0];
/// and so on, until...
buffer[7] = t.test4 & 0xff;
buffer[8] = (t.test4 >> 8) & 0xff;
...
sendto(..., buffer, ...);
Receiver:
---------
uint8_t buffer[SOME SIZE];
recvfrom(..., buffer, ...);
t.test = buffer[0];
t.test2[0] = buffer[1];
// and so on, until...
t.test4 = buffer[7] | (buffer[8] << 8);
The send and receive code will work regardless of the respective endianness of the sender and receiver because the byte-layout of the buffer is defined and known by the program running on both machines.
However, if you're sending your structure through the socket in this way you should also note the caveat below...
Issue 2: Data alignment
The article "Data alignment: Straighten up and fly right" is a great read for this one...
The other problem you might have is data alignment. This is not always the case, even between machines that use different endian conventions, but is nevertheless something to watch out for...
struct
{
uint8_t v1;
uint16_t v2;
}
In the above bit of code the offset of v2 from the start of the structure could be 1 byte, 2 bytes, 4 bytes (or just about anything). The compiler cannot re-order members in your structure, but it can pad the distance between variables.
Lets say machine 1 has a 16-bit wide data bus. If we took the structure without padding the machine will have to do two fetches to get v2. Why? Because we access 2 bytes of memory at a time at the h/w level. Therefore the compiler could pad out the structure like so
struct
{
uint8_t v1;
uint8_t invisible_padding_created_by_compiler;
uint16_t v2;
}
If the sender and receiver differ on how they pack data into a structure then just sending the structure as a binary blob will cause you problems. In this case you may have to pack the variables into a byte stream/buffer manually before sending. This is often the safest way.
There's no endianness of the structure really. It's all the separate fields that need to be converted to big-endian when needed. You can either make a copy of the structure and rewrite each field using hton/htons, then send the result. 8-bit fields don't need any modification of course.
In case of TCP you could also just send each part separately and count on nagle algorithm to merge all parts into a single packet, but with UDP you need to prepare everything up front.
The data you are sending over the network should be the same regardless of the endianess of the machines involved. The key word you need to research is serialization. This means converting a data structure to a series of bits/bytes to be sent over a network or saved to disk, which will always be the same regardless of anything like architecture or compiler.

C programming: words from byte array

I have some confusion regarding reading a word from a byte array. The background context is that I'm working on a MIPS simulator written in C for an intro computer architecture class, but while debugging my code I ran into a surprising result that I simply don't understand from a C programming standpoint.
I have a byte array called mem defined as follows:
uint8_t *mem;
//...
mem = calloc(MEM_SIZE, sizeof(uint8_t)); // MEM_SIZE is pre defined as 1024x1024
During some of my testing I manually stored a uint32_t value into four of the blocks of memory at an address called mipsaddr, one byte at a time, as follows:
for(int i = 3; i >=0; i--) {
*(mem+mipsaddr+i) = value;
value = value >> 8;
// in my test, value = 0x1084
}
Finally, I tested trying to read a word from the array in one of two ways. In the first way, I basically tried to read the entire word into a variable at once:
uint32_t foo = *(uint32_t*)(mem+mipsaddr);
printf("foo = 0x%08x\n", foo);
In the second way, I read each byte from each cell manually, and then added them together with bit shifts:
uint8_t test0 = mem[mipsaddr];
uint8_t test1 = mem[mipsaddr+1];
uint8_t test2 = mem[mipsaddr+2];
uint8_t test3 = mem[mipsaddr+3];
uint32_t test4 = (mem[mipsaddr]<<24) + (mem[mipsaddr+1]<<16) +
(mem[mipsaddr+2]<<8) + mem[mipsaddr+3];
printf("test4= 0x%08x\n", test4);
The output of the code above came out as this:
foo= 0x84100000
test4= 0x00001084
The value of test4 is exactly as I expect it to be, but foo seems to have reversed the order of the bytes. Why would this be the case? In the case of foo, I expected the uint32_t* pointer to point to mem[mipsaddr], and since it's 32-bits long, it would just read in all 32 bits in the order they exist in the array (which would be 00001084). Clearly, my understanding isn't correct.
I'm new here, and I did search for the answer to this question but couldn't find it. If it's already been posted, I apologize! But if not, I hope someone can enlighten me here.
It is (among others) explained here: http://en.wikipedia.org/wiki/Endianness
When storing data larger than one byte into memory, it depends on the architecture (means, the CPU) in which order the bytes are stored. Either, the most significant byte is stored first and the least significant byte last, or vice versa. When you read back the individual bytes through byte access operations, and then merge them to form the original value again, you need to consider the endianess of your particular system.
In your for-loop, you are storing your value byte-wise, starting with the most significant byte (counting down the index is a bit misleading ;-). Your memory looks like this afterwards: 0x00 0x00 0x10 0x84.
You are then reading the word back with a single 32 bit (four byte) access. Depending on our architecture, this will either become 0x00001084 (big endian) or 0x84100000 (little endian). Since you get the latter, you are working on a little endian system.
In your second approach, you are using the same order in which you stored the individual bytes (most significant first), so you get back the same value which you stored earlier.
It seems to be a problem of endianness, maybe comes from casting (uint8_t *) to (uint32_t *)

Safe, efficient way to access unaligned data in a network packet from C

I'm writing a program in C for Linux on an ARM9 processor. The program is to access network packets which include a sequence of tagged data like:
<fieldID><length><data><fieldID><length><data> ...
The fieldID and length fields are both uint16_t. The data can be 1 or more bytes (up to 64k if the full length was used, but it's not).
As long as <data> has an even number of bytes, I don't see a problem. But if I have a 1- or 3- or 5-byte <data> section then the next 16-bit fieldID ends up not on a 16-bit boundary and I anticipate alignment issues. It's been a while since I've done any thing like this from scratch so I'm a little unsure of the details. Any feedback welcome. Thanks.
To avoid alignment issues in this case, access all data as an unsigned char *. So:
unsigned char *p;
//...
uint16_t id = p[0] | (p[1] << 8);
p += 2;
The above example assumes "little endian" data layout, where the least significant byte comes first in a multi-byte number.
You should have functions (inline and/or templated if the language you're using supports those features) that will read the potentially unaligned data and return the data type you're interested in. Something like:
uint16_t unaligned_uint16( void* p)
{
// this assumes big-endian values in data stream
// (which is common, but not universal in network
// communications) - this may or may not be
// appropriate in your case
unsigned char* pByte = (unsigned char*) p;
uint16_t val = (pByte[0] << 8) | pByte[1];
return val;
}
The easy way is to manually rebuild the uint16_ts, at the expense of speed:
uint8_t *packet = ...;
uint16_t fieldID = (packet[0] << 8) | packet[1]; // assumes big-endian host order
uint16_t length = (packet[2] << 8) | packet[2];
uint8_t *data = packet + 4;
packet += 4 + length;
If your processor supports it, you can type-pun or use a union (but beware of strict aliasing).
uint16_t fieldID = htons(*(uint16_t *)packet);
uint16_t length = htons(*(uint16_t *)(packet + 2));
Note that unaligned access aren't always supported (e.g. they might generate a fault of some sort), and on other architectures, they're supported, but there's a performance penalty.
If the packet isn't aligned, you could always copy it into a static buffer and then read it:
static char static_buffer[65540];
memcpy(static_buffer, packet, packet_size); // make sure packet_size <= 65540
uint16_t fieldId = htons(*(uint16_t *)static_buffer);
uint16_t length = htons(*(uint16_t *)(static_buffer + 2));
Personally, I'd just go for option #1, since it'll be the most portable.
Alignment is always going to be fine, although perhaps not super-efficient, if you go through a byte pointer.
Setting aside issues of endian-ness, you can memcpy from the 'real' byte pointer into whatever you want/need that is properly aligned and you will be fine.
(this works because the generated code will load/store the data as bytes, which is alignment safe. It's when the generated assembly has instructions loading and storing 16/32/64 bits of memory in a mis-aligned manner that it all falls apart).

Resources