Safe, efficient way to access unaligned data in a network packet from C

I'm writing a program in C for Linux on an ARM9 processor. The program is to access network packets which include a sequence of tagged data like:
<fieldID><length><data><fieldID><length><data> ...
The fieldID and length fields are both uint16_t. The data can be 1 or more bytes (up to 64k if the full length was used, but it's not).
As long as <data> has an even number of bytes, I don't see a problem. But if I have a 1- or 3- or 5-byte <data> section then the next 16-bit fieldID ends up not on a 16-bit boundary and I anticipate alignment issues. It's been a while since I've done anything like this from scratch so I'm a little unsure of the details. Any feedback welcome. Thanks.

To avoid alignment issues in this case, access all data as an unsigned char *. So:
unsigned char *p;
//...
uint16_t id = p[0] | (p[1] << 8);
p += 2;
The above example assumes "little endian" data layout, where the least significant byte comes first in a multi-byte number.
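As a rough sketch of putting that together (assuming little-endian fields as above; the handle_field() callback is hypothetical and not part of the question), walking the whole <fieldID><length><data> sequence only ever touches single bytes, so alignment never matters:
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback - just for illustration. */
void handle_field(uint16_t id, const unsigned char *data, uint16_t length);

void walk_fields(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    while (pos + 4 <= len) {
        uint16_t id     = buf[pos]     | (buf[pos + 1] << 8);  /* little-endian fieldID */
        uint16_t length = buf[pos + 2] | (buf[pos + 3] << 8);  /* little-endian length  */
        pos += 4;
        if (pos + length > len)
            break;                          /* truncated record: stop */
        handle_field(id, buf + pos, length);
        pos += length;                      /* odd lengths leave pos unaligned - harmless here */
    }
}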

You should have functions (inline and/or templated if the language you're using supports those features) that will read the potentially unaligned data and return the data type you're interested in. Something like:
uint16_t unaligned_uint16(const void *p)
{
    // this assumes big-endian values in the data stream
    // (which is common, but not universal, in network
    // communications) - this may or may not be
    // appropriate in your case
    const unsigned char *pByte = (const unsigned char *) p;
    uint16_t val = (pByte[0] << 8) | pByte[1];
    return val;
}
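A 32-bit companion in the same style might look like this (again assuming big-endian data in the stream; just a sketch to match the helper above):
uint32_t unaligned_uint32(const void *p)
{
    const unsigned char *pByte = (const unsigned char *) p;
    return ((uint32_t) pByte[0] << 24) |
           ((uint32_t) pByte[1] << 16) |
           ((uint32_t) pByte[2] << 8)  |
            (uint32_t) pByte[3];
}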

The easy way is to manually rebuild the uint16_ts, at the expense of speed:
uint8_t *packet = ...;
uint16_t fieldID = (packet[0] << 8) | packet[1]; // assumes big-endian (network) byte order in the packet
uint16_t length = (packet[2] << 8) | packet[3];
uint8_t *data = packet + 4;
packet += 4 + length;
If your processor supports it, you can type-pun or use a union (but beware of strict aliasing).
uint16_t fieldID = ntohs(*(uint16_t *)packet);
uint16_t length = ntohs(*(uint16_t *)(packet + 2));
Note that unaligned accesses aren't always supported (e.g. they might generate a fault of some sort); on other architectures they are supported, but with a performance penalty.
If the packet isn't aligned, you could always copy it into a static buffer and then read it:
static char static_buffer[65540];
memcpy(static_buffer, packet, packet_size); // make sure packet_size <= 65540
uint16_t fieldId = ntohs(*(uint16_t *)static_buffer);
uint16_t length = ntohs(*(uint16_t *)(static_buffer + 2));
Personally, I'd just go for option #1, since it'll be the most portable.

Alignment is always going to be fine, although perhaps not super-efficient, if you go through a byte pointer.
Setting aside issues of endianness, you can memcpy from the 'real' byte pointer into whatever properly aligned object you want/need, and you will be fine.
(this works because the generated code will load/store the data as bytes, which is alignment safe. It's when the generated assembly has instructions loading and storing 16/32/64 bits of memory in a mis-aligned manner that it all falls apart).
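For example, a small helper along these lines (just a sketch; the name read_u16 is illustrative) keeps everything alignment-safe and lets the compiler pick whatever load is legal on the target:
#include <stdint.h>
#include <string.h>

/* Copy two packet bytes into a properly aligned uint16_t. The bytes land in
   the value exactly as they sit in the packet, so any endian conversion
   (e.g. ntohs) still has to be applied afterwards. */
static uint16_t read_u16(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);    /* byte-wise copy: alignment-safe everywhere */
    return v;
}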

Related

Changing the Endianness of an integer which can be 2, 4 or 8 bytes using a switch-case statement

In a (real-time) system, computer 1 (big endian) gets integer data from computer 2 (which is little endian). Given the fact that we do not know the size of int, I check it using a sizeof() switch statement and use the __builtin_bswapX method accordingly as follows (assume that this builtin method is usable).
...
int data;
getData(&data); // not the actual function call. just represents what data is.
...
switch (sizeof(int)) {
case 2:
    intVal = __builtin_bswap16(data);
    break;
case 4:
    intVal = __builtin_bswap32(data);
    break;
case 8:
    intVal = __builtin_bswap64(data);
    break;
default:
    break;
}
...
is this a legitimate way of swapping the bytes for an integer data? Or is this switch-case statement totally unnecessary?
Update: I do not have access to the internals of getData() method, which communicates with the other computer and gets the data. It then just returns an integer data which needs to be byte-swapped.
Update 2: I realize that I caused some confusion. The two computers have the same int size but we do not know that size. I hope it makes sense now.
Seems odd to assume the size of int is the same on 2 machines yet compensate for variant endian encodings.
The below only reveals the int size of the receiving side, not the sending side.
switch(sizeof(int))
sizeof(int) is the size, in chars (bytes), of an int on the local machine. It should be sizeof(int)*CHAR_BIT to get the bit size. [OP has since edited the post.]
The sending machine should detail the data width, as a 16, 32, 64- bit without regard to its int size and the receiving end should be able to detect that value as part of the message or an agreed upon width should be used.
Much like hton() converts from local endian to network endian, the integer sizes used with these functions are moving toward fixed-width integers, like
#include <netinet/in.h>
uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);
So I suggest sending/receiving the "int" as a 32-bit uint32_t in network endian.
[Edit]
Consider that computers exist with different endians (little and big are the most common; others exist) and various int sizes: bit widths of 32 (common), 16, 64, maybe even the odd-ball 36-bit, with room for growth to 128-bit. Let us assume N combinations. Rather than write routines to convert from any 1 of the N formats to any other (N*N routines), let us define a network format and fix its endian to big and its bit width to 32. Now each computer does not care about, nor need to know, the int width/endian of the sender/recipient of data. Each platform sends/receives data in a locally optimized way, converting between its own endian/int and the network endian/int width.
OP describes not knowing the sender's int width, yet hints that the int width on the sender/receiver might be the same as on the local machine. If the int widths are specified to be the same and the endians are specified to be one big/one little as described, then OP's coding works.
However, such an "endians are opposite and int width the same" arrangement seems very selective. I would prepare code to cope with an interchange standard (a network standard): even if today it is "opposite endian, same int", tomorrow it will evolve toward a network standard.
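A minimal sketch of that suggestion (assuming POSIX <arpa/inet.h>; the actual socket calls are left out):
#include <arpa/inet.h>
#include <stdint.h>

/* Sender: fix the wire format to 32 bits in network (big-endian) order. */
uint32_t encode_int32(int32_t value)
{
    return htonl((uint32_t) value);
}

/* Receiver: restore the value regardless of the local endian. */
int32_t decode_int32(uint32_t wire)
{
    return (int32_t) ntohl(wire);
}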
A portable approach would not depend on any machine properties, but only rely on mathematical operations and a definition of the communication protocol that is also hardware independent. For example, given that you want to store bytes in a defined way:
void serializeLittleEndian(uint8_t *buffer, uint32_t data) {
    size_t i;
    for (i = 0; i < sizeof(uint32_t); ++i) {
        buffer[i] = data % 256;
        data /= 256;
    }
}
and to restore that data to whatever machine:
uint32_t deserializeLittleEndian(uint8_t *buffer) {
    uint32_t data = 0;
    size_t i;
    for (i = sizeof(uint32_t); i > 0; --i) {
        data *= 256;
        data += buffer[i - 1];   /* walk from the most significant byte back down to buffer[0] */
    }
    return data;
}
EDIT: This is not portable to systems with other than 8 bits per byte because of the use of uint8_t and uint32_t: uint8_t can only exist where char is 8 bits wide. However, the code will simply fail to compile on systems where these conditions are not met, rather than misbehave. Thanks to Olaf and Chqrlie.
Yes, this is totally cool, given that your switch handles the values sizeof(int) can actually return. One might be a little fancy and provide, for example, template specializations based on the size of int. But a switch like this is totally cool and will not produce any branches in optimized code.
As already mentioned, you generally want to define a protocol for communications across networks, which the hton/ntoh functions are mostly meant for. Network byte order is generally treated as big endian, which is what the hton/ntoh functions use. If the majority of your machines are little endian, it may be better to standardize on it instead though.
A couple of people have been critical of using __builtin_bswap, which I personally consider fine as long as you don't plan to target compilers that don't support it. Although, you may want to read Dan Luu's critique of intrinsics.
For completeness, I'm including a portable version of bswap that (at least with Clang) compiles into a bswap instruction for x86(-64).
#include <stddef.h>
#include <stdint.h>

size_t bswap(size_t x) {
    /* Swap byte i with byte d, working inward from both ends of the word. */
    for (size_t i = 0; i < sizeof(size_t) >> 1; i++) {
        size_t d = sizeof(size_t) - i - 1;
        size_t mh = ((size_t) 0xff) << (d << 3);   /* mask for the upper byte of the pair */
        size_t ml = ((size_t) 0xff) << (i << 3);   /* mask for the lower byte of the pair */
        size_t h = x & mh;
        size_t l = x & ml;
        size_t t = (l << ((d - i) << 3)) | (h >> ((d - i) << 3));
        x = t | (x & ~(mh | ml));
    }
    return x;
}

casting pointers in a buffer

Say I have a buffer filled with data and that I got it off the network.
uint8_t buffer[100];
Now imagine that this buffer has different fields. Some are 1 byte, some 2 bytes, and some 4 bytes. All these fields are packed in the buffer.
Now pretend that I want to grab the value of one of the 16 bit fields. Say that in the buffer, the field is stored like so:
buffer[2] = one byte of two byte field
buffer[3] = second byte of two byte field
I could grab that value like this:
uint16_t* p_val;
p_val = (int16_t*) &buffer[2];
or
p_val = (int16_t*) (buffer + 2);
printf("value: %d\n", ntohs(*p_val));
Is there anything wrong with this approach? Or alignment issues I should watch out for?
As has come out in commentary, yes, there are issues with your proposed approach. Although it might work on the target machine, or it might happen to work in a given case, it is not, in general, safe to cast between different pointer types. (There are exceptions.)
To properly take alignment and byte order into consideration, you could do this:
union convert {
    uint32_t word;
    uint16_t halfword[2];
    uint8_t bytes[4];
} convert;

uint16_t result16;

memcpy(convert.bytes, buffer + offset, 2);
/* assuming network byte order: */
result16 = ntohs(convert.halfword[0]);
If you are in control of the data format, then network byte order is a good choice, as the program doesn't then need explicitly to determine, assume, or know the byte order of the machine on which it is running.
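The same union extends to a 4-byte field (again assuming network byte order; this is only a sketch continuing the snippet above):
uint32_t result32;

memcpy(convert.bytes, buffer + offset, 4);
result32 = ntohl(convert.word);   /* all four bytes were copied, so the 32-bit member is fully populated */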

Reverse the Endianness of a C structure

I have a structure in C that looks like this:
typedef u_int8_t NN;
typedef u_int8_t X;
typedef int16_t S;
typedef u_int16_t U;
typedef char C;

typedef struct {
    X test;
    NN test2[2];
    C test3[4];
    U test4;
} Test;
I have declared the structure and written values to the fields as follows:
Test t;
int t_buflen = sizeof(t);
memset( &t, 0, t_buflen);
t.test = 0xde;
t.test2[0]=0xad; t.test2[1]=0x00;
t.test3[0]=0xbe; t.test3[1]=0xef; t.test3[2]=0x00; t.test3[3]=0xde;
t.test4=0xdeca;
I am sending this structure via UDP to a server. At present this works fine when I test locally, however I now need to send this structure from my little-endian machine to a big-endian machine. I'm not really sure how to do this.
I've looked into using htons but I'm not sure if that's applicable in this situation, as it seems to only be defined for unsigned ints of 16 or 32 bits, if I understood correctly.
I think there may be two issues here, depending on how you're sending this data over the network.
Issue 1: Endianness
As you've said, endianness is an issue. You're right to mention using htons and ntohs for shorts. You may also find htonl and its opposite useful too.
Endianness has to do with the byte ordering of multi-byte data types in memory. Therefore, for single-byte data types you do not have to worry. In your case it is the 2-byte data that I guess you're questioning.
To use these functions you will need to do something like the following...
Sender:
-------
t.test = 0xde; // Does not need to be swapped
t.test2[0] = 0xad; ... // Does not need to be swapped
t.test3[0] = 0xbe; ... // Does not need to be swapped
t.test4 = htons(0xdeca); // Needs to be swapped
...
sendto(..., &t, ...);
Receiver:
---------
recvfrom(..., &t, ...);
t.test4 = ntohs(t.test4); // Needs to be swapped back to host order
htons() and ntohs() use network byte ordering, which is big endian. Therefore your little-endian machine byte-swaps t.test4, and on receipt the big-endian machine simply uses the value as read (ntohs() is effectively a no-op there).
If you did not want to use the htons() function and its variants, you could instead define the buffer format at the byte level.
In this case your code might look something like
Sender:
-------
uint8_t buffer[SOME SIZE];
t.test = 0xde;
t.test2[0] = 0xad; ...
t.test3[0] = 0xbe; ...
t.test4 = 0xdeca;
buffer[0] = t.test;
buffer[1] = t.test2[0];
/// and so on, until...
buffer[7] = t.test4 & 0xff;
buffer[8] = (t.test4 >> 8) & 0xff;
...
sendto(..., buffer, ...);
Receiver:
---------
uint8_t buffer[SOME SIZE];
recvfrom(..., buffer, ...);
t.test = buffer[0];
t.test2[0] = buffer[1];
// and so on, until...
t.test4 = buffer[7] | (buffer[8] << 8);
The send and receive code will work regardless of the respective endianness of the sender and receiver because the byte-layout of the buffer is defined and known by the program running on both machines.
However, if you're sending your structure through the socket in this way you should also note the caveat below...
Issue 2: Data alignment
The article "Data alignment: Straighten up and fly right" is a great read for this one...
The other problem you might have is data alignment. This is not always a problem, even between machines that use different endian conventions, but it is nevertheless something to watch out for...
struct
{
    uint8_t v1;
    uint16_t v2;
}
In the above bit of code the offset of v2 from the start of the structure could be 1 byte, 2 bytes, 4 bytes (or just about anything). The compiler cannot re-order members in your structure, but it can pad the distance between variables.
Let's say machine 1 has a 16-bit wide data bus. If we took the structure without padding, the machine would have to do two fetches to get v2. Why? Because we access 2 bytes of memory at a time at the h/w level. Therefore the compiler could pad out the structure like so:
struct
{
    uint8_t v1;
    uint8_t invisible_padding_created_by_compiler;
    uint16_t v2;
}
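You can check what your particular compiler actually did with a quick offsetof/sizeof probe (illustrative only; the struct name here is made up):
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct example { uint8_t v1; uint16_t v2; };

int main(void) {
    /* On most 32/64-bit compilers this prints size 4, offset 2:
       one byte of invisible padding after v1. */
    printf("sizeof = %zu, offsetof(v2) = %zu\n",
           sizeof(struct example), offsetof(struct example, v2));
    return 0;
}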
If the sender and receiver differ on how they pack data into a structure then just sending the structure as a binary blob will cause you problems. In this case you may have to pack the variables into a byte stream/buffer manually before sending. This is often the safest way.
There's no endianness of the structure, really. It's the individual fields that need to be converted to big-endian where needed. You can make a copy of the structure, rewrite each multi-byte field using htons/htonl, and then send the copy. 8-bit fields don't need any modification, of course.
In the case of TCP you could also just send each part separately and count on the Nagle algorithm to merge all the parts into a single packet, but with UDP you need to prepare everything up front.
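A minimal sketch of that copy-and-swap approach, reusing the Test struct defined in the question (only test4 is wider than one byte):
#include <arpa/inet.h>

Test to_network_order(Test t)
{
    t.test4 = htons(t.test4);   /* the only multi-byte field */
    return t;                   /* pass the returned copy to sendto() */
}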
The data you are sending over the network should be the same regardless of the endianness of the machines involved. The key word you need to research is serialization. This means converting a data structure to a series of bits/bytes to be sent over a network or saved to disk, which will always be the same regardless of anything like architecture or compiler.

C: Memcpy vs Shifting: What's more efficient?

I have a byte array containing 16 & 32bit data samples, and to cast them to Int16 and Int32 I currently just do a memcpy with 2 (or 4) bytes.
Because memcpy probably isn't optimized for lengths of just two bytes, I was wondering if it would be more efficient to convert the bytes to an Int32 using integer arithmetic (or a union).
I would like to know what the efficiency of calling memcpy vs bit shifting is, because the code runs on an embedded platform.
I would say that memcpy is not the way to do this. However, finding the best way depends heavily on how your data is stored in memory.
To start with, you don't want to take the address of your destination variable. If it is a local variable, you will force it to the stack rather than giving the compiler the option to place it in a processor register. This alone could be very expensive.
The most general solution is to read the data byte by byte and arithmetically combine the result. For example:
uint16_t res = ( (((uint16_t)char_array[high]) << 8)
| char_array[low]);
The expression in the 32 bit case is a bit more complex, as you have more alternatives. You might want to check the assembler output which is best.
Alt 1: Build pairs, and combine them:
uint16_t low16 = ... as example above ...;
uint16_t high16 = ... as example above ...;
uint32_t res = ( (((uint32_t)high16) << 16)
| low16);
Alt 2: Shift in 8 bits at a time:
uint32_t res = char_array[i0];
res = (res << 8) | char_array[i1];
res = (res << 8) | char_array[i2];
res = (res << 8) | char_array[i3];
All the examples above are neutral to the endianness of the processor used, as the index values decide which part to read.
The next kind of solution is possible if 1) the endianness (byte order) of the device matches the order in which the bytes are stored in the array, and 2) the array is known to be placed at an aligned memory address. The latter depends on the machine, but you are safe if the char array representing a 16-bit value starts on an even address, and in the 32-bit case it should start on an address divisible by four. In this case you can simply read the address, after some pointer tricks:
uint16_t res = *(uint16_t *)&char_array[xxx];
Where xxx is the array index corresponding to the first byte in memory. Note that this might not be the same as the index of the least significant byte.
I would strongly suggest the first class of solutions, as it is endianess-neutral.
Anyway, both of them are way faster than your memcpy solution.
memcpy is not valid for "shifting" (moving data by an offset shorter than its length within the same array); attempting to use it for such invokes very dangerous undefined behavior. See http://lwn.net/Articles/414467/
You must either use memmove or your own shifting loop. For sizes above about 64 bytes, I would expect memmove to be a lot faster. For extremely short shifts, your own loop may win. Note that memmove has more overhead than memcpy because it has to determine which direction of copying is safe. Your own loop already knows (presumably) which direction is safe, so it can avoid an extra runtime check.
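For instance, discarding the first off bytes of a buffer and sliding the rest to the front has to use memmove (or a hand-written loop), because the source and destination regions overlap; a rough sketch:
#include <string.h>

/* Shift the tail of a buffer holding 'len' valid bytes down by 'off' bytes.
   memcpy would be undefined behavior here because the regions overlap. */
void shift_down(unsigned char *buf, size_t len, size_t off)
{
    memmove(buf, buf + off, len - off);
}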

Why do both utf-16le and utf-16be exist? endianness efficiency - C

I was wondering why both utf-16le and utf-16be exist. Is it considered "inefficient" for a big-endian environment to process little-endian data?
Currently, this is what I use while storing 2 bytes var locally:
unsigned char octets[2];
short int shortint = 12345; /* (assuming short int = 2 bytes) */
octets[0] = shortint & 255;
octets[1] = (shortint >> 8) & 255;
I know that while storing and reading as a fixed endianness locally - there is no endian risk. I was wondering if it's considered to be "inefficient"? what would be the most "efficient" way to store a 2 bytes var? (while restricting the data to the environment's endianness, local use only.)
Thanks, Doori Bar
This allows code to write large amounts of Unicode data to a file without conversion. During loading, you must always check the endianness. If you're lucky, you need no conversion. So in 66% of the cases you need no conversion, and only in 33% must you convert.
In memory, you can then access the data using the native datatypes of your CPU which allows for efficient processing.
That way, everyone can be as happy as possible.
So in your case, you need to check the encoding when loading the data but in RAM, you can use an array of short int to process it.
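For example, the usual check on load looks at the byte-order mark U+FEFF at the start of the stream (a sketch, assuming the BOM is present):
#include <stdint.h>

/* 0xFF 0xFE at the start means UTF-16LE, 0xFE 0xFF means UTF-16BE. */
int utf16_is_little_endian(const uint8_t *buf)
{
    return buf[0] == 0xFF && buf[1] == 0xFE;
}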
[EDIT] The fastest way to convert a 16-bit value to 2 octets is:
char octet[2];
short *ptr = (short *)&octet[0];
*ptr = 12345;
Now you don't know whether octet[0] holds the low or the upper 8 bits. To find out, write a known value and then examine it.
This will give you one of the encodings: the native one of your CPU.
If you need the other encoding, you can either swap the octets as you write them to a file (i.e. write them octet[1], octet[0]) or swap them in your code.
If you have several octets, you can use 32-bit integers to swap two 16-bit values at once:
char octet[4];
short *ptr = (short *)&octet[0];
*ptr++ = 12345;
*ptr++ = 23456;
int *ptr32 = (int *)&octet[0];
int val = ((*ptr32 << 8) & 0xff00ff00) | ((*ptr32 >> 8) & 0x00ff00ff);
