Why does fread mess with my byte order? (C)

I'm trying to parse a BMP file with fread(), and when I begin to parse, it reverses the order of my bytes.
typedef struct {
    short magic_number;
    int file_size;
    short reserved_bytes[2];
    int data_offset;
} BMPHeader;
...
BMPHeader header;
...
The hex data is 42 4D 36 00 03 00 00 00 00 00 36 00 00 00.
I am loading the hex data into the struct with fread(&header, 14, 1, fileIn);
My problem is that where the magic number should be 0x424D // 'BM', fread() flips the bytes to 0x4D42 // 'MB'.
Why does fread() do this, and how can I fix it?
EDIT: If I wasn't specific enough: I need to read the whole chunk of hex data into the struct, not just the magic number. I only picked the magic number as an example.

This is not the fault of fread, but of your CPU, which is (apparently) little-endian. That is, your CPU treats the first byte in a short value as the low 8 bits, rather than (as you seem to have expected) the high 8 bits.
Whenever you read a binary file format, you must explicitly convert from the file format's endianness to the CPU's native endianness. You do that with functions like these:
/* CHAR_BIT == 8 assumed */
uint16_t le16_to_cpu(const uint8_t *buf)
{
    return ((uint16_t)buf[0]) | (((uint16_t)buf[1]) << 8);
}

uint16_t be16_to_cpu(const uint8_t *buf)
{
    return ((uint16_t)buf[1]) | (((uint16_t)buf[0]) << 8);
}
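read_bmp_header() below also calls a 32-bit helper; it follows exactly the same pattern as le16_to_cpu (shown here so the example is self-contained):
uint32_t le32_to_cpu(const uint8_t *buf)
{
    return ((uint32_t)buf[0])
         | (((uint32_t)buf[1]) << 8)
         | (((uint32_t)buf[2]) << 16)
         | (((uint32_t)buf[3]) << 24);
}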
You do your fread into a uint8_t buffer of the appropriate size, and then you manually copy all the data bytes over to your BMPHeader struct, converting as necessary. That would look something like this:
/* note adjustments to type definition */
typedef struct BMPHeader
{
    uint8_t  magic_number[2];
    uint32_t file_size;
    uint8_t  reserved[4];
    uint32_t data_offset;
} BMPHeader;
/* in general this is _not_ equal to sizeof(BMPHeader) */
#define BMP_WIRE_HDR_LEN (2 + 4 + 4 + 4)
/* returns 0=success, -1=error */
int read_bmp_header(BMPHeader *hdr, FILE *fp)
{
    uint8_t buf[BMP_WIRE_HDR_LEN];

    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return -1;

    hdr->magic_number[0] = buf[0];
    hdr->magic_number[1] = buf[1];
    hdr->file_size       = le32_to_cpu(buf + 2);
    hdr->reserved[0]     = buf[6];
    hdr->reserved[1]     = buf[7];
    hdr->reserved[2]     = buf[8];
    hdr->reserved[3]     = buf[9];
    hdr->data_offset     = le32_to_cpu(buf + 10);
    return 0;
}
You do not assume that the CPU's endianness is the same as the file format's even if you know for a fact that right now they are the same; you write the conversions anyway, so that in the future your code will work without modification on a CPU with the opposite endianness.
You can make life easier for yourself by using the fixed-width <stdint.h> types, by using unsigned types unless being able to represent negative numbers is absolutely required, and by not using integers when character arrays will do. I've done all these things in the above example. You can see that you need not bother endian-converting the magic number, because the only thing you need to do with it is test magic_number[0]=='B' && magic_number[1]=='M'.
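For instance, reading and validating a header then comes down to this (a minimal sketch of typical use):
BMPHeader hdr;
if (read_bmp_header(&hdr, fp) != 0
    || hdr.magic_number[0] != 'B' || hdr.magic_number[1] != 'M') {
    /* not a BMP, or short read */
}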
Conversion in the opposite direction, btw, looks like this:
void cpu_to_le16(uint8_t *buf, uint16_t val)
{
    buf[0] = (val & 0x00FF);
    buf[1] = (val & 0xFF00) >> 8;
}

void cpu_to_be16(uint8_t *buf, uint16_t val)
{
    buf[0] = (val & 0xFF00) >> 8;
    buf[1] = (val & 0x00FF);
}
Conversion of 32-/64-bit quantities left as an exercise.

I assume this is an endian issue, i.e. you are putting the bytes 42 and 4D into your short value, but your system is little-endian: it stores the least significant byte of a multi-byte integer first, so the first byte of the stream ends up as the low byte of the value rather than the high byte.
Demonstrated in this code:
#include <stdio.h>

int main()
{
    union {
        short sval;
        unsigned char bval[2];
    } udata;

    udata.sval = 1;
    printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
          , udata.sval, udata.sval, udata.bval[0], udata.bval[1] );
    udata.sval = 0x424d;
    printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
          , udata.sval, udata.sval, udata.bval[0], udata.bval[1] );
    udata.sval = 0x4d42;
    printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
          , udata.sval, udata.sval, udata.bval[0], udata.bval[1] );
    return 0;
}
Gives the following output
DEC[ 1] HEX[0001] BYTES[01][00]
DEC[16973] HEX[424d] BYTES[4d][42]
DEC[19778] HEX[4d42] BYTES[42][4d]
So if you want to be portable you will need to detect the endian-ness of your system and then do a byte shuffle if required. There will be plenty of examples round the internet of swapping the bytes around.
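For example, a minimal runtime check is one common idiom (compile-time detection via predefined macros is also possible):
#include <stdint.h>

int is_little_endian(void)
{
    union { uint16_t v; uint8_t b[2]; } u = { 1 };
    return u.b[0] == 1; /* low byte stored first means little-endian */
}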
Subsequent question:
I ask only because my file size is 3 instead of 196662
This is due to memory alignment issues. 196662 is the bytes 36 00 03 00 and 3 is the bytes 03 00 00 00. Most systems need types like int etc. to not be split over multiple memory words. So intuitively you think your struct is laid out in memory like:
Offset
short magic_number;       00 - 01
int file_size;            02 - 05
short reserved_bytes[2];  06 - 09
int data_offset;          0A - 0D
BUT on a 32-bit system that means file_size would have 2 bytes in the same word as magic_number and two bytes in the next word. Most compilers will not stand for this, so the way the structure is actually laid out in memory is:
short magic_number;       00 - 01
<<unused padding>>        02 - 03
int file_size;            04 - 07
short reserved_bytes[2];  08 - 0B
int data_offset;          0C - 0F
So when you read your byte stream in, the 36 00 goes into the padding area, which leaves your file_size getting 03 00 00 00. Now if you had used fwrite to create this data it would have been OK, as the padding bytes would have been written out. But since your input is always going to be in the format you have specified, it is not appropriate to read the whole struct in one go with fread. Instead you will need to read each of the elements individually.
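A minimal sketch of that, reusing the original struct (this sidesteps the padding problem only; on a big-endian CPU you would still need the byte swaps discussed above):
BMPHeader header;
if (fread(&header.magic_number,   sizeof header.magic_number,   1, fileIn) != 1
 || fread(&header.file_size,      sizeof header.file_size,      1, fileIn) != 1
 || fread(&header.reserved_bytes, sizeof header.reserved_bytes, 1, fileIn) != 1
 || fread(&header.data_offset,    sizeof header.data_offset,    1, fileIn) != 1) {
    /* handle short read */
}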

Writing a struct to a file is highly non-portable -- it's safest to just not try to do it at all. Using a struct like this is guaranteed to work only if a) the struct is both written and read as a struct (never a sequence of bytes) and b) it's always both written and read on the same (type of) machine.
Not only are there "endian" issues with different CPUs (which is what it seems you've run into), there are also "alignment" issues. Different hardware implementations have different rules about placing integers only on even 2-byte or even 4-byte or even 8-byte boundaries. The compiler is fully aware of all this, and inserts hidden padding bytes into your struct so it always works right. But as a result of the hidden padding bytes, it's not at all safe to assume a struct's bytes are laid out in memory like you think they are.
If you're very lucky, you work on a computer that uses big-endian byte order and has no alignment restrictions at all, so you can lay structs directly over files and have it work. But you're probably not that lucky -- certainly programs that need to be "portable" to different machines have to avoid trying to lay structs directly over any part of any file.

Related

How to properly place received data into a structure (C)? [duplicate]

This question is about C only. Vectors, lists and other C++ constructs are not an option.
I have a buffer with received data:
(from here on, U8 is uint8_t (unsigned char) and so on)
Data is packetized (it always has info about start, end and len).
Examples of data (hex):
(packet 1)
24 0C 00 02 00 00 00 11 AA 0D 78 C8
(packet 2)
24 0F 00 02 00 00 00 14 D0 07 00 00 0D 7D 53
Here:
'24' - start of packet
next 2 bytes - full packet length (0C 00 in the first packet, 0F 00 in the second)
next 4 bytes - special ID (here 02 00 00 00)
1 byte - command
DATA block, variable length (AA in the first packet, D0 07 00 00 in the second)
'0D' - end of packet
last 2 bytes - CRC
I want to use structures to work with this data.
Here is what I did:
typedef __packed struct FM_Packet_s
{
    U8  head;
    U16 len;
    U32 uid;
    U8  cmd;
    U8  data;
    U8  end;
    U16 crc;
} FM_Packet_t, *FM_Packet_p;
U8 RX_buff[255];
…
FM_Packet_t *pFM_Packet = (FM_Packet_t *) RX_buff;

void handlerData(void)
{
    // check received CRC
    if (pFM_Packet->uid == ID_NUMBER)
    {
        if (pFM_Packet->cmd == NEEDED_COMMAND)
        {
            // command received, take action
            if (pFM_Packet->data == SPECIAL_DATA)
            {
                // do stuff
            }
        }
    }
}
Everything was good until I received the 2nd packet, which has more than 1 byte in the DATA field. Now the data is blended.
Of course, the "data" field may have different lengths, not only those shown in these two packets.
How can I correctly place the received data into structures?
U8 data;
needs to be a pointer to a buffer that's the right size to hold your data, not an unsigned integer.
You would need to allocate the buffer to be whatever size you need, before loading the data, then point the pointer to it.
You could also just make your packet buffer a lot larger and use U16 len; to figure out where the data stops.
You can use the "unwarranted chumminess of C" http://computer-programming-forum.com/47-c-language/6c323b3186a9a335.htm
typedef __packed struct FM_Packet_s
{
    U8  head;
    U16 len;
    U32 uid;
    U8  cmd;
    U8  data[1];
} FM_Packet_t, *FM_Packet_p;
It uses a flexible array - the [1] is just to keep the compiler quiet. On nit-picking compilers, you may get warnings about array sizes. If we assign pFM_Packet->len to len:
The size of the array is len - 11
The end-of-packet byte is at &pFM_Packet->data[len - 11]
The CRC is at &pFM_Packet->data[len - 10]
If you are using gcc, you can use the gcc extension which allows declaring
U8 data[0];
This is illegal in most compilers, but gcc allows it. (C99 also added true flexible array members, written as U8 data[];.)
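Putting the pieces together, a sketch of how the variable-length part is then addressed (the CRC byte order here is an assumption; adjust it to your protocol):
FM_Packet_t *pFM_Packet = (FM_Packet_t *) RX_buff;
U16 len      = pFM_Packet->len;               /* full packet length      */
U16 data_len = len - 11;                      /* bytes in the DATA block */
U8  end_byte = pFM_Packet->data[data_len];    /* should be 0x0D          */
U16 crc      = (U16)pFM_Packet->data[data_len + 1]
             | ((U16)pFM_Packet->data[data_len + 2] << 8);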

Store C structs for multiple platform use - would this approach work?

Compiler: GNU GCC
Application type: console application
Language: C
Platforms: Win7 and Linux Mint
I wrote a program that I want to run under Win7 and Linux. The program writes C structs to a file and I want to be able to create the file under Win7 and read it back in Linux and vice versa.
By now, I have learned that writing complete structs with fwrite() gives almost 100% assurance that they won't be read back correctly by the other platform, due to padding and maybe other causes.
I defined all structs myself and they (now, after my previous question on this forum) all have members of type int32_t, int64_t and char. I am thinking about writing a WriteStructname() function for each struct that will write the individual members as int32_t, int64_t and char to the outputfile. Likewise, a ReadStructname() function to read the individual struct members from the file and copy them to an empty struct again.
Would this approach work? I prefer to have maximum control over my source code, so I'm not looking for libraries or other dependencies to achieve this unless I really have to.
Thanks for reading
Element-wise writing of data to a file is your best approach, since structs will differ due to alignment and packing differences between compilers.
However, even with the approach you're planning to use, there are still potential pitfalls, such as different endianness between systems, or different encoding schemes (i.e. two's complement versus one's complement encoding of signed numbers).
If you're going to do this, you should consider something like a JSON parser to encode and decode your data so you don't corrupt it due to the issues mentioned above.
Good luck!
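To make the element-wise approach concrete, here is a sketch for a hypothetical 32-bit field (the names write_i32/read_i32 are made up; little-endian file order is an arbitrary but fixed choice). Each field is written byte-by-byte, so neither padding nor host endianness ever reaches the file:
#include <stdint.h>
#include <stdio.h>

static void write_i32(FILE *f, int32_t v)
{
    uint32_t u = (uint32_t)v;
    for (int i = 0; i < 4; i++)
        fputc((int)((u >> (8 * i)) & 0xFF), f); /* least significant byte first */
}

static int read_i32(FILE *f, int32_t *v)
{
    uint32_t u = 0;
    for (int i = 0; i < 4; i++) {
        int c = fgetc(f);
        if (c == EOF)
            return -1;                          /* short read */
        u |= (uint32_t)c << (8 * i);
    }
    *v = (int32_t)u;
    return 0;
}
A WriteStructname()/ReadStructname() pair is then just a sequence of such calls, one per member. Remember to open the files in binary mode ("wb"/"rb") so Windows does not translate line endings.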
If you use GCC or any other compiler that supports "packed" structs, then as long as you use nothing but [u]intX_t types in the struct and apply an endianness fix to every field whose type is bigger than 8 bits, you are platform safe :)
This is example code that is portable between platforms; do not forget to set the endianness in UIP_BYTE_ORDER manually.
#include <stdint.h>
#include <stdio.h>
/* These macros are set manually; you should use some automated detection methodology */
#define UIP_BIG_ENDIAN 1
#define UIP_LITTLE_ENDIAN 2
#define UIP_BYTE_ORDER UIP_LITTLE_ENDIAN
/* Borrowed from uIP */
#ifndef UIP_HTONS
# if UIP_BYTE_ORDER == UIP_BIG_ENDIAN
# define UIP_HTONS(n) (n)
# define UIP_HTONL(n) (n)
# define UIP_HTONLL(n) (n)
# else /* UIP_BYTE_ORDER == UIP_BIG_ENDIAN */
# define UIP_HTONS(n) (uint16_t)((((uint16_t) (n)) << 8) | (((uint16_t) (n)) >> 8))
# define UIP_HTONL(n) (((uint32_t)UIP_HTONS(n) << 16) | UIP_HTONS((uint32_t)(n) >> 16))
# define UIP_HTONLL(n) (((uint64_t)UIP_HTONL(n) << 32) | UIP_HTONL((uint64_t)(n) >> 32))
# endif /* UIP_BYTE_ORDER == UIP_BIG_ENDIAN */
#else
#error "UIP_HTONS already defined!"
#endif /* UIP_HTONS */
struct __attribute__((__packed__)) s_test
{
    uint32_t a;
    uint8_t  b;
    uint64_t c;
    uint16_t d;
    int8_t   string[13];
};

struct s_test my_data =
{
    .a = 0xABCDEF09,
    .b = 0xFF,
    .c = 0xDEADBEEFDEADBEEFULL,
    .d = 0x9876,
    .string = "bla bla bla"
};
void save()
{
    FILE *f = fopen("test.bin", "wb+"); /* binary mode matters on Windows */

    /* Fix endianness */
    my_data.a = UIP_HTONL(my_data.a);
    my_data.c = UIP_HTONLL(my_data.c);
    my_data.d = UIP_HTONS(my_data.d);

    fwrite(&my_data, sizeof(my_data), 1, f);
    fclose(f);
}

void read()
{
    FILE *f = fopen("test.bin", "rb");

    fread(&my_data, sizeof(my_data), 1, f);
    fclose(f);

    /* Fix endianness */
    my_data.a = UIP_HTONL(my_data.a);
    my_data.c = UIP_HTONLL(my_data.c);
    my_data.d = UIP_HTONS(my_data.d);
}

int main(int argc, char **argv)
{
    save();
    return 0;
}
That's the saved file dump:
fanl#fanl-ultrabook:~/workspace-tmp/test3$ hexdump -v -C test.bin
00000000 ab cd ef 09 ff de ad be ef de ad be ef 98 76 62 |..............vb|
00000010 6c 61 20 62 6c 61 20 62 6c 61 00 00 |la bla bla..|
0000001c
This is a good approach. If all fields are integer types of a specific size such as int32_t, int64_t, or char, and you read/write the appropriate number of them to/from arrays, you should be fine.
The one thing you need to watch out for is endianness. Any integer type should be written in a known byte order and read back in the proper byte order for the system in question. The simplest way to do this is with the ntohs and htons functions for 16-bit ints and the ntohl and htonl functions for 32-bit ints. There are no corresponding standard functions for 64-bit ints, but they shouldn't be too difficult to write.
Here's a sample of how you could write these functions for 64 bit:
#include <string.h> /* for memcpy */

uint64_t htonll(uint64_t val)
{
    uint8_t v[8];
    uint64_t result;
    int i;

    for (i = 0; i < 8; i++) {
        v[i] = (uint8_t)(val >> ((7 - i) * 8));
    }
    memcpy(&result, v, sizeof result); /* avoids aliasing/alignment issues */
    return result;
}
uint64_t ntohll(uint64_t val)
{
    uint8_t *v = (uint8_t *)&val;
    uint64_t result = 0;
    int i;

    for (i = 0; i < 8; i++) {
        result |= (uint64_t)v[i] << ((7 - i) * 8);
    }
    return result;
}
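Used together they round-trip, so a quick sanity check is easy (sketch):
#include <assert.h>

uint64_t x = 0x0123456789ABCDEFULL;
assert(ntohll(htonll(x)) == x);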

Create BMP header in C (can't limit 2 byte fields)

I'm doing it based on:
https://en.wikipedia.org/wiki/BMP_file_format
I want to create a BMP image from scratch in C.
#include <stdio.h>
#include <stdlib.h>
typedef struct HEADER {
    short FileType;
    int   FileSize;
    short R1;
    short R2;
    int   dOffset;
} tp_header;

int main () {
    FILE *image;
    image = fopen("test.bmp", "w");
    tp_header bHeader;

    bHeader.FileType = 0x4D42;
    bHeader.FileSize = 70;
    bHeader.R1 = 0;
    bHeader.R2 = 0;
    bHeader.dOffset = 54;

    fwrite(&bHeader, sizeof(struct HEADER), 1, image);
    return 0;
}
I should be getting as the output file:
42 4D 46 00 00 00 00 00 00 00 36 00 00 00
But instead I get:
42 4D 40 00 46 00 00 00 00 00 00 00 36 00 00 00
First off, it should contain only 14 bytes; that "40 00" ruins it all. Is this the proper way of setting up the header in C? How else can I limit the size in bytes written out?
A struct might include padding bytes between the fields to align the next field to certain address offsets. The values of these padding bytes are indeterminate. A typical layout might look like:
struct {
    uint8_t  field1;
    uint8_t  <padding>
    uint8_t  <padding>
    uint8_t  <padding>
    uint32_t field2;
    uint16_t field3;
    uint8_t  <padding>
    uint8_t  <padding>
};
<padding> is just added by the compiler; it is not accessible by your program. This is just an example; actual padding may differ and is defined by the ABI for your architecture (CPU/OS/toolchain).
Also, the order in which the bytes of a larger type are stored in memory (endianness) depends on the architecture. As the file format requires a specific endianness, this might also have to be fixed.
Some - but not all - compilers allow specifying that a struct be packed (avoiding padding), but that still does not help with the endianness problem.
Best is to serialize the struct properly by shifts and store into a uint8_t array:
#include <stdint.h>

/** Write a uint16_t to a buffer (little-endian, as BMP requires).
 *
 * \returns The next position in the buffer, for chaining.
 */
inline uint8_t *writeUInt16(uint8_t *bp, uint16_t value)
{
    *bp++ = (uint8_t)value;
    *bp++ = (uint8_t)(value >> 8);
    return bp;
}

// similar to writeUInt16(), but for uint32_t.
... writeUInt32( ... )
...

int main(void)
{
    ...
    uint8_t buffer[BUFFER_SIZE], *bptr;

    bptr = buffer;
    bptr = writeUInt16(bptr, 0x4D42U); // FileType
    bptr = writeUInt32(bptr, 70U);     // FileSize
    ...
}
That will fill buffer with the header fields. BUFFER_SIZE has to be set according to the header you want to create. Once all fields are stored, write buffer to the file.
Declaring the functions inline hints to a good compiler to create almost optimal code when the arguments are constants.
Note also that the sizes of short, etc. are not fixed. Use the stdint.h types if you need types of defined size.
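The elided writeUInt32() would follow the same little-endian pattern; a minimal sketch, building on writeUInt16():
inline uint8_t *writeUInt32(uint8_t *bp, uint32_t value)
{
    bp = writeUInt16(bp, (uint16_t)value);         /* low 16 bits first */
    bp = writeUInt16(bp, (uint16_t)(value >> 16)); /* then the high 16  */
    return bp;
}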
The problem is that your struct has alignment padding. You ought to write it like:
#pragma pack(push, 1)
typedef struct HEADER {
    short FileType;
    int   FileSize;
    short R1;
    short R2;
    int   dOffset;
} tp_header;
#pragma pack(pop)
Just so you know, for optimization reasons the compiler by default lays it out like:
typedef struct HEADER {
    short FileType;
    char  empty1; // inserted by compiler
    char  empty2; // inserted by compiler
    int   FileSize;
    short R1;
    short R2;
    int   dOffset;
} tp_header;
But you actually made another error as well: the size of int is not fixed at 4 bytes; depending on the platform, it could be 8. This is important; in such cases you have to use fixed-width types like int32_t from <stdint.h>.
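A quick way to confirm the packing took effect is to check the struct size against the 14-byte on-disk header (a sketch):
printf("%zu\n", sizeof(tp_header)); /* 14 when packed; typically 16 without */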

C: get data from BMP

I find myself writing a simple program to extract data from a bmp file. I just got started and I am at one of those WTF moments.
When I run the program and supply this image: http://www.hack4fun.org/h4f/sites/default/files/bindump/lena.bmp
I get the output:
type: 19778
size: 12
res1: 0
res2: 54
offset: 2621440
The actual image size is 786,486 bytes. Why is my code reporting 12 bytes?
The header format specified in http://en.wikipedia.org/wiki/BMP_file_format matches my BMP_FILE_HEADER structure, so why is it being filled with the wrong information?
The image file doesn't appear to be corrupt and other images are giving equally wrong outputs. What am I missing?
#include <stdio.h>
#include <stdlib.h>
typedef struct {
    unsigned short type;
    unsigned int   size;
    unsigned short res1;
    unsigned short res2;
    unsigned int   offset;
} BMP_FILE_HEADER;
int main (int args, char **argv) {
    char *file_name = argv[1];
    FILE *fp = fopen(file_name, "rb");
    BMP_FILE_HEADER file_header;

    fread(&file_header, sizeof(BMP_FILE_HEADER), 1, fp);
    if (file_header.type != 'MB') {
        printf("ERROR: not a .bmp");
        return 1;
    }
    printf("type: %i\nsize: %i\nres1: %i\nres2: %i\noffset: %i\n", file_header.type, file_header.size, file_header.res1, file_header.res2, file_header.offset);
    fclose(fp);
    return 0;
}
Here is the header in hex:
0000000 42 4d 36 00 0c 00 00 00 00 00 36 00 00 00 28 00
0000020 00 00 00 02 00 00 00 02 00 00 01 00 18 00 00 00
The length field is the bytes 36 00 0c 00, which is in Intel (little-endian) order; handled as a 32-bit value, it is 0x000c0036, or decimal 786,486 (which matches the saved file size).
Probably your C compiler is aligning each field to a 32-bit boundary. Enable a structure-packing option, pragma, or directive.
There are two mistakes I could find in your code.
First mistake: you have to pack the structure to 1 so that every field is exactly the size it is meant to be and the compiler doesn't align fields to, for example, 4-byte boundaries. In your code the short, instead of occupying 2 bytes, was padded out to 4. The trick for this is a compiler directive that packs the nearest struct:
#pragma pack(1)
typedef struct {
    unsigned short type;
    unsigned int   size;
    unsigned short res1;
    unsigned short res2;
    unsigned int   offset;
} BMP_FILE_HEADER;
Now it should be aligned properly.
The other mistake is here:
if (file_header.type != 'MB')
You are comparing a 2-byte short against a character constant. 'MB' is a multi-character constant whose value is implementation-defined, and the compiler is probably warning you about it; canonically, single quotes hold just one character of 1-byte size.
To get around this, you can split the 2 bytes into the two known 1-byte characters ('M' and 'B') and put them together into a word yourself. For example:
if (file_header.type != (('M' << 8) | 'B'))
If you look at this expression:
'M' (which is 0x4D in ASCII) shifted 8 bits to the left results in 0x4D00; now you can just OR the next character into the low zero byte: 0x4D00 | 0x42 = 0x4D42 (where 0x42 is 'B' in ASCII). Thinking of it this way, you could just write:
if (file_header.type != 0x4D42)
Then your code should work.

How to randomly access word aligned data on ARM processors?

ARM CPUs at least up to ARMv5 do not allow random access to memory addresses which are not word aligned. The problem is described at length here: http://lecs.cs.ucla.edu/wiki/index.php/XScale_alignment – One solution is to rewrite your code or consider this alignment in the first place; however, it's not said how. Given a byte stream where I have 2- or 4-byte integers which are not word aligned in the stream, how do I access this data in a smart way without losing too much performance?
I have a code snippet which illustrates the problem:
#include <stdio.h>
#include <stdlib.h>

#define BUF_LEN 17

int main( int argc, char *argv[] ) {
    unsigned char buf[BUF_LEN];
    int i;
    unsigned short *p_short;
    unsigned long *p_long;

    /* fill array */
    (void) printf( "filling buffer:" );
    for ( i = 0; i < BUF_LEN; i++ ) {
        /* buf[i] = 1 << ( i % 8 ); */
        buf[i] = i;
        (void) printf( " %02hhX", buf[i] );
    }
    (void) printf( "\n" );

    /* testing with short */
    (void) printf( "accessing with short:" );
    for ( i = 0; i < BUF_LEN - sizeof(unsigned short); i++ ) {
        p_short = (unsigned short *) &buf[i];
        (void) printf( " %04hX", *p_short );
    }
    (void) printf( "\n" );

    /* testing with long */
    (void) printf( "accessing with long:" );
    for ( i = 0; i < BUF_LEN - sizeof(unsigned long); i++ ) {
        p_long = (unsigned long *) &buf[i];
        (void) printf( " %08lX", *p_long );
    }
    (void) printf( "\n" );

    return EXIT_SUCCESS;
}
On an x86 CPU this is the output:
filling buffer: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10
accessing with short: 0100 0201 0302 0403 0504 0605 0706 0807 0908 0A09 0B0A 0C0B 0D0C 0E0D 0F0E
accessing with long: 03020100 04030201 05040302 06050403 07060504 08070605 09080706 0A090807 0B0A0908 0C0B0A09 0D0C0B0A 0E0D0C0B 0F0E0D0C
On an ATMEL AT91SAM9G20 ARMv5 core I get (note: this is the expected behaviour of this CPU!):
filling buffer: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10
accessing with short: 0100 0100 0302 0302 0504 0504 0706 0706 0908 0908 0B0A 0B0A 0D0C 0D0C 0F0E
accessing with long: 03020100 00030201 01000302 02010003 07060504 04070605 05040706 06050407 0B0A0908 080B0A09 09080B0A 0A09080B 0F0E0D0C
So given that I want or have to access the byte stream at unaligned addresses: how do I do that efficiently on ARM?
You write your own packing/unpacking functions, which translate between aligned variables and the unaligned byte stream. For example,
void unpack_uint32(uint8_t *unaligned_stream, uint32_t *aligned_var)
{
    /* copy byte-by-byte from stream to var (little-endian stream assumed) */
    *aligned_var = (uint32_t)unaligned_stream[0]
                 | ((uint32_t)unaligned_stream[1] << 8)
                 | ((uint32_t)unaligned_stream[2] << 16)
                 | ((uint32_t)unaligned_stream[3] << 24);
}
Your example will demonstrate problems on any platform. The simple fix, of course:

unsigned char *buf;
int i;
unsigned short *p_short;
unsigned long p_long[BUF_LEN >> 2];

If you cannot organize the data with better alignment (more bytes can at times equal better performance), then do the obvious: address everything as 32 bits and chop out the portions from there; the optimizer will take care of a lot of it for the shorts and bytes within a word. (Actually, handling bytes and shorts at all, whether in structures or picked out of memory, can be more costly, as it takes extra instructions compared with passing everything around as words; you have to do your system engineering.)
An example of extracting an unaligned word (you have to manage your endianness, of course):
a = (lptr[offset] << 16) | (lptr[offset+1] >> 16);
All ARM cores from ARMv4 to the present allow unaligned access; most have the alignment exception turned on by default, but you can turn it off. The older ones rotate bytes within the word, but others can grab other byte lanes, if I am not mistaken.
Do your system engineering, do your performance analysis, and determine whether moving everything as words is faster or slower. The actual moving of data will have some overhead, but code on both sides will run much faster if everything is aligned. Can you suffer a data move that is some factor X slower in exchange for a 2x to 4x improvement in generating and receiving that data?
This function always uses aligned 32-bit accesses:
uint32_t fetch_unaligned_uint32 (uint8_t *unaligned_stream)
{
    /* Assumes a little-endian CPU. Only 32-bit-aligned addresses are ever
       dereferenced; note that up to 3 bytes beyond the 4 requested ones may
       be read (within the last aligned word). */
    uint32_t *w = (uint32_t *)((uintptr_t)unaligned_stream & ~(uintptr_t)3u);

    switch ((uintptr_t)unaligned_stream & 3u)
    {
    case 3u:
        return (w[0] >> 24) | (w[1] << 8);
    case 2u:
        return (w[0] >> 16) | (w[1] << 16);
    case 1u:
        return (w[0] >> 8) | (w[1] << 24);
    case 0u:
    default:
        return w[0];
    }
}
It may be faster than reading and shifting all 4 bytes separately.
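For example, over the 17-byte buffer from the question (output assumes a little-endian CPU; buf itself is assumed word aligned, which is typical for locals but not guaranteed):
uint8_t buf[17];
for (int i = 0; i < 17; i++)
    buf[i] = (uint8_t)i;
printf("%08lX\n", (unsigned long)fetch_unaligned_uint32(&buf[1])); /* 04030201 */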
