Copying bytes to struct gives wrong values [duplicate] - c

This question already has answers here:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
(13 answers)
Closed 8 years ago.
I'm trying to copy a byte array to a struct:
Actual bytes:
00000000 | ff 53 4d 42 72 00 00 00 00 08 01 c8 00 00 00 00 | .SMBr...........
Destination structure:
typedef struct {
uint8_t protocol[4];
uint8_t command;
uint32_t status;
uint8_t flags;
uint16_t flags2;
uint16_t pidHigh;
uint16_t somethingElse;
} MyStruct;
But for some reason, bytes in myStruct.status are not what they're supposed to be:
printf("%x", bytes[4]);
=> 72 // Ok
printf("%x", myStruct.command);
=> 72 // Ok
printf("%02x%02x%02x%02x", bytes[5], bytes[6], bytes[7], bytes[8]);
=> 00000000 // Ok
printf("%"PRIX32, myStruct.status);
=> C8010800 // What?! Why did it take the next 4 bytes... and reversed them?
Code used to copy those bytes:
MyStruct myStruct;
memcpy(&myStruct, bytes, 16);
This code is running on ARM (iPhone 5), which might explain the little-endianness of the output, but it doesn't explain why there's a +4 bytes offset in the bytes that've been copied.
What's going on here?

The memory layout of a struct is going to conform to the alignment requirements of its members. On 32-bit ARM, 16-bit values need 2 byte alignment and 32-bit and greater values require 4 byte alignment. There are padding bytes in between the structure elements when the alignment doesn't match from one to another. Due to this padding, copying or casting arrays of bytes to a struct is not going to work as you expect.
Unfortunately, there is no great way around this. You can choose to pack your structures, which may reduce their performance. You can copy each element individually. Or you can carefully arrange your structures so that they are tightly packed (assuming you are aware of the alignment rules for all platforms your code will run on).
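For example, if you rearrange your struct in this way, it will be tightly packed, with no padding bytes in the middle or at the end (its size, 16 bytes, is a multiple of 4). Note that this only helps when you also control the layout of the bytes being copied in.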
typedef struct {
uint32_t status; // +0
uint16_t flags2; // +4
uint16_t pidHigh; // +6
uint16_t somethingElse; // +8
uint8_t command; // +10
uint8_t flags; // +11
uint8_t protocol[4]; // +12
} MyStruct;
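The "copy each element individually" option could look roughly like the sketch below. It keeps the original member order and reads every field from its offset in the byte array, so the compiler's padding no longer matters. The helper names (load_le16, load_le32, decode_header) are made up for illustration, and the sketch assumes the multi-byte fields are little-endian in the byte array.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  protocol[4];
    uint8_t  command;
    uint32_t status;
    uint8_t  flags;
    uint16_t flags2;
    uint16_t pidHigh;
    uint16_t somethingElse;
} MyStruct;

/* Hypothetical helper: build a 16-bit value from 2 little-endian bytes. */
uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: build a 32-bit value from 4 little-endian bytes. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical decoder: each field is read from its offset in the byte
   array, not from its (padded) offset in the struct. */
void decode_header(MyStruct *out, const uint8_t bytes[16])
{
    memcpy(out->protocol, bytes, 4);             /* bytes 0-3   */
    out->command       = bytes[4];               /* byte  4     */
    out->status        = load_le32(bytes + 5);   /* bytes 5-8   */
    out->flags         = bytes[9];               /* byte  9     */
    out->flags2        = load_le16(bytes + 10);  /* bytes 10-11 */
    out->pidHigh       = load_le16(bytes + 12);  /* bytes 12-13 */
    out->somethingElse = load_le16(bytes + 14);  /* bytes 14-15 */
}
```

With the 16 bytes from the question, this gives command 0x72 and status 0, which is what the asker expected. Because the values are assembled with shifts, the result does not depend on the endianness of the CPU.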

The compiler aligns each member of the struct to the alignment its type requires; here status, a 32-bit value, must start at an offset that is a multiple of 4.
So command, which uses only 1 byte, is followed by 3 padding bytes before status.
You can tell the compiler not to do that with:
#pragma pack(1)
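A slightly fuller sketch of the pragma, for illustration only. #pragma pack is a compiler extension (GCC, Clang and MSVC accept the push/pop form shown here), and the name PackedHeader is made up. The compile-time checks use C11 _Static_assert to confirm that the packed layout really is 16 bytes with status at offset 5.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)    /* save the current packing, then pack to 1 byte */
typedef struct {
    uint8_t  protocol[4];
    uint8_t  command;
    uint32_t status;
    uint8_t  flags;
    uint16_t flags2;
    uint16_t pidHigh;
    uint16_t somethingElse;
} PackedHeader;          /* hypothetical name for the packed variant */
#pragma pack(pop)        /* restore the previous packing */

/* Compile-time checks (C11): the packed layout matches the 16 input bytes. */
_Static_assert(sizeof(PackedHeader) == 16, "unexpected padding");
_Static_assert(offsetof(PackedHeader, status) == 5, "status not at offset 5");
_Static_assert(offsetof(PackedHeader, flags2) == 10, "flags2 not at offset 10");
```

Using push/pop keeps the packing from leaking into structs declared later in the same file. Packing only removes the padding; the multi-byte fields still hold whatever byte order the source data uses.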

Related

Endian-independent way of using memcpy() from smaller to larger integer pointer

Suppose I have two arrays.
uint8_t src[SIZE] = { 0 };
uint32_t dst[SIZE] = { 0 };
uint8_t* srcPtr; // Points to current src value
uint32_t* dstPtr; // Points to current dst value
src holds values that sometimes need to be put into dst. Importantly, the values from src may be 8-bit, 16-bit, or 32-bit, and aren't necessarily properly aligned. So, suppose I wish to use memcpy() like below, to copy a 16-bit value
memcpy(dstPtr, srcPtr, 2);
Will I run into an endianness issue here? This works fine on little-endian systems, since if I want to copy 8, then srcPtr has 08 then 00 the bytes at dstPtr will be 08 00 00 00 and the value will be 8, as expected.
But if I were on a big-endian system, srcPtr would be 00 then 08, and the bytes at dstPtr will be 00 08 00 00 (I presume), which would take on a value of 524288.
What would be an endian-independent way to write this copy?
Will I run into an endianness issue here?
Not necessarily endianness issues per se, but yes, the specific approach you describe will run into issues with integer representation.
This works fine on little-endian systems, since if I want to copy 8, then srcPtr has 08 then 00 the bytes at dstPtr will be 08 00 00 00 and the value will be 8, as expected.
You seem to be making an assumption there: either that more bytes of the destination will be modified than you actually copy, or perhaps that relevant parts of the destination are pre-set to all zero bytes.
But you need to understand that memcpy() will copy exactly the number of bytes requested. No more than that will be read from the specified source, and no more than that will be modified in the destination. In particular, the data types of the objects to which the source and destination pointers point have no effect on the operation of memcpy().
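A small sketch to make that concrete. The function name copy_two_bytes is made up; it copies 2 bytes into a 4-byte object that was pre-filled with 0xFF and then exposes the object's raw bytes, so you can see that the last two bytes are left alone.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical demo: copy 2 bytes into a 4-byte object pre-filled with
   0xFF, then report the object's raw bytes through out[4]. */
void copy_two_bytes(uint8_t out[4])
{
    const uint8_t src[2] = { 0x08, 0x00 };
    uint32_t dst = 0xFFFFFFFFu;

    memcpy(&dst, src, 2);            /* modifies only the first 2 bytes */
    memcpy(out, &dst, sizeof dst);   /* copy out the object representation */
}
```

The raw bytes come out as 08 00 FF FF on any machine; what numeric value dst holds afterwards is where endianness comes in.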
What would be an endian-independent way to write this copy?
The most natural way to do it would be via simple assignment, relying on the compiler to perform the necessary conversion:
*dstPtr = *srcPtr;
However, I take your emphasis on the prospect that the arrays might not be aligned as a concern that it may be unsafe to dereference the source and / or destination pointer. That will not, in fact, be the case for pointers to char, but it might be the case for pointers to other types. For cases where you take memcpy as the only safe way to read from the arrays, the most portable method for converting value representations is still to rely on the implementation. For example:
uint8_t* srcPtr = /* ... */;
uint32_t* dstPtr = /* ... */;
uint16_t srcVal;
uint32_t dstVal;
memcpy(&srcVal, srcPtr, sizeof(srcVal));
dstVal = srcVal; // conversion is automatically performed
memcpy(dstPtr, &dstVal, sizeof(dstVal));
Will I run into an endianness issue here?
Yes. You're not copying, you're converting from one format to another (packing several unsigned integers into a single larger unsigned integer).
What would be an endian-independent way to write this copy?
The simple way is to make the conversion explicit, like:
for(int i = 0; i < something; i++) {
dest[i] = (uint32_t)src[i*4] | ((uint32_t)src[i*4+1] << 8) |
((uint32_t)src[i*4+2] << 16) | ((uint32_t)src[i*4+3] << 24);
}
However, for cases where using memcpy() works it's likely to be faster, and this won't change after compiling; so you could do something like:
#ifdef BIG_ENDIAN
for(int i = 0; i < something; i++) {
dest[i] = (uint32_t)src[i*4] | ((uint32_t)src[i*4+1] << 8) |
((uint32_t)src[i*4+2] << 16) | ((uint32_t)src[i*4+3] << 24);
}
#else
memcpy(dest, src, something*4);
#endif
Note: you'd also have to define the BIG_ENDIAN macro when appropriate - e.g. maybe a -D BIG_ENDIAN command line argument when starting the compiler when you know the target architecture needs it.
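As a variation on the same idea, here is a sketch that does not need a hand-defined macro. It assumes a compiler that predefines __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__ (GCC and Clang do; other compilers may not) and falls back to the explicit loop whenever the byte order is unknown. The function name load_le32_array is made up. One reason to prefer this: some system headers define a macro named BIG_ENDIAN on every target, so a bare #ifdef BIG_ENDIAN can be true even on a little-endian machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill dest[0..count) from 4*count little-endian
   source bytes. */
void load_le32_array(uint32_t *dest, const uint8_t *src, size_t count)
{
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    /* Host order already matches the source order: a plain copy works. */
    memcpy(dest, src, count * sizeof *dest);
#else
    /* Big-endian or unknown host: assemble each value explicitly. */
    for (size_t i = 0; i < count; i++) {
        dest[i] = (uint32_t)src[i * 4]
                | ((uint32_t)src[i * 4 + 1] << 8)
                | ((uint32_t)src[i * 4 + 2] << 16)
                | ((uint32_t)src[i * 4 + 3] << 24);
    }
#endif
}
```

Both branches produce the same values, so the fallback is always safe; the memcpy branch is only an optimisation.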
I'm storing 16-bit values in src which aren't 16-bit-aligned which then need to be put into a 64-bit integer
That adds another problem - some architectures do not allow misaligned accesses. You need to use explicit conversion (read 2 separate uint8_t, not a misaligned uint16_t) to avoid this problem too.
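A sketch of what "read 2 separate uint8_t" could look like for that case. The name load_le16_as_u64 is made up, and the source bytes are assumed to be stored low byte first.

```c
#include <stdint.h>

/* Hypothetical helper: read a 16-bit little-endian value from any address,
   aligned or not, and widen it to 64 bits. */
uint64_t load_le16_as_u64(const uint8_t *p)
{
    /* Two single-byte reads, so there is never a misaligned 16-bit access. */
    return (uint64_t)p[0] | ((uint64_t)p[1] << 8);
}
```

Since only single bytes are read, the pointer can be at an odd offset into src without any risk of a fault.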

Data of a struct into a union

I have declared the next union:
typedef union
{
struct
{
uint32_t data;
};
uint8_t w[4];
} xxx_data_t;
I am trying to access a memory over SPI, which only accepts 1 byte at a time.
I want to send the variable data, and I have thought of decomposing that 32-bit variable into four 8-bit variables (1 byte each), thus forming the w[4] array.
My question is: is this valid? Does this create a decomposition of my 32-bit variable?
EXAMPLE
I declare xxx_data_t my_variable.
my_variable.data=300 which in hexadecimal is 0x12C. Will the array be my_variable.w[4]=[0,0,1,44]??
data (32-bits) = 300 = 0x 00 00 01 2C
w (4 bytes) ===== = [0] [0] [1] [44] (decimal)
Thanks all.
I think it is right. You can write a test program to test it.
#include<stdio.h>
#include<stdint.h>
typedef union
{
struct
{
uint32_t data;
};
uint8_t w[4];
}data_t;
int main(){
data_t d = {.data = 0x12c};
for(int i = 0; i < 4; i++){
printf("%d\n", d.w[i]);
}
return 0;
}
44
1
0
0
My question is: is this valid?
Yes, you may do type punning between different types using union. Type punning to/from a character type (which uint8_t almost certainly is) is always safe. Alignment/padding shouldn't be an issue either, in this specific case.
Please note that you can only do this in C - you cannot do it in C++. So think twice before using C++ for hardware-related programming.
which in hexadecimal is 0x12C. Will the array be my_variable.w[4]=[0,0,1,44]??
It depends on CPU endianness (see "What is CPU endianness?").
So you can either get 00 00 01 2C on a big-endian computer such as PowerPC, or you can get 2C 01 00 00 on a little-endian computer such as x86.
As for what endianness you actually want, it is the network endianness. In case of SPI, network endianness is the same as the byte order that the part you are communicating with expects. You have to look that up in a datasheet, in case it's a "dumb" part like an ADC, display or similar. If it's a "smart" part like another MCU, then you can probably specify the network endianness yourself.
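If you want the byte order on the wire to be the same no matter which CPU runs the code, you can build the bytes with shifts instead of reading them through the union. A minimal sketch, assuming the SPI part expects the most significant byte first (check the datasheet); store_be32 is a made-up name.

```c
#include <stdint.h>

/* Hypothetical helper: split a 32-bit value into 4 bytes, most significant
   byte first, independent of the CPU's own byte order. */
void store_be32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}
```

For value 300 this fills out with 00 00 01 2C on both little-endian and big-endian CPUs. If the part wants the least significant byte first, reverse the indices.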

Parsing ID3V2 Frames in C

I have been attempting to retrieve ID3V2 Tag Frames by parsing through the mp3 file and retrieving each frame's size. So far I have had no luck.
I have effectively allocated memory to a buffer to aid in reading the file and have been successful in printing out the header version but am having difficulty in retrieving both the header and frame sizes. For the header framesize I get 1347687723, although viewing the file in a hex editor I see 05 2B 19.
Two snippets of my code:
typedef struct{ //typedef structure used to read tag information
char tagid[3]; //0-2 "ID3"
unsigned char tagversion; //3 $04
unsigned char tagsubversion;//4 00
unsigned char flags; //5 %abc0000
uint32_t size; //6-9 4 * %0xxxxxxx
}ID3TAG;
if(buff){
fseek(filename,0,SEEK_SET);
fread(&Tag, 1, sizeof(Tag),filename);
if(memcmp(Tag.tagid,"ID3", 3) == 0)
{
printf("ID3V2.%02x.%02x.%02x \nHeader Size:%lu\n", Tag.tagversion,
Tag.tagsubversion, Tag.flags, (unsigned long)Tag.size);
}
}
Due to memory alignment, the compiler has inserted 2 bytes of padding between flags and size. If your struct were laid out in memory without padding, size would be at offset 6 (from the beginning of the struct). Since a 4-byte member must sit at an offset that is a multiple of 4, the compiler adds 2 bytes so that size moves to the next such offset, which here is 8. So when you read from your file, size receives file bytes 8-11 instead of bytes 6-9. The bytes you are looking for are the four bytes that start 2 bytes before size, at offset 6 of the struct.
To fix that, you can read fields one by one.
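Reading the fields one by one could be sketched like this. The names Id3Header and parse_id3_header are made up, and the size field is kept as 4 raw bytes here because it still needs decoding before it can be used as a number.

```c
#include <string.h>

typedef struct {
    char          tagid[3];
    unsigned char tagversion;
    unsigned char tagsubversion;
    unsigned char flags;
    unsigned char size_bytes[4];   /* raw size field, not yet decoded */
} Id3Header;                       /* hypothetical name */

/* Hypothetical parser: takes the first 10 bytes of the file.
   Returns 1 on success, 0 if the "ID3" marker is missing. */
int parse_id3_header(const unsigned char raw[10], Id3Header *out)
{
    if (memcmp(raw, "ID3", 3) != 0)
        return 0;

    memcpy(out->tagid, raw, 3);            /* file bytes 0-2 */
    out->tagversion    = raw[3];           /* file byte  3   */
    out->tagsubversion = raw[4];           /* file byte  4   */
    out->flags         = raw[5];           /* file byte  5   */
    memcpy(out->size_bytes, raw + 6, 4);   /* file bytes 6-9 */
    return 1;
}
```

You would fread 10 bytes into an unsigned char array and pass that array in. Since every member of this struct is a char type, the compiler has no reason to pad it, but the parser does not rely on that either way.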
ID3v2 header structure is consistent across all ID3v2 versions (ID3v2.0, ID3v2.3 and ID3v2.4).
Its size is stored as a big-endian synch-safe int32.
Synchsafe integers are integers that keep its highest bit (bit 7) zeroed, making seven bits out of eight available. Thus a 32 bit synchsafe integer can store 28 bits of information.
Example:
255 (%11111111) encoded as a 16 bit synchsafe integer is 383 (%00000001 01111111).
Source: http://id3.org/id3v2.4.0-structure § 6.2
Below is a straightforward, real-life C# implementation that you can easily adapt to C
public int DecodeSynchSafeInt32(byte[] bytes)
{
return
bytes[0] * 0x200000 + //2^21
bytes[1] * 0x4000 + //2^14
bytes[2] * 0x80 + //2^7
bytes[3];
}
=> Using the values you read in your hex editor (00 05 2B 19), the actual tag size should be 87449 bytes.
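A possible C adaptation of the C# function above, as a sketch. The name decode_synchsafe32 is made up. Unlike the C# version it masks each byte with 0x7F, which is my own addition so that a stray high bit cannot leak into the result.

```c
#include <stdint.h>

/* Hypothetical C version: four bytes, 7 payload bits each, most significant
   byte first. */
uint32_t decode_synchsafe32(const uint8_t bytes[4])
{
    return ((uint32_t)(bytes[0] & 0x7F) << 21) |
           ((uint32_t)(bytes[1] & 0x7F) << 14) |
           ((uint32_t)(bytes[2] & 0x7F) << 7)  |
            (uint32_t)(bytes[3] & 0x7F);
}
```

Taking the bytes quoted in the question (05 2B 19, with a leading 00 assumed), this works out to 5 * 16384 + 43 * 128 + 25 = 87449.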
By coincidence I am also working on an ID3V2 reader. The doc says that the size is encoded in four 7-bit bytes. So you need another step to convert the byte array into an integer... I don't think just reading those bytes as an int will work because of the null bit on top.

Structure and pointer

I'm having a problem getting the entry memory address to a member variable of a structure. I've tried in two ways, one of which didn't work properly. It would be very good if you guys give me some advice.
First, I defined a structure named BITMAP_HEADER.
struct BITMAP_HEADER
{
WORD bfType ;
DWORD bfSize ;
WORD bfReserved1 ;
WORD bfReserved2 ;
DWORD bfOffBits ;
} ;
Second, I defined and initialized some variables. Please look at the code below before you read on. In case you wonder why I used a character pointer: I needed to access each byte of the integer bfSize.
struct BITMAP_HEADER bitmap_header ;
char* pSize = (char*)&bitmap_header.bfSize;
Third, I got a memory address to bfSize in two different ways and printed the values.
1. printf("%X\n", *pSize) ;
2. printf("%X\n", (unsigned char)*(((char*)&bitmap_header)+2)) ;
(1) directly got a memory address to the bitmap_header.bfSize.
(2) got a memory address to the structure BITMAP_HEADER and shifted the pointer to the next by 2 bytes.
Finally, here is the result.
2D
F6
For your information, here is the hex data of the structure BITMAP_HEADER.
42 4D / F6 C6 2D 00 / 00 00 / 00 00 / 36 00 00 00
Why didn't the first method work? I thought the two methods were exactly the same.
You're running into structure padding here. The compiler is inserting two bytes' worth of padding between the bfType and bfSize fields, to align bfSize to a 4-byte boundary, since bfSize is a DWORD.
Generally speaking, you cannot rely on being able to calculate exact offsets within a structure, since the compiler might add padding between members. You can control this to some degree using compiler-specific bits; for example, on MSVC, the pack pragma, but I would not recommend this. Structure padding is there to satisfy member alignment requirements, and some architectures will fault on unaligned accesses. (Others might fix up the alignment themselves, but typically do this rather slowly.)
See also: http://en.wikipedia.org/wiki/Data_structure_alignment#Data_structure_padding
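If you want to see where the compiler actually put each member, offsetof from <stddef.h> will tell you. A small sketch; the WORD and DWORD typedefs are stand-ins I am assuming for the Windows types (16-bit and 32-bit unsigned), and print_layout is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint16_t WORD;    /* assumption: stand-in for the Windows typedef */
typedef uint32_t DWORD;   /* assumption: stand-in for the Windows typedef */

struct BITMAP_HEADER
{
    WORD  bfType;
    DWORD bfSize;
    WORD  bfReserved1;
    WORD  bfReserved2;
    DWORD bfOffBits;
};

/* Print the offset the compiler chose for each member, plus the total size. */
void print_layout(void)
{
    printf("bfType      at %zu\n", offsetof(struct BITMAP_HEADER, bfType));
    printf("bfSize      at %zu\n", offsetof(struct BITMAP_HEADER, bfSize));
    printf("bfReserved1 at %zu\n", offsetof(struct BITMAP_HEADER, bfReserved1));
    printf("bfReserved2 at %zu\n", offsetof(struct BITMAP_HEADER, bfReserved2));
    printf("bfOffBits   at %zu\n", offsetof(struct BITMAP_HEADER, bfOffBits));
    printf("sizeof      =  %zu\n", sizeof(struct BITMAP_HEADER));
}
```

On a compiler that aligns 32-bit integers to 4 bytes, you would expect bfSize to be reported at offset 4 rather than 2, which is exactly the gap between the two methods in the question.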
For raw data whose structure is known in advance, it is usually better to read it into an array and use defined offsets to access the required fields. This way you won't have to worry about the compiler's behaviour (which often is not what you expected). Your code would look like:
#define FIELD_TYPE 0
#define FIELD_SIZE 2
#define FIELD_RES1 6
#define FIELD_RES2 8
#define FIELD_OFF 10
#define SIZE_HEADER 14
static uint8_t header[SIZE_HEADER];
<...>
uint8_t * pheader = header;
DWORD offset_bits;
memcpy(&offset_bits, pheader + FIELD_OFF, sizeof offset_bits); /* needs <string.h>; copies all 4 bytes without a misaligned read */
P.S. To make this code portable, the sizes of WORD and DWORD and the endianness must be considered; a few #ifdef ... #else ... #endif blocks should help with that.
P.P.S. It would be even better to use manual logical operations and shift operators instead of copying the raw bytes, but I left it this way for the sake of brevity.
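The shift-based variant mentioned in the P.P.S. might look like the sketch below. BMP headers store their fields in little-endian order; the helper names read_le16 and read_le32 are made up, and the offsets are the same ones defined above.

```c
#include <stdint.h>

#define FIELD_TYPE  0
#define FIELD_SIZE  2
#define FIELD_OFF   10
#define SIZE_HEADER 14

/* Hypothetical helper: 16-bit little-endian read from a byte array. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: 32-bit little-endian read from a byte array. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

With the header bytes from the question, read_le32(header + FIELD_SIZE) gives 0x002DC6F6 and read_le32(header + FIELD_OFF) gives 0x36. This needs no #ifdef for endianness and does not depend on how WORD and DWORD are defined.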

How are the structure members stored on a little endian machine?

struct Dummy {
int x;
char y;
};
int main() {
struct Dummy dum;
dum.x = 10;
dum.y = 'a';
}
What would the layout of the structure members be on a little endian machine?
Would it be something like this?
0x1000 +0 +1 +2 +3
___________________
x: | 10 | 0 | 0 | 0 |
-------------------
y: | 'a'| 0 | 0 | 0 |
-------------------
0x1000 +4 +5 +6 +7
I think you'll find this question useful. The endianness is usually relevant to a word in memory, not to the whole structure.
Structure layout is a compiler implementation detail, affected by the default packing. Endianness normally only affects the order of the bytes in a structure member value, not the layout. Check the small print in the compiler manual or use sizeof and the offsetof macro to experiment.
The layout you documented in your question is indeed very common for a 32-bit LE compiler.
The structure members will be in the order declared, with padding inserted as necessary so each field is properly aligned for its type and with padding inserted as necessary at the end such that in an array each subsequent structure is properly aligned and begins immediately after the end of the previous structure. It is also possible (but unlikely) that additional unnecessary padding will be inserted between any two elements or at the end.
Each field itself will be stored as appropriate for the type on the compiler and architecture, e.g. the int 10 would be stored as the bytes 0a 00 00 00 on a normal little-endian machine with 32-bit ints.
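To check this on your own machine, you can copy the struct's bytes out and look at them. A minimal sketch; dummy_bytes and print_dummy are made-up names. Keep in mind that padding bytes have unspecified values, so only the bytes that belong to x and y are meaningful.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct Dummy {
    int  x;
    char y;
};

/* Copy the object representation of *d into out, which must have room for
   sizeof(struct Dummy) bytes. */
void dummy_bytes(const struct Dummy *d, unsigned char *out)
{
    memcpy(out, d, sizeof *d);
}

/* Print member offsets, total size, and every byte of the struct. */
void print_dummy(const struct Dummy *d)
{
    unsigned char raw[sizeof(struct Dummy)];

    dummy_bytes(d, raw);
    printf("x at %zu, y at %zu, sizeof %zu\n",
           offsetof(struct Dummy, x), offsetof(struct Dummy, y),
           sizeof(struct Dummy));
    for (size_t i = 0; i < sizeof raw; i++)
        printf("%02x ", raw[i]);
    printf("\n");
}
```

On a little-endian machine the first byte printed should be 0a; on a big-endian one the 0a shows up in the last byte of x instead. The byte at the offset of y is 'a' (61 in ASCII) either way.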
