using memcpy for structs - c

I have a problem when using memcpy on a struct.
Consider the following struct
struct HEADER
{
unsigned int preamble;
unsigned char length;
unsigned char control;
unsigned int destination;
unsigned int source;
unsigned int crc;
};
If I use memcpy to copy data from a receive buffer into this struct, the copy is OK, but if I redeclare the struct as follows:
struct HEADER
{
unsigned int preamble;
unsigned char length;
struct CONTROL control;
unsigned int destination;
unsigned int source;
unsigned int crc;
}
struct CONTROL
{
unsigned dir : 1;
unsigned prm : 1;
unsigned fcb : 1;
unsigned fcb : 1;
unsigned function_code : 4;
}
Now if I use the same memcpy code as before, the first two variables (preamble and length) are copied OK. The control is totally messed up, and the last three variables are shifted up by one, i.e. crc = 0, source = crc, destination = source...
Anyone got any good suggestions for me?

Do you know that the format in the receive buffer is correct, when you add the control in the middle?
Anyway, your problem is that bitfields are the wrong tool here: you can't depend on the layout in memory being anything in particular, least of all the exact same one you've chosen for the serialized form.
It's almost never a good idea to try to directly copy structures to/from external storage; you need proper serialization. The compiler can add padding and alignment between the fields of a structure, and using bitfields makes it even worse. Don't do this.
Implement proper serialization/deserialization functions:
unsigned char * header_serialize(unsigned char *put, const struct HEADER *h);
unsigned char * header_deserialize(unsigned char *get, struct HEADER *h);
That go through the structure and read/write as many bytes as you feel are needed (possibly for each field):
static unsigned char * uint32_serialize(unsigned char *put, uint32_t x)
{
*put++ = (x >> 24) & 255;
*put++ = (x >> 16) & 255;
*put++ = (x >> 8) & 255;
*put++ = x & 255;
return put;
}
unsigned char * header_serialize(unsigned char *put, const struct HEADER *h)
{
const uint8_t ctrl_serialized = (h->control.dir << 7) |
(h->control.prm << 6) |
(h->control.fcb << 5) |
(h->control.function_code);
put = uint32_serialize(put, h->preamble);
*put++ = h->length;
*put++ = ctrl_serialized;
put = uint32_serialize(put, h->destination);
put = uint32_serialize(put, h->source);
put = uint32_serialize(put, h->crc);
return put;
}
Note how this needs to be explicit about the endianness of the serialized data, which is something you always should care about (I used big-endian). It also explicitly builds a single uint8_t version of the control fields, assuming the struct version was used.
Also note that there's a typo in your CONTROL declaration; fcb occurs twice.
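The answer above declares header_deserialize() but only shows the serializing direction. A minimal sketch of the reverse direction might look like the following. It assumes the same big-endian wire layout, and it uses a hypothetical struct header that stores the control bits as plain uint8_t members; those names are illustrative and not from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form: control bits kept as separate plain fields. */
struct header {
    uint32_t preamble;
    uint8_t  length;
    uint8_t  dir, prm, fcb, function_code;
    uint32_t destination, source, crc;
};

/* Bytes on the wire: 4 + 1 + 1 + 4 + 4 + 4. */
enum { HEADER_WIRE_SIZE = 18 };

/* Read one big-endian 32-bit value and return the advanced read pointer. */
static const unsigned char *uint32_deserialize(const unsigned char *get, uint32_t *x)
{
    *x = ((uint32_t)get[0] << 24) | ((uint32_t)get[1] << 16) |
         ((uint32_t)get[2] << 8)  |  (uint32_t)get[3];
    return get + 4;
}

/* Unpack a header byte by byte; the buffer must hold HEADER_WIRE_SIZE bytes. */
static const unsigned char *header_deserialize(const unsigned char *get, struct header *h)
{
    assert(get != NULL && h != NULL);
    get = uint32_deserialize(get, &h->preamble);
    h->length = *get++;
    const uint8_t ctrl = *get++;
    h->dir = (ctrl >> 7) & 1u;          /* assumed bit positions, mirroring */
    h->prm = (ctrl >> 6) & 1u;          /* the serializer shown above       */
    h->fcb = (ctrl >> 5) & 1u;
    h->function_code = ctrl & 0x0Fu;
    get = uint32_deserialize(get, &h->destination);
    get = uint32_deserialize(get, &h->source);
    get = uint32_deserialize(get, &h->crc);
    return get;
}
```

Because every byte is read explicitly, the result does not depend on the compiler's padding, on bit-field ordering, or on the host's endianness.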

Using struct CONTROL control; instead of unsigned char control; leads to a different alignment inside the struct and so filling it with memcpy() produces a different result.

memcpy copies the values of bytes from the location pointed to by source directly to the memory block pointed to by destination.
The underlying types of the objects pointed to by the source and destination pointers are irrelevant to this function; the result is a binary copy of the data.
So if the structure contains any padding that the source data does not, you will get messed-up results.

Check sizeof(struct CONTROL); I think it would be 2 or 4 depending on the machine. Since you are using unsigned bitfields (and unsigned is shorthand for unsigned int), the whole structure (struct CONTROL) takes at least the size of an unsigned int, i.e. 2 or 4 bytes.
And using unsigned char control takes 1 byte for this field. So there will definitely be a mismatch starting with the control variable.
Try rewriting struct CONTROL as below:
struct CONTROL
{
unsigned char dir : 1;
unsigned char prm : 1;
unsigned char fcb : 1;
unsigned char fcb : 1; /* duplicate member name copied from the question; one of the two must be renamed */
unsigned char function_code : 4;
};
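One way to see the mismatch directly is to ask the compiler where it puts each member. The sketch below is illustrative only: the type names are hypothetical, and "fcv" stands in for the member that is declared twice in the question. The numbers it prints are implementation-defined, which is exactly the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bit-field version; "fcv" is a hypothetical name for the duplicated member. */
struct control_bits {
    unsigned dir : 1;
    unsigned prm : 1;
    unsigned fcb : 1;
    unsigned fcv : 1;
    unsigned function_code : 4;
};

struct header_plain {
    unsigned int preamble;
    unsigned char length;
    unsigned char control;
    unsigned int destination;
    unsigned int source;
    unsigned int crc;
};

struct header_bits {
    unsigned int preamble;
    unsigned char length;
    struct control_bits control;
    unsigned int destination;
    unsigned int source;
    unsigned int crc;
};

/* How many bytes later `destination` sits once `control` is a bit-field struct. */
static size_t destination_shift(void)
{
    /* Guard the unsigned subtraction below. */
    assert(offsetof(struct header_bits, destination) >=
           offsetof(struct header_plain, destination));
    return offsetof(struct header_bits, destination) -
           offsetof(struct header_plain, destination);
}

/* Print the layout the compiler actually chose for both variants. */
static void print_layouts(void)
{
    printf("sizeof(struct control_bits) = %zu\n", sizeof(struct control_bits));
    printf("plain: control at %zu, destination at %zu, total %zu\n",
           offsetof(struct header_plain, control),
           offsetof(struct header_plain, destination),
           sizeof(struct header_plain));
    printf("bits:  control at %zu, destination at %zu, total %zu\n",
           offsetof(struct header_bits, control),
           offsetof(struct header_bits, destination),
           sizeof(struct header_bits));
    printf("destination moved by %zu byte(s)\n", destination_shift());
}
```

If destination_shift() is non-zero on your target, the same memcpy() call fills the two structs differently, which would explain the shifted fields described in the question.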

The clean way would be to use a union, like this:
struct HEADER
{
unsigned int preamble;
unsigned char length;
union {
unsigned char all;
struct CONTROL control;
} uni;
unsigned int destination;
unsigned int source;
unsigned int crc;
};
The user of the struct can then choose the way he wants to access the thing.
struct HEADER thing = {... };
if (thing.uni.control.dir) { ...}
or
#if ( !FULL_MOON ) /* Update: stacking of bits within a word appears to depend on the phase of the moon */
if (thing.uni.all & 1) { ... }
#else
if (thing.uni.all & 0x80) { ... }
#endif
Note: this construct does not solve endianness issues; those will need explicit conversions.
Note2: and you'll have to check the bit-endianness of your compiler, too.
Also note that bitfields are not very useful, especially if the data goes over the wire, and the code is expected to run on different platforms, with different alignment and / or endianness. Plain unsigned char or uint8_t plus some bitmasking yields much cleaner code. For example, check the IP stack in the BSD or linux kernels.
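As a rough illustration of the "plain uint8_t plus bitmasking" approach, the control byte could be handled as below. The bit positions are an assumption (they mirror the serializer in the first answer: dir in bit 7, prm in bit 6, fcb in bit 5, function code in the low nibble), and all macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions within the control byte. */
#define CTRL_DIR       0x80u
#define CTRL_PRM       0x40u
#define CTRL_FCB       0x20u
#define CTRL_FUNC_MASK 0x0Fu

/* Build a control byte from its parts. */
static uint8_t ctrl_make(int dir, int prm, int fcb, uint8_t function_code)
{
    assert(function_code <= CTRL_FUNC_MASK);
    return (uint8_t)((dir ? CTRL_DIR : 0u) |
                     (prm ? CTRL_PRM : 0u) |
                     (fcb ? CTRL_FCB : 0u) |
                     (function_code & CTRL_FUNC_MASK));
}

/* Accessors: each one masks out exactly the bits it documents. */
static int ctrl_dir(uint8_t c) { return (c & CTRL_DIR) != 0; }
static int ctrl_prm(uint8_t c) { return (c & CTRL_PRM) != 0; }
static int ctrl_fcb(uint8_t c) { return (c & CTRL_FCB) != 0; }
static uint8_t ctrl_function_code(uint8_t c) { return (uint8_t)(c & CTRL_FUNC_MASK); }
```

With this form the struct keeps a one-byte control member, and the meaning of each bit is fixed by the masks rather than by the compiler.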

Related

Bitfields and Unions in C giving problems

I am implementing a radio standard and have hit a problem with unions in structures and memory size. In the example below I need this structure to be located in a single byte of memory (as per the radio standard), but it's currently giving me a size of 2 bytes. After much digging I understand that it's because the union's "size" is a byte rather than 3 bits... but I haven't worked out a way around this.
I have looked at:
Bitfields in C with struct containing union of structs; and
Will this bitfield work the way I expect?
But neither seem to give me a solution.
Any ideas?
Thanks!
#ifdef WIN32
#pragma pack(push)
#pragma pack(1)
#endif
typedef struct three_bit_struct
{
unsigned char bit_a : 1;
unsigned char bit_b : 1;
unsigned char bit_c : 1;
}three_bit_struct_T;
typedef union
{
three_bit_struct_T three_bit_struct;
unsigned char another_three_bits : 3;
}weird_union_T;
typedef struct
{
weird_union_T problem_union;
unsigned char another_bit : 1;
unsigned char reserved : 4;
}my_structure_T;
int _tmain(int argc, _TCHAR* argv[])
{
int size;
size = sizeof(my_structure_T);
return 0;
}
#ifdef WIN32
#pragma pack(pop)
#endif
The problem is that the size of three_bit_struct_T will be rounded up to the nearest byte* regardless of the fact that it only contains three bits in its bitfield. A struct simply cannot have a size which is part-of-a-byte. So when you augment it with the extra fields in my_structure_T, inevitably the size will spill over into a second byte.
To cram all that stuff into a single byte, you'll have to put all the bitfield members in the outer my_structure_T rather than having them as an inner struct/union.
I think the best you can do is have the whole thing as a union.
typedef struct
{
unsigned char bit_a : 1;
unsigned char bit_b : 1;
unsigned char bit_c : 1;
unsigned char another_bit : 1;
unsigned char reserved : 4;
} three_bit_struct_T;
typedef struct
{
unsigned char another_three_bits : 3;
unsigned char another_bit : 1;
unsigned char reserved : 4;
} another_three_bit_struct_T;
typedef union
{
three_bit_struct_T three_bit_struct;
another_three_bit_struct_T another_three_bit_struct;
} my_union_T;
(*) or word, depending on alignment/packing settings.
Two pieces of good advice: never use struct/union for data protocols, and never use bit-fields anywhere in any situation.
The best way to implement this is through bit masks and bit-wise operators.
#define BYTE_BIT_7 0x80u
uint8_t byte = 0;
byte |= BYTE_BIT_7; // set bit to 1
byte &= ~BYTE_BIT_7; // set bit to 0
if (byte & BYTE_BIT_7) { /* bit is set */ } // check bit value
This code is portable to every C compiler in the world and also to C++.

Is it valid to use bit fields with union?

I have used bit field with a structure like this,
struct
{
unsigned int is_static: 1;
unsigned int is_extern: 1;
unsigned int is_auto: 1;
} flags;
Now I wondered whether this can be done with a union, so I modified the code like this:
union
{
unsigned int is_static: 1;
unsigned int is_extern: 1;
unsigned int is_auto: 1;
} flags;
I found that bit fields in a union compile, but judging from the output, all of the fields in the union share a single bit. So using bit fields in a union is apparently not an error, but it seems to me that using them like this is not operationally correct. So what is the answer: is it valid to use bit fields with a union?
It is valid but as you found out, not useful the way you have done it there.
You might do something like this so you can reset all the bits at the same time using flags.
union {
struct {
unsigned int is_static: 1;
unsigned int is_extern: 1;
unsigned int is_auto: 1;
};
unsigned int flags;
};
Or you might do something like this:
union {
struct {
unsigned int is_static: 1;
unsigned int is_extern: 1;
unsigned int is_auto: 1;
};
struct {
unsigned int is_ready: 1;
unsigned int is_done: 1;
unsigned int is_waiting: 1;
};
};
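To make the "reset all the bits at once" idea concrete, here is a small sketch. The union tag and helper names are hypothetical, and anonymous struct members require C11 or a compiler extension. Reading flags after writing a bit-field relies on the compiler's bit-field layout, so only the zero/non-zero distinction is used here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical named type wrapping the first union shown above. */
union storage_flags {
    struct {
        unsigned int is_static : 1;
        unsigned int is_extern : 1;
        unsigned int is_auto   : 1;
    };
    unsigned int flags;   /* overlays all of the bit-fields */
};

/* Clear every flag with a single store. */
static void storage_flags_clear(union storage_flags *f)
{
    assert(f != NULL);
    f->flags = 0u;
}

/* True if at least one flag is set. */
static int storage_flags_any(const union storage_flags *f)
{
    assert(f != NULL);
    return f->flags != 0u;
}
```

A caller would typically start from storage_flags_clear(), set individual members such as f.is_extern = 1, and use storage_flags_any() as a quick "is anything set" test.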
You are given a gun and bullets. Is it okay to shoot your self in foot with it? Of course not, but nobody can stop you from doing this if you want to.
My point is, just like gun and bullets, union and bit fields are tools and they have their purpose, uses and "abuses". So using bitfields in union, as you have written above, is perfectly valid C but a useless piece of code. All the fields inside union share same memory so all the bitfields you mention are essentially same flag as they share same memory.
Here is an example on how to use bit fields with unions. I'm also showing how to arrange for MSB. In pictures it would look something like this:
MSB          LSB
 7            0
+-------+------+
| three | five |
| bits  | bits |
+-------+------+
// A struct tag definition of
// two bit fields in an 8-bit register
struct fields_tag {
// LSB
unsigned int five:5;
unsigned int three:3;
// MSB
};
// here is a tag and typedef for less typing
// to modify the 8-bit value as a whole
// and read in parts.
typedef union some_reg_tag {
uint8_t raw;
struct fields_tag fields;
} some_reg_t;
Here is how to use the bit fields with Arduino
some_reg_t a_register;
a_register.raw = 0xC2; // assign using raw field.
Serial.print("some reg = ");
Serial.println(a_register.raw, HEX); // dump entire register
Serial.print("some reg.three = "); // dump register by field
Serial.println(a_register.fields.three, HEX);
Serial.print("some reg.five = ");
Serial.println(a_register.fields.five, HEX);
Here is the output showing the results
some reg = C2
some reg.three = 6
some reg.five = 2
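The output above implies that, on that compiler, "five" occupies bits 4..0 and "three" occupies bits 7..5. Since the placement of bit-fields is implementation-defined, a shift-and-mask version gives the same split on any compiler. The following is a sketch with hypothetical names, not part of the original answer.

```c
#include <assert.h>
#include <stdint.h>

/* Layout taken from the output above: "three" in bits 7..5, "five" in bits 4..0. */
#define REG_FIVE_MASK   0x1Fu
#define REG_THREE_MASK  0x07u
#define REG_THREE_SHIFT 5u

/* Extract the low five bits. */
static uint8_t reg_five(uint8_t raw)
{
    return (uint8_t)(raw & REG_FIVE_MASK);
}

/* Extract the high three bits. */
static uint8_t reg_three(uint8_t raw)
{
    return (uint8_t)((raw >> REG_THREE_SHIFT) & REG_THREE_MASK);
}

/* Combine the two fields back into one register value. */
static uint8_t reg_pack(uint8_t three, uint8_t five)
{
    assert(three <= REG_THREE_MASK && five <= REG_FIVE_MASK);
    return (uint8_t)((three << REG_THREE_SHIFT) | five);
}
```

For the value 0xC2 used above, reg_three() yields 6 and reg_five() yields 2, matching the printed fields.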

Dynamic variable length inside a structure C

I am trying to make a structure for a data packet that has a dynamic payload length, which is determined by a variable within the header struct (LEN).
I am unsure how to do this properly and I am confused by some of the examples that I have come across. Below is the structure that is the basis of what I will be using.
Thanks.
struct packet
{
unsigned char payload;
unsigned int CRC : 16;
struct header
{
unsigned char SRC;
unsigned char DST;
unsigned char NS : 3; //3 bits long
unsigned char NR : 3;
unsigned char RSV : 1; //1 bit long
unsigned char LST : 1;
unsigned char OP;
unsigned char LEN;
} HEADER;
};
struct packet PACKET;
You can use a construct sometimes referred to as a "stretchy array" (or, as @Jerry Coffin points out, a "flexible array member"). The variable-length payload needs to be at the end:
struct packet
{
struct header
{
unsigned char SRC;
unsigned char DST;
unsigned char NS : 3; //3 bits long
unsigned char NR : 3;
unsigned char RSV : 1; //1 bit long
unsigned char LST : 1;
unsigned char OP;
unsigned char LEN;
} HEADER;
unsigned int CRC : 16;
unsigned char payload[1]; //STRETCHY.
};
struct packet PACKET;
This type of structure needs to be dynamically allocated, since you need to manually make enough room for the payload.
struct packet *p = malloc(sizeof(struct packet) + payloadLength * sizeof(char)); /* PACKET is a variable, not a type */
p->HEADER.LEN = payloadLength; /* HEADER is a member, not a pointer */
//fill in rest of header here.
memcpy(p->payload, incomingData, payloadLength);
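Since C99 the same idea can be written with a true flexible array member (an array declared with empty brackets as the last member), which avoids the extra payload[1] byte. The sketch below uses simplified, hypothetical type and field names rather than the question's bit-field header, and it leaves the CRC as a placeholder.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical header; the real one has more fields. */
struct packet_header {
    uint8_t src;
    uint8_t dst;
    uint8_t op;
    uint8_t len;      /* number of payload bytes that follow */
};

struct packet {
    struct packet_header header;
    uint16_t crc;
    uint8_t payload[];   /* C99 flexible array member, must be last */
};

/* Allocate a packet with room for `len` payload bytes and copy them in.
   Returns NULL on allocation failure; the caller frees the result. */
static struct packet *packet_create(const uint8_t *data, uint8_t len)
{
    assert(data != NULL || len == 0);
    struct packet *p = malloc(sizeof *p + len);
    if (p == NULL)
        return NULL;
    memset(&p->header, 0, sizeof p->header);
    p->header.len = len;
    p->crc = 0;          /* placeholder: compute the real CRC here */
    if (len > 0)
        memcpy(p->payload, data, len);
    return p;
}
```

sizeof(struct packet) does not include the payload, so the allocation size is simply the fixed part plus len bytes.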
Make the payload a pointer instead and allocate it at runtime according to the value of LEN in the header field.
You need to include the length in your struct but not the data. Depending on what you are doing, you handle the data differently. The struct should probably contain a pointer to the data, but you have to handle this when you serialize and deserialize the struct: the pointer value itself will mean nothing when you read it back in.
So you write the struct out, including the size of the data field, and then write the data itself. When you read it back, you read the struct (for binary data, with fread rather than fscanf), then read from the stream the number of bytes the struct tells you to and store that as your data, and finally store a pointer to the newly read data in the struct you just read. If you are reading in multiple items like this, you can continue at that point with the next struct, then its data, and so on.
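The write-length-then-data scheme described above could be sketched as follows. This is an illustration under simplifying assumptions: the "struct" is reduced to a single one-byte length, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Write one record: a 1-byte length followed by that many payload bytes.
   Returns 0 on success, -1 on a write error. */
static int record_write(FILE *out, const uint8_t *data, uint8_t len)
{
    assert(out != NULL && (data != NULL || len == 0));
    if (fwrite(&len, 1, 1, out) != 1)
        return -1;
    if (len > 0 && fwrite(data, 1, len, out) != len)
        return -1;
    return 0;
}

/* Read one record. Returns a malloc'd payload (caller frees) and stores its
   length in *len_out, or returns NULL on end of file or error. */
static uint8_t *record_read(FILE *in, uint8_t *len_out)
{
    uint8_t len;
    assert(in != NULL && len_out != NULL);
    if (fread(&len, 1, 1, in) != 1)
        return NULL;
    uint8_t *buf = malloc(len > 0 ? len : 1u);   /* never request 0 bytes */
    if (buf == NULL)
        return NULL;
    if (len > 0 && fread(buf, 1, len, in) != len) {
        free(buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}
```

The pointer to the payload is never written to the stream; it is recreated on the reading side once the length is known.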

Union to unsigned long long int cast

I have a union as follows:
typedef unsigned long GT_U32;
typedef unsigned short GT_U16;
typedef unsigned char GT_U8;
typedef union
{
GT_U8 c[8];
GT_U16 s[4];
GT_U32 l[2];
} GT_U64;
I want to cast this union into the following:
typedef unsigned long long int UINT64;
The casting function I wrote is as follows:
UINT64 gtu64_to_uint64_cast(GT_U64 number_u)
{
UINT64 casted_number = 0;
casted_number = number_u.l[0];
casted_number = casted_number << 32;
casted_number = casted_number | number_u.l[1];
return casted_number;
}
This function is using the l member to perform the shifting and bitwise or. What will happen if the s or c members of the union are used to set its values?
I am not sure if this function will always cast the values correctly. I suspect it has something to do with the byte ordering of long and short. Can any body help?
Full example program is listed below.
#include <stdio.h>
typedef unsigned long GT_U32;
typedef unsigned short GT_U16;
typedef unsigned char GT_U8;
typedef union
{
GT_U8 c[8];
GT_U16 s[4];
GT_U32 l[2];
} GT_U64;
typedef unsigned long long int UINT64;
UINT64 gtu64_to_uint64_cast(GT_U64 number_u)
{
UINT64 casted_number = 0;
casted_number = number_u.l[0];
casted_number = casted_number << 32;
casted_number = casted_number | number_u.l[1];
return casted_number;
}
int main()
{
UINT64 left;
GT_U64 right;
right.s[0] = 0x00;
right.s[1] = 0x00;
right.s[2] = 0x00;
right.s[3] = 0x01;
left = gtu64_to_uint64_cast(right);
printf ("%llu\n", left);
return 0;
}
That's really ugly and implementation-dependent - just use memcpy, e.g.
UINT64 gtu64_to_uint64_cast(GT_U64 number_u)
{
UINT64 casted_number;
assert(sizeof(casted_number) == sizeof(number_u));
memcpy(&casted_number, &number_u, sizeof(number_u));
return casted_number;
}
First of all, please use the typedefs from "stdint.h" for such a purpose. You have plenty of assumptions of what the width of integer types would be, don't do that.
What will happen if the s or c members
of the union are used to set its
values?
Reading a member of a union that has been written to through another member may cause undefined behavior if there are padding bytes or padding bits. The only exception from that is unsigned char that may always be used to access the individual bytes. So access through c is fine. Access through s may (in very unlikely circumstances) cause undefined behavior.
And there is no such thing as a "correct" cast in your case. It simply depends on how you want to interpret an array of small numbers as one big number. One possible interpretation for that task is the one you gave.
This code should work independently of padding, endianness, union access and implicit integer promotions.
uint64_t gtu64_to_uint64_cast (const GT_U64* number_u)
{
uint64_t casted_number = 0;
uint8_t i;
for(i=0; i<8; i++)
{
casted_number |= (uint64_t) number_u->c[i] << i*8U;
}
return casted_number;
}
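The loop above treats c[0] as the least significant byte. Which byte counts as most significant is a choice the caller has to make, so it can help to spell out both interpretations side by side. The function names below are hypothetical, and the functions take a plain byte array rather than the GT_U64 union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interpret c[0] as the most significant byte (big-endian order). */
static uint64_t bytes_to_u64_be(const uint8_t c[8])
{
    uint64_t value = 0;
    assert(c != NULL);
    for (size_t i = 0; i < 8; i++)
        value = (value << 8) | c[i];
    return value;
}

/* Interpret c[0] as the least significant byte (little-endian order). */
static uint64_t bytes_to_u64_le(const uint8_t c[8])
{
    uint64_t value = 0;
    assert(c != NULL);
    for (size_t i = 0; i < 8; i++)
        value |= (uint64_t)c[i] << (8 * i);
    return value;
}
```

Both functions give the same result on every host, because they never reinterpret memory; they only depend on the order of the bytes in the array.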
If you can't change the declaration of the union to include an explicit 64-bit field, perhaps you can just wrap it? Like this:
UINT64 convert(const GT_U64 *value)
{
union {
GT_U64 in;
UINT64 out;
} tmp;
tmp.in = *value;
return tmp.out;
}
This does violate the rule that says you can only read from the union member last written to, so maybe it'll set your hair on fire. I think it will be quite safe though, don't see a case where a union like this would include padding but of course I could be wrong.
I mainly wanted to include this since just because you can't change the declaration of the "input" union doesn't mean you can't do almost the same thing by wrapping it.
Probably an easier way to cast is to use union with a long long member:
typedef unsigned long long int UINT64;
typedef unsigned long GT_U32;
typedef unsigned short GT_U16;
typedef unsigned char GT_U8;
typedef union
{
GT_U8 c[8];
GT_U16 s[4];
GT_U32 l[2];
UINT64 ll;
} GT_U64;
Then, simply accessing ll will get the 64-bit value without having to do an explicit cast. You will need to tell your compiler to use one-byte struct packing.
You don't specify what "cast the values correctly" means.
This code will cast in the simplest possible way, but it'll give different results depending on your systems endianness.
UINT64 gtu64_to_uint64_cast(GT_U64 number_u) {
assert(sizeof(UINT64) == sizeof(GT_U64));
return *(UINT64 *) &number_u;
}

Pointer casting problem with struct array member

I've run across this source in a legacy code base and I don't really know why exactly it behaves the way it does.
In the following code, the pData struct member either contains the data or a pointer to the real data in shared memory. The message is sent using IPC (msgsnd() and msgrcv()). Using the pointer casts (that are currently commented out), it fails using GCC 4.4.1 on an ARM target: the member uLen gets modified. When using memcpy(), everything works as expected. I can't really see what is wrong with the pointer casting. What is wrong here?
typedef struct {
long mtype;
unsigned short uRespQueue;
unsigned short uID;
unsigned short uLen;
unsigned char pData[8000];
} message_t;
// changing the pointer in the struct
{
unsigned char *pData = <some_pointer>;
#if 0
*((unsigned int *)pMessage->pData) = (unsigned int)pData;
#else
memcpy(pMessage->pData, &pData, sizeof(unsigned int));
#endif
}
// getting the pointer out
{
#if 0
unsigned char *pData = (unsigned char *)(*((unsigned int *)pMessage->pData));
#else
unsigned char *pData;
memcpy(&pData, pMessage->pData, sizeof(int));
#endif
}
I suspect it's an alignment problem and either GCC or the processor is trying to compensate. The structure is defined as:
typedef struct {
long mtype;
unsigned short uRespQueue;
unsigned short uID;
unsigned short uLen;
unsigned char pData[8000];
} message_t;
Assuming normal alignment restrictions and a 32-bit processor, the offsets of each field are:
mtype 0 (alignment 4)
uRespQueue 4 (alignment 2)
uID 6 (alignment 2)
uLen 8 (alignment 2)
pData 10 (alignment 1)
On all but the most recent ARM processors, memory accesses must be aligned, and with the cast
*((unsigned int *)pMessage->pData) = (unsigned int)pData;
you are attempting to write a 32-bit value to a misaligned address. To correct the alignment, the processor appears to have truncated the low bits of the address. The resulting write happened to overlap the uLen field, causing the problem.
To be able to handle this correctly, you need to make sure that you write the value to a properly aligned address. Either offset the pointer to align it or make sure pData is aligned to be able to handle 32-bit data. I would redefine the structure to align the pData member for 32-bit access.
typedef struct {
long mtype;
unsigned short uRespQueue;
unsigned short uID;
unsigned short uLen;
union { /* this will add 2-bytes of padding */
unsigned char *pData;
unsigned char rgData[8000];
};
} message_t;
The structure should still occupy the same amount of bytes since it has a 4-byte alignment due to the mtype field.
Then you should be able to access the pointer:
unsigned char *pData = ...;
/* setting the pointer */
pMessage->pData = pData;
/* getting the pointer */
pData = pMessage->pData;
That is a very nasty thing to do (the thing that's compiled out). You're trying basically to hack the code, and instead of using the data copy in the message (in the provided 8000 bytes for it), you try to put a pointer, and pass it through IPC.
The main issue is sharing memory between processes. Who knows what happens to that pointer after you send it? Who knows what happens to the data it points to? It's a very bad habit to send out a pointer to data that is not under your control (i.e. not protected or properly shared).
Another thing that might happen, and is probably what you're actually talking about, is the alignment. The array is of char's, the previous member in the struct is short, the compiler might attempt packing them. Recasting char[] to int * means that you take memory area and represent it as something else, without telling the compiler. You're stomping over the uLen by the cast.
memcpy() is the proper way to do it.
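A minimal sketch of the memcpy() approach, wrapped in two hypothetical helpers. Unlike the original snippet, it copies sizeof of the pointer itself rather than sizeof(unsigned int), so it does not assume 32-bit pointers. It also assumes, as the question does, that sender and receiver can both make sense of the address (for example, shared memory mapped at the same address).

```c
#include <assert.h>
#include <string.h>

/* Store a pointer value into a byte buffer at any alignment. */
static void store_pointer(unsigned char *dst, void *ptr)
{
    assert(dst != NULL);
    memcpy(dst, &ptr, sizeof ptr);
}

/* Load a pointer value previously written by store_pointer(). */
static void *load_pointer(const unsigned char *src)
{
    void *ptr;
    assert(src != NULL);
    memcpy(&ptr, src, sizeof ptr);
    return ptr;
}
```

memcpy() copies byte by byte as far as the language is concerned, so the destination may sit at an odd offset such as the pData member at offset 10 without triggering a misaligned 32-bit store.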
The point here is the line "int header = (*((int*)(txUserPtr) - 4));".
An illustration of UserTypes and the struct pointer casting would be of great help!
typedef union UserTypes
{
SAUser AUser;
BUser BUser;
SCUser CUser;
SDUser DUser;
} UserTypes;
typedef struct AUser
{
int userId;
int dbIndex;
ChannelType ChanType;
} AUser;
typedef struct BUser
{
int userId;
int dbIndex;
ChannelType ChanType;
} BUser;
typedef struct CUser
{
int userId;
int dbIndex;
ChannelType ChanType;
} CUser;
typedef struct DUser
{
int userId;
int dbIndex;
ChannelType ChanType;
} DUser;
//this is the function I want to test
void Fun(UserTypes * txUserPtr)
{
int header = (*((int*)(txUserPtr) - 4));
//the problem is here
//how should i set incoming pointer "txUserPtr" so that
//Fun() would skip following lines.
// I don't want to execute error()
if((header & 0xFF000000) != (int)0xAA000000)
{
error("sth error\n");
}
/*the following is the rest */
}
