c struct and memcpy (byte array) - c

I'm receiving a byte buffer and trying to copy it into a structure.
My structure is:
typedef struct mydata_block
{
uint8_t cmd;
uint32_t param;
char str_buf[10];
uint32_t crc32;
} mydata_t;
First, the program that sends the data fills in the block as follows:
blockTX.cmd = 2
blockTX.str_buf = "eee789"
blockTX.param = 1001
blockTX.crc32 = 3494074521
-
02-00-00-00-E9-03-00-00-65-65-65-37-38-39-00-00-00-00-00-00-99-58-43-D0
When the data is received, I copy it into the structure using the memcpy call below:
memcpy((uint8_t *)&blockRX,(uint8_t *)usbd_cdc_buffer,sizeof(blockRX));
Everything looks fine except cmd (it's 1 byte, but is there padding in the structure?). How do I fix this?

Transferring data requires you to consider padding, sizes, endianness, etc., so you need to generate and parse the byte stream correctly. You can use something like Google protobuf to serialize and deserialize your data portably and comfortably.
But if you must, you can give the structure the packed attribute. This removes all the padding and alignment restrictions. That lets you memcpy() the struct without padding, but at the cost of slower access to the members of the struct itself. There are only two good reasons to do this:
The alignment and padding of the struct is determined by forces outside your control (it has to match hardware or 3rd party software).
As an intermediate step to converting the data into host format.
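As a rough illustration of parsing the byte stream explicitly, the sketch below decodes the 24-byte dump from the question field by field instead of copying it over the struct. The helper names get_u32_le and mydata_decode are made up for this example, and the offsets (cmd at 0, param at 4, str_buf at 8, crc32 at 20) and the little-endian byte order are assumptions read off that dump, not a documented protocol.

```c
#include <stdint.h>
#include <string.h>

typedef struct mydata_block
{
    uint8_t  cmd;
    uint32_t param;
    char     str_buf[10];
    uint32_t crc32;
} mydata_t;

/* Hypothetical helper: read a 32-bit little-endian value from a byte buffer. */
static uint32_t get_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Decode one 24-byte block. The offsets are assumptions taken from the
 * dump in the question; adjust them to whatever the sender really emits. */
void mydata_decode(mydata_t *out, const uint8_t *wire)
{
    out->cmd   = wire[0];                 /* bytes 1..3 are sender padding */
    out->param = get_u32_le(wire + 4);
    memcpy(out->str_buf, wire + 8, sizeof out->str_buf);
    out->crc32 = get_u32_le(wire + 20);   /* bytes 18..19 are sender padding */
}
```

Because every field is read from a fixed offset, the result does not depend on how the receiving compiler pads mydata_t.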

Related

Memcpy in C - Using arrays in for loops then assigning to a struct

I am using a for loop to copy an array from a UART RX buffer to memory.
This looks like as follows:
UART2_readTimeout(uartR2, rxBuf3, 54, NULL, 500);
GPIO_toggle(CONFIG_GPIO_LED_3);
if ((rxBuf3[4] == 0x8C) && (rxBuf3[10] != 0x8C)) {
int i;
for (i = 0; i < 47; i++) {
sioR2[i]=rxBuf3[i];
}
}
I want to then use a struct such as the following to make it possible to use dot notation when working with and organizing the data:
typedef struct
{
uint16_t voltage;
uint16_t current;
uint16_t outTemp;
uint16_t inTemp;
uint16_t status;
uint32_t FaultA;
uint32_t FaultB;
uint32_t FaultC;
uint32_t FaultD;
uint8_t softwareMode;
uint8_t logicLoad;
uint8_t outputBits;
uint16_t powerOut;
uint32_t runHours;
uint16_t unitAddresses[6];
} unitValues;
Assuming the total length of the array and the struct is the same, is it possible to perform a memcpy on the entire array to a single instance of the struct?
Array : 001110101....110001
||||||||||||||||||| <- memcpy
vvvvvvvvvvvvvvvvvvv
Struct : 001110101....110001
Provided that your C implementation offers a way to ensure that the layout of your structure is the same as the layout that the driver in question uses for writing the buffer, a pretty good way to go about this would be to have the driver write directly into the structure. I'm inferring the signature of the driver function here, but that would probably be something like:
UART2_readTimeout(uartR2, (uint8_t *) &values, 54, NULL, 500);
Assuming that uint8_t is an alias for unsigned char or maybe char, it is valid to write into the representation of the structure via a pointer of type uint8_t *. Thus, this avoids you having to make a copy.
The trick, however, is the structure layout. Supposing that you expect the data to be laid out as the structure members given, in the order given, with no gaps, such a structure layout would prevent structure instances being positioned in memory so that all members are aligned on addresses that are multiples of their sizes. Depending on the alignment rules of your hardware, this might be perfectly fine, but probably either it would slow accesses to some of the members, or it would make attempts to access some of the members crash the program.
If you still want to proceed, then you will need to check your compiler's documentation for information about how to get the wanted layout of your structure. You might look for references to structure "packing", structure layout, or structure member alignment. There is no standard way to do this -- if your C implementation supports it at all then that constitutes an extension, with implementation-specific details.
All the same issues and caveats would apply to using memcpy to copy the buffer contents onto an instance of your structure type, so if you don't need multiple copies of the data and you can arrange to make a bulk copy onto the structure work, then you're better off writing directly onto the structure than writing into a separate buffer and then copying.
On the other hand, the safe and standard alternative would be to allow your implementation to lay out the structure however it thinks is best, and to copy the data out of your buffer into the structure in member-by-member fashion, with per-member memcpy()s. Yes, the code will be a bit tedious, but it will not be sensitive to alignment-related issues, nor even to reordering structure members or adding new ones.
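The member-by-member approach could look roughly like the sketch below. The names take and unit_values_unpack are invented for illustration, and the code assumes the buffer holds the fields back to back, in declaration order, in the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
    uint16_t voltage;
    uint16_t current;
    uint16_t outTemp;
    uint16_t inTemp;
    uint16_t status;
    uint32_t FaultA;
    uint32_t FaultB;
    uint32_t FaultC;
    uint32_t FaultD;
    uint8_t  softwareMode;
    uint8_t  logicLoad;
    uint8_t  outputBits;
    uint16_t powerOut;
    uint32_t runHours;
    uint16_t unitAddresses[6];
} unitValues;

/* Hypothetical helper: copy n bytes into dst and return the advanced source pointer. */
static const uint8_t *take(void *dst, const uint8_t *src, size_t n)
{
    memcpy(dst, src, n);
    return src + n;
}

/* Fill *v from a gap-free buffer; returns the number of bytes consumed. */
size_t unit_values_unpack(unitValues *v, const uint8_t *buf)
{
    const uint8_t *p = buf;
    p = take(&v->voltage,      p, sizeof v->voltage);
    p = take(&v->current,      p, sizeof v->current);
    p = take(&v->outTemp,      p, sizeof v->outTemp);
    p = take(&v->inTemp,       p, sizeof v->inTemp);
    p = take(&v->status,       p, sizeof v->status);
    p = take(&v->FaultA,       p, sizeof v->FaultA);
    p = take(&v->FaultB,       p, sizeof v->FaultB);
    p = take(&v->FaultC,       p, sizeof v->FaultC);
    p = take(&v->FaultD,       p, sizeof v->FaultD);
    p = take(&v->softwareMode, p, sizeof v->softwareMode);
    p = take(&v->logicLoad,    p, sizeof v->logicLoad);
    p = take(&v->outputBits,   p, sizeof v->outputBits);
    p = take(&v->powerOut,     p, sizeof v->powerOut);
    p = take(&v->runHours,     p, sizeof v->runHours);
    p = take(v->unitAddresses, p, sizeof v->unitAddresses);
    return (size_t)(p - buf);
}
```

The struct itself keeps whatever padding the compiler prefers; only the buffer side is treated as packed.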
You have to change the packing align to 1 byte for the structure.
#pragma pack(1) /* change */
typedef struct {
...
}
#pragma pack() /* restore */
In theory, you can use memcpy() to set the member fields of a struct from the elements of a byte array. However, you will need to be very careful to prevent your compiler from adding 'empty' fields to your struct (see: Structure padding and packing) unless those empty fields are taken into account when loading the data into the source array. (The elements of the source array will be packed into contiguous memory.)
Different compilers use different command-line and/or #pragma options to control structure packing but, for the MSVC compiler, you can use the #pragma pack(n) directives in your source code or the /Zp command-line switch.
Using the MSVC compiler, the structure you have provided will have a total size of 47 bytes only if you have single-byte packing; for default packing, the size will be 52 bytes.
The following code block shows where these 'extra' bytes will be inserted for different packing sizes.
#pragma pack(push, 1) // This saves the current packing level then sets it to "n" (1, here)
typedef struct {
uint16_t voltage;
uint16_t current;
uint16_t outTemp;
uint16_t inTemp;
uint16_t status;
// 4+ byte packing will insert two bytes here
uint32_t FaultA;
uint32_t FaultB;
uint32_t FaultC;
uint32_t FaultD;
uint8_t softwareMode;
uint8_t logicLoad;
uint8_t outputBits;
// 2+ byte packing will insert one byte here
uint16_t powerOut;
// 4+ byte packing will insert two bytes here
uint32_t runHours;
uint16_t unitAddresses[6];
} unitValues;
#pragma pack(pop) // This restores the previous packing level
So, the sizeof(unitValues) will be:
47 bytes when using #pragma pack(1)
48 bytes when using #pragma pack(2)
52 bytes when using #pragma pack(4) (or any higher/default value)
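If you do go the packed route, a compile-time check can catch a layout that differs from those numbers before any memcpy runs. The following is only a sketch: unit_values_from_buffer is a made-up name, _Static_assert requires a C11 compiler, and #pragma pack(push, 1) is assumed to be available (MSVC, GCC and Clang accept it).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)
typedef struct {
    uint16_t voltage;
    uint16_t current;
    uint16_t outTemp;
    uint16_t inTemp;
    uint16_t status;
    uint32_t FaultA;
    uint32_t FaultB;
    uint32_t FaultC;
    uint32_t FaultD;
    uint8_t  softwareMode;
    uint8_t  logicLoad;
    uint8_t  outputBits;
    uint16_t powerOut;
    uint32_t runHours;
    uint16_t unitAddresses[6];
} unitValues;
#pragma pack(pop)

/* Compilation fails here if the compiler did not produce the gap-free layout. */
_Static_assert(sizeof(unitValues) == 47, "unitValues must be 47 bytes");
_Static_assert(offsetof(unitValues, FaultA) == 10, "no gap expected before FaultA");
_Static_assert(offsetof(unitValues, runHours) == 31, "no gap expected before runHours");

/* Hypothetical helper: bulk-copy a 47-byte receive buffer onto the struct. */
void unit_values_from_buffer(unitValues *dst, const uint8_t *buf)
{
    memcpy(dst, buf, sizeof *dst);
}
```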

Copying structure to char* buffer

Basically, I have a custom structure that contains different kinds of data. For example:
typedef struct example_structure{
uint8_t* example_1[4];
int example_2[4];
int example_3;
} example_structure;
What I need to do is copy the contents of this structure to a const char* buffer so I can send that copied data (buffer) using winsock2's send(SOCKET s, const char* buffer, int len, int flags) function. I tried using memcpy(), but wouldn't I just copy the addresses held in the pointers and not the data?
Yes, if you copied or sent that structure through a socket you would end up copying/sending pointers, which would obviously be meaningless to the recipient. However, if the recipient is running on different hardware (e.g. not the same endianness), all of the data may be meaningless anyway. On top of that, differences in the amount of padding between structure members may also become a problem.
For non-trivial situations it is best to use an existing protocol (such as protobuf), or roll your own protocol, keeping in mind the potential differences in hardware representation of your data.
You need to design a protocol before you can encode the data in accord with that protocol. Decide exactly how the data will be encoded at the byte level. Then write code to encode and decode to that format that you decided on.
Do not skip the step of actually documenting the wire protocol at the byte level. It will save you pain later, I promise.
const char* buffer
This buffer points to const data, so you can't copy anything into it. You probably don't need to copy anything. Just call send like this, where ex_struc is a variable of type example_structure:
send(s, (const char *)&ex_struc, sizeof(ex_struc), flags);
But there is a problem with the pointers in your structure (uint8_t* example_1[4];).
Sending pointers between different applications or machines does not make sense.
Hmm, your struct contains uint8_t * fields, which look like C strings... It does not make sense to copy or send a pointer, which is merely a memory address in the sending process's user space.
If your struct had been (note, no pointers):
typedef struct example_structure{
uint8_t example_1[4];
int example_2[4];
int example_3;
} example_structure;
and provided you transfer it on exactly same architecture (same hardware, same compiler, same compiler options), you could do simply:
example_structure ex_struc;
// initialize the struct
...
send(s, (const char *)&ex_struc, sizeof(ex_struc), flags);
And even in that case, I would strongly advise you to define and use a protocol - as already said by @DavidSchwartz, it could save you time and headaches later...
But as you have pointers, you cannot do that and must define a protocol.
It could be (but you are free to prefer little endian order, or 2 or 8 bytes for each int depending on your actual data):
one byte (or two) for length of first uint8_t array, followed by the array
above repeated 3 more times
four bytes in big endian order for first int of example_2
repeated 3 times
four bytes in big endian order for int of example_3
This clearly defines the format of a message.
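A minimal encoder for that message format might look like the sketch below. The names put_u32_be and example_serialize are hypothetical, and so is the separate len array: the original struct does not record how long each pointed-to array is, so the caller has to supply that. The sketch uses a one-byte length prefix and assumes int values fit in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct example_structure {
    uint8_t *example_1[4];
    int example_2[4];
    int example_3;
} example_structure;

/* Write a 32-bit value in big-endian order; returns the advanced pointer. */
static uint8_t *put_u32_be(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return p + 4;
}

/* len[i] is the byte count of example_1[i] (assumed, not part of the struct).
 * Returns the number of bytes written, or 0 if out is too small. */
size_t example_serialize(const example_structure *s, const uint8_t len[4],
                         uint8_t *out, size_t cap)
{
    size_t need = 4 + 5 * 4;          /* four length bytes + five 32-bit ints */
    for (int i = 0; i < 4; i++)
        need += len[i];
    if (need > cap)
        return 0;

    uint8_t *p = out;
    for (int i = 0; i < 4; i++) {     /* length byte, then the array itself */
        *p++ = len[i];
        if (len[i] > 0)
            memcpy(p, s->example_1[i], len[i]);
        p += len[i];
    }
    for (int i = 0; i < 4; i++)       /* example_2, big-endian */
        p = put_u32_be(p, (uint32_t)s->example_2[i]);
    p = put_u32_be(p, (uint32_t)s->example_3);
    return (size_t)(p - out);
}
```

The resulting buffer can be passed to send() as is; the receiver reverses the same steps.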

Portable way to find size of a packed structure in C

I'm coding a network layer protocol and need to find the size of a packed structure defined in C. Compilers may add extra padding bytes, which makes the sizeof operator useless in my case. I searched and found that we could use something like __attribute__((packed)) to prevent the compiler from adding extra padding bytes. But I believe this is not a portable approach; my code needs to support both Windows and Linux environments.
Currently, I've defined a macro to map packed sizes of every structure defined in my code. Consider code below:
typedef struct {
...
} a_t;
typedef struct {
...
} b_t;
#define SIZE_a_t 8
#define SIZE_b_t 10
#define SIZEOF(XX) SIZE_##XX
and then in the main function, I can use the above macro definition as below:-
int size = SIZEOF(a_t);
This approach does work, but I believe it may not be best approach. Any suggestions or ideas on how to efficiently solve this problem in C?
Example
Consider the C structure below:-
typedef struct {
uint8_t a;
uint16_t b;
} e_t;
Under Linux, sizeof returns 4 bytes instead of 3 bytes. To prevent this I'm currently doing this:-
typedef struct {
uint8_t a;
uint16_t b;
} e_t;
#define SIZE_e_t 3
#define SIZEOF(XX) SIZE_##XX
Now, when I call SIZEOF(e_t) in my function, it should return 3, not 4.
sizeof is the portable way to find the size of a struct, or of any other C data type.
The problem you're facing is how to ensure that your struct has the size and layout that you need.
#pragma pack or __attribute__((packed)) may well do the job for you. Neither is 100% portable (there's no mention of packing in the C standard), but they may be portable enough for your current purposes; consider, though, whether your code might need to be ported to some other platform in the future. Packing is also potentially unsafe: accessing a misaligned member through a pointer can fault or misbehave on some architectures.
The only 100% portable approach is to use arrays of unsigned char and keep track of which fields occupy which ranges of bytes. This is a lot more cumbersome, of course.
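For the e_t example from the question, the byte-array approach could be sketched as follows. The offset constants and the e_pack/e_unpack names are invented here, and big-endian order for b is an arbitrary choice for the illustration.

```c
#include <stdint.h>

/* Wire layout of e_t, tracked by hand instead of relying on sizeof. */
enum {
    E_OFF_A     = 0,  /* 1 byte */
    E_OFF_B     = 1,  /* 2 bytes, big-endian (assumed) */
    E_WIRE_SIZE = 3
};

typedef struct {
    uint8_t  a;
    uint16_t b;
} e_t;

/* Encode the in-memory struct into its 3-byte wire form. */
void e_pack(const e_t *e, unsigned char out[E_WIRE_SIZE])
{
    out[E_OFF_A]     = e->a;
    out[E_OFF_B]     = (unsigned char)(e->b >> 8);
    out[E_OFF_B + 1] = (unsigned char)(e->b & 0xFF);
}

/* Decode the 3-byte wire form back into the in-memory struct. */
void e_unpack(e_t *e, const unsigned char in[E_WIRE_SIZE])
{
    e->a = in[E_OFF_A];
    e->b = (uint16_t)(((unsigned)in[E_OFF_B] << 8) | in[E_OFF_B + 1]);
}
```

Here E_WIRE_SIZE plays the role of the packed size, while sizeof(e_t) stays free to be 4.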
Your macro tells you the size that you think the struct should have, if it has been laid out as you intend.
If that's not equal to sizeof(a_t), then whatever code you write that thinks it is packed isn't going to work anyway. Assuming they're equal, you might as well just use sizeof(a_t) for all purposes. If they're not equal then you should be using it only for some kind of check that SIZEOF(a_t) == sizeof(a_t), which will fail and prevent your non-working code from compiling.
So it follows that you might as well just put the check in the header file that sizeof(a_t) == 8, and not bother defining SIZEOF.
That's all aside from the fact that SIZEOF doesn't really behave like sizeof. For example consider typedef a_t foo; sizeof(foo);, which obviously won't work with SIZEOF.
I don't think that specifying the size manually is more portable than using sizeof.
If the size changes, your hard-coded size will be wrong.
Packing is widely available, though not standard: GCC has __attribute__((packed)), and in Visual Studio it is #pragma pack.
I would recommend against trying to read/write data by overlaying it on a struct. I would suggest instead writing a family of routines which are conceptually like printf/scanf, but which use format specifiers that specify binary data formats. Rather than using percent-sign-based tags, I would suggest simply using a binary encoding of the data format.
There are a few approaches one could take, involving trade-off between the size of the serialization/deserialization routines themselves, the size of the code necessary to use them, and the ability to handle a variety of deserialization formats. The simplest (and most easily portable) approach would be to have routines which, instead of using a format string, process items individually by taking a double-indirect pointer, read some data type from it, and increment it suitably. Thus:
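uint32_t read_uint32_bigendian(uint8_t const ** src)
{
uint8_t const *p = *src;
uint32_t tmp;
/* Cast before shifting so the arithmetic is done in uint32_t, not int */
tmp = (uint32_t)(*p++) << 24;
tmp |= (uint32_t)(*p++) << 16;
tmp |= (uint32_t)(*p++) << 8;
tmp |= (uint32_t)(*p++);
*src = p; /* advance the caller's pointer past the four bytes read */
return tmp;
}
...
uint8_t buff[256];
...
uint8_t const *buffptr = buff;
first_word = read_uint32_bigendian(&buffptr);
next_word = read_uint32_bigendian(&buffptr);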
This approach is simple, but has the disadvantage of having lots of redundancy in the packing and unpacking code. Adding a format string could simplify it:
#define BIGEND_INT32 "\x43" // Or whatever the appropriate token would be
uint8_t const *buffptr = (uint8_t const *)buff;
read_data(&buffptr, BIGEND_INT32 BIGEND_INT32, &first_word, &second_word);
This approach could read any number of data items with a single function call, passing buffptr only once, rather than once per data item. On some systems, it might still be a bit slow. An alternative approach would be to pass in a string indicating what sort of data should be received from the source, and then also pass in a string or structure indicating where the data should go. This could allow any amount of data to be parsed by a single call giving a double-indirect pointer for the source, a string pointer indicating the format of data at the source, a pointer to a struct indicating how the data should be unpacked, and a pointer to a struct to hold the target data.
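One possible shape for the format-string variant is sketched below using varargs. The token values "\x42" and "\x43" are placeholders (the answer leaves them open), BIGEND_INT16 is an extra token added for the illustration, and this read_data stops at the first token it does not recognize.

```c
#include <stdarg.h>
#include <stdint.h>

#define BIGEND_INT16 "\x42" /* hypothetical token values */
#define BIGEND_INT32 "\x43"

/* Read one item per token in fmt from *src into the pointer arguments,
 * then advance *src past everything consumed. Returns the item count. */
int read_data(const uint8_t **src, const char *fmt, ...)
{
    const uint8_t *p = *src;
    int count = 0;
    va_list ap;

    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt == '\x42') {            /* 16-bit big-endian */
            uint16_t *dst = va_arg(ap, uint16_t *);
            *dst = (uint16_t)(((unsigned)p[0] << 8) | p[1]);
            p += 2;
        } else if (*fmt == '\x43') {     /* 32-bit big-endian */
            uint32_t *dst = va_arg(ap, uint32_t *);
            *dst = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
                 | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
            p += 4;
        } else {
            break;                       /* unknown token: stop parsing */
        }
        count++;
    }
    va_end(ap);

    *src = p;
    return count;
}
```

Adjacent string literals are concatenated, so BIGEND_INT32 BIGEND_INT16 forms a two-token format string. The caller must pass pointer types that match the tokens, since varargs cannot check them.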

Invalid sizeof() struct, gap between members

I have a struct like this:
typedef struct _HEADER_IO
{
uint8_t field1 : 2;
uint8_t field2 : 4;
uint8_t field3 : 1;
uint8_t field4 : 1;
uint16_t field5;
uint8_t field6;
} HEADER_IO;
It's basically a message header that will be sent over TCP. The server reads this so that it knows what data follows in the buffer. However, for some reason, instead of the size being 4 bytes (2+4+1+1 bits in the first byte + 2 bytes for field5 + 1 byte for field6), the size is 6 bytes.
Looking it up in memory view it is:
XX AA XX XX XX AA
Instead of:
XX XX XX XX
Where AA are never set no matter what I do. This is a problem because I am planning for the header to be send() to a server and the extra bytes are included making the server interpret the header wrong. What am I doing wrong?
In general, it's a bad idea to use bitfields for things like these, since you can't know beforehand exactly which byte the bits will end up in, and since there are padding and alignment issues.
In my opinion, it's better to "own up" to the fact that you need more control over the external representation than what C structures give you, and do it manually. You can of course keep the struct as the in-memory (internal) representation.
Basically, you would write a function like:
size_t header_serialize(unsigned char *buf, size_t max, const HEADER_IO *header);
whose job it would be to, in the memory at buf, build the proper byte sequence that represents header.
To clarify (based on comments), the intent is to read the fields from header, not just do e.g.
memcpy(buf, header, sizeof *header); /* DON'T DO THIS! */
Instead, you're supposed to assemble the expected external representation, byte by byte, from the fields of header. That way, you always get the same external representation regardless of what the compiler does to the in-memory format of header.
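A possible body for header_serialize is sketched below. The wire layout is an assumption, since the question does not specify one: field1 occupies bits 0-1 of the first byte, field2 bits 2-5, field3 bit 6, field4 bit 7, followed by field5 in big-endian order and then field6.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct _HEADER_IO
{
    uint8_t  field1 : 2;
    uint8_t  field2 : 4;
    uint8_t  field3 : 1;
    uint8_t  field4 : 1;
    uint16_t field5;
    uint8_t  field6;
} HEADER_IO;

/* Build the 4-byte external form of *header in buf.
 * Returns the number of bytes written, or 0 if max is too small. */
size_t header_serialize(unsigned char *buf, size_t max, const HEADER_IO *header)
{
    if (max < 4)
        return 0;

    /* Byte 0: the four small fields, placed at explicitly chosen bit positions
     * (assumed layout, independent of how the compiler arranges bitfields). */
    buf[0] = (unsigned char)(header->field1
                           | (header->field2 << 2)
                           | (header->field3 << 6)
                           | (header->field4 << 7));
    /* Bytes 1-2: field5, most significant byte first (assumed). */
    buf[1] = (unsigned char)(header->field5 >> 8);
    buf[2] = (unsigned char)(header->field5 & 0xFF);
    /* Byte 3: field6. */
    buf[3] = header->field6;
    return 4;
}
```

The output is always 4 bytes, even where sizeof(HEADER_IO) is 6.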
In standard C you can't help the fact that struct members can have padding inserted between them. You have to write a function to decode the data and store it in your struct before processing. This is because on some architectures unaligned memory access (reading from a pointer not aligned to, for example, 4 bytes) is very expensive and C will automatically pad your structures to avoid the cost. There's no standard way to turn the feature on or off.
For example, in GCC you can add __attribute__((packed)) after the struct definition, and Visual Studio has some #pragma commands that are also supported by GCC (see http://gcc.gnu.org/onlinedocs/gcc/Structure_002dPacking-Pragmas.html), but beware that overall this is non-standard.
Since your comments mentioned it's a Windows program, probably it would work if you add this before the struct definition:
#pragma pack(push,1)
And this after it:
#pragma pack(pop)
While it would be more portable to write code to more manually decode the header, the above approach should be faster.

Send a struct over a socket with correct padding and endianness in C

I have several structures defined to send over different Operating Systems (tcp networks).
Defined structures are:
struct Struct1 { uint32_t num; char str[10]; char str2[10];}
struct Struct2 { uint16_t num; char str[10];}
typedef Struct1 a;
typedef Struct2 b;
The data is stored in a text file.
Data Format is as such:
123
Pie
Crust
Struct1 a is stored as 3 separate parameters. However, struct2 is two separate parameters with both 2nd and 3rd line stored to the char str[] . The problem is when I write to a server over the multiple networks, the data is not received correctly. There are numerous spaces that separate the different parameters in the structures. How do I ensure proper sending and padding when I write to server? How do I store the data correctly (dynamic buffer or fixed buffer)?
Example of write: write(fd,&a, sizeof(typedef struct a)); Is this correct?
Problem Receive Side Output for struct2:
123( , )
0 (, Pie)
0 (Crust,)
Correct Output
123(Pie, Crust)
write(fd,&a, sizeof(a)); is not correct; at least not portably, since the C compiler may introduce padding between the elements to ensure correct alignment. sizeof(typedef struct a) doesn't even make sense.
How you should send the data depends on the specs of your protocol. In particular, protocols define widely varying ways of sending strings. It is generally safest to send the struct members separately, either by multiple calls to write or by a single call to writev(2). For instance, to send
struct { uint32_t a; uint16_t b; } foo;
over the network, where foo.a and foo.b already have the correct endianness, you would do something like:
struct iovec v[2];
v[0].iov_base = &foo.a;
v[0].iov_len = sizeof(uint32_t);
v[1].iov_base = &foo.b;
v[1].iov_len = sizeof(uint16_t);
writev(fd, v, 2);
Sending structures over the network is tricky. You might run into the following problems:
1. Byte endianness issues with integers.
2. Padding introduced by your compiler.
3. String parsing (i.e. detecting string boundaries).
If performance is not your goal, I'd suggest creating encoders and decoders for each struct to be sent and received (ASN.1, XML or custom). If performance is really required, you can still use structures and solve (1) by fixing an endianness (i.e. network byte order) and ensuring your integers are stored as such in those structures, and (2) by fixing a compiler and using the pragmas or attributes to enforce a "packed" structure.
GCC, for example, uses __attribute__((packed)) as such:
struct mystruct {
uint32_t a;
uint16_t b;
unsigned char text[24];
} __attribute__((__packed__));
(3) is not easy to solve. Using null-terminated strings in a network protocol and depending on them being present would make your code vulnerable to several attacks. If strings need to be involved, I'd use a proper encoding method such as the ones suggested above.
The easy way would be to write two functions for each structure: one to convert from textual representation to the struct and one to convert a struct back to text. Then you just send the text over the network and on the receiving side convert it to your structures. That way endianness does not matter.
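For Struct1 from the question, such a pair of text conversion functions could be sketched like this. The function names are made up, the one-field-per-line format mirrors the question's text file, and the sketch assumes the strings are NUL-terminated and contain no whitespace.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

struct Struct1 { uint32_t num; char str[10]; char str2[10]; };

/* Write "num\nstr\nstr2\n" into out.
 * Returns the text length, or -1 if out is too small. */
int struct1_to_text(const struct Struct1 *s, char *out, size_t cap)
{
    int n = snprintf(out, cap, "%" PRIu32 "\n%s\n%s\n", s->num, s->str, s->str2);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Parse the text form back into *s.
 * Returns 0 on success, -1 if three fields were not found. */
int struct1_from_text(struct Struct1 *s, const char *text)
{
    /* %9s caps each string at 9 characters plus the terminator. */
    int got = sscanf(text, "%" SCNu32 " %9s %9s", &s->num, s->str, s->str2);
    return got == 3 ? 0 : -1;
}
```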
There are conversion functions to ensure portability of binary integers across a network. Use htons, htonl, ntohs and ntohl to convert 16 and 32 bit integers from host to network byte order and vice versa.
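As a small sketch of how those functions fit in, the code below packs Struct2 from the question into a 12-byte buffer with num in network byte order. The names struct2_pack and struct2_unpack and the 2 + 10 byte layout are assumptions for the illustration. The header shown is the POSIX one; on Windows the same functions come from winsock2.h.

```c
#include <arpa/inet.h> /* htons, ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

struct Struct2 { uint16_t num; char str[10]; };

enum { STRUCT2_WIRE_SIZE = 2 + 10 };

/* Encode: num in network byte order, then the 10 raw bytes of str. */
void struct2_pack(const struct Struct2 *s, unsigned char out[STRUCT2_WIRE_SIZE])
{
    uint16_t num_be = htons(s->num);
    memcpy(out, &num_be, sizeof num_be);
    memcpy(out + 2, s->str, sizeof s->str);
}

/* Decode: the reverse of struct2_pack. */
void struct2_unpack(struct Struct2 *s, const unsigned char in[STRUCT2_WIRE_SIZE])
{
    uint16_t num_be;
    memcpy(&num_be, in, sizeof num_be);
    s->num = ntohs(num_be);
    memcpy(s->str, in + 2, sizeof s->str);
}
```

The buffer, not the struct, is what gets passed to write(), so compiler padding never reaches the wire.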

Resources