I'm a "new" C programmer, but an old assembly programmer, and have been searching for an answer for a few days.
I'm trying to parse multiple fields in a message with the C struct construct, (It's a LORA radio with an embedded RTU modbus packet).
I have This example code that shows my question:
#include <stdio.h>
#include <stdint.h>
struct MessageTable{
uint8_t msg_id;
uint8_t from;
uint8_t to;
unsigned flags1 : 1;
unsigned retransmitted : 1;
unsigned hops : 4;
union {
unsigned long millisecs;
unsigned char bytes[sizeof(unsigned long)];
} ms;
};
struct MessageTable message, *mp;
struct MessageTable message_table[8] = {0};
char buf[256];
void main(void) {
int i;
for (i=0; i<255; i++)
buf[i] = i;
mp = (struct MessageTable) &buf;
printf("To: %u, From: %u", mp->to, mp->from);
}
When I try to compile I get:
question.c: In function ‘main’:
question.c:27:18: error: conversion to non-scalar type requested
27 | mp = (struct MessageTable) &buf;
| ^~~~~~~~~~~~
What I'm attempting to do is, overlay the struct in the buffer space at some arbitrary position for named access to the different fields instead of using hard coded offsets (I.E. to=buf[2]; and retransmitted = buf[3]&02x;
What is the clean, readable, appropriate way to do this?
NOTE: there will be multiple structs at different buf positions (LORA routing, Modbus Send, Modbus Rx, Modbus err, etc...)
and, this is straight C, not C++.
I don't care if the buffer "runs off" the end of the struct, the code constructs take care of that.
First to address your error message on this line:
mp = (struct MessageTable) &buf;
Here you're attempting to convert &buf, which has type char (*)[256] i.e. a pointer to an array, to a struct MessageTable which is not a pointer type. Arrays in most contexts decay to a pointer to the first element, so you don't need to take its address, and you need to cast it to a pointer type:
mp = (struct MessageTable *)buf;
The other issue however is:
The struct might not be exactly the size you expect
The order of bitfieds may not be what you expect
If the buffer is not properly aligned for the fields in the struct you could generate a fault.
You have two problems in:
mp = (struct MessageTable) &buf;
The first is buf is already a pointer due to array/pointer conversion. C11 Standard - 6.3.2.1 Other Operands - Lvalues, arrays, and function designators(p3)
The second problem is you are casting to struct MessageTable instead of a Pointer to struct MessageTable. You can correct both with:
mp = (struct MessageTable*) buf;
Also, unless you are programming in a freestanding environment (without the benefit of any OS), in a standards conforming implementation, the allowable declarations for main for are int main (void) and int main (int argc, char *argv[]) (which you will see written with the equivalent char **argv). See: C11 Standard - §5.1.2.2.1 Program startup(p1). See also: What should main() return in C and C++? In a freestanding environment, the name and type of the function called at program startup are implementation-defined. See: C11 Standard - 5.1.2.1 Freestanding environment
Putting it altogether you would have:
#include <stdio.h>
#include <stdint.h>
struct MessageTable{
uint8_t msg_id;
uint8_t from;
uint8_t to;
unsigned flags1 : 1;
unsigned retransmitted : 1;
unsigned hops : 4;
union {
unsigned long millisecs;
unsigned char bytes[sizeof(unsigned long)];
} ms;
};
struct MessageTable message, *mp;
struct MessageTable message_table[8] = {0};
char buf[256];
int main(void) {
int i;
for (i=0; i<255; i++)
buf[i] = i;
mp = (struct MessageTable*) buf;
printf("To: %u, From: %u", mp->to, mp->from);
}
Example Use/Output
$ ./bin/struct_buf_overlay
To: 2, From: 1
C struct fields are, by default, not guaranteed to be immediately adjacent to one other, and furthermore bitfields can be reordered. Implementations are permitted to reorder bitfields and implement padding in order to efficiently meet system memory alignment requirements. If you need to guarantee that struct fields are positioned in memory immediately adjacent to one another (without padding) and in the order you specified, you need to look up how to tell your compiler to create a packed struct. This is not standard C (but it's necessary to ensure that what you're trying to accomplish will work--it might, but is not guaranteed, to work otherwise), and each compiler has its own way of doing it.
Related
Does C have anything similar to C++ where one can place structs in an unsigned char buffer as is done in C++ as shown in the standard sec. 6.7.2
template<typename ...T>
struct AlignedUnion {
alignas(T...) unsigned char data[max(sizeof(T)...)];
};
int f() {
AlignedUnion<int, char> au;
int *p = new (au.data) int; // OK, au.data provides storage
char *c = new (au.data) char(); // OK, ends lifetime of *p
char *d = new (au.data + 1) char();
return *c + *d; // OK
}
In C I can certainly memcpy a struct of things(or int as shown above) into an unsigned char buffer, but then using a pointer to this struct one runs into strict aliasing violations; the buffer has different declared type.
So suppose one would want to replicate the second line in f the C++ above in C. One would do something like this
#include<string.h>
#include<stdio.h>
struct Buffer {
unsigned char data[sizeof(int)];
};
int main()
{
struct Buffer b;
int n = 5;
int* p = memcpy(&b.data,&n,sizeof(int));
printf("%d",*p); // aliasing violation here as unsigned char is accessed as int
return 0;
}
Unions are often suggested i.e. union Buffer {int i;unsigned char b[sizeof(int)]}; but this is not quite as nice if the aim of the buffer is to act as storage (i.e. placing different sized types in there, by advancing a pointer into the buffer to the free part + potenially some more for proper alignment).
Have you tried using a union?
#include <string.h>
#include <stdio.h>
union Buffer {
int int_;
double double_;
long double long_double_;
unsigned char data[1];
};
int main() {
union Buffer b;
int n = 5;
int *p = memcpy(&b.data, &n, sizeof(int));
printf("%d", *p); // aliasing violation here as unsigned char is accessed as int
return 0;
}
The Buffer aligns data member according the type with the greatest alignment requirement.
Yes, because of strict aliasing rule it is just not possible. As it is not possible to write a standard compliant malloc().
Your buffer is not aligned - alignas(int) from stdalign.h needs to be added.
If you want to protect against compiler optimizations, either:
just cast the pointer and access it and compile with -fno-strict-aliasing, or use volatile
or move the accessor to the buffer to another file that is compiled without LTO so that compiler just is not able to optimize it.
// mybuffer.c
#include <stdalign.h>
alignas(int) unsigned char buffer[sizeof(int)];
void *getbuffer() { return buffer; }
// main.c
#include <string.h>
#include <stdio.h>
#include "mybuffer.h"
int main() {
void *data = getbuffer();
// int *p = new (au.data) int; // OK, au.data provides storage
int *p = data;
// char *c = new (au.data) char(); // OK, ends lifetime of *p
char *c = data;
*c = 0;
// char *d = new (au.data + 1) char();
char *d = (char*)data + 1;
*d = 0;
return *c + *d;
}
The way the definition of Effective Type in 6.5p6 is written, it's unclear what it's supposed to mean in all corner cases--likely because there was never a consensus among Committee Members as to how all corner cases should be handled. Defect reports often add more confusion than clarity, since they use terms like the "active member" of a union when neither the Standard nor the defect reports specify what actions would set or change it.
If one wants to use an object of static or automatic duration as though it were a buffer without a declared type, a safe way of doing that should be to do something like the following:
void volatile *volatile dummy_vp;
void test(void)
{
union {
char dat[1000];
unsigned long force_alignment;
} buffer;
void *volatile launder = buffer.dat;
dummy_vp = &launder;
void *storage_blob = launder;
...
}
Unless an implementation goes out of its way to test whether the read of
launder happened to yield an address matching buffer.dat, it would have no way of knowing whether the object at that address had a declared type. Nothing in the Standard would forbid an implementation from behaving nonsensically if the address happened to match that of buffer.dat, but situations where performance improvements would justify the cost of the check aren't likely to be common enough for compilers to attempt such "optimization".
Consider the following code:
typedef struct { char byte; } byte_t;
typedef struct { char bytes[10]; } blob_t;
int f(void) {
blob_t a = {0};
*(byte_t *)a.bytes = (byte_t){10};
return a.bytes[0];
}
Does this give aliasing problems in the return statement? You do have that a.bytes dereferences a type that does not alias the assignment in patch, but on the other hand, the [0] part dereferences a type that does alias.
I can construct a slightly larger example where gcc -O1 -fstrict-aliasing does make the function return 0, and I'd like to know if this is a gcc bug, and if not, what I can do to avoid this problem (in my real-life example, the assignment happens in a separate function so that both functions look really innocent in isolation).
Here is a longer more complete example for testing:
#include <stdio.h>
typedef struct { char byte; } byte_t;
typedef struct { char bytes[10]; } blob_t;
static char *find(char *buf) {
for (int i = 0; i < 1; i++) { if (buf[0] == 0) { return buf; }}
return 0;
}
void patch(char *b) {
*(byte_t *) b = (byte_t) {10};
}
int main(void) {
blob_t a = {0};
char *b = find(a.bytes);
if (b) {
patch(b);
}
printf("%d\n", a.bytes[0]);
}
Building with gcc -O1 -fstrict-aliasing produces 0
The main issue here is that those two structs are not compatible types. And so there can be various problems with alignment and padding.
That issue aside, the standard 6.5/7 only allows for this (the "strict aliasing rule"):
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
...
an aggregate or union type that includes one of the aforementioned types among its members
Looking at *(byte_t *)a.bytes, then a.bytes has the effective type char[10]. Each individual member of that array has in turn the effective type char. You de-reference that with byte_t, which is not a compatible struct type nor does it have a char[10] among its members. It does have char though.
The standard is not exactly clear how to treat an object which effective type is an array. If you read the above part strictly, then your code does indeed violate strict aliasing, because you access a char[10] through a struct which doesn't have a char[10] member. I'd also be a bit concerned about the compiler padding either struct to meet alignment.
Generally, I'd simply advise against doing fishy things like this. If you need type punning, then use a union. And if you wish to use raw binary data, then use uint8_t instead of the potentially signed & non-portable char.
The error is in *(byte_t *)a.bytes = (byte_t){10};. The C spec has a special rule about character types (6.5§7), but that rule only applies when using character type to access any other type, not when using any type to access a character.
According to the Standard, the syntax array[index] is shorthand for *((array)+(index)). Thus, p->array[index] is equivalent to *((p->array) + (index)), which uses the address of p to compute the address of p->array, and then without regard for p's type, adds index (scaled by the size of the array-element type), and then dereferences the resulting pointer to yield an lvalue of the array-element type. Nothing in the wording of the Standard would imply that an access via the resulting lvalue is an access to an lvalue of the underlying structure type. Thus, if the struct member is an array of character type, the constraints of N1570 6.5p7 would allow an lvalue of that form to access storage of any type.
The maintainers of some compilers such as gcc, however, appear to view the laxity of the Standard there as a defect. This can be demonstrated via the code:
struct s1 { char x[10]; };
struct s2 { char x[10]; };
union s1s2 { struct s1 v1; struct s2 v2; } u;
int read_s1_x(struct s1 *p, int i)
{
return p->x[i];
}
void set_s2_x(struct s2 *p, int i, int value)
{
p->x[i] = value;
}
__attribute__((noinline))
int test(void *p, int i)
{
if (read_s1_x(p, 0))
set_s2_x(p, i, 2);
return read_s1_x(p, 0);
}
#include <stdio.h>
int main(void)
{
u.v2.x[0] = 1;
int result = test(&u, 0);
printf("Result = %d / %d", result, u.v2.x[0]);
}
The code abides the constraints in N1570 6.5p7 because it all accesses to any portion of u are performed using lvalues of character type. Nonetheless, the code generated by gcc will not allow for the possibility that the storage accessed by (*(struct s1))->x[0] might also be accessed by (*(struct s2))->x[i] despite the fact that both accesses use lvalues of character type.
Often in embedded programming (but not limited to) there is a need to serialize some arbitrary struct in order to send it over some communication channel or write to some memory.
Example
Let's consider a structure composed of different data types in a N-aligned memory region:
struct
{
float a;
uint8_t b;
uint32_t c;
} s;
Now let's assume we have a library function
void write_to_eeprom(uint32_t *data, uint32_t len);
which is taking the pointer to data to be written as a uint32_t*. Now we would like to write s to the eeprom using this function. A naive approach would be to do something like
write_to_eeprom((uint32_t*)&s, sizeof(s)/4);
But it is a clear violation of the strict aliasing rule.
Second example
struct
{
uint32_t a;
uint8_t b;
uint32_t c;
} s;
In this case the aliasing (uint32_t*)&s is not violating the rule, as the pointer is compatible with the pointer to the first field type, which is legal. But! The library function can be implemented such that it is doing some pointer arithmetic to iterate the input data, while this arithmetic resulting pointers are incompatible with the data they are pointing to (for example data+1 is the pointer of type uint32_t*, but it might point to the uint8_t field). Which again a violation of the rule, as I understand it.
Possible solution?
Wrap the problematic structure in a union with array of the desired type:
union
{
struct_type s;
uint32_t array[sizeof(struct_type) / 4];
} u;
And pass the u.array to the library function.
Is this the right way to do this? Is this the only right way to do this? What could be some other approaches?
Just a note I am not entirely sure but it can be that it is not always safe to cast uint8_t* to char*(here).
Regardless, what does the last parameter of your write function want, number of bytes to write - or number of uint32_t elements? Let's assume later, and also assume you want to write each member of the struct to separate integer. You can do this:
uint32_t dest[4] = {0};
memcpy(buffer, &s.a, sizeof(float));
memcpy(buffer+1, &s.b, sizeof(uint8_t));
memcpy(buffer+2, &s.c, sizeof(uint32_t));
write_to_eeprom(buffer, 3 /* Nr of elements */);
If you want to copy the structure elements to the integer array consecutively - you can first copy the structure member to a byte array consecutively - and then copy the byte array to the uint32_t array. And also pass number of bytes as last parameter which would be - sizeof(float)+sizeof(uint8_t)+sizeof(uint32_t)
Consider that writing to eeprom is often slower, sometimes a lot slower, than writing to normal memory, that using an intervening buffer is rarely a performance drag. I realize this goes against this comment, yet I feel it deserves consideration as it handles all other C concerns
Write a helper function that has no alignment, aliasing nor size issues
extern void write_to_eeprom(/* I'd expect const */ uint32_t *data, uint32_t len);
// Adjust N per system needs
#define BYTES_TO_EEPROM_N 16
void write_bytes_to_eeprom(const void *ptr, size_t size) {
const unsigned char *byte_ptr = ptr;
union {
uint32_t data32[BYTES_TO_EEPROM_N / sizeof (uint32_t)];
unsigned char data8[BYTES_TO_EEPROM_N];
} u;
while (size >= BYTES_TO_EEPROM_N) {
memcpy(u.data8, byte_ptr, BYTES_TO_EEPROM_N); // **
byte_ptr += BYTES_TO_EEPROM_N;
write_to_eeprom(u.data32, BYTES_TO_EEPROM_N / sizeof (uint32_t));
size -= BYTES_TO_EEPROM_N;
}
if (size > 0) {
memcpy(u.data8, byte_ptr, size);
while (size % sizeof (uint32_t)) {
u.data8[size++] = 0; // zero fill
}
write_to_eeprom(u.data32, (uint32_t) size);
}
}
// usage - very simple
write_bytes_to_eeprom(&s, sizeof s);
** Could use memcpy(u.data32, byte_ptr, BYTES_TO_EEPROM_N); to handle #zwol issue.
I know that memcmp() cannot be used to compare structs that have not been memset() to 0 because of uninitialized padding. However, in my program I have a struct with a few different types at the start, then several dozen of the same type until the end of the struct. My thought was to manually compare the first few types, then use a memcmp() on the remaining contiguous memory block of same typed members.
My question is, what does the C standard guarantee about structure padding? Can I reliably achieve this on any or all compilers? Does the C standard allow struct padding to be inserted between same type members?
I have implemented my proposed solution, and it seems to work exactly as intended with gcc:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
struct foo
{
char a;
void *b;
int c;
int d;
int e;
int f;
};
static void create_struct(struct foo *p)
{
p->a = 'a';
p->b = NULL;
p->c = 1;
p->d = 2;
p->e = 3;
p->f = 4;
}
static int compare(struct foo *p1, struct foo *p2)
{
if (p1->a != p2->a)
return 1;
if (p1->b != p2->b)
return 1;
return
/* Note the typecasts to char * so we don't get a size in ints. */
memcmp(
/* A pointer to the start of the same type members. */
&(p1->c),
&(p2->c),
/* A pointer to the start of the last element to be compared. */
(char *)&(p2->f)
/* Plus its size to compare until the end of the last element. */
+sizeof(p2->f)
/* Minus the first element, so only c..f are compared. */
-(char *)&(p2->c)
) != 0;
}
int main(int argc, char **argv)
{
struct foo *p1, *p2;
int ret;
/* The loop is to ensure there isn't a fluke with uninitialized padding
* being the same.
*/
do
{
p1 = malloc(sizeof(struct foo));
p2 = malloc(sizeof(struct foo));
create_struct(p1);
create_struct(p2);
ret = compare(p1, p2);
free(p1);
free(p2);
if (ret)
puts("no match");
else
puts("match");
}
while (!ret);
return 0;
}
There is no guarantee of this in the C standard. From a practical standpoint it's true as part of the ABI for every current C implementation, and there seems to be no purpose in adding padding (e.g. it could not be used for checking against buffer overflows, since a conforming program is permitted to write to the padding). But strictly speaking it's not "portable".
Sadly, there is no C standard (that I have ever heard of) that allows you to control structure padding. There is the fact that automatic allocation that is initialized like this
struct something val = { 0 };
will cause all the members in val to be initialized to 0. But the padding in between is left to the implementation.
There are compiler extensions you can use like GCC's __attribute__((packed)) to eliminate most if not all structure padding, but aside from that you may be at a loss.
I also know that without major optimizations in place, most compilers won't bother to add structure padding in most cases, which would explain why this works under GCC.
That said, if your structure members cause odd alignment issues like this
struct something { char onebyte; int fourbyte; };
they will cause the compiler to add padding after the onebyte member to satisfy the alignment requirements of the fourbyte member.
I am getting an error from my compiler as follow:
C51 COMPILER V9.01 - SN: C1ADC-HAI60D COPYRIGHT KEIL ELEKTRONIK GmbH
1987 - 2009
* WARNING C260 IN LINE 300 OF SEQUENCE.C: '=': pointer truncation
* ERROR C190 IN LINE 301 OF SEQUENCE.C: '&': not an lvalue
The following is my code:
struct myCond{
unsigned char currStatus;
unsigned char prevStatus;
unsigned int *timer;
unsigned char *flag;
}
struct myCond StatCond;
unsigned int data timerdata;
bit bdata timeflag;
void someSubroutine (void)
{
struct myCond *tempCond;
tempCond = &StatCond;
tempCond->timer = &((unsigned int)timerdata);
tempCond->flag = &((unsigned char)timeflag);
}
Are we supposed to guess which line is 301?
The problems, as I understand are here:
tempCond->timer = &((unsigned int)timerdata);
tempCond->flag = &((unsigned char)timeflag);
(unsigned int)timerdata and (unsigned char)timeflag are values, r-values to be precise. They cannot be modified or assigned unlike l-values, which plain timerdata and timeflag are. And so you can't take addresses of r-values with &. It would be the same as, say, writing &1. 1 on its own does not exist as an object in data memory.
You should write instead:
tempCond->timer = &timerdata;
tempCond->flag = (unsigned char*)&timeflag;
And I'm not quite sure that it is legal to take an address of a bit variable. The last line may fail to compile.
Perhaps redefining the structure would help:
struct myCond{
...
bit bdata *flag; // or maybe without bdata
}
and then you'd write tempCond->flag = &timeflag;.
unsigned int data timerdata; // what is 'data', is it defined?
bit bdata timerflag; // what are 'bit' and 'bdata', are they defined?
Check your code with regard to my questions above. Compiler errors are often reported multiple lines after the real offense.