Compare structs in C using memcmp() and pointer arithmetic


I know that memcmp() cannot be used to compare structs that have not been memset() to 0 because of uninitialized padding. However, in my program I have a struct with a few different types at the start, then several dozen of the same type until the end of the struct. My thought was to manually compare the first few types, then use a memcmp() on the remaining contiguous memory block of same typed members.
My question is, what does the C standard guarantee about structure padding? Can I reliably achieve this on any or all compilers? Does the C standard allow struct padding to be inserted between same type members?
I have implemented my proposed solution, and it seems to work exactly as intended with gcc:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
struct foo
{
    char a;
    void *b;
    int c;
    int d;
    int e;
    int f;
};
static void create_struct(struct foo *p)
{
    p->a = 'a';
    p->b = NULL;
    p->c = 1;
    p->d = 2;
    p->e = 3;
    p->f = 4;
}
static int compare(struct foo *p1, struct foo *p2)
{
    if (p1->a != p2->a)
        return 1;
    if (p1->b != p2->b)
        return 1;
    return
        /* Note the typecasts to char * so we don't get a size in ints. */
        memcmp(
            /* A pointer to the start of the same type members. */
            &(p1->c),
            &(p2->c),
            /* A pointer to the start of the last element to be compared. */
            (char *)&(p2->f)
            /* Plus its size to compare until the end of the last element. */
            +sizeof(p2->f)
            /* Minus the first element, so only c..f are compared. */
            -(char *)&(p2->c)
        ) != 0;
}
int main(int argc, char **argv)
{
    struct foo *p1, *p2;
    int ret;
    /* The loop is to ensure there isn't a fluke with uninitialized padding
     * being the same.
     */
    do
    {
        p1 = malloc(sizeof(struct foo));
        p2 = malloc(sizeof(struct foo));
        create_struct(p1);
        create_struct(p2);
        ret = compare(p1, p2);
        free(p1);
        free(p2);
        if (ret)
            puts("no match");
        else
            puts("match");
    }
    while (!ret);
    return 0;
}

There is no guarantee of this in the C standard. From a practical standpoint it's true as part of the ABI for every current C implementation, and there seems to be no purpose in adding padding (e.g. it could not be used for checking against buffer overflows, since a conforming program is permitted to write to the padding). But strictly speaking it's not "portable".

Sadly, nothing in the C standard (that I have ever heard of) lets you control structure padding. An automatic object initialized like this
struct something val = { 0 };
will have all of its members initialized to 0, but the contents of the padding between them are left to the implementation.
There are compiler extensions you can use, like GCC's __attribute__((packed)), to eliminate most if not all structure padding, but aside from that you may be at a loss.
Padding is dictated by the target ABI's alignment rules rather than by the optimization level. On common ABIs, consecutive members of the same type are already correctly aligned, so GCC inserts no padding between them, which would explain why this works under GCC.
That said, if your structure members cause odd alignment issues like this
struct something { char onebyte; int fourbyte; };
they will cause the compiler to add padding after the onebyte member to satisfy the alignment requirements of the fourbyte member.

Related

C vs C++ placing structs in unsigned char buffer

Does C have anything similar to C++, where one can place structs in an unsigned char buffer, as is done in the C++ standard's example in sec. 6.7.2?
template<typename ...T>
struct AlignedUnion {
    alignas(T...) unsigned char data[max(sizeof(T)...)];
};
int f() {
    AlignedUnion<int, char> au;
    int *p = new (au.data) int; // OK, au.data provides storage
    char *c = new (au.data) char(); // OK, ends lifetime of *p
    char *d = new (au.data + 1) char();
    return *c + *d; // OK
}
In C I can certainly memcpy a struct of things (or an int, as shown above) into an unsigned char buffer, but then accessing it through a pointer to that struct runs into strict aliasing violations: the buffer has a different declared type.
So suppose one wanted to replicate the second line of f from the C++ above in C. One would do something like this:
#include<string.h>
#include<stdio.h>
struct Buffer {
    unsigned char data[sizeof(int)];
};
int main()
{
    struct Buffer b;
    int n = 5;
    int* p = memcpy(&b.data,&n,sizeof(int));
    printf("%d",*p); // aliasing violation here as unsigned char is accessed as int
    return 0;
}
Unions are often suggested, i.e. union Buffer {int i; unsigned char b[sizeof(int)];};, but this is not quite as nice if the aim of the buffer is to act as storage (i.e. placing differently sized types in it by advancing a pointer into the buffer to the free part, plus potentially some more for proper alignment).

Have you tried using a union?
#include <string.h>
#include <stdio.h>
union Buffer {
    int int_;
    double double_;
    long double long_double_;
    unsigned char data[1];
};
int main() {
    union Buffer b;
    int n = 5;
    int *p = memcpy(&b.data, &n, sizeof(int));
    printf("%d", *p); // aliasing violation here as unsigned char is accessed as int
    return 0;
}
The Buffer union aligns the data member according to the member type with the greatest alignment requirement.

Yes: because of the strict aliasing rule, it is simply not possible, just as it is not possible to write a standard-compliant malloc().
Your buffer is not aligned: alignas(int) from stdalign.h needs to be added.
If you want to protect against compiler optimizations, either:
just cast the pointer and access it, and compile with -fno-strict-aliasing (or use volatile),
or move the accessor for the buffer to another file that is compiled without LTO, so that the compiler simply is not able to optimize it.
// mybuffer.h
void *getbuffer(void);
// mybuffer.c
#include <stdalign.h>
alignas(int) unsigned char buffer[sizeof(int)];
void *getbuffer(void) { return buffer; }
// main.c
#include <string.h>
#include <stdio.h>
#include "mybuffer.h"
int main(void) {
    void *data = getbuffer();
    // int *p = new (au.data) int; // OK, au.data provides storage
    int *p = data;
    // char *c = new (au.data) char(); // OK, ends lifetime of *p
    char *c = data;
    *c = 0;
    // char *d = new (au.data + 1) char();
    char *d = (char*)data + 1;
    *d = 0;
    return *c + *d;
}

The way the definition of Effective Type in 6.5p6 is written, it's unclear what it's supposed to mean in all corner cases, likely because there was never a consensus among Committee Members as to how all corner cases should be handled. Defect reports often add more confusion than clarity, since they use terms like the "active member" of a union when neither the Standard nor the defect reports specify what actions would set or change it.
If one wants to use an object of static or automatic duration as though it were a buffer without a declared type, a safe way of doing that should be to do something like the following:
void volatile *volatile dummy_vp;
void test(void)
{
    union {
        char dat[1000];
        unsigned long force_alignment;
    } buffer;
    void *volatile launder = buffer.dat;
    dummy_vp = &launder;
    void *storage_blob = launder;
    ...
}
Unless an implementation goes out of its way to test whether the read of launder happened to yield an address matching buffer.dat, it would have no way of knowing whether the object at that address had a declared type. Nothing in the Standard would forbid an implementation from behaving nonsensically if the address happened to match that of buffer.dat, but situations where performance improvements would justify the cost of the check aren't likely to be common enough for compilers to attempt such "optimization".

Is it possible to simulate C99 lvalue array initialization in C90?

Context:
I am experimenting with functional programming patterns in C90.
Goal:
This is what I'm trying to achieve in ISO C90:
struct mut_arr tmp = {0};
/* ... */
struct arr const res_c99 = {tmp};
Initializing a const struct member of type struct mut_arr with an lvalue (tmp).
#include <stdio.h>
enum
{
    MUT_ARR_LEN = 4UL
};
struct mut_arr
{
    unsigned char bytes[sizeof(unsigned char const) * MUT_ARR_LEN];
};
struct arr {
    struct mut_arr const byte_arr;
};
static struct arr map(struct arr const* const a,
                      unsigned char (*const op)(unsigned char const))
{
    struct mut_arr tmp = {0};
    size_t i = 0UL;
    for (; i < sizeof(tmp.bytes); ++i) {
        tmp.bytes[i] = op(a->byte_arr.bytes[i]);
    }
    struct arr const res_c99 = {tmp};
    return res_c99;
}
static unsigned char op_add_one(unsigned char const el)
{
    return el + 1;
}
static unsigned char op_print(unsigned char const el)
{
    printf("%u", el);
    return 0U;
}
int main() {
    struct arr const a1 = {{{1, 2, 3, 4}}};
    struct arr const a2 = map(&a1, &op_add_one);
    map(&a2, &op_print);
    return 0;
}
This is what I tried in C90:
#include <stdio.h>
#include <string.h>
enum {
    MUT_ARR_LEN = 4UL
};
struct mut_arr {
    unsigned char bytes[sizeof(unsigned char const) * MUT_ARR_LEN];
};
struct arr {
    struct mut_arr const byte_arr;
};
struct arr map(struct arr const* const a,
               unsigned char (*const op)(unsigned char const))
{
    struct arr const res = {0};
    unsigned char(*const res_mut_view)[sizeof(res.byte_arr.bytes)] =
        (unsigned char(*const)[sizeof(res.byte_arr.bytes)]) & res;
    struct mut_arr tmp = {0};
    size_t i = 0UL;
    for (; i < sizeof(tmp.bytes); ++i) {
        tmp.bytes[i] = op(a->byte_arr.bytes[i]);
    }
    memcpy(res_mut_view, &tmp.bytes[0], sizeof(tmp.bytes));
    return res;
}
unsigned char op_add_one(unsigned char const el) { return el + 1; }
unsigned char op_print(unsigned char const el) {
    printf("%u", el);
    return 0U;
}
int main() {
    struct arr const a1 = {{{1, 2, 3, 4}}};
    struct arr const a2 = map(&a1, &op_add_one);
    map(&a2, &op_print);
    return 0;
}
All I do is create an "alternate view" (making it essentially writable): I cast the address of res to unsigned char(*const)[sizeof(res.byte_arr.bytes)].
Then, I use memcpy to copy the contents of tmp to res.
I also tried to use a nested block scope to avoid having to initialize res at the beginning of the function.
But it does not help, since in C90 the initializers in a brace-enclosed initializer list must be constant expressions and cannot be evaluated at run time.
This works, but it is not anything like the C99 solution above.
Is there perhaps a more elegant way to pull this off?
PS: Preferably, the solution should be as portable as possible, too. (No heap allocations, only static allocations. It should remain thread-safe. These programs above seem to be, as I only use stack allocation.)

Union it.
#include <stdio.h>
#include <string.h>
enum {
    MUT_ARR_LEN = 4UL
};
struct mut_arr {
    unsigned char bytes[sizeof(unsigned char) * MUT_ARR_LEN];
};
struct arr {
    const struct mut_arr byte_arr;
};
struct arr map(const struct arr *a, unsigned char (*op)(unsigned char)) {
    union {
        struct mut_arr tmp;
        struct arr arr;
    } u;
    size_t i = 0;
    for (; i < sizeof(u.tmp.bytes); ++i) {
        u.tmp.bytes[i] = op(a->byte_arr.bytes[i]);
    }
    return u.arr;
}
unsigned char op_add_one(unsigned char el) {
    return el + 1;
}
unsigned char op_print(unsigned char el) {
    printf("%u", el);
    return 0U;
}
int main() {
    const struct arr a1 = {{{1, 2, 3, 4}}};
    const struct arr a2 = map(&a1, &op_add_one);
    map(&a2, &op_print);
    return 0;
}
Here are some relevant excerpts from the C89 draft at https://port70.net/~nsz/c/c89/c89-draft.html :
One special guarantee is made in order to simplify the use of unions: If a union contains several structures that share a common initial sequence, and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them. Two structures share a common initial sequence if corresponding members have compatible types for a sequence of one or more initial members.
Two types have compatible type if their types are the same.
For two qualified types to be compatible, both shall have the identically qualified version of a compatible type;
The idea is that the "common initial sequence" of mut_arr and arr is unsigned char [sizeof(unsigned char) * MUT_ARR_LEN], so you can access one through the other.
However, as I read it now, it is unclear whether "initial sequence if corresponding members" includes nested struct members or not. So technically, to be strictly standard-compliant, you would write:
struct arr map(const struct arr *a, unsigned char (*op)(unsigned char)) {
    struct mutmut_arr {
        struct mut_arr byte_arr;
    };
    union {
        struct mutmut_arr tmp;
        struct arr arr;
    } u;
    size_t i = 0;
    for (; i < sizeof(u.tmp.byte_arr.bytes); ++i) {
        u.tmp.byte_arr.bytes[i] = op(a->byte_arr.bytes[i]);
    }
    return u.arr;
}
Subjectively, I do want to note two things.
The placement of the const type qualifier in your code is very confusing. It's typical in C to write const <type>, not <type> const, and to attach * to the name on the right, with a space on the left. I was not able to read your code efficiently at all, so I removed almost all const from the code above.
Creating an interface like the one presented will be a pain with no great benefits, and with a lot of edge cases with undefined behavior lurking around the corner. In C, trust the programmer: it's one of the principles of the C programming language. Do not prevent the programmer from doing what has to be done (initializing a structure member). I would advise making the member mutable, having one structure definition, and calling it a day. const-qualified structure members are usually just hard to deal with, with no big benefits.

My answer might sound outrageous at first glance. It is
STOP WHAT YOU ARE DOING, NOW!
I will take my time to explain and give you a glimpse into your future (which is dim, if you pursue this idea) and try to convince you. But the gist of my answer is the bold line above.
Your prototype omits crucial parts needed for a lasting solution to your "functional programming in C" approach. For example, you only have arrays of bytes (unsigned char). But for a "real" solution for "real" programmers, you need to consider different types. If you go to hoogle (Haskell's online type and function search engine), you will notice that fmap, which is the functional feature you are trying to achieve in C, is defined as:
fmap :: Functor f => (a -> b) -> f a -> f b
This means the mapping is not always from type a to type a. It's a functor thingy you are trying to offer your fellow C programmers. So an array of element type a needs to be mapped to an array of element type b. Hence, your solution needs to offer more than just arrays of bytes.
In C, arrays can reside in different types of memory and we cannot hide this very well. (In real functional languages, memory management is abstracted away for the most part and you just do not care.) But in C, you must care. The users of your library must care, and you need to allow them to dutifully care. Arrays can be global, on the stack, on the heap, in shared memory, ... and you need to offer a solution allowing all that. Otherwise, it will always just be a toy, propagating an illusion that "it is possible and useful".
So, allowing just for arrays of different, custom types (someone will want arrays of arrays of a type as well, mind you!) and for awareness of memory management, what might the header file of your next evolution look like? Here is what I came up with:
#ifndef __IMMUTABLE_ARRAY_H
#define __IMMUTABLE_ARRAY_H
#include <stdint.h>
#include <stdlib.h>
#include <stdatomic.h>
// lacking namespaces or similar facilities in C, we use
// the prefix IA (Immutable Array) in front of all the stuff
// declared in this header.
// Wherever you see a naked `int`, think "bool".
// 0 -> false, 1 -> true.
// We do not like stdbool.h because sometimes trouble
// ensues in mixed C/C++ code bases on some targets, where
// sizeof(C-bool) != sizeof(C++-bool) o.O. So we cannot use
// C-bool in headers...
// We need storage classes!
// There are arrays on heap, static (global arrays),
// automatic arrays (on stack, maybe by using alloca),
// arrays in shared memory, ....
// For those different locations, we need to be able to
// perform different actions, e.g. for cleanup.
// IAStorageClass_t defines the behavior for a specific
// storage class.
// There is also the case of an array of arrays to consider...
// where we would need to clean up each member of the array
// once the array goes out of scope.
struct IAArray_tag;
typedef struct IAArray_tag IAArray_t;
typedef struct IAStorageClass_tag IAStorageClass_t;
typedef int (*IAArrayAllocator) (IAStorageClass_t* sclass,
size_t elementSize,
size_t capacity,
void* maybeStorage,
IAArray_t* target);
typedef void (*IAArrayDeleter) (IAArray_t* arr);
typedef void (*IAArrayElementDeleter) (IAArray_t* arr);
typedef int64_t (*IAArrayAddRef) (IAArray_t* arr);
typedef int64_t (*IAArrayRelease) (IAArray_t* arr);
typedef struct IAStorageClass_tag {
    IAArrayAllocator allocator;
    IAArrayDeleter deleter;
    IAArrayElementDeleter elementDeleter;
    IAArrayAddRef addReffer;
    IAArrayRelease releaser;
} IAStorageClass_t;
enum IAStorageClassID_tag {
    IA_HEAP_ARRAY = 0,
    IA_STACK_ARRAY = 1,
    IA_GLOBAL_ARRAY = 2,
    IA_CUSTOM_CLASSES_BEGIN = 100
};
typedef enum IAStorageClassID_tag IAStorageClassID_t;
// creates the default storage classes (for heap and automatic).
void IAInitialize();
void IATerminate();
// returns a custom and dedicated identifier of the storage class.
int32_t
IARegisterStorageClass
(IAArrayAllocator allocator,
IAArrayDeleter deleter,
IAArrayElementDeleter elementDeleter,
IAArrayAddRef addReffer,
IAArrayRelease releaser);
struct IAArray_tag {
    const IAStorageClass_t* storageClass;
    int64_t refCount;
    size_t elementSize; // Depends on the type you want to store
    size_t capacity;
    size_t length;
    void* data;
};
// to make sure, uninitialized array variables are properly
// initialized to a harmless state.
IAArray_t IAInitInstance();
// allows to check if we ran into some uninitialized instance.
// In C++, this would be like after default constructor.
// See IAInitInstance().
int IAIsArray(IAArray_t* arr);
int
IAArrayCreate
(int32_t storageClassID,
size_t elementSize, // the elementSize SHALL be padded to
// a system-acceptable alignment size.
size_t capacity,
size_t size,
void* maybeStorage,
IAArray_t* target);
typedef
int
(*IAInitializerWithIndex_t)
(size_t index,
void* elementPtr);
int
IAArrayCreateWithInitializer
(int32_t storageClassID,
size_t elementSize,
size_t capacity,
void* maybeStorage,
IAInitializerWithIndex_t initializer,
IAArray_t* target);
IAArray_t* IAArrayAddReference(IAArray_t* arr);
void IAArrayReleaseReference(IAArray_t* arr);
// The one and only legal way to access elements within the array.
// Shortcutters, clever guys and other violators get hung, drawn
// and quartered!
const void * const IAArrayAccess(IAArray_t* arr, size_t index);
typedef void (*IAValueMapping_t)
(size_t index,
void* sourceElementPtr,
size_t sourceElementSize,
void* targetElementPtr,
size_t targetElementSize);
size_t IAArraySize(IAArray_t* arr);
size_t IAArrayCapacity(IAArray_t* arr);
size_t IAArrayElementSize(IAArray_t* arr);
// Because of reasons, we sometimes want to recycle
// an array and populate it with new values.
// This can only be referentially transparent and safe,
// if there are no other references to this array stored
// anywhere. i.e. if refcount == 1.
// If our app code passed the array around to other functions,
// some nasty ones might sneakily store themselves a pointer
// to an array and then the refcount > 1 and we cannot
// safely recycle the array instance.
// Then, we have to release it and create ourselves a new one.
int IACanRecycleArray(IAArray_t* arr);
// Starship troopers reporter during human invasion
// of bug homeworld: "It is an ugly planet, a bug planet!"
// This is how we feel about C. Map needs some noisy extras,
// just because C does not allow to build new abstractions with
// types. Yes, we could send Erich Gamma our regards and pack
// all the noise into some IAArrayFactory * :)
int
IAArrayMap(IAValueMapping_t mapping,
IAArray_t* source,
int32_t targetStorageClassID,
size_t targetElementSize,
void* maybeTargetStorage,
IAArray_t* target);
#endif
Needless to say, I did not bother to implement my cute immutable-array.h in my still-empty immutable-array.c.
But once we did, the joy would begin and we could write robust, functional C programs, yes? No! This is what well-written functional C application code using those arrays might look like:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <stdatomic.h>
#include <math.h>
#include <assert.h>
#include "immutable-array.h"
typedef struct F64FloorResult_tag {
double div;
double rem;
} F64FloorResult_t;
void myFloor(double number, F64FloorResult_t* result) {
if (NULL != result) {
result->div = floor(number);
result->rem = number - result->div;
}
}
int randomDoubleInitializer(size_t index, double* element) {
if (NULL != element) {
*element = ((double)rand()) / (double)RAND_MAX;
return 1;
}
return 0;
}
void
doubleToF64FloorMapping
(size_t index,
double* input,
size_t inputElementSize,
F64FloorResult_t *output,
size_t outputElementSize) {
assert(sizeof(double) == inputElementSize);
assert(sizeof(F64FloorResult_t) == outputElementSize);
assert(NULL != input);
assert(NULL != output);
myFloor(*input, output);
}
int main(int argc, const char* argv[]) {
IAInitialize();
{
double sourceData[20];
IAArray_t source = IAInitInstance();
if (IAArrayCreateWithInitializer
((IAStorageClassID_t)IA_STACK_ARRAY,
sizeof(double),
20,
&sourceData[0],
(IAInitializerWithIndex_t)randomDoubleInitializer,
&source)) {
IAArray_t result = IAInitInstance();
F64FloorResult_t resultData[20];
if (IAArrayMap
((IAValueMapping_t)doubleToF64FloorMapping,
&source,
(int32_t)IA_STACK_ARRAY,
sizeof(F64FloorResult_t),
&resultData[0],
&result)) {
assert(IAArraySize(&source) == IAArraySize(&result));
for (size_t index = 0;
index < IAArraySize(&source);
index++) {
const double* const ival =
(const double* const)IAArrayAccess(&source, index);
const F64FloorResult_t* const oval =
(const F64FloorResult_t* const)
IAArrayAccess(&result,index);
printf("(%g . #S(f64floorresult_t :div %g :rem %g))\n",
*ival, oval->div, oval->rem);
}
IAArrayReleaseReference(&result);
}
IAArrayReleaseReference(&source);
}
}
IATerminate();
return 0;
}
I see already the knives coming out of the satchels of your colleagues if you try to impose such a monstrosity upon them. They will hate you, you will hate yourself. Eventually, you will hate that you ever had the idea to even try.
Especially when, in a more suitable language such as Common Lisp, the same code might look like this:
(map 'list #'(lambda (x) (multiple-value-list (floor x)))
(loop repeat 20
for x = (random 1.0)
collecting x))

type-punning a char array struct member

Consider the following code:
typedef struct { char byte; } byte_t;
typedef struct { char bytes[10]; } blob_t;
int f(void) {
    blob_t a = {0};
    *(byte_t *)a.bytes = (byte_t){10};
    return a.bytes[0];
}
Does this give aliasing problems in the return statement? On the one hand, a.bytes designates an object (a char array) whose type does not alias the byte_t used in the assignment; on the other hand, the [0] part dereferences a type (char) that does alias.
I can construct a slightly larger example where gcc -O1 -fstrict-aliasing does make the function return 0, and I'd like to know if this is a gcc bug, and if not, what I can do to avoid this problem (in my real-life example, the assignment happens in a separate function so that both functions look really innocent in isolation).
Here is a longer more complete example for testing:
#include <stdio.h>
typedef struct { char byte; } byte_t;
typedef struct { char bytes[10]; } blob_t;
static char *find(char *buf) {
    for (int i = 0; i < 1; i++) { if (buf[0] == 0) { return buf; }}
    return 0;
}
void patch(char *b) {
    *(byte_t *) b = (byte_t) {10};
}
int main(void) {
    blob_t a = {0};
    char *b = find(a.bytes);
    if (b) {
        patch(b);
    }
    printf("%d\n", a.bytes[0]);
}
Building with gcc -O1 -fstrict-aliasing prints 0 instead of 10.

The main issue here is that those two structs are not compatible types, so there can be various problems with alignment and padding.
That issue aside, the standard 6.5/7 only allows for this (the "strict aliasing rule"):
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
...
an aggregate or union type that includes one of the aforementioned types among its members
Looking at *(byte_t *)a.bytes, a.bytes has the effective type char[10], and each individual element of that array in turn has the effective type char. You dereference that through byte_t, which is neither a compatible struct type nor a type with a char[10] among its members. It does have a char member, though.
The standard is not exactly clear on how to treat an object whose effective type is an array. If you read the above part strictly, then your code does indeed violate strict aliasing, because you access a char[10] through a struct that doesn't have a char[10] member. I'd also be a bit concerned about the compiler padding either struct to meet alignment.
Generally, I'd simply advise against doing fishy things like this. If you need type punning, then use a union. And if you wish to use raw binary data, then use uint8_t instead of the potentially signed & non-portable char.

The error is in *(byte_t *)a.bytes = (byte_t){10};. The C spec has a special rule about character types (6.5§7), but that rule only applies when using character type to access any other type, not when using any type to access a character.

According to the Standard, the syntax array[index] is shorthand for *((array)+(index)). Thus, p->array[index] is equivalent to *((p->array) + (index)), which uses the address of p to compute the address of p->array, and then without regard for p's type, adds index (scaled by the size of the array-element type), and then dereferences the resulting pointer to yield an lvalue of the array-element type. Nothing in the wording of the Standard would imply that an access via the resulting lvalue is an access to an lvalue of the underlying structure type. Thus, if the struct member is an array of character type, the constraints of N1570 6.5p7 would allow an lvalue of that form to access storage of any type.
The maintainers of some compilers such as gcc, however, appear to view the laxity of the Standard there as a defect. This can be demonstrated via the code:
struct s1 { char x[10]; };
struct s2 { char x[10]; };
union s1s2 { struct s1 v1; struct s2 v2; } u;
int read_s1_x(struct s1 *p, int i)
{
    return p->x[i];
}
void set_s2_x(struct s2 *p, int i, int value)
{
    p->x[i] = value;
}
__attribute__((noinline))
int test(void *p, int i)
{
    if (read_s1_x(p, 0))
        set_s2_x(p, i, 2);
    return read_s1_x(p, 0);
}
#include <stdio.h>
int main(void)
{
    u.v2.x[0] = 1;
    int result = test(&u, 0);
    printf("Result = %d / %d", result, u.v2.x[0]);
}
The code abides by the constraints in N1570 6.5p7 because all accesses to any portion of u are performed using lvalues of character type. Nonetheless, the code generated by gcc will not allow for the possibility that the storage accessed by ((struct s1 *)p)->x[0] might also be accessed by ((struct s2 *)p)->x[i], despite the fact that both accesses use lvalues of character type.

Overlay struct to arbitrary buffer

I'm a "new" C programmer, but an old assembly programmer, and have been searching for an answer for a few days.
I'm trying to parse multiple fields in a message using a C struct (it's a LORA radio with an embedded RTU modbus packet).
I have this example code that shows my question:
#include <stdio.h>
#include <stdint.h>
struct MessageTable{
    uint8_t msg_id;
    uint8_t from;
    uint8_t to;
    unsigned flags1 : 1;
    unsigned retransmitted : 1;
    unsigned hops : 4;
    union {
        unsigned long millisecs;
        unsigned char bytes[sizeof(unsigned long)];
    } ms;
};
struct MessageTable message, *mp;
struct MessageTable message_table[8] = {0};
char buf[256];
void main(void) {
    int i;
    for (i=0; i<255; i++)
        buf[i] = i;
    mp = (struct MessageTable) &buf;
    printf("To: %u, From: %u", mp->to, mp->from);
}
When I try to compile I get:
question.c: In function ‘main’:
question.c:27:18: error: conversion to non-scalar type requested
27 | mp = (struct MessageTable) &buf;
| ^~~~~~~~~~~~
What I'm attempting to do is overlay the struct on the buffer at some arbitrary position, for named access to the different fields instead of using hard-coded offsets (i.e. to = buf[2]; and retransmitted = buf[3] & 0x02;).
What is the clean, readable, appropriate way to do this?
NOTE: there will be multiple structs at different buf positions (LORA routing, Modbus Send, Modbus Rx, Modbus err, etc...)
and, this is straight C, not C++.
I don't care if the buffer "runs off" the end of the struct, the code constructs take care of that.

First to address your error message on this line:
mp = (struct MessageTable) &buf;
Here you're attempting to convert &buf, which has type char (*)[256] i.e. a pointer to an array, to a struct MessageTable which is not a pointer type. Arrays in most contexts decay to a pointer to the first element, so you don't need to take its address, and you need to cast it to a pointer type:
mp = (struct MessageTable *)buf;
The other issues, however, are:
The struct might not be exactly the size you expect.
The order of bitfields may not be what you expect.
If the buffer is not properly aligned for the fields in the struct, you could generate a fault.

You have two problems in:
mp = (struct MessageTable) &buf;
The first is that buf already converts to a pointer to its first element through array-to-pointer conversion, so you do not need to take its address. C11 Standard - 6.3.2.1 Other Operands - Lvalues, arrays, and function designators(p3)
The second problem is that you are casting to struct MessageTable instead of a pointer to struct MessageTable. You can correct both with:
mp = (struct MessageTable*) buf;
Also, unless you are programming in a freestanding environment (without the benefit of any OS), in a standards-conforming implementation the allowable declarations for main are int main (void) and int main (int argc, char *argv[]) (which you will see written with the equivalent char **argv). See: C11 Standard - §5.1.2.2.1 Program startup(p1). See also: What should main() return in C and C++? In a freestanding environment, the name and type of the function called at program startup are implementation-defined. See: C11 Standard - 5.1.2.1 Freestanding environment
Putting it all together, you would have:
#include <stdio.h>
#include <stdint.h>
struct MessageTable{
    uint8_t msg_id;
    uint8_t from;
    uint8_t to;
    unsigned flags1 : 1;
    unsigned retransmitted : 1;
    unsigned hops : 4;
    union {
        unsigned long millisecs;
        unsigned char bytes[sizeof(unsigned long)];
    } ms;
};
struct MessageTable message, *mp;
struct MessageTable message_table[8] = {0};
char buf[256];
int main(void) {
    int i;
    for (i=0; i<255; i++)
        buf[i] = i;
    mp = (struct MessageTable*) buf;
    printf("To: %u, From: %u", mp->to, mp->from);
}
Example Use/Output
$ ./bin/struct_buf_overlay
To: 2, From: 1

C struct fields are, by default, not guaranteed to be immediately adjacent to one another, and furthermore the order in which bitfields are allocated within their storage unit is implementation-defined. Implementations are permitted to choose bitfield order and to insert padding in order to efficiently meet system memory alignment requirements. If you need to guarantee that struct fields are positioned in memory immediately adjacent to one another (without padding) and in the order you specified, you need to look up how to tell your compiler to create a packed struct. This is not standard C (but it's necessary to ensure that what you're trying to accomplish will work; it might work otherwise, but that is not guaranteed), and each compiler has its own way of doing it.

using #define for defining struct objects

I came across this simple program somewhere
#include<stdio.h>
#include<stdlib.h>
char buffer[2];
struct globals {
    int value;
    char type;
    long tup;
};
#define G (*(struct globals*)&buffer)
int main ()
{
    G.value = 233;
    G.type = '*';
    G.tup = 1234123;
    printf("\nValue = %d\n",G.value);
    printf("\ntype = %c\n",G.type);
    printf("\ntup = %ld\n",G.tup);
    return 0;
}
It's compiling (using gcc) and executing well and I get the following output:
Value = 233
type = *
tup = 1234123
I am not sure how the #define G statement works.
How is G defined as an object of type struct globals?

First, this code has undefined behavior, because it re-interprets a two-byte array as a much larger struct. Therefore, it is writing past the end of the allocated space. You could make your program valid by using the size of the struct to declare the buffer array, like this:
struct globals {
    int value;
    char type;
    long tup;
};
char buffer[sizeof(struct globals)];
The #define works in its usual way, by providing textual substitution of the token G, as if you ran a search-and-replace in your favorite text editor. The preprocessor, which runs before the compiler proper, finds every occurrence of the token G and replaces it with (*(struct globals*)&buffer).
Once the preprocessor is done, the compiler sees this code:
int main ()
{
    (*(struct globals*)&buffer).value = 233;
    (*(struct globals*)&buffer).type = '*';
    (*(struct globals*)&buffer).tup = 1234123;
    printf("\nValue = %d\n",(*(struct globals*)&buffer).value);
    printf("\ntype = %c\n",(*(struct globals*)&buffer).type);
    printf("\ntup = %ld\n",(*(struct globals*)&buffer).tup);
    return 0;
}

The macro simply casts the address of the 2-character array buffer into a pointer to the appropriate structure type, then dereferences that to produce a struct-typed lvalue. That's why the dot (.) member-access operator works on G.
No idea why anyone would do this. I would think it much cleaner to convert to/from the character array when that is needed (which is "never" in the example code, but presumably it's used somewhere in the larger original code base), or use a union to get rid of the macro.
union {
    struct {
        int value;
        /* ... */
    } s;
    char c[2];
} G;
G.s.value = 233; /* and so on */
is both cleaner and clearer. Note that the char array is too small.
