Why is initializing C union using "designated initializer" giving random values? - c

I had a "bug" which I spent quite a while chasing:
typedef union {
struct {
uint8_t mode: 1;
uint8_t texture: 4;
uint8_t blend_mode: 2;
};
uint8_t key;
} RenderKey;
Later this union would be initialized (on stack):
Buffers buffers[128]; // initialized somewhere else
void Foo(int a, int b)
{
//C99 style initialization (all the other values should be 0)
RenderKey rkey = {.blend_mode = 1};
//rkey.key would sometimes be >= 128 thus would write out of array bounds
DoStuffWithBuffer(&buffers[rkey.key]);
}
This seemed to indicate that the last bit of the union's bit-field wasn't being initialized. So I fixed it by adding the unused bit:
typedef union {
struct {
uint8_t mode: 1;
uint8_t texture: 4;
uint8_t blend_mode: 2;
uint8_t unused: 1;
};
uint8_t key;
} RenderKey;
This works, but I don't understand WHY exactly.
That random bit comes from garbage previously left on the stack, but why isn't the C99-style initialization working here? Is it because of the union and the anonymous struct?
This happens on Clang 3.5 and tcc, but not on gcc 4.9.2.

In C11 it is stated at §6.7.9 that
The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject; all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.
But the hidden padding bit is not a subobject, so this rule does not apply to it: from the anonymous struct's point of view it doesn't exist. The compiler is simply not initializing something that is not a member of the struct, which isn't that strange after all. Reading rkey.key then picks up whatever value that bit happened to have on the stack.
A similar example would be to have something like
#include <stdio.h>
#include <stddef.h> /* offsetof */
typedef struct {
unsigned char foo;
float value;
} Test;
int main(void) {
Test test = { .foo = 'a', .value = 1.2f};
printf("We expect 8 bytes: %zu\n", sizeof(Test));
printf("We expect 0: %zu\n", offsetof(Test, foo));
printf("We expect 4: %zu\n", offsetof(Test, value));
unsigned char* test_ptr = (unsigned char*) &test;
printf("value of 3rd byte: %d\n", test_ptr[2]); /* a padding byte: its value is unspecified */
return 0;
}
What would you expect test_ptr[2] to be? On a typical platform where float is 4-byte aligned, there are 3 bytes of padding between the two members of the struct. They are not part of any subobject, and initializing them would be a waste of time since in a normal scenario you can't access them.

Related

"Inheritance" in C's structs?

Here I'm a bit confused about this code:
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
struct test_struct {
uint8_t f;
uint8_t weird[];
};
int main(void) {
struct {
struct test_struct tst;
uint8_t weird[256];
} test_in = {};
printf("%u\n", test_in.weird[0]); // 0
test_in.tst.weird[0] = 1;
printf("%u\n", test_in.weird[0]); // 1
return 0;
}
I didn't know it was possible to use a struct's fields this way, so I have two questions:
What is it called in C?
And, of course, how does it work? (Why was the weird field changed when I didn't change it directly? I thought these were two different fields.)
The short answer is: the code has undefined behavior.
What is it called in C? How does it work?
struct test_struct is defined with its last member as an array of unspecified length: uint8_t weird[]. This member is called a flexible array member, not to be confused with a variable length array.
6.7.2.1 Structure and union specifiers
[...]
20     As a special case, the last member of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.
If you allocate such a structure from the heap with extra space for array elements, these elements can be accessed via the weird member up to the number of elements thus allocated.
The C Standard does not allow a structure with a flexible array member to be a member of another structure or an element of an array (C11 6.7.2.1p3); GCC and Clang accept it as an extension, as the warnings below show. In the posted code, tst is not even the last member of the enclosing struct, so accessing elements of test_in.tst.weird has undefined behavior, and so does accessing elements of test_in.weird.
The programmer also assumes that the test_in.tst.weird array and the test_in.weird array overlap exactly, which may be the case but is neither guaranteed nor supported: code relying on this type of aliasing has undefined behavior as well.
In your example, assuming the compiler accepts the empty initializer {} (standardized in C23, borrowed from C++), it seems to work as expected, but this is not guaranteed, and alignment issues may cause it to fail, as shown in the modified version below:
#include <stdint.h>
#include <stdio.h>
struct test_struct {
uint8_t f;
uint8_t weird[];
};
struct test_struct1 {
int x;
uint8_t f;
uint8_t weird[];
};
int main(void) {
struct {
struct test_struct tst;
uint8_t weird[256];
} test_in = {};
struct {
struct test_struct1 tst;
uint8_t weird[256];
} test_in1 = {};
printf("modifying test_in.weird[0]:\n");
printf("%u\n", test_in.weird[0]); // 0
test_in.tst.weird[0] = 1;
printf("%u\n", test_in.weird[0]); // 1
printf("modifying test_in1.weird[0]:\n");
printf("%u\n", test_in1.weird[0]); // 0
test_in1.tst.weird[0] = 1;
printf("%u\n", test_in1.weird[0]); // 0?
return 0;
}
Output:
chqrlie$ make 220930-flexible.run
clang -O3 -std=c11 -Weverything -o 220930-flexible 220930-flexible.c
220930-flexible.c:17:28: warning: field 'tst' with variable sized type 'struct test_struct' not at
the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct test_struct tst;
^
220930-flexible.c:19:17: warning: use of GNU empty initializer extension [-Wgnu-empty-initializer]
} test_in = {};
^
220930-flexible.c:22:29: warning: field 'tst' with variable sized type 'struct test_struct1' not
at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct test_struct1 tst;
^
220930-flexible.c:24:18: warning: use of GNU empty initializer extension [-Wgnu-empty-initializer]
} test_in1 = {};
^
4 warnings generated.
modifying test_in.weird[0]:
0
1
modifying test_in1.weird[0]:
0
0
struct test_struct {
uint8_t f;
uint8_t weird[];
};
int main(void) {
struct {
struct test_struct tst;
uint8_t weird[256];
} test_in = {};
Effectively, before flexible array members (FAMs) existed in the language, what you've declared is:
int main(void) {
struct {
struct { uint8_t f; } tst;
union {
uint8_t weird0[1]; // any non-zero size up to 256
uint8_t weird1[256];
} overlay;
} test_in = {};
/* ... */
}
Contrary to what is said in the comments above, a declaration like
int array[];
is not a variable length array; cppreference calls it an array of unknown size, and GCC documents it under "Arrays of Length Zero".
An example of a VLA would be:
void foo(size_t n)
{
int array[n]; //n is not available at compile time
}
As cppreference explains:
Within a struct definition, an array of unknown size may appear as the last member (as long as there is at least one other named member), in which case it is a special case known as flexible array member. See struct (section Explanation) for details:
struct s { int n; double d[]; }; // s.d is a flexible array member
struct s *s1 = malloc(sizeof (struct s) + (sizeof (double) * 8)); // as if d was double d[8]
The provided code is just invalid.
You declared a structure with a flexible array member
struct test_struct {
uint8_t f;
uint8_t weird[];
};
From the C Standard (6.7.2.1 Structure and union specifiers)
18 As a special case, the last element of a structure with more than
one named member may have an incomplete array type; this is called a
flexible array member.
As the quote shows, such a member must be the last element of a structure. So the above structure declaration is correct.
However, in main you then declared an unnamed structure
int main(void) {
struct {
struct test_struct tst;
uint8_t weird[256];
} test_in = {};
//...
that contains, as a member, an object of the structure type with the flexible array member, and that member is no longer the last element of the unnamed structure. So such a declaration is invalid.
Secondly, you are using empty braces to initialize an object of the unnamed structure. Unlike C++, C (before C23) does not allow empty braces to initialize objects.

How to init struct pointer variables with NULL?

I have these two structs:
struct table_element
{
struct table_val * table_val_arr;
int count_arr;
};
struct hash_table
{
struct table_element table_element_arr[MAX_NUMBER];
};
and here is my test function:
void test(struct hash_table * table)
{
int count;
struct table_element * tab_element;
for(count = 0; count < MAX_NUMBER; count++)
{
tab_element = &table->table_element_arr[count];
if(tab_element->table_val_arr == NULL)
{
printf("\nNULLLL!!!!!\n");
}
else
{
printf("\nOK!!!!!\n");
}
}
}
and here is how I use it:
int main(int argc, char **argv)
{
struct hash_table m_hash_table;
test(&m_hash_table);
...
I expect all the values to be NULL, but sometimes I get OK and sometimes NULL.
What am I doing wrong?
How to init it with NULL?
Non-static variables defined inside of a function have indeterminate values if not explicitly initialized, meaning you can't rely on anything they may contain.
You can fix this by giving an initializer for the variable:
struct hash_table m_hash_table = {{{NULL, 0},{NULL, 0},/*repeat MAX_NUMBER times*/}};
Or by using memset:
memset(&m_hash_table, 0, sizeof(m_hash_table));
If you don't explicitly initialise a local (automatic) variable in C, it will have an indeterminate value, e.g.:
int fish; // could be zero, -100, 3805, ...anything
int chips = 5; // will definitely be 5.
The same is true of pointers. They could point anywhere. And finally, the same is true of a structure's members.
There are two common approaches to this 'problem' depending on your needs.
memset the whole thing to zero:
struct hash_table m_hash_table;
memset( &m_hash_table, 0, sizeof(m_hash_table) );
Result: all the variables will be zero, and all the pointers will be NULL (see note 1 below).
Explicitly set everything by hand:
struct hash_table m_hash_table;
for (int i = 0; i < MAX_NUMBER; i++)
{
m_hash_table.table_element_arr[i].table_val_arr = NULL;
m_hash_table.table_element_arr[i].count_arr = 0;
}
A third option is to provide initialisation when you declare the struct, but it's logically equivalent to option 2.
struct hash_table m_hash_table = { { { NULL, 0 }, { NULL, 0 }, ... /*etc*/ } };
1 As per the comments, it is true that there exist some architectures where a bit pattern of all zeros is not equivalent to NULL, and hence the memset( ..., 0, ...) approach is not strictly valid. However, for all practical purposes, on any modern platform, it's a perfectly valid, idiomatic solution.
(IMHO anyone using an architecture where this isn't true isn't going to be looking for advice on SO about how to initialise their structures!)
You declared m_hash_table as an automatic variable. Such variables are usually located on the stack. The stack space may be filled with random content.
You have three options.
Declare it as a static variable: static struct hash_table m_hash_table;
Use memset(): memset(&m_hash_table, 0, sizeof(m_hash_table));
Use an explicit initializer: struct hash_table m_hash_table = {0}; (C23 also accepts = {};)
UPDATE#1
According to this http://c-faq.com/null/machexamp.html information, option #2 does not work correctly on hardware where a null pointer is not all-bits-zero. Options #1 and #3 give the desired result, since static and explicit initialization both produce proper null pointers.
UPDATE#2
Following the discussion in the comments: option #1 is the best.
The struct hash_table m_hash_table; has automatic storage duration (as opposed to, say, static, in which case it would be initialised automatically). This means the contents of the variable are indeterminate. One could initialise it in several ways (see initialisation, or the other answers). However, I think it is important to know that memset is not a proper way to initialise a null pointer (the C FAQ has an entire section on null pointers). Like Pascal's nil or Java's null, 0 in a pointer context has a special meaning in C: the null pointer. A null pointer is commonly all-bits-zero, leading to the mistaken impression that it always is, but this is not always the case. The general idiomatic way is to have a constructor in which you set any null pointers explicitly:
te->table_val_arr = 0; /* or NULL. */
te->count_arr = 0;
Edit: three initialisations are shown:
#include <stddef.h>
#include <assert.h>
/* `struct table_val` is undefined in this limited context. */
struct table_element {
int * table_val_arr;
int count_arr;
};
/** `te` is a value that gets initialised to be empty. */
static void table_element(struct table_element *const te) {
assert(te);
te->table_val_arr = 0; /* Or `NULL`, depending on your style. */
te->count_arr = 0;
}
struct hash_table {
struct table_element table_element_arr[100];
};
static size_t hash_table_size =
sizeof ((struct hash_table *)0)->table_element_arr
/ sizeof *((struct hash_table *)0)->table_element_arr;
/** `ht` is a value that gets initialised to be empty. */
static void hash_table(struct hash_table *const ht) {
size_t i;
assert(ht);
for(i = 0; i < hash_table_size; i++)
table_element(ht->table_element_arr + i);
}
/* This is automatically initialised to all-elements-zero, (which is not
necessary all-bits-zero.) */
static struct hash_table g_hash_table;
int main(void) {
struct hash_table m_hash_table = {{{0,0}}}; /* Initialiser. */
struct hash_table table; /* Garbage. */
hash_table(&table); /* Now fixed. */
return 0;
}
The dynamic way of using constructor functions is scalable to large objects and objects that one doesn't want to necessarily initialise with zero; C++ expands this greatly to RAII. The initialisation in the declaration is limited to constant expressions, and thus is probably the most efficient. The static option changes the storage class of the object and is probably unsuitable except for objects that one wanted to declare static anyway.
A colleague (not on SO) has suggested this answer: Partially initializing a C struct,
which says, in essence, that if you initialise the first element of your structure, the compiler will automatically initialise everything else to zero or NULL (as appropriate) for you.
Copying from that...
10 If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
—if it has pointer type, it is initialized to a null pointer;
—if it has arithmetic type, it is initialized to (positive or unsigned) zero;
—if it is an aggregate, every member is initialized (recursively) according to these rules;
—if it is a union, the first named member is initialized (recursively) according to these rules.
...
21 If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

generic API to initialize structs with common fields

I have a number of structures whose first 3 fields are common; below is a simplified example:
struct my_struct1 {
/* common fields. */
int a;
int b;
int c;
};
struct my_struct2 {
int a;
int b;
int c;
uint32_t d;
uint32_t e;
};
struct my_struct3 {
int a;
int b;
int c;
uint16_t d;
char e;
};
static void func1(struct my_struct1 *s)
{
/* ... */
}
static void func2(struct my_struct2 *s)
{
/* ... */
}
static void func3(struct my_struct3 *s)
{
/* ... */
}
int main(void)
{
struct my_struct1 s = {1, 2, 3};
struct my_struct2 p = {1, 2, 3, 4, 5};
struct my_struct3 q = {1, 2, 3, 4, 'a'};
func1(&s);
func2(&p);
func3(&q);
/* XXX */
func3((struct my_struct3 *)&s);
return 0;
}
Is it safe to cast &s to struct my_struct3 * and pass it to func3, and can I be sure that s or other objects allocated on the stack will not be corrupted?
The reason is that I would like to write a generic API that takes a pointer and initializes the fields common to all the structures. Another function, specific to each my_struct*, would then set the rest of the fields.
I'm not sure if void * can solve this.
UPDATE
I should mention, that unfortunately I can't change the structures layout, i.e. adding a common part isn't an option, because the code I'm working with is pretty old and I'm not allowed to change its core structures.
The only ugly workaround I'm seeing is to pass void * and enum struct_type parameters to generic_init function, and based on struct_type cast void * to appropriate structure.
As far as I can interpret the standard, casting a pointer of type my_struct1* to a pointer of type my_struct3* or vice versa may yield undefined behaviour because of the pointer conversion rules (cf. the C99 standard, ISO/IEC 9899:TC2):
6.3.2.3 Pointers ... (7) A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. ...
Hence, as my_struct1 and my_struct3 may have different alignment, a pointer that is correctly aligned for my_struct1 is not necessarily correctly aligned for my_struct3.
But even if you can guarantee that all structs have the same alignment, passing a pointer to an object of type my_struct1 to func3 is, in my opinion, not safe, even if the common members are the first ones in each struct and even if func3 accesses only the common members.
The reason is that a compiler may introduce padding between members:
6.7.2.1 Structure and union specifiers ... (13) Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
Hence, as my_struct1 and my_struct3 have different sets of members, the rules of how a compiler introduces padding may vary between these two structs. I think that it is unlikely that this happens, but I did not find any statement in the standard that guarantees that padding - even for the first three members - is the same for my_struct1 and my_struct3.
To flesh out what the comments of EOF and Eugene Sh. already explain:
It would not be safe to cast a my_struct1 to a my_struct3, as my_struct3 has more members than my_struct1 and the compiler would not warn at all about accessing those additional members (d and e), overwriting whatever is behind the my_struct1. Doing it the other way around may work, as long as my_struct1 exactly corresponds to the start of my_struct3. I am not sure whether there is any guarantee in the standard that would cover you there, but I would not bet on it.
The advantages of separating out the common part into a separate structure type are the following:
This reduces repetition in your code, which has the advantage of allowing you to change the common code in one place, reducing the risk of errors.
The compiler can check the types passed around; by casting the structs you would effectively be disabling such checks.
There is no need for the common struct to be at the start of a struct; by making it a struct member, you let the compiler figure out the correct offsets for you.
#include <stdint.h>
struct common {
int a;
int b;
int c;
};
struct my_struct1 {
struct common com;
};
struct my_struct2 {
struct common com;
uint32_t d;
uint32_t e;
};
struct my_struct3 {
struct common com;
uint16_t d;
char e;
};
void init_common(struct common *com)
{
com->a = 1;
com->b = 2;
/* ... */
}
int main(void)
{
struct my_struct1 s = {{1, 2, 3}};
struct my_struct2 p = {{1, 2, 3}, 4, 5};
struct my_struct3 q = {{1, 2, 3}, 4, 'a'};
init_common(&s.com);
init_common(&p.com);
init_common(&q.com);
return 0;
}
It's semi-safe. If the first member is identical, you are guaranteed that a pointer to the structure is also a pointer to the first member. Technically, a compiler could insert arbitrary padding after the first member. In practice, no compiler does so, so if two structs share a first and second member, the second members also have the same offset. However, that offset may not be sizeof(int) for your member "b": ints might be padded to 8 bytes for performance.
To avoid ambiguity, you can explicitly set the common members to a struct "common".

defining structure variable during declaration

I have been trying something in structures in C.
struct val{
unsigned int a : 1 ;
}store[100] ;
Now I want to initialize member a of every array element to 1. I can use a loop for that, but how can I do this in the declaration?
struct val
{
unsigned int a=1 : 1 ;
}store[100];
How can I achieve this? The above syntax gives an error in Code::Blocks.
struct val
{
unsigned int a : 1;
}store[100];
I initially thought that what you were doing was a bit-field initialization. Further research, though, suggested I was wrong, and that you are trying to use GCC's designated initializers. What you want to do is not possible that way. I found another way you can do it, though:
typedef struct {
unsigned int a;
} val;
then, where you want to initialize the array, you will do something like that:
val values[100] = {[0 ... 99].a = 1};
This relies on GCC's range designator extension, [first ... last], which is not standard C. The GCC manual also notes:
If the same field is initialized multiple times, it has the value from the last initialization. If any such overridden initialization has side-effect, it is unspecified whether the side-effect happens or not. Currently, GCC discards them and issues a warning.
My Test Program follows:
#include <stdio.h>
struct val {
unsigned int a;
};
int
main (void)
{
struct val value[100] = {[0 ... 99].a = 1};
printf ("A random val's value is %u\n", value[40].a);
return 0;
}
Compiles and works cleanly on my GCC 4.9.1.
It's not possible.
You can use a loop, or, if the byte pattern happens to give every member the value you want (for example, all-zero bytes), you can also use memset, which sets every byte of the array to value:
struct val {
unsigned int a : 1 ;
} store[N];
memset(store, value, N * sizeof(struct val));
You cannot give a struct member a default value in the struct declaration itself. Initialization can be done where the variable is defined:
struct val{
unsigned int a : 1;
}store[100] = {[0].a = 1} ; // Designated initializer
but this sets only store[0].a to 1; member a of every other element of store is initialized to 0. To initialize member a of all elements of store, you need a loop.

Dereferencing pointer to array of void

I am attempting to learn more about C and its arcane hidden powers, and I attempted to make a sample struct containing a void pointer, intended to be used as an array.
EDIT: Important note: This is for raw C code.
Let's say I have this struct.
typedef struct mystruct {
unsigned char foo;
unsigned int max;
enum data_t type;
void* data;
} mystruct;
I want data to hold max elements of either unsigned chars, unsigned short ints, or unsigned long ints; the data_t enum contains values for those 3 cases.
enum Grid_t {gi8, gi16, gi32}; //For 8, 16 and 32 bit uints.
Then I have this function that initializes and allocates one of this structs, and is supposed to return a pointer to the new struct.
mystruct* new(unsigned char foo, unsigned int bar, long value) {
mystruct* new;
new = malloc(sizeof(mystruct)); //Allocate space for the struct.
assert(new != NULL);
new->foo = foo;
new->max = bar;
int i;
switch(type){
case gi8: default:
new->data = (unsigned char *)calloc(new->max, sizeof(unsigned char));
assert(new->data != NULL);
for(i = 0; i < new->max; i++){
*((unsigned char*)new->data + i) = (unsigned char)value;
//Can I do anything with the format new->data[n]? I can't seem
//to use the [] shortcut to point to members in this case!
}
break;
}
return new;
}
The compiler gives no warnings, but I am not too sure about this method. Is it a legitimate way to use pointers?
Is there a better way©?
I forgot to show how I call it, e.g. mystruct* P; P = new(0,50,1024);
Unions are interesting but not what I wanted. Since I will have to handle every specific case individually anyway, casting seems as good as a union. I specifically wanted to have much larger 8-bit arrays than 32-bit arrays, so a union doesn't seem to help. For that I'd make it just an array of longs :P
No, you cannot dereference a void* pointer, it is forbidden by the C language standard. You have to cast it to a concrete pointer type before doing so.
As an alternative, depending on your needs, you can also use a union in your structure instead of a void*:
typedef struct mystruct {
unsigned char foo;
unsigned int max;
enum data_t type;
union {
unsigned char *uc;
unsigned short *us;
unsigned int *ui;
} data;
} mystruct;
At any given time, only one of data.uc, data.us, or data.ui is valid, as they all occupy the same space in memory. Then, you can use the appropriate member to get at your data array without having to cast from void*.
What about
typedef struct mystruct
{
unsigned char foo;
unsigned int max;
enum data_t type;
union
{
unsigned char *chars;
unsigned short *shortints;
unsigned long *longints;
};
} mystruct;
That way, there is no need to cast at all. Just use data_t to determine which of the pointers you want to access.
Is type supposed to be an argument to the function? (Don't name this function or any variable new or any C++ programmer who tries to use it will hunt you down)
If you want to use array indices, you can use a temporary pointer like this:
unsigned char *cdata = (unsigned char *)new->data;
cdata[i] = value;
I don't really see a problem with your approach. If you expect a particular size (which I think you do given the name gi8 etc.) I would suggest including stdint.h and using the typedefs uint8_t, uint16_t, and uint32_t.
A pointer is merely an address in the memory space. You can choose to interpret it however you wish. Review union for more information on how you can interpret the same memory location in multiple ways.
Casting between pointer types is common in C and C++, and the use of void* implies that you don't want users to accidentally dereference the pointer (dereferencing a void* will cause an error, but dereferencing the same pointer when cast to int* will not).

Resources