Weird behaviour with variable length arrays in struct in C - c

I came across a concept which some people call a "Struct Hack" where we can declare a pointer variable inside a struct, like this:
struct myStruct{
int data;
int *array;
};
and later on when we allocate memory for a struct myStruct using malloc in our main() function, we can simultaneously allocate memory for our int *array pointer in same step, like this:
struct myStruct *p = malloc(sizeof(struct myStruct) + 100 * sizeof(int));
p->array = p+1;
instead of
struct myStruct *p = malloc(sizeof(struct myStruct));
p->array = malloc(100 * sizeof(int));
assuming we want an array of size 100.
The first option is said to be better since we would get a continuous chunk of memory and we can free that whole chunk with one call to free() versus 2 calls in the latter case.
Experimenting, I wrote this:
#include<stdio.h>
#include<stdlib.h>
struct myStruct{
int i;
int *array;
};
int main(){
/* I ask for only 40 more bytes (10 * sizeof(int)) */
struct myStruct *p = malloc(sizeof(struct myStruct) + 10 * sizeof(int));
p->array = p+1;
/* I assign values way beyond the initial allocation*/
for (int i = 0; i < 804; i++){
p->array[i] = i;
}
/* printing*/
for (int i = 0; i < 804; i++){
printf("%d\n",p->array[i]);
}
return 0;
}
I am able to execute it without problems, without any segmentation faults. Looks weird to me.
I also came to know that C99 has a provision which says that instead of declaring an int *array inside a struct, we can do int array[] and I did this, using malloc() only for the struct, like
struct myStruct *p = malloc(sizeof(struct myStruct));
and initialising array[] like this
p->array[10] = 0; /* I hope this sets the array size to 10
and also initialises array entries to 0 */
But then again this weirdness where I am able to access and assign array indices beyond the array size and also print the entries:
for(int i = 0; i < 296; i++){ // first loop
p->array[i] = i;
}
for(int i = 0; i < 296; i++){ // second loop
printf("%d\n",p->array[i]);
}
After printing p->array[i] till i = 296 it gives me a segmentation fault, but clearly it had no problems assigning beyond i = 9.
(If I increment 'i' till 300 in the first for loop above, I immediately get a segmentation fault and the program doesn't print any values.)
Any clues about what's happening? Is it undefined behaviour or what?
EDIT: When I compiled the first snippet with the command
cc -Wall -g -std=c11 -O struct3.c -o struct3
I got this warning:
warning: incompatible pointer types assigning to 'int *' from
'struct str *' [-Wincompatible-pointer-types]
p->array = p+1;

Yes, what you see here is an example of undefined behavior.
Writing beyond the end of allocated array (aka buffer overflow) is a good example of undefined behavior: it will often appear to "work normally", while other times it will crash (e.g. "Segmentation fault").
A low-level explanation: there are control structures in memory that are situated some distance from your allocated objects. If your program does a big buffer overflow, there is more chance it will damage these control structures, while for more modest overflows it will damage some unused data (e.g. padding). In any case, however, buffer overflows invoke undefined behavior.
The "struct hack" in your first form also invokes undefined behavior (as indicated by the warning), but of a special kind - it's almost guaranteed that it would always work normally, in most compilers. However, it's still undefined behavior, so not recommended to use. In order to sanction its use, the C committee invented this "flexible array member" syntax (your second syntax), which is guaranteed to work.
Just to make it clear - assignment to an element of an array never allocates space for that element (not in C, at least). In C, when assigning to an element, it should already be allocated, even if the array is "flexible". Your code should know how much to allocate when it allocates memory. If you don't know how much to allocate, use one of the following techniques:
Allocate an upper bound:
struct myStruct{
int data;
int array[100]; // you will never need more than 100 numbers
};
Use realloc
Use a linked list (or any other sophisticated data structure)

What you describe as a "Struct Hack" is indeed a hack. It is not worth IMO.
p->array = p+1;
will give you problems on many compilers which will demand explicit conversion:
p->array = (int *) (p+1);
I am able to execute it without problems, without any segmentation faults. Looks weird to me.
It is undefined behaviour. You are accessing memory on the heap and many compilers and operating system will not prevent you to do so. But it extremely bad practice to use it.

Related

Multiple structures in a single malloc invoking undefined behaviour?

From Use the correct syntax when declaring a flexible array member it says that when malloc is used for a header and flexible data when data[1] is hacked into the struct,
This example has undefined behavior when accessing any element other
than the first element of the data array. (See the C Standard, 6.5.6.)
Consequently, the compiler can generate code that does not return the
expected value when accessing the second element of data.
I looked up the C Standard 6.5.6, and could not see how this would produce undefined behaviour. I've used a pattern that I'm comfortable with, where the header is implicitly followed by data, using the same sort of malloc,
#include <stdlib.h> /* EXIT malloc free */
#include <stdio.h> /* printf */
#include <string.h> /* strlen memcpy */
struct Array {
size_t length;
char *array;
}; /* +(length + 1) char */
static struct Array *Array(const char *const str) {
struct Array *a;
size_t length;
length = strlen(str);
if(!(a = malloc(sizeof *a + length + 1))) return 0;
a->length = length;
a->array = (char *)(a + 1); /* UB? */
memcpy(a->array, str, length + 1);
return a;
}
/* Take a char off the end just so that it's useful. */
static void Array_to_string(const struct Array *const a, char (*const s)[12]) {
const int n = a->length ? a->length > 9 ? 9 : (int)a->length - 1 : 0;
sprintf(*s, "<%.*s>", n, a->array);
}
int main(void) {
struct Array *a = 0, *b = 0;
int is_done = 0;
do { /* Try. */
char s[12], t[12];
if(!(a = Array("Foo!")) || !(b = Array("To be or not to be."))) break;
Array_to_string(a, &s);
Array_to_string(b, &t);
printf("%s %s\n", s, t);
is_done = 1;
} while(0); if(!is_done) {
perror(":(");
} {
free(a);
free(b);
}
return is_done ? EXIT_SUCCESS : EXIT_FAILURE;
}
Prints,
<Foo> <To be or >
The compliant solution uses C99 flexible array members. The page also says,
Failing to use the correct syntax when declaring a flexible array
member can result in undefined behavior, although the incorrect syntax
will work on most implementations.
Technically, does this C90 code produce undefined behaviour, too? And if not, what is the difference? (Or the Carnegie Mellon Wiki is incorrect?) What is the factor on the implementations this will not work on?
This should be well defined:
a->array = (char *)(a + 1);
Because you create a pointer to one element past the end of an array of size 1 but do not dereference it. And because a->array now points to bytes that do not yet have an effective type, you can use them safely.
This only works however because you're using the bytes that follow as an array of char. If you instead tried to create an array of some other type whose size is greater than 1, you could have alignment issues.
For example, if you compiled a program for ARM with 32 bit pointers and you had this:
struct Array {
int size;
uint64_t *a;
};
...
Array a = malloc(sizeof *a + (length * sizeof(uint64_t)));
a->length = length;
a->a= (uint64_t *)(a + 1); // misaligned pointer
a->a[0] = 0x1111222233334444ULL; // misaligned write
Your program would crash due to a misaligned write. So in general you shouldn't depend on this. Best to stick with a flexible array member which the standard guarantees will work.
As an adjunct to #dbush good answer, a way to get around alignment woes is to use a union. This insures &p[1] is properly aligned for (uint64_t*)1. sizeof *p includes any needed padding vs. sizeof *a.
union {
struct Array header;
uint64_t dummy;
} *p;
p = malloc(sizeof *p + length*sizeof p->header->array);
struct Array *a = (struct Array *)&p[0]; // or = &(p->header);
a->length = length;
a->array = (uint64_t*) &p[1]; // or &p[1].dummy;
Or go with C99 and flexible array member.
1 As well as struct Array
Before the publication of C89, there were some implementations that would attempt to identify and trap upon out-of-bounds array accesses. Given something like:
struct foo {int a[4],b[4];} *p;
such implementations would squawk at an effort to access p->a[i] if i wasn't in the range 0 to 3. For programs that don't need to index the address of array-type lvalue p->a to access anything outside that array, being able to trap on such out-of-bounds accesses would be useful.
The authors of C89 were also almost certainly aware that it was common for programs to use the address of dummy-sized array at the end of a structure as a means of accessing storage beyond the structure. Using such techniques made it possible to do things that couldn't be done nearly as nicely otherwise, and part of the Spirit of C, according to the authors of the Standard, is "Don't prevent the programmer from doing what needs to be done".
Consequently, the authors of the Standard treated such accesses as something which implementations could support or not, at their leisure, presumably based upon what would be most useful for their customers. While it would often be helpful for implementations which would normally bounds-check accesses to structures in an array, to provide an option to omit such checks in cases where the last item of an indirectly-accessed structure is an array with one element (or, if they extend the language to waive a compile-time constraint, zero elements), people writing such implementations would presumably be capable of recognizing such things without the authors of the Standard having to tell them. The notion that "Undefined Behavior" was intended as some form of prohibition doesn't seem to have really taken hold until after the publication of C89's successor standard.
With regard to your example, having a pointer within a struct point to later storage in the same allocation should work, but with a couple of caveats:
If the allocation is passed to realloc, the pointer within it will become invalid.
The only real advantage of using a pointer versus a flexible array member is that it allows for the possibility of having it point somewhere else. That may be good if the only kind of "something else" will always be an constant object of static duration that never has to be freed, or perhaps if it is some other kind of object that won't have to be freed, but may be problematical if it could hold the only reference to something stored in a separate allocation.
Flexible array members have been available as an extension in some compilers before C89 was written, and were officially added in C99. Any decent compiler should support them.
You can define struct Array as:
struct Array
{
size_t length;
char array[1];
}; /* +(length + 1) char */
then malloc( sizeof *a + length ). The "+1" element is in array[1] member. Fill structure with:
a->length = length;
strcpy( a->array, str );

How to allocate memory for an array and a struct in one malloc call without breaking strict aliasing?

When allocating memory for a variable sized array, I often do something like this:
struct array {
long length;
int *mem;
};
struct array *alloc_array( long length)
{
struct array *arr = malloc( sizeof(struct array) + sizeof(int)*length);
arr->length = length;
arr->mem = (int *)(arr + 1); /* dubious pointer manipulation */
return arr;
}
I then use the arrray like this:
int main()
{
struct array *arr = alloc_array( 10);
for( int i = 0; i < 10; i++)
arr->mem[i] = i;
/* do something more meaningful */
free( arr);
return 0;
}
This works and compiles without warnings. Recently however, I read about strict aliasing. To my understanding, the code above is legal with regard to strict aliasing, because the memory being accessed through the int * is not the memory being accessed through the struct array *. Does the code in fact break strict aliasing rules? If so, how can it be modified not to break them?
I am aware that I could allocate the struct and array separately, but then I would need to free them separately too, presumably in some sort of free_array function. That would mean that I have to know the type of the memory I am freeing when I free it, which would complicate code. It would also likely be slower. That is not what I am looking for.
The proper way to declare a flexible array member in a struct is as follows:
struct array {
long length;
int mem[];
};
Then you can allocate the space as before without having to assign anything to mem:
struct array *alloc_array( long length)
{
struct array *arr = malloc( sizeof(struct array) + sizeof(int)*length);
arr->length = length;
return arr;
}
Modern C officially supports flexible array members. So you can define your structure as follows:
struct array {
long length;
int mem[];
};
And allocate it as you do now, without the added hassle of dubious pointer manipulation. It will work out of the box, all the access will be properly aligned and you won't have to worry about dark corners of the language. Though, naturally, it's only viable if you have a single such member you need to allocate.
As for what you have now, since allocated storage doesn't have a declared type (it's a blank slate), you aren't breaking strict aliasing, since you haven't given that memory an effective type. The only issue is with possible mess-up of alignment. Though that's unlikely with the types in your structure.
I believe the code as written does violate strict aliasing rules, when standard read in the strictest sense.
You are accessing an object of type int through a pointer to unrelated type array. I believe, that an easy way out would be to use starting address of the struct, and than convert it char*, and perform a pointer arithmetic on it. Example:
void* alloc = malloc(...);
array = alloc;
int* p_int = (char*)alloc + sizeof(array);

How do I properly allocate an array within a struct with malloc and realloc in C?

I'm trying to implement a stack in C, while also trying to learn C. My background is mostly in higher languages (like Python), so a lot of the memory allocation is new to me.
I have a program that works as expected, but throws warnings that make me believe I'm doing something wrong.
Here is the code:
typedef struct {
int num_items;
int top;
int items[];
} stack;
void push(stack *st, int n) {
st->num_items++;
int* tmp = realloc(st->items, (st->num_items) * sizeof(int));
if (tmp) {
*(st->items) = tmp;
}
st->items[st->num_items - 1] = n;
st->top = n;
}
int main() {
stack *x = malloc(sizeof(x));
x->num_items = 0;
x->top = 0;
*(x->items) = malloc(0);
push(x, 2);
push(x, 3);
printf("Stack top: %d, length: %d.\n", x->top, x->num_items);
for (int i = 0; i < x->num_items; i++) {
free(&(x->items[i]));
}
free(x->items);
free(x);
}
Here is the output:
Stack top: 3, length: 2.
Which is expected. But during compilation, I get the following errors:
> gcc -x c -o driver driver.c
driver.c: In function 'push':
driver.c:16:16: warning: assignment makes integer from pointer without a cast
*(st->items) = tmp;
...
driver.c: In function 'main':
driver.c:27:14: warning: assignment makes integer from pointer without a cast
*(x->items) = malloc(0);
When you have an empty array declared at the end of the structure like you have, it's called a flexible array member. And you allocate it not by allocating just the array member, but by allocating the whole structure.
Like e.g.
stack *x = malloc(sizeof *x + sizeof s->items[0] * 32);
The above malloc call allocates space for the structure itself (note the use of the dereference operator for sizeof *x) plus space for an array of 32 elements.
It's either the above, or change the member to be a pointer.
This is an array of unspecified size
int items[];
This is a pointer
int *items;
The latter is what you use with malloc/realloc to make use of dynamically allocated memory.
Also, because you're doing (for example)
*(x->items) = malloc(0);
...you're de-referencing items so that it becomes an int which is why you're getting those particular warnings.
Your belief is correct. Usually - that is almost always - warnings from C compiler are signs of grave programming errors that will cause serious problems. Quoting Shooting yourself in the foot in various programming languages:
C
You shoot yourself in the foot.
You shoot yourself in the foot and then nobody else can figure out what you did.
The problem is that you're coding as if items was a pointer to int, yet you have declared and defined it as a flexible array member (FAM), which is an entirely different beast altogether. And since assigning to an array would produce an error, i.e.
x->items = malloc(0);
would be an error, you've come up with something that compiles with just warnings. Remember that errors are better than warnings, because they stop you from shooting yourself into foot.
The solution is to declare items as a pointer to int instead:
int *items;
and use
x->items = ...;
to get the pointer behaviour you expect.
Also,
free(&(x->items[i]));
is very wrong, since you never allocated the ith integer to begin with; they were objects in the array. Also, you don't need malloc(0); just initialize with a null pointer:
x->items = NULL;
realloc and free wouldn't mind the null pointer.
The flexible array member means that the last element in the structure is an array of indefinite length, so in malloc you would reserve enough memory for it too:
stack *x = malloc(sizeof x + sizeof *x->items * n_items);
The flexible array member is used in CPython for objects like str, bytes or tuple that are of immutable length - it is slightly faster to use a FAM instead of a pointer elsewhere, and it saves memory - especially with shorter strings or tuples.
Finally, notice that your stack becomes slower the more it grows - the reason is because you're always allocating just one more element. Instead, you should scale the size of the stack by a factor (1.3, 1.5, 2.0?), so that insertions run in O(1) time as opposed to O(n); and consider what will happen should realloc fail - perhaps you should be more loud about it!

Heap corruption on malloc/calloc of char pointer field in struct

I have a struct defined like this
typedef struct {
char* Value;
unsigned int Length;
} MY_STRUCT;
I'm creating an array of these structs using calloc:
MY_STRUCT* arr = (MY_STRUCT*)calloc(50, sizeof(MY_STRUCT));
Then, in a loop, I'm accessing each struct and trying to allocate and assign a value to the Value field using calloc and memcpy:
int i;
for(i = 0; i < 50; i++)
{
MY_STRUCT myStruct = arr[i];
int valueLength = get_value_length(i);//for sake of example, we can assume that this function returns any value [1-99]
myStruct.Length = valueLength;
myStruct.Value = (char*) calloc(valueLength, sizeof(char));
memcpy(myStruct.Value, get_value(i), valueLength); //assume get_value(i) returns char* pointing to start of desired value
}
This code block crashes on the calloc line with Visual Studio indicating heap corruption. It doesn't fail the first time through the loop. Instead, it fails on the second pass when I'm trying to allocate a length 20 char array (first pass is length 5). I've tried using malloc as well, and I've tried using recommendations in:
Heap Corruption with malloc, struct and char *
Do I cast the result of malloc?
Nothing seems to mitigate the problem. I am originally a managed code programmer so my knowledge of memory allocation and management is not always the best. I'm sure I'm doing something boneheaded, but I'm not sure what. Any help would be greatly appreciated. Thank you!

Are "malloc(sizeof(struct a *))" and "malloc(sizeof(struct a))" the same?

This question is a continuation of Malloc call crashing, but works elsewhere
I tried the following program and I found it working (i.e. not crashing - and this was mentioned in the above mentioned link too). I May be lucky to have it working but I'm looking for a reasonable explanation from the SO experts on why this is working?!
Here are some basic understanding on allocation of memory using malloc() w.r.t structures and pointers
malloc(sizeof(struct a) * n) allocates n number of type struct a elements. And, this memory location can be stored and accessed using a pointer-to-type-"struct a". Basically a struct a *.
malloc(sizeof(struct a *) * n) allocates n number of type struct a * elements. Each element can then point to elements of type struct a. Basically malloc(sizeof(struct a *) * n) allocates an array(n-elements)-of-pointers-to-type-"struct a". And, the allocated memory location can be stored and accessed using a pointer-to-(pointer-to-"struct a"). Basically a struct a **.
So when we create an array(n-elements)-of-pointers-to-type-"struct a", is it
valid to assign that to struct a * instead of struct a ** ?
valid to access/de-reference the allocated array(n-elements)-of-pointers-to-type-"struct a" using pointer-to-"struct a" ?
data * array = NULL;
if ((array = (data *)malloc(sizeof(data *) * n)) == NULL) {
printf("unable to allocate memory \n");
return -1;
}
The code snippet is as follows:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
typedef struct {
int value1;
int value2;
}data;
int n = 1000;
int i;
int val=0;
data * array = NULL;
if ((array = (data *)malloc(sizeof(data *) * n)) == NULL) {
printf("unable to allocate memory \n");
return -1;
}
printf("allocation successful\n");
for (i=0 ; i<n ; i++) {
array[i].value1 = val++;
array[i].value2 = val++;
}
for (i=0 ; i<n ; i++) {
printf("%3d %3d %3d\n", i, array[i].value1, array[i].value2);
}
free(array);
printf("freeing successful\n");
return 0;
}
EDIT:
OK say if I do the following by mistake
data * array = NULL;
if ((array = (data *)malloc(sizeof(data *) * n)) == NULL) {
Is there a way to capture (during compile-time using any GCC flags) these kind of unintended programming typo's which could work at times and might blow out anytime! I compiled this using -Wall and found no warnings!
There seems to be a fundamental misunderstanding.
malloc(sizeof(struct a) * n) allocates n number of type struct a elements.
No, that's just what one usually does use it as after such a call. malloc(size) allocates a memory region of size bytes. What you do with that region is entirely up to you. The only thing that matters is that you don't overstep the limits of the allocated memory. Assuming 4 byte float and int and 8 byte double, after a successful malloc(100*sizeof(float));, you can use the first 120 of the 400 bytes as an array of 15 doubles, the next 120 as an array of 30 floats, then place an array of 20 chars right behind that and fill up the remaining 140 bytes with 35 ints if you wish. That's perfectly harmless defined behaviour.
malloc returns a void*, which can be implicitly cast to a pointer of any type, so
some_type **array = malloc(100 * sizeof(data *)); // intentionally unrelated types
is perfectly fine, it might just not be the amount of memory you wanted. In this case it very likely is, because pointers tend to have the same size regardless of what they're pointing to.
More likely to give you the wrong amount of memory is
data *array = malloc(n * sizeof(data*));
as you had it. If you use the allocated piece of memory as an array of n elements of type data, there are three possibilities
sizeof(data) < sizeof(data*). Then your only problem is that you're wasting some space.
sizeof(data) == sizeof(data*). Everything's fine, no space wasted, as if you had no typo at all.
sizeof(data) > sizeof(data*). Then you'll access memory you shouldn't have accessed when touching later array elements, which is undefined behaviour. Depending on various things, that could consistently work as if your code was correct, immediately crash with a segfault or anything in between (technically it could behave in a manner that cannot meaningfully be placed between those two, but that would be unusual).
If you intentionally do that, knowing point 1. or 2. applies, it's bad practice, but not an error. If you do it unintentionally, it is an error regardless of which point applies, harmless but hard to find while 1. or 2. applies, harmful but normally easier to detect in case of 3.
In your examples. data was 4 resp. 8 bytes (probably), which on a 64-bit system puts them into 1. resp. 2. with high probability, on a 32-bit system into 2 resp. 3.
The recommended way to avoid such errors is to
type *pointer = malloc(num_elems * sizeof(*pointer));
No.
sizeof(struct a*) is the size of a pointer.
sizeof(struct a) is the size of the entire struct.
This array = (data *)malloc(sizeof(data *) * n) allocates a sizeof(data*) (pointer) to struct data, if you want to do that, you need a your array to be a data** array.
In your case you want your pointer to point to sizeof(data), a structure in memory, not to another pointer. That would require a data** (pointer to pointer).
is it valid to assign that to struct a * instead of struct a ** ?
Well, technically speaking, it is valid to assign like that, but it is wrong (UB) to dereference such pointer. You don't want to do this.
valid to access/de-reference the allocated array(n-elements)-of-pointers-to-type-"struct a" using pointer-to-"struct a" ?
No, undefined behavior.

Resources