When allocating memory for a variable sized array, I often do something like this:
struct array {
long length;
int *mem;
};
struct array *alloc_array( long length)
{
struct array *arr = malloc( sizeof(struct array) + sizeof(int)*length);
arr->length = length;
arr->mem = (int *)(arr + 1); /* dubious pointer manipulation */
return arr;
}
I then use the array like this:
int main()
{
struct array *arr = alloc_array( 10);
for( int i = 0; i < 10; i++)
arr->mem[i] = i;
/* do something more meaningful */
free( arr);
return 0;
}
This works and compiles without warnings. Recently however, I read about strict aliasing. To my understanding, the code above is legal with regard to strict aliasing, because the memory being accessed through the int * is not the memory being accessed through the struct array *. Does the code in fact break strict aliasing rules? If so, how can it be modified not to break them?
I am aware that I could allocate the struct and array separately, but then I would need to free them separately too, presumably in some sort of free_array function. That would mean that I have to know the type of the memory I am freeing when I free it, which would complicate code. It would also likely be slower. That is not what I am looking for.
The proper way to declare a flexible array member in a struct is as follows:
struct array {
long length;
int mem[];
};
Then you can allocate the space as before without having to assign anything to mem:
struct array *alloc_array( long length)
{
struct array *arr = malloc( sizeof(struct array) + sizeof(int)*length);
arr->length = length;
return arr;
}
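A fuller sketch of the same idea, with the error handling the short version leaves out. It assumes a C99 or later compiler; using size_t for the length, the overflow check, and the fill_and_sum helper are illustrative choices, not part of the original code.

```c
#include <stdint.h>
#include <stdlib.h>

struct array {
    size_t length;
    int mem[];          /* flexible array member: must be the last member */
};

/* Returns NULL if the size computation would overflow or malloc fails. */
struct array *alloc_array(size_t length)
{
    if (length > (SIZE_MAX - sizeof(struct array)) / sizeof(int))
        return NULL;

    /* One allocation covers the header and all the elements. */
    struct array *arr = malloc(sizeof *arr + length * sizeof arr->mem[0]);
    if (arr == NULL)
        return NULL;

    arr->length = length;
    return arr;
}

/* Hypothetical helper: stores 0..length-1 in the array and returns the sum. */
long fill_and_sum(struct array *arr)
{
    long sum = 0;
    for (size_t i = 0; i < arr->length; i++) {
        arr->mem[i] = (int)i;
        sum += arr->mem[i];
    }
    return sum;
}
```

The caller still releases everything with a single free(arr), which is the property the question asked for.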
Modern C officially supports flexible array members. So you can define your structure as follows:
struct array {
long length;
int mem[];
};
And allocate it as you do now, without the added hassle of dubious pointer manipulation. It will work out of the box, all the access will be properly aligned and you won't have to worry about dark corners of the language. Though, naturally, it's only viable if you have a single such member you need to allocate.
As for what you have now, since allocated storage doesn't have a declared type (it's a blank slate), you aren't breaking strict aliasing, since you haven't given that memory an effective type. The only issue is with possible mess-up of alignment. Though that's unlikely with the types in your structure.
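If you keep the pointer member, the alignment concern can be handled explicitly. The following is only a sketch, assuming C11 (for alignof); round_up is a hypothetical helper and the negative-length and overflow checks are additions. For this particular struct the rounding is expected to be a no-op, but it matters if the element type ever has a stricter alignment than the struct.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct array {
    long length;
    int *mem;
};

/* Round offset up to the next multiple of align (align is a power of two). */
static size_t round_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

struct array *alloc_array(long length)
{
    /* Where the element storage starts, measured from the block start. */
    size_t data_offset = round_up(sizeof(struct array), alignof(int));

    if (length < 0 || (size_t)length > (SIZE_MAX - data_offset) / sizeof(int))
        return NULL;

    struct array *arr = malloc(data_offset + (size_t)length * sizeof(int));
    if (arr == NULL)
        return NULL;

    arr->length = length;
    /* Byte arithmetic through char *, then convert to the element type. */
    arr->mem = (int *)((char *)arr + data_offset);
    return arr;
}
```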
I believe the code as written does violate strict aliasing rules when the standard is read in the strictest sense.
You are accessing an object of type int through a pointer to the unrelated type struct array. I believe an easy way out would be to take the starting address of the allocation, convert it to char *, and perform the pointer arithmetic on that. Example:
void *alloc = malloc(...);
struct array *arr = alloc;
int *p_int = (int *)((char *)alloc + sizeof *arr);
I'm trying to implement a stack in C, while also trying to learn C. My background is mostly in higher languages (like Python), so a lot of the memory allocation is new to me.
I have a program that works as expected, but throws warnings that make me believe I'm doing something wrong.
Here is the code:
typedef struct {
int num_items;
int top;
int items[];
} stack;
void push(stack *st, int n) {
st->num_items++;
int* tmp = realloc(st->items, (st->num_items) * sizeof(int));
if (tmp) {
*(st->items) = tmp;
}
st->items[st->num_items - 1] = n;
st->top = n;
}
int main() {
stack *x = malloc(sizeof(x));
x->num_items = 0;
x->top = 0;
*(x->items) = malloc(0);
push(x, 2);
push(x, 3);
printf("Stack top: %d, length: %d.\n", x->top, x->num_items);
for (int i = 0; i < x->num_items; i++) {
free(&(x->items[i]));
}
free(x->items);
free(x);
}
Here is the output:
Stack top: 3, length: 2.
Which is expected. But during compilation, I get the following warnings:
> gcc -x c -o driver driver.c
driver.c: In function 'push':
driver.c:16:16: warning: assignment makes integer from pointer without a cast
*(st->items) = tmp;
...
driver.c: In function 'main':
driver.c:27:14: warning: assignment makes integer from pointer without a cast
*(x->items) = malloc(0);
When you have an empty array declared at the end of the structure like you have, it's called a flexible array member. And you allocate it not by allocating just the array member, but by allocating the whole structure.
For example:
stack *x = malloc(sizeof *x + sizeof x->items[0] * 32);
The above malloc call allocates space for the structure itself (note the use of the dereference operator for sizeof *x) plus space for an array of 32 elements.
It's either the above, or change the member to be a pointer.
This is an array of unspecified size
int items[];
This is a pointer
int *items;
The latter is what you use with malloc/realloc to make use of dynamically allocated memory.
Also, because you're doing (for example)
*(x->items) = malloc(0);
...you're de-referencing items so that it becomes an int which is why you're getting those particular warnings.
Your belief is correct. Usually - that is, almost always - warnings from a C compiler are signs of grave programming errors that will cause serious problems. Quoting Shooting yourself in the foot in various programming languages:
C
You shoot yourself in the foot.
You shoot yourself in the foot and then nobody else can figure out what you did.
The problem is that you're coding as if items was a pointer to int, yet you have declared and defined it as a flexible array member (FAM), which is an entirely different beast altogether. And since assigning to an array would produce an error, i.e.
x->items = malloc(0);
would be an error, you've come up with something that compiles with just warnings. Remember that errors are better than warnings, because they stop you from shooting yourself in the foot.
The solution is to declare items as a pointer to int instead:
int *items;
and use
x->items = ...;
to get the pointer behaviour you expect.
Also,
free(&(x->items[i]));
is very wrong, since you never allocated the ith integer to begin with; they were objects in the array. Also, you don't need malloc(0); just initialize with a null pointer:
x->items = NULL;
realloc and free wouldn't mind the null pointer.
The flexible array member means that the last element in the structure is an array of indefinite length, so in malloc you would reserve enough memory for it too:
stack *x = malloc(sizeof *x + sizeof *x->items * n_items);
The flexible array member is used in CPython for objects like str, bytes or tuple that are of immutable length - it is slightly faster to use a FAM instead of a pointer elsewhere, and it saves memory - especially with shorter strings or tuples.
Finally, notice that your stack becomes slower the more it grows - the reason is because you're always allocating just one more element. Instead, you should scale the size of the stack by a factor (1.3, 1.5, 2.0?), so that insertions run in O(1) time as opposed to O(n); and consider what will happen should realloc fail - perhaps you should be more loud about it!
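Putting the pointer-member advice and the growth-factor advice together, a stack could look roughly like this. It is a sketch, not the asker's code: the capacity field, the function names, the starting capacity of 4 and the factor of 2 are all assumptions chosen for illustration.

```c
#include <stdlib.h>

typedef struct {
    size_t num_items;   /* elements in use */
    size_t capacity;    /* elements allocated */
    int *items;         /* pointer member, so realloc may move the storage */
} stack;

/* An empty stack owns no memory; realloc(NULL, n) behaves like malloc(n). */
void stack_init(stack *st)
{
    st->num_items = 0;
    st->capacity = 0;
    st->items = NULL;
}

/* Returns 0 on success, -1 if memory could not be obtained. */
int stack_push(stack *st, int n)
{
    if (st->num_items == st->capacity) {
        /* Grow geometrically so pushes are amortized O(1). */
        size_t new_cap = st->capacity ? st->capacity * 2 : 4;
        int *tmp = realloc(st->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;          /* the old block is still valid and owned */
        st->items = tmp;
        st->capacity = new_cap;
    }
    st->items[st->num_items++] = n;
    return 0;
}

void stack_free(stack *st)
{
    free(st->items);            /* one free for the one allocation */
    stack_init(st);
}
```

Returning an error code from stack_push is one way of being "loud" about a failed realloc; the caller decides whether to abort or recover.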
I came across a concept which some people call a "Struct Hack" where we can declare a pointer variable inside a struct, like this:
struct myStruct{
int data;
int *array;
};
and later on when we allocate memory for a struct myStruct using malloc in our main() function, we can simultaneously allocate memory for our int *array pointer in same step, like this:
struct myStruct *p = malloc(sizeof(struct myStruct) + 100 * sizeof(int));
p->array = p+1;
instead of
struct myStruct *p = malloc(sizeof(struct myStruct));
p->array = malloc(100 * sizeof(int));
assuming we want an array of size 100.
The first option is said to be better since we would get a continuous chunk of memory and we can free that whole chunk with one call to free() versus 2 calls in the latter case.
Experimenting, I wrote this:
#include<stdio.h>
#include<stdlib.h>
struct myStruct{
int i;
int *array;
};
int main(){
/* I ask for only 40 more bytes (10 * sizeof(int)) */
struct myStruct *p = malloc(sizeof(struct myStruct) + 10 * sizeof(int));
p->array = p+1;
/* I assign values way beyond the initial allocation*/
for (int i = 0; i < 804; i++){
p->array[i] = i;
}
/* printing*/
for (int i = 0; i < 804; i++){
printf("%d\n",p->array[i]);
}
return 0;
}
I am able to execute it without problems, without any segmentation faults. Looks weird to me.
I also came to know that C99 has a provision which says that instead of declaring an int *array inside a struct, we can do int array[] and I did this, using malloc() only for the struct, like
struct myStruct *p = malloc(sizeof(struct myStruct));
and initialising array[] like this
p->array[10] = 0; /* I hope this sets the array size to 10
and also initialises array entries to 0 */
But then again this weirdness where I am able to access and assign array indices beyond the array size and also print the entries:
for(int i = 0; i < 296; i++){ // first loop
p->array[i] = i;
}
for(int i = 0; i < 296; i++){ // second loop
printf("%d\n",p->array[i]);
}
After printing p->array[i] till i = 296 it gives me a segmentation fault, but clearly it had no problems assigning beyond i = 9.
(If I increment 'i' till 300 in the first for loop above, I immediately get a segmentation fault and the program doesn't print any values.)
Any clues about what's happening? Is it undefined behaviour or what?
EDIT: When I compiled the first snippet with the command
cc -Wall -g -std=c11 -O struct3.c -o struct3
I got this warning:
warning: incompatible pointer types assigning to 'int *' from
'struct str *' [-Wincompatible-pointer-types]
p->array = p+1;
Yes, what you see here is an example of undefined behavior.
Writing beyond the end of allocated array (aka buffer overflow) is a good example of undefined behavior: it will often appear to "work normally", while other times it will crash (e.g. "Segmentation fault").
A low-level explanation: there are control structures in memory that are situated some distance from your allocated objects. If your program does a big buffer overflow, there is more chance it will damage these control structures, while for more modest overflows it will damage some unused data (e.g. padding). In any case, however, buffer overflows invoke undefined behavior.
The "struct hack" in your first form also invokes undefined behavior (as indicated by the warning), but of a special kind - it's almost guaranteed that it would always work normally, in most compilers. However, it's still undefined behavior, so not recommended to use. In order to sanction its use, the C committee invented this "flexible array member" syntax (your second syntax), which is guaranteed to work.
Just to make it clear - assignment to an element of an array never allocates space for that element (not in C, at least). In C, when assigning to an element, it should already be allocated, even if the array is "flexible". Your code should know how much to allocate when it allocates memory. If you don't know how much to allocate, use one of the following techniques:
Allocate an upper bound:
struct myStruct{
int data;
int array[100]; // you will never need more than 100 numbers
};
Use realloc
Use a linked list (or any other sophisticated data structure)
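As an illustration of the realloc option combined with a flexible array member, here is a minimal sketch. The count field and the resize_struct name are hypothetical; the point is only that the size is always decided by the allocation call, never by an assignment to an element.

```c
#include <stdlib.h>

struct myStruct {
    size_t count;   /* number of usable elements in array[] */
    int array[];    /* flexible array member */
};

/* Resize the object to hold new_count elements. Pass NULL to create a new
   object. Returns the (possibly moved) object, or NULL on failure, in which
   case the old object is untouched and must still be freed by the caller. */
struct myStruct *resize_struct(struct myStruct *p, size_t new_count)
{
    struct myStruct *q = realloc(p, sizeof *q + new_count * sizeof q->array[0]);
    if (q == NULL)
        return NULL;
    q->count = new_count;
    return q;
}
```

Existing elements are preserved across the resize; any newly added elements are uninitialized until you assign to them.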
What you describe as a "Struct Hack" is indeed a hack. It is not worth it, IMO.
p->array = p+1;
will give you problems on many compilers which will demand explicit conversion:
p->array = (int *) (p+1);
I am able to execute it without problems, without any segmentation faults. Looks weird to me.
It is undefined behaviour. You are accessing memory on the heap, and many compilers and operating systems will not prevent you from doing so. But it is extremely bad practice to rely on it.
I've allocated an "array" of mystruct of size n like this:
if (NULL == (p = calloc(sizeof(struct mystruct) * n,1))) {
/* handle error */
}
Later on, I only have access to p, and no longer have n. Is there a way to determine the length of the array given just the pointer p?
I figure it must be possible, since free(p) does just that. I know malloc() keeps track of how much memory it has allocated, and that's why it knows the length; perhaps there is a way to query for this information? Something like...
int length = askMallocLibraryHowMuchMemoryWasAlloced(p) / sizeof(mystruct)
I know I should just rework the code so that I know n, but I'd rather not if possible. Any ideas?
No, there is no way to get this information without depending strongly on the implementation details of malloc. In particular, malloc may allocate more bytes than you request (e.g. for efficiency in a particular memory architecture). It would be much better to redesign your code so that you keep track of n explicitly. The alternative is at least as much redesign and a much more dangerous approach (given that it's non-standard, abuses the semantics of pointers, and will be a maintenance nightmare for those that come after you): store the length n at the malloc'd address, followed by the array. Allocation would then be:
void *p = calloc(sizeof(struct mystruct) * n + sizeof(unsigned long int), 1);
*((unsigned long int*)p) = n;
n is now stored at *((unsigned long int*)p) and the start of your array is now
void *arr = (char *)p + sizeof(unsigned long int);
Edit: Just to play devil's advocate... I know that these "solutions" all require redesigns, but let's play it out.
Of course, the solution presented above is just a hacky implementation of a (well-packed) struct. You might as well define:
typedef struct {
unsigned int n;
void *arr;
} arrInfo;
and pass around arrInfos rather than raw pointers.
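A minimal sketch of how such a struct might be created and released. The arrInfo_new and arrInfo_free names and the item_size parameter are hypothetical, introduced only for illustration.

```c
#include <stdlib.h>

typedef struct {
    unsigned int n;   /* number of elements */
    void *arr;        /* start of the element storage */
} arrInfo;

/* Hypothetical constructor: allocates n zeroed elements of item_size bytes.
   On failure the result has arr == NULL and n == 0. */
arrInfo arrInfo_new(unsigned int n, size_t item_size)
{
    arrInfo info = { 0, NULL };
    info.arr = calloc(n, item_size);
    if (info.arr != NULL)
        info.n = n;
    return info;
}

/* Releases the storage and resets the struct so stale use is easier to spot. */
void arrInfo_free(arrInfo *info)
{
    free(info->arr);
    info->arr = NULL;
    info->n = 0;
}
```

Any function that receives an arrInfo can read the length from info.n instead of asking the allocator.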
Now we're cooking. But as long as you're redesigning, why stop here? What you really want is an abstract data type (ADT). Any introductory text for an algorithms and data structures class would do it. An ADT defines the public interface of a data type but hides the implementation of that data type. Thus, publicly an ADT for an array might look like
typedef void* arrayInfo;
arrayInfo newArrayInfo(unsigned int n, unsigned int itemSize);
void deleteArrayInfo(arrayInfo);
unsigned int arrayLength(arrayInfo);
void *arrayPtr(arrayInfo);
...
In other words, an ADT is a form of data and behavior encapsulation; it's about as close as you can get to Object-Oriented Programming using straight C. Unless you're stuck on a platform that doesn't have a C++ compiler, you might as well go whole hog and just use an STL std::vector.
There, we've taken a simple question about C and ended up at C++. God help us all.
Keep track of the array size yourself; free uses the malloc chain to free the block that was allocated, which does not necessarily have the same size as the array you requested.
Just to confirm the previous answers: There is no way to know, just by studying a pointer, how much memory was allocated by a malloc which returned this pointer.
What if it worked?
Here is one example of why this is not possible. Let's imagine the code with a hypothetical function called get_size(void *) which returns the memory allocated for a pointer:
typedef struct MyStructTag
{ /* etc. */ } MyStruct ;
void doSomething(MyStruct * p)
{
/* well... extract the memory allocated? */
size_t i = get_size(p) ;
initializeMyStructArray(p, i) ;
}
void doSomethingElse()
{
MyStruct * s = malloc(sizeof(MyStruct) * 10) ; /* Allocate 10 items */
doSomething(s) ;
}
Why would it not work anyway, even if it worked?
The problem with this approach is that, in C, you can play with pointer arithmetic. Let's rewrite doSomethingElse():
void doSomethingElse()
{
MyStruct * s = malloc(sizeof(MyStruct) * 10) ; /* Allocate 10 items */
MyStruct * s2 = s + 5 ; /* s2 points to the item at index 5 */
doSomething(s2) ; /* Oops */
}
How is get_size supposed to work when you pass the function a valid pointer that is not the one returned by malloc? And even if get_size went to all the trouble of finding the size (i.e. in an inefficient way), it would return, in this case, a value that is wrong in your context.
Conclusion
There are always ways to avoid this problem, and in C, you can always write your own allocator, but again, it is perhaps too much trouble when all you need is to remember how much memory was allocated.
Some compilers provide msize() or similar functions (_msize() etc), that let you do exactly that
May I recommend a terrible way to do it?
Allocate all your arrays as follows:
void *blockOfMem = malloc(sizeof(mystruct)*n + sizeof(int));
((int *)blockOfMem)[0] = n;
mystruct *structs = (mystruct *)(((int *)blockOfMem) + 1);
Then you can always cast your arrays to int * and access the -1st element.
Be sure to free that pointer, and not the array pointer itself!
Also, this will likely cause terrible bugs that will leave you tearing your hair out. Maybe you can wrap the alloc funcs in API calls or something.
malloc will return a block of memory at least as big as you requested, but possibly bigger. So even if you could query the block size, this would not reliably give you your array size. So you'll just have to modify your code to keep track of it yourself.
For an array of pointers you can use a NULL-terminated array. The length can then be determined the way it is done with strings. In your example you could use a structure member to mark the end. Of course, that depends on there being a member that cannot be NULL. So let's say you have a member name that needs to be set for every struct in your array; you can then query the size by:
int size;
struct mystruct *cur;
for (cur = myarray; cur->name != NULL; cur++)
;
size = cur - myarray;
Btw it should be calloc(n, sizeof(struct mystruct)) in your example.
Others have discussed the limits of plain C pointers and the stdlib.h implementations of malloc(). Some implementations provide extensions which return the allocated block size, which may be larger than the requested size.
If you must have this behavior, you can use or write a specialized memory allocator. The simplest thing to do would be to implement a wrapper around the stdlib.h functions. Something like:
void* my_malloc(size_t s); /* Calls malloc(s), and if successful stores
(p,s) in a list of handled blocks */
void my_free(void* p); /* Removes list entry and calls free(p) */
size_t my_block_size(void* p); /* Looks up p, and returns the stored size */
...
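One possible way to fill in those three declarations is a singly linked list of tracked blocks. This is a sketch under simplifying assumptions: the block_info struct and the blocks list head are hypothetical, lookup is linear in the number of live blocks, and nothing here is thread-safe.

```c
#include <stdlib.h>

/* One list node per live allocation. */
struct block_info {
    void *ptr;
    size_t size;
    struct block_info *next;
};

static struct block_info *blocks = NULL;

/* Calls malloc(s) and, if successful, records (p, s) in the list. */
void *my_malloc(size_t s)
{
    struct block_info *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    void *p = malloc(s);
    if (p == NULL) {
        free(node);
        return NULL;
    }
    node->ptr = p;
    node->size = s;
    node->next = blocks;
    blocks = node;
    return p;
}

/* Returns the size passed to my_malloc, or 0 if p is not a tracked block. */
size_t my_block_size(void *p)
{
    for (struct block_info *b = blocks; b != NULL; b = b->next)
        if (b->ptr == p)
            return b->size;
    return 0;
}

/* Removes the list entry for p, if any, and calls free(p). */
void my_free(void *p)
{
    for (struct block_info **link = &blocks; *link != NULL; link = &(*link)->next) {
        if ((*link)->ptr == p) {
            struct block_info *dead = *link;
            *link = dead->next;
            free(dead);
            break;
        }
    }
    free(p);
}
```

Note that my_block_size only answers for the exact pointer returned by my_malloc; a pointer into the middle of a block is reported as untracked.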
Really, your question is: "can I find out the size of a malloc'd (or calloc'd) data block?" And as others have said: no, not in a standard way.
However there are custom malloc implementations that do it - for example http://dmalloc.com/
I'm not aware of a way, but I would imagine it would deal with mucking around in malloc's internals which is generally a very, very bad idea.
Why is it that you can't store the size of memory you allocated?
EDIT: If you know that you should rework the code so you know n, well, do it. Yes it might be quick and easy to try to poll malloc but knowing n for sure would minimize confusion and strengthen the design.
One of the reasons that you can't ask the malloc library how big a block is, is that the allocator will usually round up the size of your request to meet some minimum granularity requirement (for example, 16 bytes). So if you ask for 5 bytes, you'll get a block of size 16 back. If you were to take 16 and divide by 5, you would get three elements when you really only allocated one. It would take extra space for the malloc library to keep track of how many bytes you asked for in the first place, so it's best for you to keep track of that yourself.
This is a test of my sort routine. It sets up 7 variables to hold float values, then assigns them to an array, which is used to find the max value.
The magic is in the call to myMax:
float mmax = myMax((float *)&arr,(int) sizeof(arr)/sizeof(arr[0]));
And that was magical, wasn't it?
myMax expects a pointer to float (float *), so I use &arr to get the address of the array and cast it to a float pointer (plain arr would decay to the same pointer without a cast).
myMax also expects the number of elements in the array as an int. I get that value by using sizeof to give me the byte sizes of the array and of its first element, then dividing the total bytes by the number of bytes in each element. (We should not guess or hard-code the size of a type: an int, for example, is 2 bytes on some systems and 4 on others like my OS X Mac, and could be something else elsewhere.)
NOTE: All this is important when your data may have a varying number of samples. It only works where arr is declared as an array; inside myMax, apa is a plain pointer and sizeof would give the size of the pointer, which is why the count is passed in.
Here's the test code:
#include <stdio.h>
float a, b, c, d, e, f, g;
float myMax(float *apa,int soa){
int i;
float max = apa[0];
for(i=0; i< soa; i++){
if (apa[i]>max){max=apa[i];}
printf("on i=%d val is %0.2f max is %0.2f, soa=%d\n",i,apa[i],max,soa);
}
return max;
}
int main(void)
{
a = 2.0;
b = 1.0;
c = 4.0;
d = 3.0;
e = 7.0;
f = 9.0;
g = 5.0;
float arr[] = {a,b,c,d,e,f,g};
float mmax = myMax((float *)&arr,(int) sizeof(arr)/sizeof(arr[0]));
printf("mmax = %0.2f\n",mmax);
return 0;
}
In uClibc, there is a MALLOC_SIZE macro in malloc.h:
/* The size of a malloc allocation is stored in a size_t word
MALLOC_HEADER_SIZE bytes prior to the start address of the allocation:
+--------+---------+-------------------+
| SIZE |(unused) | allocation ... |
+--------+---------+-------------------+
^ BASE ^ ADDR
^ ADDR - MALLOC_HEADER_SIZE
*/
/* The amount of extra space used by the malloc header. */
#define MALLOC_HEADER_SIZE \
(MALLOC_ALIGNMENT < sizeof (size_t) \
? sizeof (size_t) \
: MALLOC_ALIGNMENT)
/* Set up the malloc header, and return the user address of a malloc block. */
#define MALLOC_SETUP(base, size) \
(MALLOC_SET_SIZE (base, size), (void *)((char *)base + MALLOC_HEADER_SIZE))
/* Set the size of a malloc allocation, given the base address. */
#define MALLOC_SET_SIZE(base, size) (*(size_t *)(base) = (size))
/* Return base-address of a malloc allocation, given the user address. */
#define MALLOC_BASE(addr) ((void *)((char *)addr - MALLOC_HEADER_SIZE))
/* Return the size of a malloc allocation, given the user address. */
#define MALLOC_SIZE(addr) (*(size_t *)MALLOC_BASE(addr))
malloc() stores metadata about the allocation in the 8 bytes just before the space actually returned. This can be used to determine the size of the buffer. It is specific to the allocator and not portable. On my x86-64 machine the stored value is always a multiple of 16, so if the allocated size is a multiple of 16 (which it is in most cases), this could be used:
Code
#include <stdio.h>
#include <malloc.h>
int size_of_buff(void *buff) {
return ( *( ( int * ) buff - 2 ) - 17 ); // 32 bit system: ( *( ( int * ) buff - 1 ) - 17 )
}
int main(void) {
char *buff = malloc(1024);
printf("Size of Buffer: %d\n", size_of_buff(buff));
}
Output
Size of Buffer: 1024
This is my approach:
#include <stdio.h>
#include <stdlib.h>
typedef struct _int_array
{
int *number;
int size;
} int_array;
int int_array_append(int_array *a, int n)
{
static char c = 0;
if(!c)
{
a->number = NULL;
a->size = 0;
c++;
}
int *more_numbers = NULL;
a->size++;
more_numbers = (int *)realloc(a->number, a->size * sizeof(int));
if(more_numbers != NULL)
{
a->number = more_numbers;
a->number[a->size - 1] = n;
}
else
{
free(a->number);
printf("Error (re)allocating memory.\n");
return 1;
}
return 0;
}
int main()
{
int_array a;
int_array_append(&a, 10);
int_array_append(&a, 20);
int_array_append(&a, 30);
int_array_append(&a, 40);
int i;
for(i = 0; i < a.size; i++)
printf("%d\n", a.number[i]);
printf("\nLen: %d\nSize: %zu\n", a.size, a.size * sizeof(int));
free(a.number);
return 0;
}
Output:
10
20
30
40
Len: 4
Size: 16
If your compiler supports VLA (variable length array), you can embed the array length into the pointer type.
int n = 10;
int (*p)[n] = malloc(n * sizeof(int));
n = 3;
printf("%zu\n", sizeof(*p)/sizeof(**p));
The output is 10.
You could also choose to embed the information into the allocated memory yourself with a structure including a flexible array member.
struct myarray {
int n;
struct mystruct a[];
};
struct myarray *ma =
malloc(sizeof(*ma) + n * sizeof(struct mystruct));
ma->n = n;
struct mystruct *p = ma->a;
Then to recover the size, you would subtract the offset of the flexible member.
int get_size (struct mystruct *p) {
struct myarray *ma;
char *x = (char *)p;
ma = (void *)(x - offsetof(struct myarray, a));
return ma->n;
}
The problem with trying to peek into heap structures is that the layout might change from platform to platform or from release to release, and so the information may not be reliably obtainable.
Even if you knew exactly how to peek into the meta information maintained by your allocator, the information stored there may have nothing to do with the size of the array. The allocator simply returned memory that could be used to fit the requested size, but the actual size of the memory may be larger (perhaps even much larger) than the requested amount.
The only reliable way to know the information is to find a way to track it yourself.
..
char arKey[1]; } Bucket;
The above is said to be a flexible array. How?
Often the last member of a struct is given a size of 0 or 1 (a size of 0 is not allowed by the C standard, but some compilers accept it as an extension because it has great value as a marker). As one would not normally create an array of size 0 or 1, this indicates to fellow coders that the field is used as the start of a variably sized array, extending from the final member into any available memory.
You may also find a member of the struct defining the exact length of the flexible array, just as you often find a member that contains the total size in bytes of the struct.
Links
http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
Is using flexible array members in C bad practice?
http://msdn.microsoft.com/en-us/library/6zxfydcs(VS.71).aspx
http://blogs.msdn.com/b/oldnewthing/archive/2004/08/26/220873.aspx
Example
typedef struct {
size_t len;
char arr[];
} MyString;
size_t mystring_len(MyString const *ms) { return ms->len; }
MyString *mystring_new(char const *init)
{
size_t len = strlen(init);
MyString *rv = malloc(sizeof(MyString) + len + 1);
if (rv == NULL) return NULL;
rv->len = len;
memcpy(rv->arr, init, len + 1); /* copies the terminating '\0' too */
return rv;
}
In C99, a flexible array member is declared with no size at all, as in char arKey[]. A length of 0 is a compiler extension (GCC accepts it, for example), and a size of 1 is the pre-C99 idiom.
Basically, such flexible arrays are created by invoking malloc with sizeof(Bucket) + array_length, where array_length is the desired size of your array in bytes. Then, indexing into arKey (which must be the last member of your structure) will access that extra memory, effectively implementing variable-sized objects.
See this page for more information:
http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html