I have a question about dynamic memory allocation in C.
I am currently working on an artificial neural network implemantation in C.
I found and existing project called genann where I noticed a method of memery allocation I was no familar with.
Consider a struct:
typedef struct
{
int an, bn;
double *a, *b
} foo;
And an init function:
foo *foo_init(int an, int bn)
{
foo *f;
f = malloc(sizeof(*f));
f->an = an;
f->bn = bn;
f->a = malloc(an*sizeof(*f->a));
f->b = malloc(bn*sizeof(*f->b));
return f;
}
And here is a different init function like in the mentioned project:
foo *foo_init(int an, int bn)
{
foo *f;
f = malloc(sizeof(*f) + an*sizeof(*f->a) + bn*sizeof(*f->b));
f->an = an;
f->bn = bn;
f->a = (double*)((char*)f + sizeof(*f)); // Could be f->a = (double*)(f + 1); ?
f->b = f->a + an;
return f;
}
So was thinking what are the differences between this two methods. The only advantage of the second method i could think of is that i only need to allocate and free one memory block, but since it is only an init function which is probably only called once, the performance difference should be insignificant. On the other side pointers of different types point to the same memeory chunk, but I think that doesn't violate the strict aliasing rule since they point to different memory within the chunk (?). Resizing could be difficult since it cant be done with a simple realloc like in the first init function. For example if i want to shrink a and b by one and I reallocate the whole memory block, the last two b values are lost (and not one a, one b).
My conclusion is that its better to allocate the memory in the first way, since the second has nearly only disadvantages. Is one of the init functions bad practice? Am I missing something, which would make the second function better? Maybe there is a particular reason why they used the second one in the project?
Thanks in advance.
If you allocate and free lots of these structures, the savings could add up to something that's significant.
It uses a little less space, since each allocated block has some bookkeeping that records its size. It also may reduce memory fragmentation.
Also, doing this ensures that the a and b arrays are close together in memory. If they're often used together, this can improve cache hits.
Library implementers often do these micro-optimizations because they can't predict how the library will be used, and they want to work as well as possible in all cases. When you're writing your own application, you have a better idea of which code will be in inner loops that are significant for performance tuning.
2nd approach problem. The below can fail due to alignment.
f->a = (double*)((char*)f + sizeof(*f));
The pointer returned by malloc() is valid for all object pointers. The computed ((char*)f + sizeof(*f)) may not meet the alignment requirement. In this case, it is highly likely things will work, but not so for other typedefs.
Since C99, code could use a flexible array member to reap the advantages of a single allocation without risking UB. Also simpler syntax.
typedef struct {
int an, bn;
double *a, *b;
// Padding will be added here as needed to meet `data[]` alignment needs
double data[];
} foo;
foo *foo_init(int an, int bn) {
// `sizeof *f` includes space for `an,bn,a,b`
// and optional alignment padding up to `data`, but not `data`.
foo *f = malloc(sizeof *f + (an + bn) * sizeof *(f->data));
if (f) {
f->an = an;
f->bn = bn;
f->a = (double*) (f + 1); // Guaranteed to align to f->data
f->b = f->a + an;
}
return f;
}
Consider size_t rather than int an, bn.
There are ways using union to do this in pre-C99. Something like
typedef union {
foo f;
double data;
} both;
foo *foo_init(int an, int bn) {
both *p = malloc(sizeof(both) + (an + bn) * sizeof *(f->data));
foo *f = (foo *) p;
if (f) {
f->an = an;
f->bn = bn;
// v--- Note p
f->a = (double*) (p + 1); // Guaranteed to align to p->data type
f->b = f->a + an;
}
return f;
}
I've allocated an "array" of mystruct of size n like this:
if (NULL == (p = calloc(sizeof(struct mystruct) * n,1))) {
/* handle error */
}
Later on, I only have access to p, and no longer have n. Is there a way to determine the length of the array given just the pointer p?
I figure it must be possible, since free(p) does just that. I know malloc() keeps track of how much memory it has allocated, and that's why it knows the length; perhaps there is a way to query for this information? Something like...
int length = askMallocLibraryHowMuchMemoryWasAlloced(p) / sizeof(mystruct)
I know I should just rework the code so that I know n, but I'd rather not if possible. Any ideas?
No, there is no way to get this information without depending strongly on the implementation details of malloc. In particular, malloc may allocate more bytes than you request (e.g. for efficiency in a particular memory architecture). It would be much better to redesign your code so that you keep track of n explicitly. The alternative is at least as much redesign and a much more dangerous approach (given that it's non-standard, abuses the semantics of pointers, and will be a maintenance nightmare for those that come after you): store the lengthn at the malloc'd address, followed by the array. Allocation would then be:
void *p = calloc(sizeof(struct mystruct) * n + sizeof(unsigned long int),1));
*((unsigned long int*)p) = n;
n is now stored at *((unsigned long int*)p) and the start of your array is now
void *arr = p+sizeof(unsigned long int);
Edit: Just to play devil's advocate... I know that these "solutions" all require redesigns, but let's play it out.
Of course, the solution presented above is just a hacky implementation of a (well-packed) struct. You might as well define:
typedef struct {
unsigned int n;
void *arr;
} arrInfo;
and pass around arrInfos rather than raw pointers.
Now we're cooking. But as long as you're redesigning, why stop here? What you really want is an abstract data type (ADT). Any introductory text for an algorithms and data structures class would do it. An ADT defines the public interface of a data type but hides the implementation of that data type. Thus, publicly an ADT for an array might look like
typedef void* arrayInfo;
(arrayInfo)newArrayInfo(unsignd int n, unsigned int itemSize);
(void)deleteArrayInfo(arrayInfo);
(unsigned int)arrayLength(arrayInfo);
(void*)arrayPtr(arrayInfo);
...
In other words, an ADT is a form of data and behavior encapsulation... in other words, it's about as close as you can get to Object-Oriented Programming using straight C. Unless you're stuck on a platform that doesn't have a C++ compiler, you might as well go whole hog and just use an STL std::vector.
There, we've taken a simple question about C and ended up at C++. God help us all.
keep track of the array size yourself; free uses the malloc chain to free the block that was allocated, which does not necessarily have the same size as the array you requested
Just to confirm the previous answers: There is no way to know, just by studying a pointer, how much memory was allocated by a malloc which returned this pointer.
What if it worked?
One example of why this is not possible. Let's imagine the code with an hypothetic function called get_size(void *) which returns the memory allocated for a pointer:
typedef struct MyStructTag
{ /* etc. */ } MyStruct ;
void doSomething(MyStruct * p)
{
/* well... extract the memory allocated? */
size_t i = get_size(p) ;
initializeMyStructArray(p, i) ;
}
void doSomethingElse()
{
MyStruct * s = malloc(sizeof(MyStruct) * 10) ; /* Allocate 10 items */
doSomething(s) ;
}
Why even if it worked, it would not work anyway?
But the problem of this approach is that, in C, you can play with pointer arithmetics. Let's rewrite doSomethingElse():
void doSomethingElse()
{
MyStruct * s = malloc(sizeof(MyStruct) * 10) ; /* Allocate 10 items */
MyStruct * s2 = s + 5 ; /* s2 points to the 5th item */
doSomething(s2) ; /* Oops */
}
How get_size is supposed to work, as you sent the function a valid pointer, but not the one returned by malloc. And even if get_size went through all the trouble to find the size (i.e. in an inefficient way), it would return, in this case, a value that would be wrong in your context.
Conclusion
There are always ways to avoid this problem, and in C, you can always write your own allocator, but again, it is perhaps too much trouble when all you need is to remember how much memory was allocated.
Some compilers provide msize() or similar functions (_msize() etc), that let you do exactly that
May I recommend a terrible way to do it?
Allocate all your arrays as follows:
void *blockOfMem = malloc(sizeof(mystruct)*n + sizeof(int));
((int *)blockofMem)[0] = n;
mystruct *structs = (mystruct *)(((int *)blockOfMem) + 1);
Then you can always cast your arrays to int * and access the -1st element.
Be sure to free that pointer, and not the array pointer itself!
Also, this will likely cause terrible bugs that will leave you tearing your hair out. Maybe you can wrap the alloc funcs in API calls or something.
malloc will return a block of memory at least as big as you requested, but possibly bigger. So even if you could query the block size, this would not reliably give you your array size. So you'll just have to modify your code to keep track of it yourself.
For an array of pointers you can use a NULL-terminated array. The length can then determinate like it is done with strings. In your example you can maybe use an structure attribute to mark then end. Of course that depends if there is a member that cannot be NULL. So lets say you have an attribute name, that needs to be set for every struct in your array you can then query the size by:
int size;
struct mystruct *cur;
for (cur = myarray; cur->name != NULL; cur++)
;
size = cur - myarray;
Btw it should be calloc(n, sizeof(struct mystruct)) in your example.
Other have discussed the limits of plain c pointers and the stdlib.h implementations of malloc(). Some implementations provide extensions which return the allocated block size which may be larger than the requested size.
If you must have this behavior you can use or write a specialized memory allocator. This simplest thing to do would be implementing a wrapper around the stdlib.h functions. Some thing like:
void* my_malloc(size_t s); /* Calls malloc(s), and if successful stores
(p,s) in a list of handled blocks */
void my_free(void* p); /* Removes list entry and calls free(p) */
size_t my_block_size(void* p); /* Looks up p, and returns the stored size */
...
really your question is - "can I find out the size of a malloc'd (or calloc'd) data block". And as others have said: no, not in a standard way.
However there are custom malloc implementations that do it - for example http://dmalloc.com/
I'm not aware of a way, but I would imagine it would deal with mucking around in malloc's internals which is generally a very, very bad idea.
Why is it that you can't store the size of memory you allocated?
EDIT: If you know that you should rework the code so you know n, well, do it. Yes it might be quick and easy to try to poll malloc but knowing n for sure would minimize confusion and strengthen the design.
One of the reasons that you can't ask the malloc library how big a block is, is that the allocator will usually round up the size of your request to meet some minimum granularity requirement (for example, 16 bytes). So if you ask for 5 bytes, you'll get a block of size 16 back. If you were to take 16 and divide by 5, you would get three elements when you really only allocated one. It would take extra space for the malloc library to keep track of how many bytes you asked for in the first place, so it's best for you to keep track of that yourself.
This is a test of my sort routine. It sets up 7 variables to hold float values, then assigns them to an array, which is used to find the max value.
The magic is in the call to myMax:
float mmax = myMax((float *)&arr,(int) sizeof(arr)/sizeof(arr[0]));
And that was magical, wasn't it?
myMax expects a float array pointer (float *) so I use &arr to get the address of the array, and cast it as a float pointer.
myMax also expects the number of elements in the array as an int. I get that value by using sizeof() to give me byte sizes of the array and the first element of the array, then divide the total bytes by the number of bytes in each element. (we should not guess or hard code the size of an int because it's 2 bytes on some system and 4 on some like my OS X Mac, and could be something else on others).
NOTE:All this is important when your data may have a varying number of samples.
Here's the test code:
#include <stdio.h>
float a, b, c, d, e, f, g;
float myMax(float *apa,int soa){
int i;
float max = apa[0];
for(i=0; i< soa; i++){
if (apa[i]>max){max=apa[i];}
printf("on i=%d val is %0.2f max is %0.2f, soa=%d\n",i,apa[i],max,soa);
}
return max;
}
int main(void)
{
a = 2.0;
b = 1.0;
c = 4.0;
d = 3.0;
e = 7.0;
f = 9.0;
g = 5.0;
float arr[] = {a,b,c,d,e,f,g};
float mmax = myMax((float *)&arr,(int) sizeof(arr)/sizeof(arr[0]));
printf("mmax = %0.2f\n",mmax);
return 0;
}
In uClibc, there is a MALLOC_SIZE macro in malloc.h:
/* The size of a malloc allocation is stored in a size_t word
MALLOC_HEADER_SIZE bytes prior to the start address of the allocation:
+--------+---------+-------------------+
| SIZE |(unused) | allocation ... |
+--------+---------+-------------------+
^ BASE ^ ADDR
^ ADDR - MALLOC_HEADER_SIZE
*/
/* The amount of extra space used by the malloc header. */
#define MALLOC_HEADER_SIZE \
(MALLOC_ALIGNMENT < sizeof (size_t) \
? sizeof (size_t) \
: MALLOC_ALIGNMENT)
/* Set up the malloc header, and return the user address of a malloc block. */
#define MALLOC_SETUP(base, size) \
(MALLOC_SET_SIZE (base, size), (void *)((char *)base + MALLOC_HEADER_SIZE))
/* Set the size of a malloc allocation, given the base address. */
#define MALLOC_SET_SIZE(base, size) (*(size_t *)(base) = (size))
/* Return base-address of a malloc allocation, given the user address. */
#define MALLOC_BASE(addr) ((void *)((char *)addr - MALLOC_HEADER_SIZE))
/* Return the size of a malloc allocation, given the user address. */
#define MALLOC_SIZE(addr) (*(size_t *)MALLOC_BASE(addr))
malloc() stores metadata regarding space allocation before 8 bytes from space actually allocated. This could be used to determine space of buffer. And on my x86-64 this always return multiple of 16. So if allocated space is multiple of 16 (which is in most cases) then this could be used:
Code
#include <stdio.h>
#include <malloc.h>
int size_of_buff(void *buff) {
return ( *( ( int * ) buff - 2 ) - 17 ); // 32 bit system: ( *( ( int * ) buff - 1 ) - 17 )
}
void main() {
char *buff = malloc(1024);
printf("Size of Buffer: %d\n", size_of_buff(buff));
}
Output
Size of Buffer: 1024
This is my approach:
#include <stdio.h>
#include <stdlib.h>
typedef struct _int_array
{
int *number;
int size;
} int_array;
int int_array_append(int_array *a, int n)
{
static char c = 0;
if(!c)
{
a->number = NULL;
a->size = 0;
c++;
}
int *more_numbers = NULL;
a->size++;
more_numbers = (int *)realloc(a->number, a->size * sizeof(int));
if(more_numbers != NULL)
{
a->number = more_numbers;
a->number[a->size - 1] = n;
}
else
{
free(a->number);
printf("Error (re)allocating memory.\n");
return 1;
}
return 0;
}
int main()
{
int_array a;
int_array_append(&a, 10);
int_array_append(&a, 20);
int_array_append(&a, 30);
int_array_append(&a, 40);
int i;
for(i = 0; i < a.size; i++)
printf("%d\n", a.number[i]);
printf("\nLen: %d\nSize: %d\n", a.size, a.size * sizeof(int));
free(a.number);
return 0;
}
Output:
10
20
30
40
Len: 4
Size: 16
If your compiler supports VLA (variable length array), you can embed the array length into the pointer type.
int n = 10;
int (*p)[n] = malloc(n * sizeof(int));
n = 3;
printf("%d\n", sizeof(*p)/sizeof(**p));
The output is 10.
You could also choose to embed the information into the allocated memory yourself with a structure including a flexible array member.
struct myarray {
int n;
struct mystruct a[];
};
struct myarray *ma =
malloc(sizeof(*ma) + n * sizeof(struct mystruct));
ma->n = n;
struct mystruct *p = ma->a;
Then to recover the size, you would subtract the offset of the flexible member.
int get_size (struct mystruct *p) {
struct myarray *ma;
char *x = (char *)p;
ma = (void *)(x - offsetof(struct myarray, a));
return ma->n;
}
The problem with trying to peek into heap structures is that the layout might change from platform to platform or from release to release, and so the information may not be reliably obtainable.
Even if you knew exactly how to peek into the meta information maintained by your allocator, the information stored there may have nothing to do with the size of the array. The allocator simply returned memory that could be used to fit the requested size, but the actual size of the memory may be larger (perhaps even much larger) than the requested amount.
The only reliable way to know the information is to find a way to track it yourself.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Improve this question
struct atom {
int x;
int y;
int z;
double mass;
};
struct molecule {
struct atom *member;
int natoms;
};
struct system {
struct molecule *fragment;
int nfrags
};
struct system sys;
sys.nfrags=get_number_of_fragments;
????
The system has some number of molecules, each of which has some number of atoms. I don't know how to allocate these things. If I allocate sys.fragment first, it seems like the sizeof(molecule) is undefined since I haven't yet defined the number of atoms (so how can it have a size?). If I try to define the number of atoms first, how do I specify which fragment I'm mallocing for?
I have functions that will return the number of atoms for any molecule/fragment as well as the number of fragments, but am stuck on where to go from here.
AFAIK sizeof(X) cannot be "undefined". In this example, sizeof(molecule) is well defined as the amount of memory it takes to store one molecule instance: One atom Pointer (Note: not the size of any array that you may put here, just the size of the pointer) and one int. So it is perfectly fine to do it the first way and allocate your sys.fragment first:
sys.fragment = malloc(sys.nfrag * sizeof(sys.fragment));
sys.fragment = calloc(sys.nfrag, sizeof sys.fragmemt[0]);
or
sys.fragment = malloc(sys.nfrag * sizeof sys.fragment[0]);
when performance matters (but do not forget the check for overflow!).
malloc(sizeof(system)) Easy as that
You basically have a:
size = X;
first = malloc(X * sizeof(First));
first->second = malloc( N * sizeof(Second));
and so on
This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center.
Closed 10 years ago.
I have a few questions about structs and pointers
For this struct:
typedef struct tNode_t {
char *w;
} tNode;
How come if I want to change/know the value of *w I need to use t.w = "asdfsd" instead
of t->w = "asdfasd"?
And I compiled this successfully without having t.w = (char *) malloc(28*sizeof(char));
in my testing code, is there a reason why tt's not needed?
Sample main:
int main()
{
tNode t;
char w[] = "abcd";
//t.word = (char *) malloc(28*sizeof(char));
t.word = w;
printf("%s", t.word);
}
Thanks.
t->w is shorthand for (*t).w i.e. it only makes sense to use the arrow if t is a pointer to a struct.
Also, since the string you assigned is hard-coded (thus, determined at compile time), there's no need to dynamically allocate its memory at runtime.
Take a look at this tutorial: http://pw1.netcom.com/~tjensen/ptr/ch5x.htm
struct tag {
char lname[20]; /* last name */
char fname[20]; /* first name */
int age; /* age */
float rate; /* e.g. 12.75 per hour */
};
kay, so we know that our pointer is going to point to a structure declared using struct tag. We declare such a pointer with the declaration:
struct tag *st_ptr;
and we point it to our example structure with:
st_ptr = &my_struct;
Now, we can access a given member by de-referencing the pointer. But, how do we de-reference the pointer to a structure? Well, consider the fact that we might want to use the pointer to set the age of the employee. We would write:
(*st_ptr).age = 63;
Look at this carefully. It says, replace that within the parenthesis with that which st_ptr points to, which is the structure my_struct. Thus, this breaks down to the same as my_struct.age.
However, this is a fairly often used expression and the designers of C have created an alternate syntax with the same meaning which is:
st_ptr->age = 63;
How come if I want to change/know the value of *w I need to use t.w =
"asdfsd" instead of t->w = "asdfasd"?
If your t is a pointer, you need to use t->w. Else you should use t.w.
in my testing code, is there a reason why tt's not needed?
In your code you have already set your t.word to point to the area storing your char w[] = "abcd";. So you don't need to malloc some memory for your t.word.
Another example where malloc is needed is:
tNode t;
t.word = (char *) malloc(28*sizeof(char));
strcpy(t.word, "Hello World");
printf("%s", t.word);
How come if I want to change/know the value of *w I need to use t.w =
"asdfsd" instead of t->w = "asdfasd"?
t is a tNode, not a pointer to tNode, so . is the appropriate accessor.
And I compiled this successfully without having t.w = (char *)
malloc(28*sizeof(char)); in my testing code, is there a reason why
tt's not needed?
This isn't needed because "abcd" is still in scope so w can just point to it. If you want w to point to persistent memory-managed contents, it will have to allocate its own memory as it would with this line.
I have a dynamic array of structures, so I thought I could store the information about the array in the first structure.
So one attribute will represent the amount of memory allocated for the array and another one representing number of the structures actually stored in the array.
The trouble is, that when I put it inside a function that fills it with these structures and tries to allocate more memory if needed, the original array gets somehow distorted.
Can someone explain why is this and how to get past it?
Here is my code
#define INIT 3
typedef struct point{
int x;
int y;
int c;
int d;
}Point;
Point empty(){
Point p;
p.x=1;
p.y=10;
p.c=100;
p.d=1000; //if you put different values it will act differently - weird
return p;
}
void printArray(Point * r){
int i;
int total = r[0].y+1;
for(i=0;i<total;i++){
printf("%2d | P [%2d,%2d][%4d,%4d]\n",i,r[i].x,r[i].y,r[i].c,r[i].d);
}
}
void reallocFunction(Point * r){
r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
r[0].x*=2;
}
void enter(Point* r,int c){
int i;
for(i=1;i<c;i++){
r[r[0].y+1]=empty();
r[0].y++;
if( (r[0].y+2) >= r[0].x ){ /*when the amount of Points is near
*the end of allocated memory.
reallocate the array*/
reallocFunction(r);
}
}
}
int main(int argc, char** argv) {
Point * r=(Point *) malloc ( sizeof ( Point ) * INIT );
r[0]=empty();
r[0].x=INIT; /*so here I store for how many "Points" is there memory
//in r[0].y theres how many Points there are.*/
enter(r,5);
printArray(r);
return (0);
}
Your code does not look clean to me for other reasons, but...
void reallocFunction(Point * r){
r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
r[0].x*=2;
r[0].y++;
}
The problem here is that r in this function is the parameter, hence any modifications to it are lost when the function returns. You need some way to change the caller's version of r. I suggest:
Point * // Note new return type...
reallocFunction(Point * r){
r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
r[0].x*=2;
r[0].y++;
return r; // Note: now we return r back to the caller..
}
Then later:
r = reallocFunction(r);
Now... Another thing to consider is that realloc can fail. A common pattern for realloc that accounts for this is:
Point *reallocFunction(Point * r){
void *new_buffer = realloc(r, r[0].x*2*sizeof(Point));
if (!new_buffer)
{
// realloc failed, pass the error up to the caller..
return NULL;
}
r = new_buffer;
r[0].x*=2;
r[0].y++;
return r;
}
This ensures that you don't leak r when the memory allocation fails, and the caller then has to decide what happens when your function returns NULL...
But, some other things I'd point out about this code (I don't mean to sound like I'm nitpicking about things and trying to tear them apart; this is meant as constructive design feedback):
The names of variables and members don't make it very clear what you're doing.
You've got a lot of magic constants. There's no explanation for what they mean or why they exist.
reallocFunction doesn't seem to really make sense. Perhaps the name and interface can be clearer. When do you need to realloc? Why do you double the X member? Why do you increment Y? Can the caller make these decisions instead? I would make that clearer.
Similarly it's not clear what enter() is supposed to be doing. Maybe the names could be clearer.
It's a good thing to do your allocations and manipulation of member variables in a consistent place, so it's easy to spot (and later, potentially change) how you're supposed to create, destroy and manipulate one of these objects. Here it seems in particular like main() has a lot of knowledge of your structure's internals. That seems bad.
Use of the multiplication operator in parameters to realloc in the way that you do is sometimes a red flag... It's a corner case, but the multiplication can overflow and you can end up shrinking the buffer instead of growing it. This would make you crash and in writing production code it would be important to avoid this for security reasons.
You also do not seem to initialize r[0].y. As far as I understood, you should have a r[0].y=0 somewhere.
Anyway, you using the first element of the array to do something different is definitely a bad idea. It makes your code horribly complex to understand. Just create a new structure, holding the array size, the capacity, and the pointer.