C memory allocation sequence for struct data

I am reading a C script written by someone else. I don't understand this memory allocation part.
lda_suffstats* ss = malloc(sizeof(lda_suffstats));
ss->class_total = malloc(sizeof(double)*num_topics);
ss->class_word = malloc(sizeof(double*)*num_topics);
where lda_suffstats is a self-defined structure,
typedef struct
{
double** class_word;
double* class_total;
double alpha_suffstats;
int num_docs;
} lda_suffstats;
My question is about the first line of memory allocation. What is the size of lda_suffstats? Shouldn't the memory for each of its components be allocated before the struct itself?

You can know how big lda_suffstats will be before you actually have one, just like you know how big a bag you need in order to fit two cartons of milk and a dozen eggs. The size of lda_suffstats is the sum of the sizes of double**, double*, double and int, plus any padding the compiler inserts for alignment. They are not independent components; they all live in the memory of the lda_suffstats itself. Now, the first two are pointers, which means the associated value is not stored right there but only pointed to, and allocating the targets of those pointers is what the other two malloc lines are about.

lda_suffstats has four fields, of types double**, double*, double, and int. The size of each of these is known at compile time. The sum of their sizes, plus any alignment padding, gives the size of lda_suffstats. The amount of memory allocated for the pointers to point at does not change this, because that memory is allocated outside of the struct.
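As a rough sketch of the order described above (struct first, then the arrays its pointer members refer to), the allocation could be wrapped as below. The helper names suffstats_new, suffstats_free and suffstats_member_bytes are made up for illustration and do not come from the original code; the rows of class_word are left as NULL because the excerpt does not show how they are allocated.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct
{
    double **class_word;
    double *class_total;
    double alpha_suffstats;
    int num_docs;
} lda_suffstats;

/* Hypothetical helper: allocate the struct, then the two arrays that its
   pointer members refer to. Returns NULL if any allocation fails. */
lda_suffstats *suffstats_new(size_t num_topics)
{
    /* Room for two pointers, a double and an int (plus padding). */
    lda_suffstats *ss = malloc(sizeof *ss);
    if (ss == NULL)
        return NULL;

    ss->class_total = malloc(num_topics * sizeof *ss->class_total);
    ss->class_word = malloc(num_topics * sizeof *ss->class_word);
    if (ss->class_total == NULL || ss->class_word == NULL) {
        free(ss->class_total);   /* free(NULL) is a no-op */
        free(ss->class_word);
        free(ss);
        return NULL;
    }

    for (size_t k = 0; k < num_topics; k++) {
        ss->class_total[k] = 0.0;
        ss->class_word[k] = NULL;   /* rows would be allocated separately */
    }
    ss->alpha_suffstats = 0.0;
    ss->num_docs = 0;
    return ss;
}

/* Hypothetical helper: release in the reverse order of allocation. */
void suffstats_free(lda_suffstats *ss)
{
    if (ss == NULL)
        return;
    free(ss->class_word);
    free(ss->class_total);
    free(ss);
}

/* Sum of the member sizes; sizeof(lda_suffstats) is at least this large. */
size_t suffstats_member_bytes(void)
{
    return sizeof(double **) + sizeof(double *) + sizeof(double) + sizeof(int);
}
```

Calling suffstats_new(num_topics) and later suffstats_free(ss) mirrors the three malloc lines in the question. Comparing sizeof(lda_suffstats) with suffstats_member_bytes() shows whether the compiler added padding on a given platform.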


the difference between struct with flexible arrays members and struct with pointer members

I'm quite confused about the difference between flexible array members and pointers as struct members. Someone suggested that a struct with pointers needs malloc twice. However, consider the following code:
struct Vector {
size_t size;
double *data;
};
int len = 20;
struct Vector* newVector = malloc(sizeof *newVector + len * sizeof*newVector->data);
printf("%p\n",newVector->data);//print 0x0
newVector->data =(double*)((char*)newVector + sizeof*newVector);
// do sth
free(newVector);
One difference I find is that the data member of Vector does not point anywhere defined after the malloc; the programmer has to compute the exact address and assign it. However, if Vector is defined as:
struct Vector {
size_t size;
double data[];
};
Then the address of data is defined.
I am wondering whether it is safe to malloc a struct with pointer members like this, and what exactly the reason is that programmers malloc twice when using a struct with pointers.
The difference is how the struct is stored. In the first example you over-allocate memory but that doesn't magically mean that the data pointer gets set to point at that memory. Its value after malloc is in fact indeterminate, so you can't reliably print it.
Sure, you can set that pointer to point beyond the part allocated by the struct itself, but that means potentially slower access since you need to go through the pointer each time. Also you allocate the pointer itself as extra space (and potentially extra padding because of it), whereas in a flexible array member sizeof doesn't count the flexible array member. Your first design is overall much more cumbersome than the flexible version, but other than that well-defined.
The reason why people malloc twice when using a struct with pointers could be that they aren't aware of flexible array members or are using C90, or alternatively that the code isn't performance-critical and they just don't care about the overhead caused by fragmented allocation.
If you use the pointer method and malloc only once, there is one extra thing you need to take care of in the calculation: alignment.
Let's add one extra field to the structure:
struct Vector {
size_t size;
uint32_t extra;
double *data;
};
Let's assume that we are on a system where each field is 4 bytes, there is no trailing padding in the struct, and its total size is 12 bytes. Let's also assume that double is 8 bytes and requires 8-byte alignment.
Now there is a problem: the expression (char*)newVector + sizeof*newVector no longer gives an address that is divisible by 8. There needs to be 4 bytes of manual padding between the structure and the data. This complicates both the malloc size calculation and the data pointer offset calculation.
So the main reason you see the single-malloc pointer version less often is that it is harder to get right. With a pointer and 2 mallocs, or with a flexible array member, the compiler and the allocator take care of the necessary alignment and padding so you don't have to.
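To make the comparison concrete, here is a minimal sketch of both single-malloc layouts. The names VectorFam, VectorPtr, round_up, vector_fam_new and vector_ptr_new are invented for this example, and it assumes a C11 compiler for _Alignof. Overflow of the size computation is not checked.

```c
#include <stddef.h>
#include <stdlib.h>

/* Variant A: flexible array member (C99). The compiler places data
   at a suitably aligned offset after size. */
struct VectorFam {
    size_t size;
    double data[];
};

struct VectorFam *vector_fam_new(size_t len)
{
    struct VectorFam *v = malloc(sizeof *v + len * sizeof v->data[0]);
    if (v != NULL)
        v->size = len;
    return v;
}

/* Variant B: pointer member, still one malloc. The offset of the
   payload has to be rounded up by hand. */
struct VectorPtr {
    size_t size;
    double *data;
};

/* Round n up to the next multiple of align (align must be nonzero). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

struct VectorPtr *vector_ptr_new(size_t len)
{
    /* First byte after the struct that is correctly aligned for double. */
    size_t offset = round_up(sizeof(struct VectorPtr), _Alignof(double));
    struct VectorPtr *v = malloc(offset + len * sizeof(double));
    if (v == NULL)
        return NULL;
    v->size = len;
    v->data = (double *)((char *)v + offset);
    return v;
}
```

In both variants a single free(v) releases everything. Variant B must never free v->data on its own, since that pointer was not returned by malloc.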

How does C's realloc differentiate between an array of ints and a single int?

Since arrays are just contiguous data of the same type, and you don't need to explicitly put [] somewhere (e.g. you can write int *p1 = malloc(sizeof(int) * 4);), how is it that when you realloc(p1, ...), it knows to move (if it has to move) exactly 4 ints' worth of space, even if potentially there are other ints in memory?
To clarify the question: If you allocate an array in this way, and also just a single, separate int, does that mean that these 4+1 total ints are never contiguous in memory, or is this "it's an array" information in the memory block somehow (e.g. they have some sort of delimiter?), or does the compiler infer and remember that from the malloc parameter? Or something else?
Basically, how does it ensure it moves only and exactly those 4, even when there are other blocks of memory of the same size that might be also contiguous?
realloc is defined only when passed a pointer to memory allocated by a member of the malloc family of routines (or a null pointer). These routines keep records of the memory they have allocated. When you call realloc, it uses these records to know how long the allocated block is.
Often, the primary record for a block of memory is put into the bytes just before that block, so all realloc has to do is take the pointer you give it, subtract a known number of bytes from it, and look at the data at that new address, where it will find information about the size of the allocated block. However, other methods are possible too.
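The "record just before the block" idea can be imitated with a toy wrapper around malloc. This is only an illustration of the bookkeeping scheme described above, not how any particular C library implements realloc; the toy_ names are made up, and max_align_t assumes C11.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Header stored immediately before each block handed to the caller.
   The union pads the header so the payload stays maximally aligned. */
typedef union {
    size_t size;
    max_align_t align;
} toy_header;

void *toy_malloc(size_t size)
{
    toy_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;   /* remember how many bytes the caller asked for */
    return h + 1;     /* the caller sees only the bytes after the header */
}

/* Step back from the user pointer to the header and read the size. */
size_t toy_size(const void *p)
{
    return ((const toy_header *)p - 1)->size;
}

void toy_free(void *p)
{
    if (p != NULL)
        free((toy_header *)p - 1);
}

/* Always moves the block; copies exactly as many bytes as were recorded
   (or fewer, when shrinking). */
void *toy_realloc(void *p, size_t new_size)
{
    if (p == NULL)
        return toy_malloc(new_size);
    void *q = toy_malloc(new_size);
    if (q == NULL)
        return NULL;   /* the old block is left untouched */
    size_t old_size = toy_size(p);
    memcpy(q, p, old_size < new_size ? old_size : new_size);
    toy_free(p);
    return q;
}
```

With this scheme, toy_realloc never needs to know that the block holds ints. It only needs the byte count stored in the header, which is why neighbouring ints in memory are irrelevant.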

c malloc array of struct

So far, I have dealt a bit with pointers and structs, but I'm not sure how to allocate an array of a structure at runtime - see below.
N.B. "user_size" is initialized at runtime.
typedef struct _COORDS
{
double x;
double y;
double area;
double circumference;
int index;
wchar_t name[16];
} COORDS, *PCOORDS;
PCOORDS pCoords = (PCOORDS)malloc(sizeof(COORDS)* user_size);
// NULL ptr check omitted
After that, can I just access pCoords[0] to pCoords[user_size-1] as with an ordinary array of ints?
More to the point: I don't understand how the compiler superimposes the layout of the structure on the alloc'ed memory? Does it even have to or am I overthinking this?
The compiler does not super-impose the structure on the memory -- you tell it to do so!
An array of structures is accessed by multiplying the index of one element by its total size. pCoords[3], for example, is "at" byte address (char*)pCoords + 3*sizeof(COORDS) in memory.
A structure member is accessed by its offset (which is calculated from the sizes of the members before it, taking padding into account). So member x is at offset 0 from the start of its container, which is pCoords plus sizeof(COORDS) times the array element index; and y is sizeof(x) after that.
Since you tell the compiler that (1) you want a contiguous block of memory with a size for user_size times the size of a single COORDS, and (2) then access this through pCoords[2].y, all it has to do is multiply and add, and then read the value (literally) in that memory address. Since the type of y is double, it reads and interprets the raw bytes as a double. And usually, it gets it right.
The only problem that can arise is when you have multiple pointers to that same area of memory. That could mean that the raw bytes "at" an address may need interpreting as different types (for instance, when one pointer tells it to expect an int and another a double).
With the proviso that the valid range is actually 0..user_size - 1, your code is fine.
You are probably overthinking this. The compiler does not "superimpose" anything on the malloc'ed memory - that is just a bunch of bytes.
However, pointers are typed in C, and the type of the pointer determines how the memory is interpreted when the pointer is dereferenced or used in pointer arithmetic. The compiler knows the memory layout of the struct. Each field has a defined size and an offset, and the overall size of the struct is known, too.
In your case, the expression pCoords[i].area = 42.0 is equivalent to
char *pByte = (char*)pCoords + sizeof(COORDS) * i + offsetof(COORDS, area);
double *pDouble = (double*)pByte;
*pDouble = 42.0;
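The equivalence can be checked with a small sketch like the one below. The helper names coords_new and set_area_by_offset are invented for this example; the struct is copied from the question.

```c
#include <stddef.h>   /* offsetof, size_t, wchar_t */
#include <stdlib.h>

typedef struct _COORDS
{
    double x;
    double y;
    double area;
    double circumference;
    int index;
    wchar_t name[16];
} COORDS, *PCOORDS;

/* Hypothetical helper: allocate an array of user_size structures. */
PCOORDS coords_new(size_t user_size)
{
    return malloc(sizeof(COORDS) * user_size);
}

/* Does by hand what pCoords[i].area = value asks the compiler to do:
   scale the index by the struct size, add the member offset, then
   store through a pointer of the member's type. */
void set_area_by_offset(PCOORDS pCoords, size_t i, double value)
{
    char *pByte = (char *)pCoords + sizeof(COORDS) * i + offsetof(COORDS, area);
    double *pDouble = (double *)pByte;
    *pDouble = value;
}
```

After set_area_by_offset(pCoords, 2, 42.0), reading pCoords[2].area gives back the same value, because both forms address the same bytes.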

What's the difference between these array declarations in C?

People seem to say how malloc is so great when using arrays and you can use it in cases when you don't know how many elements an array has at compile time(?). Well, can't you do that without malloc? For example, if we knew we had a string that had max length 10 doesn't the following do close enough to the same thing?... Besides being able to free the memory that is.
char name[sizeof(char)*10];
and
char *name = malloc(sizeof(char)*10);
The first creates an array of chars on the stack. The length of the array will be sizeof(char)*10, but seeing as sizeof(char) is defined by the standard to be 1, you could just write char name[10];
If you want an array, big enough to store 10 ints (defined per standard to be at least 2 bytes in size, but most commonly implemented as 4 bytes big), int my_array[10] works, too. The compiler can work out how much memory will be required anyways, no need to write something like int foo[10*sizeof(int)]. In fact, the latter will be unpredictable: depending on sizeof(int), the array will store at least 20 ints, but is likely to be big enough to store 40.
Anyway, the latter snippet calls a function, malloc, which will attempt to allocate enough memory to store 10 chars on the heap. The memory is not initialized, so it'll contain junk.
Memory on the heap is slightly slower, and requires more attention from you, who is writing the code: you have to free it explicitly.
Again: char is guaranteed to be size 1, so char *name = malloc(10); will do here, too. However, when working with heap memory, I (and I'm not alone in this) prefer to allocate the memory like so: some_ptr = malloc(10*sizeof *some_ptr); Using *some_ptr is like saying 10 times the size of whatever type this pointer points to. If you happen to change the type later on, you don't have to refactor all malloc calls.
General rule of thumb, to answer your question "can you do without malloc", is that you don't use malloc, unless you have to.
Stack memory is faster, and easier to use, but it is less abundant. This site was named after a well-known issue you can run into when you've pushed too much onto the stack: it overflows.
When you run your program, the system will allocate a chunk of memory that you can use freely. This isn't much, but plenty for simple computations and calling functions. Once you run out, you'll have to resort to allocating memory from the heap.
But in this case, an array of 10 chars: use the stack.
Other things to consider:
An array is a contiguous block of memory
A pointer doesn't know/can't tell you how big a block of memory was allocated (sizeof(an_array)/sizeof(type) vs sizeof(a_pointer))
An array's declaration does not require the use of sizeof. The compiler works out the size for you: <type> my_var[10] will reserve enough memory to hold 10 elements of the given type.
An array decays into a pointer, most of the time, but that doesn't make them the same thing
Pointers are fun, if you know what you're doing, but once you start adding functions, and start passing pointers to pointers to pointers, or a pointer to a pointer to a struct that has members that are pointers... your code won't be as jolly to maintain. Starting off with an array, I find, makes it easier to come to grips with the code, as it gives you a starting point.
This answer only really applies to the snippets you gave. If you're dealing with an array that grows over time, then realloc is to be preferred. If you're declaring this array in a recursive function that runs deep, then again, malloc might be the safer option, too.
Check this link on differences between array and pointers
Also take a look at this question + answer. It explains why a pointer can't give you the exact size of the block of memory you're working on, and why an array can.
Consider that an argument in favour of arrays wherever possible
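The sizeof(an_array)/sizeof(type) versus sizeof(a_pointer) point from the list above can be seen in a few lines. ARRAY_LEN and the two function names are made up for this sketch.

```c
#include <stddef.h>

/* Element count of a true array. Applying this to a pointer compiles
   but gives a meaningless result. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* name is an array here, so sizeof sees all 10 elements. */
size_t len_of_local_array(void)
{
    char name[10] = "";
    return ARRAY_LEN(name);
}

/* The argument has decayed to a pointer, so sizeof reports the size of
   the pointer itself, whatever the size of the block it points into. */
size_t size_seen_through_pointer(const char *name)
{
    return sizeof name;
}
```

Passing a 10-element array to size_seen_through_pointer still yields sizeof(char *), which is why functions that take a pointer usually take the length as a separate parameter.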
char name[sizeof(char)*10]; // better to use: char name[10];
Declares an array of sizeof(char)*10 char elements whose size is fixed at compile time. The sizeof operator is unnecessary because if you declare an array of N elements of type T, the size allocated will already be sizeof(T)*N; you don't need to do the math. It is stack allocated and no free is needed. In general, you use char name[10] when you already know the size of the object you need (the length of the string in this case).
char *name = malloc(sizeof(char)*10);
Allocates 10 bytes of memory in the heap. Allocation is done at run time, you need to free the result.
char name[sizeof(char)*10];
The first one is allocated on the stack, once it goes out of scope memory gets automatically freed. You can't change the size of the first one.
char *name = malloc(sizeof(char)*10);
The second one is allocated on the heap and should be freed with free. It will stick around otherwise for the lifetime of your application. You can reallocate memory for the second one if you need.
The storage duration is different:
An array created with char name[size] exists for the entire duration of program execution (if it is defined at file scope or with static) or for the execution of the block it is defined in (otherwise). These are called static storage duration and automatic storage duration.
An array created with malloc(size) exists for just as long as you specify, from the time you call malloc until the time you call free. Thus, it can be made to use space only while you need it, unlike static storage duration (which may be too long) or automatic storage duration (which may be too short).
The amount of space available is different:
An array created with char name[size] inside a function uses the stack in typical C implementations, and the stack size is usually limited to a few megabytes (more if you make special provisions when building the program, typically less in kernel software and embedded systems).
An array created with malloc may use gigabytes of space in typical modern systems.
Support for dynamic sizes is different:
An array created with char name[size] with static storage duration must have a size specified at compile time. An array created with char name[size] with automatic storage duration may have a variable length if the C implementation supports it (this was mandatory in C 1999 but is optional in C 2011).
An array created with malloc may have a size computed at run-time.
malloc offers more flexibility:
Using char name[size] always creates an array with the given name, either when the program starts (static storage duration) or when execution reaches the block or definition (automatic).
malloc can be used at run-time to create any number of arrays (or other objects), by using arrays of pointers or linked lists or trees or other data structures to create a multitude of pointers to objects created with malloc. Thus, if your program needs a thousand separate objects, you can create an array of a thousand pointers and use a loop to allocate space for each of them. In contrast, it would be cumbersome to write a thousand char name[size] definitions.
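A short sketch of the storage-duration and run-time-size points: the array below is sized by a parameter and outlives the function that creates it, which a local char name[size] could not do. The function name make_squares is invented for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns a heap array of n ints holding 0, 1, 4, 9, ...
   The array has allocated storage duration: it exists until the
   caller passes it to free. Returns NULL if allocation fails. */
int *make_squares(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        a[i] = (int)(i * i);
    return a;   /* still valid after this function returns */
}
```

Returning the address of an automatic array from the same function would instead hand the caller a pointer to storage that no longer exists.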
First things first: do not write
char name[sizeof(char)*10];
You do not need the sizeof as part of the array declaration. Just write
char name[10];
This declares an array of 10 elements of type char. Just as
int values[10];
declares an array of 10 elements of type int. The compiler knows how much space to allocate based on the type and number of elements.
If you know you'll never need more than N elements, then yes, you can declare an array of that size and be done with it, but:
You run the risk of internal fragmentation; your maximum number of bytes may be N, but the average number of bytes you need may be much smaller than that. For example, let's say you want to store 1000 strings of max length 255, so you declare an array like
char strs[1000][256];
but it turns out that 900 of those strings are only 20 bytes long; you're wasting a couple of hundred kilobytes of space (see note 1). If you split the difference and stored 1000 pointers, then allocated only as much space as was necessary to store each string, then you'd wind up wasting a lot less memory:
char *strs[1000];
...
strs[i] = strdup("some string"); // strdup calls malloc under the hood
...
Stack space is also limited relative to heap space; you may not be able to declare arbitrarily large arrays (as auto variables, anyway). A request like
long double huge[10000][10000][10000][10000];
will probably cause your code to crash at runtime, because the default stack size isn't large enough to accommodate it (see note 2).
And finally, most situations fall into one of three categories: you have 0 elements, you have exactly 1 element, or you have an unlimited number of elements. Allocating large enough arrays to cover "all possible scenarios" just doesn't work. Been there, done that, got the T-shirt in multiple sizes and colors.
1. Yes, we live in the future where we have gigabytes of address space available, so wasting a couple of hundred KB doesn't seem like a big deal. The point is still valid, you're wasting space that you don't have to.
2. You could declare very large arrays at file scope or with the static keyword; this will allocate the array in a different memory segment (neither stack nor heap). The problem is that you only have that single instance of the array; if your function is meant to be re-entrant, this won't work.
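The pointer-per-string approach can be sketched without relying on strdup, which comes from POSIX (and C23) rather than older ISO C. copy_string and bytes_exact are hypothetical helpers written for this example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup: allocates exactly strlen(s) + 1
   bytes and copies the string, terminator included. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Bytes used when each string gets only the room it needs, plus one
   pointer slot per string. Allocator overhead is not counted. */
size_t bytes_exact(const char *const *strs, size_t count)
{
    size_t total = count * sizeof(char *);
    for (size_t i = 0; i < count; i++)
        total += strlen(strs[i]) + 1;
    return total;
}
```

Comparing bytes_exact(strs, 1000) with sizeof(char[1000][256]) gives a rough idea of the saving for a given data set. Each pointer obtained from copy_string has to be passed to free individually.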

Some questions about memory/malloc

How are variables really stored in memory? I ask this because say you malloc a segment of memory and assign it to a pointer e.g.
int *p = malloc(10 * sizeof(int));
and then run a for loop to assign integers through p - this seems different to declaring an int variable and assigning an integer to it like:
int x = 10;
Because it's a more explicit declaration that you want an int stored in memory, whereas in malloc it's just a chunk of memory you're traversing through pointer arithmetic.
Am I missing something here? Much thanks.
When you need an array of data, for example when you receive numbers from the user but don't know how many in advance, you can't use a fixed number of integers; you need a dynamic way to create memory for those integers. malloc and its friends let you do that. Among other things:
malloc lets you create memory dynamically, in the size you need right now.
Memory obtained from malloc is not freed when you exit the scope.
Whether you use malloc for, let's say, an array of 10 items or create an array of 10 items on the stack, there is no difference in the sense of an "explicit declaration that you want an int stored in memory"; the only differences are the ones I've written here, and a few more.
here is an article on the differences between heap and stack
Here are the pros and cons of each:
Stack
very fast access
don't have to explicitly de-allocate variables
space is managed efficiently by CPU, memory will not become fragmented
local variables only
limit on stack size (OS-dependent)
variables cannot be resized
Heap
variables can be accessed globally
no limit on memory size
(relatively) slower access
no guaranteed efficient use of space, memory may become fragmented over time as blocks of memory are allocated, then freed
you must manage memory (you're in charge of allocating and freeing variables)
variables can be resized using realloc()
Variables declared in C like int x = 10 are located on the stack, which is accessible only until the function it's declared in returns (if it's declared outside functions, we call it global, and it's available through the whole runtime of the application).
Memory allocated with malloc and similar functions is located on the heap, and stays accessible until it's either freed explicitly (e.g. by calling free(...)) or the application terminates (which in the case of servers might take weeks/months/years).
Both the stack and the heap are part of memory; the main difference is in the method of allocation. In C, the * and & unary operators can blur the line between the two. For example, given a declaration like int x = 10 you can take the address with int* y = &x, and then assign a value through the pointer with *y = 15, just as you would with a pointer to heap memory.
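A tiny sketch of that last point: writing through a pointer looks the same whether the int lives in a local variable or in a malloc'ed block. Both function names are made up for this example.

```c
#include <stdlib.h>

/* The pointer targets a local (automatic) variable. */
int write_through_pointer_to_local(void)
{
    int x = 10;
    int *y = &x;
    *y = 15;        /* changes x itself */
    return x;
}

/* The pointer targets heap memory; returns -1 if allocation fails. */
int write_through_pointer_to_heap(void)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = 15;        /* same syntax as above */
    int value = *p;
    free(p);        /* unlike x, this memory must be released by hand */
    return value;
}
```

The only visible difference is the lifetime management: x disappears when the function returns, while the heap int exists until free is called.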
Well, when you're doing
int x = 10;
the compiler does whatever needs to be done. But when you're using malloc(), you're in charge of maintaining that memory block. You can use it as you wish, but this also gives you the burden of cleaning it up properly.
1. If you know the array size, using int array[10] is faster and safer than int *array = malloc(10*sizeof(int)). You need malloc only if you don't know the size before run time.
2. With the declaration int x = 10, x is stored in stack memory. If you declare int *p = malloc(10*sizeof(int)); then p itself is stored in stack memory, but the memory p points to is on the heap.
3. When you use int *p = malloc(10*sizeof(int)); the function allocates a block of memory that merely has the right size. In fact you can store any type you want in this memory, though doing so is not encouraged.
4. If you use int x = 10, the memory is freed automatically as soon as the variable goes out of scope. If you use malloc, you have to free the memory yourself, or you get a memory leak!
malloc assigns a memory block and returns a pointer to it; its lifetime is that of dynamically allocated memory. You can store objects of the pointed-to type in it.
An int
int x = 10;
has automatic storage and is an lvalue, not a pointer.
So you don't have to access it by its address, as you would a value that a pointer points to.
You can access and assign its value by its identifier name.
It is also cleaned up when you leave its scope.
To summarize again the answers to your questions from the deleted answer: malloc() returns a raw chunk of memory, which isn't reserved for any type. Even if you assign it to an int pointer, the chunk isn't of type int until you dereference it as such (meaning you use it with data of a specific type). The size of the chunk you get is given, in bytes, by the argument passed to the function. sizeof yields the size of the type.
So in this case you get a chunk that has room for 10 integers. But if you haven't used it through the int * pointer, you could also assign the address to a pointer of type char * and use the block as memory for 10 * sizeof(int) char variables (40 where int is 4 bytes). But the first time you "put something in there", it is then reserved for that type.
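As a small illustration of the "chunk of bytes" view, the sketch below fills a malloc'ed block through an int pointer and separately computes how many chars the same number of bytes would hold. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stores 1..10 in a malloc'ed block through an int pointer and returns
   their sum, or -1 if the allocation fails. */
int sum_first_ten(void)
{
    int *p = malloc(10 * sizeof *p);
    if (p == NULL)
        return -1;
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        p[i] = i + 1;   /* identical to *(p + i) = i + 1 */
        sum += p[i];
    }
    free(p);
    return sum;
}

/* A block of 10 * sizeof(int) bytes has room for this many chars,
   since sizeof(char) is 1 by definition. */
size_t chars_that_fit_in_ten_ints(void)
{
    return 10 * sizeof(int) / sizeof(char);
}
```

On a platform where int is 4 bytes, chars_that_fit_in_ten_ints() evaluates to 40, matching the figure mentioned above.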
