What is the difference between declaring an array "dynamically",
[ie. using realloc() or malloc(), etc... ]
vs
declaring an array within main() with Global scope?,
eg.
int main()
{
int array[10];
return 0;
}
I am learning, and at the moment it feels that there is not much differnce between
declaring a variable (array, whatever) -with Global scope,
when compared to a
dynamically allocated variable (array, whatever) -AND never calling free() on it AND allowing it to be 'destoryed' when the program ends'
What are the consequences of either option?
EDIT
Thank you for your responses.
Global scope should have been 'local scope' -local to main()
When you declare an array like int arr[10] in a function, the space for the array is allocated on the stack. The memory will be freed when your function exits.
When you declare an array or any other data structure using malloc() or realloc(), you allocated the space on the heap and the memory will only be freed afer the program exits. So when the program is running, you are responsible for freeing it using free() after you no longer want to use it. If you don't free it and make your array pointer point to something else, you will create a memory leak. However, your computer will always be able to retrieve all the program's used memory after the program ends because of virtual memory.
As kaylum said in comment below your question, the array in your second example does not have global scope. Its scope is limited to main(), and it is inaccessible in other scopes unless main() explicitly makes it available (e.g. passes it by argument to another function).
Dynamic memory allocation means that the programmer explicitly allocates memory when needed, and explicitly releases it when no longer needed. Because of that, the amount of memory allocated can be determined at run time (e.g. calculated from user input). Also, if the programmer forgets to release the memory, or reallocates it inappropriately, memory can be leaked (still allocated by the program, but not accessible by the program). For example;
/* within a function */
char *p = malloc(100);
p = malloc(200);
free(p);
leaks 100 bytes, every time this code is executed, because the result of the first malloc() call is never released, and it is then inaccessible to the program because its value is not stored anywhere.
Your second example is actually an array of automatic storage duration. As far as your program is concerned, it only exists until the end of the scope in which it is created. In your case, as main() returns, the array will cease to exist.
An example of an array with global scope is
int array[10];
void f() {array[0] = 42;}
int main()
{
array[0] = 10;
f();
/* array[0] will be 42 here */
}
The difference is that this array exists and is accessible to every function that has visibility of the declaration, within the same compilation unit.
One other important difference is that global arrays are (usually) zero initialised - a global array of int will have all elements zero. A dynamically allocated array will not have elements initialised (unless created with calloc(), which does initialise to zero). Similarly, an automatic array will not have elements initialised. It is undefined behaviour to access the value of something (including an array element) that is uninitialised.
So
#include <stdio.h>
int array[10];
int main()
{
int *array2;
int array3[10];
array2 = malloc(10*sizeof(*array2));
printf("%d\n", array[0]); /* okay - will print 0 */
printf("%d\n", array2[0]); /* undefined behaviour. array2[0] is uninitialised */
printf("%d\n", array3[0]); /* undefined behaviour. array3[0] uninitialised */
return 0;
}
Obviously the way to avoid undefined behaviour is to initialise array elements to something valid before trying to access their value (e.g. printing them out, in the example above).
Related
So i want to return an array of a size n (variable) which my function has as input. I know that in order to return arrays in C I have to define them static, but the problem is that n is a variable and thus I get an error. I thought of actually using malloc/calloc but then I won't be able to free them after the function returns the array. Please take note that I'm not allowed to change anything on main(). Are there any other alternatives which I could use? Thanks in advance.
float *Arr( int *a , int n ){
static float b[ n ];
return b
}
Got to point out that the function will only be called Once,I saw the solution you posted but i noticed you aren't freeing the allocated memory,is it not of much importance when the malloc is called inside a function?
The important thing to notice here is that this syntax:
float arr[n];
Allocates an array on the stack of the current function. In other words, that array is a local variable. Any local variable becomes invalid after the function returns, and therefore returning the array directly is undefined behavior. It will most likely cause a crash when trying to access the array from outside the function, if not anything worse.
In addition to that, declaring a variable-length array as static is invalid in any case.
If you want to write a function which creates and returns any kind of array (dynamically sized or not), the only option you have is to use dynamic allocation through malloc() and then return a pointer to the array (technically there's also alloca() to make dynamic stack allocations, but I would avoid it as it can easily break your program if the allocation is too large).
Here's an example of correct code:
float *create_array(size_t n_elements){
float *arr = malloc(sizeof(float) * n_elements);
if (arr == NULL) {
// Memory could not be allocated, handle the error appropriately.
}
return arr;
}
In this case, malloc() is reserving memory outside of the local stack of the function, in the heap. The result is a pointer that can be freely returned and passed around without any problem, since that area of memory keeps being valid after the function returns (until it is released). When you're done working with the data, you can release the allocated memory by calling free():
float *arr = create_array(100);
// ...
free(arr);
If you don't have a way to release the memory through free() after using malloc(), that's a problem in the long run, but in general, it is not a strict requirement: if your array is always needed, from its creation until the exit of the program, then there's no need to explicitly free() it, since memory is automatically released when the program terminates.
If your function needs to be called more than once or needs to create significantly sized arrays that are only useful in part of the program and should therefore be discarded when no longer in use, then I'm afraid there's no good way of doing it. You should use free() in that case.
To answer your question precisely:
Please take note that I'm not allowed to change anything on main(). Are there any other alternatives which I could use?
No, there are no other better alternatives. The only correct approach here is to dynamically allocate the array through malloc(). The fact that you cannot free it afterwards is a different kind of problem.
So i want to return an array of a size n(variable) which my function
has as input,
You can't, because C functions cannot return arrays at all. They can, and some do, return pointers, however, as your function is declared to do. Such a pointer may point to an element of an array.
i know that in order to return arrays in c i have to
define them static,
As long as I am being pedantic, the problem is to do with the lifetime of the object to which the returned pointer points. If it is an element of an automatically-allocated array, then it, along with the rest of the array, ceases to exist when the function returns. The caller must not try to dereference such a pointer.
The two other alternatives are
static allocation, which you get by declaring the variable static or by declaring it at file scope, and
dynamic allocation, which you get by reserving memory via malloc(), calloc(), or a related function.
Statically allocated objects exist for the entire lifetime of the program, and dynamically allocated ones exist until deallocated.
but problem is that n is a variable and thus i get
an error.
Yes, because variable-length arrays must be automatically allocated. Static objects exist for the whole run of the program, so the compiler needs to reserve space for them at compile time.
I thought of actually using malloc/calloc but then i won't be
able to free them after the function returns the array.
That's correct, but dynamic allocation is still probably the best solution. It is not unreasonable for a called function to return a pointer to an allocated object, thus putting the responsibility on its caller to free that object. Ordinarily, that would be a well-documented characteristic of the function, so that its callers know that such responsibility comes with calling the function.
Moreover, although it's a bit untidy, if your function is to be called only once then it may be acceptable to just allow the program to terminate without freeing the array. The host operating system can generally be relied upon to clean up the mess.
Please take
note that im not allowed to change anything on main(),are there any
other alternatives which i could use?
If you have or can impose a bound on the maximum value of n then you can declare a static array of that maximum size or longer, and return a pointer to that. The caller is receiving a pointer, remember, not an array, so it can't tell how long the pointed-to array actually is. All it knows is that the function promises n accessible elements.
Note well that there is a crucial difference between the dynamic allocation and static allocation alternatives: in the latter case, the function returns a pointer to the same array on every call. This is not inherently wrong, but it can be problematic. If implemented, it is a characteristic of the function that should be both intentional and well-documented.
If want an array of n floats where n is dynamic, you can either create a
variadic-length array (VLA):
void some_function(...)
{
//...
float b[ n ]; //allocate b on the stack
//...
}
in which case there would be no function call for the allocation, or you can allocate it dynamically, e.g., with malloc or calloc, and then free it after you're done with it.
float *b = malloc(sizeof(*b)*n);
A dynamic (malloc/calloc) allocation may be wrapped in a function that returns a pointer to the allocated memory (the wrapper may do some initializations on the allocated memory after the memory has been successfully allocated). A VLA allocation may not, because a VLA ends its lifetime at the end of its nearest enclosing block (C11 Standard - 6.2.4 Storage durations of objects(p7)).
If you do end up wrapping a malloc/calloc call in a "constructor" function like your float *Arr(void), then you obviously should not free the to-be-returned allocated memory inside Arr–Arr's caller would be responsible for freeing the result (unless it passed the responsibility over to some other part of the program):
float *Arr( int n, ...
/*some params to maybe initialize the array with ?*/ )
{
float *r; if (!(r=malloc(sizeof(*r)*n)) return NULL;
//...
//do some initializations on r
//...
return r; //the caller should free it
}
you could use malloc to reserve memory for your n sized array
Like this:
#include <stdlib.h>
#include <stdio.h>
float * arr(int * a, int n ) {
float *fp = malloc ( (size_t) sizeof(float)*n);
if (!fp) printf("Oh no! Run out of memory\n");
return fp;
}
int main () {
int i;
float * fpp = arr(&i,200);
printf("the float array is located at %p in memory\n", fpp);
return(0);
}
It seems like what you want to do is:
have a function that provides (space for) an array with a variable number of elements,
that the caller is not responsible for freeing,
that there only needs to be one instance of at a time.
In this case, instead of attempting to define a static array, you can use a static pointer to manage memory allocated and freed with realloc as needed to adjust the size, as shown in the code below. This will leave one instance in existence at all times after the first call, but so would a static array.
This might not be a good design (it depends on circumstances not stated in the question), but it seems to match what was requested.
#include <stdio.h>
#include <stdlib.h>
float *Arr(int *a , int n)
{
// Keep a static pointer to memory, with no memory allocated initially.
static float *b = NULL;
/* When we want n elements, use realloc to release the old memory, if any,
and allocate new memory.
*/
float *t = realloc(b, n * sizeof *t);
// Fail if the memory allocation failed.
if (!t)
{
fprintf(stderr, "Error, failed to allocate memory in Arr.\n");
exit(EXIT_FAILURE);
}
// Return the new memory.
return b;
}
This question already has answers here:
Difference between static memory allocation and dynamic memory allocation
(7 answers)
Closed 5 years ago.
I was wondering if someone could explain the differences between the memory allocation for ai and *pai
int ai[10];
int *pai = (int * ) calloc (10, sizeof(int));
I understand the second one is dynamically allocated but im struggling to explain why.
Let's see what is being specified in standard (difference wise)
From 7.22.3.1 (Under Memory management functions)
... The lifetime of an allocated object extends from the allocation
until the deallocation.
So yes, this is for dynamically allocated memory. Their lifetime is different from that of local variables. By calling free they are deallocated. Until then they will be alive. Doesn't depend on the life time of the scope on which they are created.
The first one is having automatic storage duration. This is the primary difference. So in the functions scope where it is declared, when it ends then it's lifetime will be over.
Also some people say that there is a heap and stack - but (un)fortunately C standard doesn't mention it. It is completely implementation of the features expected by the C standard. The implementation can be anything. The differences presented is least bothered about those kind of stuff.
As a conceptual redpill (taken from movie Matrix) pai is of automatic storage duration but the address of the memory it contains is not. The variable pai will be lost when the function where it is defined is executed. But the memory it points to, doesn't.
Well why is it called dynamic allocation?
Know one thing - when in programming we say dynamic in the context of language - it means we are doing something in runtime. Same here, we are allocating some memory when in run time by calling functions like malloc,calloc etc. That's why dynamic allocation.
In the first line, you create a variable of an array type, but the symbol ai is a constant pointer to this variable.
in the second line, you create a pointer type variable. then you allocate an array dynamically with calloc() and you puts it's address in the pointer.
The array ai is allocated on the stack, it implicitly goes out of scope, when the end of the function is reached. The pointer pai points to a memory location, which can be an array or a single element of the type pointed to, the memory is allocated on the heap and must be freed later. The second can be passed back to the function-caller on the end of the function and can even be resized with realloc (realloc does not clear the new memory like calloc does, malloc is like calloc without zeroing out the new memory). The first is for fast array computation and should be in the cache most of the time. The second is for unknown lenght of arrays, when the function is called. When the size is known, many programmers tend to define an array in the caller and pass it to the function, which modifies it. The array is implicitly converted to a pointer when calling the function.
Some library implementations store a pointer to an array in the global section, which can be reallocated. Or they have a fixed length array in global space. These variables are recommended to be thread_local. The user does not have to care about the memorymanagement of the variable of the other library.
library.h
const char* getResourceString(int id);
library.c
thread_local char* string_buf = NULL;
const char* getResourceString(int id) {
int i = getResourceSize(id);
string_buf = realloc(string_buf, i);
// fill the memory
return string_buffer;
};
These are quite different operations:
int ai[10];
declares an array object of 10 ints. If it is declared inside a block, it will have automatic storage duration, meaning that it will vanish at block end (both identifier and data). If it is declared outside any block (at file level) it will have static storage duration and will exist throughout all program.
int *pai = calloc (10, sizeof(int)); // DON'T CAST MALLOC IN C
declares a pointer to an allocated zone of memory that can contains ten integers. You can use pai as a pointer to the first element of an array and do pointer arithmetics on it. But sizeof(pai) is sizeof(int *). The array will have dynamic storage duration meaning that its life will end:
if the allocated block of memory is freed
if it is reused to store other objects
double * pd = pai;
for (int i=1; i<5; i++) { // assuming sizeof(double) == 2 * sizeof(int) here
pd[i] = i; // the allocated memory now contains 5 double
}
So in both case you can use the identifier as pointing to an array of 10 integers, but first one is an integer array object while second one is just a pointer to a block of dynamic memory (memory with no declared type that can take the type of an object that will be copied/created there) .
Gerenally speaking, automatically allocated objects will be on the stack, while dynamically allocated objects will be on the heap. Although this distinction is implementation (not standard) dependent, stack and heap are the most commonly used way to manage memory in C programs. They are basically two distinct regions of memory, the first is dedicated to automatic allocations and the second is dedicated to dynamic allocations. So when you call a function (say, the main function) all the objects declared in the scope of this function will be stacked (automatically allocated in the stack). If some dynamic allocation happens in this function, the memory will be allocated in the heap so that all pointers to this area will be pointing to objects outside the stack. When your function returns, all objects in the stack are also automatically unstacked and virtually don't exist anymore. But all objects in the heap will exist until you deallocate them (or they will be forcefully deallocated by the OS when the program ends). Arrays are structures that can be allocated automatically or dynamically. See this example:
int * automaticFactor() //wrong, see below
{
int x[10];
return &x[0];
}
int * dynamicFactor()
{
int * y = (int *) malloc(sizeof(int) * 10);
return &y[0];
}
int main()
{
//this will not work because &x[0] points to the stack
//and that area will be unstacked after the function return
int * n = automaticFactor();
//this will work because &y[0] points to the heap
//and that area will be in the heap until manual deallocation
int * m = dynamicFactor();
return 0;
}
Note that the pointers themselves are in the stack. What is in the heap is the area they are pointing to. So when you declare a pointer inside a function (such as the y of the example), it will also be unstacked at the end of the function. But since its value (i.e. the address of the allocated area) was returned to a pointer outside the function (i.e. to m), you will not lose track of the area allocated in the heap by the function.
If a variable is declared within a loop, does the previous declarations become garbage? For example, in the following:
loop{
int array[10];
array[i]=......
}
array is declared for each loop iteration. When it is newly declared, is the new memory location that array allocates same with the older location?. If it is not, does the older declarations become garbage, because the allocated area is not freed? Finally, how can it be freed without exiting the loop if the array is static like the above example?
You aren't actually allocating anything. This goes on the stack, and the size of the stack frame is calculated by your compliler at compile time. The array will reuse the same amout of stack space each iteration. The int array[10] does effectively nothing at run time.
There's a big difference by doing this:
for (...) {
int a[10];
a[0] = x;
}
and doing this:
for (...) {
int* a = (int*)malloc(sizeof(int)*10);
a[0] = x;
free(a);
}
The first "allocation" is fixed in size, and will cost you nothing. The second can be of variable size and will be a heap allocated array which you will need to manually free. C has no concept of garbage collection, so nothing really becomes garbage. But you are required to free whatever you allocate using the malloc function. If you never use that function you never need to free anything. The compiler will take care of that for you.
This is an automatic variable that the compiler handles - automatically.
You only have to take care of storage you allocate yourself using new or malloc. The rest is handled for you.
The array comes into scope each time you enter the loop and is destroyed again at the end of each loop. The compiler is very likely to reuse the same space each time, but that is not defined by the language. There will be no garbage either way.
You can assume that, for every iteration of the loop, a new array is created and and its is destroyed at the end of iteration. It implies content of newly created array is undefined .(may be garbage - more chances that it contain same data since it might occupy same place in the stack)
However, internally their wont be any allocation or deallocation for the int array[10] as pointed by Dervall
Why can I return from a function an array setup by malloc:
int *dog = (int*)malloc(n * sizeof(int));
but not an array setup by
int cat[3] = {0,0,0};
The "cat[ ]" array is returned with a Warning.
Thanks all for your help
This is a question of scope.
int cat[3]; // declares a local variable cat
Local variables versus malloc'd memory
Local variables exist on the stack. When this function returns, these local variables will be destroyed. At that point, the addresses used to store your array are recycled, so you cannot guarantee anything about their contents.
If you call malloc, you will be allocating from the heap, so the memory will persist beyond the life of your function.
If the function is supposed to return a pointer (in this case, a pointer-to-int which is the first address of the integer array), that pointer should point to good memory. Malloc is the way to ensure this.
Avoiding Malloc
You do not have to call malloc inside of your function (although it would be normal and appropriate to do so).
Alternatively, you could pass an address into your function which is supposed to hold these values. Your function would do the work of calculating the values and would fill the memory at the given address, and then it would return.
In fact, this is a common pattern. If you do this, however, you will find that you do not need to return the address, since you already know the address outside of the function you are calling. Because of this, it's more common to return a value which indicates the success or failure of the routine, like an int, than it is to return the address of the relevant data.
This way, the caller of the function can know whether or not the data was successfully populated or if an error occurred.
#include <stdio.h> // include stdio for the printf function
int rainCats (int *cats); // pass a pointer-to-int to function rainCats
int main (int argc, char *argv[]) {
int cats[3]; // cats is the address to the first element
int success; // declare an int to store the success value
success = rainCats(cats); // pass the address to the function
if (success == 0) {
int i;
for (i=0; i<3; i++) {
printf("cat[%d] is %d \r", i, cats[i]);
getchar();
}
}
return 0;
}
int rainCats (int *cats) {
int i;
for (i=0; i<3; i++) { // put a number in each element of the cats array
cats[i] = i;
}
return 0; // return a zero to signify success
}
Why this works
Note that you never did have to call malloc here because cats[3] was declared inside of the main function. The local variables in main will only be destroyed when the program exits. Unless the program is very simple, malloc will be used to create and control the lifespan of a data structure.
Also notice that rainCats is hard-coded to return 0. Nothing happens inside of rainCats which would make it fail, such as attempting to access a file, a network request, or other memory allocations. More complex programs have many reasons for failing, so there is often a good reason for returning a success code.
There are two key parts of memory in a running program: the stack, and the heap. The stack is also referred to as the call stack.
When you make a function call, information about the parameters, where to return, and all the variables defined in the scope of the function are pushed onto the stack. (It used to be the case that C variables could only be defined at the beginning of the function. Mostly because it made life easier for the compiler writers.)
When you return from a function, everything on the stack is popped off and is gone (and soon when you make some more function calls you'll overwrite that memory, so you don't want to be pointing at it!)
Anytime you allocate memory you are allocating if from the heap. That's some other part of memory, maintained by the allocation manager. Once you "reserve" part of it, you are responsible for it, and if you want to stop pointing at it, you're supposed to let the manager know. If you drop the pointer and can't ask to have it released any more, that's a leak.
You're also supposed to only look at the part of memory you said you wanted. Overwriting not just the part you said you wanted, but past (or before) that part of memory is a classic technique for exploits: writing information into part of memory that is holding computer instructions instead of data. Knowledge of how the compiler and the runtime manage things helps experts figure out how to do this. Well designed operating systems prevent them from doing that.
heap:
int *dog = (int*)malloc(n*sizeof(int*));
stack:
int cat[3] = {0,0,0};
Because int cat[3] = {0,0,0}; is declaring an automatic variable that only exists while the function is being called.
There is a special "dispensation" in C for inited automatic arrays of char, so that quoted strings can be returned, but it doesn't generalize to other array types.
cat[] is allocated on the stack of the function you are calling, when that stack is freed that memory is freed (when the function returns the stack should be considered freed).
If what you want to do is populate an array of int's in the calling frame pass in a pointer to an that you control from the calling frame;
void somefunction() {
int cats[3];
findMyCats(cats);
}
void findMyCats(int *cats) {
cats[0] = 0;
cats[1] = 0;
cats[2] = 0;
}
of course this is contrived and I've hardcoded that the array length is 3 but this is what you have to do to get data from an invoked function.
A single value works because it's copied back to the calling frame;
int findACat() {
int cat = 3;
return cat;
}
in findACat 3 is copied from findAtCat to the calling frame since its a known quantity the compiler can do that for you. The data a pointer points to can't be copied because the compiler does not know how much to copy.
When you define a variable like 'cat' the compiler assigns it an address. The association between the name and the address is only valid within the scope of the definition. In the case of auto variables that scope is the function body from the point of definition onwards.
Auto variables are allocated on the stack. The same address on the stack is associated with different variables at different times. When you return an array, what is actually returned is the address of the first element of the array. Unfortunately, after the return, the compiler can and will reuse that storage for completely unrelated purposes. What you'd see at a source code level would be your returned variable mysteriously changing for no apparent reason.
Now, if you really must return an initialized array, you can declare that array as static. A static variable has a permanent rather than a temporary storage allocation. You'll need to keep in mind that the same memory will be used by successive calls to the function, so the results from the previous call may need to be copied somewhere else before making the next call.
Another approach is to pass the array in as an argument and write into it in your function. The calling function then owns the variable, and the issues with stack variables don't arise.
None of this will make much sense unless you carefully study how the stack works. Good luck.
You cannot return an array. You are returning a pointer. This is not the same thing.
You can return a pointer to the memory allocated by malloc() because malloc() has allocated the memory and reserved it for use by your program until you explicitly use free() to deallocate it.
You may not return a pointer to the memory allocated by a local array because as soon as the function ends, the local array no longer exists.
This is a question of object lifetime - not scope or stack or heap. While those terms are related to the lifetime of an object, they aren't equivalent to lifetime, and it's the lifetime of the object that you're returning that's important. For example, a dynamically alloced object has a lifetime that extends from allocation to deallocataion. A local variable's lifetime might end when the scope of the variable ends, but if it's static its lifetime won't end there.
The lifetime of an object that has been allocated with malloc() is until that object has been freed using the free() function. Therefore when you create an object using malloc(), you can legitimately return the pointer to that object as long as you haven't freed it - it will still be alive when the function ends. In fact you should take care to do something with the pointer so it gets remembered somewhere or it will result in a leak.
The lifetime of an automatic variable ends when the scope of the variable ends (so scope is related to lifetime). Therefore, it doesn't make sense to return a pointer to such an object from a function - the pointer will be invalid as soon as the function returns.
Now, if your local variable is static instead of automatic, then its lifetime extends beyond the scope that it's in (therefore scope is not equivalent to lifetime). So if a function has a local static variable, the object will still be alive even when the function has returned, and it would be legitimate to return a pointer to a static array from your function. Though that brings in a whole new set of problems because there's only one instance of that object, so returning it multiple times from the function can cause problems with sharing the data (it basically only works if the data doesn't change after initialization or there are clear rules for when it can and cannot change).
Another example taken from another answer here is regarding string literals - pointers to them can be returned from a function not because of a scoping rule, but because of a rule that says that string literals have a lifetime that extends until the program ends.
1)
For which datatypes must I allocate memory with malloc?
For types like structs, pointers, except basic datatypes, like int
For all types?
2)
Why can I run this code? Why does it not crash? I assumed that I need to allocate memory for the struct first.
#include <stdio.h>
#include <stdlib.h>
typedef unsigned int uint32;
typedef struct
{
int a;
uint32* b;
}
foo;
int main(int argc, char* argv[])
{
foo foo2;
foo2.a = 3;
foo2.b = (uint32*)malloc(sizeof(uint32));
*foo2.b = 123;
}
Wouldn't it be better to use
foo* foo2 = malloc(sizeof(foo));
3)
How is foo.b set? Does is reference random memory or NULL?
#include <stdio.h>
#include <stdlib.h>
typedef unsigned int uint32;
typedef struct
{
int a;
uint32* b;
}
foo;
int main(int argc, char* argv[])
{
foo foo2;
foo2.a = 3;
}
All types in C can be allocated either dynamically, automatically (on the stack) or statically. The issue is not the type, but the lifetime you want - you use malloc when you want an object to exist outside of the scope of the function that created it, or when you don't know in advance how big a thing you need.
Edit to address your numbered questions.
There are no data types you must allocate with malloc. Only if you want a pointer type to point to valid memory must you use the unary & (address-of) operator or malloc() or some related function.
There is nothing wrong with your code - the line:
foo foo2;
Allocates a structure on the stack - then everything works as normal. Structures are no different in this sense than any other variable. It's not better or worse to use automatic variables (stack allocation) or globals or malloc(), they're all different, with different semantics and different reasons to choose them.
In your example in #3, foo2.b's value is undefined. Any automatic variable has an undefined and indeterminate value until you explicitly initialize it.
You must allocate with malloc any memory that you wish to be managed manually, as opposed to automatically. It doesn't matter if what's stored there is an int or a double or a struct or anything; malloc is all about manual memory management.
When you create a variable without malloc, it is stored on the stack and when it falls out of scope, its memory is automatically reclaimed. A variable falls out of scope when the variable is no longer accessible; e.g. when the block or function that the variable was declared in ends.
When you allocate memory with malloc, it is stored on the heap, and malloc returns a pointer to this memory. This memory will not be reclaimed until you call free on it, regardless of whether or not a pointer to it remains accessible (when no pointers remain to heap-allocated memory, this is a memory leak). This means that you gain the ability to continue to use the memory you allocated after the block or function that it was allocated in ends, but on the other hand you now have the responsibility to manually deallocate it when you are finished with it.
In your example, foo2 is on the stack, and will be automatically deallocated when main ends. However, the memory pointed to by foo2.b will not be automatically deallocated, since it is on the heap. This isn't a problem in your example because all memory is returned to the OS when the program ends, but if it were in a function other than main it would have been a memory leak.
foo foo2; automatically allocates the structure on the stack, and it is automatically deallocated when the enclosing function (main in this case) ends.
You only need to allocate memory on the heap, using malloc, if you need the structure to persist after the enclosing scope ends. You may also need to do this when the object is too large to fit on the stack.
2) Why can I run this code? Why does it not crash?
The code never refers to undefined memory or NULL. Why would it crash? (You have a memory leak as written, but it's probably because you're only showing part of the code and in the given program it's not a problem anyway.)
The alternative code you suggest will also work, though memory returned from malloc is also uninitialized by default. (I once worked with a custom memory allocator that filled returned memory blocks with ? characters by default. Still perfectly legal by the rules. Note that calloc returns a pointer to zero-initialized memory; use it if that's what you want.)
3) How is foo.b set? Does is reference random memory or NULL?
Random memory. Stack allocated structures are not initialized for you.
You can do it, but this is not enough.
Because, the second field is a pointer, which has to be set to a valid address. One of the ways to do it is by allocating memory (with malloc).
To your first question -
use malloc when you MUST manage object's lifetime manually.
For instance, the second code could be rewritten as:
int main(int argc, char* argv[])
{
foo foo2; uint32 b;
foo2.a = 3;
foo2.b = &b;
*foo2.b = 123;
}
This is better, because lifetime is the same, and the memory is now on stack - and doesn't need to be freed.
The memory for the struct instance ("foo2") will be allocated on the stack - there's no need to allocate memory for this yourself - if you do allocate using malloc be sure to free off the memory at a later date.