Even after reading quite a bit about the strict-aliasing rules, I am still confused. As far as I understand them, it is impossible to implement a sane memory allocator that follows these rules, because malloc could never reuse freed memory: the same memory could be used to store a different type at each allocation.
Clearly this cannot be right. What am I missing? How do you implement an allocator (or a memory pool) that follows strict-aliasing?
Thanks.
Edit:
Let me clarify my question with a stupid simple example:
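#include <stdlib.h>

// s == 0 frees the pool
void *my_custom_allocator(size_t s) {
    static void *pool = NULL; // a static initializer must be a constant in C
    static int in_use = 0;
    if( s == 0 ) {            // handle the release request before the in_use check
        in_use = 0;
        return NULL;
    }
    if( pool == NULL ) pool = malloc(1000); // allocate the pool on first use
    if( pool == NULL || in_use || s > 1000 ) return NULL;
    in_use = 1;
    return pool;
}
int main(void) {
    int *i = my_custom_allocator(sizeof(int));
    //use int
    my_custom_allocator(0);
    float *f = my_custom_allocator(sizeof(float)); //not allowed...
}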
I don't think you're right. Even the strictest of strict aliasing rules would only count when the memory is actually allocated for a purpose. Once an allocated block has been released back to the heap with free, there should be no references to it and it can be given out again by malloc.
And the void* returned by malloc is not subject to the strict aliasing rule since the standard explicitly states that a void pointer can be cast into any other sort of pointer (and back again). C99 section 7.20.3 states:
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).
In terms of your update (the example) where you don't actually return the memory back to the heap, I think your confusion arises because allocated objects are treated specially. If you refer to 6.5/6 of C99, you see:
The effective type of an object for an access to its stored value is the declared type of the object, if any (footnote 75: Allocated objects have no declared type).
Re-read that footnote, it's important.
If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.
If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one.
For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
In other words, the allocated block contents will become the type of the data item that you put in there.
If you put a float in there, you should only access it as a float (or compatible type). If you put in an int, you should only process it as an int (or compatible type).
The one thing you shouldn't do is to put a specific type of variable into that memory and then try to treat it as a different type - one reason for this being that objects are allowed to have trap representations (which cause undefined behaviour) and these representations may occur due to treating the same object as different types.
So, if you were to store an int in there before the deallocation in your code, then reallocate it as a float pointer, you should not try to use the float until you've actually put one in there. Up until that point, the effective type of the allocated memory is not yet float.
I post this answer to test my understanding of strict aliasing:
Strict aliasing matters only when actual reads and writes happen. Just as using multiple members of different types of a union simultaneously is undefined behavior, the same applies to pointers: you cannot use pointers of different types to access the same memory, for the same reason you cannot do it with a union.
If you consider only one of the pointers as live, then it's not a problem.
So if you write through an int* and read through an int*, it is OK.
If you write using an int* and read through a float*, it is bad.
If you write using an int* and later you write again using float*, then read it out using a float*, then it's OK.
In the case of non-trivial allocators you have a large buffer, which you typically keep in a char*. You then do some pointer arithmetic to calculate the address you want to allocate, and dereference it through the allocator's header structs. It doesn't matter which pointers you use for the pointer arithmetic; only the pointer through which you dereference the area matters. Since an allocator always does that via its own header struct, it won't trigger undefined behavior.
Standard C does not define any efficient means by which a user-written memory allocator can safely take a region of memory that has been used as one type and make it safely available as another. Structures in C are guaranteed not to have trap representations--a guarantee which would have little purpose if it didn't make it safe to copy structures with fields containing Indeterminate Value.
The difficulty is that given a structure and function like:
struct someStruct {unsigned char count; unsigned char dat[7]; };
void useStruct(struct someStruct s); // Pass by value
it should be possible to invoke it like:
struct someStruct *p = malloc(sizeof *p);
p->count = 1;
p->dat[0] = 42;
useStruct(*p);
without having to write all of the fields of the allocated structure first.
Although malloc will guarantee that the allocation block it returns may
be used by any type, there is no way for user-written memory-management
functions to enable such reuse of storage without either clearing it in
bytewise fashion (using a loop or memset) or else using free() and malloc()
to recycle the storage.
Within the allocator itself, only refer to your memory buffers as (void *). When it is optimized, the strict-aliasing optimizations shouldn't be applied by the compiler (because that module has no idea what types are stored there). When that object gets linked into the rest of the system, it should be left well-enough alone.
Hope this helps!
What I got from this post is that it is UB, or at least not a good idea, to return pointers to locally defined variables, and in fact my compiler gives a warning if I do this. However, if I wrap the pointer in a struct, the warning no longer shows. Is it still UB, or is there something different about structs that makes this OK? I'm using GCC 6.3.0 with -Wall.
Here is an example:
#include <stdio.h>
struct wrapper{
int *ptr;
};
struct wrapper foo(){
int a = 5;
struct wrapper new_wrapper = {&a};
return new_wrapper;
}
int main(){
printf("%d", *foo().ptr); //prints 5, but will it theoretically always do so?
return 0;
}
Returning a pointer is not a problem. The pointer and the object it points to are the problem. Using the pointer by any means is a problem, so wrapping it in a structure does not help.
When an object is “created” automatically inside a function, it is also “destroyed” when execution of the function ends. (In C, creating an object just means reserving memory for it, and destroying it means ending the reservation.)
Additionally, a pointer to memory that is no longer reserved is itself invalid. Although C implementations may implement pointers simply as memory addresses, some C implementations implement pointers in more complicated ways, and releasing the memory reservation may invalidate data needed to make the pointer work. So the C standard says that a pointer becomes invalid when the object it points to is released.
Because of that rule in the C standard, even implementations that implement pointers as memory addresses may treat pointers as invalid when the objects they point to are no longer reserved, with the result that optimization of the program by the compiler may cause any use of an invalid pointer to have unexpected effects.
Yes, it is. The local variable a lives only for the duration of foo.
Does realloc mutate its first argument?
Is mutating the first argument dependent on the implementation?
Is there a reason it should not be const? As a counter example memcpy makes its src argument const.
ISO C standard, section 7.20.3 Memory management functions, does not specify. The Linux man page for realloc does not specify.
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    int* list = NULL;
    void* mem;
    mem = realloc(list, 64);
    printf("Address of `list`: %p\n", (void*)list); // %p expects a void*
    list = mem;
    printf("Address of `list`: %p\n", (void*)list);
    mem = realloc(list, 0);
    printf("Address of `list`: %p\n", (void*)list);
    // free(list); // Double free
    list = mem;
    printf("Address of `list`: %p\n", (void*)list);
}
When I run the above code on my Debian laptop:
The first printf is null.
The second printf has an address.
The third printf has the same address as the second.
In accordance with the spec, trying to free the address results in a double free error.
The fourth printf is null.
The function does not change the original pointer because it deals with a copy of the pointer. That is, the pointer is passed by value, not by reference.
Consider the following program
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *p = malloc( sizeof( int ) );
*p = 10;
printf( "Before p = %p\n", ( void * )p );
char *q = realloc( p, 2 * sizeof( int ) );
printf( "After p = %p\n", ( void * )p );
free( q );
return 0;
}
Its output is
Before p = 0x5644bcfde260
After p = 0x5644bcfde260
As you see the pointer p was not changed.
However the new pointer q can have the same value as pointer p had before the call of realloc.
From the C Standard (7.22.3.5 The realloc function)
4 The realloc function returns a pointer to the new object (which
may have the same value as a pointer to the old object), or a null
pointer if the new object could not be allocated.
Of course, if you write
p = realloc( p, 2 * sizeof( int ) );
instead of
char *q = realloc( p, 2 * sizeof( int ) );
then in general the new value of the pointer p can differ from its old value (though it can also be the same, according to the quote). In particular, if the function is unable to reallocate the memory, it returns NULL. In that case a memory leak occurs, provided that the initial value of p was not NULL, because the address of the previously allocated memory is lost.
The old memory is not deallocated if a new memory extent can not be
allocated because the function needs to copy the old content to the
new extent of memory.
From the C Standard (7.22.3.5 The realloc function)
If memory for the new object cannot be allocated, the old object is
not deallocated and its value is unchanged.
Note that this call
mem = realloc(list, 0);
does not necessarily return NULL.
From the C Standard (7.22.3 Memory management functions)
If the size of the space requested is zero, the behavior is
implementation-defined: either a null pointer is returned, or the
behavior is as if the size were some nonzero value, except that the
returned pointer shall not be used to access an object.
First of all, formally, realloc frees the memory pointed to by its first argument after allocating a new object and copying the contents. As such, semantically it's absolutely correct that the pointed-to type not be const qualified. In limited cases, the new object's address may be the same as the old object's address, but a correct program largely can't even see this (comparing against the old pointer is undefined behavior), much less depend on it.
Secondly, I think you're confusing the const-ness of the argument type and the pointed-to type. const on argument types makes no sense whatsoever (and is ignored by the language, except in the implementation of the called function where it makes the local variable receiving the argument constant) since arguments are always values, not references to some object in the caller. Of course realloc can't change the value of the caller's pointer variable you pass to it. However, due to any use of invalid pointers being undefined behavior, your program can (because UB allows anything) exhibit behavior as if the caller's copy had been modified. For example, comparing it for equality with the new pointer may give inconsistent results. The const on memcpy's src makes a pointer-to-const type, not a const type.
realloc() can free the memory that its argument points to, if it can't reuse the same memory. I think this is considered to be like a mutation (since it effectively destroys it completely).
Semantically, realloc() is equivalent to:
void *realloc(void *ptr, size_t size) {
void *result = malloc(size);
if (result && ptr) {
memcpy(result, ptr, min(size, _allocation_size(ptr)));
free(ptr);
}
return result;
}
where _allocation_size() is some internal function of the C runtime that determines the size of a dynamic memory allocation.
Since the argument to free() is not declared const void *, neither is the first argument to realloc().
I'm not entirely sure what you mean by "Does realloc mutate its first argument?".
It certainly doesn't change the value of the pointer in the caller -- no C function can do that.
But does it alter the value of the pointed-to memory? That's a trickier question.
As far as the programmer is concerned, you hand realloc a pointer to M bytes, and it returns you a (possibly different) pointer to N bytes.
If it hands you back the same pointer (meaning that it was able to do the reallocation "in place"), and if N ≥ M, it definitely does not touch the M former bytes.
If it hands you back the same pointer but N < M (that is, if you reallocated the region smaller), you're no longer allowed to access or even ask about the bytes beyond M, so it's particularly hard to say whether they were modified. (But in fact, they might well have been modified, in the process of marking them unused, and available for future allocation).
Finally, if realloc hands you back a different pointer, the M former bytes are "gone" -- again, you're no longer allowed to access them, so it's hard to say if they were modified, but they probably were, because all of them are now available for future allocation.
But in any case: the pointer you hand to realloc is a pointer into the heap, and realloc definitely alters the heap as it does its work, so yes, I think it's safe to say that realloc mutates its first argument, which therefore should not be declared const. (Even in the first case I discussed, where realloc "definitely did not touch the M former bytes", it probably did still adjust some nearby data structures, to record the new allocation.)
And, finally, if by "mutate" you mean the sort of thing that C++ programs are allowed to do when member variables are declared mutable -- that is, a change happens behind the scenes to some data structure referenced by a pointer that was otherwise qualified const -- well, yes, that's not too far off from what realloc does. If realloc's first argument were const, and if the modifications realloc did perform were to data structures qualified as mutable, then I suppose this would work -- but only if we were talking about C++.
But of course we're not talking about C++; we're talking about C, which doesn't even have the mutable qualifier.
(I'd say memcpy isn't a counterexample, because it doesn't do anything that even remotely smells like writing to any data structures associated with its second argument.)
Does realloc mutate its first argument?
If you mean changing the value of the variable passed as a parameter, the answer is no. This isn't specific to realloc(); it follows from the way the language handles parameters. C makes a private copy of each argument, typically on the stack, before passing it to the function. For this reason any change to a parameter is confined to the function and is lost when the function returns and the stack is reused. Formally, C passes almost all types by value (arrays are a well-known exception). I'll come back to this point below.
Is mutating the first argument dependent on the implementation?
Of course not. As said above, this depends on the language.
Is there a reason it should not be const? As a counter example memcpy
makes its src argument const.
Of course there is a reason.
Forget about void * memcpy ( void * destination, const void * source, size_t num ), which has no connection to void* realloc (void* ptr, size_t size). Consider that the management of dynamic memory depends on the specific implementation, but basically all allocation routines are based on memory pools, normally divided into small chunks, from which the memory blocks returned to our programs are carved. We can imagine that when we ask to shrink a block, the system removes some chunks and gives back a smaller block that incidentally remains at the same address. But if we ask for an extension and the chunks following our block are already assigned, the block cannot be extended in place.
On an embedded 8-bit microcontroller it may happen that the current memory block cannot be extended, but that another memory area is large enough for the purpose. In that case we can copy the former block's data to the new one and return it, but then we have a different address in memory.
But malloc() must be universal, independent of the machine on which it is implemented, from 8-bit embedded applications to 64-bit desktops with gigabytes of available memory and virtual memory support. For this reason the standard must provide a definition that fits all cases.
The second point is how to report the result (pass/fail) of the reallocation. If a reference to the memory block pointer had been used (i.e. passing &ptr), then on a failure that returns NULL the original pointer would have been lost! To preserve it, the user would have had to copy the pointer before calling realloc(), but this procedure is unnatural and prone to errors.
For this reason the standard library approaches the problem from a different side: formally, the reallocation always returns a freshly allocated memory block into which the former block's data has been copied. The programmer is only required to check the result before using it (see the code example below).
The standard is very clear in the function definition, as already mentioned in other answers; for the sake of completeness I quote it below. From ISO/IEC 9899:2017 §7.22.3.5 The realloc function:
The realloc function deallocates the old object pointed to by ptr and
returns a pointer to a new object that has the size specified by size.
The contents of the new object shall be the same as that of the old
object prior to deallocation, up to the lesser of the new and old
sizes.
Any bytes in the new object beyond the size of the old object have
indeterminate values.
If ptr is a null pointer, the realloc function behaves like the malloc
function for the specified size. Otherwise, if ptr does not match a
pointer earlier returned by a memory management function, or if the
space has been deallocated by a call to the free or realloc function,
the behavior is undefined.
If size is nonzero and memory for the new object is not allocated, the
old object is not deallocated.
If size is zero and memory for the new object is not allocated, it is
implementation-defined whether the old object is deallocated. If the
old object is not deallocated, its value shall be unchanged.
The realloc function returns a pointer to the new object (which may
have the same value as a pointer to the old object), or a null pointer
if the new object has not been allocated.
Because you don't know if realloc() returns a new object or the former one, or even NULL in case of error, you should consider realloc() as always returning a new object, hence the code:
int* list = NULL;
void* mem;
mem = realloc(list, 64);
printf("Address of `list`: %p\n", list);
is wrong for at least two reasons:
Because if realloc() returns a new object and frees the
old memory, the variable list contains an invalid pointer. Moreover, it could fail and return NULL, in which case the former block is still valid.
Because you can't expect list to change in any way when it is passed as a parameter to a function. Of course list will retain its former value, which is NULL.
Passing a null pointer to realloc() is standard-compliant, because the standard explicitly says that in this case the behavior is the same as malloc(). Passing a zero size, however, is implementation-defined, which implies that the former block will be deallocated by some implementations but not by others. On your machine we can deduce that the implementation deallocates the block, given the double free error you got and the null pointer returned by realloc(). Note also that when passing a zero size, a returned NULL does not necessarily mean that a failure occurred: the call may have succeeded, so you cannot reliably tell success from failure. This is an ambiguity of the function (at least to my knowledge; comments are welcome).
The golden rules to follow when using realloc() are basically these:
Keep in mind that the object returned from the function is always a
new object and you have to save it.
Because realloc() can fail and return a NULL pointer, never use
code like the wrong approach below, because if it fails we overwrite the old
object pointer, losing the possibility to recover the data or free the
former object. Always use a temporary variable to check the return
value.
Example code:
void *p = malloc(SIZE);
/* Wrong approach: we overwrite the pointer regardless of the result */
p = realloc(p, 2*SIZE);
/** Correct approach */
void *pTmp = realloc(p, 2*SIZE);
if (NULL == pTmp)
{
//Error manage code
}
else
{
p = pTmp; //assign value
}
Now you may ask why, on many machines with virtual memory management (desktops, smartphones and the like), realloc() often returns an unchanged memory address. The point is that, thanks to virtual memory management, additional non-contiguous physical memory chunks can be added to the virtual memory chain; the virtual memory descriptors can then be manipulated to map consecutive virtual addresses to each physical chunk, so that the user sees a flat, contiguous virtual address space.
[Note: This is reposted from https://softwareengineering.stackexchange.com/q/369604/126197, where for some reason this question got immediately downvoted. Twice. Clearly there's more love over here!]
Here's a bit of code paraphrased from a vendor's example.
I've looked for authoritative documentation on passing stack-allocated structures by value, but haven't found the definitive word. In a nutshell: Does C99 guarantee this to be safe?
typedef struct {
int32_t upper;
int32_t lower;
} boundaries_t;
static boundaries_t calibrate() {
boundaries_t boundaries; // struct allocated on stack
boundaries.upper = getUpper();
boundaries.lower = getLower();
return boundaries; // return struct by value
}
int main() {
boundaries_t b;
b = calibrate();
// do stuff with b
...
}
Note that calibrate() allocates the boundaries struct on the stack and then returns it by value.
If the compiler can guarantee that the stack frame for calibrate() will be intact at the time of the assignment to b, then all is well. Perhaps that's part of the contract in C99's pass-by-value?
(Context: my world is embedded systems where pass-by-value is rarely seen. I do know that returning a pointer from a stack-allocated structure is a recipe for disaster, but this pass-by-value stuff feels alien.)
Yes, it's perfectly safe. When you return by value it copies the members of the structure into the caller's structure. As long as the structure doesn't contain any pointers to local objects, it's valid.
Returning structures tends to be uncommon, because if they're large it requires lots of copying. But sometimes we put arrays into structures to allow them to be passed and returned by value (arrays normally decay to pointers when used as parameters or return values) like other data types.
addendum by original asker
(I trust @Barmar won't mind...)
As @DanielH pointed out, in the case of SysV ABI for amd64, the compiler will make provisions for returning the struct by value. If it's small, the entire struct can be returned in a register (read: fast). If it's larger, the compiler allocates room in the caller's stack frame and passes a pointer to the callee. The callee then copies the value of the struct into that upon return. From the doc:
If the type has class MEMORY, then the caller provides space for the
return value and passes the address of this storage in %rdi as if it
were the first argument to the function. In effect, this address
becomes a “hidden” first argument.
b = calibrate();
// do stuff with b
is well behaved.
boundaries_t contains only integral types as members. Passing it by value and using the object it is assigned to in the function call is perfectly safe.
I don't have a link to a C99 reference, but what caught my eye was the struct assignment.
Assign one struct to another in C
It's basically Barmar's response.
Context:
I was reviewing some code that receives data from an IO descriptor into a character buffer, performs some checks on it and then uses part of the received buffer to populate a struct, and I suddenly wondered whether a strict aliasing rule violation could be involved.
Here is a simplified version
#define BFSZ 1024
struct Elt {
int id;
...
};
unsigned char buffer[BFSZ];
int sz = read(fd, buffer, sizeof(buffer)); // correctness control omitted for brevity
// search the beginning of struct data in the buffer, and process crc control
unsigned char *addr = locate_and_valid(buffer, sz);
struct Elt elt;
memcpy(&elt, addr, sizeof(elt)); // populates the struct
// and use it
int id = elt.id;
...
So far, so good. Provided the buffer contains a valid representation of the struct (say it was produced on the same platform, so there is no endianness or padding problem), the memcpy call has populated the struct and it can safely be used.
Problem:
If the struct is dynamically allocated, it has no declared type. Let us replace last lines with:
struct Elt *elt = malloc(sizeof(struct Elt)); // no declared type here
memcpy(elt, addr, sizeof(*elt)); // populates the newly allocated memory and copies the effective type
// and use it
int id = elt->id; // strict aliasing rule violation?
...
Draft n1570 for C language says in 6.5 Expressions §6
The effective type of an object for an access to its stored value is the declared type of the
object, if any.87) If a value is stored into an object having no declared type through an
lvalue having a type that is not a character type, then the type of the lvalue becomes the
effective type of the object for that access and for subsequent accesses that do not modify
the stored value. If a value is copied into an object having no declared type using
memcpy or memmove, or is copied as an array of character type, then the effective type
of the modified object for that access and for subsequent accesses that do not modify the
value is the effective type of the object from which the value is copied, if it has one.
buffer does have an effective type and even a declared type: it is an array of unsigned char. That is the reason why the code uses a memcpy instead of a mere aliasing like:
struct Elt *elt = (struct Elt *) addr;
which would indeed be a strict aliasing rule violation (and could additionally come with alignment problems). But if memcpy has given an effective type of an unsigned char array to the zone pointed to by elt, everything is lost.
Question:
Does memcpy from an array of character type to an object with no declared type give an effective type of array of character?
Disclaimer:
I know that it works without a warning with all common compilers. I just want to know whether my understanding of the standard is correct.
In order to better show my problem, let us consider a different structure Elt2 with sizeof(struct Elt2)<= sizeof(struct Elt), and
struct Elt2 actual_elt2 = {...};
For static or automatic storage, I cannot reuse object memory:
struct Elt elt;
struct Elt2 *elt2 = (struct Elt2 *)&elt;
memcpy(elt2, &actual_elt2, sizeof(*elt2));
elt2->member = ... // strict aliasing violation!
While it is fine for dynamic storage (there is a separate question about that):
struct Elt *elt = malloc(sizeof(*elt));
// use elt
...
struct Elt2 *elt2 = (struct Elt2 *)elt;
memcpy(elt2, &actual_elt2, sizeof(*elt2));
// ok, memory now has struct Elt2 effective type, and using elt would violate strict aliasing rule
elt2->member = ...; // fine
elt->id = ...; // strict aliasing rule violation!
What could make copying from a char array different?
The code is fine, no strict aliasing violation. The pointed-at data has an effective type, so the bold cited text does not apply. What applies here is the part you left out, last sentence of 6.5/6:
For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
So the effective type of the pointed-at object becomes struct Elt. The pointer returned by malloc does indeed point to an object with no declared type, but as soon as you point at it, the effective type becomes that of the struct pointer. Otherwise C programs would not be able to use malloc at all.
What makes the code safe is also that you are copying data into that struct. Had you instead just assigned a struct Elt* to point at the same memory location as addr, then you would have a strict aliasing violation and UB.
Lundin's answer is correct; what you are doing is fine (so long as the data is aligned and of same endianness).
I want to note this is not so much a result of the C language specification as it is a result of how the hardware works. As such, there's not a single authoritative answer. The C language specification defines how the language works, not how the language is compiled or implemented on different systems.
Here is an interesting article about memory alignment and strict aliasing on a SPARC versus Intel processor (notice the exact same C code performs differently, and gives errors on one platform while working on another):
https://askldjd.com/2009/12/07/memory-alignment-problems/
Fundamentally, two identical structs, on the same system with the same endian and memory alignment, must work via memcpy. If it didn't then the computer wouldn't be able to do much of anything.
Finally, the following question explains more about memory alignment on systems, and the answer by joshperry should help explain why this is a hardware issue, not a language issue:
Purpose of memory alignment
As this answer on another question covers, using an aggregate initialization
struct foo {
size_t a;
size_t b;
};
struct foo bar = {0};
results in built-in types being initialized to zero.
Is there any difference between using the above and using
struct foo * bar2 = calloc(1, sizeof(struct foo));
leaving aside the fact that one variable is a pointer.
Looking at the debugger we can see that both a and b are indeed set to zero for both of the above examples.
What's the difference between two above examples, are there any gotchas or hidden issues?
Yes, there is a crucial difference (aside from storage-class of your object of type struct foo):
struct foo bar = {0};
struct foo * bar2 = calloc(1, sizeof *bar2);
Every member of bar is zero-initialized (and the padding is zeroed out for sub-objects without an initializer, or if bar is of static or thread_local storage-class),
while all of *bar2 is zeroed out, which might have completely different results:
Neither null-pointers (T*)0 nor floating-point-numbers with value 0 are guaranteed to be all-bits-0.
(Actually, until some time after C99, all-bits-0 was guaranteed to match value 0 only for char, unsigned char and signed char (as well as some of the optional exact-size types from <stdint.h>). A later technical corrigendum guaranteed it for all integer types.)
The floating-point-format might not be IEEE754.
(On most modern systems you can ignore that possibility though.)
Quote from the c-faq (thanks to Jim Balter for linking it):
The Prime 50 series used segment 07777, offset 0 for the null pointer, at least for PL/I.
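If you want to see what your own implementation does, the following sketch inspects the object representation of a null pointer and of 0.0 byte by byte. The function names are hypothetical, and the result only describes the platform the code runs on; it proves nothing about portability.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every byte of the n-byte object at p is zero. */
bool all_bits_zero(const void *p, size_t n)
{
    const unsigned char *bytes = p;
    for (size_t i = 0; i < n; i++) {
        if (bytes[i] != 0)
            return false;
    }
    return true;
}

/* Implementation-specific: is a null pointer stored as all-bits-zero here? */
bool null_pointer_is_all_bits_zero(void)
{
    void *p = NULL;
    return all_bits_zero(&p, sizeof p);
}

/* Implementation-specific: is the double 0.0 stored as all-bits-zero here? */
bool zero_double_is_all_bits_zero(void)
{
    double d = 0.0;
    return all_bits_zero(&d, sizeof d);
}
```

On mainstream desktop and server platforms both functions are expected to return true, but the standard does not require it.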
struct foo bar = {0};
This defines an object of type struct foo named bar, and initializes it to zero.
"Zero" is defined recursively. All integer subobjects are initialized to 0, all floating-point subobjects to 0.0, and all pointers to NULL.
struct foo * bar2 = calloc(1, sizeof(struct foo));
IMHO this is better (but equivalently) written as:
struct foo *bar2 = calloc(1, sizeof *bar2);
By not repeating the type name, we avoid the risk of a mismatch when the code is changed later on.
This dynamically allocates an object of type struct foo (on the heap), initializes that object to all-bits-zero, and initializes bar2 to point to it.
calloc can fail to allocate memory. If it does, it returns a null pointer. You should always check for that. (The declaration of bar also allocates memory, but if it fails it's a stack overflow, and there's no good way to handle it.)
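A sketch of that check, assuming the struct foo from the question; the wrapper function use_foo and its return convention are invented for this example:

```c
#include <stdio.h>
#include <stdlib.h>

struct foo {
    size_t a;
    size_t b;
};

/* Hypothetical helper: returns 0 on success, -1 if the allocation failed. */
int use_foo(void)
{
    struct foo *bar2 = calloc(1, sizeof *bar2);
    if (bar2 == NULL) {
        fprintf(stderr, "calloc failed\n");
        return -1;
    }
    bar2->a = 1;   /* safe: bar2 is known to be non-null here */
    free(bar2);    /* release the block once we are done with it */
    return 0;
}
```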
And all-bits-zero is not guaranteed to be the same as "zero". For integer types (including size_t), it's very nearly guaranteed. For floating-point and pointer types, it's entirely legal for 0.0 or NULL to have some internal representation other than all-bits-zero. You're unlikely to run into this, and since all the members of your structure are integer you probably don't need to worry about it.
calloc gives you a dynamically allocated, zeroed memory zone on the heap (pointed to by bar2). But an automatic variable (like bar, assuming its declaration is inside a function) is allocated on the call stack. See also calloc(3).
In C, you need to explicitly free a heap-allocated memory zone, whereas stack-allocated data is popped when its function returns.
Read also the wiki pages on C dynamic memory allocation and on garbage collection. Reference counting is a widely used technique in C and in C++, and could be viewed as a form of GC. Think about circular references; they are hard to handle.
The Boehm conservative GC can be used in C programs.
Notice that the liveness of a memory zone is a global, program-wide property. You generally cannot claim that a given zone belongs to a particular function (or library). But you could adopt conventions about that.
When you code a function returning a heap-allocated pointer (i.e. some pointer to dynamic storage) you should document that fact and decide who is in charge of freeing it.
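For instance, such a convention can be stated in the comment above the function. This is only a sketch; duplicate_string is a hypothetical name chosen for illustration:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of s, or NULL if allocation fails.
   Ownership: the caller must release the result with free(). */
char *duplicate_string(const char *s)
{
    size_t n = strlen(s) + 1;   /* include the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

A caller would then write something like char *c = duplicate_string("abc"); check c against NULL, use it, and finally call free(c).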
About initialization: the memory returned by calloc is zeroed (when calloc succeeds). An automatic variable initialized with {0} is also zeroed. In practice, some implementations of calloc may treat big and small objects differently: big ones by asking the kernel for whole zeroed pages (e.g. with mmap(2)), small ones by reusing, if available, a previously freed zone and zeroing it. Zeroing a zone uses a fast equivalent of memset(3).
PS. I am ignoring the weird machines on which an all-zero-bit memory zone is not cleared data in the sense of the C standard, i.e. not equivalent to {0}. I don't know of such machines in practice, even though they are possible in principle (and in theory the NULL pointer might not be an all-zero-bit word).
BTW, the compiler may optimize an all-zero local structure (and perhaps not allocate it at all on the stack, since it would fit in registers).
(This answer focuses on the differences in initialization, in the case of a struct only containing integral types)
Both forms set a and b to 0. This is because the Standard defines that all-bits-zero for an integral type must represent a value of 0.
If there is structure padding, then the calloc version zeroes the padding bytes as well, but the zero-initialization may not. For example:
struct foo a = { 0 }, b = { 0 };
struct foo c, d; memset(&c, 0, sizeof c); memset(&d, 0, sizeof d);
if ( memcmp(&a, &b, sizeof a) )
printf("This line may appear.\n");
if ( memcmp(&c, &d, sizeof c) )
printf("This line must not appear.\n");
A technique you will sometimes see (especially in code designed to fit on systems with small amounts of storage) is that of using memcmp to compare two structs for equality. When there is padding between structure members, this is unreliable as the padding may be different even though the structure members are the same.
The programmer didn't want to compare structure members individually, as that takes too much code size, so instead they copy structs around using memcpy and initialize them using memset, in order to preserve the ability to use memcmp to check for equality.
In modern programming I'd strongly advise against doing this, and recommend always using the { 0 } form of initialization. Another benefit of the latter is that there is no chance of making a mistake with the size argument and accidentally setting too much or too little memory.
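If you do need an equality test, a member-by-member comparison avoids the padding problem entirely. A minimal sketch, assuming the struct foo from the question; foo_equal is a hypothetical helper name:

```c
#include <stdbool.h>
#include <stddef.h>

struct foo {
    size_t a;
    size_t b;
};

/* Compares the members only, so the contents of any padding bytes
   never influence the result. */
bool foo_equal(const struct foo *x, const struct foo *y)
{
    return x->a == y->a && x->b == y->b;
}
```

The cost is that the function has to be updated whenever a member is added to the struct.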
There is a serious difference: allocation of automatic variables is done at compile time and comes for free (when the stack frame is reserved, the room is there). By contrast, dynamic allocation is done at run time and has an unpredictable and non-negligible cost.
As regards initialization, the compiler has opportunities for optimization with automatic variables (for instance by not clearing if unnecessary); this is not possible with a call to calloc.
If you like the calloc style, you also have the option of performing memset on the automatic variable.
memset(&bar, 0, sizeof bar);
UPDATE: allocation of automatic variables is quasi-done at compile-time.