clang - Undefined behavior in realloc aliasing [duplicate] - c

When you free memory, what happens to pointers that point into that memory? Do they become invalid immediately? What happens if they later become valid again?
Certainly, the usual case of a pointer going invalid then becoming "valid" again would be some other object getting allocated into what happens to be the memory that was used before, and if you use the pointer to access memory, that's obviously undefined behavior. Dangling pointer memory overwrite lesson 1, pretty much.
But what if the memory becomes valid again for the same allocation? There's only one Standard way for that to happen: realloc(). If you have a pointer to somewhere within a malloc()'d memory block at offset > 1, then use realloc() to shrink the block to less than your offset, your pointer becomes invalid, obviously. If you then use realloc() again to grow the block back to at least cover the object type pointed to by the dangling pointer, and in neither case did realloc() move the memory block, is the dangling pointer valid again?
This is such a corner case that I don't really know how to interpret the C or C++ standards to figure it out. The below is a program that shows it.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
static const char s_message[] = "hello there";
static const char s_kitty[] = "kitty";
char *string = malloc(sizeof(s_message));
if (!string)
{
fprintf(stderr, "malloc failed\n");
return 1;
}
memcpy(string, s_message, sizeof(s_message));
printf("%p %s\n", string, string);
char *overwrite = string + 6;
*overwrite = '\0';
printf("%p %s\n", string, string);
string[4] = '\0';
char *new_string = realloc(string, 5);
if (new_string != string)
{
fprintf(stderr, "realloc #1 failed or moved the string\n");
free(new_string ? new_string : string);
return 1;
}
string = new_string;
printf("%p %s\n", string, string);
new_string = realloc(string, 6 + sizeof(s_kitty));
if (new_string != string)
{
fprintf(stderr, "realloc #2 failed or moved the string\n");
free(new_string ? new_string : string);
return 1;
}
// Is this defined behavior, even though at one point,
// "overwrite" was a dangling pointer?
memcpy(overwrite, s_kitty, sizeof(s_kitty));
string[4] = s_message[4];
printf("%p %s\n", string, string);
free(string);
return 0;
}

When you free memory, what happens to pointers that point into that memory? Do they become invalid immediately?
Yes, definitely. From section 6.2.4 of the C standard:
The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, and retains
its last-stored value throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to (or just past) reaches the end of its lifetime.
And from section 7.22.3.5:
The realloc function deallocates the old object pointed to by ptr and returns a
pointer to a new object that has the size specified by size. The contents of the new
object shall be the same as that of the old object prior to deallocation, up to the lesser of
the new and old sizes. Any bytes in the new object beyond the size of the old object have
indeterminate values.
Note the reference to old object and new object ... by the standard, what you get back from realloc is a different object than what you had before; it's no different from doing a free and then a malloc, and there is no guarantee that the two objects have the same address, even if the new size is <= the old size ... and in real implementations they often won't because objects of different sizes are drawn from different free lists.
What happens if they later become valid again?
There's no such animal. Validity isn't some event that takes place, it's an abstract condition placed by the C standard. Your pointers might happen to work in some implementation, but all bets are off once you free the memory they point into.
But what if the memory becomes valid again for the same allocation? There's only one Standard way for that to happen: realloc()
Sorry, no, the C Standard does not contain any language to that effect.
If you then use realloc() again to grow the block back to at least cover the object type pointed to by the dangling pointer, and in neither case did realloc() move the memory block
You can't know whether it will ... the standard does not guarantee any such thing. And notably, when you realloc to a smaller size, most implementations modify the memory immediately following the shortened block; reallocing back to the original size will have some garbage in the added part, it won't be what it was before it was shrunk. In some implementations, some block sizes are kept on lists for that block size; reallocating to a different size will give you totally different memory. And in a program with multiple threads, any freed memory can be allocated in a different thread between the two reallocs, in which case the realloc for a larger size will be forced to move the object to a different location.
is the dangling pointer valid again?
See above; invalid is invalid; there's no going back.
This is such a corner case that I don't really know how to interpret the C or C++ standards to figure it out.
It's not any sort of corner case and I don't know what you're seeing in the standard, which is quite clear that freed memory has indeterminate content and that the values of any pointers to or into it are also indeterminate, and makes no claim that they are magically restored by a later realloc.
Note that modern optimizing compilers are written to know about undefined behavior and take advantage of it. As soon as you realloc string, overwrite is invalid, and the compiler is free to trash it ... e.g., it might be in a register that the compiler reallocates for temporaries or parameter passing. Whether or not any compiler actually does this, it is allowed to, precisely because the standard is quite clear about pointers into objects becoming invalid when the object's lifetime ends.

If you then use realloc() again to grow the block back to at least cover the object type pointed to by the dangling pointer, and in neither case did realloc() move the memory block, is the dangling pointer valid again?
No. Unless realloc() returns a null pointer, the call terminates the lifetime of the allocated object, implying that all pointers pointing into it become invalid. If realloc() succeeds, it returns the address of a new object.
Of course, it just might happen that it's the same address as the old one. In that case, using an invalid pointer to the old object to access the new one will generally work in non-optimizing implementations of the C language.
It would still be undefined behaviour, though, and might actually fail with aggressively optimizing compilers.
The C language is unsound, and it's generally up to the programmer to uphold its invariants. Failing to do so will break the implicit contract with the compiler and may result in incorrect code being generated.

It depends on your definition of "valid". You've perfectly described the situation. If you want to consider that "valid", then it's valid. If you don't want to consider that "valid", then it's invalid.

Related

Does realloc mutate its arguments

Does realloc mutate its first argument?
Is mutating the first argument dependent on the implementation?
Is there a reason it should not be const? As a counter example memcpy makes its src argument const.
ISO C standard, section 7.20.3 Memory management functions, does not specify. The Linux man page for realloc does not specify.
#include <stdio.h>
#include <stdlib.h>
int main() {
int* list = NULL;
void* mem;
mem = realloc(list, 64);
printf("Address of `list`: %p\n", list);
list = mem;
printf("Address of `list`: %p\n", list);
mem = realloc(list, 0);
printf("Address of `list`: %p\n", list);
// free(list); // Double free
list = mem;
printf("Address of `list`: %p\n", list);
}
When I run the above code on my Debian laptop:
The first printf is null.
The second printf has an address.
The third printf has the same address as the second.
In accordance with the spec, trying to free the address results in a double free error.
The fourth printf is null.
The function does not change the original pointer because it works on a copy of the pointer. That is, the pointer is not passed by reference.
Consider the following program
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *p = malloc( sizeof( int ) );
*p = 10;
printf( "Before p = %p\n", ( void * )p );
char *q = realloc( p, 2 * sizeof( int ) );
printf( "After p = %p\n", ( void * )p );
free( q );
return 0;
}
Its output is
Before p = 0x5644bcfde260
After p = 0x5644bcfde260
As you see the pointer p was not changed.
However the new pointer q can have the same value as pointer p had before the call of realloc.
From the C Standard (7.22.3.5 The realloc function)
4 The realloc function returns a pointer to the new object (which
may have the same value as a pointer to the old object), or a null
pointer if the new object could not be allocated.
Of course, if you write
p = realloc( p, 2 * sizeof( int ) );
instead of
char *q = realloc( p, 2 * sizeof( int ) );
then it is evident that in general the new value of pointer p can differ from the old value of p (though it can be the same, according to the quote). Note also what happens if the function is unable to reallocate the memory: it returns NULL, and the assignment overwrites p. If the initial value of p was not NULL, the address of the previously allocated memory is lost and a memory leak occurs.
The old memory is not deallocated if a new memory extent can not be
allocated because the function needs to copy the old content to the
new extent of memory.
From the C Standard (7.22.3.5 The realloc function)
If memory for the new object cannot be allocated, the old object is
not deallocated and its value is unchanged.
Note that this call
mem = realloc(list, 0);
does not necessarily return NULL.
From the C Standard (7.22.3 Memory management functions)
If the size of the space requested is zero, the behavior is
implementation-defined: either a null pointer is returned, or the
behavior is as if the size were some nonzero value, except that the
returned pointer shall not be used to access an object.
First of all, formally, realloc frees the memory pointed to by its first argument after allocating a new object and copying the contents. As such, semantically it's absolutely correct that the pointed-to type not be const qualified. In limited cases, the new object's address may be the same as the old object's address, but a correct program largely can't even see this (comparing against the old pointer is undefined behavior), much less depend on it.
Secondly, I think you're confusing the const-ness of the argument type and the pointed-to type. const on argument types makes no sense whatsoever (and is ignored by the language, except in the implementation of the called function where it makes the local variable receiving the argument constant) since arguments are always values, not references to some object in the caller. Of course realloc can't change the value of the caller's pointer variable you pass to it. However, due to any use of invalid pointers being undefined behavior, your program can (because UB allows anything) exhibit behavior as if the caller's copy had been modified. For example, comparing it for equality with the new pointer may give inconsistent results. The const on memcpy's src makes a pointer-to-const type, not a const type.
realloc() can free the memory that its argument points to, if it can't reuse the same memory. I think this is considered to be like a mutation (since it effectively destroys it completely).
Semantically, realloc() is equivalent to:
void *realloc(void *ptr, size_t size) {
void *result = malloc(size);
if (result && ptr) {
memcpy(result, ptr, min(size, _allocation_size(ptr)));
free(ptr);
}
return result;
}
where _allocation_size() is some internal function of the C runtime that determines the size of a dynamic memory allocation.
Since the argument to free() is not declared const void *, neither is the first argument to realloc().
I'm not entirely sure what you mean by "Does realloc mutate its first argument?".
It certainly doesn't change the value of the pointer in the caller -- no C function can do that.
But does it alter the value of the pointed-to memory? That's a trickier question.
As far as the programmer is concerned, you hand realloc a pointer to M bytes, and it returns you a (possibly different) pointer to N bytes.
If it hands you back the same pointer (meaning that it was able to do the reallocation "in place"), and if N ≥ M, it definitely does not touch the M former bytes.
If it hands you back the same pointer but N < M (that is, if you reallocated the region smaller), you're no longer allowed to access or even ask about the bytes beyond M, so it's particularly hard to say whether they were modified. (But in fact, they might well have been modified, in the process of marking them unused, and available for future allocation).
Finally, if realloc hands you back a different pointer, the M former bytes are "gone" -- again, you're no longer allowed to access them, so it's hard to say if they were modified, but they probably were, because all of them are now available for future allocation.
But in any case: the pointer you hand to realloc is a pointer into the heap, and realloc definitely alters the heap as it does its work, so yes, I think it's safe to say that realloc mutates its first argument, which therefore should not be declared const. (Even in the first case I discussed, where realloc "definitely did not touch the M former bytes", it probably did still adjust some nearby data structures, to record the new allocation.)
And, finally, if by "mutate" you mean the sort of thing that C++ programs are allowed to do when member variables are declared mutable -- that is, a change happens behind the scenes to some data structure referenced by a pointer that was otherwise qualified const -- well, yes, that's not too far off from what realloc does. If realloc's first argument were const, and if the modifications realloc did perform were to data structures qualified as mutable, then I suppose this would work -- but also if we were talking about C++.
But of course we're not talking about C++; we're talking about C, which doesn't even have the mutable qualifier.
(I'd say memcpy isn't a counterexample, because it doesn't do anything that even remotely smells like writing to any data structures associated with its second argument.)
Does realloc mutate its first argument?
If you mean changing the value of the variable passed as the argument, the answer is no. This isn't specific to realloc(); it follows from how the language handles parameters. C makes a private copy of each argument (typically on the stack or in a register) before passing it to the function, so any change the function makes to its parameter is local and is lost when the function returns. Formally, C passes every argument by value; arrays only look like an exception because an array argument decays to a pointer to its first element, and that pointer is itself passed by value. I'll come back to this point below.
Is mutating the first argument dependent on the implementation?
Of course not. As said above, this depends on the language, not on the implementation.
Is there a reason it should not be const? As a counter example memcpy
makes its src argument const.
Of course there is a reason.
Forget about void * memcpy ( void * destination, const void * source, size_t num ), which has no connection to void* realloc (void* ptr, size_t size). Dynamic memory management depends on the specific implementation, but basically all allocation routines are built on memory pools, normally divided into small chunks, from which the blocks returned to our programs are carved. We can imagine that when we ask to shrink a block, the system releases some chunks and gives back a smaller block that incidentally stays at the same address. If we ask for an extension, however, the chunks following our block may already be assigned, and the block cannot grow in place.
On an embedded 8-bit micro, it may happen that the current memory block cannot be extended but another memory area is large enough for the purpose. In that case the allocator can copy the old block's data to the new area and return that, but then the block has a different address in memory.
But the malloc() family must work the same way regardless of the machine it is implemented on, from 8-bit embedded applications to 64-bit desktops with gigabytes of memory and virtual memory support. For this reason the standard must provide a definition that fits all cases.
The second point is how to report the result (pass or fail) of the reallocation. If the function took a reference to the memory block pointer (i.e. you passed &ptr) and stored NULL through it on failure, the original pointer would be lost! To preserve it, the user would have to copy the pointer before calling realloc(), which is unnatural and error-prone.
For this reason the standard library approaches the problem from a different side: formally, a successful reallocation always returns a freshly allocated memory block into which the old block's data has been copied. The programmer is only required to check the result before using it (see the code example below).
The standard is very clear in the function definition, as already mentioned in other answers; for the sake of completeness I reproduce it below. From ISO/IEC 9899:2017 §7.22.3.5 The realloc function:
The realloc function deallocates the old object pointed to by ptr and
returns a pointer to a new object that has the size specified by size.
The contents of the new object shall be the same as that of the old
object prior to deallocation, up to the lesser of the new and old
sizes.
Any bytes in the new object beyond the size of the old object have
indeterminate values.
If ptr is a null pointer, the realloc function behaves like the malloc
function for the specified size. Otherwise, if ptr does not match a
pointer earlier returned by a memory management function, or if the
space has been deallocated by a call to the free or realloc function,
the behavior is undefined.
If size is nonzero and memory for the new object is not allocated, the
old object is not deallocated.
If size is zero and memory for the new object is not allocated, it is
implementation-defined whether the old object is deallocated. If the
old object is not deallocated, its value shall be unchanged.
The realloc function returns a pointer to the new object (which may
have the same value as a pointer to the old object), or a null pointer
if the new object has not been allocated.
Because you don't know if realloc() returns a new object or the former one, or even NULL in case of error, you should consider realloc() as always returning a new object, hence the code:
int* list = NULL;
void* mem;
mem = realloc(list, 64);
printf("Address of `list`: %p\n", list);
is wrong for at least two reasons:
If realloc() returns a new object and frees the old memory, the variable list contains an invalid pointer. Moreover, realloc() could fail and return NULL, in which case the former block is still valid.
You can't expect list to change in any way when it is passed as an argument to a function. list will of course retain its former value, which is NULL.
Passing a null pointer to realloc() is standard-compliant, because the standard explicitly says that in this case the behavior is the same as malloc(). Passing a zero size, however, is implementation-defined, meaning that some implementations deallocate the former block and others don't. On your machine we can deduce that the implementation deallocates the block, given the double free error you got and the null pointer returned by realloc(). Note also that when you pass a zero size, a NULL return value does not necessarily mean that a failure occurred; the call may have succeeded. But if it did fail, you cannot tell the two outcomes apart. This is an ambiguity of the function (at least to my knowledge; comments are welcome).
The golden rules to follow when using realloc() are basically these:
Keep in mind that the object returned from the function is always a
new object and you have to save it.
Because realloc() can fail and return a NULL pointer, never assign its
result directly to the pointer you passed in, as the first call in the
example below does. If the call fails, you overwrite the old object's
pointer and lose any chance to recover the data or free the former
object. Always use a temporary variable to check the return value.
Example code:
void *p = malloc(SIZE);
/* Wrong approach: the pointer is overwritten even if realloc() fails */
p = realloc(p, 2*SIZE);
/* Correct approach */
void *pTmp = realloc(p, 2*SIZE);
if (NULL == pTmp)
{
//Error manage code
}
else
{
p = pTmp; //assign value
}
Now you may ask why, on many machines with virtual memory management (desktops, smartphones and the like), realloc() often returns an unchanged address. The point is that, thanks to virtual memory management, additional non-contiguous physical memory chunks can be added to the virtual memory chain, and the virtual memory descriptors can then be manipulated to map consecutive virtual addresses to each physical chunk, so that the user sees a flat, contiguous virtual address space.

How come I can initialize a pointer that has zero bytes allocated to it? [duplicate]

I know that malloc(size_t size) allocates size bytes and returns a pointer to the allocated memory.
So how come when I allocate zero bytes to the integer pointer p, I am still able to initialize it?
#include <stdio.h>
#include <stdlib.h>
int main()
{
int *p = malloc(0);
*p = 10;
printf("Pointer address is: %p\n", p);
printf("The value of the pointer: %d\n", *p);
return 0;
}
Here is my program output, I was expecting a segmentation fault.
Pointer address is: 0x1ebd260
The value of the pointer: 10
The behavior of malloc(0) is implementation-defined: it returns either a non-null pointer or NULL. Per the C Standard, you should not use the pointer returned by malloc when zero-size space was requested (see note 1).
Dereferencing the pointer returned by malloc(0) is undefined behavior, which means the program may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
1) From C Standard#7.22.3p1 [emphasis added]:
1 The order and contiguity of storage allocated by successive calls to the aligned_alloc, calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object. The pointer returned points to the start (lowest byte address) of the allocated space. If the space cannot be allocated, a null pointer is returned. If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.
When you call malloc(0) and write to the returned buffer, you invoke undefined behavior. That means you can't predict how the program will behave. It might crash, it might output strange results, or (as in this case) it may appear to work properly.
Just because the program could crash doesn't mean it will.
In general, C does not prevent you from doing incorrect things.
After int *p = malloc(0);, p has some value. It might be a null pointer, or it might point to one or more bytes of memory. In either case, you should not use it (see footnote 1). But the C language does not stop you from doing so.
When you execute *p = 10;, the compiler may have generated code to write 10 to the place where p points. p may be pointing at actual writable memory, so the store instruction may execute without failing. And then you have written 10 to a place in memory where you should not. At this point, the C standard no longer specifies what the behavior of your program is—by writing to an inappropriate place, you have broken the model of how C works.
It is also possible your compiler recognizes that *p = 10; is incorrect code in this situation and generates something other than the write to memory described above. A good compiler might give you a warning message for this code, but the compiler is not obligated to do this by the C standard, and it can allow your program to break in other ways.
Footnote
1 If malloc returns a null pointer, you should not write to *p because it is not pointing to an object. If it returns something else, you should not write to *p because C 2018 7.22.3 paragraph 1 says, for this case of malloc(0), “the returned pointer shall not be used to access an object.”
I think it is a good idea to look at the definition of a segmentation fault: https://en.wikipedia.org/wiki/Segmentation_fault
The following are some typical causes of a segmentation fault:
Attempting to access a nonexistent memory address (outside process's address space)
Attempting to access memory the program does not have rights to (such as kernel structures in process context)
Attempting to write read-only memory (such as code segment)
So in general, access to any address in the program's data segment will not lead to a segfault. This approach makes sense, as C doesn't have a memory-management framework built in (unlike C# or Java). Since malloc in this example returns some address (not NULL), that address lies in the program's data segment, which the program can access.
However, in a complex program such an action (*p = 10) may overwrite data belonging to some other variable or object (or even a pointer!), and the resulting undefined behaviour can include a segfault.
But please take into account that, as described in other answers, such a program is not good practice, and you shouldn't use this approach in production.
This is the source code of one malloc() implementation (it may differ slightly between kernel versions, but the concept is the same). It can answer your question:
static void *malloc(int size)
{
void *p;
if (size < 0)
error("Malloc error");
if (!malloc_ptr)
malloc_ptr = free_mem_ptr;
malloc_ptr = (malloc_ptr + 3) & ~3; /* Align */
p = (void *)malloc_ptr;
malloc_ptr += size;
if (free_mem_end_ptr && malloc_ptr >= free_mem_end_ptr)
error("Out of memory");
malloc_count++;
return p;
}
When the requested size is 0, it still gives you a pointer to the next free address in its memory area.

realloc() dangling pointers and undefined behavior

When you free memory, what happens to pointers that point into that memory? Do they become invalid immediately? What happens if they later become valid again?
Certainly, the usual case of a pointer going invalid then becoming "valid" again would be some other object getting allocated into what happens to be the memory that was used before, and if you use the pointer to access memory, that's obviously undefined behavior. Dangling pointer memory overwrite lesson 1, pretty much.
But what if the memory becomes valid again for the same allocation? There's only one Standard way for that to happen: realloc(). If you have a pointer to somewhere within a malloc()'d memory block at offset > 1, then use realloc() to shrink the block to less than your offset, your pointer becomes invalid, obviously. If you then use realloc() again grow the block back to at least cover the object type pointed to by the dangling pointer, and in neither case did realloc() move the memory block, is the dangling pointer valid again?
This is such a corner case that I don't really know how to interpret the C or C++ standards to figure it out. The below is a program that shows it.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
static const char s_message[] = "hello there";
static const char s_kitty[] = "kitty";
char *string = malloc(sizeof(s_message));
if (!string)
{
fprintf(stderr, "malloc failed\n");
return 1;
}
memcpy(string, s_message, sizeof(s_message));
printf("%p %s\n", string, string);
char *overwrite = string + 6;
*overwrite = '\0';
printf("%p %s\n", string, string);
string[4] = '\0';
char *new_string = realloc(string, 5);
if (new_string != string)
{
fprintf(stderr, "realloc #1 failed or moved the string\n");
free(new_string ? new_string : string);
return 1;
}
string = new_string;
printf("%p %s\n", string, string);
new_string = realloc(string, 6 + sizeof(s_kitty));
if (new_string != string)
{
fprintf(stderr, "realloc #2 failed or moved the string\n");
free(new_string ? new_string : string);
return 1;
}
// Is this defined behavior, even though at one point,
// "overwrite" was a dangling pointer?
memcpy(overwrite, s_kitty, sizeof(s_kitty));
string[4] = s_message[4];
printf("%p %s\n", string, string);
free(string);
return 0;
}
When you free memory, what happens to pointers that point into that memory? Do they become invalid immediately?
Yes, definitely. From section 6.2.4 of the C standard:
The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, and retains
its last-stored value throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to (or just past) reaches the end of its lifetime.
And from section 7.22.3.5:
The realloc function deallocates the old object pointed to by ptr and returns a
pointer to a new object that has the size specified by size. The contents of the new
object shall be the same as that of the old object prior to deallocation, up to the lesser of
the new and old sizes. Any bytes in the new object beyond the size of the old object have
indeterminate values.
Note the reference to old object and new object ... by the standard, what you get back from realloc is a different object than what you had before; it's no different from doing a free and then a malloc, and there is no guarantee that the two objects have the same address, even if the new size is <= the old size ... and in real implementations they often won't because objects of different sizes are drawn from different free lists.
What happens if they later become valid again?
There's no such animal. Validity isn't some event that takes place, it's an abstract condition placed by the C standard. Your pointers might happen to work in some implementation, but all bets are off once you free the memory they point into.
But what if the memory becomes valid again for the same allocation? There's only one Standard way for that to happen: realloc()
Sorry, no, the C Standard does not contain any language to that effect.
If you then use realloc() again grow the block back to at least cover the object type pointed to by the dangling pointer, and in neither case did realloc() move the memory block
You can't know whether it will ... the standard does not guarantee any such thing. And notably, when you realloc to a smaller size, most implementations modify the memory immediately following the shortened block; reallocing back to the original size will have some garbage in the added part, it won't be what it was before it was shrunk. In some implementations, some block sizes are kept on lists for that block size; reallocating to a different size will give you totally different memory. And in a program with multiple threads, any freed memory can be allocated in a different thread between the two reallocs, in which case the realloc for a larger size will be forced to move the object to a different location.
is the dangling pointer valid again?
See above; invalid is invalid; there's no going back.
This is such a corner case that I don't really know how to interpret the C or C++ standards to figure it out.
It's not any sort of corner case, and I don't know what you're seeing in the standard, which is quite clear that freed memory has indeterminate content and that the values of any pointers to or into it are also indeterminate, and makes no claim that they are magically restored by a later realloc.
Note that modern optimizing compilers are written to know about undefined behavior and take advantage of it. As soon as you realloc string, overwrite is invalid, and the compiler is free to trash it ... e.g., it might be in a register that the compiler reallocates for temporaries or parameter passing. Whether or not any compiler actually does this, it is allowed to, precisely because the standard is quite clear about pointers into objects becoming invalid when the object's lifetime ends.
If you then use realloc() again grow the block back to at least cover the object type pointed to by the dangling pointer, and in neither case did realloc() move the memory block, is the dangling pointer valid again?
No. Unless realloc() returns a null pointer, the call terminates the lifetime of the allocated object, implying that all pointers pointing into it become invalid. If realloc() succeeds, it returns the address of a new object.
Of course, it just might happen that it's the same address as the old one. In that case, using an invalid pointer to the old object to access the new one will generally work in non-optimizing implementations of the C language.
It would still be undefined behaviour, though, and might actually fail with aggressively optimizing compilers.
The C language is unsound, and it's generally up to the programmer to uphold its invariants. Failing to do so will break the implicit contract with the compiler and may result in incorrect code being generated.
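One way to stay inside defined behaviour is to remember the offset instead of the pointer, and to derive a fresh pointer from whatever realloc() returns. The following is only a minimal sketch of that idea: the function name rebuild_greeting is made up for illustration, the sizes mirror the question's example, and unlike the question it accepts a moved block instead of treating it as an error.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "hello there", shrink it to 5 bytes, grow it
 * again and write "kitty" at offset 6. Returns a heap string the caller
 * must free, or NULL on allocation failure. */
char *rebuild_greeting(void)
{
    static const char message[] = "hello there";
    static const char kitty[] = "kitty";

    char *string = malloc(sizeof message);
    if (string == NULL)
        return NULL;
    memcpy(string, message, sizeof message);

    /* Remember WHERE in the block we were, not a pointer into it. */
    size_t offset = 6;

    /* Shrink to 5 bytes: only 'h','e','l','l','o' are guaranteed to survive. */
    char *tmp = realloc(string, 5);
    if (tmp == NULL) {
        free(string);          /* on failure the old block is still ours */
        return NULL;
    }
    string = tmp;

    /* Grow again: every byte from index 5 onward is indeterminate. */
    tmp = realloc(string, offset + sizeof kitty);
    if (tmp == NULL) {
        free(string);
        return NULL;
    }
    string = tmp;

    string[5] = ' ';                        /* rewrite the byte the shrink discarded */
    char *overwrite = string + offset;      /* fresh pointer into the new object */
    memcpy(overwrite, kitty, sizeof kitty); /* copies the terminator too */
    return string;
}
```

Because the pointer is recomputed after every realloc(), it does not matter whether the block moved or stayed put.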
It depends on your definition of "valid". You've perfectly described the situation. If you want to consider that "valid", then it's valid. If you don't want to consider that "valid", then it's invalid.

How does creating a dynamically allocated string in C work?

I don't understand how dynamically allocated strings in C work. Below, I have an example where I think I have created a pointer to a string and allocated it 0 memory, but I'm still able to give it characters. I'm clearly doing something wrong, but what?
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
char *str = malloc(0);
int i;
str[i++] = 'a';
str[i++] = 'b';
str[i++] = '\0';
printf("%s\n", str);
return 0;
}
What you're doing is undefined behavior. It might appear to work now, but is not required to work, and may break if anything changes.
malloc normally returns a block of memory of the given size that you can use. In your case, it just so happens that there's valid memory outside of that block that you're touching. That memory is not supposed to be touched; malloc might use that memory for internal housekeeping, it might give that memory as the result of some malloc call, or something else entirely. Whatever it is, it isn't yours, and touching it produces undefined behavior.
Section 7.20.3 of the C99 standard states in part:
"If the size of the space requested is zero, the behavior is
implementation defined: either a null pointer is returned, or the
behavior is as if the size were some nonzero value, except that the
returned pointer shall not be used to access an object."
This is implementation-defined: malloc(0) either returns a null pointer or, as mentioned, a pointer that must not be dereferenced.
You are overwriting memory that was never allocated to you. This might look like it works, but you are in trouble when you call free(), at which point the heap code tries to give the memory block back.
In many implementations, each chunk of memory returned by malloc() has a header and a trailer. These structures hold at least the size of the allocated memory, and sometimes there are additional guards. You are overwriting these internal heap structures, which is why free() may complain or crash.
So you have an undefined behavior.
By doing malloc(0) you are creating a NULL pointer or a unique pointer that can be passed to free. Nothing wrong with that line. The problem lies when you perform pointer arithmetic and assign values to memory you have not allocated. Hence:
str[i++] = 'a'; // Invalid (undefined).
str[i++] = 'b'; // Invalid (undefined).
str[i++] = '\0'; // Invalid (undefined).
printf("%s\n", str); // Valid call, but it reads memory you do not own (undefined).
It's always good to do two things:
Do not malloc 0 bytes.
Check to ensure the block of memory you malloced is valid.
... to check to see if a block of memory requested from malloc is valid, do the following:
if ( str == NULL ) exit( EXIT_FAILURE );
... after your call to malloc.
Your malloc(0) is wrong. As other people have pointed out, that may or may not end up allocating a bit of memory, but regardless of what malloc actually does with 0, in this trivial example you should allocate at least 3*sizeof(char) bytes of memory.
So here we have a right nuisance. Say you allocated 20 bytes for your string, and then filled it with 19 characters and a null, thus filling the memory. So far so good. However, consider the case where you then want to add more characters to the string; you can't just put them in place because you had allocated only 20 bytes and you had already used them. All you can do is allocate a whole new buffer (say, 40 bytes), copy the original 19 characters into it, then add the new characters on the end and then free the original 20 bytes. Sounds inefficient, doesn't it? And it is inefficient: it's a whole lot of work to allocate memory, and it sounds like an especially large amount of work compared to other languages (e.g. C++) where you just concatenate strings with nothing more than str1 + str2.
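For comparison, here is a minimal sketch of the same string built with enough room for it. The function name make_ab is hypothetical. Note that the question's int i; is also never initialised, which is a second source of undefined behaviour, so the sketch starts the index at 0.

```c
#include <stdlib.h>

/* Hypothetical helper: build the string "ab" on the heap.
 * Returns NULL if the allocation fails; the caller frees the result. */
char *make_ab(void)
{
    /* Two characters plus the terminating '\0'; sizeof(char) is always 1. */
    char *str = malloc(3);
    if (str == NULL)
        return NULL;

    size_t i = 0;        /* the index must be initialised before use */
    str[i++] = 'a';
    str[i++] = 'b';
    str[i++] = '\0';     /* i is now 3, exactly the size requested */
    return str;
}
```

Every write now lands inside the three bytes that malloc() actually handed out.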
Except that underneath the hood those languages are having to do exactly the same thing of allocating more memory and copying existing data. If one cares about high performance C makes it clearer where you are spending time, whereas the likes of C++, Java, C# hide the costly operations from you behind convenient-to-use classes. Those classes can be quite clever (eg allocating more memory than strictly necessary just in case), but you do have to be on the ball if you're interested in extracting the very best performance from your hardware.
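To make the "allocate more than strictly necessary" trick concrete, here is a rough sketch of a growable string in C. The struct strbuf type and the strbuf_append function are invented for this illustration and are not from any particular library; the starting capacity of 16 and the doubling policy are arbitrary choices, and overflow of the capacity calculation is ignored for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string: len excludes the terminator,
 * cap is the number of bytes currently allocated for data. */
struct strbuf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Append src to sb. Returns 0 on success, -1 on allocation failure,
 * in which case sb is left exactly as it was. */
int strbuf_append(struct strbuf *sb, const char *src)
{
    size_t n = strlen(src);
    size_t needed = sb->len + n + 1;    /* +1 for the terminator */

    if (needed > sb->cap) {
        /* Grow geometrically so repeated appends rarely reallocate. */
        size_t new_cap = sb->cap ? sb->cap : 16;
        while (new_cap < needed)
            new_cap *= 2;

        /* Use a temporary: if realloc fails, sb->data is still valid. */
        char *tmp = realloc(sb->data, new_cap);  /* data may be NULL at first */
        if (tmp == NULL)
            return -1;
        sb->data = tmp;
        sb->cap = new_cap;
    }

    memcpy(sb->data + sb->len, src, n + 1);      /* copies the '\0' as well */
    sb->len += n;
    return 0;
}
```

Starting from a zero-initialised struct strbuf, each call either appends in place or pays for one reallocation and copy, which is roughly the work a string class would otherwise do out of sight.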
This sort of problem is what lies behind the difficulties that operations like Facebook and Twitter had in growing their services. Sooner or later those convenient but inefficient class methods add up to something unsustainable.

Is there an alternative way to free dynamically allocated memory in C - not using the free() function?

I am studying for a test, and I was wondering if any of these are equivalent to free(ptr):
malloc(NULL);
calloc(ptr);
realloc(NULL, ptr);
calloc(ptr, 0);
realloc(ptr, 0);
From what I understand, none of these will work because the free() function actually tells C that the memory after ptr is available again for it to use. Sorry that this is kind of a noob question, but help would be appreciated.
Actually, the last of those is equivalent to a call to free(). Read the specification of realloc() very carefully, and you will find it can allocate data anew, or change the size of an allocation (which, especially if the new size is larger than the old, might move the data around), and it can release memory too. In fact, you don't need the other functions; they can all be written in terms of realloc(). Not that anyone in their right mind would do so...but it could be done.
See Steve Maguire's "Writing Solid Code" for a complete dissection of the perils of the malloc() family of functions. See the ACCU web site for a complete dissection of the perils of reading "Writing Solid Code". I'm not convinced it is as bad as the reviews make it out to be - though its complete lack of a treatment of const does date it (back to the early 90s, when C89 was still new and not widely implemented in full).
D McKee's notes about MacOS X 10.5 (BSD) are interesting...
The C99 standard says:
7.20.3.3 The malloc function
Synopsis
#include <stdlib.h>
void *malloc(size_t size);
Description
The malloc function allocates space for an object whose size is specified by size and
whose value is indeterminate.
Returns
The malloc function returns either a null pointer or a pointer to the allocated space.
7.20.3.4 The realloc function
Synopsis
#include <stdlib.h>
void *realloc(void *ptr, size_t size);
Description
The realloc function deallocates the old object pointed to by ptr and returns a
pointer to a new object that has the size specified by size. The contents of the new
object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.
If ptr is a null pointer, the realloc function behaves like the malloc function for the
specified size. Otherwise, if ptr does not match a pointer earlier returned by the
calloc, malloc, or realloc function, or if the space has been deallocated by a call
to the free or realloc function, the behavior is undefined. If memory for the new
object cannot be allocated, the old object is not deallocated and its value is unchanged.
Returns
The realloc function returns a pointer to the new object (which may have the same
value as a pointer to the old object), or a null pointer if the new object could not be
allocated.
Apart from editorial changes because of extra headers and functions, the ISO/IEC 9899:2011 standard says the same as C99, but in section 7.22.3 instead of 7.20.3.
The Solaris 10 (SPARC) man page for realloc says:
The realloc() function changes the size of the block pointed to by ptr to size bytes and returns a pointer to the (possibly moved) block. The contents will be unchanged up to the lesser of the new and old sizes. If the new size of the block requires movement of the block, the space for the previous instantiation of the block is freed. If the new size is larger, the contents of the newly allocated portion of the block are unspecified. If ptr is NULL, realloc() behaves like malloc() for the specified size. If size is 0 and ptr is not a null pointer, the space pointed to is freed.
That's a pretty explicit 'it works like free()' statement.
However, that MacOS X 10.5 or BSD says anything different reaffirms the "No-one in their right mind" part of my first paragraph.
There is, of course, the C99 Rationale...It says:
7.20.3 Memory management functions
The treatment of null pointers and zero-length allocation requests in the definition of these
functions was in part guided by a desire to support this paradigm:
OBJ * p; // pointer to a variable list of OBJs
/* initial allocation */
p = (OBJ *) calloc(0, sizeof(OBJ));
/* ... */
/* reallocations until size settles */
while(1) {
p = (OBJ *) realloc((void *)p, c * sizeof(OBJ));
/* change value of c or break out of loop */
}
This coding style, not necessarily endorsed by the Committee, is reported to be in widespread
use.
Some implementations have returned non-null values for allocation requests of zero bytes.
Although this strategy has the theoretical advantage of distinguishing between “nothing” and “zero” (an unallocated pointer vs. a pointer to zero-length space), it has the more compelling
theoretical disadvantage of requiring the concept of a zero-length object. Since such objects
cannot be declared, the only way they could come into existence would be through such
allocation requests.
The C89 Committee decided not to accept the idea of zero-length objects. The allocation
functions may therefore return a null pointer for an allocation request of zero bytes. Note that this treatment does not preclude the paradigm outlined above.
QUIET CHANGE IN C89
A program which relies on size-zero allocation requests returning a non-null pointer
will behave differently.
[...]
7.20.3.4 The realloc function
A null first argument is permissible. If the first argument is not null, and the second argument is 0, then the call frees the memory pointed to by the first argument, and a null argument may be
returned; C99 is consistent with the policy of not allowing zero-sized objects.
A new feature of C99: the realloc function was changed to make it clear that the pointed-to
object is deallocated, a new object is allocated, and the content of the new object is the same as
that of the old object up to the lesser of the two sizes. C89 attempted to specify that the new object was the same object as the old object but might have a different address. This conflicts
with other parts of the Standard that assume that the address of an object is constant during its
lifetime. Also, implementations that support an actual allocation when the size is zero do not
necessarily return a null pointer for this case. C89 appeared to require a null return value, and
the Committee felt that this was too restrictive.
Thomas Padron-McCarthy observed:
C89 explicitly says: "If size is zero and ptr is not a null pointer, the object it points to is freed." So they seem to have removed that sentence in C99?
Yes, they have removed that sentence because it is subsumed by the opening sentence:
The realloc function deallocates the old object pointed to by ptr
There's no wriggle room there; the old object is deallocated. If the requested size is zero, then you get back whatever malloc(0) might return, which is often (usually) a null pointer but might be a non-null pointer that can also be returned to free() but which cannot legitimately be dereferenced.
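Since the return value for a zero size differs between implementations, code that wants predictable behaviour can avoid passing 0 to realloc() at all. The following is a sketch of one way to do that; the wrapper name resize_or_free is made up for this example.

```c
#include <stdlib.h>

/* Hypothetical wrapper: resize the block at *pp to size bytes, treating
 * size 0 as an explicit free(). Returns 0 on success, or -1 if the
 * allocation failed, in which case *pp still points at the old block. */
int resize_or_free(void **pp, size_t size)
{
    if (size == 0) {
        free(*pp);          /* free(NULL) is a harmless no-op */
        *pp = NULL;
        return 0;
    }

    void *tmp = realloc(*pp, size);   /* acts like malloc when *pp is NULL */
    if (tmp == NULL)
        return -1;                    /* old block untouched */
    *pp = tmp;
    return 0;
}
```

With this, the caller always ends up with NULL after a zero-size request, whatever the local malloc(0) happens to return.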
realloc(ptr, 0);
is equivalent to free(ptr); (although I wouldn't recommend its use as such!)
Also: these two calls are equivalent to each other (but not to free):
realloc(NULL,size)
malloc(size)
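The realloc(NULL, size) equivalence is what lets an array start out as a null pointer and grow on demand, with no separate first-allocation case. A small sketch, assuming a made-up helper called push_int:

```c
#include <stdlib.h>

/* Hypothetical helper: append value to a dynamic int array that may start
 * as NULL with *count == 0. Returns 0 on success, -1 on failure (the
 * array and count are unchanged in that case). */
int push_int(int **arr, size_t *count, int value)
{
    /* When *arr is NULL, this realloc behaves like malloc. */
    int *tmp = realloc(*arr, (*count + 1) * sizeof **arr);
    if (tmp == NULL)
        return -1;

    tmp[*count] = value;
    *arr = tmp;
    *count += 1;
    return 0;
}
```

Growing by one element per call keeps the sketch short; real code would normally grow in larger steps to avoid a reallocation on every append.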
The last one--realloc(ptr, 0)--comes close. It will free any allocated block and replace it with a minimal allocation (says my Mac OS X 10.5 manpage). Check your local manpage to see what it does on your system.
That is, if ptr pointed at a substantial object, you'll get back most of its memory.
The man page on Debian Lenny agrees with Mitch and Jonathan...does BSD really diverge from Linux on this?
From the offending man page:
The realloc() function tries to change the size of the allocation pointed
to by ptr to size, and returns ptr. [...]
If size is zero and ptr is not NULL, a new,
minimum sized object is allocated and the original object is freed.
The Linux and Solaris man pages are very clear, as is the '89 standard: realloc(ptr,0) works like free(ptr). The Mac OS manpage above and the standard as quoted by Jonathan are less clear, and seem to leave room to break the equivalence.
I've been wondering why the difference: the "act like free" interpretation seems very natural to me. Both of the implementations I have access to include some environment-variable-driven tunability, but the BSD version accepts many more options. Some examples:
MallocGuardEdges If set, add a guard page before and after each large block.
MallocDoNotProtectPrelude If set, do not add a guard page before large blocks, even if the MallocGuardEdges environment variable is set.
MallocDoNotProtectPostlude If set, do not add a guard page after large blocks, even if the MallocGuardEdges environment variable is set.
and
MallocPreScribble If set, fill memory that has been allocated with 0xaa bytes. This increases the likelihood that a program making assumptions about the contents of freshly allocated memory will fail.
MallocScribble If set, fill memory that has been deallocated with 0x55 bytes. This increases the likelihood that a program will fail due to accessing memory that is no longer allocated.
Possibly the "minimum sized object" is nothing (i.e. equivalent to free) in the normal modes, but something with some of the guards in place. Take that for what it's worth.

Resources