realloc() dangling pointers and undefined behavior - c

When you free memory, what happens to pointers that point into that memory? Do they become invalid immediately? What happens if they later become valid again?
Certainly, the usual case of a pointer going invalid then becoming "valid" again would be some other object getting allocated into what happens to be the memory that was used before, and if you use the pointer to access memory, that's obviously undefined behavior. Dangling pointer memory overwrite lesson 1, pretty much.
But what if the memory becomes valid again for the same allocation? There's only one Standard way for that to happen: realloc(). If you have a pointer to somewhere within a malloc()'d memory block at offset > 1, then use realloc() to shrink the block to less than your offset, your pointer becomes invalid, obviously. If you then use realloc() again grow the block back to at least cover the object type pointed to by the dangling pointer, and in neither case did realloc() move the memory block, is the dangling pointer valid again?
This is such a corner case that I don't really know how to interpret the C or C++ standards to figure it out. The below is a program that shows it.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
static const char s_message[] = "hello there";
static const char s_kitty[] = "kitty";
char *string = malloc(sizeof(s_message));
if (!string)
{
fprintf(stderr, "malloc failed\n");
return 1;
}
memcpy(string, s_message, sizeof(s_message));
printf("%p %s\n", string, string);
char *overwrite = string + 6;
*overwrite = '\0';
printf("%p %s\n", string, string);
string[4] = '\0';
char *new_string = realloc(string, 5);
if (new_string != string)
{
fprintf(stderr, "realloc #1 failed or moved the string\n");
free(new_string ? new_string : string);
return 1;
}
string = new_string;
printf("%p %s\n", string, string);
new_string = realloc(string, 6 + sizeof(s_kitty));
if (new_string != string)
{
fprintf(stderr, "realloc #2 failed or moved the string\n");
free(new_string ? new_string : string);
return 1;
}
// Is this defined behavior, even though at one point,
// "overwrite" was a dangling pointer?
memcpy(overwrite, s_kitty, sizeof(s_kitty));
string[4] = s_message[4];
printf("%p %s\n", string, string);
free(string);
return 0;
}

When you free memory, what happens to pointers that point into that memory? Do they become invalid immediately?
Yes, definitely. From section 6.2.4 of the C standard:
The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, and retains
its last-stored value throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to (or just past) reaches the end of its lifetime.
And from section 7.22.3.5:
The realloc function deallocates the old object pointed to by ptr and returns a
pointer to a new object that has the size specified by size. The contents of the new
object shall be the same as that of the old object prior to deallocation, up to the lesser of
the new and old sizes. Any bytes in the new object beyond the size of the old object have
indeterminate values.
Note the reference to old object and new object ... by the standard, what you get back from realloc is a different object than what you had before; it's no different from doing a free and then a malloc, and there is no guarantee that the two objects have the same address, even if the new size is <= the old size ... and in real implementations they often won't because objects of different sizes are drawn from different free lists.
What happens if they later become valid again?
There's no such animal. Validity isn't some event that takes place, it's an abstract condition placed by the C standard. Your pointers might happen to work in some implementation, but all bets are off once you free the memory they point into.
But what if the memory becomes valid again for the same allocation? There's only one Standard way for that to happen: realloc()
Sorry, no, the C Standard does not contain any language to that effect.
If you then use realloc() again grow the block back to at least cover the object type pointed to by the dangling pointer, and in neither case did realloc() move the memory block
You can't know whether it will ... the standard does not guarantee any such thing. And notably, when you realloc to a smaller size, most implementations modify the memory immediately following the shortened block; reallocing back to the original size will have some garbage in the added part, it won't be what it was before it was shrunk. In some implementations, some block sizes are kept on lists for that block size; reallocating to a different size will give you totally different memory. And in a program with multiple threads, any freed memory can be allocated in a different thread between the two reallocs, in which case the realloc for a larger size will be forced to move the object to a different location.
is the dangling pointer valid again?
See above; invalid is invalid; there's no going back.
This is such a corner case that I don't really know how to interpret the C or C++ standards to figure it out.
It's not any sort of corner case and I don't know what you're seeing in the standard, which is quite clear that freed memory has indeteterminate content and that the values of any pointers to or into it are also indeterminate, and makes no claim that they are magically restored by a later realloc.
Note that modern optimizing compilers are written to know about undefined behavior and take advantage of it. As soon as you realloc string, overwrite is invalid, and the compiler is free to trash it ... e.g., it might be in a register that the compiler reallocates for temporaries or parameter passing. Whether any compiler does this, it can, precisely because the standard is quite clear about pointers into objects becoming invalid when the object's lifetime ends.

If you then use realloc() again grow the block back to at least cover the object type pointed to by the dangling pointer, and in neither case did realloc() move the memory block, is the dangling pointer valid again?
No. Unless realloc() returns a null pointer, the call terminates the lifetime of the allocated object, implying that all pointers pointing into it become invalid. If realloc() succeeds, it returns the address of a new object.
Of course, it just might happen that it's the same address as the old one. In that case, using an invalid pointer to the old object to access the new one will generally work in non-optimizing implementations of the C language.
It would still be undefined behaviour, though, and might actually fail with aggressively optimizing compilers.
The C language is unsound, and it's generally up to the programmer to uphold its invariants. Failing to do so will break the implicit contract with the compiler and may result in incorrect code being generated.

It depends on your definition of "valid". You've perfectly described the situation. If you want to consider that "valid", then it's valid. If you don't want to consider that "valid", then it's invalid.

Related

Does realloc mutate its arguments

Does realloc mutate its first argument?
Is mutating the first argument dependent on the implementation?
Is there a reason it should not be const? As a counter example memcpy makes its src argument const.
ISO C standard, section 7.20.3 Memory management functions, does not specify. The Linux man page for realloc does not specify.
#include <stdio.h>
#include <stdlib.h>
int main() {
int* list = NULL;
void* mem;
mem = realloc(list, 64);
printf("Address of `list`: %p\n", list);
list = mem;
printf("Address of `list`: %p\n", list);
mem = realloc(list, 0);
printf("Address of `list`: %p\n", list);
// free(list); // Double free
list = mem;
printf("Address of `list`: %p\n", list);
}
When I run the above code on my Debian laptop:
The first printf is null.
The second printf has an address.
The third printf has the same address as the second.
In accordance with the spec, trying to free the address results in a double free error.
The forth printf is null.
The function does not change the original pointer because it deals with a copy of the pointer. That is the pointer is not passed by reference.
Consider the following program
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *p = malloc( sizeof( int ) );
*p = 10;
printf( "Before p = %p\n", ( void * )p );
char *q = realloc( p, 2 * sizeof( int ) );
printf( "After p = %p\n", ( void * )p );
free( q );
return 0;
}
Its output is
Before p = 0x5644bcfde260
After p = 0x5644bcfde260
As you see the pointer p was not changed.
However the new pointer q can have the same value as pointer p had before the call of realloc.
From the C Standard (7.22.3.5 The realloc function)
4 The realloc function returns a pointer to the new object (which
may have the same value as a pointer to the old object), or a null
pointer if the new object could not be allocated.
Of course if you will write
p = realloc( p, 2 * sizeof( int ) );
instead of
char *q = realloc( p, 2 * sizeof( int ) );
then it is evident that in general the new value of pointer p can differ from the old value of p (though can be the same according to the quote). For example if the function was unable to reallocate memory. In this case a memory leak will occur provided that initial value of the pointer p was not equal to NULL. Because in this case (when the initial value of the pointer was not equal to NULL) the address of the early allocated memory will be lost.
The old memory is not deallocated if a new memory extent can not be
allocated because the function needs to copy the old content to the
new extent of memory.
From the C Standard (7.22.3.5 The realloc function)
If memory for the new object cannot be allocated, the old object is
not deallocated and its value is unchanged.
Pay attention to that this call
mem = realloc(list, 0);
does not necessary return NULL.
From the C Standard (7.22.3 Memory management functions)
If the size of the space requested is zero, the behavior is
implementation-defined: either a null pointer is returned, or the
behavior is as if the size were some nonzero value, except that the
returned pointer shall not be used to access an object.
First of all, formally, realloc frees the memory pointed to by its first argument after allocating a new object and copying the contents. As such, semantically it's absolutely correct that the pointed-to type not be const qualified. In limited cases, the new object's address may be the same as the old object's address, but a correct program largely can't even see this (comparing against the old pointer is undefined behavior), much less depend on it.
Secondly, I think you're confusing the const-ness of the argument type and the pointed-to type. const on argument types makes no sense whatsoever (and is ignored by the language, except in the implementation of the called function where it makes the local variable receiving the argument constant) since arguments are always values, not references to some object in the caller. Of course realloc can't change the value of the caller's pointer variable you pass to it. However, due to any use of invalid pointers being undefined behavior, your program can (because UB allows anything) exhibit behavior as if the caller's copy had been modified. For example, comparing it for equality with the new pointer may give inconsistent results. The const on memcpy's src makes a pointer-to-const type, not a const type.
realloc() can free the memory that its argument points to, if it can't reuse the same memory. I think this is considered to be like a mutation (since it effectively destroys it completely).
Semantically, realloc() is equivalent to:
void *realloc(void *ptr, size_t size) {
void *result = malloc(size);
if (result && ptr) {
memcpy(result, ptr, min(size, _allocation_size(ptr)));
free(ptr);
}
return result;
}
where _allocation_size() is some internal function of the C runtime that determines the size of a dynamic memory allocation.
Since the argument to free() is not declared const void *, neither is the first argument to realloc().
I'm not entirely sure what you mean by "Does realloc mutate its first argument?".
It certainly doesn't change the value of the pointer in the caller -- no C function can do that.
But does it alter the value of the pointed-to memory? That's a trickier question.
As far as the programmer is concerned, you hand realloc a pointer to M bytes, and it returns you a (possibly different) pointer to N bytes.
If it hands you back the same pointer (meaning that it was able to do the reallocation "in place"), and if N ≥ M, it definitely does not touch the M former bytes.
If it hands you back the same pointer but N < M (that is, if you reallocated the region smaller), you're no longer allowed to access or even ask about the bytes beyond M, so it's particularly hard to say whether they were modified. (But in fact, they might well have been modified, in the process of marking them unused, and available for future allocation).
Finally, if realloc hands you back a different pointer, the M former bytes are "gone" -- again, you're no longer allowed to access them, so it's hard to say if they were modified, but they probably were, because all of them are now available for future allocation.
But in any case: the pointer you hand to realloc is a pointer into the heap, and realloc definitely alters the heap as it does its work, so yes, I think it's safe to say that realloc mutates its first argument, which therefore should not be declared const. (Even in the first case I discussed, where realloc "definitely did not touch the M former bytes", it probably did still adjust some nearby data structures, to record the new allocation.)
And, finally, if by "mutate" you mean the sort of thing that C++ programs are allowed to do when member variables are declared mutable -- that is, a change happens behind the scenes to some data structure referenced by a pointer that was otherwise qualified const -- well, yes, that's not too far off from what realloc does. If realloc's first argument were const, and if the modifications realloc did perform were to data structures qualified as mutable, then I suppose this would work -- but also if we were talking about C++.
But of course we're not talking about C++; we're talking about C, which doesn't even have the mutable qualifier.
(I'd say memcpy isn't a counterexample, because it doesn't do anything that even remotely smells like writing to any data structures associated with its second argument.)
Does realloc mutate its first argument?
If you mean change the value of the variable passed as parameter the answer is no. The point isn't related to the specific realloc() function, but more generally to the way used by the language to handle parameters. C language produce a private copy of each argument, typically on the stack, before to pass them to the function, For this reason each change to them is confined locally and is lost when the function returns and the stack reused. Formally the C language pass almost all types by value (arrays are a well known exception). Anyway I'll come back on the argument below.
Is mutating the first argument dependent on the implementation?
Of course not. As said above this depends by the language.
Is there a reason it should not be const? As a counter example memcpy
makes its src argument const.
Of course there is a reason.
Forget about void * memcpy ( void * destination, const void * source, size_t num ) that has no connection to void* realloc (void* ptr, size_t size), lets consider that the management of dynamic memory depends on specific local implementation, but basically all allocation routines are based on memory pools, normally divided in small chunks, from where are derived the memory blocks returned to our programs. We can imagine that when we require to shrink the block the system will remove some chunks giving back a smaller block that incidentally remain at the same address, but if we require an extension maybe the chunks following our block are already assigned we can't proceed.
On an embedded 8 bit micro may happen that the actual memory block cannot be extended, but that another memory area is large enough for the scope, in that case we can copy the former block data to the new one and return it. But in this case we have a different address in memory.
But the malloc() must be universal independently from the machine where it is implemented, starting from 8 bits embedded applications to 64bits desktops with GBytes of available memory and virtual memory support. For this reason the standard must provide a definition that could fit all cases.
The second point is how pass the result, pass/fail, of the reallocation, if would have been used a reference to the memory block pointer (ie passing &ptr), in case of a failure returning NULL the original pointer would have been lost!. The user, to preserve it, must have done a copy of the pointer before to realloc(), but this procedure is unnatural e prone to errors.
For this reason in the standard library the problem is approached from a different side: the reallocation will formally return always a freshly allocated memory block in which has been copied the former memory block data. The programmer is required only to check the result before use it (see below code example).
The standard is very clear in the function definition, as already mentioned in other answers, that for sake of completeness I report below. From ISO/IEC 9899:2017 §7.22.3.5 The realloc function:
The realloc function deallocates the old object pointed to by ptr and
returns a pointer to a new object that has the size specified by size.
The contents of the new object shall be the same as that of the old
object prior to deallocation, up to the lesser of the new and old
sizes.
Any bytes in the new object beyond the size of the old object have
indeterminate values.
If ptr is a null pointer, the realloc function behaves like the malloc
function for the specified size. Otherwise, if ptr does not match a
pointer earlier returned by a memory management function, or if the
space has been deallocated by a call to the free or realloc function,
the behavior is undefined.
If size is nonzero and memory for the new object is not allocated, the
old object is not deallocated.
If size is zero and memory for the new object is not allocated, it is
implementation-defined whether the old object is deallocated. If the
old object is not deallocated, its value shall be unchanged.
The realloc function returns a pointer to the new object (which may
have the same value as a pointer to the old object), or a null pointer
if the new object has not been allocated.
Because you don't know if realloc() returns a new object or the former one, or even NULL in case of error, you should consider realloc() as always returning a new object, hence the code:
int* list = NULL;
void* mem;
mem = realloc(list, 64);
printf("Address of `list`: %p\n", list);
Is wrong at least for two reasons:
Obviously Because if realloc() return a new object and frees the
old memory, the variable list contains an invalid pointer. Moreover it could fail returning NULL, in that case the former block will still be valid.
Because you can't expect to have list changed in any way passing it as a local parameter in a function. Of course list will retain its former value that is NULL.
While passing a null pointer to realloc() is standard compliant, because it explicitly says that in this case the behavior will be the same as malloc(), passing a zero size the behavior is implementation-defined implying that the former block will be deallocated by some compilers, but not from some others. The latter means that the behavior can change on compiler basis, on your machine we can deduce that evidently the compiler behavior is to deallocate the block because of the double free error you got and the null pointer returned by realloc(). Please note also that in latter case when passing a zero size to realloc()the returned NULL could not mean that a failure occurred, and that the function was successful, but in case of failure you will not able to correctly understand if there was a failure or not. This is an ambiguity of the function (or it is so at my knowledge comments are welcome).
The golden rules to follow when using realloc() are basically these:
Keep in mind that the object returned from function is always a
new object and you have to save it.
Because realloc() can fail and return a NULL pointer, never use
code as that below, because if it fails we will overwrite the old
object pointer loosing the possibility to recover data or free the
former object. Always use a temporary variable to check the return
value.
Example code:
void *p = malloc(SIZE);
/* Wrong approach we overwrite anyway teh pointer */
p = realloc(p, 2*SIZE);
/** Correct approach */
void *pTmp = realloc(p, 2*SIZE);
if (NULL == pTmp)
{
//Error manage code
}
else
{
p = pTmp; //assign value
}
Now you may ask why on many machines, having virtual memory management as desktops, smartphones and the like, often happen to have unchanged memory address returned from realloc(). Well the point is that, thanks to the virtual memory management, more physical not contiguous memory chunks can be added to the virtual memory chain, then the virtual memory descriptors can be manipulated, mapping consequential virtual addresses to each physical chunk in such a way that the user sees a flat contiguous virtual memory space.

How come I can initialize a pointer that has zero bytes allocated to it? [duplicate]

This question already has answers here:
What's the point of malloc(0)?
(17 answers)
Closed 4 years ago.
I know that malloc(size_t size) allocates size bytes and returns a pointer to the allocated memory. .
So how come when I allocate zero bytes to the integer pointer p, I am still able to initialize it?
#include <stdio.h>
#include <stdlib.h>
int main()
{
int *p = malloc(0);
*p = 10;
printf("Pointer address is: %p\n", p);
printf("The value of the pointer: %d\n", *p);
return 0;
}
Here is my program output, I was expecting a segmentation fault.
Pointer address is: 0x1ebd260
The value of the pointer: 10
The behavior of malloc(0) is implementation defined, it will either return a pointer or NULL. As per C Standard, you should not use the pointer returned by malloc when requested zero size space1).
Dereferencing the pointer returned by malloc(0) is undefined behavior which includes the program may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
1) From C Standard#7.22.3p1 [emphasis added]:
1 The order and contiguity of storage allocated by successive calls to the aligned_alloc, calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object. The pointer returned points to the start (lowest byte address) of the allocated space. If the space cannot be allocated, a null pointer is returned. If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.
When you call malloc(0) and write to the returned buffer, you invoke undefined behavior. That means you can't predict how the program will behave. It might crash, it might output strange results, or (as in this case) it may appear to work properly.
Just because the program could crash doesn't mean it will.
In general, C does not prevent you from doing incorrect things.
After int *p = malloc(0);, p has some value. It might be a null pointer, or it might point to one or more bytes of memory. In either case, you should not use it.1 But the C language does not stop you from doing so.
When you execute *p = 10;, the compiler may have generated code to write 10 to the place where p points. p may be pointing at actual writable memory, so the store instruction may execute without failing. And then you have written 10 to a place in memory where you should not. At this point, the C standard no longer specifies what the behavior of your program is—by writing to an inappropriate place, you have broken the model of how C works.
It is also possible your compiler recognizes that *p = 10; is incorrect code in this situation and generates something other than the write to memory described above. A good compiler might give you a warning message for this code, but the compiler is not obligated to do this by the C standard, and it can allow your program to break in other ways.
Footnote
1 If malloc returns a null pointer, you should not write to *p because it is not pointing to an object. If it returns something else, you should not write to *p because C 2018 7.22.3.1 says, for this of malloc(0), “the returned pointer shall not be used to access an object.”
I think it is good idea to look at segmentation fault definition https://en.wikipedia.org/wiki/Segmentation_fault
The following are some typical causes of a segmentation fault:
Attempting to access a nonexistent memory address (outside process's address space)
Attempting to access memory the program does not have rights to (such as kernel structures in process context)
Attempting to write read-only memory (such as code segment)
So in general, access to any address in program's data segment will not lead to seg fault. This approach makes sence as C doesn't have any framework with memory management in it (like C# or Java). So as malloc in this example returns some address (not NULL) it returns it from program's data segment which can be accessed by program.
However if program is complex, such action (*p = 10) may overwrite data belonging to some other variable or object (or even pointer!) and lead to undefined behaviour (including seg fault).
But please take in account, that as described in other answers such program is not best practice and you shouldn't use such approach in production.
This is a source code of malloc() (maybe have a litter difference between kernel versions but concept still like this), It can answer your question:
static void *malloc(int size)
{
void *p;
if (size < 0)
error("Malloc error");
if (!malloc_ptr)
malloc_ptr = free_mem_ptr;
malloc_ptr = (malloc_ptr + 3) & ~3; /* Align */
p = (void *)malloc_ptr;
malloc_ptr += size;
if (free_mem_end_ptr && malloc_ptr >= free_mem_end_ptr)
error("Out of memory");
malloc_count++;
return p;
}
When you assigned size is 0, it already gave you a pointer to first address of memory

clang - Undefined behavior in realloc aliasing [duplicate]

When you free memory, what happens to pointers that point into that memory? Do they become invalid immediately? What happens if they later become valid again?
Certainly, the usual case of a pointer going invalid then becoming "valid" again would be some other object getting allocated into what happens to be the memory that was used before, and if you use the pointer to access memory, that's obviously undefined behavior. Dangling pointer memory overwrite lesson 1, pretty much.
But what if the memory becomes valid again for the same allocation? There's only one Standard way for that to happen: realloc(). If you have a pointer to somewhere within a malloc()'d memory block at offset > 1, then use realloc() to shrink the block to less than your offset, your pointer becomes invalid, obviously. If you then use realloc() again grow the block back to at least cover the object type pointed to by the dangling pointer, and in neither case did realloc() move the memory block, is the dangling pointer valid again?
This is such a corner case that I don't really know how to interpret the C or C++ standards to figure it out. The below is a program that shows it.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
static const char s_message[] = "hello there";
static const char s_kitty[] = "kitty";
char *string = malloc(sizeof(s_message));
if (!string)
{
fprintf(stderr, "malloc failed\n");
return 1;
}
memcpy(string, s_message, sizeof(s_message));
printf("%p %s\n", string, string);
char *overwrite = string + 6;
*overwrite = '\0';
printf("%p %s\n", string, string);
string[4] = '\0';
char *new_string = realloc(string, 5);
if (new_string != string)
{
fprintf(stderr, "realloc #1 failed or moved the string\n");
free(new_string ? new_string : string);
return 1;
}
string = new_string;
printf("%p %s\n", string, string);
new_string = realloc(string, 6 + sizeof(s_kitty));
if (new_string != string)
{
fprintf(stderr, "realloc #2 failed or moved the string\n");
free(new_string ? new_string : string);
return 1;
}
// Is this defined behavior, even though at one point,
// "overwrite" was a dangling pointer?
memcpy(overwrite, s_kitty, sizeof(s_kitty));
string[4] = s_message[4];
printf("%p %s\n", string, string);
free(string);
return 0;
}
When you free memory, what happens to pointers that point into that memory? Do they become invalid immediately?
Yes, definitely. From section 6.2.4 of the C standard:
The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, and retains
its last-stored value throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to (or just past) reaches the end of its lifetime.
And from section 7.22.3.5:
The realloc function deallocates the old object pointed to by ptr and returns a
pointer to a new object that has the size specified by size. The contents of the new
object shall be the same as that of the old object prior to deallocation, up to the lesser of
the new and old sizes. Any bytes in the new object beyond the size of the old object have
indeterminate values.
Note the reference to old object and new object ... by the standard, what you get back from realloc is a different object than what you had before; it's no different from doing a free and then a malloc, and there is no guarantee that the two objects have the same address, even if the new size is <= the old size ... and in real implementations they often won't because objects of different sizes are drawn from different free lists.
What happens if they later become valid again?
There's no such animal. Validity isn't some event that takes place, it's an abstract condition placed by the C standard. Your pointers might happen to work in some implementation, but all bets are off once you free the memory they point into.
But what if the memory becomes valid again for the same allocation? There's only one Standard way for that to happen: realloc()
Sorry, no, the C Standard does not contain any language to that effect.
If you then use realloc() again grow the block back to at least cover the object type pointed to by the dangling pointer, and in neither case did realloc() move the memory block
You can't know whether it will ... the standard does not guarantee any such thing. And notably, when you realloc to a smaller size, most implementations modify the memory immediately following the shortened block; reallocing back to the original size will have some garbage in the added part, it won't be what it was before it was shrunk. In some implementations, some block sizes are kept on lists for that block size; reallocating to a different size will give you totally different memory. And in a program with multiple threads, any freed memory can be allocated in a different thread between the two reallocs, in which case the realloc for a larger size will be forced to move the object to a different location.
is the dangling pointer valid again?
See above; invalid is invalid; there's no going back.
This is such a corner case that I don't really know how to interpret the C or C++ standards to figure it out.
It's not any sort of corner case and I don't know what you're seeing in the standard, which is quite clear that freed memory has indeteterminate content and that the values of any pointers to or into it are also indeterminate, and makes no claim that they are magically restored by a later realloc.
Note that modern optimizing compilers are written to know about undefined behavior and take advantage of it. As soon as you realloc string, overwrite is invalid, and the compiler is free to trash it ... e.g., it might be in a register that the compiler reallocates for temporaries or parameter passing. Whether any compiler does this, it can, precisely because the standard is quite clear about pointers into objects becoming invalid when the object's lifetime ends.
If you then use realloc() again grow the block back to at least cover the object type pointed to by the dangling pointer, and in neither case did realloc() move the memory block, is the dangling pointer valid again?
No. Unless realloc() returns a null pointer, the call terminates the lifetime of the allocated object, implying that all pointers pointing into it become invalid. If realloc() succeeds, it returns the address of a new object.
Of course, it just might happen that it's the same address as the old one. In that case, using an invalid pointer to the old object to access the new one will generally work in non-optimizing implementations of the C language.
It would still be undefined behaviour, though, and might actually fail with aggressively optimizing compilers.
The C language is unsound, and it's generally up to the programmer to uphold its invariants. Failing to do so will break the implicit contract with the compiler and may result in incorrect code being generated.
It depends on your definition of "valid". You've perfectly described the situation. If you want to consider that "valid", then it's valid. If you don't want to consider that "valid", then it's invalid.

How does creating a dynamically allocated string in C work?

I don't understand how dynamically allocated strings in C work. Below, I have an example where I think I have created a pointer to a string and allocated it 0 memory, but I'm still able to give it characters. I'm clearly doing something wrong, but what?
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
char *str = malloc(0);
int i;
str[i++] = 'a';
str[i++] = 'b';
str[i++] = '\0';
printf("%s\n", str);
return 0;
}
What you're doing is undefined behavior. It might appear to work now, but is not required to work, and may break if anything changes.
malloc normally returns a block of memory of the given size that you can use. In your case, it just so happens that there's valid memory outside of that block that you're touching. That memory is not supposed to be touched; malloc might use that memory for internal housekeeping, it might give that memory as the result of some malloc call, or something else entirely. Whatever it is, it isn't yours, and touching it produces undefined behavior.
Section 7.20.3 of the current C standard states in part:
"If the size of the space requested is zero, the behavior is
implementation defined: either a null pointer is returned, or the
behavior is as if the size were some nonzero value, except that the
returned pointer shall not be used to access an object."
This will be implementation defined. Either it could send a NULL pointer or as mentioned something that cannot be referenced
Your are overwriting non-allocated memory. This might looks like working. But you are in trouble when you call free where the heap function tries to gives the memory block back.
Each malloc() returned chunk of memory has a header and a trailer. These structures hold at least the size of the allocated memory. Sometimes yout have additional guards. You are overwriting this heap internal structures. That's the reason why free() will complain or crash.
So you have an undefined behavior.
By doing malloc(0) you are creating a NULL pointer or a unique pointer that can be passed to free. Nothing wrong with that line. The problem lies when you perform pointer arithmetic and assign values to memory you have not allocated. Hence:
str[i++] = 'a'; // Invalid (undefined).
str[i++] = 'b'; // Invalid (undefined).
str[i++] = '\0'; // Invalid (undefined).
printf("%s\n", str); // Valid, (undefined).
It's always good to do two things:
Do not malloc 0 bytes.
Check to ensure the block of memory you malloced is valid.
... to check to see if a block of memory requested from malloc is valid, do the following:
if ( str == NULL ) exit( EXIT_FAILURE );
... after your call to malloc.
Your malloc(0) is wrong. As other people have pointed out that may or may not end up allocating a bit of memory, but regardless of what malloc actually does with 0 you should in this trivial example allocate at least 3*sizeof(char) bytes of memory.
So here we have a right nuisance. Say you allocated 20 bytes for your string, and then filled it with 19 characters and a null, thus filling the memory. So far so good. However, consider the case where you then want to add more characters to the string; you can't just out them in place because you had allocated only 20 bytes and you had already used them. All you can do is allocate a whole new buffer (say, 40 bytes), copy the original 19 characters into it, then add the new characters on the end and then free the original 20 bytes. Sounds inefficient doesn't it. And it is inefficient, it's a whole lot of work to allocate memory, and sounds like an specially large amount of work compared to other languages (eg C++) where you just concatenate strings with nothing more than str1 + str2.
Except that underneath the hood those languages are having to do exactly the same thing of allocating more memory and copying existing data. If one cares about high performance C makes it clearer where you are spending time, whereas the likes of C++, Java, C# hide the costly operations from you behind convenient-to-use classes. Those classes can be quite clever (eg allocating more memory than strictly necessary just in case), but you do have to be on the ball if you're interested in extracting the very best performance from your hardware.
This sort of problem is what lies behind the difficulties that operations like Facebook and Twitter had in growing their services. Sooner or later those convenient but inefficient class methods add up to something unsustainable.

The state of a pointer after deallocation

What does the pointer of a dynamically allocated memory points to after calling the free() function.
Does the pointer points to NULL, or it still points to the same place it pointed before the deallocation.
does the implementation of free() has some kind of standard for this, or it's implemented differently in different platforms.
uint8_t * pointer = malloc(12);
printf("%p", pointer); // The current address the pointer points to
free (pointer);
printf("%p", pointer); // must it be NULL or the same value as before ?
Edit:
I know that the printf will produce the same results, I just want to know if I can count on that on different implementations.
The pointer value is not modified. You pass the pointer (memory address) by value to free(). The free() function does not have access to the pointer variable, so it cannot set it to NULL.
The two printf() calls should produce identical output.
According to the standard (6.2.4/2 of C99):
The value of a pointer becomes indeterminate when the object it points
to reaches the end of its lifetime.
In practice, all implementations I know of will print the same value twice for your example code. However, it is permitted that when you free the memory, the pointer value itself becomes a trap representation, and the implementation does something strange when you try to use the value of the pointer (even though you don't dereference it).
Supposing that an implementation wants to do something unusual, a hardware exception or program abort would be the most plausible I think. You probably have to imagine an implementation/hardware that does a lot of extra work, though, so that every time a pointer value is loaded into a register, it somehow checks whether the address is valid. That could be by checking it in the memory map (in which case I suppose my hypothetical implementation would only trap if the whole page containing the allocation has been released and unmapped), or some other means.
free() only deallocates the memory. The pointer is still pointing to the old location (dangling pointer), which you should manually set to NULL.
Setting the pointer to NULL is a good practice. As the memory location may be reused for other object, you may be able to access and modify data which doesn't belong to you. This is especially hard to debug, since it won't produce a crash, or produce crash at some point which is irrelevant. Setting to NULL will guarantee a re-producible crash if you ever access the non-existent object.
I tried this code in C++
#include<iostream>
using namespace std;
int main(void)
{
int *a=new int;
*a=5;
cout<<*a<<" "<<a<<"\n";
delete a;
*a=45;
cout<<*a<<" "<<a<<"\n";
}
The output was something like this
5 0x1066c20
45 0x1066c20
Running once more yielded a similar result
5 0x13e3c20
45 0x13e3c20
Hence it seems that in gcc the pointer still points to the same memory location after deallocation.
Moreover we can modify the value at that location.

Resources