Pointers, malloc and Compilation - c

I'm trying to figure out how pointers work when it comes to declaring them and allocating memory for them. I kind of know how they work, but I'm still getting confused, and I'm not sure whether it's because of my compiler or something else.
I currently use CodeBlocks with the GNU GCC compiler (the default), and this is the code I'm running:
#include <stdio.h>
int main()
{
int a = 2;
int *b = 5;
printf("%d\n", a);
printf("%d\n", b);
}
The problem is that both of these print the correct result. Why would I need to use malloc if I can just write *b = 5 and declare it like that? Isn't the purpose of malloc to allocate memory for a pointer so that you can store a value in it afterwards?
Is it the compiler's fault that it allows this to compile, or am I just not getting the point of malloc?

The problem is that both of these print the correct result. Why would I need to use malloc if I can just write *b = 5 and declare it like that? Isn't the purpose of malloc to allocate memory for a pointer so that you can store a value in it afterwards?
You can make a pointer point to existing memory when you already know what data or code is there. In your case, you're just making the pointer point to address 5, and you have no idea what's at memory address 5.
I think you're slightly confused about what memory allocation actually does. It doesn't magically make memory appear for you to use; it reserves a region of memory (from what's called the heap) that your program is then allowed to read and write. You allocate memory so that you can use that memory region. If you don't allocate memory and just access some random memory region, you have no idea what will happen. Most likely the program will just crash, but you could also be reading or overwriting information that is critical to the process.
Let's assume a call to malloc returns the address 0xCAFE. That means this address, or region of memory, has been made available to us. But if we know 0xCAFE already exists in the process's memory, couldn't we just make a pointer point to it and use it without calling malloc? No, because you have no idea whether that region is accessible, whether it's already in use (allocated by a previous call to malloc), or whether it will be used in the future (allocated by a later call to malloc), in which case you'll end up with corrupted memory.

The fact that the result is what you expect doesn't mean the code is correct; your program is ill-formed. In int *b = 5, you are assigning the value 5 to a pointer, so it will be interpreted as a memory address. That's what pointers are for: they hold a memory address so that you can use it to access the data and, for instance, pass it as a function argument so that the data can be manipulated.
If you just want to store an int, use an int variable. Converting an integer to a pointer without a cast is a constraint violation, so the compiler has to issue a diagnostic (GCC at least warns); even with a cast it doesn't make much sense here, and dereferencing the pointer would invoke undefined behavior, so you can't really use it as a pointer, which is what it is.
You use malloc so that the system gives your program (and that specific pointer) a usable memory address where you can store data for later use; 5 is almost certainly not such an address.
printf("%d\n", b) is also incorrect: the conversion specifier for printing a pointer value, i.e. a memory address, is %p, and passing a pointer where %d expects an int is undefined behavior. The correct expression would be:
printf("%p\n", (void*)b);
C gives the programmer leeway to do things that other programming languages don't allow. That is an advantage, but it can also be a problem: programs that compile and seem to run properly may still be broken. When an ill-formed program compiles and runs, its behavior falls into the category of undefined behavior.
This is defined in the standard and gives compilers the freedom to treat the code in any way they see fit, which includes producing a seemingly correct result. The problem is that this may work today and crash tomorrow, or vice versa; it is completely unreliable.

Related

How can a memory block created using malloc store more memory than it was initialized with? [duplicate]

when I try the code below it works fine. Am I missing something?
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *p;
p=malloc(sizeof(int));
printf("size of p=%zu\n",sizeof(p));
p[500]=999999;
printf("p[500]=%d\n",p[500]);
return 0;
}
I tried it with malloc(0*sizeof(int)) and other sizes, but it works just fine. The program only crashes when I don't use malloc at all. So even if I allocate 0 bytes for the array p, it still stores values properly. So why am I even bothering with malloc?
It might appear to work fine, but it isn't safe at all. By writing data outside the allocated block of memory, you are overwriting data you shouldn't. This is one of the most common causes of segfaults and other memory errors, and the fact that it appears to work in a short program like this is exactly what makes the root cause so difficult to hunt down.
Read this article, in particular the part on memory corruption, to begin understanding the problem.
Valgrind is an excellent tool for analysing memory errors such as the one you provide.
@David made a good comment. Compare the results of running your code with running the following code. Note that the latter results in a runtime error (with pretty much no useful output!) on ideone.com, whereas the former succeeds, as you experienced.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *p;
p=malloc(sizeof(int));
printf("size of p=%zu\n",sizeof(p));
p[500]=999999;
printf("p[500]=%d\n",p[500]);
p[500000]=42;
printf("p[500000]=%d\n",p[500000]);
return 0;
}
If you don't allocate memory, p contains garbage, so writing through it will likely fail. Once you've made a valid malloc call, p points to a valid memory location and you can write within the allocated bounds. Here you are writing to memory you shouldn't be writing to, but nobody is going to hold your hand and tell you about it. If you run your program under a memory debugger such as Valgrind, it will tell you.
Welcome to C.
Writing past the end of your memory is Undefined Behaviour™, which means that anything could happen, including your program behaving as if what you just did were perfectly legal. The reasons your program runs as if you had done malloc(501*sizeof(int)) are completely implementation-specific, and can indeed depend on anything, including the phase of the moon.
This is because p is assigned some address no matter what size you pass to malloc(). With a zero size you would be referencing memory that hasn't been allocated, but it may lie in a location that doesn't cause the program to crash, although the behavior is still undefined.
If you do not use malloc() at all, p points to some garbage location, and trying to access it is likely to crash the program.
I tried it with malloc(0*sizeof(int))
According to C99, if the size passed to malloc is 0, the C runtime can either return a null pointer or behave as if the request were for a non-zero size, except that the returned pointer must not be dereferenced. So it is implementation-defined (e.g. some implementations return a zero-length buffer). In your case you do not get a null pointer back, but you are using a pointer you should not be using. If you try it with a different runtime, you could get a null pointer back.
When you call malloc() a small chunk of memory is carved out of a larger page for you.
malloc(sizeof(int));
This call does not actually allocate just 4 bytes on a 32-bit machine: the allocator pads the request up to a minimum size and adds the heap metadata used to track the chunk through its lifetime (chunks are placed in bins based on their size and marked in-use or free by the allocator). See hxxp://en.wikipedia.org/wiki/Malloc, or more specifically hxxp://en.wikipedia.org/wiki/Malloc#dlmalloc_and_its_derivatives if you're testing this on Linux.
So writing beyond the bounds of your chunk doesn't necessarily mean you are going to crash. At p[500] you are not writing outside the bounds of the page allocated for that initial chunk, so you are technically writing to a valid, mapped address. Welcome to memory corruption.
http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=heap+overflows
Our CheckPointer tool can detect this error. It knows that p was allocated a chunk of 4 bytes, so when the assignment is made, it sees that the target lies outside the area allocated for p. It will tell you that the p[500] assignment is wrong.

Segfault when dereferencing a custom mem address (C)

I want to declare a pointer, have it hold a custom address and then assign a value to it:
int main(void)
{
char *ptr;
ptr = (char *)0x123123; //the assignment works perfectly with a cast
printf("%p\n", (void *)ptr); //and the pointer indeed holds the address it's supposed to
*ptr = 'a'; //but this breaks
puts("2");
}
Initially I thought the reason was that I'm trying to dereference uninitialized memory. But I doubt that's actually the case, since some_type *some_ptr = &some_variable; works flawlessly, so the problem must be the address I assign to it.
Then I thought that, in the same way that 3, 'a' and "alpine" are constants, (char *) 0x123123 must be a constant too, and consts can't be modified in C. But that can't be it either, because an attempt to change a const value will not compile.
3rd assumption would be that such an address must be unavailable, but this doesn't make sense either, because line 4 works always, no matter the address I give or the type of the pointer.
3rd assumption would be that such an address must be unavailable,
That is correct: on modern OSes (which all have memory protection) you can't write to an arbitrary memory address.
It used to be possible to access any memory on OSes that didn't use virtual memory (such as MS-DOS), but allowing that is generally a very bad idea: it allowed any program to corrupt the OS state, and required very frequent reboots.
but this doesn't make sense either, because line 4 works always, no matter the address I give or the type of the pointer.
You confuse two distinct operations: printing an address (allowed no matter what that address is) and dereferencing an address, i.e. reading or modifying the value stored at the address (only allowed for valid addresses).
The distinction is similar to "can you print an address?" (e.g. "123 Main Street, SomeTown, SomeCountry"), and "can you enter a house at that address?" (not possible for above address because there is no "SomeCountry" on Earth). Even if the address is valid, e.g. "1600 Pennsylvania Ave NW, Washington, DC 20500", you may still not be allowed to enter it.
The OP clarified elsewhere that this is actually an XY problem.
The X problem: reading/writing to arbitrary memory locations.
The Y problem: implementing a linked list that uses consecutive memory.
Of course, the answer to that is that you have to implement your own complete memory management system to get there.
That is: first, you use malloc() to acquire a large block of contiguous memory. Then you can use arbitrary pointers within that block. But of course, your code has to track which addresses are already in use, and correctly "free" them again when list nodes get deleted.
The tricky part is about handling the corner cases, such as: what happens when your last "pointer" gets used up? Do you malloc() a larger area, and move all data in memory?
Finally, consider managing a single array instead of a raw block of memory (linked list implementations are often based on arrays, as that makes some things much easier).
Writing to an arbitrary memory address is dangerous and not allowed by modern operating systems; it's better to allocate a block of memory and write to that.
For example, using malloc:
ptr = malloc(32); // now you can write anywhere in these 32 bytes, and it's perfectly legal
if (ptr != NULL) *ptr = 'a';

Does free() remove the data stored in the dynamically allocated memory?

I wrote a simple program to test the contents of a dynamically allocated memory after free() as below. (I know we should not access the memory after free. I wrote this to check what will be there in the memory after free)
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *p = (int *)malloc(sizeof(int));
*p = 3;
printf("%d\n", *p);
free(p);
printf("%d\n", *p);
}
output:
3
0
I thought it would print either junk values or crash at the second print statement. But it always prints 0.
1) Does this behaviour depend on the compiler?
2) If I try to deallocate the memory twice using free(), a core dump is generated. The man page says the program's behaviour is abnormal, but I always get a core dump. Does this behaviour also depend on the compiler?
Does free() remove the data stored in the dynamically allocated memory?
No. free just frees the allocated space pointed to by its argument. It accepts a pointer (void *) to a previously allocated memory chunk and frees it, that is, adds it to the list of free memory chunks that may be re-allocated.
The freed memory is not cleared/erased in any manner.
You should not dereference the freed (dangling) pointer. The standard says:
7.22.3.3 The free function:
[...] Otherwise, if the argument does not match a pointer earlier returned by a memory management
function, or if the space has been deallocated by a call to free or realloc, the behavior is undefined.
The above quote also states that freeing a pointer twice invokes undefined behavior. Once UB is in play, you may get expected or unexpected results; the program may crash or dump core.
As described on the GNU website:
Freeing a block alters the contents of the block. Do not expect to find any data (such as a pointer to the next block in a chain of blocks) in the block after freeing it.
So accessing a memory location after freeing it results in undefined behaviour, and free may or may not change the data at that location. You may be getting 0 in this example; you might just as well get garbage in another.
And if you try to deallocate the memory twice, on the second attempt you are trying to free memory that is no longer allocated, which is why you are getting the core dump.
In addition to all the above explanations for use-after-free semantics, you really may want to investigate the life-saver for every C programmer: valgrind. It will automatically detect such bugs in your code and generally save your behind in the real world.
Coverity and all the other static code checkers are also great, but valgrind is awesome.
As far as standard C is concerned, it’s just not specified, because it is not observable. As soon as you free memory, all pointers pointing there are invalid, so there is no way to inspect that memory.*)
Even if you happen to have some standard C library documenting a certain behaviour, your compiler may still assume pointers aren’t reused after being passed to free, so you still cannot expect any particular behaviour.
*) I think, even reading these pointers is UB, not only dereferencing, but this doesn’t matter here anyway.

Why are the contents pointed to by a pointer not changed when memory is deallocated using free()?

I am a newbie when it comes to dynamic memory allocation. When we free the memory using void free(void *ptr) the memory is deallocated but the contents of the pointer are not deleted. Why is that? Is there any difference in more recent C compilers?
Computers don't "delete" memory as such, they just stop using all references to that memory cell and forget that anything of value is stored there. For example:
int* func (void)
{
int x = 5;
return &x;
}
printf("%d", *func()); // undefined behavior
Once the function has finished, the program stops reserving the memory location where x was stored, and any other part of the program (or perhaps another program) is free to use it. So the above code could print 5, print garbage, or even crash the program: referencing the contents of a memory cell that has ceased to be valid is undefined behavior.
Dynamic memory is no exception to this and works in the same manner. Once you have called free(), the contents of that part of the memory can be used by anyone.
Also, see this question.
The thing is that accessing memory after it has been freed is undefined behavior. It's not only that the memory contents are undefined; accessing them could lead to anything. Some compilers actually change the contents of freed memory in debug builds to aid in debugging, but in release builds that's generally unnecessary, so the memory is just left as is. Either way, it's not something you can safely rely on: don't access freed memory, it's unsafe!
In C, parameters are passed by value. So free just can't change the value of ptr.
Any change it would make would only change the value within the free function, and won't affect the caller's variable.
Also, changing it wouldn't help much. There can be multiple pointers pointing to the same piece of memory, and they would all have to be reset when freeing. The language can't keep track of them all, so it leaves it to the programmer to handle the pointers.
This is perfectly normal, because clearing the memory on free is overhead and generally not necessary. If you have security concerns, you can wrap the free call in a function that clears the region before freeing it. You'll notice that this requires knowing the allocation size, which is another overhead.
Actually, the C language specifies that after the lifetime of an object ends, the value of any pointer pointing to it becomes indeterminate, i.e. you can't even depend on the pointer retaining its original value.
One reason is that a good compiler will aggressively try to keep variables in CPU registers instead of memory. So once it sees the program call a function named free with the argument ptr, it can reuse the register holding ptr for other purposes until ptr is assigned again, for example with ptr = malloc(42);.
In between these two points, ptr could appear to change its value, compare unequal to its original value, or show other similar behaviour. Here's an example of what might happen.

Allocating less space than necessary for a certain type?

I am relatively new to C programming and having a hard time understanding the whole memory allocation issue.
Let's say, I do:
int *n = malloc(sizeof(char));
// (assuming malloc doesn't return NULL of course)
That gives me a pointer to int, but I didn't allocate enough memory for an int. Why does it work then? I could even cast it to int * explicitly and gcc wouldn't complain. I am aware that C compilers are very minimalist, but even if I assign a value to *n that doesn't fit in a char, like:
*n = 300;
... and print it out afterwards:
printf("%d", *n);
... it works perfectly fine, although by this point I'd have expected some error like a segmentation fault.
I mean, sizeof(char) is 1 and sizeof(int) is 4 on my machine. Hence 3 bytes are written to some place in memory which hasn't been allocated properly.
Does it work just because it doesn't leave the stack?
Could somebody please point me to a place where I might find enlightenment concerning that stuff?
That gives me a pointer to int, but I didn't allocate enough memory for an int. Why does it work then?
The return value of malloc is void *, which the language allows to be implicitly converted to any object pointer type, in this case int *. Compilers don't typically check that the size you pass to malloc meets a particular requirement; in real-world code that can be very difficult (when non-constant sizes not known at compile time are passed to malloc). As you said, C compilers are usually rather minimalist. There are "static analysis" tools that analyze code to try to find these bugs, but that's a whole different class of tool from a compiler.
... it works perfectly fine, although by this point I'd have expected some error like a segmentation fault. I mean, sizeof(char) is 1 and sizeof(int) is 4 on my machine. Hence 3 bytes are written to some place in memory which hasn't been allocated properly.
Writing beyond the bounds of allocated memory is what is called "undefined behavior". That means that a compliant compiler can do whatever it wants when that happens. Sometimes it will crash, sometimes it can write over some other variable in your program, sometimes nothing will happen, and sometimes nothing will seem to happen and your program will crash at a later date.
In this particular case, what's happening is that most implementations of malloc allocate a minimum of 16 bytes (or some other minimum, such as 8 or 32) even if you ask for less. So when you write past your single allocated byte, you're writing into "extra" memory that isn't being used for anything. You should definitely not rely on that behavior in any real program.
Does it work just because it doesn't leave the stack?
The stack has nothing to do with this particular situation.
Could somebody please point me to a place where I might find enlightenment concerning that stuff?
Any good C book will have information of this type, take a look here: The Definitive C Book Guide and List
Generally, a 32-bit machine will allocate new memory on a 32-bit boundary, because that makes memory access faster.
So it has allocated one byte, but the next 3 bytes are unused.
Don't rely on this!
