C does not have garbage collection, hence whenever we allocate memory using malloc/calloc/realloc, we need to manually free it after its use is over. How are variables of other data types like int, char etc. handled by C? How is the memory allocated to those variables released?
That depends. If you allocate any of those data types with malloc/calloc/realloc you will still need to free them.
On the other side, if a variable is declared inside a function, they are called automatic variables and whenever that function ends they'll be automatically collected.
The point here is not the data type per se, is the storage location. malloc/calloc/realloc allocate memory in the heap whereas automatic variables (variables declared inside functions) are allocated in the stack.
The heap is completely managed by the programmer, while the stack works in a way that when a function ends, the stack frame is shrink and every variable occupying that frame will be automatically overwritten when another function is called.
To grasp a better feeling of these, take a look at the memory layout of a C program. Other useful references might be free(3) man page and Wikipedia page for Automatic variables.
Hope this helps!
Resources (such as memory) have nothing to do with variables. You never have to think about variables. You only have to think about the resource itself, and you need to manage the lifetime of the resource. There are function calls that acquire a resource (such as malloc) and give you a handle for the resource (such as a void pointer), and you have to call another function (such as free) later on with that handle to release the resource.
Memory is only one example, C standard I/O-files work the same way, as do mutexes, sockets, window handles, etc. (In C++, add "dynamically allocated object" to the list.) But the central concept is that of the resource, the thing that needs acquiring and releasing. Variables have nothing to do with it except for the trivial fact that you can use variables to store the resource handles.
Related
For variables store in the stack, we can use static to avoid accessing from other files. Is there anyway to avoid pointer from other files accessing certain address?
First, to get things out of the way, static variables are never allocated on the stack because they are essentially global variables, they simply don't pollute the global namespace. It's trivial to get a pointer to a static variable and change it, statics are a compiler enforced construct.
Back to the actual question though, no you cannot try to examine the memory access directly. How would you even know if the memory you're accessing is valid or not? You can do something along the line though. You can for example, wrap malloc and free with your own memory management functions, and keep track of the memory allocated and freed along with metadata. You can then use another wrapper function that takes care of pointer dereferencing, and checks the metadata as you desire. You still can use raw pointers to wreak havoc if you want, so it isn't really much though.
I'm currently using C/C++. But what is the main reason of using malloc/new instead of just declare some var on the stack. Like: int a;
Or another example:
int *a; // then use this to track an array
int a = (int)malloc(sizeof(int));
Automatic variables have a lifetime that ends when the program leaves the block of code that declares them. Sometimes you want them to live longer than that; dynamic allocation gives you full control over their lifetime. Sometimes they are too big for the stack; dynamic memory is (typically) less restricted.
With that flexibility comes responsibility: you need to delete them when you've finished with them, but not before. This is difficult to get right if you try to hold onto a raw pointer and do it yourself; so (in C++) learn about RAII, and use ready-made management types like smart pointers and containers to do the work for you.
TL;DR: stack and heap are two different memory areas, they serve different purposes, they have their own access pattern (with everything that implies) and policies (yes, stack memory is way more controlled than heap memory on hw-enforced systems).
Stack memory is a precious, limited resource that needs to be allocated contiguously (you can't chunk it as you would for a heap allocation).
As a consequence of this, its default dimension on many x86 platforms is usually around 1 MB (although this can be increased). Definitely small with regard to how much memory you might allocate with a heap allocation. Your question isn't an exact duplicate of this question but I believe you should take a look at it if you're interested in other reasons for the stack being a limited resource.
Other reasons include visibility scopes (stack stuff is collected/destroyed at the end of the scope while heap allocated memory can be used in other parts of your application until you free it).
The C standard function malloc() with its friends (free(), calloc() and realloc()) allows to dynamically allocate arbitrary large (allowed by OS though) chunks of memory in runtime. This has informal name of dynamic storage duration (formal one from N1570 is allocated storage duration) with following characteristics:
lifetime is cotrolled by malloc() and free() calls. This is unlikely to automatic storage duration variables, where their lifetime is restricted by scope (e.g. block, more precisely by { and }, with exception to VLAs) and static storage duration, which have lifetime of whole execution of program
scope is determined by availability of pointer to allocated data
In other words you use malloc() when you need maximum flexibility of data allocation in runtime. Automatic variables are practically limited by stack size and static variables require compile-time reservation for exact (i.e. static) amount of memory.
I know the memory layout of a C program is divided into text, heap, stack, data and bss segments. I think (not sure) that this memory layout alone is the reason behind maintaining the scope and lifetime of variables of different storage classes.
For example, auto variables are stored in the stack. Each time a function call happens, a new stack frame is created which limits the access to called function's auto variables. But they are still inside their associated frame and get into action as soon as the called function returns control.
Thus, we can justify the scope and lifetime of auto variables. But, I want to know which data structures are used in the other segments (viz. data, bss and heap) to maintain such scoping. Or is it something else other than the memory layout that controls the scope and lifetime?
It seems, you confuse cause and effect. The scope and the lifetime of a variable is determined by the language standard. The implementation has to ensure, that the standard is met. It might use some memory layout that is handy on a certain platform, but there is no need to do so.
Memory layout with segments as text or bbs is basically a matter of the execution format, not of the language.
That said, one can answer about the most common case: There is no control of scope or lifetime in other "segments". Data and bss (for initialized and non-initialized global/static variables respectively) are for the duration of the process, and the heap is managed explicitly through malloc and free (until the entire heap is destroyed when the process terminates).
I don't know of "Viz", so I can't answer on this one.
Higher level languages such as javascript don't give the programmer a
choice as to where variables are stored. But C does. My question is:
are there any guidelines as to where to store variables, eg dependent
on size, usage, etc.
As far as I understand, there are three possible locations to store
data (excluding code segment used for actual code):
DATA segment
Stack
Heap
So transient small data items should be stored on the stack?
What about data items which must be shared between functions. These
items could be stored on the heap or in the data segment. How do you
decide which to choose?
You're looking through the wrong end of the telescope. You don't specify particular memory segments in which to store a variable (particularly since the very concept of a "memory segment" is highly platform-dependent).
In C code, you decide a variable's lifetime, visibility, and modifiability based on what makes sense for the code, and based on that the compiler will generate the machine code to store the object in the appropriate segment (if applicable)
For example, any variables declared at file scope (outside of any function) or with the keyword static will have static storage duration, meaning they are allocated at program startup and held until the program terminates; these objects may be allocated in a data segment or bss segment. Variables declared within a function or block without the static keyword have automatic storage duration, and are (typically) allocated on the stack.
String literals and other compile-time constant objects are often (but not always!) allocated in a readonly segment. Numeric literals like 3.14159 and character constants like 'A' are not objects, and do not (typically) have memory allocated for them; rather, those values are embedded directly in the machine code instructions.
The heap is reserved for dynamic storage, and variables as such are not stored there; instead, you use a library call like malloc to grab a chunk of the heap at runtime, and assign the resulting pointer value to a variable allocated as described above. The variable will live in either the stack or a data segment, while the memory it points to lives on the heap.
Ideally, functions should communicate solely through parameters, return values, and exceptions (where applicable); functions should not share data through an external variable (i.e., a global). Function parameters are usually allocated on the stack, although some platforms may pass parameters via registers.
You should prefer local/stack variables to global or heap variables when those variables are small, used often and in a relatively small/limited scope. That will give the compiler more opportunities to optimize the code using them as it'll know they aren't going to change between function calls unless you pass around pointers to them.
Also, the stack is usually relatively small and allocating large structures or arrays on it may lead to stack overflows, especially so in recursive code.
Another thing to consider is the use of global variables in multithreaded programs. You want to minimize chances of race conditions and one strategy for that is maiking functions thread-safe and re-enterant by not using any global resources in them directly (if malloc() is thread-safe, if errno is per-thread, etc you can use them, of course).
Btw, using local variables instead of global variables also improves code readability as the variables are located close to the place where they're used and you can quickly find out their type and where and how they're used.
Other than that, if your code is correct, there shouldn't be much practical difference between making variables local or global or in the heap (of course, malloc() can fail and you should remember about it:).
C only allows you to specify where data is stored indirectly... via the scope of the variable and/or allocation. i.e., a local variable to a function is typically a stack variable unless it is declared static in which case it will likely be DATA/BSS. Variables created dynamically via new/malloc will typically be heap.
However, there's no guarantee of any of that... only the implication of it.
That said, the one thing that is guaranteed to be a bad idea is to declare large local variables in functions... common source of strange errors and stack overflows. Very large arrays and structures are best suited to dynamic allocation and keep the pointers in local/global as required.
Given a pointer to some variable.. is there a way to check whether it was statically or dynamically allocated??
Quoting from your comment:
im making a method that will basically get rid of a struct. it has a data member which is a pointer to something that may or may not be malloced.. depending on which one, i would like to free it
The correct way is to add another member to the struct: a pointer to a deallocation function.
It is not just static versus dynamic allocation. There are several possible allocators, of which malloc() is just one.
On Unix-like systems, it could be:
A static variable
On the stack
On the stack but dynamically allocated (i.e. alloca())
On the heap, allocated with malloc()
On the heap, allocated with new
On the heap, in the middle of an array allocated with new[]
On the heap, within a struct allocated with malloc()
On the heap, within a base class of an object allocated with new
Allocated with mmap
Allocated with a custom allocator
Many more options, including several combinations and variations of the above
On Windows, you also have several runtimes, LocalAlloc, GlobalAlloc, HeapAlloc (with several heaps which you can create easily), and so on.
You must always release memory with the correct release function for the allocator you used. So, either the part of the program responsible for allocating the memory should also free the memory, or you must pass the correct release function (or a wrapper around it) to the code which will free the memory.
You can also avoid the whole issue by either requiring the pointer to always be allocated with a specific allocator or by providing the allocator yourself (in the form of a function to allocate the memory and possibly a function to release it). If you provide the allocator yourself, you can even use tricks (like tagged pointers) to allow one to also use static allocation (but I will not go into the details of this approach here).
Raymond Chen has a blog post about it (Windows-centric, but the concepts are the same everywhere): Allocating and freeing memory across module boundaries
The ACE library does this all over the place. You may be able to check how they do it. In general you probably shouldn't need to do this in the first place though...
Since the heap, the stack, and the static data area generally occupy different ranges of memory, it is possible with intimate knowledge of the process memory map, to look at the address and determine which allocation area it is in. This technique is both architecture and compiler specific, so it makes porting your code more difficult.
Most libc malloc implementations work by storing a header before each returned memory block which has fields (to be used by the free() call) which has information about the size of the block, as well as a 'magic' value. This magic value is to protect against the user accidently deleting a pointer which wasn't alloc'd (or freeing a block which was overwritten by the user). It's very system specific so you'd have to look at the implementation of your libc library to see exactly what magic value was there.
Once you know that, you move the given pointer back to point at header and then check it for the magic value.
Can you hook into malloc() itself, like the malloc debuggers do, using LD_PRELOAD or something? If so, you could keep a table of all the allocated pointers and use that. Otherwise, I'm not sure. Is there a way to get at malloc's bookkeeping information?
Not as a standard feature.
A debug version of your malloc library might have some function to do this.
You can compare its address to something you know to be static, and say it's malloced only if it's far away, if you know the scope it should be coming from, but if its scope is unknown, you can't really trust that.
1.) Obtain a map file for the code u have.
2.) The underlying process/hardware target platform should have a memory map file which typically indicates - starting address of memory(stack, heap, global0, size of that block, read-write attributes of that memory block.
3.) After getting the address of the object(pointer variable) from the mao file in 1.) try to see which block that address falls into. u might get some idea.
=AD