I am wondering whether memory dynamically allocated with malloc is global. I have read online that memory allocated with malloc is stored on the heap. I have also read that all global variables are stored on the heap. Wouldn't this mean that dynamically allocated memory can be accessed globally? For example, I receive an error with the following code:
#include <stdio.h>
#include <stdlib.h>
void my_func(void)
{
printf("Pointer variables is: %d\n", *ptr);
}
int main()
{
int *ptr = (int *)malloc(sizeof(int));
*ptr = 5;
my_func();
return 0;
}
However, when I run the following code with a global variable, there is no error:
#include <stdio.h>
int var = 5;
void my_func(void)
{
printf("Global variable is: %d\n", var);
}
int main()
{
my_func();
return 0;
}
You can access the memory allocated by malloc from anywhere in the program as long as you have not freed it. I think that is what you mean by global.
But ptr is a local variable of pointer type that points to the allocated memory. Its name is visible only inside main, so you have to pass it as a parameter to any other function that needs it.
The memory and the pointer variable are two different concepts.
First, C has no global scope. “Global” means a name (an identifier) can be defined once and will be known throughout the program. In C, for a name to be known in multiple translation units, you must declare it in each translation unit where it is to be known (and define it in one of them) and link the translated files together.
Second, it would only make sense to speak of names as global or as having linkage (the property about linking different declarations of a name to the same object or function) or scope (where in a program a name is visible, meaning able to be used). Memory does not have scope or linkage. It might be said to be global in the sense it is accessible throughout the entire program, but “global” is not the right word for this since that is about visibility of names.
Third, “on the heap” is slang and should be avoided. Memory is dynamically allocated. (The C standard uses just “allocated,” but “dynamically allocated” is more explicit and is clearer in other contexts.) This slang arose because early memory management software would keep records about free blocks of memory in a heap data structure. When memory was allocated, if it could be satisfied by an existing free block, that block would be removed from the heap and given to the calling routine for its use. So allocated memory is actually taken off the heap; it is not on the heap. And modern memory managers may use diverse data structures to hold their records, either with or without heaps.
The typical memory model for a program is that all of its memory is accessible throughout the program. When memory is reserved for some use, whether by malloc or by other means, that memory may be used by any software in the program that has the address of that memory. Some memory is limited in how it may be used. For example, some may hold initialized data and be marked read-only, so that it cannot be modified. Other memory may hold program instructions and be marked as executable, so it can be executed (by a jump instruction or other instruction that transfers program control to that memory), whereas other memory in the program cannot be executed. However, these limitations generally apply to all software seeking to access memory in one way or another, unless special provisions are made (such as by calling operating system routines to change the protections).
In your program, ptr is not declared before my_func. Because of this, it is not visible inside my_func. This means the name ptr is not usable. It has nothing to do with the memory that ptr points to. To make the name ptr visible inside my_func, you must declare it prior to using it. One way to do this would be to declare an external variable (here, “external” means outside of any function):
int *ptr; // External declaration (and tentative definition).
void my_func(void)
{
printf("Pointer variables is: %d\n", *ptr);
}
int main()
{
ptr = malloc(sizeof *ptr); // Changed from declaration to assignment.
*ptr = 5;
my_func();
return 0;
}
Another way is to declare it as a function parameter:
void my_func(int *ptr)
{
printf("Pointer variables is: %d\n", *ptr);
}
int main()
{
int *ptr = malloc(sizeof *ptr);
*ptr = 5;
my_func(ptr);
return 0;
}
In this case, `void my_func(int *ptr)` declares a **different** `ptr` from the one in `main`. There are two variables named `ptr` in this program, and they are not linked together. The one in `main` is given a value in `main`. Then the call `my_func(ptr)` passes the value of this `ptr` to `my_func`. When `my_func` starts executing, a new variable named `ptr` is created and is given the value passed as the argument.
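To make the "two different variables" point concrete, here is a minimal sketch. The names `set_and_clear` and `demo_parameter_copy` are hypothetical and chosen only for this illustration. The callee changes the allocated `int` through its copy of the pointer, then overwrites its own copy, and the caller's pointer is unaffected.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: receives a copy of the caller's pointer value. */
static void set_and_clear(int *ptr)
{
    *ptr = 6;    /* changes the allocated int, which the caller can also reach */
    ptr = NULL;  /* changes only this function's own variable named ptr */
}

/* Returns the value the caller sees after the call, or -1 if malloc fails. */
int demo_parameter_copy(void)
{
    int *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        return -1;
    *ptr = 5;
    set_and_clear(ptr);
    /* This ptr still holds the address returned by malloc. */
    int seen = *ptr;
    printf("caller sees %d\n", seen);
    free(ptr);
    return seen;
}
```

Calling `demo_parameter_copy()` from `main` should report 6: the memory is shared between the two functions, while each variable named `ptr` belongs to its own function.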
Bonus: I changed `(int *)malloc(sizeof(int));` to `malloc(sizeof *ptr);`. In C, unlike C++, it is not necessary to cast the result of `malloc`, and it is recommended not to because doing so can conceal the error of failing to use `#include <stdlib.h>`. Also, `malloc(sizeof *ptr)` says to allocate space for one of whatever type `ptr` points to. With `malloc(sizeof(int))`, an error can occur if somebody changes the type of `ptr` but forgets to find all places that type is used with `ptr` and change them too. With `malloc(sizeof *ptr)`, appropriate space will be allocated even if the type of `ptr` is changed with no other edits.
In most managed languages (that is, the ones with a GC), local variables that go out of scope are inaccessible and have a higher GC-priority (hence, they'll be freed first).
Now, C is not a managed language, what happens to variables that go out of scope here?
I created a small test-case in C:
#include <stdio.h>
int main(void){
int *ptr;
{
// New scope
int tmp = 17;
ptr = &tmp; // Just to see if the memory is cleared
}
//printf("tmp = %d", tmp); // Compile-time error (as expected)
printf("ptr = %d\n", *ptr);
return 0;
}
I'm using GCC 4.7.3 to compile and the program above prints 17, why? And when/under what circumstances will the local variables be freed?
The actual behavior of your code sample is determined by two primary factors: 1) the behavior is undefined by the language, 2) an optimizing compiler will generate machine code that does not physically match your C code.
For example, despite the fact that the behavior is undefined, GCC can (and will) easily optimize your code to a mere
printf("ptr = %d\n", 17);
which means that the output you see has very little to do with what happens to any variables in your code.
If you want the behavior of your code to better reflect what happens physically, you should declare your pointers volatile. The behavior will still be undefined, but at least it will restrict some optimizations.
Now, as to what happens to local variables when they go out of scope. Nothing physical happens. A typical implementation will allocate enough space in the program stack to store all variables at the deepest level of block nesting in the current function. This space is typically allocated in the stack in one shot at the function startup and released back at the function exit.
That means that the memory formerly occupied by tmp remains reserved in the stack until the function exits. That also means that the same stack space can (and will) be reused by different variables having approximately the same level of "locality depth" in sibling blocks. The space will hold the value of the last variable until some other variable declared in a sibling block overwrites it. In your example nothing overwrites the space formerly occupied by tmp, so you will typically see the value 17 survive intact in that memory.
However, if you do this
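#include <stdio.h>
int main(void) {
    volatile int *ptr;
    volatile int *ptrd;
    { // Block
        int tmp = 17;
        ptr = &tmp; // Just to see if the memory is cleared
    }
    { // Sibling block
        int d = 5;
        ptrd = &d;
    }
    // Undefined behavior: both objects are dead by now; for demonstration only
    printf("ptr = %d %d\n", *ptr, *ptrd);
    printf("%p %p\n", (void *)ptr, (void *)ptrd); // %p expects a void *
}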
you will typically see that the space formerly occupied by tmp has been reused for d and its former value has been overwritten. The second printf will typically output the same pointer value for both pointers.
The lifetime of an automatic object ends at the end of the block where it is declared.
Accessing an object outside of its lifetime is undefined behavior in C.
(C99, 6.2.4p2) "If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime."
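For contrast, a well-defined variant of the question's test copies the value out of the inner block instead of keeping the address. This is only a sketch, and the function name `copy_out_before_scope_ends` is made up for the illustration.

```c
#include <stdio.h>

/* Keeps the value of tmp, not its address, so nothing dangles. */
int copy_out_before_scope_ends(void)
{
    int saved;         /* lives until the function returns */
    {
        int tmp = 17;  /* lifetime ends at the closing brace */
        saved = tmp;   /* copy the value while tmp is still alive */
    }
    printf("saved = %d\n", saved);
    return saved;
}
```

Here the 17 is guaranteed by the language rather than by what happens to be left in the stack, because `saved` is still within its lifetime when it is read.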
Local variables are typically allocated on the stack. They are not "freed" in the sense used for GC languages or for memory allocated on the heap. They simply reach the end of their lifetime, and in C the generated code does nothing further at that point (in C++, the destructor would be called for objects that have one).
Accessing them beyond their lifetime is Undefined Behaviour. You were just lucky, as no other code has overwritten that memory area...yet.
I'm currently learning the Linux process address space and I'm not sure where these C variables correspond in the process address space.
I know that when a function is called, a new frame is created, it'll contain local variables and other function calls etc..
What I am not sure about is the pointers that are in the frame:
I have this function:
int main(){
char *pointer1 = NULL;
char *pointer2 = (void *)0xDDDDDDDD;
pointer1 = malloc(80);
strcpy(pointer1, "Testing..");
return(0);
}
When main is called, a new frame is created.
Variables are initialized.
What I am not sure about is the pointers:
Where does *pointer1 correspond to in the process address space: the data or the text section?
Where does *pointer2 correspond to in the process address space: the data or the text section?
Do NULL and 0xDDDDDDDD belong to the data or the text section?
Since pointer1 = malloc(80), does it belong to the stack section?
First of all it should be noted that the C specification doesn't actually require local variables to be stored on a stack, it doesn't specify location of automatic variables at all.
With that said, the storage for the variables pointer1 and pointer2 themselves will most likely be put on a stack by the compiler. Memory for them will be part of the stack-frame created by the compiler when the main function is called.
To continue, on modern PC-like systems a pointer is really nothing more than a simple unsigned integer, and its value is the address where it points. The values you use for the initialization (NULL and 0xDDDDDDDD) are simply plain integer values. The initialization is done just the same as for a plain int variable. As such, the values used for initialization don't really exist as "data"; instead they can be encoded directly in the machine code, in which case they are stored in the "text" (code) segment.
Lastly, the dynamic allocation doesn't change where pointer1 is stored. What it does is simply assign a new value to pointer1. The memory being allocated is on the "heap", which is separate from any program section (i.e. it's in neither the code, data, nor stack segment).
As some programmer dude just said, the C spec does not state a region where automatic variables must be placed. But it is usual for compilers to grow the stack to accommodate them there. However, they might end up in the .data region, and they will if they are defined as, e.g., static char *pointer1 instead.
The initialization values may or may not exist in a program region either. In your case, since the values are plain integer constants, most architectures will encode the initialization directly in the machine instructions, if instructions with suitable immediate operands are available. On x86_64, for example, a single mov/movq instruction will be issued to put the 0 (NULL) or the other constant in the appropriate memory location on the stack.
However, variables with static storage duration that have an initializer, such as static char string[40] = "Hello world" or other initialized global variables, end up in the .data region and take up space there. Compilers may place global variables that are declared without an initializer in the .bss region instead.
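One way to look at this on a real system is to print one address for each kind of storage. The sketch below is only illustrative: all names in it are hypothetical, the section names in the comments describe typical Linux toolchains rather than anything the C standard requires, and the addresses printed will differ from run to run and from system to system.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;  /* has an initializer: typically placed in .data */
int uninitialized_global;     /* no initializer: typically placed in .bss, starts at 0 */

/* Prints one address per kind of storage. Returns the value of the
   uninitialized global, or -1 if the allocation fails. */
int show_regions(void)
{
    int local = 1;                           /* automatic: typically on the stack */
    static int local_static = 2;             /* static: typically in .data */
    int *dynamic = malloc(sizeof *dynamic);  /* allocated: the "heap" */

    if (dynamic == NULL)
        return -1;
    *dynamic = 3;

    printf("initialized global   %p\n", (void *)&initialized_global);
    printf("uninitialized global %p\n", (void *)&uninitialized_global);
    printf("static local         %p\n", (void *)&local_static);
    printf("automatic local      %p\n", (void *)&local);
    printf("allocated            %p\n", (void *)dynamic);

    free(dynamic);
    return uninitialized_global;
}
```

On a typical Linux build the first three addresses tend to sit close together, while the automatic variable and the allocated block usually land in clearly different ranges. The return value is 0 because objects with static storage duration and no initializer start out zeroed.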
The question since pointer1 = malloc(80), does it belong to the stack section? is ill-defined, because it comprises two things.
The value of pointer1 is stored at the address &pointer1, which, given the above considerations, the compiler will most likely have placed on the stack.
The result of malloc(80) is a value that refers to a region on the heap, a different region that is dynamically allocated outside the areas mapped from the program file.
On Linux, calling malloc may even create a new anonymous memory region (that is, a transient region that is not backed by a file, although the kernel may swap it out).
In essence, you could think of how malloc(80) behaves, as something like (not taking free() into consideration, so this is an oversimplification):
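#include <stddef.h>
#include <sys/mman.h>
#define MALLOC_CHUNK_LENGTH (1024 * 1024) /* chunk size chosen arbitrarily for this sketch */
static size_t space_left = 0;
static char *last_mapping = NULL;
/* Named my_malloc so that it does not clash with the real malloc.
   Ignores alignment and refuses any request larger than one chunk. */
void *my_malloc(size_t req) {
    void *result;
    if (req > MALLOC_CHUNK_LENGTH)
        return NULL;
    if (space_left < req) {
        void *chunk = mmap(NULL, MALLOC_CHUNK_LENGTH, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (chunk == MAP_FAILED)
            return NULL;
        last_mapping = chunk;
        space_left = MALLOC_CHUNK_LENGTH;
    }
    space_left -= req;
    result = last_mapping;
    last_mapping += req;
    return result;
}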
The huge difference between calling malloc and mmap with MAP_PRIVATE is that mmap is a Linux System Call, which must make a kernel context switch, allocate a new memory map and reset the MMU layer for every memory chunk allocated, while malloc can be more intelligent and use a single big region as "heap" and manage the different malloc's and free's in userspace after the heap initialization (until the heap runs out of space, where it might have to manage multiple heaps).
Regarding the last part of your question, "since pointer1 = malloc(80), does it belong to the stack section?", I can tell you the following.
In C, dynamic memory is allocated from the heap using some standard library functions. The two key dynamic memory functions are malloc() and free().
The malloc() function takes a single parameter, which is the size of the requested memory area in bytes. It returns a pointer to the allocated memory. If the allocation fails, it returns NULL. The prototype for the standard library function is like this:
void *malloc(size_t size);
The free() function takes the pointer returned by malloc() and de-allocates the memory. No indication of success or failure is returned. The function prototype is like this:
void free(void *pointer);
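The two prototypes above are usually used as a pair: check the result of malloc() for NULL, use the memory, then hand the same pointer to free(). A minimal sketch follows; the helper name `copy_string` is hypothetical and exists only for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of s, or NULL if
   malloc fails. The caller owns the result and must pass it to free(). */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;              /* allocation failed: nothing to free */
    memcpy(copy, s, len);
    return copy;
}
```

A caller would write something like `char *p = copy_string("Testing..");`, use `p` only if it is not NULL, and call `free(p)` once it is finished with the copy.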
For more detail, you can refer to this article:
https://www.design-reuse.com/articles/25090/dynamic-memory-allocation-fragmentation-c.html
After ignoring C for my entire CS career I have decided to give it a look!
When initialising variables, we can have :
int b = 0;
This initialises b, allocates memory for it, and we can later update it with
b = 2;
if needs be.
So, and forgive me for this ridiculously "noob" question but why do we need calls like :
double *b = (double *) calloc(n, sizeof(double));
when initialising the variable would allocate the space for it already?
Why can we not just do
double b = 0;
b* = b.addressOf(b) //or some similar construct.
What is the use of this?
I have tried Googling this to no avail, so please forgive me - unfortunately * is a wildcard in Google, so relevant results are hard to find.
Variables declared in the current context end their lifetime at the end of the context.
Allocating memory gives you space to store longer-lived variables.
For example,
double *foo() {
double d;
return &d;
}
void bar() {
double *d = foo();
*d = 0.0;
}
will try to access a variable that no longer exists, because its lifetime ended when foo returned.
C and C++ do not keep track of objects. A pointer only points to the object, but does not extend object lifetime, so it is entirely possible for a pointer to be invalid even if it is not NULL.
However, this is valid:
double *foo() {
return (double *)malloc(sizeof(double));
}
void bar() {
double *d = foo();
*d = 0.0;
}
This will allocate memory for a double, and return the pointer to the memory, which remains valid until explicitly returned to the pool using the free function. Not returning it to the pool will create a memory leak.
Unless I'm totally mistaken, in C, calloc or malloc are the only possibilities to implement dynamic data structures.
When it comes about variable allocation you can do it like:
Automatically on the stack, simply: int a = 10. These variables typically live on the stack, next to bookkeeping data such as the return address of the current function (this is why it can be dangerous to write to an array declared on the stack without properly checking the boundaries: you might overwrite that data). The variables also have a scope, such as the body of a function or a nested block (for example the if-branch of an if-else). They are fast to use, however they are more or less ... static, and they have the big advantage that you don't need to clean them up. They are automatically cleaned up by the application. However, they have a great disadvantage: stack space is more limited than heap space, so you can use only modest-sized variables (don't take it literally; instead do some research on what your OS allows. 64KB is not enough for everyone :) ).
Dynamically on the heap, using either calloc() or some other memory allocation function. This memory is allocated in an area known as the heap, or dynamic memory. It stays allocated either until the application using it exits (in which case a modern OS usually reclaims the memory), or until it is freed using free(). You should always free the memory to avoid memory leaks. Dynamic memory has the advantage that (on a modern OS) the addressable memory is much bigger than the size allocated to stack space, so you can have more and bigger structures and arrays.
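The calloc call from the question is useful mainly when the number of elements is known only at run time. As a rough sketch (the function name `sum_first_n` is invented for this example), the following allocates n doubles, fills them, sums them, and frees the array again.

```c
#include <stdlib.h>

/* Stores 0, 1, ..., n-1 in a dynamically allocated array and returns
   their sum, or -1.0 if the allocation fails. */
double sum_first_n(size_t n)
{
    double *b = calloc(n, sizeof *b);  /* n is chosen by the caller at run time */
    if (b == NULL)
        return -1.0;

    for (size_t i = 0; i < n; i++)
        b[i] = (double)i;

    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += b[i];

    free(b);  /* the array is no longer needed */
    return total;
}
```

For example, `sum_first_n(5)` adds 0 + 1 + 2 + 3 + 4 and returns 10. A plain `double b = 0;` could not do this, because it reserves room for exactly one value.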
Scope is the region or section of the code where a variable can be accessed. There can be
File Scope
Function Scope
Block Scope
Program Scope
Prototype Scope
Example
#include<stdio.h>
void function1()
{
printf("In function1\n");
}
static void function2()
{
printf("In function2\n");
{
int i = 100;
Label1:
printf("The value of i =%d\n",i);
i++;
if(i<105)
goto Label1;
}
}
void function3(int x, int y);
int main(void)
{
function1();
function2();
return 0;
}
In the example,
‘function1()’ has ‘Program Scope’ (in the terms of the C standard: file scope with external linkage, so other files can refer to it).
‘function2()’ has ‘File Scope’ (because it is static, it has internal linkage and is usable only within this file).
‘Label1’ has ‘Function Scope’. (Label names must be unique within a function. A label is the only kind of identifier that has function scope.)
Variable ‘i’ has ‘Block Scope’.
Variables ‘x’ and ‘y’ have ‘Prototype Scope’. There cannot be two variables with the name ‘x’ or ‘y’ in the function parameter list.
The variable i in the above example has block scope. When control leaves the block, the lifetime of the variable ends and you can no longer access it.
So C provides dynamic memory allocation for these kinds of scenarios, where data must outlive the block that created it.
For example:
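#include <stdio.h>
#include <stdlib.h>
int* function(void)
{
    int *ptr = malloc(sizeof(int));
    if (ptr != NULL)
        *ptr = 5;
    return ptr;
}
int main(void)
{
    int *p = function();
    if (p != NULL)
        printf("%d", *p); // dereference: print the int, not the pointer
    free(p);
    return 0;
}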
The printf still prints the value: the variable ptr is out of scope, but the memory that ptr pointed to still exists (its lifetime has not ended).
Also read https://stackoverflow.com/a/18479996/1814023
What is the advantage of the static keyword in block scope vs. using malloc?
For example:
Function A:
f() {
static int x = 7;
}
Function B:
f() {
int *x = malloc(sizeof(int));
if (x != NULL)
*x = 7;
}
If I am understanding this correctly, both programs create an integer 7 that is stored on the heap. In A, the variable is created at the very beginning in some permanent storage, before the main method executes. In B, you are allocating the memory on the spot once the function is called and then storing a 7 where that pointer points. In what type of situations might you use one method over the other? I know that you cannot free the x in function A, so wouldn't that make B generally more preferable?
Both programs create an integer 7 that is stored on the heap
No, they don't.
static creates an object with static storage duration, which remains alive throughout the lifetime of the program, while a dynamically allocated object (created by malloc) remains in memory until explicitly released by free. The two provide distinct functionality: a static object maintains its state across function calls, while a dynamically allocated object is created anew by each call to malloc.
In what type of situations might you use one method over the other?
You use static when you want the object to be alive throughout the lifetime of the program and to maintain its state across function calls. If you are working in a multithreaded environment, the same static object is shared by all the threads and hence needs synchronization.
You use malloc when you explicitly want to control the lifetime of the object, e.g. to make sure the object lives long enough for the caller of a function to access it after the function returns. (An automatic/local object is deallocated once the scope { } of the function ends.) Unless the caller explicitly calls free, the allocated memory is leaked until the OS reclaims it at program exit.
In Function A, you're allocating x with static storage duration, which generally means it is not on (what most people recognize as) the heap. Rather, it's just memory that's guaranteed to exist the entire time your program is running.
In Function B, you're allocating the storage every time you enter the function, and then (unless there's a free you haven't shown) leaking that memory.
Given only those two choices, Function A is clearly preferable. It has shortcomings (especially in the face of multi-threading) but at least there are some circumstances under which it's correct. Function B (as it stands) is just plain wrong.
Forget stack v. heap. That is not the most important thing that is going on here.
Sometimes static modifies scope and sometimes it modifies lifetime. Prototypical example:
int foo(void) {
static int count = 0;
return count++;
}
Try calling this repeatedly, perhaps from several different functions or files even, and you'll see that count keeps increasing, because in this case static gives the variable a lifetime equal to that of the entire execution of the program.
Read http://www.teigfam.net/oyvind/pub/notes/09_ANSI_C_static_vs_malloc.html
The static variable is created before main() runs, so no memory needs to be allocated for it after the program starts.
If I am understanding this correctly, both programs create an integer 7 that is stored on the heap
No, static variables are created in the data or BSS segment, and they live for the whole lifetime of the program. When you allocate using malloc(), memory is allocated on the heap, and it must be explicitly freed using a free() call.
In what type of situations might you use one method over the other?
Well, you use the first method when you want access to the same variable across multiple invocations of the same function, i.e., in your example, x will be initialized only once, and when you call the function a second time, the same x variable is used.
The second method can be used when you don't want to share the variable across invocations of the function, so that when the function is called a second time, x is malloced again.
You must free x every time.
You can see the difference by calling f() 2 times, for each kind of f()
...
f();
f();
...
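/* Version A: static storage */
void f(void){
    static int x = 7;
    printf("x is : %d\n", x++);
}
/* Version B: dynamic allocation */
void f(void){
    int *x = malloc(sizeof(int));
    if (x != NULL) {
        *x = 7;
        printf("x is : %d\n", (*x)++); // only dereference when malloc succeeded
    }
    free(x); // never forget this
}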
the results will be different
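To see that difference in a single program, the two variants can be given different names and made to return the value they would print. This is only a sketch; `f_static` and `f_malloc` are hypothetical names introduced here so that both versions can coexist in one file.

```c
#include <stdlib.h>

/* Static variant: x is initialized once and keeps its value between calls. */
int f_static(void)
{
    static int x = 7;
    return x++;
}

/* malloc variant: a fresh int is allocated, set to 7, and freed on every
   call. Returns -1 if the allocation fails. */
int f_malloc(void)
{
    int *x = malloc(sizeof *x);
    if (x == NULL)
        return -1;
    *x = 7;
    int value = (*x)++;  /* the increment is lost when x is freed */
    free(x);
    return value;
}
```

Two consecutive calls to `f_static()` return 7 and then 8, whereas two consecutive calls to `f_malloc()` return 7 both times.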
First things first: static is a storage-class specifier, and malloc() is a library function, which on Linux typically obtains memory from the kernel with the brk() or mmap() system calls when it needs more heap space.
If I am understanding this correctly, both programs create an integer 7 that is stored on the heap?
No. Static variables are stored in the data section of the memory allocated to the program. Even after the scope of a static variable ends, the variable can still be accessed from outside that scope (through a pointer, for example), which shows that the contents of the data segment have a lifetime independent of scope.
In what type of situations might you use one method over the other?
If you want more control over your memory within a given scope, use malloc()/free(); otherwise the simpler (and cleaner) way is to use static.
In terms of performance, declaring a variable static is much faster than allocating it on the heap, since the algorithms for heap management are complex and the time needed to service a heap request varies depending on the type of algorithm.
One more reason I can think of for suggesting static is that static variables are initialized to zero by default, so that is one less thing to worry about.
Consider the example below to understand how static works. Generally we use the static keyword to define the scope of a variable or function; e.g. a variable defined as static inside a function is visible only within that function and retains its value between calls.
But as shown in the sample program below, if you pass the address of the static variable to another function, you can still update the same variable from that function.
To be precise, the static variable dies when the program terminates, which means its memory is released.
#include <stdio.h>
void f2(int *j)
{
(*j)++;
printf("%d\n", *j);
}
void f1()
{
static int i = 10;
printf("%d\n", i);
f2(&i);
printf("%d\n", i);
}
int main()
{
f1();
return 0;
}
With malloc(), on the other hand, the memory is not released while the program runs unless the programmer takes care of freeing it with free().
This way malloc() gives you control over a variable's lifespan, but beware: you have to be very precise in allocating and freeing memory when you choose dynamic memory allocation.
If you forget to free memory, that part of the heap stays unavailable for the rest of the run. On modern operating systems the memory is reclaimed when the process terminates, but a long-running program that keeps leaking can exhaust memory and slow down the whole system.