Why do we need calloc (or malloc)? - c

After ignoring C for my entire CS career I have decided to give it a look!
When initialising variables, we can have :
int b = 0;
This initialises b, allocates memory for it, and we can later update it with
b = 2;
if needs be.
So, and forgive me for this ridiculously "noob" question but why do we need calls like :
double *b = (double *) calloc(n, sizeof(double));
when initialising the variable would allocate the space for it already?
Why can we not just do
double b = 0;
b* = b.addressOf(b) //or some similar construct.
What is the use of this?
I have tried Googling this to no avail, so please forgive me. Unfortunately, * is a wildcard in Google, so relevant results are hard to find.

Variables declared in the current context end their lifetime at the end of the context.
Allocating memory gives you space to store longer-lived variables.
For example,
double *foo() {
double d;
return &d;
}
void bar() {
double *d = foo();
*d = 0.0;
}
will try to access a variable that no longer exists, because its lifetime is the foo function.
C and C++ do not keep track of objects. A pointer only points to the object, but does not extend object lifetime, so it is entirely possible for a pointer to be invalid even if it is not NULL.
However, this is valid:
double *foo() {
return (double *)malloc(sizeof(double));
}
void bar() {
double *d = foo();
*d = 0.0;
}
This will allocate memory for a double, and return the pointer to the memory, which remains valid until explicitly returned to the pool using the free function. Not returning it to the pool will create a memory leak.
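To make the ownership rule concrete, here is a minimal sketch of the allocate, use, and free cycle. The helper names make_double and use_and_release, and the NULL checks, are additions for illustration and are not part of the answer above.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate one double dynamically and initialise it.
   Returns NULL if the allocation fails. */
double *make_double(double value)
{
    double *d = malloc(sizeof *d);
    if (d != NULL) {
        *d = value;
    }
    return d;
}

/* The caller owns the returned pointer and is responsible for freeing it. */
double use_and_release(void)
{
    double *d = make_double(1.5);
    if (d == NULL) {
        return 0.0;          /* allocation failed */
    }
    double copy = *d;        /* still valid: the object outlives make_double */
    free(d);                 /* return the memory to the pool */
    return copy;
}
```

After free(d) the pointer must not be dereferenced again; the value is copied out first for that reason.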

Unless I'm totally mistaken, in C, calloc or malloc are the only possibilities to implement dynamic data structures.
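As an illustration of such a dynamic data structure, here is a rough sketch of a singly linked list whose length is decided at run time. The type and function names (struct node, push_front, list_length, free_list) are made up for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node; each node is allocated separately. */
struct node {
    int value;
    struct node *next;
};

/* Push a value onto the front of the list. Returns the new head,
   or the old head unchanged if the allocation fails. */
struct node *push_front(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL) {
        return head;
    }
    n->value = value;
    n->next = head;
    return n;
}

/* Count the nodes in the list. */
size_t list_length(const struct node *head)
{
    size_t count = 0;
    for (; head != NULL; head = head->next) {
        count++;
    }
    return count;
}

/* Free every node; read the next pointer before freeing the node. */
void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

The list can grow for as long as malloc succeeds, which a fixed-size automatic array cannot do.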

When it comes to variable allocation, you can do it in two ways:
Automatically on the stack, simply: int a = 10. These variables live on the stack, typically next to bookkeeping data such as function return addresses (this is why it can be dangerous to write to an array declared on the stack without properly checking the boundaries: you might overwrite that data). The variables also have a scope: function scope, global scope, and other scopes (such as the if-branch of an if-else). They are fast to use, although they are more or less ... static, and they have the big advantage that you don't need to clean them up: they are released automatically. However, they have a great disadvantage: stack space is more limited than heap space, so you can use only modestly sized variables (don't take this literally; instead, research what your OS allows. 64KB is not enough for everyone :) ).
Dynamically on the heap, using either calloc() or some other memory allocation function. These variables are allocated in an area known as the heap, or dynamic memory. They stay there until they are freed using free() or until the application using them exits (in which case a modern OS usually reclaims the memory). You should always free the memory to avoid memory leaks. Dynamic memory has the advantage that (on a modern OS) the addressable memory is much bigger than the space allocated to the stack, so you can have more and bigger structures and arrays.
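The calloc call from the question fits the second case: the element count n is only known at run time. A minimal sketch, assuming a made-up function mean_of_first_n that fills the buffer with 1, 2, ..., n:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical example: average of the values 1..n, stored in a buffer
   whose size is only known at run time. */
double mean_of_first_n(size_t n)
{
    if (n == 0) {
        return 0.0;
    }
    double *samples = calloc(n, sizeof *samples);   /* zero-filled block */
    if (samples == NULL) {
        return 0.0;                                 /* allocation failed */
    }
    for (size_t i = 0; i < n; i++) {
        samples[i] = (double)(i + 1);
    }
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
    }
    free(samples);                                  /* done with the buffer */
    return sum / (double)n;
}
```

A plain double b = 0; reserves room for exactly one double, whereas calloc(n, sizeof *samples) reserves room for n of them.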

Scope is the region or section of the code where a variable can be accessed. There can be
File Scope
Function Scope
Block Scope
Program Scope
Prototype Scope
Example
#include<stdio.h>
void function1()
{
printf("In function1\n");
}
static void function2()
{
printf("In function2\n");
{
int i = 100;
Label1:
printf("The value of i =%d\n",i);
i++;
if(i<105)
goto Label1;
}
}
void function3(int x, int y);
int main(void)
{
function1();
function2();
return 0;
}
In the example,
‘function1()’ has ‘Program Scope’.
‘function2()’ has ‘File Scope’.
‘Label1’ has ‘Function Scope’. (Label names must be unique within a function. A label is the only kind of identifier that has function scope.)
Variable ‘i’ has ‘Block Scope’.
Variables ‘x’ and ‘y’ have ‘Prototype Scope’. There cannot be two variables with the name ‘x’ or ‘y’ in the function parameter list.
The variable i in the above example has block scope. Once control leaves that block, the variable's lifetime ends and the variable is gone. You cannot access it any more.
So C provides dynamic memory constructs to access the memory in these kinds of scenarios.
For example:
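#include <stdio.h>
#include <stdlib.h>
int* function(void)
{
    int *ptr = malloc(sizeof(int));
    if (ptr != NULL)
        *ptr = 5;
    return ptr;
}
int main(void)
{
    int *p = function();
    if (p != NULL)
        printf("%d", *p); /* dereference: function() returns a pointer, not an int */
    free(p);
    return 0;
}
The printf still prints the value: the variable ptr has gone out of scope, but the memory it pointed to still exists (its lifetime has not ended).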
Also read https://stackoverflow.com/a/18479996/1814023

Related

Is Dynamically allocated memory global?

I am wondering whether memory dynamically allocated with malloc is global. I have read online that memory allocated with malloc is stored on the heap. I have also read online that all global variables are stored on the heap. Wouldn't this mean that dynamically allocated memory can be accessed globally? For example, I receive an error with the following code:
#include <stdio.h>
#include <stdlib.h>
void my_func(void)
{
printf("Pointer variables is: %d\n", *ptr);
}
int main()
{
int *ptr = (int *)malloc(sizeof(int));
*ptr = 5;
my_func();
return 0;
}
However, when I run the following code with a global variable there is no error:
#include <stdio.h>
int var = 5;
void my_func(void)
{
printf("Global variable is: %d\n", var);
}
int main()
{
my_func();
return 0;
}
You can access the memory created by malloc from anywhere as long as you haven't freed it. I think that is what you mean by global.
But ptr is a local variable of pointer type that points to the allocated memory. You have to pass it as a parameter to the function to use it there.
They are two different concepts.
First, C has no global scope. “Global” means a name (an identifier) can be defined once and will be known throughout the program. In C, for a name to be known in multiple translation units, you must declare it in each translation unit where it is to be known (and define it in one of them) and link the translated files together.
Second, it would only make sense to speak of names as global or as having linkage (the property about linking different declarations of a name to the same object or function) or scope (where in a program a name is visible, meaning able to be used). Memory does not have scope or linkage. It might be said to be global in the sense it is accessible throughout the entire program, but “global” is not the right word for this since that is about visibility of names.
Third, “on the heap” is slang and should be avoided. Memory is dynamically allocated. (The C standard uses just “allocated,” but “dynamically allocated” is more explicit and is clearer in other contexts.) This slang arose because early memory management software would keep records about free blocks of memory in a heap data structure. When memory was allocated, if it could be satisfied by an existing free block, that block would be removed from the heap and given to the calling routine for its use. So allocated memory is actually taken off the heap; it is not on the heap. And modern memory managers may use diverse data structures to hold their records, either with or without heaps.
The typical memory model for a program is that all of its memory is accessible throughout the program. When memory is reserved for some use, whether by malloc or by other means, that memory may be used by any software in the program that has the address of that memory. Some memory is limited in how it may be used. For example, some may hold initialized data and be marked read-only, so that it cannot be modified. Other memory may hold program instructions and be marked as executable, so it can be executed (by a jump instruction or other instruction that transfers program control to that memory), whereas other memory in the program cannot be executed. However, these limitations generally apply to all software seeking to access memory in one way or another, unless special provisions are made (such as by calling operating system routines to change the protections).
In your program, ptr is not declared before my_func. Because of this, it is not visible inside my_func. This means the name ptr is not usable. It has nothing to do with the memory that ptr points to. To make the name ptr visible inside my_func, you must declare it prior to using it. One way to do this would be to declare an external variable (here, “external” means outside of any function):
int *ptr; // External declaration (and tentative definition).
void my_func(void)
{
printf("Pointer variables is: %d\n", *ptr);
}
int main()
{
ptr = malloc(sizeof *ptr); // Changed from declaration to assignment.
*ptr = 5;
my_func();
return 0;
}
Another way is to declare it as a function parameter:
void my_func(int *ptr)
{
printf("Pointer variables is: %d\n", *ptr);
}
int main()
{
int *ptr = malloc(sizeof *ptr);
*ptr = 5;
my_func(ptr);
return 0;
}
In this case, `void my_func(int *ptr)` declares a **different** `ptr` from the one in `main`. There are two variables named `ptr` in this program, and they are not linked together. The one in `main` is given a value in `main`. Then the call `my_func(ptr)` passes the value of this `ptr` to `my_func`. When `my_func` starts executing, a new variable named `ptr` is created and is given the value passed as the argument.
Bonus: I changed `(int *)malloc(sizeof(int));` to `malloc(sizeof *ptr);`. In C, unlike C++, it is not necessary to cast the result of `malloc`, and it is recommended not to because doing so can conceal the error of failing to use `#include <stdlib.h>`. Also, `malloc(sizeof *ptr)` says to allocate space for one of whatever type `ptr` points to. With `malloc(sizeof(int))`, an error can occur if somebody changes the type of `ptr` but forgets to find all places that type is used with `ptr` and change them too. With `malloc(sizeof *ptr)`, appropriate space will be allocated even if the type of `ptr` is changed with no other edits.

Arrays, pointers, and memory management toy example question in C

I was playing around with C memory and pointers, and I had some questions:
int* foo(){
int a[100] = ...;
int* b = malloc(100 * sizeof(int));
... do something ...
return b
}
Is the memory consumed by a freed immediately after exiting the function?
Would changing the definition to int b [100]; be equivalent?
This declaration
int a[100] = ...;
declares a local object with the automatic storage duration. So after exiting the function the array will not be alive and its memory can be used for other purposes.
If b were declared as the array
int b [100];
then returning it with
return b;
would make the returned pointer invalid because, as said above, the array b will not be alive after the function exits.
As for the dynamically allocated memory
int* b = malloc(100 * sizeof(int));
it will be freed only when you explicitly call the function free or when the process stops executing.
You could return the array if it had static storage duration. For example
static int b[100];
Remember that a is scoped to the function and as soon as the function exits it falls out of scope meaning any references to it, like pointers, are invalidated. Understanding scope is really important, so be sure to learn more about this before you end up introducing undefined behaviour into your programs inadvertently.
Changing b to be the same as a would subject it to the same problems. malloc() is used to dynamically allocate something and it will be valid until explicitly released with free().
Don't forget that allocating memory comes with the responsibility of releasing it. You need to have a plan to deal with that allocation.
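One possible plan, sketched here under assumptions, is to avoid the allocation altogether: the caller owns the array and the callee only fills it, so no pointer to a dead local is ever returned and nothing needs to be freed. The names fill_squares and sum_squares_local are hypothetical.

```c
#include <stddef.h>

/* Hypothetical callee: writes into storage owned by the caller. */
void fill_squares(int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = (int)(i * i);
    }
}

/* The automatic array lives for the whole call, so passing it down
   to fill_squares is safe; it is released when this function returns. */
int sum_squares_local(void)
{
    int a[100];
    fill_squares(a, 100);

    int sum = 0;
    for (size_t i = 0; i < 100; i++) {
        sum += a[i];
    }
    return sum;   /* return the value, not a pointer to a */
}
```

When the data really has to outlive the function that creates it, malloc() plus a documented rule about who calls free() is the usual alternative.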

Dynamically Allocated Array vs. Automating Declared Array with Global Scope (C language)

What is the difference between declaring an array "dynamically",
[ie. using realloc() or malloc(), etc... ]
vs
declaring an array within main() with Global scope?,
eg.
int main()
{
int array[10];
return 0;
}
I am learning, and at the moment it feels that there is not much difference between
declaring a variable (array, whatever) with global scope,
when compared to a
dynamically allocated variable (array, whatever) AND never calling free() on it AND allowing it to be 'destroyed' when the program ends.
What are the consequences of either option?
EDIT
Thank you for your responses.
Global scope should have been 'local scope' -local to main()
When you declare an array like int arr[10] in a function, the space for the array is allocated on the stack. The memory will be freed when your function exits.
When you create an array or any other data structure using malloc() or realloc(), you allocate the space on the heap, and the memory is released only when you free it or when the program exits. So while the program is running, you are responsible for freeing it using free() once you no longer need it. If you don't free it and make your array pointer point to something else, you create a memory leak. However, the operating system reclaims all of the program's memory after the program ends.
As kaylum said in comment below your question, the array in your second example does not have global scope. Its scope is limited to main(), and it is inaccessible in other scopes unless main() explicitly makes it available (e.g. passes it by argument to another function).
Dynamic memory allocation means that the programmer explicitly allocates memory when needed, and explicitly releases it when no longer needed. Because of that, the amount of memory allocated can be determined at run time (e.g. calculated from user input). Also, if the programmer forgets to release the memory, or reallocates it inappropriately, memory can be leaked (still allocated by the program, but not accessible by the program). For example;
/* within a function */
char *p = malloc(100);
p = malloc(200);
free(p);
leaks 100 bytes, every time this code is executed, because the result of the first malloc() call is never released, and it is then inaccessible to the program because its value is not stored anywhere.
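If the intent of that snippet was to grow the block from 100 to 200 bytes, one way to avoid the leak is realloc() through a temporary pointer. This is only a sketch; the helper name resize_buffer is hypothetical and it assumes new_size is greater than zero.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: resize *buf to new_size bytes (new_size > 0).
   Returns 0 on success and -1 on failure. On failure *buf is left
   untouched and still valid, so the old block is not lost. */
int resize_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);   /* keep the old pointer for now */
    if (tmp == NULL) {
        return -1;
    }
    *buf = tmp;                            /* old contents are preserved */
    return 0;
}
```

Assigning the result of realloc() directly to the original pointer would lose the old block if realloc() failed, which is the same kind of leak as in the example above.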
Your second example is actually an array of automatic storage duration. As far as your program is concerned, it only exists until the end of the scope in which it is created. In your case, as main() returns, the array will cease to exist.
An example of an array with global scope is
int array[10];
void f() {array[0] = 42;}
int main()
{
array[0] = 10;
f();
/* array[0] will be 42 here */
}
The difference is that this array exists and is accessible to every function that has visibility of the declaration, within the same compilation unit.
One other important difference is that global arrays are (usually) zero initialised - a global array of int will have all elements zero. A dynamically allocated array will not have elements initialised (unless created with calloc(), which does initialise to zero). Similarly, an automatic array will not have elements initialised. It is undefined behaviour to access the value of something (including an array element) that is uninitialised.
So
#include <stdio.h>
#include <stdlib.h> /* needed for malloc */
int array[10];
int main()
{
int *array2;
int array3[10];
array2 = malloc(10*sizeof(*array2));
printf("%d\n", array[0]); /* okay - will print 0 */
printf("%d\n", array2[0]); /* undefined behaviour. array2[0] is uninitialised */
printf("%d\n", array3[0]); /* undefined behaviour. array3[0] uninitialised */
return 0;
}
Obviously the way to avoid undefined behaviour is to initialise array elements to something valid before trying to access their value (e.g. printing them out, in the example above).
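A minimal sketch of two ways to get defined initial values, using the array names from the example above; the wrapper function sum_initialised is made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical demo: both arrays start out as all zeros, so reading
   their elements is well defined. Returns -1 if calloc fails. */
int sum_initialised(void)
{
    int array3[10] = {0};                       /* automatic array, zeroed */
    int *array2 = calloc(10, sizeof *array2);   /* dynamic array, zeroed */
    if (array2 == NULL) {
        return -1;
    }

    int sum = 0;
    for (size_t i = 0; i < 10; i++) {
        sum += array2[i] + array3[i];
    }
    free(array2);
    return sum;
}
```

For memory obtained from malloc(), the elements can be set in a loop or with memset() before the first read.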

Not sure whether or not to malloc memory for a struct

Suppose I have the following C code:
#include <stdio.h>
#include <stdlib.h>
#define NUM_PEOPLE 24
typedef struct {
char **name;
int age;
} person_t;
void get_person_info(person_t *person);
int main(int argc, char **argv) {
for (int i = 0; i < NUM_PEOPLE; i++) {
person_t new_person;
get_person_info(&new_person);
}
return 0;
}
where get_person_info() just fills out the person_t struct to which a pointer is passed in. Is it necessary to malloc() memory for new_person within main()? That is, should the line
person_t new_person;
instead be
person_t *new_person = (person_t *) malloc(sizeof(person_t));
and then change get_person_info() to accept a person_t ** instead of a person_t *?
Sorry if this question is confusing -- I'm not sure whether or not this is a case where it is necessary to reserve memory, given that a pointer to that memory is passed into get_person_info() to avoid causing a segmentation fault.
Both are correct, it depends on where you want to use the person_info.
Allocating on the stack :
for (int i = 0; i < NUM_PEOPLE; i++) {
person_t new_person;
get_person_info(&new_person);
}
This creates a person_t object on the stack and fills it with data. Because the loop does nothing else, the object goes out of scope at the end of each iteration and the data is lost.
Using malloc :
for (int i = 0; i < NUM_PEOPLE; i++) {
person_t *new_person = malloc(sizeof(person_t));
get_person_info(new_person);
}
This creates a person_t object on the heap and fills it with data. Because it is allocated on the heap, the object outlives the loop scope, which currently means that you are leaking memory: after each iteration, no pointer refers to the person_t object allocated in that iteration.
Both ways are correct !!
person_t *new_person = (person_t *) malloc(sizeof(person_t));
and then change get_person_info() to accept a person_t ** instead of a person_t *?
You don't need to change the function's parameter: void get_person_info(person_t *person);. Just pass the pointer to it in main like this:
get_person_info(new_person);
With the first approach, without allocating memory, you won't be able to use the object outside the block it is defined in, whereas if your program depends on it living longer, you can allocate memory for it on the heap.
In the code you posted, new_person is used only inside the loop, so if you don't intend to use it outside the loop you probably don't need dynamic allocation.
But if you also want to use it outside the loop, you should use dynamic allocation. Just don't forget to free it.
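If the records do need to outlive the loop, one option is a single allocation for all of them, released by whoever receives it. This is a sketch under assumptions: person_t is simplified to hold a fixed-size name (the question uses char **name), and get_person_info is a stand-in that takes an extra index parameter because the real function is not shown.

```c
#include <stdio.h>
#include <stdlib.h>

#define NUM_PEOPLE 24

/* Simplified record for illustration only. */
typedef struct {
    char name[32];
    int age;
} person_t;

/* Hypothetical stand-in for the question's get_person_info(). */
static void get_person_info(person_t *person, int index)
{
    snprintf(person->name, sizeof person->name, "person%d", index);
    person->age = 20 + index;
}

/* Allocates NUM_PEOPLE records that outlive this function.
   Returns NULL on failure; the caller must free() the result. */
person_t *load_people(void)
{
    person_t *people = malloc(NUM_PEOPLE * sizeof *people);
    if (people == NULL) {
        return NULL;
    }
    for (int i = 0; i < NUM_PEOPLE; i++) {
        get_person_info(&people[i], i);   /* same person_t * interface */
    }
    return people;
}
```

Keeping the one pointer returned by load_people() is enough to reach every record and to free them all with a single free() call.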
Not sure whether or not to malloc memory for a struct?
The short answer is: there is no need to do it in your case. If you want to use your object outside the for loop, you could do so with dynamically allocated memory, namely:
person_t *new_person = malloc(sizeof(person_t));
and then call it with:
get_person_info(new_person);
In your example, the object is used only within the loop, so there is no need to do it.
Note:
When you use dynamically allocated memory, you should always free it when you are done with it to avoid memory leaks.
Edit:
As pointed out by @Johann Gerell, after removing the redundant cast of the return value of malloc, in C, the allocation would look like:
person_t *new_person = malloc(sizeof(person_t));
malloc returns a void pointer (void *), which indicates that it is a pointer to a region of unknown data type. The use of casting is required in C++ due to the strong type system, whereas this is not the case in C.
Your confusion stems from not understanding object storage duration and pointers well. Let's see each one separately to get some clarity.
Storage Duration
An object can have automatic or dynamic storage duration.
Automatic
Automatic, as the name says, would be managed by the compiler for you. You just define a variable, use it and when it goes out of scope the object is destroyed automatically for you. A simple example:
if (flag) {
int i = 0;
/* some calc. involving i */
}
// i is dead here; it cannot be accessed and its storage is reclaimed
When the control enters the if's scope, memory large enough to hold an int will be allocated automatically and assigned the value 0. Once your use of i is over, when the control exits the scope, the name i goes out of scope and thus will no longer be accessible by the program and also its storage area allocated automatically for you would be reclaimed.
Dynamic
Let's say you want to have objects dynamically allocated, i.e. you want to manage the storage, and thereby the lifetime of the object, without the scope or the compiler getting in your way. Then you'd request storage space from the platform using malloc:
malloc(sizeof(int));
Notice that we're not assigning the return value of malloc to any pointer as you're used to seeing. We'll get to pointers in a bit; let's finish dynamic objects first. Here, space large enough to hold an int is handed over to you by malloc. It's up to you to free it when you're done with it. Thus the lifetime of this unnamed int object is in your hands, and it lives beyond the scope of the code that created it. It ends only when you explicitly call free. Without a matching free call, you'd have the infamous memory leak.
Pointers
A pointer is just what its name says - an object that can refer to another object. A pointer is never what it is pointing at (pointee). A pointer is an object and its pointee is another separate, independent object. You may make a pointer point to another named object, unnamed object, or nothing (NULL).
int i = 0;
int *ptr1 = &i; // ptr1 points to the automatic int object i
int *ptr2 = malloc(sizeof(int)); // ptr2 points to some unnamed int object
int *ptr3 = NULL; // ptr3 points to nothing
Thus the reason most people confuse pointers for dynamically allocated pointees comes from this: the pointee, here, doesn't have a name and hence they're referred to always via their pointers; some people mistake one for the other.
Function Interface
The function taking a pointer is appropriate here, since from the caller's viewpoint it's a flexible function: it can take both automatic and dynamic objects. I can create an automatic variable and pass it in, or I can pass a dynamic variable too:
void get_person_info(person_t *person);
person_t a = {0};
get_person_info(&a);
person_t *p = malloc(sizeof(person_t));
get_person_info(p);
free(p);
Is it necessary to malloc() memory for new_person within main()?
No. You can define an automatic variable and pass it to the function. In fact it's recommended that you try to minimize your usage of dynamic objects and prefer automatic objects since
It minimizes the chances of memory leaks in your code. Even seasoned programmers miss calling the matching free to a malloc thereby introducing a memory leak.
Dynamic object allocation/deallocation is far slower than automatic variable allocation/deallocation.
A lot of dynamic allocation deallocation causes memory fragmentation.
However, automatic variables are generally allocated on the stack, and thus the upper limit on how many objects you can create there, and how large they can be, is lower than what you can allocate dynamically (generally from the heap).
change get_person_info() to accept a person_t ** instead of a person_t *?
No, if you did so, the option of passing automatic variables would still be possible but cumbersome:
void foo(int **o);
int i = 0;
int *p = &i; // p is redundant
foo(&p);
int *q = malloc(sizeof(int)); // renamed so it does not redeclare p
foo(&q);
As opposed to the simpler
void bar(int *o);
int i = 0;
bar(&i);
int *p = malloc(sizeof(int));
bar(p);

How to locally allocate array-pointer in C?

Think of a pointer datatype, for instance a pointer to a floating-point number.
typedef float* flPtrt;
How would I allocate an array of 3 elements in the local scope? I guess using malloc and free within the same scope produces overhead, but what's the alternative?
void foo() {
flPtrt ptr = malloc(sizeof(float)*3);
// ...
free(ptr);
}
If 3 is known at compile time and is small enough, you can declare a local array and use it as a pointer
void foo() {
float array[3];
flPtrt ptr = array;
}
If the size is bigger or variable, you have to use dynamic memory as in your example.
I think what you're looking for is the alloca() function.
I'm not sure it's standard C, but it exists in GNU, and it worked in my Visual Studio.
So this is how you use it:
int n = 5;
int* a = (int*) alloca(sizeof(int) * n);
It creates an array of elements on the stack (rather than on the heap with malloc).
Advantages: less overhead, no need to free manually (when you return from your method, the stack folds back and the memory is lost)
Disadvantage: If you want to return a pointer from a method NEVER use alloca, since you will be pointing at something that no longer exists after exiting the function. One can also argue that the stack is usually smaller than the heap, so if you want larger space use malloc.
If you know the required size of the array ahead of time, you could just allocate it as a stack variable and avoid heap memory management.
Otherwise, the approach you outlined is appropriate and there is not really an alternative.
Use an array.
void foo(void) // note that "void foo()" is obsolete
{
float data[3];
float *ptr = data;
// ...
}

Resources