using malloc over array - c

May be similar question found on SO. But, I didn't found that, here is the scenario
Case 1
void main()
{
char g[10];
char a[10];
scanf("%[^\n] %[^\n]",a,g);
swap(a,g);
printf("%s %s",a,g);
}
Case 2
void main()
{
char *g=malloc(sizeof(char)*10);
char *a=malloc(sizeof(char)*10);
scanf("%[^\n] %[^\n]",a,g);
swap(a,g);
printf("%s %s",a,g);
}
I'm getting same output in both case. So, my question is when should I prefer malloc() instead of array or vice-verse and why ?? I found common definition, malloc() provides dynamic allocation. So, it is the only difference between them ?? Please any one explain with example, what is the meaning of dynamic although we are specifying the size in malloc().

The principle difference relates to when and how you decide the array length. Using fixed length arrays forces you to decide your array length at compile time. In contrast using malloc allows you to decide the array length at runtime.
In particular, deciding at runtime allows you to base the decision on user input, on information not known at the time you compile. For example, you may allocate the array to be a size big enough to fit the actual data input by the user. If you use fixed length arrays, you have to decide at compile time an upper bound, and then force that limitation onto the user.
Another more subtle issue is that allocating very large fixed length arrays as local variables can lead to stack overflow runtime errors. And for that reason, you sometimes prefer to allocate such arrays dynamically using malloc.

Please any one explain with example, what is the meaning of dynamic although we are specifying the size.
I suspect this was significant before C99. Before C99, you couldn't have dynamically-sized auto arrays:
void somefunc(size_t sz)
{
char buf[sz];
}
is valid C99 but invalid C89. However, using malloc(), you can specify any value, you don't have to call malloc() with a constant as its argument.
Also, to clear up what other purpose malloc() has: you can't return stack-allocated memory from a function, so if your function needs to return allocated memory, you typically use malloc() (or some other member of the malloc familiy, including realloc() and calloc()) to obtain a block of memory. To understand this, consider the following code:
char *foo()
{
char buf[13] = "Hello world!";
return buf;
}
Since buf is a local variable, it's invalidated at the end of its enclosing function - returning it results in undefined behavior. The function above is erroneous. However, a pointer obtained using malloc() remains valid through function calls (until you don't call free() on it):
char *bar()
{
char *buf = malloc(13);
strcpy(buf, "Hello World!");
return buf;
}
This is absolutely valid.

I would add that in this particular example, malloc() is very wasteful, as there is more memory allocated for the array than what would appear [due to overhead in malloc] as well as the time it takes to call malloc() and later free() - and there's overhead for the programmer to remember to free it - memory leaks can be quite hard to debug.
Edit: Case in point, your code is missing the free() at the end of main() - may not matter here, but it shows my point quite well.
So small structures (less than 100 bytes) should typically be allocated on the stack. If you have large data structures, it's better to allocate them with malloc (or, if it's the right thing to do, use globals - but this is a sensitive subject).
Clearly, if you don't know the size of something beforehand, and it MAY be very large (kilobytes in size), it is definitely a case of "consider using malloc".
On the other hand, stacks are pretty big these days (for "real computers" at least), so allocating a couple of kilobytes of stack is not a big deal.

Related

Overwriting global char array vs global char pointer in a loop in C

I have an infinite while loop, I am not sure if I should use a char array or char pointer. The value keeps getting overwritten and used in other functions. With a char pointer, I understand there could be a memory leak, so is it preferred to use an array?
char *recv_data = NULL;
int main(){
.....
while(1){
.....
recv_data = cJSON_PrintUnformatted(root);
.....
}
}
or
char recv[256] = {0};
int main(){
.....
while(1){
.....
strcpy(recv, cJSON_PrintUnformatted(root));
.....
}
}
The first version should be preferred.
It doesn't have a limit on the size of the returned string.
You can use free(recv_data) to fix the memory leak.
The second version has these misfeatures:
The memory returned from the function can't be freed, because you never assigned it to a variable that you can pass to free().
It's a little less efficient, since it performs an unnecessary copy.
Based on how you used it, the cJSON_PrintUnformatted returns a pointer to a char array. Since there are no input arguments, it probably allocates memory inside the function dynamically. You probably have to free that memory. So you need the returned pointer in order to deallocate the memory yourself.
The second option discards that returned pointer, and so you lost your only way to free the allocated memroy. Hence it will remain allocated -> memroy leak.
But of course this all depends on how the function is implemented. Maybe it just manipulates a global array and return a pointer to it, so there is no need to free it.
Indeed, the second version has a memory leak, as #Barmar points out.
However, even if you were to fix the memory leak, you still can't really use the first version of your code: With the first version, you have to decide at compile-time what the maximum length of the string returned by cJSON_PrintUnformatted(). Now,
If you choose a value that's too low, the strcpy() function would exceed the array bounds and corrupt your stack.
If you choose a value that's so high as to be safe - you might have to exceed the amount of space available for your program's stack, causing a Stack Overflow (yes, like the name of this site). You could fix that using a strncpy(), giving the maximum size - and then what you'd have is a truncated string.
So you really don't have much choice than using whatever memory is pointed to by the cJSON_PrintUnformatted()'s return value (it's probably heap-allocated memory). Plus - why make a copy of it when it's already there for you to use? Be lazy :-)
PS - What should really happen is for the cJSON_PrintUnformatted() to take a buffer and a buffer size as parameters, giving its caller more control over memory allocation and resource limits.

Weird situation when returning char *

I am pretty new to C programming and I have several functions returning type char *
Say I declare char a[some_int];, and I fill it later on. When I attempt to return it at the end of the function, it will only return the char at the first index. One thing I noticed, however, is that it will return the entirety of a if I call any sort of function on it prior to returning it. For example, my function to check the size of a string (calling something along the lines of strLength(a);).
I'm very curious what the situation is with this exactly. Again, I'm new to C programming (as you probably can tell).
EDIT: Additionally, if you have any advice concerning the best method of returning this, please let me know. Thanks!
EDIT 2: For example:
I have char ret[my_strlen(a) + my_strlen(b)]; in which a and b are strings and my_strlen returns their length.
Then I loop through filling ret using ret[i] = a[i]; and incrementing.
When I call my function that prints an input string (as a test), it prints out how I want it, but when I do
return ret;
or even
char *ptr = ret;
return ptr;
it never supplies me with the full string, just the first char.
A way not working to return a chunk of char data is to return it in memory temporaryly allocated on the stack during the execution of your function and (most probably) already used for another purpose after it returned.
A working alternative would be to allocate the chunk of memory ont the heap. Make sure you read up about and understand the difference between stack and heap memory! The malloc() family of functions is your friend if you choose to return your data in a chunk of memory allocated on the heap (see man malloc).
char* a = (char*) malloc(some_int * sizeof(char)) should help in your case. Make sure you don't forget to free up memory once you don't need it any more.
char* ret = (char*) malloc((my_strlen(a) + my_strlen(b)) * sizeof(char)) for the second example given. Again don't forget to free once the memory isn't used any more.
As MByD correctly pointed out, it is not forbidden in general to use memory allocated on the stack to pass chunks of data in and out of functions. As long as the chunk is not allocated on the stack of the function returning this is also quite well.
In the scenario below function b will work on a chunk of memory allocated on the stackframe created, when function a entered and living until a returns. So everything will be pretty fine even though no memory allocated on the heap is involved.
void b(char input[]){
/* do something useful here */
}
void a(){
char buf[BUFFER_SIZE];
b(buf)
/* use data filled in by b here */
}
As still another option you may choose to leave memory allocation on the heap to the compiler, using a global variable. I'd count at least this option to the last resort category, as not handled properly, global variables are the main culprits in raising problems with reentrancy and multithreaded applications.
Happy hacking and good luck on your learning C mission.

Whats the difference between these array declarations in C?

People seem to say how malloc is so great when using arrays and you can use it in cases when you don't know how many elements an array has at compile time(?). Well, can't you do that without malloc? For example, if we knew we had a string that had max length 10 doesn't the following do close enough to the same thing?... Besides being able to free the memory that is.
char name[sizeof(char)*10];
and
char *name = malloc(sizeof(char)*10);
The first creates an array of chars on the stack. The length of the array will be sizeof(char)*10, but seeing as char is defined by the standard of being 1 in size, you could just write char name[10];
If you want an array, big enough to store 10 ints (defined per standard to be at least 2 bytes in size, but most commonly implemented as 4 bytes big), int my_array[10] works, too. The compiler can work out how much memory will be required anyways, no need to write something like int foo[10*sizeof(int)]. In fact, the latter will be unpredictable: depending on sizeof(int), the array will store at least 20 ints, but is likely to be big enough to store 40.
Anyway, the latter snippet calls a function, malloc wich will attempt to allocate enough memory to store 10 chars on the heap. The memory is not initialized, so it'll contain junk.
Memory on the heap is slightly slower, and requires more attention from you, who is writing the code: you have to free it explicitly.
Again: char is guaranteed to be size 1, so char *name = malloc(10); will do here, too. However, when working with heap memory, I -and I'm not alone in this- prefer to allocate the memory like so some_ptr = malloc(10*sizeof *some_ptr); using *some_ptr, is like saying 10 times the size of whatever type this pointer will point to. If you happen to change the type later on, you don't have to refactor all malloc calls.
General rule of thumb, to answer your question "can you do without malloc", is that you don't use malloc, unless you have to.
Stack memory is faster, and easier to use, but it is less abundant. This site was named after a well-known issue you can run into when you've pushed too much onto the stack: it overflows.
When you run your program, the system will allocate a chunk of memory that you can use freely. This isn't much, but plenty for simple computations and calling functions. Once you run out, you'll have to resort to allocating memory from the heap.
But in this case, an array of 10 chars: use the stack.
Other things to consider:
An array is a contguous block of memory
A pointer doesn't know/can't tell you how big a block of memory was allocated (sizeof(an_array)/sizeof(type) vs sizeof(a_pointer))
An array's declaration does not require the use of sizeof. The compiler works out the size for you: <type> my_var[10] will reserve enough memory to hold 10 elements of the given type.
An array decays into a pointer, most of the time, but that doesn't make them the same thing
pointers are fun, if you know what you're doing, but once you start adding functions, and start passing pointers to pointers to pointers, or a pointer to a pointer to a struct, that has members that are pointers... your code won't be as jolly to maintain. Starting off with an array, I find, makes it easier to come to grips with the code, as it gives you a starting point.
this answer only really applies to the snippets you gave, if you're dealing with an array that grows over time, than realloc is to be preferred. If you're declaring this array in a recursive function, that runs deep, then again, malloc might be the safer option, too
Check this link on differences between array and pointers
Also take a look at this question + answer. It explains why a pointer can't give you the exact size of the block of memory you're working on, and why an array can.
Consider that an argument in favour of arrays wherever possible
char name[sizeof(char)*10]; // better to use: char name[10];
Statically allocates a vector of sizeof(char)*10 char elements, at compile time. The sizeof operator is useless because if you allocate an array of N elements of type T, the size allocated will already be sizeof(T)*N, you don't need to do the math. Stack allocated and no free needed. In general, you use char name[10] when you already know the size of the object you need (the length of the string in this case).
char *name = malloc(sizeof(char)*10);
Allocates 10 bytes of memory in the heap. Allocation is done at run time, you need to free the result.
char name[sizeof(char)*10];
The first one is allocated on the stack, once it goes out of scope memory gets automatically freed. You can't change the size of the first one.
char *name = malloc(sizeof(char)*10);
The second one is allocated on the heap and should be freed with free. It will stick around otherwise for the lifetime of your application. You can reallocate memory for the second one if you need.
The storage duration is different:
An array created with char name[size] exists for the entire duration of program execution (if it is defined at file scope or with static) or for the execution of the block it is defined in (otherwise). These are called static storage duration and automatic storage duration.
An array created with malloc(size) exists for just as long as you specify, from the time you call malloc until the time you call free. Thus, it can be made to use space only while you need it, unlike static storage duration (which may be too long) or automatic storage duration (which may be too short).
The amount of space available is different:
An array created with char name[size] inside a function uses the stack in typical C implementations, and the stack size is usually limited to a few megabytes (more if you make special provisions when building the program, typically less in kernel software and embedded systems).
An array created with malloc may use gigabytes of space in typical modern systems.
Support for dynamic sizes is different:
An array created with char name[size] with static storage duration must have a size specified at compile time. An array created with char name[size] with automatic storage duration may have a variable length if the C implementation supports it (this was mandatory in C 1999 but is optional in C 2011).
An array created with malloc may have a size computed at run-time.
malloc offers more flexibility:
Using char name[size] always creates an array with the given name, either when the program starts (static storage duration) or when execution reaches the block or definition (automatic).
malloc can be used at run-time to create any number of arrays (or other objects), by using arrays of pointers or linked lists or trees or other data structures to create a multitude of pointers to objects created with malloc. Thus, if your program needs a thousand separate objects, you can create an array of a thousand pointers and use a loop to allocate space for each of them. In contrast, it would be cumbersome to write a thousand char name[size] definitions.
First things first: do not write
char name[sizeof(char)*10];
You do not need the sizeof as part of the array declaration. Just write
char name[10];
This declares an array of 10 elements of type char. Just as
int values[10];
declares an array of 10 elements of type int. The compiler knows how much space to allocate based on the type and number of elements.
If you know you'll never need more than N elements, then yes, you can declare an array of that size and be done with it, but:
You run the risk of internal fragmentation; your maximum number of bytes may be N, but the average number of bytes you need may be much smaller than that. For example, let's say you want to store 1000 strings of max length 255, so you declare an array like
char strs[1000][256];
but it turns out that 900 of those strings are only 20 bytes long; you're wasting a couple of hundred kilobytes of space1. If you split the difference and stored 1000 pointers, then allocated only as much space as was necessary to store each string, then you'd wind up wasting a lot less memory:
char *strs[1000];
...
strs[i] = strdup("some string"); // strdup calls malloc under the hood
...
Stack space is also limited relative to heap space; you may not be able to declare arbitrarily large arrays (as auto variables, anway). A request like
long double huge[10000][10000][10000][10000];
will probably cause your code to crash at runtime, because the default stack size isn't large enough to accomodate it2.
And finally, most situations fall into one of three categories: you have 0 elements, you have exactly 1 element, or you have an unlimited number of elements. Allocating large enough arrays to cover "all possible scenarios" just doesn't work. Been there, done that, got the T-shirt in multiple sizes and colors.
1. Yes, we live in the future where we have gigabytes of address space available, so wasting a couple of hundred KB doesn't seem like a big deal. The point is still valid, you're wasting space that you don't have to.
2. You could declare very large arrays at file scope or with the static keyword; this will allocate the array in a different memory segment (neither stack nor heap). The problem is that you only have that single instance of the array; if your function is meant to be re-entrant, this won't work.

When and why to use malloc

Well, I can't understand when and why it is needed to allocate memory using malloc.
Here is my code:
#include <stdlib.h>
int main(int argc, const char *argv[]) {
typedef struct {
char *name;
char *sex;
int age;
} student;
// Now I can do two things
student p;
// Or
student *ptr = (student *)malloc(sizeof(student));
return 0;
}
Why is it needed to allocate memory when I can just use student p;?
malloc is used for dynamic memory allocation. As said, it is dynamic allocation which means you allocate the memory at run time. For example, when you don't know the amount of memory during compile time.
One example should clear this. Say you know there will be maximum 20 students. So you can create an array with static 20 elements. Your array will be able to hold maximum 20 students. But what if you don't know the number of students? Say the first input is the number of students. It could be 10, 20, 50 or whatever else. Now you will take input n = the number of students at run time and allocate that much memory dynamically using malloc.
This is just one example. There are many situations like this where dynamic allocation is needed.
Have a look at the man page malloc(3).
You use malloc when you need to allocate objects that must exist beyond the lifetime of execution of the current block (where a copy-on-return would be expensive as well), or if you need to allocate memory greater than the size of that stack (i.e., a 3 MB local stack array is a bad idea).
Before C99 introduced VLAs, you also needed it to perform allocation of a dynamically-sized array. However, it is needed for creation of dynamic data structures like trees, lists, and queues, which are used by many systems. There are probably many more reasons; these are just a few.
Expanding the structure of the example a little, consider this:
#include <stdio.h>
int main(int argc, const char *argv[]) {
typedef struct {
char *name;
char *sex;
char *insurance;
int age;
int yearInSchool;
float tuitionDue;
} student;
// Now I can do two things
student p;
// Or
student *p = malloc(sizeof *p);
}
C is a language that implicitly passes by value, rather than by reference. In this example, if we passed 'p' to a function to do some work on it, we would be creating a copy of the entire structure. This uses additional memory (the total of how much space that particular structure would require), is slower, and potentially does not scale well (more on this in a minute). However, by passing *p, we don't pass the entire structure. We only are passing an address in memory that refers to this structure. The amount of data passed is smaller (size of a pointer), and therefore the operation is faster.
Now, knowing this, imagine a program (like a student information system) which will have to create and manage a set of records in the thousands, or even tens of thousands. If you pass the whole structure by value, it will take longer to operate on a set of data, than it would just passing a pointer to each record.
Let's try and tackle this question considering different aspects.
Size
malloc allows you to allocate much larger memory spaces than the one allocated simply using student p; or int x[n];. The reason being malloc allocates the space on heap while the other allocates it on the stack.
The C programming language manages memory statically, automatically, or dynamically. Static-duration variables are allocated in main memory, usually along with the executable code of the program, and persist for the lifetime of the program; automatic-duration variables are allocated on the stack and come and go as functions are called and return. For static-duration and automatic-duration variables, the size of the allocation must be compile-time constant (except for the case of variable-length automatic arrays[5]). If the required size is not known until run-time (for example, if data of arbitrary size is being read from the user or from a disk file), then using fixed-size data objects is inadequate. (from Wikipedia)
Scope
Normally, the declared variables would get deleted/freed-up after the block in which it is declared (they are declared on the stack). On the other hand, variables with memory allocated using malloc remain till the time they are manually freed up.
This also means that it is not possible for you to create a variable/array/structure in a function and return its address (as the memory that it is pointing to, might get freed up). The compiler also tries to warn you about this by giving the warning:
Warning - address of stack memory associated with local variable 'matches' returned
For more details, read this.
Changing the Size (realloc)
As you may have guessed, it is not possible by the normal way.
Error detection
In case memory cannot be allocated: the normal way might cause your program to terminate while malloc will return a NULL which can easily be caught and handled within your program.
Making a change to string content in future
If you create store a string like char *some_memory = "Hello World"; you cannot do some_memory[0] = 'h'; as it is stored as string constant and the memory it is stored in, is read-only. If you use malloc instead, you can change the contents later on.
For more information, check this answer.
For more details related to variable-sized arrays, have a look at this.
malloc = Memory ALLOCation.
If you been through other programming languages, you might have used the new keyword.
Malloc does exactly the same thing in C. It takes a parameter, what size of memory needs to be allocated and it returns a pointer variable that points to the first memory block of the entire memory block, that you have created in the memory. Example -
int *p = malloc(sizeof(*p)*10);
Now, *p will point to the first block of the consecutive 10 integer blocks reserved in memory.
You can traverse through each block using the ++ and -- operator.
In this example, it seems quite useless indeed.
But now imagine that you are using sockets or file I/O and must read packets from variable length which you can only determine while running. Or when using sockets and each client connection need some storage on the server. You could make a static array, but this gives you a client limit which will be determined while compiling.

When should I use malloc in C and when don't I?

I understand how malloc() works. My question is, I'll see things like this:
#define A_MEGABYTE (1024 * 1024)
char *some_memory;
size_t size_to_allocate = A_MEGABYTE;
some_memory = (char *)malloc(size_to_allocate);
sprintf(some_memory, "Hello World");
printf("%s\n", some_memory);
free(some_memory);
I omitted error checking for the sake of brevity. My question is, can't you just do the above by initializing a pointer to some static storage in memory? perhaps:
char *some_memory = "Hello World";
At what point do you actually need to allocate the memory yourself instead of declaring/initializing the values you need to retain?
char *some_memory = "Hello World";
is creating a pointer to a string constant. That means the string "Hello World" will be somewhere in the read-only part of the memory and you just have a pointer to it. You can use the string as read-only. You cannot make changes to it. Example:
some_memory[0] = 'h';
Is asking for trouble.
On the other hand
some_memory = (char *)malloc(size_to_allocate);
is allocating a char array ( a variable) and some_memory points to that allocated memory. Now this array is both read and write. You can now do:
some_memory[0] = 'h';
and the array contents change to "hello World"
For that exact example, malloc is of little use.
The primary reason malloc is needed is when you have data that must have a lifetime that is different from code scope. Your code calls malloc in one routine, stores the pointer somewhere and eventually calls free in a different routine.
A secondary reason is that C has no way of knowing whether there is enough space left on the stack for an allocation. If your code needs to be 100% robust, it is safer to use malloc because then your code can know the allocation failed and handle it.
malloc is a wonderful tool for allocating, reallocating and freeing memory at runtime, compared to static declarations like your hello world example, which are processed at compile-time and thus cannot be changed in size.
Malloc is therefore always useful when you deal with arbitrary sized data, like reading file contents or dealing with sockets and you're not aware of the length of the data to process.
Of course, in a trivial example like the one you gave, malloc is not the magical "right tool for the right job", but for more complex cases ( creating an arbitrary sized array at runtime for example ), it is the only way to go.
If you don't know the exact size of the memory you need to use, you need dynamic allocation (malloc). An example might be when a user opens a file in your application. You will need to read the file's contents into memory, but of course you don't know the file's size in advance, since the user selects the file on the spot, at runtime. So basically you need malloc when you don't know the size of the data you're working with in advance. At least that's one of the main reasons for using malloc. In your example with a simple string that you already know the size of at compile time (plus you don't want to modify it), it doesn't make much sense to dynamically allocate that.
Slightly off-topic, but... you have to be very careful not to create memory leaks when using malloc. Consider this code:
int do_something() {
uint8_t* someMemory = (uint8_t*)malloc(1024);
// Do some stuff
if ( /* some error occured */ ) return -1;
// Do some other stuff
free(someMemory);
return result;
}
Do you see what's wrong with this code? There's a conditional return statement between malloc and free. It might seem okay at first, but think about it. If there's an error, you're going to return without freeing the memory you allocated. This is a common source of memory leaks.
Of course this is a very simple example, and it's very easy to see the mistake here, but imagine hundreds of lines of code littered with pointers, mallocs, frees, and all kinds of error handling. Things can get really messy really fast. This is one of the reasons I much prefer modern C++ over C in applicable cases, but that's a whole nother topic.
So whenever you use malloc, always make sure your memory is as likely to be freed as possible.
char *some_memory = "Hello World";
sprintf(some_memory, "Goodbye...");
is illegal, string literals are const.
This will allocate a 12-byte char array on the stack or globally (depending on where it's declared).
char some_memory[] = "Hello World";
If you want to leave room for further manipulation, you can specify that the array should be sized larger. (Please don't put 1MB on the stack, though.)
#define LINE_LEN 80
char some_memory[LINE_LEN] = "Hello World";
strcpy(some_memory, "Goodbye, sad world...");
printf("%s\n", some_memory);
One reason when it is necessary to allocate the memory is if you want to modify it at runtime. In that case, a malloc or a buffer on the stack can be used. The simple example of assigning "Hello World" to a pointer defines memory that "typically" cannot be modified at runtime.

Resources