Questions about dynamic memory allocation in C - c

What is the difference between
int size;
int *arr;
scanf("%i", &size);
arr = malloc(size * sizeof(*arr));
and
int size;
scanf("%i", &size);
int arr[size];
When I want to allocate memory for 2 big numbers I would use the
following code:
unsigned long *big_nums;
big_nums = malloc(2 * sizeof(*big_nums));
I would access the first big number using big_nums[0] and the second
one with big_nums[1]. Let's say unsigned long is 4 bytes big,
then the code would allocate 2 * 4 = 8 bytes. Let's say I do
something like this:
unsigned long *big_nums;
big_nums = malloc(7);
Using big_nums[0] is clear for me, but how about big_nums[1]? Will
it cause some kind of segmentation fault error or what?

There are two places to get memory from: the stack and the heap. The stack is where you allocate short lived things, and the heap is for allocating long term things.
malloc() allocates from the heap, and int arr[size] allocates from the stack.
When your function exits, arr will be disposed of automatically, but memory obtained from malloc() will not: it stays allocated until you call free(). Forgetting to free it leads to what's called a "memory leak".
big_nums = malloc(7);
will indeed be an error if you access big_nums[1]. In general the standard says behavior is "undefined" which means it could work, or might not.
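As a minimal sketch of the heap case (the helper name make_pair is made up for illustration), sizing the request from the element type avoids the malloc(7) problem, and the caller becomes responsible for calling free():

```c
#include <stdlib.h>

/* Hypothetical helper: allocates room for exactly two unsigned longs
   on the heap and stores the given values. Returns NULL on failure.
   The caller owns the memory and must release it with free(). */
unsigned long *make_pair(unsigned long first, unsigned long second)
{
    unsigned long *big_nums = malloc(2 * sizeof *big_nums);
    if (big_nums == NULL)
        return NULL;
    big_nums[0] = first;   /* both indices lie inside the allocation */
    big_nums[1] = second;
    return big_nums;
}
```

Because the size is written as 2 * sizeof *big_nums, the request stays correct whether unsigned long is 4 or 8 bytes on the target platform.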

For Q#1: The second version will (try to) allocate a variable-length array on the stack. In C99 onwards, this is possible; but in traditional C variable-length arrays don't exist, and you must roll them yourself using malloc.

For Q#2: You will be allowed to make that error. And when you write to the second element of the array, you will overwrite one byte that does not "belong" to you.
My guess is that in most cases, nothing bad will happen because malloc(7) will secretly be equivalent to malloc(8). But there is NO GUARANTEE of this. Anything could happen, including a segfault or something worse.
By the way, if you have two separate questions, it would be best to write them up as two separate questions. You get more points that way.

Related

Dynamic allocation with malloc

In what ways is
int main(int argc, char **argv)
{
int N = atoi(argv[1]);
int v[N];
...some code here...
}
better than
int main(int argc, char **argv)
{
int count = atoi(argv[1]);
int *N = malloc(count * sizeof(int));
...some code here...
free(N);
}
?
Is it less error-prone?
P.S: I know where the two will typically be allocated.
Edit: Initialized count.
It's not "better" per se. It does a different thing:
int v[count];
is a variable-length array in C99 and later. That means it has a lifetime: The moment it goes out of scope (the function in which it was created is finished), it's forgotten.
int *N = malloc(count * sizeof(int));
on the other hand allocates that memory on the heap. That means it lives as long as you do not explicitly free it (or the program ends, which is at the end of your main, anyway, so this might not be the best code to illustrate the difference on)!
Now, having to remember that you have allocated some memory that you need to deallocate later is a hassle. Programmers get that wrong all the time, and that leads to either memory leaks (you forgetting to free memory you're not using anymore, leading to the memory used by a program growing over time) or use-after-free bugs (you using memory after you've already freed it, which has undefined results and can lead to serious bugs, including security problems).
On the other hand, your stack space is small. The default is typically somewhere between 1 MB and 8 MB on desktop systems, and much less on embedded ones. Try passing 1048576 to your program as argv[1]! That's a meager 1 million ints you're creating, which probably means you're allocating 4 MB of RAM. Compared to the gigabytes of RAM your computer most likely has, that's simply not much data, right?
But your int v[1048576] can plainly fail, because it may be more than a function is allowed to have as "scratchpad" space for itself. So, in general, for variable-length things where you cannot know beforehand that they are really small, your second solution with malloc is "better", because it actually works.
General remark: Memory management in C is hard. That's why languages (C++ and others) have come up with solutions where you can have something that can be made aware of its own lifetime, so that you cannot forget to delete something, because the object itself knows when it is no longer used.
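To illustrate the lifetime difference described above, here is a small sketch (the function name make_sequence is an assumption, not something from the question). The heap array survives the return, which a local variable-length array could not do:

```c
#include <stdlib.h>

/* Hypothetical helper: returns a heap array holding 0, 1, ..., count-1.
   The array outlives this function; the caller must free() it.
   Returns NULL if count is 0 or the allocation fails.
   Note: this sketch does not guard against count * sizeof overflowing. */
int *make_sequence(size_t count)
{
    if (count == 0)
        return NULL;
    int *values = malloc(count * sizeof *values);
    if (values == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++)
        values[i] = (int)i;
    return values;   /* still valid after return, unlike a local VLA */
}
```

A caller would typically write int *v = make_sequence(1048576); check v for NULL, use it, and then call free(v) exactly once.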

Why use the malloc function in C when we can declare arrays using arr[size], taking input from the user for size?

Why use the malloc function when we can write the code in C like this:
int size;
printf("please the size of the array\n");
scanf("%d",&size);
int arr[size];
This eliminates the possibility of assigning a garbage value to the array size, and it also takes the size of the array at run time.
So why use dynamic memory allocation at all when it can be done like this?
This notation
int arr[size];
declares a VLA (variable-length array).
The usual way to implement them is to allocate them on the stack.
What is wrong with that?
The stack is usually relatively small; on my Linux box it is only 8 MB.
So if you try to run the following code
#include <stdio.h>
const int MAX_BUF=10000000;
int main()
{
char buf[MAX_BUF];
int idx;
for( idx = 0 ; idx < MAX_BUF ; idx++ )
buf[idx]=10;
}
it will end with a segmentation fault.
TL;DR version
PRO:
VLAs are OK for small allocations. You don't have to worry about freeing memory when leaving scope.
AGAINST:
They are unsafe for big allocations. You can't tell what size is safe to allocate (consider recursion, for example).
Besides the fact that VLA may encounter problems when their size is too large, there is a much more important thing with these: scope.
A VLA is allocated when the declaration is encountered and deallocated when the scope (the { ... }) is left. This has advantages (no function call needed for both operations) and disadvantages (you can't return it from a function or allocate several objects).
malloc allocates dynamically, so the memory chunk persists after return from the function you happen to be in, you can allocate with malloc several times (e.g. in a for loop), and you determine exactly when you deallocate (by a call to free).
Why not use the following:
int size;
printf("please the size of the array\n");
scanf("%d",&size);
int arr[size];
Insufficient memory. int arr[size]; may exceed resources and this goes undetected. @Weather Vane: code can detect failure with a NULL check using *alloc().
int *arr = malloc(sizeof *arr * size);
if (arr == NULL && size > 0) Handle_OutOfMemory();
int arr[size]; does not allow for an array size of 0. malloc(sizeof *arr * 0); is not a major problem. It may return NULL or a pointer on success, yet that can easily be handled.
Note: For array sizes, type size_t is best which is some unsigned integer type - neither too narrow, nor too wide. int arr[size]; is UB if size < 0. It is also a problem with malloc(sizeof *arr * size). An unqualified size is not a good idea with variable length array (VLA) nor *alloc().
VLAs, required in C99, are only optionally supported in a conforming C11 compiler.
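The points about size_t and detectable failure can be sketched as follows. The helper name alloc_ints is hypothetical, and treating a count of 0 as a failure is a design choice of this sketch rather than something the standard requires:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates count ints. Returns NULL if count is 0,
   if count * sizeof(int) would overflow size_t, or if malloc fails.
   The caller must free() a non-NULL result. */
int *alloc_ints(size_t count)
{
    if (count == 0 || count > SIZE_MAX / sizeof(int))
        return NULL;
    return malloc(count * sizeof(int));
}
```

If the size comes from scanf into a signed variable, the caller should reject negative values before converting to size_t, since a negative int converts to a huge unsigned value.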
What you write is indeed a possibility nowadays, but if you compile it with g++ it may issue warnings (which is generally a bad sign).
Another thing is that your arr[size] is stored on the stack, while malloc stores data on the heap, giving you much more space.
Connected with that is probably the main issue: you can actually change the size of your malloc'd arrays with realloc, or with free and another malloc. Your array is there for its whole scope, and you cannot even free it at some point to save space.
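Resizing with realloc can be sketched like this. The name resize_ints is an assumption for illustration; the important detail is assigning the result to a temporary first, so the original block is not lost if realloc fails:

```c
#include <stdlib.h>

/* Hypothetical helper: resizes *arr to hold new_count ints.
   On success updates *arr and returns 1. On failure (or when
   new_count is 0) returns 0 and leaves *arr untouched, so the
   old block is still valid and still owned by the caller. */
int resize_ints(int **arr, size_t new_count)
{
    if (new_count == 0)
        return 0;
    int *tmp = realloc(*arr, new_count * sizeof **arr);
    if (tmp == NULL)
        return 0;
    *arr = tmp;
    return 1;
}
```

Starting from int *a = NULL; works too, because realloc with a null pointer behaves like malloc. Existing elements are preserved up to the smaller of the old and new sizes.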

malloc non-deterministic behaviour

#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *arr = (int*)malloc(10);
int i;
for(i=0;i<100;i++)
{
arr[i]=i;
printf("%d", arr[i]);
}
return 0;
}
I am running above program and a call to malloc will allocate 10 bytes of memory and since each int variable takes up 2 bytes so in a way I can store 5 int variables of 2 bytes each thus making up my total 10 bytes which I dynamically allocated.
But on making a call to for-loop it is allowing me to enter values even till 99th index and storing all these values as well. So in a way if I am storing 100 int values it means 200 bytes of memory whereas I allocated only 10 bytes.
So where is the flaw with this code or how does malloc behave? If the behaviour of malloc is non-deterministic in such a manner then how do we achieve proper dynamic memory handling?
The flaw is in your expectations. You lied to the compiler: "I only need 10 bytes" when you actually wrote 100*sizeof(int) bytes. Writing beyond an allocated area is undefined behavior and anything may happen, ranging from nothing to what you expect to crashes.
If you do silly things expect silly behaviour.
That said, malloc is usually implemented to ask the OS for chunks of memory in sizes the OS prefers (like a page) and then manages that memory itself. This speeds up future mallocs, especially if you are using lots of mallocs with small sizes. It reduces the number of system calls, which are quite expensive.
First of all, on most current platforms the size of int is 4 bytes. You can check that with:
printf("the size of int is %zu\n", sizeof(int));
When you call the malloc function you allocate space in heap memory. The heap is a region set aside for dynamic allocation. There's no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time. Because your program is small and nothing else in the heap collides with it, you can run this loop with more than 100 values and it still appears to work.
When you know what you are doing with malloc, you can build programs with proper dynamic memory handling. When your code allocates improperly, the behaviour of the program is undefined. But you can use the gdb debugger to find where the segmentation fault occurs and what state the heap is in.
malloc behaves exactly as documented: it allocates n bytes of memory, nothing more. Your code might run on your PC, but operating on non-allocated memory is undefined behavior.
A small note...
int might not be 2 bytes; it varies across architectures and toolchains. When you want to allocate memory for n integer elements, you should use malloc( n * sizeof( int ) ).
In short, you manage dynamic memory with the other tools that the language provides ( sizeof, realloc, free, etc. ).
C doesn't do any bounds-checking on array accesses; if you define an array of 10 elements, and attempt to write to a[99], the compiler won't do anything to stop you. The behavior is undefined, meaning the compiler isn't required to do anything in particular about that situation. It may "work" in the sense that it won't crash, but you've just clobbered something that may cause problems later on.
When doing a malloc, don't think in terms of bytes, think in terms of elements. If you want to allocate space for N integers, write
int *arr = malloc( N * sizeof *arr );
and let the compiler figure out the number of bytes.
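Applied to the program in the question, a corrected sketch might look like the following. The function name sum_indices is made up; the point is that the allocation and the loop bound use the same element count:

```c
#include <stdlib.h>

/* Hypothetical helper: allocates count ints, stores i in element i,
   and returns the sum of all elements. Returns 0 for count == 0 and
   -1 if the allocation fails. */
long sum_indices(size_t count)
{
    if (count == 0)
        return 0;
    int *arr = malloc(count * sizeof *arr);   /* elements, not bytes */
    if (arr == NULL)
        return -1;
    long sum = 0;
    for (size_t i = 0; i < count; i++) {      /* same count as above */
        arr[i] = (int)i;
        sum += arr[i];
    }
    free(arr);
    return sum;
}
```

With count set to 100, every write from arr[0] to arr[99] now lands inside the allocated block.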

Basic array usage in C?

Is this how you guys get size of an array in ANSI-C99? Seems kind of, um clunky coming from higher language.
int tests[7];
for (int i=0; i<sizeof(tests)/sizeof(int); i++) {
tests[i] = rand();
}
Also this Segmentation faults.
int r = 10000000;
printf ("r: %i\n", r);
int tests[r];
run it:
r: 10000000
Segmentation fault
10000000 seg faults, but 1000000 works.
How do I get more info out of this? What should I be checking and how would I debug something like this? Is there a limit on C arrays? What's a segmentation fault?
Getting size of an array in C is easy. This will give you the size of array in bytes.
sizeof(x)
But I guess what you require is number of elements, in that case it would be:
sizeof(x) / sizeof(x[0])
You can write a simple macro for this:
#define NumElements(x) (sizeof(x) / sizeof(x[0]))
For example:
int a[10];
int size_a = sizeof(a); /* size in bytes */
int numElm = NumElements(a); /* number of elements, here 10 */
Why calculate the size?
Define a constant containing the size and use that when declaring the array. Reference the constant whenever you want the size of the array.
As a primarily C++ programmer, I'll say that historically the constant was often defined as an enum value or a #define. In C, that may be current rather than historic, though - I don't know how current C handles "const".
If you really want to calculate the size, define a macro to do it. There may even be a standard one.
The reason for the segfault is most likely because the array you're trying to declare is about 40 megabytes worth, and is declared as a local variable. Most operating systems limit the size of the stack. Keep your array on the heap or in global memory, and 40 megabytes for one variable will probably be OK for most systems, though some embedded systems may still cry foul. In a language like Java, all objects are on the heap, and only references are kept on the stack. This is a simple and flexible system, but often much less efficient than storing data on the stack (heap allocation overheads, avoidable heap fragmentation, indirect access overheads...).
Arrays in C don't know how big they are, so yes, you have to do the sizeof array / sizeof array[0] trick to get the number of elements in an array.
As for the segfault issue, I'm guessing that you exceeded your stack size by attempting to allocate 10000000 * sizeof(int) bytes. A rule of thumb is that if you need more than a few hundred bytes, allocate it dynamically using malloc or calloc instead of trying to create a large auto variable:
int r = 10000000;
int *tests = malloc(sizeof *tests * r);
Note that you can treat tests as though it were an array type in most circumstances (i.e., you can subscript it, you can pass it to any function that expects an array, etc.), but it is not an array type; it is a pointer type, so the sizeof tests / sizeof tests[0] trick won't work.
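A short sketch of both situations follows. The macro name ARRAY_LEN and the struct int_buffer are assumptions chosen for illustration: the sizeof trick is used only on a real array, while the heap version carries its element count alongside the pointer:

```c
#include <stddef.h>
#include <stdlib.h>

#define ARRAY_LEN(x) (sizeof(x) / sizeof((x)[0]))

/* Works: tests is a real array here, so sizeof sees all 7 elements. */
size_t local_array_len(void)
{
    int tests[7];
    return ARRAY_LEN(tests);
}

/* For heap memory, keep the element count next to the pointer. */
struct int_buffer {
    int *data;
    size_t len;
};

/* Hypothetical constructor: on failure data is NULL and len is 0.
   The caller releases the memory with free(buf.data). */
struct int_buffer int_buffer_create(size_t len)
{
    struct int_buffer buf = { NULL, 0 };
    buf.data = malloc(len * sizeof *buf.data);
    if (buf.data != NULL)
        buf.len = len;
    return buf;
}
```

Passing the struct (or the pointer together with its length) to other functions replaces the sizeof calculation, which would only measure the pointer itself.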
Traditionally, an array has a static size. So we can do
#define LEN 10
int arr[LEN];
but not
int len;
scanf("%d", &len);
int arr[len]; // bad!
Since we know the size of an array at compile time, getting the size of an array tends to be trivial. We don't need sizeof because we can figure out the size by looking at our declaration.
C++ provides heap arrays, as in
int len;
scanf("%d", &len);
int *arr = new int[len];
but since this involves pointers instead of stack arrays, we have to store the size in a variable which we pass around manually.
I suspect that it is because of integer overflow. Try printing the value using a printf:
printf("%d", 10000000);
If it prints a negative number - that is the issue.
Stack Overflow! Try allocating on the heap instead of the stack.

memory allocation in C

I have a question regarding memory allocation order.
In the following code I allocate in a loop 4 strings.
But when I print the addresses they don't seem to be allocated one after the other... Am I doing something wrong or is it some sort of defense mechanism implemented by the OS to prevent possible buffer overflows? (I use Windows Vista).
Thank you.
char **stringArr;
int size=4, i;
stringArr=(char**)malloc(size*sizeof(char*));
for (i=0; i<size; i++)
stringArr[i]=(char*)malloc(10*sizeof(char));
strcpy(stringArr[0], "abcdefgh");
strcpy(stringArr[1], "good-luck");
strcpy(stringArr[2], "mully");
strcpy(stringArr[3], "stam");
for (i=0; i<size; i++) {
printf("%s\n", stringArr[i]);
printf("%d %u\n\n", &(stringArr[i]), stringArr[i]);
}
Output:
abcdefgh
9650064 9650128
good-luck
9650068 9638624
mully
9650072 9638680
stam
9650076 9638736
Typically when you request memory through malloc(), the C runtime library will round the size of your request up to some minimum allocation size. It does this for two reasons:
the runtime library needs room for its bookkeeping information
it's more efficient for the runtime library to manage allocated blocks that are all multiples of some size (such as 16 bytes)
However, these are implementation details and you can't really rely on any particular behaviour of malloc().
But when I print the addresses they don't seem to be allocated one after the other...
So?
Am I doing something wrong or is it some sort of defense mechanism implemented by the OS to prevent possible buffer overflows?
Probably "neither".
Just out of interest, what addresses do you get?
You shouldn't depend on any particular ordering or spacing of values returned by malloc. It behaves in mysterious and unpredictable ways.
Typically it is reasonable to expect that a series of chronological allocations will result in memory addresses that are somehow related, but as others have pointed out, it is certainly not a requirement of the heap manager. In this particular case, though, it is possible that you are seeing results of the low fragmentation heap. Windows keeps lists of small chunks of memory that can quickly satisfy a request. These could be in any order.
You can't depend on malloc to give you contiguous addresses. It's entirely up to the implementation and presumably the current state of the heap; some implementations may, many won't.
If you need the addresses to be contiguous, allocate one large block of memory and set up your pointers to point to different areas within it.
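One way to sketch that single-block approach is shown below. The function name alloc_string_slots and its parameters are hypothetical; it makes two allocations in total, one for the pointer table and one for all the string storage:

```c
#include <stdlib.h>

/* Hypothetical helper: allocates count strings of slot_size bytes each
   in ONE contiguous block, plus a pointer table whose entries point
   into that block. Returns NULL on failure or if either size is 0.
   Release with: free(table[0]); free(table); */
char **alloc_string_slots(size_t count, size_t slot_size)
{
    if (count == 0 || slot_size == 0)
        return NULL;
    char **table = malloc(count * sizeof *table);
    char *block = calloc(count, slot_size);  /* zeroed: every slot is "" */
    if (table == NULL || block == NULL) {
        free(table);
        free(block);
        return NULL;
    }
    for (size_t i = 0; i < count; i++)
        table[i] = block + i * slot_size;    /* slots are back to back */
    return table;
}
```

With alloc_string_slots(4, 10), consecutive entries are exactly 10 bytes apart, and each slot can hold a string of up to 9 characters plus the terminating null.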
As others have mentioned, there is no standard to specify in which order the memory blocks allocated by malloc() should be located in the memory. For example, freed blocks can be scattered all around the heap and they may be re-used in any order.
But even if the blocks happen to be one after each other, they most likely do not form a contiguous block. In order to reduce fragmentation, the heap manager only allocates blocks of specific size, for example power of two (64, 128, 256, 512 etc. bytes). So, if you reserve 10 bytes for a string, there may be perhaps 22 or 54 un-used bytes after that.
The memory overhead is another reason why it is not a good idea to use dynamic memory allocation unless really necessary. It is much easier and safer just to use a static array.
Since you are interested in knowing the addresses returned by malloc(), you should make sure you are printing them properly. By "properly", I mean that you should use the right format specifier for printf() to print addresses. Why are you using "%u" for one and "%d" for another?
You should use "%p" for printing pointers. This is also one of the rare cases where you need a cast in C: because printf() is a variadic function, the compiler can't tell that the pointers you're passing to it need to be converted to void *.
Also, you shouldn't cast the return value of malloc().
Having fixed the above, the program is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
char **stringArr;
int size=4, i;
stringArr = malloc(size * sizeof *stringArr);
for (i=0; i < size; i++)
stringArr[i] = malloc(10 * sizeof *stringArr[i]);
strcpy(stringArr[0], "abcdefgh");
strcpy(stringArr[1], "good-luck");
strcpy(stringArr[2], "mully");
strcpy(stringArr[3], "stam");
for (i=0; i<size; i++) {
printf("%s\n", stringArr[i]);
printf("%p %p\n", (void *)(&stringArr[i]), (void *)(stringArr[i]));
}
return 0;
}
and I get the following output when I run it:
abcdefgh
0x100100080 0x1001000a0
good-luck
0x100100088 0x1001000b0
mully
0x100100090 0x1001000c0
stam
0x100100098 0x1001000d0
On my computer, char * pointers are 8 bytes long, so &stringArr[i+1] is 8 bytes greater than &stringArr[i]. This is guaranteed by the standard: If you malloc() some space, that space is contiguous. You allocated space for 4 pointers, and the addresses of those four pointers are next to each other. You can see that more clearly by doing:
printf("%d\n", (int)(&stringArr[1] - &stringArr[0]));
This should print 1.
About the subsequent malloc()s, since each stringArr[i] is obtained from a separate malloc(), the implementation is free to assign any suitable addresses to them. On my implementation, with that particular run, the addresses are all 0x10 bytes apart.
For your implementation, it seems like char * pointers are 4 bytes long.
About your individual string's addresses, it does look like malloc() is doing some sort of randomization (which it is allowed to do).

Resources