Dynamically allocating multiple big arrays in C - c

I'm writing a program in C on Windows that launches 30 threads, each of which needs an array of int16_t.
The size is calculated before the thread function is called; in the example I'm working with it's around 250 million elements per array. That totals around 15 GB, which should not be a problem, because I have 128 GB of RAM available.
I've tried using both malloc and calloc inside the thread function, but over half of the allocations return NULL with errno set to 12 (ENOMEM).
With a small number of threads (up to 3) it works fine though, and the same goes for a single thread allocating an unreasonably big array.
My next attempt to solve this issue was to create an array of pointers in main, allocate the arrays there, and pass them as arguments to the threads; the same thing happened.
From these results my best guess is that it can't allocate contiguous blocks of memory of that size, so I also tried allocating many smaller arrays, which didn't work either. Is this expected behaviour, or am I doing something wrong?

Related

Result of using millions of malloc()s and free()s in your C code?

I was recently asked this question in an interview.
Suppose there is a large library of C programs and each program constantly malloc()s and free()s blocks of data. What do you think will happen if there are a million calls to malloc() and free() in one run of your program. What will you add to your answer if you have been given a very large memory heap storage?
One thing that may happen is that your memory will become fragmented, especially if you allocate blocks of different sizes.
Thus, if your memory is not large, some malloc calls may fail even though the total free memory is bigger than the amount requested.
This is really a stupid question without more qualifiers. Suppose you do

for (;;)
{
    free(malloc(SOMEVALUE));
}

In that case very little is going to happen.
Let's assume that mallocs and frees occur in a random order. If you have a malloc implementation that uses fixed-size blocks, you are going to get a different result than if you use one with variable-size blocks (= memory fragmentation).
The result you get is entirely dependent upon the malloc implementation and the sequence of the calls to malloc and free.

why does bigger malloc cause exception? [duplicate]

First of all, I noticed that when I malloc memory vs. calloc it, the memory footprint is different. I am working with datasets of several GB. It is ok for this data to be random.
I expected that I could just malloc a large amount of memory and read whatever random data was in it cast to a float. However, looking at the memory footprint in the process viewer the memory is obviously not being claimed (vs. calloc where I see a large foot print). I ran a loop to write data into the memory and then I saw the memory footprint climb. Am I correct in saying that the memory isn't actually claimed until I initialize it?
Finally after I passed 1024*1024*128 bytes (1024 MB in the process viewer) I started getting segfaults. Calloc however seems to initialize the full amount up to 1 GB. Why do I get segfaults when initializing memory in a for loop with malloc at this number 128MB and why does the memory footprint show 1024MB?
If I malloc a large amount of memory and then read from it, what am I getting (since the process viewer shows almost no footprint until I initialize it)?
Finally is there any way for me to alloc more than 4GB? I am testing memory hierarchy performance.
Code for #2:

long long int i;
long long int *test = (long long int *)malloc(1024 * 1024 * 1024);
for (i = 0; i < 1024 * 1024 * 128; i++)
    test[i] = i;
sleep(15);
Some notes:
As the comments note, Linux doesn't actually allocate your memory until you use it.
When you use calloc instead of malloc, it zeroes out all the memory you requested. Zeroing the memory counts as using it, so the pages are committed immediately.
1- If you are working on a 32-bit machine, you can't have more than 2 GB allocated to a single variable.
2- If you are working on a 64-bit machine, you can allocate as much as RAM + swap in total; however, allocating it all for one variable requires a big contiguous chunk of memory, which might not be available. Try it with a linked list, where each element has only 1 MB assigned, and you can achieve a higher total amount of allocated memory.
3- As noted by you and Sharth, unless you use your memory, Linux won't allocate it.
Your #2 is failing with a segfault either because sizeof(long long int) > 8 or because your malloc returned NULL. That is very possible if you are requesting 1 GB of RAM.
More info on #2. From your 128 MB comment I get the idea that you may not realize what's happening. Because you declare the array pointer as long long int, the size of each array element is 8 bytes. 1024 MB / 8 bytes == 128 M elements, so that is why your loop works. It did when I tried it, anyway.
Your for loop in your example code is actually touching 1GB of memory, since it is indexing 128*1024*1024 long longs, and each long long is 8 bytes.

Heap memory exploration with malloc

I've written a program that takes 3 numbers as input:
The size of memory to allocate on the heap with malloc()
Two int values
If q is an unsigned char pointer, it sets q[i] = b for every i from q[min] to q[max].
I thought that the heap was divided into pages and that the first call to malloc() would give a pointer to the first byte of a page of my process. So why, if I try to read q[-1], is my process not killed?
Then I tried with another pointer p, and I noticed that between the two pointers there is a distance of 32 bytes. Why are they not adjacent?
The last thing I noticed is that both at p[-8] = q[-40] (-32 - 8) and at q[-8] there is the number 33 (00100001), with all the other bytes set to 0. Does it mean anything?
Thank you!
I thought that the heap was divided into pages and that the first call to malloc would give a pointer to the first byte of a page of my process. So why, if I try to read q[-1], is my process not killed?
Most likely because your malloc implementation stores something there. Possibly the size of the block.
Then I tried with another pointer p, and I noticed that between the two pointers there is a distance of 32 bytes. Why are they not adjacent?
Same reason. Your implementation probably stores the size of the block in the block just before the address it returns.
The last thing I noticed is that both at p[-8] = q[-40] (-32 - 8) and at q[-8] there is the number 33 (00100001). Does it mean anything?
It probably means something to your malloc implementation. But you can't really tell what without looking at the implementation.
The standard library uses the heap before calling main, so anything you do won't be on a clean heap.
The heap implementation usually uses about two pointers' worth of space at the start of an allocation, and the total size is usually aligned to two pointers.
The heap implementation usually uses a lot of bytes at the start of each system allocation; it can sometimes be close to a page in size.
The heap is allocated in chunks much bigger than a page; on Windows it is at least 16 pages.
The heap can be adjacent to other allocations; on Linux it appears right after the main executable, so underflowing it won't crash.

When should I use calloc over malloc

This is from Beej's guide to C
"The drawback to using calloc() is that it takes time to clear memory, and in most cases, you don't need it clear since you'll just be writing over it anyway. But if you ever find yourself malloc()ing a block and then setting the memory to zero right after, you can use calloc() to do that in one call."
So what is a potential scenario in which I will want to clear memory to zero?
When the function you are passing a buffer to states in its documentation that a buffer must be zero-filled. You may also always zero out the memory for safety; it doesn't actually take that much time unless the buffers are really huge. Memory allocation itself is the potentially expensive part of the operation.
One scenario is where you are allocating an array of integers, (say, as accumulators or counter variables) and you want each element in the array to start at 0.
In cases where you allocate memory for a structure and some member of that structure may later be evaluated in an expression or a conditional statement, using it without initialization would give you undefined behaviour. To avoid this, you can either:
1> malloc the structure and memset it to 0 before using it, or
2> calloc the structure.
Note: some memory management implementations also reset memory to 0 on malloc.
There are lots of times when you might want memory zeroed!
Some examples:
Allocating memory to contain a structure, where you want all the members initialised to zero
Allocating memory for an array of chars which you are later going to write some number of chars into, and then treat as a NULL-terminated string
Allocating memory for an array of pointers which you want initialised to NULL
If all allocated memory is zero-filled, the program's behavior is much more reproducible (so the behavior is more likely the same if you re-run your program). This is why I don't use uninitialized malloc zones.
(for similar reasons, when debugging a C or C++ program on Linux, I usually do echo 0 > /proc/sys/kernel/randomize_va_space so that mmap behavior is more reproducible).
And if your program does not allocate huge blocks (i.e. dozens of megabytes), the time spent inside malloc is much bigger than the time to zero it.

calloc v/s malloc and time efficiency

I've read with interest the post C difference between malloc and calloc. I'm using malloc in my code and would like to know what difference I'll have using calloc instead.
My present (pseudo)code with malloc:
Scenario 1

int main()
{
    allocate large arrays with malloc
    INITIALIZE ALL ARRAY ELEMENTS TO ZERO
    for loop //say 1000 times
        do something and write results to arrays
    end for loop
    FREE ARRAYS with free command
} //end main

If I use calloc instead of malloc, then I'll have:
Scenario 2

int main()
{
    for loop //say 1000 times
        ALLOCATION OF ARRAYS WITH CALLOC
        do something and write results to arrays
        FREE ARRAYS with free command
    end for loop
} //end main
I have three questions:
Which of the scenarios is more efficient if the arrays are very large?
Which of the scenarios will be more time efficient if the arrays are very large?
In both scenarios, I'm just writing to arrays in the sense that, for any given iteration of the for loop, I write each array sequentially from the first element to the last. The important question: if I'm using malloc as in Scenario 1, is it necessary to initialize the elements to zero? Say with malloc I have array z = [garbage1, garbage2, garbage3]. I write elements sequentially, i.e. after the first iteration I have z = [some_result, garbage2, garbage3], after the second z = [some_result, another_result, garbage3], and so on. Do I still need to initialize my arrays after malloc?
Assuming the total amount of memory being initialized in your two examples is the same, allocating the memory with calloc() might be faster than allocating the memory with malloc() and then zeroing them out in a separate step, especially if in the malloc() case you zero the elements individually by iterating over them in a loop. A malloc() followed by a memset() will likely be about as fast as calloc().
If you do not care that the array elements are garbage before you actually store the computation results in them, there is no need to actually initialize your arrays after malloc().
For 1 and 2, both do the same thing: allocate and zero, then use the arrays.
For 3, if you don't need to zero the arrays first, then zeroing is unnecessary and not doing it is faster.
There is a possibility that calloc's zeroing is more efficient than the code you write, but this difference will be small compared to the rest of the work the program does. The real savings of calloc is not having to write that code yourself.
Your point stated in 3. seems to indicate a case of unnecessary initialization. That is pretty bad speed-wise: not only is the time spent doing it wasted, but it also causes a whole lot of cache eviction.
Doing a memset() or bzero() (which calloc() effectively does anyway) is a good way to invalidate a huge portion of your cache. Don't do it unless you are sure you won't overwrite everything yet may read parts of the buffer that have not been written (i.e. only if 0 is an acceptable default value). If you write over everything anyway, by all means don't initialize your memory unnecessarily.
Unnecessary memory writes will not only ruin your app's performance but also the performance of all applications sharing the same CPU with it.
The calloc and memset approaches should be about the same, and maybe slightly faster than zeroing it yourself.
Regardless, it's all relative to what you do inside your main loop, which could be orders of magnitude larger.
malloc is faster than calloc. The reason is that malloc returns memory from the operating system as-is, whereas calloc obtains the memory and initializes it to zero before returning it to you.
The initialization takes time; that's why malloc is faster than calloc.
I don't know about Linux, but on Windows there is something called the zero-page thread: calloc can use pages already initialized to zero, so there is no difference in speed between malloc and calloc.
malloc differs from calloc for two reasons:
malloc takes one argument whereas calloc takes two arguments
malloc returns the block uninitialized, whereas calloc must compute the total size from its two arguments and zero the whole block before returning it
I think that extra work is why malloc is faster than calloc.
