A technical question about dynamic memory allocation in C - c

I'm studying dynamic memory allocation in C, and I want to ask a question - let us suppose we have a program that receives a text input from the user. We don't know how long that text will be, it could be short, it could also be extremely long, so we know that we have to allocate memory to store the text in a buffer. In cases in which we receive a very long text, is there a way to find out whether we have enough memory space to allocate more memory to the text? Is there a way to have an indication that there is no memory space left?

You can use malloc() function if it returned NULL that means there no enough mem space but if it returned address of the mem it means there are mem space available example:
void* loc = malloc(sizeof(string));

ANSI C has no standard functions to get the size of available free RAM.
You may use platform-specific solutions.
C - Check currently available free RAM?

In C we typically use malloc, calloc and realloc for allocation of dynamic memory. As it has been pointed out in both answers and comments these functions return a NULL pointer in case of failure. So typical C code would be:
SomeType *p = malloc(size_required);
if (p == NULL)
{
// ups... malloc failed... add error handling
}
else
{
// great... p now points to allocated memory that we can use
}
I like to add that on (at least) Linux systems, the return value from malloc (and friends) is not really an out-of-memory indicator.
If the return value is NULL, we know the call failed and that we didn't get any memory that we can use.
But even if the return value is non-NULL, there is no guarantee that the memory really is available.
From https://man7.org/linux/man-pages/man3/free.3.html :
By default, Linux follows an optimistic memory allocation
strategy. This means that when malloc() returns non-NULL there
is no guarantee that the memory really is available. In case it
turns out that the system is out of memory, one or more processes
will be killed by the OOM killer.

We don't know how long that text will be
Sure we do, we always set a maximum limit. Because all user input needs to be sanitised anyway - so we always require a maximum limit on every single user input. If you don't do this, it likely means that your program is broken since it's vulnerable to buffer overruns.
Typically you'll read each line of user input into a local array allocated on the stack. Then you can check if it is valid (are strings null terminated etc) before allocating dynamic memory and then copy it over there.
By checking the return value of malloc etc you'll see if there was enough memory left or not.

There is no standard library function that tells you how much memory is available for use.
The best you can do within the bounds of the standard library is to attempt the allocation using malloc, calloc, or realloc and check the return value - it it’s NULL, then the allocation operation failed.
There may be system-specific routines that can provide that information, but I don’t know of any off the top of my head.

I made a test on linux with 8GB RAM. The overcommit has three main modes 0, 1 and 2 which are default, unlimited, and never:
Default:
$ echo 0 > /proc/sys/vm/overcommit_memory
$ ./a.out
After loop: Cannot allocate memory
size 17179869184
size 400000000
log2(size)34.000000
This means 8.5 GB were successfuly allocated, just about the amount of physical RAM. I tried to tweak it, but without changing swap, which is only 4 GB.
Unlimited:
$ echo 1 > /proc/sys/vm/overcommit_memory
$ ./a.out
After loop: Cannot allocate memory
size 140737488355328
size 800000000000
log2(size)47.000000
48 bits is virtual address size. 140 TB. Physical is only 39 bits (500 GB).
No overcommmit:
$ echo 2 > /proc/sys/vm/overcommit_memory
$ ./a.out
After loop: Cannot allocate memory
size 2147483648
size 80000000
log2(size)31.000000
2 GB is just what free command declares as free. Available are 4.6 GB.
malloc() fails in the same way if the process's resources are restricted - so this ENOMEM does not really specify much. "Cannot allocate memory" (aka ENOMEM aka 12) just says "malloc failed, guess why" or rather "malloc failed, NO more MEMory for you now.".
Well here is a.out which allocates doubling sizes until error.
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <math.h>
int main() {
size_t sz = 4096;
void *p;
while (!errno) {
p = malloc(sz *= 2);
free(p);
}
perror("After loop");
printf("size %ld\n", sz);
printf("size %lx\n", sz);
printf("log2(size)%f\n", log2((double)sz));
}
But I don't think this kind of probing is very useful/
Buffer
we have to allocate memory to store the text in a buffer
But not the whole text at once. With a real buffer (not just an allocated memory as destination) you could read portions of the input and store them away (out of memory onto disk).
Only disadvantage is: if I cannot use a partial input, then all the buffered copying and saving is wasted.
I really wonder what happens if I type and type fast for a couple of billion years -- without a newline.
We can allocate much more than we have as RAM, but we only need a fraction of that RAM: the buffer. But Lundin's answer shows it is much easier (typical) to rely on newlines and maximum length.
getline(3)
This gnu/posix function has the malloc/realloc built in. The paramters are a bit complicated, because a new pointer and size can be returned by reference. And return value of -1 can also mean ENOMEM, not end-of-file.
fgets() is the line-truncating version.
fread() is newline independant, with fixed size. (But you asked about text input - long lines or long overall text, or both?)
Good Q, good As, good comments about "live input":
getline how to limit amount of input as you can with fgets

Related

malloc() 5GB memory on a 32 bit machine

I was reading in a book:
The virtual address space of a process on a 32 bit machine is 2^32 i.e. 4Gb of space. And every address seen in the program is a virtual address. The 4GB of space is further goes through user/kernel split 3-1GB.
To better understand this, I did malloc() of 5Gb space and tried to print the all addresses. If I print the addresses, How is the application going to print whole 5Gb address when It has only 3GB of virtual address space? Am I missing something here?
malloc() takes size_t as an argument. On 32 bit system it's an alias to some unsigned 32 bit integer type. This means that you just cannot pass any value bigger than 2^32-1 as an argument for malloc() making it impossible request allocation of more than 4GB of memory using this function.
The same is true for all other functions that can be used to allocate memory. Ultimately they all end up as either brk() or mmap syscall. The length argument of mmap() is also of type ssize_t an in case of brk() you have to provide a pointer for the new end of your allocated space. The pointer is again 32 bit.
So there is absolutely no way to tell kernel you would like to get more than 4GB of memory allocated with one call) And it's not an accident - this just wouldn't make any sense anyway.
Now it's true that you could do several calls to malloc or other function that allocates memory, requesting more than 4GB in total. If you try this, the subsequent call (that would cause extending allocated memory to more than 3GB) will fail as there is just no address space available.
So I guess that you either didn't check the malloc return value or you did try to run code like this (or something similar):
int main() {
assert(malloc(5*1<<30));
}
and assumed that you succeeded in allocating 5GB without verifying that your argument overflowed and instead of requesting 5368709120 bytes, you requested 1073741824. One example to verify this on Linux is to use:
$ ltrace ./a.out
__libc_start_main(0x804844c, 1, 0xbfbcea74, 0x80484a0, 0x8048490 <unfinished ...>
malloc(1073741824) = 0x77746008
$
There's already a good answer. Just in case, the size of your virtual address space is easily verifiable like this:
#include <stdlib.h>
#include <stdio.h>
int main()
{
size_t size = (size_t)-1L;
void *foo;
printf("trying to allocate %zu bytes\n", size);
if (!(foo = malloc(size)))
{
perror("malloc()");
}
else
{
free(foo);
}
}
> gcc -m32 -omalloc malloc.c && ./malloc
trying to allocate 4294967295 bytes
malloc(): Cannot allocate memory
This must fail because parts of your address space are already occupied: by the mapped part of the kernel, by mapped shared libraries and by your program, of course.
You cannot do this because there is no function for you to alloc 5GB memory.

Is the memory chunk returned by malloc (and its cousins) initialized to Zero?

I wrote a code to test to stress test the memory management of Linux and Windows OS. Just for further tests I went ahead and checked what values are present in the memory returned by malloc().
The values that are being return are all 0 (zero). I have read the man page of malloc, checked on both Windows and Linux, but I am not able to find the reason for this behavior. According to the manpage the
The malloc() function allocates size bytes and returns a pointer to the allocated memory. The memory is not initialized.
To clear the memory segment, one has to manually use memset().
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>
int eat(long total,int chunk){
long i;
for(i=0;i<total;i+=chunk){
short *buffer=malloc(sizeof(char)*chunk);
if(buffer==NULL){
return -1;
}
printf("\nDATA=%d",*buffer);
memset(buffer,0,chunk);
}
return 0;
}
int main(int argc, char *argv[]){
int i,chunk=1024;
long size=10000;
printf("Got %ld bytes in chunks of %d...\n",size,chunk);
if(eat(size,chunk)==0){
printf("Done, press any key to free the memory\n");
getchar();
}else{
printf("ERROR: Could not allocate the memory");
}
}
Maybe I am missing something.
The code is adapted from here
EDIT: The problem has been been answered here for the GCC specific output. I believe Windows operating system would be also following the same procedures.
The memory returned by malloc() is not initialized, which means it may be anything. It may be zero, and it may not be; 'not initialized' means it could be anything (zero included). To get a guaranteed zeroed page use calloc().
The reason you are seeing zeroed pages (on Linux anyway) is that if an application requests new pages, these pages are zeroed by the OS (or more precisely they are copy-on-write images of a fixed page of zeroes known as the 'global zero page'). But if malloc() happens to use memory already allocated to the application which has since been freed (rather than expanding the heap) you may well see non-zero data. Note the zeroing of pages provided by the OS is an OS specific trait (primarily there for security so that one process doesn't end up with pages that happen to have data from another process), and is not mandated by the C standard.
You asked for a source for get_free_page zeroing the page: that says 'get_free_page() takes one parameter, a priority. ... It takes a page off of the free_page_list, updates mem_map, zeroes the page and returns the physical address of the page.' Here's another post that explains it well, and also explains why using calloc() is better than malloc()+memset().
Note that you aren't checking the entire allocated chunk for zero. You want something like this (untested):
int n;
char nonzero=0;
char *buffer=malloc(sizeof(char)*chunk);
if(buffer==NULL){
return -1;
}
for (n = 0; n<chunk; n++)
nonzero = nonzero || buffer[n];
printf("\nDATA=%s\n",nonzero?"nonzero":"zero");
You're absolutely correct; this behaviour is not guaranteed by the C language standard.
What you're observing could just be chance (you're only checking a couple of bytes in each allocation), or it could be an artifact of how your OS and C runtime library are allocating memory.
With this statement:
printf("\nDATA=%d",*buffer);
You only check the first sizeof(short) amount of bytes that have just been malloc()'ed (typically two (2) bytes).
Furthermore, the first time you may get lucky of getting all zeroes but after having had your program execute (and use) the heap memory then the contents-after-malloc() will be undefined.
the memory allocation function: calloc() will return a pointer to the 'new area and set all the bytes to zero.
The memory allocation function: realloc() will return a pointer to a (possibly new) area and have copied the bytes from the old area. The new area will be the 'new' requested length
The memory allocation function malloc will return a pointer to the new area but will not set the bytes to any specific value
The values that are being return are all 0 (zero).
But that's not guaranteed. It's because you're just running your program. If you malloc, random fill, and free a lot, you'll start noticing the previously freed memory is being reused, so you'll start to get non-zero chunks in your mallocs.
Yes you are right malloc() doesn't zero-initialize values. It arbitrarily pulls the amount of memory it's told to allocate from the heap, which essentially means there could be anything stored already within. You should therefore use malloc() only, where you're certain, that you are going to set it to a value. If you're going to do arithmetic with it right out of the box you might get some fishy results (I have already several times personally experienced this; you're going to have functional code with sometimes crazy output).
So set stuff you're not setting to a value to zero with memset(). Or my advise is to use calloc(). Calloc, other than malloc, does zero-initialize values. And is as far as I know faster than the combination of malloc() and memset() on the other hand malloc alone is faster than calloc. So try to find the fastest version possible at point of issue by keeping you're memory in form.
Look also at this post here: MPI matrix-vector-multiplication returns sometimes correct sometimes weird values. The question was a different one, but the cause the same.

How do I calculate beforehand how much memory calloc would allocate?

I basically have this piece of code.
char (* text)[1][80];
text = calloc(2821522,80);
The way I calculated it, that calloc should have allocated 215.265045 megabytes of RAM, however, the program in the end exceeded that number and allocated nearly 700mb of ram.
So it appears I cannot properly know how much memory that function will allocate.
How does one calculate that propery?
calloc (and malloc for that matter) is free to allocate as much space as it needs to satisfy the request.
So, no, you cannot tell in advance how much it will actually give you, you can only assume that it's given you the amount you asked for.
Having said that, 700M seems a little excessive so I'd be investigating whether the calloc was solely responsible for that by, for example, a program that only does the calloc and nothing more.
You might also want to investigate how you're measuring that memory usage.
For example, the following program:
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
int main (void) {
char (* text)[1][80];
struct mallinfo mi;
mi = mallinfo(); printf ("%d\n", mi.uordblks);
text = calloc(2821522,80);
mi = mallinfo(); printf ("%d\n", mi.uordblks);
return 0;
}
outputs, on my system:
66144
225903256
meaning that the calloc has allocated 225,837,112 bytes which is only a smidgeon (115,352 bytes or 0.05%) above the requested 225,721,760.
Well it depends on the underlying implementation of malloc/calloc.
It generally works like this - there's this thing called the heap pointer which points to the top of the heap - the area from where dynamic memory gets allocated. When memory is first allocated, malloc internally requests x amount of memory from the kernel - i.e. the heap pointer increments by a certain amount to make that space available. That x may or may not be equal to the size of the memory block you requested (it might be larger to account for future mallocs). If it isn't, then you're given at least the amount of memory you requested(sometimes you're given more memory because of alignment issues). The rest is made part of an internal free list maintained by malloc. To sum it up malloc has some underlying data structures and a lot depends on how they are implemented.
My guess is that the x amount of memory was larger (for whatever reason) than you requested and hence malloc/calloc was holding on to the rest in its free list. Try allocating some more memory and see if the footprint increases.

malloc() in C And Memory Usage

I was trying an experiment with malloc to see if I could allocate all the memory available.
I used the following simple program and have a few questions:
int main(void)
{
char * ptr;
int x = 100;
while(1)
{
ptr = (char *) malloc(x++ * sizeof(char) / 2);
printf("%p\n",ptr);
}
return 0;
}
1) Why is it that when using larger data types(int, unsigned long long int, long double) the process would use less memory but with smaller data types (int, char) it would use more?
2) When running the program, it would stop allocating memory after it reached a certain amount (~592mb on Windows 7 64-bit with 8GB RAM swap file set to system managed). The output of the print if showed 0 which means NULL. Why does it stop allocating memory after a reaching this threshold and not exhaust the system memory and swap?
I found someone in the following post trying the same thing as me, but the difference they were not seeing any difference in memory usage, but I am.
Memory Leak Using malloc fails
I've tried the code on Linux kernel 2.6.32-5-686 with similar results.
Any help and explanation would be appreciated.
Thanks,
1)Usually memory is allocated in multiples of pages, so if the size you asked for is less than a page malloc will allocate at least one page.
2)This makes sense, because in a multitasking system, you're not the only user and your process is not the only process running, there are many other processes that share a limited set of resources, including memory. If the OS allowed one process to allocate all the memory it needs without any limitation, then it's not really a good OS, right ?
Finally, in Linux, the kernel doesn't allocate any physical memory pages until after you actually start using this memory, so just calling malloc doesn't actually consume any physical memory, other than what is required to keep track of the allocation itself of course. I'm not sure about Windows though.
Edit:
The following example allocates 1GB of virtual memory
#include <stdio.h>
int main(int agrc, char **argv)
{
void *p = malloc(1024*1024*1024);
getc(stdin);
}
If you run top you get
top -p `pgrep test`
PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20 0 1027m 328 252 S 0 0.0 0:00.00 test
If you change malloc to calloc, and run top again you get
top -p `pgrep test`
PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20 0 1027m 1.0g 328 S 0 1.3 0:00.08 test
How are you reading your memory usage?
1) When allocating with char, you're allocating less memory per allocation than you do with for example long (one quarter as much, usually, but it's machine dependent)
Since most memory usage tools external to the program itself don't show allocated memory but actually used memory, it will only show the overhead malloc() itself uses instead of the unused memory you malloc'd.
More allocations, more overhead.
You should get a very different result if you fill the malloc'd block with data for each allocation so the memory is actually used.
2) I assume you're reading that from the same tool? Try counting how many bytes you actually allocate instead and it should be showing the correct amount instead of just "malloc overhead".
1) When you allocate memory, each allocation takes the space of the requested memory plus the size of a heap frame. See a related question here
2) The size of any single malloc is limited in Windows to _HEAP_MAXREQ. See this question for more info and some workarounds.
1) This could come from the fact that memory is paged and that every page has the same size. If your data fails to fit in a page and falls 'in-between' two pages, I think it is move to the beginning of the next page, thus creating a loss of space at the end of previous page.
2) The threshold is smaller because I think every program is restricted to a certain amount of data that is not the total maximum memory you have.

Heap size limitation in C

I have a doubt regarding heap in program execution layout diagram of a C program.
I know that all the dynamically allocated memory is allotted in heap which grows dynamically. But I would like to know what is the max heap size for a C program ??
I am just attaching a sample C program ... here I am trying to allocate 1GB memory to string and even doing the memset ...
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
char *temp;
mybuffer=malloc(1024*1024*1024*1);
temp = memset(mybuffer,0,(1024*1024*1024*1));
if( (mybuffer == temp) && (mybuffer != NULL))
printf("%x - %x\n", mybuffer, &mybuffer[((1024*1024*1024*1)-1)]]);
else
printf("Wrong\n");
sleep(20);
free(mybuffer);
return 0;
}
If I run above program in 3 instances at once then malloc should fail atleast in one instance [I feel so] ... but still malloc is successfull.
If it is successful can I know how the OS takes care of 3GB of dynamically allocated memory.
Your machine is very probably overcomitting on RAM, and not using the memory until you actually write it. Try writing to each block after allocating it, thus forcing the operating system to ensure there's real RAM mapped to the address malloc() returned.
From the linux malloc page,
BUGS
By default, Linux follows an optimistic memory allocation strategy.
This means that when malloc() returns non-NULL there is no guarantee
that the memory really is available. This is a really bad bug. In
case it turns out that the system is out of memory, one or more pro‐
cesses will be killed by the infamous OOM killer. In case Linux is
employed under circumstances where it would be less desirable to sud‐
denly lose some randomly picked processes, and moreover the kernel ver‐
sion is sufficiently recent, one can switch off this overcommitting
behavior using a command like:
# echo 2 > /proc/sys/vm/overcommit_memory
See also the kernel Documentation directory, files vm/overcommit-
accounting and sysctl/vm.txt.
You're mixing up physical memory and virtual memory.
http://apollo.lsc.vsc.edu/metadmin/references/sag/x1752.html
http://en.wikipedia.org/wiki/Virtual_memory
http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory
Malloc will allocate the memory but it does not write to any of it. So if the virtual memory is available then it will succeed. It is only when you write something to it will the real memory need to be paged to the page file.
Calloc if memory serves be correctly(!) write zeros to each byte of the allocated memory before returning so will need to allocate the pages there and then.

Resources