Maximum increment to sbrk - c

What is the upper limit of the increment that can be used in a sbrk call?
I am unable to successfully call sbrk with a 2e10 increment, but I am able to call sbrk with a 1e10 increment three times in a row.
I have a similar issue with mmap.
Tested in a Arch Linux 5.4.3-arch1-1 x86-64 with glibc 2.30.
Code example compiled with gcc 9.2.0.
Code example:
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <unistd.h>
int main() {
printf("sizeof(intptr_t)=%ld\n", sizeof(intptr_t));
intptr_t increment = 1e10;
if (sbrk(increment * 2) == (void *) -1) {
printf("error sbrk 1\n");
}
if (sbrk(increment) == (void *) -1) {
printf("error sbrk 2\n");
}
if (sbrk(increment) == (void *) -1) {
printf("error sbrk 3\n");
}
if (sbrk(increment) == (void *) -1) {
printf("error sbrk 4\n");
}
return 0;
}
The code above results in the following output:
sizeof(intptr_t)=8
error sbrk 1

Technically, sbrk will take intptr_t on most modern systems. This will be 32 signed int quantity (-2^31 to 2^31-1) when compiling with 32 bit pointers, and 64 bit signed int (-2^63 to 2^63-1) when compiling with 64 bit pointers.
Of course, the practical range is much smaller. For the sbrk call to succeed the following should be true:
The parameter should be within the address space supported by the processor/machine. Many modern system will allow up to 2^48 addresses (as oppose to the theoretical limit of 2^64.
The total size of requested heap size should be within the configuration limits: the process specific hard/soft limits (ulimit), and system configuration limit.
When increasing size of heap, there should be enough physical or swap space to supported the requested memory.
Back to the user question, when running 32 bit program (on 32 or 64 system), the limit is 2GB or less. And 1e10 will not work. When running 64 bit program the request may fail or pass depending on configuration, resources and setup.

I did more experiments to try to check what configurations and resources could explain the described behavior.
I found out that I am unable to allocate a contiguous block of memory larger than the total physical Memory + Swap.
This seems to answer my question.
However, this is a configuration?

Related

How to correctly allocate a large chunk of RAM with calloc/malloc in Windows 10?

I need to allocate memory for a vector with n=10^9 (1 billion) rows using calloc or malloc but when I try to allocate this amount of memory the system crashes and returns me NULL, which I presumed to be the system not allowing me to allocate this big chunk of memory. I'm using Windows 10 in a 64-bit platform with 16 GB RAM.
However, when I ran the same code in a Linux OS (Debian) the system actually allocated the amount I demanded, so now I'm wondering:
How can I allocate this big chunk using Windows 10, once I'm out of time to
venture in Linux yet?
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
int main(void) {
uint32_t * a = calloc(1000000000, 4);
printf("a = %08x\n", a);
return 0;
}
The C run time won't let you do this but windows will, use the VirtualAlloc API instead of calloc. Specify NULL for the lpAddress parameter, MEM_COMMIT for flAllocationType and PAGE_READWRITE for flProtect. Also note that even though dwSize uses the "dw" decoration that is usually DWORD in this case the parameter is actually a SIZE_T which is 64-bit for 64-bit builds.
Code would look like:
#include <windows.h>
...
LPVOID pResult = VirtualAlloc(NULL, dwSize, MEM_COMMIT, PAGE_READWRITE);
if(NULL == pResult)
{ /* Handle allocation error */ }
Where dwSize is the number of bytes of memory you wish to allocate, you later use VirtualFree to release the allocated memory : VirtualFree(pResult, 0, MEM_RELEASE);
The dwSize parameter to VirtualFree is for use when you specify MEM_DECOMMIT (rather than MEM_RELEASE), allowing you to put memory back in the reserved but uncommitted state (meaning that actual pages have not yet been found to satisfy the allocation).

How to use malloc with madvise and enable MADV_DONTDUMP option

I'm looking to use madvise and malloc but I have always the same error:
madvise error: Invalid argument
I tried to use the MADV_DONTDUMP to save some space in my binaries but it didn't work.
The page size is 4096.
int main(int argc, char *argv[])
{
void *p_optimize_object;
unsigned int optimize_object_size = 4096*256;
optimize_object_size = ((optimize_object_size / 4096) + 1) * 4096;
printf("optimize_object_size = %d\n", optimize_object_size);
p_optimize_object = malloc(optimize_object_size);
if (madvise(p_optimize_object, optimize_object_size, MADV_DONTDUMP | MADV_SEQUENTIAL) == -1)
{
perror("madvise error");
}
printf("OK\n");
return 0;
}
Here's the command:
$ gcc -g -O3 madvice.c && ./a.out
Output:
madvise error: Invalid argument
You can't and even if you could do it in certain cases with certain flags (and the flags you're trying to use here should be relatively harmless), you shouldn't. madvise operates on memory from lower level allocations than malloc gives you and messing with the memory from malloc will likely break malloc.
If you want some block of memory that you can call madvise on, you should obtain it using mmap.
Your usage of sizeof is wrong; you are allocating only four bytes of memory (sizeof unsigned int), and calling madvise() with a size argument of 1M for the same chunk of memory.
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
void *p_optimize_object;
unsigned int optimize_object_size = 4096*256;
optimize_object_size = ((optimize_object_size / 4096) + 1) * 4096;
printf("optimize_object_size = %d\n", optimize_object_size);
p_optimize_object = malloc(sizeof(optimize_object_size));
fprintf(stderr, "Allocated %zu bytes\n", sizeof(optimize_object_size));
if (madvise(p_optimize_object, optimize_object_size, MADV_WILLNEED | MADV_SEQUENTIAL) == -1)
{
perror("madvise error");
}
printf("OK\n");
return 0;
}
Output:
optimize_object_size = 1052672
Allocated 4 bytes
madvise error: Invalid argument
OK
UPDATE:
And the other problem is that malloc() can give you non-aligned memory (probably with an alignment of 4,8,16,...), where madvice() wants page-aligned memory:
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
void *p_optimize_object;
unsigned int optimize_object_size = 4096*256;
int rc;
optimize_object_size = ((optimize_object_size / 4096) + 1) * 4096;
printf("optimize_object_size = %d\n", optimize_object_size);
#if 0
p_optimize_object = malloc(sizeof(optimize_object_size));
fprintf(stderr, "Allocated %zu bytes\n", sizeof(optimize_object_size));
#elif 0
p_optimize_object = malloc(optimize_object_size);
fprintf(stderr, "Allocated %zu bytes\n", optimize_object_size);
#else
rc = posix_memalign (&p_optimize_object, 4096, optimize_object_size);
fprintf(stderr, "Allocated %zu bytes:%d\n", optimize_object_size, rc);
#endif
// if (madvise(p_optimize_object, optimize_object_size, MADV_WILLNEED | MADV_SEQUENTIAL) == -1)
if (madvise(p_optimize_object, optimize_object_size, MADV_WILLNEED | MADV_DONTFORK) == -1)
{
perror("madvise error");
}
printf("OK\n");
return 0;
}
OUTPUT:
$ ./a.out
optimize_object_size = 1052672
Allocated 1052672 bytes:0
OK
And the alignement requerement appears to be linux-specific:
Linux Notes
The current Linux implementation (2.4.0) views this system call more as a command than as advice and hence may return an error when it cannot
do what it usually would do in response to this advice. (See the ERRORS description above.) This is non-standard behavior.
The Linux implementation requires that the address addr be page-aligned, and allows length to be zero. If there are some parts of the speci‐
fied address range that are not mapped, the Linux version of madvise() ignores them and applies the call to the rest (but returns ENOMEM from
the system call, as it should).
Finally:
I tried to use the MADV_DONTDUMP to save some space in my binaries but it didn't work.
Which, of course, doesn't make sense. Malloc or posix_memalign add to your address space, making (at least) the VSIZ of your running program larger. What happens to the this space is completely in the hands of the (kernel) memory manager, driven by your program's references to the particular memory, with maybe a few hints from madvice.
I tried to use the MADV_DONTDUMP to save some space in my binaries but it didn't work.
Read again, and more carefully, the madvise(2) man page.
The address should be page aligned. The result of malloc is generally not page aligned (page size is often 4Kbytes, but see sysconf(3) for SC_PAGESIZE). Use mmap(2) to ask for a page-aligned segment in your virtual address space.
You won't save any space in your binary executable. You'll just save space in your core dump, see core(5). And core dumps should not happen. See signal(7) (read also about segmentation fault and undefined behaviour).
To disable core dumps, consider rather setrlimit(2) with RLIMIT_CORE (or the ulimit -c bash builtin in your terminal running a bash shell).

Physical memory management in Userspace?

I am working on an embedded device with only 512MB of RAM and the device is running Linux kernel. I want to do the memory management of all the processes running in the userspace by my own library. is it possible to do so. from my understanding, the memory management is done by kernel, Is it possible to have that functionality in User space.
If your embedded device runs Linux, it has an MMU. Controling the MMU is normally a privileged operation, so only an operating system kernel has access to it. Therefore the answer is: No, you can't.
Of course you can write software running directly on the device, without operating system, but I guess that's not what you wanted. You should probably take one step back, ask yourself what gave you the idea about the memory management and what could be a better way to solve this original problem.
You can consider using setrlimit. Refer another Q&A.
I wrote the test code and run it on my PC. I can see that memory usage is limited. The exact relationship of units requires further analysis.
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>
int main(int argc, char* argv)
{
long limitSize = 1;
long testSize = 140000;
// 1. BEFORE: getrlimit
{
struct rlimit asLimit;
getrlimit(RLIMIT_AS, &asLimit);
printf("BEFORE: rlimit(RLIMIT_AS) = %ld,%ld\n", asLimit.rlim_cur, asLimit.rlim_max);
}
// 2. BEFORE: test malloc
{
char *xx = malloc(testSize);
if (xx == NULL)
perror("malloc FAIL");
else
printf("malloc(%ld) OK\n", testSize);
free(xx);
}
// 3. setrlimit
{
struct rlimit new;
new.rlim_cur = limitSize;
new.rlim_max = limitSize;
setrlimit(RLIMIT_AS, &new);
}
// 4. AFTER: getrlimit
{
struct rlimit asLimit;
getrlimit(RLIMIT_AS, &asLimit);
printf("AFTER: rlimit(RLIMIT_AS) = %ld,%ld\n", asLimit.rlim_cur, asLimit.rlim_max);
}
// 5. AFTER: test malloc
{
char *xx = malloc(testSize);
if (xx == NULL)
perror("malloc FAIL");
else
printf("malloc(%ld) OK\n", testSize);
free(xx);
}
return 0;
}
Result:
BEFORE: rlimit(RLIMIT_AS) = -1,-1
malloc(140000) OK
AFTER: rlimit(RLIMIT_AS) = 1,1
malloc FAIL: Cannot allocate memory
From what I understand of your question, you want to somehow use your own library for handling memory of kernel processes. I presume you are doing this to make sure that rogue processes don't use too much memory, which allows your process to use as much memory as is available. I believe this idea is flawed.
For example, imagine this scenario:
Total memory 512MB
Process 1 limit of 128MB - Uses 64MB
Process 2 imit of 128MB - Uses 64MB
Process 3 limit of 256MB - Uses 256MB then runs out of memory, when in fact 128MB is still available.
I know you THINK this is the answer to your problem, and on 'normal' embedded systems, this would probably work, but you are using a complex kernel, running processes you don't have total control over. You should write YOUR software to be robust when memory gets tight because that is all you can control.

Is it possible to "punch holes" through mmap'ed anonymous memory?

Consider a program which uses a large number of roughly page-sized memory regions (say 64 kB or so), each of which is rather short-lived. (In my particular case, these are alternate stacks for green threads.)
How would one best do to allocate these regions, such that their pages can be returned to the kernel once the region isn't in use anymore? The naïve solution would clearly be to simply mmap each of the regions individually, and munmap them again as soon as I'm done with them. I feel this is a bad idea, though, since there are so many of them. I suspect that the VMM may start scaling badly after a while; but even if it doesn't, I'm still interested in the theoretical case.
If I instead just mmap myself a huge anonymous mapping from which I allocate the regions on demand, is there a way to "punch holes" through that mapping for a region that I'm done with? Kind of like madvise(MADV_DONTNEED), but with the difference that the pages should be considered deleted, so that the kernel doesn't actually need to keep their contents anywhere but can just reuse zeroed pages whenever they are faulted again.
I'm using Linux, and in this case I'm not bothered by using Linux-specific calls.
I did a lot of research into this topic (for a different use) at some point. In my case I needed a large hashmap that was very sparsely populated + the ability to zero it every now and then.
mmap solution:
The easiest solution (which is portable, madvise(MADV_DONTNEED) is linux specific) to zero a mapping like this is to mmap a new mapping above it.
void * mapping = mmap(MAP_ANONYMOUS);
// use the mapping
// zero certain pages
mmap(mapping + page_aligned_offset, length, MAP_FIXED | MAP_ANONYMOUS);
The last call is performance wise equivalent to subsequent munmap/mmap/MAP_FIXED, but is thread safe.
Performance wise the problem with this solution is that the pages have to be faulted in again on a subsequence write access which issues an interrupt and a context change. This is only efficient if very few pages were faulted in in the first place.
memset solution:
After having such crap performance if most of the mapping has to be unmapped I decided to zero the memory manually with memset. If roughly over 70% of the pages are already faulted in (and if not they are after the first round of memset) then this is faster then remapping those pages.
mincore solution:
My next idea was to actually only memset on those pages that have been faulted in before. This solution is NOT thread-safe. Calling mincore to determine if a page is faulted in and then selectively memset them to zero was a significant performance improvement until over 50% of the mapping was faulted in, at which point memsetting the entire mapping became simpler (mincore is a system call and requires one context change).
incore table solution:
My final approach which I then took was having my own in-core table (one bit per page) that says if it has been used since the last wipe. This is by far the most efficient way since you will only be actually zeroing the pages in each round that you actually used. It obviously also is not thread safe and requires you to track which pages have been written to in user space, but if you need this performance then this is by far the most efficient approach.
I don't see why doing lots of calls to mmap/munmap should be that bad. The lookup performance in the kernel for mappings should be O(log n).
Your only options as it seems to be implemented in Linux right now is to punch holes in the mappings to do what you want is mprotect(PROT_NONE) and that is still fragmenting the mappings in the kernel so it's mostly equivalent to mmap/munmap except that something else won't be able to steal that VM range from you. You'd probably want madvise(MADV_REMOVE) work or as it's called in BSD - madvise(MADV_FREE). That is explicitly designed to do exactly what you want - the cheapest way to reclaim pages without fragmenting the mappings. But at least according to the man page on my two flavors of Linux it's not fully implemented for all kinds of mappings.
Disclaimer: I'm mostly familiar with the internals of BSD VM systems, but this should be quite similar on Linux.
As in the discussion in comments below, surprisingly enough MADV_DONTNEED seems to do the trick:
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>
#include <unistd.h>
#include <err.h>
int
main(int argc, char **argv)
{
int ps = getpagesize();
struct rusage ru = {0};
char *map;
int n = 15;
int i;
if ((map = mmap(NULL, ps * n, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)) == MAP_FAILED)
err(1, "mmap");
for (i = 0; i < n; i++) {
map[ps * i] = i + 10;
}
printf("unnecessary printf to fault stuff in: %d %ld\n", map[0], ru.ru_minflt);
/* Unnecessary call to madvise to fault in that part of libc. */
if (madvise(&map[ps], ps, MADV_NORMAL) == -1)
err(1, "madvise");
if (getrusage(RUSAGE_SELF, &ru) == -1)
err(1, "getrusage");
printf("after MADV_NORMAL, before touching pages: %d %ld\n", map[0], ru.ru_minflt);
for (i = 0; i < n; i++) {
map[ps * i] = i + 10;
}
if (getrusage(RUSAGE_SELF, &ru) == -1)
err(1, "getrusage");
printf("after MADV_NORMAL, after touching pages: %d %ld\n", map[0], ru.ru_minflt);
if (madvise(map, ps * n, MADV_DONTNEED) == -1)
err(1, "madvise");
if (getrusage(RUSAGE_SELF, &ru) == -1)
err(1, "getrusage");
printf("after MADV_DONTNEED, before touching pages: %d %ld\n", map[0], ru.ru_minflt);
for (i = 0; i < n; i++) {
map[ps * i] = i + 10;
}
if (getrusage(RUSAGE_SELF, &ru) == -1)
err(1, "getrusage");
printf("after MADV_DONTNEED, after touching pages: %d %ld\n", map[0], ru.ru_minflt);
return 0;
}
I'm measuring ru_minflt as a proxy to see how many pages we needed to allocate (this isn't exactly true, but the next sentence makes it more likely). We can see that we get new pages in the third printf because the contents of map[0] are 0.

Can memory allocated through mmap overlap the data segment

The malloc function uses both sbrk and mmap functions. Now the sbrk function increases or decreases the data segment. So it grows linearly. Now my question is, is this linearity always maintained, or for example, an mmap call can allocate memory overlapping the data segment?
I'm talking about multithreaded programs running on multicore systems. This blog talks about some serious flaws of sbrk for multithreaded programs, and it points out that it is possible that memory allocated with sbrk can be intermingled with memory alloacted with mmap (The sbrk heap could become discontinuous because a mmaped region or a shared object obstructs the growth of the heap).
That blog post doesn't see the forest for the trees; only the malloc implementation is allowed to call sbrk with a nonzero argument. More precisely, most malloc implementations for Unix will stop functioning correctly (and by that I mean "your program will crash") if application code calls sbrk with a nonzero argument. If you want to make a large allocation directly from the OS you must use mmap to do it.
(It is true that in a multi-threaded program, malloc must internally wrap a mutex around its calls to sbrk, but that's an implementation detail. POSIX says malloc is thread safe, that's the important thing for an application programmer.)
mmap will not allocate memory overlapping the brk area unless you use MAP_FIXED. If you use MAP_FIXED and your program blows up you get to keep all the pieces.
The kernel tries to avoid doing it, but mmap in normal operation could conceivably allocate memory close to the top of the brk area. If this happens, a subsequent sbrk call that would collide with the mmap region will fail. It will not allocate discontiguous memory. Good implementations of malloc ought to detect this condition and start using mmap for everything. I have not actually tried it, but a test program would be pretty easy to write.
is this linearity always maintained, or for example, an mmap call can allocate memory overlapping the data segment?
Observed behavior is that the brk area is always linear. Implementation details: If enlarging the brk area is not possible, for example due to a blocking mapping, glibc will switch to mmap-only. Small allocations (<128KB) seem to be obtained by glibc via brk if possible, so blocking that with:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
int main(void)
{
int i;
for (i = 0; i < 1024; ++i) {
malloc(2048);
if (i == 512) {
void *r, *end = sbrk(0);
r = mmap(end, 4096, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
}
}
}
when straced, yields indeed
[...]
brk(0x1e7d000) = 0x1e7d000
mmap(0x1e7d000, 4096, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0) = 0x1e7d000
brk(0x1e9e000) = 0x1e7d000 <-- (!)
mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd9bc9000

Resources