How to use malloc with madvise and enable MADV_DONTDUMP option - c

I'm trying to use madvise with memory from malloc, but I always get the same error:
madvise error: Invalid argument
I tried to use the MADV_DONTDUMP to save some space in my binaries but it didn't work.
The page size is 4096.
int main(int argc, char *argv[])
{
void *p_optimize_object;
unsigned int optimize_object_size = 4096*256;
optimize_object_size = ((optimize_object_size / 4096) + 1) * 4096;
printf("optimize_object_size = %d\n", optimize_object_size);
p_optimize_object = malloc(optimize_object_size);
if (madvise(p_optimize_object, optimize_object_size, MADV_DONTDUMP | MADV_SEQUENTIAL) == -1)
{
perror("madvise error");
}
printf("OK\n");
return 0;
}
Here's the command:
$ gcc -g -O3 madvice.c && ./a.out
Output:
madvise error: Invalid argument

You can't. Even if it worked in certain cases with certain flags (and the flags you're trying to use here should be relatively harmless), you shouldn't do it. madvise operates on memory obtained from lower-level allocations than the ones malloc hands out, and changing how the kernel treats memory owned by malloc is likely to break malloc.
If you want some block of memory that you can call madvise on, you should obtain it using mmap.
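A minimal sketch of that approach, assuming Linux with glibc. The helper name alloc_nodump is made up for illustration. It rounds the request up to whole pages, maps anonymous memory and then applies the advice. Each advice value gets its own madvise call, because the MADV_* constants are enumerated values rather than bit flags that can be OR-ed together.

```c
#define _GNU_SOURCE            /* for MADV_DONTDUMP and MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes (rounded up to whole pages) and
 * exclude them from core dumps. Returns NULL on failure; on success the
 * mapped length is stored in *mapped_len. */
void *alloc_nodump(size_t len, size_t *mapped_len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return NULL;

    size_t psz = (size_t)page;
    size_t rounded = (len + psz - 1) / psz * psz;

    /* mmap always returns a page-aligned address. */
    void *p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* One advice value per call: they are not bit flags. */
    if (madvise(p, rounded, MADV_DONTDUMP) == -1 ||
        madvise(p, rounded, MADV_SEQUENTIAL) == -1) {
        munmap(p, rounded);
        return NULL;
    }

    *mapped_len = rounded;
    return p;
}
```

Release the block with munmap(p, mapped_len) rather than free(), since it never came from malloc.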

Your usage of sizeof is wrong; you are allocating only four bytes of memory (sizeof unsigned int), and calling madvise() with a size argument of 1M for the same chunk of memory.
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
void *p_optimize_object;
unsigned int optimize_object_size = 4096*256;
optimize_object_size = ((optimize_object_size / 4096) + 1) * 4096;
printf("optimize_object_size = %d\n", optimize_object_size);
p_optimize_object = malloc(sizeof(optimize_object_size));
fprintf(stderr, "Allocated %zu bytes\n", sizeof(optimize_object_size));
if (madvise(p_optimize_object, optimize_object_size, MADV_WILLNEED | MADV_SEQUENTIAL) == -1)
{
perror("madvise error");
}
printf("OK\n");
return 0;
}
Output:
optimize_object_size = 1052672
Allocated 4 bytes
madvise error: Invalid argument
OK
UPDATE:
The other problem is that malloc() can return memory that is not page-aligned (typically it is aligned to 8 or 16 bytes), whereas madvise() wants page-aligned memory:
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
void *p_optimize_object;
unsigned int optimize_object_size = 4096*256;
int rc;
optimize_object_size = ((optimize_object_size / 4096) + 1) * 4096;
printf("optimize_object_size = %d\n", optimize_object_size);
#if 0
p_optimize_object = malloc(sizeof(optimize_object_size));
fprintf(stderr, "Allocated %zu bytes\n", sizeof(optimize_object_size));
#elif 0
p_optimize_object = malloc(optimize_object_size);
fprintf(stderr, "Allocated %u bytes\n", optimize_object_size);
#else
rc = posix_memalign (&p_optimize_object, 4096, optimize_object_size);
fprintf(stderr, "Allocated %u bytes:%d\n", optimize_object_size, rc);
#endif
// if (madvise(p_optimize_object, optimize_object_size, MADV_WILLNEED | MADV_SEQUENTIAL) == -1)
if (madvise(p_optimize_object, optimize_object_size, MADV_WILLNEED | MADV_DONTFORK) == -1)
{
perror("madvise error");
}
printf("OK\n");
return 0;
}
OUTPUT:
$ ./a.out
optimize_object_size = 1052672
Allocated 1052672 bytes:0
OK
And the alignment requirement appears to be Linux-specific:
Linux Notes
The current Linux implementation (2.4.0) views this system call more as a command than as advice and hence may return an error when it cannot
do what it usually would do in response to this advice. (See the ERRORS description above.) This is non-standard behavior.
The Linux implementation requires that the address addr be page-aligned, and allows length to be zero. If there are some parts of the speci‐
fied address range that are not mapped, the Linux version of madvise() ignores them and applies the call to the rest (but returns ENOMEM from
the system call, as it should).
Finally:
I tried to use the MADV_DONTDUMP to save some space in my binaries but it didn't work.
Which, of course, doesn't make sense. malloc or posix_memalign add to your address space, making (at least) the virtual size (VSZ) of your running program larger. What happens to this space is completely in the hands of the (kernel) memory manager, driven by your program's references to the particular memory, with maybe a few hints from madvise.

I tried to use the MADV_DONTDUMP to save some space in my binaries but it didn't work.
Read again, and more carefully, the madvise(2) man page.
The address should be page-aligned. The result of malloc is generally not page-aligned (the page size is often 4 KiB, but see sysconf(3) for _SC_PAGESIZE). Use mmap(2) to ask for a page-aligned segment in your virtual address space.
You won't save any space in your binary executable. You'll just save space in your core dump, see core(5). And core dumps should not happen. See signal(7) (read also about segmentation fault and undefined behaviour).
To disable core dumps, consider rather setrlimit(2) with RLIMIT_CORE (or the ulimit -c bash builtin in your terminal running a bash shell).
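As a rough sketch of the setrlimit(2) route (the helper name disable_core_dumps is hypothetical), the soft RLIMIT_CORE limit can be dropped to zero from inside the program, leaving the hard limit alone:

```c
#include <sys/resource.h>

/* Hypothetical helper: set the soft core-file size limit to zero.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
int disable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) == -1)
        return -1;

    rl.rlim_cur = 0;   /* soft limit 0; the hard limit stays unchanged */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling it early in main() has roughly the same effect for this process as running ulimit -c 0 in the shell before starting it.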

Related

free() returns memory to the OS

My test code shows that after free() and before the program exits, the heap memory is returned to the OS. I use htop (top shows the same) to observe the behaviour. My glibc version is ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#define BUFSIZE 10737418240
int main(){
printf("start\n");
u_int32_t* p = (u_int32_t*)malloc(BUFSIZE);
if (p == NULL){
printf("alloc 10GB failed\n");
exit(1);
}
memset(p, 0, BUFSIZE);
for(size_t i = 0; i < (BUFSIZE / 4); i++){
p[i] = 10;
}
printf("before free\n");
free(p);
sleep(1000);
printf("exit\n");
}
Why does the question "Why does the free() function not return memory to the operating system?" observe the opposite behaviour from mine? That OP also uses Linux, and the question was asked in 2018. Am I missing something?
Linux treats allocations larger than MMAP_THRESHOLD differently. See Why does malloc rely on mmap starting from a certain threshold?
The question you linked, where allocations may not appear to be fully reclaimed immediately, uses small allocations which are sort of pooled together by malloc() and not instantly returned to the OS on each small deallocation (that would be slow). Your single huge allocation definitely goes via the mmap() path, and so is a totally independent allocation which will be fully and immediately reclaimed.
Think of it this way: if you ask someone to buy you eggs and milk, they will likely make a single trip and return with what you requested. But if you ask for eggs and a diamond ring, they will treat those as two totally separate requests, fulfilled using very different strategies. If you then say you no longer need the eggs and the ring, they may keep the eggs for when they get hungry, but they'll probably try to get their money back for the ring right away.
I did some experiments, read a chapter of The Linux Programming Interface, and arrived at an answer that satisfies me.
First, my conclusions:
The library call malloc uses the system calls brk and mmap under the hood when allocating memory.
As @John Zwinck describes, malloc chooses between brk and mmap depending on how much memory you request.
If allocating by brk, the process is probably not returning the memory to the OS before it terminates (sometimes it does). If allocating by mmap, in my simple test the process returns the memory to the OS before it terminates.
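For the brk case, glibc offers malloc_trim() as a way to explicitly ask the allocator to give free heap memory back to the kernel. The sketch below assumes glibc (malloc_trim is not part of standard C), and free_and_trim is a made-up name for illustration:

```c
#include <malloc.h>   /* glibc-specific: declares malloc_trim() */
#include <stdlib.h>

/* Hypothetical demo: allocate and free `count` small blocks, then ask glibc
 * to release free heap memory. Returns malloc_trim()'s result (1 if memory
 * was released to the system, 0 if not) or -1 on allocation failure. */
int free_and_trim(size_t count)
{
    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return -1;

    size_t made = 0;
    while (made < count) {
        blocks[made] = malloc(64);      /* far below the mmap threshold */
        if (blocks[made] == NULL)
            break;
        made++;
    }

    for (size_t i = 0; i < made; i++)
        free(blocks[i]);
    free(blocks);

    if (made < count)
        return -1;
    return malloc_trim(0);
}
```

Whether any memory is actually released depends on how the heap is laid out at that moment, so the return value may be either 0 or 1.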
Experiment code (examine memory stats in htop at the same time):
code sample 1
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#define BUFSIZE 1073741824 //1GiB
// run `ulimit -s unlimited` first
int main(){
printf("start\n");
printf("%lu \n", sizeof(uint32_t));
uint32_t* p_arr[BUFSIZE / 4];
sleep(10);
for(size_t i = 0; i < (BUFSIZE / 4); i++){
uint32_t* p = (uint32_t*)malloc(sizeof(uint32_t));
if (p == NULL){
printf("alloc failed\n");
exit(1);
}
p_arr[i] = p;
}
printf("alloc done\n");
for(size_t i = 0; i < (BUFSIZE / 4); i++){
free(p_arr[i]);
}
printf("free done\n");
sleep(20);
printf("exit\n");
}
When the program reaches "free done" and the following sleep(), you can see that it still holds the memory and has not returned it to the OS. strace ./a.out shows brk being called many times.
Note:
I am calling malloc in a loop to allocate the memory. I expected it to take up only 1 GiB of RAM, but in fact it takes up 8 GiB in total, because malloc adds extra bytes for bookkeeping and alignment to every block (with glibc on x86-64 the smallest chunk is 32 bytes, and 2^28 such chunks come to 8 GiB). One should never allocate 1 GiB this way, in a loop like this.
code sample 2:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#define BUFSIZE 1073741824 //1GiB
int main(){
printf("start\n");
printf("%lu \n", sizeof(uint32_t));
uint32_t* p_arr[BUFSIZE / 4];
sleep(3);
for(size_t i = 0; i < (BUFSIZE / 4); i++){
uint32_t* p = (uint32_t*)malloc(sizeof(uint32_t));
if (p == NULL){
printf("alloc failed\n");
exit(1);
}
p_arr[i] = p;
}
printf("%p\n", p_arr[0]);
printf("alloc done\n");
for(size_t i = 0; i < (BUFSIZE / 4); i++){
free(p_arr[i]);
}
printf("free done\n");
printf("allocate again\n");
sleep(10);
for(size_t i = 0; i < (BUFSIZE / 4); i++){
uint32_t* p = malloc(sizeof(uint32_t));
if (p == NULL){
printf("alloc failed\n");
exit(1);
}
p_arr[i] = p;
}
printf("allocate again done\n");
sleep(10);
for(size_t i = 0; i < (BUFSIZE / 4); i++){
free(p_arr[i]);
}
printf("%p\n", p_arr[0]);
sleep(3);
printf("exit\n");
}
This one is similar to sample 1, but it allocates again after the free. The second round of allocations doesn't increase memory usage; it reuses the memory that was freed but not returned to the OS.
code sample 3:
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#define MAX_ALLOCS 1000000
int main(int argc, char* argv[]){
int freeStep, freeMin, freeMax, blockSize, numAllocs, j;
char* ptr[MAX_ALLOCS];
printf("\n");
numAllocs = atoi(argv[1]);
blockSize = atoi(argv[2]);
freeStep = (argc > 3) ? atoi(argv[3]) : 1;
freeMin = (argc > 4) ? atoi(argv[4]) : 1;
freeMax = (argc > 5) ? atoi(argv[5]) : numAllocs;
assert(freeMax <= numAllocs);
printf("Initial program break: %10p\n", sbrk(0));
printf("Allocating %d*%d bytes\n", numAllocs, blockSize);
for(j = 0; j < numAllocs; j++){
ptr[j] = malloc(blockSize);
if(ptr[j] == NULL){
perror("malloc return NULL");
exit(EXIT_FAILURE);
}
}
printf("Program break is now: %10p\n", sbrk(0));
printf("Freeing blocks from %d to %d in steps of %d\n", freeMin, freeMax, freeStep);
for(j = freeMin - 1; j < freeMax; j += freeStep){
free(ptr[j]);
}
printf("After free(), program break is : %10p\n", sbrk(0));
printf("\n");
exit(EXIT_SUCCESS);
}
This one is taken from The Linux Programming Interface, simplified a bit.
Chapter 7:
The first two command-line arguments specify the number and size of
blocks to allocate. The third command-line argument specifies the loop
step unit to be used when freeing memory blocks. If we specify 1 here
(which is also the default if this argument is omitted), then the
program frees every memory block; if 2, then every second allocated
block; and so on. The fourth and fifth command-line arguments specify
the range of blocks that we wish to free. If these arguments are
omitted, then all allocated blocks (in steps given by the third
command-line argument) are freed.
Try running it with:
./free_and_sbrk 1000 10240 2
./free_and_sbrk 1000 10240 1 1 999
./free_and_sbrk 1000 10240 1 500 1000
You will see that the program break decreases only in the last example, i.e. only there does the process return some blocks of memory to the OS (if I understand correctly).
This sample code is evidence of
"If allocating by brk, the process is probably not returning the memory to the OS before it terminates (sometimes it does)."
Finally, here are some useful paragraphs quoted from the book. I suggest reading Chapter 7 (section 7.1) of TLPI; it is very helpful.
In general, free() doesn’t lower the program break, but instead adds
the block of memory to a list of free blocks that are recycled by
future calls to malloc(). This is done for several reasons:
The block of memory being freed is typically somewhere in the middle of
the heap, rather than at the end, so that lowering the program break
is not possible.
It minimizes the number of sbrk() calls that the
program must perform. (As noted in Section 3.1, system calls have a
small but significant overhead.)
In many cases, lowering the break
would not help programs that allocate large amounts of memory, since
they typically tend to hold on to allocated memory or repeatedly
release and reallocate memory, rather than release it all and then
continue to run for an extended period of time.
What is program break (also from the book):
Also: https://www.wikiwand.com/en/Data_segment

mmap() fails when allocating large amounts of memory

For my program I need an array of bytes whose size is 1/8th of the process's virtual memory space.
I used the getrlimit() system call to get the virtual memory size, then set it to the maximum limit using setrlimit(). I then used mmap() to allocate an array the size of 1/8th of the virtual memory size. Like so:
struct rlimit mem_limit;
if(getrlimit(RLIMIT_AS, &mem_limit) != 0){
return -errno;
}
mem_limit.rlim_cur = mem_limit.rlim_max;
if(setrlimit(RLIMIT_AS, &mem_limit) != 0){
return -errno;
}
array_size = (mem_limit.rlim_cur)/8;
printf("memory size is %lu bytes, array size is %lu bytes\n", mem_limit.rlim_cur, array_size);
mem_array = (char*) mmap(0, array_size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if(mem_array == MAP_FAILED){
printf("mmap failed with %d. allocation size = %lu\n", errno, array_size);
return -errno;
}
mmap() fails here with errno 12, which as far as I know means there's not enough memory. I don't understand why since the program barely allocates memory other than this, let alone the other 7/8th of the memory.
I tried using malloc(), specifying an offset for mmap(), using the soft limit instead of the hard limit, allocating 1/32 of the memory instead of 1/8, using MAP_NORESERVE in the flags - nothing works so far.
I tried running a simple test program that only does the mmap() and no other memory allocations and it doesn't work either.
This is what I get:
memory size is 18446744073709551615 bytes, array size is 2305843009213693951 bytes
mmap failed with 12. allocation size = 2305843009213693951
The manpage of getrlimit has the following explanation.
RLIMIT_AS
This is the maximum size of the process's virtual memory (address space).
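The value printed, 18446744073709551615, is RLIM_INFINITY, meaning no limit is set. So the call doesn't return the available memory size, only the nominal size (about 18 EB) of a 64-bit address space, which is far more than mmap can actually provide.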
If you need to find out the available memory size, you can use the sysinfo function.
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <linux/kernel.h>
#include <sys/sysinfo.h>
int main()
{
struct rlimit m;
struct sysinfo s;
getrlimit(RLIMIT_AS, &m);
printf("getrlimit rlim_cur:%llu, rlim_max:%llu\n", (unsigned long long)m.rlim_cur, (unsigned long long)m.rlim_max);
sysinfo(&s);
printf("sysinfo totalram:%lu, freeram:%lu, totalswap:%lu, freeswap:%lu\n", s.totalram, s.freeram, s.totalswap, s.freeswap);
return 0;
}
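One detail worth keeping in mind when sizing an allocation from these numbers: according to sysinfo(2), the memory and swap fields are given as multiples of mem_unit bytes, so they have to be scaled before being used as a byte count. A small sketch (the helper name total_ram_bytes is made up):

```c
#include <sys/sysinfo.h>

/* Hypothetical helper: total RAM in bytes, or 0 if sysinfo() fails.
 * The struct fields are counted in units of mem_unit bytes. */
unsigned long long total_ram_bytes(void)
{
    struct sysinfo s;

    if (sysinfo(&s) == -1)
        return 0;
    return (unsigned long long)s.totalram * s.mem_unit;
}
```

An array of total_ram_bytes() / 8 bytes is then a request that mmap at least has a chance of satisfying, unlike one eighth of RLIM_INFINITY.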

Mapping existing memory (data segment) to another memory segment

As the title suggests, I would like to ask whether there is any way to map the data segment of my executable to another memory region so that any changes to the second are instantly visible in the first. One initial thought I had was to use mmap, but unfortunately mmap requires a file descriptor and I do not know of a way to open a file descriptor on my running process's memory. I tried to use shmget/shmat to create a shared memory object on the process data segment (&__data_start), but again I failed (though that might have been a mistake on my end, as I am unfamiliar with the shm API). A similar question I found is this: Linux mapping virtual memory range to existing virtual memory range?, but the replies are not helpful. Any thoughts are welcome.
Thank you in advance.
Some pseudocode would look like this:
extern char __data_start, _end;
char test = 'A';
int main(int argc, char *argv[]){
size_t size = &_end - &__data_start;
char *mirror = malloc(size);
magic_map(&__data_start, mirror, size); //this is the part I need.
printf("%c\n", test); // prints A
int offset = &test - &__data_start;
*(mirror + offset) = 'B';
printf("%c\n", test); // prints B
free(mirror);
return 0;
}
It appears I managed to solve this. To be honest, I don't know whether it will cause problems in the future or what side effects it might have, but this is it (if any issues arise I will try to log them here for future reference).
Solution:
Basically, what I did was use the mmap flags MAP_ANONYMOUS, MAP_FIXED and MAP_SHARED.
MAP_ANONYMOUS: With this flag a file descriptor is no longer required (hence the -1 in the call).
MAP_FIXED: With this flag the addr argument is no longer a hint; the mapping is placed exactly at the address you specify, replacing whatever was mapped there before.
MAP_SHARED: With this you get a shared mapping, so that any changes are visible to the original mapping.
I have left the munmap call commented out. If munmap executes, we unmap the data segment (pointed to by &__data_start), and as a result the global and static variables are corrupted. When the atexit handlers run after main returns, the program crashes with a segmentation fault (presumably because they touch the data segment that is no longer mapped).
Code:
#define _GNU_SOURCE 1 /* must come before any #include to take effect */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>
extern char __data_start;
extern char _end;
int test = 10;
int main(int argc, char *argv[])
{
size_t size = 4096;
char *shared = mmap(&__data_start, 4096, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_ANONYMOUS | MAP_SHARED, -1, 0);
if(shared == MAP_FAILED){
printf("Cant mmap\n");
exit(-1);
}
printf("original: %p, shared: %p\n",&__data_start, shared);
size_t offset = (void *)&test - (void *)&__data_start;
*(shared+offset) = 50;
msync(shared, 4096, MS_SYNC);
printf("test: %d :: %d\n", test, *(shared+offset));
test = 25;
printf("test: %d :: %d\n", test, *(shared+offset));
//munmap(shared, 4096);
}
Output:
original: 0x55c4066eb000, shared: 0x55c4066eb000
test: 50 :: 50
test: 25 :: 25

How to allocate a large memory in C [closed]

I'm trying to write a program that obtains 1 GB of memory from the system with malloc(1024*1024*1024).
Once I have the start address of the memory, my (limited) understanding is that I can initialize it simply by calling memset(). In practice, though, this triggers a segfault after a while.
I used gdb to find the cause, and found that any operation touching more than 128 MB of the memory leads to this fault.
Is there a rule that limits a program to accessing less than 128 MB of memory? Or am I allocating and initializing it the wrong way?
If additional information is needed, please tell me.
Any suggestions will be appreciated.
[Platform]
Linux 4.10.1 with gcc 5.4.0
Build program with gcc test.c -o test
CPU: Intel i7-6700
RAM: 16GB
[Code]
size_t mem_size = 1024 * 1024 * 1024;
...
void *based = malloc(mem_size); //mem_size = 1024^3
int stage = 65536;
int initialized = 0;
if (based) {
printf("Allocated %zu Bytes from %lx to %lx\n", mem_size, based, based + mem_size);
} else {
printf("Error in allocation.\n");
return 1;
}
int n = 0;
while (initialized < mem_size) { //initialize it in batches
printf("%6d %lx-%lx\n", n++, based+initialized, based+initialized+stage);
memset(based + initialized, '$', stage);
initialized += stage;
}
[Result]
Allocated 1073741824 Bytes from 7f74c9e66010 to 7f76c9e66010
...
2045 7f7509ce6010-7f7509d66010
2046 7f7509d66010-7f7509de6010
2047 7f7509de6010-7f7509e66010
2048 7f7509e66010-7f7509ee6010 //2048*65536(B)=128(MB)
Segmentation fault (core dumped)
There are two possible issues here. The first is that you're not using malloc() correctly: you need to check whether it returns NULL or a non-NULL value.
The other issue could be that the OS is over-committing memory, and the out-of-memory (OOM) killer is terminating your process. You can disable memory over-commit and enable core dumps to check for this.
Edit
Two major problems:
Don't do operations with side effects (i.e. n++) inside a logging statement. This is VERY BAD practice: logging calls are often removed at compile time in large projects, and then the program behaves differently.
Cast based to a (char *).
This should help with your problem.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
size_t mem_size = 1024 * 1024 * 1024;
printf("MEMSIZE: %zu\n", mem_size);
printf("SIZE OF: void*:%zu\n", sizeof(void*));
printf("SIZE OF: char*:%zu\n", sizeof(char*));
void *based = malloc(mem_size); //mem_size = 1024^3
size_t stage = 65536;
size_t initialized = 0;
if (based) {
printf("Allocated %zu Bytes from %p to %p\n", mem_size, based, (void *)((char *)based + mem_size));
} else {
printf("Error in allocation.\n");
return 1;
}
int n = 0;
while (initialized < mem_size) { //initialize it in batches
//printf("%6d %p-%p\n", n, based+initialized, based+initialized+stage);
n++;
memset((char *)based + initialized, '$', stage);
initialized += stage;
}
free(based);
return 0;
}
I found the problem: the pointer type was wrong.
Here is the complete code
int main(int argc, char *argv[]) {
/*Allocate 1GB memory*/
size_t mem_size = 1024 * 1024 * 1024;
// the problem was here: I used to declare this pointer as long long *,
char* based = malloc(mem_size);
// which made every pointer offset 8 times larger than intended
if (based) {
printf("Allocated %zu Bytes from %p to %p\n", mem_size, (void *)based, (void *)(based + mem_size));
} else {
printf("Allocation Error.\n");
return 1;
}
/*Initialize the memory in batches*/
size_t stage = 65536;
size_t initialized = 0;
while (initialized < mem_size) {
memset(based + initialized, '$', stage);
initialized += stage;
}
/*And then I set the breakpoint, check the memory content with gdb*/
...
return 0;
}
Thank you to everyone who gave me advice or comments :)
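To illustrate why the pointer type matters: pointer arithmetic is scaled by the size of the pointed-to type, so adding n to a long long * moves n * sizeof(long long) bytes, not n bytes. That matches the output in the question, where the printed range spans 0x200000000 bytes (8 GiB) for a 1 GiB allocation and the crash comes after 2048 * 65536 elements, i.e. exactly 1 GiB of real memory. A small sketch with made-up helper names:

```c
#include <stddef.h>

/* Bytes covered by advancing a long long pointer by one element. */
size_t stride_long_long(void)
{
    long long buf[2];
    long long *p = buf;
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Bytes covered by advancing a char pointer by one element. */
size_t stride_char(void)
{
    char buf[2];
    char *p = buf;
    return (size_t)((p + 1) - p);
}
```

On common 64-bit Linux systems the first function returns 8 and the second returns 1, which is why byte offsets should be applied to a char * (or unsigned char *).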
It is very unusual for a process to need such a large chunk of contiguous memory, and yes, the kernel does impose limits on memory allocation. You should also know that when malloc() handles a request larger than 128 KiB (the default threshold), it calls mmap() behind the scenes. You could try using that directly.
You should also know that the kernel's default policy is to allow more memory to be allocated than it actually has.
The logic is that most allocated memory is never actually used, so it is relatively safe to allow allocations that exceed the actual memory of the system.
EDIT: As some people have pointed out, when your process starts to use memory that the kernel allocated successfully, it may get killed by the OOM killer. This code produced the following output:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char**argv)
{
char *c = malloc(sizeof(char) * 1024 * 1024 * 1024 * 5);
if(c)
{
printf("allocated memory\n");
memset(c, 1, sizeof(char) * 1024 * 1024 * 1024 * 5);
free(c);
}
else
{
printf("Out of memory\n");
}
return 0;
}
Output:
$ ./a.out
allocated memory
Killed
But after you change the limits of the system:
# echo 2 > /proc/sys/vm/overcommit_memory
# exit
exit
$ ./a.out
Out of memory
As you can see, the memory allocation was successful on the system and the problem appeared only after the memory was used
:EDIT
There are limits that the kernel imposes on how much memory you can allocate and you can check them out with these commands:
grep -e MemTotal -e CommitLimit -e Committed_AS /proc/meminfo
ulimit -a
The first command prints the total memory (MemTotal), the limit that the kernel imposes on allocations (CommitLimit) and Committed_AS; the second prints the per-process resource limits. CommitLimit is based on your system memory and the over-commit ratio defined on your system, which you can check with cat /proc/sys/vm/overcommit_ratio.
Committed_AS is the memory that is currently committed on the system. You will notice that it can exceed the total memory without causing a crash.
You can change the default behavior of your kernel to never overcommit by writing echo 2 > /proc/sys/vm/overcommit_memory. You can check the man pages for more info on this.
I recommend checking the limits on your system and then disabling the default overcommit behavior of the kernel. Then check whether your system can actually allocate that much memory by seeing whether malloc() or mmap() fails when allocating.
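If you want to check the current policy from inside a program rather than from the shell, a small sketch like the following reads the same /proc file (the helper name read_overcommit_mode is hypothetical):

```c
#include <stdio.h>

/* Hypothetical helper: returns the value of vm.overcommit_memory
 * (0 = heuristic, 1 = always overcommit, 2 = never overcommit),
 * or -1 if the file cannot be read or parsed. */
int read_overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

A result of 2 means a failing malloc() really does report out-of-memory up front; with 0 or 1 the allocation may succeed and the process may be killed later when it touches the pages.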
sources: LSFMM: Improving the out-of-memory killer
and Mastering Embedded Linux Programming by Chris Simmonds

How to map two virtual addresses on the same physical memory on linux?

I'm facing quite a tricky problem. I'm trying to get 2 virtual memory areas pointing to the same physical memory. The point is to have different page protection parameters on the different memory areas.
On this forum, the user seems to have a solution, but it seems kind of hacky, and it's pretty clear that something better can be done performance-wise:
http://www.linuxforums.org/forum/programming-scripting/19491-map-two-virtual-memory-addres-same-physical-page.html
As I'm facing the same problem, I want to ask here whether somebody has a better idea. Don't be afraid to mention the dirty details under the hood; that is what this question is about.
Thanks in advance.
Since Linux kernel 3.17 (released in October 2014) you can use memfd_create system call to create a file descriptor backed by anonymous memory. Then mmap the same region several times, as mentioned in the above answers.
Note that glibc wrapper for the memfd_create system call was added in glibc 2.27 (released in February 2018). The glibc manual also describes how the descriptor returned can be used to create multiple mappings to the same underlying memory.
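A minimal sketch of the memfd approach, assuming Linux 3.17 or later. To avoid depending on the glibc wrapper it invokes the system call through syscall(2); the function name memfd_two_views and the debug name "two_views" are made up for illustration. It maps the same descriptor twice, once read-write and once read-only, and checks that a write through the first view is visible through the second.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if data written through the read-write view
 * can be read through the read-only view, 0 if not, -1 on any error. */
int memfd_two_views(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    /* Raw system call; the name is only a label for debugging. */
    int fd = (int)syscall(SYS_memfd_create, "two_views", 0U);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }

    /* Two mappings of the same pages with different protections. */
    char *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mappings keep the memory alive */

    int result = -1;
    if (rw != MAP_FAILED && ro != MAP_FAILED) {
        strcpy(rw, "hello");
        result = (strcmp(ro, "hello") == 0);
    }
    if (rw != MAP_FAILED)
        munmap(rw, len);
    if (ro != MAP_FAILED)
        munmap(ro, len);
    return result;
}
```

Both mappings must use MAP_SHARED; with MAP_PRIVATE each view would get its own copy-on-write pages and the write would not show up in the other one.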
I'm trying to get 2 virtual memory area pointing on the same physical memory.
mmap the same region in the same file, twice, or use System V shared memory (which does not require mapping a file in memory).
I suppose if you dislike Sys V shared memory you could use POSIX shared memory objects. They're not very popular, but they are available on Linux and the BSDs at least.
Once you get an fd with shm_open you could immediately call shm_unlink. Then no other process can attach to the same shared memory, and you can mmap it multiple times. There is still a small race window, though.
As suggested by @PerJohansson, I wrote and tested the following code, and it works well on Linux. Using mmap with the MAP_SHARED|MAP_FIXED flags, we can map the same physical page, allocated through a POSIX shm object, multiple times and contiguously across a very large range of virtual memory.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h> /* For mode constants */
#include <fcntl.h> /* For O_* constants */
void * alloc_1page_mem(int size) {
int fd;
char * ptr_base;
char * rptr;
/* Create shared memory object and set its size */
fd = shm_open("/myregion", O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
if (fd == -1) {
perror("error in shm_open");
return NULL;
}
if (ftruncate(fd, 4096) == -1) {
perror("error in ftruncate");
return NULL;
}
// following trick reserves big enough holes in VM space
ptr_base = rptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
munmap(rptr, size);
for(int i=0; i<size; i+=4096) {
rptr = mmap(rptr, 4096, PROT_READ | PROT_WRITE, MAP_SHARED|MAP_FIXED, fd, 0);
if (rptr == MAP_FAILED) {
perror("error in mmap");
return NULL;
}
rptr += 4096;
}
close(fd);
shm_unlink("/myregion");
return ptr_base;
}
void check(int * p, int total_cnt){
for (int i=0;i<4096/sizeof(int);i++) {
p[i] = i;
}
int fail_cnt = 0;
for (int k=0; k<total_cnt; k+= 4096/sizeof(int)) {
for (int i=0;i<4096/sizeof(int);i++) {
if (p[k+i] != i)
fail_cnt ++;
}
}
printf("fail_cnt=%d\n", fail_cnt);
}
int main(int argc, const char * argv[]) {
const char * cmd = argv[1];
int sum;
int total_cnt = 32*1024*1024;
int * p = NULL;
if (*cmd++ == '1')
p = alloc_1page_mem(total_cnt*sizeof(int));
else
p = malloc(total_cnt*sizeof(int));
sum = 0;
while(*cmd) {
switch(*cmd++) {
case 'c':
check(p, total_cnt);
break;
case 'w':
// save only 4bytes per cache line
for (int k=0;k<total_cnt;k+=64/sizeof(int)){
p[k] = sum;
}
break;
case 'r':
// read only 4bytes per cache line
for (int k=0;k<total_cnt;k+=64/sizeof(int)) {
sum += p[k];
}
break;
case 'p':
// prevent sum from being optimized
printf("sum=%d\n", sum);
}
}
return 0;
}
You can observe a very low cache miss rate on memory allocated this way:
$ sudo perf stat -e mem_load_retired.l3_miss -- ./a.out 0wrrrrr
# this produces L3 misses that increase linearly with the number of 'r' characters
$ sudo perf stat -e mem_load_retired.l3_miss -- ./a.out 1wrrrrr
# this produces an almost constant number of L3 misses.
If you are root, you can open /dev/mem and mmap it, but there are caveats in newer kernels; see accessing mmaped /dev/mem?

Resources