Zero RAM using C in Linux

How can I zero unused RAM in Linux for security purposes? I wrote this simple C program, but I don't know whether the memory handed out by malloc will be reused on the next loop iteration or whether fresh memory will be used each time. Hopefully, after a few minutes the entire RAM will have been zeroed.
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *a = NULL;           // declare variable

    while (1)                 // infinite loop
    {
        a = malloc(524288);   // half a MB
        memset(a, 0, 524288); // zero it
        free(a);              // free it
        sleep(1);             // sleep for 1 second
    }
}

Linux already has a kernel process that zeroes memory using idle cycles, so it will have memory ready to hand to processes that request it.
Your loop may or may not zero different memory depending on the particular malloc implementation. If you really want to write a process like you describe, look into using sbrk directly to ensure you're cycling memory in and out of your process. I bet if you check you'll find every byte given to you by sbrk is already zero, though.
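For illustration, here is a minimal sketch of that check (assuming a Linux system where sbrk() is still usable): it grows the program break, scans the fresh bytes for anything non-zero, and shrinks the break again.

#include <stdio.h>
#include <unistd.h>

#define CHUNK (1024 * 1024)                 /* grow the break by 1 MiB */

int main(void)
{
    unsigned char *p = sbrk(CHUNK);         /* extend the data segment */
    if (p == (void *)-1) {
        perror("sbrk");
        return 1;
    }

    long nonzero = 0;
    for (long i = 0; i < CHUNK; i++)        /* scanning faults the pages in */
        if (p[i] != 0)
            nonzero++;

    sbrk(-CHUNK);                           /* shrink the break back before stdio touches the heap */
    printf("non-zero bytes in fresh sbrk() memory: %ld\n", nonzero);
    return 0;
}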

You can't zero system RAM. The system owns it. If you want to run a system which zeros the RAM then you need to write your own OS!

As long as you never access uninitialized memory, you don't have to worry about what someone else left behind. As long as you never free memory before zeroing it out, you don't have to worry about what you have left behind.

I think you need to write a kernel module to actually do this reliably. And then you still could only zero unused pages. Note that pages that were used by other processes will be cleared automatically by the kernel on allocation.
What are you trying to do? Avoid cold boot attacks?

Typically, on my system (kernel 2.6.36), I can free up all the unused (but allocated) memory just by running a while(1) malloc(); loop and killing it when it stops allocating memory.

Related

C program slowly takes up more memory [duplicate]

I wrote a C program on Linux that mallocs memory, ran it in a loop, and top didn't show any memory consumption.
Then I did something with that memory, and top did show memory consumption.
When I malloc, do I really "get" the memory, or is there "lazy" memory management that only gives me the memory if/when I use it?
(It is also possible that top only knows about the memory once I use it, so I'm not sure.)
Thanks
On Linux, malloc requests memory with sbrk() or mmap(); either way, your address space is expanded immediately, but Linux does not assign actual pages of physical memory until the first write to the page in question. You can see the address space expansion in the VIRT column and the actual physical memory usage in the RES column.
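If you want to watch this yourself in top, a small sketch along these lines (the 256 MiB size and the sleeps are arbitrary) makes the two columns move at different times:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SIZE (256 * 1024 * 1024)            /* 256 MiB */

int main(void)
{
    char *p = malloc(SIZE);                 /* address space only: VIRT grows, RES barely moves */
    if (p == NULL) {
        perror("malloc");
        return 1;
    }

    printf("allocated %d MiB, watch VIRT in top (pid %d)\n",
           SIZE / (1024 * 1024), (int)getpid());
    sleep(20);

    memset(p, 1, SIZE);                     /* touching every page makes RES grow too */
    printf("memory touched, now watch RES\n");
    sleep(20);

    free(p);
    return 0;
}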
This starts a little off subject (and then I'll tie it back to your question), but what's happening is similar to what happens when you fork a process in Linux. When forking, there is a mechanism called copy-on-write which only copies the memory space for the new process when the memory is written to. This way, if the forked process execs a new program right away, you've saved the overhead of copying the original program's memory.
Getting back to your question, the idea is similar. As others have pointed out, requesting the memory gets you the virtual address space immediately, but the actual pages are only allocated when you write to them.
What's the purpose of this? It basically makes allocating memory a more or less constant-time, O(1) operation instead of an O(n) operation (similar to the way the Linux scheduler spreads its work out instead of doing it in one big chunk).
To demonstrate what I mean I did the following experiment:
rbarnes@rbarnes-desktop:~/test_code$ time ./bigmalloc

real    0m0.005s
user    0m0.000s
sys     0m0.004s

rbarnes@rbarnes-desktop:~/test_code$ time ./deadbeef

real    0m0.558s
user    0m0.000s
sys     0m0.492s

rbarnes@rbarnes-desktop:~/test_code$ time ./justwrites

real    0m0.006s
user    0m0.000s
sys     0m0.008s
The bigmalloc program allocates 20 million ints but doesn't do anything with them. deadbeef writes one int to each page, resulting in 19531 writes, and justwrites allocates 19531 ints and zeroes them out. As you can see, deadbeef takes about 100 times longer to execute than bigmalloc and about 90 times longer than justwrites.
bigmalloc:
#include <stdlib.h>

int main(int argc, char **argv) {
    int *big = malloc(sizeof(int) * 20000000); // allocate 80 million bytes
    return 0;
}

deadbeef:
#include <stdlib.h>

int main(int argc, char **argv) {
    int *big = malloc(sizeof(int) * 20000000); // allocate 80 million bytes
    // immediately write to each page to simulate an all-at-once allocation,
    // assuming a 4k page size on a 32-bit machine
    for (int *end = big + 20000000; big < end; big += 1024)
        *big = 0xDEADBEEF;
    return 0;
}

justwrites:
#include <stdlib.h>

int main(int argc, char **argv) {
    int *big = calloc(sizeof(int), 19531); // number of writes
    return 0;
}
Yes, the memory isn't mapped into your address space until you touch it. Calling malloc only sets up the page tables, so that when a page fault occurs in the allocated region the kernel knows the memory should be mapped in.
Are you using compiler optimizations? Maybe the optimizer has removed the allocation since you're not using the allocated resources?
The feature is called overcommit: the kernel "promises" you the memory by increasing the data segment size, but does not allocate physical memory for it. When you touch an address in that new space, the process page-faults into the kernel, which then maps physical pages to it.
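On Linux you can also watch overcommit from inside the process by reading /proc/self/status (a sketch; the 100 MiB size is arbitrary). VmSize jumps right after malloc(), but VmRSS only grows once the memory is touched:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE (100 * 1024 * 1024)            /* 100 MiB */

/* print the VmSize and VmRSS lines from /proc/self/status (Linux-specific) */
static void show_memory(const char *label)
{
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return;
    printf("%s\n", label);
    while (fgets(line, sizeof line, f))
        if (strncmp(line, "VmSize:", 7) == 0 || strncmp(line, "VmRSS:", 6) == 0)
            fputs(line, stdout);
    fclose(f);
}

int main(void)
{
    show_memory("before malloc:");

    char *p = malloc(SIZE);                 /* VmSize grows, VmRSS barely moves */
    if (p == NULL)
        return 1;
    show_memory("after malloc:");

    memset(p, 0xAA, SIZE);                  /* page faults pull in real pages: VmRSS grows */
    show_memory("after memset:");

    free(p);
    return 0;
}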
Yes; note the VirtualAlloc flags MEM_RESERVE and MEM_COMMIT.
Heh, but on Linux, or any POSIX/BSD/SVR# system, vfork() has been around for ages and provides similar functionality.
The vfork() function differs from fork() only in that the child process can share code and data with the calling process (parent process). This speeds cloning activity significantly at a risk to the integrity of the parent process if vfork() is misused. The use of vfork() for any purpose except as a prelude to an immediate call to a function from the exec family, or to _exit(), is not advised.

The vfork() function can be used to create new processes without fully copying the address space of the old process. If a forked process is simply going to call exec, the data space copied from the parent to the child by fork() is not used. This is particularly inefficient in a paged environment, making vfork() particularly useful. Depending upon the size of the parent's data space, vfork() can give a significant performance improvement over fork().
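In other words, the intended use is vfork() followed immediately by an exec-family call or _exit() in the child. A minimal sketch of that pattern (the echo command is just an arbitrary example):

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = vfork();                    /* parent is suspended until the child execs or exits */
    if (pid == 0) {
        /* child: after vfork(), only exec*() or _exit() is safe */
        execlp("echo", "echo", "hello from the child", (char *)NULL);
        _exit(127);                         /* reached only if the exec failed */
    } else if (pid > 0) {
        int status;
        waitpid(pid, &status, 0);           /* parent resumes after the exec and waits for the child */
    } else {
        perror("vfork");
        return 1;
    }
    return 0;
}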

C malloc and free

I was taught that if you call malloc() but never free(), the memory stays taken until a restart. Well, of course I tested it. A very simple program:
#include <stdlib.h>

int main(void)
{
    while (1) malloc(1000);
}
And I watched over it in Task Manager (Windows 8.1).
Well, the program took up 2037.4 MB really quickly and just stayed like that. I understand it's probably Windows limiting the program.
But here is the weird part: When I closed the console, the memory use percentage went down, even though I was taught that it isn't supposed to!
Is it redundant to call free, since the operating system frees it up anyway?
(The question over here is related, but doesn't quite answer whether I should free or not.)
On Windows, a 32-bit process can only allocate about 2048 megabytes, because by default that is how much of its 4 GB address space is available to user code. Some of that space is already in use, so the real figure is lower. malloc returns a null pointer when it fails, which is most likely what happens at that point. You could modify your program like this to see that:
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    int counter = 0;
    while (1) {
        counter++;
        if (malloc(1000) == NULL) {
            printf("malloc failed after %d calls\n", counter);
            return 0;
        }
    }
}
Now you should get output like this:
$ ./mem
malloc failed after 3921373 calls
When a process terminates, or is terminated from the outside (as you did by killing it through the Task Manager), all memory allocated by the process is released. The operating system manages which memory belongs to which process and can therefore free a process's memory when it terminates. The operating system does not, however, know how each process uses the memory it requested, and it depends on the process telling it when a chunk of memory is no longer needed.
Why do you need free() then? Well, the cleanup above only happens on program termination and does not discriminate between memory you still need and memory you don't need any more. When your process is doing complicated things, it is often constantly allocating and releasing memory for its own computations. It's important to release memory explicitly with free(), because otherwise your process might at some point no longer be able to allocate new memory and will crash. It's also good programming practice to release memory when you can, so your process does not unnecessarily eat up tons of memory; other processes can use that memory instead.
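As a sketch of that allocate/use/free cycle (the buffer size and iteration count are made up), each pass returns its scratch buffer, so the process footprint stays flat; remove the free() and the same loop would try to hold on to tens of gigabytes:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    for (int iteration = 0; iteration < 1000000; iteration++) {
        char *scratch = malloc(64 * 1024);  /* 64 KiB of scratch space per iteration */
        if (scratch == NULL) {
            fprintf(stderr, "out of memory at iteration %d\n", iteration);
            return 1;
        }
        memset(scratch, 0, 64 * 1024);      /* stand-in for real work */
        free(scratch);                      /* without this, roughly 64 GiB would pile up */
    }
    return 0;
}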
It is advisable to call free once you are done with memory you have allocated, as you may need that space later in your program, and it will be a problem if there is no room left for new allocations.
You should also aim for portable code: if Windows frees this space for you, other operating systems may not.
Every process in the operating system has a limited amount of addressable memory, called the process address space. If you allocate a huge amount of memory and end up exhausting the memory available to the process, malloc will fail and return NULL, and you will not be able to allocate any more memory for that process.
With any non-trivial OS, process resources are reclaimed by the OS upon process termination.
Unless there is a specific and overriding reason to explicitly free memory upon termination, you don't need to do it, and you should not try, for at least these reasons:
1) You would need to write the code to do it, and test it, and debug it. Why do this if the OS can do it? It's not just redundant, it reduces quality, because your explicit resource-releasing code will never get as much testing as the OS has already had before it was released.
2) With a complex app, with a GUI and many subsystems and threads, cleanly freeing memory on shutdown is nigh-on impossible anyway, which leads to:
3) Many library developers have already given up on the 'you must explicitly release blah...' mantra because the complexity would result in the libs never being released. Many report unreleased (but not lost) memory to Valgrind, and with opaque libs you can do nothing at all about it.
4) You must not free any memory that is in use by a running thread. To safely release all such memory in multithreaded apps, you must ensure that all process threads are stopped and cannot run again. User code does not have the tools to do this; the OS does. It is therefore not possible to explicitly free memory from user code in any kind of safe manner in such apps.
5) The OS can free the process memory in big chunks, much more quickly than messing around with dozens of sub-allocations in the C memory manager.
6) If the process is being terminated because it has failed due to memory-management issues, calling free() many more times is not going to help at all.
7) Many teachers and profs say that you must explicitly free the memory, so it's obviously a bad plan.

Can I force the Linux kernel to use particular memory pages for a new executable

When I execute a binary, I want its stack segment to be filled with special data.
What I do is write a program that allocates a huge buffer on the stack, calls malloc and mmap a lot, and fills all of this memory with the character 'A', for example. Then I check and see that about 80% of the whole memory is used by my program. Then I stop that program and start another program that just walks through the stack and checks the values there. But I don't see my 'A' character anywhere. Can someone tell me how I can do this?
UPDATE
I'm doing this for a CTF. The task looks something like this:
#include <stdlib.h>

int func()
{
    int i;                 /* deliberately left uninitialized */
    if (i == 0xdeadbeef)
        system("cat flag");
    else
        func();
}

int main()
{
    func();
}
No, not without making heavy changes to the kernel. New anonymous pages are always zero-filled, and even if you could fill them with something else, there would be no reasonable way you could make them carry over data from old processes. Doing so would be a huge security hole in itself.
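You can confirm the zero-fill guarantee for fresh anonymous pages with a small check like this (the 16 MiB size is arbitrary); the count it prints should always be zero:

#include <stdio.h>
#include <sys/mman.h>

#define SIZE (16 * 1024 * 1024)             /* 16 MiB */

int main(void)
{
    /* fresh anonymous pages straight from the kernel */
    unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    long nonzero = 0;
    for (long i = 0; i < SIZE; i++)
        if (p[i] != 0)
            nonzero++;

    printf("non-zero bytes in a fresh anonymous mapping: %ld\n", nonzero);

    munmap(p, SIZE);
    return 0;
}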

How many times will this loop run?

A question asked in an interview:
while (1)
{
    void *a = malloc(1024 * 1024);
}
How many times will this loop run with 2 GB of RAM and with 8 GB of RAM?
I said it's an infinite loop, because there is no terminating condition even once memory is full.
He didn't agree. I don't have any idea now. Please help.
It should run indefinitely. On most platforms, when there's no more memory available, malloc() will return NULL, so the loop will keep running without changing the amount of memory allocated. Linux, however, allows memory overcommit, so malloc() calls continue to add to virtual memory. The process might eventually get killed by the OOM killer when the bookkeeping data malloc() uses to administer the memory starts to cause problems (it won't be because of the allocated memory itself, since the code never touches it), but Linux isn't stipulated as the platform in the question.

Why is malloc+memset slower than calloc?

It's known that calloc is different than malloc in that it initializes the memory allocated. With calloc, the memory is set to zero. With malloc, the memory is not cleared.
So in everyday work, I regard calloc as malloc+memset.
Incidentally, for fun, I wrote the following code for a benchmark.
The result is confusing.
Code 1:
#include <stdio.h>
#include <stdlib.h>

#define BLOCK_SIZE 1024*1024*256

int main()
{
    int i = 0;
    char *buf[10];

    while (i < 10)
    {
        buf[i] = (char*)calloc(1, BLOCK_SIZE);
        i++;
    }
}
Output of Code 1:
time ./a.out
real    0m0.287s
user    0m0.095s
sys     0m0.192s
Code 2:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 1024*1024*256

int main()
{
    int i = 0;
    char *buf[10];

    while (i < 10)
    {
        buf[i] = (char*)malloc(BLOCK_SIZE);
        memset(buf[i], '\0', BLOCK_SIZE);
        i++;
    }
}
Output of Code 2:
time ./a.out
real    0m2.693s
user    0m0.973s
sys     0m1.721s
Replacing memset with bzero(buf[i],BLOCK_SIZE) in Code 2 produces the same result.
My question is: Why is malloc+memset so much slower than calloc? How can calloc do that?
The short version: Always use calloc() instead of malloc()+memset(). In most cases, they will be the same. In some cases, calloc() will do less work because it can skip memset() entirely. In other cases, calloc() can even cheat and not allocate any memory! However, malloc()+memset() will always do the full amount of work.
Understanding this requires a short tour of the memory system.
Quick tour of memory
There are four main parts here: your program, the standard library, the kernel, and the page tables. You already know your program, so...
Memory allocators like malloc() and calloc() are mostly there to take small allocations (anything from 1 byte to 100s of KB) and group them into larger pools of memory. For example, if you allocate 16 bytes, malloc() will first try to get 16 bytes out of one of its pools, and then ask for more memory from the kernel when the pool runs dry. However, since the program you're asking about is allocating a large amount of memory at once, malloc() and calloc() will just ask for that memory directly from the kernel. The threshold for this behavior depends on your system, but I've seen 1 MiB used as the threshold.
The kernel is responsible for allocating actual RAM to each process and making sure that processes don't interfere with the memory of other processes. This is called memory protection; it has been dirt common since the 1990s, and it's the reason one program can crash without bringing down the whole system. So when a program needs more memory, it can't just take the memory; instead it asks the kernel for it using a system call like mmap() or sbrk(). The kernel gives RAM to each process by modifying the page table.
The page table maps memory addresses to actual physical RAM. Your process's addresses, 0x00000000 to 0xFFFFFFFF on a 32-bit system, aren't real memory but instead are addresses in virtual memory. The processor divides these addresses into 4 KiB pages, and each page can be assigned to a different piece of physical RAM by modifying the page table. Only the kernel is permitted to modify the page table.
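As an aside, 4 KiB is just the common default; a program can query the actual page size instead of hard-coding it, for example with sysconf():

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long page_size = sysconf(_SC_PAGESIZE); /* usually 4096 on x86 Linux */
    printf("page size: %ld bytes\n", page_size);
    return 0;
}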
How it doesn't work
Here's how allocating 256 MiB does not work:
Your process calls calloc() and asks for 256 MiB.
The standard library calls mmap() and asks for 256 MiB.
The kernel finds 256 MiB of unused RAM and gives it to your process by modifying the page table.
The standard library zeroes the RAM with memset() and returns from calloc().
Your process eventually exits, and the kernel reclaims the RAM so it can be used by another process.
How it actually works
The above process would work, but it just doesn't happen this way. There are three major differences.
When your process gets new memory from the kernel, that memory was probably used by some other process previously. This is a security risk. What if that memory has passwords, encryption keys, or secret salsa recipes? To keep sensitive data from leaking, the kernel always scrubs memory before giving it to a process. We might as well scrub the memory by zeroing it, and if new memory is zeroed we might as well make it a guarantee, so mmap() guarantees that the new memory it returns is always zeroed.
There are a lot of programs out there that allocate memory but don't use the memory right away. Sometimes memory is allocated but never used. The kernel knows this and is lazy. When you allocate new memory, the kernel doesn't touch the page table at all and doesn't give any RAM to your process. Instead, it finds some address space in your process, makes a note of what is supposed to go there, and makes a promise that it will put RAM there if your program ever actually uses it. When your program tries to read or write from those addresses, the processor triggers a page fault and the kernel steps in to assign RAM to those addresses and resumes your program. If you never use the memory, the page fault never happens and your program never actually gets the RAM.
Some processes allocate memory and then read from it without modifying it. This means that a lot of pages in memory across different processes may be filled with pristine zeroes returned from mmap(). Since these pages are all the same, the kernel makes all these virtual addresses point to a single shared 4 KiB page of memory filled with zeroes. If you try to write to that memory, the processor triggers another page fault and the kernel steps in to give you a fresh page of zeroes that isn't shared with any other programs.
The final process looks more like this:
Your process calls calloc() and asks for 256 MiB.
The standard library calls mmap() and asks for 256 MiB.
The kernel finds 256 MiB of unused address space, makes a note about what that address space is now used for, and returns.
The standard library knows that the result of mmap() is always filled with zeroes (or will be once it actually gets some RAM), so it doesn't touch the memory, so there is no page fault, and the RAM is never given to your process.
Your process eventually exits, and the kernel doesn't need to reclaim the RAM because it was never allocated in the first place.
If you use memset() to zero the page, memset() will trigger the page fault, cause the RAM to get allocated, and then zero it even though it is already filled with zeroes. This is an enormous amount of extra work, and explains why calloc() is faster than malloc() and memset(). If you end up using the memory anyway, calloc() is still faster than malloc() and memset() but the difference is not quite so ridiculous.
This doesn't always work
Not all systems have paged virtual memory, so not all systems can use these optimizations. This applies to very old processors like the 80286 as well as embedded processors which are just too small for a sophisticated memory management unit.
This also won't always work with smaller allocations. With smaller allocations, calloc() gets memory from a shared pool instead of going directly to the kernel. In general, the shared pool might have junk data stored in it from old memory that was used and freed with free(), so calloc() could take that memory and call memset() to clear it out. Common implementations will track which parts of the shared pool are pristine and still filled with zeroes, but not all implementations do this.
Dispelling some wrong answers
Depending on the operating system, the kernel may or may not zero memory in its free time, in case you need to get some zeroed memory later. Linux does not zero memory ahead of time, and Dragonfly BSD recently also removed this feature from their kernel. Some other kernels do zero memory ahead of time, however. Zeroing pages during idle isn't enough to explain the large performance differences anyway.
The calloc() function is not using some special memory-aligned version of memset(), and that wouldn't make it much faster anyway. Most memset() implementations for modern processors look kind of like this:
function memset(dest, c, len)
    // one byte at a time, until the dest is aligned...
    while (len > 0 && ((unsigned int)dest & 15))
        *dest++ = c
        len -= 1
    // now write big chunks at a time (processor-specific)...
    // block size might not be 16, it's just pseudocode
    while (len >= 16)
        // some optimized vector code goes here
        // glibc uses SSE2 when available
        dest += 16
        len -= 16
    // the end is not aligned, so one byte at a time
    while (len > 0)
        *dest++ = c
        len -= 1
So you can see, memset() is very fast and you're not really going to get anything better for large blocks of memory.
The fact that memset() is zeroing memory that is already zeroed does mean that the memory gets zeroed twice, but that only explains a 2x performance difference. The performance difference here is much larger (I measured more than three orders of magnitude on my system between malloc()+memset() and calloc()).
Party trick
Instead of looping 10 times, write a program that allocates memory until malloc() or calloc() returns NULL.
What happens if you add memset()?
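A sketch of that party trick (1 MiB blocks; pass any command-line argument to add the memset()). Note that on Linux with overcommit enabled, the memset() variant may be killed by the OOM killer before malloc() ever returns NULL:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK (1024 * 1024)                 /* 1 MiB per allocation */

int main(int argc, char **argv)
{
    int touch = (argc > 1);                 /* any argument: also memset() each block */
    long count = 0;

    for (;;) {
        void *p = malloc(BLOCK);
        if (p == NULL)
            break;
        if (touch)
            memset(p, 0, BLOCK);            /* forces real pages to be assigned */
        count++;
    }

    printf("allocated %ld MiB before malloc() returned NULL\n", count);
    return 0;
}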
Because on many systems, in spare processing time, the OS goes around setting free memory to zero on its own and marking it safe for calloc(), so when you call calloc(), it may already have free, zeroed memory to give you.
On some platforms, in some modes, malloc initialises the memory to a typically non-zero value before returning it, so the second version could well initialize the memory twice.
