I'm about to teach an introductory computer science course in C and I'd like to demonstrate to students why they should check whether malloc() returned a NULL. My plan was to use ulimit to restrict the amount of available memory such that I could exercise different code paths with different limits. Our prescribed environment is CentOS 6.5.
My first attempts to make this happen failed, and the shell just showed "Killed". This led to me discovering the Linux OOM killer. I have since tried to figure out the magic set of incantations that will produce the results I'm looking for. Apparently I need to mess with:
/etc/sysctl.conf
ulimit -m
ulimit -v
vm.overcommit_memory (which apparently should be set to 2, according to an Oracle article)
So far I either get "Killed" or a segmentation fault, neither of which is the expected outcome. The fact that I'm getting "Killed" with vm.overcommit_memory=2 means that I definitely don't understand what's going on.
If anyone can find a way to artificially and reliably create a constrained execution environment on CentOS so that students learn how to handle OOM (and other?) kinds of errors, many course instructors will thank you.
It is possible to [effectively] turn off overcommitting from kernel >= 2.5.30.
Following Linux Kernel Memory:
(Save your work first, and note your current overcommit_memory and overcommit_ratio values.)
# echo 2 > /proc/sys/vm/overcommit_memory
# echo 1 > /proc/sys/vm/overcommit_ratio
This sets vm.overcommit_memory to 2, which tells the kernel not to commit more memory than swap plus overcommit_ratio percent of physical RAM; with overcommit_ratio set to 1, overcommitting is effectively disabled.
Null malloc demo
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    void *page = 0;
    void *pages[256];
    int index = 0;

    /* Keep asking for 1 GB blocks until malloc() fails or we have 256 of them. */
    while (1) {
        page = malloc(1073741824);  /* 1 GB */
        if (!page)
            break;
        pages[index] = page;
        ++index;
        if (index >= 256)
            break;
    }

    if (index >= 256)
        printf("allocated 256 pages\n");
    else
        printf("memory failed at %d\n", index);

    /* Release whatever we managed to allocate. */
    while (index > 0) {
        --index;
        free(pages[index]);
    }
    return 0;
}
Output
$ cat /proc/sys/vm/overcommit_memory
0
$ cat /proc/sys/vm/overcommit_ratio
50
$ ./code/stackoverflow/test-memory
allocated 256 pages
$ su
# echo 2 > /proc/sys/vm/overcommit_memory
# echo 1 > /proc/sys/vm/overcommit_ratio
# exit
exit
$ cat /proc/sys/vm/overcommit_memory
2
$ cat /proc/sys/vm/overcommit_ratio
1
$ ./code/stackoverflow/test-memory
memory failed at 0
Remember to restore your overcommit_memory to 0 and overcommit_ratio to the value you noted earlier.
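If you'd rather not change system-wide overcommit settings on a shared classroom machine, another option is to have each demo process lower its own address-space limit with setrlimit(RLIMIT_AS), the programmatic equivalent of ulimit -v. Below is a minimal sketch under that assumption; the 64 MiB limit is an arbitrary example value. Once the limit is reached, malloc() simply returns NULL and the OOM killer never gets involved.

/* limit_demo.c - shrink our own address-space limit so malloc() fails.
   Minimal sketch; 64 MiB is an arbitrary example limit. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit lim = { 64UL * 1024 * 1024, 64UL * 1024 * 1024 };  /* 64 MiB */
    size_t total = 0;

    if (setrlimit(RLIMIT_AS, &lim) == -1) {
        fprintf(stderr, "setrlimit: %s\n", strerror(errno));
        return 1;
    }

    /* Keep allocating 1 MiB blocks until malloc() reports failure. */
    for (;;) {
        void *p = malloc(1024 * 1024);
        if (p == NULL) {
            printf("malloc returned NULL after %zu MiB\n", total / (1024 * 1024));
            break;
        }
        memset(p, 0xA5, 1024 * 1024);  /* touch the pages so they are really used */
        total += 1024 * 1024;
    }
    return 0;
}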
I am writing a program that generates hashes for files in all subdirectories and then puts them in a database or prints them to standard output: https://github.com/cherrry9/dedup
In the latest commit, I added an option for my program to use multiple threads (the THREADS macro).
Here are some benchmarks that I did:
$ test() { /usr/bin/time -p ./dedup / -v 0 -c 2048 -e "/\(proc\|sys\|dev\|run\)"; }
$ make clean all THREADS=1 && test
real 8.03
user 4.34
sys 4.55
$ make clean all THREADS=4 && test
real 3.94
user 7.66
sys 7.42
As you can see, the version compiled with THREADS=4 was roughly 2 times faster.
Now I will use the second positional argument to specify an sqlite3 database:
$ test() { /usr/bin/time -p ./dedup / test.db -v 0 -c 2048 -e "/\(proc\|sys\|dev\|run\)"; }
$ make clean all THREADS=1 && test
real 20.40
user 7.58
sys 7.29
$ rm test.db
$ make clean all THREADS=4 && test
real 21.86
user 17.17
sys 18.15
The version compiled with THREADS=4 was slower than the one that used THREADS=1!
When the second argument is used, dedup.c executes this code, which inserts the hashes into the database:
if (sql != NULL && sql_insert(sql, entry->fpath, hash) != 0) {
// ...
sql_insert uses transactions to prevent sqlite from writing to the database every time I call INSERT.
int
sql_insert(SQL *sql, const char *filename, char unsigned hash[])
{
int errcode;
pthread_mutex_lock(&sql->mtx);
sqlite3_bind_text(sql->stmt, 1, filename, -1, NULL);
sqlite3_bind_blob(sql->stmt, 2, hash, SHA256_LENGTH, NULL);
sqlite3_step(sql->stmt);
SQL_TRY(sqlite3_reset(sql->stmt));
if (++sql->insertc >= INSERT_LIM) {
SQL_TRY(sqlite3_exec(sql->database, "COMMIT;BEGIN", NULL, NULL, NULL));
sql->insertc = 0;
}
pthread_mutex_unlock(&sql->mtx);
return 0;
}
This fragment is executed for every processed file and for some reason it's blocking all threads in my program.
And here's my question: how can I prevent sqlite from blocking threads and degrading the performance of my program?
Here is an explanation of the dedup options, in case you're wondering what the test function is doing:
1st positional argument - directory to use to generate hashes
2nd positional argument - path to the database which will be used by sqlite3
-v level - verbose level (0 means print only errors)
-c nbytes - read nbytes from each file
-e regex - exclude directories that match regex
I'm using serialized mode in sqlite3.
It seems that all your threads use the same database connection and statement objects. Therefore you have a race condition (even in the SERIALIZED threading model), as multiple threads are binding, stepping, and resetting the same statement. Asking "why is it slow" is irrelevant until you fix this problem.
Instead, you should wrap the body of sql_insert in a mutex to guarantee that at most one thread is accessing the database connection at a time:
int
sql_insert(SQL *sql, const char *filename, char unsigned hash[])
{
pthread_mutex_lock(&sql->mutex);
// ... actual insert and exec code ...
pthread_mutex_unlock(&sql->mutex);
return 0;
}
Then add and initialize that mutex in your SQL structure with pthread_mutex_init.
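A minimal sketch of what that could look like, assuming an SQL struct roughly like the one in the question (field and function names here are illustrative):

#include <pthread.h>
#include <sqlite3.h>

typedef struct {
    sqlite3         *database;  /* shared connection */
    sqlite3_stmt    *stmt;      /* shared prepared INSERT statement */
    int              insertc;   /* inserts since the last COMMIT */
    pthread_mutex_t  mutex;     /* serializes all access to the objects above */
} SQL;

int sql_open(SQL *sql, const char *path)
{
    if (sqlite3_open(path, &sql->database) != SQLITE_OK)
        return -1;
    /* ... prepare the INSERT statement and BEGIN the first transaction here ... */
    sql->insertc = 0;
    return pthread_mutex_init(&sql->mutex, NULL);  /* 0 on success */
}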
You'll see the performance boost if your bottleneck is indeed the computation of SHA-256 rather than writing into the database. Otherwise the overhead of this mutex should be negligible and the number of threads will not have a significant effect on the run-time.
Just a quick question (I hope). How would you allocate an address space via mlock and then launch an application within that space?
For instance I have a binary that launches from a wrapper program that configures the environment. I only have access to the wrapper code and would like to have the binary launch in a certain address space. Is it possible to do this from the wrapper?
Thanks!
If you have the sources for the program, add a command-line option so that the program calls mlockall(MCL_CURRENT | MCL_FUTURE) at some point. That locks it in memory.
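For example, a minimal sketch of that approach (the option name --lock-memory is just an illustration):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(int argc, char *argv[])
{
    int i;

    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--lock-memory") == 0) {
            /* Lock everything mapped now, and everything mapped in the future. */
            if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
                perror("mlockall");
                return 1;
            }
            break;
        }
    }

    /* ... rest of the program ... */
    return 0;
}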
If you want to control the address space the kernel loads the program into, you need to delve into kernel internals. Most likely there is no reason to do so; only people with really funky hardware would need to.
If you don't have the sources, or don't want to recompile the program, then you can create a dynamic library that makes the mlockall() call, and inject it into the process via LD_PRELOAD.
Save the following as lockall.c:
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>
#include <errno.h>
static void wrerr(const char *p)
{
if (p) {
const char *q = p + strlen(p);
ssize_t n;
while (p < q) {
n = write(STDERR_FILENO, p, (size_t)(q - p));
if (n > 0)
p += n;
else
if (n != -1 || errno != EINTR)
return;
}
}
}
static void init(void) __attribute__((constructor));
static void init(void)
{
int saved_errno = errno;
if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
const char *errmsg = strerror(errno);
wrerr("Cannot lock all memory: ");
wrerr(errmsg);
wrerr(".\n");
exit(127);
} else
wrerr("All memory locked.\n");
errno = saved_errno;
}
Compile it to a dynamic library liblockall.so using
gcc -Wall -O2 -fPIC -shared lockall.c -Wl,-soname,liblockall.so -o liblockall.so
Install the library somewhere typical, for example
sudo install -o 0 -g 0 -m 0664 liblockall.so /usr/lib/
so you can run any binary, and lock it into memory, using
LD_PRELOAD=liblockall.so binary arguments..
If you install the library somewhere else (not listed in /etc/ld.so.conf), you'll need to specify the path to the library, like
LD_PRELOAD=/usr/lib/liblockall.so binary arguments..
Typically, you'll see the message Cannot lock all memory: Cannot allocate memory. printed by the interposed library, when running commands as a normal user. (The superuser, or root, typically has no such limit.) This is because for obvious reasons, most Linux distributions limit the amount of memory an unprivileged user can lock into memory; this is the RLIMIT_MEMLOCK resource limit. Run ulimit -l to see the per-process resource limits currently set (for the current user, obviously).
I suggest you set a suitable limit on how much memory the process can lock, by running e.g. the ulimit -l 16384 bash built-in before executing the program (this sets the limit to 16384*1024 bytes, or 16 MiB), if running as superuser (root). If the process leaks memory, then instead of crashing your machine (because it has locked all available memory), the process will die (from SIGSEGV) when it exceeds the limit. That is, you'd start your process using
ulimit -l 16384
LD_PRELOAD=/usr/lib/liblockall.so binary arguments..
if using Bash or dash shell.
If running as a dedicated user, most distributions use the pam_limits.so PAM module to set the resource limits "automatically". The limits are listed either in the /etc/security/limits.conf file, or in a file in the /etc/security/limits.d/ subdirectory, using this format; the memlock item specifies the amount of memory each process can lock, in units of 1024 bytes. So, if your service runs as user mydev, and you wish to allow the user to lock up to 16 megabytes = 16384*1024 bytes per process, then add line mydev - memlock 16384 into /etc/security/limits.conf or /etc/security/limits.d/mydev.conf, whichever your Linux distribution prefers/suggests.
Prior to PAM, shadow-utils were used to control the resource limits. The memlock resource limit is specified in units of 1024 bytes; a limit of 16 megabytes would be set using M16384. So, if using shadow-utils instead of PAM, adding line mydev M16384 (followed by whatever the other limits you wish to specify) to /etc/limits should do the trick.
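If the wrapper should check programmatically how much memory the process will be allowed to lock before launching the binary, a minimal sketch using getrlimit() and the RLIMIT_MEMLOCK limit discussed above looks like this:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_MEMLOCK, &lim) == -1) {
        perror("getrlimit");
        return 1;
    }
    if (lim.rlim_cur == RLIM_INFINITY)
        printf("memlock limit: unlimited\n");
    else
        printf("memlock limit: %llu bytes\n", (unsigned long long)lim.rlim_cur);
    return 0;
}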
I wish to dump the output of the rsyslog service to a file at a location of my choosing.
Here is what I have tried:
1. Made changes to /etc/rsyslog.conf
#################
#### MODULES ####
#################
$ModLoad imfile
$ModLoad omprog <----- NEWLY ADDED ------>
$ModLoad imuxsock # provides support for local system logging
$ModLoad imklog # provides kernel logging support
#$ModLoad immark # provides --MARK-- message capability
###########################
#### GLOBAL DIRECTIVES ####
###########################
#
# Use traditional timestamp format.
# To enable high precision timestamps, comment out the following line.
#
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$ActionOMProgBinary /home/test/dmsg <----- NEWLY ADDED ------>
# Filter duplicated messages
dmsg is a C program that reads lines from stdin and writes them to a file (/home/test/log_syslog_file).
I am expecting the output to be dumped to /home/test/log_syslog_file, but nothing happens.
Code for dmsg (dmsg.c):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
int main(){
char* lineptr;
size_t size = 0;
int fd = open("log_syslog_file", O_CREAT| O_WRONLY);
while(getline(&lineptr, &size, stdin)>0){
if(write(fd, lineptr, strlen(lineptr))<0){
fprintf(stderr, "write failure");
break;
}
}
free(lineptr);
close(fd);
return 0;
}
I am using Ubuntu 14.04
-------- EDIT ---------
After starting the rsyslog service,
I am giving the following command:
rsyslogd -c5 -d -n
When I use the following, it works fine:
cat /var/log/syslog | ./dmsg
Thanks.
You've got at least one major bug in your code:
char* lineptr;
...
while(getline(&lineptr, &size, stdin)>0)
You never allocate memory for the string stored in *lineptr, but you don't tell getline() to allocate the memory for you, either. The resulting buffer overflow can result in all sorts of exciting bugs showing up before the inevitable crash (for example, in my test run, log_syslog_file got the permissions ---x--x--T).
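A minimal corrected sketch of dmsg.c, initializing lineptr to NULL so that getline() allocates the buffer itself, and passing an explicit mode to open() so the log file gets sane permissions:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char *lineptr = NULL;  /* NULL so getline() allocates the buffer for us */
    size_t size = 0;
    ssize_t len;
    int fd = open("log_syslog_file", O_CREAT | O_WRONLY, 0644);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    while ((len = getline(&lineptr, &size, stdin)) > 0) {
        if (write(fd, lineptr, (size_t)len) < 0) {
            perror("write");
            break;
        }
    }
    free(lineptr);
    close(fd);
    return 0;
}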
First of all, what @Mark said. Apart from that, make sure that
you have something like
*.* :omprog:
in your rsyslog.conf.
This will redirect all the messages to your program.
I was trying to do a buffer overflow (I'm using Linux) on a simple program that requires a password. Here's the program code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int check_authentication(char *password){
int auth_flag = 0;
char password_buffer[16];
strcpy(password_buffer, password);
if(strcmp(password_buffer, "pass1") == 0)
auth_flag = 1;
if(strcmp(password_buffer, "pass2") == 0)
auth_flag = 1;
return auth_flag;
}
int main(int argc, char **argv)
{
if(argc < 2){
printf("\t[!] Correct usage: %s <password>\n", argv[0]);
exit(0);
}
if(check_authentication(argv[1])){
printf("\n-=-=-=-=-=-=-=-=\n");
printf(" Access granted.\n");
printf("-=-=-=-=-=-=-=-=\n");
} else {
printf("\nAccess Denied.\n");
}
return 0;
}
OK, I saved it as overflow.c and compiled it with no errors.
Then I opened the Terminal, moved into the directory containing the file (Desktop) and ran:
./overflow.c AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
The Terminal said "stack smashing detected" (or something like that) and then terminated the program.
Now, I'm reading a book called "Hacking - The Art of Exploitation" by Jon Erickson. In one chapter he explains this type of exploit (I took the code from the book) and runs the same command I did. The memory overflows and the program prints "Access granted.". So why is my OS detecting that I'm trying to exploit the program? Have I done something wrong?
I also tried the exploit on Mac OS X. Same thing happened. Please, can someone help me? Thanks in advance.
In modern Linux distributions stack buffer overflows are detected at run time and the offending process is killed. To disable that protection, compile your application with these flags (gcc):
-fno-stack-protector -fno-stack-protector-all
If compiling with gcc, add the -fno-stack-protector flag (e.g. gcc -fno-stack-protector overflow.c -o overflow). The message you received is meant to protect you from your bad code :)
The reason is that "stack smashing detected" comes from a protection mechanism some compilers add specifically to catch buffer overflow attacks: you are trying to copy a string of A's that is much longer than the 16-byte character array it is written into.
Most modern OSes have protective mechanisms built in. Almost any good OS denies programs direct low-level memory access and only allows them to access the address space allocated to them. Linux-based OSes automatically kill processes that try to access memory beyond their allocated address space.
Besides this, OSes also have protective mechanisms that prevent a program from crashing the system by allocating large amounts of memory and severely depleting the resources available to the OS.
I am searching for a way to create a somewhat random string 64k in size.
But I want this to be fast as well. I have tried the following ways:
a) read from /dev/random -- This is too slow
b) call rand() or a similar function of my own -- a solution with few (<10) calls should be OK.
c) malloc() -- On my Linux, the memory region is always all zeroes, instead of some random data.
d) Get some randomness from stack variable addresses, timestamps, etc. to initialize the first few bytes, then copy these values over the remaining array in different variations.
I would like to know if there is a better way to approach this.
/dev/random blocks after its pool of random data has been emptied, until it has gathered new random data. You should try /dev/urandom instead.
rand() should be fairly fast in your C runtime implementation. If you can relax your "random" requirement a bit (accepting lower-quality random numbers), you can generate a sequence of numbers using a tailored implementation of a linear congruential generator. Be sure to choose your parameters wisely (see the Wikipedia entry) to allow additional optimizations.
To generate such a long set of random numbers faster, you could use SSE/AVX and generate four/eight 32-bit random values in parallel.
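A scalar sketch of that LCG approach (the multiplier and increment below are the widely used 32-bit "Numerical Recipes" constants; a SIMD version would simply run several such generators in parallel):

#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define BUF_SIZE (64 * 1024)

/* Fill buf with pseudo-random bytes using the 32-bit LCG
   x = 1664525 * x + 1013904223 (mod 2^32).
   The low-order byte of an LCG is the weakest; fine for
   "somewhat random" data, not for anything security-related. */
static void fill_random(unsigned char *buf, size_t n, uint32_t seed)
{
    uint32_t x = seed;
    size_t i = 0;

    while (i + 4 <= n) {
        x = 1664525u * x + 1013904223u;
        buf[i++] = (unsigned char)x;
        buf[i++] = (unsigned char)(x >> 8);
        buf[i++] = (unsigned char)(x >> 16);
        buf[i++] = (unsigned char)(x >> 24);
    }
    while (i < n) {  /* trailing bytes if n is not a multiple of 4 */
        x = 1664525u * x + 1013904223u;
        buf[i++] = (unsigned char)x;
    }
}

int main(void)
{
    static unsigned char buf[BUF_SIZE];

    fill_random(buf, sizeof buf, (uint32_t)time(NULL));
    printf("first bytes: %02x %02x %02x %02x\n", buf[0], buf[1], buf[2], buf[3]);
    return 0;
}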
You say "somewhat random" so I assume you do not need high quality random numbers.
You should probably use a "linear congruential generator" (LCG). See Wikipedia for details:
http://en.wikipedia.org/wiki/Linear_congruential_generator
That will require one addition, one multiplication and one mod function per element.
Your options:
a) /dev/random is not intended to be called frequently. See "man 4 random" for details.
b) rand etc. are like the LCG above, but some use a more sophisticated algorithm that gives better random numbers at a higher computational cost. See "man 3 random" and "man 3 rand" for details.
c) The OS deliberately zeros the memory for security reasons. It stops leakage of data from other processes. Google "demand zero paging" for details.
d) Not a good idea. Use /dev/random or /dev/urandom once, that's what they're for; see the sketch below.
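For completeness, a minimal sketch that fills a 64 KB buffer straight from /dev/urandom; a single read() usually suffices for this size, but the loop handles short reads:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE (64 * 1024)

int main(void)
{
    static unsigned char buf[BUF_SIZE];
    size_t got = 0;
    int fd = open("/dev/urandom", O_RDONLY);

    if (fd < 0) {
        perror("open /dev/urandom");
        return EXIT_FAILURE;
    }
    while (got < sizeof buf) {  /* loop in case of a short read */
        ssize_t n = read(fd, buf + got, sizeof buf - got);
        if (n <= 0) {
            perror("read");
            close(fd);
            return EXIT_FAILURE;
        }
        got += (size_t)n;
    }
    close(fd);
    printf("read %zu random bytes\n", got);
    return 0;
}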
Perhaps calling OpenSSL routines, something like the programmatic equivalent of:
openssl rand NUM_BYTES | head -c NUM_BYTES > /dev/null
which should run faster than /dev/random and /dev/urandom.
Here's some test code:
/* randombytes.c */
#include <stdlib.h>
#include <stdio.h>
#include <openssl/rand.h>
/*
compile with:
gcc -Wall -lcrypto randombytes.c -o randombytes
*/
int main (int argc, char **argv)
{
unsigned char *random_bytes = NULL;
int length = 0;
if (argc == 2)
length = atoi(argv[1]);
else {
fprintf(stderr, "usage: randombytes number_of_bytes\n");
return EXIT_FAILURE;
}
random_bytes = malloc((size_t)length + 1);
if (! random_bytes) {
fprintf(stderr, "could not allocate space for random_bytes...\n");
return EXIT_FAILURE;
}
if (! RAND_bytes(random_bytes, length)) {
fprintf(stderr, "could not get random bytes...\n");
return EXIT_FAILURE;
}
*(random_bytes + length) = '\0';
fprintf(stdout, "bytes: %s\n", random_bytes);
free(random_bytes);
return EXIT_SUCCESS;
}
Here's how it performs on a Mac OS X 10.7.3 system (1.7 GHz i5, 4 GB), relative to /dev/urandom and OpenSSL's openssl binary:
$ time ./randombytes 100000000 > /dev/null
real 0m6.902s
user 0m6.842s
sys 0m0.059s
$ time cat /dev/urandom | head -c 100000000 > /dev/null
real 0m9.391s
user 0m0.050s
sys 0m9.326s
$ time openssl rand 100000000 | head -c 100000000 > /dev/null
real 0m7.060s
user 0m7.050s
sys 0m0.118s
The randombytes binary is 27% faster than reading bytes from /dev/urandom and about 2% faster than openssl rand.
You could profile other approaches in a similar fashion.
Don't overthink it: dd if=/dev/urandom bs=64k count=1 > random-bytes.bin.