I'm using a library which creates a pthread using the default stack size of 8 MB. Is it possible to programmatically reduce the stack size of the thread the library creates? I tried using setrlimit(RLIMIT_STACK, ...) inside my main() function, but that doesn't seem to have any effect. ulimit -s seems to do the job, but I don't want to have to set the stack size before my program is executed.
Any ideas what I can do? Thanks
Update 1:
It looks like I'm going to give up on setting the stack size with setrlimit(RLIMIT_STACK, ...). I checked the resident memory and found it's a lot less than the virtual memory, which is a good enough reason for me to stop trying to limit the stack size.
I think you are out of luck. If the library you are using does not provide a way to set the stack size, then you can't change it after the thread has been created. setrlimit and shell limits only affect the main thread's stack.
Threads are created within the process's memory space, so their stacks are allocated when the threads are created. On Unix I believe the stack will be mapped to RAM on demand, so you may not actually use 8 MB of RAM if you don't need it (virtual vs. resident memory).
There are a couple aspects to answering this question.
First, as stated in the comments, pthread_attr_setstacksize is The Right Way to do this. If the library calling pthread_create doesn't have a way to let you do this, fixing the library would be the ideal solution. If the thread is purely internal to the library (not calling code from the calling application) it really should set its own preference for the stack size based on something like PTHREAD_STACK_MIN + ITS_OWN_NEEDS. If it's calling back to your code, it should let you request how much stack space you need.
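For reference, a minimal sketch of doing it properly with pthread_attr_setstacksize (the 256 KB figure is only an illustration; the size must be at least PTHREAD_STACK_MIN and large enough for whatever the thread really does):

    #include <limits.h>     /* PTHREAD_STACK_MIN */
    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        (void)arg;
        /* ... the thread's real work ... */
        return NULL;
    }

    int main(void)
    {
        pthread_attr_t attr;
        pthread_t tid;

        pthread_attr_init(&attr);
        /* Ask for a 256 KB stack instead of the default (often 8 MB). */
        if (pthread_attr_setstacksize(&attr, 256 * 1024) != 0)
            fprintf(stderr, "requested stack size rejected\n");

        pthread_create(&tid, &attr, worker, NULL);
        pthread_attr_destroy(&attr);
        pthread_join(tid, NULL);
        return 0;
    }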
Second, as an implementation detail, glibc uses the stack limit from setrlimit/ulimit to derive the stack size for threads created by pthread_create. You can perhaps influence the size this way, but it's not portable, and as you've found, not reliable even there (it's not working when you call setrlimit from within the process itself). It's possible that glibc only probes the limit once when the relevant code is first initialized, so I would try moving the setrlimit call as early as possible in main to see if this helps.
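If you want to try the rlimit route anyway, something like this, placed as the very first thing in main() and before the library initializes, is the experiment to run (1 MB is just an illustrative value, and as noted this is glibc-specific and not guaranteed to take effect):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* Lower the soft stack limit before anything else runs; glibc derives
         * the default thread stack size from this limit, but may only read it
         * once, so it has to happen before the library creates its thread. */
        struct rlimit rl;
        if (getrlimit(RLIMIT_STACK, &rl) == 0) {
            rl.rlim_cur = 1024 * 1024;   /* 1 MB; hard limit left unchanged */
            if (setrlimit(RLIMIT_STACK, &rl) != 0)
                perror("setrlimit(RLIMIT_STACK)");
        }

        /* ... only now initialize the library that creates the thread ... */
        return 0;
    }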
Finally, the stack size for threads may not even be relevant to your application. Even if the stack size is 8 MB, only the pages which have actually been modified (probably 4 KB or at most 8 KB unless you have big arrays on the stack) are actually using physical memory. The rest is just tying up virtual address space (of which you always have at least 2-3 GB) and possibly commit charge. By default, Linux enables overcommit, so commit charge will not be strictly enforced, and therefore the fact that glibc is requesting too much may not even matter. You could make the overcommit checking even less strict by writing a 1 to /proc/sys/vm/overcommit_memory, but this will cause you to lose information about when you're "running out of memory" and instead make your program crash. On such a constrained system you may prefer even stricter overcommit accounting, but then you have to fix the thread stack size problem...
Related
I am trying to find a way to increase the stack size of my running program after it gets SIGSEGV. I know the stack size can be raised with ulimit -s, but that did not solve this problem, because my process is already dead. I want to handle this situation so that my process is not killed even after a segfault. setrlimit() is one way to raise the stack size ahead of time, but I don't want to reserve more memory than I need.
Under Linux, setting a higher limit for the stack size does not commit this memory to your process. The memory will be paged in as required.
The default stack size is quite large already. You should run under the debugger or produce a core upon SIGSEGV to analyze what is really going on. You might have very deep recursion, or an inordinate amount of local variable space, possibly via VLAs allocated by mistake. Increasing the stack space may hide the problem for a while, but it is not a reliable solution.
A recent OS will not automatically reserve the memory for the stack, but just add pages as required. The ulimit is just an upper bound you allow it to use up. So increasing the stack size statically should be no problem and exactly what you want.
I have a Linux app (written in C) that allocates a large amount of memory (~60 MB) in small chunks through malloc() and then frees it (the app continues to run after that). This memory is not returned to the OS but stays allocated to the process.
Now, the interesting thing here is that this behavior happens only on RedHat Linux and clones (Fedora, Centos, etc.) while on Debian systems the memory is returned back to the OS after all freeing is done.
Any ideas why there could be the difference between the two or which setting may control it, etc.?
I'm not certain why the two systems would behave differently (probably different malloc implementations from different glibc versions). However, you should be able to exert some control over the global policy for your process with a call like:
mallopt(M_TRIM_THRESHOLD, bytes)
(See this linuxjournal article for details).
You may also be able to request an immediate release with a call like
malloc_trim(bytes)
(See malloc.h). I believe that both of these calls can fail, so I don't think you can rely on them working 100% of the time. But my guess is that if you try them out you will find that they make a difference.
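As a rough sketch (glibc-specific, and as noted these calls are only hints that may do nothing):

    #include <malloc.h>     /* glibc-specific: mallopt(), malloc_trim() */
    #include <stdlib.h>

    int main(void)
    {
        /* Release free memory at the top of the heap back to the OS once
         * more than 128 KB of it is unused. */
        mallopt(M_TRIM_THRESHOLD, 128 * 1024);

        /* ... allocate and free the ~60 MB of small chunks here ... */

        /* Ask for an immediate trim, keeping no extra pad; returns 1 if
         * memory was actually released, 0 otherwise. */
        if (malloc_trim(0) == 0) {
            /* nothing released; treat these calls only as hints */
        }
        return 0;
    }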
Some memory allocators don't mark the memory as free until it is needed; they instead let the CPU do other things and finalize the cleanup later. If you want to confirm that this is happening, just run a simple test: allocate and free memory in a loop enough times that the total exceeds the memory you have available.
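A crude version of that test might look like this (sizes and iteration counts are arbitrary):

    #include <stdlib.h>
    #include <string.h>

    /* Over its lifetime this loop "allocates" far more memory than most
     * machines have, so it can only succeed if freed blocks are actually
     * being reused.  The memset forces the pages to really be touched. */
    int main(void)
    {
        for (int i = 0; i < 10000; i++) {
            size_t size = 16 * 1024 * 1024;     /* 16 MB per iteration */
            char *p = malloc(size);
            if (p == NULL)
                return 1;
            memset(p, 0, size);
            free(p);
        }
        return 0;
    }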
I would like to cap the memory used by my application (developed in C). Say my application should not exceed 64 MB of memory. I would also like to keep CPU usage low. How is this possible?
Regards
Marcel.
Under Unix: ulimit -d 65536 (the limit is given in kilobytes, so this is 64 MB).
One fairly low-tech way to ensure you don't cross a maximum memory threshold in your application would be to define your own special malloc() function which keeps count of how much memory has been allocated and returns a NULL pointer if the threshold has been exceeded. This of course relies on you checking the return value of malloc() every time you call it, which is generally considered good practice anyway, because there is no guarantee that malloc() will find a contiguous block of memory of the size you requested.
This wouldn't be foolproof though, because it probably won't take into account memory padding for word alignment, so you'd probably end up reaching the 64MB memory limit long before your function reports that you have reached it.
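A minimal sketch of such a counting wrapper (the names are made up here; a real version would also have to subtract sizes again when blocks are freed, e.g. by storing the size in a small header in front of each block):

    #include <stdlib.h>

    /* Hypothetical counting wrapper: refuse allocations once a fixed budget
     * has been handed out.  Only the requested sizes are counted, not the
     * allocator's own padding/overhead. */
    #define MEMORY_BUDGET (64u * 1024 * 1024)   /* 64 MB */

    static size_t bytes_allocated = 0;

    void *budget_malloc(size_t size)
    {
        if (size > MEMORY_BUDGET - bytes_allocated)
            return NULL;                /* over budget: fail just like malloc */

        void *p = malloc(size);
        if (p != NULL)
            bytes_allocated += size;
        return p;
    }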
Also, assuming you are using Win32, there are probably APIs that you could use to get the current process size and check this within your custom malloc() function. Keep in mind that adding this checking overhead to your code will most likely cause it to use more CPU and run a lot slower than normal, which leads nicely into your next question :)
"I would also like to keep CPU usage low."
This is a very general question and there is no easy answer. You could write two different programs which essentially do the same thing, and one could be 100 times more CPU intensive than another one due to the algorithms that have been used. The best technique is to:
Set some performance benchmarks.
Write your program.
Measure to see whether it reaches your benchmarks.
If it doesn't reach your benchmarks, optimise and go to step (3).
You can use profiling programs to help you work out where your algorithms need to be optimised. Rational Quantify is an example of a commercial one, but there are many free profilers out there too.
If you are on POSIX, System V- or BSD-derived system, you can use setrlimit() with resource RLIMIT_DATA - similar to ulimit -d.
Also take a look at the RLIMIT_CPU resource - it's probably what you need (similar to ulimit -t).
Check man setrlimit for details.
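A sketch of both calls (the 64 MB and 60-second figures are just examples):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* Cap the data segment (heap) at 64 MB, similar to `ulimit -d`. */
        struct rlimit data_limit = { 64 * 1024 * 1024, 64 * 1024 * 1024 };
        if (setrlimit(RLIMIT_DATA, &data_limit) != 0)
            perror("setrlimit(RLIMIT_DATA)");

        /* Cap CPU time at 60 seconds, similar to `ulimit -t`; the process
         * receives SIGXCPU once the limit is exceeded. */
        struct rlimit cpu_limit = { 60, 60 };
        if (setrlimit(RLIMIT_CPU, &cpu_limit) != 0)
            perror("setrlimit(RLIMIT_CPU)");

        /* ... rest of the application ... */
        return 0;
    }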
For CPU, we've had a very low-priority task (lower than everything else) that does nothing but count. Then you can see how often that task gets to run, and you know whether the rest of your processes are consuming too much CPU. This approach doesn't work if you want to limit your process to 10% while other processes are running, but if you want to ensure that you have 50% CPU free then it works fine.
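A rough sketch of that idea on Linux (SCHED_IDLE is Linux-specific; on other systems you would use the lowest priority available, and the tick rate is only a rough measure):

    #define _GNU_SOURCE              /* for SCHED_IDLE */
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Lowest-priority "idle counter" thread: it only gets the CPU when
     * nothing else wants it, so the rate at which the counter grows shows
     * roughly how much CPU headroom is left. */
    static volatile unsigned long long ticks = 0;

    static void *idle_counter(void *arg)
    {
        (void)arg;
        for (;;)
            ticks++;
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        struct sched_param param;
        memset(&param, 0, sizeof param);    /* SCHED_IDLE requires priority 0 */

        pthread_create(&tid, NULL, idle_counter, NULL);
        int err = pthread_setschedparam(tid, SCHED_IDLE, &param);
        if (err != 0)
            fprintf(stderr, "pthread_setschedparam: %s\n", strerror(err));

        for (;;) {
            unsigned long long before = ticks;
            sleep(1);
            printf("idle ticks in the last second: %llu\n", ticks - before);
        }
    }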
For memory limitations you are either stuck implementing your own layer on top of malloc, or taking advantage of your OS in some way. On Unix systems ulimit is your friend. On VxWorks I bet you could probably figure out a way to take advantage of the task control block to see how much memory the application is using... if there isn't already a function for that. On Windows you could probably at least set up a monitor to report if your application does go over 64 MB.
The other question is: what do you do in response? Should your application crash if it exceeds 64 MB? Do you want this just as a guide to help you limit yourself? That might make the difference between choosing an "enforcing" approach versus a "monitor and report" approach.
Hmm; good question. I can see how you could do this for memory allocated off the heap, using a custom version of malloc and free, but I don't know about enforcing it on the stack too.
Managing the CPU is harder still...
Interesting.
I have written a program in C and wanted to see how much memory it uses. I noticed that the memory usage grows during normal use (at launch it uses about 250 KB and now it's at 1.5 MB). As far as I know, I freed all the unused memory, and after some hours the app uses less memory. Could it be that the freed memory just goes from 'active' memory to 'wired' or something similar, so that it's released when free space is needed?
BTW, my machine runs Mac OS X, if that's important.
How do you determine the memory usage? Have you tried using valgrind to locate potential memory leaks? It's really easy. Just start your application with valgrind, run it, and look at the well-structured output.
If you're looking at the memory usage from the OS, you are likely to see this behavior. Freed memory is not automatically returned to the OS, but normally stays with the process, and can be malloced later. What you see is usually the high-water mark of memory use.
As Konrad Rudolph suggested, use something that examines the memory from inside the process to look for memory leaks.
The C library does not usually return "small" allocations to the OS. Instead it keeps the memory around for the next time you use malloc.
However, many C libraries will release large blocks, so you could try doing a malloc of several megabytes and then freeing it.
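A quick way to see this, assuming a glibc-style allocator where requests above the mmap threshold (128 KB by default) are served by mmap() and really are handed back on free():

    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t size = 8 * 1024 * 1024;   /* several megabytes */
        char *big = malloc(size);
        if (big == NULL)
            return 1;
        memset(big, 0, size);            /* touch it so it is really mapped */
        free(big);                       /* large mmap()ed block is unmapped
                                            and returned to the OS here */
        return 0;
    }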
On OSX you should be able to use MallocDebug.app if you have installed the Developer Tools from OSX (as you might have trouble finding a port of valgrind for OSX).
/Developer/Applications/PerformanceTools/MallocDebug.app
I agree with what everyone has already said, but I do want to add just a few clarifying remarks specific to OS X:
First, the operating system actually allocates memory using vm_allocate, which allocates entire pages at a time. Because there is a cost associated with this, as others have stated, the C library does not simply deallocate the page when you return memory via free(3). Specifically, if there are other allocations within the memory page, it will not be released. Memory pages are currently 4096 bytes in Mac OS X. The number of bytes in a page can be determined programmatically with sysctl(2) or, more easily, with getpagesize(2). You can use this information to optimize your memory usage.
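For example:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/sysctl.h>

    int main(void)
    {
        /* Simplest way: getpagesize(). */
        printf("page size: %d bytes\n", getpagesize());

        /* The same value via sysctl(), as mentioned above. */
        int mib[2] = { CTL_HW, HW_PAGESIZE };
        int pagesize = 0;
        size_t len = sizeof pagesize;
        if (sysctl(mib, 2, &pagesize, &len, NULL, 0) == 0)
            printf("sysctl page size: %d bytes\n", pagesize);
        return 0;
    }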
Secondly, user-space applications do not wire memory. Generally the kernel wires memory for critical data structures. Wired memory is basically memory that can never be swapped out and will never generate a page fault. If, for some reason, a page fault is generated in a wired memory page, the kernel will panic and your computer will crash. If your application is increasing your computer's wired memory by a noticeable amount, it is a very bad sign. It generally means that your application is doing something that significantly grows kernel data structures, like allocating and not reaping hundreds of threads or child processes. (Of course, this is a general statement... in some cases, this growth is expected, like when developing a virtual host or something like that.)
In addition to what the others have already written:
malloc() allocates bigger chunks from the OS and hands them out in smaller pieces as you malloc() them. When free()ing, the piece first goes onto a free list for quick reuse by another malloc if the size fits. It may at this point be merged with another free item to form bigger free blocks, to avoid fragmentation (a whole bunch of different algorithms exist for this, from free lists to binary-sized fragments to hashing and more).
When freed pieces arrive so that multiple fragments can be joined, free() usually does this, but sometimes fragments remain, depending on the size and order of the malloc() and free() calls. Also, only when such a big free block has been created will it (sometimes) be returned to the OS. But usually malloc() keeps things in its pocket, depending on the allocated/free ratio (many heuristics exist, and compile-time or flag options are often available).
Note that there is not ONE malloc/free algorithm. There is a whole bunch of different implementations (and literature). It is highly system, OS and library dependent.
I am working on a parallel app (C, pthreads). I traced the system calls because at some point I was seeing poor parallel performance. My traces showed that my program calls mprotect() many, many times... enough to significantly slow down my program.
I do allocate a lot of memory (with malloc()) but there is only a reasonable number of calls to brk() to increase the heap size. So why so many calls to mprotect() ?!
Are you creating and destroying lots of threads?
Most pthread implementations will add a "guard page" when allocating a thread's stack. This is an access-protected memory page used to detect stack overflows. I'd expect at least one call to mprotect each time a thread is created or terminated, to (un)protect the guard page. If this is the case, there are several obvious strategies:
Set the guard page size to zero using pthread_attr_setguardsize() before creating threads.
Use a thread pool (of, say, as many threads as processors). Once a thread is done with a task, return it to the pool to get a new task rather than terminating it and creating a new thread.
Another explanation might be that you're on a platform where a thread's stack will be grown if overflow is detected. I don't think this is implemented on Linux with GCC/glibc as yet, but there have been some proposals along these lines recently. If you use a lot of stack space while processing, you might explicitly increase the initial/minimum stack size using pthread_attr_setstacksize.
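A minimal sketch combining the two attribute tweaks mentioned above (the 1 MB stack is just an example; note that a zero guard size also removes stack-overflow detection for that thread):

    #include <pthread.h>

    static void *worker(void *arg)
    {
        (void)arg;
        /* ... per-task work ... */
        return NULL;
    }

    int main(void)
    {
        pthread_attr_t attr;
        pthread_t tid;

        pthread_attr_init(&attr);
        pthread_attr_setguardsize(&attr, 0);             /* no guard page */
        pthread_attr_setstacksize(&attr, 1024 * 1024);   /* explicit 1 MB stack */

        pthread_create(&tid, &attr, worker, NULL);
        pthread_attr_destroy(&attr);
        pthread_join(tid, NULL);
        return 0;
    }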
Or it might be something else entirely!
If you can, run your program under a debug libc and break on mprotect(). Look at the call stack, see what your code is doing that's leading to the mprotect() calls.
glibc, whose malloc is based on ptmalloc2, uses mprotect() internally to manage the heaps of threads other than the main thread (for the main thread, sbrk() is used instead). When a heap area seems contended, malloc() first reserves a large chunk of memory with mmap() for the thread, then uses mprotect() to change the protection bits so that only the needed portion is accessible. Later, when it needs to grow the heap, it changes the protection to read/writable with mprotect() again. Those mprotect() calls are for heap growth and shrinkage in multithreaded applications.
http://www.blackhat.com/presentations/bh-usa-07/Ferguson/Whitepaper/bh-usa-07-ferguson-WP.pdf
explains this in a bit more detail.
The 'valgrind' suite has a tool called 'callgrind' that will tell you what is calling what. If you run the application under 'callgrind', you can then view the resulting profile data with 'kcachegrind' (it can analyze profiles made by 'cachegrind' or 'callgrind'). Then just double-click on 'mprotect' in the left pane and it will show you what code is calling it and how many times.