I want to know the maximum number of (recursive) function calls allowed in C with gcc. I have a program that can reach a stack depth of 400000 function calls, each using around 200 bytes (so around 80 MB in total). How can I increase the maximum depth?
The stack limit is not imposed by the compiler, but by the operating system. On Unix, you can try using ulimit(1) to increase it.
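On POSIX systems the limit that ulimit adjusts can also be inspected and raised from inside the program with getrlimit and setrlimit. The following is only a sketch: raise_stack_limit is a made-up helper name, and whether an already running process actually benefits from the raised soft limit is platform-dependent, so treat it as something to experiment with rather than a guaranteed fix.

```c
#include <sys/resource.h>

/* Hypothetical helper (not a library function): try to make the soft
 * stack limit at least `wanted` bytes.
 * Returns 0 on success, -1 on failure. */
int raise_stack_limit(rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    /* Already unlimited or large enough: nothing to do. */
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= wanted)
        return 0;

    /* An unprivileged process cannot go past the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
        return -1;

    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_STACK, &rl);
}
```

For the 80 MB case in the question, a call such as raise_stack_limit(128UL * 1024 * 1024) early in main would be the thing to try.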
I would recommend rewriting the routine as an iterative algorithm. The conversion takes some work but should be straightforward, and it frees you from having to deal with such resource limitations (which, I would guess, vary wildly by architecture, platform, machine configuration, etc.).
Also, please note: all recursive algorithms can be written iteratively.
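As an illustration of that last point, here is a sketch of the Ackermann function, a textbook example of deeply nested recursion, written with a loop and a heap-allocated array in place of the call stack. The name ackermann_iter and the error-code convention are my own choices, not taken from any library.

```c
#include <assert.h>
#include <stdlib.h>

/* Iterative Ackermann function: pending `m` values live in a growable
 * heap array instead of call-stack frames.
 * Returns 0 and stores the result in *out, or -1 if memory runs out. */
int ackermann_iter(unsigned long m, unsigned long n, unsigned long *out)
{
    size_t cap = 16, top = 0;
    unsigned long *stack;

    assert(out != NULL);
    stack = malloc(cap * sizeof *stack);
    if (stack == NULL)
        return -1;

    stack[top++] = m;
    while (top > 0) {
        m = stack[--top];

        /* Each case below pushes at most two entries. */
        if (top + 2 > cap) {
            unsigned long *grown = realloc(stack, 2 * cap * sizeof *grown);
            if (grown == NULL) {
                free(stack);
                return -1;
            }
            stack = grown;
            cap *= 2;
        }

        if (m == 0) {
            n = n + 1;              /* A(0, n) = n + 1 */
        } else if (n == 0) {
            stack[top++] = m - 1;   /* A(m, 0) = A(m - 1, 1) */
            n = 1;
        } else {
            stack[top++] = m - 1;   /* A(m, n) = A(m - 1, A(m, n - 1)) */
            stack[top++] = m;
            n = n - 1;
        }
    }

    free(stack);
    *out = n;
    return 0;
}
```

The memory is still needed, but it now comes from the heap, where a failed allocation can be detected and handled instead of crashing with a stack overflow.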
I need billions of random bytes from arc4random_buf, and my strategy is to request X random bytes at a time, and repeat this many times.
My question is how large should X be. Since the nbytes argument to arc4random_buf can be arbitrarily large, I suppose there must be some kind of internal loop that generates some entropy each time its body is executed. Say, if X is a multiple of the number of random bytes generated each iteration, the performance can be improved because I’m not wasting any entropy.
I’m on macOS, which is unfortunately closed-source, so I cannot simply read the source code. Is there any portable way to determine the optimal X?
Doing some benchmarks on typical target systems is probably the best way to figure this out, but looking at a couple of implementations, it seems unlikely that the buffer size will make much difference to the cost of arc4random_buf.
The original implementation provides arc4random_buf as a simple loop around a function which generates one byte. As long as the buffer is big enough to avoid excessive call overhead, it should make little difference.
The FreeBSD library implementation appears to attempt to optimise by periodically computing about 1K of random bytes. Then arc4random_buf uses memcpy to copy the bytes from the internal buffer to the user buffer.
For the FreeBSD implementation, the optimal buffer size would be the amount of data available in the internal buffer, because that minimizes the number of calls to memcpy. However, there's no way to know how much that is, and it will not be the same on every call because of the rekeying algorithm.
My guess is that you will find very little difference between buffer sizes above, say, 16K, and the threshold is probably even lower. For the FreeBSD implementation, it will be very slightly more efficient if your buffer size is a multiple of 8.
Addendum: All the implementations I know of have a global rekey threshold, so you cannot influence the cost of rekeying by changing the buffer size passed to arc4random_buf. The library simply rekeys after a fixed number of bytes has been generated.
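A benchmark along the lines suggested above could be sketched as follows. The names fill_fn, counter_fill and time_chunked_fill are hypothetical; counter_fill is a deliberately non-random stand-in with the same shape as arc4random_buf, so the sketch builds on systems that lack the real function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Same shape as arc4random_buf(void *buf, size_t nbytes). */
typedef void (*fill_fn)(void *buf, size_t nbytes);

/* Stand-in generator so the sketch builds anywhere. NOT random. */
void counter_fill(void *buf, size_t nbytes)
{
    static unsigned char next;
    unsigned char *p = buf;
    size_t i;

    for (i = 0; i < nbytes; ++i)
        p[i] = next++;
}

/* Request `total` bytes from `gen`, `chunk` bytes at a time, and return
 * the CPU time spent in seconds, or -1.0 if allocation fails. */
double time_chunked_fill(fill_fn gen, size_t total, size_t chunk)
{
    unsigned char *buf;
    volatile unsigned char sink = 0;   /* keeps the work observable */
    clock_t start, end;
    size_t done = 0;

    assert(gen != NULL && chunk > 0);
    buf = malloc(chunk);
    if (buf == NULL)
        return -1.0;

    start = clock();
    while (done < total) {
        size_t n = (total - done < chunk) ? total - done : chunk;
        gen(buf, n);
        sink ^= buf[n - 1];
        done += n;
    }
    end = clock();

    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

On macOS or a BSD you would pass arc4random_buf itself as gen and compare a few chunk sizes (say 256, 4096 and 65536 bytes) for the same total.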
I am doing a project where I need to port code from PLM51 to C.
The 8051 architecture is being used. The microcontroller is ROM-less and uses 64 KB of external memory. The PLM51 code size is almost 63 KB.
So my question is: when I port my code from PLM51 to C, will the code size increase or decrease?
What are the parameters which will decide the increase/decrease in size?
To start out I must say that while I have written in both languages, I have not done a port from PL/M to C or compared the sizes of similar programs written in the two languages.
This question is very difficult to answer with any degree of certainty, but the two languages are fairly similar in level, both being fairly low-level portable languages. I seem to remember our rule of thumb for PL/M was an average of around 5 assembler instructions per PL/M statement. This efficiency will vary between compilers and optimisation levels.
One factor that may have a large impact on the code size of the final image is the external libraries that may be included by the linker. A particular culprit is the printf formatter, which is typically quite large. In PL/M you would normally write your own character output functions tailored to your specific needs, often resulting in smaller code.
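To make the printf point concrete, here is a sketch of the kind of tailored formatter one might write in C instead of pulling in printf. The name format_u16 is made up for illustration, and whether this really ends up smaller on a given 8051 toolchain is something to confirm from the linker map.

```c
#include <assert.h>
#include <stddef.h>

/* Write `value` (0..65535) as decimal digits into `out`, which must
 * hold at least 6 bytes, and NUL-terminate it.
 * Returns the number of digits written. */
size_t format_u16(unsigned int value, char *out)
{
    char tmp[5];
    size_t n = 0, i;

    assert(out != NULL);
    assert(value <= 65535u);

    /* Produce the digits least-significant first. */
    do {
        tmp[n++] = (char)('0' + value % 10u);
        value /= 10u;
    } while (value != 0);

    /* Reverse them into the caller's buffer. */
    for (i = 0; i < n; ++i)
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```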
And is there a way to easily monitor your stack depth in a linux environment?
Consider the case of a basic app in C, compiled with gcc, in Ubuntu.
How about if you do NOT allow dynamic memory allocation (no malloc/free-ing)?
Can you know your max stack depth after you compile?
No. Consider a recursive function that might call itself any number of times, depending on input. You can't know how many times the function might be invoked, one inside the last, without knowing what the input to the program is.
I expect that it might be possible to determine the max stack depth for some programs, but you can't determine that for all programs.
And is there a way to easily monitor your stack depth in a linux environment?
I don't know about an easy way to monitor the stack depth continuously, but you can determine the stack depth at any point using gdb's -stack-info-depth command (part of its GDB/MI interface).
How about if you do NOT allow dynamic memory allocation (no malloc/free-ing)?
That doesn't make a difference. Consider a recursive Fibonacci function -- that wouldn't need to allocate any memory dynamically, but the number of stack frames will still vary depending on input.
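A small sketch makes the Fibonacci point visible. The counters cur_depth and max_depth and the wrapper fib_max_depth are instrumentation I have added for illustration; they count nested calls, not bytes of stack.

```c
static unsigned cur_depth;   /* fib() frames currently active */
static unsigned max_depth;   /* deepest nesting seen so far */

/* Naive recursive Fibonacci with no dynamic allocation at all. */
unsigned long fib(unsigned n)
{
    unsigned long r;

    if (++cur_depth > max_depth)
        max_depth = cur_depth;
    r = (n < 2) ? n : fib(n - 1) + fib(n - 2);
    --cur_depth;
    return r;
}

/* Compute fib(n), store it in *value (if non-NULL), and return the
 * maximum number of simultaneously active fib() frames. */
unsigned fib_max_depth(unsigned n, unsigned long *value)
{
    unsigned long v;

    cur_depth = 0;
    max_depth = 0;
    v = fib(n);
    if (value)
        *value = v;
    return max_depth;
}
```

The maximum nesting grows with n even though nothing is allocated on the heap, so the depth cannot be read off the compiled program alone when n comes from input.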
It is possible to do a call-graph analysis. The longest path in the graph is the maximum stack depth. However, there are limitations.
With recursive functions, all bets are off: the depth of recursion depends on the run-time input, so it cannot be deduced by compile-time analysis. [It is possible to detect the presence of a recursive function by analyzing the call graph and looking for self edges, i.e. edges with the same source and destination.]
Furthermore, the same issue is present if there are loops/cycles in the call graph. [As mentioned by @Caleb: a()->b()->c()->d()->e()->f()->g()->c()] [Using algorithms from graph theory, it is possible to detect the presence of cycles as well.]
References for call graph:
http://en.wikipedia.org/wiki/Call_graph
Tools to get a pictorial function call graph of code
Preferable for x86-32 gcc implementation
Considering that modern C compilers optimize like crazy, I think you'll find timings to be very situationally dependent. What would be a slow operation in one situation might be optimized away to a faster operation in another, or the compiler might be able to use a faster 8- or 16-bit version of the same instruction, etc.
It depends on the particular case, but this is likely to vary substantially based on the platform, hardware, operating system, function, and function inputs. A general answer is "no." It also depends on what you mean by "time;" there is execution time and clock time, among other things.
The best way to determine how long something will take is to run it as best you can. If performance is an issue, profiling and perfecting will be your best bet.
Certain real-time systems place constraints on how long operations will take, but this is not specific to C.
I don't think such a thing is really possible when you consider how much the time for the same function differs with different arguments. For example, assuming the function costOf did what you wanted, which costs more, memcpy or printf? Either can, depending on the arguments:
costOf(printf("Hello World")) > costOf(memcpy(a, b, 4))
costOf(printf("Hello World")) < costOf(memcpy(a, b, 4 * 1024 * 1024 * 1024))
IMHO, this is a micro optimization, which should be disregarded until all profiling has been performed. In general, library routines are not the consumers of execution time, but rather resources or programmer created functions.
I also suggest spending more time on a program's quality, and robustness rather than worrying about micro optimizations. With computing power increasing and memory sizes increasing, size and execution times are less of a problem to customers than quality and robustness. A customer is willing to wait for a program that produces correct output (or performs all requirements correctly) and doesn't crash rather than demanding a fast program that has errors or crashes the system.
To answer your question, as others have stated, the execution times of library functions depend upon the library developer, the platform (hardware) and the operating system. Some platforms can execute floating point instructions faster or in equal time to integral operations. Some libraries will delegate function to the operating system, while others will package their own. Some functions are slower because they are written to work on a variety of platforms, while the same functions in other libraries can be faster because they are tailored to the specific platform.
Use the library functions that you need and don't worry about their speed. Use 3rd party tested libraries rather than rewriting your own code. If the program is executing very slowly, review the design and profile. Perhaps you can gain more speed by using Data Oriented Design rather than Object Oriented Design or procedural programming. Again, concentrate your efforts on developing quality and robust code while learning how to produce software more efficiently.
I have two questions related to memory. First some background. I am a novice-intermediate C programmer.
I have written several different tree-like data structures with a variable number of nodes at each level. One such structure can have as its data a number of integer variables, which themselves are primary data for integer trees. I have written recursive functions for generating trees with random numbers of nodes at different levels. I pass pointers to randomly generated integer trees as parameters for generating the main data structure.
I have also written recursive code for operating on these tree structures, such as printing the tree. Just for my learning, I created a queue and a stack for my nodes and wrote iterative functions for in-order, pre-order and post-order printing of the tree. I think I am beginning to get the hang of it.
Now the question.
(a) I need to write other functions, which are obviously easy and clean if written using pure recursion. I can see how it could be written iteratively. It is not difficult, just tedious. The maximum depth of my trees will be 3-5, however, the number of nodes at each level is large. It is my understanding, that every recursive call will store addresses on a stack. If the depth is large, it can run out of memory. But if the depth is shallow, the penalty (memory/speed) of using a recursive function may not be terrible.
Do people have recommendations on criteria for deciding whether an iterative or a recursive solution is preferable? I have read various threads on the site about iterative solutions, but could not find anything that directly speaks to this issue.
(b) The second question relates to requesting memory from the system. I know that some applications can request a certain amount of memory. I am using mingw-gcc4.x with the Netbeans IDE. How can I specify the maximum amount of memory that the program can use in debug / release mode? Or does it depend solely on the available RAM, with no explicit specification necessary?
Thanks in advance,
paras
~RT
"The maximum depth of my trees will be 3-5"
This depth of recursion will not challenge the default stack size of any version of Windows, or any other system you'll ever see that doesn't have "Watch out! The stack is tiny!" plastered all over it. Most programs go a lot more than 3-5 calls deep, without involving any recursion at all.
So as long as your recursion only goes "down" your tree, not "across" its breadth, you're fine. Unless of course you're doing something really unusual like sticking an enormous array on the stack as a local variable.
Am I right that your (post-order) recursion looks something like this?
    void doNode(struct node *n) {
        for (int i = 0; i < n->num_nodes; ++i) {
            doNode(n->nodes[i]);
        }
        // do some work on this node
    }
If so, then for 3-5 levels the only way it'll use a lot of stack is if it looks like this:
    void doNode(struct node *n) {
        int myRidiculousArray[100*1000] = { 0 };
        for (int i = 0; i < n->num_nodes; ++i) {
            doNode(n->nodes[i]);
        }
        // do some work on this node, using myRidiculousArray
    }
If you have millions of nodes, then there may be some performance gain to be had from avoiding the function call overhead per node. But write it the easy way first, and then come back and look at it later if you're ever desperate for a bit more performance. It's fairly rare for the overhead of a function call per node to be the reason your code is slow - it does happen, but usually only after you've fixed a bunch of other, even worse, slowdowns.
If you can write your function using tail recursion and you compile with optimization enabled, the compiler can usually turn the recursive call into a jump, so you won't run into problems with stack or memory space. Note that C does not guarantee this optimization.
In the end, you need to write your functions so that you can understand them, so do whatever is easier for you.
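To show what tail recursion means in practice, here is a sketch that sums a linked list in both styles. The struct list type and both function names are made up for illustration. GCC typically performs this transformation at -O2 (via -foptimize-sibling-calls), but other compilers and settings may not.

```c
#include <stddef.h>

struct list {
    int value;
    const struct list *next;
};

/* Plain recursion: the addition happens after the recursive call
 * returns, so every element needs its own stack frame. */
long sum_plain(const struct list *l)
{
    if (l == NULL)
        return 0;
    return l->value + sum_plain(l->next);
}

/* Tail recursion: the running total travels in `acc` and the recursive
 * call is the very last action, so an optimizing compiler may reuse
 * the current frame instead of creating a new one. */
long sum_tail(const struct list *l, long acc)
{
    if (l == NULL)
        return acc;
    return sum_tail(l->next, acc + l->value);
}
```

A walk over a tree with several children per node is not tail-recursive as a whole, since only the last child's call can be in tail position.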
Even an iterative implementation is a recursive algorithm if you're using a stack to store nodes; both use O(f) space, where "f" is a function that's "more" than a constant (c is O(f) but f is not O(1)). You might still wind up using less memory with the iterative version if the elements of your stack are smaller than a call-stack frame. If so, you can look into reducing the size of a call stack by using closures, assuming the language supports them.
Iterative algorithms will have O(1) space requirements. Even a recursive implementation can achieve this using tail calls, as Dashogun mentions.
Spend a little time trying to find an iterative algorithm. If you can't find one, I recommend going with the recursive implementation unless you know for certain that you need to handle a recursive structure that (these days) has a depth of at least 2^13. For a binary tree, that's 2^(2^13) nodes, which I very much doubt you'll see.
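The point about stack elements being smaller than call frames can be sketched like this: an iterative walk over an n-ary tree whose explicit stack holds one pointer per pending node. The num_nodes and nodes fields follow the doNode example from the other answer; the value field and the name sum_tree_iter are my own assumptions.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int value;             /* assumed payload, for illustration only */
    int num_nodes;
    struct node **nodes;
};

/* Sum every node's value without recursion.
 * Returns 0 and stores the total in *sum, or -1 if memory runs out. */
int sum_tree_iter(struct node *root, long *sum)
{
    size_t cap = 64, top = 0;
    struct node **stack;
    long total = 0;
    int i;

    assert(sum != NULL);
    stack = malloc(cap * sizeof *stack);
    if (stack == NULL)
        return -1;
    if (root != NULL)
        stack[top++] = root;

    while (top > 0) {
        struct node *n = stack[--top];
        size_t need = top + (size_t)n->num_nodes;

        total += n->value;

        /* Make room for all children of this node. */
        if (need > cap) {
            struct node **grown;
            size_t want = cap;
            while (want < need)
                want *= 2;
            grown = realloc(stack, want * sizeof *grown);
            if (grown == NULL) {
                free(stack);
                return -1;
            }
            stack = grown;
            cap = want;
        }

        /* Children are visited last-pushed first; fine for a sum. */
        for (i = 0; i < n->num_nodes; ++i)
            stack[top++] = n->nodes[i];
    }

    free(stack);
    *sum = total;
    return 0;
}
```

Note that with wide trees the explicit stack grows with the breadth as well as the depth, since all children of a node are pushed at once.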
(a) Recursion is not bad in itself. However, if writing the iterative algorithm is close in complexity, you should use the iterative one. Before committing to a recursive algorithm, some prerequisites apply:
- You should make sure that the recursion depth (and the local variables in the re-entrant functions) will not make you exceed the stack size. For the depth you mentioned, on Windows this would be a problem in very few cases. Additionally, you can add a safety check on the height of the tree.
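Such a safety check could look roughly like the sketch below. The function name, the max_depth parameter and the convention of returning -1 are assumptions for illustration; the node layout follows the doNode example earlier in the thread.

```c
#include <stddef.h>

struct node {
    int num_nodes;
    struct node **nodes;
};

/* Count the nodes of the tree rooted at `n`, refusing to descend
 * below `max_depth` levels (the root is level 1).
 * Returns the node count, or -1 if the tree is deeper than allowed. */
long count_nodes_guarded(const struct node *n, int depth, int max_depth)
{
    long total = 1;
    int i;

    if (n == NULL)
        return 0;
    if (depth > max_depth)
        return -1;              /* deeper than expected: bail out */

    for (i = 0; i < n->num_nodes; ++i) {
        long sub = count_nodes_guarded(n->nodes[i], depth + 1, max_depth);
        if (sub < 0)
            return -1;          /* propagate the failure upward */
        total += sub;
    }
    return total;
}
```

A call such as count_nodes_guarded(root, 1, 5) would match the 3-5 levels mentioned in the question and fails cleanly if a malformed tree turns out to be deeper.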
(b) If you are asking about the stack size: I see you use mingw, so you probably build for Windows. The stack size in Windows is per thread. Have a look here for how to set up your reserved and initially committed stack size.
If you are asking about heap memory allocation, have a look here. But the short story is that you can use all the memory the system can provide for heap allocations.