Occasionally after compiling, computer locks up

This is a strange issue, but it happens between one and five times a month.
During development, I compile frequently (this is not the unusual part). From time to time, running the freshly compiled binary locks up my system: the tray clock doesn't increment, Ctrl+Alt+Backspace doesn't kill Xorg. Totally conked.
I physically power-cycle the machine and everything's OK. The application runs fine, whether from the same binary that murdered my machine earlier or after a no-change recompile, and I get on with my work.
But it still bothers me, largely because I have no idea what causes it. This can occur with binaries compiled with either Clang or GCC. What is going on?

Hard to say, but I have two ideas:
1) Bad RAM
This is possible, but depending on your code, #2 might be more likely.
2) Buffer overflow bug
If you are overwriting memory due to a bug in your code, you could be putting bits into memory that happen to be valid machine instructions as well. I would look very carefully at your code for places where you write to an array without checking its length first.
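A hedged sketch of the kind of bug meant in #2 (the function and buffer here are invented for illustration): the destination size is never checked, so a long input writes past the end of the array and corrupts whatever happens to sit next to it in memory.

#include <stdio.h>
#include <string.h>

static void store_name(const char *input)
{
    char buf[16];
    strcpy(buf, input);   /* no length check: overflows for inputs of 16+ characters */
    printf("stored: %s\n", buf);
}

int main(void)
{
    store_name("this string is far longer than sixteen bytes");   /* undefined behavior */
    return 0;
}

Replacing the strcpy with a bounded copy, e.g. snprintf(buf, sizeof buf, "%s", input), is one way to rule this class of bug out.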

Related

GCC: Why not run optimization all the time?

I wrote the well-known swap function in C and looked at the assembly output using gcc -S, then did the same again with -O2 optimization.
The difference was pretty big: I saw only 5 lines of assembly compared to 20.
My question is: if optimization really helps, why isn't it used all the time? Why do we ever compile code without optimization?
An extra question for those working in the industry: when you release the final version of your program, after testing it, do you compile with optimizations on?
I am responding to all your comments, please read them.
There are a few reasons.
1. Compilation takes longer
For small and even medium-sized projects, this is rarely an issue today. Modern computers are VERY fast. If compilation takes five or ten seconds, it usually does not matter. But for larger projects it does, especially if the build process is not set up properly. I remember when I was trying to add a feature to the game The Battle for Wesnoth: compilation took around ten minutes. It's easy to see why you would want to reduce that to five minutes or less if you could.
2. Optimized code is harder to debug
It makes code harder to debug because the debugger does not really run the program line by line; that's just an illusion. Here is an example where it can be a problem:
#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(void) {
    char str[] = "Hello, World!";
    int number_of_capital_letters = 0;
    for(int i = 0; i < strlen(str); i++) {
        if(isupper(str[i]))
            number_of_capital_letters++;
    }
    printf("%s\n", str);
    // Commented out for debugging reasons
    // printf("%d\n", number_of_capital_letters);
}
You fire up your debugger and wonder why it does not keep track of number_of_capital_letters. Then you find out that since you have commented out the last printf statement, the variable is not used for any observable behavior, so the optimizer changes your code to:
int main(void) {
    puts("Hello, World!");
}
One could argue that you could just turn off the optimizer for a debug build. And that's true in a world where a cow is a sphere. But there is a third reason:
3. Sometimes bugs only show up at higher optimization levels.
Imagine that you have a big code base. When you upgrade the compiler, a bug suddenly emerges, and it seems to vanish when you remove optimization. What's the problem here? Well, it could be a bug in the optimizer. But it could also be a bug in your code that manifested itself with the new version of the optimizer. Very often, code with undefined behavior behaves differently when compiled with optimization.
So what do you do? You could try to figure out whether the bug is in the optimizer or in your code. That can be a VERY time-consuming task. Let's assume it's a bug in the optimizer. What to do? You could downgrade your compiler, which is not optimal for several reasons, especially if it's an open-source project. Imagine downloading the source, running the build script, scratching your head for hours trying to figure out what's wrong, and then finding in some documentation (provided the author documented it) that you need a specific version of a specific compiler.
Let's instead assume it's a bug in your code. The ideal thing is of course to fix it, but maybe you don't have the resources to do so. In this case too, you could require anyone who compiles it to use a certain version of a specific compiler.
But if you can just edit a Makefile and replace -O3 with -O2, you can see that it's sometimes a viable option in our non-ideal world where time is not an endless resource. With a bit of bad luck, such a bug can take a week or more to track down. That's time you could spend elsewhere.
Here is an example of such a bug:
#include <stdio.h>

int main(void) {
    char str[] = "Hello";
    str[5] = '!';   // writes past the end of str, overwriting the terminating '\0' (undefined behavior)
    puts(str);
}
When I compiled this with gcc 10.2 I got different results depending on optimization level.
Without optimization:
Hello!
With optimization:
Hello!`#
Try it out yourself:
https://godbolt.org/z/5dcKKrEW1
https://godbolt.org/z/48bz5ae1d
And here I found a forum thread where the debug build works but not release: https://developer.apple.com/forums/thread/15112
4. Sometimes bugs only show up at LOWER optimization levels.
Yep, that may also happen. In this case, you could just increase the optimization level if you don't care that much about correctness. But if you do care, this can be a way to find bugs: if your code runs correctly both with and without optimization, it's less likely to contain bugs that will haunt you in the future than if you have only ever compiled it with optimization.
I did not find an example that works in practice, but this might do in theory:
#include <stdio.h>

int main(void) {
    if(1/0) // Division by zero
        puts("An error has occurred");
    else
        puts("Everything is fine");
}
If this is compiled without optimization, there is a high probability that it will crash. But the optimizer may assume that undefined behavior (like division by zero) never occurs, so it optimizes the code to just:
int main(void) {
    puts("Everything is fine");
}
Assume that 1/0 stands in for some kind of error check that is very unlikely to be true, so you would normally expect the program to print "Everything is fine". Here, the optimizer hides a bug.
5. The optimizer might produce a binary that's bigger in size, or is using more memory. Or something else that's not desirable.
This sometimes matters, especially in embedded systems. -O0 usually produces very big code, and you might want to use -Os (optimize for size instead of speed) rather than -O3 to get a small binary, and sometimes also faster code. See below.
6. The optimizer might produce slower code
Yep, really. It does not happen often, but it may. A related but not equivalent example is illustrated in this question, where the compiler generates faster code when optimizing for executable size than for speed.
If you never use a source-level debugger, you could probably leave optimization on all the time. But if you never use a source-level debugger, you probably should start.
Unoptimized code has a direct one-to-one correspondence to statements, expressions and variables in the source code, so when stepping through the code it all makes sense - all the lines are executed in the order you would expect, and all variables have a valid state when you would expect them to.
Optimised code, on the other hand, can eliminate code and variables and reorder execution, generally rendering source-level debugging nonsensical. Sometimes you get a bug that only appears in an optimised build, so you may have to deal with it, but such things are usually a result of undefined behaviour, and it is better to avoid that in the first place.
One thing to consider is that in development you have performed all your testing and debugging on unoptimised code, so that you could debug it. If, on the day you release it, you crank up the optimiser and ship it, you are essentially shipping a whole lot of untested code. Testing is hard, and you really should test what you release, so between building and releasing you may have a lot of work to do to eliminate the risk. Releasing to the same build spec that you have been testing every day throughout development may be lower risk.
For code running on a desktop, responding to and waiting for user input, or which is disk- or network-I/O bound, making the code faster or smaller often serves little purpose. There may be specific parts of a large application that will benefit, such as sorting or searching algorithms on large data sets, or image or audio processing, and for those you might use targeted rather than whole-application optimisation.
In embedded systems, where you are often using processors much slower than desktop systems with much smaller memory resources, optimisation for both speed and size may be critical, but even there the code normally has to both fit and meet real-time deadlines in its debug build in order to support test and debugging. If it only works optimised, it will be much harder to debug.
Apart from optimising your code, it should perhaps be noted that in order to do that job, the optimiser has to perform a much deeper analysis of the code, through techniques such as abstract execution, and in doing so it can find bugs and issue warnings that normal compilation will not detect. For example, the optimiser is rather good at detecting variables that may be used before they are initialised. To that end, I would recommend switching on and max'ing the optimiser as a kind of "poor man's" static analysis, even if you use a lower optimisation level for release - for the reasons given earlier.
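As a hedged illustration of that point (exact behaviour varies with the GCC version, and the function and values below are invented): compiled with gcc -Wall -O2, the uninitialised use is typically reported by -Wmaybe-uninitialized or -Wuninitialized, while -Wall alone at -O0 often stays silent, because the warning relies on the data-flow analysis the optimiser performs.

#include <stdio.h>

static int lookup(int key)
{
    int value;                 /* not initialised on every path */
    if (key > 0)
        value = key * 2;
    return value;              /* used uninitialised when key <= 0 */
}

int main(void)
{
    printf("%d\n", lookup(-1));
    return 0;
}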
The optimiser is also the most complex part of any compiler; if the compiler is going to have a bug, it is likely to be in the optimiser. That said I have only ever encountered one such confirmed bug, in Microsoft C v6.0 1989! More often what at first appears to be a compiler bug turns out to be undefined behaviour or latent bugs in the source being compiled that manifest themselves with different code generation options.
Personally I usually have optimisation turned on.
My reasons are:
The shipped code is built with optimisation as we need the -- especially numerical -- performance. Since you can't ship what you haven't tested, the test version must also be optimised. It would, I suppose, be possible to build without optimisation during development, but I begrudge the extra time to then build with optimisation, and test, prior to release to test. Moreover, performance is sometimes part of the spec, so some development testing has to be done with optimised code.
I don't find using a debugger so very tough with optimised code. Mind you, given the kind of programs I mostly write -- fancy filters without user interfaces and numerical libraries -- printf and valgrind (which works fine with optimised code) are my preferred tools.
In recent versions of gcc, at least, more and better diagnostics are produced with optimisation on rather than off.
This, like so much else in programming, will of course vary with circumstances.
One reason is probably just: tradition. The first C compiler was written for the DEC PDP-11, which had a 64k address space. (That's right, a tenth of that famous but mythical old IBM PC quote about "640k should be enough for anybody".) The first C compiler ran as quite a number of separate programs or passes: there was the preprocessor cpp, the parser c0, the code generator c1, the assembler as, and the linker ld. If you asked for optimization, it ran as a separate pass c2 which was a "peephole optimizer" operating on c1's output, before passing it to as.
Compilation was much slower in those days than it is today (because of course the processors were much slower). People didn't routinely request optimization for everyday work, because it really did cost you something significant in your edit/compile/debug cycle.
And although a whole lot has changed since then, the fact that optimization is something extra, something special, that you have to request explicitly, lives on.

How can I output a sequence as post order? [duplicate]

This is intended to be a general-purpose question to assist new programmers who have a problem with a program, but who do not know how to use a debugger to diagnose the cause of the problem.
This question covers three classes of more specific question:
When I run my program, it does not produce the output I expect for the input I gave it.
When I run my program, it crashes and gives me a stack trace. I have examined the stack trace, but I still do not know the cause of the problem because the stack trace does not provide me with enough information.
When I run my program, it crashes because of a segmentation fault (SEGV).
A debugger is a program that can examine the state of your program while your program is running. The technical means it uses for doing this are not necessary for understanding the basics of using a debugger. You can use a debugger to halt the execution of your program when it reaches a particular place in your code, and then examine the values of the variables in the program. You can use a debugger to run your program very slowly, one line of code at a time (called single stepping), while you examine the values of its variables.
Using a debugger is an expected basic skill
A debugger is a very powerful tool for helping diagnose problems with programs. And debuggers are available for all practical programming languages. Therefore, being able to use a debugger is considered a basic skill of any professional or enthusiast programmer. And using a debugger yourself is considered basic work you should do yourself before asking others for help. As this site is for professional and enthusiast programmers, and not a help desk or mentoring site, if you have a question about a problem with a specific program, but have not used a debugger, your question is very likely to be closed and downvoted. If you persist with questions like that, you will eventually be blocked from posting more.
How a debugger can help you
By using a debugger you can discover whether a variable has the wrong value, and where in your program its value changed to the wrong value.
Using single stepping, you can also discover whether the control flow is as you expect. For example, whether an if branch is executed when you expect it to be.
General notes on using a debugger
The specifics of using a debugger depend on the debugger and, to a lesser degree, the programming language you are using.
You can attach a debugger to a process that is already running your program. You might do this if your program is stuck.
In practice it is often easier to run your program under the control of a debugger from the very start.
You indicate where your program should stop executing by indicating the source code file and line number of the line at which execution should stop, or by indicating the name of the method/function at which the program should stop (if you want to stop as soon as execution enters the method). The technical means that the debugger uses to cause your program to stop is called a breakpoint and this process is called setting a breakpoint.
Most modern debuggers are part of an IDE and provide you with a convenient GUI for examining the source code and variables of your program, with a point-and-click interface for setting breakpoints, running your program, and single stepping it.
Using a debugger can be very difficult unless your program executable or bytecode files include debugging symbol information and cross-references to your source code. You might have to compile (or recompile) your program slightly differently to ensure that information is present. If the compiler performs extensive optimizations, those cross-references can become confusing. You might therefore have to recompile your program with optimizations turned off.
I want to add that a debugger isn't always the perfect solution, and shouldn't always be the go-to solution to debugging. Here are a few cases where a debugger might not work for you:
The part of your program which fails is really large (poor modularization, perhaps?) and you're not exactly sure where to start stepping through the code. Stepping through all of it might be too time-consuming.
Your program uses a lot of callbacks and other non-linear flow control methods, which makes the debugger confused when you step through it.
Your program is multi-threaded. Or even worse, your problem is caused by a race condition.
The code that has the bug in it runs many times before it bugs out. This can be particularly problematic in main loops, or worse yet, in physics engines, where the problem could be numerical. Even setting a breakpoint, in this case, would simply have you hitting it many times, with the bug not appearing.
Your program must run in real-time. This is a big issue for programs that connect to the network. If you set up a breakpoint in your network code, the other end isn't going to wait for you to step through, it's simply going to time out. Programs that rely on the system clock, e.g. games with frameskip, aren't much better off either.
Your program performs some form of destructive actions, like writing to files or sending e-mails, and you'd like to limit the number of times you need to run through it.
You can tell that your bug is caused by incorrect values arriving at function X, but you don't know where these values come from. Having to run through the program, again and again, setting breakpoints farther and farther back, can be a huge hassle. Especially if function X is called from many places throughout the program.
In all of these cases, either having your program stop abruptly could cause the end results to differ, or stepping through manually in search of the one line where the bug is caused is too much of a hassle. This can equally happen whether your bug is incorrect behavior, or a crash. For instance, if memory corruption causes a crash, by the time the crash happens, it's too far from where the memory corruption first occurred, and no useful information is left.
So, what are the alternatives?
The simplest is logging and assertions. Add logs to your program at various points, and compare what you get with what you're expecting. For instance, see if the function where you think there's a bug is even called in the first place. See if the variables at the start of a method are what you think they are. Unlike breakpoints, it's okay for there to be many log lines in which nothing special happens. You can simply search through the log afterward. Once you hit a log line that's different from what you're expecting, add more in the same area. Narrow it down farther and farther, until it's small enough to be able to log every line in the bugged area.
Assertions can be used to trap incorrect values as they occur, rather than once they have an effect visible to the end-user. The quicker you catch an incorrect value, the closer you are to the line that produced it.
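A minimal sketch of the logging-plus-assertions approach; the function and the invariants here are invented for illustration:

#include <assert.h>
#include <stdio.h>

static double average(const double *values, int count)
{
    assert(values != NULL);          /* trap the bad value where it occurs... */
    assert(count > 0);               /* ...not where it finally blows up      */

    double sum = 0.0;
    for (int i = 0; i < count; i++)
        sum += values[i];

    fprintf(stderr, "average: count=%d sum=%f\n", count, sum);   /* log line to compare against expectations */
    return sum / count;
}

int main(void)
{
    double data[] = { 1.0, 2.0, 3.0 };
    printf("%f\n", average(data, 3));
    return 0;
}

Assertions compile away when NDEBUG is defined, so they cost nothing in a release build once the bug hunt is over.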
Refactor and unit test. If your program is too big, it might be worthwhile to test it one class or one function at a time. Give it inputs, and look at the outputs, and see which are not as you're expecting. Being able to narrow down a bug from an entire program to a single function can make a huge difference in debugging time.
In case of memory leaks or memory stomping, use appropriate tools that are able to analyze and detect these at runtime. Being able to detect where the actual corruption occurs is the first step. After this, you can use logs to work your way back to where incorrect values were introduced.
Remember that debugging is a process going backward. You have the end result - a bug - and find the cause, which preceded it. It's about working your way backward and, unfortunately, debuggers only step forwards. This is where good logging and postmortem analysis can give you much better results.

Profiling a Single Function Predictably

I need a better way of profiling numerical code. Assume that I'm using GCC in Cygwin on 64 bit x86 and that I'm not going to purchase a commercial tool.
The situation is this. I have a single function running in one thread. There are no code dependencies or I/O beyond memory accesses, with the possible exception of some math libraries linked in. But for the most part, it's all table look-ups, index calculations, and numerical processing. I've cache aligned all arrays on the heap and stack. Due to the complexity of the algorithm(s), loop unrolling, and long macros, the assembly listing can become quite lengthy -- thousands of instructions.
I have been resorting to using either the tic/toc timer in Matlab, the time utility in the bash shell, or the time stamp counter (rdtsc) directly around the function. The problem is this: the variance of the timing (which might be as much as 20% of the runtime) is larger than the size of the improvements I'm making, so I have no way of knowing whether the code is better or worse after a change. You might think it's then time to give up. But I would disagree. If you are persistent, many incremental improvements can lead to a two or three times performance increase.
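For concreteness, a minimal sketch of that kind of rdtsc timing, with the kernel replaced by a stand-in and an arbitrary repetition count, keeping the minimum to damp some of the scheduling noise:

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() with GCC on x86 */

static void kernel_under_test(void)   /* stand-in for the real routine */
{
    volatile double x = 0.0;
    for (int i = 0; i < 100000; i++)
        x += i * 0.5;
}

int main(void)
{
    uint64_t best = UINT64_MAX;
    for (int run = 0; run < 101; run++) {
        uint64_t start = __rdtsc();
        kernel_under_test();
        uint64_t cycles = __rdtsc() - start;
        if (run > 0 && cycles < best)   /* skip the cold-cache first run */
            best = cycles;
    }
    printf("best: %llu cycles\n", (unsigned long long)best);
    return 0;
}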
One problem I have had multiple times that is particularly maddening is that I make a change and the performance seems to improve consistently by say 20%. The next day, the gain is lost. Now it's possible I made what I thought was an innocuous change to the code and then completely forgot about it. But I'm wondering if it's possible something else is going on. Like maybe GCC doesn't yield a 100% deterministic output as I believe it does. Or maybe it's something simpler, like the OS moved my process to a busier core.
I have considered the following, but I don't know if any of these ideas are feasible or make any sense. If yes, I would like explicit instructions on how to implement a solution. The goal is to minimize the variance of the runtime so I can meaningfully compare different versions of optimized code.
Dedicate a core of my processor to run only my routine.
Direct control over the cache(s) (load it up or clear it out).
Ensuring my dll or executable always loads to the same place in memory. My thinking here is that maybe the set-associativity of the cache interacts with the code/data location in RAM to alter performance on each run.
Some kind of cycle accurate emulator tool (not commercial).
Is it possible to have a degree of control over context switches? Or does it even matter? My thinking is the timing of the context switches is causing variability, maybe by causing the pipeline to be flushed at an inopportune time.
In the past I have had success on RISC architectures by counting instructions in the assembly listing. This only works, of course, if the number of instructions is small. Some compilers (like TI's Code Composer for the C67x) will give you a detailed analysis of how it's keeping the ALU busy.
I haven't found the assembly listings produced by GCC/GAS to be particularly informative. With full optimization on, code is moved all over the place. There can be multiple location directives for a single block of code dispersed about the assembly listing. Further, even if I could understand how the assembly maps back into my original code, I'm not sure there's much correlation between instruction count and performance on a modern x86 machine anyway.
I made a weak attempt at using gcov for line-by-line profiling, but due to an incompatibility between the version of GCC I built and the MinGW compiler, it wouldn't work.
One last thing you can do is average over many, many trial runs, but that takes forever.
EDIT (RE: Call Stack Sampling)
The first question I have is, practically, how do I do this? In one of your PowerPoint slides, you showed using Visual Studio to pause the program. What I have is a DLL compiled by GCC with full optimizations in Cygwin. This is then called by a mex DLL compiled by Matlab using the VS2013 compiler.
The reason I use Matlab is because I can easily experiment with different parameters and visualize the results without having to write or compile any low level code. Further, I can compare my optimized DLL to the high level Matlab code to ensure my optimizations have not broken anything.
The reason I use GCC is that I have a lot more experience with it than with Microsoft's compiler. I'm familiar with many flags and extensions. Further, Microsoft has been reluctant, at least in the past, to maintain and update the native C compiler (C99). Finally, I've seen GCC kick the pants off commercial compilers, and I've looked at the assembly listing to see how it's actually done. So I have some intuition of how the compiler actually thinks.
Now, with regards to making guesses about what to fix. This isn't really the issue; it's more like making guesses about how to fix it. In this example, as is often the case in numerical algorithms, there is really no I/O (excluding memory). There are no function calls. There's virtually no abstraction at all. It's like I'm sitting on top of a piece of saran wrap. I can see the computer architecture below, and there's really nothing in-between. If I re-rolled up all the loops, I could probably fit the code on about one page or so, and I could almost count the resultant assembly instructions. Then I could do a rough comparison to the theoretical number of operations a single core is capable of doing to see how close to optimal I am. The trouble then is I lose the auto-vectorization and instruction level parallelization I got from unrolling. Unrolled, the assembly listing is too long to analyze in this way.
The point is that there really isn't much to this code. However, due to the incredible complexity of the compiler and modern computer architecture, there is quite a bit of optimization to be had even at this level. But I don't know how small changes are going to affect the output of the compiled code. Let me give a couple of examples.
This first one is somewhat vague, but I'm sure I've seen it happen a few times. You make a small change and get a 10% improvement. You make another small change and get another 10% improvement. You undo the first change and get another 10% improvement. Huh? Compiler optimizations are neither linear nor monotonic. It's possible the second change required an additional register, which broke the first change by forcing the compiler to alter its register allocation algorithm. Maybe the second optimization somehow occluded the compiler's ability to do other optimizations, which was fixed by undoing the first one. Who knows? Unless the compiler is introspective enough to dump its full analysis at every level of abstraction, you'll never really know how you ended up with the final assembly.
Here is a more specific example which happened to me recently. I was hand coding AVX intrinsics to speed up a filter operation. I thought I could unroll the outer loop to increase instruction level parallelism. So I did, and the result was that the code was twice as slow. What happened was there were not enough 256 bit registers to go around. So the compiler was temporarily saving results on the stack, which killed performance.
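For reference, a hedged sketch of the sort of AVX loop in question, before any outer-loop unrolling; the kernel is an invented scale-and-accumulate, not the actual filter, and it needs -mavx to compile. Unrolling a loop like this multiplies the number of live __m256 temporaries, and once they exceed the sixteen YMM registers the compiler starts spilling to the stack.

#include <stddef.h>
#include <immintrin.h>

static void scale_accumulate(float *dst, const float *src, float coeff, size_t n)
{
    __m256 c = _mm256_set1_ps(coeff);            /* broadcast the coefficient     */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {                 /* 8 floats per 256-bit register */
        __m256 s = _mm256_loadu_ps(src + i);
        __m256 d = _mm256_loadu_ps(dst + i);
        d = _mm256_add_ps(d, _mm256_mul_ps(s, c));
        _mm256_storeu_ps(dst + i, d);
    }
    for (; i < n; i++)                           /* scalar tail                   */
        dst[i] += src[i] * coeff;
}

int main(void)
{
    float a[16] = {0}, b[16];
    for (int i = 0; i < 16; i++) b[i] = (float)i;
    scale_accumulate(a, b, 2.0f, 16);
    return (int)a[15];   /* keep the result observable */
}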
As I was alluding to in this post, which you commented on, it's best to tell the compiler what you want, but unfortunately, you often have no choice and are forced to hand tweak optimizations, usually via guess and check.
So I guess my question would be, in these scenarios (the code is effectively small until unrolled, each incremental performance change is small, and you're working at a very low level of abstraction), would it be better to have "precision of timing" or is call stack sampling better at telling me which code is superior?
I've faced a similar problem some time ago, but that was on Linux, which made it easier to tweak. Basically, the noise introduced by the OS (called "OS jitter") was as big as 5-10% in SPEC2000 tests (I can imagine it's much higher on Windows due to a much bigger amount of bloatware).
I was able to bring the deviation to below 1% by a combination of the following:
disable dynamic frequency scaling (better do this both in BIOS and in Linux kernel as not all kernel versions do this reliably)
disable memory prefetching and other fancy settings like "Turbo boost", etc. (BIOS, again)
disable hyperthreading
enable high-performance process scheduler in kernel
bind process to core to prevent thread migration (use core 0 - for some reason it was more reliable on my kernel, go figure); see the sketch after this list
boot to single-user mode (in which no services are running) - this isn't as easy in modern systemd-based distros
disable ASLR
disable network
drop OS pagecache
There may be more to it but 1% noise was good enough for me.
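Regarding the core-binding step, a minimal sketch of what pinning the process to core 0 might look like on Linux with sched_setaffinity; error handling is kept to a minimum:

#define _GNU_SOURCE           /* for CPU_SET/sched_setaffinity with glibc */
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                           /* core 0 only */
    if (sched_setaffinity(0, sizeof set, &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    /* ... launch or run the benchmark from here ... */
    return 0;
}

The same effect can be had without code via taskset -c 0 ./benchmark.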
I might put detailed instructions to github later today if you need them.
-- EDIT --
I've published my benchmarking script and instructions here.
Am I right that what you're doing is making an educated guess of what to fix, fixing it, and then trying to measure to see if it made any difference?
I do it a different way, which works especially well as the code gets large.
Rather than guess (which I certainly can) I let the program tell me how the time is spent, by using this method.
If the method tells me that roughly 30% is spent doing such-and-so, I can concentrate on finding a better way to do that.
Then I can run it and just time it.
I don't need a lot of precision.
If it's better, that's great.
If it's worse, I can undo the change.
If it's about the same, I can say "Oh well, maybe it didn't save much, but let's do it all again to find another problem."
I need not worry.
If there's a way to speed up the program, this will pinpoint it.
And often the problem is not just a simple statement like "line or routine X spends Y% of the time", but "the reason it's doing that is Z in certain cases" and the actual fix may be elsewhere.
After fixing it, the process can be done again, because a different problem, which was small before, is now larger (as a percent, because the total has been reduced by fixing the first problem).
Repetition is the key, because each speedup factor multiplies all the previous, like compound interest.
When the program no longer points out things I can fix, I can be sure it is nearly optimal, or at least nobody else is likely to beat it.
And at no point in this process did I need to measure the time with much precision.
Afterwards, if I want to brag about it in a powerpoint, maybe I'll do multiple timings to get smaller standard error, but even then, what people really care about is the overall speedup factor, not the precision.

Writing a VM - well formed bytecode?

I'm writing a virtual machine in C just for fun. Lame, I know, but luckily I'm on SO so hopefully no one will make fun :)
I wrote a really quick'n'dirty VM that reads lines of (my own) ASM and does stuff. Right now, I only have 3 instructions: add, jmp, end. All is well, and it's actually pretty cool being able to feed it lines (doing something like write_line(&prog[1], "jmp", regA, regB, 0);) and then running the program:
while (machine.cp <= BOUNDS && DONE != true)
{
    run_line(&prog[machine.cp]);
}
I'm using an opcode lookup table (which may not be efficient but it's elegant) in C and everything seems to be working OK.
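(For illustration, a hedged sketch of the kind of opcode lookup table meant here; the instruction layout, machine struct and handlers are all invented, not the actual code.)

#include <stdio.h>

typedef struct { unsigned char op, a, b, c; } instr_t;
typedef struct { int reg[4]; int cp; int done; } machine_t;

typedef void (*handler_t)(machine_t *, const instr_t *);

static void op_add(machine_t *m, const instr_t *i) { m->reg[i->a] = m->reg[i->b] + m->reg[i->c]; m->cp++; }
static void op_jmp(machine_t *m, const instr_t *i) { m->cp = i->a; }
static void op_end(machine_t *m, const instr_t *i) { (void)i; m->done = 1; }

static const handler_t dispatch[] = { op_add, op_jmp, op_end };   /* index == opcode */

int main(void)
{
    instr_t prog[] = { {0, 0, 1, 2}, {2, 0, 0, 0} };   /* add r0, r1, r2 ; end */
    machine_t m = { {0, 3, 4, 0}, 0, 0 };
    while (!m.done)
        dispatch[prog[m.cp].op](&m, &prog[m.cp]);      /* table lookup instead of a switch */
    printf("r0 = %d\n", m.reg[0]);
    return 0;
}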
My question is more of a "best practices" question but I do think there's a correct answer to it. I'm making the VM able to read binary files (storing bytes in unsigned char[]) and execute bytecode. My question is: is it the VM's job to make sure the bytecode is well formed or is it just the compiler's job to make sure the binary file it spits out is well formed?
I only ask this because what would happen if someone would edit a binary file and screw stuff up (delete arbitrary parts of it, etc). Clearly, the program would be buggy and probably not functional. Is this even the VM's problem? I'm sure that people much smarter than me have figured out solutions to these problems, I'm just curious what they are!
Is it the VM's job to make sure the bytecode is well formed or is it just the compiler's job to make sure the binary file it spits out is well formed?
You get to decide.
Best practice is to have the VM do a single check before execution, with cost proportional to the size of the program, which is sophisticated enough to guarantee that nothing wonky can happen during execution. Then, during actual execution of the bytecode, you run with no checks.
However, the check-before-running idea can require some very sophisticated analysis, and even the most performance-conscious VMs often have some checks at run time (example: array bounds).
For a hobby project, I'd keep things simple and have the VM check sanity every time you execute an instruction. The overhead for most instructions won't be too great.
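To make the check-before-running idea concrete, a hedged sketch of a single verification pass, assuming a made-up fixed-width format of 4-byte instructions (opcode plus three operand bytes) with the three opcodes from the question:

#include <stddef.h>

enum { OP_ADD = 0, OP_JMP = 1, OP_END = 2, OP_COUNT = 3 };

/* Returns 1 if the program is safe to run without per-instruction checks. */
static int verify(const unsigned char *code, size_t len)
{
    if (len == 0 || len % 4 != 0)
        return 0;                                     /* empty or truncated instruction */
    for (size_t i = 0; i < len; i += 4) {
        unsigned char op = code[i];
        if (op >= OP_COUNT)
            return 0;                                 /* unknown opcode */
        if (op == OP_JMP && (size_t)code[i + 1] * 4 >= len)
            return 0;                                 /* jump target outside the program */
    }
    return 1;
}

Anything the verifier cannot prove up front (array bounds, stack depth, and so on) still needs a run-time check, as noted above.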
The same issue arises in Java, and as I recall, in that case the VM does have to do some checks to make sure the bytecode is well formed. In that situation, it's actually a serious issue because of the potential for security problems: if someone can alter a Java bytecode file to contain something that the compiler would never output (such as accessing a private variable from another class), it could potentially expose sensitive data being held in the application's memory, or could allow the application to access a website that it shouldn't be allowed to, or something. Java's virtual machine includes a bytecode verifier to make sure, to the extent possible, that these sorts of things don't happen.
Now, in your case, unless your homemade language takes off and becomes popular, the security aspect is something you don't have to worry about so much; after all, who's going to be hacking your programs, other than you? Still, I would say it's a good idea to make sure that your VM at least has a reasonable failure strategy for when the bytecode is invalid. At a minimum, if it encounters something it doesn't understand and can't process, it should detect that and fail with an error message, which will make debugging easier on your part.
Virtual machines that interpret bytecode generally have some way of validating their input; for example, Java will throw a VerifyError if the class file is in an inconsistent state.
However, it sounds like you're implementing a processor, and since processors tend to be lower-level, there are fewer ways to get things into a detectably invalid state -- feeding it an undefined opcode is one obvious way. Real processors will signal that the process attempted to execute an illegal instruction, and the OS will deal with it (Linux kills it with SIGILL, for example).
If you're concerned about someone having edited the binary file, then there is only one answer to your question: the VM must do the check. It's the only way you have a chance to detect the tampering. The compiler just creates the binary. It has no way of detecting downstream tampering.
It makes sense to have the compiler do as much sanity checking as possible (since it only has to do it once), but there are always going to be issues that can't be detected by static analysis, like [cough] stack overflow, array range errors, and the like.
I'd say it's legitimate for your VM to let the emulated processor catch fire, as long as the VM implementation itself doesn't crash. As the VM implementor, you get to set the rules. But if you want virtual hardware companies to virtually buy your virtual chip, you'll have to do something a little more forgiving of errors: good options might be to raise an exception (harder to implement) or reset the processor (much easier). Or maybe you just define every opcode to be valid, except that some are "undocumented" - they do something unspecified, other than crashing your implementation. Rationale: if (!) your VM implementation is to run several instances of the guest simultaneously, it would be very bad if one guest were able to cause others to fail.

File descriptor limits and default stack sizes

Where I work we build and distribute a library and a couple of complex programs built on that library. All code is written in C and is available on most 'standard' systems like Windows, Linux, AIX, Solaris, and Darwin.
I started in the QA department and while running tests recently I have been reminded several times that I need to remember to set the file descriptor limits and default stack sizes higher or bad things will happen. This is particularly the case with Solaris and now Darwin.
Now this is very strange to me because I am a believer in 0 required environment fiddling to make a product work. So I am wondering if there are times where this sort of requirement is a necessary evil, or if we are doing something wrong.
Edit:
Great comments that describe the problem and a little background. However, I do not believe I worded the question well enough. Currently, we require customers, and hence us the testers, to set these limits before running our code. We do not do this programmatically. And this is not a situation where they MIGHT run out; under normal load our programs WILL run out and seg fault.
So, rewording the question: is requiring the customer to change these ulimit values in order to run our software to be expected on some platforms (i.e., Solaris, AIX), or are we as a company making it too difficult for these users to get going?
Bounty:
I added a bounty to hopefully get a little more information on what other companies are doing to manage these limits. Can you set these programmatically? Should we? Should our programs even be hitting these limits, or could this be a sign that things might be a bit messy under the covers? That is really what I want to know; as a perfectionist, a seemingly dirty program really bugs me.
If you need to change these values in order to get your QA tests to run, then that is not too much of a problem. However, requiring a customer to do this in order for the program to run should (IMHO) be avoided. If nothing else, create a wrapper script that sets these values and launches the application so that users will still have a one-click application launch. Setting these from within the program would be the preferable method, however. At the very least, have the program check the limits when it is launched and (cleanly) error out early if the limits are too low.
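A hedged sketch of that check-at-launch idea on POSIX systems, using getrlimit/setrlimit to raise the soft file-descriptor limit and fail cleanly if the hard limit is too low; the threshold of 4096 is an invented example value:

#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

int main(void)
{
    const rlim_t needed = 4096;                       /* example requirement */
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return EXIT_FAILURE;
    }
    if (rl.rlim_cur < needed) {
        rl.rlim_cur = (rl.rlim_max == RLIM_INFINITY || rl.rlim_max > needed)
                          ? needed
                          : rl.rlim_max;              /* as high as the hard limit allows */
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur < needed) {
            fprintf(stderr, "need at least %llu file descriptors; "
                            "please raise the hard limit (e.g. ulimit -Hn)\n",
                    (unsigned long long)needed);
            return EXIT_FAILURE;
        }
    }
    /* ... start the real application ... */
    return EXIT_SUCCESS;
}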
If a software developer told me that I had to mess with my stack and descriptor limits to get their program to run, it would change my perception of the software. It would make me wonder "why do they need to exceed the system limits that are apparently acceptable for every other piece of software I have?". This may or may not be a valid concern, but being asked to do something that (to many) can seem hackish doesn't have the same professional edge as a program that you just launch and go.
This problem seems even worse when you say "this is not a situation where they MIGHT run out, under normal load our programs WILL run out and seg fault". A program exceeding these limits is one thing, but a program that doesn't gracefully handle the error conditions resulting from exceeding these limits is quite another. If you hit the file handle limit and attempt to open a file, you should get an error indicating that you have too many files open. This shouldn't cause a program crash in a well-designed program. It may be more difficult to detect stack usage issues, but running out of file descriptors should never cause a crash.
You don't give much details about what type of program this is, but I would argue that it's not safe to assume that users of your program will necessarily have adequate permissions to change these values. In any case, it's probably also unsafe to assume that nothing else might change these values while your program is running without the user's knowledge.
While there are always exceptions, I would say that in general a program that exceeds these limits needs to have its code re-examined. The limits are there for a reason, and pretty much every other piece of software on your system works within those limits with no problems. Do you really need that many files open at the same time, or would it be cleaner to open a few files, process them, close them, and open a few more? Is your library/program trying to do too much in one big bundle, or would it be better to break it into smaller, independent parts that work together? Are you exceeding your stack limits because you are using a deeply-recursive algorithm that could be re-written in a non-recursive manner? There are likely many ways in which the library and program in question can be improved in order to ease the need to alter the system resource limits.
The short answer is: it's normal, but not inflexible. Of course, limits are in place to prevent rogue processes or users from starving the system of resources. Desktop systems will be less restrictive than server systems but still have certain limits (e.g. filehandles.)
This is not to say that limits cannot be altered in persistent/reproducible manners, either by the user at the user's discretion (e.g. by adding the relevant ulimit calls in .profile) or programmatically from within programs/libraries which know with certitude that they will require large amounts of file handles (e.g. setsysinfo(SSI_FD_NEWMAX,...)), stack (provided at pthread creation time), etc.
On Darwin, the default soft limit on the number of open files is 256; the default hard limit is unlimited.
AFAICR, on Solaris, the default soft limit on the number of open files is 16384 and the hard limit is 32768.
For stack sizes, Darwin has soft/hard limits of 8192/65536 KB. I forget what the limit is on Solaris (and my Solaris machine is unavailable - power outages in Poughkeepsie, NY mean I can't get to the VPN to access the machine in Kansas from my home in California), but it is substantial.
I would not worry about the hard limits. If I thought the library might run out of 256 file descriptors, I'd increase the soft limit on Darwin; I would probably not bother on Solaris.
Similar limits apply on Linux and AIX. I can't answer for Windows.
Sad story: a few years ago now, I removed the code that changed the maximum file size limit in a program - because it had not been changed from the days when 2 MB was a big file (and some systems had a soft limit of just 0.5 MB). Once upon a decade and some ago, it actually increased the limit; when it was removed, it was annoying because it reduced the limit. Tempus fugit and all that.
On SuSE Linux (SLES 10), the open files limits are 4096/4096, and the stack limits are 8192/unlimited.
As you have to support a large number of different systems, I would consider it wise to set up certain known-good values for system limits/resources, because the default values can differ wildly between systems.
The default size for pthread stacks is one such case. I recently found out that the default on HP-UX 11.31 is 256 KB(!), which isn't very reasonable, at least for our applications.
Setting up well-defined values increases the portability of an application, as you can be sure that there are X file descriptors, a stack size of Y, and so on, on every platform, and that things are not just working by good luck.
I have the tendency to set up such limits from within the program itself, as the user then has fewer things to screw up (someone always tries to run the binary without the wrapper script). To optionally allow for runtime customization, environment variables could be used to override the defaults (while still enforcing the minimum limits).
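A hedged sketch of that approach for pthread stack sizes: pick a well-defined default, let a hypothetical environment variable override it, and enforce a minimum. The names and sizes are invented, and it needs -pthread to build.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_STACK (1024u * 1024u)     /* 1 MB default, example value */
#define MIN_STACK     (256u * 1024u)      /* never accept less than this */

static void *worker(void *arg) { (void)arg; return NULL; }

int main(void)
{
    size_t stack = DEFAULT_STACK;
    const char *env = getenv("MYAPP_STACK_SIZE");     /* hypothetical override */
    if (env != NULL) {
        size_t requested = (size_t)strtoul(env, NULL, 10);
        if (requested >= MIN_STACK)
            stack = requested;                        /* accept the override, but never go below the minimum */
    }

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, stack);

    pthread_t t;
    if (pthread_create(&t, &attr, worker, NULL) != 0) {
        perror("pthread_create");
        return 1;
    }
    pthread_join(t, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}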
Let's look at it this way: it is not very customer-friendly to require customers to set these limits. As detailed in the other answers, you are most likely hitting soft limits, and these can be changed. So change them automatically, if necessary in a script that starts the actual application (you can even write it so that it fails if the hard limits are too low and produces a nice error message instead of a segfault).
That's the practical part of it. Without knowing what the application does I'm guessing a bit, but in most cases you should not be anywhere close to hitting any of the default limits of (even less progressive) operating systems. Assuming the system is not a server that is bombarded with requests (hence the large number of file/socket handles used), it is probably a sign of sloppy programming. Based on experience with programmers, I would guess that file descriptors are left open for files that are only read/written once, or that the system keeps a file descriptor open on a file that is only sporadically changed/read.
Concerning stack sizes, that can mean two things. The standard cause of a program running out of stack is excessive (or unbounded) recursion, which is an error condition the limits are actually designed to address. The second is that some big (probably configuration) structures are allocated on the stack that should be allocated in heap memory. It might even be worse: those huge structures might be passed around by value (instead of by reference), which would mean a big hit on available stack space as well as a big performance penalty.
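As a hedged illustration of the pass-by-value point (the struct and sizes are invented): the first function copies roughly 80 KB onto the stack for every call, while the second passes a single pointer.

#include <stdio.h>

typedef struct {
    double samples[10000];                  /* ~80 KB of data */
    int    count;
} config_t;

static double sum_by_value(config_t cfg)    /* copies the whole struct onto the stack */
{
    double s = 0.0;
    for (int i = 0; i < cfg.count; i++)
        s += cfg.samples[i];
    return s;
}

static double sum_by_pointer(const config_t *cfg)   /* passes one pointer */
{
    double s = 0.0;
    for (int i = 0; i < cfg->count; i++)
        s += cfg->samples[i];
    return s;
}

int main(void)
{
    static config_t cfg = { .samples = { 1, 2, 3 }, .count = 3 };   /* static: keeps the instance itself off the stack */
    printf("%f %f\n", sum_by_value(cfg), sum_by_pointer(&cfg));
    return 0;
}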
A small tip: if you plan to run the application on a 64-bit processor, be careful about setting the stack size to unlimited, which on a 64-bit Linux system reports -1 as the stack size.
Thanks,
Shyam
Perhaps you could add whatever is appropriate to the start script, like 'ulimit -n -S 4096'.
But having worked with Solaris since 2.6, it's not unusual to modify rlim_fd_cur and rlim_fd_max in /etc/system permanently. In older versions of Solaris, they're just too low for some workloads, like running web servers.
