How can I output a sequence as post order? [duplicate] - c

This is intended to be a general-purpose question to assist new programmers who have a problem with a program, but who do not know how to use a debugger to diagnose the cause of the problem.
This question covers three classes of more specific question:
When I run my program, it does not produce the output I expect for the input I gave it.
When I run my program, it crashes and gives me a stack trace. I have examined the stack trace, but I still do not know the cause of the problem because the stack trace does not provide me with enough information.
When I run my program, it crashes because of a segmentation fault (SEGV).

A debugger is a program that can examine the state of your program while your program is running. The technical means it uses for doing this are not necessary for understanding the basics of using a debugger. You can use a debugger to halt the execution of your program when it reaches a particular place in your code, and then examine the values of the variables in the program. You can use a debugger to run your program very slowly, one line of code at a time (called single stepping), while you examine the values of its variables.
Using a debugger is an expected basic skill
A debugger is a very powerful tool for diagnosing problems with programs, and debuggers are available for all practical programming languages. Being able to use a debugger is therefore considered a basic skill of any professional or enthusiast programmer, and using one is considered basic work you should do yourself before asking others for help. As this site is for professional and enthusiast programmers, and not a help desk or mentoring site, if you ask about a problem with a specific program but have not used a debugger, your question is very likely to be closed and downvoted. If you persist with questions like that, you will eventually be blocked from posting more.
How a debugger can help you
By using a debugger you can discover whether a variable has the wrong value, and where in your program its value changed to the wrong value.
Using single stepping you can also discover whether the control flow is as you expect. For example, whether an if branch executes when you expect it should.
General notes on using a debugger
The specifics of using a debugger depend on the debugger and, to a lesser degree, the programming language you are using.
You can attach a debugger to a process that is already running your program. You might do this if your program is stuck.
In practice it is often easier to run your program under the control of a debugger from the very start.
You indicate where your program should stop executing by indicating the source code file and line number of the line at which execution should stop, or by indicating the name of the method/function at which the program should stop (if you want to stop as soon as execution enters the method). The technical means that the debugger uses to cause your program to stop is called a breakpoint and this process is called setting a breakpoint.
Most modern debuggers are part of an IDE and provide you with a convenient GUI for examining the source code and variables of your program, with a point-and-click interface for setting breakpoints, running your program, and single stepping it.
Using a debugger can be very difficult unless your program executable or bytecode files include debugging symbol information and cross-references to your source code. You might have to compile (or recompile) your program slightly differently to ensure that information is present. If the compiler performs extensive optimizations, those cross-references can become confusing. You might therefore have to recompile your program with optimizations turned off.
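As a concrete illustration, here is a minimal sketch of the kind of program you would build with debug symbols and without optimization before stepping through it (the file name, the average() function, and the gcc flags in the comment are assumptions for this example, not part of the question above):

/* Build for debugging, e.g.: gcc -g -O0 average.c -o average
 * Then run it under the debugger, set a breakpoint on average(), and single-step. */
#include <stdio.h>

/* Intentionally buggy: integer division truncates the result. */
static double average(int a, int b)
{
    return (a + b) / 2;   /* stepping here shows the value is already truncated */
}

int main(void)
{
    printf("%f\n", average(3, 4));  /* prints 3.000000, not 3.500000 */
    return 0;
}

Setting a breakpoint on average() and examining the return value shows the truncation before it ever reaches printf.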

I want to add that a debugger isn't always the perfect solution, and shouldn't always be the go-to solution to debugging. Here are a few cases where a debugger might not work for you:
The part of your program which fails is really large (poor modularization, perhaps?) and you're not exactly sure where to start stepping through the code. Stepping through all of it might be too time-consuming.
Your program uses a lot of callbacks and other non-linear flow control methods, which makes the debugger confused when you step through it.
Your program is multi-threaded. Or even worse, your problem is caused by a race condition.
The code that has the bug in it runs many times before it bugs out. This can be particularly problematic in main loops, or worse yet, in physics engines, where the problem could be numerical. Even setting a breakpoint, in this case, would simply have you hitting it many times, with the bug not appearing.
Your program must run in real-time. This is a big issue for programs that connect to the network. If you set up a breakpoint in your network code, the other end isn't going to wait for you to step through, it's simply going to time out. Programs that rely on the system clock, e.g. games with frameskip, aren't much better off either.
Your program performs some form of destructive actions, like writing to files or sending e-mails, and you'd like to limit the number of times you need to run through it.
You can tell that your bug is caused by incorrect values arriving at function X, but you don't know where these values come from. Having to run through the program, again and again, setting breakpoints farther and farther back, can be a huge hassle. Especially if function X is called from many places throughout the program.
In all of these cases, either stopping your program abruptly could cause the end results to differ, or stepping through it manually in search of the one line where the bug originates is too much of a hassle. This can happen equally whether your bug is incorrect behavior or a crash. For instance, if memory corruption causes a crash, by the time the crash happens it is too far removed from where the memory corruption first occurred, and no useful information is left.
So, what are the alternatives?
The simplest alternative is logging and assertions. Add logs to your program at various points, and compare what you get with what you're expecting. For instance, see if the function where you think there's a bug is even called in the first place. See if the variables at the start of a method are what you think they are. Unlike breakpoints, it's okay for there to be many log lines in which nothing special happens. You can simply search through the log afterward. Once you hit a log line that's different from what you're expecting, add more in the same area. Narrow it down farther and farther, until it's small enough to be able to log every line in the bugged area.
Assertions can be used to trap incorrect values as they occur, rather than once they have an effect visible to the end-user. The quicker you catch an incorrect value, the closer you are to the line that produced it.
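A minimal C sketch of both ideas; the LOG macro and the divide() function are made up for illustration:

#include <assert.h>
#include <stdio.h>

/* Hypothetical logging macro: prints file and line so output can be traced back. */
#define LOG(fmt, ...) \
    fprintf(stderr, "%s:%d: " fmt "\n", __FILE__, __LINE__, __VA_ARGS__)

static int divide(int numerator, int denominator)
{
    LOG("divide(%d, %d) called", numerator, denominator);
    assert(denominator != 0 && "denominator must be non-zero"); /* trap the bad value early */
    return numerator / denominator;
}

int main(void)
{
    LOG("result = %d", divide(10, 2));
    return 0;
}

Because assert() aborts at the first bad value, the core dump (or the last log line) points close to where that value was produced.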
Refactor and unit test. If your program is too big, it might be worthwhile to test it one class or one function at a time. Give it inputs, and look at the outputs, and see which are not as you're expecting. Being able to narrow down a bug from an entire program to a single function can make a huge difference in debugging time.
In case of memory leaks or memory stomping, use appropriate tools that are able to analyze and detect these at runtime. Being able to detect where the actual corruption occurs is the first step. After this, you can use logs to work your way back to where incorrect values were introduced.
Remember that debugging is a process going backward. You have the end result - a bug - and find the cause, which preceded it. It's about working your way backward and, unfortunately, debuggers only step forwards. This is where good logging and postmortem analysis can give you much better results.

Related

Occasionally after compiling, computer locks up

This is a strange issue, but it happens between one and five times a month.
During development, I compile frequently (this is not the unusual part.) From time to time, running the freshly-compiled binary locks up my system. Tray clock doesn't increment, ctrl+alt+backspace doesn't kill Xorg. Totally conked.
I physically powercycle the machine and everything's OK. Application runs fine, from the same binary that murdered my machine earlier or after a no-change recompile, and I get on with my work.
But it still bothers me, largely because I have no idea what causes it. This can occur with binaries compiled with either Clang or GCC. What is going on?
Hard to say, but I have two ideas:
1) Bad RAM
This is possible, but depending on your code, #2 might be more likely.
2) Buffer overflow bug
If you are overwriting memory due to a bug in your code, you could be putting bits into memory that happen to be valid instructions as well. I would look very carefully through your code for places where you write to arrays without checking their lengths first.
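For instance, this is the kind of unchecked write the answer is warning about, alongside a bounds-checked version (the buffer size and function names are illustrative only):

#include <stddef.h>
#include <string.h>

#define BUF_LEN 16

/* Risky: if src is longer than BUF_LEN - 1, this writes past the end of buf
 * and can corrupt whatever happens to live next to it in memory. */
void copy_unchecked(char *buf, const char *src)
{
    strcpy(buf, src);
}

/* Safer: check the length first and truncate instead of overflowing. */
void copy_checked(char buf[BUF_LEN], const char *src)
{
    size_t n = strlen(src);
    if (n >= BUF_LEN)
        n = BUF_LEN - 1;
    memcpy(buf, src, n);
    buf[n] = '\0';
}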

Self-modifying code for trace hooks?

I'm looking for the least-overhead way of inserting trace/logging hooks into some very performance-sensitive driver code. This logging stuff has to always be compiled in, but most of the time do nothing (but do nothing very fast).
There isn't anything much simpler than just having a global on/off word and doing an if(enabled){log()}. However, if possible I'd like to avoid even the cost of loading that word every time I hit one of my hooks. It occurs to me that I could potentially use self-modifying code for this -- i.e. everywhere I have a call to my trace function, I overwrite the jump with a NOP when I want to disable the hooks, and restore the jump when I want to enable them.
A quick google doesn't turn up any prior art on this -- has anyone done it? Is it feasible, are there any major stumbling blocks that I'm not foreseeing?
(Linux, x86_64)
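For reference, the global-word baseline the question describes might look like this (a minimal sketch; TRACE and handle_request are illustrative names, not from the question):

#include <stdio.h>

/* One global on/off word; every hook site loads and tests it. */
static volatile int trace_enabled = 0;

#define TRACE(fmt, ...)                                   \
    do {                                                  \
        if (trace_enabled)                                \
            fprintf(stderr, fmt "\n", __VA_ARGS__);       \
    } while (0)

void handle_request(int id)
{
    TRACE("handling request %d", id);  /* costs one load plus a branch even when disabled */
    /* ... performance-sensitive work ... */
}

The cost being avoided is exactly that load and branch at every hook site.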
Yes, this technique has been implemented within the Linux kernel, for exactly the same purpose (tracing hooks).
See the LWN article on Jump Labels for a starting point.
There aren't really any major stumbling blocks, just a few minor ones: multithreaded processes (you will have to stop all other threads while you're enabling or disabling the code); an incoherent instruction cache (you'll need to ensure the I-cache is flushed, on every core).
Does it matter if your compiled driver is suddenly twice as large?
Build two code paths -- one with logging, one without. Use one or more global function pointers to jump into the performance-sensitive sections, and overwrite them as appropriate.
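A sketch of that function-pointer approach (hypothetical names; this is not the kernel jump-label mechanism mentioned above):

#include <stdio.h>

/* Two code paths: one silent, one that logs. Callers only see the pointer. */
static void handle_packet_fast(const char *pkt)
{
    (void)pkt;                       /* real work only */
}

static void handle_packet_logged(const char *pkt)
{
    fprintf(stderr, "packet: %s\n", pkt);
    handle_packet_fast(pkt);         /* same real work */
}

/* Flip this pointer to switch paths. */
static void (*handle_packet)(const char *) = handle_packet_fast;

void enable_tracing(int on)
{
    handle_packet = on ? handle_packet_logged : handle_packet_fast;
}

The switch itself is a single word-sized store, but if other threads are running you still have to think about when they will observe the new pointer value.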
If there were a way to somehow declare a register global, you could load the register with the value of your word at every entry point into your driver from the outside and then just check the register. Of course, then you'd be denying the use of that register to the optimizer, which might have some unpleasant performance consequences.
I'm writing not so much about whether this is possible, but about whether you gain anything significant.
On the one hand you don't want to test "logging enabled" every time a logging opportunity presents itself; on the other hand, to patch the code you still need to test "logging enabled" and overwrite the call site with either the yes-case or the no-case code. Or does your driver "remember" that it was off the last time, so that if off is requested again nothing needs to be done?
The logic required does not appear trivial compared to simply testing the flag every time.

Writing a VM - well formed bytecode?

I'm writing a virtual machine in C just for fun. Lame, I know, but luckily I'm on SO so hopefully no one will make fun :)
I wrote a really quick'n'dirty VM that reads lines of (my own) ASM and does stuff. Right now, I only have 3 instructions: add, jmp, end. All is well, and it's actually pretty cool being able to feed it lines (with something like write_line(&prog[1], "jmp", regA, regB, 0);) and then run the program:
while (machine.code_pointer <= BOUNDS && DONE != true)
{
    run_line(&prog[machine.code_pointer]);
}
I'm using an opcode lookup table (which may not be efficient but it's elegant) in C and everything seems to be working OK.
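Such an opcode lookup table is often just an array of function pointers indexed by opcode. Here is a minimal sketch in the spirit of the question's run_line(), with made-up structure layouts rather than the question's actual code:

#include <stdbool.h>

enum { OP_ADD = 0, OP_JMP = 1, OP_END = 2, OP_COUNT };

struct machine { int reg[4]; int cp; bool done; };
struct line    { int op; int a; int b; int c; };

static void op_add(struct machine *m, struct line *l) { m->reg[l->a] += m->reg[l->b]; m->cp++; }
static void op_jmp(struct machine *m, struct line *l) { m->cp = l->a; }
static void op_end(struct machine *m, struct line *l) { (void)l; m->done = true; }

/* The lookup table: the opcode value indexes straight into the handler array. */
static void (*const dispatch[OP_COUNT])(struct machine *, struct line *) = {
    [OP_ADD] = op_add,
    [OP_JMP] = op_jmp,
    [OP_END] = op_end,
};

static void run_line(struct machine *m, struct line *l)
{
    dispatch[l->op](m, l);   /* no validation here; see the discussion below */
}

Dispatch is then a single array index per instruction, which is why this layout tends to be both elegant and reasonably fast.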
My question is more of a "best practices" question but I do think there's a correct answer to it. I'm making the VM able to read binary files (storing bytes in unsigned char[]) and execute bytecode. My question is: is it the VM's job to make sure the bytecode is well formed or is it just the compiler's job to make sure the binary file it spits out is well formed?
I only ask this because what would happen if someone would edit a binary file and screw stuff up (delete arbitrary parts of it, etc). Clearly, the program would be buggy and probably not functional. Is this even the VM's problem? I'm sure that people much smarter than me have figured out solutions to these problems, I'm just curious what they are!
Is it the VM's job to make sure the bytecode is well formed or is it just the compiler's job to make sure the binary file it spits out is well formed?
You get to decide.
Best practice is to have the VM do a single check before execution, with cost proportional to the size of the program, that is sophisticated enough to guarantee that nothing wonky can happen during execution. Then during actual execution of the bytecode, you run with no checks.
However, the check-before-running idea can require some very sophisticated analysis, and even the most performance-conscious VMs often have some checks at run time (example: array bounds).
For a hobby project, I'd keep things simple and have the VM check sanity every time you execute an instruction. The overhead for most instructions won't be too great.
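A per-instruction sanity check of that kind can be very small. Here is a sketch against a hypothetical fixed-width encoding (the opcode count, instruction size, and operand layout are assumptions, not part of the question):

#include <stdio.h>

#define NUM_OPCODES 3      /* add, jmp, end in the question's tiny ISA */
#define INSN_SIZE   4      /* hypothetical fixed-width encoding: opcode + 3 operands */

/* Returns 0 if the instruction at pc looks executable, -1 otherwise. */
static int check_instruction(const unsigned char *code, size_t len, size_t pc)
{
    if (pc + INSN_SIZE > len) {
        fprintf(stderr, "truncated instruction at offset %zu\n", pc);
        return -1;
    }
    if (code[pc] >= NUM_OPCODES) {
        fprintf(stderr, "unknown opcode %u at offset %zu\n", (unsigned)code[pc], pc);
        return -1;
    }
    /* Per-opcode operand checks (register indices, jump targets in range, etc.)
       would go here before dispatching. */
    return 0;
}

The execution loop would only dispatch when the check returns 0, so a hand-edited or truncated file fails with a message instead of undefined behavior.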
The same issue arises in Java, and as I recall, in that case the VM does have to do some checks to make sure the bytecode is well formed. In that situation, it's actually a serious issue because of the potential for security problems: if someone can alter a Java bytecode file to contain something that the compiler would never output (such as accessing a private variable from another class), it could potentially expose sensitive data being held in the application's memory, or could allow the application to access a website that it shouldn't be allowed to, or something. Java's virtual machine includes a bytecode verifier to make sure, to the extent possible, that these sorts of things don't happen.
Now, in your case, unless your homemade language takes off and becomes popular, the security aspect is something you don't have to worry about so much; after all, who's going to be hacking your programs, other than you? Still, I would say it's a good idea to make sure that your VM at least has a reasonable failure strategy for when the bytecode is invalid. At a minimum, if it encounters something it doesn't understand and can't process, it should detect that and fail with an error message, which will make debugging easier on your part.
Virtual machines that interpret bytecode generally have some way of validating their input; for example, Java will throw a VerifyError if the class file is in an inconsistent state.
However, it sounds like you're implementing a processor, and since processors tend to be lower-level, there are fewer ways to get things into a detectably invalid state -- feeding it an undefined opcode is one obvious way. Real processors will signal that the process attempted to execute an illegal instruction, and the OS will deal with it (Linux kills the process with SIGILL, for example).
If you're concerned about someone having edited the binary file, then there is only one answer to your question: the VM must do the check. It's the only way you have a chance to detect the tampering. The compiler just creates the binary. It has no way of detecting downstream tampering.
It makes sense to have the compiler do as much sanity checking as possible (since it only has to do it once), but there are always going to be issues that can't be detected by static analysis, like [cough] stack overflow, array range errors, and the like.
I'd say it's legitimate for your VM to let the emulated processor catch fire, as long as the VM implementation itself doesn't crash. As the VM implementor, you get to set the rules. But if you want virtual hardware companies to virtually buy your virtual chip, you'll have to do something a little more forgiving of errors: good options might be to raise an exception (harder to implement) or reset the processor (much easier). Or maybe you just define every opcode to be valid, except that some are "undocumented" - they do something unspecified, other than crashing your implementation. Rationale: if (!) your VM implementation is to run several instances of the guest simultaneously, it would be very bad if one guest were able to cause others to fail.

C Programming: Debugging with pthreads

One of the hardest things for me to initially adjust to was my first intense experience programming with pthreads in C. I was used to knowing exactly what the next line of code to be run would be and most of my debugging techniques centered around that expectation.
What are some good techniques to debugging with pthreads in C? You can suggest personal methodologies without any added tools, tools you use, or anything else that helps you debug.
P.S. I do my C programming using gcc on Linux, but don't let that necessarily restrain your answer.
Valgrind is an excellent tool for finding race conditions and pthreads API misuses. It keeps a model of accesses to program memory (and perhaps to shared resources) and will detect missing locks even when the bug is benign (which of course means that it will completely unexpectedly become less benign at some later point).
To use it, you invoke valgrind --tool=helgrind; here is its manual. There is also valgrind --tool=drd (manual). Helgrind and DRD use different models, so they detect overlapping but possibly different sets of bugs. False positives may also occur.
Anyway, valgrind has saved countless hours of debugging (not all of them though :) for me.
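For example, a deliberately racy toy program like the following is the sort of thing Helgrind reports (the program and file name are made up for illustration; build with -pthread and run it under valgrind --tool=helgrind):

/* gcc -g -pthread race.c -o race && valgrind --tool=helgrind ./race */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;            /* shared, but not protected by any lock */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        counter++;                  /* unsynchronized read-modify-write: a data race */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld\n", counter);   /* rarely the expected 200000 */
    return 0;
}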
One of the things that will surprise you about debugging threaded programs is that you will often find the bug changes, or even goes away, when you add printfs or run the program in the debugger (colloquially known as a Heisenbug).
In a threaded program, a Heisenbug usually means you have a race condition. A good programmer will look for shared variables or resources that are order-dependent. A crappy programmer will try to blindly fix it with sleep() statements.
Debugging a multithreaded application is difficult. A good debugger such as GDB (with the optional DDD front end) for the *nix environment, or the one that comes with Visual Studio on Windows, will help tremendously.
In the 'thinking' phase, before you start coding, use the State Machine concept. It can make the design much clearer.
printfs can help you understand the dynamics of your program. But they clutter up the source code, so use a macro DEBUG_OUT() and in its definition enable it with a boolean flag. Better still, set/clear this flag with a signal that you send via 'kill -USR1'. Send the output to a log file with a timestamp.
Also consider using assert(), and then analyze your core dumps using gdb and ddd.
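A sketch of that DEBUG_OUT() idea (the macro name follows the answer; the signal handler and timestamp format are assumptions, and the output goes to stderr here rather than a log file):

#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t debug_enabled = 0;

/* Toggle logging from outside the process with: kill -USR1 <pid> */
static void toggle_debug(int sig)
{
    (void)sig;
    debug_enabled = !debug_enabled;
}

#define DEBUG_OUT(fmt, ...)                                             \
    do {                                                                \
        if (debug_enabled)                                              \
            fprintf(stderr, "[%ld] " fmt "\n", (long)time(NULL),        \
                    __VA_ARGS__);                                       \
    } while (0)

int main(void)
{
    signal(SIGUSR1, toggle_debug);
    for (int i = 0; ; i++) {
        DEBUG_OUT("iteration %d", i);   /* silent until SIGUSR1 arrives */
        sleep(1);
    }
}

Toggling is then a matter of sending SIGUSR1 from another terminal, with no recompilation.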
My approach to multi-threaded debugging is similar to single-threaded, but more time is usually spent in the thinking phase:
Develop a theory as to what could be causing the problem.
Determine what kind of results could be expected if the theory is true.
If necessary, add code that can disprove or verify your results and theory.
If your theory is true, fix the problem.
Often, the 'experiment' that proves the theory is the addition of a critical section or mutex around suspect code. I will then try to narrow down the problem by systematically shrinking the critical section. Critical sections are not always the best fix (though can often be the quick fix). However, they're useful for pinpointing the 'smoking gun'.
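In practice the experiment is often as blunt as this (a hypothetical sketch: the struct, the function, and the lock exist only to test the race theory, not as the final fix):

#include <pthread.h>

struct account { long balance; };                 /* illustrative shared data */

static pthread_mutex_t probe_lock = PTHREAD_MUTEX_INITIALIZER;

void suspect_code(struct account *acct, long amount)
{
    /* Temporarily serialize the whole suspect region. If the bug disappears,
       the "race condition" theory gains weight; then shrink the locked region
       step by step to pinpoint the shared data involved. */
    pthread_mutex_lock(&probe_lock);
    acct->balance += amount;
    pthread_mutex_unlock(&probe_lock);
}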
Like I said, the same steps apply to single-threaded debugging, though it is far too easy to just jump into a debugger and have at it. Multi-threaded debugging requires a much stronger understanding of the code, as I usually find that running multi-threaded code through a debugger doesn't yield anything useful.
Also, Helgrind is a great tool. Intel's Thread Checker performs a similar function for Windows, but costs a lot more than Helgrind.
I pretty much develop in an exclusively multi-threaded, high performance world so here's the general practice I use.
Design- the best optimization is a better algorithm:
1) Break your functions into LOGICALLY separable pieces. This means that a call does "A" and ONLY "A" -- not A then B then C...
2) NO SIDE EFFECTS: Abolish all naked global variables, static or not. If you cannot fully abolish side effects, isolate them to a few locations (concentrate them in the code).
3) Make as many isolated components RE-ENTRANT as possible. This means they're stateless -- they take all their inputs as constants and only manipulate DECLARED, logically constant parameters to produce the output. Pass by value instead of by reference wherever you can.
4) If you have state, make a clear separation between stateless sub-assemblies and the actual state machine. Ideally the state machine will be a single function or class manipulating stateless components.
Debugging:
Threading bugs tend to come in two broad flavors -- races and deadlocks. As a rule, deadlocks are much more deterministic.
1) Do you see data corruption?: YES => Probably a race.
2) Does the bug arise on EVERY run?: YES => Likely a deadlock (races are generally non-deterministic).
3) Does the process ever hang?: YES => There's a deadlock somewhere. If it only hangs sometimes, you probably have a race too.
Breakpoints often act much like synchronization primitives THEMSELVES in the code, because they're logically similar -- they force execution to stall in the current context until some other context (you) sends a signal to resume. This means that you should view any breakpoints you have in the code as altering its multi-threaded behavior, and breakpoints WILL affect race conditions but (in general) not deadlocks.
As a rule, this means you should remove all breakpoints, identify the type of bug, THEN reintroduce them to try and fix it. Otherwise, they simply distort things even more.
I tend to use lots of breakpoints. If you don't actually care about the thread function, but do care about its side effects, a good time to check them might be right before it exits or loops back to its waiting state or whatever else it's doing.
When I started doing multithreaded programming I... stopped using debuggers.
For me the key point is good program decomposition and encapsulation.
Monitors are the easiest way of error-free multithreaded programming (a minimal sketch follows this answer).
If you cannot avoid complex lock dependencies, then it is easy to check whether they are cyclic: wait until the program hangs and check the stack traces using 'pstack'.
You can break cyclic locks by introducing some new threads and asynchronous communication buffers.
Use assertions, and make sure to write single-threaded unit tests for particular components of your software -- you can then run them in a debugger if you want.
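A monitor in the sense used above is just a mutex and a condition variable hidden behind a small API. Here is a minimal sketch (the counter and its names are illustrative):

#include <pthread.h>

/* All shared state lives behind one lock plus a condition variable,
   and callers only ever touch it through these functions. */
struct bounded_counter {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;
    int             value;
};

void counter_init(struct bounded_counter *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->nonzero, NULL);
    c->value = 0;
}

void counter_increment(struct bounded_counter *c)
{
    pthread_mutex_lock(&c->lock);
    c->value++;
    pthread_cond_signal(&c->nonzero);
    pthread_mutex_unlock(&c->lock);
}

void counter_decrement_blocking(struct bounded_counter *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->value == 0)                   /* re-check after every wakeup */
        pthread_cond_wait(&c->nonzero, &c->lock);
    c->value--;
    pthread_mutex_unlock(&c->lock);
}

Because callers can only touch the value through these functions, the locking discipline lives in one place and cannot be forgotten at a call site.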

Reinitialize global/static memory at runtime or static analysis of global/static variables

The Problem
I'm working on a large C project (C99) that makes heavy use of global variables (I know, I know). The program works fairly well, but it was originally designed to run once and exit.
As such, it relies on its global/static memory being initialized to 0 (or to whatever value each variable was declared with), and during runtime it modifies these variables (as most programs do).
However, instead of exiting on completion, I want to run the program again. I want to make a parent program that has control and visibility into this large program. Having complete visibility into the running program is very important.
The solution needs to work on macOS, Linux, and Windows.
I've considered:
1. Forking it
Make a small wrapper program that serves as the "shell", and execute the large program as needed.
Pros
OS does the hard work of resetting the memory to the correct values
Guaranteed to operate as intended
Cons
Lost visibility into the program
Can't inspect memory of executing program from wrapper during runtime, harder to tweak settings before launching, harder to collect runtime information
Need to implement a system to get internal data in/out of the program, potentially touching a lot of code
Unified experience harder (sharing a GUI window, etc)
2. Identify critical structures manually
Peruse the source, run the program multiple times, wait for program to blow up on a sanity check or bad memory access.
Pros
Easy to do
Easy to start
High visibility, code sharing, and unification
Cons
Does not catch every case, very patchwork
Time consuming
3. Refactor
Collect all globals into a single structure to memset, and create initializers for variables that are declared with a non-zero value (sketched after this list of options). Handle statics on a case-by-case basis.
Pros
Conceptually easy, sledgehammer approach
High visibility, code sharing, and unification
Cons
Very time consuming, codebase large, would touch pretty much everything
4. Magic wand
Tell the OS to reinitialize global/static memory. If I need to save a value, I'll store it locally and then rewrite it when it's done.
Pros
Mostly perfect :)
Cons
Doesn't exist (?)
Very black magic
Probably not cross platform
May anger 3rd-party libs
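For reference, the refactor in option 3 usually ends up looking something like this (a hypothetical sketch; the struct members and initial values are made up):

#include <string.h>

/* Hypothetical end state of option 3: every former global lives in one struct. */
struct program_state {
    int    verbosity;          /* formerly: int verbosity = 2;      */
    long   iterations;         /* formerly: long iterations;        */
    double threshold;          /* formerly: double threshold = 0.5; */
};

static struct program_state g_state;

/* One reset function replaces the loader's zero-initialization, then reapplies
   the non-zero initial values. Call it before every rerun. */
static void program_state_reset(struct program_state *s)
{
    memset(s, 0, sizeof *s);
    s->verbosity = 2;
    s->threshold = 0.5;
}

Existing code then refers to g_state.verbosity and so on, and the parent program can call program_state_reset() before each rerun and inspect g_state directly, which keeps the visibility the question asks for.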
What I am doing now
I am going with option 2 right now, just making my way through the code, leaning on the program to crash and point me in the right direction.
I'd say this method has gotten me about 80% of the way there. I've identified and reinitialized enough things that the program, more or less, can be rerun. It's not as widespread as I thought, and it gives me a lot of hope.
Occasionally, strange things happen or it doesn't operate as intended, but it also doesn't crash. This makes tracking it down more difficult.
I just need something to get me that last 20%. Maybe some sort of static analysis tool, or something to help me go through the source and see where globals are touched.
To easily detect the global and static variables, you can try CppDepend and execute a CQLinq query like this one:
from f in Fields where f.IsGlobal || f.IsStatic
select f
You can also modify the query if you want the variables used by a specific function or in a specific file.
