Disable LLDB restoring state when a function called within the debugger crashes - lldb

I'm debugging a C project with LLDB.
I attach the debugger to a program and then call some functions from within the debugger that are expected to crash. I want to know where each function crashes.
The problem is that, since the function is called from within the debugger, once the function crashes the debugger resets the thread back to the state it was in before the call. I don't want this; any idea how to disable it?
Thanks

You can get lldb to preserve the state of a thread when an expression crashes in two ways.
1) If you want to ALWAYS stop in the crashed state when expressions crash, set target.process.unwind-on-error-in-expressions to false:
settings set target.process.unwind-on-error-in-expressions false
either on the command line or in your ~/.lldbinit file.
BTW, for this reason, if you are using the lldb SB APIs for scripting, or you're writing your own debugger GUI and you need to call a function for some purpose, it's a good idea to explicitly override this setting when calling functions. Use the SBExpressionOptions.SetUnwindOnError method for this. Otherwise, your users might end up staring at a crash in an expression they did not call...
2) You can do it on a per-expression basis using:
expr -u 0 -- <expression>
Note, if you do this often, you can make an alias for this. Something like:
command alias pu expr -u 0 --
Then just do:
pu <expression>
While stopped at an expression crash, you can examine the stack, local variables, call other expressions, etc., just as you would at a normal stop in lldb. When you are done with this investigation and want to return the thread to the state it was in before the expression call, use the command:
thread return -x
Expression evaluations on a thread nest: you can call an expression, stop when it crashes, call another expression from there, stop when that one crashes, and so on. thread return -x unwinds the youngest expression crash.
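For instance, a session might go like this (crashing_func stands in for whatever function you are investigating):
(lldb) expr -u 0 -- crashing_func(my_arg)
(lldb) bt
(lldb) frame variable
(lldb) thread return -x
The bt and frame variable commands examine the crashed frame; the final thread return -x discards it again.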
On a related note, you can also set breakpoints in a function that you call from the expression parser, hit the breakpoint, and then step through the rest of the expression evaluation. That is also off by default, and is controlled by the -i flag to expr, like:
expr -i 0 -- <expression>
When you are done examining the code you've called, you can either use thread return -x to wipe the expression evaluation from the thread stack, or you can continue and the expression will finish evaluating and the result will be printed.
This trick is quite handy for watching how functions behave with odd inputs, etc.

Can addresses of unmodified locals wind up corrupted in setjmp/longjmp?

If one winds up in the situation of being stuck using setjmp/longjmp (don't ask), then there are lots of nice warnings from the compiler about when you might be doing something wrong.
But with a -Wall -Wextra -pedantic build while using Address Sanitizer in Clang, I wound up with a case roughly parallel to:
void outer() {
    jmp_buf buf;
    ERR error;
    if (setjmp(buf) ? helper(&error) : FALSE) {
        // process whatever helper decided to write into error
        return;
    }
    // do the stuff you wanted to guard that may longjmp.
    // error is never modified
}
On a longjmp, looking into the helper stack frame, the error pointer is null. If I look in the outer() frame, it says error has been "optimized out".
It's puzzling because I'm compiling with -O0, so "optimized out" is a weird thing for it to be saying. But as with most things longjmp-y, I wonder what keeps the compiler from deciding ahead of time which register it will put the error address in... and then having that decision be invalidated.
Is Address Sanitizer punking me, or do I actually have to write something like:
void outer() {
    jmp_buf buf;
    ERR error;
    volatile ERR* error_ptr = &error;
    if (setjmp(buf) ? helper(error_ptr) : FALSE) {
        // process whatever helper decided to write into error
        return;
    }
    // do the stuff you wanted to guard that may longjmp.
    // error is never modified
}
As I research this, I've noticed that jmp_bufs are not locals in any of the examples I see. Is that something you can't do? :-/
NOTE: See @AnT's answer and the comments below for the "language-lawyer" issue with the setjmp() ? ... : ... construct. But what I actually had going on here turned out to be a broken longjmp call that happened after the function had exited. Per the longjmp() docs (also: common sense), that's definitely broken; I just didn't realize that's what had happened:
If the function that called setjmp has exited, the behavior is undefined (in other words, only long jumps up the call stack are allowed)
Is there a reason the helper call is "embedded" into the controlling expression of if through the ?: operator? This is actually a violation of a language requirement, which says
7.13.1.1 The setjmp macro
4 An invocation of the setjmp macro shall appear only in one of the following contexts:
— the entire controlling expression of a selection or iteration statement;
— one operand of a relational or equality operator with the other operand an integer constant expression, with the resulting expression being the entire controlling expression of a selection or iteration statement;
— the operand of a unary ! operator with the resulting expression being the entire controlling expression of a selection or iteration statement; or
— the entire expression of an expression statement (possibly cast to void).
5 If the invocation appears in any other context, the behavior is undefined.
The whole point of that requirement is to make sure that the "unpredictable" return from setjmp, triggered by longjmp, does not land in the middle of an expression evaluation, i.e. in an unsequenced context. In your specific example it is rather obvious that, from the point of view of the abstract C language, the variable error cannot possibly be changed by the setjmp call, which opens the door for many optimizations.
It is hard to say what happened here, since helper receives a pointer &error, not error's direct value. On the surface everything seems fine from a practical point of view. But formally the behavior is undefined.
In your case, you should not try to fix things by making variables volatile, but rather should simplify the context setjmp is used in so that it conforms to the above requirements. Something along the lines of:
if (setjmp(buf) != 0) {
    helper(&error);
    ...
    return;
}
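As a complete sketch of the conforming shape (keeping the question's hypothetical ERR type and helper function):
#include <setjmp.h>

void outer(void) {
    jmp_buf buf;
    ERR error;

    if (setjmp(buf) != 0) {  /* setjmp as one operand of != with an integer
                                constant: one of the contexts 7.13.1.1 allows */
        helper(&error);      /* runs as an ordinary call after the long jump */
        /* process whatever helper decided to write into error */
        return;
    }
    /* do the stuff you wanted to guard; it may longjmp(buf, 1) back here */
}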

LLDB print a variable named class

I have a C program in which a variable called class is used.
I'm trying to debug it with LLDB but I'm encountering the following problem:
(lldb) print class
error: warning: declaration does not declare anything
error: declaration of anonymous class must be a definition
error: 1 errors parsing expression
I believe this problem occurs because class is a reserved keyword in C++ and LLDB interprets the code passed to print as C++. Is there still a way to print the contents of my variable?
(Please do not advise me to rename the variable; I would have come up with that myself if it were possible.)
The problem is that the lldb expression parser uses C++ references to implement finding and extracting results from the expressions we run. So we currently have to compile the expressions as C++ expressions, and as you guessed, you can't use "class" as an identifier in a C++ expression. At some point, we'll have to teach clang how to do "C with references", and then we'll be able to compile and execute real C expressions.
However, provided you have debug information for "class", you can print the value of the variable using the "frame variable" command, i.e.:
(lldb) frame variable class
The "frame variable" command does not use the expression parser, it goes directly to the debug information, extracts the type & location of the variable, and prints that directly. So it doesn't suffer this restriction. If "class" is a global variable, not a frame local, use target variable instead.
frame variable does support a limited set of "expression-like" features; you can say:
(lldb) frame variable class.member
or
(lldb) frame variable *class
but you can't use it to call functions or pass the variable to a function call.
If you need to do that you can run the command:
(lldb) frame variable -L class
which will print the location of the variable. Usually that's some address, in which case you can use
(TypeOfClass *) <Address From Frame Variable>
in your expression in place of "class". If the location turns out to be a register, use that register (e.g. "$rax"), appropriately cast, in your expression. If you are going to use the variable in a number of expressions, remember you can do:
(lldb) expr TypeOfClass *$class = (TypeOfClass *) <Address From Frame Variable>
and then just use $class in your subsequent expressions. If you got super-motivated, you could even write a Python command that automates these steps...
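Putting the steps together, a session might look like this (TypeOfClass is the answer's placeholder; the address and use_the_class are purely illustrative):
(lldb) frame variable -L class
(lldb) expr TypeOfClass *$class = (TypeOfClass *) 0x00007ffeefbff4a8
(lldb) expr use_the_class($class)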

Compiler optimization call-ret vs jmp

I am building one of my projects and looking at the generated list file (target: x86-64). My code looks like:
int func_1(var1, var2) {
    asm_inline_(
    )
    func_2(var1, var2);
    return_1;
}

void func_2(var_1, var_2) {
    asm __inline__(
    )
    func_3();
}

/**** Jump to kernel ---> System call stub in assembly. This func is in a .S file ****/
void func_3() {
}
When I look at the assembly code, I find that a "jmp" instruction is used instead of a "call-ret" pair when calling func_2 and func_3. I am sure it is one of the compiler's optimizations, and I have not explored how to disable it (GCC).
The moment I add some volatile variables to func_2 and func_3 and increment them, the "jmp" gets replaced by a "call-ret" pair.
I am bemused by this behavior because those variables are useless; they don't serve any purpose.
Can someone please explain the behavior?
Thanks
If code jumps to the start of another function rather than calling it, then when the jumped-to function returns, it returns to the point the outer function was called from, skipping whatever remained of the first function after that point. Assuming the behaviour is correct (the first function contributed nothing else to the execution after that point anyway), this is an optimisation, often called a tail call: it saves one level of instructions and stack manipulation.
In the given example, the behaviour is correct; there's no local stack to pop and no value to return, so there is no code that needs to run after the call. (return_1, assuming it's not a macro for something, is a pure expression and therefore does nothing no matter its value.) So there's no reason to keep the stack frame around when it has nothing more to contribute to events.
If you add volatile variables to the function bodies, you aren't just adding variables whose flow the compiler can analyse; you're adding slots that you've explicitly told the compiler could be accessed outside the normal control flow it can predict. The volatile qualifier warns the compiler that even though there's no obvious way for the variables to escape, something outside has a way to get their address and write to them at any time. So it can't shorten their lifetime, because it's been told that code outside the function might still try to write to that stack space; and obviously that means the stack frame needs to continue to exist for its entire declared lifespan.
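To see this in isolation, here is a minimal sketch (forward and work are invented names) of a function whose last action is a call:
/* tail.c — compile with: gcc -O2 -S tail.c, then inspect tail.s */
extern void work(int a, int b);

void forward(int a, int b) {
    work(a, b);   /* the final action: at -O2 gcc typically emits "jmp work"
                     here instead of a call/ret pair */
}
Compiling with -fno-optimize-sibling-calls (or at -O0) makes gcc emit the call and ret again; that flag is what controls this optimisation.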

C Program: how can result of 1st few lines of code change based on what happens much later in program?

I'm programming some C code using gcc with the -std=c89 switch on a Linux box. This C code communicates with an Oracle database using OCI drivers called through the OCILIB library. After downloading the necessary data from the database, the C program calls a C function (my_function) that performs a lot of complex math. The program flow looks like:
int main(void) {
    OCI_Connection *cn;
    OCI_Statement *st;
    OCI_Resultset *rs;
    ...
    /* FIRST CALL TO DB */
    OCI_Initialize(NULL, NULL, OCI_ENV_DEFAULT);
    cn = OCI_ConnectionCreate(...);
    st = OCI_StatementCreate(cn);
    OCI_Prepare(st, ...);
    OCI_Bindxxx(st, ...);
    OCI_Execute(st);
    printf(...); /* verify data retrieved from database is correct */
    /* SECOND CALL TO DB */
    OCI_Prepare(st, ...); /* different prepare stmt than above */
    OCI_Bindxxx(st, ...);
    OCI_Execute(st, ...);
    printf(...); /* verify data retrieved from database is correct */
    /* THIRD CALL TO DB */
    OCI_SetFetchSize(st, 200);
    OCI_Prepare(st, ...);
    OCI_Bindxxx(st, ...);
    OCI_Execute(st);
    rs = OCI_GetResultset(st);
    ...
    printf(...); /* verify data retrieved from database is correct */
    OCI_Cleanup();
    return EXIT_SUCCESS;
    my_function(...);
}
If I run the program as shown, the printf statements all confirm the correct data has been downloaded from the database into the C program. However, my_function has not executed.
If I then move the return EXIT_SUCCESS line of code from before my_function() to AFTER my_function(), re-compile the code and run it, the printf statements show that the data from the 1st call to the database is saved correctly in the C program, but the 2nd call's data is incorrect, and the printf statement from the 3rd call appears not to have printed anything.
There are no errors or warnings reported at compile and run time.
I'm not that experienced in C (or OCILIB), but for those who are, is there a logical explanation for how the placement of return EXIT_SUCCESS in the code can interact with code located well before it, and cause this?
In my simple mind, I think of the code as executing one line at a time, so if the code works up to line 123 (for example), a change to the code at line 456 shouldn't affect the results up to line 123 (e.g. when comparing before-versus-after the change to line 456). Am I missing something?
Another possibility is that your code is relying on the value of uninitialized variables, and that by adding the return before calling my_function() you are changing the way the compiler lays out variables in memory.
For example, an optimizing compiler might notice that the call to my_function() is unreachable because of the return, and might then avoid setting aside space for a temporary variable it would otherwise need for that call.
Make sure your compiler is set to warn about use of uninitialized variables (with gcc, -Wall enables -Wuninitialized).
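As a sketch of the kind of latent bug being described (names invented), an uninitialized local can appear to work until the surrounding stack layout shifts:
#include <stdio.h>

int main(void) {
    int value;                      /* never initialized */
    if (value == 42)                /* undefined behaviour: reads stack garbage */
        printf("the stars aligned\n");
    printf("value = %d\n", value);  /* output depends on prior stack contents */
    return 0;
}
What this prints depends entirely on what happened to be in that memory, which is exactly the kind of thing that changes when the compiler lays the function out differently.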
I'm guessing that your printf statements don't end in newlines; in that case, the output isn't flushed until main ends. This leaves room for my_function to corrupt stdio's buffers in the meantime. Use newlines or fflush and I'll bet this apparently anomalous behavior will cease.
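For example (a sketch; buf stands for whatever you're printing), either form gets the output out immediately:
#include <stdio.h>

void report(const char *buf) {
    printf("call returned: %s\n", buf);  /* trailing newline flushes a line-buffered stdout */

    printf("call returned: %s", buf);
    fflush(stdout);                      /* or flush explicitly, newline or not */
}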
If your code behaves as differently as you describe, it suggests that the version with the return before your call doesn't include your function in the executable image (the unreachable code is optimized out), which changes the memory layout. That in turn can affect your code if you have serious memory management issues.
Did you try reprinting the first batch of data after the second lot of database activity, to ensure that you still had the information you thought you had read successfully? Was your printing of the retrieved information thorough and complete?

C Macro to Override Variable Assignment with Function Call

Calling all C macro gurus...
Is there any way to write a C macro that will replace something like this:
my_var = 5;
with this:
setVar(&my_var, 5);
In other words, can I write a C macro that will override assignments for a specific variable (in the above example, my_var) and instead pass it to a function whose job it is to set that variable? If possible, I'd like to be able to hook into assignments of a specific variable.
EDIT: After thinking about this some more, I'm not sure it can be done. Even if you could come up with a macro to do it, setVar wouldn't necessarily know the type of the variable it's setting, so what would be the type of its second argument?
EDIT: The reason I'd like to hook assignments of specific variables is for use in a primitive debugger for some specialized embedded C code. It would be nice to be able to have a "watch list", essentially like you have in an IDE. My first instinct was to try to hook variable assignments with a C macro so you could just drop the macro into your code and have that variable "watched", but then again I've never really written a debugger before so maybe I'm going about that all wrong.
Not with the standard preprocessor. It cannot change the parsing of the file, only replace identifiers with a piece of code (and "=" isn't valid in an identifier).
If you're feeling adventurous, you can try to replace the "cpp" executable with a small script which pre-processes the source code. But that might wreak havoc with the debugging information (the file name and, if you're replacing one line of code with several, the line number information, too). The script would call "sed":
sed -E 's/my_var[[:space:]]*=[[:space:]]*([^;]+);/MY_VAR(my_var, \1);/' file.c > file_tmp.c
But your best bet is probably to put this into a script and simply run it over all your sources once. That changes the code itself, so you'll see exactly what is happening in your debugger.
#define setVar(_left_, _right_) (*(_left_) = (_right_)) /* the reverse mapping: setVar(&my_var, 5) becomes my_var = 5 */
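Going the other direction (hooking assignments so they call a function, as the question asks), one sketch that side-steps the type problem raised in the question's edit passes the address and size instead. All names here are invented, and __typeof__ is a GCC extension:
#include <stdio.h>
#include <string.h>

/* generic "watched write": copy the new value into place and log the change */
static void setVar(void *dst, const void *src, size_t size, const char *name) {
    memcpy(dst, src, size);
    printf("watch: %s updated (%zu bytes)\n", name, size);
}

/* SET(var, val) routes an assignment through setVar; #var stringizes the name */
#define SET(var, val)                              \
    do {                                           \
        __typeof__(var) tmp_ = (val);              \
        setVar(&(var), &tmp_, sizeof(var), #var);  \
    } while (0)

int main(void) {
    int my_var = 0;
    SET(my_var, 5);    /* instead of: my_var = 5; */
    printf("my_var = %d\n", my_var);
    return 0;
}
Call sites then write SET(my_var, 5); rather than my_var = 5;, which is a source change, but a mechanical one the sed approach above could perform.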
