Automatically printing the structures and variables in C

I am working with 4-5 .c files (around 2000 to 5000 lines each) which include several
headers. Currently I do not have any debug prints that would help me debug the program
during its execution.
My question is:
Is there a way (or an existing tool) to parse the .c files and add a new set of print
statements for all the variables in the current scope, just the way VC++ lets us see
Locals and Globals? I need them printed at each step. Also, pointers should be
dereferenced.
For example, let's say that at one point in the .c file there are 10 global variables and
3 locals. I need to generate the smart printfs to print these 13 variables at that point.
Later in the program, if there are 20 variables, I should be able to print those 20
variables, and so on.
The included header files contain all the relevant declarations for each of these
variables (which can be structures/pointers/arrays or some combinations thereof).
I was trying to achieve this via a Perl script. I generated the preprocessed file (.i
file) and tried parsing it with Perl to generate an individual print function for each
variable, but after half a day's effort I realized that it is just too time-consuming.
Is there a tool that already does this? If not, anything close to it would be good enough
(something I can post-process with Perl).
My goal is that at each step during program execution, I should be able to see the variables valid in that scope without having to invoke the debugger.
I am allowed to process the .c files and rewrite them.
Hope my question is clear, and thanks for your replies.

Assuming that your C program can be interpreted by Frama-C's value analysis, which is far from a given, you could use that to obtain a log of the values of all living variables at each point of the program or at points of interest.
Consider the following program:
int x = 1;
int main(void) {
    int l;
    x = 2;
    Frama_C_dump_each();
    l = 3;
    Frama_C_dump_each();
    {
        int blocklocal = l + 1;
        Frama_C_dump_each();
        x = blocklocal + 1;
        Frama_C_dump_each();
    }
    Frama_C_dump_each();
    return 0;
}
Running frama-c -val -slevel 1000000000 -no-results t.c on this program produces the log:
[value] Values of globals at initialization
x ∈ {1}
[value] DUMPING STATE of file t.c line 7
x ∈ {2}
=END OF DUMP==
[value] DUMPING STATE of file t.c line 9
x ∈ {2}
l ∈ {3}
=END OF DUMP==
[value] DUMPING STATE of file t.c line 12
x ∈ {2}
l ∈ {3}
blocklocal ∈ {4}
=END OF DUMP==
[value] DUMPING STATE of file t.c line 14
x ∈ {5}
l ∈ {3}
blocklocal ∈ {4}
=END OF DUMP==
[value] DUMPING STATE of file t.c line 16
x ∈ {5}
l ∈ {3}
=END OF DUMP==
I inserted the Frama_C_dump_each() statements manually, but you could also nudge the interpreter so that it dumps a state automatically at each statement.
For this approach to work, you need the entire source code of your program, including standard library functions (strlen(), memcpy(), …), and you must hard-code the values of the inputs at the beginning of the main() function. Otherwise, it will behave as the static analyzer that it really is, instead of behaving as a C interpreter.
You could also use the GUI to observe the values of variables in your program, but if the program is not linear, statements that are visited several times (because of function calls or loops) will show all the values that can be taken across those visits.

Related

Unable to read from offset 0 of a pointer in VST

For various reasons I am verifying a program that attempts to read from a pointer with an offset of 0:
int x = (*(z + 0)).ssl_int;
When verifying this in VST, the context has as an assumption that there is a value on the heap at this location:
data_at Tsh (Tunion _sslval noattr) (inr x2) z
However, when trying to run forward over the read, the proof gets stuck with the error:
It is not obvious how to move forward here. ...
But the same read goes through fine if I adjust my precondition to be
data_at Tsh (tarray (Tunion _sslval noattr) 1) [(inr x2)] z
or change the source code such that the read is instead:
int x = (*z).ssl_int;
Is there a way to verify this program without changing the source code or the program specification? How can I make the read go through?

How do nested scopes affect stack depth?

I tried compiling the following C code using MSVC into assembly, both with optimizations (CL TestFile.c /Fa /Ot) and without (CL TestFile.c /Fa), and the result is that they produce the same stack depth.
Why does the compiler use 8 bytes for each of the three variables x, y, and z when it knows it will use a maximum of 16 bytes? Instead of y$1 = 4 and z$2 = 8, could it not use y$1 = 4 and z$2 = 4, so that y and z share the same memory on the stack without any problems?
int main() {
    int x = 123;
    if (x == 123) {
        int y = 321;
    }
    else {
        int z = 234;
    }
}
; Parts of the assembly code
x$ = 0
y$1 = 4
z$2 = 8
main PROC
$LN5:
sub rsp, 24
; And so on...
Nested scopes do not affect stack depth. Per the C standard, nested scopes affect visibility of identifiers and do not impose any requirements on how a C implementation uses the stack, if it has one. A C compiler is permitted by the C standard to generate any code that produces the same observable behavior.
For the program shown in the question, the only observable behavior is to exit with a success status, so a good compiler should, when optimizing, generate a minimal program. For example, GCC 10.2 for x86-64 generates just an xor and a ret:
main:
xor eax, eax
ret
So does Clang 11.0.1. If MSVC does not, that is a deficiency in it. (However, it may be that the switches /Os and /Ot do not request optimization, or do not request much optimization; they may just express a preference for size or speed when used in conjunction with other optimization switches.)
Further, a good compiler should perform lifetime analysis of the use of objects, constructing a graph representing where nodes are places in code and are labeled with creations or uses of values and directed edges are potential program control flows (or some equivalent representation of the source code). Then assembler (or intermediate code) should be generated to implement the semantics required by the graph. If two sets of source code have equivalent graphs, the compiler should generate equivalent assembly (or intermediate code) for them (up to some reasonable ability to process complicated graphs) regardless of whether definitions in nested scopes were used or not.

Why some of the local variables are not listed in the corresponding stack frame when inspected using GDB?

I have a piece of code in C as shown below-
In a .c file-
1 custom_data_type2 myFunction1(custom_data_type1 a, custom_data_type2 b)
2 {
3 int c=foo();
4 custom_data_type3 t;
5 check_for_ir_path();
6 ...
7 ...
8 }
9
10 custom_data_type4 myFunction2(custom_data_type3 c, const void* d)
11 {
12 custom_data_type4 e;
13 struct custom_data_type5 f;
14 check_for_ir_path();
15 ...
16 temp = myFunction1(...);
17 return temp;
18 }
In a header file-
1 void CRASH_DUMP(int *i)
2 __attribute__((noinline));
3
4 #define INTRPT_FORCE_DUMMY_STACK 3
5
6 #define check_for_ir_path() { \
7 if (checkfunc1() && !checkfunc2()) { \
8 int sv = INTRPT_FORCE_DUMMY_STACK; \
9 ...
10 CRASH_DUMP(&sv);\
11 }\
12 }\
In an unknown scenario, there is a crash.
After processing the core dump using GDB, we get the call stack like -
#0 0x00007ffa589d9619 in myFunction1 [...]
(custom_data_type1=0x8080808080808080, custom_data_type2=0x7ff9d77f76b8) at ../xxx/yyy/zzz.c:5
sv = 32761
t = <optimized out>
#1 0x00007ffa589d8f91 in myFunction2 [...]
(custom_data_type3=<optimized out>, d=0x7ff9d77f7748) at ../xxx/yyy/zzz.c:16
sv = 167937677
f = {
...
}
If you look at the function myFunction1, there are three local variables: c, t, and sv (defined as part of the macro expansion). However, in frame 0 of the backtrace we see only two local variables, t and sv; the variable c is not listed.
Likewise, in the function myFunction2 there are three local variables: e, f, and sv (defined as part of the macro expansion). However, in frame 1 of the backtrace we see only two local variables, f and sv; the variable e is not listed.
Why is the behavior like this?
Shouldn't any non-static variable declared inside a function be placed on the call stack during execution, and therefore be listed by backtrace full? However, some of the local variables are missing in the backtrace. Could someone provide an explanation?
Objects local to a C function often do not appear on the stack because optimization during compilation often makes it unnecessary to store them there. In general, while an implementation of the C abstract machine may be viewed as storing objects local to a function on the stack, the actual implementation on a real processor, after compilation and optimization, may be very different. In particular:
An object local to a function may be created and used only inside a processor register. When there are enough processor registers to hold a function’s local objects, or some of them, there is no point in writing them to memory, so optimized code will not do so.
Optimization may eliminate a local object completely or fold it into other values. For example, given void foo(int x) { int t = 10; bar(x+2*t); … }, the compiler may merely generate code that adds an immediate value of 20 to x, with the result that neither 10 nor any other instantiation of t ever appears on stack, in a register, or even in the immediate operand of an instruction. It simply does not exist in the generated code because there was no need for it.
An object local to a function may appear on the stack at one point during a function’s code but not at others. And the places it appears may differ from place to place in the code. For example, with { int t = x*x; … bar(t); … t = x/3; … bar(t); … }, the compiler may decide to stash the first value of t in one place on the stack. But the second value assigned to t is effectively a separate lifetime, and the compiler may stash it in another place on the stack (or not at all, per the above). In a good implementation, the debugger may be aware of these different places and display the stored value of t while the program counter is in a matching section of code. And, while the program counter is not in a matching section of code, t may effectively not exist, and the debugger could report it is optimized out at that point.

Log whether a global variable has been read or written

Requirement:
Given a C program I have to identify whether the functions accessing global variables are reading them or writing them.
Example code:
#include <stdio.h>
/* global variable declaration */
int g = 20;
int main()
{
/* writing the global variable */
g = 10;
/* reading the global variable */
printf ("value of g = %d\n", g);
return 0;
}
Executing the above code I want to generate a log file in the below format:
1- Global variable g written in function main() "TIME_STAMP"
2- Global variable g read in function main() "TIME_STAMP"
Research:
I am certainly able to achieve this by doing a static analysis of the source code, using the following logic:
Go through the C code and identify the statements where the global
variable is accessed.
Then analyze each such statement to identify whether
it is a read or a write (checking whether the ++ or -- operator is
used with the global variable, or whether an assignment is made to the
global variable).
Add a log statement above the identified statement, which will execute
along with that statement.
This is not a proper implementation.
Some studies:
I have gone through how debuggers are able to capture information.
Some links on the internet:
How to catch a memory write and call function with address of write
Not completely answering your question, but to just log access you could do:
#include <stdio.h>
int g = 0;
#define g (*(fprintf(stderr, "accessing g from %s. g = %d\n", __FUNCTION__, g), &g))
void foo(void)
{
    g = 2;
    printf("g=%d\n", g);
}

void bar(void)
{
    g = 3;
    printf("g=%d\n", g);
}

int main(void)
{
    printf("g=%d\n", g);
    g = 1;
    foo();
    bar();
    printf("g=%d\n", g);
}
Which would print:
accessing g from main. g = 0
g=0
accessing g from main. g = 0
accessing g from foo. g = 1
accessing g from foo. g = 2
g=2
accessing g from bar. g = 2
accessing g from bar. g = 3
g=3
accessing g from main. g = 3
g=3
Below is the way I solved this problem:
I created a utility (in Java) which works as follows (the C program source file is the input to my utility):
Parse the file line by line, identifying the variables and functions.
Store the global variables in a separate container and look for lines using them.
For every line which accesses a global variable, analyze it to identify whether it is a read operation or a write
operation (e.g. =, +=, -= etc. are write operations).
For every such operation, instrument the code as suggested by @alk (https://stackoverflow.com/a/41158928/6160431); that in turn generates the log file when I execute the modified source file.
I am certainly able to achieve what I want, but I am still looking for a better implementation if anyone has one.
For further discussion, if anybody wants, we can have a chat.
I referred to the source code and algorithms from the tools below:
http://www.dyninst.org/
https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool

Compiling Tail-Call Optimization In Mutual Recursion Across C and Haskell

I'm experimenting with the foreign-function interface in Haskell. I wanted to implement a simple test to see if I could do mutual recursion. So, I created the following Haskell code:
module MutualRecursion where
import Data.Int
foreign import ccall countdownC::Int32->IO ()
foreign export ccall countdownHaskell::Int32->IO()
countdownHaskell::Int32->IO()
countdownHaskell n = print n >> if n > 0 then countdownC (pred n) else return ()
Note that the recursive case is a call to countdownC, so this should be tail-recursive.
In my C code, I have
#include <stdio.h>
#include "MutualRecursionHaskell_stub.h"

void countdownC(int count)
{
    printf("%d\n", count);
    if (count > 0)
        return countdownHaskell(count - 1);
}
int main(int argc, char* argv[])
{
    hs_init(&argc, &argv);
    countdownHaskell(10000);
    hs_exit();
    return 0;
}
Which is likewise tail recursive. I then build with the following makefile:
MutualRecursion: MutualRecursionHaskell_stub
	ghc -O2 -no-hs-main MutualRecursionC.c MutualRecursionHaskell.o -o MutualRecursion
MutualRecursionHaskell_stub:
	ghc -O2 -c MutualRecursionHaskell.hs
and compile with make MutualRecursion.
And... upon running, it segfaults after printing 8991.
Just as a test to make sure gcc itself can handle TCO in mutual recursion, I did
void countdownC2(int);

void countdownC(int count)
{
    printf("%d\n", count);
    if (count > 0)
        return countdownC2(count - 1);
}
void countdownC2(int count)
{
    printf("%d\n", count);
    if (count > 0)
        return countdownC(count - 1);
}
and that worked quite fine. It also works in the single-recursion case, in pure C and in pure Haskell.
So my question is: is there a way to indicate to GHC that the call to the external C function is a tail call? I'm assuming that the stack growth comes from the calls from Haskell into C and not the other way around, since the C code very clearly returns the result of a function call.
I believe cross-language C-Haskell tail calls are very, very hard to achieve.
I do not know the exact details, but the C runtime and the Haskell runtime are vastly different. The main factors for this difference, as far as I can see, are:
different paradigm: purely functional vs imperative
garbage collection vs manual memory management
lazy semantics vs strict one
The kinds of optimizations which are likely to survive across language boundaries given such differences are next to zero. Perhaps, in theory, one could invent an ad hoc C runtime together with a Haskell runtime so that some optimizations are feasible, but GHC and GCC were not designed in this way.
Just to show an example of the potential differences, assume we have the following Haskell code
p :: Int -> Bool
p x = x==42
main = if p 42
then putStrLn "A" -- A
else putStrLn "B" -- B
A possible implementation of the main could be the following:
push the address of A on the stack
push the address of B on the stack
push 42 on the stack
jump to p
A: print "A", jump to end
B: print "B", jump to end
while p is implemented as follows:
p: pop x from the stack
pop b from stack
pop a from stack
test x against 42
if equal, jump to a
jump to b
Note how p is invoked with two return addresses, one for each possible result. This is different from C, whose standard implementations use only one return address. When crossing boundaries the compiler must account for this difference and compensate.
Above I also did not account for the case when the argument of p is a thunk, to keep it simple. The GHC allocator can also trigger garbage collection.
Note that the above fictional implementation was actually used in the past by GHC (the so-called "push/enter" STG machine). Even though it is no longer in use, the "eval/apply" STG machine that replaced it is only marginally closer to the C runtime. I am not even sure GHC uses the regular C stack: I think it does not, and maintains its own.
You can check the GHC developer wiki to see the gory details.
While I am no expert in Haskell-C interop, I do not imagine a call from C to Haskell can be a straight function invocation; it most likely has to go through an intermediary to set up the environment. As a result, your call into Haskell actually consists of a call to this intermediary. That call was likely optimized by gcc, but the call from the intermediary to the actual Haskell routine was not necessarily optimized, so I assume this is what you are dealing with. You can check the assembly output to make sure.
