Automated tracing of variable use within source code - C

I'm working with a set of speech processing routines (written in C) meant to be compiled with the mex command in MATLAB. There is a C function which I'm interested in accelerating using an FPGA.
The hardware takes the specified input parameters through input ports, treats the rest of the inputs as constants to be hard-coded, and passes a particular variable somewhere within the C function, say foo, to the output port.
I am interested in tracing the computation graph (unsure if this is the right term to use) of foo, i.e. how foo relates to intermediate computed variables, which in turn eventually depend on input parameters and hard-coded constants. This is to allow me to flatten the logic so it can be coded in a hardware description language, as well as to remove irrelevant logic which does not affect the value of foo. The catch is that some intermediate variables are global, so tracing is a headache.
Is there an automated tool which analyzes a given set of C headers and source files and provides a means of tracing how a specified variable is altered, with some kind of dependency graph of all variables used?

I think what you are looking for is a tool to do value analysis.
Among the tools available to do this, I think CodeSurfer is probably the best out there. Of course, it is also quite expensive, but if you are a student, they do have an academic license program. On the open-source side, Frama-C can also do this in a more limited fashion and has a much, much steeper learning curve. But it is free and will get you where you want to go.

Related

When should we care about cache misses?

I want to explain my question through a practical problem I met in my project.
I am writing a C library (which behaves like a programmable vi editor), and I plan to provide a series of APIs (more than 20 in total):
void vi_dw(struct vi *vi);
void vi_de(struct vi *vi);
void vi_d0(struct vi *vi);
void vi_d$(struct vi *vi);  /* note: '$' is not valid in a C identifier; illustrative only */
...
void vi_df(struct vi *, char target);
void vi_dd(struct vi *vi);
These APIs do not perform the core operations; they are just wrappers. For example, I can implement vi_de() like this:
void vi_de(struct vi *vi){
    vi_v(vi); //enter visual mode
    vi_e(vi); //press key 'e'
    vi_d(vi); //press key 'd'
}
However, if each wrapper is as simple as this, I have to write more than 20 similar wrapper functions.
So I considered implementing more complex wrappers to reduce their number:
void vi_d_move(struct vi *vi, vi_move_func_t move){
    vi_v(vi);
    move(vi);
    vi_d(vi);
}

static inline void vi_dw(struct vi *vi){
    vi_d_move(vi, vi_w);
}

static inline void vi_de(struct vi *vi){
    vi_d_move(vi, vi_e);
}
...
The function vi_d_move() is a better wrapper function: it converts a family of similar move operations into APIs, but not all of them. vi_f(), for example, needs another wrapper with a third argument char target.
That concludes the example picked from my project. The pseudo code above is simpler than the real case, but it is enough to show that:
the more complex the wrapper is, the fewer wrappers we need, and the slower they will be (they become more indirect or need to consider more conditions).
There are two extremes:
1. Use only one wrapper, complex enough to adopt all move operations and convert them into the corresponding APIs.
2. Use more than twenty small and simple wrappers; one wrapper per API.
For case 1, the wrapper itself is slow, but it is more likely to stay resident in the cache, because it is executed often (all APIs share it). It's a slow but hot path.
For case 2, the wrappers are simple and fast, but less likely to be resident in the cache. At least, the first time any API is called, a cache miss will happen (the CPU needs to fetch its instructions from memory, not from L1 or L2).
Currently, I have implemented five wrappers, each relatively simple and fast. This seems to be a balance — but only seems. I chose five just because I felt the move operations divide naturally into five groups. I have no idea how to evaluate this. I don't mean with a profiler; I mean, in theory, what main factors should be considered in such a case?
At the end of the post, I want to add more detail about these APIs:
1. These APIs need to be fast, because the library is designed as a high-performance virtual editor. The delete/copy/paste operations are designed to approach bare C code in speed.
2. A user program based on this library seldom calls all of these APIs, only some of them, and usually no more than 10 times each.
3. In the real case, these simple wrappers are about 80 bytes each, and would be no more than 160 bytes even if merged into a single complex one (but that would introduce more if-else branches).
4. As for the situation in which the library is used, I will take lua-shell as an example (a little off-topic, but some friends want to know why I care so much about its performance):
lua-shell is a *nix shell which uses Lua as its script language. Its command execution unit (which does fork(), exec()...) is just a C module registered into the Lua state machine.
lua-shell treats everything as Lua.
So, when the user inputs:
local files = `ls -la`
and presses Enter, the input string is first sent to lua-shell's preprocessor, which converts the mixed syntax into pure Lua code:
local file = run_command("ls -la")
run_command() is the entry point of lua-shell's command execution unit, which, as I said before, is a C module.
Now we can talk about libvi. lua-shell's preprocessor is the first user of the library I am writing. Here is the relevant code (pseudo):
#include "vi.h"

vi_loadstr("local files = `ls -la`");
vi_f(vi, '`');              // find the opening backtick
vi_x(vi);                   // delete it
vi_i(vi, "run_command(\""); // insert the call prefix
vi_f(vi, '`');              // find the closing backtick
vi_x(vi);                   // delete it
vi_a(vi, "\")");            // append the closing quote and parenthesis
The code above is part of lua-shell's preprocessor implementation.
After generating the pure Lua code, it feeds the result to the Lua state machine and runs it.
The shell user is sensitive to the interval between pressing Enter and seeing a new prompt, and in most cases lua-shell needs to preprocess scripts of larger size and with more complicated mixed syntax.
This is a typical situation where libvi is used.
I wouldn't care that much about cache misses (especially in your case) unless your benchmarks (with compiler optimizations enabled, i.e. compiled with gcc -O2 -mtune=native if using GCC) indicate that they matter.
If performance matters that much, enable more optimizations (perhaps compiling and linking your entire application or library with gcc -flto -O2 -mtune=native, that is, with link-time optimization), and hand-optimize only what is critical. You should trust your optimizing compiler.
If you are in the design phase, consider perhaps making your application multi-threaded or somehow concurrent and parallel. With care, this could speed it up more than cache optimizations would.
It is unclear what your library is about and what your design goals are. One possibility for adding flexibility is to embed an interpreter (like Lua, Guile, Python, etc.) in your application, hence configuring it through scripts. In many cases such an embedding can be fast enough (especially when the application-specific primitives are of a high enough level). Another (more complex) possibility is to provide metaprogramming abilities, perhaps through a JIT-compiling library like libjit or libgccjit (so you would, in effect, "compile" user scripts into dynamically produced machine code).
BTW, your question seems to focus on instruction cache misses. I would believe that data cache misses are more important (and less optimizable by the compiler), and that is why you would prefer e.g. vectors to linked lists (and more generally care about low-level data structures, favoring sequential, cache-friendly accesses).
(You could find a good video by Herb Sutter which explains that last point; I forget the reference.)
In some very specific cases, with recent GCC or Clang, adding a few __builtin_prefetch calls might slightly improve performance (by decreasing cache misses), but it could also harm it significantly, so I don't recommend using it in general.

Generating Define-Use Paths for C Code coverage analysis

How is it possible to generate uncovered Define-Use paths for C code (using e.g. GCC)?
As far as I have seen, this subject is treated only academically (unlike line coverage).
resource: http://whiteboxtest.com/Data-Flow-Testing.php
You need a tool that can determine, for every definition, all the possible uses (i.e., computes def-use pairs) in the code, and associate with each def-use pair the variable defined and the program locations (file, line, column) of the def and use points.
Then, for each def-use pair, you need to add instrumentation ("probes") to the program that records when that def-use pair gets exercised (usually near the use), as some kind of boolean variable specific to that pair.
Because there are a lot of these, it is useful to organize the individual booleans into a boolean array. (An obvious optimization to minimize the number of inserted probes: a basic block, when executed, will satisfy many def-use pairs, so a boolean recording execution of the basic block [block coverage] can stand in for a set of def-use pairs. I'm sure there are other similar optimizations.)
After running the program, one has to dump these boolean variables, compute the actual def-use coverage (e.g., including using the block coverage data), and then display it.
A standard scheme for modifying the program to do this is source-to-source program transformation. My paper "Branch Coverage for Arbitrary Languages Made Easy" shows how to do this with our DMS Software Reengineering Toolkit and its style of rewrite rules. The paper focuses on branch (block) coverage, but the instrumentation aspect is the same. A typical transformation rule looks like this:
rule mark_if_then_else(condition:expression; tstmt:statement; estmt:statement) =
  “if (\condition) \tstmt else \estmt;”
rewrites to
  “if (\condition)
     { visited[\new_place\(\tstmt\)] = 1;
       \tstmt }
   else
     { visited[\new_place\(\estmt\)] = 1;
       \estmt };”
This rule modifies an if-then-else construct to set "visited" booleans for each conditionally executed block (the then and else clauses), generating a new index for each new block. \xxxx means "an arbitrary code structure of syntax type ssss" if the transformation rule signature (the first line) declares xxxx:ssss. You can see more information on the precise syntax and meaning of the DMS rewrite rules here.
It turns out that getting def-use information is hard; you need what amounts to a compiler front end — thus OP's mention of GCC. GCC won't do source-to-source transformations, but you can get essentially the same effect with source-to-binary transformations, as gcov does, by modifying the GCC sources with procedural code that adds the probes. In general, though, GCC doesn't want to help you do this kind of custom instrumentation.
I don't know for sure, but I'm pretty confident Clang computes def-use information. It is possible to do source-to-source transformations with Clang, but I have no experience with that.
I do know that our DMS does compute def-use information for C and C++. That, together with its ability to do source-to-source transformations, would make building a def-use coverage tool technically straightforward.
(Not asked, but DMS also computes control flows, so one could also do path coverage straightforwardly.)
Then there is the problem of building a display tool. You need something that can show the def-use pairs and their status, probably associated with (or superimposed on) the code so that each def and use is easy to understand. So you need to record line- and column-precise information about the location of each def and use. I don't think you can get that from GCC; it doesn't have that information in the binary, but maybe it has it in its constructed AST. You can get column information from DMS and Clang (I think).

Top-down approach in C: Interface implemented via multiple C files

I am responsible for designing the software architecture of an embedded system in C90 (which is dictated by the target hardware's compiler). It shall be easy to build against a couple of targets (traditional testing, software-in-the-loop, the final hardware). Therefore I took a top-down approach, i.e., designed for an interface:
Once the data flows of the system (inputs, outputs, ...) were defined, I created generic interfaces in the form of .h files that need to be implemented by the targets.
For the sake of the question, let there be two:
imeasures.h --> Measures needed by the algorithm
icomm.h --> Data flow to and from the algorithm to other devices
For the production target, suppose that all the measures but one (e.g. engine speed) are taken using the ADCmeasures module, and that the last one (engine speed) is provided by the RS232comm module.
Question 1
Is it OK if imeasures.h is implemented using both ADCmeasures and RS232comm modules in the following form?
imeasures.h <--is implemented BY-- imeasuresImpl.c
imeasuresImpl.c --> calls functions from ADCmeasures.h and RS232comm.h
Switching targets would then imply changing imeasuresImpl and the modules it calls.
Question 2
Due to the overhead the previous method may introduce (which could be mitigated using inline functions, indeed), I also thought about a (less elegant?) form:
imeasures.h <-- is partially implemented by ADCmeasures.c
imeasures.h <-- is partially implemented by RS232comm.c
What pitfalls do you see? I can see that, for example, if imeasures.h consists of a single getter that returns a struct, I would have to partially fill the struct in both partial implementations. Or, in turn, provide different getters, thereby deciding beforehand on a layout of the implementation, which would break the top-down principle.
Thank you for your attention.
First, some assumptions on the situation and the requirements.
I assume that through imeasures.h you would preferably like to get an interface with a single get function which returns a structure nicely populated with the freshest measurements. While that alone is possible, you may accept some other functions, like a run which drives the processes necessary for the measurements, and an init to initialize things (by "possible" I mean that there are ways, which I have sometimes explored, to get by without these two latter functions).
As you say, I assume you would like to separate out as thin a hardware interface as possible, so you can more easily apply simulation for testing, and later have less to reimplement when porting to different hardware.
As the interface suggests, you would like to hide the split (that one of your measurements comes from RS232).
Solving with something like Q1, the architecture
Your take in Q1 seems an okay approach for laying down an architecture that meets these requirements. For Q2 I would say "forget it": I can't conceive of any reasonable solution that looks like that.
My approach, just like your Q1, would require at least three implementation files.
On top would be an imeasures.c (I would stick to this name, since that's the usual way of doing these things, and there is no good reason to do anything different here). This file would implement the whole imeasures.h interface, containing the logic for assembling the measurements and dispatching to the hardware-specific components. It would not contain anything hardware-specific itself.
An RS232comm.c (and .h) would realize the RS232 hardware interface. I would make this as generic as reasonable within the necessities of the requirements (for example, if it only needs to receive, I would implement only a receiver adequate for the project). The goal is something that meets the project's requirements yet, if needed, may be reused for other projects on the same (or similar) hardware.
An ADCcomm.c (and .h). Note that I did not name it ADCmeasures.c, for a good reason: I don't want anything specific to the actual measurements in here. Just like above: only what the requirements demand, but generic enough that it might be reused.
Following this, you will likely get an imeasures.c which does not need to be altered in any way for simulation (it has no hardware-specific code), so it can also be tested in that testing environment. You also get useful little hardware-specific components which you can reuse in new projects (in my case this happened quite frequently, as electrical engineers would often iterate on the same piece of hardware for later projects).
Usually you shouldn't be concerned about overhead. Design first, and optimize only where it is actually necessary. If you design well, you are even likely to end up with a better-performing end product, simply because you don't have to battle with messy performance code (or "I thought it would perform better" code) that takes your time away from recognizing the real bottlenecks, and from discovering better algorithms or optimizing the parts which actually need it.
Well, I hope this helps!

Reverse engineer "compiled" Perl vs. C?

I have a client who claims compiled C is harder to reverse engineer than pseudo-"compiled" Perl byte-code, or the like. Does anyone have a way to prove or disprove this?
I don't know too much about Perl, but I'll give some examples of why reversing code compiled to machine code is so ugly.
The ugliest thing about reverse engineering C code is that compilation removes all type information. This total lack of names and types is the worst part, IMO.
In a dynamically typed language the compiler needs to preserve much more of that information. In particular the names of fields/methods/..., since these are usually strings for which it is impossible to find every use.
There is plenty of other ugly stuff, such as whole-program optimization using different registers to pass parameters each time, or functions being inlined so that what was once a simple function appears in many places, often in slightly different forms due to optimizations.
The same registers and bytes on the stack get reused for different contents inside a function. It gets especially ugly with arrays on the stack, since you have no way to know how big an array is or where it ends.
Then there are micro-optimizations, which can get annoying. For example, I once spent more than 15 minutes reversing a simple function that was originally something like return x/1600, because the compiler decided that division is slow and rewrote the division by a constant into several multiplications, additions, and bitwise operations.
Perl is really easy to reverse engineer. The tool of choice is vi, vim, emacs or notepad.
That does raise the question of why they're worried about reverse engineering. It is normally more difficult to turn machine code back into something resembling the original source code than it is for byte-code, but for most nefarious activities that's irrelevant. If someone wants to copy your secrets or break your security, they can do enough without turning it back into a perfect representation of your original source code.
Reverse engineering code for a virtual machine is usually easier. A virtual machine is typically designed to be an easy target for the language, which means it typically represents the constructs of that language reasonably easily and directly.
If, however, you're dealing with a VM that wasn't designed for that particular language (e.g., Perl compiled to the JVM), that frequently puts you back much closer to working with code generated for real hardware; i.e., you have to do whatever is necessary to target a predefined architecture instead of designing the target to fit the source.
OK, there has been sufficient debate on this over the years, and mostly the results are never conclusive... mainly because it doesn't matter.
For a motivated reverse engineer, both will be the same.
If you are using pseudo-exe makers like perl2exe, then the result will be easier to "decompile" than compiled C, as perl2exe does not compile the Perl at all; it is just a bit "hidden" (see http://www.net-security.org/vuln.php?id=2464 ; this is really old, but the concept is probably still the same — I haven't researched it, so I don't know for sure, but I hope you get my point).
I would advise looking at which language is best for the job, so that maintenance and development of the actual product can be done sensibly and sustainably.
Remember you _cannot_ stop a motivated adversary; you need to make it more expensive to reverse the code than to write it from scratch.
These four measures should make it difficult (but again, not impossible):
[1] Insert noise code (random places, random code) which does pointless maths and complex data structure interaction (if done properly, this is a great headache if the purpose is to reverse the code rather than the functionality).
[2] Chain a few (different) code obfuscators over the source code as part of the build process.
[3] Apply a software protection dongle, which prevents code execution if the hardware is not present; this means physical access to the dongle's data is required before the rest of the reversing can take place: http://en.wikipedia.org/wiki/Software_protection_dongle
[4] There are always protectors (e.g. Themida, http://www.oreans.com/themida.php) which can protect a .exe after it has been built (regardless of how it was compiled).
... That should give the reverser enough of a headache.
But remember that all of this costs money, so you should always weigh up what it is you are trying to achieve, and then look at your options.
In short: both methods are equally insecure, unless you are using a non-compiling perl-to-exe maker, in which case the natively compiled EXE wins.
I hope this helps.
C is harder to decompile than byte-compiled Perl code. Any Perl code that has been byte-compiled can be decompiled; byte-compiled code is not machine code like in compiled C programs. Some others suggested using code obfuscation techniques, but those are just tricks to make code harder to read and won't affect the difficulty of decompiling the Perl source. The decompiled source may be harder to read, but there are many Perl de-obfuscation tools available, and even a Perl module:
http://metacpan.org/pod/B::Deobfuscate
Perl packing programs like PAR, PerlApp or Perl2Exe won't offer source code protection either. At some point the source has to be extracted so Perl can execute the script. Even packers like PerlApp and Perl2Exe, which attempt some encryption techniques on the source, can be defeated with a debugger:
http://www.perlmonks.org/?displaytype=print;node_id=779752;replies=1
It will stop someone from casually browsing your Perl code, but even the packer has to unpack the script before it can be run. Anyone who is determined can get the source code.
Decompiling C is a different beast altogether. Once compiled, it is machine code. With most C decompilers you end up with assembly code; some commercial C decompilers will take the assembly and try to generate equivalent C code but, unless it is a really simple program, are seldom able to recreate the original code.

How do call graphs resolve function pointers?

I am implementing a call graph program for C using a Perl script. I wonder how to resolve call graphs for function pointers using the output of objdump.
How do different call graph applications resolve function pointers?
Are function pointers resolved at run time, or can this be done statically?
EDIT
How do call graphs resolve cycles in static evaluation of a program?
It is easy to build an A-calls-B edge of a call graph when the call statement explicitly mentions B. It is much harder to handle indirect calls, as you've noticed.
Good static analysis tools form estimates of the contents of pointer variables by propagating pointer assignments/copies/arithmetic across program data flows (both inter- and intra-procedural ["global"]), using a variety of schemes that are often conservative ("you get too much").
Without such an estimate, you cannot have any idea what a pointer contains and therefore simply cannot make a useful prediction (well, you can use the ultimate conservative estimate that it will go anywhere, but I think you've already rejected that solution).
Our DMS Software Reengineering Toolkit has static control-flow/data-flow/points-to/call-graph analysis that has been applied to huge systems (~25 million lines) of C code and produced such call graphs. The machinery to do this is pretty complex, but you can find it under advanced topics in the compiler literature. I doubt you want to implement this in Perl.
This is easier when you have source code, because you at least know reliably what is code and what is not. You're trying to do this on object code, which means you can't even eliminate data.
Using function pointers is a way of choosing the actual function to call at runtime, so in general it isn't possible to know statically what will actually happen.
However, you could look at all the functions it is possible to call and perhaps show those in some way. Often the callbacks have a unique enough signature (though not always).
If you want to do better, you have to analyze the source code to see which functions are assigned to which pointers in the first place.