getting live variables with clang's python binding - c

I'm writing an analysis tool for C language in Python and I am using clang python binding for this purpose. So far it has been great and I was able to get AST of the C files I have and process them. Now I need to find live variable at a certain point of program. By live variables I mean the variables that are used after that line of code before begin re-assigned. I noticed that clang already has some related code in liveVariables.h.
Does anyone know how I can get it to work in the python binding?
How can I call liveVariables functions by having the AST and TranslationUnit?

Related

From Assembler to C-Compiler

i designed a small RISC in verilog. Which steps do I have to take to create a c compiler which uses my assembler-language? Or is it possible to modify a conventional compiler like gcc, cause I don't want to make things like linker, ...
Thanks
You need to use an unmodified C lexer+parser (often called the front end), and a modified code generation component (the back end) to do that.
Eli Bendersky's pycparser can be used as the front end, and Atul's mini C compiler can be used as inspiration for the code generating back end: http://people.cs.uchicago.edu/~varmaa/mini_c/
With Eli Bendersky's pycparser, all you need to do is convert the AST to a Control Flow Graph (CFG) and generate code from there. It is easier to start with supporting a subset of C than the full shebang.
The two tools are written in Python, but you didn't mention any implementation language preferences :)
I have found most open sourcen compilers (except clang it seems) too tightly coupled to easily modify the back end. Clang and especially GCC are not easy to dive into, nowhere NEAR as easy as the two above. And since Eli's parser does full C99 (it parses everything I've thrown at it) it seem like a nice front end to use for further development.
The examples on the Github project demonstrates most of the features of the project and it's easy to get started. The example that parses C to literal English is worth taking a look at, but may take a while to fully grok. It basically handles any C expression, so it is a good reference for how to handle the different nodes of the AST.
I also recommended the tools above, in my answer to this question: Build AST from C code

Turning strings into code?

So let's say I have a string containing some code in C, predictably read from a file that has other things in it besides normal C code. How would I turn this string into code usable by the program? Do I have to write an entire interpreter, or is there a library that already does this for me? The code in question may call subroutines that I declared in my actual C file, so one that only accounts for stock C commands may not work.
Whoo. With C this is actually pretty hard.
You've basically got a couple of options:
interpret the code
To do this, you'll hae to write an interpreter, and interpreting C is a fairly hard problem. There have been C interpreters available in the past, but I haven't read about one recently. In any case, unless you reallY really need this, writing your own interpreter is a big project.
Googling does show a couple of open-source (partial) C interpreters, like picoc
compile and dynamically load
If you can capture the code and wrap it so it makes a syntactically complete C source file, then you can compile it into a C dynamically loadable library: a DLL in Windows, or a .so in more variants of UNIX. Then you could load the result at runtime.
Now, what normally would lead someone to do this is a need to be able to express some complicated scripting functions. Have you considered the possibility of using a different language? Python, Scheme (guile) and Lua are easily available to add as a scripting language to a C application.
C has nothing of this nature. That's because C is compiled, and the compiler needs to do a lot of building of the code before the code starts running (hence receives a string as input) that it can't really change on the fly that easily. Compiled languages have a rigidity to them while interpreted languages have a flexibility.
You're thinking of Perl, Python PHP etc. and so called "fourth generation languages." I'm sure there's a technical term in c.s. for this flexibility, but C doesn't have it. You'll need to switch to one of these languages (and give up performance) if you have a task that requires this sort of string use much. Check out Perl's /e flag with regexes, for instance.
In C, you'll need to design your application so you don't need to do this. This is generally quite doable, as for its non-OO-ness and other deficiencies many huge, complex applications run on well-written C just fine.

Getting type information of C symbols

Let me try to give some background first. I'm working on some project with some micro controller (AVR) which I'm accessing through some interface (UART). I'm doing direct writes to its global variables and I'm also able to directly execute functions (write args, trigger execution, read back return values).
AVR code is in C compiled with GCC toolchain. PC, that is communicating with it, is running python code. As of now I have imported adress & size information into python easily by parsing 'objdump -x' output. Now what would greatly boost my development would be information about types of the symbols (types & sizes of structs elements, enums values, functions arguments & return values, ...).
Somehow this seemed like a common thing that people do daily, and I was naively expecting ready-made python tools at start. Well, not so easy. By now I've spend many hours looking into various ways how to accomplish that.
One approach would be to just parse the C code (using e.g. pycparser). But seems like I would have to at least 'pre-parse' the code to exclude various unsupported constructs and various ordering problems and so on. Also, in theory, the problem would be if compiler would do some optimizations, like struct or enum reordering and so on.
I've been also looking into various gcc, gdb and objdump options to get such information. Have spent some time looking for tools for extracting information from various debugging formats (dwarf, stabs).
The closest I get so far is to dump stabs debugging information with objdump -g option. This outputs C-like information, which I would then parse using pycparser or on my own.
But before I spent my time doing that, I decided to raise a question here, strongly hoping that someone will hit me with possibly totally different approach I just haven't think of.
There's a quite nice tool called c2ph that dumps a parsable descripton of the types and sizes (using debug info as the source)
To answer myself... this is what I found:
http://code.google.com/p/pydevtools/
Actually I knew about it before, but it didn't really work for me at first.
So basically I made it Python 3 compatible and did few other fixes/changes also - here you can get it all:
http://code.google.com/p/pydevtools/source/checkout
Actually there is some more code which actually uses this module, but it is not finished yet. I will probably add it when finished.

Abstract-syntax tree for subset of C

For teaching purpose we are building a javascript step by step interpreter for (a subset of) C code.
Basically we have : int,float..., arrays, functions, for, while... no pointers.
The javascript interpreter is done and allow us to explain how a boolean expression is evaluated, will show the variables stack...
For now, we are manually converting our C examples to some javascript that will run and build a stack of actions (affectation, function call...) that can later on be used to do the step by step stuff. Since we are limiting ourselves to a subset of C it's quite easy to do.
Now we would like to compile the C code to our javascript representation. All we need is a Abstract-syntax tree of the C code and the javascript generation is straightforward.
Do you know a good C-parser that could generate a such tree ? No need to be in javascript (but that would be perfect), any language is alright as this can be done offline.
I've looked at Emscripten ( https://github.com/kripken/emscripten ) but it's more a C=>javascript compiler and that's not what we want.
I've recently used Eli Bendersky's pycparser to mess with ASTs of C code. I think it'd work well for your purposes.
I think that ANTLR has a full C parser.
To do your translation task, I suspect you will need full symbol table support; you have to know what the symbols mean. Here most "parsers" will fail you; they don't build a full symbol table. I think ANTLR does not, but I could be wrong.
Our DMS Software Reengineering Toolkit with its C Front End provides a full C arser, and builds complete symbol tables. (You may not need it for your application, but it includes a full C preprocessor, too). It also provide control flow, data flow, points-to-analysis and call graph construction, all of which can be useful in translating C to whatever your target virtual machine is.

removing unneeded code from gcc andd mingw

i noticed that mingw adds alot of code before calling main(), i assumed its for parsing command line parameters since one of those functions is called __getmainargs(), and also lots of strings are added to the final executable, such as mingwm.dll and some error strings (incase the app crashed) says mingw runtime error or something like that.
my question is: is there a way to remove all this stuff? i dont need all these things, i tried tcc (tiny c compiler) it did the job. but not cross platform like gcc (solaris/mac)
any ideas?
thanks.
Yes, you really do need all those things. They're the startup and teardown code for the C environment that your code runs in.
Other than non-hosted environments such as low-level embedded solutions, you'll find pretty much all C environments have something like that. Things like /lib/crt0.o under some UNIX-like operating systems or crt0.obj under Windows.
They are vital to successful running of your code. You can freely omit library functions that you don't use (printf, abs and so on) but the startup code is needed.
Some of the things that it may perform are initialisation of atexit structures, argument parsing, initialisation of structures for the C runtime library, initialisation of C/C++ pre-main values and so forth.
It's highly OS-specific and, if there are things you don't want to do, you'll probably have to get the source code for it and take them out, in essence providing your own cut-down replacement for the object file.
You can safely assume that your toolchain does not include code that is not needed and could safely be left out.
Make sure you compiled without debug information, and run strip on the resulting executable. Anything more intrusive than that requires intimate knowledge of your toolchain, and can result in rather strange behaviour that will be hard to debug - i.e., if you have to ask how it could be done, you shouldn't try to do it.

Resources