How to make use of Clang's AST? - c

I am looking at using Clang's AST for my C code to do some analysis over it. Some pointers on where to start, how to obtain Clang's AST, tutorials, or anything in this regard would be a great help!
I have been trying to find some, and I found this link, which was created 2 years back. But for some reason, it is not working for me. The sample code in the tutorial gives me too many errors, so I am not sure whether I built the code properly or something is wrong with the tutorial. I would be happy to start from some other page as well.

Start with the tutorial linked by sharth. Then go through Clang's Doxygen documentation, starting with SemaConsumer.
Read a lot of source code. Clang is a moving target. If you are writing tools based on Clang, you need to recognize that Clang is adding and fixing features daily, so you should be prepared to read a lot of code!

You probably want the stable C API provided in the libclang library, as opposed to the unstable C++ internal APIs that others have mentioned.
The best documentation to start with currently is the video/slides of the talk, "libclang: Thinking Beyond the Compiler" available on the LLVM Developers Meeting website.
However, do note that the stability of the API comes at a cost of comprehensiveness. You won't be able to do everything with this API, but it is much easier to use.

To obtain the AST and get to know the stages of the frontend, see the frontend chapter in the book "LLVM Core Libraries". The flow is basically as follows (for llvm-4.0.1; it should be similar for later versions):
cc1_main.cpp:cc1_main (ExecuteCompilerInvocation)
CompilerInstance.cpp:CompilerInstance::ExecuteAction
ParseAST.cpp:clang::ParseAST (Consumer->HandleTranslationUnit(S.getASTContext()))
CodeGenAction.cpp:HandleTranslationUnit
The last function handles the whole translation unit (top-level decls have already been handled at this point) and calls EmitBackendOutput to do the backend work. So this function is a good spot to do something with the complete AST before the backend output is emitted.
For how to manipulate the AST, Clang has a basic tutorial: http://clang.llvm.org/docs/RAVFrontendAction.html.
Also look at ASTDumper.cpp. It's the best example of visiting the AST.
Another good tutorial: https://jonasdevlieghere.com/understanding-the-clang-ast/ teaches you how to find a specific call expr in the AST via three different approaches.

I find ASTUnit::LoadFromCompilerInvocation() the easiest way to construct the AST.
This link may give you some ideas http://comments.gmane.org/gmane.comp.compilers.clang.devel/12471

Related

Does the libgit2 project or anyone else provide sample code to demonstrate the various libgit2 functions?

I am using libgit2 via an FFI in another language but I am having difficulty figuring out what various functions actually do (and I'd prefer not to resort to reading the source code unless absolutely necessary). Does anyone know where I can find some working code samples for some of the functions in libgit2?
There are many ways to get started with libgit2:
Reading a series of posts from Ben Straub, one of the core contributors
Taking a peek at the libgit2 examples, which are written in very easily understandable C
Reading through the headers, which describe each function, its expected parameters, and its output
Another angle would be to look at the libgit2 tests, which emphasize the behavioral contract of each function, or, if you're more familiar with other languages, to peek at the test code of some of the libgit2 bindings and then dive into the way they exercise libgit2:
C# -> LibGit2Sharp
Ruby -> Rugged
Python -> Pygit2
From https://libgit2.github.com/, there's the 101 Samples guide.

Is there any interactive tool on web to understand common code bases?

I am modifying the code for glibc 2.5. Since glibc is large and complex, I need a really good tool to see the interaction of different parts of the code. I am using Understand for this purpose, but Understand is only valid for 15 days; afterwards you have to buy it.
So my question is: are there sites on the web where you can interactively explore common code bases such as glibc, gcc, the Linux kernel, etc.? I mean sites where you can search for a function and then click on a function call to see its definition, along with other such useful features. I have used Koders.com, but it only displays the code and is not interactive.
OpenGrok is good. You have to host it yourself though, it's not on the web.

Code refactoring tools for C, usable on GNU/Linux? FOSS preferable

Variations of this question have been asked, but not specific to GNU/Linux and C. I use Komodo Edit as my usual editor, but I'd actually prefer something that can be used from the CLI.
I don't need C++ support; it's fine if the tool can only handle plain C.
I would really appreciate any direction, as I was unable to find anything.
I hope I'm not forced to 'roll' something myself.
NOTE: Please refrain from mentioning vim; I know it exists and what its capabilities are. I purposely avoid vim, which is why I use Komodo (or nano on the servers).
I don't think that a pure console refactoring tool would be nice to use.
I use Eclipse CDT on linux to write and refactor C-Code.
There is also Xrefactory for Emacs (http://www.xref.sk/xrefactory/main.html), if a non-console refactoring tool is OK for you as well.
C-xrefactory was an open source version of xrefactory, covering C and Java, made available on SourceForge by Marián Vittek under GPLv2.
For those interested, there's an actively maintained c-xrefactory fork on GitHub:
https://github.com/thoni56/c-xrefactory
The goal of the GitHub fork is to refactor c-xrefactory itself, add a test suite, and try to document the original source code (which is rather obscure). Maybe, in the future, also convert it into an LSP C language server and refactoring tool.
C-xrefactory works on Emacs; setup scripts and instructions can be found at the repository. Windows users can run it via WSL/WSL2.
You could consider coding a GCC plugin or a MELT extension (MELT is a domain specific language to extend GCC) for your needs.
However, such an approach would take you some time, because you'll need to understand some of GCC's internals.
For Windows only, and not FOSS, but you said "any direction..."
Our DMS Software Reengineering Toolkit, with its C Front End, can apply transformations to C source code. DMS can be configured to carry out custom, complex, reliable transformations, although the configuration isn't as easy as just typing a command like "refactor frazzle by doobaz".
One of the principal stumbling blocks is still the preprocessor. DMS can transform code that has preprocessor directives in typical places (around statements, expressions, if/for/while loop heads, declarations, etc.), but other "unstructured conditionals" give it trouble. You can run DMS after expanding the preprocessor directives out of existence or, more importantly, after expanding out just the ones that give it trouble, but mostly people don't like this because they prefer to keep their preprocessor directives. So it isn't perfect.
[Another answer suggested Coccinelle, which looks pretty good from my point of view. As far as I know, it doesn't handle preprocessor directives at all; I could be wrong and it might handle some cases as DMS does, but I'm sure it can't handle all the cases].
You don't want to consider rolling your own. Building a transformation/refactoring tool is much harder than you might guess if you have never tried it. You need full, accurate parsers for the (C) dialect of interest, and just that is pretty hard to get right. You need a preprocessor, symbol tables, flow analysis, transformation, code regeneration machinery, ... this stuff takes years of effort to build and get right. Trust me, been there, done that.

Erlang source code guide

I am interested in delving into Erlang's C source code and trying to understand what is going on under the hood. Where can I find info on the design and structure of the code?
First of all, you might want to have a look at Joe Armstrong's thesis, which introduces Erlang at a high level. It will give you an idea of the thinking behind the language. Then, you could focus on the Erlang Run Time System (erts). The erlang.erl module could be a good start. Then, I would focus on the applications that constitute the so-called minimal release: kernel and stdlib. Within stdlib, have a look at how behaviours are implemented. May I suggest the gen_server.erl module as a start?
A Guide To The Erlang Source
http://www.trapexit.org/A_Guide_To_The_Erlang_Source
The short answer is that there is no good guide. And the code is not very well documented.
I recommend finding someone in your neighbourhood that knows the code reasonably well, and buy them dinner in exchange for a little chat.
If you don't have the possibility to do that, then I recommend starting with the loader.
./erts/emulator/beam/beam_load.c
Some useful information can also be found by pretty-printing the BEAM representation. I don't know whether OTP supplies any way to do so, but the HiPE project has some cheats.
hipe:c(MODULE, [pp_beam]).
Should get you started.
(And I also recommend Joe's book.)
Pretty-printing BEAM code can be done with 'erlc -S', which is equivalent to hipe:c(M, [pp_beam]) mentioned by Daniel.
I also use erts_debug:df(Module). to disassemble the loaded BEAM code, i.e. the instructions actually being interpreted by the VM.
Sometimes I use a debugger. OTP ships tools that support gdb very well. See example usage at http://www.erlang.org/pipermail/erlang-questions/2008-September/037793.html
A little late to the party here. If you just download the source from GitHub, the internal documentation is really good. You have to generate some of it using make.
Once the documentation is built, you'll find that most of the relevant source is under /erts (Erlang Run Time System).
Edit: BEAM Wisdoms is also a really good guide but it may or may not be what you're after.

What is the good approach to build a new compiler?

I have some experience with the compiler phases, and I am interested in the Programming Languages & Compilers field. I hope somebody can explain a good approach to writing a new compiler from scratch for a new programming language (I mean the STEPS).
The first step is to read the Dragon Book.
It offers a good introduction to the whole field of compiler building, but also goes into enough detail to actually build your own.
As for the following steps I suggest following the chapters of the book. It's not written as a tutorial, but nevertheless offers much practical advice, which makes it an ideal hub for your own ideas and research.
Please don't use the Dragon Book, it's old and mostly outdated (and uses weird names for most of the stuff).
For books, I'd recommend Appel's Tiger Book, or Cooper's Engineering a Compiler. I'd strongly suggest using a framework like LLVM so you don't have to re-implement a bunch of stuff for code generation, etc.
Here is the tutorial for building your language with llvm: http://llvm.org/docs/tutorial/
I would look at integrating your language/front end with the GNU compiler framework.
That way you only (ONLY!) need to write the parser and the translator to GCC's portable intermediate representation. You get the optimiser, object code generation for the chip of your choice, the linker, etc. for free.
Another alternative would be to target the Java Virtual Machine (JVM); the virtual machine is well documented, and the JVM instruction set is much more straightforward than x86 machine code.
I managed to write a compiler without any particular book (though I had read some compiler books in the past, just not in any real detail).
The first thing you should do is play with any of the "compiler compiler" type tools (flex, bison, ANTLR, JavaCC) and get your grammar working. Grammars are mostly straightforward, but there are always nitty-gritty bits that get in the way and ruin everything, especially things like expressions, precedence, etc.
Some of the older, simpler languages are simpler for a reason: it makes the parsers "Just Work". Consider a Pascal variant that can be processed solely through recursive descent.
I mention this because without your grammar, you have no language. If you can't lex and parse it properly, you get nowhere very fast. And watching a dozen lines of sample code in your new language get turned into a slew of tokens and syntax nodes is actually really amazing, in a "wow, it really works" kind of way. It's almost literally an "it all works" or "none of it works" kind of thing, especially at the beginning. Once it actually works, you feel like you might be able to really pull it off.
And to some extent that's true, because once you get that part done, you have to get your fundamental runtime going. Once you get "a = 1 + 1" compiled, the bulk of the new work is behind you, and now you just need to implement the rest of the operators. It basically becomes an exercise in managing lookup tables and references, and having some idea where you are at any one time in the process.
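Tools like flex generate the lexer for you, but it helps to see what one does. Here is a minimal hand-written lexer in C, a sketch rather than anything from the answer: it splits a line such as "a = 1 + 1" into identifier, number, and operator tokens. The token kinds, the 31-character lexeme limit, and the operator set are arbitrary choices for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_IDENT, TOK_NUMBER, TOK_OP, TOK_END, TOK_ERROR } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];   /* lexeme, truncated if longer than 31 characters */
} Token;

/* Reads the next token starting at *src and advances *src past it. */
static Token next_token(const char **src)
{
    Token t = { TOK_END, "" };
    const char *p = *src;
    while (isspace((unsigned char)*p))
        p++;
    const char *start = p;
    if (*p == '\0') {
        *src = p;
        return t;
    }
    if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        t.kind = TOK_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        t.kind = TOK_NUMBER;
    } else if (strchr("+-*/=()", *p) != NULL) {
        p++;
        t.kind = TOK_OP;
    } else {
        p++;
        t.kind = TOK_ERROR;   /* a character the language doesn't know */
    }
    size_t len = (size_t)(p - start);
    if (len >= sizeof t.text)
        len = sizeof t.text - 1;
    memcpy(t.text, start, len);
    t.text[len] = '\0';
    *src = p;
    return t;
}

/* Tokenizes `src` into `out` (at most `max` tokens, including the final
   TOK_END) and returns the number of tokens written. */
size_t tokenize(const char *src, Token *out, size_t max)
{
    assert(src != NULL && out != NULL);
    size_t n = 0;
    while (n < max) {
        out[n] = next_token(&src);
        if (out[n++].kind == TOK_END)
            break;
    }
    return n;
}
```

Tokenizing "a = 1 + 1" this way yields six tokens: a, =, 1, +, 1, and the end marker. A parser then consumes this stream instead of raw characters.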
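To illustrate the recursive-descent style, here is a minimal sketch of a parser for integer arithmetic. Each grammar rule becomes one C function, and precedence falls out of which rule calls which: expr handles + and -, term handles * and /, and factor handles numbers, parentheses, and unary minus. The grammar and the function names are chosen for this example; it evaluates while parsing instead of building syntax nodes, and it does not check for integer overflow.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Parser state: current read position and a sticky error flag. */
typedef struct { const char *p; int error; } Parser;

static void skip_spaces(Parser *ps)
{
    while (isspace((unsigned char)*ps->p))
        ps->p++;
}

static long parse_expr(Parser *ps);

/* factor := NUMBER | '(' expr ')' | '-' factor */
static long parse_factor(Parser *ps)
{
    skip_spaces(ps);
    if (*ps->p == '(') {
        ps->p++;
        long v = parse_expr(ps);
        skip_spaces(ps);
        if (*ps->p == ')') ps->p++; else ps->error = 1;
        return v;
    }
    if (*ps->p == '-') {
        ps->p++;
        return -parse_factor(ps);
    }
    if (isdigit((unsigned char)*ps->p)) {
        char *end;
        long v = strtol(ps->p, &end, 10);
        ps->p = end;
        return v;
    }
    ps->error = 1;   /* unexpected character or end of input */
    return 0;
}

/* term := factor { ('*' | '/') factor }; binds tighter than + and - */
static long parse_term(Parser *ps)
{
    long v = parse_factor(ps);
    for (;;) {
        skip_spaces(ps);
        char op = *ps->p;
        if (op != '*' && op != '/')
            return v;
        ps->p++;
        long rhs = parse_factor(ps);
        if (op == '*') v *= rhs;
        else if (rhs == 0) { ps->error = 1; return 0; }   /* division by zero */
        else v /= rhs;
    }
}

/* expr := term { ('+' | '-') term }; the loop makes these left-associative */
static long parse_expr(Parser *ps)
{
    long v = parse_term(ps);
    for (;;) {
        skip_spaces(ps);
        char op = *ps->p;
        if (op != '+' && op != '-')
            return v;
        ps->p++;
        long rhs = parse_term(ps);
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Evaluates `src`; sets *ok to 0 on a syntax error or division by zero. */
long evaluate(const char *src, int *ok)
{
    assert(src != NULL && ok != NULL);
    Parser ps = { src, 0 };
    long v = parse_expr(&ps);
    skip_spaces(&ps);
    *ok = !ps.error && *ps.p == '\0';   /* all input must be consumed */
    return v;
}
```

For example, "1 + 2 * 3" evaluates to 7 and "10 - 4 - 3" to 3, because the loops in parse_term and parse_expr apply operators left to right within each precedence level.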
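As a sketch of what the runtime end of "a = 1 + 1" can look like, here is a tiny stack machine plus the kind of lookup table the answer mentions, mapping variable names to storage slots. The opcodes, the fixed table and stack sizes, and the names symbol_slot and vm_run are all hypothetical; a real code generator would emit the instructions by walking the syntax tree.

```c
#include <assert.h>
#include <string.h>

#define MAX_VARS  16
#define STACK_MAX 64

/* Hypothetical instruction set for a tiny stack machine. */
typedef enum { OP_PUSH, OP_LOAD, OP_STORE, OP_ADD, OP_HALT } OpCode;
typedef struct { OpCode op; long arg; } Instr;   /* arg: constant or variable slot */

/* Lookup table mapping variable names to storage slots (linear search). */
typedef struct { const char *names[MAX_VARS]; int count; } SymbolTable;

/* Returns the slot for `name`, adding it on first use; -1 if the table is full.
   Only the pointer is stored, so the caller must keep the string alive. */
int symbol_slot(SymbolTable *st, const char *name)
{
    for (int i = 0; i < st->count; i++)
        if (strcmp(st->names[i], name) == 0)
            return i;
    if (st->count == MAX_VARS)
        return -1;
    st->names[st->count] = name;
    return st->count++;
}

/* Executes `code` until OP_HALT, reading and writing variables in `vars`. */
void vm_run(const Instr *code, long vars[MAX_VARS])
{
    long stack[STACK_MAX];
    int sp = 0;   /* number of values currently on the stack */
    for (const Instr *ip = code; ip->op != OP_HALT; ip++) {
        switch (ip->op) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = ip->arg;
            break;
        case OP_LOAD:
            assert(sp < STACK_MAX);
            stack[sp++] = vars[ip->arg];
            break;
        case OP_STORE:
            assert(sp >= 1);
            vars[ip->arg] = stack[--sp];
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];   /* pop right operand, add into left */
            break;
        case OP_HALT:
            break;
        }
    }
}
```

For a = 1 + 1, a code generator would emit PUSH 1, PUSH 1, ADD, STORE slot(a), and finally HALT. Each further operator is then one more opcode and one more case in the switch, which is why the rest of the work mostly comes down to tables and bookkeeping.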
You can run out on your own with a brand new syntax, innovative runtime, etc. But if you have the time, it's probably best to do a language that's already been done, just to understand and implement all of the steps, and think about if you were writing the language you really want, how you would do what you're doing with this existing one differently.
There are a lot of mechanics to compiler writing and just doing the process successfully once will give you a lot more confidence when you want to come back and do it again with your own, new language.

Resources