I am working on a tool that takes the LLVM IR and modifies it. I'm interested in allowing the programmer to give hints to the compiler. For example, he can give the hint that a particular loop is compute intensive. For this purpose, one thing that comes to my mind is to use a pragma. So my question is, how can we make the pragmas work? Can I have the pragma information there in the LLVM IR? What are the options for such kind of task?
This question can refer to several different things:
If you're looking to understand how to implement pragma, take a look at how Clang does it. I.e. what various pragma directives are translated to.
If you want to understand the existing hints (for instance inlinehint, byval etc.), look at attributes - for example Function Attributes.
If you want something more flexible and proprietary, you can use metadata. LLVM itself uses it for various purposes, but in your own compiler you're very free in what you can do with it. Hints to the compiler are one possible application.
Related
I'm trying to write a language runtime (and a language itself) that is similar to .NET or to the JVM. It's got a form of bytecode that is custom.
What I want is a way to translate said bytecode to actual, runnable machine code. So, because I'm not wanting to write such a translator myself (this is more of a toy project/personal side project) I want to find a good JIT library to use.
Here's what I want the library to do:
The library should be as easy to use as possible (toy project and I don't really have much experience here)
The library should support at least x86_64 (development machine), though preferably it should cover other architectures as well
The library should preferably do some low level optimizations (register tracking and allocation, reducing memory accesses etc); those optimizations shouldn't be very expensive to do though (I will myself do other optimizations to e.g. remove virtual calls and convert them to direct ones, for example). I can accept a library with no optimization if it's easiest to use though.
The library must have an interface that is usable from C (preferred) or C++ (acceptable).
I will use Boehm GC for garbage collection, if it matters (probably doesn't, but just in case). Maybe a compacting GC would be nice, but I guess I shouldn't combine the questions...
I would suggest llvm. There are some great tutorials on how to implement your own language with it and basic stuff is not too complicated. You also get the option to do a lot of more advanced stuff later on. As a bonus not only can you use JIT but you can also statically compile and optimize your binaries. LLVM also does have a C interface and can target all common CPU architectures and even a lot of more obscure ones.
Variations of this question have been asked, but not specific to GNU/Linux and C. I use Komodo Edit as my usual Editor, but I'd actually prefer something that can be used from CLI.
I don't need C++ support; it's fine if the tool can only handle plain C.
I really appreciate any direction, as I was unable to find anything.
I hope I'm not forced to 'roll' something myself.
NOTE: Please refrain from mention vim; I know it exists and what its capabilities are. I purposefully choose to avoid vim, which is why I use Komodo (or nano on the servers).
I don't think that a pure console refactoring tool would be nice to use.
I use Eclipse CDT on linux to write and refactor C-Code.
There exists also Xrefactory for Emacs http://www.xref.sk/xrefactory/main.html
if a non console refactoring tool is o.k for you as well.
C-xrefactory was an open source version of xrefactory, covering C and Java, made available on SourceForge by Marián Vittek under GPLv2.
For those interested, there's an actively maintained c-xrefactory fork on GitHub:
https://github.com/thoni56/c-xrefactory
The goal of the GitHub fork is to refactor c-xrefactory itself, add a test suite, and try to document the original source code (which is rather obscure). Maybe, in the future, also convert it into an LSP C language server and refactoring tool.
C-xrefactory works on Emacs; setup scripts and instructions can be found at the repository. Windows users can run it via WSL/WSL2.
You could consider coding a GCC plugin or a MELT extension (MELT is a domain specific language to extend GCC) for your needs.
However, such approach would take you some time, because you'll need to understand some of GCC internals.
For Windows only, and not FOSS but you said "any direction..."
Our DMS Software Reengineering Toolkit" with its C Front End can apply transformations to C source code. DMS can be configured to carry out custom, complex reliable transformations, although the configuration isn't as easy as typing just a command like "refactor frazzle by doobaz".
One of the principal stumbling blocks is still the preprocessor. DMS can transform code that has preprocessor directives in typical places (around statements, expressions, if/for/while loop heads, declarations, etc.) but other "unstructured conditionals" give it trouble. You can run DMS by expanding the preprocessor directives out of existence, or more imporantly, expanding out the ones that give it trouble, but mostly people don't like this because they prefer to keep thier preprocessor directives. So it isn't perfect.
[Another answer suggested Concinelle, which looks pretty good from my point of view. As far as I know, it doesn't handle preprocessor directives at all; I could be wrong and it might handle some cases as DMS does, but I'm sure it can't handle all the cases].
You don't want to consider rolling your own. Building a transformation/refactoring tool is much harder than you might guess having never tried it. You need full, accurate parsers for the (C) dialect of interest and just that is pretty hard to get right. You need a preprocessor, symbol tables, flow analysis, transformation, code regeneration machinery, ... this stuff takes years of effort to build and get right. Trust me, been there, done that.
I'm implementing a new back-end to LLVM, starting with the CBackend target.
The end goal is to use "llc" to generate source transforms of input C code.
However, there are a number of optimizations I'd like to make, which don't seem to be very well supported within this context.
The LLVM object code is very low level, and I have to inspect it to re-discover what's actually going on. This would be a lot simpler to do at the AST level.
However, it appears that the AST level is a Clang-internal construct, and there's no easy way to plug into this.
Do I have to inspect the LLVM object code and reverse-engineer the higher-level flow myself? (Does each back-end have to do this? That seems wasteful!)
In general, you cannot reverse-engineer everything. So, you have only two possibilities:
Do everything on clang AST level.
Emit additional information (e.g. via metadata) which might help you to recover some aspects of the input source.
But really, you shouldn't do any source-to-source transform on LLVM IR level, it's a wrong tool for a given target. You can surely plug to AST level. E.g. clang sources contains a rewriter which turns ObjC code into plain C.
I am looking at making use of the Clang's AST for my C code and do some analysis over the AST. Some pointers on where to start, how to obtain the Clang's AST, tutorials or anything in this regard will be of great help!!!
I have been trying to find some and I got this link which was created 2 years back. But for some reason, it is not working for me. The sample code, in the tutorial, gives me too many errors. So I am not sure, if I build the code properly or some thing is wrong with the tutorial. But I would be happy to start from some other page as well.
Start with the tutorial linked by sharth. Then go through Clang's Doxygen. Start with SemaConsumer.
Read a lot of source code. Clang is a moving target. If you are writing tools based on clang, then you need to recognize that clang is adding and fixing features daily, so you should be prepared to read a lot of code!
You probably want the stable C API provided in the libclang library, as opposed to the unstable C++ internal APIs that others have mentioned.
The best documentation to start with currently is the video/slides of the talk, "libclang: Thinking Beyond the Compiler" available on the LLVM Developers Meeting website.
However, do note that the stability of the API comes at a cost of comprehensiveness. You won't be able to do everything with this API, but it is much easier to use.
To obtain the AST as well as get to know stages of the frontend, there is a frontend chapter in the book "LLVM core libraries". Basically it has such a flow (in the case of llvm-4.0.1 and should similar for later versions):
cc1_main.cpp:cc1_main (ExecuteCompilerInvocation)
CompilerInstance.cpp:CompilerInstance::ExecuteAction
ParseAST.cpp:clang::ParseAST (Consumer>HandleTranslationUnit(S.getASTContext())
CodeGenAction.cpp:HandleTranslationUnit
The last function handles the whole translation unit(top level decls are already handled at this point), and calls EmitBackendOutput to do backend stuff. So this function is a good spot where you can do something with the complete AST and before emitting backend output.
In terms of how to manipulate the AST, clang has some basic tutorial on this: http://clang.llvm.org/docs/RAVFrontendAction.html.
Also look at ASTDumper.cpp. It's the best example of visiting the AST.
Another good tutorial: https://jonasdevlieghere.com/understanding-the-clang-ast/ teaches you how to find a specific call expr in the AST via three different approaches.
I find this ASTUnit::LoadFromCompilerInvocation() fn as the most easiest way to construct the AST.
This link may give you some ideas http://comments.gmane.org/gmane.comp.compilers.clang.devel/12471
This is a question I've come across repeatedly, usually concerning plug ins, but recently I came across it trying to hammer out some build system issues. My concern is primarily for *nix based systems, but I suppose it applies to windows as well.
The question is, what is the minimum amount of information necessary to do dynamic linking? I know linux distributions like Debian have simply an 'i686', which is enough. However, I suppose there is some implicit information here, and I probably won't be able to do dynamic linking of any shared object as long as they're compiled using -march=i686, will I?
So what must be matched correctly for me to be able to load a shared object successfully? I know that for c++ even the compiler (and sometimes version) must match due to name mangling, but I was kind of hoping this wasn't the case for c.
Any thoughts appreciated.
Edit:
Neil's answer made me realize I'm not really talking about dynamic linking, or rather, the question is two-fold,
what's needed for static linking, and
what's needed for dynamic linking
I have higher hopes for the first I guess.
Well at minimum, the code must have been compiled for the same processor family, and you need to know the names of the library and the function. On top of that, you need the same ABI. You should be aware that despite what people think, the C Standard does not specify an ABI and it is entirely possible for two C compilers (or versions of the same compiler) to adhere to the standard, run on the same platform, but have different ABIs.
As for exactly specifying architecture details - I must admit I've never done it. Are you planning on distributing binary libraries on different Linux variants?