Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
We all know that C compilers spit out assembly.
However I am doing research where my tool only accepts a narrow subset of ANSI C.
Are there any C-to-C translators out there that can inline functions or flatten structs, but write out C code?
If you know of any other tools that could simplify C code, let me hear about them.
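To make the request concrete, here is a hand-written before/after sketch of the two transformations mentioned: inlining a function and flattening a struct into scalars. It does not come from any particular tool. The names `point`, `dot`, and the `_flattened` naming scheme are purely illustrative.

```c
/* Before: the kind of input a C-to-C simplifier would receive. */
struct point { int x; int y; };

static int dot(struct point a, struct point b) {
    return a.x * b.x + a.y * b.y;
}

int norm2_original(struct point p) {
    return dot(p, p);
}

/* After (written by hand here): dot() has been inlined, and the
   struct parameter has been flattened into two scalar parameters. */
int norm2_flattened(int p_x, int p_y) {
    int dot_result = p_x * p_x + p_y * p_y; /* body of dot(p, p) */
    return dot_result;
}
```

Both versions compute the same value; the second uses only scalar ints and no calls, which is the kind of narrow subset a research tool is more likely to accept.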
LLVM supports something like this: historically it shipped a C backend that could emit C code from LLVM IR.
If you do not require the resulting C code to be particularly readable, you could use your regular compiler to produce a binary executable, and then use a decompiler to produce C code from the binary. The decompiler will most likely not be able to "deinline" the functions that the compiler inlined. I'm not sure about the structs, but if you compile without debugging symbols and use a not-too-sophisticated decompiler, it might not detect the structs at all.
As far as I can tell from various sources on the Internet, Clang can translate its AST back to C.
The old MIT project C2C (which was available via FTP for some time) and the newer Cilk let you run the C->AST->C process.
Cilk and Cilk++ are actively maintained. They include a very good ANSI C parser.
Our DMS Software Reengineering Toolkit and its C Front End can do this.
DMS provides generic machinery for parsing, building ASTs and symbol tables, and generally analyzing ASTs, plus specific analyzers for control flow, data flow, points-to, and value ranges. It can also transform ASTs arbitrarily, either procedurally or using patterns, and regenerate source text including comments. DMS's ability to process multiple compilation units at the same time allows for global analyzers and transformations that affect multiple compilation units simultaneously.
DMS's C Front End specializes all this for C (DMS has front ends for a variety of other languages). It handles a variety of dialects, including ANSI, GCC 3/4, MS Visual C and Green Hills C; it can be customized for other dialects as needed.
DMS has been used for a variety of C analysis/transformation projects, including analyzing a 26 million line software system.
An interesting application of DMS is to instrument C source to catch pointer errors when they occur (rather than suffering a long-delayed crash); see our CheckPointer tool. This tool reads the source code, inserts extra code to check each pointer access, and then writes out the results.
In the process of doing this, it normalizes the C code to a simplified subset to get rid of lots of special cases. This normalization may be pretty close to the kind of thing OP wants to do.
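As a rough illustration of that kind of normalization (hand-written here, not actual CheckPointer output), compare a compact copy loop with a version that has one side effect per statement and a check before every dereference. The `check_ptr` helper and the extra bounds parameters are hypothetical stand-ins for the per-pointer metadata a real instrumentation tool would maintain.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical runtime check: abort at the first access outside [lo, hi). */
static void check_ptr(const int *ptr, const int *lo, const int *hi) {
    if (ptr < lo || ptr >= hi) {
        fprintf(stderr, "pointer error detected before dereference\n");
        abort();
    }
}

/* Original: compact C with side effects buried inside expressions. */
void copy_original(int *dst, const int *src, int n) {
    while (n-- > 0)
        *dst++ = *src++;
}

/* Normalized by hand: one side effect per statement, explicit
   temporaries, and a check before every dereference. The bounds
   parameters stand in for metadata a real tool would track itself. */
void copy_normalized(int *dst, const int *src, int n,
                     const int *dst_lo, const int *dst_hi,
                     const int *src_lo, const int *src_hi) {
    while (n > 0) {
        int value;
        n = n - 1;
        check_ptr(src, src_lo, src_hi);
        value = *src;
        check_ptr(dst, dst_lo, dst_hi);
        *dst = value;
        src = src + 1;
        dst = dst + 1;
    }
}
```

The normalized form is longer, but every statement does exactly one thing, which is what makes it easy for a later tool (or an instrumentation pass) to handle without special cases.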
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
In recent months, I have been seeing mentions of "LLVM" all over the place. I've looked it up, but the description of a "modern compiler infrastructure" doesn't really tell me anything. I can't find much about it, other than some mention of a C compiler that comes along with it (which doesn't seem to be any different from any other C compiler out there).
Is there some difference between this LLVM thing and any other compiler, say, GCC? Or is it an over-hyped replacement benefiting from being newer than the competition?
There is some academic literature on the matter; I recommend the AOSA (The Architecture of Open Source Applications) book chapter on it, written by the principal author, Chris Lattner.
LLVM is a collection of libraries built to support compiler development and related tasks. Each library supports a particular component in a typical compiler pipeline (lexing, parsing, optimizations of a particular type, machine code generation for a particular architecture, etc.). What makes it so popular is that its modular design allows its functionality to be adapted and reused very easily. This is handy when you're developing a compiler for an existing language to target a new hardware architecture (you only have to write the hardware-specific components; all the lexing, parsing, machine-independent optimization, etc. are handled for you), or developing a compiler for a new language (all the back-end stuff is handled for you), or when you're doing something compiler-adjacent (like analyzing source code, embedding a language in a larger application, etc.).
In order to support this, LLVM employs a pretty sophisticated internal representation (called the LLVM IR, creatively enough) that is basically assembly language for a theoretical hardware architecture designed to make targeting it with a compiler very easy. Most of the LLVM libraries take the IR in, operate on it, and output the modified IR, supporting the project's aim of modularity. This is in contrast to GCC, which (historically, I haven't checked recently) has a less complete IR and thus the separate phases of compilation are very tightly coupled because they have to share a lot of information.
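To illustrate the "IR in, IR out" idea in miniature (this is a toy, not LLVM's real IR or its C API), here is a three-address IR in C with two independent passes, constant folding and dead-code elimination. Both passes share one function signature, so they can be chained like a pipeline. The opcode set, the 32-register limit, and the assumption that each register is assigned only once are simplifications made for the sketch.

```c
#include <stddef.h>

#define MAX_REGS 32
#define MAX_CODE 32

enum opcode { OP_CONST, OP_ADD, OP_MUL, OP_RET };

struct instr {
    enum opcode op;
    int dst;   /* destination register (unused by OP_RET) */
    int a, b;  /* operand registers; for OP_CONST, a holds the literal */
};

struct function {
    struct instr code[MAX_CODE];
    size_t len;
};

/* Every pass has the same shape: it takes IR and leaves modified IR. */
typedef void (*pass_fn)(struct function *);

/* Constant folding: ADD/MUL of two known constants becomes a CONST.
   Assumes each register is written at most once (SSA-like). */
void fold_constants(struct function *f) {
    int known[MAX_REGS] = {0}, value[MAX_REGS] = {0};
    for (size_t i = 0; i < f->len; i++) {
        struct instr *in = &f->code[i];
        if (in->op == OP_CONST) {
            known[in->dst] = 1;
            value[in->dst] = in->a;
        } else if ((in->op == OP_ADD || in->op == OP_MUL) && known[in->a] && known[in->b]) {
            int v = in->op == OP_ADD ? value[in->a] + value[in->b]
                                     : value[in->a] * value[in->b];
            in->op = OP_CONST;
            in->a = v;
            in->b = 0;
            known[in->dst] = 1;
            value[in->dst] = v;
        }
    }
}

/* Dead-code elimination: walk backwards from RET, keep only
   instructions whose result is eventually used. */
void eliminate_dead(struct function *f) {
    int used[MAX_REGS] = {0}, keep[MAX_CODE] = {0};
    for (size_t i = f->len; i-- > 0;) {
        const struct instr *in = &f->code[i];
        if (in->op == OP_RET || used[in->dst]) {
            keep[i] = 1;
            if (in->op == OP_RET) {
                used[in->a] = 1;
            } else if (in->op != OP_CONST) {
                used[in->a] = 1;
                used[in->b] = 1;
            }
        }
    }
    size_t out = 0;
    for (size_t i = 0; i < f->len; i++)
        if (keep[i]) f->code[out++] = f->code[i];
    f->len = out;
}

void run_pipeline(struct function *f, const pass_fn *passes, size_t n) {
    for (size_t i = 0; i < n; i++) passes[i](f);
}

/* Tiny interpreter, used to check that the passes preserve meaning. */
int run(const struct function *f) {
    int reg[MAX_REGS] = {0};
    for (size_t i = 0; i < f->len; i++) {
        const struct instr *in = &f->code[i];
        switch (in->op) {
        case OP_CONST: reg[in->dst] = in->a; break;
        case OP_ADD:   reg[in->dst] = reg[in->a] + reg[in->b]; break;
        case OP_MUL:   reg[in->dst] = reg[in->a] * reg[in->b]; break;
        case OP_RET:   return reg[in->a];
        }
    }
    return 0;
}
```

Neither pass knows the other exists; each only agrees on the IR format. That decoupling, scaled up enormously, is what lets LLVM's front ends, optimizers, and back ends be mixed and matched.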
Clang is the flagship compiler built on the LLVM framework.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
I decided to start studying compiler theory, but the problem is that I want a compiler (for any language) that lets me track each of the following:
lexical analyzer output.
syntax tree.
intermediate representation.
code generation.
I don't care about optimization right now.
I am aware of some questions similar to mine about Clang and GCC, and I understand that both of them do lexical and syntax analysis on the fly.
I just want any compiler for any language, as long as the compiler itself is written in C and runs on Ubuntu x64.
I am not sure you have the right approach if you want to learn about compilation techniques for C specifically. And C is not the best language to write a compiler in (if you start from scratch, OCaml is better suited to that task). BTW, recent Clang/LLVM and GCC are coded in C++ (no longer in C).
The C language now sort-of requires optimization, as I explained here, so skipping the optimization part is not very useful. Be aware that optimization passes form the majority and the most difficult part of real-world compilers.
The lexing and parsing parts of a compiler are now well understood, and there are several generator tools for them (yacc or bison, lex or flex, ANTLR...). For several pragmatic reasons, real compilers like GCC don't use these tools.
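To see what the early stages produce without reading a production compiler, here is a minimal hand-written sketch in C (not taken from any of the compilers mentioned) for integer arithmetic. A lexer produces a token array, a recursive-descent parser builds an AST, and a post-order walk emits code for an imaginary stack machine. That last step merges "intermediate representation" and "code generation" into one. The grammar, the `push`/`add`/`mul` instruction names, and aborting on syntax errors are simplifications chosen for the example.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stage 1: lexical analysis. */
enum tok_kind { T_NUM, T_OP, T_END };
struct token { enum tok_kind kind; int value; char op; };

/* Returns the token count (excluding T_END). Assumes max >= 1;
   input that does not fit is silently truncated. */
size_t lex(const char *s, struct token *out, size_t max) {
    size_t n = 0;
    while (*s && n + 1 < max) {
        if (isspace((unsigned char)*s)) {
            s++;
        } else if (isdigit((unsigned char)*s)) {
            int v = 0;
            while (isdigit((unsigned char)*s)) v = v * 10 + (*s++ - '0');
            out[n++] = (struct token){T_NUM, v, 0};
        } else {
            out[n++] = (struct token){T_OP, 0, *s++};
        }
    }
    out[n] = (struct token){T_END, 0, 0};
    return n;
}

/* Stage 2: recursive-descent parsing into an AST.
   expr := term (('+'|'-') term)*   term := factor (('*'|'/') factor)*
   factor := NUMBER | '(' expr ')'                                    */
struct node { char op; int value; struct node *lhs, *rhs; }; /* op 0 = number */

static const struct token *cur; /* parser position (not reentrant) */

static struct node *mk(char op, int value, struct node *lhs, struct node *rhs) {
    struct node *n = malloc(sizeof *n);
    if (!n) abort();
    n->op = op; n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

static struct node *expr(void);

static struct node *factor(void) {
    if (cur->kind == T_NUM) return mk(0, (cur++)->value, NULL, NULL);
    if (cur->kind == T_OP && cur->op == '(') {
        cur++;
        struct node *e = expr();
        if (cur->kind != T_OP || cur->op != ')') abort(); /* expected ')' */
        cur++;
        return e;
    }
    abort(); /* syntax error: a real parser would report it */
}

static struct node *term(void) {
    struct node *n = factor();
    while (cur->kind == T_OP && (cur->op == '*' || cur->op == '/')) {
        char op = (cur++)->op;
        n = mk(op, 0, n, factor());
    }
    return n;
}

static struct node *expr(void) {
    struct node *n = term();
    while (cur->kind == T_OP && (cur->op == '+' || cur->op == '-')) {
        char op = (cur++)->op;
        n = mk(op, 0, n, term());
    }
    return n;
}

struct node *parse(const struct token *toks) {
    cur = toks;
    struct node *n = expr();
    if (cur->kind != T_END) abort(); /* trailing garbage */
    return n;
}

/* Stages 3-4: post-order walk emitting stack-machine code into buf,
   which must already hold a NUL-terminated string. */
void gen(const struct node *n, char *buf, size_t size) {
    if (n->op == 0) {
        size_t len = strlen(buf);
        snprintf(buf + len, size - len, "push %d\n", n->value);
        return;
    }
    gen(n->lhs, buf, size);
    gen(n->rhs, buf, size);
    size_t len = strlen(buf);
    const char *name = n->op == '+' ? "add" : n->op == '-' ? "sub"
                     : n->op == '*' ? "mul" : "div";
    snprintf(buf + len, size - len, "%s\n", name);
}

void free_tree(struct node *n) {
    if (!n) return;
    free_tree(n->lhs);
    free_tree(n->rhs);
    free(n);
}
```

Feeding it `2*(3+4)` gives seven tokens, an AST rooted at `*` with a `+` subtree on the right, and the code `push 2`, `push 3`, `push 4`, `add`, `mul`. Printing each intermediate result is an easy way to "track" every stage.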
You could look into tinycc, nwcc, or 8cc if you want to look inside non-optimizing toy C compilers.
You could also look into the intermediate representations of real compilers, e.g. GIMPLE for GCC (BTW, try compiling some simple C code with a few loops using gcc -fdump-tree-all -O2 -c; you'll be surprised by the hundreds of dump files showing the many internal compiler representations from many passes). You'll learn a lot by customizing GCC with MELT, and the MELT documentation page contains several very useful references. This answer should also help and contains or references some pictures of GCC.
disclaimer: I am the main author of MELT
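For the gcc -fdump-tree-all -O2 -c experiment mentioned above, a file like the following is enough to get started; the file and function names are arbitrary. Comparing an early dump with a late one for the same function is a good way to watch the passes at work.

```c
/* loops.c: small input for exploring GCC's dump files, e.g.
   gcc -fdump-tree-all -O2 -c loops.c */

/* A counted for-loop over an array. */
int sum_of_squares(const int *a, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i] * a[i];
    return total;
}

/* A while-loop with a conditional inside its body. */
int count_matches(const int *a, int n, int key) {
    int count = 0;
    int i = 0;
    while (i < n) {
        if (a[i] == key)
            count++;
        i++;
    }
    return count;
}
```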
PS. There are very good reasons to bootstrap compilers, so a compiler for a language other than C is unlikely to be coded in C (it is often coded in that language itself), since C is not a good programming language for writing a compiler from scratch.
PPS. If you only know C (and no other programming language), I would suggest learning some other programming language (e.g. Scheme with SICP, OCaml, Haskell, Scala, Clojure, or Common Lisp) before diving into compilers! Also read something about Programming Language Pragmatics. If you know a bit of Scheme or Lisp, Queinnec's book Lisp In Small Pieces will teach you a great deal.
There are many, many places to start exploring this territory. Some languages, such as Lisp and Forth, include a compilation capability as part of the language itself.
To learn about a C compiler, there is a book about the LCC compiler that includes the compiler's source code. There are also repositories of old C compilers at The Unix Heritage Society archive (tuhs.org).
Still another angle you could take is to examine the language False (an ancestor of the more famous Brainfuck) which is designed to be implemented with very little code.
Another angle, which connects to your interest in complexity theory, is to learn about the Chomsky Hierarchy of languages and the associated abstract machines which can parse them. This will teach you why Lex and Yacc are separate tools and what each is good for (and how to do it yourself and not need them at all).
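A small illustration of that distinction (a sketch, with made-up function names): recognizing a token such as an identifier needs only a fixed, finite set of states, which is what lex-style tools build. Checking nested parentheses needs memory that grows with the nesting depth, which no finite automaton has, so that job goes to a context-free parser of the yacc kind.

```c
#include <ctype.h>
#include <stdbool.h>

/* Finite-state recognizer for identifiers: [A-Za-z_][A-Za-z0-9_]*
   The only memory is which of two states we are in. */
bool is_identifier(const char *s) {
    enum { START, IN_IDENT } state = START;
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (state == START) {
            if (isalpha(c) || c == '_')
                state = IN_IDENT;
            else
                return false;
        } else if (!isalnum(c) && c != '_') {
            return false;
        }
    }
    return state == IN_IDENT;
}

/* Balanced parentheses need an unbounded counter (a degenerate stack):
   no fixed number of states can remember an arbitrary nesting depth. */
bool is_balanced(const char *s) {
    unsigned long depth = 0;
    for (; *s; s++) {
        if (*s == '(') {
            depth++;
        } else if (*s == ')') {
            if (depth == 0)
                return false;
            depth--;
        }
    }
    return depth == 0;
}
```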
I am actually on the very same quest myself. I'm currently reading the old 1979 book Anatomy of Lisp which contains compiler code in, of course, Lisp. But this is ok, because I already have my own homebrewed lisp interpreter to execute it with.
The Tiger language was designed by Prof. Andrew Appel specifically to illustrate, step by step, the full compiler construction process.
You can google for 'tiger language' and read some online resources (there are also some questions and answers here on SO), but the better choice would be to get a copy of the book for the implementation language you prefer, and implement the parts you're most interested in.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
I started learning C a while ago, and I have spent several hours searching for THE miracle software.
I am looking for software that imports the sources of a program written in C (.c files) and generates a "mind map" of the code with all files, functions, variables, etc.
Do you know if it exists? It would help me a lot to understand the architecture of complex software.
Thank you very much for all your answers.
Take a look at the "call graph". This sort of visualization should get you started.
As the comment suggests, Doxygen is a good open-source tool. Take a look at some output here. Doxygen is straightforward to configure for call-graph generation under *nix; it's a little more complex on Windows. First, check out this SO post: how to get doxygen to produce call & caller graphs for c functions. Doxygen's HTML output provides a number of nice cross-referencing features (files, variables, structs, etc.) in addition to caller/callee graphs.
On the commercial side, Understand for C/C++ has first-rate visualization features. Google "c call graph diagram" for other commercial and open-source options.
Finally, there are some older SO posts, like this one: Tools to get a pictorial function call graph of code. Take a look at it.
Look into the program ctags. It is an indexer of names and functions based on the structure of the programming language.
It is quite mature, and has integration with a number of other tools. I use it with an older (but very nice) text editor called vi, but it can be used independently from the command line.
It does not generate a graphical view of the connections. However, in my estimation there are probably too many connections in most C programs to display visually without creating a large amount of information overload.
This answer differs from Throwback's answer in some interesting ways. A call graph can mean a few things. One thing it can mean is the path a running program took through a section of code, and another is the combination of all paths a running program might take through the code, and another is the combination of all paths in the code (whether they can be reached or not).
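A small hypothetical example makes the three meanings concrete. A static tool sees every call written in the source, including `legacy_dump -> log_error`, even though nothing ever calls `legacy_dump`. A run with only non-negative input never takes the `parse_value -> log_error` edge at all. The `record` trace here is a hand-rolled stand-in for what a profiler or debugger would capture at run time.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TRACE 16

/* Dynamic call trace: which functions actually ran, in order. */
static const char *trace[MAX_TRACE];
static size_t trace_len = 0;

static void record(const char *name) {
    if (trace_len < MAX_TRACE)
        trace[trace_len++] = name;
}

/* How many times a function appears in the recorded trace. */
size_t calls_to(const char *name) {
    size_t count = 0;
    for (size_t i = 0; i < trace_len; i++)
        if (strcmp(trace[i], name) == 0)
            count++;
    return count;
}

void log_error(void) { record("log_error"); }

/* Never called: its edge to log_error exists only in the source text. */
void legacy_dump(void) {
    record("legacy_dump");
    log_error();
}

int parse_value(int x) {
    record("parse_value");
    if (x < 0) {
        log_error(); /* edge taken only for negative input */
        return 0;
    }
    return x;
}
```

A source-based tool like Doxygen or ctags reports the third kind of graph (every edge in the code), while the trace only ever contains what one particular run did.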
Your needs will drive which tool you should use.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
Is there any source-to-source converter (translator) from Ada (95, 2005) to C?
How complete are they (can they convert every feature of Ada into GNU C99 + pthreads + POSIX)?
Is it possible to use such ada-to-c translator in critical applications?
PS: Translators to C++ (up to 2003 with gnu extensions) are welcome too.
PPS: by "GNU C99" I only mean C99 plus most GNU extensions, not GCC specifically.
I don't know of any open source Ada-to-C translator. The only one I knew of at all was SofCheck's, which was reportedly pretty good.
SofCheck has since been bought by AdaCore, and I did a very brief search of the AdaCore website for the translator, and nothing jumped out. You could ask them at sales#adacore.com, if pursuing a commercial solution is a viable option for you. (At least get a price.)
Unless there is an incredibly strong reason to use Ada for this application (e.g., customer demands it, or you already have a big application coded in Ada that you want to use), it will likely be a lot less painful if you just bite the bullet and code your solution in well-crafted C99 or C++ as you see fit.
If you insist, Sofcheck's translator might be best; they've been working on it a long time.
Failing that, you might(?) build a translator starting with the ASIS output of an Ada compiler. That's likely rather a lot of persnickety work since Ada has pretty precise semantics that you'd better preserve if you want to just carelessly code in Ada, translate and run. It will be even more work if you want the output to be "pretty" for the final customer. (Long term maintenance should be a consideration). I suspect implementing code to simulate Ada's rendezvous might be rather tricky, being both semantically complicated and asynchronous at the same time. The real flaw with this approach is that it is a lot of work; maybe just getting on with your life and coding the application itself in something non-Ada would be less effort.
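To give a feel for why rendezvous is tricky to simulate, here is a deliberately minimal sketch of a single Ada-style entry on top of pthreads (hypothetical names, not the output of any translator). It only models the core rule: the caller stays blocked until the acceptor has finished the accept body. FIFO entry queues, guards, `select`, timed entry calls, entry families, and exception propagation are all left out, and each of those needs more machinery.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry with an int "in" parameter and an int "out" result. */
struct entry {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    bool call_pending; /* a caller has posted an argument */
    bool done;         /* the accept body has finished */
    int arg, result;
};

void entry_init(struct entry *e) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->changed, NULL);
    e->call_pending = e->done = false;
}

/* Caller side, roughly "Server.Compute (Arg, Result);" in Ada. */
int entry_call(struct entry *e, int arg) {
    pthread_mutex_lock(&e->lock);
    while (e->call_pending || e->done) /* wait for any earlier rendezvous */
        pthread_cond_wait(&e->changed, &e->lock);
    e->arg = arg;
    e->call_pending = true;
    pthread_cond_broadcast(&e->changed);
    while (!e->done) /* suspended for the whole accept body */
        pthread_cond_wait(&e->changed, &e->lock);
    int result = e->result;
    e->done = false;
    pthread_cond_broadcast(&e->changed);
    pthread_mutex_unlock(&e->lock);
    return result;
}

/* Acceptor side, roughly "accept Compute (...) do ... end;" in Ada. */
void entry_accept(struct entry *e, int (*body)(int)) {
    pthread_mutex_lock(&e->lock);
    while (!e->call_pending)
        pthread_cond_wait(&e->changed, &e->lock);
    int arg = e->arg;
    pthread_mutex_unlock(&e->lock);
    int result = body(arg); /* caller remains blocked meanwhile */
    pthread_mutex_lock(&e->lock);
    e->result = result;
    e->call_pending = false;
    e->done = true;
    pthread_cond_broadcast(&e->changed);
    pthread_mutex_unlock(&e->lock);
}

/* Example server task: accepts 'count' calls, squaring each argument. */
static int square(int x) { return x * x; }

struct server_args { struct entry *e; int count; };

void *server_task(void *p) {
    struct server_args *a = p;
    for (int i = 0; i < a->count; i++)
        entry_accept(a->e, square);
    return NULL;
}
```

Even this stripped-down version needs two flags and careful wakeup logic; a faithful translation of full Ada tasking would be considerably larger, which supports the point above about the amount of work involved.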
See my caveats on language translation done poorly and alternative methods.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
I'm after a very tiny XML parser for an embedded project. It needs to compile down to 10-15k, doesn't need to validate, and needs to be simple and portable.
I was able to tweak the compilation flags of the following XML parser libraries for C, and cut down more than 50% of their size on my Ubuntu machine. Mini-XML is the only one close to what you requested:
Mini-XML (36K)
Expat (124K)
RXP (184K)
There's a good discussion here:
C XML library for Embedded Systems
I was searching for one recently, and I found SimpleXML (http://simplexml.sourceforge.net/) and the slightly larger sxmlc (http://sourceforge.net/projects/sxmlc/).
I find SimpleXML more interesting because it's simpler. I haven't tried it, but it looks like it matches what I have in mind: a single-file (well, .h and .c) library that doesn't support exotic XML features.
The simple XML parser is a tiny parser for a subset of XML (everything except entities and namespaces). It uses a simple "one-handler per tag" interface and is suited for use with devices with limited resources.
Try yxml: it's a really small and fast non-validating parser.
You can always roll your own implementation. I did this a few years ago, and just now added some interface documentation to the code at mercurial.intuxication.org/hg/cstuff.
Please note that the parser has never been used in a production environment or even been tested more than rudimentarily; comments are non-existent as well, so have fun grokking the code if you need to modify it ;)
I developed sxmlc ("Simple XML in C") exactly to be like that: as few files as possible. It's only one file, with an optional "search" file you can add if you need XPath-like search through the document.
It handles DOM-style loading (the whole document tree in memory) or SAX-style loading (calling callbacks whenever a node is read with its attributes, or text was read on a node). If memory is a concern you'll be interested in SAX.
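To show what "SAX-style" means in practice, here is a deliberately tiny callback-driven scanner in C. It is a hypothetical sketch, not sxmlc's (or any other library's) API, and it ignores most of XML: entities, comments, CDATA, and attribute parsing are not handled, and whitespace between tags is reported as text. The point is that the parser builds no tree, so its own memory use stays constant regardless of document size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical SAX-style interface: the parser calls these as it scans. */
struct sax_handlers {
    void (*start_tag)(void *user, const char *name, size_t len);
    void (*end_tag)(void *user, const char *name, size_t len);
    void (*text)(void *user, const char *data, size_t len);
    void *user;
};

/* Scans a NUL-terminated buffer, firing callbacks (NULL ones are skipped).
   Returns 0 on success, -1 on an unterminated tag. */
int sax_parse(const char *xml, const struct sax_handlers *h) {
    const char *p = xml;
    while (*p) {
        if (*p != '<') {
            const char *start = p;
            while (*p && *p != '<') p++;
            if (h->text) h->text(h->user, start, (size_t)(p - start));
            continue;
        }
        const char *close = strchr(p, '>');
        if (!close) return -1;
        int is_end = p[1] == '/';
        const char *name = p + 1 + is_end;
        size_t len = strcspn(name, " \t\r\n/>"); /* attributes are skipped */
        if (is_end) {
            if (h->end_tag) h->end_tag(h->user, name, len);
        } else {
            if (h->start_tag) h->start_tag(h->user, name, len);
            if (close[-1] == '/' && h->end_tag) /* self-closing <tag/> */
                h->end_tag(h->user, name, len);
        }
        p = close + 1;
    }
    return 0;
}

/* Example user code: count elements and track nesting depth. */
struct stats { int depth, max_depth, elements; };

static void on_start(void *u, const char *name, size_t len) {
    struct stats *s = u;
    (void)name; (void)len;
    s->elements++;
    if (++s->depth > s->max_depth) s->max_depth = s->depth;
}

static void on_end(void *u, const char *name, size_t len) {
    struct stats *s = u;
    (void)name; (void)len;
    s->depth--;
}
```

A DOM-style loader would instead allocate a node in `on_start` and link it into a tree, which is exactly where the extra memory goes.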
Some people were also interested in the fact that it can parse either files or memory buffers (useful when you get XML as a web reply).
Since version 4 it handles Unicode files through a #define: if you don't need Unicode, just don't define SXMLC_UNICODE and the binary won't grow at all.
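The general technique behind such a switch can be sketched as follows, using made-up names (`MYLIB_UNICODE`, `xml_char`, `XML_TEXT`); this is not necessarily how sxmlc implements SXMLC_UNICODE. A single macro selects the character type and string functions at compile time, so a build without it contains no wide-character code.

```c
#include <stddef.h>

/* Hypothetical library-header pattern: one macro picks the char type. */
#ifdef MYLIB_UNICODE
#include <wchar.h>
typedef wchar_t xml_char;
#define XML_TEXT(s) L##s
#define xml_strlen wcslen
#else
#include <string.h>
typedef char xml_char;
#define XML_TEXT(s) s
#define xml_strlen strlen
#endif

/* Library code is written once against xml_char and the macros. */
size_t tag_length(const xml_char *tag) {
    return xml_strlen(tag);
}
```

Compiling with -DMYLIB_UNICODE switches everything to `wchar_t`; without it, only the plain `char` path is ever compiled.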
I also have to say it keeps comments when writing back XML to disk! I always felt sorry when people spend time explaining configuration syntax in XML files ("put 'true' to enable special compression..."), which are wiped when saved back by the application.
It compiles under Linux and Windows. I had good feedback from people happily embedding it in routers.
As I want to keep it as simple as possible, I will probably not add new functions but rather improve the existing ones (and correct bugs, of course! :)). I am not very active in its development, unless bugs are reported.