AST from C code [closed] - c

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I want to perform some transformations on C source code. I need a tool on linux that generates a complete AST from the source code so that I can apply my transformations on this AST and then convert it back to the C source code. I tried ELSA but it is not getting compiled. (I am using Ubuntu 8.4). Can anyone suggest a better tool/application?

I would recommend clang. It has a fairly complete C implementation with most gcc extensions, and the code is very understandable. Their C++ implementation is incomplete, but if you only care about generating ASTs from C code that should be fine. Depending on what you want to do you can either use clang as a library and work with the ASTs directly, or have clang dump them out to console.
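As a rough illustration of the "dump them out to console" route, here is a small input file. The file name `square.c` and its contents are made up for this sketch. With a reasonably recent clang, running `clang -Xclang -ast-dump -fsyntax-only square.c` prints the tree, with nodes such as `FunctionDecl`, `ParmVarDecl`, `ReturnStmt` and `BinaryOperator`. The exact layout differs between clang versions, and the dump also lists the declarations pulled in by the header.

```c
/* square.c: hypothetical input file for an AST dump. */
#include <assert.h>

/* One function, one parameter, one return statement holding a
 * binary expression. */
int square(int x)
{
    /* Assumes a 32-bit int: keeps x * x from overflowing. */
    assert(x > -46341 && x < 46341);
    return x * x;
}
```

For transformations rather than inspection, the library route mentioned in the same answer is probably the better fit, since the console dump is meant for reading, not for round-tripping.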

See pycparser - a pure-Python AST generator for C.

There are two projects that I'm aware of and that you could find useful:
CIL
Transformers
They both parse standard C source code to allow further analysis and transformation. I've not used them, so you'll have to check for yourself whether they fit your needs.
The suggestion of using GCC is also valid, of course. I know there's not much documentation on this aspect of gcc, though.

To get AST XML output you can try to use cscan from MarpaX::Languages::C::AST. The output will look like:
<cscan>
<typedef_hash>
<typedef id="GLenum" before="unsigned int" after="" file="/usr/include/GL/gl.h"/>
...

www.antlr.org

http://ctool.sourceforge.net/

Our DMS Software Reengineering Toolkit has been used on huge C systems, parsing, analyzing, transforming, and regenerating C code. Runs on Windows, and will run on Linux under Wine, but it does handle Linux-style (GCC) C code.
I can't emphasize enough the ability to round-trip the C source code: parse, build trees, transform, regenerate compilable C code with the comments and either prettyprinted or with the original programmer's indentation. Few of the other answers here suggest systems that can do that robustly.
The fact that DMS is designed to carry out program transformations (as opposed to other systems suggested in answers here) is also a great advantage. DMS provides tree-pattern matches and rewrites; it augments this with full control and data flow analysis to be used to extend the conditions that you'd like to match. A tool intending to be a compiler is just that, and you'll have a very hard time persuading it not to be a compiler, and instead to be a transformation engine as the OP requested.
See https://stackoverflow.com/a/2173477/120163 for example ASTs produced by DMS.

I've done small amounts of work on source-to-source transformations and I found CIL to be very powerful for this task. CIL has the advantage of being a framework specifically designed for static source analysis and transformation. It can also process code with any amount of ugly GCC-specific extensions. (It's been used to process the Linux kernel, as one example.) Unfortunately, it is written in OCaml, and analyses/transformations built using it must also be written in OCaml, which might be problematic if you've never used it.
Alternatively, clang is supposed to have a relatively easily-hackable codebase and it can certainly be used to produce C ASTs.

You can try generating an AST (Abstract Syntax Tree) using Lex and Yacc on Linux:
lex and yacc
from lex and yacc to ast

"I tried ELSA but it is not getting compiled. (I am using Ubuntu 8.4)"
The Elkhound and Elsa source code, version 2005.08.22b from scottmcpeak.com/elkhound/ is outdated (old C++ style .h header files).
Elsa is working and part of Oink: http://www.cubewano.org/oink/#Gettingthecode
I have just got it working now under Ubuntu 9.10.

How about taking gcc and writing a custom backend for it? I've never done it nor even worked on gcc source code, so I don't know how hard it would be.

Related

Zopfli is written in C for portability... wait what? [closed]

So I am not a C programmer so pardon this question.
I was reading this blog entry, Google Zopfli Compression, and I was a little dumbfounded by the following sentence: "Zopfli is written in C for portability".
How exactly is C a portable language? Or does he not mean portable in a compile-to-machine-code sense, but some other context? I guess C is more portable than writing assembly code. But is that really the comparison he is trying to make? I hope someone can enlighten me as to what he means and how exactly C is a portable language.
Thanks a lot!
Portable in this context means something like "Anybody can take this source code and compile it on their own computer and have this program." Very nearly all computers drawing power somewhere today have a C compiler available for them (it may not be installed on that machine, but it's either available to be installed or is available as a cross-compiler, e.g. for embedded systems), so that same source code is portable virtually everywhere. (EDIT: I'm assuming based on context that the source code doesn't have system-specific things in it, as system-specific things would limit your portability.)
"Portability" has multiple meanings, depending on the context:
The C language is "portable" in the sense that C compilers have been written for a wide variety of platforms, from mainframes to microcontrollers;
The language is also "portable" in the sense that there is an agreed-upon standard that implementations conform to (to greater or lesser degree), so you don't have subtly different versions of the language depending on the vendor - the behavior of a conforming program should be the same on any conforming implementation;
C programs that don't make any assumptions about the system they're running on (type sizes, alignment, endianness) or use system-specific libraries are often "trivially" portable; they only need to be recompiled for the target platform, without needing to edit the source code.
Compared to the majority of its contemporaries (Pascal, Fortran, etc.), C is highly portable, and I spent the bulk of the '90s writing C code that had to run on multiple platforms concurrently (one project required the same code to run on Windows NT, Solaris, and Classic MacOS).
C's portability can be summed up as "write once[1], build and run everywhere", where Java and C#'s portability can be summed up as "write and build once, run everywhere."
[1] Subject to the caveats in the third bullet.
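To make the "no assumptions about endianness" caveat concrete, here is a minimal sketch of writing a 32-bit value in a fixed byte order. The function names `store_u32_be` and `load_u32_be` are made up for this example. Copying the value's bytes straight from memory (for instance with `memcpy`) would give different output on big- and little-endian hosts, whereas shifting works on the value itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v in out[0..3], most significant byte first. Shifts operate on
 * the value, not on its memory layout, so the bytes written are the
 * same on big- and little-endian hosts. */
void store_u32_be(uint8_t out[4], uint32_t v)
{
    assert(out != NULL);
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Inverse of store_u32_be. */
uint32_t load_u32_be(const uint8_t in[4])
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

The fixed-width types from `<stdint.h>` also sidestep the type-size assumption named in the same bullet.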
For a piece of software to be considered cross-platform, it must be able to function on more than one computer architecture or operating system.
Developing such a program can be a time-consuming task because different operating systems have different application programming interfaces (APIs).
For example, Linux uses a different API for application software than Windows does.
C is a language you can use with most of these APIs.
C code can be directly called in C++, and easily used in C# and, I believe, Objective-C. Given that and the wide availability of C compilers, the claim does make sense.
Of course, the argument can also be made that Java is more portable as far as running it directly on other machines. But Java can't be moved from language to language as easily.

Code refactoring tools for C, usable on GNU/Linux? FOSS preferable

Variations of this question have been asked, but not specific to GNU/Linux and C. I use Komodo Edit as my usual Editor, but I'd actually prefer something that can be used from CLI.
I don't need C++ support; it's fine if the tool can only handle plain C.
I really appreciate any direction, as I was unable to find anything.
I hope I'm not forced to 'roll' something myself.
NOTE: Please refrain from mentioning vim; I know it exists and what its capabilities are. I purposefully choose to avoid vim, which is why I use Komodo (or nano on the servers).
I don't think that a pure console refactoring tool would be nice to use.
I use Eclipse CDT on Linux to write and refactor C code.
There is also Xrefactory for Emacs (http://www.xref.sk/xrefactory/main.html), if a non-console refactoring tool is OK for you as well.
C-xrefactory was an open source version of xrefactory, covering C and Java, made available on SourceForge by Marián Vittek under GPLv2.
For those interested, there's an actively maintained c-xrefactory fork on GitHub:
https://github.com/thoni56/c-xrefactory
The goal of the GitHub fork is to refactor c-xrefactory itself, add a test suite, and try to document the original source code (which is rather obscure). Maybe, in the future, also convert it into an LSP C language server and refactoring tool.
C-xrefactory works on Emacs; setup scripts and instructions can be found at the repository. Windows users can run it via WSL/WSL2.
You could consider coding a GCC plugin or a MELT extension (MELT is a domain specific language to extend GCC) for your needs.
However, such an approach would take you some time, because you'll need to understand some of GCC's internals.
For Windows only, and not FOSS but you said "any direction..."
Our DMS Software Reengineering Toolkit with its C Front End can apply transformations to C source code. DMS can be configured to carry out custom, complex, reliable transformations, although the configuration isn't as easy as typing just a command like "refactor frazzle by doobaz".
One of the principal stumbling blocks is still the preprocessor. DMS can transform code that has preprocessor directives in typical places (around statements, expressions, if/for/while loop heads, declarations, etc.) but other "unstructured conditionals" give it trouble. You can run DMS by expanding the preprocessor directives out of existence, or more importantly, expanding out the ones that give it trouble, but mostly people don't like this because they prefer to keep their preprocessor directives. So it isn't perfect.
[Another answer suggested Coccinelle, which looks pretty good from my point of view. As far as I know, it doesn't handle preprocessor directives at all; I could be wrong and it might handle some cases as DMS does, but I'm sure it can't handle all the cases].
You don't want to consider rolling your own. Building a transformation/refactoring tool is much harder than you might guess having never tried it. You need full, accurate parsers for the (C) dialect of interest and just that is pretty hard to get right. You need a preprocessor, symbol tables, flow analysis, transformation, code regeneration machinery, ... this stuff takes years of effort to build and get right. Trust me, been there, done that.

Small libc for embedded systems [closed]

I am looking for a small libc for embedded use with FreeRTOS on an ARM7 microcontroller.
I have looked at newlib, but it is a bit too complex for my needs. Newlib calls malloc() in a number of functions (e.g. printf()), which is not good for small embedded realtime systems.
Does anyone know of a small, portable, open source libc implementation that will fit my application?
PDCLib might fit your needs. It's still incomplete [broken link], though, and probably in need of a lot more real-world testing. Its author goes by DevSolar here on SO.
update 2012-11-01: As of 2012-08-14, development has been taken over by Owen Shepherd, complete with a new homepage and bitbucket repository [broken link, use this one].
update 2015-10-31: The dedicated website seems to be dead, but the code can still be found on bitbucket. The last commit to that repository happened 2014-11-24.
update 2016-07-12: The website is back up, and DevSolar started committing again on 2016-03-01.
I use newlib on my Cortex-M3 with 32kB RAM, and to eliminate the malloc() you can use siprintf() or sniprintf().
Pro: No more calls to malloc().
Con: It does not support formatting float and double, and is not really portable this way.
If you use newlib and do not implement the sbrk syscall, then any function you use that requires malloc will generate a linker error, which will prevent you from inadvertently using a call that requires dynamic memory. So I would suggest that you do that, and then simply avoid those functions that cause the linker error. You can modify or override any library functions you do not wish to use.
printf() is not good for small embedded realtime systems!
Actually it is worse than malloc in many ways: variable argument lists, very complex formatting, float number support when you don't need it, etc. printf() comes with an enormous overhead, and the compiler will not be able to reduce it, as every parameter passed to it is evaluated at run time.
printf() is perhaps ok for hobbyists and beginners still learning C. But if you are a professional programmer, you really ought to write your own serial monitor / LCD routines. You will dramatically improve the program performance and flash consumption.
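A minimal sketch of what such a hand-written routine might look like for one case, printing an unsigned integer in decimal. The name `format_u32` is made up for this example, and the routine that actually sends the characters to a UART or LCD is assumed to exist elsewhere. It uses no heap, no variable argument list and no floating point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write v in decimal into buf, NUL-terminated. Returns the number of
 * digits written, or 0 if buf (size bytes) is too small. */
size_t format_u32(char *buf, size_t size, uint32_t v)
{
    char tmp[10];               /* UINT32_MAX has 10 decimal digits */
    size_t n = 0;

    assert(buf != NULL);
    do {                        /* digits come out least significant first */
        tmp[n++] = (char)('0' + v % 10u);
        v /= 10u;
    } while (v != 0);

    if (n + 1 > size)           /* need room for the digits plus the NUL */
        return 0;
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];    /* reverse into the caller's buffer */
    buf[n] = '\0';
    return n;
}
```

A caller would typically pass an 11-byte buffer and hand the result to its own output routine. Whether this saves enough flash to matter depends on the target and on which libc functions are linked in anyway.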
I had similar needs and found that klibc fit it quite well. The only downside (for commercial use) is that the distribution includes a few GPL-licensed files, even though most of it is BSD-licensed. I have hacked a minimal version of it here.
This is even more limited than PDCLib, and suitable if you just need a few basic functions such as printf and strtok. Compiles to just 4kB with all functions included.
You might want to look into the Embedded Artistry libc, which promises to be minimal and well-tested. It includes a malloc-free printf(). Disclaimer: I have not used it, but it appears well-structured and actively developed.
You can check out the LGPL µClibc, which is supposed to be close to glibc but much more suited to embedded systems.
It also has a page referencing other open source C libraries, including newlib and eCos, which may be more suited for non-Linux environments.
Look into uClibc and EGLIBC, perhaps.

Parse C files [closed]

I am looking for a Windows based library which can be used for parsing a bunch of C files to list global and local variables. The global and local variables may be declared using typedef. The output (i.e. list of global and local variables) can then be used for post processing (e.g. replacing the variable names with a new name).
Is such a library available?
Some of the methods available:
Elsa: The Elkhound-based C/C++ Parser
CIL - Infrastructure for C Program Analysis and Transformation
Sparse - a Semantic Parser for C
clang: a C language family frontend for LLVM
pycparser: C parser and AST generator written in Python
Alternatively, you could write your own using lex and yacc (or their kin, flex and bison) with a public lex specification and a yacc grammar.
Possibly overkill, but there's a complete ANSI C parser written with Boost.Spirit:
http://spirit.sourceforge.net/repository/applications/c.zip
Maybe you'll be able to model it to suit your needs.
Parsing C is a lot harder than it looks when you take into account different dialects, preprocessor directives, the need for type information while parsing, etc. People who tell you "just use lex and yacc" have clearly not done a production C parser.
A tool that can do this is our C front end. It addresses all of the above issues. On completion, it has a complete, navigable symbol table with all identifiers and corresponding type information. Listing global and local variables would be trivial with this.
I'm the architect behind Semantic Designs.
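The "need for type information while parsing" can be seen in a few lines. In this sketch (all names are made up), the same token sequence `T * x` is a declaration in one function and a multiplication in the other, and the parser can only tell which by knowing whether `T` currently names a type.

```c
#include <assert.h>
#include <stddef.h>

typedef int T;

/* Here T names a type, so "T *x" declares a pointer to T. */
int declares_pointer(void)
{
    assert(sizeof(T) == sizeof(int));   /* T is usable as a type name */
    T *x = NULL;
    return x == NULL;
}

/* Here a local variable named T hides the typedef, so "T * x" is an
 * ordinary multiplication. */
int multiplies(void)
{
    int T = 6, x = 7;
    return T * x;
}
```

A grammar-only parser has to feed declarations back into the lexer (or defer the decision) to handle this, which is one reason a bare yacc grammar is not enough on its own.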
I don't know if it offers a library, but have a look at CTAGS.
If it is plain C, lex and yacc are your friends, but you need to take the C preprocessor into account: source files with unexpanded macros typically do not comply with C syntax, so a parser written with the K&R grammar in mind will most likely fail.
If you decide to parse the output of the preprocessor, be prepared for your parser to fail due to "extensions" of your particular compiler, because the standard library headers very likely use them. At least this is the case with GCC.
I had this with GCC and finally decided to achieve my goal using a different approach. If you just need to change the names of variables, regular expressions will do fine, and there is no need to build a full parser, IMHO. If your goal is just to collect data, the ultimate source of data is debug information. There are ways to get debug information out of a binary: for ELF executables with DWARF there is libdwarf, and for Windows-land (COFF?) there should be something as well. You can probably use some existing tools to get debug information about a binary; again, I know nothing about Windows, so you will need to investigate.
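For the rename-only case, a plain whole-word substitution (roughly what a `\bname\b` regular expression does) might look like the sketch below. The function name `rename_identifier` is made up for this example. It is purely textual, so it also rewrites matches inside comments and string literals and cannot tell two variables of the same name in different scopes apart.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copy src to dst, replacing each whole-word occurrence of old_name
 * with new_name. Returns the number of replacements, or -1 if old_name
 * is empty or dst (dst_size bytes) is too small. */
int rename_identifier(const char *src, const char *old_name,
                      const char *new_name, char *dst, size_t dst_size)
{
    size_t old_len, new_len, out = 0;
    int count = 0;

    assert(src && old_name && new_name && dst);
    old_len = strlen(old_name);
    new_len = strlen(new_name);
    if (old_len == 0)
        return -1;

    for (size_t i = 0; src[i] != '\0'; ) {
        int at_word_start = (i == 0) || !is_ident_char(src[i - 1]);

        if (at_word_start &&
            strncmp(src + i, old_name, old_len) == 0 &&
            !is_ident_char(src[i + old_len])) {
            if (out + new_len >= dst_size)   /* keep room for the NUL */
                return -1;
            memcpy(dst + out, new_name, new_len);
            out += new_len;
            i += old_len;
            count++;
        } else {
            if (out + 1 >= dst_size)
                return -1;
            dst[out++] = src[i++];
        }
    }
    if (out >= dst_size)
        return -1;
    dst[out] = '\0';
    return count;
}
```

Renaming `count` to `total` in `int count = count_max + count;` leaves `count_max` alone, because the character after the match is still part of an identifier.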
I recently read about a win32-based system that looked at the debugging information in COFF dlls:
http://www.drizzle.com/~scottb/gdc/fubi-paper.htm
Maybe gnu project cflow http://www.gnu.org/software/cflow/ ?

Choosing a static code analysis tool [closed]

I'm working on a project where I'm coding in C in a UNIX environment. I've been using the lint tool to check my source code. Lint has been around a long time (since 1979). Can anyone suggest a more recent code analysis tool I could use? Preferably a tool that is free.
Don't overlook the compiler itself. Read the compiler's documentation and find all the warnings and errors it can provide, and then enable as many as make sense for you.
Also make sure to tell your compiler to treat warnings like errors so you're forced to fix them right away (-Werror on gcc).
By the way, don't be fooled: -Wall on gcc does not enable all warnings.
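One example of a warning outside -Wall: for C code, gcc documents -Wsign-compare as enabled by -Wextra rather than by -Wall. The sketch below (the function name `count_below` is made up) shows the corrected form of a loop that would trigger it; building with something like `gcc -Wall -Wextra -Werror` would have rejected the version described in the comment.

```c
#include <assert.h>
#include <stddef.h>

/* Count the elements of a[0..n-1] that are below limit. */
size_t count_below(const int *a, size_t n, int limit)
{
    size_t count = 0;

    assert(a != NULL || n == 0);
    /* Had the index been declared "int i", the test "i < n" would
     * compare a signed with an unsigned value. For C, gcc reports that
     * under -Wextra (-Wsign-compare), not under -Wall alone. */
    for (size_t i = 0; i < n; i++) {
        if (a[i] < limit)
            count++;
    }
    return count;
}
```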
You may want to check valgrind (free!) — it "automatically detect[s] many memory management and threading bugs, and profile[s] your programs in detail." It isn't a static checker, but it's a great tool!
For C code, you should definitely use Flexelint. I used it for nearly 15 years and swear by it. One of the really great features it has is that warnings can be selectively turned off and on via comments in the code ("/* lint -e123*/"). This turned out to be a powerful documentation tool when you wanted to do something out of the ordinary: "I am turning off warning X, therefore there is some good reason I'm doing X."
For anybody into interesting C/C++ questions, look at some of their examples on their site and see if you can figure out the bugs without looking at the hints.
I've heard good things about the clang static analyzer, which IIRC uses LLVM as its backend. If that's implemented on your platform, that might be a good choice.
From what I understand, it does a bit more than just syntax analysis. "Automatic Bug Finding", for instance.
You can use cppcheck. It is an easy-to-use static code analysis tool. For example:
cppcheck --enable=all .
will check all C/C++ files under the current folder.
I recently compiled a list of all the static analysis tools I had at my disposal; I am still in the process of evaluating them all. Note that these are mostly security analysis tools.
splint
RATS
SMATCH
Uno
We've been using Coverity Prevent to check our C++ source code.
It's not a free tool (although I believe they offer free scanning for open source projects), but it's one of the best static analysis tools you'll find. I've heard it's even more impressive on C than on C++, but it's helped us avoid quite a number of bugs so far.
Lint-like tools generally suffer from a "false alarm" problem: they report a lot more issues than really exist. If the proportion of genuinely-useful warnings is too low, the user learns to just ignore the tool. More modern tools expend some effort to focus on the most likely/interesting warnings.
PC-lint/Flexelint are very powerful and useful static analysis tools, and highly configurable, though sadly not free.
When first used, tools like this can produce huge numbers of warnings, which can make it hard to differentiate between major and minor ones. Therefore, it is best to start using the tool on your code as early in the project as possible, and then to run it on your code as often as possible, so that you can deal with new warnings as they come up.
With continual use like this, you soon learn how to write your code in a way which conforms to the rules applied by the tool.
Because of this, I prefer tools like Lint which run relatively quickly, and so encourage continual use, rather than the more cumbersome tools which you may end up using less often, if at all.
You can try CppDepend, a pretty complete static analyzer available on Windows and Linux through a VS plugin, an IDE or the command line. It's free for open source contributors.
You might find the Uno tool useful. It's one of the few free non-toy options. It differs from lint, Flexelint, etc. in focusing on a small number of "semantic" errors (null pointer derefs, out-of-bounds array indices, and use of uninitialized variables). It also allows user-defined checks, like lock-unlock discipline.
I'm working towards a public release of a successor tool, Orion (CONTENT NOT AVAILABLE ANYMORE)
lint is constantly updated... so why would you want a more recent one?
BTW flexelint is lint
G'day,
I totally agree with the suggestions to read and digest what the compiler is telling you after setting -Wall.
A good static analysis tool for security is FlawFinder, written by David Wheeler. It does a good job looking for various security exploits.
However, it doesn't replace having someone knowledgeable read through your code. As David says on his web page, "A fool with a tool is still a fool!"
cheers,
Rob
I've found that it's generally best to use multiple static analysis tools to find bugs. Every tool is designed differently, and they can find very different things from each other.
There are some good discussions in some of the talks here. It's from a conference held by the US Department of Homeland Security on static analysis.
Sparse is a computer software tool, already available on Linux, designed to find possible coding faults in the Linux kernel.
There are two active projects of the Linux Verification Center aimed at improving the quality of loadable kernel modules.
Linux Driver Verification (LDV) - a comprehensive toolset for static source code verification of Linux device drivers.
KEDR Framework - an extensible framework for dynamic analysis and verification of kernel modules.
Another ongoing project is Linux File System Verification that aims to develop a dedicated toolset for verification of Linux file system implementations.
There is a "-Weffc++" option for gcc which according to the Mac OS X man page will:
Warn about violations of the following style guidelines from Scott Meyers' Effective C++ book:
[snip]
I know you asked about C, but this is the closest I know of.
