Parse C files [closed]

I am looking for a Windows-based library that can parse a set of C files and list their global and local variables. The variables may be declared with typedef'd types. The output (i.e. the list of global and local variables) can then be used for post-processing (e.g. replacing the variable names with new names).
Is such a library available?

Some of the methods available:
Elsa: The Elkhound-based C/C++ Parser
CIL - Infrastructure for C Program Analysis and Transformation
Sparse - a Semantic Parser for C
clang: a C language family frontend for LLVM
pycparser: C parser and AST generator written in Python
Alternatively, you could write your own using lex and yacc (or their kin, flex and bison) with a public lex specification and a yacc grammar.

Possibly overkill, but there's a complete ANSI C parser written with Boost.Spirit:
http://spirit.sourceforge.net/repository/applications/c.zip
Maybe you'll be able to model it to suit your needs.

Parsing C is a lot harder than it looks, when you take into
account different dialects, preprocessor directives,
the need for type information while parsing, etc.
People that tell you "just use lex and yacc" have
clearly not done a production C parser.
A tool that can do this is our C front end.
It addresses all of the above issues.
On completion, it has a complete, navigable symbol table
with all identifiers and corresponding type information.
Listing global and local variables would be trivial with this.
I'm the architect behind Semantic Designs.
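To illustrate the point about needing type information while parsing, here is a small sketch. The names T, cast_of_deref and product are made up for this example. The same tokens, (T) * p, parse as a cast in one function and as a multiplication in the other, and only the symbol table tells the parser which one applies.

```c
#include <assert.h>
#include <stddef.h>

typedef int T;

/* Here T names a type, so "(T) * p" is a cast applied to "*p". */
int cast_of_deref(const int *p)
{
    assert(p != NULL);
    return (T) * p;
}

/* Here the parameter T shadows the typedef inside the function body,
   so the very same tokens "(T) * p" mean "T multiplied by p". */
int product(int T, int p)
{
    return (T) * p;
}
```

A grammar-only parser cannot tell these two apart, which is why real C parsers feed declaration information back into parsing.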

I don't know if it offers a library, but have a look at CTAGS.

If it is plain C, lex and yacc are your friends, but you need to take the C preprocessor into account: source files with unexpanded macros typically do not comply with C syntax, so a parser written with the K&R grammar in mind will most likely fail.
If you decide to parse the output of the preprocessor, be prepared for your parser to fail on the "extensions" of your particular compiler, because standard library headers very likely use them. At least this is the case with GCC.
I ran into this with GCC and finally decided to achieve my goal using a different approach. If you just need to change variable names, regular expressions will do fine, and there is no need to build a full parser, IMHO. If your goal is just to collect data, the ultimate source of data is debug information. There are ways to get debug information out of a binary: for ELF executables with DWARF there is libdwarf, and for Windows-land (COFF?) there should be something as well. You can probably use existing tools to get debug information out of a binary; again, I know nothing about Windows, so you will need to investigate.
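As a rough illustration of the text-level renaming idea, here is a minimal sketch in plain C that does a whole-word identifier replacement without a regex library. The function name rename_identifier and its signature are assumptions made for this example. It knows nothing about scopes, strings, comments or macros, so it renames every occurrence of the identifier, which is exactly the limitation of this approach.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copies src into dst, replacing every whole-identifier occurrence of
   old_name with new_name. Returns the length written (excluding the
   terminating NUL), or (size_t)-1 if dst is too small. */
size_t rename_identifier(const char *src, const char *old_name,
                         const char *new_name, char *dst, size_t dst_size)
{
    size_t old_len, new_len, out = 0;

    assert(src && old_name && new_name && dst);
    old_len = strlen(old_name);
    new_len = strlen(new_name);
    assert(old_len > 0);
    if (dst_size == 0)
        return (size_t)-1;

    while (*src != '\0') {
        if (is_ident_char(*src)) {
            /* Measure the whole identifier-like token starting here. */
            const char *text = src;
            size_t tok_len = 0, copy_len;

            while (is_ident_char(src[tok_len]))
                tok_len++;
            copy_len = tok_len;
            if (tok_len == old_len && memcmp(src, old_name, old_len) == 0) {
                text = new_name;      /* exact match: substitute */
                copy_len = new_len;
            }
            if (out + copy_len >= dst_size)
                return (size_t)-1;
            memcpy(dst + out, text, copy_len);
            out += copy_len;
            src += tok_len;
        } else {
            /* Punctuation and whitespace are copied through unchanged. */
            if (out + 1 >= dst_size)
                return (size_t)-1;
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return out;
}
```

Because whole tokens are compared, renaming count leaves count2 alone. Telling a local count apart from a global one still needs a real parser or debug information.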

I recently read about a Win32-based system that looked at the debugging information in COFF DLLs:
http://www.drizzle.com/~scottb/gdc/fubi-paper.htm

Maybe the GNU project's cflow? http://www.gnu.org/software/cflow/


Getting data from a .json file in c [duplicate]

I'm trying to find a good way to parse JSON in C. I really don't need a huge library or anything, I would rather have something small and lightweight with a bare minimum of features, but good documentation.
Does anyone have anything they can point me to?
Json isn't a huge language to start with, so libraries for it are likely to be small(er than Xml libraries, at least).
There are a whole ton of C libraries linked at Json.org. Maybe one of them will work well for you.
cJSON has a decent API and is small (2 files, ~700 lines). Many of the other JSON parsers I looked at first were huge... I just want to parse some JSON.
Edit: We've made some improvements to cJSON over the years.
NXJSON is a full-featured yet very small (~400 lines of code) JSON parser with an easy-to-use API:
    const nx_json* json=nx_json_parse_utf8(code);
    printf("hello=%s\n", nx_json_get(json, "hello")->text_value);
    const nx_json* arr=nx_json_get(json, "my-array");
    int i;
    for (i=0; i<arr->length; i++) {
        const nx_json* item=nx_json_item(arr, i);
        printf("arr[%d]=(%d) %ld\n", i, (int)item->type, item->int_value);
    }
    nx_json_free(json);
Jsmn is quite minimalistic and has only two functions to work with.
https://github.com/zserge/jsmn
You can have a look at Jansson
The website states the following:
Jansson is a C library for encoding, decoding and manipulating JSON data. It features:
Simple and intuitive API and data model
Can both encode to and decode from JSON
Comprehensive documentation
No dependencies on other libraries
Full Unicode support (UTF-8)
Extensive test suite
I used JSON-C for a work project and would recommend it. Lightweight and is released with open licensing.
Documentation is included in the distribution. You basically have *_add functions to create JSON objects, equivalent *_put functions to release their memory, and utility functions that convert types and output objects in string representation.
The licensing allows inclusion with your project. We used it in this way, compiling JSON-C as a static library that is linked in with the main build. That way, we don't have to worry about dependencies (other than installing Xcode).
JSON-C also built for us under OS X (x86 Intel) and Linux (x86 Intel) without incident. If your project needs to be portable, this is a good start.
Do you need to parse arbitrary JSON structures, or just data that's specific to your application? If the latter, you can make it a lot lighter and more efficient by not generating any hash table/map structure that maps JSON keys to values; instead, you can store the data directly into struct fields or whatever.
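A minimal sketch of the "store it straight into a struct" idea, assuming a flat document of the shape {"id": 42, "name": "widget"}. The struct device, find_value and parse_device names are invented for this example. It is deliberately naive: it handles no escapes inside strings and no nesting, and it assumes the key text does not appear as a key anywhere else in the input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct device {              /* hypothetical application record */
    long id;
    char name[32];
};

static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
        p++;
    return p;
}

/* Returns a pointer to the first character of the value that follows
   "key":, or NULL if no such key is found. */
static const char *find_value(const char *json, const char *key)
{
    size_t key_len = strlen(key);
    const char *p = json;

    while ((p = strchr(p, '"')) != NULL) {
        p++;                                  /* step past the quote */
        if (strncmp(p, key, key_len) == 0 && p[key_len] == '"') {
            p = skip_ws(p + key_len + 1);
            if (*p == ':')
                return skip_ws(p + 1);
        }
    }
    return NULL;
}

/* Fills *out from json. Returns 0 on success, -1 on any problem. */
int parse_device(const char *json, struct device *out)
{
    const char *v, *close_quote;
    char *end;
    size_t len;

    assert(json != NULL && out != NULL);

    v = find_value(json, "id");
    if (v == NULL)
        return -1;
    out->id = strtol(v, &end, 10);
    if (end == v)                             /* no digits were read */
        return -1;

    v = find_value(json, "name");
    if (v == NULL || *v != '"')
        return -1;
    v++;
    close_quote = strchr(v, '"');
    if (close_quote == NULL)
        return -1;
    len = (size_t)(close_quote - v);
    if (len >= sizeof out->name)              /* would not fit */
        return -1;
    memcpy(out->name, v, len);
    out->name[len] = '\0';
    return 0;
}
```

There is no tree and no heap allocation here. The trade-off is that the code is tied to one document shape, so one of the small libraries above is the safer choice for anything beyond trusted, fixed-format input.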

PICK/BASIC, FlashBASIC, and C Interoperability

I stumbled across some interesting documentation regarding PICK programming:
http://www.d3ref.com/?token=flash.basic
It says FlashBASIC is a compiled, instead of interpreted, version of PICK programs that are interoperable with PICK. This is great. I am curious about how it describes Object code:
converts Pick/BASIC source code into a list of binary instructions
called object code.
Is this object code interoperable with other languages? Or is it limited to the PICK & Universe operating environment? In other words could a C program call a FlashBASIC program?
This is helpful in defining the C version, but I cannot find any clear definition of the FlashBASIC version:
What's an object file in C?
You're asking a few different questions which I'll try to answer.
Here is an article I wrote that might help your understanding of FlashBASIC. In short, where traditional MV BASIC is compiled and then run by assembler, the Flash compiler is C and generates an object module that sits below the standard BASIC object in frame space. At runtime that code is then interpreted by a C runtime. For our purposes here, there is no C interface, this is just an internal mechanism for getting code to run faster.
Note from the above that this is not related to the "What's an object file in C?" topic, because object modules in D3 are stored in D3 frames, completely unrelated to common OS-level object modules.
Now about C calling Pick, in your case D3: you can use the CP library; the docs are in the same area as the link you cited. Rather than binding with the database itself, you can also use your code in a client/server mode with the MVSP library if you're using Managed C (.NET). Or you can use any common web service client mechanism in C and set up D3 as a web service server with a number of technologies including MVST, mv.NET, Java, or C/C++.
I know that response is rather vague but you're asking a question which has been discussed at-length in forums over a period of years. If you ask a more specific question you'll get a specific answer. Feel free to refine your query in a comment and we can focus the answer.
Also note that you tagged this question as "u2". If you are really using the U2 variant of MV/Pick (Universe or Unidata) then the reference to the D3 docs was misleading and none of the above applies, as they do this differently in U2 and there is no FlashBASIC there. I know, you're confused. Let's work it out...
Yep, Flash BASIC just translates to C, is compiled, and resulting object files are dynamically loaded and linked, then run from the Pick OS. The feature of C programs running and interacting with BASIC was certainly possible, but we did not implement that feature.

Parsing source code

I need to parse the source code of different files, each written in a different language, and I would like to do this using C.
To do that, I was thinking of using yacc / lex, but I find them very hard to understand, maybe due to the complete lack of decent documentation (either that, or they really are cryptic).
So my questions are: where can I find some good documentation for yacc / lex, preferably a tutorial style introduction? Or, is there any better way to do this in C? Maybe there's something else I could use instead of yacc / lex, perhaps even written in a different language?
yacc and lex are very powerful tools, built around the theories for compiler construction. To be able to fully understand them you probably need some basics in formal languages, automata theory and compiler construction.
The dragon book is a classic on the subject.
The second half of Kernighan and Pike's The Unix Programming Environment is an extended introduction to programming an interpreter with lex and yacc. The lex coverage is a little light, as they mostly use a custom scanner.
If you like math (the most important clause in this answer), then write your own compiler-compiler, and then write your compiler with that. I did this once because I was getting bored of writing all the functions for all the productions of a compiler which I had started as a recursive-descent compiler, because the available choices in 2004 didn't please me, and because I had free time while job-hunting. I only used the compiler compiler on the one project, and it is not necessarily thoroughly tested, so it is not on github. I was very happy with the grammar file syntax that I devised.
If I had such a need today I might make a different decision. The newer cutting-edge compiler-compilers seem to have changed a lot in the last 8 years.
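For anyone who finds yacc too cryptic, the hand-written recursive-descent style mentioned above (one function per grammar production) can be sketched in a few dozen lines of C. The grammar and all names below are assumptions chosen for illustration: an integer expression evaluator, not a parser for any real language.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

struct parser {
    const char *pos;   /* next unread character */
    int ok;            /* cleared on the first error */
};

static long parse_expr(struct parser *p);

static void skip_spaces(struct parser *p)
{
    while (isspace((unsigned char)*p->pos))
        p->pos++;
}

/* factor := NUMBER | '(' expr ')' */
static long parse_factor(struct parser *p)
{
    long value = 0;

    skip_spaces(p);
    if (*p->pos == '(') {
        p->pos++;
        value = parse_expr(p);
        skip_spaces(p);
        if (*p->pos == ')')
            p->pos++;
        else
            p->ok = 0;                 /* missing closing parenthesis */
    } else if (isdigit((unsigned char)*p->pos)) {
        char *end;
        value = strtol(p->pos, &end, 10);
        p->pos = end;
    } else {
        p->ok = 0;                     /* unexpected character or end */
    }
    return value;
}

/* term := factor (('*' | '/') factor)* */
static long parse_term(struct parser *p)
{
    long value = parse_factor(p);

    for (;;) {
        char op;
        long rhs;

        skip_spaces(p);
        op = *p->pos;
        if (op != '*' && op != '/')
            return value;
        p->pos++;
        rhs = parse_factor(p);
        if (op == '*')
            value *= rhs;
        else if (rhs != 0)
            value /= rhs;
        else
            p->ok = 0;                 /* division by zero */
    }
}

/* expr := term (('+' | '-') term)* */
static long parse_expr(struct parser *p)
{
    long value = parse_term(p);

    for (;;) {
        char op;

        skip_spaces(p);
        op = *p->pos;
        if (op != '+' && op != '-')
            return value;
        p->pos++;
        if (op == '+')
            value += parse_term(p);
        else
            value -= parse_term(p);
    }
}

/* Evaluates text; returns 0 and stores the value on success, -1 on error. */
int evaluate(const char *text, long *result)
{
    struct parser p;
    long value;

    assert(text != NULL && result != NULL);
    p.pos = text;
    p.ok = 1;
    value = parse_expr(&p);
    skip_spaces(&p);
    if (!p.ok || *p.pos != '\0')
        return -1;
    *result = value;
    return 0;
}
```

Operator precedence falls out of which function calls which, and that is also the repetitive part that a parser generator automates once the grammar grows.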

AST from C code [closed]

I want to perform some transformations on C source code. I need a tool on Linux that generates a complete AST from the source code so that I can apply my transformations on this AST and then convert it back to the C source code. I tried ELSA but it is not getting compiled. (I am using Ubuntu 8.4). Can anyone suggest a better tool/application?
I would recommend clang. It has a fairly complete C implementation with most gcc extensions, and the code is very understandable. Their C++ implementation is incomplete, but if you only care about generating ASTs from C code that should be fine. Depending on what you want to do you can either use clang as a library and work with the ASTs directly, or have clang dump them out to console.
See pycparser - a pure-Python AST generator for C.
There are two projects that I'm aware of and that you could find useful:
CIL
Transformers
They both parse standard C source code to allow further analysis and transformation. I've not used them, so you will have to check for yourself whether they fit your needs.
The suggestion of using GCC is also valid, of course. I know there's not much documentation on this aspect of gcc, though.
To get AST XML output you can try to use cscan from MarpaX::Languages::C::AST. The output will look like:
<cscan>
<typedef_hash>
<typedef id="GLenum" before="unsigned int" after="" file="/usr/include/GL/gl.h"/>
...
www.antlr.org
http://ctool.sourceforge.net/
Our DMS Software Reengineering Toolkit has been used on huge C systems, parsing, analyzing, transforming, and regenerating C code. Runs on Windows, and will run on Linux under Wine, but it does handle Linux-style (GCC) C code.
I can't emphasize enough the ability to round-trip the C source code: parse, build trees, transform, regenerate compilable C code with the comments and either prettyprinted or with the original programmer's indentation. Few of the other answers here suggest systems that can do that robustly.
The fact that DMS is designed to carry out program transformations (as opposed to other systems suggested in answers here) is also a great advantage. DMS provides tree-pattern matches and rewrites; it augments this with full control and data flow analysis that can be used to extend the conditions you'd like to match. A tool intended to be a compiler is just that, and you'll have a very hard time persuading it not to be a compiler and instead to be a transformation engine, as the OP requested.
See https://stackoverflow.com/a/2173477/120163 for example ASTs produced by DMS.
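To make the parse / transform / regenerate cycle concrete for readers, here is a toy sketch in C. It is not DMS's API or any real tool's; every name in it is invented for illustration. It shows a tiny expression AST, a bottom-up rewrite that applies the rules e * 1 -> e and e + 0 -> e, and an unparser that regenerates text from the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum kind { NUM, VAR, ADD, MUL };

struct node {
    enum kind kind;
    long value;               /* used by NUM */
    const char *name;         /* used by VAR */
    struct node *lhs, *rhs;   /* used by ADD and MUL */
};

static int is_num(const struct node *n, long v)
{
    return n->kind == NUM && n->value == v;
}

/* Bottom-up rewrite: simplify the children first, then match this node
   against the rules. Returns the node that should replace n. */
struct node *simplify(struct node *n)
{
    assert(n != NULL);
    if (n->kind != ADD && n->kind != MUL)
        return n;
    n->lhs = simplify(n->lhs);
    n->rhs = simplify(n->rhs);
    if (n->kind == MUL && is_num(n->rhs, 1)) return n->lhs;
    if (n->kind == MUL && is_num(n->lhs, 1)) return n->rhs;
    if (n->kind == ADD && is_num(n->rhs, 0)) return n->lhs;
    if (n->kind == ADD && is_num(n->lhs, 0)) return n->rhs;
    return n;
}

/* Appends text to buf if it fits; always advances *len so that the
   caller can detect truncation. */
static void emit(char *buf, size_t size, size_t *len, const char *text)
{
    size_t n = strlen(text);

    if (*len + n < size)
        memcpy(buf + *len, text, n + 1);
    *len += n;
}

static void unparse_into(const struct node *n, char *buf, size_t size,
                         size_t *len)
{
    char num[32];

    switch (n->kind) {
    case NUM:
        snprintf(num, sizeof num, "%ld", n->value);
        emit(buf, size, len, num);
        break;
    case VAR:
        emit(buf, size, len, n->name);
        break;
    case ADD:
    case MUL:
        emit(buf, size, len, "(");
        unparse_into(n->lhs, buf, size, len);
        emit(buf, size, len, n->kind == ADD ? " + " : " * ");
        unparse_into(n->rhs, buf, size, len);
        emit(buf, size, len, ")");
        break;
    }
}

/* Regenerates fully parenthesized text for the tree. Returns the length
   needed; the output was truncated if the result is >= size. */
size_t unparse(const struct node *n, char *buf, size_t size)
{
    size_t len = 0;

    assert(n != NULL && buf != NULL && size > 0);
    buf[0] = '\0';
    unparse_into(n, buf, size, &len);
    return len;
}
```

With a tree for y + ((x * 1) + 0), calling simplify and then unparse yields (y + x). What this toy leaves out is exactly the hard part discussed in this answer: keeping comments, original formatting and preprocessor directives through the round trip.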
I've done small amounts of work on source-to-source transformations and I found CIL to be very powerful for this task. CIL has the advantage of being a framework specifically designed for static source analysis and transformation. It can also process code with any amount of ugly GCC-specific extensions. (It's been used to process the Linux kernel, as one example.) Unfortunately, it is written in OCaml, and analyses/transformations built using it must also be written in OCaml, which might be problematic if you've never used it.
Alternatively, clang is supposed to have a relatively easily-hackable codebase and it can certainly be used to produce C AST's.
You can try generating an AST (Abstract Syntax Tree) using Lex and Yacc on Linux:
lex and yacc
from lex and yacc to ast
"I tried ELSA but it is not getting
compiled. (I am using Ubuntu 8.4)"
The Elkhound and Elsa source code, version 2005.08.22b from scottmcpeak.com/elkhound/ is outdated (old C++ style .h header files).
Elsa is working and part of Oink: http://www.cubewano.org/oink/#Gettingthecode
I have just got it working now under Ubuntu 9.10.
How about taking gcc and writing a custom backend for it? I've never done it nor even worked on gcc source code, so I don't know how hard it would be.
