Examining a C source code easily - c

In Java, just by clicking the classes in Eclipse, I can go to that specific class being referenced. In C, how can I do that? As far as I can remember I wasn't able to do that in Eclipse. I had hard times manually finding where externs are declared/defined etc.

use
ctags
This is command line tool. You need to first create tags for your whole source code and then you can jump to defination of any function or variable.
In eclipce IDE
You can go its defination by using F3 and come back using Alt + <-
If you do not want to use F3 then press Ctrl and move curcer to that place and click on that...you will go to its defination.

How to go to a certain place in the source code is IDE-specific and not related to the programming language, so your question doesn't make any sense.
Many C compilers support Eclipse.
In Eclipse, you do what you are asking for by placing the cursor on the item you are interested in, then press F3.

In addition of the other replies (notably mentionning ctags), and if using a recent GCC compiler (i.e. gcc, g++, preferably version 4.8 or 4.9, etc....) you could use the MELT plugin and DSL.
MELT enables you to work on the internal representations (such as Gimple) of the GCC compiler.
In particular, MELT has powerful pattern matching facilities, and a command line interface to find particular Gimple patterns. So for example you could, with a few command line arguments to gcc, find all the calls to malloc with a constant argument bigger than 30 bytes. This requires working on the compiler internal representations (e.g. because of sizeof operators) and is not possible in a purely textual tool.
For finding occurrences of identifiers, you could use grep or also the ack perl tool.
The tools mentioned here (ctags, grep, MELT, ack) are command line tools. It is up to you to configure or adapt your IDE (or editor like emacs) and/or your builder (like make) to invoke them.
Remember that compilers are command-line tools, at least on Linux
PS. I am the main author of MELT.

Related

gcc generating a list of function and variable names

I am looking for a way to get a list of all the function and variable names for a set of c source files. I know that gcc breaks down those elements when compiling and linking so is there a way to piggyback that process? Or any other tool that could do the same thing?
EDIT: It's mostly because I am curious, I have been playing with things like make auto dependency and graphing include trees and would like to be able to get more stats on the source files. And it seems like something that would already exist but i haven't found any options or flags for it.
If you are only interested by names of global functions and variables, you might (assuming you are on Linux) use the nm or objdump utilities on the ELF binary executable or object files.
Otherwise, you might customize the GCC compiler (assuming you have a recent version, e.g. 5.3 or 6 at least) thru plugins. You could code them directly in C++, or you might consider using GCC MELT, a Lisp-like domain specific language to customize GCC. Perhaps even the findgimple mode of GCC MELT might be enough....
If you consider extending GCC, be aware that you'll need to spend a significant time (perhaps months) understanding its internal representations (notably Generic Trees & Gimple) in details. The links and slides on GCC MELT documentation page might be useful.
Your main issue is that you probably need to understand most of the details about GCC internal representations, and that takes time!
Also, the details of GCC internals are slightly changing from one version of GCC to the next one.
You could also consider (instead of working inside GCC) using the Clang/LLVM framework (but learning that is also a lot of time). Maybe you might also look into Frama-C or Coccinnelle.
Another approach might be to compile with debug info and parse DWARF information.
PS. My point is that your problem is probably much more difficult than what you believe. Parsing C is not that simple ... You might spend months or even years working on that... And details could be target-processor & system & compiler specific...

Using parse_datetime from gnu c

I am developing a program for analyzing time series under gnu/linux. To analyze a time window, I want to be able to specify start/end times on the command line. Parsing dates using strptime is simple enough, however I would like to use the flexible 'natural language' format as it is used by the unix ''date'' command. There, this is done using the parse_datetime function.
I have the source of the coreutils, but would like to avoid copying over the code and all attached header files.
My question is: is there a standard library under Unix/Linux which gives access to the full power of parse_datetime().
The function you refer to is not part of any standard, nor any stock utility library. However, it is available as a semi-standalone component as part of gnulib, namely the parse-datetime module. You will need to take it and incorporate it into your program; the gnulib distribution has tools for that. Be aware that if you do this you have to GPL your entire program (this is not a big deal if the program is only for your personal use -- the GPL's requirements only kick in when you start giving the compiled program to other people).
A possible alternative is g_date_set_parse from GLib, but I can't speak to how clever it is.

Bootstrapping A compiler [duplicate]

I've heard of the idea of bootstrapping a language, that is, writing a compiler/interpreter for the language in itself. I was wondering how this could be accomplished and looked around a bit, and saw someone say that it could only be done by either
writing an initial compiler in a different language.
hand-coding an initial compiler in Assembly, which seems like a special case of the first
To me, neither of these seem to actually be bootstrapping a language in the sense that they both require outside support. Is there a way to actually write a compiler in its own language?
Is there a way to actually write a compiler in its own language?
You have to have some existing language to write your new compiler in. If you were writing a new, say, C++ compiler, you would just write it in C++ and compile it with an existing compiler first. On the other hand, if you were creating a compiler for a new language, let's call it Yazzleof, you would need to write the new compiler in another language first. Generally, this would be another programming language, but it doesn't have to be. It can be assembly, or if necessary, machine code.
If you were going to bootstrap a compiler for Yazzleof, you generally wouldn't write a compiler for the full language initially. Instead you would write a compiler for Yazzle-lite, the smallest possible subset of the Yazzleof (well, a pretty small subset at least). Then in Yazzle-lite, you would write a compiler for the full language. (Obviously this can occur iteratively instead of in one jump.) Because Yazzle-lite is a proper subset of Yazzleof, you now have a compiler which can compile itself.
There is a really good writeup about bootstrapping a compiler from the lowest possible level (which on a modern machine is basically a hex editor), titled Bootstrapping a simple compiler from nothing. It can be found at https://web.archive.org/web/20061108010907/http://www.rano.org/bcompiler.html.
The explanation you've read is correct. There's a discussion of this in Compilers: Principles, Techniques, and Tools (the Dragon Book):
Write a compiler C1 for language X in language Y
Use the compiler C1 to write compiler C2 for language X in language X
Now C2 is a fully self hosting environment.
The way I've heard of is to write an extremely limited compiler in another language, then use that to compile a more complicated version, written in the new language. This second version can then be used to compile itself, and the next version. Each time it is compiled the last version is used.
This is the definition of bootstrapping:
the process of a simple system activating a more complicated system that serves the same purpose.
EDIT: The Wikipedia article on compiler bootstrapping covers the concept better than me.
A super interesting discussion of this is in Unix co-creator Ken Thompson's Turing Award lecture.
He starts off with:
What I am about to describe is one of many "chicken and egg" problems that arise when compilers are written in their own language. In this ease, I will use a specific example from the C compiler.
and proceeds to show how he wrote a version of the Unix C compiler that would always allow him to log in without a password, because the C compiler would recognize the login program and add in special code.
The second pattern is aimed at the C compiler. The replacement code is a Stage I self-reproducing program that inserts both Trojan horses into the compiler. This requires a learning phase as in the Stage II example. First we compile the modified source with the normal C compiler to produce a bugged binary. We install this binary as the official C. We can now remove the bugs from the source of the compiler and the new binary will reinsert the bugs whenever it is compiled. Of course, the login command will remain bugged with no trace in source anywhere.
Check out podcast Software Engineering Radio episode 61 (2007-07-06) which discusses GCC compiler internals, as well as the GCC bootstrapping process.
Donald E. Knuth actually built WEB by writing the compiler in it, and then hand-compiled it to assembly or machine code.
As I understand it, the first Lisp interpreter was bootstrapped by hand-compiling the constructor functions and the token reader. The rest of the interpreter was then read in from source.
You can check for yourself by reading the original McCarthy paper, Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I.
Every example of bootstrapping a language I can think of (C, PyPy) was done after there was a working compiler. You have to start somewhere, and reimplementing a language in itself requires writing a compiler in another language first.
How else would it work? I don't think it's even conceptually possible to do otherwise.
Another alternative is to create a bytecode machine for your language (or use an existing one if it's features aren't very unusual) and write a compiler to bytecode, either in the bytecode, or in your desired language using another intermediate - such as a parser toolkit which outputs the AST as XML, then compile the XML to bytecode using XSLT (or another pattern matching language and tree-based representation). It doesn't remove the dependency on another language, but could mean that more of the bootstrapping work ends up in the final system.
It's the computer science version of the chicken-and-egg paradox. I can't think of a way not to write the initial compiler in assembler or some other language. If it could have been done, I should Lisp could have done it.
Actually, I think Lisp almost qualifies. Check out its Wikipedia entry. According to the article, the Lisp eval function could be implemented on an IBM 704 in machine code, with a complete compiler (written in Lisp itself) coming into being in 1962 at MIT.
Some bootstrapped compilers or systems keep both the source form and the object form in their repository:
ocaml is a language which has both a bytecode interpreter (i.e. a compiler to Ocaml bytecode) and a native compiler (to x86-64 or ARM, etc... assembler). Its svn repository contains both the source code (files */*.{ml,mli}) and the bytecode (file boot/ocamlc) form of the compiler. So when you build it is first using its bytecode (of a previous version of the compiler) to compile itself. Later the freshly compiled bytecode is able to compile the native compiler. So Ocaml svn repository contains both *.ml[i] source files and the boot/ocamlc bytecode file.
The rust compiler downloads (using wget, so you need a working Internet connection) a previous version of its binary to compile itself.
MELT is a Lisp-like language to customize and extend GCC. It is translated to C++ code by a bootstrapped translator. The generated C++ code of the translator is distributed, so the svn repository contains both *.melt source files and melt/generated/*.cc "object" files of the translator.
J.Pitrat's CAIA artificial intelligence system is entirely self-generating. It is available as a collection of thousands of [A-Z]*.c generated files (also with a generated dx.h header file) with a collection of thousands of _[0-9]* data files.
Several Scheme compilers are also bootstrapped. Scheme48, Chicken Scheme, ...

A Windows C compiler that doesn't split arguments in its runtime libraries?

I have heard that in Windows, parameters are passed a single parameter, and then the program splits it into arguments, either in its runtime libraries, or sometimes, in the actual code.
I've heard that most C/C++ compilers do it in runtime libararies (for example, TCC - Tiny C Compiler, which I downloaded)
Are there any C compilers I can download, that don't? Any links to them?
And in such a compiler, would argsv[0] have the whole string?
Added
It's based on what this person (jdedb) said in Super User question Can't pipe or redirect Cygwin grep output, after seeming to suggest that I ask on Stack Overflow.
"It's up to the called program to split the command tail into words, if it wants to operate in Unix (and C language) fashion. (The runtime support libraries of most C and C++ language implementations for Win32 do this splitting behind the scenes."
He said it's the compilers.. But according to Necrolis, it's not the compiler.
(added- Necrolis commented correcting my misreading, compiler!=runtime library)
If you are on Windows, just use GetCommandLine. This is how most CRT wrappers get the command line to split to start with.
As for your actual question, it's not the compiler, but the CRT startup wrapper that they use. If you implement mainCRTstartup, and override the entrypoint with it, you can do whatever you want. A good example of how it works can be seen here.
That "parameter splitting" is the way mandated by the C99 Standard (PDF file) in 5.1.2.2.1.
If an implementation (compiler + library + options) recognizes but does not separate the program name from the other parameters (and parameters from each other) it is not conforming.
Of course, if you use a free-standing implementation none of this applies.

How would I implement something similar to the Objective-C #encode() compiler directive in ANSI C?

The #encode directive returns a const char * which is a coded type descriptor of the various elements of the datatype that was passed in. Example follows:
struct test
{ int ti ;
char tc ;
} ;
printf( "%s", #encode(struct test) ) ;
// returns "{test=ic}"
I could see using sizeof() to determine primitive types - and if it was a full object, I could use the class methods to do introspection.
However, How does it determine each element of an opaque struct?
#Lothars answer might be "cynical", but it's pretty close to the mark, unfortunately. In order to implement something like #encode(), you need a full blown parser in order to extract the the type information. Well, at least for anything other than "trivial" #encode() statements (i.e., #encode(char *)). Modern compilers generally have either two or three main components:
The front end.
The intermediate end (for some compilers).
The back end.
The front end must parse all the source code and basically converts the source code text in to an internal, "machine useable" form.
The back end translates the internal, "machine useable" form in to executable code.
Compilers that have an "intermediate end" typically do so because of some need: they support multiple "front ends", possibly made up of completely different languages. Another reason is to simplify optimization: all the optimization passes work on the same intermediate representation. The gcc compiler suite is an example of a "three stage" compiler. llvm could be considered an "intermediate and back end" stage compiler: The "low level virtual machine" is the intermediate representation, and all the optimization takes place in this form. llvm also able to keep it in this intermediate representation right up until the last second- this allows for "link time optimization". The clang compiler is really a "front end" that (effectively) outputs llvm intermediate representation.
So, if you want to add #encode() functionality to an 'existing' compiler, you'd probably have to do it as a "source to source" 'compiler / preprocessor'. This was how the original Objective-C and C++ compilers were written- they parsed the input source text and converted it to "plain C" which was then fed in to the standard C compiler. There's a few ways to do this:
Roll your own
Use yacc and lex to put together a ANSI-C parser. You'll need a grammar- ANSI C grammar (Yacc) is a good start. Actually, to be clear, when I say yacc, I really mean bison and flex. And also, loosely, the other various yacc and lex like C-based tools: lemon, dparser, etc...
Use perl with Yapp or EYapp, which are pseudo-yacc clones in perl. Probably better for quickly prototyping an idea compared to C-based yacc and lex- it's perl after all: Regular expressions, associative arrays, no memory management, etc.
Build your parser with Antlr. I don't have any experience with this tool chain, but it's another "compiler compiler" tool that (seems) to be geared more towards java developers. There appears to be freely available C and Objective-C grammars available.
Hack another tool
Note: I have no personal experience using any of these tools to do anything like adding #encode(), but I suspect they would be a big help.
CIL - No personal experience with this tool, but designed for parsing C source code and then "doing stuff" with it. From what I can glean from the docs, this tool should allow you to extract the type information you'd need.
Sparse - Worth looking at, but not sure.
clang - Haven't used it for this purpose, but allegedly one of the goals was to make it "easily hackable" for just this sort of stuff. Particularly (and again, no personal experience) in doing the "heavy lifting" of all the parsing, letting you concentrate on the "interesting" part, which in this case would be extracting context and syntax sensitive type information, and then convert that in to a plain C string.
gcc Plugins - Plugins are a gcc 4.5 (which is the current alpha/beta version of the compiler) feature and "might" allow you to easily hook in to the compiler to extract the type information you'd need. No idea if the plugin architecture allows for this kind of thing.
Others
Coccinelle - Bookmarked this recently to "look at later". This "might" be able to do what you want, and "might" be able to do it with out much effort.
MetaC - Bookmarked this one recently too. No idea how useful this would be.
mygcc - "Might" do what you want. It's an interesting idea, but it's not directly applicable to what you want. From the web page: "Mygcc allows programmers to add their own checks that take into account syntax, control flow, and data flow information."
Links.
CocoaDev Objective-C Parsing - Worth looking at. Has some links to lexers and grammars.
Edit #1, the bonus links.
#Lothar makes a good point in his comment. I had actually intended to include lcc, but it looks like it got lost along the way.
lcc - The lcc C compiler. This is a C compiler that is particularly small, at least in terms of source code size. It also has a book, which I highly recommend.
tcc - The tcc C compiler. Not quite as pedagogical as lcc, but definitely still worth looking at.
poc - The poc Objective-C compiler. This is a "source to source" Objective-C compiler. It parses the Objective-C source code and emits C source code, which it then passes to gcc (well, usually gcc). Has a number of Objective-C extensions / features that aren't available in gcc. Definitely worth looking at.
You would implement this by implementing the ANSI C compiler first and then add some implementation specific pragmas and functions to it.
Yes i know this is cynical answer and i accept the downvotes.
One way to do it would be to write a preprocessor, which reads the source code for the type definitions and also replaces #encode... with the corresponding string literal.
Another approach, if your program is compiled with -g, would be to write a function that reads the type definition from the program's debug information at run-time, or use gdb or another program to read it for you and then reformat it as desired. The gdb ptype command can be used to print the definition of a particular type (or if that is insufficient there is also maint print type, which is sure to print far more information than you could possibly want).
If you are using a compiler that supports plugins (e.g. GCC 4.5), it may also be possible to write a compiler plugin for this. Your plugin could then take advantage of the type information that the compiler has already parsed. Obviously this approach would be very compiler-specific.

Resources