Is it possible somehow using C macros to make prefix notation and/or Lisp syntax? For example, I want to write (f a b) instead of f(a, b); for C compiler.
Just for fun!
In the Lisp language, especially in the ANSI-Specified Common Lisp dialect, the manner in which processing of code occurs is much different from that of a C/C++ compiler, and this allows for Common Lisp macros to do -- and furthermore, to be -- something completely different from macros in other, non-Lisp languages.
Lisp interpretation has three stages. First is the read phase, during which code is read and certain defined characters are expanded into Common Lisp code.
1) Read Time -- during this stage, all defined, dispatched, etcetera, macro characters are expanded into common lisp forms/code to be evaluated later.
2) Compile Time -- during this stage, all definitions take place and procedures are stored into memory in a (perhaps) specified namespace with given names by which to reference them during:
3) Run Time -- during this time, the program is essentially running, and all that is left is to call the procedures constructed in phases 1 & 2 inside the Lisp REPL.
Interpreted languages are much more likely to support the insane customization options Common Lisp's macro definition system affords us. For example, I can quite easily change my REPL such that if I feed it the following form:
CL_USER> (progn (sleep 10) (format *standard-output* "~%~%10 seconds have passed and the universal timestamp is now ~a~%~%" (get-universal-time))&
...I may have defined the character #\& as to take the expression it closes, write it into a lambda function, and put that function into a threaded process, giving the REPL/prompt back to the user immediately and 10 seconds later (allowing interpretation the entire time) format a short message to the standard output.
Unfortunately, C & C++ just aren't really built for this sort of customization.
Whatever it is that you're doing, I imagine the answer to be "do the entire thing in Common Lisp" quite honestly, and I don't say that out of bias or elitism, but out of simple experience and the benefit of years of observation.
While I'm close to it, let me just call it what it is and end on absolute subjective opinion: I've observed that all programmers I respect and who are doing work that is worthy of respect in the hacker community eventually end up using Common Lisp as their primary mode of operation.
Is it possible somehow using C macros to make prefix notation and/or
Lisp syntax?
Probably.
For example, I want to write (f a b) instead of f(a, b); for C
compiler
No, you do not want to do that.
Just for fun!
Enjoy The International Obfuscated C Code Contest
Related
I tried to generate C code starting from a scheme function and I do not manage to find any translator from scheme to C. I tried to convert this function to C.
(define f
(lambda(n)
(if (= n 0) 1
(* n (f (- n 1))))))
(display (f 10))
(newline)
I tried to use gambit (gsc) and it generates a C file that looks merely like a file to load in some interpreter, not a file containing a main function that can be executed.
Is there some application that generates C code that can be directly executed? The functions from standard scheme library like display should be linked with some object file.
EDIT:
My purpose is to understand the algorithms used by professional translators.
There are many such translators, dating back at least to the 1980s I think CHICKEN is a good current one.
If you want to use that:
get CHICKEN;
build & install it with the appropriate make incantation (this was painless for me on OSX, it should be very painless indeed on Linux therefore, although it may be harder on Windows);
stash your code in a file I'll call f.scm.
if you want to see the C code, compile with chicken f.scm which will produce a few hundred lines of incomprehensible C;
if you want just the executable, use csc to create it.
There is an extensive manual which you will need to read if you want to do anything nontrivial, such as linking in C libraries or talking to Scheme code from C.
Without knowing what you are after, this smells as if it may be an XY problem. In particular:
if you want a Scheme system which will allow you talk to code written in C, then you probably want a system with an FFI, not one that compiles to C;
if you want a Scheme system which will create native executables, then you probably want, well, a Scheme system which will create native executables, not one which compiles to C.
There are many examples of each of these. Some of these systems may also compile to, or via, C, but one does not depend on the other.
Finally, if you want to understand how Scheme compilers which target C work (or how Scheme compilers which target any language, including assembler), then the traditional approach probably still works best: find a well-written one for which source is available, and read & tinker with its source code.
Basically no scheme to C translators will do what you want. They create hideous code not meant to be read and they rely on the underlying C compiler to do much of the optimization. Chicken and Gambit make use of header files while I have Stalin, which does not but it is based on R4RS instead of R5RS and later.
You are probably better off reading Abdulaziz Ghuloum's paper An Incremental Approach to Compiler Construction (PDF) or perhaps Matt Mights articles on parsing, continuations and compilations. Longer down he actually has a Scheme to C and Scheme to Java with different approaches to closure conventions. In the end nothing beats doing it yourself so have a go!
I got a question from my fellow student friend about why +-/* don't need math.h library to work in C language.
<math.h> contains macro and function definitions for mathematical operations. Some of the functionality in <math.h> is required to be present according to the C Standard, but they still aren't intrinsically part of the grammar of the language, unlike the operators +, -, *, / and %.
Because they are in the standard of C and they are only one instruction in Assembly language. math.h is only the name of the library. That doesn't mean there are no math if you don't include it.
If you look at C Operators, notice they are all fairly simple operations that can be done on numbers and values without the need of a function call (sqrt()). These are part of the C standard and are a basic part of the language, present by default in every program.
The math.h Library contains far more complex mathematical operations, mostly in functions, not small assembly instructions. These do not need to be included in the language because not every program is going to need a square root or a cosine.
Basic operators are part of the grammar of the language. In a lib there are "higher functions" that are composed out of basic operators or other libs. So you can reduce everythink back to the basic constructs of a language ... certainly.
Arithmetic operators are built into the language grammar - they're not separate library calls like sqrt() or abs() or whatever. So, they don't need to have any sort of declaration in scope in order to function.
Primarily, the reason math.h is needed for some operations and not others is that the people who designed C decided to build some things into the core language and to keep some things in separate sets, including a set of things for math, a set of things for strings, a set of things for time, a set of things for input and output, and so on.
It would be possible to build the things in math.h into the core language. For example, sizeof is built into the language, so building sqrt into the language too would not require any change of grammar. Also, it would be theoretically possible to exclude some operations like * from the core language and require you to include math.h before using them. However, the language provides ways for declaring functions like sqrt but does not provide ways for declaring operators like *, so some changes to the grammar would have to be made to support this.
So, since it is possible the core language could include or exclude various things, then the reasons for various things being included or excluded are somewhat a matter of choice. Essentially, the basic arithmetic operations were considered fundamental and very useful, so they were made part of the core language, while other functions were not. There are various factors contributing to this.
One is a desire to avoid cluttering the language. If all of the functions declared in headers were part of the core language, then sqrt could be used only for the sqrt in math.h. A programmer could not use sqrt for their own variable name. This is fine for a few names, but, as the library grows, the chance there will be collisions between a name in a library and a name in regular source code grows.
Additionally, if there is existing source code and somebody has a bright idea for a new routine, adding the new routine name to the language might break existing code that is already using that name for a different purpose.
So, generally, we prefer to implement non-essential routines in separate sets, and then authors can choose to include the ones they want to use and learn, and they can leave out the ones they do not need and avoid problems.
Partitioning the libraries into sets like this also means that library routines not used by a program do not have to be linked into the final program executable, so the executable file can be smaller.
Additionally, it means C can be used in a variety of environments, such as a small machine that is not able to support the full math library. Somebody might want to run simple programs that just work with basic arithmetic on a small processor. If the core language of C is small, they can write such programs. If every C program had to include all of the routines on the libraries, it might not be possible to get C working on very small computers.
I need to do some source-to-source manipulations in Linux kernel. I tried to use clang for this purpose but there is a problem. Clang does preprocessing of the source code, i.e. macro and include expansion. This causes clang to sometimes produce broken C code in terms of Linux kernel. I can't maintain all the changes manually, since I expect to have thousands of changes per single file.
I tried ANTLR, but the public grammars available are incomplete and not suitable for such projects as Linux kernel.
So my question is the following. Are there any ways to perform source-to-source manipulations for a C code without preprocessing it?
So assume following code.
#define AAA 1
void f1(int a){
if(a == AAA)
printf("hello");
}
After applying source-to-source manipulation I want to get this
#define AAA 1
void f1(int a){
if(functionCall(a == AAA))
printf("hello");
}
But Clang, for instance, produces following code which does not fit my requirements, i.e. it expands macro AAA
#define AAA 1
void f1(int a){
if(functionCall(a == 1))
printf("hello");
}
I hope I was clear enough.
Edit
The above code is only an example. The source-to-source manipulations I want to do are not restricted with if() statement substitution, but also inserting unary operator in front of expression, replace arithmetic expression with its positive or negative value, etc.
Solution
There is one solution I found for my self. I use gcc in order to produce preprocessed source code and then apply Clang. Then I don't have any issues with macro expansion and includes, since that job is done by gcc. Thanks for the answers!
You may consider http://coccinelle.lip6.fr/ : it provides a nice semantics patching framwork.
An idea would be to replace all occurrences of
if(a == AAA)
with
if(functionCall(a == AAA))
You can do this easily using, e.g., the sed tool.
If you have a finite collection of patterns to be replaced you can write a sed script to perform the substitution.
Would this solve your problem?
Handling the preprocessor is one of the most difficult problems in applying transformations to C (and C++) code.
Our DMS Software Reengineering Toolkit with its C Front End come relatively close to doing this. DMS can parse C source code, preserving most preprocessor conditionals, macro defintions and uses.
It does so by allow preprocessor actions in "well-structured" places. Examples: #defines are allowed where declarations or statements can occur, macro calls and conditionals as replacements for many of the nonterminals in the language (e.g., function head, expression, statement, declarations) and in many non-structured places that people commonly place them (e.g, #if fooif (...) {#endif). It parses the source code and preprocessor directives as if they were part of one language (they ARE, its called "C"), and builds corresponding ASTs, which can be transformed and will regenerate correctly with the captured preprocessor directives. [This level of capability handles OP's example perfectly.]
Some directives are poorly placed (both in the syntax sense, e.g., across multiple fragments of the language, and the "you've got to be kidding" understandability sense). These DMS handles by expanding them away, with some guidance from the advance engineer ("alway expand this macro"). A less satisfactory approach is to hand-convert the unstructured preprocessor conditionals/macro calls into structured ones; this is a bit painful but more workable than one might expect since the bad cases occur with considerably less frequency than the good ones.
To do better than this, one needs to have symbol tables and flow analysis that take into account the preprocessor conditions, and capture all the preprocessor conditionals. We've done some experimental work with DMS to capture conditional declarations in the symbol table (seems to work fine), and we're just starting work on a scheme for the latter.
Not easy being green.
Clang maintains extremely accurate information about the original source code.
Most notably, the SourceManager is able to tell if a given token has been expanded from a macro or written as is, and Chandler Caruth recently implemented macro diagnosis which are able to display the actual macro expansion stack (at the various stages of expansions) tracing back to the actual written code (3.0).
Therefore, it is possible to use the generated AST and then rewrite the source code with all its macros still in place. You would have to query virtually every node to know whether it comes from a macro expansion or not, and if it does retrieve the original code of the expansion, but still it seems possible.
There is a rewriter module in Clang
You can dig up Chandler's code on the macro diagnosis stack
So I guess you should have all you need :) (And hope so because I won't be able to help much more :p)
I would advise to resort to Rose framework. Source is available on github.
This is a more theoretical question about macros (I think). I know macros take source code and produce object code without evaluating it, enabling programmers to create more versatile syntactic structures. If I had to classify these two macro systems, I'd say there was the "C style" macro and the "Lisp style" macro.
It seems that debugging macros can be a bit tricky because at runtime, the code that is actually running differs from the source.
How does the debugger keep track of the execution of the program in terms of the preprocessed source code? Is there a special "debug mode" that must be set to capture extra data about the macro?
In C, I can understand that you'd set a compile time switch for debugging, but how would an interpreted language, such as some forms of Lisp, do it?
Apologize for not trying this out, but the lisp toolchain requires more time than I have to spend to figure out.
I don't think there's a fundamental difference in "C style" and "Lisp style" macros in how they're compiled. Both transform the source before the compiler-proper sees it. The big difference is that C's macros use the C preprocessor (a weaker secondary language that's mostly for simple string substitution), while Lisp's macros are written in Lisp itself (and hence can do anything at all).
(As an aside: I haven't seen a non-compiled Lisp in a while ... certainly not since the turn of the century. But if anything, being interpreted would seem to make the macro debugging problem easier, not harder, since you have more information around.)
I agree with Michael: I haven't seen a debugger for C that handles macros at all. Code that uses macros gets transformed before anything happens. The "debug" mode for compiling C code generally just means it stores functions, types, variables, filenames, and such -- I don't think any of them store information about macros.
For debugging programs that use
macros, Lisp is pretty much the same
as C here: your debugger sees the
compiled code, not the macro
application. Typically macros are
kept simple, and debugged
independently before use, to avoid
the need for this, just like C.
For debugging the macros
themselves, before you go and use it somewhere, Lisp does have features
that make this easier than in C,
e.g., the repl and
macroexpand-1 (though in C
there is obviously a way to
macroexpand an entire file, fully, at
once). You can see the
before-and-after of a macroexpansion,
right in your editor, when you write
it.
I can't remember any time I ran across a situation where debugging into a macro definition itself would have been useful. Either it's a bug in the macro definition, in which case macroexpand-1 isolates the problem immediately, or it's a bug below that, in which case the normal debugging facilities work fine and I don't care that a macroexpansion occurred between two frames of my call stack.
In LispWorks developers can use the Stepper tool.
LispWorks provides a stepper, where one can step through the full macro expansion process.
You should really look into the kind of support that Racket has for debugging code with macros. This support has two aspects, as Ken mentions. On one hand there is the issue of debugging macros: in Common Lisp the best way to do that is to just expand macro forms manually. With CPP the situation is similar but more primitive -- you'd run the code through only the CPP expansion and inspect the result. However, both of these are insufficient for more involved macros, and this was the motivation for having a macro debugger in Racket -- it shows you the syntax expansion steps one by one, with additional gui-based indications for things like bound identifiers etc.
On the side of using macros, Racket has always been more advanced than other Scheme and Lisp implementations. The idea is that each expression (as a syntactic object) is the code plus additional data that contains its source location. This way when a form is a macro, the expanded code that has parts coming from the macro will have the correct source location -- from the definition of the macro rather than from its use (where the forms are not really present). Some Scheme and Lisp implementations will implement a limited for of this using the identity of subforms, as dmitry-vk mentioned.
I don't know about lisp macros (which I suspect are probably quite different than C macros) or debugging, but many - probably most - C/C++ debuggers do not handle source-level debugging of C preprocessor macros particularly well.
Generally, C/C++ debuggers they don't 'step' into the macro definition. If a macro expands into multiple statements, then the debugger will usually just stay on the same source line (where the macro is invoked) for each debugger 'step' operation.
This can make debugging macros a little more painful than they might otherwise be - yet another reason to avoid them in C/C++. If a macro is misbehaving in a truly mysterious way, I'll drop into assembly mode to debug it or expand the macro (either manually or using the compiler's switch). It's pretty rare that you have to go to that extreme; if you're writing macros that are that complicated, you're probably taking the wrong approach.
Usually in C source-level debugging has line granularity ("next" command) or instruction-level granularity ("step into"). Macro processors insert special directives into processed source that allow compiler to map compiled sequences of CPU instructions to source code lines.
In Lisp there exists no convention between macros and compiler to track source code to compiled code mapping, so it is not always possible to do single-stepping in source code.
Obvious option is to do single stepping in macroexpanded code. Compiler already sees final, expanded, version of code and can track source code to machine code mapping.
Other option is to use the fact that lisp expressions during manipulation have identity. If the macro is simple and just does destructuring and pasting code into template then some expressions of expanded code will be identical (with respect to EQ comparison) to expressions that were read from source code. In this case compiler can map some expressions from expanded code to source code.
The simple answer is that it is complicated ;-) There are several different things that contribute to being able to debug a program, and even more for tracking macros.
In C and C++, the preprocessor is used to expand macros and includes into actual source code. The originating filenames and line numbers are tracked in this expanded source file using #line directives.
http://msdn.microsoft.com/en-us/library/b5w2czay(VS.80).aspx
When a C or C++ program is compiled with debugging enabled, the assembler generates additional information in the object file that tracks source lines, symbol names, type descriptors, etc.
http://sources.redhat.com/gdb/onlinedocs/stabs.html
The operating system has features that make it possible for a debugger to attach to a process and control the process execution; pausing, single stepping, etc.
When a debugger is attached to the program, it translates the process stack and program counter back into symbolic form by looking up the meaning of program addresses in the debugging information.
Dynamic languages typically execute in a virtual machine, whether it is an interpreter or a bytecode VM. It is the VM that provides hooks to allow a debugger to control program flow and inspect program state.
Is it legal to call a C compiler written in C or a PHP interpreter written in PHP metacircular? Is this definition valid only for languages of a specific type, like Lisp? In short, what are the conditions that an interpreter should satisfy for being called Metacircular?
A metacircular interpreter is an interpreter written in a (possibly more basic) implementation of the same language. This is usually done to experiment with adding new features to a language, or creating a different dialect.
The reason this process is associated with Lisp is because of the highly lucid paper "The Art of the Interpreter", which shows several metacircular interpreters based on Scheme. (The paper is the kernel for the book SICP, and its fourth chapter works through others that create e.g. a lazily-evaluated Scheme.)
This is also vastly easier to do in a "homoiconic" language (a language whose code can be manipulated as data at runtime), such as Lisp, Prolog, and Forth.
As to your direct question - the C compiler wouldn't be an interpreter at all. A compiler written in its own language is 'self-hosting', which is a similar property, but more related to bootstrapping. A PHP interpreter in PHP probably wouldn't count, since you would likely be re-implementing a nontrivial amount of the language in the process. The major benefit of a conventional metacircular interpreter is that doing so isn't necessary - you can plug in the existing parser, garbage collection (if any), etc., and just write a top-level evaluator with different semantics. In Scheme or Prolog, it's often less than a page of code.
Here is a definition from the wikipedia page for metacircular:
A meta-circular evaluator is a special
case of a self-interpreter in which
the existing facilities of the parent
interpreter are directly applied to
the source code being interpreted,
without any need for additional
implementation.
So the answer is no in both cases:
A C compiler is not an interpreter (evaluator). It translates a program from one form to another without executing it.
A (hypothetical) PHP interpreter written in PHP would be a self interpreter, but not necessarily metacircular.
To complement the above answers: http://www.c2.com/cgi/wiki?MetaCircularEvaluator
Lisp written in Lisp implements "eval" by calling "eval". But there is
no "eval" in many other languages (and if there is, it has different
semantics), so instead a completely new language system would have to
be written, one which gives a detailed algorithm for "eval" -- which
was not necessary in the metacircular case. And that is the magic of
MetaCircularEvaluators: they reflect an underlying magic of the
languages in which they are possible.
As i understand it, a metacircular interpreter is an interpreter that can interpret itself.
A compiler only translates code, and doesn't execute it.
Any Turing-complete language is mathematically able to emulate any logical computation, so here's an example using Python. Instead of using CPython to translate this code to CPU instructions and execute it, you could also use PyPy. The latter is bootstrapped, so fulfills some arbitrary criterion that some people use to define a metacircular interpreter.
"""
Metacircular Python interpreter with macro feature.
By Cees Timmerman, 14aug13.
"""
import re
def meta_python_exec(code):
# Optional meta feature.
re_macros = re.compile("^#define (\S+) ([^\r\n]+)", re.MULTILINE)
macros = re_macros.findall(code)
code = re_macros.sub("", code)
for m in macros:
code = code.replace(m[0], m[1])
# Run the code.
exec(code)
if __name__ == "__main__":
#code = open("metacircular_overflow.py", "r").read() # Causes a stack overflow in Python 3.2.3, but simply raises "RuntimeError: maximum recursion depth exceeded while calling a Python object" in Python 2.7.3.
code = "#define 1 2\r\nprint(1 + 1)"
meta_python_exec(code)
A C compiler written in C is not a MetaCircularEvaluator, because the
compiler must specify extremely detailed and precise semantics for
each and every construct. The fact that the compiler is written in the
target language does not help at all; the same algorithms could be
translated into Pascal or Java or Ada or Cobol, and it would still be
a perfectly good C compiler.
By contrast, a MetaCircularInterpreter
for Lisp can't be translated into a non-Lisp language. That's right,
cannot be -- at least, not in any simple one-to one fashion. Lisp
written in Lisp implements "eval" by calling "eval". But there is no
"eval" in many other languages (and if there is, it has different
semantics), so instead a completely new language system would have to
be written, one which gives a detailed algorithm for "eval" -- which
was not necessary in the metacircular case.
And that is the magic of
MetaCircularEvaluators: they reflect an underlying magic of the
languages in which they are possible.