Pascal to C converter [closed] - c

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I'm writing a program that translates Pascal to C and need some help. I started with the scanner generator Flex. I defined some rules and created a scanner that works more or less OK. It breaks Pascal source into tokens; for now it only prints what it finds. But I have no idea what to do next. Are there any articles or books covering this subject? What is the next step?

Why do you want to write such a Pascal to C converter?
If you just want to run some Pascal programs, it is simpler to use (or improve) existing compilers like gpc, or Pascal to C translators such as p2c.
If you want to convert hand-written Pascal code to human-readable (and improvable) C code, the task is much more difficult; in particular, you probably want to preserve the indentation and the comments, and keep the same names as much as possible while avoiding clashes with system names.
Either way you will want to parse the source into some abstract syntax tree, but the precise nature of these trees is different. Perhaps flex + bison or even ANTLR may or may not be adequate (you can always write a parser by hand). Also, error recovery may or may not be important to you (aborting on the first syntax error is very easy; trying to make sense of ill-written, syntactically incorrect Pascal source is quite hard).
If you want to build a toy Pascal compiler, consider using LLVM (or perhaps even the GCC middle-end and back-ends).

You might want to take a look at "Translating Between Programming Languages Using A Canonical Representation And Attribute Grammar Inversion" and references therein.

The most common approach would be to build a parse tree in your front end, and then walk through that tree outputting the equivalent C in the back end. This gives you the flexibility to perform any reordering of declarations that's required (IIRC Pascal supports use before declaration, but C doesn't). If you're using flex for the scanner, tradition would dictate using bison for the parser, although there are alternatives. If you look, you can probably find a freely available Pascal syntax in the format expected by bison.
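As a rough illustration of the "walk the tree and output C" step, here is a minimal sketch. The node layout, the names `Node` and `emit_c`, and the choice to fully parenthesise every binary operation are all assumptions made for this example; a real translator would cover statements, declarations and types as well.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, minimal AST for Pascal expressions and assignments. */
typedef enum { NODE_NUM, NODE_VAR, NODE_BINOP, NODE_ASSIGN } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;               /* used by NODE_NUM */
    const char *name;        /* NODE_VAR: identifier; NODE_BINOP: C operator text */
    const struct Node *lhs;  /* used by NODE_BINOP and NODE_ASSIGN */
    const struct Node *rhs;
} Node;

/* Append the C text for n to the NUL-terminated string in out (capacity cap).
   Binary operations are always parenthesised, so Pascal precedence never
   has to be re-derived when emitting. */
void emit_c(const Node *n, char *out, size_t cap)
{
    size_t len = strlen(out);
    assert(n != NULL && len < cap);
    switch (n->kind) {
    case NODE_NUM:
        snprintf(out + len, cap - len, "%d", n->value);
        break;
    case NODE_VAR:
        snprintf(out + len, cap - len, "%s", n->name);
        break;
    case NODE_BINOP:
        snprintf(out + len, cap - len, "(");
        emit_c(n->lhs, out, cap);
        len = strlen(out);
        snprintf(out + len, cap - len, " %s ", n->name);
        emit_c(n->rhs, out, cap);
        len = strlen(out);
        snprintf(out + len, cap - len, ")");
        break;
    case NODE_ASSIGN:
        emit_c(n->lhs, out, cap);
        len = strlen(out);
        snprintf(out + len, cap - len, " = ");
        emit_c(n->rhs, out, cap);
        len = strlen(out);
        snprintf(out + len, cap - len, ";");
        break;
    }
}
```

Building the tree for the Pascal statement `a := b + 1` and calling `emit_c` on an empty buffer yields the C text `a = (b + 1);`. Because the output is produced from the tree rather than from the token stream, declarations can be collected and reordered before anything is written.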

You have to know the Pascal grammar and the C grammar, and build (design) a "something" (i.e. a grammar or an automaton...) that can translate every Pascal rule into the corresponding C rule.
Then, once you have your token stream, you can use a method such as LR parsing to find the parse tree that corresponds to the sequence of Pascal rules applied, and convert each rule into the corresponding C rule (this can easily be done with Bison).
Note that neither Pascal nor C has a purely context-free grammar, so some extra checking will be necessary.
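The simplest kind of rule-to-rule translation is a token mapping. The sketch below is only an illustration: the table and the name `pascal_op_to_c` are made up for this example, and the mapping is deliberately naive. It ignores that Pascal keywords are case-insensitive, that `and`/`or` on integers are bitwise in Pascal, and that `div` maps to `/` only when both operands are integers, which is exactly the kind of extra, type-dependent checking mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *pascal;
    const char *c;
} OpMap;

/* Naive operator table; assumes boolean operands for and/or/not
   and integer operands for div/mod. */
static const OpMap op_table[] = {
    { ":=",  "="  }, { "=",   "==" }, { "<>",  "!=" },
    { "and", "&&" }, { "or",  "||" }, { "not", "!"  },
    { "div", "/"  }, { "mod", "%"  },
};

/* Return the C spelling of a Pascal operator token, or the token
   itself when no mapping is needed (e.g. "+", "<="). */
const char *pascal_op_to_c(const char *tok)
{
    assert(tok != NULL);
    for (size_t i = 0; i < sizeof op_table / sizeof op_table[0]; i++) {
        if (strcmp(op_table[i].pascal, tok) == 0)
            return op_table[i].c;
    }
    return tok;
}
```

A table like this is enough for operators, but not for constructs whose translation depends on declarations seen earlier, which is why a tree plus a symbol table is usually needed.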

What kind of lexer/parser was used in the very first C compiler? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
In the early 1970s, Dennis Ritchie wrote the very first C compiler.
In the year 2017, I wanted to write a C compiler. Books like Deep C Secrets (Peter Van Der Linden) say that C was, above all else, designed to be easy to compile. But I've been having an inordinate amount of trouble with it.
For starters, it's already relatively difficult to come up with Lex/Yacc specifications for the C language, and these tools didn't even exist yet when Ritchie made his compiler!
Plus, there are a great many examples of surprisingly small C compilers that do not use any help from Lex & Yacc. (Check out this tiny obfuscated C compiler from Fabrice Bellard. Note that his "production" tinycc source is actually quite a bit longer, most likely in an effort to accommodate more architectures, and to be more readable)
So what am I missing here? What kind of lexer/parser did Ritchie use in his compiler? Is there some easier way of writing compilers that I just haven't stumbled onto?
Yacc's name is an abbreviation for "yet another compiler compiler", which strongly suggests that it was neither the first nor the second such tool.
Indeed, the Wikipedia article on History of Compiler Construction notes that
In the early 1960s, Robert McClure at Texas Instruments invented a compiler-compiler called TMG, the name taken from "transmogrification". In the following years TMG was ported to several UNIVAC and IBM mainframe computers.
…
Not long after Ken Thompson wrote the first version of Unix for the PDP-7 in 1969, Doug McIlroy created the new system's first higher-level language: an implementation of McClure's TMG. TMG was also the compiler definition tool used by Ken Thompson to write the compiler for the B language on his PDP-7 in 1970. B was the immediate ancestor of C.
That's not quite an answer to your question, but it provides some possibilities.
Original answer:
I wouldn't be at all surprised if Ritchie just banged together a hand-built top-down or operator precedence parser. The techniques were well-known, and the original C language presented few challenges. But parser generating tools definitely existed.
Postscript:
A comment on the OP by Alexey Frunze points to this early version of the C compiler. It's basically a recursive-descent top-down parser, up to the point where expressions need to be parsed at which point it uses a shunting-yard-like operator precedence grammar. (See the function tree in the first source file for the expression parser.) This style of starting with a top-down algorithm and switching to a bottom-up algorithm (such as operator-precedence) when needed is sometimes called "left corner" (LC) parsing.
So that's basically the architecture which I said wouldn't surprise me, and it didn't :).
It's worth noting that the compiler unearthed by Alexey (and also by @Torek in a comment to this post) does not handle anything close to what we generally consider the C language these days. In particular, it handles only a small subset of the declaration syntax (no structs or unions, for example), which is probably the most complicated part of the K&R C grammar. So it does not answer your question about how to produce a "simple" parser for C.
C is (mostly) parseable with an LALR(1) grammar, although you need to implement some version of the "lexer hack" in order to correctly parse cast expressions. The input to the parser (translation phase 7) will be a stream of tokens produced by the preprocessing code (translation phase 4, probably incorporating phases 5 and 6), which itself may draw upon a (f)lex tokenizer (phase 3) whose input will have been sanitized in some fashion according to phases 1 and 2. (See § 5.1.1.2 for a precise definition of the phases).
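The "lexer hack" amounts to letting the lexer ask a symbol table whether an identifier currently names a type, so that `(T)*p` can be parsed as a cast when `T` is a typedef name and as a multiplication otherwise. The sketch below shows only that feedback loop. The token codes and the function names are hypothetical, and it ignores scoping entirely (a real implementation has to handle block scope and shadowing).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes; a bison grammar would define its own. */
enum { TOK_IDENTIFIER, TOK_TYPE_NAME };

#define MAX_TYPEDEFS 64
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count;

/* Called by the parser when it has recognised a typedef declaration.
   Returns 1 on success, 0 if the table is full. */
int register_typedef(const char *name)
{
    assert(name != NULL);
    if (typedef_count == MAX_TYPEDEFS)
        return 0;
    typedef_names[typedef_count++] = name;
    return 1;
}

/* Called by the lexer for every identifier-shaped lexeme. */
int classify_identifier(const char *lexeme)
{
    assert(lexeme != NULL);
    for (int i = 0; i < typedef_count; i++) {
        if (strcmp(typedef_names[i], lexeme) == 0)
            return TOK_TYPE_NAME;
    }
    return TOK_IDENTIFIER;
}
```

The important design point is the direction of the data flow: the parser writes to the table, the lexer reads from it, so the lexer must not run far ahead of the parser.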
Sadly, (f)lex was not designed to be part of a pipeline; it really wants to handle the task of reading the source itself. However, flex can be convinced to let you provide chunks of input by redefining the YY_INPUT macro. Handling trigraphs (if you choose to do that) and line continuations can be done using a simple state machine; it's convenient that these transformations only shrink the input, simplifying handling of the maximum input length parameter to YY_INPUT. (Don't provide input one character at a time as suggested by the example in the flex manual.)
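As an illustration of why "only shrink the input" is convenient, here is a sketch of line-continuation removal (translation phase 2) done in place on a buffer. The function name `splice_lines` is made up, and the sketch assumes the whole chunk is available at once: a backslash that happens to be the last byte of a chunk, with its newline arriving in the next chunk, is left alone here, and that is the case where the state machine has to carry state between calls.

```c
#include <assert.h>
#include <stddef.h>

/* Remove every backslash-newline pair from buf[0..len) in place and
   return the new length. The output never overtakes the input, so no
   second buffer is needed and the result always fits. */
size_t splice_lines(char *buf, size_t len)
{
    size_t in = 0, out = 0;
    assert(buf != NULL);
    while (in < len) {
        if (buf[in] == '\\' && in + 1 < len && buf[in + 1] == '\n') {
            in += 2;                  /* drop the continuation */
        } else {
            buf[out++] = buf[in++];   /* keep everything else */
        }
    }
    return out;
}
```

A redefined YY_INPUT could read a block into the buffer flex hands it, run a filter like this over the block, and report the shortened length as the number of bytes read.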
Since the preprocessor must produce a stream of tokens (at this point, whitespace is no longer important), it is convenient to use bison's push-parser interface. (Indeed, it is very often more convenient to use the push API.) If you take that suggestion, you will end up with phase 4 as the top-level driver of the parse.
You could hand-build a preprocessor-directive parser, but getting #if expressions and pragmas right suggests the use of a separate bison parser for preprocessing.
If you just want to learn how to build a compiler, you might want to start with a simpler language such as Tiger, the language used as a running example in Andrew Appel's excellent textbooks on compiler construction.

Advantages of function prototyping [closed]

Closed 10 years ago.
After half an hour of research on the Internet, I couldn't find any reasoned discussion of the advantages of function prototyping.
I get by in Java/Android, and am beginning a C course. Prototyping looks cumbersome compared to my previous experience, and I would like to know the reason(s) why it still exists in 2013.
I understand that life was more difficult for Ritchie and pals; however, a compiler could be written today that would generate a list of functions in a first pass, then do its usual thing using that list of functions as a current compiler would use a header file.
Backwards compatibility alone probably can't explain why it persists either. It would be feasible to create a compiler that could switch between the current operation mode and the hypothetical new mode I just described, depending on the code it is shown.
If prototyping persists, it must therefore have an interest for the programmer, not for the compiler programmer. Am I right or wrong - and where can I find a reasoned discussion of the advantages of function prototyping vs. no prototyping?
You're forgetting that in C you can call a function whose source you don't have.
C supports binary distribution of code, which is quite common for (commercial) libraries.
You get a header that declares the API (all functions and data types) and the code in a .lib (or whatever your platform uses) file. This is typically the case for all of C's standard library; you don't always get the source to the compiler vendor's library but you must still be able to call the functions, of course.
For that to work, the C compiler must have the declarations when processing your code, so it can generate the proper arguments for the call, and of course deal with any return value correctly.
It's not enough to just rely on your source, since if you do
GRAPHICSAPI_SetColorRGB(1, 1, 1);
but the actual declaration is:
void GRAPHICSAPI_SetColorRGB(double red, double green, double blue);
the compiler cannot magically convert your int arguments to double if it doesn't have the prototype. Of course, having the prototype makes it possible to error-check that the call makes sense, which is very valuable.
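A small self-contained sketch of the same point, using a made-up function `average_rgb` in place of the library call: because the prototype is visible at the call site, the three `int` arguments are converted to `double` before the call.

```c
#include <assert.h>

/* Prototype: tells the compiler that all three parameters are double.
   In a real project this line would live in the library's header. */
double average_rgb(double red, double green, double blue);

double gray_level(void)
{
    /* The int constants are converted to double because the
       prototype above is in scope at this point. */
    return average_rgb(1, 1, 1);
}

/* Definition; for a binary-only library this would be in the .lib file. */
double average_rgb(double red, double green, double blue)
{
    return (red + green + blue) / 3.0;
}
```

Without a declaration in scope, a C89 compiler would assume an implicitly declared function and pass the arguments as `int`, which does not match what the function expects, so the behaviour is undefined; C99 and later reject or warn about the undeclared call.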
Interesting idea about having the compiler take a first look over all source files to take notice of all function prototypes.
However, libraries (object code) need to have their declarations somewhere; this is why the includes exist.
I also find it convenient to be able to grep the includes as "free text", like
grep alloc /usr/include/*

Are C++ comments considered bad style in C? [closed]

Closed 10 years ago.
I was discussing C programming styles with some students and when we were talking about comments, one of them noted that he doesn't use C++ comments in C code because they are a bad idea. Turns out that it was based on personal experience with multi-line C++ comments, but it's not the first time I've heard that claim. So, is // considered harmful and if so, then why?
It depends on what version of C you are using. C99 allows // as a comment, whereas C89 doesn't.
If you want to be as backward compatible as possible, don't use them. But I think this is an extreme fringe case. I'm willing to bet almost everyone uses C99.
Edit: Any recent version of GCC uses most of C99. You can find more info in Wikipedia.
C++ comments are not allowed as per the MISRA-C 2004 standard. Certain industries (automotive, specifically) prize MISRA compliant code and therefore, C++ comments are not allowed. I believe the same goes for other static code checking tools such as LDRA, etc...
This doesn't make them inherently bad, but it does mean that if you get into certain industries and want to work professionally, you will be actively discouraged from using C++ style comments.
If you use C++ comments in C, chances are that some C compilers won't accept your code. I would consider this harmful.
C++-style comments were added to C with the (not yet widely supported) C99 standard. While the standard itself isn't widely supported in full, some parts of it (like the C++ style comments), are supported in almost every compiler by now. Considering that they were added, it means that there's a need for them, so it's easy to figure out that it wouldn't be considered bad style -- especially if you set yourself guidelines on where to use which.
The only reason not to use them is if you want to write a well-formed C89-compliant program.
One common reason why people use // instead of /* */ is that you can "nest" the former and not the latter, and so you can comment out code that has comments in it. But you should really be using #if 0 for commenting out code in C anyways.
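A minimal sketch of the `#if 0` approach (the function `scale` is just an example): the disabled region can contain block comments of its own, which would not survive being wrapped in another block comment.

```c
#include <assert.h>

int scale(int x)
{
    int result = x * 2;
#if 0
    /* Disabled experiment. The block comments in this region are
       harmless here, whereas wrapping the region in another block
       comment would end at the first closing delimiter. */
    result = x * 3; /* triple instead */
#endif
    return result;
}
```

Changing `#if 0` to `#if 1` re-enables the code without touching any of the comments inside it.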
This really shouldn't be of any concern these days, unless you're maintaining code written specifically to compile with ancient compilers and the like.
"//" is supported in C99, but in C89 (which is by far the most widely supported dialect) it's not supported.

Testing Frameworks for C [closed]

Closed 11 years ago.
After doing some work with Ruby, Rails, and RSpec last summer, I learned to TATFT. Now I can't write code without writing tests first.
I'm taking a programming course in C next year, and I would like to learn to write C test-driven. Is it a good idea (or even possible) to do TDD with C? If so, are there any good testing frameworks compatible with C?
Is it a good idea (or even possible) to do TDD with C?
Yes, it obviously is a good idea, as with other languages. However, due to the procedural nature of the language, it comes with some additional difficulties.
Static functions quickly get in the way. This can be solved by including the source file under test, or by defining a STATIC macro that only means static when compiling the code for production, not for unit tests:
#if defined(UNIT_TEST)
#define STATIC
#else
#define STATIC static
#endif
There is no isolation: there is only one global context. With an OO language you can just instantiate one object (or a cluster of collaborating objects) to test it (them), you can also use mock objects.
With C, you can, however, override functions just by re-defining them in your unit tests. This works fine on Unix-like systems, where the linker uses the first definition it finds; I'm not sure about Windows.
If so, are there any good testing frameworks compatible with C?
You can start small with minunit. The learning curve is flat as it only is four macros long.
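To give a feel for how little is needed, here is a sketch in the style of minunit. The macro bodies are an approximation and may differ in detail from the real header, and `add` and the two test functions are hypothetical examples. The convention is that a test returns NULL on success and a message string on the first failure.

```c
#include <assert.h>
#include <stddef.h>

/* A failing check returns its message from the enclosing test function. */
#define mu_assert(message, test) \
    do { if (!(test)) return message; } while (0)

/* Run one test, count it, and stop at the first failure. */
#define mu_run_test(test) \
    do { const char *message = test(); tests_run++; \
         if (message) return message; } while (0)

int tests_run = 0;

/* Hypothetical function under test. */
int add(int a, int b) { return a + b; }

const char *test_add_positive(void)
{
    mu_assert("add(2, 3) should be 5", add(2, 3) == 5);
    return NULL;
}

const char *test_add_negative(void)
{
    mu_assert("add(-2, -3) should be -5", add(-2, -3) == -5);
    return NULL;
}

/* Returns NULL if every test passed, else the first failure message. */
const char *all_tests(void)
{
    mu_run_test(test_add_positive);
    mu_run_test(test_add_negative);
    return NULL;
}
```

A test runner's `main` then only needs to call `all_tests()`, print the returned message if it is not NULL, and report `tests_run`.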
EDIT: There are two lists of UT frameworks for the C language that were mentioned in other answers and that I didn't repeat: one on Wikipedia and another one on xprogramming.com.
We use "check" from http://check.sourceforge.net/. It provides basic functionality for test suites and tests (similar to JUnit), but is fairly lightweight. One feature I like is that it handles the case where your test dumps core and considers that a failure.
Also note "check" is a "C" based framework rather than a "C++" one.
I just discovered CSpec, which does BDD in C. Doesn't look very mature, but it reminds me of RSpec.
There are a number of unit testing harnesses for C. Wikipedia has a much better list than I could assemble here.
If you are actually using a C++ compiler, but using it in 'C' mode by compiling .c files, then, also, any of the C++ unit test frameworks will work OK.
Take a look at the original list of xUnit frameworks at http://www.xprogramming.com/software.htm
This similar question also has a lot of answers "Unit Testing C Code"
I used RCUNIT; it is mature and has everything I need. I have also used IPL Cantata, which is great but very expensive, so that is probably not what you want.
You certainly can do unit testing in C (I do). The framework I use (for the Windows platform) is CunitWin32.
Here is the list of unit test frameworks for C:
http://en.wikipedia.org/wiki/List_of_unit_testing_frameworks#C
enjoy it!
So a proper C programmer will tell you that because C is statically typed it catches all bugs you might have and therefore you don't need a unit test framework.
They are full of shit, but that's the argument for statically typed languages like C.
I think you should probably take the approach that Adobe took with Lightroom. Write a series of core libraries in C, and then put all the glue and real logic of the application in a higher-level language. Lightroom is largely written in Lua, but many languages work for this.

Which language is useful to create a report for a valid C program [closed]

Closed 10 years ago.
Can anyone suggest a programming language that is well suited to creating a tool which analyses a given C program and generates a txt or HTML report containing information about the program (function list, variable list, etc.)?
The program I intend to build is similar to doxygen, but I want it for my personal use.
ctags, perhaps?
Ctags generates an index (or tag) file of language objects found in source files that allows these items to be quickly and easily located by a text editor or other utility. A tag signifies a language object for which an index entry is available (or, alternatively, the index entry created for that object).
Both Python and Perl have excellent string processing capabilities.
I'd suggest using something like ctags to parse the program, and just create a script to read the ctags file and output in txt/html.
The file format used by ctags is well-defined so that other programs can read it. See http://ctags.sourceforge.net for more information on ctags itself and the file it uses.
You're opening a big can of worms, this isn't an effective use of your time, blah blah blah, etc.
Moving on to an answer, if you're talking about anything beyond trivial analysis and you need accuracy, you will need to parse the C source code. You can do that in any language, but you will almost certainly want to generate your parser from a high-level grammar. There are any number of tools for that. A modern and particularly powerful parser generator is ANTLR; there are a number of ANTLR grammars for C, including easier-to-work-with subsets.
Look into scripting languages. I'd recommend Python or Perl.
Haskell has a relatively recent language-c project http://www.sivity.net/projects/language.c which allows the analysis of C code.
If you are familiar with Haskell, then it might be worth a look. Even if you are not, it might be interesting to have a go.
If it's a programming language you want, then I'd say something which is known for its string processing power, so that would mean Perl.
However, the task you describe can be rather complicated since you need to 'know' the language, so you would need to follow the same steps the compiler does, namely lexical and grammatical analysis of the language (think flex, think yacc), in order to truly 'know' what meaning those strings have.
Perhaps the best starting point is to take a look at doxygen and try to reuse as much of the work done there as possible.
Lex/yacc are appropriate for building parsers.
pycparser is a complete parser for ANSI C89/C90 written in pure Python. It's being widely used to analyze C source code for various needs. It comes with some example code, such as listing all the function definitions in files, etc.
