Converting "c-like language" to "custom language" with parser [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I have a collection of files written in a language 'A' that need to be translated into corresponding files of a language 'B'.
I want to create a program/parser that can automate this task (probably rather a toolchain than a single program). However,
I am struggling to find a suitable choice for the programs of my toolchain.
Language A is embedded software code, i.e. a low-level language. It is 90% standard C code and 10% "custom" code, i.e.
the files also contain small segments that a standard C compiler cannot understand. The 90% C code is not arbitrary C (that would be hard to parse with respect to semantics) but follows certain recurring expressions, actions and patterns, and it always follows these patterns in (more or less) the same way. It mostly performs write operations to memory and does not include complex structures such as C structs or enums.
Example of regular low-level C-Code in language A:
#define MYMACRO 0x123
uint32_t regAddr;
regAddr = MYMACRO;
*(uint32_t*)(regAddr) = 0xdeadbeef;
Example for "custom code" in language A:
custom_printf("hello world! Cpu number: %d \n", cpu_nr);
Language B is a 100% custom language. The translation is necessary in order to work with the files in another tool for debugging purposes. A translation of the example above would look roughly like this:
definemacro MYMACRO 0x123
define_local_int regAddr
localint.set regAddr = MYMACRO
data.write regAddr 0xdeadbeef
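Because language A sticks to a handful of recurring patterns, even a naive rule-based translator can cover the example above. A minimal sketch in C (the function and its rule set are my own invention and handle only the constructs shown; a real toolchain would need a proper parser for the rest):

```c
#include <stdio.h>

/* Translate one language-A line into language B.
 * Returns 1 on success, 0 for an unrecognized construct
 * (which should be flagged for manual handling). */
int translate_line(const char *in, char *out, size_t outsz)
{
    char a[64], b[64];

    if (sscanf(in, "#define %63s %63s", a, b) == 2)
        snprintf(out, outsz, "definemacro %s %s", a, b);
    else if (sscanf(in, "uint32_t %63[^;];", a) == 1)
        snprintf(out, outsz, "define_local_int %s", a);
    else if (sscanf(in, "*(uint32_t*)(%63[^)]) = %63[^;];", a, b) == 2)
        snprintf(out, outsz, "data.write %s %s", a, b);
    else if (sscanf(in, "%63[^ =] = %63[^;];", a, b) == 2)
        snprintf(out, outsz, "localint.set %s = %s", a, b);
    else
        return 0;
    return 1;
}
```

Note that the pointer-write rule is checked before the plain assignment rule, so that `*(uint32_t*)(regAddr) = ...` is not misread as an ordinary assignment.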
Note: I am well aware that Stack Overflow is not meant to be a site for open discussions about "which tool do you prefer?". But I think this question
is rather one of "I need at least ONE meaningful toolset that gets the job done", i.e. there are probably not that many sensible options to discuss anyway.
These were my considerations and approaches so far:
Performance is NOT relevant for my toolchain. It only should be easy to implement and adapt to changes.
First approach: As language A is mostly C code, I first thought of pycparser, a Python package that parses C code
into an AST (abstract syntax tree). My plan was to read in the language-A files and then write a Python program that creates
language-B files from the AST. However, I found it difficult to adapt/teach pycparser so that it fully supports the 10% custom properties of language A.
Second approach: Using general-purpose parser generators such as Yacc/Bison or ANTLR. Here, however, I am
not sure which of the tools suits my needs (Yacc/Bison with an LALR parser, or ANTLR with an LL parser), nor how to set up an appropriate
toolchain that includes such a parser and then processes (e.g. with Python) the data structure the generated parser creates in order to produce the custom language B. It would also be helpful if the parser generator of choice provided an existing C-language grammar that can easily be adapted for the 10% custom part of language A.
I should also mention that I have never worked with general-purpose parsers before.
Could anybody please give me some advice about a meaningful set of tools for this task?
Edit:
I apologize if this seems like a vague question; I tried to put it as precisely as I could.
I added an example for languages A and B to make the composition of the languages more clear, and in order to show that language A follows certain recurring patterns that can be easily understood concerning semantics.
If this edit does not improve the clarity and scope, I will repost on Programmers as was suggested.
Edit2:
Alright, as the topic clearly still seems to be misplaced here, I herewith withdraw the question. I already received some valuable input from the first few posters, which encouraged me to experiment further with general-purpose parser generators.


Which would compile and/or calculate the first 100 numbers of the fibonacci sequence faster: C or Brainfuck [closed]

I know very little about what makes a language "fast", but it stands to reason for me that a language designed for extreme minimalism would also be extremely fast, right?
C is far closer to English than Brainfuck, and the smallest Brainfuck compiler is remarkably small at 1024 bytes, almost nothing compared to the Tiny C Compiler, which is around 100 KB.
However, all the websites online treat C as the fastest language bar bytecode or assembly.
(Clarity edit to take question off hold)
If I made the same program in C and Brainfuck (one which, for example, calculated the first 100 numbers of the Fibonacci sequence), which one would complete the task faster at runtime? Which one would compile faster?
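For concreteness, the C half of such a benchmark could be sketched as below (the function name is mine). One caveat worth knowing: fib(94) already overflows an unsigned 64-bit integer, so "the first 100 numbers" would force big-integer arithmetic in either language.

```c
#include <stdint.h>

/* Iterative Fibonacci: returns fib(n) with fib(0) = 0, fib(1) = 1.
 * Valid only up to n = 93; beyond that uint64_t overflows. */
uint64_t fib(unsigned n)
{
    uint64_t a = 0, b = 1;
    while (n--) {
        uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```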
Yes and no. There are language features that will make a language slower (in some cases garbage collection, dynamic types only known at runtime, …) and others that will make it more complex but allow the compiler more freedom.
Case in point: the constexpr keyword of C++ is somewhat complex to implement, but allows the programmer to tell the compiler "this function application has to be replaceable by its result". In extreme cases, this allows the compiler to replace costly function calls (e.g. a fast Fourier transform) with a constant result at no runtime cost.
Compiled C code is very fast because the language has almost no features that don't map directly down to assembly, and it has benefited from almost half a century of compiler optimizations.
It depends on the ability of the compiler.
A compiler converts a source file into an executable. There are four phases for a C program to become an executable:
Pre-processing
Compilation
Assembly
Linking
An abstract syntax tree (AST) is usually created. The AST is often used to generate an intermediate representation (IR), sometimes called an intermediate language, for the code generation.
This intermediate language is what actually gets compiled to machine code, and it does not matter which high-level language it came from (in this sense, Brainfuck is a high-level language).

Is C Compiled or/and Interpreted? [closed]

There are a lot of answers and quotations about "Compiled vs. Interpreted" and I do understand the differences between them.
When it comes to C, I am not sure: is C a compiled or an interpreted language, or both? And if both, I would be really thankful if you added a bit of explanation.
A programming language is simply a textual representation of abstract principles. It is not compiled or interpreted - it is just text.
A compiler will take the language and translate it into machine language (assembly code), which can easily be translated into machine instructions (most systems use a binary encoding, but there are some "fuzzy" systems as well).
An interpreter will take the language and translate it into some byte-code interpretation that can be easily translated into a binary encoding on supported platforms.
The difference between the two is when that change occurs. A compiler typically will convert the text to machine language and package it into a binary file before the user runs the program (e.g. when the programmer is compiling it). An interpreter will typically do that conversion when the user is running the program. There are trade-offs for both approaches.
The whole point here is that the language itself is not compiled nor interpreted; it is just a textual standard. The implementation details of turning that text into machine instructions is where the compilation or interpretation choice is made.
It's typically compiled, although there is of course nothing preventing people from implementing interpreters.
It's generally wrong to classify languages as either/or; what is the language going to do? It's just a spec on paper, it can't prevent people from implementing it as either a compiler or an interpreter, or some combination/hybrid approach.
There are languages which are designed to make compilation easy, by giving the user only features that directly map to machine instructions, such as arithmetic, pointer manipulation, function calls (and indirect function calls which give you virtual dispatch). Interpretation of these is generally also easy, but particularly poor performance. C is one of these.
Other languages are designed for interpretation. These often have dynamic typing, lazy dispatch, dynamic (not lexical) scope of closures, reflection, dynamic codegen, and other features that make compilation incredibly difficult. Of course difficult is not the same as impossible, and some of these languages do end up with compilers as a result of Herculean efforts.

What is the use of finite automata? [closed]

What is the use of finite automata, and of all the concepts that we study in the theory of computation? I have never seen them used anywhere yet.
They are the theoretical underpinnings of concepts widely used in computer science and programming, and understanding them helps you better understand how to use them (and what their limits are). The three basic ones you should encounter are, in increasing order of power:
Finite automata, which are equivalent to regular expressions. Regular expressions are widely used in programming for matching strings and extracting text. They are a simple method of describing a set of valid strings using basic characters, grouping, and repetition. They can do a lot, but they can't match balanced sets of parentheses.
Push-down automata, equivalent to context-free grammars. Text/input parsers and compilers use these when regular expressions aren't powerful enough (and one of the things you learn in studying finite automata is what regular expressions can't do, which is crucial to knowing when to write a regular expression and when to use something more complicated). Context-free grammars can describe "languages" (sets of valid strings) where the validity at a certain point in parsing the string does not depend on what else has been seen.
Turing machines, equivalent to general computation (anything you can do with a computer). Some of the things you learn when you cover these enable you to understand the limits of computing itself. A good theory course will teach you about the Halting Problem, which enables you to identify problems for which it is impossible to write a program. Once you've identified such a problem, then you know to stop trying (or refine it to something that is possible).
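To make the first item concrete, here is a hypothetical hand-written DFA in C equivalent to the regular expression (a|b)*ab, i.e. it accepts strings over {a, b} that end in "ab":

```c
/* DFA with three states:
 *   0 - start, or last symbol was a 'b' not preceded by 'a'
 *   1 - last symbol was 'a'
 *   2 - input so far ends in "ab" (the only accepting state) */
int matches_ab(const char *s)
{
    int state = 0;
    for (; *s; s++) {
        if (*s == 'a')
            state = 1;
        else if (*s == 'b')
            state = (state == 1) ? 2 : 0;
        else
            return 0;   /* symbol outside the alphabet: reject */
    }
    return state == 2;
}
```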
Understanding the theory and limitations of these various computing mechanisms enables you to better understand problems and programs and to think more deeply about programming.
There was a request-for-work published about a year ago on one of the freelance coding exchange sites asking, essentially, for a program which solved the Halting Problem. Several people responded with offers, saying they "understood the requirements" and could "start immediately". It was impossible to write a program which met the requirements. Understanding computing theory enables you to not be that bidder who demonstrates, in public, that he really doesn't understand computing (and doesn't bother to thoroughly investigate a problem before declaring understanding and making an offer).
Finite automata are very useful for communication protocols and for matching strings against regular expressions.
Automata are used in hardware and software applications. Please read the implementation section here: http://en.wikipedia.org/wiki/Finite-state_machine#Implementation
There is also a notion of Automata-based programming. Please check this http://en.wikipedia.org/wiki/Automata-based_programming
cheers
Every GUI, every workflow can be treated as a finite automaton. Think of each page as a state and of transitions occurring due to certain events. Perhaps you can't proceed to a certain page or the next stage of the workflow until a series of conditions are met.
Finite automata are, for example, used to parse formal languages. This means that finite automata are very useful in the creation of compiler and interpreter techniques.
Historically, the finite state machine showed that a lot of problems can be solved by a very simple automaton.
Try taking a compilers course. You will very likely make a compiler or interpreter using a finite state automaton to implement a recursive descent parser.
For example, to manage the states of objects with a defined life cycle, such as orders in a book shop.
An order can have the following states:
-ordered
-paid
-shipping
-done
and the program of the finite automaton knows how one state can be changed into another.
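That order life cycle can be sketched as a finite state machine in C (the type and function names are made up for illustration): an enum lists the states, and a transition function permits only the legal next step.

```c
typedef enum { ORDERED, PAID, SHIPPING, DONE } order_state;

/* Advance an order to its next state.
 * Returns 1 if the transition is allowed, 0 otherwise. */
int order_advance(order_state *s)
{
    switch (*s) {
    case ORDERED:  *s = PAID;     return 1;
    case PAID:     *s = SHIPPING; return 1;
    case SHIPPING: *s = DONE;     return 1;
    case DONE:     return 0;      /* terminal state */
    }
    return 0;
}
```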
The finite automaton is a type of state machine (SM). In general, SMs are used for parsing formal languages.
You can use many entities as a formal language, not only characters.
And a regular language is a type of formal language.
There is some theory that shows which type of SM is best suited to parse a regular language:
http://en.wikipedia.org/wiki/Regular_language

C for R programmers - recommended resources/approaches once past the basics [closed]

I would like to improve my C skills in order to be more competent at converting R code to C where this would be useful. What hints do people have that will help me on my way?
Background: I followed an online Intro to C course a few years ago and that plus Writing R Extensions and S Programming (Venables & Ripley) enabled me to convert bottleneck operations to C, e.g. computing the product of submatrices (did I re-invent the wheel there?). However I would like to go a bit beyond this, e.g. converting larger chunks of code, making use of linear algebra routines etc.
No doubt I have more to learn from the resources I used before, but I wondered if there were others that people recommend? Working through examples is obviously one way to learn more: Brian Ripley gave a couple of examples of moving from S prototypes to S+C in this workshop on Efficient Programming in S and a more recent Bioconductor workshop Advanced R for Bioinformatics (sorry can't post hyperlink) includes a lab on writing an R+C algorithm. More like this, or other suggestions would be appreciated.
That is a very interesting question. As it happens, I had learned C and C++ before moving to R so that may have made it "easier" for me to add C/C++ to R.
But even with that, I would be among the first to say that adding pure C to R is hellishly complicated because of the different macros and R-internals at the C level that you need to learn.
Which leads me to my favorite argument: use an additional abstraction layer such as the Rcpp package. It hides a lot of the nasty details, and I hope that you don't need to know a lot of C++ to make use of it. One example of a package using it is the small earthmovdist package on R-Forge, which uses the Rcpp wrapper classes to interface one particular metric.
Edit 1: For example, see the main function of earthmovdist here, which should hopefully be easy enough to read, possibly with the (short) Rcpp package manual at one's side.
Edit 2: Three quick reasons why I consider C++ to be more appropriate and R-alike:
using the Rcpp wrapper classes means you never have to use PROTECT and UNPROTECT, which are a frequent source of error and heap corruption if not matched
using Rcpp with STL container classes like vector etc. means you never have to explicitly call malloc() / free() or new / delete, which removes another frequent source of error.
Rcpp allows you to wrap everything in try / catch blocks at the C++ level and report exceptions back to R --- so no sudden segfaults and program deaths.
That said, choice of language is a very personal decision, and many users are of course perfectly happy with the lower-level interface between C and R.
I have struggled with this issue as well.
If the issue is to improve command of C, there are plenty of book lists on the subject. They all start with K&R. I enjoyed "Expert C Programming" by P. van der Linden and "C primer" by S. Prata. Any reference on the C standard library works.
If the issue is to interface C to R, other than the aforementioned official R document, you can check out this Harvard course, and this quick start guide. I have only passed scalars and arrays to C, and honestly wouldn't know how to interface complex data structures.
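For the scalars-and-arrays case, here is a minimal sketch of what a C function for R's .C() interface looks like (the function and file names are mine): with .C(), every argument arrives as a pointer, and results are written back through an output argument.

```c
/* Callable from R roughly as:
 *   dyn.load("vecsum.so")
 *   .C("vec_sum", as.double(x), as.integer(length(x)), sum = double(1))
 * (file and symbol names are illustrative) */
void vec_sum(const double *x, const int *n, double *out)
{
    double s = 0.0;
    for (int i = 0; i < *n; i++)
        s += x[i];
    *out = s;
}
```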
If the issue is to interface C++ to R, or build C++ skills, I can't really answer as I don't use much C++. A good starting point for me was "C++ the Core Language" (O'Reilly). Very simple, primitive, but useful for people coming from C.
My primary recommendation is to look at other packages. Needless to say, not all packages use C code, so you will need to find examples that do. You can download the source code for all packages off CRAN, and in some instances, you can also browse them on R-Forge. Some R projects are also maintained on Google Code or sites like github (for instance, ggplot2). You will find the C code in the "src" directory.
Here's an example of a src directory with some C source code on R-Forge for the "survival" package.
Here's an example with C source code on Google Code for the "rpostgresql" package.
Here's an example with C source code on github for "rqtl".
In general, think about what you're trying to accomplish, and then look at packages that do similar things.
The "C Programming Language" book is probably still the most widely used, so you may want to have that on your bookshelf. The following free book is also a useful resource: http://publications.gbdirect.co.uk/c_book/
"What is the best book to learn C?" is a perennial SO question. (The middle link is probably the best.)
As for R-specific ways of learning C, I've found it instructive to download the R source code and take a look at some of the .Internal code.
EDIT: Someone else had just asked "What to read after K&R?"
If your goal is to use C to get rid of bottlenecks you'll need a good numerical library in C. There are lots, but I've found gsl (GNU Scientific Library) pretty useful.
http://www.gnu.org/software/gsl/
There is also the classic book "Numerical recipes in C" which provides an overview of important numerical techniques (though I don't recommend using their code verbatim).

Resources for learning C program design [closed]

Coming from an OO background (C#/Java), I'm looking for resources to learn how to design pure C programs well.
While I'm familiar with the syntax of C and I can write small programs, I'm unsure of the approach to take for larger applications, and what techniques to employ. Anything you guys can recommend?
EDIT: I'm happy to completely abandon OO for the purposes of programming in C; my interest is in learning how to structure a program without OO. I want to learn about good ways of designing programs in procedural languages such as C.
This posting has a list of UNIX books which includes most of the classic C/Unix works. For C programming on Windows, Petzold's Programming Windows is probably the best start.
For C program design, some of the UNIX programming books will tell you snippets but I'm not aware of a 'C program architecture' book.
If you're used to java, some tips for C programming are:
Make use of the stack. Often when you call a procedure you will want to have variables allocated in the caller's stack frame and pass pointers to them into the procedure you call. This will be substantially faster than dynamically allocating memory with malloc() and much less error-prone. Do this wherever appropriate.
C doesn't do garbage collection, so dynamically allocating data items is more fiddly and you have to keep track of them to make sure they get freed. Variables allocated on the stack (see 1) are more 'idiomatic' where they are applicable. Plus, you don't have to free them - this is a bonus for local variables.
Apropos of (2), consider an architecture where your functions return a status or error code and pass data in and out using the stack as per (1).
Get to know what setjmp() and longjmp() do. They can be quite useful for generic error handler mechanisms in lieu of structured exception handling functionality.
C does not support exceptions. See (3).
Lint is your friend. Splint is even friendlier.
Learn what the preprocessor does and what you shouldn't do with it even if you can.
Learn the ins and outs of endian-ness, word alignment, pointer arithmetic and other low-level architectural arcana. Contrary to popular opinion these are not rocket science. If you're feeling keen, try dabbling in assembly language and get a working knowledge of that. It will do much for your understanding of what's going on in your C program.
C has no concept of module scope, so plan your use of includes, prototype declarations, and use of extern and static to make private scopes and import identifiers.
GUI programming in C is tedious on all platforms.
Apropos of (10) learn the C API of at least one scripting language such as Tcl, Lua or Python. In many cases, the best use of C is as a core high-performance engine on an application that is substantially written in something else.
The equivalent of a constructor is an initializing function where you pass in a pointer to the item you want set up. Often you can see this in the form of a call to the function that looks like setup_foo(&my_foo). It's better to separate allocation from initialising, as you can use this function to initialise an item you have allocated on the stack. A similar principle applies to destructors.
Most people find Hungarian notation about as readable as written Hungarian. The exception is native Hungarian speakers, who typically find Hungarian notation about as legible as cuneiform. Unfortunately, Hungarian notation is widely encountered in Windows software, and the entire Win32 API uses it, with the expected effects on the legibility of software written for this platform.
C/Unix books, even really good ones like the ones written by the late W Richard Stevens tend to be available secondhand quite cheaply through Amazon marketplace. In no particular order, get a copy of K&R, Stevens APUE and UNP 1 & 2, the Dragon book, Rochkind, Programming Pearls, Petzold and Richter (if working on Windows) and any of the other classic C/Unix works. Read, scribble on them with a pencil and generally interact with the books.
There are many, many good C/Unix programming resources on the web.
Read and understand the Ten Commandments of C Programming and some of the meta-discussion as to the why's and wherefores behind the commandments. This is showing its age to a certain extent, although most of it is still relevant and obscure compilers are still quite common in the embedded systems world.
Lex and Yacc are your friend if you want to write parsers.
As Navicore points out below (+1), Hanson's 'C Interfaces and Implementations' is a run-down on interface/implementation design for modular architecture with a bunch of examples. I have actually heard of this book and heard good things about it, although I can't claim to have read it. Aside from the C idioms that I've described above, this concept is arguably the core of good procedural design. In fact, other procedural languages such as Modula-2 actually make this concept explicit in their design. This might be the closest thing to a 'C Program Architecture' book in print.
Read the C FAQ.
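Several of the tips above (stack allocation with pointers passed in, status-code returns, and initializing functions in lieu of constructors) can be sketched together; all names here are invented for illustration:

```c
#include <stdio.h>

typedef struct {
    char name[32];
    int  count;
} foo;

/* "Constructor" equivalent: initializes an already-allocated foo,
 * so the caller can keep it in its own stack frame. */
void setup_foo(foo *f, const char *name)
{
    snprintf(f->name, sizeof f->name, "%s", name);
    f->count = 0;
}

/* Status-code style: 0 on success, nonzero on error;
 * data moves in and out through pointers. */
int foo_increment(foo *f, int *new_count)
{
    if (f == NULL || new_count == NULL)
        return -1;
    f->count++;
    *new_count = f->count;
    return 0;
}
```

A caller writes `foo f; setup_foo(&f, "order");` and never needs a matching free(), because the object lives in the caller's stack frame.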
My concerns going from OO back to C were addressed in David Hanson's "C Interfaces and Implementations".
C Interfaces and Implementations
Seriously, its approach made a huge difference in avoiding accidentally building the large ball of yarn that many non-oo systems wind up as.
Here are some interesting responses from a different question regarding OO programming in C. I made a post about some C code I worked with which basically implemented object orientation, stopping just short by not including virtual methods.
If I were doing C coding, I would use this technique to define 'objects'.
I find keeping Design Patterns in mind is always helpful, and can be implemented in most languages.
Here's a nice PDF discussing object oriented C programming.
While it is written as a somewhat language-agnostic text, Code Complete provides a lot of good guidance on code structure and organization, along with construction practices.
