I am writing a compiler for a toy OO language. I am writing it in C, using Flex and Bison.
Consider the following syntax:
class MyClass {
    int m_n;
    void MyFunc(int b) {
        m_n = 5;
        m_p = b;
    }
    int m_p;
}
My current code will complain that in MyFunc, m_p has not yet been declared (with good reason). So, I came to the conclusion that I need a multi-pass parsing technique - something along the lines of:
1st pass - process variable declarations
2nd pass - process function definitions
First, is this the best way to solve the issue? Are there other methods that I should look into? Second, if this is a favorable solution, would I go about implementing it with a re-entrant lexer/parser?
Thanks
I recently wrote a compiler for an OO language. We had multiple passes (how many depends on the complexity of the language, of course):
1. Collect all classes
2. Build up the superclass hierarchy
3. Collect all methods and fields
4. Collect variables inside methods, etc.
There are reasons why we had to split the whole process into 4 passes:
You can't build up the superclass hierarchy before all classes have been processed (hence pass 2).
You can't validate inherited methods (return value, parameters, etc.) while the superclass is unknown (this also led to pass 2).
You can't process variables before all fields have been collected (hence pass 4).
You can of course leave out the second pass if your language has no inheritance.
Looking at it now, it should have been possible to merge passes 2 and 3, since all the data should already be available for pass 3.
The way we implemented it was just by walking through the AST and annotating it with the required symbol tables.
I am tasked with helping design a dynamic library (exposed through a C interface) intended for use in embedded software applications on various platforms (Android, Windows, Linux).
The main requirements are speed and decoupling.
On the decoupling side, one of our requirements is to make integration easy, and so to allow backward compatibility and resilience.
My library has some entry points that the integrating software should call (like an initialization function that provides options such as where to log, how to behave, etc.), and it may also call some callbacks in the application (e.g. an event to report that a task has finished).
I have come up with several proposals, but none of them seems great, so I am looking for advice on better or standard ways to achieve decoupling and backward compatibility than the 3 ways I have come up with:
The first option I could think of is a generic interface for my exposed entry points, for example a hashmap of key/value pairs for the parameters of my functions. In pseudo code it looks something like:
myLib.Initialize(Key_Value_Option_Array_Here);
Another option is to provide generic functions for passing options (and callbacks) to the library one at a time:
myLib.SetOption(Key_Of_Option, Value_OfOption);
myLib.SetCallBack(Key_Of_Callbak, FunctionPointer);
When I presented my options, my colleague asked why not use a Google Protocol Buffers (protobuf) message as the interface between the library and the embedded software. That seems odd to me, as there would be a performance hit on each call for serialization and deserialization.
Is there a more efficient or standard way that you could think of?
You could have a struct for optional arguments:
#include <stdint.h>

typedef struct {
    uint8_t optArg1;
    float optArg2;
} MyLib_InitOptArgs_T;

void MyLib_Init(int16_t arg1, uint32_t arg2, MyLib_InitOptArgs_T const * optionalArgs);
Then you could use a compound literal in the function call:
MyLib_Init(1, 2, &(MyLib_InitOptArgs_T){ .optArg2=1.2f });
All members not specified in the compound literal are zero-initialized (0, 0.0f, NULL), and would be considered unused. Similarly, passing NULL for the struct pointer would mean that all optional arguments are unused.
The downside of this method is that if you expect many new arguments in the future, the structure could grow too big. Whether that is an issue depends on what your limits are.
Another option is simply to have multiple smaller initialization functions for initializing different subsystems. This could be combined with the optional-argument struct above.
Say I have an external library that computes the optima, say minima, of a given function. Say its headers give me a function
double *minimizer(ObjFun f);
where the headers define
typedef double (*ObjFun)(double x[]);
and "minimizer" returns the minima of the function f of, say, a two dimensional vector x.
Now, I want to use this to minimize a parameterized function. I don't know how to express this in code exactly, but say I am minimizing quadratic forms (just a silly example; I know these have closed-form minima):
double quadraticForm(double x[]) {
    return x[0]*x[0]*q11 + 2*x[0]*x[1]*q12 + x[1]*x[1]*q22;
}
which is parameterized by the constants (q11, q12, q22). I want to write code where the user can input (q11, q12, q22) at runtime, I can generate a function to give to the library as a callback, and return the optima.
What is the recommended way to do this in C?
I am rusty with C, so I am asking about both feasibility and best practices. Really, I am trying to solve this with C/Cython code. I have been using Python bindings to the library so far, and with "inner functions" it was really obvious how to do this in Python:
def getFunction(q11, q12, q22):
    def f(x):
        return x[0]*x[0]*q11 + 2*x[0]*x[1]*q12 + x[1]*x[1]*q22
    return f
# now submit getFunction(/*user params*/) to the library
I am trying to figure out the C construct so that I can be better informed in creating a Cython equivalent.
The header defines the prototype of a function which can be used as a callback. I am assuming that you can't/won't change that header.
If your function has more parameters, they cannot be filled by the call.
Your function therefore cannot be used as the callback, at least not with additional parameters: calling it that way would cause undefined behaviour or bogus values in the parameters.
Above means you need to drop the idea of "parameterizing" your function.
Your actual goal is to somehow allow the constants/coefficients to be changed during runtime.
Find a different way of doing that. Think of "dynamic configuration" instead of "parameterizing".
I.e. the function does not always expect those values at each call. It just has access to them.
(This suggests the configuration values are less often changed than the function is called, but does not require it.)
How:
I can only think of one simple way, and it is pretty ugly and vulnerable (e.g. to race conditions, concurrent access, reentrance; you name it, it will hurt you ...):
Introduce a set of global variables, or better one struct-variable, for readability. (See recommendation below for "file-global" instead of "global".)
Set them at runtime to the desired values, using a separate function.
Initialise them to meaningful defaults, in case they never get written.
Read them at the start of the minimizing callback function.
Recommendation: Put everything (the minimizing function, the configuration variable and the function which sets the configuration at runtime) in one code file and make the configuration variable(s) static (which restricts access to that code file).
Note:
The answer consists only of the analysis showing that, and why, you should not try parameters.
The proposed method is not considered part of the answer; it is more simple than good.
I invite more holistic answers which propose a safer implementation.
I've seen tons of questions about this. Some have answers, some don't, but none seem to work for me. I have this program (somebody else wrote it) that I wish to use. However there are two problems in the constructor:
#include <array>
#include <utility>
#include <vector>

template<unsigned N>
class Enumeration {
public:
    Enumeration(const std::array<std::vector<std::pair<unsigned char, double>>, N>& pDistribution);
};
The problem is that I want to run this class on user-defined input, and that input decides the value of N. But N must be a compile-time constant both (1) for the array that I have to construct and pass to the constructor and (2) as the template argument, so I am in quite a pickle.
I tried double pointers, a proxy class and constexpr functions; none seem to work (possibly because I did it wrong; I'm relatively new to C++).
My last resort is to do something really ugly with a many-cases switch-statement, but I was hoping someone here can help me out. Preferably without using an extension for the compiler.
The class you have shown does not support N being determined at run-time. It is intended for a different purpose, for when N can be determined at compile time.
Trying to let N be determined at run-time in the above case is almost certainly a bad idea.
Instead, writing a variant of your type such that the outermost container is not an array but rather a vector would be the general approach required to make the size of the outermost container be determined at run time.
This will involve rewriting most of the class.
class Enumeration_Runtime {
public:
Enumeration_Runtime(const std::vector<std::vector<std::pair<unsigned char, double>>>& pDistribution);
};
The const& parameter might be better turned into pass-by-value, but I am unsure.
There is no easy route here, because the person who wrote Enumeration<N> wrote it to not allow N to vary at run time.
Lately I've been tinkering with my own languages as well as reading various writings on the subject.
Does anyone have good advice on how, in C (or assembler), to implement the concept of the Object class and/or the concept of generics in a language? (I am referring to the Java implementations of Object and generics.)
For instance, in Java all classes extend Object. So how do you represent this at the C level? Is it something like:
#include <stdio.h>

typedef struct {
    int stuff;
} Object;

typedef struct {
    int stuff;
    Object object;
} ChildClass;

int main() {
    ChildClass childClass;
    childClass.stuff = 100;
    childClass.object.stuff = 200;
    printf("%d\n", childClass.stuff);
    printf("%d\n", childClass.object.stuff);
}
And I'm not really even sure how to get started with implementing something like Generics.
I would also appreciate any valuable links regarding programming language design.
Thanks,
Take a look at Structure and Interpretation of Computer Programs by Abelson and Sussman. While it doesn't show how to do it in C, it does demonstrate how to create types at run time and how to build an object system on top of a language that doesn't provide native support. Once you understand the basic ideas, you should be able to use structs and function pointers to create an implementation. Of course, looking at the source code for a C++ preprocessor will also be instructive. At one time, C++ was just a preprocessor for a C compiler.
I found this book a little while ago that has been an interesting read: Object-Oriented Programming With ANSI-C (PDF).
In C I've created class-like structures and methods by using structs (to store the class's state) and functions that take pointers to them (methods of the class). Implementing things like inheritance is possible, but would get messy fast. I'm not a Java guy though, and I'm not sure how much of Java you should force onto C; they are very different languages.
Here's probably the crudest form of an object implementation possible; I wrote it to run multiple PID controls at the same time.
#include <stdint.h>

// PID_K (the PID control parameters type) is defined elsewhere.

//! PID control system state variables
typedef struct {
    const PID_K * K; //!< PID control parameters
    int32_t e;       //!< Previous error (for derivative term)
    int32_t i;       //!< Integrator
} PID_SYS;

void PID_Init(PID_SYS * sys, const PID_K * K)
{
    sys->i = 0;
    sys->e = 0;
    sys->K = K;
}

int16_t PID_Step(PID_SYS * sys, int32_t e)
{
    // ...PID math using "sys->" for any persistent state variables...
}
If your goal is to write a new language that incorporates high-level concepts, you might want to look at the CPython sources. CPython is the reference interpreter for Python, an object-oriented programming language, and it is written in C. Open source C implementations of compilers/interpreters for C++, D, JavaScript, Go, Objective-C, and many, many others exist as well.
It's more complicated, but you're on the right path. Actual implementations use roughly the same code as yours to achieve inheritance (but they actually use containment to do it, which is quite ironic), along with a table of function pointers (virtual functions) that each instance points to, and some (okay, many) helper macros.
See GObject.
It's definitely not C, but I'd recommend taking a look at Lua.
At its core, Lua has only a few basic types: number, string, boolean, function, and table (there are a couple more, but they are outside the scope of this topic). A table is essentially just a hashtable that accepts keys of any type (except nil) and can contain values of any type as well.
You can implement OOP in Lua by way of metatables. In Lua, a table can have at most one metatable, which is consulted under special circumstances, such as when a table is added to or multiplied by another table, or when you try to access a key that is not present in the table.
Using metatables, you can quickly and easily achieve something quite like inheritance by chaining metatables together. When you try to access a missing key in a table, Lua looks up the __index field of that table's metatable. So if you try to access a key named foo on a table that doesn't have such a key, Lua will check for foo in the table stored in __index (often the metatable itself). If it isn't present there, and that table has a metatable of its own with __index defined, Lua will check the next one, and so on.
Once you realize how simple it is to do this in Lua, translating it to C is very achievable. Your OOP will be completely at run-time, of course, but it will be very OOP-like indeed.
I know the basics of methods, procedures, functions and classes, but I always get confused when trying to tell them apart in the context of object-oriented programming. Can anybody explain the differences between them with simple examples?
A class, in current, conventional OOP, is a collection of data (member variables) bound together with the functions/procedures that work on that data (member functions or methods). The class has no relationship to the other three terms aside from the fact that it "contains" (more properly "is associated with") the latter.
The other three terms ... well, it depends.
A function is a collection of computing statements. So is a procedure. In some very anal-retentive languages, though, a function returns a value and a procedure doesn't. In such languages procedures are generally used for their side effects (like I/O) while functions are used for calculations and tend to avoid side effects. (This is the usage I tend to favour. Yes, I am that anal-retentive.)
Most languages are not that anal-retentive, however, and as a result people use the terms "function" and "procedure" interchangeably, preferring one or the other based on their background. (Modula-* programmers tend to say "procedure" while C/C++/Java/whatever programmers tend to say "function", for example.)
A method is just jargon for a function (or procedure) bound to a class. Indeed, not all OOP languages use the term "method". In a typical (but not universal!) implementation, methods have an implied first parameter (called things like this or self) for accessing the instance of the containing class. This is not, as I said, universal. Some languages make that first parameter explicit (and thus allow it to be named anything you'd like), while in still others there's no magic first parameter at all.
Edited to add this example:
The following untested and uncompiled C++-like code should show you what kind of things are involved.
#include <iostream>

class MyClass
{
    int memberVariable;
public:
    void setMemberVariableProcedure(int v)
    {
        memberVariable = v;
    }
    int getMemberVariableFunction()
    {
        return memberVariable;
    }
};

void plainOldProcedure(int stuff)
{
    std::cout << stuff;
}

int plainOldFunction(int stuff)
{
    return 2 * stuff;
}
In this code setMemberVariableProcedure and getMemberVariableFunction are both methods, while plainOldProcedure and plainOldFunction are not.
Procedures, functions and methods are generally alike: they all hold some processing statements.
The only differences I can think of between these three are the places where they are used.
I mean "method" is generally used for functions defined inside a class, where access levels like public, protected and private can be specified.
"Procedures" are also functions, but they generally represent a series of functions that need to be carried out, either after one function completes or in parallel with another.
Classes are collections of related attributes and methods. Attributes define the objects of the class, whereas methods are the actions done by or on the class.
Hope this was helpful.
Function, method and procedure are similar in kind: each of them is a subroutine that performs some calculations.
A subroutine is:
a method when used in Object-Oriented Programming (OOP). A method can return nothing (void) or something, and/or it can change data outside of the subroutine or method.
a procedure when it does not return anything but can change data outside of the subroutine; think of an SQL stored procedure (not considering output parameters!).
a function when it returns something (its calculated result) without changing data outside of the subroutine or function. This is how SQL functions work.
After all, they are all a piece of re-usable code that does something, e.g. return data, calculate or manipulate data.
There is no difference between of among.
Method: no return type, like void
Function: has a return type