Token return values in ANTLR 3 C

I'm new to ANTLR, and I'm attempting to write a simple parser using the C language target (antlr3c). The grammar is simple enough that I'd like to have each rule return a value, e.g.:
number returns [long value]
:
( INT {$value = $INT.ivalue;}
| HEX {$value = $HEX.hvalue;}
)
;
HEX returns [long hvalue]
: '0' 'x' ('0'..'9'|'a'..'f'|'A'..'F')+ {$hvalue = strtol((char*)$text->chars,NULL,16);}
;
INT returns [long ivalue]
: '0'..'9'+ {$ivalue = strtol((char*)$text->chars,NULL,10);}
;
Each rule collects the return values of its child rules until the topmost rule returns a nice struct full of my data.
As far as I can tell, ANTLR allows lexer rules (tokens, e.g. 'INT' and 'HEX') to return values just like parser rules (e.g. 'number'). However, the generated C code will not compile:
error C2228: left of '.ivalue' must have class/struct/union
error C2228: left of '.hvalue' must have class/struct/union
I did some poking around, and the errors make sense - the tokens end up as generic ANTLR3_COMMON_TOKEN_struct, which doesn't allow for a return value. So maybe the C target just doesn't support this feature. But like I said, I'm new to this, and before I go haring off to find another approach I want to confirm that I can't do it this way.
So the question is this: 'Does antlr3c support return values for lexer rules, and if so, what is the proper way to use them?'

Not really any new information, just some details on what @bemace already mentioned.
No, lexer rules cannot have return values. See 4.3 Rules from The Definitive ANTLR reference:
Rule Arguments and Return Values
Just like function calls, ANTLR parser and tree parser rules can have
arguments and return values. ANTLR lexer rules cannot have return
values [...]
There are two options:
Option 1
You can do the transforming to a long in the parser rule number:
number returns [long value]
: INT {$value = Long.parseLong($INT.text);}
| HEX {$value = Long.parseLong($HEX.text.substring(2), 16);}
;
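Since the question targets C, the same conversion as Option 1 can be sketched in plain C. This is only a sketch under assumptions: number_from_token_text is a hypothetical helper, not part of the ANTLR 3 C runtime, and it assumes the token text is already available as a NUL-terminated string (the question's own lexer actions obtain it via $text->chars).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not an ANTLR API): converts the text of an INT or
 * HEX token, as defined by the grammar above, to a long. */
long number_from_token_text(const char *text)
{
    assert(text != NULL);

    /* The HEX rule always starts with "0x"; everything else is decimal. */
    int is_hex = (text[0] == '0' && text[1] == 'x');

    /* With base 16, strtol accepts an optional "0x" prefix, so the prefix
     * does not need to be stripped. Base 0 is avoided on purpose: it would
     * read an INT such as "010" as octal. */
    return strtol(text, NULL, is_hex ? 16 : 10);
}
```

A parser action for number could then call this helper with the token's text instead of relying on a return value from the lexer rule.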
Option 2
Or create your own token that has, say, a toLong(): long method:
import org.antlr.runtime.*;
public class YourToken extends CommonToken {
public YourToken(CharStream input, int type, int channel, int start, int stop) {
super(input, type, channel, start, stop);
}
// your custom method
public long toLong() {
String text = super.getText();
int radix = text.startsWith("0x") ? 16 : 10;
if(radix == 16) text = text.substring(2);
return Long.parseLong(text, radix);
}
}
Then tell your grammar to use this token type in its options {...} section, and override the emit(): Token method in your lexer class:
grammar Foo;
options{
TokenLabelType=YourToken;
}
@lexer::members {
public Token emit() {
YourToken t = new YourToken(input, state.type, state.channel,
state.tokenStartCharIndex, getCharIndex()-1);
t.setLine(state.tokenStartLine);
t.setText(state.text);
t.setCharPositionInLine(state.tokenStartCharPositionInLine);
emit(t);
return t;
}
}
parse
: number {System.out.println("parsed: "+$number.value);} EOF
;
number returns [long value]
: INT {$value = $INT.toLong();}
| HEX {$value = $HEX.toLong();}
;
HEX
: '0' 'x' ('0'..'9'|'a'..'f'|'A'..'F')+
;
INT
: '0'..'9'+
;
When you generate a parser and lexer, and run this test class:
import org.antlr.runtime.*;
import java.io.*;
public class Main {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("0xCafE");
FooLexer lexer = new FooLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
FooParser parser = new FooParser(tokens);
parser.parse();
}
}
it will produce the following output:
parsed: 51966
The first option seems the more practical in your case.
Note that, as you can see, the examples given are in Java. I have no idea if option 2 is supported in the C target/runtime. I decided to still post it to be able to use it as a future reference here on SO.

Lexer rules must return Token objects, because that's what the Parser expects to work with. There may be a way to customize the type of token object used, but it's easier just to convert tokens to values in the lowest-level parser rules.
social_title returns [Name.Title title]
: SIR { title = Name.Title.SIR; }
| 'Dame' { title = Name.Title.DAME; }
| MR { title = Name.Title.MR; }
| MS { title = Name.Title.MS; }
| 'Miss' { title = Name.Title.MISS; }
| MRS { title = Name.Title.MRS; };

There is a third option: you can pass an object as an argument to the lexer rule. This object contains a member that represents the lexer rule's return value. Within the lexer rule, you set the member; at the point where you call the rule, you read the member and do whatever you want with this 'return value'.
This way of parameter passing corresponds to 'var' parameters in Pascal, 'out' parameters in C#, or reference parameters in C++.
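In C, the out-parameter idea described above is usually expressed with a pointer. The following is a minimal sketch of the pattern only; scan_int is a hypothetical stand-in for a lexer rule and does not use the ANTLR runtime. It reports how many characters it matched and writes the recognized value through the pointer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for a lexer rule with an out parameter.
 * Returns the number of digit characters matched at the start of input.
 * On a successful match (> 0), the decimal value is stored in *out_value.
 * Overflow is not checked in this sketch. */
size_t scan_int(const char *input, long *out_value)
{
    assert(input != NULL && out_value != NULL);

    size_t len = 0;
    long value = 0;
    while (isdigit((unsigned char)input[len])) {
        value = value * 10 + (input[len] - '0');
        ++len;
    }
    if (len > 0)
        *out_value = value;  /* only written when something was matched */
    return len;
}
```

The caller owns the storage for the result, so the rule itself still "returns" only what the framework expects, while the extra value travels through the pointer.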

Related

Kotlin/Native to C dylib: How to access class members of an instance object returned by a method?

I am trying to learn Kotlin/Native C interop.
I exported some Kotlin classes as a C dynamic library and succeeded in calling methods with primitive return types.
But when I try to access class members of an instance object returned by a method, the object only contains something named pinned.
Code sample:
@Serializable
data class Persons (
val results: Array<Result>,
val info: Info
)
class RandomUserApiJS {
fun getPersonsDirect() : Persons {
return runBlocking {
RandomUserApi().getPersons()
}
}
}
Now, when using them from C code, the persons object shows only a field named pinned; no other members or member functions are visible.
I don't know much C/C++, so I can't investigate further.
Could you please help me understand how to access instance members of a Kotlin class in the exported C library?
Header file for reference:
https://gist.github.com/RageshAntony/a0b9007376084fa8b213b022b58f9886
For your gist (https://gist.github.com/RageshAntony/a0b9007376084fa8b213b022b58f9886), I modified the following:
// I commented out this annotation
// @Serializable
data class Persons(
val results: List<Result>,
val info: Info,
/**
 * Result has too many properties,
 * so this example uses a simple data class instead.
 * It shows how to get a C array from Persons (this also works for any iterable).
 */
val testList: List<Simple>,
) {
public fun toJson() = Json.encodeToString(this)
companion object {
public fun fromJson(json: String) = Json.decodeFromString<Persons>(json)
}
val arena = Arena()
fun getTestListForC(size: CPointer<IntVar>): CPointer<COpaquePointerVar> {
size.pointed.value = testList.size
return arena.allocArray<COpaquePointerVar>(testList.size) {
this.value = StableRef.create(testList[it]).asCPointer()
}
}
fun free() {
arena.clear()
}
}
/**
 * The Kotlin <-> C bridge works with primitive types,
 * e.g. int <-> Int
 * and char* <-> String,
 * so the Simple class has two primitive properties.
 */
data class Simple(
val name: String,
val age: Int,
)
#include <stdio.h>
#include "libnative_api.h"
int main(int argc, char **argv) {
libnative_ExportedSymbols* lib = libnative_symbols();
libnative_kref_MathNative mn = lib->kotlin.root.MathNative.MathNative();
const char *a = lib->kotlin.root.MathNative.mul(mn,5,6); // working
printf ("Math Resullt %s\n",a);
libnative_kref_RandomUserApiJS pr = lib->kotlin.root.RandomUserApiJS.RandomUserApiJS();
libnative_kref_Persons persons = lib->kotlin.root.RandomUserApiJS.getPersonsDirect(pr);
// when inspecting the persons object above, only a field 'pinned' is available, nothing else
int size;
libnative_kref_Simple* list = (libnative_kref_Simple *)lib->kotlin.root.Persons.getTestListForC(persons, &size);
printf("size = %d\n", size);
for (int i = 0; i < size; ++i) {
const char *name = lib->kotlin.root.Simple.get_name(list[i]);
int age = lib->kotlin.root.Simple.get_age(list[i]);
printf("%s\t%d\n", name, age);
}
lib->kotlin.root.Persons.free(persons);
return 0;
}
Output:
Math Resullt The answer is 30
size = 3
name1 1
name2 2
name3 3
But I don't think calling a Kotlin library from C is good practice, because Kotlin/Native is not focused on performance for now. In my opinion, anything that can be implemented with Kotlin/Native can also be implemented in pure C, so I'm more interested in how to access C libraries from Kotlin. Of course, this is a workable solution if you absolutely need to access a klib from C, but I'm still not very satisfied with it; I may create a GitHub template to better solve Kotlin interop from C. But that's not the point of this answer.

Postgres C extension aggregate: How to detect first time aggregate function is called

I'm writing a C extension aggregate function for PostgreSQL, and in the C code I would like to know whether it is the first time the aggregate's transition function is called.
For example, I define an aggregate function such as:
CREATE AGGREGATE my_aggregate (text) (
sfunc = my_transfunc,
stype = text,
finalfunc = my_finalfn,
initcond = '');
Then, in the C code of my_transfunc, how can I know whether it is the first time my_transfunc is called (and not the second, third, ... time)?
Datum my_transfunc(PG_FUNCTION_ARGS) {
// How to check if the first time function called
if (first_time) { then do something }
else { do some other things }
}
I don't want to use a global or static variable here, as this would make my function not thread-safe, which violates the requirements for my function.
Generally, this is a matter of setting initcond properly. Typically, you do not need to know whether the function is being executed for the first time, as long as the algorithm is designed properly.
In your case, assuming that the function returns a non-empty string, you can check whether the argument is empty (i.e. is equal to initcond). Of course, you can set initcond to a special value instead of an empty string.
Datum my_transfunc(PG_FUNCTION_ARGS) {
    text *arg = PG_GETARG_TEXT_PP(0);
    int32 arg_size = VARSIZE_ANY_EXHDR(arg);
    if (arg_size == 0) {
        /* arg == initcond: this is the first call */
    } else {
        /* do some other things */
    }
}
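To see why comparing the state against initcond is enough, here is a plain C simulation of the fold that an aggregate performs. It deliberately does not use the PostgreSQL API: join_transfunc, join_aggregate and STATE_CAP are hypothetical names for illustration, and the driver loop merely stands in for the executor calling the transition function once per row.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define STATE_CAP 256  /* arbitrary buffer size for this sketch */

/* Simulated transition function: state holds the running result.
 * An empty state means it still equals initcond (''), i.e. first call. */
void join_transfunc(char *state, const char *value)
{
    assert(state != NULL && value != NULL);

    size_t used = strlen(state);
    if (used == 0) {
        /* First call: start the result without a separator. */
        snprintf(state, STATE_CAP, "%s", value);
    } else {
        /* Later calls: append a separator and the next value. */
        snprintf(state + used, STATE_CAP - used, ",%s", value);
    }
}

/* Simulated executor: initialises the state and feeds every row. */
void join_aggregate(const char *const *values, size_t count, char *out)
{
    out[0] = '\0';  /* initcond = '' */
    for (size_t i = 0; i < count; ++i)
        join_transfunc(out, values[i]);
}
```

As in the answer, this only works if a real transition result can never be empty; if the first row were an empty string, the second call would look like a first call again, which is the case where a special initcond value helps.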

How do I print values from C extensions?

Every Ruby object is of type VALUE in C. How do I print it in a readable way?
Any other tips concerning debugging of Ruby C extensions are welcome.
You can call p on Ruby objects with the C function rb_p. For example:
VALUE empty_array = rb_ary_new();
rb_p(empty_array); // prints out "[]"
Here's what I came up with:
static void d(VALUE v) {
ID sym_puts = rb_intern("puts");
ID sym_inspect = rb_intern("inspect");
rb_funcall(rb_mKernel, sym_puts, 1,
rb_funcall(v, sym_inspect, 0));
}
Having it in a C file, you can output VALUEs like so:
VALUE v;
d(v);
I've borrowed the idea from this article.
I've found an interesting way using Natvis files in Visual Studio.
I have created C++ wrapper objects over the Ruby C API - this gives me a little bit more type safety and the syntax becomes more similar to writing actual Ruby.
I won't be posting the whole code - too long for that, I plan on open sourcing it eventually.
But the gist of it is:
class Object
{
public:
Object(VALUE value) : value_(value)
{
assert(NIL_P(value_) || kind_of(rb_cObject));
}
operator VALUE() const
{
return value_;
}
// [More code] ...
};
Then let's take the String class, for example:
class String : public Object
{
public:
String() : Object(GetVALUE("")) {}
String(VALUE value) : Object(value)
{
CheckTypeOfOrNil(value_, String::klass());
}
String(std::string value) : Object( GetVALUE(value.c_str()) ) {}
String(const char* value) : Object( GetVALUE(value) ) {}
operator std::string()
{
return StringValueCStr(value_);
}
operator std::string() const
{
return operator std::string();
}
static VALUE klass()
{
return rb_cString;
}
// String.empty?
bool empty()
{
return length() == 0;
}
size_t length() const
{
return static_cast<size_t>(RSTRING_LEN(value_));
}
size_t size() const
{
return length();
};
};
So - my wrappers make sure to check that the VALUE they wrap is of expected type or Nil.
I then wrote some natvis files for Visual Studio which will provide some real time debug information for my wrapper objects as I step through the code:
<?xml version="1.0" encoding="utf-8"?>
<AutoVisualizer xmlns="http://schemas.microsoft.com/vstudio/debugger/natvis/2010">
<Type Name="SUbD::ruby::String">
<DisplayString Condition="value_ == RUBY_Qnil">Ruby String: Nil</DisplayString>
<DisplayString Condition="value_ != RUBY_Qnil">Ruby String: {((struct RString*)value_)->as.heap.ptr,s}</DisplayString>
<StringView Condition="value_ != RUBY_Qnil">((struct RString*)value_)->as.heap.ptr,s</StringView>
<Expand>
<Item Name="[VALUE]">value_</Item>
<Item Name="[size]" Condition="value_ != RUBY_Qnil">((struct RString*)value_)->as.heap.len</Item>
<Item Name="[string]" Condition="value_ != RUBY_Qnil">((struct RString*)value_)->as.heap.ptr</Item>
<Item Name="[capacity]" Condition="value_ != RUBY_Qnil">((struct RString*)value_)->as.heap.aux.capa</Item>
</Expand>
</Type>
</AutoVisualizer>
Note that this is all hard-coded to the exact internal structure of Ruby 2.0. This will not work in Ruby 1.8 or 1.9 - haven't tried with 2.1 or 2.2 yet. Also, there might be mutations of how the String can be stored which I haven't added yet. (Short strings can be stored as immediate values.)
(In fact - the natvis posted above only works for 32bit - not 64bit at the moment.)
But once that is set up I can step through code and inspect the Ruby strings almost like they are std::string:
Getting it all to work isn't trivial. If you noticed in my natvis my RUBY_Qnil references - they would not work unless I added this piece of debug code to my project:
// Required in order to make them available to natvis files in Visual Studio.
#ifdef _DEBUG
const auto DEBUG_RUBY_Qnil = RUBY_Qnil;
const auto DEBUG_RUBY_FIXNUM_FLAG = RUBY_FIXNUM_FLAG;
const auto DEBUG_RUBY_T_MASK = RUBY_T_MASK;
const auto DEBUG_RUBY_T_FLOAT = RUBY_T_FLOAT;
const auto DEBUG_RARRAY_EMBED_FLAG = RARRAY_EMBED_FLAG;
const auto DEBUG_RARRAY_EMBED_LEN_SHIFT = RARRAY_EMBED_LEN_SHIFT;
const auto DEBUG_RARRAY_EMBED_LEN_MASK = RARRAY_EMBED_LEN_MASK;
#endif
You cannot use macros in natvis definitions unfortunately, so that's why I had to manually expand many of them into the natvis file by inspecting the Ruby source itself. (The Ruby Cross Reference is of great help here: http://rxr.whitequark.org/mri/ident?v=2.0.0-p247)
It's still WIP, but it's already saved me a ton of headaches. Eventually I want to extract the debug setup on GitHub: https://github.com/thomthom (Keep an eye on that account if you are interested.)

Understanding angular js filter syntax according to documentation

According to the documentation, the "currency" filter takes amount as the first parameter, given the following syntax:
{{ currency_expression | currency : amount : symbol}}
But in the following example it never passed the amount as a parameter:
<span id="currency-default">{{amount | currency}}</span>
I'm assuming that amount in the example refers to the currency_expression in the syntax as written in the documentation. They could have written it in the documentation in this way:
{{ currency_expression | currency : symbol}}
Another example is the filter filter with the following syntax:
{{ filter_expression | filter : array : expression : comparator}}
But in the following example it never specified the "source array" parameter:
<tr ng-repeat="friendObj in friends | filter:search:strict">
I'm assuming that friendObj in friends in the example refers to the filter_expression and search refers to the array if we're going to follow the syntax as written in the documentation. They could have written it in the documentation in this way:
{{ filter_expression_that_returns_array | filter : expression : comparator}}
I'm not so sure if I'm missing something but the documentation doesn't make sense to me given their examples.
My question is, should I simply ignore what the documentation says that the first parameter must be the input?
For what it's worth (since by experience we already know it is like that), the source code indicates that the expression only needs to be before the |.
Using the source for version 1.2.16 and without going into much detail:
ng/parse.js#L103
// In the OPERATORS hash:
var OPERATORS = {
...
'|': function (self, locals, a, b) {
return b(self, locals)(self, locals, a(self, locals));
},
...
ng/parse.js#L579
// `Parser`'s `filter()` method:
Parser.prototype = {
...
filter: function() {
var token = this.expect();
var fn = this.$filter(token.text);
var argsFn = [];
while (true) {
if ((token = this.expect(':'))) {
argsFn.push(this.expression());
} else {
var fnInvoke = function(self, locals, input) {
var args = [input];
for (var i = 0; i < argsFn.length; i++) {
args.push(argsFn[i](self, locals));
}
return fn.apply(self, args);
};
return function() {
return fnInvoke;
};
}
}
},
...
So, what did we learn?
if ((token = this.expect(':'))) {
argsFn.push(this.expression());
The parser will get all tokens after the filter (separated by :) and put them in an array (argsFn).
var args = [input];
for (var i = 0; i < argsFn.length; i++) {
args.push(argsFn[i](self, locals));
}
At "runtime" (when the actual filtering is happening), a new array will be created (args), which will contain input (but what is input? more on that later) and each parameter token that was previously stored in argsFn.
return fn.apply(self, args);
This args array will be the arguments list to the filtering function.
So, args contains input and the tokens after filter_name (as in expression | filter_name : param1 : param2).
If we can convince ourselves that input is indeed the expression (appearing on the left of the |), then we should be convinced that there is no need to have the expression appear as the first parameter after filter_name.
var fnInvoke = function(self, locals, input) {
...
return function() {
return fnInvoke;
};
filter() returns an anonymous function that when executed returns the function fnInvoke.
input is the third argument passed to fnInvoke when it is executed.
Now let's get back to the | operator:
'|': function (self, locals, a, b) {
...
It will result in calling this anonymous function with a and b being the left-hand side (expression) and the right-hand side (filter_name:param1:param2) respectively.
(In fact, a and b are not the left- and right-hand sides themselves; they are functions that, when executed, return the result of evaluating the left- and right-hand sides in a given context, i.e. scope.)
return b(self, locals)(self, locals, a(self, locals));
This tells us that the function returned by calling the anonymous function returned by filter() (b(self, locals)) will be executed with the following arguments:
`self`, `locals`, `a(self, locals)`
Which means that the mysterious input parameter (remember it was the 3rd argument of fnInvoke?) is a(self, locals).
And a(self, locals) is basically the result of evaluating the left-hand side argument of the | operator in the context of the current scope, e.g. the result of evaluating a string ('someExpression') to the value of the property in the current scope ($scope.someExpression).
I don't know if you are convinced (I don't think I would have been).
I left much detail out of the explanation, but the interested reader can delve into that source and convince themselves :)
I feel kind of bad for posting such a longish answer with so little practical value. Sigh...
Personally, I would always read the source code when in doubt.
For your <span id="currency-default">{{amount | currency}}</span> example:
https://github.com/angular/angular.js/blob/master/src/ng/filter/filters.js#L50
currencyFilter.$inject = ['$locale'];
function currencyFilter($locale) {
var formats = $locale.NUMBER_FORMATS;
return function(amount, currencySymbol){
if (isUndefined(currencySymbol)) currencySymbol = formats.CURRENCY_SYM;
return formatNumber(amount, formats.PATTERNS[1], formats.GROUP_SEP, formats.DECIMAL_SEP, 2).
replace(/\u00A4/g, currencySymbol);
};
}
Looks to me like amount then currency, so input must be first parameter.
Update.
Source for filter in HTML Template Binding context.
https://github.com/angular/angular.js/blob/master/src/ng/filter/filter.js#L116
function filterFilter() {
return function(array, expression, comparator) {
if (!isArray(array)) return array;
First if is a check for array, aka input, as first parameter.

Loops' iterating in ANTLR

I'm trying to make a Pascal interpreter using ANTLR and am currently having some trouble processing loops while walking the AST.
For example, a for loop is parsed as:
parametricLoop
: FOR IDENTIFIER ASSIGN start = integerExpression TO end = integerExpression DO
statement
-> ^( PARAMETRIC_LOOP IDENTIFIER $start $end statement )
;
(The DOWNTO variant is omitted.)
How can I make the walker repeat the loop's execution as many times as needed? I know that I should use input.Mark() and input.Rewind() for that. But where exactly should they be put? My current, wrong variant looks like this (the target language is C#):
parametricLoop
:
^(
PARAMETRIC_LOOP
IDENTIFIER
start = integerExpression
{
Variable parameter = Members.variable($IDENTIFIER.text);
parameter.value = $start.result;
}
end = integerExpression
{
int end_value = $end.result;
if ((int)parameter.value > end_value) goto EndLoop;
parametric_loop_start = input.Mark();
}
statement
{
parameter.value = (int)parameter.value + 1;
if ((int)parameter.value <= end_value)
input.Rewind(parametric_loop_start);
}
)
{
EndLoop: ;
}
;
(I hope everything is understandable.) The loop condition should be checked before the statement's first execution.
I tried placing Mark and Rewind in different code blocks, including @init and @after, and even put a trailing goto to the loop's head, but each time the loop either iterated once or threw exceptions like "Unexpected token met", for example for ':=' (assignment). I have no idea how to make this work properly and can't find any working example. Can anybody suggest a solution to this problem?
I haven't used ANTLR, but it seems to me that you are trying to execute the program while you're parsing it, but that's not really what parsers are designed for (simple arithmetic expressions can be executed during parsing, but as you have discovered, loops are problematic). I strongly suggest that you use the parsing only to construct the AST. So the parser code for parametricLoop should only construct a tree node that represents the loop, with child nodes representing the variables, conditions and body. Afterwards, in a separate, regular C# class (to which you provide the AST generated by the parser), you execute the code by traversing the tree in some manner, and then you have complete freedom to jump back and forth between the nodes in order to simulate the loop execution.
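The approach described above, building the tree first and executing it in a separate pass, can be sketched in a few lines. The sketch below is in plain C rather than C#, uses no ANTLR types, and every name in it (Node, NodeKind, execute, run_sum_loop, the variable slots) is hypothetical; it only illustrates how an ordinary tree walker can re-run a loop body without rewinding any token stream.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NODE_FOR, NODE_ADD } NodeKind;

typedef struct Node Node;
struct Node {
    NodeKind kind;
    int var;          /* FOR: loop variable slot; ADD: target slot */
    int start, end;   /* FOR: inclusive bounds, already evaluated */
    int src;          /* ADD: slot whose value is added to var */
    const Node *body; /* FOR: statement to repeat */
};

/* Executes one statement node against the variable table. */
void execute(const Node *n, int vars[])
{
    assert(n != NULL && vars != NULL);
    switch (n->kind) {
    case NODE_FOR:
        /* The condition is tested before the first iteration, and the
         * body node is simply visited again on every pass. */
        for (vars[n->var] = n->start; vars[n->var] <= n->end; ++vars[n->var])
            execute(n->body, vars);
        break;
    case NODE_ADD:
        vars[n->var] += vars[n->src];
        break;
    }
}

/* Builds the tree for "for i := start to end do sum := sum + i"
 * and returns the final value of sum. */
int run_sum_loop(int start, int end)
{
    enum { SLOT_I, SLOT_SUM, SLOT_COUNT };
    int vars[SLOT_COUNT] = {0};
    Node add = { .kind = NODE_ADD, .var = SLOT_SUM, .src = SLOT_I };
    Node loop = { .kind = NODE_FOR, .var = SLOT_I,
                  .start = start, .end = end, .body = &add };
    execute(&loop, vars);
    return vars[SLOT_SUM];
}
```

Because the body is just a node pointer, repeating it is an ordinary recursive call, so no Mark/Rewind bookkeeping is needed.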
I work with ANTLR 3.4 and I found a solution which works with Class CommonTreeNodeStream.
Basically, I split off new instances of my tree parser, which in turn analyzed the subtrees. My sample code defines a while loop:
tree grammar Interpreter;
...
@members
{
...
private Interpreter (CommonTree node, Map<String, Integer> symbolTable)
{
this (new CommonTreeNodeStream (node));
...
}
...
}
...
stmt : ...
| ^(WHILE c=. s1=.) // ^(WHILE cond stmt)
{
for (;;)
{
Interpreter condition = new Interpreter (c, this.symbolTable);
boolean result = condition.cond ();
if (! result)
break;
Interpreter statement = new Interpreter (s1, this.symbolTable);
statement.stmt ();
}
}
...
cond returns [boolean result]
: ^(LT e1=expr e2=expr) {$result = ($e1.value < $e2.value);}
| ...
I just solved a similar problem. Several points:
It seems you need to use BufferedTreeNodeStream instead of CommonTreeNodeStream; CommonTreeNodeStream never worked for me (I struggled for a long time to find that out).
Using Seek seems clearer to me.
Here's my code for a list command; I'm pretty sure yours can easily be changed to this style:
list returns [Object r]
: ^(LIST ID
{int e_index = input.Index;}
exp=.
{int s_index = input.Index;}
statements=.
)
{
int next = input.Index;
input.Seek(e_index);
object list = expression();
foreach(object o in (IEnumerable<object>)list)
{
model[$ID.Text] = o;
input.Seek(s_index);
$r += optional_block().ToString();
}
input.Seek(next);
}
