Find conditional evaluation in for loop using libclang - c

I'm using clang (via libclang via the Python bindings) to put together a code-review bot. I've been making the assumption that all FOR_STMT cursors will have 4 children: INIT, EVAL, INC, and BODY:
for( INIT; EVAL; INC )
BODY;
which would imply that I could check the contents of the evaluation expression with something in Python like:
forLoopComponents = [ c for c in forCursor.get_children() ]
assert( len( forLoopComponents ) == 4 )
initExpressionCursor = forLoopComponents[ 0 ]
evalExpressionCursor = forLoopComponents[ 1 ]
incExpressionCursor = forLoopComponents[ 2 ]
bodyExpressionCursor = forLoopComponents[ 3 ]
errorIfContainsAssignment( evalExpressionCursor ) # example code style rule
This approach seems... less than great to begin with, but I just accepted it as a consequence of libclang, and the Python bindings especially, being rather sparse. However, I've recently noticed that a loop like:
for( ; a < 4; a-- )
;
will only have 3 children -- and the evaluation will now be the first one rather than the second. I had always assumed that libclang would just return a NULL_STMT for any unused parts of the FOR_STMT... clearly, I was wrong.
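For reference, a minimal reproduction sketch (untested; it assumes the clang.cindex bindings can locate libclang on their own, and the file name t.c is made up for the example) that prints the child kinds of every FOR_STMT in an in-memory snippet:
import clang.cindex
from clang.cindex import CursorKind

index = clang.cindex.Index.create()
source = "void f(int a) { for (; a < 4; a--) ; }"
tu = index.parse("t.c", unsaved_files=[("t.c", source)])

def dump_for_children(cursor):
    if cursor.kind == CursorKind.FOR_STMT:
        # prints three child kinds here, not four -- the missing init is simply absent
        print([child.kind for child in cursor.get_children()])
    for child in cursor.get_children():
        dump_for_children(child)

dump_for_children(tu.cursor)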
What is the proper approach for parsing the FOR_STMT? I can't find anything useful for this in libclang.
UPDATE: Poking through the libclang source, it looks like these 4 components are dumbly added from the clang::ForStmt class using a visitor object. The ForStmt object should be returning null statement objects, but some layer somewhere seems to be stripping these out of the visited nodes vector...?

Same here; as a workaround I replaced the first, empty statement with a dummy int foo = 0 statement.
I can imagine a solution that uses the Cursor's get_tokens to match up the parts of the statement.
get_tokens can help in situations where the cursor tree alone is not enough.
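As a rough sketch of that get_tokens idea (untested, written against the clang.cindex bindings; split_for_stmt is just a name invented for this example): find the two semicolons inside the for(...) header plus the parenthesis that closes it, then bucket each child cursor by comparing source offsets.
from clang.cindex import CursorKind, TokenKind

def split_for_stmt(for_cursor):
    """Map a FOR_STMT's children to init/cond/inc/body; missing parts stay None."""
    assert for_cursor.kind == CursorKind.FOR_STMT
    depth = 0
    semis = []          # offsets of the two header semicolons
    close_paren = None  # offset of the ')' that closes the header
    for tok in for_cursor.get_tokens():
        if tok.kind != TokenKind.PUNCTUATION:
            continue
        if tok.spelling == '(':
            depth += 1
        elif tok.spelling == ')':
            depth -= 1
            if depth == 0 and close_paren is None:
                close_paren = tok.location.offset
        elif tok.spelling == ';' and depth == 1:
            semis.append(tok.location.offset)

    parts = {'init': None, 'cond': None, 'inc': None, 'body': None}
    if len(semis) < 2:
        return parts  # tokens hidden behind a macro; give up
    for child in for_cursor.get_children():
        start = child.extent.start.offset
        if close_paren is not None and start > close_paren:
            parts['body'] = child
        elif start < semis[0]:
            parts['init'] = child
        elif start < semis[1]:
            parts['cond'] = child
        else:
            parts['inc'] = child
    return parts
For the loop from the question this leaves parts['init'] as None while 'cond', 'inc' and 'body' still land in the right slots; macros that hide the header punctuation would of course defeat the offset comparison.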

Related

Efficiency of assign-and-compare in the same statement in Smalltalk

A previous SO question raised the issue of which idiom is better in terms of execution efficiency:
[ (var := exp) > 0 ] whileTrue: [ ... ]
versus
[ var := exp.
var > 0 ] whileTrue: [ ... ]
Intuitively it seems the first form could be more efficient during execution, because it saves fetching one additional statement (second form). Is this true in most Smalltalks?
Trying with two stupid benchmarks:
| var acc |
var := 10000.
[ [ (var := var / 2) < 0 ] whileTrue: [ acc := acc + 1 ] ] bench.
| var acc |
var := 10000.
[ [ var := var / 2. var < 0 ] whileTrue: [ acc := acc + 1 ] ] bench
This reveals no major difference between the two versions.
Any other opinions?
So the question is: What should I use to achieve a better execution time?
temp := <expression>.
temp > 0
or
(temp := <expression>) > 0
In cases like this one, the best way to arrive at a conclusion is to go down one step in the level of abstraction. In other words, we need a better understanding of what's happening behind the scenes.
The executable part of a CompiledMethod is represented by its bytecodes. When we save a method, what we are doing is compiling it into a series of low level instructions for the VM to be able to execute the method every time it is invoked. So, let's take a look at the bytecodes of each one of the cases above.
Since <expression> is the same in both cases, let's reduce it drastically to eliminate noise. Also, let's put our code in a method so we have a CompiledMethod to play with:
Object >> m
| temp |
temp := 1.
temp > 0
Now, let's look in CompiledMethod and its superclasses for some message that would show us the bytecodes of Object >> #m. The selector should contain the subword bytecodes, right?
...
Here it is #symbolicBytecodes! Now let's evaluate (Object >> #m) symbolicBytecodes to get:
pushConstant: 1
popIntoTemp: 0
pushTemp: 0
pushConstant: 0
send: >
pop
returnSelf
Note, by the way, how our temp variable is referred to by its index 0 in the bytecode language.
Now let's repeat with the other form and get:
pushConstant: 1
storeIntoTemp: 0
pushConstant: 0
send: >
pop
returnSelf
The difference is
popIntoTemp: 0
pushTemp: 0
versus
storeIntoTemp: 0
What this reveals is that the two forms handle temp and the stack differently. In the first case, the result of our <expression> is popped into temp from the execution stack and then temp is pushed again to restore the stack: a pop followed by a push of the same thing. In the second case, instead, there is no pop-and-push pair; the value is stored into temp while it stays on the stack for the comparison.
So the conclusion is that in the first case we will be generating two cancelling instructions pop followed by push.
This also explains why the difference is so hard to measure: push and pop instructions have direct translations into machine code and the CPU will execute them really fast.
Note, however, that nothing prevents the compiler from automatically optimizing the code and realizing that pop + push is in fact equivalent to storeInto. With such an optimization both Smalltalk snippets would result in exactly the same machine code.
Now you should be able to decide which form you prefer. In my opinion such a decision should only take into account the programming style that you like better. Taking the execution time into consideration is irrelevant because the difference is minimal, and could easily be reduced to zero by implementing the optimization we just discussed. By the way, that would be an excellent exercise for those willing to understand the low-level realms of the unparalleled Smalltalk language.
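As a purely illustrative sketch of that optimization (Python pseudocode operating on (name, argument) pairs, not anything a real Smalltalk compiler contains), a peephole pass only has to spot a popIntoTemp immediately followed by a pushTemp of the same slot and fuse them into one storeIntoTemp:
def peephole(bytecodes):
    """Fuse popIntoTemp: n / pushTemp: n pairs into a single storeIntoTemp: n."""
    out = []
    i = 0
    while i < len(bytecodes):
        op, arg = bytecodes[i]
        nxt = bytecodes[i + 1] if i + 1 < len(bytecodes) else None
        if op == 'popIntoTemp' and nxt == ('pushTemp', arg):
            out.append(('storeIntoTemp', arg))
            i += 2  # consume both halves of the pair
        else:
            out.append((op, arg))
            i += 1
    return out

first_form = [('pushConstant', 1), ('popIntoTemp', 0), ('pushTemp', 0),
              ('pushConstant', 0), ('send', '>'), ('pop', None), ('returnSelf', None)]
print(peephole(first_form))
# -> the bytecodes of the second form:
# [('pushConstant', 1), ('storeIntoTemp', 0), ('pushConstant', 0),
#  ('send', '>'), ('pop', None), ('returnSelf', None)]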

Looping on a database with Clojure

I just got started with Clojure on Heroku, first reading this introduction.
Now that I am getting my hands dirty, I am facing an issue handling a database in a loop.
This is working:
(for [s (db/query (env :database-url)
                  ["select * from My_List"])]
  ; here one can do something with s, for example:
  ; print out (:field s)
  )
But this is not enough, because I want to update variables inside the loop.
Reading up on the subject, I understand that since Clojure has its own way of handling variables, I need to use a loop/recur pattern.
Here is what I tried:
(loop [a 0 b 1
       s (db/query (env :database-url)
                   ["select * from My_List"])]
  ; here I want to do something with s, for example
  ; print out (:field s)
  ; and do the following ... but it does not work!
  (if (> (:otherField s) 5)
    (:otherField s)
    (recur (+ a (:otherField s)) b s)))
Since I tried various ways of doing this before writing this post, I know that the code above works except for the fact that I am doing something wrong concerning the database.
So here comes my question: What do I need to change to make it work?
I see that it's hard to get into functional thinking at first when you're used to a different paradigm.
I don't think there is a correct explanation on “how to do this loop right”, because it's not right to do a loop here.
The two things that feel most incorrect to me:
Never do a SELECT * FROM table. This is not how relational databases are meant to be used. For example, when you want the sum of all values greater than 5 you should do: SELECT SUM(field) FROM my_list WHERE field > 5
Don't think in loops (how to do it) but in what you want to do with the data:
I want to work on the field :otherField
I am only interested in values bigger than 5
I want the sum of all the remaining values
Then you come to something like this:
(reduce +
        (filter #(> % 5)
                (map :otherField
                     (db/query (env :database-url) ["select * from My_List"]))))
(No loop at all.)

Using eval in Julia to deal with varargs

I have just started using Julia. I am trying to use eval (in Julia) in order to define a set of variables in a function. Let's say I want to set v1 equal to 2:
function fun_test(varargs...)
    v1 = 0;
    if length(varargs) > 0
        j = collect(linspace(1,length(varargs)-1,length(varargs)/2));
        for i in j
            expr_vargs = parse("$(varargs[i]) = $(varargs[i+1]);");
            eval(expr_vargs);
        end
    end
    println(v1)
end
Calling the function as:
fun_test("v1", "2");
It doesn't work, since println prints 0 (the initial value of v1). However, if I run an analogous eval call in the Julia terminal, it works.
Could you please clarify why it doesn't work and how to fix it?
eval runs in top-level scope, not in function scope. It is not possible to dynamically update bindings in function scope. Without knowing your precise use case, I suspect there is a way to do things without dynamic rebinding. In particular, v1, v2, etc. are probably best made into an array, V.
Nevertheless, if you really must, you can always define v1 as a global variable in a submodule:
module FunTest

v1 = 0

function fun_test(varargs...)
    if length(varargs) > 0
        for i in 1:2:length(varargs)-1   # step over the (name, value) pairs
            @eval $(varargs[i]) = $(varargs[i+1])
        end
    end
    println(v1)
end

export fun_test

end
using .FunTest
fun_test(:v1, 2) # result: 2
(I have also modified your code to avoid parsing strings, which is best done through expression interpolation.)

Do loop index changing on its own

I have a couple hundred line program (including functions) in essentially free-form Fortran. At one point, I have a pair of nested do loops that call functions and store results in matrices. However, I don't believe any of that is the problem (although I could be wrong).
Immediately after the first do loop starts, I define an array using a column of another array. Immediately after that, the index is always set to 3. I haven't been able to find any useful information in the usual places. I've included a fragment of the code below.
do i = 1,n
   print *, 'i:',i ! Gives i = 1
   applyto = eig_vec(:,i)
   print *, i ! Gives i = 3
   state1 = create_state(ground,applyto,state,bin_state,num_s,ns)
   first = destroy_state(ground,state1,state,bin_state,num_s,ns)
   state1 = destroy_state(ground,applyto,state,bin_state,num_s,ns)
   second = create_state(ground,state1,state,bin_state,num_s, &
                         ns)
   do j = 1,n
      bra = eig_vec(:,j)
      a_matrix(j,i) = sum(bra*first + bra*second)
      matrix(j,i) = sum(bra*first - bra*second)
   end do
end do
Is this a bug? Am I missing something obvious? I am compiling the code with a high level of optimization, in case that could be a source of problems. I'm relatively new to Fortran, so debugging flags or commands (for gdb - I believe that's all I have available) would be welcome.

C: Convert A ? B : C into if (A) B else C

I was looking for a tool that can convert C code expressions for the form:
a = (A) ? B : C;
into the 'default' syntax with if/else statements:
if (A)
    a = B
else
    a = C
Does someone know a tool that's capable of doing such a transformation?
I work with GCC 4.4.2 and create a preprocessed file with -E but do not want such structures in it.
Edit:
Following code should be transformed, too:
a = ((A) ? B : C)->b;
Coccinelle can do this quite easily.
Coccinelle is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. Coccinelle was initially targeted towards performing collateral evolutions in Linux. Such evolutions comprise the changes that are needed in client code in response to evolutions in library APIs, and may include modifications such as renaming a function, adding a function argument whose value is somehow context-dependent, and reorganizing a data structure. Beyond collateral evolutions, Coccinelle is successfully used (by us and others) for finding and fixing bugs in systems code.
EDIT:
An example of a semantic patch:
@@
expression E;
constant C;
@@
(
!E & !C
|
- !E & C
+ !(E & C)
)
From the documentation:
The pattern !x&y. An expression of this form is almost always meaningless, because it combines a boolean operator with a bit operator. In particular, if the rightmost bit of y is 0, the result will always be 0. This semantic patch focuses on the case where y is a constant.
You have a good set of examples here.
The mailing list is really active and helpful.
The following semantic patch for Coccinelle will do the transformation.
@@
expression E1, E2, E3, E4;
@@
- E1 = E2 ? E3 : E4;
+ if (E2)
+ E1 = E3;
+ else
+ E1 = E4;
@@
type T;
identifier E5;
T *E3;
T *E4;
expression E1, E2;
@@
- E1 = ((E2) ? (E3) : (E4))->E5;
+ if (E2)
+ E1 = E3->E5;
+ else
+ E1 = E4->E5;
@@
type T;
identifier E5;
T E3;
T E4;
expression E1, E2;
@@
- E1 = ((E2) ? (E3) : (E4)).E5;
+ if (E2)
+ E1 = (E3).E5;
+ else
+ E1 = (E4).E5;
The DMS Software Reengineering Toolkit can do this, by applying program transformations.
A specific DMS transformation to match your specific example:
domain C.

rule ifthenelseize_conditional_expression(a:lvalue,A:condition,B:term,C:term):
    stmt -> stmt
    = " \a = \A ? \B : \C; "
    -> " if (\A) \a = \B; else \a=\C ; ".
You'd need another rule to handle your other case, but it is equally easy to express.
The transformations operate on source code structures rather than text, so layout and comments won't affect recognition or application. The quotation marks in the rule are not traditional string quotes, but rather metalinguistic quotes that separate the rule-syntax language from the pattern language used to specify the concrete syntax to be changed.
There are some issues with preprocessing directives if you intend to retain them. Since you apparently are willing to work with preprocessor-expanded code, you can ask DMS to do the preprocessing as part of the transformation step; it has full GCC4 and GCC4-compatible preprocessors built right in.
As others have observed, this is a rather easy case because you specified it to work at the level of a full statement. If you want to rid the code of any assignment that looks similar to this statement, with such assignments embedded in various contexts (initializers, etc.), you may need a larger set of transforms to handle the various special cases, and you may need to manufacture other code structures (e.g., temp variables of appropriate type). The good thing about a tool like DMS is that it can explicitly compute a symbolic type for an arbitrary expression (and thus the type declarations of any needed temps), and you can write such a larger set rather straightforwardly and apply all of them.
All that said, I'm not sure of the real value of doing your ternary-conditional-expression elimination operation. Once the compiler gets hold of the result, you may get similar object code as if you had not done the transformations at all. After all, the compiler can apply equivalence-preserving transformations, too.
There is obviously value in making regular changes in general, though.
(DMS can apply source-to-source program transformations to many languages, including C, C++, Java, C# and PHP.)
I am not aware of such a tool, as the ternary operator is built into the language specification as a shortcut for the if logic... the only way I can think of doing this is to manually look for those lines and rewrite them into the form where if is used. As a general consensus, the ternary operator works like this:
expr_is_true ? exec_if_expr_is_TRUE : exec_if_expr_is_FALSE;
If the expression is evaluated to be true, execute the part between ? and :, otherwise execute the last part between : and ;. It would be the reverse if the expression is evaluated to be false:
expr_is_false ? exec_if_expr_is_FALSE : exec_if_expr_is_TRUE;
If the statements are very regular like this, why not run your files through a little Perl script? The core logic to do the find-and-transform is simple for your example line. Here's a bare-bones approach:
use strict;
while (<>) {
    my $line = $_;
    chomp($line);
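    # match lines of the form:  lhs = (cond) ? val_true : val_false;
    # capture groups: $1 = lhs, $2 = cond, $3 = val_true, $4 = val_false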
    if ( $line =~ m/(\S+)\s*=\s*\((\s*\S+\s*)\)\s*\?\s*(\S+)\s*:\s*(\S+)\s*;/ ) {
        print "if(" . $2 . ")\n\t" . $1 . " = " . $3 . "\nelse\n\t" . $1 . " = " . $4 . "\n";
    } else {
        print $line . "\n";
    }
}
exit(0);
You'd run it like so:
perl transformer.pl < foo.c > foo.c.new
Of course it gets harder and harder if the text pattern isn't as regular as the one you posted. But free, quick and easy to try.
