C: Convert A ? B : C into if (A) B else C - c

I was looking for a tool that can convert C code expressions for the form:
a = (A) ? B : C;
into the 'default' syntax with if/else statements:
if (A)
a = B
else
a = C
Does someone know a tool that's capable to do such a transformation?
I work with GCC 4.4.2 and create a preprocessed file with -E but do not want such structures in it.
Edit:
Following code should be transformed, too:
a = ((A) ? B : C)->b;

Coccinelle can do this quite easily.
Coccinelle is a program matching and
transformation engine which provides
the language SmPL (Semantic Patch
Language) for specifying desired
matches and transformations in C code.
Coccinelle was initially targeted
towards performing collateral
evolutions in Linux. Such evolutions
comprise the changes that are needed
in client code in response to
evolutions in library APIs, and may
include modifications such as renaming
a function, adding a function argument
whose value is somehow
context-dependent, and reorganizing a
data structure. Beyond collateral
evolutions, Coccinelle is successfully
used (by us and others) for finding
and fixing bugs in systems code.
EDIT:
An example of semantic patch:
## expression E; constant C; ##
(
!E & !C
|
- !E & C
+ !(E & C)
)
From the documentation:
The pattern !x&y. An expression of this form is almost always meaningless, because it combines a boolean operator with a bit operator. In particular, if the rightmost bit of y is 0, the result will always be 0. This semantic patch focuses on the case where y is a constant.
You have a good set of examples here.
The mailing list is really active and helpful.

The following semantic patch for Coccinelle will do the transformation.
##
expression E1, E2, E3, E4;
##
- E1 = E2 ? E3 : E4;
+ if (E2)
+ E1 = E3;
+ else
+ E1 = E4;
##
type T;
identifier E5;
T *E3;
T *E4;
expression E1, E2;
##
- E1 = ((E2) ? (E3) : (E4))->E5;
+ if (E2)
+ E1 = E3->E5;
+ else
+ E1 = E4->E5;
##
type T;
identifier E5;
T E3;
T E4;
expression E1, E2;
##
- E1 = ((E2) ? (E3) : (E4)).E5;
+ if (E2)
+ E1 = (E3).E5;
+ else
+ E1 = (E4).E5;

The DMS Software Reengineering Toolkit can do this, by applying program transformations.
A specific DMS transformation to match your specific example:
domain C.
rule ifthenelseize_conditional_expression(a:lvalue,A:condition,B:term,C:term):
stmt -> stmt
= " \a = \A ? \B : \C; "
-> " if (\A) \a = \B; else \a=\C ; ".
You'd need another rule to handle your other case, but it is equally easy to express.
The transformations operate on source code structures rather than text, so layout out and comments won't affect recognition or application. The quotation marks in the rule not traditional string quotes, but rather are metalinguistic quotes that separate the rule syntax language from the pattern langu age used to specify the concrete syntax to be changed.
There are some issues with preprocessing directives if you intend to retain them. Since you apparantly are willing to work with preprocessor-expanded code, you can ask DMS to do the preprocessing as part of the transformation step; it has full GCC4 and GCC4-compatible preprocessors built right in.
As others have observed, this is a rather easy case because you specified it work at the level of a full statement. If you want to rid the code of any assignment that looks similar to this statement, with such assignments embedded in various contexts (initializers, etc.) you may need a larger set of transforms to handle the various set of special cases, and you may need to manufacture other code structures (e.g., temp variables of appropriate type). The good thing about a tool like DMS is that it can explicitly compute a symbolic type for an arbitrary expression (thus the type declaration of any needed temps) and that you can write such a larger set rather straightforwardly and apply all of them.
All that said, I'm not sure of the real value of doing your ternary-conditional-expression elimination operation. Once the compiler gets hold of the result, you may get similar object code as if you had not done the transformations at all. After all, the compiler can apply equivalence-preserving transformations, too.
There is obviously value in making regular changes in general, though.
(DMS can apply source-to-source program transformations to many langauges, including C, C++, Java, C# and PHP).

I am not aware of such a thing as the ternary operator is built-into the language specifications as a shortcut for the if logic... the only way I can think of doing this is to manually look for those lines and rewrite it into the form where if is used... as a general consensus, the ternary operator works like this
expr_is_true ? exec_if_expr_is_TRUE : exec_if_expr_is_FALSE;
If the expression is evaluated to be true, execute the part between ? and :, otherwise execute the last part between : and ;. It would be the reverse if the expression is evaluated to be false
expr_is_false ? exec_if_expr_is_FALSE : exec_if_expr_is_TRUE;

If the statements are very regular like this why not run your files through a little Perl script? The core logic to do the find-and-transform is simple for your example line. Here's a bare bones approach:
use strict;
while(<>) {
my $line = $_;
chomp($line);
if ( $line =~ m/(\S+)\s*=\s*\((\s*\S+\s*)\)\s*\?\s*(\S+)\s*:\s*(\S+)\s*;/ ) {
print "if(" . $2 . ")\n\t" . $1 . " = " . $3 . "\nelse\n\t" . $1 . " = " . $4 . "\n";
} else {
print $line . "\n";
}
}
exit(0);
You'd run it like so:
perl transformer.pl < foo.c > foo.c.new
Of course it gets harder and harder if the text pattern isn't as regular as the one you posted. But free, quick and easy to try.

Related

Regex with no 2 consecutive a's and b's

I have been trying out some regular expressions lately. Now, I have 3 symbols a, b and c.
I first looked at a case where I don't want 2 consecutive a's. The regex would be something like:
((b|c + a(b|c))*(a + epsilon)
Now I'm wondering if there's a way to generalize this problem to say something like:
A regular expression with no two consecutive a's and no two consecutive b's. I tried stuff like:
(a(b|c) + b(a|c) + c)* (a + b + epsilon)
But this accepts inputs such as"abba" or "baab" which will have 2 consecutive a's (or b's) which is not what I want. Can anyone suggest me a way out?
If you can't do a negative match then perhaps you can use negative lookahead to exclude strings matching aa and bb? Something like the following (see Regex 101 for more information):
(?!.*(aa|bb).*)^.*$
I (think I) solved this by hand-drawing a finite state machine, then, generating a regex using FSM2Regex. The state machine is written below (with the syntax from the site):
#states
s0
s1
s2
s3
#initial
s0
#accepting
s1
s2
s3
#alphabet
a
b
c
#transitions
s0:a>s1
s0:b>s2
s0:c>s3
s1:b>s2
s1:c>s3
s2:a>s1
s2:c>s3
s3:c>s3
s3:a>s1
s3:b>s2
If you look at the transitions, you'll notice it's fairly straightforward- I have states that correspond to a "sink" for each letter of the alphabet, and I only allow transitions out of that state for other letters (not the "sink" letter). For example, s1 is the "sink" for a. From all other states, you can get to s1 with an a. Once you're in s1, though, you can only get out of it with a b or a c, which have their own "sinks" s2 and s3 respectively. Because we can repeat c, s3 has a transition to itself on the character c. Paste the block text into the site, and it'll draw all this out for you, and generate the regex.
The regex it generated for me is:
c+cc*(c+$+b+a)+(b+cc*b)(cc*b)*(c+cc*(c+$+b+a)+$+a)+(a+cc*a+(b+cc*b)(cc*b)*(a+cc*a))(cc*a+(b+cc*b)(cc*b)*(a+cc*a))*(c+cc*(c+$+b+a)+(b+cc*b)(cc*b)*(c+cc*(c+$+b+a)+$+a)+b+$)+b+a
Which, I'm pretty sure, is not optimal :)
EDIT: The generated regex uses + as the choice operator (usually known to us coders as |), which means it's probably not suitable to pasting into code. However, I'm too scared to change it and risk ruining my regex :)
You can use back references to match the prev char
string input = "acbbaacbba";
string pattern = #"([ab])\1";
var matchList = Regex.Matches(input, pattern);
This pattern will match: bb, aa and bb. If you don't have any match in your input pattern, it means that it does not contain a repeated a or b.
Explanation:
([ab]): define a group, you can extend your symbols here
\1: back referencing the group, so for example, when 'a' is matched, \1 would be 'a'
check this page: http://www.regular-expressions.info/backref.html

Find conditional evaluation in for loop using libclang

I'm using clang (via libclang via the Python bindings) to put together a code-review bot. I've been making the assumption that all FOR_STMT cursors will have 4 children; INIT, EVAL, INC, and BODY..
for( INIT; EVAL; INC )
BODY;
which would imply that I could check the contents of the evaluation expression with something in python like:
forLoopComponents = [ c for c in forCursor.get_children() ]
assert( len( forLoopComponents ) == 4 )
initExpressionCursor = forLoopComponents[ 0 ]
evalExpressionCursor = forLoopComponents[ 1 ]
incExpressionCursor = forLoopComponents[ 2 ]
bodyExpressionCursor = forLoopComponents[ 3 ]
errorIfContainsAssignment( evalExpressionCursor ) # example code style rule
This approach seems...less than great to begin with, but I just accepted it as a result of libclang, and the Python bindings especially, being rather sparse. However I've recently noticed that a loop like:
for( ; a < 4; a-- )
;
will only have 3 children -- and the evaluation will now be the first one rather than the second. I had always assumed that libclang would just return the NULL_STMT for any unused parts of the FOR_STMT...clearly, I was wrong.
What is the proper approach for parsing the FOR_STMT? I can't find anything useful for this in libclang.
UPDATE: Poking through the libclang source, it looks like these 4 components are dumbly added from the clang::ForStmt class using a visitor object. The ForStmt object should be returning null statement objects, but some layer somewhere seems to be stripping these out of the visited nodes vector...?
The same here, as a workaround I replaced the first empty statement with a dummy int foo=0 statement.
I can imagine a solution, which uses Cursor's get_tokens to match the parts of the statement.
The function get_tokens can help in situations, where clang is not enough.

Finding all possible combinations of truth values to well-formed formulas occuring in the given formula in C

I want to write a truth table evaluator for a given formula like in this site.
http://jamie-wong.com/experiments/truthtabler/SLR1/
Operators are:
- (negation)
& (and)
| (or)
> (implication)
= (equivalence)
So far I made this
-(-(a& b) > ( -((a|-s)| c )| d))
given this formula my output is
abdsR
TTTT
TTTF
TTFT
TTFF
TFTT
TFTF
TFFT
TFFF
FTTT
FTTF
FTFT
FTFF
FFTT
FFTF
FFFT
FFFF
I am having difficulties with the evaluating part.
I created an array which in I stored indises of parenthesis if it helps,namely
7-3, 17-12, 20-11, 23-9, 24-1
I also checked the code in http://www.stenmorten.com/English/llc/source/turth_tables_ass4.c
,however I didn't get it.
Writing an operator precedence parser to evaluate infix notation expressions is not an easy task. However, the shunting yard algorithm is a good place to start.

How do I execute, as part of a ML{*...*} command, a string that contains ML source?

From the Isabelle user list Makarius Wenzel says this:
It is also possible to pass around ML sources as strings or tokens in Isabelle/ML, and invoke the compiler on it. That is a normal benefit of incremental compilation.
I have an ML statement as a string, like this:
ML{* val source_string = "val x = 3 + 4"; *}
I want to use "val x = 3 + 4" as a ML statement in a ML{*...*} command. I can do it by calling Poly/ML externally like this:
ML{*
fun polyp cmd = Isabelle_System.bash
("perl -e 'print q(" ^ cmd ^ ");' | $ML_HOME/poly");
*}
ML{* polyp source_string; *}
That takes about 200ms. I figure it would be about 0ms if I could do it internally.
Update 140411
Makarius Wenzel may have another way to do things, but what I have below with ML_Context.eval_text is almost what I want. It's in line with what I've been experimenting with. The problem is that used_only_by_src1 is global. I can't put it in the let.
I suppose if I was using src1 in two different ML{*...*} commands, there's some short period of time in which used_only_by_src1 could be changed by the one before it was used by the other. But, I guess this is all part of learning about stateless programming.
ML{*
val src1 = "x + y" (*I would actually have a global list of sources.*)
val used_only_by_src1 = Unsynchronized.ref "";
fun two_int_arg_fun s1 s2 = let
val s = "val x = " ^ s1 ^ "val y = " ^ s2
^ "used_only_by_src1 := (Int.toString(" ^ src1 ^ "))";
val _ = ML_Context.eval_text true Position.none s;
in !used_only_by_src1 end;
*}
ML{*
two_int_arg_fun;
two_int_arg_fun "44;" "778;"
(*3ms*)
*}
Note 140412: Actually, I don't need to execute ML strings. I can program in ML like normal, since any ML{*...*} can access global ML functions, where I can write what I need.
The main solution I got from this is how to pass arguments to a string of Perl code, which I got here from trying to do it for ML, so thanks to davidg. Plus, ML_Context.exec and ML_Context.eval_text might come in useful somewhere, and learning enough to be able to use them is totally useful.
There is the problem of needing a local or global Unsynchronized.ref that's guaranteed to not be changed by some other code (or a non-mutable type), but surely there's a solution to that.
I didn't pursue ML_Context.exec because isar_syn.ML wasn't making any sense to me, but I've gotten as far as below, so now I'm asking, "What functions do I need that involve Context.generic -> Context.generic or Toplevel.transition -> Toplevel.transition, and what do those do for me as far as being able to get the return value of 3+4 in a ML{*...*}?"
I'm using grep with Isabelle_System.bash, and I've gotten as far as what's below, in looking for the right signatures in Isabelle/Pure. I throw in for free a grep I use to look for useful or needed functions in the Poly/ML Basis.
ML{*
Isabelle_System.bash ("grep -nr 'get_generic' $ISABELLE_HOME/src/Pure");
Isabelle_System.bash ("grep -nr 'hash' $ML_HOME/../src/Basis/*");
Config.get_generic;
(*From use at line 265 of isar_syn.ML.*)
ML_Context.exec;
ML_Context.exec (fn () => ML_Context.eval_text true Position.none "3 + 4");
(*OUT: val it = fn: Context.generic -> Context.generic*)
(fn (txt, pos) =>
Toplevel.generic_theory
(ML_Context.exec (fn () => ML_Context.eval_text true pos txt) #>
Local_Theory.propagate_ml_env)) ("3 + 4", Position.none);
(*OUT: val it = fn: Toplevel.transition -> Toplevel.transition*)
*}
The structure ML_Context might be a good starting point to look. For instance, your expression can be executed as so:
ML {*
ML_Context.eval_text true Position.none "val x = 3 + 4"
*}
This will internally evaluate the expression 3 + 4, and throw away the results.
Functions like ML_Context.exec will allow you to capture the results of expressions and put them into your local context; you might want to look at implementation of the ML Isar command in src/Pure/Isar/isar_syn.ML to see how such functions are used in practice.

Array Operations in perl

Why does the code below return 11 with this :- #myarray = ("Rohan");
Explaination i got was :- The expression $scalar x $num_times, on the other hand, returns a string containing $num_times copies of $scalar concatenated together string-wise.
So it should give 10 not 11 ...
code is as below :-
print "test:\n";
#myarray = ("Rohan"); # this returns 11
###myarray = ("Rohan","G"); this returns 22
#myarray2 = (#myarray x 2);
#myarray3 = ((#myarray) x 2); #returns Rohan,Rohan and is correct
print join(",",#myarray2,"\n\n");
print join(",",#myarray3,"\n\n");
What’s happening is that the x operator supplies scalar context not just to its right‐hand operand, but also to its left‐and operand as well — unless the LHO is surrounded by literal parens.
This rule is due to backwards compatibility with super‐ancient Perl code from back when Perl didn’t understand having a list as the LHO at all. This might be a v1 vs v2 thing, a v2 vs v3 thing, or maybe v3 vs v4. Can’t quite remember; it was a very long time ago, though. Ancient legacy.
Since an array of N elements in scalar context in N, so in your scenario that makes N == 1 and "1" x 2 eq "11".
Perl is doing exactly what you asked. In the first example, the array is in scalar context and returns its length. this is then concatenated with itself twice. In the second example, you have the array in list context and the x operator repeats the list.

Resources