How do I change the perl regex for code formatting? - c

I am trying to understand the code below, but I am not getting it.
Basically, the code currently checks if conditions in a C or C++ file.
if ($perl_version_ok &&
$line =~ /^\+(.*)\b($Constant|[A-Z_][A-Z0-9_]*)\s*($Compare)\s*($LvalOrFunc)/) {
# Throw error
}
where Constant is any macro or constant value; LvalOrFunc is any variable or function call; Compare is an operator such as !=, ==, &&, etc.
The if checks for code like if(CONST_VALUE == x), where CONST_VALUE is some macro. In this case the match succeeds and the body of the if runs.
But I want to check for the opposite, if(x == CONST_VALUE), and then throw an error.
Please help me understand this line and how to achieve the desired result.
Note:
The code is from linux kernel dir, available here: https://github.com/torvalds/linux/blob/master/scripts/checkpatch.pl
Line number of the code: 5483

The code doesn't check for if(CONST_VALUE == x). As shown in the comments above the line in the source code
# comparisons with a constant or upper case identifier on the left
# avoid cases like "foo + BAR < baz"
# only fix matches surrounded by parentheses to avoid incorrect
# conversions like "FOO < baz() + 5" being "misfixed" to "baz() > FOO + 5"
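it checks for a line that starts with a plus sign and contains CONSTANT_VALUE == x. The \+ in the regex matches a literal plus sign; since checkpatch.pl reads patches, that plus sign is the marker of an added line in a diff.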
$line =~ /^\+(.*)\b($Constant|[A-Z_][A-Z0-9_]*)\s*($Compare)\s*($LvalOrFunc)/
=~     binding operator
^      start of string
\+     plus
(.*)   anything
\b     word boundary
Reversing the compared values should be easy:
($LvalOrFunc)\s*($Compare)\s*($Constant|[A-Z_][A-Z0-9_]*)
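As a rough way to try the reversed order outside of checkpatch.pl, here is a minimal sketch in bash using grep -E. The variables lval, compare and constant are simplified stand-ins that I am assuming for illustration; the real $LvalOrFunc, $Compare and $Constant in checkpatch.pl are far more elaborate. The helper name const_on_right is hypothetical.

```bash
#!/usr/bin/env bash
# Simplified stand-ins (assumptions) for checkpatch's building blocks.
lval='[a-z][A-Za-z0-9_]*'              # lower-case-initial identifier only
compare='(==|!=|<=|>=|<|>)'            # comparison operators only
constant='([0-9]+|[A-Z_][A-Z0-9_]*)'   # number or upper-case identifier

# Added patch line ("+" prefix) with the variable on the left and the
# constant on the right. Explicit non-word classes replace \b for portability.
pattern='^\+(.*[^A-Za-z0-9_])?'"$lval"'[[:space:]]*'"$compare"'[[:space:]]*'"$constant"'([^A-Za-z0-9_]|$)'

# Succeeds (exit status 0) when the line matches the reversed pattern.
const_on_right() {
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

for line in '+ if (x == CONST_VALUE)' '+ if (CONST_VALUE == x)'; do
    if const_on_right "$line"; then
        echo "flag:  $line"
    else
        echo "clean: $line"
    fi
done
```

With these stand-ins, only the line with the constant on the right is flagged, and lines that do not start with a plus sign are never matched.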

bash: search for string, copy string next to it and list all for further post-processing

I have the following challenge:
my source_file.txt contains:
track001="alpha"
some text ... but also again the string track001 without " symbol... some more text
track002="beta"
some text ... but also again the string track002 without " symbol ... some more text
track027="gamma"
some text ... but also again the string track003 without " symbol ... some more text
track...="..."
... about 30 entries.
Now, I want to
search for the string next to trackxxx=" (=> find the alpha, beta and gamma string)
afterwards provide the list to the user for further pre-processing in the terminal:
| Reference | Title | Status |
|---------- |--------| ------------------|
| 001 | alpha | [ not selected ] |
| 002 | beta | [ not selected ] |
| ... | ... | [ not selected ] |
| 027 | gamma | [ not selected ] |
type Reference number (xxx): < user prompt>
change Status (selected = 1 / not selected = 0): < user prompt >
I thought about:
to copy the file and delete all lines which do not start with trackxxx=" but I guess there is nice sed which does the magic.
I need to paste all into a matrix to ease the pre-processing
for the pre-processing I would like to keep it simple (terminal interaction) no zenity etc.. Maybe someone has an idea to make the selector operation more user friendly.
Appreciate your support, thank you!
As a partial answer, because of the request for explanation of my comments:
sed -n 's/^track\(.*\)="\([^"]*\).*/ \1 \2 /p' will give you a list of
001 alpha
002 beta
...
027 gamma
which can be fed into a for-loop in bash to do the actual processing.
sed -n will not produce output, unless a line is explicitly printed
s/pattern/replacement/ replaces the pattern by the replacement
^track matches track if it is at the beginning of a line (^)
\(.*\) creates a capture group; the \( opens the capture group and the \) closes it. The capture group contains all characters up to the next element in the pattern
=" This is the next element in the pattern: the literal ="
\([^"]*\) second capture group. All characters that are not " are added to this group.
.* the rest of the line. Will most probably begin with a ", but if you forget the closing ", that's ok too.
\1 \2 The replacement string is a combination of the two capture groups, \1 for the first and \2 for the second.
p Explicitly print this line if the pattern is matched. Because of the -n, normal output is suppressed, and you will get only the explicitly printed lines.
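To show how that sed output could be fed into a loop, here is a minimal sketch. The sample text stands in for source_file.txt, and list_tracks is a hypothetical helper name; the row layout simply copies the table from the question.

```bash
#!/usr/bin/env bash
# Sample input mirroring the question's source_file.txt (contents are illustrative).
sample='track001="alpha"
some text mentioning track001 again
track002="beta"
track027="gamma"'

# Read track lines on stdin and print one table row per match.
list_tracks() {
    sed -n 's/^track\(.*\)="\([^"]*\).*/ \1 \2 /p' |
    while read -r ref title; do
        # read splits on whitespace: first word is the reference, the rest is the title
        printf '| %s | %s | [ not selected ] |\n' "$ref" "$title"
    done
}

table=$(printf '%s\n' "$sample" | list_tracks)
printf '%s\n' "$table"
```

Against the real file, the same function could be called as list_tracks < source_file.txt. Lines that merely mention a track name without the =" part are skipped because the pattern is anchored at the start of the line and requires the literal =".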

How to define standard mathematical notation with mpc parser

I am reading a compiler tutorial here www.buildyourownlisp.com. It uses a parser combinator called mpc. What I have at the moment will parse Polish notation, but I'm trying to work out how to use standard notation with it. I just can't seem to work out how to do it.
The rules of the parser are as follows:
. Any character is required.
a The character a is required.
[abcdef] Any character in the set abcdef is required.
[a-f] Any character in the range a to f is required.
a? The character a is optional.
a* Zero or more of the character a are required.
a+ One or more of the character a are required.
^ The start of input is required.
$ The end of input is required.
"ab" The string ab is required.
'a' The character a is required.
'a' 'b' First 'a' is required, then 'b' is required.
'a' | 'b' Either 'a' is required, or 'b' is required.
'a'* Zero or more 'a' are required.
'a'+ One or more 'a' are required.
<abba> The rule called abba is required.
The polish notation is written like this:
" \
number: /-?[0-9]+/'.'?/[0-9]+/ ; \
operator: '+' | '-' | '*' | '/' | '%' | \"add\" | \"sub\" | \"mul\" | \"div\" ; \
expr: <number> | '(' <operator> <expr>+ ')' ; \
dlispy: /^/ <operator> <expr>+ /$/ ;",
I've managed to make it accept decimal numbers by adding '.'?/[0-9]+/, but I can't work out how to restructure it to accept standard notation, e.g. 2*(3+2) instead of * 2 (+ 3 2). I know that I'll have to rewrite the expr and the dlispy rules, but I'm new to regex and BNF. Hope you can help, thanks.
Written as yacc rules, that would be:
expr : '(' expr ')'
| expr operator expr
| number
;

Flex/Bison start condition

How to enable a start condition at the beginning of a rule and disable it at the end ? I have to ignore whitespace with some bison rules only.
How to ignore whitespace inside nested brackets.
define_directive:
DEFINE '(' class_name ')'{ ... }
;
I'm trying to write a parser for this sample code with some more rules.
#/*
* #Template Family
* #Description sample script template for Mate Programming language
* (multi-line comment)
*/
#namespace(sample)
#require(String fatherName)
#require(String motherName)
#require(Array childrenNames)
#define(Family : Template) #// end of header anything can go in body section below (comment)
Family Description
==================
Father's Name: #(fatherName)
Mother's Name: #(motherName)
Number of child: #(childrenNamesCount,0) #// valuation operator is null safe (comment)
List of children's names
------------------------
#foreach(childName:childrenNames)
> #(childName)
#empty
> there is no child name to display.
#end
##(varName) #// this should not be interpreted because escaped with # (comment)
Lexer and parser partially implemented. My problem is how to deal with whitespace inside statement keywords like #foreach, #require.
Whitespaces should be ignored for these.
desired sample output
Family Description
==================
Father's Name: Mira
Mother's Name: James
Number of child: 0
List of children's names
------------------------
> there is no child name to display.
##(varName)
bison file content
command:
fileword
| valuation
| alternative
| loop
| command_directive
;
fileword:
tokenword { scriptlangy_echo(yytext,"fileword.tokenword"); }
| MAGICESC { scriptlangy_echo("#","fileword.MAGICESC"); }
;
tokenword:
IDENTIFIER | NUMBER | STRING_LITERAL | WHITESPACE
| INC_OP | DEC_OP | AND_OP | OR_OP | LE_OP | GE_OP | EQ_OP | NE_OP | L_OP | G_OP
| ';' | ',' | ':' | '=' | ']' | '.' | '&' | '[' | '!' | '~' | '-' | '+' | '*' | '/' | '%' | '^' | '|' | ')' | '}' | '?' | '{' | '('
;
valuation:
'#' '(' expression ')' {
fprintf(yyout, "<val>");
}
| '#' '(' expression ',' default_value ')' {
fprintf(yyout, "<val>");
}
;
loop:
for_loop
| foreach_loop
| while_loop
;
while_loop:
WHILE '(' expression ')' end_block
| WHILE '(' expression ')' commands end_block
;
for_loop:
FOR '(' expression_statement expression_statement expression')' end_block
| FOR '(' expression_statement expression_statement expression')' commands end_block
;
foreach_loop:
foreach_block end_block
| foreach_block empty_block end_block
;
foreach_block:
FOREACH '(' IDENTIFIER ')'
| FOREACH '(' IDENTIFIER ':' expression')' commands
;
The key part of your question seems to be this:
I have to ignore whitespace with some bison rules only. How to ignore
whitespace inside nested brackets.
As I remarked in comments, your implementation idea of somehow doing this by having your parser rules manipulate scanner start conditions is pretty much a non-starter. Forget about that.
Since evidently your scanner does not, in general, ignore whitespace, it must emit tokens that represent whitespace, or perhaps tokens that represent something else plus whitespace (ugly). If it emits whitespace tokens then the thing to do is simply to account for them in your grammar rules. This is completely possible. In fact, you can build a parser for any context-free language on top of a scanner that just returns every character as its own token. The scanner / parser dichotomy is a functional and conceptual convenience, not a necessity.
For example, then, suppose we want to be able to parse numeric array literals, formed as a nonempty, comma-delimited list of decimal numbers enclosed in curly braces, with optional whitespace around commas and inside the braces. Suppose further that we have these terminal symbols to work with:
OPEN // open brace
CLOSE // close brace
NUM // maximal sequence of one or more decimal digits
COMMA // a comma
WS // a maximal run of whitespace
We might then write these rules:
array: array_start array_elements CLOSE;
array_start: OPEN
| OPEN WS
;
array_elements: array_element
| array_elements array_separator array_element
;
array_element: NUM
| NUM WS
;
array_separator: COMMA
| COMMA WS
;
There are, of course, many other ways to set up the details, but, generally speaking, this is how you handle whitespace with parser rules: not by ignoring it, but by accepting it.

Bison shift/reduce conflicts warning [duplicate]

I'm having trouble fixing a shift reduce conflict in my grammar. I tried to add -v to read the output of the issue and it guides me towards State 0 and mentions that my INT and FLOAT is reduced to variable_definitions by rule 9. I cannot see the conflict and I'm having trouble finding a solution.
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token INT FLOAT
%token ADDOP MULOP INCOP
%token WHILE IF ELSE RETURN
%token NUM ID
%token INCLUDE
%token STREAMIN ENDL STREAMOUT
%token CIN COUT
%token NOT
%token FLT_LITERAL INT_LITERAL STR_LITERAL
%right ASSIGNOP
%left AND OR
%left RELOP
%%
program: variable_definitions
| function_definitions
;
function_definitions: function_head block
| function_definitions function_head block
;
identifier_list: ID
| ID '[' INT_LITERAL ']'
| identifier_list ',' ID
| identifier_list ',' ID '[' INT_LITERAL ']'
;
variable_definitions:
| variable_definitions type identifier_list ';'
;
type: INT
| FLOAT
;
function_head: type ID arguments
;
arguments: '('parameter_list')'
;
parameter_list:
|parameters
;
parameters: type ID
| type ID '['']'
| parameters ',' type ID
| parameters ',' type ID '['']'
;
block: '{'variable_definitions statements'}'
;
statements:
| statements statement
;
statement: expression ';'
| compound_statement
| RETURN expression ';'
| IF '('bool_expression')' statement ELSE statement
| WHILE '('bool_expression')' statement
| input_statement ';'
| output_statement ';'
;
input_statement: CIN
| input_statement STREAMIN variable
;
output_statement: COUT
| output_statement STREAMOUT expression
| output_statement STREAMOUT STR_LITERAL
| output_statement STREAMOUT ENDL
;
compound_statement: '{'statements'}'
;
variable: ID
| ID '['expression']'
;
expression_list:
| expressions
;
expressions: expression
| expressions ',' expression
;
expression: variable ASSIGNOP expression
| variable INCOP expression
| simple_expression
;
simple_expression: term
| ADDOP term
| simple_expression ADDOP term
;
term: factor
| term MULOP factor
;
factor: ID
| ID '('expression_list')'
| literal
| '('expression')'
| ID '['expression']'
;
literal: INT_LITERAL
| FLT_LITERAL
;
bool_expression: bool_term
| bool_expression OR bool_term
;
bool_term: bool_factor
| bool_term AND bool_factor
;
bool_factor: NOT bool_factor
| '('bool_expression')'
| simple_expression RELOP simple_expression
;
%%
Your definition of a program is that it is either a list of variable definitions or a list of function definitions (program: variable_definitions | function_definitions;). That seems a bit odd to me. What if I want to define both a function and a variable? Do I have to write two programs and somehow link them together?
This is not the cause of your problem, but fixing it would probably fix the problem as well. The immediate cause is that function_definitions is one or more function definitions while variable_definitions is zero or more variable definitions. In other words, the base case of the function_definitions recursion is a function definition, while the base case of variable_definitions is the empty sequence. So a list of variable definitions starts with an empty sequence.
But both function definitions and variable definitions start with a type. So if the first token of a program is int, it could be the start of a function definition with return type int or a variable definition of type int. In the former case, the parser should shift the int in order to produce the function_definitions base case; in the latter case, it should immediately reduce an empty variable_definitions base case. With only one token of lookahead the parser cannot tell which case applies, hence the shift/reduce conflict.
If you really wanted a program to be either function definitions or variable definitions, but not both, you would need to make variable_definitions have the same form as function_definitions, by changing the base case from empty to type identifier_list ';'. Then you could add an empty production to program so that the parser could recognize empty inputs.
But as I said at the beginning, you probably want a program to be a sequence of definitions, each of which could either be a variable or a function:
program: %empty
| program type identifier_list ';'
| program function_head block
By the way, you are misreading the output file produced by -v. It shows the following actions for State 0:
INT shift, and go to state 1
FLOAT shift, and go to state 2
INT [reduce using rule 9 (variable_definitions)]
FLOAT [reduce using rule 9 (variable_definitions)]
Here, INT and FLOAT are possible lookaheads. So the interpretation of the line INT [reduce using rule 9 (variable_definitions)] is "if the lookahead is INT, immediately reduce using production 9". Production 9 produces the empty sequence, so the reduction reduces zero tokens at the top of the parser stack into a variable_definitions. Reductions do not use the lookahead token, so after the reduction, the lookahead token is still INT.
However, the parser doesn't actually do that because it has a different action for INT, which is to shift it and go to state 1, as indicated by the first line, which starts with INT. The brackets [...] indicate that this action is not taken because it is a conflict and the resolution of the conflict was some other action. So the more accurate interpretation of that line is "if it weren't for the preceding action on INT, the lookahead INT would cause a reduction using rule 9."

Assign values to dynamic arrays

My bash script needs to read values from a properties file and assign them to a number of arrays. The number of arrays is controlled via configuration as well. My current code is as follows:
limit=$(sed '/^\#/d' $propertiesFile | grep 'limit' | tail -n 1 | cut -d "=" -f2- | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
for (( i = 1 ; i <= $limit ; i++ ))
do
#properties that define values to be assigned to the arrays are labeled myprop## (e.g. myprop01, myprop02):
lookupProperty=myprop$(printf "%.2d" "$i")
#the following line reads the value of the lookupProperty, which is a set of space-delimited strings, and assigns it to the myArray# (myArray1, myArray2, etc):
myArray$i=($(sed '/^\#/d' $propertiesFile | grep $lookupProperty | tail -n 1 | cut -d "=" -f2- | sed 's/^[[:space:]]*//;s/[[:space:]]*$//'))
done
When I attempt to execute the above code, the following error message is displayed:
syntax error near unexpected token `$(sed '/^\#/d' $propertiesFile | grep $lookupProperty | tail -n 1 | cut -d "=" -f2- | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')'
I am quite sure the issue is in the way I am declaring the "myArray$i" arrays. However, any different approach I tried produced either the same errors or incomplete results.
Any ideas/suggestions?
You are right that bash does not recognize the construct myArray$i=(some array values) as an array variable assignment. One work-around is:
read -a myArray$i <<<"a b c"
The read -a varname command reads an array from stdin, which is provided by the "here" string <<<"a b c", and assigns it to varname where varname can be constructs like myArray$i. So, in your case, the command might look like:
read -a myArray$i <<<"$(sed '/^\#/d' $propertiesFile | grep $lookupProperty | tail -n 1 | cut -d "=" -f2- | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')"
The above allows assignment. The next issue is how to read out variables like myArray$i. One solution is to name the variable indirectly like this:
var="myArray$i[2]" ; echo ${!var}
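Putting both halves together, here is a minimal self-contained sketch. The values array is an illustrative stand-in for what the question reads from the properties file, and the names elem and all are hypothetical. It assumes bash, since read -a, here strings and ${!name} are not available in plain sh.

```bash
#!/usr/bin/env bash
# Illustrative property values; in the question these come from the properties file.
values=("a b c" "d e")

for i in 1 2; do
    # read -a accepts a computed variable name, unlike myArray$i=(...)
    read -r -a "myArray$i" <<<"${values[i-1]}"
done

# Indirect expansion: build the reference as a string, then dereference it.
i=1
elem="myArray${i}[2]"   # a single element
all="myArray${i}[@]"    # every element of the array
third=${!elem}

count=0
for item in "${!all}"; do
    count=$((count + 1))
done

echo "third element of myArray$i: $third ($count items)"
```

The "[@]" form is useful when the whole array has to be looped over rather than a single index looked up.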
