perl6 interpolate array in match for AND, OR, NOT functions - arrays

I am trying to re-do my program for match-all, match-any, match-none of the items in an array. Some of the Perl 6 documentation doesn't explain the behavior of the current implementation (Rakudo 2018.04), and I have a few more questions.
(1) Documentation on regex says that interpolating array into match regex means "longest match"; however, this code does not seem to do so:
> my $a="123 ab 4567 cde";
123 ab 4567 cde
> my @b=<23 b cd 567>;
[23 b cd 567]
> say (||@b).WHAT
(Slip)
> say $a ~~ m/ @b /
「23」 # <=== I expected the match to be "567" (@b[3] matching $a) which is longer than "23";
(2) (||@b) is a Slip; how do I easily do OR or AND of all the elements in the array without explicitly looping through the array?
> say $a ~~ m:g/ @b /
(「23」 「b」 「567」 「cd」)
> say $a ~~ m:g/ ||@b /
(「23」 「b」 「567」 「cd」)
> say $a ~~ m/ ||@b /
「23」
> say $a ~~ m:g/ |@b /
(「23」 「b」 「567」 「cd」)
> say $a ~~ m:g/ &@b /
(「23」 「b」 「567」 「cd」)
> say $a ~~ m/ &@b /
「23」
> say $a ~~ m/ &&@b /
「23」 # <=== && and & don't do the AND function
(3) What I ended up doing is condensing my previous code into 2 lines:
my $choose = &any; # can prompt for choice of any, one, all, none here;
say so (gather { for @b -> $z { take $a ~~ m/ { say "==>$_ -->$z"; } <{$z}> /; } }).$choose;
The output is "True", as expected. But I am hoping for a simpler way, without the "gather-take" and "for" loop.
Thank you very much for any insights.
lisprog

I don't know any better solution than Moritz's for AND.
I cover OR below.
One natural way to write a NOT of a list of match tokens would be to use the negated versions of a lookahead or lookbehind assertion, e.g.:
my $a="123 ab 4567 cde";
my @b=<23 b cd 567>;
say $_>>.pos given $a ~~ m:g/ <!before @b> /;
displays:
(0 2 3 4 6 7 9 10 11 13 14 15)
These are the 12 positions in the string "123 ab 4567 cde" at which none of 23, b, cd, or 567 matches. The ^s below point to each of those character positions:
my $a="123 ab 4567 cde";
       ^ ^^^ ^^ ^^^ ^^^
       0123456789012345
I am trying to re-do my program for match-all, match-any, match-none of the items in an array.
This sounds junction-like, and some of the rest of your question is clearly about junctions. If you linked to your existing program, it might be easier for me and others to see what you're trying to do.
(1)
||@b matches the leftmost matching token in @b, not the longest one.
Write |@b, with a single |, to match the longest matching token in @b. Or, better yet, write just plain @b, which is shorthand for the same thing.
Both of these match patterns (|@b or ||@b), like any other match patterns, are subject to the way the regex engine works, as briefly described by Moritz and in more detail below.
When the regex engine matches a regex against an input string, it starts at the start of the regex and the start of the input string.
If it fails to match, it steps past the first character in the input string, giving up on that character, and instead pretends the input string began at its second character. Then it tries matching again, starting at the start of the regex but the second character of the input string. It repeats this until it either gets to the end of the string or finds a match.
Given your example, the engine fails to match right at the start of 123 ab 4567 cde but successfully matches 23 starting at the second character position. So it's then done -- and the 567 in your match pattern is irrelevant.
One way to get the answer you expected:
my $a="123 ab 4567 cde";
my @b=<23 b cd 567>;
my $longest-overall = '';
sub update-longest-overall ($latest) {
    if $latest.chars > $longest-overall.chars {
        $longest-overall = $latest
    }
}
$a ~~ m:g/ @b { update-longest-overall( $/ ) } /;
say $longest-overall;
displays:
「567」
The use of :g is explained below.
(2)
|@b or ||@b in mainline code mean something completely unrelated to what they mean inside a regex. As you can see, |@b is the same as @b.Slip. ||@b means @b.Slip.Slip which evaluates to @b.Slip.
To do a "parallel" longest-match-pattern-wins OR of the elements of @b, write @b (or |@b) inside a regex.
To do a "sequential" leftmost-match-pattern-wins OR of the elements of @b, write ||@b inside a regex.
I've so far been unable to figure out what & and && do when used to prefix an array in a regex. It looks to me like there are multiple bugs related to their use.
In some of the code in your question you've specified the :g adverb. This directs the engine to not stop when it finds a match but rather to step past the substring it just matched and begin trying to match again further along in the input string.
(There are other adverbs. The :ex adverb is the most extreme. In this case, when there's a match at a given position in the input string, the engine tries to match any other match pattern at the same position in the regex and input string. It keeps doing this no matter how many matches it accumulates until it has tried every last possible match at that position in the regex and input string. Only when it's exhausted all these possibilities does it move forward one character in the input string, and tries exhaustively matching all over again.)
(3)
My best shot:
my $a="123 ab 4567 cde";
my @b=<23 b cd 567>;
my &choose = &any;
say so choose do for @b -> $z {
    $a ~~ / { say "==>$a -->$z"; } $z /
}

(1) Documentation on regex says that interpolating array into match regex means "longest match"; however, this code does not seem to do so:
The actual rule is that a regex finds the left-most match first, and the longest match second.
However, the left-most rule is true for all regex matches, which is why the regex documentation doesn't explicitly mention it when talking about alternations.
(2) (||@b) is a Slip; how do I easily do OR or AND of all the elements in the array without explicitly looping through the array?
You can always construct a regex as text first:
my $re_text = join '&&', @branches;
my $regex = rx/ <$re_text> /;

Related

Why is Perl giving "Can't modify string in scalar output" error?

I'm pretty new to Perl and this is my most complex project yet. Apologies if any parts of my explanation don't make sense or I miss something out - I'll be happy to provide further clarification. It's only one line of code that's causing me an issue.
The Aim:
I have a text file that contains a single column of data. It reads like this:
0
a,a,b,a
b,b,b,a
1
a,b,b,a
b,b,b,a
It continues like this with a number in ascending order up to 15, and the following two lines after each number are a combination of four a's or b's separated by commas. I have tied this file to an array @diplo so I can specify specific lines of it.
I also have got a file that contains two columns of data with headers that I have converted into a hash of arrays (with each of the two columns being an array). The name of the hash is $lookup and the array names are the names of the headings. The actual arrays only start from the first value in each column that isn't a heading. This file looks like this:
haplo frequency
"|5,a,b,a,a|" 0.202493719
"|2,b,b,b,a|" 0.161139191
"|3,b,b,b,a|" 0.132602458
This file contains all of the possible combinations of a or b at the four positions combined with all numbers 0-14 and their associated frequencies. In other words, it includes all possible combinations from "|0,a,a,a,a|" followed by "|1,a,a,a,a|" through to "|13,b,b,b,b|" and "|14,b,b,b,b|".
I want my Perl code to go through each of the combinations of letters in @diplo starting with a,a,b,a and record the frequency associated with the row of the haplo array containing each number from 0-14, e.g. first recording the frequency associated with "|0,a,a,b,a|" then "|1,a,a,b,a|" etc.
The output would hopefully look like this:
0 #this is the number in the @diplo file and they increase in order from 0 up to 15
0.011 0.0023 0.003 0.0532 0.163 0.3421 0.128 0.0972 0.0869 0.05514 0.0219 0.0172 0.00824 0.00886 0.00196 #these are the frequencies associated with x,a,a,b,a where x is any number from 0 to 14.
My code:
And here is the Perl code I created to hopefully sort this out (there is more to create the arrays and such which I can post if required, but I didn't want to post a load of code if it isn't necessary):
my $irow = 1; #this is the row/element number in @diplo
my $lrow = 0; #this is the row/element in $lookup{'haplo'}
my $copynumber = 0;
#print "$copynumber, $diplo[2]";
while ($irow < $diplolines - 1) {
    while ($copynumber < 15) {
        while ($lrow < $uplines - 1) {
            if ("|$copynumber,$diplo[$irow]|" = $lookup{'haplo'}[$lrow]) { ##this is the only line that causes errors
                if ($copynumber == 0) {
                    print "$diplo[$irow-1]\n";
                    #print "$lookup{'frequency'}[$lrow]\t";
                }
                print "$lookup{'frequency'}[$lrow]\t";
            }
            $lrow = $lrow + 1;
        }
        $lrow = 0;
        $copynumber = $copynumber + 1;
    }
    $lrow = 0;
    $copynumber = 0;
    $irow = $irow + 1;
}
However, the line if ("|$copynumber,$diplo[$irow]|" = $lookup{'haplo'}[$lrow]) is causing an error Can't modify string in scalar assignment near "]) ".
I have tried adding in speech marks, rounded brackets and apostrophes around various elements in this line but I still get some sort of variant on this error. I'm not sure how to get around this error.
Apologies for the long question, any help would be appreciated.
EDIT: Thanks for the suggestions regarding eq, it gets rid of the error and I now know a bit more about Perl than I did. However, even though I don't get an error now, if I put anything inside the if loop for this line, e.g. printing a number, it doesn't get executed. If I put the same command within the while loop but outside of the if, it does get executed. I have strict and warnings on. Any ideas?
= is assignment, == is numerical comparison, eq is string comparison.
You can't modify a string:
$ perl -e 'use strict; use warnings; my $foo="def";
if ("abc$foo" = "abcdef") { print "match\n"; } '
Found = in conditional, should be == at -e line 1.
Can't modify string in scalar assignment at -e line 1, near ""abcdef") "
Execution of -e aborted due to compilation errors.
Nonnumerical strings act like zeroes in a numerical comparison:
$ perl -e 'use strict; use warnings; my $foo="def";
if ("abc$foo" == 0) { print "match\n"; } '
Argument "abcdef" isn't numeric in numeric eq (==) at -e line 1.
match
A string comparison is probably what you want:
$ perl -e 'use strict; use warnings; my $foo="def";
if ("abc$foo" eq "abcdef") { print "match\n"; } '
match
This is the problematic expression:
"|$copynumber,$diplo[$irow]|" = $lookup{'haplo'}[$lrow]
The equals sign (=) is an assignment operator. It assigns the value on its right-hand side to the variable on its left-hand side. Therefore, the left-hand operand needs to be a variable, not a string as you have here.
I don't think you want to do an assignment here at all. I think you're trying to check for equality. So don't use an assignment operator, use a comparison operator.
Perl has two equality comparison operators. == does a numeric comparison to see if its operands are equal, and eq does a string comparison. Why does Perl need two operators? Well, Perl converts automatically between strings and numbers, so it can't possibly know what kind of comparison you want to do. So you need to tell it.
What's the difference between the two types of comparison? Well, consider this code.
$x = '0';
$y = '0.0';
Are $x and $y equal? Well it depends on the kind of comparison you do. If you compare them as numbers then, yes, they are the same value (zero is the same thing whether it's an integer or a real number). But if you compare them as strings, they are different (they're not the same length for a start).
So we now know the following:
$x == $y # this is true as it's a numeric comparison
$x eq $y # this is false as it's a string comparison
So let's go back to your code:
"|$copynumber,$diplo[$irow]|" = $lookup{'haplo'}[$lrow]
I guess you started with == here.
"|$copynumber,$diplo[$irow]|" == $lookup{'haplo'}[$lrow]
But that's not right, as |$copynumber,$diplo[$irow]| is clearly a string, not a number. And Perl will give you a warning if you try to do a numeric comparison using a value that doesn't look like a number.
So you changed it to = and that doesn't work either as you've now changed it to an assignment.
What you really need is a string comparison:
"|$copynumber,$diplo[$irow]|" eq $lookup{'haplo'}[$lrow]

How do I select 3 characters in a string after a specific symbol

OK, so let me explain. I have strings like this: "BAHDGF - ZZZGH1237484" or
like this: "HDG54 - ZZZ1HDGET4". I want to select the triple Z (in reality these are 3 different characters, but I think the example is easier to follow this way).
My problem is this: the first part has a variable length, but I can just ignore it. I need something that takes the triple Z, so I was thinking about something that can "slice" my string after
the " - ".
I started by trying "partition", but I failed miserably. I got lost with the 3 new arrays and then taking the first 3 letters of one of them; it seems very complicated, and I think I'm missing an obvious solution. I've been on this for about 2 days without any ideas, a sort of blank page syndrome, and I really need a little help to get past this point.
Given:
examples=[ "BAHDGF - ZZZGH1237484", "HDG54 - ZZZ1HDGET4" ]
You could use a regex:
examples.each {|e| p e, e[/(?<=-\s)ZZZ/]}
Prints:
"BAHDGF - ZZZGH1237484"
"ZZZ"
"HDG54 - ZZZ1HDGET4"
"ZZZ"
Or .split with a regex:
examples.each {|e| p e.split(/-\s*(ZZZ)/)[1] }
'ZZZ'
'ZZZ'
If the 3 characters are something other than 'ZZZ' just modify your regex:
> "BAHDGF - ABCGH1237484".split(/\s*-\s*([A-Z]{3})/)[1]
=> "ABC"
If you wanted to use .partition it is two steps. Easiest with a regex partition and then just take the first three characters:
> "BAHDGF - ABCGH1237484".partition(/\s*-\s*/)[2][0..2]
=> "ABC"
"Selecting" the string "ZZZ" is a misnomer. What you have asked for is to determine if the string contains the substring "- ZZZ" and if it does, return "ZZZ":
"BAHDGF - ZZZGH1237484".include?("- ZZZ") && "ZZZ"
#=> "ZZZ"
"BAHDGF - ZZVGH1237484".include?("- ZZZ") && "ZZZ"
#=> false
That is very little different from just asking whether the string contains the substring "- ZZZ":
if "BAHDGF - ZZZGH1237484".include?("- ZZZ")
...
end
If the question were instead, say, return a string of three identical capital letters following "- ", if present, you would be selecting a substring. That could be done as follows.
r = /- \K(\p{Lu})\1{2}/
"BAHDGF - XXXGH1237484"[r]
#=> "XXX"
"BAHDGF - xxxGH1237484"[r]
#=> nil
The regular expression reads, "match '- ', then forget everything matched so far and reset the match pointer to the current location (\K), then match an upper case Unicode letter (\p{Lu}) and save it to capture group 1 ((\p{Lu})), then match the contents of capture group 1 (\1) twice ({2})". One may alternatively use a positive lookbehind:
/(?<=- )(\p{Lu})\1{2}/

How do I filter elements in an array that both match a pattern and don't match a second pattern?

I'm using Ruby 2.4. I have the following expression for only keeping elements in an array that match a certain pattern:
lines.grep(/^[[:space:]]*\d+/)
How do I write a Ruby expression that keeps elements in an array that both match a pattern and don't match a second pattern? That is, I want to keep elements that match the above, but then also exclude elements that match:
/^[[:space:]]*\d+[:.]/
If my array contains only the element " 123 23:25 ", the result should contain this original element because it starts with a number that doesn't contain a ":" or "." after it.
Use the following regex:
/^[[:space:]]*\d++(?![:.])/
See this regex demo.
Here, ^[[:space:]]* part is the same as in the first regex, \d++ will possessively match 1+ digits (thus, deactivating backtracking) and (?![:.]) will fail any match if those 1+ digits are followed with either : or ..
Details:
^ - start of a line
[[:space:]]* - (may be replaced with \s*) - 0+ whitespace characters
\d++ - 1+ digits matched possessively, so that backtracking into the pattern is not possible
(?![:.]) - a negative lookahead that fails the match if : or . is found immediately to the right of the current location (after 1+ digits).
To answer the general case of your question.
Try this
lines.select { |each| each =~ first_pattern && each !~ second_pattern }
Or if you use Ruby 2.3+
lines.grep(first_pattern).grep_v(second_pattern)
The name of grep_v is inspired by the grep -v command-line option.

how to use RANDOM command in declared array

I am trying to build a script where I need to create a password generator with the following parameters :
must be at least 8 characters long but not longer than 16 characters.
must contain at least 1 digit (0-9).
must contain at least 1 lowercase letter.
must contain at least 1 uppercase letter.
must contain exactly one and only one of @ # $ % & * + - =
I have two ideas :
The first :
#!/bin/bash
#
#Password Generator
#
#
Upper=('A''B''C''D''E''F''G''H''I''J'K''L''M''N''O''P''Q''R''S''T''U''V''W''X''Y''Z')
Lower=('a''b''c''d''e''z''f''g''h''i''j''k''l''m''o'''p''q''r''s''t''u''v''w''x''y''z')
Numbers=('1''2''3''4''5''6''7''8''9')
SpecialChar=('@''#''$''%''&''*''+''-''=')
if [ $# -eq 0 ] ; then
Pwlength=`shuf -i 8-16 -n 1`
Password=`< /dev/urandom tr -dc A-Za-z0-9$SpecialChar | head -c $Pwlength`
echo "Random Password is being generated for you"
sleep 5
echo "Your new password is : $Password"
exit
The problem is I get characters that I didn't even define?
The second idea :
for((i=0;i<4;i++))
do
password=${Upper[$random % ${#Lower[@]} ] }
password=${Upper[$random % ${#Upper[@]} ] }
password=${Upper[$random % ${#Number[@]} ] }
password=${Upper[$random % ${#SpecialChar[@]} ] }
done
For some reason none of them work ;/
In your first example, move the "-" character to the end of the SpecialChar. I think the definition as you had it results in allowing "+" to "=" (i.e., the value passed to tr reads as "+-="), which accounts for the characters you were not expecting. Alternatively, replace the "-" with "_".
So, try a definition like:
SpecialChar=('@''#''$''%''&''*''+''=''-')
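To see why the position of the dash matters, here is a small sketch comparing the two spellings of the set. It assumes a tr that follows the usual POSIX range rules; the sample string and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sample input covering the ASCII characters between '+' and '=' plus two beyond it.
sample='+,-./0129:;<=>?'

# '+-=' is read as the range '+' through '=' (ASCII 43..61), so commas, dots,
# slashes, digits, colons, semicolons and '<' all survive the filter.
mid=$(printf '%s' "$sample" | LC_ALL=C tr -dc '+-=')

# With the dash last, only the three literal characters survive.
last=$(printf '%s' "$sample" | LC_ALL=C tr -dc '+=-')

printf 'dash in the middle: %s\n' "$mid"
printf 'dash at the end:    %s\n' "$last"
```

The first line of output keeps everything up to and including '=', which is where the unexpected characters in the generated password come from.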
As already mentioned, it would be cleaner and easier to use a string directly to handle a list of characters. Handling special characters in an array may cause side effects (for instance, an unquoted '*' character expanding to the list of files in the current directory). In addition, arrays can be difficult to pass as function arguments.
ALPHA_LOW="abcdefghijklmnopqrstuvwxyz"
# Then simply access the char in the string at the ith position
CHAR=${ALPHA_LOW:$i:1}
You could generate upper cases from the lower cases.
ALPHA_UP="`echo \"$ALPHA_LOW\" | tr '[:lower:]' '[:upper:]'`"
The variable which contains a random number is $RANDOM (in capital letters).
sleep 5 is unnecessary.
You need to find a way to keep count of occurrences left for each character type. For more information, I wrote a complete script to do what you described here.
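As a rough sketch of one way to keep that count (this is not the script mentioned above): reserve one slot for each required class, fill the rest with letters and digits, and shuffle. The function and variable names are made up for illustration, bash is assumed, and $RANDOM is not suitable for security-sensitive passwords.

```shell
#!/usr/bin/env bash
lower='abcdefghijklmnopqrstuvwxyz'
upper='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
digits='0123456789'
special='@#$%&*+-='

# Store one random character from the string $1 in the global variable `picked`.
pick() {
    local set=$1
    picked=${set:RANDOM % ${#set}:1}
}

generate_password() {
    local length=$(( 8 + RANDOM % 9 ))   # 8..16 inclusive
    local alnum="$lower$upper$digits"
    local -a chars=()
    local class i j tmp

    # One character from each required class; the special set is used exactly once.
    for class in "$lower" "$upper" "$digits" "$special"; do
        pick "$class"
        chars+=( "$picked" )
    done

    # Fill the remaining positions with letters and digits only.
    while (( ${#chars[@]} < length )); do
        pick "$alnum"
        chars+=( "$picked" )
    done

    # Fisher-Yates shuffle so the required characters are not always at the front.
    for (( i = ${#chars[@]} - 1; i > 0; i-- )); do
        j=$(( RANDOM % (i + 1) ))
        tmp=${chars[i]}; chars[i]=${chars[j]}; chars[j]=$tmp
    done

    printf '%s' "${chars[@]}"
    printf '\n'
}

generate_password
```

Building the password from known classes avoids having to generate and then reject candidates that miss a requirement.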
Your first attempt has the following problems:
You are using arrays to contain single strings. Pasting 'A''B''C' is equivalent to 'ABC'. Simply converting these to scalar variables would probably be the simplest fix.
You are not quoting your variables.
You declare Upper, Lower, and Numbers, but then never use them.
Using variables for stuff you only ever use once (or less :-) is dubious anyway.
As noted by @KevinO, the dash has a special meaning in tr, and needs to be first or last if you want to match it literally.
On top of that, the sleep 5 doesn't seem to serve any useful purpose, and will begin to annoy you if it doesn't already. If you genuinely want a slower computer, I'm sure there are people here who are willing to trade.
Your second attempt has the following problems:
The special variable $RANDOM in Bash is spelled in all upper case.
You are trying to do modulo arithmetic on (unquoted!) arrays of characters, which isn't well-defined. The divisor after % needs to be a single integer. You can convert character codes to integers but it's sort of painful, and it's less than clear what you hope for the result to be.
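A minimal sketch of the two array points, assuming bash (the array names Glued and Split are made up for illustration): adjacent quoted strings form a single element, while space-separated words form separate elements that $RANDOM modulo the element count can index.

```shell
#!/usr/bin/env bash
Glued=('A''B''C')      # adjacent quoted strings are concatenated: one element, 'ABC'
Split=('A' 'B' 'C')    # separate words: three elements

echo "Glued: ${#Glued[@]} element(s), first is ${Glued[0]}"
echo "Split: ${#Split[@]} element(s), first is ${Split[0]}"

# ${#Split[@]} is the element count (an integer), so the modulo yields a valid index.
index=$(( RANDOM % ${#Split[@]} ))
echo "Random pick: ${Split[index]}"
```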
A quick attempt at fixing the first attempt would be
Password=$(< /dev/urandom tr -dc 'A-Za-z0-9@#$%&*+=-' | head -c $(shuf -i 8-16 -n 1))
If you want to verify some properties on the generated password, I still don't see why you would need arrays.
while read -r category expression; do
    case $Password in
        *[$expression]*) continue;;
        *) echo "No $category"; break;;
    esac
done <<'____HERE'
lowercase a-z
uppercase A-Z
numbers 0-9
specials -@#$%&*+=
____HERE

For loop to take the value of the whole array each time

Suppose I have 3 arrays, A, B and C
I want to do the following:
A=("1" "2")
B=("3" "4")
C=("5" "6")
for i in $A $B $C; do
echo ${i[0]} ${i[1]}
#process data etc
done
So, basically i takes the value of the whole array each time and I am able to access the specific data stored in each array.
On the 1st loop, i should take the value of the 1st array, A, on the 2nd loop the value of array B etc.
The above code just iterates with i taking the value of the first element of each array, which clearly isn't what I want to achieve.
So the code only outputs 1, 3 and 5.
You can do this in a fully safe and supportable way, but only in bash 4.3 or later, which adds nameref support (declare -n), a feature ported over from ksh:
for array_name in A B C; do
    declare -n current_array=$array_name
    echo "${current_array[0]}" "${current_array[1]}"
done
That said, there's hackery available elsewhere. For instance, you can use eval (allowing a malicious variable name to execute arbitrary code, but otherwise safe):
for array_name in A B C; do
    eval 'current_array=( "${'"$array_name"'[@]}" )'
    echo "${current_array[0]}" "${current_array[1]}"
done
If the elements of the arrays don't contain spaces or wildcard characters, as in your question, you can do:
for i in "${A[*]}" "${B[*]}" "${C[*]}"
do
    iarray=($i)
    echo ${iarray[0]} ${iarray[1]}
    # process data etc
done
"${A[*]}" expands to a single string containing all the elements of A, joined by spaces (the first character of IFS by default). Then iarray=($i) splits this on whitespace, turning the string back into an array.

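The join-and-split round trip described above can be sketched as follows, assuming bash with the default IFS. The array C here deliberately contains an element with spaces (a made-up value) to show where the approach breaks down.

```shell
#!/usr/bin/env bash
A=("1" "2")
C=("three words here" "6")

# "${A[*]}" joins the elements with the first character of IFS (a space by default).
joined_a="${A[*]}"
echo "joined: $joined_a"

# Unquoted expansion splits on whitespace, restoring the original two elements...
back_a=($joined_a)
echo "A round trip: ${#back_a[@]} elements"

# ...but an element that itself contains spaces comes back as several pieces.
joined_c="${C[*]}"
back_c=($joined_c)
echo "C round trip: ${#back_c[@]} elements"
```

This is why the approach is only suggested for elements without spaces or wildcard characters.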