Simplify long if-else chain with regex by comparing with array - arrays

for (my $j = 0; $j <100000; $j++){
my $outcome = rand();
for (my $k = 0; $k < #cum_sum; $k++){
if ($cum_sum[$k] >= $outcome){
if ($keys[$k] =~ m/\"|\/|\<|\>|\\|\`|\~|\#|\#|\$|\%|\^|\*|[0-9]/) {
print $out "";
}
if ($keys[$k] =~ m/\s/){
print $out " ";
}
elsif ($keys[$k] =~ m/\&/){
print $out " and ";
}
elsif ($keys[$k] =~ m/\!/){
print $out "! ";
}
elsif ($keys[$k] =~ m/\:/){
print $out ": ";
}
elsif ($keys[$k] =~ m/\'/){
print $out "' ";
}
elsif ($keys[$k] =~ m/\./){
print $out ". ";
}
elsif ($keys[$k] =~ m/\;/){
print $out "; ";
}
elsif ($keys[$k] =~ m/\?/){
print $out "? ";
}
elsif ($keys[$k] =~ m/\,/){
print $out ", ";
}
else {
print $out "$keys[$k]";
}
last;
}
}
# print "$outcome\n";
}
I mostly need help with simplifying the long chain of elsif statements I have.
The logic in the outer for loops works.
#keys is an array of two character (digrams) strings.
I am trying to make the elsif statements more efficient by comparing the digrams from #key to an array of the punctuation marks #punctuation = qw(! : " ' ; ? , .)
Then, if the digram does contain one of the punctuation marks in the punctuation array, the digram gets changed to "punctuation_mark " e.g. "a!" -> "! "
The end result would be that I do not use regexes for any of #punctuation.
However, I am not sure on how to implement this change.
Thank you!

In order to simplify your code, it may have sense to use the Tie::RegexpHash CPAN module. The main idea is build a hash with regexpr as keys, so you easily find the related values by matching:
use Tie::RegexpHash;
my $rehash = Tie::RegexpHash->new();
$rehash->add( qr/\s/, " " );
$rehash->add( qr/\&/, " and " );
#...
my $value = $rehash->match( "&" ); # $value <-- " and "

Use alternation, just like in the first regex, but capture the match and use $1 to replace the word
my $re_punct = join '|', map { quotemeta } qw(& ! : ' . ; ? ,); #'
for my $j (0..99_999) {
my $outcome = rand();
for my $k (0..$#cum_sum) {
...
if ($keys[$k] =~ /($re_punct)/) {
if ($1 eq '&') { $keys[$k] = " and " }
else { $keys[$k] = "$1 " }
}
...
}
}
Comments
quotemeta escapes by \ all "ASCII non-"word" characters"
syntax: $#ary is the index of the last element in #ary, just right for looping over array index
for my $i ($beg .. $end) is much clearer than the equivalent C-style for loop†
The presented logic leaves a question: what if both characters are punctuation?
Note on your idea to do it "By Comparing With Array"
You could use List::MoreUtils::first_value, for instance. It would go like
use List::MoreUtils qw(first_value);
my #punc = map { quotemeta } qw(& ! : ' . ; ? ,); #'
foreach my $word (#words) {
if (my $match = first_value { $word =~ /$_/ } #punct) {
$word = $match;
}
}
The first_value (or firstval) returns the first element of #punct for which the block returns true, and undef if none do. The $word aliases the currently processed element of #words and changing it changes the array element; so you get your replacement.
However, you still have to deal with regex and escape (at least some of) punctuation. Thus I see no advantage in going to this trouble; the "straight-up" regex is far clearer in this case.
† Even compiled languages evolve this way. The C++11 introduced the range-based for loop
for (auto var: container) ... // (really, const auto&), or auto&, or auto&&
and a standard reference linked above says
Used as a more readable equivalent to the traditional for loop [...]
In Perl this is how things are done; just use it.

So, looking at it you have two cases:
One where you replace a set of values, with 'the value and a space'. And another where you replace with something different.
So how about creating a lookup table for each, and processing just two regexes:
#!/usr/bin/env perl.
use strict;
use warnings;
use Data::Dumper;
my %replace = (
'"' => "",
'/' => "",
'&' => " and ",
);
my #add_space = ( ',', '?', ';', '.', "'", ':', '!' );
my $search = join ( "|", map { quotemeta } keys %replace );
$search = qr/($search)/;
my $add_space_after = join "|", map {quotemeta} #add_space;
$add_space_after = qr/($add_space_after)/;
while ( <DATA> ) {
s/\s/ /g;
s/$search/$replace{$1}/;
s/$add_space_after/$1 /;
print;
}
__DATA__
Work:Work
cookies&milk;wordhere
why?are;you/so "sad"
This gives us what you want I think, and - hopefully - keeps the code pretty conscise.
Of importance is the quotemeta function here, because it ensures your metachars are escaped before regexing them.
Note - the only one of your examples this doesn't handle is the \s to " ". But that's IMO probably best to write separately for clarity, as obviously youc can't quotemeta it.

Related

loop through elements of array to find character perl

I have a perl array where I only want to loop through elements 2-8.
The elements are only meant to contain numbers, so if any of those elements contain a letter, I want to set an error flag = 1, as well as some other variables as seen.
The reason I have 2 error flag variables is due to scope rules within the loop.
fields is an array, I created by splitting another irrelevant array by the " " key.
So, when I try to print error_line2, error_fname2 from outside the loop, I get this:
Use of uninitialized value $error_flag2 in numeric eq (==)
I don't know why, because I've initialized the value within the loop and created the variable outside the loop.
Not sure if I'm even looping to find characters correctly, so then it's not setting the error_flag2 = 1.
Example line:
bob hankerman 2039 3232 23 232 645 64x3 324
since element 7 has the letter 'x' , I want the flag to be set to 1.
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
my $players_file = $ARGV[0];
my #players_array;
open (my $file, "<", "$players_file")
or die "Failed to open file: $!\n";
while(<$file>) {
chomp;
push #players_array, $_;
}
close $file;
#print join "\n", #players_array;
my $num_of_players = #players_array;
my $error_flag;
my $error_line;
my $error_fname;
my $error_lname;
my $error_flag2=1;
my $error_line2;
my $error_fname2;
my $error_lname2;
my $i;
foreach my $player(#players_array){
my #fields = split " ", $player;
my $size2 = #fields;
for($i=2; $i<9; $i++){
print "$fields[$i] \n";
if (grep $_ =~ /^[a-zA-Z]+$/){
my $errorflag2 = 1;
$error_flag2 = $errorflag2;
my $errorline2 = $player +1;
$error_line2 = $errorline2;
my $errorfname2 = $fields[0];
$error_fname2 = $errorfname2;
}
}
if ($size2 == "9" ) {
my $firstname = $fields[0];
my $lastname = $fields[1];
my $batting_average = ($fields[4]+$fields[5]+$fields[6]+$fields[7]) / $fields[3];
my $slugging = ($fields[4]+($fields[5]*2)+($fields[6]*3)+($fields[7]*4)) / $fields[3];
my $on_base_percent = ($fields[4]+$fields[5]+$fields[6]+$fields[7] +$fields[8]) / $fields[2];
print "$firstname ";
print "$lastname ";
print "$batting_average ";
print "$slugging ";
print "$on_base_percent\n ";
}
else {
my $errorflag = 1;
$error_flag = $errorflag;
my $errorline = $player +1;
$error_line = $errorline;
my $errorfname = $fields[0];
$error_fname = $errorfname;
my $errorlname = $fields[1];
$error_lname = $errorlname;
}
}
if ($error_flag == "1"){
print "\n Line $error_line : ";
print "$error_fname, ";
print "$error_lname :";
print "Line contains not enough data.\n";
}
if ($error_flag2 == "1"){
print "\n Line $error_line2 : ";
print "$error_fname2, ";
print "Line contains bad data.\n";
}
OK, so the problem you've got here is that you're thinking of grep in Unix terms - a text based thing. It doesn't work like that in perl - it operates on a list.
Fortunately, this is pretty easy to handle in your case, because you can split your line into words.
Without your source data, this is hopefully a proof of concept:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
while ( <DATA> ) {
#split the current line on whitespace into an array.
#first two elements get assigned to firstname lastname, and then the rest
#goes into #values
my ( $firstname, $lastname, #values ) = split; #works on $_ implicitly.
#check every element in #values, and test the regex 'non-digit' against it.
my #errors = grep { /\D/ } #values;
#output any matches e.g. things that contained 'non-digits' anywhere.
print Dumper \#errors;
#an array in a scalar context evaluates as the number of elements.
#we need to use "scalar" here because print accepts list arguments.
print "There were ", scalar #errors, " errors\n";
}
__DATA__
bob hankerman 2039 3232 23 232 645 64x3 324
Or reducing down the logic:
#!/usr/bin/perl
use strict;
use warnings;
while ( <DATA> ) {
#note - we don't need to explicity specify 'scalar' here,
#because assigning it to a scalar does that automatically.
#(split) splits the current line, and [2..8] skips the first two.
my $count_of_errors = grep { /\D/ } (split)[2..8];
print $count_of_errors;
}
__DATA__
bob hankerman 2039 3232 23 232 645 64x3 324
First : You don't need to use "GREP", Simply you can match the string with "=~" in perl and you can print matched value with $&.
Second : You should use $_ if and only if there is not other variable used in the loop. There is already $i used in the loop, you can write the loop as :
for my $i (2..9) {
print "$i\n";
}
or
foreach(2..9) {
print "$_\n";
}

Perl: Inserting values into specific columns of CSV file

I have CSV data of the form:
S.No,Label,Customer1,Customer2,Customer3...
1,label1,Y,N,Y
2,label2,N,Y,N
...
I need to reproduce the "label" to the left of "customer" columns marked with Y - and have nothing ("") to the left of columns marked with N.
Expected output:
S.No,Label,Customer1,Customer1,Customer2,Customer2,Customer3,Customer3...
1,label1,label1,Y,"",N,label1,Y
2,label2,"",N,label2,Y,"",N
When opened using Excel, it would look like this:
S.No Label Customer1 Customer1 Customer2 Customer2 Customer3 Customer3...
1 label1 label1 Y N label1 Y
2 label2 N label2 Y N
The two leftmost columns, referring to S.No and the original "Label" column, are constant.
What is the simplest way to do this? I tried the following code:
use strict;
use warnings;
my $nonIncludesFile = "nonIncludes.csv";
open(my $xfh, "+>", $nonIncludesFile) or warn "Unable to open $nonIncludesFile, $!";
chomp( my $header = <$xfh> );
my #names = split ",", $header;
my #names1;
my #fields;
my #fields1;
for(my $j=0; $j< scalar(#names); $j++)
{
$names1[$j] = $names[$j];
}
while(<$xfh>)
{
my $nonIncLine = $_;
$nonIncLine = chomp($nonIncLine);
#fields = split ",", $nonIncLine;
next if $. == 1; #skip the first line
for(my $i = 0; $i < scalar(#fields) -2; $i++) #Number of "customers" = scalar(#fields) -2
{
$fields1[0] = $fields[0];
$fields1[1] = $fields[1];
if('Y' eq $fields[ $i + 2 ])
{
$fields1[$i+2] = 'Y';
substr(#fields1, $i + 1, 0, $fields[1]); #insert the label to the left - HERE
}
else
{
$fields1[$i+2] = 'N';
substr(#fields1, $i + 1, 0, "");
}
}
}
print $xfh #names1;
print $xfh #fields1;
close($xfh);
This however complains of "substr outside of string" at the line marked by "HERE".
What am I doing wrong? And is there any simpler (and better) way to do this?
Something like this maybe?
#!/usr/bin/perl
use strict;
use warnings;
#read the header row
chomp( my ( $sn, $label, #customers ) = split( /,/, <DATA> ) );
#double the 'customers' column headings (one is suffixed "_label")
print join( ",", $sn, $label, map { $_ . "_label", $_ } #customers ), "\n";
#iterate data
while (<DATA>) {
#strip trailing linefeed
chomp;
#extract fields with split - note breaks if you've quoted commas inline.
my ( $sn, $label, #row ) = split /,/;
print "$sn,$label,";
#iterate Y/N values, and either prints "Y" + label, or anything else + blank.
foreach my $value (#row) {
print join( ",", $value eq "Y" ? $label : "", $value ),",";
}
print "\n";
}
__DATA__
S.No,Label,Customer1,Customer2,Customer3
1,label1,Y,N,Y
2,label2,N,Y,N
Assumes you don't have any fruity special characters (e.g. commas) in the fields, because it'll break if you do, and you might want to consider Text::CSV instead.
It is always much better to post some usable test data than write a something like this question
However, it looks like your data has no quoted fields or escaped characters, so it looks like you can just use split and join to process the CSV data
Here's a sample Perl program that fulfils your requirement. The example output uses your data as it is. Each line of data has to be processed backwards so that the insertions don't affect the indices of elements that are yet to be processed
use strict;
use warnings 'all';
use feature 'say';
while ( <DATA> ) {
chomp;
my #fields = split /,/;
for ( my $i = $#fields; $i > 1; --$i ) {
my $newval =
$. == 1 ? $fields[$i] :
lc $fields[$i] eq 'y' ? $fields[1] :
'';
splice #fields, $i, 0, $newval;
}
say join ',', #fields;
}
__DATA__
S.No,Label,Customer1,Customer2,Customer3...
1,label1,Y,N,Y
2,label2,N,Y,N
output
S.No,Label,Customer1,Customer1,Customer2,Customer2,Customer3...,Customer3...
1,label1,label1,Y,,N,label1,Y
2,label2,,N,label2,Y,,N

Compare Arrays and Delete Arrays

I have below three sets ( arrays ) I need to perform an operation like this ( (A-B)UC ) on.
Can someone have the logic of this in Perl?
Here is my code I can able check for is B subset of A or not but I could not able to do "A-B":
my #array = (MAJOR,MINOR,MM,DD,YY);
my #exclude = (MM,MINOR,YY);
my #include = (LICENSE,VALID);
foreach (#exclude) {
if ( $_ ~~ #array ) {
print "\n $_ is defined in variables and it will be excluded \n";
#array = grep {!/\$_/} #array;
print "#array \n";
}
else {
print "\n $_ is not defined under variables please check the files \n";
exit 100;
}
}
foreach (#array){
print "$_ \n";
}
I suspect something is wrong in my logic with grep operation i.e. delete operation.
One problem with the grep is that $_ in the outer loop is redefined inside the grep block to each element of #array. You need to have different names. Also, your regex was lacking anchors; however, instead of a regex, just use string inequality. Try this:
my #array = qw(MAJOR MINOR MM DD YY);
my #exclude = qw(MM MINOR YY);
my #include = qw(LICENSE VALID);
foreach my $e (#exclude) {
if ( $e ~~ #array ) {
print "\n $e is defined in variables and it will be excluded \n";
#array = grep {$e ne $_} #array;
print "#array \n";
} else {
print "\n $e is not defined under variables please check the files \n";
exit 100;
}
}
use strict and warnings to alert you to many pitfalls in perl.
A hash is the most natural way to represent a set in perl.
use strict;
use warnings;
my #array = ('MAJOR','MINOR','MM','DD','YY');
my #exclude = ('MM','MINOR','YY');
my #include = ('LICENSE','VALID');
my %set;
# add #array to set
#set{#array} = ();
# remove #exclude
delete #set{#exclude};
# add #include
#set{#include} = ();
# array of elements resulting
my #result = sort keys %set;
You could use a set to do those kind of operations. I used a non-standard module Set::Scalar to help me with it:
#!/usr/bin/env perl
use warnings;
use strict;
use Set::Scalar;
my #array = qw(MAJOR MINOR MM DD YY);
my #exclude = qw(MM MINOR YY);
my #include = qw(LICENSE VALID);
my $array_set = Set::Scalar->new(#array);
my $exclude_set = Set::Scalar->new(#exclude);
my $include_set = Set::Scalar->new(#include);
my $result = $array_set->difference($exclude_set)->union($include_set);
use Data::Dumper;
print Dumper #$result;
Run it like:
perl script.pl
That yields:
$VAR1 = 'VALID';
$VAR2 = 'MAJOR';
$VAR3 = 'DD';
$VAR4 = 'LICENSE';

Dynamically punctuating and grammatically correcting a string via a Foreach loop in PowerShell

I have a simple section in a PowerShell script that goes through each array in a list and grabs the data (found at [3] of the current array), using it to determine if another part of the array (found at [0]) should be added to the end of a string.
$String = "There is"
$Objects | Foreach-Object{
if ($_[3] -match "YES")
{$String += ", a " + $_[0]}
}
This works fine and dandy, resulting in a $String of something like
"There is, a car, a airplane, a truck"
But unfortunately this doesn't really make sense grammatically for what I want. I'm aware that I could either fix the string after it's been created, or include lines in the foreach/if statement that determine which characters to add. This would need to be:
$String += " a " + $_[0] - for the first match.
$String += ", a " + $_[0] - for following matches.
$String += " and a " + $_[0] + " here." - for the last match.
Furthermore I need to determine whether to use " a " if $_[0] starts with a consonant, or " an " if $_[0] starts with a vowel. All in all, I'd like the output to be
"There is a car, an airplane and a truck here."
Thanks!
Try something like this:
$vehicles = $Objects | ? { $_[3] -match 'yes' } | % { $_[0] }
$String = 'There is'
for ($i = 0; $i -lt $vehicles.Length; $i++) {
switch ($i) {
0 { $String += ' a' }
($vehicles.Length-1) { $String += ' and a' }
default { $String += ', a' }
}
if ($vehicles[$i] -match '^[aeiou]') { $String += 'n' }
$String += ' ' + $vehicles[$i]
}
$String += ' here.'

How do I use an array of values to limit a for loop in perl?

The question is rather vague I know, but I hope the space to explain may help shed light, this is something I've wracked my brain around all day and couldn't find any advice through searching.
Basically I have an array #cluster that I'm trying to use to make an iterator $x skip over the values located in that array. The array will vary in size, so I can't just (rather atrociously) make if statements to fit all cases unfortunately.
Normally when I need to do this with a scalar value I just do:
for my $x (0 .. $numLines){
if($x != $value){
...
}
}
Any advice?
You can do:
my #cluster = (1,3,4,7);
outer: for my $x (0 .. 10){
$x eq $_ and next outer for #cluster;
print $x, "\n";
}
With Perl 5.10 you can also do:
for my $x (0 .. 10){
next if $x ~~ #cluster;
print $x, "\n";
}
or better to use a hash:
my #cluster = (1,3,4,7);
my %cluster = map {$_, 1} #cluster;
for my $x (0 .. 10){
next if $cluster{$x};
print $x, "\n";
}
Hmm... If you are skipping over lines, why not use that criteria directly instead of remembering the lines that need to be filtered out?
The grep function is a powerful construct for filtering lists:
my #array = 1 .. 10;
print "$_\n" for grep { not /^[1347]$/ } #array; # 2,5,6,8,9,10
print "$_\n" for grep { $_ % 2 } #array; # 1,3,5,7,9
my #text = qw( the cat sat on the mat );
print "$_\n" for grep { ! /at/ } #text; # the, on, the
Much less clutter, and much more DWIM!
Dou you mean something like that:
for my $x (0 .. $numLines){
my $is_not_in_claster = 1;
for( #claster ){
if( $x == $_ ){
$is_not_in_claster = 0;
last;
}
}
if( $is_not_in_claster ){
...
}
}
?

Resources