Perl: grep from multiple arrays at once - arrays

I have multiple arrays (~32). I want to remove all blank elements from them. How can it be done in a short way (may be via one foreach loop or 1-2 command lines)?
I tried the below, but it's not working:
my #refreshArrayList=("#list1", "#list2", "#list3","#list4", "#list5", "#list6" , "#list7");
foreach my $i (#refreshArrayList) {
$i = grep (!/^\s*$/, $i);
}
Let's say, #list1 = ("abc","def","","ghi"); #list2 = ("qwe","","rty","uy", "iop"), and similarly for other arrays. Now, I want to remove all blank elements from all the arrays.
Desired Output shall be: #list1 = ("abc","def","ghi"); #list2 = ("qwe","rty","uy", "iop") ### All blank elements are removed from all the arrays.
How can it be done?

You can create a list of list references and then iterator over these, like
for my $list (\#list1, \#list2, \#list3) {
#$list = grep (!/^\s*$/, #$list);
}
Of course, you could create this list of list references also dynamically, i.e.
my #list_of_lists;
push #list_of_lists, \#list1;
push #list_of_lists, \#list2;
...
for my $list (#list_of_lists) {
#$list = grep (!/^\s*$/, #$list);
}

#$_ = grep /\S/, #$_ for #AoA; # #AoA = (\#ary1, \#ary2, ...)
Explanation
First, this uses the statement modifier, "inverting" the usual for loop syntax into the form STMT for LIST
The for(each) modifier is an iterator: it executes the statement once for each item in the LIST (with $_ aliased to each item in turn).
It is mostly equivalent to a "normal" for loop, with the notable difference being that no scope is set and so there is no need to tear it down either, adding a small measure of efficiency.† Ae can have only one statement; but then again, that can be a do block. Having no scope means that we cannot declare lexical variables for the statement (unless a do block is used).
So the statement is #$_ = grep /\S/, #$_, executed for each element of the list.
In a for(each) loop, the variable that is set to each element in turn as the list is iterated over ("topicalizer") is an alias to those elements. So changing it changes elements. From perlsyn
If VAR is omitted, $_ is set to each value.
If any element of LIST is an lvalue, you can modify it by modifying VAR inside the loop.
In our case $_ is always an array reference, and then the underlying array is rewritten by dereferencing it (#$_) and assigning to that the output list of grep, which consists only of elements that have at least one non-space character (/\S/).
† I ran a three-way benchmark, of the statement-modifier loop against a "normal" loop with and without a topical variable.
For adding 100e6 numbers I get 8-11% speedup (on both a desktop and a server) and with a more involved calculation ($r = ($r + $_ ) / sqrt($_)) it's 4-5%.
A side observation: In both cases the full for loop without a variable (using default $_ for the topicalizer) is 1-2% faster than the one with a lexical topical variable set.

Related

How do I copy list elements to hash keys in perl?

I have found a couple of ways to copy the elements of a list to the keys of a hash, but could somebody please explain how this works?
#!/usr/bin/perl
use v5.34.0;
my #arry = qw( ray bill lois shirly missy hank );
my %hash;
$hash{$_}++ for #arry; # What is happening here?
foreach (keys %hash) {
say "$_ => " . $hash{$_};
}
The output is what I expected. I don't know how the assignment is being made.
hank => 1
shirly => 1
missy => 1
bill => 1
lois => 1
ray => 1
$hash{$_}++ for #array;
Can also be written
for (#array) {
$hash{$_}++;
}
Or more explicitly
for my $key (#array) {
$hash{$key}++;
}
$_ is "the default input and pattern-searching space"-variable. Often in Perl functions, you can leave out naming an explicit variable to use, and it will default to using $_. for is an example of that. You can also write an explicit variable name, that might feel more informative for your code:
for my $word (#words)
Or idiomatically:
for my $key (keys %hash) # using $key variable name for hash keys
You should also be aware that for and foreach are exactly identical in Perl. They are aliases for the same function. Hence, I always use for because it is shorter.
The second part of the code is the assignment, using the auto-increment operator ++
It is appended to a variable on the LHS and increments its value by 1. E.g.
$_++ means $_ = $_ + 1
$hash{$_}++ means $hash{$_} = $hash{$_} + 1
...etc
It also has a certain Perl magic included, which you can read more about in the documentation. In this case, it means that it can increment even undefined variables without issuing a warning about it. This is ideal when it comes to initializing hash keys, which do not exist beforehand.
Your code will initialize a hash key for each word in your #arry list, and also count the occurrences of each word. Which happens to be 1 in this case. This is relevant to point out, because since hash keys are unique, your array list may be bigger than the list of keys in the hash, since some keys would overwrite each other.
my #words = qw(foo bar bar baaz);
my %hash1;
for my $key (#words) {
$hash{$key} = 1; # initialize each word
}
# %hash1 = ( foo => 1, bar => 1, baaz => 1 );
# note -^^
my %hash2; # new hash
for my $key (#words) {
$hash{$key}++; # use auto-increment: words are counted
}
# %hash2 = ( foo => 1, bar => 2, baaz => 1);
# note -^^
Here is another one
my %hash = map { $_ => 1 } #ary;
Explanation: map takes an element of the input array at a time and for each prepapres a list, here of two -- the element itself ($_, also quoted because of =>) and a 1. Such a list of pairs then populates a hash, as a list of an even length can be assigned to a hash, whereby each two successive elements form a key-value pair.
Note: This does not account for possibly multiple occurences of same elements in the array but only builds an existance-check structure (whether an element is in the array or not).
$hash{$_}++ for #arry; # What is happening here?
It is iterating over the array, and for each element, it's assigning it as a key to the hash, and incrementing the value of that key by one. You could also write it like this:
my %hash;
my #array = (1, 2, 2, 3);
for my $element (#array) {
$hash{$element}++;
}
The result would be:
$VAR1 = {
'2' => 2,
'1' => 1,
'3' => 1
};
$hash{$_}++ for #arry; # What is happening here?
Read perlsyn, specifically simple statements and statement modifiers:
Simple Statements
The only kind of simple statement is an expression evaluated for its side-effects. Every simple statement must be terminated with a semicolon, unless it is the final statement in a block, in which case the semicolon is optional. But put the semicolon in anyway if the block takes up more than one line, because you may eventually add another line. Note that there are operators like eval {}, sub {}, and do {} that look like compound statements, but aren't--they're just TERMs in an expression--and thus need an explicit termination when used as the last item in a statement.
Statement Modifiers
Any simple statement may optionally be followed by a SINGLE modifier, just before the terminating semicolon (or block ending). The possible modifiers are:
if EXPR
unless EXPR
while EXPR
until EXPR
for LIST
foreach LIST
when EXPR
[...]
The for(each) modifier is an iterator: it executes the statement once for each item in the LIST (with $_ aliased to each item in turn). There is no syntax to specify a C-style for loop or a lexically scoped iteration variable in this form.
print "Hello $_!\n" for qw(world Dolly nurse);

Powershell - print the list length respectively

I want to write two things in Powershell.
For example;
We have a one list:
$a=#('ab','bc','cd','dc')
I want to write:
1 >> ab
2 >> bc
3 >> cd
4 >> dc
I want this to be dynamic based on the length of the list.
Thanks for helping.
Use a for loop so you can keep track of the index:
for( $i = 0; $i -lt $a.Count; $i++ ){
"$($i + 1) >> $($a[$i])"
}
To explain how this works:
The for loop is defined with three sections, separated by a semi-colon ;.
The first section declares variables, in this case we define $i = 0. This will be our index reference.
The second section is the condition for the loop to continue. As long as $i is less than $a.Count, the loop will continue. We don't want to go past the length of the list or you will get undesired behavior.
The third section is what happens at the end of each iteration of the loop. In this case we want to increase our counter $i by 1 each time ($i++ is shorthand for "increment $i by 1")
There is more nuance to this notation than I've included but it has no bearing on how the loop works. You can read more here on Unary Operators.
For the loop body itself, I'll explain the string
Returning an object without assigning to a variable, such as this string, is effectively the same thing as using Write-Output.
In most cases, Write-Output is actually optional (and often is not what you want for displaying text on the screen). My answer here goes into more detail about the different Write- cmdlets, output streams, and redirection.
$() is the sub-expression operator, and is used to return expressions for use within a parent expression. In this case we return the result of $i + 1 which gets inserted into the final string.
It is unique in that it can be used directly within strings unlike the similar-but-distinct array sub-expression operator and grouping operator.
Without the subexpression operator, you would get something like 0 + 1 as it will insert the value of $i but will render the + 1 literally.
After the >> we use another sub-expression to insert the value of the $ith index of $a into the string.
While simple variable expansion would insert the .ToString() value of array $a into the final string, referencing the index of the array must be done within a sub-expression or the [] will get rendered literally.
Your solution using a foreach and doing $a.IndexOf($number) within the loop does work, but while $a.IndexOf($number) works to get the current index, .IndexOf(object) works by iterating over the array until it finds the matching object reference, then returns the index. For large arrays this will take longer and longer with each iteration. The for loop does not have this restriction.
Consider the following example with a much larger array:
# Array of numbers 1 through 65535
$a = 1..65535
# Use the for loop to output "Iteration INDEXVALUE"
# Runs in 106 ms on my system
Measure-Command { for( $i = 0; $i -lt $a.Count; $i++ ) { "Iteration $($a[$i])" } }
# Use a foreach loop to do the same but obtain the index with .IndexOf(object)
# Runs in 6720 ms on my system
Measure-Command { foreach( $i in $a ){ "Iteration $($a.IndexOf($i))" } }
Another thing to watch out for is that while you can change properties and execute methods on collection elements, you can't change the element values of a non-collection collection (any collection not in the System.Concurrent.Collections namespace) when its enumerator is in use. While invisible, foreach (and relatedly ForEach-Object) implicitly invoke the collection's .GetEnumerator() method for the loop. This won't throw an error like in other .NET languages, but IMO it should. It will appear to accept a new value for the collection but once you exit the loop the value remains unchanged.
This isn't to say the foreach loop should never be used or that you did anything "wrong", but I feel these nuances should be made known before you do find yourself in a situation where a better construct would be appropriate.
Okey,
I fixed that;
$a=#('ab','bc','cd','dc')
$a.Length
foreach ($number in $a) {
$numberofIIS = $a.IndexOf($number)
Write-Host ($numberofIIS,">>>",$number)
}
Bender's answer is great, but I personally avoid for loops if at all possible. They usually require some awkward indexing into arrays and that ugly setup... The whole thing just ends up looking like hieroglyphics.
With a foreach loop it's our job to keep track of the index (which is where this answer differs from yours) but I think in the end it is more readable then a for loop.
$a = #('ab', 'bc', 'cd', 'dc')
# Pipe the items of our array to ForEach-Object
# We use the -Begin block to initialize our index variable ($x)
$a | ForEach-Object -Begin { $x = 1 } -Process {
# Output the expression
"$x" + ' >> ' + $_
# Increment $x for next loop
$x++
}
# -----------------------------------------------------------
# You can also do this with a foreach statement
# We just have to intialize our index variable
# beforehand
$x = 1
foreach ($number in $a){
# Output the expression
"$x >> $number"
# Increment $x for next loop
$x++
}

Perl - grep in foreach does not work with $_ but was with variable

I want to delete some defined elements from a array. I already have a solution with grep and one element: #big = grep { ! /3/ } #big. In case of several elements I want to put them in an array and using foreach. #big is the array from which I want delete elements from #del:
perl -e "#big = (1,2,3,4,5); #del = (2,4);
foreach $i (#del) {#big = grep { ! /$i/ } #big; print \"#big\n\"}"
Here is the output:
1 3 4 5
1 3 5
This works fine for me. If I want to use default variable $_ it does not work:
perl -e "#big = (1,2,3,4,5); #del = (2,4);
foreach (#del) {#big = grep { ! /$_/ } #big; print \"#big\n\"}"
This gives no output. Any idea what happens?
As #MattJacob points out, there are two conflicting uses for $_:
foreach (#del) {...}
This implicitly uses $_ as the loop variable. From the foreach docs:
The foreach keyword is actually a synonym for the for keyword, so you can use either. If VAR is omitted, $_ is set to each value.
Your grep command also uses $_, always, as part of its processing. So if you had a value in $_ as part of your foreach, and it was replaced in the grep ...?
From the grep docs:
Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element)
Please do check the grep docs, since there is an explicit warning about modifying a list that you are iterating over. If you are trying to iteratively shrink the list, you might consider some alternative ways to process it. (For example, could you build a single pattern for grep that would combine all the values in your #del list, and just process the #big list one time? This might be quicker, too.)

Apply a substitution over every element of an array

I have an array of file names. Names are of the format company_ID_timestamp.
How do I apply a substitution on the array without running a loop?
for ( my $i=0; $i < scalar #todayFiles; $i++ ) {
$todayFiles[$i] = s/_20[0-9]{10}//;
}
Unless you want an ugly hack, you're going to want some kind of a loop, even if it's hidden with a map, or a for statement modifier.
s/_20[0-9]{10}// for #todayFiles;
The following works in Perl v5.14 and up (because of the /r modifier). This one makes sense if you don't want to modify the original array:
my #otherArray = map { s/_20[0-9]{10}//r } #todayFiles;
And here's a shorter/better way to write that C-style loop you showed:
for my $filename (#todayFiles) {
$filename =~ s/_20[0-9]{10}//;
}
The latter one works because the for aka foreach loop actually aliases the variable $filename to the elements of the array being iterated over.
To apply a substitution to every element of an array, there is no way but to iterate over those elements
That said, you can tidy up your code a lot by using for as a statement modifier and employing the $_ default variable
s/_20[0-9]{10}// for #todayFiles;
This still iterates over the entire array, but the code is a lot tighter

Why can't I seem to remove undefined element from an array in Perl?

I have an array of strings that comes from data in a hash table. I am trying to remove any (apparently) empty elements, but for some reason there seems to be an obstinate element that refuses to go.
I am doing:
# Get list array from hash first, then
#list = grep { $_ ne ' ' } #list;
#list = uniq #list;
return sort #list;
At the grep line I get the Use of uninitialized value in string ne... message with the rest of the array printed correctly below.
I've tried doing it the 'long' way:
foreach (#list) {
if ($_ ne ' ') {
push #new_list, $_;
}
}
But this produces exactly the same result. I tried using defined with the expected result (nothing).
I could sort the array beforehand and delete the first element, but that seems very risky as I cannot guarantee that the data set will always have blank elements. It also seems excessive to resort to regular expressions, but perhaps I'm wrong. I'm sure I'm missing something ridiculously simple, as usual.
Elements can't be empty. You're trying to remove undefined elements. But you're not checking if the element is undefined, you're checking if it consists of a string consisting of a single space. You want:
#list = grep defined, #list;
My answer assumes that you do not want strings that are either empty (meaning undefined or have a length of 0) or consist solely of spaces.
Your grep line only tests for strings that equal exactly one space. However, the warning implies that at least one array element is indeed undefined. Comparing an undefined value with eq will only yield true for an empty string, not for a single space.
So in order to remove all entries that are either undefined or consist only of spaces you could do something like this:
#list = grep { defined && m/[^\s]/ } #list;
Note that an empty space is trueish for Perl. Therefore a simple grep defined, #list will actually not throw out the entries that consist solely of spaces.
It looks like you want to filter all the elements that contain a non-space character. To do this as well as reject undefined elements you can write simply
#list = uniq grep { defined and /\S/ } #list;

Resources