perl, removing elements from array in for loop - arrays

Will the following code always work in Perl?
for loop iterating over @array {
# do something
if ($condition) {
remove current element from @array
}
}
I know that in Java this throws an exception. The above code works for me for now, but I want to be sure that it will work in all cases in Perl. Thanks.

Well, the documentation (perlsyn) says:
If any part of LIST is an array, foreach will get very confused if you
add or remove elements within the loop body, for example with splice.
So don't do that.
It's a bit better with each:
If you add or delete a hash's elements while iterating over it,
entries may be skipped or duplicated--so don't do that. Exception: In
the current implementation, it is always safe to delete the item most
recently returned by each(), so the following code works properly:
while (($key, $value) = each %hash) {
print $key, "\n";
delete $hash{$key}; # This is safe
}
But I suppose the best option here would be just using grep:
@some_array = grep {
# do something with $_
some_condition($_);
} @some_array;
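As a minimal sketch of the grep approach, assuming a hypothetical array @numbers and the made-up condition "remove even values" (neither comes from the question): grep keeps every element for which the block returns true, so the array is never modified while it is being iterated. If the array has to be changed in place, walking the indices in reverse and calling splice is a common alternative, because a removal then only shifts elements that have already been visited.

```perl
use strict;
use warnings;

# Hypothetical data; the removal condition here is "the value is even".
my @numbers = (1, 2, 3, 4, 5, 6);

# grep builds a new list of the elements to keep, so nothing is removed
# from the array while it is being iterated.
my @kept = grep { $_ % 2 != 0 } @numbers;

# In-place alternative: walk the indices from the end so that splice
# only shifts elements that have already been examined.
my @in_place = @numbers;
for my $i (reverse 0 .. $#in_place) {
    splice(@in_place, $i, 1) if $in_place[$i] % 2 == 0;
}

print "@kept\n";      # 1 3 5
print "@in_place\n";  # 1 3 5
```

Both variants leave the same three odd values; the grep form is usually the easier one to read.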

Related

Can't use string as an ARRAY ref while strict refs in use

Getting an error when I attempt to dump out part of a multi dimensional hash array. Perl spits out
Can't use string ("somedata") as an ARRAY ref while "strict refs" in use at ./myscript.pl
I have tried multiple ways to access part of the array I want to see but I always get an error. I've used Dumper to see the entire array and it looks fine.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);
use String::Util qw(trim);
my %arrHosts;
open(my $filehdl, "<textfile.txt") || die "Cannot open or find file textfile.txt: $!\n";
while( my $strInputline = <$filehdl> ) {
chomp($strInputline);
my ($strHostname,$strOS,$strVer,$strEnv) = split(/:/, $strInputline);
$strOS = lc($strOS);
$strVer = trim($strVer);
$strEnv = trim($strEnv);
$strOS = trim($strOS);
$arrHosts{$strOS}{$strVer}{$strEnv} = $strHostname;
}
# If you want to see the entire database, remove the # in front of Dumper
print Dumper \%arrHosts;
foreach my $machine (@{$arrHosts{solaris}{10}{DEV}}) {
print "$machine\n";
}
close($filehdl);
The data is in the form
machine:OS:OS version:Environment
For example
bigserver:solaris:11:PROD
smallerserver:solaris:11:DEV
I want to print out only the servers that are solaris, version 11, in DEV. Using hashes seems the easiest way to store the data but alas, Perl barfs when attempting to print out only a portion of it. Dumper works great but I don't want to see everything. Where did I go wrong??
You have the following:
$arrHosts{$strOS}{$strVer}{$strEnv} = $strHostname;
That means the following contains a string:
$arrHosts{solaris}{10}{DEV}
You are treating it as if it contains a reference to an array. To group the hosts by OS+ver+env, replace
$arrHosts{$strOS}{$strVer}{$strEnv} = $strHostname;
with
push @{ $arrHosts{$strOS}{$strVer}{$strEnv} }, $strHostname;
Iterating over @{ $arrHosts{solaris}{10}{DEV} } will then make sense.
My previous code also had the obvious problem whereby if the combo of OS, Version, and Environment were the same it wrote over previous data. Blunderful. Push is the trick
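Here is a small self-contained sketch of the push fix, using inline records instead of textfile.txt. The third host name is hypothetical, added only so that two hosts share the same OS, version, and environment; the variable names are also simplified from the question's.

```perl
use strict;
use warnings;

# Sample records in the question's machine:OS:version:environment format.
# The third host is hypothetical, added to show two hosts sharing one key.
my @lines = (
    'bigserver:solaris:11:PROD',
    'smallerserver:solaris:11:DEV',
    'otherserver:solaris:11:DEV',
);

my %hosts;
for my $line (@lines) {
    my ($hostname, $os, $ver, $env) = split /:/, $line;
    # push autovivifies the inner hashes and the array reference.
    push @{ $hosts{ lc $os }{$ver}{$env} }, $hostname;
}

# "// []" avoids dereferencing undef when no host matches (Perl 5.10+).
my @dev_hosts = @{ $hosts{solaris}{11}{DEV} // [] };
print "$_\n" for @dev_hosts;
```

Because every leaf is now an array reference, a second host with the same key is appended rather than overwriting the first.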

Using any to check existence of element in array inside hash

I am trying to check if an element (array) of a hash contains a specific item by using any. Because my arrays can get very large it seemed any was the most efficient way as it returns true as soon as the item is found. The problem is that CLI returns:
Type of arg 1 to List::Util::any must be block or sub {} (not array
dereference) at ...
The line is (changed to a fictitious example) reproduced below. I am trying to see if id of item2 is inside field of item1 in the fictitious example below.
unless(any(@{$hash{$item1}{field}}) eq $hash{$item2}{id}) {
# Do magic.
}
What am I doing wrong? As any is part of List::Util, I have loaded that module at the top.
use List::Util qw(any);
You need to import the function:
use List::Util qw(any);
UPDATE: As noted, the 1st arg to any should be a block of code. In this case, compare the hash value to $_, which is assigned to each value in your array until the condition is true.
unless(any { $_ eq $hash{$item2}{id} } @{$hash{$item1}{field}}) {
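A runnable sketch of the block form follows. The contents of %hash and the key item3 are hypothetical, chosen only to show one hit and one miss; any itself is the List::Util function used in the question.

```perl
use strict;
use warnings;
use List::Util qw(any);

# Hypothetical data shaped like the question's structure.
my %hash = (
    item1 => { field => [ 'a17', 'b42', 'c99' ] },
    item2 => { id    => 'b42' },
    item3 => { id    => 'z00' },
);

# any takes a block first, then the list; $_ is each element in turn,
# and evaluation stops at the first element for which the block is true.
my $found_item2 = any { $_ eq $hash{item2}{id} } @{ $hash{item1}{field} };
my $found_item3 = any { $_ eq $hash{item3}{id} } @{ $hash{item1}{field} };

print "item2 id is ", ($found_item2 ? "present" : "absent"), "\n";
print "item3 id is ", ($found_item3 ? "present" : "absent"), "\n";
```

Note that there is no comma between the block and the list, the same calling convention as grep and map.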

Looping through array contained in a hash

So I have an object that contains an array:
package MyObject;
sub new {
my($type) = @_;
my $self->{Params}{Status}{Packages} = [];
}
I have an add-package sub which appends onto this "Packages" array like:
sub add_package {
my($self, $package_obj) = @_;
push $self->{Params}{Status}{Packages}, $package;
}
Now when I go to find all the packages in my array I have issues. Whenever I try and pull out the packages like this:
foreach my $package($self->{Params}{Status}{Packages}) {
# do something with $package.
}
This only loops through one time. Now from what I understand the hash actually stores a pointer to the array so I tried to do:
foreach my $package(@$self->{Params}{Status}{Packages}) {
# do something with $package.
}
But then there is an error saying that $self is not an array. I did notice when I do:
scalar $self->{Params}{Status}{Packages};
It returns:
#ARRAY(0xSome Address);
What am I missing? And how can I use a foreach loop to go through my array?
$self->{Params}{Status}{Packages} is a reference to an array, in Perl terminology. When you have a reference to something, put the right sigil in front of it to dereference it. If the reference is more than just a name with possibly some sigils in front, you need to surround it with braces. It's a matter of precedence: @$self->{Params}{Status}{Packages} is parsed as (@$self)->{Params}{Status}{Packages}, but you need
@{$self->{Params}{Status}{Packages}}
i.e. the array referenced by the expression $self->{Params}{Status}{Packages}.
In this case, you need to wrap it all in the array dereference block @{} so perl knows which portion you're trying to dereference...
for my $package (@{ $self->{Params}{Status}{Packages} }){
print "$package\n";
}
Also, just to keep things consistent, I prefer to always deref the array with the block when extracting, or inserting:
push @{ $self->{Params}{Status}{Packages} }, $package;
UPDATE: As of Perl 5.24, autoderef (passing a reference directly to push, keys(), values() or each()) has been removed, and postfix dereferencing is a stable feature. The @{} and %{} forms are still supported and are backwards compatible, so I'd recommend using them at all times.
In my view, the clearest way to do this is to extract the array reference to a temporary scalar variable, which makes accessing the array very straightforward
my $packages = $self->{Params}{Status}{Packages};
for my $package ( @$packages ) {
# do something with $package.
}
Also, if you have use strict and use warnings enabled as you should, your add_package subroutine will produce the message
push on reference is experimental
This isn't something you can safely ignore. Experimental features may change their behaviour or disappear completely in later versions of Perl, and it is unwise to make use of them in production code. You can fix your subroutine in a similar way, like this
sub add_package {
my ($self, $package_obj) = @_;
my $packages = $self->{Params}{Status}{Packages};
push @$packages, $package_obj;
}
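Putting the pieces together, here is a minimal sketch of the whole class. The constructor body (building the hash and calling bless), the packages accessor, and the sample package names are assumptions added for illustration; the question shows only fragments of the class. The package NAME { ... } block syntax needs Perl 5.14 or later.

```perl
use strict;
use warnings;

package MyObject {
    sub new {
        my ($class) = @_;
        # Build the hash first, then bless it and return the object.
        my $self = { Params => { Status => { Packages => [] } } };
        return bless $self, $class;
    }

    sub add_package {
        my ($self, $package_obj) = @_;
        push @{ $self->{Params}{Status}{Packages} }, $package_obj;
        return $self;
    }

    sub packages {
        my ($self) = @_;
        # Dereference so the caller receives a list, not a reference.
        return @{ $self->{Params}{Status}{Packages} };
    }
}

my $obj = MyObject->new;
$obj->add_package($_) for qw(alpha beta gamma);   # hypothetical package names

my @names = $obj->packages;
for my $package (@names) {
    print "$package\n";
}
```

Keeping the dereference inside one accessor means callers never have to remember the @{ ... } wrapper themselves.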

Creating a list of duplicate filenames with Perl

I've been trying to write a script to pre-process some long lists of files, but I am not confident (nor competent) with Perl yet and am not getting the results I want.
The script below is very much work in progress but I'm stuck on the check for duplicates and would be grateful if anyone could let me know where I am going wrong. The block dealing with duplicates seems to be of the same form as examples I have found but it doesn't seem to work.
#!/usr/bin/perl
use strict;
use warnings;
open my $fh, '<', $ARGV[0] or die "can't open: $!";
foreach my $line (<$fh>) {
# Trim list to remove directories which do not need to be checked
next if $line =~ m/Inventory/;
# MORE TO DO
next if $line =~ m/Scanned photos/;
$line =~ s/\n//; # just for a tidy list when testing
my @split = split(/\/([^\/]+)$/, $line); # separate filename from rest of path
foreach (@split) {
push (my @filenames, "$_");
# print "@filenames\n"; # check content of array
my %dupes;
foreach my $item (@filenames) {
next unless $dupes{$item}++;
print "$item\n";
}
}
}
I am struggling to understand what is wrong with my check for duplicates. I know the array contains duplicates (uncommenting the first print function gives me a list with lots of duplicates). The code as it stands generates nothing.
Not the main purpose of my post, but my final aim is to remove unique filenames from the list and keep filenames which are duplicated in other directories.
I know that none of these files are identical but many are different versions of the same file which is why I'm focussing on filename.
Eg I would want an input of:
~/Pictures/2010/12345678.jpg
~/Pictures/2010/12341234.jpg
~/Desktop/temp/12345678.jpg
to give an output of:
~/Pictures/2010/12345678.jpg
~/Desktop/temp/12345678.jpg
So I suppose ideally it would be good to check for uniqueness of a match based on the regex without splitting if that is possible.
The loop below does nothing, because the hash and the array only contain one value in each loop iteration:
foreach (@split) {
push (my @filenames, "$_"); # add one element to lexical array
my %dupes;
foreach my $item (@filenames) { # loop one time
next unless $dupes{$item}++; # add one key to lexical hash
print "$item\n";
}
} # @filenames and %dupes go out of scope
A lexical variable (declared with my) has a scope that extends to the surrounding block { ... }, in this case your foreach loop. When they go out of scope, they are reset and all the data is lost.
I don't know why you copy the file names from @split to @filenames, it seems very redundant. The way to dedupe this would be:
my %seen;
my @uniq;
@uniq = grep !$seen{$_}++, @split;
Additional information:
You might also be interested in using File::Basename to get the file name:
use File::Basename;
my $fullpath = "~/Pictures/2010/12345678.jpg";
my $name = basename($fullpath); # 12345678.jpg
Your substitution
$line =~ s/\n//;
Should probably be
chomp($line);
When you read from a file handle, using for (foreach) means you read all the lines and store them in memory. It is preferable most times to instead use while, like this:
while (my $line = <$fh>)
TLP's answer gives lots of good advice. In addition:
Why use both an array and a hash to store the filenames? Simply use the hash as your one storage solution, and you will automatically remove duplicates. i.e:
my %filenames; #outside of the loops
...
foreach (@split) {
$filenames{$_}++;
}
Now when you want to get the list of unique filenames, just use keys %filenames or, if you want them in alphabetical order, sort keys %filenames. And the value for each hash key is a count of occurrences, so you can find out which ones were duplicated if you care.
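For the asker's stated final aim (keep only the paths whose file name also appears in another directory), a two-pass sketch along these lines may help. It uses the three sample paths from the question as inline data; the variable names are illustrative, and a real script would fill @paths from the file handle.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# The sample paths from the question; a real script would read them
# line by line from a file handle instead.
my @paths = (
    '~/Pictures/2010/12345678.jpg',
    '~/Pictures/2010/12341234.jpg',
    '~/Desktop/temp/12345678.jpg',
);

# First pass: count how often each file name occurs. The hash is declared
# outside the loop so the counts survive across iterations.
my %count;
$count{ basename($_) }++ for @paths;

# Second pass: keep only the paths whose file name was seen more than once.
my @duplicated = grep { $count{ basename($_) } > 1 } @paths;

print "$_\n" for @duplicated;
```

Two passes are needed because the first occurrence of a name cannot be recognised as a duplicate until a later line has been read.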

Perl: Hash within Array within Hash

I am trying to build a hash that has an array as one value; this array will then contain hashes. Unfortunately, I have coded it wrong and it is being interpreted as a pseudo-hash. Please help!
my $xcHash = {};
my $xcLine;
#populate hash header
$xcHash->{XC_HASH_LINES} = ();
#for each line of data
$xcLine = {};
#populate line hash
push(@{$xcHash->{XC_HASH_LINES}}, $xcLine);
foreach $xcLine ($xcHash->{XC_HASH_LINES})
#pseudo-hash error occurs when I try to use $xcLine->{...}
$xcHash->{XC_HASH_LINES} is an arrayref and not an array. So
$xcHash->{XC_HASH_LINES} = ();
should be:
$xcHash->{XC_HASH_LINES} = [];
foreach takes a list. It can be a list containing a single scalar (foreach ($foo)), but that's not what you want here.
foreach $xcLine ($xcHash->{XC_HASH_LINES})
should be:
foreach my $xcLine (@{$xcHash->{XC_HASH_LINES}})
foreach $xcLine ($xcHash->{XC_HASH_LINES})
should be
foreach $xcLine ( @{ $xcHash->{XC_HASH_LINES} } )
See http://perlmonks.org/?node=References+quick+reference for easy to remember rules for how to dereference complex data structures.
Golden Rule #1
use strict;
use warnings;
It might seem like a fight at the beginning, but they will instill good Perl practices and help identify many syntactical errors that might otherwise go unnoticed.
Also, Perl has a neat feature called autovivification. It means that $xcHash and $xcLine need not be pre-defined or constructed as references to arrays or hashes.
The issue faced here is to do with the not uncommon notion that a scalar can hold an array or hash; it doesn't. What it holds is a reference. This means that the $xcHash->{XC_HASH_LINES} is an arrayref, not an array, which is why it needs to be dereferenced as an array using the @{...} notation.
Here's what I would do:
my %xcHash;
# for each line of data:
push @{$xcHash{XC_HASH_LINES}}, $xcLine;
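A complete, runnable sketch of the hash-of-array-of-hashes structure might look like this. The @records data and its id and name fields are hypothetical placeholders for "each line of data"; only %xcHash, XC_HASH_LINES, and $xcLine come from the question.

```perl
use strict;
use warnings;

my %xcHash;

# Hypothetical line data; the field names are placeholders.
my @records = (
    { id => 1, name => 'first'  },
    { id => 2, name => 'second' },
);

for my $record (@records) {
    my $xcLine = { %$record };   # a fresh hash reference per line
    # Autovivification creates the array reference on the first push.
    push @{ $xcHash{XC_HASH_LINES} }, $xcLine;
}

# Dereference the array, then use -> on each element (a hash reference).
my @names;
for my $xcLine (@{ $xcHash{XC_HASH_LINES} }) {
    push @names, $xcLine->{name};
    print "$xcLine->{id}: $xcLine->{name}\n";
}
```

Declaring $xcLine with my inside the loop matters: reusing a single hash reference would make every array element point at the same hash.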

Resources