Array got flushed after while loop within a filehandle - arrays

I got a problem with a Perl script.
Here is the code:
use strict;
use Data::Dumper;
my @data = ('a','b');
if(@data){
    for(@data){
        &debug_check($_)
    }
}
print "@data"; # Nothing is printed; the array is empty after the while loop.
sub debug_check{
    my $ip = shift;
    open my $fh, "<", "debug.txt";
    while(<$fh>){
        print "$_ $ip\n";
    }
}
The array @data, in this example, has two elements. I need to check whether the array has elements. If it does, then for each element I call a subroutine, in this case called debug_check. Inside the subroutine I need to open a file to read some data. After I read the file using a while loop, the @data array is empty.
Why is the array being flushed, and how do I avoid this strange behavior?
Thanks.

The problem here, I think, will be down to $_. This is a bit of a special case, in that it's an alias to a value. If you modify $_ within a loop, it'll update the array. So when you hand it into the subroutine and $_ gets modified there, it also updates @data.
Try:
my ( $ip ) = @_;
Or instead:
for my $ip ( @array ) {
    debug_check($ip);
}
Note: you should also avoid using an & prefix when calling a sub. It has a special meaning. Usually it'll work, but it's redundant at best, and it might cause some strange glitches.

while (<$fh>)
is short for
while (defined($_ = <$fh>))
$_ is currently aliased to an element of @data, so your sub is replacing each element of @data with undef. Fix:
while (local $_ = <$fh>)
which is short for
while (defined(local $_ = <$fh>))
or
while (my $line = <$fh>) # And use $line instead of $_ afterwards
which is short for
while (defined(my $line = <$fh>))
Be careful when using global variables. You want to localize them if you modify them.
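Putting that together, a minimal corrected version of the original script might look like this (a sketch; it assumes a debug.txt file exists next to the script):
use strict;
use warnings;

my @data = ('a', 'b');

if (@data) {
    for my $item (@data) {      # explicit loop variable, so $_ never aliases @data
        debug_check($item);
    }
}
print "@data\n";                # now prints: a b

sub debug_check {
    my $ip = shift;
    open my $fh, '<', 'debug.txt' or die "Can't open debug.txt: $!";
    while (my $line = <$fh>) {  # lexical $line, so $_ (and @data) are left alone
        chomp $line;
        print "$line $ip\n";
    }
    close $fh;
}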

Related

Perl - initialization of hash

I'm not sure how to correctly initialize my hash. I'm trying to create a key/value pair from the values on coupled lines in my input file.
For example, my input looks like this:
#cluster t.18
46421 ../../../output###.txt/
#cluster t.34
41554 ../../../output###.txt/
I'm extracting the t number from line 1 (#cluster line) and matching it to output###.txt in the second line (line starting with 46421). However, I can't seem to get these values into my hash with the script that I have written.
#!/usr/bin/perl
use warnings;
use strict;
my $key;
my $value;
my %hash;
my $filename = 'input.txt';
open my $fh, '<', $filename or die "Can't open $filename: $!";
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ m/^\#cluster/) {
        my @fields = split /(\d+)/, $line;
        my $key = $fields[1];
    }
    elsif ($line =~ m/^(\d+)/) {
        my @output = split /\//, $line;
        my $value = $output[5];
    }
    $hash{$key} = $value;
}
It's a good idea, but the $key created with my in the if block is a new variable scoped to that block, masking the $key you declared up front. Inside the if block the symbol $key has nothing to do with the one you nicely declared up front. See my in perlsub.
This inner $key goes out of scope as soon as the if block is done and does not exist outside it. The outer $key is again visible after the if, elsewhere in the loop, but it is undefined since it has never been assigned to. The same goes for $value in the elsif block.
Just drop the my declarations inside the loop, so that you assign to those outer variables (as intended?). With $key = ... and $value = ..., the hash will be assigned correctly.
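In other words, the loop body becomes (a sketch that keeps the original split-based parsing):
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ m/^\#cluster/) {
        my @fields = split /(\d+)/, $line;
        $key = $fields[1];               # assign to the outer $key
    }
    elsif ($line =~ m/^(\d+)/) {
        my @output = split /\//, $line;
        $value = $output[5];             # assign to the outer $value
    }
    $hash{$key} = $value;
}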
Note: this is about how to get that hash assignment right. I don't know what your actual data looks like or whether the lines are parsed correctly. Here is a toy input.txt:
#cluster t.1
1111 ../../../output1.1.txt/
#cluster t.2
2222 ../../../output2.2.txt/
I pick the 4th field instead of the 6th, $value = $output[3];, and add
print "$_ => $hash{$_}\n" for keys %hash;
after the loop. This prints
1 => output1.1.txt
2 => output2.2.txt
I am not sure whether this is what you want but the hash is built fine.
A comment on the choice of tools for parsing
You parse the lines for numbers by using the property of split to return the separators as well, when they are captured. That is neat, but in some sense it reverses split's main purpose, which is to extract the other components of the string, as delimited by the pattern. It can make the purpose of the code a little convoluted, and you also have to index very precisely to retrieve what you need.
Instead of using split to extract the delimiter itself, which is given by a regex, why not extract it with a regex directly? That makes the intention crystal clear, too. For example, with input
#cluster t.10 has 4319 elements, 0 subclusters
37652 ../../../../clust/output43888.txt 1.397428
the parsing can go as
if ($line =~ m/^\#cluster/) {
    ($key) = $line =~ /t\.(\d+)/;
}
elsif ($line =~ m/^(\d+)/) {
    ($value) = $line =~ m|.*/(\w+\.txt)|;
}
$hash{$key} = $value if defined $key and defined $value;
where t\. and \.txt are added to more precisely specify the targets. If the target strings aren't certain to have that precise form, just capture \d+, and in the second case all non-space after the last /, say by m|^\d+.*/(\S+)|. We use the greediness of .*, which matches everything possible up to the thing that comes after it (a /), thus all the way to the very last /.
Then you can also reduce it to a single regex for each line, for example
if ($line =~ m/^\#cluster\s+t\.(\d+)/) {
    $key = $1;
}
elsif ($line =~ m|^\d+.*/(\w+\.txt)|) {
    $value = $1;
}
Note that I've added a condition to the hash assignment. The original code in fact assigns an undef on the first iteration, since no $value has yet been seen at that point. It is overwritten on the next iteration, and we don't see it if we only print the hash afterwards. The condition also guards against failed matches, for malformed lines and such. Of course, far better checks can be run.
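For reference, here is how those pieces might fit into a complete program (a sketch assembled from the snippets above; the file name input.txt and the regexes are carried over from them):
#!/usr/bin/perl
use warnings;
use strict;

my ($key, $value, %hash);

my $filename = 'input.txt';
open my $fh, '<', $filename or die "Can't open $filename: $!";

while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ m/^\#cluster\s+t\.(\d+)/) {
        $key = $1;
    }
    elsif ($line =~ m|^\d+.*/(\w+\.txt)|) {
        $value = $1;
    }
    $hash{$key} = $value if defined $key and defined $value;
}

print "$_ => $hash{$_}\n" for keys %hash;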

Creating multidimensional array while reading a file - Perl

I'm totally new to Perl, and I've been assigned some task... I have to read a tab-separated file and then do some operations with the data in a DB. The .tsv file is like this:
ID Name Date
155 Pedro 1988-05-05
522 Mengano 2002-08-02
So far I thought that creating a multidimensional array with the data from the file would be a good way to handle the data later. So I read the file line by line, skip the title line, and save the values in an array. However, I'm having difficulties creating this multidimensional array... this is what I've done so far:
# Read file from path
my @array;
my $fh = path($filename)->openr_utf8;
while (my $line = <$fh>) {
    chomp $line;
    # skip comments and blank lines and the title line
    next if $line =~ /^\#/ || $line =~ /^\s*$/ || $line =~ /^\+/ || $line =~ /ID/;
    # split each line into an array
    my @aux_line = split(/\s+/, $line);
    push @array, @{ $aux_line };
}
Obviously, the last line is not working... how can I create an array of arrays this way? I'm a little bit lost with references... And can somebody think of a better way to store the data we read from the file? Thank you!
You can also do this with map:
use Data::Dumper;
my @stuff = map {[split]} <$fh>;
print Dumper \@stuff;
(with maybe a grep to skip comments)
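For example (a sketch; the skip patterns are assumptions modelled on the question's own filters):
use Data::Dumper;

# keep only data lines, then turn each into its own array reference
my @stuff = map  { [split] }
            grep { !/^\#/ && !/^\s*$/ && !/^ID/ } <$fh>;

print Dumper \@stuff;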
But it may suit your use case better to use an array of hashes:
my @stuff;
chomp(my @header = split ' ', <$fh>);
while ( <$fh> ) {
    my %this_row;
    @this_row{@header} = split;
    push( @stuff, \%this_row );
}
First, use strict and use warnings. They would instantly alert you that your wrong way of getting an array reference actually accesses a completely different variable (Perl allows variables of different types to have the same name).
After that, just change your last line to:
push @array, \@aux_line;
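Since the question mentions being a bit lost with references: every element of @array is now an array reference, so the fields can be read back like this (a brief illustrative sketch; the indices are just examples):
# first row, second column (the Name field in the sample data)
print $array[0][1], "\n";            # same as $array[0]->[1]

# loop over all rows, dereferencing each one
for my $row (@array) {
    my ($id, $name, $date) = @$row;
    print "$id / $name / $date\n";
}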

while and foreach interaction

I am having an undesired interaction with a foreach loop and a while that I don't quite understand. I have a normal for loop, and then a foreach loop going through an array. I create a string (representing a file name) with both and then open the file and read it.
The code I use is here:
@array=(1,2);
for($y=0;$y<2;$y++)
{
    foreach(@array)
    {
        print "@array\n";
        $name="/Users/jorge/$_\_vs_$y\.txt";
        print "$name\n";
        open(INFILE,"$name") or die "Can't open files!\n";
        while(<INFILE>)
        {
            $line=$_;
        }
    }
}
and the output is:
array: 1 2
/Users/jorge/1_vs_0.txt
array: 2
/Users/jorge/2_vs_0.txt
array:
/Users/jorge/_vs_1.txt
It seems that somehow the while loop is shortening my array. If I remove the
while(<INFILE>)
it works as intended. Also, if I change the foreach to
foreach $tmp (@array)
and use $tmp instead of $_, it also works as intended; the output then looks like this:
array: 1 2
/Users/jorge/1_vs_0.txt
array: 1 2
/Users/jorge/2_vs_0.txt
array: 1 2
/Users/jorge/1_vs_1.txt
array: 1 2
/Users/jorge/2_vs_1.txt
array: 1 2
/Users/jorge/1_vs_2.txt
You're using $_ as the loop control variable on two nested loops. Instead, you should give each loop its own variable.
Not only is the inner loop changing the value of the control variable for the outer loop, but it's also changing the actual array being looped over. That's because the loop control variable in Perl's for/foreach aliases the elements of the array, so when the <INFILE> construct reads a line into $_, it overwrites the current element of @array with that line. When the while loop finishes reading the file, the last <INFILE> leaves $_ undefined, which means the most recently processed element of @array will always be undefined when you get back to the top of the foreach loop.
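A minimal, self-contained illustration of that aliasing (an assumed example, not taken from the original post):
my @nums = (1, 2, 3);
for (@nums) {
    $_ *= 10;        # writing to $_ writes straight through the alias into @nums
}
print "@nums\n";     # prints: 10 20 30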
You should also be declaring your variables with my, using a lexical scalar instead of a bareword as a filehandle, and using the three-argument version of open, so I've made those changes below as well. But the solution to your shrinking-array problem is simply the use of an explicit variable instead of the default $_ for at least one of the two loops.
my @array = (1, 2);
for (my $y = 0; $y < 2; $y++)
{
    foreach my $x (@array)    # using $x instead of $_
    {
        print "@array\n";
        my $name = "/Users/jorge/${x}_vs_$y.txt";
        print "$name\n";
        open my $infile, '<', $name or die "$0: can't open file '$name': $!\n";
        while (my $line = <$infile>)    # using $line instead of $_
        {
            # do something with $line here
        }
    }
}
Your while loop overwrites the contents of the array because it uses $_, which is aliased to the elements of @array.
It is best to use a lexical (my) variable when reading a file with while:
while (my $line = <INFILE>) {
    # ..
}

Assign Array to Variable without Extra Blank Line in Perl

I'm writing a Perl script that uploads the contents of an array into a database. To do this, I've created a "foreach" statement that loops through the array and assigns its contents to a variable:
foreach my $line (@array)
    { $data .= "$line\n" }
This script works, and I'm able to upload the variable into my database. However, thanks to the newline character at the end of my statement (used to retain the original line breaks of my array), my variable contains an extra blank line at the end, which also gets put into the database.
What is the best way to get rid of this blank line? Is this something that I can only really fix after the loop? I'm extremely new to Perl, so I'm a bit confused. Thanks!
my $data = join "\n", @array;
The join builtin takes a separator and a list, and concatenates the elements of the list with the separator between them. It is equivalent to
sub join ($@) {
    my $sep = shift;
    @_ or return "";
    my $str = shift;
    $str .= $sep . shift while @_;
    return $str;
}
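A quick comparison of the two approaches (a sketch with made-up data):
my @array = ('line one', 'line two', 'line three');

# the loop from the question: appends "\n" after every element,
# so the result ends with a trailing newline
my $looped = '';
$looped .= "$_\n" for @array;

# join puts the separator only *between* elements, so there is
# no trailing newline to clean up
my $data = join "\n", @array;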
Maybe you can try the following:
my $data = join "\n", @array;

How can I extract just the elements I want from a Perl array?

Hey, I'm wondering how I can get this code to work. Basically, I want to keep the lines of $filename as long as they contain $user in the path:
open STDERR, ">/dev/null";
$filename = `find -H /home | grep $file`;
@filenames = split(/\n/, $filename);
for $i (@filenames) {
    if ($i =~ m/$user/) {
        # keep results
    } else {
        delete $i; # does not work.
    }
}
$filename = join("\n", @filenames);
close STDERR;
I know you can delete like delete $array[$index], but I don't have an index with this kind of loop, as far as I know.
You could replace your loop with:
@filenames = grep /$user/, @filenames;
There's no way to do it when you're using a foreach loop. But never mind; the right thing to do is to use File::Find to accomplish your task.
use File::Find 'find';
...
my @files;
my $wanted = sub {
    return unless /\Q$file/ && /\Q$user/;
    push @files, $_;
};
find({ wanted => $wanted, no_chdir => 1 }, '/home');
Don't forget to escape your variables with \Q for use in regular expressions.
BTW, redirecting your STDERR to /dev/null is better written as
{
    local *STDERR;
    open STDERR, '>', '/dev/null';
    ...
}
It restores the filehandle after exiting the block.
If you have a find that supports -path, then make it do the work for you, e.g.,
#! /usr/bin/perl
use warnings;
use strict;
my $user = "gbacon";
my $file = "bash";
my $path = join " -a " => map "-path '*$_*'", $user, $file;
chomp(my @filenames = `find -H /home $path 2>/dev/null`);
print map "$_\n", @filenames;
Note how backticks in list context give back a list of lines (including their terminators, removed above with chomp) from the command's output. This saves you having to split them yourself.
Output:
/home/gbacon/.bash_history
/home/gbacon/.bashrc
/home/gbacon/.bash_logout
If you want to remove an item from an array, use the multi-talented splice function.
my @foo = qw( a b c d e f );
splice( @foo, 3, 1 ); # Remove element at index 3.
You can do all sorts of other manipulations with splice. See the perldoc for more info.
As codeholic alludes to, you should never modify an array while iterating over it with a for loop. If you want to modify an array while iterating, use a while loop instead.
The reason for this is that for evaluates the expression in parens once, and maps each item in the result list to an alias. If the array changes, the pointers get screwed up and chaos will follow.
A while evaluates the condition each time through the loop, so you won't run into issues with pointers to non-existent values.
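A sketch of that while-loop approach, using splice and an explicit index (an assumed example, not code from the answers above):
my @filenames = ('/home/alice/a.txt', '/home/bob/b.txt', '/home/alice/c.txt');
my $user = 'alice';

my $i = 0;
while ($i < @filenames) {
    if ($filenames[$i] =~ /$user/) {
        $i++;                          # keep this element and move on
    } else {
        splice(@filenames, $i, 1);     # remove it; do not advance $i
    }
}
print "$_\n" for @filenames;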
