while and foreach interaction - arrays

I am having an undesired interaction with a foreach loop and a while that I don't quite understand. I have a normal for loop, and then a foreach loop going through an array. I create a string (representing a file name) with both and then open the file and read it.
The code I use is here:
#array=(1,2);
for($y=0;$y<2;$y++)
{
foreach(#array)
{
print "#array\n";
$name="/Users/jorge/$_\_vs_$y\.txt";
print "$name\n";
open(INFILE,"$name") or die "Can't open files!\n";
while(<INFILE>)
{
$line=$_;
}
}
}
and the output is:
array: 1 2
/Users/jorge/1_vs_0.txt
array: 2
/Users/jorge/2_vs_0.txt
array:
/Users/jorge/_vs_1.txt
Seems that somehow the while loop is shortening my array, if I remove the:
while(<INFILE>)
it works as intented, also if I change the foreach too:
foreach $tmp (#array)
and use $tmp instead of $_, it also works as intended, the output looks like this:
array: 1 2
/Users/jorge/1_vs_0.txt
array: 1 2
/Users/jorge/2_vs_0.txt
array: 1 2
/Users/jorge/1_vs_1.txt
array: 1 2
/Users/jorge/2_vs_1.txt
array: 1 2
/Users/jorge/1_vs_2.txt

You're using $_ as the loop control variable on two nested loops. Instead, you should give each loop its own variable.
Not only is the inner loop changing the value of the control variable for the outer loop, but it's also changing the actual array being looped over. That's because the loop control variable in Perl's for/foreach aliases the elements of the array, so when the <INFILE> construct reads a line into $_, it overwrites the current element of #array with that line. When the while loop finishes reading the file, $_ comes out of the last <INFILE> as undefined, which means the most-recently processed element of #array will always be undefined when you get back to the top of the foreach loop.
You should also be declaring your variables with my, using a lexical scalar instead of a bareword as a file handle, and using the the three-argument version of open, so I've made those changes below as well. But the solution to your shrinking-array problem is just the use of an explicit variable instead of the default $_ for at least one of the two loops.
my #array = (1, 2);
for (my $y = 0; $y < 2; $y++)
{
foreach my $x (#array) # using $x instead of $_
{
print "#array\n";
my $name = "/Users/jorge/${x}_vs_$y.txt";
print "$name\n";
open my $infile, '<', $name or die "$0: can't open file '$name': $!\n";
while (my $line = <$infile>) # using $line instead of $_
{
# do something with $line here
}
}
}

Your while loop overwrites content of array as it uses $_ which is aliased to #array elements.
It is best to use lexical (my) variable when reading file using while,
while (my $line = <INFILE>) {
# ..
}

Related

How does foreach(#array, $var) work in Perl?

I am trying to figure out some Perl code someone else wrote, and I'm confused with the following syntax of foreach loop
foreach (#array, $var){
....
}
This code runs, but nobody else uses it based on my google search online. And it doesn't work the same way as the more common way of foreach with arrays, which is:
foreach $var (#array){
....
}
Could someone explain this syntax?
foreach (#array, $var){ .... }
is short for
foreach $_ (#array, $var){ .... }
which is short for
foreach $_ ($array[0], $array[1], ... $array[N], $var){ .... }
So on each iteration of the loop, $_ is set to be each element of the array and then finally to be $var. The $_ variable is just a normal perl variable, but one which is used in many places within the Perl syntax as a default variable to use if one is not explicitly mentioned.
Perl's for() doesn't operate on an array, it operates on a list.
for (#array, $var) {...}
...expands #array and $var into a single list of individual scalar elements, and then iterates over that list one element at a time.
It's no different than if you had of done:
my #array = (1, 2, 3);
my $var = 4;
push #array, $var;
for (#array) {...}
Also, for my $var (#array, $var) {...} will work just fine, because with my, you're localizing a new $var scalar, and the external, pre-defined one is left untouched. perl knows the difference.
You should always use strict;, and declare your variables with my.

How to read a .txt file and store it into an array

I know this is a fairly simple question, but I cannot figure out how to store all of the values in my array the way I want to.
Here is a small portion what the .txt file looks like:
0 A R N D
A 2 -2 0 0
R -2 6 0 -1
N 0 0 2 2
D 0 -1 2 4
Each value is delimited by either two spaces - if the next value is positive - or a space and a '-' - if the next value is negative
Here is the code:
use strict;
use warnings;
open my $infile, '<', 'PAM250.txt' or die $!;
my $line;
my #array;
while($line = <$infile>)
{
$line =~ /^$/ and die "Blank line detected at $.\n";
$line =~ /^#/ and next; #skips the commented lines at the beginning
#array = $line;
print "#array"; #Prints the array after each line is read
};
print "\n\n#array"; #only prints the last line of the array ?
I understand that #array only holds the last line that was passed to it. Is there a way where I can get #array to hold all of the lines?
You are looking for push.
push #array, $line;
You undoubtedly want to precede this with chomp to snip any newlines, first.
If file is small as compared to available memory of your machine then you can simply use below method to read content of file in to an array
open my $infile, '<', 'PAM250.txt' or die $!;
my #array = <$infile>;
close $infile;
If you are going to read a very large file then it is better to read it line by line as you are doing but use PUSH to add each line at end of array.
push(#array,$line);
I will suggest you also read about some more array manipulating functions in perl
You're unclear to what you want to achieve.
Is every line an element of your array?
Is every line an array in your array and your "words" are the elements of this array?
Anyhow.
Here is how you can achieve both:
use strict;
use warnings;
use Data::Dumper;
# Read all lines into your array, after removing the \n
my #array= map { chomp; $_ } <>;
# show it
print Dumper \#array;
# Make each line an array so that you have an array of arrays
$_= [ split ] foreach #array;
# show it
print Dumper \#array;
try this...
sub room
{
my $result = "";
open(FILE, <$_[0]);
while (<FILE>) { $return .= $_; }
close(FILE);
return $result;
}
so you have a basic functionality without great words. the suggest before contains the risk to fail on large files. fastest safe way is that. call it as you like...
my #array = &room('/etc/passwd');
print room('/etc/passwd');
you can shorten, rename as your convinience believes.
to the kidding ducks nearby: by this way the the push was replaced by simplictiy. a text-file contains linebreaks. the traditional push removes the linebreak and pushing up just the line. the construction of an array is a simple string with linebreaks. now contain the steplength...

Perl - initialization of hash

I'm not sure how to correctly initialize my hash - I'm trying to create a key/value pair for values in coupled lines in my input file.
For example, my input looks like this:
#cluster t.18
46421 ../../../output###.txt/
#cluster t.34
41554 ../../../output###.txt/
I'm extracting the t number from line 1 (#cluster line) and matching it to output###.txt in the second line (line starting with 46421). However, I can't seem to get these values into my hash with the script that I have written.
#!/usr/bin/perl
use warnings;
use strict;
my $key;
my $value;
my %hash;
my $filename = 'input.txt';
open my $fh, '<', $filename or die "Can't open $filename: $!";
while (my $line = <$fh>) {
chomp $line;
if ($line =~ m/^\#cluster/) {
my #fields = split /(\d+)/, $line;
my $key = $fields[1];
}
elsif ($line =~ m/^(\d+)/) {
my #output = split /\//, $line;
my $value = $output[5];
}
$hash{$key} = $value;
}
It's a good idea, but your $key that is created with my in the if block is a local variable scoped to that block, masking the global $key. Inside the if block the symbol $key has nothing to do with the one you nicely declared upfront. See my in perlsub.
This local $key goes out of scope as soon as if is done and does not exist outside the if block. The global $key is again available after the if, being visible elsewhere in the loop, but is undefined since it has never been assigned to. The same goes for $value in the elsif block.
Just drop the my declaration inside the loop, thus assign to those global variables (as intended?). So, $key = ... and $value = ..., and the hash will be assigned correctly.
Note -- this is about how to get that hash assignment right. I don't know how your actual data looks and whether the line is parsed correctly. Here is a toy input.txt
#cluster t.1
1111 ../../../output1.1.txt/
#cluster t.2
2222 ../../../output2.2.txt/
I pick the 4th field instead of the 6th, $value = $output[3];, and add
print "$_ => $hash{$_}\n" for keys %hash;
after the loop. This prints
1 => output1.1.txt
2 => output2.2.txt
I am not sure whether this is what you want but the hash is built fine.
A comment on choice of tools in parsing
You parse the lines for numbers, by using the property of split to return the separators as well, when they are captured. That is neat, but in some sense it reverses its main purpose, which is to extract other components from the string, as delimited by the pattern. Thus it may make the purpose of the code a little bit convoluted, and you also have to index very precisely to retrieve what you need.
Instead of using split to extract the delimiter itself, which is given by a regex, why not extract it by a regex? That makes the intention crystal clear, too. For example, with input
#cluster t.10 has 4319 elements, 0 subclusters
37652 ../../../../clust/output43888.txt 1.397428
the parsing can go as
if ($line =~ m/^\#cluster/) {
($key) = $line =~ /t\.(\d+)/;
}
elsif ($line =~ m/^(\d+)/) {
($value) = $line =~ m|.*/(\w+\.txt)|;
}
$hash{$key} = $value if defined $key and defined $value;
where t\. and \.txt are added to more precisely specify the targets. If the target strings aren't certain to have that precise form, just capture \d+, and in the second case all non-space after the last /, say by m|^\d+.*/(\S+)|. We use the greediness of .*, which matches everything possible up to the thing that comes after it (a /), thus all the way to the very last /.
Then you can also reduce it to a single regex for each line, for example
if ($line =~ m/^\#cluster\s+t\.(\d+)/) {
$key = $1;
}
elsif ($line =~ m|^\d+.*/(\w+\.txt)|) {
$value = $1;
}
Note that I've added a condition to the hash assignment. The original code in fact assigns an undef on the first iteration, since no $value had yet been seen at that point. This is overwritten on the next iteration and we don't see it if we only print the hash afterwards. The condition also guards you against failed matches, for malformatted lines or such. Of course, far better checks can be run.

Why is my script only accessing the first element in array?

Below is my script.
I have attempted many print statements to work out why it is only accessing the first array element. The pattern match works. The array holds a minimum 40 elements. I have checked and it is full.
I have printed each line, and each line prints.
my $index = 0;
open(FILE, "$file") or die "\nNot opening $file for reading\n\n";
open(OUT, ">$final") or die "Did not open $final\n";
while (<FILE>) {
foreach my $barcode (#barcode) {
my #line = <FILE>;
foreach $_ (#line) {
if ($_ =~ /Barcode([0-9]*)\t$barcode[$index]\t$otherarray[$index]/) {
my $bar = $1;
$_ =~ s/.*//;
print OUT ">Barcode$bar"."_"."$barcode[$index]\t$otherarray[$index]";
}
print OUT $_;
}
$index++;
}
}
Okay, lets say the input was:
File:
Barcode001 001 abc
Barcode002 002 def
Barcode003 003 ghi
#barcode holds:
001
002
003
#otherarray holds:
abc
def
ghi
The output result for this script is currently printing only:
Barcode001_001 abc
It should be printing:
>Barcode001_001 abc
>Barcode002_002 def
>Barcode003_003 ghi
Where it should be printing a whole load up to ~40 lines.
Any ideas? There must be something wrong with the way I am accessing the array elements? Or incrementing? Hoping this isn't something too silly!
Thanks in advance.
It needs the index because I am trying to match arrays in parallel, as they are ordered. Each line needs to match the corresponding indices of the arrays to each line in the file.
It's a little hard to answer with certainty without more information about the contents of #barcode and FILE, but there is something odd in your code which makes me think that it might be the problem.
The construct while (<FILE>) { ... } will, until end of file, read a line from FILE into $_ and then execute the contents of the loop. In your code, you also read all the lines from FILE from within the loop that iterates over #barcode. I think it is likely that you intended to check each line from FILE against all the elements of #barcode, which would make the loop look like the following:
while (my $line = <FILE>) {
foreach my $barcode (#barcode) {
if ($line =~ /Barcode([0-9]*)\t$barcode/) {
my $bar = $1;
print OUT ">Barcode$bar"."_"."$barcode\n";
}
else {
print OUT $line;
}
}
}
I've taken the liberty of doing a bit of code tidying, but I may have made some unwarranted assumptions.
Your core problem in the above is - in your first iteration you slurp all of your file into #lines. But because it's lexically scoped to the loop, it disappears when that loop completes.
Furthermore:
I would strongly suggest that you don't use $_ like that.
$_ is a special variable that's set implicitly in loops. I'd strongly suggest that you need to replace that with something that isn't a special variable, because that's a sure way to cause yourself pain.
turn on use strict; and use warnings;
use 3 argument open with a lexical filehandle.
perltidy your code, so the bracketing looks right.
you've a search and replace pattern on $_ that's emptying it completely, but then you're trying to print it. You may well not be printing what you think you're printing.
You're accessing <FILE> outside and inside your loop. This will cause you problems.
Barcode([0-9]*) - with a '*' there you're saying 'zero or more' is valid. You may want to consider \d+ - one or more digits.
referencing multiple arrays by index is messy. I'd suggest coalescing them into a hash lookup (lookup by key - barcode)
This line:
my #line = <FILE>;
reads your whole file into #line. But you do this inside the while loop that iterates... each line in <FILE>. Don't do that, it's horrible.
Is this something like what you wanted?
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my #barcode = qw (
001
002
003
);
my #otherarray = qw (
abc
def
ghi
);
my %lookup;
#lookup{#barcode} = #otherarray;
print Dumper \%lookup;
#commented because I don't have your source data
#my $file = "input_file_name";
#my $output = "output_file_name";
#open( my $input, "<", $file ) or die "\nNot opening $file for reading\n\n";
#open( my $output, ">", $final ) or die "Did not open $final\n";
#while ( my $line = <$input> )
while ( my $line = <DATA> ) {
foreach my $barcode (#barcode) {
if ( my ($bar) = ( $line =~ /Barcode(\d+)\s+$barcode/ ) ) {
print ">Barcode$bar" . "_" . "$barcode $lookup{$barcode}\n";
#print {$output} ">Barcode$bar" . "_" . "$lookup{$barcode}\n";
}
}
}
__DATA__
Barcode001 001
Barcode002 002
Barcode003 003
Prints:
$VAR1 = {
'001' => 'abc',
'002' => 'def',
'003' => 'ghi'
};
>Barcode001_001 abc
>Barcode002_002 def
>Barcode003_003 ghi
It turns out it was a simple issue as I had suspected being a Monday. I had a colleague go through it with me, and it was the placing of the index:
#my $index = 0; #This means the index is iterated through,
#but for each barcode for one line, then it continues
#counting up and misses the other values, therefore
#repeatedly printing just the first element of the array.
open(FILE, "$file") or die "\nNot opening $file for reading\n\n";
open(OUT, ">$final") or die "Did not open $final\n";
while (<FILE>) {
$index = 0; #New placement of $index for initialising.
foreach my $barcode (#barcode) {
my #line = <FILE>;
foreach $_ (#line) {
if ($_ =~ /Barcode([0-9]*)\t$barcode[$index]\t$otherarray[$index]/) {
my $bar = $1;
$_ =~ s/.*//;
print OUT ">Barcode$bar"."_"."$barcode[$index]\t$otherarray[$index]";
}
print OUT $_;
$index++; #Increment here
}
#$index++;
}
}
Thanks to everyone for their responses, for my original and poorly worded question they would have worked and may be more efficient, but for the purpose of the script and my edited question, it needs to be this way.

Array got flushed after while loop within a filehandle

I got a problem with a Perl script.
Here is the code:
use strict;
use Data::Dumper;
my #data = ('a','b');
if(#data){
for(#data){
&debug_check($_)
}
}
print "#data";#Nothing is printed, the array gets empty after the while loop.
sub debug_check{
my $ip = shift;
open my $fh, "<", "debug.txt";
while(<$fh>){
print "$_ $ip\n";
}
}
Array data, in this example, has two elements. I need to check if the array has elements. If it has, then for each element I call a subroutine, in this case called "debug_check". Inside the subroutine I need to open a file to read some data. After I read the file using a while loop, the data array gets empty.
Why the array is being flushed and how do I avoid this strange behavior?
Thanks.
The problem here I think, will be down to $_. This is a bit of a special case, in that it's an alias to a value. If you modify $_ within a loop, it'll update the array. So when you hand it into the subroutine, and then shift it, it also updates #data.
Try:
my ( $ip ) = #_;
Or instead:
for my $ip ( #array ) {
debug_check($ip);
}
Note - you should also avoid using an & prefix to a sub. It has a special meaning. Usually it'll work, but it's generally redundant at best, and might cause some strange glitches.
while (<$fh>)
is short for
while (defined($_ = <$fh>))
$_ is currently aliased to the element of #data, so your sub is replacing each element of #data with undef. Fix:
while (local $_ = <$fh>)
which is short for
while (defined(local $_ = <$fh>))
or
while (my $line = <$fh>) # And use $line instead of $_ afterwards
which is short for
while (defined(my $line = <$fh>))
Be careful when using global variables. You want to localize them if you modify them.

Resources