Why does my Perl sub receive the parameters of its parent? - arrays

Here is my test code :
#!/bin/perl
use strict;
use Array::Utils qw[array_minus];
sub sub1 {
    my @array1 = qw(1 2 3);
    my @array2 = qw(1 3 5);
    my @arrayMinus = array_minus(@array1, @array2);
    my @sortedArrayMinus = sort @arrayMinus;
    print "Result from array_minus + sort : " . join(",", @sortedArrayMinus) . "\n";
    my @sortedArrayMinus2 = sort array_minus(@array1, @array2);
    print "Result from sort array_minus : " . join(",", @sortedArrayMinus2) . "\n";
}
sub1("a","b");
When I run it, it gives the following result :
Result from array_minus + sort : 2
Can't use string ("b") as an ARRAY ref while "strict refs" in use at Array/Utils.pm line 123.
So, the second call to array_minus fails because of wrong parameters.
I'm using version 0.5 of the Array::Utils library (I've manually copied it from http://cpansearch.perl.org/src/ZMIJ/Array-Utils-0.5/Utils.pm)
The relevant lines of this file are :
sub array_minus(\@\@) {
    my %e = map{ $_ => undef } @{$_[1]};
    return grep( ! exists( $e{$_} ), @{$_[0]} );
}
If I debug the value of @_ in array_minus, its value is OK for the first call, but it's [ 'a', 'b' ] for the second call.
So it behaves as if array_minus sub receives the parameters of sub1, instead of the ones I passed, but only when I also ask to sort the result on the same line. What's wrong in my code?
I'm using Perl 5.22.1.

This expression:
sort array_minus(@array1, @array2)
really means "sort the list resulting from concatenating @array1 and @array2, using array_minus as the comparison function".
As explained in perldoc -f sort:
Warning: syntactical care is required when sorting the list returned from a function. If you want to sort the list returned by the function call find_records(@key), you can use:
my @contact = sort { $a cmp $b } find_records @key;
my @contact = sort +find_records(@key);
my @contact = sort &find_records(@key);
my @contact = sort(find_records(@key));
... because otherwise you're hitting the sort SUBNAME LIST syntax (which exists for historical reasons: perl had sort long before it supported subroutine references).

Read the warning in sort and fix the syntax:
my @sortedArrayMinus2 = sort(array_minus(@array1, @array2));
The original syntax was telling sort to use array_minus as the comparison function.
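The working forms can be checked with a small script. This is only a sketch: list_minus is a hypothetical stand-in for Array::Utils::array_minus (it takes array references explicitly, so neither the CPAN module nor a prototype is needed), and the sample data is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for array_minus: elements of @$left that are not in @$right.
sub list_minus {
    my ($left, $right) = @_;
    my %seen = map { $_ => 1 } @$right;
    return grep { !$seen{$_} } @$left;
}

my @array1 = qw(3 1 2);
my @array2 = qw(1 5);

# Parentheses directly after the sub name, inside sort(...), make it a plain function call.
my @with_parens = sort(list_minus(\@array1, \@array2));

# A leading + also keeps sort from reading the name as SUBNAME.
my @with_plus = sort +list_minus(\@array1, \@array2);

# An explicit comparison block removes the ambiguity as well.
my @with_block = sort { $a <=> $b } list_minus(\@array1, \@array2);

print "@with_parens\n";   # 2 3
print "@with_plus\n";     # 2 3
print "@with_block\n";    # 2 3
```

All three forms sort the list returned by the call instead of treating the sub as a comparison routine.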

Related

How to modify array elements using subroutine in Perl

I am trying to modify an array passed to a subroutine.
I am passing an array reference to the subroutine and assigning new values but it is not getting reflected in the caller side.
Below is my program.
sub receiveArray {
    my $arrayref = @_;
    @{$arrayref} = ( 4, 5, 6 );
}
@ar = ( 1, 2, 3 );
print "Values of the function before calling the function\n";
foreach my $var ( @ar ) {
    print $var;
    print "\n";
}
receiveArray(\@ar);
print "Values of the function after calling the function\n";
foreach my $var ( @ar ) {
    print $var;
    print "\n";
}
What is the problem in the above code?
You should start every Perl file you write with use strict; use warnings;. That will help you avoid errors like this.
The problem is in this line:
my $arrayref = @_;
You're assigning an array to a scalar, so the array is evaluated in scalar context, which yields the number of elements in the array.
What you should do instead is:
my ($arrayref) = @_;
Now it's using list assignment, putting the first function argument into $arrayref (and ignoring the rest, if any).
List assignment is documented in perldoc perldata (the part starting with "Lists may be assigned to ...").
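The difference between the two assignments can be seen side by side in a short sketch. The sub names count_args, first_arg and fill_array are made up for illustration.

```perl
use strict;
use warnings;

# Scalar assignment: @_ is evaluated in scalar context, giving the argument count.
sub count_args {
    my $count = @_;
    return $count;
}

# List assignment: the parentheses make the first argument land in $first.
sub first_arg {
    my ($first) = @_;
    return $first;
}

# With the reference unpacked correctly, the caller's array is modified in place.
sub fill_array {
    my ($arrayref) = @_;
    @{$arrayref} = (4, 5, 6);
    return;
}

my @ar = (1, 2, 3);
print count_args(\@ar), "\n";       # 1 (the number of arguments, not the reference)
print ref(first_arg(\@ar)), "\n";   # ARRAY
fill_array(\@ar);
print "@ar\n";                      # 4 5 6
```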

Perl matching multidimensional array elements

I'm not getting any output. Can anyone see where the issue lies:
in the matching or in the calling?
(The two subarrays in the multidimensional array have the same length.)
# Multidimensional array,
# Idarray = Fasta ID, Seqarray = "ATTGTTGGT" sequences
@ordarray = (\@idarray, \@seqarray);
# This calling works
print $ordarray[0][0] , "\n";
print $ordarray[1][0] , "\n", "\n";
# Ordarray output = "TTGTGGCACATAATTTGTTTAATCCAGAT....."
User inputs a search string, loop iterates the sequence dimension,
and counts amount of matches. Prints number of matches and the corresponding ID from the ID dimension.
# The user input-searchstring
$sestri = <>;
for($r=0;$r<@idarray;$r++) {
    if ($sestri =~ $ordarray[1][$r] ){
        print $ordarray[0][$r] , "\n";
        $counts = () = $ordarray[0][$r] =~ /$sestri/g;
        print "number of counts: ", $counts ;
    }
}
I think the problem lies with this:
$sestri = <>;
That may well not be doing what you intended - your comment says "user specified search string" but that's not what that operator does.
What it does is read from the file(s) named on the command line (or from STDIN if none were given) and return the first line, trailing newline included.
I would suggest that if you want to grab a search string from the command line, you do it via @ARGV.
E.g.
my ( $sestri ) = @ARGV; # will give first word.
However, please please please switch on use strict and use warnings. You should always do this prior to posting on a forum for assistance.
I would also question quite why you need a two dimensional array with two elements in it though. It seems unnecessary.
Why not instead make a hash, and key your "fasta ids" to the sequence?
E.g.
my %id_of;
@id_of{@seqarray} = @idarray;
my %seq_of;
@seq_of{@idarray} = @seqarray;
I think this would suit your code a bit better, because then you don't have to worry about the array indices at all.
use strict;
use warnings;
my ($sestri) = @ARGV;
my %id_of;
@id_of{@seqarray} = @idarray;
foreach my $sequence ( keys %id_of ) {
    ## NB - this is a pattern match, and will be 'true'
    ## if $sestri is a substring of $sequence
    if ( $sequence =~ m/$sestri/ ) {
        print $id_of{$sequence}, "\n";
        my $count = () = $sequence =~ m/$sestri/g;
        print "number of counts: ", $count, "\n";
    }
}
I've rewritten it a bit, because I don't entirely understand what your code is doing. It looks like it's substring matching in @seqarray but then returning the count of matching elements in @idarray. I don't think that makes sense, but if it does, then amend according to your needs.
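For something that runs on its own, here is a minimal sketch of the same counting idea. The IDs and sequences are made-up sample data, count_matches is a hypothetical helper, and the hash is keyed by ID rather than by sequence on the assumption that IDs are unique while sequences may repeat.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the FASTA IDs and sequences.
my @idarray  = qw(seq1 seq2 seq3);
my @seqarray = qw(ATTGTTGGT GGGCCC TTGTTG);

# Hash slice: pair each ID with its sequence.
my %seq_of;
@seq_of{@idarray} = @seqarray;

# Count non-overlapping occurrences of $search in $sequence.
# \Q...\E makes the search string match literally rather than as a regex.
sub count_matches {
    my ($sequence, $search) = @_;
    my $count = () = $sequence =~ /\Q$search\E/g;
    return $count;
}

my $search = 'TTG';
for my $id (sort keys %seq_of) {
    my $count = count_matches($seq_of{$id}, $search);
    print "$id: number of counts: $count\n" if $count;
}
```

With this data the loop reports two matches each for seq1 and seq3 and skips seq2.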

Perl: correctly print array of arrays (dereference)

Hey fellow perl monks,
I'm still wrapping my head around how to correctly dereference. (I read the similar posts prior to posting, but unfortunately am still a bit cloudy on the concept.)
I have the following array, which internally is composed of two arrays. (BTW, I am using strict and warning pragmas.)
use strict; use warnings;
my @a1; my @a2;
where:
@a1 = ( "1MB", "2MB", ... )
and..
@a2 = ( "/home", "/home/debug", ... )
Both @a1 and @a2 are arrays which contain 51 rows. So, I populate these into my 2nd array.
my @b;
push (@b, [ @a1, @a2 ]);
However, when I try to print the results of @b:
sub newl { print "\n"; print "\n"; }
my $an1; my @an1;
$an1 = $#a1;
@an1 = ( 0, 1..$an1 );
for my $i (@an1) { print @b[$i]; &newl; }
I see references to the arrays:
ARRAY(0x81c0a10)
.
ARRAY(0x81c0a50)
.
.
.
How do I properly print this array? I know I need to dereference the array, I'm not sure how to go about doing this. I tried populating my array as such:
push (@b, [ \@a1, \@a2 ]);
Which produces the same results. I also tried:
for my $i (@an1) { print @{$b[$i]}; &newl; }
Which unfortunately errors due to having 0 as an array reference?
Can't use string ("0") as an ARRAY ref while "strict refs" in use at p_disk_ex6.pl line 42.
Any suggestions are greatly appreciated!
A short example program, which might help you:
use strict;
use warnings;
my @a1 = qw(1MB 2MB 10MB 7MB);
my @a2 = qw(/foo /bar /flub /blub);
my @b = (\@a1, \@a2);
# equivalent long version:
# my @b = ();
# $b[0] = \@a1;
# $b[1] = \@a2;
for (my $i = 0; $i <= $#a2; $i++) {
    print "a1[$i]: $b[0][$i]\n";
    print "a2[$i]: $b[1][$i]\n";
    print "\n";
}
In your example you were pushing an anonymous arrayref [] into @b. Therefore $b[0] contained the arrayref.
my @b;
push (@b, [ \@a1, \@a2 ]);
# this corresponds to:
# $b[0][0] = \@a1;
# $b[0][1] = \@a2;
In the example where you wrote [@a1, @a2] you were creating an arrayref which contained the joined arrays @a1 and @a2 (first all elements of @a1, and then all elements of @a2):
my @b;
push(@b , [@a1, @a2]);
# $b[0] = ['1MB' , '2MB', '10MB', '7MB', '/foo', '/bar', '/flub', '/blub']
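The three ways of combining the arrays can be compared by counting top-level elements. The sketch below uses made-up sample data, and paired_lines is a hypothetical helper that assumes both arrays have the same length.

```perl
use strict;
use warnings;

my @a1 = qw(1MB 2MB 10MB);
my @a2 = qw(/home /home/debug /tmp);

my @flat   = (@a1, @a2);        # one flat list of six strings
my @rows   = (\@a1, \@a2);      # two array references
my @nested = ([ \@a1, \@a2 ]);  # one reference to an array holding two references

printf "flat: %d, rows: %d, nested: %d\n",
    scalar @flat, scalar @rows, scalar @nested;

# Hypothetical helper: pair up the elements of two equally long arrays.
sub paired_lines {
    my ($sizes, $paths) = @_;
    return map { "$sizes->[$_] $paths->[$_]" } 0 .. $#{$sizes};
}

print "$_\n" for paired_lines(@rows);
```

Only the two-reference form gives $rows[0][$i] and $rows[1][$i] the meaning "element $i of the first or second array".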
Even simpler, this also works:
use strict;
use warnings;
my @a1 = qw(1MB 2MB 10MB 7MB);
my @a2 = qw(/foo /bar /flub /blub);
my @b = (@a1, @a2);
print "@b";
If you want a general solution that doesn't assume how many elements there are in each of the sub-arrays, and which also allows arbitrary levels of nesting, you're better off using packages that someone else has already written for displaying recursive data structures. A particularly prevalent one is YAML, which you can install if you don't already have it by running cpan:
$ cpan
Terminal does not support AddHistory.
cpan shell -- CPAN exploration and modules installation (v1.9800)
Enter 'h' for help.
cpan[1]> install YAML
Then you can display arbitrary data structures easily. To demonstrate with a simple example:
use YAML;
my @a1 = qw(1MB 2MB 10MB 7MB);
my @a2 = qw(/foo /bar /flub /blub);
my @b = (\@a1, \@a2);
print Dump(\@b);
results in the output
---
-
  - 1MB
  - 2MB
  - 10MB
  - 7MB
-
  - /foo
  - /bar
  - /flub
  - /blub
For a slightly more complicated example
my @b = (\@a1, \@a2,
         { a => 0, b => 1 } );
gives
---
-
  - 1MB
  - 2MB
  - 10MB
  - 7MB
-
  - /foo
  - /bar
  - /flub
  - /blub
- a: 0
  b: 1
To read this, the three "-" characters in column 1 indicate an array with three elements.
The first two elements have four sub elements each (the lines with "-" in column 3). The
third outer element is a hash reference, since it is made up of "key: value" pairs.
A nice feature about YAML is that you can use it to dump any recursive data structure into a file, except those with subroutine references, and then read it back later using Load.
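If installing a CPAN module is not an option, a similar display can be had from Data::Dumper, which ships with Perl. Note that this swaps in a different module from the one recommended above, and its output is Perl syntax rather than YAML; the two configuration settings are just one reasonable choice.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent   = 1;   # compact, fixed-width indentation
$Data::Dumper::Sortkeys = 1;   # stable ordering of hash keys

my @a1 = qw(1MB 2MB 10MB 7MB);
my @a2 = qw(/foo /bar /flub /blub);
my @b  = (\@a1, \@a2, { a => 0, b => 1 });

# Dumper returns the structure as a string of Perl code starting with "$VAR1 = ".
my $text = Dumper(\@b);
print $text;
```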
If you really have to roll your own display routine, that is certainly possible, but you'll have a much easier time if you write it recursively. You can check whether your argument is an array reference or a hash reference (or a scalar reference) by using ref:
my @a1 = qw(1MB 2MB 10MB 7MB);
my @a2 = qw(/foo /bar /flub /blub);
my @b = (\@a1, \@a2,
         { a => 0, b => 1 } );
print_recursive(\@b);
print "\n";
sub print_recursive {
    my ($obj) = @_;
    if (ref($obj) eq 'ARRAY') {
        print "[ ";
        for (my $i=0; $i < @$obj; $i++) {
            print_recursive($obj->[$i]);
            print ", " if $i < $#$obj;
        }
        print " ]";
    }
    elsif (ref($obj) eq 'HASH') {
        print "{ ";
        my @keys = sort keys %$obj;
        for (my $i=0; $i < @keys; $i++) {
            print "$keys[$i] => ";
            print_recursive($obj->{$keys[$i]});
            print ", " if $i < $#keys;
        }
        print " }";
    }
    else {
        print $obj;
    }
}
which produces the output
[ [ 1MB, 2MB, 10MB, 7MB ], [ /foo, /bar, /flub, /blub ], { a => 0, b => 1 } ]
I have not written my example code to worry about pretty-printing, nor does it
handle scalar, subroutine, or blessed object references, but it should give you the idea of how you can write a fairly general recursive data structure dumper.

How to retrieve an array from a hash that has been passed to a subroutine in perl

I am trying to write a subroutine that takes in a hash of arrays as an argument. However, when I try to retrieve one of the arrays, I seem to get the size of the array instead of the array itself.
my(%hash) = ( );
$hash{"aaa"} = ["blue", 1];
_subfoo("test", %hash);
sub _subfoo {
    my($test ,%aa) = @_;
    foreach my $name (keys %aa) {
        my @array = @{$aa{$name}};
        print $name. " is ". @array ."\n";
    }
}
This returns 2 instead of (blue, 1) as I expected. Is there some other way to handle arrays in hashes when in a subroutine?
Apologies if this is too simple for stack overflow, first time poster, and new to programming.
You're putting your @array array into a scalar context right here:
print $name. " is ". @array ."\n";
An array in scalar context gives you the number of elements in the array and @array happens to have 2 elements. Try one of these instead:
print $name . " is " . join(', ', @array) . "\n";
print $name, " is ", @array, "\n";
print "$name is @array\n";
and you'll see the elements of your @array. Using join lets you paste the elements together as you please; the second one evaluates @array in list context and will mash the values together without separating them; the third interpolates @array by joining its elements together with $" (which is a single space by default).
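The same contexts can be captured in strings to compare them directly. This is a small sketch using the question's sample values; the variable names are made up.

```perl
use strict;
use warnings;

my @array = ('blue', 1);

my $counted      = "aaa is " . @array;              # concatenation forces scalar context
my $joined       = "aaa is " . join(', ', @array);  # explicit separator
my $interpolated = "aaa is @array";                 # elements joined with $" (a space by default)

print "$counted\n";        # aaa is 2
print "$joined\n";         # aaa is blue, 1
print "$interpolated\n";   # aaa is blue 1
```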
As mu is too short has mentioned, you used the array in scalar context, and therefore it returned its length instead of its elements. I had some other pointers about your code.
Passing arguments by reference is sometimes a good idea when some of those arguments are arrays or hashes. The reason for this is that arrays and hashes are expanded into lists before being passed to the subroutine, which makes something like this impossible:
foo(@bar, @baz);
sub foo { # This will not work
    my (@array1, @array2) = @_; # All the arguments will end up in @array1
    ...
}
This will work, however:
foo(\@bar, \@baz);
sub foo {
    my ($aref1, $aref2) = @_;
    ...
}
You may find that in your case, each is a nice function for your purposes, as it will make dereferencing the array a bit neater.
foo("test", \%hash); # note the backslash to pass by reference
sub foo {
    my ($test, $aa) = @_; # note use of scalar $aa to store the reference
    while (my ($key, $value) = each %$aa) { # note dereferencing of $aa
        print "$key is @$value\n"; # ...and $value
    }
}

Perl: mapping to lists' first element

Task: to build a hash using map, where the keys are the elements of the given array @a, and the values are the first elements of the list returned by some function f($element_of_a):
my @a = (1, 2, 3);
my %h = map {$_ => (f($_))[0]} @a;
All is okay until f() returns an empty list (that's absolutely correct for f(), and in that case I'd like to assign undef). The error can be reproduced with the following code:
my %h = map {$_ => ()[0]} @a;
The error itself reads "Odd number of elements in hash assignment". When I rewrite the code like this:
my @a = (1, 2, 3);
my $s = ()[0];
my %h = map {$_ => $s} @a;
or
my @a = (1, 2, 3);
my %h = map {$_ => undef} @a;
Perl does not complain at all.
So how should I resolve this — get first elements of list returned by f(), when the returned list is empty?
Perl version is 5.12.3
Thanks.
I've just played around a bit, and it seems that ()[0], in list context, is interpreted as an empty list rather than as an undef scalar. For example, this:
my @arr = ()[0];
my $size = @arr;
print "$size\n";
prints 0. So $_ => ()[0] is roughly equivalent to just $_.
To fix it, you can use the scalar function to force scalar context:
my %h = map {$_ => scalar((f($_))[0])} @a;
or you can append an explicit undef to the end of the list:
my %h = map {$_ => (f($_), undef)[0]} @a;
or you can wrap your function's return value in a true array (rather than just a flat list):
my %h = map {$_ => [f($_)]->[0]} @a;
(I like that last option best, personally.)
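Here is a quick sketch that runs all three variants against the same input. The f() below is a made-up stand-in that returns an empty list for even numbers and two values otherwise.

```perl
use strict;
use warnings;

# Hypothetical f(): an empty list for even numbers, two values otherwise.
sub f {
    my ($n) = @_;
    return $n % 2 ? ($n * 10, $n * 100) : ();
}

my @a = (1, 2, 3);

my %by_scalar = map { $_ => scalar((f($_))[0]) } @a;   # force scalar context
my %by_undef  = map { $_ => (f($_), undef)[0] } @a;    # pad the list with undef
my %by_aref   = map { $_ => [f($_)]->[0] } @a;         # index into a real array

for my $key (@a) {
    my $value = $by_aref{$key};
    print "$key => ", (defined $value ? $value : 'undef'), "\n";
}
```

Each hash ends up with three keys, and the key whose f() call returned nothing maps to undef.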
The special behavior of a slice of an empty list is documented under “Slices” in perldata:
A slice of an empty list is still an empty list. […] This makes it easy to write loops that terminate when a null list is returned:
while ( ($home, $user) = (getpwent)[7,0]) {
printf "%-8s %s\n", $user, $home;
}
I second Jonathan Leffler's suggestion - the best thing to do would be to solve the problem from the root if at all possible:
sub f {
    # ... process @result
    return @result ? $result[0] : undef ;
}
The explicit undef is necessary for the empty list problem to be circumvented.
First of all, many thanks to everyone who replied! Now I feel that I should provide the actual details of the real task.
I'm parsing an XML file containing a set of elements, each of which looks like this:
<element>
<attr_1>value_1</attr_1>
<attr_2>value_2</attr_2>
<attr_3></attr_3>
</element>
My goal is to create a Perl hash for each element that contains the following keys and values:
('attr_1' => 'value_1',
'attr_2' => 'value_2',
'attr_3' => undef)
Let's take a closer look at the <attr_1> element. The XML::DOM::Parser CPAN module that I use for parsing creates an object of class XML::DOM::Element for it; let's call the reference to it $attr. The element's name is easily obtained with $attr->getNodeName, but to access the text enclosed in the <attr_1> tags one first has to fetch all of <attr_1>'s child nodes:
my @child_ref = $attr->getChildNodes;
For the <attr_1> and <attr_2> elements, ->getChildNodes returns a list containing exactly one reference (to an object of class XML::DOM::Text), while for <attr_3> it returns an empty list. For <attr_1> and <attr_2> I should get the value via $child_ref[0]->getNodeValue, while for <attr_3> I should place undef into the resulting hash, since there are no text nodes there.
So you see that the implementation of the f function (the ->getChildNodes method in real life) could not be controlled :-) The resulting code that I wrote is (the subroutine is given a list of XML::DOM::Element references for the elements <attr_1>, <attr_2>, and <attr_3>):
sub attrs_hash(@)
{
    my @keys = map {$_->getNodeName} @_; # got ('attr_1', 'attr_2', 'attr_3')
    my @child_refs = map {[$_->getChildNodes]} @_; # got 3 refs to list of XML::DOM::Text objects
    my @values = map {@$_ ? $_->[0]->getNodeValue : undef} @child_refs; # got ('value_1', 'value_2', undef)
    my %hash;
    @hash{@keys} = @values;
    %hash;
}
