Perl - Filter function for arrays - arrays

i am trying to create a subroutine that does the following :
Takes two arrays as input (Filter, Base)
Outputs only the values of the second array that do not exist in the first
Example :
#a = ( 1, 2, 3, 4, 5 );
#b = ( 1, 2, 3, 4, 5, 6, 7);
Expected output : #c = ( 6, 7 );
Called as : filter_list(#filter, #base)
###############################################
sub filter_list {
my #names = shift;
my #arrayout;
foreach my $element (#_)
{
if (!($element ~~ #names )){
push #arrayout, $element;
}
}
return #arrayout
}
Test Run :
#filter = ( 'Tom', 'John' );
#array = ( 'Tom', 'John', 'Mary' );
#array3 = filter_list(#filter,#array);
print #array3;
print "\n";
Result :
JohnJohnMary
Can anyone help? Thank you.

You can't pass arrays to subs, only scalars. So when you do
my #filtered = filter_list(#filter, #base);
you are really doing
my #filtered = filter_list($filter[0], $filter[1], ..., $base[0], $base[1], ...);
As such, when you do
my #names = shift;
you are really doing
my #names = $filter[0];
which is obviously wrong.
The simplest solution is to pass references to the arrays.
my #filtered = filter_list(\#filter, \#base);
A hash permits an efficient implementation (O(N+M)).
sub filter_list {
my ($filter, $base) = #_;
my %filter = map { $_ => 1 } #$filter;
return grep { !$filter{$_} } #$base;
}
Alternatively,
my #filtered = filter_list(\#filter, #base);
could be implemented as
sub filter_list {
my $filter = shift;
my %filter = map { $_ => 1 } #$filter;
return grep { !$filter{$_} } #_;
}

What you're looking for is the difference of two sets. This, along with union, intersection, and a bunch of others are set operations. Rather than writing your own, there's plenty of modules for dealing with sets.
Set::Object is very fast and featureful. I'd avoid using the operator interface (ie. $set1 - $set2) as it makes the code confusing. Instead use explicit method calls.
use strict;
use warnings;
use v5.10;
use Set::Object qw(set);
my $set1 = set(1, 2, 3, 4, 5);
my $set2 = set(1, 2, 3, 4, 5, 6, 7);
say join ", ", $set2->difference($set1)->members;
Note that sets are unordered and cannot contain duplicates. This may or may not be what you want.

This uses List::Compare, a module with a large collection of routines for comparing lists.
Here you want get_complement
use warnings;
use strict;
use List::Compare;
my #arr1 = ( 1, 2, 3, 4, 5 );
my #arr2 = ( 1, 2, 3, 4, 5, 6, 7);
my $lc = List::Compare->new(\#arr1, \#arr2);
my #only_in_second = $lc->get_complement;
print "#only_in_second\n";
The module has many options.
If you don't need the result sorted, pass -u to the constructor for faster operation.
There is also the "Accelerated Mode", obtained by passing -a. For the purpose of efficient repeated comparisons between the same arrays many things are precomputed at construction. With this flag that is suppressed, which speeds up single comparisons. See List::Compare Modes.
These two options can be combined, List::Compare->new('-u', '-a', \#a1, \#a2).
Operations on three or more lists are supported.
There is also the functional interface, as a separate List::Compare::Functional module.

Related

Creating an array of arrays for GD::Graph

All right, I'm trying to make an array of arrays in Perl for use with the GD::Graph module. And I can't figure out why the following is not valid array of arrays for GD::Graph.
my #t = (1, 2, 3, 4);
my #g = (2, 4, 5, 6);
my #data = (#t, #g);
I've also tried constructing the data like below, and it still does not like it.
my #data;
push #data, #t;
push #data, #g;
I want to keep my values in seperate arrays and just combine them for use with GD::Graph, because that is what I've found to be the easiest way, even if it is ugly. How would I go about creating a valid structure for use with GD::Graph, that is created on the fly?
It complains about it here.
my $gd = $graph->plot(\#data) or die $graph->error;
Looks like #data is just a single dimension array with 8 elements.
You can define array of arrays by using the anonymous array constructor []
my #data = (
[1, 2, 3, 4],
[2, 4, 5, 6]
);
For me using array references did the trick
my #t = (1, 2, 3, 4);
my #g = (2, 4, 5, 6);
my #data = (\#t, \#g);
and the plot the chart with fro example:
my $graph = new GD::Graph::lines(800,600 );
my $gd = $graph->plot( \#data );
open OUT, ">","whatever.png" or die "Couldn't open for output: $!";
binmode(OUT);
print OUT $gd->png( );
close OUT;

How to sum multiple arrays element-wise in Perl?

There is a question very similar to this already but I would like to do this for multiple arrays. I have an array of arrays.
my #AoA = (
$arr1 = [ 1, 0, 0, 0, 1 ],
$arr2 = [ 1, 1, 0, 1, 1 ],
$arr3 = [ 2, 0, 2, 1, 0 ]
);
I want to sum the items of all the three (or more) arrays to get a new one like
( 4, 1, 2, 2, 2 )
The use List::MoreUtils qw/pairwise/ requires two array arguments.
#new_array = pairwise { $a + $b } #$arr1, #$arr2;
One solution that comes to mind is to loop through #AoA and pass the first two arrays into the pairwise function. In the subsequent iterations, I will pass the next #$arr in #AoA and the #new_array into the pairwise function. In the case of an odd sized array of arrays, after I've passed in the last #$arr in #AoA, I will pass in an equal sized array with elements of 0's.
Is this a good approach? And if so, how do I implement this? thanks
You can easily implement a “n-wise” function:
sub nwise (&#) # ← take a code block, and any number of further arguments
{
my ($code, #arefs) = #_;
return map {$code->( do{ my $i = $_; map $arefs[$_][$i], 0 .. $#arefs } )}
0 .. $#{$arefs[0]};
}
That code is a bit ugly because Perl does not support slices of multidimensional arrays. Instead I use nested maps.
A quick test:
use Test::More;
my #a = (1, 0, 0, 0, 1);
my #b = (1, 1, 0, 1, 1);
my #c = (2, 0, 2, 1, 0);
is_deeply [ nwise { $_[0] + $_[1] + $_[2] } \#a, \#b, \#c], [4, 1, 2, 2, 2];
I prefer passing the arrays as references instead of using the \# or + prototype: This allows you to do
my #arrays = (\#a, \#b, \#c);
nwise {...} #arrays;
From List::MoreUtils you could have also used each_arrayref:
use List::Util qw/sum/;
use List::MoreUtils qw/each_arrayref/;
my $iter = each_arrayref #arrays;
my #out;
while (my #vals = $iter->()) {
push #out, sum #vals;
}
is_deeply \#out, [4, 1, 2, 2, 2];
Or just plain old loops:
my #out;
for my $i (0 .. $#a) {
my $accumulator = 0;
for my $array (#arrays) {
$accumulator += $array->[$i];
}
push #out, $accumulator;
}
is_deeply \#out, [4, 1, 2, 2, 2];
The above all assumed that all arrays were of the same length.
A note on your snippet:
Your example of the array structure is of course legal perl, which will even run as intended, but it would be best to leave out the inner assignments:
my #AoA = (
[ 1, 0, 0, 0, 1 ],
[ 1, 1, 0, 1, 1 ],
[ 2, 0, 2, 1, 0 ],
);
You might actually be looking for PDL, the Perl Data Language. It is a numerical array module for Perl. It has many functions for processing arrays of data. Unlike other numerical array modules for other languages it has this handy ability to use its functionality on arbitrary dimensions and it will do what you mean. Note that this is all done at the C level, so it is efficient and fast!
In your case you are looking for the projection method sumover which will take an N dimensional object and return an N-1 dimensional object created by summing over the first dimension. Since in your system you want to sum over the second we first have to transpose by exchanging dimensions 0 and 1.
#!/usr/bin/env perl
use strict;
use warnings;
use PDL;
my #AoA = (
[ 1, 0, 0, 0, 1 ],
[ 1, 1, 0, 1, 1 ],
[ 2, 0, 2, 1, 0 ],
);
my $pdl = pdl \#AoA;
my $sum = $pdl->xchg(0,1)->sumover;
print $sum . "\n";
# [4 1 2 2 2]
The return from sumover is another PDL object, if you need a Perl list you can use list
print "$_\n" for $sum->list;
Here's a simple iterative approach. It probably will perform terribly for large data sets. If you want a better performing solution you will probably need to change the data structure, or look on CPAN for one of the statistical packages. The below assumes that all arrays are the same size as the first array.
$sum = 0;
#rv = ();
for ($y=0; $y < scalar #{$AoA[0]}; $y++) {
for ($x=0; $x < scalar #AoA; $x++) {
$sum += ${$AoA[$x]}[$y];
}
push #rv, $sum;
$sum = 0;
}
print '('.join(',',#rv).")\n";
Assumptions:
each row in your AoA will have the same number of columns as the first row.
each value in the arrayrefs will be a number (specifically, a value in a format that "works" with the += operator)
there will be at least one "row" with sat least one "column"
Note: "$#{$AoA[0]}" means, "the index of the last element ($#) of the array that is the first arrayref in #AoA ({$AoA[0]})"
(shebang)/usr/bin/perl
use strict;
use warnings;
my #AoA = (
[ 1, 0, 0, 0, 1 ],
[ 1, 1, 0, 1, 1 ],
[ 2, 0, 2, 1, 0 ]
);
my #sums;
foreach my $column (0..$#{$AoA[0]}) {
my $sum;
foreach my $aref (#AoA){
$sum += $aref->[$column];
}
push #sums,$sum;
}
use Data::Dumper;
print Dumper \#sums;

How can I check if two arrays contain the same elements in Perl?

So all I need is a simple way to tell if two arrays are identical in perl. Order does not matter, so I'm looking for something like this:
my #a = (1, 2, 3);
my #b = (2, 3, 1);
my #c = (1, 2, 4);
&identical(#a, #b) returns 1
&identical(#a, #c) returns 0
Thanks!
You could tally the elements' counts in a hash. Have a (element => count) hash, and bump the count up every time the first array has that element, and down every time the other has it (or vice versa). If the two arrays have all the same elements, every value in the hash will be 0.
sub have_same_elements {
my ($arr1, $arr2) = #_;
my %counts = ();
$counts{$_} += 1 foreach (#$arr1);
$counts{$_} -= 1 foreach (#$arr2);
return !(grep { $_ != 0 } values %counts);
}
$a_and_b_same = have_same_elements(\#a, \#b); # will be true
$a_and_c_same = have_same_elements(\#a, \#c); # will be false
(Note, this might or might not work with objects that do their own stringification. Hash keys can't be references, so Perl stringifies references as you use them. Its default stringifier turns references into something like ARRAY(0x12345678), which makes references distinct unless they're to the same thingie. But if an object does its own stringification and doesn't return distinct strings for different references, this will probably break. Just so you know.)
If you're using Perl 5.10 or greater (and if you aren't, you really should upgrade), you can use the smart match operator:
use strict;
use warnings;
my #a = (1, 2, 3);
my #b = (2, 3, 1);
my #c = (1, 2, 4);
#sort each of them (numerically)
#a = sort { $a <=> $b } #a;
#b = sort { $a <=> $b } #b;
#c = sort { $a <=> $b } #c;
if ( #a ~~ #b ) {
print "\#a and \#b are the same! (after sorting)\n";
}
else {
print "nope\n";
}
if ( #a ~~ #c ) {
print "\#a and \#c are the same! (after sorting)\n";
}
else {
print "nope\n";
}
You could also roll your own function:
use strict;
use warnings;
my #a = (1, 2, 3);
my #b = (2, 3, 1);
my #c = (1, 2, 4);
print same_elements(\#a, \#b) . "\n";
print same_elements(\#a, \#c) . "\n";
#arguments are two array references
sub same_elements {
my $array_ref_1 = shift;
my $array_ref_2 = shift;
my #arr1 = #$array_ref_1;
my #arr2 = #$array_ref_2;
#If they are not the same length, we are done.
if( scalar(#arr1) != scalar(#arr2) ) {
return 0;
}
#sort them!
#arr1 = sort { $a <=> $b } #arr1;
#arr2 = sort { $a <=> $b } #arr2;
foreach my $i( 0 .. $#arr1 ) {
if ( $arr1[$i] != $arr2[$i] ) {
return 0;
}
}
return 1;
}
First off, you are going to have to rethink your function.
identical(#a, #b);
Doesn't pass two arrays to the function, but passes a single array with all of the elements in both arrays in it. It's as if you said:
identical(1, 2, 3, 2, 3, 1);
In order for your function to work, you'll have to pass references to your arrays:
identical(\#a, \#b);
I'd say to prototype your subroutine, but that's probably going to cause you more problems that it'll solve.
If order is not important, sort the arrays before comparing them. You might even be able to cheat...
sub identical {
my $array_ref_1 = shift;
my $array_fef_2 = shift;
use Digest::SHA qw(sha1_hex);
if ( ref( $array_ref_1 ) ne "ARRAY") or ( ref( $array_ref_2 ) ne "ARRAY") {
return; #Error, need two array references
}
# Dereference Arrays
my #array_1 = #{$array_ref_1};
my #array_2 = #{$array_ref_2};
# Setup Arrays to be one big scalar
my $scalar_1 = join "\n", sort #array_1;
my $scalar_2 = join "\n", sort #array_2;
my $checksum_1 = sha1_hex $scalar_1;
my $checksum_2 = sha1_hex $scalar_2;
if ($checksum_1 eq $checksum_2) {
return 1;
}
else {
return 0_but_true;
A few notes:
I could have dereferences, joined, generated the checksum, and did the comparison in a single statement. I did them separately in order to make it clearer what I was doing. Programmatically, it probably doesn't make any difference. Perl will optimize the whole thing anyway. I always go for clarity.
0_but_true returns a 0, but at the same time returns a true value. This way, you can do something like if ( identical( \#A, \#B ) ) { to make sure that the function worked. Then, you can test for zero or one.
Be sure to test your parameters. I used the ref function to do this.
I cheated. I first turned the two sorted arrays into scalars. Then, I used the sha1 checksum to verify that they're the same. A checksum using the sha1 function should be pretty good. It is highly unlikely to fail.
The real issue is what if you had multi-lined arrays like this:
#a = ("this", "that", "the\nother");
#b = ("this", "that\nthe", "other");
Using the join the way I did would cause the resulting scalars to be equal.
I guess you could write it like this in a way that makes the least assumptions on the kind of input you are dealing with (just pass the appropriate comparison sub):
use List::Util;
sub identical {
my #this = #{ +shift };
my #that = #{ +shift };
my $cmp = shift // sub { shift eq shift };
return '' unless #this == #that;
for my $idx (List::Util::shuffle keys #this) {
return '' unless $cmp->($this[$idx], $that[$idx]);
}
return 1;
}
which behaves like so:
0> identical([0..100], [0..100])
$res[0] = 1
1> identical([0..100], ['0.0', 1..100])
$res[1] = ''
2> identical([0..100], ['0.0', 1..100], sub {shift == shift})
$res[2] = 1
3> identical(['0.66666666666', 0..10], [2/3, 0..10], sub {shift == shift})
$res[3] = ''
4> identical(['0.66666666666', 0..10], [2/3, 0..10], sub {shift() - shift() < 1e-5})
$res[4] = 1
# if you need this to be true check out https://stackoverflow.com/a/12127428/13992
5> identical([0..100], [List::Util::shuffle(0..100)])
$res[5] = ''

Passing arrays to a subroutine that prints each array separately

I know that this is probably a simple fix, but I have not been able to find the answer through google and searching through the questions here.
My goal is to pass multiple arrays to a subroutine that simply iterates through each array separately and prints each array with something before and after it.
What I have:
#A1 = (1, 2, 3);
#A2 = (4, 5, 6);
printdata(#A1, #A2) ;
sub printdata {
foreach(#_) {
print "$_" ;
print "###"
}
}
What I am attempting to get is:
123###456###
Instead its treating both arrays as one and iterating through each variable in the array, Putting the separator after every variable vice the entire array.
1###2###3###etc.....
I am not sure how to get the subroutine to treat the arrays as separate rather than as one.
Any Help would be greatly appreciated!
You need to pass the arrays as references:
#A1 = (1, 2, 3);
#A2 = (4, 5, 6);
printdata(\#A1, \#A2) ;
sub printdata {
foreach(#_) {
print #$_ ;
print "###"
}
}
The sub call expands the arrays into a list of scalars, which is then passed to the sub within the #_ variable. E.g.:
printdata(#A1, #A2);
is equal to:
printdata(1,2,3,4,5,6);
See the section on "Pass by Reference" in perldoc perlsub.
use strict;
use warnings;
use English qw<$LIST_SEPARATOR>;
my #A1 = (1, 2, 3);
my #A2 = (4, 5, 6);
{ local $LIST_SEPARATOR = '';
my #a = map { "#$_" } \#A1, \#A2;
$LIST_SEPARATOR = '###';
print "#a\n";
}
You also could have used join (po-tay-to, po-tah-to).
my #a = map { join( '', #$_ ) } \#A1, \#A2;
print join( '###', #a ), "\n";

better way for converting multidimensional array to one dimensional array

Currently, I am using following code to convert an irregular multidimensional array into one dimensional array.
my $array = [0,
[1],
2,
[3, 4, 5],
[6,
[7, 8, 9 ],
],
[10],
11,
];
my #mylist;
getList($array);
print Dumper (\#mylist);
sub getList
{
my $array = shift;
return if (!defined $array);
if (ref $array eq "ARRAY")
{
foreach my $i (#$array)
{
getList($i);
}
}
else
{
print "pushing $array\n";
push (#mylist, $array);
}
}
This is based on recursion where I am checking each element. If element is a reference to an array then calling it recursively with new array.
Is there a better way to solve this kind of problem?
First of all your function should never return data by modifying a global variable. Return a list instead.
As for efficiency, Perl has a surprisingly large function call overhead. Therefore for large data structures I would prefer a non-recursive approach. Like so:
use Data::Dumper;
my $array = [
0,
[1],
2,
[3, 4, 5],
[6, [7, 8, 9 ]],
[10],
11,
];
my #mylist = get_list($array);
print Dumper (\#mylist);
sub get_list {
my #work = #_;
my #result;
while (#work) {
my $next = shift #work;
if (ref($next) eq 'ARRAY') {
unshift #work, #$next;
}
else {
push #result, $next;
}
}
return #result;
}
Note that the formatting that I am using here matches the recommendations of perlstyle. We all know the futility of arguing the One True Brace Style. But at the least I'm going to suggest that you reduce your 8 space indent. There is research into this, and code comprehension has been shown to be improved with indents in the 2-4 space range. Read Code Complete for details. It doesn't matter where you are in that range for young people, but older programmers whose eyesight is going find 4 a better indent. Read Perl Best Practices for more on that.
Use CPAN. Do not worry about recursion overhead until you know it is a problem.
#!/usr/bin/perl
use strict;
use warnings;
use List::Flatten::Recursive;
my $array = [
0,
[1],
2,
[3, 4, 5],
[6, [7, 8, 9 ]],
[10],
11,
];
my #result = flat($array);
print join(", ", #result), "\n";
It's generally better to replace recursion with iteration. For general techniques, see Higher Order Perl book (freely avaialble) chapter 5, in this case:
my #stack = ($array);
my #flattened;
while (#stack) {
my $first = shift #stack;
if (ref($first) eq ref([])) {
push #stack, #$first; # Use unshift to keep the "order"
} else {
push #flattened, $first;
}
}
The reason it's better is because recursive implementations:
Risk running into stack overflow if there are too many nested levels
Less efficient due to the cost of recursive calls
In generall this is the only way to do this.
You can optimize your code a little, by only caling getList() again, when you encounter a ArrayRef. If you find a regular value you can push it directly into #mylist instead of rerunning getList().
I've used this is in the past. This code is on the command line, but you can put the code in single quotes into your .pl file
$ perl -le'
use Data::Dumper;
my #array = ( 1, 2, 3, [ 4, 5, 6, [ 7, 8, 9 ] ], [ 10, 11, 12, [ 13, 14, 15 ] ], 16, 17, 18 );
sub flatten { map ref eq q[ARRAY] ? flatten( #$_ ) : $_, #_ }
my #flat = flatten #array;
print Dumper \#flat;
'

Resources