Clustering By Interval Via Hash of Array in Perl - arrays

I have a data that looks like this
#Status value
TP 5.000
TP 3.000
TP 3.000
TN 10.000
TP 2.000
TP 9.000
TN 1.000
TP 9.000
TN 1.000
What we want to do is to cluster the Status based on the given interval in value.
Let that interval be 1-3, 4-6, 7-9, 10-12, etc .. (i.e. Bin size 3).
We hope to get the hash of array like this:
my %hoa = (
'1-3' => [TP,TP,TP,TN,TN],
'4-6' => [TP],
'7-9' => [TP,TP],
'10-12' => [TN]);
What's the way to achieve that?
Update: Corrected the HoA for 7-9, thanks to ysth.

Abstracting away the code to determine interval:
sub interval {
my ($val) = #_;
my $i = int( ( $val + 2 ) / 3 );
my $interval = sprintf( '%d-%d', $i * 3 -2, $i * 3 );
return $interval;
}
my %hoa;
while ( my $line = <> ) {
next if $line =~ /^#/;
my ($status, $value) = split ' ', $line;
push #{ $hoa{ interval($value) } }, $status;
}
use Data::Dumper;
print Dumper \%hoa;
(which gets two TPs for 7-9, not one as you show).

ysth's answer was the first thing that occurred to me as well, and I think he has the right approach.
I'd just like to leave a suggestion: you could use a clustering algorithm to do this for you in a future-proof kind of way (say, when your data becomes multidimensional). K-means, for example, would work fine, even for 1D data such as yours.
For example:
use strict; use warnings;
use Algorithm::KMeans;
my $datafile = $ARGV[0] or die;
my $K = $ARGV[1] or 0;
my $mask = 'N1';
my $clusterer = Algorithm::KMeans->new(
datafile => $datafile,
mask => $mask,
K => $K,
terminal_output => 0,
);
$clusterer->read_data_from_file();
my ($clusters, $cluster_centers) = $clusterer->kmeans();
my %clusters;
while (#$clusters) {
my $cluster = shift #$clusters;
my $center = shift #$cluster_centers;
$clusters{"#$center"} = $cluster;
}
use YAML; print Dump \%clusters;

Related

Is there any function in Perl that shifts the array element without removing them?

I have an array, say #array1 = qw(abc def ghi jkl).
Now, I want to use this array in a way that elements are shifted 1 by 1, but that shifting takes place virtually, and not in the array.
Like, "shift" will shift the elements and remove them from the array. But, I don't want those elements to be removed.
Short Code Snippet:
while (my $rName = shift #array1) {
my $bName = shift #array1 ;
## Do something now with the value
}
##And now, I want that I can use #array1 again with the original elements residing
How can it be implemented?
In Perl 5.36 you'll be able to do this:
for my ($rName, $bName) (#array1) { ... }
Use a C-style for loop and increment by two. $#foo is the index of the last element.
my #foo = 0 .. 5;
for (my $i = 0; $i <= $#foo; $i += 2){
my $r_name = $foo[$i];
my $b_name = $foo[$i+1];
}
If you wanted fancier-looking code, you could use natatime from List::MoreUtils on CPAN, which gives you an iterator that you can use in a while loop.
use List::MoreUtils 'natatime';
my #foo = 0 .. 5;
my $it = natatime 2, #foo;
while ( my ($r_name, $b_name) = $it->() ) {
print "$r_name $b_name\n";
}
You can also use pairs from the core List::Util module:
A convenient shortcut to operating on even-sized lists of pairs, this function returns a list of ARRAY references, each containing two items from the given list.
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;
use List::Util qw/pairs/;
my #array1 = qw/a 1 b 2 c 3/;
for my $pair (pairs #array1) {
my ($rName, $bName) = #$pair;
say "$rName => $bName";
}
say "#array1";
# Require Perl 5.36
for my ( $rName, $bName ) ( #array1 ) {
...
}
my #array2 = #array1;
while ( #array2 ) {
my $rName = shift #array2;
my $bName = shift #array2;
...
}
for my $i ( 0 .. $#array1/2 ) {
my $rName = $array1[ $i * 2 + 0 ];
my $bName = $array1[ $i * 2 + 1 ];
...
}
for ( my $i = 0; $i < #array1; ) {
my $rName = $array1[ $i++ ];
my $bName = $array1[ $i++ ];
...
}
use List::Util qw( pairs );
for ( pairs #array1 ) {
my ( $rName, $bName ) = #$_;
...
}
I tried to organize them from fastest to slowest, but I didn't actually benchmark anything.
Three of the solutions were previously mentioned, but noone mentioned what I think is the fastest solution that doesn't require 5.36 (which hasn't been released yet).

For Loop Issues in creating nested array

Creating a matrix of products for three element arrays. I understand Perl does not have multi-dimensional arrays and are flattened. I have been using refs but I can't seem to get past the for loop issue in getting three products into a single array and pushing that array into a different single array. And I could be way off too. Be nice, but I've spent too many hours on this.
I have moved values inside and out of various places i.e. { }, printed out variables until I'm blue and used $last all over for debugging. I'm likely fried at this point.
use strict;
use warnings;
my #array1 = (1, 2, 3);
my #array2 = (2, 4, 6);
my #matrixArray = ();
my $matrixArray;
my #row;
my #finalArray = maths(\#array1, \#array2);
print #finalArray;
sub maths{
my $array1ref = shift;
my $array2ref = shift;
my $value1;
my $value2;
my $maths;
my #row = ();
my #array1 = #{$array1ref};
my #array2 = #{$array2ref};
my $len1 = #array1;
my $len2 = #array2;
for my $x (0 ..($len1 -1)){
#iterate through first array at each value
$value1 = $array1[$x];
#print $value1, " value1 \n";
for my $y (0 .. ($len2 -1)){
#iterate through second array at each value
$value2 = $array2[$y];
#print $value2, " value2 \n";
#calculate new values
$maths = $value1 * $value2;
#exactly right here
#print $maths, " maths \n" ;
push #row, $maths;
}
}
#and exactly right here but not set of arrays
#print #row, "\n";
return #row;
}
Currently I'm able to get this: 246481261218. Which is the correct dumb math but...
it should appear as a matrix:
2 4 6
4 8 12
6 12 18
I am not passing three arrays so it seems my issue is up in the sub routine before I can get on with anything else. This seems to be a theme that I often miss. So sorry if I sound inept.
EDIT***
This was working but I couldn't unpack it
use strict;
use warnings;
my #array1 = (1, 2, 3);
my #array2 = (2, 4, 6);
my #matrixArray = ();
maths(\#array1, \#array2);
foreach my $x (#matrixArray){
print "$x \n";
}
sub maths{
my $array1ref = shift;
my $array2ref = shift;
my $value1;
my $value2;
my $maths;
my #row = ();
my $row;
my #array1 = #{$array1ref};
my #array2 = #{$array2ref};
my $len1 = #array1;
my $len2 = #array2;
for my $x (0 ..($len1 -1)){
#iterate through first array at each value
$value1 = $array1[$x];
for my $y (0 .. ($len2 -1)){
#iterate through second array at each value
$value2 = $array2[$y];
#calculate new values
$maths = $value1 * $value2;
push #row, $maths;
$row = \#row;
}
push #matrixArray, $row;
}
return #matrixArray;
}
The output right after the function call is this:
ARRAY(0x55bbe2c667b0)
ARRAY(0x55bbe2c667b0)
ARRAY(0x55bbe2c667b0)
which would be the (line 10) print of $x.
****EDIT
This Works (almost):
print join(" ", #{$_}), "\n" for #matrixArray;
Output is a bit wrong...
2 4 6 4 8 12 6 12 18
2 4 6 4 8 12 6 12 18
2 4 6 4 8 12 6 12 18
And of note: I knew $x was an array but I seemed to run into trouble trying to unpack it correctly. And I'm no longer a fan of Perl. I'm pining for the fjords of Python.
And *****EDIT
This is working great and I get three arrays out of it:
sub maths{
my ($array1, $array2) = #_;
my #res;
for my $x (#$array1) {
my #row;
for my $y (#$array2) {
push #row, $x * $y;
}
push #res, \#row;
}
#This is the correct structure on print #res!
return #res;
}
But, though it's putting it together correctly, I have no output after the call
maths(\#array1, \#array2);
NOTHING HERE...
print #res;
print join(" ", #{$_}), "\n" for #res;
foreach my $x (#res){
print join(" ", #{$x}), "\n";
}
And of course a million thanks! I regret taking this stupid course and fear my grade will eventually do me in. Still pining for Python!
It appears that you need a matrix with rows obtained by multiplying an array by elements of another.
One way
use warnings;
use strict;
use Data::Dump qw(dd);
my #ary = (2, 4, 6);
my #factors = (1, 2, 3);
my #matrix = map {
my $factor = $_;
[ map { $_ * $factor } #ary ]
} #factors;
dd #matrix;
The array #matrix, formed by the outer map, has array references for each element and is thus (at least) a two-dimensional structure (a "matrix"). Those arrayrefs are built with [ ], which creates an anonymous array out of a list inside. That list is generated by map over the #ary.
I use Data::Dump to nicely print complex data. In the core there is Data::Dumper.
With a lot of work like this, and with large data, efficiency may matter. The common wisdom would have it that direct iteration should be a bit faster than map, but here is a benchmark. This also serves to show more basic ways as well.
use warnings;
use strict;
use feature 'say';
use Benchmark qw(cmpthese);
my $runfor = shift // 5; # run each case for these many seconds
sub outer_map {
my ($ary, $fact) = #_;
my #matrix = map {
my $factor = $_;
[ map { $_ * $factor } #$ary ]
} #$fact;
return \#matrix;
}
sub outer {
my ($ary, $fact) = #_;
my #matrix;
foreach my $factor (#$fact) {
push #matrix, [];
foreach my $elem (#$ary) {
push #{$matrix[-1]}, $elem * $factor;
}
}
return \#matrix;
}
sub outer_tmp {
my ($ary, $fact) = #_;
my #matrix;
foreach my $factor (#$fact) {
my #tmp;
foreach my $elem (#$ary) {
push #tmp, $elem * $factor;
}
push #matrix, \#tmp;
}
return \#matrix;
}
my #a1 = map { 2*$_ } 1..1_000; # worth comparing only for large data
my #f1 = 1..1_000;
cmpthese( -$runfor, {
direct => sub { my $r1 = outer(\#a1, \#f1) },
w_tmp => sub { my $r2 = outer_tmp(\#a1, \#f1) },
w_map => sub { my $r3 = outer_map(\#a1, \#f1) },
});
On a nice machine with v5.16 this prints
Rate direct w_map w_tmp
direct 11.0/s -- -3% -20%
w_map 11.4/s 3% -- -17%
w_tmp 13.8/s 25% 21% --
The results are rather similar on v5.29.2, and on an oldish laptop.
So map is a touch faster than building a matrix directly, and 15-20% slower than the method using a temporary array for rows, which I'd also consider clearest. The explicit loops can be improved a little by avoiding scopes and scalars, and the "direct" method can perhaps be sped up some by using indices. But these are dreaded micro-optimizations, and for fringe benefits at best.
Note that timings such as these make sense only with truly large amounts of data, what the above isn't. (I did test with both dimensions ten times as large, with very similar results.)
The second program is mostly correct.
The problem is that you didn't unpack the second level of the array.
foreach my $x (#matrixArray){
print "$x \n";
}
should be something like:
foreach my $x (#matrixArray) {
print join(" ", #{$x}), "\n";
}
# or just:
print join(" ", #{$_}), "\n" for #matrixArray;
Your maths function can be made shorter without losing legibility (it may actually make it more legible) by cutting out unnecessary temporary variables and indexing. For example:
sub maths {
my #array1 = #{ $_[0] };
my #array2 = #{ $_[1] }; # or: ... = #{ (shift) };
my #res = ();
for my $x (#array1) {
my #row = (); # <-- bugfix of original code
for my $y (#array2) {
my $maths = $x * $y;
push #row, $maths;
}
push #res, \#row;
}
return #res;
}

Is it possible to assign two variables in Perl foreach loop?

Is it possible to assign two variables the same data from an array in a Perl foreach loop?
I am using Perl 5, I think I came across something in Perl 6.
Something like this:
my $var1;
my $var2;
foreach $var1,$var2 (#array){...}
It's not in the Perl 5 core language, but List::Util has a pairs function which should be close enough (and a number of other pair... functions which may be more convenient, depending on what you're doing inside the loop):
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
use List::Util 'pairs';
my #list = qw(a 1 b 2 c 3);
for my $pair (pairs #list) {
my ($first, $second) = #$pair;
say "$first => $second";
}
Output:
a => 1
b => 2
c => 3
The easiest way to use this is with a while loop that calls splice on the first two elements of the array each time,
while (my($var1, $var2) = splice(#array, 0, 2)) {
...
}
However, unlike foreach, this continually does a double-shift on the original array, so when you’re done, the array is empty. Also, the variables assigned are copies, not aliases as with foreach.
If you don’t like that, you can use a C-style for loop:
for (my $i = 0; $i < #array; $i += 2) {
my($var1, $var2) = #array[$i, $i+1];
...
}
That leaves the array in place but does not allow you to update it the way foreach does. To do that, you need to address the array directly.
my #pairlist = (
fee => 1,
fie => 2,
foe => 3,
fum => 4,
);
for (my $i = 0; $i < #pairlist; $i += 2) {
$pairlist[ $i + 0 ] x= 2;
$pairlist[ $i + 1 ] *= 2;
}
print "Array is #pairlist\n";
That prints out:
Array is feefee 2 fiefie 4 foefoe 6 fumfum 8
You can get those into aliased variables if you try hard enough, but it’s probably not worth it:
my #kvlist = (
fee => 1,
fie => 2,
foe => 3,
fum => 4,
);
for (my $i = 0; $i < #kvlist; $i += 2) {
our ($key, $value);
local(*key, $value) = \#kvlist[ $i, $i + 1 ];
$key x= 2;
$value *= 2;
}
print "Array is #kvlist\n";
Which prints out the expected changed array:
Array is feefee 2 fiefie 4 foefoe 6 fumfum 8
Note that the pairs offered by the List::Pairwise module, which were but very recently added to the core List::Util module (and so you probably cannot use it), are still not giving you aliases:
use List::Util 1.29 qw(pairs);
my #pairlist = (
fee => 1,
fie => 2,
foe => 3,
fum => 4,
);
for my $pref (pairs(#pairlist)) {
$pref->[0] x= 2;
$pref->[1] *= 2;
}
print "Array is #pairlist\n";
That prints out only:
Array is fee 1 fie 2 foe 3 fum 4
So it didn’t change the array at all. Oops. :(
Of course, if this were a real hash, you could double the values trivially:
for my $value (values %hash) { $value *= 2 }
The reasons that works is because those are aliases into the actual hash values.
You cannot change the keys, since they’re immutable. However, you can make a new hash that’s an updated copy of the old one easily enough:
my %old_hash = (
fee => 1,
fie => 2,
foe => 3,
fum => 4,
);
my %new_hash;
#new_hash{ map { $_ x 2 } keys %old_hash } =
map { $_ * 2 } values %old_hash;
print "Old hash is: ", join(" " => %old_hash), "\n";
print "New hash is: ", join(" " => %new_hash), "\n";
That outputs
Old hash is: foe 3 fee 1 fum 4 fie 2
New hash is: foefoe 6 fiefie 4 fumfum 8 feefee 2
A general algorithm for more than 2 variables:
while( #array ){
my $var1 = shift #array;
my $var2 = shift #array;
my $var3 = shift #array;
# other variables from #array
# do things with $var1, $var2, $var3, ...
}
PS: Using a working copy of the array to that it is preserved for use later:
if( my #working_copy = #array ){
while( #working_copy ){
my $var1 = shift #working_copy;
my $var2 = shift #working_copy;
my $var3 = shift #working_copy;
# other variables from #working_copy
# do things with $var1, $var2, $var3, ...
}
}
PPS: another way is to use indexing. Of course, that is a sure sign that the data structure is wrong. It should be an array of arrays (AoA) or an array of hashes (AoH). See perldoc perldsc and perldoc perllol.
my $i = 0;
while( $i < #array ){
my $var1 = $array[ $i++ ];
my $var2 = $array[ $i++ ];
my $var3 = $array[ $i++ ];
# other variables from #array
# do things with $var1, $var2, $var3, ...
}
PPPS: I've been asked to clarify why the data structure is wrong. It is a flatten set of tuples (aka records aka datasets). The tuples are recreated by counting of the number of data for each. But what is the reader constructing the set has a bug and doesn't always get the number right? If, for a missing value, it just skips adding anything? Then all the remaining tuples are shifted by one, causing the following tuples to be grouped incorrectly and therefore, invalid. That is why an AoA is better; only the tuple with the missing data would be invalid.
But an better structure would be an AoH. Each datum would access by a key. Then new or optional data can be added without breaking the code downstream.
While I'm at it, I'll add some code examples:
# example code for AoA
for my $tuple ( #aoa ){
my $var1 = $tuple->[0];
my $var2 = $tuple->[1];
my $var3 = $tuple->[2];
# etc
}
# example code for AoH
for my $tuple ( #aoh ){
my $var1 = $tuple->{keyname1};
my $var2 = $tuple->{key_name_2};
my $var3 = $tuple->{'key name with spaces'};
my $var4 = $tuple->{$key_name_in_scalar_variable};
# etc
}
Here is a module-less way to "loop" by an arbitrary value ($by) and output the resulting group of elements using an array slice:
#!perl -l
#array = "1".."6";
$by = 3; $by--;
for (my $i = 0 ; $i < #array ; $i += $by ) {
print "#array[$i..$i+$by]";
$i++ ;
}
As a one-liner to test (cut and paste to a Unix shell):
perl -E '#array = "1".."6"; $by = 3; $by--;
for (my $i = 0 ; $i < #array ; $i += $by ) {
say "#array[$i..$i+$by]"; $i++ }'
Output:
1 2 3
4 5 6
If you make $by = 2; it will print pairs of numbers. To get at specific elements of the resulting slice access it as an anonymous array: (e.g. [#array[$i..$i+$by]]->[1]).
See also:
How do I read two items at a time in a Perl foreach loop?
Perl way of iterating over 2 arrays in parallel
Some good responses there, including reference to natatime which is quite easy to use. It's easy to implement too - it is essentially a wrapper around the splice solutions mentioned in the responses here.
The following is not the nicest example, but I've been using autobox::Core and made an #array->natatime() "method" ;-) like this:
use autobox::Core ;
sub autobox::Core::ARRAY::natatime {
my ($self, $by) = #_;
my #copy = #$self ;
my #array ;
push #array, [splice (#copy, 0, $by) ] while #copy ;
if ( not defined wantarray ) {
print "#{ $_ } \n" for #array ;
}
return wantarray ? #array : \#array;
}
The #copy array is spliced destructively, but $self (which is how the #array in front of the autobox method -> arrow gets passed to the function) is still there. So I can do:
my #dozen = "1" .. "12" ; # cakes to eat
#dozen->natatime(4) ; # eat 4 at time
my $arr_ref = #dozen->natatime(4) ; # make a reference
say "Group 3: #{ $arr_ref->[2] }" ; # prints a group of elements
say scalar #dozen , " cakes left" ; # eat cake; still have it
Output:
1 2 3 4
5 6 7 8
9 10 11 12
Group 3: 9 10 11 12
12 cakes left
One other approach that also uses a CPAN module (I gave this answer elsewhere but it is worth repeating). This can also be done non-destructively, with Eric Strom's excellent List::Gen module:
perl -MList::Gen=":all" -E '#n = "1".."6"; say "#$_" for every 2 => #n'
1 2
3 4
5 6
Each group of elements you grab is returned in an anonymous array so the individual values are in: $_->[0] $_->[1] ... etc.
You mentioned Perl6, which handles multiple looping values nicely:
my #qarr = 1 .. 6;
my ($x, $y, $z) ;
for #qarr -> $x , $y , $z { say $x/$y ; say "z = " ~ $z }
Output:
0.5
z = 3
0.8
z = 6
For more on the Perl6 approach see: Looping for Fun and Profit from the 2009 Perl6 Advent Calendar, or the Blocks and Statements Synopsis for details. Perhaps Perl 5 will have a similar "loop by multliple values" construct one day - à la perl5i's foreach :-)

In perl, how do you store multiple values from a single test file line into an array

Here are a couple of typical lines from an input file I am trying to crunch:
ICC2_DPD 2.7V ma 0.006 0.006 0.006 ... ...
DPD_Rel 2.7V ma 0.062 0.054 0.040 0.065 0.037 0.066 0.071 0.073 ... ... ...
(The number of floats can vary) Here is what I initially started with:
if(/^(\w+)\s+(\d+\.?\d*)V\s+(\w+)/)
{
print $OUT "$1 $2 $3\n";
}
How would you capture and store the floating point values into an array/hash given that the number of values varies. I am stuck on how to manage the termination of the array.
Well, in your case, I would probably consider using split, separating the fields on whitespace:
while (<DATA>) {
my #vals = split; # default split is fine
print join(" ", #vals[3 .. $#vals]), "\n";
}
Or if you want to store them, just push them onto an array, or use a hash with a suitable key. Something like...
push #array, [ #vals[3 .. $#vals] ]; # push an array ref
$hash{$vals[0]} = [ #vals[3 .. $#vals] ]; # use a hash
The [ ... ] part is creating an anonymous array ref, which can then store the line's values in a single scalar slot.
I might be inclined to write something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw( sum );
my #data;
while (my $line = <DATA>) {
my ($state, $voltage, $current_unit, $obs) = split ' ', $line, 4;
my #obs = ($obs =~ / ( [0-9] [.] [0-9]+ )/gx);
push #data, {
state => $state,
voltage => $voltage,
current_unit => $current_unit,
obs => \#obs,
average_current => sum(#obs) / #obs,
};
}
for my $x (#data) {
printf(
"State = %-16sAverage current = %.3f%s\n",
#$x{qw(state average_current current_unit)},
);
}
__DATA__
ICC2_DPD 2.7V ma 0.006 0.006 0.006
DPD_Rel 2.7V ma 0.062 0.054 0.040 0.065 0.037 0.066 0.071 0.073
Output:
State = ICC2_DPD Average current = 0.006ma
State = DPD_Rel Average current = 0.059ma

In Perl, is there a built in way to compare two arrays for equality?

I have two arrays of strings that I would like to compare for equality:
my #array1 = ("part1", "part2", "part3", "part4");
my #array2 = ("part1", "PART2", "part3", "part4");
Is there a built-in way to compare arrays like there is for scalars?
I tried:
if (#array1 == #array2) {...}
but it just evaluated each array in scalar context, and so compared the length of each array.
I can roll my own function to do it, but it seems like such a low-level operation that there should be a built-in way to do it. Is there?
Edit: sadly, I don't have access to 5.10+ or optional components.
There is the new smart match operator:
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
my #x = (1, 2, 3);
my #y = qw(1 2 3);
say "[#x] and [#y] match" if #x ~~ #y;
Regarding Array::Compare:
Internally the comparator compares the two arrays by using join to turn both arrays into strings and comparing the strings using eq.
I guess that is a valid method, but so long as we are using string comparisons, I would much rather use something like:
#!/usr/bin/perl
use strict;
use warnings;
use List::AllUtils qw( each_arrayref );
my #x = qw(1 2 3);
my #y = (1, 2, 3);
print "[#x] and [#y] match\n" if elementwise_eq( \(#x, #y) );
sub elementwise_eq {
my ($xref, $yref) = #_;
return unless #$xref == #$yref;
my $it = each_arrayref($xref, $yref);
while ( my ($x, $y) = $it->() ) {
return unless $x eq $y;
}
return 1;
}
If the arrays you are comparing are large, joining them is going to do a lot of work and consume a lot of memory than just comparing each element one by one.
Update: Of course, one should test such statements. Simple benchmarks:
#!/usr/bin/perl
use strict;
use warnings;
use Array::Compare;
use Benchmark qw( cmpthese );
use List::AllUtils qw( each_arrayref );
my #x = 1 .. 1_000;
my #y = map { "$_" } 1 .. 1_000;
my $comp = Array::Compare->new;
cmpthese -5, {
iterator => sub { my $r = elementwise_eq(\(#x, #y)) },
array_comp => sub { my $r = $comp->compare(\(#x, #y)) },
};
This is the worst case scenario where elementwise_eq has to go through each and every element in both arrays 1_000 times and it shows:
Rate iterator array_comp
iterator 246/s -- -75%
array_comp 1002/s 308% --
On the other hand, the best case scenario is:
my #x = map { rand } 1 .. 1_000;
my #y = map { rand } 1 .. 1_000;
Rate array_comp iterator
array_comp 919/s -- -98%
iterator 52600/s 5622% --
iterator performance drops quite quickly, however:
my #x = 1 .. 20, map { rand } 1 .. 1_000;
my #y = 1 .. 20, map { rand } 1 .. 1_000;
Rate iterator array_comp
iterator 10014/s -- -23%
array_comp 13071/s 31% --
I did not look at memory utilization.
There's Test::More's is_deeply() function, which will also display exactly where the structures differ, or Test::Deep's eq_deeply(), which doesn't require a test harness (and just returns true or false).
Not built-in, but there is Array::Compare.
This is one of the operations that's left out of the Perl core for what I believe are didactic reasons -- that is, if you're trying to do it, there's probably something wrong. The most illustrative example of this, I think, is the absence of a core read_entire_file function; basically, providing that function in the core would lead people to think it's a good idea to do that, but instead, Perl is designed in a way that gently nudges you toward processing files line-at-a-time, which is generally far more efficient and otherwise a better idea, but novice programmers are rarely comfortable with it and they need some encouragement to get there.
The same applies here: there is probably a much better way to make the determination you're trying to accomplish by comparing two arrays. Not necessarily, but probably. So Perl is nudging you to think about other ways of accomplishing your goal.
Perl 5.10 gives you the smart match operator.
use 5.010;
if( #array1 ~~ #array2 )
{
say "The arrays are the same";
}
Otherwise, as you said, you'll have top roll your own.
So long as you are using perl 5.10 or newer, you can use the smart match operator.
if (#array1 ~~ #array2) {...}
Simpler solution is faster:
#!/usr/bin/perl
use strict;
use warnings;
use Array::Compare;
use Benchmark qw( cmpthese );
use List::AllUtils qw( each_arrayref );
my #x = 1 .. 1_000;
my #y = map { "$_" } 1 .. 1_000;
my $comp = Array::Compare->new;
cmpthese -2, {
iterator => sub { my $r = elementwise_eq(\(#x, #y)) },
my_comp => sub { my $r = my_comp(\(#x, #y)) },
array_comp => sub { my $r = $comp->compare(\(#x, #y)) },
};
#x = 1 .. 20, map { rand } 1 .. 1_000;
#y = 1 .. 20, map { rand } 1 .. 1_000;
cmpthese -2, {
iterator => sub { my $r = elementwise_eq(\(#x, #y)) },
my_comp => sub { my $r = my_comp(\(#x, #y)) },
array_comp => sub { my $r = $comp->compare(\(#x, #y)) },
};
sub elementwise_eq {
my ($xref, $yref) = #_;
return unless #$xref == #$yref;
my $it = each_arrayref($xref, $yref);
while ( my ($x, $y) = $it->() ) {
return unless $x eq $y;
}
return 1;
}
sub my_comp {
my ($xref, $yref) = #_;
return unless #$xref == #$yref;
my $i;
for my $e (#$xref) {
return unless $e eq $yref->[$i++];
}
return 1;
}
And result in perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi:
Rate iterator array_comp my_comp
iterator 1544/s -- -67% -80%
array_comp 4697/s 204% -- -41%
my_comp 7914/s 413% 68% --
Rate iterator array_comp my_comp
iterator 63846/s -- -1% -75%
array_comp 64246/s 1% -- -75%
my_comp 252629/s 296% 293% --
This question has turned into a very useful resource. ++ for the benchmarks and discussion.
As others have pointed out smart match feature had issues and is being phased out in its current form. There are alternatives that are "less smart" (and so avoid the issues) and that are small, fairly fast and don't have too many non CORE dependencies.
Smart::Match
match::simple (and match::smart)
Scalar::In
You can find links to some pretty good discussions about the history of the future of ~~ by looking at a couple of blog posts by #brian d foy, and the p5p mail archive threads from 2011 and 2012 from #rjbs.
Comparing arrays can be simple and fun!
use v5.20;
use match::smart;
my #x = (1, 2, 3);
my #y = qw(4 5 6);
my #z = qw(4 5 6);
say \#x |M| \#y ? "[\#x] and [\#y] match": "no match";
say \#y |M| \#z ? "[\#y] and [\#z] match": "no match";
__END__
#y and #z match, #x and #y do not
... especially fun if the array is simple. But an array can be a complicated thing, and sometimes you want different kinds of information from the results of the comparison. For that, Array::Compare can make fine tuned comparison easier.
Data::Cmp is another recent option. The cmp_data() function operates similarly to the cmp operator (see perlop for cmp usage).
Example:
use 5.10;
use Data::Cmp qw/cmp_data/;
my #array1 = ("part1", "part2", "part3", "part4");
my #array2 = ("part1", "PART2", "part3", "part4");
my #array3 = ("part1", "PART2", "part3", "part4");
# sample usage
say "1 & 2 are different" if cmp_data(\#array1, \#array2) ;
sat "2 & 3 are the same" unless cmp_data(\#array2, \#array3) ;
It's also possible to compare hashes and more complicated nested data structures (within reason). For a single module with no non-core dependencies, Data::Cmp is pretty "smart" ;-) ... errm I mean "useful".
One could use grep function in scalar context (http://perldoc.perl.org/functions/grep.html#grep-BLOCK-LIST)
($#array1 == $#array2) && (0 == (grep { $array1[ $_ ] ne $array2[ $_ ] } 0..$#array1))
My core-only solution with List::Util::all:
use List::Util qw(all);
if (#array1 == #array2 && all { $array1[$_] eq $array2[$_] } 0..$#array1) {
print "matched\n";
}
As a subroutine:
# call me like string_array_equals([#array1], [#array2])
sub string_array_equals {
my ($array1, $array2) = #_;
#$array1 == #$array2 and
all { $array1->[$_] eq $array2->[$_] } 0..$#$array1;
}
If you want a custom comparison:
# call me like array_equals { $a eq $b } [#array1], [#array2]
sub array_equals(&$$) {
my ($compare, $array1, $array2) = #_;
#$array1 == #$array2 and
all {
local $a = $array1->[$_];
local $b = $array2->[$_];
$compare->($a, $b);
} 0..$#$array1;
}
At this point, all doesn't save much space and you could just do a for:
# call me like array_equals { $a eq $b } [#array1], [#array2]
sub array_equals(&$$) {
my ($compare, $array1, $array2) = #_;
#$array1 == #$array2 or return 0;
for (0..$#$array1) {
local $a = $array1->[$_];
local $b = $array2->[$_];
$compare->($a, $b) or return 0;
}
1;
}
Edit: List::Util::first works as a substitute on older perls (< v5.20).
use List::Util qw(first);
if (#array1 == #array2 && !defined first { $array1[$_] ne $array2[$_] } 0..$#array1) {
print "matched\n";
}
If casing is the only difference, you can simply use:
if (lc "#array1" eq lc "#array2") {...}
Whereas "#array1" returns the same as join ( " ", #array1 )
If order and duplicate values do not matter but only values equality (i.e. set comparison), you could use Set::Scalar.
It overloads common operators such as == or !=.
my #array1 = ("part1", "part2", "part3", "part4");
my #array2 = ("part1", "PART2", "part3", "part4");
if ( Set::Scalar->new(#array1) == Set::Scalar->new(#array2) ) {...}
Alternatively, there's also Algorithm::Diff and List::Compare.
For checking equality of two arrays try this.
In given code, if %eq_or_not has any value then both arrays are not equal otherwise they are equal.
my #array1 = ("part1", "part2", "part3", "part4");
my #array2 = ("part1", "PART2", "part3", "part4");
my %eq_or_not;
#eq_or_not{ #array1 } = undef;
delete #eq_or_not{ #array2 };
if (join(",",sort #a) eq join(",",sort #b))
if performance concern can be ignored, as mentioned several times in the threads here
If the only criteria is "are they equivalent or not?", and not the more complex question, "are they equivalent or not, and if they differ, how?" there are much quicker/uglier ways to do it. For example, smash the entirety of each array into two scalars and compare those.
For example
my #array1 = ("part1", "part2", "part3", "part4");
my #array2 = ("part1", "PART2", "part3", "part4");
my $smash1 = join("", #array1);
my $smash2 = join("", #array2);
if ($smash1 eq $smash2)
{
# equal
}
else
{
#unequal
}
Yes, I probably just made Larry Wall cry.

Resources