Taking the average of many N-sized arrays

Can anyone help me fix this? I'm trying to write a script that takes the average of many N-sized arrays. In the example below the average of the arrays would be (1,2) since (0+1+2+1)/4 = 1 and (2+3+2+1)/4 = 2. Currently the code below only works for arrays of size 2. How would I do this with arrays of, say, size 100, such that the length of @result is 100? I imagine I would need a counter, right?
use strict;
use warnings;
my @w = (0, 2);
my @x = (1, 3);
my @y = (2, 2);
my @z = (1, 1);
my @arrays = \(@w, @x, @y, @z);
my ($x, $y) = (0, 0);
foreach my $arr (@arrays) {
    $x += $arr->[0];
    $y += $arr->[1];
}
my @result = ( $x / @arrays, $y / @arrays );
print "@result\n"; # <---- prints 1 2
#######
# my attempt
my @avg;
for my $i (0..$#w) {
    # I'm guessing the result 'map...' returns is an array
    @avg[$i] = sum(\(map $_->[$i], @arrays)) / @arrays;
}
# sum the elements in an array and return its value
sub sum {
    # takes 1 param: an array_ref
    my $sum = 0;
    for ( @{$_[0]} ) {
        $sum += $_;
    }
    return $sum;
}
My attempt is close but it doesn't work. I would like to do this without using a module.

This is my solution:
use strict;
use warnings;
my $SIZE = 3;
my @w = (0, 2, 3);
my @x = (1, 3, 4);
my @y = (2, 2, 6);
my @z = (1, 1, 3);
my @arrays = \(@w, @x, @y, @z);
my @result = ();
foreach my $arr (@arrays) {
    for (my $i = 0; $i < $SIZE; $i++) {
        $result[$i] += $arr->[$i];
    }
}
@result = map { $_ / @arrays } @result;
print "(@result)", "\n"; # <---- prints (1 2 4)

Do it in a simple way, like this:
#!/usr/bin/env perl
use strict;
use warnings;
my @w = (1, 2, 3, 4);
my @x = (1, 2, 3, 4);
my @y = (1, 2, 3, 4);
my @z = (1, 2, 3, 4);
my @arrays = \(@w, @x, @y, @z);
my @sums;
foreach my $arr (@arrays) {
    for my $i (0 .. $#w) {
        $sums[$i] += $arr->[$i];
    }
}
my @avg = map { $_ / @arrays } @sums;
foreach (@avg) {
    print "$_ ";
}
print "\n";

Yes, it's very close. Here are the corrections to the problem code:
my @avg;
for my $i (0..$#w) {
    my @transposed = map { $_->[$i] } @arrays;
    $avg[$i] = sum(\@transposed) / @transposed;
    # or in one line: $avg[$i] = sum( [map { $_->[$i] } @arrays] ) / @arrays;
}
(I'm assuming that the arrays @w, @x, @y and @z are all the same size.)
Note that @avg[$i] is an array slice. You want $avg[$i] to specify a particular position.
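A quick way to see the slice-versus-element difference (a minimal illustration of my own, not part of the original answer):

```perl
use strict;
use warnings;

my @avg = (10, 20, 30);

# $avg[1] is a single element, accessed in scalar context.
my $elem = $avg[1];          # 20

# @avg[1] is a one-element array slice -- a list, not a scalar.
# Assigning it to a scalar the way the question does triggers the
# "Scalar value @avg[1] better written as $avg[1]" warning.
my ($from_slice) = @avg[1];  # also 20, but via list assignment

print "$elem $from_slice\n";
```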

The List::Util module can be very useful for problems like this. Together with a simple average subroutine it makes everything very simple.
use strict;
use warnings;
use List::Util 'sum';
sub average { sum(@_) / @_ }
my @w = (0, 2);
my @x = (1, 3);
my @y = (2, 2);
my @z = (1, 1);
my @arrays = \(@w, @x, @y, @z);
my @results = map {
    my $i = $_;
    average(map $_->[$i], @arrays);
} 0 .. $#{$arrays[0]};
print "@results\n";
output
1 2

map does not return an array. In list context it returns the list of results, and in scalar context it returns the number of elements in that list (see the documentation for map).
Change
@avg[$i] = sum(\(map $_->[$i], @arrays)) / @arrays;
to
$avg[$i] = sum( map $_->[$i], @arrays ) / @arrays;
and change
sub sum {
    my $sum = 0;
    for ( @{$_[0]} ) {
        $sum += $_;
    }
    return $sum;
}
to
sub sum {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum;
}
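To illustrate the point about context (my own sketch, not from the answer above): map produces a list, and evaluating it in scalar context yields the number of elements it generated, not an array.

```perl
use strict;
use warnings;

my @nums = (1, 2, 3);

# List context: the generated list is assigned to the array.
my @doubled = map { $_ * 2 } @nums;   # (2, 4, 6)

# Scalar context: you get the count of generated elements.
my $count = map { $_ * 2 } @nums;     # 3

print "@doubled\n$count\n";
```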

Remove even amounts of duplicates from an array

I have an array
[ 1, 0, 0, 0, 5, 2, 4, 5, 2, 2 ]
I need to delete even amounts of duplicates.
That means, if a value appears an even number of times in the array then remove them all, but if it appears an odd number of times then keep just one.
The result from the array above should be
[ 1, 0, 2, 4 ]
How can I do that?
Removing duplicates is usually done as follows:
use List::Util 1.44 qw( uniqnum );
@a = uniqnum @a;
or
my %seen;
@a = grep { !$seen{$_}++ } @a;
To achieve what you want, we simply need to chain a grep that removes the other undesired elements.
use List::Util 1.44 qw( uniqnum );
@a = uniqnum grep { $counts{$_} % 2 } @a;
or
my %seen;
@a = grep { !$seen{$_}++ } grep { $counts{$_} % 2 } @a;
or
my %seen;
@a = grep { ( $counts{$_} % 2 ) && !$seen{$_}++ } @a;
The above solutions rely on having the count of each value. To obtain that, we can use the following:
my %counts;
++$counts{$_} for @a;
All together:
my ( %counts, %seen );
++$counts{$_} for @a;
@a = grep { ( $counts{$_} % 2 ) && !$seen{$_}++ } @a;
Note that these methods of removing duplicates preserve the order of the elements (keeping the first occurrence of each duplicate). This is more efficient (O(N)) than involving sort (O(N log N)) just to avoid producing something non-deterministic.
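Applied to the sample array from the question, the combined snippet behaves as expected:

```perl
use strict;
use warnings;

my @a = ( 1, 0, 0, 0, 5, 2, 4, 5, 2, 2 );

my ( %counts, %seen );
++$counts{$_} for @a;
@a = grep { ( $counts{$_} % 2 ) && !$seen{$_}++ } @a;

print "@a\n";   # 1 0 2 4 -- first-occurrence order is preserved
```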
This really isn't difficult, and it is very bad form to show no attempt at all at solving it yourself. I would like someone who posted questions like this to describe how they feel comfortable getting someone else to do their work for them. Even difficult crosswords don't get this flood of requests for a solution, but in this case presumably you are being paid for a solution written by someone else? Why is that not a problem to you?
Build a hash to calculate the current count for each value
use $counts{$_} % 2 to determine the new final count
Deconstruct the hash to a new array
my $array = [ 1, 0, 0, 0, 5, 2, 4, 5, 2, 2 ];
my @new_array = do {
    my %counts;
    ++$counts{$_} for @$array;
    map {
        ( $_ ) x ( $counts{$_} % 2 )
    } sort { $a <=> $b } keys %counts;
};
use Data::Dump;
dd \@new_array;
output
[0, 1, 2, 4]
See the comments to see how this possible solution works.
#!/usr/bin/perl
use strict;
use warnings;
my @a = qw(1 0 0 0 5 2 4 5 2 2);
# Move through the array.
for (my $i = 0; $i < scalar(@a); ) {
    # Move through the positions at and ahead of current position $i
    # and collect all positions $j that share the value at the
    # current position $i.
    my @indexes;
    for (my $j = $i; $j < scalar(@a); $j++) {
        if ($a[$j] == $a[$i]) {
            push(@indexes, $j);
        }
    }
    if (scalar(@indexes) % 2) {
        # If the number of positions collected is odd, remove the first
        # position from the collection. The number of positions in the
        # collection is then even afterwards.
        shift(@indexes);
        # As we will keep the value at the current position $i, no new
        # value will move into that position. Hence we have to advance
        # the current position.
        $i++;
    }
    # Move through the collected positions.
    for (my $k = 0; $k < scalar(@indexes); $k++) {
        # Remove the element at the position indicated by the
        # $k'th element of the collected positions.
        # We have to subtract $k from the collected position to
        # compensate for the movement of the remaining elements to the
        # left.
        splice(@a, $indexes[$k] - $k, 1);
    }
}
print("@a");
You have a bunch of answers, here's another:
use strict;
use warnings;
use Data::Dumper;
my $input = [ 1, 0, 0, 0, 5, 2, 4, 5, 2, 2 ];
my $output = dedupe_evens($input);
print Data::Dumper->Dump([$input, $output], ['$input', '$output']);
exit;
sub dedupe_evens {
    my ($input) = @_;
    my %seen;
    $seen{$_}++ foreach @$input;
    my @output = grep {
        my $count = delete $seen{$_}; # only want first occurrence
        $count && $count % 2;
    } @$input;
    return \@output;
}
Which produces this output (reformatted for brevity):
$input = [ 1, 0, 0, 0, 5, 2, 4, 5, 2, 2 ];
$output = [ 1, 0, 2, 4 ];

How to split an array in uneven batches in Perl

I am trying to split an array into batches of 500, but I want to cover the case where the array size is not evenly divided by 500 as well. For example:
Array = [1,...,1100]
subArray1 = [1..500]
subArray2 = [500..1000]
subArray3 = [1000..1100]
Here is what I have so far:
my @array = [prepopulated];
my $array_size = scalar(@array);
my $start = 0;
my $end = 500;
while ($start < $array_size) {
    my @subArray = @array[$start..$end];
    print Dumper(\@subArray);
    $start = $end;
    if ($end + 500 > $array_size) {
        $end = $start + ($array_size % 500);
    } else {
        $end += 500;
    }
}
This seems to work, but is there a better way to do this or does this break for some case I am not seeing?
List::MoreUtils covers exactly this kind of task:
use strict; use warnings;
use List::MoreUtils qw(natatime);
my @Array = (1..1100);
my $it = natatime 500, @Array;
while (my @vals = $it->()) {
    printf "%d\n", 0+@vals;
}
500
500
100
No need to reinvent the wheel.
I think splice would be simpler and cleaner:
use strict;
use warnings;
my @array = (1 .. 1100);
my $batch_size = 500;
while (@array) {
    my @batch = splice(@array, 0, $batch_size);
    # ...
}
Keep in mind that Perl's range operator start..end is inclusive on both sides: 0..500 gives you 0, 1, ..., 499, 500.
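That inclusivity is easy to verify:

```perl
use strict;
use warnings;

my @r = (0 .. 500);
print scalar(@r), "\n";   # 501 -- both endpoints are included
```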
Assuming you don't want to duplicate boundary elements (e.g. index 500 included in two sub arrays), try the following.
Summary:
change $end initialization to 499
inside the loop set $start = $end+1
when checking if the new end is overflowing use >= instead: $end + 500 >= $array_size
code
my @array = [prepopulated];
my $array_size = scalar(@array);
my $start = 0;
my $end = 499;
while ($start < $array_size) {
    my @subArray = @array[$start..$end];
    print Dumper(\@subArray);
    $start = $end+1;
    if ($end + 500 >= $array_size) {
        $end = $array_size-1;
    } else {
        $end += 500;
    }
}
Creating a second array of indexes for natatime is extremely wasteful.
use List::Util qw( min );
for (my $start = 0; $start < @array; $start += 500) {
    my $end = min($start + 500, 0 + @array);
    ...
}
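The elided loop body might be filled in like this (my sketch, not the answerer's code; $end is treated as an exclusive upper bound because it is clamped to the array length):

```perl
use strict;
use warnings;
use List::Util qw( min );

my @array = (1 .. 1100);

for ( my $start = 0; $start < @array; $start += 500 ) {
    my $end = min( $start + 500, 0 + @array );   # exclusive upper bound
    my @batch = @array[ $start .. $end - 1 ];    # no duplicated boundary elements
    print scalar(@batch), "\n";                  # 500, 500, 100
}
```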

slicing out multiples of nth element

I have a list which has more than 10k elements. I want to remove every third element.
For example,
@testlists = qw (helloworld sessions first.cgi login localpcs depthhashes.cgi search view macros plugins ...) ;
I want to remove first.cgi, depthhashes.cgi, macros and so on from the original array. The grep function is a little bit slow. Please suggest a faster grep search or any other similar subroutines. Any help would be highly appreciated.
I can think of a few solutions:
Grep on index divisibility
my $i = 0;
@testlist = grep { ++$i % 3 } @testlist;
repeated splicing
for (my $i = 2; $i < $#testlist; $i += 2) {
    splice @testlist, $i, 1;
}
Copying with skipping
my @output;
# pre-extend the array for fewer reallocations
$#output = @testlist * 2/3;
@output = ();
# FIXME annoying off-by-one errors at the end that can add one undef
for (my $i = 0; $i < @testlist; $i += 3) {
    push @output, @testlist[$i, $i+1];
}
Ikegami has corrected and optimized the copying solution in his remarkable answer.
A benchmark with 1,000-element lists declares splicing the clear winner:
Rate slice grep copy splice
slice 790/s -- -10% -18% -37%
grep 883/s 12% -- -8% -29%
copy 960/s 22% 9% -- -23%
splice 1248/s 58% 41% 30% --
(slice is mob's solution)
This is probably because it offloads most of the actual work into the C-level implementations, and avoids allocations and costly Perl-level operations.
With 10,000-element lists, the advantage shifts towards the other solutions. Indeed, the splice solution has very bad algorithmic complexity, because it moves all elements after each splice position, meaning that the last element is moved nearly 3333 times:
Rate splice slice grep copy
splice 42.7/s -- -35% -42% -49%
slice 65.3/s 53% -- -12% -23%
grep 74.2/s 74% 14% -- -12%
copy 84.4/s 98% 29% 14% --
Here is the script I used for benchmarking.
amon's copy can be tweaked to be 30% faster!
my $i = 1;
my $j = 1;
while ($i < @a) {
    $a[++$j] = $a[$i+=2];
    $a[++$j] = $a[++$i];
}
$#a = $j-1 if @a>2;
And you can get even faster by avoiding copying anything at all (though the result is a reference to an array). The longer the strings in each element, the better this will do.
my $i = 0;
my $ref = sub { \@_ }->( grep { ++$i % 3 } @a );
Results (10,000 elements):
>perl a.pl
Rate splice_a splice_r grep_a copy1_a copy1_r copy2_r copy2_a grep_r
splice_a 52.8/s -- -0% -51% -54% -56% -66% -66% -68%
splice_r 52.9/s 0% -- -51% -54% -55% -66% -66% -68%
grep_a 107/s 103% 103% -- -7% -10% -30% -31% -34%
copy1_a 115/s 118% 117% 7% -- -3% -25% -26% -30%
copy1_r 119/s 125% 124% 11% 3% -- -23% -23% -27%
copy2_r 154/s 191% 190% 43% 34% 29% -- -0% -6%
copy2_a 154/s 192% 192% 44% 34% 30% 0% -- -6%
grep_r 163/s 209% 209% 52% 42% 37% 6% 6% --
Benchmark:
use strict;
use warnings;
use Benchmark qw( cmpthese );
my @testlist = qw( helloworld sessions first.cgi login localpcs depthhashes.cgi search view macros );
@testlist = ( @testlist ) x ( 10000 / @testlist );
sub grep_a { my @a = @testlist; my $i = 0; @a = grep { ++$i % 3 } @a; 1 }
sub copy1_a {
    my @a = @testlist;
    my @b;
    $#b = $#a; @b = (); # Does absolutely nothing in this benchmark because of optimisations in Perl.
    for (my $i = 0; $i < @a; $i += 3) {
        push @b, @a[$i, $i+1];
    }
    1
}
sub copy2_a {
    my @a = @testlist;
    my $i = 1;
    my $j = 1;
    while ($i < @a) {
        $a[++$j] = $a[$i+=2];
        $a[++$j] = $a[++$i];
    }
    $#a = $j-1 if @a>2;
    1
}
sub splice_a {
    my @a = @testlist;
    for (my $i = 2; $i < $#a; $i += 2) {
        splice @a, $i, 1;
    }
    1
}
sub grep_r { my $r = [ @testlist ]; my $i = 0; $r = sub { \@_ }->( grep { ++$i % 3 } @$r ); 1 }
sub copy1_r {
    my $r = [ @testlist ];
    my @b;
    $#b = $#$r; @b = (); # Does absolutely nothing in this benchmark because of optimisations in Perl.
    for (my $i = 0; $i < @$r; $i += 3) {
        push @b, @$r[$i, $i+1];
    }
    $r = \@b;
    1
}
sub copy2_r {
    my $r = [ @testlist ];
    my $i = 1;
    my $j = 1;
    while ($i < @$r) {
        $r->[++$j] = $r->[$i+=2];
        $r->[++$j] = $r->[++$i];
    }
    $#$r = $j-1 if @$r>2;
    1
}
sub splice_r {
    my $r = [ @testlist ];
    for (my $i = 2; $i < $#$r; $i += 2) {
        splice @$r, $i, 1;
    }
    1
}
cmpthese(-3, {
    grep_a   => \&grep_a,
    copy1_a  => \&copy1_a,
    copy2_a  => \&copy2_a,
    splice_a => \&splice_a,
    grep_r   => \&grep_r,
    copy1_r  => \&copy1_r,
    copy2_r  => \&copy2_r,
    splice_r => \&splice_r,
});
Use an array slice.
@testlists = @testlists[ grep { ($_+1) % 3 } 0..$#testlists ];
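On the sample list from the question, this slice drops exactly the elements the OP wanted removed:

```perl
use strict;
use warnings;

my @testlists = qw(helloworld sessions first.cgi login localpcs
                   depthhashes.cgi search view macros);
@testlists = @testlists[ grep { ($_ + 1) % 3 } 0 .. $#testlists ];
print "@testlists\n";
# helloworld sessions login localpcs search view
```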
I'm not sure what you mean about using grep but perhaps you mean something like this.
for $i (0 .. $#testlists) {
    if (($i % 3) == 2) {
        delete $testlists[$i];
    }
}
# Demonstrate the values.
foreach $e (@testlists) {
    print "$e\n";
}

Subroutine that takes average of one or more arrays

I'm working on a subroutine that takes the average of 1 or more arrays. I would like to do this without using a module.
use strict;
use warnings;
use List::Util 'sum';
my @w = (0, 2);
my @x = (1, 3);
my @y = (2, 2);
my @z = (1, 1);
# the average of these four arrays is (1,2) since
# (0+1+2+1)/4 = 1 and (2+3+2+1)/4 = 2
my @arrays = \(@w, @x, @y, @z);
my @avg;
# this is the way to do it using the module
for my $i (0..$#w) {
    $avg[$i] = sum(map $_->[$i], @arrays) / @arrays;
}
print "@avg\n";
# my way of doing it without module
@avg = ();
for my $i (0..$#w) {
    $avg[$i] = prod_sum(map $_->[$i], \@arrays) / @arrays;
}
print "@avg\n";
# subroutines
sub prod_sum {
    my $o = $_[0];
    my $arr_ref = $_[1];
    my $array_ref;
    foreach my $row (@$arr_ref) {
        foreach my $cell (@$row) {
            push(@{ $array_ref }, $_);
        }
    }
    my $sum = $o + the_sum($array_ref);
    return $sum;
}
sub the_sum {
    my $sum = 0;
    for ( @{$_[0]} ) {
        $sum += $_;
    }
    return $sum;
}
output
1 2
[pair of really random big numbers]
The first output is correct. It displays the average of all of the arrays. The second output is completely wrong. How do I do this without using a module?
I propose this solution:
use strict;
use warnings;
my @w = (0, 2);
my @x = (1, 3);
my @y = (2, 2);
my @z = (1, 1);
my @arrays = \(@w, @x, @y, @z);
my ($x, $y) = (0, 0);
foreach my $arr (@arrays) {
    $x += $arr->[0];
    $y += $arr->[1];
}
my @result = ( $x / @arrays, $y / @arrays );
print "(@result)", "\n"; # <---- prints (1 2)
You think sum is being passed two variables, but it is not: it is only being passed one flat list. Modify your prod_sum to expect only an array (and replace \@arrays in the call of prod_sum with just @arrays). Or you can use this:
sub sum {
    return 0 if $#_ < 0;
    my $head = shift;
    return $head + sum(@_);
}
The above is a recursive subroutine that will sum an array.
Note: if your array has more than 100 elements, use warnings will emit a deep-recursion warning. For more on that topic, see the documentation for Perl's 'recursion' warnings category.
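For what it's worth, that "Deep recursion" warning fires once the call depth exceeds 100 and can be disabled lexically inside the subroutine (a sketch of my own, assuming you really want to keep the recursive version rather than rewrite it as a loop):

```perl
use strict;
use warnings;

sub sum {
    no warnings 'recursion';   # silence "Deep recursion" for calls made here
    return 0 if $#_ < 0;
    my $head = shift;
    return $head + sum(@_);
}

print sum(1 .. 500), "\n";   # 125250
```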

taking average of array

I would like to get the average value of 1 or more arrays.
my @w = (0,2);
my @x = (1,3);
my @y = (2,2);
my @z = (1,1);
such that the average of four arrays above is (1,2) since (0+1+2+1)/4 = 1 and (2+3+2+1)/4 = 2.
And the average of my @r = (3,4,1,4,5) is (3/5, 4/5, 1/5, 4/5, 1).
Here's my solution for the first one.
my @AoA = (\@w, \@x, \@y, \@z);
my @sums;
foreach my $column (0..$#w) {
    my $average;
    my $size = @w;
    foreach my $aref (@AoA) {
        $average += ($aref->[$column]) / $size;
    }
    push(@sums, $average);
}
This may or may not work. Is there an easier way to do this? Perhaps a function in a module?
My solution for the second one. I think this one's actually easier now that I think about it:
my $scalar = @r;
my @new_r;
foreach my $x (@r) {
    $x = $x / $scalar;
    push(@new_r, $x);
}
Again, this may or may not work; there's probably an easier way to do this.
Given
use List::Util qw( sum );
sub avg { sum(@_)/@_ }
You could use
my @w = (0,2);
my @x = (1,3);
my @y = (2,2);
my @z = (1,1);
my @avg = map {
    my $i = $_;
    avg map $_->[$i], \@w, \@x, \@y, \@z
} 0..$#w;
Or if you had an AoA:
my @matrix = ([0,2], [1,3], [2,2], [1,1]);
my @avg = map {
    my $i = $_;
    avg map $_->[$i], @matrix
} 0..$#{$matrix[0]};
As for the "average" of @r,
my @avg_r = map $_/@r, @r;
Something like this perhaps?
use strict;
use warnings;
use List::Util 'sum';
my @w = (0, 2);
my @x = (1, 3);
my @y = (2, 2);
my @z = (1, 1);
my @arrays = \(@w, @x, @y, @z);
my @avg;
for my $i (0..$#w) {
    $avg[$i] = sum(map $_->[$i], @arrays) / @arrays;
}
print "@avg\n";
my @r = (3, 4, 1, 4, 5);
my @avg_r = map $_ / @r, @r;
print "@avg_r\n";
output
1 2
0.6 0.8 0.2 0.8 1
