How to split an array in uneven batches in Perl - arrays

I am trying to split an array into batches of 500, but I want to cover the case where the array size is not evenly divided by 500 as well. For example:
Array = [1,...,1100]
subArray1 = [1..500]
subArray2 = [500..1000]
subArray3 = [1000..1100]
Here is what I have so far:
my #array = [prepopulated];
my $array_size = scalar(#array);
my $start = 0;
my $end = 500;
while ($start < $array_size) {
my #subArray = #array[$start..$end];
print Dumper(\#subArray);
$start = $end;
if($end + 500 > $array_size) {
$end = $start + ($array_size % 500);
} else {
$end += 500;
}
}
This seems to work, but is there a better way to do this or does this break for some case I am not seeing?

List::MoreUtils covers exactly this kind of task:
use strict; use warnings;
use List::MoreUtils qw(natatime);
my #Array = (1..1100);
my $it = natatime 500, #Array;
while (my #vals = $it->()) {
printf "%d\n", 0+#vals;
}
500
500
100
No need to reinvent the wheel.

I think splice would be simpler and cleaner:
use strict;
use warnings;
my #array = (1 .. 1100);
my $batch_size = 500;
while (#array) {
my #batch = splice(#array, 0, $batch_size);
# ...
}

Keep in mind that perl's range operator start..end is inclusive both sides. 0..500 gives you: 0, 1, ..., 499, 500
Assuming you don't want to duplicate boundary elements (e.g. index 500 included in two sub arrays), try the following.
Summary:
change $end initialization to 499
inside the loop set $start = $end+1
when checking if the new end is overflowing use >= instead: $end + 500 >= $array_size
code
my #array = [prepopulated];
my $array_size = scalar(#array);
my $start = 0;
my $end = 499;
while ($start < $array_size) {
my #subArray = #array[$start..$end];
print Dumper(\#subArray);
$start = $end+1;
if($end + 500 >= $array_size) {
$end = $array_size-1;
} else {
$end += 500;
}
}

Creating a second array of indexes for natatime is extremely wasteful.
use List::Util qw( min );
for (my $start=0; $start<#array; $start+=500) {
my $end = min($start+500, 0+#array);
...
}

Related

Is there any function in Perl that shifts the array element without removing them?

I have an array, say #array1 = qw(abc def ghi jkl).
Now, I want to use this array in a way that elements are shifted 1 by 1, but that shifting takes place virtually, and not in the array.
Like, "shift" will shift the elements and remove them from the array. But, I don't want those elements to be removed.
Short Code Snippet:
while (my $rName = shift #array1) {
my $bName = shift #array1 ;
## Do something now with the value
}
##And now, I want that I can use #array1 again with the original elements residing
How can it be implemented?
In Perl 5.36 you'll be able to do this:
for my ($rName, $bName) (#array1) { ... }
Use a C-style for loop and increment by two. $#foo is the index of the last element.
my #foo = 0 .. 5;
for (my $i = 0; $i <= $#foo; $i += 2){
my $r_name = $foo[$i];
my $b_name = $foo[$i+1];
}
If you wanted fancier-looking code, you could use natatime from List::MoreUtils on CPAN, which gives you an iterator that you can use in a while loop.
use List::MoreUtils 'natatime';
my #foo = 0 .. 5;
my $it = natatime 2, #foo;
while ( my ($r_name, $b_name) = $it->() ) {
print "$r_name $b_name\n";
}
You can also use pairs from the core List::Util module:
A convenient shortcut to operating on even-sized lists of pairs, this function returns a list of ARRAY references, each containing two items from the given list.
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;
use List::Util qw/pairs/;
my #array1 = qw/a 1 b 2 c 3/;
for my $pair (pairs #array1) {
my ($rName, $bName) = #$pair;
say "$rName => $bName";
}
say "#array1";
# Require Perl 5.36
for my ( $rName, $bName ) ( #array1 ) {
...
}
my #array2 = #array1;
while ( #array2 ) {
my $rName = shift #array2;
my $bName = shift #array2;
...
}
for my $i ( 0 .. $#array1/2 ) {
my $rName = $array1[ $i * 2 + 0 ];
my $bName = $array1[ $i * 2 + 1 ];
...
}
for ( my $i = 0; $i < #array1; ) {
my $rName = $array1[ $i++ ];
my $bName = $array1[ $i++ ];
...
}
use List::Util qw( pairs );
for ( pairs #array1 ) {
my ( $rName, $bName ) = #$_;
...
}
I tried to organize them from fastest to slowest, but I didn't actually benchmark anything.
Three of the solutions were previously mentioned, but noone mentioned what I think is the fastest solution that doesn't require 5.36 (which hasn't been released yet).

Create Multi Dimensional Hash Using Array Elements

I need to create multidimensional hashes with varying depth using array elements as keys. Pseudocode attempt:
Example line:
Statement Coverage for instance Hierarchical/path/with/block/and/module ...
if (/Statement Coverage for instance (.&?) /)
{
#array = split /\//, $1;
}
for (my $eye = 1; $eye <= $#array; $eye++)
{
A miracle happens to %hash!
}
$hash{"path"}{"with"}{"block"}{"and"} now has a value of "module". Remember, the number of keys can vary. Any ideas?
That's what Data::Diver does for you:
my #array = split /\//, $1;
DiveVal(\ my %hash, #array[ 0 .. $#array - 1 ]) = $array[-1];
print Dumper(\%hash);
See my code below. It builds the desired hash recursively.
But I think that you are taking a wrong approach. I obviously don't know what exactly you are trying to achieve, but seems to me, that you should use tree data structure instead of the multidimensional hash.
use strict;
use warnings;
use v5.10;
use Data::Dumper;
my #data = (
'some/path/test',
'some/path/deeper/test',
'another/deeper/path/test',
);
my $resultHr = {};
foreach my $path (#data) {
my #elems = split /\//, $path;
buildHash($resultHr, #elems);
}
say Dumper($resultHr);
sub buildValue {
my $n = shift;
if (#_) {
return {$n => buildValue(#_)};
}
else {
return $n;
}
}
sub buildHash {
my $hr = shift;
my $k = shift;
return unless $k;
if (exists $hr->{$k} && ref $hr->{$k}) {
buildHash($hr->{$k}, #_);
}
else {
$hr->{$k} = buildValue(#_);
}
}

Run Function On N Elements of Array at a Time

Let's say I have an array my #arr which is of hypothetical length 240 (but this length is variable). How could I go about running a function on each N elements of that array at a time?
For example, if N = 100, the function would first run on #arr[0..99], then #arr[100..199], and finally #arr[200..239].
My idea was to have some sort of loop which creates a temporary array of length N by joining N elements together, but these seems overly complicated.
while (#arr) {
f(splice(#arr, 0, 100));
}
Non-destructive version:
use List::Util qw( min );
for (my $i=0; $i<#a; $i+=100) {
f(#arr[$i .. min($i+99, $#arr)]);
}
You could also avoid destroying the original array cheaply by creating an array of aliases:
my $shadow = sub { \#_ }->(#arr);
while (#$shadow) {
f(splice(#$shadow, 0, 100));
}
You can use splice:
sub apply_f_to_n_elements {
my ($f, $n, #elements) = #_;
my #results;
while ( #elements ) {
my #batch = splice(#elements, 0, $n)
push #results,
$f->(#batch);
}
return #results;
}
Usage:
apply_f_to_n_elements(\&f, 100, #arr);
or
my #results = apply_f_to_n_elements(\&f, 100, #arr);
The following is optimized to avoid making two copies of every input (while still remaining non-destructive), and optimized to avoid collecting the results of the callback if they're going to be discarded anyway:
sub apply_f_to_n_elements {
my $f = shift;
my $n = shift;
if (wantarray) {
my #results;
while (#_) {
push #results, $f->(splice(#_, 0, $n));
}
return #results;
} else {
while (#_) {
$f->(splice(#_, 0, $n));
}
}
}
Using List::Util qw(min) works well.
However, if you want a more semantic solution, you can use List::MoreUtils qw(natatime).
use List::MoreUtils qw(natatime);
my $iter = natatime 100, #arr;
while (my #vals = $iter->()) {
f(#vals);
}

slicing out multiples of nth element

I am having a list which has more than 10k elements. I want to remove every third element.
For example,
#testlists = qw (helloworld sessions first.cgi login localpcs depthhashes.cgi search view macros plugins ...) ;
I want to remove first.cgi, depthhashses.cgi, macros and so on from original array. Grep function is little bit slower. Please suggest me a faster grep search or any other similar subroutines . Any help would be highly appreciated
I can think of a few solutions:
Grep on index divisibility
my $i = 0;
#testlist = grep { ++$i % 3 } #testlist;
repeated splicing
for (my $i = 2; $i < $#testlist; $i += 2) {
splice #testlist, $i, 1;
}
Copying with skipping
my #output;
# pre-extend the array for fewer reallocations
$#output = #testlist * 2/3;
#output = ();
# FIXME annoying off-by one errors at the end that can add one undef
for (my $i = 0; $i < #testlist; $i += 3) {
push #output, #testlist[$i, $i+1];
}
Ikegami has corrected and optimized the copying solution in his remarkable answer.
A benchmark with 1,000-element lists declares splicing the clear winner:
Rate slice grep copy splice
slice 790/s -- -10% -18% -37%
grep 883/s 12% -- -8% -29%
copy 960/s 22% 9% -- -23%
splice 1248/s 58% 41% 30% --
(slice is mob's solution)
This is probably because it offloads most of the actual work into the C-level implementations, and avoids allocations and costly Perl-level operations.
With 10,000-element lists, the advantages shift towards the other solutions. Indeed, the splice solution has a very bad algorithmic complexity, because it moves all elements after all splicing position, meaning that the last element is moved nearly 3333 times:
Rate splice slice grep copy
splice 42.7/s -- -35% -42% -49%
slice 65.3/s 53% -- -12% -23%
grep 74.2/s 74% 14% -- -12%
copy 84.4/s 98% 29% 14% --
Here is the script I used for benchmarking.
amon's copy can be tweaked to be 30% faster!
my $i = 1;
my $j = 1;
while ($i < #a) {
$a[++$j] = $a[$i+=2];
$a[++$j] = $a[++$i];
}
$#a = $j-1 if #a>2;
And you can get even faster by avoid copying anything at all (though the result is a reference to an array). The longer the strings in each element, the better this will do.
my $i = 0;
my $ref = sub { \#_ }->( grep { ++$i % 3 } #a );
Results (10,000 elements):
>perl a.pl
Rate splice_a splice_r grep_a copy1_a copy1_r copy2_r copy2_a grep_r
splice_a 52.8/s -- -0% -51% -54% -56% -66% -66% -68%
splice_r 52.9/s 0% -- -51% -54% -55% -66% -66% -68%
grep_a 107/s 103% 103% -- -7% -10% -30% -31% -34%
copy1_a 115/s 118% 117% 7% -- -3% -25% -26% -30%
copy1_r 119/s 125% 124% 11% 3% -- -23% -23% -27%
copy2_r 154/s 191% 190% 43% 34% 29% -- -0% -6%
copy2_a 154/s 192% 192% 44% 34% 30% 0% -- -6%
grep_r 163/s 209% 209% 52% 42% 37% 6% 6% --
Benchmark:
use strict;
use warnings;
use Benchmark qw( cmpthese );
my #testlist = qw( helloworld sessions first.cgi login localpcs depthhashes.cgi search view macros );
#testlist = ( #testlist ) x ( 10000 / #testlist );
sub grep_a { my #a = #testlist; my $i = 0; #a = grep { ++$i % 3 } #a; 1 }
sub copy1_a { my #a = #testlist;
my #b;
$#b = $#a; #b = (); # Does absolutely nothing in this benchmark because of optimisations in Perl.
for (my $i = 0; $i < #a; $i += 3) {
push #b, #a[$i, $i+1];
}
1
}
sub copy2_a { my #a = #testlist;
my $i = 1;
my $j = 1;
while ($i < #a) {
$a[++$j] = $a[$i+=2];
$a[++$j] = $a[++$i];
}
$#a = $j-1 if #a>2;
1
}
sub splice_a { my #a = #testlist;
for (my $i = 2; $i < $#a; $i += 2) {
splice #a, $i, 1;
}
1
}
sub grep_r { my $r = [ #testlist ]; my $i = 0; $r = sub { \#_ }->( grep { ++$i % 3 } #$r ); 1 }
sub copy1_r { my $r = [ #testlist ];
my #b;
$#b = $#$r; #b = (); # Does absolutely nothing in this benchmark because of optimisations in Perl.
for (my $i = 0; $i < #$r; $i += 3) {
push #b, #$r[$i, $i+1];
}
$r = \#b;
1
}
sub copy2_r { my $r = [ #testlist ];
my $i = 1;
my $j = 1;
while ($i < #$r) {
$r->[++$j] = $r->[$i+=2];
$r->[++$j] = $r->[++$i];
}
$#$r = $j-1 if #$r>2;
1
}
sub splice_r { my $r = [ #testlist ];
for (my $i = 2; $i < $#$r; $i += 2) {
splice #$r, $i, 1;
}
1
}
cmpthese(-3, {
grep_a => \&grep_a,
copy1_a => \&copy1_a,
copy2_a => \&copy2_a,
splice_a => \&splice_a,
grep_r => \&grep_r,
copy1_r => \&copy1_r,
copy2_r => \&copy2_r,
splice_r => \&splice_r,
});
Use an array slice.
#testlists = #testlists[ grep { ($_+1) % 3 } 0..$#testlists ];
I'm not sure what you mean about using grep but perhaps you mean something like this.
for $i (0 .. $#testlists) {
if (($i % 3) == 2) {
delete $testlists[$i];
}
}
# Demonstrate the values.
foreach $e (#testlists) {
print "$e\n";
}

Subroutine that takes average of one or more arrays

I'm working on a subroutine that takes the average of 1 or more arrays. I would like to do this without using a module.
use strict;
use warnings;
use List::Util 'sum';
my #w = (0, 2);
my #x = (1, 3);
my #y = (2, 2);
my #z = (1, 1);
# the average of these four arrays is (1,2) since
# (0+1+2+1)/4 = 1 and (2+3+2+1)/4 = 2
my #arrays = \(#w, #x, #y, #z);
my #avg;
# this is the way to do it using the module
for my $i (0..$#w) {
$avg[$i] = sum(map $_->[$i], #arrays) / #arrays;
}
print "#avg\n";
# my way of doing it without module
#avg;
for my $i (0..$#w) {
$avg[$i] = prod_sum(map $_->[$i], \#arrays) / #arrays;
}
print "#avg\n";
# subroutines
sub prod_sum{
my $o = $_[0];
my $arr_ref = $_[1];
my $array_ref;
foreach my $row (#$arr_ref){
foreach my $cell (#$row) {
push(#{ $array_ref }, $_);
}
}
my $sum = $o + the_sum($array_ref);
return $sum;
}
sub the_sum{
my $sum = 0;
for ( #{$_[0]} ) {
$sum += $_;
}
return $sum;
}
output
1 2
[pair of really random big numbers]
The first output is correct. It displays the average of all of the arrays. The second output is completely wrong. How do I do this without using a module?
I propose this solution:
use strict;
use warnings;
my #w = (0, 2);
my #x = (1, 3);
my #y = (2, 2);
my #z = (1, 1);
my #arrays = \(#w, #x, #y, #z);
my ($x, $y) = (0, 0);
foreach my $arr(#arrays) {
$x += $arr->[0];
$y += $arr->[1];
}
my #result = ( $x / #arrays, $y / #arrays);
print "(#result)", "\n"; # <---- prints (1 2)
You think sum is being passed two variables, it is not. It is only being passed an array. Modify your prod_sum to expect only an array (and replace \#arrays in the call of prod_sum to be just #arrays). Or you can use this:
sub sum {
return 0 if $#_ < 0;
my $head = shift;
return $head + sum(#_);
}
The above is a recursive subroutine that will sum an array.
Note: if your array has more then 100 element, use warnings will emit a deep recursion warning. For more on that topic, see here

Resources