Currently, I am using the following code to convert an irregular multidimensional array into a one-dimensional array.
use Data::Dumper;

my $array = [0,
        [1],
        2,
        [3, 4, 5],
        [6,
                [7, 8, 9 ],
        ],
        [10],
        11,
        ];
my @mylist;
getList($array);
print Dumper (\@mylist);

sub getList
{
        my $array = shift;
        return if (!defined $array);
        if (ref $array eq "ARRAY")
        {
                foreach my $i (@$array)
                {
                        getList($i);
                }
        }
        else
        {
                print "pushing $array\n";
                push (@mylist, $array);
        }
}
This is based on recursion: I check each element, and if the element is a reference to an array, I call the function recursively with that array.
Is there a better way to solve this kind of problem?
First of all your function should never return data by modifying a global variable. Return a list instead.
As for efficiency, Perl has a surprisingly large function call overhead. Therefore for large data structures I would prefer a non-recursive approach. Like so:
use Data::Dumper;

my $array = [
    0,
    [1],
    2,
    [3, 4, 5],
    [6, [7, 8, 9 ]],
    [10],
    11,
];
my @mylist = get_list($array);
print Dumper (\@mylist);

sub get_list {
    my @work = @_;
    my @result;
    while (@work) {
        my $next = shift @work;
        if (ref($next) eq 'ARRAY') {
            unshift @work, @$next;
        }
        else {
            push @result, $next;
        }
    }
    return @result;
}
Note that the formatting that I am using here matches the recommendations of perlstyle. We all know the futility of arguing the One True Brace Style. But at the least I'm going to suggest that you reduce your 8 space indent. There is research into this, and code comprehension has been shown to be improved with indents in the 2-4 space range. Read Code Complete for details. It doesn't matter where you are in that range for young people, but older programmers whose eyesight is going find 4 a better indent. Read Perl Best Practices for more on that.
Use CPAN. Do not worry about recursion overhead until you know it is a problem.
#!/usr/bin/perl
use strict;
use warnings;
use List::Flatten::Recursive;
my $array = [
0,
[1],
2,
[3, 4, 5],
[6, [7, 8, 9 ]],
[10],
11,
];
my @result = flat($array);
print join(", ", @result), "\n";
It's generally better to replace recursion with iteration. For general techniques, see chapter 5 of the book Higher-Order Perl (freely available). In this case:
my @stack = ($array);
my @flattened;
while (@stack) {
    my $first = shift @stack;
    if (ref($first) eq ref([])) {
        push @stack, @$first; # Use unshift to keep the "order"
    } else {
        push @flattened, $first;
    }
}
The reason it's better is that recursive implementations:
Risk running into stack overflow if there are too many nested levels
Are less efficient due to the cost of recursive calls
In general, this is the only way to do it.
You can optimize your code a little by calling getList() again only when you encounter an array reference. If you find a regular value, you can push it directly onto @mylist instead of rerunning getList().
I've used this in the past. The code is shown on the command line, but you can put the part between the single quotes into your .pl file.
$ perl -le'
use Data::Dumper;
my @array = ( 1, 2, 3, [ 4, 5, 6, [ 7, 8, 9 ] ], [ 10, 11, 12, [ 13, 14, 15 ] ], 16, 17, 18 );
sub flatten { map ref eq q[ARRAY] ? flatten( @$_ ) : $_, @_ }
my @flat = flatten @array;
print Dumper \@flat;
'
This question is different from this one.
I have an array of arrays of AR items looking something like:
[[1,2,3], [4,5,6], [7,8,9], [7,8,9], [1,2,3], [7,8,9]]
I would like to sort it by the number of occurrences of each sub-array, most frequent first:
[[7,8,9], [1,2,3], [4,5,6]]
My real data is more complex, looking something like:
raw_data = {}
raw_data[:grapers] = []
suggested_data = {}
suggested_data[:grapers] = []
varietals = []
similar_vintage.varietals.each do |varietal|
  # sub_array
  varietals << Graper.new(:name => varietal.grape.name, :grape_id => varietal.grape_id, :percent => varietal.percent)
end
raw_data[:grapers] << varietals
So, I want to sort raw_data[:grapers] by the number of occurrences of each varietals array, comparing the grape_id values inside them.
When I need to sort a plain array of data by number of occurrences, I do this:
grapers_with_frequency = raw_data[:grapers].inject(Hash.new(0)) { |h,v| h[v] += 1; h }
suggested_data[:grapers] << raw_data[:grapers].max_by { |v| grapers_with_frequency[v] }
This code doesn't work because there are sub-arrays here, containing AR models that I need to analyze.
Possible solution:
array.group_by(&:itself) # grouping
.sort_by {|k, v| -v.size } # sorting
.map(&:first) # optional step, depends on your real data
#=> [[7, 8, 9], [1, 2, 3], [4, 5, 6]]
I recommend you take a look at the Ruby documentation for the sort_by method. It allows you to sort an array using anything associated with the elements, rather than the values of the elements.
my_array.sort_by { |elem| -my_array.count(elem) }.uniq
=> [[7, 8, 9], [1, 2, 3], [4, 5, 6]]
This example sorts by the count of each element in the original array. This is preceded with a minus so that the elements with the highest count are first. The uniq is to only have one instance of each element in the final result.
You can include anything you like in the sort_by block.
As Ilya has pointed out, having my_array.count(elem) in each iteration will be costlier than using group_by beforehand. This may or may not be an issue for you.
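As one illustration of putting something else in the block, the sketch below breaks ties between equally frequent elements by comparing the elements themselves. The tie-breaking rule and the sample data are assumptions added for the example; they are not part of the original question.

```ruby
my_array = [[4, 5, 6], [1, 2, 3], [7, 8, 9], [7, 8, 9]]

# The sort key is a pair: descending count first, then the element itself,
# so elements with equal counts come out in ascending order.
sorted = my_array.uniq.sort_by { |elem| [-my_array.count(elem), elem] }
```

Calling uniq before sort_by means count runs once per distinct element rather than once per element.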
arr = [[1,2,3], [4,5,6], [7,8,9], [7,8,9], [1,2,3], [7,8,9]]
arr.each_with_object(Hash.new(0)) { |a,h| h[a] += 1 }.
sort_by(&:last).
reverse.
map(&:first)
#=> [[7,8,9], [1,2,3], [4,5,6]]
This uses the form of Hash::new that takes an argument (here 0) that is the hash's default value.
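For the real data in the question, where each sub-array holds model objects rather than plain numbers, the same grouping idea might be applied to a key derived from each sub-array. This is only a sketch under stated assumptions: Graper is replaced by a hypothetical Struct stand-in, the sample values are made up, and two sub-arrays are treated as "the same" when their sorted grape_id lists match.

```ruby
# Hypothetical stand-in for the Graper model from the question.
Graper = Struct.new(:name, :grape_id, :percent)

grapers = [
  [Graper.new("Merlot", 1, 60), Graper.new("Syrah", 2, 40)],
  [Graper.new("Pinot Noir", 3, 100)],
  [Graper.new("Syrah", 2, 50), Graper.new("Merlot", 1, 50)],
]

# Assumption: a sub-array is identified by its sorted list of grape_ids.
signature = ->(varietals) { varietals.map(&:grape_id).sort }

# Group sub-arrays by signature, put the largest group first, and keep
# one representative sub-array per group.
sorted = grapers.group_by(&signature)
                .sort_by { |_sig, group| -group.size }
                .map { |_sig, group| group.first }

most_common = sorted.first
```

Since only grape_id is read from each element, the same approach should carry over to ActiveRecord objects, though that has not been tested here.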
I am trying to create a subroutine that does the following:
Takes two arrays as input (Filter, Base)
Outputs only the values of the second array that do not exist in the first
Example :
@a = ( 1, 2, 3, 4, 5 );
@b = ( 1, 2, 3, 4, 5, 6, 7);
Expected output : @c = ( 6, 7 );
Called as : filter_list(@filter, @base)
###############################################
sub filter_list {
    my @names = shift;
    my @arrayout;
    foreach my $element (@_)
    {
        if (!($element ~~ @names )){
            push @arrayout, $element;
        }
    }
    return @arrayout
}
Test Run :
@filter = ( 'Tom', 'John' );
@array = ( 'Tom', 'John', 'Mary' );
@array3 = filter_list(@filter,@array);
print @array3;
print "\n";
Result :
JohnJohnMary
Can anyone help? Thank you.
You can't pass arrays to subs, only scalars. So when you do
my @filtered = filter_list(@filter, @base);
you are really doing
my @filtered = filter_list($filter[0], $filter[1], ..., $base[0], $base[1], ...);
As such, when you do
my @names = shift;
you are really doing
my @names = $filter[0];
which is obviously wrong.
The simplest solution is to pass references to the arrays.
my @filtered = filter_list(\@filter, \@base);
A hash permits an efficient implementation (O(N+M)).
sub filter_list {
    my ($filter, $base) = @_;
    my %filter = map { $_ => 1 } @$filter;
    return grep { !$filter{$_} } @$base;
}
Alternatively,
my @filtered = filter_list(\@filter, @base);
could be implemented as
sub filter_list {
    my $filter = shift;
    my %filter = map { $_ => 1 } @$filter;
    return grep { !$filter{$_} } @_;
}
What you're looking for is the difference of two sets. This, along with union, intersection, and a number of others, is a set operation. Rather than writing your own, there are plenty of modules for dealing with sets.
Set::Object is very fast and featureful. I'd avoid using the operator interface (ie. $set1 - $set2) as it makes the code confusing. Instead use explicit method calls.
use strict;
use warnings;
use v5.10;
use Set::Object qw(set);
my $set1 = set(1, 2, 3, 4, 5);
my $set2 = set(1, 2, 3, 4, 5, 6, 7);
say join ", ", $set2->difference($set1)->members;
Note that sets are unordered and cannot contain duplicates. This may or may not be what you want.
This uses List::Compare, a module with a large collection of routines for comparing lists.
Here you want get_complement
use warnings;
use strict;
use List::Compare;
my @arr1 = ( 1, 2, 3, 4, 5 );
my @arr2 = ( 1, 2, 3, 4, 5, 6, 7);
my $lc = List::Compare->new(\@arr1, \@arr2);
my @only_in_second = $lc->get_complement;
print "@only_in_second\n";
The module has many options.
If you don't need the result sorted, pass -u to the constructor for faster operation.
There is also the "Accelerated Mode", obtained by passing -a. For the purpose of efficient repeated comparisons between the same arrays many things are precomputed at construction. With this flag that is suppressed, which speeds up single comparisons. See List::Compare Modes.
These two options can be combined, List::Compare->new('-u', '-a', \@a1, \@a2).
Operations on three or more lists are supported.
There is also the functional interface, as a separate List::Compare::Functional module.
In Ruby, is there a short and sweet way to sort this hash of arrays by score descending:
scored = {:id=>[1, 2, 3], :score=>[8.3, 5, 10]}
so it looks like this?:
scored = {:id=>[3, 1, 2], :score=>[10, 8.3, 5]}
I couldn't find an example where I can sort arrays within a hash like this. I could do this with some nasty code, but I feel like there should be a one- or two-liner that does it.
You could use sort_by
scored = {:id=>[1, 2, 3], :score=>[8.3, 5, 10]}
scored.tap do |s|
  s[:id] = s[:id].sort_by.with_index{ |a, i| -s[:score][i] }
  s[:score] = s[:score].sort_by{ |a| -a }
end
#=> {:id=>[3, 1, 2], :score=>[10, 8.3, 5]}
order = scored[:score].each_with_index.sort_by(&:first).map(&:last).reverse
#=> [2,0,1]
scored.update(scored) { |_,a| a.values_at *order }
#=> {:id=>[3, 1, 2], :score=>[10, 8.3, 5]}
If scored is not to be mutated, replace update with merge.
Some points:
Computing order makes it easy for the reader to understand what's going on.
The second line uses the form of Hash#merge that employs a block to determine the values of keys that are present in both hashes being merged (which here is all keys). This is a convenient way to modify hash values (generally), in part because the new hash is returned.
I sorted then reversed, rather than sorted by negated values, to make the method more robust. (That is, the elements of the arrays that are the values can be from any class that implements <=>).
With Ruby 2.2+, another way to sort an array arr in descending order is to use Enumerable#max_by with a block: arr.max_by(arr.size, &:itself).
The first line could be replaced with:
arr = scored[:score]
order = arr.each_index.sort_by { |i| arr[i] }.reverse
#=> [2,0,1]
Here is one possible solution. It has an intermediate step, where it utilizes a zipped version of the scores object, but produces the correct output:
s = scored.values.inject(&:zip).sort_by(&:last).reverse
#=> [[3, 10], [1, 8.3], [2, 5]]
result = { id: s.map(&:first), score: s.map(&:last) }
#=> { :id => [3, 1, 2], :score => [10, 8.3, 5] }
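When more than two parallel arrays need to stay aligned, the index-order idea can be wrapped in a small helper. The following is only a sketch: sort_columns_by is a hypothetical name, the extra :label array is made-up sample data, every array in the hash is assumed to have the same length, and transform_values needs Ruby 2.4 or later.

```ruby
# Hypothetical helper: reorder every array in the hash according to the
# descending order of the array stored under `key`. Returns a new hash
# and leaves the original untouched.
def sort_columns_by(hash, key)
  column = hash.fetch(key)
  # Indices of `column`, ordered from the largest value to the smallest.
  order = column.each_index.sort_by { |i| column[i] }.reverse
  hash.transform_values { |values| values.values_at(*order) }
end

scored = { id: [1, 2, 3], score: [8.3, 5, 10], label: %w[a b c] }
sorted = sort_columns_by(scored, :score)
```

Sorting and then reversing, rather than negating the values, keeps the helper usable for any values that implement <=>.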
There is a question very similar to this already but I would like to do this for multiple arrays. I have an array of arrays.
my @AoA = (
$arr1 = [ 1, 0, 0, 0, 1 ],
$arr2 = [ 1, 1, 0, 1, 1 ],
$arr3 = [ 2, 0, 2, 1, 0 ]
);
I want to sum the items of all the three (or more) arrays to get a new one like
( 4, 1, 2, 2, 2 )
The pairwise function from List::MoreUtils (use List::MoreUtils qw/pairwise/) requires two array arguments.
@new_array = pairwise { $a + $b } @$arr1, @$arr2;
One solution that comes to mind is to loop through @AoA and pass the first two arrays into the pairwise function. In subsequent iterations, I would pass the next @$arr in @AoA together with @new_array into the pairwise function. In the case of an odd-sized array of arrays, after I've passed in the last @$arr in @AoA, I would pass in an equal-sized array of zeros.
Is this a good approach? And if so, how do I implement it? Thanks.
You can easily implement an “n-wise” function:
sub nwise (&@) # ← take a code block, and any number of further arguments
{
    my ($code, @arefs) = @_;
    return map {$code->( do{ my $i = $_; map $arefs[$_][$i], 0 .. $#arefs } )}
               0 .. $#{$arefs[0]};
}
That code is a bit ugly because Perl does not support slices of multidimensional arrays. Instead I use nested maps.
A quick test:
use Test::More;
my @a = (1, 0, 0, 0, 1);
my @b = (1, 1, 0, 1, 1);
my @c = (2, 0, 2, 1, 0);
is_deeply [ nwise { $_[0] + $_[1] + $_[2] } \@a, \@b, \@c], [4, 1, 2, 2, 2];
I prefer passing the arrays as references instead of using the \@ or + prototype: This allows you to do
my @arrays = (\@a, \@b, \@c);
nwise {...} @arrays;
From List::MoreUtils you could have also used each_arrayref:
use List::Util qw/sum/;
use List::MoreUtils qw/each_arrayref/;
my $iter = each_arrayref @arrays;
my @out;
while (my @vals = $iter->()) {
    push @out, sum @vals;
}
is_deeply \@out, [4, 1, 2, 2, 2];
Or just plain old loops:
my @out;
for my $i (0 .. $#a) {
    my $accumulator = 0;
    for my $array (@arrays) {
        $accumulator += $array->[$i];
    }
    push @out, $accumulator;
}
is_deeply \@out, [4, 1, 2, 2, 2];
The above all assumed that all arrays were of the same length.
A note on your snippet:
Your example of the array structure is of course legal Perl, which will even run as intended, but it would be best to leave out the inner assignments:
my @AoA = (
[ 1, 0, 0, 0, 1 ],
[ 1, 1, 0, 1, 1 ],
[ 2, 0, 2, 1, 0 ],
);
You might actually be looking for PDL, the Perl Data Language. It is a numerical array module for Perl. It has many functions for processing arrays of data. Unlike other numerical array modules for other languages it has this handy ability to use its functionality on arbitrary dimensions and it will do what you mean. Note that this is all done at the C level, so it is efficient and fast!
In your case you are looking for the projection method sumover which will take an N dimensional object and return an N-1 dimensional object created by summing over the first dimension. Since in your system you want to sum over the second we first have to transpose by exchanging dimensions 0 and 1.
#!/usr/bin/env perl
use strict;
use warnings;
use PDL;
my @AoA = (
[ 1, 0, 0, 0, 1 ],
[ 1, 1, 0, 1, 1 ],
[ 2, 0, 2, 1, 0 ],
);
my $pdl = pdl \@AoA;
my $sum = $pdl->xchg(0,1)->sumover;
print $sum . "\n";
# [4 1 2 2 2]
The return value from sumover is another PDL object; if you need a Perl list, you can use list:
print "$_\n" for $sum->list;
Here's a simple iterative approach. It probably will perform terribly for large data sets. If you want a better performing solution you will probably need to change the data structure, or look on CPAN for one of the statistical packages. The below assumes that all arrays are the same size as the first array.
$sum = 0;
@rv = ();
for ($y=0; $y < scalar @{$AoA[0]}; $y++) {
    for ($x=0; $x < scalar @AoA; $x++) {
        $sum += ${$AoA[$x]}[$y];
    }
    push @rv, $sum;
    $sum = 0;
}
print '('.join(',',@rv).")\n";
Assumptions:
each row in your AoA will have the same number of columns as the first row.
each value in the arrayrefs will be a number (specifically, a value in a format that "works" with the += operator)
there will be at least one "row" with at least one "column"
Note: "$#{$AoA[0]}" means, "the index of the last element ($#) of the array that is the first arrayref in @AoA ({$AoA[0]})"
#!/usr/bin/perl
use strict;
use warnings;
my @AoA = (
    [ 1, 0, 0, 0, 1 ],
    [ 1, 1, 0, 1, 1 ],
    [ 2, 0, 2, 1, 0 ]
);
my @sums;
foreach my $column (0..$#{$AoA[0]}) {
    my $sum;
    foreach my $aref (@AoA){
        $sum += $aref->[$column];
    }
    push @sums,$sum;
}
use Data::Dumper;
print Dumper \@sums;
Pseudo code:
my @unsortedArray = { ["Harry", 10], ["Tim", 8], ["Joe", 3]};
my @sortedArray = ?????
Final sortedArray should be sorted based on col-2 (integers), taking care of the 1-to-1 relationship with the "Name of the person" (col-1). Final result should look like:
sortedArray should be { ["Joe", 3], ["Tim", 8], ["Harry", 10] };
You can give a predicate to sort, that is: a function which is evaluated to compare elements of the list.
my @unsorted = ( ["Harry", 10], ["Tim", 8], ["Joe", 3] );
my @sorted = sort { $a->[1] <=> $b->[1] } @unsorted;
In the predicate (the expression in curly braces), $a and $b are the elements of the outer list which are compared.
sort is only concerned with one-dimensional lists, so it won't mess with the internal structure of the elements of the outer list. So the relationship between name and number is retained effortlessly.
Refer to perldoc -f sort and perldoc perlop for more details.
A more efficient solution, especially for larger arrays, may be to use List::UtilsBy::nsort_by:
use List::UtilsBy qw( nsort_by );
my @unsorted = ( ["Harry", 10], ["Tim", 8], ["Joe", 3] );
my @sorted = nsort_by { $_->[1] } @unsorted;
While in small cases the overhead is not likely to be noticed, for more complex key functions the cost of extracting the key O(n log n) times becomes higher, and it is preferable to extract the "sort key" of each value only once, which is what nsort_by does.