Parametric sorting of a list of hashes

Parametric sorting of a list of hashes - arrays

My aim is to write a subroutine which takes in
An array of hashes
A list containing the sort order
Just to be clear - the keys may be anything. My example is just for reference.
Given an array containing a list of keys in their required sort order
my #aSortOrder = ( 'DELTA1_2', 'SET1', 'SET2' );
My idea is to form a string
$a->{DELTA1_2} <=> $b->{DELTA1_2} or $a->{SET1} <=> $b->{SET1} or $a->{SET2} <=> $b->{SET2}
and then execute it with eval.
Here's my code
my $paRecords = [
{ 'SET1' => 48265, 'DELTA1_2' => -1, 'SET2' => 48264 },
{ 'SET1' => 8328, 'DELTA1_2' => -29, 'SET2' => 8299 },
{ 'SET1' => 20, 'DELTA1_2' => 0, 'SET2' => 0 },
{ 'SET1' => 10, 'DELTA1_2' => 0, 'SET2' => 0 }
];
my #aSortOrder = ( 'DELTA1_2', 'SET1', 'SET2' );
my $pStr = '';
foreach ( #aSortOrder ) {
$pStr = $pStr . ' or $a->{' . $_ . '} <=> $b->{' . $_ . '}';
}
$pStr =~ s/^\s*or\s*//;
my #aSorted = sort { eval "$pStr"; } #$paRecords;
print Dumper \#aSorted;
output
$VAR1 = [
{
'SET1' => 8328,
'SET2' => 8299,
'DELTA1_2' => -29
},
{
'SET1' => 48265,
'SET2' => 48264,
'DELTA1_2' => -1
},
{
'SET2' => 0,
'DELTA1_2' => 0,
'SET1' => 10
},
{
'SET2' => 0,
'DELTA1_2' => 0,
'SET1' => 20
}
];
I guess that this is far from the ideal approach to solving the problem, so any pointer on how this problem could be better solved would be a great help.

Just create a sub that does the comparison.
sub custom_cmp {
my $keys = shift;
for my $key (#$keys) {
my $cmp = $_[0]{$key} <=> $_[1]{$key};
return $cmp if $cmp;
}
return 0;
}
my #aSorted = sort { custom_cmp(\#aSortOrder, $a, $b) } #$paRecords;
The above makes two sub calls for each comparison. If we generate the compare function, we can reduce that to one.
sub make_custom_cmp {
my #keys = #_;
return sub($$) {
for my $key (#keys) {
my $cmp = $_[0]{$key} <=> $_[1]{$key};
return $cmp if $cmp;
}
return 0;
};
}
my $cmp = make_custom_cmp(#aSortOrder);
my #aSorted = sort $cmp #$paRecords;
We could go one further and flatten the loop through code generation. This is what a "proper" eval-based solution would look like. However, this level of optimization is hardly needed.
sub make_custom_cmp {
my #keys = #_;
my #cmps;
for $i (0..$#keys) {
push #cmps, "\$_[0]{\$keys[$i]} <=> \$_[1]{\$keys[$i]}"
}
return eval("sub($$) { ".( join(" || ", #cmps) )."}");
}
my $cmp = make_custom_cmp(#aSortOrder);
my #aSorted = sort $cmp #$paRecords;
In fact, the following is probably the most performant solution:
my #aSorted =
map $paRecords->[ unpack('N', substr($_, -4))-0x7FFFFFFF ],
sort
map pack('N*', map $_+0x7FFFFFFF, #{ $paRecords->[$_] }{#aSortOrder}, $_),
0..$#$paRecords;

The block passed to sort may contain any amount of code. It is required only to evaluate to a negative number, zero, or a positive number according to whether $a should be considered to be less than, equal to, or great than $b
I agree with your decision to bundle this into a subroutine, so I have written sort_hashes_by_keys, which expects a reference to an array of hashes to be sorted, and a reference to an array of key strings. It returns a list of hashes sorted according to the list of keys
use strict;
use warnings 'all';
use Data::Dump 'dd';
my $records = [
{ SET1 => 48265, DELTA1_2 => -1, SET2 => 48264 },
{ SET1 => 8328, DELTA1_2 => -29, SET2 => 8299 },
{ SET1 => 20, DELTA1_2 => 0, SET2 => 0 },
{ SET1 => 10, DELTA1_2 => 0, SET2 => 0 }
];
my #sort_order = qw/ DELTA1_2 SET1 SET2 /;
my #sorted = sort_hashes_by_keys( $records, \#sort_order );
dd \#sorted;
sub sort_hashes_by_keys {
my ( $hashes, $order ) = #_;
sort {
my $cmp = 0;
for my $key ( #$order ) {
last if $cmp = $a->{$key} <=> $b->{$key};
}
$cmp;
} #$hashes;
}
output
[
{ DELTA1_2 => -29, SET1 => 8328, SET2 => 8299 },
{ DELTA1_2 => -1, SET1 => 48265, SET2 => 48264 },
{ DELTA1_2 => 0, SET1 => 10, SET2 => 0 },
{ DELTA1_2 => 0, SET1 => 20, SET2 => 0 },
]
Note that I strongly advise against both hungarian notation and camel case when naming your variables. Perl is not strictly typed, and it has sigils like $, # and % which indicate the type of every variable, so hungarian notation is superfluous at best, and also adds distracting and irrelevant noise. Also, by convention, capital letters are reserved for module names and global variables, so local identifiers should be in "snake case", i.e. lower-case letters and underscores. Many non-English speakers also find camel case difficult to parse

Well, you're quite right - using eval like that is a road to future pain.
The joy of 'sort' is that you can define a sort subroutine, that implicitly defines $a and $b and you can use whatever logic you desire to decide if it's a positive, negative or 'zero' comparison (equal). (e.g. like <=> or cmp do).
The trick here is - 'true' is anything non zero, so <=> you can test for 'true' to see if there's a comparison to be made ( 4 <=> 4 is 'false')
So if you're just working numerically (you'd need to test for 'alphanumeric' and use cmp in some cases there, but doesn't seem to apply to your data):
#!/usr/bin/env perl
use strict;
use warnings;
my $paRecords = [
{ 'SET1' => 48265, 'DELTA1_2' => -1, 'SET2' => 48264 },
{ 'SET1' => 8328, 'DELTA1_2' => -29, 'SET2' => 8299 },
{ 'SET1' => 20, 'DELTA1_2' => 0, 'SET2' => 0 },
{ 'SET1' => 10, 'DELTA1_2' => 0, 'SET2' => 0 }
];
#qw is 'quote-words' and just lets you space delimit terms.
#it's semantically the same as ( 'DELTA1_2', 'SET1', 'SET2' );
my #order = qw ( DELTA1_2 SET1 SET2 );
#note - needs to come after definition of `#order` but it can be re-written later as long as it's in scope.
#you can pass an order explicitly into the subroutine if you want though.
sub order_by {
for my $key (#order) {
#compare key
my $result = $a->{$key} <=> $b->{$key};
#return it and exit the loop if they aren't equal, otherwise
#continue iterating sort terms.
return $result if $result;
}
return 0; #all keys were similar, therefore return zero.
}
print join (",", #order), "\n";
foreach my $record ( sort {order_by} #$paRecords ) {
#use hash slice to order output in 'sort order'.
#optional, but hopefully clarifies what's going on.
print join (",", #{$record}{#order}), "\n";
}
This, given your data outputs:
DELTA1_2,SET1,SET2
-29,8328,8299
-1,48265,48264
0,10,0
0,20,0
Note, I've opted to use hash slice for your output, because otherwise hashes are unordered, and so your Dumper output will be inconsistent (randomly ordered fields).
If you need to be a little more dynamic about your ordering, you can pass it into the sort-sub:
#!/usr/bin/env perl
use strict;
use warnings;
sub order_by {
for my $key (#_) {
#compare key
my $result = $a->{$key} <=> $b->{$key};
#return it and exit the loop if they aren't equal, otherwise
#continue iterating sort terms.
return $result if $result;
}
return 0; #all keys were similar, therefore return zero.
}
my $paRecords = [
{ 'SET1' => 48265, 'DELTA1_2' => -1, 'SET2' => 48264 },
{ 'SET1' => 8328, 'DELTA1_2' => -29, 'SET2' => 8299 },
{ 'SET1' => 20, 'DELTA1_2' => 0, 'SET2' => 0 },
{ 'SET1' => 10, 'DELTA1_2' => 0, 'SET2' => 0 }
];
#qw is 'quote-words' and just lets you space delimit terms.
#it's semantically the same as ( 'DELTA1_2', 'SET1', 'SET2' );
my #order = qw ( DELTA1_2 SET1 SET2 );
print join( ",", #order ), "\n";
foreach my $record ( sort {order_by ( #order ) } #$paRecords ) {
#use hash slice to order output in 'sort order'.
#optional, but hopefully clarifies what's going on.
print join( ",", #{$record}{#order} ), "\n";
}

Related

Programmatically change sorting order in Perl

I'd like to give the user the possibility to change the sorting order (asc / desc) in a data structure. As far as I know, this is done changing the order of $a and $bin the code, but I'd like to programmatically change this to avoid redundant code.
I made a working example:
use 5.018;
use warnings;
# Supply any argument to change sorting order
my $sorting_direction = $ARGV[0];
my $data = {
'item1' => {
'min' => 4,
'size' => 825,
'max' => 256,
},
'item2' => {
'min' => 4,
'size' => 130,
'max' => 65,
},
};
if (defined $sorting_direction) {
foreach my $item (sort { $$data{$a}{'size'} <=> $$data{$b}{'size'} } keys %{$data} ) {
say "$item\t", $$data{$item}{'size'};
}
} else {
foreach my $item (sort { $$data{$b}{'size'} <=> $$data{$a}{'size'} } keys %{$data} ) {
say "$item\t", $$data{$item}{'size'};
}
}
Giving any parameter will change the sorting_direction. Can I do this without the if conditional?

As <=> has a value of -1, 0 or 1, you can multiply with -1 to get the opposite sorting order.
So if your $sorting_direction is 1 or -1 use
$sorting_direction * ( $$data{$a}{'size'} <=> $$data{$b}{'size'} )

A generic solution is to use different compare functions.
my %sorters = (
by_size_asc => sub { $data->{$a}{size} <=> $data->{$b}{size} },
by_size_desc => sub { $data->{$b}{size} <=> $data->{$a}{size} },
# ...
);
#ARGV
or die("usage\n");
my $sorter = $sorters{$ARGV[0]}
or die("Invalid sort function \"$ARGV[0]\".\n");
my #sorted_keys = sort $sorter keys(%$data);
You could also use different sort functions, such as when using the great Sort::Key module.
use Sort::Key qw( ikeysort rikeysort );
my %sorters = (
by_size_asc => sub { ikeysort { $data->{$_}{size} } #_ },
by_size_desc => sub { rikeysort { $data->{$_}{size} } #_ },
# ...
);
#ARGV
or die("usage\n");
my $sorter = $sorters{$ARGV[0]}
or die("Invalid sort function \"$ARGV[0]\".\n");
my #sorted_keys = $sorter->( keys(%$data) );

While it's always going to be slower because it's a full extra operation, if performance is not as much a concern as code cleanliness you could just reverse the list when the opposite sorting direction is chosen. Note that this would be slightly different in the case of sorting equal elements, as sort in Perl is normally stable (equal elements stay in the same order they originally were).
my #sorted = sort { $$data{$a}{'size'} <=> $$data{$b}{'size'} } keys %{$data};
#sorted = reverse #sorted if $reverse;

Convert array to multidimensional hash

My task is convert array, containing hash with x keys to x-1 dimensional hash.
Example:
use Data::Dumper;
my $arr = [
{
'source' => 'source1',
'group' => 'group1',
'param' => 'prm1',
'value' => 1,
},
{
'source' => 'source1',
'group' => 'group1',
'param' => 'prm2',
'value' => 2,
},
];
my $res;
for my $i (#$arr) {
$res->{ $i->{source} } = {};
$res->{ $i->{source} }{ $i->{group} } = {};
$res->{ $i->{source} }{ $i->{group} }{ $i->{param} } = $i->{value};
}
warn Dumper $res;
my $res_expected = {
'source1' => {
'group1' => {
'prm1' => 1, # wasn't added, why ?
'prm2' => 2
}
}
};
However it doesn't work as expected, 'prm1' => 1 wasn't added. What is wrong and how to solve this task ?

The problem is that you are assigning to the source even if something was there, and you lose it. Just do a ||= instead of = and you'll be fine.
Or even easier, just use the fact that Perl autovivifies and leave that out.
my $res;
for my $i (#$arr) {
$res->{ $i->{source} }{ $i->{group} }{ $i->{param} } = $i->{value};
}
warn Dumper $res;

The first 2 lines in the for loop are what is causing your problem. They assign a new hash reference each iteration of the loop (and erase what was entered in the previous iteration). In perl, there is no need to set a reference as you did. Just eliminate the first 2 lines and your data structure will be as you wish.
The method you chose only shows 'prmt' => 2 because that was the last item entered.

perl: deep merge with per-element arrays merge

I'm trying to merge two hashes and Hash::Merge does almost exactly what I need, except for arrays. Instead of concatenating arrays I need it to do per-element merge.
For example:
use Hash::Merge qw (merge);
my %a = ( 'arr' => [ { 'a' => 'b' } ] );
my %b = ( 'arr' => [ { 'c' => 'd' } ] );
my %c = %{ merge( \%a, \%b) };
Desired result is ('arr'=>[{'a'=>'b','c'=>'d'}]), actual result is ('arr'=>[{'a'=>'b'},{'c'=>'d'}])
Can this be done by using specify_behavior or is there some other way?

I think that specify_behaviour is used to specify how to handle conflicts, or uneven structures to merge. The documentation doesn't actually say much. But try it, go through defined shortcuts, or try to set them yourself. For your data structure you could try
SCALAR => ARRAY => sub { [ %{$_0}, %{$_[0]} ] }
SCALAR => ARRAY => HASH => sub { [ $_[0], $_[0] ] }
If you tried and it didn't work you may have found a bug in the module? By what you show it just didn't go "deep" enough. Here it is without the module. I've enlarged your sample structures.
use warnings;
use strict;
my %a = (
'arr1' => [ { a => 'A', a1 => 'A1' } ],
'arr2' => [ { aa => 'AA', aa1 => 'AA1' } ]
);
my %b = (
'arr1' => [ { b => 'B', b1 => 'B1' } ],
'arr2' => [ { bb => 'BB', bb1 => 'BB1' } ]
);
# Copy top level, %a to our target %c
my %c;
#c{keys %a} = values %a;
# Iterate over hash keys, then through array
foreach my $key (sort keys %c) {
my $arr_len = #{$c{$key}};
foreach my $i (0..$arr_len-1) {
my %hb = %{ ${$b{$key}}[$i] };
# merge: add %b to %c
#{ ${$c{$key}}[$i] }{keys %hb} = values %hb;
}
}
# Print it out
foreach my $key (sort keys %c) {
print "$key: ";
my $arr_len = #{$c{$key}};
foreach my $i (0..$arr_len-1) {
my %hc = %{ ${$c{$key}}[$i] };
print "$_ => $hc{$_}, " for sort keys %hc;
}
print "\n";
}
This prints the contents of %c (aligned manually here)
arr1: a => A, a1 => A1, b => B, b1 => B1,
arr2: aa => AA, aa1 => AA1, bb => BB, bb1 => BB1,
Code does not handle arrays/hashes of unequal sizes but checks can be added readily.

Another solution (that handles uneven hash elements in %a and %b).
my %c;
foreach my $key (keys %a, keys %b) {
my $a_ref = $a{$key};
my $b_ref = $b{$key};
$c{$key} = { map %$_, #$a_ref, #$b_ref };
}
use Data::Dumper;
print Dumper \%c;

Sorting Hash of Hashes by value

I have the following data structure
my %HoH = {
'foo1' => {
'bam' => 1,
'zip' => 0,
},
'foo2' => {
'bam' => 0,
'zip' => 1,
'boo' => 1
}
};
I would like to sort KEY1 (foo1 or foo2) by the VALUE stored in 'zip' in order from greatest to least.
Here's how I'm doing it.
use strict; use warnings;
use Data::Dumper;
my #sorted;
foreach my $KEY1 (keys %HoH) {
# sort KEY1 by the value 'zip' maps to in descending order
#sorted = sort {$HoH{$KEY1}{'zip'}{$b} <=>
$HoH{$KEY1}{'zip'}{$a}} keys %HoH;
}
print Dumper(\#sorted);
I'm getting an weird warning: Reference found where even-sized list expected at test.pl line 6.
Also print Dumper(\#sorted); is printing
$VAR1 = [
'HASH(0x1b542a8)'
];
When it should be printing
$VAR1 = [
['foo2', 'foo1']
];
Since foo2 has 1 zip and foo1 has 0 zip.

%HoH is declared as a hash, but is defined as a hashreference. Use parentheses (...) instead of braces {...}.
You don't need to loop through the hash to sort it. Sort will take care of that.
if you sort {...} keys %HoH, then the special variables $a and $b represent the keys of %HoH as it performs the sort.
$a and $b are in reverse order because your expected result is in decreasing order. (Update: Oh I just noticed that you had that in the first place.)
The zip value in the nested hash is $HoH{$KEY}{'zip'}, which is what you should sort by.
use strict;
use warnings;
use Data::Dumper;
my %HoH = (
'foo1' => {
'bam' => 1,
'zip' => 0,
},
'foo2' => {
'bam' => 0,
'zip' => 1,
'boo' => 1
}
);
my #sorted = sort {$HoH{$b}{'zip'} <=> $HoH{$a}{'zip'}} keys %HoH;
print Dumper \#sorted;
Note that the result of this code will give you an array:
$VAR1 = [
'foo2',
'foo1'
];
... not a nested array:
$VAR1 = [
['foo2', 'foo1']
];

How to push an Array of Hashes to Array of Hashes?

I have this script
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my %acc = ();
&insert_a(\%acc, 11);
&insert_p(\%acc, 111);
print Dumper %acc;
sub insert_a() {
my $acc_ref = shift;
$acc_ref->{"$_[0]"} = {
a => -1,
b => -1,
c => [ { }, ],
}
}
sub insert_p() {
my $acc_ref = shift;
my #AoH = (
{
d => -1,
e => -1,
}
);
push $acc_ref->{"$_[0]"}{"c"}, #AoH;
}
where I am trying to insert AoH into c which also is an AoH, but I am getting
Type of arg 1 to push must be array (not hash element) at ./push.pl line 36, near "#AoH;"
Execution of ./push.pl aborted due to compilation errors.
Any ideas how to do that?

The specific problem is that you can only push to an array, so you first need to dereference the array, and then, since it's in a larger data structure, you want to set its value to a reference.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my %acc = ();
# don't use & to call subs; that overrides prototypes (among other things)
# which you won't need to worry about, because you shouldn't be using
# prototypes here; they're used for something else in Perl.
insert_a(\%acc, 11);
insert_p(\%acc, 111);
# use \%acc to print as a nice-looking hashref, all in one variable
print Dumper \%acc;
# don't use () here - that's a prototype, and they're used for other things.
sub insert_a {
my $acc_ref = shift;
$acc_ref->{"$_[0]"} = {
a => -1,
b => -1,
c => [ { }, ],
}
}
# same here
sub insert_p {
my $acc_ref = shift;
my #AoH = (
{
d => -1,
e => -1,
}
);
# You need to dereference the first array, and pass it a reference
# as the second argument.
push #{ $acc_ref->{"$_[0]"}{"c"} }, \#AoH;
}
I'm not quite sure that the resulting data structure is what you intended, but now that you have the program working and can see the resulting structure, you can modify it to get what you need.

Hash values are always scalar, so to store an array in a hash you need to store a reference to the array. Try using the following line, where the hash value is dereferenced to an array.
push #{ $acc_ref->{$_[0]}->{'c'} }, #AoH;

Do it like,
push #{$acc_ref->{"$_[0]"}->{"c"}}, #AoH;
or you can try $acc_ref->{"$_[0]"}->{"c"} = \#AoH;
Your script,
use strict;
use warnings
use Data::Dumper;
my %acc = ();
&insert_a(\%acc, 11);
&insert_p(\%acc, 111);
print Dumper %acc;
sub insert_a() {
my $acc_ref = shift;
$acc_ref->{"$_[0]"} = {
a => -1,
b => -1,
c => [ { }, ],
}
}
sub insert_p() {
my $acc_ref = shift;
my #AoH = (
{
d => -1,
e => -1,
}
);
push #{$acc_ref->{"$_[0]"}->{"c"}}, #AoH;
}
Output:
$VAR1 = '11';
$VAR2 = {
'c' => [
{}
],
'a' => -1,
'b' => -1
};
$VAR3 = '111';
$VAR4 = {
'c' => [
{
'e' => -1,
'd' => -1
}
]
};

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Parametric sorting of a list of hashes - arrays

Related

Programmatically change sorting order in Perl

Convert array to multidimensional hash

perl: deep merge with per-element arrays merge

Sorting Hash of Hashes by value

How to push an Array of Hashes to Array of Hashes?

Categories

Resources