Perl reorder an array of hashes based on content

I have an array of hashes and want to reorder it, moving the first entry that matches a criterion to the front of the array.
Using the first function from List::Util I can identify the entry I want to be first. How can I make the found entry the first element of the AoH?
@Borodin
An example of what the data looks like:
CAT1 => 'Foo', CAT2 => 'BAR', TITLE => 'test1',
CAT1 => 'BAZ', CAT2 => 'BAR', TITLE => 'test2',
.....
It has many entries. I wish to find the first entry (there could be more than one) where CAT1 is BAZ and CAT2 is BAR and move it to be the first item in the AoH.

Without realistic sample data it is hard to help.
You may sort the values of a list according to any computable criterion using Perl's sort operator, which takes an optional block or subroutine name as its first argument, ahead of the list to be sorted.
The library List::UtilsBy provides operators such as sort_by that will probably provide a speed advantage if the sort criterion is a complex one.
The program below sets up the data you've given and dumps it using Data::Dump.
Then I've used first_index from List::MoreUtils, which finds the index of the first element of the array that conforms to your criteria:
$_->{CAT1} eq 'BAZ' and $_->{CAT2} eq 'BAR'
After that, an unshift together with a splice removes that element and puts it at the front of the array. There's a check that $i isn't zero to avoid moving an item that's already at the start of the array.
Finally, another call to dd shows that the matching item has been moved.
use strict;
use warnings 'all';
use List::MoreUtils 'first_index';
use Data::Dump;
my @data = (
    {
        CAT1  => 'Foo',
        CAT2  => 'BAR',
        TITLE => 'test1',
    },
    {
        CAT1  => 'BAZ',
        CAT2  => 'BAR',
        TITLE => 'test2',
    }
);
dd \@data;
my $i = first_index {
    $_->{CAT1} eq 'BAZ' and $_->{CAT2} eq 'BAR'
} @data;
die if $i < 0;
unshift @data, splice @data, $i, 1 unless $i == 0;
dd \@data;
output
[
{ CAT1 => "Foo", CAT2 => "BAR", TITLE => "test1" },
{ CAT1 => "BAZ", CAT2 => "BAR", TITLE => "test2" },
]
[
{ CAT1 => "BAZ", CAT2 => "BAR", TITLE => "test2" },
{ CAT1 => "Foo", CAT2 => "BAR", TITLE => "test1" },
]

To move the first matching entry to the start:
use List::MoreUtils qw( first_index );
my $i = first_index { matches($_) } @aoh;
# first_index returns -1 when nothing matches; without the guard,
# splice would then move the last element to the front.
unshift @aoh, splice(@aoh, $i, 1) if $i > 0;
To move all matching entries to the start:
use sort 'stable';
@aoh =
    sort {
        my $a_matches = matches($a);
        my $b_matches = matches($b);
        ( $a_matches ? 0 : 1 ) <=> ( $b_matches ? 0 : 1 )
    }
    @aoh;
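Moving all matching entries to the front can also be done without sort. The following is a minimal sketch, not part of either answer, that partitions the array with two grep passes; the sample rows test3 and test4 and the body of matches() are assumptions added for illustration, with matches() standing in for the predicate used in the answer above.

```perl
use strict;
use warnings;

# Sample data; test3 and test4 are hypothetical extra rows.
my @aoh = (
    { CAT1 => 'Foo', CAT2 => 'BAR', TITLE => 'test1' },
    { CAT1 => 'BAZ', CAT2 => 'BAR', TITLE => 'test2' },
    { CAT1 => 'Foo', CAT2 => 'QUX', TITLE => 'test3' },
    { CAT1 => 'BAZ', CAT2 => 'BAR', TITLE => 'test4' },
);

# Assumed predicate: true for entries that should move to the front.
sub matches {
    my ($entry) = @_;
    return $entry->{CAT1} eq 'BAZ' && $entry->{CAT2} eq 'BAR';
}

# grep keeps the original relative order, so the partition is stable.
my @front = grep {  matches($_) } @aoh;
my @rest  = grep { !matches($_) } @aoh;
@aoh = (@front, @rest);

my $titles = join ' ', map { $_->{TITLE} } @aoh;
print "$titles\n";   # test2 test4 test1 test3
```

This calls the predicate twice per element, which is usually acceptable; it avoids relying on the stability of sort.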

Related

Looping through an array, displaying elements that match a criteria

I have this big array that I need to break down and only display specific elements within it that match a criteria.
My array looks like this.
[
{
:id => 9789,
:name => "amazing location",
:priority => 1,
:address_id => 12697,
:disabled => false
},
{
:id => 9790,
:name => "better location",
:priority => 1,
:address_id => 12698,
:disabled => false
},
{
:id => 9791,
:name => "ok location",
:priority => 1,
:address_id => 12699,
:disabled => true
}
]
What I need is to only display the elements within this array that have disabled set to true.
However when I try this, I get the error stating no implicit conversion of Symbol into Integer
array.map do |settings, value|
p hash[:disabled][:true]
end
I'm wondering if there is another way, or if there is a way to do this. If anyone could take a look, I would greatly appreciate it.
By giving the block passed to #map two parameters, you're actually getting each hash in the first parameter and nil in the second, when in reality you just want to loop over each element and select those where disabled is true. You can do that with Array#select, which returns all elements of the array for which the block returns a truthy value:
print array.select { |hash| hash[:disabled] }
=> [{:id=>9791, :name=>"ok location", :priority=>1, :address_id=>12699, :disabled=>true}]
You can try this with a short each or select.
a.each { |k,_v| puts k if k[:disabled] == true }
=> {:id=>9791, :name=>"ok location", :priority=>1, :address_id=>12699, :disabled=>true}
This iterates over each element (a hash) of the array, checks whether the value under the key :disabled is true, and prints that element with puts. The puts is just an example; do whatever you need with the element instead.
Or shorter:
puts a.select { |k,_v| k[:disabled] }
=> {:id=>9791, :name=>"ok location", :priority=>1, :address_id=>12699, :disabled=>true}
Your error shows up when you are treating an array or string as a Hash.
In PHP, array keys can be either numbers or strings, whereas in Ruby associative arrays are a separate data type, called a hash.
Here’s a cheatsheet for various foreach variants, translated into idiomatic Ruby:
Looping over a numeric array (PHP) :
<?php
$items = array( 'orange', 'pear', 'banana' );
# without indexes
foreach ( $items as $item ) {
echo $item;
}
# with indexes
foreach ( $items as $i => $item ) {
echo $i, $item;
}
Looping over an array (Ruby) :
items = ['orange', 'pear', 'banana']
# without indexes
items.each do |item|
puts item
end
# with indexes
items.each_with_index do |item, i|
puts i, item
end
Looping over an associative array (PHP) :
<?php
$continents = array(
'africa' => 'Africa',
'europe' => 'Europe',
'north-america' => 'North America'
);
# without keys
foreach ( $continents as $continent ) {
echo $continent;
}
# with keys
foreach ( $continents as $slug => $title ) {
echo $slug, $title;
}
Looping over a hash (Ruby):
continents = {
'africa' => 'Africa',
'europe' => 'Europe',
'north-america' => 'North America'
}
# without keys
continents.each_value do |continent|
puts continent
end
# with keys
continents.each do |slug, title|
puts slug, title
end
In Ruby 1.9 hashes were improved so that they preserved their internal order. In Ruby 1.8, the order in which you inserted items into a hash would have no correlation to the order in which they were stored, and when you iterated over a hash, the results could appear totally random. Now hashes preserve the order of insertion, which is clearly useful when you are using them for keyword arguments in method definitions. (thanks steenslag for correcting me on this)

perl: deep merge with per-element arrays merge

I'm trying to merge two hashes and Hash::Merge does almost exactly what I need, except for arrays. Instead of concatenating arrays I need it to do per-element merge.
For example:
use Hash::Merge qw (merge);
my %a = ( 'arr' => [ { 'a' => 'b' } ] );
my %b = ( 'arr' => [ { 'c' => 'd' } ] );
my %c = %{ merge( \%a, \%b) };
Desired result is ('arr'=>[{'a'=>'b','c'=>'d'}]), actual result is ('arr'=>[{'a'=>'b'},{'c'=>'d'}])
Can this be done by using specify_behavior or is there some other way?
I think that specify_behavior is used to specify how to handle conflicts or uneven structures when merging. The documentation doesn't actually say much. But try it: go through the predefined shortcuts, or try to set the behavior yourself. For your data structure you could try
SCALAR => ARRAY => sub { [ %{$_0}, %{$_[0]} ] }
SCALAR => ARRAY => HASH => sub { [ $_[0], $_[0] ] }
If you tried and it didn't work you may have found a bug in the module? By what you show it just didn't go "deep" enough. Here it is without the module. I've enlarged your sample structures.
use warnings;
use strict;
my %a = (
    'arr1' => [ { a => 'A', a1 => 'A1' } ],
    'arr2' => [ { aa => 'AA', aa1 => 'AA1' } ]
);
my %b = (
    'arr1' => [ { b => 'B', b1 => 'B1' } ],
    'arr2' => [ { bb => 'BB', bb1 => 'BB1' } ]
);
# Copy top level, %a to our target %c
my %c;
@c{keys %a} = values %a;
# Iterate over hash keys, then through array
foreach my $key (sort keys %c) {
    my $arr_len = @{$c{$key}};
    foreach my $i (0..$arr_len-1) {
        my %hb = %{ ${$b{$key}}[$i] };
        # merge: add %b to %c
        @{ ${$c{$key}}[$i] }{keys %hb} = values %hb;
    }
}
# Print it out
foreach my $key (sort keys %c) {
    print "$key: ";
    my $arr_len = @{$c{$key}};
    foreach my $i (0..$arr_len-1) {
        my %hc = %{ ${$c{$key}}[$i] };
        print "$_ => $hc{$_}, " for sort keys %hc;
    }
    print "\n";
}
This prints the contents of %c (aligned manually here)
arr1: a => A, a1 => A1, b => B, b1 => B1,
arr2: aa => AA, aa1 => AA1, bb => BB, bb1 => BB1,
Code does not handle arrays/hashes of unequal sizes but checks can be added readily.
Another solution (that handles uneven hash elements in %a and %b).
my %c;
foreach my $key (keys %a, keys %b) {
    my $a_ref = $a{$key};
    my $b_ref = $b{$key};
    $c{$key} = { map %$_, @$a_ref, @$b_ref };
}
use Data::Dumper;
print Dumper \%c;
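For the per-element array merge the question asks about, a small recursive routine may be enough. The sketch below does not use Hash::Merge; merge_deep is a hypothetical helper name, and letting the right-hand value win for scalars or mismatched types is an assumption, not something stated in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a new structure combining $left and $right.
# Hashes are merged key by key, arrays element by element (by index).
sub merge_deep {
    my ($left, $right) = @_;

    if (ref $left eq 'HASH' and ref $right eq 'HASH') {
        my %out = %$left;
        for my $key (keys %$right) {
            $out{$key} = exists $out{$key}
                ? merge_deep($out{$key}, $right->{$key})
                : $right->{$key};
        }
        return \%out;
    }

    if (ref $left eq 'ARRAY' and ref $right eq 'ARRAY') {
        my $last = $#$left > $#$right ? $#$left : $#$right;
        my @out;
        for my $i (0 .. $last) {
            if    ($i > $#$left)  { push @out, $right->[$i] }
            elsif ($i > $#$right) { push @out, $left->[$i] }
            else                  { push @out, merge_deep($left->[$i], $right->[$i]) }
        }
        return \@out;
    }

    # Assumption: for scalars or mismatched types the right side wins.
    return $right;
}

my %a = ( 'arr' => [ { 'a' => 'b' } ] );
my %b = ( 'arr' => [ { 'c' => 'd' } ] );
my %c = %{ merge_deep(\%a, \%b) };

for my $key (sort keys %{ $c{arr}[0] }) {
    print "$key => $c{arr}[0]{$key}\n";
}
```

With the data from the question this yields a single array element holding both a => b and c => d. Arrays of unequal length keep the unmatched tail of the longer one.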

references in perl: hash of array to another array

I have a problem with referencing a hash in an array to another array.
I have an array @result which looks like this:
@result = (
{ "type" => "variable",
"s" => "NGDP",
"variable" => "NGDP" },
{"type" => "subject",
"s" => "USA",
"subject" => "USA",
"variable" => "NGDP" },
{ "type" => "colon",
"s" => ",",
"colon" => "," },
{ "type" => "subject",
"s" => "JPN",
"subject" => "JPN",
"variable" => "NGDP" },
{ "type" => "operator",
"s" => "+",
"operator => "+" },
{"type" => "subject",
"s" => "CHN",
"subject" => "CHN",
"variable" => "NGDP" },
);
I want to divide this array at the colons and push elements of the @result array to another array, so I wrote this script:
for ($i = 0; $i <= $#result; $i++) {
    if (defined $result[$i]{subject} or $result[$i]{operator} and not defined $result[$i]{colon}) {
        push @part_col, \%{$result[$i]};
    }
    elsif ($i == $#result) {
        push @part_col_all, \@part_col;
    }
    elsif (defined $result[$i]{colon}) {
        push @part_col_all, \@part_col;
        my @part_col;
    }
}
So what I need is that if I print out $part_col_all[0][0]{subject} the result will be "USA",
and for $part_col_all[1][0]{subject} will be "JPN",
and for $part_col_all[1][1]{operator} will be "+" etc.
My result for $part_col_all[0][0]{subject} is "USA"
and for $part_col_all[0][1]{subject} is "JPN" which should be in $part_col_all[1][0]{subject}.
The result for $part_col_all[0][3]{subject} is "CHN", while it should be in $part_col_all[1][2]{subject}.
I'm making an application that creates graphs from economic data based on a given input. The @result array is my preprocessed input, where I know which variable belongs to which country. If I get an input like GDP USA CAN, JPN+CHN I need to split it into GDP USA CAN and JPN+CHN. That's why I made a condition: if a colon is found, push everything in @part_col to the first element of @part_col_all, and then, at the end of the input, push JPN+CHN to the second element of @part_col_all.
So @part_col_all should look like this:
@part_col_all = (
(
{"type" => "subject",
"s" => "USA",
"subject" => "USA",
"variable" => "NGDP" },
{"type" => "subject",
"s" => "CAN",
"subject" => "CAN",
"variable" => "NGDP" },
),
(
{ "type" => "subject",
"s" => "JPN",
"subject" => "JPN",
"variable" => "NGDP" },
{ "type" => "operator",
"s" => "+",
"operator" => "+" },
{"type" => "subject",
"s" => "CHN",
"subject" => "CHN",
"variable" => "NGDP" },
)
);
I don't know what I'm doing wrong. Sorry if there are any basic mistakes, I'm a beginner. Thanks a lot.
First, you're missing a quote:
{ "type" => "operator",
"s" => "+",
"operator" => "+" },
^ missing
As for printing, you can do the following:
foreach my $part (@part_col) {
    print $part->{operator}."\n";
}
Or do whatever you want with the values inside the loop.
You should read the Perl Reference Tutorial to help you.
There's no sin in dereferencing to simplify your code:
my @part_col;
my @part_col_all;
for my $i ( 0..$#result ) {
    my %hash = %{ $result[$i] }; # Make it easy on yourself. Dereference
    if ( defined $hash{subject} or defined $hash{operator} and not defined $hash{colon} ) {
        push @part_col, \%hash; # or: push @part_col, $result[$i]
    }
}
Notice I changed the for from the three part setup you had to a cleaner and easier to understand way of stating it.
Looking closer at your data structure, I notice that $hash{type} will tell you whether or not $hash{operator}, $hash{subject}, or $hash{colon} is defined. Let's just use $hash{type} and simplify that if:
my @part_col;
my @part_col_all;
for my $i ( 0..$#result ) {
    my %hash = %{ $result[$i] }; # Make it easy on yourself. Dereference
    if ( $hash{type} eq "subject" or $hash{type} eq "operator" ) {
        push @part_col, \%hash; # or: push @part_col, $result[$i]
    }
}
In fact, since @result is just an array, I'll treat it like one. I'll use a simple for loop to go through each element of the array. Each element is a hash reference, so:
for my $hash_ref ( @result ) {
    my %hash = %{ $hash_ref };
    if ( $hash{type} eq "subject" or $hash{type} eq "operator" ) {
        push @part_col, \%hash;
    }
}
And further simplification, I can dereference and talk about a particular element of my hash all at once by using the -> syntax:
for my $hash_ref ( @result ) {
    if ( $hash_ref->{type} eq "subject" or $hash_ref->{type} eq "operator" ) {
        push @part_col, $hash_ref;
    }
}
I'm trying to understand the rest of your code:
    elsif ($i == $#result) {
        push @part_col_all, \@part_col;
    }
    elsif (defined $hash_ref->{colon}) {
        push @part_col_all, \@part_col;
        my @part_col;
    }
}
These pushes of @part_col onto @part_col_all confuse me. Exactly what are you trying to store in @part_col_all? Remember that \@part_col is a reference to the location in memory where @part_col is stored. You're pushing that same reference onto that array over and over, so every element of @part_col_all points at the same array. (The my @part_col; inside the elsif block doesn't help: it declares a new variable that goes out of scope at the closing brace and leaves the outer @part_col untouched.) Is that really what you want? I doubt it.
What you need to do is decide exactly what your data structure really represents. A data structure should have a solid definition. What does the data structure @part_col_all represent? What does the data structure $part_col_all[$i] represent? What does the data structure $part_col_all[$i]->[$j] represent? Without knowing this, it's very hard to answer the rest of your question.
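The point about pushing the same reference repeatedly can be seen in a few lines. This is an illustrative sketch only; the variable names @part, @all, @chunk and @all_copies are made up for the demonstration and do not come from the question.

```perl
use strict;
use warnings;

my @part;
my @all;

push @part, 'USA';
push @all, \@part;        # first reference to @part
push @part, 'JPN';
push @all, \@part;        # second reference to the very same array

# Both elements see every change made to @part.
my $count = scalar @{ $all[0] };
my $same  = $all[0] == $all[1];      # references compare equal numerically
print "$count\n";                                               # 2
print(($same ? "same array" : "different arrays"), "\n");       # same array

# Pushing an anonymous copy instead freezes the current contents.
my @all_copies;
my @chunk = ('USA');
push @all_copies, [@chunk];          # [ ... ] makes a new array each time
@chunk = ('JPN', 'CHN');
push @all_copies, [@chunk];
print "@{ $all_copies[0] }\n";       # USA
print "@{ $all_copies[1] }\n";       # JPN CHN
```

Taking a copy with [@chunk] before resetting the working array is one way to keep each group separate.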
Are you storing elements where the type is colon in one array and everything else in another array? Or are you storing everything in one array, and in another array, storing everything that's not a type colon?
Once I understand this, I can answer the rest of your question.
Addendum
Thank you for your reply, I will try it that way and post my results. It is really helpful. I updated my question with more information about the data structure of @part_col_all. I hope that you understand what I'm trying to explain; if not, I'll try again.
If I understand what you're doing, someone enters NGDP USA , JPN+CHN and that means you're comparing the NGDP of the United States against that of Japan and China combined.
It seems to me that you would want three separate variables:
$parameter - What you are measuring. (GDP, etc.)
@countries_set_1 - The first set of countries
@countries_set_2 - The second set of countries which you're comparing against the first set.
And, what you call the colon (which we would call a comma in the U.S.) as a separator between the first set of countries vs. the second set. Then, you'd simply go through a loop. It could be that the two arrays are merely two elements of the same array, and the sets of countries are array references. I imagine something like this:
my @input = ( 'GDP', 'USA', ',', 'JPN', 'CHN' ); # Compare the GDP of the USA with Japan and China together
my $parameter = shift @input; # Remove what you're measuring
my @country_sets; # An array of arrays
my $set = 0; # Which set you're on
for my $value ( @input ) {
    if ( $value eq "," ) {
        $set += 1; # Next set
        next;
    }
    push @{ $country_sets[$set] }, $value;
}
This would create a data structure like this:
@country_sets = (
    [
        'USA',
    ],
    [
        'JPN',
        'CHN',
    ],
);
No need for the complex @result since you're only going to have a single operation (GDP, etc.) for all involved.
However, I think I see what you want. We'll go with an array of arrays. Here's what I had before:
for my $hash_ref ( @result ) {
    if ( $hash_ref->{type} eq "subject" or $hash_ref->{type} eq "operator" ) {
        push @part_col, $hash_ref;
    }
}
We'll combine that and the code I offered right above which splits the countries into two sets:
my @country_sets; # An array of arrays
my $set = 0; # Which set you're on
for my $country_ref ( @result ) {
    next if $country_ref->{type} eq "variable"; # We don't want variables
    if ( $country_ref->{type} eq "colon" ) { # Switch to the other country set
        $set += 1;
        next;
    }
    push @{ $country_sets[$set] }, $country_ref;
}
The first few entries will go into $country_sets[0], which will be an array reference. After the colon (which isn't stored in either set), the second set of countries will go into $country_sets[1], which will be another array reference. Each of those arrays holds hash references:
@country_sets - Contains the input information split into two sets
$country_sets[$x] - A particular set of countries (and possibly operators)
$country_sets[$x]->[$y] - A particular country or operator
$country_sets[$x]->[$y]->{$key} - A particular value from a particular country
Where $x goes from 0 to 1. This will give you something like this:
$country_sets[0] = [
    {
        "type"     => "subject",
        "s"        => "USA",
        "subject"  => "USA",
        "variable" => "NGDP",
    },
];
$country_sets[1] = [
    {
        "type"     => "subject",
        "s"        => "JPN",
        "subject"  => "JPN",
        "variable" => "NGDP",
    },
    {
        "type"     => "operator",
        "s"        => "+",
        "operator" => "+",
    },
    {
        "type"     => "subject",
        "s"        => "CHN",
        "subject"  => "CHN",
        "variable" => "NGDP",
    },
];
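Putting the pieces together, here is a self-contained sketch of the split, using the @result data from the question (with the missing quote fixed). Starting a fresh anonymous array at each colon, rather than keeping a $set counter, is a small variation chosen here for illustration.

```perl
use strict;
use warnings;

my @result = (
    { type => 'variable', s => 'NGDP', variable => 'NGDP' },
    { type => 'subject',  s => 'USA',  subject  => 'USA', variable => 'NGDP' },
    { type => 'colon',    s => ',',    colon    => ',' },
    { type => 'subject',  s => 'JPN',  subject  => 'JPN', variable => 'NGDP' },
    { type => 'operator', s => '+',    operator => '+' },
    { type => 'subject',  s => 'CHN',  subject  => 'CHN', variable => 'NGDP' },
);

my @country_sets = ( [] );              # start with one empty set
for my $token (@result) {
    next if $token->{type} eq 'variable';   # not part of any set
    if ( $token->{type} eq 'colon' ) {
        push @country_sets, [];             # a colon opens the next set
        next;
    }
    push @{ $country_sets[-1] }, $token;    # append to the current set
}

print "$country_sets[0][0]{subject}\n";    # USA
print "$country_sets[1][0]{subject}\n";    # JPN
print "$country_sets[1][1]{operator}\n";   # +
print "$country_sets[1][2]{subject}\n";    # CHN
```

Because each set is its own anonymous array, no reference is reused between groups.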

Perl sort by hash value in array of hashes or hash of hashes

Can anybody tell me what I am doing wrong here? I have tried just about every possible combination of array / hash type and sort query I can think of and cannot seem to get this to work.
I am trying to sort the hash ref below by value1 :
my $test = {
'1' => { 'value1' => '0.001000', 'value2' => 'red'},
'2' => { 'value1' => '0.005000', 'value2' => 'blue'},
'3' => { 'value1' => '0.002000', 'value2' => 'green'},
'7' => { 'value1' => '0.002243', 'value2' => 'violet'},
'9' => { 'value1' => '0.001005', 'value2' => 'yellow'},
'20' => { 'value1' => '0.0010200', 'value2' => 'purple'}
};
Using this sort loop:
foreach (sort { $test{$a}->{'value1'} <=> $test{$b}->{'value1'} } keys \%{$test} ){
print "key: $_ value: $test->{$_}->{'value1'}\n"
}
I get:
key: 1 value: 0.001000
key: 3 value: 0.002000
key: 7 value: 0.002243
key: 9 value: 0.001005
key: 2 value: 0.005000
key: 20 value: 0.0010200
I have tried with integers and the same thing seems to happen.
I don't actually need to loop through the hash either; I just want it ordered for later use. It's easy to do with an array of hashes, but not so with a hash of hashes?
Don't call keys on a reference. Call it on the actual hash.
Also, this $test{$a}->, should be $test->{$a}, because $test is a hash reference.
foreach (sort { $test->{$a}{'value1'} <=> $test->{$b}{'value1'} } keys %{$test} ){
print "key: $_ value: $test->{$_}->{'value1'}\n"
}
If you had use strict; and use warnings; turned on, you would've gotten the following error to alert you to an issue:
Global symbol "%test" requires explicit package name
Just wanted to provide a source for the other answers, and a working code example. Like they said, you are calling keys with a hash reference for the argument. According to the documentation:
Starting with Perl 5.14, keys can take a scalar EXPR, which must
contain a reference to an unblessed hash or array. The argument will
be dereferenced automatically. This aspect of keys is considered
highly experimental. The exact behaviour may change in a future
version of Perl.
for (keys $hashref) { ... }
for (keys $obj->get_arrayref) { ... }
However this does work for me:
#!/usr/bin/perl
use strict;
use warnings;
my $test = {
'1' => { 'value1' => '0.001000', 'value2' => 'red'},
'2' => { 'value1' => '0.005000', 'value2' => 'blue'},
'3' => { 'value1' => '0.002000', 'value2' => 'green'},
'7' => { 'value1' => '0.002243', 'value2' => 'violet'},
'9' => { 'value1' => '0.001005', 'value2' => 'yellow'},
'20' => { 'value1' => '0.0010200', 'value2' => 'purple'}
};
foreach (sort { $test->{$a}->{'value1'} <=> $test->{$b}->{'value1'} } keys \%{$test} ) {
print "key: $_ value: $test->{$_}->{'value1'}\n"
}
Example:
matt@mattpc:~/Documents/test/10$ perl test.pl
key: 1 value: 0.001000
key: 9 value: 0.001005
key: 20 value: 0.0010200
key: 3 value: 0.002000
key: 7 value: 0.002243
key: 2 value: 0.005000
matt@mattpc:~/Documents/test/10$ perl --version
This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi
(with 88 registered patches, see perl -V for more detail)
That example passes a hash reference to keys, which I would not recommend: the feature was experimental and was removed again in Perl 5.24.
I'd recommend following the advice of the other answers: add use strict and use warnings, and dereference the hash reference to a hash, %{$test}.
It's simply keys %$test. The argument of keys must be a hash, not a hashref. \%{$test} is the same as $test, a ref.
And use $test->{$a}, not $test{$a}, as $test is a hash-ref, not a hash.
foreach (sort { $test->{$a}->{'value1'} <=> $test->{$b}->{'value1'} } keys %$test) {
print "key: $_ value: $test->{$_}->{'value1'}\n"
}
or a shorter form with some syntactic sugar: You can omit the additional arrows after the first one. And you don't have to quote string literal keys when addressing hashes.
foreach (sort { $test->{$a}{value1} <=> $test->{$b}{value1} } keys %$test) {
print "key: $_ value: $test->{$_}{value1}\n"
}
It usually helps a lot to turn on use warnings;, at least for debugging.
The only wrong thing I can spot is usage of hash ref \%{$test} where you should use hash %$test. keys work with that.
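Since the question mentions wanting the data "ordered for later use" rather than just printed, one option is to keep the sorted keys in an array, or to flatten the hash of hashes into an array of hashes. This is a sketch under that reading of the question; the names @ordered_keys and @ordered_rows and the added key field are assumptions for illustration.

```perl
use strict;
use warnings;

my $test = {
    '1'  => { 'value1' => '0.001000',  'value2' => 'red' },
    '2'  => { 'value1' => '0.005000',  'value2' => 'blue' },
    '3'  => { 'value1' => '0.002000',  'value2' => 'green' },
    '7'  => { 'value1' => '0.002243',  'value2' => 'violet' },
    '9'  => { 'value1' => '0.001005',  'value2' => 'yellow' },
    '20' => { 'value1' => '0.0010200', 'value2' => 'purple' },
};

# A hash has no order of its own, so store the order separately.
my @ordered_keys =
    sort { $test->{$a}{value1} <=> $test->{$b}{value1} } keys %$test;

# Or build an array of hashes, copying each inner hash and recording
# its original key (the leading + forces an anonymous hash).
my @ordered_rows = map { +{ key => $_, %{ $test->{$_} } } } @ordered_keys;

print "@ordered_keys\n";              # 1 9 20 3 7 2
print "$ordered_rows[0]{value2}\n";   # red
```

The array of hashes can then be passed around and iterated in order without sorting again.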

Sorting Hash of Hashes by value

I have the following data structure
my %HoH = {
'foo1' => {
'bam' => 1,
'zip' => 0,
},
'foo2' => {
'bam' => 0,
'zip' => 1,
'boo' => 1
}
};
I would like to sort KEY1 (foo1 or foo2) by the VALUE stored in 'zip' in order from greatest to least.
Here's how I'm doing it.
use strict; use warnings;
use Data::Dumper;
my @sorted;
foreach my $KEY1 (keys %HoH) {
    # sort KEY1 by the value 'zip' maps to in descending order
    @sorted = sort {$HoH{$KEY1}{'zip'}{$b} <=>
                    $HoH{$KEY1}{'zip'}{$a}} keys %HoH;
}
print Dumper(\@sorted);
I'm getting a weird warning: Reference found where even-sized list expected at test.pl line 6.
Also print Dumper(\@sorted); is printing
$VAR1 = [
'HASH(0x1b542a8)'
];
When it should be printing
$VAR1 = [
['foo2', 'foo1']
];
Since foo2 has 1 zip and foo1 has 0 zip.
%HoH is declared as a hash, but is defined as a hash reference. Use parentheses (...) instead of braces {...}.
You don't need to loop through the hash to sort it. Sort will take care of that.
If you sort {...} keys %HoH, then the special variables $a and $b represent the keys of %HoH as it performs the sort.
$a and $b are in reverse order because your expected result is in decreasing order. (Update: Oh, I just noticed that you had that in the first place.)
The zip value in the nested hash is $HoH{$KEY}{'zip'}, which is what you should sort by.
use strict;
use warnings;
use Data::Dumper;
my %HoH = (
'foo1' => {
'bam' => 1,
'zip' => 0,
},
'foo2' => {
'bam' => 0,
'zip' => 1,
'boo' => 1
}
);
my @sorted = sort {$HoH{$b}{'zip'} <=> $HoH{$a}{'zip'}} keys %HoH;
print Dumper \@sorted;
Note that the result of this code will give you an array:
$VAR1 = [
'foo2',
'foo1'
];
... not a nested array:
$VAR1 = [
['foo2', 'foo1']
];
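If some inner hashes might lack a zip entry, or several keys share the same value, the comparison can be extended. The sketch below is an assumption-laden variation on the answer: the foo3 entry is hypothetical, a missing zip is treated as 0, and ties are broken by key name.

```perl
use strict;
use warnings;

my %HoH = (
    'foo1' => { 'bam' => 1, 'zip' => 0 },
    'foo2' => { 'bam' => 0, 'zip' => 1, 'boo' => 1 },
    'foo3' => { 'bam' => 1 },            # hypothetical entry with no 'zip'
);

my @sorted = sort {
    # descending by zip, treating a missing value as 0 (an assumption)
    ( $HoH{$b}{zip} // 0 ) <=> ( $HoH{$a}{zip} // 0 )
        # ties broken by key name, ascending
        or $a cmp $b
} keys %HoH;

print "@sorted\n";   # foo2 foo1 foo3
```

The defined-or operator // needs Perl 5.10 or later; without the tie-breaker the relative order of foo1 and foo3 would depend on hash ordering.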
