Perl: Getting difference between two arrays of hashes?

I have two array references that contain hashes:
$A = [
{
"t" => "1419054300000",
"v" => "28.1"
},
{
"t" => "1419053400000",
"v" => "28.2"
},
{
"t" => "1419052500000",
"v" => "28.4"
}
];
$B = [
{
"t" => "1419053400000",
"v" => "28.2"
},
{
"t" => "1419052500000",
"v" => "28.4"
}
];
I want to get only the hashes from $A where their value of t doesn't already exist in one of the hashes in $B (the t values are unique per arrayref, v isn't).
I assume there's some obvious method of doing this, but I've been banging my head against this all day without success.

You can use the perl5i diff method.
use perl5i::2;
...initialize $A and $B...
say $A->diff($B)->mo->as_perl;
__END__
[
{
't' => '1419054300000',
'v' => '28.1'
}
]

As always you can build hash look up where keys are elements you want to filter out,
my %seen;
@seen{ map $_->{t}, @$B } = ();
my $C = [
grep { !exists $seen{$_->{t}} } @$A
];
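For reference, here is a minimal self-contained sketch of the lookup-hash approach, using the sample data from the question. It assumes, as the question states, that the t values are unique within each array reference.

```perl
use strict;
use warnings;

# Sample data from the question.
my $A = [
    { t => "1419054300000", v => "28.1" },
    { t => "1419053400000", v => "28.2" },
    { t => "1419052500000", v => "28.4" },
];
my $B = [
    { t => "1419053400000", v => "28.2" },
    { t => "1419052500000", v => "28.4" },
];

# Hash slice: every t value found in $B becomes a key of %seen.
my %seen;
@seen{ map { $_->{t} } @$B } = ();

# Keep only the hashes from $A whose t is not a key of %seen.
my $C = [ grep { !exists $seen{ $_->{t} } } @$A ];

print "$_->{t} => $_->{v}\n" for @$C;    # prints: 1419054300000 => 28.1
```

Building the lookup hash once keeps the filter to a single pass over each array, instead of scanning $B for every element of $A.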

Related

Convert array to multidimensional hash

My task is convert array, containing hash with x keys to x-1 dimensional hash.
Example:
use Data::Dumper;
my $arr = [
{
'source' => 'source1',
'group' => 'group1',
'param' => 'prm1',
'value' => 1,
},
{
'source' => 'source1',
'group' => 'group1',
'param' => 'prm2',
'value' => 2,
},
];
my $res;
for my $i (@$arr) {
$res->{ $i->{source} } = {};
$res->{ $i->{source} }{ $i->{group} } = {};
$res->{ $i->{source} }{ $i->{group} }{ $i->{param} } = $i->{value};
}
warn Dumper $res;
my $res_expected = {
'source1' => {
'group1' => {
'prm1' => 1, # wasn't added, why ?
'prm2' => 2
}
}
};
However it doesn't work as expected, 'prm1' => 1 wasn't added. What is wrong and how to solve this task ?
The problem is that you are assigning to the source even if something was there, and you lose it. Just do a ||= instead of = and you'll be fine.
Or even easier, just use the fact that Perl autovivifies and leave that out.
my $res;
for my $i (@$arr) {
$res->{ $i->{source} }{ $i->{group} }{ $i->{param} } = $i->{value};
}
warn Dumper $res;
The first 2 lines in the for loop are what is causing your problem. They assign a new hash reference each iteration of the loop (and erase what was entered in the previous iteration). In perl, there is no need to set a reference as you did. Just eliminate the first 2 lines and your data structure will be as you wish.
The method you chose only shows 'prm2' => 2 because that was the last item entered.
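If you prefer to keep the explicit initialisation, the ||= variant mentioned above could look like the following sketch. It reuses the question's data; the printing loop at the end is only there for illustration.

```perl
use strict;
use warnings;

my $arr = [
    { source => 'source1', group => 'group1', param => 'prm1', value => 1 },
    { source => 'source1', group => 'group1', param => 'prm2', value => 2 },
];

my $res = {};
for my $i (@$arr) {
    # ||= assigns a fresh hash only when nothing is stored there yet,
    # so entries added by earlier iterations are kept.
    $res->{ $i->{source} } ||= {};
    $res->{ $i->{source} }{ $i->{group} } ||= {};
    $res->{ $i->{source} }{ $i->{group} }{ $i->{param} } = $i->{value};
}

for my $param ( sort keys %{ $res->{source1}{group1} } ) {
    print "$param => $res->{source1}{group1}{$param}\n";
}
```

Both prm1 and prm2 survive here, but the autovivifying one-liner is shorter and does the same job.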

Ruby pick up a value in hash of array to reformat into a hash

Is there a way I can pick a value in hash of array, and reformat it to be only hash?
Is there any method I can do with it?
Example
[
{
"qset_id" => 1,
"name" => "New1"
},
{
"qset_id" => 2,
"name" => "New2"
}
]
Result
{
1 => {
"name" => "New1"
},
2 => {
"name" => "New2"
}
}
You can do basically arbitrary manipulation using the reduce function on arrays or hashes; for example, this will get your result:
array.reduce({}) do |result, item|
result[item["qset_id"]] = { "name" => item["name"] }
result
end
You can do the same thing with each.with_object do:
array.each.with_object({}) do |item, result|
result[item["qset_id"]] = { "name" => item["name"] }
end
It's basically the same thing, but you don't have to make each iteration return the result (called a 'memo object').
You could iterate over the array (called h1 here) and map each element into a new hash:
h1.map{|h| {h['qset_id'] => {'name' => h['name']}} }
# => [{1=>{"name"=>"New1"}}, {2=>{"name"=>"New2"}}]
... but that would return an array. You could pull the elements into a second hash like this:
h2 = {}
h1.each do |h|
h2[h['qset_id']] = {'name' => h['name']}
end
>> h2
=> {1=>{"name"=>"New1"}, 2=>{"name"=>"New2"}}

perl: deep merge with per-element arrays merge

I'm trying to merge two hashes and Hash::Merge does almost exactly what I need, except for arrays. Instead of concatenating arrays I need it to do per-element merge.
For example:
use Hash::Merge qw (merge);
my %a = ( 'arr' => [ { 'a' => 'b' } ] );
my %b = ( 'arr' => [ { 'c' => 'd' } ] );
my %c = %{ merge( \%a, \%b) };
Desired result is ('arr'=>[{'a'=>'b','c'=>'d'}]), actual result is ('arr'=>[{'a'=>'b'},{'c'=>'d'}])
Can this be done by using specify_behavior or is there some other way?
I think that specify_behavior is used to specify how to handle conflicts, or uneven structures to merge. The documentation doesn't actually say much. But try it, go through the defined shortcuts, or try to set them yourself. For your data structure you could try
SCALAR => ARRAY => sub { [ %{$_0}, %{$_[0]} ] }
SCALAR => ARRAY => HASH => sub { [ $_[0], $_[0] ] }
If you tried and it didn't work you may have found a bug in the module? By what you show it just didn't go "deep" enough. Here it is without the module. I've enlarged your sample structures.
use warnings;
use strict;
my %a = (
'arr1' => [ { a => 'A', a1 => 'A1' } ],
'arr2' => [ { aa => 'AA', aa1 => 'AA1' } ]
);
my %b = (
'arr1' => [ { b => 'B', b1 => 'B1' } ],
'arr2' => [ { bb => 'BB', bb1 => 'BB1' } ]
);
# Copy top level, %a to our target %c
my %c;
@c{keys %a} = values %a;
# Iterate over hash keys, then through array
foreach my $key (sort keys %c) {
my $arr_len = @{$c{$key}};
foreach my $i (0..$arr_len-1) {
my %hb = %{ ${$b{$key}}[$i] };
# merge: add %b to %c
@{ ${$c{$key}}[$i] }{keys %hb} = values %hb;
}
}
# Print it out
foreach my $key (sort keys %c) {
print "$key: ";
my $arr_len = @{$c{$key}};
foreach my $i (0..$arr_len-1) {
my %hc = %{ ${$c{$key}}[$i] };
print "$_ => $hc{$_}, " for sort keys %hc;
}
print "\n";
}
This prints the contents of %c (aligned manually here)
arr1: a => A, a1 => A1, b => B, b1 => B1,
arr2: aa => AA, aa1 => AA1, bb => BB, bb1 => BB1,
Code does not handle arrays/hashes of unequal sizes but checks can be added readily.
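As a rough sketch of how such checks might look, the per-element merge can be written recursively using only core Perl. The name merge_deep is hypothetical (it is not part of Hash::Merge), and the rule that the right-hand value wins for scalars or mismatched types is an assumption made for illustration.

```perl
use strict;
use warnings;

# merge_deep is a hypothetical helper name, not part of Hash::Merge.
# Hashes are merged key by key and arrays index by index.
sub merge_deep {
    my ( $left, $right ) = @_;

    if ( ref $left eq 'HASH' && ref $right eq 'HASH' ) {
        my %out = %$left;    # shallow copy of the left hash
        for my $key ( keys %$right ) {
            $out{$key} = exists $left->{$key}
                ? merge_deep( $left->{$key}, $right->{$key} )
                : $right->{$key};
        }
        return \%out;
    }

    if ( ref $left eq 'ARRAY' && ref $right eq 'ARRAY' ) {
        my @out = @$left;    # shallow copy of the left array
        for my $i ( 0 .. $#$right ) {
            # Elements past the end of the left array are taken as-is.
            $out[$i] = $i <= $#$left
                ? merge_deep( $left->[$i], $right->[$i] )
                : $right->[$i];
        }
        return \@out;
    }

    # Scalars or mismatched types: assume the right-hand value wins if defined.
    return defined $right ? $right : $left;
}

my %a = ( arr => [ { a => 'b' } ] );
my %b = ( arr => [ { c => 'd' }, { e => 'f' } ] );
my $c = merge_deep( \%a, \%b );

# Number of keys in each merged array element.
print join( ', ', map { scalar keys %$_ } @{ $c->{arr} } ), "\n";
```

Here $c->{arr}[0] ends up with both a => 'b' and c => 'd', while the unmatched second element of %b is carried over unchanged. Values that are not merged are shared with the inputs rather than deep-copied.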
Another solution (that handles uneven hash elements in %a and %b).
my %c;
foreach my $key (keys %a, keys %b) {
my $a_ref = $a{$key};
my $b_ref = $b{$key};
$c{$key} = { map %$_, @$a_ref, @$b_ref };
}
use Data::Dumper;
print Dumper \%c;

How do I breakdown common elements in hash of arrays in perl?

I am trying to find any intersections of elements within a hash of arrays in Perl
For example
my %test = (
Lot1 => [ "A","B","C"],
Lot2 => [ "A","B","C"],
Lot3 => ["C"],
Lot4 => ["E","F"],
);
The result I would be after is
Lot1 and Lot2 have AB
Lot1,Lot2 and Lot3 have C
Lot4 has E and F.
I think this could be done with a recursive function that effectively moves its way through the arrays and if an intersection between two arrays is found it calls itself recursively with the intersection found and the next array. The stopping condition would be running out of arrays.
Once the function is exited I would have to iterate through the hash to get the arrays that contain these values.
Does this sound like a good approach? I have been struggling with the code, but was going to use List::Compare to determine the intersection.
Thank you.
Array::Utils has an intersection operation where you can test the intersect of two arrays. But that's only the start point of what you're trying to do.
So I would be thinking that you need to first invert your lookup:
my %member_of;
foreach my $key ( keys %test ) {
foreach my $element ( @{$test{$key}} ) {
push ( @{$member_of{$element}}, $key );
}
}
print Dumper \%member_of;
Giving:
$VAR1 = {
'A' => [
'Lot1',
'Lot2'
],
'F' => [
'Lot4'
],
'B' => [
'Lot1',
'Lot2'
],
'E' => [
'Lot4'
],
'C' => [
'Lot1',
'Lot2',
'Lot3'
]
};
Then collapse that, into a key set:
my %new_set;
foreach my $element ( keys %member_of ) {
my $set = join( ",", @{ $member_of{$element} } );
push( @{ $new_set{$set} }, $element );
}
print Dumper \%new_set;
Giving:
$VAR1 = {
'Lot1,Lot2,Lot3' => [
'C'
],
'Lot1,Lot2' => [
'A',
'B'
],
'Lot4' => [
'E',
'F'
]
};
So overall:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my %test = (
Lot1 => [ "A", "B", "C" ],
Lot2 => [ "A", "B", "C" ],
Lot3 => ["C"],
Lot4 => [ "E", "F" ],
);
my %member_of;
foreach my $key ( sort keys %test ) {
foreach my $element ( @{ $test{$key} } ) {
push( @{ $member_of{$element} }, $key );
}
}
my %new_set;
foreach my $element ( sort keys %member_of ) {
my $set = join( ",", @{ $member_of{$element} } );
push( @{ $new_set{$set} }, $element );
}
foreach my $set ( sort keys %new_set ) {
print "$set contains: ", join( ",", @{ $new_set{$set} } ), "\n";
}
I don't think there's a more efficient way to tackle it, because you're comparing each array to each other array, and forming a new compound key out of it.
This gives you:
Lot1,Lot2 contains: A,B
Lot1,Lot2,Lot3 contains: C
Lot4 contains: E,F
This can be done as two simple hash conversions:
Build a hash that lists all of the lots each item is in
Convert that to a hash that lists all items for each lot combination
Then just dump the last hash in a convenient form
This is the code.
use strict;
use warnings 'all';
use feature 'say';
my %test = (
Lot1 => [ "A", "B", "C" ],
Lot2 => [ "A", "B", "C" ],
Lot3 => ["C"],
Lot4 => [ "E", "F" ],
);
my %items;
for my $lot ( keys %test ) {
for my $item ( @{ $test{$lot} } ) {
push @{ $items{$item} }, $lot;
}
}
my %lots;
for my $item ( sort keys %items ) {
my $lots = join '!', sort @{ $items{$item} };
push @{ $lots{$lots} }, $item;
}
for my $lots ( sort keys %lots ) {
my @lots = split /!/, $lots;
my $items = join '', @{ $lots{$lots} };
$lots = join ', ', @lots;
$lots =~ s/.*\K,/ and/;
printf "%s %s %s\n", $lots, @lots > 1 ? 'have' : 'has', $items;
}
output
Lot1 and Lot2 have AB
Lot1, Lot2 and Lot3 have C
Lot4 has EF
It generates an %items hash that looks like this
{
A => ["Lot2", "Lot1"],
B => ["Lot2", "Lot1"],
C => ["Lot2", "Lot3", "Lot1"],
E => ["Lot4"],
F => ["Lot4"],
}
and from that a %lots hash that looks like this
{
"Lot1!Lot2" => ["A", "B"],
"Lot1!Lot2!Lot3" => ["C"],
"Lot4" => ["E", "F"],
}

references in perl: hash of array to another array

I have a problem with referencing a hash in an array to another array.
I have an array @result which looks like this:
@result = (
{ "type" => "variable",
"s" => "NGDP",
"variable" => "NGDP" },
{"type" => "subject",
"s" => "USA",
"subject" => "USA",
"variable" => "NGDP" },
{ "type" => "colon",
"s" => ",",
"colon" => "," },
{ "type" => "subject",
"s" => "JPN",
"subject" => "JPN",
"variable" => "NGDP" },
{ "type" => "operator",
"s" => "+",
"operator => "+" },
{"type" => "subject",
"s" => "CHN",
"subject" => "CHN",
"variable" => "NGDP" },
);
I want to divide this array at the colons and push the elements of the @result array to another array, so I wrote this script:
for ($i = 0; $i <= $#result; $i++) {
if (defined $result[$i]{subject} or $result[$i]{operator} and not defined $result[$i]{colon}) {
push @part_col, \%{$result[$i]};
}
elsif ($i == $#result) {
push @part_col_all, \@part_col;
}
elsif (defined $result[$i]{colon}) {
push @part_col_all, \@part_col;
my @part_col;
}
}
So what I need is that if I print out $part_col_all[0][0]{subject} the result will be "USA",
and for $part_col_all[1][0]{subject} will be "JPN",
and for $part_col_all[1][1]{operator} will be "+" etc.
My result for $part_col_all[0][0]{subject} is "USA"
and for $part_col_all[0][1]{subject} is "JPN" which should be in $part_col_all[1][0]{subject}.
The result for $part_col_all[0][3]{subject} is "CHN", while it should be in $part_col_all[1][2]{subject}.
I'm making an application that creates graphs from economic data based on a given input. The @result array is my preprocessed input, where I know which variable belongs to which country. If I get an input like GDP USA CAN, JPN+CHN, I need to split it into GDP USA CAN and JPN+CHN. That's why I made a condition: if a colon is found, push everything in @part_col to the first element of @part_col_all, and then, at the end of the input, push JPN+CHN to the second element of @part_col_all.
So @part_col_all should look like this:
@part_col_all = (
(
{"type" => "subject",
"s" => "USA",
"subject" => "USA",
"variable" => "NGDP" },
{"type" => "subject",
"s" => "CAN",
"subject" => "CAN",
"variable" => "NGDP" },
),
(
{ "type" => "subject",
"s" => "JPN",
"subject" => "JPN",
"variable" => "NGDP" },
{ "type" => "operator",
"s" => "+",
"operator" => "+" },
{"type" => "subject",
"s" => "CHN",
"subject" => "CHN",
"variable" => "NGDP" },
)
);
I don't know what I'm doing wrong. Sorry if there are any basic mistakes; I'm a beginner. Thanks a lot.
First, you're missing a quote:
{ "type" => "operator",
"s" => "+",
"operator" => "+" },
^ missing
As for printing, you can do the following:
foreach my $part (@part_col){
print $part->{operator}."\n";
}
Or do whatever you want in the print cycle with the values
You should read the Perl Reference Tutorial to help you.
There's no sin in dereferencing to simplify your code:
my @part_col;
my @part_col_all;
for my $i ( 0..$#result ) {
my %hash = %{ $result[$i] }; # Make it easy on yourself. Dereference
if ( defined $hash{subject} or defined $hash{operator} and not defined $hash{colon} ) {
push @part_col, \%hash; # or: push @part_col, $result[$i]
}
}
Notice I changed the for from the three part setup you had to a cleaner and easier to understand way of stating it.
Looking closer at your data structure, I notice that $hash{type} will tell you whether or not $hash{operator}, $hash{subject}, or $hash{colon} is defined. Let's just use $hash{type} and simplify that if:
my @part_col;
my @part_col_all;
for my $i ( 0..$#result ) {
my %hash = %{ $result[$i] }; # Make it easy on yourself. Dereference
if ( $hash{type} eq "subject" or $hash{type} eq "operator" ) {
push @part_col, \%hash; # or: push @part_col, $result[$i]
}
}
In fact, since @result is just an array, I'll treat it like one. I'll use a simple for structure to go through each element of my array. Each element is a hash reference, so:
for my $hash_ref ( @result ) {
my %hash = %{ $hash_ref };
if ( $hash{type} eq "subject" or $hash{type} eq "operator" ) {
push @part_col, \%hash;
}
}
And further simplification, I can dereference and talk about a particular element of my hash all at once by using the -> syntax:
for my $hash_ref ( @result ) {
if ( $hash_ref->{type} eq "subject" or $hash_ref->{type} eq "operator" ) {
push @part_col, $hash_ref;
}
}
I'm trying to understand the rest of your code:
elsif ($i == $#result) {
push @part_col_all, \@part_col;
}
elsif (defined $hash_ref->{colon}) {
push @part_col_all, \@part_col;
my @part_col;
}
}
These pushes of @part_col onto @part_col_all confuse me. Exactly what are you trying to store in @part_col_all? Remember that \@part_col is the location in memory where you're storing @part_col. You're pushing that same memory location over and over onto that array, so you're storing the same reference over and over again. Is that really what you want? I doubt it.
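To illustrate the point about repeated references, here is a small sketch with a simplified, hypothetical token list standing in for the question's array of hashes. It stores a copy of the current group at each separator and then empties the existing array. Writing "my @group" inside the block would only declare a new, unrelated lexical and leave the outer array untouched.

```perl
use strict;
use warnings;

# Hypothetical simplified input: plain strings instead of hashes.
my @tokens = ( 'USA', ',', 'JPN', '+', 'CHN' );

my ( @group, @all_groups );
for my $token (@tokens) {
    if ( $token eq ',' ) {
        push @all_groups, [@group];    # store a copy, not \@group
        @group = ();                   # empty the existing array
        next;
    }
    push @group, $token;
}
push @all_groups, [@group] if @group;  # flush the last group

print join( ' ', @$_ ), "\n" for @all_groups;
```

With [@group] each stored group is an independent anonymous array, so later pushes onto @group cannot change groups that were already saved.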
What you need to do is decide exactly what your data structure really represents. A data structure should have a solid definition. What does the data structure @part_col_all represent? What does the data structure $part_col_all[$i] represent? What does the data structure $part_col_all[$i]->[$j] represent? Without knowing this, it's very hard to answer the rest of your question.
Are you storing elements where the type is colon in one array and everything else in another array? Or are you storing everything in one array, and in another array, storing everything that's not a type colon?
Once I understand this, I can answer the rest of your question.
Addendum
Thank you for your reply, I will try that way and write my results. It is really helpful. I updated my question with more information about the data structure of @part_col_all. I hope you understand what I'm trying to explain; if not, I'll try again.
If I understand what you're doing, someone enters NGDP USA , JPN+CHN and that means you're comparing the NGDP of the United States against that of Japan and China combined.
It seems to me that you would want three separate variables:
$parameter - What you are measuring. (GDP, etc.)
@countries_set_1 - The first set of countries
@countries_set_2 - The second set of countries which you're comparing against the first set.
And, what you call the colon (which we would call a comma in the U.S.) as a separator between the first set of countries vs. the second set. Then, you'd simply go through a loop. It could be that the two arrays are merely two elements of the same array, and the sets of countries are array references. I imagine something like this:
my @input = ( 'GDP', 'USA', ',', 'JPN', 'CHN' ); # Compare the GDP of the USA with Japan and China together
my $parameter = shift @input; # Remove what you're measuring
my @country_sets; # An array of arrays
my $set = 0; # Which set you're on
for my $value ( @input ) {
if ( $value eq "," ) {
$set += 1; # Next Set
next;
}
push @{ $country_sets[$set] }, $value;
}
This would create a data structure like this:
@country_sets = (
[
'USA',
],
[
'JPN',
'CHN',
],
);
No need for the complex @result since you're only going to have a single operation (GDP, etc.) for all involved.
However, I think I see what you want. We'll go with an array of arrays. Here's what I had before:
for my $hash_ref ( @result ) {
if ( $hash_ref->{type} eq "subject" or $hash_ref->{type} eq "operator" ) {
push @part_col, $hash_ref;
}
}
We'll combine that and the code I offered right above which splits the countries into two sets:
my @country_sets; # An array of arrays
my $set = 0; # Which set you're on
for my $country_ref ( @result ) {
next if $country_ref->{type} eq "variable"; # We don't want variables
if ( $country_ref->{type} eq "colon" ) { # Switch to the other country set
$set += 1;
next;
}
push @{ $country_sets[$set] }, $country_ref;
}
The first few entries will go into $country_sets[0], which will be an array reference. After the colon (which won't be put into either set), the second set of countries will go into $country_sets[1], which will be another array reference holding hash references:
@country_sets - Contains the input information in two sets
$country_sets[$x] - A particular set of countries (and possibly operators)
$country_sets[$x]->[$y] - A particular country or operator
$country_sets[$x]->[$y]->{$key} - A particular value from a particular country
Where $x goes from 0 to 1. This will give you something like this:
$country_sets[0] = [
{
"type" => "subject",
"s" => "USA",
"subject" => "USA",
"variable" => "NGDP",
},
];
$country_sets[1] = [
{
"type" => "subject",
"s" => "JPN",
"subject" => "JPN",
"variable" => "NGDP",
},
{
"type" => "operator",
"s" => "+",
"operator" => "+",
},
{
"type" => "subject",
"s" => "CHN",
"subject" => "CHN",
"variable" => "NGDP",
},
];
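Putting the pieces together, a runnable sketch of the grouping against the question's @result data might look like this. The loop variable name $entry is an arbitrary choice, and the final prints only check the lookups the question asked for.

```perl
use strict;
use warnings;

# Data from the question, with the missing quote restored.
my @result = (
    { type => "variable", s => "NGDP", variable => "NGDP" },
    { type => "subject",  s => "USA",  subject  => "USA", variable => "NGDP" },
    { type => "colon",    s => ",",    colon    => "," },
    { type => "subject",  s => "JPN",  subject  => "JPN", variable => "NGDP" },
    { type => "operator", s => "+",    operator => "+" },
    { type => "subject",  s => "CHN",  subject  => "CHN", variable => "NGDP" },
);

my @country_sets;    # An array of array references
my $set = 0;         # Index of the set currently being filled
for my $entry (@result) {
    next if $entry->{type} eq "variable";    # Skip the variable entry
    if ( $entry->{type} eq "colon" ) {       # A colon starts the next set
        $set++;
        next;
    }
    push @{ $country_sets[$set] }, $entry;   # Autovivifies the inner array
}

print "$country_sets[0][0]{subject}\n";     # USA
print "$country_sets[1][0]{subject}\n";     # JPN
print "$country_sets[1][1]{operator}\n";    # +
print "$country_sets[1][2]{subject}\n";     # CHN
```

Because the hash references are pushed directly, @country_sets points at the same hashes as @result, so no data is copied.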
