Perl Compare hash of arrays with another array - arrays

I am trying to compare all the values of an array (the complete array) with each value of a hash (each of which is itself an array) and, if a match is found, push the key of the hash onto a new array.
The code below works when the hash value is not an array, but how can I do the comparison when it is an array?
%hash=(
storeA=>['milk','eggs'],
storeB=>['milk','fruits','eggs','vegetables'],
storeC=>['milk','fruits','eggs'],
);
@array = (
'fruits',
'milk',
'eggs'
);
Code to compare
use strict;
use warnings;
use Data::Dumper;
foreach my $thing (@array) {
    foreach ((my $key, my $value) = each %hash) {
        if ($value eq $thing) {
            push @new_array, $key;
        }
    }
}
print Dumper(\@new_array);
Expected Output
@new_array = (
storeB,
storeC
);

You could also use a combination of all and any from List::Util:
use List::Util qw(all any);
while ((my $key, my $value) = each %hash) {
    if ( all { my $temp = $_; any { $_ eq $temp } @$value } @array ) {
        push @new_array, $key;
    }
}
Here you are looking for the case where all the elements of @array exist in the given array from the hash.
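The same idea can be wrapped in a small helper so it runs on its own. This is only a sketch: the name stores_with_all is hypothetical, and the keys are sorted purely to make the output order predictable.

```perl
use strict;
use warnings;
use List::Util qw(all any);

# Hypothetical helper: returns the keys of %$stock_for whose array
# contains every element of @$wanted.
sub stores_with_all {
    my ($stock_for, $wanted) = @_;
    my @matches;
    for my $store (sort keys %$stock_for) {
        my $items = $stock_for->{$store};
        push @matches, $store
            if all { my $want = $_; any { $_ eq $want } @$items } @$wanted;
    }
    return @matches;
}

my %hash = (
    storeA => ['milk', 'eggs'],
    storeB => ['milk', 'fruits', 'eggs', 'vegetables'],
    storeC => ['milk', 'fruits', 'eggs'],
);
my @array = ('fruits', 'milk', 'eggs');

my @new_array = stores_with_all(\%hash, \@array);
print "@new_array\n";    # storeB storeC
```

Because any stops at the first hit and all stops at the first miss, a store that lacks an early item is rejected without scanning the rest of the list.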

I would build a hash out of each store's stock array. It's a wasteful method, but not egregiously so as long as the real data isn't enormous.
Like this. The inner grep statement counts the number of items in @list that are available at this store and compares it to the number of items in the list, returning true if everything is in stock.
If this is a real situation (I suspect it's homework) then, for all practical purposes that I can think of, the %stocks hash should contain hashes of the items available at each store.
use strict;
use warnings 'all';
my %stocks = (
storeA => [ qw/ milk eggs / ],
storeB => [ qw/ milk fruits eggs vegetables / ],
storeC => [ qw/ milk fruits eggs / ],
);
my @list = qw/ fruits milk eggs /;
my @stores = grep {
    my %stock = map { $_ => 1 } @{ $stocks{$_} };
    grep($stock{$_}, @list) == @list;
} keys %stocks;
use Data::Dump;
dd \@stores;
output
["storeB", "storeC"]
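The suggestion that %stocks should hold hashes rather than arrays could look roughly like this. It is a sketch under that assumed layout (each store maps to a hash whose keys are the items in stock), not the answer's own code.

```perl
use strict;
use warnings;

# Assumed layout: each store maps to a hash whose keys are the items in stock.
my %stocks = (
    storeA => { map { $_ => 1 } qw/ milk eggs / },
    storeB => { map { $_ => 1 } qw/ milk fruits eggs vegetables / },
    storeC => { map { $_ => 1 } qw/ milk fruits eggs / },
);
my @list = qw/ fruits milk eggs /;

# A store qualifies when no item on the list is missing from its stock hash.
my @stores = grep {
    my $stock = $stocks{$_};
    !grep { !exists $stock->{$_} } @list;
} sort keys %stocks;

print "@stores\n";    # storeB storeC
```

With this layout no temporary lookup hash has to be rebuilt for every query; each check is a direct exists test.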

Find the intersection of the two sets; if the number of its elements equals the number of elements in the array, store the key:
#!/usr/bin/perl
use warnings;
use strict;
sub intersect {
    my ($arr1, $arr2) = @_;
    my %intersection;
    $intersection{$_}{0}++ for @$arr1;
    $intersection{$_}{1}++ for @$arr2;
    return grep 2 == keys %{ $intersection{$_} }, keys %intersection;
}
my %hash = (
storeA => [qw[ milk eggs ]],
storeB => [qw[ milk fruits eggs vegetables ]],
storeC => [qw[ milk fruits eggs ]],
);
my @array = qw( fruits milk eggs );
my @new_array;
while (my ($store, $arr) = each %hash) { # while, not for!
    push @new_array, $store if @array == intersect(\@array, $arr);
}
use Data::Dumper;
print Dumper(\@new_array);

Try this. There is one small trick here: grep is used to filter the elements of an array.
I create the variable $joined_array, which contains the elements of @array separated by |, and then use that variable in the pattern passed to grep.
The trick is that when an array is compared with a scalar, the array is evaluated in scalar context, so the comparison uses the number of elements in the array.
my @array = qw(one two three);
if (@array == 3)
{
    print "Hi\n";
}
Here the condition is evaluated internally as 3 == 3.
I use the same logic below.
use warnings;
use strict;
my %hash=(
"storeA"=>['milk','eggs'],
"storeB"=>['milk','fruits','eggs','vegetables'],
"storeC"=>['milk','fruits','eggs'],
);
my @array = (
    'fruits',
    'milk',
    'eggs'
);
my @new_array;
my $joined_array = join("|", @array);
foreach (keys %hash)
{
    # (?:...) groups the alternation so that both \b anchors apply to every alternative
    push(@new_array, $_) if ((grep { /\b(?:$joined_array)\b/ } @{$hash{$_}}) >= scalar @array);
}
print "$_\n" for @new_array;
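One caveat when building a pattern from list items: alternation binds loosely, and any regex metacharacters in the items are interpreted. A hedged sketch of a stricter pattern follows; the @stock contents and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my @array = ('fruits', 'milk', 'eggs');

# quotemeta escapes regex metacharacters in each item; the (?:...) group
# keeps the anchors applied to the whole alternation rather than to the
# first and last alternatives only.
my $alternation = join '|', map { quotemeta } @array;
my $whole_item  = qr/\A(?:$alternation)\z/;

# Made-up stock list; 'eggshells' must not count as 'eggs'.
my @stock = ('milk', 'fruits', 'eggs', 'eggshells', 'vegetables');
my @hits  = grep { $_ =~ $whole_item } @stock;

print "@hits\n";    # milk fruits eggs
```

Counting matches this way still assumes that the stock list has no repeated items; a repeated item would be counted twice.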

Go through all stores (keys) and, for each, check whether all array elements are in the key's array reference.
use strict;
use warnings;
my %inventory = (
storeA => ['milk','eggs'],
storeB => ['milk','fruits','eggs','vegetables'],
storeC => ['milk','fruits','eggs'],
);
my @items = ('fruits', 'milk', 'eggs');
my @found;
foreach my $store (keys %inventory) {
    push @found, $store
        if @items == grep { in_store($_, $inventory{$store}) } @items
}
sub in_store { for (@{$_[1]}) { return 1 if $_[0] eq $_ }; return 0; }
print "@found\n"; # prints storeB and storeC (hash order is not fixed)
The grep block checks, for each item, whether it is available in the store; if the number of items that pass equals the number of items (the array size), that store has all the items and is added. Note that a subroutine returns the value of the last expression evaluated when there is no explicit return, so the return keyword in the final statement is not strictly needed. It was added for clarity.

Related

Compare two hash of arrays

I have two arrays, and a hash holds each of them.
Array 1:
my $group = "west";
@{ $my_big_hash{$group} } = (1534,2341,2322,3345,689,3333,4444,5533,3334,5666,6676,3435);
Array 2:
my $element = "Location";
my $group = "west";
@{ $my_tiny_hash{$element}{$group} } = (153,333,667,343);
Now I want to compare
@{ $my_tiny_hash{$element}{$group} }
with
@{ $my_big_hash{$group} }
and check whether all the elements of the tiny hash array are part of the big hash array.
As we can see, the tiny hash has only 3-digit elements, and all of them match the big hash if we compare just the first 3 digits.
If the first 3 digits/letters match and all are available in the big array, then it is a match; otherwise we have to print the unmatched elements.
It is an array-to-array comparison.
How do we achieve it?
PS: How can this be achieved without Array::Utils?
The solution using Array::Utils is really simple:
my @minus = array_minus( @{ $my_tiny_hash{$element}{$group} } , @{ $my_big_hash{$group} } );
But it compares all the digits, and I just want to match the first 3 digits.
I hope this is clear.
Thanks
This seems to do what you want.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my (%big_hash, %tiny_hash);
my $group = 'west';
my $element = 'Location';
# Less confusing initialisation!
$big_hash{$group} = [1534,2341,2322,3345,689,3333,4444,5533,3334,5666,6676,3435];
$tiny_hash{$element}{$group} = [153,333,667,343];
# Create a hash where the keys are the first three digits of the numbers
# in the big array. Doesn't matter what the values are.
my %check_hash = map { substr($_, 0, 3) => 1 } @{ $big_hash{$group} };
# grep the small array by checking the elements' existence in %check_hash
my @missing = grep { ! exists $check_hash{$_} } @{ $tiny_hash{$element}{$group} };
say "Missing items: @missing";
Update: Another solution that seems closer to your original code (array_minus comes from Array::Utils).
my @truncated_big_array = map { substr($_, 0, 3) } @{ $big_hash{$group} };
my @minus = array_minus( @{ $tiny_hash{$element}{$group} } , @truncated_big_array );
A quick and slightly dirty solution (which extends your existing code).
#!/usr/bin/perl
use strict;
use warnings;
my (%my_big_hash, %my_tiny_hash, @temp_array);
my $group = "west";
@{ $my_big_hash{$group} } = (1534,343,2341,2322,3345,689,3333,4444,5533,3334,5666,6676,3435);
foreach (@{ $my_big_hash{$group} }){
    push @temp_array, substr $_, 0,3;
}
my $element = "Location";
my $group2 = "west";
@{ $my_tiny_hash{$element}{$group2} } = (153,333,667,343,698);
#solution below
my %hash = map { $_ => 1 } @temp_array;
foreach my $search (@{$my_tiny_hash{'Location'}->{west}}){
    if (exists $hash{$search}){
        print "$search exists\n";
    }
    else{
        print "$search does not exist\n";
    }
}
Output:
153 exists
333 exists
667 exists
343 exists
698 does not exist
Also see: https://stackoverflow.com/a/39585810/257635
Edit: As requested, using Array::Utils.
foreach (@{ $my_big_hash{$group} }){
    push @temp_array, substr $_, 0,3;
}
my @minus = array_minus( @{ $my_tiny_hash{$element}{$group} } , @temp_array );
print "@minus";
An alternative, using ordered comparison instead of hashes:
@big = sort (1534,2341,2322,3345,689,3333,4444,5533,3334,5666,6676,3435);
@tiny = sort (153,333,667,343,698);
for (@tiny) {
    shift @big while @big and ($big[0] cmp $_) < 0;
    push @{ $result{
        $_ eq substr($big[0], 0, 3)
            ? "found" : "missing" } },
        $_;
}
Contents of %result:
{
'found' => [
153,
333,
343,
667
],
'missing' => [
698
]
}

Separating CSV file into key and array

I am new to Perl, and I am trying to split a CSV file (10 comma-separated items per line) into a key (the first item) and an array (the other 9 items) to put in a hash. Eventually, I want to use an if statement to match another variable against the keys in the hash and print the elements of the matching array.
Here's the code I have, which doesn't work right.
use strict;
use warnings;
my %hash;
my $in2 = "metadata1.csv";
open IN2, "<$in2" or die "Cannot open the file: $!";
while (my $line = <IN2>) {
my ($key, @value) = split (/,/, $line, 2);
%hash = (
$key => @value
);
}
foreach my $key (keys %hash)
{
print "The key is $key and the array is $hash{$key}\n";
}
Thank you for any help!
Don't use 2 as the third argument to split: it splits the line into only two elements, so there will be just one element in @value.
Also, by doing %hash =, you're overwriting the hash in each iteration of the loop. Just add a new key/value pair:
$hash{$key} = \@value;
Note the \ sign: you can't store an array directly as a hash value, you have to store a reference to it. When printing the value, you have to dereference it back:
#! /usr/bin/perl
use warnings;
use strict;
my %hash;
while (<DATA>) {
    my ($key, @values) = split /,/;
    $hash{$key} = \@values;
}
for my $key (keys %hash) {
    print "$key => @{ $hash{$key} }";
}
__DATA__
id0,1,2,a
id1,3,4,b
id2,5,6,c
If your CSV file contains quoted or escaped commas, you should use Text::CSV.
First of all, a hash can hold each key only once, so when you have lines like these in your CSV file:
key1,val11,val12,val13,val14,val15,val16,val17,val18,val19
key1,val21,val22,val23,val24,val25,val26,val27,val28,val29
after adding both key/value pairs with the key 'key1' to the hash, only one pair is kept: the one that was added later.
So to keep all records, you probably need an array of hashes, where the value in each hash is an array reference, like this:
@result = (
{ 'key1' => ['val11','val12','val13','val14','val15','val16','val17','val18','val19'] },
{ 'key1' => ['val21','val22','val23','val24','val25','val26','val27','val28','val29'] },
{ 'and' => ['so on'] },
);
To achieve that, your code should look like this:
use strict;
use warnings;
my @AoH; # array of hashes containing data from CSV
my $in2 = "metadata1.csv";
open IN2, "<$in2" or die "Cannot open the file: $!";
while (my $line = <IN2>) {
    chomp $line; # drop the newline so it does not end up in the last value
    my @string_bits = split (/,/, $line);
    my $key = $string_bits[0]; # first element - key
    my $value = [ @string_bits[1 .. $#string_bits] ]; # rest get into arr ref
    push @AoH, {$key => $value}; # array of hashes structure
}
foreach my $hash_ref (@AoH)
{
    my $key = (keys %$hash_ref)[0]; # get hash key
    my $value = join ', ', @{ $hash_ref->{$key} }; # join array into str
    print "The key is '$key' and the array is '$value'\n";
}
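If lookup by key still matters (the question wants to match a variable against the key), an alternative to the array of hashes is a hash whose values are lists of rows. This is a sketch of that swapped-in structure, not the answer's code; %rows_for and the sample lines are invented for illustration.

```perl
use strict;
use warnings;

# Invented sample lines standing in for the CSV file.
my @lines = (
    "key1,val11,val12,val13\n",
    "key1,val21,val22,val23\n",
    "key2,val31,val32,val33\n",
);

my %rows_for;    # key => [ [values of 1st line], [values of 2nd line], ... ]

for my $line (@lines) {
    chomp(my $copy = $line);
    my ($key, @values) = split /,/, $copy;
    push @{ $rows_for{$key} }, \@values;    # autovivifies the outer array
}

for my $key (sort keys %rows_for) {
    for my $row (@{ $rows_for{$key} }) {
        print "$key => ", join(', ', @$row), "\n";
    }
}
```

Repeated keys are kept as separate rows, and $rows_for{key1} still finds all of them in one lookup.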

Find Duplicate arrays and Intersection of arrays in array of hash values using Perl

I want to find duplicate arrays in a hash that contains arrays. I am building sets and storing them in a Perl hash. Afterwards, I need to extract:
1. those arrays that are complete duplicates (all values the same).
2. the intersection of the arrays.
The source code is given below:
use strict;
use warnings;
my @test1= ("Bob", "Flip", "David");
my @test2= ("Bob", "Kevin", "John", "Michel");
my @test3= ("Bob", "Flip", "David");
my @test4= ("Haidi", "Bob", "Grook", "Franky");
my @test5= ();
my @test6=();
my %arrayHash= ( "ppl1" => [@test1],
"ppl2"=> [@test2],
"ppl3" => [@test3],
"ppl4"=> [@test4],
"ppl5"=> [@test5],
"ppl6"=> [@test6],
);
Required output: ppl1 and ppl3 have duplicate lists
Intersection of arrays = Bob
Kindly note that duplication of empty arrays is not desired!
So there's a set of steps here:
Compare your arrays with each other. This is harder because they are multi-element arrays: you can't test them for equivalence directly, because you need to compare the members.
Filter one from the other.
So first of all:
(Edit: coping with empty arrays)
#!/usr/bin/env perl
use strict;
use warnings;
my @test1 = ( "Bob", "Flip", "David" );
my @test2 = ( "Kevin", "John", "Michel" );
my @test3 = ( "Bob", "Flip", "David" );
my @test4 = ( "Haidi", "Grook", "Franky" );
my @test5 = ();
my @test6 = ();
my %arrayHash = (
    "ppl1" => [@test1],
    "ppl2" => [@test2],
    "ppl3" => [@test3],
    "ppl4" => [@test4],
    "ppl5" => [@test5],
    "ppl6" => [@test6],
);
my %seen;
#cycle through the hash
foreach my $key ( sort keys %arrayHash ) {
    #skip empty:
    next unless @{ $arrayHash{$key} };
    #turn your array into a string - ':' separated
    my $value_str = join( ":", sort @{ $arrayHash{$key} } );
    #check if that 'value string' has already been seen
    if ( $seen{$value_str} ) {
        print "$key is a duplicate of $seen{$value_str}\n";
    }
    $seen{$value_str} = $key;
}
Now note - this is a bit of a cheat - it joins your arrays with :, which doesn't work in every scenario.
("Bob:", "Flip") and ("Bob", ":Flip") will end up the same.
If there are more than two copies, each one is reported only against the most recently seen key with the same contents.
You can work around this - if you want - by pushing multiple values into the %seen hash.
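The workaround of pushing multiple values could be sketched as follows. The %keys_for name, the shortened sample data and the "\0" separator are assumptions made for this illustration; the separator only works if no member contains that character.

```perl
use strict;
use warnings;

my %arrayHash = (
    ppl1 => [ "Bob", "Flip", "David" ],
    ppl2 => [ "Kevin", "John", "Michel" ],
    ppl3 => [ "David", "Bob", "Flip" ],
    ppl4 => [],
    ppl5 => [],
);

# Group keys by a signature built from the sorted members.
# Assumption: no member contains the "\0" separator.
my %keys_for;
for my $key (sort keys %arrayHash) {
    my @members = @{ $arrayHash{$key} };
    next unless @members;    # empty arrays are ignored
    my $signature = join "\0", sort @members;
    push @{ $keys_for{$signature} }, $key;
}

for my $signature (sort keys %keys_for) {
    my @group = @{ $keys_for{$signature} };
    print "@group have duplicate lists\n" if @group > 1;
}
# prints: ppl1 ppl3 have duplicate lists
```

As in the code above, sorting the members first means order is ignored, so the arrays are compared as sets.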
You need to check the arrays stored under two hash keys for equality. For that you can use the smart match operator (~~), which has been experimental since Perl 5.18 and therefore warns under use warnings.
Next, you can use grep to filter out values which are not duplicates, and a hash to keep track of values which have already been checked.
#!/usr/bin/perl
use strict;
use warnings;
my @test1= ("Bob", "Flip", "David");
my @test2= ("Kevin", "John", "Michel");
my @test3= ("Bob", "Flip", "David");
my @test4= ("Haidi", "Grook", "Franky");
my @test5= ("Bob", "Flip", "David");
my @test6= ("Kevin", "John", "Michel");
my @test7= ("Haidi", "Grook", "Frank4");
my %arrayHash= ( "ppl1" => [@test1],
    "ppl2"=> [@test2],
    "ppl3" => [@test3],
    "ppl4"=> [@test4],
    "ppl5"=> [@test5],
    "ppl6"=> [@test6],
    "ppl7"=> [@test7]
);
my %seen;
foreach my $key1 (sort keys %arrayHash){
    next unless @{$arrayHash{$key1}};
    my @keys;
    if(@keys=grep{(@{$arrayHash{$key1}} ~~ @{$arrayHash{$_}} ) && ($_ ne $key1) && (not exists $seen{$key1})}sort keys %arrayHash){
        unshift(@keys,$key1);
        print "@keys are duplicates \n";
        @seen{@keys}=@keys;
    }
}
output:
ppl1 ppl3 ppl5 are duplicates
ppl2 ppl6 are duplicates
use strict;
use warnings;
my @test1= ("Bob", "Flip", "David");
my @test2= ("Kevin", "John", "Michel");
my @test3= ("Bob", "Flip", "David");
my @test4= ("Haidi", "Grook", "Franky");
my %arrayHash= ( "1" => \@test1,
    "2"=> \@test2,
    "3" => \@test3,
    "4"=> \@test4,
);
sub arrayCmp {
    my @array1 = @{$_[0]};
    my @array2 = @{$_[1]};
    return 0 if ($#array1 != $#array2);
    @array1 = sort(@array1);
    @array2 = sort(@array2);
    for (my $ii = 0; $ii <= $#array1; $ii++) {
        if ($array1[$ii] ne $array2[$ii]) {
            #print "$array1[$ii] != $array2[$ii]\n";
            return 0;
        }
    }
    return 1;
}
my @keyArr = sort(keys(%arrayHash));
for(my $i = 0; $i <= $#keyArr - 1; $i++) {
    my @arr1 = @{$arrayHash{$keyArr[$i]}};
    # start at $i + 1 so that each pair is compared and reported only once
    for(my $j = $i + 1; $j <= $#keyArr; $j++) {
        my @arr2 = @{$arrayHash{$keyArr[$j]}};
        if ($keyArr[$i] ne $keyArr[$j] && arrayCmp(\@arr1, \@arr2) == 1) {
            print "$keyArr[$i] and $keyArr[$j] are duplicates\n";
        }
    }
}
Outputs this
1 and 3 are duplicates
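The snippets above deal with duplicates only. For the second part of the question, the intersection, one possible sketch is to count in how many non-empty arrays each name occurs. The variable names are invented here, and the data is the question's original set, where every non-empty list contains Bob.

```perl
use strict;
use warnings;

my %arrayHash = (
    ppl1 => [ "Bob", "Flip", "David" ],
    ppl2 => [ "Bob", "Kevin", "John", "Michel" ],
    ppl3 => [ "Bob", "Flip", "David" ],
    ppl4 => [ "Haidi", "Bob", "Grook", "Franky" ],
    ppl5 => [],
);

# Count in how many non-empty arrays each name occurs (once per array).
my %count;
my $non_empty = 0;
for my $list (values %arrayHash) {
    next unless @$list;
    $non_empty++;
    my %in_this_list = map { $_ => 1 } @$list;
    $count{$_}++ for keys %in_this_list;
}

# A name is in the intersection if every non-empty array contains it.
my @intersection = grep { $count{$_} == $non_empty } sort keys %count;
print "Intersection of arrays = @intersection\n";    # Bob
```

Empty arrays are skipped on purpose; including them would make the intersection empty by definition.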

ID tracking while swapping and sorting other two arrays in perl

#! /usr/bin/perl
use strict;
my (@data,$data,@data1,@diff,$diff,$tempS,$tempE, @ID,@Seq,@Start,@End, @data2);
#my $file=<>;
open(FILE, "< ./out.txt");
while (<FILE>){
    chomp $_;
    #next if ($line =~/Measurement count:/ or $line =~/^\s+/) ;
    #push @data, [split ("\t", $line)] ;
    my @data = split('\t');
    push(@ID, $data[0]);
    push(@Seq, $data[1]);
    push(@Start, $data[2]);
    push(@End, $data[3]);
    # push @$data, [split ("\t", $line)] ;
}
close(FILE);
my %hash = map { my $key = "$ID[$_]"; $key => [ $Start[$_], $End[$_] ] } (0..$#ID);
for my $key ( %hash ) {
print "Key: $key contains: ";
for my $value ($hash{$key} ) {
print " $hash{$key}[0] ";
}
print "\n";
}
for (my $j=0; $j <=$#Start ; $j++)
{
if ($Start[$j] > $End[$j])
{
$tempS=$Start[$j];
$Start[$j]=$End[$j];
$End[$j]=$tempS;
}
print"$tempS\t$Start[$j]\t$End[$j]\n";
}
my @sortStart = sort { $a <=> $b } @Start;
my @sortEnd = sort { $a <=> $b } @End;
#open(OUT,">>./trial.txt");
for(my $i=1521;$i>=0;$i--)
{
print "hey";
my $diff = $sortStart[$i] - $sortStart[$i-1];
print "$ID[$i]\t$diff\n";
}
I have three arrays of the same length: ID with IDs (strings), and Start and End with integer values (read from a file).
I want to loop through all these arrays while keeping track of the IDs. First I swap the elements of Start and End if Start > End; then I have to sort these two arrays for further processing (I am computing the difference Start[0]-Start[1] for each item in Start). While sorting, the positions of the values may change, and as each ID is unique to its Start and End elements, how can I keep track of my IDs while sorting?
The three arrays ID, Start and End are the ones under consideration.
Here is a small chunk of my input data:
DQ704383 191990066 191990037
DQ698580 191911184 191911214
DQ724878 191905507 191905532
DQ715191 191822657 191822686
DQ722467 191653368 191653339
DQ707634 191622552 191622581
DQ715636 191539187 191539157
DQ692360 191388765 191388796
DQ722377 191083572 191083599
DQ697520 189463214 189463185
DQ709562 187245165 187245192
DQ540163 182491372 182491400
DQ720940 180753033 180753060
DQ707760 178340696 178340726
DQ725442 178286164 178286134
DQ711885 178250090 178250119
DQ718075 171329314 171329344
DQ705091 171062479 171062503
The columns above are ID, Start and End respectively. If Start > End, I swap them only between those two arrays. After swapping, the descending order may be broken, but I want the values in descending order, along with their corresponding IDs, for the subtraction explained above.
Don't use different arrays, use a hash to keep the related pieces of information together.
#!/usr/bin/perl
use warnings;
use strict;
use enum qw( START END );
my %hash;
while (<>) {
my ($id, $start, $end) = split;
$hash{$id} = [ $start < $end ? ($start, $end)
: ($end, $start) ];
}
my @by_start = sort { $hash{$a}[START] <=> $hash{$b}[START] } keys %hash;
my @by_end = sort { $hash{$a}[END] <=> $hash{$b}[END] } keys %hash;
use Test::More;
is_deeply(\@by_start, \@by_end, 'same');
done_testing();
Moreover, in the data sample you provided, the order of the IDs is the same regardless of which column you sort by.
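A variant of the same idea that uses only core Perl keeps one record per line and sorts whole records, so each ID travels with its values. This is a sketch: the record layout (id, start, end) and the names @records, @by_start and @diffs are assumptions, and the three rows are taken from the question's sample.

```perl
use strict;
use warnings;

# Three rows from the question's sample: ID, Start, End.
my @lines = (
    "DQ704383 191990066 191990037",
    "DQ698580 191911184 191911214",
    "DQ724878 191905507 191905532",
);

# One record per line; Start and End are swapped when Start > End.
my @records = map {
    my ($id, $start, $end) = split;
    ($start, $end) = ($end, $start) if $start > $end;
    +{ id => $id, start => $start, end => $end };
} @lines;

# Sort whole records by Start, descending, so each ID stays with its values.
my @by_start = sort { $b->{start} <=> $a->{start} } @records;

# Difference between each Start and the next one in the sorted order.
my @diffs;
for my $i (0 .. $#by_start - 1) {
    my $diff = $by_start[$i]{start} - $by_start[$i + 1]{start};
    push @diffs, $diff;
    print "$by_start[$i]{id}\t$diff\n";
}
```

Since the sort moves whole records, no separate bookkeeping is needed to recover which ID belongs to which Start value.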

How do you map an array [key1,val1] to a hash { key1 => val1} in perl?

The problem is that I have an array whose elements are alternating keys and values, and I have to split them up somehow into key => value pairs in a hash.
My first attempt at this works, but I think it's pretty messy: I have to get every other element of the array, and then filter through to find the accepted keys to create a hash with.
my $HASH;
my $ARRAY = [ key1, val1, key2, val2, __key3__, val3, __key4__, val4 ];
my @keys = map{ $ARRAY[ $_*2 ] } 0 .. int($#ARRAY/2);
my @vals = map{ $ARRAY[ $_*2+1 ] } 0 .. int($#ARRAY/2);
my $i = 0;
#filter for keys that only have __key__ format
for $key (@keys){
if( $key && $key =~ m/^__(.*)__$/i ){
$HASH{$1} = $vals[$i];
}
$i++;
}
# only key3 and key4 should be in $HASH now
I found this sample code, which I think is close to what I'm looking for, but I can't figure out how to implement something similar over an array rather than iterating over the lines of a text file:
$file = 'myfile.txt'
open I, '<', $file
my %hash;
%hash = map { split /\s*,\s*,/,$_,2 } grep (!/^$/,<I>);
print STDERR "[ITEM] $_ => $hash{$_}\n" for keys %hash;
Can any of you Perl gurus out there help me understand the best way to do this? Even if I could somehow join all the elements into a string and then split on every second whitespace token, that might work too, but for now I'm stuck!
This is very easy:
use strict; use warnings;
use YAML;
my $ARRAY = [qw(key1 val1 key2 val2 __key3__ val3 __key4__ val4)];
my $HASH = { @$ARRAY };
print Dump $HASH;
Output:
C:\Temp>
---
__key3__: val3
__key4__: val4
key1: val1
key2: val2
my $ARRAY = [ qw(key1 val1 key2 val2 key3 val3 key4 val4) ];
my $HASH = { @$ARRAY };
In the sample code you found, the <I> portion reads in the entire file and returns a list to grep. grep processes the list and then passes it to map. Map then creates its own list and this list is assigned to the hash.
When you assign a list to a hash, the list is assumed to be an even list of key/value pairs.
It does not matter where this list comes from, it could be the output of a map, grep, or split command. It could be right from a file, it could be stored in an array.
Your line:
my $HASH = ();
Does not do anything useful. Writing my $HASH; is exactly the same.
At this point, $HASH is undefined. When you have an undefined value and you dereference it as a hash, %$HASH, Perl autovivifies it: the undefined value becomes a reference to a new, empty hash.
You can make this explicit by writing:
my $HASH = {}; # note the curly braces and not parens
If you have a list of key value pairs in an array:
%$HASH = @array;
If you have a list of keys and a list of values:
@$HASH{@keys} = @values;
Per your question, here is one simple way of creating your hash from the array while filtering the values:
my $HASH = {};
my $ARRAY = [ qw(key1 val1 key2 val2 __key3__ val3 __key4__ val4) ];
{
    my @list = @$ARRAY; # make a copy since splice eats the list
    while (my ($k, $v) = splice @list, 0, 2) {
        if ($k =~ /^__(.+)__$/) {
            $$HASH{$1} = $v
        }
    }
}
use Data::Dumper;
print Dumper($HASH);
which prints:
$VAR1 = {
'key4' => 'val4',
'key3' => 'val3'
};
If you want to do that all in one line, you could use the function mapn from my module List::Gen which is a function like map but that allows you to move over the list with any step size you want, not just one item at a time.
use List::Gen 'mapn';
%$HASH = mapn {/^__(.+)__$/ ? ($1, $_[1]) : ()} 2 => @$ARRAY;
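For the specific case of a step size of two, the core module List::Util offers pairmap, which can do the same filtering without an extra CPAN dependency. This is a sketch of that swapped-in alternative, not of List::Gen itself.

```perl
use strict;
use warnings;
use List::Util qw(pairmap);

my $ARRAY = [ qw(key1 val1 key2 val2 __key3__ val3 __key4__ val4) ];

# pairmap walks the list two items at a time, with the pair in $a and $b.
my %HASH = pairmap { $a =~ /^__(.+)__$/ ? ($1 => $b) : () } @$ARRAY;

print "$_ => $HASH{$_}\n" for sort keys %HASH;
# key3 => val3
# key4 => val4
```

Returning an empty list from the block drops the pair, which is what filters out key1 and key2.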
I didnt know you could dump the array and hash would figure it out.
There's nothing magical about values that come from an array. There's nothing for the hash to figure out.
If you assign a list to a hash, it will clear the hash, then treat the list as a list of key-value pairs from which to initialise the hash. So
%hash = ( foo => 1, bar => 2 );
is equivalent to
my @anon = ( foo => 1, bar => 2 );
%hash = ();
while (@anon) {
    my $key = shift(@anon);
    my $val = shift(@anon);
    $hash{$key} = $val;
}
A list is a list. It doesn't matter if it was produced using the list/comma op
x => "foo", y => "bar"
using qw()
qw( x foo y bar )
or using an array
@array
So that means the following are all the same.
%hash = ( x => "foo", y => "bar" );
%hash = qw( x foo y bar );
%hash = @array; # Assuming @array contains the correct values.
And so are
$hash = { x => "foo", y => "bar" };
$hash = {qw( x foo y bar )};
$hash = { @array }; # Assuming @array contains the correct values.
Since your particular case has already been answered, I thought I would answer the case that I took your question to be asking about. I expected to see an array of pairs like [ key => $value ] that you wanted to put into either a hash or an array of hashes.
That answer goes like so:
my %hash = map { @$_ } ( [ key1 => $value1 ], [ key2 => $value2 ], ... );
my @ha = map { +{ @$_ } } ( [ key1 => $value1 ], [ key2 => $value2 ], ... );
my %hash = @$array_ref_of_values;
Only, I take each one and "explode" them, through dereferencing ( @$_ ).
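A runnable sketch of that "explode each pair" idea, assuming the input is an array reference of [key, value] pairs; the $pairs name and its contents are made up for illustration.

```perl
use strict;
use warnings;

# Made-up input: an array reference holding [key, value] pairs.
my $pairs = [ [ key1 => 'val1' ], [ key2 => 'val2' ] ];

# Flatten every pair into one key/value list for a single hash ...
my %hash = map { @$_ } @$pairs;

# ... or build one single-entry hash per pair (+{ } forces a hash constructor).
my @hashes = map { +{ @$_ } } @$pairs;

print "$hash{key1} $hashes[1]{key2}\n";    # val1 val2
```

The outer array reference has to be dereferenced before map sees it; passing the reference itself would hand map a single element.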
