Compare two hash of arrays - arrays

I have two arrays and a hash holds these arrays
Array 1:
my $group = "west"
#{ $my_big_hash{$group} } = (1534,2341,2322,3345,689,3333,4444,5533,3334,5666,6676,3435);
Array 2 :
my $element = "Location" ;
my $group = "west" ;
#{ $my_tiny_hash{$element}{$group} } = (153,333,667,343);
Now i would want to compare
#{ $my_tiny_hash{$element}{$group} }
with
#{ $my_big_hash{$group} }
and check whether all the elements of tiny hash array are a part of big_hash array .
As we can see tiny hash has just 3 digit elements and all these elements are matching with big hash if we just compare the first 3 digits
if first 3 digits/letters match and all are available in the big array, then its matching or We have to print the unmatched elements
Its an array to array comparison.
How do we achieve it.
PS : Without Array Utils , How to achieve it
The solution using Array Utils is really simple
my #minus = array_minus( #{ $my_tiny_hash{$element}{$group} } , #{ $my_big_hash{$group} } );
But it compares all the digits and i would just want to match the first 3 digits
Hope this is clear
Thanks

This seems to do what you want.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my (%big_hash, %tiny_hash);
my $group = 'west';
my $element = 'Location';
# Less confusing initialisation!
$big_hash{$group} = [1534,2341,2322,3345,689,3333,4444,5533,3334,5666,6676,3435];
$tiny_hash{$element}{$group} = [153,333,667,343];
# Create a hash where the keys are the first three digits of the numbers
# in the big array. Doesn't matter what the values are.
my %check_hash = map { substr($_, 0, 3) => 1 } #{ $big_hash{$group} };
# grep the small array by checking the elements' existence in %check_hash
my #missing = grep { ! exists $check_hash{$_} } #{ $tiny_hash{$element}{$group} };
say "Missing items: #missing";
Update: Another solution that seems closer to your original code.
my #truncated_big_array = map { substr($_, 0, 3) } #{ $big_hash{$group} };
my #minus = array_minus( #{ $my_tiny_hash{$element}{$group} } , #truncated_big_array );

A quick and bit dirty solution (which extends your existing code).
#!/usr/bin/perl
use strict;
use warnings;
my (%my_big_hash, %my_tiny_hash, #temp_array);
my $group = "west";
#{ $my_big_hash{$group} } = (1534,343,2341,2322,3345,689,3333,4444,5533,3334,5666,6676,3435);
foreach (#{ $my_big_hash{$group} }){
push #temp_array, substr $_, 0,3;
}
my $element = "Location";
my $group2 = "west";
#{ $my_tiny_hash{$element}{$group2} } = (153,333,667,343,698);
#solution below
my %hash = map { $_ => 1 } #temp_array;
foreach my $search (#{$my_tiny_hash{'Location'}->{west}}){
if (exists $hash{$search}){
print "$search exists\n";
}
else{
print "$search does not exist\n";
}
}
Output:
153 exists
333 exists
667 exists
343 exists
698 does not exist
Demo
Also see: https://stackoverflow.com/a/39585810/257635
Edit: As per request using Array::Utils.
foreach (#{ $my_big_hash{$group} }){
push #temp_array, substr $_, 0,3;
}
my #minus = array_minus( #{ $my_tiny_hash{$element}{$group} } , #temp_array );
print "#minus";

An alternative, using ordered comparison instead of hashes:
#big = sort (1534,2341,2322,3345,689,3333,4444,5533,3334,5666,6676,3435);
#tiny = sort (153,333,667,343,698);
for(#tiny){
shift #big while #big and ($big[0] cmp $_) <0;
push #{$result{
$_ eq substr($big[0],0,3)
? "found" : "missing" }},
$_;
}
Contents of %result:
{
'found' => [
153,
333,
343,
667
],
'missing' => [
698
]
}

Related

Read space delimited text file into array of hashes [Perl]

I have text file that matches the following format:
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
And I need to read it into a data structure in perl that will allow me to sort and print according to any of those columns.
From left to right the columns are process_id, memory_size, cpu_time and program_name.
How can I read a text file with formatting like that in a way that allows me to sort the data structure and print it according to the sort?
My attempt so far:
my %tasks;
sub open_file{
if (open (my $input, "task_file" || die "$!\n")){
print "Success!\n";
while( my $line = <$input> ) {
chomp($line);
($process_id, $memory_size, $cpu_time, $program_name) = split( /\s/, $line, 4);
$tasks{$process_id} = $process_id;
$tasks{$memory_size} = $memory_size;
$tasks{$cpu_time} = $cpu_time;
$tasks{$program_name} = $program_name;
print "$tasks{$process_id} $tasks{$memory_size} $tasks{$cpu_time} $tasks{$program_name}\n";
}
This does print the output correctly, however I can't figure out how to then sort my resulting %tasks hash by a specific column (i.e. process_id, or any other column) and print the whole data structure in a sorted format.
You're storing the values under keys that are equal to the values. Use Data::Dumper to inspect the structure:
use Data::Dumper;
# ...
print Dumper(\%tasks);
You can store the pids in a hash of hashes, using the value of each column as the inner key.
#!/usr/bin/perl
use strict;
use warnings;
use feature qw{ say };
my #COLUMNS = qw( memory cpu program );
my %sort_strings = ( program => sub { $a cmp $b } );
my (%process_details, %sort);
while (<DATA>) {
my ($process_id, $memory_size, $cpu_time, $program_name) = split;
$process_details{$process_id} = { memory => $memory_size,
cpu => $cpu_time,
program => $program_name };
undef $sort{memory}{$memory_size}{$process_id};
undef $sort{cpu}{$cpu_time}{$process_id};
undef $sort{program}{$program_name}{$process_id};
}
say 'By pid:';
say join ', ', $_, #{ $process_details{$_} }{#COLUMNS}
for sort { $a <=> $b } keys %process_details;
for my $column (#COLUMNS) {
say "\nBy $column:";
my $cmp = $sort_strings{$column} || sub { $a <=> $b };
for my $value (sort $cmp keys %{ $sort{$column} }
) {
my #pids = keys %{ $sort{$column}{$value} };
say join ', ', $_, #{ $process_details{$_} }{#COLUMNS}
for #pids;
}
}
__DATA__
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
But if the data aren't really large and the sorting isn't time critical, just sorting the whole array of arrays by a given column is much easier to write and read:
#!/usr/bin/perl
use strict;
use feature qw{ say };
use warnings;
use enum qw( PID MEMORY CPU PROGRAM );
my #COLUMN_NAMES = qw( pid memory cpu program );
my %sort_strings = ((PROGRAM) => 1);
my #tasks;
push #tasks, [ split ] while <DATA>;
for my $column_index (0 .. $#COLUMN_NAMES) {
say "\nBy $COLUMN_NAMES[$column_index]:";
my $sort = $sort_strings{$column_index}
? sub { $a->[$column_index] cmp $b->[$column_index] }
: sub { $a->[$column_index] <=> $b->[$column_index] };
say "#$_" for sort $sort #tasks;
}
__DATA__
...
You need to install the enum distribution.
I can't figure out how to then sort my resulting %tasks hash by a specific column
You can't sort a hash. You need to convert each of your input rows in a hash (which you're doing successfully) and then store all of those hashes in an array. You can then print the contents of the array in a sorted order.
This seems to do what you want:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my #cols = qw[process_id memory_size cpu_time program_name];
#ARGV or die "Usage: $0 [sort_order]\n";
my $sort = lc shift;
if (! grep { $_ eq $sort } #cols ) {
die "$sort is not a valid sort order.\n"
. "Valid sort orders are: ", join('/', #cols), "\n";
}
my #data;
while (<DATA>) {
chomp;
my %rec;
#rec{#cols} = split;
push #data, \%rec;
}
if ($sort eq $cols[-1]) {
# Do a string sort
for (sort { $a->{$sort} cmp $b->{$sort} } #data) {
say join ' ', #{$_}{#cols};
}
} else {
# Do a numeric sort
for (sort { $a->{$sort} <=> $b->{$sort} } #data) {
say join ' ', #{$_}{#cols};
}
}
__DATA__
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
I've used the built-in DATA filehandle to make the code simpler. You would need to replace that with some code to read from an external file.
I've used a hash slice to simplify reading the data into a hash.
The column that you want to sort by is passed into the program as a command-line argument.
Note that you have to sort the last column (the program name) using string comparison and all other columns using numeric comparison.
This decides how to sort using the first argument the script receives.
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
open my $fh, '<', 'task_file';
my #tasks;
my %sort_by = (
process_id=>0,
memory_size=>1,
cpu_time=>2,
program_name=>3
);
my $sort_by = defined $sort_by{defined $ARGV[0]?$ARGV[0]:0} ? $sort_by{$ARGV[0]} : 0;
while (<$fh>) {
push #tasks, [split /\s+/, $_];
}
#tasks = sort {
if ($b->[$sort_by] =~ /^[0-9]+$/ ) {
$b->[$sort_by] <=> $a->[$sort_by];
} else {
$a->[$sort_by] cmp $b->[$sort_by];
}
} #tasks;
for (#tasks) {
say join ' ', #{$_};
}

Create Multi Dimensional Hash Using Array Elements

I need to create multidimensional hashes with varying depth using array elements as keys. Pseudocode attempt:
Example line:
Statement Coverage for instance Hierarchical/path/with/block/and/module ...
if (/Statement Coverage for instance (.&?) /)
{
#array = split /\//, $1;
}
for (my $eye = 1; $eye <= $#array; $eye++)
{
A miracle happens to %hash!
}
$hash{"path"}{"with"}{"block"}{"and"} now has a value of "module". Remember, the number of keys can vary. Any ideas?
That's what Data::Diver does for you:
my #array = split /\//, $1;
DiveVal(\ my %hash, #array[ 0 .. $#array - 1 ]) = $array[-1];
print Dumper(\%hash);
See my code below. It builds the desired hash recursively.
But I think that you are taking a wrong approach. I obviously don't know what exactly you are trying to achieve, but seems to me, that you should use tree data structure instead of the multidimensional hash.
use strict;
use warnings;
use v5.10;
use Data::Dumper;
my #data = (
'some/path/test',
'some/path/deeper/test',
'another/deeper/path/test',
);
my $resultHr = {};
foreach my $path (#data) {
my #elems = split /\//, $path;
buildHash($resultHr, #elems);
}
say Dumper($resultHr);
sub buildValue {
my $n = shift;
if (#_) {
return {$n => buildValue(#_)};
}
else {
return $n;
}
}
sub buildHash {
my $hr = shift;
my $k = shift;
return unless $k;
if (exists $hr->{$k} && ref $hr->{$k}) {
buildHash($hr->{$k}, #_);
}
else {
$hr->{$k} = buildValue(#_);
}
}

Perl Compare hash of arrays with another array

I am trying to compare all the array values (complete array) with a hash's value(which is array) and if the match founds,then push the key of hash to new array.
The below code compare if the hash value is not array but how can I compare if its array?
%hash=(
storeA=>['milk','eggs'],
storeB=>['milk','fruits','eggs','vegetables'],
storeC=>['milk','fruits','eggs'],
);
#array = (
'fruits',
'milk',
'eggs'
);
Code to compare
use strict;
use warnings;
use Data::Dumper;
foreach my $thing (#array) {
foreach ((my $key, my $value) = each %hash) {
if ($value eq $thing) {
push #new_array, $key;
}
}
}
print Dumper(\#new_array);
Expected Output
#new_array = (
storeB,
storeC
);
You could also use a combination of all and any form List::Util :
while ((my $key, my $value) = each %hash) {
if ( all { my $temp = $_; any { $_ eq $temp } #$value } #array ) {
push #new_array, $key;
}
}
So here you are looking for the case where all the elements of #array exists in the given array from the hash.
I would build a hash out of each store's stock array. It's a wasteful method, but not egregiously so as long as the real data isn't enormous
Like this. The inner grep statement counts the number of items in #list that are available at this store and compares it to the number of items in the list, returning true if everything is in stock
If this is a real situation (I suspect it's homework) then for all practical purposes that I can think of, the %stocks hash should contain hashes of the items available at each store
use strict;
use warnings 'all';
my %stocks = (
storeA => [ qw/ milk eggs / ],
storeB => [ qw/ milk fruits eggs vegetables / ],
storeC => [ qw/ milk fruits eggs / ],
);
my #list = qw/ fruits milk eggs /;
my #stores = grep {
my %stock = map { $_ => 1 } #{ $stocks{$_} };
grep($stock{$_}, #list) == #list;
} keys %stocks;
use Data::Dump;
dd \#stores;
output
["storeB", "storeC"]
Find the intersection of the two sets, if the number of its elements is the number of the elements in the array, you want to store the key:
#!/usr/bin/perl
use warnings;
use strict;
sub intersect {
my ($arr1, $arr2) = #_;
my %intersection;
$intersection{$_}{0}++ for #$arr1;
$intersection{$_}{1}++ for #$arr2;
return grep 2 == keys %{ $intersection{$_} }, keys %intersection
}
my %hash = (
storeA => [qw[ milk eggs ]],
storeB => [qw[ milk fruits eggs vegetables ]],
storeC => [qw[ milk fruits eggs ]],
);
my #array = qw( fruits milk eggs );
my #new_array;
while (my ($store, $arr) = each %hash) { # while, not for!
push #new_array, $store if #array == intersect(\#array, $arr);
}
use Data::Dumper;
print Dumper(\#new_array);
Simply try this. One small trick i done here. grep was use to filter the element from an array.
I created the variable $joined_array which contain the | separated #array data. Then i pass the variable into the grep.
And the trick is, when the array is compare with a scalar data, the comparison is behave, the total number of an array element with scalar data.
my #array = qw(one two three);
if(#array == 3)
{
print "Hi\n";
}
Here condition is internally run as 3 == 3.
That the same logic i done here.
use warnings;
use strict;
my %hash=(
"storeA"=>['milk','eggs'],
"storeB"=>['milk','fruits','eggs','vegetables'],
"storeC"=>['milk','fruits','eggs'],
);
my #array = (
'fruits',
'milk',
'eggs'
);
my #new_array;
my $joined_array = join("|",#array);
foreach (keys %hash)
{
push(#new_array,$_) if ((grep{ /\b$joined_array\b/ } #{$hash{$_}}) >= scalar #array);
}
print "$_\n" for #new_array
Go through all stores (keys) and for each check whether all array elems are in the key's array-ref.
use strict;
use warnings;
my %inventory = (
storeA => ['milk','eggs'],
storeB => ['milk','fruits','eggs','vegetables'],
storeC => ['milk','fruits','eggs'],
);
my #items = ('fruits', 'milk', 'eggs');
my #found;
foreach my $store (keys %inventory) {
push #found, $store
if #items == grep { in_store($_, $inventory{$store}) } #items
}
sub in_store { for (#{$_[1]}) { return 1 if $_[0] eq $_ }; return 0; }
print "#found\n"; # prints: storeB storeC
The grep block checks for each item whether it is (available) in the store, and if the number of those that pass is equal to the number of items (array size) that store has all items and is added. Note that a subroutine returns the last value evaluated without an explicit return, so the final return is not needed. It was added for clarity.

ID tracking while swapping and sorting other two arrays in perl

#! /usr/bin/perl
use strict;
my (#data,$data,#data1,#diff,$diff,$tempS,$tempE, #ID,#Seq,#Start,#End, #data2);
#my $file=<>;
open(FILE, "< ./out.txt");
while (<FILE>){
chomp $_;
#next if ($line =~/Measurement count:/ or $line =~/^\s+/) ;
#push #data, [split ("\t", $line)] ;
my #data = split('\t');
push(#ID, $data[0]);
push(#Seq, $data[1]);
push(#Start, $data[2]);
push(#End, $data[3]);
# push #$data, [split ("\t", $line)] ;
}
close(FILE);
my %hash = map { my $key = "$ID[$_]"; $key => [ $Start[$_], $End[$_] ] } (0..$#ID);
for my $key ( %hash ) {
print "Key: $key contains: ";
for my $value ($hash{$key} ) {
print " $hash{$key}[0] ";
}
print "\n";
}
for (my $j=0; $j <=$#Start ; $j++)
{
if ($Start[$j] > $End[$j])
{
$tempS=$Start[$j];
$Start[$j]=$End[$j];
$End[$j]=$tempS;
}
print"$tempS\t$Start[$j]\t$End[$j]\n";
}
my #sortStart = sort { $a <=> $b } #Start;
my #sortEnd = sort { $a <=> $b } #End;
#open(OUT,">>./trial.txt");
for(my $i=1521;$i>=0;$i--)
{
print "hey";
my $diff = $sortStart[$i] - $sortStart[$i-1];
print "$ID[$i]\t$diff\n";
}
I have three arrays of same length, ID with IDs (string), Start and End with integer values (reading from a file).
I want to loop through all these arrays and also want to keep track of IDs. First am swapping elements in Start with End if Start > End, then I have to sort these two arrays for further application (as I am negating Start[0]-Start[1] for each item in that Start). While sorting, the Id values may change, and as my IDs are unique for each Start and End elements, how can I keep track of my IDs while sorting them?
Three arrays, ID, Start and End, are under my consideration.
Here is a small chunk of my input data:
DQ704383 191990066 191990037
DQ698580 191911184 191911214
DQ724878 191905507 191905532
DQ715191 191822657 191822686
DQ722467 191653368 191653339
DQ707634 191622552 191622581
DQ715636 191539187 191539157
DQ692360 191388765 191388796
DQ722377 191083572 191083599
DQ697520 189463214 189463185
DQ709562 187245165 187245192
DQ540163 182491372 182491400
DQ720940 180753033 180753060
DQ707760 178340696 178340726
DQ725442 178286164 178286134
DQ711885 178250090 178250119
DQ718075 171329314 171329344
DQ705091 171062479 171062503
The above ID, Start, End respectively. If Start > End i swapped them only between those two arrays. But after swapping the descending order may change, but i want them in descending order also their corresponding ID for negation as explained above.
Don't use different arrays, use a hash to keep the related pieces of information together.
#!/usr/bin/perl
use warnings;
use strict;
use enum qw( START END );
my %hash;
while (<>) {
my ($id, $start, $end) = split;
$hash{$id} = [ $start < $end ? ($start, $end)
: ($end, $start) ];
}
my #by_start = sort { $hash{$a}[START] <=> $hash{$b}[START] } keys %hash;
my #by_end = sort { $hash{$a}[END] <=> $hash{$b}[END] } keys %hash;
use Test::More;
is_deeply(\#by_start, \#by_end, 'same');
done_testing();
Moreover, in the data sample you provided, the order of id's is the same regardless of by what you sort them.

What is the 'best' way to delete multiple non-sequential elements in a Perl array?

While executing a script, I need to delete multiple elements (these elements are not sequential) of an array. I will get my array and indexes while executing the script.
For example:
I may get an array and list of indexes like below:
my #array = qw(one two three four five six seven eight nine);
my #indexes = ( 2, 5, 7 );
I have below subroutine to do this:
sub splicen {
my $count = 0;
my $array_ref = shift #_;
croak "Not an ARRAY ref $array_ref in $0 \n"
if ref $array_ref ne 'ARRAY';
for (#_) {
my $index = $_ - $count;
splice #{$array_ref}, $index, 1;
$count++;
}
return $array_ref;
}
If I call my subroutine like below:
splicen(\#array , #indexes);
That works for me but:
Is there any better way to do this?
If instead you splice from the end of the array, you won't have to maintain the offset $count:
sub delete_elements {
my ( $array_ref, #indices ) = #_;
# Remove indexes from end of the array first
for ( sort { $b <=> $a } #indices ) {
splice #$array_ref, $_, 1;
}
}
Another way think about it is to build a new array rather than modifying the original:
my #array = qw(one two three four five size seven eight nine);
my #indexes = (2, 5, 7);
my %indexes = map { $_ => 1 } #indexes;
my #kept = map { $array[$_] } grep { ! exists $indexes{$_} } 0 .. $#array;

Resources