How to get the first n values from a Perl hash of arrays

Experts,
I have a hash of arrays in Perl, and I want to print only the first 2 values.
my %dramatis_personae = (
humans => [ 'hamnet', 'shakespeare', 'robyn', ],
faeries => [ 'oberon', 'titania', 'puck', ],
other => [ 'morpheus, lord of dreams' ],
);
foreach my $group (keys %dramatis_personae) {
foreach (@{$dramatis_personae{$group}}[0..1]) { print "\t$_\n";}
}
The output I get is
"hamnet
shakespeare
oberon
titania
morpheus
lord of dreams"
which is basically the first two array values for each key. But I want the output to be:
hamnet
shakespeare
Please advise how I can get this result. Thanks!

Hash keys are not ordered, so you have to specify the key order yourself. Then you can concatenate the arrays for the keys in that order and take the first two values from the resulting list. Is that what you want?
print "\t$_\n" foreach (map {(@{$dramatis_personae{$_}})} qw/humans faeries other/)[0..1];

Hashes are unordered, so what you are asking for is impossible as stated. Unless you know something about the keys and the order they should be in, the closest you can get is something that can produce any of the following:
'hamnet', 'shakespeare'
'oberon', 'titania'
'morpheus, lord of dreams', 'hamnet'
'morpheus, lord of dreams', 'oberon'
The following is an implementation that does just that:
my $to_fetch = 2;
my @fetched = ( map @$_, values %dramatis_personae )[0..$to_fetch-1];
The following is a more efficient version for larger structures. It also handles insufficient data better:
my $to_fetch = 2;
my @fetched;
for my $group (values(%dramatis_personae)) {
if (@$group > $to_fetch) {
push @fetched, @$group[0..$to_fetch-1];
$to_fetch = 0;
last;
} else {
push @fetched, @$group;
$to_fetch -= @$group;
}
}
die("Insufficient data\n") if $to_fetch;
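If a repeatable result is wanted, the key order has to come from somewhere other than the hash. The following is a minimal sketch of that idea, not a definitive implementation: first_n is a hypothetical helper name, and both the explicit order and the alphabetical order are assumptions about what "first" should mean.

```perl
use strict;
use warnings;

my %dramatis_personae = (
    humans  => [ 'hamnet', 'shakespeare', 'robyn' ],
    faeries => [ 'oberon', 'titania', 'puck' ],
    other   => [ 'morpheus, lord of dreams' ],
);

# first_n is a hypothetical helper: it walks the groups in the key order
# given and returns at most $n values in total.
sub first_n {
    my ($hoa, $n, @key_order) = @_;
    my @fetched;
    for my $key (@key_order) {
        for my $value (@{ $hoa->{$key} || [] }) {
            return @fetched if @fetched >= $n;
            push @fetched, $value;
        }
    }
    return @fetched;
}

# Explicit order, matching the output the question asks for.
my @explicit = first_n(\%dramatis_personae, 2, qw(humans faeries other));
print "\t$_\n" for @explicit;

# Alphabetical order is an assumption; it starts with 'faeries' instead.
my @sorted = first_n(\%dramatis_personae, 2, sort keys %dramatis_personae);
print "\t$_\n" for @sorted;
```

Unlike the die in the version above, this sketch simply returns fewer values when there is not enough data; which behaviour is preferable depends on the caller.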

Related

How to compare two hashes of different levels in perl without using sub routines or modules?

my arrays are
my @arr = ('mars','earth','jupiter');
my @arr1 = ('mercury','mars');
my @arr2 = ('planet','earth','star','sun','planet2','mars');
%space = ( 'earth'=>{
'planet'=> {
'1' =>'US',
'2' =>'UK'
},
'planet2'=>{
'1' =>'AFRICA',
'2' =>'AUS'
}
},
'sun'=>{
'star' =>{
'1' =>'US',
'2' =>'UK'
}
},
'mars' =>{
'planet2' =>{
'1' =>'US',
'2' =>'UK'
}
}
);
now i am comparing the first two arrays in the following manner
foreach (@arr)
{
$arr_hash{$_} =1;
}
foreach my $name (keys %space)
{
foreach my $key (keys %{$space{$name}})
if ($arr_hash{$name} !=1)
{
#do something
}
now how should i compare the third array? I am trying something like this
else
{
if($arr2_hash{$key}{$name} !=1)
{
#do something else
}
I want to check whether the planet+earth pair (i.e. the combination of key1 and key2, matched against the first and second elements of @arr2) is present in %space too.
any help?
I've done this twice now in Perl. Once for Test::More's is_deeply() and again for perl5i's are_equal(). Doing it right is not simple. Doing it without subroutines is just silly. If you want to see how this is done, look at are_equal(), though it can be done better.
But I don't think you actually need to compare two hashes.
What I think is happening is you need to check if the things in the various arrays are present in %space. For example...
my @arr = ('mars','earth','jupiter');
That would be true, true, and false.
my @arr1 = ('mercury','mars');
False, true.
my @arr2 = ('planet','earth','star','sun','planet2','mars');
Assuming these are pairs, they're all true.
I'm going to use better variable names than @arr, ones which describe the contents, not the type of the structure. I'm also going to assume that use strict; use warnings; use v5.10; is present.
The first two are simple: loop through the array and check if there's an entry in %space. And we can do both arrays in one loop.
for my $name (@names1, @names2) {
print "$name...";
say $space{$name} ? "Yes" : "No";
}
The third set is a little trickier, and how the data is laid out makes it harder. Putting pairs in a list is awkward, that's what hashes are for. This would make more sense...
my %object_types = (
earth => "planet", sun => "star", mars => "planet2"
);
Then it's easy. Check that $space{$name}{$type} is true.
for my $name (keys %object_types) {
my $type = $object_types{$name};
print "$name / $type...";
say $space{$name}{$type} ? "Yes" : "No";
}
Or if you're stuck with the array we can iterate through the list in pairs.
# $i will be 0, 2, 4, etc...
for( my $i = 0; $i < $#stellar_objects; $i+=2 ) {
my($type, $name) = ($stellar_objects[$i], $stellar_objects[$i+1]);
print "$name / $type...";
say $space{$name}{$type} ? "Yes" : "No";
}
What if you had a hash of types with multiple names to check instead?
my %object_types = (
planet =>['earth'],
star =>['sun'],
planet2 =>['earth','mars']
);
Same idea, but we need an inner loop over the names array. Good use of plural variable names helps keep things straight.
for my $type (keys %object_types) {
my $names = $object_types{$type};
for my $name (@$names) {
print "$name / $type...";
say $space{$name}{$type} ? "Yes" : "No";
}
}
Since these are really a set of pairs to search for, combining them into a big hash is a disservice. A better data structure to feed this search might be a list of pairs.
my @searches = (
[ planet => 'earth' ],
[ star => 'sun' ],
[ planet2 => 'earth' ],
[ planet2 => 'mars' ],
);
for my $search (@searches) {
my($type, $name) = @$search;
print "$name / $type...";
say $space{$name}{$type} ? "Yes" : "No";
}
For the record, %space is poorly designed. The first two levels are fine, name and type, it's the country hashes that are awkward.
'sun'=>{
'star' =>{
# This part
'1' =>'US',
'2' =>'UK'
}
},
This has none of the advantages of a hash, and all of the disadvantages. The advantage of a hash is it's very fast to look up a single key, but this makes it awkward by making the interesting part a value. If the key is trying to impose an order on the hash, use an array.
sun => {
star => [ 'US', 'UK' ]
},
Then you can get the list of countries as an array reference: $countries = $space{$name}{$type}
If you want fast key lookup and order doesn't matter, use a hash with the keys being the thing stored, and the value being 1 (just a placeholder for "true").
sun => {
star => { 'US' => 1, 'UK' => 1 }
},
This takes advantage of hash key lookup and allows $space{$name}{$type}{$country} to quickly check for existence. The "values" (even though they're stored as keys) are also guaranteed to be unique. This is formally known as a set, a collection of unique values.
And you can store further information in the value.
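As a rough sketch of both points, the set-style layout can be checked with exists, and the placeholder 1 can later be swapped for a hash of details. The rank field below is hypothetical; it assumes the original '1' and '2' keys were meant as an ordering.

```perl
use strict;
use warnings;

# Set-style layout from above: countries are keys, 1 is a placeholder.
my %space = (
    sun => { star => { US => 1, UK => 1 } },
);

# Membership test is a single hash lookup.
my $has_uk = exists $space{sun}{star}{UK} ? "Yes" : "No";
my $has_fr = exists $space{sun}{star}{FR} ? "Yes" : "No";
print "UK: $has_uk, FR: $has_fr\n";

# Hypothetical extension: replace the placeholder with a hash of details.
# The 'rank' field is invented here purely for illustration.
$space{sun}{star}{US} = { rank => 1 };
$space{sun}{star}{UK} = { rank => 2 };

my @by_rank = sort { $space{sun}{star}{$a}{rank} <=> $space{sun}{star}{$b}{rank} }
              keys %{ $space{sun}{star} };
print "By rank: @by_rank\n";
```

The existence check keeps working after the change, because exists only looks at the key and not at what the value holds.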

Hash in array in a hash

I'm trying to identify the output of Data::Dumper, it produces the output below when used on a hash in some code I'm trying to modify:
print Dumper(\%unholy_horror);
$VAR1 = {
'stream_details' => [
{
'file_path' => '../../../../tools/test_data/',
'test_file' => 'test_file_name'
}
]
};
Is this a hash inside an array inside a hash? If not what is it? and what is the syntax to access the "file path" and "test_file" keys, and their values.
I want to iterate over that inner hash like below, how would I do that?
while ( ($key, $value) = each %hash )
{
print "key: $key, value: $hash{$key}\n";
}
You're correct. It's a hash in an array in a hash.
my %top;
$top{'stream_details'}[0]{'file_path'} = '../../../../tools/test_data/';
$top{'stream_details'}[0]{'test_file'} = 'test_file_name';
print Dumper \%top;
You can access the elements as above, or iterate with 3 levels of for loop - assuming you want to iterate the whole thing.
foreach my $topkey ( keys %top ) {
print "$topkey\n";
foreach my $element ( @{$top{$topkey}} ) {
foreach my $subkey ( keys %$element ) {
print "$subkey = ",$element->{$subkey},"\n";
}
}
}
I would add - sometimes you get some quite odd seeming hash topologies as a result of parsing XML or JSON. It may be worth looking to see if that's what's happening, because 'working' with the parsed object might be easier.
The above might be the result of:
#JSON
{"stream_details":[{"file_path":"../../../../tools/test_data/","test_file":"test_file_name"}]}
Or something similar from an API. (I think it's unlikely to be XML, since XML doesn't implicitly have 'arrays' in the way JSON does).
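If the structure really does come from JSON, a minimal sketch of decoding it and walking the inner hash could look like this. It assumes the core JSON::PP module is available, and the variable names $json, $data and $inner are made up for the example.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

my $json = '{"stream_details":[{"file_path":"../../../../tools/test_data/","test_file":"test_file_name"}]}';
my $data = decode_json($json);

# hash -> array (first element) -> hash
my $inner = $data->{stream_details}[0];
print "file_path: $inner->{file_path}\n";

# Iterate over the inner hash; sorting the keys keeps the output stable.
my @lines = map { "key: $_, value: $inner->{$_}" } sort keys %$inner;
print "$_\n" for @lines;
```

The same $inner reference can be used with each, as in the question, by writing each %$inner instead of each %hash.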

php multi array

What if I have a multidimensional array, in which I want to make combinations. So for example if I have 3 arrays:
Array(1,2)
Array(7,3)
Array(3,9,8,2)
Then I want a function which makes combinations like array(1,7), array(1,3), array(3,9), etcetera, but also array(7,3), array(7,9).
It must also check availability: if a number is not available, no combinations should be made with it.
This is what I have so far, and I can't get any further:
$xCombinations = array(
array(1,2),
array(7,3),
array(3,9,8,2)
);
$available = array(1 => 2, 2 => 1, 7 => 3, 3 => 4, 8 => 2, 2 => 4);
$combination = array();
recursiveArray($xCombinations);
function recursiveArray($tmpArr){
if($tmpArr){
foreach ($tmpArr as $value) {
if (is_array($value)) {
recursiveArray($value);
} else {
//here 1st time $value would be 1,
//so this must now be combined with 7 , 3 , 9, 8 ,2 >
//so you will get an array of array(1,7) array, (1,3) array (3,9)
//etcetera.. there arrays must be also in a multidimensional array
}
}
}
}
This issue consists of two parts. It could be implemented as one functional block, but that would be hard to read, so I'll split it into two operations.
Retrieving combinations
The first task is to get all combinations. By definition, those combinations are the Cartesian product of your array members. That can be produced without recursion, like:
$result = array_reduce(array_slice($input, 1), function($c, $x)
{
return call_user_func_array('array_merge',
array_map(function($y) use ($c)
{
return array_map(function($z) use ($y)
{
return array_merge((array)$z, (array)$y);
}, $c);
}, $x)
);
}, current($input));
Here we're accumulating our tuples one by one. The result is all possible combinations, as it should be for a Cartesian product.
Filtering result
This is another part of the issue. Assuming you have $available as a set of key=>value pairs, where key corresponds to specific number in a tuple and value corresponds to how many times can that number appear in that tuple at most, here's simple filtering code:
$result = array_filter($result, function($tuple) use ($available)
{
foreach(array_count_values($tuple) as $value=>$count)
{
if(isset($available[$value]) && $available[$value]<$count)
{
return false;
}
}
return true;
});
For instance, if we have $input as
$input = [
[1,3],
[1,6]
];
and restrictions on it as
$available = [6=>0, 1=>1];
Then result would be only
array(1) {
[1]=>
array(2) {
[0]=>
int(3)
[1]=>
int(1)
}
}
since only this tuple fulfills all requirements (it contains no 6, and 1 appears in it only once).

Looping through known elements in a hash of hashes of arrays

I have a question I am hoping someone could help with (simplified for the purposes of explaining my question).
I have the following hash of hashes of arrays (I think that is what it is anyway?)
Data structure
{
Cat => {
Height => ["Tiny"],
},
Dog => {
Colour => ["Black"],
Height => ["Tall"],
Weight => ["Fat", "Huge"],
},
Elephant => {
Colour => ["Grey"],
Height => ["Really Big"],
Weight => ["Fat", "Medium", "Thin"],
},
}
What I am trying to do
The program below will print the whole data structure. I want to use this kind of way to do it
my %h;
for my $animal (keys %h) {
print "$animal\n";
for my $attribute ( keys %{$h{$animal}} ) {
print "\t $attribute\n";
for my $i (0 .. $#{$h{$animal}{$attribute}} ) {
print "\t\t$h{$animal}{$attribute}[$i]\n";
}
}
}
The problem I am having
I am trying to access a particular part of the data structure. For example, I want to only print out the Height arrays for each animal as I do not care about the other Colour, Weight attributes in this example.
I'm sure there is a simple answer to this, and I know I need to specify the Height part, but what is the correct way of doing it? I have tried multiple ways that I thought would work without success.
In your code, instead of looping over all the attributes with
for my $attribute ( keys %{ $h{$animal} } ) { ... }
just use the one you are interested in. Like this
for my $animal (keys %h) {
print "$animal\n";
for my $attribute ( 'Height' ) {
print "\t $attribute\n";
for my $i (0 .. $#{$h{$animal}{$attribute}} ) {
print "\t$h{$animal}{$attribute}[$i]\n";
}
}
}
I would choose to loop over the contents of the heights array rather than the indices, making the code look like this:
for my $animal (keys %h) {
print "$animal\n";
print "\t\t$_\n" for @{ $h{$animal}{Height} };
}
Taking a quick look at your data structure: It's a hash of hashes of arrays! Wow. Mind officially blown.
Here's a quick way of printing out all of the data:
use feature qw(say);
# Working with a Hash of Hash of Arrays
for my $animal (keys %h) {
say "Animal: $animal";
# Dereference: Now I am talking about a hash of arrays
my %animal_attributes = %{ $h{$animal} };
for my $attribute (keys %animal_attributes) {
# Dereference: Now I am talking about just an array
my @attribute_value_list = @{ $animal_attributes{$attribute} };
say "\tAttribute: $attribute - " . join ", ", @attribute_value_list;
}
}
Note I use dereferencing. I don't have to do the dereference, but it makes the code a bit easier to work with. I don't have to think of my various levels. I know my animal is a hash of attributes, and those attributes are an array of attribute values. By using dereferencing, it allows me to keep things straight.
Now, let's say you want to print out only a list of desirable attributes. You can use the exists function to see if that attribute exists before trying to print it out.
use feature qw(say);
# Hash keys are case-sensitive, so these must match the keys in the data
use constant DESIRED_ATTRIBUTES => qw(Weight Height Sex_appeal);
# Working with a Hash of Hash of Arrays
for my $animal (keys %h) {
say "Animal: $animal";
# Dereference: Now I am talking about a hash of arrays
my %animal_attributes = %{ $h{$animal} };
for my $attribute ( DESIRED_ATTRIBUTES ) {
if ( exists $animal_attributes{$attribute} ) {
# Dereference: Now I am talking about just an array
my @attribute_value_list = @{ $animal_attributes{$attribute} };
say "\tAttribute: $attribute - " . join ", ", @attribute_value_list;
}
}
}
Same code, I just added an if clause.
When you get into these complex data structures, you might be better off using Object Oriented design. Perl has an excellent tutorial on OOP Perl. If you used that, you could have defined a class of animals and have various methods to pull out the data you want. It makes maintenance much easier and allows you to bravely create even more complex data structures without worrying about tracking where you are.
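To make the object-oriented suggestion concrete, here is a minimal sketch using plain blessed hashes. The Animal class and its name and attribute methods are hypothetical, invented for this example rather than taken from any tutorial or module.

```perl
use strict;
use warnings;

package Animal;

# Hypothetical class: wraps one animal's attribute hash.
sub new {
    my ($class, %args) = @_;
    my $self = {
        name       => $args{name},
        attributes => $args{attributes} || {},
    };
    return bless $self, $class;
}

sub name { $_[0]{name} }

# Returns the list of values for one attribute, or an empty list.
sub attribute {
    my ($self, $attribute) = @_;
    return @{ $self->{attributes}{$attribute} || [] };
}

package main;

my %h = (
    Cat => { Height => ["Tiny"] },
    Dog => { Colour => ["Black"], Height => ["Tall"], Weight => ["Fat", "Huge"] },
);

my @animals = map { Animal->new(name => $_, attributes => $h{$_}) } sort keys %h;

for my $animal (@animals) {
    print $animal->name, "\n";
    print "\t\t$_\n" for $animal->attribute('Height');
}
```

The calling code no longer needs to know how the attributes are nested; it only asks for 'Height', and a missing attribute yields an empty list instead of an error.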
I think sometimes it's easier to use the value directly, if it is a reference to another structure. You could do something like:
my $height = "Height";
while (my ($animal, $attr) = each %h) {
print "$animal\n";
print "\t$height\n";
print "\t\t$_\n" for @{ $attr->{$height} };
}
Using the value of the main keys, you can skip over one step of references and go straight at the Height attribute. The output below follows the format you had in your original code.
Output:
Elephant
Height
Really Big
Cat
Height
Tiny
Dog
Height
Tall
Assuming your variable is called %h:
foreach my $animal (keys %h) {
my $heights = $h{$animal}->{Height}; #gets the Height array
print $animal, "\n";
foreach my $height ( @$heights ) {
print " ", $height, "\n";
}
}
I think I have worked it out and found what I was doing wrong?
This is how I think it should be:
my %h;
for my $animal (keys %h) {
print "$animal\n";
for my $i (0 .. $#{$h{$animal}{Height}} ) {
print "\t\t$h{$animal}{Height}[$i]\n";
}
}

How do I consolidate a hash in Perl?

I have an array of hash references. The hashes contain 2 keys, USER and PAGES. The goal here is to go through the array of hash references and keep a running total of the pages that each user printed on a printer (this comes from the event logs). I pulled the data from an Excel spreadsheet and used regexes to pull the username and pages. There are 182 rows in the spreadsheet and each row contains a username and the number of pages they printed on that job. Currently the script can print each print job (all 182) with the username and the pages they printed, but I want to consolidate this down so it will show: username 266 (i.e. just show the username once, and the total number of pages they printed for the whole spreadsheet).
Here is my attempt at going through the array of hash references, seeing if the user already exists and if so, += the number of pages for that user into a new array of hash references (a smaller one). If not, then add the user to the new hash ref array:
my $criteria = "USER";
my @sorted_users = sort { $a->{$criteria} cmp $b->{$criteria} } @user_array_of_hash_refs;
my @hash_ref_arr;
my $hash_ref = \@hash_ref_arr;
foreach my $index (@sorted_users)
{
my %hash = (USER=>"",PAGES=>"");
if(exists $index{$index->{USER}})
{
$hash{PAGES}+=$index->{PAGES};
}
else
{
$hash{USER}=$index->{USER};
$hash{PAGES}=$index->{PAGES};
}
push(@hash_ref_arr,{%hash});
}
But it gives me an error:
Global symbol "%index" requires explicit package name at ...
Maybe my logic isn't the best on this. Should I use arrays instead? It seems as though a hash is the best thing here, given the nature of my data. I just don't know how to go about slimming the array of hash refs down to just get a username and the total pages they printed (I know I seem redundant but I'm just trying to be clear). Thank you.
my %totals;
$totals{$_->{USER}} += $_->{PAGES} for @user_array_of_hash_refs;
And then, to get the data out:
print "$_ : $totals{$_}\n" for keys %totals;
You could sort by usage too:
print "$_ : $totals{$_}\n" for sort { $totals{$a} <=> $totals{$b} } keys %totals;
As mkb mentioned, the error is in the following line:
if(exists $index{$index->{USER}})
However, after reading your code, your logic is faulty. Simply correcting the syntax error will not provide your desired results.
I would recommend skipping the temporary hash within the loop. Just work with a results hash directly.
For example:
#!/usr/bin/perl
use strict;
use warnings;
my @test_data = (
{ USER => "tom", PAGES => "5" },
{ USER => "mary", PAGES => "2" },
{ USER => "jane", PAGES => "3" },
{ USER => "tom", PAGES => "3" }
);
my $criteria = "USER";
my @sorted_users = sort { $a->{$criteria} cmp $b->{$criteria} } @test_data;
my %totals;
for my $index (@sorted_users) {
if (not exists $totals{$index->{USER}}) {
# initialize total for this user
$totals{$index->{USER}} = 0;
}
# add to user's running total
$totals{$index->{USER}} += $index->{PAGES}
}
print "$_: $totals{$_}\n" for keys %totals;
This produces the following output:
$ ./test.pl
jane: 3
tom: 8
mary: 2
The error comes from this line:
if(exists $index{$index->{USER}})
The $ sigil in Perl 5 with {} after the name means that you are getting a scalar value out of a hash. There is no hash declared by the name %index. I think that you probably just need to add a -> operator so the problem line becomes:
if(exists $index->{$index->{USER}})
but not having the data makes me unsure.
Also, good on you for using use strict or you would be instantiating the %index hash silently and wondering why your results didn't make any sense.
my %total;
for my $name_pages_pair (@sorted_users) {
$total{$name_pages_pair->{USER}} += $name_pages_pair->{PAGES};
}
for my $username (sort keys %total) {
printf "%20s %6u\n", $username, $total{$username};
}
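If the consolidated result is needed in the same array-of-hash-refs shape as the input, as the question describes, the totals hash can be turned back into one. This is a small sketch with made-up sample data; @consolidated is a hypothetical name.

```perl
use strict;
use warnings;

# Sample data standing in for the spreadsheet rows.
my @user_array_of_hash_refs = (
    { USER => "tom",  PAGES => 5 },
    { USER => "mary", PAGES => 2 },
    { USER => "jane", PAGES => 3 },
    { USER => "tom",  PAGES => 3 },
);

my %total;
$total{ $_->{USER} } += $_->{PAGES} for @user_array_of_hash_refs;

# One hash ref per user; +{ ... } forces an anonymous hash, not a block.
my @consolidated = map { +{ USER => $_, PAGES => $total{$_} } } sort keys %total;

printf "%-10s %6u\n", $_->{USER}, $_->{PAGES} for @consolidated;
```

Sorting the keys is optional; it is only there so the output order does not change between runs.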
