Perl Hash Trouble - arrays

An easy one for a Perl guru...
I want a function that simply takes in an array of items (actually multiple arrays) and counts the number of times each item in the key section of a hash is there. However, I am really unsure of Perl hashes.
#array = qw/banana apple orange apple orange apple pear/
I read that you need to do arrays using code like this:
my %hash = (
'banana' => 0,
'orange' => 0,
'apple' => 0
#I intentionally left out pear... I only want the values in the array...
);
However, I am struggling getting a loop to work that can go through and add one to the value with a corresponding key equal to a value in the array for each item in the array.
foreach $fruit (#array) {
if ($_ #is equal to a key in the hash) {
#Add one to the corresponding value
}
}
This has a few basic functions all wrapped up in one, so on behalf of all beginning Perl programmers, thank you in advance!

All you need is
my #array = qw/banana apple orange apple orange apple pear/;
my %counts;
++$counts{$_} for #array;
This results in a hash like
my %counts = ( apple => 3, banana => 1, orange => 2, pear => 1 )
The for loop can be written with block and a an explicit loop counter variable if you prefer, like this
for my $word (#array) {
++$counts{$word};
}
with exactly the same effect.

You can use exists.
http://perldoc.perl.org/functions/exists.html
Given an expression that specifies an element of a hash, returns true
if the specified element in the hash has ever been initialized, even
if the corresponding value is undefined.
foreach my $fruit (#array) {
if (exists $hash{$fruit}) {
$hash{$fruit}++;
}
}

Suppose you have an array named #array. You'd access the 0th element of the array with $array[0].
Hashes are similar. %hash's banana element can be accessed with $hash{'banana'}.
Here's a pretty simple example. It makes use of the implicit variable $_ and a little bit of string interpolation:
use strict;
my #array = qw/banana apple orange apple orange apple pear/;
my %hash;
$hash{$_} += 1 for #array; #add one for each fruit in the list
print "$_: $hash{$_}\n" for keys %hash; #print our results
If needed, you can check if a particular hash key exists: if (exists $hash{'banana'}) {...}.
You'll eventually get to see something called a "hashref", which is not a hash but a reference to a hash. In that case, $hashref has $hashref->{'banana'}.

I'm trying to understand you here:
You have an array and a hash.
You want to count the items in the array and see how many time they occur
But, only if this item is in your hash.
Think of hashes as keyed arrays. Arrays have a position. You can talk about the 0th element, or the 5th element. There is only one 0th element and their is only one 5th element.
Let's look at a hash:
my %jobs;
$jobs{bob} = "banker";
$jobs{sue} = "banker";
$jobs{joe} = "plumber;
Just as we can talk about the element in the array in the 0th position, we can talk about the element with the of bob. Just as there is only one element in the 0th position, there can only be one element with a key of bob.
Hashes provide a quick way to look up information. For example, I can quickly find out Sue's job:
print "Sue is a $jobs{sue}\n";
We have:
An array filled with items.
A hash with the items we want to count
Another hash with the totals.
Here's the code:
use strict;
use warnings;
use feature qw(say);
my #items = qw(.....); # Items we want to count
my %valid_items = (....); # The valid items we want
# Initializing the totals. Explained below...
my %totals;
map { $totals{$_} = 0; } keys %valid_items;
for my $item ( #items ) {
if ( exists $valid_items{$item} ) {
$totals{$item} += 1; #Add one to the total number of items
}
}
#
# Now print the totals
#
for my $item ( sort keys %totals ) {
printf "%-10.10s %4d\n", $item, $totals{$item};
}
The map command takes the list of items on the right side (in our case keys %valid_items), and loop through the entire list.
Thus:
map { $totals{$_} = 0; } keys %valid_items;
Is a short way of saying:
for ( keys %valid_items ) {
$totals{$_} = 0;
}
The other things I use are keys which returns as an array (okay list) all of the keys of my hash. Thus, I get back apple, banana, and oranges when I say keys %valid_items.
The [exists](http://perldoc.perl.org/functions/exists.html) is a test to see if a particular key is in my hash. The value of that key might be zero, a null string, or even an undefined value, but if the key is in my hash, theexists` function will return a true value.
However, if we can use exists to see if a key is in my %valid_items hash, we could do the same with %totals. They have the same set of keys.
Instead or creating a %valid_items hash, I'm going to use a #valid_items array because arrays are easier to initialize than hashes. I just have to list the values. Instead of using keys %valid_items to get a list of the keys, I can use #valid_items:
use strict;
use warnings;
use feature qw(say);
my #items = qw(banana apple orange apple orange apple pear); # Items we want to count
my #valid_items = qw(banana apple orange); # The valid items we want
my %totals;
map { $totals{$_} = 0; } #valid_items;
# Now %totals is storing our totals and is the list of valid items
for my $item ( #items ) {
if ( exists $totals{$item} ) {
$totals{$item} += 1; #Add one to the total number of items
}
}
#
# Now print the totals
#
for my $item ( sort keys %totals ) {
printf "%-10.10s %4d\n", $item, $totals{$item};
}
And this prints out:
apple 3
banana 1
orange 2
I like using printf for keeping tables nice and orderly.

This will be easier to understand as I too started to write code just 2 months back.
use Data::Dumper;
use strict;
use warnings;
my #array = qw/banana apple orange apple orange apple pear/;
my %hashvar;
foreach my $element (#array) {
#Check whether the element is already added into hash ; if yes increment; else add.
if (defined $hashvar{$element}) {
$hashvar{$element}++;
}
else {
$hashvar{$element} = 1;
}
}
print Dumper(\%hashvar);
Will print out the output as
$VAR1 = {
'banana' => 1,
'apple' => 3,
'orange' => 2,
'pear' => 1
};
Cheers

Related

Problem to get more than 1 element with same highest occurrence from array in Perl

I only get the smaller element as output although there are 2 elements with same highest occurrence in array
I have tried to remove sort function from the codes but it still returns me the smaller element
my(#a) = (undef,11,12,13,14,15,13,13,14,14);
my(%count);
foreach my $value (#a) {
$count{$value}++;
}
$max_value = (sort {$count{$b} <=> $count{$a}} #a)[0];
print "Max value = $max_value, occur $count{$max_value} times\n";
Expected result: Max value =13 14, occur 3 times
max_by from List::UtilsBy will return all values that share the maximum in list context.
use strict;
use warnings;
use List::UtilsBy 'max_by';
my #a = (undef,11,12,13,14,15,13,13,14,14);
my %count;
$count{$_}++ for #a;
my #max_values = max_by { $count{$_} } keys %count;
Your code simply takes the first maximal value it finds in the sorted data. You need to continue reading array elements until you reach one that is no longer maximal.
However, as you probably have to test all the hash values there's no great advantage to sorting it. You can just traverse it and keep track of the maximal value(s) you find.
my #a = (undef,11,12,13,14,15,13,13,14,14);
my %count;
$count{$_}++ for #a;
my ($max_count, #max_values);
while ( my ($k,$v) = each %count) {
if ($v > $max_count) {
#max_values = ($k);
$max_count = $v;
}
elsif ($v == $max_count) {
push #max_values, $k;
}
}
my $max_values = join " ", sort #max_values;
print "Max value = $max_values, occur $max_count times\n";
Note that undef is not a valid hash key - it gets converted to "".

De-reference x number of times for x number of data structures

I've come across an obstacle in one of my perl scripts that I've managed to solve, but I don't really understand why it works the way it works. I've been scouring the internet but I haven't found a proper explanation.
I have a subroutine that returns a reference to a hash of arrays. The hash keys are simple strings, and the values are references to arrays.
I print out the elements of the array associated with each key, like this
for my $job_name (keys %$build_numbers) {
print "$job_name => ";
my #array = #{#$build_numbers{$job_name}}; # line 3
for my $item ( #array ) {
print "$item \n";
}
}
While I am able to print out the keys & values, I don't really understand the syntax behind line 3.
Our data structure is as follows:
Reference to a hash whose values are references to the populated arrays.
To extract the elements of the array, we have to:
- dereference the hash reference so we can access the keys
- dereference the array reference associated to a key to extract elements.
Final question being:
When dealing with perl hashes of hashes of arrays etc; to extract the elements at the "bottom" of the respective data structure "tree" we have to dereference each level in turn to reach the original data structures, until we obtain our desired level of elements?
Hopefully somebody could help out by clarifying.
Line 3 is taking a slice of your hash reference, but it's a very strange way to do what you're trying to do because a) you normally wouldn't slice a single element and b) there's cleaner and more obvious syntax that would make your code easier to read.
If your data looks something like this:
my $data = {
foo => [0 .. 9],
bar => ['A' .. 'F'],
};
Then the correct version of your example would be:
for my $key (keys(%$data)) {
print "$key => ";
for my $val (#{$data->{$key}}) {
print "$val ";
}
print "\n";
}
Which produces:
bar => A B C D E F
foo => 0 1 2 3 4 5 6 7 8 9
If I understand your second question, the answer is that you can access precise locations of complex data structures if you use the correct syntax. For example:
print "$data->{bar}->[4]\n";
Will print E.
Additional recommended reading: perlref, perlreftut, and perldsc
Working with data structures can be hard depending on how it was made.
I am not sure if your "job" data structure is exactly this but:
#!/usr/bin/env perl
use strict;
use warnings;
use diagnostics;
my $hash_ref = {
job_one => [ 'one', 'two'],
job_two => [ '1','2'],
};
foreach my $job ( keys %{$hash_ref} ){
print " Job => $job\n";
my #array = #{$hash_ref->{$job}};
foreach my $item ( #array )
{
print "Job: $job Item $item\n";
}
}
You have an hash reference which you iterate the keys that are arrays. But each item of this array could be another reference or a simple scalar.
Basically you can work with the ref or undo the ref like you did in the first loop.
There is a piece of documentation you can check for more details here.
So answering your question:
Final question being: - When dealing with perl hashes of hashes of
arrays etc; to extract the elements at the "bottom" of the respective
data structure "tree" we have to dereference each level in turn to
reach the original data structures, until we obtain our desired level
of elements?
It depends on how your data structure was made and if you already know what you are looking for it would be simple to get the value for example:
%city_codes = (
a => 1, b => 2,
);
my $value = $city_codes{a};
Complex data structures comes with complex code.

Count Perl array size

I'm trying to print out the size of my array. I've followed a few other questions like this one on Stack Overflow. However, I never get the result I want.
All I wish for in this example is for the value of 3 to be printed as I have three indexes. All I get, from both print methods is 0.
my #arr;
$arr{1} = 1;
$arr{2} = 2;
$arr{3} = 3;
my $size = #arr;
print $size; # Prints 0
print scalar #arr; # Prints 0
What am I doing wrong, and how do I get the total size of an array when declared and populated this way?
First off:
my #arr;
$arr{1} = 1;
$arr{2} = 2;
$arr{3} = 3;
is nonsense. {} is for hash keys, so you are referring to %arr not #arr. use strict; and use warnings; would have told you this, and is just one tiny fragment of why they're considered mandatory.
To count the elements in an array, merely access it in a scalar context.
print scalar #arr;
if ( $num_elements < #arr ) { do_something(); }
But you would need to change your thing to
my #arr;
$arr[1] = 1;
$arr[2] = 2;
$arr[3] = 3;
And note - the first element of your array $arr[0] would be undefined.
$VAR1 = [
undef,
1,
2,
3
];
As a result, you would get a result of 4. To get the desired 'count of elements' you would need to filter the undefined items, with something like grep:
print scalar grep {defined} #arr;
This will take #arr filter it with grep (returning 3 elements) and then take the scalar value - count of elements, in this case 3.
But normally - you wouldn't do this. It's only necessary because you're trying to insert values into specific 'slots' in your array.
What you would do more commonly, is use either a direct assignment:
my #arr = ( 1, 2, 3 );
Or:
push ( #arr, 1 );
push ( #arr, 2 );
push ( #arr, 3 );
Which inserts the values at the end of the array. You would - if explicitly iterating - go from 0..$#arr but you rarely need to do this when you can do:
foreach my $element ( #arr ) {
print $element,"\n";
}
Or you can do it with a hash:
my %arr;
$arr{1} = 1;
$arr{2} = 2;
$arr{3} = 3;
This turns your array into a set of (unordered) key-value pairs, which you can access with keys %arr and do exactly the same:
print scalar keys %arr;
if ( $elements < keys %arr ) { do_something(); }
In this latter case, your hash will be:
$VAR1 = {
'1' => 1,
'3' => 3,
'2' => 2
};
I would suggest this is bad practice - if you have ordered values, the tool for the job is the array. If you have 'key' values, a hash is probably the tool for the job still - such as a 'request ID' or similar. You can typically tell the difference by looking at how you access the data, and whether there are any gaps (including from zero).
So to answer your question as asked:
my $size = #arr;
print $size; # prints 0
print scalar #arr; # prints 0
These don't work, because you never insert any values into #arr. But you do have a hash called %arr which you created implicitly. (And again - use strict; and use warnings; would have told you this).
You are initializing a hash, not an array.
To get the "size" of your hash you can write.
my $size = keys %arr;
I just thought there should be an illustration of your code run with USUW (use strict/use warnings) and what it adds to the troubleshooting process:
use strict;
use warnings;
my #arr;
...
And when you run it:
Global symbol "%arr" requires explicit package name (did you forget to declare "my %arr"?) at - line 9.
Global symbol "%arr" requires explicit package name (did you forget to declare "my %arr"?) at - line 10.
Global symbol "%arr" requires explicit package name (did you forget to declare "my %arr"?) at - line 11.
Execution of - aborted due to compilation errors.
So USUW.
You may be thinking that you are instantiating an element of #arr when you are typing in the following code:
$arr{1} = 1;
However, you are instantiating a hash doing that. This tells me that you are not using strict or you would have an error. Instead, change to brackets, like this:
$arr[1] = 1;

Why ever use an array instead of a hash?

I have read that it is much faster to iterate through a hash than through an array. Retrieving values from a hash is also much faster.
Instead of using an array, why not just use a hash and give each key a value corresponding to an index? If the items ever need to be in order, they can be sorted.
Retrieving from hash is faster in a sense that you can fetch value directly by key instead of iterating over whole hash (or array when you're searching for particular string). Having that said, $hash{key} isn't faster than $array[0] as no iteration is taking place.
Arrays can't be replaced by hashes, as they have different features,
arrays hashes
------------------------------------
ordered keys x -
push/pop x -
suitable for looping x -
named keys - x
I don't know where you read that hashes are faster than arrays. According to some Perl reference works (Mastering Algorithms with Perl), arrays are faster than hashes (follow this link for some more info).
If speed is your only criterae, you should benchmark to see which technique is going to be faster. It depends on what operations you will be doing onto the array/hash.
Here is an SO link with some further information: Advantage of 'one dimensional' hash over array in Perl
I think this is a good question: it's not so much a high level "language design" query so much as it is an implementation question. It could be worded in a way that emphasizes that - say using hashes versus arrays for a particular technique or use case.
Hashes are nice but you need lists/arrays (c.f. #RobEarl). You can use tie (or modules like Tie::IxHash or Tie::Hash::Indexed ) to "preserve" the order of a hash, but I believe these would have to be slower than a regular hash and in some cases you can't pass them around or copy them in quite the same way.
This code is more or less how a hash works. It should explain well enough why you would want to use an array instead of a hash.
package DIYHash;
use Digest::MD5;
sub new {
my ($class, $buckets) = #_;
my $self = bless [], $class;
$#$self = $buckets || 32;
return $self;
}
sub fetch {
my ( $self, $key ) = #_;
my $i = $self->_get_bucket_index( $key );
my $bo = $self->_find_key_in_bucket($key);
return $self->[$i][$bo][1];
}
sub store {
my ( $self, $key, $value ) = #_;
my $i = $self->_get_bucket_index( $key );
my $bo = $self->_find_key_in_bucket($key);
$self->[$i][$bo] = [$key, $value];
return $value;
}
sub _find_key_in_bucket {
my ($self, $key, $index) = #_;
my $bucket = $self->[$index];
my $i = undef;
for ( 0..$#$bucket ) {
next unless $bucket->[$_][0] eq $key;
$i = $_;
}
$i = #$bucket unless defined $i;
return $i;
}
# This function needs to always return the same index for a given key.
# It can do anything as long as it always does that.
# I use the md5 hashing algorithm here.
sub _get_bucket_index {
my ( $self, $key ) = #_;
# Get a number from 0 to 1 - bucket count.
my $index = unpack( "I", md5($key) ) % #$self;
return $index;
}
1;
To use this amazing cluster of code:
my $hash = DIYHash->new(4); #This hash has 4 buckets.
$hash->store(mouse => "I like cheese");
$hash->store(cat => "I like mouse");
say $hash->fetch('mouse');
Hashes look like they are constant time, rather than order N because for a given data set, you select a number of buckets that keeps the number of items in any bucket very small.
A proper hashing system would be able to resize as appropriate when the number of collisions gets too high. You don't want to do this often, because it is an order N operation.

Loop 1st element of 2d List

I have a list like this
[[hash,hash,hash],useless,useless,useless]
I want to take the first element of hashes and loop through it - i try this:
my #list = get_list_somehow();
print Dumper($list[0][0]); print Dumper($list[0][1]);print Dumper($list[0][2]);
and i am able to access the elements fine manually, but when i try this
my #list = get_list_somehow()[0];
print Dumper($list[0]); print Dumper($list[1]);print Dumper($list[2]);
foreach(#list){
do_something_with($_);
}
only $list[0] returns a value (the first hash, everything else is undefined)
You are taking a subscript [0] of the return value of get_list_somehow() (although technically, you need parentheses there). What you need to do is to dereference the first element in that list. So:
my #list = get_list_somehow();
my $first = $list[0]; # take first element
my #newlist = #$first; # dereference array ref
Of course, this is cumbersome and verbose, and if you just want to print the array with Data::Dumper you can just do:
print Dumper $list[0];
Or if you just want the first array, you can do it in one step. Although this looks complicated and messy:
my #list = #{ (get_list_somehow())[0] };
The #{ ... } will expand an array reference inside it, which is what hopefully is returned from your subscript of the list from get_list_somehow().
I'm taking that your sample data looks like this:
my #data = [
{
one => 1,
two => 2,
three => 3,
},
"value",
"value",
"value",
];
That is, the first element of #data, $data[0] is your hash. Is that correct?
Your hash is a hash reference. That is the $data[0] points to the memory location where that hash is stored.
To get the hash itself, it must be dereferenced:
my %hash = %{ $data[0] }; # Dereference the hash in $data[0]
for my $key ( keys %hash ) {
say qq( \$hash{$key} = "$hash{$key}".);
}
I could have done the dereferencing in one step...
for my $key ( keys #{ $data[0] } ) {
say qq(\$data[0]->{$key} = ") . $data[0]->{$key} . qq(".);
}
Take a look at the Perl Reference Tutorial for information on how to work with references.
I'm guessing a bit on your data structure here:
my $list = [
[ { a => 1,
b => 2,
c => 3, },
{ d => 4, }
{ e => 5, }
], undef, undef, undef,
];
Then we get the 0th (first) element of the top-level array reference, which is another array reference, and then the 0th (first) element of THAT array reference, which is the first hash reference:
my $hr = $list->[0][0];
And iterate over the hash keys. That could also be written as one step: keys %{ $list->[0][0] }. It's a bit easier to see what's going on when broken out into two steps.
for my $key (keys %$hr) {
printf "%s => %s\n", $key, $hr->{$key};
}
Which outputs:
c => 3
a => 1
b => 2

Resources