Iterate through an array of hashes of array in Perl? - arrays

I have trouble visualizing loops and have what I think is an array of hashes of arrays. Please correct me if I am misunderstanding this. I want to be able to loop through the below array and print each key's value.
The End results would print the elements like so:
name
version
pop
tart
Unfortunately, I fall apart when I get to key three.
my #complex = (
[
{
one => 'name',
two => 'version',
three => [qw( pop tart )],
},
],
);
Here's what I've managed so far. I just don't know to handle key three within these loops.
for my $aref (#complex) {
for my $href (#$aref) {
for (keys %{$href}) {
print "$href->{$_}\n";
}
}
}
Any help would be appreciated.

What seems to be holding you back is that your hash has some values which are strings and some which are array references. You can find out which are which using ref and then print accordingly
for my $aref (#complex) {
for my $href (#$aref) {
for my $key (keys %{$href}) {
my $refval = ref $href->{$key};
if (not $refval) { # not a reference at all
print "$href->{$key}\n";
} elsif ($refval eq 'ARRAY') {
print "$_\n" for #{ $href->{$key} };
#print "#{ $href->{$key} }\n"; # or all in one line
} else {
warn "Unexpected data structure: $refval";
}
}
}
}
For deeper structures, or those that you don't know, write a recursive procedure based on this. And then there are modules which will do it, as well.
Note that a careful consideration of what data structures to use pays off handsomely; it is one of the critical parts of design. On the other hand, once these complex data structures grow unwieldy, or rather if you estimate ahead of time that that can happen in the lifetime of the project, the answer is to switch to a class.

Related

De-reference x number of times for x number of data structures

I've come across an obstacle in one of my perl scripts that I've managed to solve, but I don't really understand why it works the way it works. I've been scouring the internet but I haven't found a proper explanation.
I have a subroutine that returns a reference to a hash of arrays. The hash keys are simple strings, and the values are references to arrays.
I print out the elements of the array associated with each key, like this
for my $job_name (keys %$build_numbers) {
print "$job_name => ";
my #array = #{#$build_numbers{$job_name}}; # line 3
for my $item ( #array ) {
print "$item \n";
}
}
While I am able to print out the keys & values, I don't really understand the syntax behind line 3.
Our data structure is as follows:
Reference to a hash whose values are references to the populated arrays.
To extract the elements of the array, we have to:
- dereference the hash reference so we can access the keys
- dereference the array reference associated to a key to extract elements.
Final question being:
When dealing with perl hashes of hashes of arrays etc; to extract the elements at the "bottom" of the respective data structure "tree" we have to dereference each level in turn to reach the original data structures, until we obtain our desired level of elements?
Hopefully somebody could help out by clarifying.
Line 3 is taking a slice of your hash reference, but it's a very strange way to do what you're trying to do because a) you normally wouldn't slice a single element and b) there's cleaner and more obvious syntax that would make your code easier to read.
If your data looks something like this:
my $data = {
foo => [0 .. 9],
bar => ['A' .. 'F'],
};
Then the correct version of your example would be:
for my $key (keys(%$data)) {
print "$key => ";
for my $val (#{$data->{$key}}) {
print "$val ";
}
print "\n";
}
Which produces:
bar => A B C D E F
foo => 0 1 2 3 4 5 6 7 8 9
If I understand your second question, the answer is that you can access precise locations of complex data structures if you use the correct syntax. For example:
print "$data->{bar}->[4]\n";
Will print E.
Additional recommended reading: perlref, perlreftut, and perldsc
Working with data structures can be hard depending on how it was made.
I am not sure if your "job" data structure is exactly this but:
#!/usr/bin/env perl
use strict;
use warnings;
use diagnostics;
my $hash_ref = {
job_one => [ 'one', 'two'],
job_two => [ '1','2'],
};
foreach my $job ( keys %{$hash_ref} ){
print " Job => $job\n";
my #array = #{$hash_ref->{$job}};
foreach my $item ( #array )
{
print "Job: $job Item $item\n";
}
}
You have an hash reference which you iterate the keys that are arrays. But each item of this array could be another reference or a simple scalar.
Basically you can work with the ref or undo the ref like you did in the first loop.
There is a piece of documentation you can check for more details here.
So answering your question:
Final question being: - When dealing with perl hashes of hashes of
arrays etc; to extract the elements at the "bottom" of the respective
data structure "tree" we have to dereference each level in turn to
reach the original data structures, until we obtain our desired level
of elements?
It depends on how your data structure was made and if you already know what you are looking for it would be simple to get the value for example:
%city_codes = (
a => 1, b => 2,
);
my $value = $city_codes{a};
Complex data structures comes with complex code.

Perl - How do I update (and access) an array stored in array stored in a hash?

Perhaps I have made this more complicated than I need it to be but I am currently trying to store an array that contains, among other things, an array inside a hash in Perl.
i.e. hash -> array -> array
use strict;
my %DEVICE_INFORMATION = {}; #global hash
sub someFunction() {
my $key = 'name';
my #storage = ();
#assume file was properly opened here for the foreach-loop
foreach my $line (<DATA>) {
if(conditional) {
my #ports = ();
$storage[0] = 'banana';
$storage[1] = \#ports;
$storage[2] = '0';
$DEVICE_INFORMATION{$key} = \#storage;
}
elsif(conditional) {
push #{$DEVICE_INFORMATION{$key}[1]}, 5;
}
}#end foreach
} #end someFunction
This is a simplified version of the code I am writing. I have a subroutine that I call in the main. It parses a very specifically designed file. That file guarantees that the if statement fires before subsequent elsif statement.
I think the push call in the elsif statement is not working properly - i.e. 5 is not being stored in the #ports array that should exist in the #storage array that should be returned when I hash the key into DEVICE_INFORMATION.
In the main I try and print out each element of the #storage array to check that things are running smoothly.
#main execution
&someFunction();
print $DEVICE_INFORMATION{'name'}[0];
print $DEVICE_INFORMATION{'name'}[1];
print $DEVICE_INFORMATION{'name'}[2];
The output for this ends up being... banana ARRAY(blahblah) 0
If I change the print statement for the middle call to:
print #{$DEVICE_INFORMATION{'name'}[1]};
Or to:
print #{$DEVICE_INFORMATION{'name'}[1]}[0];
The output changes to banana [blankspace] 0
Please advise on how I can properly update the #ports array while it is stored inside the #storage array that has been hash'd into DEVICE_INFORMATION and then how I can access the elements of #ports. Many thanks!
P.S. I apologize for the length of this post. It is my first question on stackoverflow.
I was going to tell you that Data::Dumper can help you sort out Perl data structures, but Data::Dumper can also tell you about your first problem:
Here's what happens when you sign open-curly + close-curly ( '{}' ) to a hash:
use Data::Dumper ();
my %DEVICE_INFORMATION = {}; #global hash
print Dumper->Dump( [ \%DEVICE_INFORMATION ], [ '*DEVICE_INFORMATION ' ] );
Here's the output:
%DEVICE_INFORMATION = (
'HASH(0x3edd2c)' => undef
);
What you did is you assigned the stringified hash reference as a key to the list element that comes after it. implied
my %DEVICE_INFORMATION = {} => ();
So Perl assigned it a value of undef.
When you assign to a hash, you assign a list. A literal empty hash is not a list, it's a hash reference. What you wanted to do for an empty hash--and what is totally unnecessary--is this:
my %DEVICE_INFORMATION = ();
And that's unnecessary because it is exactly the same thing as:
my %DEVICE_INFORMATION;
You're declaring a hash, and that statement fully identifies it as a hash. And Perl is not going to guess what you want in it, so it's an empty hash from the get-go.
Finally, my advice on using Data::Dumper. If you started your hash off right, and did the following:
my %DEVICE_INFORMATION; # = {}; #global hash
my #ports = ( 1, 2, 3 );
# notice that I just skipped the interim structure of #storage
# and assigned it as a literal
# * Perl has one of the best literal data structure languages out there.
$DEVICE_INFORMATION{name} = [ 'banana', \#ports, '0' ];
print Data::Dumper->Dump(
[ \%DEVICE_INFORMATION ]
, [ '*DEVICE_INFORMATION' ]
);
What you see is:
%DEVICE_INFORMATION = (
'name' => [
'banana',
[
1,
2,
3
],
'0'
]
);
So, you can better see how it's all getting stored, and what levels you have to deference and how to get the information you want out of it.
By the way, Data::Dumper delivers 100% runnable Perl code, and shows you how you can specify the same structure as a literal. One caveat, you would have to declare the variable first, using strict (which you should always use anyway).
You update #ports properly.
Your print statement accesses $storage[1] (reference to #ports) in wrong way.
You may use syntax you have used in push.
print $DEVICE_INFORMATION{'name'}[0], ";",
join( ':', #{$DEVICE_INFORMATION{'name'}[1]}), ";",
$DEVICE_INFORMATION{'name'}[2], "\n";
print "Number of ports: ", scalar(#{$DEVICE_INFORMATION{'name'}[1]})),"\n";
print "First port: ", $DEVICE_INFORMATION{'name'}[1][0]//'', "\n";
# X//'' -> X or '' if X is undef

Faster way to check for element in array?

This function does the same as exists does with hashes.
I plan on use it a lot.
Can it be optimized in some way?
my #a = qw/a b c d/;
my $ret = array_exists("b", #a);
sub array_exists {
my ($var, #a) = #_;
foreach my $e (#a) {
if ($var eq $e) {
return 1;
}
}
return 0;
}
If you have to do this a lot on a fixed array, use a hash instead:
my %hash = map { $_, 1 } #array;
if( exists $hash{$key} ) { ... }
Some people reach for the smart match operator, but that's one of the features that we need to remove from Perl. You need to decide if this should match, where the array hold an array reference that has a hash reference with the key b:
use 5.010;
my #a = (
qw(x y z),
[ { 'b' => 1 } ],
);
say 'Matches' if "b" ~~ #a; # This matches
Since the smart match is recursive, if keeps going down into data structures. I write about some of this in Rethinking smart matching.
You can use smart matching, available in Perl 5.10 and later:
if ("b" ~~ #a) {
# "b" exists in #a
}
This should be much faster than a function call.
I'd use List::MoreUtils::any.
my $ret = any { $_ eq 'b' } #a;
Since there are lots of similar questions on StackOverflow where different "correct answers" return different results, I tried to compare them. This question seems to be a good place to share my little benchmark.
For my tests I used a test set (#test_set) of 1,000 elements (strings) of length 10 where only one element ($search_value) matches a given string.
I took the following statements to validate the existence of this element in a loop of 100,000 turns.
_grep
grep( $_ eq $search_value, #test_set )
_hash
{ map { $_ => 1 } #test_set }->{ $search_value }
_hash_premapped
$mapping->{ $search_value }
uses a $mapping that is precalculated as $mapping = { map { $_ => 1 } #test_set } (which is included in the final measuring)
_regex
sub{ my $rx = join "|", map quotemeta, #test_set; $search_value =~ /^(?:$rx)$/ }
_regex_prejoined
$search_value =~ /^(?:$rx)$/
uses a regular expression $rx that is precalculated as $rx = join "|", map quotemeta, #test_set; (which is included in the final measuring)
_manual_first
sub{ foreach ( #test_set ) { return 1 if( $_ eq $search_value ); } return 0; }
_first
first { $_ eq $search_value } #test_set
from List::Util (version 1.38)
_smart
$search_value ~~ #test_set
_any
any { $_ eq $search_value } #test_set
from List::MoreUtils (version 0.33)
On my machine ( Ubuntu, 3.2.0-60-generic, x86_64, Perl v5.14.2 ) I got the following results. The shown values are seconds and returned by gettimeofday and tv_interval of Time::HiRes (version 1.9726).
Element $search_value is located at position 0 in array #test_set
_hash_premapped: 0.056211
_smart: 0.060267
_manual_first: 0.064195
_first: 0.258953
_any: 0.292959
_regex_prejoined: 0.350076
_grep: 5.748364
_regex: 29.27262
_hash: 45.638838
Element $search_value is located at position 500 in array #test_set
_hash_premapped: 0.056316
_regex_prejoined: 0.357595
_first: 2.337911
_smart: 2.80226
_manual_first: 3.34348
_any: 3.408409
_grep: 5.772233
_regex: 28.668455
_hash: 45.076083
Element $search_value is located at position 999 in array #test_set
_hash_premapped: 0.054434
_regex_prejoined: 0.362615
_first: 4.383842
_smart: 5.536873
_grep: 5.962746
_any: 6.31152
_manual_first: 6.59063
_regex: 28.695459
_hash: 45.804386
Conclusion
The fastest method to check the existence of an element in an array is using prepared hashes. You of course buy that by an proportional amount of memory consumption and it only makes sense if you search for elements in the set multiple times. If your task includes small amounts of data and only a single or a few searches, hashes can even be the worst solution. Not the same way fast, but a similar idea would be to use prepared regular expressions, which seem to have a smaller preparation time.
In many cases, a prepared environment is no option.
Surprisingly List::Util::first has very good results, when it comes to the comparison of statements, that don't have a prepared environment. While having the search value at the beginning (which could be perhaps interpreted as the result in smaller sets, too) it is very close to the favourites ~~ and any (and could even be in the range of measurement inaccuracy). For items in the middle or at the end of my larger test set, first is definitely the fastest.
brian d foy suggested using a hash, which gives O(1) lookups, at the cost of slightly more expensive hash creation. There is a technique that Marc Jason Dominus describes in his book Higher Order Perl where by a hash is used to memoize (or cache) results of a sub for a given parameter. So for example, if findit(1000) always returns the same thing for the given parameter, there's no need to recalculate the result every time. The technique is implemented in the Memoize module (part of the Perl core).
Memoizing is not always a win. Sometimes the overhead of the memoized wrapper is greater than the cost of calculating a result. Sometimes a given parameter is unlikely to ever be checked more than once or a relatively few times. And sometimes it cannot be guaranteed that the result of a function for a given parameter will always be the same (ie, the cache can become stale). But if you have an expensive function with stable return values per parameter, memoization can be a big win.
Just as brian d foy's answer uses a hash, Memoize uses a hash internally. There is additional overhead in the Memoize implementation, but the benefit to using Memoize is that it doesn't require refactoring the original subroutine. You just use Memoize; and then memoize( 'expensive_function' );, provided it meets the criteria for benefitting from memoization.
I took your original subroutine and converted it to work with integers (just for simplicity in testing). Then I added a second version that passed a reference to the original array rather than copying the array. With those two versions, I created two more subs that I memoized. I then benchmarked the four subs.
In benchmarking, I had to make some decisions. First, how many iterations to test. The more iterations we test, the more likely we are to have good cache hits for the memoized versions. Then I also had to decide how many items to put into the sample array. The more items, the less likely to have cache hits, but the more significant the savings when a cache hit occurs. I ultimately decided on an array to be searched containing 8000 elements, and chose to search through 24000 iterations. That means that on average there should be two cache hits per memoized call. (The first call with a given param will write to the cache, while the second and third calls will read from the cache, so two good hits on average).
Here is the test code:
use warnings;
use strict;
use Memoize;
use Benchmark qw/cmpthese/;
my $n = 8000; # Elements in target array
my $count = 24000; # Test iterations.
my #a = ( 1 .. $n );
my #find = map { int(rand($n)) } 0 .. $count;
my ( $orx, $ormx, $opx, $opmx ) = ( 0, 0, 0, 0 );
memoize( 'orig_memo' );
memoize( 'opt_memo' );
cmpthese( $count, {
original => sub{ my $ret = original( $find[ $orx++ ], #a ); },
orig_memo => sub{ my $ret = orig_memo( $find[ $ormx++ ], #a ); },
optimized => sub{ my $ret = optimized( $find[ $opx++ ], \#a ); },
opt_memo => sub{ my $ret = opt_memo( $find[ $opmx++ ], \#a ); }
} );
sub original {
my ( $var, #a) = #_;
foreach my $e ( #a ) {
return 1 if $var == $e;
}
return 0;
}
sub orig_memo {
my ( $var, #a ) = #_;
foreach my $e ( #a ) {
return 1 if $var == $e;
}
return 0;
}
sub optimized {
my( $var, $aref ) = #_;
foreach my $e ( #{$aref} ) {
return 1 if $var == $e;
}
return 0;
}
sub opt_memo {
my( $var, $aref ) = #_;
foreach my $e ( #{$aref} ) {
return 1 if $var == $e;
}
return 0;
}
And here are the results:
Rate orig_memo original optimized opt_memo
orig_memo 876/s -- -10% -83% -94%
original 972/s 11% -- -82% -94%
optimized 5298/s 505% 445% -- -66%
opt_memo 15385/s 1657% 1483% 190% --
As you can see, the memoized version of your original function was actually slower. That's because so much of the cost of your original subroutine was spent in making copies of the 8000 element array, combined with the fact that there is additional call-stack and bookkeeping overhead with the memoized version.
But once we pass an array reference instead of a copy, we remove the expense of passing the entire array around. Your speed jumps considerably. But the clear winner is the optimized (ie, passing array refs) version that we memoized (cached), at 1483% faster than your original function. With memoization the O(n) lookup only happens the first time a given parameter is checked. Subsequent lookups occur in O(1) time.
Now you would have to decide (by Benchmarking) whether memoization helps you. Certainly passing an array ref does. And if memoization doesn't help you, maybe brian's hash method is best. But in terms of not having to rewrite much code, memoization combined with passing an array ref may be an excellent alternative.
Your current solution iterates through the array before it finds the element it is looking for. As such, it is a linear algorithm.
If you sort the array first with a relational operator (>for numeric elements, gt for strings) you can use binary search to find the elements. It is a logarithmic algorithm, much faster than linear.
Of course, one must consider the penalty of sorting the array in the first place, which is a rather slow operation (n log n). If the contents of the array you are matching against change often, you must sort after every change, and it gets really slow. If the contents remain the same after you've initially sorted them, binary search ends up being practically faster.
You can use grep:
sub array_exists {
my $val = shift;
return grep { $val eq $_ } #_;
}
Surprisingly, it's not off too far in speed from List::MoreUtils' any(). It's faster if your item is at the end of the list by about 25% and slower by about 50% if your item is at the start of the list.
You can also inline it if needed -- no need to shove it off into a subroutine. i.e.
if ( grep { $needle eq $_ } #haystack ) {
### Do something
...
}

Filling hash of multi-dimensional arrays in perl

Given three scalars, what is the perl syntax to fill a hash in which one of the scalars is the key, another determines which of two arrays is filled, and the third is appended to one of the arrays? For example:
my $weekday = "Monday";
my $kind = "Good";
my $event = "Birthday";
and given only the scalars and not their particular values, obtained inside a loop, I want a hash like:
my %Weekdays = {
'Monday' => [
["Birthday", "Holiday"], # The Good array
["Exam", "Workday"] # The Bad array
]
'Saturday' => [
["RoadTrip", "Concert", "Movie"],
["Yardwork", "VisitMIL"]
]
}
I know how to append a value to an array in a hash, such as if the key is a single array:
push( #{ $Weekdays{$weekday} }, $event);
Used in a loop, that could give me:
%Weekdays = {
'Monday' => [
'Birthday',
'Holiday',
'Exam',
'Workday'
]
}
I suppose the hash key is the particular weekday, and the value should be a two dimensional array. I don't know the perl syntax to, say, push Birthday into the hash as element [0][0] of the weekday array, and the next time through the loop, push another event in as [0][1] or [1][0]. Similarly, I don't know the syntax to access same.
Using your variables, I'd write it like this:
push #{ $Weekdays{ $weekday }[ $kind eq 'Good' ? 0 : 1 ] }, $event;
However, I'd probably just make the Good/Bad specifiers keys as well. And given my druthers:
use autobox::Core;
( $Weekdays{ $weekday }{ $kind } ||= [] )->push( $event );
Note that the way I've written it here, neither expression cares whether or not an array exists before we start.
Is there some reason that
push #{ $Weekdays{Monday}[0] }, "whatever";
isn’t working for you?

How to get hashes out of arrays in Perl?

I want to write a little "DBQuery" function in perl so I can have one-liners which send an SQL statement and receive back and an array of hashes, i.e. a recordset. However, I'm running into an issue with Perl syntax (and probably some odd pointer/reference issue) which is preventing me from packing out the information from the hash that I'm getting from the database. The sample code below demonstrates the issue.
I can get the data "Jim" out of a hash inside an array with this syntax:
print $records[$index]{'firstName'}
returns "Jim"
but if I copy the hash record in the array to its own hash variable first, then I strangely can't access the data anymore in that hash:
%row = $records[$index];
$row{'firstName'};
returns "" (blank)
Here is the full sample code showing the problem. Any help is appreciated:
my #records = (
{'id' => 1, 'firstName' => 'Jim'},
{'id' => 2, 'firstName' => 'Joe'}
);
my #records2 = ();
$numberOfRecords = scalar(#records);
print "number of records: " . $numberOfRecords . "\n";
for(my $index=0; $index < $numberOfRecords; $index++) {
#works
print 'you can print the records like this: ' . $records[$index]{'firstName'} . "\n";
#does NOT work
%row = $records[$index];
print 'but not like this: ' . $row{'firstName'} . "\n";
}
The nested data structure contains a hash reference, not a hash.
# Will work (the -> dereferences the reference)
$row = $records[$index];
print "This will work: ", $row->{firstName}, "\n";
# This will also work, by promoting the hash reference into a hash
%row = %{ $records[$index] };
print "This will work: ", $row{firstName}, "\n";
If you're ever presented with a deep Perl data structure, you may profit from printing it using Data::Dumper to print it into human-readable (and Perl-parsable) form.
The array of hashes doesn't actually contain hashes, but rather an references to a hash.
This line:
%row = $records[$index];
assigns %row with one entry. The key is the scalar:
{'id' => 1, 'firstName' => 'Jim'},
Which is a reference to the hash, while the value is blank.
What you really want to do is this:
$row = $records[$index];
$row->{'firstName'};
or else:
$row = %{$records[$index];}
$row{'firstName'};
Others have commented on hashes vs hashrefs. One other thing that I feel should be mentioned is your DBQuery function - it seems you're trying to do something that's already built into the DBI? If I understand your question correctly, you're trying to replicate something like selectall_arrayref:
This utility method combines "prepare", "execute" and "fetchall_arrayref" into a single call. It returns a reference to an array containing a reference to an array (or hash, see below) for each row of data fetched.
To add to the lovely answers above, let me add that you should always, always, always (yes, three "always"es) use "use warnings" at the top of your code. If you had done so, you would have gotten the warning "Reference found where even-sized list expected at -e line 1."
what you actually have in your array is a hashref, not a hash. If you don't understand this concept, its probably worth reading the perlref documentation.
to get the hash you need to do
my %hash = %{#records[$index]};
Eg.
my #records = (
{'id' => 1, 'firstName' => 'Jim'},
{'id' => 2, 'firstName' => 'Joe'}
);
my %hash = %{$records[1]};
print $hash{id}."\n";
Although. I'm not sure why you would want to do this, unless its for academic purposes. Otherwise, I'd recommend either fetchall_hashref/fetchall_arrayref in the DBI module or using something like Class::DBI.
Also note a good perl idiom to use is
for my $rowHR ( #records ) {
my %row = %$rowHR;
#or whatever...
}
to iterate through the list.

Resources