Shorthand to modify value in a array of hash refs - arrays

I have a array of hash refs. The date field in a hash is stored in epoch. I have to format it to human readable before returning the array. Following is my code:
for my $post (#sorted) {
$post->{date} = format_time($post->{date});
push #formatted, $post;
}
I have tried
my #formatted = map {$_{date} = format_time($_{date})} #sorted;
All fields except {date} are dropped.
Is there any smarter method?
Thanks

$_->{date} = format_time($_->{date}) for #sorted.
Then the dates in #sorted will have been converted.

There's nothing really wrong with the for loop you're currently using. The map can work too, but there are two problems:
The hashref in the array is stored in the scalar $_. You are accessing the hash %_.
The return value of the block is what will end up in the result array. In your case, that's the result of the assignment rather than the entire hashref.
Also, do note that the hashrefs in #sorted will be modified. The following map statement should work for you:
my #formatted = map { $_->{date} = format_time($_->{date}); $_ } #sorted;

If you really want:
sub format_time_in_place {
my $time = $_[0];
# do work
$_[0] = $reformatted_time;
}
# elsewhere
format_time_in_place($_->{date}) for #sorted;
I helpfully renamed the function to reduce the odds of the maintenance programmer being tempted to become a homicidal axe murderer. There still may be an element of shock if said programmer was not aware that you can change passed in arguments with the correct manipulation of #_.

This is equivalent to your code:
$_->{date} = format_time($_->{date}) for #sorted;
#formatted = #sorted;
I don't know why you want two identical arrays, but I don't see the point of combining those two unrelated operations. It'll just make your code less readable.

If you want or don't mind not referencing the same hashes as are in #sorted, you can:
my #formatted = map +{ %$_, 'date' => format_time($_->{date}) }, #sorted;

Related

how to replace values in an array of hashes properly in Perl?

As seen below, I have a foreach loop inside which, a value inside an array of hashes is being replaced with a value from another array of hashes.
The second foreach loop is just to print and test whether the values got assigned correctly.
foreach my $row (0 .. $#row_buff) {
$row_buff[$row]{'offset'} = $vars[$row]{'expression'};
print $row_buff[$row]{'offset'},"\n";
}
foreach (0 .. $#row_buff) {
print $row_buff[$_]{'offset'},"\n";
}
Here #row_buff and #vars are the two array of hashes. They are prefilled with values for all keys used.
The hashes were pushed into the arrays like so:
push #row_buff, \%hash;
ISSUE:
Let's say the print statement in the first foreach print's like this:
string_a
string_b
string_c
string_d
Then the print statement in the second foreach loop print's like so:
string_d
string_d
string_d
string_d
This is what confuses me. Both print statements are supposed to print the exact same way am I right? But the value that gets printed by the second print statement is just the last value alone in a repeated manner. Could someone please point me to what could be going wrong here? Any hint is greatly appreciated. This is my first time putting up a question so pardon me if I missed anything.
UPDATE
There was a bit of information that I could have added, sorry about that everyone. There was one more line before the foreach, it was like so:
#row_buff = (#row_buff) x $itercnt;
foreach my $row (0 .. $#row_buff) {
$row_buff[$row]{'offset'} = $vars[$row]{'expression'};
print $row_buff[$row]{'offset'},"\n";
}
foreach (0 .. $#row_buff) {
print $row_buff[$_]{'offset'},"\n";
}
$itercnt is an integer. I was using it to replicate the #row_buff that many number of times.
This clearly has to do with storing references on the array, instead of independent data. How that comes about isn't clear since details aren't given, but the following discussion should help.
Consider these two basic examples.
First, place a hash (reference) on an array, first changing a value each time
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd);
# use Storable qw(dclone);
my %h = ( a => 1, b => 2 );
my #ary_w_refs;
for my $i (1..3) {
$h{a} = $i;
push #ary_w_refs, \%h; # almost certainly WRONG
# push #ary_w_refs, { %h }; # *copy* data
# push #ary_w_refs, dclone \%h; # may be necessary, or just safer
}
dd $_ for #ary_w_refs;
I use Data::Dump for displaying complex data structures, for its simplicity and default compact output. There are other modules for this purpose, Data::Dumper being in the core (installed).
The above prints
{ a => 3, b => 2 }
{ a => 3, b => 2 }
{ a => 3, b => 2 }
See how that value for key a, that we changed in the hash each time, and so supposedly set for each array element, to a different value (1, 2, 3) -- is the same in the end, and equal to the one we assigned last? (This appears to be the case in the question.)
This is because we assigned a reference to the hash %h to each element, so even though every time through the loop we first change the value in the hash for that key in the end it's just the reference there, at each element, to that same hash.∗
So when the array is queried after the loop we can only get what is in the hash (at key a it's the last assigned number, 3). The array doesn't have its own data, only a pointer to hash's data.† (Thus hash's data can be changed by writing to the array as well, as seen in the example below.)
Most of the time, we want a separate, independent copy. Solution? Copy the data.
Naively, instead of
push #ary_w_refs, \%h;
we can do
push #ary_w_refs, { %h };
Here {} is a constructor for an anonymous hash,‡ so %h inside gets copied. So actual data gets into the array and all is well? In this case, yes, where hash values are plain strings/numbers.
But what when the hash values themselves are references? Then those references get copied, and #ary_w_refs again does not have its own data! We'll have the exact same problem. (Try the above with the hash being ( a => [1..10] ))
If we have a complex data structure, carrying references for values, we need a deep copy. One good way to do that is to use a library, and Storable with its dclone is very good
use Storable qw(dclone);
...
push #ary_w_refs, dclone \%h;
Now array elements have their own data, unrelated (but at the time of copy equal) to %h.
This is a good thing to do with a simple hash/array as well, to be safe from future changes, whereby the hash is changed but we forget about the places where it's copied (or the hash and its copies don't even know about each other).
Another example. Let's populate an array with a hashref, and then copy it to another array
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd pp);
my %h = ( a => 1, b => 2 );
my #ary_src = \%h;
say "Source array: ", pp \#ary_src;
my #ary_tgt = $ary_src[0];
say "Target array: ", pp \#ary_tgt;
$h{a} = 10;
say "Target array: ", pp(\#ary_tgt), " (after hash change)";
$ary_src[0]{b} = 20;
say "Target array: ", pp(\#ary_tgt), " (after hash change)";
$ary_tgt[0]{a} = 100;
dd \%h;
(For simplicity I use arrays with only one element.)
This prints
Source array: [{ a => 1, b => 2 }]
Target array: [{ a => 1, b => 2 }]
Target array: [{ a => 10, b => 2 }] (after hash change)
Target array: [{ a => 10, b => 20 }] (after hash change)
{ a => 100, b => 20 }
That "target" array, which supposedly was merely copied off of a source array, changes when the distant hash changes! And when its source array changes. Again, it is because a reference to the hash gets copied, first to one array and then to the other.
In order to get independent data copies, again, copy the data, each time. I'd again advise to be on the safe side and use Storable::dclone (or an equivalent library of course), even with simple hashes and arrays.
Finally, note a slightly sinister last case -- writing to that array changes the hash! This (second-copied) array may be far removed from the hash, in a function (in another module) that the hash doesn't even know of. This kind of an error can be a source of really hidden bugs.
Now if you clarify where references get copied, with a more complete (simple) representation of your problem, we can offer a more specific remedy.
∗ An important way of using a reference that is correct, and which is often used, is when the structure taken the reference of is declared as a lexical variable every time through
for my $elem (#data) {
my %h = ...
...
push #results, \%h; # all good
}
That lexical %h is introduced anew every time so the data for its reference on the array is retained, as the array persists beyond the loop, independently for each element.
It is also more efficient doing it this way since the data in %h isn't copied, like it is with { %h }, but is just "re-purposed," so to say, from the lexical %h that gets destroyed at the end of iteration to the reference in the array.
This of course may not always be suitable, if a structure to be copied naturally lives outside of the loop. Then use a deep copy of it.
The same kind of a mechanism works in a function call
sub some_func {
...
my %h = ...
...
return \%h; # good
}
my $hashref = some_func();
Again, the lexical %h goes out of scope as the function returns and it doesn't exist any more, but the data it carried and a reference to it is preserved, since it is returned and assigned so its refcount is non-zero. (At least returned to the caller, that is; it could've been passed yet elsewhere during the sub's execution so we may still have a mess with multiple actors working with the same reference.) So $hashref has a reference to data that had been created in the sub.
Recall that if a function was passed a reference, when it was called or during its execution (by calling yet other subs which return references), changed and returned it, then again we have data changed in some caller, potentially far removed from this part of program flow.
This is done often of course, with larger pools of data which can't just be copied around all the time, but then one need be careful and organize code (to be as modular as possible, for one) so to minimize chance of errors.
† This is a loose use of the word "pointer," for what a reference does, but if one were to refer to C I'd say that it's a bit of a "dressed" C-pointer
‡ In a different context it can be a block

Perl unit testing: check if a string is an array

I have this function that I want to test:
use constant NEXT => 'next';
use constant BACK => 'back';
sub getStringIDs {
return [
NEXT,
BACK
];
}
I've tried to write the following test, but it fails:
subtest 'check if it contains BACK' => sub {
use constant BACK => 'back';
my $strings = $magicObject->getStringIDs();
ok($strings =~ /BACK/);
}
What am I doing wrong?
Your getStringIDs() method returns an array reference.
The regex binding operator (=~) expects a string on its left-hand side. So it converts your array reference to a string. And a stringified array reference will look something like ARRAY(0x1ff4a68). It doesn't give you any of the contents of the array.
You can get from your array reference ($strings) to an array by dereferencing it (#$strings). And you can stringify an array by putting it in double quotes ("#$strings").
So you could do something like this:
ok("#$strings" =~ /BACK/);
But I suspect, you want word boundary markers in there:
ok("#$strings" =~ /\bBACK\b/);
And you might also prefer the like() testing function.
like("#$strings", qr[\bBACK\b], 'Strings array contains BACK');
Update: Another alternative is to use grep to check that one of your array elements is the string "BACK".
# Note: grep in scalar context returns the number of elements
# for which the block evaluated as 'true'. If we don't care how
# many elements are "BACK", we can just check that return value
# for truth with ok(). If we care that it's exactly 1, we should
# use is(..., 1) instead.
ok(grep { $_ eq 'BACK' } #$strings, 'Strings array contains BACK');
Update 2: Hmm... the fact that you're using constants here complicates this. Constants are subroutines and regexes are strings and subroutines aren't interpolated in strings.
The return value of $magicObject->getStringIDs is an array reference, not a string. It looks like the spirit of your test is that you want to check if at least one element in the array pattern matches BACK. The way to do this is to grep through the dereferenced array and check if there are a non-zero number of matches.
ok( grep(/BACK/,#$strings) != 0, 'contains BACK' );
At one time, the smartmatch operator promised to be a solution to this problem ...
ok( $strings ~~ /BACK/ )
but it has fallen into disrepute and should be used with caution (and the no warnings 'experimental::smartmatch' pragma).
The in operator is your friend.
use Test::More;
use syntax 'in';
use constant NEXT => 'next';
use constant BACK => 'back';
ok BACK |in| [NEXT, BACK], 'BACK is in the arrayref';
done_testing;

How to use an array reference to pairs of elements

I am considering this answer which uses a single array reference of points, where a point is a reference to a two-element array.
My original code of the question (function extract-crossing) uses two separate arrays $x and $y here which I call like this:
my #x = #{ $_[0] }; my #y = #{ $_[1] };
...
return extract_crossing(\#x, \#y);
The new code below based on the answer takes (x, y) and returns a single datatype, here x intercept points:
use strict; use warnings;
use Math::Geometry::Planar qw(SegmentLineIntersection);
use Test::Exception;
sub x_intercepts {
my ($points) = #_;
die 'Must pass at least 2 points' unless #$points >= 2;
my #intercepts;
my #x_axis = ( [0, 0], [1, 0] );
foreach my $i (0 .. $#$points - 1) {
my $intersect = SegmentLineIntersection([#$points[$i,$i+1],#x_axis]);
push #intercepts, $intersect if $intersect;
}
return \#intercepts;
}
which I try to call like this:
my #x = #{ $_[0] }; my #y = #{ $_[1] };
...
my $masi = x_intercepts(\#x);
return $masi;
However, the code does not make sense.
I am confused about passing the "double array" to the x_intercepts() function.
How can you make the example code clearer to the original setting?
If I am understanding the problem here, #ThisSuitIsBlackNot++ has written a function (x_intercepts which is available in the thread: Function to extract intersections of crossing lines on axes) that expects its argument to be a reference to a list of array references. The x_intercepts subroutine in turn uses a function from Math::Geometry::Planar which expects the points of a line segment to be passed as series of array references/anonymous arrays that contain the x,y values for each point.
Again - it is not entirely clear - but it seems your data is in two different arrays: one containing all the x values and one with the corresponding y values. Is this the case? If this is not correct please leave a comment and I will remove this answer.
If that is the source of your problem then you can "munge" or transform your data before you pass it to x_intercepts or - as #ThisSuitIsBlackNot suggests - you can rewrite the function. Here is an example of munging your existing data into an "#input_list" to pass to x_intercepts.
my #xs = qw/-1 1 3/;
my #ys = qw/-1 1 -1 /;
my #input_list ;
foreach my $i ( 0..$#ys ) {
push #input_list, [ $xs[$i], $ys[$i] ] ;
}
my $intercept_list = x_intercepts(\#input_list) ;
say join ",", #$_ for #$intercept_list ;
Adding the lines above to your script produces:
Output:
0,0
2,0
You have to be very careful doing this kind of thing and using tests to make sure you are passing the correctly transformed data in an expected way is a good idea.
I think a more general difficulty is that until you are familiar with perl it is sometimes tricky to easily see what sorts of values a subroutine is expecting, where they end up after they are passed in, and how to access them.
A solid grasp of perl data structures can help with that - for example I think what you are calling a "double array" or "double element" here is an "array of arrays". There are ways to make it easier to see where default arguments passed to a subroutine (in #_) are going (notice how #ThisSuitIsBlackNot has passed them to a nicely named array reference: "($points)"). Copious re-reading of perldocperbsub can help things seem more obvious.
References are key to understanding perl subroutines since to pass an array or hash to a subrouting you need to do so by references. If the argument passed x_intercepts is a list of two lists of anonymous arrays then when it is assigned to ($points), #$points->[0] #$points->[1] will be the arrays contain those lists.
I hope this helps and is not too basic (or incorrect). If #ThisSuitIsBlackNot finds the time to provide an answer you should accept it: some very useful examples have been provided.

Perl How to access a hash that is the element of an array that is the value of another hash?

I am trying to create a Hash that has as its value an array.
The first element of the value(which is an array) is a scalar.
The second element of the value(which is an array) is another hash.
I have put values in the key and value of this hash as follows :
${${$senseInformationHash{$sense}[1]}{$word}}++;
Here,
My main hash -> senseInformationHash
My Value -> Is an Array
So, ${$senseInformationHash{$sense}[1]} gives me reference to my hash
and I put in key and value as follows :
${${$senseInformationHash{$sense}[1]}{$word}}++;
I am not sure if this is a correct way to do it. Since I am stuck and not sure how I can print this complex thing out. I want to print it out in order to check if I am doing it correctly.
Any help will be very much appreciated. Thanks in advance!
Just write
$sense_information_hash{$sense}[1]{$word}++;
and be done with it.
Perl gets jealous of CamelCase, you know, so you should use proper underscores. Otherwise it can spit and buck and generally misbehave.
A hash value is never an array, it is an array reference.
To see if you are doing it right, you can dump out the whole structure:
my %senseInformationHash;
my $sense = 'abc';
my $word = '123';
${${$senseInformationHash{$sense}[1]}{$word}}++;
use Data::Dumper;
print Dumper( \%senseInformationHash );
which gets you:
$VAR1 = {
'abc' => [
undef,
{
'123' => \1
}
]
};
Note the \1: presumably you want the value to be 1, not a reference to the scalar 1. You are getting the latter because your ${ ... }++; says treat what's in the curly braces as a scalar reference and increment the scalar referred to.
${$senseInformationHash{$sense}[1]}{$word}++; does what you want, as does $senseInformationHash{$sense}[1]{$word}++. You may find http://perlmonks.org/?node=References+quick+reference helpful in seeing why.
Thanks Axeman and TChrist.
The code I have to access it is as follows :
foreach my $outerKey (keys(%sense_information_hash))
{
print "\nKey => $outerKey\n";
print " Count(sense) => $sense_information_hash{$outerKey}[0]\n";
foreach(keys (%{$sense_information_hash{$outerKey}[1]}) )
{
print " Word wt sense => $_\n";
print " Count => $sense_information_hash{$outerKey}[1]{$_}\n";
}
}
This is working now. Thanks much!

In Perl, how do I create a hash whose keys come from a given array?

Let's say I have an array, and I know I'm going to be doing a lot of "Does the array contain X?" checks. The efficient way to do this is to turn that array into a hash, where the keys are the array's elements, and then you can just say if($hash{X}) { ... }
Is there an easy way to do this array-to-hash conversion? Ideally, it should be versatile enough to take an anonymous array and return an anonymous hash.
%hash = map { $_ => 1 } #array;
It's not as short as the "#hash{#array} = ..." solutions, but those ones require the hash and array to already be defined somewhere else, whereas this one can take an anonymous array and return an anonymous hash.
What this does is take each element in the array and pair it up with a "1". When this list of (key, 1, key, 1, key 1) pairs get assigned to a hash, the odd-numbered ones become the hash's keys, and the even-numbered ones become the respective values.
#hash{#array} = (1) x #array;
It's a hash slice, a list of values from the hash, so it gets the list-y # in front.
From the docs:
If you're confused about why you use
an '#' there on a hash slice instead
of a '%', think of it like this. The
type of bracket (square or curly)
governs whether it's an array or a
hash being looked at. On the other
hand, the leading symbol ('$' or '#')
on the array or hash indicates whether
you are getting back a singular value
(a scalar) or a plural one (a list).
#hash{#keys} = undef;
The syntax here where you are referring to the hash with an # is a hash slice. We're basically saying $hash{$keys[0]} AND $hash{$keys[1]} AND $hash{$keys[2]} ... is a list on the left hand side of the =, an lvalue, and we're assigning to that list, which actually goes into the hash and sets the values for all the named keys. In this case, I only specified one value, so that value goes into $hash{$keys[0]}, and the other hash entries all auto-vivify (come to life) with undefined values. [My original suggestion here was set the expression = 1, which would've set that one key to 1 and the others to undef. I changed it for consistency, but as we'll see below, the exact values do not matter.]
When you realize that the lvalue, the expression on the left hand side of the =, is a list built out of the hash, then it'll start to make some sense why we're using that #. [Except I think this will change in Perl 6.]
The idea here is that you are using the hash as a set. What matters is not the value I am assigning; it's just the existence of the keys. So what you want to do is not something like:
if ($hash{$key} == 1) # then key is in the hash
instead:
if (exists $hash{$key}) # then key is in the set
It's actually more efficient to just run an exists check than to bother with the value in the hash, although to me the important thing here is just the concept that you are representing a set just with the keys of the hash. Also, somebody pointed out that by using undef as the value here, we will consume less storage space than we would assigning a value. (And also generate less confusion, as the value does not matter, and my solution would assign a value only to the first element in the hash and leave the others undef, and some other solutions are turning cartwheels to build an array of values to go into the hash; completely wasted effort).
Note that if typing if ( exists $hash{ key } ) isn’t too much work for you (which I prefer to use since the matter of interest is really the presence of a key rather than the truthiness of its value), then you can use the short and sweet
#hash{#key} = ();
I always thought that
foreach my $item (#array) { $hash{$item} = 1 }
was at least nice and readable / maintainable.
There is a presupposition here, that the most efficient way to do a lot of "Does the array contain X?" checks is to convert the array to a hash. Efficiency depends on the scarce resource, often time but sometimes space and sometimes programmer effort. You are at least doubling the memory consumed by keeping a list and a hash of the list around simultaneously. Plus you're writing more original code that you'll need to test, document, etc.
As an alternative, look at the List::MoreUtils module, specifically the functions any(), none(), true() and false(). They all take a block as the conditional and a list as the argument, similar to map() and grep():
print "At least one value undefined" if any { !defined($_) } #list;
I ran a quick test, loading in half of /usr/share/dict/words to an array (25000 words), then looking for eleven words selected from across the whole dictionary (every 5000th word) in the array, using both the array-to-hash method and the any() function from List::MoreUtils.
On Perl 5.8.8 built from source, the array-to-hash method runs almost 1100x faster than the any() method (1300x faster under Ubuntu 6.06's packaged Perl 5.8.7.)
That's not the full story however - the array-to-hash conversion takes about 0.04 seconds which in this case kills the time efficiency of array-to-hash method to 1.5x-2x faster than the any() method. Still good, but not nearly as stellar.
My gut feeling is that the array-to-hash method is going to beat any() in most cases, but I'd feel a whole lot better if I had some more solid metrics (lots of test cases, decent statistical analyses, maybe some big-O algorithmic analysis of each method, etc.) Depending on your needs, List::MoreUtils may be a better solution; it's certainly more flexible and requires less coding. Remember, premature optimization is a sin... :)
In perl 5.10, there's the close-to-magic ~~ operator:
sub invite_in {
my $vampires = [ qw(Angel Darla Spike Drusilla) ];
return ($_[0] ~~ $vampires) ? 0 : 1 ;
}
See here: http://dev.perl.org/perl5/news/2007/perl-5.10.0.html
Also worth noting for completeness, my usual method for doing this with 2 same-length arrays #keys and #vals which you would prefer were a hash...
my %hash = map { $keys[$_] => $vals[$_] } (0..#keys-1);
Raldi's solution can be tightened up to this (the '=>' from the original is not necessary):
my %hash = map { $_,1 } #array;
This technique can also be used for turning text lists into hashes:
my %hash = map { $_,1 } split(",",$line)
Additionally if you have a line of values like this: "foo=1,bar=2,baz=3" you can do this:
my %hash = map { split("=",$_) } split(",",$line);
[EDIT to include]
Another solution offered (which takes two lines) is:
my %hash;
#The values in %hash can only be accessed by doing exists($hash{$key})
#The assignment only works with '= undef;' and will not work properly with '= 1;'
#if you do '= 1;' only the hash key of $array[0] will be set to 1;
#hash{#array} = undef;
You could also use Perl6::Junction.
use Perl6::Junction qw'any';
my #arr = ( 1, 2, 3 );
if( any(#arr) == 1 ){ ... }
If you do a lot of set theoretic operations - you can also use Set::Scalar or similar module. Then $s = Set::Scalar->new( #array ) will build the Set for you - and you can query it with: $s->contains($m).
You can place the code into a subroutine, if you don't want pollute your namespace.
my $hash_ref =
sub{
my %hash;
#hash{ #{[ qw'one two three' ]} } = undef;
return \%hash;
}->();
Or even better:
sub keylist(#){
my %hash;
#hash{#_} = undef;
return \%hash;
}
my $hash_ref = keylist qw'one two three';
# or
my #key_list = qw'one two three';
my $hash_ref = keylist #key_list;
If you really wanted to pass an array reference:
sub keylist(\#){
my %hash;
#hash{ #{$_[0]} } = undef if #_;
return \%hash;
}
my #key_list = qw'one two three';
my $hash_ref = keylist #key_list;
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my #a = qw(5 8 2 5 4 8 9);
my #b = qw(7 6 5 4 3 2 1);
my $h = {};
#{$h}{#a} = #b;
print Dumper($h);
gives (note repeated keys get the value at the greatest position in the array - ie 8->2 and not 6)
$VAR1 = {
'8' => '2',
'4' => '3',
'9' => '1',
'2' => '5',
'5' => '4'
};
You might also want to check out Tie::IxHash, which implements ordered associative arrays. That would allow you to do both types of lookups (hash and index) on one copy of your data.

Resources