How to replace values in an array of hashes properly in Perl?

As seen below, I have a foreach loop in which a value inside an array of hashes is replaced with a value from another array of hashes.
The second foreach loop only prints the values, to test whether they were assigned correctly.
foreach my $row (0 .. $#row_buff) {
    $row_buff[$row]{'offset'} = $vars[$row]{'expression'};
    print $row_buff[$row]{'offset'},"\n";
}
foreach (0 .. $#row_buff) {
    print $row_buff[$_]{'offset'},"\n";
}
Here @row_buff and @vars are the two arrays of hashes. They are prefilled with values for all keys used.
The hashes were pushed into the arrays like so:
push @row_buff, \%hash;
ISSUE:
Let's say the print statement in the first foreach prints this:
string_a
string_b
string_c
string_d
Then the print statement in the second foreach loop prints this:
string_d
string_d
string_d
string_d
This is what confuses me. Both print statements should print exactly the same thing, am I right? But the second print statement prints only the last value, repeated. Could someone please point me to what could be going wrong here? Any hint is greatly appreciated. This is my first time posting a question, so pardon me if I missed anything.
UPDATE
I left out a bit of information, sorry about that everyone. There was one more line before the foreach loops:
@row_buff = (@row_buff) x $itercnt;
foreach my $row (0 .. $#row_buff) {
    $row_buff[$row]{'offset'} = $vars[$row]{'expression'};
    print $row_buff[$row]{'offset'},"\n";
}
foreach (0 .. $#row_buff) {
    print $row_buff[$_]{'offset'},"\n";
}
$itercnt is an integer. I was using it to replicate @row_buff that many times.

This clearly has to do with storing references in the array instead of independent data. How that comes about isn't clear since details aren't given, but the following discussion should help.
Consider these two basic examples.
First, push a hash (reference) onto an array in a loop, changing one of its values each time:
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd);
# use Storable qw(dclone);
my %h = ( a => 1, b => 2 );
my @ary_w_refs;
for my $i (1..3) {
    $h{a} = $i;
    push @ary_w_refs, \%h;            # almost certainly WRONG
    # push @ary_w_refs, { %h };       # *copy* data
    # push @ary_w_refs, dclone \%h;   # may be necessary, or just safer
}
dd $_ for @ary_w_refs;
I use Data::Dump for displaying complex data structures, for its simplicity and compact default output. There are other modules for this purpose; Data::Dumper is the one in core (installed with Perl).
The above prints
{ a => 3, b => 2 }
{ a => 3, b => 2 }
{ a => 3, b => 2 }
See how the value for key a, which we changed in the hash each time and so supposedly set to a different value (1, 2, 3) for each array element, is the same for all elements in the end, equal to the one we assigned last? (This appears to be the case in the question.)
This is because we assigned a reference to the hash %h to each element. Even though every time through the loop we first change the value for that key in the hash, in the end each element holds just a reference to that same hash.∗
So when the array is queried after the loop we can only get what is in the hash (at key a, the last assigned number, 3). The array doesn't have its own data, only a pointer to the hash's data.† (Thus the hash's data can be changed by writing to the array as well, as seen in the example below.)
Most of the time, we want a separate, independent copy. Solution? Copy the data.
Naively, instead of
push @ary_w_refs, \%h;
we can do
push @ary_w_refs, { %h };
Here {} is a constructor for an anonymous hash,‡ so %h inside gets copied. So actual data gets into the array and all is well? In this case, yes, where hash values are plain strings/numbers.
But what if the hash values are themselves references? Then those references get copied, and @ary_w_refs again does not have its own data! We'll have the exact same problem. (Try the above with the hash being ( a => [1..10] ).)
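Here is a minimal sketch of that experiment, assuming a hash whose value for a is an array reference (the data is made up for illustration). The shallow { %h } copy keeps its own value for b, but it shares the inner array with %h.

```perl
use strict;
use warnings;

# A hash whose value for key 'a' is itself a reference (to an array)
my %h = ( a => [1 .. 3], b => 2 );

my @ary_w_refs;
push @ary_w_refs, { %h };   # shallow copy: only the top level is copied

# Change the nested array and a top-level value through the original hash
push @{ $h{a} }, 4;
$h{b} = 20;

# The copy sees the nested change (the arrayref is shared),
# but not the top-level change (b was copied as a plain value)
print "copy a: @{ $ary_w_refs[0]{a} }\n";   # copy a: 1 2 3 4
print "copy b: $ary_w_refs[0]{b}\n";        # copy b: 2
```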
If we have a complex data structure with references for values, we need a deep copy. One good way to do that is to use a library, and Storable with its dclone is very good:
use Storable qw(dclone);
...
push @ary_w_refs, dclone \%h;
Now array elements have their own data, unrelated (but at the time of copy equal) to %h.
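A small sketch of the deep-copy fix, using a made-up hash with a nested array. After the loop, a later change to the array inside %h does not reach any of the copies.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %h = ( a => [1 .. 3], b => 2 );

my @ary_w_refs;
for my $i (1 .. 3) {
    $h{b} = $i;
    push @ary_w_refs, dclone \%h;   # deep copy: the nested array is copied too
}

push @{ $h{a} }, 99;   # a later change to the original's nested data

for my $elem (@ary_w_refs) {
    print "b => $elem->{b}, a => [@{ $elem->{a} }]\n";
}
# b => 1, a => [1 2 3]
# b => 2, a => [1 2 3]
# b => 3, a => [1 2 3]
```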
This is a good thing to do with a simple hash/array as well, to be safe from future changes, whereby the hash is changed but we forget about the places where it's copied (or the hash and its copies don't even know about each other).
Another example. Let's populate an array with a hashref, and then copy it to another array
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd pp);
my %h = ( a => 1, b => 2 );
my @ary_src = \%h;
say "Source array: ", pp \@ary_src;
my @ary_tgt = $ary_src[0];
say "Target array: ", pp \@ary_tgt;
$h{a} = 10;
say "Target array: ", pp(\@ary_tgt), " (after hash change)";
$ary_src[0]{b} = 20;
say "Target array: ", pp(\@ary_tgt), " (after hash change)";
$ary_tgt[0]{a} = 100;
dd \%h;
(For simplicity I use arrays with only one element.)
This prints
Source array: [{ a => 1, b => 2 }]
Target array: [{ a => 1, b => 2 }]
Target array: [{ a => 10, b => 2 }] (after hash change)
Target array: [{ a => 10, b => 20 }] (after hash change)
{ a => 100, b => 20 }
That "target" array, which was supposedly just copied from a source array, changes when the distant hash changes! And when its source array changes. Again, this is because a reference to the hash gets copied, first to one array and then to the other.
To get independent copies of the data, again, copy the data, each time. I'd again advise erring on the safe side and using Storable::dclone (or an equivalent library, of course), even with simple hashes and arrays.
Finally, note a slightly sinister last case -- writing to that array changes the hash! This (second-copied) array may be far removed from the hash, in a function (in another module) that the hash doesn't even know of. This kind of an error can be a source of really hidden bugs.
Now if you clarify where references get copied, with a more complete (simple) representation of your problem, we can offer a more specific remedy.
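One place where references get copied in the question is the update's line @row_buff = (@row_buff) x $itercnt;. The list repetition operator repeats the references, not the hashes behind them. Here is a minimal sketch with made-up data (two hashes, $itercnt of 2) that reproduces the effect, followed by a version that copies each hash while repeating.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the question's @row_buff
my @row_buff = ( { offset => 'x' }, { offset => 'y' } );
my $itercnt  = 2;

# List repetition copies the *references*: slots 0 and 2 hold the same
# hash, as do slots 1 and 3 (and both are the hashes in @row_buff).
my @shared = (@row_buff) x $itercnt;
$shared[$_]{offset} = "v$_" for 0 .. $#shared;
print join(' ', map { $_->{offset} } @shared), "\n";   # v2 v3 v2 v3

# Copying each hash while repeating gives independent elements
# (a shallow copy; use Storable's dclone if the hashes hold references).
my @copied = map { +{ %$_ } } ((@row_buff) x $itercnt);
$copied[$_]{offset} = "v$_" for 0 .. $#copied;
print join(' ', map { $_->{offset} } @copied), "\n";   # v0 v1 v2 v3
```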
∗ A correct and frequently used way of storing references is when the structure whose reference is taken is declared as a lexical variable anew every time through the loop:
for my $elem (@data) {
    my %h = ...
    ...
    push @results, \%h;   # all good
}
That lexical %h is created anew on every iteration, so each array element refers to its own hash, and that data is retained after the loop because the array still refers to it.
It is also more efficient this way, since the data in %h isn't copied (as it is with { %h }) but is just "re-purposed," so to say, from the lexical %h that goes out of scope at the end of the iteration to the reference in the array.
This of course may not always be suitable, if a structure to be copied naturally lives outside of the loop. Then use a deep copy of it.
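A minimal runnable version of the loop above, with made-up input in @data. Each iteration creates a new %h, so the stored references all differ.

```perl
use strict;
use warnings;

my @data = qw(apple banana cherry);   # sample input (made up)
my @results;

for my $elem (@data) {
    my %h = ( name => $elem, len => length $elem );   # a new hash each iteration
    push @results, \%h;                               # so each reference is distinct
}

print "$_->{name}: $_->{len}\n" for @results;
# apple: 5
# banana: 6
# cherry: 6
```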
The same mechanism works when returning from a function:
sub some_func {
...
my %h = ...
...
return \%h; # good
}
my $hashref = some_func();
Again, the lexical %h goes out of scope as the function returns, but the data it carried is preserved, since a reference to it is returned and assigned, so its reference count is non-zero. (At least it was returned to the caller; it could also have been passed elsewhere during the sub's execution, so we may still have a mess with multiple actors working with the same reference.) So $hashref refers to data that was created in the sub.
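A minimal sketch of the function case; make_counter is a hypothetical name. Each call creates a new lexical %h, so the two returned references are independent.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a fresh hash on every call
sub make_counter {
    my ($name) = @_;
    my %h = ( name => $name, count => 0 );
    return \%h;   # %h goes out of scope, but its data lives on via the reference
}

my $c1 = make_counter('first');
my $c2 = make_counter('second');
$c1->{count}++;

print "$c1->{name}=$c1->{count} $c2->{name}=$c2->{count}\n";   # first=1 second=0
```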
Recall that if a function was passed a reference, when it was called or during its execution (by calling yet other subs which return references), changed and returned it, then again we have data changed in some caller, potentially far removed from this part of program flow.
This is done often, of course, with larger pools of data which can't just be copied around all the time, but then one needs to be careful and organize code (as modular as possible, for one) so as to minimize the chance of errors.
† This is a loose use of the word "pointer," for what a reference does, but if one were to refer to C I'd say that it's a bit of a "dressed" C-pointer
‡ In a different context it can be a block

Related

2D Array Printing as Reference

I have the code similar to below:
my @array1 = (); # 2d array to be used
my $string1 = "blank1";
my $string2 = "blank2";
my $string3 = "blank3";
my @temp = ($string1, $string2, $string3);
push (@array1, \@temp);
The reason I am assigning the strings and then putting them into an array is that they are in a loop and the values get updated in the loop (@array1 is not declared in the loop).
When I run my program, it only gives me a reference to an array rather than an actual 2D array. How can I get it to print out the content as a 2D array and not as a reference or flattened out to a 1D array?
I would like an output like [[blank1, blank2, blank3],....] so i can access it like $array1[i][j]
An array can only have scalars for elements. References are scalars, though (references to arrays, for example), and storing them is what enables us to build complex data structures. See perldsc, Tom's Perl Data Structure Cookbook.
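A sketch of how the question's loop could build such a structure, assuming the three strings are recomputed on each iteration (the _$i suffixes stand in for whatever the real loop produces). The anonymous array constructor [ ... ] makes a new array each time, so rows don't share data, and the structure is printed row by row instead of as references.

```perl
use strict;
use warnings;

my @array1;   # top-level array; each element will be an array reference

for my $i (1 .. 2) {
    my $string1 = "blank1_$i";
    my $string2 = "blank2_$i";
    my $string3 = "blank3_$i";
    # [ ... ] builds a new anonymous array each time, so rows are independent
    push @array1, [ $string1, $string2, $string3 ];
}

print $array1[1][2], "\n";   # blank3_2  (same as $array1[1]->[2])

# Print the whole structure, one row per line
print join(', ', @$_), "\n" for @array1;
# blank1_1, blank2_1, blank3_1
# blank1_2, blank2_2, blank3_2
```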
Elements of those ("second-level") arrays are accessed by dereferencing, so $array1[0]->[1] is the second element of the array whose reference is the first element of the top-level array (@array1). For convenience, a simpler syntax is allowed as well: $array1[0][1].
If we want a list of all elements of a second-level array, then we dereference it with @, like:
my @l2 = @{ $array1[0] };   # or, using
my @l2 = $array1[0]->@*;    # postfix dereferencing
Or, to get just a few elements of the array in one scoop, take a slice:
my @l2_slice = @{$array1[0]}[1..2];   # or
my @l2_slice = $array1[0]->@[1..2];   # postfix reference slice
which returns the list with the second and third elements of the same second-level array.
The second lines use a newer syntax called postfix dereferencing, stable as of v5.24. It lets us get elements with the same left-to-right logic we use when drilling down for a single one. So ->@* gets a list of all elements from an arrayref, ->%* from a hashref (etc). See for instance a perl.com article and an Effective Perler article.
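A short side-by-side comparison of the two syntaxes, assuming Perl 5.24 or later; the data is made up.

```perl
use v5.24;          # postfix dereferencing is stable from 5.24 on (also enables strict and say)
use strict;
use warnings;

my @array1 = ( [ 'a' .. 'd' ], [ 'e' .. 'h' ] );

my @circumfix = @{ $array1[0] };          # all elements, classic syntax
my @postfix   = $array1[0]->@*;           # same, postfix syntax
my @slice_old = @{ $array1[0] }[1 .. 2];  # slice, classic syntax
my @slice_new = $array1[0]->@[1 .. 2];    # slice, postfix syntax

my %h       = ( x => 1, y => 2 );
my $hashref = \%h;
my @keys    = sort keys $hashref->%*;     # ->%* for a hashref

say "@postfix | @slice_new | @keys";      # a b c d | b c | x y
```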
There is one thing to warn about when it comes to multidimensional structures built with references. There are two distinct ways to create them: by using references to existing, named variables
my @a1 = 5 .. 7;
my %h1 = ( a => 1, b => 2 );
my @tla1 = (\@a1, \%h1);
or by using anonymous ones, where arrayrefs are constructed by [] and hashrefs by {}
my @tla2 = ( [ 5..7 ], { a => 1, b => 2 } );
The difference is important to keep in mind. In the first case, the references to variables that the array carries can be used to change those variables: if we change elements of @tla1 then we really change the referred variables
$tla1[0][1] = 100;   # now @a1 is (5, 100, 7)
Also, changes made to the variables referenced in @tla1 are seen via the top-level array as well.
With anonymous arrays and hashes in @tla2 this isn't the case, as the elements (references) of @tla2 access independent data, which cannot be accessed (and changed) in any other way.
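A minimal sketch of the difference, with made-up values. The named @a1 and the top-level @tla1 see each other's changes, while the anonymous array in @tla2 is reachable only through @tla2.

```perl
use strict;
use warnings;

my @a1   = 5 .. 7;
my @tla1 = ( \@a1 );        # reference to a named array
my @tla2 = ( [ 5 .. 7 ] );  # anonymous array: no other name for this data

$tla1[0][1] = 100;          # writes through to @a1
push @a1, 8;                # and changes to @a1 show up via @tla1

$tla2[0][1] = 100;          # only reachable through @tla2

print "@a1\n";              # 5 100 7 8
print "@{ $tla1[0] }\n";    # 5 100 7 8
print "@{ $tla2[0] }\n";    # 5 100 7
```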
Both of these ways to build complex data structures have their uses.

How to use an array reference to pairs of elements

I am considering this answer which uses a single array reference of points, where a point is a reference to a two-element array.
The original code in my question (function extract_crossing) uses two separate arrays $x and $y, which I call like this:
my @x = @{ $_[0] }; my @y = @{ $_[1] };
...
return extract_crossing(\@x, \@y);
The new code below based on the answer takes (x, y) and returns a single datatype, here x intercept points:
use strict; use warnings;
use Math::Geometry::Planar qw(SegmentLineIntersection);
use Test::Exception;
sub x_intercepts {
    my ($points) = @_;
    die 'Must pass at least 2 points' unless @$points >= 2;
    my @intercepts;
    my @x_axis = ( [0, 0], [1, 0] );
    foreach my $i (0 .. $#$points - 1) {
        my $intersect = SegmentLineIntersection([@$points[$i,$i+1], @x_axis]);
        push @intercepts, $intersect if $intersect;
    }
    return \@intercepts;
}
which I try to call like this:
my @x = @{ $_[0] }; my @y = @{ $_[1] };
...
my $masi = x_intercepts(\@x);
return $masi;
However, the code does not make sense.
I am confused about passing the "double array" to the x_intercepts() function.
How can I adapt the example code to the original setting?
If I am understanding the problem here, @ThisSuitIsBlackNot++ has written a function (x_intercepts, which is available in the thread: Function to extract intersections of crossing lines on axes) that expects its argument to be a reference to an array of array references. The x_intercepts subroutine in turn uses a function from Math::Geometry::Planar which expects the points of a line segment to be passed as a series of array references/anonymous arrays that contain the x,y values for each point.
Again - it is not entirely clear - but it seems your data is in two different arrays: one containing all the x values and one with the corresponding y values. Is this the case? If this is not correct please leave a comment and I will remove this answer.
If that is the source of your problem, then you can "munge" (transform) your data before you pass it to x_intercepts or, as @ThisSuitIsBlackNot suggests, you can rewrite the function. Here is an example of munging your existing data into an "@input_list" to pass to x_intercepts.
my @xs = qw/-1 1 3/;
my @ys = qw/-1 1 -1/;
my @input_list;
foreach my $i ( 0..$#ys ) {
    push @input_list, [ $xs[$i], $ys[$i] ];
}
my $intercept_list = x_intercepts(\@input_list);
say join ",", @$_ for @$intercept_list;   # say needs: use feature 'say';
Adding the lines above to your script produces:
Output:
0,0
2,0
You have to be very careful doing this kind of thing; using tests to make sure you are passing correctly transformed data in the expected way is a good idea.
I think a more general difficulty is that until you are familiar with perl it is sometimes tricky to easily see what sorts of values a subroutine is expecting, where they end up after they are passed in, and how to access them.
A solid grasp of perl data structures can help with that; for example, I think what you are calling a "double array" or "double element" here is an "array of arrays". There are ways to make it easier to see where the arguments passed to a subroutine (in @_) are going (notice how @ThisSuitIsBlackNot has assigned them to a nicely named array reference: "($points)"). Copious re-reading of perldoc perlsub can help things seem more obvious.
References are key to understanding perl subroutines, since to pass an array or hash to a subroutine intact you need to pass a reference to it. If the argument passed to x_intercepts is a reference to an array holding two anonymous arrays, then once it is assigned to ($points), $points->[0] and $points->[1] will be the references to those two arrays.
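A minimal sketch of that calling convention. describe_points is a hypothetical stand-in for x_intercepts (so Math::Geometry::Planar isn't needed), and the map line is an alternative way to zip @xs and @ys into [x, y] pairs.

```perl
use strict;
use warnings;

# Hypothetical sub with the same calling convention as x_intercepts:
# one argument, a reference to an array of [x, y] array references.
sub describe_points {
    my ($points) = @_;
    die 'Must pass at least 2 points' unless @$points >= 2;
    my @out;
    for my $i (0 .. $#$points) {
        my ($x, $y) = @{ $points->[$i] };   # or $points->[$i][0], $points->[$i][1]
        push @out, "($x,$y)";
    }
    return join ' ', @out;
}

my @xs = (-1, 1, 3);
my @ys = (-1, 1, -1);

# Zip the two parallel arrays into a list of [x, y] pairs
my @input_list = map { [ $xs[$_], $ys[$_] ] } 0 .. $#xs;

print describe_points(\@input_list), "\n";   # (-1,-1) (1,1) (3,-1)
```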
I hope this helps and is not too basic (or incorrect). If @ThisSuitIsBlackNot finds the time to provide an answer you should accept it: some very useful examples have been provided.

Can BerkeleyDB in perl handle a hash of hashes of hashes (up to n)?

I have a script that utilizes a hash, which contains four strings as keys whose values are hashes. These hashes also contain four strings as keys which also have hashes as their values. This pattern continues up to n-1 levels, which is determined at run-time. The nth-level of hashes contain integer (as opposed to the usual hash-reference) values.
I installed the BerkeleyDB module for Perl so I can use disk space instead of RAM to store this hash. I assumed that I could simply tie the hash to a database, and it would work, so I added the following to my code:
my %tags = () ;
my $file = "db_tags.db" ;
unlink $file;
tie %tags, "BerkeleyDB::Hash",
-Filename => $file,
-Flags => DB_CREATE
or die "Cannot open $file\n" ;
However, I get the error:
Can't use string ("HASH(0x1a69ad8)") as a HASH ref while "strict refs" in use at getUniqSubTreeBDB.pl line 31, line 1.
To test, I created a new script, with the code (above) that tied to hash to a file. Then I added the following:
my $href = \%tags;
$tags{'C'} = {} ;
And it ran fine. Then I added:
$tags{'C'}->{'G'} = {} ;
And it gave pretty much the same error. I am thinking that BerkeleyDB cannot handle the type of data structure I am creating. Maybe it was able to handle the first level (C->{}) in my test because it was just a regular key -> scalar?
Anyway, any suggestions or affirmations of my hypothesis would be appreciated.
Use DBM::Deep.
use DBM::Deep;
my $db = DBM::Deep->new( "foo.db" );
$db->{mykey} = "myvalue";
$db->{myhash} = {};
$db->{myhash}->{subkey} = "subvalue";
print $db->{myhash}->{subkey} . "\n";
The code I provided yesterday would work fine with this.
sub get_node {
    my $p = \shift;
    $p = \( ($$p)->{$_} ) for @_;
    return $p;
}
my @seqs = qw( CG CA TT CG );
my $tree = DBM::Deep->new("foo.db");
++${ get_node($tree, split //) } for @seqs;
No. BerkeleyDB stores pairs of one key and one value, where both are arbitrary bytestrings. If you store a hashref as the value, it'll store the string representation of a hashref, which isn't very useful when you read it back (as you noticed).
The MLDBM module can do something like you describe, but it works by serializing the top-level hashref to a string and storing that in the DBM file. This means it has to read/write the entire top-level hashref every time you access or change a value in it.
Depending on your application, you may be able to combine your keys into a single string, and use that as the key for your DBM file. The main limitation with that is that it's difficult to iterate over the keys of one of your interior hashes.
You might use the semi-obsolete multidimensional array emulation for this. $foo{$a,$b,$c} is interpreted as $foo{join($;, $a, $b, $c)}, and that works with tied hashes also.
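A small sketch of the emulation on an ordinary hash (the same subscripts work on a tied one), with made-up keys. Iterating over an "inner" level means splitting the flat keys on $; yourself.

```perl
use strict;
use warnings;

my %tags;
$tags{'C', 'G'} = 1;     # the key is join($;, 'C', 'G')
$tags{'C', 'G'}++;
$tags{'C', 'A'} = 5;

print $tags{ join($;, 'C', 'G') }, "\n";   # 2

# Walking the emulated "inner" level means splitting the flat keys yourself
my $sep = quotemeta($;);
for my $key (sort keys %tags) {
    my ($first, $second) = split /$sep/, $key;
    print "$first -> $second: $tags{$key}\n";
}
# C -> A: 5
# C -> G: 2
```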
No; it can only store strings. But you can use the filter_fetch_value and filter_store_value methods to define "filters" that automatically freeze arbitrary structures to strings before storing them, and convert them back when fetching. There are analogous hooks for marshalling and unmarshalling non-string keys.
Beware though: using this method to store objects that share subobjects will not preserve the sharing. For example:
$a = [1, 2, 3];
$g = { array => $a };
$h = { array => $a };
$db{g} = $g;
$db{h} = $h;
@$a = ();
push @{$db{g}{array}}, 4;
print @{$db{g}{array}};   # prints 1234, not 4
print @{$db{h}{array}};   # prints 123, not 1234 or 4
%db here is a tied hash; if it were an ordinary hash the two prints would both print 4.
While you can't store normal multidimensional hashes in a BerkeleyDB tied hash, you can use emulated multidimensional hashes with a syntax like $tags{ 'C', 'G'}. This creates a single key that looks like ('C' . $; . 'G')
I had the same question, found this. Might be useful for you as well.
Storing data structures as values in BDB
Often, we might be interested in storing complex data structures: arrays, hashtables,… whose elements can be simple values or references to other data structures. To do this, we need to serialize the data structure: convert it to a string that can be stored in the database, and can be later converted back into the original data structure using a deserialization procedure.
There are several perl modules available to perform this serialization/deserialization process. One of the most popular is JSON::XS. The next example shows how to use this module:
use JSON::XS;
# Data to be stored
my %structure;
# Convert the data into a json string (encode_json expects a reference)
my $json = encode_json(\%structure);
# Save it in the database
$dbh->db_put($key,$json);
To retrieve the original structure, we perform the inverse operation:
# Retrieve the json string from the database
$dbh->db_get($key, $json);
# Deserialize the json string into a data structure
my $hr_structure = decode_json($json);
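A self-contained round trip, using the core JSON::PP module in place of JSON::XS (both provide encode_json and decode_json). The BerkeleyDB calls are left out and the data is made up.

```perl
use strict;
use warnings;
use JSON::PP qw(encode_json decode_json);   # core module; JSON::XS exports the same two functions

my %structure = ( C => { G => { A => 3 } }, T => [ 1, 2 ] );

# encode_json takes a *reference*, not the hash itself
my $json = encode_json(\%structure);

# ... $json is a plain string here, so it could be stored as a database value ...

my $hr_structure = decode_json($json);
print $hr_structure->{C}{G}{A}, "\n";         # 3
print scalar @{ $hr_structure->{T} }, "\n";   # 2
```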
In perl you can do this. You are using references beyond the first level.
use GDBM_File;
use Storable;
use MLDBM qw(GDBM_File Storable);
tie my %hash, 'MLDBM', 'hash.db', &GDBM_WRCREAT, 0640 or die "Cannot tie: $!";   # file name is illustrative
my %level_2_hash;
my %level_3_hash1 = (key1 => 'x', key2 => 'y', key3 => 'z');
my %level_3_hash2 = (key10 => 'a', key20 => 'b', key30 => 'c');
%level_2_hash = (keyA => \%level_3_hash1, keyB => \%level_3_hash2);
$hash{key} = \%level_2_hash;
This can be found in the online beginning perl book in chapter 13.

How do I update values in an array of hashes, which is in a hash of a hash in perl?

Seems very confusing I know. I'll try to "draw" this data structure:
hash-> key->( (key)->[(key,value),(key,value),(key,value),...],
(key,value))
So there is the first key, whose value is enclosed in the parenthesis. The value for the first key of the hash is two keys, one (the right one) being another simple key, value pair. The other (the left one) key's value is an array of hashes. I am able to update the "right" key, value pair with the following line of code:
$hash{$parts[1]}{"PAGES"} += $parts[2];
Where $parts[1] and $parts[2] are just elements from an array. I am +=ing a number to the "right" key, value pair from my hash. What I need to do now is update the "left" key,value pair - the array of hashes within a hash of hashes. Here is how I initialize the array for both key, value pairs in the hash of hashes:
$hash{$printer}{"PAGES"} = 0;
$hash{$printer}{"USERS"} = [@tmp];
Here is one of my many attempts to access and update the values in the array of hashes:
$hash{$parts[1]}{"USERS"}[$parts[0]] += $parts[2];
I just can't figure out the correct syntax for this. If anyone could help me I'd appreciate it. Thanks.
Edit: I guess a more pointed question is: How do I get a hash key from an array of hashes (keeping in mind that the array is in a hash of hashes)?
Edit: Added this to the code:
#Go through each user to check to see which user did a print job and add the page
#count to their total
#$parts[0] is the user name, $parts[1] is the printer name, $parts[2] is the page
#count for the current print job
for(my $i=0;$i<$arr_size;$i++)
{
my $inner = $hash{$parts[1]}{"USERS"}[$i];
my @hash_arr = keys %$inner;
my $key = $hash_arr[0];
#problem line - need to compare the actual key with $parts[0]
#(not the key's value which is a number)
if($hash{$parts[1]}{"USERS"}[$i]{$key} eq $parts[0])
{
$hash{$parts[1]}{"USERS"}[$i]{$parts[0]} += $parts[2];
}
}
Edit: Whoops hehe this is what I needed. It still isn't quite there but this is kind of what I am looking for:
if($key eq $parts[0])
{
$hash{$parts[1]}{"USERS"}[$i]{$parts[0]} += $parts[2];
}
Edited to respond to the edited question: How do I get a hash key from an array of hashes (keeping in mind that the array is in a hash of hashes).
use strict;
use warnings;
my %h;
$h{printer}{PAGES} = 0;
$h{printer}{USERS} = [
{a => 1, b => 2},
{c => 3, d => 4},
{e => 5, f => 6},
];
# Access a particular element.
$h{printer}{USERS}[0]{a} += 100;
# Access one of the inner hashes.
my $inner = $h{printer}{USERS}[1];
$inner->{$_} += 1000 for keys %$inner;
# Ditto, but without the convenience variable.
$h{printer}{USERS}[2]{$_} += 9000 for keys %{ $h{printer}{USERS}[2] };
use Data::Dumper qw(Dumper);
print Dumper \%h;
Output:
$VAR1 = {
'printer' => {
'PAGES' => 0,
'USERS' => [
{
'a' => 101,
'b' => 2
},
{
'c' => 1003,
'd' => 1004
},
{
'e' => 9005,
'f' => 9006
}
]
}
};
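For the edited question (comparing against the key itself), here is a sketch that assumes each element of USERS is a one-key hash of { user name => page count } and that @parts holds (user, printer, pages) as in the question; the sample values are made up. Testing with exists checks for the key, so the page-count value never needs to be compared.

```perl
use strict;
use warnings;

# Assumed layout: each element of USERS is a one-key hash { username => page_count }
my %hash;
$hash{printer1}{PAGES} = 0;
$hash{printer1}{USERS} = [ { alice => 0 }, { bob => 0 } ];

my @parts = ('bob', 'printer1', 7);   # user name, printer name, page count

$hash{ $parts[1] }{PAGES} += $parts[2];
for my $user (@{ $hash{ $parts[1] }{USERS} }) {
    # compare against the key itself, not its value
    $user->{ $parts[0] } += $parts[2] if exists $user->{ $parts[0] };
}

print "bob: $hash{printer1}{USERS}[1]{bob}\n";   # bob: 7
```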
I have trouble understanding the structure from your description.
My advice is to avoid such structures like pestilentious devils from bad neighbourhoods.
One reason people end up with such maintenance nightmares is they're using XML::Simple.
Whatever the reason in your example, do yourself a favour and prevent that horrible data structure from ever getting created. There are always alternatives. If you describe the problem, people will be able to suggest some.
The way you've described the structure is somewhat impenetrable to me, but accessing array references embedded within other structures is done thusly, using a simpler example structure:
my $ref = {k => [1, 3, 5, 6, 9]};
Below, the value of 6 is incremented to 7:
$ref->{k}->[3] += 1;
See perlref for more details on the -> operator, but briefly, the expression to the left of the arrow can be anything that returns a reference. In some cases, the -> operator is optional but it's best left in for clarity.
I still haven't decoded your structure. But, I'll make two comments:
Use the -> syntactic sugar. For example, $hash->{key}->[2]->{key} is a bit clearer than the equivalent without the syntactic sugar: ${${$hash}{key}}[2]{key}
Look into using Object Oriented Perl for this structure. It's not as scary as it sounds. In Perl, an object is just a blessed reference, and its methods are subroutines that handle the dirty work for you. Take a look at perldoc perlboot. It's what I used to understand how object oriented Perl works. You don't even have to create a full separate module. The object definitions can live in the same Perl script.
Using Object Oriented Perl keeps the mess outside your main program and makes it easier to maintain your program. Plus, if you have to modify your structure, you don't have to search your entire code to find all the places to change. The syntactic sugar makes it easier to see where you're going with your structure.
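A minimal sketch of that idea. The PrinterStats class and its methods are hypothetical, not taken from the linked code. The caller only uses add_job, pages_for and total, so the internal layout can change without touching the main program.

```perl
use strict;
use warnings;

package PrinterStats;   # hypothetical class, defined in the same script

sub new {
    my ($class) = @_;
    return bless { pages => 0, users => {} }, $class;
}

# Record one print job; returns $self so calls can be chained
sub add_job {
    my ($self, $user, $pages) = @_;
    $self->{pages} += $pages;
    $self->{users}{$user} += $pages;
    return $self;
}

sub pages_for { my ($self, $user) = @_; return $self->{users}{$user} // 0 }
sub total     { $_[0]{pages} }

package main;

my $stats = PrinterStats->new;
$stats->add_job(alice => 3)->add_job(bob => 5)->add_job(alice => 2);
print $stats->pages_for('alice'), ' ', $stats->total, "\n";   # 5 10
```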
Compare this monstrosity with this object oriented monstrosity. Both programs do the same thing. I wrote the first one long ago, and found it so hard to maintain that I rewrote it from scratch in an object oriented style. (They're pre-commit hooks for Subversion in case anyone is wondering).

In Perl, how do I create a hash whose keys come from a given array?

Let's say I have an array, and I know I'm going to be doing a lot of "Does the array contain X?" checks. The efficient way to do this is to turn that array into a hash, where the keys are the array's elements, and then you can just say if($hash{X}) { ... }
Is there an easy way to do this array-to-hash conversion? Ideally, it should be versatile enough to take an anonymous array and return an anonymous hash.
%hash = map { $_ => 1 } @array;
It's not as short as the "@hash{@array} = ..." solutions, but those require the hash and array to already be defined somewhere else, whereas this one can take an anonymous array and return an anonymous hash.
What this does is take each element in the array and pair it up with a "1". When this list of (key, 1, key, 1, key, 1) pairs gets assigned to a hash, the odd-numbered elements become the hash's keys, and the even-numbered ones become the respective values.
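A minimal sketch of the anonymous-array-to-anonymous-hash case, with made-up colors.

```perl
use strict;
use warnings;

my $aref = [qw(red green blue)];          # e.g. an anonymous array
my $set  = { map { $_ => 1 } @$aref };    # anonymous hash built from it

for my $color (qw(green purple)) {
    print "$color: ", ($set->{$color} ? 'yes' : 'no'), "\n";
}
# green: yes
# purple: no
```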
@hash{@array} = (1) x @array;
It's a hash slice, a list of values from the hash, so it gets the list-y @ in front.
From the docs:
If you're confused about why you use an '@' there on a hash slice instead of a '%', think of it like this. The type of bracket (square or curly) governs whether it's an array or a hash being looked at. On the other hand, the leading symbol ('$' or '@') on the array or hash indicates whether you are getting back a singular value (a scalar) or a plural one (a list).
@hash{@keys} = undef;
The syntax here, where you refer to the hash with an @, is a hash slice. We're basically saying that $hash{$keys[0]} AND $hash{$keys[1]} AND $hash{$keys[2]} ... is a list on the left-hand side of the =, an lvalue, and we're assigning to that list, which actually goes into the hash and sets the values for all the named keys. In this case I only specified one value, so that value goes into $hash{$keys[0]}, and the other hash entries all auto-vivify (come to life) with undefined values. [My original suggestion here was to set the expression = 1, which would've set that one key to 1 and the others to undef. I changed it for consistency, but as we'll see below, the exact values do not matter.]
When you realize that the lvalue, the expression on the left-hand side of the =, is a list built out of the hash, then it'll start to make some sense why we're using that @. [Except I think this will change in Perl 6.]
The idea here is that you are using the hash as a set. What matters is not the value I am assigning; it's just the existence of the keys. So what you want to do is not something like:
if ($hash{$key} == 1) # then key is in the hash
instead:
if (exists $hash{$key}) # then key is in the set
It's actually more efficient to just run an exists check than to bother with the value in the hash, although to me the important thing here is just the concept that you are representing a set just with the keys of the hash. Also, somebody pointed out that by using undef as the value here, we will consume less storage space than we would assigning a value. (And also generate less confusion, as the value does not matter, and my solution would assign a value only to the first element in the hash and leave the others undef, and some other solutions are turning cartwheels to build an array of values to go into the hash; completely wasted effort).
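A small sketch of the set idiom with undef values (made-up keys): exists reports membership, while a plain truth test on the value is false for every key.

```perl
use strict;
use warnings;

my @keys = qw(alpha beta gamma);
my %set;
@set{@keys} = ();   # every key exists; every value is undef

for my $probe (qw(beta delta)) {
    my $in_set = exists $set{$probe} ? 'in set' : 'not in set';
    my $truthy = $set{$probe}        ? 'true'   : 'false';
    print "$probe: $in_set, value is $truthy\n";
}
# beta: in set, value is false
# delta: not in set, value is false
```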
Note that if typing if ( exists $hash{ key } ) isn’t too much work for you (which I prefer to use since the matter of interest is really the presence of a key rather than the truthiness of its value), then you can use the short and sweet
@hash{@key} = ();
I always thought that
foreach my $item (@array) { $hash{$item} = 1 }
was at least nice and readable / maintainable.
There is a presupposition here, that the most efficient way to do a lot of "Does the array contain X?" checks is to convert the array to a hash. Efficiency depends on the scarce resource, often time but sometimes space and sometimes programmer effort. You are at least doubling the memory consumed by keeping a list and a hash of the list around simultaneously. Plus you're writing more original code that you'll need to test, document, etc.
As an alternative, look at the List::MoreUtils module, specifically the functions any(), none(), true() and false(). They all take a block as the conditional and a list as the argument, similar to map() and grep():
print "At least one value undefined" if any { !defined($_) } @list;
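For reference, the core List::Util module has provided an any() with the same block syntax since version 1.33. A sketch without the CPAN dependency (made-up data) could look like this; note that any() is still a linear scan, stopping at the first match.

```perl
use strict;
use warnings;
use List::Util qw(any);   # any() is in core List::Util since version 1.33

my @list = ('a', undef, 'c');
print "At least one value undefined\n" if any { !defined $_ } @list;

my @words = qw(apple banana cherry);
my $found = any { $_ eq 'banana' } @words;   # linear scan, stops at the first match
print $found ? "found\n" : "missing\n";
```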
I ran a quick test, loading in half of /usr/share/dict/words to an array (25000 words), then looking for eleven words selected from across the whole dictionary (every 5000th word) in the array, using both the array-to-hash method and the any() function from List::MoreUtils.
On Perl 5.8.8 built from source, the array-to-hash method runs almost 1100x faster than the any() method (1300x faster under Ubuntu 6.06's packaged Perl 5.8.7.)
That's not the full story, however: the array-to-hash conversion takes about 0.04 seconds, which in this case cuts the array-to-hash method's advantage to only 1.5x-2x faster than the any() method. Still good, but not nearly as stellar.
My gut feeling is that the array-to-hash method is going to beat any() in most cases, but I'd feel a whole lot better if I had some more solid metrics (lots of test cases, decent statistical analyses, maybe some big-O algorithmic analysis of each method, etc.) Depending on your needs, List::MoreUtils may be a better solution; it's certainly more flexible and requires less coding. Remember, premature optimization is a sin... :)
In Perl 5.10 there's the close-to-magic ~~ (smartmatch) operator (note that Perl 5.18 later marked smartmatch as experimental):
sub invite_in {
my $vampires = [ qw(Angel Darla Spike Drusilla) ];
return ($_[0] ~~ $vampires) ? 0 : 1 ;
}
See here: http://dev.perl.org/perl5/news/2007/perl-5.10.0.html
Also worth noting for completeness, my usual method for doing this with 2 same-length arrays @keys and @vals which you would prefer were a hash...
my %hash = map { $keys[$_] => $vals[$_] } (0..@keys-1);
Raldi's solution can be tightened up to this (the '=>' from the original is not necessary):
my %hash = map { $_,1 } @array;
This technique can also be used for turning text lists into hashes:
my %hash = map { $_,1 } split(",",$line);
Additionally if you have a line of values like this: "foo=1,bar=2,baz=3" you can do this:
my %hash = map { split("=",$_) } split(",",$line);
[EDIT to include]
Another solution offered (which takes two lines) is:
my %hash;
#The values in %hash can only be accessed by doing exists($hash{$key})
#The assignment only works with '= undef;' and will not work properly with '= 1;'
#if you do '= 1;' only the hash key of $array[0] will be set to 1;
@hash{@array} = undef;
You could also use Perl6::Junction.
use Perl6::Junction qw'any';
my @arr = ( 1, 2, 3 );
if( any(@arr) == 1 ){ ... }
If you do a lot of set theoretic operations, you can also use Set::Scalar or a similar module. Then $s = Set::Scalar->new( @array ) will build the set for you, and you can query it with: $s->contains($m).
You can place the code into a subroutine, if you don't want pollute your namespace.
my $hash_ref =
  sub{
    my %hash;
    @hash{ @{[ qw'one two three' ]} } = undef;
    return \%hash;
  }->();
Or even better:
sub keylist(@){
    my %hash;
    @hash{@_} = undef;
    return \%hash;
}
my $hash_ref = keylist qw'one two three';
# or
my @key_list = qw'one two three';
my $hash_ref = keylist @key_list;
If you really wanted to pass an array reference:
sub keylist(\@){
    my %hash;
    @hash{ @{$_[0]} } = undef if @_;
    return \%hash;
}
my @key_list = qw'one two three';
my $hash_ref = keylist @key_list;
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my @a = qw(5 8 2 5 4 8 9);
my @b = qw(7 6 5 4 3 2 1);
my $h = {};
@{$h}{@a} = @b;
print Dumper($h);
gives (note repeated keys get the value at the greatest position in the array - ie 8->2 and not 6)
$VAR1 = {
'8' => '2',
'4' => '3',
'9' => '1',
'2' => '5',
'5' => '4'
};
You might also want to check out Tie::IxHash, which implements ordered associative arrays. That would allow you to do both types of lookups (hash and index) on one copy of your data.
