Perl: mapping to lists' first element - arrays

Task: to build hash using map, where keys are the elements of the given array #a, and values are the first elements of the list returned by some function f($element_of_a):
my #a = (1, 2, 3);
my %h = map {$_ => (f($_))[0]} #a;
All the okay until f() returns an empty list (that's absolutely correct for f(), and in that case I'd like to assign undef). The error could be reproduced with the following code:
my %h = map {$_ => ()[0]} #a;
the error itself sounds like "Odd number of elements in hash assignment". When I rewrite the code such that:
my #a = (1, 2, 3);
my $s = ()[0];
my %h = map {$_ => $s} #a;
or
my #a = (1, 2, 3);
my %h = map {$_ => undef} #a;
Perl does not complain at all.
So how should I resolve this — get first elements of list returned by f(), when the returned list is empty?
Perl version is 5.12.3
Thanks.

I've just played around a bit, and it seems that ()[0], in list context, is interpreted as an empty list rather than as an undef scalar. For example, this:
my #arr = ()[0];
my $size = #arr;
print "$size\n";
prints 0. So $_ => ()[0] is roughly equivalent to just $_.
To fix it, you can use the scalar function to force scalar context:
my %h = map {$_ => scalar((f($_))[0])} #a;
or you can append an explicit undef to the end of the list:
my %h = map {$_ => (f($_), undef)[0]} #a;
or you can wrap your function's return value in a true array (rather than just a flat list):
my %h = map {$_ => [f($_)]->[0]} #a;
(I like that last option best, personally.)
The special behavior of a slice of an empty list is documented under “Slices” in perldata:
A slice of an empty list is still an empty list. […] This makes it easy to write loops that terminate when a null list is returned:
while ( ($home, $user) = (getpwent)[7,0]) {
printf "%-8s %s\n", $user, $home;
}

I second Jonathan Leffler's suggestion - the best thing to do would be to solve the problem from the root if at all possible:
sub f {
# ... process #result
return #result ? $result[0] : undef ;
}
The explicit undef is necessary for the empty list problem to be circumvented.

At first, much thanks for all repliers! Now I'm feeling that I should provide the actual details of the real task.
I'm parsing a XML file containing the set of element each looks like that:
<element>
<attr_1>value_1</attr_1>
<attr_2>value_2</attr_2>
<attr_3></attr_3>
</element>
My goal is to create Perl hash for element that contains the following keys and values:
('attr_1' => 'value_1',
'attr_2' => 'value_2',
'attr_3' => undef)
Let's have a closer look to <attr_1> element. XML::DOM::Parser CPAN module that I use for parsing creates for them an object of class XML::DOM::Element, let's give the name $attr for their reference. The name of element is got easy by $attr->getNodeName, but for accessing the text enclosed in <attr_1> tags one has to receive all the <attr_1>'s child elements at first:
my #child_ref = $attr->getChildNodes;
For <attr_1> and <attr_2> elements ->getChildNodes returns a list containing exactly one reference (to object of XML::DOM::Text class), while for <attr_3> it returns an empty list. For the <attr_1> and <attr_2> I should get value by $child_ref[0]->getNodeValue, while for <attr_3> I should place undef into the resulting hash since no text elements there.
So you see that f function's (method ->getChildNodes in real life) implementation could not be controlled :-) The resulting code that I have wrote is (the subroutine is provided with list of XML::DOM::Element references for elements <attr_1>, <attr_2>, and <attr_3>):
sub attrs_hash(#)
{
my #keys = map {$_->getNodeName} #_; # got ('attr_1', 'attr_2', 'attr_3')
my #child_refs = map {[$_->getChildNodes]} #_; # got 3 refs to list of XML::DOM::Text objects
my #values = map {#$_ ? $_->[0]->getNodeValue : undef} #child_refs; # got ('value_1', 'value_2', undef)
my %hash;
#hash{#keys} = #values;
%hash;
}

Related

How do I copy list elements to hash keys in perl?

I have found a couple of ways to copy the elements of a list to the keys of a hash, but could somebody please explain how this works?
#!/usr/bin/perl
use v5.34.0;
my #arry = qw( ray bill lois shirly missy hank );
my %hash;
$hash{$_}++ for #arry; # What is happening here?
foreach (keys %hash) {
say "$_ => " . $hash{$_};
}
The output is what I expected. I don't know how the assignment is being made.
hank => 1
shirly => 1
missy => 1
bill => 1
lois => 1
ray => 1
$hash{$_}++ for #array;
Can also be written
for (#array) {
$hash{$_}++;
}
Or more explicitly
for my $key (#array) {
$hash{$key}++;
}
$_ is "the default input and pattern-searching space"-variable. Often in Perl functions, you can leave out naming an explicit variable to use, and it will default to using $_. for is an example of that. You can also write an explicit variable name, that might feel more informative for your code:
for my $word (#words)
Or idiomatically:
for my $key (keys %hash) # using $key variable name for hash keys
You should also be aware that for and foreach are exactly identical in Perl. They are aliases for the same function. Hence, I always use for because it is shorter.
The second part of the code is the assignment, using the auto-increment operator ++
It is appended to a variable on the LHS and increments its value by 1. E.g.
$_++ means $_ = $_ + 1
$hash{$_}++ means $hash{$_} = $hash{$_} + 1
...etc
It also has a certain Perl magic included, which you can read more about in the documentation. In this case, it means that it can increment even undefined variables without issuing a warning about it. This is ideal when it comes to initializing hash keys, which do not exist beforehand.
Your code will initialize a hash key for each word in your #arry list, and also count the occurrences of each word. Which happens to be 1 in this case. This is relevant to point out, because since hash keys are unique, your array list may be bigger than the list of keys in the hash, since some keys would overwrite each other.
my #words = qw(foo bar bar baaz);
my %hash1;
for my $key (#words) {
$hash{$key} = 1; # initialize each word
}
# %hash1 = ( foo => 1, bar => 1, baaz => 1 );
# note -^^
my %hash2; # new hash
for my $key (#words) {
$hash{$key}++; # use auto-increment: words are counted
}
# %hash2 = ( foo => 1, bar => 2, baaz => 1);
# note -^^
Here is another one
my %hash = map { $_ => 1 } #ary;
Explanation: map takes an element of the input array at a time and for each prepapres a list, here of two -- the element itself ($_, also quoted because of =>) and a 1. Such a list of pairs then populates a hash, as a list of an even length can be assigned to a hash, whereby each two successive elements form a key-value pair.
Note: This does not account for possibly multiple occurences of same elements in the array but only builds an existance-check structure (whether an element is in the array or not).
$hash{$_}++ for #arry; # What is happening here?
It is iterating over the array, and for each element, it's assigning it as a key to the hash, and incrementing the value of that key by one. You could also write it like this:
my %hash;
my #array = (1, 2, 2, 3);
for my $element (#array) {
$hash{$element}++;
}
The result would be:
$VAR1 = {
'2' => 2,
'1' => 1,
'3' => 1
};
$hash{$_}++ for #arry; # What is happening here?
Read perlsyn, specifically simple statements and statement modifiers:
Simple Statements
The only kind of simple statement is an expression evaluated for its side-effects. Every simple statement must be terminated with a semicolon, unless it is the final statement in a block, in which case the semicolon is optional. But put the semicolon in anyway if the block takes up more than one line, because you may eventually add another line. Note that there are operators like eval {}, sub {}, and do {} that look like compound statements, but aren't--they're just TERMs in an expression--and thus need an explicit termination when used as the last item in a statement.
Statement Modifiers
Any simple statement may optionally be followed by a SINGLE modifier, just before the terminating semicolon (or block ending). The possible modifiers are:
if EXPR
unless EXPR
while EXPR
until EXPR
for LIST
foreach LIST
when EXPR
[...]
The for(each) modifier is an iterator: it executes the statement once for each item in the LIST (with $_ aliased to each item in turn). There is no syntax to specify a C-style for loop or a lexically scoped iteration variable in this form.
print "Hello $_!\n" for qw(world Dolly nurse);

Using an element in a Hash of Arrays to access the key

I have a hash of arrays like this:
my #array1 = ("one", "two", "three", "four", "five");
my #array2 = ("banana", "pear", "apple");
my %hash = (
numbers => \#array1,
fruit => \#array2
);
I would like to use an element of the array to access the key. So for example, if I have "banana", I would like to print "fruit".
However when I do print $hash{banana} I get "use of unitialized value in print". How do I properly access this?
As already mentioned by Borodin in the comments, there is no direct way to accomplish this. But you could do it in the following way:
sub getKeyByValue {
my ($hashref, $val) = #_; # get sub arguments
for (keys %$hashref) {
# find key by value and give back the key
return $_ if grep $_ eq $val, #{$hashref->{$_}};
}
return undef; # value not found
}
my $key = getKeyByValue(\%hash, 'banana');
print $key;
Output: fruit
Just give the hash reference and your desired value to the subroutine getKeyByValue() and it will return the corresponding key. If the value can't be found, the subroutine will return the undefined value undef. If your data structure is really big, this trivial search is obviously not the most efficient solution.
Note: If the value banana is stored several times (under more than one key), then this subroutine will of course only return the first random match (key). You have to modify the subroutine if you are interested in all the keys under which banana is possibly stored.
There are many ways to do this, like most of the time in Perl. For example you could reverse the hash and create a new one (see example in perlfaq4).
You could create two distinct hashes:
my %hash1 = map { $_ => "numbers" } #array1;
my %hash2 = map { $_ => "fruit" } #array2;
and concatenate them:
my %hash = (%hash1, %hash2);

Self-deleting array elements (once they become undefined)

I have a Perl script generating an array of weak references to objects. Once one of these objects goes out of scope, the reference to it in the array will become undefined.
ex (pseudo code):
# Imagine an array of weak references to objects
my #array = ( $obj1_ref, $obj2_ref, $obj3_ref );
# Some other code here causes the last strong reference
# of $obj2_ref to go out of scope.
# We now have the following array
#array = ( $obj1_ref, undef, $obj3_ref )
Is there a way to make the undefined reference automatically remove itself from the array once it becomes undefined?
I want #array = ($obj1_red, $obj3_ref ).
EDIT:
I tried this solution and it didn't work:
#!/usr/bin/perl
use strict;
use warnings;
{
package Object;
sub new { my $class = shift; bless({ #_ }, $class) }
}
{
use Scalar::Util qw(weaken);
use Data::Dumper;
my $object = Object->new();
my $array;
$array = sub { \#_ }->( grep defined, #$array );
{
my $object = Object->new();
#$array = ('test1', $object, 'test3');
weaken($array->[1]);
print Dumper($array);
}
print Dumper($array);
Output:
$VAR1 = [
'test1',
bless( {}, 'Object' ),
'test3'
];
$VAR1 = [
'test1',
undef,
'test3'
];
The undef is not removed from the array automatically.
Am I missing something?
EDIT 2:
I also tried removing undefined values from the array in the DESTROY method of the object, but that doesn't appear to work either. It appears that since the object is still technically not "destroyed" yet, the weak references are still defined until the DESTROY method is completed...
No, there isn't, short of using a magical (e.g. tied) array.
If you have a reference to an array instead of an array, you can use the following to filter out the undefined element efficiently without "hardening" any of the references.
$array = sub { \#_ }->( grep defined, #$array );
This doesn't copy the values at all, in fact. Only "C pointers" get copied.
Perl won't do this for you automatically. You have a couple of options. The first is to clean it yourself whenever you use it:
my #clean = grep { defined $_ } #dirty;
Or you could create a tie'd array and add that functionality to the FETCH* and POP hooks.

Two-dimensional array access in Perl

I have an array populated with cities. I want to pass the array by reference to a sub routine and print each city to output. However, I have the following problems:
I can access each element before my while loop in the subroutine. But I cannot access the elements within my while loop. I get the error message:
...
Use of uninitialized value in print at line 44, line 997 (#1)
Use of uninitialized value in print at line 44, line 998 (#1)
...
The following is some code. I have commented what prints and what doesn't (I tried to cut out code that is not needed for my explanation...):
#cities;
# Assume cities is loaded successfully
&loadCities(getFileHandle('cities.txt'), $NUM_CITIES, \#cities);
&printElements(getFileHandle('names.txt'), \#cities);
sub printElements{
my $counter = 0;
my $arraySize = scalar $_[1];
# Prints fine!!!
print #{$_[1][($counter)%$arraySize];
while ((my $line = $_[0]->getline()) && $counter < 1000){
# Doesn't print. Generates the above error
print #{$_[1][($counter)%$arraySize];
$counter += 1;
}
}
The Perl syntax has me super confused. I do not understand what is going on with #{$_[1]}[0]. I am trying to work it out.
$_[1], treat the value at this location as scalar value (memory
address of the array)
#{...}, interpret what is stored at this
memory address as an array
#{...} [x], access the element at index x
Am I on the right track?
My first tip is that you should put use strict; and use warnings; at the top of your script. This generally reveals quite a few things.
This line: print #{$_[1][($counter)%$arraySize]; doesn't have a closing }. You also don't need the parenthesis around $counter.
Like you mentioned, the best/most clear way to get the length of an array is my $arraySize = scalar #{$_[1]};.
You can check out the documentation here for working with references. I'll give you a quick overview.
You can declare an array as normal
my #array = (1, 2, 3);
Then you can reference it using a backslash.
my $array_ref = \#array;
If you want to use the reference, use #{...}. This is just like using a regular array.
print #{$array_ref};
You could also declare it as a reference to begin with using square braces.
my $array_ref = [1, 2, 3];
print #{$array_ref}; # prints 123
In Perl, a 2-dimensional array is actually an array of array references. Here is an example:
my #array = ( ['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'] );
print #{$array[1]}; # prints def
Now let's try passing in an array reference to a subroutine.
my #array = ( ['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'] );
example(\#array); # pass in an array reference
sub example {
my #arr = #{$_[0]}; # use the array reference to assign a new array
print #{$arr[1]};
print #{$_[0][1]}; # using the array reference works too!
}
Now let's put it together and print the whole 2-d array.
my #array = ( ['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'] );
example(\#array);
sub example {
my #arr = #{$_[0]};
for my $ref (#arr) {
print #{$ref};
}
} # prints abcdefghi
You could adapt this example pretty easily if you wanted to use it for your printElements subroutine.
Another note about printing the elements in an array. Let's take this line from the last example:
print #{$ref};
Since we are calling it every time through the loop, we may want to print a new line at the end of it.
print #{$ref} . "\n";
What does this print? Try it! Does it work?
This is where the built-in subroutine join comes in handy.
print join(" ", #{$ref}) . "\n";
For loops are generally the best way to iterate through an array. My answer here talks a little about doing it with a while loop: https://stackoverflow.com/a/21950936/2534803
You can also check out this question: Best way to iterate through a Perl array
To make references a bit easier to understand, I prefer the -> syntax instead of the munge-it-all-together syntax.
Instead of:
#{$_[1]}[0].
Try
$_[1]->[0];
It means the same thing. It's easier, and it's cleaner to see. You can see that $_[1] is an array reference, and that you're referencing the first element in that array reference.
However, an even better way is to simply set variables for your various element in #_. You have to type a few more letters, but your code is much easier to understand, and is a lot easier to debug.
sub print_elements {
my $file_handle = shift; # This isn't a "reference", but an actual file handle
my $cities_array_ref = shift; # This is a reference to your array
my #cities = #{ $cities_array_ref }; # Dereferencing makes it easier to do your program
Now, your subroutine is dealing with variables which have names, and your array reference is an array which makes things cleaner. Also, you cannot accidentally affect the values in your main program. When you use #_, it's a direct link to the values you pass to it. Modifying #_ modifies the value in your main program which is probably not what you want to do.
So, going through your subroutine:
sub printElements {
my file_handle = shift;
my $cities_array_ref = shift;
my #cities = #{ $cities_array_ref };
my $counter;
my $array_size = #cities; # No need for scalar. This is automatic
while ( my $line = $file_handle->getline and $counter < 1000 ) {
chomp $line;
my $city_number = $counter % $array_size;
print $cities[$city_number]. "\n";
$counter += 1;
}
}
Note, how much easier it is to see what's going on by simply assigning a few variables instead of trying to cram everything together. I can easily see what your parameters to your subroutine are suppose to be. If you called the subroutine with the incorrect parameter order, you could easily spot it. Also notice I broke out $counter % $array_size and assigned that to a variable too. Suddenly, it's obvious what I'm trying to get out of it.
However, I can't see where you're using the $line you're getting with getline. Did I miss something?
By the way, I could have done this without referencing the array in the while loop too:
sub printElements {
my file_handle = shift;
my $cities = shift; # This is an array reference!
my $counter;
my $array_size = #{ $cities }; # I need to deref to get an array
while ( my $line = $file_handle->getline and $counter < 1000 ) {
chomp $line;
my $city_number = $counter % $array_size;
print $cities->[$city_number]. "\n"; # That's it!
$counter += 1;
}
}
See how that -> syntax makes it easy to see that $cities is a reference that points to an array? A lot cleaner and easier to understand than ${$cities}[$city_number].
This code wouldn't actually compile.
print #{$_[1][($counter)%$arraySize];
It probably wants to be
print $_[1]->[($counter)%$arraySize];
after you fix arraySize.
If the result is somehow a pointer to an array then
print "#{$_[1]->[($counter)%$arraySize]}";
I figured how to solve my #1 problem (still looking for help on my #2 if anyone can).
I changed
my $arraySize = scalar $_[1];
to
my $arraySize = #{$_[1]};
And my second print statement is printing the way I want.
It seems that scalar $_[1] was taking the memory address of the array and I was moding against this allowing my $counter to go way beyond the number of elements in the array.
References confuse me too! I always like to dereference them as soon as possible. This works for me:
sub printElements{
my $counter = 0;
my $fh = $_[0];
my #array = #{$_[1]};
my $arraySize = scalar #array;
# Prints fine!!!
print #array[($counter)%$arraySize];
while ((my $line = $fh->getline()) && $counter < 1000){
#Doesn't print. Generates the above error
print #array[($counter)%$arraySize];
$counter += 1;
}
}
I'm sure someone else could explain in the comments why they think working with the reference is a better way (please do), but under the mantra of "keep it simple", I don't like working with them. Probably because I was never a C programmer...

Ways to Flatten A Perl Array in Scalar Context

I recently started learning perl and have a question that I'm not finding a clear answer to on the Internet. say I have something like this,
#arr = (1, 2, 3);
$scal = "#arr"
# $scal is now 123.
Is the use of quotes the only way to flatten the array so that each element is stored in the scalar value? It seems improbable but I haven't found any other ways of doing this. Thanks in advance.
The join function is commonly used to "flatten" lists. Lets you specify what you want between each element in the resulting string.
$scal = join(",", #arr);
# $scal is no "1,2,3"
In your example, you're interpolating an array in a double-quoted string. What happens in those circumstances is is controlled by Perl's $" variable. From perldoc perlvar:
$LIST_SEPARATOR
$"
When an array or an array slice is interpolated into a double-quoted string or a similar context such as /.../ , its elements are separated by this value. Default is a space. For example, this:
print "The array is: #array\n";
is equivalent to this:
print "The array is: " . join($", #array) . "\n";
Mnemonic: works in double-quoted context.
The default value for $" is a space. You can obviously change the value of $".
{
local $" = ':',
my #arr = (1, 2, 3);
my $scalar = "#arr"; # $scalar contains '1:2:3'
}
As with any of Perl's special variables, it's always best to localise any changes within a code block.
You could also use join without any seperator
my $scalar = join( '' , #array ) ;
There is more than one way to do it.
in the spirit of TIMTOWTDI:
my $scal;
$scal .= $_ foreach #arr;
Read section Context in perldata. Perl has two major contexts: scalar and list.
For example:
#a = (1, 1, 1); # list context
print #a; # list context
$count = #a; # scalar context, returns the number of elements in #a
etc.

Resources