Perl unit testing: check if a string is an array

I have this function that I want to test:
use constant NEXT => 'next';
use constant BACK => 'back';
sub getStringIDs {
return [
NEXT,
BACK
];
}
I've tried to write the following test, but it fails:
subtest 'check if it contains BACK' => sub {
use constant BACK => 'back';
my $strings = $magicObject->getStringIDs();
ok($strings =~ /BACK/);
}
What am I doing wrong?

Your getStringIDs() method returns an array reference.
The regex binding operator (=~) expects a string on its left-hand side. So it converts your array reference to a string. And a stringified array reference will look something like ARRAY(0x1ff4a68). It doesn't give you any of the contents of the array.
You can get from your array reference ($strings) to an array by dereferencing it (@$strings). And you can stringify an array by putting it in double quotes ("@$strings").
So you could do something like this:
ok("@$strings" =~ /BACK/);
But I suspect you want word boundary markers in there:
ok("@$strings" =~ /\bBACK\b/);
And you might also prefer the like() testing function.
like("@$strings", qr[\bBACK\b], 'Strings array contains BACK');
Update: Another alternative is to use grep to check that one of your array elements is the string "BACK".
# Note: grep in scalar context returns the number of elements
# for which the block evaluated as 'true'. If we don't care how
# many elements are "BACK", we can just check that return value
# for truth with ok(). If we care that it's exactly 1, we should
# use is(..., 1) instead.
# scalar(...) keeps the test name out of grep's argument list.
ok(scalar(grep { $_ eq 'BACK' } @$strings), 'Strings array contains BACK');
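Update 2: Hmm... the fact that you're using constants here complicates this. Constants are implemented as subroutines, regexes interpolate like strings, and subroutine calls aren't interpolated in strings. So /BACK/ matches the literal text "BACK", not the constant's value ("back").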
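A minimal sketch of testing against the constant's value rather than the literal text BACK. It reuses the constants from the question; $strings is a hand-built stand-in for the return value of getStringIDs(), and the $back and $count variables are only there for illustration.

```perl
use strict;
use warnings;
use Test::More;

use constant NEXT => 'next';
use constant BACK => 'back';

# Stand-in for $magicObject->getStringIDs() (assumed, built by hand).
my $strings = [ NEXT, BACK ];

# Compare each element against the constant's value with eq.
my $count = grep { $_ eq BACK } @$strings;
ok($count, 'array contains the value of BACK');

# A bare BACK inside /.../ is just literal text, so copy the
# constant into a scalar before using it in a pattern.
my $back = BACK;
like("@$strings", qr/\b\Q$back\E\b/, 'joined string contains the value of BACK');

done_testing;
```

The eq comparison is probably the closer match to "is this value in the array", since the regex version can also match inside a longer element.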

The return value of $magicObject->getStringIDs is an array reference, not a string. It looks like the spirit of your test is that you want to check whether at least one element in the array matches the pattern BACK. The way to do this is to grep through the dereferenced array and check that there is a non-zero number of matches.
ok( grep(/BACK/,@$strings) != 0, 'contains BACK' );
At one time, the smartmatch operator promised to be a solution to this problem ...
ok( $strings ~~ /BACK/ )
but it has fallen into disrepute and should be used with caution (and the no warnings 'experimental::smartmatch' pragma).

The in operator is your friend.
use Test::More;
use syntax 'in';
use constant NEXT => 'next';
use constant BACK => 'back';
ok BACK |in| [NEXT, BACK], 'BACK is in the arrayref';
done_testing;

Related

how to replace values in an array of hashes properly in Perl?

As seen below, I have a foreach loop inside which, a value inside an array of hashes is being replaced with a value from another array of hashes.
The second foreach loop is just to print and test whether the values got assigned correctly.
foreach my $row (0 .. $#row_buff) {
$row_buff[$row]{'offset'} = $vars[$row]{'expression'};
print $row_buff[$row]{'offset'},"\n";
}
foreach (0 .. $#row_buff) {
print $row_buff[$_]{'offset'},"\n";
}
Here @row_buff and @vars are the two arrays of hashes. They are prefilled with values for all keys used.
The hashes were pushed into the arrays like so:
push @row_buff, \%hash;
ISSUE:
Let's say the print statement in the first foreach prints like this:
string_a
string_b
string_c
string_d
Then the print statement in the second foreach loop prints like so:
string_d
string_d
string_d
string_d
This is what confuses me. Both print statements are supposed to print the exact same way am I right? But the value that gets printed by the second print statement is just the last value alone in a repeated manner. Could someone please point me to what could be going wrong here? Any hint is greatly appreciated. This is my first time putting up a question so pardon me if I missed anything.
UPDATE
There was a bit of information that I could have added, sorry about that everyone. There was one more line before the foreach, it was like so:
@row_buff = (@row_buff) x $itercnt;
foreach my $row (0 .. $#row_buff) {
$row_buff[$row]{'offset'} = $vars[$row]{'expression'};
print $row_buff[$row]{'offset'},"\n";
}
foreach (0 .. $#row_buff) {
print $row_buff[$_]{'offset'},"\n";
}
$itercnt is an integer. I was using it to replicate @row_buff that many times.
This clearly has to do with storing references on the array, instead of independent data. How that comes about isn't clear since details aren't given, but the following discussion should help.
Consider these two basic examples.
First, place a hash (reference) on an array, first changing a value each time
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd);
# use Storable qw(dclone);
my %h = ( a => 1, b => 2 );
my @ary_w_refs;
for my $i (1..3) {
    $h{a} = $i;
    push @ary_w_refs, \%h; # almost certainly WRONG
    # push @ary_w_refs, { %h }; # *copy* data
    # push @ary_w_refs, dclone \%h; # may be necessary, or just safer
}
dd $_ for @ary_w_refs;
I use Data::Dump for displaying complex data structures, for its simplicity and default compact output. There are other modules for this purpose, Data::Dumper being in the core (installed).
The above prints
{ a => 3, b => 2 }
{ a => 3, b => 2 }
{ a => 3, b => 2 }
See how the value for key a, which we changed in the hash each time and so supposedly set to a different value (1, 2, 3) for each array element, is the same in the end, and equal to the one we assigned last? (This appears to be the case in the question.)
This is because we assigned a reference to the same hash %h to each element. Even though we change the value for that key every time through the loop, each element ends up holding just a reference to that one hash.∗
So when the array is queried after the loop, we can only get what is in the hash (at key a it's the last assigned number, 3). The array doesn't have its own data, only a pointer to the hash's data.† (Thus the hash's data can be changed by writing to the array as well, as seen in the example below.)
Most of the time, we want a separate, independent copy. Solution? Copy the data.
Naively, instead of
push @ary_w_refs, \%h;
we can do
push @ary_w_refs, { %h };
Here {} is a constructor for an anonymous hash,‡ so %h inside gets copied. So actual data gets into the array and all is well? In this case, yes, where hash values are plain strings/numbers.
But what if the hash values themselves are references? Then those references get copied, and @ary_w_refs again does not have its own data! We'll have the exact same problem. (Try the above with the hash being ( a => [1..10] ))
If we have a complex data structure, carrying references for values, we need a deep copy. One good way to do that is to use a library, and Storable with its dclone is very good
use Storable qw(dclone);
...
push @ary_w_refs, dclone \%h;
Now array elements have their own data, unrelated (but at the time of copy equal) to %h.
This is a good thing to do with a simple hash/array as well, to be safe from future changes, whereby the hash is changed but we forget about the places where it's copied (or the hash and its copies don't even know about each other).
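To make the difference between the two kinds of copy concrete, here is a small sketch under the assumption that the hash holds one nested array. The variable names ($shallow, $deep and the two length variables) are made up for this illustration.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# A hash whose value is itself a reference (nested data).
my %h = ( a => [ 1, 2, 3 ] );

my $shallow = { %h };        # copies the top level only
my $deep    = dclone(\%h);   # copies the nested array as well

# Change the nested array through the original hash.
push @{ $h{a} }, 4;

my $shallow_len = scalar @{ $shallow->{a} };   # shares the array with %h
my $deep_len    = scalar @{ $deep->{a} };      # has its own array

print "shallow copy sees $shallow_len elements\n";   # 4
print "deep copy sees $deep_len elements\n";         # 3
```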
Another example. Let's populate an array with a hashref, and then copy it to another array
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd pp);
my %h = ( a => 1, b => 2 );
my @ary_src = \%h;
say "Source array: ", pp \@ary_src;
my @ary_tgt = $ary_src[0];
say "Target array: ", pp \@ary_tgt;
$h{a} = 10;
say "Target array: ", pp(\@ary_tgt), " (after hash change)";
$ary_src[0]{b} = 20;
say "Target array: ", pp(\@ary_tgt), " (after hash change)";
$ary_tgt[0]{a} = 100;
dd \%h;
(For simplicity I use arrays with only one element.)
This prints
Source array: [{ a => 1, b => 2 }]
Target array: [{ a => 1, b => 2 }]
Target array: [{ a => 10, b => 2 }] (after hash change)
Target array: [{ a => 10, b => 20 }] (after hash change)
{ a => 100, b => 20 }
That "target" array, which supposedly was merely copied off of a source array, changes when the distant hash changes! And when its source array changes. Again, it is because a reference to the hash gets copied, first to one array and then to the other.
In order to get independent data copies, again, copy the data, each time. I'd again advise to be on the safe side and use Storable::dclone (or an equivalent library of course), even with simple hashes and arrays.
Finally, note a slightly sinister last case -- writing to that array changes the hash! This (second-copied) array may be far removed from the hash, in a function (in another module) that the hash doesn't even know of. This kind of an error can be a source of really hidden bugs.
Now if you clarify where references get copied, with a more complete (simple) representation of your problem, we can offer a more specific remedy.
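For the list repetition shown in the question's update, a sketch along these lines may help. It assumes the rows are hashrefs with plain scalar values; fresh_rows() and the array names are invented here and are not from the question.

```perl
use strict;
use warnings;

# Hypothetical helper that builds two fresh hashrefs each time.
sub fresh_rows { return ( { offset => 'a' }, { offset => 'b' } ) }

my $itercnt = 2;

# List repetition copies the references, not the hashes:
# elements 0 and 2 (and 1 and 3) point to the same hash.
my @rows   = fresh_rows();
my @shared = (@rows) x $itercnt;
$shared[2]{offset} = 'changed';
print "shared[0]{offset} = $shared[0]{offset}\n";             # changed

# One way to get independent hashes: shallow-copy each element
# after repeating (enough when the values are plain scalars).
@rows = fresh_rows();
my @repeated    = (@rows) x $itercnt;
my @independent = map { +{ %$_ } } @repeated;
$independent[2]{offset} = 'changed';
print "independent[0]{offset} = $independent[0]{offset}\n";   # a
```

If the hashes carry nested references, the shallow copy in the map would not be enough and a deep copy such as Storable's dclone would be the safer choice.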
∗ An important way of using a reference that is correct, and which is often used, is when the structure taken the reference of is declared as a lexical variable every time through
for my $elem (@data) {
my %h = ...
...
push @results, \%h; # all good
}
That lexical %h is introduced anew every time so the data for its reference on the array is retained, as the array persists beyond the loop, independently for each element.
It is also more efficient doing it this way since the data in %h isn't copied, like it is with { %h }, but is just "re-purposed," so to say, from the lexical %h that gets destroyed at the end of iteration to the reference in the array.
This of course may not always be suitable, if a structure to be copied naturally lives outside of the loop. Then use a deep copy of it.
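A runnable version of that loop pattern, as a sketch with made-up data (the loop counter and the keys a and b are only placeholders):

```perl
use strict;
use warnings;

my @results;
for my $i (1 .. 3) {
    my %h = ( a => $i, b => 2 );   # a new hash on every iteration
    push @results, \%h;            # each reference points to its own hash
}

my @values = map { $_->{a} } @results;
print "@values\n";   # prints "1 2 3"
```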
The same kind of a mechanism works in a function call
sub some_func {
...
my %h = ...
...
return \%h; # good
}
my $hashref = some_func();
Again, the lexical %h goes out of scope as the function returns and it doesn't exist any more, but the data it carried and a reference to it is preserved, since it is returned and assigned so its refcount is non-zero. (At least returned to the caller, that is; it could've been passed yet elsewhere during the sub's execution so we may still have a mess with multiple actors working with the same reference.) So $hashref has a reference to data that had been created in the sub.
Recall that if a function was passed a reference (when it was called, or during its execution by calling yet other subs which return references), and it changed the data and returned the reference, then again we have data changed in some caller, potentially far removed from this part of program flow.
This is done often of course, with larger pools of data which can't just be copied around all the time, but then one needs to be careful and organize code (to be as modular as possible, for one) so as to minimize the chance of errors.
† This is a loose use of the word "pointer," for what a reference does, but if one were to refer to C I'd say that it's a bit of a "dressed" C-pointer
‡ In a different context it can be a block

Perl remove spaces from array elements

I have an array which contains n elements. Each element might have spaces at the beginning or at the end, and I want to remove them in one shot. Here are two code snippets, one working and one not working (the one which is not working is able to trim at the end but not at the front of the element).
Not Working:
....
use Data::Dumper;
my @a = ("String1", " String2 ", "String3 ");
print Dumper(\@a);
@a = map{ (s/\s*$//)&&$_}@a;
print Dumper(\@a);
...
Working:
...
use Data::Dumper;
my @a = ("String1", " String2 ", "String3 ");
print Dumper(\@a);
my @b = trim_spaces(@a);
print Dumper(\@b);
sub trim_spaces
{
    my @strings = @_;
    s/\s+//g for @strings;
    return @strings;
}
...
No idea what's the difference between these two.
If there is any better please share with me!!
Your "not working example" only removes spaces from one end of the string.
The expression s/^\s+|\s+$//g will remove spaces from both ends.
You can improve your code by using the /r flag to return a modified copy:
@a = map { s/^\s+|\s+$//gr } @a;
or, if you must:
@a = map { s/^\s+|\s+$//g; $_ } @a;
This block has two problems:
{ (s/\s*$//)&& $_ }
The trivial problem is that it's only removing trailing spaces, not leading, which you said you wanted to remove as well.
The more insidious problem is the misleading use of &&. If the regex in s/// doesn't find a match, s/// returns false (an empty string); on the left side of a &&, that means the right side is never evaluated, and that false value becomes the value of the whole block. Which means any string that the regex doesn't match will be replaced with an empty string in the result array returned by map, which is probably not what you want.
That won't actually happen with your regex as written, because every string matches \s*, and s/// still returns true even if it doesn't actually modify the string. But that's dependent on the regex and a bad assumption to make.
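To see the && problem with a regex that can actually fail to match, here is a small sketch. The pattern (leading whitespace only) and the sample data are chosen just for the demonstration.

```perl
use strict;
use warnings;

my @in = (" a", "b");

# s/// returns false (an empty string) when nothing was substituted,
# so "&& $_" yields that false value instead of the string.
my @out = map { (s/^\s+//) && $_ } @in;

# Quote the results so the empty string is visible.
print "'$_'\n" for @out;   # prints 'a' then ''
```

The second element "b" had no leading whitespace, so it was lost from the result even though nothing needed trimming.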
More generally, your approach mixes and matches two incompatible methods for modifying data: mutating in place (s///) versus creating a copy with some changes applied (map).
The map function is designed to create a new array whose elements are based in some way on an input array; ideally, it should not modify the original array in the process. But your code does – even if you weren't assigning the result of map back to @a, the s/// modifies the strings inside @a in place. In fact, you could remove the @a = from your code and get the same result. This is not considered good practice; map should be used for its return value, not its side effects.
If you want to modify the elements of an array in place, your for solution is actually the way to go. It makes it clear what you're doing and side effects are OK.
If you want to keep the original array around and make a new one with the changes applied, you should use the /r flag on the substitutions, which causes them to return the resulting string instead of modifying the original in place:
my @b = map { s/^\s+|\s+$//gr } @a;
That leaves @a alone and creates a new array @b with the trimmed strings.
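Putting the two approaches side by side, as a sketch: the sample data has an inner space added on purpose, and the array names other than @a are invented here. The /r flag needs Perl 5.14 or later.

```perl
use strict;
use warnings;

my @a = ("String1", " String 2 ", "String3 ");

# Non-destructive: /r returns the trimmed copy and leaves @a alone.
my @trimmed = map { s/^\s+|\s+$//gr } @a;

# In place: the loop variable aliases each element of the copy.
my @in_place = @a;
s/^\s+|\s+$//g for @in_place;

# s/\s+//g (as in the question's "working" version) also removes
# whitespace inside a string, which is usually not what trimming means.
my @squeezed = map { s/\s+//gr } @a;

print "[$_]\n" for @trimmed;
```

Note that the question's trim_spaces() would turn " String 2 " into "String2", so it only looks correct when the strings contain no inner whitespace.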

How to reference a split expression in Perl?

I want to create a reference to an array obtained by a split in Perl.
I'm thinking something like:
my $test = \split( /,/, 'a,b,c,d,e');
foreach $k (@$test) {
print "k is $k\n";
}
But that complains with Not an ARRAY reference at c:\temp\test.pl line 3.
I tried a few other alternatives, all without success.
Background explanation:
split, like other functions, returns a list. You cannot take a reference to a list. However, if you apply the reference operator to a list, it gets applied to all its members. For example:
use Data::Dumper;
my @x = \('a' .. 'c');
print Dumper \@x;
Output:
$VAR1 = [
\'a',
\'b',
\'c'
];
Therefore, when you write my $test = \split( /,/, 'a,b,c,d,e');, you get a reference to the last element of the returned list (see, for example, What’s the difference between a list and an array?). Your situation is similar to:
Although it looks like you have a list on the righthand side, Perl actually sees a bunch of scalars separated by a comma:
my $scalar = ( 'dog', 'cat', 'bird' ); # $scalar gets bird
Since you’re assigning to a scalar, the righthand side is in scalar context. The comma operator (yes, it’s an operator!) in scalar context evaluates its lefthand side, throws away the result, and evaluates its righthand side and returns the result. In effect, that list-lookalike assigns to $scalar its rightmost value. Many people mess this up because they choose a list-lookalike whose last element is also the count they expect:
my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally
In your case, what you get on the RHS is a list of references to the elements of the list returned by split, and the last element of that list ends up in $test. You first need to construct an array from those return values and take a reference to that. You can make that a single statement by forming an anonymous array and storing the reference to that in $test:
my $test = [ split( /,/, 'a,b,c,d,e') ];
Surround split command between square brackets to make an anonymous reference.
my $test = [ split( /,/, 'a,b,c,d,e') ];
Giving it a name has different semantics: changes to the named variable then change what the reference points to, while each anonymous array is unique. I discovered this the hard way by doing this in a loop.
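A sketch of that loop pitfall, assuming a named array declared outside the loop. The names @fields, @by_name and @by_copy and the input lines are made up for this illustration.

```perl
use strict;
use warnings;

my @fields;                 # one named array, declared outside the loop
my (@by_name, @by_copy);

for my $line ('a,b', 'c,d') {
    @fields = split /,/, $line;
    push @by_name, \@fields;      # a reference to the same array every time
    push @by_copy, [ @fields ];   # a new anonymous array every time
}

print "by name: $by_name[0][0] $by_name[1][0]\n";   # c c
print "by copy: $by_copy[0][0] $by_copy[1][0]\n";   # a c
```

Declaring the array with my inside the loop would also give a separate array per iteration.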

Shorthand to modify value in a array of hash refs

I have an array of hash refs. The date field in each hash is stored as an epoch time. I have to format it to be human readable before returning the array. Following is my code:
for my $post (@sorted) {
    $post->{date} = format_time($post->{date});
    push @formatted, $post;
}
I have tried
my @formatted = map {$_{date} = format_time($_{date})} @sorted;
All fields except {date} are dropped.
Is there any smarter method?
Thanks
$_->{date} = format_time($_->{date}) for @sorted;
Then the dates in @sorted will have been converted.
There's nothing really wrong with the for loop you're currently using. The map can work too, but there are two problems:
The hashref in the array is stored in the scalar $_. You are accessing the hash %_.
The return value of the block is what will end up in the result array. In your case, that's the result of the assignment rather than the entire hashref.
Also, do note that the hashrefs in @sorted will be modified. The following map statement should work for you:
my @formatted = map { $_->{date} = format_time($_->{date}); $_ } @sorted;
If you really want:
sub format_time_in_place {
my $time = $_[0];
# do work
$_[0] = $reformatted_time;
}
# elsewhere
format_time_in_place($_->{date}) for @sorted;
I helpfully renamed the function to reduce the odds of the maintenance programmer being tempted to become a homicidal axe murderer. There still may be an element of shock if said programmer was not aware that you can change passed in arguments with the correct manipulation of @_.
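A filled-in sketch of the @_ aliasing, with an assumed formatter body: the use of POSIX strftime with gmtime and the '%Y-%m-%d' format are placeholders, since the real format_time is not shown in the question.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical formatter: overwrites its argument through the @_ alias.
sub format_time_in_place {
    my $time = $_[0];
    $_[0] = strftime('%Y-%m-%d', gmtime($time));
    return;
}

my @sorted = (
    { date => 0,     title => 'first'  },
    { date => 86400, title => 'second' },
);

format_time_in_place($_->{date}) for @sorted;

print "$_->{title}: $_->{date}\n" for @sorted;
# first: 1970-01-01
# second: 1970-01-02
```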
This is equivalent to your code:
$_->{date} = format_time($_->{date}) for @sorted;
@formatted = @sorted;
I don't know why you want two identical arrays, but I don't see the point of combining those two unrelated operations. It'll just make your code less readable.
If you want or don't mind not referencing the same hashes as are in @sorted, you can:
my @formatted = map +{ %$_, 'date' => format_time($_->{date}) }, @sorted;
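A sketch showing that this last form leaves the original hashes untouched. The format_time() here is a hypothetical stand-in (POSIX strftime over gmtime), not the asker's function.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical stand-in for the question's format_time().
sub format_time { return strftime('%Y-%m-%d', gmtime($_[0])) }

my @sorted = ( { date => 0, title => 'first' } );

# Build new hashes; the later 'date' pair overrides the copied one.
my @formatted = map +{ %$_, date => format_time($_->{date}) }, @sorted;

print "original:  $sorted[0]{date}\n";      # 0
print "formatted: $formatted[0]{date}\n";   # 1970-01-01
```

This is a shallow copy, so any nested references inside the hashes would still be shared between @sorted and @formatted.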

Perl How to access a hash that is the element of an array that is the value of another hash?

I am trying to create a Hash that has as its value an array.
The first element of the value(which is an array) is a scalar.
The second element of the value(which is an array) is another hash.
I have put values in the key and value of this hash as follows :
${${$senseInformationHash{$sense}[1]}{$word}}++;
Here,
My main hash -> senseInformationHash
My Value -> Is an Array
So, ${$senseInformationHash{$sense}[1]} gives me reference to my hash
and I put in key and value as follows :
${${$senseInformationHash{$sense}[1]}{$word}}++;
I am not sure if this is a correct way to do it. Since I am stuck and not sure how I can print this complex thing out. I want to print it out in order to check if I am doing it correctly.
Any help will be very much appreciated. Thanks in advance!
Just write
$sense_information_hash{$sense}[1]{$word}++;
and be done with it.
Perl gets jealous of CamelCase, you know, so you should use proper underscores. Otherwise it can spit and buck and generally misbehave.
A hash value is never an array, it is an array reference.
To see if you are doing it right, you can dump out the whole structure:
my %senseInformationHash;
my $sense = 'abc';
my $word = '123';
${${$senseInformationHash{$sense}[1]}{$word}}++;
use Data::Dumper;
print Dumper( \%senseInformationHash );
which gets you:
$VAR1 = {
'abc' => [
undef,
{
'123' => \1
}
]
};
Note the \1: presumably you want the value to be 1, not a reference to the scalar 1. You are getting the latter because your ${ ... }++; says treat what's in the curly braces as a scalar reference and increment the scalar referred to.
${$senseInformationHash{$sense}[1]}{$word}++; does what you want, as does $senseInformationHash{$sense}[1]{$word}++. You may find http://perlmonks.org/?node=References+quick+reference helpful in seeing why.
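A small sketch of building and reading the structure with the plain subscript chain. The sample words and the use of element 0 as a running count are assumptions made for the illustration.

```perl
use strict;
use warnings;

my %sense_information_hash;
my $sense = 'abc';

# Element 0 holds a count for the sense; element 1 holds a hash of word counts.
for my $word (qw(x y x)) {
    $sense_information_hash{$sense}[0]++;
    $sense_information_hash{$sense}[1]{$word}++;
}

# Dereference the inner hash to list its keys.
for my $word (sort keys %{ $sense_information_hash{$sense}[1] }) {
    print "$word => $sense_information_hash{$sense}[1]{$word}\n";
}
# x => 2
# y => 1
```

The intermediate array and hash are created automatically (autovivified) the first time they are written to.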
Thanks Axeman and TChrist.
The code I have to access it is as follows :
foreach my $outerKey (keys(%sense_information_hash))
{
print "\nKey => $outerKey\n";
print " Count(sense) => $sense_information_hash{$outerKey}[0]\n";
foreach(keys (%{$sense_information_hash{$outerKey}[1]}) )
{
print " Word wt sense => $_\n";
print " Count => $sense_information_hash{$outerKey}[1]{$_}\n";
}
}
This is working now. Thanks much!
