Why doesn't ||= work with arrays?

I use the ||= operator to provide default values for variables, like
$x ||= 1;
I tried to use this syntax with an array but got a syntax error:
@array ||= 1..3;
Can't modify array dereference in logical or assignment (||=) ...
What does it mean and how should I provide arrays with default values?

Because || is a scalar operator. If @array ||= 1..3; worked, it would evaluate 1..3 in scalar context (where .. is the flip-flop operator, not a range), which is not what you want. It would also evaluate the array in scalar context (which is OK, because an empty array in scalar context is false), except that you can't assign to scalar(@array).
To assign a default value, use:
@array = 1..3 unless @array;
But note that there's no way to tell the difference between an array that has never been initialized and one that has been assigned the empty list. It's not like a scalar, where you can distinguish between undef and the empty string (although ||= doesn't distinguish between them).
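As a minimal sketch of the unless idiom above (plain Perl, no modules assumed), this shows the default being applied to an empty array, skipped for a non-empty one, and, per the caveat, also applied to an array that was explicitly emptied:

```perl
use strict;
use warnings;

# An empty array is false in boolean (scalar) context, so the default applies.
my @array;
@array = (1 .. 3) unless @array;
print "@array\n";    # 1 2 3

# A non-empty array is true, so it keeps its contents.
my @other = (7, 8);
@other = (1 .. 3) unless @other;
print "@other\n";    # 7 8

# The caveat: an array explicitly set to () is indistinguishable from
# a never-filled one, so it also receives the default.
my @cleared = ();
@cleared = (1 .. 3) unless @cleared;
print "@cleared\n";  # 1 2 3
```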
eugene y found this perl.perl5.porters message (the official Perl developers' mailing list) that goes into more detail about this.

This page has a good explanation, IMHO:
op= can occur between any two expressions, not just a var and an expression, but the left one must be an lvalue in scalar context.
Since @x ||= 42 is equivalent to scalar(@x) = @x || 42, and you aren't allowed to use scalar(@x) as an lvalue, you get an error.
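To see the right-hand half of that equivalence in action, here is a small sketch (the variable names are illustrative only): the left operand of || is evaluated in scalar context, so a non-empty array contributes its element count, never its elements.

```perl
use strict;
use warnings;

my @x     = ('a', 'b', 'c');
my @empty = ();

# @x is evaluated in scalar context by ||, giving its length.
my $from_full  = @x || 42;      # 3, not ('a', 'b', 'c')

# An empty array is false, so || falls through to 42.
my $from_empty = @empty || 42;  # 42

print "$from_full $from_empty\n";  # 3 42
```

Even if scalar(@x) were assignable, @x ||= 42 could only ever store a count, which is why the idiom makes no sense for arrays.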

Related

Perl: Negative range indexing of array reference

I have an array reference from which I would like to slice the last two elements. I found that using -2..-1 would work. I was using the following syntax:
subroutine($var->[-2..-1]);
This gave me the following error:
Use of uninitialized value $. in range (or flip)
Argument "" isn't numeric in array element
I changed the line to this and that worked:
subroutine(@$var[-2..-1]);
I don't understand why the second way works and the first doesn't. I thought using the arrow operator was the same as dereferencing with @. Is the context ambiguous with the arrow operator?
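-> is the dereference operator. $aref->[$i] is to $aref what $arr[$i] is to @arr. To get a slice from an array, you need to change the sigil: @arr[$i, $j]. It's similar for a reference, but instead of changing the sigil, you first dereference the reference, then slice it:
@{ $aref }[$i, $j]
which can be shortened to @$aref[$i, $j].
So the ->[...] and ->{...} forms only fetch single values from array and hash references; you need @{} for slices (Perl 5.24 and later also accept the postfix slice $aref->@[$i, $j]). Your first attempt fails because the subscript in $var->[...] is evaluated in scalar context, where .. is the flip-flop operator rather than a range. With constant operands it compares them against the current input line number $., which is undefined here, hence the warnings; the resulting empty string is then used as index 0.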
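A runnable sketch of the three slice spellings, using a made-up array reference. It assumes Perl 5.24 or later for the postfix form (use v5.24 also turns on strict); drop that line on older perls.

```perl
use v5.24;      # enables strict; postfix dereference is stable from 5.24
use warnings;

my $aref = [10, 20, 30, 40];

# Dereference the whole array with @{ ... }, then slice it.
my @last_two = @{ $aref }[-2 .. -1];

# The braces can be dropped when the reference is a simple scalar.
my @same = @$aref[-2 .. -1];

# Postfix slice syntax (Perl 5.24+).
my @postfix = $aref->@[-2 .. -1];

# The arrow form with [...] fetches exactly one element.
my $last = $aref->[-1];

print "@last_two | @same | @postfix | $last\n";  # 30 40 | 30 40 | 30 40 | 40
```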

Perl: Length of an anonymous list

How to get the length of an anonymous list?
perl -E 'say scalar ("a", "b");' # => b
I expected scalar to evaluate the list in scalar context and return its length.
Why does it return the second (last) element?
It works for an array:
perl -E 'my @lst = ("a", "b"); say scalar @lst;' # => 2
You can use
my $n = () = f();
As applied to your case, it's
say scalar( () = ("a", "b") );
or
say 0+( () = ("a", "b") );
First, let's clear up a misconception.
You appear to believe that some operators evaluate to some kind of data structure called a list regardless of context, and that this list returns its length when coerced into scalar context.
All of that is incorrect.
An operator must evaluate to exactly one scalar in scalar context, and a sub must return exactly one scalar in scalar context. In list context, operators can evaluate to any number of scalars, and subs can return any number of scalars. So when we say an operator evaluates to a list, and when we say a sub returns a list, we aren't referring to some data structure; we are simply using "list" as a shorthand for "zero or more scalars".
Since there's no such thing as a list data structure, it can't be coerced into a scalar. Context isn't a coercion; context is something operators check to determine to what they evaluate in the first place. They literally let context determine their behaviour and what they return. It's up to each operator to decide what they return in scalar and list context, and there's a lot of variance.
As you've noted,
The @a operator in scalar context evaluates to a single scalar: the length of the array.
The comma operator in scalar context evaluates to a single scalar: the same value as its last operand.
The qw operator in scalar context evaluates to a single scalar: the last value it would normally return.
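To illustrate that each operator (and each sub) decides for itself what it returns in scalar context, here is a short sketch. reverse is a real built-in whose scalar-context behaviour (concatenate, then reverse the string) differs completely from its list-context behaviour; context_demo is a hypothetical sub that inspects its context with wantarray.

```perl
use strict;
use warnings;

my @a = ('a', 'b', 'c');

my $count    = @a;                   # array in scalar context: its length
my $reversed = reverse('ab', 'cd');  # scalar context: 'abcd' reversed, 'dcba'
my ($first)  = reverse('ab', 'cd');  # list context: ('cd', 'ab'), keep the first

my $from_sub = context_demo();       # called in scalar context
my @from_sub = context_demo();       # called in list context

# Hypothetical sub: it checks its calling context and chooses what to return.
sub context_demo { return wantarray ? 'list' : 'scalar' }

print "$count $reversed $first $from_sub @from_sub\n";  # 3 dcba cd scalar list
```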
On to your question.
To determine to how many scalars an operator would evaluate when evaluated in list context, we need to evaluate the operator in list context. An operator always evaluates to a single scalar in scalar context, so your attempts to impose scalar context are ill-founded (unless the operator happens to evaluate to the length of what it would have returned in list context, as is the case for @a, but not for many other operators).
The solution is to use
my $n = () = f();
The explanation is complicated.
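The short version, as a sketch with a hypothetical sub f: assigning to the empty list () puts f() in list context, and a list assignment that is itself evaluated in scalar context (here by my $n = ...) yields the number of elements on its right-hand side.

```perl
use strict;
use warnings;

# Hypothetical sub that returns three scalars in list context.
sub f { return ('x', 'y', 'z') }

# () = f()  : f() runs in list context; the values are discarded.
# $n = ...  : the list assignment in scalar context gives the RHS element count.
my $n = () = f();

print "$n\n";  # 3
```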
One way
perl -wE'$len = () = qw(a b c); say $len' #--> 3
The = () = "operator" is a play on context. It forces list context on its right side and assigns the length of the list. See this post about list vs scalar assignments and this page for some thoughts on all this.
If this is used where list context applies (such as the arguments to say), scalar context can be forced with scalar, like
say scalar( () = qw(a b c) );
Or in other ways (0+...), but scalar is the most suitable and clearest choice here.
In your attempt, scalar imposes scalar context on its operand, here a comma expression, so it is evaluated by the comma operator: each term in turn is discarded until the last one, which is returned.
With warnings enabled you would have been told about this, since perl emits
Useless use of a constant ("a") in void context at -e line 1
Warnings can always be enabled in one-liners as well, with -w flag. I recommend that.
I'd also like to comment on the notion of a "list" in Perl, which is often misunderstood.
In program text, a "list" is merely a syntactic device that code can use: a number of scalars, perhaps passed to a function or assigned to an array variable. It is often identified by parentheses, but those only determine precedence; they don't "make" anything, nor do they give a "list" any sort of individuality the way a variable has one. A list is just a grouping of scalars.
Internally that's how data is moved around; a "list" is a fleeting bunch of scalars on a stack, returned somewhere and gone.
A list is not -- not -- any kind of a data structure or a data type; that would be an array. See for instance a perlfaq4 item and this related page.
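Perl references are hard. I'm not sending you to read perlref, since it's not something anyone can just read and start using.
Long story short, use this pattern to get the size of an anonymous array: 0+@{[ <...your array expression...> ]}. The [ ... ] builds an anonymous array from the expression evaluated in list context, @{ ... } dereferences it, and 0+ puts that array in numeric (scalar) context, which yields its length.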
Example:
print 0+@{[ ("a", "b") ]};

Perl unit testing: check if a string is an array

I have this function that I want to test:
use constant NEXT => 'next';
use constant BACK => 'back';
sub getStringIDs {
return [
NEXT,
BACK
];
}
I've tried to write the following test, but it fails:
subtest 'check if it contains BACK' => sub {
use constant BACK => 'back';
my $strings = $magicObject->getStringIDs();
ok($strings =~ /BACK/);
};
What am I doing wrong?
Your getStringIDs() method returns an array reference.
The regex binding operator (=~) expects a string on its left-hand side. So it converts your array reference to a string. And a stringified array reference will look something like ARRAY(0x1ff4a68). It doesn't give you any of the contents of the array.
You can get from your array reference ($strings) to an array by dereferencing it (@$strings). And you can stringify an array by putting it in double quotes ("@$strings"), which joins the elements with spaces.
So you could do something like this:
ok("@$strings" =~ /BACK/);
But I suspect you want word boundary markers in there:
ok("@$strings" =~ /\bBACK\b/);
And you might also prefer the like() testing function.
like("@$strings", qr[\bBACK\b], 'Strings array contains BACK');
Update: Another alternative is to use grep to check that one of your array elements is the string "BACK".
# Note: grep in scalar context returns the number of elements
# for which the block evaluated as 'true'. If we don't care how
# many elements are "BACK", we can just check that return value
# for truth with ok(). If we care that it's exactly 1, we should
# use is(..., 1) instead. scalar() keeps grep from also swallowing
# the test name as one of its list elements.
ok(scalar(grep { $_ eq 'BACK' } @$strings), 'Strings array contains BACK');
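Update 2: Hmm... the fact that you're using constants here complicates this. Constants are subroutines, regexes are interpolated like double-quoted strings, and subroutines aren't interpolated in strings. So /BACK/ and 'BACK' look for the literal text BACK, whereas the constant BACK holds 'back'. To test against the constant's value, use the constant itself, e.g. grep { $_ eq BACK } @$strings.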
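Putting those points together, here is a sketch of a passing version of the test using the core Test::More module. It assumes a plain sub getStringIDs as a hypothetical stand-in for $magicObject->getStringIDs, and copies the constant into a lexical so it can be interpolated into the regex.

```perl
use strict;
use warnings;
use Test::More;

use constant NEXT => 'next';
use constant BACK => 'back';

# Hypothetical stand-in for $magicObject->getStringIDs().
sub getStringIDs { return [NEXT, BACK] }

my $strings = getStringIDs();

subtest 'check if it contains BACK' => sub {
    # Compare against the constant's value ('back'), not the text 'BACK'.
    ok(scalar(grep { $_ eq BACK } @$strings), 'array contains BACK');

    # Constants don't interpolate, so copy the value into a variable first;
    # \Q...\E quotes any regex metacharacters in it.
    my $back = BACK;
    like("@$strings", qr/\b\Q$back\E\b/, 'stringified array matches BACK');
};

done_testing;
```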
The return value of $magicObject->getStringIDs is an array reference, not a string. It looks like the spirit of your test is that you want to check if at least one element in the array pattern matches BACK. The way to do this is to grep through the dereferenced array and check if there are a non-zero number of matches.
ok( grep(/BACK/, @$strings) != 0, 'contains BACK' );
At one time, the smartmatch operator promised to be a solution to this problem ...
ok( $strings ~~ /BACK/ )
but it has fallen into disrepute and should be used with caution (and the no warnings 'experimental::smartmatch' pragma).
The in operator is your friend.
use Test::More;
use syntax 'in';
use constant NEXT => 'next';
use constant BACK => 'back';
ok BACK |in| [NEXT, BACK], 'BACK is in the arrayref';
done_testing;

Perl structures and assignment

In perldata, I found the following examples and explanations.
@foo = ('cc', '-E', $bar);
assigns the entire list value to array @foo, but
$foo = ('cc', '-E', $bar);
assigns the value of variable $bar to the scalar variable $foo.
This really confuses me: is $foo equivalent to $bar? How should I understand the difference between @foo and $foo?
The examples in perldata:
@foo = ('cc', '-E', $bar);
$foo = ('cc', '-E', $bar);
Because assigning to @foo creates list context, all the values in the parentheses are assigned to @foo. $foo, on the other hand, is a scalar, so it is only assigned the last element in the list, because the right-hand side is in scalar context.
It is equal to saying:
'cc', '-E';
$foo = $bar;
In Perl, a scalar, like $foo, can only hold a single value, and so the rest of the list is simply discarded. An array, like @foo, will slurp as many values as the list holds.
In Perl, variables of different types are allowed to share the same name. @foo and $foo are two different variables.
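A small sketch of the two perldata assignments side by side; the value of $bar is made up for the demo. The no warnings 'void' block only silences the "Useless use of a constant" warning that the scalar-context comma expression triggers.

```perl
use strict;
use warnings;

my $bar = 'file.c';  # hypothetical value for $bar

# List assignment: @foo receives all three values.
my @foo = ('cc', '-E', $bar);

# Scalar assignment: the comma operator discards 'cc' and '-E'
# and yields its last operand. $foo and @foo are separate variables.
my $foo;
{
    no warnings 'void';
    $foo = ('cc', '-E', $bar);
}

# Contrast: an array (not a comma expression) in scalar context gives its length.
my $n = @foo;

print "@foo | $foo | $n\n";  # cc -E file.c | file.c | 3
```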
Expressions are allowed to have different meanings depending on which context they're evaluated in. The three main contexts are list, scalar, and void, though there exist several subcontexts of scalar context (boolean, string, and numeric being the most important ones).
The comma operator is no exception to this rule. In list context, the comma operator acts as a list concatenation operator, evaluating its operands in list context and combining the resulting lists into a single list. This is likely the context you're familiar with when dealing with the comma operator.
However, in scalar context, the comma operator functions much like the comma operator in C; it evaluates a sequence of expressions and discards their results, except for the rightmost expression which it returns (as a side note, the expressions that are discarded are evaluated in void context, and the expression that's returned is evaluated in scalar context). To learn how each of the perl operators behave in different contexts, I suggest reading perlop.
In order to fully understand context, you have to realize that the outermost operator imposes a context on its operands, whose operators then impose a context on their operands, and so on (another side note: the outermost expression of a statement is evaluated in void context, except for the last statement of a sub, which is evaluated in the caller's context). So, for example, when the assignment operator is used with an array or hash variable (beginning with @ or %), the right-hand side of the assignment is consequently evaluated in list context. If the variable is a scalar, however, the right-hand side of the assignment is evaluated in scalar context instead. This is why the comma operators in the assignments below:
@foo = ('cc', '-E', $bar);
$foo = ('cc', '-E', $bar);
act in completely different ways.
For more information on how you can write code that controls or reacts to context, read about the scalar and wantarray operators.
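As a sketch of the left-hand side deciding the right-hand side's context (the array contents are illustrative): the same @arr gives a count or its first element depending only on whether the assignment target is a scalar or a list, and scalar() can force scalar context inside a list.

```perl
use strict;
use warnings;

my @arr = ('x', 'y', 'z');

my $n    = @arr;                   # scalar assignment: @arr in scalar context, 3
my ($x)  = @arr;                   # list assignment: @arr in list context, $x gets 'x'
my @copy = (scalar(@arr), @arr);   # scalar() forces scalar context for one operand

print "$n $x @copy\n";  # 3 x 3 x y z
```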

How to reference a split expression in Perl?

I want to create a reference to an array obtained by a split in Perl.
I'm thinking something like:
my $test = \split( /,/, 'a,b,c,d,e');
foreach $k (@$test) {
print "k is $k\n";
}
But that complains with Not an ARRAY reference at c:\temp\test.pl line 3.
I tried a few other alternatives, all without success.
Background explanation:
split, like other functions, returns a list. You cannot take a reference to a list. However, if you apply the reference operator to a list, it gets applied to all its members. For example:
use Data::Dumper;
my @x = \('a' .. 'c');
print Dumper \@x;
Output:
$VAR1 = [
\'a',
\'b',
\'c'
];
Therefore, when you write my $test = \split( /,/, 'a,b,c,d,e');, you get a reference to the last element of the returned list (see, for example, What’s the difference between a list and an array?). Your situation is similar to:
Although it looks like you have a list on the righthand side, Perl actually sees a bunch of scalars separated by a comma:
my $scalar = ( 'dog', 'cat', 'bird' ); # $scalar gets bird
Since you’re assigning to a scalar, the righthand side is in scalar context. The comma operator (yes, it’s an operator!) in scalar context evaluates its lefthand side, throws away the result, and evaluates its righthand side and returns the result. In effect, that list-lookalike assigns to $scalar its rightmost value. Many people mess this up because they choose a list-lookalike whose last element is also the count they expect:
my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally
In your case, what you get on the RHS is a list of references to the elements of the list returned by split, and the last element of that list ends up in $test. You first need to construct an array from those return values and take a reference to that. You can make that a single statement by forming an anonymous array and storing the reference to that in $test:
my $test = [ split( /,/, 'a,b,c,d,e') ];
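For completeness, the question's loop rewritten as a runnable sketch with that fix applied (strict and warnings added, so $k is declared with my):

```perl
use strict;
use warnings;

# [ ... ] collects split's return values into a new anonymous array
# and evaluates to a reference to that array.
my $test = [ split /,/, 'a,b,c,d,e' ];

foreach my $k (@$test) {
    print "k is $k\n";
}
```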
Surround the split call with square brackets to build an anonymous array and get a reference to it:
my $test = [ split( /,/, 'a,b,c,d,e') ];
Taking a reference to a named array instead (\@array) has different semantics: later changes to that named variable change what the reference points to, whereas each [ ... ] creates a new, unique anonymous array. I discovered this the hard way by doing this in a loop.
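A sketch of that loop pitfall, with made-up input lines: reusing one named array declared outside the loop stores the same reference every time, while [ split ... ] yields a fresh array per iteration.

```perl
use strict;
use warnings;

my @lines = ('a,b', 'c,d');

# Pitfall: one named array, declared outside the loop, is refilled each time.
my @fields;
my @shared_refs;
for my $line (@lines) {
    @fields = split /,/, $line;
    push @shared_refs, \@fields;   # every element points at the same @fields
}

# Fix: [ ... ] builds a new anonymous array on each iteration.
my @fresh_refs;
for my $line (@lines) {
    push @fresh_refs, [ split /,/, $line ];
}

print join(' | ', map { "@$_" } @shared_refs), "\n";  # c d | c d
print join(' | ', map { "@$_" } @fresh_refs),  "\n";  # a b | c d
```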
