Understanding referencing within Perl arrays and how data is accessed

I could do with some help on Perl and how it handles its arrays. A long time ago I used to do quite a lot of coding (hacking would be a better description, never pretty work) using PHP, Java, JS, etc., but for various reasons I'm using Perl for a project and I'm struggling to work out why I'm finding arrays such a pain.
For example, the following code:
@inflightsequences=([1,6,[["SRCIP","1.2.3.4"],["DSTIP","5.6.7.8"]]],[2,2,[["SRCIP","1.2.3.4"],["DSTIP","5.6.7.8"]]]);
foreach (@inflightsequences) {print Dumper @_};
where the definition of the array creates this (printed using Dumper)
$VAR1 = [
1,
6,
[
[
'SRCIP',
'1.2.3.4'
],
[
'DSTIP',
'5.6.7.8'
]
]
];
$VAR2 = [
2,
2,
[
[
'SRCIP',
'1.2.3.4'
],
[
'DSTIP',
'5.6.7.8'
]
]
];
(NB I'll refer to the data inside the array using VAR1 and VAR2 from now on so it's clear which block I'm referring to, regardless of whether that's actually what Dumper calls it)
...but the foreach outputs absolutely nothing, when I expected it to cycle twice and output what's in VAR1 first, then what's in VAR2. However
print Dumper @inflightsequences[0];
print Dumper @inflightsequences[1];
does print out VAR1 and VAR2 as expected.
Then I extract the first item from the @inflightsequences array
@dataset = shift(@inflightsequences);
and I expected print $dataset[1] to print out the first value (1) in what was VAR1 and print $dataset[2] to print the second value (6), but no, to achieve what I expected I have to do print $dataset[0][0] and print $dataset[0][1]. Why the extra [0]?
And thus I have got myself completely confused....
Thanks
--Chris

What is confusing you is that the elements of Perl arrays are always scalar values. You make arrays of arrays by using references for those scalars.
You can create an array reference either by building a named array and taking its reference
my @data = ( 'a', 'b', 'c' );
my $array_ref = \@data;
or by creating an anonymous array
my $array_ref = [ 'a', 'b', 'c' ];
The only difference between these two is that the data can be accessed through the name @data in the first case as well as through the reference $array_ref. To access elements of an array through a reference, you use the arrow operator, so
$array_ref->[0]
is the same as
$data[0]
The reason your foreach loop prints nothing is that you are dumping the contents of the array @_ which you have never mentioned before and is empty. @_ is the array that is set within a subroutine to the actual parameters passed when that subroutine is called. It has no use otherwise.
Remembering that array elements are scalars, and that if you don't specify a loop control variable then Perl will use $_, what you should have written is
foreach (@inflightsequences) { print Dumper($_) }
or, more Perlishly,
print Dumper($_) for @inflightsequences
The same applies to your statement
@dataset = shift(@inflightsequences)
which, again, because the contents of @inflightsequences are scalars, is removing the first array reference and putting it into @dataset, which is now just a one-element array containing an array reference. That means you have moved $inflightsequences[0] to $dataset[0], which is now equal to
[1, 6, [ ["SRCIP", "1.2.3.4"], ["DSTIP", "5.6.7.8"] ] ]
not forgetting that the square brackets create a reference to an anonymous array. So, like our $array_ref->[0] above, you can access the first element of this array using $dataset[0]->[0]. And because Perl allows you to remove the arrow operator between pairs of square brackets (and curly brackets if we're using hashes) you can contract that to $dataset[0][0] which happens to be the value 1.
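As a minimal sketch of the point above, assuming the data from the question: shifting into a scalar keeps the array reference as a reference, and the arrow operator then reaches into it. The names $first and @copy are illustrative, not from the original post.

```perl
use strict;
use warnings;

my @inflightsequences = (
    [ 1, 6, [ [ "SRCIP", "1.2.3.4" ], [ "DSTIP", "5.6.7.8" ] ] ],
    [ 2, 2, [ [ "SRCIP", "1.2.3.4" ], [ "DSTIP", "5.6.7.8" ] ] ],
);

# shift returns one scalar: the reference to the first inner array.
my $first = shift @inflightsequences;

print "first value:  $first->[0]\n";          # 1
print "second value: $first->[1]\n";          # 6
print "nested key:   $first->[2][0][0]\n";    # SRCIP
print "nested value: $first->[2][1][1]\n";    # 5.6.7.8

# Dereference the whole reference if a real array is wanted.
my @copy = @$first;
print "elements in copy: ", scalar(@copy), "\n";   # 3
```

With @copy the indices are the ones the question expected: $copy[0] is 1 and $copy[1] is 6.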
I hope that helps. You would do well to read perlref and experiment a little. Note that the Data::Dump module produces output much superior to Data::Dumper, but you may need to install it as it isn't a core module. Once it's installed the code will look like
use Data::Dump;
dd \@inflightsequences;

@inflightsequences is an array of array references, so
$r=shift(@inflightsequences);
print $r->[0];
will show 1
and
print $r->[1];
will show 6
doing
@dataset=shift(@inflightsequences)
makes an array from the result of the shift. So it's an array with one element, the shift result, which is accessed as $dataset[0]. $dataset[0]->[1] will give 6, for example.

Related

2D Array Printing as Reference

I have the code similar to below:
my @array1 = (); # 2D array to be used
my $string1 = "blank1";
my $string2 = "blank2";
my $string3 = "blank3";
my @temp = ($string1, $string2, $string3);
push (@array1, \@temp);
The reason I am assigning the strings and then putting them into an array is because they are in a loop and the values get updated in the loop (@array1 is not declared in the loop).
When I run my program, it only gives me a reference to an array rather than an actual 2D array. How can I get it to print out the content as a 2D array and not as a reference or flattened out to a 1D array?
I would like an output like [[blank1, blank2, blank3],....] so i can access it like $array1[i][j]
An array can only have scalars for elements. This includes references, to arrays for example, which is what enables us to build complex data structures. See perldsc, Tom's Perl Data Structure Cookbook.
Elements of those ("second-level") arrays are accessed by dereferencing, so $array1[0]->[1] is the second element of the array whose reference is the first element of the top-level array (@array1). Or, for convenience, a simpler syntax is allowed as well: $array1[0][1].
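A small sketch of how the structure from the question might be built and read back, assuming a loop that produces three strings per pass. The loop bounds and the "r1c1"-style values are made up for illustration.

```perl
use strict;
use warnings;

my @array1;    # top-level array; each element will be an array reference

for my $n (1 .. 2) {
    # Hypothetical per-iteration values standing in for the question's strings.
    my ($string1, $string2, $string3) = map { "r${n}c$_" } 1 .. 3;

    # [ ... ] builds a fresh anonymous array on every pass.
    push @array1, [ $string1, $string2, $string3 ];
}

print "$array1[1][2]\n";    # r2c3

# Render as [[..., ...], [..., ...]] by dereferencing each row.
my $text = '[' . join(', ', map { '[' . join(', ', @$_) . ']' } @array1) . ']';
print "$text\n";            # [[r1c1, r1c2, r1c3], [r2c1, r2c2, r2c3]]
```

Printing @array1 directly shows ARRAY(0x...) strings because its elements are references; the rows have to be dereferenced, as the map does here.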
If we want a list of all elements of a second-level array then dereference it with @, like:
my @l2 = @{ $array1[0] }; # or, using
my @l2 = $array1[0]->@*; # postfix dereferencing
Or, to get just a few elements of the array, but in one scoop -- a slice
my @l2_slice = @{$array1[0]}[1..2]; # or
my @l2_slice = $array1[0]->@[1..2]; # postfix reference slice
which returns the list with the second and third elements of the same second-level array.
The second lines use a newer syntax called postfix dereferencing, stable as of v5.24. It lets us use the same logic for getting elements as when we drill down for a single one, by working left to right all the way. So ->@* gets a list of all elements for an arrayref, ->%* for a hashref (etc.). See for instance a perl.com article and an Effective Perler article.
There is a thing to warn about when it comes to multidimensional structures built with references. There are two distinct ways to create them: by using references to existing, named, variables
my @a1 = 5 .. 7;
my %h1 = ( a => 1, b => 2 );
my @tla1 = (\@a1, \%h1);
or by using anonymous ones, where arrayrefs are constructed by [] and hashrefs by {}
my @tla2 = ( [ 5..7 ], { a => 1, b => 2 } );
The difference is important to keep in mind. In the first case, the references to variables that the array carries can be used to change those variables -- if we change elements of @tla1 then we really change the referred variables
$tla1[0][1] = 100; # now @a1 == 5, 100, 7
Also, changing variables with references in @tla1 is seen via the top-level array as well.
With anonymous arrays and hashes in @tla2 this isn't the case as elements (references) of @tla2 access independent data, which cannot be accessed (and changed) in any other way.
Both of these ways to build complex data structures have their uses.
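The difference can be checked with a short sketch. It reuses the names @a1, @tla1 and @tla2 from above, trimmed to the array case only.

```perl
use strict;
use warnings;

my @a1   = 5 .. 7;
my @tla1 = ( \@a1 );         # reference to the named array
my @tla2 = ( [ 5 .. 7 ] );   # reference to an anonymous array

$tla1[0][1] = 100;           # writes through to @a1
$tla2[0][1] = 100;           # touches only the anonymous array

print "@a1\n";               # 5 100 7

$a1[2] = -1;                 # a change to @a1 is visible through @tla1
print "$tla1[0][2]\n";       # -1
print "$tla2[0][2]\n";       # 7
```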

how to replace values in an array of hashes properly in Perl?

As seen below, I have a foreach loop inside which, a value inside an array of hashes is being replaced with a value from another array of hashes.
The second foreach loop is just to print and test whether the values got assigned correctly.
foreach my $row (0 .. $#row_buff) {
$row_buff[$row]{'offset'} = $vars[$row]{'expression'};
print $row_buff[$row]{'offset'},"\n";
}
foreach (0 .. $#row_buff) {
print $row_buff[$_]{'offset'},"\n";
}
Here @row_buff and @vars are the two arrays of hashes. They are prefilled with values for all keys used.
The hashes were pushed into the arrays like so:
push @row_buff, \%hash;
ISSUE:
Let's say the print statement in the first foreach prints like this:
string_a
string_b
string_c
string_d
Then the print statement in the second foreach loop prints like so:
string_d
string_d
string_d
string_d
This is what confuses me. Both print statements are supposed to print exactly the same output, am I right? But the second print statement just prints the last value repeatedly. Could someone please point me to what could be going wrong here? Any hint is greatly appreciated. This is my first time putting up a question, so pardon me if I missed anything.
UPDATE
There was a bit of information that I could have added, sorry about that everyone. There was one more line before the foreach, it was like so:
@row_buff = (@row_buff) x $itercnt;
foreach my $row (0 .. $#row_buff) {
$row_buff[$row]{'offset'} = $vars[$row]{'expression'};
print $row_buff[$row]{'offset'},"\n";
}
foreach (0 .. $#row_buff) {
print $row_buff[$_]{'offset'},"\n";
}
$itercnt is an integer. I was using it to replicate @row_buff that many times.
This clearly has to do with storing references on the array, instead of independent data. How that comes about isn't clear since details aren't given, but the following discussion should help.
Consider these two basic examples.
First, place a hash (reference) on an array, first changing a value each time
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd);
# use Storable qw(dclone);
my %h = ( a => 1, b => 2 );
my @ary_w_refs;
for my $i (1..3) {
    $h{a} = $i;
    push @ary_w_refs, \%h; # almost certainly WRONG
    # push @ary_w_refs, { %h }; # *copy* data
    # push @ary_w_refs, dclone \%h; # may be necessary, or just safer
}
dd $_ for @ary_w_refs;
I use Data::Dump for displaying complex data structures, for its simplicity and default compact output. There are other modules for this purpose, Data::Dumper being in the core (installed).
The above prints
{ a => 3, b => 2 }
{ a => 3, b => 2 }
{ a => 3, b => 2 }
See how that value for key a, that we changed in the hash each time, and so supposedly set for each array element, to a different value (1, 2, 3) -- is the same in the end, and equal to the one we assigned last? (This appears to be the case in the question.)
This is because we assigned a reference to the hash %h to each element, so even though every time through the loop we first change the value in the hash for that key in the end it's just the reference there, at each element, to that same hash.∗
So when the array is queried after the loop we can only get what is in the hash (at key a it's the last assigned number, 3). The array doesn't have its own data, only a pointer to hash's data.† (Thus hash's data can be changed by writing to the array as well, as seen in the example below.)
Most of the time, we want a separate, independent copy. Solution? Copy the data.
Naively, instead of
push @ary_w_refs, \%h;
we can do
push @ary_w_refs, { %h };
Here {} is a constructor for an anonymous hash,‡ so %h inside gets copied. So actual data gets into the array and all is well? In this case, yes, where hash values are plain strings/numbers.
But what about when the hash values themselves are references? Then those references get copied, and @ary_w_refs again does not have its own data! We'll have the exact same problem. (Try the above with the hash being ( a => [1..10] ))
If we have a complex data structure, carrying references for values, we need a deep copy. One good way to do that is to use a library, and Storable with its dclone is very good
use Storable qw(dclone);
...
push @ary_w_refs, dclone \%h;
Now array elements have their own data, unrelated (but at the time of copy equal) to %h.
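To see the shallow versus deep difference directly, here is a minimal sketch using a hash whose value is an array reference. The names $shallow and $deep are illustrative.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %h = ( a => [ 1 .. 3 ] );

my $shallow = { %h };        # copies the top level only; the inner arrayref is shared
my $deep    = dclone \%h;    # copies the nested data as well

push @{ $h{a} }, 4;          # change the original nested array

print scalar( @{ $shallow->{a} } ), "\n";   # 4, the shallow copy sees the change
print scalar( @{ $deep->{a} } ), "\n";      # 3, the deep copy does not
```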
This is a good thing to do with a simple hash/array as well, to be safe from future changes, whereby the hash is changed but we forget about the places where it's copied (or the hash and its copies don't even know about each other).
Another example. Let's populate an array with a hashref, and then copy it to another array
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd pp);
my %h = ( a => 1, b => 2 );
my @ary_src = \%h;
say "Source array: ", pp \@ary_src;
my @ary_tgt = $ary_src[0];
say "Target array: ", pp \@ary_tgt;
$h{a} = 10;
say "Target array: ", pp(\@ary_tgt), " (after hash change)";
$ary_src[0]{b} = 20;
say "Target array: ", pp(\@ary_tgt), " (after hash change)";
$ary_tgt[0]{a} = 100;
dd \%h;
(For simplicity I use arrays with only one element.)
This prints
Source array: [{ a => 1, b => 2 }]
Target array: [{ a => 1, b => 2 }]
Target array: [{ a => 10, b => 2 }] (after hash change)
Target array: [{ a => 10, b => 20 }] (after hash change)
{ a => 100, b => 20 }
That "target" array, which supposedly was merely copied off of a source array, changes when the distant hash changes! And when its source array changes. Again, it is because a reference to the hash gets copied, first to one array and then to the other.
In order to get independent data copies, again, copy the data, each time. I'd again advise to be on the safe side and use Storable::dclone (or an equivalent library of course), even with simple hashes and arrays.
Finally, note a slightly sinister last case -- writing to that array changes the hash! This (second-copied) array may be far removed from the hash, in a function (in another module) that the hash doesn't even know of. This kind of an error can be a source of really hidden bugs.
Now if you clarify where references get copied, with a more complete (simple) representation of your problem, we can offer a more specific remedy.
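Going by the update in the question, one plausible source is the repetition operator: (@row_buff) x $itercnt repeats the references held in the array, not the hashes they point to. The sketch below assumes a single starting hash and made-up values, and the array names @shared and @separate are illustrative.

```perl
use strict;
use warnings;

my @row_buff = ( { offset => 'orig' } );
my $itercnt  = 3;

# x repeats the list of references, so all three elements are the SAME hash.
my @shared = (@row_buff) x $itercnt;
$shared[$_]{offset} = "string_$_" for 0 .. $#shared;
print join(',', map { $_->{offset} } @shared), "\n";     # string_2,string_2,string_2

# One possible remedy: give every slot its own (shallow) copy of the hash.
my @repeated = (@row_buff) x $itercnt;
my @separate = map { +{ %$_ } } @repeated;
$separate[$_]{offset} = "string_$_" for 0 .. $#separate;
print join(',', map { $_->{offset} } @separate), "\n";   # string_0,string_1,string_2
```

The + in front of the inner braces tells Perl to treat them as an anonymous hash constructor and not as a block. If the hashes hold references themselves, a deep copy such as Storable's dclone would be needed in place of { %$_ }.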
∗ An important way of using a reference that is correct, and which is often used, is when the structure taken the reference of is declared as a lexical variable every time through
for my $elem (@data) {
    my %h = ...
    ...
    push @results, \%h; # all good
}
That lexical %h is introduced anew every time so the data for its reference on the array is retained, as the array persists beyond the loop, independently for each element.
It is also more efficient doing it this way since the data in %h isn't copied, like it is with { %h }, but is just "re-purposed," so to say, from the lexical %h that gets destroyed at the end of iteration to the reference in the array.
This of course may not always be suitable, if a structure to be copied naturally lives outside of the loop. Then use a deep copy of it.
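A runnable version of the loop pattern in this footnote might look as follows. The contents of @data and the name key are invented for the example.

```perl
use strict;
use warnings;

my @data = qw(x y z);
my @results;

for my $elem (@data) {
    my %h = ( name => $elem );   # a brand-new %h on every iteration
    push @results, \%h;          # so every reference points at different data
}

print join(',', map { $_->{name} } @results), "\n";   # x,y,z
```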
The same kind of a mechanism works in a function call
sub some_func {
...
my %h = ...
...
return \%h; # good
}
my $hashref = some_func();
Again, the lexical %h goes out of scope as the function returns and it doesn't exist any more, but the data it carried and a reference to it is preserved, since it is returned and assigned so its refcount is non-zero. (At least returned to the caller, that is; it could've been passed yet elsewhere during the sub's execution so we may still have a mess with multiple actors working with the same reference.) So $hashref has a reference to data that had been created in the sub.
Recall that if a function was passed a reference, when it was called or during its execution (by calling yet other subs which return references), changed and returned it, then again we have data changed in some caller, potentially far removed from this part of program flow.
This is done often of course, with larger pools of data which can't just be copied around all the time, but then one needs to be careful and organize code (to be as modular as possible, for one) so as to minimize the chance of errors.
† This is a loose use of the word "pointer," for what a reference does, but if one were to refer to C I'd say that it's a bit of a "dressed" C-pointer
‡ In a different context it can be a block

Use of uninitialized value in array value

I have an array named @Lst_1. One of the elements in the array is 0.
I get a warning whenever I access that element. In the example below, the value stored in the second index of the array is 0.
$Log_Sheet->write(Row,0,"$Lst_1[2]",$Format);
I am getting a warning saying
Use of uninitialized value within @Lst_1 in string.
Please help me resolve this.
The first index of an array is 0. The second element will be $Lst_1[1];
#!/usr/bin/env perl
use v5.22;
use warnings;
my @array = qw(foo bar);
# number of elements in array
say scalar(@array);
# last index of array
say $#array;
# undefined element (warn)
say $array[ $#array + 1];
If you just want to silence the error,
$Log_Sheet->write(Row, 0, $Lst_1[2] // 0, $Format);
This does use a feature of perl 5.10, but that's ancient enough you really should be using a sufficiently new perl to have it. I mean, there's a lot of ancient perl bugs, so it behooves you to be using a newer version.
As far as understanding the issue, no, $Lst_1[2] doesn't contain a 0. It contains an undef, which just happens to be treated mostly like 0 in numeric contexts.
Yes, I did remove the quotes around $Lst_1[2] - that was necessary, because "$Lst_1[2]" treats that undef like a string, so it's become the empty string for the purpose of a "$Lst_1[2]" // 0 test. (The empty string also happens to be treated like 0, so that doesn't change the behavior in a numeric context.)
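The effect of the quotes can be seen in a short sketch, assuming a two-element array so that index 2 does not exist. The variable names $plain and $quoted are illustrative.

```perl
use strict;
use warnings;

my @Lst_1 = (5, 7);    # assumed contents: only indices 0 and 1 exist

# Without quotes, // sees undef and falls back to 0.
my $plain = $Lst_1[2] // 0;

# With quotes, the undef has already become '' (with a warning, silenced here),
# and '' is defined, so // keeps it.
my $quoted = do { no warnings 'uninitialized'; "$Lst_1[2]" // 0 };

print "plain:  '$plain'\n";     # plain:  '0'
print "quoted: '$quoted'\n";    # quoted: ''
```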
It's not clear from your short excerpt whether @Lst_1 is less than 3 elements long, or if there's an explicit undef in @Lst_1. You'd need to show a larger excerpt of your code - or possibly even the whole thing and the data it is processing - for us to be able to determine that by looking. You could determine it, however, by adding something like the following in front of the line you gave:
if (@Lst_1 < 3) {
    print "\@Lst_1 only has " . @Lst_1 . " elements\n"
} elsif (not defined($Lst_1[2])) {
    print "\$Lst_1[2] is set to undef\n";
}
There are two basic ways a list can have an explicit undef element in it. The following code demonstrates both:
my @List = map "Index $_", 0 .. 3;
$List[2] = undef;
$List[5] = "Index 5";
use Data::Dump;
dd @List;
This will output
("Index 0", "Index 1", undef, "Index 3", undef, "Index 5")
The first undef was because I set it, the second was because there wasn't a fifth element but I put something in the sixth slot.

How to use an array reference to pairs of elements

I am considering this answer which uses a single array reference of points, where a point is a reference to a two-element array.
My original code of the question (function extract-crossing) uses two separate arrays $x and $y here which I call like this:
my @x = @{ $_[0] }; my @y = @{ $_[1] };
...
return extract_crossing(\@x, \@y);
The new code below based on the answer takes (x, y) and returns a single datatype, here x intercept points:
use strict; use warnings;
use Math::Geometry::Planar qw(SegmentLineIntersection);
use Test::Exception;
sub x_intercepts {
    my ($points) = @_;
    die 'Must pass at least 2 points' unless @$points >= 2;
    my @intercepts;
    my @x_axis = ( [0, 0], [1, 0] );
    foreach my $i (0 .. $#$points - 1) {
        my $intersect = SegmentLineIntersection([@$points[$i,$i+1],@x_axis]);
        push @intercepts, $intersect if $intersect;
    }
    return \@intercepts;
}
which I try to call like this:
my @x = @{ $_[0] }; my @y = @{ $_[1] };
...
my $masi = x_intercepts(\@x);
return $masi;
However, the code does not make sense.
I am confused about passing the "double array" to the x_intercepts() function.
How can you make the example code clearer to the original setting?
If I am understanding the problem here, @ThisSuitIsBlackNot++ has written a function (x_intercepts, which is available in the thread: Function to extract intersections of crossing lines on axes) that expects its argument to be a reference to a list of array references. The x_intercepts subroutine in turn uses a function from Math::Geometry::Planar which expects the points of a line segment to be passed as a series of array references/anonymous arrays that contain the x,y values for each point.
Again - it is not entirely clear - but it seems your data is in two different arrays: one containing all the x values and one with the corresponding y values. Is this the case? If this is not correct please leave a comment and I will remove this answer.
If that is the source of your problem then you can "munge" or transform your data before you pass it to x_intercepts or - as @ThisSuitIsBlackNot suggests - you can rewrite the function. Here is an example of munging your existing data into an "@input_list" to pass to x_intercepts.
my @xs = qw/-1 1 3/;
my @ys = qw/-1 1 -1 /;
my @input_list ;
foreach my $i ( 0..$#ys ) {
    push @input_list, [ $xs[$i], $ys[$i] ] ;
}
my $intercept_list = x_intercepts(\@input_list) ;
say join ",", @$_ for @$intercept_list ;
Adding the lines above to your script produces:
Output:
0,0
2,0
You have to be very careful doing this kind of thing and using tests to make sure you are passing the correctly transformed data in an expected way is a good idea.
I think a more general difficulty is that until you are familiar with perl it is sometimes tricky to easily see what sorts of values a subroutine is expecting, where they end up after they are passed in, and how to access them.
A solid grasp of Perl data structures can help with that - for example I think what you are calling a "double array" or "double element" here is an "array of arrays". There are ways to make it easier to see where default arguments passed to a subroutine (in @_) are going (notice how @ThisSuitIsBlackNot has passed them to a nicely named array reference: "($points)"). Copious re-reading of perldoc perlsub can help things seem more obvious.
References are key to understanding Perl subroutines, since to pass an array or hash to a subroutine intact you need to pass it by reference. If the argument passed to x_intercepts is a reference to a list of anonymous arrays, then once it is assigned to ($points), $points->[0] and $points->[1] are the references to the first two of those arrays, and @{ $points->[0] } is the list of x,y values for the first point.
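As a small illustration of how such an argument is unpacked, the sketch below uses a hypothetical helper, describe_points, that is not part of the original code or of Math::Geometry::Planar. It only shows the access pattern for a reference to an array of [x, y] pairs.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to an array of [x, y] pairs
# and returns one formatted string per point.
sub describe_points {
    my ($points) = @_;    # $points is an array reference
    return map { "($_->[0], $_->[1])" } @$points;
}

my @input_list = ( [ -1, -1 ], [ 1, 1 ], [ 3, -1 ] );
my @lines      = describe_points(\@input_list);

print "$_\n" for @lines;              # (-1, -1) then (1, 1) then (3, -1)
print scalar(@lines), " points\n";    # 3 points
```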
I hope this helps and is not too basic (or incorrect). If @ThisSuitIsBlackNot finds the time to provide an answer you should accept it: some very useful examples have been provided.

How to reference a split expression in Perl?

I want to create a reference to an array obtained by a split in Perl.
I'm thinking something like:
my $test = \split( /,/, 'a,b,c,d,e');
foreach $k (@$test) {
print "k is $k\n";
}
But that complains with Not an ARRAY reference at c:\temp\test.pl line 3.
I tried a few other alternatives, all without success.
Background explanation:
split, like other functions, returns a list. You cannot take a reference to a list. However, if you apply the reference operator to a list, it gets applied to all its members. For example:
use Data::Dumper;
my @x = \('a' .. 'c');
print Dumper \@x
Output:
$VAR1 = [
\'a',
\'b',
\'c'
];
Therefore, when you write my $test = \split( /,/, 'a,b,c,d,e');, you get a reference to the last element of the returned list (see, for example, What’s the difference between a list and an array?). Your situation is similar to:
Although it looks like you have a list on the righthand side, Perl actually sees a bunch of scalars separated by a comma:
my $scalar = ( 'dog', 'cat', 'bird' ); # $scalar gets bird
Since you’re assigning to a scalar, the righthand side is in scalar context. The comma operator (yes, it’s an operator!) in scalar context evaluates its lefthand side, throws away the result, and evaluates its righthand side and returns the result. In effect, that list-lookalike assigns to $scalar its rightmost value. Many people mess this up because they choose a list-lookalike whose last element is also the count they expect:
my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally
In your case, what you get on the RHS is a list of references to the elements of the list returned by split, and the last element of that list ends up in $test. You first need to construct an array from those return values and take a reference to that. You can make that a single statement by forming an anonymous array and storing the reference to that in $test:
my $test = [ split( /,/, 'a,b,c,d,e') ];
Surround the split command with square brackets to make a reference to an anonymous array.
my $test = [ split( /,/, 'a,b,c,d,e') ];
Giving it a name has different semantics, in that changes to the named variable then change what was referenced, while each anonymous array is unique. I discovered this the hard way by doing this in a loop.
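The loop pitfall mentioned here can be reproduced with a short sketch. The input lines and the names @by_name and @by_anon are made up for the example.

```perl
use strict;
use warnings;

my @lines = ( 'a,b', 'c,d' );
my ( @by_name, @by_anon );

my @fields;    # one named array, declared outside the loop and reused
for my $line (@lines) {
    @fields = split /,/, $line;
    push @by_name, \@fields;                # the same array every time
    push @by_anon, [ split /,/, $line ];    # a new anonymous array every time
}

print "$by_name[0][0]\n";    # c, the first row was overwritten by the second
print "$by_anon[0][0]\n";    # a
```

Declaring @fields with my inside the loop would also give each iteration its own array.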

Resources