Generate array and subarray dynamically (perl) - arrays

I have a several files that have contain product install information.
I am able to grep for the elements that I want (for example, "version") from the files. And I end up with something like this:
instancefile1:ABC_version=1.2.3
instancefile1:XYZ_version=2.5
instancefile2:ABC_version=3.4.5
instancefile3:XYZ_version=1.1
Where the components are named ABC or XYZ.
What I'd like to do is take this grep output and parse it through perl to build arrays on the file.
The first array would be composed of the instance number (pulled from the filenames) - above, I'd have an array that would have 1,2,3 as elements.
And inside each of those arrays would be the components that that particular instance has.
Full expected arrays and components from above:
array[0] = instancefile1 # can keep this named like the filename,
or assign name. Does not matter
array[0][0] = ABC_version=1.2.3
array[0][1] = XYZ_version=2.5
array[1] = instancefile2
array[1][0] = ABC_version=3.4.5
array[2] = instancefile3
array[2][0] = XYZ_version=1.1
(I know my notation for referencing subarrays is not correct - I'm rusty on my perl.)
How might I do that?
(I've been doing it with just bash arrays and grep and then reiterating through the initial grep output with my first array and doing another grep to fill another array - but this seems like it is going through the data more than one time, instead of building it on the fly.)
The idea is for it to build each array as it sees it. It sees "fileinstance1", it stores the values to the right in that array, as it sees it. Then if it sees "fileinstance2", it creates that array and populates with those values, all in one pass. I believe perl is the best tool for this?

Unless you can guaranteed the records with the same key will be next to each other, it's easier to start with a HoA.
my %data;
while (<>) {
chomp;
my ($key, $val) = split /:/;
push #{ $data{$key} }, $val;
}
Then convert to AoA:
my #data = values(%data);
Order-preserving:
my %data;
my #data;
while (<>) {
chomp;
my ($key, $val) = split /:/;
push #data, $data{$key} = []
if !$data{$key};
push #{ $data{$key} }, $val;
}

Related

Perl split and throw away the first element in one line

I have some data that should be able to be easily split into a hash.
The following code is intended to split the string into its corresponding key/value pairs and store the output in a hash.
Code:
use Data::Dumper;
# create a test string
my $string = "thing1:data1thing2:data2thing3:data3";
# Doesn't split properly into a hash
my %hash = split m{(thing.):}, $string;
print Dumper(\%hash);
However upon inspecting the output it is clear that this code does not work as intended.
Output:
$VAR1 = {
'data3' => undef,
'' => 'thing1',
'data2' => 'thing3',
'data1' => 'thing2'
};
To further investigate the problem I split the output into an array instead and printed the results.
Code:
# There is an extra blank element at the start of the array
my #data = split m{(thing.):}, $string;
for my $line (#data) {
print "LINE: $line\n";
}
Output:
LINE:
LINE: thing1
LINE: data1
LINE: thing2
LINE: data2
LINE: thing3
LINE: data3
As you can see the problem is that split is returning an extra empty element at the start of the array.
Is there any way that I can throw away the first element from the split output and store it in a hash in one line?
I know I can store the output in an array and then just shift off the first value and store the array in a hash... but I'm just curious whether or not this can be done in one step.
my (undef, %hash) = split m{(thing.):}, $string; will throw away the first value.
I'd alternatively suggest - use regex not split:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $string = "thing1:data1thing2:data2thing3:data3";
my %results = $string =~ m/(thing\d+):([A-Z]+\d+)/ig;
print Dumper \%results;
Of course, this does make the assumption that you're matching 'word+digit' groups, as without that "numeric" separator it won't work as well.
I'm aiming to primarily illustrate the technique - grab 'paired' values out of a string, because then they assign straight to a hash.
You might have to be a bit more complicated with the regex, for example nongreedy quantifiers:
my %results = $string =~ m/(thing.):(\w+?)(?=thing|$)/ig;
This may devalue it in terms of clarity.

Perl array element comparing

I am new in Perl programming. I am trying to compare the two arrays each element. So here is my code:
#!/usr/bin/perl
use strict;
use warnings;
use v5.10.1;
my #x = ("tom","john","michell");
my #y = ("tom","john","michell","robert","ricky");
if (#x ~~ #y)
{
say "elements matched";
}
else
{
say "no elements matched";
}
When I run this I get the output
no elements matched
So I want to compare both array elements in deep and the element do not matches, those elements I want to store it in a new array. As I can now compare the only matched elements but I can't store it in a new array.
How can I store those unmatched elements in a new array?
Please someone can help me and advice.
I'd avoid smart matching in Perl - e.g. see here
If you're trying to compare the contents of $y[0] with $x[0] then this is one way to go, which puts all non-matches in an new array #keep:
use strict;
use warnings;
use feature qw/say/;
my #x = qw(tom john michell);
my #y = qw(tom john michell robert ricky);
my #keep;
for (my $i = 0; $i <$#y; $i++) {
unless ($y[$i] eq $x[$i]){
push #keep, $y[$i];
}
}
say for #keep;
Or, if you simply want to see if one name exists in the other array (and aren't interested in directly comparing elements), use two hashes:
my (%x, %y);
$x{$_}++ for #x;
$y{$_}++ for #y;
foreach (keys %y){
say if not exists $x{$_};
}
It would be well worth your while spending some time reading the Perl FAQ.
Perl FAQ 4 concerns Data Manipulation and includes the following question and answer:
How do I compute the difference of two arrays? How do I compute
the intersection of two arrays?
Use a hash. Here's code to do both and more. It assumes that each
element is unique in a given array:
my (#union, #intersection, #difference);
my %count = ();
foreach my $element (#array1, #array2) { $count{$element}++ }
foreach my $element (keys %count) {
push #union, $element;
push #{ $count{$element} > 1 ? \#intersection : \#difference }, $element;
}
Note that this is the symmetric difference, that is, all elements
in either A or in B but not in both. Think of it as an xor
operation.

Perl: Load file into hash

I'm struggling to understand logic behind hashes in Perl. Task is to load file in to hash and assign values to keys which are created using this file.
File contains alphabet with each letter on its own line:
a
b
c
d
e
and etc,.
When using array instead of hash, logic is simple: load file into array and then print each element with corresponding number using some counter ($counter++).
But now my question is, how can I read file into my hash, assign automatically generated values and sort it in that way where output is printed like this:
a:1
b:2
c:3
I've tried to first create array and then link it to hash using
%hash = #array
but it makes my hash non-sortable.
There are a number of ways to approach this. The most direct would be to load the data into the hash as you read through the file.
my %hash;
while(<>)
{
chomp;
$hash{$_} = $.; #Use the line number as your autogenerated counter.
}
You can also perform simliar logic if you already have a populated array.
for (0..$#array)
{
$hash{$array[$_]} = $_;
}
Although, if you are in that situation, map is the perlier way of doing things.
%hash = map { $array[$_] => $_ } #array;
Think of a hash as a set of pairs (key, value), where the keys must be unique. You want to read the file one line at a time, and add a pair to the hash:
$record = <$file_handle>;
$hash{$record} = $counter++;
Of course, you could read the entire file into an array at once and then assign to your hash. But the solution is not:
#records = <$file_handle>;
%hash = #records;
... as you found out. If you think in terms of (key, value) pairs, you will see that the above is equivalent to:
$hash{a} = 'b';
$hash{c} = 'd';
$hash{e} = 'f';
...
and so on. You still are going to need a loop, either an explicit one like this:
foreach my $rec (#records)
{
$hash{$rec} = $counter++;
}
or an implicit one like one of these:
%hash = map {$_ => $counter++} #records;
# or:
$hash{$_} = $counter++ for #records;
This code should generate the proper output, where my-text-file is the path to your data file:
my %hash;
my $counter = 0;
open(FILE, "my-text-file");
while (<FILE>) {
chomp;
$counter++;
$hash{$_} = $counter;
}
# Now to sort
foreach $key (sort(keys(%hash))) {
print $key . ":" . $hash{$key} . "\n";
}
I assume you want to sort the hash aplhabetically. keys(%hash) and values(%hash) return the keys and values of %hash as an array, respectively. Run the program on this file:
f
a
b
d
e
c
And we get:
a:2
b:3
c:6
d:4
e:5
f:1
I hope this helps you.

How can I create a variable array name in Perl?

Array #p is a multiline array, e.g. $p[1] is the second line.
This code will explain what I want:
$size=#p; # line number of array #p
for($i=0; $i<$size; $i++)
{
#p{$i}= split(/ +/,$p[$i]);
}
I want the result should be like this:
#p0 = $p[0] first line of array #p goes to array #p0;
#p1 = $p[1] second line of array #p goes to array #p1;
...
...
and so on.
But above code does not work, how can I do it?
It is a bad idea to dynamically generate variable names.
I suggest the best solution here is to convert each line in your #p array to an array of fields.
Lets suppose you have a better name for #p, say #lines. Then you can write
my #lines = map [ split ], <$fh>;
to read in all the lines from the file handle $fh and split them on whitespace. The first field of the first line is then $lines[0][0]. The third field of the first line is $lines[0][2] etc.
First, the syntax #p{$i} accesses the entry with the key $i in a hash %p, and returns it in list context. I don't think you meant that. use strict; use warnings; to get warned about undeclared variables.
You can declare variables with my, e.g. my #p; or my $size = #p;
Creating variable names on the fly is possible, but a bad practice. The good thing is that we don't need to: Perl has references. A reference to an array allows us to nest arrays, e.g.
my #AoA = (
[1, 2, 3],
["a", "b"],
);
say $AoA[0][1]; # 2
say $AoA[1][0]; # a
We can create an array reference by using brackets, e.g. [ #array ], or via the reference operator \:
my #inner_array = (1 .. 3);
my #other_inner = ("a", "b");
my #AoA = (\#inner_array, \#other_array);
But careful: the array references still point to the same array as the original names, thus
push #other_inner, "c";
also updates the entry in #AoA:
say $AoA[1][2]; # c
Translated to your problem this means that you want:
my #pn;
for (#p) {
push #pn, [ split /[ ]+/ ];
}
There are many other ways to express this, e.g.
my #pn = map [ split /[ ]+/ ], #p;
or
my #pn;
for my $i ( 0 .. $#p ) {
$pn[$i] = [ split /[ ]+/, $p[$i] ];
}
To learn more about references, read
perlreftut,
perldsc, and
perlref.

2d arr explicitpackage

I've looked through several threads on websites including this one to try and understand why I am getting an undeclared variable error for my usage of my $line . Each element of the #lines array is an array of strings.
The error is in line 25 and 27 with the $line[$count] statement
use strict;
use warnings;
my #lines;
my #sizes;
# read input from stdin file into 2d array
while(<>)
{
push(#lines, my #tokens = split(/\s+/, $_));
}
# search through each array for largest sizes in
# corresponding elements
for (my $count = 0; $count <= 5; $count++)
{
push(#sizes, 0);
foreach my $line (#lines)
{
if(length($line[$count])>$sizes[$count])
{
$sizes[$count] = length($line[$count]);
}
}
}
I can post the full code if it is necessary, but I am pretty sure the error must be in here somewhere.
The problem is here:
push(#lines, my #tokens = split(/\s+/, $_));
Pushing one array into another just adds all elements to the first array. So you are making a really long one dimensional array.
To fix this, use brackets to make an array reference:
push #lines, [ split(/\s+/, $_) ]; #No need for a temp variable.
Also, to access the array reference, you have to de-reference it. Both of these syntaxes are options:
${$line}[$count];
$line->[$count];
I think the second syntax is more readable.
Update: Also, you could simplify your code if you keep track of the longest lengths while you go through the file:
use strict;
use warnings;
use List::Util qw/max/;
my #lines;
my #sizes = (0)x6;
while(<>)
{
push #lines, [ my #tokens = split ];
#sizes = map { max ( length($tokens[$_]), $sizes[$_] ) } 0..$#tokens;
}
Note: The Data::Dumper core module is an invaluable tool when working with complex data structures in Perl.
use Data::Dumper;
print Dumper #lines;
This will print out the complete structure of whatever variable you give it. That way you can see if you actually created what you thought you did.

Resources