Perl: Separate array value by certain variable string

Need advice on how to separate array data into different columns based on a certain string. In the example below, the data is split on "Exit" and printed in separate columns. Thanks.
Example:
Input
John
Eva
Felix
Exit
a
b
c
Exit
1
2
3
output
John a 1
Eva b 2
Felix c 3

Iterate over the elements, store them into an array of arrays, resetting the index of the outer array on each Exit:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my @arr = qw(John Eva Felix Exit a b c Exit 1 2 3);
my @out;
my $index = 0;
for (@arr) {
    if ('Exit' eq $_) {
        $index = 0;
    } else {
        push @{ $out[$index++] }, $_;
    }
}
say join ' ', @$_ for @out;
If the input lines aren't of the same length, you can assign to the particular element in the array:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my @arr = qw(John Eva Felix Exit a b c d e f Exit 1 2 3 4);
my @out;
my $outer = 0;
my $inner = 0;
for (@arr) {
    if ('Exit' eq $_) {
        $outer = 0;
        ++$inner;
    } else {
        $out[$outer++][$inner] = $_;
    }
}
say join "\t", map $_ // q(), @$_ for @out;
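Since the question asks about a variable separator string, the same idea can be wrapped in a sub that takes the separator as an argument. This is only a sketch: the name blocks_to_rows and its argument order are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split @items into blocks at every $separator,
# then return one row per position within a block.
sub blocks_to_rows {
    my ($separator, @items) = @_;
    my @rows;
    my ($row, $col) = (0, 0);
    for my $item (@items) {
        if ($item eq $separator) {
            $row = 0;      # start again at the first output row
            $col++;        # the next block fills the next column
        } else {
            $rows[$row++][$col] = $item;
        }
    }
    # Fill gaps left by shorter blocks with empty strings.
    for my $r (@rows) {
        $r->[$_] //= '' for 0 .. $col;
    }
    return @rows;
}

my @rows = blocks_to_rows('Exit', qw(John Eva Felix Exit a b c Exit 1 2 3));
print join(' ', @$_), "\n" for @rows;
```

Passing a different first argument (for example 'END') splits on that string instead.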

Related

For Loop Issues in creating nested array

Creating a matrix of products for three element arrays. I understand that Perl does not have true multi-dimensional arrays and that nested lists are flattened. I have been using refs but I can't seem to get past the for loop issue in getting three products into a single array and pushing that array into a different single array. And I could be way off too. Be nice, but I've spent too many hours on this.
I have moved values inside and out of various places i.e. { }, printed out variables until I'm blue and used $last all over for debugging. I'm likely fried at this point.
use strict;
use warnings;
my @array1 = (1, 2, 3);
my @array2 = (2, 4, 6);
my @matrixArray = ();
my $matrixArray;
my @row;
my @finalArray = maths(\@array1, \@array2);
print @finalArray;
sub maths{
    my $array1ref = shift;
    my $array2ref = shift;
    my $value1;
    my $value2;
    my $maths;
    my @row = ();
    my @array1 = @{$array1ref};
    my @array2 = @{$array2ref};
    my $len1 = @array1;
    my $len2 = @array2;
    for my $x (0 ..($len1 -1)){
        #iterate through first array at each value
        $value1 = $array1[$x];
        #print $value1, " value1 \n";
        for my $y (0 .. ($len2 -1)){
            #iterate through second array at each value
            $value2 = $array2[$y];
            #print $value2, " value2 \n";
            #calculate new values
            $maths = $value1 * $value2;
            #exactly right here
            #print $maths, " maths \n" ;
            push @row, $maths;
        }
    }
    #and exactly right here but not set of arrays
    #print @row, "\n";
    return @row;
}
Currently I'm able to get this: 246481261218. Which is the correct dumb math but...
it should appear as a matrix:
2 4 6
4 8 12
6 12 18
I am not passing three arrays so it seems my issue is up in the sub routine before I can get on with anything else. This seems to be a theme that I often miss. So sorry if I sound inept.
EDIT***
This was working but I couldn't unpack it
use strict;
use warnings;
my @array1 = (1, 2, 3);
my @array2 = (2, 4, 6);
my @matrixArray = ();
maths(\@array1, \@array2);
foreach my $x (@matrixArray){
    print "$x \n";
}
sub maths{
    my $array1ref = shift;
    my $array2ref = shift;
    my $value1;
    my $value2;
    my $maths;
    my @row = ();
    my $row;
    my @array1 = @{$array1ref};
    my @array2 = @{$array2ref};
    my $len1 = @array1;
    my $len2 = @array2;
    for my $x (0 ..($len1 -1)){
        #iterate through first array at each value
        $value1 = $array1[$x];
        for my $y (0 .. ($len2 -1)){
            #iterate through second array at each value
            $value2 = $array2[$y];
            #calculate new values
            $maths = $value1 * $value2;
            push @row, $maths;
            $row = \@row;
        }
        push @matrixArray, $row;
    }
    return @matrixArray;
}
The output right after the function call is this:
ARRAY(0x55bbe2c667b0)
ARRAY(0x55bbe2c667b0)
ARRAY(0x55bbe2c667b0)
which would be the (line 10) print of $x.
****EDIT
This Works (almost):
print join(" ", @{$_}), "\n" for @matrixArray;
Output is a bit wrong...
2 4 6 4 8 12 6 12 18
2 4 6 4 8 12 6 12 18
2 4 6 4 8 12 6 12 18
And of note: I knew $x was an array but I seemed to run into trouble trying to unpack it correctly. And I'm no longer a fan of Perl. I'm pining for the fjords of Python.
And *****EDIT
This is working great and I get three arrays out of it:
sub maths{
    my ($array1, $array2) = @_;
    my @res;
    for my $x (@$array1) {
        my @row;
        for my $y (@$array2) {
            push @row, $x * $y;
        }
        push @res, \@row;
    }
    #This is the correct structure on print @res!
    return @res;
}
But, though it's putting it together correctly, I have no output after the call
maths(\@array1, \@array2);
NOTHING HERE...
print @res;
print join(" ", @{$_}), "\n" for @res;
foreach my $x (@res){
    print join(" ", @{$x}), "\n";
}
And of course a million thanks! I regret taking this stupid course and fear my grade will eventually do me in. Still pining for Python!

It appears that you need a matrix with rows obtained by multiplying an array by elements of another.
One way
use warnings;
use strict;
use Data::Dump qw(dd);
my @ary = (2, 4, 6);
my @factors = (1, 2, 3);
my @matrix = map {
    my $factor = $_;
    [ map { $_ * $factor } @ary ]
} @factors;
dd @matrix;
The array @matrix, formed by the outer map, has array references for each element and is thus (at least) a two-dimensional structure (a "matrix"). Those arrayrefs are built with [ ], which creates an anonymous array out of a list inside. That list is generated by map over @ary.
I use Data::Dump to nicely print complex data. In the core there is Data::Dumper.
With a lot of work like this, and with large data, efficiency may matter. The common wisdom would have it that direct iteration should be a bit faster than map, but here is a benchmark. This also serves to show more basic ways as well.
use warnings;
use strict;
use feature 'say';
use Benchmark qw(cmpthese);
my $runfor = shift // 5; # run each case for these many seconds
sub outer_map {
    my ($ary, $fact) = @_;
    my @matrix = map {
        my $factor = $_;
        [ map { $_ * $factor } @$ary ]
    } @$fact;
    return \@matrix;
}
sub outer {
    my ($ary, $fact) = @_;
    my @matrix;
    foreach my $factor (@$fact) {
        push @matrix, [];
        foreach my $elem (@$ary) {
            push @{$matrix[-1]}, $elem * $factor;
        }
    }
    return \@matrix;
}
sub outer_tmp {
    my ($ary, $fact) = @_;
    my @matrix;
    foreach my $factor (@$fact) {
        my @tmp;
        foreach my $elem (@$ary) {
            push @tmp, $elem * $factor;
        }
        push @matrix, \@tmp;
    }
    return \@matrix;
}
my @a1 = map { 2*$_ } 1..1_000; # worth comparing only for large data
my @f1 = 1..1_000;
cmpthese( -$runfor, {
    direct => sub { my $r1 = outer(\@a1, \@f1) },
    w_tmp => sub { my $r2 = outer_tmp(\@a1, \@f1) },
    w_map => sub { my $r3 = outer_map(\@a1, \@f1) },
});
On a nice machine with v5.16 this prints
         Rate direct  w_map  w_tmp
direct 11.0/s     --    -3%   -20%
w_map  11.4/s     3%     --   -17%
w_tmp  13.8/s    25%    21%     --
The results are rather similar on v5.29.2, and on an oldish laptop.
So map is a touch faster than building a matrix directly, and 15-20% slower than the method using a temporary array for rows, which I'd also consider clearest. The explicit loops can be improved a little by avoiding scopes and scalars, and the "direct" method can perhaps be sped up some by using indices. But these are dreaded micro-optimizations, and for fringe benefits at best.
Note that timings such as these make sense only with truly large amounts of data, which the above isn't. (I did test with both dimensions ten times as large, with very similar results.)

The second program is mostly correct.
The problem is that you didn't unpack the second level of the array.
foreach my $x (@matrixArray){
    print "$x \n";
}
should be something like:
foreach my $x (@matrixArray) {
    print join(" ", @{$x}), "\n";
}
# or just:
print join(" ", @{$_}), "\n" for @matrixArray;
Your maths function can be made shorter without losing legibility (it may actually make it more legible) by cutting out unnecessary temporary variables and indexing. For example:
sub maths {
    my @array1 = @{ $_[0] };
    my @array2 = @{ $_[1] }; # or: ... = @{ (shift) };
    my @res = ();
    for my $x (@array1) {
        my @row = (); # <-- bugfix of original code
        for my $y (@array2) {
            my $maths = $x * $y;
            push @row, $maths;
        }
        push @res, \@row;
    }
    return @res;
}
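The last edit in the question reports no output after the call. Judging only from the snippet shown there, a likely cause is that the return value of maths is thrown away, while @res exists only inside the sub. A minimal sketch that captures the returned list (the name @matrix is chosen here for illustration):

```perl
use strict;
use warnings;

sub maths {
    my ($array1, $array2) = @_;
    my @res;
    for my $x (@$array1) {
        my @row;                      # fresh array for every row
        push @row, $x * $_ for @$array2;
        push @res, \@row;
    }
    return @res;                      # a list of array references
}

my @array1 = (1, 2, 3);
my @array2 = (2, 4, 6);

# Capture the returned list; @res inside maths is not visible here.
my @matrix = maths(\@array1, \@array2);
print join(" ", @$_), "\n" for @matrix;
```

Declaring @row inside the outer loop is what gives each row its own array; with a single @row declared outside, every pushed reference points at the same array, which matches the three identical ARRAY(0x...) addresses in the question.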

Read space delimited text file into array of hashes [Perl]

I have text file that matches the following format:
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
And I need to read it into a data structure in perl that will allow me to sort and print according to any of those columns.
From left to right the columns are process_id, memory_size, cpu_time and program_name.
How can I read a text file with formatting like that in a way that allows me to sort the data structure and print it according to the sort?
My attempt so far:
my %tasks;
sub open_file{
if (open (my $input, "task_file" || die "$!\n")){
print "Success!\n";
while( my $line = <$input> ) {
chomp($line);
($process_id, $memory_size, $cpu_time, $program_name) = split( /\s/, $line, 4);
$tasks{$process_id} = $process_id;
$tasks{$memory_size} = $memory_size;
$tasks{$cpu_time} = $cpu_time;
$tasks{$program_name} = $program_name;
print "$tasks{$process_id} $tasks{$memory_size} $tasks{$cpu_time} $tasks{$program_name}\n";
}
This does print the output correctly, however I can't figure out how to then sort my resulting %tasks hash by a specific column (i.e. process_id, or any other column) and print the whole data structure in a sorted format.

You're storing the values under keys that are equal to the values. Use Data::Dumper to inspect the structure:
use Data::Dumper;
# ...
print Dumper(\%tasks);
You can store the pids in a hash of hashes, using the value of each column as the inner key.
#!/usr/bin/perl
use strict;
use warnings;
use feature qw{ say };
my @COLUMNS = qw( memory cpu program );
my %sort_strings = ( program => sub { $a cmp $b } );
my (%process_details, %sort);
while (<DATA>) {
    my ($process_id, $memory_size, $cpu_time, $program_name) = split;
    $process_details{$process_id} = { memory  => $memory_size,
                                      cpu     => $cpu_time,
                                      program => $program_name };
    undef $sort{memory}{$memory_size}{$process_id};
    undef $sort{cpu}{$cpu_time}{$process_id};
    undef $sort{program}{$program_name}{$process_id};
}
say 'By pid:';
say join ', ', $_, @{ $process_details{$_} }{@COLUMNS}
    for sort { $a <=> $b } keys %process_details;
for my $column (@COLUMNS) {
    say "\nBy $column:";
    my $cmp = $sort_strings{$column} || sub { $a <=> $b };
    for my $value (sort $cmp keys %{ $sort{$column} }) {
        my @pids = keys %{ $sort{$column}{$value} };
        say join ', ', $_, @{ $process_details{$_} }{@COLUMNS}
            for @pids;
    }
}
__DATA__
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
But if the data aren't really large and the sorting isn't time critical, just sorting the whole array of arrays by a given column is much easier to write and read:
#!/usr/bin/perl
use strict;
use feature qw{ say };
use warnings;
use enum qw( PID MEMORY CPU PROGRAM );
my @COLUMN_NAMES = qw( pid memory cpu program );
my %sort_strings = ((PROGRAM) => 1);
my @tasks;
push @tasks, [ split ] while <DATA>;
for my $column_index (0 .. $#COLUMN_NAMES) {
    say "\nBy $COLUMN_NAMES[$column_index]:";
    my $sort = $sort_strings{$column_index}
        ? sub { $a->[$column_index] cmp $b->[$column_index] }
        : sub { $a->[$column_index] <=> $b->[$column_index] };
    say "@$_" for sort $sort @tasks;
}
__DATA__
...
You need to install the enum distribution.

I can't figure out how to then sort my resulting %tasks hash by a specific column
You can't sort a hash. You need to convert each of your input rows into a hash (which you're doing successfully) and then store all of those hashes in an array. You can then print the contents of the array in a sorted order.
This seems to do what you want:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my @cols = qw[process_id memory_size cpu_time program_name];
@ARGV or die "Usage: $0 [sort_order]\n";
my $sort = lc shift;
if (! grep { $_ eq $sort } @cols ) {
    die "$sort is not a valid sort order.\n"
        . "Valid sort orders are: ", join('/', @cols), "\n";
}
my @data;
while (<DATA>) {
    chomp;
    my %rec;
    @rec{@cols} = split;
    push @data, \%rec;
}
if ($sort eq $cols[-1]) {
    # Do a string sort
    for (sort { $a->{$sort} cmp $b->{$sort} } @data) {
        say join ' ', @{$_}{@cols};
    }
} else {
    # Do a numeric sort
    for (sort { $a->{$sort} <=> $b->{$sort} } @data) {
        say join ' ', @{$_}{@cols};
    }
}
__DATA__
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
I've used the built-in DATA filehandle to make the code simpler. You would need to replace that with some code to read from an external file.
I've used a hash slice to simplify reading the data into a hash.
The column that you want to sort by is passed into the program as a command-line argument.
Note that you have to sort the last column (the program name) using string comparison and all other columns using numeric comparison.
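The string-versus-numeric choice can also be kept in one place with a small lookup table. This is a sketch only: %is_text and sort_records are names invented here, and the records are assumed to be hash references keyed by column name, as in the program above.

```perl
use strict;
use warnings;

# Hypothetical lookup table: columns that need string comparison.
my %is_text = (program_name => 1);

# Return the records sorted by $column, choosing cmp or <=> per column.
sub sort_records {
    my ($column, @records) = @_;
    if ($is_text{$column}) {
        return sort { $a->{$column} cmp $b->{$column} } @records;
    }
    return sort { $a->{$column} <=> $b->{$column} } @records;
}

my @data = (
    { process_id => 4, program_name => 'events' },
    { process_id => 1, program_name => 'init' },
);
print join(' ', map { $_->{process_id} } sort_records('process_id', @data)), "\n";
print join(' ', map { $_->{program_name} } sort_records('program_name', @data)), "\n";
```

Adding another text column would then only mean adding a key to %is_text.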

This decides how to sort using the first argument the script receives.
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
open my $fh, '<', 'task_file' or die "Cannot open task_file: $!\n";
my @tasks;
my %sort_by = (
    process_id   => 0,
    memory_size  => 1,
    cpu_time     => 2,
    program_name => 3
);
# Fall back to column 0 when the argument is missing or unknown
my $sort_by = $sort_by{ $ARGV[0] // '' } // 0;
while (<$fh>) {
    push @tasks, [split /\s+/, $_];
}
@tasks = sort {
    if ($b->[$sort_by] =~ /^[0-9]+$/ ) {
        $b->[$sort_by] <=> $a->[$sort_by];   # numeric columns: descending
    } else {
        $a->[$sort_by] cmp $b->[$sort_by];   # text columns: ascending
    }
} @tasks;
for (@tasks) {
    say join ' ', @{$_};
}

Referencing an element in a 2D array in Perl

I have the following code which reads in a 6x6 array from STDIN and saves it as an array of anonymous arrays. I am trying to print out each element with $arr[i][j], but the code below isn't working. It just prints out the first element over and over. How am I not accessing the element correctly?
#!/user/bin/perl
my $arr_i = 0;
my @arr = ();
while ($arr_i < 6){
    my $arr_temp = <STDIN>;
    my @arr_t = split / /, $arr_temp;
    chomp @arr_t;
    push @arr,\@arr_t;
    $arr_i++;
}
foreach my $i (0..5){
    foreach my $j (0..5){
        print $arr[i][j] . "\n";
    }
}

i and j are not the same as the variables you declared in the foreach lines. Change:
print $arr[i][j] . "\n";
to:
print $arr[$i][$j] . "\n";
warnings alerted me to this issue. You should add these lines to all your Perl code:
use warnings;
use strict;
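As for why the original prints the first element over and over: without strict, the barewords i and j are taken as the strings "i" and "j", and a non-numeric string used as an array index evaluates to 0. A small sketch of that effect, where the helper cell is made up for the demonstration:

```perl
use strict;
use warnings;

my @arr = ([1, 2], [3, 4]);

# Hypothetical helper: index the matrix with arbitrary scalars, silencing
# the "isn't numeric" warning a non-numeric string would otherwise trigger.
sub cell {
    my ($i, $j) = @_;
    no warnings 'numeric';
    return $arr[$i][$j];
}

print cell('i', 'j'), "\n";   # strings numify to 0, so this is $arr[0][0]
print cell(1, 1), "\n";       # real indices reach the intended cell
```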

To demonstrate the Perlish mantra that there's "more than one way to do it":
use 5.10.0; # so can use "say"
use strict;
use warnings qw(all);
sub get_data {
    my ($cols, $rows) = @_;
    my ($line, @rows);
    my $i;
    for ($i = 1; $i <= $rows and $line = <DATA>; $i++) {
        chomp $line;
        my $cells = [ split ' ', $line ];
        die "Row $i had ", scalar(@$cells), " instead of $cols" if @$cells != $cols;
        push @rows, $cells;
    }
    die "Not enough rows, got ", $i - 1, "\n" if $i != $rows + 1;
    \@rows;
}
sub print_data {
    my ($cols, $rows, $data) = @_;
    for (my $i = 0; $i < $rows; $i++) {
        for (my $j = 0; $j < $cols; $j++) {
            say $data->[$i][$j];
        }
    }
}
my $data = get_data(6, 6);
print_data(6, 6, $data);
__DATA__
1 2 3 4 5 6
a b c d e f
6 5 4 3 2 1
f e d c b a
A B C D E F
7 8 9 10 11 12
Explanation:
if we use say, that avoids unsightly print ..., "\n"
get_data is a function that can be called and/or reused, instead of just being part of the main script
get_data knows what data-shape it expects and throws an error if it doesn't get it
[ ... ] creates an anonymous array and returns a reference to it
get_data returns an array-reference so data isn't copied
print_data is a function too
both functions use a conventional for loop instead of making lists of numbers, which in Perl 5 needs to allocate memory
There is also a two-line version of the program (with surrounding bits, and test data):
use 5.10.0; # so can use "say"
my @lines = map { [ split ' ', <DATA> ] } (1..6);
map { say join ' ', map qq{"$_"}, @$_ } @lines;
__DATA__
1 2 3 4 5 6
a b c d e f
6 5 4 3 2 1
f e d c b a
A B C D E F
7 8 9 10 11 12
Explanation:
using map is the premier way to iterate over lists of things where you don't need to know how many you've seen (otherwise, a for loop is needed)
the adding of " around the cell contents is only to prove they've been processed. Otherwise the second line could just be: map { say join ' ', @$_ } @lines;

perl: split array into matches and non-matches

I know you can use grep to filter an array based on a boolean condition. However, I want to get 2 arrays back: 1 for elements that match the condition and 1 for elements that fail. For example, instead of this, which requires iterating over the list twice:
my @arr = (1,2,3,4,5);
my @evens = grep { $_%2==0 } @arr;
my @odds = grep { $_%2!=0 } @arr;
I'd like something like this:
my @arr = (1,2,3,4,5);
my ($evens, $odds) = magic { $_%2==0 } @arr;
Where magic returns 2 arrayrefs or something. Does such an operator exist, or do I need to write it myself?

It's probably most succinct to simply push each value to the correct array in a for loop
use strict;
use warnings 'all';
my @arr = 1 .. 5;
my ( $odds, $evens );
push @{ $_ % 2 ? $odds : $evens }, $_ for @arr;
print "@$_\n" for $odds, $evens;
output
1 3 5
2 4
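If the block-style call from the question is wanted, a sub with a (&@) prototype gives that syntax with core Perl alone. This is a sketch; partition is a name chosen here, not a built-in, and it returns the matches first and the non-matches second.

```perl
use strict;
use warnings;

# Hypothetical helper (not a built-in): one pass over the list, returning
# two array references, matches first and non-matches second.
sub partition(&@) {
    my ($test, @items) = @_;
    my (@pass, @fail);
    for (@items) {
        # $_ is aliased to the current item, so the block can test it directly
        if ($test->()) { push @pass, $_ } else { push @fail, $_ }
    }
    return (\@pass, \@fail);
}

my @arr = (1, 2, 3, 4, 5);
my ($evens, $odds) = partition { $_ % 2 == 0 } @arr;
print "@$evens\n";   # 2 4
print "@$odds\n";    # 1 3 5
```

The prototype has to be visible before the call site, so the sub is defined (or predeclared) above its first use.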

List::UtilsBy::extract_by is like grep but it modifies the input list:
use List::UtilsBy 'extract_by';
my @arr = (1,2,3,4,5);
my @evens = @arr;
my @odds = extract_by { $_ % 2 } @evens;
print "@evens\n@odds\n";
Output:
2 4
1 3 5
There is also List::UtilsBy::partition_by:
use List::UtilsBy 'partition_by';
my %parts = partition_by { $_ % 2 } @arr;
@evens = @{$parts{0}}; # (2,4)
@odds = @{$parts{1}}; # (1,3,5)

How to create multiple multi-dimensional arrays from STDIN in perl?

So I'm getting input from STDIN like:
1 2 3
4 5 6
7 6 3

4 3 2
2 3 5
2 5 1
Blank lines separate the matrices, so the above input should create two multi-dimensional arrays...I know how to create one (code below), but how do I create multiple ones depending on how many blank lines the user inputs?
I won't know how many arrays the user wants to create so how can I dynamically create arrays depending on the blank lines in the user input?
my @arrayrefs;
while(<>)
{
    chomp;
    my @data = split(/\s+/,$_);
    push @arrayrefs, \@data;
}
for $ref (@arrayrefs){
    print "[@$ref] \n";
}

With your data, I'd say using paragraph mode for the input stream would be a good idea. That is basically setting the input record separator $/ to "\n\n", but in this case we will use "", which is a bit more magical in that it is flexible with extra blank lines.
use strict;
use warnings;
use Data::Dumper;
sub parse_data {
    my @matrix = map { [ split / / ] } split /\n/, shift;
    return \@matrix;
}
my @array;
$/ = "";
while (<>) {
    push @array, parse_data($_);
}
print Dumper \@array;
The map/split statement is not as complex as it looks. Reading from right to left:
shift an argument from the argument list @_
split that argument on newline
take each of those split arguments (i.e. map them), split them again on space, and put the result inside an anonymous array, using brackets [ ].
All done.
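To try paragraph mode without typing input, an in-memory filehandle can stand in for STDIN. The sketch below is a variation on the approach above, not the answer's exact code: read_matrices is a made-up name, $/ is set with local so the change is undone when the sub returns, and split ' ' is used instead of split / / so that runs of spaces don't produce empty cells.

```perl
use strict;
use warnings;

# Hypothetical helper: read blank-line-separated matrices from a filehandle.
sub read_matrices {
    my ($fh) = @_;
    local $/ = "";                    # paragraph mode, restored on return
    my @matrices;
    while (my $block = <$fh>) {
        # split ' ' skips leading whitespace and collapses runs of spaces
        push @matrices, [ map { [ split ' ' ] } split /\n/, $block ];
    }
    return @matrices;
}

# An in-memory filehandle stands in for STDIN here.
my $input = "1 2 3\n4 5 6\n\n\n7 8 9\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my @matrices = read_matrices($fh);
print scalar(@matrices), " matrices\n";
print "@$_\n" for @{ $matrices[0] };
```

The two consecutive blank lines in $input still yield just two matrices, which is the flexibility the empty-string setting provides.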

It won't win any Code Golf competition, but it does seem to work:
$ cat data
1 2 3
4 5 6
7 6 3

4 3 2
2 3 5
2 5 1
$ cat xx.pl
#!/usr/bin/env perl
use strict;
use warnings;
my #matrices;
my #matrix;
sub print_matrices()
{
    print "Matrix dump\n";
    foreach my $mref (@matrices)
    {
        foreach my $rref (@{$mref})
        {
            foreach my $num (@{$rref})
            {
                print " $num";
            }
            print "\n";
        }
        print "\n";
    }
}
while(<>)
{
    chomp;
    if ($_ eq "")
    {
        my(@result) = @matrix;
        push @matrices, \@result;
        @matrix = ();
    }
    else
    {
        my @row = split(/\s+/,$_);
        push @matrix, \@row;
    }
}
# In case the last line of the file is not a blank line
if (scalar(@matrix) != 0)
{
    my(@result) = @matrix;
    push @matrices, \@result;
    @matrix = ();
}
print_matrices();
$ perl xx.pl data
Matrix dump
1 2 3
4 5 6
7 6 3

4 3 2
2 3 5
2 5 1
$

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @arrays = [];
while (<>) {
    if (my @array = /(\d+)/g) {
        # push needs a real array; pushing onto a bare reference fails on Perl 5.24+
        push @{ $arrays[-1] }, \@array;
    } else {
        push @arrays, [];
    }
}
$Data::Dumper::Indent = 0;
printf("%s\n", Dumper $arrays[0]);
printf("%s\n", Dumper $arrays[1]);
Output:
$VAR1 = [['1','2','3'],['4','5','6'],['7','6','3']];
$VAR1 = [['4','3','2'],['2','3','5'],['2','5','1']];