Array manipulation in Perl - arrays

The Scenario is as follows:
I have a dynamically changing text file which I'm passing to a variable to capture a pattern that occurs throughout the file. It looks something like this:
my @array1;
my $file = `cat <file_name>.txt`;
if (@array1 = ( $file =~ m/<pattern_match>/g) ) {
print "@array1\n";
}
The array looks something like this:
10:38:49 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54 10:38:51 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54
From the above array1 output, the pattern of the array is something like this:
T1 P1 t1(1) t1(2)...t1(25) T2 P2 t2(1) t2(2)...t2(25) so on and so forth
Currently, the /g modifier in the regex returns a set of values that occurs only twice (only because the txt file contains this pattern that many times). How many times the pattern occurs will change depending on the file name that I plan to pass dynamically.
What I intend to achieve:
The final result should be a csv file that contains these values in the following format:
T1,P1,t1(1),t1(2),...,t1(25)
T2,P2,t2(1),t2(2),...,t2(25)
so on and so forth
For instance: My final CSV file should look like this:
10:38:49,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
10:38:51,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
The delimiter for this pattern is T1, which is a time in the format \d\d:\d\d:\d\d
Example: 10:38:49, 10:38:51 etc
What I have tried so far:
use Data::Dumper;
use List::MoreUtils qw(part);
my $partitions = 2;
my $i = 0;
print Dumper part {$partitions * $i++ / @array1} @array1;
In this particular case, my $partitions = 2; works because the pattern occurs only twice in the txt file, so I'm splitting the array into two. However, as mentioned earlier, the number of occurrences changes with the txt file I use.
The Question:
How can I make this code more generic to achieve my final goal of splitting the array into multiple equal sized arrays without losing the contents of the original array, and then converting these mini-arrays into one single CSV file?
If there is any other workaround for this other than array manipulation, please do let me know.
Thanks in advance.
PS: I considered a hash of hashes and an array of hashes, but that kind of data structure did not seem to be a healthy solution for the problem I'm facing right now.

As far as I can tell, all you need is splice, which will work fine as long as you know the record size and it's constant.
The data you showed has 52 fields, but the description of it requires 27 fields per record. It looks like each line has T, P, and t1 .. t24, rather than ending at t25.
Here's how it looks if I split the data into 26-element chunks:
use strict;
use warnings 'all';
my @data = qw/
10:38:49 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54 10:38:51 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54
/;
while ( @data ) {
my @set = splice @data, 0, 26;
print join(',', @set), "\n";
}
output
10:38:49,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
10:38:51,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
If you wanted to use List::MoreUtils instead of splice, its natatime function returns an iterator that will do the same thing as the splice above.
Like this
use List::MoreUtils qw/ natatime /;
my $iter = natatime 26, @data;
while ( my @set = $iter->() ) {
print join(',', @set), "\n";
}
The output is identical to that of the program above
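If the record length might differ between files, one option that avoids hard-coding 26 is to start a new record at every token that looks like a time stamp, since the question says T1 (format \d\d:\d\d:\d\d) delimits each record. This is a minimal sketch under that assumption; group_by_time and write_csv are hypothetical names, and the output goes to an in-memory string so the example is self-contained.

```perl
use strict;
use warnings;

# Split a flat list of tokens into records, starting a new record at every
# token that looks like a time stamp (hh:mm:ss). Tokens that appear before
# the first time stamp are dropped.
sub group_by_time {
    my @tokens = @_;
    my @records;
    for my $tok (@tokens) {
        push @records, [] if $tok =~ /^\d\d:\d\d:\d\d$/;
        push @{ $records[-1] }, $tok if @records;
    }
    return @records;
}

# Print each record as one comma-separated line to the given filehandle.
sub write_csv {
    my ($fh, @records) = @_;
    print {$fh} join(',', @$_), "\n" for @records;
    return;
}

my @data = qw(
    10:38:49 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54
    10:38:51 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54
);

my @records = group_by_time(@data);

# Written to an in-memory string here; for a real file open it with '>'
# and a file name of your choice instead.
my $csv = '';
open my $out, '>', \$csv or die "Cannot open in-memory file: $!";
write_csv($out, @records);
close $out;
print $csv;
```

With the slurped file from the question, group_by_time(split ' ', $file) would be one way to get the tokens, assuming the file contains only these whitespace-separated values.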
Note
It is very wrong to start a new shell process just to use cat to read a file. The standard method is to undefine the input record separator $/ like this
my $file = do {
open my $fh, '<', '<file_name>.txt' or die "Unable to open file for input: $!";
local $/;
<$fh>;
};
Or if you prefer you could use File::Slurper like this
use File::Slurper qw/ read_binary /;
my $file = read_binary '<file_name>.txt';
although you will probably have to install it as it is not a core module

Related

Why after the while loop I am only getting last row value?

These are the files I am reading:
#Log1
Time Src_id Des_id Address
0 34 56 x9870
2 36 58 x9872
4 38 60 x9874
6 40 62 x9876
8 42 64 x9878
#Log2
Time Src_id Des_id Address
1 35 57 x9871
3 37 59 x9873
5 39 61 x9875
7 41 63 x9877
9 43 65 x9879
This is the code I wrote, where I read each file line by line and then split each line:
#!/usr/bin/perl
use warnings;
use strict;
my $log1_file = "log1.log";
my $log2_file = "log2.log";
open(IN1, "<$log1_file" ) or die "Could not open file $log1_file: $!";
open(IN2, "<$log2_file" ) or die "Could not open file $log2_file: $!";
my $i_d1;
my $i_d2;
my @fields1;
my @fields2;
while (my $line = <IN1>) {
@fields1 = split " ", $line;
}
while (my $line = <IN2>) {
@fields2 = split " ", $line;
}
print "@fields1\n";
print "@fields2\n";
close IN1;
close IN2;
Output I am getting
8 42 64 x9878
9 43 65 x9879
Output Desired
Time Src_id Des_id Address
0 34 56 x9870
2 36 58 x9872
4 38 60 x9874
6 40 62 x9876
8 42 64 x9878
9 43 65 x9879
Time Src_id Des_id Address
1 35 57 x9871
3 37 59 x9873
5 39 61 x9875
7 41 63 x9877
9 43 65 x9879
If I use push(@fields1, split " ", $line); I get output like this:
Time Src_id Des_id Address 0 34 56 x9870 B 36 58 x9872 D 38 60 x9874 F 40 62 x9876 H 42 64 x9878
It should print the whole array, so why is it printing just the last row?
Also, after this I need to compare the Time fields of both logs and print the rows in sequence, but I don't know how to run through both arrays simultaneously in a while loop.
Please suggest a standard way without any modules, because I need to run this on someone else's server.
The following code demonstrates how to read and print the log files.
(OP does not specify why he splits lines into fields)
use strict;
use warnings;
use feature 'say';
my $fname1 = 'log1.txt';
my $fname2 = 'log2.txt';
my $div = "\t";
my $file1 = read_file($fname1);
my $file2 = read_file($fname2);
print_file($file1,$div);
print_file($file2,$div);
sub read_file {
my $fname = shift;
my @data;
open my $fh, '<', $fname
or die "Couldn't read $fname: $!";
while( <$fh> ) {
chomp;
next if /^#Log/;
push @data, [split];
}
close $fh;
return \@data;
}
sub print_file {
my $data = shift;
my $div = shift;
say join($div,@{$_}) for @{$data};
}
Output
Time Src_id Des_id Address
0 34 56 x9870
2 36 58 x9872
4 38 60 x9874
6 40 62 x9876
8 42 64 x9878
Time Src_id Des_id Address
1 35 57 x9871
3 37 59 x9873
5 39 61 x9875
7 41 63 x9877
9 43 65 x9879
Let's assume that the OP wants to merge the two files into one, with lines sorted on the Time field:
read the files into a %data hash with the Time field as key
print the header (@fields)
print the hash values sorted on the Time key
use strict;
use warnings;
use feature 'say';
my(@fields,%data);
my $fname1 = 'log1.txt';
my $fname2 = 'log2.txt';
read_data($fname1);
read_data($fname2);
say join("\t",@fields);
say join("\t",@{$data{$_}}) for sort { $a <=> $b } keys %data;
sub read_data {
my $fname = shift;
open my $fh, '<', $fname
or die "Couldn't open $fname: $!";
while( <$fh> ) {
next if /^#Log/;
if( /^Time/ ) {
@fields = split;
} else {
my @line = split;
$data{$line[0]} = \@line;
}
}
close $fh;
}
Output
Time Src_id Des_id Address
0 34 56 x9870
1 35 57 x9871
2 36 58 x9872
3 37 59 x9873
4 38 60 x9874
5 39 61 x9875
6 40 62 x9876
7 41 63 x9877
8 42 64 x9878
9 43 65 x9879
Because @fields1 and @fields2 are overwritten on every loop iteration, only the last line survives. You need this:
while(my $line = <IN1>){
my @tmp = split(" ", $line);
push(@fields1, \@tmp);
}
foreach my $item (@fields1){
print("@{$item}\n");
}
Then @fields1 contains references pointing to the split arrays.
The final @fields1 looks like:
@fields1 = (
<ref> ----> ["Time", "Src_id", "Des_id", "Address"]
<ref> ----> ["0", "34", "56", "x9870"]
<ref> ----> ["2", "36", "58", "x9872"]
...
)
The print will print:
Time Src_id Des_id Address
0 34 56 x9870
2 36 58 x9872
4 38 60 x9874
6 40 62 x9876
8 42 64 x9878
And I guess it would be better if you do chomp($line).
But I'd simply do push(@fields1, $line), and split each array item at the comparison stage.
To compare the contents of the two files, I would personally use two while loops to read them into two arrays, just as you have done, and then do the comparison in a single for or foreach loop.
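Following that suggestion, here is a sketch with no modules that reads each log into its own array and then walks both arrays in a single loop, always taking the row with the smaller Time, so the output comes out merged in time order. It assumes both logs are already sorted by Time, each starts with one header line, and any #Log label lines should be skipped; read_log and merge_by_time are hypothetical names, and the logs are held in strings only to keep the example self-contained.

```perl
use strict;
use warnings;

# Read a log into (header, rows), where each row is an array ref of fields.
sub read_log {
    my ($fh) = @_;
    my ($header, @rows);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#Log/ or $line !~ /\S/;    # skip labels and blank lines
        my @f = split ' ', $line;
        if (!defined $header) { $header = \@f; next }  # first real line is the header
        push @rows, \@f;
    }
    return ($header, \@rows);
}

# Merge two row lists that are each sorted on column 0 (Time).
sub merge_by_time {
    my ($x, $y) = @_;
    my ($i, $j) = (0, 0);
    my @merged;
    while ($i < @$x and $j < @$y) {
        if ($x->[$i][0] <= $y->[$j][0]) { push @merged, $x->[$i++] }
        else                            { push @merged, $y->[$j++] }
    }
    # one list is exhausted; append whatever is left of the other
    push @merged, @$x[$i .. $#$x], @$y[$j .. $#$y];
    return @merged;
}

# With real files, open them by name instead, e.g.
# open my $fh1, '<', 'log1.log' or die "Could not open log1.log: $!";
my $log1 = <<'END';
Time Src_id Des_id Address
0 34 56 x9870
2 36 58 x9872
4 38 60 x9874
6 40 62 x9876
8 42 64 x9878
END
my $log2 = <<'END';
Time Src_id Des_id Address
1 35 57 x9871
3 37 59 x9873
5 39 61 x9875
7 41 63 x9877
9 43 65 x9879
END

open my $fh1, '<', \$log1 or die "Cannot open log1: $!";
open my $fh2, '<', \$log2 or die "Cannot open log2: $!";
my ($header, $rows1) = read_log($fh1);
my (undef,   $rows2) = read_log($fh2);

print join("\t", @$header), "\n";
print join("\t", @$_), "\n" for merge_by_time($rows1, $rows2);
```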
You can merge the log files using paste, and read the resulting merged file one line at a time. This is more elegant and saves RAM. Here is an example of a possible comparison of time1 and time2, writing STDOUT and STDERR into separate files. The example prints into STDOUT all the input fields if time1 < time2 and time1 < 4, otherwise prints a warning into STDERR:
cat > log1.log <<EOF
Time Src_id Des_id Address
0 34 56 x9870
2 36 58 x9872
4 38 60 x9874
6 40 62 x9876
8 42 64 x9878
EOF
cat > log2.log <<EOF
Time Src_id Des_id Address
1 35 57 x9871
3 37 59 x9873
5 39 61 x9875
7 41 63 x9877
9 43 65 x9879
EOF
# Paste files side by side, skip header, read data lines together, compare and print:
paste log1.log log2.log | \
tail -n +2 | \
perl -lane '
BEGIN {
for $file_num (1, 2) { push @col_names, map { "$_$file_num" } qw( time src_id des_id address ) }
}
my %val;
@val{ @col_names } = @F;
if ( $val{time1} < $val{time2} and $val{time1} < 4) {
print join "\t", @val{ @col_names};
} else {
warn "not found: @val{ qw( time1 time2 ) }";
}
' 1>out.tsv 2>out.log
Output:
% cat out.tsv
0 34 56 x9870 1 35 57 x9871
2 36 58 x9872 3 37 59 x9873
% cat out.log
not found: 4 5 at -e line 10, <> line 3.
not found: 6 7 at -e line 10, <> line 4.
not found: 8 9 at -e line 10, <> line 5.
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
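If paste is not available on the target server, a pure-Perl sketch can read the two logs in lockstep, one line from each file per iteration, which is roughly what paste does in the pipeline above. It assumes both files have the same number of lines (the loop stops at the shorter one) and reuses the one-liner's test (time1 < time2 and time1 < 4); pair_lines is a hypothetical name, and the logs are held in strings to keep the example self-contained.

```perl
use strict;
use warnings;

# Read two filehandles in lockstep and return pairs of split lines.
# Stops as soon as either file runs out of lines.
sub pair_lines {
    my ($fh1, $fh2) = @_;
    my @pairs;
    while (defined(my $l1 = <$fh1>) and defined(my $l2 = <$fh2>)) {
        push @pairs, [ [split ' ', $l1], [split ' ', $l2] ];
    }
    return @pairs;
}

my $log1 = <<'END';
Time Src_id Des_id Address
0 34 56 x9870
2 36 58 x9872
4 38 60 x9874
6 40 62 x9876
8 42 64 x9878
END
my $log2 = <<'END';
Time Src_id Des_id Address
1 35 57 x9871
3 37 59 x9873
5 39 61 x9875
7 41 63 x9877
9 43 65 x9879
END

open my $in1, '<', \$log1 or die "Cannot open log1: $!";
open my $in2, '<', \$log2 or die "Cannot open log2: $!";
my (undef, @rows) = pair_lines($in1, $in2);    # drop the header pair

my @kept;
for my $pair (@rows) {
    my ($r1, $r2) = @$pair;
    if ($r1->[0] < $r2->[0] and $r1->[0] < 4) {
        push @kept, join "\t", @$r1, @$r2;
    } else {
        warn "not found: $r1->[0] $r2->[0]\n";
    }
}
print "$_\n" for @kept;
```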

How to increment through the values in a list?

I have repeatedly tried to search for this question, but cannot form a search that produces results that are actually relevant to my question.
I am trying to build a script that parses an SVG file (XML text format file that produces graphic content) looking for specific cues and assigns RGB values. (Each RGB value will be slightly different.)
To do this, I picture multiple incrementing variables (for instance $i, $j & $k) that increment based on triggers found while parsing the text file.
However, the amounts that I need to increment are not "1". Also, the values needed are hexadecimal in form.
I could set up something where the vars are incremented by a given amount, such as +33, but I would also need to convert numbers to hex, figure out how to start over, etc.
A far more versatile, powerful and elegant approach occurs to me, but I don't know how to go about it.
How can I set up these incrementing variables to increment through the values I've set up in an array?
For example, say my RGB potential values are @rgbval = (00,33,66,99,cc,ff). How can I make $i go from one value in this list to the next?
Even better, how could I make $i++ (or something similar) mean "go to the next element's value in @rgbval"?
And assuming this is possible, how would I tell Perl to start over at element [0] after reaching the end of the array?
So you have a string that's the hex representation of a number
my $hex = 'cc';
To do arithmetic on it, you first need to convert that to a number.
my $num = hex($hex);
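Now we can do arithmetic on it. Note that the step between 00, 33, 66, ... is hex 33, which is decimal 51, so write it as a hex literal:
$num += 0x33;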
If we want to convert it back to hex, we can use
$hex = sprintf("%02x", $num);
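As a quick check of the hex formatting above: the six values in the question are the multiples of hex 33 (decimal 51) from 0 to 5, so a short sketch can generate the list with sprintf instead of typing it by hand.

```perl
use strict;
use warnings;

my $step   = 0x33;                                        # hex 33 == decimal 51
my @rgbval = map { sprintf '%02x', $_ * $step } 0 .. 5;   # 0, 51, ..., 255
print "@rgbval\n";                                        # prints: 00 33 66 99 cc ff
```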
$i is usually used for indexes. If that's what you want, you can use the following:
for my $i (0..$#rgbval) {
# Do something with $i and/or $rgbval[$i]...
}
If instead you want $i to take on each value, you can use the following:
for my $i (@rgbval) {
# Do something with $i...
}
But it seems you want a counter that wraps around.
The straightforward solution would be to use an if statement.
my $i = 0;
while (...) {
# Do something with $i and/or $rgbval[$i]...
++$i;
$i = 0 if $i == @rgbval;
}
But I'd use modulo arithmetic.
my $i = 0;
while (...) {
# Do something with $i and/or $rgbval[$i]...
$i = ( $i + 1 ) % @rgbval;
}
Alternatively, you could rotate the array.
while (...) {
# Do something with $rgbval[0]...
push @rgbval, shift(@rgbval);
}
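To get closer to the "make $i++ mean go to the next element" idea from the question, one option (a common Perl idiom, not something the answer above proposes) is a closure that keeps its own index and wraps it with the same modulo trick. make_cycler is a hypothetical name.

```perl
use strict;
use warnings;

# Return a code ref that yields the next element of @values on each call,
# starting over at element 0 after the last one.
sub make_cycler {
    my @values = @_;
    my $i = 0;
    return sub {
        my $v = $values[$i];
        $i = ($i + 1) % @values;
        return $v;
    };
}

my $next_rgb = make_cycler(qw(00 33 66 99 cc ff));
print join(' ', map { $next_rgb->() } 1 .. 8), "\n";   # 00 33 66 99 cc ff 00 33
```

Each call to make_cycler gets its own copy of the list and its own index, so several independent counters (the $i, $j and $k from the question) can run side by side.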
Ikegami, what an excellent bunch of information your response held.
I found three of your proposals too appealing to ignore, and tried to understand them. Your first section was about processing the math described, converting into and out of hex.
I tried to wrestle these steps into a "test of concept" script, along with your suggestion of a modulo reset. (Okay, a "test of understanding of concept".)
For the test script I used an iterator rather than searching for a triggering event, that seemed simpler.
The goal was to have the values increment through the hex numbers listed in the example array, and to start over after the last value.
So I iterated up to ten times, to give the values a chance to start over. After I figured out that I needed to add 51 each time instead of 33 to get those example values, and also had to make the numerical value of my array 51 times larger since I was incrementing by 51, it worked pretty well:
my $num = hex("00");
my @rgbval = qw(a b c d e f);
for my $i (0..10) {
print ( "For i=$i, \$num is " , sprintf("%02x ", $num) , "\n");
$num = ( $num + 51 ) % ( 51 * @rgbval );
}
output:
~\Perlscripts>iterate2.pl
For i=0, $num is 00
For i=1, $num is 33
For i=2, $num is 66
For i=3, $num is 99
For i=4, $num is cc
For i=5, $num is ff
For i=6, $num is 00
For i=7, $num is 33
For i=8, $num is 66
For i=9, $num is 99
For i=10, $num is cc
As far as the non-mathy approach, incrementing through the strings of the array, I understood you to be saying I would need to increment the indices that reference the array values. I did manage to confuse myself with the different iterators and what they were doing, but after a few stumbles, I was able to make this approach work as well:
my @hue = qw(00 33 66 99 cc ff);
my $v = 0;
for my $i (0..10) {
print "\$i=$i, \$hue[$v]=" , $hue[$v] , "\n";
$v = ( $v + 1 ) % @hue;
}
output:
~\Perlscripts>iterate2.pl
$i=0, $hue[0]=00
$i=1, $hue[1]=33
$i=2, $hue[2]=66
$i=3, $hue[3]=99
$i=4, $hue[4]=cc
$i=5, $hue[5]=ff
$i=6, $hue[0]=00
$i=7, $hue[1]=33
$i=8, $hue[2]=66
$i=9, $hue[3]=99
$i=10, $hue[4]=cc
Your last proposed solution, rotating the array with push and shift, seemed perhaps the most novel approach and quite compelling, especially once I realized that if I have a variable that stores the shifted value, that will be the correct value to push next time, around and around.
In this approach I don't even have to worry about starting over after the last value; the changing array takes care of that automatically for me:
my @hue = qw(00 33 66 99 cc ff);
for my $i (0..10) {
my $curval = shift(@hue);
print "\$i=$i, \$curval is $curval \.\.\.And the array is currently: ( @hue )\n";
push(@hue,$curval);
}
output:
~\Perlscripts>iterate2.pl
$i=0, $curval is 00 ...And the array is currently: ( 33 66 99 cc ff )
$i=1, $curval is 33 ...And the array is currently: ( 66 99 cc ff 00 )
$i=2, $curval is 66 ...And the array is currently: ( 99 cc ff 00 33 )
$i=3, $curval is 99 ...And the array is currently: ( cc ff 00 33 66 )
$i=4, $curval is cc ...And the array is currently: ( ff 00 33 66 99 )
$i=5, $curval is ff ...And the array is currently: ( 00 33 66 99 cc )
$i=6, $curval is 00 ...And the array is currently: ( 33 66 99 cc ff )
$i=7, $curval is 33 ...And the array is currently: ( 66 99 cc ff 00 )
$i=8, $curval is 66 ...And the array is currently: ( 99 cc ff 00 33 )
$i=9, $curval is 99 ...And the array is currently: ( cc ff 00 33 66 )
$i=10, $curval is cc ...And the array is currently: ( ff 00 33 66 99 )
Most educational and helpful! Thanks so much.

add each column to array, not just whole line - PERL

I am writing a Perl script and am currently working on a subroutine to sum all values of an array. Currently, my code only reads in each line and stores the entire line in each array element. I need each individual number stored in its own element.
Here's a sample of my data:
50 71 55 93 115
45 76 49 88 102
59 78 53 96 145
33 65 39 82 100
54 77 56 98 158
Here's my code:
my @array;
#bring in each line and store into array called 'array'
open(my $fh, "<", "score")
or die "Failed to open file: $!\n";
while(<$fh>) {
chomp;
push @array, $_;
}
close $fh;
When I call my subroutine to sum the values of the array, my result is 241. That is the sum of each of the first numbers in each line.
Any help or suggestions?
So, you want to add all the values inside an array. Easy, but in your code you are adding strings of values instead of the values themselves.
With push @array, $_; you are creating an array of the lines in the file score.
Try (with use Data::Dumper; at the top of the script):
print Dumper(\@array);
You will see output like this:
$VAR1 = [
'50 71 55 93 115',
'45 76 49 88 102',
'59 78 53 96 145',
'33 65 39 82 100',
'54 77 56 98 158'
];
So when you are adding the values, it adds all elements of array:
'50 71 55 93 115' + '45 76 49 88 102' + '59 78 53 96 145' + ... and so on
The moment you put + next to a string, the string is treated as a number: Perl uses only its leading numeric part (here, the first number on each line) and ignores the rest, emitting an "isn't numeric" warning under use warnings. If the string does not start with a number, it is treated as 0.
You should check perlop for more info.
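A small sketch of that numification rule: only the leading number of each line-string counts, which is why summing the whole-line strings gave 241 (50 + 45 + 59 + 33 + 54). The no warnings 'numeric' line only silences the "isn't numeric" warning that Perl would otherwise print for each string.

```perl
use strict;
use warnings;

my @lines = ('50 71 55 93 115', '45 76 49 88 102', '59 78 53 96 145',
             '33 65 39 82 100', '54 77 56 98 158');

my $total = 0;
{
    no warnings 'numeric';       # the strings are not entirely numeric
    $total += $_ for @lines;     # each string numifies to its leading number
}
print "$total\n";                # prints 241
```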
The solution for this problem is to separate the numbers from every line, treat each of them individually and store them inside the array. This can be done simply using:
push #array, split;
Now when you try:
print Dumper(\@array);
It will be like this:
$VAR1 = [
'50',
'71',
'55',
'93',
'115',
'45',
'76',
'49',
'88',
'102',
'59',
'78',
'53',
'96',
'145',
'33',
'65',
'39',
'82',
'100',
'54',
'77',
'56',
'98',
'158'
];
After this just call your subroutine using:
my $total_sum = sum(@array);
print $total_sum,"\n";
and define your subroutine as:
sub sum {
my @nums = @_;
my $total_sum = 0;
$total_sum += $_ foreach(@nums);
return $total_sum;
}
The output will be 1937 as expected.
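As an aside, the core module List::Util already provides a sum function, so the hand-written subroutine is optional. A sketch of the whole fix, reading the sample data from an in-memory string that stands in for the score file:

```perl
use strict;
use warnings;
use List::Util qw(sum);

my $score = <<'END';
50 71 55 93 115
45 76 49 88 102
59 78 53 96 145
33 65 39 82 100
54 77 56 98 158
END

# For the real file: open(my $fh, "<", "score") or die ...
open my $fh, '<', \$score or die "Failed to open data: $!";
my @array;
while (<$fh>) {
    push @array, split;    # split with no arguments splits $_ on whitespace
}
close $fh;

my $total_sum = sum(@array);
print "$total_sum\n";      # 1937
```

Note that List::Util's sum returns undef for an empty list, whereas the subroutine above returns 0.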

comparing multiple column files using python3

input_file1:
a 1 33
a 34 67
a 68 78
b 1 99
b 100 140
c 1 70
c 71 100
c 101 190
input file2:
a 5 23
a 30 72
a 76 78
b 5 30
c 23 88
c 92 98
I want to compare these two files such that for every value of 'a' in file2 the two integers (boundary) fall in the range (boundaries) of 'a' in file1 or between two ranges.
Instead of storing values like this 'a 1 33', you can use a single delimited structure (like 'a:1:33') for your data when writing the file, so that it is also easy to read the data back.
Then you can read each line, split it on the ':' separator, and compare it with the other file easily.
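A sketch of that idea, written in Perl like the rest of this page rather than in Python 3. It assumes both files already use the label:start:end format suggested above, and it checks only the simplest reading of the question: whether each file2 interval lies entirely inside one file1 interval with the same label. The "between two ranges" case is not handled; load_ranges and inside_any are hypothetical names.

```perl
use strict;
use warnings;

# Read "label:start:end" lines into { label => [ [start, end], ... ] }.
sub load_ranges {
    my ($fh) = @_;
    my %ranges;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my ($label, $start, $end) = split /:/, $line;
        push @{ $ranges{$label} }, [ $start, $end ];
    }
    return \%ranges;
}

# True if [start, end] lies inside at least one of the given ranges.
sub inside_any {
    my ($start, $end, $ranges) = @_;
    for my $r (@{ $ranges || [] }) {
        return 1 if $start >= $r->[0] and $end <= $r->[1];
    }
    return 0;
}

my $file1 = <<'END';
a:1:33
a:34:67
a:68:78
b:1:99
b:100:140
c:1:70
c:71:100
c:101:190
END
my $file2 = <<'END';
a:5:23
a:30:72
a:76:78
b:5:30
c:23:88
c:92:98
END

open my $f1, '<', \$file1 or die "Cannot open file1: $!";
my $ref = load_ranges($f1);

open my $f2, '<', \$file2 or die "Cannot open file2: $!";
my @results;
while (my $line = <$f2>) {
    chomp $line;
    next unless length $line;
    my ($label, $start, $end) = split /:/, $line;
    my $ok = inside_any($start, $end, $ref->{$label});
    push @results, $ok;
    print "$line\t", ($ok ? 'inside' : 'not inside a single range'), "\n";
}
```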

How to make 3D array in BASH?

I want to write/make/use a 3D array of [m][n][k] in BASH. From what I understand, BASH does not support arrays that are not 1D.
Any ideas how to do it?
Fake multi-dimensionality with a crafted associative array key:
declare -A ary
for i in 1 2 3; do
for j in 4 5 6; do
for k in 7 8 9; do
ary["$i,$j,$k"]=$((i*j*k))
done
done
done
for key in "${!ary[@]}"; do printf "%s\t%d\n" "$key" "${ary[$key]}"; done | sort
1,4,7 28
1,4,8 32
1,4,9 36
1,5,7 35
1,5,8 40
1,5,9 45
1,6,7 42
1,6,8 48
1,6,9 54
2,4,7 56
2,4,8 64
2,4,9 72
2,5,7 70
2,5,8 80
2,5,9 90
2,6,7 84
2,6,8 96
2,6,9 108
3,4,7 84
3,4,8 96
3,4,9 108
3,5,7 105
3,5,8 120
3,5,9 135
3,6,7 126
3,6,8 144
3,6,9 162
I used sort because the keys of an associative array have no inherent order.
You can use associative arrays if your bash is recent enough (bash 4.0 or later):
unset assoc
declare -A assoc
assoc["1.2.3"]=x
But I'd rather switch to a language that supports multidimensional arrays (e.g. Perl).
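For comparison, a minimal Perl sketch of the same 3x3x3 table from the first answer: nested array references give real [i][j][k] indexing, and Perl creates the inner arrays automatically (autovivification) on first assignment.

```perl
use strict;
use warnings;

my @ary;
for my $i (1 .. 3) {
    for my $j (4 .. 6) {
        for my $k (7 .. 9) {
            $ary[$i][$j][$k] = $i * $j * $k;   # inner arrays are created on demand
        }
    }
}
print "$ary[1][4][7]\n";   # 28
print "$ary[3][6][9]\n";   # 162
```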
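As in C, you can simulate a multidimensional array using an offset.
#! /bin/bash
xmax=100
ymax=150
zmax=80
xymax=$((xmax*ymax))
vol=()
for ((z=0; z<zmax; z++)); do
for ((y=0; y<ymax; y++)); do
for ((x=0; x<xmax; x++)); do
# element [z][y][x] lives at this linear offset (x varies fastest)
((t = z*xymax+y*xmax+x))
if ((vol[t] == 0)); then
# the neighbors one step back along x, y and z are at t-1, t-xmax and t-xymax;
# skip them at the grid edges so the index never goes negative
s=0
((x > 0)) && ((s += vol[t-1]))
((y > 0)) && ((s += vol[t-xmax]))
((z > 0)) && ((s += vol[t-xymax]))
((vol[t] = s))
fi
done
done
done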
