Read space delimited text file into array of hashes [Perl] - arrays

I have a text file that matches the following format:
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
And I need to read it into a data structure in perl that will allow me to sort and print according to any of those columns.
From left to right the columns are process_id, memory_size, cpu_time and program_name.
How can I read a text file with formatting like that in a way that allows me to sort the data structure and print it according to the sort?
My attempt so far:
my %tasks;
sub open_file{
if (open (my $input, "task_file" || die "$!\n")){
print "Success!\n";
while( my $line = <$input> ) {
chomp($line);
($process_id, $memory_size, $cpu_time, $program_name) = split( /\s/, $line, 4);
$tasks{$process_id} = $process_id;
$tasks{$memory_size} = $memory_size;
$tasks{$cpu_time} = $cpu_time;
$tasks{$program_name} = $program_name;
print "$tasks{$process_id} $tasks{$memory_size} $tasks{$cpu_time} $tasks{$program_name}\n";
}
This does print the output correctly, however I can't figure out how to then sort my resulting %tasks hash by a specific column (i.e. process_id, or any other column) and print the whole data structure in a sorted format.

You're storing the values under keys that are equal to the values. Use Data::Dumper to inspect the structure:
use Data::Dumper;
# ...
print Dumper(\%tasks);
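To make the problem concrete, here is a minimal sketch that repeats the question's assignments for two of the sample rows (the rows are hard-coded purely for illustration). Each value is stored under itself, so nothing ties the four fields of one process together:

```perl
use strict;
use warnings;
use Data::Dumper;

my %tasks;
for my $line ('4 0 6 events', '2190 450 0 top') {
    my ($process_id, $memory_size, $cpu_time, $program_name) = split ' ', $line;
    # Same pattern as the question: the key and the value are identical.
    $tasks{$process_id}   = $process_id;
    $tasks{$memory_size}  = $memory_size;
    $tasks{$cpu_time}     = $cpu_time;
    $tasks{$program_name} = $program_name;
}

$Data::Dumper::Sortkeys = 1;   # stable key order in the dump
print Dumper(\%tasks);

# Eight fields went in, but the two 0 values share a single key.
print scalar(keys %tasks), " keys\n";
```

The dump is a flat list of key/value pairs in which it is no longer possible to tell which memory size belonged to which pid.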
You can store the pids in a hash of hashes, using the value of each column as the inner key.
#!/usr/bin/perl
use strict;
use warnings;
use feature qw{ say };
my @COLUMNS = qw( memory cpu program );
my %sort_strings = ( program => sub { $a cmp $b } );
my (%process_details, %sort);
while (<DATA>) {
    my ($process_id, $memory_size, $cpu_time, $program_name) = split;
    $process_details{$process_id} = { memory  => $memory_size,
                                      cpu     => $cpu_time,
                                      program => $program_name };
    undef $sort{memory}{$memory_size}{$process_id};
    undef $sort{cpu}{$cpu_time}{$process_id};
    undef $sort{program}{$program_name}{$process_id};
}
say 'By pid:';
say join ', ', $_, @{ $process_details{$_} }{@COLUMNS}
    for sort { $a <=> $b } keys %process_details;
for my $column (@COLUMNS) {
    say "\nBy $column:";
    my $cmp = $sort_strings{$column} || sub { $a <=> $b };
    for my $value (sort $cmp keys %{ $sort{$column} }
    ) {
        my @pids = keys %{ $sort{$column}{$value} };
        say join ', ', $_, @{ $process_details{$_} }{@COLUMNS}
            for @pids;
    }
}
__DATA__
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
But if the data aren't really large and the sorting isn't time critical, just sorting the whole array of arrays by a given column is much easier to write and read:
#!/usr/bin/perl
use strict;
use feature qw{ say };
use warnings;
use enum qw( PID MEMORY CPU PROGRAM );
my @COLUMN_NAMES = qw( pid memory cpu program );
my %sort_strings = ((PROGRAM) => 1);
my @tasks;
push @tasks, [ split ] while <DATA>;
for my $column_index (0 .. $#COLUMN_NAMES) {
    say "\nBy $COLUMN_NAMES[$column_index]:";
    my $sort = $sort_strings{$column_index}
             ? sub { $a->[$column_index] cmp $b->[$column_index] }
             : sub { $a->[$column_index] <=> $b->[$column_index] };
    say "@$_" for sort $sort @tasks;
}
__DATA__
...
You need to install the enum distribution.

I can't figure out how to then sort my resulting %tasks hash by a specific column
You can't sort a hash. You need to convert each of your input rows into a hash (which you're doing successfully) and then store all of those hashes in an array. You can then print the contents of the array in sorted order.
This seems to do what you want:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my @cols = qw[process_id memory_size cpu_time program_name];
@ARGV or die "Usage: $0 [sort_order]\n";
my $sort = lc shift;
if (! grep { $_ eq $sort } @cols ) {
    die "$sort is not a valid sort order.\n"
      . "Valid sort orders are: ", join('/', @cols), "\n";
}
my @data;
while (<DATA>) {
    chomp;
    my %rec;
    @rec{@cols} = split;
    push @data, \%rec;
}
if ($sort eq $cols[-1]) {
    # Do a string sort
    for (sort { $a->{$sort} cmp $b->{$sort} } @data) {
        say join ' ', @{$_}{@cols};
    }
} else {
    # Do a numeric sort
    for (sort { $a->{$sort} <=> $b->{$sort} } @data) {
        say join ' ', @{$_}{@cols};
    }
}
__DATA__
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
I've used the built-in DATA filehandle to make the code simpler. You would need to replace that with some code to read from an external file.
I've used a hash slice to simplify reading the data into a hash.
The column that you want to sort by is passed into the program as a command-line argument.
Note that you have to sort the last column (the program name) using string comparison and all other columns using numeric comparison.
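As a rough sketch of the "read from an external file" step, the reading and sorting can be split into two small subroutines. The names `read_tasks` and `sort_tasks` are made up for this example, and an in-memory filehandle stands in for the real file so the snippet runs on its own:

```perl
use strict;
use warnings;

my @cols = qw(process_id memory_size cpu_time program_name);

# Read whitespace-delimited rows from a filehandle into an array of hashes.
sub read_tasks {
    my ($fh) = @_;
    my @data;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;      # skip blank lines
        my %rec;
        @rec{@cols} = split ' ', $line; # hash slice, as in the answer
        push @data, \%rec;
    }
    return @data;
}

# Return the rows sorted by one column; only program_name is compared as a string.
sub sort_tasks {
    my ($column, @data) = @_;
    if ($column eq 'program_name') {
        return sort { $a->{$column} cmp $b->{$column} } @data;
    }
    return sort { $a->{$column} <=> $b->{$column} } @data;
}

# Stand-in for the real file; with a file on disk this would be
#   open my $fh, '<', 'task_file' or die "Cannot open task_file: $!\n";
my $sample = "1 4730 1031782 init\n4 0 6 events\n2190 450 0 top\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!\n";
my @tasks = read_tasks($fh);
close $fh;

print join(' ', @{$_}{@cols}), "\n" for sort_tasks('memory_size', @tasks);
```

Keeping the comparison choice inside `sort_tasks` means the caller only has to pass a column name.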

This decides how to sort using the first argument the script receives.
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
open my $fh, '<', 'task_file' or die "Cannot open task_file: $!\n";
my @tasks;
# Map each column name to its index in a row.
my %sort_by = (
    process_id   => 0,
    memory_size  => 1,
    cpu_time     => 2,
    program_name => 3,
);
# Fall back to process_id when the argument is missing or not a known column.
my $sort_by = $sort_by{ $ARGV[0] // '' } // 0;
while (<$fh>) {
    push @tasks, [ split /\s+/, $_ ];
}
# Numeric columns sort in descending order, the program name alphabetically.
@tasks = sort {
    if ($b->[$sort_by] =~ /^[0-9]+$/) {
        $b->[$sort_by] <=> $a->[$sort_by];
    } else {
        $a->[$sort_by] cmp $b->[$sort_by];
    }
} @tasks;
for (@tasks) {
    say join ' ', @{$_};
}

Related

sort hash of array by value

I am trying to sort by value in a HoA where key => [ a, b, c ].
I want to sort alphabetically and have tried and read with no success. I think it's the commas, but please help! Below is a short snippet. The raw data is exactly how it appears in the Data::Dumper print vs. the CLI. I have to use some sort of delimiter, otherwise the CLI output is tedious! Thank you!
use strict;
use warnings;
my ( $lsvm_a,$lsvm_b,%hashA,%hashB );
my $vscincludes = qr/(^0x\w+)\,\w+\,\w+.*/; #/
open (LSMAP_A, "-|", "/usr/ios/cli/ioscli lsmap -vadapter vhost7 -field clientid vtd backing -fmt ," ) or die $!;
while ($lsvm_a = (<LSMAP_A>)) {
chomp($lsvm_a);
next unless $lsvm_a =~ /$vscincludes/;
@{$hashA{$1}} = (split ',', $lsvm_a);
}
open (LSMAP_B, "-|", "/usr/sbin/clcmd -m xxxxxx /usr/ios/cli/ioscli lsmap -vadapter vhost29 -field clientid vtd backing -fmt ," ) or die $!;
while ($lsvm_b = (<LSMAP_B>)) {
chomp($lsvm_b);
next unless $lsvm_b =~ /$vscincludes/;
push @{$hashA{$1}}, (split ',', $lsvm_b);
}
print "\n\nA:";
for my $key ( sort { $hashA{$a} cmp $hashA{$b} } keys %hashA ) {
print "$key => '", join(", ", @{$hashA{$key}}), "'\n";
}
##
print "===\nB:";
foreach my $key ( sort { (@{$hashB{$a}}) cmp (@{$hashB{$b}}) } keys %hashB ) {
print "$key ==> @{$hashB{$key}}\n";
}
print "\n\n__DATA_DUMPER__\n\n";
use Data::Dumper; print Dumper \%hashA; print Dumper \%hashB;
Output
A:
0x00000008 => '0x00000008, atgdb003f_avg01, hdisk10, atgdb003f_ovg01, hdisk96, atgdb003f_pvg01, hdisk68, atgdb003f_rvg01, hdisk8, vtscsi0, atgdb003f_data.5bcd027df10f27bf9a880ce7bc1dd924'
===
B:
0x00000008 => '0x00000008, atgdb003f_avg01, hdisk10, atgdb003f_data, atgdb003f_data.5bcd027df10f27bf9a880ce7bc1dd924, atgdb003f_ovg01, hdisk96, atgdb003f_pvg01, hdisk68, atgdb003f_rvg01, hdisk8'
__DATA_DUMPER__
$VAR1 = {
'0x00000008' => [
'0x00000008',
'atgdb003f_avg01',
'hdisk10',
'atgdb003f_ovg01',
'hdisk96',
'atgdb003f_pvg01',
'hdisk68',
'atgdb003f_rvg01',
'hdisk8',
'vtscsi0',
'atgdb003f_data.5bcd027df10f27bf9a880ce7bc1dd924'
]
};
$VAR1 = {
'0x00000008' => [
'0x00000008',
'atgdb003f_avg01',
'hdisk10',
'atgdb003f_data',
'atgdb003f_data.5bcd027df10f27bf9a880ce7bc1dd924',
'atgdb003f_ovg01',
'hdisk96',
'atgdb003f_pvg01',
'hdisk68',
'atgdb003f_rvg01',
'hdisk8'
]
};
### CLI out ###
###0x00000008,atgdb003f_avg01,hdisk10,atgdb003f_ovg01,hdisk96,atgdb003f_pvg01,hdisk68,atgdb003f_rvg01,hdisk8,vtscsi0,atgdb003f_data.5bcd027df10f27bf9a880ce7bc1dd924
###0x00000008,atgdb003f_avg01,hdisk10,atgdb003f_data,atgdb003f_data.5bcd027df10f27bf9a880ce7bc1dd924,atgdb003f_ovg01,hdisk96,atgdb003f_pvg01,hdisk68,atgdb003f_rvg01,hdisk8
Update: The arrayrefs (hash values) have multiple elements after all, and need to be sorted. Then
for my $key (keys %h) { @{$h{$key}} = sort @{$h{$key}} }
or, more efficiently† (and in the statement modifier form, with less noise but perhaps less clear)
$h{$_} = [ sort @{$h{$_}} ] for keys %h;
The sort by default uses lexicographical sort, as wanted.
Keys are desired to be sorted numerically, but note that while we can rewrite the arrays to make them sorted it is not so with hashes, which are inherently unordered. We can print sorted of course
foreach my $k (sort { $a <=> $b } keys %h) { ... }
This will warn if keys aren't numbers.
† By 56% – 60% in my benchmarks on three different machines, with both v5.16 and v5.30.0
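The keys in the question look like hexadecimal strings (`0x00000008`), which `<=>` would treat as non-numeric and warn about. One possible way around that, assuming every key really is a hex literal, is to convert with the built-in `hex` before comparing. The hash below is made-up sample data shaped like the question's:

```perl
use strict;
use warnings;

# Hypothetical data shaped like the question's hash of arrays.
my %h = (
    '0x0000000a' => [ 'hdisk8',  'atgdb003f_rvg01' ],
    '0x00000008' => [ 'hdisk96', 'atgdb003f_avg01', 'hdisk10' ],
);

# Replace each array with a sorted copy (default string order).
$h{$_} = [ sort @{ $h{$_} } ] for keys %h;

# Print keys in numeric order; hex() accepts the leading "0x".
for my $k (sort { hex($a) <=> hex($b) } keys %h) {
    print "$k => ", join(', ', @{ $h{$k} }), "\n";
}
```

Note that the default string sort puts `hdisk10` before `hdisk96` because it compares character by character, not by the embedded number.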
Original post
I take it that you need to sort a hash which has an arrayref for a value, whereby that arrayref has a single element. Then sort on that, first, element
foreach my $key ( sort { $hashB{$a}->[0] cmp $hashB{$b}->[0] } keys %hashB ) {
    print "$key ==> @{$hashB{$key}}\n";
}
See the cmp operator under Equality operators in perlop. It takes scalars, which are stringwise compared (so the attempted sorting with an array from the question is wrong since cmp would get lengths of those arrays to sort by!)
In my understanding your hash to sort is like
$VAR1 = {
'0x00000008' => [ 'atgdb003f_avg01,hdisk10,atgdb003f_ovg01,...' ],
...
}
where each value is an arrayref with exactly one element.

Perl formatting array output.

I have a small program whose output I am trying to format.
The results get loaded into an array; I am just having trouble
printing the array in a certain format.
#!/usr/bin/perl
use strict ;
use warnings ;
my @first_array ;
my @second_array ;
my @cartesian ;
while (<>) {
    my $first_input = $_ ;
    @first_array = split(' ', $first_input) ;
    last ;
}
while (<>) {
    my $second_input = $_ ;
    @second_array = split(' ', $second_input) ;
    last ;
}
while(my $first=shift(@first_array)) {
    push(@cartesian, $first) ;
    my $second = shift(@second_array) ;
    push(@cartesian, $second ) ;
}
print "This is the merged array: @cartesian\n" ;
When I enter this in, I get this:
$ ./double_while2.pl
1 2 3
mon tue wed
This is the merged array 1 mon 2 tue 3 wed
what I want to print out is :
"1", "mon",
"2", "tue" ,
"3", "wed",
or alternately:
1 => "mon",
2 => "tue",
3 => "wed",
May I suggest a hash, since you are pairing things
my %cartesian;
@cartesian{ @first_array } = @second_array;
print "$_ => $cartesian{$_}\n" for sort keys %cartesian;
A hash slice is used above. See Slices in perldata
The arrays that you build had better pair up just right, or there will be errors.
If the goal is to build a data structure that pairs up elements, that can probably be done directly, without arrays. More information would help to comment on that.
Try using a hash instead:
my %hash;
for my $i (0 .. $#first_array) {
    $hash{$first_array[$i]} = $second_array[$i];
}
Or, if you want the formatted output without using a hash, step through the merged array two elements at a time:
for (my $i = 0; $i < $#cartesian; $i += 2) {
    print "$cartesian[$i] $cartesian[$i + 1]\n";
}
From your question and your code, I suppose that you are a lovely new 'victim' of Perl ~
To merge two arrays of the same length, I suggest using 'map' to simplify your code:
my @cartesian = map { $first_array[$_], $second_array[$_] } 0 .. $#first_array;
To control the print format, you can define a subroutine that covers your different requirements:
sub format_my_array{
my $array_ref = shift;
my $sep = shift;
print $array_ref->[$_],$sep,$array_ref->[$_+1],"\n" for grep {! ($_%2)} 0..$#$array_ref;
}
Now you can try calling your subroutine:
format_my_array(\@cartesian, " => ");
or
format_my_array(\@cartesian, " , ");
Now you get what you want ~
You may have noticed that some intermediate concepts are used in this answer. Don't doubt it: that's exactly what I'm trying to introduce you to ~
May you find great happiness in learning Perl ~
The trick is to go with Perl's strengths instead of fighting against them:
#!/usr/bin/perl
use strict;
use warnings;
# For say()
use 5.010;
my @first_array  = split ' ', <>;
my @second_array = split ' ', <>;
if (@first_array != @second_array) {
    die "Arrays must be the same length\n";
}
my @cartesian = map { $first_array[$_], $second_array[$_] } 0 .. $#first_array;
for (0 .. $#cartesian / 2) {
    say "$cartesian[$_*2] => $cartesian[$_*2+1]";
}
But, it gets much easier still if you use a hash instead of an array for your merged data.
#!/usr/bin/perl
use strict;
use warnings;
# For say()
use 5.010;
my @first_array  = split ' ', <>;
my @second_array = split ' ', <>;
if (@first_array != @second_array) {
    die "Arrays must be the same length\n";
}
my %cartesian;
@cartesian{@first_array} = @second_array;
for (sort keys %cartesian) {
    say "$_ => $cartesian{$_}";
}
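None of the loops shown emits the quoted form the question asks for, and going through a hash loses the input order (and any duplicate keys). As a sketch of one way to get both requested layouts while keeping the original order, the two arrays can be walked by index. The subroutine name `format_pairs` and its `$style` switch are invented for this example:

```perl
use strict;
use warnings;

# Pair up two equal-length arrays and return one formatted line per pair.
# $style is a hypothetical switch: 'arrow' gives  1 => "mon",
# and anything else gives  "1", "mon",
sub format_pairs {
    my ($first, $second, $style) = @_;
    die "Arrays must be the same length\n" if @$first != @$second;
    my @lines;
    for my $i (0 .. $#$first) {
        push @lines, $style eq 'arrow'
            ? qq{$first->[$i] => "$second->[$i]",}
            : qq{"$first->[$i]", "$second->[$i]",};
    }
    return @lines;
}

my @first_array  = split ' ', '1 2 3';
my @second_array = split ' ', 'mon tue wed';
print "$_\n" for format_pairs(\@first_array, \@second_array, 'list');
print "$_\n" for format_pairs(\@first_array, \@second_array, 'arrow');
```

Because the pairs are built by index, the output follows the order in which the values were typed.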

Compare two hash of arrays

I have two arrays, each held in a hash.
Array 1:
my $group = "west";
@{ $my_big_hash{$group} } = (1534,2341,2322,3345,689,3333,4444,5533,3334,5666,6676,3435);
Array 2:
my $element = "Location";
my $group = "west";
@{ $my_tiny_hash{$element}{$group} } = (153,333,667,343);
Now I want to compare
@{ $my_tiny_hash{$element}{$group} }
with
@{ $my_big_hash{$group} }
and check whether all the elements of the tiny hash's array are part of the big hash's array.
As we can see, the tiny hash has only 3-digit elements, and all of them match the big hash if we compare just the first 3 digits.
If the first 3 digits/letters match and all elements are found in the big array, then it's a match; otherwise we have to print the unmatched elements.
It's an array-to-array comparison.
How do we achieve it?
PS: How can this be done without Array::Utils?
The solution using Array::Utils is really simple:
my @minus = array_minus( @{ $my_tiny_hash{$element}{$group} } , @{ $my_big_hash{$group} } );
But it compares all the digits, and I just want to match the first 3 digits.
I hope this is clear.
Thanks
This seems to do what you want.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my (%big_hash, %tiny_hash);
my $group = 'west';
my $element = 'Location';
# Less confusing initialisation!
$big_hash{$group} = [1534,2341,2322,3345,689,3333,4444,5533,3334,5666,6676,3435];
$tiny_hash{$element}{$group} = [153,333,667,343];
# Create a hash where the keys are the first three digits of the numbers
# in the big array. Doesn't matter what the values are.
my %check_hash = map { substr($_, 0, 3) => 1 } @{ $big_hash{$group} };
# grep the small array by checking the elements' existence in %check_hash
my @missing = grep { ! exists $check_hash{$_} } @{ $tiny_hash{$element}{$group} };
say "Missing items: @missing";
Update: Another solution that seems closer to your original code.
my @truncated_big_array = map { substr($_, 0, 3) } @{ $my_big_hash{$group} };
my @minus = array_minus( @{ $my_tiny_hash{$element}{$group} } , @truncated_big_array );
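`array_minus` comes from the non-core Array::Utils module. If installing it isn't an option, which the question's postscript suggests, the same prefix check can be wrapped in a small subroutine using only core Perl. The name `missing_by_prefix` and its `$len` parameter are illustrative and not taken from any library:

```perl
use strict;
use warnings;

# Return the elements of @$small that do not equal the first $len
# characters of any element of @$big.
sub missing_by_prefix {
    my ($small, $big, $len) = @_;
    my %seen = map { ( substr($_, 0, $len) => 1 ) } @$big;
    return grep { !exists $seen{$_} } @$small;
}

my @big  = (1534,2341,2322,3345,689,3333,4444,5533,3334,5666,6676,3435);
my @tiny = (153,333,667,343,698);   # 698 added so something is missing

my @missing = missing_by_prefix(\@tiny, \@big, 3);
print "Missing items: @missing\n";
```

Building the lookup hash once keeps the check to a single pass over each array, rather than scanning the big array for every small element.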
A quick and slightly dirty solution (which extends your existing code).
#!/usr/bin/perl
use strict;
use warnings;
my (%my_big_hash, %my_tiny_hash, @temp_array);
my $group = "west";
@{ $my_big_hash{$group} } = (1534,343,2341,2322,3345,689,3333,4444,5533,3334,5666,6676,3435);
# Keep only the first three characters of each big-array element.
foreach (@{ $my_big_hash{$group} }){
    push @temp_array, substr $_, 0,3;
}
my $element = "Location";
my $group2 = "west";
@{ $my_tiny_hash{$element}{$group2} } = (153,333,667,343,698);
#solution below
my %hash = map { $_ => 1 } @temp_array;
foreach my $search (@{$my_tiny_hash{'Location'}->{west}}){
    if (exists $hash{$search}){
        print "$search exists\n";
    }
    else{
        print "$search does not exist\n";
    }
}
Output:
153 exists
333 exists
667 exists
343 exists
698 does not exist
Also see: https://stackoverflow.com/a/39585810/257635
Edit: As per request using Array::Utils.
foreach (@{ $my_big_hash{$group} }){
    push @temp_array, substr $_, 0,3;
}
my @minus = array_minus( @{ $my_tiny_hash{$element}{$group} } , @temp_array );
print "@minus";
An alternative, using ordered comparison instead of hashes:
my @big  = sort (1534,2341,2322,3345,689,3333,4444,5533,3334,5666,6676,3435);
my @tiny = sort (153,333,667,343,698);
my %result;
for (@tiny) {
    # Both lists are in string order, so drop big elements that sort before $_.
    shift @big while @big and ($big[0] cmp $_) < 0;
    push @{ $result{
        $_ eq substr($big[0], 0, 3)
            ? "found" : "missing" } },
        $_;
}
Contents of %result:
{
'found' => [
153,
333,
343,
667
],
'missing' => [
698
]
}

ID tracking while swapping and sorting other two arrays in perl

#! /usr/bin/perl
use strict;
my (@data,$data,@data1,@diff,$diff,$tempS,$tempE, @ID,@Seq,@Start,@End, @data2);
#my $file=<>;
open(FILE, "< ./out.txt");
while (<FILE>){
chomp $_;
#next if ($line =~/Measurement count:/ or $line =~/^\s+/) ;
#push @data, [split ("\t", $line)] ;
my @data = split('\t');
push(@ID, $data[0]);
push(@Seq, $data[1]);
push(@Start, $data[2]);
push(@End, $data[3]);
# push @$data, [split ("\t", $line)] ;
}
close(FILE);
my %hash = map { my $key = "$ID[$_]"; $key => [ $Start[$_], $End[$_] ] } (0..$#ID);
for my $key ( %hash ) {
print "Key: $key contains: ";
for my $value ($hash{$key} ) {
print " $hash{$key}[0] ";
}
print "\n";
}
for (my $j=0; $j <=$#Start ; $j++)
{
if ($Start[$j] > $End[$j])
{
$tempS=$Start[$j];
$Start[$j]=$End[$j];
$End[$j]=$tempS;
}
print"$tempS\t$Start[$j]\t$End[$j]\n";
}
my @sortStart = sort { $a <=> $b } @Start;
my @sortEnd = sort { $a <=> $b } @End;
#open(OUT,">>./trial.txt");
for(my $i=1521;$i>=0;$i--)
{
print "hey";
my $diff = $sortStart[$i] - $sortStart[$i-1];
print "$ID[$i]\t$diff\n";
}
I have three arrays of the same length: ID with IDs (strings), and Start and End with integer values, all read from a file.
I want to loop through all these arrays while keeping track of the IDs. First I swap the elements of Start and End wherever Start > End; then I have to sort these two arrays for further processing (I subtract consecutive elements, e.g. Start[0] - Start[1], for each item in Start). Sorting changes the positions of the values, and since each ID belongs to exactly one Start and End pair, how can I keep track of my IDs while sorting?
The three arrays ID, Start and End are the ones under consideration.
Here is a small chunk of my input data:
DQ704383 191990066 191990037
DQ698580 191911184 191911214
DQ724878 191905507 191905532
DQ715191 191822657 191822686
DQ722467 191653368 191653339
DQ707634 191622552 191622581
DQ715636 191539187 191539157
DQ692360 191388765 191388796
DQ722377 191083572 191083599
DQ697520 189463214 189463185
DQ709562 187245165 187245192
DQ540163 182491372 182491400
DQ720940 180753033 180753060
DQ707760 178340696 178340726
DQ725442 178286164 178286134
DQ711885 178250090 178250119
DQ718075 171329314 171329344
DQ705091 171062479 171062503
The columns above are ID, Start and End respectively. If Start > End, I swap the values between those two arrays only. After swapping, the descending order may be broken, but I want the values in descending order, together with their corresponding IDs, for the subtraction explained above.
Don't use different arrays, use a hash to keep the related pieces of information together.
#!/usr/bin/perl
use warnings;
use strict;
use enum qw( START END );
my %hash;
while (<>) {
my ($id, $start, $end) = split;
$hash{$id} = [ $start < $end ? ($start, $end)
: ($end, $start) ];
}
my @by_start = sort { $hash{$a}[START] <=> $hash{$b}[START] } keys %hash;
my @by_end = sort { $hash{$a}[END] <=> $hash{$b}[END] } keys %hash;
use Test::More;
is_deeply(\@by_start, \@by_end, 'same');
done_testing();
Moreover, in the data sample you provided, the order of the IDs is the same regardless of which column you sort by.
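If the end goal is the difference between consecutive start positions with the ID still attached, an array of records may be simpler than a hash plus separate key lists. This is only a sketch: the three rows are copied from the question (reordered on purpose), and sorting by descending start and reporting each start minus the next lower one is my reading of the question, which does not state this precisely.

```perl
use strict;
use warnings;

my @rows = (
    'DQ698580 191911184 191911214',
    'DQ704383 191990066 191990037',
    'DQ724878 191905507 191905532',
);

# One record per line: [ id, start, end ], with start <= end after swapping.
my @records = map {
    my ($id, $start, $end) = split;
    ($start, $end) = ($end, $start) if $start > $end;
    [ $id, $start, $end ];
} @rows;

# Sorting whole records keeps each ID attached to its own coordinates.
my @sorted = sort { $b->[1] <=> $a->[1] } @records;

# Difference between each start and the next lower one.
for my $i (0 .. $#sorted - 1) {
    my $diff = $sorted[$i][1] - $sorted[$i + 1][1];
    print "$sorted[$i][0]\t$diff\n";
}
```

Since every record travels as one unit through `sort`, there is no separate ID array that could fall out of step.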

combine hashes from two files into a single file

I have two files containing data like this:
FILE1 contains group numbers (first column) and the frequency (third column) of their switching to another group (second column):
FILE1:
1 2 0.6
2 1 0.6
3 1 0.4
1 3 0.4
2 3 0.2
etc...
FILE2 contains group numbers (first column) and their frequency of occurrence (second column).
FILE2:
1 0.9
2 0.7
3 0.5
etc...
I want to make another file containing FILE2 with the values for each switch from FILE1 like this:
1 0.9 2 0.6 3 0.4 ...
2 0.7 1 0.6 3 0.2 ...
Basically, I want first column to be the group number, second the frequency of its occurrence, then the group they switch to and the frequency of that switch, then next switch all in the same line for that particular group, then next line - group 2 etc.
So I want to read in FILE1, make a hash of arrays for each group with keys being group numbers and the values being the group they switch to and the frequency of that switch. I will have one big array for each group containing subarrays of each group they switch to and frequency. Then I want to make another hash with the same keys as in the first hash but with the numbers from the first column in FILE2 and values from the second column of FILE2. Then I will print out "hash2 key hash2 value hash1 whole array for that key". This is my attempt using Perl:
#!/usr/bin/perl -W
$input1= $ARGV[0];
$input2 = $ARGV[1];
$output = $ARGV[2];
%switches=();
open (IN1, "$input1");
while (<IN1>) {
@tmp = split (/\s+/, $_);
chomp @tmp;
$group = shift @tmp;
$switches{$group} = [@tmp];
push (@{$switches{$group}}, [@tmp]);
}
close IN1;
%groups=();
open (IN2, "$input2");
while (<IN2>) {
chomp $_;
($group, $pop) = split (/\s+/, $_);
$groups{$group} = $pop;
}
close IN2;
open (OUT, ">$output");
foreach $group (keys %groups) {
print OUT "$group $pop @{$switches{$group}}\n"
}
close OUT;
The output I get contains something like:
1 0.1 2 0.1 ARRAY(0x100832330)
2 0.3 5 0.2 ARRAY(0x1008325d0)
So basically:
"group" "one last frequency number" "one last group that that group switches to" "one last switch frequency" "smth like ARRAY(0x100832330)"
I assume I am doing something wrong with pushing all switches into the hash of arrays while in FILE1 and also with dereferencing at the end when I print out.
Please help,
Thanks!
Your %switches hash contains redundant information; just use the push. Also, you need to do more work to print out what you want. Here is your code with minimal changes:
$input1= $ARGV[0];
$input2 = $ARGV[1];
$output = $ARGV[2];
%switches=();
open (IN1, "$input1");
while (<IN1>) {
@tmp = split (/\s+/, $_);
chomp @tmp;
$group = shift @tmp;
push (@{$switches{$group}}, [@tmp]);
}
close IN1;
%groups=();
open (IN2, "$input2");
while (<IN2>) {
chomp $_;
($group, $pop) = split (/\s+/, $_);
$groups{$group} = $pop;
}
close IN2;
open (OUT, ">$output");
foreach $group (sort {$a <=> $b} keys %groups) {
print OUT "$group $groups{$group} ";
# Join each [switch, frequency] pair, and the pairs themselves, with spaces.
print OUT join(' ', map "@$_", @{$switches{$group}});
print OUT "\n";
}
close OUT;
__END__
1 0.9 2 0.6 3 0.4
2 0.7 1 0.6 3 0.2
3 0.5 1 0.4
See also perldoc perldsc and perldoc Data::Dumper
Since each column represents something of value, instead of an array, you should store your data in a more detailed structure. You can do this via references in Perl.
A reference is a pointer to another data structure. For example, you could store your groups in a hash. However, instead of each hash value containing a bunch of numbers separated by spaces, each hash value points to an array that contains the data points for that group. And each of the data points in that array points to a hash whose keys are SWITCH, for the group being switched to, and FREQ, for the frequency.
You could talk about the frequency of the first data point of Group 1 as:
$data{1}->[0]->{FREQ};
This way, you can more easily manipulate your data -- even if you're simply rewriting it into another flat file. You can also use the Storable module to write your data in a way which saves its structure.
#! /usr/bin/env perl
#
use strict;
use feature qw(say);
use autodie;
use warnings;
use Data::Dumper;
use constant {
FILE1 => "file1.txt",
FILE2 => "file2.txt",
};
my %data; # A hash of an array of hashes (superfun!)
open my $fh1, "<", FILE1;
while ( my $line = <$fh1> ) {
chomp $line;
my ( $group, $switch, $frequency ) = split /\s+/, $line;
if ( not exists $data{$group} ) {
$data{$group} = [];
}
push @{ $data{$group} }, { SWITCH => $switch, FREQ => $frequency };
}
close $fh1;
open my $fh2, "<", FILE2;
while ( my $line = <$fh2> ) {
chomp $line;
my ( $group, $frequency ) = split /\s+/, $line;
if ( not exists $data{$group} ) {
$data{$group} = [];
}
push @{ $data{$group} }, { SWITCH => undef, FREQ => $frequency };
}
close $fh2;
say Dumper \%data;
This will give you:
$VAR1 = {
'1' => [
{
'SWITCH' => '2',
'FREQ' => '0.6'
},
{
'SWITCH' => '3',
'FREQ' => '0.4'
},
{
'SWITCH' => undef,
'FREQ' => '0.9'
}
],
'3' => [
{
'SWITCH' => '1',
'FREQ' => '0.4'
},
{
'SWITCH' => undef,
'FREQ' => '0.5'
}
],
'2' => [
{
'SWITCH' => '1',
'FREQ' => '0.6'
},
{
'SWITCH' => '3',
'FREQ' => '0.2'
},
{
'SWITCH' => undef,
'FREQ' => '0.7'
}
]
};
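As a sketch of the two follow-ups mentioned in this answer, the structure can be saved with Storable and later flattened into the one-line-per-group layout the question asks for. The subroutine name `flat_line` is made up, the sample data covers only group 1, and the code assumes each group has exactly one entry whose SWITCH is undefined (the group's own frequency from FILE2):

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# A small structure shaped like the Dumper output above.
my %data = (
    1 => [
        { SWITCH => 2,     FREQ => 0.6 },
        { SWITCH => 3,     FREQ => 0.4 },
        { SWITCH => undef, FREQ => 0.9 },   # the group's own frequency
    ],
);

# Flatten one group to "group own_freq switch freq switch freq ...".
sub flat_line {
    my ($group, $points) = @_;
    my ($own) = grep { !defined $_->{SWITCH} } @$points;
    my @switches = map { ( $_->{SWITCH}, $_->{FREQ} ) }
                   grep { defined $_->{SWITCH} } @$points;
    return join ' ', $group, $own->{FREQ}, @switches;
}

# Round-trip through Storable; the temporary file name is arbitrary.
my (undef, $path) = tempfile(UNLINK => 1);
store(\%data, $path);
my $restored = retrieve($path);

print flat_line($_, $restored->{$_}), "\n"
    for sort { $a <=> $b } keys %$restored;
```

`retrieve` hands back a reference to a copy of the whole nested structure, so the flattening works the same on the restored data as on the original.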
This will do what you need.
I apologize for the lack of analysis, but it is late and I should be in bed.
I hope this helps.
use strict;
use warnings;
my $fh;
my %switches;
open $fh, '<', 'file1.txt' or die $!;
while (<$fh>) {
my ($origin, @switch) = split;
push @{ $switches{$origin} }, \@switch;
}
open $fh, '<', 'file2.txt' or die $!;
while (<$fh>) {
my ($origin, $freq) = split;
my $switches = join ' ', map join(' ', @$_), @{ $switches{$origin} };
print join(' ', $origin, $freq, $switches), "\n";
}
output
1 0.9 2 0.6 3 0.4
2 0.7 1 0.6 3 0.2
3 0.5 1 0.4
Update
Here is a fixed version of your own code that produces similar results. The main problem is that the values in your %switches hash are arrays of arrays, so you have to do two dereferences. I've fixed that by adding @switches, which contains the same contents as the current %switches value, but has strings in place of two-element arrays.
I've also added use strict and use warnings, and declared all your variables properly. The open calls have been changed to the three-argument open with lexical file handles as they should be, and they are now being checked for success. I've changed your split calls, as a simple bare split with no parameters is all you need. And I've removed your @tmp and used proper list assignments instead. Oh, and I've changed the wasteful [@array] to a simple \@array (which wouldn't have worked without declaring variables using my).
I still think my version is better, if only because it's much shorter, and yours prints the groups in random order.
#!/usr/bin/perl
use strict;
use warnings;
my ($input1, $input2, $output) = @ARGV;
my %switches;
open my $in1, '<', $input1 or die $!;
while (<$in1>) {
my ($group, @switches) = split;
push @{ $switches{$group} }, \@switches;
}
close $in1;
my %groups;
open my $in2, '<', $input2 or die $!;
while (<$in2>) {
my ($group, $pop) = split;
$groups{$group} = $pop;
}
close $in2;
open my $out, '>', $output or die $!;
for my $group (keys %groups) {
my $pop = $groups{$group};
my @switches = map "@$_", @{ $switches{$group} };
print $out "$group $pop @switches\n"
}
close $out or die $!;

Resources