I have two files containing data like this:
FILE1 contains group numbers (first column) and the frequency (third column) with which they switch to another group (second column):
FILE1:
1 2 0.6
2 1 0.6
3 1 0.4
1 3 0.4
2 3 0.2
etc...
FILE2 contains group numbers (first column) and their frequency of occurrence (second column).
FILE2:
1 0.9
2 0.7
3 0.5
etc...
I want to make another file containing FILE2 with the values for each switch from FILE1 like this:
1 0.9 2 0.6 3 0.4 ...
2 0.7 1 0.6 3 0.2 ...
Basically, I want the first column to be the group number, the second the frequency of its occurrence, then the group it switches to and the frequency of that switch, then the next switch, all on the same line for that particular group; then the next line for group 2, etc.
So I want to read in FILE1, make a hash of arrays for each group with keys being group numbers and the values being the group they switch to and the frequency of that switch. I will have one big array for each group containing subarrays of each group they switch to and frequency. Then I want to make another hash with the same keys as in the first hash but with the numbers from the first column in FILE2 and values from the second column of FILE2. Then I will print out "hash2 key hash2 value hash1 whole array for that key". This is my attempt using Perl:
#!/usr/bin/perl -W
$input1= $ARGV[0];
$input2 = $ARGV[1];
$output = $ARGV[2];
%switches=();
open (IN1, "$input1");
while (<IN1>) {
@tmp = split (/\s+/, $_);
chomp @tmp;
$group = shift @tmp;
$switches{$group} = [@tmp];
push (@{$switches{$group}}, [@tmp]);
}
close IN1;
%groups=();
open (IN2, "$input2");
while (<IN2>) {
chomp $_;
($group, $pop) = split (/\s+/, $_);
$groups{$group} = $pop;
}
close IN2;
open (OUT, ">$output");
foreach $group (keys %groups) {
print OUT "$group $pop @{$switches{$group}}\n"
}
close OUT;
The output I get contains something like:
1 0.1 2 0.1 ARRAY(0x100832330)
2 0.3 5 0.2 ARRAY(0x1008325d0)
So basically:
"group" "one last frequency number" "one last group that that group switches to" "one last switch frequency" "something like ARRAY(0x100832330)"
I assume I am doing something wrong with pushing all switches into the hash of arrays while reading FILE1, and also with dereferencing at the end when I print out.
Please help,
Thanks!
Your %switches hash contains redundant information; just use the push. Also, you need to do more work to print out what you want. Here is your code with minimal changes:
$input1= $ARGV[0];
$input2 = $ARGV[1];
$output = $ARGV[2];
%switches=();
open (IN1, "$input1");
while (<IN1>) {
@tmp = split (/\s+/, $_);
chomp @tmp;
$group = shift @tmp;
push (@{$switches{$group}}, [@tmp]);
}
close IN1;
%groups=();
open (IN2, "$input2");
while (<IN2>) {
chomp $_;
($group, $pop) = split (/\s+/, $_);
$groups{$group} = $pop;
}
close IN2;
open (OUT, ">$output");
foreach $group (sort {$a <=> $b} keys %groups) {
print OUT "$group $groups{$group}";
for my $aref (@{$switches{$group}}) {
print OUT " @{$aref}";
}
print OUT "\n";
}
close OUT;
__END__
1 0.9 2 0.6 3 0.4
2 0.7 1 0.6 3 0.2
3 0.5 1 0.4
See also perldoc perldsc and perldoc Data::Dumper
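To see where the `ARRAY(0x...)` text comes from, here is a small sketch with hypothetical data in the same shape as %switches. Interpolating `@{...}` dereferences only one level; the inner elements are still references and stringify as `ARRAY(0x...)` unless each one is dereferenced too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data in the shape %switches has after reading FILE1:
# group => [ [switch_to, frequency], [switch_to, frequency], ... ]
my %switches = (
    1 => [ [ 2, 0.6 ], [ 3, 0.4 ] ],
    2 => [ [ 1, 0.6 ], [ 3, 0.2 ] ],
);

# One level of dereferencing: the elements are still array refs,
# so they stringify as ARRAY(0x...).
my $shallow = "@{ $switches{1} }";

# Two levels: dereference each inner array, then join the pieces.
sub flatten_switches {
    my ($aref) = @_;
    return join ' ', map { join ' ', @$_ } @$aref;
}

my $deep = flatten_switches( $switches{1} );

print "shallow: $shallow\n";    # e.g. shallow: ARRAY(0x...) ARRAY(0x...)
print "deep:    $deep\n";       # deep:    2 0.6 3 0.4
```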
Since each column represents something of value, instead of an array, you should store your data in a more detailed structure. You can do this via references in Perl.
A reference is a pointer to another data structure. For example, you could store your groups in a hash. However, instead of each hash value containing a bunch of numbers separated by spaces, each hash value points to an array that contains the data points for that group. Each data point in that array in turn points to a hash whose keys are SWITCH (the group switched to) and FREQ (the frequency).
You could talk about the frequency of the first data point of Group 1 as:
$data{1}->[0]->{FREQ};
This way, you can more easily manipulate your data -- even if you're simply rewriting it into another flat file. You can also use the Storable module to write your data in a way which saves its structure.
#! /usr/bin/env perl
#
use strict;
use feature qw(say);
use autodie;
use warnings;
use Data::Dumper;
use constant {
FILE1 => "file1.txt",
FILE2 => "file2.txt",
};
my %data; # A hash of an array of hashes (superfun!)
open my $fh1, "<", FILE1;
while ( my $line = <$fh1> ) {
chomp $line;
my ( $group, $switch, $frequency ) = split /\s+/, $line;
if ( not exists $data{$group} ) {
$data{$group} = [];
}
push @{ $data{$group} }, { SWITCH => $switch, FREQ => $frequency };
}
close $fh1;
open my $fh2, "<", FILE2;
while ( my $line = <$fh2> ) {
chomp $line;
my ( $group, $frequency ) = split /\s+/, $line;
if ( not exists $data{$group} ) {
$data{$group} = [];
}
push @{ $data{$group} }, { SWITCH => undef, FREQ => $frequency };
}
close $fh2;
say Dumper \%data;
This will give you:
$VAR1 = {
'1' => [
{
'SWITCH' => '2',
'FREQ' => '0.6'
},
{
'SWITCH' => '3',
'FREQ' => '0.4'
},
{
'SWITCH' => undef,
'FREQ' => '0.9'
}
],
'3' => [
{
'SWITCH' => '1',
'FREQ' => '0.4'
},
{
'SWITCH' => undef,
'FREQ' => '0.5'
}
],
'2' => [
{
'SWITCH' => '1',
'FREQ' => '0.6'
},
{
'SWITCH' => '3',
'FREQ' => '0.2'
},
{
'SWITCH' => undef,
'FREQ' => '0.7'
}
]
};
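A small sketch of the `$data{1}->[0]->{FREQ}` access and of the Storable idea, using the core Storable module's `freeze` and `thaw` to serialize the structure to a string and rebuild it in memory (Storable's `store` and `retrieve` do the same with a file). The three rows are taken from the example FILE1; everything else is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Same shape as %data above: group => [ { SWITCH => ..., FREQ => ... }, ... ]
my %data;
for my $row ( [ 1, 2, 0.6 ], [ 1, 3, 0.4 ], [ 2, 1, 0.6 ] ) {
    my ( $group, $switch, $freq ) = @$row;
    push @{ $data{$group} }, { SWITCH => $switch, FREQ => $freq };
}

# Arrows between subscripts are optional, so these are equivalent.
my $first_freq = $data{1}->[0]->{FREQ};
my $same_freq  = $data{1}[0]{FREQ};

# Serialize the whole structure to a string and rebuild a deep copy from it.
my $copy = thaw( freeze( \%data ) );

# Changing the copy leaves the original untouched.
$copy->{1}[0]{FREQ} = 0.5;

print "original: $data{1}[0]{FREQ}, copy: $copy->{1}[0]{FREQ}\n";    # original: 0.6, copy: 0.5
print "group 1 has ", scalar @{ $copy->{1} }, " switches\n";        # group 1 has 2 switches
```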
This will do what you need.
I apologize for the lack of analysis, but it is late and I should be in bed.
I hope this helps.
use strict;
use warnings;
my $fh;
my %switches;
open $fh, '<', 'file1.txt' or die $!;
while (<$fh>) {
my ($origin, @switch) = split;
push @{ $switches{$origin} }, \@switch;
}
open $fh, '<', 'file2.txt' or die $!;
while (<$fh>) {
my ($origin, $freq) = split;
my $switches = join ' ', map join(' ', @$_), @{ $switches{$origin} };
print join(' ', $origin, $freq, $switches), "\n";
}
output
1 0.9 2 0.6 3 0.4
2 0.7 1 0.6 3 0.2
3 0.5 1 0.4
Update
Here is a fixed version of your own code that produces similar results. The main problem is that the values in your %switches hash are arrays of arrays, so you have to dereference twice. I've fixed that by adding @switches, which holds the same contents as the current %switches value but has strings in place of two-element arrays.
I've also added use strict and use warnings, and declared all your variables properly. The open calls have been changed to the three-argument open with lexical file handles, as they should be, and they are now checked for success. I've changed your split calls, as a bare split with no parameters is all you need. I've also removed your @tmp and used proper list assignments instead. Oh, and I've changed the wasteful [@array] to a simple \@array (which wouldn't have worked without declaring the variables with my).
I still think my version is better, if only because it's much shorter, and yours prints the groups in random order.
#!/usr/bin/perl
use strict;
use warnings;
my ($input1, $input2, $output) = @ARGV;
my %switches;
open my $in1, '<', $input1 or die $!;
while (<$in1>) {
my ($group, @switches) = split;
push @{ $switches{$group} }, \@switches;
}
close $in1;
my %groups;
open my $in2, '<', $input2 or die $!;
while (<$in2>) {
my ($group, $pop) = split;
$groups{$group} = $pop;
}
close $in2;
open my $out, '>', $output or die $!;
for my $group (keys %groups) {
my $pop = $groups{$group};
my @switches = map "@$_", @{ $switches{$group} };
print $out "$group $pop @switches\n"
}
close $out or die $!;
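The remark that `\@array` only works once the array is declared with `my` can be checked with a small sketch using two hypothetical input lines. A `my` declaration inside the loop body creates a new array on each pass, so every stored reference is distinct; a single array declared outside the loop (or an undeclared global, as in the original code) is the same array every time, so all stored references end up showing the last line's values. Copying with `[@array]` avoids that, at the cost of a copy.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two hypothetical input lines for the same group.
my @lines = ( "1 2 0.6", "1 3 0.4" );

# Array declared with my inside the loop: a new array on every pass,
# so each stored reference points to a different array.
my %fresh;
for my $line (@lines) {
    my ( $group, @switch ) = split ' ', $line;
    push @{ $fresh{$group} }, \@switch;
}

# One array declared outside the loop: every stored reference
# points to that same array, which ends up holding the last line.
my %shared;
my @reused;
for my $line (@lines) {
    my $group;
    ( $group, @reused ) = split ' ', $line;
    push @{ $shared{$group} }, \@reused;
}

print join( ' | ', map { join ' ', @$_ } @{ $fresh{1} } ),  "\n";    # 2 0.6 | 3 0.4
print join( ' | ', map { join ' ', @$_ } @{ $shared{1} } ), "\n";    # 3 0.4 | 3 0.4
```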
Related
I have a text file that matches the following format:
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
And I need to read it into a data structure in Perl that will allow me to sort and print according to any of those columns.
From left to right the columns are process_id, memory_size, cpu_time and program_name.
How can I read a text file with formatting like that in a way that allows me to sort the data structure and print it according to the sort?
My attempt so far:
my %tasks;
sub open_file{
if (open (my $input, "task_file" || die "$!\n")){
print "Success!\n";
while( my $line = <$input> ) {
chomp($line);
($process_id, $memory_size, $cpu_time, $program_name) = split( /\s/, $line, 4);
$tasks{$process_id} = $process_id;
$tasks{$memory_size} = $memory_size;
$tasks{$cpu_time} = $cpu_time;
$tasks{$program_name} = $program_name;
print "$tasks{$process_id} $tasks{$memory_size} $tasks{$cpu_time} $tasks{$program_name}\n";
}
This does print the output correctly; however, I can't figure out how to then sort my resulting %tasks hash by a specific column (e.g. process_id, or any other column) and print the whole data structure in sorted order.
You're storing the values under keys that are equal to the values. Use Data::Dumper to inspect the structure:
use Data::Dumper;
# ...
print Dumper(\%tasks);
You can store the pids in a hash of hashes, using the value of each column as the inner key.
#!/usr/bin/perl
use strict;
use warnings;
use feature qw{ say };
my @COLUMNS = qw( memory cpu program );
my %sort_strings = ( program => sub { $a cmp $b } );
my (%process_details, %sort);
while (<DATA>) {
my ($process_id, $memory_size, $cpu_time, $program_name) = split;
$process_details{$process_id} = { memory => $memory_size,
cpu => $cpu_time,
program => $program_name };
undef $sort{memory}{$memory_size}{$process_id};
undef $sort{cpu}{$cpu_time}{$process_id};
undef $sort{program}{$program_name}{$process_id};
}
say 'By pid:';
say join ', ', $_, @{ $process_details{$_} }{@COLUMNS}
for sort { $a <=> $b } keys %process_details;
for my $column (@COLUMNS) {
say "\nBy $column:";
my $cmp = $sort_strings{$column} || sub { $a <=> $b };
for my $value (sort $cmp keys %{ $sort{$column} }
) {
my @pids = keys %{ $sort{$column}{$value} };
say join ', ', $_, @{ $process_details{$_} }{@COLUMNS}
for @pids;
}
}
__DATA__
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
But if the data aren't really large and the sorting isn't time critical, just sorting the whole array of arrays by a given column is much easier to write and read:
#!/usr/bin/perl
use strict;
use feature qw{ say };
use warnings;
use enum qw( PID MEMORY CPU PROGRAM );
my @COLUMN_NAMES = qw( pid memory cpu program );
my %sort_strings = ((PROGRAM) => 1);
my @tasks;
push @tasks, [ split ] while <DATA>;
for my $column_index (0 .. $#COLUMN_NAMES) {
say "\nBy $COLUMN_NAMES[$column_index]:";
my $sort = $sort_strings{$column_index}
? sub { $a->[$column_index] cmp $b->[$column_index] }
: sub { $a->[$column_index] <=> $b->[$column_index] };
say "@$_" for sort $sort @tasks;
}
__DATA__
...
You need to install the enum distribution.
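If installing enum is not an option, core Perl's `use constant` can provide the column indices instead. The following is a sketch of that substitution rather than the answer's own code; it reads the question's sample rows from an in-memory string instead of `__DATA__` so that it stands alone, and `sorted_by` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Core-only stand-in for `use enum qw( PID MEMORY CPU PROGRAM )`.
use constant { PID => 0, MEMORY => 1, CPU => 2, PROGRAM => 3 };

my @COLUMN_NAMES = qw( pid memory cpu program );
my %sort_strings = ( (PROGRAM) => 1 );    # columns compared as strings

# Sample rows from the question, read from an in-memory string.
my $input = <<'END';
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
END

open my $in, '<', \$input or die "Cannot open in-memory input: $!";
my @tasks;
push @tasks, [split] while <$in>;
close $in;

# Return the rows sorted by the given column index.
sub sorted_by {
    my ($col) = @_;
    my @sorted = $sort_strings{$col}
        ? ( sort { $a->[$col] cmp $b->[$col] } @tasks )
        : ( sort { $a->[$col] <=> $b->[$col] } @tasks );
    return @sorted;
}

for my $col ( 0 .. $#COLUMN_NAMES ) {
    print "\nBy $COLUMN_NAMES[$col]:\n";
    print "@$_\n" for sorted_by($col);
}
```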
I can't figure out how to then sort my resulting %tasks hash by a specific column
You can't sort a hash. You need to convert each of your input rows into a hash (which you're doing successfully) and then store all of those hashes in an array. You can then print the contents of the array in sorted order.
This seems to do what you want:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my @cols = qw[process_id memory_size cpu_time program_name];
@ARGV or die "Usage: $0 [sort_order]\n";
my $sort = lc shift;
if (! grep { $_ eq $sort } @cols ) {
die "$sort is not a valid sort order.\n"
. "Valid sort orders are: ", join('/', @cols), "\n";
}
my @data;
while (<DATA>) {
chomp;
my %rec;
@rec{@cols} = split;
push @data, \%rec;
}
if ($sort eq $cols[-1]) {
# Do a string sort
for (sort { $a->{$sort} cmp $b->{$sort} } @data) {
say join ' ', @{$_}{@cols};
}
} else {
# Do a numeric sort
for (sort { $a->{$sort} <=> $b->{$sort} } @data) {
say join ' ', @{$_}{@cols};
}
}
__DATA__
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
I've used the built-in DATA filehandle to make the code simpler. You would need to replace that with some code to read from an external file.
I've used a hash slice to simplify reading the data into a hash.
The column that you want to sort by is passed into the program as a command-line argument.
Note that you have to sort the last column (the program name) using string comparison and all other columns using numeric comparison.
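One way to keep the string-versus-numeric choice in a single place is a table of comparison subs keyed by column name, which also makes a secondary sort key easy to add. A sketch under those assumptions: `%compare` and `sort_records` are hypothetical names, the three records come from the question, and breaking ties by process_id is an extra choice the answer above does not make.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @cols = qw( process_id memory_size cpu_time program_name );

# Hypothetical table: one comparison sub per column; only program_name is textual.
my %compare = map {
    my $col = $_;
    $col => $col eq 'program_name'
        ? sub { $_[0]{$col} cmp $_[1]{$col} }
        : sub { $_[0]{$col} <=> $_[1]{$col} };
} @cols;

# Sort records (hash refs) by $col, breaking ties by process_id.
sub sort_records {
    my ( $col, @records ) = @_;
    my $cmp = $compare{$col} or die "Unknown column '$col'\n";
    return sort {
        $cmp->( $a, $b ) or $a->{process_id} <=> $b->{process_id}
    } @records;
}

my @data;
for my $line ( '1 4730 1031782 init', '4 0 6 events', '2190 450 0 top' ) {
    my %rec;
    @rec{@cols} = split ' ', $line;
    push @data, \%rec;
}

print join( ' ', @{$_}{@cols} ), "\n" for sort_records( 'memory_size', @data );
```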
This decides how to sort using the first argument the script receives.
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
open my $fh, '<', 'task_file' or die "Cannot open task_file: $!";
my @tasks;
my %sort_by = (
process_id=>0,
memory_size=>1,
cpu_time=>2,
program_name=>3
);
my $sort_by = defined $sort_by{defined $ARGV[0]?$ARGV[0]:0} ? $sort_by{$ARGV[0]} : 0;
while (<$fh>) {
push @tasks, [split /\s+/, $_];
}
@tasks = sort {
if ($b->[$sort_by] =~ /^[0-9]+$/ ) {
$b->[$sort_by] <=> $a->[$sort_by];
} else {
$a->[$sort_by] cmp $b->[$sort_by];
}
} @tasks;
for (@tasks) {
say join ' ', @{$_};
}
Although my code runs without throwing a fatal error, the output is clearly erroneous. I first create a hash of arrays. Then I search sequences in a file against the keys in the hash. If the sequence exists as a key in the hash, I print the key and the associated values. This should be simple enough and I am creating the hash of arrays correctly. However, when I print the associated values I get "ARRAY(0x7ff4bbb0c7b8)" in its place.
The file "INFILE" is tab-delimited and looks like this, for example:
AAAAA AAAAA
BBBBB BBBBB BBBBB
Here is my code:
use strict;
use warnings;
open(INFILE, '<', '/path/to/file') or die $!;
my $count = 0;
my %hash = (
AAAAA => [ "QWERT", "YUIOP" ],
BBBBB => [ "ASDFG", "HJKL", "ZXCVB" ],
);
while (my $line = <INFILE>){
chomp $line;
my $hash;
my @elements = split "\t", $line;
my $number = grep exists $hash{$_}, @elements;
open my $out, '>', "/path/out/Cluster__Number$._$number.txt" or die $!;
foreach my $sequence(@elements){
if (exists ($hash{$sequence})){
print $out ">$sequence\n$hash{$sequence}\n";
}
else
{
$count++;
print "Key Doesn't Exist ", $count, "\n";
}
}
}
Current output looks like:
>AAAAA
ARRAY(0x7fc52a805ce8)
>AAAAA
ARRAY(0x7fc52a805ce8)
Expected output will look like:
>AAAAA
QWERT
>AAAAA
YUIOP
Thank you very much for your help.
In this line:
print $out ">$sequence\n$hash{$sequence}\n";
...$hash{$sequence} is a reference to an array. You have to dereference the referenced array before printing it. Here's an example of printing $sequence, then printing the elements of the $hash{$sequence} array on the following line, with the elements separated by a comma:
print $out ">$sequence\n";
print $out join(', ', @{ $hash{$sequence} }), "\n";
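The question's expected output puts each element on its own line under a repeated `>` header rather than joining them. A small sketch of that variant, looping over the dereferenced array; the `format_entries` helper is a hypothetical name and the data are the question's sample hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash = (
    AAAAA => [ "QWERT", "YUIOP" ],
    BBBBB => [ "ASDFG", "HJKL", "ZXCVB" ],
);

# Return ">KEY\nVALUE\n" for every element of the array stored under $key,
# or an empty string if the key is missing.
sub format_entries {
    my ( $key, $href ) = @_;
    return '' unless exists $href->{$key};
    return join '', map { ">$key\n$_\n" } @{ $href->{$key} };
}

print format_entries( 'AAAAA', \%hash );
```

Called with `AAAAA`, it prints the four lines shown under "Expected output" in the question.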
The key here is to work with the array ref held by the hash rather than just trying to print it. Whatever you choose, you will want to remove the first item from the array, which you can do with the shift function. You can then either push the item back onto the end of the array, or delete the key from the hash when there are no more items, depending on what you want to happen once all of a key's items have been used. You could also choose a random element from the array with the rand function like this:
my $out_seq = $hash{$sequence}[rand @{ $hash{$sequence} }];
If you wanted the items to run out in the random case, you would need to remove the item from the array. The best way to do that is probably with splice (the generic form of shift, unshift, pop, and push):
my $out_seq = splice @{ $hash{$sequence} }, rand @{ $hash{$sequence} }, 1;
delete $hash{$sequence} unless @{ $hash{$sequence} };
Here is my version of your program:
#!/usr/bin/perl
use strict;
use warnings;
# open my $in, '<', '/path/to/file' or die $!;
my $in = \*DATA; #use internal data file instead for testing
my $count = 0;
my %hash = (
AAAAA => [ "QWERT", "YUIOP" ],
BBBBB => [ "ASDFG", "HJKL", "ZXCVB" ],
);
while (<$in>) {
chomp;
my $hash;
my @elements = split "\t";
my $number = grep exists $hash{$_}, @elements;
#open my $out, '>', "/path/out/Cluster__Number$._$number.txt" or die $!;
my $out = \*STDOUT; # likewise use STDOUT for testing
for my $sequence (@elements) {
if (exists $hash{$sequence}) {
my $out_seq = shift @{ $hash{$sequence} };
# if you want to repeat
push @{ $hash{$sequence} }, $out_seq;
# if you want to remove $sequence when they run out
# delete $hash{$sequence} unless @{ $hash{$sequence} };
print $out ">$sequence\n$out_seq\n";
} else {
warn "Key [$sequence] Doesn't Exist ", ++$count, "\n";
}
}
}
__DATA__
AAAAA AAAAA
CCCCC
BBBBB BBBBB BBBBB
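A self-contained sketch of the "use each item once, in random order" variant, built from the splice and delete lines shown earlier in this answer. The `take_random` name and the single-key hash are hypothetical. Because the order is random, the only things worth checking are that every item comes out exactly once and that the key is gone afterwards.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash = ( BBBBB => [ "ASDFG", "HJKL", "ZXCVB" ] );

# Remove and return a random element of the array under $key;
# delete the key once its array is empty. Returns undef if the key is absent.
sub take_random {
    my ( $href, $key ) = @_;
    return unless exists $href->{$key};
    my $aref = $href->{$key};
    my $item = splice @$aref, rand @$aref, 1;
    delete $href->{$key} unless @$aref;
    return $item;
}

my @drawn;
while ( defined( my $item = take_random( \%hash, 'BBBBB' ) ) ) {
    push @drawn, $item;
}
print "@drawn\n";    # all three items, in random order
```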
example of file content:
>random sequence 1 consisting of 500 residues.
VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV
HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;
>random sequence 2 consisting of 500 residues.
KAAATKKPWADTIPYLLCTFMQTSGLEWLHTDYNNFSSVVCVRYFEQFWVQCQDHVFVKN
KNWHQVLWEEYAVIDSMNFAWPPLYQSVSSNLDSTERMMWWWVYYQFEDNIQIRMEWCNI
YSGFLSREKLELTHNKCEVCVDKFVRLVFKQTKWVRTMNNRRRVRFRGIYQQTAIQEYHV
HQKIIRYPCHVMQFHDPSAPCDMTRQGKRMNFCFIIFLYTLYEVKYWMHFLTYLNCLEHR;
>random sequence 3 consisting of 500 residues.
AYCSCWRIHNVVFQKDVVLGYWGHCWMSWGSMNQPFHRQPYNKYFCMAPDWCNIGTYAWK
I need an algorithm to build a hash $hash{$key} = $value; where lines starting with > are the values and following lines are the keys.
What I have tried:
open (DATA, "seq-at.txt") or die "blabla";
@data = <DATA>;
%result = ();
$k = 0;
$i = 0;
while($k != @data) {
$info = @data[$k]; # removes the first element
if(@data[$i] !=~ ">") {
$key .= @data[$i]; $i++;
} else {
$k = $i;
}
$result{$key} = $value;
}
but it doesn't work.
You don't need an intermediate array; you can build the hash directly:
use strict;
use warnings;
# ^- always start your code like this to see errors and ambiguities
# declare your variables using "my" to specify the scope
my $filename = 'seq-at.txt';
# use the 3-argument open syntax to avoid overwriting the file:
open my $fh, '<', $filename or die "unable to open '$filename' $!";
my %hash;
my $hkey = '';
my $hval = '';
while (<$fh>) {
chomp; # remove the newline \n (or \r\n)
if (/^>/) { # when the line starts with ">"
# store the key/value in the hash if the key isn't empty
# (the key is empty when the first ">" is encountered)
$hash{$hkey} = $hval if ($hkey);
# store the line in $hval and clear $hkey
($hval, $hkey) = $_;
} elsif (/\S/) { # when the line isn't empty (or blank)
# append the line to the key
$hkey .= $_;
}
}
# store the last key/val in the hash if any
$hash{$hkey} = $hval if ($hkey);
# display the hash
foreach (keys %hash) {
print "key: $_\nvalue: $hash{$_}\n\n";
}
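To try that loop without a file on disk, the same logic can be wrapped in a sub that takes any filehandle and fed an in-memory string by opening a scalar reference. A sketch: the `read_records` name and the shortened sample are hypothetical, and it tests `$hkey ne ''` rather than the truth of `$hkey`, so a key of `0` would not be skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same logic as the loop above: a ">" line becomes the value, and the
# non-blank lines that follow it are concatenated into the key.
sub read_records {
    my ($fh) = @_;
    my %hash;
    my ( $hkey, $hval ) = ( '', '' );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>/ ) {
            $hash{$hkey} = $hval if $hkey ne '';
            ( $hval, $hkey ) = ( $line, '' );
        }
        elsif ( $line =~ /\S/ ) {
            $hkey .= $line;
        }
    }
    $hash{$hkey} = $hval if $hkey ne '';
    return \%hash;
}

# Hypothetical, shortened sample in the question's format.
my $sample = <<'END';
>seq 1
VILVW
HFLEW;
>seq 2
KAAAT
END

open my $fh, '<', \$sample or die "Cannot open in-memory data: $!";
my $records = read_records($fh);
close $fh;

print "$_ => $records->{$_}\n" for sort keys %$records;
```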
It is unclear what you want; the array seems to be the lines following the random sequence number... If the contents of a file test.txt are:
Line 1:">"random sequence 1 consisting of 500 residues.
Line 2:VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV
Line 3:HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;
You could try something like:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $contentFile = $ARGV[0];
my %testHash = ();
my $currentKey = "";
open(my $contentFH,"<",$contentFile) or die "Cannot open '$contentFile': $!";
while(my $contentLine = <$contentFH>){
chomp($contentLine);
next if($contentLine eq ''); # Empty lines.
if($contentLine =~ /^"\>"(.*)/){
$currentKey= $1;
}else{
push(@{$testHash{$currentKey}},$contentLine);
}
}
print Dumper(\%testHash);
Which results in a structure like this:
seb@amon:[~]$ perl test.pl test.txt
$VAR1 = {
'random sequence 3 consisting of 500 residues.' => [
'AYCSCWRIHNVVFQKDVVLGYWGHCWMSWGSMNQPFHRQPYNKYFCMAPDWCNIGTYAWK'
],
'random sequence 1 consisting of 500 residues.' => [
'VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV',
'HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;'
],
'random sequence 2 consisting of 500 residues.' => [
'KAAATKKPWADTIPYLLCTFMQTSGLEWLHTDYNNFSSVVCVRYFEQFWVQCQDHVFVKN',
'KNWHQVLWEEYAVIDSMNFAWPPLYQSVSSNLDSTERMMWWWVYYQFEDNIQIRMEWCNI',
'YSGFLSREKLELTHNKCEVCVDKFVRLVFKQTKWVRTMNNRRRVRFRGIYQQTAIQEYHV',
'HQKIIRYPCHVMQFHDPSAPCDMTRQGKRMNFCFIIFLYTLYEVKYWMHFLTYLNCLEHR;'
]
};
You would basically be using each hash value as an array; the @{$variable} dereference does the magic.
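What makes `push(@{$testHash{$currentKey}}, ...)` work on a key that has never been seen is autovivification: dereferencing an undefined hash entry as an array in that position makes Perl create an empty anonymous array and store a reference to it there. A minimal sketch with a hypothetical hash `%h`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %h;

# Nothing is stored under 'k' yet; push creates the array on first use.
push @{ $h{k} }, 'first';
push @{ $h{k} }, 'second';

print ref( $h{k} ), "\n";           # ARRAY
print "@{ $h{k} }\n";               # first second
print scalar( @{ $h{k} } ), "\n";   # 2
```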
I am new to Perl, and I am trying to split a CSV file (10 comma-separated items per line) into a key (the first item) and an array (the other 9 items) to put in a hash. Eventually, I want to use an if statement to match another variable against the keys in the hash and print out the elements in the array.
Here's the code I have, which doesn't work right.
use strict;
use warnings;
my %hash;
my $in2 = "metadata1.csv";
open IN2, "<$in2" or die "Cannot open the file: $!";
while (my $line = <IN2>) {
my ($key, @value) = split (/,/, $line, 2);
%hash = (
$key => @value
);
}
foreach my $key (keys %hash)
{
print "The key is $key and the array is $hash{$key}\n";
}
Thank you for any help!
Don't use 2 as the third argument to split: it limits the result to two fields, so @value gets just one element (the rest of the line).
Also, by doing %hash =, you're overwriting the hash in each iteration of the loop. Just add a new key/value pair:
$hash{$key} = \@value;
Note the \ sign: you can't store an array directly as a hash value, you have to store a reference to it. When printing the value, you have to dereference it back:
#! /usr/bin/perl
use warnings;
use strict;
my %hash;
while (<DATA>) {
my ($key, @values) = split /,/;
$hash{$key} = \@values;
}
for my $key (keys %hash) {
print "$key => @{ $hash{$key} }";
}
__DATA__
id0,1,2,a
id1,3,4,b
id2,5,6,c
If your CSV file contains quoted or escaped commas, you should use Text::CSV.
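The effect of split's third (LIMIT) argument, mentioned at the start of this answer, can be checked on a hypothetical line with trailing empty fields: a limit of 2 leaves everything after the first comma in one field, no limit drops trailing empty fields, and a negative limit keeps them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "id0,1,2,a,,";

my @two    = split /,/, $line, 2;    # at most two fields
my @all    = split /,/, $line;       # trailing empty fields removed
my @padded = split /,/, $line, -1;   # trailing empty fields kept

print scalar(@two), ": $two[1]\n";   # 2: 1,2,a,,
print scalar(@all), "\n";            # 4
print scalar(@padded), "\n";         # 6
```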
First of all, hash keys are unique, so when you have lines like these in your CSV file:
key1,val11,val12,val13,val14,val15,val16,val17,val18,val19
key1,val21,val22,val23,val24,val25,val26,val27,val28,val29
after adding both key/value pairs with the 'key1' key to the hash, only one pair is kept: the one that was added later.
So to keep all records, you probably need an array-of-hashes structure, where the value of each hash is an array reference, like this:
@result = (
{ 'key1' => ['val11','val12','val13','val14','val15','val16','val17','val18','val19'] },
{ 'key1' => ['val21','val22','val23','val24','val25','val26','val27','val28','val29'] },
{ 'and' => ['so on'] },
);
In order to achieve that, your code should look like this:
use strict;
use warnings;
my @AoH; # array of hashes containing data from CSV
my $in2 = "metadata1.csv";
open IN2, "<$in2" or die "Cannot open the file: $!";
while (my $line = <IN2>) {
chomp $line; # drop the newline so it doesn't end up in the last value
my @string_bits = split (/,/, $line);
my $key = $string_bits[0]; # first element - key
my $value = [ @string_bits[1 .. $#string_bits] ]; # rest get into arr ref
push @AoH, {$key => $value}; # array of hashes structure
}
foreach my $hash_ref (@AoH)
{
my $key = (keys %$hash_ref)[0]; # get hash key
my $value = join ', ', @{ $hash_ref->{$key} }; # join array into str
print "The key is '$key' and the array is '$value'\n";
}
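If you also want to look records up by key later (the question's eventual goal), a different structure, a hash whose values are lists of rows (a hash of arrays of arrays), keeps duplicate keys and still allows direct lookup. This is a swapped-in alternative to the array of hashes above, sketched with hypothetical in-memory lines rather than metadata1.csv.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    "key1,val11,val12\n",
    "key1,val21,val22\n",
    "key2,val31,val32\n",
);

my %rows_by_key;    # key => [ [row values], [row values], ... ]
for my $line (@lines) {
    chomp( my $copy = $line );
    my ( $key, @values ) = split /,/, $copy;
    push @{ $rows_by_key{$key} }, \@values;
}

for my $key ( sort keys %rows_by_key ) {
    for my $row ( @{ $rows_by_key{$key} } ) {
        print "The key is '$key' and the array is '", join( ', ', @$row ), "'\n";
    }
}
```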
I have two files. The first one contains number ranges and version names; the numbers to check come from the second file, which consists of a list of numbers. For each number in the second file I take the nine characters starting at position 11, compare them with the ranges in my first file (the range file), and then print to the screen the name of each version and how many numbers matched it.
My first file looks like this
imb,folded ,655575645,827544086
imb,selfmail ,827549192,827572977
My second file looks like this
0026110795165557564528452972062
0026110795165557648628452974959
0026110795182749420290503162401
0026110795182749566690703875348
0026110795182750564290503365856
0026110795182751155490713282618
0026110795182751819190503415474
0026110795182752054790503331977
0026110795182752888194578410931
0026110795182753115893308242647
0026110795182753522398248322033
0026110795182753601890723246006
0026110795182754156995403760702
0026110795182754174597213102232
0026110795182754408698248770395
0026110795182754919290713221614
0026110795182755128698248922635
0026110795182755566790713334451
0026110795182755669490713213633
0026110795182755806390507009696
0026110795182756204890713212248
0026110795182756217690713273839
0026110795182756259998248961157
0026110795182756309595403769515
0026110795182756708894578164887
0026110795182756829090713282238
0026110795182757082791367220156
0026110795182757130090713274108
0026110795182757297798248934527
0026110795182757370277063564556
My output now looks like this
folded IMB Count: 15
No Matched IMB Count: 1
selfmail IMB Count: 14
I need to create a file named after each version in my first file, and then print to each file the original numbers that matched that version. For instance, folded has 15 matches, so I need to print those original numbers from the list file to a file named folded.txt.
my code is
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
sub trimspaces {
my @argsarray = @_;
$argsarray[0] =~ s/^\s+//;
$argsarray[0] =~ s/\s+$//;
return $argsarray[0];
}
open(INPUT , "< D:\\Home\\emahou\\imbfilelist.txt") or die $!;
open(INPUT2 , "< D:\\Home\\emahou\\imbrange.txt") or die $!;
my $n;
my @fh;
my $value;
my @ranges;
my $isMatch;
my $printed;
my $fVersion;
my %versionHash=();
while (<INPUT2>) {
chomp;
my ($version, $from, $to) = (split /,/)[ 1, 2, 3 ];
push @ranges, [ $from, $to, trimspaces($version)];
if (!exists $versionHash{trimspaces($version)})
{
$versionHash{trimspaces($version)}=0;
}
}
$versionHash{"No Matched"}=0;
close INPUT2;
while (<INPUT>) {
$isMatch=0;
$n = substr($_,12-1,9);
for my $r (@ranges) {
if ( $n >= $r->[0] && $n <= $r->[1]) {
$fVersion=$r->[2];
if (exists $versionHash{$fVersion}) {
$versionHash{$fVersion}++;
}
$isMatch=1;
last;
}
}
if (!$isMatch) {
$versionHash{"No Matched"}++;
}
}
foreach my $key (keys %versionHash) {
print STDOUT "$key IMB Count: " . $versionHash{$key} . "\n";
}
close INPUT;
This seems to do as you ask.
It works by building a hash %filelist with keys from the second column of imbrange.txt and values from, to, fh (the output file handle) and count (the number of records that matched this range).
Then imbfilelist.txt is read a line at a time, the nine-digit code extracted, and compared with the from and to values of each element of the %filelist hash. If a match is found, the line is printed to the corresponding file handle and the counter is incremented. If the code from this line doesn't match any of the ranges, $none_matched is incremented for output in the summary.
use strict;
use warnings;
use 5.010;
use autodie;
chdir 'D:\Home\emahou';
# Build a hash of `version` strings with their `from` and `to` values
open my $fh, '<', 'imbrange.txt';
my %filelist;
while ( <$fh> ) {
chomp;
my ($version, $from, $to) = (split /\s*,\s*/)[1,2,3];
$filelist{$version} = { from => $from, to => $to };
}
# Open an output file for each item and set the count to zero
while ( my ($version, $info) = each %filelist ) {
open $info->{fh}, '>', "$version.txt";
$info->{count} = 0;
}
# Filter the numbers in the list file, printing to the
# appropriate output file and keeping count
open $fh, '<', 'imbfilelist.txt';
my $none_matched = 0;
while ( my $line = <$fh> ) {
next unless $line =~ /\S/;
chomp $line;
my $code = substr $line, 11, 9;
my $matched = 0;
while ( my ($version, $info) = each %filelist ) {
next unless $code >= $info->{from} and $code <= $info->{to};
print { $info->{fh} } $line, "\n";
++$info->{count};
++$matched;
}
++$none_matched unless $matched;
}
close $_->{fh} for values %filelist;
# Print the summary
while ( my ($version, $info) = each %filelist ) {
print "$version IMB Count: $info->{count}\n"
}
print "None matched IMB Count: $none_matched\n"
output
selfmail IMB Count: 14
folded IMB Count: 15
None matched IMB Count: 1
folded.txt
0026110795165557564528452972062
0026110795165557648628452974959
0026110795182749420290503162401
0026110795182749566690703875348
0026110795182750564290503365856
0026110795182751155490713282618
0026110795182751819190503415474
0026110795182752054790503331977
0026110795182752888194578410931
0026110795182753115893308242647
0026110795182753522398248322033
0026110795182753601890723246006
0026110795182754156995403760702
0026110795182754174597213102232
0026110795182754408698248770395
selfmail.txt
0026110795182754919290713221614
0026110795182755128698248922635
0026110795182755566790713334451
0026110795182755669490713213633
0026110795182755806390507009696
0026110795182756204890713212248
0026110795182756217690713273839
0026110795182756259998248961157
0026110795182756309595403769515
0026110795182756708894578164887
0026110795182756829090713282238
0026110795182757082791367220156
0026110795182757130090713274108
0026110795182757297798248934527
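Both versions rely on the same range test on the nine digits at offset 11 (the question's `substr($_,12-1,9)`). Pulled out into a sub, it can be checked against a few lines from the question's sample. The `classify` name is hypothetical, and it returns the first matching version, like the `last` in the question's loop; the answer's loop, by contrast, would count a line once for each range it fell into if ranges ever overlapped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Ranges from the question's range file: [ version, from, to ].
my @ranges = (
    [ 'folded',   655575645, 827544086 ],
    [ 'selfmail', 827549192, 827572977 ],
);

# Return the first version whose range contains the 9-digit code at
# offset 11 (the 12th character), or undef if none does.
sub classify {
    my ($line) = @_;
    my $code = substr $line, 11, 9;
    for my $r (@ranges) {
        my ( $version, $from, $to ) = @$r;
        return $version if $code >= $from && $code <= $to;
    }
    return;
}

for my $line (
    '0026110795165557564528452972062',
    '0026110795182754919290713221614',
    '0026110795182757370277063564556',
) {
    my $version = classify($line);
    print "$line => ", ( defined $version ? $version : 'no match' ), "\n";
}
```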