Perl Hash to CSV

Perl Hash to CSV - arrays

I have a data file for alarms occuring during the day. The format looks like
2014/04/27-23:42:22.177742- Alarm1
2014/04/27-23:42:22.177744- Alarm2
2014/04/27-23:42:22.177747- Alarm3
2014/04/27-23:42:22.177749- Alarm1
Now I cant guess when any alarm would appear. it depends on the system. What I have done is to insert the data of alarm (e.g Alarm1) into a 2D hash. I take a chunk of 5 minutes everytime and look for alarm that appeared during the 5 minutes. I would add the value into hash, everytime I find a new alarm. In case of a repetition (like Alarm1 above) I would simply add 1 to the value. So in the end it would give me a hash which contains the alarm name and the times it appeared in 5 minutes.
Next I would start processing the next 5 minutes.
I am processing it for the whole day, so it is possible that 1 alarm can start appearing 10 in the morning, so It would be a new entry in to the hash. Now ehen I try to print the values eventually to a CSV, its a mess. Totally makes no sense. What I expect is a csv which should look like
Name,00:00,00:05,00:10,
Alarm1,2,5,2,7,
Alarm2,4,7,3,6
Alarm3,6,1,6,3
...
My code is:
use Time::Local;
use POSIX 'strftime';
use Data::Dumper;
my %outputHash= ();
$curr = timelocal(0, 0, 0, (split /\//, $ARGV[0])[1], (split /\//, $ARGV[0])[0]-1, (split /\//, $ARGV[0])[-1]);
$currentTime = strftime "%Y/%m/%d-%H:%M:%S", localtime($curr);
for ($count = 1; $count <= 288; $count++) { #there are 288 '5 minutes' in a day.
$curr += 300;
$nextTime = strftime "%Y/%m/%d-%H:%M:%S", localtime($curr);
$cmd = "awk '\$0>=from&&\$0<=to' from=\"$currentTime\" to=\"$nextTime\" Output.txt";
my $dataChunk = qx($cmd);
my #lines = split /[\n]+/, $dataChunk;
foreach my $line (#lines) {
chomp;
$timeStamp1 = substr($line,21,6);
#print "\n$timeStamp1\n$error\n";
if ($timeStamp1 != $timeStamp2){
$outputHash{$error}{$count} = $outputHash{$error}{$count} + 1;
}
$ind = index($line,'- ') + 2;
$len = length($line) - $ind;
$error = substr($line,$ind, $len);
$timeStamp2 = $timeStamp1;
}
$currentTime = $nextTime;
# if ($count>3){$count=300;}
}
`>/tmp/report.txt`;
open (MYFILE, '>>/tmp/report.txt');
my #outputArray = ();
my $flag =1;
foreach my $error (sort keys %outputHash)
{
print MYFILE "$error,";
#$outputArray[$flag][0] = $error;
for ($count=1,$count <= 288, $count++)
{
print MYFILE "$outputHash{$error}{$count},";
#$outputArray[$flag][$count] = int($outputHash{$error}{$count});
}
$flag += 1;print MYFILE "\n";
}
close (MYFILE);
#print Dumper(\#outputArray);
exit;
A simplified display of my has looks like this. The reason its haphazard is because Alarm 1 occurred only in '2nd' 5 minutes interval, Alarm 2 occured in 1st only, Alarm 3 occured in 4 consecutive 5 minutes intervals we monitored.
'Alarm1{
'2' => '5'
},
'Alarm2{
'1' => '1'
},
'Alarm3
'4' => '1',
'1' => '2',
'3' => '1',
'2' => '1'
},

Try this out, its best if you use a module that is meant for dealing with CSV.
I chose Class::CSV because it is is simple to use.
#!/usr/bin/perl
use strict;
use warnings;
use Class::CSV;
my %hash = (
'Alarm1' => {'2' => '5', },
'Alarm2' => {'1' => '1', },
'Alarm3' => {
'4' => '1',
'1' => '2',
'3' => '1',
'2' => '1'
},
);
my #fields = qw/AlarmNo 00:00:00 00:05:00 00:10:00 00:15:00/;
my $csv = Class::CSV->new( fields => \#fields );
#make the hash into a suitable array
my #array;
my #keys = keys %hash;
for my $i (0 .. $#keys){
push #{ $array[$i] }, $keys[$i];
for my $inter (1 .. 4){
my $val = '';
if(exists $hash{$keys[$i]}->{$inter}){
$val = $hash{$keys[$i]}->{$inter};
}
push #{ $array[$i] }, $val;
}
}
$csv->add_line($_) for(#array);
print join(',', #fields), "\n"; #Just to make it tidy on the commandline
$csv->print();
So you would use print MYFILE $csv->string to get it into your file.
Edit:
If you can't install Class::CSV check out Text::CSV which may be installed by default.
You can also join the arrays by a comma like so
for(#array){
print join(',', #{$_});
}

Related

How to sort an hash and assign a variable to the keypair from an array in perl?

I have an array like this
my #arr =('1','apple','2','orange','1','orange','3','berry','2','berry','1','berry');
my %hash;
my $var =1;
Now how can i sort and assign a variable to the pair?
The desired output is
$hash{1}{apple} =>1;
$hash{1}{orange} =>1;
$hash{1}{berry} =>1;
$hash{2}{orange} =>1;
$hash{2}{berry} =>1;
$hash{3}{berry} =>1;

You need to iterate your array and take two values out per iteration. One way to do this is with a while loop. This will consume the array, so if you want to keep it you might want to make a copy.
use strict;
use warnings;
use Data::Printer;
my #arr = (
'1', 'apple', '2', 'orange', '1', 'orange',
'3', 'berry', '2', 'berry', '1', 'berry',
);
my %hash;
my $var = 1;
while ( my $first_key = shift #arr ) {
my $second_key = shift #arr;
$hash{$first_key}->{$second_key} = $var;
}
p %hash;
This outputs
{
1 {
apple 1,
berry 1,
orange 1
},
2 {
berry 1,
orange 1
},
3 {
berry 1
}
}
An alternative is to use a C-style for loop. This does not change the array.
for (my $i = 0; $i <= $#arr; $i+=2) {
$hash{ $arr[$i] }->{ $arr[$i + 1] } = $var;
}
Or you could use List::Util's pairs function to get two out at the same time.
use List::Util 'pairs';
foreach my $pair ( pairs #arr ) {
my ( $first_key, $second_key ) = #$pair;
$hash{$first_key}->{$second_key} = $var;
}

It's normally expected that you at least spend a few hours trying to write a solution yourself. We will happily help you if you've made a decent attempt of your own but have run out of ideas, but it doesn't go down well if you appear to have dumped your problem on us and are waiting for an answer to pop up while you drink a mug of coffee. You've been told about this before, and only one of your post has a net positive vote. You need to work on that.
Are you certain that you really want a hash of hashes? This is very reminiscent of your previous question How to find if the value exists in hash without using key in perl? where we pretty much established that it was the wrong choice.
The only non-obvious part is extracting the values from the array in pairs, I and I have used C-style for loop to achieve this.
I have used Data::Dumper only to show the resulting hash of hashes.
use strict;
use warnings 'all';
my #arr = qw/ 1 apple 2 orange 1 orange 3 berry 2 berry 1 berry /;
my %hash;
for ( my $i = 0; $i < $#arr; $i += 2 ) {
$hash{$arr[$i]}{$arr[$i+1]} = 1;
}
use Data::Dumper;
print Dumper \%hash;
output
$VAR1 = {
'2' => {
'berry' => 1,
'orange' => 1
},
'3' => {
'berry' => 1
},
'1' => {
'berry' => 1,
'orange' => 1,
'apple' => 1
}
};
Update
Here's an example of generating the keys as I described in the comments. It's almost identical to the solution above, but the resulting hash contents are different.
use strict;
use warnings 'all';
my #arr = qw/ 1 apple 2 orange 1 orange 3 berry 2 berry 1 berry /;
my %hash;
for ( my $i = 0; $i < $#arr; $i += 2 ) {
$hash{"#arr[$i,$i+1]"} = 1;
}
use Data::Dumper;
print Dumper \%hash;
output
$VAR1 = {
'2 berry' => 1,
'1 apple' => 1,
'3 berry' => 1,
'1 orange' => 1,
'1 berry' => 1,
'2 orange' => 1
};

Take the values from the array two at a time (key/value), put them into a hash, then assign the variable as the value.
use Data::Dumper;
sub SortAndAssign {
my ($args) = #_;
my #arr = #{$args->{ARRAY}};
my $var = $args->{VARIABLE};
my %hash;
my $i = 0;
my $size = scalar(#arr);
while ($i < $size) {
# alternating key/value pairs (really a hash)
my $key = $arr[$i++]; # e.g. 1
my $value = $arr[$i++]; # e.g. apple
$hash{$key}{$value} = $var; # e.g. hash->1->apple = 1
}
return %hash;
}
sub ShowSortAndAssign {
my #arr =('1','apple','2','orange','1','orange','3','berry','2','berry','1','berry');
my $var = 1;
my %hash = SortAndAssign({
ARRAY => \#arr,
VARIABLE => $var,
});
print Dumper(\%hash);
print "first apple is " . $hash{1}{apple};
}
sub _Main {
ShowSortAndAssign();
}
_Main();

How to assign array variable as a hash?

I'm trying to print this hash. As key1 is the array[0], key2 is array[2] and $sum[0] is the value. But the has does not work. What I'm doing wrong?
#array=(10,45,20);
#sum=($array[0]+$array[1]+$array[2]);
%hash;
$hash{$array[0]}{$array[2]}=$sum[0]
At the end of the hashes I want to print 10 : 75 to the screen.

You've set
$hash{$array[0]}{$array[2]} = $sum[0]
which with the given values is
$hash{10}{20} = 75
If you want to print 10 : 75 from the hash then you need to write
printf "%d : %d\n",10, $hash{10}{20}
And while I'm sure you want something more general than that, you really haven't given enough information

From the description you gave to #ikegami «my program will accept ...» I created a file that would have the data:
data_1.txt:
john 10 45 20
alex 30 15 12
pete 23 45 10 21
will 06 56
bob 8 12 3
lazy
note that only the first two lines actually match the description, I'll come back to that later.
sum.pl:
use strict;
use warnings;
use List::Util 'sum';
# get the two filenames it should work with
#
my $filename_1 = shift;
my $filename_2 = shift;
# be sure we read a file for most modern systems, UTF-8
#
open( my $file1, '<:encoding(UTF-8)', $filename_1)
or die "Can't open file: $filename_1";
# create the (empty) data structure
#
my %sums_and_names;
#
# the % in perl means you are talking about a hash,
# use a sensible name instead of 'hash'
# read line by line
while ( my $line = <$file1> ) {
chomp $line; # get rid of the line endings
my ($name, #grades) = split ' ', $line;
#
# this is not strictly doing what you asked for, just more flexible
#
# split on ' ', a space character, splits on any asmount of (white) space
# your task said that there is one space.
# strictly, you could should split on / /, the regular expression
#
# the first part will go into the variable $name, the rest in array #grades
# strictly you have only three grades so the following would do
# my ($name, $grade_1, $grade_2, $grade_3) = split / /, $line;
my $sum = sum(#grades) // 'no grades';
#
# since we now can handle any number of grades, not just three, we could
# have no grades at all and thus result in `undef`
#
# using the function sum0 would return the value 0 instead
#
# you'll get away with the `undef` using in a hash assignment,
# it will turn it into an empty string `''`
=pod
$sums_and_names{foo}{bar} = [ 'baz', 'qux' ];
=cut
#
# here is where your task doesn't make sense
# i am guessing:
#
$sums_and_names{$sum}{$name} = \#grades;
#
# at least we have all the data from filename_1, and the sum of the grades
}
# please decide on what you want to print
use Data::Dumper;
print Dumper \%sums_and_names;
and running perl sum.pl data_1.txt data_2.txt will give you something like
output:
$VAR1 = {
'no grades' => {
'lazy' => []
},
'23' => {
'bob' => [
'8',
'12',
'3'
]
},
'57' => {
'alex' => [
'30',
'15',
'12'
]
},
'62' => {
'will' => [
'06',
'56'
]
},
'75' => {
'john' => [
'10',
'45',
'20'
]
},
'99' => {
'pete' => [
'23',
'45',
'10',
'21'
]
}
};
please note, strictly the block inside the while loop could had been written as:
chomp $line;
my ($name, $grade_1, $grade_2, $grade_3) = split / /, $line;
$sum = $grade_1 + $grade_2 + $grade_3;
$sums_and_names{$sum}{$name} = [ $grade_1, $grade_2, $grade_3 ];
but I quote from #Borodin:
And while I'm sure you want something more general than that, you really haven't given enough information

Always use use strict; use warnings qw( all );!!!
There's only one sum (at a time), so don't need an array.
There's no need for a hash of hash; a simple hash will do.
Fixed:
use strict;
use warnings qw( all );
use List::Util qw( sum );
my %hash;
while (...) {
my #nums = ...;
$hash{ $nums[0] } = sum(#nums);
}
for (sort { $a <=> $b } keys(%hash)) {
print("$_: $hash{$_}\n");
}

combine hashes from two files into a single file

I have two files containing data like this:
FILE1 contains group numbers (first column) and the frequency (third column) of their switching another group (second column):
FILE1:
1 2 0.6
2 1 0.6
3 1 0.4
1 3 0.4
2 3 0.2
etc...
FILE2 contains group numbers (first columns) and their frequency of occurrence (second column).
FILE2:
1 0.9
2 0.7
3 0.5
etc...
I want to make another file containing FILE2 with the values for each switch from FILE1 like this:
1 0.9 2 0.6 3 0.4 ...
2 0.7 1 0.6 3 0.2 ...
Basically, I want first column to be the group number, second the frequency of its occurrence, then the group they switch to and the frequency of that switch, then next switch all in the same line for that particular group, then next line - group 2 etc.
So I want to read in FILE1, make a hash of arrays for each group with keys being group numbers and the values being the group they switch to and the frequency of that switch. I will have one big array for each group containing subarrays of each group they switch to and frequency. Then I want to make another hash with the same keys as in the first hash but with the numbers from the first column in FILE2 and values from the second column of FILE2. Then I will print out "hash2 key hash2 value hash1 whole array for that key". This is my attempt using Perl:
#!/usr/bin/perl -W
$input1= $ARGV[0];
$input2 = $ARGV[1];
$output = $ARGV[2];
%switches=();
open (IN1, "$input1");
while (<IN1>) {
#tmp = split (/\s+/, $_);
chomp #tmp;
$group = shift #tmp;
$switches{$group} = [#tmp];
push (#{$switches{$group}}, [#tmp]);
}
close IN1;
%groups=();
open (IN2, "$input2");
while (<IN2>) {
chomp $_;
($group, $pop) = split (/\s+/, $_);
$groups{$group} = $pop;
}
close IN2;
open (OUT, ">$output");
foreach $group (keys %groups) {
print OUT "$group $pop #{$switches{$group}}\n"
}
close OUT;
The output I get contains something like:
1 0.1 2 0.1 ARRAY(0x100832330)
2 0.3 5 0.2 ARRAY(0x1008325d0)
So basically:
"group" "one last frequency number" "one last group that that group switches to" "one last switch frequency" "smth like ARRAY(0x100832330)"
I assume I am doing smth wrong with pushing all switches into the hash of arrays while in FILE1 and also with dereferencing at the end when I print out.
Please help,
Thanks!

Your %switches hash contains redundant information; just use the push. Also, you need to do more work to print out what you want. Here is your code with minimal changes:
$input1= $ARGV[0];
$input2 = $ARGV[1];
$output = $ARGV[2];
%switches=();
open (IN1, "$input1");
while (<IN1>) {
#tmp = split (/\s+/, $_);
chomp #tmp;
$group = shift #tmp;
push (#{$switches{$group}}, [#tmp]);
}
close IN1;
%groups=();
open (IN2, "$input2");
while (<IN2>) {
chomp $_;
($group, $pop) = split (/\s+/, $_);
$groups{$group} = $pop;
}
close IN2;
open (OUT, ">$output");
foreach $group (sort {$a <=> $b} keys %groups) {
print OUT "$group $groups{$group} ";
for my $aref (#{$switches{$group}}) {
print OUT "#{$aref}";
}
print OUT "\n";
}
close OUT;
__END__
1 0.9 2 0.63 0.4
2 0.7 1 0.63 0.2
3 0.5 1 0.4
See also perldoc perldsc and perldoc Data::Dumper

Since each column represents something of value, instead of an array, you should store your data in a more detailed structure. You can do this via references in Perl.
A reference is a pointer to another data structure. For example, you could store your groups in a hash. However, instead of each hash value containing a bunch of numbers separate by spaces, each hash value instead points to an array that contains the data points for that group. And, each of these data points in that array points to a hash whose keys are SWITCH representing their switching and FREQ for their frequency.
You could talk about the frequency of the first data point of Group 1 as:
$data{1}->[0]->{FREQ};
This way, you can more easily manipulate your data -- even if you're simply rewriting it into another flat file. You can also use the Storable module to write your data in a way which saves its structure.
#! /usr/bin/env perl
#
use strict;
use feature qw(say);
use autodie;
use warnings;
use Data::Dumper;
use constant {
FILE1 => "file1.txt",
FILE2 => "file2.txt",
};
my %data; # A hash of an array of hashes (superfun!)
open my $fh1, "<", FILE1;
while ( my $line = <$fh1> ) {
chomp $line;
my ( $group, $switch, $frequency ) = split /\s+/, $line;
if ( not exists $data{$group} ) {
$data{$group} = [];
}
push #{ $data{$group} }, { SWITCH => $switch, FREQ => $frequency };
}
close $fh1;
open my $fh2, "<", FILE2;
while ( my $line = <$fh2> ) {
chomp $line;
my ( $group, $frequency ) = split /\s+/, $line;
if ( not exists $data{$group} ) {
$data{$group} = [];
}
push #{ $data{$group} }, { SWITCH => undef, FREQ => $frequency };
}
close $fh2;
say Dumper \%data;
This will give you:
$VAR1 = {
'1' => [
{
'SWITCH' => '2',
'FREQ' => '0.6'
},
{
'SWITCH' => '3',
'FREQ' => '0.4'
},
{
'SWITCH' => undef,
'FREQ' => '0.9'
}
],
'3' => [
{
'SWITCH' => '1',
'FREQ' => '0.4'
},
{
'SWITCH' => undef,
'FREQ' => '0.5'
}
],
'2' => [
{
'SWITCH' => '1',
'FREQ' => '0.6'
},
{
'SWITCH' => '3',
'FREQ' => '0.2'
},
{
'SWITCH' => undef,
'FREQ' => '0.7'
}
]
};

This will do what you need.
I apologize for the lack of analysis, but it is late and I should be in bed.
I hope this helps.
use strict;
use warnings;
my $fh;
my %switches;
open $fh, '<', 'file1.txt' or die $!;
while (<$fh>) {
my ($origin, #switch) = split;
push #{ $switches{$origin} }, \#switch;
}
open $fh, '<', 'file2.txt' or die $!;
while (<$fh>) {
my ($origin, $freq) = split;
my $switches = join ' ', map join(' ', #$_), #{ $switches{$origin} };
print join(' ', $origin, $freq, $switches), "\n";
}
output
1 0.9 2 0.6 3 0.4
2 0.7 1 0.6 3 0.2
3 0.5 1 0.4
Update
Here is a fixed version of your own code that produces similar results. The main problem is that the values in your %switches arrays of arrays, so you have to do two dereferences. I've fixed that by adding #switches, which contains the same contents as the current %switches value, but has strings in place of two-element arrays.
I've also added use strict and use warnings, and declared all your variables properly. The open calls have been changed to the three-argument open with lexical file handles as they should be, and they are now being checked for success. I've changed your split calls, as a simple bare split with no parameters is all you need. And I've removed your #tmp and used proper list assignments instead. Oh, and I've changed the wasteful [#array] to a simple \#array (which wouldn't have worked without declaring variables using my).
I still think my version is better, if only because it's much shorter, and yours prints the groups in random order.
#!/usr/bin/perl
use strict;
use warnings;
my ($input1, $input2, $output) = #ARGV;
my %switches;
open my $in1, '<', $input1 or die $!;
while (<$in1>) {
my ($group, #switches) = split;
push #{ $switches{$group} }, \#switches;
}
close $in1;
my %groups;
open my $in2, '<', $input2 or die $!;
while (<$in2>) {
my ($group, $pop) = split;
$groups{$group} = $pop;
}
close $in2;
open my $out, '>', $output or die $!;
for my $group (keys %groups) {
my $pop = $groups{$group};
my #switches = map "#$_", #{ $switches{$group} };
print $out "$group $pop #switches\n"
}
close $out or die $!;

Comparison of two arrays perl

I am new to Perl world, and I have a script that compares two arrays.
I use List::MoreUtils (each_arrayref) to do the comparison.
I have two questions:
1) Is there a way to compare two chunks of arrays (like natatime but for two arrayrefs) instead of comparing single element at a time as in each_arrayref?
The elements should be from the same index from each array.
The data structure is something like this:
{
atr => [qw/ a b c d /],
ats => [qw/ a b c d /],
att => [qw/ a b c d /],
}
This is what I have got so far.
my #lists = keys %{$hash};
for (my $i = 0; $i <= #lists; $i++) {
my $list_one = $lists[$i];
my $one = $hash->{$list_one};
for (my $j = 0 ; $j <= #lists ; $j++) {
my $list_two = $lists[$j];
my $two = $hash->{$list_two};
my ($overlapping, $mismatch, $identity);
my $match = 0;
my $non_match = 0;
my $count_ac_calls = 0;
my $each_array = each_arrayref($one, $two);
while (my ($call_one, $call_two) = $each_array->()) {
if ((defined $call_one) && (defined $call_two)) {
if ($call_one eq $call_two) {
$match++;
}
if ($call_one ne $call_two) {
$non_match++;
}
}
} #end of while loop $each_array->()
print "$list_one,$list_two,$match,$non_match";
} #end of for j loop
} #end of for i loop
I would like to compare atr->ats, atr->att, ats->att. But with my current code, I get repetitions of comparison like ats->atr att->atr,att->ats.
2) How can I avoid those?

I'm not clear what your first question means. Do you want an iterator that, say, returns (('a','b','c'),('a','b','c'))
instead of
('a','a')? If so then there isn't one available in a library, but it wouldn't be hard to write your own.
As for the second, the usual way to avoid items being compared with themselves is to change the inner loop to start after the current value of the first. Like so
for my $i (0..$#lists) {
for my $j ($i+1..$#lists) {
}
}
This works because A eq B is generally the same as B eq A, so there is no point in comparing an entry with one earlier in the list because the inverse comparison has already been made.
Note that it is much better Perl to write for loops this way than the messy C-style syntax. You also have a couple of bugs in
for (my $i = 0 ; $i <= #lists ; $i++) { ... }
because the maximum index of #lists is one less than the scalar value of #lists - usually coded as $#lists. The same problem exists in your loop for $j.
Update
Here is a refactoring of your program, written to include the ideas I have described and to be more Perlish. I hope it is useful to you.
use strict;
use warnings;
use List::MoreUtils 'each_arrayref';
my $hash = {
atr => [qw/ a b c d /],
ats => [qw/ a b c d /],
att => [qw/ a b c d /],
};
my #keys = keys %{$hash};
for my $i (0 .. $#keys) {
my $key1 = $keys[$i];
my $list1 = $hash->{$key1};
for my $j ($i+1 .. $#keys) {
my $key2 = $keys[$j];
my $list2 = $hash->{$key2};
my ($match, $non_match) = (0, 0);
my $iter = each_arrayref($list1, $list2);
while (my ($call1, $call2) = $iter->()) {
if (defined $call1 and defined $call2) {
($call1 eq $call2 ? $match : $non_match)++;
}
}
print "$key1, $key2, $match, $non_match\n";
}
}

One option is to use Array::Compare to return the number of different array elements. Also, Math::Combinatorics is used to obtain only unique comparisons.
use strict;
use warnings;
use Array::Compare;
use Math::Combinatorics;
my %hash = (
'atr' => [ 'a', 'b', 'c', 'd' ],
'ats' => [ 'a', 'b', 'c', 'd' ],
'att' => [ 'a', 'c', 'c', 'd' ],
);
my $comp = Array::Compare->new( DefFull => 1 );
my $combinat = Math::Combinatorics->new(
count => 2,
data => [ keys %hash ],
);
while ( my ($key1, $key2) = $combinat->next_combination ) {
my $diff = $comp->compare( \#{ $hash{$key1} }, \#{ $hash{$key2} } );
print "$key1,$key2," . ( #{ $hash{$key1} } - $diff ) . ",$diff\n";
}
Output:
ats,att,3,1
ats,atr,4,0
att,atr,3,1

You're not really taking advantage of the features Perl has to offer. Rather than use an error prone C-style loop, just use for my $var (LIST). You can also skip redundant list checking by skipping the self-checks, too. I've taken your script, made some alterations, and I'm sure you'll find it a bit easier to read.
use v5.16;
use warnings;
use List::MoreUtils qw{each_arrayref};
my $hash = {
'atr' => [
'a',
'b',
'c',
'd'
],
'ats'=>[
'a',
'b',
'c',
'd'
],
'att' => [
'a',
'c',
'c',
'd'
],
};
for my $list_one (keys $hash) {
my $one = $hash->{$list_one};
for my $list_two (keys $hash) {
next if $list_one ~~ $list_two;
my $two = $hash->{$list_two};
my ($match, $non_match);
$match = $non_match = 0;
my $each_array = each_arrayref($one, $two);
while (my ($call_one, $call_two) = $each_array->()) {
if($call_one && $call_two) {
if($call_one eq $call_two) {
$match++;
}
else {
$non_match++;
}
}
}
print "$list_one,$list_two,$match,$non_match\n";
}
}
You'll want to evaluate one at a time anyway so that you can add in some extra bits like the index location. (Yes, you could use the C-style loop, but that'd be a bit more difficult to read.)

Clustering By Interval Via Hash of Array in Perl

I have a data that looks like this
#Status value
TP 5.000
TP 3.000
TP 3.000
TN 10.000
TP 2.000
TP 9.000
TN 1.000
TP 9.000
TN 1.000
What we want to do is to cluster the Status based on the given interval in value.
Let that interval be 1-3, 4-6, 7-9, 10-12, etc .. (i.e. Bin size 3).
We hope to get the hash of array like this:
my %hoa = (
'1-3' => [TP,TP,TP,TN,TN],
'4-6' => [TP],
'7-9' => [TP,TP],
'10-12' => [TN]);
What's the way to achieve that?
Update: Corrected the HoA for 7-9, thanks to ysth.

Abstracting away the code to determine interval:
sub interval {
my ($val) = #_;
my $i = int( ( $val + 2 ) / 3 );
my $interval = sprintf( '%d-%d', $i * 3 -2, $i * 3 );
return $interval;
}
my %hoa;
while ( my $line = <> ) {
next if $line =~ /^#/;
my ($status, $value) = split ' ', $line;
push #{ $hoa{ interval($value) } }, $status;
}
use Data::Dumper;
print Dumper \%hoa;
(which gets two TPs for 7-9, not one as you show).

ysth's answer was the first thing that occurred to me as well, and I think he has the right approach.
I'd just like to leave a suggestion: you could use a clustering algorithm to do this for you in a future-proof kind of way (say, when your data becomes multidimensional). K-means, for example, would work fine, even for 1D data such as yours.
For example:
use strict; use warnings;
use Algorithm::KMeans;
my $datafile = $ARGV[0] or die;
my $K = $ARGV[1] or 0;
my $mask = 'N1';
my $clusterer = Algorithm::KMeans->new(
datafile => $datafile,
mask => $mask,
K => $K,
terminal_output => 0,
);
$clusterer->read_data_from_file();
my ($clusters, $cluster_centers) = $clusterer->kmeans();
my %clusters;
while (#$clusters) {
my $cluster = shift #$clusters;
my $center = shift #$cluster_centers;
$clusters{"#$center"} = $cluster;
}
use YAML; print Dump \%clusters;

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Perl Hash to CSV - arrays

Related

How to sort an hash and assign a variable to the keypair from an array in perl?

How to assign array variable as a hash?

combine hashes from two files into a single file

Comparison of two arrays perl

Clustering By Interval Via Hash of Array in Perl

Categories

Resources