Problems with array references in perl - arrays

I am trying to count how many times DE10 and each of the keys in my ICD10 hash occur on the same line in my file2.tsv. I further have to split the counts by male/female (M/K).
I therefore made a hash called results. Each of its keys is named after a key in the ICD10 hash, and each refers to an array of 2 elements, the first counting the males, the second counting the females.
But I get this warning:
Can't use string ("0") as an ARRAY ref while "strict refs"
due to this line:
$results{$key}[1] +=1;
I am a little weak on references; can someone help me find my mistake? Thanks a lot.
#!/usr/bin/perl -w
use strict;
###################
# loading my hash #
###################
my %icd10;
open(IN, '<', 'myfile.tsv') or die;
while (defined (my $line = <IN>)) {
chomp $line;
$icd10{$line} = 1;
}
close IN;
################
### COUNTING
#################
my %results;
open(IN, '<', 'myfile2.tsv') or die;
while (defined (my $line = <IN>)) {
chomp $line;
my @line = split('\t', $line);
my %hash;
for (my $i = 2; $i < scalar(@line); $i++){
$hash{$line[$i]} = 1;
}
if (grep (m/^DE10/, keys %hash)) {
foreach my $key (keys %icd10){
if (grep (m/^$key/, keys %hash)) {
if (exists $results{$key}) {
if ($line[1] eq 'M') {
$results{$key}[1] +=1;
}
elsif ($line[1] eq 'K'){
$results{$key}[2] +=1;
}
}
else{
if ($line[1] eq 'M') {
$results{$key}=(1,0);
}
elsif ($line[1] eq 'K'){
$results{$key}=(0,1);
}
}

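If you want $results{$key} to be a reference to an array, then the parentheses in the two assignments $results{$key}=(1,0); and $results{$key}=(0,1); should be square brackets, like this: $results{$key}=[1,0];. With parentheses, the right-hand side is evaluated in scalar context, where the comma operator returns only its last operand, so $results{$key}=(1,0); stores the plain number 0. That is the string "0" the error message complains about. Note also that a two-element array has indices 0 and 1, so the counters should be $results{$key}[0] and $results{$key}[1], not [1] and [2].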
To create a reference to an array, you can use backslash operator:
$arrayref = \@array;
To create a reference to an anonymous array you should use square brackets:
$arrayref = [ 'ele1', 'ele2' ];
See perlref for further details.
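As a minimal sketch of the counting pattern with anonymous arrays (the sample rows and the code values are made up for illustration, and the `//=` operator assumes Perl 5.10 or later), each key gets a two-element array reference that is created once and then incremented in place:

```perl
use strict;
use warnings;

# Hypothetical sample rows: [sex, code]; not taken from the original files.
my @rows = ( ['M', 'DI10'], ['K', 'DI10'], ['M', 'DI10'], ['K', 'DE11'] );

my %results;
for my $row (@rows) {
    my ($sex, $key) = @$row;
    # Create the [male, female] counter pair the first time a key is seen.
    $results{$key} //= [0, 0];
    # Index 0 counts 'M', index 1 counts 'K'.
    $results{$key}[ $sex eq 'M' ? 0 : 1 ]++;
}

print "$_: M=$results{$_}[0] K=$results{$_}[1]\n" for sort keys %results;
```

Because the array reference is created with square brackets, the separate exists/else branch from the question is no longer needed.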

Related

Is it possible to put elements of array to hash in perl?

example of file content:
>random sequence 1 consisting of 500 residues.
VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV
HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;
>random sequence 2 consisting of 500 residues.
KAAATKKPWADTIPYLLCTFMQTSGLEWLHTDYNNFSSVVCVRYFEQFWVQCQDHVFVKN
KNWHQVLWEEYAVIDSMNFAWPPLYQSVSSNLDSTERMMWWWVYYQFEDNIQIRMEWCNI
YSGFLSREKLELTHNKCEVCVDKFVRLVFKQTKWVRTMNNRRRVRFRGIYQQTAIQEYHV
HQKIIRYPCHVMQFHDPSAPCDMTRQGKRMNFCFIIFLYTLYEVKYWMHFLTYLNCLEHR;
>random sequence 3 consisting of 500 residues.
AYCSCWRIHNVVFQKDVVLGYWGHCWMSWGSMNQPFHRQPYNKYFCMAPDWCNIGTYAWK
I need an algorithm to build a hash $hash{$key} = $value; where lines starting with > are the values and following lines are the keys.
What I have tried:
open (DATA, "seq-at.txt") or die "blabla";
@data = <DATA>;
%result = ();
$k = 0;
$i = 0;
while($k != @data) {
$info = @data[$k]; # deletes the first element
if(@data[$i] !=~ ">") {
$key .= @data[$i]; $i++;
} else {
$k = $i;
}
$result{$key} = $value;
}
but it doesn't work.
You don't need to read the file into an array first; you can build your hash directly:
use strict;
use warnings;
# ^- always start your code like this to see errors and ambiguities
# declare your variables using "my" to specify the scope
my $filename = 'seq-at.txt';
# use the three-argument form of open so the mode is explicit and the file can't be overwritten by accident:
open my $fh, '<', $filename or die "unable to open '$filename' $!";
my %hash;
my $hkey = '';
my $hval = '';
while (<$fh>) {
chomp; # remove the trailing newline
if (/^>/) { # when the line starts with ">"
# store the key/value in the hash if the key isn't empty
# (the key is empty when the first ">" is encountered)
$hash{$hkey} = $hval if ($hkey);
# store the line in $hval and clear $hkey
($hval, $hkey) = $_;
} elsif (/\S/) { # when the line isn't empty (or blank)
# append the line to the key
$hkey .= $_;
}
}
# store the last key/val in the hash if any
$hash{$hkey} = $hval if ($hkey);
# display the hash
foreach (keys %hash) {
print "key: $_\nvalue: $hash{$_}\n\n";
}
It is unclear what you want; the array seems to be the lines that follow each "random sequence" header. If the contents of a file test.txt are:
Line 1:">"random sequence 1 consisting of 500 residues.
Line 2:VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV
Line 3:HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;
You could try something like:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $contentFile = $ARGV[0];
my %testHash = ();
my $currentKey = "";
open(my $contentFH,"<",$contentFile) or die "Cannot open '$contentFile': $!";
while(my $contentLine = <$contentFH>){
chomp($contentLine);
next if($contentLine eq ''); # Empty lines.
if($contentLine =~ /^"\>"(.*)/){
$currentKey= $1;
}else{
push(@{$testHash{$currentKey}},$contentLine);
}
}
print Dumper(\%testHash);
Which results in a structure like this:
seb@amon:[~]$ perl test.pl test.txt
$VAR1 = {
'random sequence 3 consisting of 500 residues.' => [
'AYCSCWRIHNVVFQKDVVLGYWGHCWMSWGSMNQPFHRQPYNKYFCMAPDWCNIGTYAWK'
],
'random sequence 1 consisting of 500 residues.' => [
'VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV',
'HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;'
],
'random sequence 2 consisting of 500 residues.' => [
'KAAATKKPWADTIPYLLCTFMQTSGLEWLHTDYNNFSSVVCVRYFEQFWVQCQDHVFVKN',
'KNWHQVLWEEYAVIDSMNFAWPPLYQSVSSNLDSTERMMWWWVYYQFEDNIQIRMEWCNI',
'YSGFLSREKLELTHNKCEVCVDKFVRLVFKQTKWVRTMNNRRRVRFRGIYQQTAIQEYHV',
'HQKIIRYPCHVMQFHDPSAPCDMTRQGKRMNFCFIIFLYTLYEVKYWMHFLTYLNCLEHR;'
]
};
You would basically be using each hash value as an array; the @{$variable} dereference does the magic, creating the array automatically on the first push.

Access the key value from an associative array

I have the associative array %cart_item, within this is a series of associative arrays. I need to access the value of the keys within %cart_item. I have the following code which iterates on each array key. (I do the equivalent of php's continue if the value is 'meta')
my $key_value;
for (keys %cart_item) {
next if (/^meta$/ || /^\s*$/);
}
I need to do something like this though (although this isn't valid), setting the value of the keys in the loop:
my $key_value;
for $i (keys %cart_item) {
next if (/^meta$/ || /^\s*$/);
$key_value = $i;
# do stuff
}
Could anyone suggest a solution here? Apologies if this is obvious, I'm a Perl newbie. Thanks
I think you are asking for
for my $key (keys %cart_item) {
next if $key =~ /^meta$/ || $key =~ /^\s*$/;
my $val = $cart_item{$key};
...
}
If you're just looking for the value that goes with the key, you can get both at the same time with each:
while (my ($key, $val) = each %cart_item) {
next if $key eq 'meta' || $key =~ /^\s*$/;
...
}
That's the equivalent of PHP's foreach ($cart_item as $key => $val).
I also changed the "meta" check to use simple string equality; no need to use a regular expression for an exact match.
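A small sketch of the keys-based loop on a hash of hashes (the contents of %cart_item here are invented for illustration; only the 'meta' key comes from the question). Sorting the keys is optional and is only used to make the output order predictable:

```perl
use strict;
use warnings;

# Hypothetical cart data; the real structure is not shown in the question.
my %cart_item = (
    meta  => { currency => 'USD' },
    apple => { qty => 2 },
    pear  => { qty => 5 },
);

my %qty;
for my $key (sort keys %cart_item) {
    # Skip the 'meta' entry and blank keys.
    next if $key eq 'meta' || $key =~ /^\s*$/;
    my $val = $cart_item{$key};    # a reference to the inner hash
    $qty{$key} = $val->{qty};
    print "$key: $val->{qty}\n";
}
```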
Your original code has
for ( keys %cart_item ) {
next if (/^meta$/ || /^\s*$/);
}
which works fine because the for has no loop control variable, so it defaults to Perl's "it" pronoun variable $_. In addition, your regex pattern matches have no object, so they also default to $_.
Written fully, this would be
for $_ ( keys %cart_item ) {
next if ( $_ =~ /^meta$/ || $_ =~ /^\s*$/);
}
but we don't have to write all of that. Some people hate it; others, like me, think it's absolute genius.
Your non-working code
my $key_value;
for $i (keys %cart_item) {
next if (/^meta$/ || /^\s*$/);
$key_value = $i;
# do stuff
}
does use a loop control variable $i (bad name for a hash key, by the way). That's all fine except that your regex matches still test $_, which this loop no longer sets. Match against $i explicitly:
my $key_value;
for $i (keys %cart_item) {
next if $i =~ /^meta$/ or $i =~ /^\s*$/;
$key_value = $i;
# do stuff
}
or, better still, stick with $_ and write this
for ( keys %cart_item ) {
next if /^meta$/ or /^\s*$/;
my $key_value = $_;
# do stuff
}

Perl : matching the contents of a file with the contents of an array

I have an array @arr1 where each element is of the form #define A B.
I have another file, f1 with contents:
#define,x,y
#define,p,q
and so on. I need to check if the second value of every line (y, q etc) matches the first value in any element of the array. Example: say the array has an element #define abc 123 and the file has a line #define,hij,abc.
When such a match occurs, I need to add the line #define hij 123 to the array.
while(<$fhDef>) #Reading the file
{
chomp;
$_ =~ tr/\r//d;
if(/#define,(\w+),(\w+)/)
{
my $newLabel = $1;
my $oldLabel = $2;
push @oldLabels, $oldLabel;
push @newLabels, $newLabel;
}
}
foreach my $x(@tempX) #Reading the array
{
chomp $x;
if($x =~ /#define\h{1}\w+\h*0x(\w+)\h*/)
{
my $addr = $1;
unless(grep { $x =~ /$_/ } @oldLabels)
{
next;
}
my $index = grep { $oldLabels[$_] eq $_ } 0..$#oldLabels;
my $new1 = $newLabels[$index];
my $headerLabel1 = $headerLabel."X_".$new1;
chomp $headerLabel1;
my $headerLine = "#define ".$headerLabel1."0x".$addr;
push @tempX, $headerLine;
}
}
This just hangs. No doubt I'm missing something right in front of me, but what??
The canonical way is to use a hash. Hash the array, using the first argument as the key. Then walk the file and check for existence of the key in the hash. I used a HoA (hash of arrays) to handle multiple values for each key (see the last two lines).
#! /usr/bin/perl
use warnings;
use strict;
my @arr1 = ( '#define y x',
'#define abc 123',
);
my %hash;
for (@arr1) {
my ($arg1, $arg2) = (split ' ')[1, 2];
push @{ $hash{$arg1} }, $arg2;
}
while (<DATA>) {
chomp;
my ($arg1, $arg2) = (split /,/)[1, 2];
if ($hash{$arg2}) {
print "#define $arg1 $_\n" for @{ $hash{$arg2} };
}
}
__DATA__
#define,x,y
#define,p,q
#define,hij,abc
#define,klm,abc
As the other answer said, it's better to use a hash. Also, keep in mind that you're doing a
foreach my $x(@tempX)
but you're also doing a
push @tempX, $headerLine;
which means that you're modifying the array on which you're iterating. This is not just bad practice, this also means that you're most likely going to have an infinite loop because of it.
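One way to avoid growing the array while iterating over it is to collect the new lines in a second array and append them once the loop is done. The sketch below assumes a simplified pattern and a made-up NEW_ prefix; neither is taken from the question's real data:

```perl
use strict;
use warnings;

# Hypothetical stand-in for @tempX from the question.
my @tempX = ('#define abc 0x10', '#define xyz 0x20');

my @additions;    # new lines are collected here, not pushed onto @tempX
for my $x (@tempX) {
    if ($x =~ /^#define\h+(\w+)\h+0x(\w+)/) {
        push @additions, "#define NEW_$1 0x$2";
    }
}
push @tempX, @additions;    # safe: the loop over @tempX has finished

print "$_\n" for @tempX;
```

The loop now visits only the original elements, so it always terminates.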

ID tracking while swapping and sorting other two arrays in perl

#! /usr/bin/perl
use strict;
my (@data,$data,@data1,@diff,$diff,$tempS,$tempE, @ID,@Seq,@Start,@End, @data2);
#my $file=<>;
open(FILE, "< ./out.txt");
while (<FILE>){
chomp $_;
#next if ($line =~/Measurement count:/ or $line =~/^\s+/) ;
#push @data, [split ("\t", $line)] ;
my @data = split('\t');
push(@ID, $data[0]);
push(@Seq, $data[1]);
push(@Start, $data[2]);
push(@End, $data[3]);
# push @$data, [split ("\t", $line)] ;
}
close(FILE);
my %hash = map { my $key = "$ID[$_]"; $key => [ $Start[$_], $End[$_] ] } (0..$#ID);
for my $key ( %hash ) {
print "Key: $key contains: ";
for my $value ($hash{$key} ) {
print " $hash{$key}[0] ";
}
print "\n";
}
for (my $j=0; $j <=$#Start ; $j++)
{
if ($Start[$j] > $End[$j])
{
$tempS=$Start[$j];
$Start[$j]=$End[$j];
$End[$j]=$tempS;
}
print"$tempS\t$Start[$j]\t$End[$j]\n";
}
my @sortStart = sort { $a <=> $b } @Start;
my @sortEnd = sort { $a <=> $b } @End;
#open(OUT,">>./trial.txt");
for(my $i=1521;$i>=0;$i--)
{
print "hey";
my $diff = $sortStart[$i] - $sortStart[$i-1];
print "$ID[$i]\t$diff\n";
}
I have three arrays of same length, ID with IDs (string), Start and End with integer values (reading from a file).
I want to loop through all these arrays while keeping track of the IDs. First I swap the elements of Start and End whenever Start > End; then I have to sort these two arrays for further processing (I am subtracting neighbouring values, Start[0]-Start[1], for each item in Start). Sorting changes the positions, and since each ID belongs to exactly one Start and End element, how can I keep track of my IDs while sorting?
Three arrays, ID, Start and End, are under my consideration.
Here is a small chunk of my input data:
DQ704383 191990066 191990037
DQ698580 191911184 191911214
DQ724878 191905507 191905532
DQ715191 191822657 191822686
DQ722467 191653368 191653339
DQ707634 191622552 191622581
DQ715636 191539187 191539157
DQ692360 191388765 191388796
DQ722377 191083572 191083599
DQ697520 189463214 189463185
DQ709562 187245165 187245192
DQ540163 182491372 182491400
DQ720940 180753033 180753060
DQ707760 178340696 178340726
DQ725442 178286164 178286134
DQ711885 178250090 178250119
DQ718075 171329314 171329344
DQ705091 171062479 171062503
The columns above are ID, Start and End, respectively. If Start > End, I swap the two values only between those two arrays. After swapping, the descending order may be broken, but I want the values in descending order, together with their corresponding IDs, for the subtraction explained above.
Don't use different arrays, use a hash to keep the related pieces of information together.
#!/usr/bin/perl
use warnings;
use strict;
use enum qw( START END );
my %hash;
while (<>) {
my ($id, $start, $end) = split;
$hash{$id} = [ $start < $end ? ($start, $end)
: ($end, $start) ];
}
my @by_start = sort { $hash{$a}[START] <=> $hash{$b}[START] } keys %hash;
my @by_end = sort { $hash{$a}[END] <=> $hash{$b}[END] } keys %hash;
use Test::More;
is_deeply(\@by_start, \@by_end, 'same');
done_testing();
Moreover, in the data sample you provided, the order of IDs is the same whether you sort by start or by end.
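An alternative sketch that uses only core Perl: keep each row as one record (a hash reference) and sort the records, so the ID always travels with its coordinates. The three rows are copied from the question's sample; the field names id, start and end, and the choice to report the gap between neighbouring start values, are assumptions made for illustration:

```perl
use strict;
use warnings;

# A few rows copied from the sample input in the question.
my @lines = (
    "DQ704383 191990066 191990037",
    "DQ698580 191911184 191911214",
    "DQ724878 191905507 191905532",
);

my @records;
for my $line (@lines) {
    my ($id, $start, $end) = split ' ', $line;
    ($start, $end) = ($end, $start) if $start > $end;    # ensure start <= end
    push @records, { id => $id, start => $start, end => $end };
}

# Sort whole records in descending order of start.
my @sorted = sort { $b->{start} <=> $a->{start} } @records;

# Difference between each start and the previous (larger) one, with its ID.
my @gaps;
for my $i (1 .. $#sorted) {
    my $diff = $sorted[$i - 1]{start} - $sorted[$i]{start};
    push @gaps, [ $sorted[$i]{id}, $diff ];
    print "$sorted[$i]{id}\t$diff\n";
}
```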

regex matching I think

First sorry if I should have added this to my earlier question today, but I now have the below code and am having problems getting things to add up to 100...
use strict;
use warnings;
my @arr = map {int( rand(49) + 1) } ( 1..100 ); # build an array of 100 random numbers between 1 and 49
my @count2;
foreach my $i (1..49) {
my @count = join(',', @arr) =~ m/,$i,/g; # ???
my $count1 = scalar(@count); # I want this $count1 to be the number of times each of the numbers($i) was found within the string/array.
# push(@count2, $count1 ." times for ". $i); # pushing a "number then text and a number / scalar, string, scalar" to an array.
push(@count2, [$count1, $i]);
}
#sort @count2 and print the top 7
my @sorted = sort { $b->[0] <=> $a->[0] } @count2;
my $sum = 0;
foreach my $i (0..$#sorted) { # (0..6)
printf "%d times for %d\n", $sorted[$i][0], $sorted[$i][1];
$sum += $sorted[$i][0]; # try to add up/sum all numbers in the first column to make sure they == 100
}
print "Generated $sum random numbers.\n"; # doesn't add up to 100, I think it is because of the regex and because the first number doesn't have a "," in front of it
# seem to be always 96 or 97, 93...
Replace these two lines:
my @count = join(',', @arr) =~ m/,$i,/g; # ???
my $count1 = scalar(@count); # I want this $count1 to be the number of times each of the numbers($i) was found within the string/array.
with this:
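my $count1 = grep { $i == $_ } @arr;
grep returns the elements for which the expression in {} evaluates to true; in scalar context it returns how many there are. This is less error-prone and much more efficient than joining the entire array and using a regex. The regex undercounts because the first and last numbers in the joined string have no comma on one side, and two equal neighbours share a single comma, which /g consumes with the first match. Also note that scalar is not necessary: $count1 is a scalar, so perl evaluates grep in scalar context.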
You can also get rid of this line:
push(@count2, $count1 ." times for ". $i); # pushing a "number then text and a number / scalar, string, scalar" to an array.
since you are already printing the same information in your last foreach loop.
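To check that the grep-based count accounts for every element, here is a small sketch with a fixed sample array in place of rand() (the sample values are made up so that the result is reproducible):

```perl
use strict;
use warnings;

# Fixed sample instead of rand(), so the result is reproducible.
my @arr = (5, 5, 12, 5, 49, 12, 1);

# grep in scalar context returns the number of matching elements.
my $fives = grep { $_ == 5 } @arr;

# Count every possible value; the counts must add up to scalar(@arr).
my $sum = 0;
for my $i (1 .. 49) {
    my $count = grep { $_ == $i } @arr;
    $sum += $count;
}

print "5 occurs $fives times; total counted: $sum of ", scalar(@arr), "\n";
```

The sample deliberately contains two adjacent 5s and a value at each end of the array, which are the cases the comma-based regex misses.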
#!/usr/bin/perl
use strict; use warnings;
use YAML;
my @arr;
$#arr = 99;
my %counts;
for my $i (0 .. 99) {
my $n = int(rand(49) + 1);
$arr[ $i ] = $n;
++$counts{ $n };
}
my @result = map [$_, $counts{$_}],
sort {$counts{$a} <=> $counts{$b} }
keys %counts;
my $sum;
$sum += $_->[1] for @result;
print "Number of draws: $sum\n";
You can probably reuse some well-tested code from List::MoreUtils.
use List::MoreUtils qw/ indexes /;
...
foreach my $i (1..49) {
my @indexes = indexes { $_ == $i } @arr;
my $count1 = scalar( @indexes );
push( @count2, [ $count1, $i ] );
}
If you don't need the warns in the sum loop, then I'd recommend using sum from List::Util.
use List::Util qw/ sum /;
...
my $sum = sum map { $_->[0] } @sorted;
If you insist on the loop, rewrite it as:
foreach my $i ( @sorted ) {

Resources