Perl array and string as method arguments

I have a method where I give two arguments: one array and one string. The problem is when I initialise two different variables, one for the array and one for the string, I get the first element of the array in the first variable and the second element in the second variable. My question is how can I get the whole array in a variable and the string in a different variable?
Analyze.pm
sub analyze {
my $self = shift;
my ($content, $stringToSearch) = @_;
# my ($stringToSearch) = @_;
print "$stringToSearch";
if (!defined($stringToSearch) or length($stringToSearch) == 0) { die 'stringToSearch is not defined yet! ' }
foreach my $element ($content) {
#print "$element";
my $loc = index($element, $stringToSearch);
# print "$loc\n";
given ($stringToSearch) {
when ($stringToSearch eq "Hi") {
if ($loc != 0) {
print "Searched word is Hi \n";
} else {
print "No word found like this one! "
}
}
#when ($stringToSearch == 'ORIENTED_EDGE') {
# print 'Searched word is ORIENTED_EDGE';
#} # Printed out because i dont need it now!
}
break; # For testing
}
}
example.pm
my @fileContent = ('Hi', 'There', 'Its', 'Me');
my $analyzer = Analyze->new();
$analyzer->analyze(@fileContent, 'Hi');
When I change $content to @content, it puts all the values of the array and the string in @content.
I hope someone is able to help me. I'm a beginner in Perl. Thanks in advance.

You can't pass arrays to subs (or methods), only a list of scalars. As such,
$analyzer->analyze(@lines, 'Hi');
is the same as
$analyzer->analyze($lines[0], $lines[1], ..., 'Hi');
It means that the following won't work well:
my (@strings, $target) = @_;
Perl doesn't know how many items belonged to the original array, so it places them all in @strings, leaving $target undefined.
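To see the flattening for yourself, here is a minimal sketch. The helper count_args is hypothetical (it is not part of the question's code); it only reports how many scalars arrived in @_.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many scalars arrived in @_.
sub count_args {
    return scalar @_;
}

my @lines = ('Hi', 'There', 'Its', 'Me');

# The array is flattened, so the sub sees five separate scalars.
print count_args(@lines, 'Hi'), "\n";     # 5

# A reference is a single scalar, so the array arrives as one argument.
print count_args(\@lines, 'Hi'), "\n";    # 2
```

The second call is why passing a reference lets you keep the array and the string apart regardless of argument order.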
Solutions
You can do what you want as follows:
sub analyze {
    my $self = shift;
    my $target = pop;
    my @strings = @_;
    for my $string (@strings) {
        ...
    }
}
$analyzer->analyze(@lines, 'Hi');
Or without the needless copy:
sub analyze {
    my $self = shift;
    my $target = pop;
    for my $string (@_) {
        ...
    }
}
$analyzer->analyze(@lines, 'Hi');
Passing the target first can be easier.
sub analyze {
    my ($self, $target, @strings) = @_;
    for my $string (@strings) {
        ...
    }
}
$analyzer->analyze('Hi', @lines);
Or without the needless copy:
sub analyze {
    my $self = shift;
    my $target = shift;
    for my $string (@_) {
        ...
    }
}
$analyzer->analyze('Hi', @lines);
You could also pass a reference to the array (in any order you like) since a reference is a scalar.
sub analyze {
    my ($self, $target, $strings) = @_;
    for my $string (@$strings) {
        ...
    }
}
$analyzer->analyze('Hi', \@lines);
I would go with the second to last. It follows the same general pattern as the well-known grep.
my @matches = grep { condition($_) } @strings;
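For something you can run as-is, here is a rough sketch of the target-first pattern. It is written as a plain sub rather than a method, and the name find_matches is made up for illustration. Note that index() returns -1 when the substring is absent, so that is the value to test against (the question's code compares against 0, which is the position of a match at the start of the string).

```perl
use strict;
use warnings;

# Hypothetical free-standing variant of analyze: target first, strings after.
sub find_matches {
    my ($target, @strings) = @_;
    die "target is not defined\n" unless defined $target && length $target;

    # index() returns -1 when $target does not occur in the string.
    return grep { index($_, $target) != -1 } @strings;
}

my @lines = ('Hi', 'There', 'Its', 'Me', 'Hip');
my @found = find_matches('Hi', @lines);
print "@found\n";    # Hi Hip
```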

I'd just like to add a slightly different method of passing an array to a subroutine.
# Begin main program
my @arr = (1, 2, 3);
my $i = 0;
mysub(\@arr, $i);    # Pass the reference to the array
exit;                # Exit main program
##########
sub mysub {
    my ($aref, $i) = @_;    # We are receiving an array ref.
    my @arr = @$aref;       # Now we are back to a regular array.
    print "$arr[0]\n$arr[1]\n$arr[2]\n";
    return;
}

Related

Error: readline on closed filehandle using Perl

#!/usr/bin/perl
use Cwd;
use warnings;
open($fh,'<', "clinical.txt");
$line = <$fh>;
while($line = <$fh>)
{
my @fields2 =split(" ",$line);
push(@id,$fields2[1]);
push(@status,$fields2[3]);
}
@flow = grep { -d } glob "*";
$arrSize = @flow;
for ($p = 0; $p < $arrSize; $p++)
{
@files=<$flow[$p]/*.maf>;
print $flow[$p],"\n";
foreach $file(@files)
{
print $file,"\n";
open(x,$file);
%hash={};
%tested={};
%Mut_Count={};
$hyper=0;
$line = <x>;
$line = <x>;
$line = <x>;
$line = <x>;
$line = <x>;
$line = <x>;
while($line = <x>)
{
@temp=split("\t",$line);
$key=$temp[4]."_".$temp[5]."_".$temp[6]."_".$temp[10]."_".$temp[11]."_".$temp[12]."_".$temp[15];
push @{$hash{$key}}, "0";
push @{$Mut_Count{$temp[15]}}, "0";
}
@nm=split(/\./,$file);
open(x,$file);
open(Out1,">Results/".$nm[1]."_Hyper.txt");
$line = <x>;
$line = <x>;
$line = <x>;
$line = <x>;
$line = <x>;
@temp=split(" ",$line);
@temp2=split(",",$temp[1]);
print Out1 "Gene\tMutation\tType\tdbSNP\tStatus\tPolyphen\tSift";
for($j=0;$j<scalar(@id);$j++){
my @M=split('-', $id[$j]);
for($i=0;$i<scalar(@temp2);$i++)
{
my @N=split('-', $temp2[$i]);
if(scalar(@{$Mut_Count{$temp2[$i]}})>499 && $M[0] eq $N[0] && $M[1] eq $N[1] && $M[2] eq $N[2] && $status[$j] eq 'MSS')
{
print Out1 "\t",$temp2[$i];
$hyper++;
}
}
}
$line = <x>;
while($line = <x>)
{
$hy=0;
@temp=split("\t",$line);
$key=$temp[4]."_".$temp[5]."_".$temp[6]."_".$temp[10]."_".$temp[11]."_".$temp[12];
if(!exists $tested{$key})
{
push @{$tested{$key}}, "0";
print Out1 "\n",$temp[0],"\t",$key,"\t",$temp[8],"\t",$temp[13],"\t",$temp[25],"\t",$temp[72],"\t",$temp[73];
for($i=0;$i<scalar(@temp2);$i++)
{
$key=$temp[4]."_".$temp[5]."_".$temp[6]."_".$temp[10]."_".$temp[11]."_".$temp[12]."_".$temp2[$i];
my @L=split('-', $temp2[$i]);
for($j=0;$j<scalar(@id);$j++){
my @O=split('-', $id[$j]);
if(scalar(@{$Mut_Count{$temp2[$i]}})>499 && $L[0] eq $O[0] && $L[1] eq $O[1] && $L[2] eq $O[2] && $status[$j] eq 'MSS')
{
if(exists $hash{$key})
{
print Out1 "\t1";
$hy++;
}
else
{
print Out1 "\t0";
}
}
}
}
}
}
open(Out3,">Results/".$nm[1]."_Summary.txt");
print Out3 "Hypermutated\t$hyper\n";
}
}
$line = <$fh>; while($line = <$fh>)
These lines produce the message: readline() on closed filehandle $fh
%hash={}; %tested={}; %Mut_Count={};
On these three lines it says: Reference found where even-sized list expected
The .maf files are GDC downloads whose headers we modified slightly to carry unique TCGA IDs.
The clinical info file contains the TCGA IDs, their source, and the MSI status, which tells us whether a sample is MSI-L, MSI-H, or MSS.
I'm reading multiple .maf files and comparing them with the clinical info file; if the if condition is satisfied, I want the script to generate a master table (write a file).
I'm doing this on Windows. Kindly help me resolve this. Thanks in anticipation.
Answer to
%hash={}; %tested={}; %Mut_Count={}; on these three lines it says Reference found where even-sized list expected
You can get rid of the errors by getting rid of these three lines, though if you accept (please!) the use strict; recommendation you will need to replace them with my %hash; and so forth.
Under Perl, hashes (and arrays) do not need to be initialized. If you wish to initialize one, you assign it another array, or a list, or another hash. The length of the initializer must be even, since it is interpreted as key/value pairs. Your code supplied a hash reference, which will not be expanded by Perl to an empty hash. Hence the error.
If it makes you nervous not to initialize the hash, you can say my %hash = ();.
On the other hand, if you intend to use a scalar as a hash reference, you will need to initialize it. This is where you use the curly brackets, which (among other things) are a hash constructor which returns a reference to the constructed hash. So:
my $hash = {};
$hash->{$key} = $value;
...
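As for the other message, readline() on closed filehandle is what Perl reports when you read from a handle that is not open, and the most likely cause here is that open failed (for example, because clinical.txt is not in the current working directory). The sketch below checks the result of open and shows both kinds of hash initialisation. It reuses the file name and field positions from the question; the variable names are otherwise my own, and it uses warn rather than die only so the rest of the script keeps running.

```perl
use strict;
use warnings;

my $path = 'clinical.txt';    # file name taken from the question

my %hash     = ();            # a hash is initialised with a list, not {}
my $hash_ref = {};            # {} builds a reference, which belongs in a scalar

my (@id, @status);
if (open(my $fh, '<', $path)) {
    my $header = <$fh>;       # discard the header line
    while (my $line = <$fh>) {
        my @fields = split ' ', $line;
        push @id,     $fields[1];
        push @status, $fields[3];
    }
    close($fh);
}
else {
    # $! holds the operating system's reason for the failure.
    warn "Could not open '$path': $!\n";
}
```

If the message names a missing file, printing the current directory with Cwd's getcwd (already loaded in the question's script) should show where Perl is actually looking.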

Adding a key value pair in hash, by assigning an array in the value => Can't use an undefined value as an ARRAY reference

I'm trying to assign an array as the value of a key in a hash. After assigning it, I'm trying to dereference it and print the array values for that key to an output file, as you can see from the code below.
The code is not working well on the array manipulation part. Can someone tell me what I'm doing wrong?
use strict;
use warnings;
use Data::Dumper;
# File input
my $in_file = 'log.txt';
# Output file
my $out_file_name = 'output.csv';
open(my $fout, '>', $out_file_name);
# print header csv
print $fout "Col1\,Col2\,Col3\,Col4\,Col5\n";
# Read the input file
open(FH, '<', $in_file) or die "Could not open file '$in_file' $!";
my @log_file = <FH>;
# print Dumper(@log_file),"\n";
close (FH);
# my @test_val;
my ($read, $ref, $val_county, $val_rec, $val_tar, $val_print, @test_values, $status);
foreach(@log_file) {
# print $_;
if ($_ =~ /\t+(?<county_name>(?!Total).+)\s+/i) {
$ref->{code} = $+{county_name};
$val_county = $ref->{code};
} elsif ($_ =~ /^Total\s+records\s+in\s+TAR\s+\(pr.+\)\:\s+(?<tar_records>.+)$/i) {
$ref->{code} = $val_county;
push(@test_values, $+{tar_records});
$ref->{tar_rec} = \@test_values;
# $val_rec = $ref->{tar_rec};
# $val_rec =~ s/\.//g;
}
&print_file($ref);
}
sub print_file {
my $ref = shift;
my $status = shift;
print $fout join(",", $ref->{code}, [@{$ref->{tar_rec}}]), "\n"; # Line 68
print Dumper($ref);
}
close $fout;
print "Done!","\n";
The code produces an error like:
"Can't use an undefined value as an ARRAY reference at test_array_val_hash.pl line 68."
Until the second regex in your for loop block is matched, the $ref->{tar_rec} key will not be assigned a value and will be undefined. The following snippet, based on your own code, highlights the issue.
#!/usr/bin/perl -w
my @tar_records = (15,35,20);
my $ref = {
    code => 'Cork',
    tar_rec => \@tar_records,
};
sub print_info {
    my $ref = shift;
    print join(", ", $ref->{code}, (@{$ref->{tar_rec}})), $/;
}
print_info($ref);
# Once we 'undefine' the relevant key, we witness the
# aforementioned error.
undef $ref->{tar_rec};
print_info($ref);
To avoid this error, you could assign an anonymous array reference to $ref->{tar_rec} key before the for loop (since $ref->{tar_rec} is a cumulative value).
# Be sure not to declare $ref twice!
my ($read, $val_county, $val_rec, $val_tar, $val_print, @test_values, $status);
my $ref = {
    code => '',
    tar_rec => [],
};
P.S. Notice also that I used round brackets rather than square brackets in the join() function (although you actually don't need either).
The problem is that you're calling print_file in the wrong place.
Imagine that you're parsing the file a line at a time. Your code parses the first line and that populates $ref->{code}. But then you call print_file on a partially populated $ref so it doesn't work.
Your code is also not resetting any of the variables used, so as it progresses through the file, the contents of $ref are going to grow.
The code below fixes the first problem by implicitly setting an empty array in $ref->{tar_rec} and only printing out the record when it's starting a new one or when it's finished reading in the file. Since $ref->{tar_rec} is an array it solves the other problem by allowing you to directly push into it rather than relying upon @test_values. Just for added safety it assigns an empty hash to $ref.
if (open(my $fh, '<', $in_file)) {
    my $ref;
    my $val_county;
    foreach (<$fh>) {
        # print $_;
        if ($_ =~ /\t+(?<county_name>(?!Total).+)\s+/i) {
            if (defined($val_county)) {
                print_file($ref);
            }
            $ref = {};
            $val_county = $+{county_name};
            $ref->{code} = $val_county;
            $ref->{tar_rec} = [];
        } elsif ($_ =~ /^Total\s+records\s+in\s+TAR\s+\(pr.+\)\:\s+(?<tar_records>.+)$/i) {
            push @{$ref->{tar_rec}}, $+{tar_records};
        }
    }
    if (defined($ref)) {
        print_file($ref);
    }
    close($fh);
} else {
    die "Could not open file '$in_file' $!";
}
You're also printing out the array incorrectly:
print $fout join(",", $ref->{code}, [@{$ref->{tar_rec}}]), "\n";
You don't need any brackets around @{$ref->{tar_rec}}; it'll be treated as a list of values to pass to join as is.
print $fout join(",", $ref->{code}, @{$ref->{tar_rec}}), "\n";
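If you would rather have the printing code tolerate a record whose tar_rec has not been set yet, one option is to fall back to an empty array reference at the point of dereferencing. This is only a sketch: format_record is a hypothetical helper that returns the CSV line instead of writing to $fout, and the // operator needs Perl 5.10 or later (which the named captures in the question already require).

```perl
use strict;
use warnings;

# Hypothetical helper: builds the CSV line rather than printing it.
sub format_record {
    my ($ref) = @_;

    # Fall back to an empty array ref when tar_rec is missing or undef,
    # so the dereference cannot fail.
    my @records = @{ $ref->{tar_rec} // [] };

    return join(',', $ref->{code}, @records);
}

print format_record({ code => 'Cork', tar_rec => [15, 35, 20] }), "\n";    # Cork,15,35,20
print format_record({ code => 'Cork' }), "\n";                             # Cork
```

This hides the symptom rather than fixing where print_file is called, so it works best alongside the restructured loop, not as a replacement for it.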

Indexing the values of string elements from array perl

I have an array which contains some DNA sequences as strings stored in its elements.
Ex: print $array[0]; gives an output like this: ACTAG (#the first position in each sequence).
I have written this code that allows me to analyse the first position for each sequence.
#!/usr/bin/perl
$infile = @ARGV[0];
$ws= $ARGV[1];
$wsnumber= $ARGV[2];
open INFILE, $infile or die "Can't open $infile: $!"; # This opens file, but if file isn't there it mentions this will not open
my $sequence = (); # This sequence variable stores the sequences from the .fasta file
my $line; # This reads the input file one-line-at-a-time
while ($line = <INFILE>) {
chomp $line;
if ($ws ne "--ws=") {
print "no flag or invalid flag\n";
last;
}
else {
if($line =~ /^\s*$/) { # This finds lines with whitespaces from the beginning to the ending of the sequence. Removes blank line.
next;
} elsif($line =~ /^\s*#/) { # This finds lines with spaces before the hash character. Removes .fasta comment
next;
} elsif($line =~ /^>/) { # This finds lines with the '>' symbol at beginning of label. Removes .fasta label
next;
} else {
$sequence = $line;
$sequence =~ s/\s//g; # Whitespace characters are removed
@array = split //,$sequence;
$seqlength = length($sequence);}
}
$count=0;
foreach ($array[0]){
if( $array[0] !~ m/A|T|C|G/ ){
next;
}
else {
$count += 1;
$suma += $count;
}
}
}
But I don't know how to modify $array[0] to run this code for each position (I have only managed to do it for a specific position, in the above example the first position).
Can someone help me?
Thank you!
Not sure I understand well, but is that what you want?
Assuming @array contains one sequence per element:
foreach my $seq (@array) {
    my @chars = split '', $seq;
    foreach my $char (@chars) {
        if ($char =~ /[ATCG]/) {
            # do stuff
        }
    }
}
You're only looking at one element in your array:
foreach ($array[0]){ ... }
To iterate over the whole array use:
my @array = qw(ATTTCFCGGCTTA);
foreach (@array){
    my @split = split('');
    foreach (@split){
        die "$_ is not a valid character\n" unless /[ATGC]/;
        print "$_\n";
    }
}
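If what you are after is a count per position across all sequences, a sketch along these lines may help. It assumes @array holds one whole sequence per element (as in the first answer); the sample sequences and the name @valid_at are made up for illustration.

```perl
use strict;
use warnings;

my @array = qw(ACTAG ANTAG ACT);    # made-up sequences for illustration
my @valid_at;                        # $valid_at[$pos] = count of A/C/G/T seen at $pos

for my $seq (@array) {
    my @chars = split //, $seq;
    for my $pos (0 .. $#chars) {
        # Count only the four valid bases; anything else (e.g. N) is skipped.
        $valid_at[$pos]++ if $chars[$pos] =~ /[ACGT]/;
    }
}

for my $pos (0 .. $#valid_at) {
    printf "position %d: %d\n", $pos, $valid_at[$pos] // 0;
}
```

Sequences of different lengths are handled naturally, since each one only contributes to the positions it actually has.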

What is the value if you shift beyond the last element of an array?

In this piece of code, shift is used twice, even though the method only takes one parameter:
sub regexVerify ($)
{
my $re = shift;
return sub
{
local $_ = shift;
m/$re/ ? $_ : undef;
};
}
What does this make the value of local $_, once shift is used again? I was (perhaps naively) assuming that shifting into nothingness would result in undef. But if that were true, this line has no meaning, right?:
m/$re/ ? $_ : undef;
The above sub is called like:
regexVerify (qr/^([a-z].*)?$/i);
The second shift is inside the inner sub declaration. That scope will have an entirely new @_ to work with, which won't have anything to do with the @_ passed to the outer subroutine.
regexVerify is a subroutine that returns another subroutine. Presumably you would later invoke that subroutine with an argument:
my $func = regexVerify(qr/^([a-z].*)?$/i);
# $func is now a "code reference" or "anonymous subroutine"
...
if ($func->($foo)) { # invoke the subroutine stored in $func with arg ($foo)
print "$foo is verified.\n";
} else {
print "$foo is not verified!\n";
}
local $_ = shift; doesn't get executed until you call the anonymous function, i.e.
my $anon_func = regexVerify (qr/^([a-z].*)?$/i);
# NOW sending arguments in @_ for local $_ = shift;
print $anon_func->("some string");
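Here is a small runnable sketch that puts both points together: each call gets its own @_, and shift on an empty array really does return undef. The sub is copied from the question minus the ($) prototype, which is not needed for the demonstration; the names $check, @empty and $nothing are mine.

```perl
use strict;
use warnings;

sub regexVerify {
    my $re = shift;            # shifts from the outer call's @_
    return sub {
        local $_ = shift;      # shifts from the inner call's own @_
        return m/$re/ ? $_ : undef;
    };
}

my $check = regexVerify(qr/^([a-z].*)?$/i);

print defined $check->('abc') ? "match\n" : "no match\n";    # match
print defined $check->('123') ? "match\n" : "no match\n";    # no match

# Shifting from an empty array yields undef, as the question assumed.
my @empty;
my $nothing = shift @empty;
print defined $nothing ? "defined\n" : "undef\n";            # undef
```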

I can't access hash values

I have a program that creates an array of hashes while parsing a FASTA file. Here is my code
use strict;
use warnings;
my $docName = "A_gen.txt";
my $alleleCount = 0;
my $flag = 1;
my $tempSequence;
my @tempHeader;
my @arrayOfHashes = ();
my $fastaDoc = open(my $FH, '<', $docName);
my @fileArray = <$FH>;
for (my $i = 0; $i <= $#fileArray; $i++) {
if ($fileArray[$i] =~ m/>/) { # creates a header for the hashes
$flag = 0;
$fileArray[$i] =~ s/>//;
$alleleCount++;
@tempHeader = split / /, $fileArray[$i];
pop(@tempHeader); # removes the pointless bp
for (my $j = 0; $j <= scalar(@tempHeader)-1; $j++) {
print $tempHeader[$j];
if ($j < scalar(@tempHeader)-1) {
print " : "};
if ($j == scalar(@tempHeader) - 1) {
print "\n";
};
}
}
# push(@arrayOfHashes, "$i");
if ($fileArray[$i++] =~ m/>/) { # goes to next line
push(@arrayOfHashes, {
id => $tempHeader[0],
hla => $tempHeader[1],
bpCount => $tempHeader[2],
sequence => $tempSequence
});
print $arrayOfHashes[0]{id};
@tempHeader = ();
$tempSequence = "";
}
$i--; # puts i back to the current line
if ($flag == 1) {
$tempSequence = $tempSequence.$fileArray[$i];
}
}
print $arrayOfHashes[0]{id};
print "\n";
print $alleleCount."\n";
print $#fileArray +1;
My problem is when the line
print $arrayOfHashes[0]{id};
is called, I get an error that says
Use of uninitialized value in print at fasta_tie.pl line 47, line 6670.
You will see in the above code I commented out a line that says
push(@arrayOfHashes, "$i");
because I wanted to make sure that the hash works. Also the data prints correctly in the
desired formatting. Which looks like this
HLA:HLA00127 : A*74:01 : 2918
Try adding
print "Array length:" . scalar(@arrayOfHashes) . "\n";
before
print $arrayOfHashes[0]{id};
That way you can see whether the array actually contains anything. You can also use the module Data::Dumper to inspect the contents.
use Data::Dumper;
print Dumper(\@arrayOfHashes);
Note the '\' before the array!
Output would be something like:
$VAR1 = [
{
'sequence' => 'tempSequence',
'hla' => 'hla',
'bpCount' => 'bpCount',
'id' => 'id'
}
];
But if there's a module for FASTA, try to use it. You don't have to reinvent the wheel each time ;)
First you do this:
$fileArray[$i] =~ s/>//;
Then later you try to match like this:
$fileArray[$i++] =~ m/>/
You step through the file array, removing the first "greater than" sign in the line on each line. And then you want to match the current line by that same character. That would be okay if you only want to push the line if it has a second "greater than", but you will never push anything into the array if you only expect 1, or there turns out to be only one.
Your comment "puts i back to the current line" shows what you were trying to do, but if you only use it once, why not use the expression $i + 1?
Also, because you're incrementing it post-fix and not using it for anything, your increment has no effect. If $i==0 before, then $fileArray[$i++] still accesses $fileArray[0], only $i==1 after the expression has been evaluated--and to no effect--until being later decremented.
If you want to peek ahead, then it is better to use the pre-fix increment:
if ($fileArray[++$i] =~ m/>/) ...
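A quick way to convince yourself of the difference is to run the three forms side by side. The array contents below are made up; only the indexing behaviour matters.

```perl
use strict;
use warnings;

my @fileArray = ('>first', 'ACGT', '>second');    # made-up lines

my $i = 0;
my $post = $fileArray[$i++];      # reads index 0, then $i becomes 1
print "$post (i is now $i)\n";    # >first (i is now 1)

$i = 0;
my $pre = $fileArray[++$i];       # $i becomes 1 first, then reads index 1
print "$pre (i is now $i)\n";     # ACGT (i is now 1)

$i = 0;
my $peek = $fileArray[$i + 1];    # reads index 1 and leaves $i alone
print "$peek (i is now $i)\n";    # ACGT (i is now 0)
```

The last form is the one that needs no compensating $i-- afterwards, which is why it is usually the least error-prone way to look ahead.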

Resources