I have a program that creates an array of hashes while parsing a FASTA file. Here is my code:
use strict;
use warnings;
my $docName = "A_gen.txt";
my $alleleCount = 0;
my $flag = 1;
my $tempSequence;
my @tempHeader;
my @arrayOfHashes = ();
my $fastaDoc = open(my $FH, '<', $docName);
my @fileArray = <$FH>;
for (my $i = 0; $i <= $#fileArray; $i++) {
if ($fileArray[$i] =~ m/>/) { # creates a header for the hashes
$flag = 0;
$fileArray[$i] =~ s/>//;
$alleleCount++;
@tempHeader = split / /, $fileArray[$i];
pop(@tempHeader); # removes the pointless bp
for (my $j = 0; $j <= scalar(@tempHeader)-1; $j++) {
print $tempHeader[$j];
if ($j < scalar(@tempHeader)-1) {
print " : "};
if ($j == scalar(@tempHeader) - 1) {
print "\n";
};
}
}
# push(@arrayOfHashes, "$i");
if ($fileArray[$i++] =~ m/>/) { # goes to next line
push(@arrayOfHashes, {
id => $tempHeader[0],
hla => $tempHeader[1],
bpCount => $tempHeader[2],
sequence => $tempSequence
});
print $arrayOfHashes[0]{id};
@tempHeader = ();
$tempSequence = "";
}
$i--; # puts i back to the current line
if ($flag == 1) {
$tempSequence = $tempSequence.$fileArray[$i];
}
}
print $arrayOfHashes[0]{id};
print "\n";
print $alleleCount."\n";
print $#fileArray +1;
My problem is that when the line
print $arrayOfHashes[0]{id};
is called, I get an error that says
Use of uninitialized value in print at fasta_tie.pl line 47, line 6670.
You will see in the above code I commented out a line that says
push(@arrayOfHashes, "$i");
because I wanted to make sure that the hash works. Also, the data prints correctly in the desired format, which looks like this:
HLA:HLA00127 : A*74:01 : 2918
Try adding
print "Array length: " . scalar(@arrayOfHashes) . "\n";
before
print $arrayOfHashes[0]{id};
That way you can see whether your array has any content at all. You can also use the Data::Dumper module to inspect its contents:
use Data::Dumper;
print Dumper(\@arrayOfHashes);
Note the '\' before the array!
Output would be something like:
$VAR1 = [
{
'sequence' => 'tempSequence',
'hla' => 'hla',
'bpCount' => 'bpCount',
'id' => 'id'
}
];
But if there is a module for FASTA, try using it. You don't have to reinvent the wheel each time ;)
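If you do want to roll your own, here is a minimal sketch that restructures the question's loop so that a record is pushed whenever the next header (or the end of the input) is reached, instead of peeking ahead with $i++ and $i--. It assumes headers shaped like the question's output (ID, allele, length and a trailing "bp" field); parse_fasta is a hypothetical helper name and the sample lines are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: turn FASTA lines into an array of hashes.
# Assumes headers look like ">HLA:HLA00127 A*74:01 2918 bp", as in the question.
sub parse_fasta {
    my @lines = @_;    # copy, so chomp does not touch the caller's data
    my @records;
    my $current;       # hash ref for the record currently being built
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        if ($line =~ s/^>//) {
            # A new header: save the previous record before starting another
            push @records, $current if $current;
            my @fields = split ' ', $line;
            pop @fields if @fields && $fields[-1] eq 'bp';    # drop the trailing "bp"
            $current = {
                id       => $fields[0],
                hla      => $fields[1],
                bpCount  => $fields[2],
                sequence => '',
            };
        }
        elsif ($current) {
            $current->{sequence} .= $line;    # sequence lines belong to the last header
        }
    }
    push @records, $current if $current;    # the final record has no header after it
    return @records;
}

my @arrayOfHashes = parse_fasta(
    ">HLA:HLA00127 A*74:01 2918 bp\n",
    "ACGT\n",
    "TTGA\n",
    ">HLA:HLA00128 A*74:02 2918 bp\n",
    "GGCC\n",
);
print scalar(@arrayOfHashes), " records\n";
print "$arrayOfHashes[0]{id}: $arrayOfHashes[0]{sequence}\n";
```

In the question's script you would call it as my @arrayOfHashes = parse_fasta(<$FH>); after opening the file.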
First you do this:
$fileArray[$i] =~ s/>//;
Then later you try to match like this:
$fileArray[$i++] =~ m/>/
You step through the file array, and on each header line you remove the first "greater than" sign. Then you try to match that same line against the same character. That only succeeds when the line contains a second "greater than" sign, so if each header has just one, nothing is ever pushed into the array.
Your comment "puts i back to the current line" shows what you were trying to do, but if you only need the next index once, why not just use the expression $i + 1?
Also, because you're using a postfix increment, the increment has no effect on the lookup. If $i == 0 before, then $fileArray[$i++] still accesses $fileArray[0]; $i only becomes 1 after the expression has been evaluated, and to no effect, until it is decremented again.
If you want to peek ahead, it is better to use the prefix increment:
if ($fileArray[++$i] =~ m/>/) ...
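To see the difference concretely, here is a small self-contained illustration of postfix versus prefix increment used as an array index (the array contents are made up):

```perl
use strict;
use warnings;

my @lines = ('first', 'second');    # made-up data

my $i = 0;
my $post = $lines[$i++];    # postfix: indexes with the OLD value (0), then increments
my $j = 0;
my $pre  = $lines[++$j];    # prefix: increments first, then indexes with the NEW value (1)

print "postfix read '$post', \$i is now $i\n";
print "prefix read '$pre', \$j is now $j\n";
```

Both counters end up at 1, but only the prefix form actually reads the next element.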
I'm trying to store an array as the value of a key in a hash. After assigning it, I'm trying to dereference it and print the array values for that key to an output file, as you can see in the code below.
The code is not working well on the array manipulation part. Can someone tell me what I'm doing wrong?
use strict;
use warnings;
use Data::Dumper;
# File input
my $in_file = 'log.txt';
# Output file
my $out_file_name = 'output.csv';
open(my $fout, '>', $out_file_name);
# print header csv
print $fout "Col1\,Col2\,Col3\,Col4\,Col5\n";
# Read the input file
open(FH, '<', $in_file) or die "Could not open file '$in_file' $!";
my @log_file = <FH>;
# print Dumper(@log_file),"\n";
close (FH);
# my @test_val;
my ($read, $ref, $val_county, $val_rec, $val_tar, $val_print, @test_values, $status);
foreach(@log_file) {
# print $_;
if ($_ =~ /\t+(?<county_name>(?!Total).+)\s+/i) {
$ref->{code} = $+{county_name};
$val_county = $ref->{code};
} elsif ($_ =~ /^Total\s+records\s+in\s+TAR\s+\(pr.+\)\:\s+(?<tar_records>.+)$/i) {
$ref->{code} = $val_county;
push(@test_values, $+{tar_records});
$ref->{tar_rec} = \@test_values;
# $val_rec = $ref->{tar_rec};
# $val_rec =~ s/\.//g;
}
&print_file($ref);
}
sub print_file {
my $ref = shift;
my $status = shift;
print $fout join(",", $ref->{code}, [@{$ref->{tar_rec}}]), "\n"; # Line 68
print Dumper($ref);
}
close $fout;
print "Done!","\n";
The code produces an error like:
"Can't use an undefined value as an ARRAY reference at test_array_val_hash.pl line 68."
Until the second regex in your for loop block is matched, the $ref->{tar_rec} key will not be assigned a value and will be undefined. The following snippet, based on your own code, highlights the issue.
#!/usr/bin/perl -w
use strict; # strict refs is what turns dereferencing undef into a fatal error
my @tar_records = (15,35,20);
my $ref = {
code => 'Cork',
tar_rec => \@tar_records,
};
sub print_info {
my $ref = shift;
print join(", ", $ref->{code}, (@{$ref->{tar_rec}})), $/;
}
print_info($ref);
# Once we 'undefine' the relevant key, we witness the
# aforementioned error.
undef $ref->{tar_rec};
print_info($ref);
To avoid this error, you could assign an anonymous array reference to the $ref->{tar_rec} key before the for loop (since $ref->{tar_rec} holds a cumulative value).
# Be sure not to declare $ref twice!
my ($read, $val_county, $val_rec, $val_tar, $val_print, @test_values, $status);
my $ref = {
code => '',
tar_rec => [],
};
P.S. Notice also that I used round brackets rather than square brackets in the join() function (although you actually don't need either).
The problem is that you're calling print_file in the wrong place.
Imagine that you're parsing the file a line at a time. Your code parses the first line and that populates $ref->{code}. But then you call print_file on a partially populated $ref so it doesn't work.
Your code is also not resetting any of the variables used, so as it progresses through the file, the contents of $ref are going to grow.
The code below fixes the first problem by explicitly setting $ref->{tar_rec} to an empty array reference and only printing out the record when it starts a new one or when it has finished reading the file. Since $ref->{tar_rec} is an array reference, it also solves the other problem by letting you push into it directly rather than relying upon @test_values. For added safety, it assigns a fresh empty hash to $ref for each new record.
if(open(my $fh, '<', $in_file)) {
my $ref;
my $val_county;
foreach(<$fh>) {
# print $_;
if ($_ =~ /\t+(?<county_name>(?!Total).+)\s+/i) {
if(defined($val_county)) {
print_file($ref);
}
$ref={};
$val_county = $+{county_name};
$ref->{code} = $val_county;
$ref->{tar_rec} = [];
} elsif ($_ =~ /^Total\s+records\s+in\s+TAR\s+\(pr.+\)\:\s+(?<tar_records>.+)$/i) {
push @{$ref->{tar_rec}}, $+{tar_records};
}
}
if(defined($ref)) {
print_file($ref);
}
close($fh);
} else {
die "Could not open file '$in_file' $!";
}
You're also printing out the array incorrectly:
print $fout join(",", $ref->{code}, [@{$ref->{tar_rec}}]), "\n";
The square brackets build a new anonymous array reference, so join stringifies it as something like ARRAY(0x...). You don't need any brackets around @{$ref->{tar_rec}}; it will be passed to join as a list of values as is.
print $fout join(",", $ref->{code}, @{$ref->{tar_rec}}), "\n";
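Putting these points together, here is a short sketch with made-up records: square brackets produce a stringified reference, plain dereferencing produces the values, and a defined-or default (an alternative to pre-initializing the key; it assumes Perl 5.10 or later for the // operator) avoids the "Can't use an undefined value" error for a record that has no tar_rec yet.

```perl
use strict;
use warnings;

# Made-up record shaped like the question's $ref
my $ref = { code => 'Cork', tar_rec => [15, 35, 20] };

# Square brackets build a NEW array reference, which join stringifies
my $wrong = join(",", $ref->{code}, [ @{ $ref->{tar_rec} } ]);

# Dereferencing alone flattens the array into join's argument list
my $right = join(",", $ref->{code}, @{ $ref->{tar_rec} });

# Defensive variant: fall back to an empty array ref if tar_rec is undefined
my $partial = { code => 'Kerry' };
my $safe = join(",", $partial->{code}, @{ $partial->{tar_rec} // [] });

print "$wrong\n$right\n$safe\n";
```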
I am trying to find two character strings in a text file and print them and their frequencies out.
#!/usr/bin/perl
#digram finder
use strict; use warnings;
#finds digrams in a file and prints them and their frequencies out
die "Must input file\n" if (@ARGV != 1);
my ($file) = @ARGV;
my %wordcount;
open (my $in, "<$file") or die "Can't open $file\n";
while (my $words = <$in>){
chomp $words;
my $length = length($words);
for (my $i = 0; $i<$length; $i++){
my $duo = substr($words, $i; 2);
if (not exists $wordcount{$duo}){
$wordcount{$duo} = 1;
}
else {
$wordcount{$duo}++;
}
}
}
foreach my $word (sort {$wordcount{$b} cmp $wordcount{$a}} keys %wordcount){
print "$word\t$wordcount{$duo}\n";
}
close($in);
First I set the text file to a string $words.
Then, I run a for loop and create a substring $duo at each position along $words
If $duo doesn't exist within the hash %wordcount, then the program creates the key $duo
If $duo does exist, then the count for that key goes up by 1
Then the program prints out the digrams and their frequencies, in order of decreasing frequency
When I try to run the code, I get the error message that I forgot to declare $word on line 17 but I do not even have the string $word. I am not sure where this error message is coming from. Can someone help me find where the error is coming from?
Thank you
My best guess is that you actually have $word instead of $words; a typo. If the compilation found the symbol $word in the text then it's probably there.
However, I'd also like to comment on the code. Here is a cleaned-up version:
while (my $words = <$in>) {
chomp $words;
my $last_duo_idx = length($words) - 2;
for my $i (0 .. $last_duo_idx) {
my $duo = substr($words, $i, 2);
++$wordcount{$duo};
}
}
my @skeys = sort { $wordcount{$b} <=> $wordcount{$a} } keys %wordcount;
foreach my $word (@skeys) {
print "$word\t$wordcount{$word}\n";
}
This runs correctly on a made-up file. (I sort separately only so as not to run off the page.)
Comments
The loop must stop at the second-to-last character of the line, and substr counts from 0; hence length - 2
One almost never needs a C-style loop
There is no need here to test for existence of a key. If it doesn't exist it is autovivified (created), then incremented to 1 with ++; otherwise the count is incremented.
To sort numerically use <=>, not cmp
Typos:
substr($words, $i; 2) needs a , not ;, so substr($words, $i, 2)
$wordcount{$duo} in print should be $wordcount{$word}.
I am not sure about naming: why is a line of text called $words?
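The points above can be packaged into a small, testable subroutine. This is a sketch: count_digrams is a hypothetical name, and the tie-break on the key (so that equal counts print in a stable order) is an addition not in the answer above.

```perl
use strict;
use warnings;

# Count overlapping two-character substrings (digrams) in a list of lines
sub count_digrams {
    my %count;
    for my $line (@_) {
        my $text = $line;    # copy, so chomp never modifies the caller's strings
        chomp $text;
        # Last valid start index is length - 2; the range is empty for lines shorter than 2
        for my $i (0 .. length($text) - 2) {
            ++$count{ substr($text, $i, 2) };    # missing keys are created; undef counts as 0
        }
    }
    return %count;
}

my %wordcount = count_digrams("hello\n", "hell\n");

# Descending numeric order (<=>), ties broken alphabetically (cmp)
for my $duo (sort { $wordcount{$b} <=> $wordcount{$a} or $a cmp $b } keys %wordcount) {
    print "$duo\t$wordcount{$duo}\n";
}
```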
I am getting an error saying "prototype not terminated" at filename.txt line 113, whereas line 113 belongs to a different program that runs successfully.
sub howmany(
my @H = @_;
my $m = 0;
foreach $x (@H) {
if ( $x > 5 ) {
$m +=1;
}
else {
$m +=0;
}
}
print "Number of elements greater than 5 is equal to: $m \n";
}
howmany(1,6,9);
The sub name should be followed by { }, not ( ), when you define a simple function. Perl reads the ( as the start of a prototype and never finds a matching ), which is why you get the error
prototype not terminated
After that, always start your script with: use strict; use warnings;
Add them and debug your script; there are more errors (for example, $x is never declared).
Last but not least, indent your code properly and use an editor with syntax highlighting; it will save you a lot of debugging time.
The error is due to the parenthesis after the sub name.
Also, don't write an else branch that only does $m += 0, since it makes the processor do work for nothing. Of course that won't be noticeable in such a small function, but...
sub howmany {
my $m = 0;
foreach (@_) {
$m++ if ($_ > 5);
}
print "Number of elements greater than 5 is equal to: $m \n";
}
howmany(1,6,9);
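If the caller needs the number rather than a printed message, a variant could return the count instead. This is a sketch with a hypothetical signature (the threshold is passed as the first argument); it relies on grep returning the number of matching elements in scalar context.

```perl
use strict;
use warnings;

# Hypothetical variant: take the threshold as the first argument and RETURN
# the count, so the caller decides what to print.
sub howmany {
    my ($limit, @values) = @_;
    # grep in scalar context returns the number of elements that pass the test
    return scalar grep { $_ > $limit } @values;
}

my $m = howmany(5, 1, 6, 9);
print "Number of elements greater than 5 is equal to: $m\n";
```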
I have an array which contains some DNA sequences as strings stored in its elements.
For example, print $array[0]; gives output like this: ACTAG (the first position of each sequence).
I have written this code that allows me to analyse the first position for each sequence.
#!/usr/bin/perl
$infile = $ARGV[0];
$ws= $ARGV[1];
$wsnumber= $ARGV[2];
open INFILE, $infile or die "Can't open $infile: $!"; # This opens file, but if file isn't there it mentions this will not open
my $sequence = (); # This sequence variable stores the sequences from the .fasta file
my $line; # This reads the input file one-line-at-a-time
while ($line = <INFILE>) {
chomp $line;
if ($ws ne "--ws=") {
print "no flag or invalid flag\n";
last;
}
else {
if($line =~ /^\s*$/) { # This finds lines with whitespaces from the beginning to the ending of the sequence. Removes blank line.
next;
} elsif($line =~ /^\s*#/) { # This finds lines with spaces before the hash character. Removes .fasta comment
next;
} elsif($line =~ /^>/) { # This finds lines with the '>' symbol at beginning of label. Removes .fasta label
next;
} else {
$sequence = $line;
$sequence =~ s/\s//g; # Whitespace characters are removed
@array = split //,$sequence;
$seqlength = length($sequence);}
}
$count=0;
foreach ($array[0]){
if( $array[0] !~ m/A|T|C|G/ ){
next;
}
else {
$count += 1;
$suma += $count;
}
}
}
But I don't know how to modify $array[0] so that this code runs for every position. I only manage to do it for one specific position (in the example above, the first position).
Can someone help me?
Thank you!
Not sure I understand well, but is that what you want?
Assuming @array contains one sequence per element:
foreach my $seq (@array) {
my @chars = split '', $seq;
foreach my $char (@chars) {
if ($char =~ /[ATCG]/) {
# do stuff
}
}
}
You're only looking at one element in your array:
foreach ($array[0]){ ... }
To iterate over the whole array use:
my @array = qw(ATTTCFCGGCTTA);
foreach (@array){
my @split = split('');
foreach (@split){
die "$_ is not a valid character\n" unless /[ATGC]/;
print "$_\n";
}
}
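If the goal is to analyse every position across all sequences (column by column) rather than only $array[0], one way is to index by position while walking each sequence. This is a sketch under that assumption about the question's intent; base_counts_by_position is a hypothetical helper name, and the sequences are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: count A/C/G/T at every position across several sequences.
# Returns an array ref with one hash of counts per position (index 0 = position 1).
sub base_counts_by_position {
    my @counts;
    for my $seq (@_) {
        my @bases = split //, uc $seq;
        for my $pos (0 .. $#bases) {
            next unless $bases[$pos] =~ /^[ACGT]$/;    # skip anything that is not a base
            $counts[$pos]{ $bases[$pos] }++;
        }
    }
    return \@counts;
}

# Made-up sequences; 'N' is deliberately not counted
my $counts = base_counts_by_position('ACTG', 'CCTA', 'TCGN');

for my $pos (0 .. $#$counts) {
    my $c = $counts->[$pos] || {};    # a position with no valid bases has no hash
    print "position ", $pos + 1, ": ",
        join(' ', map { my $n = $c->{$_} // 0; "$_=$n" } qw(A C G T)), "\n";
}
```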
The purpose of the script is to process all words in a file and output ALL of the words that occur most often. So if there are 3 words that each occur 10 times, the program should output all three.
The script now runs, thanks to some tips I have gotten here. However, it does not handle large text files (e.g. the New Testament). I'm not sure whether that is my fault or just a limitation of the code. I am sure there are several other problems with the program, so any help would be greatly appreciated.
#!/usr/bin/perl -w
require 5.10.0;
print "Your file: " . $ARGV[0] . "\n";
#Make sure there is only one argument
if ($#ARGV == 0){
#Make sure the argument is actually a file
if (-f $ARGV[0]){
%wordHash = (); #New hash to match words with word counts
$file=$ARGV[0]; #Stores value of argument
open(FILE, $file) or die "File not opened correctly.";
#Process through each line of the file
while (<FILE>){
chomp;
#Delimits on any non-alphanumeric
@words=split(/[^a-zA-Z0-9]/,$_);
$wordSize = @words;
#Put all words to lowercase, removes case sensitivty
for($x=0; $x<$wordSize; $x++){
$words[$x]=lc($words[$x]);
}
#Puts each occurence of word into hash
foreach $word(@words){
$wordHash{$word}++;
}
}
close FILE;
#$wordHash{$b} <=> $wordHash{$a};
$wordList="";
$max=0;
while (($key, $value) = each(%wordHash)){
if($value>$max){
$max=$value;
}
}
while (($key, $value) = each(%wordHash)){
if($value==$max && $key ne "s"){
$wordList.=" " . $key;
}
}
#Print solution
print "The following words occur the most (" . $max . " times): " . $wordList . "\n";
}
else {
print "Error. Your argument is not a file.\n";
}
}
else {
print "Error. Use exactly one argument.\n";
}
Your problem lies in the two missing lines at the top of your script:
use strict;
use warnings;
If they had been there, they would have reported lots of lines like this:
Argument "make" isn't numeric in array element at ...
Which comes from this line:
$list[$_] = $wordHash{$_} for keys %wordHash;
Array indices can only be numbers, and since your keys are words, that won't work. What happens here is that each string is coerced into a number, and any string that does not begin with a number becomes 0.
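A small demonstration of that coercion with made-up keys. The numeric warning is silenced inside the block only to keep the output short; without the no warnings line, use warnings reports "Argument "make" isn't numeric in array element" as described above.

```perl
use strict;
use warnings;

my @list;
{
    no warnings 'numeric';    # silence "isn't numeric" just for this demo
    $list["make"] = 'a';      # "make" numifies to 0, so this writes $list[0]
    $list["3rd"]  = 'b';      # leading digits are used, so this writes $list[3]
}
my %wordHash = (make => 'a'); # a hash keeps the string key as-is

print scalar(@list), " array elements; \$list[0] = $list[0]\n";
print "hash key: ", join(',', keys %wordHash), "\n";
```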
Your code works fine reading the data in, although I would write it differently. It is only after that that your code becomes unwieldy.
As near as I can tell, you are trying to print out the most frequently occurring words, in which case you should consider the following code:
use strict;
use warnings;
my %wordHash;
#Make sure there is only one argument
die "Only one argument allowed." unless @ARGV == 1;
while (<>) { # Use the diamond operator to implicitly open ARGV files
chomp;
my @words = grep length, # disallow empty strings (grep $_ would also drop "0")
map lc, # make everything lower case
split /[^a-zA-Z0-9]/; # your original split
foreach my $word (@words) {
$wordHash{$word}++;
}
}
for my $word (sort { $wordHash{$b} <=> $wordHash{$a} } keys %wordHash) {
printf "%-6s %s\n", $wordHash{$word}, $word;
}
As you'll note, you can sort based on hash values.
Here is an entirely different way of writing it (I could have also said "Perl is not C"):
#!/usr/bin/env perl
use 5.010;
use strict; use warnings;
use autodie;
use List::Util qw(max);
my ($input_file) = @ARGV;
die "Need an input file\n" unless defined $input_file;
say "Input file = '$input_file'";
open my $input, '<', $input_file;
my %words;
while (my $line = <$input>) {
chomp $line;
my @tokens = map lc, grep length, split /[^A-Za-z0-9]+/, $line;
$words{ $_ } += 1 for @tokens;
}
close $input;
my $max = max values %words;
my @argmax = sort grep { $words{$_} == $max } keys %words;
for my $word (#argmax) {
printf "%s: %d\n", $word, $max;
}
Why not just get the keys from the hash sorted by their values and extract the first X?
This should provide an example: http://www.devdaily.com/perl/edu/qanda/plqa00016
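The "first X" approach is simple, but with ties it can cut a group of equally frequent words in half, while the question asks for all of them. A sketch with made-up counts that contrasts the two (X = 2 is an arbitrary choice):

```perl
use strict;
use warnings;

# Made-up counts, shaped like the question's %wordHash
my %wordHash = (the => 10, and => 10, lord => 10, of => 7);

# Keys sorted by descending count; ties broken alphabetically so the order is stable
my @sorted = sort { $wordHash{$b} <=> $wordHash{$a} or $a cmp $b } keys %wordHash;

# "Take the first X": cheap, but it can split a tie
my $x = 2;
my @top = @sorted[0 .. ($x < @sorted ? $x : scalar @sorted) - 1];

# Keeping ALL words tied for first place: compare against the top count
my $max  = @sorted ? $wordHash{ $sorted[0] } : 0;
my @most = grep { $wordHash{$_} == $max } @sorted;

print "first $x: @top\n";
print "most frequent ($max times): @most\n";
```

Here the first two keys are "and lord", which misses "the" even though it occurs just as often.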