loop through elements of array to find character perl - arrays

I have a perl array where I only want to loop through elements 2-8.
The elements are only meant to contain numbers, so if any of those elements contain a letter, I want to set an error flag = 1, as well as some other variables as seen.
The reason I have 2 error flag variables is due to scope rules within the loop.
fields is an array, I created by splitting another irrelevant array by the " " key.
So, when I try to print error_line2, error_fname2 from outside the loop, I get this:
Use of uninitialized value $error_flag2 in numeric eq (==)
I don't know why, because I've initialized the value within the loop and created the variable outside the loop.
Not sure if I'm even looping to find characters correctly, so then it's not setting the error_flag2 = 1.
Example line:
bob hankerman 2039 3232 23 232 645 64x3 324
since element 7 has the letter 'x' , I want the flag to be set to 1.
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
my $players_file = $ARGV[0];
my #players_array;
open (my $file, "<", "$players_file")
or die "Failed to open file: $!\n";
while(<$file>) {
chomp;
push #players_array, $_;
}
close $file;
#print join "\n", #players_array;
my $num_of_players = #players_array;
my $error_flag;
my $error_line;
my $error_fname;
my $error_lname;
my $error_flag2=1;
my $error_line2;
my $error_fname2;
my $error_lname2;
my $i;
foreach my $player(#players_array){
my #fields = split " ", $player;
my $size2 = #fields;
for($i=2; $i<9; $i++){
print "$fields[$i] \n";
if (grep $_ =~ /^[a-zA-Z]+$/){
my $errorflag2 = 1;
$error_flag2 = $errorflag2;
my $errorline2 = $player +1;
$error_line2 = $errorline2;
my $errorfname2 = $fields[0];
$error_fname2 = $errorfname2;
}
}
if ($size2 == "9" ) {
my $firstname = $fields[0];
my $lastname = $fields[1];
my $batting_average = ($fields[4]+$fields[5]+$fields[6]+$fields[7]) / $fields[3];
my $slugging = ($fields[4]+($fields[5]*2)+($fields[6]*3)+($fields[7]*4)) / $fields[3];
my $on_base_percent = ($fields[4]+$fields[5]+$fields[6]+$fields[7] +$fields[8]) / $fields[2];
print "$firstname ";
print "$lastname ";
print "$batting_average ";
print "$slugging ";
print "$on_base_percent\n ";
}
else {
my $errorflag = 1;
$error_flag = $errorflag;
my $errorline = $player +1;
$error_line = $errorline;
my $errorfname = $fields[0];
$error_fname = $errorfname;
my $errorlname = $fields[1];
$error_lname = $errorlname;
}
}
if ($error_flag == "1"){
print "\n Line $error_line : ";
print "$error_fname, ";
print "$error_lname :";
print "Line contains not enough data.\n";
}
if ($error_flag2 == "1"){
print "\n Line $error_line2 : ";
print "$error_fname2, ";
print "Line contains bad data.\n";
}

OK, so the problem you've got here is that you're thinking of grep in Unix terms - a text based thing. It doesn't work like that in perl - it operates on a list.
Fortunately, this is pretty easy to handle in your case, because you can split your line into words.
Without your source data, this is hopefully a proof of concept:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
while ( <DATA> ) {
#split the current line on whitespace into an array.
#first two elements get assigned to firstname lastname, and then the rest
#goes into #values
my ( $firstname, $lastname, #values ) = split; #works on $_ implicitly.
#check every element in #values, and test the regex 'non-digit' against it.
my #errors = grep { /\D/ } #values;
#output any matches e.g. things that contained 'non-digits' anywhere.
print Dumper \#errors;
#an array in a scalar context evaluates as the number of elements.
#we need to use "scalar" here because print accepts list arguments.
print "There were ", scalar #errors, " errors\n";
}
__DATA__
bob hankerman 2039 3232 23 232 645 64x3 324
Or reducing down the logic:
#!/usr/bin/perl
use strict;
use warnings;
while ( <DATA> ) {
#note - we don't need to explicity specify 'scalar' here,
#because assigning it to a scalar does that automatically.
#(split) splits the current line, and [2..8] skips the first two.
my $count_of_errors = grep { /\D/ } (split)[2..8];
print $count_of_errors;
}
__DATA__
bob hankerman 2039 3232 23 232 645 64x3 324

First : You don't need to use "GREP", Simply you can match the string with "=~" in perl and you can print matched value with $&.
Second : You should use $_ if and only if there is not other variable used in the loop. There is already $i used in the loop, you can write the loop as :
for my $i (2..9) {
print "$i\n";
}
or
foreach(2..9) {
print "$_\n";
}

Related

Perl output successive string from array

I have an array I can print out as "abcd" however I am trying to print it as "a>ab>abc>abcd". I can't figure out the nested loop I need within the foreach loop I have. What loop do I need within it to print it this way?
my $str = "a>b>c>d";
my #words = split />/, $str;
foreach my $i (0 .. $#words) {
print $words[$i], "\n";
}
Thank you.
You had the right idea, but instead of printing the word at position i, you want to print all the words between positions 0 and i (inclusive). Also, your input can contain multiple strings, so loop over them.
use warnings;
while (my $str = <>) { # read lines from stdin or named files
chomp($str); # remove any trailing line separator
my #words = split />/, $str; # break string into array of words
foreach my $i (0 .. $#words) {
print join '', #words[0 .. $i]; # build the term from the first n words
print '>' if $i < $#words; # print separator between terms (but not at end)
}
print "\n";
}
There are many other ways to write it, but hopefully this way helps you understand what's happening and why. Good luck!
one liner:
perl -e '#a=qw(a b c d); for(#a) {$s.=($h.=$_).">"} $s=substr($s,0,-1);print $s'
I would do it like this:
#!/usr/bin/perl
use strict;
use warnings;
my $str = "a>b>c>d>e>f>g";
my #words = split />/, $str;
$" = '';
my #new_words;
push #new_words, "#words[0 .. $_]" for 0 .. $#words;
print join '>', #new_words;
A few things to explain.
Perl will expand array variables in a double-quoted string. So something like this:
#array = ('x', 'y', 'z');
print "#array";
will print x y z. Notice there are spaces between the elements. The string that is inserted between the elements is controlled by the $" variable. So by setting that variable to an empty string we can remove the spaces, so:
$" = '';
#array = ('x', 'y', 'z');
print "#array";
will print xyz.
The most complex line is:
push #new_words, "#words[0 .. $_]" for 0 .. $#words;
That's just a compact way to write:
for (0 .. $#words) {
my $new_word = "#words[0 .. $_]";
push #new_words, $new_word;
}
We iterate across the integers from zero to the last index in #words. Each time around the loop, we use an array slice to get a list of elements from the array, convert that to a string (by putting it in double-quotes) and then push that string onto #new_words.
This is what I ended up with, It's the only way I could understand and get the output I was looking for.
use strict;
use warnings;
my $str = "a>b>c>d>e>f>g";
my #words = split />/, $str;
my $j = $#words;
my $i = 0;
my #newtax;
while($i <= $#words){
foreach my $i (0 .. $#words - $j){
push (#new, $words[$i]);
}
if($i < $#words){
push(#new, ">");
}
$j--;
$i++;
}
print #new;
This output "a>ab>abc>abcd>abcde>abcdef>abcdefg"

Comparing two strings line by line in Perl

I am looking for code in Perl similar to
my #lines1 = split /\n/, $str1;
my #lines2 = split /\n/, $str2;
for (int $i=0; $i<lines1.length; $i++)
{
if (lines1[$i] ~= lines2[$i])
print "difference in line $i \n";
}
To compare two strings line by line and show the lines at which there is any difference.
I know what I have written is mixture of C/Perl/Pseudo-code. How do I write it in the way that it works on Perl?
What you have written is sort of ok, except you cannot use that notation in Perl lines1.length, int $i, and ~= is not an operator, you mean =~, but that is the wrong tool here. Also if must have a block { } after it.
What you want is simply $i < #lines1 to get the array size, my $i to declare a lexical variable, and eq for string comparison. Along with if ( ... ) { ... }.
Technically you can use the binding operator to perform a string comparison, for example:
"foo" =~ "foobar"
But it is not a good idea when comparing literal strings, because you can get partial matches, and you need to escape meta characters. Therefore it is easier to just use eq.
Using C-style for loops is valid, but the more Perl-ish way is to use this notation:
for my $i (0 .. $#lines1)
Which will iterate over the range 0 to the max index of the array.
Perl allows you to open filehandles on strings by using a reference to the scalar variable that holds the string:
open my $string1_fh, '<', \$string1 or die '...';
open my $string2_fh, '<', \$string2 or die '...';
while( my $line1 = <$string1_fh> ) {
my $line2 = <$string2_fh>;
....
}
But, depending on what you mean by difference (does that include insertion or deletion of lines?), you might want something different.
There are several modules on CPAN that you can inspect for ideas, such as Test::LongString or Algorithm::Diff.
my #lines1 = split(/^/, $str1);
my #lines2 = split(/^/, $str2);
# splits at start of line
# use /\n/ if you want to ignore newline and trailing spaces
for ($i=0; $i < #lines1; $i++) {
print "difference in line $i \n" if (lines1[$i] ne lines2[$i]);
}
Comparing Arrays is a way easier if you create a Hashmap out of it...
#Searching the difference
#isect = ();
#diff = ();
%count = ();
foreach $item ( #array1, #array2 ) { $count{$item}++; }
foreach $item ( keys %count ) {
if ( $count{$item} == 2 ) {
push #isect, $item;
}
else {
push #diff, $item;
}
}
#Output
print "Different= #diff\n\n";
print "\nA Array = #array1\n";
print "\nB Array = #array2\n";
print "\nIntersect Array = #isect\n";
Even after spliting you could compare them as Array.

Perl Script that should move through array with i=3 prints indicies that aren't x3

I have these arrays of Sequences and I wrote this script to walk through each sequence three letters at a time (eg. {0,1,2}, {3,4,5},{6,7,8}) and print the index of where it first encounters a certian 3 letter combination (TAA,TAG,TGA). (EX. if sequence were CGTAGCCCCTAACCCC, then the script would skip over the TAG in the 2 position because its not in the correct frame of 3 and report the TAA in the 9 position). Therefore, I am only expecting indices in multiples of 3 in my results.
On most strings there is no problem, however every once in a while it will index at 4 or other non multiples of three. I was wondering if anyone more advanced than I can figure out why this may happen. I know this script is ugly and I am sorry for that, I am a biologist and I mod it for whatever I am mining out of sequences at the time. I just can't figure out the bug.
Here are some sequences from my file. The 3rd line is the sequence that gives the strange result. Just for an example of what I am dealing with.
AGGTACGCGAGTCACCTTTCGTCTTCAATCTCGTTTGATCGAAGCTATTTGTCAAAAAGAGAGGATTTTTTTGCATCTCAATTATGATCATTCCTTAGGGTTTTCAGGGTTTTGGATTGTTGTTTTTGTTAACATTTATCTGATTCGTTTGTATTTGTGTGGCAGTCTAAAGTGGCATCAACAATGGCGTCTTTTATTATACATAAGCCAAAGGAGAGATCGCCTTTCACGAAAGCTGCTTTCAAAACGGTACCTTTAGTGATTCAGCATTTTTATCTGAAATATGTTTGTTGCATTATTGAATGATTCTGATGTGGTGTTGCTACCAACTTGTCTATGTTGGTTGATTTAGCTTGATAGCATCAAGGAGTTGGAACTGTTTATGTTGAAGCATCGAAAGGATTATGTTGATCTGCACCGGACTACAGAACAGGAAAAGGATAGTATTGAACAAGAAGTAAGTACTCTGAGCTAGGCTTGCCCGTAGTATATATCTGAACTCATGAAGTTACTGCGATAAATCTATGCTTGAGTTGAGATTGAACATATGGAACTATGGAATCATAAGAAATGTAGCAACTCATATTGAGATAACTCAGGAAGATTAATGTCTATTACTTTAGATAGCGAGGGAGTTAGTATATTGTGACACTGAGGAACTTGGATCTTGTATTCTTATACCTCTTGCAGTGTTTGATCGAGAACTATGTCTACTTATGTGTTGTGTAATATCATCAAACTCTCTCTCTCTCCCTCTTGCAGGTTGCTGCTTTTATTAAAGCTTGCAAAGAACAGATCGATATTCTCATAAACAGTATTAGAAATGAAGAAGCAAACTCCAAAGGATGGCTTGGCCTCCCCGCAGATAACTTCAATGCTGATTCTATAGCACACAAACATGGAGTGGTATGATATGCACCAATGTAGTAAGCCAACTTTGGTTTTTTTTTACTATGTTTTCTTTCAAAGTATCTAGATGTGTAGAAGTAATGGTAATTTTTTTTGTATGCAGGTTTTGATTCTGAGTGAGAAACTTCATTCAGTCACTGCCCAGTTTGATCAGCTTAGAGCTACTCGTTTCCAAGATATTATAAACAGAGCTATGCCGAGAAGAAAACCTAAGAGGGTCATAAAGGAAGCTACCCCAATTAATACAACTCTGGGAAATTCGGAGTCCATAGAACCGGATGAAATCCAGGCCCAACCTCGTAGATTACAACAACAACAACTTCTAGACGATGAAACACAAGCCCTTCAGGTAACAAGGCAAATATACATGATCTTCGAAAACTTGCATAAGTTTTGTAGTTATGCTAAATTTTGAAATTGATAATTTTTGCAGGTAGAGCTAAGTAATCTTTTAGATGGTGCTAGGCAGACAGAAACTAAGATGGTGGAGATGTCTGCATTAAACCACTTGATGGCAACTCATGTTCTGCAGCAAGCCCAACAGATAGAGTTTCTTTATGACCAGGTTAGGACTTATTAACTTCTCTAACGCTCTCATGTCAACACACTGTTTTGTTAGGCTTTCACTGTTCTTTACACTCCTTTGCTATCTCAAAGTTAAATTCGGATGCTTATTGTATTCAGAACTTTTCCTTGTCACATTCACCTAAATTAGGTATAGAGACGGGAAAGAAACTTTGTATTGGTCCAATTTTAATTGCTCTCCAATTTAGTGGTAGGAAATGGAACGGTTAATGTTTTTAGCTATGTAAAGTCTCTAAAACTCCATTTGAATGTGTCAATGACTCAATGCCATTCCCAATACTTTAGTTTATGGGGCTTTGCAGTTTTCCTACTCTGTAAACGTACAGCTTATGACTGACTTGGTGGCTCTCTTTATGTGTGTGTGTGTGTGTCTTGAGGCCCTTTTTCTCACTCAGTTTGACACTAAATGCAGGCAGTTGAGGCAACAAAGAACGTGGAGCTTGGAAACAAAGAGCTTTCTCAAGCAATCCAACGAAACAGCAGCAGCAGAACCTTTCTCTTACTGTTTTTCTTCGTCCTTACTTTCTCCGTCTTGTTCTTGGATTGGTACAGTTAAaaaacc
AGGTGATTGTTTTGTTATTATAAATCAAGATCAGTACATATATATTTTTGTTTTTCTTGGTTTCATATGTAATATTTTGGACTTTTGGTGTTTAGGTTTTTGACTTGGAAGAAAAGAACGTAATGGATGAGTCACTACACGAGGTGTATAAATTTTGCCTCACCGATGTTGATGAGAGAAGCAAGAAAGAGACATCAATGAAAGATGATTACATAGAACATAAGAAGTCTACTAGATTGTTGGCTGAAAATGCGAAGAAGTCCGGTCACAGTTTAGAAATATTAAGGCCGGAATCTAAACCTGAGACTGAAAAAGAGGTGATTTTATTTTCTTGTTATATAAAGATTCGTAGACATATATTTGGTTTTTCTTTGGTTTCATAATATTTTGGACTTATGTGTGTTTAGGTCAATGAAGAGGAAGAGAAGAGAGTAATGGATCCGGATGTGGATATTAGTTGTTATGAAGAGTCACCACACGAGGTGTATAAATTTAGCCTCACCGATTTCGAAGAAGAGATAATGGAAGATGATTACAGAGAAGATATGAAGTGTAGAATGTTGGATGATATAGTGAAGAATTCCGGTCACCGTGTAGAAATATCAAGGCCGGAATATTATAAACCTGAGATTGAAAAACAGGTTTTATTTTTTTGGTTATTTTGTGATTAAGATCAGTTTTTTTTTTTTTTTTTTTTGGTTTAATAATATTTGATCTTGTGTGTGTTTAGGTATATGAAAAGGAAGAGAAGAAAGTAATGGATCCGGATATCTATATTAGATCTTATGAAGAGTCACCAAACGAGGTGTATAAATTTAGCCTCACTGATTTGGAAGAAGAGATAATGGAAAATGACTCCATAGAAGGTGTGAAGTGTAGAATGTTGGATGAAATAATGAAGAAGTCCGGTCACCATTTAAAAATATCAAGGCCGGAATATAAACCTGAGATTGAAAAACAGGTTAGTTTTTAATAAAAAGATCACTAGATATTTTTTTTTATTTTTTTTTGTTTTTGGTTTCATAATATTTGACTTGTGGCATGTGTTTAGGTATATGAAGAGGAAGAGAAGAAAGTAATGGATCCAGATGTGGATATTAGATGTTATGAAGAGTCACCACACGAGGTGTCTAAATTTAGCCTCACCGATTTCGAAGAAGAGATAATGGAAGATGATTACATAGAAGCTTTGAAGTGTAGAATGTTGGATGATATATTGAAGAAGTCCGGTCACCGTTTAGAAATATCAAGGCGGCAATATAATAAACCTGAGATTGAAATACAGGTGATTTTTTTTTTTTATTATTGTTGTTATAGTAAGATCAGTAGATATATATCTTGGTTTCATAATATTTTGGACTTGTGTGTGTTTAGGTCAATGAAAAGGAAGAGAAGAAAGTAATCAATACGGATATGGATATTAGATATGATGATGAGTCACCAGAAGAGGTGGAGACATATTCTAGTCTCACGGATGATGAAGAAGAGAGAAGCAAGGAAGATACATCAATGGAAGATGTGAAGTGTAGAATGTTGGATTAAAAAACGACGAAGCTCGGCCACCTTTTAGGAATATCAAGGCCGGAATATAGACCTGAGATTGAAAAACAGGTGATTTTATTTTGTTGTTAATTGTATTAGTAAAGATCAGTAGATATATATTTGTTTTTGTTTTTCGGTTTCATAATATTTTGGACGCTTGTGTTTAGGTCAATGAAGAGAAAGAAAGAAAGTAATGGATATTAGATCTGCTGGTCAGTCACAAACACGAGGTGTACAAATTTAGCCTCACCGATATCAAAGAAGAGAGAAGCAATGAAGATACATCAATGGAAGATTGTTGCATAGAAGAGGCTCAAGTCGGAAAAGATCAAAGAGTCTTCAGATTCAGAGAAAGTAGTGAAGAGAAGAGAAAATCCTCATCATCACCATTATCACCACTAACAGAGTTTAGGGATATGGAGAGTTTGACGTATTACATGAGGCAAAAAGGGATGCATCGAAGAAGAAGAAGATCATCAACATCACCACATTGTTGCCATAATGTAGTATACAATGAGTTTAAAGTGACGAAGGAAGAAGAAGAGGAAGAAAGACAAAGATTAACAACCAAACGTGTTCATTCTAAGCTTCATGAATACGAACAATTTTTAACTCAGTTTAAAAAGAAGAAGGAAGAAGAAAACGAGAGACGAAGATTATCACCCAAAGACTTTGAGCCTACGCTTCCTGATTACGACCAAGTGATTACTCGCTTTAGAGTGCTGGAGAAGGAAGAAGAAGAAAGACGAAGATTAGCAACAAAACATGTTCATCCTAAGCTTCCTGATTACGACCAGATTGCTACTAAGTTTAAACTCCTGAAGGAGGTAGAAAAAGAAAGACGAAGATTATTAACCAAACACAGTTCATCCTAAgcttcc
TGGTAATTTTTGCATCTTCAAAATGTTCTAAAATTTTGGCAAATGGTTTTGTTAAGTTCGAATTTTTGGTTATGATACAGTTTGAACGTTTTTCTTCATAGATTACAGTTTTAGCAAATGTGAATCATTAAAAGTGGAATAGTTGGTTTGAAAACAATTGTCAATTTCATTTTTTTTTTGGTTTTATGGTTAGGCGAGGAAAGCATTAAGAGCTTTGAAAGGTATAGTGAAGCTACAAGCATTAGTGAGAGGATACTTAGTAAGGAAACGCGCGGCCGCAATGTTGCAGAGCATACAAACTTTGATCAGAGTCCAAACCGCTATGCGATCAAAACGCATCAATCGCAGCCTCAACAAAGAGTACAACAACATGTTTCAACCTCGACAATCCTTTGTAAAGAACTATTCTCATTTCCATTGGCTCTCTTTTTTTCTTTAAGCCAAAACAAGACTTAAAGTGTGTCCTCTGTTTGTAGGATAAGTTTGATGAAGCAACGTTCGATGACAGAAGAACAAAGATTGTAGAGAAGGACGATAGATACATGAGAAGATCAAGTTCAAGATCAAGATCTAGACAAGTGCACAATGTTGTTTCAATGTCTGACTATGAAGGCGATTTTGTTTACAAAGGGAATGATTTGGAGTTGTGTTTCTCGGATGAGAAGTGGAAGTTTGCTACCGCGCAGAACACGCCGAGATTATTGCATCACCATTCTGCTAATAATCGCTATTATGTAATGCAGTCTCCAGCTAAGAGTGTTGGTGGAAAGGCTTTGTGTGACTATGAAAGCAGTGTGAGTACTCCTGGCTACATGGAGAAAACTAAGTCCTTTAAGGCAAAAGTGCGTTCACACAGCGCACCGCGCCAGCGATCTGAGAGGCAGAGGTTGTCGCTAGATGAAGTTATGGCCTCTAAGAGTAGCGTTAGCGGTGTGAGTATGTCGCATCAGCATCCACCACGCCATTCTTGTTCCTGTGATCCGCTTTAActtaac
GAGTTAGTAAACAAAGTGTTCACATTTTAGTAAACATTGTTGTTCGTTAATCACGTAACGTTTTGTTTTTCCAGTTTACACTGAGCTCTGATGAGTATATAACGGAGGTGAATGGTTACTACAAAACTACGTTTTCGGGAGAAGTCATAACGTCGTTGACGTTCAAGACGAACAAAAGGACATATGGGACTTACGGAAATAAAACCAGTAGCTACTTTTCTGTTGCCGCACCCAAAGATAACCAGATTGTCGGTTTTCTTGGAAGTAGCAGCCATGCTCTCAACTCCATCGACGCTCATTTTGCCCCTGCTCCTCCTCCTGGTAGCACCGGAGCTAAGCCCGGTGCTAGTGGCATCGGAAGTGATTCTGGTAGCATTGGTAGTGCCGGAACTAACCCTGGTGCTGATGGCACCAGAGAAACCGAAAAAAACGCTGGTGGCTCAAAACCTAGTAGTGGTAGTGCCGGAACTAACCCTGGTGCTAGTGCTGTTGGCAACGGAGAAACCGAAAAAAATGCTGGTGGCTCAAAACCTAGCAGTGGTAGTGCTGGAACTAACCCTGGTGCTAGTGCTGGTGGCAACGGAGAAACCGAAAAAAACGTTGGTGGCTCAAAACCTAGCAGTGGTAAAGCCGGAACTAACCCTGGTGCTAATGCTGGTGGCAACGGAGGAACCGAAAAAAACGCTGGTGGCTCAAAATCTAGCAGTGGTAGTGCTCGAACTAACCCTGGTGCTAGTGCTGGTGGCAACGGAGAAACTGTTTCCAACATTGGAGATACGGAAAGTAACGCTGGTGGCTCGAAAAGTAATGATGGTGCTAACAATGGTGCTAGTGGCATTGAAAGTAATGCTGGTAGCACTGGAACTAACTTTGGTGCTGGTGGCACCGGGGGAATTGGAGATACGGAAAGTGATGCTGGTGGCTCCAAAACTAACTCTGGAAACGGCGGAACTAACGATGGTGCTAGTGGTATTGGAAGTAATGATGGTAGCACTGGAACTAACCCTGGTGCTGGTGGAGGAACAGATTCAAACATCGAAGGTACTGAAAATAACGTTGGTGGCAAGGAAACTAACCCTGGTGCTAGTGGCATTGGAAATAGTGATGGTAGCACTGGAACTAGCCCCGAAGGTACCGAAAGTAACGCTGACGGCACAAAAACTAACACGGGAGGCAAAGAATCTAACACCGGAAGTGAATCCAACACCAATTCTAGTCCACAAAAGTTGGAAGCACAAGGAGGCAATGGAGGAAATCAATGGGACGACGGAACCGATCATGATGGTGTGATGAAGATACATGTTGCAGTTGGTGGTCTAGGAATTGAGCAAATTAGATTTGATTATGTCAAGAACGGACAGTTGAAGGAAGGACCCTTCCACGGTGTCAAAGGAAGAGGTGGCACTTCAACGGTGCGTAAATTTTTATTATTATGGCTCAATTACGTTTTTCGAATAAGTGTTAATTCAAGATTATTGATCTTCATGATTCTGCAGATTGAGATTAGCCATCCGGACGAGTATCTTGTTTCCGTCGAGGGGTTGTACGACTCTTCCAATATCATTCAAGGAATCCAGTTTCAATCCAACAAACACACTTCTCAGTACTTTGGATATGAATATTATGGAGATGGTACACAATTTTCACTTCAAGTTAATGAAAAGAAGATCATTGGTTTCCATGGTTTTGCCGACTCACACCTTAATTCTCTTGGAGCTTATTTCGTTCCAATCTCATCCTCTTCTTCCTCCTTGACTCCTCCTCCCAACAAAGTTAAAGCTCAAGGAGGAAGTTATGGAGAAACATTTGACGATGGTGCTTTCGATCATGTAAGAAAGGTTTATGTTGGTCAAGGTGATTCTGGTGTAGCTTATGTCAAGTTCGATTATGAAAAAGACGGTAAAAAGGAGACACAAGAACATGGAAAAATGACATTGTCAGGAACAGAGGAGTTTGAGGTTGATTCAGACGATTACATAACATCAATGGAGGTTTATGTCGACAAAGTCTACGGTTATAAAAGCGAAATCGTCATTGCTCTTACCTTCAAGACCTTTAAGGGTGAAACTTCTCCACGTTTTGGAATAGAGACTGAGAATAAATATGAAGTTAAAGACGGTAAAGGAGGAAAACTTGCTGGTTTCCATGGAAAAGCTAGCGATGTTCTTTATGCTATTGGTGCTTATTTCATTCCAGCAGCAAATTAGagagtt
ACGTATGTCTTAGTTACTACTATCATACTATATTACTATGTATTGGAAAACTTTTGGTTAGAACCTGTTGGGAGGAAAGGGTTTATGTTCTGGTTCATTTTACGTGTACTAAGTACTTATAATTAAGATTAAAAGAAACATTTACAGCTTCACCCTCTGGTCGATGTATGTGGGCTGTGGGCATGTGGCCAATCTCTGAAGCGTTAGGTAGAGCAAATATAGAGTTGAGAGTTGCTTAAGTTAGTGAACGTGAATGACTAAAAAGATATGTTGCATTTAAATCGTATTGGGCCTCATCCCATCTAAAATATAGTAGGTGTAGGCCTTTTAGGTTAATTTGAATAAAATCAACCTTTTTGTAAGCAACATCGACGATTGTCACATTTTTCTCATACACATAGGTGTAATCTAGCTTTGAATGTTTTCTCATACACATAGGTGTAATCACCGTAATTATCATTTGTGAAGATATATGTTTTACCAAGTGGTTTGTATTGTCCATATATACTTTACCACTTTCATATTAACATATAATGTTTTTGTAAGTATTATACCATAAAGGATTGGTTTCTTAATATTATTAACAAAACGCAAAAATTCTTTTAAACGCAGGCGATTCCAATCCACAGCGTTGCGGTTAGAGTAGGATCAACACAAAGAGTAGTGATGGAGATCATAATCACATTCGCATTGGTCTACACTGTTTACGCCACAGCCATTGACTCCAACAATGGCACTCTCGGAACCATCGCTCCACTTGCTATCAGACTCATCGTTGGTGCTAACATTCTTGCAGCCGGCCCATTCTCTGGTGGTCCAATGAACCCTGGACGTTCTTTTGGATCATCTCTTGCCGTTGGAAATTTTTCAGGACATTAGgtttat
and here is the script I am running:
#!/usr/bin/perl
use strict;
use warnings;
# A program to find the first inframe stop codon of non-spliced intron containing genes
print "ENTER THE FILENAME FOR DNA SEQUENCES:= ";
# Asks for Sequence file and if file does not exist prints error message
my $filename = <STDIN>;
#my $sequence;
my #sequence;
chomp $filename;
unless (open(DNAFILE, $filename) ) {
print "Cannot open file \"$filename\"\n\n";
}
#sequence = <DNAFILE>;
close DNAFILE;
open (FILE, ">AtPTCindex.txt");
my $j;
my $i;
my $codon;
my $stopseq;
my $counter;
#Change $j<(375) to n=number of sequences
for ($j = 0; $j < #sequence; $j ++) {
$counter = 0;
for ($i = 0; $i < (length($sequence[$j]) - 2) && $counter < 1; $i += 3) {
$codon = substr($sequence[$j], $i, 3);
if ($codon =~ m/TAG|TGA|TAA/g) {
# m added before /TAG... above
$stopseq = substr($sequence[$j], $i, 9);
my $result = index($sequence[$j], $stopseq);
$counter = 1;
#my $results = index($sequence[$j], $stopseq);
print FILE "$result \n";
#print FILE "$results $j \n";
}
}
if ($counter == 0) {
print FILE "\n"
}
}
close FILE;
exit;
Thanks so much.
As threatened, the following is a cleaned up version of your script:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
die "Usage: $0 Filename\n" if #ARGV != 1;
my $file = shift;
open my $infh, '<', $file;
open my $outfh, '>', 'AtPTCindex.txt';
while (my $line = <$infh>) {
chomp($line);
my $result = '';
for (my $i = 0; $i < (length($line) - 2); $i += 3) {
my $codon = substr($line, $i, 3);
if ($codon =~ m/TAG|TGA|TAA/) {
# m added before /TAG... above
my $stopseq = substr($line, $i, 9);
$result = index($line, $stopseq);
$result .= " ($i, $codon, $stopseq)";
last;
}
}
print "$result\n";
# print $outfh "$result\n";
# print $outfh "$result $.\n";
}
close $infh;
close $outfh;
For the 5 lines of data that you provided, the following is the output:
84 (84, TGA, TGATCATTC)
3 (3, TGA, TGATTGTTT)
3 (3, TAA, TAATTTTTG)
4 (27, TAG, TAGTAAACA)
123 (123, TAA, TAAGATTAA)
I believe your issue is with these lines:
my $stopseq = substr($line, $i, 9);
$result = index($line, $stopseq);
You're pulling a sequence from the $line at position $i, and then immediately doing an index for it. In the case of 4 of 5 of those lines, it immediately finds the same value $i. However, in the case of line 4, it finds a matching sequence earlier in the line.
If this isn't desired, you'll have to explain what your desired behavior actually is. Perhaps, you just want $i? Or are you looking for a matching stop sequence any point AFTER $i? You'll have to specify what your actual logic wants to be.
I took a different approach, unpacking it into groups of three instead of counting by indexes of three. I believe this script does what you want, and it looks a lot cleaner. It can also optionally take the filename as argument.
#!/usr/bin/perl
use strict;
use warnings;
my $filename = 'a'; # dummy value
my $resultfile = 'AtPTCindex.txt';
# User may have passed filename as arguement
if (#ARGV) { if (-e $ARGV[0]) { $filename = $ARGV[0] } }
unless (-e $filename)
{
print "ENTER THE FILENAME FOR DNA SEQUENCES: ";
chomp($filename = <STDIN>)
}
open DNA,"<$filename" or die "Couldn't open $filename for reading: $!\n";
my #sequence = <DNA> or die "Couldn't read $filename: $!\n";;
close DNA;
# Uncomment the below line if you're braver than me
if (-e $resultfile) { die "Cowardly refusing to write to existing file" }
if (-e $resultfile) { unlink $resultfile };
open RESULT,">>$resultfile" or die "Courdn't open$!\n";
foreach my $string (#sequence)
{
# split into groups of 3
my #groups = unpack "(A3)*", $string;
# Search for the group you want
for (my $groupnum = 0; $groupnum < #groups - 1; $groupnum++)
{
if ($groups[$groupnum] =~ m/(TAG|TGA|TAA)/g)
{
print RESULT (($groupnum + 0) * 3) . "\n";
print "$1 (" . $1 . ( $groups[$groupnum + 1]) . ($groups[$groupnum + 2]) . ") at index " . (($groupnum + 0) * 3) . "\n";
last;
}
}
}
close RESULT;
Running the script on your sample data, it outputs:
TGA (TGATCATTC) at index 84
TGA (TGATTGTTT) at index 3
TAA (TAATTTTTG) at index 3
TAG (TAGTAAACA) at index 27
TAA (TAAGATTAA) at index 123
...as well as writes the raw index numbers to the file specified.

Compare Arrays and Delete Arrays

I have below three sets ( arrays ) I need to perform an operation like this ( (A-B)UC ) on.
Can someone have the logic of this in Perl?
Here is my code I can able check for is B subset of A or not but I could not able to do "A-B":
my #array = (MAJOR,MINOR,MM,DD,YY);
my #exclude = (MM,MINOR,YY);
my #include = (LICENSE,VALID);
foreach (#exclude) {
if ( $_ ~~ #array ) {
print "\n $_ is defined in variables and it will be excluded \n";
#array = grep {!/\$_/} #array;
print "#array \n";
}
else {
print "\n $_ is not defined under variables please check the files \n";
exit 100;
}
}
foreach (#array){
print "$_ \n";
}
I suspect something is wrong in my logic with grep operation i.e. delete operation.
One problem with the grep is that $_ in the outer loop is redefined inside the grep block to each element of #array. You need to have different names. Also, your regex was lacking anchors; however, instead of a regex, just use string inequality. Try this:
my #array = qw(MAJOR MINOR MM DD YY);
my #exclude = qw(MM MINOR YY);
my #include = qw(LICENSE VALID);
foreach my $e (#exclude) {
if ( $e ~~ #array ) {
print "\n $e is defined in variables and it will be excluded \n";
#array = grep {$e ne $_} #array;
print "#array \n";
} else {
print "\n $e is not defined under variables please check the files \n";
exit 100;
}
}
use strict and warnings to alert you to many pitfalls in perl.
A hash is the most natural way to represent a set in perl.
use strict;
use warnings;
my #array = ('MAJOR','MINOR','MM','DD','YY');
my #exclude = ('MM','MINOR','YY');
my #include = ('LICENSE','VALID');
my %set;
# add #array to set
#set{#array} = ();
# remove #exclude
delete #set{#exclude};
# add #include
#set{#include} = ();
# array of elements resulting
my #result = sort keys %set;
You could use a set to do those kind of operations. I used a non-standard module Set::Scalar to help me with it:
#!/usr/bin/env perl
use warnings;
use strict;
use Set::Scalar;
my #array = qw(MAJOR MINOR MM DD YY);
my #exclude = qw(MM MINOR YY);
my #include = qw(LICENSE VALID);
my $array_set = Set::Scalar->new(#array);
my $exclude_set = Set::Scalar->new(#exclude);
my $include_set = Set::Scalar->new(#include);
my $result = $array_set->difference($exclude_set)->union($include_set);
use Data::Dumper;
print Dumper #$result;
Run it like:
perl script.pl
That yields:
$VAR1 = 'VALID';
$VAR2 = 'MAJOR';
$VAR3 = 'DD';
$VAR4 = 'LICENSE';

Perl Script -: Useless use of array element in void context at letter_counter.pl lin 38 and 44

this is first my perl script
http://bpaste.net/show/171137/
#!/usr/bin/perl
#This program will take a user's input and then count how many letters there are. Whereupon it will count the number of unique letters before printing all the data
#back to the user.
use strict;
use warnings;
#======================================================================================================================
# This section is to collect and spit back the input to the user.
#======================================================================================================================
print "\n\nHello, please enter a word, a phrase, or a sentence. Press Enter when you are done.\n";
my $input = <>; #Collecting the input from the user.
chomp $input; #Chomping, or removing, the \n from the end of the input.
print "\nYou typed -:[$input]\n";
#======================================================================================================================
#This section will find how many unique characters there are.
#======================================================================================================================
my #uniqueArray;
my #stringArray = split(// , $input);
my $x = 0;
my $string_max_index = $#stringArray;
for($stringArray[$x];$stringArray[$string_max_index];$x++)
{
my $found = 0;
my $test = $stringArray[$x];
my $y = 0;
for($uniqueArray[$y];$uniqueArray[$#uniqueArray];$y++)
{
if($test eq $uniqueArray[$y])
{
$found=1;
}
}
if($found eq 1)
{
$uniqueArray[$#uniqueArray] = $stringArray[$x];
}
}
#======================================================================================================================
# This section will determine how many ascii characters are in the $input variable and output the results of this
# program.
#======================================================================================================================
my $numOfLet = 0;
while ( $input ne "" )
{
$numOfLet = $numOfLet + 1;
chop $input
}
print "Total Characters -: $numOfLet";
print "Total of Unique Characters -: $#uniqueArray \n\n\n";
exit;
I was able to get rid of all the errors except for these two,
Useless use of array element in void context at letter_counter.pl line 38
Useless use of array element in void context at letter_counter.pl line 44
What is confusing me is that There is nothing at those lines, just the closing brackets for my for loop, which leads me to believe that the issue is an element I called in each for loop.
The initialization block of your for loop is the immediate culprit. Adjusting to something like this resolves the warning:
for(;$stringArray[$string_max_index];$x++)
Otherwise you're accessing a value, but doing... nothing with it? That's what the warning is for.
I spot a few other problems, though:
Your for loops are... a little funny, I don't know how else to put that.
Array length is usually easiest to read with the scalar keyword.
Adding members to an array is usually done with the push keyword.
Using the above in combination:
for(my $x = 0; $x < scalar #stringArray;$x++)
{
my $found = 0;
my $test = $stringArray[$x];
my $y = 0;
for (my $y = 0; !$found && $y < scalar #uniqueArray;$y++)
{
if($test eq $uniqueArray[$y])
{
$found=1;
}
}
unless ($found)
{
push #uniqueArray, $stringArray[$x];
}
}
If the above for loops don't look sensible to you, now is a good time to look up some tutorials.
This could be simplified with foreach loops:
foreach my $letter (#stringArray) {
...
}
Or with grep searches:
my $found = grep { $_ eq $letter } #uniqueArray;
But, in the particular case of counting unique values, it's often simplest to assign to a hash:
my %uniques;
$uniques{$_} = 1 for #stringArray;
my $num_uniques = scalar keys %uniques;
Combining all of that:
my #letters = split(//, $input); #split input into array of chars
my %uniques; #declare empty hash
$uniques{$_} = 1 for #letters; #set hash key for each char
my $num_letters = scalar #letters; #count entries in letter list
my $num_uniques = scalar keys %uniques; #count unique keys in letter hash
Exercise for the reader: adjust the above code so that it counts the number of times each character is used.
That's because #uniqueArray is empty...
Given this short example:
use strict;
use warnings;
my #arr;
my $t = 0;
for ($arr[$t]; $arr[$#arr]; $t++ ) {
print "no\n";
}
__OUTPUT__
Useless use of array element in void context at t.pl line 11.
You declare my #uniqueArray; at line 21 and never do anything with it...
Which also means how will this ever match at line 34?
if($test eq $uniqueArray[$y])
Again, #uniqueArray is an empty array.
To fix your script (although please look at rutter's hash suggestion), you can do the following. Remove:
my $x = 0;
my $y = 0;
Instead of using C-style loops, replace with the following:
for my $x (0 .. $string_max_index )
for my $y (0 .. $#uniqueArray)
Lastly, use the following:
if(!$found)
{
push #uniqueArray, $stringArray[$x];
}
Hope this helps!

Resources