What pitfalls does this Perl code have? - arrays

I have written some code to print formatted arrays (first line = no of inputs, second line = max width of numbers). The star can be any sort of marker to differentiate some elements from the rest.
$ cat inp.txt
6
2
1 *
2
3
4
9
12 *
$ cat inp.txt | ./formatmyarray.pl
____ ____ ____ ____ ____ ____
| * | | | | | * |
| 1 | 2 | 3 | 4 | 9 | 12 |
|____|____|____|____|____|____|
$
fomatmyarray.pl
#!/usr/bin/perl
use warnings;
use strict;
my $spc = q{ };
my $und = q{_};
my $sep = q{|};
my $end = "\n";
my #inp = <STDIN>;
my $len = $inp[0];
my $wid = $inp[1];
chomp $wid;
sub printwall {
my($left, $middle, $l, $w) = #_;
for(0..($l - 1)) {
if ($_ == 0) { print $left; }
for(1..($w + 2)) { print $middle; }
print $left;
}
print $end;
return;
}
sub printchar {
my($left, $middle, $l, $w) = #_;
for(0..($l - 1)) {
if ($_ == 0) { print $left; }
my #temp = split ' ', $inp[$_ + 2];
my $star = 0;
if (($#temp) >= 1) { $star = 1; }
my $mid = sprintf "%d", (($w + 2) /2);
for(1..($w + 2)) {
if (($_ == $mid) && ($star == 1)) { print "*"; }
else { print $middle; }
}
print $left;
}
print $end;
return;
}
sub printnum {
my($left, $middle, $l, $w) = #_;
for(0..($l - 1)) {
if ($_ == 0) { print $left; }
my #temp = split ' ', $inp[$_ + 2];
my $format = join '', q{%}, $w, q{d};
my $num = sprintf($format, $temp[0]);
print join '', $middle, $num, $middle, $left;
}
print $end;
return;
}
printwall($spc, $und, $len, $wid);
printchar($sep, $spc, $len, $wid);
printnum ($sep, $spc, $len, $wid);
printwall($sep, $und, $len, $wid);
I already checked it with Perl::Critic but that will only tell me the syntactical problems (which I have already corrected). Are there any problems that you see with this code. Something an experienced Perl programmer would do differently?
Any comments or suggestions are welcome.

Several suggestions in here. Hope this is helpful.
use strict;
use warnings;
use Getopt::Long qw(GetOptions);
my $SPC = q{ };
my $UND = q{_};
my $SEP = q{|};
my $END = "\n";
main();
sub main {
# Try to keep run options and core input data separate from each other.
GetOptions('max=i' => \my $max_n);
# Parse input at the outset so that subsequent methods
# don't have to worry about such low-level details.
my $inp = parse_input();
# Prune the input array at the outset.
# This helps to keep subsequent methods simpler.
splice #$inp, $max_n if $max_n;
# Don't require the user to compute max width.
my $wid = determine_width($inp);
# The format string can be defined at the outset.
my $fmt = join '', $SEP, $SPC, '%', $wid, 's', $SPC;
# You can print both data and stars using one method.
print_border($inp, $wid, $SPC);
print_content($inp, $fmt, $_) for qw(star data);
print_border($inp, $wid, $SEP);
}
sub parse_input {
my #parsed;
# Using <> provides more flexibility than <STDIN>.
while (<>){
chomp;
my ($value, $star) = split;
$star = $SPC unless defined $star;
push #parsed, { data => $value, star => $star }
}
return \#parsed;
}
sub determine_width {
my $inp = shift;
my $wid = 0;
for (#$inp){
my $len = length $_->{data};
$wid = $len if $len > $wid;
}
return $wid;
}
# Because we did work at the outset to create a data structure
# that represents our goals conveniently, generating output
# is much simpler.
sub print_border {
my ($inp, $wid, $wall_sep) = #_;
print $wall_sep, $UND x ($wid + 2) for #$inp;
print $wall_sep, $END;
}
sub print_content {
my ($inp, $fmt, $mode) = #_;
printf $fmt, $_->{$mode} for #$inp;
print $SEP, $END;
}

There's a lot of room for improvement here (I'll update this answer as and when I have time).
Let's start off with the inputs. You should not have to specify the number of entries or their maximum length as Perl can infer that for you:
my $entries = my #entries = <STDIN>;
Don't forget about CPAN.
For instance, consider Text::ASCIITable.

The return statements would not appear in most people's code - a sub returns when it reaches the end (but see discussion in comments).
In printwall, I'd unconditionally print the first left wall outside the loop; ditto the other functions.
I'm not convinced I'd read all the data into #inp as shown. More likely, I'd use:
my $num = <STDIN>; # Or, more likely, just <>
my $wid = <STDIN>;
my #inp = <STDIN>;
This would clean up the $inp[$_ + 2] in the functions.
I'd probably pass the array to the functions, rather than using global variables - globals are grubby in Perl as everywhere else.
The count of the number of values is not needed in the input. With the array containing just the data to be printed, you can iterate over each member of the array in the functions with a suitable foreach, improving its Perlishness.
In printnum, you can build the format string once (not each iteration).
This:
my $mid = sprintf "%d", (($w + 2) /2);
is a funny way of writing:
my $mid = int(($w + 2) / 2);
I'd probably use a regex to find the star; it isn't clear whether you should print a '*' if any character is found, or if you should print the character that is found.
I'd probably be using a single format to deal with the stars:
my $fmt = sprintf "%*s%%c%*s%c", $wid, $middle, $wid, $middle, $left;
I might need to tune one of the $wid values to allow for even widths, but the output would be:
" %c |"
You can then simply print each cell with a blank or a '*' for the value using the format.
Similarly, in printnum, I'd be generating a simple format string like " %2d |" to print each number - and I'd generate that format once.
Etc.

Related

For Loop Issues in creating nested array

Creating a matrix of products for three element arrays. I understand Perl does not have multi-dimensional arrays and are flattened. I have been using refs but I can't seem to get past the for loop issue in getting three products into a single array and pushing that array into a different single array. And I could be way off too. Be nice, but I've spent too many hours on this.
I have moved values inside and out of various places i.e. { }, printed out variables until I'm blue and used $last all over for debugging. I'm likely fried at this point.
use strict;
use warnings;
my #array1 = (1, 2, 3);
my #array2 = (2, 4, 6);
my #matrixArray = ();
my $matrixArray;
my #row;
my #finalArray = maths(\#array1, \#array2);
print #finalArray;
sub maths{
my $array1ref = shift;
my $array2ref = shift;
my $value1;
my $value2;
my $maths;
my #row = ();
my #array1 = #{$array1ref};
my #array2 = #{$array2ref};
my $len1 = #array1;
my $len2 = #array2;
for my $x (0 ..($len1 -1)){
#iterate through first array at each value
$value1 = $array1[$x];
#print $value1, " value1 \n";
for my $y (0 .. ($len2 -1)){
#iterate through second array at each value
$value2 = $array2[$y];
#print $value2, " value2 \n";
#calculate new values
$maths = $value1 * $value2;
#exactly right here
#print $maths, " maths \n" ;
push #row, $maths;
}
}
#and exactly right here but not set of arrays
#print #row, "\n";
return #row;
}
Currently I'm able to get this: 246481261218. Which is the correct dumb math but...
it should appear as a matrix:
2 4 6
4 8 12
6 12 18
I am not passing three arrays so it seems my issue is up in the sub routine before I can get on with anything else. This seems to be a theme that I often miss. So sorry if I sound inept.
EDIT***
This was working but I couldn't unpack it
use strict;
use warnings;
my #array1 = (1, 2, 3);
my #array2 = (2, 4, 6);
my #matrixArray = ();
maths(\#array1, \#array2);
foreach my $x (#matrixArray){
print "$x \n";
}
sub maths{
my $array1ref = shift;
my $array2ref = shift;
my $value1;
my $value2;
my $maths;
my #row = ();
my $row;
my #array1 = #{$array1ref};
my #array2 = #{$array2ref};
my $len1 = #array1;
my $len2 = #array2;
for my $x (0 ..($len1 -1)){
#iterate through first array at each value
$value1 = $array1[$x];
for my $y (0 .. ($len2 -1)){
#iterate through second array at each value
$value2 = $array2[$y];
#calculate new values
$maths = $value1 * $value2;
push #row, $maths;
$row = \#row;
}
push #matrixArray, $row;
}
return #matrixArray;
}
The output right after the function call is this:
ARRAY(0x55bbe2c667b0)
ARRAY(0x55bbe2c667b0)
ARRAY(0x55bbe2c667b0)
which would be the (line 10) print of $x.
****EDIT
This Works (almost):
print join(" ", #{$_}), "\n" for #matrixArray;
Output is a bit wrong...
2 4 6 4 8 12 6 12 18
2 4 6 4 8 12 6 12 18
2 4 6 4 8 12 6 12 18
And of note: I knew $x was an array but I seemed to run into trouble trying to unpack it correctly. And I'm no longer a fan of Perl. I'm pining for the fjords of Python.
And *****EDIT
This is working great and I get three arrays out of it:
sub maths{
my ($array1, $array2) = #_;
my #res;
for my $x (#$array1) {
my #row;
for my $y (#$array2) {
push #row, $x * $y;
}
push #res, \#row;
}
#This is the correct structure on print #res!
return #res;
}
But, though it's putting it together correctly, I have no output after the call
maths(\#array1, \#array2);
NOTHING HERE...
print #res;
print join(" ", #{$_}), "\n" for #res;
foreach my $x (#res){
print join(" ", #{$x}), "\n";
}
And of course a million thanks! I regret taking this stupid course and fear my grade will eventually do me in. Still pining for Python!
It appears that you need a matrix with rows obtained by multiplying an array by elements of another.
One way
use warnings;
use strict;
use Data::Dump qw(dd);
my #ary = (2, 4, 6);
my #factors = (1, 2, 3);
my #matrix = map {
my $factor = $_;
[ map { $_ * $factor } #ary ]
} #factors;
dd #matrix;
The array #matrix, formed by the outer map, has array references for each element and is thus (at least) a two-dimensional structure (a "matrix"). Those arrayrefs are built with [ ], which creates an anonymous array out of a list inside. That list is generated by map over the #ary.
I use Data::Dump to nicely print complex data. In the core there is Data::Dumper.
With a lot of work like this, and with large data, efficiency may matter. The common wisdom would have it that direct iteration should be a bit faster than map, but here is a benchmark. This also serves to show more basic ways as well.
use warnings;
use strict;
use feature 'say';
use Benchmark qw(cmpthese);
my $runfor = shift // 5; # run each case for these many seconds
sub outer_map {
my ($ary, $fact) = #_;
my #matrix = map {
my $factor = $_;
[ map { $_ * $factor } #$ary ]
} #$fact;
return \#matrix;
}
sub outer {
my ($ary, $fact) = #_;
my #matrix;
foreach my $factor (#$fact) {
push #matrix, [];
foreach my $elem (#$ary) {
push #{$matrix[-1]}, $elem * $factor;
}
}
return \#matrix;
}
sub outer_tmp {
my ($ary, $fact) = #_;
my #matrix;
foreach my $factor (#$fact) {
my #tmp;
foreach my $elem (#$ary) {
push #tmp, $elem * $factor;
}
push #matrix, \#tmp;
}
return \#matrix;
}
my #a1 = map { 2*$_ } 1..1_000; # worth comparing only for large data
my #f1 = 1..1_000;
cmpthese( -$runfor, {
direct => sub { my $r1 = outer(\#a1, \#f1) },
w_tmp => sub { my $r2 = outer_tmp(\#a1, \#f1) },
w_map => sub { my $r3 = outer_map(\#a1, \#f1) },
});
On a nice machine with v5.16 this prints
Rate direct w_map w_tmp
direct 11.0/s -- -3% -20%
w_map 11.4/s 3% -- -17%
w_tmp 13.8/s 25% 21% --
The results are rather similar on v5.29.2, and on an oldish laptop.
So map is a touch faster than building a matrix directly, and 15-20% slower than the method using a temporary array for rows, which I'd also consider clearest. The explicit loops can be improved a little by avoiding scopes and scalars, and the "direct" method can perhaps be sped up some by using indices. But these are dreaded micro-optimizations, and for fringe benefits at best.
Note that timings such as these make sense only with truly large amounts of data, what the above isn't. (I did test with both dimensions ten times as large, with very similar results.)
The second program is mostly correct.
The problem is that you didn't unpack the second level of the array.
foreach my $x (#matrixArray){
print "$x \n";
}
should be something like:
foreach my $x (#matrixArray) {
print join(" ", #{$x}), "\n";
}
# or just:
print join(" ", #{$_}), "\n" for #matrixArray;
Your maths function can be made shorter without losing legibility (it may actually make it more legible) by cutting out unnecessary temporary variables and indexing. For example:
sub maths {
my #array1 = #{ $_[0] };
my #array2 = #{ $_[1] }; # or: ... = #{ (shift) };
my #res = ();
for my $x (#array1) {
my #row = (); # <-- bugfix of original code
for my $y (#array2) {
my $maths = $x * $y;
push #row, $maths;
}
push #res, \#row;
}
return #res;
}

loop through elements of array to find character perl

I have a perl array where I only want to loop through elements 2-8.
The elements are only meant to contain numbers, so if any of those elements contain a letter, I want to set an error flag = 1, as well as some other variables as seen.
The reason I have 2 error flag variables is due to scope rules within the loop.
fields is an array, I created by splitting another irrelevant array by the " " key.
So, when I try to print error_line2, error_fname2 from outside the loop, I get this:
Use of uninitialized value $error_flag2 in numeric eq (==)
I don't know why, because I've initialized the value within the loop and created the variable outside the loop.
Not sure if I'm even looping to find characters correctly, so then it's not setting the error_flag2 = 1.
Example line:
bob hankerman 2039 3232 23 232 645 64x3 324
since element 7 has the letter 'x' , I want the flag to be set to 1.
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
my $players_file = $ARGV[0];
my #players_array;
open (my $file, "<", "$players_file")
or die "Failed to open file: $!\n";
while(<$file>) {
chomp;
push #players_array, $_;
}
close $file;
#print join "\n", #players_array;
my $num_of_players = #players_array;
my $error_flag;
my $error_line;
my $error_fname;
my $error_lname;
my $error_flag2=1;
my $error_line2;
my $error_fname2;
my $error_lname2;
my $i;
foreach my $player(#players_array){
my #fields = split " ", $player;
my $size2 = #fields;
for($i=2; $i<9; $i++){
print "$fields[$i] \n";
if (grep $_ =~ /^[a-zA-Z]+$/){
my $errorflag2 = 1;
$error_flag2 = $errorflag2;
my $errorline2 = $player +1;
$error_line2 = $errorline2;
my $errorfname2 = $fields[0];
$error_fname2 = $errorfname2;
}
}
if ($size2 == "9" ) {
my $firstname = $fields[0];
my $lastname = $fields[1];
my $batting_average = ($fields[4]+$fields[5]+$fields[6]+$fields[7]) / $fields[3];
my $slugging = ($fields[4]+($fields[5]*2)+($fields[6]*3)+($fields[7]*4)) / $fields[3];
my $on_base_percent = ($fields[4]+$fields[5]+$fields[6]+$fields[7] +$fields[8]) / $fields[2];
print "$firstname ";
print "$lastname ";
print "$batting_average ";
print "$slugging ";
print "$on_base_percent\n ";
}
else {
my $errorflag = 1;
$error_flag = $errorflag;
my $errorline = $player +1;
$error_line = $errorline;
my $errorfname = $fields[0];
$error_fname = $errorfname;
my $errorlname = $fields[1];
$error_lname = $errorlname;
}
}
if ($error_flag == "1"){
print "\n Line $error_line : ";
print "$error_fname, ";
print "$error_lname :";
print "Line contains not enough data.\n";
}
if ($error_flag2 == "1"){
print "\n Line $error_line2 : ";
print "$error_fname2, ";
print "Line contains bad data.\n";
}
OK, so the problem you've got here is that you're thinking of grep in Unix terms - a text based thing. It doesn't work like that in perl - it operates on a list.
Fortunately, this is pretty easy to handle in your case, because you can split your line into words.
Without your source data, this is hopefully a proof of concept:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
while ( <DATA> ) {
#split the current line on whitespace into an array.
#first two elements get assigned to firstname lastname, and then the rest
#goes into #values
my ( $firstname, $lastname, #values ) = split; #works on $_ implicitly.
#check every element in #values, and test the regex 'non-digit' against it.
my #errors = grep { /\D/ } #values;
#output any matches e.g. things that contained 'non-digits' anywhere.
print Dumper \#errors;
#an array in a scalar context evaluates as the number of elements.
#we need to use "scalar" here because print accepts list arguments.
print "There were ", scalar #errors, " errors\n";
}
__DATA__
bob hankerman 2039 3232 23 232 645 64x3 324
Or reducing down the logic:
#!/usr/bin/perl
use strict;
use warnings;
while ( <DATA> ) {
#note - we don't need to explicity specify 'scalar' here,
#because assigning it to a scalar does that automatically.
#(split) splits the current line, and [2..8] skips the first two.
my $count_of_errors = grep { /\D/ } (split)[2..8];
print $count_of_errors;
}
__DATA__
bob hankerman 2039 3232 23 232 645 64x3 324
First : You don't need to use "GREP", Simply you can match the string with "=~" in perl and you can print matched value with $&.
Second : You should use $_ if and only if there is not other variable used in the loop. There is already $i used in the loop, you can write the loop as :
for my $i (2..9) {
print "$i\n";
}
or
foreach(2..9) {
print "$_\n";
}

Referencing an element in a 2D array in Perl

I have the following code which reads in a 6x6 array from STDIN and saves it as an array of anonymous arrays. I am trying to print out each element with $arr[i][j], but the code below isn't working. It just prints out the first element over and over. How am I not accessing the element correctly?
#!/user/bin/perl
my $arr_i = 0;
my #arr = ();
while ($arr_i < 6){
my $arr_temp = <STDIN>;
my #arr_t = split / /, $arr_temp;
chomp #arr_t;
push #arr,\#arr_t;
$arr_i++;
}
foreach my $i (0..5){
foreach my $j (0..5){
print $arr[i][j] . "\n";
}
}
i and j are not the same as the variables you declared in the foreach lines. Change:
print $arr[i][j] . "\n";
to:
print $arr[$i][$j] . "\n";
warnings alerted me to this issue. You should add these lines to all your Perl code:
use warnings;
use strict;
To demonstrate the Perlish mantra that there's "more than one way to do it":
use 5.10.0; # so can use "say"
use strict;
use warnings qw(all);
sub get_data {
my ($cols, $rows) = #_;
my ($line, #rows);
my $i;
for ($i = 1; $i <= $rows and $line = <DATA>; $i++) {
chomp $line;
my $cells = [ split ' ', $line ];
die "Row $i had ", scalar(#$cells), " instead of $cols" if #$cells != $cols;
push #rows, $cells;
}
die "Not enough rows, got ", $i - 1, "\n" if $i != $rows + 1;
\#rows;
}
sub print_data {
my ($cols, $rows, $data) = #_;
for (my $i = 0; $i < $rows; $i++) {
for (my $j = 0; $j < $cols; $j++) {
say $data->[$i][$j];
}
}
}
my $data = get_data(6, 6);
print_data(6, 6, $data);
__DATA__
1 2 3 4 5 6
a b c d e f
6 5 4 3 2 1
f e d c b a
A B C D E F
7 8 9 10 11 12
Explanation:
if we use say, that avoids unsightly print ..., "\n"
get_data is a function that can be called and/or reused, instead of just being part of the main script
get_data knows what data-shape it expects and throws an error if it doesn't get it
[ ... ] creates an anonymous array and returns a reference to it
get_data returns an array-reference so data isn't copied
print_data is a function too
both functions use a conventional for loop instead of making lists of numbers, which in Perl 5 needs to allocate memory
There is also a two-line version of the program (with surrounding bits, and test data):
use 5.10.0; # so can use "say"
my #lines = map { [ split ' ', <DATA> ] } (1..6);
map { say join ' ', map qq{"$_"}, #$_ } #lines;
__DATA__
1 2 3 4 5 6
a b c d e f
6 5 4 3 2 1
f e d c b a
A B C D E F
7 8 9 10 11 12
Explanation:
using map is the premier way to iterate over lists of things where you don't need to know how many you've seen (otherwise, a for loop is needed)
the adding of " around the cell contents is only to prove they've been processed. Otherwise the second line could just be: map { say join ' ', #$_ } #lines;

Perl Script that should move through array with i=3 prints indicies that aren't x3

I have these arrays of Sequences and I wrote this script to walk through each sequence three letters at a time (eg. {0,1,2}, {3,4,5},{6,7,8}) and print the index of where it first encounters a certian 3 letter combination (TAA,TAG,TGA). (EX. if sequence were CGTAGCCCCTAACCCC, then the script would skip over the TAG in the 2 position because its not in the correct frame of 3 and report the TAA in the 9 position). Therefore, I am only expecting indices in multiples of 3 in my results.
On most strings there is no problem, however every once in a while it will index at 4 or other non multiples of three. I was wondering if anyone more advanced than I can figure out why this may happen. I know this script is ugly and I am sorry for that, I am a biologist and I mod it for whatever I am mining out of sequences at the time. I just can't figure out the bug.
Here are some sequences from my file. The 3rd line is the sequence that gives the strange result. Just for an example of what I am dealing with.
AGGTACGCGAGTCACCTTTCGTCTTCAATCTCGTTTGATCGAAGCTATTTGTCAAAAAGAGAGGATTTTTTTGCATCTCAATTATGATCATTCCTTAGGGTTTTCAGGGTTTTGGATTGTTGTTTTTGTTAACATTTATCTGATTCGTTTGTATTTGTGTGGCAGTCTAAAGTGGCATCAACAATGGCGTCTTTTATTATACATAAGCCAAAGGAGAGATCGCCTTTCACGAAAGCTGCTTTCAAAACGGTACCTTTAGTGATTCAGCATTTTTATCTGAAATATGTTTGTTGCATTATTGAATGATTCTGATGTGGTGTTGCTACCAACTTGTCTATGTTGGTTGATTTAGCTTGATAGCATCAAGGAGTTGGAACTGTTTATGTTGAAGCATCGAAAGGATTATGTTGATCTGCACCGGACTACAGAACAGGAAAAGGATAGTATTGAACAAGAAGTAAGTACTCTGAGCTAGGCTTGCCCGTAGTATATATCTGAACTCATGAAGTTACTGCGATAAATCTATGCTTGAGTTGAGATTGAACATATGGAACTATGGAATCATAAGAAATGTAGCAACTCATATTGAGATAACTCAGGAAGATTAATGTCTATTACTTTAGATAGCGAGGGAGTTAGTATATTGTGACACTGAGGAACTTGGATCTTGTATTCTTATACCTCTTGCAGTGTTTGATCGAGAACTATGTCTACTTATGTGTTGTGTAATATCATCAAACTCTCTCTCTCTCCCTCTTGCAGGTTGCTGCTTTTATTAAAGCTTGCAAAGAACAGATCGATATTCTCATAAACAGTATTAGAAATGAAGAAGCAAACTCCAAAGGATGGCTTGGCCTCCCCGCAGATAACTTCAATGCTGATTCTATAGCACACAAACATGGAGTGGTATGATATGCACCAATGTAGTAAGCCAACTTTGGTTTTTTTTTACTATGTTTTCTTTCAAAGTATCTAGATGTGTAGAAGTAATGGTAATTTTTTTTGTATGCAGGTTTTGATTCTGAGTGAGAAACTTCATTCAGTCACTGCCCAGTTTGATCAGCTTAGAGCTACTCGTTTCCAAGATATTATAAACAGAGCTATGCCGAGAAGAAAACCTAAGAGGGTCATAAAGGAAGCTACCCCAATTAATACAACTCTGGGAAATTCGGAGTCCATAGAACCGGATGAAATCCAGGCCCAACCTCGTAGATTACAACAACAACAACTTCTAGACGATGAAACACAAGCCCTTCAGGTAACAAGGCAAATATACATGATCTTCGAAAACTTGCATAAGTTTTGTAGTTATGCTAAATTTTGAAATTGATAATTTTTGCAGGTAGAGCTAAGTAATCTTTTAGATGGTGCTAGGCAGACAGAAACTAAGATGGTGGAGATGTCTGCATTAAACCACTTGATGGCAACTCATGTTCTGCAGCAAGCCCAACAGATAGAGTTTCTTTATGACCAGGTTAGGACTTATTAACTTCTCTAACGCTCTCATGTCAACACACTGTTTTGTTAGGCTTTCACTGTTCTTTACACTCCTTTGCTATCTCAAAGTTAAATTCGGATGCTTATTGTATTCAGAACTTTTCCTTGTCACATTCACCTAAATTAGGTATAGAGACGGGAAAGAAACTTTGTATTGGTCCAATTTTAATTGCTCTCCAATTTAGTGGTAGGAAATGGAACGGTTAATGTTTTTAGCTATGTAAAGTCTCTAAAACTCCATTTGAATGTGTCAATGACTCAATGCCATTCCCAATACTTTAGTTTATGGGGCTTTGCAGTTTTCCTACTCTGTAAACGTACAGCTTATGACTGACTTGGTGGCTCTCTTTATGTGTGTGTGTGTGTGTCTTGAGGCCCTTTTTCTCACTCAGTTTGACACTAAATGCAGGCAGTTGAGGCAACAAAGAACGTGGAGCTTGGAAACAAAGAGCTTTCTCAAGCAATCCAACGAAACAGCAGCAGCAGAACCTTTCTCTTACTGTTTTTCTTCGTCCTTACTTTCTCCGTCTTGTTCTTGGATTGGTACAGTTAAaaaacc
AGGTGATTGTTTTGTTATTATAAATCAAGATCAGTACATATATATTTTTGTTTTTCTTGGTTTCATATGTAATATTTTGGACTTTTGGTGTTTAGGTTTTTGACTTGGAAGAAAAGAACGTAATGGATGAGTCACTACACGAGGTGTATAAATTTTGCCTCACCGATGTTGATGAGAGAAGCAAGAAAGAGACATCAATGAAAGATGATTACATAGAACATAAGAAGTCTACTAGATTGTTGGCTGAAAATGCGAAGAAGTCCGGTCACAGTTTAGAAATATTAAGGCCGGAATCTAAACCTGAGACTGAAAAAGAGGTGATTTTATTTTCTTGTTATATAAAGATTCGTAGACATATATTTGGTTTTTCTTTGGTTTCATAATATTTTGGACTTATGTGTGTTTAGGTCAATGAAGAGGAAGAGAAGAGAGTAATGGATCCGGATGTGGATATTAGTTGTTATGAAGAGTCACCACACGAGGTGTATAAATTTAGCCTCACCGATTTCGAAGAAGAGATAATGGAAGATGATTACAGAGAAGATATGAAGTGTAGAATGTTGGATGATATAGTGAAGAATTCCGGTCACCGTGTAGAAATATCAAGGCCGGAATATTATAAACCTGAGATTGAAAAACAGGTTTTATTTTTTTGGTTATTTTGTGATTAAGATCAGTTTTTTTTTTTTTTTTTTTTGGTTTAATAATATTTGATCTTGTGTGTGTTTAGGTATATGAAAAGGAAGAGAAGAAAGTAATGGATCCGGATATCTATATTAGATCTTATGAAGAGTCACCAAACGAGGTGTATAAATTTAGCCTCACTGATTTGGAAGAAGAGATAATGGAAAATGACTCCATAGAAGGTGTGAAGTGTAGAATGTTGGATGAAATAATGAAGAAGTCCGGTCACCATTTAAAAATATCAAGGCCGGAATATAAACCTGAGATTGAAAAACAGGTTAGTTTTTAATAAAAAGATCACTAGATATTTTTTTTTATTTTTTTTTGTTTTTGGTTTCATAATATTTGACTTGTGGCATGTGTTTAGGTATATGAAGAGGAAGAGAAGAAAGTAATGGATCCAGATGTGGATATTAGATGTTATGAAGAGTCACCACACGAGGTGTCTAAATTTAGCCTCACCGATTTCGAAGAAGAGATAATGGAAGATGATTACATAGAAGCTTTGAAGTGTAGAATGTTGGATGATATATTGAAGAAGTCCGGTCACCGTTTAGAAATATCAAGGCGGCAATATAATAAACCTGAGATTGAAATACAGGTGATTTTTTTTTTTTATTATTGTTGTTATAGTAAGATCAGTAGATATATATCTTGGTTTCATAATATTTTGGACTTGTGTGTGTTTAGGTCAATGAAAAGGAAGAGAAGAAAGTAATCAATACGGATATGGATATTAGATATGATGATGAGTCACCAGAAGAGGTGGAGACATATTCTAGTCTCACGGATGATGAAGAAGAGAGAAGCAAGGAAGATACATCAATGGAAGATGTGAAGTGTAGAATGTTGGATTAAAAAACGACGAAGCTCGGCCACCTTTTAGGAATATCAAGGCCGGAATATAGACCTGAGATTGAAAAACAGGTGATTTTATTTTGTTGTTAATTGTATTAGTAAAGATCAGTAGATATATATTTGTTTTTGTTTTTCGGTTTCATAATATTTTGGACGCTTGTGTTTAGGTCAATGAAGAGAAAGAAAGAAAGTAATGGATATTAGATCTGCTGGTCAGTCACAAACACGAGGTGTACAAATTTAGCCTCACCGATATCAAAGAAGAGAGAAGCAATGAAGATACATCAATGGAAGATTGTTGCATAGAAGAGGCTCAAGTCGGAAAAGATCAAAGAGTCTTCAGATTCAGAGAAAGTAGTGAAGAGAAGAGAAAATCCTCATCATCACCATTATCACCACTAACAGAGTTTAGGGATATGGAGAGTTTGACGTATTACATGAGGCAAAAAGGGATGCATCGAAGAAGAAGAAGATCATCAACATCACCACATTGTTGCCATAATGTAGTATACAATGAGTTTAAAGTGACGAAGGAAGAAGAAGAGGAAGAAAGACAAAGATTAACAACCAAACGTGTTCATTCTAAGCTTCATGAATACGAACAATTTTTAACTCAGTTTAAAAAGAAGAAGGAAGAAGAAAACGAGAGACGAAGATTATCACCCAAAGACTTTGAGCCTACGCTTCCTGATTACGACCAAGTGATTACTCGCTTTAGAGTGCTGGAGAAGGAAGAAGAAGAAAGACGAAGATTAGCAACAAAACATGTTCATCCTAAGCTTCCTGATTACGACCAGATTGCTACTAAGTTTAAACTCCTGAAGGAGGTAGAAAAAGAAAGACGAAGATTATTAACCAAACACAGTTCATCCTAAgcttcc
TGGTAATTTTTGCATCTTCAAAATGTTCTAAAATTTTGGCAAATGGTTTTGTTAAGTTCGAATTTTTGGTTATGATACAGTTTGAACGTTTTTCTTCATAGATTACAGTTTTAGCAAATGTGAATCATTAAAAGTGGAATAGTTGGTTTGAAAACAATTGTCAATTTCATTTTTTTTTTGGTTTTATGGTTAGGCGAGGAAAGCATTAAGAGCTTTGAAAGGTATAGTGAAGCTACAAGCATTAGTGAGAGGATACTTAGTAAGGAAACGCGCGGCCGCAATGTTGCAGAGCATACAAACTTTGATCAGAGTCCAAACCGCTATGCGATCAAAACGCATCAATCGCAGCCTCAACAAAGAGTACAACAACATGTTTCAACCTCGACAATCCTTTGTAAAGAACTATTCTCATTTCCATTGGCTCTCTTTTTTTCTTTAAGCCAAAACAAGACTTAAAGTGTGTCCTCTGTTTGTAGGATAAGTTTGATGAAGCAACGTTCGATGACAGAAGAACAAAGATTGTAGAGAAGGACGATAGATACATGAGAAGATCAAGTTCAAGATCAAGATCTAGACAAGTGCACAATGTTGTTTCAATGTCTGACTATGAAGGCGATTTTGTTTACAAAGGGAATGATTTGGAGTTGTGTTTCTCGGATGAGAAGTGGAAGTTTGCTACCGCGCAGAACACGCCGAGATTATTGCATCACCATTCTGCTAATAATCGCTATTATGTAATGCAGTCTCCAGCTAAGAGTGTTGGTGGAAAGGCTTTGTGTGACTATGAAAGCAGTGTGAGTACTCCTGGCTACATGGAGAAAACTAAGTCCTTTAAGGCAAAAGTGCGTTCACACAGCGCACCGCGCCAGCGATCTGAGAGGCAGAGGTTGTCGCTAGATGAAGTTATGGCCTCTAAGAGTAGCGTTAGCGGTGTGAGTATGTCGCATCAGCATCCACCACGCCATTCTTGTTCCTGTGATCCGCTTTAActtaac
GAGTTAGTAAACAAAGTGTTCACATTTTAGTAAACATTGTTGTTCGTTAATCACGTAACGTTTTGTTTTTCCAGTTTACACTGAGCTCTGATGAGTATATAACGGAGGTGAATGGTTACTACAAAACTACGTTTTCGGGAGAAGTCATAACGTCGTTGACGTTCAAGACGAACAAAAGGACATATGGGACTTACGGAAATAAAACCAGTAGCTACTTTTCTGTTGCCGCACCCAAAGATAACCAGATTGTCGGTTTTCTTGGAAGTAGCAGCCATGCTCTCAACTCCATCGACGCTCATTTTGCCCCTGCTCCTCCTCCTGGTAGCACCGGAGCTAAGCCCGGTGCTAGTGGCATCGGAAGTGATTCTGGTAGCATTGGTAGTGCCGGAACTAACCCTGGTGCTGATGGCACCAGAGAAACCGAAAAAAACGCTGGTGGCTCAAAACCTAGTAGTGGTAGTGCCGGAACTAACCCTGGTGCTAGTGCTGTTGGCAACGGAGAAACCGAAAAAAATGCTGGTGGCTCAAAACCTAGCAGTGGTAGTGCTGGAACTAACCCTGGTGCTAGTGCTGGTGGCAACGGAGAAACCGAAAAAAACGTTGGTGGCTCAAAACCTAGCAGTGGTAAAGCCGGAACTAACCCTGGTGCTAATGCTGGTGGCAACGGAGGAACCGAAAAAAACGCTGGTGGCTCAAAATCTAGCAGTGGTAGTGCTCGAACTAACCCTGGTGCTAGTGCTGGTGGCAACGGAGAAACTGTTTCCAACATTGGAGATACGGAAAGTAACGCTGGTGGCTCGAAAAGTAATGATGGTGCTAACAATGGTGCTAGTGGCATTGAAAGTAATGCTGGTAGCACTGGAACTAACTTTGGTGCTGGTGGCACCGGGGGAATTGGAGATACGGAAAGTGATGCTGGTGGCTCCAAAACTAACTCTGGAAACGGCGGAACTAACGATGGTGCTAGTGGTATTGGAAGTAATGATGGTAGCACTGGAACTAACCCTGGTGCTGGTGGAGGAACAGATTCAAACATCGAAGGTACTGAAAATAACGTTGGTGGCAAGGAAACTAACCCTGGTGCTAGTGGCATTGGAAATAGTGATGGTAGCACTGGAACTAGCCCCGAAGGTACCGAAAGTAACGCTGACGGCACAAAAACTAACACGGGAGGCAAAGAATCTAACACCGGAAGTGAATCCAACACCAATTCTAGTCCACAAAAGTTGGAAGCACAAGGAGGCAATGGAGGAAATCAATGGGACGACGGAACCGATCATGATGGTGTGATGAAGATACATGTTGCAGTTGGTGGTCTAGGAATTGAGCAAATTAGATTTGATTATGTCAAGAACGGACAGTTGAAGGAAGGACCCTTCCACGGTGTCAAAGGAAGAGGTGGCACTTCAACGGTGCGTAAATTTTTATTATTATGGCTCAATTACGTTTTTCGAATAAGTGTTAATTCAAGATTATTGATCTTCATGATTCTGCAGATTGAGATTAGCCATCCGGACGAGTATCTTGTTTCCGTCGAGGGGTTGTACGACTCTTCCAATATCATTCAAGGAATCCAGTTTCAATCCAACAAACACACTTCTCAGTACTTTGGATATGAATATTATGGAGATGGTACACAATTTTCACTTCAAGTTAATGAAAAGAAGATCATTGGTTTCCATGGTTTTGCCGACTCACACCTTAATTCTCTTGGAGCTTATTTCGTTCCAATCTCATCCTCTTCTTCCTCCTTGACTCCTCCTCCCAACAAAGTTAAAGCTCAAGGAGGAAGTTATGGAGAAACATTTGACGATGGTGCTTTCGATCATGTAAGAAAGGTTTATGTTGGTCAAGGTGATTCTGGTGTAGCTTATGTCAAGTTCGATTATGAAAAAGACGGTAAAAAGGAGACACAAGAACATGGAAAAATGACATTGTCAGGAACAGAGGAGTTTGAGGTTGATTCAGACGATTACATAACATCAATGGAGGTTTATGTCGACAAAGTCTACGGTTATAAAAGCGAAATCGTCATTGCTCTTACCTTCAAGACCTTTAAGGGTGAAACTTCTCCACGTTTTGGAATAGAGACTGAGAATAAATATGAAGTTAAAGACGGTAAAGGAGGAAAACTTGCTGGTTTCCATGGAAAAGCTAGCGATGTTCTTTATGCTATTGGTGCTTATTTCATTCCAGCAGCAAATTAGagagtt
ACGTATGTCTTAGTTACTACTATCATACTATATTACTATGTATTGGAAAACTTTTGGTTAGAACCTGTTGGGAGGAAAGGGTTTATGTTCTGGTTCATTTTACGTGTACTAAGTACTTATAATTAAGATTAAAAGAAACATTTACAGCTTCACCCTCTGGTCGATGTATGTGGGCTGTGGGCATGTGGCCAATCTCTGAAGCGTTAGGTAGAGCAAATATAGAGTTGAGAGTTGCTTAAGTTAGTGAACGTGAATGACTAAAAAGATATGTTGCATTTAAATCGTATTGGGCCTCATCCCATCTAAAATATAGTAGGTGTAGGCCTTTTAGGTTAATTTGAATAAAATCAACCTTTTTGTAAGCAACATCGACGATTGTCACATTTTTCTCATACACATAGGTGTAATCTAGCTTTGAATGTTTTCTCATACACATAGGTGTAATCACCGTAATTATCATTTGTGAAGATATATGTTTTACCAAGTGGTTTGTATTGTCCATATATACTTTACCACTTTCATATTAACATATAATGTTTTTGTAAGTATTATACCATAAAGGATTGGTTTCTTAATATTATTAACAAAACGCAAAAATTCTTTTAAACGCAGGCGATTCCAATCCACAGCGTTGCGGTTAGAGTAGGATCAACACAAAGAGTAGTGATGGAGATCATAATCACATTCGCATTGGTCTACACTGTTTACGCCACAGCCATTGACTCCAACAATGGCACTCTCGGAACCATCGCTCCACTTGCTATCAGACTCATCGTTGGTGCTAACATTCTTGCAGCCGGCCCATTCTCTGGTGGTCCAATGAACCCTGGACGTTCTTTTGGATCATCTCTTGCCGTTGGAAATTTTTCAGGACATTAGgtttat
and here is the script I am running:
#!/usr/bin/perl
use strict;
use warnings;
# A program to find the first inframe stop codon of non-spliced intron containing genes
print "ENTER THE FILENAME FOR DNA SEQUENCES:= ";
# Asks for Sequence file and if file does not exist prints error message
my $filename = <STDIN>;
#my $sequence;
my #sequence;
chomp $filename;
unless (open(DNAFILE, $filename) ) {
print "Cannot open file \"$filename\"\n\n";
}
#sequence = <DNAFILE>;
close DNAFILE;
open (FILE, ">AtPTCindex.txt");
my $j;
my $i;
my $codon;
my $stopseq;
my $counter;
#Change $j<(375) to n=number of sequences
for ($j = 0; $j < #sequence; $j ++) {
$counter = 0;
for ($i = 0; $i < (length($sequence[$j]) - 2) && $counter < 1; $i += 3) {
$codon = substr($sequence[$j], $i, 3);
if ($codon =~ m/TAG|TGA|TAA/g) {
# m added before /TAG... above
$stopseq = substr($sequence[$j], $i, 9);
my $result = index($sequence[$j], $stopseq);
$counter = 1;
#my $results = index($sequence[$j], $stopseq);
print FILE "$result \n";
#print FILE "$results $j \n";
}
}
if ($counter == 0) {
print FILE "\n"
}
}
close FILE;
exit;
Thanks so much.
As threatened, the following is a cleaned up version of your script:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
die "Usage: $0 Filename\n" if #ARGV != 1;
my $file = shift;
open my $infh, '<', $file;
open my $outfh, '>', 'AtPTCindex.txt';
while (my $line = <$infh>) {
chomp($line);
my $result = '';
for (my $i = 0; $i < (length($line) - 2); $i += 3) {
my $codon = substr($line, $i, 3);
if ($codon =~ m/TAG|TGA|TAA/) {
# m added before /TAG... above
my $stopseq = substr($line, $i, 9);
$result = index($line, $stopseq);
$result .= " ($i, $codon, $stopseq)";
last;
}
}
print "$result\n";
# print $outfh "$result\n";
# print $outfh "$result $.\n";
}
close $infh;
close $outfh;
For the 5 lines of data that you provided, the following is the output:
84 (84, TGA, TGATCATTC)
3 (3, TGA, TGATTGTTT)
3 (3, TAA, TAATTTTTG)
4 (27, TAG, TAGTAAACA)
123 (123, TAA, TAAGATTAA)
I believe your issue is with these lines:
my $stopseq = substr($line, $i, 9);
$result = index($line, $stopseq);
You're pulling a sequence from the $line at position $i, and then immediately doing an index for it. In the case of 4 of 5 of those lines, it immediately finds the same value $i. However, in the case of line 4, it finds a matching sequence earlier in the line.
If this isn't desired, you'll have to explain what your desired behavior actually is. Perhaps, you just want $i? Or are you looking for a matching stop sequence any point AFTER $i? You'll have to specify what your actual logic wants to be.
I took a different approach, unpacking it into groups of three instead of counting by indexes of three. I believe this script does what you want, and it looks a lot cleaner. It can also optionally take the filename as argument.
#!/usr/bin/perl
use strict;
use warnings;
my $filename = 'a'; # dummy value
my $resultfile = 'AtPTCindex.txt';
# User may have passed filename as arguement
if (#ARGV) { if (-e $ARGV[0]) { $filename = $ARGV[0] } }
unless (-e $filename)
{
print "ENTER THE FILENAME FOR DNA SEQUENCES: ";
chomp($filename = <STDIN>)
}
open DNA,"<$filename" or die "Couldn't open $filename for reading: $!\n";
my #sequence = <DNA> or die "Couldn't read $filename: $!\n";;
close DNA;
# Uncomment the below line if you're braver than me
if (-e $resultfile) { die "Cowardly refusing to write to existing file" }
if (-e $resultfile) { unlink $resultfile };
open RESULT,">>$resultfile" or die "Courdn't open$!\n";
foreach my $string (#sequence)
{
# split into groups of 3
my #groups = unpack "(A3)*", $string;
# Search for the group you want
for (my $groupnum = 0; $groupnum < #groups - 1; $groupnum++)
{
if ($groups[$groupnum] =~ m/(TAG|TGA|TAA)/g)
{
print RESULT (($groupnum + 0) * 3) . "\n";
print "$1 (" . $1 . ( $groups[$groupnum + 1]) . ($groups[$groupnum + 2]) . ") at index " . (($groupnum + 0) * 3) . "\n";
last;
}
}
}
close RESULT;
Running the script on your sample data, it outputs:
TGA (TGATCATTC) at index 84
TGA (TGATTGTTT) at index 3
TAA (TAATTTTTG) at index 3
TAG (TAGTAAACA) at index 27
TAA (TAAGATTAA) at index 123
...as well as writes the raw index numbers to the file specified.

getting arrays and replacing spaces with comma

I'm working on something the whole day and night but it seems like i'm not getting any further with it cause it's a bit complicating for me to learn actually :(
The code is like this:
$aDoor = $_POST['zeilenid'];
if(empty($aDoor))
{
echo("You didn't select anything.");
}
else
{
$N = count($aDoor);
echo("You selected $N entry ID(s): ");
for($i=0; $i < $N; $i++)
{
$str = $aDoor[$i];
$str = ereg_replace(" ",",",$str);
echo($str . " ");
}
}
it doesn't work tho - i want the IDs, which will showing to NOT show them this way: 8 9 10
but rather this way: 8, 9, 10
and save it in ONE variable! how cna i make this happen? What am I doing wrong here?
just try:
$aDoor = $_POST['zeilenid'];
if(empty($aDoor))
{
echo("You didn't select anything.");
}
else
{
$N = count($aDoor);
echo("You selected $N entry ID(s): ");
echo join(", ", $aDoor);
}
join combines all elements of the array into one string and inserts the first parameter between each element
edit: save it:
$str = join(", ", $aDoor);
$clean_search = str_replace(' ', ',', $user_search);
$search_words = explode(' ', $clean_search);
$final_search_words = array();
if (count($search_words) > 0) {
$final_search_words[] = $user_search;
foreach ($search_words as $word) {
if (!empty($word)) {
$final_search_words[] = $word;
}
}
}
This code will replace empty space with comma, and the if condition is used to replace extra spaces with only one space.

Resources