How to create a table using two arrays in perl - arrays

Need a help to create a table using two arrays or files in perl cgi.
I need to create a table that print the list of directories from different path then put it into table say the title on column one says path1 and column 2 path2 and so on, and each column list the directory from that path with href link..here what I do have.
opendir(D, "../abc/status") or die"$!";
my #path1_dir = sort readdir D; closedir D;
opendir(D, "../def/status") or die "$!";
my #path2_dir = sort readdir D; closedir D; .... ...
print "\n"; print "$path1_dir\n"; print "$path2_dir\n";
#print list of directories to column-1 with title Path1
foreach my $path (#path1_dir) {
print "\t\n";
next if ($path =~ /^./);
next if ($path =~ /^\s*$/);
print "$path\n";
}
#this should go to the column two with Path2 title but it does not
foreach my $path (#path2_dir) {
print "\t\n"; `enter code here`
next if ($path =~ /^./);
next if ($path =~ /^\s*$/);
print "$path\n";
}
Can someone help me on this if you can?

I think you want something like this based on your description -- ie printing in two columns
$dir1 = "../abc/status";
$dir2 = "../def/status";
opendir(D, $dir1) or die"$!";
my #path1_dir = sort grep { !/(^\.|^\s*$)/ } readdir D; closedir D;
opendir(D, $dir2) or die "$!";
my #path2_dir = sort grep { !/(^\.|^\s*$)/ } readdir D; closedir D;
print "$dir1\t$dir2\n";
# figure out which one has more files
$limit = $#path1_dir < $#path2_dir ? $#path2_dir : $#path1_dir;
# print in 2 columns
for ($i = 0; $i<=$limit; $i++) {
printf "%s\t%s\n",
($i<=$#path1_dir ? $path1_dir[$i] : ""),
($i <= $#path2_dir ? $path2_dir[$i] : ""),"\n";
}

Although it is not clear what exactly you want, I'm guessing you want to have an ls style output (only directories) for an FTP or something. The variable naming might be ambiguous, but basically what it's doing is printing out one directory from each directory at a time.
#!/usr/bin/perl
use strict;
use warnings;
my %dirs;
my #dirs_to_look = qw(../abc ../def);
my $max_num_of_dirs;
foreach my $dir (#dirs_to_look) {
# Directories that were found in $dir
my #dirs_in_dir = grep { -d } glob "$dir/*";
$max_num_of_dirs = scalar #dirs_in_dir
if not defined $max_num_of_dirs or $max_num_of_dirs < scalar #dirs_in_dir;
$dirs{$dir} = \#dirs_in_dir;
}
my #keys = sort {$a cmp $b} keys %dirs;
# Print the column titles
foreach my $key (#keys) {
printf "%15s", $key;
}
print "\n";
for (my $i = 0; $i < $max_num_of_dirs; $i++) {
foreach my $key (#keys) {
my $dir = shift #{ $dirs{$key} };
printf "%15s", $dir // "";
}
print "\n";
}

Related

Is it possible to put elements of array to hash in perl?

example of file content:
>random sequence 1 consisting of 500 residues.
VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV
HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;
>random sequence 2 consisting of 500 residues.
KAAATKKPWADTIPYLLCTFMQTSGLEWLHTDYNNFSSVVCVRYFEQFWVQCQDHVFVKN
KNWHQVLWEEYAVIDSMNFAWPPLYQSVSSNLDSTERMMWWWVYYQFEDNIQIRMEWCNI
YSGFLSREKLELTHNKCEVCVDKFVRLVFKQTKWVRTMNNRRRVRFRGIYQQTAIQEYHV
HQKIIRYPCHVMQFHDPSAPCDMTRQGKRMNFCFIIFLYTLYEVKYWMHFLTYLNCLEHR;
>random sequence 3 consisting of 500 residues.
AYCSCWRIHNVVFQKDVVLGYWGHCWMSWGSMNQPFHRQPYNKYFCMAPDWCNIGTYAWK
I need an algorithm to build a hash $hash{$key} = $value; where lines starting with > are the values and following lines are the keys.
What I have tried:
open (DATA, "seq-at.txt") or die "blabla";
#data = <DATA>;
%result = ();
$k = 0;
$i = 0;
while($k != #data) {
$info = #data[$k]; #istrina pirma elementa
if(#data[$i] !=~ ">") {
$key .= #data[$i]; $i++;
} else {
$k = $i;
}
$result{$key} = $value;
}
but it doesn't work.
You don't have to previously use an array, you can directly build your hash:
use strict;
use warnings;
# ^- start always your code like this to see errors and what is ambiguous
# declare your variables using "my" to specify the scope
my $filename = 'seq-at.txt';
# use the 3 parameters open syntax to avoid to overwrite the file:
open my $fh, '<', $filename or die "unable to open '$filename' $!";
my %hash;
my $hkey = '';
my $hval = '';
while (<$fh>) {
chomp; # remove the newline \n (or \r\n)
if (/^>/) { # when the line start with ">"
# store the key/value in the hash if the key isn't empty
# (the key is empty when the first ">" is encountered)
$hash{$hkey} = $hval if ($hkey);
# store the line in $hval and clear $hkey
($hval, $hkey) = $_;
} elsif (/\S/) { # when the line isn't empty (or blank)
# append the line to the key
$hkey .= $_;
}
}
# store the last key/val in the hash if any
$hash{$hkey} = $hval if ($hkey);
# display the hash
foreach (keys %hash) {
print "key: $_\nvalue: $hash{$_}\n\n";
}
It is unclear what you want, the array seems to be the lines subsequent to the random sequence number... If the contenst of a file test.txt are:
Line 1:">"random sequence 1 consisting of 500 residues.
Line 2:VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV
Line 3:HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;
You could try something like:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $contentFile = $ARGV[0];
my %testHash = ();
my $currentKey = "";
open(my $contentFH,"<",$contentFile);
while(my $contentLine = <$contentFH>){
chomp($contentLine);
next if($contentLine eq ''); # Empty lines.
if($contentLine =~ /^"\>"(.*)/){
$currentKey= $1;
}else{
push(#{$testHash{$currentKey}},$contentLine);
}
}
print Dumper(\%testHash);
Which results in a structure like this:
seb#amon:[~]$ perl test.pl test.txt
$VAR1 = {
'random sequence 3 consisting of 500 residues.' => [
'AYCSCWRIHNVVFQKDVVLGYWGHCWMSWGSMNQPFHRQPYNKYFCMAPDWCNIGTYAWK'
],
'random sequence 1 consisting of 500 residues.' => [
'VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV',
'HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;'
],
'random sequence 2 consisting of 500 residues.' => [
'KAAATKKPWADTIPYLLCTFMQTSGLEWLHTDYNNFSSVVCVRYFEQFWVQCQDHVFVKN',
'KNWHQVLWEEYAVIDSMNFAWPPLYQSVSSNLDSTERMMWWWVYYQFEDNIQIRMEWCNI',
'YSGFLSREKLELTHNKCEVCVDKFVRLVFKQTKWVRTMNNRRRVRFRGIYQQTAIQEYHV',
'HQKIIRYPCHVMQFHDPSAPCDMTRQGKRMNFCFIIFLYTLYEVKYWMHFLTYLNCLEHR;'
]
};
You would be basically using each hash "value" as an array structure, the #{$variable} does the magic.

Perl retrieve index of array on regex match and print

I am looking to extract columns based off of header names in a comma (or tab) delimited file. I have a scalar variable that matches many header possibilities I named '$Acct_Name', among ones. I want to read the file(column headers), match it to what I have in '$Acct_Name' and print the matched column along with its data etc.
Here is my code:
open(FILE, "list_2.txt") or die "Cannot open file: $!";
my $Account_Name = qr/^Acct ID$|^Account No$|^Account$|^ACCOUNT NUMBER$|Account Number|Account.*?Number|^Account$|^Account #$|^Account_ID$|^Account ID$/i;
my $CLIENT = qr/^CLIENT_NAME$|^Account Long Name$|^ACCOUNT NAME$|^Account Name$|^Name$|portfolio.*?description|^Account Description$/i;
while (my $line = <FILE>) {
chomp $line;
my #array = split(/,/, $line);
my %index;
#index{#array} = (0..$#array);
my $Account_Name_ = $index{$Account_Name};
if (my ($matched) = grep $array[$_] =~ /$Account_Name/, 0..$#array) {
$Account_Name_ = $matched;
my $CLIENT_ = $index{$CLIENT};
if (my ($matched) = grep $array[$_] =~ /$CLIENT/, 0..$#array) {
$CLIENT_ = $matched;
print $array[$Account_Name_],",",$array[$CLIENT_],"\n";
}
}
}
close(FILE);
Data, list_2.txt
Account number,order_num,Name
dj870-1234,12334566,josh trust 1992
My Results
Account number,Name
Desried Out
Account number,Name
dj870-1234,josh
For some reason I am only able to print the column names based on the match. How can I grab the data as well?
You need to move your print statement to output your data lines - these do not match, so in the original code the print statement is never reached !
use warnings;
open(FILE, "list_2.txt") or die "Cannot open file: $!";
my $Account_Name = qr/^Acct ID$|^Account No$|^Account$|^ACCOUNT NUMBER$|Account Number|Account.*?Number|^Account$|^Account #$|^Account_ID$|^Account ID$/i;
my $CLIENT = qr/^CLIENT_NAME$|^Account Long Name$|^ACCOUNT NAME$|^Account Name$|^Name$|portfolio.*?description|^Account Description$/i;
my ($Account_Name_, $CLIENT_);
while (my $line = <FILE>) {
chomp $line;
my #array = split(/,/, $line);
if (my ($matched) = grep $array[$_] =~ /$Account_Name/, 0..$#array) {
$Account_Name_ = $matched;
if (my ($matched) = grep $array[$_] =~ /$CLIENT/, 0..$#array) {
$CLIENT_ = $matched;
}
}
print $array[$Account_Name_],",",$array[$CLIENT_],"\n";
}
close(FILE);

Perl Script that should move through array with i=3 prints indicies that aren't x3

I have these arrays of Sequences and I wrote this script to walk through each sequence three letters at a time (eg. {0,1,2}, {3,4,5},{6,7,8}) and print the index of where it first encounters a certian 3 letter combination (TAA,TAG,TGA). (EX. if sequence were CGTAGCCCCTAACCCC, then the script would skip over the TAG in the 2 position because its not in the correct frame of 3 and report the TAA in the 9 position). Therefore, I am only expecting indices in multiples of 3 in my results.
On most strings there is no problem, however every once in a while it will index at 4 or other non multiples of three. I was wondering if anyone more advanced than I can figure out why this may happen. I know this script is ugly and I am sorry for that, I am a biologist and I mod it for whatever I am mining out of sequences at the time. I just can't figure out the bug.
Here are some sequences from my file. The 3rd line is the sequence that gives the strange result. Just for an example of what I am dealing with.
AGGTACGCGAGTCACCTTTCGTCTTCAATCTCGTTTGATCGAAGCTATTTGTCAAAAAGAGAGGATTTTTTTGCATCTCAATTATGATCATTCCTTAGGGTTTTCAGGGTTTTGGATTGTTGTTTTTGTTAACATTTATCTGATTCGTTTGTATTTGTGTGGCAGTCTAAAGTGGCATCAACAATGGCGTCTTTTATTATACATAAGCCAAAGGAGAGATCGCCTTTCACGAAAGCTGCTTTCAAAACGGTACCTTTAGTGATTCAGCATTTTTATCTGAAATATGTTTGTTGCATTATTGAATGATTCTGATGTGGTGTTGCTACCAACTTGTCTATGTTGGTTGATTTAGCTTGATAGCATCAAGGAGTTGGAACTGTTTATGTTGAAGCATCGAAAGGATTATGTTGATCTGCACCGGACTACAGAACAGGAAAAGGATAGTATTGAACAAGAAGTAAGTACTCTGAGCTAGGCTTGCCCGTAGTATATATCTGAACTCATGAAGTTACTGCGATAAATCTATGCTTGAGTTGAGATTGAACATATGGAACTATGGAATCATAAGAAATGTAGCAACTCATATTGAGATAACTCAGGAAGATTAATGTCTATTACTTTAGATAGCGAGGGAGTTAGTATATTGTGACACTGAGGAACTTGGATCTTGTATTCTTATACCTCTTGCAGTGTTTGATCGAGAACTATGTCTACTTATGTGTTGTGTAATATCATCAAACTCTCTCTCTCTCCCTCTTGCAGGTTGCTGCTTTTATTAAAGCTTGCAAAGAACAGATCGATATTCTCATAAACAGTATTAGAAATGAAGAAGCAAACTCCAAAGGATGGCTTGGCCTCCCCGCAGATAACTTCAATGCTGATTCTATAGCACACAAACATGGAGTGGTATGATATGCACCAATGTAGTAAGCCAACTTTGGTTTTTTTTTACTATGTTTTCTTTCAAAGTATCTAGATGTGTAGAAGTAATGGTAATTTTTTTTGTATGCAGGTTTTGATTCTGAGTGAGAAACTTCATTCAGTCACTGCCCAGTTTGATCAGCTTAGAGCTACTCGTTTCCAAGATATTATAAACAGAGCTATGCCGAGAAGAAAACCTAAGAGGGTCATAAAGGAAGCTACCCCAATTAATACAACTCTGGGAAATTCGGAGTCCATAGAACCGGATGAAATCCAGGCCCAACCTCGTAGATTACAACAACAACAACTTCTAGACGATGAAACACAAGCCCTTCAGGTAACAAGGCAAATATACATGATCTTCGAAAACTTGCATAAGTTTTGTAGTTATGCTAAATTTTGAAATTGATAATTTTTGCAGGTAGAGCTAAGTAATCTTTTAGATGGTGCTAGGCAGACAGAAACTAAGATGGTGGAGATGTCTGCATTAAACCACTTGATGGCAACTCATGTTCTGCAGCAAGCCCAACAGATAGAGTTTCTTTATGACCAGGTTAGGACTTATTAACTTCTCTAACGCTCTCATGTCAACACACTGTTTTGTTAGGCTTTCACTGTTCTTTACACTCCTTTGCTATCTCAAAGTTAAATTCGGATGCTTATTGTATTCAGAACTTTTCCTTGTCACATTCACCTAAATTAGGTATAGAGACGGGAAAGAAACTTTGTATTGGTCCAATTTTAATTGCTCTCCAATTTAGTGGTAGGAAATGGAACGGTTAATGTTTTTAGCTATGTAAAGTCTCTAAAACTCCATTTGAATGTGTCAATGACTCAATGCCATTCCCAATACTTTAGTTTATGGGGCTTTGCAGTTTTCCTACTCTGTAAACGTACAGCTTATGACTGACTTGGTGGCTCTCTTTATGTGTGTGTGTGTGTGTCTTGAGGCCCTTTTTCTCACTCAGTTTGACACTAAATGCAGGCAGTTGAGGCAACAAAGAACGTGGAGCTTGGAAACAAAGAGCTTTCTCAAGCAATCCAACGAAACAGCAGCAGCAGAACCTTTCTCTTACTGTTTTTCTTCGTCCTTACTTTCTCCGTCTTGTTCTTGGATTGGTACAGTTAAaaaacc
AGGTGATTGTTTTGTTATTATAAATCAAGATCAGTACATATATATTTTTGTTTTTCTTGGTTTCATATGTAATATTTTGGACTTTTGGTGTTTAGGTTTTTGACTTGGAAGAAAAGAACGTAATGGATGAGTCACTACACGAGGTGTATAAATTTTGCCTCACCGATGTTGATGAGAGAAGCAAGAAAGAGACATCAATGAAAGATGATTACATAGAACATAAGAAGTCTACTAGATTGTTGGCTGAAAATGCGAAGAAGTCCGGTCACAGTTTAGAAATATTAAGGCCGGAATCTAAACCTGAGACTGAAAAAGAGGTGATTTTATTTTCTTGTTATATAAAGATTCGTAGACATATATTTGGTTTTTCTTTGGTTTCATAATATTTTGGACTTATGTGTGTTTAGGTCAATGAAGAGGAAGAGAAGAGAGTAATGGATCCGGATGTGGATATTAGTTGTTATGAAGAGTCACCACACGAGGTGTATAAATTTAGCCTCACCGATTTCGAAGAAGAGATAATGGAAGATGATTACAGAGAAGATATGAAGTGTAGAATGTTGGATGATATAGTGAAGAATTCCGGTCACCGTGTAGAAATATCAAGGCCGGAATATTATAAACCTGAGATTGAAAAACAGGTTTTATTTTTTTGGTTATTTTGTGATTAAGATCAGTTTTTTTTTTTTTTTTTTTTGGTTTAATAATATTTGATCTTGTGTGTGTTTAGGTATATGAAAAGGAAGAGAAGAAAGTAATGGATCCGGATATCTATATTAGATCTTATGAAGAGTCACCAAACGAGGTGTATAAATTTAGCCTCACTGATTTGGAAGAAGAGATAATGGAAAATGACTCCATAGAAGGTGTGAAGTGTAGAATGTTGGATGAAATAATGAAGAAGTCCGGTCACCATTTAAAAATATCAAGGCCGGAATATAAACCTGAGATTGAAAAACAGGTTAGTTTTTAATAAAAAGATCACTAGATATTTTTTTTTATTTTTTTTTGTTTTTGGTTTCATAATATTTGACTTGTGGCATGTGTTTAGGTATATGAAGAGGAAGAGAAGAAAGTAATGGATCCAGATGTGGATATTAGATGTTATGAAGAGTCACCACACGAGGTGTCTAAATTTAGCCTCACCGATTTCGAAGAAGAGATAATGGAAGATGATTACATAGAAGCTTTGAAGTGTAGAATGTTGGATGATATATTGAAGAAGTCCGGTCACCGTTTAGAAATATCAAGGCGGCAATATAATAAACCTGAGATTGAAATACAGGTGATTTTTTTTTTTTATTATTGTTGTTATAGTAAGATCAGTAGATATATATCTTGGTTTCATAATATTTTGGACTTGTGTGTGTTTAGGTCAATGAAAAGGAAGAGAAGAAAGTAATCAATACGGATATGGATATTAGATATGATGATGAGTCACCAGAAGAGGTGGAGACATATTCTAGTCTCACGGATGATGAAGAAGAGAGAAGCAAGGAAGATACATCAATGGAAGATGTGAAGTGTAGAATGTTGGATTAAAAAACGACGAAGCTCGGCCACCTTTTAGGAATATCAAGGCCGGAATATAGACCTGAGATTGAAAAACAGGTGATTTTATTTTGTTGTTAATTGTATTAGTAAAGATCAGTAGATATATATTTGTTTTTGTTTTTCGGTTTCATAATATTTTGGACGCTTGTGTTTAGGTCAATGAAGAGAAAGAAAGAAAGTAATGGATATTAGATCTGCTGGTCAGTCACAAACACGAGGTGTACAAATTTAGCCTCACCGATATCAAAGAAGAGAGAAGCAATGAAGATACATCAATGGAAGATTGTTGCATAGAAGAGGCTCAAGTCGGAAAAGATCAAAGAGTCTTCAGATTCAGAGAAAGTAGTGAAGAGAAGAGAAAATCCTCATCATCACCATTATCACCACTAACAGAGTTTAGGGATATGGAGAGTTTGACGTATTACATGAGGCAAAAAGGGATGCATCGAAGAAGAAGAAGATCATCAACATCACCACATTGTTGCCATAATGTAGTATACAATGAGTTTAAAGTGACGAAGGAAGAAGAAGAGGAAGAAAGACAAAGATTAACAACCAAACGTGTTCATTCTAAGCTTCATGAATACGAACAATTTTTAACTCAGTTTAAAAAGAAGAAGGAAGAAGAAAACGAGAGACGAAGATTATCACCCAAAGACTTTGAGCCTACGCTTCCTGATTACGACCAAGTGATTACTCGCTTTAGAGTGCTGGAGAAGGAAGAAGAAGAAAGACGAAGATTAGCAACAAAACATGTTCATCCTAAGCTTCCTGATTACGACCAGATTGCTACTAAGTTTAAACTCCTGAAGGAGGTAGAAAAAGAAAGACGAAGATTATTAACCAAACACAGTTCATCCTAAgcttcc
TGGTAATTTTTGCATCTTCAAAATGTTCTAAAATTTTGGCAAATGGTTTTGTTAAGTTCGAATTTTTGGTTATGATACAGTTTGAACGTTTTTCTTCATAGATTACAGTTTTAGCAAATGTGAATCATTAAAAGTGGAATAGTTGGTTTGAAAACAATTGTCAATTTCATTTTTTTTTTGGTTTTATGGTTAGGCGAGGAAAGCATTAAGAGCTTTGAAAGGTATAGTGAAGCTACAAGCATTAGTGAGAGGATACTTAGTAAGGAAACGCGCGGCCGCAATGTTGCAGAGCATACAAACTTTGATCAGAGTCCAAACCGCTATGCGATCAAAACGCATCAATCGCAGCCTCAACAAAGAGTACAACAACATGTTTCAACCTCGACAATCCTTTGTAAAGAACTATTCTCATTTCCATTGGCTCTCTTTTTTTCTTTAAGCCAAAACAAGACTTAAAGTGTGTCCTCTGTTTGTAGGATAAGTTTGATGAAGCAACGTTCGATGACAGAAGAACAAAGATTGTAGAGAAGGACGATAGATACATGAGAAGATCAAGTTCAAGATCAAGATCTAGACAAGTGCACAATGTTGTTTCAATGTCTGACTATGAAGGCGATTTTGTTTACAAAGGGAATGATTTGGAGTTGTGTTTCTCGGATGAGAAGTGGAAGTTTGCTACCGCGCAGAACACGCCGAGATTATTGCATCACCATTCTGCTAATAATCGCTATTATGTAATGCAGTCTCCAGCTAAGAGTGTTGGTGGAAAGGCTTTGTGTGACTATGAAAGCAGTGTGAGTACTCCTGGCTACATGGAGAAAACTAAGTCCTTTAAGGCAAAAGTGCGTTCACACAGCGCACCGCGCCAGCGATCTGAGAGGCAGAGGTTGTCGCTAGATGAAGTTATGGCCTCTAAGAGTAGCGTTAGCGGTGTGAGTATGTCGCATCAGCATCCACCACGCCATTCTTGTTCCTGTGATCCGCTTTAActtaac
GAGTTAGTAAACAAAGTGTTCACATTTTAGTAAACATTGTTGTTCGTTAATCACGTAACGTTTTGTTTTTCCAGTTTACACTGAGCTCTGATGAGTATATAACGGAGGTGAATGGTTACTACAAAACTACGTTTTCGGGAGAAGTCATAACGTCGTTGACGTTCAAGACGAACAAAAGGACATATGGGACTTACGGAAATAAAACCAGTAGCTACTTTTCTGTTGCCGCACCCAAAGATAACCAGATTGTCGGTTTTCTTGGAAGTAGCAGCCATGCTCTCAACTCCATCGACGCTCATTTTGCCCCTGCTCCTCCTCCTGGTAGCACCGGAGCTAAGCCCGGTGCTAGTGGCATCGGAAGTGATTCTGGTAGCATTGGTAGTGCCGGAACTAACCCTGGTGCTGATGGCACCAGAGAAACCGAAAAAAACGCTGGTGGCTCAAAACCTAGTAGTGGTAGTGCCGGAACTAACCCTGGTGCTAGTGCTGTTGGCAACGGAGAAACCGAAAAAAATGCTGGTGGCTCAAAACCTAGCAGTGGTAGTGCTGGAACTAACCCTGGTGCTAGTGCTGGTGGCAACGGAGAAACCGAAAAAAACGTTGGTGGCTCAAAACCTAGCAGTGGTAAAGCCGGAACTAACCCTGGTGCTAATGCTGGTGGCAACGGAGGAACCGAAAAAAACGCTGGTGGCTCAAAATCTAGCAGTGGTAGTGCTCGAACTAACCCTGGTGCTAGTGCTGGTGGCAACGGAGAAACTGTTTCCAACATTGGAGATACGGAAAGTAACGCTGGTGGCTCGAAAAGTAATGATGGTGCTAACAATGGTGCTAGTGGCATTGAAAGTAATGCTGGTAGCACTGGAACTAACTTTGGTGCTGGTGGCACCGGGGGAATTGGAGATACGGAAAGTGATGCTGGTGGCTCCAAAACTAACTCTGGAAACGGCGGAACTAACGATGGTGCTAGTGGTATTGGAAGTAATGATGGTAGCACTGGAACTAACCCTGGTGCTGGTGGAGGAACAGATTCAAACATCGAAGGTACTGAAAATAACGTTGGTGGCAAGGAAACTAACCCTGGTGCTAGTGGCATTGGAAATAGTGATGGTAGCACTGGAACTAGCCCCGAAGGTACCGAAAGTAACGCTGACGGCACAAAAACTAACACGGGAGGCAAAGAATCTAACACCGGAAGTGAATCCAACACCAATTCTAGTCCACAAAAGTTGGAAGCACAAGGAGGCAATGGAGGAAATCAATGGGACGACGGAACCGATCATGATGGTGTGATGAAGATACATGTTGCAGTTGGTGGTCTAGGAATTGAGCAAATTAGATTTGATTATGTCAAGAACGGACAGTTGAAGGAAGGACCCTTCCACGGTGTCAAAGGAAGAGGTGGCACTTCAACGGTGCGTAAATTTTTATTATTATGGCTCAATTACGTTTTTCGAATAAGTGTTAATTCAAGATTATTGATCTTCATGATTCTGCAGATTGAGATTAGCCATCCGGACGAGTATCTTGTTTCCGTCGAGGGGTTGTACGACTCTTCCAATATCATTCAAGGAATCCAGTTTCAATCCAACAAACACACTTCTCAGTACTTTGGATATGAATATTATGGAGATGGTACACAATTTTCACTTCAAGTTAATGAAAAGAAGATCATTGGTTTCCATGGTTTTGCCGACTCACACCTTAATTCTCTTGGAGCTTATTTCGTTCCAATCTCATCCTCTTCTTCCTCCTTGACTCCTCCTCCCAACAAAGTTAAAGCTCAAGGAGGAAGTTATGGAGAAACATTTGACGATGGTGCTTTCGATCATGTAAGAAAGGTTTATGTTGGTCAAGGTGATTCTGGTGTAGCTTATGTCAAGTTCGATTATGAAAAAGACGGTAAAAAGGAGACACAAGAACATGGAAAAATGACATTGTCAGGAACAGAGGAGTTTGAGGTTGATTCAGACGATTACATAACATCAATGGAGGTTTATGTCGACAAAGTCTACGGTTATAAAAGCGAAATCGTCATTGCTCTTACCTTCAAGACCTTTAAGGGTGAAACTTCTCCACGTTTTGGAATAGAGACTGAGAATAAATATGAAGTTAAAGACGGTAAAGGAGGAAAACTTGCTGGTTTCCATGGAAAAGCTAGCGATGTTCTTTATGCTATTGGTGCTTATTTCATTCCAGCAGCAAATTAGagagtt
ACGTATGTCTTAGTTACTACTATCATACTATATTACTATGTATTGGAAAACTTTTGGTTAGAACCTGTTGGGAGGAAAGGGTTTATGTTCTGGTTCATTTTACGTGTACTAAGTACTTATAATTAAGATTAAAAGAAACATTTACAGCTTCACCCTCTGGTCGATGTATGTGGGCTGTGGGCATGTGGCCAATCTCTGAAGCGTTAGGTAGAGCAAATATAGAGTTGAGAGTTGCTTAAGTTAGTGAACGTGAATGACTAAAAAGATATGTTGCATTTAAATCGTATTGGGCCTCATCCCATCTAAAATATAGTAGGTGTAGGCCTTTTAGGTTAATTTGAATAAAATCAACCTTTTTGTAAGCAACATCGACGATTGTCACATTTTTCTCATACACATAGGTGTAATCTAGCTTTGAATGTTTTCTCATACACATAGGTGTAATCACCGTAATTATCATTTGTGAAGATATATGTTTTACCAAGTGGTTTGTATTGTCCATATATACTTTACCACTTTCATATTAACATATAATGTTTTTGTAAGTATTATACCATAAAGGATTGGTTTCTTAATATTATTAACAAAACGCAAAAATTCTTTTAAACGCAGGCGATTCCAATCCACAGCGTTGCGGTTAGAGTAGGATCAACACAAAGAGTAGTGATGGAGATCATAATCACATTCGCATTGGTCTACACTGTTTACGCCACAGCCATTGACTCCAACAATGGCACTCTCGGAACCATCGCTCCACTTGCTATCAGACTCATCGTTGGTGCTAACATTCTTGCAGCCGGCCCATTCTCTGGTGGTCCAATGAACCCTGGACGTTCTTTTGGATCATCTCTTGCCGTTGGAAATTTTTCAGGACATTAGgtttat
and here is the script I am running:
#!/usr/bin/perl
use strict;
use warnings;
# A program to find the first inframe stop codon of non-spliced intron containing genes
print "ENTER THE FILENAME FOR DNA SEQUENCES:= ";
# Asks for Sequence file and if file does not exist prints error message
my $filename = <STDIN>;
#my $sequence;
my #sequence;
chomp $filename;
unless (open(DNAFILE, $filename) ) {
print "Cannot open file \"$filename\"\n\n";
}
#sequence = <DNAFILE>;
close DNAFILE;
open (FILE, ">AtPTCindex.txt");
my $j;
my $i;
my $codon;
my $stopseq;
my $counter;
#Change $j<(375) to n=number of sequences
for ($j = 0; $j < #sequence; $j ++) {
$counter = 0;
for ($i = 0; $i < (length($sequence[$j]) - 2) && $counter < 1; $i += 3) {
$codon = substr($sequence[$j], $i, 3);
if ($codon =~ m/TAG|TGA|TAA/g) {
# m added before /TAG... above
$stopseq = substr($sequence[$j], $i, 9);
my $result = index($sequence[$j], $stopseq);
$counter = 1;
#my $results = index($sequence[$j], $stopseq);
print FILE "$result \n";
#print FILE "$results $j \n";
}
}
if ($counter == 0) {
print FILE "\n"
}
}
close FILE;
exit;
Thanks so much.
As threatened, the following is a cleaned up version of your script:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
die "Usage: $0 Filename\n" if #ARGV != 1;
my $file = shift;
open my $infh, '<', $file;
open my $outfh, '>', 'AtPTCindex.txt';
while (my $line = <$infh>) {
chomp($line);
my $result = '';
for (my $i = 0; $i < (length($line) - 2); $i += 3) {
my $codon = substr($line, $i, 3);
if ($codon =~ m/TAG|TGA|TAA/) {
# m added before /TAG... above
my $stopseq = substr($line, $i, 9);
$result = index($line, $stopseq);
$result .= " ($i, $codon, $stopseq)";
last;
}
}
print "$result\n";
# print $outfh "$result\n";
# print $outfh "$result $.\n";
}
close $infh;
close $outfh;
For the 5 lines of data that you provided, the following is the output:
84 (84, TGA, TGATCATTC)
3 (3, TGA, TGATTGTTT)
3 (3, TAA, TAATTTTTG)
4 (27, TAG, TAGTAAACA)
123 (123, TAA, TAAGATTAA)
I believe your issue is with these lines:
my $stopseq = substr($line, $i, 9);
$result = index($line, $stopseq);
You're pulling a sequence from the $line at position $i, and then immediately doing an index for it. In the case of 4 of 5 of those lines, it immediately finds the same value $i. However, in the case of line 4, it finds a matching sequence earlier in the line.
If this isn't desired, you'll have to explain what your desired behavior actually is. Perhaps, you just want $i? Or are you looking for a matching stop sequence any point AFTER $i? You'll have to specify what your actual logic wants to be.
I took a different approach, unpacking it into groups of three instead of counting by indexes of three. I believe this script does what you want, and it looks a lot cleaner. It can also optionally take the filename as argument.
#!/usr/bin/perl
use strict;
use warnings;
my $filename = 'a'; # dummy value
my $resultfile = 'AtPTCindex.txt';
# User may have passed filename as arguement
if (#ARGV) { if (-e $ARGV[0]) { $filename = $ARGV[0] } }
unless (-e $filename)
{
print "ENTER THE FILENAME FOR DNA SEQUENCES: ";
chomp($filename = <STDIN>)
}
open DNA,"<$filename" or die "Couldn't open $filename for reading: $!\n";
my #sequence = <DNA> or die "Couldn't read $filename: $!\n";;
close DNA;
# Uncomment the below line if you're braver than me
if (-e $resultfile) { die "Cowardly refusing to write to existing file" }
if (-e $resultfile) { unlink $resultfile };
open RESULT,">>$resultfile" or die "Courdn't open$!\n";
foreach my $string (#sequence)
{
# split into groups of 3
my #groups = unpack "(A3)*", $string;
# Search for the group you want
for (my $groupnum = 0; $groupnum < #groups - 1; $groupnum++)
{
if ($groups[$groupnum] =~ m/(TAG|TGA|TAA)/g)
{
print RESULT (($groupnum + 0) * 3) . "\n";
print "$1 (" . $1 . ( $groups[$groupnum + 1]) . ($groups[$groupnum + 2]) . ") at index " . (($groupnum + 0) * 3) . "\n";
last;
}
}
}
close RESULT;
Running the script on your sample data, it outputs:
TGA (TGATCATTC) at index 84
TGA (TGATTGTTT) at index 3
TAA (TAATTTTTG) at index 3
TAG (TAGTAAACA) at index 27
TAA (TAAGATTAA) at index 123
...as well as writes the raw index numbers to the file specified.

Alter code to sum values from multiple files

Curious if I can get a little help here. I'm a perl newbie, and can't figure out how to convert the following code into something a bit more useful for my analysis.
This code presently takes the 1st and 4th column from a user supplied list of data files and puts them together.
What I'd like my code to do, for each row of the "current output" generated by this code (see below), is make a sum of these 4th column values (filea, fileb, filec). Not quite sure how to implement this...
Current Output:
filea fileb filec
entrya | 0 |10.2 | 0
entryb | 0 | 0.0 | 1
entryc | 8 | 57.0| 46
desired output
sum
entrya | 10.2
entryb | 1
entryc | 111
current code looks like this:
main: {
my %data;
foreach my $file (#rsem_files) {
open (my $fh, $file) or die "Error, cannot open file $file";
my $header = <$fh>; # ignore it
while (<$fh>) {
chomp;
my #x = split(/\t/);
my $acc = $x[0];
my $count = $x[4];
$data{$acc}->{$file} = $count;
}
close $fh;
}
my #filenames = #rsem_files;
foreach my $file (#filenames) {
$file = basename($file);
}
print join("\t", "", #filenames) . "\n";
foreach my $acc (keys %data) {
print "$acc";
foreach my $file (#rsem_files) {
my $count = $data{$acc}->{$file};
unless (defined $count) {
$count = "NA";
}
print "\t$count";
}
print "\n";
}
exit(0);
}
Alter the #rsemfiles loop:
# create $total variable outside loop
my $total = 0;
foreach my $file (#rsem_files) {
my $count = $data{$acc}->{$file};
# change unless to if, no need for NA
if (defined $count) {
$total += $count;
}
}
# move print outside loop so it happens once instead of per-file
print '\t$total\n';
foreach $line(#rsemfiles) {
if ($line=~ /^entry/) {
#match the line starting with the word entry
my $entry=$1; my $filea=$2; my $fileb=$3; my $filec=$4;
# make variables out of the column values
Now that you have these variables, you can do math on them.

How would I know if there is any similar element in my array?

I have this code which lists all files in my directory:
$dir = '/var/www/corpIDD/rawFile/';
opendir DIR, $dir or die "cannot open dir $dir: $!";
my #file= readdir DIR;
closedir DIR;
which returns an array containing something like this:
$array (0 => 'ipax3_2011_01_27.txt', 1 => 'ipax3_2011_02_01.txt', 2 => 'ipax3_2011_02_03.txt')
My problem here is, how will I store elements 1 => 'ipax3_2011_02_01.txt' and 2 => 'ipax3_2011_02_03.txt' to separate variable as they belong to the same month and year(2011_02)?
Thanks!
In Perl, when you need to use a string as the key in a data structure, you are looking for the HASH builtin type, designated by the % sigil. A nice feature of Perl's hashes is that you do not have to pre-declare a complex data structure. You can use it, and Perl will infer the structure from that usage.
my #file = qw(ipax3_2011_01_27.txt ipax3_2011_02_01.txt ipax3_2011_02_03.txt);
my %ipax3;
for (#file) {
if (/^ipax3_(\d{4}_\d{2})_(\d{2}).txt$/) {
$ipax3{$1}{$2} = $_
}
else {
warn "bad file: $_\n"
}
}
for my $year_month (keys %ipax3) {
my $days = keys %{ $ipax3{$year_month} };
if ($days > 1) {
print "$year_month has $days files\n";
}
else {
print "$year_month has 1 file\n";
}
}
which prints:
2011_01 has 1 file
2011_02 has 2 files
To get at the individual files:
my $year_month = '2011_02';
my $day = '01';
my $file = $ipax3{$year_month}{$day};
Above I used the return value of the keys function as both the list to iterate over, and as the number of days. This is possible because keys will return all of the keys when in list context, and will return the number of keys in scalar context. Context is provided by the surrounding code:
my $number = keys %ipax3; # number of year_month entries
my #keys = keys %ipax3; # contains ('2011_01', '2011_02')
my #days = keys %{ $ipax{$year_month} };
In the last example, each value in %ipax is a reference to a hash. Since keys takes a literal hash, you need to wrap $ipax{$year_month} in %{ ... }. In perl v5.13.7+ you can omit the %{ ... } around arguments to keys and a few other data structure access functions.
People are responding really fast here :) anyway, I'll post mine, just for your reference. Basically, I'm also using a hash.
use warnings qw(all);
use strict;
my ($dir, $hdir) = 'C:\Work';
opendir($hdir, $dir) || die "Can't open dir \"$dir\" because $!\n";
my (#files) = readdir($hdir);
closedir($hdir);
my %yearmonths;
foreach(#files)
{
my ($year, $month);
next unless(($year, $month) = ($_ =~ /ipax3_(\d{4})_(\d{2})/));
$year += 0;
--$month; #assuming that months are in range 1-12
my $key = ($year * 12) + $month;
++$yearmonths{ $key };
}
foreach(keys %yearmonths)
{
next if($yearmonths{ $_ } < 2);
my $year = $_ / 12;
my $month = 1 + ($_ % 12);
printf "There were %d files from year %d, month %d\n", $yearmonths{$_}, $year, $month;
}

Resources