Perl: Want to add and then average columns of tab delimited data - arrays

Data is a table that includes names in the first row and first column so I keep getting a non-numeric value error. I figured out how to ignore the first row by using if ($row[0] ne "retrovirus" ), but I don't know how to ignore the first column. I am new to programming and having a really hard time understanding arrays and how to get them to work. How do I split my data into columns of numbers excluding the words and add them together?
This is what I have so far, and its giving incorrect answers.
#!/usr/bin/perl -w
use strict;
# Part A. Computing the average bp length of the virus's
# genomes and each individual gene in the text file.
my $infile = "lab1_table.txt";
open INFILE, $infile or die "$infile: $!";
my #totals = ();
while (my $line = <INFILE>){
chomp $line;
my $total = 0;
my $n = 0;
# Splitting into columns
my #row = split /\t/, $line;
# Working through and adding up each column
foreach my $element (#row) {
# Ignoring first line with headings
if ($row[0] ne "retrovirus" ){
$total = $total + $element;
print "$total \n";
}
}
}
close INFILE;

If you totally don't care about the first element of the row, just use shift(#row)
before the foreach loop. Or if you want to preserve the original values you can get the elements from the second to the last:
#!/usr/bin/perl -w
use strict;
# Part A. Computing the average bp length of the virus's
# genomes and each individual gene in the text file.
my $infile = "lab1_table.txt";
open INFILE, $infile or die "$infile: $!";
while (my $line = <INFILE>)
{
chomp $line;
my $total = 0;
# Splitting into columns
my #row = split /\t/, $line;
# Working through and adding up each column
if ($row[0] ne "retrovirus" )
{
map { $total += $_ } #row[1..(scalar(#row) - 1)];
print "$total \n";
}
}
close INFILE;

Related

Is it possible to put elements of array to hash in perl?

example of file content:
>random sequence 1 consisting of 500 residues.
VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV
HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;
>random sequence 2 consisting of 500 residues.
KAAATKKPWADTIPYLLCTFMQTSGLEWLHTDYNNFSSVVCVRYFEQFWVQCQDHVFVKN
KNWHQVLWEEYAVIDSMNFAWPPLYQSVSSNLDSTERMMWWWVYYQFEDNIQIRMEWCNI
YSGFLSREKLELTHNKCEVCVDKFVRLVFKQTKWVRTMNNRRRVRFRGIYQQTAIQEYHV
HQKIIRYPCHVMQFHDPSAPCDMTRQGKRMNFCFIIFLYTLYEVKYWMHFLTYLNCLEHR;
>random sequence 3 consisting of 500 residues.
AYCSCWRIHNVVFQKDVVLGYWGHCWMSWGSMNQPFHRQPYNKYFCMAPDWCNIGTYAWK
I need an algorithm to build a hash $hash{$key} = $value; where lines starting with > are the values and following lines are the keys.
What I have tried:
open (DATA, "seq-at.txt") or die "blabla";
#data = <DATA>;
%result = ();
$k = 0;
$i = 0;
while($k != #data) {
$info = #data[$k]; #istrina pirma elementa
if(#data[$i] !=~ ">") {
$key .= #data[$i]; $i++;
} else {
$k = $i;
}
$result{$key} = $value;
}
but it doesn't work.
You don't have to previously use an array, you can directly build your hash:
use strict;
use warnings;
# ^- start always your code like this to see errors and what is ambiguous
# declare your variables using "my" to specify the scope
my $filename = 'seq-at.txt';
# use the 3 parameters open syntax to avoid to overwrite the file:
open my $fh, '<', $filename or die "unable to open '$filename' $!";
my %hash;
my $hkey = '';
my $hval = '';
while (<$fh>) {
chomp; # remove the newline \n (or \r\n)
if (/^>/) { # when the line start with ">"
# store the key/value in the hash if the key isn't empty
# (the key is empty when the first ">" is encountered)
$hash{$hkey} = $hval if ($hkey);
# store the line in $hval and clear $hkey
($hval, $hkey) = $_;
} elsif (/\S/) { # when the line isn't empty (or blank)
# append the line to the key
$hkey .= $_;
}
}
# store the last key/val in the hash if any
$hash{$hkey} = $hval if ($hkey);
# display the hash
foreach (keys %hash) {
print "key: $_\nvalue: $hash{$_}\n\n";
}
It is unclear what you want, the array seems to be the lines subsequent to the random sequence number... If the contenst of a file test.txt are:
Line 1:">"random sequence 1 consisting of 500 residues.
Line 2:VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV
Line 3:HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;
You could try something like:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $contentFile = $ARGV[0];
my %testHash = ();
my $currentKey = "";
open(my $contentFH,"<",$contentFile);
while(my $contentLine = <$contentFH>){
chomp($contentLine);
next if($contentLine eq ''); # Empty lines.
if($contentLine =~ /^"\>"(.*)/){
$currentKey= $1;
}else{
push(#{$testHash{$currentKey}},$contentLine);
}
}
print Dumper(\%testHash);
Which results in a structure like this:
seb#amon:[~]$ perl test.pl test.txt
$VAR1 = {
'random sequence 3 consisting of 500 residues.' => [
'AYCSCWRIHNVVFQKDVVLGYWGHCWMSWGSMNQPFHRQPYNKYFCMAPDWCNIGTYAWK'
],
'random sequence 1 consisting of 500 residues.' => [
'VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV',
'HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;'
],
'random sequence 2 consisting of 500 residues.' => [
'KAAATKKPWADTIPYLLCTFMQTSGLEWLHTDYNNFSSVVCVRYFEQFWVQCQDHVFVKN',
'KNWHQVLWEEYAVIDSMNFAWPPLYQSVSSNLDSTERMMWWWVYYQFEDNIQIRMEWCNI',
'YSGFLSREKLELTHNKCEVCVDKFVRLVFKQTKWVRTMNNRRRVRFRGIYQQTAIQEYHV',
'HQKIIRYPCHVMQFHDPSAPCDMTRQGKRMNFCFIIFLYTLYEVKYWMHFLTYLNCLEHR;'
]
};
You would be basically using each hash "value" as an array structure, the #{$variable} does the magic.

Tie::File - Get 5 lines from a file into an array while removing them from file

I want to open a file, push 5 lines into an array for later use (or what is left if less than 5) and remove those 5 lines from the file as well.
It does not matter whether I am removing (or pushing) from head or tail of file.
I have used Tie::File in the past and am willing to use it, but I cannot figure it out with or without the Tie module.
use Tie::File;
my $limit='5';
$DataFile='data.txt';
###open my $f, '<', $DataFile or die;
my #lines;
tie (#lines, 'Tie::File', $DataFile);
$#lines = $limit;
###while( <#lines> ) {
shift #lines if #lines <= $limit;
push (#lines, $_);
###}
print #lines;
untie #lines;
Also tried File::ReadBackwards from an example I found but, I cannot figure out how to get the array of 5.
my $pos = do {
my $fh = File::ReadBackwards->new($DataFile) or die $!;
##lines =(<FILE>)[1..$limit];
#$fh->readline() for 1..$limit;
my $log_line = $fh->readline for 1..$limit;
print qq~ LogLine $log_line~;
$fh->tell()};
All that said, this came close, but no cigar. How do I get the 5 into an array?
use File::ReadBackwards;
my $num_lines = 5;
my $pos = do {
my $fh = File::ReadBackwards->new($DataFile) or die $!;
$fh->readline() for 1..$num_lines;
$fh->tell()};
truncate($DataFile, $pos)or die $!;
I will check each line in the array against a regex later on. They still need to be removed from the file either way.
If you extract the last five lines instead of the first five, then you can use truncate instead of writing the entire file. Furthermore, you can use File::ReadBackwards to get those five lines without reading the entire file. That makes the following solution insanely faster than Tie::File for large files (and it will use far less memory):
use File::ReadBackwards qw( );
my $num_lines = 5;
my $fh = File::ReadBackwards->new($DataFile)
or die("Can't open $DataFile: $!\n");
my #extracted_lines;
while ($_ = $fh->readline() && #extracted_lines < $num_lines) {
push #extracted_lines, $_;
}
truncate($fh->get_handle(), $fh->tell())
or die("Can't truncate $DataFile: $!\n");
This removes the first five lines of the data.txt file, stores them in another array and prints the removed lines on STDOUT:
use warnings;
use strict;
use Tie::File;
my $limit = 5;
my $DataFile = 'data.txt';
tie my #lines, 'Tie::File', $DataFile or die $!;
my #keeps = splice #lines, 0, $limit;
print "$_\n" for #keeps;
untie #lines;

Empty array in a perl while loop, should have input

Was working on this script when I came across a weird anomaly. When I go to print #extract after declaring it, it prints correctly the following:
------MMMMMMMMMMMMMMMMMMMMMMMMMM-M-MMMMMMMM
------SSSSSSSSSSSSSSSSSSSSSSSSSS-S-SSSSSDTA
------TIIIIIIIIIIIIITIIIVVIIIIII-I-IIIIITTT
Now the weird part, when I then try to print or return #extract (or $column) inside of the while loop, it comes up empty, thus rendering the rest of the script useless. I've never come across this before up until now, haven't been able to find any documentation or people with similar problems as mine. Below is the code, I marked with #<------ where the problems are and are not, to see if anyone can have any idea what is going on? Thank you kindly.
P.S. I am utilizing perl version 5.12.2
use strict;
use warnings;
#use diagnostics;
#use feature qw(say);
open (S, "Val nuc align.txt") || die "cannot open FASTA file to read: $!";
open (OUTPUT, ">output.txt");
my #extract;
my $sum = 0;
my #lines = <S>;
my #seq = ();
my $start = 0; #amino acid column start
my $end = 10; #amino acid column end
#Removing of the sequence tag until amino acid sequence composition (from >gi to )).
foreach my $line (#lines) {
$line =~ s/\n//g;
if ($line =~ />/g) {
$line =~ s/>.*\]/>/g;
push #seq, $line;
}
else {
push #seq, $line;
}
}
my $seq = join ('', #seq);
my #seq_prot = join "\n", split '>', $seq;
#seq_prot = grep {/[A-Z]/} #seq_prot;
#number of sequences
print OUTPUT "Number of sequences:", scalar (grep {defined} #seq_prot), "\n";
#selection of amino acid sequence. From $start to $end.
my #vertical_array;
while ( my $line = <#seq_prot> ) {
chomp $line;
my #split_line = split //, $line;
for my $index ( $start..$end ) { #AA position, extracts whole columns
$vertical_array[$index] .= $split_line[$index];
}
}
# Print out your vertical lines
for my $line ( #vertical_array ) {
my $extract = say OUTPUT for unpack "(a200)*", $line; #split at end of each column
#extract = grep {defined} $extract;
}
print OUTPUT #extract; #<--------------- This prints correctly the input
#Count selected amino acids excluding '-'.
my %counter;
while (my $column = #extract) {
print #extract; #<------------------------ Empty print, no input found
}
Update: Found the main problem to be with the unpack command, I thought I could utilize it to split my columns of my input at X elements (43 in this case). While this works, the minute I change $start to another number that is not 0 (say 200), the code brings up errors. Probably has something to do with the number of column elements does not match the lines. Will keep updated.
Write your last while loop the same way as your previous for loop. The assignment
my $column = #extract
is in scalar context, which does not give you the same result as:
for my $column (#extract)
Instead, it will give you the number of elements in the array. Try this second option and it should work.
However, I still have a concern, because in fact, if #extract had anything in it, you would obtain an infinite loop. Is there any code that you did not include between your two commented lines?

Popping keys of an array to calculate a total

I'm trying to simply pop off each numeric value and add them together to gain a total.
Input file:
Samsung 46
RIM 16
Apple 87
Microsoft 30
My code compiles, however, it only returns 0:
open (UNITS, 'units.txt') || die "Can't open it $!";
my #lines = <UNITS>;
my $total = 0;
while (<UNITS>) {
chomp;
my $line = pop #lines;
$line += $total;
}
print $total;
No need to slurp all lines into an array if you're just going to loop through them anyway with a while. Also, you need to split each line to get your numbers.
use warnings;
use strict;
open (UNITS, 'units.txt') || die "Can't open it $!";
my $total = 0;
while (<UNITS>) {
chomp;
my $num = (split)[1];
$total += $num;
}
print "$total\n";
__END__
179
There are three problems here
You are trying to add strings like 'Samsung 46' + 'RIM 16'
You read the entire file into #lines and then try to read more from the file in the while loop. That loop is never entered because you have already read to end of file
You are adding $total to the (undeclared) variable $line within the loop, instead of the other way around. So $total remains at zero and $line keeps having zero added to it
It is best to use while to read files unless you need something other than sequential access to the records, so removing #lines is a start.
It isn't completely clear which part of the records you want to accumulate. This program splits the lines on whitespace and adds together the last field of each line.
You must always use strict and use warnings at the start of every program. It is a measure that will make it far easier to locate bugs in your code. It is also best to use lexical file handles rather than the global one you used, and the three-parameter form of open.
use strict;
use warnings;
open my $units, '<', 'units.txt' or die "Can't open it: $!";
my $total;
while (<$units>) {
my #fields = split;
$total += $fields[-1];
}
print $total;
output
179
use strict;
use warnings;
open my $fh, "<", "units.txt" or die "well...";
my $total = 0;
while(<$fh>){
chomp;
my ($string, $num) = split(" ", $_);
$total += $num;
}
print $total;
This problem is a doddle with a one-liner:
$ perl -ane '$sum += $F[1] }{ print $sum' units.txt
Explanation
-a enables autosplit, each line is split and stored in #F
-n loops over the file line by line
-e tells perl that the next argument is to be treated as Perl code
the LHS of the Eskimo-kiss (that funny-looking }{ in the middle) is performed for every line in the input file, RHS performed only once
LHS accumulates the second column of every line in $sum
RHS prints the result of $sum once all lines have been processed

Read CSV file and save in 2 d array

I am trying to read a huge CSV file in 2 D array, there must be a better way to split the line and save it in the 2 D array in one step :s
Cheers
my $j = 0;
while (<IN>)
{
chomp ;
my #cols=();
#cols = split(/,/);
shift(#cols) ; #to remove the first number which is a line header
for(my $i=0; $i<11; $i++)
{
$array[$i][$j] = $cols[$i];
}
$j++;
}
CSV is not trivial. Don't parse it yourself. Use a module like Text::CSV, which will do it correctly and fast.
use strict;
use warnings;
use Text::CSV;
my #data; # 2D array for CSV data
my $file = 'something.csv';
my $csv = Text::CSV->new;
open my $fh, '<', $file or die "Could not open $file: $!";
while( my $row = $csv->getline( $fh ) ) {
shift #$row; # throw away first value
push #data, $row;
}
That will get all your rows nicely in #data, without worrying about parsing CSV yourself.
If you ever find yourself reaching for the C-style for loop, then there's a good chance that your program design can be improved.
while (<IN>) {
chomp;
my #cols = split(/,/);
shift(#cols); #to remove the first number which is a line header
push #array, \#cols;
}
This assumes that you have a CSV file that can be processed with a simple split (i.e. the records contain no embedded commas).
Aside: You can simplify your code with:
my #cols = split /,/;
Your assignment to $array[$col][$row] uses an unusual subscript order; it complicates life.
With your column/row assignment order in the array, I don't think there's a simpler way to do it.
Alternative:
If you were to reverse the order of the subscripts in the array ($array[$row][$col]), you could think about using:
use strict;
use warnings;
my #array;
for (my $j = 0; <>; $j++) # For testing I used <> instead of <IN>
{
chomp;
$array[$j] = [ split /,/ ];
shift #{$array[$j]}; # Remove the line label
}
for (my $i = 0; $i < scalar(#array); $i++)
{
for (my $j = 0; $j < scalar(#{$array[$i]}); $j++)
{
print "array[$i,$j] = $array[$i][$j]\n";
}
}
Sample Data
label1,1,2,3
label2,3,2,1
label3,2,3,1
Sample Output
array[0,0] = 1
array[0,1] = 2
array[0,2] = 3
array[1,0] = 3
array[1,1] = 2
array[1,2] = 1
array[2,0] = 2
array[2,1] = 3
array[2,2] = 1

Resources