Assign Array to Variable without Extra Blank Line in Perl - arrays

I'm writing a perl script that uploads the contents from an array into a database. To do this, I've created a "foreach" statement that loops through the array and assigns its contents into a variable:
foreach my $line (#array)
{ $data .= "$line\n" }
This script works, and I'm able to upload the variable into my database. However, thanks to the new line character at the end of my statement (used to retain the original line breaks of my array), my variable contains an extra blank line at the end, that also get's put into the database.
What is the best way to get rid of this blank line? Is this something that I can only really fix after the loop? I'm extremely new to Perl, so I'm a bit confused. Thanks!

my $data = join "\n", #array;
The join builtin takes a separator and a list, and concatenates the elements of your list. It is equivalent to
sub join ($#) {
my $sep = shift;
#_ or return "";
my $str = shift;
$str .= $sep . shift while #_;
return $str;
}

Maybe you can try the next:
my $data = join "\n", #array;

Related

Perl - initialization of hash

I'm not sure how to correctly initialize my hash - I'm trying to create a key/value pair for values in coupled lines in my input file.
For example, my input looks like this:
#cluster t.18
46421 ../../../output###.txt/
#cluster t.34
41554 ../../../output###.txt/
I'm extracting the t number from line 1 (#cluster line) and matching it to output###.txt in the second line (line starting with 46421). However, I can't seem to get these values into my hash with the script that I have written.
#!/usr/bin/perl
use warnings;
use strict;
my $key;
my $value;
my %hash;
my $filename = 'input.txt';
open my $fh, '<', $filename or die "Can't open $filename: $!";
while (my $line = <$fh>) {
chomp $line;
if ($line =~ m/^\#cluster/) {
my #fields = split /(\d+)/, $line;
my $key = $fields[1];
}
elsif ($line =~ m/^(\d+)/) {
my #output = split /\//, $line;
my $value = $output[5];
}
$hash{$key} = $value;
}
It's a good idea, but your $key that is created with my in the if block is a local variable scoped to that block, masking the global $key. Inside the if block the symbol $key has nothing to do with the one you nicely declared upfront. See my in perlsub.
This local $key goes out of scope as soon as if is done and does not exist outside the if block. The global $key is again available after the if, being visible elsewhere in the loop, but is undefined since it has never been assigned to. The same goes for $value in the elsif block.
Just drop the my declaration inside the loop, thus assign to those global variables (as intended?). So, $key = ... and $value = ..., and the hash will be assigned correctly.
Note -- this is about how to get that hash assignment right. I don't know how your actual data looks and whether the line is parsed correctly. Here is a toy input.txt
#cluster t.1
1111 ../../../output1.1.txt/
#cluster t.2
2222 ../../../output2.2.txt/
I pick the 4th field instead of the 6th, $value = $output[3];, and add
print "$_ => $hash{$_}\n" for keys %hash;
after the loop. This prints
1 => output1.1.txt
2 => output2.2.txt
I am not sure whether this is what you want but the hash is built fine.
A comment on choice of tools in parsing
You parse the lines for numbers, by using the property of split to return the separators as well, when they are captured. That is neat, but in some sense it reverses its main purpose, which is to extract other components from the string, as delimited by the pattern. Thus it may make the purpose of the code a little bit convoluted, and you also have to index very precisely to retrieve what you need.
Instead of using split to extract the delimiter itself, which is given by a regex, why not extract it by a regex? That makes the intention crystal clear, too. For example, with input
#cluster t.10 has 4319 elements, 0 subclusters
37652 ../../../../clust/output43888.txt 1.397428
the parsing can go as
if ($line =~ m/^\#cluster/) {
($key) = $line =~ /t\.(\d+)/;
}
elsif ($line =~ m/^(\d+)/) {
($value) = $line =~ m|.*/(\w+\.txt)|;
}
$hash{$key} = $value if defined $key and defined $value;
where t\. and \.txt are added to more precisely specify the targets. If the target strings aren't certain to have that precise form, just capture \d+, and in the second case all non-space after the last /, say by m|^\d+.*/(\S+)|. We use the greediness of .*, which matches everything possible up to the thing that comes after it (a /), thus all the way to the very last /.
Then you can also reduce it to a single regex for each line, for example
if ($line =~ m/^\#cluster\s+t\.(\d+)/) {
$key = $1;
}
elsif ($line =~ m|^\d+.*/(\w+\.txt)|) {
$value = $1;
}
Note that I've added a condition to the hash assignment. The original code in fact assigns an undef on the first iteration, since no $value had yet been seen at that point. This is overwritten on the next iteration and we don't see it if we only print the hash afterwards. The condition also guards you against failed matches, for malformatted lines or such. Of course, far better checks can be run.

Creating multidimensional array while reading a file - Perl

I'm totally new to Perl, and I've been assigned some task... I have to read a tab separated file, and then do some operations with the data in a DB. The .tsv file is like this:
ID Name Date
155 Pedro 1988-05-05
522 Mengano 2002-08-02
So far I thought that creating a multidimensional array with the data of the file will be a good solution to handle this data later. So I read the file line by line, skip the item title columns and save the values in an array. However, I'm having difficulties creating this multidimensional array... this is what I've done so far:
#Read file from path
my #array;
my $fh = path($filename)->openr_utf8;
while (my $line = <$fh>) {
chomp $line;
# skip comments and blank lines and title line
next if $line =~ /^\#/ || $line =~ /^\s*$/ || $line =~ /^\+/ || $line =~ /ID/;
#split each line into array
my #aux_line = split(/\s+/, $line);
push #array, #{ $aux_line };
}
Obviously, last line is not working... how could be done to create an array of arrays this way? I'm little bit lost with references... And somebody can think of a better way to store this data we read from file? Thank you!
You can also do this with map:
use Data::Dumper;
my #stuff = map {[split]} <$fh>;
print Dumper \#stuff;
(with maybe a grep to skip comments)
But it may suit your use case better to use an array of hashes :
my #stuff ;
chomp(my #header = split ' ', <$fh>);
while ( <$fh>) {
my %this_row;
#this_row{#header} = split;
push ( #stuff, \%this_row) ;
}
First, use strict and use warnings. That would instantly alert you about that your wrong way to get array reference tries to access completely different variable (Perl allows variable of different types have same names).
After that just change your last line to:
push #array, \#aux_line;

Array got flushed after while loop within a filehandle

I got a problem with a Perl script.
Here is the code:
use strict;
use Data::Dumper;
my #data = ('a','b');
if(#data){
for(#data){
&debug_check($_)
}
}
print "#data";#Nothing is printed, the array gets empty after the while loop.
sub debug_check{
my $ip = shift;
open my $fh, "<", "debug.txt";
while(<$fh>){
print "$_ $ip\n";
}
}
Array data, in this example, has two elements. I need to check if the array has elements. If it has, then for each element I call a subroutine, in this case called "debug_check". Inside the subroutine I need to open a file to read some data. After I read the file using a while loop, the data array gets empty.
Why the array is being flushed and how do I avoid this strange behavior?
Thanks.
The problem here I think, will be down to $_. This is a bit of a special case, in that it's an alias to a value. If you modify $_ within a loop, it'll update the array. So when you hand it into the subroutine, and then shift it, it also updates #data.
Try:
my ( $ip ) = #_;
Or instead:
for my $ip ( #array ) {
debug_check($ip);
}
Note - you should also avoid using an & prefix to a sub. It has a special meaning. Usually it'll work, but it's generally redundant at best, and might cause some strange glitches.
while (<$fh>)
is short for
while (defined($_ = <$fh>))
$_ is currently aliased to the element of #data, so your sub is replacing each element of #data with undef. Fix:
while (local $_ = <$fh>)
which is short for
while (defined(local $_ = <$fh>))
or
while (my $line = <$fh>) # And use $line instead of $_ afterwards
which is short for
while (defined(my $line = <$fh>))
Be careful when using global variables. You want to localize them if you modify them.

Empty array in a perl while loop, should have input

Was working on this script when I came across a weird anomaly. When I go to print #extract after declaring it, it prints correctly the following:
------MMMMMMMMMMMMMMMMMMMMMMMMMM-M-MMMMMMMM
------SSSSSSSSSSSSSSSSSSSSSSSSSS-S-SSSSSDTA
------TIIIIIIIIIIIIITIIIVVIIIIII-I-IIIIITTT
Now the weird part, when I then try to print or return #extract (or $column) inside of the while loop, it comes up empty, thus rendering the rest of the script useless. I've never come across this before up until now, haven't been able to find any documentation or people with similar problems as mine. Below is the code, I marked with #<------ where the problems are and are not, to see if anyone can have any idea what is going on? Thank you kindly.
P.S. I am utilizing perl version 5.12.2
use strict;
use warnings;
#use diagnostics;
#use feature qw(say);
open (S, "Val nuc align.txt") || die "cannot open FASTA file to read: $!";
open (OUTPUT, ">output.txt");
my #extract;
my $sum = 0;
my #lines = <S>;
my #seq = ();
my $start = 0; #amino acid column start
my $end = 10; #amino acid column end
#Removing of the sequence tag until amino acid sequence composition (from >gi to )).
foreach my $line (#lines) {
$line =~ s/\n//g;
if ($line =~ />/g) {
$line =~ s/>.*\]/>/g;
push #seq, $line;
}
else {
push #seq, $line;
}
}
my $seq = join ('', #seq);
my #seq_prot = join "\n", split '>', $seq;
#seq_prot = grep {/[A-Z]/} #seq_prot;
#number of sequences
print OUTPUT "Number of sequences:", scalar (grep {defined} #seq_prot), "\n";
#selection of amino acid sequence. From $start to $end.
my #vertical_array;
while ( my $line = <#seq_prot> ) {
chomp $line;
my #split_line = split //, $line;
for my $index ( $start..$end ) { #AA position, extracts whole columns
$vertical_array[$index] .= $split_line[$index];
}
}
# Print out your vertical lines
for my $line ( #vertical_array ) {
my $extract = say OUTPUT for unpack "(a200)*", $line; #split at end of each column
#extract = grep {defined} $extract;
}
print OUTPUT #extract; #<--------------- This prints correctly the input
#Count selected amino acids excluding '-'.
my %counter;
while (my $column = #extract) {
print #extract; #<------------------------ Empty print, no input found
}
Update: Found the main problem to be with the unpack command, I thought I could utilize it to split my columns of my input at X elements (43 in this case). While this works, the minute I change $start to another number that is not 0 (say 200), the code brings up errors. Probably has something to do with the number of column elements does not match the lines. Will keep updated.
Write your last while loop the same way as your previous for loop. The assignment
my $column = #extract
is in scalar context, which does not give you the same result as:
for my $column (#extract)
Instead, it will give you the number of elements in the array. Try this second option and it should work.
However, I still have a concern, because in fact, if #extract had anything in it, you would obtain an infinite loop. Is there any code that you did not include between your two commented lines?

Search for, and remove column from CSV file

I'm trying to write a subroutine that will take two arguments, a filename and the column name inside a CSV file. The subroutine will search for the second argument (column name) and remove that column (or columns) from the CSV file and then return the CSV file with the arguments removed.
I feel like I've gotten through the first half of this sub (opening the file, retrieve the headers and values) but I can't seem to find a way to search the CSV file for the string that the user inputs and delete that whole column. Any ideas? Here's what I have so far.
sub remove_columns {
my #Para = #_;
my $args = #Para;
die "Insufficent arguments\n" if ($nargs < 2);
open file, $file
$header = <file>;
chomp $header;
my #hdr = split ',',$header;
while (my $line = <file>){
chomp $line;
my #vals = split ',',$line;
#hash that will allow me to access column name and values quickly
my %h;
for (my $i=0; $i<=$#hdr;$i++){
$h{$hdr[$i]}=$i;
}
....
}
Here's where the search and removal will be done. I've been thinking about how to go about this; the CSV files that I'll be modifying will be huge, so speed is a factor, but I can't seem to think of a good way to go about this. I'm new to Perl, so I'm struggling a bit.
Here are a few hints that will hopefully get you going.
To remove the element of an array at position $index of an array use :
splice #array,$index,1 ;
As speed is an issues, you probably want to construct an array of column numbers at the start and then loop on the the elements of the array
for my $index (#indices) {
splice #array,$index,1 ;
}
(this way is more idiomatic Perl than for (my $i=0; $i<=$#hdr;$i++) type loop )
Another thing to consider - CSV format is surprisingly complicated. Might your data have data with , within " " such as
1,"column with a , in it"
I would consider using something like Text::CSV
You should probably look in the direction of Text::CSV
Or you can do something like this:
my $colnum;
my #columns = split(/,/, <$file>);
for(my $i = 0; $i < scalar(#columns); $i++) {
if($columns[$i] =~ /^$unwanted_column_name$/) {
$colnum = $i;
last;
};
};
while(<$file>) {
my #row = split(/,/, $_);
splice(#row, $colnum, 1);
#do something with resulting array #row
};
Side note:
you really should use strict and warnings;
split(/,/, <$file>);
won't work with all CSV files
There is elegant way how to remove some columns from array. If I have columns to removal in array #cols, and headers in #headers I can make array of indexes to preserve:
my %to_delete;
#to_delete{#cols} = ();
my #idxs = grep !exists $to_delete{$headers[$_]}, 0 .. $#headers;
Then it's easy to make new headers
#headers[#idxs]
and also new row from read columns
#columns[#idxs]
The same approach can be used for example for rearranging arrays. It is very fast and pretty idiomatic Perl way how to do this sort of tasks.

Resources