Below is my script.
I have attempted many print statements to work out why it is only accessing the first array element. The pattern match works. The array holds a minimum 40 elements. I have checked and it is full.
I have printed each line, and each line prints.
my $index = 0;
open(FILE, "$file") or die "\nNot opening $file for reading\n\n";
open(OUT, ">$final") or die "Did not open $final\n";
while (<FILE>) {
foreach my $barcode (#barcode) {
my #line = <FILE>;
foreach $_ (#line) {
if ($_ =~ /Barcode([0-9]*)\t$barcode[$index]\t$otherarray[$index]/) {
my $bar = $1;
$_ =~ s/.*//;
print OUT ">Barcode$bar"."_"."$barcode[$index]\t$otherarray[$index]";
}
print OUT $_;
}
$index++;
}
}
Okay, lets say the input was:
File:
Barcode001 001 abc
Barcode002 002 def
Barcode003 003 ghi
#barcode holds:
001
002
003
#otherarray holds:
abc
def
ghi
The output result for this script is currently printing only:
Barcode001_001 abc
It should be printing:
>Barcode001_001 abc
>Barcode002_002 def
>Barcode003_003 ghi
Where it should be printing a whole load up to ~40 lines.
Any ideas? There must be something wrong with the way I am accessing the array elements? Or incrementing? Hoping this isn't something too silly!
Thanks in advance.
It needs the index because I am trying to match arrays in parallel, as they are ordered. Each line needs to match the corresponding indices of the arrays to each line in the file.
It's a little hard to answer with certainty without more information about the contents of #barcode and FILE, but there is something odd in your code which makes me think that it might be the problem.
The construct while (<FILE>) { ... } will, until end of file, read a line from FILE into $_ and then execute the contents of the loop. In your code, you also read all the lines from FILE from within the loop that iterates over #barcode. I think it is likely that you intended to check each line from FILE against all the elements of #barcode, which would make the loop look like the following:
while (my $line = <FILE>) {
foreach my $barcode (#barcode) {
if ($line =~ /Barcode([0-9]*)\t$barcode/) {
my $bar = $1;
print OUT ">Barcode$bar"."_"."$barcode\n";
}
else {
print OUT $line;
}
}
}
I've taken the liberty of doing a bit of code tidying, but I may have made some unwarranted assumptions.
Your core problem in the above is - in your first iteration you slurp all of your file into #lines. But because it's lexically scoped to the loop, it disappears when that loop completes.
Furthermore:
I would strongly suggest that you don't use $_ like that.
$_ is a special variable that's set implicitly in loops. I'd strongly suggest that you need to replace that with something that isn't a special variable, because that's a sure way to cause yourself pain.
turn on use strict; and use warnings;
use 3 argument open with a lexical filehandle.
perltidy your code, so the bracketing looks right.
you've a search and replace pattern on $_ that's emptying it completely, but then you're trying to print it. You may well not be printing what you think you're printing.
You're accessing <FILE> outside and inside your loop. This will cause you problems.
Barcode([0-9]*) - with a '*' there you're saying 'zero or more' is valid. You may want to consider \d+ - one or more digits.
referencing multiple arrays by index is messy. I'd suggest coalescing them into a hash lookup (lookup by key - barcode)
This line:
my #line = <FILE>;
reads your whole file into #line. But you do this inside the while loop that iterates... each line in <FILE>. Don't do that, it's horrible.
Is this something like what you wanted?
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my #barcode = qw (
001
002
003
);
my #otherarray = qw (
abc
def
ghi
);
my %lookup;
#lookup{#barcode} = #otherarray;
print Dumper \%lookup;
#commented because I don't have your source data
#my $file = "input_file_name";
#my $output = "output_file_name";
#open( my $input, "<", $file ) or die "\nNot opening $file for reading\n\n";
#open( my $output, ">", $final ) or die "Did not open $final\n";
#while ( my $line = <$input> )
while ( my $line = <DATA> ) {
foreach my $barcode (#barcode) {
if ( my ($bar) = ( $line =~ /Barcode(\d+)\s+$barcode/ ) ) {
print ">Barcode$bar" . "_" . "$barcode $lookup{$barcode}\n";
#print {$output} ">Barcode$bar" . "_" . "$lookup{$barcode}\n";
}
}
}
__DATA__
Barcode001 001
Barcode002 002
Barcode003 003
Prints:
$VAR1 = {
'001' => 'abc',
'002' => 'def',
'003' => 'ghi'
};
>Barcode001_001 abc
>Barcode002_002 def
>Barcode003_003 ghi
It turns out it was a simple issue as I had suspected being a Monday. I had a colleague go through it with me, and it was the placing of the index:
#my $index = 0; #This means the index is iterated through,
#but for each barcode for one line, then it continues
#counting up and misses the other values, therefore
#repeatedly printing just the first element of the array.
open(FILE, "$file") or die "\nNot opening $file for reading\n\n";
open(OUT, ">$final") or die "Did not open $final\n";
while (<FILE>) {
$index = 0; #New placement of $index for initialising.
foreach my $barcode (#barcode) {
my #line = <FILE>;
foreach $_ (#line) {
if ($_ =~ /Barcode([0-9]*)\t$barcode[$index]\t$otherarray[$index]/) {
my $bar = $1;
$_ =~ s/.*//;
print OUT ">Barcode$bar"."_"."$barcode[$index]\t$otherarray[$index]";
}
print OUT $_;
$index++; #Increment here
}
#$index++;
}
}
Thanks to everyone for their responses, for my original and poorly worded question they would have worked and may be more efficient, but for the purpose of the script and my edited question, it needs to be this way.
Related
I'm not sure how to correctly initialize my hash - I'm trying to create a key/value pair for values in coupled lines in my input file.
For example, my input looks like this:
#cluster t.18
46421 ../../../output###.txt/
#cluster t.34
41554 ../../../output###.txt/
I'm extracting the t number from line 1 (#cluster line) and matching it to output###.txt in the second line (line starting with 46421). However, I can't seem to get these values into my hash with the script that I have written.
#!/usr/bin/perl
use warnings;
use strict;
my $key;
my $value;
my %hash;
my $filename = 'input.txt';
open my $fh, '<', $filename or die "Can't open $filename: $!";
while (my $line = <$fh>) {
chomp $line;
if ($line =~ m/^\#cluster/) {
my #fields = split /(\d+)/, $line;
my $key = $fields[1];
}
elsif ($line =~ m/^(\d+)/) {
my #output = split /\//, $line;
my $value = $output[5];
}
$hash{$key} = $value;
}
It's a good idea, but your $key that is created with my in the if block is a local variable scoped to that block, masking the global $key. Inside the if block the symbol $key has nothing to do with the one you nicely declared upfront. See my in perlsub.
This local $key goes out of scope as soon as if is done and does not exist outside the if block. The global $key is again available after the if, being visible elsewhere in the loop, but is undefined since it has never been assigned to. The same goes for $value in the elsif block.
Just drop the my declaration inside the loop, thus assign to those global variables (as intended?). So, $key = ... and $value = ..., and the hash will be assigned correctly.
Note -- this is about how to get that hash assignment right. I don't know how your actual data looks and whether the line is parsed correctly. Here is a toy input.txt
#cluster t.1
1111 ../../../output1.1.txt/
#cluster t.2
2222 ../../../output2.2.txt/
I pick the 4th field instead of the 6th, $value = $output[3];, and add
print "$_ => $hash{$_}\n" for keys %hash;
after the loop. This prints
1 => output1.1.txt
2 => output2.2.txt
I am not sure whether this is what you want but the hash is built fine.
A comment on choice of tools in parsing
You parse the lines for numbers, by using the property of split to return the separators as well, when they are captured. That is neat, but in some sense it reverses its main purpose, which is to extract other components from the string, as delimited by the pattern. Thus it may make the purpose of the code a little bit convoluted, and you also have to index very precisely to retrieve what you need.
Instead of using split to extract the delimiter itself, which is given by a regex, why not extract it by a regex? That makes the intention crystal clear, too. For example, with input
#cluster t.10 has 4319 elements, 0 subclusters
37652 ../../../../clust/output43888.txt 1.397428
the parsing can go as
if ($line =~ m/^\#cluster/) {
($key) = $line =~ /t\.(\d+)/;
}
elsif ($line =~ m/^(\d+)/) {
($value) = $line =~ m|.*/(\w+\.txt)|;
}
$hash{$key} = $value if defined $key and defined $value;
where t\. and \.txt are added to more precisely specify the targets. If the target strings aren't certain to have that precise form, just capture \d+, and in the second case all non-space after the last /, say by m|^\d+.*/(\S+)|. We use the greediness of .*, which matches everything possible up to the thing that comes after it (a /), thus all the way to the very last /.
Then you can also reduce it to a single regex for each line, for example
if ($line =~ m/^\#cluster\s+t\.(\d+)/) {
$key = $1;
}
elsif ($line =~ m|^\d+.*/(\w+\.txt)|) {
$value = $1;
}
Note that I've added a condition to the hash assignment. The original code in fact assigns an undef on the first iteration, since no $value had yet been seen at that point. This is overwritten on the next iteration and we don't see it if we only print the hash afterwards. The condition also guards you against failed matches, for malformatted lines or such. Of course, far better checks can be run.
I am having an undesired interaction with a foreach loop and a while that I don't quite understand. I have a normal for loop, and then a foreach loop going through an array. I create a string (representing a file name) with both and then open the file and read it.
The code I use is here:
#array=(1,2);
for($y=0;$y<2;$y++)
{
foreach(#array)
{
print "#array\n";
$name="/Users/jorge/$_\_vs_$y\.txt";
print "$name\n";
open(INFILE,"$name") or die "Can't open files!\n";
while(<INFILE>)
{
$line=$_;
}
}
}
and the output is:
array: 1 2
/Users/jorge/1_vs_0.txt
array: 2
/Users/jorge/2_vs_0.txt
array:
/Users/jorge/_vs_1.txt
Seems that somehow the while loop is shortening my array, if I remove the:
while(<INFILE>)
it works as intented, also if I change the foreach too:
foreach $tmp (#array)
and use $tmp instead of $_, it also works as intended, the output looks like this:
array: 1 2
/Users/jorge/1_vs_0.txt
array: 1 2
/Users/jorge/2_vs_0.txt
array: 1 2
/Users/jorge/1_vs_1.txt
array: 1 2
/Users/jorge/2_vs_1.txt
array: 1 2
/Users/jorge/1_vs_2.txt
You're using $_ as the loop control variable on two nested loops. Instead, you should give each loop its own variable.
Not only is the inner loop changing the value of the control variable for the outer loop, but it's also changing the actual array being looped over. That's because the loop control variable in Perl's for/foreach aliases the elements of the array, so when the <INFILE> construct reads a line into $_, it overwrites the current element of #array with that line. When the while loop finishes reading the file, $_ comes out of the last <INFILE> as undefined, which means the most-recently processed element of #array will always be undefined when you get back to the top of the foreach loop.
You should also be declaring your variables with my, using a lexical scalar instead of a bareword as a file handle, and using the the three-argument version of open, so I've made those changes below as well. But the solution to your shrinking-array problem is just the use of an explicit variable instead of the default $_ for at least one of the two loops.
my #array = (1, 2);
for (my $y = 0; $y < 2; $y++)
{
foreach my $x (#array) # using $x instead of $_
{
print "#array\n";
my $name = "/Users/jorge/${x}_vs_$y.txt";
print "$name\n";
open my $infile, '<', $name or die "$0: can't open file '$name': $!\n";
while (my $line = <$infile>) # using $line instead of $_
{
# do something with $line here
}
}
}
Your while loop overwrites content of array as it uses $_ which is aliased to #array elements.
It is best to use lexical (my) variable when reading file using while,
while (my $line = <INFILE>) {
# ..
}
Was working on this script when I came across a weird anomaly. When I go to print #extract after declaring it, it prints correctly the following:
------MMMMMMMMMMMMMMMMMMMMMMMMMM-M-MMMMMMMM
------SSSSSSSSSSSSSSSSSSSSSSSSSS-S-SSSSSDTA
------TIIIIIIIIIIIIITIIIVVIIIIII-I-IIIIITTT
Now the weird part, when I then try to print or return #extract (or $column) inside of the while loop, it comes up empty, thus rendering the rest of the script useless. I've never come across this before up until now, haven't been able to find any documentation or people with similar problems as mine. Below is the code, I marked with #<------ where the problems are and are not, to see if anyone can have any idea what is going on? Thank you kindly.
P.S. I am utilizing perl version 5.12.2
use strict;
use warnings;
#use diagnostics;
#use feature qw(say);
open (S, "Val nuc align.txt") || die "cannot open FASTA file to read: $!";
open (OUTPUT, ">output.txt");
my #extract;
my $sum = 0;
my #lines = <S>;
my #seq = ();
my $start = 0; #amino acid column start
my $end = 10; #amino acid column end
#Removing of the sequence tag until amino acid sequence composition (from >gi to )).
foreach my $line (#lines) {
$line =~ s/\n//g;
if ($line =~ />/g) {
$line =~ s/>.*\]/>/g;
push #seq, $line;
}
else {
push #seq, $line;
}
}
my $seq = join ('', #seq);
my #seq_prot = join "\n", split '>', $seq;
#seq_prot = grep {/[A-Z]/} #seq_prot;
#number of sequences
print OUTPUT "Number of sequences:", scalar (grep {defined} #seq_prot), "\n";
#selection of amino acid sequence. From $start to $end.
my #vertical_array;
while ( my $line = <#seq_prot> ) {
chomp $line;
my #split_line = split //, $line;
for my $index ( $start..$end ) { #AA position, extracts whole columns
$vertical_array[$index] .= $split_line[$index];
}
}
# Print out your vertical lines
for my $line ( #vertical_array ) {
my $extract = say OUTPUT for unpack "(a200)*", $line; #split at end of each column
#extract = grep {defined} $extract;
}
print OUTPUT #extract; #<--------------- This prints correctly the input
#Count selected amino acids excluding '-'.
my %counter;
while (my $column = #extract) {
print #extract; #<------------------------ Empty print, no input found
}
Update: Found the main problem to be with the unpack command, I thought I could utilize it to split my columns of my input at X elements (43 in this case). While this works, the minute I change $start to another number that is not 0 (say 200), the code brings up errors. Probably has something to do with the number of column elements does not match the lines. Will keep updated.
Write your last while loop the same way as your previous for loop. The assignment
my $column = #extract
is in scalar context, which does not give you the same result as:
for my $column (#extract)
Instead, it will give you the number of elements in the array. Try this second option and it should work.
However, I still have a concern, because in fact, if #extract had anything in it, you would obtain an infinite loop. Is there any code that you did not include between your two commented lines?
I am very new to perl and am struggling to get this script to work.
I have taken pieces or perl and gooten them to work as indivual sections but upon trying to blend them together it fails. Even with the error messages that show up I can not find where my mistake is.
The script when working and completed will read an output file and go through it section my section and utilmately generate a new output file with not much more the a heading with some additional text and a value of the amount of lines in that section.
My issues are when it does the looping for each keyword in the array it is now failing with the error message 'Argument "" isn't numeric in array element at'. Perl directs me to a section in the script but I can not see how I am calling the element incorrectly. All the elements in the array are alpha yet the error message is refering to a numeric value.
Can anyone see my mistake.
Thank you
Here is the script
#!/usr/bin/perl -w
use strict;
use warnings;
use diagnostics;
# this version reads each variable and loops through the 18 times put only displays on per loop.
my $NODE = `uname -n`;
my $a = "/tmp/";
my $b = $NODE ;
my $c = "_deco.txt";
my $d = "_deco_mini.txt";
chomp $b;
my $STRING = "$a$b$c";
my $STRING_out = "$a$b$d";
my #keyword = ( "Report", "Last", "HP", "sulog", "sudo", "eTrust", "proftp", "process", "active clusters", "pdos", "syslog", "BNY", "syslogmon", "errpt", "ports", "crontab", "NFS", "scripts", "messages");
my $i = 0;
my $keyword="";
my $x=0;
my $y=0;
my $jw="";
my $EOS = "########################################################################";
my $qty_lines=0;
my $skip5=0;
my $skipcnt=0;
my $keeplines=0;
my #HPLOG="";
do {
print "Reading File: [$STRING]\n";
if (-e "$STRING" && open (IN, "$STRING")) {
# ++$x; # proving my loop worked
# print "$x interal loop counter\n"; # proving my loop worked
for ( ++$i) { # working
while ( <IN> ) {
chomp ;
#if ($_ =~ /$keyword/) {
#if ($_ =~ / $i /) {
#if ($_ =~ /$keyword[ $i ]/) {
if ($_ =~ /$keyword $i/) {
print " $i \n";
$skip5=1;
next;
# print "$_\n";# $ not initalized error when tring to use it
}
if ($skip5) {
$skipcnt++;
print "SKIP LINE: $_\n";
print "Header LINE: $_\n";
next if $skipcnt <= 5;
$skip5=0;
$keeplines=1;
}
if ($keeplines) {
# ++$qty_lines; # for final output
last if $_ =~ /$EOS/;
print "KEEP LINE: $_\n";
# print "$qty_lines\n"; # for final output
push #HPLOG, "$_\n";
# push #HPLOG, "$qty_lines\n";# for final output
}
} ## end while ( <IN> )
} ## end for ( ++$i)
} ## end if (-e "$STRING" && open (IN, "$STRING"))
close (IN);
} while ( $i < 19 && ++$y < 18 );
Here is a sample section or the input file.
###############################################################################
Checking for active clusters.
#########
root 11730980 12189848 0 11:24:20 pts/2 0:00 egrep hagsd|harnad|HACMP|haemd
If there are any processes listed you need to remove the server from the cluster.
############################################################################
This is the output from Pdos log
Please review it for anything that looks like a users may be trying to run something.
#########
This server is not on Tamos
############################################################################
This is the output from syslog.conf.
Look for any entries on the right side column that are not the ususal logs or location.
#########
# #(#)34 1.11 src/bos/etc/syslog/syslog.conf, cmdnet, bos610 4/27/04 14:47:53
# IBM_PROLOG_BEGIN_TAG
# This is an automatically generated prolog.
#
# bos610 src/bos/etc/syslog/syslog.conf 1.11
I truncated the rest of the file
Can anyone see my mistake.
I can see quite a lot of mistakes. But I also see some good stuff like use strict and use warnings.
My suggestion for you is to work on your coding style so that it gets easier for you and others to debug any problems.
Naming variables
my $NODE = `uname -n`;
my $a = "/tmp/";
my $b = $NODE ;
my $c = "_deco.txt";
my $d = "_deco_mini.txt";
chomp $b;
my $STRING = "$a$b$c";
my $STRING_out = "$a$b$d";
Why are some of those names all uppercase and others all lower case? If you are building up a filename, why do you call the variable that holds the filename $STRING?
my #keyword = ( "Report", "Last", "HP", "sulog", "sudo", ....
If you have a list of several keywords, wouldn't it be apt to not chose a singular for the variable name? How about #keywords?
Using temporary variables you don't need
my $NODE = `uname -n`;
my $a = "/tmp/";
my $b = $NODE ;
my $c = "_deco.txt";
chomp $b;
my $STRING = "$a$b$c";
Why do you need $a, $b and $c? The (forgive me) stupid names of those vars are a tell-tale sign that you don't need them. How about this instead?
my $node_name = `uname -n`;
chomp $node_name;
my $file_name = sprintf '/tmp/%s/_deco.txt', $node_name;
Your biggest problem: you have no idea how to use arrays
You are making several drastic mistakes when it comes to arrays.
my #HPLOG="";
Do you want an array or another string? The # says array, the "" says string. I guess you wanted a new, empty array, so my #hplog = () would have been much better. But since there is no need to tell perl that you want an empty array as it will give you an empty one anyway, my #hplog; will do the job just fine.
It took me a while to figure out this next one and I'm still not sure whether I'm guessing your intentions correctly:
my #keyword = ( "Report", "Last", "HP", "sulog", "sudo", "eTrust", "proftp", "process", "active clusters", "pdos", "syslog", "BNY", "syslogmon", "errpt", "ports", "crontab", "NFS", "scripts", "messages");
...
if ($_ =~ /$keyword $i/) {
What I think you are doing here is trying to match your current input line against element number $i in #keywords. If my assumption is correct, you really wanted to say this:
if ( /$keyword[ $i ]/ ) {
Iterating arrays
Perl is not C. It doesn't make you jump through hoops to get a loop.
Just look at all the code you wrote to loop through your keywords:
my $i = 0;
...
for ( ++$i) { # working
...
if ($_ =~ /$keyword $i/) {
...
} while ( $i < 19 && ++$y < 18 );
Apart from the facts that your working comment is just self-deception and that you hard-coded the number of elements in your array, you could have just used a for-each loop:
foreach my $keyword ( #keywords ) {
# more code here
}
I'm sure that when you try to work on the above list, the problem that made you ask here will just go away. Have fun.
What's going on:
I've ssh'd onto my localhost, ls the desktop and taken those items and put them into an array.
I hardcoded a short list of items and I am comparing them with a hash to see if anything is missing from the host (See if something from a is NOT in b, and let me know).
So after figuring that out, when I print out the "missing files" I get a bunch of duplicates (see below), not sure if that has to do with how the files are being checked in the loop, but I figured the best thing to do would be to just sort out the data and eliminate dupes.
When I do that, and print out the fixed data, only one file is printing, two are missing.
Any idea why?
#!/usr/bin/perl
my $hostname = $ARGV[0];
my #hostFiles = ("filecheck.pl", "hostscript.pl", "awesomeness.txt");
my #output =`ssh $hostname "cd Desktop; ls -a"`;
my %comparison;
for my $file (#hostFiles) {
$comparison{$file} +=1;
}
for my $file (#output) {
$comparison{$file} +=2
}
for my $file (sort keys %comparison) {
#missing = "$file\n" if $comparison{$file} ==1;
#print "Extra file: $file\n" if $comparison{$file} ==2;
print #missing;
}
my #checkedMissingFiles;
foreach my $var ( #missing ){
if ( ! grep( /$var/, #checkedMissingFiles) ){
push( #checkedMissingFiles, $var );
}
}
print "\n\nThe missing Files without dups:\n #checkedMissingFiles\n";
Password:
awesomeness.txt ##This is what is printing after comparing the two arrays
awesomeness.txt
filecheck.pl
filecheck.pl
filecheck.pl
hostscript.pl
hostscript.pl
The missing Files without dups: ##what prints after weeding out duplicates
hostscript.pl
The perl way of doing this would be:
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my %hostFiles = qw( filecheck.pl 1 hostscript.pl 1 awesomeness.txt 1);
# ssh + backticks + ls, not the greatest way to do this, but that's another Q
my #files =`ssh $ARGV[0] "ls -a ~/Desktop"`;
# get rid of the newlines
chomp #files;
#grep returns the matching element of #files
my %existing = map { $_ => 1} grep {exists($hostFiles{$_})} #files;
print Dumper([grep { !exists($existing{$_})} keys %hostFiles]);
Data::Dumper is a utility module, I use it for debugging or demonstrative purposes.
If you want print the list you can do something like this:
{
use English;
local $OFS = "\n";
local $ORS = "\n";
print grep { !exists($existing{$_})} keys %hostFiles;
}
$ORS is the output record separator (it's printed after any print) and $OFS is the output field separator which is printed between the print arguments. See perlvar. You can get away with not using "English", but the variable names will look uglier. The block and the local are so you don't have to save and restore the values of the special variables.
If you want to write to a file the result something like this would do:
{
use English;
local $OFS = "\n";
local $ORS = "\n";
open F, ">host_$ARGV[0].log";
print F grep { !exists($existing{$_})} keys %hostFiles;
close F;
}
Of course, you can also do it the "classical" way, loop trough the array and print each element:
open F, ">host_$ARGV[0].log";
for my $missing_file (grep { !exists($existing{$_})} keys %hostFiles) {
use English;
local $ORS = "\n";
print F "File is missing: $missing_file"
}
close F;
This allows you to do more things with the file name, for example, you can SCP it over to the host.
It seems to me that looping over the 'required' list makes more sense - looping over the list of existing files isn't necessary unless you're looking for files that exist but aren't needed.
#!/usr/bin/perl
use strict;
use warnings;
my #hostFiles = ("filecheck.pl", "hostscript.pl", "awesomeness.txt");
my #output =`ssh $ARGV[0] "cd Desktop; ls -a"`;
chomp #output;
my #missingFiles;
foreach (#hostFiles) {
push( #missingFiles, $_ ) unless $_ ~~ #output;
}
print join("\n", "Missing files: ", #missingFiles);
#missing = "$file\n" assigns the array #missing to contain a single element, "$file\n". It does this every loop, leaving it with the last missing file.
What you want is push(#missing, "$file\n").