I am trying to determine how many times a string, Apples, appears in a text file and on which lines it appears.
The script outputs incorrect line numbers: it prints consecutive numbers (1, 2, ...) instead of the lines where the word actually appears.
file.txt
Apples
Grapes
Oranges
Apples
Goal Output
Apples appear 2 times in this file
Apples appear on these lines: 1, 4,
Instead my output as illustrated from the code below is:
Apples appear 2 times in this file
Apples appear on these lines: 1, 2,
Perl
my $filename = "<file.txt";
open( TEXT, $filename );
$initialLine = 10; ## holds the number of the line
$line = 0;
$counter = 0;
# holder for line numbers
@lineAry = ();
while ( $line = <TEXT> ) {
chomp( $line );
if ( $line =~ /Apples/ ) {
while ( $line =~ /Apples/ig ) {
$counter++;
}
push( @lineAry, $counter );
$initialLine++;
}
}
close( TEXT );
# print "\n\n'Apples' occurs $counter times in file.\n";
print "Apples appear $counter times in this file\n";
print "Apples appear on these lines: ";
foreach $a ( @lineAry ) {
print "$a, ";
}
print "\n\n";
exit;
There are a number of problems with your code, but the reason the line numbers are printed wrongly is that you increment $counter once each time Apples appears on a line and then save that value to @lineAry. That is the running match count, not the number of the line where the string appears. The easiest fix is to use the built-in variable $., which holds the current line number of the most recently read file handle.
In addition, I would encourage you to use lexical file handles and the three-parameter form of open, and to check that every call to open has succeeded.
You never use the value of $initialLine, and I don't understand why you have initialised it to 10.
I would write it like this:
use strict;
use warnings 'all';
my $filename = 'file.txt';
open my $fh, '<', $filename or die qq{Unable to open "$filename" for input: $!};
my @lines;
my $n = 0;
while ( <$fh> ) {
push @lines, $. if /apples/i;
++$n while /apples/ig;
}
print "Apples appear $n times in this file\n";
print "Apples appear on these lines: ", join( ', ', @lines ), "\n\n";
output
Apples appear 2 times in this file
Apples appear on these lines: 1, 4
Change
push(@lineAry, $counter);
to
push(@lineAry, $.);
$. is a variable that stores the line number when using perl's while (<>).
The alternative, if you want to use your $counter variable, is that you move the increment to increment on every line, not on every match.
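To see $. in isolation, here is a minimal sketch that reads from an in-memory file handle instead of file.txt. The helper name find_lines is hypothetical and only serves the illustration; the sample text is the file from the question.

```perl
use strict;
use warnings;

# find_lines() is a hypothetical helper, not part of the original script.
# It returns the line numbers on which $pattern matches, taken from $.
sub find_lines {
    my ( $text, $pattern ) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @hits;
    while ( my $line = <$fh> ) {
        push @hits, $. if $line =~ $pattern;    # $. is the current line number of $fh
    }
    close $fh;                                  # an explicit close resets $.
    return @hits;
}

my @found = find_lines( "Apples\nGrapes\nOranges\nApples\n", qr/Apples/ );
print "Apples appear on these lines: ", join( ', ', @found ), "\n";
# prints: Apples appear on these lines: 1, 4
```

Because $. is tied to the read itself, it stays correct no matter how many matches occur on a single line.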
Related
Below is my script.
I have attempted many print statements to work out why it is only accessing the first array element. The pattern match works. The array holds a minimum 40 elements. I have checked and it is full.
I have printed each line, and each line prints.
my $index = 0;
open(FILE, "$file") or die "\nNot opening $file for reading\n\n";
open(OUT, ">$final") or die "Did not open $final\n";
while (<FILE>) {
foreach my $barcode (@barcode) {
my @line = <FILE>;
foreach $_ (@line) {
if ($_ =~ /Barcode([0-9]*)\t$barcode[$index]\t$otherarray[$index]/) {
my $bar = $1;
$_ =~ s/.*//;
print OUT ">Barcode$bar"."_"."$barcode[$index]\t$otherarray[$index]";
}
print OUT $_;
}
$index++;
}
}
Okay, let's say the input was:
File:
Barcode001 001 abc
Barcode002 002 def
Barcode003 003 ghi
@barcode holds:
001
002
003
@otherarray holds:
abc
def
ghi
The output result for this script is currently printing only:
Barcode001_001 abc
It should be printing:
>Barcode001_001 abc
>Barcode002_002 def
>Barcode003_003 ghi
Where it should be printing a whole load up to ~40 lines.
Any ideas? There must be something wrong with the way I am accessing the array elements? Or incrementing? Hoping this isn't something too silly!
Thanks in advance.
It needs the index because I am trying to match arrays in parallel, as they are ordered. Each line needs to match the corresponding indices of the arrays to each line in the file.
It's a little hard to answer with certainty without more information about the contents of @barcode and FILE, but there is something odd in your code which makes me think that it might be the problem.
The construct while (<FILE>) { ... } will, until end of file, read a line from FILE into $_ and then execute the contents of the loop. In your code, you also read all the lines from FILE from within the loop that iterates over @barcode. I think it is likely that you intended to check each line from FILE against all the elements of @barcode, which would make the loop look like the following:
while (my $line = <FILE>) {
foreach my $barcode (@barcode) {
if ($line =~ /Barcode([0-9]*)\t$barcode/) {
my $bar = $1;
print OUT ">Barcode$bar"."_"."$barcode\n";
}
else {
print OUT $line;
}
}
}
I've taken the liberty of doing a bit of code tidying, but I may have made some unwarranted assumptions.
Your core problem in the above is that in your first iteration you slurp the whole file into @line. But because it is lexically scoped to the loop, it disappears when that loop completes.
Furthermore:
I would strongly suggest that you don't use $_ like that.
$_ is a special variable that's set implicitly in loops. I'd strongly suggest that you need to replace that with something that isn't a special variable, because that's a sure way to cause yourself pain.
turn on use strict; and use warnings;
use 3 argument open with a lexical filehandle.
perltidy your code, so the bracketing looks right.
you've a search and replace pattern on $_ that's emptying it completely, but then you're trying to print it. You may well not be printing what you think you're printing.
You're accessing <FILE> outside and inside your loop. This will cause you problems.
Barcode([0-9]*) - with a '*' there you're saying 'zero or more' is valid. You may want to consider \d+ - one or more digits.
referencing multiple arrays by index is messy. I'd suggest coalescing them into a hash lookup (lookup by key - barcode)
This line:
my @line = <FILE>;
reads your whole file into @line. But you do this inside the while loop that iterates... each line in <FILE>. Don't do that, it's horrible.
Is this something like what you wanted?
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @barcode = qw (
001
002
003
);
my @otherarray = qw (
abc
def
ghi
);
my %lookup;
@lookup{@barcode} = @otherarray;
print Dumper \%lookup;
#commented because I don't have your source data
#my $file = "input_file_name";
#my $output = "output_file_name";
#open( my $input, "<", $file ) or die "\nNot opening $file for reading\n\n";
#open( my $output, ">", $final ) or die "Did not open $final\n";
#while ( my $line = <$input> )
while ( my $line = <DATA> ) {
foreach my $barcode (@barcode) {
if ( my ($bar) = ( $line =~ /Barcode(\d+)\s+$barcode/ ) ) {
print ">Barcode$bar" . "_" . "$barcode $lookup{$barcode}\n";
#print {$output} ">Barcode$bar" . "_" . "$lookup{$barcode}\n";
}
}
}
__DATA__
Barcode001 001
Barcode002 002
Barcode003 003
Prints:
$VAR1 = {
'001' => 'abc',
'002' => 'def',
'003' => 'ghi'
};
>Barcode001_001 abc
>Barcode002_002 def
>Barcode003_003 ghi
It turns out it was a simple issue as I had suspected being a Monday. I had a colleague go through it with me, and it was the placing of the index:
#my $index = 0; #This means the index is iterated through,
#but for each barcode for one line, then it continues
#counting up and misses the other values, therefore
#repeatedly printing just the first element of the array.
open(FILE, "$file") or die "\nNot opening $file for reading\n\n";
open(OUT, ">$final") or die "Did not open $final\n";
while (<FILE>) {
$index = 0; #New placement of $index for initialising.
foreach my $barcode (@barcode) {
my @line = <FILE>;
foreach $_ (@line) {
if ($_ =~ /Barcode([0-9]*)\t$barcode[$index]\t$otherarray[$index]/) {
my $bar = $1;
$_ =~ s/.*//;
print OUT ">Barcode$bar"."_"."$barcode[$index]\t$otherarray[$index]";
}
print OUT $_;
$index++; #Increment here
}
#$index++;
}
}
Thanks to everyone for their responses, for my original and poorly worded question they would have worked and may be more efficient, but for the purpose of the script and my edited question, it needs to be this way.
I expected the following to print in the order of the elements of @Data, but it's printing in the order of the elements of @Queries. Am I missing something? I also tried declaring the items to be printed after foreach (@Data) { ... and then printing inside that loop, but I still got the wrong order.
$datafile is a file with the following:
GR29929,JAMES^BOB
GR21122,HANK^REN
$queryfile is a file with the following:
(3123123212):# FD [GR21122]
line 2
line 3
line 4
(12): # FD [HANK^REN]
line 6
line 7
line 8
(13): # FD [Y]
-------------------------------
--------------------------------
(3123123212):# FD [GR29929]
line 2
line 3
line 4
(12): # FD [JAMES^BOB]
line 6
line 7
line 8
(13): # FD [Z]
The output file is:
GR21122,HANK^WREN,Y
GR29929,JAMES^BOB,Z
When I want:
GR29929,JAMES^BOB,Z
GR21122,HANK^WREN,Y
Code is:
open(DA, "<$datafile");
open(QR, "<$queryfile");
my @Data = <DA>;
my @Queries = <QR>;
foreach (@Data) {
my ( $acce, $namee ) = split( ',', $_ );
chomp $acce;
chomp $namee;
print "'$acce' and '$namee'\n";
for my $i ( 0 .. $#Queries ) {
my $Qacce = $Queries[$i];
my $Qname = $Queries[ $i + 4 ];
my $Gen = $Queries[ $i + 8 ];
if ( $Qacce =~ m/$acce/ and $Qname =~ m/$namee/ ) {
my ($acc) = $Qacce =~ /\[(.+?)\]/;
my ($gen) = $Gen =~ /\[(.+?)\]/;
$gen =~ s/\s+$//;
my ($name) = $Qname =~ /\[(.+?)\]/;
print GL "$i,$acc,$gen,$name\n";
}
}
}
The basic shell of your program prints what you ask for, but there is a lot missing. The refactoring below should do what you want.
You had a problem with the values of your $i index variable, so that the first time around the loop you were accessing @Queries elements [0, 4, 8], the second time [1, 5, 9] etc. It looks like the second loop execution should use elements [11, 15, 19] and so on. Please correct me if I'm wrong.
In addition you were using regular expressions to compare the keys in the two files, and you were finding nothing because the name values contain caret ^ characters which are special within regexes. Escaping the strings using \Q...\E fixed this.
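As a small illustration of the caret problem, the sketch below compares an unescaped and an escaped match. The helper names matches_raw and matches_quoted are hypothetical, and the sample line is only modelled on the query file.

```perl
use strict;
use warnings;

# Both helpers are hypothetical names used only for this illustration.
sub matches_raw    { my ( $text, $key ) = @_; return $text =~ /$key/     ? 1 : 0 }
sub matches_quoted { my ( $text, $key ) = @_; return $text =~ /\Q$key\E/ ? 1 : 0 }

my $line = '(12): FD [JAMES^BOB]';    # sample line modelled on the query file
my $name = 'JAMES^BOB';

# Unescaped, ^ is a start-of-string anchor in mid-pattern, so nothing can match.
print "raw: ", matches_raw( $line, $name ), "\n";          # prints: raw: 0
# \Q...\E makes every character of $name literal, including ^.
print "quoted: ", matches_quoted( $line, $name ), "\n";    # prints: quoted: 1
```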
Note that a better solution would use hashes to match keys across the two files, but without details on your file format - particularly queryfile - I have had to follow your own algorithm.
use strict;
use warnings;
use autodie;
my ($data_file, $query_file) = qw/ datafile.txt queryfile.txt /;
my @queries = do {
open my $query_fh, '<', $query_file;
<$query_fh>;
};
chomp @queries;
open my $data_fh, '<', $data_file;
while (<$data_fh>) {
chomp;
my ($acce, $namee) = split /,/;
for (my $i = 0; $i < @queries; $i += 11) {
my ($qacce, $qname, $qgen) = @queries[$i, $i+4, $i+8];
if ( $qacce =~ /\Q$acce\E/ and $qname =~ /\Q$namee\E/ ) {
my ($acc, $name, $gen) = map / \[ ( [^\[\]]+ ) \] /x, ($qacce, $qname, $qgen);
$gen =~ s/\s+\z//;
print "$acc,$name,$gen\n";
}
}
}
output
GR29929,JAMES^BOB,Z
GR21122,HANK^REN,Y
I'm trying to simply pop off each numeric value and add them together to gain a total.
Input file:
Samsung 46
RIM 16
Apple 87
Microsoft 30
My code compiles, however, it only returns 0:
open (UNITS, 'units.txt') || die "Can't open it $!";
my @lines = <UNITS>;
my $total = 0;
while (<UNITS>) {
chomp;
my $line = pop @lines;
$line += $total;
}
print $total;
No need to slurp all lines into an array if you're just going to loop through them anyway with a while. Also, you need to split each line to get your numbers.
use warnings;
use strict;
open (UNITS, 'units.txt') || die "Can't open it $!";
my $total = 0;
while (<UNITS>) {
chomp;
my $num = (split)[1];
$total += $num;
}
print "$total\n";
__END__
179
There are three problems here
You are trying to add strings like 'Samsung 46' + 'RIM 16'
You read the entire file into @lines and then try to read more from the file in the while loop. That loop is never entered because you have already read to end of file
You are adding $total to the (undeclared) variable $line within the loop, instead of the other way around. So $total remains at zero and $line keeps having zero added to it
It is best to use while to read files unless you need something other than sequential access to the records, so removing @lines is a start.
It isn't completely clear which part of the records you want to accumulate. This program splits the lines on whitespace and adds together the last field of each line.
You must always use strict and use warnings at the start of every program. It is a measure that will make it far easier to locate bugs in your code. It is also best to use lexical file handles rather than the global one you used, and the three-parameter form of open.
use strict;
use warnings;
open my $units, '<', 'units.txt' or die "Can't open it: $!";
my $total;
while (<$units>) {
my @fields = split;
$total += $fields[-1];
}
print $total;
output
179
use strict;
use warnings;
open my $fh, "<", "units.txt" or die "well...";
my $total = 0;
while(<$fh>){
chomp;
my ($string, $num) = split(" ", $_);
$total += $num;
}
print $total;
This problem is a doddle with a one-liner:
$ perl -ane '$sum += $F[1] }{ print $sum' units.txt
Explanation
-a enables autosplit, each line is split and stored in @F
-n loops over the file line by line
-e tells perl that the next argument is to be treated as Perl code
the LHS of the Eskimo-kiss (that funny-looking }{ in the middle) is performed for every line in the input file, RHS performed only once
LHS accumulates the second column of every line in $sum
RHS prints the result of $sum once all lines have been processed
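For comparison, the same logic can be spelled out as a short script. This is only a sketch of what the switches do implicitly: the name sum_column is hypothetical, the data comes from an in-memory copy of units.txt, and the check that skips short lines is an addition that the one-liner does not have.

```perl
use strict;
use warnings;

# sum_column() is a hypothetical name; it mirrors what -n and -a do implicitly.
sub sum_column {
    my ( $fh, $index ) = @_;
    my $sum = 0;
    while ( my $line = <$fh> ) {     # -n: loop over the input line by line
        my @F = split ' ', $line;    # -a: split on whitespace into @F
        next unless @F > $index;     # addition: skip blank or short lines
        $sum += $F[$index];          # the code before }{ runs for every line
    }
    return $sum;                     # the code after }{ runs once, at the end
}

my $sample = "Samsung 46\nRIM 16\nApple 87\nMicrosoft 30\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print sum_column( $fh, 1 ), "\n";    # prints: 179
```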
The following code reads a file that contains many lines. Some lines of the file contain four elements. Other lines contain only a first element, followed by single spaces separated by tabs (it is a tab-delimited file). That is, some lines are "full" and others are "blank".
The point of this script is to read the data file, find an instance of a blank line, then remember the immediately preceding line (a full line), and scroll through all consecutive blank lines until the next full line is reached. This set of lines, consecutive blank lines flanked by the immediately preceding full line and the immediately succeeding full line, is to be used by a subroutine that will apply linear interpolation to "fill in" the blank lines. The information in the flanking full lines for each set will be used in the interpolation step. The script was an answer to a previously posted question, and was provided kindly by user @Kenosis. It is duplicated here with some very minor changes in its layout, so it is not as neat as @Kenosis originally proposed. You can see this interaction at Perl. Using until function
#!/usr/bin/perl
use strict; use warnings;
die "usage: [ map positions file post SAS ]\n\n" unless @ARGV == 1;
my $mapfile = $ARGV[ 0 ];
open( my $FILE, "<$mapfile" );
my @file = <$FILE>;
for ( my $i = 1 ; $i < $#file ; $i++ ) # $#file returns the index of the last element in @file
{
if ( $file[$i] =~ /(?:\t\s){3}/ ) # if a blank line is found
{
print $file[ $i - 1 ]; # print preceding line
while ( $file[$i] =~ /(?:\t\s){3}/ and $i < $#file ) # keep printing so long as they are blank
# or end of file
{
#print $file[ $i++ ] # one-column, blank line
}
print $file[ $i ]; # print the succeeding full line
} # if
} # for
The problem comes when I try to insert a modification.
my @collect = (); # array collects a current set of consecutive lines needed for linear interpolation
my @file = <$FILE>;
for ( my $i = 1 ; $i < $#file ; $i++ ) # $#file returns the index of the last element in @file
{
if ( $file[$i] =~ /(?:\t\s){3}/ ) # if a blank line is found
{
print $file[ $i - 1 ]; # print preceding line
push( @collect, $file[ $i - 1 ] );
while ( $file[$i] =~ /(?:\t\s){3}/ and $i < $#file ) # keep printing so long as they are blank
# or end of file
{
#print $file[ $i++ ]; # one-column, blank line
push( @collect, $file[ $i++ ] )
}
print $file[ $i ]; # else, succeeding full line
push( @collect, $file[ $i ] );
} # if
} # for
The culprit is in the while loop. Adding the push command there changes the behavior of the script. The script is no longer printing all the lines as the first script above. Why does adding that command change how the script is supposed to work?
What are you trying to do in that push line?
It includes the expression $i++, which adds 1 to the value of $i, so each iteration of that while loop will be jumping down another line in the file.
Do you just mean $i + 1?
Are you really adding a second line of code that increments $i? $i += 1 is not the same as $i += 2
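The difference between the two expressions can be checked with a small sketch. The helper name demo_post_increment and the sample lines are hypothetical and stand in for the real file.

```perl
use strict;
use warnings;

# demo_post_increment() is a hypothetical helper used only for illustration.
sub demo_post_increment {
    my @file = ( "line 0\n", "line 1\n", "line 2\n", "line 3\n" );
    my $i    = 1;

    my $taken           = $file[ $i++ ];    # reads $file[1], then $i becomes 2
    my $after_increment = $i;

    my $peeked     = $file[ $i + 1 ];       # reads $file[3]; $i is unchanged
    my $after_peek = $i;

    return ( $taken, $after_increment, $peeked, $after_peek );
}

my ( $taken, $i1, $peeked, $i2 ) = demo_post_increment();
chomp( $taken, $peeked );
print "taken '$taken' (i is now $i1), peeked '$peeked' (i is still $i2)\n";
# prints: taken 'line 1' (i is now 2), peeked 'line 3' (i is still 2)
```

So $file[ $i++ ] both reads an element and moves the index, while $file[ $i + 1 ] only looks ahead.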
I have two sets of files. One file gives a list of gene names (one gene per line). The second file has a list of gene pairs (e.g., => '1,2', one gene pair per line). The gene names are numerical. I want to list all possible gene combinations except the known gene pairs.
My output should be:
3,4
4,5
6,7
...
...
But, I get something like this =>
,4
,5
,7
All the first elements do not print. I'm not sure exactly what's wrong with the code. Can anyone help?
My code:
#! usr/bin/perl
use strict;
use warnings;
if (@ARGV !=2) {
die "Usage: generate_random_pairs.pl <entrez_genes> <known_interactions>\n";
}
my ($e_file, $k_file) = @ARGV;
open (IN, $e_file) or die "Error!! Cannot open $e_file\n";
open (IN2, $k_file) or die "Error!! Cannot open $k_file\n";
my @e_file = <IN>; chomp (@e_file);
my @k_file = <IN2>; chomp (@k_file);
my (%known_interactions, %random_interactions);
foreach my $line (#k_file) {
my @array = split (/,/, $line);
$known_interactions{$array[0]} = $array[1];
}
for (my $i = 0; $i <= $#e_file; $i++) {
for (my $j = $i+1 ; $j <= $#e_file; $j++) {
if ((exists $known_interactions{$e_file[$i]}) && ($known_interactions{$e_file[$i]} == $e_file[$j])) {next;}
if ((exists $known_interactions{$e_file[$j]}) && ($known_interactions{$e_file[$j]} == $e_file[$i])) {next;}
print "$e_file[$i],$e_file[$j]\n";
}
}
Your file uses CR LF for line endings, but you're on a system that uses LF for line endings, so your program outputs
"3" <CR> "," "4" <CR> <LF>
which your terminal shows as
,4
Either fix the line endings using
dos2unix inputfile
Or change
chomp(@e_file);
chomp(@k_file);
to
s/\s+\z// for @e_file;
s/\s+\z// for @k_file;
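A quick way to convince yourself of the difference is the sketch below. It assumes the default input record separator (a bare line feed), and the helper names chomp_copy and strip_copy are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers: each works on a copy so the caller's data is untouched.
sub chomp_copy { my @copy = @_; chomp @copy;            return @copy }
sub strip_copy { my @copy = @_; s/\s+\z// for @copy;    return @copy }

# Lines as they arrive from a CR LF file read on an LF system.
my @lines = ( "3\r\n", "4\r\n" );

my @chomped  = chomp_copy(@lines);    # only "\n" is removed, "\r" stays
my @stripped = strip_copy(@lines);    # "\r\n" and other trailing whitespace go

printf "chomp leaves %d characters, the substitution leaves %d\n",
    length( $chomped[0] ), length( $stripped[0] );
# prints: chomp leaves 2 characters, the substitution leaves 1
```

The leftover carriage return is what sends the cursor back to the start of the line and makes the first number appear to vanish.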