"Use of uninitialized value" when indexing an array - arrays

I get the following error from Perl when trying to run the code below
Use of uninitialized value within #words in concatenation (.) or string...
It references the line where I try to create an array made up of three-word sequences (the line that starts with $trigrams). Can anyone help me figure out the problem?
my %hash;
my #words;
my $word;
my #trigrams;
my $i = 0;
while (<>) {
#words = split;
foreach $word (#words) {
$hash{$word}++;
# Now trying to create the distinct six-grams in the 10-K.
$trigrams[$i] = join " ", $words[$i], $words[$i + 1], $words[$i + 2];
print "$words[$i]\n";
$i++;
}
}

All that is happening is that you are falling off the end of the array #words. You are executing the loop for each element of #words, so the value of $i goes from 0 to $#words, or the index of the final element of the array. So the line
join " ", $words[$i], $words[$i + 1], $words[$i + 2];
accesses the last element of the array $words[$i] and two elements beyond that which don't exist.
In this case, as with any loop which uses the current index of an array, it is easiest to iterate over the array indices instead of the contents. For the join to be valid you need to start at zero and stop at two elements before the end, so 0 .. $#words-2.
It is also neater to use an array slice to select the three elements for the trigram, and use the fact that interpolating an array into a string, as in "#array", will do the same as join ' ', #array. (More precisely, it does join $", #array, and $" is set to a single space by default.)
I suggest this fix. It is essential to use strict and use warnings at the start of every Perl program, and you should declare all your variables using my as late as possible.
use strict;
use warnings;
my %hash;
while (<>) {
my #words = split;
my #trigrams;
for my $i (0 .. $#words - 2) {
my $word = $words[$i];
++$hash{$word};
$trigrams[$i] = "#words[$i,$i+1,$i+2]";
print "$word\n";
}
}
Update
You may prefer this if it isn't too terse for you
use strict;
use warnings;
my %hash;
while (<>) {
my #words = split;
my #trigrams = map "#words[$_,$_+1,$_+2]", 0 .. $#words-2;
}

Related

compare an array of string with another array of strings in perl

I want to compare an array of string with another array of strings; if it matches, print matched.
Example:
#array = ("R-ID 1.0001", "RA-ID 61.02154", "TCA-ID 49.021456","RCID 61.02154","RB-ID 61.02154");
#var = ("TCA-ID 49", "R-ID 1");
for (my $x = 0; $x <= 4; $x++)
{
$array[$x] =~ /(.+?)\./;
if( ($var[0] eq $1) or ($var[1] eq $1) )
{
print "\n deleted rows are :#array\n";
}
else
{
print "printed rows are : #array \n";
push(#Matrix, \#array);
}
Then I need to compare #var with the #array; if it is matched, print the matched pattern.
Here the entire logic is in a hireartical for loop which gives a new #array in each iteration. so every time this logic is executed #array has different strings.
Then comes with #var it is user input field, this #var can be of any size. So in order to run the logic according to these constraints, I need to iterate the condition inside the if loop when the user input #var size is 3 for example.
So the goal is to match and delete the user input stings using the above mentioned logic. But unfortunately tis logic is not working. Could you please help me out in this issue.
The builtin grep keyword is a good place to start.
my $count = grep { $_ eq $var } #array;
This returns a count of items ($_) in the array which are equal (eq) to $var.
If you needed case-insensitive matching, you could use lc (or in Perl 5.16 or above, fc) to do that:
my $count = grep { lc($_) eq lc($var) } #array;
Now, a disadvantage to grep is that it is counting the matches. So after if finds the first match, it will keep on going until the end of the array. You don't seem to want that, but just want to know if any item in the array matches, in which case keeping on going might be slower than you need if it's a big array with thousands of elements.
So instead, use any from the List::Util module (which is bundled with Perl).
use List::Util qw( any );
my $matched = any { $_ eq $var } #array;
This will match as soon as it finds the first matching element, and skip searching the rest of the array.
Here is a couple of versions that allows multiple strings to be matched. Not clear what form $var takes when you want to store multiple, so assuming they are in an array #var for now.
The key point is this one is the use of the lookup hash to to the matching.
use strict;
use warnings;
my #var = ("TCA-ID 49", "RA-ID 61");
my #array = ("R-ID 1", "RA-ID 61", "TCA-ID 49");
# create a lookup for the strings to match
my %lookup = map { $_ => 1} #var ;
for my $entry (#array)
{
print "$entry\n"
if $lookup{$entry} ;
}
running gives
RA-ID 61
TCA-ID 49
Next, using a regular expression to do the matching
use strict;
use warnings;
my #var = ("TCA-ID 49", "RA-ID 61");
my #array = ("R-ID 1", "RA-ID 61", "TCA-ID 49");
my $re = join "|", map { quotemeta } #var;
print "$_\n" for grep { /^($re)$/ } #array ;
output is the same

Perl output successive string from array

I have an array I can print out as "abcd" however I am trying to print it as "a>ab>abc>abcd". I can't figure out the nested loop I need within the foreach loop I have. What loop do I need within it to print it this way?
my $str = "a>b>c>d";
my #words = split />/, $str;
foreach my $i (0 .. $#words) {
print $words[$i], "\n";
}
Thank you.
You had the right idea, but instead of printing the word at position i, you want to print all the words between positions 0 and i (inclusive). Also, your input can contain multiple strings, so loop over them.
use warnings;
while (my $str = <>) { # read lines from stdin or named files
chomp($str); # remove any trailing line separator
my #words = split />/, $str; # break string into array of words
foreach my $i (0 .. $#words) {
print join '', #words[0 .. $i]; # build the term from the first n words
print '>' if $i < $#words; # print separator between terms (but not at end)
}
print "\n";
}
There are many other ways to write it, but hopefully this way helps you understand what's happening and why. Good luck!
one liner:
perl -e '#a=qw(a b c d); for(#a) {$s.=($h.=$_).">"} $s=substr($s,0,-1);print $s'
I would do it like this:
#!/usr/bin/perl
use strict;
use warnings;
my $str = "a>b>c>d>e>f>g";
my #words = split />/, $str;
$" = '';
my #new_words;
push #new_words, "#words[0 .. $_]" for 0 .. $#words;
print join '>', #new_words;
A few things to explain.
Perl will expand array variables in a double-quoted string. So something like this:
#array = ('x', 'y', 'z');
print "#array";
will print x y z. Notice there are spaces between the elements. The string that is inserted between the elements is controlled by the $" variable. So by setting that variable to an empty string we can remove the spaces, so:
$" = '';
#array = ('x', 'y', 'z');
print "#array";
will print xyz.
The most complex line is:
push #new_words, "#words[0 .. $_]" for 0 .. $#words;
That's just a compact way to write:
for (0 .. $#words) {
my $new_word = "#words[0 .. $_]";
push #new_words, $new_word;
}
We iterate across the integers from zero to the last index in #words. Each time around the loop, we use an array slice to get a list of elements from the array, convert that to a string (by putting it in double-quotes) and then push that string onto #new_words.
This is what I ended up with, It's the only way I could understand and get the output I was looking for.
use strict;
use warnings;
my $str = "a>b>c>d>e>f>g";
my #words = split />/, $str;
my $j = $#words;
my $i = 0;
my #newtax;
while($i <= $#words){
foreach my $i (0 .. $#words - $j){
push (#new, $words[$i]);
}
if($i < $#words){
push(#new, ">");
}
$j--;
$i++;
}
print #new;
This output "a>ab>abc>abcd>abcde>abcdef>abcdefg"

Comparing two strings line by line in Perl

I am looking for code in Perl similar to
my #lines1 = split /\n/, $str1;
my #lines2 = split /\n/, $str2;
for (int $i=0; $i<lines1.length; $i++)
{
if (lines1[$i] ~= lines2[$i])
print "difference in line $i \n";
}
To compare two strings line by line and show the lines at which there is any difference.
I know what I have written is mixture of C/Perl/Pseudo-code. How do I write it in the way that it works on Perl?
What you have written is sort of ok, except you cannot use that notation in Perl lines1.length, int $i, and ~= is not an operator, you mean =~, but that is the wrong tool here. Also if must have a block { } after it.
What you want is simply $i < #lines1 to get the array size, my $i to declare a lexical variable, and eq for string comparison. Along with if ( ... ) { ... }.
Technically you can use the binding operator to perform a string comparison, for example:
"foo" =~ "foobar"
But it is not a good idea when comparing literal strings, because you can get partial matches, and you need to escape meta characters. Therefore it is easier to just use eq.
Using C-style for loops is valid, but the more Perl-ish way is to use this notation:
for my $i (0 .. $#lines1)
Which will iterate over the range 0 to the max index of the array.
Perl allows you to open filehandles on strings by using a reference to the scalar variable that holds the string:
open my $string1_fh, '<', \$string1 or die '...';
open my $string2_fh, '<', \$string2 or die '...';
while( my $line1 = <$string1_fh> ) {
my $line2 = <$string2_fh>;
....
}
But, depending on what you mean by difference (does that include insertion or deletion of lines?), you might want something different.
There are several modules on CPAN that you can inspect for ideas, such as Test::LongString or Algorithm::Diff.
my #lines1 = split(/^/, $str1);
my #lines2 = split(/^/, $str2);
# splits at start of line
# use /\n/ if you want to ignore newline and trailing spaces
for ($i=0; $i < #lines1; $i++) {
print "difference in line $i \n" if (lines1[$i] ne lines2[$i]);
}
Comparing Arrays is a way easier if you create a Hashmap out of it...
#Searching the difference
#isect = ();
#diff = ();
%count = ();
foreach $item ( #array1, #array2 ) { $count{$item}++; }
foreach $item ( keys %count ) {
if ( $count{$item} == 2 ) {
push #isect, $item;
}
else {
push #diff, $item;
}
}
#Output
print "Different= #diff\n\n";
print "\nA Array = #array1\n";
print "\nB Array = #array2\n";
print "\nIntersect Array = #isect\n";
Even after spliting you could compare them as Array.

Empty array in a perl while loop, should have input

Was working on this script when I came across a weird anomaly. When I go to print #extract after declaring it, it prints correctly the following:
------MMMMMMMMMMMMMMMMMMMMMMMMMM-M-MMMMMMMM
------SSSSSSSSSSSSSSSSSSSSSSSSSS-S-SSSSSDTA
------TIIIIIIIIIIIIITIIIVVIIIIII-I-IIIIITTT
Now the weird part, when I then try to print or return #extract (or $column) inside of the while loop, it comes up empty, thus rendering the rest of the script useless. I've never come across this before up until now, haven't been able to find any documentation or people with similar problems as mine. Below is the code, I marked with #<------ where the problems are and are not, to see if anyone can have any idea what is going on? Thank you kindly.
P.S. I am utilizing perl version 5.12.2
use strict;
use warnings;
#use diagnostics;
#use feature qw(say);
open (S, "Val nuc align.txt") || die "cannot open FASTA file to read: $!";
open (OUTPUT, ">output.txt");
my #extract;
my $sum = 0;
my #lines = <S>;
my #seq = ();
my $start = 0; #amino acid column start
my $end = 10; #amino acid column end
#Removing of the sequence tag until amino acid sequence composition (from >gi to )).
foreach my $line (#lines) {
$line =~ s/\n//g;
if ($line =~ />/g) {
$line =~ s/>.*\]/>/g;
push #seq, $line;
}
else {
push #seq, $line;
}
}
my $seq = join ('', #seq);
my #seq_prot = join "\n", split '>', $seq;
#seq_prot = grep {/[A-Z]/} #seq_prot;
#number of sequences
print OUTPUT "Number of sequences:", scalar (grep {defined} #seq_prot), "\n";
#selection of amino acid sequence. From $start to $end.
my #vertical_array;
while ( my $line = <#seq_prot> ) {
chomp $line;
my #split_line = split //, $line;
for my $index ( $start..$end ) { #AA position, extracts whole columns
$vertical_array[$index] .= $split_line[$index];
}
}
# Print out your vertical lines
for my $line ( #vertical_array ) {
my $extract = say OUTPUT for unpack "(a200)*", $line; #split at end of each column
#extract = grep {defined} $extract;
}
print OUTPUT #extract; #<--------------- This prints correctly the input
#Count selected amino acids excluding '-'.
my %counter;
while (my $column = #extract) {
print #extract; #<------------------------ Empty print, no input found
}
Update: Found the main problem to be with the unpack command, I thought I could utilize it to split my columns of my input at X elements (43 in this case). While this works, the minute I change $start to another number that is not 0 (say 200), the code brings up errors. Probably has something to do with the number of column elements does not match the lines. Will keep updated.
Write your last while loop the same way as your previous for loop. The assignment
my $column = #extract
is in scalar context, which does not give you the same result as:
for my $column (#extract)
Instead, it will give you the number of elements in the array. Try this second option and it should work.
However, I still have a concern, because in fact, if #extract had anything in it, you would obtain an infinite loop. Is there any code that you did not include between your two commented lines?

2d arr explicitpackage

I've looked through several threads on websites including this one to try and understand why I am getting an undeclared variable error for my usage of my $line . Each element of the #lines array is an array of strings.
The error is in line 25 and 27 with the $line[$count] statement
use strict;
use warnings;
my #lines;
my #sizes;
# read input from stdin file into 2d array
while(<>)
{
push(#lines, my #tokens = split(/\s+/, $_));
}
# search through each array for largest sizes in
# corresponding elements
for (my $count = 0; $count <= 5; $count++)
{
push(#sizes, 0);
foreach my $line (#lines)
{
if(length($line[$count])>$sizes[$count])
{
$sizes[$count] = length($line[$count]);
}
}
}
I can post the full code if it is necessary, but I am pretty sure the error must be in here somewhere.
The problem is here:
push(#lines, my #tokens = split(/\s+/, $_));
Pushing one array into another just adds all elements to the first array. So you are making a really long one dimensional array.
To fix this, use brackets to make an array reference:
push #lines, [ split(/\s+/, $_) ]; #No need for a temp variable.
Also, to access the array reference, you have to de-reference it. Both of these syntaxes are options:
${$line}[$count];
$line->[$count];
I think the second syntax is more readable.
Update: Also, you could simplify your code if you keep track of the longest lengths while you go through the file:
use strict;
use warnings;
use List::Util qw/max/;
my #lines;
my #sizes = (0)x6;
while(<>)
{
push #lines, [ my #tokens = split ];
#sizes = map { max ( length($tokens[$_]), $sizes[$_] ) } 0..$#tokens;
}
Note: The Data::Dumper core module is an invaluable tool when working with complex data structures in Perl.
use Data::Dumper;
print Dumper #lines;
This will print out the complete structure of whatever variable you give it. That way you can see if you actually created what you thought you did.

Resources