Perl matching multidimensional array elements

I'm not getting any output. Can anyone see where the issue lies: in the matching or in the calling?
(The two subarrays in the multidimensional array have the same length.)
# Multidimensional array:
# @idarray = FASTA IDs, @seqarray = "ATTGTTGGT"-style sequences
@ordarray = (\@idarray, \@seqarray);
# This calling works
print $ordarray[0][0] , "\n";
print $ordarray[1][0] , "\n", "\n";
# @ordarray output = "TTGTGGCACATAATTTGTTTAATCCAGAT....."
The user inputs a search string; the loop iterates over the sequence dimension and counts the number of matches. It then prints the number of matches and the corresponding ID from the ID dimension.
# The user input search string
$sestri = <>;
for($r=0;$r<@idarray;$r++) {
    if ($sestri =~ $ordarray[1][$r] ){
        print $ordarray[0][$r] , "\n";
        $counts = () = $ordarray[0][$r] =~ /$sestri/g;
        print "number of counts: ", $counts ;
    }
}

I think the problem lies with this:
$sestri = <>;
That may well not be doing what you intended. Your comment says it is the user's search string, but that's not what that operator does.
What it actually does is open the file(s) named on the command line (or read STDIN if none are given) and return the first line, complete with its trailing newline.
If you want to grab a search string from the command line, I would suggest doing it via @ARGV.
E.g.
my ( $sestri ) = @ARGV; # will give first word.
However, please, please, please switch on use strict and use warnings. You should always do this before posting on a forum for assistance.
I would also question why you need a two-dimensional array with only two elements in it. It seems unnecessary.
Why not instead make a hash, and key your "fasta ids" to the sequence?
E.g.
my %id_of;
@id_of{@seqarray} = @idarray;
my %seq_of;
@seq_of{@idarray} = @seqarray;
I think this would suit your code a bit better, because then you don't have to worry about the array indices at all.
use strict;
use warnings;
my ($sestri) = @ARGV;
# @idarray and @seqarray are assumed to be filled from your FASTA data
my ( @idarray, @seqarray );
my %id_of;
@id_of{@seqarray} = @idarray;
foreach my $sequence ( keys %id_of ) {
    ##NB - this is a pattern match, and will be 'true'
    ## if $sestri is a substring of $sequence
    if ( $sequence =~ m/$sestri/ ) {
        print $id_of{$sequence}, "\n";
        my $count = () = $sequence =~ m/$sestri/g;
        print "number of counts: ", $count, "\n";
    }
}
I've rewritten it a bit, because I don't entirely understand what your code is doing. It looks like it's substring matching in @seqarray but then returning the count of matching elements in @idarray. I don't think that makes sense, but if it does, amend it according to your needs.
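For experimenting, here is a self-contained sketch of the hash-based approach. The IDs and sequences are made-up placeholders for your @idarray and @seqarray, the default search string 'ATTG' is hypothetical, and count_matches is an illustrative helper name. It also wraps the search string in \Q...\E so any regex metacharacters in it are matched literally.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for @idarray and @seqarray.
my @idarray  = ( 'seq1', 'seq2', 'seq3' );
my @seqarray = ( 'ATTGTTGGT', 'GGCATTGATTG', 'CCCCC' );

# Key each sequence to its FASTA ID with a hash slice.
my %id_of;
@id_of{@seqarray} = @idarray;

# Hypothetical helper: count non-overlapping occurrences of $pattern.
# \Q...\E makes the search string match literally.
sub count_matches {
    my ( $sequence, $pattern ) = @_;
    my $count = () = $sequence =~ m/\Q$pattern\E/g;
    return $count;
}

# First command-line argument, or a hypothetical default.
my ($sestri) = @ARGV;
$sestri = 'ATTG' unless defined $sestri;

for my $sequence ( sort keys %id_of ) {
    my $count = count_matches( $sequence, $sestri );
    next unless $count;    # skip sequences without a match
    print $id_of{$sequence}, "\n";
    print "number of counts: $count\n";
}
```

Two things to keep in mind: the //g count is of non-overlapping matches (so "AA" is found once in "AAA"), and if two IDs share an identical sequence, the hash keeps only the last ID.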

Related

How to find index of string in array Perl without iterating

I need to find a value in an array without iterating through the whole array.
I read an array of strings from a file, and I need the index of some value in this array. I have tried this code, but it doesn't work:
my @array = <$file>;
my $search = "SomeValue";
my $index = first { $array[$_] eq $search } 0 .. $#array;
print "index of $search = $index\n";
Please suggest how I can get the index of the value. Better still, how can I get all the matching indexes if there is more than one entry?
Thanks in advance.
What does "it doesn't work" mean?
The code you have will work fine, except that an element in the array is going to be "SomeValue\n", not "SomeValue". You can remove the newlines with chomp(@array) or include a newline in your $search string.
Your initial question: "I need to find value in array without iterating through whole array."
You can't. It is impossible to check every element of an array without checking every element of an array. The very best you can do is stop looking once you've found it, but your question suggests you want multiple matches.
There are various options that will do this for you - like List::Util and grep. But they are still doing a loop, they're just hiding it behind the scenes.
The reason first doesn't work for you is probably that you need to import it from List::Util first. Alternatively, you forgot to chomp, which means your list includes line feeds where your search string doesn't.
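Putting both fixes together, a minimal sketch might look like this. The array contents are made up and stand in for the lines read from $file. first stops at the first hit, while grep over the indices collects every match (and therefore always scans the whole list).

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical lines, as they would arrive from <$file> (with newlines).
my @array = ( "Foo\n", "SomeValue\n", "Bar\n", "SomeValue\n" );
chomp(@array);    # strip the trailing newlines

my $search = "SomeValue";

# Index of the first match (undef if there is none).
my $index = first { $array[$_] eq $search } 0 .. $#array;

# Every matching index.
my @indexes = grep { $array[$_] eq $search } 0 .. $#array;

print "first index of $search = $index\n" if defined $index;
print "all indexes of $search = @indexes\n";
```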
Anyway - in the interests of actually giving something that'll do the job:
while ( my $line = <$file> ) {
chomp ( $line );
#could use regular expression based matching for e.g. substrings.
if ( $line eq $search ) { print "Match on line $.\n"; last; }
}
If you want every match, omit the last;
Alternatively - you can match with:
if ( $line =~ m/\Q$search\E/ ) {
This does a substring match (which in turn means the line feeds are irrelevant).
So you can do this instead:
while ( <$file> ) {
print "Match on line $.\n" if m/\Q$search\E/;
}

Comparing two strings line by line in Perl

I am looking for code in Perl similar to
my @lines1 = split /\n/, $str1;
my @lines2 = split /\n/, $str2;
for (int $i=0; $i<lines1.length; $i++)
{
if (lines1[$i] ~= lines2[$i])
print "difference in line $i \n";
}
To compare two strings line by line and show the lines at which there is any difference.
I know what I have written is a mixture of C/Perl/pseudo-code. How do I write it so that it works in Perl?
What you have written is sort of OK, except that you cannot use the notations lines1.length or int $i in Perl; ~= is not an operator (you mean =~, but that is the wrong tool here); array elements need their sigil, as in $lines1[$i]; and if must be followed by a block { }.
What you want is simply $i < @lines1 to get the array size, my $i to declare a lexical variable, and eq for string comparison. Along with if ( ... ) { ... }.
Technically you can use the binding operator to perform a string comparison, for example:
"foobar" =~ "foo"    # true, although the two strings are not equal
But it is not a good idea when comparing literal strings, because you can get partial matches, and you need to escape meta characters. Therefore it is easier to just use eq.
Using C-style for loops is valid, but the more Perl-ish way is to use this notation:
for my $i (0 .. $#lines1)
This iterates over the range from 0 to the maximum index of the array.
Perl allows you to open filehandles on strings by using a reference to the scalar variable that holds the string:
open my $string1_fh, '<', \$string1 or die '...';
open my $string2_fh, '<', \$string2 or die '...';
while( my $line1 = <$string1_fh> ) {
my $line2 = <$string2_fh>;
....
}
But, depending on what you mean by difference (does that include insertion or deletion of lines?), you might want something different.
There are several modules on CPAN that you can inspect for ideas, such as Test::LongString or Algorithm::Diff.
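If a simple position-by-position comparison is enough, the loop above can be filled in roughly as follows. This sketch assumes that a line present in only one of the strings counts as a difference; it does not detect insertions or deletions the way Algorithm::Diff would. The sample strings and the differing_lines name are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical input strings.
my $string1 = "alpha\nbeta\ngamma\n";
my $string2 = "alpha\nBETA\ngamma\ndelta\n";

# Return the 1-based numbers of the lines that differ between two strings.
# A line missing from one string counts as a difference.
sub differing_lines {
    my ( $str1, $str2 ) = @_;
    open my $fh1, '<', \$str1 or die "Cannot open string 1: $!";
    open my $fh2, '<', \$str2 or die "Cannot open string 2: $!";

    my @diffs;
    my $line_no = 0;    # own counter, since $. tracks only the last handle read
    while (1) {
        my $line1 = <$fh1>;
        my $line2 = <$fh2>;
        last if !defined $line1 && !defined $line2;    # both exhausted
        $line_no++;
        push @diffs, $line_no
            if !defined $line1 || !defined $line2 || $line1 ne $line2;
    }
    return @diffs;
}

print "difference in line $_\n" for differing_lines( $string1, $string2 );
```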
my @lines1 = split(/^/, $str1);
my @lines2 = split(/^/, $str2);
# splits at the start of each line, keeping the newlines
# use /\n/ instead if you want the lines without their newlines
for (my $i = 0; $i < @lines1; $i++) {
    print "difference in line $i \n" if ($lines1[$i] ne ($lines2[$i] // ''));
}
Comparing arrays is much easier if you build a hash out of them:
# Searching the difference
my @isect = ();
my @diff  = ();
my %count = ();
foreach my $item ( @array1, @array2 ) { $count{$item}++; }
foreach my $item ( keys %count ) {
    if ( $count{$item} == 2 ) {
        push @isect, $item;
    }
    else {
        push @diff, $item;
    }
}
# Output
print "Different= @diff\n\n";
print "\nA Array = @array1\n";
print "\nB Array = @array2\n";
print "\nIntersect Array = @isect\n";
Even after splitting, you could compare the lines as arrays this way, although it ignores line order, so it won't tell you which line differs.
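One caveat with the counting approach is that a value repeated within a single array also reaches a count of 2 and lands in the intersection. A variant that keeps one membership hash per array avoids that. The sample arrays are made up, and like the original, this ignores order.

```perl
use strict;
use warnings;

# Hypothetical input arrays (e.g. the result of split /\n/).
my @array1 = qw(apple banana cherry banana);
my @array2 = qw(banana cherry date);

# One membership hash per array, so duplicates inside a single
# array cannot be mistaken for an element present in both.
my %in1 = map { $_ => 1 } @array1;
my %in2 = map { $_ => 1 } @array2;
my %union = ( %in1, %in2 );

# Elements in both arrays, and elements in exactly one of them.
my @isect = grep { $in2{$_} } sort keys %in1;
my @diff  = grep { !( $in1{$_} && $in2{$_} ) } sort keys %union;

print "Different = @diff\n";
print "Intersect = @isect\n";
```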

Perl regex not matching as expected

I'm trying to compare each word in a list to a string to find matching words, but I can't seem to get this to work.
Here is some sample code
my $sent = "this is a test line";
foreach (@keywords) { # array of words (contains the word 'test')
if ($sent =~ /$_/) {
print "match found";
}
}
It seems to work if I manually enter /test/ instead of $_, but I can't enter words manually.
Your code works fine. I hope you have use strict and use warnings in place in the real program? Here is an example where I have populated @keywords with a few items including test.
use strict;
use warnings;
my $sent = "this is a test line";
my @keywords = qw/ a b test d e /;
foreach (@keywords) {
if ($sent =~ /$_/) {
print "match found\n";
}
}
output
match found
match found
match found
So your array doesn't contain what you think it does. I would bet that you've read the data from a file or from the keyboard and forgot to remove the newline from the end of each word with chomp.
You can do that by simply writing
chomp @keywords;
which will remove a newline (if there is one) from the end of all elements of @keywords. To see the real contents of @keywords, you can add these lines to your program:
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper \@keywords;
You will also see that the elements a and e produce a match as well as test, which I guess you don't want. You could add a word boundary metacharacter \b before and after the value of $_, like this
foreach (@keywords) {
if ( $sent =~ /\b$_\b/ ) {
print "match found\n";
}
}
but a regular expression's definition of a word is very restrictive and allows only alphanumeric characters or an underscore _, so Roger's, "essay", 99%, and nicely-formatted are not "words" in this sense. Depending on your actual data you may want something different.
Finally, I would write this loop more compactly using for instead of foreach (they are identical in every respect) and the postfixed statement modifier form of if, like this
for (@keywords) {
print "match found\n" if $sent =~ /\b$_\b/;
}
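Combining the chomp and word-boundary suggestions gives a sketch like the one below. The newline-terminated keywords are made up to mimic data read from a file, and the \Q...\E escaping is an addition of mine so that keywords containing regex metacharacters are matched literally.

```perl
use strict;
use warnings;

my $sent = "this is a test line";

# Hypothetical keywords, as they might arrive from a file (newlines included).
my @keywords = ( "a\n", "b\n", "test\n", "d\n", "e\n" );
chomp @keywords;    # remove the trailing newline from every element

# \b...\b requires whole-word matches; \Q...\E makes any regex
# metacharacters in a keyword match literally.
my @found = grep { $sent =~ /\b\Q$_\E\b/ } @keywords;

print "match found: $_\n" for @found;
```

With this sentence, e no longer matches, but a still does, because it really is a whole word in "this is a test line".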

Can I use the contents of an array as the keys of a hash?

I want my array to become the keys of my new hash. I am writing a program that counts the number of word occurrences in a document.
my @array = split(" ", $line);
keys my %word_count = @array; # This does nothing
This code runs while I am reading the input file line by line. I am trying to find a way to complete this project using hashes. The words are the keys, and the number of times they appear are the values. But this step in particular is puzzling me.
Use a hash slice.
my %word_count;
@word_count{split ' ', $line} = ();
# if you like wasting memory:
# my @array = split ' ', $line;
# @word_count{@array} = (0) x @array;
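A quick way to see what the hash slice does, using a made-up line: every word becomes a key, but all the values are undef, so the slice records which words occur, not how often. exists still finds the keys.

```perl
use strict;
use warnings;

my $line = "the cat saw the dog";    # hypothetical input line

my %word_count;
@word_count{ split ' ', $line } = ();    # every word becomes a key (value undef)

# exists tests for a key even though its value is undef.
print "have 'cat'\n" if exists $word_count{cat};
print join( ' ', sort keys %word_count ), "\n";
```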
You can't do it that way, certainly.
my %word_count = map {($_, 0)} @array;
would initialize the keys of the hash; but generally in Perl you don't want to do that. Two issues here are that
you need a second pass to actually account for the words in the line;
you can't cheat and change the 0 to 1 above, because if a word is repeated in the line you will count it only once, the others being overwritten.
my %word_count = map { $_ => 0 } split(" ", $line);
You're trying to count the number of occurrences of words in a line, right? If so, you want
my %word_count;
++$word_count{$_} for split(' ', $line);    # ' ' also skips leading whitespace
Or, turning it on its head so that the definition of a word is easier to refine:
my %word_count;
++$word_count{$_} for $line =~ /(\S+)/g;
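Since you are reading the input file line by line, the counting statement just goes inside the read loop. A sketch, with a made-up in-memory string standing in for your input file:

```perl
use strict;
use warnings;

# Hypothetical input; a real program would open the input file instead.
my $text = "the cat saw the dog\n  the dog ran\n";
open my $in, '<', \$text or die "Cannot open input: $!";

my %word_count;
while ( my $line = <$in> ) {
    # Each \S+ match is one "word"; leading spaces produce no empty words.
    ++$word_count{$_} for $line =~ /(\S+)/g;
}
close $in;

printf "%-5s %d\n", $_, $word_count{$_} for sort keys %word_count;
```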

How to print 'AND' between array elements?

If I have an array of names like the one below, how do I print "Hi joe and jack and john"?
The algorithm should also work, when there is only one name in the array.
#!/usr/bin/perl
use warnings;
use strict;
my @a = qw /joe jack john/;
my $mesg = "Hi ";
foreach my $name (@a) {
if ($#a == 0) {
$mesg .= $name;
} else {
$mesg .= " and " . $name;
}
}
print $mesg;
Usually we use join to accomplish this. Here is an example:
my @array = qw[name1 name2 name3];
print "Hey ", join(" and ", @array), ".";
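Wrapped in a small sub (the name greet is just for illustration), the join approach also covers the single-name case, because join only inserts the separator between elements:

```perl
use strict;
use warnings;

# join puts " and " only between elements,
# so a single name gets no separator at all.
sub greet {
    my @names = @_;
    return "Hi " . join( " and ", @names );
}

print greet(qw/joe jack john/), "\n";    # Hi joe and jack and john
print greet('joe'), "\n";                 # Hi joe
```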
Untested:
{ local $, = " and "; print "Hi "; print @a; }
Just use the special variable $".
$" = " and "; #" (this last double quote is to help the syntax coloring)
$mesg = "Hi @a";
To collect the perldoc perlvar answers, you may do one of (at least) two things.
1) Set $" (list separator):
When an array or an array slice is interpolated into a double-quoted
string or a similar context such as /.../ , its elements are separated
by this value. Default is a space. For example, this:
print "The array is: @array\n";
is equivalent to this:
print "The array is: " . join($", @array) . "\n";
=> $" affects the behavior of the interpolation of the array into a string
2) Set $, (output field separator):
The output field separator for the print operator. If defined, this
value is printed between each of print's arguments. Default is undef.
Mnemonic: what is printed when there is a "," in your print statement.
=> $, affects the behavior of the print statement.
Either will work, and either may be used with local to set the value of the special variable only within an enclosing scope. I guess the difference is that with $" you are not limited to the print command:
my @z = qw/ a b c /;
local $" = " and ";
my $line = "@z";
print $line;
Here the "magic" happens on the third line, not at the print command.
In truth, though, join is the most readable option. Unless you use a small enclosing block, a future reader might miss the setting of a magic variable (say, near the top of the file) and never realize why the output differs from the normal behavior. I would save these tricks for small one-offs and one-liners and use the readable join in production code.
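To make the scoping point concrete, here is a small sketch in which local confines the change to $" to a bare block, so interpolation outside the block goes back to the default single space:

```perl
use strict;
use warnings;

my @z = qw/ a b c /;

my $inside;
{
    local $" = " and ";    # changed only until the end of this block
    $inside = "@z";
}
my $outside = "@z";        # $" is back to its default single space

print "$inside\n";         # a and b and c
print "$outside\n";        # a b c
```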
