use edit distance on arrays in perl - arrays

I am attempting to compare the edit distance between two arrays. I have tried using Text:Levenshtein.
#!/usr/bin/perl -w
use strict;
use Text::Levenshtein qw(distance);
my #words = qw(four foo bar);
my #list = qw(foo fear);
my #distances = distance(#list, #words);
print "#distances\n";
#results: 3 2 0 3
I however want the results to appear as follows:
2 0 3
2 3 2
Taking the first element of #list through the array of #words and doing the same through out the rest of the elements of #list.
I plan on upscaling this to a much larger arrays.

I'm not sure to understand exactly what you meant, but I think this is what you expect :
#!/usr/bin/perl -w
use strict;
use Text::Levenshtein qw(distance);
my #words = qw(four foo bar);
my #list = qw(foo fear);
foreach my $word (#list) {
my #distances = distance($word, #words);
print "#distances\n";
}

Taking the first element of #list through the array of #words and doing the same through out the rest of the elements of #list.
You just described exactly what you need to do to get the output you would like; loop through the #list array and for each element compute the distance for all elements of the #words array.

Related

create a hash of occurrences in an array with map

I seem to recall that there was a "clever" way to create a hash from an array with Perl with map such that the keys are the elements of the the array and the values are the number of times the element appears. Something like this, although this does not work:
$ perl -e '#a = ('a','a','b','c'); %h = map { $_ => $_ + 1 } #a ; foreach $k (keys (%h)) { print "$k -> $h{$k}\n"}'
c -> 1
b -> 1
a -> 2
$
Am I imagining things? How can I do this?
You can write map {$h{$_}++} #a ignoring its return value, but why would you do this? for (#a){$h{$_}++} is easy enough to type.
So why would you not do it?
map is meant for transforming lists. It takes an input list and generates an output list. It can confuse a reader if you use it in a different way using side effect instead of output.
Also, although map is optimized to not create the output list when called in void context, it is slower:
use warnings;
use strict;
use Benchmark qw/cmpthese/;
my #in = map {chr(int(rand(127)+1))} 1..10000;
my %out;
cmpthese(10000,
{stmtfor => sub{%out = (); $out{$_}++ for #in},
voidmap => sub{%out = (); map {$out{$_}++} #in;},
}
);
__END__
Rate voidmap stmtfor
voidmap 2075/s -- -17%
stmtfor 2513/s 21% --
I'm not sure this is what you are looking for, but you can easily do that with a for rather than a map:
$ perl -e '#a = ('a','a','b','c'); $h{$_}++ for #a; foreach $k (keys (%h)) { print "$k -> $h{$k}\n"}'
c -> 1
b -> 1
a -> 2

Replicate the array number of times in Perl

I have array which contains 5 elements (1,2,3,4,5). I want to replicate this a number of times based on the value set in the scalar $no_of_replication, e.g. 3.
So that my final array would contain (1,2,3,4,5,1,2,3,4,5,1,2,3,4,5).
Here is what I have tried. It gives me scalar content instead of elements.
use strict;
use warnings;
use Data::Dumper;
my #array = (1,2,3,4,5);
print Dumper(\#array);
my $no_of_replication = 3;
my #new_array = #array * $no_of_replication;
print Dumper(\#new_array);
My array(#new_array) should be like (1,2,3,4,5,1,2,3,4,5,1,2,3,4,5).
The operator for that is x and you need to be careful with the array syntax:
#new_array = ( #array ) x $no_of_replication;
Found the solution here:
Multiplying strings and lists in perl via Archive.org

Extract number from array in Perl

I have a array which have certain elements. Each element have two char "BC" followed by a number
e.g - "BC6"
I want to extract the number which is present and store in a different array.
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
my #band = ("BC1", "BC3");
foreach my $elem(#band)
{
my #chars = split("", $elem);
foreach my $ele (#chars) {
looks_like_number($ele) ? 'push #band_array, $ele' : '';
}
}
After execution #band_array should contain (1,3)
Can someone please tell what I'm doing wrong? I am new to perl and still learning
To do this with a regular expression, you need a very simple pattern. /BC(\d)/ should be enough. The BC is literal. The () are a capture group. They save the match inside into a variable. The first group relates to $1 in Perl. The \d is a character group for digits. That's 0-9 (and others, but that's not relevant here).
In your program, it would look like this.
use strict;
use warnings;
use Data::Dumper;
my #band = ('BC1', 'BC2');
my #numbers;
foreach my $elem (#band) {
if ($elem =~ m/BC(\d)/) {
push #numbers, $1;
}
}
print Dumper #numbers;
This program prints:
$VAR1 = '1';
$VAR2 = '2';
Note that your code had several syntax errors. The main one is that you were using #band = [ ... ], which gives you an array that contains one array reference. But your program assumed there were strings in that array.
Just incase your naming contains characters other than BC this will exctract all numeric values from your list.
use strict;
use warnings;
my #band = ("AB1", "BC2", "CD3");
foreach my $str(#band) {
$str =~ s/[^0-9]//g;
print $str;
}
First, your array is an anonymous array reference; use () for a regular array.
Then, i would use grep to filter out the values into a new array
use strict;
use warnings;
my #band = ("BC1", "BC3");
my #band_array = grep {s/BC(\d+)/$1/} #band;
$"=" , "; # make printing of array nicer
print "#band_array\n"; # print array
grep works by passing each element of an array in the code in { } , just like a sub routine. $_ for each value in the array is passed. If the code returns true then the value of $_ after the passing placed in the new array.
In this case the s/// regex returns true if a substitution is made e.g., the regex must match. Here is link for more info on grep

Perl array element comparing

I am new in Perl programming. I am trying to compare the two arrays each element. So here is my code:
#!/usr/bin/perl
use strict;
use warnings;
use v5.10.1;
my #x = ("tom","john","michell");
my #y = ("tom","john","michell","robert","ricky");
if (#x ~~ #y)
{
say "elements matched";
}
else
{
say "no elements matched";
}
When I run this I get the output
no elements matched
So I want to compare both array elements in deep and the element do not matches, those elements I want to store it in a new array. As I can now compare the only matched elements but I can't store it in a new array.
How can I store those unmatched elements in a new array?
Please someone can help me and advice.
I'd avoid smart matching in Perl - e.g. see here
If you're trying to compare the contents of $y[0] with $x[0] then this is one way to go, which puts all non-matches in an new array #keep:
use strict;
use warnings;
use feature qw/say/;
my #x = qw(tom john michell);
my #y = qw(tom john michell robert ricky);
my #keep;
for (my $i = 0; $i <$#y; $i++) {
unless ($y[$i] eq $x[$i]){
push #keep, $y[$i];
}
}
say for #keep;
Or, if you simply want to see if one name exists in the other array (and aren't interested in directly comparing elements), use two hashes:
my (%x, %y);
$x{$_}++ for #x;
$y{$_}++ for #y;
foreach (keys %y){
say if not exists $x{$_};
}
It would be well worth your while spending some time reading the Perl FAQ.
Perl FAQ 4 concerns Data Manipulation and includes the following question and answer:
How do I compute the difference of two arrays? How do I compute
the intersection of two arrays?
Use a hash. Here's code to do both and more. It assumes that each
element is unique in a given array:
my (#union, #intersection, #difference);
my %count = ();
foreach my $element (#array1, #array2) { $count{$element}++ }
foreach my $element (keys %count) {
push #union, $element;
push #{ $count{$element} > 1 ? \#intersection : \#difference }, $element;
}
Note that this is the symmetric difference, that is, all elements
in either A or in B but not in both. Think of it as an xor
operation.

Remove elements from an array which have a substring that is itself an element of the array

In Perl, I'd like to remove all elements from an array where another element of the same array is a non-empty substring of said element.
Say I have the array
#itemlist = ("abcde", "ab", "khi", "jklm");
In this instance I would like to have the element "abcde" removed, because "ab" is a substring of "abcde".
I could make a copy of the array (maybe as a hash?), iterate over it, try to index with every element of the original array and remove it, but there has to be a more elegant way, no?
Thanks for your help!
Edited for clarity a bit.
You could construct a regex from all the items and throw out anything that matches:
$alternation = join('|', map(quotemeta, #itemlist));
#itemlist = grep !/($alternation).|.($alternation)/, #itemlist;
The ().|.() thing just ensures that an item doesn't match itself.
Well, I wouldn't call this elegant, but here goes:
#!usr/bin/perl
use strict;
use warnings;
my #itemlist = ("abcde", "ab", "khi", "jklm");
#itemlist = grep {
#itemlist ~~ sub {$_ !~ /(?:.\Q$_[0]\E|\Q$_[0]\E.)/}
} #itemlist;
print "#itemlist";
It relies on a rather obscure behavior of smart match: if the left argument is an array and the right argument a sub, it calls the sub for each element, and the final result is true only if the sub returns true for each element.
Explanation: for each element of the array, it checks that no other element is a substring of that element (requiring at least one additional character so that elements won't match themselves).
Note: wdebeaum's answer is probably the one I would prefer in the real world. Still, it is kind of interesting the strange things one can do with smart match.
wdebeaum's answer is the solution to use, not the one below, but I learned something by doing it and perhaps someone else will too. After I had written mine I decided to test it on lists of several thousand elements.
b.pl:
#!/usr/bin/perl
use strict;
use warnings;
my #itemlist = <>;
for(#itemlist) { chomp; }
my $regex;
if(defined $ENV{wdebeaum}) {
# wdebeaum's solution
my $alternation = join('|', map(quotemeta, #itemlist));
$regex = qr/(?:$alternation).|.(?:$alternation)/;
} else {
# my solution
$regex = join "|", map {qq{(?:\Q$_\E.)|(?:.\Q$_\E)}} #itemlist;
}
my #result = grep !/$regex/, #itemlist;
print scalar #itemlist, "\t", scalar #result, "\n";
I generated a list of 5000 random words.
sort -R /usr/share/dict/american-english|head -5000 > some-words
For small lists both solutions seem fine.
$ time head -200 some-words | wdebeaum=1 ./b.pl
200 198
real 0m0.012s
user 0m0.004s
sys 0m0.004s
$ time head -200 some-words | ./b.pl
200 198
real 0m0.068s
user 0m0.060s
sys 0m0.004s
But for larger lists, wdebeaum's is clearly better.
$ time cat some-words | wdebeaum=1 ./b.pl
5000 1947
real 0m0.068s
user 0m0.064s
sys 0m0.000s
$ time cat some-words | ./b.pl
5000 1947
real 0m8.305s
user 0m8.277s
sys 0m0.012s
I think the reason for the difference, is that even though both regular expressions have the same number of possible paths, my regex has more paths that have to be tried, since it has the same number of .s as paths, while wdebebaum's has only two.
You can use a hash to count substrings of all the words. Any word in the list that has a higher count than one is then a substring of another word. The minimum length of the substrings is two in this example:
use strict;
use warnings;
use feature 'say';
my #list = qw(abcde ab foo foobar de oba cd xs);
my %count;
for my $word (#list) {
my $len = length $word;
$count{$word}++;
for my $start (0 .. $len - 2) {
for my $long (2 .. $len - 2) {
my $sub = substr($word, $start, $long);
$count{$sub}++;
}
}
}
say for grep $count{$_} == 1, #list;
Output:
abcde
foobar
xs
The following will remove the substring from the array.
#!/usr/bin/perl
use strict;
use warnings;
my #ar=("asl","pwe","jsl","nxu","sl","baks","ak");
foreach my $i (#ar){
my $p = grep /$i/, #ar;
if ( $p == 1 ){
print "$i" , "\n";
}
}
I had the inverse problem: removing from the list strings which are substrings of other strings. Here is my not-too-elegant solution.
sub remove_substrings_from_list {
my #list = #_;
my #vals_without_superstrings;
my %hash_of_others;
for ( 0 .. $#list ) {
my $a = shift #list;
$hash_of_others{$a} = [ #list ];
push #list, $a;
}
foreach my $k ( keys %hash_of_others ) {
push #vals_without_superstrings, $k unless grep { index( $_, $k ) != -1 } #{ $hash_of_others{$k} };
}
return #vals_without_superstrings;
}

Resources