Lets say the array is (1,2,3,4,5,6,7,8,9),
Another subarray is (2,3,4)
Is there a function to check if the subarray pattern(full exact order) exists within array?
In this case, it would return any indicator(index) that shows it exists.
Also would need to work for duplicates if there are multiple subarrays existing in the array like (4,2,3,4,2,3,4).
If it happens to match multiple times for example:
Array = (2,3,2,3,2,2,3,2)
Sub Array = (2,3,2)
Would just return starting index of matches in order: 0,2,5
Or if it removes, would result in (3,2)
Edit: Elements don't have to be num
There's no built-in method, but it's easy to write:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
# Takes two arrayrefs of numbers.
#
# Returns the first index in the first one where the second list appears, or
# -1 if not found.
sub find_sublist(++) {
my ($haystack, $needle) = #_;
my $nlen = #$needle;
my $hlen = #$haystack;
return -1 if $hlen == 0 || $nlen == 0;
HAYSTACK_POS:
for (my $n = 0; $n <= $hlen - $nlen; $n++) {
for (my $m = 0; $m < $nlen; $m++) {
if ($haystack->[$n + $m] != $needle->[$m]) {
next HAYSTACK_POS;
}
}
return $n;
}
return -1;
}
# Takes two arrayrefs of numbers.
#
# Returns a list of the starting indexes of the first list
# of every run of the second list. Returns an empty list if
# there are no matches.
sub find_sublists(++) {
my ($haystack, $needle) = #_;
my $nlen = #$needle;
my $hlen = #$haystack;
my #positions;
return #positions if $hlen == 0 || $nlen == 0;
HAYSTACK_POS:
for (my $n = 0; $n <= $hlen - $nlen; $n++) {
for (my $m = 0; $m < $nlen; $m++) {
if ($haystack->[$n + $m] != $needle->[$m]) {
next HAYSTACK_POS;
}
}
push #positions, $n;
}
return #positions;
}
# Takes two arrayrefs of numbers.
#
# Returns a new list that is the first one with every non-overlapping run of
# the second second list removed.
sub remove_sublists(++) {
my #haystack = #{$_[0]};
my $needle = $_[1];
while ((my $pos = find_sublist #haystack, $needle) != -1) {
splice #haystack, $pos, #$needle;
}
return #haystack;
}
my #list1 = (1,2,3,4,5,6,7,8,9);
my #list2 = (4,2,3,4,2,3,4);
my #list3 = (2,3,2,3,2,2,3,2);
say find_sublist(#list1, [2, 3, 4]); # Returns 1
say find_sublist([2,9,3,4], [2,3,4]); # Returns -1
my #positions = find_sublists(#list2, [2,3,4]); # 1,4
say join(",", #positions);
#positions = find_sublists(#list3, [2,3,2]); # 0,2,5
say join(",", #positions);
say join(",", remove_sublists(#list1, [2,3,4])); # 1,5,6,7,8,9
say join(",", remove_sublists(#list3, [2,3,2])); # 3,2
If the inputs are numbers representable by your perl's integers (as shown), you can use
# Indexes
my $pattern = pack "W*", #pattern;
my $array = pack "W*", #array;
my #indexes;
push #indexes, $-[0] while $array =~ /\Q$pattern/g;
# Removal
my $pattern = pack "W*", #pattern;
my $array = pack "W*", #array;
$array =~ s/\Q$pattern//g;
#array = unpack "W*", $array;
How it handles overlaps:
/---\ /---\ Removed
2,3,2 from 2,3,2,3,2,2,3,2
\---/ Not removed
Note that this also works if you can map the inputs to numbers.
my ( %map_f, #map_r );
for ( #array, #pattern ) {
if ( !exists{ $map{ $_ } } ) {
$map_f{ $_ } = #map_r;
push #map_r, $_;
}
}
my $pattern = pack "W*", #map_f{ #pattern };
my $array = pack "W*", #map_f{ #array };
$array =~ s/\Q$pattern//g;
#array = #map_r[ unpack "W*", $array ];
It's not the best algorithm, but it should be very fast by moving the work from Perl to the regex engine.
I have an array, say #array1 = qw(abc def ghi jkl).
Now, I want to use this array in a way that elements are shifted 1 by 1, but that shifting takes place virtually, and not in the array.
Like, "shift" will shift the elements and remove them from the array. But, I don't want those elements to be removed.
Short Code Snippet:
while (my $rName = shift #array1) {
my $bName = shift #array1 ;
## Do something now with the value
}
##And now, I want that I can use #array1 again with the original elements residing
How can it be implemented?
In Perl 5.36 you'll be able to do this:
for my ($rName, $bName) (#array1) { ... }
Use a C-style for loop and increment by two. $#foo is the index of the last element.
my #foo = 0 .. 5;
for (my $i = 0; $i <= $#foo; $i += 2){
my $r_name = $foo[$i];
my $b_name = $foo[$i+1];
}
If you wanted fancier-looking code, you could use natatime from List::MoreUtils on CPAN, which gives you an iterator that you can use in a while loop.
use List::MoreUtils 'natatime';
my #foo = 0 .. 5;
my $it = natatime 2, #foo;
while ( my ($r_name, $b_name) = $it->() ) {
print "$r_name $b_name\n";
}
You can also use pairs from the core List::Util module:
A convenient shortcut to operating on even-sized lists of pairs, this function returns a list of ARRAY references, each containing two items from the given list.
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;
use List::Util qw/pairs/;
my #array1 = qw/a 1 b 2 c 3/;
for my $pair (pairs #array1) {
my ($rName, $bName) = #$pair;
say "$rName => $bName";
}
say "#array1";
# Require Perl 5.36
for my ( $rName, $bName ) ( #array1 ) {
...
}
my #array2 = #array1;
while ( #array2 ) {
my $rName = shift #array2;
my $bName = shift #array2;
...
}
for my $i ( 0 .. $#array1/2 ) {
my $rName = $array1[ $i * 2 + 0 ];
my $bName = $array1[ $i * 2 + 1 ];
...
}
for ( my $i = 0; $i < #array1; ) {
my $rName = $array1[ $i++ ];
my $bName = $array1[ $i++ ];
...
}
use List::Util qw( pairs );
for ( pairs #array1 ) {
my ( $rName, $bName ) = #$_;
...
}
I tried to organize them from fastest to slowest, but I didn't actually benchmark anything.
Three of the solutions were previously mentioned, but noone mentioned what I think is the fastest solution that doesn't require 5.36 (which hasn't been released yet).
I am trying to split an array into batches of 500, but I want to cover the case where the array size is not evenly divided by 500 as well. For example:
Array = [1,...,1100]
subArray1 = [1..500]
subArray2 = [500..1000]
subArray3 = [1000..1100]
Here is what I have so far:
my #array = [prepopulated];
my $array_size = scalar(#array);
my $start = 0;
my $end = 500;
while ($start < $array_size) {
my #subArray = #array[$start..$end];
print Dumper(\#subArray);
$start = $end;
if($end + 500 > $array_size) {
$end = $start + ($array_size % 500);
} else {
$end += 500;
}
}
This seems to work, but is there a better way to do this or does this break for some case I am not seeing?
List::MoreUtils covers exactly this kind of task:
use strict; use warnings;
use List::MoreUtils qw(natatime);
my #Array = (1..1100);
my $it = natatime 500, #Array;
while (my #vals = $it->()) {
printf "%d\n", 0+#vals;
}
500
500
100
No need to reinvent the wheel.
I think splice would be simpler and cleaner:
use strict;
use warnings;
my #array = (1 .. 1100);
my $batch_size = 500;
while (#array) {
my #batch = splice(#array, 0, $batch_size);
# ...
}
Keep in mind that perl's range operator start..end is inclusive both sides. 0..500 gives you: 0, 1, ..., 499, 500
Assuming you don't want to duplicate boundary elements (e.g. index 500 included in two sub arrays), try the following.
Summary:
change $end initialization to 499
inside the loop set $start = $end+1
when checking if the new end is overflowing use >= instead: $end + 500 >= $array_size
code
my #array = [prepopulated];
my $array_size = scalar(#array);
my $start = 0;
my $end = 499;
while ($start < $array_size) {
my #subArray = #array[$start..$end];
print Dumper(\#subArray);
$start = $end+1;
if($end + 500 >= $array_size) {
$end = $array_size-1;
} else {
$end += 500;
}
}
Creating a second array of indexes for natatime is extremely wasteful.
use List::Util qw( min );
for (my $start=0; $start<#array; $start+=500) {
my $end = min($start+500, 0+#array);
...
}
I am fairly new to Perl.
I have a large array of numbers which is sorted first:
my #numbers = do {
open my $fh, '<', glob("hotfolder/*.out") or die $!;
local $/;
<$fh> =~ /\d+/g;
};
my #sorted_numbers = sort { $a <=> $b } #numbers;
The sorted values are now in the #sorted_numbers array.
Next it needs to find the missing values in this array:
my #missing;
my $i = 0;
for ( $sorted_numbers[ 0 ] .. $sorted_numbers[ -1 ] ) {
++$i, next if $_ == $sorted_numbers[ $i ];
push #missing, "$_\n";
}
This is where I need some help. I am not sure how to achieve the next step.
Each number in the array either starts with 2 (200000) or 3 (300000).
The missing values should only show between these ranges.
Let's say the first range finished at 240000. The other range will start somewhere over 300000.
Of course I do not want to return all the values outside these two ranges as missing.
I tried something along these lines but the syntax and logic is completely wrong:
foreach (my $step = #sorted_numbers) {
if ($step <= 299999) {
my $i = 0;
for ( $sorted_numbers[ 0 ] .. $sorted_numbers[ -1 ] ) {
++$i, next if $_ == $sorted_numbers[ $i ];
push #missing, "$_\n";
}
}
else ($step > 299999) {
my $i = 0;
for ( $sorted_numbers[ 0 ] .. $sorted_numbers[ -1 ] ) {
++$i, next if $_ == $sorted_numbers[ $i ];
push #missing2, "$_\n";
}
}
}
Let me know if something is unclear and I'll be happy to provide further information.
Just check the difference between neighbouring numbers. If it's more than 1 but small enough (see $gap_size below), report the missing numbers:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $gap_size = 50; # Tune this to your needs.
my #numbers = sort { $a <=> $b } map /\d+/g, <>;
for my $i (1 .. $#numbers) {
my $diff = $numbers[$i] - $numbers[$i-1];
if ($diff > 1 && $diff < $gap_size) {
say for $numbers[$i-1] + 1 .. $numbers[$i] - 1;
}
}
I have got one array with milions of floating points number (ordered) and then another one smaller and i need to match the numbers within a certain tolerance (there is no overlap between the values in the big array and the values in the small array) from the small array in the big array. okay so no big deal, this is the perl function for returning a not perfect match within the tolerance, it's inside a for loop there I am looping through the small array values.
sub bin_search{
my ($arr, $v, $t ) = #_;
my ($min, $max) = (0, #$arr-1);
while ($min <= $max) {
my $w = $v + $t;
my $k = $v - $t;
my $try = int( ( $min + $max ) / 2 );
$min = $try + 1, next if $arr -> [$try] < $k ;
$max = $try - 1, next if $arr -> [$try] > $w ;
return $arr -> [$try] ;
}
return 0;
}
but then after checkin my data it seems I have got some values discarded because it is returning just the first match.
I tried grep but it is too slow.
my $min = $val - $t;
my $max = $val + $t;
my #arr2 = grep { ( $_ > $min ) && ($_ < $max ) }#big_arr1;
so I wanted to modify a bit the binary search for returning a range from $min to $max, because I thought once there is one match is either at $min or $max, so something like
sub bin_search{
my ($arr, $v, $t ) = #_;
my ($min, $max) = (0, #$arr-1);
my $w = $v + $t;
my $k = $v - $t;
while ($min <= $max) {
my $try = int( ( $min + $max ) / 2 );
$min = $try + 1, next if $arr -> [$try] < $k ;
$max = $try - 1, next if $arr -> [$try] > $w ;
last;
}
my #fin;
if ( ($arr -> [$try] < $w) && ($arr -> [$try] > $k) ) {
push #fin, $arr ->[$try]; $try++ }
return \#fin;
}
but I am missing some values, and I think that I am missing something, should I look just at one direction at the time? like left until we reach the lower limit then return to $try and do the same thing until higher limit?
Start by finding the index of a matching element using a binary search.
Once you have that, you need to find where the range starts. You can use a binary search for that as well, but a linear search is also acceptable if the number of matching elements is usually small.
Finally, you need to find the end of the range. You can use the same approach as you used for finding the start of the range.
The problem with your solution is that you didn't look for the start of the range.
The following is an untested implementation that uses the linear scan approach (like yours), so it assumes that there will be very few matching elements:
sub binsearch_numeric_range {
my $min = shift;
my $max = shift;
my $array = shift;
return () if !#$array;
my $i = 0;
my $j = $#$array;
my $k;
while (1) {
$k = int(($i+$j)/2);
if ($array->[$k] > $max) {
$j = $k-1;
return () if $i > $j;
}
elsif ($array->[$k] < $min) {
$i = $k+1;
return () if $i > $j;
}
else {
last;
}
}
my $min_k = $k; --$min_k while $min_k > 0 && $array->[$min_k - 1] >= $min;
my $max_k = $k; ++$max_k while $max_k < $#$array && $array->[$max_k + 1] <= $max;
return #$array[$min_k .. $max_k];
}
my #matches = binsearch_numeric_range($v-$t, $v+$t, $arr);
An implementation that doesn't require writing a whole new binsearch:
my $idx = binsearch { abs($a-$b) <= $t ? 0 : $a <=> $b } $v, #$arr;
my #range;
if ($idx >= 0) {
my $min_idx = $idx; --$min_idx while $min_idx > 0 && $arr->[$min_idx-1] >= ($v-$t);
my $max_idx = $idx; ++$max_idx while $max_idx < $#$arr && $arr->[$max_idx+1] <= ($v+$t);
#range = #$array[$min_idx .. $max_idx];
}
The binsearch used is defined here.