algorithm binary search return range perl

algorithm binary search return range perl - arrays

I have got one array with milions of floating points number (ordered) and then another one smaller and i need to match the numbers within a certain tolerance (there is no overlap between the values in the big array and the values in the small array) from the small array in the big array. okay so no big deal, this is the perl function for returning a not perfect match within the tolerance, it's inside a for loop there I am looping through the small array values.
sub bin_search{
my ($arr, $v, $t ) = #_;
my ($min, $max) = (0, #$arr-1);
while ($min <= $max) {
my $w = $v + $t;
my $k = $v - $t;
my $try = int( ( $min + $max ) / 2 );
$min = $try + 1, next if $arr -> [$try] < $k ;
$max = $try - 1, next if $arr -> [$try] > $w ;
return $arr -> [$try] ;
}
return 0;
}
but then after checkin my data it seems I have got some values discarded because it is returning just the first match.
I tried grep but it is too slow.
my $min = $val - $t;
my $max = $val + $t;
my #arr2 = grep { ( $_ > $min ) && ($_ < $max ) }#big_arr1;
so I wanted to modify a bit the binary search for returning a range from $min to $max, because I thought once there is one match is either at $min or $max, so something like
sub bin_search{
my ($arr, $v, $t ) = #_;
my ($min, $max) = (0, #$arr-1);
my $w = $v + $t;
my $k = $v - $t;
while ($min <= $max) {
my $try = int( ( $min + $max ) / 2 );
$min = $try + 1, next if $arr -> [$try] < $k ;
$max = $try - 1, next if $arr -> [$try] > $w ;
last;
}
my #fin;
if ( ($arr -> [$try] < $w) && ($arr -> [$try] > $k) ) {
push #fin, $arr ->[$try]; $try++ }
return \#fin;
}
but I am missing some values, and I think that I am missing something, should I look just at one direction at the time? like left until we reach the lower limit then return to $try and do the same thing until higher limit?

Start by finding the index of a matching element using a binary search.
Once you have that, you need to find where the range starts. You can use a binary search for that as well, but a linear search is also acceptable if the number of matching elements is usually small.
Finally, you need to find the end of the range. You can use the same approach as you used for finding the start of the range.
The problem with your solution is that you didn't look for the start of the range.
The following is an untested implementation that uses the linear scan approach (like yours), so it assumes that there will be very few matching elements:
sub binsearch_numeric_range {
my $min = shift;
my $max = shift;
my $array = shift;
return () if !#$array;
my $i = 0;
my $j = $#$array;
my $k;
while (1) {
$k = int(($i+$j)/2);
if ($array->[$k] > $max) {
$j = $k-1;
return () if $i > $j;
}
elsif ($array->[$k] < $min) {
$i = $k+1;
return () if $i > $j;
}
else {
last;
}
}
my $min_k = $k; --$min_k while $min_k > 0 && $array->[$min_k - 1] >= $min;
my $max_k = $k; ++$max_k while $max_k < $#$array && $array->[$max_k + 1] <= $max;
return #$array[$min_k .. $max_k];
}
my #matches = binsearch_numeric_range($v-$t, $v+$t, $arr);
An implementation that doesn't require writing a whole new binsearch:
my $idx = binsearch { abs($a-$b) <= $t ? 0 : $a <=> $b } $v, #$arr;
my #range;
if ($idx >= 0) {
my $min_idx = $idx; --$min_idx while $min_idx > 0 && $arr->[$min_idx-1] >= ($v-$t);
my $max_idx = $idx; ++$max_idx while $max_idx < $#$arr && $arr->[$max_idx+1] <= ($v+$t);
#range = #$array[$min_idx .. $max_idx];
}
The binsearch used is defined here.

Related

Is there a built in Perl Function for finding duplicate subarrays(exact order) in an array?

Lets say the array is (1,2,3,4,5,6,7,8,9),
Another subarray is (2,3,4)
Is there a function to check if the subarray pattern(full exact order) exists within array?
In this case, it would return any indicator(index) that shows it exists.
Also would need to work for duplicates if there are multiple subarrays existing in the array like (4,2,3,4,2,3,4).
If it happens to match multiple times for example:
Array = (2,3,2,3,2,2,3,2)
Sub Array = (2,3,2)
Would just return starting index of matches in order: 0,2,5
Or if it removes, would result in (3,2)
Edit: Elements don't have to be num

There's no built-in method, but it's easy to write:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
# Takes two arrayrefs of numbers.
#
# Returns the first index in the first one where the second list appears, or
# -1 if not found.
sub find_sublist(++) {
my ($haystack, $needle) = #_;
my $nlen = #$needle;
my $hlen = #$haystack;
return -1 if $hlen == 0 || $nlen == 0;
HAYSTACK_POS:
for (my $n = 0; $n <= $hlen - $nlen; $n++) {
for (my $m = 0; $m < $nlen; $m++) {
if ($haystack->[$n + $m] != $needle->[$m]) {
next HAYSTACK_POS;
}
}
return $n;
}
return -1;
}
# Takes two arrayrefs of numbers.
#
# Returns a list of the starting indexes of the first list
# of every run of the second list. Returns an empty list if
# there are no matches.
sub find_sublists(++) {
my ($haystack, $needle) = #_;
my $nlen = #$needle;
my $hlen = #$haystack;
my #positions;
return #positions if $hlen == 0 || $nlen == 0;
HAYSTACK_POS:
for (my $n = 0; $n <= $hlen - $nlen; $n++) {
for (my $m = 0; $m < $nlen; $m++) {
if ($haystack->[$n + $m] != $needle->[$m]) {
next HAYSTACK_POS;
}
}
push #positions, $n;
}
return #positions;
}
# Takes two arrayrefs of numbers.
#
# Returns a new list that is the first one with every non-overlapping run of
# the second second list removed.
sub remove_sublists(++) {
my #haystack = #{$_[0]};
my $needle = $_[1];
while ((my $pos = find_sublist #haystack, $needle) != -1) {
splice #haystack, $pos, #$needle;
}
return #haystack;
}
my #list1 = (1,2,3,4,5,6,7,8,9);
my #list2 = (4,2,3,4,2,3,4);
my #list3 = (2,3,2,3,2,2,3,2);
say find_sublist(#list1, [2, 3, 4]); # Returns 1
say find_sublist([2,9,3,4], [2,3,4]); # Returns -1
my #positions = find_sublists(#list2, [2,3,4]); # 1,4
say join(",", #positions);
#positions = find_sublists(#list3, [2,3,2]); # 0,2,5
say join(",", #positions);
say join(",", remove_sublists(#list1, [2,3,4])); # 1,5,6,7,8,9
say join(",", remove_sublists(#list3, [2,3,2])); # 3,2

If the inputs are numbers representable by your perl's integers (as shown), you can use
# Indexes
my $pattern = pack "W*", #pattern;
my $array = pack "W*", #array;
my #indexes;
push #indexes, $-[0] while $array =~ /\Q$pattern/g;
# Removal
my $pattern = pack "W*", #pattern;
my $array = pack "W*", #array;
$array =~ s/\Q$pattern//g;
#array = unpack "W*", $array;
How it handles overlaps:
/---\ /---\ Removed
2,3,2 from 2,3,2,3,2,2,3,2
\---/ Not removed
Note that this also works if you can map the inputs to numbers.
my ( %map_f, #map_r );
for ( #array, #pattern ) {
if ( !exists{ $map{ $_ } } ) {
$map_f{ $_ } = #map_r;
push #map_r, $_;
}
}
my $pattern = pack "W*", #map_f{ #pattern };
my $array = pack "W*", #map_f{ #array };
$array =~ s/\Q$pattern//g;
#array = #map_r[ unpack "W*", $array ];
It's not the best algorithm, but it should be very fast by moving the work from Perl to the regex engine.

Perl - Split sorted array into two arrays based on values

I am fairly new to Perl.
I have a large array of numbers which is sorted first:
my #numbers = do {
open my $fh, '<', glob("hotfolder/*.out") or die $!;
local $/;
<$fh> =~ /\d+/g;
};
my #sorted_numbers = sort { $a <=> $b } #numbers;
The sorted values are now in the #sorted_numbers array.
Next it needs to find the missing values in this array:
my #missing;
my $i = 0;
for ( $sorted_numbers[ 0 ] .. $sorted_numbers[ -1 ] ) {
++$i, next if $_ == $sorted_numbers[ $i ];
push #missing, "$_\n";
}
This is where I need some help. I am not sure how to achieve the next step.
Each number in the array either starts with 2 (200000) or 3 (300000).
The missing values should only show between these ranges.
Let's say the first range finished at 240000. The other range will start somewhere over 300000.
Of course I do not want to return all the values outside these two ranges as missing.
I tried something along these lines but the syntax and logic is completely wrong:
foreach (my $step = #sorted_numbers) {
if ($step <= 299999) {
my $i = 0;
for ( $sorted_numbers[ 0 ] .. $sorted_numbers[ -1 ] ) {
++$i, next if $_ == $sorted_numbers[ $i ];
push #missing, "$_\n";
}
}
else ($step > 299999) {
my $i = 0;
for ( $sorted_numbers[ 0 ] .. $sorted_numbers[ -1 ] ) {
++$i, next if $_ == $sorted_numbers[ $i ];
push #missing2, "$_\n";
}
}
}
Let me know if something is unclear and I'll be happy to provide further information.

Just check the difference between neighbouring numbers. If it's more than 1 but small enough (see $gap_size below), report the missing numbers:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $gap_size = 50; # Tune this to your needs.
my #numbers = sort { $a <=> $b } map /\d+/g, <>;
for my $i (1 .. $#numbers) {
my $diff = $numbers[$i] - $numbers[$i-1];
if ($diff > 1 && $diff < $gap_size) {
say for $numbers[$i-1] + 1 .. $numbers[$i] - 1;
}
}

Use of uninitalized value in concatenation <.> or string at xyz.pl line 27

I am writing this perl program and want to create an array which should be stored in output file according to the values of for loop. I am new to this programming.
This is my code
use strict;
use warnings;
open( my $out_fh, ">", "output.txt" ) || die("Cannot open file.\n");
my ( $x, $y, $i, $j, $k, $p, $q );
my ( #Xrow, #b, #b_l, #w );
print("Enter the number of rows:\n");
$p = <STDIN>;
chop($p);
print("Enter the number of columns:\n");
$q = <STDIN>;
chop($q);
$x = 2**$p;
$y = 2**$q;
#Xrow = ( #b, #b_l, #w );
for ( $i = 0; $i < $x * $y; $i = $i + 1 ) {
for ( $j = 0; $j < $x; $j = $j + 1 ) {
for ( $k = 0; $k < $y; $k = $k + 1 ) {
$Xrow[$i] = "$b[$j],$b_l[$j],$w[$k]";
foreach (#Xrow) {
print $out_fh "$_\n";
}
}
}
}
So output should look like for example p=q=1
Xrow0 b0 b_l0 w0
Xrow1 b1 b_l1 w0
Xrow2 b0 b_l0 w1
Xrow3 b1 b_l1 w1
so it should print like this in output file without any braces and "="
But I am getting error like this
Use of uninitialized value in concatenation<.> or string at xyz.pl in line 27
Use of uninitialized value within #b in concatenation<.> or string at xyz.pl in line 27

You don't populate the arrays anywhere, so they stay empty.
But, in fact, you need no arrays to get the output you want.
Additional advice:
Prefer chomp to chop.
Declare the variables when you need them, not at the top of the program/subroutine.
How I'd do it:
#!/usr/bin/perl
use warnings;
use strict;
open my $OUT, '>', 'output.txt' or die "Cannot open file.\n";
print "Enter the number of rows:\n";
my $p = <STDIN>;
chomp $p;
print "Enter the number of columns:\n";
my $q = <STDIN>;
chomp $q;
my $x = 2 ** $p;
my $y = 2 ** $q;
my $i = 0;
for (my $j = 0; $j < $x; $j = $j + 1) {
for (my $k = 0; $k < $y; $k = $k + 1) {
print {$OUT} "Xrow$i b$k b_l$k w$j\n";
++$i;
}
}

Perl - Identify all elements in a hash that have the highest value

I wrote some Perl code that searches for the highest key-value pair in a hash in which the keys are text and the values are numbers:
my $o_val = 0; # FOR TRACKING HIGHEST VALUE ENCOUNTERED IN THE LOOP
my $o_key; # FOR TRACKING CORRESPONDING KEY TO THE HIGHEST VALUE
while ( my ($key, $val) = each(%NG) ) {
if ( $val > $o_val ) {
$o_val = $val;
$o_key = $key;
}
}
print "$okey\n";
The problem is that it does nothing to account for the possibility of a tie for highest value. How can I capture all the key-value pairs that tied for highest value if my measuring variable can potentially change with each iteration of the loop?
I have the idea that I could write another while loop to run through the same hash after $o_val had been established as the highest value then push each key paired to $o_val into another array, something like:
my #highest; # ARRAY OF HIGHEST-VALUE KEYS
while ( my ($key, $val) = each(%NG) ) {
if ( $val == $o_val ) { push(#highest, $key); }
}
but this seems kinda inefficient. I'm hoping there are some Perl commands I'm not aware of that will let me identify the highest value in the hash without looping through it so I can just use the second piece of code in my question.

You can use an array to keep the keys for highest value,
if ( $val > $o_val ) {
$o_val = $val;
#o_keys = $key;
}
elsif ($val == $o_val) {
push(#o_keys, $key);
}

#!/usr/bin/env perl
use strict;
use warnings;
my %h = map { $_ => int(rand 10)} 'a' .. 'z';
# see what we've got
my #k = sort { $h{$b} <=> $h{$a} } keys %h;
print "$_ => $h{$_}\n" for #k;
# initialize $max with a value from %h
my ($max) = values %h;
# keys with highest values
my #argmax;
while (my ($k, $v) = each %h) {
next if $v < $max;
if ($v > $max) {
$max = $v;
#argmax = ($k);
}
else {
push #argmax, $k
}
}
print "#argmax\n";
print "#h{ #argmax }\n";
Of course, if you don't care much about memory, or if you are golfing, or japhing, you could write something like:
my %v;
push #{ $v{$h{$_}} }, $_ for keys %h;
my ($max) = sort { $b <=> $a } keys %v;
print "#{ $v{$max} } => $max\n"

slicing out multiples of nth element

I am having a list which has more than 10k elements. I want to remove every third element.
For example,
#testlists = qw (helloworld sessions first.cgi login localpcs depthhashes.cgi search view macros plugins ...) ;
I want to remove first.cgi, depthhashses.cgi, macros and so on from original array. Grep function is little bit slower. Please suggest me a faster grep search or any other similar subroutines . Any help would be highly appreciated

I can think of a few solutions:
Grep on index divisibility
my $i = 0;
#testlist = grep { ++$i % 3 } #testlist;
repeated splicing
for (my $i = 2; $i < $#testlist; $i += 2) {
splice #testlist, $i, 1;
}
Copying with skipping
my #output;
# pre-extend the array for fewer reallocations
$#output = #testlist * 2/3;
#output = ();
# FIXME annoying off-by one errors at the end that can add one undef
for (my $i = 0; $i < #testlist; $i += 3) {
push #output, #testlist[$i, $i+1];
}
Ikegami has corrected and optimized the copying solution in his remarkable answer.
A benchmark with 1,000-element lists declares splicing the clear winner:
Rate slice grep copy splice
slice 790/s -- -10% -18% -37%
grep 883/s 12% -- -8% -29%
copy 960/s 22% 9% -- -23%
splice 1248/s 58% 41% 30% --
(slice is mob's solution)
This is probably because it offloads most of the actual work into the C-level implementations, and avoids allocations and costly Perl-level operations.
With 10,000-element lists, the advantages shift towards the other solutions. Indeed, the splice solution has a very bad algorithmic complexity, because it moves all elements after all splicing position, meaning that the last element is moved nearly 3333 times:
Rate splice slice grep copy
splice 42.7/s -- -35% -42% -49%
slice 65.3/s 53% -- -12% -23%
grep 74.2/s 74% 14% -- -12%
copy 84.4/s 98% 29% 14% --
Here is the script I used for benchmarking.

amon's copy can be tweaked to be 30% faster!
my $i = 1;
my $j = 1;
while ($i < #a) {
$a[++$j] = $a[$i+=2];
$a[++$j] = $a[++$i];
}
$#a = $j-1 if #a>2;
And you can get even faster by avoid copying anything at all (though the result is a reference to an array). The longer the strings in each element, the better this will do.
my $i = 0;
my $ref = sub { \#_ }->( grep { ++$i % 3 } #a );
Results (10,000 elements):
>perl a.pl
Rate splice_a splice_r grep_a copy1_a copy1_r copy2_r copy2_a grep_r
splice_a 52.8/s -- -0% -51% -54% -56% -66% -66% -68%
splice_r 52.9/s 0% -- -51% -54% -55% -66% -66% -68%
grep_a 107/s 103% 103% -- -7% -10% -30% -31% -34%
copy1_a 115/s 118% 117% 7% -- -3% -25% -26% -30%
copy1_r 119/s 125% 124% 11% 3% -- -23% -23% -27%
copy2_r 154/s 191% 190% 43% 34% 29% -- -0% -6%
copy2_a 154/s 192% 192% 44% 34% 30% 0% -- -6%
grep_r 163/s 209% 209% 52% 42% 37% 6% 6% --
Benchmark:
use strict;
use warnings;
use Benchmark qw( cmpthese );
my #testlist = qw( helloworld sessions first.cgi login localpcs depthhashes.cgi search view macros );
#testlist = ( #testlist ) x ( 10000 / #testlist );
sub grep_a { my #a = #testlist; my $i = 0; #a = grep { ++$i % 3 } #a; 1 }
sub copy1_a { my #a = #testlist;
my #b;
$#b = $#a; #b = (); # Does absolutely nothing in this benchmark because of optimisations in Perl.
for (my $i = 0; $i < #a; $i += 3) {
push #b, #a[$i, $i+1];
}
1
}
sub copy2_a { my #a = #testlist;
my $i = 1;
my $j = 1;
while ($i < #a) {
$a[++$j] = $a[$i+=2];
$a[++$j] = $a[++$i];
}
$#a = $j-1 if #a>2;
1
}
sub splice_a { my #a = #testlist;
for (my $i = 2; $i < $#a; $i += 2) {
splice #a, $i, 1;
}
1
}
sub grep_r { my $r = [ #testlist ]; my $i = 0; $r = sub { \#_ }->( grep { ++$i % 3 } #$r ); 1 }
sub copy1_r { my $r = [ #testlist ];
my #b;
$#b = $#$r; #b = (); # Does absolutely nothing in this benchmark because of optimisations in Perl.
for (my $i = 0; $i < #$r; $i += 3) {
push #b, #$r[$i, $i+1];
}
$r = \#b;
1
}
sub copy2_r { my $r = [ #testlist ];
my $i = 1;
my $j = 1;
while ($i < #$r) {
$r->[++$j] = $r->[$i+=2];
$r->[++$j] = $r->[++$i];
}
$#$r = $j-1 if #$r>2;
1
}
sub splice_r { my $r = [ #testlist ];
for (my $i = 2; $i < $#$r; $i += 2) {
splice #$r, $i, 1;
}
1
}
cmpthese(-3, {
grep_a => \&grep_a,
copy1_a => \&copy1_a,
copy2_a => \&copy2_a,
splice_a => \&splice_a,
grep_r => \&grep_r,
copy1_r => \&copy1_r,
copy2_r => \&copy2_r,
splice_r => \&splice_r,
});

Use an array slice.
#testlists = #testlists[ grep { ($_+1) % 3 } 0..$#testlists ];

I'm not sure what you mean about using grep but perhaps you mean something like this.
for $i (0 .. $#testlists) {
if (($i % 3) == 2) {
delete $testlists[$i];
}
}
# Demonstrate the values.
foreach $e (#testlists) {
print "$e\n";
}

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

algorithm binary search return range perl - arrays

Related

Is there a built in Perl Function for finding duplicate subarrays(exact order) in an array?

Perl - Split sorted array into two arrays based on values

Use of uninitalized value in concatenation <.> or string at xyz.pl line 27

Perl - Identify all elements in a hash that have the highest value

slicing out multiples of nth element

Categories

Resources