How to do fast comparison of arrays in Perl [duplicate]

Possible Duplicate:
In Perl, is there a built in way to compare two arrays for equality?
I need to compare arrays with a function that should return:
true if all elements are equal when compared pairwise
true if all elements are equal or the element in the first array is undefined when compared pairwise
false in all other cases
in other words, if the sub is called "comp":
@a = ('a', 'b', undef, 'c');
@b = ('a', 'b', 'f', 'c');
comp(@a, @b); # should return true
comp(@b, @a); # should return false
@a = ('a', 'b');
@b = ('a', 'b', 'f', 'c');
comp(@a, @b); # should return true
The obvious solution would be to do pairwise compares between the two arrays, but I'd like it to be faster than that, as the comparisons are run multiple times over a large set of arrays, and the arrays may have many elements.
On the other hand, the contents of the arrays to be compared (i.e. all the possible @b's) are pre-determined and do not change. The elements of the arrays do not have a fixed length, and there is no guarantee as to what characters they might contain (tabs, commas, you name it).
Is there a faster way to do this than pairwise comparison? Smart match won't cut it, as it returns true only if all elements are equal (and therefore not if one is undef).
Could packing and doing bitwise comparisons be a strategy? It looks promising when I browse the docs for pack/unpack and vec, but I'm somewhat out of my depth there.
Thanks.

Perl can compare lists of 10,000 pairwise elements in about 100 ms on my MacBook, so the first thing I'll say is: profile your code to make sure this is actually the problem.
Doing some benchmarking, there are a few things you can do to speed things up.
Make sure to bail on the first failure to match.
Assuming you have a lot of comparisons which don't match, this will save HEAPS of time.
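A minimal sketch of that early bail-out, assuming the two arrays arrive as references $x and $y (the names are mine):
for my $i (0 .. $#$x) {
    # stop scanning the moment one defined pair differs
    return 0 if defined $x->[$i] && $x->[$i] ne $y->[$i];
}
return 1;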
Check up front that the arrays are the same length.
If the arrays aren't the same length, they can never match. Compare their sizes and return early if they're different. This avoids needing to check this case over and over again inside the loop, as in the sketch below.
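A hedged sketch of that check, again assuming array references (comp and the yada-yada body are placeholders):
sub comp {
    my ($x, $y) = @_;
    # different lengths can never match, so return before touching elements
    return 0 if @$x != @$y;
    ...;  # pairwise comparison continues here
}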
Use an iterator instead of a C-style for loop.
Iterating pair-wise you'd normally do something like for( my $idx = 0; $idx <= $#a; $idx += 2 ), but iterating over an array is faster than using a C-style for loop. This is a Perl optimization trick: it's more efficient to let perl do the work in optimized C than to do it in Perl code. This gains you about 20%-30% depending on how you micro-optimize it.
for my $mark (0..$#{$a}/2) {
    my $idx = $mark * 2;
    next if !defined $a->[$idx] || !defined $b->[$idx];
    return 0 if $a->[$idx] ne $b->[$idx] || $a->[$idx+1] ne $b->[$idx+1];
}
return 1;
Precompute the interesting indexes.
Since one set of pairs is fixed, you can produce an index of which pairs are defined. This makes the iterator even simpler and faster.
state $indexes = precompute_indexes($b);
for my $idx ( @$indexes ) {
    next if !defined $a->[$idx];
    return 0 if $a->[$idx] ne $b->[$idx] || $a->[$idx+1] ne $b->[$idx+1];
}
return 1;
With no nulls this is a performance boost of 40%. You get more beyond that the more nulls are in your fixed set.
use strict;
use warnings;
use v5.10;  # for state

# Compute the indexes of a list of pairs which are interesting for
# comparison: those with defined keys.
sub precompute_indexes {
    my $pairs = shift;

    die "Unbalanced pairs" if @$pairs % 2 != 0;

    my @indexes;
    for( my $idx = 0; $idx <= $#$pairs; $idx += 2 ) {
        push @indexes, $idx if defined $pairs->[$idx];
    }

    return \@indexes;
}

sub cmp_pairs_ignore_null_keys {
    my($a, $b) = @_;

    # state is like my but it will only evaluate once ever.
    # It acts like a cache which initializes the first time the
    # program is run.
    state $indexes = precompute_indexes($b);

    # If they don't have the same # of elements, they can never match.
    return 0 if @$a != @$b;

    for my $idx ( @$indexes ) {
        next if !defined $a->[$idx];
        return 0 if $a->[$idx] ne $b->[$idx] || $a->[$idx+1] ne $b->[$idx+1];
    }

    return 1;
}
I'm still convinced this is better to do in SQL with a self-join, but haven't worked that out.

Related

compare two arrays in order

I thought this would be pretty simple, but I seem to be getting things mixed up, and I haven't found anything on stackoverflow that quite matches my question.
I'm trying to write a function that can compare two arrays of file names to make sure their values match. They need to actually match in their position as well, so the order is crucial. In other words:
array1 = ["file1.html", "file2.html", "file3.html", "file4.html"]
array2 = ["file1.html", "file2.html", "file4.html", "file3.html"]
I would want a comparison between these two arrays to return as false, because of the difference in order (even though both arrays actually include the same values). I tried something like this:
matching = true
names1 = array1.map { |x| File.basename(x) }
names2 = array2.map { |x| File.basename(x) }
names1.each_with_index { |file, index|
  if file != names2[index]
    matching = false
  end
}
return matching
This works, but I'm wondering if there's a cleaner, more foolproof way of comparing arrays in this way? Thanks!
You don't need to do anything of the sort. The default array equality operator compares arrays in order:
array1 = %w[file1.html file2.html file3.html file4.html]
array2 = %w[file1.html file2.html file4.html file3.html]
array3 = %w[file1.html file2.html file3.html file4.html]
array1 == array2 # => false
array1 == array3 # => true
Example with comparing mapped values (if you have full paths in your arrays)
array1.map{|a| File.basename(a)} == array2.map{|a| File.basename(a) }
# or, as @mudasobwa would suggest
[array1, array2].map{|a| a.map(&File.method(:basename)) }.reduce(:==)
Use the last one if and only if a) you understand it completely and b) you think it's a good idea.

Count Perl array size

I'm trying to print out the size of my array. I've followed a few other questions like this one on Stack Overflow. However, I never get the result I want.
All I wish for in this example is for the value of 3 to be printed as I have three indexes. All I get, from both print methods is 0.
my @arr;
$arr{1} = 1;
$arr{2} = 2;
$arr{3} = 3;
my $size = @arr;
print $size; # Prints 0
print scalar @arr; # Prints 0
What am I doing wrong, and how do I get the total size of an array when declared and populated this way?
First off:
my @arr;
$arr{1} = 1;
$arr{2} = 2;
$arr{3} = 3;
is nonsense. {} is for hash keys, so you are referring to %arr, not @arr. use strict; and use warnings; would have told you this, and this is just one tiny fragment of why they're considered mandatory.
To count the elements in an array, merely access it in a scalar context.
print scalar @arr;
if ( $num_elements < @arr ) { do_something(); }
But you would need to change your thing to
my @arr;
$arr[1] = 1;
$arr[2] = 2;
$arr[3] = 3;
And note - the first element of your array $arr[0] would be undefined.
$VAR1 = [
          undef,
          1,
          2,
          3
        ];
As a result, you would get a result of 4. To get the desired 'count of elements' you would need to filter the undefined items, with something like grep:
print scalar grep {defined} @arr;
This will take @arr, filter it with grep (returning 3 elements) and then take the scalar value - the count of elements, in this case 3.
But normally - you wouldn't do this. It's only necessary because you're trying to insert values into specific 'slots' in your array.
What you would do more commonly, is use either a direct assignment:
my @arr = ( 1, 2, 3 );
Or:
push ( @arr, 1 );
push ( @arr, 2 );
push ( @arr, 3 );
Which inserts the values at the end of the array. You would - if explicitly iterating - go from 0..$#arr, but you rarely need to do this when you can do:
foreach my $element ( @arr ) {
    print $element,"\n";
}
Or you can do it with a hash:
my %arr;
$arr{1} = 1;
$arr{2} = 2;
$arr{3} = 3;
This turns your array into a set of (unordered) key-value pairs, which you can access with keys %arr and do exactly the same:
print scalar keys %arr;
if ( $elements < keys %arr ) { do_something(); }
In this latter case, your hash will be:
$VAR1 = {
          '1' => 1,
          '3' => 3,
          '2' => 2
        };
I would suggest this is bad practice - if you have ordered values, the tool for the job is the array. If you have 'key' values, a hash is probably the tool for the job still - such as a 'request ID' or similar. You can typically tell the difference by looking at how you access the data, and whether there are any gaps (including from zero).
So to answer your question as asked:
my $size = @arr;
print $size; # prints 0
print scalar @arr; # prints 0
These don't work, because you never insert any values into @arr. But you do have a hash called %arr which you created implicitly. (And again - use strict; and use warnings; would have told you this.)
You are initializing a hash, not an array.
To get the "size" of your hash you can write:
my $size = keys %arr;
I just thought there should be an illustration of your code run with USUW (use strict/use warnings) and what it adds to the troubleshooting process:
use strict;
use warnings;
my @arr;
...
And when you run it:
Global symbol "%arr" requires explicit package name (did you forget to declare "my %arr"?) at - line 9.
Global symbol "%arr" requires explicit package name (did you forget to declare "my %arr"?) at - line 10.
Global symbol "%arr" requires explicit package name (did you forget to declare "my %arr"?) at - line 11.
Execution of - aborted due to compilation errors.
So USUW.
You may be thinking that you are instantiating an element of @arr when you are typing in the following code:
$arr{1} = 1;
However, you are instantiating a hash doing that. This tells me that you are not using strict or you would have an error. Instead, change to brackets, like this:
$arr[1] = 1;

Why ever use an array instead of a hash?

I have read that it is much faster to iterate through a hash than through an array. Retrieving values from a hash is also much faster.
Instead of using an array, why not just use a hash and give each key a value corresponding to an index? If the items ever need to be in order, they can be sorted.
Retrieving from a hash is faster in the sense that you can fetch a value directly by key instead of iterating over the whole hash (or array, when you're searching for a particular string). That said, $hash{key} isn't faster than $array[0], as no iteration is taking place in either case.
Arrays can't be replaced by hashes, as they have different features,
                      arrays   hashes
--------------------------------------
ordered keys            x        -
push/pop                x        -
suitable for looping    x        -
named keys              -        x
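As a small illustration of that first point (the data here is made up for the example), a keyed fetch does no scanning, while locating a value in an array does:
my @colors     = ('red', 'green', 'blue');
my %color_rank = ( red => 0, green => 1, blue => 2 );

my $rank  = $color_rank{green};                             # direct fetch, no iteration
my ($idx) = grep { $colors[$_] eq 'green' } 0 .. $#colors;  # walks the indexes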
I don't know where you read that hashes are faster than arrays. According to some Perl reference works (Mastering Algorithms with Perl), arrays are faster than hashes (follow this link for some more info).
If speed is your only criterion, you should benchmark to see which technique is going to be faster. It depends on what operations you will be doing on the array/hash.
Here is an SO link with some further information: Advantage of 'one dimensional' hash over array in Perl
I think this is a good question: it's not so much a high-level "language design" query as an implementation question. It could be worded in a way that emphasizes that - say, using hashes versus arrays for a particular technique or use case.
Hashes are nice but you need lists/arrays (c.f. @RobEarl). You can use tie (or modules like Tie::IxHash or Tie::Hash::Indexed) to "preserve" the order of a hash, but I believe these would have to be slower than a regular hash, and in some cases you can't pass them around or copy them in quite the same way.
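For what it's worth, here is a minimal sketch of the tie approach, assuming Tie::IxHash is installed from CPAN:
use Tie::IxHash;

tie my %ordered, 'Tie::IxHash';
$ordered{first}  = 1;
$ordered{second} = 2;
$ordered{third}  = 3;

# keys come back in insertion order, unlike a plain hash
print join( ', ', keys %ordered ), "\n";  # first, second, third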
This code is more or less how a hash works. It should explain well enough why you would want to use an array instead of a hash.
package DIYHash;
use Digest::MD5 qw(md5);

sub new {
    my ($class, $buckets) = @_;
    my $self = bless [], $class;
    $#$self = $buckets || 32;
    return $self;
}

sub fetch {
    my ( $self, $key ) = @_;
    my $i  = $self->_get_bucket_index( $key );
    my $bo = $self->_find_key_in_bucket( $key, $i );
    return $self->[$i][$bo][1];
}

sub store {
    my ( $self, $key, $value ) = @_;
    my $i  = $self->_get_bucket_index( $key );
    my $bo = $self->_find_key_in_bucket( $key, $i );
    $self->[$i][$bo] = [$key, $value];
    return $value;
}

sub _find_key_in_bucket {
    my ($self, $key, $index) = @_;
    my $bucket = $self->[$index];
    my $i = undef;
    for ( 0..$#$bucket ) {
        next unless $bucket->[$_][0] eq $key;
        $i = $_;
    }
    $i = @$bucket unless defined $i;
    return $i;
}

# This function needs to always return the same index for a given key.
# It can do anything as long as it always does that.
# I use the md5 hashing algorithm here.
sub _get_bucket_index {
    my ( $self, $key ) = @_;
    # Get a number from 0 to bucket count - 1.
    my $index = unpack( "I", md5($key) ) % @$self;
    return $index;
}

1;
To use this amazing cluster of code:
my $hash = DIYHash->new(4); #This hash has 4 buckets.
$hash->store(mouse => "I like cheese");
$hash->store(cat => "I like mouse");
say $hash->fetch('mouse');
Hashes look like they are constant time, rather than order N because for a given data set, you select a number of buckets that keeps the number of items in any bucket very small.
A proper hashing system would be able to resize as appropriate when the number of collisions gets too high. You don't want to do this often, because it is an order N operation.

How to tell if two arrays are permutations of each other (without the ability to sort them)

If I have two different arrays and all I can do is check whether two elements in the arrays are equal (in other words, there is no comparison function (beyond equals) for the elements to sort them), is there any efficient way to check whether one array is a permutation of the other?
Words Like Jared's brute force solution should work, but it is O(n^2).
If the elements are hashable, you can achieve O(n).
def isPermutation(A, B):
    """
    Computes if A and B are permutations of each other.
    This implementation correctly handles duplicate elements.
    """
    # make sure the lists are of equal length
    if len(A) != len(B):
        return False

    # keep track of how many times each element occurs.
    counts = {}
    for a in A:
        if a in counts: counts[a] = counts[a] + 1
        else: counts[a] = 1

    # if some element in B occurs too many times, not a permutation
    for b in B:
        if b in counts:
            if counts[b] == 0: return False
            else: counts[b] = counts[b] - 1
        else: return False

    # None of the elements in B were found too many times, and the lists are
    # the same length, so they are a permutation
    return True
Depending on how the dictionary is implemented (as a hashset vs a treeset), this will take either O(n) for hashset or O(n log n) for treeset.
This implementation might be wrong, but the general idea should be correct. I am just starting python, so this may also be an unconventional or non-pythonic style.
def isPermutation(list1, list2):
    # make sure the lists are of equal length
    if len(list1) != len(list2):
        return False

    # keep track of what we've used
    used = [False] * len(list1)

    for i in range(len(list1)):
        found = False
        for j in range(len(list1)):
            if list1[i] == list2[j] and not used[j]:
                found = True
                used[j] = True
                break
        if not found:
            return False

    return True
Assuming the two arrays are equal length and an element could be appearing the arrays more than once, you could create another array of the same length of type boolean initialized to false.
Then iterate through one of the arrays and for each element check whether that element appears in the other array at a position where the corresponding boolean is false -- if it does, set the corresponding boolean to true. If all elements in the first array can be accounted for this way, the two arrays are equal, otherwise not (and you found at least one difference).
The memory requirement is O(n), the time complexity is O(n^2)
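A rough Perl sketch of that flag-array approach (the sub name is mine):
sub is_permutation {
    my ($a_ref, $b_ref) = @_;
    return 0 if @$a_ref != @$b_ref;

    my @used = (0) x @$b_ref;          # one flag per slot in the other array
    ELEMENT: for my $item (@$a_ref) {
        for my $j (0 .. $#$b_ref) {
            if ( !$used[$j] && $b_ref->[$j] eq $item ) {
                $used[$j] = 1;         # claim this slot so duplicates count once
                next ELEMENT;
            }
        }
        return 0;                      # no unused match found for $item
    }
    return 1;
}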
If hashing for your object type makes sense, you could use a temp hash set to insert all the items from array A. Then while iterating over array B, make sure each item is already in the temp hash set.
This should be faster ( O(n) ) than a naive nested O(n^2) loop. (Except for small or trivial data sets, where a simpler naive algo might outperform it)
Note that this will take O(n) extra memory, and that this approach only works if you don't have duplicates (or don't want to count them as part of the comparison).
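A rough Perl sketch of the idea, with the duplicate caveat above intact (the names are mine):
sub same_elements {
    my ($a_ref, $b_ref) = @_;
    return 0 if @$a_ref != @$b_ref;

    my %seen = map { $_ => 1 } @$a_ref;   # the temporary "hash set"
    for my $item (@$b_ref) {
        return 0 unless $seen{$item};     # B holds something A doesn't
    }
    return 1;  # duplicates are not counted, as noted above
}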
Because I can't comment yet (not enough rep), I'll just mention this here as a reply to one of the other answers: Python's set() doesn't work all that well with objects.
If you have objects that are hashable, this code should work for you:
def perm(a, b):
    dicta = {}; dictb = {}
    for i in a:
        if i in dicta: dicta[i] += 1
        else: dicta[i] = 1
    for i in b:
        if i in dictb: dictb[i] += 1
        else: dictb[i] = 1
    return dicta == dictb
Construct a hashmap of objects in a and the number of times they occur. For each element in b, if the element is not in the hashmap or the occurrences don't match, it is not a permutation. Otherwise, it is a permutation.
>>> perm([1,2,4], [1,4,2])
True
>>> perm([1,2,3,2,1], [1,2,1,2,3])
True
>>> perm([1,2,4], [1,2,2])
False
// C# code, l1 and l2 are non-sorted lists
private static bool AreListContainedEachOther(List<int> l1, List<int> l2)
{
    if (l1 == null || l2 == null)
        return false;

    foreach (int n in l1)
    {
        if (!l2.Contains(n))
            return false;
    }

    return true;
}
The previous answers are great. Just to add that in Python it's really simple. If:
the length of the arrays is equal,
there are no elements in array B that do not exist in array A, and
there are no elements that have more occurrences in array B than in array A,
then the arrays are permutations of each other:
from collections import defaultdict

def is_permutation(a, b):
    if len(a) != len(b):
        return False
    counts = defaultdict(int)
    for element in a:
        counts[element] += 1
    for element in b:
        if not counts[element]:
            return False
        counts[element] -= 1
    return True

Faster way to check for element in array?

This function does the same as exists does with hashes.
I plan on using it a lot.
Can it be optimized in some way?
my @a = qw/a b c d/;
my $ret = array_exists("b", @a);

sub array_exists {
    my ($var, @a) = @_;
    foreach my $e (@a) {
        if ($var eq $e) {
            return 1;
        }
    }
    return 0;
}
If you have to do this a lot on a fixed array, use a hash instead:
my %hash = map { $_, 1 } @array;
if( exists $hash{$key} ) { ... }
Some people reach for the smart match operator, but that's one of the features that we need to remove from Perl. You need to decide if this should match, where the array hold an array reference that has a hash reference with the key b:
use 5.010;
my @a = (
    qw(x y z),
    [ { 'b' => 1 } ],
);
say 'Matches' if "b" ~~ @a; # This matches
Since the smart match is recursive, it keeps going down into data structures. I write about some of this in Rethinking smart matching.
You can use smart matching, available in Perl 5.10 and later:
if ("b" ~~ @a) {
    # "b" exists in @a
}
This should be much faster than a function call.
I'd use List::MoreUtils::any.
my $ret = any { $_ eq 'b' } @a;
Since there are lots of similar questions on StackOverflow where different "correct answers" return different results, I tried to compare them. This question seems to be a good place to share my little benchmark.
For my tests I used a test set (@test_set) of 1,000 elements (strings) of length 10 where only one element ($search_value) matches a given string.
I took the following statements to validate the existence of this element in a loop of 100,000 turns.
_grep
grep( $_ eq $search_value, @test_set )
_hash
{ map { $_ => 1 } @test_set }->{ $search_value }
_hash_premapped
$mapping->{ $search_value }
uses a $mapping that is precalculated as $mapping = { map { $_ => 1 } @test_set } (which is included in the final measuring)
_regex
sub{ my $rx = join "|", map quotemeta, @test_set; $search_value =~ /^(?:$rx)$/ }
_regex_prejoined
$search_value =~ /^(?:$rx)$/
uses a regular expression $rx that is precalculated as $rx = join "|", map quotemeta, @test_set; (which is included in the final measuring)
_manual_first
sub{ foreach ( @test_set ) { return 1 if( $_ eq $search_value ); } return 0; }
_first
first { $_ eq $search_value } @test_set
from List::Util (version 1.38)
_smart
$search_value ~~ @test_set
_any
any { $_ eq $search_value } @test_set
from List::MoreUtils (version 0.33)
On my machine (Ubuntu, 3.2.0-60-generic, x86_64, Perl v5.14.2) I got the following results. The values shown are seconds, as returned by gettimeofday and tv_interval of Time::HiRes (version 1.9726).
Element $search_value is located at position 0 in array @test_set
_hash_premapped: 0.056211
_smart: 0.060267
_manual_first: 0.064195
_first: 0.258953
_any: 0.292959
_regex_prejoined: 0.350076
_grep: 5.748364
_regex: 29.27262
_hash: 45.638838
Element $search_value is located at position 500 in array @test_set
_hash_premapped: 0.056316
_regex_prejoined: 0.357595
_first: 2.337911
_smart: 2.80226
_manual_first: 3.34348
_any: 3.408409
_grep: 5.772233
_regex: 28.668455
_hash: 45.076083
Element $search_value is located at position 999 in array @test_set
_hash_premapped: 0.054434
_regex_prejoined: 0.362615
_first: 4.383842
_smart: 5.536873
_grep: 5.962746
_any: 6.31152
_manual_first: 6.59063
_regex: 28.695459
_hash: 45.804386
Conclusion
The fastest method to check the existence of an element in an array is to use prepared hashes. You of course pay for that with a proportional amount of memory consumption, and it only makes sense if you search the set for elements multiple times. If your task involves small amounts of data and only a single search or a few searches, hashes can even be the worst solution. Not as fast, but similar in spirit, would be to use prepared regular expressions, which seem to have a smaller preparation time.
In many cases, a prepared environment is not an option.
Surprisingly, List::Util::first has very good results when it comes to the comparison of statements that don't have a prepared environment. With the search value at the beginning (which could perhaps also be read as the result for smaller sets), it is very close to the favourites ~~ and any (and could even be within the range of measurement inaccuracy). For items in the middle or at the end of my larger test set, first is definitely the fastest.
brian d foy suggested using a hash, which gives O(1) lookups, at the cost of slightly more expensive hash creation. There is a technique that Mark Jason Dominus describes in his book Higher Order Perl whereby a hash is used to memoize (or cache) the results of a sub for a given parameter. So for example, if findit(1000) always returns the same thing for the given parameter, there's no need to recalculate the result every time. The technique is implemented in the Memoize module (part of the Perl core).
Memoizing is not always a win. Sometimes the overhead of the memoized wrapper is greater than the cost of calculating a result. Sometimes a given parameter is unlikely to ever be checked more than once or a relatively few times. And sometimes it cannot be guaranteed that the result of a function for a given parameter will always be the same (ie, the cache can become stale). But if you have an expensive function with stable return values per parameter, memoization can be a big win.
Just as brian d foy's answer uses a hash, Memoize uses a hash internally. There is additional overhead in the Memoize implementation, but the benefit to using Memoize is that it doesn't require refactoring the original subroutine. You just use Memoize; and then memoize( 'expensive_function' );, provided it meets the criteria for benefitting from memoization.
I took your original subroutine and converted it to work with integers (just for simplicity in testing). Then I added a second version that passed a reference to the original array rather than copying the array. With those two versions, I created two more subs that I memoized. I then benchmarked the four subs.
In benchmarking, I had to make some decisions. First, how many iterations to test. The more iterations we test, the more likely we are to have good cache hits for the memoized versions. Then I also had to decide how many items to put into the sample array. The more items, the less likely to have cache hits, but the more significant the savings when a cache hit occurs. I ultimately decided on an array to be searched containing 8000 elements, and chose to search through 24000 iterations. That means that on average there should be two cache hits per memoized call. (The first call with a given param will write to the cache, while the second and third calls will read from the cache, so two good hits on average).
Here is the test code:
use warnings;
use strict;
use Memoize;
use Benchmark qw/cmpthese/;

my $n     = 8000;   # Elements in target array
my $count = 24000;  # Test iterations.

my @a    = ( 1 .. $n );
my @find = map { int(rand($n)) } 0 .. $count;
my ( $orx, $ormx, $opx, $opmx ) = ( 0, 0, 0, 0 );

memoize( 'orig_memo' );
memoize( 'opt_memo' );

cmpthese( $count, {
    original  => sub { my $ret = original(  $find[ $orx++  ], @a  ); },
    orig_memo => sub { my $ret = orig_memo( $find[ $ormx++ ], @a  ); },
    optimized => sub { my $ret = optimized( $find[ $opx++  ], \@a ); },
    opt_memo  => sub { my $ret = opt_memo(  $find[ $opmx++ ], \@a ); }
} );

sub original {
    my ( $var, @a ) = @_;
    foreach my $e ( @a ) {
        return 1 if $var == $e;
    }
    return 0;
}

sub orig_memo {
    my ( $var, @a ) = @_;
    foreach my $e ( @a ) {
        return 1 if $var == $e;
    }
    return 0;
}

sub optimized {
    my ( $var, $aref ) = @_;
    foreach my $e ( @{$aref} ) {
        return 1 if $var == $e;
    }
    return 0;
}

sub opt_memo {
    my ( $var, $aref ) = @_;
    foreach my $e ( @{$aref} ) {
        return 1 if $var == $e;
    }
    return 0;
}
And here are the results:
             Rate  orig_memo  original  optimized  opt_memo
orig_memo   876/s         --      -10%       -83%      -94%
original    972/s        11%        --       -82%      -94%
optimized  5298/s       505%      445%         --      -66%
opt_memo  15385/s      1657%     1483%       190%        --
As you can see, the memoized version of your original function was actually slower. That's because so much of the cost of your original subroutine was spent in making copies of the 8000 element array, combined with the fact that there is additional call-stack and bookkeeping overhead with the memoized version.
But once we pass an array reference instead of a copy, we remove the expense of passing the entire array around. Your speed jumps considerably. But the clear winner is the optimized (ie, passing array refs) version that we memoized (cached), at 1483% faster than your original function. With memoization the O(n) lookup only happens the first time a given parameter is checked. Subsequent lookups occur in O(1) time.
Now you would have to decide (by Benchmarking) whether memoization helps you. Certainly passing an array ref does. And if memoization doesn't help you, maybe brian's hash method is best. But in terms of not having to rewrite much code, memoization combined with passing an array ref may be an excellent alternative.
Your current solution iterates through the array before it finds the element it is looking for. As such, it is a linear algorithm.
If you sort the array first with a relational operator (> for numeric elements, gt for strings) you can use binary search to find the element. It is a logarithmic algorithm, much faster than linear.
Of course, one must consider the penalty of sorting the array in the first place, which is a rather slow operation (n log n). If the contents of the array you are matching against change often, you must sort after every change, and it gets really slow. If the contents remain the same after you've initially sorted them, binary search ends up being practically faster.
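A hedged sketch of the binary search itself, assuming the array was sorted once up front with sort { $a cmp $b }:
sub binary_search {
    my ($aref, $target) = @_;
    my ($lo, $hi) = (0, $#$aref);
    while ($lo <= $hi) {
        my $mid = int( ($lo + $hi) / 2 );
        my $cmp = $aref->[$mid] cmp $target;
        return 1 if $cmp == 0;            # found it
        if ($cmp < 0) { $lo = $mid + 1 }  # target sorts after the midpoint
        else          { $hi = $mid - 1 }  # target sorts before the midpoint
    }
    return 0;
}
my $found = binary_search( \@sorted, 'needle' );  # @sorted must stay sorted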
You can use grep:
sub array_exists {
    my $val = shift;
    return grep { $val eq $_ } @_;
}
Surprisingly, it's not off too far in speed from List::MoreUtils' any(). It's faster if your item is at the end of the list by about 25% and slower by about 50% if your item is at the start of the list.
You can also inline it if needed -- no need to shove it off into a subroutine. i.e.
if ( grep { $needle eq $_ } @haystack ) {
    ### Do something
    ...
}
