Find longest slices of an array that contain distinct values - arrays

I have a very long one-dimensional array of positive integers. Starting from one end, I need to find the longest slices/chunks of the array in which every value is more than one unit away from all the other values in that slice.
i.e., I want to make a partitioning of the array (starting from the left) such that each partition contains elements that are more than one unit away from all the other elements in that partition.
Eg:
[1,1,9,5,3,8,7,4,1,2] -> [1],[1,9,5,3],[8],[7,4,1],[2]
[1,5,9,1,3,6,4,2,7,0] -> [1,5,9],[1,3,6,4],[2,7,0]
Below, I've written a little Fortran code that finds the first point at which a value recurs (i.e., comes within one unit of a value already in the current slice).
mask is a LOGICAL array
array is the array in question
n is the length of the array
I can easily extend this to find the full partitioning.
mask = .FALSE.
DO i = 1,n
   k = array(i)
   IF ( mask(k) ) THEN
      PRINT*, i
      EXIT
   ELSE
      mask(k-1 : k+1) = .TRUE.
   END IF
END DO
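For reference, here is roughly what the full partitioning would look like, sketched in Python rather than Fortran just to show the idea (illustrative only, not a drop-in for the code above): the set of blocked values plays the role of mask and is reset whenever a new slice is opened.
def partition(array):
    blocked = set()          # plays the role of the LOGICAL mask
    parts, current = [], []
    for x in array:
        if x in blocked:     # x is within 1 of something already in the current slice
            parts.append(current)
            current, blocked = [], set()
        current.append(x)
        blocked.update((x - 1, x, x + 1))
    if current:
        parts.append(current)
    return parts

# partition([1,1,9,5,3,8,7,4,1,2]) -> [[1], [1,9,5,3], [8], [7,4,1], [2]]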
So my question is: is there a better way (algorithm) to do this? When I say better, I mean speed. I don't mind a memory cost.

Conceptually it could look something like this...
DO i = 1, n-1
   Delta(i) = array(i+1) - array(i)
END DO
iMask = 0
WHERE (ABS(Delta) < 2) iMask = 1
ALLOCATE(splits(SUM(iMask)))
k = 0
DO i = 1, n-1
   IF (iMask(i) == 0) CYCLE
   k = k + 1
   splits(k) = i
END DO
!... DEALLOCATE(Splits)
Then just print out the data between the split values. The indices could be off by one, and you may also need to do something for the Nth point, so it depends a bit on your implementation and on whether your delta is "to the next point" or "from the last point".
In this case I used iMask as an integer rather than a logical so I could use SUM.

My initial reaction is the naive approach:
save index bounds on the partition you're currently expanding (partitionNumber from iStart to iEnd)
Take the next point with index iEnd+1 and loop from iStart to iEnd testing that the candidate point is not within 1 of the current members
If the candidate fails the inclusion test, start it in its own partition by resetting iStart and incrementing partitionNumber
Increment iEnd.
If you're expecting the partitions to mostly be quite short, then this should be pretty quick. If you're expecting long chains of increasing or decreasing integers, you could save the min and max of the values in the partition and include a quick test to see whether your candidate is outside that range (a rough sketch of this shortcut is at the end of this answer).
I've not tested this and my fortran might be a bit rusty, but I think it represents the above algorithm.
partitionNumber = 1
iStart = 1
iEnd = 1
iCandidate = iEnd + 1
arrayMember(iStart) = partitionNumber
DO WHILE (iCandidate <= N)
   DO j = iStart, iEnd
      IF ( ABS(array(iCandidate)-array(j)) < 2 ) THEN
         partitionNumber = partitionNumber + 1
         iStart = iCandidate
         EXIT
      END IF
   END DO
   arrayMember(iCandidate) = partitionNumber
   iEnd = iEnd + 1
   iCandidate = iEnd + 1
END DO
Operating on your two examples, I would hope it to return arrayMember with entries
[1,1,9,5,3,8,7,4,1,2] -> [1,2,2,2,2,3,4,4,4,5] (represents [1],[1,9,5,3],[8],[7,4,1],[2])
[1,5,9,1,3,6,4,2,7,0] -> [1,1,1,2,2,2,3,3,3,3] (represents [1,5,9],[1,3,6],[4,2,7,0])
I'm not entirely sure I understand how you would extend your version to all partitions, but this might save on defining mask of size MAX(array)?
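As a side note, the min/max shortcut mentioned above could look roughly like this (sketched in Python rather than Fortran, untested against the code above, and assuming a non-empty array): a candidate outside [min-1, max+1] of the current partition is at least 2 away from every member, so the inner scan can be skipped.
def partition_with_range_check(array):
    parts = [[array[0]]]
    lo = hi = array[0]                     # running min/max of the current partition
    for x in array[1:]:
        if lo - 1 <= x <= hi + 1 and any(abs(x - m) < 2 for m in parts[-1]):
            parts.append([x])              # conflict: start a new partition
            lo = hi = x
        else:
            parts[-1].append(x)            # no conflict: extend the current partition
            lo, hi = min(lo, x), max(hi, x)
    return parts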

Related

How to find contiguous subarray of integers in an array from n arrays such that the sum of elements of such contiguous subarrays is minimum

Input: n arrays of integers of length p.
Output: An array of p integers built by copying contiguous subarrays of the input arrays into matching indices of the output, satisfying the following conditions.
At most one subarray is used from each input array.
Every index of the output array is filled from exactly one subarray.
The output array has the minimum possible sum.
Suppose I have 2 arrays:
[1,7,2]
[2,1,8]
Say I choose the subarray [1,7] from array 1 and the subarray [8] from array 2. These two subarrays cover disjoint index ranges, each is contiguous, and no array contributes more than one subarray.
The number of elements in the collection is 2 + 1 = 3, which equals the length of each individual array (len(array 1) = 3). So this collection is valid.
The sum here for [1,7] and [8] is 1 + 7 + 8 = 16
We have to find a collection of such subarrays such that the total sum of the elements of subarrays is minimum.
A solution for the above 2 arrays would be the collection [2,1] from array 2 and [2] from array 1.
This is a valid collection and the sum is 2 + 1 + 2 = 5 which is the minimum sum for any such collection in this case.
I cannot think of any optimal or correct approach, so I need help.
Some Ideas:
I tried a greedy approach: for each index, pick the minimum element across all arrays. Since the index only ever increases (the choices are non-overlapping), I don't have to store minimum-value indices for every array. But this approach is clearly not correct, since it can end up visiting the same array twice.
Another method I thought of was to start from the 0th index of every array and keep a running sum of the first k elements of each array (since the number of arrays is finite, these sums fit in an array). Take the minimum across these sums; the subarray giving that minimum (the first k elements of that array) is a candidate valid subarray of size k. Then add the (k+1)-th element of every array to its running sum, and if the same array still gives the minimum, repeat. When the minimum moves to another array, keep the subarray up to the index for which the minimum held and treat it as a valid starting subarray. However, this approach also fails, because there could be another subarray of size less than k that, combined with the remaining elements of our size-k subarray, gives a smaller total.
Sorting is not possible either, since sorting breaks the contiguity condition.
Of course, there is a brute force method too.
I am thinking that working through a greedy approach might give some progress.
I have searched other Stack Overflow posts, but couldn't find anything that helps with my problem.
To get you started, here's a recursive branch-&-bound backtracking - and potentially exhaustive - search. Ordering heuristics can have a huge effect on how efficient these are, but without mounds of "real life" data to test against there's scant basis for picking one over another. This incorporates what may be the single most obvious ordering rule.
Because it's a work in progress, it prints stuff as it goes along: all solutions found, whenever they meet or beat the current best; and the index at which a search is cut off early, when that happens (because it becomes obvious that the partial solution at that point can't be extended to meet or beat the best full solution known so far).
For example,
>>> crunch([[5, 6, 7], [8, 0, 3], [2, 8, 7], [8, 2, 3]])
displays
new best
L2[0:1] = [2] 2
L1[1:2] = [0] 2
L3[2:3] = [3] 5
sum 5
cut at 2
L2[0:1] = [2] 2
L1[1:3] = [0, 3] 5
sum 5
cut at 2
cut at 2
cut at 2
cut at 1
cut at 1
cut at 2
cut at 2
cut at 2
cut at 1
cut at 1
cut at 1
cut at 0
cut at 0
So it found two ways to get a minimal sum 5, and the simple ordering heuristic was effective enough that all other paths to full solutions were cut off early.
def disp(lists, ixs):
    from itertools import groupby
    total = 0
    i = 0
    for k, g in groupby(ixs):
        j = i + len(list(g))
        chunk = lists[k][i:j]
        total += sum(chunk)
        print(f"L{k}[{i}:{j}] = {chunk} {total}")
        i = j

def crunch(lists):
    n = len(lists[0])
    assert all(len(L) == n for L in lists)
    # Start with a sum we know can be beat.
    smallest_sum = sum(lists[0]) + 1
    smallest_ixs = [None] * n
    ixsofar = [None] * n

    def inner(i, sumsofar, freelists):
        nonlocal smallest_sum
        assert sumsofar <= smallest_sum
        if i == n:
            print()
            if sumsofar < smallest_sum:
                smallest_sum = sumsofar
                smallest_ixs[:] = ixsofar
                print("new best")
            disp(lists, ixsofar)
            print("sum", sumsofar)
            return
        # Simple greedy heuristic: try available lists in the order
        # of smallest-to-largest at index i.
        for lix in sorted(freelists, key=lambda lix: lists[lix][i]):
            L = lists[lix]
            newsum = sumsofar
            freelists.remove(lix)
            # Try all slices in L starting at i.
            for j in range(i, n):
                newsum += L[j]
                # ">" to find all smallest answers;
                # ">=" to find just one (potentially faster)
                if newsum > smallest_sum:
                    print("cut at", j)
                    break
                ixsofar[j] = lix
                inner(j + 1, newsum, freelists)
            freelists.add(lix)

    inner(0, 0, set(range(len(lists))))
How bad is brute force?
Bad. A brute force way to compute it: say there are n lists each with p elements. The code's ixsofar vector contains p integers each in range(n). The only constraint is that all occurrences of any integer that appears in it must be consecutive. So a brute force way to compute the total number of such vectors is to generate all p-tuples and count the number that meet the constraints. This is woefully inefficient, taking O(n**p) time, but is really easy, so hard to get wrong:
def countb(n, p):
    from itertools import product, groupby
    result = 0
    seen = set()
    for t in product(range(n), repeat=p):
        seen.clear()
        for k, g in groupby(t):
            if k in seen:
                break
            seen.add(k)
        else:
            #print(t)
            result += 1
    return result
For small arguments, we can use that as a sanity check on the next function, which is efficient. This builds on common "stars and bars" combinatorial arguments to deduce the result:
def count(n, p):
    # n lists of length p
    # for r regions, r from 1 through min(p, n)
    # number of ways to split up: comb((p - r) + r - 1, r - 1)
    # for each, ff(n, r) ways to spray in list indices = comb(n, r) * r!
    from math import comb, prod
    total = 0
    for r in range(1, min(n, p) + 1):
        total += comb(p-1, r-1) * prod(range(n, n-r, -1))
    return total
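For instance, a quick cross-check of the two (small, arbitrary ranges chosen just for illustration):
# Sanity check: the closed-form count agrees with brute force for small n and p.
for n in range(1, 5):
    for p in range(1, 6):
        assert count(n, p) == countb(n, p)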
Faster
Following is the best code I have for this so far. It builds more "smarts" into the code I posted before. In one sense, it's very effective. For example, for randomized p = n = 20 inputs it usually finishes within a second. That's nothing to sneeze at, since:
>>> count(20, 20)
1399496554158060983080
>>> _.bit_length()
71
That is, trying every possible way would effectively take forever. The number of cases to try doesn't even fit in a 64-bit int.
On the other hand, boost n (the number of lists) to 30, and it can take an hour. At 50, I haven't seen a non-contrived case finish yet, even if left to run overnight. The combinatorial explosion eventually becomes overwhelming.
OTOH, I'm looking for the smallest sum, period. If you needed to solve problems like this in real life, you'd either need a much smarter approach, or settle for iterative approximation algorithms.
Note: this is still a work in progress, so isn't polished, and prints some stuff as it goes along. Mostly that's been reduced to running a "watchdog" thread that wakes up every 10 minutes to show the current state of the ixsofar vector.
def crunch(lists):
    import datetime
    now = datetime.datetime.now
    start = now()
    n = len(lists[0])
    assert all(len(L) == n for L in lists)
    # Start with a sum we know can be beat.
    smallest_sum = min(map(sum, lists)) + 1
    smallest_ixs = [None] * n
    ixsofar = [None] * n

    import threading
    def watcher(stop):
        if stop.wait(60):
            return
        lix = ixsofar[:]
        while not stop.wait(timeout=600):
            print("watch", now() - start, smallest_sum)
            nlix = ixsofar[:]
            for i, (a, b) in enumerate(zip(lix, nlix)):
                if a != b:
                    nlix.insert(i, "--- " + str(i) + " -->")
                    print(nlix)
                    del nlix[i]
                    break
            lix = nlix

    stop = threading.Event()
    w = threading.Thread(target=watcher, args=[stop])
    w.start()

    def inner(i, sumsofar, freelists):
        nonlocal smallest_sum
        assert sumsofar <= smallest_sum
        if i == n:
            print()
            if sumsofar < smallest_sum:
                smallest_sum = sumsofar
                smallest_ixs[:] = ixsofar
                print("new best")
            disp(lists, ixsofar)
            print("sum", sumsofar, now() - start)
            return
        # If only one input list is still free, we have to take all
        # of its tail. This code block isn't necessary, but gives a
        # minor speedup (skips layers of do-nothing calls),
        # especially when the length of the lists is greater than
        # the number of lists.
        if len(freelists) == 1:
            lix = freelists.pop()
            L = lists[lix]
            for j in range(i, n):
                ixsofar[j] = lix
                sumsofar += L[j]
                if sumsofar >= smallest_sum:
                    break
            else:
                inner(n, sumsofar, freelists)
            freelists.add(lix)
            return
        # Peek ahead. The smallest completion we could possibly get
        # would come from picking the smallest element in each
        # remaining column (restricted to the lists - rows - still
        # available). This probably isn't achievable, but is an
        # absolute lower bound on what's possible, so can be used to
        # cut off searches early.
        newsum = sumsofar
        for j in range(i, n): # pick smallest from column j
            newsum += min(lists[lix][j] for lix in freelists)
        if newsum >= smallest_sum:
            return
        # Simple greedy heuristic: try available lists in the order
        # of smallest-to-largest at index i.
        sortedlix = sorted(freelists, key=lambda lix: lists[lix][i])
        # What's the next int in the previous slice? As soon as we
        # hit an int at least that large, we can do at least as well
        # by just returning, to let the caller extend the previous
        # slice instead.
        if i:
            prev = lists[ixsofar[i-1]][i]
        else:
            prev = lists[sortedlix[-1]][i] + 1
        for lix in sortedlix:
            L = lists[lix]
            if prev <= L[i]:
                return
            freelists.remove(lix)
            newsum = sumsofar
            # Try all non-empty slices in L starting at i.
            for j in range(i, n):
                newsum += L[j]
                if newsum >= smallest_sum:
                    break
                ixsofar[j] = lix
                inner(j + 1, newsum, freelists)
            freelists.add(lix)

    inner(0, 0, set(range(len(lists))))
    stop.set()
    w.join()
Bounded by DP
I've had a lot of fun with this :-) Here's the approach they were probably looking for, using dynamic programming (DP). I have several programs that run faster in "smallish" cases, but none that can really compete on a non-contrived 20x50 case. The runtime is O(2**n * n**2 * p). Yes, that's more than exponential in n! But it's still a minuscule fraction of what brute force can require (see above), and is a hard upper bound.
Note: this is just a loop nest slinging machine-size integers, and using no "fancy" Python features. It would be easy to recode in C, where it would run much faster. As is, this code runs over 10x faster under PyPy (as opposed to the standard CPython interpreter).
Key insight: suppose we're going left to right, have reached column j, the last list we picked from was D, and before that we picked columns from lists A, B, and C. How can we proceed? Well, we can pick the next column from D too, and the "used" set {A, B, C} doesn't change. Or we can pick some other list E, the "used" set changes to {A, B, C, D}, and E becomes the last list we picked from.
Now in all these cases, the details of how we reached state "used set {A, B, C} with last list D at column j" make no difference to the collection of possible completions. It doesn't matter how many columns we picked from each, or the order in which A, B, C were used: all that matters to future choices is that A, B, and C can't be used again, and D can be but - if so - must be used immediately.
Since all ways of reaching this state have the same possible completions, the cheapest full solution must have the cheapest way of reaching this state.
So we just go left to right, one column at a time, and remember for each state in the column the smallest sum reaching that state.
This isn't cheap, but it's finite ;-) Since states are subsets of row indices, combined with (the index of) the last list used, there are 2**n * n possible states to keep track of. In fact, there are only half that, since the way sketched above never includes the index of the last-used list in the used set, but catering to that would probably cost more than it saves.
As is, states here are not represented explicitly. Instead there's just a large list of sums-so-far, of length 2**n * n. The state is implied by the list index: index i represents the state where:
i >> n is the index of the last-used list.
The last n bits of i are a bitset, where bit 2**j is set if and only if list index j is in the set of used list indices.
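To make that packing concrete, a tiny sketch (these helper names are made up for illustration; the code below just inlines the shifts and masks):
def encode_state(last_list, used_bits, n):
    # last-used list index in the high bits, "used" bitset in the low n bits
    return (last_list << n) | used_bits

def decode_state(key, n):
    return key >> n, key & ((1 << n) - 1)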
You could, e.g., represent these by dicts mapping (frozenset, index) pairs to sums instead, but then memory use explodes, runtime zooms, and PyPy becomes much less effective at speeding it.
Sad but true: like most DP algorithms, this finds "the best" answer but retains scant memory of how it was reached. Adding code to allow for that is harder than what's here, and typically explodes memory requirements. Probably easiest here: write new to disk at the end of each outer-loop iteration, one file per column. Then memory use isn't affected. When it's done, those files can be read back in again, in reverse order, and mildly tedious code can reconstruct the path it must have taken to reach the winning state, working backwards one column at a time from the end.
def dumbdp(lists):
    import datetime
    _min = min
    now = datetime.datetime.now
    start = now()
    n = len(lists)
    p = len(lists[0])
    assert all(len(L) == p for L in lists)
    rangen = range(n)
    USEDMASK = (1 << n) - 1
    HUGE = sum(sum(L) for L in lists) + 1
    new = [HUGE] * (2**n * n)
    for i in rangen:
        new[i << n] = lists[i][0]
    for j in range(1, p):
        print("working on", j, now() - start)
        old = new
        new = [HUGE] * (2**n * n)
        for key, g in enumerate(old):
            if g == HUGE:
                continue
            i = key >> n
            new[key] = _min(new[key], g + lists[i][j])
            newused = (key & USEDMASK) | (1 << i)
            for i in rangen:
                mask = 1 << i
                if newused & mask == 0:
                    newkey = newused | (i << n)
                    new[newkey] = _min(new[newkey],
                                       g + lists[i][j])
    result = min(new)
    print("DONE", result, now() - start)
    return result

Summing a fortran array with mask

I have a Fortran array a(i,j). I wish to sum it along dimension 2 (that is, over j) with a mask requiring that j is not equal to i, i.e.:
a1=0
do j=1,n
if(j.ne.i) then
a1=a1+a(i,j)
endif
enddo
What is the way of doing this using the intrinsic SUM function in Fortran? I found the intrinsic to be much faster than the explicit loop.
I thought of trying sum(a(i,:), j.ne.i), but this naturally gives an error. Also, if someone can suggest how to sum only the values of a(i,:) where abs(a(i,j)) is greater than, say, 0.01, that would be helpful.
You can easily avoid any branching for the off-diagonal case. It should be much faster than creating any mask array and checking the mask. Branching (conditional jumps) is costly even when branch prediction can be very efficient.
do j = 1, n
   do i = 1, j-1
      a1 = a1 + a(i,j)
   end do
   do i = j+1, n
      a1 = a1 + a(i,j)
   end do
end do
If you need your code to be fast and not short, you should test this kind of approach. In my tests it is much faster.
To answer your last question, you can use the WHERE construct to build a mask. For example,
logical :: y(3,3) = .false.
real x(3,3)
x = 1
x(1,1) = 0.1
x(2,2) = 0.1
x(3,3) = 0.1
print * , sum(x)
where(abs(x) > 0.25) y = .true.
print *, sum(x,y)
end
Whether this is better than nested do-loops is questionable.
I find that summing the whole array and then subtracting the sum of the diagonal elements can be 2x faster.
a1 = 0
do i = 1, n
   a1 = a1 + a(i,i)
end do
a1 = sum(a) - a1

Julia / Cellular Automata: efficient way to get neighborhood

I'd like to implement a cellular automaton (CA) in Julia. Dimensions should be wrapped, meaning the left neighbor of the leftmost cell is the rightmost cell, etc.
One crucial question is: how do I get the neighbors of a cell in order to compute its state in the next generation? As dimensions should be wrapped and Julia does not allow negative indices (as Python does), I had this idea:
Considered a 1D CA, one generation is a one-dimensional array:
0 0 1 0 0
What if we create a two dimensional Array, where the first row is shifted right and the third is shifted left, like this:
0 0 0 1 0
0 0 1 0 0
0 1 0 0 0
Now the first column contains the states of the first cell and its neighbors, etc.
I think this can easily be generalized to two and more dimensions.
First question: do you think this is a good idea, or is this the wrong track?
EDIT: The answer to the first question was no, so the second question and its code example have been discarded.
Second question: if the approach is basically OK, please have a look at the following sketch:
EDIT: Different approach. Here is a stripped-down version of a 1D CA, using mod1() to get neighborhood indices, as Bogumił Kamiński suggested.
For any cell:
- A: array of all indices
- B: array of all neighborhood states
- C: states converted to one integer
- D: lookup of the next state
function digits2int(digits, base=10)
    int = 0
    for digit in digits
        int = int * base + digit
    end
    return int
end
gen = [0,0,0,0,0,1,0,0,0,0,0]
rule = [0,1,1,1,1,0,0,0]
function nextgen(gen, rule)
    values = [mod1.(x .+ [-1,0,1], size(gen)) for x in 1:length(gen)] # A
    values = [gen[value] for value in values]                         # B
    values = [digits2int(value, 2) for value in values]               # C
    values = [rule[value+1] for value in values]                      # D
    return values
end
for _ in 1:100
    global gen
    println(gen)
    gen = nextgen(gen, rule)
end
Next step should be to extend it to two dimensions, will try it now...
The way I typically do it is to use the mod1 function for wrapped indexing.
With this approach, no matter what the dimensionality of your array a is, when you want to move from position x by a delta dx, it is enough to write mod1(x + dx, size(a, 1)) if x indexes the first dimension of the array.
Here is a simple example of a random walk on a 2D torus counting the number of times a given cell was visited (here I additionally use broadcasting to handle all dimensions in one expression):
function randomwalk()
    a = zeros(Int, 8, 8)
    pos = (1,1)
    for _ in 1:10^6
        # Von Neumann neighborhood
        dpos = rand(((1,0), (-1,0), (0,1), (0,-1)))
        pos = mod1.(pos .+ dpos, size(a))
        a[pos...] += 1
    end
    a
end
Usually, if the CA cells depend only on the cells directly next to them, it's simpler just to "wrap" the vector by adding the last element to the front and the first element to the back, run the simulation, and then "unwrap" by removing the first and last elements again, so the result has the same length as the starting array. For the 1-D case:
const lines = 10
const start = ".........#........."
const rules = [90, 30, 14]
rule2poss(rule) = [rule & (1 << (i - 1)) != 0 for i in 1:8]
cells2bools(cells) = [cells[i] == '#' for i in 1:length(cells)]
bools2cells(bset) = prod([bset[i] ? "#" : "." for i in 1:length(bset)])
function transform(bset, ruleposs)
    newbset = map(x -> ruleposs[x],
        [bset[i + 1] * 4 + bset[i] * 2 + bset[i - 1] + 1
         for i in 2:length(bset)-1])
    vcat(newbset[end], newbset, newbset[1])
end
const startset = cells2bools(start)
for rul in rules
    println("\nUsing Rule $rul:")
    bset = vcat(startset[end], startset, startset[1]) # wrap ends
    rp = rule2poss(rul)
    for _ in 1:lines
        println(bools2cells(bset[2:end-1])) # unwrap ends
        bset = transform(bset, rp)
    end
end
As long as only the adjacent cells are used in the simulation for any given cell, this is correct.
If you extend this to a 2D matrix, you would also "wrap" the first and last rows as well as the first and last columns, and so forth.
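For illustration, the same "wrap a border, simulate, then trim it" trick in 2D, sketched here in Python/NumPy rather than Julia (so this is only a sketch of the idea, not a drop-in for the code above):
import numpy as np

grid = np.arange(9).reshape(3, 3)
padded = np.pad(grid, 1, mode="wrap")   # one wrapped row/column added on every side
# ... update the interior cells of `padded` here ...
result = padded[1:-1, 1:-1]             # trim the border again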

What is the fastest way to count elements in an array?

In my models, one of the most frequently repeated tasks is counting the number of each element within an array. The counting is from a closed set, so I know there are X types of elements, and all or some of them populate the array, along with zeros that represent 'empty' cells. The array is not sorted in any way, could be quite long (about 1M elements), and this task is done thousands of times during one simulation (which is itself part of hundreds of simulations). The result should be a vector r of size X, so that r(k) is the number of occurrences of k in the array.
Example:
For X = 9, if I have the following input vector:
v = [0 7 8 3 0 4 4 5 3 4 4 8 3 0 6 8 5 5 0 3]
I would like to get this result:
r = [0 0 4 4 3 1 1 3 0]
Note that I don't want the count of zeros, and that elements that don't appear in the array (like 2) have a 0 in the corresponding position of the result vector (r(2) == 0).
What would be the fastest way to achieve this goal?
tl;dr: The fastest method depends on the size of the array. For arrays smaller than 2^14, method 3 below (accumarray) is faster. For arrays larger than that, method 2 below (histcounts) is better.
UPDATE: I also tested this with implicit broadcasting, which was introduced in R2016b, and the results are almost identical to the bsxfun approach, with no significant difference relative to the other methods.
Let's see what the available methods to perform this task are. For the following examples we will assume X has n elements, from 1 to n, and that our array of interest is M, a column array that can vary in size. Our result vector will be spp¹, such that spp(k) is the number of k's in M. Although I write about X here, there is no explicit implementation of it in the code below; I just define n = 500, and X is implicitly 1:500.
The naive for loop
The simplest and most straightforward way to handle this task is a for loop that iterates over the elements of X and counts the number of elements in M equal to each:
function spp = loop(M,n)
spp = zeros(n,1);
for k = 1:size(spp,1);
spp(k) = sum(M==k);
end
end
This is of course not so smart, especially if only a small subset of the elements from X populates M, so we'd better first look for those that are already in M:
function spp = uloop(M,n)
u = unique(M); % finds which elements to count
spp = zeros(n,1);
for k = u(u>0).';
spp(k) = sum(M==k);
end
end
Usually, in MATLAB, it is advisable to take advantage of the built-in functions as much as possible, since most of the time they are much faster. I thought of 5 options to do so:
1. The function tabulate
The function tabulate returns a very convenient frequency table that at first sight seems to be the perfect solution for this task:
function tab = tabi(M)
tab = tabulate(M);
if tab(1)==0
tab(1,:) = [];
end
end
The only fix to be done is to remove the first row of the table if it counts the 0 element (it could be that there are no zeros in M).
2. The function histcounts
Another option that can be tweaked quite easily to our needs is histcounts:
function spp = histci(M,n)
spp = histcounts(M,1:n+1);
end
Here, in order to count all the different elements between 1 and n separately, we define the edges to be 1:n+1, so every element in X has its own bin. We could also write histcounts(M(M>0),'BinMethod','integers'), but I already tested it, and it takes more time (though it makes the function independent of n).
3. The function accumarray
The next option I'll bring here is the use of the function accumarray:
function spp = accumi(M)
spp = accumarray(M(M>0),1);
end
here we give the function M(M>0) as input, to skip the zeros, and use 1 as the vals input to count all unique elements.
4. The function bsxfun
We can even use the binary operator @eq (i.e. ==) to look for all elements of each type:
function spp = bsxi(M,n)
spp = bsxfun(@eq,M,1:n);
spp = sum(spp,1);
end
If we keep the first input M and the second input 1:n in different dimensions, so that one is a column vector and the other a row vector, then the function compares each element in M with each element in 1:n and creates a length(M)-by-n logical matrix that we can sum to get the desired result.
5. The function ndgrid
Another option, similar to the bsxfun, is to explicitly create the two matrices of all possibilities using the ndgrid function:
function spp = gridi(M,n)
[Mx,nx] = ndgrid(M,1:n);
spp = sum(Mx==nx);
end
then we compare them and sum over columns, to get the final result.
Benchmarking
I have done a little test to find the fastest method of all those mentioned above; I set n = 500 for all trials. For some (especially the naive for), n has a great impact on execution time, but this is not the issue here since we want to test for a given n.
Here are the results:
We can notice several things:
Interestingly, there is a shift in the fastest method: for arrays smaller than 2^14, accumarray is the fastest; for arrays larger than 2^14, histcounts is the fastest.
As expected, the naive for loops, in both versions, are the slowest, but for arrays smaller than 2^8 the "unique & for" option is slower. ndgrid becomes the slowest for arrays bigger than 2^11, probably because of the need to store very large matrices in memory.
There is some irregularity in the way tabulate works on arrays smaller than 2^9. This result was consistent (with some variation in the pattern) in all the trials I conducted.
(The bsxfun and ndgrid curves are truncated because higher values make my computer get stuck, and the trend is already quite clear.)
Also, notice that the y-axis is log10, so a decrease of one unit (as for arrays of size 2^19, between accumarray and histcounts) means a 10-times faster operation.
I'll be glad to hear in the comments for improvements to this test, and if you have another, conceptually different method, you are most welcome to suggest it as an answer.
The code
Here are all the functions wrapped in a timing function:
function out = timing_hist(N,n)
M = randi([0 n],N,1);
func_times = {'for','unique & for','tabulate','histcounts','accumarray','bsxfun','ndgrid';
timeit(@() loop(M,n)),...
timeit(@() uloop(M,n)),...
timeit(@() tabi(M)),...
timeit(@() histci(M,n)),...
timeit(@() accumi(M)),...
timeit(@() bsxi(M,n)),...
timeit(@() gridi(M,n))};
out = cell2mat(func_times(2,:));
end
function spp = loop(M,n)
spp = zeros(n,1);
for k = 1:size(spp,1);
spp(k) = sum(M==k);
end
end
function spp = uloop(M,n)
u = unique(M);
spp = zeros(n,1);
for k = u(u>0).';
spp(k) = sum(M==k);
end
end
function tab = tabi(M)
tab = tabulate(M);
if tab(1)==0
tab(1,:) = [];
end
end
function spp = histci(M,n)
spp = histcounts(M,1:n+1);
end
function spp = accumi(M)
spp = accumarray(M(M>0),1);
end
function spp = bsxi(M,n)
spp = bsxfun(@eq,M,1:n);
spp = sum(spp,1);
end
function spp = gridi(M,n)
[Mx,nx] = ndgrid(M,1:n);
spp = sum(Mx==nx);
end
And here is the script to run this code and produce the graph:
N = 25; % it is not recommended to run this with N>19 for the `bsxfun` and `ndgrid` functions.
func_times = zeros(N,7);
for n = 1:N
func_times(n,:) = timing_hist(2^n,500);
end
% plotting:
hold on
mark = 'xo*^dsp';
for k = 1:size(func_times,2)
plot(1:size(func_times,1),log10(func_times(:,k).*1000),['-' mark(k)],...
'MarkerEdgeColor','k','LineWidth',1.5);
end
hold off
xlabel('Log_2(Array size)','FontSize',16)
ylabel('Log_{10}(Execution time) (ms)','FontSize',16)
legend({'for','unique & for','tabulate','histcounts','accumarray','bsxfun','ndgrid'},...
'Location','NorthWest','FontSize',14)
grid on
¹ The reason for this weird name comes from my field, ecology. My models are cellular automata that typically simulate individual organisms in a virtual space (the M above). The individuals are of different species (hence spp), and together they form what is called an "ecological community". The "state" of the community is given by the number of individuals of each species, which is the spp vector in this answer. In these models, we first define a species pool (X above) for the individuals to be drawn from, and the community state takes into account all species in the species pool, not only those present in M.
We know that the input vector always contains integers, so why not use this to "squeeze" a bit more performance out of the algorithm?
I've been experimenting with some optimizations of the two best binning methods suggested by the OP, and this is what I came up with:
The number of unique values (X in the question, or n in the example) should be explicitly converted to an (unsigned) integer type.
It's faster to compute an extra bin and then discard it, than to "only process" valid values (see the accumi_new function below).
This function takes about 30sec to run on my machine. I'm using MATLAB R2016a.
function q38941694
datestr(now)
N = 25;
func_times = zeros(N,4);
for n = 1:N
func_times(n,:) = timing_hist(2^n,500);
end
% Plotting:
figure('Position',[572 362 758 608]);
hP = plot(1:n,log10(func_times.*1000),'-o','MarkerEdgeColor','k','LineWidth',2);
xlabel('Log_2(Array size)'); ylabel('Log_{10}(Execution time) (ms)')
legend({'histcounts (double)','histcounts (uint)','accumarray (old)',...
'accumarray (new)'},'FontSize',12,'Location','NorthWest')
grid on; grid minor;
set(hP([2,4]),'Marker','s'); set(gca,'Fontsize',16);
datestr(now)
end
function out = timing_hist(N,n)
% Convert n into an appropriate integer class:
if n < intmax('uint8')
classname = 'uint8';
n = uint8(n);
elseif n < intmax('uint16')
classname = 'uint16';
n = uint16(n);
elseif n < intmax('uint32')
classname = 'uint32';
n = uint32(n);
else % n < intmax('uint64')
classname = 'uint64';
n = uint64(n);
end
% Generate an input:
M = randi([0 n],N,1,classname);
% Time different options:
warning off 'MATLAB:timeit:HighOverhead'
func_times = {'histcounts (double)','histcounts (uint)','accumarray (old)',...
'accumarray (new)';
timeit(@() histci(double(M),double(n))),...
timeit(@() histci(M,n)),...
timeit(@() accumi(M)),...
timeit(@() accumi_new(M))
};
out = cell2mat(func_times(2,:));
end
function spp = histci(M,n)
spp = histcounts(M,1:n+1);
end
function spp = accumi(M)
spp = accumarray(M(M>0),1);
end
function spp = accumi_new(M)
spp = accumarray(M+1,1);
spp = spp(2:end);
end

Find the number of inversions in Matlab

While counting the number of inversions in an array with the divide-and-conquer approach, I ran into the problem of implementing the merge step: given two sorted arrays, count the number of cases in which an element of the first array is greater than an element of the second.
For example, if the arrays are v1 = [1,2,4], v2 = [0,3,5], we should count 4 inversions (1>0, 2>0, 4>0, and 4>3).
So I implemented the merge step in MATLAB, but I am stuck on how to make it fast.
First, I tried a brute-force approach with
tempArray = arrayfun(@(x) length(find(v2>x)), v1)
It is too slow, as is the next snippet:
l = 1;
s = 0;
for ii = 1:n1
while(l <= n2 && p1(ii)>p2(l))
l = l + 1;
end
s = s + l - 1;
end
Is there a good way to make it faster?
Edit
Thank you for your answers and approaches! I found interesting ideas for my further work.
Here is the snippet that is supposed to be the fastest of those I've tried:
n1 = length(vec1); n2 = length(vec2);
uniteOne = [vec1, vec2];
[~, d1] = sort(uniteOne);
[~, d2] = sort(d1); % Find ind-s IX such that B(IX) = A, where B = sort(A)
vec1Slice = d2(1:n1);
finalVecToSolve = vec1Slice - (1:n1);
sum(finalVecToSolve)
Another brute-force approach with bsxfun -
sum(reshape(bsxfun(@gt,v1(:),v2(:).'),[],1))
Or, as @thewaywewalk has mentioned in the comments, use nnz to replace the summing:
nnz(bsxfun(@gt,v1(:),v2(:).'))
Code
n = numel(v1);
[~, ind_sort] = sort([v1 v2]);
ind_v = ind_sort<=n;
result = sum(find(ind_v))-n*(n+1)/2;
Explanation
n denotes the length of the input vectors. ind_v is a vector of length 2*n that represents the values of v1 and v2 sorted together, with one indicating a value from v1 and zero indicating a value from v2. For your example,
v1 = [1,2,4];
v2 = [0,3,5];
we have
ind_v =
0 1 1 0 1 0
The first entry of ind_v is zero. This means that the lowest value from v1 and v2 (which is 0) belongs to v2. Then there is a one because the second-lowest value (which is 1) belongs to v1. The last entry of ind_v is zero because the largest value of the input vectors (which is 5) belongs to v2.
From this ind_v it's easy to compute the result. Namely, we only need to count how many zeros there are to the left of each one, and sum all those counts.
We don't even need to do the counts; we can infer them just from the position of each one. The number of zeros to the left of the first one is the position of that one minus 1. The number of zeros to the left of the second one is its position minus 2. And so on. Thus sum(find(ind_v)-(1:n)) would give the desired result. But sum(1:n) is just n*(n+1)/2, and so the result can be simplified to sum(find(ind_v))-n*(n+1)/2.
Complexity
Sorting the vectors is the limiting operation here, and requires O(2*n*log(2*n)) arithmetic comparisons. Brute force, on the contrary, requires O(n^2) comparisons.
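For comparison, the same trick reads naturally in Python/NumPy if that helps follow the logic (a rough sketch with a hypothetical function name, using 0-based positions instead of MATLAB's 1-based find):
import numpy as np

def count_pairs(v1, v2):
    # Mark, in the jointly sorted order, which values came from v1
    # (a stable sort keeps ties in their original order, so equal values are not counted).
    ind_v = np.argsort(np.concatenate([v1, v2]), kind="stable") < len(v1)
    ones = np.flatnonzero(ind_v)          # 0-based positions of v1's values
    n = len(v1)
    # zeros (v2 values) to the left of the k-th one = position - k; summed over all ones:
    return int(ones.sum()) - n * (n - 1) // 2

# count_pairs([1, 2, 4], [0, 3, 5]) -> 4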
One explicit approach could be to subtract your elements and see where they're negative:
v1 = [1,2,4];
v2 = [0,3,5];
mydiffs = zeros(length(v1), length(v2));
for ii = 1:length(v1)
mydiffs(ii,:) = v2 - v1(ii);
end
test = sum(reshape(mydiffs,[],length(v1)*length(v2)) < 0)
Which returns:
test =
4
This isn't the prettiest approach and there's definitely room for improvement. I also doubt it's faster than the bsxfun approach.
Edit1: An arrayfun approach, which looks tidier but isn't necessarily faster than the loop.
test = arrayfun(@(x) (v2 - x) < 0, v1, 'UniformOutput', false);
inversions = sum([test{:}]);
Edit2: A repmat approach
inversions = nnz(repmat(v2, length(v1), 1) - repmat(v1', 1, length(v2)) < 0)
