Fastest way to get Nth combination without repetition from a large number of symbols - permutation

64 symbols have 64! permutations. How to get one of these permutations from its index/rank and how to get index/rank of one of these permutations in the fastest way in Java or Python or C#?
These permutations have no repetitions, and the length of each of the permutations is equal to the number of symbols given to the function.

N-th permutation
The idea is that whichever digit you select for the first position, what remains is a permutation of (n-1) elements, so the digit selected for the first position is the one at position floor(idx / (n-1)!) among the available digits. Apply this recursively, with idx mod (n-1)! as the index for the remaining positions, and you have the permutation you want.
from functools import lru_cache

@lru_cache
def factorial(n):
    if n <= 1: return 1
    else: return n * factorial(n-1)

def nth_permutation(idx, length, alphabet=None, prefix=()):
    if alphabet is None:
        alphabet = [i for i in range(length)]
    if length == 0:
        return prefix
    else:
        # number of permutations that share the same next symbol
        branch_count = factorial(length-1)
        for d in alphabet:
            if d not in prefix:
                if branch_count <= idx:
                    # skip the whole block of permutations starting with d
                    idx -= branch_count
                else:
                    return nth_permutation(idx,
                        length-1, alphabet, prefix + (d,))
This will return a tuple representing the requested permutation. If you want, you can pass a custom alphabet.
Examples
nth_permutation(1, 10)
# (0, 1, 2, 3, 4, 5, 6, 7, 9, 8)
nth_permutation(1000, 10)
# (0, 1, 2, 4, 6, 5, 8, 9, 3, 7)
nth_permutation(3628799, 10)
# (9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
nth_permutation(10**89, 64)
# [[50 27 40 11 60 12 10 49]
# [63 29 41 0 2 48 43 47]
# [57 6 59 56 17 58 52 39]
# [13 51 25 23 45 24 26 7]
# [46 20 36 62 14 55 31 3]
# [ 4 5 53 15 8 28 16 21]
# [32 30 35 18 19 37 61 44]
# [38 42 54 9 33 34 1 22]]
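The same block-skipping idea can also be written without recursion. The following is only a sketch, not the answer's own code: the name nth_permutation_iter is made up for illustration, and it assumes the symbols are passed in the order that defines index 0.

```python
from math import factorial

def nth_permutation_iter(idx, symbols):
    """Return the idx-th (0-based) permutation of symbols in lexicographic order.

    Hypothetical helper: an iterative variant of the recursive idea above.
    """
    pool = list(symbols)
    if not 0 <= idx < factorial(len(pool)):
        raise IndexError("idx out of range")
    result = []
    for remaining in range(len(pool) - 1, -1, -1):
        # Each choice for the current position covers remaining! permutations,
        # so the quotient picks the symbol and the remainder indexes the rest.
        block, idx = divmod(idx, factorial(remaining))
        result.append(pool.pop(block))
    return tuple(result)

print(nth_permutation_iter(1, range(10)))     # (0, 1, 2, 3, 4, 5, 6, 7, 9, 8)
print(nth_permutation_iter(1000, range(10)))  # (0, 1, 2, 4, 6, 5, 8, 9, 3, 7)
```

Python integers are arbitrary precision, so an index as large as 64! - 1 needs no special handling here.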
Permutation index
The index of a given permutation is the index of the first element multiplied by (n-1)! added to the rank of the permutation of the remaining terms.
def permutation_index(item, alphabet=None):
    if alphabet is None:
        alphabet = sorted(item)
    n = len(item)
    r = 0
    for i, v in enumerate(item):
        # every later element item[j] (j > i) that sorts before item[i]
        # adds (n - i - 1)! to the rank; the factorials are accumulated
        # Horner-style in r instead of being computed explicitly
        r = sum(1 for u in item[i+1:]
                if alphabet.index(u) < alphabet.index(v)) + r * (n - i)
    return r
Consistency check
permutation_index(nth_permutation(1234567890, 16))
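For small inputs, the ranking rule can be cross-checked against brute-force enumeration. This is a sketch under the assumption that the symbols are directly comparable; permutation_rank is a hypothetical name, and it compares elements directly rather than looking them up in an alphabet.

```python
from itertools import permutations

def permutation_rank(perm):
    """0-based lexicographic rank of perm among permutations of its own elements."""
    n = len(perm)
    rank = 0
    for i, v in enumerate(perm):
        # Number of later elements smaller than v (one Lehmer-code digit).
        smaller = sum(1 for u in perm[i + 1:] if u < v)
        # Horner step: multiplying by (n - i) builds up the factorial weights.
        rank = rank * (n - i) + smaller
    return rank

# itertools.permutations yields lexicographic order for sorted input,
# so the position in the enumeration must equal the computed rank.
for expected, perm in enumerate(permutations(range(5))):
    assert permutation_rank(perm) == expected
print("ranks agree for all 120 permutations of 5 symbols")
```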

Related

Splitting a large matrix

I'm using Julia to ingest a large two dimensional data array (data) of size 1000 x 32768; I need to break up the array into smaller square arrays along both dimensions. For instance, I would like to break data into a grid of smaller, square arrays similar to the following image:
Note that no pixels get left out -- when another square cannot be fit in along either axis, the last possible array of square pixels is returned as another array (hence the shifted pink squares on the right hand side).
Currently, I'm doing this through a function I built to decimate the raw dataset:
function decimate_square(data,fraction=4)
    # Read size of input data / calculate length of square side
    sy,sx = size(data)
    square_side = Int(round(sy/fraction))
    # Number of achievable full squares
    itersx,itersy = [Int(floor(s/square_side)) for s in [sx,sy]]
    # Find left/right X values
    for ix in 1:itersx
        if ix!=itersx
            # Full sliding square can be calculated
            left = square_side*(ix-1) + 1
            right = square_side*(ix)
        else
            # Capture last square of data
            left = sx-square_side + 1
            right = sx
        end
        # Find top/bottom Y values for each X
        for iy in 1:itersy
            if iy!=itersy
                # Full sliding square can be calculated
                top = square_side*(iy-1) + 1
                bottom = square_side*(iy)
            else
                # Capture last square of data
                top = sy-square_side + 1
                bottom = sy
            end
            # Record data in 3d stack
            cursquare = data[top:bottom,left:right]
            if (ix==1)&&(iy==1); global dstack=cursquare
            else; dstack=cat(dstack,cursquare,dims=3)
            end
        end
    end
    return dstack
end
Which currently takes ~20 seconds to run:
rand_arr = rand(1000,32768)
t1 = Dates.now()
dec_arr = decimate_square(rand_arr)
t2 = Dates.now()
@info(t2-t1)
[ Info: 19666 milliseconds
This is the biggest bottleneck of my analysis. Is there a pre-built function that I can use, or is there a more efficient way to decimate my array?

You can take views as Przemyslaw Szufel suggests, and the CartesianIndex type comes in handy for selecting blocks of the matrix.
julia> function squareviews(data, fraction = 4)
           squareside = floor(Int, size(data, 1) / fraction)
           [@view(data[CartesianIndex(ix-squareside+1, iy-squareside+1):CartesianIndex(ix, iy)])
               for ix in squareside:squareside:size(data, 1),
                   iy in squareside:squareside:size(data, 2)]
       end
squareviews (generic function with 2 methods)
julia> result = squareviews(M)
4×40 Matrix{SubArray{Int64, 2, Matrix{Int64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}}:
[346 392 … 746 429; 380 193 … 476 757; … ; 424 329 … 285 427; 591 792 … 710 891] … [758 916 … 7 185; 26 846 … 631 808; … ; 945 713 … 875 137; 793 655 … 400 322]
[55 919 … 402 728; 292 238 … 266 636; … ; 62 490 … 913 126; 293 475 … 492 20] [53 8 … 146 365; 216 673 … 157 909; … ; 955 635 … 332 945; 354 913 … 922 272]
[278 966 … 128 334; 700 560 … 226 701; … ; 529 398 … 17 674; 237 830 … 4 788] [239 274 … 983 911; 591 669 … 762 675; … ; 213 949 … 917 903; 336 890 … 633 578]
[723 483 … 135 283; 729 579 … 1000 942; … ; 987 383 … 764 544; 682 942 … 376 179] [370 859 … 444 566; 34 106 … 320 161; … ; 310 41 … 868 349; 719 341 … 718 800]
This divides the data matrix into blocks such that result[2, 3] gives the square that is 2nd from the top and 3rd from the left. (My matrix M was 100x1000 in size, so there are 100/25 = 4 blocks vertically and 1000/25 = 40 blocks horizontally.)
If you want the results linearly like in your original function, you can instead have the second line of the function be:
julia> function squareviews(data, fraction = 4)
           squareside = floor(Int, size(data, 1) / fraction)
           [@view(data[CartesianIndex(ix-squareside+1, iy-squareside+1):CartesianIndex(ix, iy)])
               for iy in squareside:squareside:size(data, 2)
               for ix in squareside:squareside:size(data, 1)]
       end
squareviews (generic function with 2 methods)
julia> squareviews(M)
160-element Vector{SubArray{Int64, 2, Matrix{Int64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}}:
(Note the subtle changes in the for syntax - the iy comes before ix here, there's no comma, and there's an extra for.)
This returns a vector of square matrices (views).
Your original function returned a three-dimensional matrix, in which you'd access values as originalresult[i, j, k]. Here, the equivalent would be result[k][i, j].

There's a lot going on in your code which is not recommended and makes things slow. Here's a somewhat idiomatic solution, with the additional bonus of generalizing to arbitrary ranks:
julia> function square_indices(data; fraction=4)
           splits = cld.(size(data), fraction)
           return Iterators.map(CartesianIndices, Iterators.product(Iterators.partition.(axes(data), splits)...))
       end
square_indices (generic function with 1 method)
The result of this is an iterator over CartesianIndices, which are objects that you can use to index your squares, either with regular indexing, data[ix], or with view(data, ix), which does not create a copy. (Different fractions per dimension are possible, try test_square_indices(println, 4, 4, 4; fraction=(2, 1, 1)).)
And to see whether it works as expected:
julia> function test_square_indices(f, s...; fraction=4)
           arr = reshape(1:prod(s), s...)
           for ix in square_indices(arr; fraction)
               f(view(arr, ix))
           end
       end
test_square_indices (generic function with 1 method)
julia> # just try this on some moderately costly function
       @btime test_square_indices(v -> inv.(v), 1000, 32768)
81.980 ms (139 allocations: 250.01 MiB)
julia> test_square_indices(println, 9)
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
julia> test_square_indices(println, 9, 5)
[1 10; 2 11; 3 12]
[4 13; 5 14; 6 15]
[7 16; 8 17; 9 18]
[19 28; 20 29; 21 30]
[22 31; 23 32; 24 33]
[25 34; 26 35; 27 36]
[37; 38; 39;;]
[40; 41; 42;;]
[43; 44; 45;;]
julia> reshape(1:9*5, 9, 5)
9×5 reshape(::UnitRange{Int64}, 9, 5) with eltype Int64:
1 10 19 28 37
2 11 20 29 38
3 12 21 30 39
4 13 22 31 40
5 14 23 32 41
6 15 24 33 42
7 16 25 34 43
8 17 26 35 44
9 18 27 36 45
julia> test_square_indices(println, 4, 4, 4; fraction=2)
[1 5; 2 6;;; 17 21; 18 22]
[3 7; 4 8;;; 19 23; 20 24]
[9 13; 10 14;;; 25 29; 26 30]
[11 15; 12 16;;; 27 31; 28 32]
[33 37; 34 38;;; 49 53; 50 54]
[35 39; 36 40;;; 51 55; 52 56]
[41 45; 42 46;;; 57 61; 58 62]
[43 47; 44 48;;; 59 63; 60 64]
julia> reshape(1:4*4*4, 4, 4, 4)
4×4×4 reshape(::UnitRange{Int64}, 4, 4, 4) with eltype Int64:
[:, :, 1] =
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
[:, :, 2] =
17 21 25 29
18 22 26 30
19 23 27 31
20 24 28 32
[:, :, 3] =
33 37 41 45
34 38 42 46
35 39 43 47
36 40 44 48
[:, :, 4] =
49 53 57 61
50 54 58 62
51 55 59 63
52 56 60 64
Here's a bit of an illustration of how this works:
julia> data = reshape(1:9*5, 9, 5); fraction = 3;
julia> size(data)
(9, 5)
julia> # chunk sizes
splits = cld.(size(data), fraction)
(3, 2)
julia> # every dimension chunked
Iterators.partition.(axes(data), splits) .|> collect
(UnitRange{Int64}[1:3, 4:6, 7:9], UnitRange{Int64}[1:2, 3:4, 5:5])
julia> # cross product of all chunks
Iterators.product(Iterators.partition.(axes(data), splits)...) .|> collect
3×3 Matrix{Vector{UnitRange{Int64}}}:
[1:3, 1:2] [1:3, 3:4] [1:3, 5:5]
[4:6, 1:2] [4:6, 3:4] [4:6, 5:5]
[7:9, 1:2] [7:9, 3:4] [7:9, 5:5]
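For readers who do not use Julia, the partition-then-product idea can be sketched in plain Python. The helper names chunk_ranges and tile_ranges are made up for this illustration, the ranges are 0-based and half-open rather than Julia's 1-based inclusive ones, and itertools.product varies the last axis fastest, so the tiles come out in a different order than in the Julia output.

```python
from itertools import product
from math import ceil

def chunk_ranges(length, parts):
    """Split 0..length-1 into consecutive ranges of size ceil(length / parts)."""
    step = ceil(length / parts)
    return [range(start, min(start + step, length))
            for start in range(0, length, step)]

def tile_ranges(shape, fraction):
    """One tuple of per-axis ranges for every tile (cross product of the chunks)."""
    return list(product(*(chunk_ranges(n, fraction) for n in shape)))

tiles = tile_ranges((9, 5), 3)
print(len(tiles))   # 9
print(tiles[0])     # (range(0, 3), range(0, 2))
print(tiles[-1])    # (range(6, 9), range(4, 5))
```

As in the Julia version, the last chunk along an axis is simply shorter when the length is not a multiple of the chunk size.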

You could just go with views. Suppose you want to slice your data into 64 matrices, each having size 1000 x 512. In that case you could do:
dats = view.(Ref(rand_arr),Ref(1:1000), [range(1+(i-1)*512,i*512) for i in 1:64])
The time for this on my machine is 600 nanoseconds:
julia> @btime view.(Ref($rand_arr),Ref(1:1000), [range(1+(i-1)*512,i*512) for i in 1:64]);
595.604 ns (3 allocations: 4.70 KiB)

Extract indices of sets of values greater than zero in an array

I have an array of length n. The array has braking energy values, and the index number represents time in seconds.
The structure of array is as follows:
Index 1 to 140, array has zero values. (Vehicle not braking)
Index 141 to 200, array has random energy values. (Vehicle was braking and regenerating energy)
Index 201 to 325, array has zero values. (Vehicle not braking)
Index 326 to 405, array has random energy values. (Vehicle was braking and regenerating energy)
...and so on for an array of length n.
What I want is the starting and ending index of each set of energy values.
For example the above sequence gives this result:
141 - 200
326 - 405
...
Can someone please suggest a method or technique I can use to get this result?

Using diff is a quick way to do this.
Here is a demo (see the comments for details):
% Junk data for demo. Indices shown above for reference
% 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
x = [0, 0, 0, 2, 3, 4, 0, 0, 1, 1, 7, 9, 3, 4, 0, 0, 0];
% Logical converts all non-zero values to 1
% diff is x(2:end)-x(1:end-1), so picks up on changes to/from zeros
% Instead of 'logical', you could have a condition here,
% e.g. bChange = diff( x > 0.5 );
bChange = diff( logical( x ) );
% bChange is one of the following for each consecutive pair:
% 1 for [0 1] pairs
% 0 for [0 0] or [1 1] pairs
% -1 for [1 0] pairs
% We inflate startIdx by 1 to index the non-zero value
startIdx = find( bChange > 0 ) + 1; % Indices of [0 1] pairs
endIdx = find( bChange < 0 ); % Indices of [1 0] pairs
I'll leave it as an exercise to capture the edge cases where you add a start or end index if the array starts or ends with a non-zero value. Hint: you could handle each case separately or pad the initial x with additional end values.
Output of the above:
startIdx
>> [4, 9]
endIdx
>> [6, 14]
So you can format this however you like to get the spans 4-6, 9-14.
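The padding hint can be illustrated outside MATLAB as well. Below is a rough Python sketch (the function name nonzero_runs is hypothetical) that pads the zero/non-zero flags with a zero on both sides, so runs touching the first or last element are still closed off. It reports 1-based inclusive indices to match the MATLAB output above.

```python
def nonzero_runs(values):
    """Return (start, end) pairs, 1-based and inclusive, for each run of non-zero values."""
    # Pad with a zero on both sides so runs at the edges produce a rise and a fall.
    flags = [0] + [1 if v != 0 else 0 for v in values] + [0]
    runs = []
    start = None
    for i in range(1, len(flags)):
        change = flags[i] - flags[i - 1]
        if change == 1:
            # flags[i] corresponds to the 1-based position i in values
            start = i
        elif change == -1:
            # the run ended at the previous position
            runs.append((start, i - 1))
    return runs

x = [0, 0, 0, 2, 3, 4, 0, 0, 1, 1, 7, 9, 3, 4, 0, 0, 0]
print(nonzero_runs(x))             # [(4, 6), (9, 14)]
print(nonzero_runs([1, 1, 0, 2]))  # [(1, 2), (4, 4)]
```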

This task can be done with two methods; both work perfectly.
Wolfie Method:
bChange = diff( EnergyB > 0 );
startIdx = find( bChange > 0 ) + 1; % Indices of [0 1] pairs
endIdx = find( bChange < 0 ); % Indices of [1 0] pairs
Result:
startIdx =
141
370
608
843
endIdx =
212
426
642
912
Second Method:
startends = find(diff([0; EnergyB > 0; 0]));
startends = reshape(startends, 2, [])';
startends(:, 2) = startends(:, 2) - 1
Result:
startends =
141 212
370 426
608 642
843 912

find a list of integers for a checksum

I need a list L of n positive integers that has the following properties:
for each possible subset S of L, the sum of all items of S is not in L
for each possible subset S of L, the sum of all items of S is unique (each subset can be identified by its sum)
Working example 1:
n = 4
L = [1, 5, 7, 9]
check:
1+5 = 6 ok
5+7 = 12 ok
7+9 = 16 ok
9+1 = 10 ok
1+7 = 8 ok
5+9 = 14 ok
1+5+7 = 13 ok
5+7+9 = 21 ok
1+5+9 = 15 ok
1+7+9 = 17 ok
1+5+7+9 = 22 ok
All sums are unique -> L is OK for n = 4
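A check like the one above can be automated by brute force for small n. This is only a sketch with a made-up function name. It assumes the items of L are distinct and counts single-element subsets too, so a sum of two or more items that equals an item of L is also reported as a collision.

```python
from itertools import combinations

def has_unique_subset_sums(items):
    """True if every non-empty subset of items has a different sum."""
    seen = set()
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            total = sum(subset)
            if total in seen:
                return False  # two different subsets share this sum
            seen.add(total)
    return True

print(has_unique_subset_sums([1, 5, 7, 9]))  # True
print(has_unique_subset_sums([1, 2, 3]))     # False, because 1 + 2 == 3
```

The number of subsets doubles with every extra item, so this is only practical as a sanity check for short lists.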

As an easy-to-construct sequence, I suggest using the powers of a fixed base, e.g.
1, 2, 4, 8, ..., 2**k, ...
1, 3, 9, 27, ..., 3**k, ...
1, 4, 16, 64, ..., 4**k, ...
...
1, n, n**2, n**3,..., n**k, ... where n >= 2
Take, for instance, 2: no power of 2 is a sum of other powers of 2; given a sum (number) you can easily find the subset by converting the sum into its binary representation:
23 = 10111 (binary) = 2**0 + 2**1 + 2**2 + 2**4 = 1 + 2 + 4 + 16
In the general case, a simple greedy algorithm will do: given a sum, subtract the largest item less than or equal to it, and keep subtracting until you reach zero:
n = 3
sum = 273
273 - 243 (3**5) = 30
30 - 27 (3**3) = 3
3 - 3 (3**1) = 0
273 = 3**5 + 3**3 + 3**1 = 243 + 27 + 3
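The greedy subtraction can be sketched in a few lines of Python. The function name decode_sum and its parameters are assumptions made for illustration: base is the n from the sequences above and count is the number of powers in the list.

```python
def decode_sum(total, base, count):
    """Recover which powers base**k (0 <= k < count) add up to total, largest first."""
    chosen = []
    for k in range(count - 1, -1, -1):
        power = base ** k
        # Greedy step: take the largest power that still fits.
        if power <= total:
            chosen.append(power)
            total -= power
    if total != 0:
        raise ValueError("total is not a sum of distinct listed powers")
    return chosen

print(decode_sum(273, 3, 6))  # [243, 27, 3]
print(decode_sum(23, 2, 5))   # [16, 4, 2, 1]
```

The greedy choice is safe because, for a base of at least 2, all smaller powers together add up to less than the next power.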

Matlab: extract values from vector A, based on values in vector B

A = [5 10 16 22 28 32 36 44 49 56]
B = [2 1 1 2 1 2 1 2 2 2]
How to get this?
C1 = [10 16 28 36]
C2 = [5 22 32 44 49 56]
C1 needs to get the values from A, only in the positions in which B is 1
C2 needs to get the values from A, only in the positions in which B is 2

You can do it this way:
C1 = A(B==1);
C2 = A(B==2);
B==1 gives a logical array : [ 0 1 1 0 1 0 1 0 0 0 ].
A(logicalArray) returns elements for which the value of logicalArray is true (it is termed logical indexing).
A and logicalArray must of course have the same size.
It is probably the fastest way of doing this operation in MATLAB.
For more information on indexing, see matlab documentation.
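For comparison, the same mask-based selection can be sketched in plain Python, where a list comprehension plays the role of the logical index. The helper name select_where is made up for this example.

```python
A = [5, 10, 16, 22, 28, 32, 36, 44, 49, 56]
B = [2, 1, 1, 2, 1, 2, 1, 2, 2, 2]

def select_where(values, labels, wanted):
    """Keep the values whose label at the same position equals wanted."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    return [v for v, b in zip(values, labels) if b == wanted]

C1 = select_where(A, B, 1)
C2 = select_where(A, B, 2)
print(C1)  # [10, 16, 28, 36]
print(C2)  # [5, 22, 32, 44, 49, 56]
```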

To achieve this with an arbitrary number of groups (not just two as in your example), use accumarray with an anonymous function to collect the values in each group into a cell. To preserve order, B needs to be sorted first (and the same order needs to be applied to A):
[B_sort, ind_sort] = sort(B);
C = accumarray(B_sort.', A(ind_sort).', [], @(x){x.'});
This gives the result in a cell array:
>> C{1}
ans =
10 16 28 36
>> C{2}
ans =
5 22 32 44 49 56
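The grouping idea, one collection per distinct label, can also be sketched in Python with a dictionary. This is an illustration only, with the hypothetical name group_by_label. Appending in input order keeps each group in its original order, so no sorting step is needed here.

```python
from collections import defaultdict

def group_by_label(values, labels):
    """Collect values into one list per distinct label, preserving input order."""
    groups = defaultdict(list)
    for v, b in zip(values, labels):
        groups[b].append(v)
    return dict(groups)

A = [5, 10, 16, 22, 28, 32, 36, 44, 49, 56]
B = [2, 1, 1, 2, 1, 2, 1, 2, 2, 2]
C = group_by_label(A, B)
print(C[1])  # [10, 16, 28, 36]
print(C[2])  # [5, 22, 32, 44, 49, 56]
```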

Julia : Cartesian product of multiple arrays

I would like to compute a product iterator using Iterators.jl.
Let's say I have an array of UnitRanges tab with a priori unknown size.
I would like to compute the cartesian product of the elements of tab.
For example if tab length is 2 and tab[1] = a and tab[2] = b I want to compute product(a,b) from Iterators.jl.
I want to make a generic function that computes the Cartesian product of every component in tab.
I tried something like this
prod = tab[1]
for i in tab[2:end]
prod = product(prod,i)
end
However, if tab has length 3, with components a, b and c, the elements of prod come out in the form (1,(3,2)) rather than (1,3,2), where 1 is an element of c, 3 an element of b and 2 an element of a.

In v0.5, there is now Base.product, which is much better than Iterators.product.
It can handle as many arrays as needed, and it even has a shape:
julia> collect(Base.product([1, 2], [3, 4]))
2×2 Array{Tuple{Int64,Int64},2}:
(1,3) (1,4)
(2,3) (2,4)
julia> collect(Base.product(1:5, 1:3, 1:2, 1:2))
5×3×2×2 Array{NTuple{4,Int64},4}:
[:, :, 1, 1] =
(1,1,1,1) (1,2,1,1) (1,3,1,1)
(2,1,1,1) (2,2,1,1) (2,3,1,1)
(3,1,1,1) (3,2,1,1) (3,3,1,1)
(4,1,1,1) (4,2,1,1) (4,3,1,1)
(5,1,1,1) (5,2,1,1) (5,3,1,1)
[:, :, 2, 1] =
(1,1,2,1) (1,2,2,1) (1,3,2,1)
(2,1,2,1) (2,2,2,1) (2,3,2,1)
(3,1,2,1) (3,2,2,1) (3,3,2,1)
(4,1,2,1) (4,2,2,1) (4,3,2,1)
(5,1,2,1) (5,2,2,1) (5,3,2,1)
[:, :, 1, 2] =
(1,1,1,2) (1,2,1,2) (1,3,1,2)
(2,1,1,2) (2,2,1,2) (2,3,1,2)
(3,1,1,2) (3,2,1,2) (3,3,1,2)
(4,1,1,2) (4,2,1,2) (4,3,1,2)
(5,1,1,2) (5,2,1,2) (5,3,1,2)
[:, :, 2, 2] =
(1,1,2,2) (1,2,2,2) (1,3,2,2)
(2,1,2,2) (2,2,2,2) (2,3,2,2)
(3,1,2,2) (3,2,2,2) (3,3,2,2)
(4,1,2,2) (4,2,2,2) (4,3,2,2)
(5,1,2,2) (5,2,2,2) (5,3,2,2)
The shape is extremely useful for map. For instance, here's how to create a multiplication table using Base.product:
julia> map(prod, Base.product(1:9, 1:9))
9×9 Array{Int64,2}:
1 2 3 4 5 6 7 8 9
2 4 6 8 10 12 14 16 18
3 6 9 12 15 18 21 24 27
4 8 12 16 20 24 28 32 36
5 10 15 20 25 30 35 40 45
6 12 18 24 30 36 42 48 54
7 14 21 28 35 42 49 56 63
8 16 24 32 40 48 56 64 72
9 18 27 36 45 54 63 72 81
Of course, if you don't need the shape, then you are free to ignore it — it will still iterate properly.
And Base.product is fast too!
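The original question was about a list of ranges whose length is not known in advance. As a language-neutral illustration of the unpacking idea, here is a Python sketch with itertools.product. The list tab mirrors the question's variable, and the concrete ranges are made up. Note that Python varies the last component fastest, which is the opposite of the Julia output above.

```python
from itertools import product

# tab holds any number of ranges; its length need not be known in advance
tab = [range(1, 3), range(3, 5), range(5, 7)]

# Unpacking the list passes each range as a separate argument,
# so the results are flat tuples rather than nested ones.
combos = list(product(*tab))
print(len(combos))  # 8
print(combos[0])    # (1, 3, 5)
print(combos[-1])   # (2, 4, 6)
```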
