Hello Everyone,
I am Abhiroop Singh, new to the world of competitive programming. Recently I came across a question that I didn't have the faintest idea how to approach. I have been working on it for the past two days, so please point me in the right direction.
So, the question was:
You are given an array A of N integers. You need to find two integers x and y such that the sum, over all elements of the array, of the absolute difference between the element and the closer of the two chosen integers is minimal.
Task
Determine the minimum possible value of this sum over all choices of x and y.
Example
Assumptions
N=4
A = [2,3,6,7]
Approach
•You can choose the two integers, 3 and 7
•The required sum |2-3| + |3-3| + |6-7| + |7-7| = 1 + 0 + 1 + 0 = 2
Constraints
1 <= T <= 100
2 <= N <= 5*10^3
1 <= A[i] <= 10^6
The sum of N over all test cases does not exceed 5*10^3
Sample input
2
3
1 3 5
4
3 2 5 11
Output
2
3
Explanation
The first line contains the number of test cases, T = 2.
The first test case
Given
• N = 3
• A = [1,3,5]
Approach
• You can choose the two integers 1 and 4.
• The required sum = |1-1| + |3-4| + |5-4| = 0 + 1 + 1 = 2.
The second test case
Given
• N = 4
• A = [3, 2, 5, 11]
Approach
• You can choose the two integers, 3 and 11.
• The required sum = |2-3| + |3-3| + |5-3| + |11-11| = 1 + 0 + 2 + 0 = 3.
My approach:
• First I tried finding median of the array.
• Secondly I tried applying binary search to find the two numbers
Neither worked, so please help me.
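For what it's worth, the median idea does generalize; here is a sketch in Python (my own illustration, consistent with the examples above but not an official solution). Each element pays the distance to the closer of x and y, so after sorting, some prefix is served by x and the remaining suffix by y, and the best single value for a sorted group is its median. Trying every split point with prefix sums:

def solve(a):
    a = sorted(a)
    n = len(a)
    pre = [0] * (n + 1)  # prefix sums of the sorted array
    for i, v in enumerate(a):
        pre[i + 1] = pre[i] + v

    def cost(l, r):
        # cost of serving a[l..r] (inclusive) by its median a[m]
        m = (l + r) // 2
        left = a[m] * (m - l) - (pre[m] - pre[l])
        right = (pre[r + 1] - pre[m + 1]) - a[m] * (r - m)
        return left + right

    return min(cost(0, i) + cost(i + 1, n - 1) for i in range(n - 1))

print(solve([2, 3, 6, 7]))   # 2
print(solve([1, 3, 5]))      # 2
print(solve([3, 2, 5, 11]))  # 3

This reproduces all three expected outputs above.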
Given an array of values, how can you build a data structure that lets you find the maximum of any contiguous subarray quickly? Ideally the overhead of building this structure should be small and the structure should allow efficient appends and mutation of single elements.
An example array would be [6, 2, 3, 7, 4, 5, 1, 0, 3]. A request may be to find the maximum of the slice from index 2 to 7 (subarray [3, 7, 4, 5, 1]), which would result in 7.
Let n be the length of the array and k be the length of the slice.
The naïve, O(log k), method
An obvious solution is to build a tree that repeatedly gives a pairwise summary of the maximums
1 8 4 5 4 0 1 5 6 9 1 7 0 4 0 9 0 7 0 4 5 7 4 3 4 6 3 8 2 4 · ·
8 5 4 5 9 7 4 9 7 4 7 4 6 8 4 ·
8 5 9 9 7 7 8 4
8 9 7 8
9 8
9
These summaries take at most O(n) space, and the lower levels can be stored efficiently by using short indices. The bottom level, for example, can be a bit array. Appends and single mutations take O(log n) time. There are many other areas for optimization if need be.
The chosen slice can be split into two slices, split on a boundary between two triangles. In this example, for a given slice we'd split as so:
|---------------------------------|
6 9 1 7 0 4 0 9|0 7 0 4 5 7 4 3 4 6 3 8 2 4 · ·
9 7 4 9 | 7 4 7 4 6 8 4 ·
9 9 | 7 7 8 4
9 | 7 8
| 8
In each triangle we are interested in a forest of these trees that minimally determines the elements we actually care about:
|---------------------------------|
1 7 0 4 0 9|0 7 0 4 5 7 4 3 4 6 3
7 4 9 | 7 4 7 4 6
9 | 7 7
| 7
Note that in this case there are two trees on the left and three on the right. The total number of trees will be at most O(log k), since there are at most two of any given height. We can find the splitting point with a little bit-math
round_to = (start ^ end).bit_length() - 1
split_point = (end >> round_to) << round_to
Note that Python's bit_length can be done quickly with the lzcnt instruction on x86 architectures. The relevant trees are on each side of the split. The sizes of the relevant subtrees are encoded in the bits of the residuals of these numbers:
lhs_residuals = split_point - start
rhs_residuals = end - split_point
bin(lhs_residuals)
# eg. 10010110
# sizes = 10000000
# 10000
# 100
# 10
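As a concrete check, here is the bit-math above run end to end (start and end picked arbitrarily for illustration):

start, end = 5, 13
round_to = (start ^ end).bit_length() - 1    # 3: the highest differing bit
split_point = (end >> round_to) << round_to  # 8: the boundary between the triangles
lhs_residuals = split_point - start          # 3 = 0b11  -> left trees of size 2, 1
rhs_residuals = end - split_point            # 5 = 0b101 -> right trees of size 4, 1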
It's hard to traverse the most significant bits of an integer, but if you do a bit reversal (a byteswap instruction plus a few shift-and-masks) you can then traverse the least significant bits by iterating this:
new_value = value & (value - 1)
lowest_set_bit = value ^ new_value
value = new_value
A traversal down the left and right halves takes O(log k) time because there are at most 2·log₂ k trees - one per bit on each side.
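For reference, here is the textbook cousin of this O(log n)-query structure - a flat, array-backed segment tree - sketched in Python (my own code, not the exact triangle layout described above):

def build(data):
    # tree[i] summarizes tree[2*i] and tree[2*i+1]; the leaves live at tree[n:]
    n = len(data)
    tree = [0] * n + list(data)
    for i in range(n - 1, 0, -1):
        tree[i] = max(tree[2 * i], tree[2 * i + 1])
    return tree

def range_max(tree, start, end):
    # maximum of data[start:end], walking up from the leaves
    n = len(tree) // 2
    lo, hi = start + n, end + n
    best = None
    while lo < hi:
        if lo & 1:  # lo is a right child: include it and step past it
            best = tree[lo] if best is None else max(best, tree[lo])
            lo += 1
        if hi & 1:  # the node just left of hi is fully inside the range
            hi -= 1
            best = tree[hi] if best is None else max(best, tree[hi])
        lo //= 2
        hi //= 2
    return best

tree = build([6, 2, 3, 7, 4, 5, 1, 0, 3])
assert range_max(tree, 2, 7) == 7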
A tangent: handling residuals in O(1) time and O(n log n) space
O(log k) is better than O(log n), but it's still not groundbreaking. One helpful effect of the previous attempt is that the trees to each side are "attached" to one side; there are only n ranges in their slice, not n² for an arbitrary slice. You can utilize this by adding to each level cumulative maxima, as so:
1 8 4 5 4 0 1 5 6 9 1 7 0 4 0 9 0 7 0 4 5 7 4 3 4 6 3 8 2 4 · ·
- 8|- 5|- 4|- 5|- 9|- 7|- 4|- 9|- 7|- 4|- 7|- 4|- 6|- 8|- 4|- · left to right
8 -|5 -|4 -|5 -|9 -|7 -|4 -|9 -|7 -|4 -|7 -|4 -|6 -|8 -|4 -|· - right to left
- - 8 8|- - 4 5|- - 9 9|- - 4 9|- - 7 7|- - 7 7|- - 6 8|- - · · left to right
8 8 - -|5 5 - -|9 9 - -|9 9 - -|7 7 - -|7 7 - -|8 8 - -|4 4 - - right to left
- - - - 8 8 8 8|- - - - 9 9 9 9|- - - - 7 7 7 7|- - - - 8 8 · · left to right
8 8 5 5 - - - -|9 9 9 9 - - - -|7 7 7 7 - - - -|8 8 8 8 - - - - right to left
- - - - - - - - 8 9 9 9 9 9 9 9|- - - - - - - - 7 7 7 8 8 8 · · left to right
9 9 9 9 9 9 9 9 - - - - - - - -|8 8 8 8 8 8 8 8 - - - - - - - - right to left
- - - - - - - - - - - - - - - - 9 9 9 9 9 9 9 9 9 9 9 9 9 9 · · left to right
9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 - - - - - - - - - - - - - - - - right to left
The marker - denotes parts that are necessarily the same as the level below them, and so do not need to be copied. In this case, the relevant slices are
|---------------------------------|
1 7 0 4 0 9 0 7 0 4 5 7 4 3 4 6 3
↓ ↓
9 9 9 9 - - - -|- - - - - - - - 7 7 7 8 8 8 · ·
right to left | left to right
and the wanted maxima are as indicated. The true maximum is then the maximum of those two values.
This obviously takes O(n log n) memory, since there are log n levels and each needs a complete row of values (though they can be indices to save space). Updates, however, take O(n) time as they may propagate - appending a 10 to this array would invalidate the whole bottom right-to-left row, for example. Mutations are equally inefficient.
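For comparison, here is a minimal sketch (my own code) of the standard sparse table - the textbook O(n log n)-space, O(1)-query structure this aside approximates, and the same idea as the Wikipedia solution mentioned at the end of this post. Every query is answered by two overlapping power-of-two windows:

def build_sparse(data):
    # table[j][i] holds the maximum of data[i : i + 2**j]
    n = len(data)
    table = [list(data)]
    j = 1
    while (1 << j) <= n:
        prev = table[-1]
        half = 1 << (j - 1)
        table.append([max(prev[i], prev[i + half])
                      for i in range(n - (1 << j) + 1)])
        j += 1
    return table

def sparse_max(table, start, end):
    # maximum of data[start:end]
    j = (end - start).bit_length() - 1
    return max(table[j][start], table[j][end - (1 << j)])

table = build_sparse([6, 2, 3, 7, 4, 5, 1, 0, 3])
assert sparse_max(table, 2, 7) == 7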
O(1) time by answering a different question
Depending on the context you need this for, you may find it is possible to truncate the search depth. This works if you are allowed some leeway in your slice relative to the size of the slice. Since the slices shrink geometrically, although a slice from 0:4294967295 takes a massive 32 iterations, truncating to a fixed quantity of 11 iterations gives the maximum of the slice 0:4292870144, a 0.05% difference. This may be acceptable.
O(1) expected time by exploiting probability
Rounding may be acceptable, but even if it is you're still doing an O(log n) algorithm - just with a smaller, fixed n. It is possible to do a lot better on randomly distributed data.
Consider one side of a forest. As you traverse down it, the fraction of the range you have already seen grows geometrically relative to the fraction you haven't, so the probability that you've already seen the maximum rises in step. It makes sense that you can use this to your advantage.
Consider this half again:
---------------------|
0 7 0 4 5 7 4 3 4 6 3 8 2 4 · ·
7 4 7 4 6* 8 4 ·
7 7 8* 4
7* 8
8
After checking the 7*, don't immediately traverse to the 6*. Instead, check the smallest parent of all of the rest, which is the 8*. Only if this parent is larger than the maximum so far do you need to continue traversing down; otherwise you can stop iterating. It just so happens that the largest value is one past the end here, so we traverse all the way down, but you can imagine this is unusual.
At least half of the time you only need to evaluate the first triangle, at least half of the rest you only need to look down once more, etc. This is a geometric sequence that shows the average traversal cost is two traversals; less if you include the fact that the remaining triangles can be less than half the size some of the time.
And in the worst case?
The worst case occurs with nonrandom trees. The most pathological is sorted data:
---------------------|
0 1 2 3 4 5 6 7 8 9 a b c d e f
1 3 5 7 9 b d f
3 7 b f
7 f
f
The maximum is always in the fragment of the range you haven't seen, regardless of which slice you choose, so the traversal is always O(log n). Unfortunately sorted data is frequent in practice, and this algorithm is hurt by it (a property it shares with several other algorithms, like quicksort). It is possible to mitigate the harm, though.
Not dying on sorted data
If each node says whether it's sorted, or sorted in reverse, then upon reaching that node you don't need to do any more traversal - you can just take the first or last element in the subarray.
---------------------|
0 1 2 3 4 5 6 7 8 9 a b c d e f
→ → → → → → → →
→ → → →
→ →
→
You might find you instead have mostly-sorted data with some small randomization, though, which breaks the scheme:
---------------------|
0 1 2 3 4 5 6 7 a 9 a b d 0 e f
→ → → → ← → ← →
→ → b f
→ f
f
So instead, each node can store the maximum number of levels you can go down whilst remaining sorted, and in which direction. You then skip down that many iterations. An example:
---------------------|
0 1 2 3 4 5 6 7 a 9 a b d 0 e f
→1 →1 →1 →1 ←1 →1 ←1 →1
0 3 5 7 a b d f
→2 →2 →1 →1
3 7 b f
→3 →2
7 f
→3
f
→n means if you skip down n levels the nodes will all be sorted left to right. The top node is →3 because three levels down is ordered: 0 3 5 7 a b d f. The direction is easy to encode in a single bit. Thus mostly-sortedness is handled gracefully.
This is easy to keep updated, because each node can calculate its value from its direct children: if the children are sorted in the same direction and the boundary between them agrees, take the minimum of their distances and add one; otherwise reset to a distance of 1, pointing in the direction the two children are ordered. The hardest part is the logic in traversal, which looks a bit finicky.
It is still possible to produce examples that require traversal all the way to the bottom, but they should not occur frequently in non-adversarial data.
I have discovered by coincidence the term for this problem:
Range minimum query
Unsurprisingly, this is a well-studied problem, albeit one that seems hard to search for. Wikipedia gives some solutions which are noticeably different to mine.
The O(1) time, O(n log n) space solution in particular is much more effective than my similar aside, since it allows appends in O(log n) time, which may suffice, rather than the terrible O(n) mine caused.
The other approaches are asymptotically decent, and the final result is particularly good. The O(log n) time, O(n) space solution is technically weaker than my final result, but log n is never large and it has better constant factors on search due to its linear scan of memory. Appending elements in both cases is amortized O(1), with the Wikipedia variant doing better with sufficient care. I would expect setting the block size to something fixed and applying the algorithm straightforwardly would be a practical win. In my case, even excessive block sizes of, say, 128 would be plenty fast for searches, and minimise both overhead on append and the constant factor of the space overhead.
The final constant-time approach seems like an academic result of little practical use.
I want to create a matrix from a vector by concatenating the vector onto itself n times. So if my vector is mx1, then my matrix will be mxn and each column of the matrix will be equal to the vector.
Which of the following is the best/correct way, or maybe there is a better way I do not know?
matrix = repmat(vector, 1, n);
matrix = vector * ones(1, n);
Thanks
Here is some benchmarking using timeit with different vector sizes and repetition factors. The results to be shown are for Matlab R2015b on Windows.
First define a function for each of the considered approaches:
%// repmat approach
function matrix = f_repmat(vector, n)
matrix = repmat(vector, 1, n);
%// multiply approach
function matrix = f_multiply(vector, n)
matrix = vector * ones(1, n);
%// indexing approach
function matrix = f_indexing(vector,n)
matrix = vector(:,ones(1,n));
Then generate vectors of different size, and use different repetition factors:
M = round(logspace(2,4,15)); %// vector sizes
N = round(logspace(2,3,15)); %// repetition factors
time_repmat = NaN(numel(M), numel(N)); %// preallocate results
time_multiply = NaN(numel(M), numel(N));
time_indexing = NaN(numel(M), numel(N));
for ind_m = 1:numel(M);
for ind_n = 1:numel(N);
vector = (1:M(ind_m)).';
n = N(ind_n);
time_repmat(ind_m, ind_n) = timeit(@() f_repmat(vector, n)); %// measure time
time_multiply(ind_m, ind_n) = timeit(@() f_multiply(vector, n));
time_indexing(ind_m, ind_n) = timeit(@() f_indexing(vector, n));
end
end
The results are plotted in the following two figures, using repmat as reference:
figure
imagesc(time_multiply./time_repmat)
set(gca, 'xtick',1:2:numel(N), 'xticklabels',N(1:2:end))
set(gca, 'ytick',1:2:numel(M), 'yticklabels',M(1:2:end))
title('Time of multiply / time of repmat')
axis image
colorbar
figure
imagesc(time_indexing./time_repmat)
set(gca, 'xtick',1:2:numel(N), 'xticklabels',N(1:2:end))
set(gca, 'ytick',1:2:numel(M), 'yticklabels',M(1:2:end))
title('Time of indexing / time of repmat')
axis image
colorbar
Perhaps a better comparison is to indicate, for each tested vector size and repetition factor, which of the three approaches is the fastest:
figure
times = cat(3, time_repmat, time_multiply, time_indexing);
[~, fastest] = min(times, [], 3);
imagesc(fastest)
set(gca, 'xtick',1:2:numel(N), 'xticklabels',N(1:2:end))
set(gca, 'ytick',1:2:numel(M), 'yticklabels',M(1:2:end))
title('1: repmat is fastest; 2: multiply is; 3: indexing is')
axis image
colorbar
Some conclusions can be drawn from the figures:
The multiply-based approach is always slower than repmat
The indexing-based approach is similar to repmat. It tends to be faster for large values of vector size or repetition factor, and slower for small values.
Either method is correct if it provides you with the desired output.
However, depending on how you declare your vector, you may get unintended results with repmat that would be spotted if you used ones. For instance, take this example (with n = 5):
>> v = 1:10;
>> m = v * ones(1, n)
Error using *
Inner matrix dimensions must agree.
>> m = repmat(v, 1, n)
m =
Columns 1 through 22
1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2
Columns 23 through 44
3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4
Columns 45 through 50
5 6 7 8 9 10
ones provides an error to let you know you aren't doing the right thing, but repmat doesn't; whilst this next example works correctly with both repmat and ones:
>> v = (1:10).';
>> m = v * ones(1, n)
m =
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
10 10 10 10 10
>> m = repmat(v, 1, n)
m =
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
10 10 10 10 10
You can also do this -
vector(:,ones(1,n))
But, if I have to choose, repmat would be the go-to approach for me, as it is made exactly for this purpose. Also, depending on how you are going to use this replicated array, you might avoid creating it altogether with bsxfun, which does on-the-fly replication of its input arrays while applying some operation to them. Here's a comparison on that - Comparing BSXFUN and REPMAT - which shows bsxfun to be better than repmat in most cases.
Benchmarking
For the sake of performance, let's test out these. Here's a benchmarking code to do so -
%// Inputs
vector = rand(1000,1);
n = 1000;
%// Warm up tic/toc.
for iter = 1:50000
tic(); elapsed = toc();
end
disp(' ------- With REPMAT -------')
tic,
for iter = 1:200
A = repmat(vector, 1, n);
end
toc, clear A
disp(' ------- With vector(:,ones(1,n)) -------')
tic,
for iter = 1:200
A = vector(:,ones(1,n));
end
toc, clear A
disp(' ------- With vector * ones(1, n) -------')
tic,
for iter = 1:200
A = vector * ones(1, n);
end
toc
Runtime results -
------- With REPMAT -------
Elapsed time is 1.241546 seconds.
------- With vector(:,ones(1,n)) -------
Elapsed time is 1.212566 seconds.
------- With vector * ones(1, n) -------
Elapsed time is 3.023552 seconds.
Both are correct, but repmat is a more general solution for multi-dimensional matrix copying and is thus bound to be slower than a solution tailored to this specific case. The specific 'homemade' solution of multiplying two vectors is possibly faster. It is probably even faster to do selection instead of multiplication, i.e. vector(:,ones(n,1)) instead of vector*ones(1,n).
EDIT:
Type open repmat in your Command Window. As you can see, it is not a built-in function. You can see that it also makes use of ones (selecting) to copy matrices. However, since it is a more general solution (for scalars and multi-dimensional matrices and copies in multiple directions), you will find unnecessary if statements and other unnecessary code, effectively slowing things down.
EDIT:
Multiplying vectors with ones becomes slower for very large vectors. The unequivocal winner is using ones with selection, i.e. vector(:,ones(n,1)) (which should always be faster than repmat since it uses the same strategy).
The cycle leader iteration algorithm is an algorithm for shuffling an array by moving all even-numbered entries to the front and all odd-numbered entries to the back while preserving their relative order. For example, given this input:
a 1 b 2 c 3 d 4 e 5
the output would be
a b c d e 1 2 3 4 5
This algorithm runs in O(n) time and uses only O(1) space.
One unusual detail of the algorithm is that it works by splitting the array up into blocks of size 3^k + 1. Apparently this is critical for the algorithm to work correctly, but I have no idea why this is.
Why is the choice of 3^k + 1 necessary in the algorithm?
Thanks!
This is going to be a long answer. The answer to your question isn't simple and requires some number theory to fully answer. I've spent about half a day working through the algorithm and I now have a good answer, but I'm not sure I can describe it succinctly.
The short version:
Breaking the input into blocks of size 3^k + 1 essentially breaks the input apart into blocks of size 3^k - 1 surrounded by two elements that do not end up moving.
The remaining 3^k - 1 elements in the block move according to an interesting pattern: each element moves to the position given by dividing the index by two modulo 3^k.
This particular motion pattern is connected to a concept from number theory and group theory called primitive roots.
Because the number two is a primitive root modulo 3^k, beginning with the numbers 1, 3, 9, 27, etc. and running the pattern is guaranteed to cycle through all the elements of the array exactly once and put them into the proper place.
This pattern is highly dependent on the fact that 2 is a primitive root of 3^k for any k ≥ 1. Changing the size of the array to another value will almost certainly break this, because the required property would no longer hold.
The Long Version
To present this answer, I'm going to proceed in steps. First, I'm going to introduce cycle decompositions as a motivation for an algorithm that will efficiently shuffle the elements around in the right order, subject to an important caveat. Next, I'm going to point out an interesting property of how the elements happen to move around in the array when you apply this permutation. Then, I'll connect this to a number-theoretic concept called primitive roots to explain the challenges involved in implementing this algorithm correctly. Finally, I'll explain why this leads to the choice of 3k + 1 as the block size.
Cycle Decompositions
Let's suppose that you have an array A and a permutation of the elements of that array. Following the standard mathematical notation, we'll denote the permutation of that array as σ(A). We can line the initial array A up on top of the permuted array σ(A) to get a sense for where every element ended up. For example, here's an array and one of its permutations:
A 0 1 2 3 4
σ(A) 2 3 0 4 1
One way that we can describe a permutation is just to list off the new elements inside that permutation. However, from an algorithmic perspective, it's often more helpful to represent the permutation as a cycle decomposition, a way of writing out a permutation by showing how to form that permutation by beginning with the initial array and then cyclically permuting some of its elements.
Take a look at the above permutation. First, look at where the 0 ended up. In σ(A), the element 0 ended up taking the place of where the element 2 used to be. In turn, the element 2 ended up taking the place of where the element 0 used to be. We denote this by writing (0 2), indicating that 0 should go where 2 used to be, and 2 should go where 0 used to be.
Now, look at the element 1. The element 1 ended up where 4 used to be. The number 4 then ended up where 3 used to be, and the element 3 ended up where 1 used to be. We denote this by writing (1 4 3), that 1 should go where 4 used to be, that 4 should go where 3 used to be, and that 3 should go where 1 used to be.
Combining these together, we can represent the overall permutation of the above elements as (0 2)(1 4 3) - we should swap 0 and 2, then cyclically permute 1, 4, and 3. If we do that starting with the initial array, we'll end up at the permuted array that we want.
Cycle decompositions are extremely useful for permuting arrays in place because it's possible to permute any individual cycle in O(C) time and O(1) auxiliary space, where C is the number of elements in the cycle. For example, suppose that you have a cycle (1 6 8 4 2). You can permute the elements in the cycle with code like this:
int[] cycle = {1, 6, 8, 4, 2};
int temp = array[cycle[0]];
for (int i = 1; i < cycle.length; i++) {
    // swap temp with the element at the next position in the cycle
    int next = array[cycle[i]];
    array[cycle[i]] = temp;
    temp = next;
}
array[cycle[0]] = temp;
This works by just swapping everything around until everything comes to rest. Aside from the space usage required to store the cycle itself, it only needs O(1) auxiliary storage space.
In general, if you want to design an algorithm that applies a particular permutation to an array of elements, you can usually do so by using cycle decompositions. The general algorithm is the following:
for (each cycle in the cycle decomposition) {
apply the above algorithm to cycle those elements;
}
The overall time and space complexity for this algorithm depends on the following:
How quickly can we determine the cycle decomposition we want?
How efficiently can we store that cycle decomposition in memory?
To get an O(n)-time, O(1)-space algorithm for the problem at hand, we're going to show that there's a way to determine the cycle decomposition in O(1) time and space. Since everything will get moved exactly once, the overall runtime will be O(n) and the overall space complexity will be O(1). It's not easy to get there, as you'll see, but then again, it's not awful either.
The Permutation Structure
The overarching goal of this problem is to take an array of 2n elements and shuffle it so that even-positioned elements end up at the front of the array and odd-positioned elements end up at the end of the array. Let's suppose for now that we have 14 elements, like this:
0 1 2 3 4 5 6 7 8 9 10 11 12 13
We want to shuffle the elements so that they come out like this:
0 2 4 6 8 10 12 1 3 5 7 9 11 13
There are a couple of useful observations we can have about the way that this permutation arises. First, notice that the first element does not move in this permutation, because even-indexed elements are supposed to show up in the front of the array and it's the first even-indexed element. Next, notice that the last element does not move in this permutation, because odd-indexed elements are supposed to end up at the back of the array and it's the last odd-indexed element.
These two observations, put together, means that if we want to permute the elements of the array in the desired fashion, we actually only need to permute the subarray consisting of the overall array with the first and last elements dropped off. Therefore, going forward, we are purely going to focus on the problem of permuting the middle elements. If we can solve that problem, then we've solved the overall problem.
Now, let's look at just the middle elements of the array. From our above example, that means that we're going to start with an array like this one:
Element 1 2 3 4 5 6 7 8 9 10 11 12
Index 1 2 3 4 5 6 7 8 9 10 11 12
We want to get the array to look like this:
Element 2 4 6 8 10 12 1 3 5 7 9 11
Index 1 2 3 4 5 6 7 8 9 10 11 12
Because this array was formed by taking a 0-indexed array and chopping off the very first and very last element, we can treat this as a one-indexed array. That's going to be critically important going forward, so be sure to keep that in mind.
So how exactly can we go about generating this permutation? Well, for starters, it doesn't hurt to take a look at each element and to try to figure out where it began and where it ended up. If we do so, we can write things out like this:
The element at position 1 ended up at position 7.
The element at position 2 ended up at position 1.
The element at position 3 ended up at position 8.
The element at position 4 ended up at position 2.
The element at position 5 ended up at position 9.
The element at position 6 ended up at position 3.
The element at position 7 ended up at position 10.
The element at position 8 ended up at position 4.
The element at position 9 ended up at position 11.
The element at position 10 ended up at position 5.
The element at position 11 ended up at position 12.
The element at position 12 ended up at position 6.
If you look at this list, you can spot a few patterns. First, notice that the final index of all the even-numbered elements is always half the position of that element. For example, the element at position 4 ended up at position 2, the element at position 12 ended up at position 6, etc. This makes sense - we pushed all the even elements to the front of the array, so half of the elements that came before them will have been displaced and moved out of the way.
Now, what about the odd-numbered elements? Well, there are 12 total elements. Each odd-numbered element gets pushed to the second half, so an odd-numbered element at position 2k+1 will get pushed to at least position 7. Its position within the second half is given by the value of k. Therefore, the element at an odd position 2k+1 gets mapped to position 7 + k.
We can take a minute to generalize this idea. Suppose that the array we're permuting has length 2n. An element at position 2x will be mapped to position x (again, even numbers get halved), and an element at position 2x+1 will be mapped to position n + 1 + x. Restating this:
The final position of an element at position p is determined as follows:
If p = 2x for some integer x, then 2x ↦ x
If p = 2x+1 for some integer x, then 2x+1 ↦ n + 1 + x
And now we're going to do something that's entirely crazy and unexpected. Right now, we have a piecewise rule for determining where each element ends up: we either divide by two, or we do something weird involving n + 1. However, from a number-theoretic perspective, there is a single, unified rule explaining where all elements are supposed to end up.
The insight we need is that in both cases, it seems like, in some way, we're dividing the index by two. For the even case, the new index really is formed by just dividing by two. For the odd case, the new index kinda looks like it's formed by dividing by two (notice that 2x+1 went to x + (n + 1)), but there's an extra term in there. In a number-theoretic sense, though, both of these really correspond to division by two. Here's why.
Rather than taking the source index and dividing by two to get the destination index, what if we take the destination index and multiply by two? If we do that, an interesting pattern emerges.
Suppose our original number was 2x. The destination is then x, and if we double the destination index to get back 2x, we end up with the source index.
Now suppose that our original number was 2x+1. The destination is then n + 1 + x. Now, what happens if we double the destination index? If we do that, we get back 2n + 2 + 2x. If we rearrange this, we can alternatively rewrite this as (2x+1) + (2n+1). In other words, we've gotten back the original index, plus an extra (2n+1) term.
Now for the kicker: what if all of our arithmetic is done modulo 2n + 1? In that case, if our original number was 2x + 1, then twice the destination index is (2x+1) + (2n+1) = 2x + 1 (modulo 2n+1). In other words, the destination index really is half of the source index, just done modulo 2n+1!
This leads us to a very, very interesting insight: the ultimate destination of each of the elements in a 2n-element array is given by dividing that number by two, modulo 2n+1. This means that there really is a nice, unified rule for determining where everything goes. We just need to be able to divide by two modulo 2n+1. It just happens to work out that in the even case, this is normal integer division, and in the odd case, it works out to taking the form n + 1 + x.
Consequently, we can reframe our problem in the following way: given a 1-indexed array of 2n elements, how do we permute the elements so that each element that was originally at index x ends up at position x/2 mod (2n+1)?
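As a quick sanity check (my own snippet, not part of the original answer): dividing by two modulo 2n+1 is the same as multiplying by n+1, since 2(n+1) = 2n+2 = 1 mod 2n+1, and this modular rule agrees with the piecewise rule everywhere:

def dest_piecewise(p, n):
    # even position 2x -> x; odd position 2x+1 -> n + 1 + x
    return p // 2 if p % 2 == 0 else n + 1 + p // 2

def dest_modular(p, n):
    # "divide by two mod 2n+1", using n+1 as the modular inverse of 2
    return (p * (n + 1)) % (2 * n + 1)

n = 6  # the 12-element example above
assert all(dest_piecewise(p, n) == dest_modular(p, n)
           for p in range(1, 2 * n + 1))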
Cycle Decompositions Revisited
At this point, we've made quite a lot of progress. Given any element, we know where that element should end up. If we can figure out a nice way to get a cycle decomposition of the overall permutation, we're done.
This is, unfortunately, where things get complicated. Suppose, for example, that our array has 10 elements. In that case, we want to transform the array like this:
Initial: 1 2 3 4 5 6 7 8 9 10
Final: 2 4 6 8 10 1 3 5 7 9
The cycle decomposition of this permutation is (1 6 3 7 9 10 5 8 4 2). If our array has 12 elements, we want to transform it like this:
Initial: 1 2 3 4 5 6 7 8 9 10 11 12
Final: 2 4 6 8 10 12 1 3 5 7 9 11
This has cycle decomposition (1 7 10 5 9 11 12 6 3 8 4 2). If our array has 14 elements, we want to transform it like this:
Initial: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Final: 2 4 6 8 10 12 14 1 3 5 7 9 11 13
This has cycle decomposition (1 8 4 2)(3 9 12 6)(5 10)(7 11 13 14). If our array has 16 elements, we want to transform it like this:
Initial: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Final: 2 4 6 8 10 12 14 16 1 3 5 7 9 11 13 15
This has cycle decomposition (1 9 13 15 16 8 4 2)(3 10 5 11 14 7 12 6).
The problem here is that these cycles don't seem to follow any predictable patterns. This is a real problem if we're going to try to solve this problem in O(1) space and O(n) time. Even though given any individual element we can figure out what cycle contains it and we can efficiently shuffle that cycle, it's not clear how we figure out what elements belong to what cycles, how many different cycles there are, etc.
Primitive Roots
This is where number theory comes in. Remember that each element's new position is formed by dividing that number by two, modulo 2n+1. Thinking about this backwards, we can figure out which number will take the place of each number by multiplying by two modulo 2n+1. Therefore, we can think of this problem by finding the cycle decomposition in reverse: we pick a number, keep multiplying it by two and modding by 2n+1, and repeat until we're done with the cycle.
This gives rise to a well-studied problem. Suppose that we start with the number k and think about the sequence k, 2k, 2^2 k, 2^3 k, 2^4 k, etc., all done modulo 2n+1. Doing this gives different patterns depending on what odd number 2n+1 you're modding by. This explains why the above cycle patterns seem somewhat arbitrary.
I have no idea how anyone figured this out, but it turns out that there's a beautiful result from number theory that talks about what happens if you take this pattern mod 3^k for some number k:
Theorem: Consider the sequence 3^s, 3^s · 2, 3^s · 2^2, 3^s · 2^3, 3^s · 2^4, etc., all modulo 3^k for some k ≥ s. This sequence cycles through every number between 1 and 3^k, inclusive, that is divisible by 3^s but not divisible by 3^(s+1).
We can try this out on a few examples. Let's work modulo 27 = 3^3. The theorem says that if we look at 3, 3 · 2, 3 · 4, etc., all modulo 27, then we should see all the numbers less than 27 that are divisible by 3 and not divisible by 9. Well, let's see what we get:
3 · 2^0 = 3 · 1 = 3 = 3 mod 27
3 · 2^1 = 3 · 2 = 6 = 6 mod 27
3 · 2^2 = 3 · 4 = 12 = 12 mod 27
3 · 2^3 = 3 · 8 = 24 = 24 mod 27
3 · 2^4 = 3 · 16 = 48 = 21 mod 27
3 · 2^5 = 3 · 32 = 96 = 15 mod 27
3 · 2^6 = 3 · 64 = 192 = 3 mod 27
We ended up seeing 3, 6, 12, 15, 21, and 24 (though not in that order), which are indeed all the numbers less than 27 that are divisible by 3 but not divisible by 9.
We can also try this working mod 27 and considering 1, 2, 2^2, 2^3, 2^4, etc. mod 27, and we should see all the numbers less than 27 that are divisible by 1 and not divisible by 3. In other words, this should give back all the numbers less than 27 that aren't divisible by 3. Let's see if that's true:
2^0 = 1 = 1 mod 27
2^1 = 2 = 2 mod 27
2^2 = 4 = 4 mod 27
2^3 = 8 = 8 mod 27
2^4 = 16 = 16 mod 27
2^5 = 32 = 5 mod 27
2^6 = 64 = 10 mod 27
2^7 = 128 = 20 mod 27
2^8 = 256 = 13 mod 27
2^9 = 512 = 26 mod 27
2^10 = 1024 = 25 mod 27
2^11 = 2048 = 23 mod 27
2^12 = 4096 = 19 mod 27
2^13 = 8192 = 11 mod 27
2^14 = 16384 = 22 mod 27
2^15 = 32768 = 17 mod 27
2^16 = 65536 = 7 mod 27
2^17 = 131072 = 14 mod 27
2^18 = 262144 = 1 mod 27
Sorting these, we got back the numbers 1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 20, 22, 23, 25, 26 (though not in that order). These are exactly the numbers between 1 and 26 that aren't multiples of three!
This theorem is crucial to the algorithm for the following reason: if 2n+1 = 3^k for some number k, then if we process the cycle containing 1, it will properly shuffle all numbers that aren't multiples of three. If we then start the cycle at 3, it will properly shuffle all numbers that are divisible by 3 but not by 9. If we then start the cycle at 9, it will properly shuffle all numbers that are divisible by 9 but not by 27. More generally, if we use the cycle shuffle algorithm on the numbers 1, 3, 9, 27, 81, etc., then we will properly reposition all the elements in the array exactly once and will not have to worry that we missed anything.
So how does this connect to 3^k + 1? Well, we need to have that 2n + 1 = 3^k, so we need to have that 2n = 3^k - 1. But remember - we dropped the very first and very last element of the array when we did this! Adding those back in tells us that we need blocks of size 3^k + 1 for this procedure to work correctly. If the blocks are this size, then we know for certain that the cycle decomposition will consist of a cycle containing 1, a nonoverlapping cycle containing 3, a nonoverlapping cycle containing 9, etc., and that these cycles will contain all the elements of the array. Consequently, we can just start cycling 1, 3, 9, 27, etc. and be absolutely guaranteed that everything gets shuffled around correctly. That's amazing!
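To make this concrete, here is a Python sketch of the procedure for a single block (my own illustration, not code from the original source). The element ending at 1-indexed position p comes from position 2p mod 3^k, so we follow each cycle starting from the leaders 1, 3, 9, ...:

def shuffle_block(a):
    # a has length 3**k + 1; a[0] and a[-1] stay put, and the middle
    # element at 1-indexed position p moves to position p/2 mod 3**k
    m = len(a) - 1  # m = 3**k
    leader = 1
    while leader < m:
        pos = leader
        temp = a[pos]
        while True:
            nxt = 2 * pos % m  # the position whose element ends up at pos
            if nxt == leader:
                break
            a[pos] = a[nxt]
            pos = nxt
        a[pos] = temp
        leader *= 3  # next cycle leader: 1, 3, 9, 27, ...
    return a

assert shuffle_block(list("a1b2c3d4e5")) == list("abcde12345")

The assert is exactly the question's example: a block of size 3^2 + 1 = 10.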
And why is this theorem true? It turns out that a number k for which 1, k, k^2, k^3, etc. mod p^n cycles through all the numbers that aren't multiples of p (assuming p is prime) is called a primitive root of the number p^n. There's a theorem that says that 2 is a primitive root of 3^k for all numbers k, which is why this trick works. If I have time, I'd like to come back and edit this answer to include a proof of this result, though unfortunately my number theory isn't at a level where I know how to do this.
Summary
This problem was tons of fun to work on. It involves cute tricks with dividing by two modulo an odd number, cycle decompositions, primitive roots, and powers of three. I'm indebted to this arXiv paper, which described a similar (though quite different) algorithm and gave me a sense for the key trick behind the technique, which then let me work out the details for the algorithm you described.
Hope this helps!
Here is most of the mathematical argument missing from templatetypedef’s
answer. (The rest is comparatively boring.)
Lemma: for all integers k >= 1, we have
2^(2*3^(k-1)) = 1 + 3^k mod 3^(k+1).
Proof: by induction on k.
Base case (k = 1): we have 2^(2*3^(1-1)) = 4 = 1 + 3^1 mod 3^(1+1).
Inductive case (k >= 2): if 2^(2*3^(k-2)) = 1 + 3^(k-1) mod 3^k,
then let q = (2^(2*3^(k-2)) - (1 + 3^(k-1)))/3^k, which is an integer, and
2^(2*3^(k-1)) = (2^(2*3^(k-2)))^3
= (1 + 3^(k-1) + 3^k*q)^3
= 1 + 3*(3^(k-1)) + 3*(3^(k-1))^2 + (3^(k-1))^3
+ 3*(1+3^(k-1))^2*(3^k*q) + 3*(1+3^(k-1))*(3^k*q)^2 + (3^k*q)^3
= 1 + 3^k mod 3^(k+1).
Theorem: for all integers i >= 0 and k >= 1, we have
2^i = 1 mod 3^k if and only if i = 0 mod 2*3^(k-1).
Proof: the “if” direction follows from the Lemma. If
i = 0 mod 2*3^(k-1), then
2^i = (2^(2*3^(k-1)))^(i/(2*3^(k-1)))
= (1+3^k)^(i/(2*3^(k-1))) mod 3^(k+1)
= 1 mod 3^k.
The “only if” direction is by induction on k.
Base case (k = 1): if i != 0 mod 2, then i = 1 mod 2, and
2^i = (2^2)^((i-1)/2)*2
= 4^((i-1)/2)*2
= 2 mod 3
!= 1 mod 3.
Inductive case (k >= 2): if 2^i = 1 mod 3^k, then
2^i = 1 mod 3^(k-1), and the inductive hypothesis implies that
i = 0 mod 2*3^(k-2). Let j = i/(2*3^(k-2)). By the Lemma,
1 = 2^i mod 3^k
= (1+3^(k-1))^j mod 3^k
= 1 + j*3^(k-1) mod 3^k,
where the dropped terms are divisible by (3^(k-1))^2, so
j = 0 mod 3, and i = 0 mod 2*3^(k-1).
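A quick numeric check of both results (my own Python snippet):

for k in range(1, 8):
    # Lemma: 2^(2*3^(k-1)) = 1 + 3^k mod 3^(k+1)
    assert pow(2, 2 * 3 ** (k - 1), 3 ** (k + 1)) == 1 + 3 ** k
    # Theorem: 2^i = 1 mod 3^k  iff  i = 0 mod 2*3^(k-1)
    period = 2 * 3 ** (k - 1)
    assert all((pow(2, i, 3 ** k) == 1) == (i % period == 0)
               for i in range(3 * period))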
Q1 - For the following triple-nested loop, what will be the final value of m in terms of n?
Of course it is not desirable to actually run the loop and see what m is, since n can be very large!
m = 0
for i = 1 to n-2
    for j = i+1 to n-1
        for k = j+1 to n
            m += 1
Q2 - How did you find the answer? I mean, what algorithm/technique did you use to solve the problem?
Q3 - What are your recommendations for solving similar problems?
Here is the answer that I was looking for:
Answer:
import numpy as np

def ntn(n, k):
    """returns the number of iterations for k nested dependent loops(n)"""
    # note: computed in floating point, so very large results are approximate
    return long(np.prod(n - np.arange(k, dtype=float)) /
                np.prod(np.arange(k, dtype=float) + 1))
example:
>>> ntn(1000,4)
41417124750L
>>> ntn(1e20,3)
166666666666666650797607483335462097315368077619447843520512L
Q3: Find a pattern to the question.
Q2: Assuming n:=10
Notice that i will loop from 1 to 8
Therefore, j will loop from
2 to 9
3 to 9
...
9 to 9
Therefore, k will loop from
loops value index
3 to 10, 4 to 10, 5 to 10, ..., 10 to 10 8 + 7 + 6 + ... + 1 8
4 to 10, 5 to 10, ..., 10 to 10 7 + 6 + ... + 1 7
5 to 10, ..., 10 to 10 6 + ... + 1 6
... ... ... ...
10 to 10 1 1
Notice the pattern here: if we start the index from the bottom number (1), to get the mth number in the sequence, you simply sum 1 through m.
Q1: You figure this one out on your own. Hint: it's a summation of summations...
Answer:
The combination (binomial coefficient) formula
C(n, k) = n! / (k! * (n - k)!)
can be used for this purpose.
In Python there is the comb() function within the scipy package (scipy.special.comb), which can be used too.
However, the following solution is more flexible and faster, and can handle much larger inputs.
import numpy as np

def ntn(n, k):
    """returns the number of iterations for k nested dependent loops(n)"""
    return long(np.prod(n - np.arange(k, dtype=float)) /
                np.prod(np.arange(k, dtype=float) + 1))
Examples:
>>> ntn(1000,4)
41417124750L
>>> ntn(1e20,3)
166666666666666650797607483335462097315368077619447843520512L
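As a sanity check in Python 3 (my own snippet, not part of the original answers): for Q1 the closed form is the binomial coefficient C(n, 3), which matches a direct count of the loop:

from math import comb

def brute(n):
    m = 0
    for i in range(1, n - 1):          # i = 1 .. n-2
        for j in range(i + 1, n):      # j = i+1 .. n-1
            for k in range(j + 1, n + 1):  # k = j+1 .. n
                m += 1
    return m

assert all(brute(n) == comb(n, 3) for n in range(3, 25))
assert comb(1000, 4) == 41417124750  # matches ntn(1000, 4) above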