Merge Process In-Place

The merge procedure in MergeSort cannot run in place.
Here is my explanation.
Does it add up?
A = 5,1,9,2,10,0
q: a pointer to the middle of the array, at index 2.
We are merging the element at q and those to its left with the elements to the right of q.
A = 1,5,10,0,2,12
We point to the beginning of the left side with i and to that of the right side with j.
We point to the current position in the array with k.
The algorithm starts with i=0, j=3, k=0.
If we merge in place, for k=0: i=0, j=3, 0 < 1 => A[k] = A[j] and j++
resulting array: A = {0,5,10,0,2,12}
As we can see, we lost the value 1 already.
We will continue to lose values, for example in the next iteration:
for k=1: i=0, j=4, 0 < 2 => A[k] = A[i] and i++
resulting array: A = {0,0,10,0,2,12}
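To make the lost-value trace concrete, here is a small Python sketch (the name `naive_merge_in_place` is hypothetical) of the usual two-pointer merge that writes each chosen element straight into A[k] instead of into a separate buffer, run on the array from the question.

```python
def naive_merge_in_place(a, mid):
    """Two-pointer merge of a[:mid] and a[mid:], but writing each result
    straight back into a[k] instead of into a separate buffer."""
    i, j = 0, mid
    for k in range(len(a)):
        # take from the left run while it has elements and is not larger
        if j >= len(a) or (i < mid and a[i] <= a[j]):
            a[k] = a[i]
            i += 1
        else:
            a[k] = a[j]
            j += 1
    return a


print(naive_merge_in_place([1, 5, 10, 0, 2, 12], 3))  # prints [0, 0, 0, 0, 2, 12]
```

The first two iterations reproduce the two arrays shown above; after that, the clobbered zeros keep getting copied, so 1, 5 and 10 are gone for good.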

It can be done using rotations of sub-arrays. There are in-place merge sorts with O(n log(n)) time complexity, but these use a portion of the array as working storage. If stability is needed, then some small subset of unique values (such as 2 sqrt(n) of them) is used to provide the space, since reordering unique values before sorting won't break stability. Getting back to the simple rotate algorithm:
1 5 10 0 2 12
0 1 5 10 2 12 0 < 1, rotate 0 into place, adjust working indexes
0 1 5 10 2 12 1 < 2, continue
0 1 2 5 10 12 2 < 5, rotate 2 into place, adjust working indexes
0 1 2 5 10 12 5 < 12, continue
0 1 2 5 10 12 10 < 12, continue, end of left run, done
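The rotate-based merge traced above can be sketched in Python as follows. This is an illustration under stated assumptions, not the answerer's code: the helper names are hypothetical, the rotation uses the classic three-reversal trick so that only O(1) extra space is needed, and the worst case is O(n^2) element moves (unlike the O(n log(n)) block-based merges mentioned above).

```python
def _reverse(a, lo, hi):
    """Reverse a[lo:hi] in place."""
    hi -= 1
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo += 1
        hi -= 1


def _rotate(a, lo, mid, hi):
    """Turn a[lo:mid] + a[mid:hi] into a[mid:hi] + a[lo:mid] via three reversals."""
    _reverse(a, lo, mid)
    _reverse(a, mid, hi)
    _reverse(a, lo, hi)


def rotate_merge(a, mid):
    """Merge the sorted runs a[:mid] and a[mid:] in place."""
    i, j = 0, mid
    while i < j < len(a):
        if a[i] <= a[j]:
            i += 1                 # left element is already in place
            continue
        k = j
        while k < len(a) and a[k] < a[i]:
            k += 1                 # a[j:k] all belong before a[i]
        _rotate(a, i, j, k)        # move that run in front of the left run
        i += k - j                 # a[i] moved right by the run length
        j = k
    return a


print(rotate_merge([1, 5, 10, 0, 2, 12], 3))  # prints [0, 1, 2, 5, 10, 12]
```

Right-run elements only jump ahead of strictly larger left elements, so equal keys keep their original order and the merge stays stable.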

Related

Array pattern issue to maintain uniformity

There is an existing array of size 64 that holds 6 values distributed as 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 ...
Please see the image for complete data.
0 occurs 11 times in the array (at every 6th index), 1 occurs 11 times, ..., whereas 4 and 5 occur 10 times each.
I need to reduce the number of occurrences of any of these numbers [0 to 5] to a smaller number, which could be anything from 0 to 10.
For example, I might need to reduce the occurrences of 0 to 6 and of 1 to 9.
I am looking for a solid idea to do this. All the numbers must stay evenly distributed, not something like 0 0 0 0 0 2 2 2 2 2 2 ...
I tried to compute the index/position where each reduced value has to be filled (64 / new number of occurrences). But at times the indexes collide with each other, so the approach is not robust.
In the example quoted above, the number of occurrences of 0 must be changed to 6 and the number of occurrences of 1 to 9. The result of my algorithm is below:
New location to fill 0 = (array size)/(new occurrence of 0) = 64/6 ≈ 10th index
New location to fill 1 = (array size)/(new occurrence of 1) = 64/9 ≈ 7th index
To fill in six 0's and nine 1's, the array is first reset, and then each value is filled in to maintain a balanced distribution.
After filling in six 0's, the array would become like this:
Then, after filling in nine 1's, the array would become like this:
Index 55 already holds a 0, and the 8th 1 also maps to index 55, which creates a collision. So I believe this algorithm does not balance the distribution reliably.
How do I populate six 0's, nine 1's and the rest of the numbers {2, 3, 4, 5} in the array in a balanced way?
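The thread contains no answer to this question. One possible approach, swapped in here and not taken from the thread, is to give every copy of a value an ideal fractional position (i + 0.5) * total / count and then sort all copies by that position, so two values that want the same slot simply end up next to each other instead of colliding. The sketch assumes the array length equals the sum of the requested counts, since the question does not say what should fill the remaining slots; `spread_evenly` is a hypothetical name.

```python
def spread_evenly(counts):
    """Return a list where each value v appears counts[v] times, spread out.

    Copy i of a value with count c gets the ideal fractional position
    (i + 0.5) * total / c; sorting all copies by that position replaces
    the fixed integer slots that collided in the question.
    """
    total = sum(counts.values())
    slots = []
    for value, count in counts.items():
        for i in range(count):
            slots.append(((i + 0.5) * total / count, value))
    slots.sort()                      # ties are broken by the value itself
    return [value for _, value in slots]


print(spread_evenly({0: 2, 1: 2}))    # prints [0, 1, 0, 1]
layout = spread_evenly({0: 6, 1: 9, 2: 11, 3: 11, 4: 10, 5: 10})
```

Every value keeps exactly its requested count by construction; only the order of the copies is decided by the sort.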

Find a duplicate in array of integers

This was an interview question.
I was given an array of n+1 integers from the range [1,n]. The property of the array is that it has k (k>=1) duplicates, and each duplicate can appear more than twice. The task was to find an element of the array that occurs more than once in the best possible time and space complexity.
After significant struggling, I proudly came up with an O(n log n) solution that takes O(1) space. My idea was to divide the range [1,n] into two halves and determine which of the two halves contains more elements from the input array (I was using the pigeonhole principle). The algorithm continues recursively until it reaches an interval [X,X], where X occurs at least twice and is therefore a duplicate.
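The O(n log n) idea from the question can be sketched in Python like this (the function name is hypothetical). It bisects the value range rather than the array, counting how many entries land in the lower half; the array is only read.

```python
def find_duplicate_bisect(arr):
    """Pigeonhole bisection over the value range [1, n]; arr has n + 1 entries.

    O(n log n) time, O(1) extra space, arr is not modified.
    """
    lo, hi = 1, len(arr) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        # how many entries fall into the lower half [lo, mid]?
        in_lower = sum(1 for x in arr if lo <= x <= mid)
        if in_lower > mid - lo + 1:
            hi = mid               # more entries than values: duplicate in [lo, mid]
        else:
            lo = mid + 1           # otherwise the upper half is overfull
    return lo


print(find_duplicate_bisect([2, 3, 4, 1, 5, 4, 6, 7, 8]))  # prints 4
```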
The interviewer was satisfied, but then he told me that there exists an O(n) solution with constant space. He generously offered a few hints (something related to permutations?), but I had no idea how to come up with such a solution. Assuming that he wasn't lying, can anyone offer guidelines? I have searched SO and found a few (easier) variations of this problem, but not this specific one. Thank you.
EDIT: To make things even more complicated, the interviewer mentioned that the input array should not be modified.
1. Take the very last element (x).
2. Save the element at position x (y).
3. If x == y, you found a duplicate.
4. Overwrite position x with x.
5. Assign x = y and continue with step 2.
You are basically sorting the array, which is possible because you know where each element has to be inserted. This takes O(1) extra space and O(n) time. You just have to be careful with the indices: for simplicity I assumed here that the first index is 1 (not 0), so we don't have to do +1 or -1.
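A Python sketch of the five steps (the function name is hypothetical). Python lists are 0-based, so "position x" is `arr[x - 1]`; the input is assumed to hold n + 1 values from [1, n], and it is modified.

```python
def find_duplicate_overwrite(arr):
    """Walk x -> arr[x], writing x into its home slot; modifies arr.

    arr holds n + 1 values from [1, n]; "position x" is arr[x - 1].
    """
    x = arr[-1]                 # step 1: take the very last element
    while True:
        y = arr[x - 1]          # step 2: save the element at position x
        if y == x:              # step 3: x already sits at home, so it is a duplicate
            return x
        arr[x - 1] = x          # step 4: overwrite position x with x
        x = y                   # step 5: continue with the saved element


print(find_duplicate_overwrite([2, 3, 4, 1, 5, 4, 6, 7, 8]))  # prints 4
```

Each pass through the loop either stops or puts one more value into its home slot, and home slots are never overwritten again, so there are at most n iterations.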
Edit: without modifying the input array
This algorithm is based on the idea that if we find the entry point of the permutation cycle, we have also found a duplicate (again using a 1-based array for simplicity):
Example:
2 3 4 1 5 4 6 7 8
Entry: 8 7 6
Permutation cycle: 4 1 2 3
As we can see the duplicate (4) is the first number of the cycle.
Finding the permutation cycle
1. x = last element
2. x = element at position x
3. repeat step 2 n times (in total); this guarantees that we have entered the cycle
Measuring the cycle length
1. a = last x from above, b = last x from above, counter c = 0
2. a = element at position a, b = element at position b, b = element at position b, c++ (so we make 2 steps forward with b and 1 step forward with a in the cycle)
3. if a == b, the cycle length is c; otherwise continue with step 2.
Finding the entry point to the cycle
1. x = last element
2. x = element at position x
3. repeat step 2 c times (in total)
4. y = last element
5. if x == y, then x is a solution (x made one full cycle and y is just about to enter the cycle)
6. x = element at position x, y = element at position y
7. repeat steps 5 and 6 until a solution is found.
The 3 major steps are all O(n) and run one after another, so the overall time complexity is also O(n) and the space complexity is O(1).
Example from above:
x takes the following values: 8 7 6 4 1 2 3 4 1 2
a takes the following values: 2 3 4 1 2
b takes the following values: 2 4 2 4 2
therefore c = 4 (yes there are 5 numbers but c is only increased when making steps, not initially)
x takes the following values: 8 7 6 4 | 1 2 3 4
y takes the following values: | 8 7 6 4
x == y == 4 in the end and this is a solution!
Example 2 as requested in the comments: 3 1 4 6 1 2 5
Entering cycle: 5 1 3 4 6 2 1 3
Measuring cycle length:
a: 3 4 6 2 1 3
b: 3 6 1 4 2 3
c = 5
Finding the entry point:
x: 5 1 3 4 6 | 2 1
y: | 5 1
x == y == 1 is a solution
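The three phases can be written in Python as below, as a sketch assuming n + 1 values from [1, n] and reading "position p" as `arr[p - 1]`; the function name is hypothetical. The array is only read. "n times" in the first phase is taken as one step per array slot, which matches the 10 values of x listed in the first example.

```python
def find_duplicate_readonly(arr):
    """Find a duplicate via the entry point of the cycle; arr is only read."""
    def at(p):                      # "element at position p", 1-based
        return arr[p - 1]

    # Finding the permutation cycle: one step per slot puts x inside the cycle
    x = arr[-1]
    for _ in range(len(arr)):
        x = at(x)

    # Measuring the cycle length: a moves one step, b moves two
    a = b = x
    c = 0
    while True:
        a = at(a)
        b = at(at(b))
        c += 1
        if a == b:
            break

    # Finding the entry point: x starts c steps ahead of y
    x = arr[-1]
    for _ in range(c):
        x = at(x)
    y = arr[-1]
    while x != y:
        x, y = at(x), at(y)
    return x


print(find_duplicate_readonly([2, 3, 4, 1, 5, 4, 6, 7, 8]))  # prints 4
print(find_duplicate_readonly([3, 1, 4, 6, 1, 2, 5]))        # prints 1
```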
Here is a possible implementation:
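```javascript
// Cycle-sort style search: keep sending each value to its home slot
// (value v belongs at index v). Values must lie in [0, arr.length - 1].
// Note: this rearranges arr in place.
function checkDuplicate(arr) {
  console.log(arr.join(", "));
  for (let i = 0; i < arr.length; i++) {
    while (arr[i] !== i) {
      const cur = arr[i];
      if (arr[cur] === cur) {
        // cur already sits in its home slot, so this second copy is a duplicate
        console.log(`> duplicate is ${cur}`);
        return cur;
      }
      // move cur home; the displaced value is examined next
      [arr[i], arr[cur]] = [arr[cur], arr[i]];
    }
  }
  console.log("> no duplicate");
  return -1;
}

for (const t of [
  [0, 1, 2, 3],
  [0, 1, 2, 1],
  [1, 0, 2, 3],
  [1, 1, 0, 2, 4],
]) checkDuplicate(t);
```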
It is basically the solution proposed by @maraca (I typed too slowly!). It has constant space requirements (for the local variables), and apart from that it only uses the original array for its storage. It should be O(n) in the worst case, because as soon as a duplicate is found, the process terminates.
If you are allowed to non-destructively modify the input vector, then it is pretty easy. Suppose we can "flag" an element in the input by negating it (which is obviously reversible). In that case, we can proceed as follows:
Note: The following assumes that the vector is indexed starting at 1. Since it is probably indexed starting at 0 (in most languages), you can implement "Flag item at index i" with "Negate the item at index i-1".
1. Set i to 0 and do the following loop:
   a. Increment i until item i is unflagged.
   b. Set j to i and do the following loop:
      i. Set j to vector[j] (taking the absolute value, since that item may be flagged).
      ii. If the item at j is flagged, j is a duplicate. Terminate both loops.
      iii. Flag the item at j.
      iv. If j != i, continue the inner loop.
2. Traverse the vector setting each element to its absolute value (i.e. unflag everything to restore the vector).
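A Python sketch of the flagging procedure (the function name is hypothetical), again assuming n + 1 values from [1, n]. Flags are negations, reads use the absolute value, and every element is unflagged before returning, so the caller sees the original vector.

```python
def find_duplicate_by_flagging(vec):
    """Negation-flag walk; "item at p" is vec[p - 1] (1-based as in the steps)."""
    def value(p):              # read position p, ignoring any flag
        return abs(vec[p - 1])

    def flagged(p):
        return vec[p - 1] < 0

    def flag(p):
        vec[p - 1] = -vec[p - 1]

    duplicate = None
    i = 0
    while duplicate is None:
        i += 1
        while flagged(i):      # increment i until item i is unflagged
            i += 1
        j = i
        while True:            # inner loop
            j = value(j)
            if flagged(j):     # arrived at j a second time: j is a duplicate
                duplicate = j
                break
            flag(j)
            if j == i:
                break
    for p in range(len(vec)):  # unflag everything to restore the vector
        vec[p] = abs(vec[p])
    return duplicate


print(find_duplicate_by_flagging([2, 3, 4, 1, 5, 4, 6, 7, 8]))  # prints 4
```

Position n + 1 can never be flagged (no value equals n + 1), so the outer loop always finds a starting point before running off the end.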
It depends on what tools you (your app) can use. A lot of frameworks/libraries exist nowadays. For example, in standard C++ you can use std::map<>, as maraca mentioned.
Or, if you have time, you can write your own binary tree implementation, but keep in mind that inserting elements works differently compared with a plain array. In this case you can optimise the duplicate search for your particular case.
Binary tree explanation/reference:
https://www.wikiwand.com/en/Binary_tree

How can I get no. of Swap operations to form the 2nd Array

I have two arrays with the same elements (but in a different order),
e.g. 1 2 12 9 7 15 22 30
and 1 2 7 12 9 30 15 22
How many swap operations are needed to form the second array from the first?
I have tried counting the indexes where the elements differ and dividing the result by 2, but that doesn't give me the right answer...
One classic algorithm seems to be permutation cycles (https://en.m.wikipedia.org/wiki/Cycle_notation#Cycle_notation). The number of swaps needed equals the total number of elements subtracted by the number of cycles.
For example:
1 2 3 4 5
2 5 4 3 1
Start with 1 and follow the cycle:
1 down to 2, 2 down to 5, 5 down to 1.
1 -> 2 -> 5 -> 1
3 -> 4 -> 3
We would need to swap index 1 with 5, then index 5 with 2, as well as index 3 with index 4: altogether 3 swaps, or n - 2. We subtract the number of cycles from n because the cycles together contain all n elements, and each cycle needs one swap fewer than the number of elements in it.
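A Python sketch of the "n minus number of cycles" rule, assuming the elements are distinct (the function name is hypothetical). It maps each position of the target array to the position in the source array holding that value and counts the cycles of that mapping, with fixed points counting as cycles of length one.

```python
def min_swaps(src, dst):
    """Minimum number of swaps turning src into dst (distinct elements)."""
    if sorted(src) != sorted(dst):
        raise ValueError("arrays must contain the same elements")
    where = {value: i for i, value in enumerate(src)}   # index of each value in src
    perm = [where[value] for value in dst]              # dst[i] comes from src[perm[i]]
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            i = start
            while not seen[i]:      # walk the whole cycle once
                seen[i] = True
                i = perm[i]
    return len(perm) - cycles


print(min_swaps([1, 2, 3, 4, 5], [2, 5, 4, 3, 1]))  # prints 3
```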
1) re-index elements from 0 to n-1. In your example, arrayA becomes 0..7 and arrayB becomes 0 1 4 2 3 7 5 6.
2) sort the second array using your swapping algorithm and count the number of operations.
A bit naive, but I think you can use recursion as follows (pseudo code):
```
function count_swaps(arr1, arr2):
    unless both arrays contain the same objects return false
    if arr1.len <= 1 return 0
    else
        if arr1[0] == arr2[0] return count_swaps(arr1.tail, arr2.tail)
        else
            arr2_tail = arr2.tail
            i = index_of arr1[0] in arr2_tail
            arr2_tail[i] = arr2[0]
            return 1 + count_swaps(arr1.tail, arr2_tail)
```
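Here's a Ruby implementation:
```ruby
require 'set'

# Counts the swaps needed to turn a2 into a1 by fixing one position at a time.
def count_swaps(a1, a2)
  unless a1.length == a2.length && Set[*a1] == Set[*a2]
    raise "Arrays do not have the same objects: #{a1} #{a2}"
  end
  count_swap_rec(a1, a2)
end

def count_swap_rec(a1, a2)
  return 0 if a1.length <= 1
  # First elements already match: recurse on the tails.
  return count_swap_rec(a1[1..-1], a2[1..-1]) if a1[0] == a2[0]
  # Otherwise swap a1[0] into the front of a2 (only the tail is kept) and recurse.
  a2_tail = a2[1..-1]
  a2_tail[a2_tail.find_index(a1[0])] = a2[0]
  1 + count_swap_rec(a1[1..-1], a2_tail)
end
```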

Sorting an array where the only permitted operation is swapping 0 and another element

Given any permutation of the numbers {0, 1, 2,..., N-1}, it is easy to sort them in increasing order. But what if Swap(0, *) is the ONLY operation you are allowed to use? For example, to sort {4, 0, 2, 1, 3} we may apply the swap operations in the following way:
Swap(0, 1) => {4, 1, 2, 0, 3}
Swap(0, 3) => {4, 1, 2, 3, 0}
Swap(0, 4) => {0, 1, 2, 3, 4}
Now you are asked to find the minimum number of swaps needed to sort the given permutation of the first N nonnegative integers.
This question comes from this website. I've included my code below, which takes too long to solve the problem. How can I solve this efficiently?
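```c
#include <stdio.h>
#define MAX 100001

void swap(int i);
int a[MAX];
int m, count = 0, zeroindex, next = 1;

/* Returns 1 once the array is sorted. If 0 is home but some a[i] != i,
   swaps 0 into that position to start fixing the next cycle. */
int check(void)
{
    int i;
    if (a[0] != 0)
        return -1;
    for (i = next; i < m; i++)
    {
        if (a[i] != i)
        {
            swap(i);
            next = i; /* just start from the next one */
            return -1;
        }
    }
    return 1;
}

int main(void)
{
    int i;
    scanf("%d", &m);
    for (i = 0; i < m; i++)
    {
        scanf("%d", &a[i]);
        if (a[i] == 0)
            zeroindex = i;
    }
    while (check() != 1)
    {
        /* Linear search for the value that belongs where 0 sits:
           O(m) per swap, which is what makes this too slow. */
        for (i = 0; i < m; i++)
        {
            if (a[i] == zeroindex)
            {
                swap(i);
                break;
            }
        }
    }
    printf("%d", count);
    return 0;
}

/* Swaps 0 (at zeroindex) with a[i] and records the swap. */
void swap(int i)
{
    int temp = a[zeroindex];
    a[zeroindex] = a[i];
    a[i] = temp;
    zeroindex = i;
    count++;
}
```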
I think that the optimal solution to this problem involves finding a cycle decomposition of the original array and using that to guide what swaps you make.
There's a convenient notation for representing permutations in terms of smaller cycles within the permutation. For example, consider the array
5 4 0 1 3 2 7 6
The cycle decomposition of this array is (0 2 5)(1 3 4)(6 7). To see where this comes from, start by looking at the number 0. 0 is currently in the position that 2 should occupy. The number 2 is in the spot that 5 should occupy, and 5 is in the position that 0 should occupy. In this sense, the permutation contains a "cycle" (0 2 5) indicating that 0 is in 2's place, 2 is in 5's place, and (wrapping around) 5 is in 0's place. Independently, look at the number 1. 1 is in the spot where 3 should be, 3 is in the spot where 4 should be, and 4 is in the spot where 1 should be. Therefore, there's another cycle (1 3 4) in this permutation. The last cycle is (6 7).
In general, if you're allowed to make arbitrary swaps, the way to sort a permutation with the fewest swaps is to split the permutation into its cycle decomposition and then use swaps to repair each cycle. For example, to fix the cycle (0 2 5), the fastest option would be to swap 0 with 5, then swap 5 with 2. (This gives rise to the cycle sort algorithm.)
We can adapt this idea here. Going back to our initial array 5 4 0 1 3 2 7 6, which has cycle decomposition (0 2 5)(1 3 4)(6 7), suppose that we want to fix the cycle (1 3 4). To do so, we can do the following:
5 4 0 1 3 2 7 6
5 4 1 0 3 2 7 6 (Swap 0 and 1)
5 4 1 3 0 2 7 6 (Swap 0 and 3)
5 0 1 3 4 2 7 6 (Swap 0 and 4)
5 1 0 3 4 2 7 6 (Swap 0 and 1)
Notice that the elements 1, 3, and 4 are now in the right place. The cycle decomposition of what remains is now (0 2 5)(6 7).
We can fix (6 7) as follows:
5 1 0 3 4 2 7 6
5 1 6 3 4 2 7 0 (Swap 0 and 6)
5 1 6 3 4 2 0 7 (Swap 0 and 7)
5 1 0 3 4 2 6 7 (Swap 0 and 6)
Now 6 and 7 are in the right place, and our remaining cycle decomposition is (0 2 5). That can easily be fixed:
5 1 0 3 4 2 6 7
5 1 2 3 4 0 6 7 (Swap 0 and 2)
0 1 2 3 4 5 6 7 (Swap 0 and 5)
More generally, this works as follows. To fix a cycle (c1, c2, ..., cn) that does not contain 0, do the following swaps:
Swap(0, c1)
Swap(0, c2)
Swap(0, c3)
...
Swap(0, cn)
Swap(0, c1)
To fix a cycle (0, c1, c2, ..., cn), do the following:
Swap(0, c1)
Swap(0, c2)
Swap(0, c3)
...
Swap(0, cn)
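Playing out these swaps can be sketched in Python as follows (the function name is hypothetical). Keeping an index array `pos` replaces the linear search in the question's code, and the cycle containing 0 is finished first, which changes the order in which cycles are fixed but not the number of swaps.

```python
def sort_with_zero_swaps(perm):
    """Sort a permutation of 0..n-1 using only swaps that involve 0.

    Returns the number of swaps made. While 0 is away from index 0, it is
    swapped with the value that belongs at its current index; once 0 is
    home, it is swapped into the next unsorted position to enter a new cycle.
    """
    a = list(perm)                 # work on a copy
    pos = [0] * len(a)             # pos[v] = current index of value v
    for i, v in enumerate(a):
        pos[v] = i
    swaps = 0
    nxt = 1                        # positions 1..nxt-1 are known to be sorted

    def swap_with_zero(v):
        nonlocal swaps
        i, j = pos[0], pos[v]
        a[i], a[j] = a[j], a[i]
        pos[0], pos[v] = j, i
        swaps += 1

    while True:
        if pos[0] != 0:
            swap_with_zero(pos[0])     # the value pos[0] goes home
        else:
            while nxt < len(a) and a[nxt] == nxt:
                nxt += 1
            if nxt == len(a):
                return swaps
            swap_with_zero(a[nxt])     # enter the cycle through index nxt


print(sort_with_zero_swaps([4, 0, 2, 1, 3]))           # prints 3
print(sort_with_zero_swaps([5, 4, 0, 1, 3, 2, 7, 6]))  # prints 9
```

A value that reaches its home is never swapped again and `nxt` only moves forward, so the whole run is O(n).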
The challenge remains to show that this solution is optimal. To do so, let's begin by thinking about the number of swaps made. Suppose that we have a cycle decomposition for the array that contains cycles σ1, σ2, ..., σn. Fixing any individual cycle that does not contain zero requires k + 1 swaps, where k is the number of elements in the cycle. Fixing a cycle that does contain 0 requires k - 1 swaps, where k is the number of elements in the cycle. If we sum this up over all the cycles, we get that the cost is given by the total number of elements that are out of place (call that X), plus the number of cycles of length at least two (call that C), minus two if 0 happens to be in a cycle (let Z be -2 if zero is out of place and 0 otherwise). In other words, the cost is X + C + Z.
I'm going to claim that there is no possible way to do better than this. Focus on any one individual cycle in the cycle decomposition (and, for simplicity, assume that the cycle doesn't contain zero). To move all of the elements in this cycle back to their original positions, every element in the cycle needs to be moved at least once. Since each swap involves the number 0, the number of total swaps necessary to move every item in the cycle at least once is k, where k is the total number of elements in the cycle. Additionally, we need one more swap because the very first swap we do introduces 0 into the cycle and we need to do another swap to get it out. A similar line of reasoning accounts for why a cycle containing 0 must use at least k - 1 swaps. Overall, this means that the optimal series of swaps is formed by getting the cycle decomposition and using the above approach to swap everything around.
The last question is how to find the cycle decomposition. This, fortunately, can be done efficiently. Let's go back to our original sample array:
5 4 0 1 3 2 7 6
Let's create an auxiliary array mapping each element to its index, which we can fill in in a single pass:
2 3 5 4 1 0 7 6
We can now construct the cycle decomposition as follows. Let's start at 0. The auxiliary array says that 0 is in position 2. Looking at position 2, we see that 2 is in position 5. Looking at position 5, we see that 5 is in index 0. This gives us the cycle (0 2 5). We can repeat this process across the array to build up the cycle decomposition, and from there we can just play out the appropriate swaps to sort everything optimally.
Overall, the total time required is O(n), the total space required is O(n), and the number of swaps performed is minimized.
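Counting the swaps does not require performing them. Below is a hedged Python sketch of the X + C + Z formula (the function name is hypothetical); instead of building the auxiliary index array it follows i -> a[i], which walks the same cycles in the opposite direction and therefore gives the same cycle count.

```python
def min_zero_swaps(perm):
    """Minimum number of Swap(0, *) operations, computed as X + C + Z.

    X = elements out of place, C = cycles of length >= 2,
    Z = -2 if 0 is out of place, else 0.
    """
    n = len(perm)
    seen = [False] * n
    misplaced = cycles = 0
    for start in range(n):
        if seen[start] or perm[start] == start:
            continue                      # already counted, or a fixed point
        cycles += 1
        i = start
        while not seen[i]:
            seen[i] = True
            misplaced += 1
            i = perm[i]                   # next index in this cycle
    zero_term = -2 if perm[0] != 0 else 0
    return misplaced + cycles + zero_term


print(min_zero_swaps([5, 4, 0, 1, 3, 2, 7, 6]))  # prints 9
```

For the sample array, X = 8, C = 3 and Z = -2, which matches the 4 + 3 + 2 swaps played out above.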
Hope this helps!
A simple approach:
1. Find all cycles (of length at least 2) of elements through their original positions.
2. Store the size of each cycle in an array L[].
3. Initialise ans = 0.
4. For i = 1 to L.size():
       if A[0] is not included in the current cycle:
           ans = ans + L[i] + 1
       else:
           ans = ans + L[i] - 1
       swap elements in the array accordingly to correct this cycle.

Why does array size have to be 3^k+1 for cycle leader iteration algorithm to work?

The cycle leader iteration algorithm is an algorithm for shuffling an array by moving all even-numbered entries to the front and all odd-numbered entries to the back while preserving their relative order. For example, given this input:
a 1 b 2 c 3 d 4 e 5
the output would be
a b c d e 1 2 3 4 5
This algorithm runs in O(n) time and uses only O(1) space.
One unusual detail of the algorithm is that it works by splitting the array up into blocks of size 3^k + 1. Apparently this is critical for the algorithm to work correctly, but I have no idea why this is.
Why is the choice of 3^k + 1 necessary in the algorithm?
Thanks!
This is going to be a long answer. The answer to your question isn't simple and requires some number theory to fully answer. I've spent about half a day working through the algorithm and I now have a good answer, but I'm not sure I can describe it succinctly.
The short version:
Breaking the input into blocks of size 3^k + 1 essentially breaks the input apart into blocks of size 3^k - 1 surrounded by two elements that do not end up moving.
The remaining 3^k - 1 elements in the block move according to an interesting pattern: each element moves to the position given by dividing the index by two modulo 3^k.
This particular motion pattern is connected to a concept from number theory and group theory called primitive roots.
Because the number two is a primitive root modulo 3^k, beginning with the numbers 1, 3, 9, 27, etc. and running the pattern is guaranteed to cycle through all the elements of the array exactly once and put them into the proper place.
This pattern is highly dependent on the fact that 2 is a primitive root of 3^k for any k ≥ 1. Changing the size of the array to another value will almost certainly break this, because for other sizes the cycles of the motion pattern no longer start at 1, 3, 9, 27, ....
The Long Version
To present this answer, I'm going to proceed in steps. First, I'm going to introduce cycle decompositions as a motivation for an algorithm that will efficiently shuffle the elements around in the right order, subject to an important caveat. Next, I'm going to point out an interesting property of how the elements happen to move around in the array when you apply this permutation. Then, I'll connect this to a number-theoretic concept called primitive roots to explain the challenges involved in implementing this algorithm correctly. Finally, I'll explain why this leads to the choice of 3^k + 1 as the block size.
Cycle Decompositions
Let's suppose that you have an array A and a permutation of the elements of that array. Following the standard mathematical notation, we'll denote the permutation of that array as σ(A). We can line the initial array A up on top of the permuted array σ(A) to get a sense for where every element ended up. For example, here's an array and one of its permutations:
A 0 1 2 3 4
σ(A) 2 3 0 4 1
One way that we can describe a permutation is just to list off the new elements inside that permutation. However, from an algorithmic perspective, it's often more helpful to represent the permutation as a cycle decomposition, a way of writing out a permutation by showing how to form that permutation by beginning with the initial array and then cyclically permuting some of its elements.
Take a look at the above permutation. First, look at where the 0 ended up. In σ(A), the element 0 ended up taking the place of where the element 2 used to be. In turn, the element 2 ended up taking the place of where the element 0 used to be. We denote this by writing (0 2), indicating that 0 should go where 2 used to be, and 2 should go where 0 used to be.
Now, look at the element 1. The element 1 ended up where 4 used to be. The number 4 then ended up where 3 used to be, and the element 3 ended up where 1 used to be. We denote this by writing (1 4 3), that 1 should go where 4 used to be, that 4 should go where 3 used to be, and that 3 should go where 1 used to be.
Combining these together, we can represent the overall permutation of the above elements as (0 2)(1 4 3) - we should swap 0 and 2, then cyclically permute 1, 4, and 3. If we do that starting with the initial array, we'll end up at the permuted array that we want.
Cycle decompositions are extremely useful for permuting arrays in place because it's possible to permute any individual cycle in O(C) time and O(1) auxiliary space, where C is the number of elements in the cycle. For example, suppose that you have a cycle (1 6 8 4 2). You can permute the elements in the cycle with code like this:
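```java
int[] cycle = {1, 6, 8, 4, 2};
int temp = array[cycle[0]];
for (int i = 1; i < cycle.length; i++) {
    // Java cannot swap two variables through a helper, so swap by hand:
    // drop the carried element into this slot and pick up what was there.
    int displaced = array[cycle[i]];
    array[cycle[i]] = temp;
    temp = displaced;
}
array[cycle[0]] = temp;
```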
This works by just swapping everything around until everything comes to rest. Aside from the space usage required to store the cycle itself, it only needs O(1) auxiliary storage space.
In general, if you want to design an algorithm that applies a particular permutation to an array of elements, you can usually do so by using cycle decompositions. The general algorithm is the following:
```
for (each cycle in the cycle decomposition) {
    apply the above algorithm to cycle those elements;
}
```
The overall time and space complexity for this algorithm depends on the following:
How quickly can we determine the cycle decomposition we want?
How efficiently can we store that cycle decomposition in memory?
To get an O(n)-time, O(1)-space algorithm for the problem at hand, we're going to show that there's a way to determine the cycle decomposition in O(1) time and space. Since everything will get moved exactly once, the overall runtime will be O(n) and the overall space complexity will be O(1). It's not easy to get there, as you'll see, but then again, it's not awful either.
The Permutation Structure
The overarching goal of this problem is to take an array of 2n elements and shuffle it so that even-positioned elements end up at the front of the array and odd-positioned elements end up at the end of the array. Let's suppose for now that we have 14 elements, like this:
0 1 2 3 4 5 6 7 8 9 10 11 12 13
We want to shuffle the elements so that they come out like this:
0 2 4 6 8 10 12 1 3 5 7 9 11 13
There are a couple of useful observations we can have about the way that this permutation arises. First, notice that the first element does not move in this permutation, because even-indexed elements are supposed to show up in the front of the array and it's the first even-indexed element. Next, notice that the last element does not move in this permutation, because odd-indexed elements are supposed to end up at the back of the array and it's the last odd-indexed element.
These two observations, put together, mean that if we want to permute the elements of the array in the desired fashion, we actually only need to permute the subarray consisting of the overall array with the first and last elements dropped off. Therefore, going forward, we are purely going to focus on the problem of permuting the middle elements. If we can solve that problem, then we've solved the overall problem.
Now, let's look at just the middle elements of the array. From our above example, that means that we're going to start with an array like this one:
Element 1 2 3 4 5 6 7 8 9 10 11 12
Index 1 2 3 4 5 6 7 8 9 10 11 12
We want to get the array to look like this:
Element 2 4 6 8 10 12 1 3 5 7 9 11
Index 1 2 3 4 5 6 7 8 9 10 11 12
Because this array was formed by taking a 0-indexed array and chopping off the very first and very last element, we can treat this as a one-indexed array. That's going to be critically important going forward, so be sure to keep that in mind.
So how exactly can we go about generating this permutation? Well, for starters, it doesn't hurt to take a look at each element and to try to figure out where it began and where it ended up. If we do so, we can write things out like this:
The element at position 1 ended up at position 7.
The element at position 2 ended up at position 1.
The element at position 3 ended up at position 8.
The element at position 4 ended up at position 2.
The element at position 5 ended up at position 9.
The element at position 6 ended up at position 3.
The element at position 7 ended up at position 10.
The element at position 8 ended up at position 4.
The element at position 9 ended up at position 11.
The element at position 10 ended up at position 5.
The element at position 11 ended up at position 12.
The element at position 12 ended up at position 6.
If you look at this list, you can spot a few patterns. First, notice that the final index of all the even-numbered elements is always half the position of that element. For example, the element at position 4 ended up at position 2, the element at position 12 ended up at position 6, etc. This makes sense - we pushed all the even elements to the front of the array, so half of the elements that came before them will have been displaced and moved out of the way.
Now, what about the odd-numbered elements? Well, there are 12 total elements. Each odd-numbered element gets pushed to the second half, so an odd-numbered element at position 2k+1 will get pushed to at least position 7. Its position within the second half is given by the value of k. Therefore, the element at an odd position 2k+1 gets mapped to position 7 + k.
We can take a minute to generalize this idea. Suppose that the array we're permuting has length 2n. An element at position 2x will be mapped to position x (again, even numbers get halved), and an element at position 2x+1 will be mapped to position n + 1 + x. Restating this:
The final position of an element at position p is determined as follows:
If p = 2x for some integer x, then 2x ↦ x
If p = 2x+1 for some integer x, then 2x+1 ↦ n + 1 + x
And now we're going to do something that's entirely crazy and unexpected. Right now, we have a piecewise rule for determining where each element ends up: we either divide by two, or we do something weird involving n + 1. However, from a number-theoretic perspective, there is a single, unified rule explaining where all elements are supposed to end up.
The insight we need is that in both cases, it seems like, in some way, we're dividing the index by two. For the even case, the new index really is formed by just dividing by two. For the odd case, the new index kinda looks like it's formed by dividing by two (notice that 2x+1 went to x + (n + 1)), but there's an extra term in there. In a number-theoretic sense, though, both of these really correspond to division by two. Here's why.
Rather than taking the source index and dividing by two to get the destination index, what if we take the destination index and multiply by two? If we do that, an interesting pattern emerges.
Suppose our original number was 2x. The destination is then x, and if we double the destination index to get back 2x, we end up with the source index.
Now suppose that our original number was 2x+1. The destination is then n + 1 + x. Now, what happens if we double the destination index? If we do that, we get back 2n + 2 + 2x. If we rearrange this, we can alternatively rewrite this as (2x+1) + (2n+1). In other words, we've gotten back the original index, plus an extra (2n+1) term.
Now for the kicker: what if all of our arithmetic is done modulo 2n + 1? In that case, if our original number was 2x + 1, then twice the destination index is (2x+1) + (2n+1) = 2x + 1 (modulo 2n+1). In other words, the destination index really is half of the source index, just done modulo 2n+1!
This leads us to a very, very interesting insight: the ultimate destination of each of the elements in a 2n-element array is given by dividing that number by two, modulo 2n+1. This means that there really is a nice, unified rule for determining where everything goes. We just need to be able to divide by two modulo 2n+1. It just happens to work out that in the even case, this is normal integer division, and in the odd case, it works out to taking the form n + 1 + x.
Consequently, we can reframe our problem in the following way: given a 1-indexed array of 2n elements, how do we permute the elements so that each element that was originally at index x ends up at position x/2 mod (2n+1)?
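A quick Python check of the unified rule (the helper names are hypothetical): since 2 · (n + 1) = 2n + 2 ≡ 1 (mod 2n+1), dividing by two modulo 2n+1 is the same as multiplying by n + 1, and that agrees with the piecewise rule for every position.

```python
def destination(p, n):
    """Where the element at 1-based position p (1 <= p <= 2n) ends up: p/2 mod (2n+1)."""
    return p * (n + 1) % (2 * n + 1)      # n + 1 is the inverse of 2 modulo 2n+1


def piecewise_destination(p, n):
    """The piecewise rule: 2x -> x and 2x+1 -> n + 1 + x."""
    return p // 2 if p % 2 == 0 else n + 1 + p // 2


def shuffle_by_destination(elements):
    """Apply the unified rule to a 1-indexed list of 2n elements (uses a second list)."""
    n = len(elements) // 2
    out = [None] * len(elements)
    for p, value in enumerate(elements, start=1):
        out[destination(p, n) - 1] = value
    return out


print(shuffle_by_destination(list(range(1, 13))))
# prints [2, 4, 6, 8, 10, 12, 1, 3, 5, 7, 9, 11]
```

This version still uses an output list; the point of the rest of the answer is to apply the same permutation in place.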
Cycle Decompositions Revisited
At this point, we've made quite a lot of progress. Given any element, we know where that element should end up. If we can figure out a nice way to get a cycle decomposition of the overall permutation, we're done.
This is, unfortunately, where things get complicated. Suppose, for example, that our array has 10 elements. In that case, we want to transform the array like this:
Initial: 1 2 3 4 5 6 7 8 9 10
Final: 2 4 6 8 10 1 3 5 7 9
The cycle decomposition of this permutation is (1 6 3 7 9 10 5 8 4 2). If our array has 12 elements, we want to transform it like this:
Initial: 1 2 3 4 5 6 7 8 9 10 11 12
Final: 2 4 6 8 10 12 1 3 5 7 9 11
This has cycle decomposition (1 7 10 5 9 11 12 6 3 8 4 2). If our array has 14 elements, we want to transform it like this:
Initial: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Final: 2 4 6 8 10 12 14 1 3 5 7 9 11 13
This has cycle decomposition (1 8 4 2)(3 9 12 6)(5 10)(7 11 13 14). If our array has 16 elements, we want to transform it like this:
Initial: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Final: 2 4 6 8 10 12 14 16 1 3 5 7 9 11 13 15
This has cycle decomposition (1 9 13 15 16 8 4 2)(3 10 5 11 14 7 12 6).
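The decompositions above can be reproduced with a short Python sketch (the function name is hypothetical) that follows each position through x -> x/2 mod (2n+1), computed as x · (n + 1) mod (2n+1).

```python
def shuffle_cycles(two_n):
    """Cycle decomposition of the 1-indexed evens-first shuffle on 2n elements."""
    m = two_n + 1
    half = (m + 1) // 2            # n + 1, the inverse of 2 modulo 2n+1
    seen = set()
    cycles = []
    for start in range(1, two_n + 1):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = x * half % m       # x moves to x/2 mod (2n+1)
        cycles.append(tuple(cycle))
    return cycles


print(shuffle_cycles(14))
# prints [(1, 8, 4, 2), (3, 9, 12, 6), (5, 10), (7, 11, 13, 14)]
```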
The problem here is that these cycles don't seem to follow any predictable patterns. This is a real problem if we're going to try to solve this problem in O(1) space and O(n) time. Even though given any individual element we can figure out what cycle contains it and we can efficiently shuffle that cycle, it's not clear how we figure out what elements belong to what cycles, how many different cycles there are, etc.
Primitive Roots
This is where number theory comes in. Remember that each element's new position is formed by dividing that number by two, modulo 2n+1. Thinking about this backwards, we can figure out which number will take the place of each number by multiplying by two modulo 2n+1. Therefore, we can think of this problem by finding the cycle decomposition in reverse: we pick a number, keep multiplying it by two and modding by 2n+1, and repeat until we're done with the cycle.
This gives rise to a well-studied problem. Suppose that we start with the number k and think about the sequence k, 2k, 22k, 23k, 24k, etc., all done modulo 2n+1. Doing this gives different patterns depending on what odd number 2n+1 you're modding by. This explains why the above cycle patterns seem somewhat arbitrary.
I have no idea how anyone figured this out, but it turns out that there's a beautiful result from number theory that talks about what happens if you take this pattern mod 3^k for some number k:
Theorem: Consider the sequence 3^s, 3^s·2, 3^s·2^2, 3^s·2^3, 3^s·2^4, etc., all modulo 3^k for some s ≥ 0 and k > s. This sequence cycles through every number between 1 and 3^k, inclusive, that is divisible by 3^s but not divisible by 3^(s+1).
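The theorem is easy to spot-check by brute force for small exponents. This is only a numerical sanity check under the hypotheses 0 ≤ s < k, not a proof, and `theorem_holds` is a hypothetical helper name.

```python
def theorem_holds(s, k):
    """Check that {3^s * 2^i mod 3^k} equals the set of numbers in [1, 3^k)
    divisible by 3^s but not by 3^(s+1)."""
    modulus = 3 ** k
    # The orbit has fewer than `modulus` elements, so this range covers it.
    generated = {3 ** s * pow(2, i, modulus) % modulus for i in range(modulus)}
    expected = {x for x in range(1, modulus)
                if x % 3 ** s == 0 and x % 3 ** (s + 1) != 0}
    return generated == expected


print(all(theorem_holds(s, k) for k in range(1, 7) for s in range(k)))
```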
We can try this out on a few examples. Let's work modulo 27 = 3^3. The theorem (with s = 1) says that if we look at 3, 3 · 2, 3 · 4, etc. all modulo 27, then we should see all the numbers less than 27 that are divisible by 3 and not divisible by 9. Well, let's see what we get:
3 · 2^0 = 3 · 1 = 3 = 3 mod 27
3 · 2^1 = 3 · 2 = 6 = 6 mod 27
3 · 2^2 = 3 · 4 = 12 = 12 mod 27
3 · 2^3 = 3 · 8 = 24 = 24 mod 27
3 · 2^4 = 3 · 16 = 48 = 21 mod 27
3 · 2^5 = 3 · 32 = 96 = 15 mod 27
3 · 2^6 = 3 · 64 = 192 = 3 mod 27
We ended up seeing 3, 6, 12, 15, 21, and 24 (though not in that order), which are indeed all the numbers less than 27 that are divisible by 3 but not divisible by 9.
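The same table takes a couple of lines of Python, if you'd like to play with other moduli (a quick check, nothing more):

```python
seq = [3 * pow(2, i, 27) % 27 for i in range(7)]
print(seq)               # [3, 6, 12, 24, 21, 15, 3]
print(sorted(set(seq)))  # [3, 6, 12, 15, 21, 24]
```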
We can also try this working mod 27 and considering 1, 2, 2^2, 2^3, 2^4, etc. mod 27 (the case s = 0), and we should see all the numbers less than 27 that are divisible by 1 and not divisible by 3. In other words, this should give back all the numbers less than 27 that aren't divisible by 3. Let's see if that's true:
2^0 = 1 = 1 mod 27
2^1 = 2 = 2 mod 27
2^2 = 4 = 4 mod 27
2^3 = 8 = 8 mod 27
2^4 = 16 = 16 mod 27
2^5 = 32 = 5 mod 27
2^6 = 64 = 10 mod 27
2^7 = 128 = 20 mod 27
2^8 = 256 = 13 mod 27
2^9 = 512 = 26 mod 27
2^10 = 1024 = 25 mod 27
2^11 = 2048 = 23 mod 27
2^12 = 4096 = 19 mod 27
2^13 = 8192 = 11 mod 27
2^14 = 16384 = 22 mod 27
2^15 = 32768 = 17 mod 27
2^16 = 65536 = 7 mod 27
2^17 = 131072 = 14 mod 27
2^18 = 262144 = 1 mod 27
Sorting these (the powers don't come out in order), we get back the numbers 1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 20, 22, 23, 25, 26. These are exactly the numbers between 1 and 26 that aren't multiples of three!
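Again, a quick Python check of this table:

```python
powers = [pow(2, i, 27) for i in range(19)]
print(powers)
# [1, 2, 4, 8, 16, 5, 10, 20, 13, 26, 25, 23, 19, 11, 22, 17, 7, 14, 1]
non_multiples = [x for x in range(1, 27) if x % 3 != 0]
print(sorted(powers[:18]) == non_multiples)  # True
```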
This theorem is crucial to the algorithm for the following reason: if 2n+1 = 3^k for some number k, then if we process the cycle containing 1, it will properly shuffle all numbers that aren't multiples of three. If we then start the cycle at 3, it will properly shuffle all numbers that are divisible by 3 but not by 9. If we then start the cycle at 9, it will properly shuffle all numbers that are divisible by 9 but not by 27. More generally, if we use the cycle shuffle algorithm on the numbers 1, 3, 9, 27, 81, ..., 3^(k-1), then we will properly reposition all the elements in the array exactly once and will not have to worry that we missed anything.
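Here's a sketch that makes the "exactly once" claim checkable for small k. It walks the cycles starting at 1, 3, 9, ..., 3^(k-1) modulo 3^k, moving each position p to p/2 mod 3^k as in the examples, and confirms that together they visit every position 1, ..., 3^k - 1 exactly once. `leader_cycles` is an illustrative name, and the check is numerical, not a proof.

```python
def leader_cycles(k):
    """Cycles of p -> p/2 (mod 3^k) started at the leaders 1, 3, ..., 3^(k-1)."""
    modulus = 3 ** k
    inv2 = (modulus + 1) // 2  # inverse of 2 modulo the odd number 3^k
    cycles = []
    leader = 1
    while leader < modulus:
        cycle = [leader]
        p = leader * inv2 % modulus
        while p != leader:
            cycle.append(p)
            p = p * inv2 % modulus
        cycles.append(cycle)
        leader *= 3
    return cycles


print(leader_cycles(2))  # [[1, 5, 7, 8, 4, 2], [3, 6]]
for k in range(1, 8):
    visited = [p for cycle in leader_cycles(k) for p in cycle]
    assert sorted(visited) == list(range(1, 3 ** k))  # each position exactly once
```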
So how does this connect to 3^k + 1? Well, we need to have that 2n + 1 = 3^k, so we need to have that 2n = 3^k - 1. But remember: we dropped the very first and very last element of the array when we did this! Adding those back in tells us that we need blocks of size 3^k + 1 for this procedure to work correctly. If the blocks are this size, then we know for certain that the cycle decomposition will consist of a cycle containing 1, a nonoverlapping cycle containing 3, a nonoverlapping cycle containing 9, etc., and that these cycles will contain all the elements of the array. Consequently, we can just start cycling 1, 3, 9, 27, etc. and be absolutely guaranteed that everything gets shuffled around correctly. That's amazing!
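Putting the pieces together for a single block, here's a minimal Python sketch of the cycle-leader pass. It assumes the goal for one block of length 3^k + 1 is "all even-indexed entries first, then all odd-indexed ones" (0-based), which is the layout of the examples above once a fixed first and last entry are added back; only the interior 3^k - 1 entries are cycled. The function names and the ValueError check are illustrative additions, and the rest of the algorithm (cutting a general array into such blocks) isn't shown.

```python
def is_power_of_three(m):
    if m < 3:
        return False
    while m % 3 == 0:
        m //= 3
    return m == 1


def split_block_in_place(block):
    """Reorder `block` (length 3**k + 1, k >= 1) in place into its
    even-indexed entries followed by its odd-indexed entries, using O(1)
    extra space. block[0] and block[-1] are already in their final places."""
    modulus = len(block) - 1  # 2n + 1 for the 2n interior entries
    if not is_power_of_three(modulus):
        raise ValueError("block length must be 3**k + 1 with k >= 1")
    inv2 = (modulus + 1) // 2  # dividing by 2 mod 3**k
    leader = 1
    while leader < modulus:
        # Walk one cycle: the entry at index p moves to index p/2 mod 3**k.
        p = leader
        carried = block[p]
        while True:
            dest = p * inv2 % modulus
            block[dest], carried = carried, block[dest]
            p = dest
            if p == leader:
                break
        leader *= 3


data = list(range(10))  # 3**2 + 1 entries
split_block_in_place(data)
print(data)  # [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]
```

The only state carried between steps is one element and a few indices, which is what keeps the extra space constant.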
And why is this theorem true? It turns out that a number g for which 1, g, g^2, g^3, etc. mod p^n cycles through all the numbers that aren't multiples of p (assuming p is prime) is called a primitive root of p^n. There's a theorem that says that 2 is a primitive root of 3^k for all k ≥ 1, which is why this trick works. If I have time, I'd like to come back and edit this answer to include a proof of this result, though unfortunately my number theory isn't at a level where I know how to do this.
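The primitive-root claim can at least be spot-checked numerically. This sketch (the name `is_primitive_root` is mine) lists 1, g, g^2, ..., g^(φ(m)-1) mod m and compares the result with the set of numbers coprime to m; it's a brute-force check, not a proof, and only practical for small moduli.

```python
from math import gcd


def is_primitive_root(g, modulus):
    """True if the powers of g mod `modulus` hit every residue coprime to it."""
    units = {x for x in range(1, modulus) if gcd(x, modulus) == 1}
    powers = set()
    x = 1 % modulus
    for _ in range(len(units)):  # phi(modulus) powers: g^0 .. g^(phi-1)
        powers.add(x)
        x = x * g % modulus
    return powers == units


print(all(is_primitive_root(2, 3 ** k) for k in range(1, 8)))  # True
print(is_primitive_root(2, 7))  # False: 2^3 = 8 = 1 mod 7
```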
Summary
This problem was tons of fun to work on. It involves cute tricks with dividing by two modulo an odd number, cycle decompositions, primitive roots, and powers of three. I'm indebted to this arXiv paper which described a similar (though quite different) algorithm and gave me a sense for the key trick behind the technique, which then let me work out the details for the algorithm you described.
Hope this helps!
Here is most of the mathematical argument missing from templatetypedef’s
answer. (The rest is comparatively boring.)
Lemma: for all integers k >= 1, we have
2^(2*3^(k-1)) = 1 + 3^k mod 3^(k+1).
Proof: by induction on k.
Base case (k = 1): we have 2^(2*3^(1-1)) = 4 = 1 + 3^1 mod 3^(1+1).
Inductive case (k >= 2): if 2^(2*3^(k-2)) = 1 + 3^(k-1) mod 3^k,
then q = (2^(2*3^(k-2)) - (1 + 3^(k-1)))/3^k is an integer, and
2^(2*3^(k-1)) = (2^(2*3^(k-2)))^3
= (1 + 3^(k-1) + 3^k*q)^3
= 1 + 3*(3^(k-1)) + 3*(3^(k-1))^2 + (3^(k-1))^3
+ 3*(1+3^(k-1))^2*(3^k*q) + 3*(1+3^(k-1))*(3^k*q)^2 + (3^k*q)^3
= 1 + 3^k mod 3^(k+1),
since k >= 2 makes every term after 1 + 3*(3^(k-1)) a multiple of 3^(k+1).
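A quick numerical check of the Lemma for the first few dozen k (not a substitute for the induction above, just a sanity check):

```python
for k in range(1, 41):
    # 2^(2*3^(k-1)) reduced mod 3^(k+1) should be exactly 1 + 3^k.
    assert pow(2, 2 * 3 ** (k - 1), 3 ** (k + 1)) == 1 + 3 ** k
print("Lemma holds for k = 1..40")
```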
Theorem: for all integers i >= 0 and k >= 1, we have
2^i = 1 mod 3^k if and only if i = 0 mod 2*3^(k-1).
Proof: the “if” direction follows from the Lemma. If
i = 0 mod 2*3^(k-1), then
2^i = (2^(2*3^(k-1)))^(i/(2*3^(k-1)))
= (1+3^k)^(i/(2*3^(k-1))) mod 3^(k+1)
= 1 mod 3^k.
The “only if” direction is by induction on k.
Base case (k = 1): if i != 0 mod 2, then i = 1 mod 2, and
2^i = (2^2)^((i-1)/2)*2
= 4^((i-1)/2)*2
= 2 mod 3
!= 1 mod 3.
Inductive case (k >= 2): if 2^i = 1 mod 3^k, then
2^i = 1 mod 3^(k-1), and the inductive hypothesis implies that
i = 0 mod 2*3^(k-2). Let j = i/(2*3^(k-2)). By the Lemma,
1 = 2^i mod 3^k
= (1+3^(k-1))^j mod 3^k
= 1 + j*3^(k-1) mod 3^k,
where the dropped terms are divisible by (3^(k-1))^2, which is a multiple of 3^k since k >= 2, so
j = 0 mod 3, and i = 0 mod 2*3^(k-1).
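And a brute-force check of the Theorem's "if and only if" for small k, testing every exponent i over a few full periods (again a sanity check rather than a proof):

```python
for k in range(1, 7):
    modulus = 3 ** k
    period = 2 * 3 ** (k - 1)
    for i in range(3 * period):
        # 2^i = 1 (mod 3^k) exactly when i is a multiple of 2*3^(k-1).
        assert (pow(2, i, modulus) == 1) == (i % period == 0)
print("Theorem holds for k = 1..6")
```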
