Why is the last element of the list not shuffled in the Fisher-Yates algorithm?

Wikipedia gives the modern Fisher-Yates shuffle algorithm as:
-- To shuffle an array a of n elements (indices 0..n-1):
for i from 0 to n−2 do
    j ← random integer such that i ≤ j < n
    exchange a[i] and a[j]
Why is the last element of the array not shuffled?

Putting @CollinD's comments as an answer:
If i == n-1, then it can only possibly be swapped with j = n-1, so why bother running an extra iteration?
At a given i, there is a 1/(n-i) chance of swapping with any particular element in [i, n) (including i itself).
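As a concrete illustration, the loop above translates directly to Python; a minimal sketch (using random.randint, which takes inclusive bounds):

```python
import random

def fisher_yates(a):
    """Shuffle the list a in place and return it (Fisher-Yates)."""
    n = len(a)
    # Stop at index n-2: when i == n-1 the only possible choice is
    # j == n-1, which would just swap a[n-1] with itself.
    for i in range(n - 1):
        j = random.randint(i, n - 1)   # random j with i <= j <= n-1
        a[i], a[j] = a[j], a[i]
    return a
```

The last element is still shuffled: any earlier iteration may swap it away; only its own (pointless) final iteration is skipped.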


Need help proving loop invariant (simple bubble sort, partial correctness)

The bubble-sort algorithm (pseudo-code):
Input: Array A[1...n]
for i <- n,...,2 do
    for j <- 2,...,i do
        if A[j - 1] >= A[j] then
            swap the values of A[j-1] and A[j];
My proof seems to work, but it is overly convoluted. Could you help me clean it up?
Loop-invariant: After each iteration i, the n - i + 1 greatest
elements of A occupy the positions they would hold were A sorted
non-descendingly. In case array A contains more than one maximal
value, let the greatest element be the one with the smallest index
among all the maximal values.
Induction-basis (i = n): The inner loop iterates over every element of
A. Eventually, j points to the greatest element. This value will be
swapped until it reaches position i = n, which is the highest position
in array A and hence the final position for the greatest element of A.
Induction-step (i = m → i = m - 1, for all m with 2 < m ≤ n): The inner
loop iterates over every not-yet-sorted element of A. Eventually, j
points to the greatest of the not-yet-sorted elements. This value is
swapped rightwards until it reaches position i = m - 1, which is the
highest not-yet-sorted position in array A and hence the final position
for the greatest not-yet-sorted element of A.
After the algorithm has fully executed, the remaining element at
position 1 is also in its final position: were it not, the element to
its right would not be in its final position, which is a contradiction.
Q.E.D.
I'd be inclined to recast your proof in the following terms:
Bubble sort A[1..n]:
for i in n..2
    for j in 2..i
        swap A[j - 1], A[j] if they are not already in order
Loop invariant:
let P(i) <=> for all k s.t. i < k <= n. A[k] = max(A[1..k])
Base case:
initially i = n and the invariant P(n) is trivially satisfied.
Induction step:
assuming the invariant P(m + 1) holds for some m,
show that after the inner loop executes with i = m + 1, the invariant P(m) holds.
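The recast invariant can be checked mechanically; a small Python sketch (0-indexed, my own translation of the pseudo-code) that asserts the invariant after each outer pass:

```python
def bubble_sort_checked(a):
    """Bubble sort with the loop invariant asserted after each pass.

    After the pass with outer index i (0-indexed), every position
    k >= i holds the maximum of a[0..k] -- the 0-indexed counterpart
    of P(i) from the proof above.
    """
    n = len(a)
    for i in range(n - 1, 0, -1):        # i = n-1, ..., 1
        for j in range(1, i + 1):        # j = 1, ..., i
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
        for k in range(i, n):            # check the invariant
            assert a[k] == max(a[:k + 1])
    return a
```

For any input, the assertions never fire, mirroring the induction step of the proof.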

How to calculate GCD for every index in array with some constraints

Given a 1-indexed array A of size N, the distance between any
2 indices i and j of this array is |i−j|. Given this, I need to find, for every index i (1≤i≤N), an index j such that 1≤j≤N, i≠j, and GCD(A[i],A[j])>1.
If there are multiple such candidates for an index i, I have to find the index j such that the distance between i and j is minimal. If there still exist multiple candidates, print the minimum j satisfying the above constraints.
Example:
Array(A) 2 3 4 9 17
Output : 3 4 1 2 -1
Note: the array size can be as large as 2*10^5,
and each array element can take a max value of 2*10^5 and a min value of 1.
I should be able to calculate this in 1 second at most.
Here is my code, but it's exceeding the time limit. Is there a way to optimize it?
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class GCD {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        int n = Integer.parseInt(br.readLine().trim());
        int[] a = new int[n + 1];
        StringBuilder sb = new StringBuilder();
        String[] array = br.readLine().trim().split(" ");
        for (int i = 1; i <= n; i++) {
            a[i] = Integer.parseInt(array[i - 1]);
        }
        int c, d;
        l1:
        for (int i = 1; i <= n; i++) {
            c = i - 1;
            d = i + 1;
            while (c > 0 || d <= n) {
                if (c > 0) {
                    if (GCD(a[i], a[c]) > 1) {
                        sb.append(c + " ");
                        continue l1;
                    }
                }
                if (d <= n) {
                    if (GCD(a[i], a[d]) > 1) {
                        sb.append(d + " ");
                        continue l1;
                    }
                }
                c--;
                d++;
            }
            sb.append("-1 ");
        }
        System.out.println(sb);
    }

    static long GCD(int a, int b) {
        if (b == 0)
            return a;
        return GCD(b, a % b);
    }
}
You know the problem must be solved in one second. You know the array can have 200,000 elements. Comparing 200,000 elements against 200,000 elements takes 40 billion comparisons, and if you're lucky your computer does 3 billion operations per second. So comparing 200,000 elements with 200,000 elements isn't going to work (that worst case happens, for example, when every array element equals 1), and micro-optimizing your code isn't going to help.
So move your mind away from the way the problem is posed. It asks to find j so that gcd(a[i], a[j]) != 1. What it really means is to find j so that a[j] has a prime factor in common with a[i]. And that j needs to be the largest j < i or the smallest j > i.
The numbers are small, less than 200,000, so you can find all the distinct prime factors of a[i] very quickly.
So first you create an array "index": for each prime number p <= 200,000, index[p] is the index j of the last array element a[j] you examined that had the prime factor p, or -1 if you haven't found any. You also create an array "solution": for each i examined so far, it contains the closest qualifying index found so far, or -1.
Iterate through the array for i = 1 to n: for each a[i], find all prime factors. For every factor p: if j = index[p] > 0 then a[j] is also divisible by p, so gcd(a[i], a[j]) > 1. Doing that, you get the largest j < i with gcd(a[i], a[j]) > 1. Also update the array index as you find prime factors.
But also, if you find that a[i] and a[j] have a common factor, then the solution you stored for j might be wrong, because it only considered indices less than j; so update solution[j] as well. Pseudo-code:
Create array "index" filled with -1.
Create array "solution" filled with -1.
for all i
    for all prime factors p of a[i]
        let j = index[p]
        index[p] = i
        if j >= 0
            if solution[i] = -1
                solution[i] = j
            else if j > solution[i]
                solution[i] = j
            if solution[j] = -1
                solution[j] = i
            else if solution[j] < j && i - j < j - solution[j]
                solution[j] = i
print solution
You can see it doesn't matter at all how far away the array element with a common factor is. The execution time is a very small constant times the total number of prime factors, plus the time for finding the factors, which is worst when all elements are large primes. So all you need is to find all the factors of any number < 200,000 in, say, 3-4 microseconds, which should be easy. You can precompute a table of the prime numbers up to 500 (√200,000 ≈ 448) before you start.
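A Python sketch of this answer's index/solution scheme; the smallest-prime-factor sieve and all names are my own, since the original only gives pseudo-code (indices here are 0-based, so -1 still means "none"):

```python
def nearest_shared_factor_indices(a):
    """For each position i (0-indexed) return the index j minimizing |i-j|
    with gcd(a[i], a[j]) > 1, preferring the smaller j on ties; -1 if none."""
    limit = max(a) + 1
    # Smallest-prime-factor sieve: spf[x] = smallest prime dividing x.
    spf = list(range(limit))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:                       # p is prime
            for m in range(p * p, limit, p):
                if spf[m] == m:
                    spf[m] = p

    def prime_factors(x):
        while x > 1:
            p = spf[x]
            yield p
            while x % p == 0:
                x //= p

    n = len(a)
    index = {}                  # prime -> last index seen with that factor
    solution = [-1] * n
    for i in range(n):
        for p in prime_factors(a[i]):
            j = index.get(p, -1)
            index[p] = i
            if j >= 0:
                # j is the closest earlier index sharing factor p.
                if solution[i] == -1 or j > solution[i]:
                    solution[i] = j
                # i may be a closer partner for j than j's current one.
                if solution[j] == -1 or (solution[j] < j and i - j < j - solution[j]):
                    solution[j] = i
    return solution
```

On the example from the question, [2, 3, 4, 9, 17], this returns [2, 3, 0, 1, -1], i.e. the 1-indexed output 3 4 1 2 -1.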
To run this under 1 second, your algorithm should be Θ(N) or Θ(N * log(N)), with N < 2*10^5. One way to do this:
Find all the factors except 1 of all the numbers in one iteration over the array. Complexity = Θ(N) * Θ(factorization) = Θ(N * log(N)).
Make a hash map with key = the factor just found and value = a sorted array of the indices of the input elements having that factor. (The index arrays are built in increasing order, so no explicit sorting is needed; a value has fewer than Θ(log(N)) distinct prime factors.) Complexity = Θ(N * log(N)).
Now iterate over the elements; for each element, iterate over its factors, and for each factor binary-search the hash map's index array to find the nearest indices where this factor occurs. Take the nearest candidate over all factors of the element as its answer. Complexity = Θ(N * log(N) * log(N)).
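A Python sketch of this factor-map approach, using bisect for the nearest-index lookup (the sieve helper and all names are my own; indices are 0-based):

```python
from bisect import bisect_left
from collections import defaultdict

def nearest_by_factor_map(a):
    """For each i, the index j != i minimizing |i-j| with a common
    factor > 1 (smaller j on ties), or -1 if no such j exists."""
    limit = max(a) + 1
    spf = list(range(limit))                 # smallest-prime-factor sieve
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, limit, p):
                if spf[m] == m:
                    spf[m] = p

    def prime_factors(x):
        while x > 1:
            p = spf[x]
            yield p
            while x % p == 0:
                x //= p

    where = defaultdict(list)                # factor -> sorted indices
    for i, v in enumerate(a):
        for p in prime_factors(v):
            where[p].append(i)               # appended in order => sorted

    result = []
    for i, v in enumerate(a):
        best = -1
        for p in prime_factors(v):
            idxs = where[p]
            pos = bisect_left(idxs, i)       # idxs[pos] == i
            for cand in (idxs[pos - 1] if pos > 0 else None,
                         idxs[pos + 1] if pos + 1 < len(idxs) else None):
                if cand is None:
                    continue
                if best == -1 or abs(cand - i) < abs(best - i) or \
                   (abs(cand - i) == abs(best - i) and cand < best):
                    best = cand
        result.append(best)
    return result
```

For each factor only the immediate left and right neighbors in the index list can be optimal, so two lookups per factor suffice.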

Bubble Sort Outer Loop and N-1

I've read multiple posts on Bubble Sort, but still have difficulty verbalizing why my code works, particularly with respect to the outer loop.
for (int i = 0; i < (n - 1); i++)
{
    for (int j = 0; j < (n - i - 1); j++)
    {
        if (array[j] > array[j + 1])
        {
            int temp = array[j];
            array[j] = array[j + 1];
            array[j + 1] = temp;
        }
    }
}
For any array of length n, at most n-1 pairwise comparisons are possible. That said, if we stop at i < n-1, we never see the final element. If, in the worst case, the array's elements (I'm thinking ints here) are in reverse order, we cannot assume that it is in its proper place. So, if we never examine the final array element in the outer loop, how can this possibly work?
Array indexing runs from 0 to n-1: if there are 10 elements in the array, the indices are 0 through 9. So in the first iteration of the inner loop, n-1 comparisons take place, and the first pass of bubble sort bubbles the largest number up to its final position.
In the next iteration, n-1-1 comparisons take place, bubbling the second largest value to its place, and so on until the whole array is sorted.
In this line you access one element ahead of the current position of j:
array[j + 1];
In the first iteration of the outer loop, j runs from 0 while j < (n-0-1), so the largest index accessed is j + 1 = n - 1. So if your array is declared as array[n], this does reach the last element of the array.
n is typically the number of elements in your array, so if there are 10 elements in the array, they are indexed from 0 to 9. You would not want to access array[10], as that is an out-of-bounds access which may yield a segfault, hence the "n - 1" in the loop condition. In C, when writing and calling a function that iterates over an array, the size of the array is also passed as a parameter.
Here n means the number of elements. The loop index starts at 0 and ranges from 0 to n-1, so n elements are visited: every element is traversed.
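For intuition, here is the same double loop in Python (my own translation of the C code above); even a reverse-ordered input ends up fully sorted, which shows the final element is handled by the inner loop's array[j + 1] accesses rather than needing its own outer pass:

```python
def bubble_sort(array):
    """Direct translation of the C loops from the question."""
    n = len(array)
    for i in range(n - 1):              # outer loop: i < n - 1
        for j in range(n - i - 1):      # inner loop: j < n - i - 1
            if array[j] > array[j + 1]:
                # Swap adjacent out-of-order elements.
                array[j], array[j + 1] = array[j + 1], array[j]
    return array

print(bubble_sort([5, 4, 3, 2, 1]))     # prints [1, 2, 3, 4, 5]
```

After n-1 passes the largest n-1 elements each sit in their final positions, which forces the single remaining element into place as well.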

Finding common elements in two arrays of different size

I have to find the best way to get the common elements of two arrays of different sizes.
The arrays are unordered; the common elements are at different positions, but in the same relative order (if in array A common element b comes after a, the same holds in array B) and at a maximum distance of N.
I can't use more than O(N) additional space.
Currently I extract N elements from array A, order them with mergesort, and perform a binary search using N elements of array B. Then I take the next N elements from the position of the match I found and do another cycle.
The cost of this should be, with m the length of array B, O(m N log N).
I have tried using a hash table, but to manage collisions I have to implement a list, and efficiency goes down.
Is there a better way?
Assuming you can have "holes" in your matched sequence (A = [1,3,2] and B = [1,4,2], then MatchSet = {1,2}):
Maybe I am wrong, but you could try this pseudo-code:
i <- 0; j <- 0; jHit <- -1
matchSet <- Empty
While i < Length(A) AND j < Length(B):
    If A[i] == B[j] Then
        matchSet.add(A[i])
        i <- i+1
        jHit <- j
    End If
    j <- j+1
    If j == Length(B) Then
        i <- i+1
        j <- jHit+1
    End If
End Loop
The first index (i) points to the next element of A not yet found in B, whereas j is used to look for the next element of B (after the last element found in A).
This would give you a time complexity of O(mN) and space usage of O(N).
Here you have an implementation in Python:
def match(A, B):
    i = 0
    j = 0
    jHit = -1
    res = []
    while i < len(A) and j < len(B):
        if A[i] == B[j]:
            res.append(A[i])
            i += 1
            jHit = j
        j += 1
        if j == len(B):
            i += 1
            j = jHit + 1
    return res

Total number of possible triangles from n numbers

If n numbers are given, how would I find the total number of possible triangles? Is there any method that does this in less than O(n^3) time?
I am considering a+b>c, b+c>a and a+c>b conditions for being a triangle.
Assume there are no equal numbers among the given n, and that it is allowed to use one number more than once. For example, given the numbers {1,2,3}, we can create 7 triangles:
1 1 1
1 2 2
1 3 3
2 2 2
2 2 3
2 3 3
3 3 3
If any of those assumptions doesn't hold, it's easy to modify the algorithm.
Here I present an algorithm which takes O(n^2) time in the worst case:
Sort the numbers (ascending order).
We will take triples ai <= aj <= ak, such that i <= j <= k.
For each i, j you need to find the largest k satisfying ak < ai + aj. Then all triples (ai, aj, al) with j <= l <= k are triangles (because ak >= aj >= ai, the only inequality that can be violated is ak < ai + aj).
Consider two pairs (i, j1) and (i, j2) with j1 <= j2. It's easy to see that the k2 found for (i, j2) is >= the k1 found for (i, j1). This means that as you iterate over j you only need to continue checking from the previous k, which gives O(n) time for each particular i, and hence O(n^2) for the whole algorithm.
C++ source code:
int Solve(int* a, int n)
{
    int answer = 0;
    std::sort(a, a + n);
    for (int i = 0; i < n; ++i)
    {
        int k = i;
        for (int j = i; j < n; ++j)
        {
            while (k < n && a[i] + a[j] > a[k])
                ++k;
            answer += k - j;
        }
    }
    return answer;
}
Update for downvoters:
This definitely is O(n^2)! Please read carefully the chapter on amortized analysis in "Introduction to Algorithms" by Thomas H. Cormen et al. (Section 17.2 in the second edition). Finding complexity by counting nested loops is sometimes completely wrong.
Here I'll explain it as simply as I can. Fix the variable i. Then for that i, j iterates from i to n (an O(n) operation), and the internal while loop moves k from i to n in total (also O(n) in total, because k is never reset for each j). We do this for each i from 0 to n, which gives n * (O(n) + O(n)) = O(n^2).
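The same two-pointer counting in Python (my own translation of the C++ above), cross-checked against a brute-force count; combinations_with_replacement mirrors the "numbers may be reused" assumption from the question:

```python
from itertools import combinations_with_replacement

def count_triangles(a):
    """O(n^2) count of triples a[i] <= a[j] <= a[l] (i <= j <= l,
    repetition allowed) that form a triangle."""
    a = sorted(a)
    n = len(a)
    answer = 0
    for i in range(n):
        k = i                        # k only ever moves forward per i
        for j in range(i, n):
            while k < n and a[i] + a[j] > a[k]:
                k += 1
            answer += k - j          # valid third sides are a[j..k-1]
    return answer

def count_triangles_brute(a):
    """O(n^3) reference: x <= y <= z is a triangle iff x + y > z."""
    return sum(1 for x, y, z in combinations_with_replacement(sorted(a), 3)
               if x + y > z)
```

For the example {1, 2, 3} both functions return 7, matching the list of triangles in the question.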
There is a simple algorithm in O(n^2 * logn).
Assume you want all triangles as triples (a, b, c) where a <= b <= c.
There are 3 triangle inequalities, but only a + b > c suffices (the others then hold trivially).
And now:
Sort the sequence in O(n * logn), e.g. by merge-sort.
For each pair (a, b) with a <= b, the remaining value c needs to be at least b and less than a + b, so you need to count the number of items in the interval [b, a+b).
This can be done simply by binary-searching for a + b (O(logn)) and counting the items that fall in [b, a+b).
All together O(n * logn + n^2 * logn), which is O(n^2 * logn). Hope this helps.
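A sketch of this counting with Python's bisect (my own translation; here the count is over index triples i < j < k of a sorted array, i.e. each element is used at most once):

```python
from bisect import bisect_left

def count_triangles_bisect(a):
    """O(n^2 log n) count of index triples i < j < k in sorted a with
    a[i] + a[j] > a[k], the only triangle inequality that can fail."""
    a = sorted(a)
    n = len(a)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            # First index whose value is >= a[i] + a[j]; all valid third
            # sides sit at indices j+1 .. hi-1 (values in [a[j], a[i]+a[j])).
            hi = bisect_left(a, a[i] + a[j])
            count += max(0, hi - (j + 1))
    return count
```

For example, [4, 6, 3, 7] admits the triangles (3,4,6), (3,6,7) and (4,6,7), so the function returns 3.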
If you use a binary sort, that's O(n * log(n)), right? Keep your binary tree handy, and for each pair (a, b) with a <= b, count the elements c with c >= b and c < (a + b).
Let a, b and c be the three sides. The conditions below must hold for a triangle (the sum of any two sides is greater than the third side):
i) a + b > c
ii) b + c > a
iii) a + c > b
Following are the steps to count triangles:
1. Sort the array in non-decreasing order.
2. Initialize two pointers i and j to the first and second elements respectively, and initialize the count of triangles to 0.
3. Fix i and j and find the rightmost index k (largest arr[k]) such that arr[i] + arr[j] > arr[k]. The number of triangles that can be formed with arr[i] and arr[j] as two sides is k - j; add k - j to the count of triangles.
Let us consider arr[i] as a, arr[j] as b, and all elements between arr[j+1] and arr[k] as c. Conditions (ii) and (iii) above are satisfied because arr[i] < arr[j] < arr[k], and we check condition (i) when we pick k.
4. Increment j to fix the second element again. Note that in step 3 we can reuse the previous value of k: if arr[i] + arr[j-1] is greater than arr[k], then arr[i] + arr[j] is also greater than arr[k], because the array is sorted in increasing order.
5. If j has reached the end, increment i, reinitialize j as i + 1 and k as i + 2, and repeat steps 3 and 4.
Time Complexity: O(n^2).
The time complexity looks higher because of the 3 nested loops, but if we take a closer look at the algorithm we observe that k is initialized only once per iteration of the outermost loop. The inner loops together execute at most O(n) steps for each value of i, because k starts at i+2 and only moves forward up to n across all values of j. Therefore, the time complexity is O(n^2).
I have worked out an algorithm that runs in O(n^2 lgn) time. I think it's correct...
The code is written in C++...
#include <algorithm>

// Returns the largest index k in the sorted range A[p..q] with A[k] < x.
// (The caller guarantees A[p] < x, since A[j] < A[i] + A[j].)
int Search_Closest(int A[], int p, int q, int x)
{
    if (p < q)
    {
        int r = (p + q + 1) / 2;   // upper middle, so both branches shrink
        if (A[r] < x)
            return Search_Closest(A, r, q, x);
        else
            return Search_Closest(A, p, r - 1, x);
    }
    return p;
}

// Returns the number of triangles possible with sides from A[p..q]
int no_of_triangles(int A[], int p, int q)
{
    int sum = 0;
    std::sort(A + p, A + q + 1); // sorts A[p..q] in O(n lgn)
    for (int i = p; i <= q; i++)
        for (int j = i + 1; j <= q; j++)
        {
            int c = A[i] + A[j];
            /* The valid third sides are the elements of A[j+1..k], where
               k is the largest index with A[k] < c, so the count for this
               pair is k - j. */
            int k = Search_Closest(A, j, q, c);
            sum += k - j;
        }
    return sum;
}
Hope it helps.
A possible improvement: we can use binary search to find the value of k, and hence improve the time complexity!
Given N0, N1, N2, ..., Nn-1:
Sort them into X0, X1, X2, ..., Xn-1 with X0 >= X1 >= X2 >= ... >= Xn-1.
Choose Xi (for i up to n-3) and then choose the remaining two items from the rest, e.g. the case (X0, X1, X2):
Check X0 < X1 + X2.
If it holds, this is a triangle; count it and continue.
If not, skip the remaining choices for this first element.
It seems there is no algorithm better than O(n^3) if the triangles themselves must be listed: in the worst case, the result set itself has O(n^3) elements.
For example, if n equal numbers are given, the algorithm has to return n*(n-1)*(n-2)/6 results.
