What is the most efficient way to find all combinations of n choose 2 for 2 <= n <= 100000?
For example, 5 choose 2 is
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
This is what I have so far for testing the worst case:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define MAX_ITEMS 100000
void combinations(int[], int);
long long count = 0;
int main(void) {
int *arr = (int*) calloc(MAX_ITEMS, sizeof(int));
if (!arr) {
printf("Error allocating memory.");
exit(1);
}
int i, n = MAX_ITEMS;
for (i = 0; i < MAX_ITEMS; i++) {
arr[i] = i + 1;
}
clock_t start, diff;
int msec;
start = clock();
combinations(arr, n);
diff = clock() - start;
msec = diff * 1000 / CLOCKS_PER_SEC;
printf("\n\nTime taken %d seconds %d milliseconds", msec / 1000, msec % 1000);
printf("\n\nPairs = %lld\n", count);
return 0;
}
void combinations(int arr[], int n) {
int i, j, comb1, comb2, end = n - 1;
for (i = 0; i < end; i++) {
for (j = i + 1; j < n; j++) {
// simulate doing something with data at these indices
comb1 = arr[i];
comb2 = arr[j];
// printf("%d %d\n", arr[i], arr[j]);
count++;
}
}
}
OUTPUT
Time taken 28 seconds 799 milliseconds
Pairs = 4999950000
I could be mistaken but time complexity is O(n^2).
Is there a more efficient algorithm to handle the worst case?
There is no "best case" or "worst case". You need to generate exactly (n * (n - 1)) / 2 pairs, and your current program generates exactly those pairs and nothing else. Thus your program is optimal (in the algorithmic analysis sense) and is θ(n^2).
Some optimizations may be possible using various tricks (e.g. bitwise operations to go from one pair to the next, generating bulk pairs in one iteration, compiler optimizations, etc) but none would affect the time complexity of the algorithm.
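If all you actually need is the number of pairs (as in the timing test above, where the loop body only simulates work and increments count), that number can be computed directly in O(1); it is only enumerating the pairs themselves that is necessarily Θ(n^2), because that is the size of the output. A minimal sketch:
#include <stdio.h>

int main(void) {
    long long n = 100000;               /* MAX_ITEMS from the question */
    long long pairs = n * (n - 1) / 2;  /* C(n, 2) */
    printf("Pairs = %lld\n", pairs);    /* prints 4999950000 */
    return 0;
}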
Look at it this way: if "n" is taken to be the number of pairs produced rather than the size of the chosen array, the complexity of your approach is O(n), not O(n^2). Notice that you produce exactly one pair per iteration of the inner loop, regardless of the outer loop.
Given that, I don't think you can do significantly better. This is a lower bound - you can't produce two pairs per step! - so I would guess this is an optimal solution.
To continue -- if the output has size n^2 in terms of the input size, then your lower bound is always n^2, assuming you must touch every data point at least once. Which here you do.
Related
I have to print numbers between two limits n and m, t times.
I created a variable t, and two pointers n and m that point to reserved blocks of memory for t integer values.
I use pointers instead of arrays to do faster operations.
The outer for loop iterates over the test cases, advancing the m and n pointers.
The inner for loop prints the primes from m[i] to n[i].
Code
#include <stdio.h>
#include <stdlib.h>
int is_prime(int);
int main(void) {
int t;
int *n = malloc(sizeof(int) * t);
int *m = malloc(sizeof(int) * t);
scanf("%d", &t);
for (int i = 0; i < t; i++, m++, n++) {
scanf("%d %d", &m[i], &n[i]);
for (int j = m[i]; j <= n[i]; j++) {
if (is_prime(j)) {
printf("%d\n", j);
}
}
if (i < t - 1) printf("\n");
}
return 0;
}
int is_prime(int num)
{
if (num <= 1) return 0;
if (num % 2 == 0 && num > 2) return 0;
for(int i = 3; i < num / 2; i+= 2){
if (num % i == 0)
return 0;
}
return 1;
}
Problem: http://www.spoj.com/problems/PRIME1/
The code compiles correctly on http://ideone.com, but I get a "time limit exceeded" error when I try to submit it on SPOJ. How can I reduce the execution time of this prime number generator?
As #Carcigenicate suggests, you're exceeding the time limit because your prime generator is too slow; and it's too slow since you're using an inefficient algorithm.
Indeed, you should not simply test each consecutive number for primality (which, by the way, you're also doing inefficiently), but rather rule out multiple values at once using known primes (and perhaps additional primes which you compute). For example, you don't need to check multiples of 5 and 10 (other than the value 5 itself) for primality, since you know that 5 divides them. So just "mark" the multiples of the various primes as irrelevant.
... and of course, that's just to get you started; there are all sorts of tricks you could use for optimization - algorithmic and implementation-related.
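As a minimal sketch of that "mark the multiples" idea, here is a plain Sieve of Eratosthenes over a fixed bound; the bound LIMIT is my own assumption for illustration (the actual SPOJ limits are much larger than this, so in practice you would sieve only the queried segment):
#include <stdio.h>

#define LIMIT 1000000   /* assumed bound, for illustration only */

int main(void) {
    static char composite[LIMIT + 1] = {0};  /* 0 = unmarked, 1 = marked */
    for (long i = 2; i * i <= LIMIT; i++)
        if (!composite[i])                   /* i is prime */
            for (long j = i * i; j <= LIMIT; j += i)
                composite[j] = 1;            /* i divides j, so j is not prime */
    /* any query range [m, n] with 2 <= m and n <= LIMIT is now a lookup */
    for (long k = 10; k <= 30; k++)
        if (!composite[k])
            printf("%ld\n", k);
    return 0;
}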
I know that you are looking for algorithmic improvements, but the following technical optimizations might help:
If you are using Visual Studio, you can use alloca instead of malloc, so that n and m go on the stack instead of the heap.
You can also try to rewrite your algorithm using arrays instead of pointers to put n and m on the stack.
If you want to keep using pointers, use the __restrict keyword after the asterisks, which tells the compiler that the two pointers never alias each other. A small sketch of these suggestions follows this list.
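A minimal, hypothetical sketch of the stack-based suggestions above (C99 variable-length arrays standing in for alloca, plus __restrict); note that it reads t before sizing the buffers, which the original code does not:
#include <stdio.h>

int main(void) {
    int t;
    if (scanf("%d", &t) != 1 || t <= 0)
        return 1;                        /* read t before sizing the buffers */
    int m[t], n[t];                      /* C99 VLAs: both live on the stack */
    int * __restrict mp = m;             /* __restrict: mp and np never alias */
    int * __restrict np = n;
    for (int i = 0; i < t; i++)
        if (scanf("%d %d", &mp[i], &np[i]) != 2)
            return 1;
    printf("read %d ranges, e.g. [%d, %d]\n", t, mp[0], np[0]);
    return 0;
}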
You can even do it without using pointers or arrays
#include <stdio.h>
#include <math.h>
int is_prime(long n){
if (n < 2)
return 0;
if (n == 2)
return 1;
if (n % 2 == 0)
return 0;
for (long i = 3; i <= sqrt(n); i += 2) {
if(n % i == 0)
return 0;
}
return 1;
}
int main() {
int t;
scanf("%d",&t);
while(t--) {
long n, m;
scanf("%ld %ld",&n,&m);
for (long i = n; i <= m; i++) {
if (is_prime(i) == 1)
printf("%ld\n",i);
}
}
return 0;
}
There are several ways to improve the primality check for an integer n. Here are a few that you might find useful.
Reduce the number of checks: A well-known fact is that if you want to look for factors of n, say n = a * b, then it is enough to look for a divisor between 1 and sqrt(n). (The proof is quite easy; the main argument is that there are three cases: either a = b = sqrt(n), or a < sqrt(n) < b, or b < sqrt(n) < a. In every case there is a factor of n between 1 and sqrt(n).) A small sketch of this appears after this list.
Use a Sieve of Eratosthenes: This lets you discard candidates that have already been disqualified by earlier primes (see Sieve of Eratosthenes (Wikipedia)).
Use probabilistic algorithms: The most efficient way to check primality nowadays is to use a probabilistic test. It is a bit more complex to implement, but it is far more efficient. You can find a few of these techniques here (Wikipedia).
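As a minimal sketch of point 1 (not the asker's code), here is a trial-division check that stops at sqrt(num); it also uses the comparison i * i <= num so that sqrt() is not recomputed on every loop iteration:
#include <stdio.h>

/* returns 1 if num is prime, 0 otherwise */
int is_prime_sqrt(long num) {
    if (num < 2) return 0;
    if (num < 4) return 1;               /* 2 and 3 are prime */
    if (num % 2 == 0) return 0;
    for (long i = 3; i * i <= num; i += 2)
        if (num % i == 0)
            return 0;                    /* found a divisor <= sqrt(num) */
    return 1;
}

int main(void) {
    for (long k = 1; k <= 30; k++)
        if (is_prime_sqrt(k))
            printf("%ld\n", k);
    return 0;
}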
I have 2D array of size m*m with element values either 0s or 1s. Furthermore, each column of the array has a contiguous block of 1s (with 0 outside that block). The array itself is too large to be held in memory (as many as 10^6 rows), but for each column I can determine the lower bound, a, and the upper bound, b, of the 1s in that column. For a given n, I need to find out those n consecutive rows which have the maximum number of 1s. I can easily do it for smaller numbers by calculating the sum of each row one by one, and then choosing n consecutive rows whose sum comes out to be maximum, but for large numbers, it is consuming too much time. Is there any efficient way for calculating this? Perhaps using Dynamic Programming?
Here is an example code fragment showing my current approach, where successive calls to read_int() (not given here) provide the lower and upper bounds for successive columns:
long int harr[10000]={0}; //initialized to zero
for(int i=0;i<m;i++)
{
a=read_int();
b=read_int();
for(int j=a;j<=b;j++) // for finding sum of each row
harr[j]++;
}
answer=0;
for(int i=0;i<n;i++)
{
answer=answer+harr[i];
}
current=answer;
for(int i=n;i<m;i++)
{
current=current+harr[i]-harr[i-n];
if(current>answer)
{
answer=current;
}
}
For example, with m = 6 and n = 3, suppose the per-column bounds (a, b) are (0,3), (1,4), (2,3), (3,5), (1,2) and (2,4), so the per-row 1-counts are 1, 3, 5, 5, 3, 1.
Here the answer would be row 1 to row 3, with a total 1-count of 13 in those rows. (Row 2 to row 4 also maximizes the sum, as there is a tie.)
Here is a different approach. Think of each pair a, b as defining an interval of the form [a, b+1). The task is to find the n consecutive indices that maximize the sum of the parenthesis depths of the numbers in that interval. Every new a bumps the parenthesis depth at a up by 1. Every new b causes the parenthesis depth after b to go down by 1. The first pass just loads these parenthesis-depth deltas; a second pass then recovers the parenthesis depths from the deltas. The following code illustrates this approach. I reduced m to 6 for testing purposes and replaced calls to the unknown read_int() by accesses to hard-wired arrays (which correspond to the example in the question):
#include <stdio.h>
int main(void){
int a,b,answer,current,lower,upper;
int n = 3;
int lower_bound[6] = {0,1,2,3,1,2};
int upper_bound[6] = {3,4,3,5,2,4};
int m = 6;
int harr[6]={0};
//load parenthesis depth-deltas (all initially 0)
for(int i=0;i<m;i++)
{
a = lower_bound[i];
b = upper_bound[i];
harr[a]++;
if(b < m-1)harr[b+1]--;
}
//determine p-depth at each point
for(int i = 1; i < m; i++){
harr[i] += harr[i-1];
}
//find optimal n-rows by sliding-window
answer = 0;
for(int i=0;i<n;i++)
{
answer = answer+harr[i];
}
current =answer;
lower = 0;
upper = n-1;
for(int i=n;i<m;i++)
{
current = current+harr[i]-harr[i-n];
if(current>answer)
{
answer = current;
lower = i-n+1;
upper = i;
}
}
printf("Max %d rows are %d to %d with a total sum of %d ones\n", n,lower,upper,answer);
return 0;
}
(Obviously, the loop which loads harr can be combined with the loop which computes answer. I kept it as two passes to better illustrate the logic of how the final harr values can be obtained from the parentheses deltas).
When this code is compiled and run its output is:
Max 3 rows are 1 to 3 with a total sum of 13 ones
I'm not sure how the following will scale for your 10^6 rows, but it manages the trailing sum of x consecutive rows in a single pass without function-call overhead. It may be worth a try. Also ensure you are compiling with full optimizations so the compiler can add its two cents as well.
My original thought was to find some way to read x * n integers (x rows from your matrix) and in some fashion look at the population of set bits over that number of bytes, checking the endianness and taking either the first or last byte of each integer to see whether a bit was set. However, that logic seemed as costly as simply carrying the sum of the trailing x rows and stepping through the array while attempting to optimize the logic.
I don't have any benchmarks from your data to compare against, but perhaps this will give you another idea or two:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>     /* INT_MIN */
int main (int argc, char **argv) {
/* number of consecutive rows to sum */
size_t ncr = argc > 1 ? (size_t)atoi (argv[1]) : 3;
/* static array to test summing and row id logic, not
intended to simulate the 0's or 1's */
int a[][5] = {{1,2,3,4,5},
{2,3,4,5,6},
{3,4,5,6,7},
{4,5,6,7,8},
{3,4,5,6,7},
{0,1,2,3,4},
{1,2,3,4,5}};
int sum[ncr]; /* array holding sum on ncr rows */
int sumn = 0; /* sum of array values */
int max = INT_MIN; /* variable holding maximum sum */
size_t m, n, i, j, k, row = 0, sidx;
m = sizeof a / sizeof *a; /* matrix m x n dimensions */
n = sizeof *a / sizeof **a;
for (k = 0; k < ncr; k++) /* initialize vla values */
sum[k] = 0;
for (i = 0; i < m; i++) /* for each row */
{
sidx = i % ncr; /* index for sum array */
if (i > ncr - 1) { /* sum for ncr prior rows */
for (k = 0; k < ncr; k++)
sumn += sum[k];
/* note 'row' index assignment below is 1 greater
than actual but simplifies output loop indexes */
max = sumn > max ? row = i, sumn : max;
sum[sidx] = sumn = 0; /* zero index to be replaced and sumn */
}
for (j = 0; j < n; j++) /* compute sum for current row */
sum [sidx] += a[i][j];
}
/* output results */
printf ("\n The maximum sum for %zu consecutive rows: %d\n\n", ncr, max);
for (i = row - ncr; i < row; i++) {
printf (" row[%zu] : ", i);
for (j = 0; j < n; j++)
printf (" %d", a[i][j]);
printf ("\n");
}
return 0;
}
Example Output
$./bin/arraymaxn
The maximum sum for 3 consecutive rows: 80
row[2] : 3 4 5 6 7
row[3] : 4 5 6 7 8
row[4] : 3 4 5 6 7
$./bin/arraymaxn 4
The maximum sum for 4 consecutive rows: 100
row[1] : 2 3 4 5 6
row[2] : 3 4 5 6 7
row[3] : 4 5 6 7 8
row[4] : 3 4 5 6 7
$ ./bin/arraymaxn 2
The maximum sum for 2 consecutive rows: 55
row[2] : 3 4 5 6 7
row[3] : 4 5 6 7 8
Note: if there are multiple equivalent maximum ranges of consecutive rows (i.e. two sets of rows where the 1's add up to the same number), the first occurrence of the maximum is selected.
I'm not sure what optimizations you are compiling with, but regardless of which code you use, you can always give the compiler simple hints to inline all functions (if you have functions in your code) and fully optimize the code. Two helpful flags are:
gcc -finline-functions -Ofast
Well, there are lots of such questions available on SO as well as on other forums; however, none of them helped.
I wrote a program in C to find the number of primes within a range. The range is a long int. I am using the "Sieve of Eratosthenes" algorithm, with an array of long ints to store all the numbers from 1 up to the limit. I could not think of a better approach that avoids the array. The code works fine up to 10000000, but after that it runs out of memory and exits. Below is my code.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
typedef unsigned long uint_32;
int main() {
uint_32 i, N, *list, cross=0, j=4, k, primes_cnt = 0;
clock_t start, end;
double exec_time;
system("cls");
printf("Enter N\n");
scanf("%lu", &N);
list = (uint_32 *) malloc( (N+1) * sizeof(uint_32));
start = clock();
for(i=0; i<=N+1; i++) {
list[i] = i;
}
for(i=0; cross<=N/2; i++) {
if(i == 0)
cross = 2;
else if(i == 1)
cross = 3;
else {
for(j=cross+1; j<=N; j++) {
if(list[j] != 0){
cross = list[j];
break;
}
}
}
for(k=cross*2; k<=N; k+=cross) {
if(k <= N)
list[k] = 0;
}
}
for(i=2; i<=N; i++) {
if(list[i] == 0)
continue;
else
primes_cnt++;
}
printf("%lu", primes_cnt);
end = clock();
exec_time = (double) (end-start);
printf("\n%f", exec_time);
return 0;
}
I am stuck and can't think of a better way to achieve this. Any help will be hugely appreciated. Thanks.
Edit:
My aim is to generate and print all prime numbers below the range. As printing consumed a lot of time, I thought of getting the first step right.
There are other algorithms that do not require you to generate all the primes up to N in order to count the number of primes below N. The easiest to implement is Legendre's prime counting method, which only requires generating the primes up to sqrt(N).
The idea behind the algorithm is that
pi(n) = phi(n, sqrt(n)) + pi(sqrt(n)) - 1
where
pi(n) = the number of primes not exceeding n
phi(n, m) = the number of integers not exceeding n that are not divisible by any prime not exceeding m.
This means phi(n, sqrt(n)) counts 1 together with the primes between sqrt(n) and n, which is why the -1 appears in the formula. For how to calculate phi, you can go to the following link (Feasible implementation of a Prime Counting Function).
The reason this is more efficient is that it is easier to compute phi(n, m) than to compute pi(n). Say I want to compute phi(100, 3), i.e. how many numbers less than or equal to 100 are divisible by neither 2 nor 3. By inclusion-exclusion (with integer division), phi(100, 3) = 100 - 100/2 - 100/3 + 100/6.
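A minimal sketch of Legendre's recurrence for phi (my own illustration, not from the linked answer); the small primes array, the example bound and the plain exponential recursion without memoization are all assumptions made for clarity:
#include <stdio.h>

/* phi(x, a): how many k <= x are divisible by none of the first a primes.
   Recurrence: phi(x, a) = phi(x, a-1) - phi(x / p_a, a-1), with phi(x, 0) = x. */
long long phi(long long x, int a, const int *primes) {
    if (a == 0)
        return x;
    return phi(x, a - 1, primes) - phi(x / primes[a - 1], a - 1, primes);
}

int main(void) {
    int primes[] = {2, 3, 5, 7};   /* the primes up to sqrt(100) */
    int a = 4;                     /* a = pi(sqrt(100)) */
    long long n = 100;
    /* pi(n) = phi(n, a) + a - 1; this prints pi(100) = 25 */
    printf("pi(%lld) = %lld\n", n, phi(n, a, primes) + a - 1);
    return 0;
}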
Your code uses about 32 times as much memory as it needs. Note that since you initialized list[i] = i the assignment cross = list[j] can be replaced with cross = j, making it possible to replace list with a bit vector.
However, this is not enough to bring the range to 2^64, because your implementation would then require 2^61 bytes (2 exbibytes) of memory, so you need to optimize some more.
The next thing to notice is that you do not need to go up to N/2 when "crossing" the numbers: √N is sufficient (you should be able to prove this by thinking about the result of dividing a composite number by its divisors above √N). This brings memory requirements within your reach, because your "crossing" primes would fit in about 4 GB of memory.
Once you have an array of crossing primes, you can build a partial sieve for any range without keeping in memory all ranges that precede it. This is called the Segmented sieve. You can find details on it, along with a simple implementation, on the page of primesieve generator. Another advantage of this approach is that you can parallelize it, bringing the time down even further.
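A minimal segmented-sieve sketch (my own illustration, not primesieve's code): the "crossing" primes up to sqrt(N) are found with a small ordinary sieve, and each block of the full range is then crossed using only those primes. The target N and the segment size SEG are arbitrary assumptions here:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

int main(void) {
    unsigned long N = 100000000UL;            /* assumed target range */
    unsigned long root = (unsigned long)sqrt((double)N) + 1;
    unsigned long i, j, lo, hi, count = 0;
    char *small_comp = calloc(root + 1, 1);   /* 0 = prime, 1 = composite, over [2, root] */

    /* ordinary sieve for the crossing primes up to sqrt(N) */
    for (i = 2; i * i <= root; i++)
        if (!small_comp[i])
            for (j = i * i; j <= root; j += i)
                small_comp[j] = 1;

    const unsigned long SEG = 1UL << 20;      /* assumed segment size */
    char *seg = malloc(SEG);

    for (lo = 2; lo <= N; lo += SEG) {
        hi = lo + SEG - 1;
        if (hi > N) hi = N;
        memset(seg, 0, hi - lo + 1);
        for (i = 2; i <= root; i++) {
            if (small_comp[i]) continue;      /* cross only with primes */
            unsigned long start = (lo + i - 1) / i * i;   /* first multiple of i >= lo */
            if (start < i * i) start = i * i; /* never cross out i itself */
            for (j = start; j <= hi; j += i)
                seg[j - lo] = 1;
        }
        for (j = lo; j <= hi; j++)
            if (!seg[j - lo])
                count++;                      /* j is prime */
    }
    printf("%lu primes up to %lu\n", count, N);
    free(small_comp);
    free(seg);
    return 0;
}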
You can tweak the algorithm a bit to calculate the prime numbers in chunks.
Load a part of the array (as much as fits in memory), and in addition keep a list of all the prime numbers found so far.
Whenever you load a chunk, first go through the already known primes and, just as in the regular sieve, mark all of their multiples in the chunk as non-prime.
Then go over the chunk again, mark whatever else you can, and add the newly found primes to the list.
When done, you'll have a list containing all your prime numbers.
I can see that the approach you are using is the basic implementation of Eratosthenes, which first crosses out all the multiples of 2, then all the multiples of 3, and so on.
But I have a better solution to the question. Actually, there is a question on SPOJ, PRINT. Please go through it and check the constraints it imposes. Below is my code snippet for that problem:
#include<stdio.h>
#include<math.h>
#include<cstdlib>
int num[46500] = {0},prime[5000],prime_index = -1;
int main() {
/* First, calculate the prime up-to the sqrt(N) (preferably greater than, but near to
sqrt(N) */
prime[++prime_index] = 2; int i,j,k;
for(i=3; i<216; i += 2) {
if(num[i] == 0) {
prime[++prime_index] = i;
for(j = i*i, k = 2*i; j<=46500; j += k) {
num[j] = 1;
}
}
}
for(; i<=46500; i+= 2) {
if(num[i] == 0) {
prime[++prime_index] = i;
}
}
int t; // Stands for number of test cases
scanf("%i",&t);
while(t--) {
bool arr[1000005] = {0}; int m,n,j,k;
scanf("%i%i",&m,&n);
if(m == 1)
m++;
if(m == 2 && m <= n) {
printf("2\n");
}
int sqt = sqrt(n) + 1;
for(i=0; i<=prime_index; i++) {
if(prime[i] > sqt) {
sqt = i;
break;
}
}
for(; m<=n && m <= prime[prime_index]; m++) {
if(m&1 && num[m] == 0) {
printf("%i\n",m);
}
}
if(m%2 == 0) {
m++;
}
for(i=1; i<=sqt; i++) {
j = (m%prime[i]) ? (m + prime[i] - m%prime[i]) : (m);
for(k=j; k<=n; k += prime[i]) {
arr[k-m] = 1;
}
}
for(i=0; i<=n-m; i += 2) {
if(!arr[i]) {
printf("%i\n",m+i);
}
}
printf("\n");
}
return 0;
}
I hope you got the point.
And, as you mentioned that your program works fine up to 10^7 but fails above that, it must be because you are running out of memory.
NOTE: I'm sharing my code only for knowledge purposes. Please don't copy and paste it until you understand it.
The objective of this problem is to find the first 2,000,000 primes and to be able to tell what the 2,000,000th prime is.
We start from this code:
#include <stdlib.h>
#include <stdio.h>
#define N 2000000
int p[N];
main(int na,char* arg[])
{
int i;
int pp,num;
printf("Number of primes to find: %d\n",N);
p[0] = 2;
p[1] = 3;
pp = 2;
num = 5;
while (pp < N)
{
for (i=1; p[i]*p[i] <= num ;i++)
if (num % p[i] == 0) break;
if (p[i]*p[i] > num) p[pp++]=num;
num += 2;
}
printf("The %d prime is: %d\n",N,p[N-1]);
exit(0);
}
Now we are asked to make this process multithreaded via OpenMP pragmas. This is what I've done so far:
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>    /* for omp_get_thread_num() */
#define N 2000000
#define D 1415
int p[N];
main(int na,char* arg[])
{
int i,j;
int pp,num;
printf("Number of primes to find: %d\n",N);
p[0] = 2;
p[1] = 3;
pp = 2;
num = 5;
while (pp < D)
{
for (i=1; p[i]*p[i] <= num ;i++)
if (num % p[i] == 0) break;
if (p[i]*p[i] > num) p[pp++]=num;
num += 2;
}
int success = 0;
int t_num;
int temp_num = num;
int total = pp;
#pragma omp parallel num_threads(4) private(j, t_num, num, success)
{
t_num = omp_get_thread_num();
num = temp_num + t_num*2;
#pragma omp for ordered schedule(static,4)
for(pp=D; pp<N; pp++) {
success = 0;
while(success==0) {
for (i=1; p[i]*p[i] <= num;i++) {
if (num % p[i] == 0) break;
}
if (p[i]*p[i] > num) {
p[pp] = num;
success=1;
}
num+=8;
}
}
}
//sort(p, 0, N);
printf("El %d primer es: %d\n",N,p[N-1]);
exit(0);
}
Now let me explain my "partial" solution, and therefore, my problem.
The first D primes are obtained with sequential code, so now I can check the divisibility of a large amount of numbers.
Each thread runs a diagonal of prime candidates, so there are no dependencies between threads and no need for synchronization. However, the problems with this approach are the following:
One thread may generate more primes than another thread.
As a direct consequence of problem 1, the program will generate N primes but they won't be ordered, so when the prime counter 'pp' reaches 'N', the last prime stored is not the 2,000,000th prime but some later prime.
It may also be that by the time 2,000,000 primes have been generated, the thread that would reach the real 2,000,000th prime has not yet had time to even put it in the prime array 'p'.
And the question/dilemma is:
How can I know when the 2,000,000th prime has been generated?
Hints:
I was told that I should do batches of (let's say) 10,000 prime candidates. Then, when some condition I don't know happens, I would know that the last batch of 10,000 candidates contains the 2,000,000th prime, and I could just sort that batch with quicksort.
I hope I made myself clear; this is a really tough exercise and I have been trying non-stop for several days.
If all you need is 2,000,000 primes, you can maintain one ~4.1 MB bit array and flip bits in it for each prime found. No sort is needed. You can halve the bit array's size by implementing an odds-only representation scheme.
Use a Sieve of Eratosthenes, in segments, with sizes proportional to sqrt(top_value_of_range) (or something similar - the goal is to have approximately the same amount of work performed on each segment). For n = 2000000, n*(log n + log(log n)) == 34366806, and prime[771]^2 == 34421689 (0-based), so precalculate the first 771 odd primes.
Each worker can count, too, as it flips the bits, so you will know the count for each range when they are all finished, and in the end you will only need to scan through the one range that contains the 2,000,000th prime to find it. Or have each worker maintain its own bit array according to its range - then you only have to keep one and can discard the others.
The pseudocode for counting Sieve of Eratosthenes is:
Input: an integer n > 1
Let A be an array of bool values, indexed by integers 3, 5, ... up to n,
initially all set to true.
count := floor( (n-1)/2 )
for i = 3, 5, 7, ..., while i^2 ≤ n:
if A[i] is true:
for j = i^2, i^2 + 2i, i^2 + 4i, ..., while j ≤ n:
if A[j] is true:
A[j] := false
count := count - 1
Now all 'i's such that A[i] is true are prime,
and 'count' is the total count of odd primes found.
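For illustration, here is a direct C transcription of that pseudocode (my own sketch; it indexes the array by the number itself, wasting the even slots instead of using the halved odds-only packing mentioned above, and the bound n is an arbitrary example):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    long n = 2000000;                    /* assumed example bound */
    char *A = malloc(n + 1);             /* A[i] is consulted only for odd i >= 3 */
    long i, j, count = (n - 1) / 2;      /* number of odd values in [3, n] */
    memset(A, 1, n + 1);
    for (i = 3; i * i <= n; i += 2) {
        if (A[i]) {                      /* i is an odd prime */
            for (j = i * i; j <= n; j += 2 * i) {
                if (A[j]) {
                    A[j] = 0;            /* j is composite: uncount it once */
                    count--;
                }
            }
        }
    }
    printf("odd primes up to %ld: %ld (including 2: %ld)\n", n, count, count + 1);
    free(A);
    return 0;
}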
I can think of two approaches.
Once you've got a candidate for the 2 millionth prime, your threads continue calculating primes that are lower than your candidate until you have no primes missing. Then you can sort the list of primes and take the 2 millionth from that.
If your threads are producing blocks of sequential prime numbers, they should maintain the blocks separately and then the blocks of prime numbers can be subsequently reassembled into a master list. The thread that does the reassembly can terminate the program once it's found the 2 millionth prime.