I'm working on speeding up Conway's Game of Life. Right now, the code looks at a cell and then adds up the 3x3 area immediately surrounding the point, then subtracts the value at the point we're looking at. Here's the function that is doing that:
static int neighbors2 (board b, int i, int j)
{
int n = 0;
int i_left = max(0,i-1);
int i_right = min(HEIGHT, i+2);
int j_left = max(0,j-1);
int j_right = min(WIDTH, j+2);
int ii, jj;
for (jj = j_left; jj < j_right; ++jj) {
for (ii = i_left; ii < i_right; ii++) {
n += b[ii][jj];
}
}
return n - b[i][j];
}
And here is the code I've been trying to use to iterate through pieces at a time:
//Iterates through the first row of the 3x3 area
static int first_row(board b, int i, int j) {
int f = 0;
int i_left = max(0,i-1);
int j_left = max(0,j-1);
int j_right = min(WIDTH, j+2);
int jj;
for (jj = j_left; jj < j_right; ++jj) {
f += b[i_left][jj];
}
return f;
}
//Iterates and adds up the second row of the 3x3 area
static int second_row(board b, int i, int j) {
int g = 0;
int i_right = min(HEIGHT, i+2);
int j_left = max(0,j-1);
int j_right = min(WIDTH, j+2);
int jj;
if (i_right != i) {
for (jj = j_left; jj < j_right; ++jj) {
g += b[i][jj];
}
}
return g;
}
//iterates and adds up the third row of the 3x3 area.
static int third_row(board b, int i, int j) {
int h = 0;
int i_right = min(HEIGHT, i+2);
int j_left = max(0,j-1);
int j_right = min(WIDTH, j+2);
int jj;
for (jj = j_left; jj < j_right; ++jj) {
h += b[i_right][jj];
}
return h;
}
//adds up the surrounding spots
//subtracts the spot we're looking at.
static int addUp(board b, int i, int j) {
int n = first_row(b, i, j) + second_row(b, i, j) + third_row(b, i, j);
return n - b[i][j];
}
But, for some reason it isn't working. I have no idea why.
Things to note:
sometimes i == i_right, so we do not want to add up a row twice.
The three functions are supposed to do the exact same thing as neighbors2 in separate pieces.
min and max are functions that were premade for me.
sometimes j == j_right, so we do not want to add up something twice. I'm pretty confident the loop takes care of this, however.
Tips and things to consider are appreciated.
Thanks all. I've been working on this for a couple hours now and have no idea what is going wrong. It seems like it should work but I keep getting incorrect solutions at random spots among the board.
In neighbors2, you set i_left and i_right so that they're limited to the rows of the grid. If the current cell is in the top or bottom row, you only loop through two rows instead of three. Note that i_right is an exclusive bound there: the loop stops before it.
In first_row() you also clamp the row index to the grid. But the result is that on the top row it adds up the cells on the same row as the current cell, which is what second_row() does, so that row gets counted twice. third_row() has a related problem: it indexes b[i_right], but i_right is min(HEIGHT, i+2), so it reads row i+2 (or row HEIGHT, which is out of bounds) instead of row i+1.
You shouldn't call first_row() when i == 0, and you shouldn't call third_row() when i == HEIGHT - 1. Inside third_row(), read b[i+1][jj].
static int addUp(board b, int i, int j) {
    int n = (i == 0 ? 0 : first_row(b, i, j)) +
            second_row(b, i, j) +
            (i == HEIGHT - 1 ? 0 : third_row(b, i, j));
    return n - b[i][j];
}
Another option would be to do the check in the functions themselves:
static int first_row(board b, int i, int j) {
    if (i == 0) {
        return 0;   /* there is no row above the top row */
    }
    int f = 0;
    int j_left = max(0,j-1);
    int j_right = min(WIDTH, j+2);
    int jj;
    for (jj = j_left; jj < j_right; ++jj) {
        f += b[i-1][jj];   /* the row above the current cell */
    }
    return f;
}
and similarly for third_row(). But doing it in the caller saves the overhead of the function calls.
BTW, your variable names are very confusing. All the i variables are for rows, which go from top to bottom, not left to right.
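To make the row-by-row idea concrete, here is a minimal sketch that uses one helper for all three rows and treats a row outside the grid as contributing nothing. The board typedef, the HEIGHT and WIDTH values, and the names imax, imin, row_sum and add_up are assumptions made for illustration; they are not taken from the original program.

```c
#include <assert.h>

#define HEIGHT 4   /* assumed board size, for illustration only */
#define WIDTH  5
typedef int board[HEIGHT][WIDTH];   /* assumed definition of board */

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* Sum columns j-1..j+1 (clamped to the grid) of one row.
   A row outside the grid contributes nothing. */
static int row_sum(board b, int row, int j)
{
    if (row < 0 || row >= HEIGHT)
        return 0;
    int sum = 0;
    for (int jj = imax(0, j - 1); jj < imin(WIDTH, j + 2); ++jj)
        sum += b[row][jj];
    return sum;
}

/* Neighbour count built from three row sums, minus the cell itself. */
static int add_up(board b, int i, int j)
{
    assert(i >= 0 && i < HEIGHT && j >= 0 && j < WIDTH);
    return row_sum(b, i - 1, j) + row_sum(b, i, j) + row_sum(b, i + 1, j)
           - b[i][j];
}
```

Because the bounds check lives in row_sum, the caller never needs a special case for the top or bottom row, at the cost of one extra comparison per row.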
#include <stdio.h>
#include <stdlib.h>
#define ROWSDISP 50
#define COLSDISP 100
int rows=ROWSDISP+2, cols=COLSDISP+2;
This is to avoid illegal indexes when stepping over the neighbours.
struct onecell {char alive;
char neibs;} **cells;
This is the foundation of a (dynamic) 2D-array, of a small struct.
To create space for each row plus the space to hold an array of row pointers:
void init_cells()
{
int i;
cells = calloc(rows, sizeof(*cells));
for(i=0; i<=rows-1; i++)
cells[i] = calloc(cols, sizeof(**cells));
}
I skip the rand_fill() and glider() funcs. A cell can be set by
cells[y][x].alive=1.
int main(void) {
struct onecell *c, *n1, *rlow;
int i, j, loops=0;
char nbs;
init_cells();
rand_fill();
glider();
while (loops++ < 1000) {
printf("\n%d\n", loops);
for (i = 1; i <= rows-2; i++) {
for (j = 1; j <= cols-2; j++) {
c = &cells[ i ][ j ];
n1 = &cells[ i ][j+1];
rlow = cells[i+1];
nbs = c->neibs + n1->alive + rlow[ j ].alive
+ rlow[j+1].alive
+ rlow[j-1].alive;
if(c->alive) {
printf("#");
n1->neibs++;
rlow[ j ].neibs++;
rlow[j+1].neibs++;
rlow[j-1].neibs++;
if(nbs < 2 || nbs > 3)
c->alive = 0;
} else {
printf(" ");
if(nbs == 3)
c->alive = 1;
}
c->neibs = 0; // reset for next cycle
}
printf("\n");
}
}
return(0);
}
There is no iterating a 3x3 square here. Of the 8 neighbours,
only the 4 downstream ones are checked; but at the same time
their counters are raised.
A benchmark with 100x100 grid:
# time ./a.out >/dev/null
real 0m0.084s
user 0m0.084s
sys 0m0.000s
# bc <<<100*100*1000/.084
119047619
And each of these 100M cells needs to check 8 neighbours, so this is close to the CPU frequency (1 neighbour check per cycle).
It seems twice as fast as the rosetta code solution.
There is also no need to switch boards, thanks to the investment in the second field of each cell.
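The same one-pass idea can be sketched on a small fixed grid, without the printing, to make it easier to check by hand. The grid size and the names struct cell and life_step are assumptions for this illustration; the border clean-up at the end is an addition that the program above does not have.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 7   /* 5 visible rows plus one dead border row on each side */
#define COLS 7   /* same for columns */

struct cell { char alive; char neibs; };

/* Advance the grid one generation in place.
   Precondition: every neibs field is 0 and the border cells are dead. */
static void life_step(struct cell g[ROWS][COLS])
{
    assert(g != NULL);
    for (int i = 1; i <= ROWS - 2; i++) {
        for (int j = 1; j <= COLS - 2; j++) {
            struct cell *c = &g[i][j];
            /* Cells visited earlier already added themselves to c->neibs;
               the four cells still to come hold their old state. */
            int nbs = c->neibs + g[i][j + 1].alive + g[i + 1][j - 1].alive
                    + g[i + 1][j].alive + g[i + 1][j + 1].alive;
            if (c->alive) {
                /* Report this (old) live cell to the cells still to come. */
                g[i][j + 1].neibs++;
                g[i + 1][j - 1].neibs++;
                g[i + 1][j].neibs++;
                g[i + 1][j + 1].neibs++;
                if (nbs < 2 || nbs > 3)
                    c->alive = 0;
            } else if (nbs == 3) {
                c->alive = 1;
            }
            c->neibs = 0;   /* reset for the next generation */
        }
    }
    /* Counts that spilled onto the border are never read; clear them so
       they cannot overflow over many generations. */
    for (int i = 0; i < ROWS; i++)
        g[i][0].neibs = g[i][COLS - 1].neibs = 0;
    for (int j = 0; j < COLS; j++)
        g[0][j].neibs = g[ROWS - 1][j].neibs = 0;
}
```

A horizontal blinker placed in the middle row turns vertical after one call and horizontal again after a second call, which is a quick way to check the update order.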
I'm building the public API for Game of Life and I don't know how to code this part.
This is a picture of the "flow chart" we're supposed to follow:
Steps:
These are the public APIs we're supposed to fill out:
void gofl_get_world(int n_rows, int n_cols, cell_t world[][n_cols], double percent_alive){}
This first function should get the initial randomly ordered distribution of cells in the world. This is the usage: gofl_get_world(MAX_ROWS, MAX_COLS, world, 0.1)
void gofl_next_state(int n_rows, int n_cols, cell_t world[][n_cols]){}
This function calculates the next state of the world according to the rules and the current state, i.e. it marks all cells in the world as alive or dead.
Here are the functions we're supposed to build the public API from. I wrote these myself, so they are not predefined (they are tested and all tests returned true):
static void get_cells(cell_t arr[], int size, double percent_alive) {
    int n = (int) round(percent_alive*size); // number of cells that start alive
    for (int i = 0; i < size; i++){
        if (i < n){ // the first n cells are alive
            arr[i] = 1; // alive
        }
        else{
            arr[i] = 0; // dead
        }
    }
}
static int get_living_neighbours(int n_rows, int n_cols, const cell_t world[][n_cols], int row, int col) {
    int sum = 0;
    for (int r = row - 1; r <= row + 1; r++){
        for (int c = col - 1; c <= col + 1; c++){
            if (!(row == r && col == c) && is_valid_location(n_rows, n_cols, r, c)){
                sum = sum + world[r][c];
            }
        }
    }
    return sum;
}
static void array_to_matrix(int n_rows, int n_cols, cell_t matrix[][n_cols], const cell_t arr[], int size) {
    for (int i = 0; i < size; i++){
        matrix[i/n_cols][i%n_cols] = arr[i]; // row index is i/n_cols (i/n_rows only works for square worlds)
    }
}
static void shuffle_cells(cell_t arr[], int size) {
    for (int i = size; i > 1; i--){
        int j = rand()%i;
        int tmp = arr[j];
        arr[j] = arr[i-1];
        arr[i-1] = tmp;
    }
}
Anyone know how I can solve this? I don't know how to perform this action, thanks!
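One possible way to compose the two public functions from the helpers is sketched below. The cell_t typedef and is_valid_location() are not shown in the post, so the versions here are guesses; const is dropped from the world parameter to keep the calls simple, rounding is done without math.h, and assert() stands in for real allocation-failure handling.

```c
#include <assert.h>
#include <stdlib.h>

typedef int cell_t;   /* assumption: 1 = alive, 0 = dead */

/* Guessed helper: the post uses it but does not show it. */
static int is_valid_location(int n_rows, int n_cols, int r, int c)
{
    return r >= 0 && r < n_rows && c >= 0 && c < n_cols;
}

static int get_living_neighbours(int n_rows, int n_cols,
                                 cell_t world[][n_cols], int row, int col)
{
    int sum = 0;
    for (int r = row - 1; r <= row + 1; r++)
        for (int c = col - 1; c <= col + 1; c++)
            if (!(r == row && c == col) && is_valid_location(n_rows, n_cols, r, c))
                sum += world[r][c];
    return sum;
}

static void get_cells(cell_t arr[], int size, double percent_alive)
{
    int n = (int)(percent_alive * size + 0.5);   /* rounded count of live cells */
    for (int i = 0; i < size; i++)
        arr[i] = i < n;
}

static void shuffle_cells(cell_t arr[], int size)
{
    for (int i = size; i > 1; i--) {
        int j = rand() % i;
        cell_t tmp = arr[j];
        arr[j] = arr[i - 1];
        arr[i - 1] = tmp;
    }
}

static void array_to_matrix(int n_rows, int n_cols, cell_t matrix[][n_cols],
                            const cell_t arr[], int size)
{
    (void)n_rows;
    for (int i = 0; i < size; i++)
        matrix[i / n_cols][i % n_cols] = arr[i];
}

/* Fill a flat array, shuffle it, then copy it into the world. */
void gofl_get_world(int n_rows, int n_cols, cell_t world[][n_cols],
                    double percent_alive)
{
    int size = n_rows * n_cols;
    cell_t *arr = malloc(size * sizeof *arr);
    assert(arr != NULL);   /* sketch only: no real error handling */
    get_cells(arr, size, percent_alive);
    shuffle_cells(arr, size);
    array_to_matrix(n_rows, n_cols, world, arr, size);
    free(arr);
}

/* Compute every next state into a scratch copy, then copy it back,
   so that all neighbour counts see the old generation. */
void gofl_next_state(int n_rows, int n_cols, cell_t world[][n_cols])
{
    cell_t (*next)[n_cols] = malloc(n_rows * sizeof *next);
    assert(next != NULL);   /* sketch only: no real error handling */
    for (int r = 0; r < n_rows; r++)
        for (int c = 0; c < n_cols; c++) {
            int n = get_living_neighbours(n_rows, n_cols, world, r, c);
            next[r][c] = world[r][c] ? (n == 2 || n == 3) : (n == 3);
        }
    for (int r = 0; r < n_rows; r++)
        for (int c = 0; c < n_cols; c++)
            world[r][c] = next[r][c];
    free(next);
}
```

The scratch copy matters: updating world in place while still counting neighbours would mix old and new states within a single generation.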
I'm trying to dynamically allocate memory for a 2D array using a single pointer. For that, I have three functions: newarray() allocates the memory, store() stores individual elements in it, and fetch() fetches elements from it. I don't know why I get execution errors when I test it. Also, I should allocate exactly the amount of memory needed; that might be the problem, but I'm not sure how to do that. This program deals with a triangular matrix, where the column number must not exceed the row number when adding elements. For example, in a 5x5 array, (4,2) and (4,4) are OK but (4,5) is NOT.
Here is the code
typedef int* triangular;
triangular newarray(int N){
triangular mat = NULL; //pointer to integer
//Allocate memory for row
mat = (int *)malloc(N * N * sizeof(int));
//Check memory validity
if(mat == NULL)
{
return 1;
}
return mat;
}
int store(triangular as, int N, int row, int col, int val){
if(row >= col){
as[row * N + col] = val;
return 1;
}else if(row < col){
return -1;
}else if((row > N) ||(col > N) || (row + col > N + N))
return -1;
}
int fetch(triangular as, int N, int row, int col){
int value;
value = as[row * N + col];
if((row > N) ||(col > N) || (row + col > N + N) )
return -1;
else if(row < col)
return -1;
return value;
}
int main()
{
int iRow = 0; //Variable for looping Row
int iCol = 0; //Variable for looping column
int N;
triangular mat = newarray(5);
printf("\nEnter the number of rows and columns = ");
scanf("%d",&N); //Get input for number of Row
store(mat,N,3,2,10);
store(mat,N,3,3,10);
store(mat,N,4,2,111);
store(mat,N,3,5,11);
printf("the element at [3,5] is : %i", fetch(mat,N,3,5));
//Print the content of 2D array
for (iRow =0 ; iRow < N ; iRow++)
{
for (iCol =0 ; iCol < N ; iCol++)
{
printf("\nmat[%d][%d] = %d\n",iRow, iCol,mat[iRow * N + iCol]);
}
}
//free the allocated memory
free(mat);
return 0;
}
int store(triangular as, int N, int row, int col, int val){
if(row >= col){
as[row * N + col] = val;
return 1;
}else if(row < col){
return -1;
}else if((row > N) ||(col > N) || (row + col > N + N))
return -1;
}
In the store function, the first if condition is odd: why don't you set the value in the array when the arguments passed to the function are, say, row 2 and column 3? Also, because that first if/else if pair covers every case, the bounds check in the last branch is never reached.
I changed your store in the following way. The largest valid index and the array size are different things: the largest valid index is N - 1. Your code has a lot of if checks; I think checking only row and col is enough to know that they are inside the boundaries.
int store(triangular as, int N, int row, int col, int val){
    int last = N - 1;    /* largest valid row/column index */
    if((row < 0) || (col < 0) || (row > last) || (col > last))
        return -1;
    as[row * N + col] = val;    /* the row stride is still N */
    return 1;
}
I changed your fetch function in the same way, for the reason I mentioned above.
int fetch(triangular as, int N, int row, int col){
    int value;
    int last = N - 1;    /* largest valid row/column index */
    if((row < 0) || (col < 0) || (row > last) || (col > last))
        return -1;
    value = as[row * N + col];
    return value;
}
You are making this needlessly complicated. All those functions and manual run-time calculations aren't really necessary.
Also, you have the following problems:
Don't hide pointers behind typedef, it just makes the code unreadable for no gain.
Initialize the data returned from malloc or instead use calloc which sets everything to zero, unlike malloc.
Arrays in C are zero-indexed so you can't access item [3][5] in an array of size 5x5. This is a common beginner problem since int array[5][5]; declares such an array but array[5][5] for index accessing goes out of bounds. The syntax for declaration and access isn't the same, access needs to start at 0.
You didn't include any headers, I'm assuming you left that part out.
Here's a simplified version with bug fixes that you can use:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int N=5;
int (*mat)[N] = calloc( 1, sizeof(int[N][N]) ); // allocate 2D array dynamically
if(mat == NULL)
return 0;
mat[3][2] = 10;
mat[3][3] = 10;
mat[4][2] = 111;
mat[3][4] = 11;
for(int i=0; i<N; i++)
{
for(int j=0; j<N; j++)
{
printf("[%d][%d] = %d\n", i, j, mat[i][j]);
}
}
free(mat);
return 0;
}
Further study: Correctly allocating multi-dimensional arrays
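Neither answer addresses the question's wish to allocate exactly the memory a triangular matrix needs. A minimal sketch of packed storage follows, assuming the lower triangle including the diagonal is what must be stored (row >= col, as in the question's store()). All tri_* names are hypothetical, and fetch reports the value through an out-parameter so that a stored -1 cannot be mistaken for an error.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of ints in the lower triangle (diagonal included) of an n x n matrix. */
static size_t tri_size(int n)
{
    assert(n >= 0);
    return (size_t)n * ((size_t)n + 1) / 2;
}

/* Row r starts after the r*(r+1)/2 elements of rows 0..r-1. */
static size_t tri_index(int row, int col)
{
    return (size_t)row * ((size_t)row + 1) / 2 + (size_t)col;
}

/* Valid positions are 0 <= col <= row < n. */
static int tri_valid(int n, int row, int col)
{
    return row >= 0 && row < n && col >= 0 && col <= row;
}

/* Returns zero-initialised storage, or NULL on allocation failure. */
int *tri_new(int n)
{
    return calloc(tri_size(n), sizeof(int));
}

/* Returns 1 on success, -1 for a position outside the triangle. */
int tri_store(int *t, int n, int row, int col, int val)
{
    if (!tri_valid(n, row, col))
        return -1;
    t[tri_index(row, col)] = val;
    return 1;
}

/* Writes the element to *out; returns 1 on success, -1 for an invalid position. */
int tri_fetch(const int *t, int n, int row, int col, int *out)
{
    if (!tri_valid(n, row, col))
        return -1;
    *out = t[tri_index(row, col)];
    return 1;
}
```

For N = 5 this allocates 15 ints instead of 25, and the bounds check runs before any memory is touched.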
Hi, I want to build a 3 x 3 magic square in C using backtracking (as in the 4 queens exercise) with recursion.
In addition, I must enter the maximum value that the magic square may contain. For example, if I enter m = 26, my table should look something like this:
[22,8,21]
[16,17,18]
[13,26,12]
Since it should be done by backtracking, that is one possible solution of many. Currently I have simple code with 3 nested loops that generates all the possible combinations after entering the value of m.
Attached code:
#include <stdio.h>
#include <string.h>
#define N 10
void print (int *num, int n)
{
    int i;
    for (i = 0; i < n; i++)
        printf ("%d ", num[i]);
    printf ("\n");
}
int main ()
{
    int num[N];
    int *ptr;
    int temp;
    int i, m, j;
    int n = 3;
    int permutations = 0;   /* counts the combinations generated */
    printf ("\nlimit: ");
    scanf ("%d", &m);
    for (int i = 1; i <= m; ++i)
    {
        for (int j = 1; j <= m; ++j)
        {
            for (int k = 1; k <= m; ++k)
            {
                permutations++;
                printf ("%i,%i,%i\n", i, j, k);
            }
        }
    }
}
How can I transform this code to be recursive, and without repeating the same value within a combination, for example [1,1,1] or [16,16,16]? This will allow me to create the possible rows and columns to build the magic square.
Finally, I want to be able to print all the correct solutions.
solution 1 solution N
[4,9,2] [22,8,21]
[3,5,7] [16,17,18]
[8,1,6] ... [13,26,12]
For compilation I use MinGW gcc on Windows. Thanks a lot in advance for the help.
So, nowhere in your current code do you actually test that the solution is a magic square. Let's rectify that.
Now, this solution is really slow, but it does show how to advance recursively in this kind of problem.
#include <stdio.h>
void magic_square(int *grid, int next_slot, int max_value) {
// Maybe recurse
if (next_slot < 9) {
for (int i = 1; i <= max_value; i++) {
grid[next_slot] = i;
magic_square(grid, next_slot + 1, max_value);
}
// Test magic square.
} else {
const int sum = grid[0] + grid[1] + grid[2];
// Horizontal lines
if (grid[3] + grid[4] + grid[5] != sum) return;
if (grid[6] + grid[7] + grid[8] != sum) return;
// Vertical lines
if (grid[0] + grid[3] + grid[6] != sum) return;
if (grid[1] + grid[4] + grid[7] != sum) return;
if (grid[2] + grid[5] + grid[8] != sum) return;
// Diagonal lines
if (grid[0] + grid[4] + grid[8] != sum) return;
if (grid[2] + grid[4] + grid[6] != sum) return;
// Guess it works
printf("%3d %3d %3d\n%3d %3d %3d\n%3d %3d %3d\n\n",
grid[0], grid[1], grid[2],
grid[3], grid[4], grid[5],
grid[6], grid[7], grid[8]);
}
}
int main(void) {
int grid[9];
int max_value = 5;
magic_square(grid, 0, max_value);
}
You'll also need to add the restriction that no number is used multiple times.
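A minimal sketch of that restriction, assuming the values must be distinct and lie in 1..max_value: a used[] table marks values already placed, and partial sums are checked as soon as a row, column or diagonal is complete so that dead branches are abandoned early. MAX_VALUE_LIMIT and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_VALUE_LIMIT 64   /* assumed cap on max_value; sizes the used[] table */

/* Check every line that became complete when slot `filled` was written. */
static int partial_ok(const int g[9], int filled)
{
    int sum = g[0] + g[1] + g[2];
    switch (filled) {
    case 5: return g[3] + g[4] + g[5] == sum;                 /* middle row */
    case 6: return g[0] + g[3] + g[6] == sum                  /* left column */
                && g[2] + g[4] + g[6] == sum;                 /* anti-diagonal */
    case 7: return g[1] + g[4] + g[7] == sum;                 /* middle column */
    case 8: return g[6] + g[7] + g[8] == sum                  /* bottom row */
                && g[2] + g[5] + g[8] == sum                  /* right column */
                && g[0] + g[4] + g[8] == sum;                 /* main diagonal */
    default: return 1;                                        /* nothing complete yet */
    }
}

/* Fill slots next..8 with unused values; returns the number of squares found. */
static int solve(int g[9], int next, int max_value, int used[], int print)
{
    assert(next >= 0 && next <= 9);
    if (next == 9) {   /* all eight lines were verified on the way down */
        if (print)
            printf("%3d %3d %3d\n%3d %3d %3d\n%3d %3d %3d\n\n",
                   g[0], g[1], g[2], g[3], g[4], g[5], g[6], g[7], g[8]);
        return 1;
    }
    int found = 0;
    for (int v = 1; v <= max_value; v++) {
        if (used[v])
            continue;              /* no number is used twice */
        g[next] = v;
        if (partial_ok(g, next)) {
            used[v] = 1;
            found += solve(g, next + 1, max_value, used, print);
            used[v] = 0;           /* backtrack */
        }
    }
    return found;
}

int count_magic_squares(int max_value, int print)
{
    int g[9] = {0};
    int used[MAX_VALUE_LIMIT + 1] = {0};
    if (max_value < 1 || max_value > MAX_VALUE_LIMIT)
        return 0;
    return solve(g, 0, max_value, used, print);
}
```

Calling count_magic_squares(9, 1) prints the eight rotations and reflections of the classic 1..9 square; larger limits take noticeably longer because the first row is still enumerated in full.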
I have developed this knapsack algorithm based on pseudo-code found on Wikipedia. It works fine for a small number of items and a small capacity (n=6, v=2014), but it crashes for large numbers (n=5, v=123456789).
An additional problem is that my program is tested by a makefile with the time limit set at 1 second.
What can I do to save time and memory?
v - Knapsack capacity
n - Number of items
weight[] - Weights
value[] - Values
int knapSack(int v, int weight[], int value[], int n){
int a, i, j;
int **ks;
ks = (int **)calloc(n+1, sizeof(int*));
for(a = 0; a < (n+1); a++) {
ks[a] = (int *)calloc(v+1, sizeof(int));
}
for (i = 1; i <= n; i++){
for (j = 0; j <= v; j++){
if (weight[i-1] <= j){
ks[i][j] = max(value[i-1] + ks[i-1][j-weight[i-1]], ks[i-1][j]);
} else {
ks[i][j] = ks[i-1][j];
}
}
}
int result = ks[n][v];
for(i = 0; i < (n+1); i++) {
free(ks[i]);
}
free(ks);
return result;
}
An array of 123456789 integer elements declared on the stack will crash many implementations of C. Sounds like this is your problem. Did you declare your arrays inside of a function (on the stack)?
// static storage (not on the stack)
static int v[123456789]={0};
// on the stack (inside a function like main() )
int foo()
{
int v[123456789]={0};
}
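Whatever the cause of the crash, the table in the question needs (n+1) * (v+1) ints, and none of the calloc results are checked. A common way to cut the memory is to keep a single row and walk the capacity downwards; the sketch below assumes strictly positive weights, and the name knapsack_1d is hypothetical. It reduces memory to v+1 ints but still performs about n * v steps, so it does not by itself address the time limit.

```c
#include <assert.h>
#include <stdlib.h>

/* 0/1 knapsack with one rolling row.
   Returns the best total value, or -1 if the allocation fails.
   Assumes every weight[i] is > 0. */
int knapsack_1d(int capacity, const int weight[], const int value[], int n)
{
    assert(capacity >= 0 && n >= 0);
    int *best = calloc((size_t)capacity + 1, sizeof *best);
    if (best == NULL)
        return -1;

    for (int i = 0; i < n; i++) {
        /* Going downwards means best[j - weight[i]] still belongs to the
           previous item row, so each item is taken at most once. */
        for (int j = capacity; j >= weight[i]; j--) {
            int candidate = best[j - weight[i]] + value[i];
            if (candidate > best[j])
                best[j] = candidate;
        }
    }

    int result = best[capacity];
    free(best);
    return result;
}
```

The inner loop replaces both branches of the original if/else: capacities below weight[i] simply keep their previous value.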
I am trying to use OpenMP to parallelize both the partition step and the recursive step of QuickSort. My C code is as follows:
#include "stdlib.h"
#include "stdio.h"
#include "omp.h"
// parallel partition
int ParPartition(int *a, int p, int r) {
int b[r-p];
int key = *(a+r); // use the last element in the array as the pivot
int lt[r-p]; // mark 1 at the position where its element is smaller than the key, else 0
int gt[r-p]; // mark 1 at the position where its element is bigger than the key, else 0
int cnt_lt = 0; // count 1 in the lt array
int cnt_gt = 0; // count 1 in the gt array
int j=p;
int k = 0; // the position of the pivot
// deal with gt and lt array
#pragma omp parallel for
for ( j=p; j<r; ++j) {
b[j-p] = *(a+j);
if (*(a+j) < key) {
lt[j-p] = 1;
gt[j-p] = 0;
} else {
lt[j-p] = 0;
gt[j-p] = 1;
}
}
// calculate the new position of the elements
for ( j=0; j<(r-p); ++j) {
if (lt[j]) {
++cnt_lt;
lt[j] = cnt_lt;
} else
lt[j] = cnt_lt;
if (gt[j]) {
++cnt_gt;
gt[j] = cnt_gt;
} else
gt[j] = cnt_gt;
}
// move the pivot
k = lt[r-p-1];
*(a+p+k) = key;
// move elements to their new position
#pragma omp parallel for
for ( j=p; j<r; ++j) {
if (b[j-p] < key)
*(a+p+lt[j-p]-1) = b[j-p];
else if (b[j-p] > key)
*(a+k+gt[j-p]) = b[j-p];
}
return (k+p);
}
void ParQuickSort(int *a, int p, int r) {
int q;
if (p<r) {
q = ParPartition(a, p, r);
#pragma omp parallel sections
{
#pragma omp section
ParQuickSort(a, p, q-1);
#pragma omp section
ParQuickSort(a, q+1, r);
}
}
}
int main() {
int a[10] = {5, 3, 8, 4, 0, 9, 2, 1, 7, 6};
ParQuickSort(a, 0, 9);
int i=0;
for (; i!=10; ++i)
printf("%d\t", a[i]);
printf("\n");
return 0;
}
For the example in the main function, the sorting result is:
0 9 9 2 2 2 6 7 7 7
I used gdb to debug. In the early recursions, all went well. But in some later recursions, it suddenly went wrong and began to duplicate elements, which then produced the above result.
Can someone help me figure out where the problem is?
I decided to post this answer because:
the accepted answer is wrong, and the user seems inactive these days. There is a race-condition on
#pragma omp parallel for
for(i = p; i < r; i++){
if(a[i] < a[r]){
lt[lt_n++] = a[i]; //<- race condition lt_n is shared
}else{
gt[gt_n++] = a[i]; //<- race condition gt_n is shared
}
}
Nonetheless, even if it was correct, the modern answer to this question is to use OpenMP tasks instead of sections.
I am providing the community with a full runnable example of this approach, including tests and profiling.
#include <assert.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>
#define TASK_SIZE 100
unsigned int rand_interval(unsigned int min, unsigned int max)
{
// https://stackoverflow.com/questions/2509679/
int r;
const unsigned int range = 1 + max - min;
const unsigned int buckets = RAND_MAX / range;
const unsigned int limit = buckets * range;
do
{
r = rand();
}
while (r >= limit);
return min + (r / buckets);
}
void fillupRandomly (int *m, int size, unsigned int min, unsigned int max){
for (int i = 0; i < size; i++)
m[i] = rand_interval(min, max);
}
void init(int *a, int size){
for(int i = 0; i < size; i++)
a[i] = 0;
}
void printArray(int *a, int size){
for(int i = 0; i < size; i++)
printf("%d ", a[i]);
printf("\n");
}
int isSorted(int *a, int size){
for(int i = 0; i < size - 1; i++)
if(a[i] > a[i + 1])
return 0;
return 1;
}
int partition(int * a, int p, int r)
{
int lt[r-p];
int gt[r-p];
int i;
int j;
int key = a[r];
int lt_n = 0;
int gt_n = 0;
for(i = p; i < r; i++){
if(a[i] < a[r]){
lt[lt_n++] = a[i];
}else{
gt[gt_n++] = a[i];
}
}
for(i = 0; i < lt_n; i++){
a[p + i] = lt[i];
}
a[p + lt_n] = key;
for(j = 0; j < gt_n; j++){
a[p + lt_n + j + 1] = gt[j];
}
return p + lt_n;
}
void quicksort(int * a, int p, int r)
{
int div;
if(p < r){
div = partition(a, p, r);
#pragma omp task shared(a) if(r - p > TASK_SIZE)
quicksort(a, p, div - 1);
#pragma omp task shared(a) if(r - p > TASK_SIZE)
quicksort(a, div + 1, r);
}
}
int main(int argc, char *argv[])
{
srand(123456);
int N = (argc > 1) ? atoi(argv[1]) : 10;
int print = (argc > 2) ? atoi(argv[2]) : 0;
int numThreads = (argc > 3) ? atoi(argv[3]) : 2;
int *X = malloc(N * sizeof(int));
int *tmp = malloc(N * sizeof(int));
omp_set_dynamic(0); /** Explicitly disable dynamic teams **/
omp_set_num_threads(numThreads); /** Use numThreads threads for all parallel regions **/
// Dealing with fail memory allocation
if(!X || !tmp)
{
if(X) free(X);
if(tmp) free(tmp);
return (EXIT_FAILURE);
}
fillupRandomly (X, N, 0, 5);
double begin = omp_get_wtime();
#pragma omp parallel
{
#pragma omp single
quicksort(X, 0, N - 1); /* r is the index of the last element, not the size */
}
double end = omp_get_wtime();
printf("Time: %f (s) \n",end-begin);
assert(1 == isSorted(X, N));
if(print){
printArray(X, N);
}
free(X);
free(tmp);
return (EXIT_SUCCESS);
}
How to run:
This program accepts three parameters:
The size of the array;
Print or not the array, 0 for no, otherwise yes;
The number of Threads to run in parallel.
Mini Benchmark
In a 4 core machine : Input 100000 with
1 Thread -> Time: 0.784504 (s)
2 Threads -> Time: 0.424008 (s) ~ speedup 1.85x
4 Threads -> Time: 0.282944 (s) ~ speedup 2.77x
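If the partition step itself should stay parallel, the shared-counter race can be avoided with the flag-and-prefix-sum scheme that the question attempted. The sketch below is one way to do it, not the code from any of the answers: each parallel iteration writes only its own slot, the prefix sum is serial, and every destination index includes the offset p. The names par_partition and quicksort_fixed are hypothetical, and the recursion is left serial for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/* Partition a[p..r] around the pivot a[r].
   Returns the pivot's final index, or -1 if allocation fails. */
int par_partition(int *a, int p, int r)
{
    assert(p <= r);
    int n = r - p;          /* number of elements other than the pivot */
    int key = a[r];
    if (n == 0)
        return p;

    int *copy = malloc(n * sizeof *copy);
    int *pos = malloc(n * sizeof *pos);
    if (copy == NULL || pos == NULL) {
        free(copy);
        free(pos);
        return -1;
    }

    /* Pass 1 (parallel): each iteration writes only its own slot. */
    #pragma omp parallel for
    for (int j = 0; j < n; ++j) {
        copy[j] = a[p + j];
        pos[j] = copy[j] < key;
    }

    /* Pass 2 (serial): pos[j] = how many of copy[0..j] are less than key. */
    for (int j = 1; j < n; ++j)
        pos[j] += pos[j - 1];

    int k = pos[n - 1];     /* total number of elements less than key */
    a[p + k] = key;

    /* Pass 3 (parallel): all destinations are distinct, so no race. */
    #pragma omp parallel for
    for (int j = 0; j < n; ++j) {
        if (copy[j] < key)
            a[p + pos[j] - 1] = copy[j];
        else   /* (j + 1 - pos[j]) elements of copy[0..j] are >= key */
            a[p + k + (j + 1 - pos[j])] = copy[j];
    }

    free(copy);
    free(pos);
    return p + k;
}

void quicksort_fixed(int *a, int p, int r)
{
    if (p < r) {
        int q = par_partition(a, p, r);
        if (q < 0)
            return;         /* allocation failed; leave the range as it is */
        quicksort_fixed(a, p, q - 1);
        quicksort_fixed(a, q + 1, r);
    }
}
```

Elements equal to the pivot go to the right-hand side here, so duplicates are kept rather than dropped. Without an OpenMP compiler flag the pragmas are ignored and the code runs serially with the same result.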
I'm sorry about my first comment; it was unrelated to your problem. I have not found the real cause of your problem (maybe the element-moving step is at fault). Following your approach, I wrote a similar program, and it works fine. (I am also new to OpenMP.)
#include <stdio.h>
#include <stdlib.h>
int partition(int * a, int p, int r)
{
int lt[r-p];
int gt[r-p];
int i;
int j;
int key = a[r];
int lt_n = 0;
int gt_n = 0;
#pragma omp parallel for
for(i = p; i < r; i++){
if(a[i] < a[r]){
lt[lt_n++] = a[i];
}else{
gt[gt_n++] = a[i];
}
}
for(i = 0; i < lt_n; i++){
a[p + i] = lt[i];
}
a[p + lt_n] = key;
for(j = 0; j < gt_n; j++){
a[p + lt_n + j + 1] = gt[j];
}
return p + lt_n;
}
void quicksort(int * a, int p, int r)
{
int div;
if(p < r){
div = partition(a, p, r);
#pragma omp parallel sections
{
#pragma omp section
quicksort(a, p, div - 1);
#pragma omp section
quicksort(a, div + 1, r);
}
}
}
int main(void)
{
int a[10] = {5, 3, 8, 4, 0, 9, 2, 1, 7, 6};
int i;
quicksort(a, 0, 9);
for(i = 0;i < 10; i++){
printf("%d\t", a[i]);
}
printf("\n");
return 0;
}
I've implemented parallel quicksort in a production environment, although with concurrent processes (i.e. fork() and join()) and not OpenMP. I also found a pretty good pthread solution, but a concurrent process solution was the best in terms of worst-case runtime. Let me start by saying that it doesn't seem like you're making copies of your input array for each thread, so you'll definitely encounter race conditions which can corrupt your data.
Essentially, what is happening is you have created an array a in shared memory, and when you do a #pragma omp parallel sections, you're spawning as many worker threads as there are #pragma omp section's. Each time a worker thread tries to access and modify elements of a, it will execute a series of instructions: "read the n'th value of a from the given address", "modify the n'th value of a", "write the n'th value of a back to the given address". Since you have multiple threads with no locking or synchronization, the read, modify, and write instructions may be executed in any order by multiple processors, so the threads may overwrite each other's modifications or read a non-updated value.
The best solution that I found (after many weeks of testing and benchmarking many solutions that I came up with) is to subdivide the list log2(n) times, where n is the number of processors. For example, if you have a quad core machine (n = 4), subdivide the list 2 times (log2(4) = 2), choosing pivots that are the medians of the data set. It is important that the pivots are medians, because otherwise you can end up with a case where a poorly chosen pivot causes the lists to be distributed unevenly amongst processes. Then each process does quicksort on its local subarray, then merges its results with the results of other processes. This is called "hyperquicksort", and from an initial GitHub search, I found this. I can't vouch for the code in there, and can't publish any of the code that I wrote since it is protected under an NDA.
By the way, one of the best parallel sorting algorithms is PSRS (Parallel Sorting by Regular Sampling), which keeps list sizes more balanced amongst processes, doesn't unnecessarily communicate keys between processes, and can work on an arbitrary number of concurrent processes (they don't necessarily have to be a power of 2).