I have implemented the following quicksort algorithm to sort pairs of points (in 3D space).
Every pair defines a line: the purpose is to place all lines whose distance is less than or equal to powR next to each other inside the array that contains all the pairs.
The array containing the coordinates is one-dimensional: every 6 elements define a pair and every 3 a point.
When I run the algorithm on an array of 3099642 elements, it stops after processing 2799222 elements, while trying to enter the next iteration. If I start the algorithm from element 2799228, it stops at 3066300.
I can't figure out where the problem is. Any suggestions?
void QuickSort(float *array, int from, int to, float powR){
float pivot[6];
float temp[6];
float x1;
float y1;
float z1;
float x2;
float y2;
float z2;
float d12;
int i;
int j;
if(from >= to)
return;
pivot[0] = array[from+0];
pivot[1] = array[from+1];
pivot[2] = array[from+2];
pivot[3] = array[from+3];
pivot[4] = array[from+4];
pivot[5] = array[from+5];
i = from;
for(j = from+6; j <= to; j += 6){
x1 = pivot[0] - array[j+0];
y1 = pivot[1] - array[j+1];
z1 = pivot[2] - array[j+2];
x2 = pivot[3] - array[j+3];
y2 = pivot[4] - array[j+4];
z2 = pivot[5] - array[j+5];
d12 = (x1*x1 + y1*y1 + z1*z1) + (x2*x2 + y2*y2 + z2*z2);
/*the sorting condition i am using is the regular euclidean norm*/
if (d12 <= powR){
i += 6;
temp[0] = array[i+0];
temp[1] = array[i+1];
temp[2] = array[i+2];
temp[3] = array[i+3];
temp[4] = array[i+4];
temp[5] = array[i+5];
array[i+0] = array[j+0];
array[i+1] = array[j+1];
array[i+2] = array[j+2];
array[i+3] = array[j+3];
array[i+4] = array[j+4];
array[i+5] = array[j+5];
array[j+0] = temp[0];
array[j+1] = temp[1];
array[j+2] = temp[2];
array[j+3] = temp[3];
array[j+4] = temp[4];
array[j+5] = temp[5];
}
}
QuickSort(array, i+6, to, powR);
}
The function is called in this way:
float *LORs = (float *) calloc((unsigned)tot, sizeof(float));
LORs is filled by reading data from a file, and that part works fine.
QuickSort(LORs, 0, 6000, powR);
free(LORs);
for(j = from+6; j <= to; j += 6) {
array[i+0] = array[j+0];
array[i+1] = array[j+1];
array[i+2] = array[j+2];
array[i+3] = array[j+3];
array[i+4] = array[j+4];
array[i+5] = array[j+5];
}
Your j + constant_number goes out of bounds when you approach the end of the array. That's why it crashes near the end. Note that constant_number is non-negative.
When j comes close to the end of your array (how close depends on the increment step, i.e. +6), j + constant_number will certainly go out of bounds.
Take the easy case, the maximum value j can reach: the size of your array.
Let's call it N.
When j is equal to N (which happens if to is N), you are going to enter the loop.
Then you access array[j + 0], which is actually array[N + 0], i.e. array[N].
I am pretty sure you know that indexing in C (which you should include in the tags of your questions in the future) runs from 0 to N - 1. The same applies to array[N + 1] and so on.
EDIT: As the comments suggest, this is not a (quick)sort!
I have implemented quickSort here, if you want to get an idea of it. I suggest you start from the explanations and not from the code!
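To make the bounds issue concrete, here is a minimal sketch of the same grouping pass with a loop condition that can never read past the end. It assumes n is the total number of floats in the array (not the index of the last record); the names REC, rec_dist2, rec_swap, group_pass and group_all are mine, not from the question. The driver is a plain loop instead of a recursive call, which may also matter for arrays with millions of elements, since one recursion level per group can exhaust the stack.

```c
#include <stddef.h>
#include <string.h>

#define REC 6 /* floats per record: two 3D points (hypothetical name) */

/* Sum of squared differences over all 6 coordinates, like d12 in the question. */
static float rec_dist2(const float *a, const float *b) {
    float d = 0.0f;
    for (int k = 0; k < REC; k++) {
        float diff = a[k] - b[k];
        d += diff * diff;
    }
    return d;
}

/* Swap two distinct records of REC floats. */
static void rec_swap(float *a, float *b) {
    float tmp[REC];
    memcpy(tmp, a, sizeof tmp);
    memcpy(a, b, sizeof tmp);
    memcpy(b, tmp, sizeof tmp);
}

/* One pass: moves every record within powR of the record at `from`
   directly behind it. n is the number of floats in the array.
   Returns the index of the first float after the group. */
size_t group_pass(float *array, size_t from, size_t n, float powR) {
    size_t i = from;
    /* j + REC <= n guarantees array[j] .. array[j + 5] are all valid. */
    for (size_t j = from + REC; j + REC <= n; j += REC) {
        if (rec_dist2(array + from, array + j) <= powR) {
            i += REC;
            if (i != j)
                rec_swap(array + i, array + j);
        }
    }
    return i + REC;
}

/* Iterative driver replacing the recursive call at the end of QuickSort. */
void group_all(float *array, size_t n, float powR) {
    size_t from = 0;
    while (from + REC <= n)
        from = group_pass(array, from, n, powR);
}
```

With this convention the call from the question would become something like group_all(LORs, tot, powR), assuming tot is the number of floats allocated.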
I am working on a project to rewrite a sequential C algorithm for creating a Mandelbrot set into a parallel one using pthreads. I've hit a wall, so to speak: my version simply outputs a more or less black picture (nothing like what the original program produces), and I can't really see where I'm going wrong. Simply put, I could use a second pair of eyes on this one.
Here is the sequential code snippet that matters:
void mandelbrot(float width, float height, unsigned int *pixmap)
{
int i, j;
float xmin = -1.6f;
float xmax = 1.6f;
float ymin = -1.6f;
float ymax = 1.6f;
for (i = 0; i < height; i++) {
for (j = 0; j < width; j++) {
float b = xmin + j * (xmax - xmin) / width;
float a = ymin + i * (ymax - ymin) / height;
float sx = 0.0f;
float sy = 0.0f;
int ii = 0;
while (sx + sy <= 64.0f) {
float xn = sx * sx - sy * sy + b;
float yn = 2 * sx * sy + a;
sx = xn;
sy = yn;
ii++;
if (ii == 1500) {
break;
}
}
if (ii == 1500) {
pixmap[j+i*(int)width] = 0;
}
else {
int c = (int)((ii / 32.0f) * 256.0f);
pixmap[j + i *(int)width] = pal[c%256];
}
}
}
}
Here is my parallel version of the code:
void* Mandel(void* threadId) {
int x = *(int*)threadId;
float xmin = -1.6f;
float xmax = 1.6f;
float ymin = -1.6f;
float ymax = 1.6f;
float b = xmin + x * (xmax - xmin) / WIDTH;
for (int y = 0; y < 1024; y++)
{
float a = ymin + y * (ymax - ymin) / WIDTH;
float sx = 0.0f;
float sy = 0.0f;
int ii = 0;
while (sx + sy <= 64.0f) {
float xn = sx * sx - sy * sy + b;
float yn = 2 * sx * sy + a;
sx = xn;
sy = yn;
ii++;
if (ii == 1500) {
break;
}
}
if (ii == 1500) {
pixmap[x+y*(int)WIDTH] = 0;
}
else {
int c = (int)((ii / 32.0f) * 256.0f);
pixmap[x + y *(int)WIDTH] = pal[c%256];
}
}
}
Explanation of my thought process:
I create 1024 threads in the main function, and then call the function above with each thread. Each thread is supposed to handle one column (since x is a constant between 0 and 1023, while the y value changes from 0 to 1023 within the function). As you can see, most of the mathematical meat in the function is the same in both the sequential and my parallel versions of the code. Because of this, I think the problem comes from how I'm stepping through the array, but I cannot spot it myself. Regardless, the value that ii eventually receives is used to calculate c, which in turn is used to decide the color value that is saved in the corresponding position in pixmap (pal is basically just a large array filled with color values).
This function is the only piece of the code that I've actually touched to any major degree. The only difference in the main function is that I've created threads in it with the instruction to carry out the function Mandel.
I assume that anyone willing to help will want more information, and please let me know of any improvements to this post in case I have posted too little information.
I found an answer shortly after posting this code. Silly me.
Anywho, the problem wasn't stepping through the array within the function, but actually how I created the threads.
This is how it looked in main when the problem occurred:
for(int k = 0; k < 1024; k++) {
    pthread_create(&threads[k], NULL, (void*)Mandel, (void*) &k);
}
The problem with doing it this way was that the loop continued, changing the value of k for the next thread, meaning that the thread we had just created suddenly got an erroneous value for k. This was solved by using an int array which saves each value of k as we create each thread, as shown below:
int id[1024];
for(int k = 0; k < 1024; k++) {
    id[k] = k; /* each thread gets its own slot, which is never modified afterwards */
    pthread_create(&threads[k], NULL, (void*)Mandel, (void*)(id+k));
}
I would like to point towards the post in the following link, for helping me answer this problem:
Pass integer value through pthread_create
Just goes to show that you can often find the answers you're looking for if you search long enough, most of the time.
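For anyone who wants to try the fix in isolation, here is a small self-contained sketch of the same idea. The names worker, run_workers, results and NUM_WORKERS are made up for this example, and the worker just stores the square of its id instead of computing a Mandelbrot column. The important part is that every thread receives a pointer to its own array element, which stays valid and unchanged until the thread has been joined.

```c
#include <pthread.h>

#define NUM_WORKERS 8 /* small number for the demo; the post uses 1024 */

static int results[NUM_WORKERS];

/* Thread function with the signature pthread_create expects. */
static void *worker(void *arg) {
    int id = *(const int *)arg; /* read this thread's private id slot */
    results[id] = id * id;      /* stand-in for the per-column work */
    return NULL;
}

/* Starts NUM_WORKERS threads and waits for them. Returns 0 on success. */
int run_workers(void) {
    pthread_t threads[NUM_WORKERS];
    int ids[NUM_WORKERS];
    int created = 0;
    int status = 0;

    for (int k = 0; k < NUM_WORKERS; k++) {
        ids[k] = k; /* ids[k] never changes again, unlike the loop variable k */
        if (pthread_create(&threads[k], NULL, worker, &ids[k]) != 0) {
            status = -1;
            break;
        }
        created++;
    }
    /* Join before returning so that ids[] outlives every thread. */
    for (int k = 0; k < created; k++)
        pthread_join(threads[k], NULL);
    return status;
}
```

Depending on the toolchain, this may need to be compiled with the -pthread option.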
I am trying to implement a Navier-Stokes solver in 2D using CUDA. I am using Jacobi's method to solve the system of difference equations. I am dividing the grid into 4x4 blocks consisting of 16x16 threads. As every inner point in my matrix (of dimension 64x64) requires its top, bottom, left and right neighbours to compute its new value, I create a new shared matrix of dimension 18x18 for every block. I read all the values into the matrix in this fashion: the thread with indices (0, 0) writes its value into the (1, 1) element of the shared matrix and also attempts to read the element above it and the one to its left, provided this access does not exceed the boundary. Once this read is done, I update the values of all the internal points and then write them back into memory.
I end up getting garbage values in the matrix pn, even though all the values are initialized correctly. I honestly cannot see where I'm going wrong. Can someone help me with this?
My kernel -
__global__ void red_psi (float *psi_o, float *psi_n, float *e, float *omega, float l1)
{
// m = n = 64
int i1 = blockIdx.x;
int j1 = blockIdx.y;
int i2 = threadIdx.x;
int j2 = threadIdx.y;
int i = (i1 * blockDim.x) + i2; // Actual row of the element
int j = (j1 * blockDim.y) + j2; // Actual column of the element
int l = i * n + j;
// e_XX --> variables refer to the expanded shared memory location in order to accommodate halo elements
//Current Local ID with radius offset.
int e_li = i2 + 1;
int e_lj = j2 + 1;
// Variable pointing at top and bottom neighbouring location
int e_li_prev = e_li - 1;
int e_li_next = e_li + 1;
// Variable pointing at left and right neighbouring location
int e_lj_prev = e_lj - 1;
int e_lj_next = e_lj + 1;
__shared__ float po[BLOCK_SIZE + 2][BLOCK_SIZE + 2];
__shared__ float pn[BLOCK_SIZE + 2][BLOCK_SIZE + 2];
__shared__ float oo[BLOCK_SIZE + 2][BLOCK_SIZE + 2];
//__shared__ float ee[BLOCK_SIZE + 2][BLOCK_SIZE + 2];
if (i2 < 1) // copy top and bottom halo
{
//Copy Top Halo Element
if (blockIdx.y > 0) // Boundary check
{
po[i2][e_lj] = psi_o[l - n];
//pn[i2][e_lj] = psi_n[l - n];
oo[i2][e_lj] = omega[l - n];
//printf ("i_pn[%d][%d] = %f\n", i2, e_lj, oo[i2][e_lj]);
}
//Copy Bottom Halo Element
if (blockIdx.y < (gridDim.y - 1)) // Boundary check
{
po[1 + BLOCK_SIZE][e_lj] = psi_o[l + n];
//pn[1 + BLOCK_SIZE][e_lj] = psi_n[l + n];
oo[1 + BLOCK_SIZE][e_lj] = omega[l + n];
//printf ("j_pn[%d][%d] = %f\n", 1 + BLOCK_SIZE, e_lj, oo[1 + BLOCK_SIZE][e_lj]);
}
}
if (j2 < 1) // copy left and right halo
{
if (blockIdx.x > 0) // Boundary check
{
po[e_li][j2] = psi_o[l - 1];
//pn[e_li][j2] = psi_n[l - 1];
oo[e_li][j2] = omega[l - 1];
//printf ("k_pn[%d][%d] = %f\n", e_li, j2, oo[e_li][j2]);
}
if (blockIdx.x < (gridDim.x - 1)) // Boundary check
{
po[e_li][1 + BLOCK_SIZE] = psi_o[l + 1];
//pn[e_li][1 + BLOCK_SIZE] = psi_n[l + 1];
oo[e_li][1 + BLOCK_SIZE] = omega[l + 1];
//printf ("l_pn[%d][%d] = %f\n", e_li, 1 + BLOCK_SIZE, oo[e_li][BLOCK_SIZE + 1]);
}
}
// copy current location
po[e_li][e_lj] = psi_o[l];
//pn[e_li][e_lj] = psi_n[l];
oo[e_li][e_lj] = omega[l];
//printf ("o_pn[%d][%d] = %f\n", e_li, e_lj, oo[e_li][e_lj]);
__syncthreads ();
// Checking whether we have an internal point.
if ((i >= 1 && i < (m - 1)) && (j >= 1 && j < (n - 1)))
{
//printf ("Calculating for - (%d, %d)\n", i, j);
pn[e_li][e_lj] = 0.25 * (po[e_li_next][e_lj] + po[e_li_prev][e_lj] + po[e_li][e_lj_next] + po[e_li][e_lj_prev] + h*h*oo[e_li][e_lj]);
//printf ("n_pn[%d][%d] (%d, %d), a(%d, %d) = %f\n", e_li_prev, e_lj, i1, j1, i, j, po[e_li_prev][e_lj]);
pn[e_li][e_lj] = po[e_li][e_lj] + 1.0 * (pn[e_li][e_lj] - po[e_li][e_lj]);
__syncthreads ();
psi_n[l] = pn[e_li][e_lj];
e[l] = po[e_li][e_lj] - pn[e_li][e_lj];
}
}
This is how I invoke the kernel -
dim3 threadsPerBlock (4, 4);
dim3 numBlocks (4, 4);
red_psi<<<numBlocks, threadsPerBlock>>> (d_xn, d_xx, d_e, d_w, l1);
(d_xx, d_xn, d_e, d_w are all float arrays of size 4096)
The problem was in the halo copy: I had switched x and y in the block-index boundary checks (blockIdx.x and blockIdx.y) when copying the top / bottom and the left / right halo elements.
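When a kernel like this returns garbage, comparing its output against a plain CPU version of the same update can help narrow things down. Below is a minimal C sketch of one Jacobi sweep using the same formula as the kernel (the relaxation step with factor 1.0 is left out because it does not change the result). The function name jacobi_reference and its parameter list are my own and not part of the original code; m and n are the numbers of rows and columns, and h is the grid spacing.

```c
#include <stddef.h>

/* One Jacobi sweep over the interior points of an m x n row-major grid.
   psi_o: old values, psi_n: new values, omega: source term,
   e: per-point difference old - new (as in the kernel), h: grid spacing.
   Boundary entries of psi_n and e are left untouched. */
void jacobi_reference(const float *psi_o, float *psi_n, const float *omega,
                      float *e, int m, int n, float h) {
    for (int i = 1; i < m - 1; i++) {
        for (int j = 1; j < n - 1; j++) {
            size_t l = (size_t)i * (size_t)n + (size_t)j;
            size_t up = l - (size_t)n;   /* element in row i - 1 */
            size_t down = l + (size_t)n; /* element in row i + 1 */
            float next = 0.25f * (psi_o[down] + psi_o[up] +
                                  psi_o[l + 1] + psi_o[l - 1] +
                                  h * h * omega[l]);
            psi_n[l] = next;
            e[l] = psi_o[l] - next;
        }
    }
}
```

Copying psi_n back from the device after one kernel launch and comparing it element by element with the output of this function (allowing a small tolerance) shows which rows or columns are wrong, which usually points at the faulty halo copy.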
According to Wikipedia and other sources I have gone through, you need a matrix m[n][W], where n is the number of items and W is the total capacity of the knapsack. This matrix gets really big, sometimes too big to handle in a C program. I know that dynamic programming is based on trading memory for time, but still, is there any solution that saves both time and memory?
Pseudo-code for Knapsack problem:
// Input:
// Values (stored in array v)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)
for j from 0 to W do
m[0, j] := 0
end for
for i from 1 to n do
for j from 0 to W do
if w[i] <= j then
m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i])
else
m[i, j] := m[i-1, j]
end if
end for
end for
Let's say that W = 123456789 and n = 100. In this case we get a really big matrix m[100][123456789]. I was thinking about how to implement this, but the best I have come up with is to record which items were selected using one bit each (0/1). Is this possible? Or is there any other approach to this problem?
int32 -> 32 * 123456789 * 100 bits
one_bit -> 1 * 123456789 * 100 bits
I hope this is not a stupid question, and thanks for your effort.
EDIT - working C code:
long int i, j;
long int *m[2];
m[0] = (long int *) malloc(sizeof(long int)*(W+1));
m[1] = (long int *) malloc(sizeof(long int)*(W+1));
for(i = 0; i <= W; i++){
m[0][i] = 0;
}
int read = 0;
int write = 1;
int tmp;
long int percent = (W+1)*(n)/100;
long int counter = 0;
for(i = 1; i <= n; i++){
for(j = 0; j <= W; j++){
if(w[i-1] <= j){
m[write][j] = max(m[read][j],(v[i-1]) + m[read][j-(w[i-1])]);
}else{
m[write][j] = m[read][j];
}
counter++;
if(counter == percent){
printf("."); //printing dot (.) for each percent
fflush(stdout);
counter = 0;
}
}
tmp = read;
read = write;
write = tmp;
}
printf("\n%ld\n", m[read][W]);
free(m[0]);
free(m[1]);
The knapsack problem can be solved using O(W) space.
At each step of the iteration you need only 2 rows: the current state of the array, m[i], and the next one, m[i + 1].
current = 0
int m[2][W + 1]
set NONE for all elements of m # NONE means this state has not been reached
m[current][0] = 0 # this is our start point, an initially empty knapsack
FOR i in [1..n] do
    next = 1 - current; /// just use 0 or 1 based on the current index
    FOR j in [0..W] do
        m[next][j] = m[current][j]
    FOR j in [w[i]..W] do
        if m[current][j - w[i]] is not NONE then # process only reachable positions
            m[next][j] = max(m[next][j], m[current][j - w[i]] + v[i]);
    current = next; /// swap current state and the produced one
It is also possible to use only 1 array, as long as j runs downward so that each item is counted at most once. Here is the pseudocode:
FOR i in [1..n] do
    FOR j in [W..w[i]] (descending) do
        m[j] = max(m[j], m[j - w[i]] + v[i]);
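Here is a minimal C sketch of the single-array variant, assuming non-negative weights and 0-based item arrays; the function name knapsack_1d is made up for this example. Note that the inner loop runs from W down to w[i]: iterating upward would read entries already updated for the current item and could count that item more than once, which solves the unbounded knapsack instead of the 0/1 one.

```c
#include <stdlib.h>

/* 0/1 knapsack using a single row of W + 1 entries.
   w: item weights (assumed non-negative), v: item values, n: item count,
   W: capacity. Returns the best total value, or -1 if allocation fails. */
long knapsack_1d(const long *w, const long *v, int n, long W) {
    long *m = calloc((size_t)W + 1, sizeof *m); /* m[j] = best value with capacity j */
    if (m == NULL)
        return -1;

    for (int i = 0; i < n; i++) {
        /* Descending j: m[j - w[i]] still holds the value from before item i. */
        for (long j = W; j >= w[i]; j--) {
            long candidate = m[j - w[i]] + v[i];
            if (candidate > m[j])
                m[j] = candidate;
        }
    }

    long best = m[W];
    free(m);
    return best;
}
```

For W = 123456789 this needs a single array of W + 1 long values (roughly 1 GB with 8-byte long), compared with two such arrays for the two-row version.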
You can reduce the space used from m[100][123456789] to m[2][123456789] with this observation:
Look at this part of the code: at any time, you only need to refer to two rows of the matrix, i and i - 1.
if w[i] <= j then
m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i])
else
m[i, j] := m[i-1, j]
end if
You can use this trick:
int current = 1;
//.........
if w[i] <= j then
    m[current, j] := max(m[1 - current, j], m[1 - current, j-w[i]] + v[i])
else
    m[current, j] := m[1 - current, j]
end if
// after the inner loop over j has finished for item i:
current = 1 - current;
I am writing a program that computes Pythagorean triples whose hypotenuse c is lower than a number entered by the user, using three nested while loops. I also want to print the triple that forms the right-angled triangle with the thinnest interior angle. Using the sine law, I calculate the smallest angle of each triple and store it in an array, followed in the next three slots by the sides of the corresponding triple. Then I wrote a function that compares the angle of each triple with the one in the first four slots of the array and, if it is smaller, stores it there. I am currently not worrying about the array size and have declared it as 9999. The problem is that when there is more than one triple, the program does not leave the triple with the smallest angle in the first four slots of the array. I agree that my procedure is very inefficient and time-consuming, but if you could give me a solution or even point me in the right direction, I'd appreciate it. Thanks. Here is my code:
#include <stdio.h>
#include <math.h>
#define PI 3.14159265
static int a[9999];
int main(void)
{
int side1, side2, hyp, num;
int i = 0;
int j;
side1 = 1;
hyp = 0;
printf("Please enter a number\n");
scanf("%d", &num);
while (side1 < num) {
side2 = 1;
while (side2 < num) {
hyp = 1;
while (hyp < num) {
if (side1 * side1 + side2 * side2 == hyp * hyp && side1 < side2) {
printf("The side lengths are %d,%d,%d\n", side1, side2, hyp);
float angle1 = (asin((float) side1 / hyp) * (180 / PI));
float angle2 = (asin((float) side2 / hyp) * (180 / PI));
if (angle1 > angle2) {
a[i] = (int)angle2;
a[i + 1] = side1;
a[i + 2] = side2;
a[i + 3] = hyp;
} else if (angle2 > angle1) {
a[i] = (int)angle1;
a[i + 1] = side1;
a[i + 2] = side2;
a[i + 3] = hyp;
} else {
a[i] = (int)angle1;
a[i + 1] = side1;
a[i + 2] = side2;
a[i + 3] = hyp;
}
i=i+4;
}
hyp++;
}
side2++;
}
side1++;
}
a[i+1]=99.99;
a[i+2]=99.99;
a[i+3]=99.99;
a[i+4]=99.99;
compare(i);
return (0);
}
void compare(int i)
{
int j;
for(j=0;j<i;j=j+4)
{
if (a[0]>a[j+4])
{
a[0]=a[j+4];
a[1]=a[j+5];
a[2]=a[j+6];
a[3]=a[j+7];
}
//printf("%d\n",a[0]);
}
printf("The thinnest triangle is formed by (%d , %d , %d)", a[1], a[2], a[3]);
}
Oh, one more thing: the reason I set some entries to 99.99 is so that the loop does not fail on the last triple, when there is nothing further to compare the previous triples to. OK, I changed it to a single equals sign, but now the output is always 99,99,99.
Perhaps the four statements starting with a[0]==a[j+4] should be using = instead of ==.
Switch your assignments to single equals signs (=) instead of double ones (==).
void compare(int i)
{
int j;
for(j=0;j<i;j=j+4)
{
if (a[0]>a[j+4])
{
a[0] = a[j+4]; // <- here,
a[1] = a[j+5]; // <- here,
a[2] = a[j+6]; // <- here,
a[3] = a[j+7]; // <- and here
}
//printf("%d\n",a[0]);
}
printf("The thinnest triangle is formed by (%d , %d , %d)", a[1], a[2], a[3]);
}
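As for the 99,99,99 output mentioned in the question: it appears to come from the sentinel entries. On the last iteration the loop compares a[0] with a[i], which was never written and is still 0 because the array is static, so the comparison succeeds and the three 99 entries that follow are copied to the front. One way around this is to skip the array entirely and keep track of the best triple while searching. The sketch below does that; thinnest_triple is a hypothetical name. Since side1 < side2, the smallest angle is the one opposite side1, and its sine is side1 / hyp, so comparing those ratios with integer cross-multiplication is enough and no asin call or cast to int is needed.

```c
/* Searches all triples a < b < c with c < limit and a*a + b*b == c*c.
   Stores the triple with the thinnest angle (smallest a / c) in *pa, *pb, *pc.
   Returns 1 if at least one triple exists, 0 otherwise. */
int thinnest_triple(int limit, int *pa, int *pb, int *pc) {
    int found = 0;
    long best_a = 0, best_b = 0, best_c = 1;

    for (int a = 1; a < limit; a++) {
        for (int b = a + 1; b < limit; b++) {
            long sum = (long)a * a + (long)b * b;
            for (int c = b + 1; c < limit; c++) {
                if ((long)c * c != sum)
                    continue;
                /* a / c < best_a / best_c  <=>  a * best_c < best_a * c */
                if (!found || (long)a * best_c < best_a * (long)c) {
                    best_a = a;
                    best_b = b;
                    best_c = c;
                    found = 1;
                }
            }
        }
    }

    if (found) {
        *pa = (int)best_a;
        *pb = (int)best_b;
        *pc = (int)best_c;
    }
    return found;
}
```

Called with the number entered by the user, this returns 0 when no triple exists, so no sentinel values are needed.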
I have an odd problem. Following (re: copying) from here, I've been trying to implement the Cooley–Tukey FFT algorithm for arrays with a power-of-2 size, but the answers returned from this implementation are the conjugate of the true answers.
int fft_pow2(int dir,int m,float complex *a)
{
long nn,i,i1,j,k,i2,l,l1,l2;
float c1,c2,tx,ty,t1,t2,u1,u2,z;
float complex t;
/* Calculate the number of points */
nn = 1;
for (i=0;i<m;i++)
nn *= 2;
/* Do the bit reversal */
i2 = nn >> 1;
j = 0;
for (i=0;i<nn-1;i++) {
if (i < j) {
t = a[i];
a[i] = a[j];
a[j] = t;
}
k = i2;
while (k <= j) {
j -= k;
k >>= 1;
}
j += k;
}
/* Compute the FFT */
c1 = -1.0;
c2 = 0.0;
l2 = 1;
for (l=0;l<m;l++) {
l1 = l2;
l2 <<= 1;
u1 = 1.0;
u2 = 0.0;
for (j=0;j<l1;j++) {
for (i=j;i<nn;i+=l2) {
i1 = i + l1;
t = u1 * crealf(a[i1]) - u2 * cimagf(a[i1])
+ I * (u1 * cimagf(a[i1]) + u2 * crealf(a[i1]));
a[i1] = a[i] - t;
a[i] += t;
}
z = u1 * c1 - u2 * c2;
u2 = u1 * c2 + u2 * c1;
u1 = z;
}
c2 = sqrt((1.0 - c1) / 2.0);
if (dir == 1)
c2 = -c2;
c1 = sqrt((1.0 + c1) / 2.0);
}
/* Scaling for forward transform */
if (dir == 1) {
for (i=0;i<nn;i++) {
a[i] /= (float)nn;
}
}
return 1;
}
int main(int argc, char **argv) {
    int n = 4; /* number of samples; must be a power of 2 */
    float complex arr[4] = { 1.0, 2.0, 3.0, 4.0 };
    fft_pow2(0, (int)log2(n), arr);
    for (int i = 0; i < n; i++) {
        printf("%f %f\n", crealf(arr[i]), cimagf(arr[i]));
    }
}
The results:
10.000000 0.000000
-2.000000 -2.000000
-2.000000 0.000000
-2.000000 2.000000
whereas the true answer is the conjugate.
Any ideas?
The FFT is often defined with $H_k = \sum_{j=0}^{N-1} e^{-2\pi i jk/N} h_j$. Note the minus sign in the exponent. The FFT can also be defined with a plus sign instead of the minus sign. In large part, the definitions are equivalent, because +i and -i are completely symmetric.
The code you show is written for the definition with the negative sign, and it is also written so that the first parameter, dir, is 1 for a forward transform and something else for a reverse transform. We can determine the intended direction because of the comment about scaling for the forward transform: It scales if dir is 1.
So, where your code in main calls fft_pow2 with 0 for dir, it is requesting a reverse transform. Your code has performed a reverse transform using the FFT definition with a negative sign. The reverse of the transform with a negative sign is a transform with a positive sign. For [1, 2, 3, 4], the result is:
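$1^0 \cdot 1 + 1^1 \cdot 2 + 1^2 \cdot 3 + 1^3 \cdot 4 = 1 + 2 + 3 + 4 = 10$.
$i^0 \cdot 1 + i^1 \cdot 2 + i^2 \cdot 3 + i^3 \cdot 4 = 1 + 2i - 3 - 4i = -2 - 2i$.
$(-1)^0 \cdot 1 + (-1)^1 \cdot 2 + (-1)^2 \cdot 3 + (-1)^3 \cdot 4 = 1 - 2 + 3 - 4 = -2$.
$(-i)^0 \cdot 1 + (-i)^1 \cdot 2 + (-i)^2 \cdot 3 + (-i)^3 \cdot 4 = 1 - 2i - 3 + 4i = -2 + 2i$.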
And that is the result you obtained.
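To check the sign convention without the FFT code, here is a small sketch of a direct 4-point DFT. For N = 4 the twiddle factor e^{±2πi/4} is simply ±i, so no trigonometric functions are needed. The function name dft4 and the sign parameter are my own choices for this illustration, not part of the code in the question.

```c
#include <complex.h>

/* Direct 4-point DFT: out[k] = sum over j of w^(j*k) * in[j], with w = sign * i.
   sign = -1 uses the e^{-2*pi*i*j*k/N} convention (minus sign in the exponent),
   sign = +1 uses the e^{+2*pi*i*j*k/N} convention. No scaling is applied. */
void dft4(const float complex in[4], float complex out[4], int sign) {
    const float complex w = (sign < 0) ? -I : I;
    for (int k = 0; k < 4; k++) {
        float complex sum = 0.0f;
        for (int j = 0; j < 4; j++) {
            /* Build w^(j*k) by repeated multiplication; powers of w repeat every 4. */
            float complex tw = 1.0f;
            for (int p = 0; p < (j * k) % 4; p++)
                tw *= w;
            sum += tw * in[j];
        }
        out[k] = sum;
    }
}
```

For the input [1, 2, 3, 4], the plus-sign variant gives -2 - 2i at index 1, which matches the output in the question, while the minus-sign variant gives the conjugate, -2 + 2i.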