Following up on my previous question, Malloc Memory Corruption in C, I now have another problem.
I have the same code. I am now trying to multiply the values contained in the arrays, A * vc,
and store the result in res. Then A is set to zero, and I do a second multiplication with res and vc and store the values in A. (A and Q are square matrices; mc and vc are arrays with N rows and two columns.)
Here is my code:
int jacobi_gpu(double A[], double Q[],
double tol, long int dim){
int nrot, p, q, k, tid;
double c, s;
double *mc, *vc, *res;
int i,kc;
double vc1, vc2;
mc = (double *)malloc(2 * dim * sizeof(double));
vc = (double *)malloc(2 * dim * sizeof(double));
res = (double *)malloc(dim * dim * sizeof(double));
if( mc == NULL || vc == NULL || res == NULL){
fprintf(stderr, "pb allocation matricre\n");
exit(1);
}
nrot = 0;
for(k = 0; k < dim - 1; k++){
eye(mc, dim);
eye(vc, dim);
for(tid = 0; tid < floor(dim /2); tid++){
p = (tid + k)%(dim - 1);
if(tid != 0)
q = (dim - tid + k - 1)%(dim - 1);
else
q = dim - 1;
printf("p = %d | q = %d\n", p, q);
if(fabs(A[p + q*dim]) > tol){
nrot++;
symschur2(A, dim, p, q, &c, &s);
mc[2*tid] = p; vc[2 * tid] = c;
mc[2*tid + 1] = q; vc[2*tid + 1] = -s;
mc[2*tid + 2*(dim - 2*tid) - 2] = p; vc[2*tid + 2*(dim - 2*tid) - 2 ] = s;
mc[2*tid + 2*(dim - 2*tid) - 1] = q; vc[2 * tid + 2*(dim - 2*tid) - 1 ] = c;
}
}
for( i = 0; i< dim; i++){
for(kc=0; kc < dim; kc++){
if( kc < floor(dim/2)) {
vc1 = vc[2*kc + i*dim];
vc2 = vc[2*kc + 2*(dim - 2*kc) - 2];
}else {
vc1 = vc[2*kc+1 + i*dim];
vc2 = vc[2*kc - 2*(dim - 2*kc) - 1];
}
res[kc + i*dim] = A[mc[2*kc] + i*dim]*vc1 + A[mc[2*kc + 1] + i*dim]*vc2;
}
}
zero(A, dim);
for( i = 0; i< dim; i++){
for(kc=0; kc < dim; kc++){
if( kc < floor(dim/2)){
vc1 = vc[2*kc + i*dim];
vc2 = vc[2*kc + 2*(dim - 2*kc) - 2];
}else {
vc1 = vc[2*kc+1 + i*dim];
vc2 = vc[2*kc - 2*(dim - 2*kc) - 1];
}
A[kc + i*dim] = res[mc[2*kc] + i*dim]*vc1 + res[mc[2*kc + 1] + i*dim]*vc2;
}
}
affiche(mc,dim,2,"Matrice creuse");
affiche(vc,dim,2,"Valeur creuse");
}
free(mc);
free(vc);
free(res);
return nrot;
}
When I try to compile, I get this error:
jacobi_gpu.c: In function ‘jacobi_gpu’:
jacobi_gpu.c:103: error: array subscript is not an integer
jacobi_gpu.c:103: error: array subscript is not an integer
jacobi_gpu.c:118: error: array subscript is not an integer
jacobi_gpu.c:118: error: array subscript is not an integer
make: *** [jacobi_gpu.o] Erreur 1
The corresponding lines are where I store the results in res and A:
res[kc + i*dim] = A[mc[2*kc] + i*dim]*vc1 + A[mc[2*kc + 1] + i*dim]*vc2;
and
A[kc + i*dim] = res[mc[2*kc] + i*dim]*vc1 + res[mc[2*kc + 1] + i*dim]*vc2;
Can someone explain what this error means and how I can correct it?
Thanks for your help. ;)
mc is a pointer to double, so its elements are of type double. An array subscript has to have an integral type.
A[mc[2*kc + 1] + i*dim]
In the expression above, you are indexing A with a value taken from mc (a double array), and there are other similar cases. If you are sure the values are whole numbers, cast them to int.
Your declaration of mc:
mc = (double *)malloc(2 * dim * sizeof(double));
And then you use mc multiple times in your array access. For example:
A[mc[2*kc + 1] ...]
Can you change mc to be an int array instead of a double?
Looks like you're using entries in mc, which are doubles, as a part of array subscripts, thus making the entire subscript a double.
If you meant to do this, try casting back to an integer. I don't know what the context of this problem is, but I'd take a real good look at what you're doing to ensure you really want to use the contents of mc as a subscript.
The compiler is complaining because the expression you use as an array index evaluates to type double.
In other words, the expression:
mc[2*kc] + i*dim
...will give you a result which is of type double. You may want to look into the rules for usual arithmetic type conversions in C if you don't understand why this expression evaluates to a double.
The problem is that array indices must be integral types, like int or long. This is because the array subscript operator in C is basically shorthand for pointer arithmetic. In other words, saying array[N] is the same as saying *(array + N). But you can't do pointer arithmetic with non-integral types like float or double, so of course the array subscript operator won't work that way either.
To fix this, you'll need to cast the result of your array-indexing expression to an integral type.
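A minimal sketch of such a cast, assuming every value stored in mc is a whole, non-negative number that stays within the bounds of A. The helper name elem_via_double_index is hypothetical and does not appear in the question's code:

```c
#include <stddef.h>

/* Hypothetical helper: reads A[mc[k] + i*dim], where mc holds indices
 * stored as doubles. The cast truncates toward zero, so this is only
 * safe if every mc[k] is a whole, non-negative, in-range value. */
double elem_via_double_index(const double *A, const double *mc,
                             size_t k, size_t i, size_t dim)
{
    size_t col = (size_t)mc[k];   /* double -> integer index */
    return A[col + i * dim];
}
```

Casting only the value read from mc, rather than the whole subscript, keeps the remaining index arithmetic in integer types.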
mc is an array of doubles, and floating point values cannot be used to index arrays. I notice that nowhere in your code do you assign anything other than integers to mc. You should consider changing mc's type to an array of integers.
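As a rough sketch of that change, assuming mc only ever holds the row/column indices p and q: the index pairs can be stored in an int array, after which the subscript needs no cast. Both function names below are hypothetical and only illustrate the idea:

```c
#include <stdlib.h>

/* Hypothetical helper: allocates 2*dim integer slots for the (p, q)
 * index pairs, zero-initialised. Returns NULL on failure; the caller
 * frees the result. */
int *alloc_index_pairs(size_t dim)
{
    return calloc(2 * dim, sizeof(int));
}

/* With integer indices, the subscript expression is integral as-is.
 * Assumes the stored indices are non-negative and in range. */
double row_combine(const double *A, const int *mc, size_t kc, size_t i,
                   size_t dim, double vc1, double vc2)
{
    return A[mc[2 * kc] + i * dim] * vc1
         + A[mc[2 * kc + 1] + i * dim] * vc2;
}
```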
I am new to the C language :)
Although the code runs without errors, I cannot understand how it operates. Take k = i / j * j as an example: according to the rules of mathematics the answer should be k = 2, but the program outputs 0. Can anyone explain which rule C is applying here?
int i = 2, j=3, k,l;
float a, b;
k = i / j * j;
l = j / i * i;
a = i / j * j;
b = j / i * i;
printf( "%d %d %f %f", k, l, a, b );
In C, the / and * operators have the same precedence and associate left to right, so the expression is evaluated as follows.
k = i / j * j;
= (i / j) * j;
= (2 / 3) * 3
= 0 * 3
= 0
Note: https://en.cppreference.com/w/c/language/operator_precedence
(As you probably know, division between two integers in C discards the fractional part, so 2 / 3 evaluates to 0.)
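For comparison, here is a small sketch of the same expression with and without integer truncation. The function names are made up for illustration; the second one converts one operand to float before dividing, which is one common way to keep the fraction:

```c
/* Integer version: i / j is evaluated first and truncates toward zero. */
int int_div_mul(int i, int j)
{
    return i / j * j;
}

/* Converting one operand to float first makes the division a
 * floating-point division, so the fractional part is kept. */
float float_div_mul(int i, int j)
{
    return (float)i / j * j;
}
```

Note that in the question, a = i / j * j still performs the division on integers; the result is converted to float only when it is assigned.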
I need to pass a two-dimensional array to a function as a single pointer. There are several approaches, but due to some constraints (code generation) I want to pass only a single pointer. I have macros that contain the size of each dimension. I implemented it the following way, but I am not sure whether it will also work for N dimensions.
#define size_1D 3
#define size_2D 3
void fun(int *arr)
{
int i,total_size = size_1D* size_2D;
for(i = 0; i < total_size ; i++)
{
int value = arr[i];
}
}
int main()
{
int arr[size_1D][size_2D] = {{1,2,7},{8,4,9}};
fun(&arr[0][0]);
}
Is there any loophole in the above approach?
void fun(int (*arr)[3]);
or exactly equivalent, but maybe more readable:
void fun(int arr[][3]);
arr is a two-dimensional array with 3 rows and 3 columns. When passed to a function, arr decays to a pointer to its first element, which has the type pointer to an array of 3 ints, so the function needs to take a pointer to an array of 3 elements. You can then access the data normally, using arr[a][b].
#define size_1D 3
#define size_2D 3
void fun(int arr[][3])
{
for(int i = 0; i < size_1D ; i++) {
for(int j = 0; j < size_2D ; j++) {
int value = arr[i][j];
}
}
}
int main()
{
int arr[size_1D][size_2D] = {{1,2,7},{8,4,9}};
fun(arr);
}
You can pass the sizes as arguments and use a variable length array declaration inside the function parameter list. The compiler then does the index arithmetic for you.
#include <stdlib.h>
void fun(size_t xmax, size_t ymax, int arr[xmax][ymax]);
// is equivalent to
void fun(size_t xmax, size_t ymax, int arr[][ymax]);
// is equivalent to
void fun(size_t xmax, size_t ymax, int (*arr)[ymax]);
void fun(size_t xmax, size_t ymax, int arr[xmax][ymax])
{
for(int i = 0; i < xmax ; i++) {
for(int j = 0; j < ymax ; j++) {
int value = arr[i][j];
}
}
}
int main()
{
int arr[3][4] = {{1,2,7},{8,4,9}};
fun(3, 4, arr);
}
#edit
We know that the result of array subscript operator is exactly identical to pointer dereference operator of the sum:
a[b] <=> *(a + b)
From pointer arithmetic we know that:
type *pnt;
int a;
pnt + a = (typeof(pnt))(void*)((uintptr_t)(void*)pnt + a * sizeof(*pnt))
pnt + a = (type*)(void*)((uintptr_t)(void*)pnt + a * sizeof(type))
And that an array, when converted to a pointer, has the same value as a pointer to its first element:
type pnt[A];
assert((uintptr_t)pnt == (uintptr_t)&pnt[0]);
assert((uintptr_t)pnt == (uintptr_t)&*(pnt + 0));
assert((uintptr_t)pnt == (uintptr_t)&*pnt);
So:
int arr[A][B];
then:
arr[x][y]
is equivalent to (ignore warnings, kind-of pseudocode):
*(*(arr + x) + y)
*( *(int[A][B])( (uintptr_t)arr + x * sizeof(int[B]) ) + y )
// ---- x * sizeof(int[B]) = x * B * sizeof(int)
*( *(int[A][B])( (uintptr_t)arr + x * B * sizeof(int) ) + y )
// ---- C11 6.5.2.1p3
*( (int[B])( (uintptr_t)arr + x * B * sizeof(int) ) + y )
*(int[B])( (uintptr_t)( (uintptr_t)arr + x * B * sizeof(int) ) + y * sizeof(int) )
// ---- *(int[B])( ... ) = (int)dereference( ... ) = *(int*)( ... )
// ---- loose braces - conversion from size_t to uintptr_t should be safe
*(int*)( (uintptr_t)arr + x * B * sizeof(int) + y * sizeof(int) )
*(int*)( (uintptr_t)arr + ( x * B + y ) * sizeof(int) )
*(int*)( (uintptr_t)( &*arr ) + ( x * B + y ) * sizeof(int) )
// ---- (uintptr_t)arr = (uintptr_t)&arr[0][0]
*(int*)( (uintptr_t)( &*(*(arr + 0) + 0) ) + ( x * B + y ) * sizeof(int) )
*(int*)( (uintptr_t)( &arr[0][0] ) + ( x * B + y ) * sizeof(int) )
*(int*)( (uintptr_t)&arr[0][0] + ( x * B + y ) * sizeof(int) )
// ---- decayed typeof(&arr[0][0]) = int*
*( &arr[0][0] + ( x * B + y ) )
(&arr[0][0])[x * B + y]
So:
arr[x][y] == (&arr[0][0])[x * B + y]
arr[x][y] == (&arr[0][0])[x * sizeof(*arr)/sizeof(**arr) + y]
On a sane architecture, where sizeof(uintptr_t) == sizeof(size_t) == sizeof(int*) == sizeof(int**) and so on, and where accessing data through an int* pointer is no different from accessing it through an int(*)[B] pointer, you should be safe accessing the data as a one-dimensional array through a pointer to the first array member, as the operations should be equivalent ("safe" except for out-of-bounds accesses, which are never safe).
Note that, strictly speaking, this is undefined behavior according to the C standard and will not work on all architectures. Example: there could be an architecture where data of type int[A] is stored in a different memory bank than int[A][B] data (by hardware, by design). The type of the pointer then tells the compiler which data bank to choose, so accessing the same data through a pointer with the same value but a different type leads to UB, because the compiler chooses a different data bank to access the data.
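One way to sidestep the question entirely, sketched here under the assumption that the caller is free to choose the storage, is to keep the data in a one-dimensional buffer from the start and compute the row-major index x * B + y explicitly. The names flat_index and sum_flat are hypothetical:

```c
#include <stddef.h>

/* Hypothetical helpers: treat a one-dimensional buffer of rows*cols
 * ints as a row-major matrix. Element (x, y) lives at x * cols + y. */
size_t flat_index(size_t x, size_t y, size_t cols)
{
    return x * cols + y;
}

long sum_flat(const int *buf, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t x = 0; x < rows; x++)
        for (size_t y = 0; y < cols; y++)
            total += buf[flat_index(x, y, cols)];
    return total;
}
```

Because buf really is a single array of rows * cols ints here, every access stays inside one array object and only a single int* is passed around.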
I want to transform a 2D array into a 1D array. Here is the most important part of my code.
int mask[3][3] = {{0, -1, 0}, {-1, 4, -1}, {0, -1, 0}};
for (i = 1; i < rows - 1; i++) {
for (j = 1; j < cols - 1;j++) {
int s;
s = mask[0][0] * image[i-1][j-1]
+ mask[0][1] * image[i-1][j]
+ mask[0][2] * image[i-1][j+1]
+ mask[1][0] * image[i][j-1]
+ mask[1][1] * image[i][j]
+ mask[1][2] * image[i][j+1]
+ mask[2][0] * image[i+1][j-1]
+ mask[2][1] * image[i+1][j]
+ mask[2][2] * image[i+1][j+1];
}
}
My 1D version:
for (k = rows + 1; k < (cols * rows) / 2; k++) {
int s;
s = 0 * image_in[k-rows-1]
- 1 * image_in[k-rows]
+ 0 * image_in[k-rows+1]
- 1 * image_in[k-1]
+ 4 * image_in[k]
- 1 * image_in[k+1]
+ 0 * image_in[k+rows-1]
- 1 * image_in[k+rows]
+ 0 * image_in[k+rows+1];
}
That should be the same, but I don't know whether I am doing the transformation correctly. Can someone tell me if it is OK?
First of all: Why do you want to get away from the 2D array? Do you think that 2D array dimensions must be constant? In that case I have good news for you: you are wrong. This code should work perfectly:
int width = ..., height = ...;
//Create a 2D array on the heap with dynamic sizes:
int (*image_in)[width] = malloc(height * sizeof(*image_in));
//initialize the array
for(int i = 0; i < height; i++) {
for(int j = 0; j < width; j++) {
image_in[i][j] = ...;
}
}
You see, apart from the somewhat cryptic declaration of the array pointer, the indexing remains exactly the same as with an automatic 2D array on the stack.
Within your loop, you want to address the cells relative to the center cell. This is most easily done by actually addressing relative to that cell:
for (i = 1; i < rows - 1; i++) {
for (j = 1; j < cols - 1;j++) {
int* center = &image_in[i][j];
int s = mask[0][0] * center[-width - 1]
+ mask[0][1] * center[-width]
+ mask[0][2] * center[-width + 1]
+ mask[1][0] * center[-1]
+ mask[1][1] * center[0]
+ mask[1][2] * center[1]
+ mask[2][0] * center[width - 1]
+ mask[2][1] * center[width]
+ mask[2][2] * center[width + 1];
}
}
This works because the 2D array has the same memory layout as your 1D array (this is guaranteed by the C standard).
The edge handling in a 1D loop is always wrong: It will execute the body of the loop for the first and last cells of each line. This cannot be fixed without introducing some if() statements into the loop which will significantly slow things down.
This may be ignored if the consequences are proven to be irrelevant (you still need to exclude the first and last lines plus a cell). However, the edge handling is much easier if you stick to a 2D array.
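To make the edge handling concrete, here is a minimal sketch that keeps the two nested loops but works on a flat, row-major buffer. It assumes the image is stored row by row with cols ints per row; the function name apply_mask and the separate output buffer are illustrative choices, not part of the question:

```c
#include <stddef.h>

/* Hypothetical helper: applies the 3x3 mask from the question to a
 * row-major image of rows x cols ints and writes the result to out.
 * Border cells are skipped, so every neighbour access stays in bounds. */
void apply_mask(const int *in, int *out, size_t rows, size_t cols)
{
    static const int mask[3][3] = {{0, -1, 0}, {-1, 4, -1}, {0, -1, 0}};

    for (size_t i = 1; i + 1 < rows; i++) {
        for (size_t j = 1; j + 1 < cols; j++) {
            int s = 0;
            for (size_t di = 0; di < 3; di++)
                for (size_t dj = 0; dj < 3; dj++)
                    s += mask[di][dj] * in[(i + di - 1) * cols + (j + dj - 1)];
            out[i * cols + j] = s;
        }
    }
}
```

The loop bounds i + 1 < rows and j + 1 < cols exclude the first and last row and column without any if() inside the loop body.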
If the first part of your code gives you expected result, then you can do the same with 1d array this way :
for (i = 1; i < rows - 1; i++) {
for (j = 1; j < cols - 1;j++) {
int s;
s = mask[0][0] * image_in[i-1+rows*(j-1)]
+ mask[0][1] * image_in[i-1+rows*j]
+ mask[0][2] * image_in[i-1+rows*(j+1)]
+ mask[1][0] * image_in[i+rows*(j-1)]
+ mask[1][1] * image_in[i+rows*j]
+ mask[1][2] * image_in[i+rows*(j+1)]
+ mask[2][0] * image_in[i+1+rows*(j-1)]
+ mask[2][1] * image_in[i+1+rows*j]
+ mask[2][2] * image_in[i+1+rows*(j+1)];
}
}
This way, if you are good with 2d arrays, you can do the same without error with 1d array as if they were 2d.
I am trying to implement a Navier-Stokes solver in 2D using CUDA. I am using Jacobi's method to solve the system of difference equations. I am dividing the code in 4x4 blocks consisting of 16x16 threads. As every inner point in my matrix (of dimension 64x64) requires its top, bottom, left and right element to compute its new value, I create a new shared matrix of 18x18 dimension for every block. I read all the values into the matrix in this fashion - The thread with indices (0, 0) will write its value into the (1, 1) element in the matrix and will also attempt to read the element above it and the one to its left if this access is not exceeding the boundary. Once this read is done, I update the values of all the internal points and then write them back into memory.
I end up getting garbage values in the matrix pn, even though all the values are initialized correctly. I honestly cannot see where I'm going wrong. Can someone help me with this?
My kernel -
__global__ void red_psi (float *psi_o, float *psi_n, float *e, float *omega, float l1)
{
// m = n = 64
int i1 = blockIdx.x;
int j1 = blockIdx.y;
int i2 = threadIdx.x;
int j2 = threadIdx.y;
int i = (i1 * blockDim.x) + i2; // Actual row of the element
int j = (j1 * blockDim.y) + j2; // Actual column of the element
int l = i * n + j;
// e_XX --> variables refers to expanded shared memory location in order to accomodate halo elements
//Current Local ID with radius offset.
int e_li = i2 + 1;
int e_lj = j2 + 1;
// Variable pointing at top and bottom neighbouring location
int e_li_prev = e_li - 1;
int e_li_next = e_li + 1;
// Variable pointing at left and right neighbouring location
int e_lj_prev = e_lj - 1;
int e_lj_next = e_lj + 1;
__shared__ float po[BLOCK_SIZE + 2][BLOCK_SIZE + 2];
__shared__ float pn[BLOCK_SIZE + 2][BLOCK_SIZE + 2];
__shared__ float oo[BLOCK_SIZE + 2][BLOCK_SIZE + 2];
//__shared__ float ee[BLOCK_SIZE + 2][BLOCK_SIZE + 2];
if (i2 < 1) // copy top and bottom halo
{
//Copy Top Halo Element
if (blockIdx.y > 0) // Boundary check
{
po[i2][e_lj] = psi_o[l - n];
//pn[i2][e_lj] = psi_n[l - n];
oo[i2][e_lj] = omega[l - n];
//printf ("i_pn[%d][%d] = %f\n", i2, e_lj, oo[i2][e_lj]);
}
//Copy Bottom Halo Element
if (blockIdx.y < (gridDim.y - 1)) // Boundary check
{
po[1 + BLOCK_SIZE][e_lj] = psi_o[l + n];
//pn[1 + BLOCK_SIZE][e_lj] = psi_n[l + n];
oo[1 + BLOCK_SIZE][e_lj] = omega[l + n];
//printf ("j_pn[%d][%d] = %f\n", 1 + BLOCK_SIZE, e_lj, oo[1 + BLOCK_SIZE][e_lj]);
}
}
if (j2 < 1) // copy left and right halo
{
if (blockIdx.x > 0) // Boundary check
{
po[e_li][j2] = psi_o[l - 1];
//pn[e_li][j2] = psi_n[l - 1];
oo[e_li][j2] = omega[l - 1];
//printf ("k_pn[%d][%d] = %f\n", e_li, j2, oo[e_li][j2]);
}
if (blockIdx.x < (gridDim.x - 1)) // Boundary check
{
po[e_li][1 + BLOCK_SIZE] = psi_o[l + 1];
//pn[e_li][1 + BLOCK_SIZE] = psi_n[l + 1];
oo[e_li][1 + BLOCK_SIZE] = omega[l + 1];
//printf ("l_pn[%d][%d] = %f\n", e_li, 1 + BLOCK_SIZE, oo[e_li][BLOCK_SIZE + 1]);
}
}
// copy current location
po[e_li][e_lj] = psi_o[l];
//pn[e_li][e_lj] = psi_n[l];
oo[e_li][e_lj] = omega[l];
//printf ("o_pn[%d][%d] = %f\n", e_li, e_lj, oo[e_li][e_lj]);
__syncthreads ();
// Checking whether we have an internal point.
if ((i >= 1 && i < (m - 1)) && (j >= 1 && j < (n - 1)))
{
//printf ("Calculating for - (%d, %d)\n", i, j);
pn[e_li][e_lj] = 0.25 * (po[e_li_next][e_lj] + po[e_li_prev][e_lj] + po[e_li][e_lj_next] + po[e_li][e_lj_prev] + h*h*oo[e_li][e_lj]);
//printf ("n_pn[%d][%d] (%d, %d), a(%d, %d) = %f\n", e_li_prev, e_lj, i1, j1, i, j, po[e_li_prev][e_lj]);
pn[e_li][e_lj] = po[e_li][e_lj] + 1.0 * (pn[e_li][e_lj] - po[e_li][e_lj]);
__syncthreads ();
psi_n[l] = pn[e_li][e_lj];
e[l] = po[e_li][e_lj] - pn[e_li][e_lj];
}
}
This is how I invoke the kernel -
dim3 threadsPerBlock (4, 4);
dim3 numBlocks (4, 4);
red_psi<<<numBlocks, threadsPerBlock>>> (d_xn, d_xx, d_e, d_w, l1);
(d_xx, d_xn, d_e, d_w are all float arrays of size 4096)
I had switched blockIdx.x and blockIdx.y in the boundary checks when copying the top / bottom and the left / right halo elements.
I have implemented the following quicksort algorithm to sort pairs of points (in 3D space).
Every pair defines a line: the purpose is to place all lines whose distance is less than or equal to powR next to each other inside the array that contains all the pairs.
The array containing the coordinates is one-dimensional: every 6 elements define a pair and every 3 a point.
When I run the algorithm on an array of 3099642 elements, it stops after processing 2799222 of them, while trying to enter the next iteration. If I start the algorithm from element 2799228, it stops at 3066300.
I can't figure out where the problem is. Any suggestions?
void QuickSort(float *array, int from, int to, float powR){
float pivot[6];
float temp[6];
float x1;
float y1;
float z1;
float x2;
float y2;
float z2;
float d12;
int i;
int j;
if(from >= to)
return;
pivot[0] = array[from+0];
pivot[1] = array[from+1];
pivot[2] = array[from+2];
pivot[3] = array[from+3];
pivot[4] = array[from+4];
pivot[5] = array[from+5];
i = from;
for(j = from+6; j <= to; j += 6){
x1 = pivot[0] - array[j+0];
y1 = pivot[1] - array[j+1];
z1 = pivot[2] - array[j+2];
x2 = pivot[3] - array[j+3];
y2 = pivot[4] - array[j+4];
z2 = pivot[5] - array[j+5];
d12 = (x1*x1 + y1*y1 + z1*z1) + (x2*x2 + y2*y2 + z2*z2);
/*the sorting condition i am using is the regular euclidean norm*/
if (d12 <= powR){
i += 6;
temp[0] = array[i+0];
temp[1] = array[i+1];
temp[2] = array[i+2];
temp[3] = array[i+3];
temp[4] = array[i+4];
temp[5] = array[i+5];
array[i+0] = array[j+0];
array[i+1] = array[j+1];
array[i+2] = array[j+2];
array[i+3] = array[j+3];
array[i+4] = array[j+4];
array[i+5] = array[j+5];
array[j+0] = temp[0];
array[j+1] = temp[1];
array[j+2] = temp[2];
array[j+3] = temp[3];
array[j+4] = temp[4];
array[j+5] = temp[5];
}
}
QuickSort(array, i+6, to, powR);
}
The function is called this way:
float *LORs = (float *) calloc((unsigned)tot, sizeof(float));
LORs is filled by reading data from a file, which works fine.
QuickSort(LORs, 0, 6000, powR);
free(LORs);
for(j = from+6; j <= to; j += 6) {
array[i+0] = array[j+0];
array[i+1] = array[j+1];
array[i+2] = array[j+2];
array[i+3] = array[j+3];
array[i+4] = array[j+4];
array[i+5] = array[j+5];
}
Your j + constant_number index goes out of bounds as you approach the end of the array, which is why it crashes there. Note that constant_number is non-negative.
When j gets close to the end of your array (how close depends on the increment step, i.e. +6), the index is certain to go out of bounds.
Take the easy case: the maximum value j can reach. That is the size of your array.
Let's call it N.
When j is equal to N, you still enter the loop.
Then you access array[j + 0], which is array[N + 0], i.e. array[N].
As I am sure you know, indexing in C (a tag you should add to your questions in future) runs from 0 to N - 1. The same reasoning applies to array[j + 1] through array[j + 5].
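A minimal sketch of a loop whose bound keeps the whole 6-float record in range, assuming the caller passes the total number of floats in the array. The function name group_near_first and the REC macro are hypothetical; the distance test is the same sum of squared differences used in the question, and this is a single grouping pass, not a sort:

```c
#include <stddef.h>
#include <string.h>

#define REC 6  /* floats per record: two 3D points */

/* Hypothetical helper: moves every record whose squared distance to the
 * first record is <= powR next to it, and returns how many records
 * (including the first) form that group. n_floats is the total number
 * of floats in array; the loop condition j + REC <= n_floats guarantees
 * that array[j] .. array[j + 5] are all valid. */
size_t group_near_first(float *array, size_t n_floats, float powR)
{
    if (n_floats < REC)
        return 0;

    size_t i = 0;                       /* start of last record in group */
    for (size_t j = REC; j + REC <= n_floats; j += REC) {
        float d = 0.0f;
        for (size_t c = 0; c < REC; c++) {
            float diff = array[c] - array[j + c];
            d += diff * diff;
        }
        if (d <= powR) {
            i += REC;
            if (i != j) {               /* swap records i and j */
                float temp[REC];
                memcpy(temp, &array[i], sizeof temp);
                memcpy(&array[i], &array[j], sizeof temp);
                memcpy(&array[j], temp, sizeof temp);
            }
        }
    }
    return i / REC + 1;
}
```

Passing a length and testing j + REC <= n_floats avoids the ambiguity of an inclusive to argument, which is where the out-of-bounds access described above comes from.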
EDIT: As the comments suggest, this is not a (quick)sort!
I have implemented quickSort here, if you want to get an idea of it. I suggest you start from the explanations and not from the code!