2D array with CUDA and cudaMallocPitch - c

I have been reading a few threads on Stack Overflow about 2D arrays and cudaMallocPitch, and I have tried to use cudaMallocPitch with the little documentation I have found. However, I'm now facing a problem.
I need to go through an array and do something similar to this:
for(int k=0; k<100; ++k){
for(i=SID; i<SID+stride; ++i){
while(-1 < j && Driver[k][j] != Road[i]){
j = Pilot[j][k];
}
++j;
}
}
I was thus wondering, how should I adapt this code to make it work with the pitch, because I have read that I had to update the pointer to the beginning of the row. Of course my kernel receives the following :
__global__ void driving(char *Driver, size_t pitch_driver,
char *Road, int *Pilot, size_t pitch_pilot)
I'm not really sure how to make this work. I've been reading and trying, but so far without success.
Thank you.
Edit 1: I have been reading this thread in particular :How to use 2D Arrays in CUDA? and came across the lines :
for (int row = 0; row < rowCount; row++)
{
// update the pointer to point to the beginning of the next row
float* rowData = (float*)(((char*)d_array) + (row * pitch));
for (int column = 0; column < columnCount; column++)
{
rowData[column] = 123.0; // make every value in the array 123.0
destinationArray[(row*columnCount) + column] = rowData[column];
}
}
This updates the pointer to the next row, but I am not sure how to use it to make my two for loops and the while loop work as in the previous code.
At the moment I can only access one dimension of my array but not the other.
It returns the value 2, but when I try my multiple comparisons it only returns 0, and even comparing two values does not work.

In the CUDA Reference Manual it says:
5.8.2.17 cudaError_t cudaMallocPitch (void **devPtr, size_t *pitch, size_t width, size_t height)
[...]
Given the row and column of
an array element of type T, the address is computed as:
T* pElement = (T*)((char*)BaseAddress + Row * pitch) + Column;
So you need to cast your pointer first to char*, do the math and then cast it back to your type.
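Applied to the kernel signature in the question, the formula means that Driver[k][j] is reached through a row pointer computed from pitch_driver, and Pilot[j][k] through one computed from pitch_pilot. Below is a minimal host-side sketch of that address arithmetic in plain C. The helper names char_at and int_at are made up for illustration and are not part of the CUDA API; the same expressions should carry over into device code.

```c
#include <assert.h>
#include <stddef.h>

/* Element (row, col) of a pitched char array: rows are pitch bytes apart. */
char *char_at(char *base, size_t pitch, size_t row, size_t col)
{
    return base + row * pitch + col;
}

/* Element (row, col) of a pitched int array: step to the row in bytes,
 * cast back to int*, then index the column in elements. */
int *int_at(int *base, size_t pitch, size_t row, size_t col)
{
    assert(pitch % sizeof(int) == 0);   /* rows must stay int-aligned */
    return (int *)((char *)base + row * pitch) + col;
}
```

With these helpers, the loop condition in the question would read *char_at(Driver, pitch_driver, k, j) != Road[i], and the update would read j = *int_at(Pilot, pitch_pilot, j, k);.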

Related

Find the indices of the k smallest values in C

I'm implementing K Nearest Neighbor in C and I've gotten to the point where I've computed a distance matrix of every point in my to-be-labeled set of size m to every point in my already-labeled set of size n. The format of this matrix is
[[dist_0,0 ... dist_0,n-1]
.
.
.
[dist_m-1,0 ... dist_m-1,n-1]]
Next, I need to find the k smallest distances in each row so I can use the column indices to access the labels of those points and then compute the label for the point the row index is referring to. The latter part is trivial but computing the indices of the k smallest distances has me stumped. Python has easy ways to do something like this but the bare bones nature of C has gotten me a bit frustrated. I'd appreciate some pointers (no pun intended) on what to go about doing and any helpful functions C might have to help.
Without knowing k, and assuming that it can be variable, the simplest way to do this would be to:
Organize each element in a structure which holds the original column index.
Sort each row of the matrix in ascending order and take the first k elements of that row.
struct item {
unsigned value;
size_t index;
};
int compare_items(const void *a, const void *b) {
const struct item *item_a = a;
const struct item *item_b = b;
if (item_a->value < item_b->value)
return -1;
if (item_a->value > item_b->value)
return 1;
return 0;
}
// Your matrix: M rows (points to label), N columns (labeled points)
struct item matrix[M][N];
/* Populate the matrix... make sure that each index is set,
* e.g. matrix[0][0] has index = 0.
*/
size_t i, j;
for (i = 0; i < M; i++) {
qsort(matrix[i], N, sizeof(struct item), compare_items);
/* Now the i-th row is sorted and you can take a look
* at the first k elements of the row.
*/
for (j = 0; j < k; j++) {
// Do something with matrix[i][j].index ...
}
}
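If the distances are stored as plain double values rather than in structs, the same idea can be applied one row at a time by copying the row into temporary (value, index) pairs. This is only a sketch under that assumption; dist_item, cmp_dist_item and k_smallest_indices are illustrative names, not an existing API.

```c
#include <assert.h>
#include <stdlib.h>

struct dist_item {
    double value;   /* the distance */
    size_t index;   /* original column of that distance */
};

static int cmp_dist_item(const void *a, const void *b)
{
    const struct dist_item *x = a, *y = b;
    return (x->value > y->value) - (x->value < y->value);
}

/* Writes the column indices of the k smallest entries of row[0..n-1]
 * into out[0..k-1], smallest first. Returns 0 on success, -1 if the
 * temporary buffer cannot be allocated. The input row is not modified. */
int k_smallest_indices(const double *row, size_t n, size_t k, size_t *out)
{
    assert(k <= n);
    if (n == 0)
        return 0;
    struct dist_item *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        tmp[i].value = row[i];
        tmp[i].index = i;
    }
    qsort(tmp, n, sizeof *tmp, cmp_dist_item);
    for (size_t j = 0; j < k; j++)
        out[j] = tmp[j].index;
    free(tmp);
    return 0;
}
```

Sorting the whole row costs O(n log n) even when k is small, which is usually acceptable for a first version. Note that qsort is not guaranteed to be stable, so the order of equal distances is unspecified.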

2-D matrices using malloc failing to assign correct values

Sorry, I'm relatively new to C and am trying to create two 2-D arrays using malloc. I was told that this method is computationally more efficient than creating a pointer array of arrays through a for loop (for large arrays).
int i, j;
double **PNow, **PNext, *Array2D1, *Array2D2;
//Allocate memory
PNow = (double**)malloc(3 * sizeof(double*));
PNext = (double**)malloc(3 * sizeof(double*));
Array2D1 = (double*)malloc(5 * sizeof(double));
Array2D2 = (double*)malloc(5 * sizeof(double));
//Create 2-Dimensionality
for(i = 0; i < 3; i++)
{
PNow[i] = Array2D1 + i * 5;
PNext[i] = Array2D2 + i * 5;
};
//Define Element Values
for(i = 0; i < 3; i++)
{
for(j = 0; j < 5; j++)
{
PNow[i][j] = 10.*(i + j);
PNext[i][j] = 1000.*(i + j);
};
};
//Output two matrices side-by-side.
for(i = 0; i < 3; i++)
{
for(j = 0; j < 5; j++)
{
printf("%6lg", PNow[i][j]);
if(j == 4)
{
printf("|");
};
};
for(j = 0; j < 5; j++)
{
printf("%6lg", PNext[i][j]);
if(j == 4)
{
printf("\n");
};
};
};
My problem is that the first matrix (PNow) turns out as I would expect, but for some reason half of the values in PNext are those of PNow, and I can't for the life of me figure out why it is doing this? I'm obviously missing something.. Also I am not overly clear on what "Array2D1 + i*5" is doing and how this makes PNow a 2-D array?
Any help would be really appreciated.
Thank you.
P.S. This is the output that I am getting, so you can see what I mean:
0 10 20 30 40| 20 30 40 50 20
10 20 30 40 50| 30 40 50 60 5000
20 30 40 50 60| 2000 3000 4000 5000 6000
In C you don't cast the result of mallocs, so your malloc lines should read
PNow = malloc(3*sizeof(double*));
Your problem is you're not actually allocating enough memory in Array2D1 and Array2D2. When you move past the first "row" in your array you're getting beyond your allocated memory! So you're in undefined behavior territory. In your case, it looks like your two matrices step all over each other (though my test just throws an error). You can solve this in two ways:
Specify the full size of your matrix in the malloc and do as you did:
Array2D1 = malloc(15*sizeof(double));
Array2D2 = malloc(15*sizeof(double));
Or malloc each line in your for loop:
for(i=0; i<3; i++){
PNow[i] = malloc(5*sizeof(double));
PNext[i] = malloc(5*sizeof(double));
}
Edit: On the topic of freeing in each example
For the first example, the freeing is straightforward
free(PNow);
free(PNext);
free(Array2D1);
free(Array2D2);
For the second, you must iterate through each line and free it individually, then free the pointer arrays themselves
for (i = 0; i < 3; i++) {
free(PNow[i]);
free(PNext[i]);
}
free(PNow);
free(PNext);
Edit2: Realistically, if you're going to hardcode your rows and columns in with a pre-processor macro, there's no reason to malloc at all. You can simply do this:
#define ROW 3
#define COL 5
double PNow[ROW][COL], PNext[ROW][COL];
Edit3: As for what Array2D1 + i * 5 is doing, PNow is an array of pointers, and Array2D1 is a pointer. By adding i * 5 you're incrementing the pointer by i * 5 (i.e., saying "give me a pointer to the memory that is i * 5 doubles away from Array2D1"). So, you're filling PNow with pointers to the starts of appropriately sized memory chunks for your rows.
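Putting the pieces together, the single-block approach from the question could be wrapped as follows. This is a sketch only; matrix_alloc and matrix_free are hypothetical helper names, and the key point is that the data block holds rows * cols doubles, not just one row.

```c
#include <assert.h>
#include <stdlib.h>

/* One block for the row pointers, one contiguous block for all elements. */
double **matrix_alloc(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    double **m = malloc(rows * sizeof *m);
    double *data = malloc(rows * cols * sizeof *data);
    if (m == NULL || data == NULL) {
        free(m);
        free(data);
        return NULL;
    }
    for (size_t i = 0; i < rows; i++)
        m[i] = data + i * cols;   /* row i starts i * cols doubles in */
    return m;
}

void matrix_free(double **m)
{
    if (m != NULL) {
        free(m[0]);   /* m[0] is the start of the contiguous data block */
        free(m);
    }
}
```

For the question's sizes this would be used as PNow = matrix_alloc(3, 5); and later matrix_free(PNow);.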
Your code does not have 2D arrays, aka matrices. And your pointers cannot point to such an object either.
A proper pointer which can point to a 2D array would be declared like:
#define ROWS 4
#define COLS 5
double (*arr)[COLS];
Allocation is straight-forward:
arr = malloc(sizeof(*arr) * ROWS);
Freeing is similar:
free(arr);
Indexing is like:
arr[row][col]
Note that only the syntax is identical to the pointer-to-pointer version. The semantics are different.
Nothing more necessary and no need for hand-crafted pointer arrays.
The code above shows another important rule: Don't use magic values. Use constant-like macros instead. These should be #defined at the beginning or in a configuration section of the code (typically somewhere near the top of the file or in a distinct header file). So if you later change e.g. the length of a dimension, you don't have to edit every place where you wrote it explicitly, but only change the macro once.
While the code above uses constants, you can just as well use variables for the dimensions. This is standard C and is called a variable length array (VLA). If you pass such arrays to other functions, you have to pass the dimensions as additional arguments:
void f(size_t rows, size_t cols, double a[rows][cols]);
Remember array-arguments decay to pointers to the first element, so a is actually the same as arr above. The outermost dimension can be omitted, but as you need it anyway it is good for documentation to specify it, too.
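A short sketch of how the pieces fit together with run-time dimensions. The function names are illustrative, and a compiler with VLA support is assumed (VLAs are mandatory in C99 and optional from C11 on).

```c
#include <assert.h>
#include <stdlib.h>

/* The dimensions come first so they are in scope for the array parameter. */
void matrix_fill(size_t rows, size_t cols, double a[rows][cols])
{
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            a[i][j] = (double)(i + j);
}

double matrix_sum(size_t rows, size_t cols, double a[rows][cols])
{
    double s = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            s += a[i][j];
    return s;
}

/* Allocates a rows x cols matrix in one block, fills it, sums it and
 * frees it. Returns -1.0 if the allocation fails (a simple sentinel
 * chosen for this sketch only). */
double demo_sum(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    double (*arr)[cols] = malloc(rows * sizeof *arr);
    if (arr == NULL)
        return -1.0;
    matrix_fill(rows, cols, arr);
    double s = matrix_sum(rows, cols, arr);
    free(arr);
    return s;
}
```

Only one malloc and one free are involved, and arr[i][j] indexes directly into the single block.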

Best solution to represent Data[i,j] in c?

I want to implement some pseudocode in C, but I am unsure how to implement part of it. The pseudocode is:
for every pair of states qi, and qj, i<j, do
D[i,j] := 0
S[i,j] := notzero
end for
i and j, in qi and qj are subscripts.
How do I represent D[i,j] or S[i,j]? Which data structure should I use so that it's simple and fast?
You can use something like
int length= 10;
int i =0, j= 0;
int res1[10][10] = {0, }; //index is based on "length" value
int res2[10][10] = {0, }; //index is based on "length" value
and then
for (i =0; i < length; i++)
{
for (j =0; j < length; j++)
{
res1[i][j] = 0;
res2[i][j] = 1;//notzero
}
}
Here D[i,j] and S[i,j] are represented by res1[i][j] and res2[i][j], respectively. These are called two-dimensional arrays.
I guess struct will be your friend here depending on what you actually want to work with.
Struct would be fine if, say, pair of states creates some kind of entity.
Otherwise You could use two-dimensional array.
Added after an answer was accepted.
Depending on coding goals and platform, to get "simple and fast", using a pointer to pointer to a number may be faster than a 2-D array in C.
// 2-D array
double x[MAX_ROW][MAX_COL];
// Code computes the address in `x`, often involving a i*MAX_COL, if not in a loop.
// Slower when multiplication is expensive and random array access occurs.
x[i][j] = f();
// pointer to pointer of double
double **y = calloc(MAX_ROW, sizeof *y);
for (i=0; i<MAX_ROW; i++) y[i] = calloc(MAX_COL, sizeof *(y[i]));
// Code computes the address in `y` by a lookup of y[i]
y[i][j] = f();
Flexibility
The first data type is easy to pass around, e.g. print(x), when the array size is fixed, but becomes challenging otherwise.
The second data type is easy to pass around, e.g. print(y, rows, columns), when the array size is variable, and of course works well with a fixed size too.
The second data type also allows row swapping simply by swapping pointers.
So if code is using a fixed array size, use double x[MAX_ROW][MAX_COL]; otherwise I recommend double **y. YMMV
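The row-swapping point can be illustrated with a few lines. This is a minimal sketch; swap_rows is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Exchanges rows a and b of a pointer-to-pointer matrix in constant time:
 * only the two row pointers move, no element is copied. */
void swap_rows(double **m, size_t a, size_t b)
{
    assert(m != NULL);
    double *tmp = m[a];
    m[a] = m[b];
    m[b] = tmp;
}
```

With a true 2-D array such as double x[MAX_ROW][MAX_COL], the same operation would have to copy MAX_COL elements in each direction.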

C extract an array from a matrix using pointers

I wrote a code and I have some data stored in a 2d matrix:
double y[LENGTH][2];
I have a function that take as input a 1D array:
double function(double* data)
I am interested in passing the data stored in the first column of this matrix to this function. How can I do that using pointers?
My function is something like the following, where data is an array of double containing LENGTH elements (double data[LENGTH];):
double function(double* data){
double result=0;
for(int i=0; i<LENGTH; i++){
result+=data[i];
}
return result;
}
And I want to pass to this function a row of a matrix as data input.
Thanks to everyone in advance!
If you pass a pointer to the first element of your 2D matrix, you can access it as a 1 D matrix since the elements are stored contiguously:
double y[LENGTH][2];
x = function(y[0]);
...
double function(double* p) {
int ii;
double sum=0;
for(ii=0; ii<2*LENGTH; ii++) sum += p[ii];
return sum;
}
Note that in this case the order of accessing the elements is
y[0][0]
y[0][1]
y[1][0]
y[1][1]
y[2][0]
... etc
update - you just clarified your question a little bit. If you want to access just one column of data, you need to skip through the array. This means you need to know the size of the second dimension. I would recommend something like this:
double function(double* p, int D2) {
int ii;
double sum=0;
for(ii=0; ii<D2*LENGTH; ii+=D2) sum += p[ii];
return sum;
}
And you would call it with
x = function(&y[0][colNum], numCols);
Now we start at a certain location, then, skip forward D2 elements to access the next element in the column.
I have to say that this is rather ugly - this is not really how C is intended to be used. I would recommend wrapping things into a class that handles these things for you cleanly - in other words, switch to C++ (although it's possible to write pure C functions that "hide" some of this complexity). You could of course copy the data to another memory block to make it contiguous, but that's usually considered a last recourse.
Be careful that you don't end up with code that is unreadable / unmaintainable...
further update
Per your comment, the above is still not what you wanted. Then I recommend the following:
double *colPointer(double *p, int rowCount, int colCount) {
double *cp;
int ii;
cp = malloc(rowCount * sizeof *cp);
for(ii=0; ii<rowCount; ii++) cp[ii] = *(p + ii * colCount);
return cp;
}
This will return a pointer to a newly created copy of the column. You call it with
double *cc;
cc = colPointer(&y[0][colNum], LENGTH, 2);
answer = function(cc);
And now you can use cc in the way you wanted. If you have to do this many times you might be better off transposing the entire array just once - that way you can pass a pointer to a row of the transpose and achieve your result. You can adapt the code above to generate such a transpose.
Note that there is a risk of memory leaks with this method if you don't clean up after yourself: call free(cc) once you are done with the copy.
the question is what you consider to be the row dimension.
usually the first one is rows and the second one cols.
that means that your double y[LENGTH][2]; is a matrix with LENGTH rows and 2 cols.
if that is also your interpretation then the answer to your question is "you can't", since the memory is laid out like this:
r0c0 r0c1 r1c0 r1c1 r2c0 r2c1 ...
you can retrieve a pointer to a row but not to a column.
matrix classes are usually designed so that the row and column step lengths are stored; by carefully setting them you can build sub-matrices on a big data chunk.
you may look at the opencv matrix implementation if you plan to perform more complex tasks.
if you can change the implementation of the function you want to call, you can change it to accept the row step (the number of your columns), so that it does not just increment the pointer by one to reach the next element but increments it by the row step.
as an alternative there is the obvious way: copy the required column to a new array.
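A minimal sketch of the row-step idea, assuming the called function can be changed. The name sum_strided and its parameters are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Sums count elements starting at p, stepping stride elements each time. */
double sum_strided(const double *p, size_t count, size_t stride)
{
    assert(stride > 0);
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += p[i * stride];
    return sum;
}
```

For double y[LENGTH][2], column c would be summed with sum_strided(&y[0][c], LENGTH, 2), while a single row r is simply sum_strided(y[r], 2, 1). No copy of the column is needed.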
edit:
fixed stupid error on memory layout diagram

alternative to multidimensional array in c

I have the following code:
#define FIRST_COUNT 100
#define X_COUNT 250
#define Y_COUNT 310
#define Z_COUNT 40
struct s_tsp {
short abc[FIRST_COUNT][X_COUNT][Y_COUNT][Z_COUNT];
};
struct s_tsp xyz;
I need to run through the data like this:
for (int i = 0; i < FIRST_COUNT; ++i)
for (int j = 0; j < X_COUNT; ++j)
for (int k = 0; k < Y_COUNT; ++k)
for (int n = 0; n < Z_COUNT; ++n)
doSomething(xyz, i, j, k, n);
I've tried to think of a more elegant, less brain-dead approach. ( I know that this sort of multidimensional array is inefficient in terms of cpu usage, but that is irrelevant in this case.) Is there a better approach to the way I've structured things here?
If you need a 4D array, then that's what you need. It's possible to 'flatten' it into a single dimensional malloc()ed 'array', however that is not quite as clean:
abc = malloc(sizeof(short)*FIRST_COUNT*X_COUNT*Y_COUNT*Z_COUNT);
Accesses are also more difficult:
*(abc + X_COUNT*Y_COUNT*Z_COUNT*i + Y_COUNT*Z_COUNT*j + Z_COUNT*k + n)
So that's obviously a bit of a pain.
But you do have the advantage that if you need to simply iterate over every single element, you can do:
for (int i = 0; i < FIRST_COUNT*X_COUNT*Y_COUNT*Z_COUNT; i++) {
doWhateverWith(*(abc + i));
}
Clearly this method is terribly ugly for most uses, and only a bit neater for this one type of access. Compared with a pointer-to-pointer layout, it is also a bit more memory-conservative and requires only one pointer dereference rather than four.
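The offset of an element in a flattened array follows from C's row-major layout: each index is multiplied by the product of the sizes of all dimensions after it. A small helper makes that less error-prone than writing the expression by hand. This is a sketch, and flat_index4 is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element [i][j][k][n] in a flattened 4D array whose last three
 * dimensions are nx, ny and nz (row-major, as C lays out real arrays).
 * The size of the first dimension is not needed for the computation. */
size_t flat_index4(size_t nx, size_t ny, size_t nz,
                   size_t i, size_t j, size_t k, size_t n)
{
    assert(j < nx && k < ny && n < nz);
    return ((i * nx + j) * ny + k) * nz + n;
}
```

With the macros from the question, an access would read abc[flat_index4(X_COUNT, Y_COUNT, Z_COUNT, i, j, k, n)].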
NOTE: The intention of the examples used in this post are just to explain the concepts. So the examples may be incomplete, may lack error handling, etc.
When it comes to usage of multi-dimension array in C, the following are the two possible ways.
Flattening of Arrays
In C, arrays are implemented as a contiguous memory block. This information can be used to manipulate the values stored in the array and allows rapid access to a particular array location.
For example,
int arr[10][10];
int *ptr = (int *)arr ;
ptr[11] = 10;
// this is equivalent to arr[1][1] = 10 (11 = 1*10 + 1): take a 2D array
// and manipulate it as a single-dimensional array.
The technique of exploiting the contiguous nature of arrays is known as flattening of arrays.
Ragged Arrays
Now, consider the following example.
const char *list[3];
list[0] = "United States of America";
list[1] = "India";
list[2] = "United Kingdom";
for(int i=0; i< 3 ;i++)
printf(" %zu ",strlen(list[i]));
// prints 24 5 14
This type of implementation is known as a ragged array, and is useful where strings of variable size are used. A popular method is to perform dynamic memory allocation on every dimension.
NOTE: The command line argument (char *argv[]) is passed only as ragged array.
Comparing flattened and ragged arrays
Now, let's consider the following code snippet, which compares flattened and ragged arrays.
/* Note: lacks error handling */
int flattened[30][20][10];
int ***ragged;
int i,j,numElements=0,numPointers=1;
ragged = (int ***) malloc(sizeof(int **) * 30);
numPointers += 30;
for( i=0; i<30; i++) {
ragged[i] = (int **)malloc(sizeof(int*) * 20);
numPointers += 20;
for(j=0; j<20; j++) {
ragged[i][j]=(int*)malloc(sizeof(int) * 10);
numElements += 10;
}
}
printf("Number of elements = %d",numElements);
printf("Number of pointers = %d",numPointers);
// it prints
// Number of elements = 6000
// Number of pointers = 631
From the above example, the ragged arrays require 631-pointers, in other words, 631 * sizeof(int *) extra memory locations for pointing 6000 integers. Whereas, the flattened array requires only one base pointer: i.e. the name of the array enough to point to the contiguous 6000 memory locations.
But OTOH, the ragged arrays are flexible. In cases where the exact number of memory locations required is not known you cannot have the luxury of allocating the memory for worst possible case. Again, in some cases the exact number of memory space required is known only at run-time. In such situations ragged arrays become handy.
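To illustrate the flexibility, here is a sketch of a two-dimensional ragged array whose row lengths are only known at run time. The names ragged_alloc and ragged_free are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Builds a ragged 2D array: row i gets lens[i] zero-initialized ints
 * (at least one element is allocated, so an empty row is still a valid
 * pointer). Returns NULL if any allocation fails; everything allocated
 * up to that point is released first. */
int **ragged_alloc(size_t rows, const size_t *lens)
{
    assert(rows > 0 && lens != NULL);
    int **r = calloc(rows, sizeof *r);
    if (r == NULL)
        return NULL;
    for (size_t i = 0; i < rows; i++) {
        r[i] = calloc(lens[i] ? lens[i] : 1, sizeof **r);
        if (r[i] == NULL) {
            while (i > 0)
                free(r[--i]);
            free(r);
            return NULL;
        }
    }
    return r;
}

/* Frees every row, then the array of row pointers. */
void ragged_free(int **r, size_t rows)
{
    if (r == NULL)
        return;
    for (size_t i = 0; i < rows; i++)
        free(r[i]);
    free(r);
}
```

The caller has to keep track of the row lengths itself, since nothing in the int ** records them.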
Row-major and column-major of Arrays
C follows row-major ordering for multi-dimensional arrays. Flattening of arrays can be viewed as a consequence of this aspect of C. The significance of row-major order is that it fits the natural way in which most accesses are made in programs. For example, let's look at traversing an N * M 2D matrix:
for(i=0; i<N; i++) {
for(j=0; j<M; j++)
printf("%d ", matrix[i][j]);
printf("\n");
}
Each row in the matrix is accessed one by one, by varying the column rapidly. The C array is arranged in memory in this natural way. In contrast, consider the following example:
for(i=0; i<M; i++) {
for(j=0; j<N; j++)
printf("%d ", matrix[j][i]);
printf("\n");
}
This changes the row index more frequently than the column index, and because of this there can be a large difference in efficiency between the two code snippets. Yes, the first one is more efficient than the second one!
Because the first one accesses the array in the natural order (row-major order) of C, it is faster, whereas the second one takes more time because it jumps through memory. The difference in performance widens as the number of dimensions and the size of the elements increase.
So when working with multi-dimensional arrays in C, it's good to consider the above details!
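The two traversal orders can be put side by side on a flattened matrix. This is only a sketch with made-up function names; both functions return the same sum and differ solely in the order in which memory is touched, so any speed difference would have to be measured on the target machine.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major traversal: the inner loop walks adjacent memory locations. */
long sum_by_rows(size_t rows, size_t cols, const int *m)
{
    assert(m != NULL);
    long s = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            s += m[i * cols + j];
    return s;
}

/* Column-major traversal: the inner loop jumps cols elements per step. */
long sum_by_cols(size_t rows, size_t cols, const int *m)
{
    assert(m != NULL);
    long s = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            s += m[i * cols + j];
    return s;
}
```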
