array operation using CUDA kernel - c

I'm writing a CUDA kernel in which the threads perform the following task:
for example, given the array [1, 2, 3, 4], I want the result [12, 13, 14, 23, 24, 34].
Suppose I have an array of n integers and two indices i and j.
A simple solution in C would be:
k=0;
for (i = 0; i < n - 1; i++)
for(j = i+1; j < n-1 ; j++)
{ new_array[k] = array[i]*10 + array[j];
k++;
}
In CUDA I tried this:
for(i = threadIdx.x + 1; i < n-1; i++ )
new_array[i] = array[threadIdx.x] * 10 + array[i];
But I don't think this is entirely correct, or the optimal way to do it. Can anyone suggest something better?

I'm assuming that the code you want to port to CUDA is the following:
#include <stdio.h>
#define N 7
int main(){
int array[N] = { 1, 2, 3, 4, 5, 6, 7};
int new_array[(N-1)*N/2] = { 0 };
int k=0;
for (int i = 0; i < N; i++)
for(int j = i+1; j < N; j++)
{
new_array[k] = array[i]*10 + array[j];
k++;
}
for (int i = 0; i < (N-1)*N/2; i++) printf("new_array[%d] = %d\n", i, new_array[i]);
return 0;
}
You may wish to note that you can recast the interior loop as
for (int i = 0; i < N; i++)
for(int j = i+1; j < N; j++)
new_array[i*N+(j-(i+1))-(i)*(i+1)/2] = array[i]*10 + array[j];
which avoids the explicit index variable k by computing the index i*N+(j-(i+1))-(i)*(i+1)/2 directly. This observation is useful because, if you interpret i and j as thread indices in the ported code, you get a mapping from the 2D thread indices to the index needed to access the target array in the __global__ function you have to define.
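A quick way to convince yourself that the closed-form index is right is to compare it against the running counter k on the host. The following is a minimal sketch in plain C; the names pair_index and pair_index_matches_counter are introduced here for illustration only.

```c
#include <stddef.h>

/* Closed-form position of the pair (i, j), with 0 <= i < j < n, in the
   row-major list of all pairs: (0,1), (0,2), ..., (n-2, n-1). */
size_t pair_index(size_t i, size_t j, size_t n)
{
    return i * n + (j - (i + 1)) - i * (i + 1) / 2;
}

/* Returns 1 if pair_index agrees with a running counter k for every pair. */
int pair_index_matches_counter(size_t n)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (pair_index(i, j, n) != k)
                return 0;
            k++;
        }
    }
    /* k must also end up equal to the number of pairs */
    return k == n * (n - 1) / 2;
}
```

Calling pair_index_matches_counter(N) for the N you intend to use checks every (i, j) pair before the formula goes into a kernel.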
Accordingly, the ported code is
#include <stdio.h>
#define N 7
__global__ void kernel(int* new_array_d, int* array_d) {
int i = threadIdx.x;
int j = threadIdx.y;
if (j > i) new_array_d[i*N+(j-(i+1))-(i)*(i+1)/2] = array_d[i]*10 + array_d[j];
}
int main(){
int array[N] = { 1, 2, 3, 4, 5, 6, 7};
int new_array[(N-1)*N/2] = { 0 };
int* array_d; cudaMalloc((void**)&array_d,N*sizeof(int));
int* new_array_d; cudaMalloc((void**)&new_array_d,(N-1)*N/2*sizeof(int));
cudaMemcpy(array_d,array,N*sizeof(int),cudaMemcpyHostToDevice);
dim3 grid(1,1);
dim3 block(N,N);
kernel<<<grid,block>>>(new_array_d,array_d);
cudaMemcpy(new_array,new_array_d,(N-1)*N/2*sizeof(int),cudaMemcpyDeviceToHost);
for (int i = 0; i < (N-1)*N/2; i++) printf("new_array[%d] = %d\n", i, new_array[i]);
return 0;
}
Please add your own CUDA error checking, along the lines of "What is the canonical way to check for errors using the CUDA runtime API?". Note that a single N x N block limits N to 32 on GPUs that allow at most 1024 threads per block, so you may also wish to extend the above CUDA code to grids with more than one block.
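One way to picture the multi-block extension is to emulate the launch on the host. The sketch below is plain C, not CUDA: the four loops stand in for blockIdx.x/y and threadIdx.x/y, and the function name pairs_emulated and its block parameter are assumptions made for this illustration. In a real kernel the same global indices would come from blockIdx.x * blockDim.x + threadIdx.x (and likewise for y), and n would be passed as a kernel argument.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side emulation of a 2D launch with square blocks of side `block`
   and enough blocks to cover n in both directions. */
void pairs_emulated(const int *array, int *new_array, size_t n, size_t block)
{
    assert(block > 0);
    size_t blocks = (n + block - 1) / block;      /* ceil(n / block) */
    for (size_t bx = 0; bx < blocks; bx++)
    for (size_t by = 0; by < blocks; by++)
    for (size_t tx = 0; tx < block; tx++)
    for (size_t ty = 0; ty < block; ty++) {
        size_t i = bx * block + tx;   /* global index along x */
        size_t j = by * block + ty;   /* global index along y */
        /* bounds guard: the last blocks may overhang the array */
        if (i < n && j < n && j > i)
            new_array[i * n + (j - (i + 1)) - i * (i + 1) / 2] =
                array[i] * 10 + array[j];
    }
}
```

The point of the sketch is the guard i < n && j < n: once the grid has more than one block, some threads fall outside the array and must do nothing.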

Related

How to transpose a 2 dimensional array on C via call by reference?

As the title says, I am trying to transpose a 2 dimensional matrix by calling by reference.
I have attached my code below. When I run the code, the 2 dimensional array is unchanged.
#include <stdio.h>
#define SIZE 4
void transpose2D(int ar[][SIZE], int rowSize, int colSize);
int main()
{
int testArr[4][4] = {
{1, 2, 3, 4},
{5, 1, 2, 2},
{6, 3, 4, 4},
{7, 5, 6, 7},
};
transpose2D(testArr, 4, 4);
// print out new array
for (int i = 0; i < 4; i++)
{
for (int j = 0; j < 4; j++)
{
printf("%d ", testArr[i][j]);
}
printf("\n");
}
return 0;
}
void transpose2D(int ar[][SIZE], int rowSize, int colSize)
{
for (int i = 0; i < rowSize; i++)
{
for (int j = 0; j < colSize; j++)
{
int temp = *(*(ar + i) + j);
*(*(ar + i) + j) = *(*(ar + j) + i);
*(*(ar + j) + i) = temp;
}
}
}
Have been stuck for a couple of hours, any help is greatly appreciated, thank you!
To fix your function I suggest:
swap each pair of elements only once; the previous version swapped the elements for i=a, j=b and again for i=b, j=a, so the matrix ended up unchanged
use the common a[i][j] syntax
let the non-square matrix be embedded in a larger matrix whose inner dimension is set to stride
use VLAs to make the interface a bit more generic
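#include <assert.h>
#include <stddef.h>

/* Transposes in place; ar must have at least max(rows, cols) rows. */
void transpose2D(size_t rows, size_t cols, size_t stride, int ar[][stride]) {
    /* side of the smallest square that contains the rows x cols matrix */
    size_t n = rows > cols ? rows : cols;
    assert(n <= stride);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            int tmp = ar[j][i];
            ar[j][i] = ar[i][j];
            ar[i][j] = tmp;
        }
    }
}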
Exemplary usage:
int testArr[4][4] = {
{1, 2},
{5, 1},
{6, 3},
};
transpose2D(3, 2, 4, testArr);
The algorithm is still very inefficient due to terrible cache miss rates on the accesses to ar[j][i]. This can be fixed by tiling, i.e. transposing smaller 8x8 blocks, but that is a topic for another day.
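For readers who want a starting point, here is a minimal sketch of the tiling idea. It assumes a square n x n matrix stored row-major behind a flat pointer with row length stride; the function name transpose_tiled and the TILE constant are introduced here for illustration, and 8 is simply the tile size mentioned above, not a tuned value.

```c
#include <stddef.h>

#define TILE 8  /* tile side; the best value is machine-dependent */

/* In-place transpose of an n x n matrix (n <= stride), visited tile by tile
   so that both a[i][j] and a[j][i] stay within a small working set. */
void transpose_tiled(int *a, size_t n, size_t stride)
{
    for (size_t ii = 0; ii < n; ii += TILE) {
        /* only tiles on or above the diagonal are visited */
        for (size_t jj = ii; jj < n; jj += TILE) {
            size_t i_end = ii + TILE < n ? ii + TILE : n;
            size_t j_end = jj + TILE < n ? jj + TILE : n;
            for (size_t i = ii; i < i_end; i++) {
                /* on diagonal tiles start above the diagonal,
                   so each pair is swapped exactly once */
                size_t j_start = (ii == jj) ? i + 1 : jj;
                for (size_t j = j_start; j < j_end; j++) {
                    int tmp = a[i * stride + j];
                    a[i * stride + j] = a[j * stride + i];
                    a[j * stride + i] = tmp;
                }
            }
        }
    }
}
```

Whether this is actually faster depends on the matrix size and the cache hierarchy, so it is worth measuring before adopting it.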

Count how many times an array element is larger than the subsequent element (off-by-one error)

I'm programming in C. I have to write a function called count that counts how many times an element is larger than the subsequent element in the same array. For example, if we had a main function looking like this:
int main() {
int a1[] = { 5 };
int a2[] = { 1, 2, 3, 4, 5 };
int a3[] = { 5, 4, 3, 2, 1 };
int a4[] = { 1, 9, 3, 7, 5 };
int a5[] = { 7, 5, 6 };
printf("%d\n", count(a1, sizeof a1 / sizeof a1[0]));
printf("%d\n", count(a2, sizeof a2 / sizeof a2[0]));
printf("%d\n", count(a3, sizeof a3 / sizeof a3[0]));
printf("%d\n", count(a4, sizeof a4 / sizeof a4[0]));
printf("%d\n", count(a5, sizeof a5 / sizeof a5[0]));
return 0;
}
Count should return the following:
0
0
4
2
1
I have tried myself, but it seems like I get an off-by-one error that I don't know how to fix.
int count(int a[], int i){
int k=0;
int j;
for (j=0; j<=i-1; j++){
if(a[j] > a[j+1])
k++;
}
return k;
}
But this gives this wrong output:
0
1
5
3
2
Can someone spot the mistake in my code, or help me with this?
You're reading a[i] when j = i-1, which is out of bounds for array a.
for (j=0; j<=i-1; j++){
if(a[j] > a[j+1])
It should be
for (j=0; j<i-1; j++){
if(a[j] > a[j+1])
A way to avoid this off-by-one error is to use an idiomatic "iterate over an array" for loop and termination condition j < i but change the initial loop index from 0 to 1. The test inside the loop uses j and j - 1.
int count(const int *a, int i)
{
int k = 0;
for (int j = 1; j < i; j++) {
if (a[j - 1] > a[j])
k++;
}
return k;
}
I think j < i is easier to reason about than j <= i - 1 and be confident that it's correct.

C - Nested loops and stack?

I am trying to find the location of a target inside a 1-D array that acts like a table with rows and cols. I could do it using divide and mod, but I am stuck on finding it using nested loops. Specifically, I can't seem to assign values inside the nested loop.
Here is my code:
#include <stdio.h>
int main()
{
int arr[9] = // act as a 3 X 3 table
{ 2, 34, 6,
7, 45, 45,
35,65, 2
};
int target = 7;// r = 1; c = 0
int r = 0; // row of the target
int c = 0; // col of the target
int rows = 3;
int cols = 3;
for (int i = 0; i < rows; i++){
for (int j = 0; j + i * cols < cols + i * cols; i++ ){
if (arr[j] == target){
c = j; // columns of the target
r = i; // rows of the target
}
}
}
printf ("%d, %d",c, r);
return 0;
}
The code outputs: 0,0.
The problem isn't with the assignment; the loop and the if condition are wrong.
The outer loop should loop over the i rows
The inner loop should loop over the j columns
Within both loops, the cell to evaluate is i * cols + j
Put it all together and you'll get:
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++ ) {
if (arr[i * cols + j] == target) {
c = j; // columns of the target
r = i; // rows of the target
}
}
}
Since arr is a 1D array and j never goes beyond cols - 1 in the inner loop, arr[j] only ever inspects the first row, whatever the value of i.
To avoid this problem, take an int pointer to arr and advance it by one row after each pass of the inner loop:
int *p = arr;
int i, j; /* declared outside the loops because j is used after the inner loop */
for (i = 0; i < rows; i++) {
    for (j = 0; j < cols; j++) {
        if (p[j] == target) {
            c = j; // column of the target
            r = i; // row of the target
        }
    }
    p = p + j; /* j == cols here, so p now points to the next row */
}
A better solution would use only one loop:
for (int i = 0; i < rows * cols; i++) {
    if (arr[i] == target) {
        r = i / cols; // row of the target
        c = i % cols; // column of the target
    }
}
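The single-loop idea can be wrapped in a small function that also reports when the target is absent and stops at the first match. This is only a sketch; the name find_cell and its signature are assumptions made here, not part of the original code.

```c
#include <stddef.h>

/* Finds the first occurrence of target in a rows x cols table stored
   row-major in arr. Returns 1 and fills *r and *c on success, 0 otherwise. */
int find_cell(const int *arr, size_t rows, size_t cols, int target,
              size_t *r, size_t *c)
{
    for (size_t i = 0; i < rows * cols; i++) {
        if (arr[i] == target) {
            *r = i / cols;  /* row of the match */
            *c = i % cols;  /* column of the match */
            return 1;
        }
    }
    return 0;  /* target not present; *r and *c are left untouched */
}
```

Returning a flag keeps "found at row 0, column 0" distinguishable from "not found", which the r = 0, c = 0 initial values in the question cannot do.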

Remove values from an array of int

I've written this small C function to remove integers from an array.
/* remove `count` integers from `arr`, starting at index `idx` */
void remove_int(int (*arr)[100], int idx, int count)
{
int i, j;
for (i = 0; i < count; i++)
for (j = idx; *arr[j]; j++)
*arr[j] = *arr[j+1];
}
Say I run it like this:
int arr[100] = {25, 4, 4, 1, 2, 1, 2};
remove_int(&arr, 7, 2);
I get a Segmentation Fault. Why?
EDIT Comment by BLUEPIXY solved it, answer by chqrlie explained it. Thanks guys!
Your code does not do what you think it does:
arr is defined as a pointer to an array of 100 int elements.
arr[j] does not refer to the element at offset j, but to the j-th array of 100 int, counting from the one arr points to.
*arr[j] parses as *(arr[j]) and reads the integer at arr[j][0], far beyond the end of the arr array in the calling function.
If you keep the same API, you should write the code this way:
/* remove `count` integers from `arr`, starting at index `idx` */
void remove_int(int (*arr)[100], int idx, int count) {
    int i, j;
    /* shift the tail down over the removed elements */
    for (i = idx + count; i < 100 && (*arr)[i]; i++)
        (*arr)[i - count] = (*arr)[i];
    /* clear the slots left over at the end */
    for (j = 0; j < count && i - count + j < 100; j++) {
        (*arr)[i - count + j] = 0;
    }
}
Handling pointers to arrays is not idiomatic in C; it is more common to pass the array directly, in which case the called function receives a pointer to its first element.
The function would then be called this way:
int arr[100] = {25, 4, 4, 1, 2, 1, 2};
remove_int(arr, 7, 2);
And the function would be written this way:
/* remove `count` integers from `arr`, starting at index `idx` */
void remove_int(int arr[100], int idx, int count) {
    int i, j;
    /* shift the tail down over the removed elements */
    for (i = idx + count; i < 100 && arr[i]; i++)
        arr[i - count] = arr[i];
    /* clear the slots left over at the end */
    for (j = 0; j < count && i - count + j < 100; j++) {
        arr[i - count + j] = 0;
    }
}
In this case, the [100] array size is ignored and the function behaves exactly the same as if it were defined as void remove_int(int *arr, int idx, int count).
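Because the size in the parameter is ignored, a variant that takes the number of valid elements explicitly may be easier to get right than one that relies on a 0 terminator. The sketch below is an alternative design, not the code from the answer; the name remove_ints and the len parameter are assumptions introduced here.

```c
#include <stddef.h>
#include <string.h>

/* Removes up to `count` ints starting at index `idx` from an array holding
   `len` valid elements, and returns the new number of valid elements. */
size_t remove_ints(int *arr, size_t len, size_t idx, size_t count)
{
    if (idx >= len)
        return len;              /* nothing to remove */
    if (count > len - idx)
        count = len - idx;       /* clamp to the end of the data */
    /* memmove is used because source and destination overlap */
    memmove(arr + idx, arr + idx + count,
            (len - idx - count) * sizeof *arr);
    return len - count;
}
```

The caller keeps track of the length, for example len = remove_ints(arr, len, 1, 2);, so zeros can be stored as ordinary values.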

Why does the output of my bubble sort program change each time I run it?

Here is my code:
#include<stdio.h>
#include<cs50.h>
int main(void)
{
int array[8] = {2, 5, 3, 1, 4, 6, 9, 7};
for (int j = 0; j < 8; j++)
{
for (int i = 0; i < 8 ; i++)
{
if (array[i] > array[i + 1])
{
int temp = array[i];
array[i] = array[i + 1];
array[i + 1] = temp;
}
}
}
for (int i = 0; i < 8; i++)
printf("%i", array[i]);
printf("\n");
}
And here is a screenshot of my terminal window: http://i.imgur.com/Q1yCsgR.jpg
As you can see, I made no changes; I just kept running it until it finally worked. What's more, when I tried adding a variable n in main that stored the sizeof the array, and replaced the '8' in the for loops with n, the output to the terminal window went absolutely crazy and refused to tend towards the correct answer each time I ran it.
if (array[i] > array[i + 1])
In the above if statement, when i is 7, you are accessing out of bounds and that leads to undefined behaviour. You can fix it by changing the for loop condition to:
for (int i = 0; i < 7 ; i++)
In the inner loop, when i is 7 you access array[i + 1], i.e. array[8], which is past the end of the array.
Change your inner loop to for (int i = 0; i < 7 ; i++) and it should work.
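The same fix can be written without the hard-coded 7. The following is a sketch, with bubble_sort as a name chosen here for illustration; it takes the element count as a parameter and keeps i + 1 inside the array.

```c
#include <stddef.h>

/* Bubble sort on n ints. The inner loop runs while i + 1 < n - pass,
   so a[i + 1] never reads past the end of the array. */
void bubble_sort(int *a, size_t n)
{
    for (size_t pass = 0; pass + 1 < n; pass++) {
        for (size_t i = 0; i + 1 < n - pass; i++) {
            if (a[i] > a[i + 1]) {
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }
}
```

When calling it from main, the element count is sizeof array / sizeof array[0]. Note that sizeof array on its own is the size in bytes, not the number of elements, which is one plausible reason the attempt with n made the out-of-bounds access much worse.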
