CUDA: Finding Minimum Value in 100x100 Matrix - c

I just learned GPU programming and now I have a task: find the minimum value in a 100x100 matrix in parallel with CUDA. I tried this code, but it doesn't show the answer; instead it prints my initial value hmin = 9999999. Can anyone give me the right code? The code is in C, by the way.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#define size (100*100)
//Kernel Functions & Variable
__global__ void FindMin(int* mat[100][100],int* kmin){
int b=blockIdx.x+threadIdx.x*blockDim.x;
int k=blockIdx.y+threadIdx.y*blockDim.y;
if(mat[b][k] < kmin){
kmin = mat[b][k];
}
}
int main(int argc, char *argv[]) {
//Declare variables
int i,j,hmaks=0,hmin=9999999,hsumin,hsumax; //Host Variable
int *da[100][100],*dmin,*dmaks,*dsumin,*dsumax; // Device Variable
FILE *baca; //for opening txt file
char buf[4]; //used for fscanf
int ha[100][100],b; //matrix shall be filled by "b"
//1: Read txt File
baca=fopen("MatrixTubes1.txt","r");
if (!baca){
printf("Hey, it's not even exist"); //Checking File, is it there?
}
i=0;j=0; //Matrix index initialization
if(!feof(baca)){ //if not end of file then do
for(i = 0; i < 100; i++){
for(j = 0; j < 100; j++){
fscanf(baca,"%s",buf); //read max 4 char
b=atoi(buf); //parsing from string to integer
ha[i][j]=b; //save it to my matrix
}
}
}
fclose(baca);
//all file has been read
//time to close the file
//Step 2: Allocate data on the GPU
cudaMalloc((void **)&da, size*sizeof(int));
cudaMalloc((void **)&dmin, sizeof(int));
cudaMalloc((void **)&dmaks, sizeof(int));
cudaMalloc((void **)&dsumin, sizeof(int));
cudaMalloc((void **)&dsumax, sizeof(int));
//Step 3: Copy data to Device
cudaMemcpy(da, &ha, size*sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(dmin, &hmin, sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(dmaks, &hmaks, sizeof(int), cudaMemcpyHostToDevice);
//Step 4: Call Kernel
FindMin<<<100,100,1>>>(da,dmin);
//5: Copy from Device to Host
cudaMemcpy(&hmin, dmin, sizeof(int), cudaMemcpyDeviceToHost);
//6: Print that value
printf("Minimum Value = %i \n",hmin);
system("pause"); return 0;
}
this is my result
Minimum Value = 9999999
Press any key to continue . . .

I saw a few issues in your code.
As mentioned in the comments from MayurK, you got the indexing wrong.
Also as MayurK said, you are comparing two pointers and not the values they point to.
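Your kernel launch FindMin<<<100,100,1>>> asks for a one-dimensional grid of 100 blocks with 100 threads each; the third launch parameter is the dynamic shared memory size in bytes, not a third dimension. Because both the grid and the blocks are one-dimensional, blockIdx.y and threadIdx.y are always zero, so k is always 0, while b = blockIdx.x + threadIdx.x*blockDim.x reaches 9999, far outside the 100 rows of the matrix.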
Finally, all threads run in parallel, which causes a race condition in kmin = mat[b][k] (which should be *kmin, by the way). Once you fix the indexing problem, every thread will write to the same location in global memory at the same time. You should use atomicMin() or a parallel reduction to find the minimum value in parallel.
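As a rough illustration of the last two points, here is a CPU-side sketch in plain C (not CUDA) of row-major indexing into a flat buffer and of a pairwise (tree) reduction for the minimum. The names flat_index and tree_min are made up for this example. On a GPU, each iteration of the inner loop would be handled by its own thread, with a synchronization between passes.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major offset of element (row, col) in a flat buffer with `cols` columns. */
static size_t flat_index(size_t row, size_t col, size_t cols)
{
    return row * cols + col;
}

/* Pairwise (tree) minimum over v[0..n-1], using v itself as scratch space.
 * Each pass folds the upper part of the range onto the lower part. The
 * inner-loop iterations do not depend on each other, which is why the
 * pattern maps onto one thread per iteration. */
static int tree_min(int *v, size_t n)
{
    assert(v != NULL && n > 0);
    while (n > 1) {
        size_t half = (n + 1) / 2;   /* number of elements that survive this pass */
        for (size_t i = 0; i + half < n; i++) {
            if (v[i + half] < v[i]) {
                v[i] = v[i + half];
            }
        }
        n = half;
    }
    return v[0];
}
```

For a 100x100 matrix copied into a flat int buffer of 10000 elements, element (i, j) sits at flat_index(i, j, 100), and tree_min(buffer, 10000) returns the minimum while overwriting the buffer.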

Related

c - Which is the correct way to dynamically allocate multidimensional float arrays? Valgrind error

I'm implementing a K-means algorithm in C. It works well most of the time, but when I debug it with Valgrind, it tells me that I'm doing an "Invalid read of size 8 - Invalid write of size 8 - Invalid read of size 8" using memcpy at the beginning. I think the problem isn't there, but rather where I assign a value to an element of the multidimensional float array, whose memory is dynamically allocated with malloc in a for loop at some point, because Valgrind also says "Address 0x572c380 is 0 bytes after a block of size 80 alloc'd".
I've tried adding 1 to the number of bytes that I allocate, because I thought that maybe malloc "needed" more memory to do its job, but nothing changed. I know it may be a basic error, but I'm quite new to the language and my course didn't explain anything this "technical". I've searched for an answer and an explanation of the error, but I have only found problems with char arrays, and from those I understood that the function strcpy can resolve the issue. What about float arrays? It's the first time I've used memcpy.
Here are pieces of code that raise those Valgrind messages.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main(){
FILE* fp; //used to open a .txt file to read
char buf[100];
float ** a;
char * s;
int i;
int j;
int rows = 10;
fp = fopen("data.txt", "r");
if(fp==NULL){
perror("Error at open file.");
exit(1);
}
a = (float**) malloc(rows*sizeof(float*));
for(i=0; i<rows; i++){
s = fgets(buf, 100, fp); //reading .txt file
if (s==NULL){
break;
}
a[i] = malloc(dim*sizeof(float));
a[i][0] = atof(strtok(s, ","));
for(j=1; j<dim; j++){
a[i][j] = atof(strtok(NULL,",")); //save as float value the token read from a line in file, for example, from line "1.0,2.0,3.0" as first line -> get a[0][1] = 2.0
}
}
fclose(fp);
m = (float**) malloc(rows*sizeof(float*));
for (i=0; i<rows; i++){
m[i]=malloc(dim*sizeof(float)); //not initialized
}
memcpy(m, a, rows*dim*sizeof(float));
}
Can someone also help me understand why it works but Valgrind raises these error messages?
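You first allocate an array of float* and then several arrays of float, so your final memcpy(m, a, rows*dim*sizeof(float)) copies from one array of float* (pointers to float) to another, but with a byte count sized for rows * dim floats, as @SomeProgrammerDude rightly noted. That copies pointers, not values. Whenever dim * sizeof(float) is larger than sizeof(float*), it also reads and writes past the end of both pointer arrays, each of which is only rows * sizeof(float*) bytes (80 bytes with 10 rows and 8-byte pointers, which matches Valgrind's "block of size 80").
Also, as pointed out by @xing, you're allocating rows but using righe (which you didn't show). That might be a cause of problems.
I would suggest allocating the whole array at once on the first row, and then pointing each remaining row at its offset inside that block: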
a = malloc(rows * sizeof(float*));
a[0] = malloc(dim * rows * sizeof(float)); // Allocate the whole matrix on row #0
for (i = 1; i < rows; i++) {
a[i] = a[0] + i * dim; // sizeof(float) automatically taken into account as float* pointer arithmetics
}
...
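m = malloc(rows * sizeof(float*));
m[0] = malloc(dim * rows * sizeof(float));
for (i = 1; i < rows; i++) {
m[i] = m[0] + i * dim; // set up m's row pointers the same way as a's
}
memcpy(m[0], a[0], dim * rows * sizeof(float)); // one contiguous copy of the values
(add NULL checks of course)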
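With the NULL checks added, the suggestion above might look like the following sketch. The helper names matrix_alloc, matrix_copy and matrix_free are invented for this example and do not come from the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a rows x dim matrix as one pointer table plus one contiguous
 * block of floats. Returns NULL on failure or if either dimension is 0. */
static float **matrix_alloc(size_t rows, size_t dim)
{
    if (rows == 0 || dim == 0) {
        return NULL;
    }
    float **m = malloc(rows * sizeof *m);
    if (m == NULL) {
        return NULL;
    }
    m[0] = malloc(rows * dim * sizeof **m);
    if (m[0] == NULL) {
        free(m);
        return NULL;
    }
    for (size_t i = 1; i < rows; i++) {
        m[i] = m[0] + i * dim;   /* pointer arithmetic counts in floats */
    }
    return m;
}

/* Copy the values (not the row pointers) from src into dst.
 * Both matrices must come from matrix_alloc with the same dimensions. */
static void matrix_copy(float **dst, float **src, size_t rows, size_t dim)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst[0], src[0], rows * dim * sizeof **src);
}

/* Release both the data block and the pointer table. */
static void matrix_free(float **m)
{
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
}
```

Because the data lives in a single block, the byte count passed to memcpy matches what was actually allocated, so Valgrind has nothing to complain about in the copy.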

fread'ing a binary file into a dynamically allocated C array

Just a quick comment to start: While there are similar threads to this one, I haven't quite been able to find the solution I'm looking for. My problem is the following:
I have 2D arrays of doubles saved to binary files and I would like to read the binary files (using C code) into a 2D array. Additionally, I need to allocate the memory dynamically, as the shape of the arrays will be changing in my application. To get started, I tried the following code:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
int main(){
int N = 10; //number of rows of 2D array
int M = 20; //number of columns
/* first allocate the array */
double **data;
data = (double **)malloc(N*sizeof(double *));
for(unsigned int i=0; i < N; i++) {
data[i] = (double *)malloc(sizeof(double)*M);
}
FILE *ptr;
ptr = fopen("random_real_matrix.dat", "rb");
fread(data, sizeof(data), 1, ptr);
for(unsigned int i=0; i<10;i++){
for(unsigned int j=0; j<20;j++){
fprintf(stderr, "m[%d][%d] = %f\n ", i, j, data[i][j]);
}
}
}
Unfortunately, this code segfaults. I checked to see if I can set the array entries like
d[0][0] = 235;
and that works fine.
Assuming this approach can be fixed, I'm also interested to know if it could be extended to read to an array of double complex's.
Any advice would be greatly appreciated!
Your fread statement is incorrect. It's a common beginner mistake to think that sizeof gets the size of a dynamically allocated array. It doesn't. In this case it just returns the size of a double **. You will need to read in each double in the file and put that into the correct spot in the array.
for (int ii = 0; ii < N; ++ii)
{
for (int jj = 0; jj < M; ++jj)
{
fread(&data[ii][jj], sizeof(double), 1, ptr); // fread needs the address of the element
// Be sure to check status of fread
}
}
You can do this with a single for loop (or a single fread) but this is probably clearer to read.
Because you allocated each row separately, you can't read into the entire array at once. You need to do it row by row.
for (int i = 0; i < N; i++) {
fread(data[i], sizeof(double), M, ptr);
}
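A sketch of the row-by-row approach with the return value of fread checked. It assumes the file holds the N*M doubles row after row in the machine's native binary format, which the question does not state. The names alloc_rows and read_rows are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate n separately malloc'ed rows of m doubles each.
 * Returns NULL (after cleaning up) if any allocation fails. */
static double **alloc_rows(size_t n, size_t m)
{
    double **data = malloc(n * sizeof *data);
    if (data == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        data[i] = malloc(m * sizeof **data);
        if (data[i] == NULL) {
            while (i > 0) {
                free(data[--i]);
            }
            free(data);
            return NULL;
        }
    }
    return data;
}

/* Read n rows of m raw doubles from fp, one fread per row.
 * Returns 0 on success, -1 if any row comes up short. */
static int read_rows(FILE *fp, double **data, size_t n, size_t m)
{
    assert(fp != NULL && data != NULL);
    for (size_t i = 0; i < n; i++) {
        if (fread(data[i], sizeof **data, m, fp) != m) {
            return -1;
        }
    }
    return 0;
}
```

The file should also be checked after fopen before it is handed to read_rows; the original code passes the result of fopen straight to fread.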

"glibc detected" error on c code

I have a problem with a C program I am writing. It has to multiply two matrices (filled with random integers between 0 and 9) of given dimensions (m x n multiplied by n x m, the result being an m x m matrix). The matrices are stored by columns. I also have to output the computing times for both the whole program and the execution of the function that does the calculation.
I am getting a "glibc detected" error while executing the application. I know that it is due to heap corruption in my program, most likely from writing outside the memory of the malloc'ed arrays, but I am unable to find where the error is.
Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define A(i,j) aa[m*(j)+(i)] //matrix by columns
#define B(i,j) bb[n*(j)+(i)]
#define C(i,j) cc[m*(j)+(i)]
void mmul (int m, int n, double *aa, double *bb, double *cc) {
int i, j, k;
for (i=0; i<m; i++)
for (j=0; j<m; j++) {
C(i,j)=0;
for (k=0; k<n; k++) C(i,j)+=A(i,k)*B(k,j);
}
}
int main (int argc, char *argv[]) {
clock_t exec_timer=clock(), comp_timer;
srand(time(NULL)); //initialize random seed
int m, n, i;
double *aa, *bb, *cc, exec_time, comp_time;
if (argc!=3
|| sscanf(argv[1], "%d", &m)!=1
|| sscanf(argv[2], "%d", &n)!=1
) {
fprintf(stderr, "%s m n \n", argv[0]);
return -1;
}
/* malloc memory */
aa=malloc(m*n*sizeof(int)); //integer matrix
bb=malloc(n*m*sizeof(int));
cc=malloc(m*m*sizeof(int));
/* fill matrix */
for (i=0; i<m*n; i++) aa[i]=rand()%10; //fill with random integers 0-9
for (i=0; i<n*m; i++) bb[i]=rand()%10;
/* compute product */
comp_timer=clock();
mmul(m,n,aa,bb,cc);
comp_time=(double) (clock() - comp_timer) / CLOCKS_PER_SEC;
/* write output */
for (i=0; i<m*m; i++) printf("%i\n",cc[i]);
/* finishing */
free(aa); free(bb); free(cc);
exec_time=(double) (clock() - exec_timer) / CLOCKS_PER_SEC;
printf("exec time = %.3f, comp = %.3f\n", exec_time, comp_time);
return 0;
}
#undef C
#undef B
#undef A
Can anyone see the problem I am missing?
Well, yes, I can see the problem.
You are working with arrays of double, but your allocation code uses int. Since typically a double is twice the size of an int, this leads to horrible amounts of buffer overflow, trashing random memory.
Basically, this:
aa=malloc(m*n*sizeof(int)); //integer matrix
is lying. :) It should be:
aa = malloc(m * n * sizeof *aa); /* Not an integer matrix! aa is double *. */
And the same for the allocations of bb and cc, of course.
Note use of sizeof *aa (meaning "the size of the value pointed at by the pointer aa") to remove the risk of introducing this error, i.e. by not repeating the type manually but instead "locking" it to an actual pointer, you make the code safer.
As a minor note, not related to the problem, you should use const for the read-only arguments to mmul(), like so:
void mmul (int m, int n, const double *aa, const double *bb, double *cc)
That immediately makes it obvious which pointer(s) are inputs, and which is output. It can also help the compiler generate better code, but the main advantage is that it communicates much more clearly what you mean.
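Putting both suggestions together, a minimal sketch could look like this. The helper alloc_doubles is a hypothetical name, and the macros from the question are written out as explicit column-major index expressions.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate `count` doubles. sizeof *p follows the pointer's type, so the
 * element size cannot drift out of sync with the declaration. */
static double *alloc_doubles(size_t count)
{
    double *p = malloc(count * sizeof *p);
    return p;   /* may be NULL; the caller must check */
}

/* C (m x m) = A (m x n) * B (n x m), all stored column by column.
 * aa and bb are read-only inputs, cc is the output. */
static void mmul(int m, int n, const double *aa, const double *bb, double *cc)
{
    assert(aa != NULL && bb != NULL && cc != NULL);
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < m; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++) {
                sum += aa[m * k + i] * bb[n * j + k];   /* A(i,k) * B(k,j) */
            }
            cc[m * j + i] = sum;                        /* C(i,j) */
        }
    }
}
```

In main, the three allocations would then read aa = alloc_doubles((size_t)m * n), and so on, each followed by a NULL check. Since cc holds doubles, the output loop would also need a floating-point format such as %f instead of %i.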

CUDA counter letter

I've got a problem with CUDA. I want to make a small program which counts letters in an array of char.
I read the letters from a file and save how many were read to an int variable called N. After that I malloc:
char *b_h, *b_d;
size_t size_char = N * sizeof(char);
b_h = (char *)malloc(size_char);
After malloc I read file again and assign current letter to element of char array (it works):
int j=0;
while(fscanf(file,"%c",&l)!=EOF)
{
b_h[j]=l;
j++;
}
After that I create an int variable (a_h) as counter.
int *a_h, *a_d;
size_t size_count = 1*sizeof(int);
a_h = (int *)malloc(size_count);
Ok, go with CUDA:
cudaMalloc((void **) &a_d, size_count);
cudaMalloc((void **) &b_d, size_char);
Copy from host to device:
cudaMemcpy(a_d, a_h, size_count, cudaMemcpyHostToDevice);
cudaMemcpy(b_d, b_h, size_char, cudaMemcpyHostToDevice);
Set blocks and call CUDA function:
int block_size = 4;
int n_blocks = N/block_size + (N%block_size == 0 ? 0:1);
square_array <<< n_blocks, block_size >>> (a_d,b_d,c_d, N);
Receive from function:
cudaMemcpy(a_h, a_d, size_count, cudaMemcpyDeviceToHost);
cudaMemcpy(b_h, d_d, size_char, cudaMemcpyDeviceToHost);
And print count:
printf("\Count: %d\n", a_h[0]);
And it doesn't work. The char array holds the sentence: Super testSuper test ; I'm looking for the letter 'e' and I get a_h[0] = 1.
Where is the problem?
CUDA function:
__global__ void square_array(int *a, char *b, int *c, int N)
{
const char* letter = "e";
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx<N)
{
if(b[idx] == *letter)
{
a[0]++;
}
}
}
Please, help me.
I'm guessing that N is small enough that your GPU is able to launch all your threads in parallel. So, you start a thread for each character in your array. The threads, all running simultaneously, don't see the output from each other. Instead, each thread reads the value of a[0] (which is 0), and increases it by 1 and stores the resulting value (1). If this is homework, that would have been the basic lesson that the professor wanted to impart.
When multiple threads store a value in the same location simultaneously, it is undefined which thread will get its value stored. In your case, that doesn't matter because all threads that store a value will store the value, "1".
A typical solution would be to have each thread store a value of 0 or 1 in a separate location (depending on if there is a match or not), and then add up the values in a separate step.
You can also use an atomic increment operation, such as atomicAdd().
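The "separate location per thread, then add up" idea can be sketched in plain C on the CPU. This is not CUDA code, and count_letter is a made-up name. The first loop stands in for the kernel, where each index would be one thread, and the second loop is the separate summing step.

```c
#include <assert.h>
#include <stdlib.h>

/* Two-step count: (1) write one 0/1 flag per character, each into its own
 * slot, so no two workers ever touch the same location; (2) add the flags
 * up in a separate pass. Returns the count, or -1 if the scratch array
 * cannot be allocated. */
static long count_letter(const char *text, size_t n, char letter)
{
    assert(text != NULL);
    if (n == 0) {
        return 0;
    }
    int *flags = malloc(n * sizeof *flags);
    if (flags == NULL) {
        return -1;
    }
    for (size_t idx = 0; idx < n; idx++) {   /* on a GPU: one thread per idx */
        flags[idx] = (text[idx] == letter);
    }
    long total = 0;
    for (size_t idx = 0; idx < n; idx++) {   /* the separate "add up" step */
        total += flags[idx];
    }
    free(flags);
    return total;
}
```

On the GPU the summing pass would itself usually be a parallel reduction rather than a serial loop.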

2 Dimensional array and Print out average of the Columns and Rows

I'm still a beginner in C programming and I need a little help writing code for my C programming class.
The prompt is: Input for this program is a two-dimensional array of floating point data located in a file named textfile94. The input array will contain 3 rows of data with each row containing 5 columns of data.
I want you to use the two-subscript method of dynamic memory
allocation.
Use malloc to create an array that holds pointers.
Each element of that array points at another array, which is the row of data.
Use malloc in a loop to create your rows.
Then you can use two subscript operators [r][c] to get at your data
to do the summing and averaging that the program calls for.
This program calls for hard-coded height and width of the 2D array,
known ahead of time (3x5, actually).
Instead of writing in the literal numbers in your code, I want you to
create a global constant variable to hold those dimensions, and use
those in your code.
Here is my code:
#include <stdio.h>
#include <stdlib.h>
#define int rows = 3;
#define int columns = 5;
float array[rows][columns];
int main(int argc, char* argv[]){
FILE* fin;
float x;
int i,j;
int* array;
fin = fopen("textfile94", "r");
fscanf(fin,"%f", &x);
array =(int*) malloc(rows* sizeof(int*));
for(i=0;i<rows;i++){
for(j=0;j<columns;j++)
array[i]=(int*)malloc(columns* sizeof(int));
}
printf("The Average values for the three rows are:[%f]",array[i]);
printf("The Average values for the five columns are:[%f]", array[j]);
return 0;
}
In text file: 4.33 5.33 1.11 99.00 100.00 1.0 33.3 12.5 1.1 -1000.00 22.1 11.9 2.4 8.3 8.9
The program should output:
The average values for the three rows are: 41.95 -190.42 10.32
The average values for the five columns are: 9.14 16.84 5.33 36.13 -297.7
I'm having trouble getting it to do this correctly; any help would be appreciated. I don't want the answer, I want to learn from this, but I just need some hints. Thank you.
Updated Code:
#include <stdio.h>
#include <stdlib.h>
#define ROWS 3
#define COLUMNS 5
float array[ROWS][COLUMNS];
int main(int argc, char* argv[]){
FILE* fin;
int i;
float x;
float** array;
fin = fopen("textfile94", "r");
array=(float**) malloc(ROWS*sizeof(float*));
for(i=0;i<ROWS;i++)
array[ROWS]=(float*)malloc(COLUMNS*sizeof(float));
for(j=0;j<COLUMNS;j++){
fscanf(fin,"%f",&x);
x = array[ROWS][COLUMNS];
}
printf("The Average values for the three rows are:%f", array[ROWS]);
printf("The Average values for the five columns are:%f", array[COLUMNS]);
return 0;
}
Ok, I'll see what I can add.
Defines are not written like that, and by convention should be all upper case
#define ROWS 3
#define COLUMNS 5
He wants you to dynamically allocate the array via malloc. Right now you are statically allocating a 2-dimensional array of floats and then trying to force arrays of ints into it. You should look up how to do multidimensional arrays with malloc.
Basically what you want is
float **array;
Now array is a pointer to pointer to float. Then allocate room for ROWS pointers to float and assign it to array:
ROWS * sizeof(float*)
After that, for each row, you can assign array[row] an allocation of
COLUMNS * sizeof(float)
Now you have your array[ROWS][COLUMNS] structure.
One approach to reading in the data, in pythonesque pseudo code, would be
for(row 0..2)
array[row] = malloc(...)
for(col 0..4)
fscanf(value)
array[row][col] = value
Tell me if I am being too vague, trying to give hints without giving the code.
This should get you started on how to allocate the array, assign and access values, and then free memory. Error checking is omitted for clarity. Most likely you will want to verify that calloc actually returns a valid pointer.
To complete the program you will have to read the values into the array and then calculate the averages.
#include <stdlib.h>
#include <stdio.h>
const size_t rows = 3;
const size_t columns = 5;
int main(void)
{
size_t i, j;
/* allocate a two-dimensional array of zeroes */
double **array = calloc(rows, sizeof *array);
for (i = 0; i < rows; ++i) {
array[i] = calloc(columns, sizeof *array[i]);
}
/* print it out - replace this by reading in values */
for (i = 0; i < rows; ++i) {
for (j = 0; j < columns; ++j) {
fprintf(stdout, "%.2f", array[i][j]);
fputc(' ', stdout);
}
fprintf(stdout, "\n");
}
/* TODO loop through the array again and average the data */
/* free memory */
for (i = 0; i < rows; ++i) {
free(array[i]);
}
free(array);
return EXIT_SUCCESS;
}
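For the TODO in the program above, one possible shape for the averaging step is sketched below. The function names row_average and column_average are hypothetical. They work on the same rows-of-pointers layout, so array[r][c] indexing carries over unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Average of row r: walk across the columns of one row. */
static double row_average(double **array, size_t r, size_t columns)
{
    assert(array != NULL && columns > 0);
    double sum = 0.0;
    for (size_t c = 0; c < columns; ++c) {
        sum += array[r][c];
    }
    return sum / (double)columns;
}

/* Average of column c: walk down the rows, same column each time. */
static double column_average(double **array, size_t c, size_t rows)
{
    assert(array != NULL && rows > 0);
    double sum = 0.0;
    for (size_t r = 0; r < rows; ++r) {
        sum += array[r][c];
    }
    return sum / (double)rows;
}
```

Calling row_average(array, i, columns) for each i below rows, and column_average(array, j, rows) for each j below columns, gives the two lines of output the assignment asks for.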
