Hope everyone is doing well.
I am trying to write a simple Matrix Library in C by creating a Matrix struct, and then using its memory address to execute operations.
Here is my header file for the library:
/*
To compile:
g++ -c simpMat.cpp
ar rvs simpMat.a simpMat.o
g++ test_simpMat.c simpMat.a
*/
#ifndef SIMPMAT_H
#define SIMPMAT_H
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
typedef struct{
uint8_t nRows;
uint8_t nCols;
uint8_t nElements;
float **elements;
}simpMat;
/**
*#brief simpMat_Init
*#param simpMat
*#param uint8_t
*#param uint8_t
*#param uint8_t
*#param float*
*#retval NONE
*/
void simpMat_Init(simpMat *Matrix, uint8_t nRows, uint8_t nColumns, uint8_t nElements, float elements[]);
/**
*#brief simpMat_Print
*#param simpMat
*#retval NONE
*/
void simpMat_Print(simpMat *Matrix);
/**
*#brief simpMat_Delete
*#param simpMat
*#retval NONE
*/
void simpMat_Delete(simpMat *Matrix);
#endif
Here is the source file:
#include "simpMat.h"
void simpMat_Init(simpMat *Matrix, uint8_t nRows, uint8_t nColumns, uint8_t nElements, float elements[])
{
Matrix->nRows = nRows;
Matrix->nCols = nColumns;
Matrix->nElements = nElements;
Matrix->elements = (float**)malloc(nRows * sizeof(float*));
for (uint8_t i = 0; i < nRows; i++)
{
Matrix->elements[i] = (float*)malloc(nColumns * sizeof(float));
}
uint8_t count = 0;
for (uint8_t i = 0; i < nRows; i++)
{
for (uint8_t j = 0; j < nColumns; j++)
{
Matrix->elements[i][j] = elements[count];
count++;
}
}
}
void simpMat_Print(simpMat *Matrix)
{
for (uint8_t i = 0; i < Matrix->nRows; i++)
{
for (uint8_t j = 0; j < Matrix->nCols; j++)
{
printf("%d ", Matrix->elements[i][j]);
}
printf("\n");
}
}
void simpMat_Delete(simpMat *Matrix)
{
uint8_t n = Matrix->nRows;
while(n) free(Matrix->elements[--n]);
free(Matrix->elements);
}
I also wrote a small test program to see if I can successfully assign elements to the matrix; such as:
#include "simpMat.h"
#include "stdio.h"
int main()
{
simpMat Matrix1;
float toAppend[9] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
simpMat_Init(&Matrix1, 3, 2, 9, toAppend);
printf("MATRIX ELEMENTS ARE:\n");
simpMat_Print(&Matrix1);
simpMat_Delete(&Matrix1);
return 0;
}
I compiled my library and the main program with the following commands on CMD:
g++ -c simpMat.cpp
ar rvs simpMat.a simpMat.o
g++ test_simpMat.c simpMat.a
However, when I run the executable, I get the following output:
MATRIX ELEMENTS ARE:
0 0
0 0
0 0
I could not understand the reason I cannot assign values. I am fairly new to the Dynamic Memory Allocation subject and I suspect that I had a misconception about the methodology. Can you help me with that?
If you use a debugger and step through your program looking at the memory, you should see the data is actually there. Your question assumes the problem is assignment, whereas it's actually in your output. This kind of thing is most easily discoverable with a debugger.
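The actual problem is that your matrix elements are float, but you print them with the %d specifier in your printf, which is for int values. Passing a floating-point argument to %d is undefined behavior, which is why you see zeros instead of your data. Change this to %f.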
Separately, you should reconsider the purpose of the nElements parameter. You are not doing any sanity tests before copying the array (for example, ensuring rows * cols does not exceed that value). It doesn't appear to have any relation to the actual matrix and should not be stored.
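To make both points concrete, here is a minimal sketch rather than a drop-in replacement. The name simpMat_InitChecked, its nValues parameter and its int return code are assumptions made for illustration; they are not part of the original library. The initializer refuses to copy when the source array holds fewer than rows * cols values, and the print routine uses %f.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    uint8_t nRows;
    uint8_t nCols;
    float **elements;
} simpMat;

/* Hypothetical variant of simpMat_Init. nValues is the number of floats
   available in values[]. Returns 0 on success, -1 on failure. */
int simpMat_InitChecked(simpMat *m, uint8_t nRows, uint8_t nCols,
                        size_t nValues, const float values[])
{
    assert(m != NULL && values != NULL);
    if (nRows == 0 || nCols == 0 || (size_t)nRows * nCols > nValues)
        return -1;                          /* not enough source values */
    float **rows = malloc(nRows * sizeof *rows);
    if (rows == NULL)
        return -1;
    for (size_t i = 0; i < nRows; i++) {
        rows[i] = malloc(nCols * sizeof **rows);
        if (rows[i] == NULL) {              /* undo the rows allocated so far */
            while (i > 0) free(rows[--i]);
            free(rows);
            return -1;
        }
        for (size_t j = 0; j < nCols; j++)
            rows[i][j] = values[i * nCols + j];
    }
    m->nRows = nRows;
    m->nCols = nCols;
    m->elements = rows;
    return 0;
}

void simpMat_Print(const simpMat *m)
{
    for (size_t i = 0; i < m->nRows; i++) {
        for (size_t j = 0; j < m->nCols; j++)
            printf("%f ", m->elements[i][j]);   /* %f matches a float argument */
        printf("\n");
    }
}

void simpMat_Delete(simpMat *m)
{
    for (size_t i = 0; i < m->nRows; i++)
        free(m->elements[i]);
    free(m->elements);
    m->elements = NULL;
}
```

With this shape the caller passes the real length of the source array, so a 3 x 2 matrix built from only five values is rejected instead of reading past the end of the array.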
Problem
I have a custom struct for 2D matrices. I'm using this struct inside a function to initialize a 2D matrix where every element value is set to 0. I also have another function to print a matrix to the terminal (for debugging purposes).
When I write the struct and the functions inside the main.c, they work. The problem is when I put them in a separate file and call them from that file I get a runtime error: Exception thrown: write access violation.
In my program I have 3 files: main.c, my_lib.h, my_lib.c. The struct is stored inside my_lib.h and the functions are in my_lib.c. Inside main.h
I'm using Windows 10 & coding in Visual Studio 2017 v15.9.10
Output
The program compiles but gives a runtime error Exception thrown: write access violation
Edit:
Well, it seems it was my own fault that this was happening.
Actually, I was trying to run this code on my work computer. I had written the original code on my personal computer, where the main.c, my_lib.h & my_lib.c version was working. Then I copied the folder that I was working on and tried to continue on my work computer. Both of my computers run Windows 10 and both have the same version of VS 2017.
On my personal computer the solution explorer was like:
But on my work computer, the solution opened as:
Everything, including the folder hierarchy, is the same on both computers. It seems copying a project folder is not a good idea.
When I created a new C project on my work computer and added my_lib.c and my_lib.h manually, everything worked.
But I'm still curious why I was getting an Exception error... And how can I correct this problem of copying without creating a new project in VS?
Code
Just main.c ( Works)
main.c
#include <stdio.h>
typedef struct Matrix {
int rows; // number of rows
int cols; // number of columns
double** data; // a pointer to an array of n_rows pointers to rows
}Matrix;
Matrix* make_matrix(int n_rows, int n_cols);
void print_matrix(Matrix* m);
int main() {
Matrix* m1 = make_matrix(2, 5);
print_matrix(m1);
return 0;
}
// CREATE A MATRIX WITH N_ROWS AND N_COLUMNS AND INITIALIZE EACH VALUE AS 0
Matrix* make_matrix(int n_rows, int n_cols) {
Matrix* matrix = malloc(sizeof(Matrix));
matrix->rows = n_rows;
matrix->cols = n_cols;
double** data = malloc(sizeof(double*) * n_rows);
for (int x = 0; x < n_rows; x++) {
data[x] = calloc(n_cols, sizeof(double));
}
matrix->data = data;
return matrix;
}
// PRINT GIVEN MATRIX TO COMMAND LINE
void print_matrix(Matrix* m) {
for (int x = 0; x < m->rows; x++) {
for (int y = 0; y < m->cols; y++) {
printf("%f", m->data[x][y]);
printf("|");
}
printf("\n");
}
}
main.c & functions in separate files (Throws an exception)
main.c
#include "my_lib.h"
int main(){
// Create a 2 by 5 matrix & then print it to terminal
Matrix* m1 = make_matrix(2, 5);
print_matrix(m1);
return 0;
}
my_lib.h
#pragma once
// Our custom 2D matrix struct
typedef struct Matrix {
int rows; // number of rows
int cols; // number of columns
double** data; // a pointer to an array of n_rows pointers to rows
}Matrix;
Matrix* make_matrix(int n_rows, int n_cols);
void print_matrix(Matrix* m);
my_lib.c
#include "my_lib.h"
#include <stdio.h>
// CREATE A MATRIX WITH N_ROWS AND N_COLUMNS AND INITIALIZE EACH VALUE AS 0
Matrix* make_matrix(int n_rows, int n_cols) {
Matrix* matrix = malloc(sizeof(Matrix));
matrix->rows = n_rows;
matrix->cols = n_cols;
double** data = malloc(sizeof(double*) * n_rows);
for (int x = 0; x < n_rows; x++) {
data[x] = calloc(n_cols, sizeof(double));
}
matrix->data = data;
return matrix;
}
// PRINT GIVEN MATRIX TO COMMAND LINE
void print_matrix(Matrix* m) {
for (int x = 0; x < m->rows; x++) {
for (int y = 0; y < m->cols; y++) {
printf("%f", m->data[x][y]);
printf("|");
}
printf("\n");
}
}
The reason you get the crash is not related at all to the fact that you have one or two .c files in your project but it's because you forgot to include <stdlib.h> in my_lib.c.
This triggers following warnings:
my_lib.c(8) : warning C4013: 'malloc' undefined; assuming extern returning int
my_lib.c(13): warning C4013: 'calloc' undefined; assuming extern returning int
my_lib.c(13): warning C4047: '=': 'double *' differs in levels of indirection from 'int'
my_lib.c(8): warning C4047: 'initializing': 'Matrix *' differs in levels of indirection from 'int'
my_lib.c(11): warning C4047: 'initializing': 'double **' differs in levels of indirection from 'int'
You get away with it on a 32 bit build because the size of int is the same as the size of a pointer.
On the other hand, if you build your program as a 64-bit program, the warnings become really relevant: pointers are now 64 bits wide, but the compiler assumes that malloc etc. return int (a 32-bit value), so the returned addresses get truncated and everything gets messed up.
Actually these warnings should be considered as errors.
Here you decide if you want a 32 or a 64 bit build:
Add #include <stdlib.h> here in my_lib.c:
#include "my_lib.h"
#include <stdlib.h> // <<<<<<<<<<<<<
#include <stdio.h>
// CREATE A MATRIX WITH N_ROWS AND N_COLUMNS AND INITIALIZE EACH VALUE AS 0
Matrix* make_matrix(int n_rows, int n_cols) {
Matrix* matrix = malloc(sizeof(Matrix));
...
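For completeness, here is a hedged sketch of how my_lib.c could look with the include in place, checked allocations and a matching cleanup function. The free_matrix function and the NULL-on-failure convention are assumptions added for illustration; the original code has neither. To turn the warning into a hard error, MSVC accepts /we4013 and GCC accepts -Werror=implicit-function-declaration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>   /* declares malloc, calloc and free with pointer return types */

typedef struct Matrix {
    int rows;       /* number of rows */
    int cols;       /* number of columns */
    double **data;  /* array of `rows` pointers to rows */
} Matrix;

/* Hypothetical cleanup function: releases every row, the row table and the struct. */
void free_matrix(Matrix *m)
{
    if (m == NULL)
        return;
    if (m->data != NULL) {
        for (int x = 0; x < m->rows; x++)
            free(m->data[x]);           /* free(NULL) is a no-op */
        free(m->data);
    }
    free(m);
}

/* Creates a zero-filled n_rows x n_cols matrix, or returns NULL on failure. */
Matrix *make_matrix(int n_rows, int n_cols)
{
    assert(n_rows > 0 && n_cols > 0);
    Matrix *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->rows = n_rows;
    m->cols = n_cols;
    m->data = malloc((size_t)n_rows * sizeof *m->data);
    if (m->data == NULL) {
        free(m);
        return NULL;
    }
    for (int x = 0; x < n_rows; x++)
        m->data[x] = NULL;              /* lets free_matrix run safely after a partial failure */
    for (int x = 0; x < n_rows; x++) {
        m->data[x] = calloc((size_t)n_cols, sizeof **m->data);
        if (m->data[x] == NULL) {
            free_matrix(m);
            return NULL;
        }
    }
    return m;
}
```

The caller would then check the result of make_matrix against NULL before printing, and call free_matrix when done.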
I'm trying to initialize a 2-dimensional array in a structure but I always get an error :
gcc -g -Wall -W -I/usr/include/SDL -c -o fractal.o fractal.c
In file included from fractal.c:2:0:
fractal.h:12:12: error: array type has incomplete element type ‘double[]’
double values[][];
Here's the code:
struct fractal {
char name[64];
int height;
int width;
double a;
double b;
double meanValue;
double values[][]; /*This line is causing the error*/
};
Ideally I'd like to initialize the height and width of the 2-dimensional array like this:
struct fractal {
/*... Same code as above ...*/
double values[width][height];
};
But then I get two other errors when compiling:
gcc -g -Wall -W -I/usr/include/SDL -c -o fractal.o fractal.c
In file included from fractal.c:2:0:
fractal.h:12:19: error: ‘width’ undeclared here (not in a function)
double values[width][height];
^
fractal.h:12:26: error: ‘height’ undeclared here (not in a function)
double values[width][height];
^
I've looked about everywhere but my code should work and I can't figure out why it doesn't.
Thanks for the help
As a disclaimer, this is something of an advanced topic, so if you are a beginner you might want to just back away from it entirely and just use a double* array followed by a call to malloc for each pointer. (Fine for beginners, unacceptable in professional code.)
It is an advanced topic since this particular case is a weakness in C. The feature you are trying to use, with an empty array at the end of a struct, is known as flexible array member. This only works for one dimension however. If both dimensions are unknown at compile time, you have to come up with a work-around.
The allocation part is the same as for any flexible array member: allocate the struct dynamically and make room for the trailing array.
fractal_t* f = malloc(sizeof *f + sizeof(double[height][width]) );
(In this case taking advantage of the convenient VLA syntax, although a flexible array member is not a VLA.)
Technically, the last member of the struct is supposedly double[] now, or so says the struct declaration. But memory returned by malloc has no actual effective type until you access it, after which the effective type of that memory becomes the type used for the access.
We can use this rule to access that memory as if it was a double[][], even though the pointer type in the struct is a different one. Given a fractal f, the code for accessing through a pointer becomes something like this:
double (*array_2D)[width] = (double(*)[width]) f->values;
Where array_2D is an array pointer. The most correct type to use here would have been an array pointer to an array of double, double (*)[height][width], but that one comes with mandatory ugly accessing (*array_2D)[i][j]. To avoid such ugliness, a common trick is to leave out the left-most dimension in the array pointer declaration, then we can access it as array_2D[i][j] which looks far prettier.
Example code:
#include <stdlib.h>
#include <stdio.h>
typedef struct
{
char name[64];
size_t height;
size_t width;
double a;
double b;
double meanValue;
double values[];
} fractal_t;
fractal_t* fractal_create (size_t height, size_t width)
{
// using calloc since it conveniently fills everything with zeroes
fractal_t* f = calloc(1, sizeof *f + sizeof(double[height][width]) );
f->height = height;
f->width = width;
// ...
return f;
}
void fractal_destroy (fractal_t* f)
{
free(f);
}
void fractal_fill (fractal_t* f)
{
double (*array_2D)[f->width] = (double(*)[f->width]) f->values;
for(size_t height=0; height < f->height; height++)
{
for(size_t width=0; width < f->width; width++)
{
array_2D[height][width] = (double)width; // whatever value that makes sense
}
}
}
void fractal_print (const fractal_t* f)
{
const double (*array_2D)[f->width] = (const double(*)[f->width]) f->values; // f is const here, so keep the element type const
for(size_t height=0; height < f->height; height++)
{
for(size_t width=0; width < f->width; width++)
{
printf("%.5f ", array_2D[height][width]);
}
printf("\n");
}
}
int main (void)
{
int h = 3;
int w = 4;
fractal_t* fractal = fractal_create(h, w);
fractal_fill(fractal); // fill with some garbage value
fractal_print(fractal);
fractal_destroy(fractal);
}
Arrays with dynamic dimensions are not where C is at its best. Simple variable length arrays were only introduced in C99 and were made optional in C11. MSVC 2017 still does not accept them.
But here, you are trying to put one in a struct. That is not supported at all, because a struct must have a constant size(*) (otherwise, how could arrays of structs be handled?). So I am sorry, but this code cannot work, and I know of no way to express it in C.
A common way would be to replace the 2D dynamic array with a pointer, allocate the pointer to a 2D array and then use it, but even this is not really simple.
You have to design your struct differently...
(*) The last element of a struct may be of an incomplete type, for example int tab[];. That is a dangerous feature because the programmer is responsible for providing room for it. But anyway, you cannot build an array of incomplete types.
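As a rough sketch of the "design your struct differently" route, the struct below keeps a fixed size by holding a plain pointer to one contiguous block of height * width doubles and computing the index by hand. The names fractal_init, fractal_at and fractal_release are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct fractal {
    char name[64];
    size_t height;
    size_t width;
    double a;
    double b;
    double meanValue;
    double *values;   /* height * width doubles, stored row by row */
};

/* Hypothetical initializer: returns 0 on success, -1 if the allocation fails. */
int fractal_init(struct fractal *f, size_t height, size_t width)
{
    assert(f != NULL && height > 0 && width > 0);
    f->values = calloc(height * width, sizeof *f->values);
    if (f->values == NULL)
        return -1;
    f->name[0] = '\0';
    f->height = height;
    f->width = width;
    f->a = f->b = f->meanValue = 0.0;
    return 0;
}

/* Returns a pointer to element (row, col), so it can be read or written. */
double *fractal_at(struct fractal *f, size_t row, size_t col)
{
    assert(row < f->height && col < f->width);
    return &f->values[row * f->width + col];
}

void fractal_release(struct fractal *f)
{
    free(f->values);
    f->values = NULL;
}
```

Because the struct itself has a constant size, arrays of struct fractal work as usual; the cost is that every element access goes through the index computation instead of values[i][j].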
I encountered this problem while designing a struct to hold both the domain values (N x 1 vector) and the solution values (N x M matrix) in my ODE solver, so as to simplify the function interfaces. N and M are simulation-dependent and hence are unknown a priori. I solved it by using GNU Scientific Library's vector-matrix module. I found it more streamlined to work with than casting a FAM (albeit allocated as 2D) to a standalone whole-array-pointer.
After allocating memory for the struct, all we need to do is invoke gsl_matrix_alloc() to reserve space for the matrix. After we are done, calling gsl_matrix_free() will destroy it. Please note that these functions are data-type dependent as explained in the documentation.
Filename: struct_mat.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_statistics.h>
typedef struct _fractal {
char name[64];
size_t height;
size_t width;
double a;
double b;
double meanValue;
gsl_matrix *values;
} fractal;
fractal * fractal_create(size_t height, size_t width) {
fractal * fractalObj = (fractal *) malloc(sizeof(fractal));
// only touch the struct if its own allocation succeeded
if (fractalObj != NULL) fractalObj -> values = gsl_matrix_alloc(height, width);
if (fractalObj == NULL || fractalObj -> values == NULL) {
fprintf(stderr, "NULL pointer returned while allocating fractal object.. Exiting program.\n");
exit(EXIT_FAILURE);
}
fractalObj -> height = height;
fractalObj -> width = width;
fractalObj -> meanValue = 0.0;
return fractalObj;
}
void fractal_populate(fractal * fractalObj) {
srand(time(NULL));
double current_value = 0.0;
for (size_t r = 0; r < fractalObj -> height; ++r) {
for (size_t c = 0; c < fractalObj -> width; ++c) {
current_value = (double) rand() / (double) RAND_MAX;
gsl_matrix_set(fractalObj -> values, r, c, current_value);
}
}
}
void fractal_calcMeanValue(fractal * fractalObj) {
gsl_vector_view colVec;
for (size_t col = 0; col < fractalObj -> values -> size2; ++col) {
colVec = gsl_matrix_column(fractalObj -> values, col);
fractalObj -> meanValue += gsl_stats_mean(colVec.vector.data, colVec.vector.stride, colVec.vector.size);
}
fractalObj -> meanValue /= fractalObj -> values -> size2;
printf("\nThe mean value of the entire matrix is %lf\n", fractalObj -> meanValue);
}
void fractal_display(fractal * fractalObj) {
printf("\n");
for (size_t r = 0; r < fractalObj -> height; ++r) {
for (size_t c = 0; c < fractalObj -> width; ++c) {
printf("%lf ", gsl_matrix_get(fractalObj -> values, r, c));
}
printf("\n");
}
}
void fractal_delete(fractal * fractalObj) {
gsl_matrix_free(fractalObj -> values);
free(fractalObj);
}
int main(int argc, char const *argv[]){
// Program takes number of rows and columns as command line parameters
switch(argc) {
case 3:
printf("Running program..\n"); // to avoid the declaration-succeeding-label error
size_t height = atoi(argv[1]);
size_t width = atoi(argv[2]);
fractal * myFractal = fractal_create(height, width);
fractal_populate(myFractal);
fractal_display(myFractal);
fractal_calcMeanValue(myFractal);
fractal_delete(myFractal);
return 0;
default:
fprintf(stderr, "USAGE: struct_mat <rows> <columns>\n");
return 1;
}
}
Compile by linking with the GSL and GSL CBLAS libraries:
gcc -std=c99 struct_mat.c -o struct_mat -lgsl -lgslcblas -lm
You may install GSL via your distribution's package manager, Cygwin on Windows or by compiling the source.
In my limited experience, using a standard data structure proves to be far easier than wrestling with either FAMs or array-of-pointers-to-1D-arrays. However, the caveat is that we have to remember to allocate memory for the matrix after allocating the struct itself.
General Information
NOTE: I am also decently new to C, OpenAcc.
Hi I am trying to develop an image blurring program, but first I wanted to see if I could parallelize the for loops and copyin/copyout my values.
The problem I am facing currently is when I try to copyin and copyout my data and output variables. The error looks to be a buffer overflow (I have also googled it and that is what people have said), but I am not sure how I should go about fixing this. I think I am doing something wrong with the pointers, but I am not sure.
Thanks so much in advance, if you think that I missed some information please let me know and I can provide it.
Question
I would like to confirm what the error actually is?
How should I go about fixing the issue?
Anything I should look into more so I can fix this kind of issue myself in the future.
Error
FATAL ERROR: variable in data clause is partially present on the device: name=output
file:/nfs/u50/singhn8/4F03/A3/main.c ProcessImageACC line:48
output lives at 0x7ffca75f6288 size 16 not present
Present table dump for device[1]: NVIDIA Tesla GPU 1, compute capability 3.5
host:0x7fe98eaf9010 device:0xb05dc0000 size:2073600 presentcount:1 line:47 name:(null)
host:0x7fe98f0e8010 device:0xb05bc0000 size:2073600 presentcount:1 line:47 name:(null)
host:0x7ffca75f6158 device:0xb05ac0400 size:4 presentcount:1 line:47 name:filterRad
host:0x7ffca75f615c device:0xb05ac0000 size:4 presentcount:1 line:47 name:row
host:0x7ffca75f6208 device:0xb05ac0200 size:4 presentcount:1 line:47 name:col
host:0x7ffca75f6280 device:0xb05ac0600 size:16 presentcount:1 line:48 name:data
Program Definition
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <openacc.h>
// ================================================
// ppmFile.h
// ================================================
#include <sys/types.h>
typedef struct Image
{
int width;
int height;
unsigned char *data;
} Image;
Image* ImageCreate(int width,
int height);
Image* ImageRead(char *filename);
void ImageWrite(Image *image,
char *filename);
int ImageWidth(Image *image);
int ImageHeight(Image *image);
void ImageClear(Image *image,
unsigned char red,
unsigned char green,
unsigned char blue);
void ImageSetPixel(Image *image,
int x,
int y,
int chan,
unsigned char val);
unsigned char ImageGetPixel(Image *image,
int x,
int y,
int chan);
Blur Filter Function
// ================================================
// The Blur Filter
// ================================================
void ProcessImageACC(Image **data, int filterRad, Image **output) {
int row = (*data)->height;
int col = (*data)->width;
#pragma acc data copyin(row, col, filterRad, (*data)->data[0:row * col]) copyout((*output)->data[0:row * col])
#pragma acc kernels
{
#pragma acc loop independent
for (int j = 0; j < row; j++) {
#pragma acc loop independent
for (int i = 0; i < col; i++) {
(*output)->data[j * row + i] = (*data)->data[j * row + i];
}
}
}
}
Main Function
// ================================================
// Main Program
// ================================================
int main(int argc, char *argv[]) {
// vars used for processing:
Image *data, *result;
int dataSize;
int filterRadius = atoi(argv[1]);
// ===read the data===
data = ImageRead(argv[2]);
// ===send data to nodes===
// send data size in bytes
dataSize = sizeof(unsigned char) * data->width * data->height * 3;
// ===process the image===
// allocate space to store result
result = (Image *)malloc(sizeof(Image));
result->data = (unsigned char *)malloc(dataSize);
result->width = data->width;
result->height = data->height;
// initialize all to 0
for (int i = 0; i < (result->width * result->height * 3); i++) {
result->data[i] = 0;
}
// apply the filter
ProcessImageACC(&data, filterRadius, &result);
// ===save the data back===
ImageWrite(result, argv[3]);
return 0;
}
The problem here is that in addition to the data arrays, the output and data pointers need to be copied over as well. From the compiler feedback messages, you can see the compiler implicitly copying them over.
% pgcc -c image.c -ta=tesla:cc70 -Minfo=accel
ProcessImageACC:
46, Generating copyout(output->->data[:col*row])
Generating copyin(data->->data[:col*row],col,filterRad,row)
47, Generating implicit copyout(output[:1])
Generating implicit copyin(data[:1])
50, Loop is parallelizable
52, Loop is parallelizable
Accelerator kernel generated
Generating Tesla code
50, #pragma acc loop gang, vector(4) /* blockIdx.y threadIdx.y */
52, #pragma acc loop gang, vector(32) /* blockIdx.x threadIdx.x */
Now you might be able to get this to work by using unstructured data regions to create both the data and pointers, and then "attach" the pointers to the arrays (i.e. fill in the value of the device pointers to the address of the device data array).
Though an easier option is to create temp arrays to point to the data, and then copy the data to the device. This will also increase the performance of your code (both on the GPU and CPU) since it eliminates the extra levels of indirection.
void ProcessImageACC(Image **data, int filterRad, Image **output) {
int row = (*data)->height;
int col = (*data)->width;
unsigned char * ddata, * odata;
odata = (*output)->data;
ddata = (*data)->data;
#pragma acc data copyin(ddata[0:row * col]) copyout(odata[0:row * col])
#pragma acc kernels
{
#pragma acc loop independent
for (int j = 0; j < row; j++) {
#pragma acc loop independent
for (int i = 0; i < col; i++) {
odata[j * col + i] = ddata[j * col + i]; // the stride between rows is col (the width), not row
}
}
}
}
Note that scalars are firstprivate by default so there's no need to add the row, col, and filterRad variables in the data clause.
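As a hedged variation on the same idea, the sketch below takes the raw byte pointers directly and copies the whole buffer. It assumes the image is stored row by row with a fixed number of bytes per pixel (the main program above sizes its buffer with a factor of 3); copy_image and its channels parameter are hypothetical names. A compiler without OpenACC support ignores the pragma and runs the loops serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies a height x width image with `channels` bytes
   per pixel from src to dst. Both buffers hold height * width * channels bytes. */
void copy_image(const unsigned char *src, unsigned char *dst,
                int height, int width, int channels)
{
    assert(src != NULL && dst != NULL);
    int stride = width * channels;      /* bytes per image row */
    int total = height * stride;        /* bytes in the whole image */

    /* Only the flat arrays appear in data clauses; scalars need none. */
    #pragma acc parallel loop collapse(2) copyin(src[0:total]) copyout(dst[0:total])
    for (int j = 0; j < height; j++) {
        for (int i = 0; i < stride; i++) {
            dst[j * stride + i] = src[j * stride + i];
        }
    }
}
```

From ProcessImageACC this would be called as copy_image((*data)->data, (*output)->data, row, col, 3), again assuming three bytes per pixel.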
I'd like to use my own type matrix in C using a syntax matlab-like to access it. Is there a solution using the preprocessor? Thanks. (The following code doesn't work).
include <stdio.h>
#define array(i,j) array.p[i*array.nrows+j] //???????????
typedef struct
{
unsigned int nrows;
unsigned int ncols;
float* p;
} matrix;
int main()
{
unsigned int i=4,j=5;
float v=154;
matrix a;
a.p=(float*) malloc(10*sizeof(float));
array(i,j)=v;
return 0;
}
You would have to pass in the name of your array to the macro, but yes, you could do something like that.
Just as an FYI, MATLAB order is more generally known as "column major". C order is more generally known as "row major order".
I have taken the liberty of correcting 1) your memory allocation and 2) your initialization of the dimensions since they are necessary for the macro to work properly:
#include <stdio.h>
#include <stdlib.h>
#define INDEX(mat, i, j) (mat).p[(j) * (mat).nrows + (i)]
typedef struct
{
unsigned int nrows;
unsigned int ncols;
float *p;
} matrix;
int main()
{
unsigned int i = 4, j = 5;
float v = 154;
matrix a = {i, j, NULL};
a.p = malloc(i * j * sizeof(float));
INDEX(a, i - 1, j - 1) = v;
return 0;
}
Here the order is column major, but the index is still zero-based. I have highlighted this by accessing index [i - 1, j - 1] instead of [i, j]. If you want one-based indexing to really conform to MATLAB's way of doing things, you can change your macro to this:
#define INDEX(mat, i, j) (mat).p[((j) - 1) * (mat).nrows + (i) - 1]
Then in main, you could do:
INDEX(a, i, j) = v;
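To check the layout, here is a small sketch under the assumption that column-major means element (i, j) of an nrows x ncols matrix sits at offset j * nrows + i, with zero-based indices. The helper matrix_zeros is hypothetical and only exists to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct
{
    unsigned int nrows;
    unsigned int ncols;
    float *p;
} matrix;

/* Zero-based, column-major: element (i, j) lives at offset j * nrows + i. */
#define INDEX(mat, i, j) ((mat).p[(size_t)(j) * (mat).nrows + (i)])

/* Hypothetical helper: returns a zero-filled nrows x ncols matrix.
   The p member is NULL if the allocation failed. */
matrix matrix_zeros(unsigned int nrows, unsigned int ncols)
{
    assert(nrows > 0 && ncols > 0);
    matrix m = { nrows, ncols, NULL };
    m.p = calloc((size_t)nrows * ncols, sizeof *m.p);
    return m;
}
```

With a 4 x 5 matrix, INDEX(a, 1, 0) is the slot right after INDEX(a, 0, 0), while INDEX(a, 0, 1) is nrows slots further on, which is what walking down a column first looks like in memory.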
Maybe this approach could help you
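#include <stdio.h>
#include <stdlib.h>
// pass the matrix as the first argument; every macro argument is parenthesized
#define array(arr,i,j) ((arr).p[ (i) * (arr).ncols + (j) ])
typedef struct
{
unsigned int nrows;
unsigned int ncols;
float* p;
} matrix;
int main()
{
unsigned int i=4,j=5;
float v=154;
matrix a;
a.nrows=10; // the dimensions must be set before the macro is used
a.ncols=10;
a.p=(float*) malloc(a.nrows*a.ncols*sizeof(float));
array(a,i,j)=v; // row-major: row i, column j
free(a.p);
return 0;
}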
If you try to write a(i,j) without such a macro, the compiler will treat it as a call to a function named "a".
thank you for your help.
I fixed some trivial errors in the initial code.
I see your answers and I suppose that a matlab syntax cannot be implemented using a generic #define rule in the preprocessor for several structs.
Instead, a #define rule must be written for each specific struct as the following code.
Thank you so much again.
#include <stdio.h>
#include <stdlib.h>
#define a(i,j) a.p[(i)*a.nrows+(j)]
typedef struct
{
unsigned int nrows;
unsigned int ncols;
float* p;
} matrix;
int main()
{
unsigned int i=4,j=5;
float v=154;
matrix a;
a.nrows=10;
a.ncols=10;
a.p=(float *) malloc(100*sizeof(float));
a(i,j)=v;
return 0;
}
I have a dynamically declared 2D array in my C program, the contents of which I want to transfer to a CUDA kernel for further processing. Once processed, I want to populate the dynamically declared 2D array in my C code with the CUDA processed data. I am able to do this with static 2D C arrays but not with dynamically declared C arrays. Any inputs would be welcome!
I mean the dynamic array of dynamic arrays. The test code that I have written is as below.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <conio.h>
#include <math.h>
#include <stdlib.h>
const int nItt = 10;
const int nP = 5;
__device__ int d_nItt = 10;
__device__ int d_nP = 5;
__global__ void arr_chk(float *d_x_k, float *d_w_k, int row_num)
{
int index = (blockIdx.x * blockDim.x) + threadIdx.x;
int index1 = (row_num * d_nP) + index;
if ( (index1 >= row_num * d_nP) && (index1 < ((row_num +1)*d_nP))) //Modifying only one row data pertaining to one particular iteration
{
d_x_k[index1] = row_num * d_nP;
d_w_k[index1] = index;
}
}
float **mat_create2(int r, int c)
{
float **dynamicArray;
dynamicArray = (float **) malloc (sizeof (float *)*r); // r row pointers, not r floats
for(int i=0; i<r; i++)
{
dynamicArray[i] = (float *) malloc (sizeof (float)*c);
for(int j= 0; j<c;j++)
{
dynamicArray[i][j] = 0;
}
}
return dynamicArray;
}
/* Freeing memory - here only number of rows are passed*/
void cleanup2d(float **mat_arr, int x)
{
int i;
for(i=0; i<x; i++)
{
free(mat_arr[i]);
}
free(mat_arr);
}
int main()
{
//float w_k[nItt][nP]; //Static array declaration - works!
//float x_k[nItt][nP];
// if I uncomment this dynamic declaration and comment the static one, it does not work.....
float **w_k = mat_create2(nItt,nP);
float **x_k = mat_create2(nItt,nP);
float *d_w_k, *d_x_k; // Device variables for w_k and x_k
int nblocks, blocksize, nthreads;
for(int i=0;i<nItt;i++)
{
for(int j=0;j<nP;j++)
{
x_k[i][j] = (nP*i);
w_k[i][j] = j;
}
}
for(int i=0;i<nItt;i++)
{
for(int j=0;j<nP;j++)
{
printf("x_k[%d][%d] = %f\t",i,j,x_k[i][j]);
printf("w_k[%d][%d] = %f\n",i,j,w_k[i][j]);
}
}
int size1 = nItt * nP * sizeof(float);
printf("\nThe array size in memory bytes is: %d\n",size1);
cudaMalloc( (void**)&d_x_k, size1 );
cudaMalloc( (void**)&d_w_k, size1 );
if((nP*nItt)<32)
{
blocksize = nP*nItt;
nblocks = 1;
}
else
{
blocksize = 32; // Defines the number of threads running per block. Taken equal to warp size
nthreads = blocksize;
nblocks = ceil(float(nP*nItt) / nthreads); // Calculated total number of blocks thus required
}
for(int i = 0; i< nItt; i++)
{
cudaMemcpy( d_x_k, x_k, size1,cudaMemcpyHostToDevice ); //copy of x_k to device
cudaMemcpy( d_w_k, w_k, size1,cudaMemcpyHostToDevice ); //copy of w_k to device
arr_chk<<<nblocks, blocksize>>>(d_x_k,d_w_k,i);
cudaMemcpy( x_k, d_x_k, size1, cudaMemcpyDeviceToHost );
cudaMemcpy( w_k, d_w_k, size1, cudaMemcpyDeviceToHost );
}
printf("\nVerification after return from gpu\n");
for(int i = 0; i<nItt; i++)
{
for(int j=0;j<nP;j++)
{
printf("x_k[%d][%d] = %f\t",i,j,x_k[i][j]);
printf("w_k[%d][%d] = %f\n",i,j,w_k[i][j]);
}
}
cudaFree( d_x_k );
cudaFree( d_w_k );
cleanup2d(x_k,nItt);
cleanup2d(w_k,nItt);
getch();
return 0;
}
I mean the dynamic array of dynamic arrays.
Well, that's exactly where the problem lies. A dynamic array of dynamic arrays consists of a whole bunch of disjoint memory blocks, one for each row of the array (as is clearly seen from the malloc inside your for loop in mat_create2). So you can't copy such a data structure to device memory with just one call to cudaMemcpy*. Instead, you have to do one of the following:
1. Also use dynamic arrays of dynamic arrays on CUDA. To do this, you basically have to recreate your mat_create2 function, using cudaMalloc instead of malloc, then copy each row separately.
2. Use a "tight" 2d array on CUDA, like you do now (which is a good thing, at least performance-wise!). But if you keep using dyn-dyn arrays in host memory, you still have to copy each row separately, like
for(int i=0; i<r; ++i){
cudaMemcpy(d_x_k + i*c, x_k[i], c*sizeof(float), cudaMemcpyHostToDevice);
}
You may wonder, "why did it work with a static 2d array, then?" Well, static 2d arrays in C are proper, tight arrays that can be copied in one go. It's a bit confusing that these are indexed with exactly the same syntax as dyn-dyn arrays (arr[x][y]), because the indexing actually works completely differently.
But you should consider using tight arrays on host memory, too, perhaps with an object-oriented wrapper like
typedef struct {
float* data;
int n_rows, n_cols;
} tight2dFloatArray;
#define INDEX_TIGHT2DARRAY(arr, y, x)\
(arr).data[(y)*(arr).n_cols + (x)]
Such an approach can of course be implemented much more safely as a C++ class.
*You also can't copy it inside main memory with just one memcpy: that only copies the array of pointers, not the actual data.
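On the host side, the difference can be sketched in plain C. The helper tight_from_rows is hypothetical: it packs separately allocated rows into one contiguous block, after which the whole matrix can be moved with a single copy call, whether memcpy or a single cudaMemcpy of n_rows * n_cols floats.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    float *data;
    int n_rows, n_cols;
} tight2dFloatArray;

/* Row-major access: row y, column x. */
#define INDEX_TIGHT2DARRAY(arr, y, x) \
    ((arr).data[(size_t)(y) * (arr).n_cols + (x)])

/* Hypothetical helper: packs r separately allocated rows of c floats each
   into one contiguous block. The data member is NULL if malloc fails. */
tight2dFloatArray tight_from_rows(float *const *rows, int r, int c)
{
    assert(rows != NULL && r > 0 && c > 0);
    tight2dFloatArray t = { NULL, r, c };
    t.data = malloc((size_t)r * c * sizeof *t.data);
    if (t.data == NULL)
        return t;
    for (int i = 0; i < r; i++)     /* one copy per row, as with the dyn-dyn array */
        memcpy(t.data + (size_t)i * c, rows[i], (size_t)c * sizeof *t.data);
    return t;
}
```

Once the data is tight, copying it elsewhere is a single call over n_rows * n_cols * sizeof(float) bytes, which is exactly what a single memcpy of the pointer array cannot give you.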