This is one of those questions where there are so many answers, and yet none do the specific thing.
I tried to look at all of these posts —
1 2 3 4 5 6 7 8 9 — and every time the solution would be either using VLAs, using normal arrays with fixed dimensions, or using pointer to pointer.
What I want is to allocate:
dynamically (using a variable set at runtime)
rectangular ("2d array") (I don't need a jagged one. And I guess it would be impossible to do it anyway.)
contiguous memory (in #8 and some other posts, people say that pointer to pointer is bad because of heap stuff and fragmentation)
no VLAs (I heard they are the devil and to always avoid them and not to talk to people who suggest using them in any scenario).
So please, if there is a post I skipped, or didn't read thoroughly enough, that fulfils these requirements, point me to it.
Otherwise, I would ask of you to educate me about this and tell me if this is possible, and if so, how to do it.
You can dynamically allocate a contiguous 2D array as
int (*arr)[cols] = malloc( rows * sizeof (int [cols]) );
and then access elements as arr[i][j].
If your compiler doesn’t support VLAs, then cols will have to be a constant expression.
Preliminary
no VLAs (I heard they are the devil and to always avoid them and not to talk to people who suggest using them in any scenario).
VLAs are a problem if you want your code to work with Microsoft's stunted C compiler, as MS has steadfastly refused to implement VLA support, even when C99, in which VLA support was mandatory, was the current language standard. Generally speaking, I would suggest avoiding Microsoft's C compiler altogether if you can, but I will stop well short of suggesting the avoidance of people who advise you differently.
VLAs are also a potential problem when you declare an automatic object of VLA type, without managing the maximum dimension. Especially so when the dimension comes from user input. This produces a risk of program crash that is hard to test or mitigate at development time, except by avoiding the situation in the first place.
But it is at best overly dramatic to call VLAs "the devil", and I propose that anyone who actually told you "not to talk to people who suggest using them in any scenario" must not have trusted you to understand the issues involved or to evaluate them for yourself. In particular, pointers to VLAs are a fine way to address all your points besides "no VLAs", and they have no particular technical issues other than lack of support by (mostly) Microsoft. Support for these will be mandatory again in C2X, the next C language specification, though support for some other forms of VLA use will remain optional.
Your requirements
If any of the dimensions of an array type are not given by integer constant expressions, then that type is by definition a variable-length array type. If any dimension but the first of an array type is not given by an integer constant expression, then you cannot express the corresponding pointer type without using a VLA.
Therefore, if you want a contiguously allocated multidimensional array (array of arrays) for which any dimension other than the first is chosen at runtime, then a VLA type must be involved. Allocating such an object dynamically works great and has little or no downside other than lack of support by certain compilers (which is a non-negligible consideration, to be sure). It would look something like this:
void do_something(size_t rows, size_t columns) {
    int (*my_array)[columns]; // pointer to VLA

    my_array = malloc(rows * sizeof(*my_array));
    // ... access elements as my_array[row][col] ...
}
You should have seen similar in some of the Q&As you reference in the question.
If that's not acceptable, then you need to choose which of your other requirements to give up. I would suggest the "multi-dimensional" part. Instead, allocate (effectively) a one-dimensional array, and use it as if it had two dimensions by performing appropriate index computations upon access. This should perform almost as well, because it's pretty close to what the compiler will set up automatically for a multidimensional array. You can make it a bit easier on yourself by creating a macro to assist with the computations. For example,
#define ELEMENT_2D(a, dim2, row, col) ((a)[(row) * (dim2) + (col)])

void do_something(size_t rows, size_t columns) {
    int *my_array;

    my_array = malloc(rows * columns * sizeof(*my_array));
    // ... access elements as ELEMENT_2D(my_array, columns, row, col) ...
}
Alternatively, you could give up the contiguous allocation and go with an array of pointers instead. This is what people who don't fully understand arrays, pointers, and/or dynamic allocation typically reach for, and although it has some legitimate applications, especially for arrays of pointers to strings, it is mostly downside relative to contiguous allocation for the kinds of applications where one wants an object they think of as a 2D array.
Often an array of pointers is allocated and then memory is allocated to each pointer.
This could be inverted. Allocate a large contiguous block of memory. Allocate an array of pointers and assign addresses from within the contiguous block.
#include <stdio.h>
#include <stdlib.h>
int **contiguous(int rows, int cols, int **memory, int **pointers) {
    int *temp = NULL;
    int **ptrtemp = NULL;

    // allocate a large block of memory
    if (NULL == (temp = realloc(*memory, sizeof **memory * rows * cols))) {
        fprintf(stderr, "problem memory realloc\n");
        return pointers;
    }
    *memory = temp;
    // allocate pointers
    if (NULL == (ptrtemp = realloc(pointers, sizeof *pointers * rows))) {
        fprintf(stderr, "problem pointers realloc\n");
        return pointers;
    }
    pointers = ptrtemp;
    for (int rw = 0; rw < rows; ++rw) {
        pointers[rw] = &(*memory)[rw * cols]; // assign addresses to pointers
    }
    // assign some values
    for (int rw = 0; rw < rows; ++rw) {
        for (int cl = 0; cl < cols; ++cl) {
            pointers[rw][cl] = rw * cols + cl;
        }
    }
    return pointers;
}
int main(void) {
    int *memory = NULL;
    int **ptrs = NULL;
    int rows = 20;
    int cols = 17;

    if ((ptrs = contiguous(rows, cols, &memory, ptrs))) {
        for (int rw = 0; rw < rows; ++rw) {
            for (int cl = 0; cl < cols; ++cl) {
                printf("%3d ", ptrs[rw][cl]);
            }
            printf("\n");
        }
        free(memory);
        free(ptrs);
    }
    return 0;
}
Suppose you need a 2D array of size W x H containing ints (where H is the number of rows, and W the number of columns).
Then you can do the following:
Allocation:
int * a = malloc(W * H * sizeof(int));
Access element at location (i,j):
int val = a[j * W + i];
a[j * W + i] = val;
The whole array occupies one contiguous block of memory and can be dynamically allocated (without VLAs). Being a contiguous block offers an advantage over an array of pointers due to [potentially] fewer cache misses.
In such an array, the term "stride" refers to the offset from one row to the next. If you need padding, e.g. to make sure all rows start at an aligned address, you can use a stride that is bigger than W.
I did a benchmark between:
the classic pointer to an array of pointers to individually malloc'd memory
one pointer to contiguous memory, accessed with a[x * COLS + y]
a mix of both - pointer to an array of pointers into sliced-up malloc'd contiguous memory
TL;DR:
the second one appears to be faster by 2-12% compared to the others, which are sort of similar in performance.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define ROWS 100
#define COLS 100
#define LOOPS 100
#define NORMAL 0
#define SINGLE 1
#define HYBRID 2
int **x_normal; /* global vars to make it more equal */
int *y_single;
int *z_hybrid_memory;
int **z_hybrid_pointers;
int copy_array[ROWS][COLS];
void x_normal_write(int magic) { /* magic number to prevent compiler from optimizing it */
    int i, ii;
    for (i = 0; i < ROWS; i++) {
        for (ii = 0; ii < COLS; ii++) {
            x_normal[i][ii] = (i * COLS + ii + magic);
        }
    }
}
void y_single_write(int magic) {
    int i, ii;
    for (i = 0; i < ROWS; i++) {
        for (ii = 0; ii < COLS; ii++) {
            y_single[i * COLS + ii] = (i * COLS + ii + magic);
        }
    }
}
void z_hybrid_write(int magic) {
    int i, ii;
    for (i = 0; i < ROWS; i++) {
        for (ii = 0; ii < COLS; ii++) {
            z_hybrid_pointers[i][ii] = (i * COLS + ii + magic);
        }
    }
}
void x_normal_copy(void) {
    int i, ii;
    for (i = 0; i < ROWS; i++) {
        for (ii = 0; ii < COLS; ii++) {
            copy_array[i][ii] = x_normal[i][ii];
        }
    }
}
void y_single_copy(void) {
    int i, ii;
    for (i = 0; i < ROWS; i++) {
        for (ii = 0; ii < COLS; ii++) {
            copy_array[i][ii] = y_single[i * COLS + ii];
        }
    }
}
void z_hybrid_copy(void) {
    int i, ii;
    for (i = 0; i < ROWS; i++) {
        for (ii = 0; ii < COLS; ii++) {
            copy_array[i][ii] = z_hybrid_pointers[i][ii];
        }
    }
}
int main() {
    int i;
    clock_t start, end;
    double times_read[3][LOOPS];
    double times_write[3][LOOPS];

    /* MALLOC X_NORMAL 1/2 */
    x_normal = malloc(ROWS * sizeof(int*)); /* rows */
    for (i = 0; i < ROWS; i += 2) { /* malloc every other row to ensure memory isn't contiguous */
        x_normal[i] = malloc(COLS * sizeof(int)); /* columns for each row (1/2) */
    }
    /* MALLOC Y_SINGLE */
    y_single = malloc(ROWS * COLS * sizeof(int)); /* all in one contiguous block */
    /* MALLOC Z_HYBRID */
    z_hybrid_memory = malloc(ROWS * COLS * sizeof(int)); /* memory part - one big chunk of contiguous memory */
    z_hybrid_pointers = malloc(ROWS * sizeof(int*)); /* pointer part - like in normal */
    for (i = 0; i < ROWS; i++) { /* assign addresses to pointers from "memory", spaced out by COLS */
        z_hybrid_pointers[i] = &z_hybrid_memory[(i * COLS)];
    }
    /* MALLOC X_NORMAL 2/2 */
    for (i = 1; i < ROWS; i += 2) { /* malloc every other row to ensure memory isn't contiguous */
        x_normal[i] = malloc(COLS * sizeof(int)); /* columns for each row (2/2) */
    }
    /* TEST */
    for (i = 0; i < LOOPS; i++) {
        /* NORMAL WRITE */
        start = clock();
        x_normal_write(i);
        end = clock();
        times_write[NORMAL][i] = (double)(end - start);
        /* SINGLE WRITE */
        start = clock();
        y_single_write(i);
        end = clock();
        times_write[SINGLE][i] = (double)(end - start);
        /* HYBRID WRITE */
        start = clock();
        z_hybrid_write(i);
        end = clock();
        times_write[HYBRID][i] = (double)(end - start);
        /* NORMAL READ */
        start = clock();
        x_normal_copy();
        end = clock();
        times_read[NORMAL][i] = (double)(end - start);
        /* SINGLE READ */
        start = clock();
        y_single_copy();
        end = clock();
        times_read[SINGLE][i] = (double)(end - start);
        /* HYBRID READ */
        start = clock();
        z_hybrid_copy();
        end = clock();
        times_read[HYBRID][i] = (double)(end - start);
    }
    /* REPORT FINDINGS */
    printf("CLOCKS NEEDED FOR:\n\nREAD\tNORMAL\tSINGLE\tHYBRID\tWRITE\tNORMAL\tSINGLE\tHYBRID\n\n");
    for (i = 0; i < LOOPS; i++) {
        printf(
            "\t%.1f\t%.1f\t%.1f\t\t%.1f\t%.1f\t%.1f\n",
            times_read[NORMAL][i], times_read[SINGLE][i], times_read[HYBRID][i],
            times_write[NORMAL][i], times_write[SINGLE][i], times_write[HYBRID][i]
        );
        /* USE [0] to get totals */
        times_read[NORMAL][0] += times_read[NORMAL][i];
        times_read[SINGLE][0] += times_read[SINGLE][i];
        times_read[HYBRID][0] += times_read[HYBRID][i];
        times_write[NORMAL][0] += times_write[NORMAL][i];
        times_write[SINGLE][0] += times_write[SINGLE][i];
        times_write[HYBRID][0] += times_write[HYBRID][i];
    }
    printf("TOTAL:\n\t%.1f\t%.1f\t%.1f\t\t%.1f\t%.1f\t%.1f\n",
        times_read[NORMAL][0], times_read[SINGLE][0], times_read[HYBRID][0],
        times_write[NORMAL][0], times_write[SINGLE][0], times_write[HYBRID][0]
    );
    printf("AVERAGE:\n\t%.1f\t%.1f\t%.1f\t\t%.1f\t%.1f\t%.1f\n",
        (times_read[NORMAL][0] / LOOPS), (times_read[SINGLE][0] / LOOPS), (times_read[HYBRID][0] / LOOPS),
        (times_write[NORMAL][0] / LOOPS), (times_write[SINGLE][0] / LOOPS), (times_write[HYBRID][0] / LOOPS)
    );
    return 0;
}
Though maybe this is not the best approach, since the results can be tainted by random factors, such as the proximity of the source arrays to the destination array in the copy functions (though the numbers are consistent for reads and writes; perhaps someone can expand on this).
I'm new to C programming, so I am probably doing something really stupid here. I am trying to get a value from a 2D array that I read in from a text file with ~70 million lines.
When running the code, I get a seg fault and I have narrowed it down to line 10: if (i == graph[j][0])
void convertToCSR(int source, int maxNodes, int maxEdges, int* vertices, int* edges, int** graph) {
    int i;
    int j;
    int edge = 0;

    for (i = 0; i < maxNodes; i++) {
        vertices[i] = edge;
        for (j = 0; j < maxEdges; j++) {
            if (i == graph[j][0]) {
                //Sets edges[0] to the first position
                edges[edge] = graph[j][1];
                printf("new edge value: %d\n", edge);
                edge++;
            }
        }
    }
    vertices[maxNodes] = maxEdges;
}
I have tried this with smaller datasets, e.g. 50 bytes, and it works fine. With further testing, I print out the value of graph[0][0] and I get a seg fault.
The graph has loaded the data and was allocated like this:
int num_rows = 69000000;
graph = (int**) malloc(sizeof(int*) * num_rows);
for (i = 0; i < num_rows; i++) {
    graph[i] = (int*) malloc(sizeof(int) * 2);
}
I am also able to get the value of graph[0][0] outside of this method, but not inside. What am I doing wrong? I appreciate any help.
EDIT: In my main method, I am doing the following:
readInputFile(file);
int source = graph[0][0];
convertToCSR(source, maxNodes, maxEdges, nodes, edges, graph);
I have the correct value for the variable : source.
It seg faults in the convertToCSR method.
num_rows is 69000000, which fits comfortably in an int (INT_MAX is at least 2147483647), so integer overflow is not the problem here.
What can fail is the allocation itself: the pointer table alone needs about 69000000 * 8 bytes ≈ 550 MB on a 64-bit system, and on top of that you make 69 million separate two-int allocations, each with its own bookkeeping overhead.
When malloc cannot satisfy a request it returns NULL, and your code never checks for that, so the first dereference of a failed allocation is a segfault. Check every malloc return value, and also make sure the row indices you use later (graph[j][0] with j up to maxEdges) stay below num_rows.
I'm currently porting some C (as part of a wider R package) to Go. Because the C in question is used as part of an R package, it has to make extensive use of pointers. The R package is changepoint.np.
As somebody who isn't experienced in C, I've managed to understand most of it. However, the following code has me a bit stumped:
double *sumstat; /* matrix in R: nquantile rows, n cols */
int *n; /* length of data */
int *minseglen; /* minimum segment length */
int *nquantiles; /* num. quantiles in empirical distribution */
...[abridged for brevity]...
int j;
int isum;
double *sumstatout;
sumstatout = (double *)calloc(*nquantiles,sizeof(double));
for (j = *minseglen; j < (2*(*minseglen)); j++) {
    for (isum = 0; isum < *nquantiles; isum++) {
        *(sumstatout+isum) = *(sumstat+isum+(*nquantiles*(j))) - *(sumstat+isum+(*nquantiles*(0)));
    }
}
Specifically, this line (in the inner for loop):
*(sumstatout+isum) = *(sumstat+isum+(*nquantiles*(j))) - *(sumstat+isum+(*nquantiles*(0)));
I've read various pages and Stackoverflow questions/answers about C pointers and arrays, and if I understood them correctly, this line would be translated into Go as:
n := len(data)
nquantiles := int(4 * math.Log(float64(len(data))))
sumstatout[isum] = sumstat[isum*n + nquantiles*j] - sumstat[isum*n + nquantiles*0]
Where n is the number of columns (*n in the C code), and nquantiles is the number of rows (*nquantiles in the C code).
However this produces an error (index out of range, obviously) where the original code does not.
Where am I going wrong?
In the line:
sumstatout[isum] = sumstat[isum*n + nquantiles*j] - sumstat[isum*n + nquantiles*0]
I see two strange things:
1) Where did the n in isum*n come from? The n is not part of the original expression.
2) nquantiles is a pointer in the original code so it can't be used that way.
In C it should rather be:
sumstatout[isum] = sumstat[isum + *nquantiles*j] - sumstat[isum]
The original C code treats a (contiguous) memory area as a 2D matrix. Like this:
int i, j;
int cols = ..some number..;
int rows = ..some number..;
double* matrix = malloc(cols * rows * sizeof *matrix);
for (i = 0; i < rows; ++i)
    for (j = 0; j < cols; ++j)
        *(matrix + i*cols + j) = ... some thing ...;
        /*         ^^^^^^   ^
                   move to  move to
                   row i    column j */
That is equivalent to:
int i, j;
int cols = ..some number..;
int rows = ..some number..;
double matrix[rows][cols];
for (i = 0; i < rows; ++i)
for (j = 0; j < cols; ++j)
matrix[i][j] = ... some thing ...;
I'm trying to implement a kernel which does parallel reduction. The code below works on occasion, I have not been able to pin down why it goes wrong on the occasions it does.
__kernel void summation(__global float* input, __global float* partialSum, __local float* localSum) {
    int local_id = get_local_id(0);
    int workgroup_size = get_local_size(0);

    localSum[local_id] = input[get_global_id(0)];
    for (int step = workgroup_size/2; step > 0; step /= 2) {
        barrier(CLK_LOCAL_MEM_FENCE);
        if (local_id < step) {
            localSum[local_id] += localSum[local_id + step];
        }
    }
    if (local_id == 0) {
        partialSum[get_group_id(0)] = localSum[0];
    }
}
Essentially I'm summing the values per work group and storing each work group's total into partialSum, the final summation is done on the host. Below is the code which sets up the values for the summation.
size_t global[1];
size_t local[1];
const int DATA_SIZE = 15000;
float *input = NULL;
float *partialSum = NULL;
int count = DATA_SIZE;
local[0] = 2;
global[0] = count;
input = (float *)malloc(count * sizeof(float));
partialSum = (float *)malloc(global[0]/local[0] * sizeof(float));
int i;
for (i = 0; i < count; i++) {
    input[i] = (float)i + 1;
}
I'm thinking it has something to do when the size of the input is not a power of two? I noticed it begins to go off for numbers around 8000 and beyond. Any assistance is welcome. Thanks.
I'm thinking it has something to do when the size of the input is not a power of two?
Yes. Consider what happens when you try to reduce, say, 9 elements. Suppose you launch 1 work-group of 9 work-items:
for (int step = workgroup_size / 2; step > 0; step /= 2) {
    // At iteration 0: step = 9 / 2 = 4
    barrier(CLK_LOCAL_MEM_FENCE);
    if (local_id < step) {
        // Branch taken by threads 0 to 3
        // Only 8 numbers added up together!
        localSum[local_id] += localSum[local_id + step];
    }
}
You're never summing the 9th element, hence the reduction is incorrect. An easy solution is to pad the input data with enough zeroes to make the work-group size the immediate next power-of-two.
I want to do some calculations with some matrices whose size is 2048*2048, for example. But the simulator stops working and does not simulate the code. I understood that the problem is about the size and type of the variable. For example, I ran the simple code below to check whether I was right or not. It should print 1 after declaring variable A, but it does not.
Please note that I use Codeblocks. WFM is a function to write a float matrix in a text file and it works properly because I check that before with other matrices.
int main()
{
    float A[2048][2048];
    printf("1");
    float *AP = &(A[0][0]);
    const char *File_Name = "example.txt";
    int counter = 0;

    for (int i = 0; i < 2048; i++)
        for (int j = 0; j < 2048; j++)
        {
            A[i][j] = counter;
            ++counter;
        }
    WFM(AP, 2048, 2048, File_Name, ' ');
    return 0;
}
Any help and suggestion to deal with this problem and larger matrices is appreciate it.
Thanks
float A[2048][2048];
which requires approx. 2K * 2K * 4 = 16 MB of stack memory (a float is 4 bytes). But typically the stack size of a process is far less than that, often only 1 to 8 MB. Please allocate it dynamically using the malloc family.
float A[2048][2048];
This may be too large for a local array; you should allocate the memory dynamically with a function such as malloc. For example, you could do this:
float *A = malloc(2048 * 2048 * sizeof(float));
if (A == NULL)
{
    perror("malloc");
    exit(1);
}
float *AP = A;
int counter = 0;
for (int i = 0; i < 2048; i++)
    for (int j = 0; j < 2048; j++)
    {
        *(A + 2048*i + j) = counter;
        ++counter;
    }
And when you no longer need A, you can free it with free(A);.
Helpful links about efficiency pitfalls of large arrays with power-of-2 size (offered by #LưuVĩnhPhúc):
Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513?
Why is my program slow when looping over exactly 8192 elements?
Matrix multiplication: Small difference in matrix size, large difference in timings