I'm going to write in pseudo code to make my question more clear. Please keep in mind this code will be done in C.
Imagine I have an array of any amount of numbers. The first number tells me how big of an array we're dealing with. For example, if my first number is 3, it means I have two 3x3 matrices. So I create two multidimensional arrays with:
matrix1[3][3]
matrix2[3][3]
What I'm having a hard time with is the arithmetic/coding needed to assign all the numbers to the matrices. I'm having a very hard time visualizing how it would be done.
Imagine a test array contains [2,1,2,3,4,5,6,7,8]
My program should now have two matrices containing:
1 2 5 6
3 4 7 8
Do I need several nested loops? Any help would be appreciated.
At the moment, the only idea I have is to use two pairs of nested for loops. Alternatively, you can write a function and call it once per matrix (but don't forget to pass k as the second argument).
int i, j, k;
/* We start in the 2nd element of the array that's why k = 1. */
k = 1;
/* Now we fill the array1 copying 1 by 1 the elements of the "test array" until
we fill it. Then we do the same with the array2. */
for( i = 0; i < test[ 0 ]; i++ ){
for( j = 0; j < test[ 0 ]; j++ ){
array1[ i ][ j ] = test[ k ];
k++;
}
}
for( i = 0; i < test[ 0 ]; i++ ){
for( j = 0; j < test[ 0 ]; j++ ){
array2[ i ][ j ] = test[ k ];
k++;
}
}
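The function variant mentioned above could be sketched like this. It is only a sketch: fill_matrix is a made-up name, k is passed by pointer so that the position in the test array carries over from one call to the next, and the VLA parameter assumes a C99 compiler.

```c
#include <stddef.h>

/* Copy the next n*n values of `test`, starting at index *k, into the
   n-by-n matrix `m` in row-major order, and advance *k past them.
   `fill_matrix` is a hypothetical helper name. */
void fill_matrix(int n, int m[n][n], const int *test, size_t *k)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            m[i][j] = test[(*k)++];
        }
    }
}
```

You would start with k = 1 (to skip the size element) and call it once for array1 and once for array2, passing &k both times.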
Your data is presented in row-major order. After reading your integer array and validating the content (i.e. the dim=4 means 32 values follow, dim=2 means 8 values follow, etc.) I'm not sure why you want to allocate or loop anything.
I.e. You can use your physical test[] data as the matrices:
int dim = test[0];
int (*mat1)[dim] = (int (*)[dim])(test+1);
int (*mat2)[dim] = (int (*)[dim])(test+1 + dim*dim);
Variable length arrays (VLAs) are mandatory in C99 and were made an optional feature in C11 (i.e. a C11 compiler can support the feature as it is defined by the standard, but does not have to; see 6.7.6.2 of the C11 standard for more info). If your toolchain does NOT support them, then a predefined macro, __STDC_NO_VLA__, must be defined and can be tested at compile time (see section 6.10.8.3 of the C11 standard). That being said, every C99-compliant compiler I've ever used in the last decade-plus does support them, so if yours does not, tell us below in a comment.
If it does, then take note of the use of 'dim' in the declarations of mat1 and mat2 above. It is one of the few features of C I like that C++ does not have. So dance with the one you brought.
Finally, assuming your compiler is C99 compliant and supports VLAs (__STDC_NO_VLA__ is NOT defined), as an extra super-special bonus it is all-but-guaranteed to be the fastest algorithm to get your two matrices, because there is no algorithm. You read one array element, then assign two pointers. O(3) is hard to beat.
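If you want the compile-time test spelled out, something along these lines should do. HAVE_VLA is a name I made up for this sketch; the only standard pieces are __STDC_VERSION__ and __STDC_NO_VLA__. Compilers older than C99 define neither macro and have no VLAs, hence the version check.

```c
/* HAVE_VLA is a hypothetical macro name used only in this sketch.
   It is 1 when the compiler claims C99 or later and does not opt out
   of variable length arrays, and 0 otherwise. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L && !defined(__STDC_NO_VLA__)
#define HAVE_VLA 1
#else
#define HAVE_VLA 0
#endif
```

Code that relies on the pointer-to-VLA trick can then be wrapped in #if HAVE_VLA, with a fallback such as a flat array with manual indexing in the #else branch.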
Example
#include <stdlib.h>
#include <stdio.h>
// main loader.
int main(int argc, char *argv[])
{
int test[] = {2,1,2,3,4,5,6,7,8};
int dim = test[0];
int (*mat1)[dim] = (int (*)[dim])(test+1);
int (*mat2)[dim] = (int (*)[dim])(test+1 + dim*dim);
// proof stuff is where it should be.
int i=0,j=0;
for (i=0;i<dim;i++)
{
for (j=0;j<dim;printf("%d ", mat1[i][j++]));
printf (" ");
for (j=0;j<dim;printf("%d ", mat2[i][j++]));
printf("\n");
}
return EXIT_SUCCESS;
}
Output
1 2 5 6
3 4 7 8
A similar test with a 3x3 data set:
int test[] = {3,1,2,3,4,5,6,7,8,9,9,8,7,6,5,4,3,2,1};
Output
1 2 3 9 8 7
4 5 6 6 5 4
7 8 9 3 2 1
And finally, a 4x4 data set:
int test[] = {4,1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,8,7,6,5,4,3,2,1,8,7,6,5,4,3,2,1};
Output
1 2 3 4 8 7 6 5
5 6 7 8 4 3 2 1
1 2 3 4 8 7 6 5
5 6 7 8 4 3 2 1
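The validation step mentioned at the start of this answer (dim=2 means 8 values follow, and so on) is worth doing before aliasing the buffer. A rough sketch, where holds_two_matrices is a hypothetical name and count is assumed to be the total number of ints in the buffer, header included:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if `test` (holding `count` ints) starts with a positive
   dimension n and is followed by exactly 2*n*n values, else 0.
   `holds_two_matrices` is a hypothetical name. */
int holds_two_matrices(const int *test, size_t count)
{
    if (test == NULL || count < 1 || test[0] <= 0) {
        return 0;
    }
    size_t n = (size_t)test[0];
    /* Reject sizes for which 1 + 2*n*n would overflow size_t. */
    if (n > (SIZE_MAX - 1) / 2 / n) {
        return 0;
    }
    return count == 1 + 2 * n * n;
}
```

Only when this returns 1 is it safe to form mat1 and mat2 from test+1 and test+1 + dim*dim.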
The problem with multidimensional arrays in C is that, without variable length arrays, you need to know n-1 of the dimension sizes in advance (at compile time). They are also a drag when used as function parameters.
There are a couple of alternate approaches:
Creating an array of arrays. i.e. allocating an array of array pointers and then allocating arrays to those pointers.
type **array = malloc(sizeof(type * ) * < firstnumread > );
array[0] = malloc(sizeof(type) * < firstnumread > );
...
Allocating a single dimension array with the size of all the multiplied dimensions. i.e.
type *array = malloc(sizeof(type) * < firstnumread > * < firstnumread >);
In your case, the second is probably more appropriate. Something like:
matrix1 = malloc(sizeof(type)*<firstnumread>*<firstnumread>);
matrix2 = malloc(sizeof(type)*<firstnumread>*<firstnumread>);
Then you can assign values like this:
matrix1[row*<firstnumread> + column] = <value>;
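To make the single-allocation approach concrete, here is a small sketch. The names copy_square and at are invented for illustration, and the layout is assumed to be row-major so that it matches the order of the values in the test array.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate one n*n block and copy n*n values from `src` into it.
   Returns NULL on allocation failure; the caller frees the result.
   `copy_square` is a hypothetical helper name. */
int *copy_square(const int *src, size_t n)
{
    int *m = malloc(sizeof *m * n * n);
    if (m != NULL) {
        memcpy(m, src, sizeof *m * n * n);
    }
    return m;
}

/* Element (row, col) of a flat row-major n-by-n matrix. */
int at(const int *m, size_t n, size_t row, size_t col)
{
    return m[row * n + col];
}
```

With test = {2,1,2,3,4,5,6,7,8}, the first matrix would come from copy_square(test + 1, 2) and the second from copy_square(test + 1 + 2*2, 2).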
Yes, with 2 for loops.
2D arrays are stored as a contiguous series of rows of the matrix. So you don't even need to allocate new memory; you can use your original array. Of course, you can also create 2 new standalone arrays.
You can create a function like this to get the correct element of a matrix.
int getNumber(int array[], int arraynumber, int index_x, int index_y)
{
return array[(((array[0]*index_x)+index_y)+1)+((array[0]*array[0])*arraynumber)];
}
The arraynumber variable is 0 for the first and 1 for the second matrix. This function works only if all parameters are correct, so there is no error detection.
With this function you can easily loop through and create 2 new arrays:
int i,k;
for (i=0; i<array[0]; i++)
{
for (k=0; k<array[0]; k++)
{
newarray1[i][k] = getNumber(array, 0, i,k);
newarray2[i][k] = getNumber(array, 1, i,k);
}
}
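Since getNumber does no error detection, a checked variant might look roughly like the following. This is a sketch, not a drop-in replacement: getNumberChecked is a made-up name, and it assumes the caller also knows count, the total number of ints in the array.

```c
#include <stddef.h>

/* Bounds-checked variant of getNumber. `count` is the total number of
   ints in `array` (an extra parameter assumed for this sketch). On
   success the value is stored in *out and 1 is returned; any invalid
   argument yields 0. */
int getNumberChecked(const int array[], size_t count, int arraynumber,
                     int index_x, int index_y, int *out)
{
    if (array == NULL || out == NULL || count < 1 || array[0] <= 0) return 0;
    int n = array[0];
    if (arraynumber < 0 || arraynumber > 1) return 0;
    if (index_x < 0 || index_x >= n || index_y < 0 || index_y >= n) return 0;

    size_t nn = (size_t)n * (size_t)n;
    size_t pos = 1 + nn * (size_t)arraynumber
                   + (size_t)n * (size_t)index_x + (size_t)index_y;
    if (pos >= count) return 0;   /* the array is shorter than its header claims */

    *out = array[pos];
    return 1;
}
```

The index arithmetic is the same as in getNumber, just split into steps and guarded.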
Here is something that works in a single loop; no nests and no repeats. I don't know if it'll outperform other answers, but I just felt like giving you a different answer ^_^
I have not tested this code, but it looks like the logic of the algorithm works - that's the point, right? Let me know if it has any errors....
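int i, c = 0, x = 0, y = 0, size = test[0];
/* Element count, not byte count. This only works while test is a real
   array in scope, not a pointer passed into a function. */
int length = sizeof(test) / sizeof(test[0]);
for (i = 1; i < length; i++) {
    if (c < size) {
        matrix1[x][y] = test[i];   /* c counts completed rows overall */
    } else {
        matrix2[x][y] = test[i];
    }
    ++y;
    if (y == size) {               /* finished a row */
        ++c;
        y = 0;
        x = c % size;              /* row index inside the current matrix */
    }
}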
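Another way to get a single loop is to compute the matrix, row and column directly from the position with division and modulo, so there are no counters to keep in sync. A sketch under the assumption that n = test[0] is positive and that test really holds 1 + 2*n*n values; split_single_loop is an invented name.

```c
#include <stddef.h>

/* Split test[1..] into two n-by-n matrices with a single loop.
   For the idx-th value after the header:
     which matrix = idx / (n*n)
     row          = (idx % (n*n)) / n
     column       = idx % n
   `split_single_loop` is a hypothetical name. */
void split_single_loop(const int *test, int n, int m1[n][n], int m2[n][n])
{
    size_t nn = (size_t)n * (size_t)n;
    for (size_t idx = 0; idx < 2 * nn; idx++) {
        size_t row = (idx % nn) / (size_t)n;
        size_t col = idx % (size_t)n;
        if (idx / nn == 0) {
            m1[row][col] = test[1 + idx];
        } else {
            m2[row][col] = test[1 + idx];
        }
    }
}
```

For the 2x2 example, idx = 5 gives matrix 1 (the second), row 0, column 1, which is the value 6.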
Related
Data file I am reading from:
$ cat temp.txt
0 1 2 3
4 5 6 7
8 9 10 11
C code:
#include<stdio.h>
#include<stdlib.h>
void main(){
int matsize = 12;
int numrows = 3;
int numcols = 4;
int** mat=malloc(matsize*sizeof(int*));
for(int i=0;i<numrows*numcols;++i){
mat[i]=malloc(sizeof(int));
}
FILE *file;
file=fopen("temp.txt", "r");
for(int i = 0; i < numrows; i++){
for(int j = 0; j < numcols; j++) {
if (!fscanf(file, "%d", &mat[i][j]))
break;
}
}
fclose(file);
printf("%d\n",mat[numrows-1][numcols-1]);
printf("%d\n", mat[2][3]);
printf("%d\n", mat[1][5]);
printf("%d\n", mat[0][11]);
printf("Done allocating.\n");
}
$ ./a.out
11
11
0
7
The first two outputs are both 11 as expected. For a 12 item array, I was expecting mat[1][5] and mat[0][11] to output the same thing as mat[2][3] (i.e. the 12th element, i.e. the [i+1th][j+1th] element). My understanding is that internally, the array declared here:
int** mat=malloc(matsize*sizeof(int*));
is not really a 3x4 array; rather, it's just a matsize array, and accessing it via mat[i][j] just kind of divides it into [i] rows and then gets the [jth] element of the ith row. Does the compiler "know" the array should be a 3x4 array because I am reading the text file in with the line if (!fscanf(file, "%d", &mat[i][j])) ? Does this scanning statement permanently change the mat object into 3 pointers of 4-length integer arrays? Or is it really just still a 12-length array, which should be accessible via dividing into two 6's, one 12, etc (if so, then why didn't the last two indexings print out the right thing)?
Follow up: If I change everything from %d/ints to %lf/doubles, I get this output with the same indexing:
11.000000
11.000000
9.000000
11.000000
Why is, for example, mat[0][11] coming out to 7 when everything is an int, but 11.000000 (as expected) when things are floats?
You should allocate a 2D array instead. It can be done as
int (*mat)[rows][cols] = malloc( sizeof(int[rows][cols]) );
However, to enable the convenient mat[i][j] syntax, you have to drop one of the dimensions in the pointer type:
int (*mat)[cols] = malloc( sizeof(int[rows][cols]) );
Not only does this get rid of the needless complexity and bugs, it also gives you a real 2D array in one contiguous block instead of some fragmented pointer-to-pointer thing. If the data were stored in binary form, you could read the whole file with a single fread call; a text file like yours still has to be parsed with fscanf.
Don't forget the free(mat); at the end.
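Putting that together for a text file like temp.txt, the reading part could be sketched as below. read_matrix is a name made up for this example, and it relies on a compiler with VLA support for the int (*mat)[cols] parameter.

```c
#include <stdio.h>

/* Read rows*cols whitespace-separated ints from fp into mat.
   Returns the number of values successfully read, so the caller can
   detect a short or malformed file. `read_matrix` is a hypothetical name. */
int read_matrix(FILE *fp, int rows, int cols, int (*mat)[cols])
{
    int count = 0;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            if (fscanf(fp, "%d", &mat[i][j]) != 1) {
                return count;
            }
            count++;
        }
    }
    return count;
}
```

The caller would allocate with int (*mat)[cols] = malloc(sizeof(int[rows][cols])), check the result, pass it in, and compare the return value against rows*cols. Note that fscanf returns EOF (a nonzero value) at end of file, so comparing against 1 is safer than testing with !.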
So the question is originated from Leetcode:
In a n * m two-dimensional array, each row is sorted in increasing order from left to right, and each column is sorted in increasing order from top to bottom. Please complete a function, input such a two-dimensional array and an integer, and determine whether the array contains the integer.
And my C solution is:
1 bool findNumberIn2DArray(int** matrix, int matrixSize, int* matrixColSize, int target){
2 //return matrixSize==5;
3 for(int i = 0; i < matrixSize; i++)
4 {
5 for (int j = 0; j < *matrixColSize; j++)
6 {
7 int num = matrix[i][j];
8 if (num == target) {return true;}
9 if (num > target) {break;}
10 }
11 }
12 return false;
13}
In row 5, it has to be *matrixColSize. If I remove the asterisk, the memory will overflow. From my perspective, the matrixColSize is an INT. Why can't I use the INT directly but have to use a pointer?
Maybe I didn't make my question clear. I know it's a pointer, and I know a pointer is totally different from an INT. My question is, why is this question using a pointer here to define the matrixColSize instead of just using an INT? From my perspective, an INT is enough here to represent the size.
Can anybody tell me why please?
Because the function parameter is declared as int* matrixColSize, removing the * on line 5 means j is compared against the pointer itself, so you are basically saying "increase j until it reaches the address stored in matrixColSize".
An address is a very large number, something like 0x88644278, so the loop runs far past the end of the row, which I guess is why your memory overflows. If you want a plain int, you must change the function signature to bool findNumberIn2DArray(int** matrix, int matrixSize, int matrixColSize, int target), and then you can write line 5 without the asterisk.
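As for why the signature uses a pointer at all: my understanding, which you should treat as an assumption about how the judge calls the function, is that matrixColSize points to an array holding one column count per row, since an int** matrix can in principle have rows of different lengths. Under that assumption, the search from the question can be written with matrixColSize[i]:

```c
#include <stdbool.h>

/* Same search as in the question, but reading the width of each row from
   matrixColSize[i]. That matrixColSize holds one column count per row is
   an assumption about the caller, not something the question confirms. */
bool findNumberIn2DArray(int **matrix, int matrixSize, int *matrixColSize, int target)
{
    for (int i = 0; i < matrixSize; i++) {
        for (int j = 0; j < matrixColSize[i]; j++) {
            int num = matrix[i][j];
            if (num == target) return true;
            if (num > target) break;   /* row is sorted: nothing further right can match */
        }
    }
    return false;
}
```

When every row has the same width, *matrixColSize and matrixColSize[0] are the same value, which is why the version in the question works.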
I am having some weird problems with my code and I have no idea what I'm doing wrong. I'm relatively new to C-language, so I'd really appreciate if someone could point out if I'm doing something stupid.
I'm trying to read data from a file into two dimensional array of type state. Type state is defined as follows:
typedef uint8_t state [4][4];
I have defined a function read_block that takes the file pointer as a parameter and returns a pointer to a two-dimensional array of type state containing the first 16 bytes of that file.
state* read_block(FILE *fp) {
state *arr = (state*) malloc(sizeof (state));
uint8_t *temp = (uint8_t*) malloc(sizeof (uint8_t)*16);
fread(temp, 16, 1, fp);
/*
for (int y = 0; y < 16; y++) {
printf("%u ", temp[y]);
}
printf("\n");
*/
int n = 0;
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 4; j++) {
/*
printf("%u ", temp[n]);
*/
*arr[i][j] = temp[n];
n++;
}
}
return arr;
}
Code compiles and runs ok, but the type state array that method returns always contains one wrong value. Specifically, the value in position [3][0] is always wrong.
The test file I'm trying to read contains numbers:
0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3
I decided to check if the values are read right and if I remove comments around printf lines I get as output:
0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3
0 1 2 3 0 1 2 3 0 1 2 3 3 1 2 3
In which the first line is exactly correct (that is printed right after fread()), but for some reason the second line is incorrect (that is printed when inserting values into arr).
I can't figure out what is causing this. I've also tried reading values one byte at time with fread, but the same value keeps being wrong. I'm using NetBeans, so I'm not sure if this is somehow being caused by it.
You're running into operator precedence.
*arr[i][j]
is the same as
*(arr[i][j]) // or arr[i][j][0]
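This is the same as what you want only if i == 0 and j == 0, i.e. for the very first element. For i > 0, arr[i] already points past the single state you allocated.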
What you want is
(*arr)[i][j]
It is not really a good idea to hide an array behind a typedef. This is the root of the problem, since arr is actually an array pointer of type uint8_t (*)[4][4]. And when de-referencing an array pointer you have to first take the contents of the pointer to get the array, then after access a specific index. That is, you will have to do (*arr)[i][j] rather than *arr[i][j].
To avoid that fishy syntax in turn, you could use a trick and declare an array pointer to the outermost dimension: uint8_t(*)[4]. Then you can suddenly type arr[i][j] instead. arr[i] then means do pointer arithmetic on a uint8_t(*)[4] type, and then in the obtained result you can access a specific array index, [j].
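A small sketch of that trick, with fill_block as a made-up helper name: the parameter is a pointer to the first row (uint8_t (*)[4]), so plain arr[i][j] indexes correctly and no parentheses are needed.

```c
#include <stdint.h>

/* Copy 16 bytes from src into a 4x4 block addressed through a pointer
   to its first row. `fill_block` is a hypothetical name. */
void fill_block(uint8_t (*arr)[4], const uint8_t *src)
{
    int n = 0;
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            arr[i][j] = src[n++];   /* arr[i] steps by one row of 4 bytes */
        }
    }
}
```

The block itself can be allocated with uint8_t (*arr)[4] = malloc(sizeof(uint8_t[4][4])); and released with free(arr).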
In addition to the operator precedence issue, you probably want to store data directly into the array. There's no reason why you can't do fread(arr, 16, 1, fp);, since you are using that array in the same order as it is allocated.
I haven't tested it, but you should be able to replace the whole function with this:
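void read_block(FILE* fp, uint8_t (*arr)[4][4])
{
/* The caller passes in the storage, so no malloc is needed here
   (an array cannot be assigned the result of malloc anyway). */
const size_t size = sizeof(uint8_t[4][4]);
fread(*arr, size, 1, fp);
}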
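If you would rather keep the original interface (the state typedef and a returned pointer), a version that reads straight into the allocation and checks the result of fread could look like this. It is a sketch; returning NULL on a short read is my own choice, not something from the question.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint8_t state[4][4];

/* Read the next 16 bytes of fp into a freshly allocated state.
   Returns NULL if allocation fails or fewer than 16 bytes are available;
   the caller frees the result. */
state *read_block(FILE *fp)
{
    state *arr = malloc(sizeof *arr);
    if (arr == NULL) {
        return NULL;
    }
    if (fread(*arr, sizeof *arr, 1, fp) != 1) {
        free(arr);
        return NULL;
    }
    return arr;
}
```

Callers then index the result as (*s)[i][j], with the parentheses, for the precedence reason explained above.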
I am new to GPU programming (and rather rusty in C) so this might be a rather basic question with an obvious bug in my code. What I am trying to do is take a 2 dimensional array and find the sum of each column for every row. So If I have a 2D array that contains:
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 2 4 6 8 10 12 14 16 18
I want to get an array that contains the following out:
45
45
90
The code I have so far is not returning the correct output and I'm not sure why. I'm guessing it is because I am not handling the indexing in the kernel properly. But it could be that I am not using the memory correctly since I adapted this from an over-simplified 1 dimensional example and the CUDA Programming Guide (section 3.2.2) makes a rather big and not very well described jump for a beginner between 1 and 2 dimensional arrays.
My incorrect attempt:
#include <stdio.h>
#include <stdlib.h>
// start with a small array to test
#define ROW 3
#define COL 10
__global__ void collapse( int *a, int *c){
/*
Sum along the columns for each row of the 2D array.
*/
int total = 0;
// Loop to get total, seems wrong for GPUs but I dont know a better way
for (int i=0; i < COL; i++){
total = total + a[threadIdx.y + i];
}
c[threadIdx.x] = total;
}
int main( void ){
int array[ROW][COL]; // host copies of a, c
int c[ROW];
int *dev_a; // device copies of a, c (just pointers)
int *dev_c;
// get the size of the arrays I will need
int size_2d = ROW * COL * sizeof(int);
int size_c = ROW * sizeof(int);
// Allocate the memory
cudaMalloc( (void**)&dev_a, size_2d);
cudaMalloc( (void**)&dev_c, size_c);
// Populate the 2D array on host with something small and known as a test
for (int i=0; i < ROW; i++){
if (i == ROW - 1){
for (int j=0; j < COL; j++){
array[i][j] = (j*2);
printf("%i ", array[i][j]);
}
} else {
for (int j=0; j < COL; j++){
array[i][j] = j;
printf("%i ", array[i][j]);
}
}
printf("\n");
}
// Copy the memory
cudaMemcpy( dev_a, array, size_2d, cudaMemcpyHostToDevice );
cudaMemcpy( dev_c, c, size_c, cudaMemcpyHostToDevice );
// Run the kernel function
collapse<<< ROW, COL >>>(dev_a, dev_c);
// copy the output back to the host
cudaMemcpy( c, dev_c, size_c, cudaMemcpyDeviceToHost );
// Print the output
printf("\n");
for (int i = 0; i < ROW; i++){
printf("%i\n", c[i]);
}
// Release the memory
cudaFree( dev_a );
cudaFree( dev_c );
}
Output:
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 2 4 6 8 10 12 14 16 18
45
45
45
You are correct, it's an indexing issue. Your kernel will generate a correct answer if you replace this:
total = total + a[threadIdx.y + i];
with this:
total = total + a[blockIdx.x*COL + i];
and this:
c[threadIdx.x] = total;
with this:
c[blockIdx.x] = total;
However there's more to say than that.
Any time you're having trouble with a CUDA code, you should use proper cuda error checking. The second issue above was definitely resulting in a memory access error, and you may have gotten a hint of this with error checking. You should also run your codes with cuda-memcheck which will do an extra-tight job of bounds checking, and it would definitely catch the out-of-bounds access your kernel was making.
I think you may be confused with kernel launch syntax: <<<ROW, COL>>> You may be thinking that this maps into 2D thread coordinates (I'm just guessing, since you used threadIdx.y in a kernel where it has no meaning.) However the first parameter is the number of blocks to be launched, and the second is the number of threads per block. If you provide scalar quantities (as you have) for both of these, you will be launching a 1D grid of 1D threadblocks, and your .y variables won't really be meaningful (for indexing). So one takeaway is that threadIdx.y doesn't do anything useful in this setup (it is always zero).
To fix that, we could make the first change listed at the beginning of this answer. Note that when we launch 3 blocks, each block will have a unique blockIdx.x so we can use that for indexing, and we have to multiply that by the "width" of your array to generate proper indexing.
Since the second parameter is the number of threads per block, your indexing into C also didn't make sense. C only has 3 elements (which is sensible) but each block had 10 threads, and in each block the threads were trying to index into the "first 10" locations in C (each thread in a block has a unique value for threadIdx.x). But after the first 3 locations, there is no extra storage in C.
Now possibly the biggest issue. Each thread in a block is doing exactly the same thing in the loop. Your code does not differentiate behavior of threads. You can write code that gives the correct answer this way, but it's not sensible from a performance standpoint.
To fix this last issue, the canonical answer is to use a parallel reduction. That's an involved topic, and there are many questions about it here on the SO tag, so I'll not try to cover it, but point out to you that there is a good tutorial here along with the accompanying CUDA sample code that you can study. If you want to see a parallel reduction on the matrix rows, for example, you could look at this question/answer. It happens to be performing a max-reduction instead of a sum-reduction, but the differences are minor. You can also use an atomic method as suggested in the other answer, but that is generally not considered a "high-performance" approach, because the throughput of atomic operations is more limited than what is achievable with the ordinary CUDA memory bandwidth.
You also seem to be generally confused about the CUDA kernel execution model, so continued reading of the programming guide (that you've already linked) is a good starting point.
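While you experiment with the kernel, it can help to have a plain C reference on the host to compare against. The sketch below is ordinary CPU code, not CUDA; row_sums is a made-up name, and it uses the same row*COL + i indexing as the corrected kernel line.

```c
#include <stddef.h>

/* CPU reference: out[r] = sum of row r of a flat row-major
   rows-by-cols array. `row_sums` is a hypothetical name. */
void row_sums(const int *a, size_t rows, size_t cols, int *out)
{
    for (size_t r = 0; r < rows; r++) {
        int total = 0;
        for (size_t i = 0; i < cols; i++) {
            total += a[r * cols + i];   /* same indexing as blockIdx.x*COL + i */
        }
        out[r] = total;
    }
}
```

Calling row_sums(&array[0][0], ROW, COL, expected) before the launch gives you values to compare with c after the final cudaMemcpy.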
I am using Ubuntu Linux for programming purposes. Yesterday I came across a very strange problem that was really really obscure and was weird.
The problem was that I tried to do a bubble sort; the logic, syntax, everything was correct, but the output was wrong. I wrote the same program in Windows and it worked fine. I am using Eclipse IDE in Linux. What can be the problem? I also used pointers (call by reference) to accomplish the bubble sort, but in Ubuntu the output was also wrong, while in Windows the output was okay. I don't know how to figure it out.
My code for bubble sort is as follows:
#include<stdio.h>
void main(void)
{
int array[] = {4,2,6,3,1,5,8,4,6,1};
int i=0;
int j=0;
for(i=1;i<=10;i++)
{
for(j=0;j<=10-i;j++)
{
if(array[j]>array[j+1])
{
int temp = array[j];
array[j] = array[j+1];
array[j+1] = temp;
}
}
}
for(i=0;i<=9;i++)
{
printf("%d\t",array[i]);
}
}
Output:
gcc bubblesort.c -o output
./output
2 3 4 1 5 6 4 6 1 1
Going beyond the bounds of an array is undefined behaviour (one possible outcome of which is appearing to behave "correctly"), and that is what is occurring in the program. Arrays use zero-based indexing, meaning the last valid index is one less than the number of elements in the array:
/* 10 elements in 'array'. */
int array[] = {4,2,6,3,1,5,8,4,6,1};
for(j=0;j<=10-i ;j++)
{
if(array[j]>array[j+1]) /* When 'j' is 9 the
'array[j + 1]' is
out of bounds. */
Change the inner for loop terminating condition:
for(j=0;j<=9-i ;j++)
Instead of hard-coding 9 and 10 throughout the code you could use sizeof(array)/sizeof(array[0]) to obtain the number of elements in array. This makes it less error prone and simpler to change the number of elements in array later:
const int ARRAY_SIZE = sizeof(array)/sizeof(array[0]);
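Putting both suggestions together, a bubble sort that takes the element count as a parameter and never reads past the end could be sketched as follows (bubble_sort is just an illustrative name):

```c
#include <stddef.h>

/* Sort a[0..n-1] in ascending order with bubble sort.
   `bubble_sort` is a hypothetical name. */
void bubble_sort(int *a, size_t n)
{
    if (n < 2) {
        return;
    }
    for (size_t pass = 0; pass < n - 1; pass++) {
        /* After each pass the largest remaining value has settled at the
           end, so the inner loop stops one element earlier each time.
           The condition j + 1 < n - pass keeps a[j + 1] in bounds. */
        for (size_t j = 0; j + 1 < n - pass; j++) {
            if (a[j] > a[j + 1]) {
                int temp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = temp;
            }
        }
    }
}
```

It would be called as bubble_sort(array, sizeof(array)/sizeof(array[0])) from the scope where array is declared.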
This:
for(j=0;j<=10-i;j++)
together with this:
if(array[j]>array[j+1])
and other places where you access your array out of bound is a likely cause of your problems.
Accessing an array out of bounds is undefined behavior.
This is pseudo code for a bubble sort:
for (i = 0; i < 9; i++) {
for (j = i + 1; j < 10; j++) {
if (element[i] > element[j]) swap_elements();