I am new to Thrust (CUDA) and I want to do some array operations, but I can't find any similar example on the internet.
I have the following two 2D arrays:
a = { {1, 2, 3}, {4} }
b = { {5}, {6, 7} }
I want Thrust to compute this array:
c = { {1, 2, 3, 5}, {1, 2, 3, 6, 7}, {4, 5}, {4, 6, 7} }
I know how to do it in C/C++, but not how to tell Thrust to do it.
Here is my idea of how it might work:
Thread 1:
Take a[0] -> expand it with b.
Write it to c.
Thread 2:
Take a[1] -> expand it with b.
Write it to c.
But I have no idea how to do that. I could write the arrays a and b into a 1D array like this:
thrust::device_vector<int> dev_a;
dev_a.push_back(3); // size of first array
dev_a.push_back(1);
dev_a.push_back(2);
dev_a.push_back(3);
dev_a.push_back(1); // size of second array
dev_a.push_back(4);
thrust::device_vector<int> dev_b;
dev_b.push_back(1); // size of first array
dev_b.push_back(5);
dev_b.push_back(2); // size of second array
dev_b.push_back(6);
dev_b.push_back(7);
And the pseudo-function:
struct expand
{
__host__ __device__
?? ?? (const array ai, const array *b) {
for bi in b: // each array in the 2d array
{
c.push_back(bi[0] + ai[0]); // write down the array count
for i in ai: // each element in the ai array
c.push_back(i);
for i in bi: // each element in the bi array
c.push_back(i);
}
}
};
Does anyone have an idea?
I guess you won't get any speed increase on the GPU for this kind of operation, since it needs a lot of memory accesses, which are slow on a GPU.
But if you want to implement it anyway:
For the reason given above, I guess Thrust won't help you with a ready-to-use algorithm. This means you need to write your own kernel; you can, however, leave memory management to Thrust.
It is generally faster to build the arrays in CPU memory and, when they are ready, copy each whole array to the GPU. (CPU<->GPU copies are faster on long contiguous pieces of data.)
Keep in mind that a GPU runs hundreds of threads in parallel. Each thread needs to know what to read and where to write.
Global memory operations are slow (300-400 clocks). Avoid having a thread read a whole array from global memory only to find out that it needed just the last few bytes.
So, this is how I see your program:
Make your arrays 1D in CPU memory, like this:
float array1[] = { 1, 2, 3, 4};
float array2[] = { 5, 6, 7};
int arr1offsets[] = {0, 3, 3, 1}; // (position of the first element, length of subarray) pairs
int arr2offsets[] = {0, 1, 1, 2};
Copy your arrays and offsets to the GPU and allocate memory for the result and its offsets. I guess you'll have to compute the maximum length of one joined subarray and allocate memory for the worst case.
Run the kernel.
Collect the results.
The kernel may look like this (if I understood your idea correctly):
__global__ void kernel(float* arr1, int* arr1offset,
float* arr2, int* arr2offset,
float* result, int* resultoffset)
{
int idx = threadIdx.x + blockDim.x*blockIdx.x; // one thread per pair of subarrays
int a1beg = arr1offset[idx*2];
int a2beg = arr2offset[idx*2];
int a1len = arr1offset[idx*2+1];
int a2len = arr2offset[idx*2+1];
resultoffset[idx*2] = idx*MAX_SUBARRAY_LEN;
resultoffset[idx*2+1] = a1len+a2len;
for (int k = 0; k < a1len; ++k) result[idx*MAX_SUBARRAY_LEN+k] = arr1[a1beg+k];
for (int k = 0; k < a2len; ++k) result[idx*MAX_SUBARRAY_LEN+a1len+k] = arr2[a2beg+k];
}
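This code is not perfect (MAX_SUBARRAY_LEN has to be defined as the worst-case joined length, and there is no bounds check on idx), but it should do the right thing.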
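For checking GPU results, a host-side reference can help. The following is only a sketch in plain C, not Thrust or CUDA code: it assumes the (start, length) offset layout from the answer, uses int values as in the question, and pairs every subarray of a with every subarray of b, which is how I read the question's pseudo-function. The name expand_pairs and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Thrust or CUDA: joins every subarray
 * of a with every subarray of b.
 * Assumption: a_off and b_off hold (start, length) pairs.
 * out receives the joined values back to back; out_off receives one
 * (start, length) pair per joined subarray.
 * Returns the number of values written to out. */
size_t expand_pairs(const int *a, const int *a_off, size_t a_count,
                    const int *b, const int *b_off, size_t b_count,
                    int *out, int *out_off)
{
    size_t pos = 0; /* next free slot in out */
    size_t r = 0;   /* index of the current joined subarray */

    assert(a && a_off && b && b_off && out && out_off);

    for (size_t i = 0; i < a_count; ++i) {
        for (size_t j = 0; j < b_count; ++j) {
            int a_beg = a_off[2 * i], a_len = a_off[2 * i + 1];
            int b_beg = b_off[2 * j], b_len = b_off[2 * j + 1];

            out_off[2 * r] = (int)pos;
            out_off[2 * r + 1] = a_len + b_len;
            for (int k = 0; k < a_len; ++k) out[pos++] = a[a_beg + k];
            for (int k = 0; k < b_len; ++k) out[pos++] = b[b_beg + k];
            ++r;
        }
    }
    return pos;
}
```

The caller has to size out for the sum of all joined lengths and out_off for 2 * a_count * b_count entries. Unlike the kernel, this packs the results tightly instead of reserving a fixed worst-case slot per result.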
I'm writing a program in which I want to use this piece of code to pass a 2D array to a function, but I don't fully understand how it works. Can someone explain it, specifically line 7?
#include <stdio.h>
void print(int *arr, int m, int n)
{
int i, j;
for (i = 0; i < m; i++)
for (j = 0; j < n; j++)
printf("%d ", *((arr+i*n) + j));
}
int main()
{
int arr[][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
int m = 3, n = 3;
// We can also use "print(&arr[0][0], m, n);"
print((int *)arr, m, n);
return 0;
}
I also tried using
*( *(p + i) + j)
instead, but it didn't really work and I don't know why. If someone can explain why this didn't work as well, I would really appreciate it.
In modern C, you should use Variable Length Array types introduced in C99.
void print(int m, int n, int arr[m][n])
{
int i, j;
for (i = 0; i < m; i++)
for (j = 0; j < n; j++)
printf("%d ", arr[i][j]);
}
The function should be called with a simple:
int main()
{
int arr[][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
int m = 3, n = 3;
print(m, n, arr);
return 0;
}
VLA types are an optional feature in C11, but support for variably modified types (such as this parameter declaration) becomes mandatory again in C23.
You pass a two-dimensional array to your print function, together with the number of arrays in the 2D array (m) and the number of items in each individual array (n).
Now let us come to the loop:
First of all, if i and j are both zero you get the first item of the first array. In the next iteration of the inner loop j is 1, so (arr+i*n) + 1 points to the second element of the first array, because i is still zero: (arr + 0 * 3) + 1. The next iteration is the same except that j is 2, so it points to the third element.
When the inner loop has finished, i is increased to 1 and the expression is now (arr + 1 * 3) + 0. So now arr + i * 3 points to the first element of the second array.
And in the third iteration of the outer loop, arr + i * 3 points to the first element of the third array. So arr + i * 3 is always a pointer to the first element of one of the arrays in this 2D array, and the + j selects an individual element in that array. By combining the two, the whole 2D array gets printed.
*( *(p + i) + j) does not work when p is a plain int * pointing to the first element of arr, because *(p + i) is then already an int, not a pointer. In the first iteration *(p + 0) refers to the first element of the first array, which is 1, so the expression amounts to *(1 + 0). With an int * the compiler rejects this; if you force it through (for example with a cast or an int ** parameter), the program treats the element's value as an address and typically ends in a segmentation fault, because you are not allowed to read that memory address.
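For comparison, the double dereference does work when the parameter is a pointer to whole rows rather than a pointer to int. This is a minimal sketch that assumes rows of exactly 3 columns, as in the question's arr; the function name element_at is made up for illustration.

```c
#include <assert.h>

/* Assumption: every row has exactly 3 columns, matching arr in the question.
 * p points to rows (arrays of 3 int), so *(p + i) is row i, which decays
 * to a pointer to its first element. Adding j and dereferencing then
 * yields the element in row i, column j. */
int element_at(int (*p)[3], int i, int j)
{
    assert(p != 0);
    return *(*(p + i) + j);
}
```

With int arr[][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}; the call element_at(arr, 1, 2) passes arr without any cast, because arr itself decays to int (*)[3].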
In int arr[][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};, 1, 2, and 3 initialize an array of 3 int. An array is a contiguously allocated set of objects, so 1, 2, and 3 are contiguous in memory. 4, 5, and 6 initialize another array of 3 int, and so do 7, 8, and 9. These three arrays of 3 int are themselves another array, an array of 3 arrays of 3 int. Since an array is a contiguously allocated set of objects, the 3 arrays are contiguous in memory. So the 4, 5, and 6 follow the 1, 2, and 3, and the 7, 8, and 9 follow the 4, 5, and 6.
So the overall effect is that 1, 2, 3, 4, 5, 6, 7, 8, and 9 are contiguous and consecutive in memory.
*((arr+i*n) + j) uses this fact to calculate the location of the element in row i and column j. The starting address of the array is arr. i*n is the number of elements from the start to row i. That is, each row has n elements, so i rows have i*n elements. Then j is the number of elements from the start of the row to the element in column j of that row. So arr + i*n + j is where the element in row i, column j is, and *(arr + i*n + j) is that element. The extra parentheses in *((arr+i*n) + j) are unnecessary.
This code abuses the C type model. In main, arr is an array of 3 arrays of 3 int. When main calls print, it passes (int *)arr. This passes a pointer to an int instead of a pointer to an array of 3 int, and then print bypasses the normal application of array types to accessing the memory. Technically, the behavior of this code is not defined by the C standard, but it works in many C implementations.
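One way to keep the i*n + j arithmetic without casting a 2D array is to store the matrix in a genuinely one-dimensional array from the start. A small sketch, assuming row-by-row storage; the helper name cell is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reads row i, column j of a matrix with n columns
 * that is stored row by row in a one-dimensional array. */
int cell(const int *flat, size_t n, size_t i, size_t j)
{
    assert(flat != NULL && j < n);
    return flat[i * n + j]; /* skip i full rows, then j elements */
}
```

For int flat[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9}; and n = 3, cell(flat, 3, 1, 0) reads the element that a 2D declaration would call arr[1][0].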
C is an extremely simple language; it became popular mainly because the simple parts were designed to be combined in ways that replaced complex parts of previous languages (see the for loop as an example). One side effect is that it leaves out parts you expect in other languages.
Specifically for arrays, an array has no information on its size or format; it's assumed that the programmer will keep track of that, or that the size of every dimension but the first is constant (and normally the first one as well). So however many dimensions it's declared with, an array is just a single block of memory large enough to hold all elements, and the location is calculated internally by the [] operator.
Fun fact, C allows you to specify a[1] as 1[a], because it all translates to addition and multiplication. But don't do that.
In the event that you have an array that has variable sizes for more than one dimension, C before C99 doesn't support that directly, so you have to do the math yourself, which is what that print() function is doing, where m and n are the sizes of the dimensions. The first row starts at arr (or arr + 0) and goes to arr + (n - 1) (0 to n-1 is n elements), and would look like arr[0][0] to arr[0][n-1] in a language that supported it. The next row runs from arr + n (would be arr[1][0]) to arr + (2 * n) - 1, and so on (up to what would be arr[m-1][n-1]).
In the function here, i and j go from 0 to m-1 and n-1 respectively, so you don't see - 1 in the code.
One more thing: C is at least helpful enough to know that when you use + on a pointer, you mean to advance by the size of the thing you're pointing to, so you don't have to figure out how many bytes are in an int.
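The scaling of pointer arithmetic can be made visible by measuring the distance in bytes. A tiny sketch; the function name is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many bytes p + 1 lies beyond p for a pointer to int.
 * Converting to char pointers makes the subtraction count bytes
 * instead of int elements. */
size_t int_step_in_bytes(const int *p)
{
    assert(p != NULL);
    return (size_t)((const char *)(p + 1) - (const char *)p);
}
```

For any valid int pointer the result equals sizeof(int), whatever that is on the platform in question.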
I built a simple function that, given two arrays aa[5] = {5, 4, 9, -1, 3} and bb[2] = {16, -11}, orders them in a third array cc[7].
#include<stdio.h>
void merge(int *, int *, int *, int, int);
int main(){
int aa[5] = {5, 4, 9, -1, 3};
int bb[2] = {16, -11};
int cc[7];
merge(aa, bb, cc, 5, 2);
return 0;
}
void merge(int *aa, int *bb, int *cc, int m, int n){
int i = 0, j = 0, k = 0;
while(i < m && j < n){
if(aa[i] < bb[j])
cc[k++] = aa[i++]; /*Smallest value should be assigned to cc*/
else
cc[k++] = bb[j++];
}
while(i < m) /*Transfer the remaining part of longest array*/
cc[k++] = aa[i++];
while(j < n)
cc[k++] = bb[j++];
}
The cc array is filled, but the values are not ordered. Instead of the expected cc = {-11, -1, 3, 4, 5, 9, 16} it returns cc = {5, 4, 9, -1, 3, 16, -11}.
It is as if the assignments cc[k++] = aa[i++] and cc[k++] = bb[j++] somehow do not work, or the logical test aa[i] < bb[j] is ignored.
I suspected operator precedence problems, so I tested with two different standards, with no difference:
gcc main.c -o main.x -Wall
gcc main.c -o main.x -Wall -std=c89
I checked the code many times, unable to find any relevant error. Any suggestion at this point would be appreciated.
You need to think your algorithm through properly. There's no obvious bug in it. The problem is your expectations. One way to make this clear is to think about what would happen if one array was empty. Would the function merge change the order of anything? It will not. In fact, if two elements a and b are from the same array - be it aa or bb - and a comes before b in that array, then a will also come before b in cc.
The function does what you expect on sorted arrays, so make sure they are sorted beforehand. You can use qsort for this.
Other than that, when you use pointers to arrays you do not want to change, use the const qualifier.
void merge(const int *aa, const int *bb, int *cc, int m, int n)
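The suggestion to sort first with qsort could look roughly like this. It is a sketch, not the asker's code: sort_then_merge and compare_ints are made-up names, and the function deliberately sorts aa and bb in place, so they cannot be const here.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparator for qsort: ascending order, written without subtraction
 * so that it cannot overflow. */
static int compare_ints(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);
}

/* Hypothetical wrapper: sorts aa (m elements) and bb (n elements) in
 * place, then merges them into cc, which must hold m + n elements. */
void sort_then_merge(int *aa, int *bb, int *cc, int m, int n)
{
    int i = 0, j = 0, k = 0;

    assert(aa && bb && cc && m >= 0 && n >= 0);
    qsort(aa, (size_t)m, sizeof aa[0], compare_ints);
    qsort(bb, (size_t)n, sizeof bb[0], compare_ints);

    while (i < m && j < n)
        cc[k++] = (aa[i] < bb[j]) ? aa[i++] : bb[j++];
    while (i < m) /* copy whatever is left of aa */
        cc[k++] = aa[i++];
    while (j < n) /* copy whatever is left of bb */
        cc[k++] = bb[j++];
}
```

Called with the arrays from the question, this fills cc in ascending order, at the cost of modifying the two inputs.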
There's no bug in your implementation (at least I don't see any, IMHO). The problem is that the merge you have done is not on two sorted arrays (it's on several runs of sorted numbers). Had you fed it two already sorted arrays, the result would have been sorted correctly.
The merge sort algorithm begins by splitting the input into runs of already sorted elements, distributed over two arrays. You do this by switching arrays whenever you detect that the next element is out of order (it is not greater than the last number). The first run (the first elements of the initial list that happen to be in order) goes into array A, the second run goes into array B, and so on. This produces pairs of runs that can be merged (because each is already in order), and every merge yields a longer run. This is what guarantees that the runs grow with each pass and that the algorithm finishes after some pass. You don't need to operate array by array, because at each pass the list consists of fewer and fewer, larger and larger runs of sorted elements. In your case:
1st pass input (the switching points are where the input
is not in order, you don't see them, but you switch arrays
when the next input number is less than the last input one):
{5}, {4, 9}, {-1, 3}, {16}, {-11} (See note 2)
after first split:
{5}, {-1, 3}, {-11}
{4, 9}, {16}
after first merge result:
{4, 5, 9}, {-1, 3, 16}, {-11}
after second pass split:
{4, 5, 9}, {-11}
{-1, 3, 16}
result:
{-1, 3, 4, 5, 9, 16}, {-11}
third pass split:
{-1, 3, 4, 5, 9, 16}
{-11}
third pass result:
{-11, -1, 3, 4, 5, 9, 16}
The algorithm finishes when you no longer get two sets of ordered runs (you never switch arrays) and you cannot divide the stream of data any further.
Your implementation executes only one pass of the merge sort algorithm; you need to implement it completely to get sorted output. The algorithm was designed to make several passes possible when the input cannot feasibly be held in arrays (as you do, so your example doesn't fully illustrate it). If you read the input from files, you'll see the idea better.
NOTE
Sort programs for huge amounts of data use a merging algorithm on chunks of data that are quicksorted first, so they start with buckets of data that don't fit in an array all together.
NOTE 2
The number 16 after the number 3 should have been in the same run as the previous one, making it {-1, 3, 16}. But because they were in different arrays at first, and I have not found a way to put them in a list that splits into this arrangement, I have forced the runs as if 16 < 3, artificially switching arrays while splitting the input. This can cost an extra pass through the data, but it doesn't affect the final result, which is a sorted list of numbers. I did this on purpose and it is not a mistake (it has no bearing on how the algorithm works). In any case, the algorithm switches lists. (I don't like to use arrays when describing this algorithm: merging algorithms normally don't operate on arrays, since arrays are random access, while lists are traversed by an iterator from beginning to end, which is all merge sort requires.) The same applies to {4, 9}, {16} after the first split: just imagine the comparisons came out as shown, since after the first merge everything is correct.
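To show what "several passes" means on an array, here is a sketch of a complete merge sort. Note that it is the bottom-up variant with fixed run widths (1, 2, 4, ...), not the natural-run splitting described above; I swapped it in because it is shorter to write for arrays. All names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merges the sorted runs src[lo..mid) and src[mid..hi) into dst[lo..hi). */
static void merge_runs(const int *src, int *dst,
                       size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;

    while (i < mid && j < hi)
        dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi) dst[k++] = src[j++];
}

/* Bottom-up merge sort: every pass merges neighbouring runs of the
 * current width, then the width doubles.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int merge_sort(int *data, size_t count)
{
    int *tmp;
    size_t width, lo;

    if (count < 2)
        return 0; /* nothing to sort */
    assert(data != NULL);

    tmp = malloc(count * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (width = 1; width < count; width *= 2) {
        for (lo = 0; lo < count; lo += 2 * width) {
            size_t mid = lo + width < count ? lo + width : count;
            size_t hi = lo + 2 * width < count ? lo + 2 * width : count;
            merge_runs(data, tmp, lo, mid, hi);
        }
        memcpy(data, tmp, count * sizeof *data); /* result of this pass */
    }
    free(tmp);
    return 0;
}
```

Each iteration of the outer loop is one pass in the sense used above; the single merge from the question corresponds to one call of merge_runs.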
If your program worked on unsorted input, you could sort in O(N) by comparison, which is not possible. As mentioned in the comments by @Karzes, your program works only for sorted sub-arrays. Hence, if you want to implement the merge function for merge sort, you should try your program with these two inputs:
int aa[5] = {-1, 3, 4, 5, 9};
int bb[2] = {-11, 16};
Not the most efficient, because it's bubble sort...
#include<stdio.h>
void merge(int *, int *, int *, int, int);
void sortarray(int array[], int arraySize)
{
int c,d,temp;
for (c = 0 ; c < arraySize-1; c++)
{
for (d = 0 ; d < arraySize - c - 1; d++)
{
if (array[d] > array[d+1]) /* For decreasing order use < */
{
temp = array[d];
array[d] = array[d+1];
array[d+1] = temp;
}
}
}
}
int main(){
int aa[5] = {5, 4, 9, -1, 3};
int bb[2] = {16, -11};
int cc[7];
int i;
sortarray(aa,sizeof(aa)/sizeof(aa[0]));
sortarray(bb,sizeof(bb)/sizeof(bb[0]));
merge(aa, bb, cc, 5, 2);
for(i=0;i<sizeof(cc)/sizeof(cc[0]);i++)
{
printf("%d,",cc[i]);
}
return 0;
}
void merge(int *aa, int *bb, int *cc, int m, int n){
int i = 0, j = 0, k = 0;
while(i < m && j < n)
{
if(aa[i] < bb[j])
cc[k++] = aa[i++]; /*Smallest value should be assigned to cc*/
else
cc[k++] = bb[j++];
}
while(i < m) /*Transfer the remaining part of longest array*/
cc[k++] = aa[i++];
while(j < n)
cc[k++] = bb[j++];
}
If I want to append a number to an array initialized to int, how can I do that?
int arr[10] = {0, 5, 3, 64};
arr[] += 5; //Is this it?, it's not working for me...
I want {0,5, 3, 64, 5} in the end.
I'm used to Python, and in Python there is a function called list.append that appends an element to the list automatically for you. Does such function exist in C?
int arr[10] = {0, 5, 3, 64};
arr[4] = 5;
EDIT:
So I was asked to explain what's happening when you do:
int arr[10] = {0, 5, 3, 64};
you create an array with 10 elements and you assign values to the first 4 elements of the array.
Also keep in mind that arr starts at index arr[0] and ends at index arr[9] (10 elements)
arr[0] has value 0;
arr[1] has value 5;
arr[2] has value 3;
arr[3] has value 64;
after that the remaining elements arr[4] to arr[9] are zero, because elements without an explicit initializer are zero-initialized when an initializer list is given
But you can still assign 6 more values, so when you do
arr[4] = 5;
you assign the value 5 to the fifth element of the array.
You can do this until you have assigned a value to the last index of arr, which is arr[9];
Sorry if my explanation is choppy, but I have never been good at explaining things.
There are only two ways to put a value into an array, and one is just syntactic sugar for the other:
a[i] = v;
*(a+i) = v;
Thus, to put something as the element at index 4, you don't have any choice but arr[4] = 5.
For people who might still see this question, there is another way to append elements to an array in C. You can refer to this blog, which shows C code for appending another element to your array.
But you can also use the memcpy() function to append the elements of another array. You can use memcpy() like this:
#include <stdio.h>
#include <string.h>
int main(void)
{
int first_array[10] = {45, 2, 48, 3, 6};
int scnd_array[] = {8, 14, 69, 23, 5};
int i;
// 5 is the number of the elements which are going to be appended
memcpy(first_array + 5, scnd_array, 5 * sizeof(int));
// loop through and print all the array
for (i = 0; i < 10; i++) {
printf("%d\n", first_array[i]);
}
}
You can have a counter (freePosition), which will track the next free place in an array of size n.
If you have code like
int arr[10] = {0, 5, 3, 64}; , and you want to append or add a value at the next index, you can simply do it by writing arr[4] = 5.
The main advantage of doing it like this is that you can write a value to any index, not necessarily the next consecutive one. For example, if I want to put the value 8 at index 9, I can do that before filling the indices in between.
In Python, by contrast, list.append() only adds at the next consecutive index.
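The counter idea can be sketched as a small struct that bundles the array with the index of the next free slot. The names int_list, CAPACITY and int_list_append are made up for this example, and the capacity of 10 is only assumed from int arr[10].

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 10 /* assumed fixed size, as in int arr[10] */

/* Hypothetical type: a fixed array plus the index of the next free slot. */
struct int_list {
    int items[CAPACITY];
    size_t free_position; /* also the number of elements in use */
};

/* Appends value if there is room.
 * Returns 1 on success and 0 when the array is already full. */
int int_list_append(struct int_list *list, int value)
{
    assert(list != NULL);
    if (list->free_position >= CAPACITY)
        return 0;
    list->items[list->free_position++] = value;
    return 1;
}
```

Starting from struct int_list list = {{0, 5, 3, 64}, 4};, a call int_list_append(&list, 5) stores 5 at index 4 and moves free_position to 5, which is about as close to Python's list.append as a fixed-size array gets.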
Short answer is: You don't have any choice other than:
arr[4] = 5;
void Append(const int arr[], int n, int ele, int arrnew[]){
// arrnew is supplied by the caller and must have room for n + 1 elements
for(int i = 0; i < n; i++){
arrnew[i] = arr[i]; // copy the elements of the old array to the new array
}
arrnew[n] = ele; // append the element
}
With the simple function above you can append a value; the result ends up in arrnew.
In C, I'm creating a two-dimensional array in which all the elements are zero.
I do it the following way:
int a[5][5],i,j; //a is the required array
for(i=0;i<5;i++)
for(j=0;j<5;j++)
a[i][j]=0;
I also know another way:
int a[5][5]={0};
Are both the same, or is there any difference?
Which should be preferred?
Thank you !
The second method is more concise. Also consider:
memset(&a, 0, sizeof(a));
Both ways have the same effect, but the second one will generally be faster because it allows the compiler to optimise and vectorise that code.
Another widely accepted way (also optimisable) is
memset(a, 0, sizeof(a));
The second one is useful.
The first one uses a for loop, so it takes time.
There are other ways in which you can initialize an array...
int myArray[10] = { 5, 5, 5, 5, 5, 5, 5, 5, 5, 5 }; // All elements of myArray are 5
int myArray[10] = { 0 }; // Will initialize all elements to 0
int myArray[10] = { 5 }; // Will initialize myArray[0] to 5 and other elements to 0
static int myArray[10]; // Will initialize all elements to 0
/************************************************************************************/
int myArray[10];// This will declare and define (allocate memory) but won’t initialize
int i; // Loop variable
for (i = 0; i < 10; ++i) // Using for loop we are initializing
{
myArray[i] = 5;
}
/************************************************************************************/
int myArray[10] = {[0 ... 9] = 5}; // This works in GCC
memset(myArray, 0, sizeof(myArray));
I would prefer the latter if I do not want to overstress my eyes (and my compiler, too).
I have an integer matrix that should act like a buffer:
x = {{0, 0, 0, 0, 0}, {1, 1, 1, 1, 1}, {2, 2, 2, 2, 2}};
Now if I add a new row {3, 3, 3, 3, 3}, the new matrix should look like:
x = {{1, 1, 1, 1, 1}, {2, 2, 2, 2, 2}, {3, 3, 3, 3, 3}};
Is there a clever way of doing this without copying all elements around?
If your matrix is defined as an int ** and you separately allocate each row, then you would only have to swap the row pointers.
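A sketch of the row-pointer idea, assuming each row is stored separately and x is an array of pointers to those rows. The oldest row's storage is recycled for the new row, so only pointers move; rotate_rows is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* rows is an array of row_count pointers, each to a separately stored row.
 * Moves every pointer up by one slot, puts the oldest row's storage in
 * the last slot, and returns it so the caller can overwrite it with the
 * new row's values. Only pointers are copied, never row contents. */
int *rotate_rows(int **rows, size_t row_count)
{
    int *oldest;

    assert(rows != NULL && row_count > 0);
    oldest = rows[0];
    for (size_t r = 1; r < row_count; ++r)
        rows[r - 1] = rows[r];
    rows[row_count - 1] = oldest;
    return oldest;
}
```

After the call, the caller fills the returned row with {3, 3, 3, 3, 3}. The pointer shuffle still costs one step per row, but that is independent of the row length.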
How about modulo operation?
If you access the elements as matrix[x + SZ * y] you could change it to:
matrix[x + SZ * ((y + FIRST_ROW) % SZ)] .
In this way to implement this shift you just put the new line {3, 3, 3..} where line {0, 0, 0} was, and increment the FIRST_ROW counter to point to the new starting row.
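The modulo approach could be wrapped up like this. It is a sketch under assumptions: a fixed 3 by 5 matrix as in the question, and made-up names (row_ring, ring_row, ring_push); first_row plays the role of the FIRST_ROW counter.

```c
#include <assert.h>
#include <string.h>

#define ROWS 3 /* assumed matrix height */
#define COLS 5 /* assumed matrix width */

struct row_ring {
    int cells[ROWS][COLS];
    int first_row; /* physical index of the oldest row (logical row 0) */
};

/* Returns a pointer to logical row y, where 0 is the oldest row. */
int *ring_row(struct row_ring *ring, int y)
{
    assert(ring != NULL && y >= 0 && y < ROWS);
    return ring->cells[(y + ring->first_row) % ROWS];
}

/* Overwrites the oldest row with new_row (COLS values) and advances
 * first_row, which makes the written row the newest one. */
void ring_push(struct row_ring *ring, const int *new_row)
{
    assert(ring != NULL && new_row != NULL);
    memcpy(ring->cells[ring->first_row], new_row, sizeof ring->cells[0]);
    ring->first_row = (ring->first_row + 1) % ROWS;
}
```

Only the new row is copied; the existing rows never move, and the storage stays one compact block.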
Use a linked list.
struct node
{
int row[5];
struct node *next;
};
Appending a row is as simple as walking the list to the end, then replacing the NULL next pointer with a new node (whose next pointer is NULL).
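Walking to the end and hooking in a new node might look like the following sketch. The struct is repeated so the example stands alone; append_row is a made-up name, and taking a pointer to the head pointer is my choice so that an empty list needs no special case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct node
{
    int row[5];
    struct node *next;
};

/* Appends a copy of row at the end of the list.
 * head points to the list's head pointer, so an empty list (NULL head)
 * works as well. Returns the new node, or NULL if malloc fails. */
struct node *append_row(struct node **head, const int row[5])
{
    struct node *fresh;

    assert(head != NULL && row != NULL);
    fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return NULL;
    memcpy(fresh->row, row, sizeof fresh->row);
    fresh->next = NULL;

    while (*head != NULL) /* walk to the NULL next pointer at the end */
        head = &(*head)->next;
    *head = fresh;
    return fresh;
}
```

To get the buffer behaviour from the question, the caller would also unlink and free the first node after each append; that part is left out here.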
Can you increment x so that it points to the second row, then free the first row? Obviously, you would need to allocate a row at a time and this would not guarantee that the matrix is contiguous. If you require that, you could allocate one big block of memory to hold your matrix and then clobber the unused parts when you reach the end.
If you use an array of pointers to arrays (rather than an ordinary two-dimensional array), you can copy just the pointers to rows instead of copying all elements.
And if you're okay with overallocating the array of pointers, you could maybe add a new pointer to the end and advance the pointer to the "start" of the array. But this wouldn't be a good idea if you potentially want to do this sort of shift many times. And of course you'd want to make sure you have the original pointer somewhere so you can properly free() your resources.
Too lazy to write a code example, but you can use modulo arithmetic to address the rows. When pushing a new row, simply increase a starting offset variable, add the matrix height, and take the result modulo the matrix height. This way you get a circular matrix with no need to copy the whole matrix, while keeping the matrix array compact.