Merging two ascending arrays in CUDA with ascending order - arrays

I have two float arrays
a = {1, 0, 0, 22, 89, 100};
b = {2, 3, 5, 0, 77, 98};
Both are monotonically increasing, both have the same length, and either may or may not contain 0s. What I am trying to get is a new array combining both arrays in ascending order, but without the 0s:
c = {1, 2, 3, 5, 22, 77, 89, 98, 100 };
I cannot figure out how to write this in CUDA code unless I use a serial for loop, which I am trying to avoid. Any suggestions? Thanks.

As Robert pointed out, thrust provides the basic building blocks for your needs.
merge.cu
#include <iostream>
#include <iterator>
#include <thrust/copy.h>
#include <thrust/remove.h>
#include <thrust/merge.h>
int main()
{
float a[6] = {1, 0, 0, 22, 89, 100};
float b[6] = {2, 3, 5, 0, 77, 98};
float c[12];
thrust::merge(a,a+6,b,b+6,c);
float* newEnd = thrust::remove(c,c+12,0);
thrust::copy(c,newEnd, std::ostream_iterator<float>(std::cout, " "));
}
Compile and run:
nvcc -arch sm_20 merge.cu && ./a.out
Output:
1 2 3 5 22 77 89 98 100

Need Help Finding Segmentation fault in C Program

I'm implementing the Data Encryption Standard in C for a personal learning project and I have a seg fault that has been driving me up the wall for the past 3 days. I understand this isn't the place for "fix my code for me" type questions, but I need a second pair of eyes to look over this:
/*we must define our own modulo, as the C modulo returns unexpected results:*/
#define MOD(x, n) ((x % n + n) % n)
/*example: the 12th bit should be in the second byte so return 1 (the first byte being 0)*/
#define GET_BYTE_NUM(bit_index) (bit_index/8)
/*example: a bit index of 12 means this bit is the 4th bit of the second byte so return 4*/
#define GET_BIT_NUM(bit_index) MOD(bit_index, 8)
typedef unsigned char byte;
/*each row represents a byte, and each entry the bit to be placed in that position;
* for example, for the first row (first byte) we will place bits 31, 0, 1, 2, 3, 4
* in bit positions 0-5, respectively. The last two bits will be left blank. Since this is supposed to be a crypto implementation, static prevents this value from being accessed outside the file.*/
const static byte e_box[8][6] = { {31, 0, 1, 2, 3, 4}, {3, 4, 5, 6, 7, 8}, {7, 8, 9, 10, 11, 12}, {11, 12, 13, 14, 15, 16}, {12, 16, 17, 18, 19, 20}, {19, 20, 21, 22, 23, 24},
{23, 24, 25, 26, 27, 28}, {27, 28, 29, 30, 31, 0} };
void e(byte **four_byte_block)
{
int i, n, l = 0, four_bit_num, four_byte_num;
/*create the new byte_block and initialize all values to 0, we will have 4 spaces of bytes, so 32 bits in total*/
byte *new_byte_block = (byte*)calloc(4, sizeof(byte));
byte bit;
for(i = 0; i < 8; i++)
{
for(n = 0; n < 6; n++)
{
/*get the byte number of the bit at l*/
four_byte_num = GET_BYTE_NUM(e_box[i][n]);
/*find what index the bit at l is in its byte*/
half_bit_num = GET_BIT_NUM(e_box[i][n]);
bit = *four_byte_block[half_byte_num]; /*SEG FAULT!*/
}
}
/*finally, set four_byte_block equal to new_byte_block*/
/*four_byte_block = NULL;
* four_byte_block = new_byte_block;*/
}
I have narrowed the problem down to the line marked /*SEG FAULT!*/, but I can't see what the issue is. When I print half_byte_num, I get a number that is within the bounds of half_block, and when I print the values of half_block, I can confirm that those values exist.
I believe I may be doing something wrong with the pointers, i.e. passing **four_byte_block (a pointer to a pointer) and its manipulation could be causing the seg fault.
Have you tried this:
bit = (*four_byte_block)[half_byte_num];
Instead of this:
bit = *four_byte_block[half_byte_num];
Those are not the same, as an example, take the following code:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main()
{
char **lines = malloc(sizeof(char *) * 8);
for (int i = 0; i < 8; i++)
lines[i] = strdup("hey");
printf("%c\n", *lines[1]);
printf("%c\n",(*lines)[1]);
return 0;
}
The former will output h.
The latter will output e.
This is because of operator precedence: [] is evaluated before *, so if you want the nth element of *foo, you need to write (*foo)[n].
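To connect this back to the question: when a function receives a pointer to a single buffer pointer, indexing the outer pointer with anything other than 0 reads past that one pointer, which is a likely cause of the crash. Below is a minimal sketch; read_byte is a hypothetical helper written for illustration, not part of the asker's code.

```c
#include <stddef.h>

typedef unsigned char byte;

/* Hypothetical helper: returns byte n of the buffer that *block points to.
   (*block)[n] dereferences the outer pointer first, then indexes the buffer.
   *block[n] would instead index the outer pointer as if it were an array of
   buffer pointers, and then dereference whatever happens to be stored there. */
byte read_byte(byte **block, size_t n)
{
    return (*block)[n];
}
```

Called as read_byte(&buffer_pointer, 3), this reads the fourth byte of the buffer; the unparenthesized form would only be valid for n equal to 0.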

Product matrix-vector using MPI (C) - doesn't work properly

Hoping you and your family are doing well, considering the situation.
I'm working on a program to multiply a matrix and a vector in parallel, using MPI in C. Here it is:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
int rank, size;
int mat[2][3] = {{1, 2, 3}, {4, 5, 6}};
int vector[3] = {7, 8, 9};
int vecRes[3] = {50, 122};
int nbLigMat = sizeof(mat) / sizeof(mat[0]);
int nbColMat = sizeof(mat[0]) / sizeof(int);
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
int sendbuf[sizeof(vector) / sizeof(mat[0])], recvbuf, recvcounts[size];
if (size == nbColMat)
{
for (int i = 0; i < nbLigMat; i++)
sendbuf[i] = mat[i][rank] * vector[rank];
for (int i = 0; i < size; i++)
recvcounts[i] = 1;
MPI_Reduce_scatter(sendbuf, &recvbuf, recvcounts, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
if (rank < nbLigMat)
printf("Processeur num %d / outputVector[%d] : %d\n", rank, rank, recvbuf);
}
else
{
if (rank == 0)
{
printf("Le nombre de processeurs necessaires est : %d\n", nbColMat);
}
}
MPI_Finalize();
return 0;
}
It does work for the following matrix/vector pairs:
{{1, 2}, {3, 4}} and {5, 6}; {{1, 2, 3}, {4, 5, 6}} and {7, 8, 9}; {{1, 2, 3, 4}, {5, 6, 7, 8}} and {9, 10, 11, 12}. But with, for example, {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}} and {10, 11, 12}, the result should be {68, 167, 266}, while my output vector is {68, 167, 476}.
I realised that, for a reason I don't understand, the elements of my input vector change between the beginning of the program and the first for loop in each process, even though I never write to it:
In process 0, my input vector became {40, 11, 12} in the first loop and {40, 280, 12} at the end. In process 1, {55, 11, 12} and then {55, 88, 12}. In process 2, {72, 11, 12} and then {72, 108, 12}. These numbers match the results of the mat[i][rank]*vector[rank] calculation in the first loop, but I don't understand how they end up in my input vector.
Maybe I have misunderstood how MPI works, but I can't work out how those numbers get there and overwrite my input vector. I hope the explanation is clear enough.
I think the error is in the dimension of sendbuf. If I run your example with {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}} and {10, 11, 12} then:
sizeof(vector) / sizeof(mat[0]) = 1
which is too small, because your loop assigns values to sendbuf from i = 0 to i = nbLigMat-1.
If I replace the definition of sendbuf to be:
int sendbuf[nbLigMat], recvbuf, recvcounts[size];
then I get the correct answer:
Processeur num 0 / outputVector[0] : 68
Processeur num 1 / outputVector[1] : 167
Processeur num 2 / outputVector[2] : 266
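The size arithmetic can be checked without MPI. The sketch below uses the same 3x3 example; the function names original_sendbuf_len and row_count are made up for illustration. Writing past the end of sendbuf is undefined behaviour, and overwriting a neighbouring array such as vector is one plausible outcome, which would match the symptoms described in the question.

```c
#include <stddef.h>

/* Same 3x3 example as in the question; the values do not affect sizeof. */
int mat[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
int vector[3] = {10, 11, 12};

/* Element count the original declaration gives sendbuf:
   bytes in the vector divided by bytes in one matrix row. */
size_t original_sendbuf_len(void)
{
    return sizeof(vector) / sizeof(mat[0]);
}

/* Number of matrix rows, i.e. how many elements the loop actually writes. */
size_t row_count(void)
{
    return sizeof(mat) / sizeof(mat[0]);
}
```

Here original_sendbuf_len() evaluates to 1 while row_count() evaluates to 3, so the loop writes two elements beyond the buffer.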

Runtime error when initializing array in c

This is my code initializing the array:
#include <stdio.h>
int main (void) {
int x, n;
// 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
int *array = {2, 4, 6, 9, 11, 13, 15, 17, 19, 21, 25, 29, 30, 34, 35, 38};
n = sizeof(array) / sizeof(int);
for (x=0; x<n; x++) {
printf("%i: %i - ", x, array[x]);
}
printf("\nArray's length: %i", n);
return 0;
}
I'm not understanding why this simple code shows this message:
Runtime error
Thanks in advance.
Change int *array = to int array[] =. Ideone link: https://ideone.com/ULH7i6 . See also: How to initialize all members of an array to the same value?
What did you have in mind when you declared the following line?
int *array = {2, 4, 6, 9, 11, 13, 15, 17, 19, 21, 25, 29, 30, 34, 35, 38};
What comes to mind when I see something like this is that you're trying to work with an array using pointer arithmetic, which makes for a lot of fun interview questions (and is just cool in general :P ). On the other hand, you might just be used to being able to create arrays using array literals.
Below is something that talks about the different types of arrays you might be trying to work with. I know you picked an answer but this might be useful to you if you were trying to accomplish something else.
C pointer to array/array of pointers disambiguation
Your array declaration is not correct. An int * is a single pointer, not an array, so it cannot be initialized from a list of ints. Declare an array instead:
int array[] = {2, 4, 6, 9, 11, 13, 15, 17, 19, 21, 25, 29, 30, 34, 35, 38};
Here is the corrected code:
#include <stdio.h>
int main (void) {
int x, n;
// 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
int array[] = {2, 4, 6, 9, 11, 13, 15, 17, 19, 21, 25, 29, 30, 34, 35, 38};
n = sizeof(array) / sizeof(array[0]);
for (x=0; x<n; x++) {
printf("%i: %i - ",x,array[x]);
}
printf("\nArray's length: %i", n);
return 0;
}
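The distinction between an array and a pointer also matters for the sizeof calculation in the question. A minimal sketch, with ARRAY_LEN, values_len, and len_through_pointer as illustrative names that do not appear in the original code:

```c
#include <stddef.h>

/* Element count of a real array; only valid where the array type is visible. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

const int values[] = {2, 4, 6, 9, 11, 13, 15, 17, 19, 21, 25, 29, 30, 34, 35, 38};

/* sizeof sees the whole array here, so this yields the number of elements. */
size_t values_len(void)
{
    return ARRAY_LEN(values);
}

/* Through a pointer, sizeof reports the size of the pointer itself,
   so the same division no longer gives the element count. */
size_t len_through_pointer(const int *p)
{
    return sizeof(p) / sizeof(p[0]);
}
```

values_len() gives 16, whereas len_through_pointer(values) gives sizeof(int *) / sizeof(int), whatever that is on the platform, regardless of how many elements the array holds.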

fwrite() puts unknown zero

I have a strange problem using fwrite() in C. After processing a file of integers, I get a zero ("0") before the last element added by fwrite(). My exercise is to divide the integers in the file into groups of 10 elements or fewer.
For example, I have:
{2, 3, 9, 4, 6, 7, 5, 87, 65,12, 45, 2, 3, 4, 5, 6, 7, 8, 9}
After editing I need:
{2, 3, 9, 4, 6, 7, 5, 87, 65, 12, 87, 45, 2, 3, 4, 5, 6, 7, 8, 9, 45 };
With the code below I get:
{2, 3, 9, 4, 6, 7, 5, 87, 65, 12, 87, 45, 2, 3, 4, 5, 6, 7, 8, 9, 0, 45 };
During step-by-step debugging, the fwrite() that adds the extra element runs only twice: it writes 87 after the first ten elements and 45 after the remaining ones. So the zero wasn't written by that fwrite(), was it? Where does it come from?
My code:
while (!feof(fp)) {
fread(&elems[k], sizeof(int), 1, fp);
fwrite(&elems[k], sizeof(int), 1, hp);
++k;
if (k == 10 || feof(fp)){
for (i = 0; i < 10; ++i){
if (elems[i] > max_number){
max_number = elems[i];
}
}
fwrite(&max_number, sizeof(int), 1, hp);
for (i = 0; i < 10; ++i){
elems[i] = 0;
}
max_number = INT_MIN;
k = 0;
}
}
Thanks for your answers!
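Using feof as the loop condition results in one extra pass through the loop: feof only becomes true after a read has already failed, not when the last item has been read.
You must check the return values of the fread and fwrite calls.
One case to handle is reaching the end of the file.
If fread returns 0 items read, elems[k] is left at whatever it was before, which here is the 0 stored when the array was reset. The first fwrite in the loop then writes that value out, which is where your extra 0 comes from.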
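One way to restructure the loop along those lines is to let the return value of fread drive it. This is a sketch rather than the asker's final code: copy_groups_with_max is a hypothetical function name, and it tracks the running maximum directly instead of keeping the elems array.

```c
#include <limits.h>
#include <stdio.h>

/* Copies ints from in to out in groups of up to 10, writing each group's
   maximum right after the group. Returns the number of ints written,
   or -1 if a write fails. */
long copy_groups_with_max(FILE *in, FILE *out)
{
    int value;
    int max_number = INT_MIN;
    int k = 0;          /* items in the current group */
    long written = 0;

    /* The loop ends as soon as fread fails to deliver an item,
       so no stale value is ever written. */
    while (fread(&value, sizeof value, 1, in) == 1) {
        if (fwrite(&value, sizeof value, 1, out) != 1)
            return -1;
        ++written;
        if (value > max_number)
            max_number = value;
        if (++k == 10) {
            if (fwrite(&max_number, sizeof max_number, 1, out) != 1)
                return -1;
            ++written;
            max_number = INT_MIN;
            k = 0;
        }
    }
    /* Flush the maximum of a final, shorter group if there is one. */
    if (k > 0) {
        if (fwrite(&max_number, sizeof max_number, 1, out) != 1)
            return -1;
        ++written;
    }
    return written;
}
```

With the 19 integers from the question as input, this writes 21 integers: the first ten followed by 87, then the remaining nine followed by 45.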

Finding the sum of the digits

I have a 5-digit integer, say
int num = 23456;
How to find the sum of its digits?
Use the modulo operation to get the value of the least significant digit:
int num = 23456;
int total = 0;
while (num != 0) {
total += num % 10;
num /= 10;
}
If the input could be a negative number then it would be a good idea to check for that and invert the sign.
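A minimal sketch of that sign handling, wrapped in a hypothetical digit_sum function. It converts to an unsigned magnitude rather than negating the int directly, on the assumption that the most negative int should not overflow.

```c
/* Sum of the decimal digits of num, ignoring its sign. */
int digit_sum(int num)
{
    /* Take the magnitude in unsigned arithmetic so that negating the
       most negative int cannot overflow. */
    unsigned int n = (num < 0) ? 0u - (unsigned int)num : (unsigned int)num;
    int total = 0;

    while (n != 0) {
        total += (int)(n % 10u);  /* least significant digit */
        n /= 10u;                 /* drop that digit */
    }
    return total;
}
```

Both digit_sum(23456) and digit_sum(-23456) give 20.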
#include <stdio.h>
int main()
{
int i = 23456;
int sum = 0;
while(i)
{
sum += i % 10;
i /= 10;
}
printf("%i", sum);
return 0;
}
int sum=0;while(num){sum+=num%10;num/=10;}
Gives a negative answer if num is negative, in C99 anyway.
Is this homework?
How about this:
for(sum=0 ,num=23456;num; sum+=num %10, num/=10);
If you want a way to do it without control statements, and incredibly efficient to boot, O(1) instead of the O(n), n = digit count method:
int getSum (unsigned int val) {
static int lookup[] = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, // 0- 9
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, // 10- 19
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, // 20- 29
:
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, // 90- 99
:
14, 15, 16, 17, 18, 19, 20, 21, 22, 23, // 23450-23459
::
};
return lookup[23456];
}
:-)
Slightly related: if you want the repeated digit sum, a nice optimization would be:
if (num != 0 && num%9==0) return 9;
Followed by the rest of the code.
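For reference, the repeated digit sum (the digital root) of a non-negative number has a closed form based on the remainder modulo 9, so no loop is needed at all. A short sketch, with digital_root as an illustrative name:

```c
/* Digital root: keep summing the decimal digits until one digit remains.
   For num > 0 this equals 1 + (num - 1) % 9; for 0 it is 0. */
unsigned int digital_root(unsigned int num)
{
    return (num == 0) ? 0 : 1 + (num - 1) % 9;
}
```

For 23456 the digit sum is 20 and the digit sum of 20 is 2, which matches digital_root(23456).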
Actually, I answered with another (somewhat humorous) answer which used an absolutely huge array for a table lookup but, thinking back on it, it's not that bad an idea, provided you limit the table size.
The following functions trade off space for time. As with all optimisations, you should profile them yourself in the target environment.
First the (elegant) recursive version:
unsigned int getSum (unsigned int val) {
static const unsigned char lookup[] = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, // 0- 9
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, // 10- 19
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, // 20- 29
:
18, 19, 20, 21, 22, 23, 24, 25, 26, 27 // 990-999
};
return (val == 0) ? 0 : getSum (val / 1000) + lookup[val%1000];
}
It basically separates the number into three-digit groupings with fixed lookups for each possibility. This can easily handle a 64-bit unsigned value with a recursion depth of seven stack frames.
For those who don't even trust that small amount of recursion (and you should, since normal programs go that deep and more even without recursion), you could try the iterative solution:
unsigned int getSum (unsigned int val) {
static const unsigned char lookup[] = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, // 0- 9
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, // 10- 19
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, // 20- 29
:
18, 19, 20, 21, 22, 23, 24, 25, 26, 27 // 990-999
};
unsigned int tot = 0;
while (val != 0) {
tot += lookup[val%1000];
val /= 1000;
}
return tot;
}
These are probably three times faster than the one-digit-at-a-time solution, at the cost of a thousand bytes of data. If you're not averse to using 10K or 100K, you could increase the speed to four or five times, but you may want to write a program to generate the static array statement above :-)
As with all optimisation options, measure, don't guess!
I prefer the more elegant recursive solution myself but I'm also one of those types who prefer cryptic crosswords. Read into that what you will.
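As an alternative to generating the array statement with a separate program, the table can be filled once at run time. The sketch below is a variation on the iterative version above, not the answer's own code: init_lookup and lookup_ready are hypothetical names, and the lazy initialization shown is not thread-safe.

```c
static unsigned char lookup[1000];
static int lookup_ready = 0;

/* Fill lookup[i] with the digit sum of i for every i from 0 to 999. */
static void init_lookup(void)
{
    for (unsigned int i = 0; i < 1000; ++i)
        lookup[i] = (unsigned char)(i / 100 + (i / 10) % 10 + i % 10);
    lookup_ready = 1;
}

/* Digit sum of val, consuming three decimal digits per table lookup. */
unsigned int getSum(unsigned int val)
{
    unsigned int tot = 0;

    if (!lookup_ready)
        init_lookup();
    while (val != 0) {
        tot += lookup[val % 1000];
        val /= 1000;
    }
    return tot;
}
```

Whether this beats the one-digit-at-a-time loop depends on the target; as the answer says, measure before relying on it.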
#include <stdio.h>

#define BUFSIZE 20

int main(void)
{
int number = 23456;
char myBuf[BUFSIZE];
int result = 0; /* must start at zero before accumulating */
int i = 0;

/* sprintf terminates the string itself, so no explicit "\0" is needed */
sprintf(myBuf, "%i", number);

for (i = 0; i < BUFSIZE && myBuf[i] != '\0'; i++)
{
result += (myBuf[i] - '0'); /* convert the ASCII digit to its value */
}

printf("The result is %d", result);
return 0;
}
Another idea: use sprintf and the ASCII representation of the digits.
#include<stdio.h>
int main(void)
{
int sum=0,n;
/* stop if no integer could be read */
if (scanf("%d",&n) != 1)
return 1;
while(n){
sum+=n%10;
n/=10;
}
printf("result=%d",sum);
return 0;
}
sum is the sum of digits of number n
