CUDA kernel is not accessing all the elements of an array - c

I have written a CUDA program that performs some operation on a large array. But when I pass that array to a CUDA kernel, not all of its elements are accessed by the threads. Below is a simple program that reproduces my use case:
#include <stdio.h>
#include <stdlib.h>
__global__
void kernel(int n){
int s = threadIdx.x + blockIdx.x*blockDim.x;
int t = blockDim.x*gridDim.x;
for(int i=s;i<n;i+=t){
printf("%d\n",i); //printing index of array which is being accessed
}
}
int main(void){
int i,n = 10000; //array_size
int blockSize = 64;
int numBlocks = (n + blockSize - 1) / blockSize;
kernel<<<numBlocks, blockSize>>>(n);
cudaDeviceSynchronize();
}
I've tried different values of blockSize (256, 128, 64, etc.), but it does not print all the indices of the array. Ideally, it should print some permutation of 0 to n-1; instead it prints fewer than n numbers.
If numBlocks and blockSize are both 1, it accesses all the elements. If the array size is less than 4096, it also accesses all the elements.

Actually, all of the values are being printed in this case, but you may not be able to see all of them because of the buffer limit of the output console. Try increasing the console's buffer size.
Additionally, keep in mind that printf calls inside the kernel execute out of order. There are also limits on the device-side printf buffer, which are explained in the documentation.

Use better debugging techniques! Your code works correctly; writing to an array and summing it on the host shows that every element is touched:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <stdlib.h>
__global__
void kernel(int* in, int n){
int s = threadIdx.x + blockIdx.x*blockDim.x;
int t = blockDim.x*gridDim.x;
for (int i = s; i<n; i += t){
in[i] = 1; //mark the array element that is being accessed
}
}
int main(void){
int i, n = 10000; //array_size
int blockSize = 64;
int numBlocks = (n + blockSize - 1) / blockSize;
int* d_res,*h_res;
cudaMalloc(&d_res, n*sizeof(int));
h_res = (int*)malloc(n*sizeof(int));
kernel<<<numBlocks, blockSize>>>(d_res, n);
cudaDeviceSynchronize();
cudaMemcpy(h_res, d_res, n*sizeof(int), cudaMemcpyDeviceToHost);
int sum = 0;
for (int i = 0; i < n; i++)
sum += h_res[i];
printf("%d", sum);
}
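The same check can be done without a GPU. Below is a minimal host-side sketch in plain C, assuming a hypothetical helper simulate_grid_stride (not part of the CUDA API) that replays the kernel's index arithmetic for every (block, thread) pair and counts how often each index is touched:

```c
#include <stdlib.h>

/* Replays the kernel's grid-stride loop on the host (hypothetical helper,
   not part of the CUDA API). Returns the number of indices in [0, n) that
   were visited exactly once, or -1 if the allocation fails. */
int simulate_grid_stride(int n, int numBlocks, int blockSize)
{
    int *visits = calloc((size_t)n, sizeof *visits);
    if (visits == NULL) return -1;

    int stride = blockSize * numBlocks;              /* blockDim.x * gridDim.x */
    for (int block = 0; block < numBlocks; block++)
        for (int thread = 0; thread < blockSize; thread++)
            for (int i = thread + block * blockSize; i < n; i += stride)
                visits[i]++;

    int exactly_once = 0;
    for (int i = 0; i < n; i++)
        if (visits[i] == 1) exactly_once++;
    free(visits);
    return exactly_once;
}
```

For the question's launch (n = 10000, blockSize = 64, hence numBlocks = 157), simulate_grid_stride(10000, 157, 64) returns 10000: every index is visited exactly once, which agrees with the sum printed by the program above.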

Related

Algorithm to generate N numbers with rand() without duplicates [duplicate]

I'm looking for a function in ANSI C that would randomize an array just like PHP's shuffle() does. Is there such a function or do I have to write it on my own? And if I have to write it on my own, what's the best/most performant way to do it?
My ideas so far:
Iterate through the array for, say, 100 times and exchange a random index with another random index
Create a new array and fill it with random indices from the first one checking each time if the index is already taken (performance = 0 complexity = serious)
Pasted from Asmodiel's link to Ben Pfaff's Writings, for persistence:
#include <stdlib.h>
/* Arrange the N elements of ARRAY in random order.
Only effective if N is much smaller than RAND_MAX;
if this may not be the case, use a better random
number generator. */
void shuffle(int *array, size_t n)
{
if (n > 1)
{
size_t i;
for (i = 0; i < n - 1; i++)
{
size_t j = i + rand() / (RAND_MAX / (n - i) + 1);
int t = array[j];
array[j] = array[i];
array[i] = t;
}
}
}
EDIT: Here is a generic version that works for any type (int, struct, ...) through memcpy, with an example program to run. It requires VLAs, which not every compiler supports, so you might want to change that to malloc (which will perform badly) or a static buffer large enough to accommodate any type you throw at it:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
/* compile and run with
* cc shuffle.c -o shuffle && ./shuffle */
#define NELEMS(x) (sizeof(x) / sizeof(x[0]))
/* arrange the N elements of ARRAY in random order.
* Only effective if N is much smaller than RAND_MAX;
* if this may not be the case, use a better random
* number generator. */
static void shuffle(void *array, size_t n, size_t size) {
char tmp[size];
char *arr = array;
size_t stride = size * sizeof(char);
if (n > 1) {
size_t i;
for (i = 0; i < n - 1; ++i) {
size_t rnd = (size_t) rand();
size_t j = i + rnd / (RAND_MAX / (n - i) + 1);
memcpy(tmp, arr + j * stride, size);
memcpy(arr + j * stride, arr + i * stride, size);
memcpy(arr + i * stride, tmp, size);
}
}
}
#define print_type(count, stmt) \
do { \
printf("["); \
for (size_t i = 0; i < (count); ++i) { \
stmt; \
} \
printf("]\n"); \
} while (0)
struct cmplex {
int foo;
double bar;
};
int main() {
srand(time(NULL));
int intarr[] = { 1, -5, 7, 3, 20, 2 };
print_type(NELEMS(intarr), printf("%d,", intarr[i]));
shuffle(intarr, NELEMS(intarr), sizeof(intarr[0]));
print_type(NELEMS(intarr), printf("%d,", intarr[i]));
struct cmplex cmparr[] = {
{ 1, 3.14 },
{ 5, 7.12 },
{ 9, 8.94 },
{ 20, 1.84 }
};
print_type(NELEMS(cmparr), printf("{%d %f},", cmparr[i].foo, cmparr[i].bar));
shuffle(cmparr, NELEMS(cmparr), sizeof(cmparr[0]));
print_type(NELEMS(cmparr), printf("{%d %f},", cmparr[i].foo, cmparr[i].bar));
return 0;
}
The following code seeds the generator from the microsecond part of the current time, so the array is shuffled differently on each call. It also implements the Fisher–Yates shuffle properly. I've tested the output of this function and it looks good: every array element is equally likely to end up first after the shuffle, and equally likely to end up last.
void shuffle(int *array, size_t n) {
struct timeval tv;
gettimeofday(&tv, NULL);
int usec = tv.tv_usec;
srand48(usec);
if (n > 1) {
size_t i;
for (i = n - 1; i > 0; i--) {
size_t j = (unsigned int) (drand48()*(i+1));
int t = array[j];
array[j] = array[i];
array[i] = t;
}
}
}
I’ll just echo Neil Butterworth’s answer, and point out some trouble with your first idea:
You suggested,
Iterate through the array for, say, 100 times and exchange a random index with another random index
Make this rigorous. I'll assume the existence of randn(int n), a wrapper around some RNG, producing numbers evenly distributed in [0, n-1], and swap(int a[], size_t i, size_t j),
void swap(int a[], size_t i, size_t j) {
int temp = a[i]; a[i] = a[j]; a[j] = temp;
}
which swaps a[i] and a[j].
Now let’s implement your suggestion:
void silly_shuffle(size_t n, int a[n]) {
for (size_t i = 0; i < n; i++)
swap(a, randn(n), randn(n)); // swap two random elements
}
Notice that this is not any better than this simpler (but still wrong) version:
void bad_shuffle(size_t n, int a[n]) {
for (size_t i = 0; i < n; i++)
swap(a, i, randn(n));
}
Well, what’s wrong? Consider how many outcomes these functions can produce: with n (or 2n for silly_shuffle) random selections in [0, n-1], the code “fairly” selects one of n^n (or n^(2n)) equally likely sequences of swaps. The trouble is that there are n! = n×(n-1)×⋯×2×1 possible arrangements of the array, and for n > 2 neither n^n nor n^(2n) is a multiple of n!, so the outcomes cannot be spread evenly over the arrangements: some permutations are more likely than others.
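For n = 3 the uneven spread can be checked by brute force. The sketch below (the function name tally_bad_shuffle and the permutation encoding are illustrative assumptions, not part of the answer) replays bad_shuffle for all 3^3 = 27 equally likely sequences of random picks and counts how often each of the 6 permutations comes out:

```c
/* Runs bad_shuffle on {0, 1, 2} for each of the 3^3 = 27 equally likely
   sequences of random picks and tallies the resulting permutations.
   counts[] is indexed by a small code (0..5) identifying the permutation. */
void tally_bad_shuffle(int counts[6])
{
    for (int k = 0; k < 6; k++) counts[k] = 0;

    for (int r0 = 0; r0 < 3; r0++)
        for (int r1 = 0; r1 < 3; r1++)
            for (int r2 = 0; r2 < 3; r2++) {
                int a[3] = {0, 1, 2};
                int picks[3] = {r0, r1, r2};
                for (int i = 0; i < 3; i++) {        /* swap(a, i, randn(n)) */
                    int t = a[i]; a[i] = a[picks[i]]; a[picks[i]] = t;
                }
                /* The first element selects a pair of codes; the order of
                   the remaining two elements selects one code of the pair. */
                int code = a[0] * 2 + (a[1] > a[2]);
                counts[code]++;
            }
}
```

The six counts add up to 27, which is not divisible by 6, so they cannot all be equal.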
The Fisher-Yates shuffle is actually equivalent to your second suggestion, only with some optimizations that change (performance = 0, complexity = serious) to (performance = very good, complexity = pretty simple). (Actually, I’m not sure that a faster or simpler correct version exists.)
void fisher_yates_shuffle(size_t n, int a[n]) {
for (size_t i = 0; i + 1 < n; i++)
swap(a, i, i+randn(n-i)); // swap element with itself or a random later element
}
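The answer leaves randn unspecified. A self-contained sketch, assuming randn is built on rand() with rejection sampling (my choice for illustration, not something the answer prescribes) so that every value in [0, n-1] is equally likely:

```c
#include <stdlib.h>

/* Uniform integer in [0, n-1] built on rand() by rejection sampling.
   Assumes 0 < n <= RAND_MAX; the quality still depends on the C
   library's rand(). */
static size_t randn(size_t n)
{
    unsigned long range = (unsigned long)RAND_MAX + 1UL;
    unsigned long limit = range - range % n;  /* largest multiple of n <= range */
    unsigned long r;
    do {
        r = (unsigned long)rand();
    } while (r >= limit);                     /* reject the uneven tail */
    return (size_t)(r % n);
}

static void swap(int a[], size_t i, size_t j)
{
    int temp = a[i]; a[i] = a[j]; a[j] = temp;
}

void fisher_yates_shuffle(size_t n, int a[])
{
    for (size_t i = 0; i + 1 < n; i++)
        swap(a, i, i + randn(n - i));         /* j is uniform in [i, n-1] */
}
```

Call srand() once at program start rather than inside the shuffle, so that repeated shuffles do not restart the same random sequence.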
ETA: See also this post on Coding Horror.
There isn't a function in the C standard to randomize an array.
Look at Knuth - he has algorithms for the job.
Or look at Bentley - Programming Pearls or More Programming Pearls.
Or look in almost any algorithms book.
Ensuring a fair shuffle (where every permutation of the original order is equally likely) is simple, but not trivial.
Here is a solution that uses memcpy instead of assignment, so you can use it for arrays of arbitrary data. You need twice the memory of the original array, and the cost is linear, O(n):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
int main (void)
{
int elesize = sizeof (int);
int i;
int r;
int src [20];
int tgt [20];
for (i = 0; i < 20; i ++)
src [i] = i;
srand ( (unsigned int) time (0) );
for (i = 20; i > 0; i --)
{
r = rand () % i; /* pick one of the i remaining elements */
memcpy (&tgt [20 - i], &src [r], elesize);
memcpy (&src [r], &src [i - 1], elesize); /* fill the hole with the last remaining element */
}
for (i = 0; i < 20; i ++)
printf ("%d ", tgt [i]);
return 0;
}
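The C standard library has no shuffle function, but qsort can be pressed into service with a random comparison. Be aware that such a comparison function gives inconsistent answers, which qsort's specification does not allow, so the result depends on the library's sorting algorithm and is not guaranteed to be a uniform shuffle. Random sorting can be implemented as: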
int rand_comparison(const void *a, const void *b)
{
(void)a; (void)b;
return rand() % 2 ? +1 : -1;
}
void shuffle(void *base, size_t nmemb, size_t size)
{
qsort(base, nmemb, size, rand_comparison);
}
The example:
int arr[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
srand(0); /* each permutation has its number here */
shuffle(arr, 10, sizeof(int));
...and the output is:
3, 4, 1, 0, 2, 7, 6, 9, 8, 5
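If qsort is to be used at all, one way to keep the comparison consistent is to give every element a random key once and sort by that key. This is a sketch under assumptions: the names keyed, by_key and shuffle_by_keys are hypothetical, and equal keys (possible with rand()) introduce a small bias, so it is an illustration rather than a replacement for Fisher-Yates:

```c
#include <stdlib.h>

struct keyed { int key; int value; };

static int by_key(const void *a, const void *b)
{
    const struct keyed *x = a, *y = b;
    return (x->key > y->key) - (x->key < y->key);  /* same answer every time */
}

/* Shuffles arr[0..n-1] by sorting on random keys.
   Returns 0 on success, -1 if the allocation fails. */
int shuffle_by_keys(int *arr, size_t n)
{
    if (n < 2) return 0;
    struct keyed *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) return -1;
    for (size_t i = 0; i < n; i++) {
        tmp[i].key = rand();       /* drawn once per element */
        tmp[i].value = arr[i];
    }
    qsort(tmp, n, sizeof *tmp, by_key);
    for (size_t i = 0; i < n; i++) arr[i] = tmp[i].value;
    free(tmp);
    return 0;
}
```

The cost is O(n log n) plus a temporary array, compared with O(n) and no extra memory for an in-place Fisher-Yates shuffle.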
Assuming you just want to access an array in random order instead of actually shuffling it, you can use a degenerate case of a linear congruential pseudo-random number generator,
X_{n+1} = (a X_n + c) mod N
with a = 1 and c coprime to N. Starting at a random position with a random step, it generates a cycle that visits every value in 0..N-1 exactly once.
Naturally, you could also store this sequence in an empty array.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
uint32_t gcd ( uint32_t a, uint32_t b )
{
if ( a==0 ) return b;
return gcd ( b%a, a );
}
// pick a random step in [r/2, r-1] that is coprime to r; r must be > 0
uint32_t get_coprime(uint32_t r){
uint32_t min_val = r>>1;
for(uint64_t i =0;i<(uint64_t)r*40;i++){
uint32_t sel = min_val + ( rand()%(r-min_val ));
if(gcd(sel,r)==1)
return sel;
}
return 1; // fallback: a step of 1 is always coprime to r
}
uint32_t next_val(uint32_t coprime, uint32_t cur, uint32_t N)
{
return (cur+coprime)%N;
}
// Example: output array A in random order
void shuffle(float * A, uint32_t N){
if(N==0) return;
uint32_t coprime = get_coprime(N);
uint32_t cur = rand()%N;
for(uint32_t i = 0;i<N;i++){
printf("%f\n",A[cur]);
cur = next_val(coprime, cur, N);
}
}
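Whether such an additive walk really covers the whole array depends on the step being coprime to the length. A small sketch to check this, assuming hypothetical helpers gcd_u32 and count_reached that are not part of the answer's code:

```c
#include <stdint.h>
#include <stdlib.h>

uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (a != 0) { uint32_t t = b % a; b = a; a = t; }
    return b;
}

/* Walks cur -> (cur + step) % n for n steps and returns how many distinct
   indices were reached, or 0 on bad input or allocation failure.
   The walk covers all n indices exactly when gcd(step, n) == 1. */
uint32_t count_reached(uint32_t step, uint32_t start, uint32_t n)
{
    if (n == 0) return 0;
    unsigned char *seen = calloc(n, 1);
    if (seen == NULL) return 0;
    uint32_t cur = start % n, reached = 0;
    for (uint32_t i = 0; i < n; i++) {
        if (!seen[cur]) { seen[cur] = 1; reached++; }
        cur = (uint32_t)(((uint64_t)cur + step) % n);  /* 64-bit sum avoids overflow */
    }
    free(seen);
    return reached;
}
```

With n = 10, a step of 7 (gcd 1) reaches all 10 indices, while a step of 4 (gcd 2) reaches only 5 of them.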
Just run the following code first and modify it for your needs:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define arr_size 10
// shuffle array
void shuffle(int *array, size_t n) {
if (n > 1) {
for (size_t i = 0; i < n - 1; i++) {
size_t j = i + rand() / (RAND_MAX / (n - i) + 1);
int t = array[j];
array[j] = array[i];
array[i] = t;
}
}
}
// display array elements
void display_array(int *array, size_t n){
for (int i = 0; i < n; i++)
printf("%d ", array[i]);
}
int main() {
srand(time(NULL)); // this line is necessary
int numbers[arr_size] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
printf("Given array: ");
display_array(numbers, arr_size);
shuffle(numbers, arr_size);
printf("\nShuffled array: ");
display_array(numbers, arr_size);
return 0;
}
Each run prints the given array followed by the shuffled array, and the shuffled order typically differs from run to run.
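This is the same answer as Nomadiq's, but the random number generation is kept simple.
Note that the same sequence of random numbers is used if you call the function several times within the same second, because each call reseeds rand() with the current time: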
#include <stdlib.h>
#include <time.h>
void shuffle(int aArray[], int cnt){
int temp, randomNumber;
time_t t;
srand((unsigned)time(&t));
for (int i=cnt-1; i>0; i--) {
temp = aArray[i];
randomNumber = (rand() % (i+1));
aArray[i] = aArray[randomNumber];
aArray[randomNumber] = temp;
}
}
I saw the answers and I've discovered an easy way to do it:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void){
int base[8] = {1,2,3,4,5,6,7,8}, shuffled[8] = {0,0,0,0,0,0,0,0};
int index, sorted, discart=0;
srand(time(NULL));
for(index = 0; index<8; index++){
discart = 0;
while(discart==0){
sorted = rand() % 8;
if (shuffled[sorted] == 0){
//This here is just for control of what is happening
printf("-------------\n");
printf("index: %i\n sorted: %i \n", index,sorted);
printf("-------------\n");
shuffled[sorted] = base[index];
discart= 1;
}
}
}
//This "for" is just to display the sequence of items inside your array
for(index=0;index<8; index++){
printf("\n----\n");
printf("%i", shuffled[index]);
}
return 0;
}
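Notice that this method does not produce duplicated items: every slot of the shuffled array is filled exactly once. Because 0 marks an empty slot, the base values must be nonzero.
You can use letters instead of numbers by replacing the contents of base.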
This function will shuffle array based on random seed:
void shuffle(int *arr, int size)
{
srand(time(NULL));
for (int i = size - 1; i > 0; i--)
{
int j = rand() % (i + 1);
int tmp = arr[i];
arr[i] = arr[j];
arr[j] = tmp;
}
}
In this example, the function takes a pointer to an int array ordered_array, a pointer to an int array shuffled_array, and the length of both arrays. In each iteration it picks a random remaining element of ordered_array and appends it to shuffled_array. Note that ordered_array is overwritten in the process.
void shuffle_array(int *ordered_array, int *shuffled_array, int len){
int index;
for(int i = 0; i < len; i++){
index = (rand() % (len - i));
shuffled_array[i] = ordered_array[index];
ordered_array[index] = ordered_array[len - i - 1]; // move the last remaining element into the hole
}
}
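I didn't see it among the answers, so I propose this solution in case it helps anybody:
#include <stdlib.h>
#include <string.h>
#include <time.h>
static inline void shuffle(size_t n, int arr[])
{
size_t rng;
size_t i;
size_t k;
int tmp[n]; /* copy of the input */
int tmp2[n]; /* tmp2[k] == 1 once tmp[k] has been used */
memcpy(tmp, arr, sizeof(int) * n);
memset(tmp2, 0, sizeof(int) * n);
srand(time(NULL));
i = 0;
while (i < n)
{
rng = rand() % (n - i); /* choose among the n - i unused elements */
k = 0;
while (tmp2[k] == 1 || rng > 0) /* advance k to the rng-th unused element */
{
if (tmp2[k] == 0)
--rng;
++k;
}
tmp2[k] = 1;
arr[i] = tmp[k];
++i;
}
}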

CUDA - Sieve of Eratosthenes division into parts

I'm writing an implementation of the Sieve of Eratosthenes (https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes) on the GPU, but not like this one: http://developer-resource.blogspot.com/2008/07/cuda-sieve-of-eratosthenes.html
Method:
Create an n-element array with default values 0/1 (0 - prime, 1 - not prime) and pass it to the GPU (I know this can be done directly in the kernel, but that's not the problem at the moment).
Each thread in a block checks multiples of a single number. Each block checks sqrt(n) possibilities in total. Each block covers a different interval.
Mark multiples as 1 and pass the data back to the host.
Code:
#include <stdio.h>
#include <stdlib.h>
#define THREADS 1024
__global__ void kernel(int *global, int threads) {
extern __shared__ int cache[];
int tid = threadIdx.x + 1;
int offset = blockIdx.x * blockDim.x;
int number = offset + tid;
cache[tid - 1] = global[number];
__syncthreads();
int start = offset + 1;
int end = offset + threads;
for (int i = start; i <= end; i++) {
if ((i != tid) && (tid != 1) && (i % tid == 0)) {
cache[i - offset - 1] = 1;
}
}
__syncthreads();
global[number] = cache[tid - 1];
}
int main(int argc, char *argv[]) {
int *array, *dev_array;
int n = atol(argv[1]);
int n_sqrt = floor(sqrt((double)n));
size_t array_size = n * sizeof(int);
array = (int*) malloc(n * sizeof(int));
array[0] = 1;
array[1] = 1;
for (int i = 2; i < n; i++) {
array[i] = 0;
}
cudaMalloc((void**)&dev_array, array_size);
cudaMemcpy(dev_array, array, array_size, cudaMemcpyHostToDevice);
int threads = min(n_sqrt, THREADS);
int blocks = n / threads;
int shared = threads * sizeof(int);
kernel<<<blocks, threads, shared>>>(dev_array, threads);
cudaMemcpy(array, dev_array, array_size, cudaMemcpyDeviceToHost);
int count = 0;
for (int i = 0; i < n; i++) {
if (array[i] == 0) {
count++;
}
}
printf("Count: %d\n", count);
return 0;
}
Run:
./sieve 10240000
It works correctly for n = 16, 64, 1024, 102400, ... but for n = 10240000 I get an incorrect result. Where is the problem?
This code has a variety of problems, in my view.
1. You are fundamentally accessing items out of range. Consider this sequence in your kernel:
int tid = threadIdx.x + 1;
int offset = blockIdx.x * blockDim.x;
int number = offset + tid;
cache[tid - 1] = global[number];
You (in some cases -- see below) have launched a thread array exactly equal in size to your global array. So what happens when the highest-numbered thread runs the above code? number = threadIdx.x+1+blockIdx.x*blockDim.x. This number index will be one beyond the end of your array. This is true for many possible values of n. This problem would have been evident to you if you had either used proper CUDA error checking or run your code with cuda-memcheck. You should always do those things when you are having trouble with a CUDA code, and also before asking for help from others.
2. The code only has a chance of working correctly if the input n is a perfect square. The reason for this is contained in these lines of code (as well as dependencies in the kernel):
int n = atol(argv[1]);
int n_sqrt = floor(sqrt((double)n));
...
int threads = min(n_sqrt, THREADS);
int blocks = n / threads;
(Note that the correct function here would be atoi, not atol, but I digress...) Unless n is a perfect square, the resultant n_sqrt will be somewhat less than the actual square root of n. This will lead you to compute a total thread array that is smaller than the necessary size. (It's OK if you don't believe me at this point. Run the code I will post below and input a size like 1025, then see if the number of threads * blocks is of sufficient size to cover an array of 1025.)
3. As you've stated:
Each block checks in total sqrt(n) possibilities.
Hopefully this also points out the danger of non-perfect-square n, but we must now ask: "what if n is larger than the square of the largest threadblock size (1024)?" The answer is that the code will not work correctly in many cases, and your chosen input of 10240000, although a perfect square, exceeds 1024^2 (1048576), so it does not work for this reason. Your algorithm (which I claim is not a Sieve of Eratosthenes) requires that each block be able to check sqrt(n) possibilities, just as you stated in the question. When that is no longer possible because of the limit on threads per block, your algorithm starts to break.
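The launch arithmetic behind the last two points can be replayed on the host in plain C. This is only a sketch: launch_coverage and isqrt_floor are hypothetical helpers, and an integer square root stands in for floor(sqrt((double)n)):

```c
/* Integer square root: largest r with r * r <= n (n >= 0). */
static int isqrt_floor(int n)
{
    int r = 0;
    while ((long long)(r + 1) * (r + 1) <= n) r++;
    return r;
}

#define THREADS 1024

/* Mirrors the question's host-side arithmetic for n >= 1. Returns
   threads * blocks, i.e. how many array elements the launch can touch. */
long long launch_coverage(int n, int *threads, int *blocks)
{
    int n_sqrt = isqrt_floor(n);
    *threads = n_sqrt < THREADS ? n_sqrt : THREADS;
    *blocks = n / *threads;
    return (long long)*threads * *blocks;
}
```

For n = 1025 this gives threads = 32 and blocks = 32, which covers only 1024 elements. For n = 10240000 it gives threads = 1024 and blocks = 10000: the array is covered, but each block can check only 1024 candidates, short of sqrt(n) = 3200.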
Here is code that makes some attempt to fix issue #1 above and at least gives an explanation for the failures associated with #2 and #3:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#define THREADS 1024
#define MAX 10240000
#define cudaCheckErrors(msg) \
do { \
cudaError_t __err = cudaGetLastError(); \
if (__err != cudaSuccess) { \
fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
msg, cudaGetErrorString(__err), \
__FILE__, __LINE__); \
fprintf(stderr, "*** FAILED - ABORTING\n"); \
exit(1); \
} \
} while (0)
__global__ void kernel(int *global, int threads) {
extern __shared__ int cache[];
int tid = threadIdx.x + 1;
int offset = blockIdx.x * blockDim.x;
int number = offset + tid;
if ((blockIdx.x != (gridDim.x-1)) || (threadIdx.x != (blockDim.x-1))){
cache[tid - 1] = global[number];
__syncthreads();
int start = offset + 1;
int end = offset + threads;
for (int i = start; i <= end; i++) {
if ((i != tid) && (tid != 1) && (i % tid == 0)) {
cache[i - offset - 1] = 1;
}
}
__syncthreads();
global[number] = cache[tid - 1];}
}
int cpu_sieve(int n){
int limit = floor(sqrt((double)n));
int *test_arr = (int *)malloc(n*sizeof(int));
if (test_arr == NULL) return -1;
memset(test_arr, 0, n*sizeof(int));
for (int i = 2; i <= limit; i++) // include limit itself so squares of primes get marked
if (!test_arr[i]){
int j = i*i;
while (j < n){ // test_arr only holds indices 0..n-1
test_arr[j] = 1;
j += i;}}
int count = 0;
for (int i = 2; i < n; i++)
if (!test_arr[i]) count++;
free(test_arr);
return count;
}
int main(int argc, char *argv[]) {
int *array, *dev_array;
if (argc != 2) {printf("must supply n as command line parameter\n"); return 1;}
int n = atoi(argv[1]);
if ((n < 1) || (n > MAX)) {printf("n out of range %d\n", n); return 1;}
int n_sqrt = floor(sqrt((double)n));
size_t array_size = n * sizeof(int);
array = (int*) malloc(n * sizeof(int));
array[0] = 1;
array[1] = 1;
for (int i = 2; i < n; i++) {
array[i] = 0;
}
cudaMalloc((void**)&dev_array, array_size);
cudaMemcpy(dev_array, array, array_size, cudaMemcpyHostToDevice);
int threads = min(n_sqrt, THREADS);
int blocks = n / threads;
int shared = threads * sizeof(int);
printf("threads = %d, blocks = %d\n", threads, blocks);
kernel<<<blocks, threads, shared>>>(dev_array, threads);
cudaMemcpy(array, dev_array, array_size, cudaMemcpyDeviceToHost);
cudaCheckErrors("some error");
int count = 0;
for (int i = 0; i < n; i++) {
if (array[i] == 0) {
count++;
}
}
printf("Count: %d\n", count);
printf("CPU Sieve: %d\n", cpu_sieve(n));
return 0;
}
There are a couple of issues, I think, but here's a pointer to the actual problem: The sieve of Eratosthenes removes iteratively multiples of already encountered prime numbers, and you want to separate the work-load into thread-blocks, where each thread-block operates on a piece of shared memory (cache, in your example). Thread-blocks, however, are generally independent from all other thread-blocks and cannot easily communicate with one another. One example to illustrate the problem: The thread with index 0 in thread-block with index 0 removes multiples of 2. Thread blocks with index > 0 have no way to know about this.
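One common way to make the pieces independent is a segmented sieve, which is a different technique from the one in the question: first find the base primes up to sqrt(n) with a plain sieve, then let each segment cross off multiples of those base primes only. A CPU sketch in C, where count_primes_segmented and seg_size are hypothetical names and a segment plays the role of the range handled by one block:

```c
#include <stdlib.h>
#include <string.h>

/* Counts the primes in [2, n) by sieving fixed-size segments independently.
   Returns -1 if an allocation fails. */
int count_primes_segmented(int n, int seg_size)
{
    if (n < 3 || seg_size < 1) return 0;

    /* Step 1: plain sieve for the base primes p with p * p < n. */
    int limit = 1;
    while ((long long)(limit + 1) * (limit + 1) < n) limit++;
    char *base = calloc((size_t)limit + 1, 1);   /* base[i] != 0: composite */
    char *seg = malloc((size_t)seg_size);
    if (!base || !seg) { free(base); free(seg); return -1; }
    for (int i = 2; (long long)i * i <= limit; i++)
        if (!base[i])
            for (int j = i * i; j <= limit; j += i) base[j] = 1;

    /* Step 2: a segment [lo, hi) needs the base primes, not other segments. */
    int count = 0;
    for (int lo = 2; lo < n; lo += seg_size) {
        int hi = (n - lo < seg_size) ? n : lo + seg_size;
        memset(seg, 0, (size_t)(hi - lo));
        for (int p = 2; p <= limit; p++) {
            if (base[p]) continue;
            long long first = (long long)p * p;  /* smaller multiples have a smaller factor */
            if (first < lo) first = ((lo + p - 1) / p) * (long long)p;
            for (long long m = first; m < hi; m += p) seg[m - lo] = 1;
        }
        for (int k = 0; k < hi - lo; k++)
            if (!seg[k]) count++;
    }
    free(base);
    free(seg);
    return count;
}
```

Because step 2 reads only the base primes, the segments could in principle be handed to different thread blocks without any communication between them.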

A strange error occurs when allocating an array and using rand()

I want to fill a big array using the rand() function. When I define my array as int h_in[N], the program crashes in VS 2010. To my surprise, when I copy it to the online compiler ideone, everything is OK. Finally, when I define the array with h_in = (int *)malloc(N * sizeof(int)) in VS 2010, the program works. I can't figure this out and hope somebody can point out my error.
#include <stdio.h>
#include <stdlib.h>
const int N = 1024 * 1024;
int main()
{
//int *h_in = (int *)malloc(N * sizeof(int));
int h_in[N];
float sum = 0.0f;
srand(1);
for(unsigned int i = 0; i < N; i++) {
h_in[i] = (rand() & 0xFF);
}
return 0;
}
int h_in[N];
is allocated on the stack.
int * h_in = malloc(N * sizeof(int));
is allocated on the heap. [BTW: don't cast the result of malloc()]
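The default stack size in Visual Studio is 1 MB, but this array needs 4 MB (1024 * 1024 ints of 4 bytes each), so the stack overflows. If you want to keep the array on the stack, use the compiler option that increases the stack size: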
/F (Set Stack Size)
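The heap-based variant avoids the stack limit altogether. A minimal sketch, assuming a hypothetical helper fill_and_sum that wraps the question's loop and checks the allocation:

```c
#include <stdlib.h>

/* Fills n ints on the heap with rand() & 0xFF and returns their sum,
   or -1 if the allocation fails. */
long long fill_and_sum(size_t n)
{
    int *h_in = malloc(n * sizeof *h_in);   /* heap: not limited by the stack size */
    if (h_in == NULL) return -1;

    long long sum = 0;
    srand(1);
    for (size_t i = 0; i < n; i++) {
        h_in[i] = rand() & 0xFF;
        sum += h_in[i];
    }
    free(h_in);
    return sum;
}
```

Calling fill_and_sum(1024 * 1024) works with the default stack size, because only the pointer h_in lives on the stack.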

Implementing CUDA VecAdd from sample code

I'm trying to test out a sample code from the CUDA site http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#kernels.
I simply want to add two arrays A and B of size 4, and store it in array C. Here is what I have so far:
#include <stdio.h>
#include "util.h"
void print_array(int* array, int size) {
int i;
for (i = 0; i < size; i++) {
printf("%d ", array[i]);
}
printf("\n");
}
__global__ void VecAdd(int* A, int* B, int* C) {
int i = threadIdx.x;
C[i] = A[i] + B[i];
}
int main(int argc , char **argv) {
int N = 4;
int i;
int *A = (int *) malloc(N * sizeof(int));
int *B = (int *) malloc(N * sizeof(int));
int *C = (int *) malloc(N * sizeof(int));
for (i = 0; i < N; i++) {
A[i] = i + 1;
B[i] = i + 1;
}
print_array(A, N);
print_array(B, N);
VecAdd<<<1, N>>>(A, B, C);
print_array(C, N);
return 0;
}
I'm expecting the C array (the last row of the output) to be 2, 4, 6, 8, but it doesn't seem to get added:
1 2 3 4
1 2 3 4
0 0 0 0
What am I missing?
First, you have to define the pointers that will hold the data copied to the GPU.
In your example, we want to copy the arrays 'a', 'b' and 'c' from the CPU to the GPU's global memory.
int a[array_size], b[array_size],c[array_size]; // your original arrays
int *a_cuda,*b_cuda,*c_cuda; // defining the "cuda" pointers
Define the size that each array will occupy:
int size = array_size * sizeof(int); // Is the same for the 3 arrays
Then allocate space on the GPU for the data that will be used in the kernel.
CUDA memory allocation:
msg_erro[0] = cudaMalloc((void **)&a_cuda,size);
msg_erro[1] = cudaMalloc((void **)&b_cuda,size);
msg_erro[2] = cudaMalloc((void **)&c_cuda,size);
Now we need to copy this data from the CPU to the GPU.
Copy from CPU to GPU:
msg_erro[3] = cudaMemcpy(a_cuda, a,size,cudaMemcpyHostToDevice);
msg_erro[4] = cudaMemcpy(b_cuda, b,size,cudaMemcpyHostToDevice);
msg_erro[5] = cudaMemcpy(c_cuda, c,size,cudaMemcpyHostToDevice);
Execute the kernel
int blocks = //;
int threads_per_block = //;
VecAdd<<<blocks, threads_per_block>>>(a_cuda, b_cuda, c_cuda);
Copy the results from GPU to CPU (in our example array C):
msg_erro[6] = cudaMemcpy(c,c_cuda,size,cudaMemcpyDeviceToHost);
Free Memory:
cudaFree(a_cuda);
cudaFree(b_cuda);
cudaFree(c_cuda);
For debugging purposes, I normally save the status of each call in an array, like this:
cudaError_t msg_erro[var];
This is not strictly necessary, but it will save you time if an error occurs during allocation or memory transfer. You can remove all the 'msg_erro[x] =' from the code above if you wish.
If you keep the 'msg_erro[x] =' and an error does occur, you can use a function like the following to print the errors:
void printErros(cudaError_t *erros,int size)
{
for(int i = 0; i < size; i++)
printf("{%d} => %s\n",i ,cudaGetErrorString(erros[i]));
}
You need to transfer the data to and from the GPU, something like:
int *a_GPU, *b_GPU, *c_GPU;
cudaMalloc(&a_GPU, N*sizeof(int));
cudaMalloc(&b_GPU, N*sizeof(int));
cudaMalloc(&c_GPU, N*sizeof(int));
cudaMemcpy(a_GPU, A, N*sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(b_GPU, B, N*sizeof(int), cudaMemcpyHostToDevice);
VecAdd<<<1, N>>>(a_GPU, b_GPU, c_GPU);
cudaMemcpy(C, c_GPU, N*sizeof(int), cudaMemcpyDeviceToHost);
print_array(C, N);
cudaFree(a_GPU);
cudaFree(b_GPU);
cudaFree(c_GPU);
