I am having trouble understanding the output of the following simple CUDA code. All that the code does is allocate two integer arrays: one on the host and one on the device each of size 16. It then sets the device array elements to the integer value 3 and then copies these values into the host_array where all the elements are then printed out.
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
int num_elements = 16;
int num_bytes = num_elements * sizeof(int);
int *device_array = 0;
int *host_array = 0;
// malloc host memory
host_array = (int*)malloc(num_bytes);
// cudaMalloc device memory
cudaMalloc((void**)&device_array, num_bytes);
// Constant out the device array with cudaMemset
cudaMemset(device_array, 3, num_bytes);
// copy the contents of the device array to the host
cudaMemcpy(host_array, device_array, num_bytes, cudaMemcpyDeviceToHost);
// print out the result element by element
for(int i = 0; i < num_elements; ++i)
printf("%i\n", *(host_array+i));
// use free to deallocate the host array
free(host_array);
// use cudaFree to deallocate the device array
cudaFree(device_array);
return 0;
}
The output of this program is 50529027 printed line by line 16 times.
50529027
50529027
50529027
..
..
..
50529027
50529027
Where did this number come from? When I replace 3 with 0 in the cudaMemset call then I get correct behaviour. i.e.
0 printed line by line 16 times.
I compiled the code with nvcc test.cu on Ubuntu 10.10 with CUDA 4.0
I'm no cuda expert but 50529027 is 0x03030303 in hex. This means cudaMemset sets each byte in the array to 3 and not each int. This is not surprising given the signature of cuda memset (to pass in the number of bytes to set) and the general semantics of memset operations.
Edit: As to your (I guess) implicit question of how to achieve what you intended I think you have to write a loop and initialize each array element.
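As a minimal sketch of that loop-based approach: fill a host buffer element by element, then copy it to the device. The helper name fill_int is hypothetical, and the copy step is only indicated in a comment; it assumes the same cudaMemcpy call as in the question with the direction reversed.

```c
#include <stddef.h>

/* Hypothetical helper (not part of CUDA): set every int element,
   not every byte, of a host array to `value`. */
void fill_int(int *a, size_t n, int value)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = value;
}

/* Assumed follow-up in the question's program, after filling host_array:
   cudaMemcpy(device_array, host_array, num_bytes, cudaMemcpyHostToDevice); */
```

Calling fill_int(host_array, num_elements, 3) before the copy would leave each element equal to 3 rather than 0x03030303.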
As others have pointed out, cudaMemset works like the standard C memset: it sets byte values. From the CUDA documentation:
cudaError_t cudaMemset( void * devPtr, int value, size_t count)
Fills the first count bytes of the memory area pointed to by devPtr
with the constant byte value value.
If you want to set word size values, the best solution is to use your own memset kernel, perhaps something like this:
template<typename T>
__global__ void myMemset(T * x, T value, size_t count )
{
size_t tid = threadIdx.x + blockIdx.x * blockDim.x;
size_t stride = blockDim.x * gridDim.x;
for(size_t i=tid; i<count; i+=stride) {
x[i] = value;
}
}
which could be launched with enough blocks to cover the number of multiprocessors in your GPU; each thread then does as many iterations as required to fill the memory allocation. Writes will be coalesced, so performance shouldn't be too bad. This could also be adapted to CUDA's vector types, if you so desired.
memset sets bytes, and an int is 4 bytes, so what you get is 50529027 decimal, which is 0x03030303 in hex. In other words, you are using it wrong, and it has nothing to do with CUDA.
This is a classic memset shortcoming: it fills memory byte by byte, so it only produces the value you asked for when the element type is one byte wide, i.e. char. Here it writes 3 into every byte of the buffer. You can confirm this with a simple host-side C/C++ program:
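#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
int x=16;
size_t bytes = x*sizeof(int);
int *M = (int*)malloc(bytes);
if (M == NULL) return 1;
/* sets every byte (not every int) to 3 */
memset(M,3,bytes);
for (int i = 0; i < x; ++i) {
printf("%d\n", M[i]);
}
free(M);
return 0;
}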
The usual case in which memset gives the expected result for wider data types is the value 0: it sets every byte to 0 and hence every element to 0. If you change the data type to char, you'll see the desired output. cudaMemset behaves just like memset; the only difference is that it takes a GPU pointer as input.
So memset or cudaMemset sets every byte of the memory region whose size is given by the third argument to the value passed in (in your case 3), regardless of the data type.
Tip:
Google: 50529027 in binary and you'll get the answer :)
This program worked fine when I manually worked through 5 individual variables, but when I replaced them with these arrays and a for loop, I started getting floating point exceptions. I have tried debugging but I can't find where the error comes from.
#include <stdio.h>
int main(void) {
long int secIns;
int multiplicadors[4] = {60, 60, 24, 7};
int variables[5];
int i;
printf("insereix un aquantitat entera de segons: \n");
scanf("%ld", &secIns);
variables[0] = secIns;
for (i = 1; i < sizeof variables; i++) {
variables[i] = variables[i - 1]/multiplicadors[i - 1];
variables[i - 1] -= variables[i]*multiplicadors[i - 1];
}
printf("\n%ld segons són %d setmanes %d dies %d hores %d minuts %d segons\n", secIns, variables[4], variables[3], variables[2], variables[1], variables[0]);
return 0;
}
The problem is you're iterating past the ends of your arrays. The reason is that your sizeof expression isn't what you want. sizeof returns the size in bytes, not the number of elements.
To fix it, change the loop to:
for (i = 1; i < sizeof(variables)/sizeof(*variables); i++) {
On an unrelated note, you might consider changing secIns from long int to int, since it's being assigned to an element of an int array, so the added range isn't really helping.
Consider this line of code:
for (i = 1; i < sizeof variables; i++) {
sizeof isn't doing what you think it's doing. You've declared an array of 5 ints. In this case, ints are 32-bit, which means they each use 4 bytes of memory. If you print the output of sizeof variables you'll get 20 because 4 * 5 = 20.
You'd need to divide the sizeof variables by the size of its first element.
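A minimal sketch of that division, applied to the question's loop. The macro name ARRAY_LEN and the function split_seconds are hypothetical names chosen for illustration; the multiplier table is the one from the question.

```c
#include <stddef.h>

/* Element count of a true array: total bytes / bytes per element.
   This does not work on a pointer, e.g. an array passed as a parameter. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Splits `seconds` into out[0..4] = seconds, minutes, hours, days, weeks,
   using the same multiplier table as the question. */
void split_seconds(long seconds, long out[5])
{
    static const int multipliers[4] = {60, 60, 24, 7};
    out[0] = seconds;
    /* `out` is a pointer here, so the bound comes from the real array. */
    for (size_t i = 1; i <= ARRAY_LEN(multipliers); i++) {
        out[i] = out[i - 1] / multipliers[i - 1];
        out[i - 1] -= out[i] * multipliers[i - 1];
    }
}
```

With the bound expressed in elements, the loop never indexes past multipliers[3], so the division by a garbage (possibly zero) value that caused the floating point exception no longer happens.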
As mentioned before, sizeof returns the size in bytes of the whole array.
This is unlike Java's .length, which returns the number of elements in the array. C requires a little more awareness of bytes.
https://www.geeksforgeeks.org/data-types-in-c/
This link tells you a bit more about data types and the memory (bytes) they take up.
You could also do sizeof yourArrayName / sizeof(int). sizeof(datatype) returns the number of bytes the data type takes up.
sizeof will give the size (in bytes) of the variables and will yield different results depending on the data type.
Try:
for (i = 1; i < 5; i++) {
...
}
I want to read the values of the memory locations of the entire program flash memory of an MCU, in particular, the CC2538 on the OpenMote-CC2538. The read values are then computed into, currently, a large sum of all the values.
At this moment, I have the following code working to traverse the memory and get the values
uint64_t readMemory() {
unsigned char * bytes = (char *) 0x200000;
size_t size = 0x0007FFD4;
size_t i;
uint64_t amount = 0;
for (i = 0; i < size; i++) {
amount += bytes[i];
}
return amount;
}
uint64_t readFlashMemory() {
unsigned int * bytes = (int *) 0x200000;
size_t size = 0x0007FFD4;
size_t i;
uint64_t amount = 0;
for (i = 0; i < size; i+=4) {
amount += FlashGet(bytes);
bytes++;
}
return amount;
}
The program flash starts at address 0x200000 and its size is 0x0007FFD4. The first function works with a char pointer and visits each address one by one, while the second one uses an existing function FlashGet(uint32_t) from the flash.c file, which is a direct access to a register (HWREG).
FlashGet requires a uint32_t address and returns a uint32_t value, so each read covers 4 bytes and the address should advance by 4 in the loop. The first function uses char for the addressing, which has a length of 1, so the address should advance by 1 in the loop. Am I correct in these statements? If so, am I executing them correctly? For the second function, incrementing the pointer by 1 should move it by 4 bytes because its pointee type is 4 bytes wide (uint32_t, similar to int).
However, the functions return a different value.
The first one returns: 674426297757
The second one returns: 8213668631160
As both functions should be doing the same, one or both must be incorrect and is not reading the entire program flash memory.
How can I fix both functions? Is there a better or easier way to read the entire memory when you have the starting address and size?
Consider you have a 4-byte flash memory with content
00 01 02 03
Adding by byte values will give you 0x0000000000000006
Adding by 32-bit int values will give you 0x0000000003020100 assuming little-endian.
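The two totals above can be reproduced with a small host-side sketch. The function names sum_bytes and sum_words_le are hypothetical, and the words are assembled explicitly as little-endian so the result does not depend on the machine running the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the buffer one byte at a time. */
uint64_t sum_bytes(const uint8_t *p, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Sum the buffer as little-endian 32-bit words.
   Any trailing bytes beyond a multiple of 4 are ignored. */
uint64_t sum_words_le(const uint8_t *p, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i + 4 <= n; i += 4) {
        uint32_t w = (uint32_t)p[i]
                   | (uint32_t)p[i + 1] << 8
                   | (uint32_t)p[i + 2] << 16
                   | (uint32_t)p[i + 3] << 24;
        s += w;
    }
    return s;
}
```

For the buffer 00 01 02 03, sum_bytes gives 6 and sum_words_le gives 0x03020100. The two functions in the question therefore compute different quantities even when both read every byte, so their results are not expected to match.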
I'm experiencing some trouble with my code written in C. It concerns an int * vector that is declared and dynamically allocated, but when it comes to filling it with data, it gets stuck on the first element and won't advance the counter to fill the rest of the vector.
my header file : instance.h
struct pbCoupe
{
int tailleBarre;
int nbTaillesDem;
int nbTotPcs;
int * taille;
int * nbDem;
};
my code : coupe.c
pb->taille = (int*) malloc (pb->nbTaillesDem * sizeof(int));
pb->nbDem = (int*) malloc (pb->nbTaillesDem * sizeof(int));
while (i < pb->nbTaillesDem)
{
fscanf_s(instanceFile,"%s",data,sizeof(data));
pb->taille[i] = atoi(data); //<-- here is the problem !! it only accept the first value and ignore all the rest
printf("%s\n",data);
fscanf_s(instanceFile,"%s",data,sizeof(data));
pb->nbDem[i] = atoi(data); //<-- the same problem here too !!
printf("%s\n",data);
i++;
}
Your interpretation of sizeof may be wrong here; data is the buffer that the string is being parsed into.
sizeof returns the size of the variable itself, not the size of what the variable (namely a pointer) points to.
Strings in C are handled through pointers, so if data is a pointer its size would be 4 bytes on a 32-bit system and 8 on a 64-bit one.
Since it prints all the numbers, it is probably reading more than intended on each loop iteration (4 bytes = 4 characters), and atoi only parses the first integer and returns.
EDIT: If it is a buffer array, sizeof returns the size of the array.
You need to make sure you are only reading in a single number per iteration of the loop to solve this issue.
If you don't care for the literal string, best thing you can do is use:
fscanf(instanceFile, "%d", pb->taille + i);
//and store the integer into the index right away
//last param same as &pb->taille[i]
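A minimal sketch of reading the integers directly with %d, assuming the file contains whitespace-separated pairs (one taille value followed by one nbDem value). The function name read_pairs is hypothetical; it checks fscanf's return value so a malformed or short file stops the loop instead of leaving elements unset.

```c
#include <stdio.h>

/* Reads up to n (taille, nbDem) pairs from `in`.
   Returns the number of complete pairs actually read. */
int read_pairs(FILE *in, int *taille, int *nbDem, int n)
{
    int i = 0;
    /* fscanf returns the number of items converted; 2 means a full pair. */
    while (i < n && fscanf(in, "%d %d", &taille[i], &nbDem[i]) == 2)
        i++;
    return i;
}
```

In the question's code this would be called as read_pairs(instanceFile, pb->taille, pb->nbDem, pb->nbTaillesDem), and a return value smaller than pb->nbTaillesDem would indicate that the input ended early or contained something that is not a number.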
I have a toy cipher program which is encountering a bus error when given a very long key (I'm using 961168601842738797 to reproduce it), which perplexes me. When I commented out sections to isolate the error, I found it was being caused by this innocent-looking for loop in my Sieve of Eratosthenes.
unsigned long i;
int candidatePrimes[CANDIDATE_PRIMES];
// CANDIDATE_PRIMES is a macro which sets the length of the array to
// two less than the upper bound of the sieve. (2 being the first prime
// and the lower bound.)
for (i=0;i<CANDIDATE_PRIMES;i++)
{
printf("i: %d\n", i); // does not print; bus error occurs first
//candidatePrimes[i] = PRIME;
}
At times this has been a segmentation fault rather than a bus error.
Can anyone help me to understand what is happening and how I can fix it/avoid it in the future?
Thanks in advance!
PS
The full code is available here:
http://pastebin.com/GNEsg8eb
I would say your VLA is too large for your stack, leading to undefined behaviour.
Better to allocate the array dynamically:
int *candidatePrimes = malloc(CANDIDATE_PRIMES * sizeof(int));
And don't forget to free before returning.
If this is Eratosthenes Sieve, then the array is really just flags. It's wasteful to use int if it's just going to hold 0 or 1. At least use char (for speed), or condense to a bit array (for minimal storage).
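A minimal sketch of a sieve that keeps its flags on the heap, one byte per candidate. The function name count_primes and its interface are assumptions for illustration, not the layout of the linked program.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns the number of primes <= limit, or -1 if the flag array
   cannot be allocated. */
long count_primes(size_t limit)
{
    if (limit < 2)
        return 0;
    if (limit == SIZE_MAX)          /* limit + 1 would overflow */
        return -1;

    /* calloc zero-initialises: 0 means "still a candidate prime". */
    unsigned char *composite = calloc(limit + 1, 1);
    if (composite == NULL)
        return -1;

    long count = 0;
    for (size_t i = 2; i <= limit; i++) {
        if (composite[i])
            continue;
        count++;
        if (i > limit / i)          /* i * i > limit: nothing left to mark */
            continue;
        for (size_t j = i * i; j <= limit; j += i)
            composite[j] = 1;
    }
    free(composite);
    return count;
}
```

Because the array comes from calloc rather than the stack, an oversized request fails with a NULL return that can be checked, instead of a bus error or segmentation fault on first use of the stack.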
The problem is that you're blowing the stack away.
unsigned long i;
int candidatePrimes[CANDIDATE_PRIMES];
If CANDIDATE_PRIMES is large, this alters the stack pointer by a massive amount. But it doesn't touch the memory, it just adjusts the stack pointer by a very large amount.
for (i=0;i<CANDIDATE_PRIMES;i++)
{
This adjusts "i" which is way back in the good area of the stack, and sets it to zero. Checks that it's < CANDIDATE_PRIMES, which it is, and so performs the first iteration.
printf("i: %d\n", i); // does not print; bus error occurs first
This attempts to put the parameters for "printf" onto the bottom of the stack. BOOM. Invalid memory location.
What value does CANDIDATE_PRIMES have?
And, do you actually want to store all the primes you're testing or only those that pass? What is the purpose of storing the values 0 thru CANDIDATE_PRIMES sequentially in an array???
If you just wanted to store the primes, you should use dynamic allocation and grow the array as needed.
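#include <stdio.h>
#include <stdlib.h>
size_t g_numSlots = 0;   /* capacity of g_primes, in elements */
size_t g_numPrimes = 0;  /* number of primes stored so far */
unsigned long* g_primes = NULL;
void addPrime(unsigned long prime) {
unsigned long* newPrimes;
if (g_numPrimes >= g_numSlots) {
/* grow the array in chunks of 256 slots */
g_numSlots += 256;
newPrimes = realloc(g_primes, g_numSlots * sizeof(unsigned long));
if (newPrimes == NULL) {
/* realloc failed; report it and stop */
fprintf(stderr, "out of memory\n");
exit(EXIT_FAILURE);
}
g_primes = newPrimes;
}
g_primes[g_numPrimes++] = prime;
}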
I've noticed that a few of my classmates have actually tried asking questions about this same assignment on StackOverflow over the past few days so I'm going to shamelessly copy paste (only) the context of one question that was deleted (but still cached on Google with no answers) to save time. I apologize in advance for that.
Context
I am trying to write a C program that measures the data throughput (MBytes/sec) of the L2 cache of my system. To perform the measurement I have to write a program that copies an array A to an array B, repeated multiple times, and measure the throughput.
Consider at least two scenarios:
Both arrays fit in the L2 cache
The array size is significantly larger than the L2 cache size.
Use memcpy() from string.h to copy the arrays, initialize both arrays with some values (e.g. random numbers using rand()), and repeat at least 100 times; otherwise you do not see a difference.
The array size and number of repeats should be input parameters. One of the array sizes should be half of my L2 cache size.
Question
So based on that context of the assignment I have a good idea of what I need to do because it pretty much tells me straight out. The problem is that we were given some template code to work with and I'm having trouble deciphering parts of it. I would really appreciate it if someone would help me to just figure out what is going on.
The code is:
/* do not add other includes */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>
#include <string.h>
double getTime(){
struct timeval t;
double sec, msec;
while (gettimeofday(&t, NULL) != 0);
sec = t.tv_sec;
msec = t.tv_usec;
sec = sec + msec/1000000.0;
return sec;
}
/* for task 1 only */
void usage(void)
{
fprintf(stderr, "bandwith [--no_iterations iterations] [--array_size size]\n");
exit(1);
}
int main (int argc, char *argv[])
{
double t1, t2;
/* variables for task 1 */
unsigned int size = 1024;
unsigned int N = 100;
unsigned int i;
/* declare variables; examples, adjust for task */
int *A;
int *B;
/* parameter parsing task 1 */
for(i=1; i<(unsigned)argc; i++) {
if (strcmp(argv[i], "--no_iterations") == 0) {
i++;
if (i < argc)
sscanf(argv[i], "%u", &N);
else
usage();
} else if (strcmp(argv[i], "--array_size") == 0) {
i++;
if (i < argc)
sscanf(argv[i], "%u", &size);
else
usage();
} else usage();
}
/* allocate memory for arrays; examples, adjust for task */
A = malloc (size*size * sizeof (int));
B = malloc (size*size * sizeof (int));
/* initialise arrray elements */
t1 = getTime();
/* code to be measured goes here */
t2 = getTime();
/* output; examples, adjust for task */
printf("time: %6.2f secs\n",t2 - t1);
/* free memory; examples, adjust for task */
free(B);
free(A);
return 0;
}
My questions are:
What could the purpose of the usage method be?
What is the parameter passing part supposed to be doing because as far as I can tell it will just always lead to usage() and won't take any parameters with the sscanf lines?
In this assignment we're meant to record array sizes in KB or MB, and I know that malloc allocates size in bytes and with a size variable value of 1024 would result in 1MB * sizeof(int) (I think at least). In this case would the array size I should record be 1MB or 1MB * sizeof(int)?
If parameter passing worked properly and we passed parameters to change the size variable value would the array size always be the size variable squared? Or would the array size be considered to be just the size variable? It seems very unintuitive to malloc size*size instead of just size unless there's something I'm missing about all this.
My understanding of measuring the throughput is that I should just multiply the array size by the number of iterations and then divide by the time taken. Can I get any confirmation that this is right?
These are the only hurdles in my understanding of this assignment. Any help would be much appreciated.
What could the purpose of the usage method be?
The usage function tells you what arguments are supposed to be passed to the program on the command-line.
What is the parameter passing part supposed to be doing because as far as I can tell it will just always lead to usage() and won't take any parameters with the sscanf lines?
It leads to calling the usage() function when an invalid argument is passed to the program.
Otherwise, it sets the variable N to the value given after --no_iterations (default value of 100), and it sets the variable size to the value given after --array_size (default value of 1024).
In this assignment we're meant to record array sizes in KB or MB, and I know that malloc allocates size in bytes and with a size variable value of 1024 would result in 1MB * sizeof(int) (I think at least). In this case would the array size I should record be 1MB or 1MB * sizeof(int)?
If your size is supposed to be 1 MB, then that is probably what the size should be.
If you want to make sure the size is a multiple of the size of the data type, then you can do:
if (size % sizeof(int) != 0)
{
size = ((int)(size / sizeof(int))) * sizeof(int);
}
If parameter passing worked properly and we passed parameters to change the size variable value would the array size always be the size variable squared? Or would the array size be considered to be just the size variable? It seems very unintuitive to malloc size*size instead of just size unless there's something I'm missing about all this.
You probably just want to allocate size bytes, unless you are supposed to be working with matrices rather than just arrays. In that case, it would be size * size elements, i.e. size * size * sizeof(int) bytes as in the template.
My understanding of measuring the throughput is that I should just multiply the array size by the number of iterations and then divide by the time taken. Can I get any confirmation that this is right?
I guess so.
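To make that calculation concrete, here is a minimal sketch. Both function names are hypothetical, and two conventions are assumed rather than taken from the assignment: 1 MB is treated as 1024 * 1024 bytes, and only the bytes copied (not bytes read plus bytes written) are counted.

```c
#include <stddef.h>

/* Bytes allocated per array by the template: size * size ints. */
size_t template_array_bytes(unsigned int size)
{
    return (size_t)size * size * sizeof(int);
}

/* Throughput in MB/s (assumed 1 MB = 1024 * 1024 bytes):
   bytes copied per memcpy, times the number of repeats, over elapsed time. */
double throughput_mb_per_s(size_t bytes_per_copy, unsigned int iterations,
                           double seconds)
{
    return ((double)bytes_per_copy * iterations) / seconds / (1024.0 * 1024.0);
}
```

For example, copying 1 MB 100 times in 2.0 seconds gives 50 MB/s under these assumptions. With the template's default size of 1024, each array holds 1024 * 1024 ints, which is 4 MB on a system where int is 4 bytes.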