I have written the code for the sieve, but the program only runs for array sizes less than or equal to 1000000. For the larger cases a SIGSEGV simply occurs. Can this be made to run for cases > 1000000, or where am I going wrong?
#include <stdio.h>
int main()
{
    unsigned long long int arr[10000001] = {[0 ... 10000000] = 0};
    unsigned long long int c=0,i,j,a,b;
    scanf("%llu%llu",&a,&b);
    for(i=2;i<=b;i++)
        if(arr[i] == 0)
            for(j=2*i;j<=b;j+=i)
                arr[j] = 1;
    for(i=(a>2)?a:2;i<=b;i++)
        if(arr[i] == 0)
            c++;
    printf("%llu",c);
    return 0;
}
This line allocates memory on the stack (which is a limited resource)
unsigned long long int arr[10000001] = {[0 ... 10000000] = 0};
If you are allocating 10,000,001 entries, then even at 4 bytes each that is over 40 million bytes, which will be more than your stack can handle.
(And on your platform an unsigned long long int is 8 or more bytes, so you are actually using at least 80 million bytes!)
Instead, allocate the memory from the heap, which is much more plentiful:
int* arr = malloc(10,000,000 * sizeof(int)); // commas for clarity only. Remove in real code!
Or, if you want the memory initialized to zero, use calloc.
Then at the end of your program be sure you also free it:
free(arr);
PS The syntax {[0 ... 10000000] = 0}; is needlessly verbose.
To initialize an array to zero, simply:
int arr[100] = {0}; // That's all!
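Putting that advice together, a minimal sketch of the sieve with the flags moved to the heap might look like this (it uses one char flag per number instead of an unsigned long long, which also shrinks the memory roughly eight-fold):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned long long a, b, c = 0, i, j;
    if (scanf("%llu%llu", &a, &b) != 2)
        return 1;

    /* one flag per number in 0..b; calloc zero-initializes */
    char *arr = calloc(b + 1, sizeof *arr);
    if (arr == NULL)
        return 1;

    for (i = 2; i <= b; i++)
        if (arr[i] == 0)
            for (j = 2 * i; j <= b; j += i)
                arr[j] = 1;       /* mark composites */

    for (i = (a > 2) ? a : 2; i <= b; i++)
        if (arr[i] == 0)
            c++;                  /* count primes in [a, b] */

    printf("%llu", c);
    free(arr);
    return 0;
}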
You declared an array that can hold 10000001 items; if you want to handle larger numbers, you need a bigger array. I'm mildly surprised that it works for 1000000 already - that's a lot of stack space to be using.
Edit: sorry - didn't notice you had a different number of zeroes there. Don't use the stack to allocate your array and you should be fine. Just add static to the array declaration and you'll probably be okay.
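For example (a one-line sketch of that suggestion; the array now has static storage duration instead of living on the stack):
static unsigned long long int arr[10000001] = {0}; /* not on the stack; zero-initialized */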
I was trying to make an array that contains Fibonacci numbers in C, but I got into trouble. I can't get all of the elements, some of the elements are wrongly calculated, and I don't know where I am going wrong.
#include <stdio.h>
int main(void){
    int serie[]={1,1},sum=0,size=2;
    while(size<=4000000){
        serie[size]=serie[size-1]+serie[size-2];
        printf("%d\n",serie[size-1]);
        size+=1;
    }
    return 0;
}
Output:
1
2
4
6
11
17
28
45
73
118
191
309
500
809
1309
2118
3427
5545
8972
14517
23489
38006
61495
99501
160996
260497
421493
681990
1103483
1785473
2888956
4674429
7563385
12237814
19801199
32039013
51840212
83879225
135719437
219598662
355318099
574916761
930234860
1505151621
-1859580815
-354429194
2080957287
1726528093
-487481916
1239046177
751564261
1990610438
-1552792597
437817841
-1114974756
-677156915
-1792131671
1825678710
33547039
1859225749
1892772788
-542968759
1349804029
806835270
-2138327997
-1331492727
825146572
-506346155
318800417
-187545738
131254679
-56291059
74963620
18672561
93636181
112308742
205944923
318253665
524198588
842452253
1366650841
-2085864202
-719213361
1489889733
770676372
-2034401191
-1263724819
996841286
-266883533
729957753
463074220
1193031973
1656106193
-1445829130
210277063
-1235552067
-1025275004
2034140225
1008865221
-1251961850
-243096629
-1495058479
-1738155108
1061753709
-676401399
385352310
-291049089
94303221
-196745868
-102442647
-299188515
-401631162
-700819677
-1102450839
-1803270516
1389245941
-414024575
975221366
561196791
1536418157
2097614948
-660934191
--------------------------------
Process exited after 2.345 seconds with return value 3221225477
Press any key to continue . . .
I don't understand why it is giving that output.
int serie[]={1,1}
declares an array of two elements. As the array has two elements and indices start from zero, its only valid indices are 0 and 1, i.e. serie[0] is the first element and serie[1] is the second element.
int size=2;
while(..) {
    serie[size]= ...
    size+=1;
}
As size starts at 2, the expression serie[2] = ... is invalid: there is no third element in the array, and it writes to an unknown memory region. Executing such an action is undefined behavior. There could be another variable there, some system data, or memory belonging to another program, or it could spawn nasal demons. It is undefined.
If you want to store the output in an array, you need to make sure the array has enough elements to hold the input.
And a tip:
int serie[4000000];
may not work, as it will try to allocate 4000000 * sizeof(int) bytes, which assuming sizeof(int) = 4 is roughly 15.3 megabytes of memory. Some systems don't allow allocating that much memory on the stack, so you should move to dynamic allocation.
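For example, a minimal heap-allocated sketch (COUNT is just an illustrative name and value; note that int itself will still overflow after roughly the 46th term, which the next answer discusses):
#include <stdio.h>
#include <stdlib.h>

#define COUNT 1000 /* how many terms to keep; pick what you actually need */

int main(void) {
    int *serie = malloc(COUNT * sizeof *serie);
    if (serie == NULL)
        return 1;

    serie[0] = 1;
    serie[1] = 1;
    for (int size = 2; size < COUNT; size++) {
        serie[size] = serie[size - 1] + serie[size - 2];
        printf("%d\n", serie[size]);
    }

    free(serie);
    return 0;
}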
You're also getting integer overflow: past a certain point, an int is not big enough to hold the Fibonacci numbers, so they wrap around and produce false values.
Your program should be like:
#include <stdio.h>
int main(void){
    long long unsigned series[100] = {1,1};
    int size = 2;
    while(size < 100){
        series[size] = series[size-1] + series[size-2];
        printf("%llu\n", series[size-1]);
        size += 1;
    }
    return 0;
}
The range of long long unsigned is also limited, and Fibonacci numbers grow very quickly. So this will print more correct values, but it will still overflow eventually, namely when a number exceeds the constant ULLONG_MAX declared in limits.h.
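If you want to stop cleanly instead of wrapping around, one option (a sketch, not part of the original answer) is to test against ULLONG_MAX before adding:
#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned long long series[100] = {1, 1};
    int size = 2;
    while (size < 100) {
        /* a + b overflows exactly when a > ULLONG_MAX - b */
        if (series[size - 1] > ULLONG_MAX - series[size - 2])
            break;
        series[size] = series[size - 1] + series[size - 2];
        printf("%llu\n", series[size]);
        size += 1;
    }
    return 0;
}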
The problem with this code:
#include <stdio.h>
int main(void){
    int serie[]={1,1},sum=0,size=2;
    while(size<=4000000){
        serie[size]=serie[size-1]+serie[size-2];
        printf("%d\n",serie[size-1]);
        size+=1;
    }
    return 0;
}
... is that it attempts to store a very long series of numbers (4 million) into a very short array (2 elements). Arrays are fixed in size. Changing the variable size has no effect on the size of the array serie.
The expression serie[size]=... stores numbers outside the bounds of the array every time it's executed because the only legal array index values are 0 and 1. This results in undefined behavior and to be honest you were lucky only to see weird output.
There are a couple of possible solutions. The one that changes your code the least is to simply extend the array. Note that I've made it a static rather than automatic variable, because your implementation probably won't support something of that size in its stack.
#include <stdio.h>

int serie[4000000]={1,1};

int main(void){
    int size=2;
    while(size<4000000){ // note strict less-than: 4000000 is not a valid index
        serie[size]=serie[size-1]+serie[size-2];
        printf("%d\n",serie[size-1]);
        size+=1;
    }
    return 0;
}
The more general solution is to store the current term and the two previous terms in the series as three separate integers. It's a little more computationally expensive but doesn't have the huge memory requirement.
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int term0=0, term1=1, term2;
    while(1)
    {
        if (term0 > INT_MAX - term1) break; // overflow, stop
        term2 = term0 + term1;
        printf("%d\n",term2);
        term0 = term1;
        term1 = term2;
    }
    return 0;
}
This also has the benefit that it won't print any numbers that have "wrapped around" as a result of exceeding the limits of what can be represented in an int. Of course, you can easily choose another data type in order to get a longer sequence of valid output.
You have two problems:
You need to allocate more space in serie, as much as you are going to use.
Eventually the Fibonacci numbers will become too big to fit inside an integer, even a 64-bit unsigned integer (long long unsigned); I think 90 or so is about the max.
See the modified code:
#include <stdio.h>

// Set maximum number of fib numbers
#define MAX_SIZE 90

int main(void) {
    // Use 64 bit unsigned integer (can't be negative)
    long long unsigned int serie[MAX_SIZE];
    serie[0] = 1;
    serie[1] = 1;
    int sum = 0;
    int size = 0;
    printf("Fib(0): %llu\n", serie[0]);
    printf("Fib(1): %llu\n", serie[1]);
    for (size = 2; size < MAX_SIZE; size++) {
        serie[size] = serie[size-1] + serie[size-2];
        printf("Fib(%i): %llu\n", size, serie[size]);
    }
    return 0;
}
As you are only printing out the numbers, you don't actually have to store all of them (only the two previous numbers), but it really doesn't matter when there are only 90. A sketch of that two-variable variant follows.
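For completeness, a minimal sketch of the two-variable approach, using the same unsigned 64-bit type and the same 90-term cut-off:
#include <stdio.h>

#define MAX_SIZE 90

int main(void) {
    unsigned long long prev = 1, curr = 1, next;
    printf("Fib(0): %llu\nFib(1): %llu\n", prev, curr);
    for (int n = 2; n < MAX_SIZE; n++) {
        next = prev + curr;   /* only the two previous terms are needed */
        prev = curr;
        curr = next;
        printf("Fib(%d): %llu\n", n, curr);
    }
    return 0;
}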
I have a toy cipher program which is encountering a bus error when given a very long key (I'm using 961168601842738797 to reproduce it), which perplexes me. When I commented out sections to isolate the error, I found it was being caused by this innocent-looking for loop in my Sieve of Eratosthenes.
unsigned long i;
int candidatePrimes[CANDIDATE_PRIMES];
// CANDIDATE_PRIMES is a macro which sets the length of the array to
// two less than the upper bound of the sieve. (2 being the first prime
// and the lower bound.)
for (i=0;i<CANDIDATE_PRIMES;i++)
{
    printf("i: %d\n", i); // does not print; bus error occurs first
    //candidatePrimes[i] = PRIME;
}
At times this has been a segmentation fault rather than a bus error.
Can anyone help me to understand what is happening and how I can fix it/avoid it in the future?
Thanks in advance!
PS
The full code is available here:
http://pastebin.com/GNEsg8eb
I would say your VLA is too large for your stack, leading to undefined behaviour.
Better to allocate the array dynamically:
int *candidatePrimes = malloc(CANDIDATE_PRIMES * sizeof(int));
And don't forget to free before returning.
If this is Eratosthenes Sieve, then the array is really just flags. It's wasteful to use int if it's just going to hold 0 or 1. At least use char (for speed), or condense to a bit array (for minimal storage).
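A sketch of the bit-array variant (LIMIT and the helper names are illustrative, not from the original program):
#include <stdio.h>
#include <stdlib.h>

/* one bit of flag storage per candidate number */
#define LIMIT 1000000

static void setComposite(unsigned char *bits, size_t i) {
    bits[i / 8] |= (unsigned char)(1u << (i % 8));
}

static int isComposite(const unsigned char *bits, size_t i) {
    return (bits[i / 8] >> (i % 8)) & 1u;
}

int main(void)
{
    unsigned char *bits = calloc(LIMIT / 8 + 1, 1); /* ~125 KB instead of ~4 MB of int flags */
    if (bits == NULL)
        return 1;

    for (size_t i = 2; i <= LIMIT; i++)
        if (!isComposite(bits, i))
            for (size_t j = 2 * i; j <= LIMIT; j += i)
                setComposite(bits, j);

    size_t count = 0;
    for (size_t i = 2; i <= LIMIT; i++)
        if (!isComposite(bits, i))
            count++;
    printf("%zu primes up to %d\n", count, LIMIT);

    free(bits);
    return 0;
}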
The problem is that you're blowing the stack away.
unsigned long i;
int candidatePrimes[CANDIDATE_PRIMES];
If CANDIDATE_PRIMES is large, this moves the stack pointer by a massive amount, but it doesn't actually touch that memory; it only adjusts the stack pointer.
for (i=0;i<CANDIDATE_PRIMES;i++)
{
This adjusts "i" which is way back in the good area of the stack, and sets it to zero. Checks that it's < CANDIDATE_PRIMES, which it is, and so performs the first iteration.
printf("i: %d\n", i); // does not print; bus error occurs first
This attempts to put the parameters for "printf" onto the bottom of the stack. BOOM. Invalid memory location.
What value does CANDIDATE_PRIMES have?
And, do you actually want to store all the numbers you're testing, or only those that pass? What is the purpose of storing the values 0 through CANDIDATE_PRIMES sequentially in an array?
If you just want to store the primes, you should use dynamic allocation and grow it as needed:
#include <stdlib.h>   /* realloc, exit */

size_t g_numSlots = 0;
size_t g_numPrimes = 0;
unsigned long* g_primes = NULL;

void addPrime(unsigned long prime) {
    unsigned long* newPrimes;
    if (g_numPrimes >= g_numSlots) {
        g_numSlots += 256;
        newPrimes = realloc(g_primes, g_numSlots * sizeof(unsigned long));
        if (newPrimes == NULL) {
            exit(EXIT_FAILURE);   /* allocation failed; die gracefully */
        }
        g_primes = newPrimes;
    }
    g_primes[g_numPrimes++] = prime;
}
I've noticed that a few of my classmates have actually tried asking questions about this same assignment on StackOverflow over the past few days so I'm going to shamelessly copy paste (only) the context of one question that was deleted (but still cached on Google with no answers) to save time. I apologize in advance for that.
Context
I am trying to write a C program that measures the data throughput (MBytes/sec) of the L2 cache of my system. To perform the measurement I have to write a program that copies an array A to an array B, repeated multiple times, and measure the throughput.
Consider at least two scenarios:
Both arrays fit in the L2 cache
The array size is significantly larger than the L2 cache size.
Use memcpy() from string.h to copy the arrays, initialize both arrays with some values (e.g. random numbers using rand()), and repeat at least 100 times, otherwise you will not see a difference.
The array size and number of repeats should be input parameters. One of the array sizes should be half of my L2 cache size.
Question
So based on that context of the assignment I have a good idea of what I need to do because it pretty much tells me straight out. The problem is that we were given some template code to work with and I'm having trouble deciphering parts of it. I would really appreciate it if someone would help me to just figure out what is going on.
The code is:
/* do not add other includes */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>
#include <string.h>

double getTime(){
    struct timeval t;
    double sec, msec;
    while (gettimeofday(&t, NULL) != 0);
    sec = t.tv_sec;
    msec = t.tv_usec;
    sec = sec + msec/1000000.0;
    return sec;
}

/* for task 1 only */
void usage(void)
{
    fprintf(stderr, "bandwith [--no_iterations iterations] [--array_size size]\n");
    exit(1);
}

int main (int argc, char *argv[])
{
    double t1, t2;
    /* variables for task 1 */
    unsigned int size = 1024;
    unsigned int N = 100;
    unsigned int i;
    /* declare variables; examples, adjust for task */
    int *A;
    int *B;
    /* parameter parsing task 1 */
    for(i=1; i<(unsigned)argc; i++) {
        if (strcmp(argv[i], "--no_iterations") == 0) {
            i++;
            if (i < argc)
                sscanf(argv[i], "%u", &N);
            else
                usage();
        } else if (strcmp(argv[i], "--array_size") == 0) {
            i++;
            if (i < argc)
                sscanf(argv[i], "%u", &size);
            else
                usage();
        } else usage();
    }
    /* allocate memory for arrays; examples, adjust for task */
    A = malloc (size*size * sizeof (int));
    B = malloc (size*size * sizeof (int));
    /* initialise array elements */
    t1 = getTime();
    /* code to be measured goes here */
    t2 = getTime();
    /* output; examples, adjust for task */
    printf("time: %6.2f secs\n",t2 - t1);
    /* free memory; examples, adjust for task */
    free(B);
    free(A);
    return 0;
}
My questions are:
What could the purpose of the usage method be?
What is the parameter-parsing part supposed to be doing? As far as I can tell it will just always lead to usage() and won't take any parameters with the sscanf lines.
In this assignment we're meant to record array sizes in KB or MB, and I know that malloc allocates size in bytes and with a size variable value of 1024 would result in 1MB * sizeof(int) (I think at least). In this case would the array size I should record be 1MB or 1MB * sizeof(int)?
If parameter passing worked properly and we passed parameters to change the size variable value would the array size always be the size variable squared? Or would the array size be considered to be just the size variable? It seems very unintuitive to malloc size*size instead of just size unless there's something I'm missing about all this.
My understanding of measuring the throughput is that I should just multiply the array size by the number of iterations and then divide by the time taken. Can I get any confirmation that this is right?
These are the only hurdles in my understanding of this assignment. Any help would be much appreciated.
What could the purpose of the usage method be?
The usage function tells you what arguments are supposed to be passed to the program on the command-line.
What is the parameter passing part supposed to be doing because as far as I can tell it will just always lead to usage() and won't take any parameters with the sscanf lines?
It leads to calling the usage() function when an invalid argument is passed to the program.
Otherwise, it sets the variable N (the number of iterations) to the value of the --no_iterations argument (default value 100), and the variable size (the array size) to the value of the --array_size argument (default value 1024).
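For example, assuming the compiled binary is called bandwidth (the name is only illustrative):
./bandwidth --no_iterations 200 --array_size 2048
would run 200 repetitions with size set to 2048, while an unknown flag, or a flag without a value, prints the usage message and exits.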
In this assignment we're meant to record array sizes in KB or MB, and I know that malloc allocates size in bytes and with a size variable value of 1024 would result in 1MB * sizeof(int) (I think at least). In this case would the array size I should record be 1MB or 1MB * sizeof(int)?
If your size is supposed to be 1 MB, then that is probably what the size should be.
If you want to make sure the size is a multiple of the size of the data type, then you can do:
if (size % sizeof(int) != 0)
{
    size = ((int)(size / sizeof(int))) * sizeof(int);
}
If parameter passing worked properly and we passed parameters to change the size variable value would the array size always be the size variable squared? Or would the array size be considered to be just the size variable? It seems very unintuitive to malloc size*size instead of just size unless there's something I'm missing about all this.
You probably just want to allocate size bytes. Unless you are supposed to be working with matrices, rather than just arrays. In that case, it would be size * size bytes.
My understanding of measuring the throughput is that I should just multiply the array size by the number of iterations and then divide by the time taken. Can I get any confirmation that this is right?
Yes, that's essentially right: the total number of bytes copied (the array size in bytes times the number of repetitions) divided by the elapsed time gives you the throughput.
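As a sketch, using the template's variables and placed after t2 is taken (whether you count only the copied bytes or reads and writes separately is up to the assignment):
/* bytes copied per repetition, times repetitions, divided by elapsed seconds */
double bytes_per_copy = (double)size * size * sizeof(int); /* matches the template's malloc */
double mbytes = bytes_per_copy * N / (1024.0 * 1024.0);
printf("throughput: %6.2f MBytes/sec\n", mbytes / (t2 - t1));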
I have a function as follow:
int doSomething(long numLoop,long arraySize){
    int * buffer;
    buffer = (int*) malloc (arraySize * sizeof(int));
    long k;
    int i;
    for (i=0;i<arraySize;i++)
        buffer[i]=2; //write to make sure memory is allocated

    //start reading from cache
    for(k=0;k<numLoop;k++){
        int i;
        int temp;
        for (i=0;i<arraySize;i++)
            temp = buffer[i];
    }
}
What it does is declare an array and read it from beginning to end. The purpose is to see the effect of the cache.
What I expect to see is: when I call doSomething(10000,1000), the arraySize is small, so it is all stored in the cache. After that I call doSomething(100,100000), where arraySize is bigger than the cache. As a result, the 2nd function call should take longer than the 1st one, since the latter call involves memory accesses because the whole array cannot be stored in the cache.
However, it seems that the 2nd call takes approximately the same time as the 1st one. So what's wrong here? I tried to compile with -O0 and it doesn't solve the problem.
Thank you.
Update 1: this is the code with random access, and it seems to work; the access time with the large array is ~15s while the small array takes ~3s.
int doSomething(long numLoop,int a, long arraySize){
    int * buffer;
    buffer = (int*) malloc (arraySize * sizeof(int));
    long k;
    int i;
    for (i=0;i<arraySize;i++)
        buffer[i]=2; //write to make sure memory is allocated

    //start reading from cache
    for(k=0;k<numLoop;k++){
        int temp;
        for (i=0;i<arraySize;i++){
            long randnum = rand(); //max is 32767
            randnum = (randnum <<16) | rand();
            if (randnum < 0) randnum = -randnum;
            randnum%=arraySize;
            temp = buffer[randnum];
        }
    }
}
You are accessing the array in sequence,
for (i=0;i<arraySize;i++)
temp = buffer[i];
so the part you are accessing will always be in the cache, since that pattern is trivial to predict. To see a cache effect, you must access the array in a less predictable order, for example by generating (pseudo)random indices, so that you jump between the front and the back of the array.
In addition to the other answers: your code accesses the memory sequentially. Let's assume that a cache line is 32 bytes. That means you probably get a cache miss only on every 8th access (with 4-byte ints). So when picking a random index, you should make it at least 32 bytes away from the previous one.
In order to measure the effect across multiple calls, you must use the same buffer (with the expectation that the first time through you are loading the cache, and the next time you are using it). In your case, you are allocating a new buffer for every call. (Additionally, you are never freeing your allocation.)
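A minimal sketch of that structure (names and sizes are illustrative, not from the original code): allocate the buffer once, pass it into the timed routine, and free it at the end.
#include <stdlib.h>

/* read the whole buffer sequentially, numLoop times */
static void touchBuffer(int *buffer, long arraySize, long numLoop) {
    volatile int temp = 0; /* volatile so the reads are not optimized away */
    for (long k = 0; k < numLoop; k++)
        for (long i = 0; i < arraySize; i++)
            temp = buffer[i];
    (void)temp;
}

int main(void)
{
    long arraySize = 100000;
    int *buffer = malloc(arraySize * sizeof(int));
    if (buffer == NULL)
        return 1;
    for (long i = 0; i < arraySize; i++)
        buffer[i] = 2; /* touch the memory once so it is really allocated */

    touchBuffer(buffer, arraySize, 100); /* first call loads the cache (if the array fits) */
    touchBuffer(buffer, arraySize, 100); /* later calls reuse the same buffer */

    free(buffer);
    return 0;
}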
I am having trouble understanding the output of the following simple CUDA code. All that the code does is allocate two integer arrays: one on the host and one on the device each of size 16. It then sets the device array elements to the integer value 3 and then copies these values into the host_array where all the elements are then printed out.
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    int num_elements = 16;
    int num_bytes = num_elements * sizeof(int);

    int *device_array = 0;
    int *host_array = 0;

    // malloc host memory
    host_array = (int*)malloc(num_bytes);

    // cudaMalloc device memory
    cudaMalloc((void**)&device_array, num_bytes);

    // Constant out the device array with cudaMemset
    cudaMemset(device_array, 3, num_bytes);

    // copy the contents of the device array to the host
    cudaMemcpy(host_array, device_array, num_bytes, cudaMemcpyDeviceToHost);

    // print out the result element by element
    for(int i = 0; i < num_elements; ++i)
        printf("%i\n", *(host_array+i));

    // use free to deallocate the host array
    free(host_array);

    // use cudaFree to deallocate the device array
    cudaFree(device_array);

    return 0;
}
The output of this program is 50529027 printed line by line 16 times.
50529027
50529027
50529027
..
..
..
50529027
50529027
Where did this number come from? When I replace 3 with 0 in the cudaMemset call, then I get the correct behaviour, i.e. 0 printed line by line 16 times.
I compiled the code with nvcc test.cu on Ubuntu 10.10 with CUDA 4.0
I'm no CUDA expert, but 50529027 is 0x03030303 in hex. This means cudaMemset sets each byte in the array to 3, not each int. This is not surprising given the signature of cudaMemset (you pass in the number of bytes to set) and the general semantics of memset operations.
Edit: As to your (I guess) implicit question of how to achieve what you intended, I think you have to write a loop and initialize each array element.
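For instance, a minimal sketch of that approach, reusing the variables from the question's program (fill the host array, then copy it to the device):
// fill the host array with the int value 3, then push it to the device
for (int i = 0; i < num_elements; ++i)
    host_array[i] = 3;
cudaMemcpy(device_array, host_array, num_bytes, cudaMemcpyHostToDevice);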
As others have pointed out, cudaMemset works like the standard C memset - it sets byte values. From the CUDA documentation:
cudaError_t cudaMemset( void * devPtr, int value, size_t count)
Fills the first count bytes of the memory area pointed to by devPtr
with the constant byte value value.
If you want to set word size values, the best solution is to use your own memset kernel, perhaps something like this:
template<typename T>
__global__ void myMemset(T * x, T value, size_t count )
{
    size_t tid = threadIdx.x + blockIdx.x * blockDim.x;
    size_t stride = blockDim.x * gridDim.x;
    for(int i=tid; i<count; i+=stride) {
        x[i] = value;
    }
}
which could be launched with enough blocks to cover the number of MP in your GPU, and each thread will do as many iterations as required to fill the memory allocation. Writes will be coalesced, so performance shouldn't be too bad. This could also be adapted to CUDA's vector types, if you so desired.
memset sets bytes, and an int is 4 bytes, so what you get is 50529027 decimal, which is 0x03030303 in hex. In other words, you are using it wrong, and it has nothing to do with CUDA.
This is a classic memset shortcoming: it works only on 8-bit units, i.e. char, which means it writes 3 into every byte of the memory region. You can confirm this with a simple C++ program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main ()
{
    int x=16;
    size_t bytes = x*sizeof(int);
    int *M = (int*)malloc(bytes);
    memset(M,3,bytes);
    for (int i = 0; i < x; ++i) {
        printf("%d\n", M[i]);
    }
    return 0;
}
The only case in which memset works for all data types is when you set the value to 0 (it sets every byte to 0, and hence all the data to 0). If you change the data type to char, you'll see the desired output. cudaMemset is essentially a copy of memset, with the only difference that it takes a GPU pointer as input.
So memset and cudaMemset set every byte of the memory region defined by the third argument to the given value (in your case 3), regardless of the data type.
Tip:
Google: 50529027 in binary and you'll get the answer :)