I have a toy cipher program which is encountering a bus error when given a very long key (I'm using 961168601842738797 to reproduce it), which perplexes me. When I commented out sections to isolate the error, I found it was being caused by this innocent-looking for loop in my Sieve of Eratosthenes.
unsigned long i;
int candidatePrimes[CANDIDATE_PRIMES];
// CANDIDATE_PRIMES is a macro which sets the length of the array to
// two less than the upper bound of the sieve. (2 being the first prime
// and the lower bound.)
for (i=0;i<CANDIDATE_PRIMES;i++)
{
    printf("i: %lu\n", i); // does not print; bus error occurs first
    //candidatePrimes[i] = PRIME;
}
At times this has been a segmentation fault rather than a bus error.
Can anyone help me to understand what is happening and how I can fix it/avoid it in the future?
Thanks in advance!
PS
The full code is available here:
http://pastebin.com/GNEsg8eb
I would say your VLA is too large for your stack, leading to undefined behaviour.
Better to allocate the array dynamically:
int *candidatePrimes = malloc(CANDIDATE_PRIMES * sizeof(int));
And don't forget to free before returning.
If this is Eratosthenes Sieve, then the array is really just flags. It's wasteful to use int if it's just going to hold 0 or 1. At least use char (for speed), or condense to a bit array (for minimal storage).
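As a rough illustration of the two suggestions above (heap allocation and one-byte flags), here is a minimal sketch. The function name `sieve_flags` and its interface are made up for this example and are not taken from the asker's code; it also assumes `limit` is smaller than `SIZE_MAX`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated array flags[0..limit]
 * where flags[n] is 1 if n is prime and 0 otherwise.
 * Returns NULL if the allocation fails. The caller must free() the result. */
unsigned char *sieve_flags(size_t limit)
{
    unsigned char *flags = malloc(limit + 1);   /* one byte per candidate */
    if (flags == NULL)
        return NULL;

    memset(flags, 1, limit + 1);                /* assume prime until crossed out */
    flags[0] = 0;
    if (limit >= 1)
        flags[1] = 0;

    /* p <= limit / p is p * p <= limit without the risk of overflow */
    for (size_t p = 2; p <= limit / p; p++) {
        if (flags[p]) {
            for (size_t m = p * p; m <= limit; m += p)
                flags[m] = 0;                   /* cross out multiples of p */
        }
    }
    return flags;
}
```

Because the array lives on the heap, a failed allocation shows up as a NULL return that can be handled, rather than as a crash when the stack overflows.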
The problem is that you're blowing the stack away.
unsigned long i;
int candidatePrimes[CANDIDATE_PRIMES];
If CANDIDATE_PRIMES is large, this moves the stack pointer by a massive amount. It doesn't touch the memory itself; it only adjusts the stack pointer.
for (i=0;i<CANDIDATE_PRIMES;i++)
{
This adjusts "i" which is way back in the good area of the stack, and sets it to zero. Checks that it's < CANDIDATE_PRIMES, which it is, and so performs the first iteration.
printf("i: %lu\n", i); // does not print; bus error occurs first
This attempts to put the parameters for "printf" onto the bottom of the stack. BOOM. Invalid memory location.
What value does CANDIDATE_PRIMES have?
And do you actually want to store all the candidates you're testing, or only those that pass? What is the purpose of storing the values 0 through CANDIDATE_PRIMES sequentially in an array?
If you just want to store the primes, you should use a dynamic allocation and grow it as needed.
#include <stdio.h>
#include <stdlib.h>
size_t g_numSlots = 0;
size_t g_numPrimes = 0;
unsigned long* g_primes = NULL;
void addPrime(unsigned long prime) {
    unsigned long* newPrimes;
    if (g_numPrimes >= g_numSlots) {
        /* Grow in chunks of 256 slots to limit the number of realloc calls. */
        g_numSlots += 256;
        newPrimes = realloc(g_primes, g_numSlots * sizeof(unsigned long));
        if (newPrimes == NULL) {
            /* Out of memory: report it and bail out. */
            perror("realloc");
            exit(EXIT_FAILURE);
        }
        g_primes = newPrimes;
    }
    g_primes[g_numPrimes++] = prime;
}
I'm not sure how to phrase this question, as the problem is a little confusing to me. I am having trouble with this code:
#include <stdio.h>
#include <stdlib.h>
#define ull unsigned long long
#define SIZE 1000000001
#define min(a,b) ((a<b?a:b))
#define max(a,b) ((a>b?a:b))
int solve(void) {
    // unsigned *count = malloc(sizeof(unsigned) * SIZE);
    int k;
    scanf("%d", &k);
    unsigned count[SIZE];
    for (ull i = 0; i < SIZE; i++){
        count[i] = 0;
    }
    return 0;
}
int main(void){
    unsigned t;
    if (scanf("%u", &t) != 1) return 1;
    while (t-- > 0){
        if (solve() != 0) return 1;
    }
}
This code segfaults for me.
My observations:
1. It runs fine until it reaches the solve function.
2. Calling solve is what triggers the segfault.
3. It has nothing to do with scanf("%d", &k); removing that line gives the same error.
4. If I decrease SIZE, it runs fine.
5. If I allocate the array on the heap instead of the stack, it works fine.
6. If I only declare the array count in solve, without reading k or initializing the elements to 0, I don't get a segfault.
So I have some questions:
1. Is this due to a memory limit on the array itself, or to a memory limit on the stack frame of solve (or some other reason I can't find)?
2. If it is due to some memory limit, isn't that limit too low for a program?
3. How does the compiler check for such errors? A print statement added before the array declaration never runs, because the segfault occurs as soon as the program reaches solve. So the compiler somehow knows there is a problem with the code without even getting there.
4. Regarding the 6th observation specifically: as far as I know, declaring an array reserves memory for it, so initializing it does nothing to increase its size. Why, then, do I get no error when I only declare the array, but a segfault when I initialize all its values?
Maybe I am looking at this in entirely the wrong way, but this is how I understand it. If you know the reason for any of this, please explain.
It depends on your operating system. On Windows, the typical maximum size for a stack is 1MB, whereas it is 8MB on a typical modern Linux, although those values are adjustable in various ways.
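On POSIX systems (this does not apply to Windows), the current limit can be queried at run time with getrlimit() and RLIMIT_STACK. The sketch below is only an illustration; the helper name `stack_soft_limit` is made up for this example.

```c
#include <sys/resource.h>

/* Hypothetical helper (POSIX only): stores the soft stack size limit,
 * in bytes, in *out. Returns 0 on success and -1 if getrlimit() fails.
 * The value may be RLIM_INFINITY if the stack is unlimited. */
int stack_soft_limit(rlim_t *out)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}
```

An array of 1000000001 unsigned values needs roughly 4 GB, which is far beyond a limit of a few megabytes, so it has to be allocated on the heap (with malloc or calloc) instead.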
It works properly for me; try it on another platform or system.
I always get a SIGSEGV error whenever I dynamically allocate arrays in C. Please tell me what I am doing wrong.
The code works fine on Turbo C, but it gives SIGSEGV on an online judge that uses GCC.
Programming Problem
My code:
#include<stdio.h>
#include<stdlib.h>
int main(void)
{
long n,h,i,crane=0,box=0,temp=0;
long *comm;
scanf("%ld %ld",&n,&h);
long *a = (long*)malloc(n*sizeof(long));
for(i=0;i<n;i++)
scanf("%ld",&a[i]);
scanf("%ld",&comm[0]);
i=0;
while(comm[i]!=0)
{
i++;
scanf("%ld",&comm[i]);
}
for(i=0;comm[i]!=0;i++)
{
if(comm[i]==3)
box=1;
if(comm[i]==4 && box==1)
{
a[crane]=(a[crane]+1);
temp=0;
}
if(box==1 && (comm[i]==1 || comm[i]==2) && temp==0)
{
a[crane]=(a[crane]-1);
temp=1;
}
if(crane!=0 && comm[i]==1)
crane--;
if(comm[i]==2)
crane++;
if(comm[i]==0)
break;
}
for(i=0;i<n;i++)
printf("%ld ",a[i]);
free(a);
free(comm);
return 0;
}
For a start, nowhere in that code are you actually allocating memory for comm to point to. You apparently know this is required since you've done something similar for a and you free both a and comm at the end.
You need to malloc the memory for comm to point to before using it. For example, if you wanted its size to depend on the second value input (h, which is probable since it's not used anywhere else), you would need to add:
comm = malloc(h*sizeof(long));
after the first scanf, noting that I don't cast the return value - you shouldn't do that in C.
If you don't know how big comm should be before using it, the usual way to handle that is to allocate a certain number of elements (the capacity) and keep track of how many you've used (the size). Each time when your size is about to exceed your capacity, use realloc to get more space.
The following (C-like) pseudo-code shows how to do this, starting with an empty array and expanding it by thirty elements each time more space is needed:
comm = NULL
capacity = 0
size = 0
for each value in input():
    if size == capacity:
        capacity += 30
        comm = realloc (comm, capacity * sizeof (long)), exit if error
    comm[size++] = value
Note that, on loop exit, size is the indicator of how many elements are in the array, despite the fact there may be more capacity.
And, as an aside, you should always assume that calls subject to failure (such as scanf and malloc) will fail at some point. In other words, check the return values.
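The capacity/size scheme from the pseudo-code could be written in C roughly as follows. This is a sketch only: the `long_vec` struct and both function names are invented for illustration, and the growth step of 30 simply mirrors the pseudo-code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical growable array of long (names are illustrative only). */
struct long_vec {
    long  *data;      /* allocated storage, or NULL when empty */
    size_t size;      /* number of elements in use */
    size_t capacity;  /* number of elements allocated */
};

/* Appends one value, growing the storage by 30 elements when it is full.
 * Returns 0 on success, -1 if realloc fails (the old data stays valid). */
int long_vec_push(struct long_vec *v, long value)
{
    if (v->size == v->capacity) {
        size_t new_cap = v->capacity + 30;
        long *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->capacity = new_cap;
    }
    v->data[v->size++] = value;
    return 0;
}

/* Reads longs from `in` until a 0, end of input, or invalid input.
 * The terminating 0 is not stored. Returns 0 on success, -1 on failure. */
int read_commands(FILE *in, struct long_vec *v)
{
    long value;
    while (fscanf(in, "%ld", &value) == 1 && value != 0) {
        if (long_vec_push(v, value) != 0)
            return -1;
    }
    return 0;
}
```

A caller would start from `struct long_vec v = {0};`, call `read_commands(stdin, &v)`, loop over `v.data[0]` to `v.data[v.size - 1]`, and finally `free(v.data)`.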
I have a program where I repeat a succession of methods to reproduce time evolution. One of the things I have to do is write the same value to a long contiguous subset of elements of a very large array. Knowing which elements these are and which value I want, is there any way to do this other than looping and setting the values one by one?
EDIT: To be clear, I want to avoid this:
double arr[10000000];
int i;
for (i=0; i<100000; ++i)
arr[i] = 1;
with just one single call, if possible. Can you assign to part of an array the values from another array of the same size? Maybe I could keep a second array arr2[1000000] in memory with all elements set to 1 and then copy the memory of arr2 over the first 100,000 elements of arr?
I have a somewhat tongue-in-cheek and non-portable possibility for you to consider. If you tailored your buffer to a size that is a power of 2, you could seed the buffer with a single double, then use memcpy to copy successively larger chunks of the buffer until the buffer is full.
So first you copy the first 8 bytes over the next 8 bytes...(so now you have 2 doubles)
...then you copy the first 16 bytes over the next 16 bytes...(so now you have 4 doubles)
...then you copy the first 32 bytes over the next 32 bytes...(so now you have 8 doubles)
...and so on.
It's plain to see that we won't actually call memcpy all that many times, and if the implementation of memcpy is sufficiently faster than a simple loop we'll see a benefit.
Try building and running this and tell me how it performs on your machine. It's a very scrappy proof of concept...
#include <string.h>
#include <time.h>
#include <stdio.h>
void loop_buffer_init(double* buffer, int buflen, double val)
{
for (int i = 0; i < buflen; i++)
{
buffer[i] = val;
}
}
void memcpy_buffer_init(double* buffer, int buflen, double val)
{
buffer[0] = val;
int half_buf_size = buflen * sizeof(double) / 2;
for (int i = sizeof(double); i <= half_buf_size; i += i)
{
memcpy((unsigned char *)buffer + i, buffer, i);
}
}
void check_success(double* buffer, int buflen, double expected_val)
{
for (int i = 0; i < buflen; i++)
{
if (buffer[i] != expected_val)
{
printf("But your whacky loop failed horribly.\n");
break;
}
}
}
int main()
{
enum { TEST_REPS = 500, BUFFER_SIZE = 16777216 }; // enum constants, so the static array size is a constant expression in C
static double buffer[BUFFER_SIZE]; // 2**24 doubles, 128MB
time_t start_time;
time(&start_time);
printf("Normal loop starting...\n");
for (int reps = 0; reps < TEST_REPS; reps++)
{
loop_buffer_init(buffer, BUFFER_SIZE, 1.0);
}
time_t end_time;
time(&end_time);
printf("Normal loop finishing after %.f seconds\n",
difftime(end_time, start_time));
time(&start_time);
printf("Whacky loop starting...\n");
for (int reps = 0; reps < TEST_REPS; reps++)
{
memcpy_buffer_init(buffer, BUFFER_SIZE, 2.5);
}
time(&end_time);
printf("Whacky loop finishing after %.f seconds\n",
difftime(end_time, start_time));
check_success(buffer, BUFFER_SIZE, 2.5);
}
On my machine, the results were:
Normal loop starting...
Normal loop finishing after 21 seconds
Whacky loop starting...
Whacky loop finishing after 9 seconds
To work with a buffer that was less than a perfect power of 2 in size, just go as far as you can with the increasing powers of 2 and then fill out the remainder in one final memcpy.
(Edit: before anyone mentions it, of course this is pointless with a static double (might as well initialize it at compile time) but it'll work just as well with a nice fresh stretch of memory requested at runtime.)
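For a buffer whose length is not a power of 2, the doubling idea with a final partial copy might look like the sketch below. The function name `fill_doubles` is made up for this example, and it counts in elements rather than bytes, which is a small departure from the proof of concept above.

```c
#include <stddef.h>
#include <string.h>

/* Fills buffer[0..count) with val by repeatedly copying the part that is
 * already initialised onto the part that follows it. */
void fill_doubles(double *buffer, size_t count, double val)
{
    if (count == 0)
        return;

    buffer[0] = val;
    size_t filled = 1;                  /* elements initialised so far */

    while (filled < count) {
        size_t chunk = filled;          /* try to double the filled region */
        if (chunk > count - filled)
            chunk = count - filled;     /* final, partial copy */
        /* Source [0, chunk) and destination [filled, filled + chunk)
         * never overlap because chunk <= filled, so memcpy is safe. */
        memcpy(buffer + filled, buffer, chunk * sizeof *buffer);
        filled += chunk;
    }
}
```

Whether this beats a plain loop depends on the machine and the buffer size, so it is worth timing both on the target hardware.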
It looks like this solution is very sensitive to your cache size or other hardware optimizations. On my old (circa 2009) laptop the memcpy solution is as slow or slower than the simple loop, until the buffer size drops below 1MB. Below 1MB or so the memcpy solution returns to being twice as fast.
I have a program where I repeat a succession of methods to reproduce
time evolution. One of the things I have to do is write the same
value to a long contiguous subset of elements of a very large array.
Knowing which elements these are and which value I want, is there any
way to do this other than looping and setting the values one by one?
In principle, you can initialize an array however you like without using a loop. If that array has static duration then that initialization might in fact be extremely efficient, as the initial value is stored in the executable image in one way or another.
Otherwise, you have a few options:
if the array elements are of a character type then you can use memset(). Very likely this involves a loop internally, but you won't have one literally in your own code.
if the representation of the value you want to set has all bytes equal, as is the case for typical representations of 0 in any arithmetic type, then memset() is again a possibility.
as you suggested, if you have another array with suitable contents then you can copy some or all of it into the target array. For this you would use memcpy(), unless there is a chance that the source and destination could overlap, in which case you would want memmove().
more generally, you may be able to read in the data from some external source, such as a file (e.g. via fread()). Don't count on any I/O-based solution to be performant, however.
you can write an analog of memset() that is specific to the data type of the array. Such a function would likely need to use a loop of some form internally, but you could avoid such a loop in the caller.
you can write a macro that expands to the needed loop. This can be type-generic, so you don't need different versions for different data types. It uses a loop, but the loop would not appear literally in your source code at the point of use.
If you know in advance how many elements you want to set, then in principle, you could write that many assignment statements without looping. But I cannot imagine why you would want so badly to avoid looping that you would resort to this for a large number of elements.
All of those except the last actually do loop, however -- they just avoid cluttering your code with a loop construct at the point where you want to set the array elements. Some of them may also be clearer and more immediately understandable to human readers.
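Two of the options listed above, a type-specific analog of memset() and a type-generic macro, could be sketched like this. The names `fill_double` and `FILL_ARRAY` are invented for this example.

```c
#include <stddef.h>

/* Hypothetical analog of memset() for arrays of double:
 * sets dst[0..n) to value. */
void fill_double(double *dst, double value, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = value;
}

/* Hypothetical type-generic variant: expands to the loop at the point of
 * use, so it works for any element type that can be assigned from value.
 * Note that value is evaluated once per element. */
#define FILL_ARRAY(dst, value, n)                               \
    do {                                                        \
        for (size_t fill_i_ = 0; fill_i_ < (n); fill_i_++)      \
            (dst)[fill_i_] = (value);                           \
    } while (0)
```

With these, the loop from the question becomes a single call such as `fill_double(arr, 1.0, 100000);` or `FILL_ARRAY(arr, 1.0, 100000);`.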
I've written an OpenCL program in C in order to take advantage of my GPU for parallel processing, and I've run into an issue where the display driver crashes under certain calling conditions when running one of my kernels. I've created a new stripped-down program that demonstrates the same behavior.
Essentially I allocate a linear array on the GPU and then launch a kernel, in which each thread will increment each value in a single nonoverlapping 'row' of the array of fixed size, according to its global thread ID.
I have a for loop wrapping this task which causes it to be repeated a number of times - however, each repetition, I reset the pointer to memory to the same starting value, so the inner loop should be performing exactly the same task each iteration of the outer loop.
The odd behavior is that the program runs with no apparent errors (and the output looks correct) when run with between 1 and 958 repetitions of the outer loop. However, if this number is increased to anything above 958, the display driver crashes and is recovered. Oddly, this doesn't result in an error returned by clEnqueueNDRangeKernel() or the subsequent clFinish().
Here's the kernel in question:
__kernel void testKernel(__global unsigned int* arr)
{
    // OVERRIDE ARGS
    unsigned int numReps = 958;
    unsigned int numRows = 1000;
    unsigned int rowLength = 676;
    // Make sure thread index is in-bounds
    if( get_global_id(0) < numRows )
    {
        __global unsigned int* arrPtr;
        __global unsigned int* arrInitPtr = arr + (get_global_id(0) * rowLength);
        unsigned int i, j;
        unsigned int tmp;
        for( i = 0; i < numReps; ++i )
        {
            // Reset the array pointer to the first element in this thread's row
            arrPtr = arrInitPtr;
            for( j = 0; j < rowLength; ++j )
            {
                // Increment value in the row
                tmp = *arrPtr;
                *arrPtr = tmp + 1;
                // Advance pointer to the next value
                ++arrPtr;
            }
        }
    }
}
I've hard-coded the number of rows and row length to avoid any possible mistakes in parameter-passing and simplify things further.
I allocate the buffer (passed in to the kernel as arr) and enqueue the kernel as follows:
size_t numThreads = 1000;
unsigned int rowLength = 676;
size_t arrLength = rowLength * numThreads;
cl_mem arr_d = clCreateBuffer(gpuContext, CL_MEM_READ_WRITE, arrLength * sizeof(unsigned int), NULL, &clErr);
if( clErr != CL_SUCCESS )
{
printf("Error: Failed to allocate buffer on device.\n");
exit(2);
}
clSetKernelArg(testKernel, 0, sizeof(cl_mem), &arr_d);
clErr = clEnqueueNDRangeKernel(gpuCmdQueue, testKernel, 1, NULL, &numThreads, &numThreads, 0, NULL, NULL);
My first instinct is of course that arrPtr is being incremented beyond the boundaries of the array - however, I don't think this should be happening based on the for loop conditional and the fact that when I examine memory after copying the array back to the host, no values outside of the array appear to have been modified. For clarity, in my original program I initialize every value in the array to zero beforehand, but I left that out of this example program since it doesn't seem relevant to my problem.
I am positive that the memory access to arrPtr is out-of-bounds somehow - I don't see any other way for this to be crashing. However, my array is large enough, and I check the global thread ID before making any accesses, so even if my thread pool size were too large, that shouldn't be a problem.
I assume that the specific boundaries of the failure (958 - 959) are fairly arbitrary since they don't directly correspond to any of my parameters. The added repetitions must be exposing an underlying indexing problem. However, it's odd in that case that it's so repeatable with those values. I've also tried reducing one from various parameters in order to look for off-by-one errors, to no avail.
For reference, I'm using nVidia's 64-bit implementation of OpenCL (CUDA 6.0 drivers) with a GeForce 770 under Windows 7 64-bit.
Thanks for any responses! I've tried to be specific but didn't want this to become too long - if you have any questions or want to see my full OpenCL setup code, please just let me know.
I know it's old, but whatever... From the comment:
Windows has a watchdog timer mechanism that restarts the display driver if it appears to become unresponsive. I find that if my kernel runs for more than a few seconds, the timer will trip and restart the display driver. The only solution I know of is to break up the kernel execution into segments of one or two seconds each and run them sequentially.
(I ran into much the same error, so this still seems to be true.)
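Splitting the work into several shorter launches could be organised along these lines. This is a host-side sketch in plain C, not OpenCL-specific code: `launch_fn` is a hypothetical callback which, in a real program, would pass the repetition count to the kernel (for example with clSetKernelArg, assuming numReps became a kernel argument instead of a hard-coded value), enqueue it with clEnqueueNDRangeKernel, and wait with clFinish. `count_reps` is only a stand-in used to show the calling convention.

```c
#include <stddef.h>

/* Hypothetical callback: performs one launch that does `reps` repetitions
 * and blocks until it has finished. Returns 0 on success. */
typedef int (*launch_fn)(unsigned int reps, void *user_data);

/* Splits total_reps into launches of at most max_reps repetitions each.
 * Returns the number of launches made, or -1 on bad arguments or if a
 * launch fails. */
int run_in_segments(unsigned int total_reps, unsigned int max_reps,
                    launch_fn launch, void *user_data)
{
    if (max_reps == 0 || launch == NULL)
        return -1;

    int launches = 0;
    while (total_reps > 0) {
        unsigned int reps = total_reps < max_reps ? total_reps : max_reps;
        if (launch(reps, user_data) != 0)
            return -1;
        total_reps -= reps;
        launches++;
    }
    return launches;
}

/* Stand-in callback for illustration: just adds up the repetitions. */
int count_reps(unsigned int reps, void *user_data)
{
    *(unsigned int *)user_data += reps;
    return 0;
}
```

Since the kernel in the question only increments values in place, running the repetitions across several launches should give the same final array as one long launch, while each launch stays short enough for the watchdog.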
I have a function as follow:
int doSomething(long numLoop,long arraySize){
    int * buffer;
    buffer = (int*) malloc (arraySize * sizeof(int));
    long k;
    int i;
    for (i=0;i<arraySize;i++)
        buffer[i]=2;//write to make sure memory is allocated
    //start reading from cache
    for(k=0;k<numLoop;k++){
        int i;
        int temp;
        for (i=0;i<arraySize;i++)
            temp = buffer[i];
    }
}
What it does is allocate an array and read it from beginning to end. The purpose is to see the effect of the cache.
What I expect to see is this: when I call doSomething(10000,1000), arraySize is small, so the whole array fits in the cache. When I then call doSomething(100,100000), the array is bigger than the cache, so the whole array cannot be held in it and some reads have to go to main memory. As a result, the 2nd call should take longer than the 1st.
However, the 2nd call seems to take approximately the same time as the 1st. So what's wrong here? I tried compiling with -O0 and it doesn't solve the problem.
Thank you.
Update 1: here is the code with random access, and it seems to work: the run with the large array takes ~15s, while the small array takes ~3s.
int doSomething(long numLoop,int a, long arraySize){
    int * buffer;
    buffer = (int*) malloc (arraySize * sizeof(int));
    long k;
    int i;
    for (i=0;i<arraySize;i++)
        buffer[i]=2;//write to make sure memory is allocated
    //start reading from cache
    for(k=0;k<numLoop;k++){
        int temp;
        for (i=0;i<arraySize;i++){
            long randnum = rand();//max is 32767
            randnum = (randnum <<16) | rand();
            if (randnum < 0) randnum = -randnum;
            randnum%=arraySize;
            temp = buffer[randnum];
        }
    }
}
You are accessing the array in sequence,
for (i=0;i<arraySize;i++)
temp = buffer[i];
so the part you are accessing will almost always be in the cache, since that pattern is trivial to predict. To see a cache effect, you must access the array in a less predictable order, for example by generating (pseudo)random indices, so that you jump between the front and the back of the array.
In addition to the other answers: your code accesses the memory sequentially. Let's assume that a cache line is 32 bytes. Assuming 4-byte ints, that means you probably get a cache miss only on every 8th access. So, when picking a random index, you should make sure it is at least 32 bytes away from the previous one.
In order to measure the effect across multiple calls, you must use the same buffer (with the expectation that the first time through you are loading the cache, and the next time you are using it). In your case, you are allocating a new buffer for every call. (Additionally, you are never freeing your allocation.)
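One way to act on these points is to allocate a single buffer once and read it with different strides, so that only the access pattern changes between measurements. The sketch below is an illustration under assumptions: the name `sum_strided` is made up, and the stride that matters depends on the real cache-line size of the machine.

```c
#include <stddef.h>

/* Reads every element of buf exactly once, but in strided order:
 * indices 0, stride, 2*stride, ..., then 1, 1+stride, and so on.
 * Returning the sum gives the caller a result to use, so the compiler
 * cannot simply discard the reads. A stride of 0 is treated as 1. */
long sum_strided(const int *buf, size_t n, size_t stride)
{
    long sum = 0;
    if (stride == 0)
        stride = 1;
    for (size_t start = 0; start < stride && start < n; start++) {
        for (size_t i = start; i < n; i += stride)
            sum += buf[i];
    }
    return sum;
}
```

Timing `sum_strided(buffer, n, 1)` against, say, `sum_strided(buffer, n, 16)` (16 four-byte ints would span a 64-byte line, if that is the line size) on the same buffer, for a small and a large `n`, compares the patterns fairly; printing the returned sums keeps the work from being optimised away.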