I'm working on my assignment (more details in another question). If I use
arr[(i * 16) % arrLen] *= 2; // seg fault
vs
arr[i % arrLen] *= 2; // OK!
Why? (See line 31 of the full source.) Since I take the modulus with the array length, shouldn't the index always be in range?
i * 16 can overflow into the negative range of a signed int. Taking the modulus of a negative value can give a negative remainder, which makes your array subscript negative, so you access memory outside the array's allocation and sometimes crash.
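A minimal sketch of one way to avoid this (arr, arrLen and reps stand in for the variables in the question): do the index arithmetic in an unsigned type such as size_t, which can never be negative, and which wraps instead of invoking undefined behaviour even if it did overflow.

#include <stddef.h>

/* Sketch: double entries of arr with the index computed in size_t.
   arr, arrLen and reps correspond to the question's variables. */
void double_entries(int *arr, size_t arrLen, long reps)
{
    for (long i = 0; i < reps; i++) {
        size_t idx = ((size_t)i * 16) % arrLen; /* never negative, always < arrLen */
        arr[idx] *= 2;
    }
}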
Looking at your full source:
You should be checking the return of malloc to make sure you were able to get that memory
You should be freeing the memory, you have a leak
The memory inside your arr array is uninitialized; you allocate it but never set it to anything, so you're (most likely) reading garbage values, often large or negative numbers. You can zero it with memset(arr, 0, arrLen * sizeof(int));
You malloc(arrLen * sizeof(int)), but arrLen was already computed with a division by sizeof(int), so the two cancel each other out...
Regarding your seg fault: as others have stated, you're overflowing an int. You've created an array of ints, and you're looping i from 0 up to reps (268,435,456). When you multiply i by 16 you exceed INT_MAX, the value overflows, and you end up with a negative offset.
To see this, try multiplying the 16 into the initialization of reps:
int reps = 256 * 1024 * 1024 * 16;
Your compiler should throw a warning letting you know this exact thing:
warning: integer overflow in expression [-Woverflow]
Assuming the size of an int on your system is 32 bits, chances are you're causing an overflow and the result of i * 16 is becoming negative. In a two's complement system, the bit patterns in the upper half of the unsigned range (those with the high bit set) represent negative values.
int reps = 256 * 1024 * 1024;
So reps = 268,435,456, which is the value you're looping up until. The greatest value of i is therefore 268,435,455 and 268,435,455 * 16 = 4,294,967,280.
The largest positive value a 32-bit int can represent is 2,147,483,647 (an unsigned int could hold up to 4,294,967,295, so you haven't wrapped all the way around), which means that result is interpreted as a negative value: 4,294,967,280 is 2^32 - 16, so as a signed int it reads as -16.
Accessing a negative offset from arr is outside the bounds of your allocated memory, which is undefined behaviour and, here, fortunately a seg fault.
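If you want to see that wraparound concretely, here is a tiny demonstration; the unsigned-to-signed conversion is implementation-defined, but on a typical two's-complement machine it prints -16, which is exactly the offset the original code ends up using.

#include <stdio.h>

int main(void)
{
    unsigned int u = 268435455u * 16u;  /* 4,294,967,280 = 2^32 - 16, fine as unsigned */
    int off = (int)u;                   /* conversion is implementation-defined;       */
    printf("%d\n", off);                /* typically prints -16                        */
    return 0;
}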
#include <stdio.h>
#include <stdlib.h>
int factorial(int i) {
    if(i == 1) {
        return 1;
    }
    else {
        return i*factorial(i - 1);
    }
}

int combination(int l, int m) {
    return factorial(l)/(factorial(l-m)*factorial(m));
}

int main() {
    int n, r;
    printf("Input taken in form of nCr\n");
    printf("Enter n: ");
    scanf("%d", &n);
    printf("Enter r: ");
    scanf("%d", &r);
    int y = combination(n, r);
    printf("Result: %d", y);
    return 0;
}
I tried to make a simple program for calculating the combination function (nCr) from maths. It works for small values, up to n = 12, but gives wrong values from n = 13 onwards.
Also for n = 15 and r = 2, it returns the result -4.
And it gives the error
segmentation fault (core dumped)
for n = 40 and r = 20.
I would like to know how to solve this problem and why exactly this is happening.
The value of 13! is 6,227,020,800, which is too large to fit in a 32-bit integer. Attempting to calculate this factorial or any larger one overflows a 32-bit int, and signed integer overflow invokes undefined behavior.
In some cases this undefined behavior manifests as a wrong value, in others as a crash. In the crashing cases, the factorial function is most likely being passed a value less than 1, so the recursive calls try to count all the way down toward INT_MIN but fill up the stack before they get there.
Even changing to long long isn't enough to fix this, as the intermediate results will overflow that too. So how do you fix it? If you were calculating these values by hand, you wouldn't multiply out all of the numbers and then divide two huge products. You'd write out the factors and cancel terms from the top and bottom. For example, suppose you wanted to calculate 12C7. You would write it out like this:
12 * 11 * 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1
------------------------------------------------
( 5 * 4 * 3 * 2 * 1 ) * (7 * 6 * 5 * 4 * 3 * 2 * 1)
Then cancel out 7! from the top and bottom:
12 * 11 * 10 * 9 * 8
---------------------
5 * 4 * 3 * 2
Then cancel out other terms:
12 * 11 * 10 * 9 * 8     12 * 11 * 2 * 9 * 8     12 * 11 * 2 * 9
--------------------  =  -------------------  =  ---------------  =  4 * 11 * 2 * 9
    5 * 4 * 3 * 2             4 * 3 * 2                 3
Then multiply what's left:
4 * 11 * 2 * 9 = 792
Now do this in code. :) Be sure to change all of your data types to long long, as the result of 40C20 is still larger than what a 32-bit int can hold; long long is guaranteed to be at least 64 bits.
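One way to turn the cancellation idea into code, as a sketch rather than the only possible implementation: multiply in one numerator factor and divide by one denominator factor per step, so the intermediate values stay far smaller than full factorials.

#include <stdio.h>

/* Sketch of nCr with incremental multiply/divide; the running product at
   step i is a product of i consecutive integers, so dividing by i is exact. */
long long combination(int n, int r)
{
    if (r < 0 || r > n)
        return 0;
    if (r > n - r)                 /* nCr == nC(n-r); use the smaller one */
        r = n - r;

    long long result = 1;
    for (int i = 1; i <= r; i++) {
        result *= n - r + i;       /* one factor from the numerator   */
        result /= i;               /* one factor from the denominator */
    }
    return result;
}

int main(void)
{
    printf("%lld\n", combination(40, 20));  /* 137846528820 */
    return 0;
}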
This is an overflow problem. Your result is above the maximum int value.
13! = 6227020800
Which is more than INT_MAX (2147483647). If you want to handle larger numbers you should either use other variable types (for example unsigned long long), or detect the overflow in your function to avoid crashes.
There are existing discussions about overflow checking in C that may be of interest.
Also for n = 15 and r = 2, it returns the result -4
When a signed variable overflows, its value wraps around (in practice) to the other end of its range, so repeated overflows can swing the result between positive and negative. That is most likely why you are getting negative values such as -4.
I guess there are two effects interacting:
Your integers overflow; that is, the value of factorial(i) becomes negative for sufficiently large i, which leads to
your recursion (factorial calling itself) consuming all of your stack space.
Try changing the condition in factorial from if(i == 1) so that arguments below 1 also stop the recursion:
int factorial(int i) {
    if(1 == i) {
        return 1;
    } else if(1 > i) {
        return -1;
    }
    return i * factorial(i - 1);
}
This should get rid of the segfault.
For the integer overflow, the only real fix is not to rely on plain C integer arithmetic but to use some bignum library (or write that code on your own).
Some explanation for what is probably going on:
As @WhozCraig pointed out, integers can only hold values up to INT_MAX. However, factorial(i) explodes even for relatively small i.
C does not trap this as an error; your integers silently overflow and wrap around to negative numbers.
This means that at some point you start feeding factorial negative numbers.
However, for each function call, some data has to be pushed onto the stack (usually the return address and local variables, possibly including the function arguments).
This memory will be released only after the function returns.
This means that if you call factorial(40), and the integer arithmetic behaved, you would use roughly 40 times the stack memory of a single call to factorial.
Since your factorial does not handle numbers smaller than 1 correctly, it ends up calling itself endlessly, overflowing from time to time, until the condition i == 1 happens to be hit.
In most cases, that does not happen before your stack is exhausted.
When I run your program in a debugger with n = 40 and r = 20, as a 32-bit binary compiled with Microsoft Visual Studio, I don't get a segmentation fault; instead I get a division-by-zero error in the following line:
return factorial(l)/(factorial(l-m)*factorial(m));
factorial(l-m) and factorial(m) both evaluate to factorial(20). Keeping only the low 32 bits of every intermediate product, that works out to 2,192,834,560.
Assuming that sizeof(int) == 4 (32 bits), this number cannot be represented by a signed int. The int therefore overflows, which, according to the C standard, causes undefined behavior.
However, even if the behavior is undefined, I can reasonably speculate that the following happens:
Due to the overflow, the number 2,192,834,560 becomes -2,102,132,736, because that is what the same bit pattern means in two's-complement representation.
Since this number is multiplied by itself in your code (with n = 40 and r = 20), the result of the multiplication is 4,418,962,039,762,845,696. This certainly does not fit into a signed int either, so another overflow occurs.
The hexadecimal representation of this number is 0x3D534E9000000000.
Since this large number does not fit into a 32-bit integer, the excess bits are stripped off, which is equivalent to taking the result modulo UINT_MAX + 1 (4,294,967,296). The result of that modulo operation is 0.
Therefore, the expression
factorial(l-m)*factorial(m)
evaluates to 0.
This means that the line
return factorial(l)/(factorial(l-m)*factorial(m));
will cause a division by zero exception.
One way of handling large numbers is to use floating-point numbers instead of integers. They can represent very large values without overflowing, but you may lose precision. If you use double instead of float, the precision loss is smaller.
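If you go the floating-point route, a minimal sketch (same structure as the program in the question, but with double and a base case that also stops for arguments below 1) might look like this:

#include <stdio.h>

/* Sketch: double trades exactness for range. */
double factorial_d(int i)
{
    return (i <= 1) ? 1.0 : i * factorial_d(i - 1);
}

double combination_d(int n, int r)
{
    return factorial_d(n) / (factorial_d(n - r) * factorial_d(r));
}

int main(void)
{
    printf("%.0f\n", combination_d(40, 20));  /* prints 137846528820 here, but
                                                 precision is not guaranteed for
                                                 larger inputs */
    return 0;
}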
I'm trying to check arithmetic operations before they are executed, so that the result stays within a 32-bit integer, specifically for addition.
This function finds the position of the highest set bit:
size_t highestOneBitPosition(uint32_t a) {
    size_t bits=0;
    while (a!=0) {
        ++bits;
        a>>=1;
    }
    return bits;
}
This function effectively limits multiplication:
bool multiplication_is_safe(uint32_t a, uint32_t b) {
    size_t a_bits=highestOneBitPosition(a), b_bits=highestOneBitPosition(b);
    return (a_bits+b_bits<=32);
}
However, I'm unsure how to do this with addition. Something like this:
bool addition_is_safe(uint32_t a, uint32_t b) {
    size_t a_bits=highestOneBitPosition(a), b_bits=highestOneBitPosition(b);
    return (a_bits<32 && b_bits<32);
}
However, this does not limit the result to 32 bits (or 0x7FFFFFFF for signed); it only makes sure that each operand on its own fits in that many bit positions.
Mathematically, if you add two numbers, you get at most a carry of 1 into the place beyond the longest one. So if you add a 4-digit number to a 3-digit number (or anything with 4 digits or fewer), you get at most a 5-digit number. Except when the two have the same length you can end up with more (99 * 99 = 9801), so then it would be the same concept as in multiplication (a_bits + b_bits <= 32).
What I would have to do is determine the operand with more bits, add 1, and make sure that does not exceed 32 bit positions. I'm unsure how to express this as a function. My question is: how can I modify addition_is_safe(uint32_t a, uint32_t b) to limit the result to <= 32 bits, the way multiplication_is_safe does? I definitely want to use highestOneBitPosition for this.
First of all, I don't even think that this function is correct:
bool multiplication_is_safe(uint32_t a, uint32_t b) {
    size_t a_bits=highestOneBitPosition(a), b_bits=highestOneBitPosition(b);
    return (a_bits+b_bits<=32);
}
It does return false when the multiply would overflow, but it also returns false in some cases where the multiply would not overflow. For example, given a = 0x10000 and b = 0x8000, this function returns false even though the result of a*b is 0x80000000, which fits in 32 bits. But if you change a and b to 0x1ffff and 0xffff (which have the same highest one-bit positions), then the multiply really does overflow. You can't tell the two cases apart by looking only at the highest bit positions; you would need to do part or all of the actual multiplication to figure out the right answer.
Similarly, you could construct a function addition_is_safe that detects "possible overflows" (both in the positive and negative direction) using bit positions. But you can't detect "actual overflow" unless you do part or all of the actual addition.
I believe that in the worst case, you will be forced to do the full multiplication or addition, so I'm not sure you will be saving anything by not letting the machine just do the full multiplication/addition for you.
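For completeness, here is a sketch of what an exact check looks like when you do "part of the actual multiplication" as a single division instead of comparing bit positions (the function name is mine, not from the question):

#include <stdint.h>
#include <stdbool.h>

/* Exact check: for b != 0, a * b fits in 32 bits iff a <= floor(UINT32_MAX / b). */
bool multiplication_is_safe_exact(uint32_t a, uint32_t b)
{
    if (a == 0 || b == 0)
        return true;
    return a <= UINT32_MAX / b;
}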
Mathematically, if you add two numbers, you have at most a carry of 1
into the place beyond the longest.
That's absolutely correct (for unsigned binary numbers), without exception; you just got lost in your further reasoning. So the addition_is_safe condition based on the summands' bit counts is: the larger of the two summands' bit counts has to be smaller than the available number of bits.
bool addition_is_safe(uint32_t a, uint32_t b)
{
    size_t a_bits=highestOneBitPosition(a), b_bits=highestOneBitPosition(b);
    return (a_bits<b_bits?b_bits:a_bits)<32;
}
Surely you are aware that a false return from that function doesn't necessarily mean overflow would occur, but a true return does guarantee that overflow cannot occur.
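A couple of example calls, assuming the highestOneBitPosition and addition_is_safe definitions above, to illustrate that last point:

#include <stdio.h>

int main(void)
{
    /* 0x7FFFFFFF uses 31 bits, so the sum fits in 32 bits: reported safe. */
    printf("%d\n", addition_is_safe(0x7FFFFFFFu, 1u));  /* 1 (true)  */
    /* 0x80000000 already uses 32 bits: reported unsafe, even though
       0x80000000 + 1 would in fact still fit in 32 bits. */
    printf("%d\n", addition_is_safe(0x80000000u, 1u));  /* 0 (false) */
    return 0;
}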
You can check for overflow when adding two unsigned integers with (a + b) < a || (a + b) < b: if the addition wraps, the result is a smaller value, namely the mathematical sum reduced mod 2^32.
For signed integers, overflow typically makes the value negative, but signed overflow is undefined behaviour, so the check should be done before adding.
A positive added to a negative can never overflow.
Two negatives are similar to two positives, except that the limit is INT_MIN instead of INT_MAX.
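As a sketch of those exact checks (the function names are mine): the unsigned version can look at the wrapped sum, which is well defined, while the signed version has to test before adding because signed overflow is undefined.

#include <stdint.h>
#include <stdbool.h>

/* True if a + b does not fit in 32 unsigned bits. */
bool uadd_overflows(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b) < a;      /* wrapped result is smaller than an operand */
}

/* True if a + b would overflow a 32-bit signed int; tests before adding. */
bool sadd_overflows(int32_t a, int32_t b)
{
    if (b > 0) return a > INT32_MAX - b;
    if (b < 0) return a < INT32_MIN - b;
    return false;
}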
In this code:
#define mod 1000000007
int n;
int num = ((1<<n)%mod)+2;
I have to left-shift 1 by some value n and then take the result modulo mod so that it stays within the range of an int. But 1<<n does not give the correct value for larger n, such as 1000 or 10000. How can I do this?
The maximum you can left-shift 1 by is CHAR_BIT * sizeof(int) - 2. Any larger amount causes undefined behaviour.
If you want to work with numbers like 2^10000 you are going to have to use a big-integer library (or write your own); there is no built-in data type that can hold that sort of number exactly.
Another option is to use a smarter algorithm for modular exponentiation.
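A sketch of that smarter approach, exponentiation by squaring, assuming the goal is ((2^n) % mod) + 2 as in the question:

#include <stdio.h>

#define mod 1000000007

/* Computes (base^exp) % mod by repeated squaring, so n = 1000 or 10000
   never produces an intermediate larger than (mod - 1)^2, which fits in long long. */
long long pow_mod(long long base, long long exp)
{
    long long result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

int main(void)
{
    int n = 10000;
    long long num = pow_mod(2, n) + 2;   /* ((2^n) % mod) + 2, as in the question */
    printf("%lld\n", num);
    return 0;
}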
I wrote code for multiplying two vectors of length N element by element and returning the product vector of the same length, in CUDA 5.0.
I vary the value of N just to see how the GPU fares compared to the CPU. I can go up to 2000000000 elements. However, when I go to 3000000000 I get these warnings:
vecmul.cu(52): warning: floating-point value does not fit in required integral type
vecmul.cu(52): warning: floating-point value does not fit in required integral type
vecmul.cu: In function `_Z6vecmulPiS_S_':
vecmul.cu:15: warning: comparison is always false due to limited range of data type
vecmul.cu: In function `int main()':
vecmul.cu:40: warning: comparison is always true due to limited range of data type
And here is my code
// Multiplying 2 Arrays element by element
#include <stdio.h>
#include <fstream>
#define N (3000000000)
//const int threadsPerBlock = 256;
// Declare multiply kernel for Device
__global__ void vecmul(int *a, int *b, int *c)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid >= N) { return; } // (LINE 15)
    c[tid] = a[tid] * b[tid];
}
int main(void)
{
    // Allocate Memory on Host
    int *a_h = new int[N];
    int *b_h = new int[N];
    int *c_h = new int[N];
    // Allocate Memory on GPU
    int *a_d;
    int *b_d;
    int *c_d;
    cudaMalloc((void**)&a_d, N*sizeof(int));
    cudaMalloc((void**)&b_d, N*sizeof(int));
    cudaMalloc((void**)&c_d, N*sizeof(int));
    // Initialize Host Arrays
    for (int i=0;i<N;i++) // (LINE 40)
    {
        a_h[i] = i;
        b_h[i] = (i+1);
    }
    // Copy Data from Host to Device
    cudaMemcpy(a_d, a_h, N*sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, b_h, N*sizeof(int), cudaMemcpyHostToDevice);
    // Run Kernel
    int blocks = int(N - 0.5)/256 + 1; // (LINE 52)
    vecmul<<<blocks,256>>>(a_d, b_d, c_d);
    // Copy Data from Device to Host
    cudaMemcpy(c_h, c_d, N*sizeof(int), cudaMemcpyDeviceToHost);
    // Free Device Memory
    cudaFree(a_d);
    cudaFree(b_d);
    cudaFree(c_d);
    // Free Memory from Host
    free(a_h);
    free(b_h);
    free(c_h);
    return 0;
}
Is this because the number of blocks is not sufficient for this array size?
Any suggestions would be welcome since I am a beginner in CUDA.
I am running this on a NVIDIA Quadro 2000.
The errors are caused by overflowing a 32-bit signed int. The maximum 32-bit signed int is 2,147,483,647, so 3,000,000,000 does not fit in an int; wherever N gets squeezed into an int it wraps (typically to a negative value), and an int counter can never reach it, which is why your comparisons are always true or always false, exactly as the warnings say.
The other problem is around
int blocks = int(N - 0.5)/256 + 1; // (LINE 52)
trying to turn N into a floating-point value and then back into an int. The value is too big for the conversion to int; again, because N does not fit in a 32-bit int.
I think that if you remove the int(), it will work, since once you divide by 256 the value is small enough; but you're forcing the conversion to int before the division, while the value is still too big, and that causes the warning. It's not the assignment into blocks that's the problem, it's the explicit conversion to int.
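A sketch of the usual integer-only way to compute that "round up" block count, with a 64-bit count standing in for N so nothing overflows and no float conversion is needed:

#include <stdio.h>

int main(void)
{
    long long n = 3000000000LL;           /* the N from the question */
    int threadsPerBlock = 256;
    long long blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  /* ceiling division */
    printf("%lld blocks\n", blocks);      /* 11718750 */
    return 0;
}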
Edit: I'm wondering whether, now that we've fixed some of the problems with N and the float-vs-int computation, you're running into issues caused by the overflow itself. For example:
for (int i=0;i<N;i++) // (LINE 40)
{
    a_h[i] = i;
    b_h[i] = (i+1);
}
When N is over 2^31-1, this comparison will always be true, at least until i itself overflows. That should make the loop effectively infinite, or perhaps it does 2^31-1 iterations and then hits undefined behaviour. The compiler says the comparison will ALWAYS be true, and if that's the case the loop never ends.
Also, I don't know what a size_t is in CUDA, but
cudaMemcpy(c_h,c_d,N*sizeof(int),cudaMemcpyDeviceToHost);
doing N*sizeof(int) goes way past 2^31, and even past 2^32, when N is 3 billion.
At some point you need to ask yourself why you are trying to allocate this much space and if there is a better approach.
Hi, I am new to C programming. Can anyone please tell me what this line of code does:
i = (sizeof (X) / sizeof (int))
The code works together with a case statement that takes a value of bdata and compares it against different cases.
Generally, such a statement is used to calculate the number of elements in an array.
Let's consider an integer array as below:
int a[4];
Now, when sizeof(a) is evaluated it will return 4 * 4 = 16 as the size: 4 elements, each 4 bytes (assuming a 4-byte int).
So when you do sizeof(a) / sizeof(int), you get 16 / 4 = 4, which is the length (number of elements) of the array.
It computes the number of elements of the array of int named X.
It returns the length of the array X.
It computes the amount of memory X occupies divided by the size of an int on your machine (commonly 4 bytes, 2 on some older systems). Note that both sizeof values have type size_t, so the division is always an integer division; the type of i only affects how the result is converted when it is stored.
The size of an int can differ between implementations, and the size of X depends on what X is, so the value is implementation-dependent.
All of this means: it computes how many ints fit into X.
Besides common practice or personal experience, there is no reason to assume that i = (sizeof (X) / sizeof (int)) computes the number of elements of an array X. That is most often the case, but in theory X could be of any type, in which case the expression computes the ratio of the size of your variable X to the size of an int (how much more memory, in bytes, X occupies compared to an int).
Moreover, if X were a pointer (float *X, say, pointing at the first element of an array rather than being an array itself), this expression would evaluate to 1 on a 32-bit architecture: the pointer is 4 bytes and the int is also 4 bytes, so i = sizeof(X) / sizeof(int) = 1.
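A small demonstration of both points; the exact numbers assume a 4-byte int and typical 4-byte (32-bit) or 8-byte (64-bit) pointers:

#include <stdio.h>

int main(void)
{
    int a[4];
    int *p = a;

    /* The idiom only counts elements while the operand is the array itself. */
    printf("%zu\n", sizeof(a) / sizeof(int)); /* 4: number of elements */
    printf("%zu\n", sizeof(p) / sizeof(int)); /* 1 on 32-bit, typically 2 on 64-bit:
                                                 just pointer size / int size */
    return 0;
}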