I used malloc to create an array in C, but I get a segmentation fault when I try to assign random values to the array in 2 nested loops.
There is no segmentation fault when I assign values to this array in 1 loop. The array size is large. Please see the code I attached.
Can anyone give me a hint about what is going on here? I am pretty new to C. Thanks a lot in advance.
int n=50000;
float *x = malloc(n*n*sizeof(float));
// there is segmentation fault:
int i, j;
for (i=0; i<n; i++){
for (j=0; j<n; j++){
x[i*n+j] = random() / (float)RAND_MAX;
}
}
// there is no segmentation fault:
int ii;
for (ii=0; ii<n*n; ii++){
x[ii] = random() / (float)RAND_MAX;
}
int overflow.
50000 * 50000 --> 2,500,000,000 --> more than INT_MAX --> undefined behavior (UB).
First, let us make certain that a calculation of the size of this allocation is even possible:
assert(SIZE_MAX/n/n/sizeof(float) >= 1);
Then, having verified that size_t is wide enough, use size_t math for the multiplication and for the array-index calculation. Rather than int*int*size_t, do size_t*int*int.
// float *x = malloc(n*n*sizeof(float));
// Uses at least `size_t` math by leading the multiplication with that type.
float *x = malloc(sizeof(float) * n*n);
// or better
float *x = malloc(sizeof *x * n*n);
for (i=0; i<n; i++){
for (j=0; j<n; j++){
x[(size_t)n*i + j] = random() / (float)RAND_MAX;
}
}
The 2nd loop did not "fail" because n*n is not the large value expected, but likely the same wrapped (UB) value used in the allocation, so the indexes stayed inside the too-small buffer.
First off, you're invoking undefined behavior due to signed integer overflow. Assuming an int is 32-bit, the value of 50000*50000 is outside the range of an int, causing the overflow.
You can fix this by putting sizeof(float) first in the expression. The result of sizeof is a size_t which is unsigned and at least as large as an int. Then when each n is multiplied, it is first converted to size_t thus avoiding overflow.
float *x = malloc(sizeof(float)*n*n);
However, even if you fix this you're asking for too much memory.
Assuming sizeof(float) is 4 bytes, n*n*sizeof(float) is about 10GB of memory. If you check the return value of malloc, you'll probably see that it returns NULL.
You'll need to make your array much smaller. Try n=1000 instead, which will only use about 4MB.
I believe the issue is related to integer overflow:
50,000 * 50,000 = 2.5 Billion
2^31 ~ 2.1 Billion
Thus, you are invoking undefined behavior when calculating the array index. As to why it works for one but not the other, that's just the way it is. Undefined behavior means the compiler (and computer) can do whatever it wants including doing what you expect and crashing the program.
To fix, change the types of i, j, n, and ii to long long from int. That should solve the overflow issue and the segmentation fault.
Edit:
You should also check that malloc returns a valid pointer before you perform operations on the pointer. If malloc fails, you will receive a null pointer.
My project is to scan an address space (which in my case is 0x00000000 - 0xffffffff, or 0 to 2^32 - 1) for a pattern and return in an array the locations in memory where the pattern was found (it could be found multiple times).
Since the address space is 32 bits, i is a double and max is pow(2,32) (also a double).
I want to keep the original value of i intact so that I can use that to report the location of where the pattern was found (since actually finding the pattern requires moving forward several bytes past i), so I want temp, declared as char *, to copy the value of i. Then, later in my program, I will dereference temp.
double i, max = pow(2, 32);
char *temp;
for (i = 0; i < max; i++)
{
temp = (char *) i;
//some code involving *temp
}
The issue I'm running into is that a double can't be cast to a char *. An int can be; however, since the address space is 32 bits (not 16) I need a double, which is exactly large enough to represent 2^32.
Is there anything I can do about this?
In C, double and float are not represented the way you think they are; this code demonstrates that:
#include <stdio.h>
typedef union _DI
{
double d;
int i;
} DI;
int main()
{
DI di;
di.d = 3.00;
printf("%d\n", di.i);
return 0;
}
You will not see an output of 3 in this case.
In general, even if you could read another process's memory, your strategy is not going to work on any modern operating system because of virtual memory (the address space that one process "sees" doesn't necessarily (in fact, usually doesn't) represent the physical memory on the system).
Never use a floating point variable to store an integer. Floating point variables make approximate computations. It would happen to work in this case, because the integers are small enough, but to know that, you need intimate knowledge of how floating point works on a particular machine/compiler and what range of integers you'll be using. Plus it's harder to write the program, and the program would be slower.
C defines an integer type that's large enough to store a pointer: uintptr_t. You can cast a pointer to uintptr_t and back. On a 32-bit machine, uintptr_t will be a 32-bit type, so it's only able to store values up to 2^32 - 1. To express a loop that covers the whole range of the type including the first and last value, you can't use an ordinary for loop with a variable that's incremented, because the ending condition requires a value of the loop index that's out of range. If you naively write
uintptr_t i;
for (i = 0; i <= UINTPTR_MAX; i++) {
unsigned char *temp = (unsigned char *)i;
// ...
}
then you get an infinite loop, because after the iteration with i equal to UINTPTR_MAX, running i++ wraps the value of i to 0. The fact that the loop is infinite can also be seen in a simpler logical way: the condition i <= UINTPTR_MAX is always true since all values of the type are less or equal to the maximum.
You can fix this by putting the test near the end of the loop, before incrementing the variable.
i = 0;
do {
unsigned char *temp = (unsigned char *)i;
// ...
if (i == UINTPTR_MAX) break;
i++;
} while (1);
Note that exploring 4GB in this way will be extremely slow, if you can even do it. You'll get a segmentation fault whenever you try to access an address that isn't mapped. You can handle the segfault with a signal handler, but that's tricky and slow. What you're attempting may or may not be what your teacher expects, but it doesn't make any practical sense.
To explore a process's memory on Linux, read /proc/self/maps to discover its memory mappings. See my answer on Unix.SE for some sample code in Python.
Note also that if you're looking for a pattern, you need to take the length of the whole pattern into account, a byte-by-byte lookup doesn't do the whole job.
Ahh, a school assignment. OK then.
uint32_t i;
for ( i = 0; i < 0xFFFFFFFF; i++ )
{
char *x = (char *)i;
// Do magic here.
}
// Also, the above code skips on 0xFFFFFFFF itself, so magic that one address here.
// But if your pattern is longer than 1 byte, then it's not necessary
// (in fact, use something less than 0xFFFFFFFF in the above loop then)
The cast of a double to a pointer is a constraint violation - hence the error.
A floating type shall not be converted to any pointer type. C11dr §6.5.4 4
To scan the entire 32-bit address space, use a do loop with an integer type capable of the [0 ... 0xFFFFFFFF] range.
uint32_t address = 0;
do {
char *p = (char *) address;
foo(p);
} while (address++ < 0xFFFFFFFF);
I am trying to optimize code to run in under 7 seconds. I had it down to 8, and now I am trying to use pointers to speed up the code. But gcc gives an error when I try to compile:
.c:29: warning: assignment from incompatible pointer type
.c:29: warning: comparison of distinct pointer types lacks a cast
Here is what I had before trying to use pointers:
#include <stdio.h>
#include <stdlib.h>
#define N_TIMES 600000
#define ARRAY_SIZE 10000
int main (void)
{
double *array = calloc(ARRAY_SIZE, sizeof(double));
double sum = 0;
int i;
double sum1 = 0;
for (i = 0; i < N_TIMES; i++) {
int j;
for (j = 0; j < ARRAY_SIZE; j += 20) {
sum += array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5] + array[j+6] + array[j+7] + array[j+8] + array[j+9];
sum1 += array[j+10] + array[j+11] + array[j+12] + array[j+13] + array[j+14] + array[j+15] + array[j+16] + array[j+17] + array[j+18] + array[j+19];
}
}
sum += sum1;
return 0;
}
Here is what I have when I use pointers (this code generates the error):
int *j;
for (j = array; j < &array[ARRAY_SIZE]; j += 20) {
sum += *j + *(j+1) + *(j+2) + *(j+3) + *(j+4) + *(j+5) + *(j+6) + *(j+7) + *(j+8) + *(j+9);
sum1 += *(j+10) + *(j+11) + *(j+12) + *(j+13) + *(j+14) + *(j+15) + *(j+16) + *(j+17) + *(j+18) + *(j+19);
}
How do I fix this error? Btw, I don't want suggestions on alternative ways to optimize the code. This is a homework problem with constraints on what I'm allowed to do. I think once I get this pointer thing fixed it will run under 7 seconds and I'll be good to go.
comparison of distinct pointer types lacks a cast
This means that you tried to compare a pointer of one type to a pointer of another type, and did so without a cast.
double *array = calloc(ARRAY_SIZE, sizeof(double));
int *j;
Pointers to double and pointers to int are not directly comparable. You aren't allowed to compare j to array for this reason. Perhaps you meant to declare j as a pointer to double ?
C is a statically typed language, and comparisons across pointer types will give you errors. There is some implicit casting in certain cases, like if you compare a double to an int, because comparing numbers is a common operation. Comparing pointers of different types isn't.
Further, when you increment a pointer over an array, the increment uses the size of its pointed-to type to know how far to move in memory. Stepping an int pointer over an array of doubles will lead to issues.
A double is wider than an int, so an int pointer takes more iterations to cover the same array and lands on addresses that split doubles in half.
You could explicitly cast things, but really you should be using a double * for an array of doubles.
I'd be greatly surprised if moving from an array representation to a pointer representation would yield much (if any) speedup, as both are memory addresses (and memory offsets) in the final outputted code. Remember, the array representation is actually a pointer representation in different clothing too.
Instead, I'd look towards one of two techniques:
MMX/SSE SIMD instructions, to do multiple addition operations within the same register in the same clock cycle. Then you need one operation near the end to combine the high double with the low double.
Scatter/gather algorithms to spread the addition operation across multiple cores (nearly every CPU these days has 4 cores available, if not 16 pseudo-cores, a la hyper-threading).
Beyond that, you can do a few attempts at cache analysis, and at storing intermediates in different registers. There is a long dependency chain of additions in each of your computations; breaking them up might let you spread the on-CPU storage across more registers.
Most such operations become memory bound. 20 is a strange factor for loop unrolling. Doubles are probably 8 bytes, so 20 doubles is 160 bytes, which may not be aligned to your memory cache line size. Try making sure that multiples of your unrolled loop align cleanly with your architecture's level-1 cache lines, and you might avoid extra cache misses as you read across cache-line boundaries. Doing so will speed up your program by some amount (but who knows how much).
" When you increment a pointer over an array, it uses the size of it's dereferenced element to know how far in memory to move. Moving with an int over an array of doubles will lead to issues ".
To avoid your warn: do the below one
for (j= (int *)array; j < (int *)&array[ARRAY_SIZE]; j += 20)
I wrote a code for multiplying 2 vectors of length "N" elements, and returning the product vector of the same length in CUDA 5.0. Here is my code
I vary the value of "N" just to see how the GPU fares compared to the CPU. I am able to go up to 2000000000 elements. However, when I go to 3000000000 I get the warnings:
vecmul.cu(52): warning: floating-point value does not fit in required integral type
vecmul.cu(52): warning: floating-point value does not fit in required integral type
vecmul.cu: In function `_Z6vecmulPiS_S_':
vecmul.cu:15: warning: comparison is always false due to limited range of data type
vecmul.cu: In function `int main()':
vecmul.cu:40: warning: comparison is always true due to limited range of data type
And here is my code
// Summing 2 Arrays
#include<stdio.h>
#include <fstream>
#define N (3000000000)
//const int threadsPerBlock = 256;
// Declare add function for Device
__global__ void vecmul(int *a,int *b,int *c)
{
int tid = threadIdx.x + blockIdx.x * blockDim.x;
if (tid >= N) {return;} // (LINE 15)
c[tid] = a[tid] * b[tid];
}
int main(void)
{
// Allocate Memory on Host
int *a_h = new int[N];
int *b_h = new int[N];
int *c_h = new int[N];
// Allocate Memory on GPU
int *a_d;
int *b_d;
int *c_d;
cudaMalloc((void**)&a_d,N*sizeof(int));
cudaMalloc((void**)&b_d,N*sizeof(int));
cudaMalloc((void**)&c_d,N*sizeof(int));
//Initialize Host Array
for (int i=0;i<N;i++) // (LINE 40)
{
a_h[i] = i;
b_h[i] = (i+1);
}
// Copy Data from Host to Device
cudaMemcpy(a_d,a_h,N*sizeof(int),cudaMemcpyHostToDevice);
cudaMemcpy(b_d,b_h,N*sizeof(int),cudaMemcpyHostToDevice);
// Run Kernel
int blocks = int(N - 0.5)/256 + 1; // (LINE 52)
vecmul<<<blocks,256>>>(a_d,b_d,c_d);
// Copy Data from Device to Host
cudaMemcpy(c_h,c_d,N*sizeof(int),cudaMemcpyDeviceToHost);
// Free Device Memory
cudaFree(a_d);
cudaFree(b_d);
cudaFree(c_d);
// Free Memory from Host
free(a_h);
free(b_h);
free(c_h);
return 0;
}
Is this because the number of blocks is not sufficient for this array size?
Any suggestions would be welcome since I am a beginner in CUDA.
I am running this on a NVIDIA Quadro 2000.
The errors are caused by overflowing a 32-bit signed int. 2,147,483,647 is the max 32-bit signed int, so N = 3,000,000,000 wraps around and will always be negative, causing your boolean tests to always return true/false as specified by the warnings.
The other problem is around
int blocks = int(N - 0.5)/256 + 1; // (LINE 52)
trying to turn N into a floating point and then turn it back into an int. The value in the floating point number is too big -- again because you've overflowed a 32-bit int.
I think if you can remove the int(), it will work since once you divide by 256, you will be small enough, but you're forcing it to int before the division, so it's too big causing the error. It's not the assignment into blocks that's the problem, it's the explicit conversion to int.
edit: Wondering if now that we've fixed some of the computation problems with N and floating point vs int that you're seeing issues with the overflow. For example:
for (int i=0;i<N;i++) // (LINE 40)
{
a_h[i] = i;
b_h[i] = (i+1);
}
When N is over 2^31-1, this comparison will always be true (at least until i overflows). This SHOULD cause either an infinite loop, or perhaps 2^31-1 iterations and then exit? The compiler says it will ALWAYS be true, and if that's the case, the loop should never end.
Also, I don't know what a size_t is in CUDA, but
cudaMemcpy(c_h,c_d,N*sizeof(int),cudaMemcpyDeviceToHost);
doing N*sizeof(int) is going way over 2^31 and even 2^32 when N=3B.
At some point you need to ask yourself why you are trying to allocate this much space and if there is a better approach.
Working on my assignment, more details in another question. If I use
arr[(i * 16) % arrLen] *= 2; // seg fault
vs
arr[i % arrLen] *= 2; // OK!
Why? (Full source: see line 31.) I take the modulus by the array length, so the index should be in range, shouldn't it?
i * 16 can overflow into the negative range of signed integers. When you take modulo of a negative integer, you can get a negative remainder and that'll make your array subscript negative and result in an access outside of the array's allocated memory and sometimes a crash.
Looking at your full source:
You should be checking the return of malloc to make sure you were able to get that memory
You should be freeing the memory, you have a leak
The memory inside your arr array is uninitialized; you allocate it but don't set it to anything, so you're getting (most likely) large garbage values. You can zero it with memset(arr, 0, arrLen * sizeof(int)); (or allocate with calloc).
You malloc(arrLen * sizeof(int)) however arrLen is created with a /sizeof(int), so you're canceling your work there...
Regarding your seg fault, as others have stated, you're overflowing your index. You've created an array of ints, and you're looping i from 0 to reps (268,435,456). When you multiply i by 16 you overflow the 32-bit int range and create a negative offset.
Try multiplying the 16 into the initialization of reps:
int reps = 256 * 1024 * 1024 * 16;
Your compiler should throw a warning letting you know this exact thing:
warning: integer overflow in expression [-Woverflow]
Assuming the size of an int on your system is 32-bits, chances are you're causing an overflow and the result of i * 16 is becoming negative. In a two's complement system, negative values are represented with a higher binary value.
int reps = 256 * 1024 * 1024;
So reps = 268,435,456, which is the value you're looping up until. The greatest value of i is therefore 268,435,455 and 268,435,455 * 16 = 4,294,967,280.
The largest positive value a 32-bit int can represent is 2,147,483,647 (4,294,967,295 for an unsigned int, so you haven't wrapped around the negatives yet), which means that result is being interpreted as a negative value.
Accessing a negative offset from arr is out of the bounds of your allocated memory, which causes undefined behaviour and fortunately a seg fault.
If I have the following code in a function:
int A[5][5];
int i; int j;
for(i=0;i<5;i++){
for(j=0;j<5;j++){
A[i][j]=i+j;
printf("%d\n", A[i][j]);
}
}
This simply prints out the sum of each index. What I want to know is if it's possible to access each index in the static array in a similar fashion to dynamic array. So for example, if I wanted to access A[2][2], can I say:
*(A+(2*5+2)*sizeof(int))?
I want to perform some matrix operations on statically allocated matrices and I feel like the method used to dereference dynamic matrices would work the best for my purposes. Any ideas? Thank you.
That's the way to do it: A[i][j].
It prints out the sum of the indexes because, well, you set the element A[i][j] to the sum of the indexes: A[i][j] = i+j.
You can use:
*(*(A + 2) + 2)
for A[2][2]. Pointer arithmetic is done in units of the pointed-to type, not in units of char.
Of course, the preferred way is to use A[2][2] in your program.
The subscript operation a[i] is defined as *(a + i) - you compute an offset of i elements (not bytes) from a and then dereference the result. For a 2D array, you just apply that definition recursively:
a[i][j] == *(a[i] + j) == *(*(a + i) + j)
If the array is allocated contiguously, you could also just write *(&a[0][0] + i * cols + j), where cols is the length of a row.
When doing pointer arithmetic, the size of the base type is taken into account. Given a pointer
T *p;
the expression p + 1 will evaluate to the address of the next object of type T, which is sizeof T bytes after p.
Note that using pointer arithmetic may not be any faster than using the subscript operator (code up both versions and run them through a profiler to be sure). It will definitely be less readable.
Pointer arithmetic can be tricky.
You are on the right track, however there are some differences between pointer and normal arithmetic.
For example consider this code
int I = 0;
float F = 0;
double D = 0;
int* PI = 0;
float* PF = 0;
double* PD = 0;
cout<<I<<" "<<F<<" "<<D<<" "<<PI<<" "<<PF<<" "<<PD<<endl;
I++; F++; D++; PI++; PF++; PD++;
cout<<I<<" "<<F<<" "<<D<<" "<<PI<<" "<<PF<<" "<<PD<<endl;
cout<<I<<" "<<F<<" "<<D<<" "<<(int)PI<<" "<<(int)PF<<" "<<(int)PD<<endl;
If you run it, the output you see would look something like this (depending on your architecture and compiler):
0 0 0 0 0 0
1 1 1 0x4 0x4 0x8
1 1 1 4 4 8
As you can see the pointer arithmetic is handled depending on the type of the variable it points to.
So keep in mind which type of variable you are accessing when working with pointer arithmetic.
Just for the sake of example consider this code too:
void* V = 0;
int* IV = (int*)V;
float* FV = (float*)V;
double* DV = (double*)V;
IV++;FV++;DV++;
cout<<IV<<" "<<FV<<" "<<DV<<endl;
You will get the output (again depending on your architecture and compiler)
0x4 0x4 0x8
Remember that the code snippets above are just for demonstration purposes. There are a lot of things NOT to use from here.