On Linux, with 16 GB of RAM, why would the following segfault:
#include <stdlib.h>

#define N 44000

int main(void) {
    long width = N*2 - 1;
    int *c = (int *) calloc(width*N, sizeof(int));
    c[N/2] = 1;
    return 0;
}
According to GDB the problem is from c[N/2] = 1, but what is the reason?
It's probably because the return value of calloc was NULL.
The amount of physical RAM in your box does not directly determine how much memory you can allocate with calloc/malloc/realloc. That is determined more directly by the amount of virtual memory remaining to your process.
Your calculation overflows the range of a 32-bit signed integer, which is what long may be on your platform. You should use size_t instead of long; it is guaranteed to be able to hold the size of the largest memory block your system can allocate.
You're allocating around 14-15 GB of memory, and for whatever reason the allocator cannot give you that much at the moment, so calloc returns NULL and you segfault when you dereference the null pointer.
Check if calloc returns NULL.
That's assuming you're compiling a 64-bit program under 64-bit Linux. If you're doing something else, you might overflow the calculation of the first argument to calloc if long is not 64 bits on your system.
For example, try
#include <stdlib.h>
#include <stdio.h>

#define N 44000L

int main(void)
{
    size_t width = N * 2 - 1;
    printf("Longs are %zu bytes. About to allocate %zu bytes\n",
           sizeof(long), width * N * sizeof(int));
    int *c = calloc(width * N, sizeof(int));
    if (c == NULL) {
        perror("calloc");
        return 1;
    }
    c[N / 2] = 1;
    return 0;
}
You are asking for 2.6 GB of RAM (no, you aren't -- on 64 bit you are asking for about 14 GB; 2.6 GB is what the size calculation wraps around to on 32 bit). Apparently, Linux's heap is utilized enough that calloc() can't allocate that much at once.
This works fine on Mac OS X (both 32 and 64 bit) -- but just barely (and would likely fail on a different system with a different dyld shared cache and frameworks).
And, of course, it should work dandy under 64 bit on any system (even the 32 bit version with the bad calculation worked, but only coincidentally).
One more detail; in a "real world app", the largest contiguous allocation will be vastly reduced as the complexity and/or running time of the application increases. The more of the heap that is used, the less contiguous space there is to allocate.
You might want to change the #define to:
#define N 44000L
just to make sure the math is done at long precision. You may be passing a negative number to calloc.
calloc may be failing and returning NULL, which would cause the problem.
Dollars to donuts calloc() returned NULL because it couldn't satisfy the request, so attempting to dereference c caused the segfault. You should always check the result of *alloc() to make sure it isn't NULL.
Create a 14 GB file, and memory map it.
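A rough sketch of that approach, assuming a 64-bit Linux build (the file name and exact size here are illustrative, not from the question):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t len = 14ULL * 1024 * 1024 * 1024;              /* ~14 GB backing store */
    int fd = open("backing.bin", O_RDWR | O_CREAT, 0600); /* hypothetical file name */
    if (fd < 0 || ftruncate(fd, (off_t)len) < 0) {
        perror("open/ftruncate");
        return 1;
    }
    int *c = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (c == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    c[44000 / 2] = 1;   /* pages are faulted in on demand, zero-filled like calloc */
    munmap(c, len);
    close(fd);
    return 0;
}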
My project is to scan an address space (which in my case is 0x00000000 - 0xffffffff, or 0 to 2^32 - 1) for a pattern and return in an array the locations in memory where the pattern was found (it could be found multiple times).
Since the address space is 32 bits, i is a double and max is pow(2,32) (also a double).
I want to keep the original value of i intact so that I can use that to report the location of where the pattern was found (since actually finding the pattern requires moving forward several bytes past i), so I want temp, declared as char *, to copy the value of i. Then, later in my program, I will dereference temp.
double i, max = pow(2, 32);
char *temp;

for (i = 0; i < max; i++)
{
    temp = (char *) i;
    // some code involving *temp
}
The issue I'm running into is that a double can't be cast to a char *. An int can be; however, since the address space is 32 bits (not 16), I need a double, which is exactly large enough to represent 2^32.
Is there anything I can do about this?
In C, double and float are not represented the way you think they are; this code demonstrates that:
#include <stdio.h>

typedef union _DI
{
    double d;
    int i;
} DI;

int main()
{
    DI di;
    di.d = 3.00;
    printf("%d\n", di.i);
    return 0;
}
You will not see an output of 3 in this case.
In general, even if you could read another process's memory, your strategy is not going to work on any modern operating system, because of virtual memory: the address space that one process "sees" doesn't necessarily (in fact, usually doesn't) correspond to the physical memory of the system.
Never use a floating point variable to store an integer. Floating point variables make approximate computations. It would happen to work in this case, because the integers are small enough, but to know that, you need intimate knowledge of how floating point works on a particular machine/compiler and what range of integers you'll be using. Plus it's harder to write the program, and the program would be slower.
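To see what "approximate" means in practice, here is a minimal demonstration; a double has a 53-bit mantissa, so integers above 2^53 can no longer all be represented:

#include <stdio.h>

int main(void)
{
    double big = 9007199254740992.0;   /* 2^53: the last exactly-representable power */
    /* Adding 1 rounds back to the same value, so the two compare equal. */
    printf("big == big + 1 ? %s\n", (big == big + 1.0) ? "yes" : "no");   /* yes */
    return 0;
}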
C defines an integer type that's large enough to store a pointer: uintptr_t. You can cast a pointer to uintptr_t and back. On a 32-bit machine, uintptr_t will be a 32-bit type, so it's only able to store values up to 2^32 - 1. To express a loop that covers the whole range of the type including the first and last value, you can't use an ordinary for loop with a variable that's incremented, because the ending condition requires a value of the loop index that's out of range. If you naively write
uintptr_t i;
for (i = 0; i <= UINTPTR_MAX; i++) {
    unsigned char *temp = (unsigned char *)i;
    // ...
}
then you get an infinite loop, because after the iteration with i equal to UINTPTR_MAX, running i++ wraps the value of i to 0. The fact that the loop is infinite can also be seen in a simpler logical way: the condition i <= UINTPTR_MAX is always true since all values of the type are less or equal to the maximum.
You can fix this by putting the test near the end of the loop, before incrementing the variable.
i = 0;
do {
    unsigned char *temp = (unsigned char *)i;
    // ...
    if (i == UINTPTR_MAX) break;
    i++;
} while (1);
Note that exploring 4GB in this way will be extremely slow, if you can even do it. You'll get a segmentation fault whenever you try to access an address that isn't mapped. You can handle the segfault with a signal handler, but that's tricky and slow. What you're attempting may or may not be what your teacher expects, but it doesn't make any practical sense.
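For illustration only, here is a sketch of that signal-handler trick using sigsetjmp/siglongjmp; this is the "tricky and slow" approach just mentioned, and jumping out of a SIGSEGV handler like this is a well-known Linux hack rather than portable, guaranteed behavior:

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf probe_env;

static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);   /* jump back to the probe point */
}

/* Returns 1 if the byte at addr can be read, 0 if reading it faults. */
static int readable(const volatile char *addr)
{
    if (sigsetjmp(probe_env, 1) != 0)
        return 0;               /* we got here via siglongjmp after SIGSEGV */
    (void)*addr;                /* this read may fault */
    return 1;
}

int main(void)
{
    struct sigaction sa = { 0 };
    sa.sa_handler = on_segv;
    sigaction(SIGSEGV, &sa, NULL);

    char c = 'x';
    printf("stack byte: %d, NULL: %d\n", readable(&c), readable(NULL));
    return 0;
}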
To explore a process's memory on Linux, read /proc/self/maps to discover its memory mappings. See my answer on Unix.SE for some sample code in Python.
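If you'd rather do it in C, a minimal sketch that just prints this process's mappings; each line of /proc/self/maps gives a start-end address range, the permissions, and the backing file, if any:

#include <stdio.h>

int main(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) {
        perror("fopen");
        return 1;
    }
    char line[512];
    while (fgets(line, sizeof line, maps) != NULL)
        fputs(line, stdout);   /* e.g. "00400000-0040b000 r-xp ... /usr/bin/cat" */
    fclose(maps);
    return 0;
}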
Note also that if you're looking for a pattern, you need to take the length of the whole pattern into account; a byte-by-byte lookup doesn't do the whole job.
Ahh, a school assignment. OK then.
#include <stdint.h>

uint32_t i;
for (i = 0; i < 0xFFFFFFFF; i++)
{
    char *x = (char *)i;
    // Do magic here.
}
// Also, the above loop skips 0xFFFFFFFF itself, so magic that one address here.
// But if your pattern is longer than 1 byte, that's not necessary
// (in fact, use something less than 0xFFFFFFFF in the above loop then).
The cast of a double to a pointer is a constraint violation - hence the error.
A floating type shall not be converted to any pointer type. C11 §6.5.4 ¶4
To scan the entire 32-bit address space, use a do loop with an integer type capable of the [0 ... 0xFFFFFFFF] range.
uint32_t address = 0;

do {
    char *p = (char *) address;
    foo(p);
} while (address++ < 0xFFFFFFFF);
TASK_SIZE is a kernel constant that defines the upper limit of the accessible memory for the code working at the lowest privilege level.
Its value is usually set to 0xc0000000 on systems with less than 1 GB of physical memory (all examples included in this article refer to this value). The memory above this limit contains the kernel code.
Is there a way to determine the running kernel's TASK_SIZE from a C program?
After a lot of Google searching and analysis, I came up with the following logic.
Assume the total virtual address space is 4 GB and it is divided in a 1:3 ratio.
Rough assumptions:
Kernel (upper 1 GB): 0xc0000000 - 0xffffffff
User space (lower 3 GB): 0x0 - 0xc0000000
Then:
#define GB 1073741824U

unsigned int num;
unsigned int task_size;
task_size = ((unsigned int)&num + GB) / GB * GB;   /* round up to the next 1 GB boundary */
(The process's stack is allocated just below the kernel space.)
So the address of num (on the stack) is somewhere in the 3 GB range, e.g. 3214369612.
Adding 1 GB: 1073741824 + 3214369612 = 4288111436.
Dividing by 1 GB gives 3.993614983, which truncates to 3 as an unsigned int.
Multiplying by 1 GB again: 3 * 1073741824 = 3221225472, i.e. 0xC0000000 in hex.
Hence I got the kernel starting address (TASK_SIZE).
I tried it assuming a (2:6) ratio as well and got the correct result.
Is this fair logic? Please comment.
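For what it's worth, here is a compilable version of that idea. It is only a sketch: it assumes a 32-bit user space with the stack near the top of it, and simply truncates the pointer to 32 bits (on a typical 64-bit build the result is meaningless):

#include <stdint.h>
#include <stdio.h>

#define GB 1073741824UL   /* 1 GiB */

int main(void)
{
    unsigned long num;   /* a stack variable, so its address is near the top of user space */
    uint32_t addr = (uint32_t)(uintptr_t)&num;   /* truncate the address to 32 bits */
    unsigned long task_size = ((unsigned long)addr + GB) / GB * GB;   /* round up to a 1 GB boundary */
    printf("Estimated TASK_SIZE: 0x%lx\n", task_size);
    return 0;
}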
Assume sizeof(int).
Then, what's the total number of bytes that will be allocated on the heap?
And can you please explain why?
#include <stdio.h>
#include <stdlib.h>

#define MAXROW 8
#define MAXCOL 27

int main()
{
    int (*p)[MAXCOL];
    p = (int (*)[MAXCOL])malloc(MAXROW * sizeof(*p));
    return 0;
}
Assume "sizeof(int) is "(what?)... I guess you meant 4.
In the first line you declare p to be a pointer to an array of 27 integers.
In the second line you allocate heap memory of MAXROW (8) times the size of the dereferenced p, which is 27 integers; that's 27*4*8, so the number of bytes allocated is 864.
In your code,
int (*p)[MAXCOL];
is equivalent to saying: declare p as a pointer to an array of MAXCOL ints.
So, considering sizeof(int) is 4 bytes (32 bit compiler / platform),
sizeof(*p) is 108, MAXROW * sizeof(*p) is 8 * 108 = 864, and malloc() allocates that many bytes, if successful.
Also, please see this discussion on why not to cast the return value of malloc() and family in C.
The answer should be MAXROW*MAXCOL*sizeof(int). The size of int cannot be determined from the code shown. It can be 2, 4, 8... or even 42, pretty much anything greater than 0.
If your teacher or course expects 432, they rely on extra context you failed to provide. Re-reading your question: you write "assume sizeof(int)". You need to say precisely what should be assumed.
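To check the arithmetic on your own platform, here is a quick sketch (it drops the cast, per the discussion above; the 4/108/864 figures assume a 4-byte int):

#include <stdio.h>
#include <stdlib.h>

#define MAXROW 8
#define MAXCOL 27

int main(void)
{
    int (*p)[MAXCOL] = malloc(MAXROW * sizeof *p);
    if (p == NULL)
        return 1;
    printf("sizeof(int) = %zu, sizeof *p = %zu, total = %zu bytes\n",
           sizeof(int), sizeof *p, MAXROW * sizeof *p);   /* 4, 108, 864 here */
    free(p);
    return 0;
}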
I wrote code in CUDA 5.0 that multiplies 2 vectors of length N and returns the product vector of the same length.
I vary the value of N just to see how the GPU fares compared to the CPU. I am able to go up to 2000000000 elements. However, when I go to 3000000000, I get the warnings:
vecmul.cu(52): warning: floating-point value does not fit in required integral type
vecmul.cu(52): warning: floating-point value does not fit in required integral type
vecmul.cu: In function `_Z6vecmulPiS_S_':
vecmul.cu:15: warning: comparison is always false due to limited range of data type
vecmul.cu: In function `int main()':
vecmul.cu:40: warning: comparison is always true due to limited range of data type
And here is my code
// Multiplying 2 Arrays
#include <stdio.h>
#include <fstream>

#define N (3000000000)
//const int threadsPerBlock = 256;

// Declare multiply function for Device
__global__ void vecmul(int *a, int *b, int *c)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid >= N) { return; } // (LINE 15)
    c[tid] = a[tid] * b[tid];
}

int main(void)
{
    // Allocate Memory on Host
    int *a_h = new int[N];
    int *b_h = new int[N];
    int *c_h = new int[N];

    // Allocate Memory on GPU
    int *a_d;
    int *b_d;
    int *c_d;
    cudaMalloc((void**)&a_d, N*sizeof(int));
    cudaMalloc((void**)&b_d, N*sizeof(int));
    cudaMalloc((void**)&c_d, N*sizeof(int));

    // Initialize Host Arrays
    for (int i = 0; i < N; i++) // (LINE 40)
    {
        a_h[i] = i;
        b_h[i] = (i+1);
    }

    // Copy Data from Host to Device
    cudaMemcpy(a_d, a_h, N*sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, b_h, N*sizeof(int), cudaMemcpyHostToDevice);

    // Run Kernel
    int blocks = int(N - 0.5)/256 + 1; // (LINE 52)
    vecmul<<<blocks,256>>>(a_d, b_d, c_d);

    // Copy Data from Device to Host
    cudaMemcpy(c_h, c_d, N*sizeof(int), cudaMemcpyDeviceToHost);

    // Free Device Memory
    cudaFree(a_d);
    cudaFree(b_d);
    cudaFree(c_d);

    // Free Host Memory
    free(a_h);
    free(b_h);
    free(c_h);
    return 0;
}
Is this because the number of blocks is not sufficient for this array size?
Any suggestions would be welcome since I am a beginner in CUDA.
I am running this on a NVIDIA Quadro 2000.
The warnings are caused by overflowing a 32-bit signed int. The largest 32-bit signed int is 2,147,483,647, so 3000000000 doesn't fit: truncated to an int, N is negative, causing your boolean tests to always return true/false as the warnings say.
The other problem is around
int blocks = int(N - 0.5)/256 + 1; // (LINE 52)
trying to turn N into a floating-point value and then turn it back into an int. The floating-point value is too big to fit in an int -- again, because you've overflowed a 32-bit int.
I think if you remove the int(), it will work: once you divide by 256 the value is small enough, but you're forcing the conversion to int before the division, while the value is still too big, and that causes the warning. It's not the assignment into blocks that's the problem, it's the explicit conversion to int.
edit: Now that we've fixed some of the computation problems with N and floating point vs. int, I wonder if you're running into plain integer overflow. For example:
for (int i = 0; i < N; i++) // (LINE 40)
{
    a_h[i] = i;
    b_h[i] = (i+1);
}
When N is over 2^31 - 1, the condition i < N will always be true (at least until i itself overflows). This SHOULD make the loop either infinite, or perhaps run 2^31 - 1 iterations and then exit. The compiler says the comparison will ALWAYS be true, and if that's the case, the loop never ends.
Also, I don't know how big a size_t is in CUDA, but
cudaMemcpy(c_h,c_d,N*sizeof(int),cudaMemcpyDeviceToHost);
computing N*sizeof(int) goes way over 2^31, and even over 2^32, when N is 3 billion.
At some point you need to ask yourself why you are trying to allocate this much space and if there is a better approach.
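To make the size arithmetic itself overflow-proof, a sketch along these lines might help. It only shows the 64-bit bookkeeping and a sanity check against the device's memory via cudaMemGetInfo, not a full restructuring of the program:

#include <cstdio>

#define N 3000000000LL   // a 64-bit constant, so host-side size math can't overflow

int main(void)
{
    size_t bytes = (size_t)N * sizeof(int);   // 12,000,000,000 -- fits in a 64-bit size_t
    size_t free_mem = 0, total_mem = 0;
    cudaMemGetInfo(&free_mem, &total_mem);
    printf("need %zu bytes per array; device has %zu free of %zu\n",
           bytes, free_mem, total_mem);
    if (bytes > free_mem) {
        fprintf(stderr, "request exceeds device memory; reduce N or process in chunks\n");
        return 1;
    }
    return 0;
}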
Working on my assignment, more details in another question. If I use
arr[(i * 16) % arrLen] *= 2; // seg fault
vs
arr[i % arrLen] *= 2; // OK!
Why? For the full source, see line 31. I take the modulus with the array length, so shouldn't it be OK?
i * 16 can overflow into the negative range of signed integers, and when you take the modulo of a negative integer you can get a negative remainder. That makes your array subscript negative, which results in an access outside of the array's allocated memory and, sometimes, a crash.
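A tiny demonstration of that remainder rule; since C99, integer division truncates toward zero, so the remainder takes the sign of the dividend:

#include <stdio.h>

int main(void)
{
    int idx = -16;   /* e.g. what an overflowed i * 16 might wrap around to */
    printf("%d %% 10 = %d\n", idx, idx % 10);   /* prints: -16 % 10 = -6 */
    return 0;
}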
Looking at your full source:
You should be checking the return of malloc to make sure you were able to get that memory
You should be freeing the memory, you have a leak
The memory inside your arr array is uninitialized; you allocate it but don't set it to anything, so you're (most likely) seeing large negative numbers. You can zero it with memset(arr, 0, arrLen * sizeof(int));
You malloc(arrLen * sizeof(int)), but arrLen was computed with a / sizeof(int), so you're cancelling your own work there...
Regarding your seg fault, as others have stated, you're indexing outside your array. You've created an array of ints and you're looping from 0 to reps (268,435,456). When you multiply i by 16, the product exceeds the largest value an int can hold, so it overflows and becomes a negative offset.
Try multiplying the 16 into the initialization of reps:
int reps = 256 * 1024 * 1024 * 16;
Your compiler should throw a warning letting you know this exact thing:
warning: integer overflow in expression [-Woverflow]
Assuming the size of an int on your system is 32 bits, chances are you're causing an overflow, and the result of i * 16 becomes negative. In a two's complement system, bit patterns above the maximum positive value represent negative values.
int reps = 256 * 1024 * 1024;
So reps = 268,435,456, which is the value you're looping up until. The greatest value of i is therefore 268,435,455 and 268,435,455 * 16 = 4,294,967,280.
The largest positive value a 32-bit int can represent is 2,147,483,647 (4,294,967,295 for an unsigned int, so the product hasn't wrapped around the unsigned range yet), which means the result's bit pattern is interpreted as a negative value: -16, in fact.
Accessing a negative offset from arr is out of the bounds of your allocated memory, which causes undefined behaviour and, fortunately in this case, a seg fault.