Why does the following code result in a segmentation fault? (I'm trying to create two matrices of the same size, one with static allocation and the other with dynamic allocation.)
#include <stdio.h>
#include <stdlib.h>
//Segmentation fault!
int main(){
#define X 5000
#define Y 6000
int i;
int a[X][Y];
int** b = (int**) malloc(sizeof(int*) * X);
for(i=0; i<X; i++){
b[i] = malloc (sizeof(int) * Y);
}
}
Weirdly enough, if I comment out one of the matrix definitions, the code runs fine. Like this:
#include <stdio.h>
#include <stdlib.h>
//No Segmentation fault!
int main(){
#define X 5000
#define Y 6000
int i;
//int a[X][Y];
int** b = (int**) malloc(sizeof(int*) * X);
for(i=0; i<X; i++){
b[i] = malloc (sizeof(int) * Y);
}
}
or
#include <stdio.h>
#include <stdlib.h>
//No Segmentation fault!
int main(){
#define X 5000
#define Y 6000
int i;
int a[X][Y];
//int** b = (int**) malloc(sizeof(int*) * X);
//for(i=0; i<X; i++){
// b[i] = malloc (sizeof(int) * Y);
//}
}
I'm running gcc on Linux on a 32-bit machine.
Edit: Checking if malloc() succeeds:
#include <stdio.h>
#include <stdlib.h>
//Segmentation fault!
int main(){
#define X 5000
#define Y 6000
int i;
int a[X][Y];
int* tmp;
int** b = (int**) malloc(sizeof(int*) * X);
if(!b){
printf("Error on first malloc.\n");
}
else{
for(i=0; i<X; i++){
tmp = malloc (sizeof(int) * Y);
if(tmp)
b[i] = tmp;
else{
printf("Error on second malloc, i=%d.\n", i);
return 1;
}
}
}
}
Nothing is printed when I run it (except, of course, for "Segmentation fault").
Your a variable requires, on a 32-bit system, 5000 * 6000 * 4 = 120 MB of stack space. It's possible that this violates some limit, which causes the segmentation fault.
Also, it's of course possible that malloc() fails at some point, which might cause you to dereference a NULL pointer.
You are getting a segmentation fault, which means that your program is attempting to access a memory address that has not been assigned to its process. The array a is a local variable and is therefore allocated on the stack. As unwind pointed out, a requires 120 MB of storage. This is almost certainly larger than the stack space that the OS has allocated to your process. As soon as the for loop walks off the end of the stack, you get a segmentation fault.
In Linux the stack size is controlled by the OS, not the compiler, so try the following:
$ ulimit -a
In the response you should see a line something like this:
stack size (kbytes) (-s) 10240
This means that each process gets 10 MB of stack, nowhere near enough for your large array.
You can adjust the stack size with a ulimit -s <stack size> command, but I suspect it will not allow you to select a 120 MB stack size!
The simplest solution is to make a a global variable instead of a local variable.
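The limit that ulimit -s reports can also be queried from inside the program. The following is a minimal sketch, assuming a POSIX system that provides getrlimit(); the helper name fits_in_stack_limit is made up for illustration. Staying under the limit does not guarantee that the array fits, because other stack frames share the same space.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if `bytes` is within the current soft
 * stack limit, 0 if it is not, and -1 if the limit could not be queried. */
int fits_in_stack_limit(size_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)   /* "ulimit -s unlimited" */
        return 1;
    return (rlim_t)bytes <= rl.rlim_cur;
}
```

Calling fits_in_stack_limit(sizeof(int) * 5000 * 6000) with the 10 MB limit shown above would return 0, which matches the explanation for the crash.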
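Try increasing the heap and stack limits at link time. Note that these linker options are only supported by ld for PE (Windows) targets such as MinGW and Cygwin, so they will not help on Linux: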
gcc -Wl,--stack=xxxxx -Wl,--heap=yyyyy
Those are sizable allocations. Have you tried checking to make sure malloc() succeeds?
You might use malloc() for all your arrays, and check to make sure it succeeds each time.
A stack overflow (how appropriate!) can result in a segmentation fault which is what it seems you're seeing here.
In your third case the stack pointer is being moved to an invalid address but isn't being used for anything since the program then exits. If you put any operation after the stack allocation you should get a segfault.
Perhaps the compiler is just changing the stack pointer to some large value but never using it, and thus never causing a memory access violation.
Try initializing all of the elements of A in your third example? Your first example tries to allocate B after A on the stack, and accessing the stack that high (on the first assignment to B) might be what's causing the segfault.
Your 3rd code doesn't work either (on my system at least).
Try allocating array a on the heap instead (when the dimensions are large).
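One way to put a on the heap while keeping the a[i][j] syntax is to allocate it as a single block through a pointer to a row. This is only a sketch: COLS stands in for Y from the question, and row_t and matrix_alloc are names invented here.

```c
#include <stdlib.h>

#define COLS 6000  /* plays the role of Y in the question */

/* One row of the matrix; a pointer to this type indexes like a[i][j]. */
typedef int row_t[COLS];

/* Allocates a zero-filled rows x COLS matrix as one contiguous heap block.
 * Returns NULL on failure; release it with a single free(). */
row_t *matrix_alloc(size_t rows)
{
    return calloc(rows, sizeof(row_t));
}
```

With row_t *a = matrix_alloc(5000); the element a[4999][5999] is valid once a has been checked against NULL, and free(a) releases the whole matrix.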
Both matrices don't fit in the limits of your memory. You can allocate only one at a time.
If you define Y as 3000 instead of 6000, your program should not segfault.
Related
#include <stdio.h>
int main()
{
int i,a;
int* p;
p=&a;
for(i=0;i<=10;i++)
{
*(p+i)=i;
printf("%d\n",*(p+i));
}
return 0;
}
I tried to assign the numbers from 0 to 10 to sequential memory locations without using an array.
You are trying to write to memory that the program does not have permission to access.
The variable a is a local variable in the main function, and it is stored on the stack. The pointer p is initialized to point to the address of a. The code then attempts to write to the memory addresses starting at p and going up to p+10. However, these memory addresses are not part of the memory that has been allocated for the program to use, and so the program receives a segmentation fault when it tries to write to them.
To fix this issue, you can either change the loop condition to a smaller value, or you can allocate memory dynamically using malloc or calloc, and assign the pointer to the returned address. This will allow you to write to the allocated memory without causing a segmentation fault.
Like this:
#include <stdio.h>
#include <stdlib.h>
int main()
{
int i;
int* p = malloc(sizeof(int) * 11); // Allocate memory for 11 integers (indices 0 to 10)
if (p == NULL) { // Check for allocation failure
printf("Error allocating memory\n");
return 1;
}
for(i=0;i<=10;i++)
{
*(p+i)=i;
printf("%d\n",*(p+i));
}
free(p); // Free the allocated memory when you are done with it
return 0;
}
a is only a single integer, not an array.
You need to declare it as an array with room for all 11 values (indices 0 to 10) and point p at it with p = a:
int i, a[11];
You cannot. An int is typically 4 bytes, and it can hold only a single number at a time.
for int: -2,147,483,648 to 2,147,483,647
for unsigned int: 0 to 4,294,967,295
There are other types with different sizes, but if you want to store several numbers under one name you need to use an array.
int arr[10];
arr[0] = 0;
arr[1] = 5;
something like this.
I'm trying to create a graph with 264346 positions. Do you know why calloc, once it reaches about 26,000 positions, stops returning memory addresses (e.g. 89413216) and starts returning zeros (0), after which all the processes on my computer crash?
calloc should fill the memory with zeros, but it should not return zero as the address at this point in my code.
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <time.h>
#include <string.h>
#include <limits.h>
int maxV;
struct grafo {
int NumTotalVertices;
int NumVertices;
int NumArestas;
int **p;
};
typedef struct grafo MGrafo;
MGrafo* Aloca_grafo(int NVertices);
int main(){
MGrafo *MatrizGrafo;
MatrizGrafo = Aloca_grafo(264346);
return 0;
}
MGrafo* Aloca_grafo(int NVertices) {
int i, k;
MGrafo *Grafo ;
Grafo = (MGrafo*) malloc(sizeof(MGrafo));
Grafo->p = (int **) malloc(NVertices*sizeof(int*));
for(i=0; i<NVertices+1; i++){
Grafo->p[i] = (int*) calloc(NVertices,sizeof(int));// error at this point
//printf("%d - (%d)\n", i, Grafo->p[i]); // see impression
}
printf("%d - (%d)\n", i, Grafo->p[i]);
Grafo->NumTotalVertices = NVertices;
Grafo->NumArestas = 0;
Grafo->NumVertices = 0;
return Grafo;
}
You surely don't mean what you have in your code:
Grafo = (MGrafo*)malloc(sizeof(MGrafo));
Grafo->p = (int**)malloc(NVertices * sizeof(int*)); <<<<=== 264000 int pointers
for (i = 0; i < NVertices + 1; i++) { <<<<< for each of those 264000 int pointers
Grafo->p[i] = (int*)calloc(NVertices, sizeof(int)); <<<<<=== allocate 264000 ints
I ran this on my machine.
Its fans turned on, meaning it was working very hard.
By the time the loop had reached only 32,000 iterations, it had already allocated 33 GB of memory.
I think you only need to allocate one set of integers. Since I can't tell what you are trying to do, it's hard to know which allocation to remove, but this creates a 2D array of 264,000 by 264,000 ints, which is huge (~70 billion ints = ~280 GB of memory). Surely you don't mean that.
OK, taking a comment from below, maybe you do mean it.
If this is what you really want, then you are going to need a very chunky computer and a lot of time.
Plus, you are definitely going to have to test the return values of those calloc and malloc calls to make sure that every allocation works.
A lot of the time you will see answers on SO saying 'check the return from malloc', but in fact most modern OSes on modern hardware will rarely fail a memory allocation. Here, though, you are pushing the limits, so test every one.
'Generating zeros' is how calloc tells you it failed.
https://linux.die.net/man/3/calloc
Return Value
The malloc() and calloc() functions return a pointer to the allocated memory that is suitably aligned for any kind of variable. On error, these functions return NULL. NULL may also be returned by a successful call to malloc() with a size of zero, or by a successful call to calloc() with nmemb or size equal to zero.
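Checking every allocation, as suggested above, could look roughly like the sketch below. The names square_alloc and square_free are invented for illustration. The loop also stops at i < n: the question's i < NVertices+1 writes one element past the end of the row table.

```c
#include <stdlib.h>

/* Frees the first `rows` rows of `m`, then the row table itself. */
void square_free(int **m, size_t rows)
{
    if (m == NULL)
        return;
    for (size_t i = 0; i < rows; i++)
        free(m[i]);
    free(m);
}

/* Allocates an n x n zero-filled matrix as n separate rows.
 * Returns NULL, with everything released, if any allocation fails. */
int **square_alloc(size_t n)
{
    int **m = calloc(n, sizeof *m);
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {   /* i < n, not i < n + 1 */
        m[i] = calloc(n, sizeof **m);
        if (m[i] == NULL) {
            square_free(m, i);         /* only rows 0..i-1 exist */
            return NULL;
        }
    }
    return m;
}
```

A caller would test the result once, for example if (square_alloc(264346) == NULL), and report the failure instead of continuing with null row pointers.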
I need to write in a variable allocated through a function. I found a lot of threads discussing this, such as Allocating a pointer by passing it through two functions and How to use realloc in a function in C, which helped me fixing some issues, but one remains and I can't find the cause.
Consider the following code:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
void foo(uint64_t** m)
{
*m = realloc(*m, sizeof(uint64_t) * 4);
*m[0] = (uint64_t)1;
//*m[1] = (uint64_t)2; // create a segfault and crash right here, not at the print later
}
int main()
{
uint64_t* memory = NULL;
foo(&memory);
for(int i = 0; i < 4; i++)
printf("%d = %ld\n", i, memory[i]);
return 0;
}
I pass the address of memory to foo, which takes a double pointer as an argument so that it can modify the variable. The realloc function needs a size in bytes, so I make sure to ask for 4 * sizeof(uint64_t) to have enough space to write four 64-bit integers. (The realloc is needed; I can't use malloc here.)
I can then write to m[0] without issue. But if I write to m[1], the program crashes.
What did I do wrong here ?
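A likely cause is operator precedence: the subscript binds more tightly than the unary *, so *m[1] is parsed as *(m[1]). That treats m as an array of pointers and dereferences whatever lies after memory in main. *m[0] only works because m[0] is the same as *m. The sketch below parenthesises the dereference; foo_fixed is a name made up here, and the realloc check is an addition that the original does not have.

```c
#include <stdint.h>
#include <stdlib.h>

/* Grows *m to hold four uint64_t values and fills them with 1..4.
 * Returns 0 on success, -1 if realloc fails (*m is left untouched). */
int foo_fixed(uint64_t **m)
{
    uint64_t *tmp = realloc(*m, sizeof(uint64_t) * 4);
    if (tmp == NULL)
        return -1;
    *m = tmp;

    for (size_t i = 0; i < 4; i++)
        (*m)[i] = (uint64_t)(i + 1);   /* (*m)[i], not *m[i] */
    return 0;
}
```

When printing the values, "%" PRIu64 from <inttypes.h> is the portable conversion for uint64_t; %ld only matches on platforms where uint64_t happens to be unsigned long.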
I'm new to CUDA/C and new to stack overflow. This is my first question.
I'm trying to allocate memory dynamically in a kernel function, but the results are unexpected.
I read using malloc() in a kernel can lower performance a lot, but I need it anyway so I first tried with a simple int ** array just to test the possibility, then I'll actually need to allocate more complex structs.
In my main I used cudaMalloc() to allocate the space for the array of int *, and then I used malloc() for every thread in the kernel function to allocate the array for every index of the outer array. I then used another thread to check the result, but it doesn't always work.
Here's main code:
#define N_CELLE 1024*2
#define L_CELLE 512
extern "C" {
int main(int argc, char **argv) {
int *result = (int *)malloc(sizeof(int));
int *d_result;
int size_numbers = N_CELLE * sizeof(int *);
int **d_numbers;
cudaMalloc((void **)&d_numbers, size_numbers);
cudaMalloc((void **)&d_result, sizeof(int *));
kernel_one<<<2, 1024>>>(d_numbers);
cudaDeviceSynchronize();
kernel_two<<<1, 1>>>(d_numbers, d_result);
cudaMemcpy(result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
printf("%d\n", *result);
cudaFree(d_numbers);
cudaFree(d_result);
free(result);
}
}
I used extern "C" because I couldn't compile while importing my header, which is not used in this example code. I pasted it since I don't know whether it may be relevant.
This is kernel_one code:
__global__ void kernel_one(int **d_numbers) {
int i = threadIdx.x + blockIdx.x * blockDim.x;
d_numbers[i] = (int *)malloc(L_CELLE*sizeof(int));
for(int j=0; j<L_CELLE;j++)
d_numbers[i][j] = 1;
}
And this is kernel_two code:
__global__ void kernel_two(int **d_numbers, int *d_result) {
int temp = 0;
for(int i=0; i<N_CELLE; i++) {
for(int j=0; j<L_CELLE;j++)
temp += d_numbers[i][j];
}
*d_result = temp;
}
Everything works fine (i.e. the count is correct) as long as I use fewer than 1024*2*512 total blocks in device memory. For example, if I #define N_CELLE 1024*4, the program starts giving "random" results, such as negative numbers.
Any idea of what the problem could be?
Thanks anyone!
In-kernel memory allocation draws memory from a statically allocated runtime heap. At larger sizes you are exceeding the size of that heap, so the in-kernel malloc() calls fail and return NULL, and your two kernels then attempt to write and read through invalid pointers. This produces a runtime error on the device and renders the results invalid. You would already know this if you had either added proper API error checking on the host side or run your code with the cuda-memcheck utility.
The solution is to ensure that the heap size is set to something appropriate before trying to run a kernel. Adding something like this:
size_t heapsize = sizeof(int) * size_t(N_CELLE) * size_t(2*L_CELLE);
cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapsize);
to your host code before any other API calls, should solve the problem.
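The arithmetic behind this can be checked on the host in plain C. The sketch below assumes the 8 MB default device heap described in the CUDA programming guide and ignores per-allocation overhead; device_heap_payload is a hypothetical helper, and the macros are parenthesised only as a precaution.

```c
#include <stddef.h>

/* Parenthesised so the macros expand safely inside larger expressions. */
#define N_CELLE (1024 * 2)
#define L_CELLE 512

/* Bytes of payload that n_cells in-kernel malloc() calls of
 * ints_per_cell ints each will request from the device heap. */
size_t device_heap_payload(size_t n_cells, size_t ints_per_cell)
{
    return n_cells * ints_per_cell * sizeof(int);
}
```

With 4-byte ints, device_heap_payload(N_CELLE, L_CELLE) is 4 MiB, which fits in an 8 MB heap, while 1024*4 cells need 8 MiB of payload alone, which leaves no room for the allocator's bookkeeping. The heapsize in the answer is twice the payload, which leaves that headroom.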
I don't know anything about CUDA but these are severe bugs:
You cannot convert from int** to void**. They are not compatible types. Casting doesn't solve the problem, but hides it.
&d_numbers gives the address of a pointer to pointer which is wrong. It is of type int***.
Both of the above bugs result in undefined behavior. If your program somehow seems to work under some conditions, that is purely by (bad) luck.
I started to learn C recently. I use Code::Blocks with MinGW and Cygwin GCC.
I made a very simple prime sieve for Project Euler problem 10, which prints primes below a certain limit to stdout. It works fine up to a limit of roughly 500000, but above that my MinGW-compiled .exe crashes and the GCC-compiled one throws a "STATUS_STACK_OVERFLOW" exception.
I'm puzzled as to why, since the code is totally non-recursive, consisting of simple for loops.
#include <stdio.h>
#include <math.h>
#define LIMIT 550000
int main()
{
int sieve[LIMIT+1] = {0};
int i, n;
for (i = 2; i <= (int)floor(sqrt(LIMIT)); i++){
if (!sieve[i]){
printf("%d\n", i);
for (n = 2; n <= LIMIT/i; n++){
sieve[n*i] = 1;
}
}
}
for (; i <= LIMIT; i++){
if (!sieve[i]){
printf("%d\n", i);
}
}
return 0;
}
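It seems that you cannot allocate 550000 ints on the stack, so allocate them dynamically instead. Use calloc rather than malloc here, because the sieve relies on every element starting at zero and malloc does not clear the memory:
int * sieve;
sieve = calloc(LIMIT+1, sizeof(int));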
When your array is bigger than the stack, your basic options are to store it somewhere else:
allocating memory for the array on the heap with malloc (as @Binyamin explained)
storing the array in the Data/BSS segment by declaring it as static int sieve[SIZE_MACRO]
All the memory in that program is allocated on the stack. When you increase the size of the array, you increase the amount of space required on the stack. Eventually the function cannot be called because there isn't enough space on the stack to accommodate it.
Either experiment with allocating the array using malloc (so that it lives on the heap), or learn how to tell the compiler to allocate a larger stack.
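The heap-based variant could be sketched as follows. This is an illustration rather than the asker's code: count_primes is a made-up name, it counts primes instead of printing them, and it uses one byte per flag instead of one int.

```c
#include <stdlib.h>

/* Counts the primes <= limit with a heap-allocated sieve.
 * Returns 0 if limit < 2 or if the allocation fails. */
size_t count_primes(size_t limit)
{
    if (limit < 2)
        return 0;

    /* calloc zero-fills, matching the `= {0}` initialiser in the question. */
    unsigned char *composite = calloc(limit + 1, 1);
    if (composite == NULL)
        return 0;

    size_t count = 0;
    for (size_t i = 2; i <= limit; i++) {
        if (composite[i])
            continue;
        count++;
        /* Mark multiples from i*i upward; the guard avoids overflow in i*i. */
        if (i <= limit / i)
            for (size_t n = i * i; n <= limit; n += i)
                composite[n] = 1;
    }
    free(composite);
    return count;
}
```

Because the flags live on the heap, the limit is bounded by available memory rather than by the stack size, so values well above 550000 work.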