malloc works, cudaHostAlloc segfaults? - c

I am new to CUDA and I want to use cudaHostAlloc. I was able to isolate my problem to the following code. Using malloc for host allocation works; using cudaHostAlloc results in a segfault, possibly because the allocated area is invalid? When I dump the pointer in both cases it is not null, so cudaHostAlloc does return something...
Works:
in_h = (int*) malloc(length*sizeof(int)); // works
for (int i = 0; i < length; i++)
    in_h[i] = 2;
Doesn't work:
cudaHostAlloc((void**)&in_h, length*sizeof(int), cudaHostAllocDefault);
for (int i = 0; i < length; i++)
    in_h[i] = 2; // segfaults
Standalone Code
#include <stdio.h>
#include <stdlib.h>

void checkDevice()
{
    cudaDeviceProp info;
    int deviceName;
    cudaGetDevice(&deviceName);
    cudaGetDeviceProperties(&info, deviceName);
    if (!info.deviceOverlap)
    {
        printf("Compute device can't use streams and should be discarded.");
        exit(EXIT_FAILURE);
    }
}

int main()
{
    checkDevice();
    int *in_h;
    const int length = 10000;
    cudaHostAlloc((void**)&in_h, length*sizeof(int), cudaHostAllocDefault);
    printf("segfault comming %d\n", in_h);
    for (int i = 0; i < length; i++)
    {
        in_h[i] = 2; // Segfaults here
    }
    return EXIT_SUCCESS;
}
Invocation
[id129]$ nvcc fun.cu
[id129]$ ./a.out
segfault comming 327641824
Segmentation fault (core dumped)
Details
The program is run in interactive mode on a cluster. I was told that an invocation of the program from the compute node pushes it to the cluster. I have not had any trouble with other home-made toy CUDA programs.
Edit
cudaError_t err = cudaHostAlloc((void**)&in_h,length*sizeof(int),cudaHostAllocDefault);
printf("Error status is %s\n",cudaGetErrorString(err));
gives a driver error:
Error status is CUDA driver version is insufficient for CUDA runtime version

Always check for errors. It is likely that cudaHostAlloc is failing to allocate any memory; when it fails you are not bailing out, but rather writing to unallocated address space. malloc happens to allocate the memory as requested here and does not fail, but malloc can fail as well, so it is best to check the pointer before writing through it in either case.
For the future, it is best to do something like this:
int *ptr = NULL;
// Allocate using cudaHostAlloc or malloc
// If using cudaHostAlloc check for success
if (!ptr) ERROR_OUT();
// Write to this memory
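As a concrete illustration of that pattern, here is a minimal sketch that checks both the status code and the pointer before touching the memory (it only uses the calls already mentioned in this thread; the exact error handling is my own choice, not part of the original answer):

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    const int length = 10000;
    int *in_h = NULL;

    /* Allocate pinned host memory and check the returned status code. */
    cudaError_t err = cudaHostAlloc((void **)&in_h, length * sizeof(int),
                                    cudaHostAllocDefault);
    if (err != cudaSuccess || in_h == NULL) {
        fprintf(stderr, "cudaHostAlloc failed: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    /* Only write to the memory once we know the allocation succeeded. */
    for (int i = 0; i < length; i++)
        in_h[i] = 2;

    cudaFreeHost(in_h);
    return EXIT_SUCCESS;
}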
EDIT (Response to edit in the question)
The error message indicates that your driver is older than the toolkit you are compiling with. If you do not want to be stuck for a while, download an older CUDA toolkit that is compatible with your driver; you can install it in your user account and use its nvcc and libraries temporarily.
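If you want to confirm the mismatch from code, the runtime exposes both version numbers through the standard cudaDriverGetVersion and cudaRuntimeGetVersion calls; a small sketch:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    /* version supported by the installed driver */
    cudaRuntimeGetVersion(&runtimeVersion);  /* version of the runtime you linked against */
    printf("Driver: %d, Runtime: %d\n", driverVersion, runtimeVersion);
    /* The driver version must be >= the runtime version, otherwise runtime
       calls such as cudaHostAlloc will fail with exactly this error. */
    return 0;
}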

Your segfault is not caused by the writes to the block of memory allocated by cudaHostAlloc, but rather by passing an address returned from cudaHostAlloc to free. I was able to reproduce your problem using the code you provided, and replacing free with cudaFreeHost fixed the segfault for me (see the cudaFreeHost documentation).
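In other words, pinned host memory has to be released through the CUDA runtime rather than the C library; a minimal sketch of the correct pairing (reusing the names from the question):

#include <cuda_runtime.h>

int main(void)
{
    int *in_h = NULL;
    const int length = 10000;

    cudaHostAlloc((void **)&in_h, length * sizeof(int), cudaHostAllocDefault);
    /* ... use in_h ... */

    /* free(in_h) would be wrong here: this block did not come from malloc. */
    cudaFreeHost(in_h);   /* pinned host memory is released by the CUDA runtime */
    return 0;
}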

Related

Segmentation fault - Socket Programming - C

I am trying to write an echo server/client in C. My code compiles but throws a segmentation fault at run time (I believe in the server-side process). When testing in the CLion debug environment, the server process is able to execute the accept() system call and enter a waiting state until a client connects. Therefore, I believe the segmentation fault happens after the client makes the connect() system call.
Here are the relevant snippets of code (only the last part, not the full program):
/* [6] LISTEN FOR CONNECTIONS ON BOUND SOCKET===================================================================== */
struct sockaddr_storage ample; /* from Beej Guide 5.6 accept() */
socklen_t ample_sz = sizeof(ample);
fd_activeSock = accept(fd_listenSock, (struct sockaddr *)&established_SERV_param, &ample_sz);
if (fd_activeSock == -1) /* Error checking */
{
    fprintf(stderr, "\nNo forum for communication...\nTERMINATING PROCESS");
    exit(EXIT_FAILURE);
}
printf("\nCommunication Established! What's your sign??");
freeaddrinfo(established_SERV_param); /* free up memory */
/* [7] ACCEPT A CONNECTION (BLOCKING)============================================================================= */
/* MAIN LOOP====================================================================================================== */
while(1)
{
    bzero(msg_incoming, 16);
    recv(fd_activeSock, msg_incoming, 16, 0);
    printf("%s", msg_incoming);
    send(fd_activeSock, msg_incoming, 16, 0);
}
When I run both programs in separate terminals (server process first, of course), the last statement that runs before the error is:
printf("\nCommunication Established! What's your sign??");
The error is output to the server terminal. There is a core dump; for future issues, could someone suggest a beginner's tutorial on combing through core dump files? Also, I have run the code with the freeaddrinfo() call commented out and still get a segmentation fault, so I do not believe that is the issue. Why call it at all? I do not want memory leaks. Thank you for your help.
recv() does not place a null terminator at the end of the buffer, but printf("%s", ...) expects one.
In the statements:
bzero(msg_incoming, 16);
recv(fd_activeSock, msg_incoming, 16, 0);
printf("%s", msg_incoming);
although msg_incoming has been zeroed, when it is filled by the recv call and all 16 bytes are populated, there is no guarantee that the last element of the array is '\0', leaving the buffer as a non-null-terminated array. If that happens, a segfault is likely when printf() is called. Or worse, a segfault may not occur, leading you to believe your code works fine (i.e. undefined behavior).
The fix is to check the return value of recv() and to leave room for the terminator:
ssize_t bytes = recv(fd_activeSock, msg_incoming, 16 - 1, 0); /* reserve one byte for '\0' */
if (bytes <= 0)
{
    /* handle error / connection-closed condition */
}
else
{
    msg_incoming[bytes] = '\0';  /* now always within bounds */
    printf("%s", msg_incoming);
}
Additional material on Reading data with a socket.
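Applied to the whole echo loop, the same idea looks roughly like this (a sketch; fd_activeSock and msg_incoming are taken from the question, and the loop-exit handling is my assumption):

#include <stdio.h>
#include <sys/socket.h>

void echo_loop(int fd_activeSock)
{
    char msg_incoming[16];

    while (1)
    {
        ssize_t bytes = recv(fd_activeSock, msg_incoming, sizeof(msg_incoming) - 1, 0);
        if (bytes == 0)
            break;               /* peer closed the connection */
        if (bytes < 0)
        {
            perror("recv");
            break;               /* real error */
        }
        msg_incoming[bytes] = '\0';
        printf("%s", msg_incoming);
        send(fd_activeSock, msg_incoming, (size_t)bytes, 0);  /* echo back exactly what was read */
    }
}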
freeaddrinfo(established_SERV_param)
should only be called when established_SERV_param was obtained from getaddrinfo(). Here established_SERV_param appears to be a stack variable, so you are trying to free a pointer to a stack object. Something is inconsistent in your program: freeaddrinfo() expects a pointer, yet you take the variable's address with & in the call to accept(), which suggests it is a plain struct rather than a pointer returned by getaddrinfo(). Removing the call to freeaddrinfo() may fix the crash.
If that is not enough, it is important to see how msg_incoming is defined and allocated. It should not be a const char array or be initialised with a string literal (which makes it effectively read-only). If it is a pointer, it must point to adequately sized memory allocated with malloc.
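For reference, freeaddrinfo() is meant to release the list that getaddrinfo() itself allocated, roughly like this (a minimal sketch with error handling trimmed; the port "7070" is just an example):

#include <stdio.h>
#include <string.h>
#include <netdb.h>

int main(void)
{
    struct addrinfo hints, *servinfo = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_PASSIVE;

    if (getaddrinfo(NULL, "7070", &hints, &servinfo) != 0) {
        fprintf(stderr, "getaddrinfo failed\n");
        return 1;
    }

    /* ... socket()/bind()/listen()/accept() would use servinfo here ... */

    freeaddrinfo(servinfo);  /* frees only the list getaddrinfo allocated */
    return 0;
}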
Analysing the core dump:
Compile your code with debugging on and optimisation off:
gcc -g -O0
Then open the core file in gdb:
gdb <executable> <core file>
(gdb) bt
The bt command shows the backtrace where the program crashed. You can go to the frame it crashed in with the command fr 0 and print some variables. A tutorial for gdb can be found here.

Segmentation Fault using getcontext() in thread library

I am trying to implement a user-level thread library in C using system calls such as getcontext, swapcontext, etc.
I have a thread control block that looks like this:
struct tcb {
    int thread_id;
    int thread_pri;
    ucontext_t *thread_context;
    struct tcb *next;
};
And I have a function called t_init() that looks like this:
void t_init()
{
    tcb *tmp;
    tmp = malloc(sizeof(tcb));
    getcontext(tmp->thread_context); /* let tmp be the context of main() */
    running_head = tmp;
}
Using gdb, I found that the segmentation fault occurs at runtime in the getcontext(tmp->thread_context) call.
I have read the man pages for getcontext() but am unsure why this is giving me a segmentation fault!
Any suggestions, please?
You haven't allocated any space for thread_context; try:
int t_init()
{
    struct tcb *tmp;

    tmp = malloc(sizeof(struct tcb));
    if (!tmp)
        return -1;
    memset(tmp, 0, sizeof(struct tcb));   /* zero the block itself, not &tmp */

    tmp->thread_context = malloc(sizeof(ucontext_t));
    if (!tmp->thread_context) {
        free(tmp);
        return -1;
    }

    getcontext(tmp->thread_context);      /* now writes into valid memory */
    return 0;
}
The GNU C Library Reference Manual (Chapter 23, Non-Local Exits, page 622) says the following about getcontext/setcontext:
While allocating the memory for the stack one has to be careful. Most modern processors keep track of whether a certain memory region is allowed to contain code which is executed or not. Data segments and heap memory are normally not tagged to allow this. The result is that programs would fail. Examples for such code include the calling sequences the GNU C compiler generates for calls to nested functions. Safe ways to allocate stacks correctly include using memory on the original thread's stack or explicitly allocating memory tagged for execution using memory mapped I/O.
This can cause the problem, so you should use one of the recommended ways to allocate the stack memory (for example memory mapped I/O; see the libc manual for more information).
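If you go the memory-mapped route for the stack that you later hand to makecontext, a minimal sketch could look like the following (SIGSTKSZ and the mmap flags are standard; the worker function and the uc_link wiring are my own illustration):

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/mman.h>
#include <ucontext.h>

static ucontext_t child, main_ctx;

static void worker(void)
{
    printf("running on the mmap'ed stack\n");
    /* returning here resumes main_ctx because of uc_link below */
}

int main(void)
{
    void *stack = mmap(NULL, SIGSTKSZ, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stack == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }

    getcontext(&child);                 /* initialise the context first */
    child.uc_stack.ss_sp   = stack;
    child.uc_stack.ss_size = SIGSTKSZ;
    child.uc_link          = &main_ctx; /* where to return when worker ends */
    makecontext(&child, worker, 0);

    swapcontext(&main_ctx, &child);     /* run worker, then come back here */
    munmap(stack, SIGSTKSZ);
    return EXIT_SUCCESS;
}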

Assert calls segmentation fault

I'd written a basic multi-threading library, so for each thread I have a context (ucontext_t). In one of my test programs, I put an assert which failed. Instead of aborting with a line number, it threw a segmentation fault. I then checked and saw the stack size of my context was 8192. When I increased it to 16384, the assert failure worked as expected.
Can someone tell me how assert works internally and why it would use up so many bytes? I believe 8192 is a fairly large size for my context.
This is how my thread is created:
MyThread *temp;
temp = malloc(sizeof(MyThread_t));
ucontext_t tempContext;
if (getcontext(&tempContext) == -1)
    temp->ThreadId = 0;
tempContext.uc_stack.ss_sp = (char *)malloc(SIZE_STACK*sizeof(char));
tempContext.uc_stack.ss_size = SIZE_STACK*sizeof(char);
tempContext.uc_link = NULL;
makecontext(&tempContext, (void(*)(void))start_funct, 1, args);
And my test function has it this way:
T = MyThreadCreate(t0, (void *)n2);
re = MyThreadJoin(T);
printf("%d\n", re);
assert(re == -1);
The value of re is 0. When SIZE_STACK is 8192 I get a segfault; when it is increased to 16384 I get a proper abort, as expected from assert.
The implementation of assert is platform dependent. On a typical implementation, a failed assert formats a diagnostic (file, line, expression) and writes it to stderr before calling abort(), and that formatting and I/O machinery can need a surprising amount of stack; with only 8192 bytes of context stack it apparently overflows and segfaults before the abort is ever reached.
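As an illustration only (a simplified sketch, not the actual <assert.h> definition on your platform), a failed assert expands to something along these lines:

#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for the real platform assert machinery. */
static void my_assert_fail(const char *expr, const char *file,
                           int line, const char *func)
{
    /* fprintf itself needs a non-trivial amount of stack for its buffers. */
    fprintf(stderr, "%s:%d: %s: Assertion `%s' failed.\n",
            file, line, func, expr);
    abort();
}

#define my_assert(expr) \
    ((expr) ? (void)0 : my_assert_fail(#expr, __FILE__, __LINE__, __func__))

int main(void)
{
    int re = 0;
    my_assert(re == -1);   /* prints the diagnostic, then aborts */
    return 0;
}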

kernel crash with kmalloc

I am trying to allocate memory using kmalloc in kernel code, specifically in a queueing discipline. I want to allocate memory for q->agg_queue_hdr, where q is the queueing discipline and agg_queue_hdr is a struct. If I allocate it like this:
q->agg_queue_hdr = kmalloc(sizeof(struct agg_queue), GFP_ATOMIC);
the kernel crashes. Based on the kmalloc examples I found while searching, I changed it to:
agg_queue_hdr = kmalloc(sizeof(struct agg_queue), GFP_ATOMIC);
which does not crash the kernel. Now I want to know how I can allocate memory for the pointer q->agg_queue_hdr.
Make sure q points to a valid area of memory; then you should be able to assign q->agg_queue_hdr the way you had it to begin with.
You could also modify your code along the following lines, which avoids the kernel panic by checking q before dereferencing it:
if (q) {
    q->agg_queue_hdr = kmalloc(sizeof(struct agg_queue), GFP_ATOMIC);
    if (!q->agg_queue_hdr)
        printk("[+] kmalloc for agg_queue_hdr failed\n");
} else {
    printk("[+] q invalid\n");
    dump_stack(); /* print the call stack in the kernel log */
}
If you disassemble the crashing code, the faulting instruction will be the load ("ldr" on ARM) that dereferences q to reach q->agg_queue_hdr; that is where the kernel panic occurs.

Does valgrind track memory initialization through drivers?

valgrind is reporting uninitialized memory errors from code like this:
unsigned char buf[100];
struct driver_command cmd;

cmd.len = sizeof(buf);
cmd.buf = buf;

ioctl(my_driver_fd, READ, &cmd);

for (i = 0; i < sizeof(buf); i++)
{
    foo(buf[i]); /* <<--- uninit use error from valgrind */
}
If I memset() the buf before the driver call, the error goes away.
Can valgrind detect whether the linux driver is properly writing to the buffer? (I looked at the driver code, and it seems to be correct, but maybe I'm missing something.)
Or does it just pass the driver call through, with no way of knowing that the buffer has been written inside the kernel?
Thanks.
Valgrind obviously can't trace execution into the kernel, but it does know the visible semantics of most system calls. ioctl, however, is too unpredictable for it to model. If you had designed your driver so that this was a read() call, valgrind would get it right; that is better practice anyway.
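If redesigning the driver interface is not an option, Memcheck's client-request API lets you tell it explicitly that the kernel filled the buffer. A minimal sketch (struct driver_command, READ and my_driver_fd are taken from the question and assumed to be defined elsewhere):

#include <sys/ioctl.h>
#include <valgrind/memcheck.h>

void read_from_driver(int my_driver_fd)
{
    unsigned char buf[100];
    struct driver_command cmd;

    cmd.len = sizeof(buf);
    cmd.buf = buf;
    ioctl(my_driver_fd, READ, &cmd);

    /* Tell Memcheck the driver initialised buf inside the kernel, so the
       later reads are no longer flagged as uninitialised. */
    VALGRIND_MAKE_MEM_DEFINED(buf, sizeof(buf));

    /* ... foo(buf[i]) loop as before ... */
}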
