OpenCL runtime error in loop (clEnqueueWriteBuffer) - C

When I use OpenCL to process many chunks of data, it crashes on the 7th iteration.
I ensure that memory is released before each iteration of the loop and allocated again for the new chunk, but the crash still occurs, with error -38 from clEnqueueWriteBuffer().
I have tried a lot, but am not getting anywhere.
The following is the flow of my code:
clGetPlatformIDs
clGetDeviceIDs
clCreateContext
clCreateCommandQueue
clCreateProgramWithSource
clBuildProgram
clCreateKernel
for (x) {
    clCreateBuffer
    clEnqueueWriteBuffer
    clSetKernelArg
    clEnqueueNDRangeKernel
    clFinish
    clEnqueueMapBuffer
    clReleaseMemObject
}
Is this correct, or do I have to structure it differently?
If so, what am I doing wrong?

Some code and the specific command where this error comes up would be nice.
Error -38 is CL_INVALID_MEM_OBJECT
Please check that you initialised all memory objects correctly.
Could you explicitly check the output of clCreateBuffer, clCreateImage, or whatever you are using? This error can also occur if the buffer you provided to your kernel doesn't match its parameter definition in terms of type or read/write modifiers.
EDIT to match the edited question:
1) You can change a kernel arg while the kernel is not running, but good practice is to set a kernel arg only once (at best directly after clCreateKernel).
Even better is to reuse the assigned buffer. (Or create several kernels if you use the same buffer combinations several times)
In your case I would at least do createBuffer and setKernelArg before the loop and releaseMemObject after the loop.
2) You are doing clEnqueueMapBuffer on your mem object. This should be followed by a clEnqueueUnmapMemObject when you are done interacting with the mapped region. If you just want to read data from your buffer, try clEnqueueReadBuffer as the equivalent of clEnqueueWriteBuffer.
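Putting 1) and 2) together, here is a minimal sketch of the restructured flow (not a drop-in fix). It assumes ctx, queue and kernel were created as in your flow above, and input, output, chunk_size, nchunks and global_size stand in for your own data:

#include <CL/cl.h>

static cl_int process_chunks(cl_context ctx, cl_command_queue queue,
                             cl_kernel kernel, size_t chunk_size,
                             size_t nchunks, char **input, char **output,
                             size_t global_size)
{
    cl_int err;
    /* create the buffers once, before the loop */
    cl_mem in_buf = clCreateBuffer(ctx, CL_MEM_READ_ONLY, chunk_size, NULL, &err);
    if (err != CL_SUCCESS) return err;
    cl_mem out_buf = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, chunk_size, NULL, &err);
    if (err != CL_SUCCESS) return err;

    /* set the kernel args once, directly after creating the buffers */
    clSetKernelArg(kernel, 0, sizeof in_buf, &in_buf);
    clSetKernelArg(kernel, 1, sizeof out_buf, &out_buf);

    for (size_t i = 0; i < nchunks; i++) {
        /* blocking write of this chunk's input */
        err = clEnqueueWriteBuffer(queue, in_buf, CL_TRUE, 0, chunk_size,
                                   input[i], 0, NULL, NULL);
        if (err != CL_SUCCESS) break;
        err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size,
                                     NULL, 0, NULL, NULL);
        if (err != CL_SUCCESS) break;
        /* blocking read instead of map/unmap */
        err = clEnqueueReadBuffer(queue, out_buf, CL_TRUE, 0, chunk_size,
                                  output[i], 0, NULL, NULL);
        if (err != CL_SUCCESS) break;
    }

    /* release once, after the loop */
    clReleaseMemObject(in_buf);
    clReleaseMemObject(out_buf);
    return err;
}

Checking err after every call should also pinpoint exactly which call first reports CL_INVALID_MEM_OBJECT.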

Related

cgo Interacting with C Library that uses Thread Local Storage

I'm in the midst of wrapping a C library with cgo to be usable by normal Go code.
My problem is that I'd like to propagate error strings up to the Go API, but the C library in question makes error strings available via thread-local storage; there's a global get_error() call that returns a pointer to thread local character data.
My original plan was to call into C via cgo, check if the call returned an error, and if so, wrap the error string using C.GoString to convert it from a raw character pointer into a Go string. It'd look something like C.GoString(C.get_error()).
The problem that I foresee here is that TLS in C works on the level of native OS threads, but in my understanding, the calling Go code will be coming from one of potentially N goroutines that are multiplexed across some number of underlying native threads in a thread pool managed by the Go scheduler.
What I'm afraid of is running into a situation where I call into the C routine, then after the C routine returns, but before I copy the error string, the Go scheduler decides to swap the current goroutine out for another one. When the original goroutine gets swapped back in, it could potentially be on a different native thread for all I know, but even if it gets swapped back onto the same thread, any goroutines that ran there in the intervening time could've changed the state of the TLS, causing me to load an error string for an unrelated call.
My questions are these:
Is this a reasonable concern? Am I misunderstanding something about the Go scheduler, or the way it interacts with cgo, that would cause this to not be an issue?
If this is a reasonable concern, how can I work around it?
cgo somehow manages to propagate errno values back to the calling Go code, which are also stored in TLS, which makes me think there must be a safe way to do this.
I can't think of a way that the C code itself could get preempted by the Go scheduler, so should I introduce a wrapper C function and have IT make the necessary call and then conditionally copy the error string before returning back up to Go-land?
I'm interested in any solution that would allow me to propagate the error strings out to the rest of Go, but I'm hoping to avoid any solution that would require me to serialize accesses around the TLS, as adding a lock just to grab an error string seems greatly unfortunate to me.
Thanks in advance!
What I'm afraid of is running into a situation where I call into the C routine, then after the C routine returns, but before I copy the error string, the Go scheduler decides to swap the current goroutine out for another one. ...
Is this a reasonable concern?
Yes. The cgo "call C code" wrappers lock on to one POSIX / OS thread for the duration of each call, but the thread they lock is not fixed for all time; it does in fact bop around, as it were, to multiple different threads over time, as long as your goroutines are operating normally. (Since Go is cooperatively scheduled in the current implementations, you can, in some circumstances, be careful not to do anything that might let you switch underlying OS threads, but this is probably not a good plan.)
You can use runtime.LockOSThread here, but I think the best plan is otherwise:
how can I work around it?
Grab the error before Go resumes its normal scheduling algorithm (i.e., before unlocking the goroutine from the C / POSIX thread).
cgo somehow manages to propagate errno values ...
It grabs the errno value before unlocking the goroutine from the POSIX thread.
My original plan was to call into C via cgo, check if the call returned an error, and if so, wrap the error string using C.GoString to convert it from a raw character pointer into a Go string. It'd look something like C.GoString(C.get_error()).
If there is a variant of this that takes the error number (rather than fishing it out of a TLS variable), that plan should still work: just make sure that your C routines provide both the return value and the error number.
If not, write your own C wrapper, just as you suggested:
ftype wrapper_for_realfunc(char **errp, arg1type arg1, arg2type arg2) {
    ftype ret = realfunc(arg1, arg2);
    if (IS_ERROR(ret)) {
        *errp = get_error();
    } else {
        *errp = NULL;
    }
    return ret;
}
Now your Go wrapper simply calls the wrapper, which fills in a pointer to C memory with an extra *C.char argument, setting it to nil if there is no error, and setting it to something on which you can use C.GoString if there is an error.
If that's not feasible for some reason, consider using runtime.LockOSThread and its counterpart, runtime.UnlockOSThread.

Can a process somehow continue without crashing after receiving SIGSEGV or SIGBUS?

I am working on a project that deals with multiple processes and threads affecting the same data. I have a line of code which can result in a segmentation fault, because the data can be updated from anywhere.
For that particular line, if it causes segmentation fault, I somehow want to handle it instead of letting the program crash.
For example, I could simply update the memory mapping if the previous one was causing a segmentation fault.
Is there any possible way to do that?
UPDATE(A short summary of my case):
I want extremely speedy access to a file.
For that purpose, I am calling mmap(2) to map that file into all processes accessing it. The data I am writing to the file is in the form of a particular data structure, and it consumes lots of memory. If a point comes where the size I mapped is not enough, I need to increase the file size and mmap(2) the file again with the new size. To increase the size I call ftruncate(2). ftruncate(2) may be called from any process, so it may end up shrinking the file instead. So I need to check that the memory I am accessing won't lead to a segfault.
I am working on macOS.
This can be made to work, but by bringing signal handlers into the picture you make your inter-process and inter-thread locking problems much more complicated. I would like to suggest an alternative approach: Reserve a field in the first page of the mmapped file to indicate the expected size of the data structure. Use fcntl file locks to mediate access to this field.
When any process wants to update the size, it takes a write lock, reads the current value, increases it, msyncs the page (using MS_ASYNC|MS_INVALIDATE should be enough), then uses ftruncate to enlarge the file, then enlarges its mapping of the file, and only then releases the write lock. If, after taking the write lock, you find that the file is already larger than the size you wanted, just enlarge your mapping and drop the lock, don't call ftruncate or change the field.
This ensures cooperating processes will never make the file smaller, and the region of memory each process has mapped is always backed by allocated storage, so you shouldn't ever get any SIGBUSes. Note that the size of the file on disk will only increase when you actually write to newly allocated space, thanks to the magic of sparse files.
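A sketch of that protocol, assuming the size field lives in a header on the first mapped page; the names (shared_hdr, grow_mapping) are illustrative, and error handling is abbreviated:

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct shared_hdr { size_t file_size; };    /* lives on the first page */

static void *grow_mapping(int fd, void *old_map, size_t old_len, size_t want)
{
    struct flock lk;
    memset(&lk, 0, sizeof lk);
    lk.l_type = F_WRLCK;                    /* lock the header region */
    lk.l_whence = SEEK_SET;
    lk.l_len = sizeof(struct shared_hdr);
    fcntl(fd, F_SETLKW, &lk);

    struct shared_hdr *hdr = old_map;
    if (hdr->file_size < want) {            /* another process may have grown it */
        hdr->file_size = want;
        msync(old_map, sizeof *hdr, MS_ASYNC | MS_INVALIDATE);
        ftruncate(fd, (off_t)want);         /* only ever grows under the lock */
    } else {
        want = hdr->file_size;              /* already big enough: just remap */
    }

    munmap(old_map, old_len);               /* macOS has no mremap: map afresh */
    void *m = mmap(NULL, want, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    lk.l_type = F_UNLCK;                    /* release only after remapping */
    fcntl(fd, F_SETLKW, &lk);
    return m;
}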
Yes, you can make this work with a signal handler that catches the SIGSEGV or SIGBUS, adjusts the mmap and returns. When a signal handler returns it will resume where the signal occurred, which means for a synchronous signal like SIGSEGV or SIGBUS, it will rerun the faulting instruction.
You can see this at work in my shared memory malloc implementation -- search for shm_segv in malloc.c to see the signal handler; it's pretty simple. I've never tried this code on macOS, but I would expect it to work, as it works on all the other BSD-derived UNIXes I've tried it on. There's an issue that, according to the POSIX spec, mmap is not async-signal-safe, so it cannot be called from a signal handler, but on all systems that actually support real memory mapping (rather than emulating it with malloc+read) it should be fine.
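A minimal self-contained sketch of the same idea, assuming the file is mapped larger than its current size so that touching a page past EOF raises SIGBUS; the handler grows the file and returns, and the faulting instruction re-runs. (ftruncate, unlike mmap, is on the POSIX async-signal-safe list; the 4096 page size is assumed for brevity.)

#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static int    map_fd;                       /* illustrative globals */
static char  *map_base;
static size_t map_len;                      /* address space reserved up front */
static size_t file_len;                     /* current size of the backing file */

static void bus_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    char *addr = (char *)info->si_addr;
    if (addr >= map_base && addr < map_base + map_len) {
        /* grow the file to cover the faulting page, then retry the access */
        file_len = ((size_t)(addr - map_base) / 4096 + 1) * 4096;
        if (ftruncate(map_fd, (off_t)file_len) == 0)
            return;
    }
    _exit(1);                               /* a fault we can't fix */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = bus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGBUS, &sa, NULL);

    map_fd = open("data.bin", O_RDWR | O_CREAT, 0644);
    file_len = 4096;
    ftruncate(map_fd, (off_t)file_len);
    map_len = 1 << 20;                      /* map more than the file holds */
    map_base = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                    MAP_SHARED, map_fd, 0);
    if (map_base == MAP_FAILED)
        return 1;

    map_base[8 * 4096] = 42;                /* past EOF: SIGBUS, handler grows */
    printf("read back: %d\n", map_base[8 * 4096]);
    return 0;
}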

Should we error check every call in C?

When we write C programs we make calls to malloc or printf. But do we need to check every call? What guidelines do you use?
e.g.
char error_msg[BUFFER_SIZE];
if (fclose(file) == EOF) {
    /* snprintf guards against overflow; perror supplies the errno text itself */
    snprintf(error_msg, sizeof error_msg, "Error closing %s", filename);
    perror(error_msg);
}
The answer to your question is: "Do whatever you want"; there is no written rule. The right question is "What do users want in case of failure?"
Let me explain: if you are a student writing a test program, for example, there is no absolute need to check for errors; it may be a waste of time.
Now, if your code may be distributed or used by other people, that's quite different: put yourself in the shoes of future users. Which message do you prefer when something goes wrong with an application:
Core was generated by `./cut --output-d=: -b1,1234567890- /dev/fd/63'.
Program terminated with signal SIGSEGV, Segmentation fault.
or
MySuperApp failed to start MySuperModule because there is not enough space on the disk.
Try to free space on disk, then relaunch the app.
If this error persists, contact us at support@mysuperapp.com
As has already been addressed in the comments, you have to consider two types of error:
A fatal error is one that kills your program (app / server / site / whatever it is). It renders the program unusable, either by crashing it or by putting it in some state whereby it can't do its useful work, e.g. a failed memory allocation or no disk space left.
A non-fatal error is one where something goes wrong, but the program can continue to do what it's supposed to do, e.g. a file is not found, so you keep serving the other users who didn't request the thing that caused the error.
Source : https://www.quora.com/What-is-the-difference-between-an-error-and-a-fatal-error
Just do error checking if your program has to behave differently when an error is detected. Let me illustrate this with an example: assume you have used a temporary file in your program, and you use the unlink(2) system call to erase it at the end of the program. Do you have to check whether the file has been successfully erased? Let's analyse the problem with some common sense: if you check for errors, will you be able (inside the program) to do some alternate thing to cope with the failure? This is uncommon (if you created the file, it's rare that you will not be able to erase it, though something can happen in between --- for example, a change in directory permissions that forbids you to write to the directory anymore). But what can you do in that case? Is it possible to use a different approach to erase the temporary file? Probably not... so checking for a possible error from the unlink(2) system call is almost useless in that case.
Of course, this doesn't always apply; you have to use common sense while programming. Errors from writing to files should always be considered, as they usually come down to access permissions or, most often, to full filesystems (in that case, even trying to write a log message can be useless, since you may already have filled your disk --- or not; that depends). You don't always know the precise environment details needed to decide whether a full-filesystem error can be ignored. Suppose you have to connect to a server in your program: should a connect(2) failure be acted upon? Probably, most of the time; at the very least, a message with the protocol error (or the cause of the failure) should be shown to the user. Assuming everything goes OK can save you time in a prototype, but in production programs you have to cope with what can happen.
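The same judgment, written out in code; tmpname, sock and addr are illustrative names:

#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

void cleanup_and_connect(const char *tmpname, int sock,
                         const struct sockaddr_in *addr)
{
    /* best effort: nothing useful can be done if this fails */
    (void)unlink(tmpname);

    /* the user needs to see the cause of a connection failure */
    if (connect(sock, (const struct sockaddr *)addr, sizeof *addr) == -1) {
        perror("connect");
        exit(EXIT_FAILURE);
    }
}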
When you want to use the return value of a function, it is suggested to check that value before using it.
For example, a function returning a pointer can also return NULL, so it is suggested to do a NULL check before dereferencing it.

Ptrace mprotect debugging trouble

I'm having trouble with a research project.
What I am trying to do is use ptrace to watch the execution of a target process.
With the help of ptrace I am injecting an mprotect syscall into the target's code segment (similar to a breakpoint) and setting the stack protection to PROT_NONE.
After that I restore the original instructions and let the target continue.
When I get an invalid-permission segfault, I again inject the syscall to unprotect the stack, then execute the instruction which caused the segfault, and protect the stack again.
(This does indeed work for simple programs.)
My problem now is that with this setup the target (pretty) randomly crashes in library function calls (no matter whether I use dynamic or static linking).
By crashing I mean it either tries to access memory which for some reason is not mapped, or it just hangs in the function __lll_lock_wait_private (following a malloc call).
Let me emphasize again that the crashes don't always happen, and don't always happen at the same positions.
It sounds like a synchronisation problem, but as far as I can tell (meaning I looked into /proc/pid/tasks/) there is only one thread running.
So do you have any clue what could be the reason for this?
Please tell me your suggestions even if you are not sure; I am running out of ideas here...
It's also possible the non-determinism is created by address space randomization.
You may want to disable that to try and make the problem more deterministic.
EDIT:
Given that turning ASLR off 'fixes' the problem, the underlying problem might be:
Somewhere thinking 0 is invalid when it should be valid, or vice versa. (What I had.)
Using addresses from one run against a different run?
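If you want to try that, one way to disable the randomization on Linux is to clear it with personality(2) before exec'ing the target (the same effect as running it under setarch -R); a sketch with abbreviated error handling:

#include <stdio.h>
#include <sys/personality.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s target [args...]\n", argv[0]);
        return 1;
    }
    /* the personality survives execvp, so the target runs without ASLR */
    if (personality(ADDR_NO_RANDOMIZE) == -1)
        perror("personality");
    execvp(argv[1], &argv[1]);
    perror("execvp");
    return 1;
}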

Is there a way to pre-emptively avoid a segfault?

Here's the situation:
I'm analysing a programs' interaction with a driver by using an LD_PRELOADed module that hooks the ioctl() system call. The system I'm working with (embedded Linux 2.6.18 kernel) luckily has the length of the data encoded into the 'request' parameter, so I can happily dump the ioctl data with the right length.
However, quite a lot of this data has pointers to other structures, and I don't know the lengths of these (this is what I'm investigating, after all). So I'm scanning the data for pointers and dumping the data at each such position. I'm worried this could leave my code open to segfaults if a pointer is close to a segment boundary (and my early testing seems to show this is the case).
So I was wondering what I can do to pre-emptively check whether the current process owns a particular offset before trying to dereference? Is this even possible?
Edit: Just an update as I forgot to mention something that could be very important, the target system is MIPS based, although I'm also testing my module on my x86 machine.
Open a file descriptor to /dev/null and try write(null_fd, ptr, size). If it returns -1 with errno set to EFAULT, the memory is invalid. If it returns size, the memory is safe to read. There may be a more elegant way to query memory validity/permissions with some POSIX invention, but this is the classic simple way.
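A sketch of that probe; it returns 1 if the whole range is readable, 0 if not, and -1 if /dev/null can't be opened:

#include <fcntl.h>
#include <unistd.h>

static int mem_readable(const void *ptr, size_t size)
{
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0)
        return -1;
    /* write() returns -1 with errno == EFAULT if the range is invalid */
    ssize_t n = write(fd, ptr, size);
    close(fd);
    return n == (ssize_t)size;
}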
If your embedded Linux has the /proc/ filesystem mounted, you can parse the /proc/self/maps file and validate the pointer/offsets against it. The maps file contains the memory mappings of the process; see proc(5).
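For example, a sketch of validating a pointer against /proc/self/maps (each line starts with "start-end" in hex):

#include <stdint.h>
#include <stdio.h>

static int addr_mapped(const void *addr)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;                          /* no procfs available */

    uintptr_t target = (uintptr_t)addr;
    unsigned long lo, hi;
    char line[512];
    int found = 0;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "%lx-%lx", &lo, &hi) == 2 &&
            target >= lo && target < hi) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}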
I know of no such possibility. But you may be able to achieve something similar. As man 7 signal mentions, SIGSEGV can be caught. Thus, I think you could:
1. Start with dereferencing a byte sequence known to be a pointer
2. Access one byte after the other, at some point triggering SIGSEGV
3. In SIGSEGV's handler, set a variable that is checked in the loop of step 2
4. Quit the loop; this page is done.
There are several problems with that:
1. Since several buffers may live in the same page, you might output what you think is one buffer but is, in reality, several. You may be able to help with that by also LD_PRELOADing electric fence, which would, AFAIK, cause the application to allocate a whole page for every dynamically allocated buffer. Then you would not output several buffers thinking they are one, but you still don't know where a buffer ends and would output much garbage at the end. Also, stack-based buffers can't be helped by this method.
2. You don't know where the buffers end.
Untested.
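One wrinkle with the flag-and-return scheme above: returning from a SIGSEGV handler re-runs the faulting instruction, which would fault again immediately. The usual escape is to siglongjmp out of the handler instead; a sketch (also untested on your target, and not thread-safe):

#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

static void probe_handler(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);               /* jump past the faulting access */
}

/* returns 1 if the byte at p is readable, 0 if it faulted */
static int probe_byte(const volatile char *p)
{
    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    int ok = 0;
    if (sigsetjmp(probe_env, 1) == 0) {
        (void)*p;                           /* may raise SIGSEGV/SIGBUS */
        ok = 1;
    }
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return ok;
}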
Can't you just check for the segment boundaries? (I'm guessing by segment boundaries you mean page boundaries?)
If so, page boundaries are well delimited (either 4K or 8K) so simple masking of the address should deal with it.
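For example, a sketch of computing how many bytes remain readable on the page containing ptr, querying the page size rather than assuming 4K or 8K:

#include <stdint.h>
#include <unistd.h>

size_t bytes_left_on_page(const void *ptr)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return page - ((uintptr_t)ptr & (page - 1));
}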
