Best method for GPU to CPU communication in OpenCL

I have a kernel that takes no input and whose work items don't communicate with each other. Each work item operates on a different argument derived from its global_id, which is not passed in. I want each work item to process its task, screen the result against some criteria, and write the result back into a global memory array only if it meets those criteria. What is the best way to do this? I considered a __global index that starts at 0 and is incremented on each write, but access to it is not locked, so the parallel work items race on it and I don't know which slot of the output array each work item should write to.
If this were a higher-level language, I would expect to be able to pass in a shared hash or something similar and just push the successful outputs onto it, keyed by global_id, but I'm having trouble figuring out the most appropriate way to do this in OpenCL land. Any thoughts? I am using vanilla C, not C++.

This looks like exactly what I needed, I just lacked the googlefu to get to it!
Please respond if you have any other suggestions on best practices, but for future reference, the above coupled with a __global memory buffer will fulfill my needs.
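The answer referred to here is not shown, so the following is only one common way to solve the slot-assignment problem, not necessarily the one the asker adopted. Each work item reserves a unique output slot with a single atomic fetch-and-add on a shared counter and then writes only to that slot. The sketch below is plain C11 standing in for the kernel side; in OpenCL C the corresponding built-in would be atomic_inc on a __global int counter, which returns the old value. The names result_t, output_t, push_result and work_item, as well as the computation and the screening criterion, are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical result record: which work item produced which value. */
typedef struct {
    size_t id;
    int value;
} result_t;

/* Shared output state. In OpenCL, the counter and the results array
 * would both live in __global memory. */
typedef struct {
    atomic_size_t count;  /* number of slots reserved so far */
    size_t capacity;      /* number of elements in results */
    result_t *results;
} output_t;

/* Reserve a unique slot with one atomic fetch-and-add, then fill it.
 * Returns 1 on success, 0 if the buffer is already full. */
static int push_result(output_t *out, size_t id, int value)
{
    assert(out != NULL && out->results != NULL);
    size_t slot = atomic_fetch_add(&out->count, 1);
    if (slot >= out->capacity)
        return 0;  /* the counter overshoots, but nothing is written */
    out->results[slot].id = id;
    out->results[slot].value = value;
    return 1;
}

/* Stand-in for one work item: derive the task from the id, screen the
 * result, and publish it only if it passes. */
static void work_item(output_t *out, size_t id)
{
    int value = (int)(id * id);  /* hypothetical computation */
    if (value % 3 == 0)          /* hypothetical screening criterion */
        push_result(out, id, value);
}
```

Because the fetch-and-add hands each caller a different slot number, no two work items write to the same element, and the final counter value tells the host how many results to read back (clamped to the capacity, since the counter keeps counting after the buffer is full). The order of results in the array is not deterministic when work items really run in parallel.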

Related

LabVIEW variable array size in SubVIs on FPGA

I have acquisition code running on a cRIO FPGA target. The data is acquired from the I/O nodes and assembled into an array. This array should always be the same size, so I check that with a SubVI. The problem is that I use conditional disable structures to swap in different acquisition code for different targets with different channel counts. Now the compiler complains that it can't resolve the array to a fixed size, which is not true, because the compiler could easily count the elements.
How do I have to write my SubVI so that it accepts an array whose size varies at compile time? The "array size" function from the array palette can do this too. How?

You can use lookup tables instead to achieve your goal. Or, if you have to send this array to an RT VI, it would be more professional to use a DMA FIFO instead. On the RT side you can poll the FIFO and read as many points at a time as you like.

In short this is not possible with standard LabVIEW arrays as the size must be fixed for compilation (as these basically come down to wires in the chip).
There are two options when you actually need a variable size:
Simple and wasteful - If there is a reasonable upper bound, you can size the array for that maximum and use logic to track where the data ends. This means compiling resources for the maximum size, and anything beyond a few hundred bytes will use up a lot of logic.
Scalable but slightly harder - The only way to achieve a large variable-size array is to use one of the available memory options with some added logic for defining the size. Depending on the size, you can use either look-up tables (LUTs) or block RAM. Again, LUTs use up logic quickly, so they should only be used for small arrays (I can't remember the exact recommended size, but probably < 500 bytes). If you've not used these before, you can find some initial reading at http://zone.ni.com/reference/en-XX/help/371599H-01/lvfpgaconcepts/fpga_storing_data/#Memory_Items
Either way, you will have to pass the SubVI the size of the array so that it knows how far into the memory to read; this would simply be another input.
More commonly in LabVIEW FPGA most processing is done on point-by-point data so you can centralise the storage logic without having to pass this around, however this depends on the nature of the algorithm.

How to save state and return to a deep C function?

Background
I am porting an existing C program to work as an online game using Emscripten.
The problem is that Emscripten wants the program to be organised around a single function that is called 60 times a second. This is okay for the main game loop, except that there are quite a few places where the code displays a set of options and then waits for a key to be pressed to select the option. This is expressed as a function deep in a calling hierarchy using getch() to wait for a keypress. I find it hard to see how to fit this into the required Emscripten style of a function that runs and then completes.
Question
When the code has called a function, which has called a function, which has called a function, is there an easy way of saving the entire state of the callstack so that I can return to the same place at a later time?
What I've tried
The approach I am currently using is to set a global state variable to indicate my current location and to write all the things on the stack that seem important into static variables. I then return from all the functions. To reenter I use the global variable to decide which function to call and which variables to reload from saved data. However, this involves writing a lot of extra code and is very error-prone.
I wondered about using a thread for the game logic and just sending messages from the GUI thread, but the current thread API inside Emscripten seems to require me to try and copy all of the game data into a message so this feels like a lot more work for little benefit.
Emscripten supports setjmp/longjmp, but as far as I understand, this only does half the job. I think I can use a longjmp to return from a deep function back to the upper level, but I don't see any way to use it to go back later to where I was.
Any better ideas on how I can do this?

You cannot return from a call stack and re-enter it later. You can only make deeper calls and still be able to return to the current state. Once a function returns, the same stack (the same physical memory locations) is reused by subsequent calls, and the values get overwritten.
I don't know Emscripten; could a getch() wrapper recursively drive the loop until a key is pressed?
setjmp/longjmp saves the stack offset, but not the values on the stack. It's only useful for popping multiple frames off the stack; it's the closest C comes to a thrown exception.
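A minimal sketch of that one-way behaviour, using only standard C. The function names and the bailout_code variable are made up for illustration. The jump discards the intermediate frames, so there is nothing left to resume afterwards.

```c
#include <setjmp.h>

static jmp_buf top_level;  /* saved context of the top-level frame */
static int bailout_code;   /* value handed back by the deep function */

static void innermost(int key)
{
    if (key != 0) {
        bailout_code = key;
        /* Unwind straight to the setjmp call; the frames of innermost()
         * and middle() are discarded and cannot be re-entered. */
        longjmp(top_level, 1);
    }
}

static void middle(int key)
{
    int scratch = key * 2;  /* local state that is lost after a longjmp */
    (void)scratch;
    innermost(key);
}

/* Returns the code set by the deep function, or 0 if nothing jumped. */
static int run_until_bailout(int key)
{
    if (setjmp(top_level) != 0)
        return bailout_code;  /* second return from setjmp, via longjmp */
    middle(key);
    return 0;                 /* normal return path through every frame */
}
```

Calling run_until_bailout(7) comes back through the setjmp branch with 7, while run_until_bailout(0) returns normally through every frame. In neither case can execution later continue inside middle() where it left off.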

You can try Asyncify (https://github.com/kripken/emscripten/wiki/Asyncify), but it is not recommended. A better option would be the Emterpreter (https://github.com/kripken/emscripten/wiki/Emterpreter) or doppio (https://github.com/plasma-umass/doppio). There might be a better solution in the future if a later version of the WebAssembly standard becomes usable. Until then, the only certain ways to get such a program running are to eliminate all recursion or to implement your own call stack. Either way, you will have to save your state outside the JavaScript stack, because the program itself cannot manipulate that stack. That is also why longjmp does not work in this case.
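Keeping the state outside the stack could look roughly like the sketch below, in which the blocking getch() is replaced by a state that simply waits across frames. Everything here is hypothetical: the game_t structure, the state names, the 'm' key and the three menu options are invented for illustration, and the key is assumed to be collected elsewhere (for example by an input handler) and passed in once per frame. With Emscripten, a function like this would be driven from the callback given to emscripten_set_main_loop.

```c
/* Where the game currently is; this replaces "being inside getch()". */
typedef enum {
    STATE_PLAYING,
    STATE_MENU_WAIT
} game_state_t;

typedef struct {
    game_state_t state;
    int frames_played;
    int chosen_option;  /* -1 until a menu key arrives */
} game_t;

#define NO_KEY (-1)     /* passed when no key was pressed this frame */

/* One frame of work. Never blocks: if the menu is open and no valid key
 * has arrived, it just returns and is called again on the next frame. */
static void game_tick(game_t *g, int key)
{
    switch (g->state) {
    case STATE_PLAYING:
        g->frames_played++;
        if (key == 'm') {            /* hypothetical "open menu" key */
            g->chosen_option = -1;
            g->state = STATE_MENU_WAIT;
        }
        break;
    case STATE_MENU_WAIT:
        if (key >= '1' && key <= '3') {
            g->chosen_option = key - '0';
            g->state = STATE_PLAYING;
        }
        break;
    }
}
```

Everything the menu needs to survive between frames lives in game_t rather than in local variables, which is the manual bookkeeping the question describes as error-prone. The sketch shows the shape of the approach, not a way to avoid that work.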

C ncurses prevent resize

I am starting to learn how to use ncurses right now, and I do some calculations based on the number of lines and columns when the program starts.
It would be too much work for me to do dynamic calculations to manage the display, so I need a way to block resizing of the shell during execution. Is this possible?

There is certainly no portable or general-purpose way of blocking display size changes. Specific terminal emulators might offer this feature, but I don't know of any which do. It is generally possible to create a window of fixed size, but the terminal emulator would have to do that; it is invisible to the console code running inside the terminal.
If you find it difficult to respond to dynamic display size changes, you probably need to restructure your code. Otherwise, you can just ignore the size change, which might result in a confusing experience for your users, or might just result in them seeing either a portion of the output or a lot of blank space, depending on the nature of the resizing. (To get the latter effect, you need to avoid relying on automatic line wrapping and scrolling. On the other hand, automatic wrapping and scrolling are often just what you need to make your application window-size-independent.)
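One way to make the restructuring cheaper is to keep every size-dependent calculation in a single pure function that can be called again whenever the dimensions change. The sketch below is plain C with no ncurses calls; the layout (a body area plus a one-line status bar), the minimum size and all names are assumptions for illustration. In an ncurses program the inputs would typically come from getmaxyx(stdscr, rows, cols), once at startup and again whenever getch() returns KEY_RESIZE.

```c
/* A rectangular screen region, in character cells. */
typedef struct {
    int top, left, height, width;
} rect_t;

/* Hypothetical layout: a body area with a one-line status bar below. */
typedef struct {
    rect_t body;
    rect_t status;
    int usable;  /* 0 when the terminal is too small to draw anything */
} layout_t;

#define MIN_ROWS 5   /* assumed minimum usable size */
#define MIN_COLS 20

/* Derive the whole layout from the terminal size. No global state is
 * touched, so it is safe to call again after every size change. */
static layout_t compute_layout(int rows, int cols)
{
    layout_t l = { {0, 0, 0, 0}, {0, 0, 0, 0}, 0 };

    if (rows < MIN_ROWS || cols < MIN_COLS)
        return l;  /* caller can show a "terminal too small" message */

    l.usable = 1;
    l.body.height = rows - 1;
    l.body.width = cols;
    l.status.top = rows - 1;
    l.status.height = 1;
    l.status.width = cols;
    return l;
}
```

With the calculations isolated like this, reacting to a resize amounts to recomputing the layout and redrawing, instead of tracking down every place that cached the startup dimensions.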

Advanced Memory Editing/Function Calling

I've become extremely interested in coding trainers (programs that modify values in a different process) for video games. I've done the simple 'god-mode' and 'unlimited money' things (simple editing using WriteProcessMemory), but I want to do a lot more than that.
Memory addresses of functions in the video game I'm working on are available on the internet. One of those functions is something like "CreateCar", and I want to call that function from an external program.
My question: How can I call a function in an external process from C/C++, given the function address, using a process handle or some other method?
PS: If anyone could link me to tools (I've got debuggers, no need for more..) that help with this sort of thing, that'd be nice.

You can't, at least not safely. If the function has exactly one parameter, you can create a new thread in that process at the function address. If it has more, you might want to inject a DLL to do it.
But neither of these solutions is safe, because creating a new thread to run the function can and will corrupt data structures if other threads are currently using them. The only safe way to call a function in another process is to somehow insert the call in exactly the right place in that process so that it's logically correct for that program. Never mind the technical hurdles (inserting hooks at arbitrary locations); you need to know exactly how the program works, which basically means you have a lot of reverse engineering ahead of you (or you need to get the source code).

send glib hashtable with MPI

I recently came across a problem with my parallel program. Each process has several glib hashtables that need to be exchanged with other processes, and these hashtables may be quite large. What is the best approach to achieve that?
1. create a derived datatype
2. use MPI pack and unpack
3. send keys & values as arrays (a problem, since the number of elements is not known at compile time)
I haven't used 1 & 2 before and don't even know if that's possible, which is why I am asking you guys.

Pack/unpack creates a copy of your data: if your maps are large, you'll want to avoid that. This also rules out your 3rd option.
You can indeed define a custom datatype, but it'll be a little tricky. See the end of this answer for an example (replacing "graph" with "map" and "node" with "pair" as you read). I suggest you read up on these topics to get a firm understanding of what you need to do.
That the number of elements is not known at compile time shouldn't be a real issue. You can just send a message containing the payload size before sending the map contents. This will let the receiving process allocate just enough memory for the receive buffer.
You may also want to consider simply printing the contents of your maps to files, and then having the processes read each other's output. This is much more straightforward, but also less elegant and much slower than message passing.
