Send GLib hashtable with MPI - C

I recently came across a problem with my parallel program. Each process has several GLib hashtables that need to be exchanged with other processes, and these hashtables may be quite large. What is the best approach to achieve that?
Create a derived datatype
Use MPI pack and unpack
Send keys and values as arrays (a problem, since the number of elements is not known at compile time)
I haven't used 1 or 2 before and don't even know if that's possible, which is why I'm asking you guys.

Pack/unpack creates a copy of your data: if your maps are large, you'll want to avoid that. This also rules out your 3rd option.
You can indeed define a custom datatype, but it'll be a little tricky. See the end of this answer for an example (replacing "graph" with "map" and "node" with "pair" as you read). I suggest you read up on these topics to get a firm understanding of what you need to do.
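Purely as an illustration (the int key and double value here are assumptions, not something from your question), a derived datatype for one fixed-size key/value pair could look like the sketch below; variable-length keys or values make this trickier and usually mean describing each pair with absolute displacements, as in the linked example.

#include <mpi.h>
#include <stddef.h>

/* Hypothetical fixed-size pair; real GHashTable entries may first need flattening. */
typedef struct {
    int    key;
    double value;
} pair_t;

/* Build an MPI datatype describing pair_t so arrays of pairs can be sent directly. */
MPI_Datatype make_pair_type(void)
{
    MPI_Datatype pair_type;
    int          blocklens[2] = { 1, 1 };
    MPI_Aint     displs[2]    = { offsetof(pair_t, key), offsetof(pair_t, value) };
    MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };

    MPI_Type_create_struct(2, blocklens, displs, types, &pair_type);
    MPI_Type_commit(&pair_type);
    return pair_type;
}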
That the number of elements is not known at compile time shouldn't be a real issue. You can just send a message containing the payload size before sending the map contents. This will let the receiving process allocate just enough memory for the receive buffer.
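A minimal sketch of that handshake, reusing the hypothetical pair_t and pair datatype from the sketch above (error handling omitted):

#include <mpi.h>
#include <stdlib.h>

/* Sender: announce how many pairs follow, then send the payload. */
void send_map(pair_t *pairs, int count, int dest, MPI_Datatype pair_type)
{
    MPI_Send(&count, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
    MPI_Send(pairs, count, pair_type, dest, 1, MPI_COMM_WORLD);
}

/* Receiver: learn the size first, allocate exactly enough, then receive. */
pair_t *recv_map(int src, MPI_Datatype pair_type, int *count_out)
{
    MPI_Recv(count_out, 1, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    pair_t *pairs = malloc((size_t) *count_out * sizeof *pairs);
    MPI_Recv(pairs, *count_out, pair_type, src, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    return pairs;
}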
You may also want to consider simply printing the contents of your maps to files, and then having the processes read each other's output. This is much more straightforward, but also less elegant and much slower than message passing.

Related

Send structure with dynamically allocated arrays in MPI

I would like an opinion on a situation I find myself in on my project. I basically have a structure composed of an int, an array of ints, and an array of chars. I need to send this structure to another process. The problem is that the two arrays in the structure are dynamically allocated (in practice they are pointers). What's the best way to do this?
I have three main ideas:
Use Pack and Unpack, but this is very heavy (this send is repeated often).
Concatenate all three pieces of data into a single array and send it with a simple send.
Use MPI_Type_struct, but I don't know the exact size of these arrays (it changes with every send).
Can someone help me, please? I'm not posting code because it is very complex and long.
Of those three options Pack/Unpack is probably the best choice. (2) is basically what Pack/Unpack does. MPI_Type_struct won't work if elements inside the struct are dynamically allocated.
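As a rough sketch of the Pack/Unpack route (the struct layout and field names here are hypothetical, since your actual code isn't shown; error handling omitted):

#include <mpi.h>
#include <stdlib.h>

/* Hypothetical layout matching the description: one int plus two dynamic arrays.
   Both arrays are assumed to have n elements; adjust if they differ. */
typedef struct {
    int   n;
    int  *ivals;
    char *cvals;
} msg_t;

void send_packed(msg_t *m, int dest, MPI_Comm comm)
{
    /* Ask MPI how much buffer space the three pieces need, then pack them. */
    int sz_n, sz_ivals, sz_cvals;
    MPI_Pack_size(1,    MPI_INT,  comm, &sz_n);
    MPI_Pack_size(m->n, MPI_INT,  comm, &sz_ivals);
    MPI_Pack_size(m->n, MPI_CHAR, comm, &sz_cvals);

    int bufsize = sz_n + sz_ivals + sz_cvals;
    char *buf = malloc((size_t) bufsize);
    int pos = 0;
    MPI_Pack(&m->n,    1,    MPI_INT,  buf, bufsize, &pos, comm);
    MPI_Pack(m->ivals, m->n, MPI_INT,  buf, bufsize, &pos, comm);
    MPI_Pack(m->cvals, m->n, MPI_CHAR, buf, bufsize, &pos, comm);

    /* The receiver does the reverse: MPI_Recv into a buffer, MPI_Unpack the count
       first, allocate the arrays, then unpack them. */
    MPI_Send(buf, pos, MPI_PACKED, dest, 0, comm);
    free(buf);
}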
Another option is to simply send the arrays in separate messages. This avoids the extra buffer and the packing/unpacking, but of course sends more messages, so it may be better or worse performance-wise. If the arrays are very small it will probably be worse; otherwise it might not make much of a difference, or may even be faster. Try it out and measure which is best.
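Under the same hypothetical layout as above, the separate-messages variant is just three sends with distinct tags; the count goes first so the receiver knows how much to allocate:

MPI_Send(&m->n,    1,    MPI_INT,  dest, 0, comm);
MPI_Send(m->ivals, m->n, MPI_INT,  dest, 1, comm);
MPI_Send(m->cvals, m->n, MPI_CHAR, dest, 2, comm);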

How to send and receive a binary tree using MPI?

I want to send a binary tree from one core to another using some function like MPI_Send(). Or is there any fast algorithm for doing this?
The data structure I use is:
typedef struct BiNode {
    struct BiNode *lchi, *rchi;
    struct BiNode *parent;
    char *name;
} BiNode;
This binary tree has more than 2000 leaves.
Read more about serialization. A 2000-node tree is, on current machines and networks, quite a small piece of data. If the average name length is a dozen bytes, you need to transmit a few dozen kilobytes (not a big deal today). Typical datacenter network bandwidth is 100 Mbytes/sec, and inter-process communication (using e.g. some pipe(7) or unix(7) sockets between cores of the same processor) is usually at least ten times faster. See also http://norvig.com/21-days.html
Or is there any fast algorithm for doing this?
You probably need some depth-first traversal (and there is probably nothing faster).
You might consider writing your tree in some textual format, or some text-based protocol, such as (some customized variant of) JSON (or XML, YAML, or S-expressions). Then take advantage of existing JSON libraries, such as Jansson. They can encode and decode your data (in some JSON format) into a dynamically allocated string buffer.
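As a rough sketch of that idea using Jansson (a depth-first encode of the BiNode tree into a JSON string, which could then be shipped with MPI_Send; error handling omitted and the function names are made up for illustration):

#include <jansson.h>
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Depth-first encoding: each node becomes {"name": ..., "l": ..., "r": ...}.
   The parent pointer is not encoded; it can be rebuilt while decoding. */
json_t *encode(const BiNode *node)
{
    if (node == NULL)
        return json_null();
    json_t *obj = json_object();
    json_object_set_new(obj, "name", json_string(node->name));
    json_object_set_new(obj, "l", encode(node->lchi));
    json_object_set_new(obj, "r", encode(node->rchi));
    return obj;
}

void send_tree(const BiNode *root, int dest)
{
    json_t *obj = encode(root);
    char *text = json_dumps(obj, JSON_COMPACT);   /* dynamically allocated string */
    int len = (int) strlen(text) + 1;             /* include the terminating NUL */
    MPI_Send(&len, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
    MPI_Send(text, len, MPI_CHAR, dest, 1, MPI_COMM_WORLD);
    free(text);
    json_decref(obj);
}

The receiver gets the length, allocates a buffer, receives the string, and rebuilds the tree with json_loads().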
If performance is critical, consider using some binary format, like XDR or ASN.1. Or simply compress the JSON (or other textual) encoding, using some existing compression library (perhaps zlib).
My guess is that in your case, it is not worth the trouble (using JSON is a lot simpler to code, and your development time has some cost and value). Your bottleneck is probably the network itself, not any software layers. But you need to benchmark.
MPI has a feature called datatypes. A full explanation would take a really long time, but you probably want to look at structs in there (though you might be able to get away with vectors depending on how your memory is laid out).
However, you probably can't just use MPI datatypes because you'd just be transmitting a bunch of pointers which won't mean anything to the process on the other end. Instead you have to decide which parts you actually need to send and serialize them in a way that makes sense.
So you have a few options I think.
Change the way your tree is laid out in memory so it's an array of contiguous memory where all of the pointers you have above become indices in the array.
This might not actually make sense in the context of your application, but it makes the "tree" very easy to transmit. At that point, you can either just send a large array of bytes or you can construct MPI datatypes to describe each cell in the array and send an array of 2000 of those.
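For example (a hypothetical flattened layout, not taken from your code), each node might become a fixed-size record, so the whole tree is one contiguous buffer:

#define MAX_NAME 64   /* illustrative fixed cap, so every record has the same size */

/* Pointer-free node: children and parent are indices into the array, -1 for "none". */
typedef struct {
    int  lchi, rchi, parent;
    char name[MAX_NAME];
} FlatNode;

/* With this layout, a tree of n nodes can be shipped as raw bytes in one call,
   or described with an MPI struct datatype as mentioned above:
   MPI_Send(nodes, n * (int) sizeof(FlatNode), MPI_BYTE, dest, 0, MPI_COMM_WORLD); */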
Re-create the tree on the other process from the source data (whether that's a file or something else).
This is probably not the answer you were looking for and doesn't help if you've generated this data from anything non-trivial in the middle of your application.
Use POSIX shared memory.
Since you say "core" in the description of your question, I'm assuming you want to transfer data between OS processes on the same physical machine. If that's the case, you can use shared memory and you don't need to do message passing at all. Just open a shared memory region, attach to it with the other process and "poof" all of the data is available on the other end. As long as you share all of the memory that those pointers are pointing to, I think you'll be fine.
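A minimal POSIX shared memory sketch (names and sizes are illustrative, and error checking is omitted); note that pointer values themselves generally won't be meaningful in the other process, so storing offsets or indices into the region is the safe way to share linked structures:

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME "/tree_shm"    /* illustrative name */
#define SHM_SIZE (1 << 20)      /* 1 MiB, illustrative */

int main(void)
{
    /* Creating side: create the region, size it, map it, write into it. */
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    ftruncate(fd, SHM_SIZE);
    char *base = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    strcpy(base, "tree data goes here");   /* e.g. a flattened, index-based tree */

    /* The other process calls shm_open(SHM_NAME, O_RDWR, 0) and mmap() the same
       way and sees the same bytes. */
    munmap(base, SHM_SIZE);
    close(fd);
    shm_unlink(SHM_NAME);
    return 0;
}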

Best method for GPU to CPU communication in OpenCL

I have a kernel that takes no input and whose work items don't communicate with each other. Each work item operates on a different argument based on its global_id, but this is not passed in. I want each work item to process its task, screen the result based on some criteria, and write back the result into a global memory array if it meets this criteria. What is the best way to do this? I considered a __global index that would start at 0 and increment on each write, but there is no lock on this access and the parallel processes end up in a bunch of race conditions, so I don't know where to tell each work item to write to in the output array.
If this were a higher-level language, I would expect to be able to pass in a shared hash or something and just push the successful outputs onto it, keyed by global_id, but I'm having trouble figuring out the most appropriate way to do this in OpenCL land. Any thoughts? I am using vanilla C, not C++.
This looks like exactly what I needed; I just lacked the google-fu to get to it!
Please respond if you have any other suggestions on best practices, but for future reference, the above coupled with a __global memory buffer will fulfill my needs.

LabVIEW variable array size in SubVIs on FPGA

I have acquisition code running on a cRIO FPGA target. The data is acquired from the I/O nodes and composed into an array. This array should always be of the same size, so I check that with a SubVI. The problem is that I use conditional disable structures to replace the acquisition code for different targets with different channel counts. Now the compiler complains that it can't resolve the array to a fixed size, which is not true, because the compiler could determine it very easily.
How do I have to write my SubVI so that it accepts an array whose size can vary (but is known at compile time)? The "Array Size" function from the array palette can do this too. How?
You can use lookup tables instead to achieve your goal. Or, if you have to send this array to an RT VI, it would be more professional to use a DMA FIFO instead. On the RT side you can use the polling method and read as many points as you like at a time.
In short this is not possible with standard LabVIEW arrays as the size must be fixed for compilation (as these basically come down to wires in the chip).
There are two options when you actually need a variable size:
Simple and wasteful - If there is a reasonable upper bound, you can size the array to it and use logic to track the logical "end". This means compiling resources for the upper bound, and if it is more than a few hundred bytes it will use up a lot of logic.
Scalable but slightly harder - The only way to achieve a large variable-size array is to use some of the memory options available, with some added logic for defining the size. Depending on the size, you can use either look-up tables (LUTs) or block RAM. Again, LUTs use up logic quickly, so they should only be used for small arrays (I can't remember the exact recommended size, but probably < 500 bytes). If you've not used it, you can find some initial reading at http://zone.ni.com/reference/en-XX/help/371599H-01/lvfpgaconcepts/fpga_storing_data/#Memory_Items
Either way you will have to somehow pass the SubVI the size of the array so it knows how far into the memory to read; this would simply have to be another input.
More commonly in LabVIEW FPGA most processing is done on point-by-point data so you can centralise the storage logic without having to pass this around, however this depends on the nature of the algorithm.

Joining output binary files from MPI simulation

I have 64 output binary files from an MPI simulation using a C code.
The files correspond to the output of 64 processes. What would be a way to join all those files into a single file, perhaps using a C script?
Since this was tagged MPI, I'll offer an MPI solution, though it might not be something the questioner can do.
If you are able to modify the simulation, why not adopt an MPI-IO approach? Even better, look into HDF5 or Parallel-NetCDF and get a self-describing file format, platform portability, and a host of analysis and vis tools that already understand your file format.
But no matter which approach you take, the general idea is to use MPI to describe which part of the output belongs to each process. The easiest example is if each process contributes to a 1D array: for a logically global array of N items split across P processes, each process contributes N/P items starting at offset rank*(N/P).
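A minimal MPI-IO sketch of that idea (variable names are illustrative; it assumes every rank holds the same local_n so the offsets line up):

#include <mpi.h>

/* Each rank owns local_n doubles; ranks write disjoint slices of one shared file. */
void write_global_array(double *local_data, int local_n, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    MPI_File fh;
    MPI_File_open(comm, "output.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Byte offset of this rank's slice: everything owned by lower ranks. */
    MPI_Offset offset = (MPI_Offset) rank * local_n * (MPI_Offset) sizeof(double);
    MPI_File_write_at_all(fh, offset, local_data, local_n, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
}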
Since all the output files are fairly small and the same size, it would be easy to use MPI_Gather to assemble one large binary array on one node, which could then be written to a file. If allocating a large array is an issue, you could simply use MPI_Isend and MPI_Recv to write the file one piece at a time.
Obviously this is a pretty primitive solution, but it is also very straightforward and foolproof, and really won't take notably longer (assuming you're doing all this at the end of your simulation).
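A sketch of the MPI_Gather version (assuming, for illustration, every rank contributes the same number of bytes):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Gather every rank's local chunk onto rank 0 and write one combined file. */
void gather_and_write(char *chunk, int local_bytes, MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    char *all = NULL;
    if (rank == 0)
        all = malloc((size_t) nprocs * (size_t) local_bytes);

    MPI_Gather(chunk, local_bytes, MPI_BYTE,
               all,   local_bytes, MPI_BYTE, 0, comm);

    if (rank == 0) {
        FILE *f = fopen("combined.bin", "wb");
        fwrite(all, 1, (size_t) nprocs * (size_t) local_bytes, f);
        fclose(f);
        free(all);
    }
}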
