What ZeroMQ socket type to use for inter process communication? - c

When I had two threads, I used PAIR socket type. But now I am using two processes that can be either on one machine or on different machines. I don't need requests and responses, I don't need sending to multiple nodes, etc. I need same thing that I had with PAIR (async, bidirectional), but with processes and with network. What socket types should I use?

Unfortunately, your world has gotten a bit more complicated. There's no direct analog to the PAIR/PAIR socket pairing in more widely distributed systems.
That said, if you keep roughly the same topology (two nodes connecting exclusively to each other and to no other nodes), you can pretty much achieve what you want using ROUTER/DEALER or even DEALER/DEALER (as you suggested in comments). Those sockets are sort of like REQ/REP, but they don't enforce a strict request/response pattern; they are entirely unrestricted, so in effect you get the same thing. The only problem comes if you intend to add more nodes, at which point you have to start managing things a little differently. In particular, the DEALER socket doesn't let you choose which node you send to; it distributes outgoing messages strictly round-robin.
But, doing that should get you what you're looking for (async, bidirectional).
The ROUTER socket type can require a little additional complexity, since you need to keep track of the "identifier" of the other node to be able to send back to it (you can get this almost for free, especially in your case with just one peer, by taking it directly from the message the peer sent: ROUTER delivers the sender's identity as the first frame). Because this is an exclusive pair, you can ignore the round-robin uncertainty introduced by the DEALER socket and go straight to DEALER/DEALER, which gives you an unrestricted message pattern and doesn't require any management of identities.
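A minimal sketch of the DEALER/DEALER pairing described above, using pyzmq. The loopback address, the random port, the receive timeout and the function name are assumptions made for illustration, and both sockets live in one process here only to keep the example short; in practice each side would run in its own process, possibly on a different machine.

```python
import zmq


def dealer_dealer_demo():
    """Exchange messages in both directions over TCP between two DEALER sockets."""
    ctx = zmq.Context()
    side_a = ctx.socket(zmq.DEALER)
    side_b = ctx.socket(zmq.DEALER)
    for sock in (side_a, side_b):
        sock.setsockopt(zmq.LINGER, 0)       # do not block on close
        sock.setsockopt(zmq.RCVTIMEO, 2000)  # raise instead of hanging (milliseconds)
    try:
        # Side A binds, side B connects; the socket *type* is the same on both ends.
        port = side_a.bind_to_random_port("tcp://127.0.0.1")
        side_b.connect("tcp://127.0.0.1:%d" % port)

        # No request/response lock-step: B may send several messages in a row.
        side_b.send(b"first from B")
        side_b.send(b"second from B")
        received_by_a = [side_a.recv(), side_a.recv()]

        # A can send whenever it likes as well.
        side_a.send(b"reply from A")
        received_by_b = side_b.recv()
    finally:
        side_a.close()
        side_b.close()
        ctx.term()
    return received_by_a, received_by_b
```

With exactly one peer on each side, the round-robin distribution of DEALER has only one destination to choose from, which is why the pairing behaves like PAIR here.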

@Marko, let me point out that
there is a principal separation between a ZMQ socket's (formal communication pattern) "type" and whatever transport one opts to .bind() / .connect() over.
Since your architecture was happy (as you have written) to work with a PAIR/PAIR "session",
you can change the transport that is used without a single additional SLOC.
It works:
Python 2.7.3 ...
>>> import zmq
>>> zmq.zmq_version()
'2.1.11'
>>> aZmqCONTEXT = zmq.Context() # --<BoCTX>-- [SideA] Node
>>> aZmqSOCKET = aZmqCONTEXT.socket( zmq.PAIR ) # here one decides about a type
>>> aZmqSOCKET.bind( "tcp://192.168.0.62:2027" ) # here is the transport // used to be ( "ipc://...")
>>> aZmqSOCKET.recv() # here the PAIR waits for 1st MSG
'aMSG from the opposite PAIR/PAIR zmq-session Node arrived via TCP-transport ... QED'
>>> aZmqSOCKET.setsockopt( zmq.LINGER, 0 ) # pre-termination tidy-up
>>> aZmqSOCKET.close()
>>> aZmqCONTEXT.term() # --<EoCTX>-- safe to clean-exit
>>>

Related

SwiftNIO: How "expensive" is transformation in each ChannelHandler?

Checking this tutorial: https://rderik.com/blog/understanding-swiftnio-by-building-a-text-modifying-server/
One thing I do not understand: the main point of using NIO directly is to increase the speed of a backend service.
But when we have this pipeline:
Client: hello
|
v
Server
|
v
BackPressureHandler (Receives a ByteBuffer - passes a ByteBuffer)
|
v
UpcaseHandler(Receives a ByteBuffer - passes a [CChar])
|
v
VowelsHandler(Receives a [CChar] - passes a ByteBuffer)
|
v
ColourHandler(Receives a ByteBuffer - passes a ByteBuffer)
|
v
Client: receives
H*LL* (In green colour)
the parameter gets transformed many times. In UpcaseHandler: NIOAny -> ByteBuffer -> string -> CChar -> NIOAny
then in VowelsHandler again: NIOAny -> ByteBuffer -> string -> CChar -> NIOAny
What is the advantage of having so many independent handlers?
If the server receives a 'flat' JSON, is it worth processing it with JSONEncoder when speed matters and every microsecond is critical? try JSONEncoder().encode(d2)
Or is it worthwhile, and is it common, to implement one's own JSON processor, i.e. an event-driven JSON parser?
I think it's useful to use things like an UppercasingHandler when trying to learn and understand SwiftNIO. In the real world however, this is too fine grained for a ChannelHandler.
Typically, the use-case for a ChannelHandler is one of the following (not exhaustive):
a whole network protocol (example NIOSSLClientHandler which adds TLS for a client connection)
added value that may be useful with multiple protocols (such as the BackpressureHandler)
added value that may be useful for debugging (example NIOWritePCAPHandler)
So whilst the overhead of a ChannelHandler isn't huge, it is definitely not completely free and I would recommend not overusing them. Abstraction is useful, but even in a SwiftNIO-based application or library we shouldn't try to express everything as ChannelHandlers in a ChannelPipeline.
The value-add of having something in a ChannelHandler is mostly around reusability (the HTTP/1, HTTP/2, ... implementations don't need to know about TLS), testability (we can test a network protocol without actually needing a network connection) and debuggability (if something goes wrong, we can easily log the inputs/outputs of a ChannelHandler).
The NIOWritePCAPHandler is a good illustration: in most cases, we don't need it. But if something goes wrong, we can add it in between a TLS handler and, say, the HTTP/2 handler(s) and get a plaintext .pcap file. The only code we have to touch is the code that inserts it into the ChannelPipeline, which can even be done dynamically after the TCP connection is already established.
There's absolutely nothing wrong with a very short ChannelPipeline. Many great examples have just a few handlers, for example:
TLS handler <--> network protocol handler(s) [HTTP/1.1 for example] <--> application handler (business logic)

LabVIEW: How to exchange lots of variables between loops?

I have two loops:
One loop gets data from a device and processes it. Scales received variables, calculates extra data.
Second loop visualizes the data and stores it.
There are lots of different variables that need to passed between those two loops - about 50 variables. I need the second loop to have access only to the newest values of the data. It needs to be able to read those variables any time they are needed to be visualized.
What is the best way to share such vector between two loops?
There are various ways of sharing data.
The fastest and simplest is a local variable, however that is rather uncontrolled, and you need to make sure to write them at one place (plus you need an indicator).
One of the most advanced options is creating a class for your data, and use an instance (if you create a by-ref class, otherwise it won't matter), and create a public 'GET' method.
In between you have several other options:
queues
semaphores
property nodes
global variables
shared variables
notifiers
events
TCP-IP
In short there is no best way, it all depends on your skills and application.
As long as you're considering loops within the SAME application, there ARE good and bad ideas, though:
queues (OK, has most features)
notifiers (OK)
events (OK)
FGVs (OK, but keep an eye on massively parallel access hindering exec)
semaphores (that's not data comms)
property nodes (very inefficient, prone to race cond.)
global variables (prone to race cond.)
shared variables (badly implemented by NI, prone to race cond.)
TCP-IP (slow, awkward, affected by firewall config)
The quick and dirty way to do this is to write each value to an indicator in the producer loop - these indicators can be hidden offscreen, or in a page of a tab control, if you don't want to see them - and read a local variable of each one in the consumer loop. However if you have 50 different values it may become hard to maintain this code if you need to change or extend it.
As Ton says there are many different options but my suggestion would be:
Create a cluster control, with named elements, containing all your data
Save this cluster as a typedef
Create a notifier using this cluster as the data type
Bundle the data into the cluster (by name) and write this to the notifier in the producer loop
Read the cluster from the notifier in the consumer loop, unbundle it by name and do what you want with each element.
Using a cluster means you can easily pass it to different subVIs to process different elements if you like, and saving as a typedef means you can add, rename or alter the elements and your code will update to match. In your consumer loop you can use the timeout setting of the notifier read to control the loop timing, if you want. You can also use the notifier to tell the loops when to exit, by force-destroying it and trapping the error.
Two ways:
Use a display loop with SEQ (Single Element Queue)
Use an event structure with a User Event. (Do not put two event structures in the same loop! Use another loop.)
Use an enum with case structure and variant to cast the data to expected type.
(A notifier isn't reliable for streaming data, because it is a lossy scheme. Use it only to trigger small actions.)
If all of your variables can be bundled together in a single cluster to send at once, then you should use a single element queue. If your requirements change later such that the transmission cannot be lossy, then it's a matter of changing the input to the Obtain Queue VI (with a notifier you'd have to swap out all of the VIs). Setting up individual indicators and local variables would be pretty darn tedious. Also, not good style.
If the loops are inside of the same VI then:
The simplest solution would be local variables.
A little better is to use shared variables.
Better is to use functional global variables (FGVs)
The best solution would be using SEQ (Single Element Queue).
Anyway, for a better understanding please go through this paper.

threadpools - boss/worker vs peer (workcrew) models

I'm aiming to use a threadpool with pthreads and am trying to choose between these two models of threading and it seems to me that the peer model is more suitable when working with fixed input, whereas the boss/worker model is better for dynamically changing work items. However, I'm a little unsure of how exactly to get the peer model to work with a threadpool.
I have a number of tasks that all need to be performed on the same data set. Here's some simple pseudocode for how I would look at tackling this:
data = [0 ... 999]
data_index = 0
data_size = 1000
tasks = [0 ... 99]
task_index = 0
threads = [0 ... 31]
thread_function()
{
    while (true)
    {
        index = data_index++ (using atomics)
        if index >= data_size
        {
            sync
            if thread_index == 0
            {
                data_index = 0
                task_index++
                sync
            }
            else
            {
                sync
            }
            continue
        }
        tasks[task_index](data[index])
    }
}
(Firstly, it seems like there should be a way of making this use just one synchronisation point, but I'm not sure whether that's possible?)
The above code seems like it will work well for the case where the tasks are known in advance, though I guess a threadpool is unnecessary for this particular problem. However, even if the data items are still predefined across all tasks, if the tasks are not known in advance, it seems like the boss/worker model is better suited? Is it possible to use the boss/worker model but still allow the tasks to be picked up by the threads themselves (as above), where the boss essentially suspends itself until all tasks are complete? (Maybe this is still termed the peer model?)
My final question is about the synchronisation: barrier or condition variable, and why?
If anyone can make any suggestions as to how better to approach this problem or even poke holes in any of my assumptions, that would be great. Unfortunately I'm restricted from using a higher-level library such as tbb for tackling this.
Edit: I should point out, in case this isn't clear, that each task needs to be completed in its entirety before moving on to the next.
I'm a bit confused by your description here, hope the below is relevant.
I always looked at this pattern and found it very useful: The "boss" is responsible for detecting work and dispatching it to a worker pool based on some algorithm, from that time on, the worker is independent.
In this scenario, the worker is always waiting for work and is not aware of any other instance; it processes requests and, when it finishes, may trigger a notification of completion.
This has the advantage of good separation between the work itself and the algorithm that balances it between the threads.
The other option is for the "boss" to maintain a pool of work items, and the workers to always pick them up as soon as they are free. But I guess this is more complex to implement and requires a larger amount of synchronization. I do not see the benefit of this second approach over the previous one.
Control logic and worker state is maintained by the "boss" in both scenarios.
Since the parallel work is done per task, each "boss" object handles one task. In a simple implementation, this "boss" blocks until its task is finished, after which the next "boss" in line can be called.
Regarding the sync, unless I'm missing something here, you only need to sync once, to wait for all the workers to finish. This sync is done at the "boss"; the workers just send notifications that they have finished.
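The "one synchronisation point per task" idea raised in the question can be sketched with a barrier that runs a reset action once per round. The sketch below uses Python's threading module rather than pthreads, purely for illustration; the function name, the shared state dictionary and the default thread count are assumptions, not part of the original pseudocode.

```python
import threading


def run_peer_pool(tasks, data, num_threads=4):
    """Apply each task to every data item; a task finishes before the next starts."""
    results = [[None] * len(data) for _ in tasks]
    state = {"task": 0, "next": 0}   # current task index and next unclaimed data index
    lock = threading.Lock()

    def advance():
        # Barrier action: runs in exactly one thread, after every thread has
        # arrived and before any of them is released.
        state["task"] += 1
        state["next"] = 0

    barrier = threading.Barrier(num_threads, action=advance)

    def worker():
        while state["task"] < len(tasks):
            current = state["task"]
            while True:
                with lock:               # stands in for the atomic increment
                    index = state["next"]
                    state["next"] += 1
                if index >= len(data):
                    break
                results[current][index] = tasks[current](data[index])
            barrier.wait()               # the single sync point per task

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the reset runs while every thread is parked at the barrier, no second sync is needed. With pthreads the same effect would probably need a hand-rolled barrier (mutex plus condition variable) in which the last thread to arrive performs the reset before waking the others.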

Connect 4 with neural network: evaluation of draft + further steps

I would like to build a Connect 4 engine which works using an artificial neural network - just because I'm fascinated by ANNs.
I've created the following draft of the ANN structure. Would it work? And are these connections right (even the cross ones)?
Could you help me to draft a UML class diagram for this ANN?
I want to give the board representation to the ANN as its input. And the output should be the move to choose.
The learning should later be done using reinforcement learning and the sigmoid function should be applied. The engine will play against human players. And depending on the result of the game, the weights should be adjusted then.
What I'm looking for ...
... is mainly coding issues. The more it goes away from abstract thinking to coding - the better it is.
The below is how I organized my design and code when I was messing with neural networks. The code here is (obviously) pseudocode and roughly follows object-oriented conventions.
Starting from the bottom up, you'll have your neuron. Each neuron needs to be able to hold the weights it puts on the incoming connections, a buffer to hold the incoming connection data, and a list of its outgoing edges. Each neuron needs to be able to do three things:
A way to accept data from an incoming edge
A method of processing the input data and weights to formulate the value this neuron will be sending out
A way of sending out this neuron's value on the outgoing edges
Code-wise this translates to:
// Each neuron needs to keep track of this data
float in_data[];    // Values sent to this neuron
float weights[];    // The weights on each edge
float value;        // The value this neuron will be sending out
Neuron out_edges[]; // Each Neuron that this neuron should send data to
// Each neuron should expose this functionality
void accept_data( float data ) {
    in_data.append(data); // Add the data to the incoming data buffer
}
void process() {
    value = /* result of combining weights and incoming data here */;
}
void send_value() {
    foreach ( neuron in out_edges ) {
        neuron.accept_data( value );
    }
}
Next, I found it easiest if you make a Layer class which holds a list of neurons. (It's quite possible to skip over this class, and just have your NeuralNetwork hold a list of list of neurons. I found it to be easier organizationally and debugging-wise to have a Layer class.) Each layer should expose the ability to:
Cause each neuron to 'fire'
Return the raw array of neurons that this Layer wraps around. (This is useful when you need to do things like manually filling in input data in the first layer of a neural network.)
Code-wise this translates to:
// Each layer needs to keep track of this data.
Neuron[] neurons;
// Each layer should expose this functionality.
void fire() {
    foreach ( neuron in neurons ) {
        neuron.process();    // compute and store this neuron's value
        neuron.send_value(); // push the stored value along the outgoing edges
    }
}
Neuron[] get_neurons() {
    return neurons;
}
Finally, you have a NeuralNetwork class that holds a list of layers, a way of setting up the first layer with initial data, a learning algorithm, and a way to run the whole neural network. In my implementation, I collected the final output data by adding a fourth layer consisting of a single fake neuron that simply buffered all of its incoming data and returned that.
// Each neural network needs to keep track of this data.
Layer[] layers;
// Each neural network should expose this functionality
void initialize( float[] input_data ) {
    foreach ( neuron in layers[0].get_neurons() ) {
        // do setup work here
    }
}
void learn() {
    foreach ( layer in layers ) {
        foreach ( neuron in layer.get_neurons() ) {
            /* compare the neuron's computed value to the value it
             * should have generated and adjust the weights accordingly
             */
        }
    }
}
void run() {
    foreach ( layer in layers ) {
        layer.fire();
    }
}
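For comparison, here is a small runnable Python sketch of the same Neuron / Layer / NeuralNetwork layering. It is a simplification under stated assumptions: each layer passes its outputs directly to the next instead of pushing values into per-neuron buffers, the sigmoid is used as the activation, and the bias argument is an addition that the pseudocode above does not have.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class Neuron:
    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)  # one weight per incoming edge
        self.bias = bias              # assumption: not in the pseudocode above
        self.value = 0.0              # last value this neuron produced

    def process(self, inputs):
        # Weighted sum of the incoming data, squashed by the sigmoid.
        total = self.bias + sum(w * x for w, x in zip(self.weights, inputs))
        self.value = sigmoid(total)
        return self.value


class Layer:
    def __init__(self, neurons):
        self.neurons = list(neurons)

    def fire(self, inputs):
        # Every neuron in the layer sees the same inputs.
        return [neuron.process(inputs) for neuron in self.neurons]


class NeuralNetwork:
    def __init__(self, layers):
        self.layers = list(layers)

    def run(self, input_data):
        values = list(input_data)
        for layer in self.layers:
            values = layer.fire(values)  # outputs of one layer feed the next
        return values
```

A network would then be built as `NeuralNetwork([Layer([...]), Layer([...])])` and evaluated with `run(inputs)`; learning is deliberately left out here.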
I recommend starting with backpropagation as your learning algorithm, as it's supposedly the easiest to implement. When I was working on this, I had great difficulty trying to find a very simple explanation of the algorithm, but my notes list this site as being a good reference.
I hope that's enough to get you started!
There are a lot of different ways to implement neural networks that range from simple/easy-to-understand to highly-optimized. The Wikipedia article on backpropagation that you linked to has links to implementations in C++, C#, Java, etc. which could serve as good references, if you're interested in seeing how other people have done it.
One simple architecture would model both nodes and connections as separate entities; nodes would have possible incoming and outgoing connections to other nodes as well as activation levels and error values, whereas connections would have weight values.
Alternatively, there are more efficient ways to represent those nodes and connections -- as arrays of floating point values organized by layer, for example. This makes things a bit trickier to code, but avoids creating so many objects and pointers to objects.
One note: often people will include a bias node -- in addition to the normal input nodes -- that provides a constant value to every hidden and output node.
I've implemented neural networks before, and see a few problems with your proposed architecture:
A typical multi-layer network has connections from every input node to every hidden node, and from every hidden node to every output node. This allows information from all of the inputs to be combined and contribute to each output. If you dedicate 4 hidden nodes to each input then you will lose some of the network's power to identify relationships between the inputs and outputs.
How will you come up with values to train the network? Your network creates a mapping between board positions and the optimal next move, so you need a set of training examples that provide this. End game moves are easy to identify, but how do you tell that a mid-game move is "optimal"? (Reinforcement learning can help out here)
One last suggestion is to use bipolar inputs (-1 for false, +1 for true) since this can speed up learning. And Nate Kohl makes a good point: every hidden & output node will benefit from having a bias connection (think of it as another input node with a fixed value of "1").
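As a rough illustration of bipolar inputs plus a bias input, the sketch below turns a Connect 4 board into an input vector. The board layout (6 rows by 7 columns, cells holding None or a player id), the choice of two inputs per cell and the function name are all assumptions made for the example, not something the question specifies.

```python
ROWS, COLS = 6, 7


def encode_board(board, me):
    """Return bipolar inputs for a ROWS x COLS board, followed by a bias input.

    board: list of rows; each cell is None (empty) or a player id.
    me:    the player id the network is playing as.
    """
    inputs = []
    for row in board:
        for cell in row:
            # "Is this my piece?" as +1 / -1
            inputs.append(1.0 if cell == me else -1.0)
            # "Is this the opponent's piece?" as +1 / -1
            inputs.append(1.0 if cell is not None and cell != me else -1.0)
    inputs.append(1.0)  # bias: an extra input with a fixed value of 1
    return inputs
```

An empty cell is then encoded as (-1, -1), which keeps "empty" distinct from both players without using 0 as an input value.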
Your design will be highly dependent on the specific type of reinforcement learning that you plan to use.
The simplest solution would be to use backpropagation. This is done by feeding the error back into the network (in reverse fashion) and using the derivative of the (sigmoid) function to determine the adjustment to each weight. After a number of iterations, the weights will gradually get adjusted to fit the input.
Genetic algorithms are an alternative to backpropagation which can yield better results in some cases (although they are a bit slower). This is done by treating the weights as a schema that can easily be inserted and removed. The schema is replaced with a mutated version (using principles of natural selection) several times until a fit is found.
As you can see, the implementation for each of these would be drastically different. You could try to make the network generic enough to adapt to each type of implementation, but that may overcomplicate it. Once you are in production, you will usually only have one form of training (or ideally your network would already be trained).
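To make the weight adjustment concrete, here is a minimal sketch of one gradient step for a single sigmoid neuron with a squared-error loss. It relies on the fact that the sigmoid's derivative can be written as out * (1 - out). The function name, the learning rate and the convention of treating the last input as a constant bias input are assumptions for illustration; a full backpropagation pass repeats this kind of update layer by layer.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_step(weights, inputs, target, learning_rate=0.5):
    """Return (new_weights, output_before_update) for one training example."""
    out = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    # Error scaled by the sigmoid's derivative, out * (1 - out).
    delta = (target - out) * out * (1.0 - out)
    new_weights = [w + learning_rate * delta * x for w, x in zip(weights, inputs)]
    return new_weights, out


# Usage: repeatedly nudge the weights towards producing 1 for this input.
weights = [0.0, 0.0]
for _ in range(300):
    weights, output = train_step(weights, [1.0, 1.0], target=1.0)
```

Each step moves the output a little closer to the target; the step size shrinks as the output saturates, which is why many iterations are needed.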

I want to implement a small routing table for my learning. I know it is implemented using radix/patricia tree in routers

I want to implement a small routing table for my learning. I know it is implemented using a radix/patricia tree in routers.
Can someone give me an idea on how to go about implementing the same?
The major issue, I feel, is storing the IP address.
For example : 10.1.1.0 network next hop 20.1.1.1
10.1.0.0 network next hop 40.1.1.1
Can someone give me a declaration of the struct from which I can get an idea?
This doesn't use a radix, but it is simple to implement.
Your look-up keys aren't going to be absolute. Partial matches are possible, which would signify that you have located a network matching rule rather than a host matching rule.
I suggest you use a list of containers (sub-tables). The first list will be ordered by subnet mask from the most restrictive (host route rules with mask 255.255.255.255 ) to least restrictive (default gateway with mask 0.0.0.0 ).
Under each entry in the list will be an easily searchable structure (tree, hash table, or just a list) which is keyed on the masked portion of the address you are attempting to look up.
For each address you attempt to look up you should search for it in each net-mask's sub-table in turn and choose the first match you come across as your route to use. It will have the most restrictive net-mask possible as they are ordered from most restrictive to least, and if no match is found by the time you reach the end of the net-mask list you will find the default gateway net-mask entry in the list if you have one. This entry will be a little bit different from the others because if you have more than one entry in its sub-table they would all have the same network address. If you only want to have one default gateway then you can opt to not have the 0.0.0.0 entry and just treat that as a special case.
You may also want to have a metric (cost, speed, ...) as a sub-key for each entry (or have each network match be a list of entries with the same network/destination address ordered by their metric). This would allow you to have more than one 192.168.1.0 route (one by WiFi and one by wired Ethernet) without making things difficult.
When a net-mask entry becomes empty you will probably want to remove it.
struct route4_table_subnet {
    uint32_t mask;
    struct route_table_network_container sub_table;
    struct route4_table_subnet * next;
};
struct route4_table_network_container_entry {
    // The route_table_network_container contains nodes of this type, but
    // however you want to implement this container is up to you
    uint32_t network; // this is the key
    uint32_t metric;
    // route info
    struct route4_table_network_container_entry * next;
};
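The lookup order described above (most restrictive mask first, lowest metric within a network) can be sketched in Python as follows. This is an illustration, not a translation of the structs: the class and method names are hypothetical, a dict stands in for the sub-table container, and the /24 and /16 prefix lengths used with the question's example addresses are assumptions, since the question does not state them.

```python
import ipaddress


class RouteTable:
    def __init__(self):
        # mask (as int) -> {network address (as int) -> [(metric, route_info), ...]}
        self.subtables = {}

    def add(self, cidr, route_info, metric=0):
        net = ipaddress.ip_network(cidr)
        table = self.subtables.setdefault(int(net.netmask), {})
        entries = table.setdefault(int(net.network_address), [])
        entries.append((metric, route_info))
        entries.sort(key=lambda entry: entry[0])  # cheapest route first

    def lookup(self, address):
        addr = int(ipaddress.ip_address(address))
        # Contiguous masks sort numerically from most to least restrictive.
        for mask in sorted(self.subtables, reverse=True):
            entries = self.subtables[mask].get(addr & mask)
            if entries:
                return entries[0][1]
        return None  # no match and no default route


table = RouteTable()
table.add("10.1.1.0/24", "20.1.1.1")
table.add("10.1.0.0/16", "40.1.1.1")
```

With these two routes, `table.lookup("10.1.1.5")` matches the /24 entry first, while `table.lookup("10.1.2.5")` falls through to the /16 entry.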
The route info is tricky. You could simply list an IP address here and recognize when you got an IP address that was on a local network and stop looking up stuff. This would require that you recognized when you were about to look up an address that was in a local network. This would make it difficult to set up routing rules to send packets that were to an address that looked local to a router instead, which is often useful.
You could instead do what Linux does and allow for the use of interface routes in addition to address routes.
You would probably implement this by having a flag that tells what type of route it is and a union type that contains the data for that type of route. This makes interfaces like PPP, where it really doesn't matter whether you know the IP address of the machine on the other side of the modem, work very cleanly. It also allows you to not have an oddball case for the locally attached network. You just look them up like any other address in the table and they say "use Ethernet interface 0".
In this case when you had a packet to route you would pass the destination IP address to your route lookup function which would return the best match. If the best match was an IP address entry then you would turn around and look that IP address up in the routing table and it would return that address's best match. You would continue this until you got to an interface match (so interface match routes would be required).
You would probably want to hold on to the IP address whose lookup resulted in the interface route entry. In the case of an Ethernet route you would need to supply this address to the ARP lookup. This last matched IP could be the same as the destination address or it could be a router that is on the same network as one of your network interfaces.
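The repeated lookup described above can be sketched as a short loop. Here a route is assumed to be a tuple, either ("via", gateway_ip) or ("dev", interface_name); that encoding, the function name and the max_hops guard against routing loops are all assumptions for illustration. The `lookup` argument is any function that returns the best-matching route for an address, or None.

```python
def resolve(lookup, destination, max_hops=8):
    """Follow address routes until an interface route is reached.

    Returns (interface_name, last_looked_up_ip), or None if there is no route.
    The second element is the address you would hand to ARP on Ethernet.
    """
    address = destination
    for _ in range(max_hops):
        route = lookup(address)
        if route is None:
            return None
        kind, value = route
        if kind == "dev":
            return value, address  # interface route: done
        address = value            # address route: now look up the gateway
    raise RuntimeError("too many hops; the routing table may contain a loop")


# Usage with a plain dict standing in for the routing table:
routes = {
    "8.8.8.8": ("via", "192.168.1.1"),
    "192.168.1.1": ("dev", "eth0"),
}
result = resolve(routes.get, "8.8.8.8")
```

For a destination on a directly attached network, the first lookup already yields an interface route, so the returned address is the destination itself.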
Each time you found an interface match you could test that the interface is still present before returning from the route_lookup routine. In the case that an interface is no longer present you could remove it then and then continue looking for a best match. You would not have to restart the search in the net-mask list but would need to ensure that you did not miss an entry in the current network list that had a more costly metric than the interface that you just noticed had been removed. Say you have WiFi and wired Ethernet to your local home network, but you just unplugged your Ethernet which costs less than the WiFi to use (Ethernet is faster and uses less power, so you gave it a more favorable metric) -- you would now want for this packet you are trying to route to get sent to the WiFi.
I don't know how this would compare to a radix tree implementation. I suspect that it would be competitive with it on a 32-bit machine for IPv4 (depending on how you choose your route_table_network_container) but would possibly compare less favorably for IPv6, where address sizes are larger and subnet masks aren't used (are they? I'm not overly familiar with IPv6, sadly).
I completely ignored threading in this. I am assuming that only one thread would access the routing table at any one time. If this is not the case then adding and removing of nodes would require you to include some type of locks, which would depend on whatever platform you are planning on implementing this on.
