RabbitMQ and DB transactions - database

Does RabbitMQ support a scenario where a received message acknowledgement is sent on the DB transaction commit?
Currently we send the ack after the DB transaction commits. If the service fails in between, we get data duplication: the service will receive the same message again.
Is there a pattern for this problem?
Thanks!

Yes it does, but do be aware that RabbitMQ uses its own DB for message storage (at the moment). To get RabbitMQ to send an ack to the publisher, use TX mode. This is documented in the spec and on various parts of our web site.
If you want to use your own DB then you may want to set it up as an end consumer for messages. In this case, you should use your own application-level ack.
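One way to make the "ack after commit" window harmless is to make the consumer idempotent: record the message ID in the same DB transaction as the business write, so a redelivered message is detected and skipped. Below is a minimal sketch of that idea, not RabbitMQ's own mechanism. It assumes each message carries a unique ID; the table names, `make_db`, `handle_delivery`, and the `ack` callback (a stand-in for whatever ack call your client library provides) are all hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with the two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE orders (payload TEXT)")
    return conn


def handle_delivery(conn, message_id, payload, ack):
    """Apply a delivery at most once, then acknowledge it.

    `ack` is a hypothetical callback standing in for the broker ack.
    """
    try:
        # One transaction: commits on success, rolls back on any exception.
        with conn:
            # Fails with IntegrityError if this message was already processed.
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message_id,),
            )
            conn.execute("INSERT INTO orders (payload) VALUES (?)", (payload,))
    except sqlite3.IntegrityError:
        # Redelivery after a crash between commit and ack: nothing to redo.
        pass
    # Ack only after the transaction has committed (or was already applied).
    ack(message_id)
```

If the service dies between the commit and the ack, the broker redelivers the message, the primary key rejects the second insert, and the message is simply acked again without duplicating data.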
Do feel free to email the rabbitmq-discuss for more info and questions.
HTH
alexis

Related

How to publish data synchronously using mosquitto_publish?

I have written code (mosquitto_publish()) using Mosquitto to publish data to AWS.
My problem is the sequence in which data arrives at the MQTT broker. In the Paho client, I see waitForCompletion(), but nothing similar in Mosquitto. Would anyone please help me deal with this problem?
Based on the mosquitto_publish documentation, the function returns when sending has been "successful". MQTT does not guarantee the order in which messages arrive, so you should arguably watch for the arrival rather than the sending, and avoid having two messages race each other to the broker. With QoS 0, the client never knows if a message arrived; that requires QoS 1 or 2, for which additional communications are exchanged. Raise the quality of service, and you can use mosquitto_max_inflight_messages_set(mosq, 1) so that the client queues any additional messages until it receives confirmation from the server. This may be even more efficient than "waiting" for completion, since non-MQTT operations can continue. The queue might pile up if you send bursts of many messages.
The more complex alternative is to send messages unrestricted, but include an index with each, so that the subscriber can sort them upon receipt (for which it would need its own queue and delay). Not recommended if this burden is going to fall on multiple subscribers.
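The index-and-sort alternative can be sketched as a small reorder buffer on the subscriber side. This is only an illustration under the assumption that the publisher numbers messages consecutively starting at a known index; `ReorderBuffer` is a hypothetical name, and the sketch leaves out the delay/timeout you would need to give up on a message that never arrives.

```python
class ReorderBuffer:
    """Release payloads in index order, holding back any that arrive early."""

    def __init__(self, first_index=0):
        self.next_index = first_index  # next index we are allowed to release
        self.pending = {}              # early arrivals, keyed by index

    def push(self, index, payload):
        """Store one message and return the payloads that are now in order."""
        if index < self.next_index or index in self.pending:
            return []  # duplicate or already released
        self.pending[index] = payload
        ready = []
        # Drain every consecutive message starting at next_index.
        while self.next_index in self.pending:
            ready.append(self.pending.pop(self.next_index))
            self.next_index += 1
        return ready
```

Each call to `push` returns whatever can be handed to the application right away, so a gap only delays the messages behind it.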

Could we maintain order of messages in AWS-IoT at subscriber end?

We have created a thing using AWS-IoT service. We have created a topic in that particular thing. Subscriber has subscribed to that topic and publisher is sending messages to that topic.
Below is the publisher messaging order:
message 0
message 1
message 2
message 3
message 4
At the subscriber end the sequence of messages is not maintained. It's showing like this:
message 0
message 1
message 4
message 2
message 3
True, in AWS IoT the message broker does not guarantee order when it delivers messages to devices.
The reason is that in a typical distributed systems architecture, a message from the publisher to the subscriber may take any of multiple paths, which is what keeps the system highly available and scalable. In the case of AWS IoT, the Device Gateway supports the publish/subscribe messaging pattern and enables scalable, low-latency, and low-overhead communication.
However, depending on the use case, there are several possible solutions. The ordering logic has to be coordinated by the publishers themselves. One simple, generic approach is to add a sequence number on the device side; that should be sufficient to handle the ordering of messages between publisher and subscriber. On the receiver, logic that processes or discards each message after checking its sequence number completes the scheme.
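The process-or-discard check on the receiver could look roughly like this. It is a sketch under the assumption that only the newest state matters, so anything older than the last accepted sequence number is dropped rather than reordered; `make_sequence_filter` and the `device_id` key are hypothetical names.

```python
def make_sequence_filter():
    """Return an accept(device_id, seq) function that rejects stale messages."""
    last_seen = {}  # highest sequence number accepted so far, per device

    def accept(device_id, seq):
        # Discard anything at or below the last accepted sequence number.
        if seq <= last_seen.get(device_id, -1):
            return False
        last_seen[device_id] = seq
        return True

    return accept
```

With the arrival order from the question (0, 1, 4, 2, 3), this accepts 0, 1, and 4 and discards the late 2 and 3. If every message must be processed, the receiver has to buffer and sort instead of discarding.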
As written in the AWS documentation:
"The message broker does not guarantee the order in which messages and ACK are received."
I guess it's too late to answer this question, but I'll still go ahead so that others facing this issue have a workaround. I faced a similar scenario and did the following to make sure that the order is maintained.
I added a sequence ID or timestamp to the payload sent to the broker from my IoT device (it can be any kind of client).
I then configured the IoT rules engine (add actions) to send the messages directly to DynamoDB, where the data was automatically stored in sorted order (it needs to be configured to sort by seqID).
Then I used Lambda to pull the data out of DynamoDB for my further workflow, but you can use whatever service suits yours.

I want to log all MQTT messages of the broker. How should I design the database schema to avoid duplicate entries and allow fast searching?

I am implementing a callback in Java to store messages in a database. I have a client subscribing to '#'. The problem is that when this '#' client disconnects and reconnects, it adds duplicate entries for retained messages to the database. If I search for previous entries before inserting, bigger tables will be expensive in computing power. So should I allot a separate table for each sensor, or one per broker? I would really appreciate it if you could suggest better designs.
Subscribing to wildcard with a single client is definitely an anti-pattern. The reasons for that are:
Wildcard subscribers get all messages of the MQTT broker. Most client libraries can't handle that load, especially not when transforming / persisting messages.
If your wildcard subscriber dies, you will lose messages (unless the broker queues endlessly for you, which also doesn't work).
You essentially have a single point of failure in your system. Use MQTT brokers which are hardened for production use. These are much more robust single points of failure than your hand-written clients. (You can overcome the single point of failure through clustering and load balancing, though.)
So to solve the problem, I suggest the following:
Use a broker which can handle shared subscriptions (like HiveMQ or MessageSight), so you can balance all messages between many clients
Use a custom plugin for doing the persistence at the broker instead of the client.
You can also read more about that topic here: http://www.hivemq.com/blog/mqtt-sql-database
Also consider using QoS 2 (the highest MQTT level, "exactly once") for all messages to make sure each message is delivered once and only once. You may also consider timestamping each message so that duplicate inserts can be rejected if the QoS requirement is not met.
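On the schema side, a uniqueness constraint lets the database reject duplicates itself, so you never have to search for previous entries before inserting. Here is a minimal sketch, assuming every message carries a publisher-side timestamp that, together with the topic, identifies it uniquely; the table name, columns, and `log_message` are hypothetical, and SQLite stands in for whatever database you use.

```python
import sqlite3

# The composite primary key doubles as an index for lookups by topic
# and by time range within a topic.
SCHEMA = """
CREATE TABLE IF NOT EXISTS mqtt_log (
    topic   TEXT NOT NULL,
    sent_at TEXT NOT NULL,  -- publisher timestamp carried in the message
    payload BLOB NOT NULL,
    PRIMARY KEY (topic, sent_at)
);
"""


def log_message(conn, topic, sent_at, payload):
    """Insert one message; return False if it was already logged."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO mqtt_log (topic, sent_at, payload) "
            "VALUES (?, ?, ?)",
            (topic, sent_at, payload),
        )
    # rowcount is 0 when the row was ignored as a duplicate.
    return cur.rowcount == 1
```

A retained message that is replayed after a reconnect has the same topic and timestamp as before, so the second insert is silently ignored. A single table keyed this way usually scales better than one table per sensor.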

zmq pattern for reliable multicast

I am struggling to work out how to use zmq to implement the architecture I need. I have a classic publish/subscribe situation except that once client x has subscribed to a topic I need the topic data to be sent to it to be cached if the client dies and resent on reconnect. The data order is important and I can't miss messages should the client be offline for a while.
The PUB/SUB pattern doesn't seem to know about individual clients and will just stop sending to client x if it dies. Plus I can't find out this has happened and cache the messages, or know when it reconnects.
To try to get around this I used the REQ/REP pattern so the clients can announce themselves and have some persistence but this is not ideal for a couple of reasons:
1) The clients must constantly ask "got any data for me?" which offends my sensibilities
2) What happens if there's no data to send to client x but there is to client y? Without zmq I'd have had a thread per client and simply block the one with no data but I can't block client x without also blocking client y in a single thread.
Am I trying to shove a round peg in a square hole, here? Is there some way I can get feedback from PUB saying 'failed to send to client x'? so I can cache the messages instead? Or is there some other pattern I should be using?
Otherwise it's back to low level tcp for me...
Many thanks;
Jeremy
This is an area of active research.
I'm currently working on something similar. Our solution is to have a TCP "back channel" on which to receive missed data and have the subscribers know what the last successfully received publication was so that when they reconnect, they can ask for publications since that one.
In some sense you are trying to shove a round peg in a square hole. You have chosen the tool, PUB/SUB, and are trying to solve a problem it is not designed to solve, at least not without some additional design.
The PUB/SUB is an unreliable broadcast. The client can miss messages for several reasons:
Subscribers join late, so they miss messages the server already sent.
Subscribers can fetch messages too slowly, so queues build up and then overflow.
Subscribers can drop off and lose messages while they are away.
Subscribers can crash and restart, and lose whatever data they already received.
etc...
For REQ/REP the client does not have to constantly ask "got any data for me?". Instead, the client should probably acknowledge every message so that the server can send the correct data next time. If the server has nothing to send, it just stays quiet.
eg.
client server
Hello ---------------->
(wait until something exist to send)
<-------------------- Msg 1
Ack 1 ---------------->
(wait ...)
<-------------------- Msg 2
...
There are several good ways to do what you want with zmq. First of all you should try to design your protocol. What shall happen when I connect? Should I get any old messages then? If so, how old? If I miss a message while I am connected, should I be able to get it? If the client restarts, should I then get any old messages?
I strongly recommend the very good zmq guide http://zguide.zeromq.org/page:all, which has a lot of very good information regarding different ways to get reliability in a protocol. Read the complete guide, including chapters 4 and 5, which discuss different techniques for getting a reliable transport. Based on your problem description, Chapter 5 seems like a good start. Try out some of the examples. Then design your protocol.
What about adding an Archiver process? Part of a Client's subscription process would be to also notify the Archiver to start archiving the same subscription(s). The Archiver would keep all the messages received in an ordered list.
The Clients would record the time or id of the last published message they received. When they started after a crash, they would first contact the Archiver and say "Give me all messages since X". And they would resubscribe with the Publisher. When a client receives the same message from both the Publisher and the Archiver, it tells the Archiver to stop replaying.
The Archiver could purge messages older than the maximum expected downtime for an offline client. Alternatively, Clients could periodically check in to say "I am up to date with message Y", allowing purging of all older items.
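The Archiver's bookkeeping can be sketched in a few lines, independent of the transport. This is only an in-memory illustration under the assumption that message IDs increase monotonically; the class and method names are hypothetical, and the zmq sockets that would feed `archive` and serve `replay_since` are left out.

```python
class Archiver:
    """Keep published messages in order so clients can catch up after a crash."""

    def __init__(self):
        self.messages = []  # (msg_id, payload) pairs in ascending id order
        self.acked = {}     # last message id each client reported as received

    def archive(self, msg_id, payload):
        """Record one published message."""
        self.messages.append((msg_id, payload))

    def replay_since(self, last_id):
        """Answer "give me all messages since X" for a reconnecting client."""
        return [(i, p) for i, p in self.messages if i > last_id]

    def check_in(self, client, msg_id):
        """Handle "I am up to date with message Y" and purge what nobody needs."""
        self.acked[client] = msg_id
        # Only messages that every known client has received are safe to drop.
        safe = min(self.acked.values())
        self.messages = [(i, p) for i, p in self.messages if i > safe]
```

Note that the purge only considers clients that have checked in at least once, so a real implementation would register each client when it subscribes.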

distributed system: sql server broker implementation

I have a distributed application consisting of 8 servers, all running .NET windows services. Each service polls the database for work packages that are available.
The polling mechanism is important for other reasons (too boring for right now).
I'm thinking that this polling mechanism would be best implemented in a queue as the .NET services will all be polling the database regularly and when under load I don't want deadlocks.
I'm thinking I would want each .NET service to put a message into an input queue. The database server would pop each message of the input queue one at a time, process it, and put a reply message on another queue.
The issue I am having is that most examples of SQL Server Broker (SSB) are between database services and not initiated from a .NET client. I'm wondering if SQL Server Broker is just the wrong tool for this job. I see that the broker T-SQL DML is available from .NET but the way I think this should work doesn't seem to fit with SSB.
I think that I would need a single SSB service with 2 queues (in and out) and a single activation stored procedure.
This doesn't seem to be the way SSB works, am I missing something ?
You got the picture pretty much right, but there are some missing puzzle pieces in that picture:
SSB is primarily a communication technology, designed to deliver messages across the network with exactly-once-in-order semantics (EOIO), in a fully transactional fashion. It handles network connectivity (authentication, traffic confidentiality and integrity) and acknowledgement and retry logic for transmission.
Internal Activation is a unique technology in that it eliminates the requirement for a resident service to poll the queue. Polling can never achieve the dynamic balance needed for low latency and low resource consumption under light load. Polling forces either high latency (infrequent polling to save resources) or high resource utilization (frequent polling required to provide low latency). Internal activation also has a self-tuning capability to ramp up more processors in response to spikes in load (via max_queue_readers) while still being capable of tuning down the processing under low load by deactivating processors. One of the often overlooked advantages of the Internal Activation mechanism is the fact that it is fully contained within a database, i.e. it fails over with a cluster or database mirroring failover, and it travels with the database in backups and copy-attach operations. There is also an External Activation mechanism, but in general I much prefer the internal one for anything that fits an internal context (i.e. anything that is not, say, an HTTP request, which must be handled outside the engine process).
Conversation Group Locking is again unique and is a means to provide exclusive access to correlated message processing. Application can take advantage by using the conversation_group_id as a business logic key and this pretty much completely eliminates deadlocks, even under heavy multithreading.
There is also one issue which you got wrong about Service Broker: the need to put a response into a separate queue. Unlike most queueing products with which you may be familiar, the SSB primitive is not the message but a 'conversation'. A conversation is a full-duplex, bidirectional communication channel; you can think of it much like a TCP socket. An SSB service does not need to 'put responses in a queue'. Instead, it can simply send a response on the conversation handle of the received message (much like how you would respond in a TCP socket server by issuing a send on the same socket you got the request from, not by opening a new socket and sending a response). Furthermore, SSB will take care of the inherent message correlation, so the sender will know exactly which of its requests a response belongs to, since the response comes back on the same conversation handle the request was sent on (much like in the TCP socket case, where the client receives the response from the server on the same socket it sent the request on). This conversation handle is important again when it comes to the correlation locking of related conversation groups, see the link above.
You can embed .NET logic into Service Broker processing via SQLCLR. Almost all SSB applications have a non-SSB service on at least one end, directly or indirectly (e.g. via a trigger); a distributed application entirely contained in the database is of little use.
