I have written code that uses mosquitto_publish() from the Mosquitto library to publish data to AWS.
My problem is the sequence in which data arrives at the MQTT broker. In the Paho client I see waitForCompletion(), but nothing similar in Mosquitto. Could anyone please help me deal with this problem?
Based on the mosquitto_publish documentation, the function returns when sending has been "successful". MQTT does not guarantee the order in which messages arrive, so you should arguably watch for the arrival rather than the sending, and avoid having two messages race each other to the broker. With QoS 0, the client never knows whether a message arrived; that requires QoS 1 or 2, for which additional communications are exchanged. Raise the quality of service, and you can use mosquitto_max_inflight_messages_set(mosq, 1) so that the client queues any additional messages until it receives confirmation from the server. This may be even more efficient than "waiting" for completion, since non-MQTT operations can continue. Note that the queue might pile up if you send bursts of many messages.
The more complex alternative is to send messages unrestricted, but include an index with each, so that the subscriber can sort them upon receipt (for which it would need its own queue and delay). Not recommended if this burden is going to fall on multiple subscribers.
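The reordering side of that alternative can be sketched in plain C. This is a minimal, hypothetical reorder buffer (the names `on_message`, `deliver`, and the fixed window size are all assumptions, not part of any MQTT library): out-of-order payloads are held until the next expected index arrives, then flushed in sequence.

```c
#include <stdio.h>
#include <string.h>

#define WINDOW 16  /* how far ahead of the expected index we tolerate */

/* Hypothetical reorder buffer for a subscriber: holds out-of-order
 * payloads until the next expected sequence number arrives, then
 * flushes everything contiguous, in order. */
static char buffer[WINDOW][64];
static int present[WINDOW];
static unsigned next_seq = 0;

/* Call this from the subscriber's on-message callback, after parsing
 * the sequence index out of the payload. */
void on_message(unsigned seq, const char *payload,
                void (*deliver)(unsigned, const char *))
{
    if (seq < next_seq || seq >= next_seq + WINDOW)
        return;                      /* duplicate or too far ahead: drop */
    int slot = seq % WINDOW;
    snprintf(buffer[slot], sizeof buffer[slot], "%s", payload);
    present[slot] = 1;
    /* Flush every contiguous message starting at next_seq. */
    while (present[next_seq % WINDOW]) {
        int s = next_seq % WINDOW;
        present[s] = 0;
        deliver(next_seq, buffer[s]);
        next_seq++;
    }
}
```

The window is the "queue and delay" mentioned above: a real subscriber would also need a timeout policy for permanently missing indices, which is omitted here.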
I am trying to develop an application which reads byte arrays (each representing a C structure and associated with a UUID) from a cache and sends them to a Kafka server through a producer application written in C.
The Kafka producer application accumulates a fixed number of such packets and sends them at once.
What I wish to accomplish is to get an acknowledgment of which messages in the batch were successfully delivered, and to get their UUIDs back so that I can clear them from my cache application. I am new to Kafka; please guide me as to the best way to get this done.
Why not create another topic that 'listens' for messages that have been acknowledged, and then remove them from the cache? The producer can then just push messages to that topic when required.
When sending messages to Kafka, you can configure your producer to request acknowledgments (usually exposed as a config called acks). That allows the producer to know whether a message was successfully delivered or not.
For example, with librdkafka (in my opinion the best C Kafka client) you can get a delivery report callback when a produce request completes. It contains the message and an error in case of failure. That should enable you to easily identify which messages were sent successfully and mark them as done.
See the rd_kafka_conf_set_dr_msg_cb method for configuring a delivery report callback. The simple producer example demonstrates its usage.
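The cache bookkeeping around that callback might look like the sketch below. To keep it self-contained, the real librdkafka callback (registered with rd_kafka_conf_set_dr_msg_cb and receiving an rd_kafka_message_t) is replaced with a simplified, hypothetical `on_delivery(uuid, err)` signature; `cache_add` and `pending_count` are likewise illustrative names, not library functions.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 32

/* One cached packet awaiting a delivery report. */
struct cache_entry {
    char uuid[37];   /* textual UUID, as stored alongside the byte array */
    int  pending;    /* 1 while no successful delivery report has arrived */
};

static struct cache_entry cache[MAX_ENTRIES];
static int cache_len = 0;

/* Track a message before producing it; in real librdkafka code the
 * UUID (or the entry pointer) would be passed as the per-message
 * opaque so the delivery report can find it again. */
void cache_add(const char *uuid)
{
    struct cache_entry *e = &cache[cache_len++];
    snprintf(e->uuid, sizeof e->uuid, "%s", uuid);
    e->pending = 1;
}

/* Simplified delivery-report handler: on success, clear the entry so
 * it can be purged from the cache; on failure, leave it pending so it
 * can be retried. */
void on_delivery(const char *uuid, int err)
{
    for (int i = 0; i < cache_len; i++)
        if (strcmp(cache[i].uuid, uuid) == 0 && !err)
            cache[i].pending = 0;
}

/* How many messages still lack a successful delivery report. */
int pending_count(void)
{
    int n = 0;
    for (int i = 0; i < cache_len; i++)
        n += cache[i].pending;
    return n;
}
```

A periodic sweep over the entries with pending == 0 would then delete them from the cache application.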
We have created a thing using the AWS IoT service and created a topic for that thing. A subscriber has subscribed to the topic and a publisher is sending messages to it.
Below is the publisher messaging order:
message 0
message 1
message 2
message 3
message 4
At the subscriber end the sequence of messages is not maintained. It's showing like this:
message 0
message 1
message 4
message 2
message 3
True: in AWS IoT, the message broker does not guarantee ordering when delivering messages to devices.
The reason is that in a typical distributed-systems architecture, a single message from the publisher to the subscriber may take multiple paths, to ensure that the system is highly available and scalable. In the case of AWS IoT, the Device Gateway supports the publish/subscribe messaging pattern and enables scalable, low-latency, and low-overhead communication.
However, depending on the use case, there are several possible solutions, generally requiring the publishers themselves to do the coordination. One simple, generic approach is to add a sequence number on the device side, which should be sufficient to handle the ordering of messages between publisher and subscriber. On the receiver, logic to process or discard each message based on its sequence number should be helpful.
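The simplest receiver-side variant of that logic is a monotonic filter: process a message only if its sequence number is higher than the last one processed, and discard stale arrivals. A minimal sketch (the function name and the assumption that the publisher embeds a monotonically increasing sequence number are both hypothetical):

```c
/* Last sequence number we processed; -1 means "nothing yet". */
static long last_seen = -1;

/* Process-or-discard filter for the receiver: returns 1 if the
 * message should be processed, 0 if it arrived out of order (stale
 * or duplicate) and should be discarded. */
int accept_in_order(long seq)
{
    if (seq <= last_seen)
        return 0;
    last_seen = seq;
    return 1;
}
```

Note that this variant deliberately drops late messages rather than reordering them; if every message must be processed, a reorder buffer is needed instead.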
As written in the AWS documentation:
The message broker does not guarantee the order in which messages and ACK are received.
I guess it's too late to answer this question, but I'll still go ahead so others facing this issue can have a workaround. I faced a similar scenario and did the following to make sure that order was maintained.
1) I added a sequence ID or timestamp to the payload sent to the broker from my IoT device (it can be any kind of client).
2) I then configured the IoT rules engine (add actions) to send the messages directly to DynamoDB, where the data was automatically stored in sorted order (the table needs to be configured to sort by the sequence ID).
3) Then I used Lambda to pull the data out of DynamoDB for my further workflow, but you can use whatever service fits yours.
So I was looking at using Google's Pub/Sub service for queues, but by trial and error I came to the conclusion that I have no idea what it's good for in real applications.
Google says that it's
A global service for real-time and reliable messaging and streaming data
but the way it works is really strange to me. It holds acked messages for up to 7 days; if a subscriber re-subscribes it will get all the messages from the past 7 days, even those it already acked; acked messages will most likely be sent again to the same subscriber that acked them; and there's no FIFO either.
So I really do not understand how one should use this service if the only thing it guarantees is that a message will be delivered at least once to any subscriber. This cannot be used for non-idempotent actions, and each subscriber would have to store information about all messages already acked so it won't process a message multiple times, and so on...
Google Cloud Pub/Sub has a lot of different applications where decoupled systems need to send and receive messages. The overview page offers a number of use cases including balancing work loads, logging, and event notifications. It is true that Google Cloud Pub/Sub does not currently offer any FIFO guarantees and that messages can be redelivered.
However, the fact that the delivery guarantee is "at least once" should not be taken to mean that acked messages are redelivered when a subscriber re-subscribes. Redelivery of acked messages is a rare event. It generally happens only when the ack did not make it all the way back to the service due to a networking issue, a machine failure, or some other exceptional condition. That means apps do need to be able to handle this case, but not that it will happen frequently.
For different applications, what happens on message redelivery can differ. In a case such as cache invalidation, mentioned in the overview page, getting two events to invalidate an entry in a cache just means the value will have to be reloaded an extra time, so there is not a correctness concern.
In other cases, like tracking button clicks or other events on a website for logging or stats purposes, infrequent acked message redelivery is likely not going to affect the information gathered in a significant way, so not bothering to check if events are duplicates is fine.
For cases where it is necessary to ensure that messages are processed exactly once, then there has to be some sort of tracking on the subscriber side to ensure this is the case. It might be that the subscriber is already accessing and updating an underlying database in response to messages and duplicate events can be detected via that storage.
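That subscriber-side tracking can be as small as a "seen set" of message IDs. A minimal in-memory sketch follows (the name `first_time` and the fixed capacity are assumptions; a production subscriber would keep this state in its database, e.g. as a unique-key insert, so it survives restarts):

```c
#include <stdio.h>
#include <string.h>

#define SEEN_CAP 128

/* Recently processed message IDs; redelivered messages are skipped. */
static char seen[SEEN_CAP][64];
static int seen_len = 0;

/* Returns 1 if the message ID is new (and records it), 0 if it is a
 * duplicate delivery that should be ignored. */
int first_time(const char *msg_id)
{
    for (int i = 0; i < seen_len; i++)
        if (strcmp(seen[i], msg_id) == 0)
            return 0;
    if (seen_len < SEEN_CAP)
        snprintf(seen[seen_len++], sizeof seen[0], "%s", msg_id);
    return 1;
}
```

Since, as noted above, redelivery of acked messages is rare, the set only has to cover a short recent window in most designs.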
I am implementing a callback in Java to store messages in a database. I have a client subscribing to '#'. The problem is that when this '#' client disconnects and reconnects, it adds duplicate entries of retained messages to the database. If I search for previous entries, bigger tables will be expensive in computing power. So should I allot a separate table for each sensor, or one per broker? I would really appreciate it if you could suggest a better design.
Subscribing to wildcard with a single client is definitely an anti-pattern. The reasons for that are:
Wildcard subscribers get all messages of the MQTT broker. Most client libraries can't handle that load, especially not when transforming / persisting messages.
If your wildcard subscriber dies, you will lose messages (unless the broker queues endlessly for you, which also doesn't work).
You essentially have a single point of failure in your system. MQTT brokers hardened for production use are much more robust single points of failure than hand-written clients. (You can overcome the SPOF through clustering and load balancing, though.)
So to solve the problem, I suggest the following:
Use a broker which can handle shared subscriptions (like HiveMQ or MessageSight), so you can balance all messages between many clients
Use a custom plugin for doing the persistence at the broker instead of the client.
You can also read more about that topic here: http://www.hivemq.com/blog/mqtt-sql-database
Also consider using QoS = 2 for all messages to make sure each one is delivered exactly once (MQTT's highest QoS level is 2). You may also consider time-stamping each message to avoid inserting duplicates if the QoS requirement is not met.
I am struggling to work out how to use zmq to implement the architecture I need. I have a classic publish/subscribe situation except that once client x has subscribed to a topic I need the topic data to be sent to it to be cached if the client dies and resent on reconnect. The data order is important and I can't miss messages should the client be offline for a while.
The PUB/SUB pattern doesn't seem to know about individual clients and will just stop sending to client x if it dies. Plus I can't find out this has happened and cache the messages, or know when it reconnects.
To try to get around this I used the REQ/REP pattern so the clients can announce themselves and have some persistence but this is not ideal for a couple of reasons:
1) The clients must constantly ask "got any data for me?" which offends my sensibilities
2) What happens if there's no data to send to client x but there is to client y? Without zmq I'd have had a thread per client and simply block the one with no data but I can't block client x without also blocking client y in a single thread.
Am I trying to shove a round peg into a square hole here? Is there some way I can get feedback from PUB saying 'failed to send to client x', so I can cache the messages instead? Or is there some other pattern I should be using?
Otherwise it's back to low level tcp for me...
Many thanks;
Jeremy
This is an area of active research.
I'm currently working on something similar. Our solution is to have a TCP "back channel" on which to receive missed data and have the subscribers know what the last successfully received publication was so that when they reconnect, they can ask for publications since that one.
In some sense you are trying to shove a round peg into a square hole. You have chosen a tool (PUB/SUB) and are trying to solve a problem it was not designed to solve, at least not without some additional design.
The PUB/SUB is an unreliable broadcast. The client can miss messages for several reasons:
Subscribers join late, so they miss messages the server already sent.
Subscribers can fetch messages too slowly, so queues build up and then overflow.
Subscribers can drop off and lose messages while they are away.
Subscribers can crash and restart, and lose whatever data they already received.
etc...
With REQ/REP the client does not have to constantly ask "got any data for me?"; instead, the client should acknowledge each piece of data so that the server can send the correct data next time. If the server has nothing to send, it simply stays quiet.
E.g.:
client server
Hello ---------------->
(wait until something exist to send)
<-------------------- Msg 1
Ack 1 ---------------->
(wait ...)
<-------------------- Msg 2
...
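The exchange above can be modeled as a tiny server-side state machine: the server tracks the index of the next unacknowledged message and only advances when the client's ack arrives, so nothing is lost if the client disappears mid-stream. A toy C sketch, with hypothetical names (`server_next`, `server_ack`) and the transport left out:

```c
#define QLEN 3

/* Messages the server wants the client to have, in order. */
static const char *queue[QLEN] = { "Msg 1", "Msg 2", "Msg 3" };

/* Index of the next message awaiting acknowledgment. */
static int acked = 0;

/* Server side: the next message to (re)send, or NULL when the client
 * is fully caught up.  A disconnected client simply resumes here. */
const char *server_next(void)
{
    return acked < QLEN ? queue[acked] : 0;
}

/* Server side: called when the client's "Ack i" arrives; only the
 * expected ack advances the cursor, so stray acks are harmless. */
void server_ack(int i)
{
    if (i == acked)
        acked++;
}
```

In a real system the queue would grow as the publisher produces data, and `server_next` would block (or the REP socket would simply not reply) until something exists to send, matching the "(wait ...)" steps in the diagram.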
There are several good ways to do what you want with zmq. First of all you should try to design your protocol. What shall happen when I connect? Should I get any old messages then? If so, how old? If i miss a message when I am connected, should I be able to get it? If the client restarts, should I then get any old messages?
I strongly recommend the very good zmq guide http://zguide.zeromq.org/page:all that have a lot of very good information regardning different ways to get reliability in a protocol. Read the complete guide, including the chapters 4 and 5 whih discuss different techniques on getting a reliable transport. Based on your problem discussion: the Chapter 5 seems like a good start. Try out some of the examples. Then design your protocol.
What about adding an Archiver process? Part of a client's subscription process would be to also notify the Archiver to start archiving the same subscription(s). The Archiver would keep all the messages received in an ordered list.
The clients would record the time or ID of the last published message they received. When they restart after a crash, they would first contact the Archiver and say "give me all messages since X", and then resubscribe with the publisher. When a client receives the same message from both the publisher and the Archiver, it tells the Archiver to stop replaying.
The Archiver could purge messages older than the maximum expected downtime for an offline client. Alternatively, clients could periodically check in to say "I am up to date with message Y", allowing all older items to be purged.
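The Archiver's core (an ordered log, "give me everything since X", and purging through an acknowledged ID) can be sketched in a few lines of C. All names here (`archive`, `replay_since`, `purge_through`) are hypothetical, and the fixed-size array stands in for whatever durable store a real Archiver would use:

```c
#include <stdio.h>
#include <string.h>

#define LOG_CAP 100

/* One archived publication: its ID and payload, kept in arrival order. */
struct entry {
    long id;
    char payload[32];
};

static struct entry logbuf[LOG_CAP];
static int log_len = 0;

/* Append a publication to the ordered log. */
void archive(long id, const char *payload)
{
    if (log_len < LOG_CAP) {
        logbuf[log_len].id = id;
        snprintf(logbuf[log_len].payload, sizeof logbuf[0].payload,
                 "%s", payload);
        log_len++;
    }
}

/* "Give me all messages since X": copies every entry with id > since
 * into out (up to max) and returns how many were copied. */
int replay_since(long since, struct entry *out, int max)
{
    int n = 0;
    for (int i = 0; i < log_len && n < max; i++)
        if (logbuf[i].id > since)
            out[n++] = logbuf[i];
    return n;
}

/* Client checked in with "I am up to date with message Y": drop
 * everything with id <= Y. */
void purge_through(long id)
{
    int n = 0;
    for (int i = 0; i < log_len; i++)
        if (logbuf[i].id > id)
            logbuf[n++] = logbuf[i];
    log_len = n;
}
```

The "stop replaying" handshake from the paragraph above would simply be the client ceasing to call `replay_since` once the Archiver's stream overlaps the live subscription.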