Implement persistence message queue using plain file in Linux - c

Here i want to implement persistence queue in C programming.
Here i want save messages to persistence queue and then i want to send them.
If my embedded device restarts and when starts again then and then i can also send messages from persistence message queue which are pending.
Can any one have some idea how i can implement this and how it will works?
Thanks

Store it on some persistent storage.
There's not much more to tell you with the information you provided.

If you want it to be persistent than you have to store data on hard drive. What I would suggest is using http://www.sqlite.org/ for it. There are bindigs for many languages.

A persistent message is a message that must not be lost, even if the broker fails.
A persistent queue is able to write messages to disk, so that they will not be lost in the event of system shut down or failure.
Now, messages can be either persistent or non-persistent that travel through the persistent queues.
When a sender sends a persistent message to the broker, it routes it to the recipient queue and waits for the message to be written to the persistent store, before acknowledging delivery to the actual sender.
If a queue is not persistent, messages on the queue are not written to disk.
If a message is not persistent, it is not written to disk even if it is on a persistent queue.
When a reciever queue reads a message from a persistent queue, it is not removed from the queue until the reciever acknowledges the message.
Now you have to put in a journaling mechanism, to keep record of the states of the messages and the brokers on the disk.Then you have to manage caches for the messages and the journals, in a proper order.
This is a simple idea of what a persistent queue is supposed to be and how to write one.
Persistent queues are used by many proprietary software systems like the IBM WebSphere, RedHat's MRG,etc. Refer them for more idea.

Related

How to publish data synchronously using mosquitto_publish?

I have written code (mosquitto_publish()) using Mosquitto to publish data to AWS.
My problem is the sequence with which data is arriving on the MQTT broker. In the Paho client, I see waitForCompletion(), but nothing similar in Mosquitto. Would anyone please help me in dealing with this problem ?
Based on the mosquitto_publich documentation, the function returns when sending has been "successful". MQTT does not guarantee the order in which messages arrive, so you should arguably watch for the arrival rather than the sending, and avoid having two messages race each other to the broker. With QoS 0, the client never knows if a message arrived; that requires QoS 1 or 2, for which additional communications are exchanged. Raise the quality of service, and you can use mosquitto_max_inflight_messages_set(mosq, 1) so that the client queues any additional messages until it receives confirmation from the server. This may be even more efficient than "waiting" for completion, since non-MQTT operations can continue. The queue might pile up if you send bursts of many messages.
The more complex alternative is to send messages unrestricted, but include an index with each, so that the subscriber can sort them upon receipt (for which it would need its own queue and delay). Not recommended if this burden is going to fall on multiple subscribers.

Not persisting messages when the system comes up in the wrong order

We're sending messages to Apache Camel using RabbitMQ.
We have a "sender" and a Camel route that processes a RabbitMQ message sent by the sender.
We're having deployment issues regarding which end of the system comes up first.
Our system is low-volume. I am sending perhaps 100 messages at a time. The point of the message is to reduce 'temporal cohesion' between a thing happening in our primary database, and logging of same to a different database. We don't want our front-end to have to wait.
The "sender" will create an exchange if it does not exist.
The issue is causing deployment issues.
Here's what I see:
If I down the sender, down Camel, delete the exchange (clean slate), start the sender, then start Camel, and send 100 messages, the system works. (I think because the sender has to be run manually for testing, the Exchange is being created by the Camel Route...)
If I clean slate, and send a message, and then up Camel afterwards, I can see the messages land in RabbitMQ (using the web tool). No queues are bound. Once I start Camel, I can see its bound queue attached to the Exchange. But the messages have been lost to time and fate; they have apparently been dropped.
If, from the current state, I send more messages, they flow properly.
I think that if the messages that got dropped were persisted, I'd be ok. What am I missing?
For me it's hard to say what exactly is wrong, but I'll try and provide some pointers.
You should set up all exchanges and queues to be durable, and the messages persistent. You should never delete any of these entities (unless they are empty and you no longer use them) and maybe look at them as tables in a database. It's your infrastructure of sorts, and as with database, you wouldn't want that the first DB client to create a table that it needs (this of course applies to your use case, at least that's what it seems to me).
In the comments I mentioned flow state of the queue, but with 100 messages this will probably never happen.
Regarding message delivery - persistent or not, the broker (server) keeps them until they are consumed with acknowledgment that's sent back by the consumer (in lot's of APIs this is done automatically but it's actually one of the most important concepts).
If the exchange to which the messages were published is deleted, they are gone. If the server gets killed or restarted and the messages are persisted - again, they're gone. There may as well be some more scenarios in which messages get dropped (if I think of some I'll edit the answer).
If you don't have control over creating (declaring usually in the APIs) exchanges and queues, than (aside from the fact that's it's not the best thing IMHO) it can be tricky since declaring those entities is idempotent, i.e. you can't create a durable queue q1 , if a non durable queue with the same name already exists. This could also be a problem in your case, since you mention the which part of the system comes first thing - maybe something is not declared with same parameters on both sides...

I want to log all mqtt messages of the broker. How should I design schema of database. Avoiding dulplicate entries and fast searching

I am implementing a callback in java to store messages in a database. I have a client subscribing to '#'. But the problem is when this # client disconnects and reconnect it adds duplicate entries in the database of retained messages. If I search for previous entries bigger tables will be expensive in computing power. So should I allot a separate table for each sensor or per broker. I would really appreciate if you suggest me better designs.
Subscribing to wildcard with a single client is definitely an anti-pattern. The reasons for that are:
Wildcard subscribers get all messages of the MQTT broker. Most client libraries can't handle that load, especially not when transforming / persisting messages.
If you wildcard subscriber dies, you will lose messages (unless the broker queues endlessly for you, which also doesn't work)
You essentially have a single point of failure in your system. Use MQTT brokers which are hardened for production use. These are much more robust single point of failures than your hand-written clients. (You can overcome the SIP through clustering and load balancing, though).
So to solve the problem, I suggest the following:
Use a broker which can handle shared subscriptions (like HiveMQ or MessageSight), so you can balance all messages between many clients
Use a custom plugin for doing the persistence at the broker instead of the client.
You can also read more about that topic here: http://www.hivemq.com/blog/mqtt-sql-database
Also consider using QoS = 3 for all message to make sure one and only one message is delivered. Also you may consider time-stamp each message to avoid inserting duplicate messages if QoS requirement is not met.

JMS auto persistance and auto recovery

I'm trying to solve the next task:
I have 1 output queue and 1 input queue and database.
If output queue is not available, I need to persist all input messages to DB.
As soon as output queue being available, input queue (or broker) should start send all messages from DB and, at the same time, delete these messages from DB. All should be done automatically, not manually.
Are any message brokers (ActiveMQ, RabbitMQ) have "out of the box" solution or not?
I don't think any of the messaging providers provide out of the box support that you are asking for. You need to write an application to do that job. The application can be very simple that uses global transactions to coordinate putting messages to queue and removing from database.
If your business logic allows, I would suggest you to look at the possibility of using persistent messages when putting to input queue. This way you can avoid persisting messages to a database when output queue is not available. When the output queue becomes available, your application can pull messages from input queue, process and put to output queue.

Message Queue vs Task Queue difference

I wonder what is the difference between them. Are they describing the same thing?
Is Google App Engine Service Task Queue is an implementation of Message Queue?
I asked a similar question on some Developer Community Groups on Facebook. It was not about GoogleAppEngine specifically - i asked in more of a general sense to determine use case between RabbitMQ and Celery. Here are the responses I got which I think is relevant to the topic and fairly clarifies the difference between a message queue and a task queue.
I asked:
Will it be appropriate to say that "Celery is a
QueueWrapper/QueueFramework which takes away the complexity of having
to manage the internal queueManagement/queueAdministration activities
etc"?
I understand the book language which says "Celery is a task queue" and
"RabbitMQ is a message broker". However, it seems a little confusing
as a first-time celery user because we have always known RabbitMQ to
be the 'queue'.
Please help in explaining how/what celery does in constrast with
rabbitMQ
A response I got from Abu Ashraf Masnun
Task Queue and Message Queue. RabbitMQ is a "MQ". It receives messages
and delivers messages.
Celery is a Task Queue. It receives tasks with their related data,
runs them and delivers the results.
Let's forget Celery for a moment. Let's talk about RabbitMQ. What
would we usually do? Our Django/Flask app would send a message to a
queue. We will have some workers running which will be waiting for new
messages in certain queues. When a new message arrives, it starts
working and processes the tasks.
Celery manages this entire process beautifully. We no longer need to
learn or worry about the details of AMQP or RabbitMQ. We can use Redis
or even a database (MySQL for example) as a message broker. Celery
allows us to define "Tasks" with our worker codes. When we need to do
something in the background (or even foreground), we can just call
this task (for instant execution) or schedule this task for delayed
processing. Celery would handle the message passing and running the
tasks. It would launch workers which would know how to run your
defined tasks and store the results. So you can later query the task
result or even task progress when needed.
You can use Celery as an alternative for cron job too (though I don't
really like it)!
Another response I got from Juan Francisco Calderon Zumba
My understanding is that celery is just a very high level of
abstraction to implement the producer / consumer of events. It takes
out several painful things you need to do to work for example with
rabbitmq. Celery itself is not the queue. The events queues are stored
in the system of your choice, celery helps you to work with such
events without having to write the producer / consumer from scratch.
Eventually, here is what I took home as my final learning:
Celery is a queue Wrapper/Framework which takes away the complexity of
having to manage the underlying AMQP mechanisms/architecture that come
with operating RabbitMQ directly
GAE's Task Queues are a means for allowing an application to do background processing, and they are not going to serve the same purpose as a Message Queue. They are very different things that serve different functions.
A Message Queue is a mechanism for sharing information, between processes, threads, systems.
An AppEngine task Queue is a way for an AppEngine application to say to itself, I need to do this, but I am going to do it later, outside of the context of a client request.
Might differ depending on the context, but below is my understanding:
Message queue
Message queue is the message broker part - a queue data structure implementation, where you can:
Enqueue/produce/push/send (different terms depending on the platform, but refers to the same thing) message to.
Dequeue/consume/pull/receive message from.
Provides FIFO ordering.
Task queue
Task queue, on the other hand, is to process tasks:
At a desired pace - how many tasks can your system handle at the same time? Perhaps determined by the number of CPU cores on your machine, or if you're on Kubernetes, number of nodes and their size. It's about concurrency control, or the less-cool term, "buffering".
In an async way - non-blocking task processing. Processes tasks in the background, so your main process can go do other stuff after kicking off a task. Server API over HTTP is a popular use case, where you want to respond quickly to the client because HTTP request usually has a short timeout (<= 30s), especially when your API is triggered by end user (humans are impatient). If your task takes longer than seconds, you want to consider bring it off to the background, and give a API response like "OK I received your request, I'll process it when I have time".
Their difference
As you can see, message queue and task queue focus on different aspects, they can overlap, but not necessarily.
An example for task queue but not message queue - if your tasks don't care about ordering - each task does not depend on one another - then you don't need a "queue", FIFO data structure. You can, but you don't have to. You just need a place to store the buffered tasks like a pool, a simple SQL/NoSQL database or even S3 might suffice.
An opposite example is push notification. You use message queue but not necessarily task queue. Server generates events/notifications and wants to deliver them to the client. The server will push notifications in the queue. The client consumes/pulls down notifications from the queue when they are ready to do so. Products like GCP PubSub, AWS SNS can be used for this.
Takeaway
Task queue is usually more complicate than a message queue because of the concurrency control, not to mention if you want horizontal scaling like distributing workers across nodes to optimize concurrency.
Tools like Celery are task queue + message queue baked into one. There aren't many tools like Celery as I know that do both, guess that's why it's so popular (alternatives are Bull or Bee in NodeJS, or if you know more please let me know!).
My company recently had to implement a task queue. While googling for the proper tool these two terms confused me a lot, because I kind of know what I want, but don't know how people call it and what keyword I should search by.
I personally haven't used AppEngine much so cannot answer that, but you can always check for the points above to see if it satisfies the requirements.
If we only talk about the functionality then it's would be hard to discern the difference.
In my company, we try and fail miserably due to our misunderstanding between the two.
We create our worker queue (aka task queue aka scheduler aka cron)
and we use it for long polling. We set the task schedule 5 sec into the future (delay) to trigger the polling code. The code fires a request and checks the response. If the condition doesn't meet we would create a task again to extend the polling and not extend otherwise.
This is a DB, network and computationally intensive. Our new use case requires a fast response we have to reduce the delay to 0.1 and that is a lot of waste per polling.
So this is the prime example where technology achieve the same goal but not the same proficiency
So the answer is the main difference is in the goal Message Queue and Task Queue try to achieve.
Good read:
https://stackoverflow.com/a/32804602/3422861
If you think in terms of browser’s JavaScript runtime environment or Nodejs JavaScript runtime environment, the answer is:
The difference between the message queue and the micro-task queue (such as Promises is) the micro-task queue has a higher priority than the message queue, which means that Promise task inside the micro-task queue will be executed before the callbacks inside the message queue.

Resources