JMS auto persistance and auto recovery - database

I'm trying to solve the next task:
I have 1 output queue and 1 input queue and database.
If output queue is not available, I need to persist all input messages to DB.
As soon as output queue being available, input queue (or broker) should start send all messages from DB and, at the same time, delete these messages from DB. All should be done automatically, not manually.
Are any message brokers (ActiveMQ, RabbitMQ) have "out of the box" solution or not?

I don't think any of the messaging providers provide out of the box support that you are asking for. You need to write an application to do that job. The application can be very simple that uses global transactions to coordinate putting messages to queue and removing from database.
If your business logic allows, I would suggest you to look at the possibility of using persistent messages when putting to input queue. This way you can avoid persisting messages to a database when output queue is not available. When the output queue becomes available, your application can pull messages from input queue, process and put to output queue.

Related

Eventual consistency with both database and message queue records

I have an application where I need to store some data in a database (mysql for instance) and then publish some data in a message queue. My problem is: If the application crashes after the storage in the database, my data will never be written in the message queue and then be lost (thus eventual consistency of my system will not be guaranted).
How can I solve this problem ?
I have an application where I need to store some data in a database (mysql for instance) and then publish some data in a message queue. My problem is: If the application crashes after the storage in the database, my data will never be written in the message queue and then be lost (thus eventual consistency of my system will not be guaranted). How can I solve this problem ?
In this particular case, the answer is to load the queue data from the database.
That is, you write the messages that need to be queued to the database, in the same transaction that you use to write the data. Then, asynchronously, you read that data from the database, and write it to the queue.
See Reliable Messaging without Distributed Transactions, by Udi Dahan.
If the application crashes, recovery is simple -- during restart, you query the database for all unacknowledged messages, and send them again.
Note that this design really expects the consumers of the messages to be designed for at least once delivery.
I am assuming that you have a loss-less message queue, where once you get a confirmation for writing data, the queue is guaranteed to have the record.
Basically, you need a loop with a transaction that can roll back or a status in the database. The pseudo code for a transaction is:
Begin transaction
Insert into database
Write to message queue
When message queue confirms, commit transaction
Personally, I would probably do this with a status:
Insert into database with a status of "pending" (or something like that)
Write to message queue
When message confirms, change status to "committed" (or something like that)
In the case of recovery from failure, you may need to check the message queue to see if any "pending" records were actually written to the queue.
I'm afraid that answers (VoiceOfUnreason, Udi Dahan) just sweep the problem under the carpet. The problem under carpet is: How the movement of data from database to queue should be designed so that the message will be posted just once (without XA). If you solve this, then you can easily extend that concept by any additional business logic.
CAP theorem tells you the limits clearly.
XA transactions is not 100% bullet proof solution, but seems to me best of all others that I have seen.
Adding to what #Gordon Linoff said, assuming durable messaging (something like MSMQ?) the method/handler is going to be transactional, so if it's all successful, the message will be written to the queue and the data to your view model, if it fails, all will fail...
To mitigate the ID issue you will need to use GUIDs instead of DB generated keys (if you are using messaging you will need to remove your referential integrity anyway and introduce GUIDS as keys).
One more suggestion, don't update the database, but inset only/upsert (the pending row and then the completed row) and have the reader do the projection of the data based on the latest row (for example)
Writing message as part of transaction is a good idea but it has multiple drawbacks like
If your
a. database/language does not support transaction
b. transaction are time taking operation
c. you can not afford to wait for queue response while responding to your service call.
d. If your database is already under stress, writing message will exacerbate the impact of higher workload.
the best practice is to use Database Streams. Most of the modern databases support streams(Dynamodb, mongodb, orcale etc.). You have consumer of database stream running which reads from database stream and write to queue or invalidate cache, add to search indexer etc. Once all of them are successful you mark the stream item as processed.
Pros of this approach
it will work in the case of multi-region deployment where there is a regional failure. (you should read from regional stream and hydrate all the regional data stores.)
No Overhead of writing more records or performance bottle necks of queues.
You can use this pattern for other data sources as well like caching, queuing, searching.
Cons
You may need to call multiple services to construct appropriate message.
One database stream might not be sufficient to construct appropriate message.
ensure the reliability of your streams, like redis stream is not reliable
NOTE this approach also does not guarantee exactly once semantics. The consumer logic should be idempotent and should be able to handle duplicate message

Implement persistence message queue using plain file in Linux

Here i want to implement persistence queue in C programming.
Here i want save messages to persistence queue and then i want to send them.
If my embedded device restarts and when starts again then and then i can also send messages from persistence message queue which are pending.
Can any one have some idea how i can implement this and how it will works?
Thanks
Store it on some persistent storage.
There's not much more to tell you with the information you provided.
If you want it to be persistent than you have to store data on hard drive. What I would suggest is using http://www.sqlite.org/ for it. There are bindigs for many languages.
A persistent message is a message that must not be lost, even if the broker fails.
A persistent queue is able to write messages to disk, so that they will not be lost in the event of system shut down or failure.
Now, messages can be either persistent or non-persistent that travel through the persistent queues.
When a sender sends a persistent message to the broker, it routes it to the recipient queue and waits for the message to be written to the persistent store, before acknowledging delivery to the actual sender.
If a queue is not persistent, messages on the queue are not written to disk.
If a message is not persistent, it is not written to disk even if it is on a persistent queue.
When a reciever queue reads a message from a persistent queue, it is not removed from the queue until the reciever acknowledges the message.
Now you have to put in a journaling mechanism, to keep record of the states of the messages and the brokers on the disk.Then you have to manage caches for the messages and the journals, in a proper order.
This is a simple idea of what a persistent queue is supposed to be and how to write one.
Persistent queues are used by many proprietary software systems like the IBM WebSphere, RedHat's MRG,etc. Refer them for more idea.

In AWS or Azure cloud architecture, why would I choose message queues over polling a database?

In the past I have used message queues to handle spikes in demand. This system works fine, except for logging purposes. I write successfully processed messages to a database for reporting and logging. This makes me wonder why I don't just write the message into a database from the beginning, and have my "worker roles" poll the database, rather than the message queue.
I'm guessing this is not the best design because as the database grows, polling a huge database just to look for one "unchecked" record to process will become very slow, whereas a message queue just gives me one if I ask for it instantaneously.
Am I missing something? Are there other reasons to choose a message queue over polling a database? I would love to offer users the ability to see what has yet to be processed (floating in the queue) but that operation takes much longer than running a query on the database, so it seems to be a tradeoff.
Thanks for any input.
One other reason that springs to mind is blocking/locking. Typically, if you just poll a database looking for work, it'll work reasonably well as long as you have only one worker digesting the messages. However, if you want to horizontally scale out, and throw more workers at the problem, you'll typically end up causing lock escalations as you change the work messages in your database based "queue" from "needs to get run" to "ran successfully" or whatever.
Using the message queue takes care of this trickiness for you, as all the thread safety and locking/blocking is out of the way.

Message Queue vs Task Queue difference

I wonder what is the difference between them. Are they describing the same thing?
Is Google App Engine Service Task Queue is an implementation of Message Queue?
I asked a similar question on some Developer Community Groups on Facebook. It was not about GoogleAppEngine specifically - i asked in more of a general sense to determine use case between RabbitMQ and Celery. Here are the responses I got which I think is relevant to the topic and fairly clarifies the difference between a message queue and a task queue.
I asked:
Will it be appropriate to say that "Celery is a
QueueWrapper/QueueFramework which takes away the complexity of having
to manage the internal queueManagement/queueAdministration activities
etc"?
I understand the book language which says "Celery is a task queue" and
"RabbitMQ is a message broker". However, it seems a little confusing
as a first-time celery user because we have always known RabbitMQ to
be the 'queue'.
Please help in explaining how/what celery does in constrast with
rabbitMQ
A response I got from Abu Ashraf Masnun
Task Queue and Message Queue. RabbitMQ is a "MQ". It receives messages
and delivers messages.
Celery is a Task Queue. It receives tasks with their related data,
runs them and delivers the results.
Let's forget Celery for a moment. Let's talk about RabbitMQ. What
would we usually do? Our Django/Flask app would send a message to a
queue. We will have some workers running which will be waiting for new
messages in certain queues. When a new message arrives, it starts
working and processes the tasks.
Celery manages this entire process beautifully. We no longer need to
learn or worry about the details of AMQP or RabbitMQ. We can use Redis
or even a database (MySQL for example) as a message broker. Celery
allows us to define "Tasks" with our worker codes. When we need to do
something in the background (or even foreground), we can just call
this task (for instant execution) or schedule this task for delayed
processing. Celery would handle the message passing and running the
tasks. It would launch workers which would know how to run your
defined tasks and store the results. So you can later query the task
result or even task progress when needed.
You can use Celery as an alternative for cron job too (though I don't
really like it)!
Another response I got from Juan Francisco Calderon Zumba
My understanding is that celery is just a very high level of
abstraction to implement the producer / consumer of events. It takes
out several painful things you need to do to work for example with
rabbitmq. Celery itself is not the queue. The events queues are stored
in the system of your choice, celery helps you to work with such
events without having to write the producer / consumer from scratch.
Eventually, here is what I took home as my final learning:
Celery is a queue Wrapper/Framework which takes away the complexity of
having to manage the underlying AMQP mechanisms/architecture that come
with operating RabbitMQ directly
GAE's Task Queues are a means for allowing an application to do background processing, and they are not going to serve the same purpose as a Message Queue. They are very different things that serve different functions.
A Message Queue is a mechanism for sharing information, between processes, threads, systems.
An AppEngine task Queue is a way for an AppEngine application to say to itself, I need to do this, but I am going to do it later, outside of the context of a client request.
Might differ depending on the context, but below is my understanding:
Message queue
Message queue is the message broker part - a queue data structure implementation, where you can:
Enqueue/produce/push/send (different terms depending on the platform, but refers to the same thing) message to.
Dequeue/consume/pull/receive message from.
Provides FIFO ordering.
Task queue
Task queue, on the other hand, is to process tasks:
At a desired pace - how many tasks can your system handle at the same time? Perhaps determined by the number of CPU cores on your machine, or if you're on Kubernetes, number of nodes and their size. It's about concurrency control, or the less-cool term, "buffering".
In an async way - non-blocking task processing. Processes tasks in the background, so your main process can go do other stuff after kicking off a task. Server API over HTTP is a popular use case, where you want to respond quickly to the client because HTTP request usually has a short timeout (<= 30s), especially when your API is triggered by end user (humans are impatient). If your task takes longer than seconds, you want to consider bring it off to the background, and give a API response like "OK I received your request, I'll process it when I have time".
Their difference
As you can see, message queue and task queue focus on different aspects, they can overlap, but not necessarily.
An example for task queue but not message queue - if your tasks don't care about ordering - each task does not depend on one another - then you don't need a "queue", FIFO data structure. You can, but you don't have to. You just need a place to store the buffered tasks like a pool, a simple SQL/NoSQL database or even S3 might suffice.
An opposite example is push notification. You use message queue but not necessarily task queue. Server generates events/notifications and wants to deliver them to the client. The server will push notifications in the queue. The client consumes/pulls down notifications from the queue when they are ready to do so. Products like GCP PubSub, AWS SNS can be used for this.
Takeaway
Task queue is usually more complicate than a message queue because of the concurrency control, not to mention if you want horizontal scaling like distributing workers across nodes to optimize concurrency.
Tools like Celery are task queue + message queue baked into one. There aren't many tools like Celery as I know that do both, guess that's why it's so popular (alternatives are Bull or Bee in NodeJS, or if you know more please let me know!).
My company recently had to implement a task queue. While googling for the proper tool these two terms confused me a lot, because I kind of know what I want, but don't know how people call it and what keyword I should search by.
I personally haven't used AppEngine much so cannot answer that, but you can always check for the points above to see if it satisfies the requirements.
If we only talk about the functionality then it's would be hard to discern the difference.
In my company, we try and fail miserably due to our misunderstanding between the two.
We create our worker queue (aka task queue aka scheduler aka cron)
and we use it for long polling. We set the task schedule 5 sec into the future (delay) to trigger the polling code. The code fires a request and checks the response. If the condition doesn't meet we would create a task again to extend the polling and not extend otherwise.
This is a DB, network and computationally intensive. Our new use case requires a fast response we have to reduce the delay to 0.1 and that is a lot of waste per polling.
So this is the prime example where technology achieve the same goal but not the same proficiency
So the answer is the main difference is in the goal Message Queue and Task Queue try to achieve.
Good read:
https://stackoverflow.com/a/32804602/3422861
If you think in terms of browser’s JavaScript runtime environment or Nodejs JavaScript runtime environment, the answer is:
The difference between the message queue and the micro-task queue (such as Promises is) the micro-task queue has a higher priority than the message queue, which means that Promise task inside the micro-task queue will be executed before the callbacks inside the message queue.

How to reduce flooding a Service Broker queue?

I'm new to using the SQL Service 2005 Service Broker. I've created queues and successfully got conversations going, etc. However, I want to sort of "throttle" messages, and I'm not sure how to go about that.
Messages are sent by a stored proc which is called by a multi-user application. Say 20 users cause this proc to be called once each within a 30 second period of time, it only needs to be sent once. So I think I need some way from my proc to see if a message was sent within in the last 30 seconds? Is there a way to do that?
One idea I had was to send a message to a "response" queue that indicates if the request queue activation proc has been called. Then in my stored proc (called by user app) see if that particular message has been called recently. Problem is I don't want this to mess up the response queue. Can one peek at a queue (not receive) to see if a message exists in it?
Or is there a more simple way to accomplish what I'm after?
Yes you can peek at a queue to see if a message is in it before hand. Simply query the queue using SELECT instead of RECEIVE and you can look at the data.
A better bet would be to send the messages and have the stored procedure which receives the messages decide if the message should be tossed out or not.
I send hundreds of thousands of messages to service broker at a time without any sort of performance issue.
If you are seeing performance issues then try sending more than one message per conversation as that is the quickest and easiest way to improve Service Broker performance.
Not sure if you could do this in SB somehow, but could you just have a table with a timestamp field in it that was updated when a message is sent. The proc would check for a time diff of > 30sec and send.

Resources