Message Queue vs Task Queue difference - google-app-engine

What is the difference between them? Do they describe the same thing?
Is the Google App Engine Task Queue service an implementation of a message queue?

I asked a similar question in some developer community groups on Facebook. It was not about Google App Engine specifically; I asked in a more general sense, to determine the use cases for RabbitMQ versus Celery. Here are the responses I got, which I think are relevant to the topic and clarify the difference between a message queue and a task queue fairly well.
I asked:
Will it be appropriate to say that "Celery is a
QueueWrapper/QueueFramework which takes away the complexity of having
to manage the internal queueManagement/queueAdministration activities
etc"?
I understand the book language which says "Celery is a task queue" and
"RabbitMQ is a message broker". However, it seems a little confusing
as a first-time celery user because we have always known RabbitMQ to
be the 'queue'.
Please help explain what Celery does in contrast with
RabbitMQ.
A response I got from Abu Ashraf Masnun
Task Queue and Message Queue. RabbitMQ is a "MQ". It receives messages
and delivers messages.
Celery is a Task Queue. It receives tasks with their related data,
runs them and delivers the results.
Let's forget Celery for a moment. Let's talk about RabbitMQ. What
would we usually do? Our Django/Flask app would send a message to a
queue. We will have some workers running which will be waiting for new
messages in certain queues. When a new message arrives, it starts
working and processes the tasks.
Celery manages this entire process beautifully. We no longer need to
learn or worry about the details of AMQP or RabbitMQ. We can use Redis
or even a database (MySQL for example) as a message broker. Celery
allows us to define "Tasks" with our worker codes. When we need to do
something in the background (or even foreground), we can just call
this task (for instant execution) or schedule this task for delayed
processing. Celery would handle the message passing and running the
tasks. It would launch workers which would know how to run your
defined tasks and store the results. So you can later query the task
result or even task progress when needed.
You can use Celery as an alternative for cron job too (though I don't
really like it)!
Another response I got from Juan Francisco Calderon Zumba
My understanding is that celery is just a very high level of
abstraction to implement the producer / consumer of events. It takes
out several painful things you need to do to work for example with
rabbitmq. Celery itself is not the queue. The events queues are stored
in the system of your choice, celery helps you to work with such
events without having to write the producer / consumer from scratch.
Eventually, here is what I took home as my final learning:
Celery is a queue Wrapper/Framework which takes away the complexity of
having to manage the underlying AMQP mechanisms/architecture that come
with operating RabbitMQ directly
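To make the distinction in the quoted answers concrete, here is a minimal sketch of a "task queue" layered on top of a plain message queue. It uses only the Python standard library; `queue.Queue` stands in for the broker, and `TASKS`, `task`, `send_task` and `run_worker_once` are hypothetical names made up for this illustration. This is not how Celery is implemented, only the general idea of messages that name a task and a worker that runs it and stores the result.

```python
import json
import queue

# Hypothetical task registry: maps a task name to the function that runs it.
TASKS = {}

def task(func):
    """Register a function so a worker can look it up by name."""
    TASKS[func.__name__] = func
    return func

@task
def add(x, y):
    return x + y

broker = queue.Queue()   # stands in for the message broker (e.g. RabbitMQ)
results = {}             # stands in for a result backend

def send_task(task_id, name, *args):
    """Producer side: serialize the task call into a plain message."""
    broker.put(json.dumps({"id": task_id, "task": name, "args": args}))

def run_worker_once():
    """Consumer side: take one message, run the named task, store the result."""
    message = json.loads(broker.get())
    func = TASKS[message["task"]]
    results[message["id"]] = func(*message["args"])
    broker.task_done()

send_task("job-1", "add", 2, 3)
run_worker_once()
print(results["job-1"])
```

The broker only ever sees opaque messages; everything that knows about tasks, arguments and results lives in the layer around it.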

GAE's Task Queues are a means for allowing an application to do background processing, and they are not going to serve the same purpose as a Message Queue. They are very different things that serve different functions.
A Message Queue is a mechanism for sharing information, between processes, threads, systems.
An AppEngine task Queue is a way for an AppEngine application to say to itself, I need to do this, but I am going to do it later, outside of the context of a client request.

Might differ depending on the context, but below is my understanding:
Message queue
Message queue is the message broker part - a queue data structure implementation, where you can:
Enqueue/produce/push/send (different terms depending on the platform, but refers to the same thing) message to.
Dequeue/consume/pull/receive message from.
Provides FIFO ordering.
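As a rough illustration of those three points, an in-process FIFO queue from the Python standard library behaves the same way. A real broker adds networking and persistence on top, which this sketch ignores.

```python
import queue

mq = queue.Queue()  # queue.Queue is FIFO by default

# Enqueue / produce / push / send
for text in ["first", "second", "third"]:
    mq.put(text)

# Dequeue / consume / pull / receive
received = []
while not mq.empty():
    received.append(mq.get())

print(received)
```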
Task queue
Task queue, on the other hand, is to process tasks:
At a desired pace - how many tasks can your system handle at the same time? Perhaps determined by the number of CPU cores on your machine, or if you're on Kubernetes, number of nodes and their size. It's about concurrency control, or the less-cool term, "buffering".
In an async way - non-blocking task processing. Tasks are processed in the background, so your main process can go do other stuff after kicking off a task. A server API over HTTP is a popular use case, where you want to respond quickly to the client because an HTTP request usually has a short timeout (<= 30s), especially when your API is triggered by an end user (humans are impatient). If your task takes longer than a few seconds, you should consider moving it to the background and giving an API response like "OK, I received your request, I'll process it when I have time".
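Both points can be sketched with a bounded worker pool from the standard library. This is only an in-process illustration; `MAX_WORKERS`, `slow_task` and the `stats` dictionary are names invented for the example, and the sleep stands in for real work.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 2                      # the "desired pace": at most 2 tasks at once
stats = {"active": 0, "peak": 0}     # tracks how many tasks run concurrently
lock = threading.Lock()

def slow_task(n):
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.05)                 # stand-in for slow work
    with lock:
        stats["active"] -= 1
    return n * n

with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    # submit() returns immediately, so the caller is not blocked by the work
    futures = [pool.submit(slow_task, n) for n in range(6)]
    squares = [f.result() for f in futures]

print(squares, stats["peak"])
```

Six tasks are accepted right away, but no more than `MAX_WORKERS` of them ever run at the same time; the rest wait in the pool's internal buffer.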
Their difference
As you can see, a message queue and a task queue focus on different aspects. They can overlap, but they don't have to.
An example for task queue but not message queue - if your tasks don't care about ordering - each task does not depend on one another - then you don't need a "queue", FIFO data structure. You can, but you don't have to. You just need a place to store the buffered tasks like a pool, a simple SQL/NoSQL database or even S3 might suffice.
An opposite example is push notification. You use message queue but not necessarily task queue. Server generates events/notifications and wants to deliver them to the client. The server will push notifications in the queue. The client consumes/pulls down notifications from the queue when they are ready to do so. Products like GCP PubSub, AWS SNS can be used for this.
Takeaway
A task queue is usually more complicated than a message queue because of the concurrency control, not to mention horizontal scaling, such as distributing workers across nodes to optimize concurrency.
Tools like Celery are task queue + message queue baked into one. As far as I know, there aren't many tools like Celery that do both, which I guess is why it's so popular (alternatives are Bull or Bee in NodeJS; if you know more, please let me know!).
My company recently had to implement a task queue. While googling for the proper tool, these two terms confused me a lot, because I kind of knew what I wanted, but didn't know what people call it or what keyword I should search for.
I personally haven't used AppEngine much so cannot answer that, but you can always check for the points above to see if it satisfies the requirements.

If we only talk about functionality, it is hard to discern the difference.
At my company, we tried and failed miserably because we confused the two.
We created our own worker queue (aka task queue aka scheduler aka cron)
and used it for long polling. We scheduled a task 5 seconds into the future (the delay) to trigger the polling code. The code fires a request and checks the response. If the condition is not met, we create another task to continue polling; otherwise we stop.
This is DB-, network- and computationally intensive. Our new use case requires a fast response, so we had to reduce the delay to 0.1 seconds, and that is a lot of waste per poll.
So this is a prime example of two technologies that can achieve the same goal, but not with the same efficiency.
So the answer is that the main difference lies in the goal that a message queue and a task queue each try to achieve.
Good read:
https://stackoverflow.com/a/32804602/3422861

If you think in terms of browser’s JavaScript runtime environment or Nodejs JavaScript runtime environment, the answer is:
The difference between the message queue and the micro-task queue (which holds, for example, Promise callbacks) is that the micro-task queue has a higher priority than the message queue, which means that Promise tasks in the micro-task queue will be executed before the callbacks in the message queue.

Related

Any simple way that concurrent consumers can share data in Camel?

We have a typical scenario in which we have to load-balance a set of cloned consumer applications, each running in a different physical server. Here, we should be able to dynamically add more servers for scalability.
We were thinking of using round-robin load balancing here. But we don't want a long-running job on a server to cause a message to wait in its queue for consumption.
To solve this, we thought of having 2 concurrentConsumers configured for each of the server application. When an older message is processed by a thread and a new message arrives, the latter will be consumed from the queue by the second thread. While processing the new message, the second thread has to check a class (global) variable shared by the threads. If 'ON', it can assume that one thread is active (ie. job is already in progress). In that case, it re-routes the message back to its source queue. But if the class variable is 'OFF', it can start the job with the message data.
The jobs are themselves heavyweight and so we want only one job to be processed at a time. That's why the second thread re-routes the message, if another thread is active.
So, the question is 'Any simple way that concurrent consumers can share data in Camel?'. Or, can we solve this problem in an entirely different way?
For a JMS broker like ActiveMQ you should be able to simply use concurrent listeners on the same queue. It should do round robin, but only among the consumers that are idle. So basically this should just work. You may also have to set the prefetch size to 1, as prefetching might cause a consumer to take messages even while a long-running job is blocking it.

In AWS or Azure cloud architecture, why would I choose message queues over polling a database?

In the past I have used message queues to handle spikes in demand. This system works fine, except for logging purposes. I write successfully processed messages to a database for reporting and logging. This makes me wonder why I don't just write the message into a database from the beginning, and have my "worker roles" poll the database, rather than the message queue.
I'm guessing this is not the best design because as the database grows, polling a huge database just to look for one "unchecked" record to process will become very slow, whereas a message queue just gives me one if I ask for it instantaneously.
Am I missing something? Are there other reasons to choose a message queue over polling a database? I would love to offer users the ability to see what has yet to be processed (floating in the queue) but that operation takes much longer than running a query on the database, so it seems to be a tradeoff.
Thanks for any input.
One other reason that springs to mind is blocking/locking. Typically, if you just poll a database looking for work, it'll work reasonably well as long as you have only one worker digesting the messages. However, if you want to scale out horizontally and throw more workers at the problem, you'll typically end up causing lock escalations as you change the work messages in your database-based "queue" from "needs to get run" to "ran successfully" or whatever.
Using the message queue takes care of this trickiness for you, as all the thread safety and locking/blocking is out of the way.
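To show the kind of trickiness being described, here is a sketch of what each worker has to do to claim a job safely from a database-based "queue". It uses an in-memory SQLite database; the `jobs` table, its columns and `claim_next_job` are assumptions made up for this example. The conditional `UPDATE` is a compare-and-set: a worker only owns the job if its update actually changed the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO jobs (payload, status) VALUES (?, 'pending')",
    [("a",), ("b",)],
)
conn.commit()

def claim_next_job(conn):
    """Try to claim the oldest pending job; return (id, payload) or None."""
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' "
        "ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # nothing left to do
    job_id, payload = row
    # Only succeeds if nobody else claimed the job since the SELECT above.
    cur = conn.execute(
        "UPDATE jobs SET status = 'running' WHERE id = ? AND status = 'pending'",
        (job_id,),
    )
    conn.commit()
    return (job_id, payload) if cur.rowcount == 1 else None

first = claim_next_job(conn)
second = claim_next_job(conn)
third = claim_next_job(conn)
print(first, second, third)
```

Every worker repeats this select-then-update dance on the same rows, which is where the contention comes from as you add workers. A message queue hands each message to one consumer without the application writing any of this.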

Can I block on a Google AppEngine Pull Task Queue until a Task is available?

Can I block on a Google AppEngine Pull Task Queue until a Task is available? Or, do I need to poll an empty queue until a task is available?
You need to poll the queue. A typical use case for pull queues is to have multiple backends, each obtaining one thousand tasks at a time.
For use cases where there are no tasks in the queue for hours at a time, push queues can be a better fit.
Not 100% sure about your question, but thought to try an answer. Having a pull task queue started by a cron may apply. Saves the expense of running a backend. I have client-side log data that needs to be serialized and stored. On-line handler simply passes the client data to a task pull queue. Cron fires up the task every minute, and up to 10k log items get serialized and stored each run. (Change settings according to your loads -- these more than meet my modest needs.) In this case, the queue acts as a buffer, and load spikes get spread across even processing units. Obviously not useful if you want quick access to the TQ data or have wildly unpredictable loads. Very importantly the log data serialization cuts data writes by a factor of 1,000. May not apply to your question, so I'll end with a big HTH. -stevep
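The buffering pattern in that answer can be sketched roughly as follows. This is plain Python and not the App Engine pull queue API; `pending`, `stored_batches` and `drain_batch` are hypothetical stand-ins for the queue, the datastore and the cron-triggered handler.

```python
from collections import deque

pending = deque(f"log-{i}" for i in range(25))  # items added by request handlers
stored_batches = []                             # stands in for the datastore

def drain_batch(tasks, max_tasks):
    """One cron run: take up to max_tasks items and store them in one write."""
    batch = []
    while tasks and len(batch) < max_tasks:
        batch.append(tasks.popleft())
    if batch:
        stored_batches.append(batch)            # one write per batch, not per item
    return len(batch)

# Four simulated cron runs with a batch limit of 10
runs = [drain_batch(pending, 10) for _ in range(4)]
print(runs, len(stored_batches))
```

Twenty-five items end up as three writes, and a run that finds the queue empty simply does nothing.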

How to process database writes asynchronously (maybe with a message queue) from Django?

After a user submitted data to my app, I'd like to write to the
database asynchronously, possibly through a message queue.
How do I set up such a system? Are there any pluggable Django apps
that do such message queue-based database writes?
Also, how do I handle errors that happen during the async processing?
Would really appreciate any pointers you can give me. Thank you.
Use Celery as the queue mechanism, with a processor on the back end. It's one of the simpler setups, and very effective. You can back it with persistence, or not, as you need. There's also a good walkthrough on setting it up with Django on the website. Typically you'll run a queue processor as a daemon, import the model bits from Django if you're using those, and do the updates/inserts/etc. as you need.
The documentation includes an example of processing a serial task that you can use as a template.
You could take a look at Celery with RabbitMQ or another ghetto queue.

google app engine - is task queue a solution for QuotaExceededException?

I have a google app engine code that tries to send a mail with an attachment of size 379KB. The mail has two recipients - one on the "To" list and myself on the "BCC" list. Apparently, GAE is treating this as 2 different mails which makes it an attempt to send mails with attachment size 758KB(379*2) and is resulting in QuotaExceededException as it exceeds the per minute quota of 500 odd KB/minute. While the mail reaches the recipient on the "To" list, the one on the Bcc (myself) is not receiving the mail.
Can the task queue service be considered as a solution to this problem? Will the task queue framework retry transmission of the mail to recipients who did not get it whenever a QuotaExceededException occurs?
Further, I plan to extend the aforementioned code so that it sends the same mail (with attachment) to several users. This would obviously result in a QuotaExceededException if transmission to all recipients is attempted without any time gap. Can the task queue service help me in this case in any way?
I think that Task Queues would cover this use case nicely. In fact, the example that Google uses in its documentation of Task Queues is one in which emails are sent through them.
Two things to think about:
Google lists Task Queues as an experimental feature that may be subject to change in future releases, so if you are using this for production code, be prepared for your application's behavior to change suddenly and without warning.
You'll need to configure your queue such that it does not process emails faster than they can be sent without violating your quotas. Check out the Queue Concepts section in the documentation.
Finally, have you considered hosting this large attachment as a URL and having the email contain a link to it? That'd make sending the emails much easier, and it'd be kinder to your overall bandwidth consumption, as only the recipients who really wanted it would get it.
Almost. The task queue will retry an action until it succeeds, but it will retry the whole task. AFAIK it doesn't know or remember anything about partial success. So if you just do your current action (sending to two recipients) as a task, I suspect that bad things will happen to the recipient in the To: field, as the task keeps sending them an email but failing overall, once a minute, forever...
So, you'll want to use two tasks (on the same queue): one task for each recipient.
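Here is a small simulation of why one task per recipient behaves better. It is plain Python, not the App Engine Task Queue or Mail API: `send_mail`, `QuotaExceeded` and the one-mail-per-window `budget` are all made up for the illustration, and the retry loop stands in for the queue's automatic retries.

```python
from collections import deque

class QuotaExceeded(Exception):
    """Stand-in for the quota error raised by the mail service."""

sent = []                      # every successful delivery, in order
budget = {"remaining": 1}      # hypothetical quota: one mail per window

def send_mail(recipient):
    if budget["remaining"] == 0:
        raise QuotaExceeded(recipient)
    budget["remaining"] -= 1
    sent.append(recipient)

# One task per recipient, all on the same queue
tasks = deque(["to@example.com", "bcc@example.com"])

while tasks:
    recipient = tasks.popleft()
    try:
        send_mail(recipient)
    except QuotaExceeded:
        tasks.append(recipient)    # retry only the recipient that failed
        budget["remaining"] = 1    # simulate the quota window resetting

print(sent)
```

Because each recipient is its own task, the retry after the quota error only resends the mail that failed, and the "To" recipient gets exactly one copy.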
