I have a Google App Engine application that tries to send a mail with a 379 KB attachment. The mail has two recipients: one on the "To" list and myself on the "BCC" list. Apparently, GAE treats this as two separate mails, which makes it an attempt to send 758 KB (379 × 2) of attachments, and this results in a QuotaExceededException because it exceeds the per-minute quota of roughly 500 KB/minute. While the mail reaches the recipient on the "To" list, the one on the BCC list (myself) does not receive it.
Can the Task Queue service be considered as a solution to this problem? Will the task queue framework retry transmission of the mail to recipients who did not get it whenever a QuotaExceededException occurs?
Further, I plan to extend the aforementioned code so that it sends the same mail (with attachment) to several users. This would obviously result in a QuotaExceededException if transmission to all recipients is attempted without any time gap. Can the Task Queue service help me in this case in any way?
I think that Task Queues would cover this use case nicely. In fact, the example that Google uses in its documentation of Task Queues is one in which emails are sent through them.
Two things to think about:
Google lists Task Queues as an experimental feature that may be subject to change in future releases, so if you are using this for production code, be prepared for your application's behavior to change suddenly and without warning.
You'll need to configure your queue such that it does not process emails faster than they can be sent without violating your quotas. Check out the Queue Concepts section in the documentation.
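As a rough worked example of the rate limit, using the figures from the question (a 379 KB attachment and a quota of about 500 KB per minute, which is the asker's approximate number rather than an official one), the safe processing rate can be estimated like this. The function name is made up for illustration.

```python
import math


def max_mails_per_minute(quota_kb_per_min, attachment_kb):
    """Largest whole number of mails per minute that stays within the quota."""
    return math.floor(quota_kb_per_min / attachment_kb)


# Figures taken from the question; the 500 KB/minute quota is approximate.
print(max_mails_per_minute(500, 379))  # 1
```

With these numbers the queue would have to be throttled to about one mail task per minute, since two mails (758 KB) already exceed the quota.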
Finally, have you considered hosting this large attachment as a URL and having the email contain a link to it? That'd make sending the emails much easier, and it'd be kinder to your overall bandwidth consumption, as only the recipients who really wanted it would get it.
Almost. The task queue will retry an action until it succeeds, but it will retry the whole task. AFAIK it doesn't know or remember anything about partial success. So if you just do your current action (sending to two recipients) as a task, I suspect that bad things will happen to the recipient in the To: field, as the task keeps sending them an email but failing overall, once a minute, forever...
So, you'll want to use two tasks (on the same queue): one task for each recipient.
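A minimal sketch of the one-task-per-recipient idea, in Python for illustration. The `enqueue` callable is a hypothetical stand-in for whatever call actually adds a task to the queue, and the payload keys are assumptions, not part of any App Engine API.

```python
def enqueue_mail_tasks(recipients, attachment_key, enqueue):
    """Enqueue one independent task per recipient.

    `enqueue` is a hypothetical callable standing in for the real task queue
    call; it receives a dict describing a single task.
    """
    payloads = []
    for address in recipients:
        payload = {"to": address, "attachment_key": attachment_key}
        enqueue(payload)
        payloads.append(payload)
    return payloads


queued = []
enqueue_mail_tasks(["user@example.com", "me@example.com"], "report-1", queued.append)
print(len(queued))  # 2
```

Because each task carries exactly one recipient, a retry after a quota error only resends to the recipient whose task failed.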
We're sending messages to Apache Camel using RabbitMQ.
We have a "sender" and a Camel route that processes a RabbitMQ message sent by the sender.
We're having deployment issues regarding which end of the system comes up first.
Our system is low-volume. I am sending perhaps 100 messages at a time. The point of the message is to reduce 'temporal cohesion' between a thing happening in our primary database, and logging of same to a different database. We don't want our front-end to have to wait.
The "sender" will create an exchange if it does not exist.
The issue is causing deployment issues.
Here's what I see:
If I down the sender, down Camel, delete the exchange (clean slate), start the sender, then start Camel, and send 100 messages, the system works. (I think because the sender has to be run manually for testing, the Exchange is being created by the Camel Route...)
If I clean slate, and send a message, and then up Camel afterwards, I can see the messages land in RabbitMQ (using the web tool). No queues are bound. Once I start Camel, I can see its bound queue attached to the Exchange. But the messages have been lost to time and fate; they have apparently been dropped.
If, from the current state, I send more messages, they flow properly.
I think that if the messages that got dropped were persisted, I'd be ok. What am I missing?
For me it's hard to say what exactly is wrong, but I'll try and provide some pointers.
You should set up all exchanges and queues to be durable, and the messages persistent. You should never delete any of these entities (unless they are empty and you no longer use them); think of them as tables in a database. They are your infrastructure of sorts, and just as with a database, you wouldn't want the first DB client that needs a table to be the one that creates it (this applies to your use case, at least as far as I can tell).
In the comments I mentioned flow state of the queue, but with 100 messages this will probably never happen.
Regarding message delivery: persistent or not, the broker (server) keeps messages until they are consumed and the consumer sends back an acknowledgment (in lots of APIs this is done automatically, but it's actually one of the most important concepts).
If the exchange to which the messages were published is deleted, they are gone. If the server gets killed or restarted and the messages are not persisted, again, they're gone. There may well be more scenarios in which messages get dropped (if I think of some I'll edit the answer).
If you don't have control over creating (usually called declaring in the APIs) exchanges and queues, then (aside from the fact that it's not the best thing IMHO) it can be tricky, since declaring those entities is idempotent only when the parameters match, i.e. you can't create a durable queue q1 if a non-durable queue with the same name already exists. This could also be a problem in your case, since you mention the question of which part of the system comes up first: maybe something is not declared with the same parameters on both sides...
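To illustrate declaring everything with the same parameters on both sides, here is a small Python sketch. It assumes a channel object with pika-style methods (`exchange_declare`, `queue_declare`, `queue_bind`); the exchange, queue and routing key names are invented, and `RecordingChannel` is only a stand-in so the example runs without a broker.

```python
EXCHANGE = "audit"          # hypothetical names, not from the original post
QUEUE = "audit.log"
ROUTING_KEY = "audit.log"


def declare_topology(channel):
    """Declare the exchange, queue and binding with identical, durable settings.

    `channel` is assumed to expose pika-style methods. Calling this from both
    the sender and the consumer before any publish means the queue exists and
    is bound no matter which side starts first.
    """
    channel.exchange_declare(exchange=EXCHANGE, exchange_type="direct", durable=True)
    channel.queue_declare(queue=QUEUE, durable=True)
    channel.queue_bind(queue=QUEUE, exchange=EXCHANGE, routing_key=ROUTING_KEY)


class RecordingChannel:
    """Stand-in used only to show which calls are made, and in which order."""

    def __init__(self):
        self.calls = []

    def exchange_declare(self, **kwargs):
        self.calls.append(("exchange_declare", kwargs))

    def queue_declare(self, **kwargs):
        self.calls.append(("queue_declare", kwargs))

    def queue_bind(self, **kwargs):
        self.calls.append(("queue_bind", kwargs))


channel = RecordingChannel()
declare_topology(channel)
print([name for name, _ in channel.calls])
```

If the sender only declares the exchange, a message published before any queue is bound has nowhere to be routed, which would match the dropped messages described in the question. Durable queues alone do not make messages persistent; with pika that is requested per message through `delivery_mode=2` in the message properties.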
I've developed a python app that registers information from incoming emails and saves this information to the GAE Datastore. Registering the emails works just fine. As part of the registration, emails with the same subject and recipients get a conversation ID. However, sometimes emails enter the system so fast after each other, that emails from the same conversation don't get the same ID. This happens because two emails from the same conversation are being processed at the same time and GAE doesn't see the other entry yet when running a query for this conversation.
I've been thinking of a way to prevent this, and think it would be best if the system processes only one email per user at a time (each sender has his own account). This could be done by having a push task queue that first checks if there is currently an email being processed for this user, and if so, put the new task in a pull queue from which it can be retrieved as soon as the previous task has been finished.
The big disadvantage of this is that (I think) I can't run the push queue asynchronously, which obviously is a big performance disadvantage. Any ideas on what would be a better way to set up such a process?
Apparently this was a typical race-condition. I've made use of the Transactions functionality to prevent multiple processes writing at the same time. Documentation can be found here: https://cloud.google.com/appengine/docs/python/datastore/transactions
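The following is not the datastore code itself, only an in-memory sketch of why an atomic "look up or create" step removes the race. A `threading.Lock` is swapped in for the transaction, and the class and method names are made up for illustration.

```python
import threading
import uuid


class ConversationRegistry:
    """In-memory stand-in for a transactional get-or-create."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = {}

    def get_or_create(self, subject, recipients):
        key = (subject, frozenset(recipients))
        # The lookup and the insert happen as one atomic step, which is the
        # property the datastore transaction provides in the real app.
        with self._lock:
            if key not in self._ids:
                self._ids[key] = uuid.uuid4().hex
            return self._ids[key]


registry = ConversationRegistry()
results = []
threads = [
    threading.Thread(
        target=lambda: results.append(registry.get_or_create("Hello", ["a@example.com"]))
    )
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(set(results)))  # 1
```

All twenty concurrent "emails" end up with the same conversation ID because no second lookup can run between another thread's check and its write.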
I have a Google App Engine Go application that is handling real-time notifications from a third party server. Those notifications need to be logged and processed more or less on the spot. However, the third party server has a nasty habit of sending two requests at the same time, sometimes 1 millisecond apart from one another - too fast to even make a datastore / memcache write indicating a semaphore.
I am wondering if there is a way to handle such concurrent requests neatly? Ideally I would want to put them on some stack that would be guaranteed to process items on it one at a time. Is something like this possible in GAE Golang?
Do a memcache add for the unique identifier of the message with a short timeout (doesn't actually matter). If the add succeeds, process the message.
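The pattern can be sketched as follows (in Python rather than Go, purely for illustration). `FakeCache` is a made-up stand-in that mimics the one property that matters here: `add` succeeds only if the key is not already present.

```python
import threading


class FakeCache:
    """Stand-in for a cache whose add() only succeeds if the key is absent."""

    def __init__(self):
        self._lock = threading.Lock()
        self._keys = set()

    def add(self, key):
        with self._lock:
            if key in self._keys:
                return False
            self._keys.add(key)
            return True


def handle_once(cache, message_id, process):
    """Run process(message_id) only for the first request carrying this id."""
    if cache.add("seen:" + message_id):
        process(message_id)
        return True
    return False


cache = FakeCache()
processed = []
print(handle_once(cache, "msg-42", processed.append))  # True
print(handle_once(cache, "msg-42", processed.append))  # False
```

The duplicate request loses the `add` and returns without processing, so no separate read-then-write semaphore is needed.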
I wonder what is the difference between them. Are they describing the same thing?
Is the Google App Engine Task Queue service an implementation of a message queue?
I asked a similar question in some developer community groups on Facebook. It was not about Google App Engine specifically; I asked in a more general sense, to determine the use cases for RabbitMQ versus Celery. Here are the responses I got, which I think are relevant to the topic and clarify the difference between a message queue and a task queue fairly well.
I asked:
Will it be appropriate to say that "Celery is a
QueueWrapper/QueueFramework which takes away the complexity of having
to manage the internal queueManagement/queueAdministration activities
etc"?
I understand the book language which says "Celery is a task queue" and
"RabbitMQ is a message broker". However, it seems a little confusing
as a first-time celery user because we have always known RabbitMQ to
be the 'queue'.
Please help in explaining how/what celery does in contrast with
rabbitMQ
A response I got from Abu Ashraf Masnun
Task Queue and Message Queue. RabbitMQ is a "MQ". It receives messages
and delivers messages.
Celery is a Task Queue. It receives tasks with their related data,
runs them and delivers the results.
Let's forget Celery for a moment. Let's talk about RabbitMQ. What
would we usually do? Our Django/Flask app would send a message to a
queue. We will have some workers running which will be waiting for new
messages in certain queues. When a new message arrives, it starts
working and processes the tasks.
Celery manages this entire process beautifully. We no longer need to
learn or worry about the details of AMQP or RabbitMQ. We can use Redis
or even a database (MySQL for example) as a message broker. Celery
allows us to define "Tasks" with our worker codes. When we need to do
something in the background (or even foreground), we can just call
this task (for instant execution) or schedule this task for delayed
processing. Celery would handle the message passing and running the
tasks. It would launch workers which would know how to run your
defined tasks and store the results. So you can later query the task
result or even task progress when needed.
You can use Celery as an alternative for cron job too (though I don't
really like it)!
Another response I got from Juan Francisco Calderon Zumba
My understanding is that celery is just a very high level of
abstraction to implement the producer / consumer of events. It takes
out several painful things you need to do to work for example with
rabbitmq. Celery itself is not the queue. The events queues are stored
in the system of your choice, celery helps you to work with such
events without having to write the producer / consumer from scratch.
Eventually, here is what I took home as my final learning:
Celery is a queue Wrapper/Framework which takes away the complexity of
having to manage the underlying AMQP mechanisms/architecture that come
with operating RabbitMQ directly
GAE's Task Queues are a means for allowing an application to do background processing, and they are not going to serve the same purpose as a Message Queue. They are very different things that serve different functions.
A Message Queue is a mechanism for sharing information, between processes, threads, systems.
An AppEngine task Queue is a way for an AppEngine application to say to itself, I need to do this, but I am going to do it later, outside of the context of a client request.
Might differ depending on the context, but below is my understanding:
Message queue
Message queue is the message broker part - a queue data structure implementation, where you can:
Enqueue/produce/push/send (different terms depending on the platform, but refers to the same thing) message to.
Dequeue/consume/pull/receive message from.
Provides FIFO ordering.
Task queue
Task queue, on the other hand, is to process tasks:
At a desired pace - how many tasks can your system handle at the same time? Perhaps determined by the number of CPU cores on your machine, or if you're on Kubernetes, number of nodes and their size. It's about concurrency control, or the less-cool term, "buffering".
In an async way - non-blocking task processing. Processes tasks in the background, so your main process can go do other stuff after kicking off a task. A server API over HTTP is a popular use case, where you want to respond quickly to the client because an HTTP request usually has a short timeout (<= 30s), especially when your API is triggered by an end user (humans are impatient). If your task takes longer than a few seconds, you should consider moving it to the background and giving an API response like "OK I received your request, I'll process it when I have time".
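The two properties listed above (a bounded pace and background execution) can be sketched with the Python standard library alone. This is a toy illustration, not how Celery or any real task queue is implemented; `run_tasks` and its parameters are names chosen for the example.

```python
import queue
import threading


def run_tasks(tasks, concurrency):
    """Run callables in the background with at most `concurrency` workers."""
    pending = queue.Queue()          # the "message queue" part: a FIFO buffer
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = pending.get()
            if item is None:         # sentinel: no more work for this worker
                return
            task_id, func = item
            value = func()
            with lock:
                results[task_id] = value

    workers = [threading.Thread(target=worker) for _ in range(concurrency)]
    for w in workers:
        w.start()
    for task_id, func in enumerate(tasks):
        pending.put((task_id, func))
    for _ in workers:
        pending.put(None)            # one sentinel per worker
    for w in workers:
        w.join()
    return results


squares = run_tasks([lambda n=n: n * n for n in range(5)], concurrency=2)
print(sorted(squares.items()))  # [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
```

The `queue.Queue` only stores and hands out items; the worker pool around it is what decides how many tasks run at once and where the results go.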
Their difference
As you can see, message queue and task queue focus on different aspects, they can overlap, but not necessarily.
An example for task queue but not message queue - if your tasks don't care about ordering - each task does not depend on one another - then you don't need a "queue", FIFO data structure. You can, but you don't have to. You just need a place to store the buffered tasks like a pool, a simple SQL/NoSQL database or even S3 might suffice.
An opposite example is push notification. You use message queue but not necessarily task queue. Server generates events/notifications and wants to deliver them to the client. The server will push notifications in the queue. The client consumes/pulls down notifications from the queue when they are ready to do so. Products like GCP PubSub, AWS SNS can be used for this.
Takeaway
A task queue is usually more complicated than a message queue because of the concurrency control, not to mention horizontal scaling, such as distributing workers across nodes to optimize concurrency.
Tools like Celery are task queue + message queue baked into one. As far as I know there aren't many tools like Celery that do both, which I guess is why it's so popular (alternatives are Bull or Bee in NodeJS, or if you know more please let me know!).
My company recently had to implement a task queue. While googling for the proper tool these two terms confused me a lot, because I kind of knew what I wanted, but didn't know what people call it or what keyword I should search for.
I personally haven't used AppEngine much so I cannot answer that part, but you can always check it against the points above to see if it satisfies the requirements.
If we only talk about functionality, it would be hard to discern the difference.
In my company, we tried and failed miserably because we confused the two.
We created our worker queue (aka task queue aka scheduler aka cron) and used it for long polling. We scheduled a task 5 seconds into the future (delay) to trigger the polling code. The code fires a request and checks the response. If the condition isn't met, we create a task again to extend the polling; otherwise we stop.
This is DB, network and computationally intensive. Our new use case requires a fast response, so we had to reduce the delay to 0.1 seconds, and that is a lot of waste per poll.
So this is a prime example where two technologies can achieve the same goal, but not with the same efficiency.
So the answer is that the main difference lies in the goal that a Message Queue and a Task Queue each try to achieve.
Good read:
https://stackoverflow.com/a/32804602/3422861
If you think in terms of browser’s JavaScript runtime environment or Nodejs JavaScript runtime environment, the answer is:
The difference between the message queue and the micro-task queue (which is where Promise callbacks go) is that the micro-task queue has a higher priority than the message queue, which means that Promise tasks inside the micro-task queue will be executed before the callbacks inside the message queue.
Google App Engine seems to have recently made a huge decrease in the free quota for channel creation, from 8640 to 100 per day. I would appreciate some suggestions for optimizing channel creation, for a hobby project where I am unwilling to use the paid plans.
It is specifically mentioned in the docs that there can be only one client per channel ID. It would help if there were a way around this, even if it were only for multiple clients on one computer (such as multiple tabs)
It occurred to me I might be able to simulate channel functionality by repeatedly sending XHR requests to the server to check for new messages, therefore bypassing limits. However, I fear this method might be too slow. Are there any existing libraries that work on this principle?
One Client per Channel
There's not an easy way around the one client per channel ID limitation, unfortunately. We actually allow two, but this is to handle the case where a user refreshes his page, not for actual fan-out.
That said, you could certainly implement your own workaround for this. One trick I've seen is to use cookies to communicate between browser tabs. Then you can elect one tab the "owner" of the channel and fan out data via cookies. See this question for info on how to implement the inter-tab communication: Javascript communication between browser tabs/windows
Polling vs. Channel
You could poll instead of using the Channel API if you're willing to accept some performance trade-offs. Channel API delivery speed is on the order of 100-200ms; if you could accept 500ms on average then you could poll every second. Depending on the type of data you're sending, and how much you can fit in memcache, this might be a workable solution. My guess is your biggest problem is going to be instance-hours.
For example, if you have, say, 100 clients each polling once per second, you'll be looking at 100 qps. You should experiment and see if you can serve 100 requests in a second for the data you need to serve without spinning up a second instance. If not, keep increasing your latency (i.e., decreasing your polling frequency) until you get to 1 instance able to serve your requests.
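The arithmetic behind those numbers, as a small Python sketch. It assumes every client polls at the same fixed interval and that new messages arrive at random points within that interval, so the average wait is half the interval (network time is ignored). The function name is made up for the example.

```python
def polling_load(clients, interval_seconds):
    """Return (requests per second, average delivery delay in seconds)."""
    qps = clients / interval_seconds
    average_delay = interval_seconds / 2
    return qps, average_delay


print(polling_load(100, 1))  # (100.0, 0.5)
print(polling_load(100, 5))  # (20.0, 2.5)
```

Stretching the interval from 1 to 5 seconds cuts the load to a fifth, at the cost of a 2.5 second average delay.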
Hope that helps.