I started a MapReduce job on Google App Engine but I could not stop it. It just keeps running and I cannot find a way to kill it. Following advice from previously asked questions, I deleted all tasks from the default queue and even paused the queue, but the mapper callback function is still running and eating my entire daily free quota within an hour. GAE is just frustrating me...
Are you sure the MR job is running on the default task queue (the queue is configurable)? MapReduce uses the task queue to invoke itself, so if there are no tasks in the queue (and nothing is currently executing) the job can't proceed. Each task reschedules itself, so if a task was running while you were clearing the queue, it may have re-enqueued itself after you cleared. Pausing the queue, waiting a bit (to make sure nothing is still running), and then clearing it should do the job.
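If clearing from the console keeps missing re-enqueued tasks, the purge can also be done in code. A minimal sketch in the legacy Python runtime, assuming the job really is on the default queue (pause the queue in the console first and wait for in-flight tasks to finish):

from google.appengine.api import taskqueue

# Remove every remaining task from the queue; tasks already dispatched
# to an instance will still run to completion.
taskqueue.Queue('default').purge()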
Our App Engine app, written in Python, conditionally reads from a BigQuery table and writes to another BigQuery table (the source table).
The app is triggered by a Cloud Scheduler job every 15 minutes.
Occasionally multiple Cloud Scheduler runs overlap, which causes duplicates in the source table.
How do we overcome this?
We expect Cloud Scheduler to run the job one at a time.
It seems what you want is for a job not to run (or to be paused) while another is still running. If that summary is correct, here is something you could consider...
When the job starts, check the database for a flag. If the flag isn't there, set it and let the job run; when the job finishes, delete the flag.
Fifteen minutes later, when the next job tries to start, it checks for the flag. If the flag is there, the job can't run yet; you can sleep for X seconds/minutes before retrying (you'll have to work out a back-off strategy). If the flag isn't there, the job runs.
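A minimal sketch of that flag pattern, assuming the app can use Datastore through the Cloud NDB library; the JobLock entity, the key id, and the run_bigquery_job() helper are all hypothetical names, not from the original post:

from google.cloud import ndb

client = ndb.Client()

class JobLock(ndb.Model):
    running = ndb.BooleanProperty(default=False)

LOCK_KEY_ID = "bigquery-sync"  # hypothetical job identifier

@ndb.transactional()
def try_acquire_lock():
    lock = JobLock.get_by_id(LOCK_KEY_ID) or JobLock(id=LOCK_KEY_ID)
    if lock.running:
        return False  # another run is still in progress
    lock.running = True
    lock.put()
    return True

@ndb.transactional()
def release_lock():
    lock = JobLock.get_by_id(LOCK_KEY_ID)
    if lock:
        lock.running = False
        lock.put()

def handle_scheduled_request():
    # Entry point hit by Cloud Scheduler every 15 minutes.
    with client.context():
        if not try_acquire_lock():
            return  # previous run still in progress: skip this one
        try:
            run_bigquery_job()  # hypothetical: the existing read/write logic
        finally:
            release_lock()  # clear the flag even if the job raised

The transaction matters: checking and setting the flag in one atomic step prevents two overlapping runs from both seeing "no flag" and both starting.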
I have a cron job which runs every 30 minutes and queues a task to be executed on a dynamic backend (B2).
The backend loops and does some work, then sleeps for a few minutes, and repeats the cycle until the complete job is done after a few hours, at which point the backend shuts down. (While the backend is running, no new task is actioned.)
Two days in a row now, I have seen my backend stop abruptly (after about 1.5 hours) with the familiar "Process terminated because the backend took too long to shutdown." message. I have searched through the forums but could not identify WHY exactly my backend shuts down (apart from the theoretical list of reasons the App Engine docs provide). I have checked my datastore/memcache operations and memory, and all looks normal. I upgraded my backend from B1 to B2, but no luck.
Q1. Does anybody know how to debug this issue further?
Q2. Even after this, I want the job to run to completion. If I register a shutdown hook with LifecycleManager.getInstance().setShutdownHook(), what is a good way to ensure that the job is resumed (considering that the next cron run could still be 29 minutes away, and I want the job to do its work every 2 minutes)?
Yes, the same has happened to me. I have a backend that uses constant memory and CPU. App Engine shuts it down periodically, usually after 15 minutes but sometimes sooner. The docs say it may get shut down without explanation: App Engine will notify the backend and then shut it down.
You are supposed to handle this gracefully, which means the backend should work in chunks and be able to restart where it left off. If you can't divide the work into chunks, don't use backends; use a Compute Engine instance.
For your first question, you'd have to take a closer look at the logs. App Engine does promise to signal shutdown through a request to /_ah/stop, so that should give more insight into the issue.
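A hedged sketch of catching that stop signal and checkpointing, written against the legacy Python backends runtime API (the Java LifecycleManager hook works analogously); save_checkpoint(), enqueue_resume_task(), and do_two_minutes_of_work() are hypothetical helpers:

from google.appengine.api import runtime

def shutdown_hook():
    # Invoked when the instance receives /_ah/stop; finish up quickly.
    save_checkpoint()        # persist progress so a fresh instance can resume
    enqueue_resume_task()    # re-add a task instead of waiting for the cron

runtime.set_shutdown_hook(shutdown_hook)

def work_loop():
    while not runtime.is_shutting_down():
        do_two_minutes_of_work()  # one resumable chunk of the job

Re-enqueuing a task from the hook is what addresses the "29 minutes away" concern: the replacement instance picks the task up immediately rather than waiting for the next cron run.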
As for your second question, stick with App Engine's suggestion of having more than one instance. In your case, you could move away from looping over some entity indefinitely and then sleeping. Instead, have a cron job that looks at a task queue and processes a single task. If the task is processed successfully, mark it as such somewhere, or simply remove it from the queue once you're done processing it. That way, if a failure occurs, the task is still available to be processed (unless it's marked successful), and your additional instances can take over.
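One way to realize that pattern is with a pull queue, where a task survives until it is explicitly deleted. A minimal sketch; the queue name (which would need mode: pull in queue.yaml) and the process() helper are assumptions:

from google.appengine.api import taskqueue

def cron_handler():
    queue = taskqueue.Queue('work-queue')
    # Lease one task for 120 seconds; a crashed worker's lease simply
    # expires and the task becomes available again.
    tasks = queue.lease_tasks(lease_seconds=120, max_tasks=1)
    for task in tasks:
        process(task.payload)     # hypothetical: one chunk of the job
        queue.delete_tasks(task)  # only delete after successful processing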
I'm seeing very strange behavior in some code that checks the QueueStatistics for a queue to see if any tasks are currently running. To the best of my knowledge there are NO tasks running, and none have been queued up for the past 12+ hours. The development console corroborates this, saying that there are 0 tasks in the queue.
Looking at the QueueStatistics information in my debugger, though, confirms that my process is exiting because it sees on the order of 500+ (!!!) tasks in the queue. It also claims the queue ran >1000 tasks in the past minute, yet 0 tasks in the past hour. If I parse the ETA usec value, it "accurately" shows ETAs within the minute after the QueueStatistics were pulled.
This is happening repeatedly whenever I re-run my servlet, and the first thing the servlet does is check the queue statistics. No other servlets, tasks, or cron jobs are running as this is my local development server. Yet the queue statistics continue to insist I've got hundreds of tasks running.
I couldn't find any other reports of this behavior, but it feels like I must be missing something major with regard to queue statistics. The code I'm using is very simple:
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.QueueStatistics;

Queue taskQueue = QueueFactory.getQueue("myQueue");
QueueStatistics stats = taskQueue.fetchStatistics();
if (stats.getNumTasks() > 0) { return; }  // skip this run if tasks appear pending
What am I missing? Are queue statistics entirely unreliable on the local dev server?
If it works as expected when deployed then that's the standard to go by.
Lots of things don't work the way they do in the deployed environment (parallel threads are not actually parallel, and backend support is somewhat broken for addressing them at the time of writing), so deploy, deploy, deploy!
Another example is the Channel API. When used locally it falls back to polling; you'll see hundreds of those requests if you look in the logs/browser debugger. But when deployed, all is well and it works as expected.
We're interested in using a push queue in GAE, but one thing I can't find documented is the recovery window in the event of queue or App Engine downtime.
For example, I have a push queue with a number of tasks on it. Some of these tasks get pulled off and are executing. Suppose the queue now goes down (for whatever reason) while these tasks are executing, and then comes back up. What is the time window for the restoration of the queue? Is there a set recovery window?
The concern is that tasks which were pulled off the queue and were executing could reappear on the queue and execute again because of the restoration window.
We have idempotence considerations in our code, but it would be good to know whether there are recovery-window strategies for GAE queue downtime.
If I understand your question correctly, you're worried that queues can go down in the sense that knowledge of execution completion is lost for a particular ETA range, and those tasks have to be re-executed.
This is not the way the GAE task queue system works. We track execution on a task-by-task basis. (We have to, because tasks need not be dispatched in strict ETA order.) The queue doesn't "go down" in the sense you're referring to.
An individual task might execute successfully twice under the current system. When this happens (and it is very rare), there should be at least a minute between successive executions.
There is no time-window recovery strategy you need to consider.
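For the rare duplicate execution mentioned above, one common idempotence guard is to record completed task ids. A hedged sketch using the legacy ndb library; the CompletedTask entity and do_work() are illustrative names, not from the original:

from google.appengine.ext import ndb

class CompletedTask(ndb.Model):
    pass  # the entity's existence marks the task as done

@ndb.transactional
def run_once(task_id, do_work):
    if CompletedTask.get_by_id(task_id):
        return  # duplicate delivery: already executed, so do nothing
    do_work()  # the task's real work (keep side effects transaction-safe)
    CompletedTask(id=task_id).put()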
I have an app running on a backend instance. It has 11 tasks. The first one is started by /_ah/start and it, in turn, starts the other ten, the worker tasks. The worker tasks have this structure:
import time

done = False
while not done:
    do_important_stuff()  # mostly a URL fetch
    time.sleep(30)        # wait 30 seconds between iterations
    if a_long_time_has_passed():
        done = True
The execution behavior on App Engine is the same every time. The scheduling task runs and enqueues the 10 worker tasks. The first seven worker tasks start running and execute correctly. The last three sit in the queue, never running. The task queue admin console shows all ten tasks in the queue, with seven of them running.
The app also stops responding to HTTP requests, returning 503 status codes, and the logs don't show my HTTP handlers being invoked.
The worker task queue is configured with a maximum rate of 1/s and 2 buckets. It's curious that the admin console shows the enforced rate as 0.1 sec. Since the tasks run forever, they aren't returning unsuccessful completion status codes, and the CPU load is negligible. The workers mostly do a URL fetch and then wait 30 seconds before doing it again.
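For reference, that configuration corresponds to a queue.yaml entry roughly like this (the queue name here is an assumption):

queue:
- name: worker-queue
  rate: 1/s
  bucket_size: 2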
The logs are not helpful, and I don't know where to find diagnostics that would help me figure this out. I'm testing on a free account. Could there be a limit of 8 tasks executing at one time? I see nothing like that in the documentation, but I've run out of ideas. Eventually, I'd like to run even more tasks in parallel.
Thanks for any advice you can give me.
There's a limit to how many simultaneous requests a backend instance will process, and it sounds like you're running into that limit.
Alternatives include:
Use regular task queues rather than ones against a backend
Start more than one instance of your backend
Use threading to start threads yourself from the start request, rather than relying on the task queue to do it for you (a sketch follows below)
Note that if your tasks are CPU bound, you're not going to get any extra benefit from running 10 of them over 5, 2, or maybe even 1.
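If you go with the threading option, a hedged sketch on the legacy Python backends runtime; worker() and the thread count are assumptions:

from google.appengine.api import background_thread

NUM_WORKERS = 10

def start(request):
    # The /_ah/start handler: spawn the workers as background threads
    # instead of enqueuing ten long-running tasks.
    for i in range(NUM_WORKERS):
        background_thread.start_new_background_thread(worker, [i])

def worker(worker_id):
    # Same loop as before: fetch a URL, sleep 30 seconds, repeat until done.
    ...

Because the threads all live inside one backend instance, this sidesteps the simultaneous-request limit, though per-instance background thread limits still apply.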