GAE: Queues, Quotas and backend instances - google-app-engine

I have a queue with a lot of tasks in it. I would like to use one backend instance to process this queue. My quota info tells me I have blown my budget on hundreds of frontend instance hours and have not used any backend instance hours. As I had configured only one backend instance, I was expecting to be charged no more than 1 (backend) instance hour per hour. Here is my configuration:
backends.yaml
backends:
- name: worker
  class: B8
  instances: 1
  options: dynamic
queue.yaml
queue:
- name: import
  rate: 20/s
  bucket_size: 40
adding tasks to queue in my script
deferred.defer(importFunction, _target='worker', _queue="import")
bill status
Resource Usage
Frontend Instance Hours 198.70 Instance Hours
Backend Instance Hours 0.00 Instance Hours
Task Headers
X-AppEngine-Current-Namespace
Content-Type application/octet-stream
Referer http://worker.appname.appspot.com/_ah/queue/deferred
Content-Length 1619
Host worker.appname.appspot.com
User-Agent AppEngine-Google; (+http://code.google.com/appengine)

I needed to deploy my backend code:
appcfg.py backends update dir instance_name

Related

How to increase Cloud Scheduler request timeout deadline?

I want to migrate from App Engine Cron jobs to Cloud Scheduler, but in Cloud Scheduler the request deadline is 60 seconds, not the 10 minutes that requests from Cron jobs get.
Is there a way to configure Cloud Scheduler's App Engine requests to have a deadline of 10 minutes?
This sets the deadline for a job to 30 minutes, which is the maximum for HTTP targets.
gcloud beta scheduler jobs update http <job> --attempt-deadline=1800s --project <project>
The allowed duration for this deadline is: For HTTP targets, between 15 seconds and 30 minutes. For App Engine HTTP targets, between 15 seconds and 24 hours.
A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s".
Source: https://cloud.google.com/scheduler/docs/reference/rest/v1/projects.locations.jobs#Job
According to their scheduler.v1beta1, it is possible to set that Deadline using the attemptDeadline.
The deadline for job attempts. If the request handler does not respond by this deadline then the request is cancelled and the attempt is marked as a DEADLINE_EXCEEDED failure. The failed attempt can be viewed in execution logs. Cloud Scheduler will retry the job according to the RetryConfig.
The allowed duration for this deadline is:
For HTTP targets, between 15 seconds and 30 minutes.
For App Engine HTTP targets, between 15 seconds and 24 hours.
For PubSub targets, this field is ignored.
https://cloud.google.com/nodejs/docs/reference/scheduler/0.3.x/google.cloud.scheduler.v1beta1#.Job
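The duration format and the allowed ranges quoted above can be checked before calling the API. The following is a minimal sketch, not part of any Google client library: the function names and the target labels "http" and "app_engine_http" are assumptions made for illustration.

```python
import re

# Allowed ranges quoted above, in seconds, keyed by a hypothetical target label.
ALLOWED_RANGES = {
    "http": (15.0, 30 * 60.0),
    "app_engine_http": (15.0, 24 * 60 * 60.0),
}

# A duration in seconds with up to nine fractional digits, terminated by 's'.
_DURATION_RE = re.compile(r"(\d+)(?:\.(\d{1,9}))?s")


def parse_duration(text):
    """Convert a string such as '3.5s' or '1800s' to seconds as a float."""
    if _DURATION_RE.fullmatch(text) is None:
        raise ValueError("not a valid duration: %r" % text)
    return float(text[:-1])


def deadline_is_allowed(text, target):
    """Return True if the duration lies within the quoted range for the target."""
    low, high = ALLOWED_RANGES[target]
    return low <= parse_duration(text) <= high
```

Under these assumptions, deadline_is_allowed("600s", "app_engine_http") accepts the 10-minute deadline asked about in the question, while a value such as "10m" is rejected because it does not end in a plain seconds suffix.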
When we look at Cloud Scheduler, we see that when the time is reached to fire a job the request to fire that job may fail. At this point, the request will be retried based on the configuration of that job ... see:
https://cloud.google.com/sdk/gcloud/reference/beta/scheduler/jobs/create/http
Among these settings we find:
--max-backoff
--max-doublings
--max-retry-attempts
--max-retry-duration
--min-backoff
It seems that if we want to keep trying for a solid 10 minutes we might be able to specify:
--max-backoff: 0s
--max-doublings: 0
--max-retry-attempts: 0
--max-retry-duration: 10m
--min-backoff: 0s
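To see how these retry settings interact, it can help to compute the retry intervals they would produce. The sketch below assumes my reading of the documented RetryConfig behaviour: the interval starts at the minimum backoff, doubles max-doublings times, then grows linearly by 2^max_doublings times the minimum backoff, and is capped at the maximum backoff. The function name is hypothetical and this is not an official implementation.

```python
def retry_intervals(min_backoff, max_backoff, max_doublings, attempts):
    """Return the wait (in seconds) before each of the first `attempts` retries."""
    intervals = []
    interval = float(min_backoff)
    # After the doubling phase, the interval grows by this fixed step.
    linear_step = float(min_backoff) * (2 ** max_doublings)
    for attempt in range(attempts):
        # Never wait longer than the configured maximum backoff.
        intervals.append(min(interval, float(max_backoff)))
        if attempt < max_doublings:
            interval *= 2
        else:
            interval += linear_step
    return intervals
```

For example, retry_intervals(10, 300, 3, 8) models a job with a 10s minimum backoff, a 300s maximum backoff and three doublings. With every backoff set to 0s, as suggested above, each computed interval is zero, so under this model the retries would follow one another immediately until the retry duration runs out.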

Google App Engine Cron not triggering endpoint at specific times

We have multiple App Engine Cron entries triggering our App Engine application, but recently we detected a decrease in the number of processed events handled by one of the endpoints of our application. Looking at the App Engine Cron logs for this specific Cron entry in StackDriver, we found that there are missing entries during the days we investigated (March 11-15). Most of the missing triggers fall at the same times each day (12:15, 14:15, 16:15, 18:15, 20:15, 22:15, 00:15).
The screenshot below displays one specific day, and the red lines indicate the missing entries:
There are no requests with HTTP status code different than 200.
This is the configuration of the specific Cron entry (replaced some words with XXX due to business restrictions):
- description: 'Hourly job for XXX'
  url: /schedule/bigquery/XXX
  schedule: every 1 hours from 00:15 to 23:15
  timezone: UTC
  target: XXX
  retry_parameters:
    min_backoff_seconds: 2.5
    max_doublings: 5
Could someone on the GCP side take a look? The task name is 53751dd6a70fb9af38f49993b122b79f.
It seems that if the request takes longer than an hour, the next one gets skipped (i.e. cron doesn't launch the next iteration if the current iteration is still running).
Maybe do the actual work in a separate task, so that the only thing the cron task does is launch this separate task.
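The hand-off suggested above can be sketched as follows. This is only an illustration of the pattern: an in-process queue and a worker thread stand in for an App Engine task queue, and every name (task_queue, long_running_job, cron_handler) is hypothetical.

```python
import queue
import threading

# Stand-in for a task queue; on App Engine this role would be played by a
# push queue rather than an in-process queue.
task_queue = queue.Queue()
completed = []


def long_running_job(label):
    """Hypothetical slow job; here it only records that it ran."""
    completed.append(label)


def worker():
    """Run queued jobs until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            break
        func, args = item
        func(*args)


def cron_handler():
    """Handler hit by cron: enqueue the work and return immediately."""
    task_queue.put((long_running_job, ("hourly",)))
    return 200
```

Because cron_handler only enqueues, it returns well before the job finishes, so a slow run would no longer keep the cron request itself open for more than an hour.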

google app engine request duration

How can I overcome the 60-second limitation?
I read that configuring manual_scaling in the YAML file would help, but it didn't: after 60 seconds the server resets. The server is written in Node.js.
I need to send 2000 emails every x time, so I need to be able to run a process for 10-20 minutes.
this is the yaml file:
runtime: nodejs8
instance_class: B4
manual_scaling:
  instances: 1
You don't have to use the "default" service.
https://cloud.google.com/appengine/docs/standard/java/an-overview-of-app-engine
Try using manual scaling in another service of your application; to do this you have to add this XML tag to your appengine-web.xml file.
https://cloud.google.com/appengine/docs/standard/java/config/appref#service
This is still not the best solution; for a long-running operation you should use App Engine Tasks.
https://cloud.google.com/appengine/docs/standard/java/taskqueue/
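One way to fit the 2000 emails into short-lived tasks is to split the recipient list into batches and enqueue one task per batch. A minimal sketch of the splitting step follows; the function name and the batch size of 100 are assumptions, and the enqueueing itself is left out because it depends on the task API in use.

```python
def split_into_batches(recipients, batch_size):
    """Split a list of recipients into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        recipients[start:start + batch_size]
        for start in range(0, len(recipients), batch_size)
    ]


# Hypothetical usage: 2000 placeholder addresses, 100 per task.
recipients = ["user%d@example.com" % n for n in range(2000)]
batches = split_into_batches(recipients, 100)
```

Each batch can then be handled by its own task, so no single request has to run for 10-20 minutes.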

High Database Instance Uptime for no queries

I have set up a Wordpress site on google appengine for PHP according to the instructions at https://developers.google.com/appengine/articles/wordpress
I also have a CDN sitting in front of my site, so the load on the Google App Engine instance is tiny: really just cron jobs and the CDN updating its cache. Here are the access logs for the last 7 hours as an example.
2013-08-29 06:09:12.829 /post-sitemap.xml 200 6793ms 0kb Amazon CloudFront
2013-08-29 06:09:05.727 /robots.txt 200 4ms 0kb Amazon CloudFront
2013-08-29 04:55:07.937 /wp-cron.php 200 7206ms 0kb AppEngine-Google; (+http://code.google.com/appengine)
2013-08-29 04:33:59.915 /tag/javascript/ 200 8822ms 37kb Amazon CloudFront
2013-08-29 04:33:59.914 This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This requ
2013-08-29 01:12:03.214 / 200 8751ms 39kb Amazon CloudFront
2013-08-29 01:12:03.214 This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This requ
2013-08-29 01:11:50.755 /robots.txt 200 64ms 0kb Amazon CloudFront
2013-08-29 00:05:27.592 /sitemap_index.xml 200 7316ms 1kb Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2013-08-29 00:05:20.217 /robots.txt 200 4ms 0kb Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2013-08-28 23:30:45.596 /system/feeds/sitemap 301 179ms 0kb Amazon CloudFront
My problem is that, despite this tiny load, my Cloud SQL instance is online far too much. I would expect my usage charges in this scenario to be tiny, but they are not, and I am looking at having to commit to a package to keep them under control rather than paying by usage.
See the following graph of instance uptime, which ends at 6am (log time) (10pm UTC).
Then look at the query load for the same period
My guess is that a database connection is being opened as soon as the Google App Engine instance is started (whether it is serving static or dynamic objects).
Any ideas on how I can resolve this?
The Cloud SQL query graph was misleading.
Your application was getting queries throughout the day, which resulted in a hit on the database roughly every hour. This caused the database to be up for about half the time. The problem will be solved in a forthcoming version of the Cloud Console (cloud.google.com/console).
Sorry for the confusion
Joe Faith,
Product Manager,
Google Cloud SQL

Google App Engine - does instance hours include upload / download time by HTTP client to server?

I have an application hosted on the Google App Engine platform. The application is mostly I/O intensive and involves a high number of upload and download operations to the app engine server by an HTTP client.
My question is: what does an instance hour comprise in this case? Does it include the total time taken by the HTTP client to upload the request data? Or does the instance hour calculation begin when the entire request data has been uploaded and processing of the request starts?
Example results from the application:
An HTTP client sent an upload request to the app engine server, request data size 1.1 MB
Time taken for request to complete on the client side - 78311 ms
Corresponding server log entry:
- - [Time] "POST / HTTP/1.1" 200 127 - "Apache-HttpClient/UNAVAILABLE (java 1.4)" "" ms=3952 cpu_ms=1529 api_cpu_ms=283 cpm_usd=0.154248 instance=
An HTTP client sent a download request to the app engine server.
Time taken for request to complete on the client side - 8632 ms
Corresponding server log entry:
- - [Time] "POST / HTTP/1.1" 200 297910 - "Apache-HttpClient/UNAVAILABLE (java 1.4)" "" ms=909 cpu_ms=612 api_cpu_ms=43 cpm_usd=0.050377 instance=
Which of these figures contributes towards the instance hour utilization - is it a) ms, b) cpu_ms or c) the time taken for request to complete on the client side ?
Please note that the HTTP client uses a FileEntity while uploading data, therefore I assume that data is sent over by the client to the server in a single part.
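To compare the server-side figures with the client-side times, the timing fields can be pulled out of a log entry like the ones above. This is a small sketch; the function name is hypothetical and the pattern only covers the three fields asked about (ms, cpu_ms, api_cpu_ms).

```python
import re

# Matches the three timing fields; the word boundary keeps "ms" from matching
# inside "cpu_ms" or "api_cpu_ms".
_FIELD_RE = re.compile(r"\b(ms|cpu_ms|api_cpu_ms)=(\d+)")


def parse_timing_fields(log_line):
    """Return the integer timing fields found in one request log line."""
    return {name: int(value) for name, value in _FIELD_RE.findall(log_line)}


upload_line = (
    '- - [Time] "POST / HTTP/1.1" 200 127 - '
    '"Apache-HttpClient/UNAVAILABLE (java 1.4)" "" '
    "ms=3952 cpu_ms=1529 api_cpu_ms=283 cpm_usd=0.154248 instance="
)
fields = parse_timing_fields(upload_line)

# Time seen by the client that is not accounted for by the server-side ms value.
client_ms = 78311
unaccounted_ms = client_ms - fields["ms"]
```

For the upload example, this leaves most of the 78311 ms measured by the client outside the ms value that the server logged.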
Incoming requests are buffered by the App Engine infrastructure, and the request is only passed to an instance of your app once the entire request has been received. Likewise, outgoing requests are buffered, and your app does not have to wait for the user to finish downloading the response. As a result, upload and download time are not charged against your app.
To understand the numbers in the log, look at the log breakdown; a slightly more readable version is here.
None of the options you presented (a, b, c) is directly billed. GAE used to count CPU time as a unit of cost, but that changed in November 2011. Now you pay for instance uptime, even if the instance is not handling any requests. Instances stop being billed after 15 minutes of inactivity.
(This does not mean that GAE actually shuts instances down after they stop billing for them - see "Instances" graph in your dashboard.)
How many instances are up depends on your app's performance settings.
Since your app is IO intensive, it will help to enable concurrent requests (Java, Python 2.7). This way an instance can run multiple requests in parallel while they are mainly waiting for IO; in our app I'm seeing about 15-20 requests being served in parallel on one instance.
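The uptime-based billing described above can be approximated with a short calculation. The sketch below is a simplified model for a single instance, not the actual billing algorithm: it assumes the instance is billed from the start of a request until 15 minutes after the last request ends, and that a request arriving within that window extends it. The function name is hypothetical.

```python
IDLE_TAIL_SECONDS = 15 * 60  # billing continues for 15 minutes of inactivity


def billed_seconds(requests, idle_tail=IDLE_TAIL_SECONDS):
    """Estimate billed seconds from (start, end) request times in seconds."""
    total = 0
    window_start = None
    window_end = None
    for start, end in sorted(requests):
        if window_end is not None and start <= window_end:
            # Request arrives while the instance is still billed: extend the window.
            window_end = max(window_end, end + idle_tail)
        else:
            # Instance had gone idle: close the previous window, open a new one.
            if window_end is not None:
                total += window_end - window_start
            window_start = start
            window_end = end + idle_tail
    if window_end is not None:
        total += window_end - window_start
    return total
```

Under this model a single 4-second request costs 904 seconds of instance time, which is why request duration alone (ms or cpu_ms) says little about the bill.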
Update:
This is what the first link says about the ms=xyz log entry:
This is the actual time (hence 'wallclock' time) taken to return a response to
the user, not including the time it took for the user to send their request or
the time it takes to send the response back - that is, just the time spent
processing by your app.
Note that Nick Johnson is an engineer on the GAE team, so this can be taken as an authoritative answer.
