Task Queue Issue - Endpoints v2, Google App Engine

We are facing issues with task queues after recently updating the Endpoints API to version 2 in Google App Engine - Python. These are the issues we see with task queues:
1. The task never gets added to the queue at all; it is simply ignored and never executed.
2. Tasks get terminated with the error "Process terminated because the backend was stopped."
The most critical issue is the first one, where the task is silently ignored and never even added to the queue.
Details on the codebase and logs are attached below.
It would be great if someone could help us out here.
app.yaml (Server Settings)
#version: 1
runtime: python27
api_version: 1
threadsafe: true
instance_class: F4
automatic_scaling:
  min_idle_instances: 1
  max_idle_instances: 4 # default value
  min_pending_latency: 500ms # default value
  max_pending_latency: 900ms
  max_concurrent_requests: 50
queue.yaml
queue:
- name: allocateStore
  rate: 500/s
  bucket_size: 500
  max_concurrent_requests: 1000
  retry_parameters:
    task_retry_limit: 0 # a task that fails is never retried
Adding a task to the queue:
from google.appengine.api import taskqueue

taskqueue.add(queue_name='allocateStore', url='/tasksStore/allocateStore')
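For context, push queue tasks are delivered as HTTP POST requests to the task's url, so a handler has to be routed at /tasksStore/allocateStore for the task to run. A minimal webapp2 sketch of such a handler (the class name and wiring are illustrative, not our actual code):

import logging
import webapp2

class AllocateStoreHandler(webapp2.RequestHandler):
    def post(self):
        # Push queue tasks arrive as POST requests to the task's configured url.
        retry_count = self.request.headers.get('X-AppEngine-TaskRetryCount')
        logging.info('allocateStore task received (retry count: %s)', retry_count)
        # ... actual allocation work goes here ...
        self.response.set_status(200)  # any 2xx response marks the task as done

app = webapp2.WSGIApplication([
    ('/tasksStore/allocateStore', AllocateStoreHandler),
])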
Thanks,
Navin Lr

Related

Google App Engine: unpredictable cost and discrepancy between App Engine dashboard and billing export

I have been exploring the App Engine settings for a small data science web application for 2 weeks. Since it is a personal project billed to my own wallet, I tried a few different parameters in app.yaml to reduce the "frontend instances" cost. Several changes in, I got an unexpected ~10x cost surge! It was painful! In order not to waste the lesson, I decided to dig in and understand the behaviour :)... Don't worry, I have temporarily shut down my app ;)
Version 1 app.yaml:
service: my-app
runtime: python37
instance_class: F4
env: standard
automatic_scaling:
  min_idle_instances: 1
  max_idle_instances: 1
default_expiration: "1m"
inbound_services:
- warmup
entrypoint: gunicorn -b 0.0.0.0:8080 main:server
Version 1 billing result (usage.amount_in_pricing_units exported from the billing account): ~100 hr/day, the same as the Frontend Instance Hours shown in the App Engine billing status.
This is understandable: an F4 instance is billed at 4x the base rate, so one F4 constantly running idle translates into 24 × 4 = 96 frontend instance hours. Adding the instance usage from actual requests (from me only), ~100 hr/day seems reasonable.
Version 2, where I intended to lower the instance class and the number of instances, lengthen the default_expiration (hoping it would help the app start more quickly), and tweak some other settings that I thought wouldn't matter much....
service: my-app
runtime: python37
instance_class: F2
env: standard
automatic_scaling:
  min_instances: 1
  max_instances: 1
  target_cpu_utilization: 0.85
  max_concurrent_requests: 80
  max_pending_latency: 6s
default_expiration: "3h"
inbound_services:
- warmup
entrypoint: gunicorn -b 0.0.0.0:8080 main:server
Version 2 billing result (usage.amount_in_pricing_units exported from the billing account): ~800+ hr/day, ouch! In contrast, the Frontend Instance Hours from the App Engine dashboard billing status is less than 60 hr/day, as expected. This is where I got lost:
1. Why is the usage from billing so much larger than what the App Engine dashboard shows, and where does that usage come from?
2. Where can I find and track indicators of this unaccounted usage in the App Engine dashboard?
2020-01-16 Solution for issue #1.
While I was waiting for Google Billing Support to come back to me, I found this:
Pricing of Google App Engine Flexible env, a $500 lesson
Namely, previously deployed versions of the app were also eating frontend instance hours, which needed real-world confirmation. (To my surprise, this has nothing to do with the app.yaml file!) So I deleted all the past versions of the app and let it run for two days, observing instance hours and billing records with the following app.yaml file.
service: my-app
runtime: python37
instance_class: F2
env: standard
automatic_scaling:
  min_instances: 1
  max_instances: 2
  max_idle_instances: 1
  target_cpu_utilization: 0.85
  max_concurrent_requests: 80
  max_pending_latency: 6s
default_expiration: "1m"
inbound_services:
- warmup
entrypoint: gunicorn -b 0.0.0.0:8080 main:server
This should keep exactly one F2 instance running and scale up to a maximum of 2 instances. This time both the App Engine dashboard and the exported billing usage agreed on ~50 frontend instance hours per day. Yes! The daily cost was cut to 1/16.
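For anyone repeating this: stale versions can be listed and deleted from the gcloud CLI (the version IDs below are placeholders). Note that a version currently receiving traffic cannot be deleted:

gcloud app versions list
gcloud app versions delete VERSION_ID_1 VERSION_ID_2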
This solves cost question #1, but #2 remains to be answered. It is very problematic that the App Engine dashboard does not show all the billed frontend instance usage. Yesterday I heard from the Google Billing Support team; the answer was not helpful (it mainly talked about instance numbers in app.yaml, which doesn't address the problem), and they seem oblivious to this issue. I will have to let them know.
2020-01-31 Follow-up on issue #2.
Google Billing Support responded swiftly, acknowledged the discrepancy between the App Engine dashboard and the billing export, and agreed to adjust the billing for me. Effectively, the bills from the spiky days were refunded. Kudos to them!

What causes "Request was aborted after waiting too long to attempt to service your request"?

It seems to be the result of some sort of internal timeout, but I don't know where that timeout is configured.
We're currently using autoscaling, and the error was the result of a temporary increase in the number of tasks in our task queue. Shouldn't autoscaling have created more instances to handle those requests?
Also, if a task in Cloud Tasks fails with "Request was aborted after waiting too long to attempt to service your request", is that task retried, or is it removed from the queue?
Edit: I found the problem. This was the scaling configuration in our app.yaml:
basic_scaling:
  max_instances: 2
I ran into this while proxying to a munin node with many graphs on an f1-micro backend. Responses fail with a 529 error if they wait longer than (min|max)_pending_latency - probably the scheduler tries to create a new instance because the pending latency has been exceeded, but finds that it cannot.
The default appears to be 5s. You can set it in app.yaml to a maximum of 15s:
automatic_scaling:
  min_pending_latency: 15s
  max_pending_latency: 15s
Once I did that, I stopped getting the errors for requests that waited 6s. Of course, I'm sure Google would prefer that you increase the number of instances or use a faster instance class; but maybe you only want to scale to one or two instances, or 15s is an acceptable latency for what you're trying to do.
For reference, my full app.yaml:
runtime: php73
service: munin
instance_class: F1
automatic_scaling:
  max_instances: 1
  min_instances: 0
  target_cpu_utilization: 0.95
  target_throughput_utilization: 0.95
  max_concurrent_requests: 80
  max_pending_latency: 15s
handlers:
- url: .*
  script: auto
  secure: always

GAE: instances count drops to 0 but min_instances is set to 1

I'm using GAE standard with nodejs10.
I redeployed a new version of the app approx 9 hours ago with this conf (as shown in the UI under App Engine > Versions):
runtime: nodejs10
env: standard
instance_class: F1
handlers:
- url: '.*'
  script: auto
automatic_scaling:
  min_idle_instances: automatic
  max_idle_instances: automatic
  min_pending_latency: automatic
  max_pending_latency: automatic
  min_instances: 1
  max_instances: 3
And yet I noticed that the instance count dropped to 0.
Any idea why GAE doesn't keep 1 instance running?
GAE standard autoscales to zero instances by default. Whenever a URL is matched by the handlers in the conf, an instance is started and stays live until there have been no requests for some period of time, after which it automatically scales back to zero. Here is the instance state for my test, which used:
automatic_scaling:
  min_instances: 2
And here is the traffic:
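One more thing worth checking (an assumption on my part, based on the standard environment scaling docs): min_instances is honored only when warmup requests are enabled, and the config in the question does not enable them. A minimal sketch of the two blocks together:

inbound_services:
- warmup # enables /_ah/warmup so App Engine can start resident instances proactively
automatic_scaling:
  min_instances: 1
  max_instances: 3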

Google App Engine deployment receives an ERROR only when specifying more than 1 CPU in app.yaml

I have a Flask app that deploys fine in the Google App Engine flexible environment, but some new updates have made it relatively resource intensive (I was receiving a [CRITICAL] WORKER TIMEOUT message). In attempting to fix this, I wanted to increase the number of CPUs for my app.
app.yaml:
env: flex
entrypoint: gunicorn -t 600 --timeout 600 -b :$PORT main:server
runtime: python
threadsafe: false
runtime_config:
  python_version: 2
automatic_scaling:
  min_num_instances: 3
  max_num_instances: 40
  cool_down_period_sec: 260
  cpu_utilization:
    target_utilization: .5
resources:
  cpu: 3
After some time I receive:
"Updating service [default] (this may take several minutes)...failed.
ERROR: (gcloud.app.deploy) Error Response: [13] An internal error occurred during deployment."
Is there some sort of permission issue preventing me from increasing the CPUs, or is my app.yaml invalid?
You cannot set the number of cores (cpu) to an odd number other than 1; it must be even.
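So the cpu: 3 above is what makes the deployment fail; moving to the next even number should deploy (a sketch of just the changed block, everything else unchanged):

resources:
  cpu: 4 # the flexible environment accepts 1 or an even number of cores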

503 Server Error with new Flexible Environment App

I created a new flexible environment app and made a successful deploy with the latest gcloud version (currently 133), but I get a "503 Server Error" without any error logs.
Source code: https://github.com/AIMMOTH/scala-stack-angular/tree/503-error
App link: https://scala-stack-angular-us.appspot.com
Error page:
Error: Server Error
The service you requested is not available yet.
Please try again in 30 seconds.
Version info:
Version: 20161108t190158 | Status: Serving | Traffic Allocation: 100% | Instances: 2 | Runtime: custom | Environment: Flexible
I had a filter responding to /_ah/*, and that broke Google App Engine.
For me, it was because of wrong settings in app.yaml:
vm: true # the flexible environment
runtime: java # Java 8 / Jetty 9.3 runtime
service: default
threadsafe: true # handle multiple requests simultaneously
resources:
  cpu: .5 # number of cores
  memory_gb: 1.3
  disk_size_gb: 10 # minimum is 10GB and maximum is 10240GB
health_check:
  enable_health_check: true
  check_interval_sec: 5 # time interval between checks (in seconds)
  timeout_sec: 4 # health check timeout interval (in seconds)
  unhealthy_threshold: 2 # an instance is unhealthy after failing this number of consecutive checks
  healthy_threshold: 2 # an unhealthy instance becomes healthy again after this number of consecutive successful responses
  restart_threshold: 60 # the number of consecutive check failures that will trigger a VM restart
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 1
  cool_down_period_sec: 120 # time interval between autoscaling checks; must be >= 60 seconds (default 120)
  cpu_utilization:
    target_utilization: 0.5 # CPU use averaged across all running instances, used to decide when to scale (default 0.5)
handlers:
- url: /.* # regex
  script: ignored # required, but ignored
  secure: always # https
beta_settings:
  java_quickstart: true # process Servlet 3.1 annotations
  use_endpoints_api_management: true # enable Google Cloud Endpoints API management
I removed use_endpoints_api_management: true and everything works fine.
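For clarity, the beta_settings block that ended up working is just the remaining line:

beta_settings:
  java_quickstart: true # process Servlet 3.1 annotations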
