I have deployed my Django app on Google App Engine. The app exposes APIs, and each API runs calculations over thousands of records. In short, some of my APIs take more than 60 seconds and I get a 502 error. How can I fix that?
Are Google App Engine's B1 or B2 instances a solution to my problem? Please guide me, thanks.
I have been experiencing a similar problem, and the logs tell me that the worker is timing out.
The default Gunicorn worker timeout is 30 seconds. My app makes some API requests that take longer than 30 seconds, which is why I'm getting timeouts. If you think this is your problem as well, you can address it by raising Gunicorn's --timeout in the entrypoint line of your app.yaml file (here raised to 120 seconds):
runtime: python37
entrypoint: gunicorn -b :$PORT example.wsgi --log-level=DEBUG --timeout=120
service: default
For your specific case I can see 3 possible solutions:
1. The easiest fix would be switching to B1 or B2 instances, which support the manual and basic scaling types; both allow requests to run for up to 24 hours.
2. If for some reason you want to stick with F2 instances, you have the option of creating a task on a task queue, which lets you run the work asynchronously.
3. You could also switch to GAE Flexible, which gives you a 60-minute maximum request timeout, as stated in the documentation.
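For the task-queue option, the enqueue step might look like the sketch below, using the Cloud Tasks client library. The queue name, handler path, project, and region are assumptions, not from the question; the handler registered at the relative URI is where the long-running calculation would actually run, freed from the 60-second request limit.

```python
# Sketch: offloading a long calculation to a Cloud Tasks queue.
# Hypothetical names: the "/tasks/recalculate" handler, the
# "long-jobs" queue, and the project/region are all assumptions.
import json


def build_task_payload(relative_uri, record_ids):
    """Build the App Engine task body that Cloud Tasks will POST
    back to the app; the handler at `relative_uri` does the work."""
    return {
        "app_engine_http_request": {
            "http_method": "POST",
            "relative_uri": relative_uri,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"record_ids": record_ids}).encode(),
        }
    }


def enqueue(project, location, queue, payload):
    """Submit the task. Requires the google-cloud-tasks package and
    application credentials, so it is not called in this sketch."""
    from google.cloud import tasks_v2

    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(project, location, queue)
    return client.create_task(request={"parent": parent, "task": payload})


payload = build_task_payload("/tasks/recalculate", [1, 2, 3])
```

The API request can then return immediately with a 202-style "accepted" response while the queued task runs in the background.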
Related
About 3-4 times a week, one of my two 12-hour tasks, which acts as an ETL from an API endpoint to a Snowflake DB, fails, and I can't figure out exactly why.
The Cron Task Manager says it last ran at 6:29am this morning, but on retrieving the logs there is only one line, which says:
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
I'm not sure if I need a warm-up request, dedicated workers, etc., because the one-line log entry is so uninformative to me. I'm using a fairly sizable instance class that I was hoping could handle most of the workload.
Here is what the logs of a successful run look like:
https://github.com/markamcgown/GF/blob/main/downloaded-logs-success2.csv
And the failure:
https://github.com/markamcgown/GF/blob/main/downloaded-logs-20210104-074656.csv
App.yaml:
service: vetdata-loader
runtime: python38
instance_class: F4_1G
handlers:
- url: /task/loader
  script: auto
Updated, here is my most recent app.yaml that's failing less now but still sometimes:
service: vetdata-loader
runtime: python38
instance_class: B4_1G
handlers:
- url: /task/loader
  script: auto
basic_scaling:
  max_instances: 11
  idle_timeout: 30m
I think you are not using the correct instance class. If you have a look here at the timeouts for task calls, you are limited to 10 minutes with automatic scaling, but can run for up to 24 hours with basic or manual scaling.
Looking at your instance_class, the F-type instances are only used with automatic scaling. Use a B4_1G instance class instead and check whether you still have these issues. You should not.
Some weeks ago, my app on App Engine started increasing the number of idle instances to an unreasonably high amount, even when there is close to zero traffic. This of course impacts my bill, which is skyrocketing.
My app is a simple Node.js application serving a GraphQL API that connects to my Cloud SQL database.
Why are all these idle instances being started?
My app.yaml:
runtime: nodejs12
service: default
handlers:
- url: /.*
  script: auto
  secure: always
  redirect_http_response_code: 301
automatic_scaling:
  max_idle_instances: 1
Screenshot of monitoring:
This is very strange behavior, as per the documentation it should only temporarily exceed the max_idle_instances.
Note: When settling back to normal levels after a load spike, the number of idle instances can temporarily exceed your specified maximum. However, you will not be charged for more instances than the maximum number you've specified.
Some possible solutions:
1. Confirm in the console that the deployed app.yaml configuration matches what the App Engine console reports.
2. Set min_idle_instances to 1 and max_idle_instances to 2 (temporarily) and redeploy the application. It could be that something is simply wrong on the scaling side, and redeploying could solve it.
3. Check your logging (filtered to App Engine) for any problem in shutting down the idle instances.
4. Finally, you could tweak settings like max_pending_latency. I have seen applications that take 2-3 seconds to start up, while by default a request only waits 30ms before another instance is spun up.
This post suggests setting the following, which you could try:
instance_class: F1
automatic_scaling:
  max_idle_instances: 1  # default value
  min_pending_latency: automatic  # default value
  max_pending_latency: 30ms
5. Switch to basic_scaling and let Google determine the best scaling algorithm (a last-resort option). That would look something like this:
basic_scaling:
  max_instances: 5
  idle_timeout: 15m
The solution could of course also be a combination of 2 and 4.
Update after 24 hours:
I followed @Nebulastic's suggestions, numbers 2 and 4, but it did not make any difference. So in frustration I disabled the entire Google App Engine application (App Engine > Settings > Disable application), left it off for 10 minutes, and confirmed in the monitoring dashboard that everything was dead (sorry, users!).
After 10 minutes I enabled App Engine again, and it booted only 1 instance. I've been monitoring it closely since, and it (finally) seems to be behaving now. After the restart it also adheres to the min and max idle instance configuration - @Nebulastic's suggestion. Thanks!
Screenshots:
Have you checked to make sure you don't have a bunch of old versions still running? https://console.cloud.google.com/appengine/versions
Check each service in the services dropdown.
I want to export metrics from Cloud Monitoring to BigQuery, and Google has provided a solution for how to do this. I am following this article.
I have downloaded the code from GitHub, and I am able to successfully deploy and run the application (Python 2.7).
I have set the aggregation alignment period to 86400s (I want to aggregate metrics per day, starting from 1st July).
One of the App Engine services, the write-metrics app, which writes the metrics to BigQuery by receiving the API response as a Pub/Sub message, keeps throwing these errors:
> Exceeded soft memory limit of 256 MB with 270 MB after servicing 5 requests total. Consider setting a larger instance class in app.yaml.
> While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application or may be using an instance with insufficient memory. Consider setting a larger instance class in app.yaml.
The above is a 500 error and occurs very frequently, and I find that duplicate records are still being inserted into the BigQuery table.
I also see this one:
DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded.
The App Engine logs frequently show POSTs with status codes 500 and 200.
In App Engine (standard) I have set scaling to automatic in app.yaml as below:
automatic_scaling:
  target_cpu_utilization: 0.65
  min_instances: 5
  max_instances: 25
  min_pending_latency: 30ms
  max_pending_latency: automatic
  max_concurrent_requests: 50
but this seems to have no effect. I am very new to App Engine, Google Cloud, and its Stackdriver metrics.
This change makes it work:
instance_class: F4_1G
This needs to be a top-level entry; previously I had made the mistake of putting it under automatic_scaling, which produced an "illegal modifier" error.
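For illustration, the corrected placement would look roughly like this (a minimal sketch; the runtime and scaling values shown are assumptions, not from the original post):

```yaml
runtime: python27
instance_class: F4_1G   # top-level, not nested under automatic_scaling
automatic_scaling:
  min_instances: 5
  max_instances: 25
```

Placing instance_class inside the automatic_scaling block is what triggers the "illegal modifier" deployment error.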
How can I overcome the 60-second limitation?
I read that configuring the yaml file for manual_scaling would help, but it didn't; after 60 seconds the server resets. The server is developed with Node.js.
I need to send 2000 emails every X time, so I need the ability to run a process for 10-20 minutes.
this is the yaml file:
runtime: nodejs8
instance_class: B4
manual_scaling:
  instances: 1
You don't have to use the "default" service.
https://cloud.google.com/appengine/docs/standard/java/an-overview-of-app-engine
Try using manual scaling in another service of your application; you have to add this XML tag in your appengine-web.xml file to achieve this.
https://cloud.google.com/appengine/docs/standard/java/config/appref#service
This is not the best solution, though; you should use App Engine tasks for a long-running operation.
https://cloud.google.com/appengine/docs/standard/java/taskqueue/
What is the maximum timeout deadline for a URL Fetch on Google App Engine?
I understand that for normal requests, it can be no more than 60 seconds (which is the maximum length of the request). But what about backend requests or Taskqueues which can run up to ten minutes? There are a number of questions on this topic with conflicting information. The official documentation is silent on the issue. Any ideas?
60 seconds for both of them; it's specified in the Java docs here.
You may use sockets if you want a longer deadline, but as I recall you cannot use HTTPS over them.
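For context, on first-generation App Engine standard (Python 2 runtime) the URL Fetch deadline can also be raised explicitly per request, since the default is only a few seconds. A minimal sketch, assuming the GAE runtime is available; the import guard only makes the module loadable outside App Engine:

```python
# Sketch: setting an explicit URL Fetch deadline on first-generation
# App Engine standard. The try/except guard lets this module be
# imported outside of App Engine, where google.appengine is absent.
try:
    from google.appengine.api import urlfetch  # GAE-only module
except ImportError:
    urlfetch = None


def fetch_with_deadline(url, deadline=60):
    """Fetch `url` with an explicit per-request deadline in seconds."""
    if urlfetch is None:
        raise RuntimeError("URL Fetch is only available on App Engine")
    return urlfetch.fetch(url, deadline=deadline)
```

Passing `deadline=60` asks for the full 60-second limit discussed above rather than the much shorter default.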