Best configuration for Automatic Scaling in Google App Engine to always have an instance available?

What is the best way to set up Google App Engine to always have at least one instance ready and available to handle requests when using automatic scaling? This is for a low traffic application.
There are settings here that let you control this behavior, but I am not sure what the best combination is, and some of them sound confusingly similar. For example, min-instances and min-idle-instances sound alike. I tried setting min-instances to 1 but I still experienced lag.
What is a configuration that, from the end user's point of view, is always on and handles requests without lag (for a low traffic application)?

In the App Engine Standard environment, loading your application's code or handling a request on a brand-new instance can cause users to experience higher latency; warmup requests can help you reduce it. Warmup requests load the app's code onto a new instance before any live requests reach it. When enabled, App Engine detects that your application needs a new instance and issues a warmup request to initialize one. You can check this link for Configuring Warmup Requests to Improve Performance.
Regarding min-instances and min-idle-instances: these only apply when warmup requests are enabled. As you can see in this post, the difference between the two is that min-instances provides instances that process incoming requests immediately, while min-idle-instances keeps idle instances in reserve to absorb high load.
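For a concrete starting point, here is a minimal app.yaml sketch for the Standard environment combining these settings (the runtime and the values are illustrative placeholders, not recommendations; once warmup is enabled, your app should also answer GET /_ah/warmup):

```yaml
runtime: python39        # placeholder runtime

inbound_services:
- warmup                 # enables /_ah/warmup requests to new instances

automatic_scaling:
  min_instances: 1       # always-on instances; billed even while idle
  min_idle_instances: 1  # idle instances held in reserve for traffic spikes
```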
However, you mentioned that you don't need a warmup, so we suggest you select App Engine Flexible: based on this documentation, it must have at least one instance running at all times and can scale up in response to traffic. Please note that this environment comes at a higher price. You can refer to this link for the pricing of the two App Engine environments.

Related

Slow initial connection to Google Cloud App Engine

Initial connections to my website are extremely slow (100+ seconds). How can I diagnose the issue?
Using the Chrome dev tool network tab, I see that the issue is "initial connection" and not things like SSL or Waiting/TTFB.
This only happens for the first page visit to the website for a given device; after the first page loads, everything on the website is very fast. This consistently happens for new devices, on the same device if I don't visit the website for X days, and on the same device if I clear the cache and browsing history.
The website is a Django app hosted on Google Cloud App Engine with 2 flexible instances.
User traffic to the website is low, so I doubt the issue is load balancing or traffic spikes.
Thanks!
Yesterday I tried opening the page and saw 1.8 minutes to load the main page and 2.1 minutes to run a search; later attempts were faster, as you mentioned. I also tried accessing the page today and it loaded quite fast.
From my understanding, the high latency of the first connection might be related to session handling, database connections, SSL certificate problems, large amounts of uncached data, or expensive operations run before the server sends the response. It's nearly impossible for us to determine without access to your code, logs, and database configuration.
As for how to narrow down the issue, I might suggest the following:
Examine your logs for possible causes.
Add timing logs interleaved with each of the statements that handle the request and watch for bottlenecks or long-running code (a minimal sketch follows this list).
Deploy the same endpoint without logos, images and other data that would be browser-cached and see if it makes a difference.
Create a simple hello-world endpoint and check its latency. Then gradually evolve that endpoint to resemble your actual handler code, in the hope of finding where the issue is.
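As a minimal sketch of the timing idea from the second suggestion above (the handler steps here are hypothetical stand-ins for your actual query and rendering code):

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)

def timed(fn):
    """Decorator: log how long the wrapped step takes, to expose bottlenecks."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            logging.info('%s took %.3f s', fn.__name__,
                         time.perf_counter() - start)
    return wrapper

# Hypothetical handler steps; swap the sleeps for your real query/render code.
@timed
def run_query():
    time.sleep(0.2)   # stand-in for a slow database call

@timed
def render_page():
    time.sleep(0.05)  # stand-in for template rendering

def handle_request():
    run_query()
    render_page()

if __name__ == '__main__':
    handle_request()
```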
If only the first connection is slow, it might be because an instance is starting and you do not have minimum idle instances and warmup requests enabled. With that configuration you will have instances ready to take traffic, and the latency of the first connection will be lower.
As it is stated in the documentation:
If you set a minimum number of idle instances, pending latency will have less effect on your application's performance. Because App Engine keeps idle instances in reserve, it is unlikely that requests will enter the pending queue except in exceptionally high load spikes. You will need to test your application and expected traffic volume to determine the ideal number of instances to keep in reserve.
You can also find more information about warmup requests in this documentation about Configuring Warmup Requests to Improve Performance.
I resolved this issue by removing and recreating custom domain settings for my App Engine project, and removing and recreating the corresponding DNS records in domains.google, following these instructions:
https://cloud.google.com/appengine/docs/standard/python/mapping-custom-domains
I'm still not sure what the underlying issue was, but this fixed it. Hope this can help anyone encountering a similar issue.
I had this exact issue; it was killing our load performance once we switched to a load balancer.
It ended up being the instance group's port setting. We are of course using SSL certificates for the site, but I had specified both port 80 and port 443.
Once I removed port 80 from the instance group that the load balancer refers to, all pages loaded immediately.

How can I warm up an AppEngine Flex app after a deployment?

It appears that App Engine Standard has a warmup feature to warm up an app after a deployment, but I don't see the same feature available for Flex. The readiness and liveness probes also don't work for this, since pointing their path setting at a custom path inside the application doesn't seem to make the probes actually hit that internal endpoint.
Is there some solution I'm missing, other than manually hitting the endpoints myself after the deployment? That won't be very reliable, since the calls don't necessarily round-robin to each instance.
In App Engine Standard, warmup requests essentially load your app's code into a new instance before any live requests reach that instance. This can happen in the following situations:
When you redeploy a version of your app.
When new instances are created because the load from requests exceeds the capacity of the current set of running instances.
When maintenance and repairs of the underlying infrastructure or physical hardware occur.
In App Engine Flexible, you can achieve the same result by using the initial_delay_sec setting for liveness checks in your app.yaml file. If you set its value high enough to give your code time to initialize, the first request that reaches the instance will be processed quickly by your already-initialized code.
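For illustration, a liveness_check stanza along these lines could be used in a Flexible app.yaml (the values are placeholders to tune to your app's actual startup time):

```yaml
liveness_check:
  path: "/liveness_check"
  initial_delay_sec: 300  # grace period for your code to initialize after deploy
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 4
  success_threshold: 2
```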

Is there any way to access Firebase from an App Engine servlet without using manual scaling or the flexible environment?

Question:
Is there any way to access Firebase using server-side code without using either manual scaling or the flexible environment?
Context:
I want to achieve the following flow:
app posts pending 'updates' to firebase -> backend picks them up -> backend sends emails -> backend modifies firebase 'updates' to non-pending state
From what I can see, if I want the back end to pick these updates up in real time, I need a long-running thread in the App Engine Flexible Environment. I'm prepared to forgo this to avoid the flexible environment's pricing model and beta status.
Given that my choice is therefore the App Engine Standard Environment, it appears that to access Firebase I'm stuck with having to enable manual scaling.
It seems like madness to have a resident instance running all the time, when there's no requirement to listen to updates in real time, which sits idle 95% of the time and then isn't available outside of a contiguous 9-instance-hour (free) period.
Can Firebase somehow be accessed from the server side without 'attaching a listener', such that I can call it from an automatically scaled instance and simply get a snapshot? If not, is there an alternative technical or architectural solution here, or something I'm missing?
This must be a fairly common issue!
Thank you for your time, much appreciated.

Google App Engine internal network

Is it possible to route HTTP traffic between google app engine applications without going through the public internet?
For example, if I'm running a Web Service API on one application and want to build a second application on top of it without traffic going through the internet - for performance reasons.
Between separate apps running on different domains? I suspect not.
But you can use backends to do different work behind the scenes:
Backends are special App Engine instances that have no request deadlines, higher memory and CPU limits, and persistent state across requests. They are started automatically by App Engine and can run continuously for long periods. Each backend instance has a unique URL to use for requests, and you can load-balance requests across multiple instances.
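For reference, backends were declared in a separate backends.yaml alongside app.yaml (this is the legacy backends API, since superseded by modules/services; the name and values here are illustrative):

```yaml
backends:
- name: worker      # reachable at http://worker.<your-app-id>.appspot.com
  class: B4         # higher memory/CPU instance class
  instances: 2
  options: dynamic  # start on demand rather than stay resident
```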
When I look at the logs between the backend and the frontend instances, I see IPs like
0.1.0.3
So yes, those communication paths are internal. I'd hazard a guess that, since so much of the internet is Google, requests between different apps might not travel over the public internet either.
The logs indicate low-latency communication between front and back ends, though not under any particular load. Your mileage may vary.
Backends in Python

Stack Exchange API compliant request throttle implementation on Google App Engine Cloud infrastructure

I have been writing a Google Chrome extension for Stack Exchange. It's a simple extension that allows you to keep track of your reputation and get notified of comments on Stack Exchange sites.
Currently I've encountered some issues that I can't handle myself.
My extension uses Google App Engine as its back end to make external requests to the Stack Exchange API. A single client request from the extension for new comments on a single site can cause plenty of requests to the API endpoint to prepare the response, even for a typical (non-Skeet-ish) user. The average user has accounts on at least 3 sites in the Stack Exchange network, and some have more than 10!
Stack Exchange API has request limits:
A single IP address can only make a certain number of API requests per day (10,000).
The API will cut my requests off if I make more than 30 requests over 5 seconds from a single IP address.
It's clear that all requests should be throttled to 30 per 5 seconds. I've currently implemented request-throttling logic based on a distributed lock with memcached, using memcached as a simple lock manager to coordinate the activity of GAE instances and throttle UrlFetch requests.
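As a minimal sketch of such a shared counter (not the asker's exact implementation; it assumes App Engine's memcache API and uses a fixed 5-second window, which can briefly allow bursts at window boundaries):

```python
import time

from google.appengine.api import memcache

WINDOW_SEC = 5
MAX_REQUESTS = 30  # Stack Exchange API limit: 30 requests per 5 seconds

def acquire_api_slot():
    """Return True if one more API request may be issued in the current window."""
    window = int(time.time()) // WINDOW_SEC
    key = 'se_api_window:%d' % window
    # incr is atomic across all instances; initial_value creates the counter.
    count = memcache.incr(key, initial_value=0)
    if count is None:
        return False  # memcache unavailable; fail closed to stay compliant
    return count <= MAX_REQUESTS

def throttled(fetch_fn):
    """Block briefly until a slot frees up, then run the UrlFetch call."""
    while not acquire_api_slot():
        time.sleep(0.2)
    return fetch_fn()
```

A sliding window or token bucket would smooth out the boundary bursts, at the cost of more memcache traffic per request.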
But I think it's a big waste to limit such powerful infrastructure to no more than 30 requests per 5 seconds. Such an API request rate does not allow me to continue developing new, interesting, and useful features, and one day the app will stop working properly altogether.
My app now has 90 users and growing, and I need to come up with a solution to maximize the request rate.
As is known, App Engine makes external UrlFetch requests via a shared pool of different IPs.
My goal is to write request-throttling functionality that ensures compliance with the API terms of usage while utilizing GAE's distributed capabilities.
So my question is how to provide maximum practical API throughput while complying with the API terms of usage and utilizing GAE's distributed capabilities.
Advice to use another platform/host/proxy is just useless in my mind.
If you are searching for a way to programmatically manage Google App Engine's shared pool of IPs, I firmly believe that you are out of luck.
Anyway, quoting this advice from the FAQ, I think you have more than a chance of keeping your awesome app running:
What should I do if I need more requests per day?
Certain types of applications - services and websites to name two - can legitimately have much higher per-day request requirements than typical applications. If you can demonstrate a need for a higher request quota, contact us.
EDIT:
I was wrong; actually, you don't have any chance. Google App Engine apps are doomed.
First off: I'm using your extension and it rocks!
Have you considered using memcache and caching the results?
Instead of taking the results from the API directly, first try to find them in the cache: if they are there, use them; if they are not, retrieve them from the API, cache them, and let them expire after X minutes.
Second, try to batch up user requests: instead of asking for the reputation of a single user, ask for the reputation of several users together.
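A sketch combining both ideas might look like this (it assumes the Stack Exchange /users/{ids} route, which accepts multiple semicolon-separated ids, plus App Engine's memcache and urlfetch APIs; the cache lifetime is illustrative):

```python
import json

from google.appengine.api import memcache, urlfetch

CACHE_SECONDS = 300  # let cached reputations expire after 5 minutes

def get_reputations(user_ids, site='stackoverflow'):
    """Return {user_id: reputation}, serving from cache where possible."""
    reputations, missing = {}, []
    for uid in user_ids:
        cached = memcache.get('rep:%s:%s' % (site, uid))
        if cached is not None:
            reputations[uid] = cached
        else:
            missing.append(uid)
    if missing:
        # One batched API call covers every uncached user at once.
        ids = ';'.join(str(u) for u in missing)
        url = 'https://api.stackexchange.com/2.3/users/%s?site=%s' % (ids, site)
        response = urlfetch.fetch(url)
        for item in json.loads(response.content).get('items', []):
            reputations[item['user_id']] = item['reputation']
            memcache.set('rep:%s:%s' % (site, item['user_id']),
                         item['reputation'], time=CACHE_SECONDS)
    return reputations
```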
