Slow initial connection to Google Cloud App Engine - google-app-engine

Initial connections to my website are extremely slow (100+ seconds). How can I diagnose the issue?
Using the Chrome dev tool network tab, I see that the issue is "initial connection" and not things like SSL or Waiting/TTFB.
This only happens for the first page visit to the website for a given device; after the first page loads, everything on the website is very fast. This consistently happens for new devices, on the same device if I don't visit the website for X days, and on the same device if I clear the cache and browsing history.
The website is a Django app is hosted using Google Cloud App Engine with 2 flexible instances.
User traffic to the website is low, so I doubt the issue is load balancing or traffic spikes.
Thanks!

Yesterday I tried opening the page and I noticed 1.8min to load the main page and 2.1min to make a search, later attempts were faster as you mentioned. I also tried accessing the page today and it loaded quite fast.
From my understanding the high latency of the first connection might be related to session handling, database connections, ssl certificates problem, huge amount of uncached data, expensive operations run before the server sends the response. It's nearly impossible for us to determine it without access to your code, logs and database configurations.
As for how to narrow down the issue I might suggest the following:
Examine your logs for possible causes.
Add timeit logging interleaved with each of the statements that handle the requests and watch for bottlenecks or long-running code.
Deploy the same endpoint without logos, images and other data that would be browser-cached and see if it makes a difference.
Create a hello-world simple endpoint and check it's latency. Keep slowly evolving the endpoint to resemble your actual handling code with hopes on finding what's the issue.

If only the first connection is slow, it might be because the instance is starting and you do not have minimum idle instances and warmup requests enabled. This configuration will make you have instances ready for taking traffic and the latency will be slower in the first connection.
As it is stated in the documentation:
If you set a minimum number of idle instances, pending latency will
have less effect on your application's performance. Because App Engine
keeps idle instances in reserve, it is unlikely that requests will
enter the pending queue except in exceptionally high load spikes. You
will need to test your application and expected traffic volume to
determine the ideal number of instances to keep in reserve.
Also you can find more information about warmup requests in this documentation about Configuring Warmup Requests to Improve Performance

I resolved this issue by removing and recreating custom domain settings for my App Engine project, and removing and recreating the corresponding DNS records in domains.google, following these instructions:
https://cloud.google.com/appengine/docs/standard/python/mapping-custom-domains
I'm still not sure what the underlying issue was, but this fixed it. Hope this can help anyone encountering a similar issue.

I had this exact issue. it was killing our load performance once we switched to a load balancer.
What is ended up being was the instance group port setting. We're obviously using ssl certs for the site but I had indicated port 80 and 443.
Once I removed port 80 from the instance group that the load balancer refers to, it loaded all the pages immediately.

Related

Best configuration for Automatic Scaling in Google App Engine to always have an instance available?

What is the best way to set up Google App Engine to always have at least one instance ready and available to handle requests when using automatic scaling? This is for a low traffic application.
There are settings here that allow you to control them but I am not sure what the best combination is and some of them sound confusing. For example, min-instances and min-idle-instances sound similar. I tried setting min-instances to 1 but I still experienced lag.
What is a configuration that from the end user's point of view is always on and handles requests without lag (for a low traffic application)? I
In the App Engine Standard environment, when your application load or handling a requests this may cause the users to experience more latency, however warmup requests might help you reduce this latency. Before any live requests get to that instance, warmup requests load the app's code onto a new one. If this is enabled, App Engine will detect if your application needs a new instance and initiate a warmup request to initialize a new instance. You can check this link for Configuring Warmup Requests to Improve Performance
Regarding min-instances and min-idle-instances these will only apply when warmup request is enabled. As you can see in this post the difference of these two elements: min-instances used to process the incoming request immediately while min-idle-instances used to process high load traffic.
However, you mentioned that you don't need a warmup so we suggest you to select App Engine Flexible and based on this documentation it must have at least one instance running and can scale up in response to traffic. Please take note that using this environment costs you a higher price. You can refer to this link for reference regarding the pricing of two environments in App Engine.

How can I warm up an AppEngine Flex app after a deployment?

It appears that AppEngine standard has a warmup feature to warm up an app after a deployment but I don't see the same feature available for Flex. The readiness & liveness probes also don't work for this since setting the path setting to a custom path inside the application doesn't seem to make the probes actually hit the internal endpoint.
Is there some solution I'm missing other than doing things like manually hitting the endpoints myself after the deployment which won't be very reliable since the calls don't necessarily always round robin to each instance?
In App Engine Standard, warmup requests essentially load your app's code into a new instance before any live requests reach that instance. This can happen in the following situations:
When you redeploy a version of your app.
When new instances are created due to the load from requests
exceeding the capacity of the current set of running instances.
When maintenance and repairs of the underlying infrastructure or
physical hardware occur
In App Engine Flexible, you can achieve the same result by using the initial_delay_sec setting for liveness checks in your app.yaml file. If you set up its value to give enough time for your code to initialize, the first request coming to that instance will be processed quickly by your already-initialized code.

Google App Engine - random ssl connection resets

I have a servlet hosted within Google App Engine. As part of the business logic, it performs a HTTPS request to another webservice. I started getting SSL connection reset exceptions but they appear to be happening in a random fashion. I haven't been able to repeat the exception and they start happening even more over time but then can fade away and come back. I've performed stress testing on the remote service directly from other machines and never receive a single issue. The only way I've been able to deal with this is by redeploying the application but it can take a few redeployments before the issue goes away. I am confident it isnt an issue with the remote service but something happening within Google App Engine.
Thank you for your comments, we just got to the bottom of this yesterday. It was a dynamic application being deployed, so the instances should be managing their resources by themselves. However, what was happening is that an instance was running low on memory which resulted in SSL connection resets. This is why they were appearing in a random manner and degrading even worse overtime. The way this was resolved was to upgrade the resources, changing the deployment configuration to use an F2 instance rather than the default F1 instance. Since doing so there hasn't been a connection dropped. Thanks again for your responses and I hope this helps anyone who may experience something similar.

Azure Web App - Request Timeout Issue

I have and MVC5 web-app running on Azure. Primarily this is a website but I also have a CRON job running (triggered from an external source) which calls a URL (a GET Request) to carry out some house keeping.
This task is asynchronous can take up to and sometimes over the default timeout on Azure of 230 seconds. I'm regularly getting 500 errors due to this.
Now, I've done a bit of reading and this looks like it's something to do with the Azure Load Balancer settings. I've also found some threads relating to being able to alter that timeout in various contexts but I'm wondering if anyone has experience in altering the Azure Load Balancer timeout in the context of a Web App? Is it a straightforward process?
but I'm wondering if anyone has experience in altering the Azure Load Balancer timeout in the context of a Web App?
You could configure the Azure Load Balancer settings in virtual machines and cloud services.
So, if you want to do it, I suggest you could deploy web app to Virtual Machine or migrate web app to cloud services.
For more detail, you could refer to the link.
If possible, you could try to optimize your query or execution code.
This is a little bit of an old question but I ran into it while trying to figure out what was going on with my app so I figured I would post an answer here in case anyone else does the same.
An Azure Load Balancer (created via ANY means) which mostly comes along with an external IP Address — at least when created via Kubernetes / AKS / Helm — has the 4 min, 240 second idle connection timeout that you referred to.
That being said there are two different types of Public IP Addresses — Basic (which is almost always the default that you probably created) and Standard. More information on the docs: https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-standard-overview
You can modify the idle timeout see the following doc: https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-tcp-idle-timeout
That being said changing the timeout may or may not solve your problem. In our case we had a Rails app that was connection to a database outside of azure. As soon as you add a Load Balancer with a Public IP all traffic will EXIT that public IP and be bound by the 4 minute idle timeout.
That's fine except for that Rails anticipates that a connection to a Database is not going to be cut frequently — which in this case happens ALL the time.
We ended up implementing a connection pooling service that sat between our Rails app and our real database (called PGbouncer — specific to Postgres DBs). That service monitored a connection and re-connected when the timer was nearing the Azure LB timeout.
It took a little while to implement but in our case it works flawlessly. You can see some more details over here: What Azure Kubernetes (AKS) 'Time-out' happens to disconnect connections in/out of a Pod in my Cluster?
The longest timeout you can set for a Public IP / Load Balancer is 30 minutes. If you have a connection that you would like to utilize that runs idle longer than that — then you may be out of luck. As of now 30 mins is the max.

How Google cloud achieved scalability through virtualization?

I have a question about how Google app engine achieve scalability through virtualization. For example when we deploy a cloud app to Goodle app engine, by the time the number of users of our app has been increased and I think Google will automatically generate a new virtual server to handle user request. At the first time, the cloud app runs on one virtual server and now it runs on two virtual servers. Google achieved
scalability through virtualization so that any one system in the Google
infrastructure can run an application’s code—even two consecutive
requests posted to the same application may not go to the same server
Does anyone know how an application can run on two virtual servers on Google. How does it send request to two virtual server and synchronizes data, use CPU resources,...?
Is there any document from Google point out this problem and virtualization implement?
This is in now way a specific answer since we have no idea how Google does this. But I can explain how a load balancer works in Apache which operates on a similar concept. Heck, maybe Google is using a variable of Apache load balancing. Read more here.
Basically a simply Apache load balancing structure consists of at least 3 servers: 1 head load balancer & 2 mirrored servers. The load balancer is basically the traffic cop to outside world traffic. Any public request made to a website that uses load balancing will actually be requesting the “head” machine.
On that load balancing machine, configuration options basically determine which slave servers behind the scenes send content back to the load balancer for delivery. These “slave” machines are basically regular Apache web servers that are—perhaps—IP restricted to only deliver content to the main head load balancer machine.
So assuming both slave servers in a load balancing structure are 100% the same. The load balancer will randomly choose one to grab content from & if it can grab the content in a reasonable amount of time that “slave” now becomes the source. If for some reason the slave machine is slow, the load balancer then decides, “Too slow, moving on!” and goes to the next machine. And it basically makes a decision like that for each request.
The net result is the faster & more accessible server is what is served first. But because the content is all proxied behind the load balancer the public accesses, nobody in the outside world knows the difference.
Now let’s say the site behind a load balancer is so heavily trafficked that more servers need to be added to the cluster. No problem! Just clone the existing slave setup to as many new machines as possible, adjust the load balancer to know that these slaves exist & let it manage the proxy.
Now the hard part is really keeping all machines in sync. And that is all dependent on site needs & usage. So a DB heavy website might use MySQL mirroring for each DB on each machine. Or maybe have a completely separate DB server that itself might be mirroring & clustering to other DBs.
All that said, Google’s key to success is balancing how their load balancing infrastructure works. It’s not easy & I have no clue what they do. But I am sure the basic concepts outlined above are applied in some way.

Resources