Google App Engine - random ssl connection resets

I have a servlet hosted on Google App Engine. As part of its business logic, it makes an HTTPS request to another web service. I started getting SSL connection reset exceptions, but they appear to happen at random. I haven't been able to reproduce the exception on demand; the resets become more frequent over time, then fade away and come back. I've stress-tested the remote service directly from other machines and never hit a single issue. The only way I've been able to deal with this is by redeploying the application, but it can take a few redeployments before the issue goes away. I am confident it isn't an issue with the remote service but something happening within Google App Engine.

Thank you for your comments; we got to the bottom of this yesterday. It was being deployed as a dynamic application, so the instances should have been managing their resources by themselves. However, an instance was running low on memory, which resulted in SSL connection resets. This is why the resets appeared at random and got worse over time. We resolved it by upgrading the resources: we changed the deployment configuration to use an F2 instance rather than the default F1 instance. Since doing so, there hasn't been a single dropped connection. Thanks again for your responses, and I hope this helps anyone who may experience something similar.

Related

Can I "wake-up" Heroku server on front-end app launch (separate deploys)?

I'm aware Heroku sleeps apps after 30 mins of inactivity, and that's fine.
I have a React front-end hosted on Vercel, with an ExpressJS back-end hosted on Heroku. All this server does is make MongoDB calls, and it's very likely that the user won't need one until after the time it takes to "wake up" the server on Heroku. So if the app could somehow "poke" the server when it loads to wake it up, most users wouldn't even know it was sleeping in the first place.
Is this possible without making an intentionally redundant CRUD request?

As per the comment from @jonrsharpe under the question, it appears the simplest and easiest way to do this is to just call get('/wakeUp') on the initial page load and let it return a 404. Not pretty, but it works.
There's no reason this can't also return something more meaningful, but apparently it doesn't need to.
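The idea is language-neutral, so here is a minimal sketch of it in Python rather than in front-end code. The function name `wake_server` and the timeout value are assumptions for illustration; only the `/wakeUp` path comes from the answer above. The point it shows is that any HTTP response, including a 404, means the server has woken up.

```python
import urllib.error
import urllib.request


def wake_server(base_url: str, path: str = "/wakeUp", timeout: float = 30.0) -> int:
    """Send one throwaway GET so a sleeping server starts booting.

    Returns the HTTP status code. Any response, even a 404, means the
    server process answered, so HTTP errors are not treated as failures.
    """
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # A 404 (or any other HTTP status) still proves the server is up.
        return err.code
```

In a browser the equivalent would be a fire-and-forget request on page load whose result is simply ignored.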

Slow initial connection to Google Cloud App Engine

Initial connections to my website are extremely slow (100+ seconds). How can I diagnose the issue?
Using the Chrome dev tool network tab, I see that the issue is "initial connection" and not things like SSL or Waiting/TTFB.
This only happens for the first page visit to the website for a given device; after the first page loads, everything on the website is very fast. This consistently happens for new devices, on the same device if I don't visit the website for X days, and on the same device if I clear the cache and browsing history.
The website is a Django app hosted on Google Cloud App Engine with 2 flexible instances.
User traffic to the website is low, so I doubt the issue is load balancing or traffic spikes.
Thanks!

Yesterday I tried opening the page and it took 1.8 min to load the main page and 2.1 min to run a search; later attempts were faster, as you mentioned. I also tried accessing the page today and it loaded quite fast.
From my understanding, the high latency of the first connection might be related to session handling, database connections, an SSL certificate problem, a large amount of uncached data, or expensive operations that run before the server sends the response. It's nearly impossible for us to determine the cause without access to your code, logs and database configuration.
As for how to narrow down the issue I might suggest the following:
Examine your logs for possible causes.
Add timeit logging interleaved with each of the statements that handle the requests and watch for bottlenecks or long-running code.
Deploy the same endpoint without logos, images and other data that would be browser-cached and see if it makes a difference.
Create a simple hello-world endpoint and check its latency. Then gradually evolve the endpoint to resemble your actual handling code, in the hope of finding where the issue is.
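As a rough illustration of the timing suggestion, the sketch below wraps each step of a handler in a small context manager that logs the elapsed time. The names `timed` and `handle_search`, and the work done inside the handler, are hypothetical stand-ins rather than anything from the asker's app.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("request_timing")


@contextmanager
def timed(label: str):
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s took %.1f ms", label, elapsed_ms)


def handle_search(query: str) -> list:
    # Hypothetical handler; the two steps stand in for real work
    # such as a database query followed by result filtering.
    with timed("load data"):
        rows = [f"item-{i}" for i in range(1000)]
    with timed("filter results"):
        matches = [row for row in rows if query in row]
    return matches
```

With the logger set to INFO, each request produces one line per step, which makes it easier to see which statement dominates the slow first request.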

If only the first connection is slow, it might be because an instance is starting up and you do not have minimum idle instances and warmup requests enabled. With that configuration, instances are kept ready to take traffic, so the latency of the first connection will be lower.
As it is stated in the documentation:
If you set a minimum number of idle instances, pending latency will
have less effect on your application's performance. Because App Engine
keeps idle instances in reserve, it is unlikely that requests will
enter the pending queue except in exceptionally high load spikes. You
will need to test your application and expected traffic volume to
determine the ideal number of instances to keep in reserve.
Also you can find more information about warmup requests in this documentation about Configuring Warmup Requests to Improve Performance

I resolved this issue by removing and recreating custom domain settings for my App Engine project, and removing and recreating the corresponding DNS records in domains.google, following these instructions:
https://cloud.google.com/appengine/docs/standard/python/mapping-custom-domains
I'm still not sure what the underlying issue was, but this fixed it. Hope this can help anyone encountering a similar issue.

I had this exact issue. It was killing our load performance once we switched to a load balancer.
What it ended up being was the instance group port setting. We're obviously using SSL certs for the site, but I had specified both port 80 and port 443.
Once I removed port 80 from the instance group that the load balancer refers to, it loaded all the pages immediately.

can't deploy Google Cloud Endpoints 2.0 on existing service

I have had a Python-based Google App Engine app working great using Cloud Endpoints 1.0 for several years without incident. I have had nothing but trouble migrating to Cloud Endpoints 2.0.
Currently I'm in the following state after already clearing many previous hurdles described in other similar questions:
I have one version of my service called gce1 which uses Endpoints 1.0 and is set as the default service receiving 100% of my traffic. I can point API clients and the APIs Explorer to both gce1-dot-myservice.appspot.com and the default myservice.appspot.com and everything works fine. I can verify in the logs that anything that goes through here is using GCE 1.0.
I have a second version of my service called gce2 which is not receiving any traffic by default, but if I point an API client or the APIs Explorer to gce2-dot-myservice.appspot.com it works just fine, and I can verify in the logs that anything that goes through here is using GCE 2.0.
Great, right? So it would seem that all I need to do is migrate all my traffic to gce2 and I'm done.
But... when I do that everything breaks! The default myservice.appspot.com serves up 405 POST Method Not Allowed responses to my existing clients, and if I look at the APIs Explorer, suddenly it now shows a bunch of obsolete methods that I think are from years ago and are no longer used in my current API. I can't tell where those are coming from (they are nowhere in my code, and haven't been for years), and I can't get the default service to serve the GCE 2.0 API no matter what I do.
The biggest problem is that I have thousands of users in the wild that all point to the default API URL, so it isn't so easy to just have them start pointing to gce2-dot-myservice, and besides, it doesn't make sense that I can't make the new default the new default. I've been working on this migration to GCE 2.0 for months, the deadline for getting off GCE 1.0 is getting closer by the day, and Google Support has not responded since late last year on this topic.
I should also mention I have tried:
Pushing a new service with the GCE2.0 code directly to default
Pushing a new service with no API at all (to maybe clear a cache or something)
Pushing services with all different sorts of version names
None of these have worked, although I haven't left any of them in place for long, since I'm working on a live service with real users.

This issue is now resolved, so for most people it should no longer occur. However, in my specific case, I had a legacy API that was getting in the way and had to be deleted, which did require specific attention from a Google engineer.
If you have similar issues, visit issuetracker.google.com/issues/76031966 and comment there.
Thanks to @saiyr for help tracking this down.

Deploying a Mean Stack App, Apache?

I am starting to learn Angular and I am employing a MEAN stack. The grey area in my mind is when my angular app is finished and ready to deploy on a server.
Would I still need to use Apache or Nginx to route a domain or subdomain to my app?
I guess Node/Express.js is my main question. I have used it when working locally, but when deploying, would it run my app on the server side?
Thanks in advance.

You can run Node and your app on the server as-is, single-threaded...provided you took care of telling DNS where to reach the app.
To riff off your nginx question...here are a few other deployment/config questions that you may want to consider:
server crashes: sooooo...node isn't at 1.0 yet and apps sometimes do unexpected things and die. tools like forever, supervisor and similar things can auto-restart the server.
logging: tools like morgan, winston and more can provide logging so you can see what was happening on the server before big events (everybody hit the same page, the server crashed every time page XYZ was hit, etc)
load balancing: a node server is single-threaded, single-instance. if you have a super busy site or you are stuck with synchronous stuff (blah!) you'll want to consider how to spin up multiple node instances. nginx and node clustering would be things to consider, but if your app is small, this is probably not a priority over handling crashes and logging

Windows Azure Cloud Service connection with sql database

I have been making requests to my cloud REST Service over the two past weeks. Everything was fine until yesterday.
Over the past days, I kept re-publishing my service to the cloud to test some of its operations with a client. I DID NOT change anything in my web.config, just some method bodies.
Yesterday, when making the simplest GET request to my service, through my browser or Advanced Rest Client, I started getting the following error:
The server encountered an error processing the request. Please see the service help page for constructing valid requests to the service. The exception message is 'The underlying provider failed on Open.'. and so on
After doing some research, I suspect this means I have a connection error with my database, which I don't understand, since it was working fine until now.
I also tried to Stop and Start my service in the Azure Production Environment, but without any luck. The server firewall is also configured as it should be.
Any answers would be much appreciated.
Thanks.

In my experience these generally tend to be Azure service outages. There is currently 'Scheduled maintenance' occurring on all DB instances, which may be affecting you.
http://www.windowsazure.com/en-us/support/service-dashboard/