I'm trying to debug 502 errors coming out of the nginx container with my AppEngine Flex setup.
I noticed that the logs indicate liveness and rediness checks being spammed very rapidly (see attached).
For clarification this is currently running a single instance in manual_scaling mode.
check_interval_sec is set for 30s on liveness_check and 5 sec on rediness_check.
Can anyone provide insight into what is going on here?
It looks like you setup readiness and liveness checks too aggressively in your app.yaml. Please keep in mind that the checks works for every instance so if you have a lot of instances, it will occur frequently.
If you only have one instance setup, then the behavior contradict what the documentation described. Please file an issue with us on the issue tracker.
Related
I asked about this in the appengine user group here and wasn't able to resolve the issue.
The issue I'm having is that, for seemingly a very light endpoint and others like it, latency seems to be an issue. Here's an example request as shown in GCP's Trace tool:
I'm aware that if it's a request that spawned a new instance or if memory usage is high, that would explain high latency. Neither of which are the case here.
It seems that intermittently some endpoints simply take a good second or two to respond on top of however long the endpoint itself takes to do its job, and the bulk of that is "untraced time" under GCP Stackdriver's Trace tool. I put a log entry as early as I seemingly possibly could, in webapp2's RequestHandler object on initialization. You can see that in the screenshot as "webapp2 request init".
I'm not familiar enough with webapp2's inner working to know where I could put a log elsewhere that could help explain this, if anywhere.
These are the scaling settings for that appengine service defined in my yaml file:
instance_class: F4
automatic_scaling:
max_idle_instances: 1
max_pending_latency: 1s
max_concurrent_requests: 50
Not sure what other information would be useful here.
When deploying using gcloud app deploy I get the following error:
Timed out waiting for the app infrastructure to become healthy gcp
I contacted GCP Support and they told me the same thing I had read in other threads:
the error you are referring to may be related to the Compute Engine “In-Use IP Addresses” Quota limit. You can view your current quota limit information by accessing from your GCP menu “IAM & Admin > Quotas”.
I checked the "In-Use IP Addresses" and it doesn't seem like I have a problem with quotas:
Looking for the error, I found that in the Activity tab, when deploying, I get an error. Apparently , when App Engine is trying to delete a VM, the process starts to loop trying to delete it. You can see the error:
(I intentionally covered the project ID)
Edit: It seem like the problem is only with southamerica-east1. I created a new project in southamerica-east1 but I kept getting the same error, so then I created a new project with the App Engine in us-west2 and worked like a charm (I used the same application and app.yaml). I wonder if the problem is GCP southamerica-east1 or a unknown bad configuration by my side.
This is probably related to this issue: https://issuetracker.google.com/u/2/issues/73583699. It does mentioned the "in-use IP Address" quota, but many people have posted in recent days (Nov 2018) indicating that they are seeing the error and have verified that they have not hit their quota.
Unfortunately, no solution has been posted and there hasn't been any recent comment from the devs.
First, our apologies that you’ve experienced this issue. Be assured that we are aware of the situation and the team works hard to resolve it.
Our goal is to make sure that there are available resources in all zones. This
type of issue is rare. When a situation like this occurs, or is about to
occur, our team is notified immediately and the issue is investigated.
We recommend deploying and balancing your workload across multiple zones or
regions to reduce the likelihood of an outage. Please review our documentation
which outlines how to build resilient and scalable architectures on Google
Cloud Platform.
For the time being, you can try relaxing your requirements (e.g. requesting a smaller instance or one with fewer resources) or removing the external IP requirement.
If that proves not to be enough, you can try deploying your application to another region
Again, we want to offer our sincerest apologies.
Thanks for understanding.
At the end we didn't find a real solution so we moved all our services from Brazil to US-2. I'm not sure if the Region is the problem, but there in US-2 all works like a charm
On our development environment, we have been charged about USD100 every month for an instance we didn't know existed (and of course we are not using), and we cant find in the entire AppEngine or Console Engine.
Also, the usage report shows no activity for the whole month, but we are still getting the charges.
The instance is: Flex Instance Core Hours Sao Paulo
I found similar posts in stackoverflow, so, here are the questions:
- is this some bad strategy from Google???
- where can I see this instance to stop it or delete it?
- where can I see who started this instance and when?
Of course, I called google support and no answer received.
Many thanks!
Google Cloud Platform Support here! I found your ticket and see that you were provided an answer there already. In addition to what Dan described in his answer, if your app has currently the "Serving" status it will still run with the corresponding instances regardless of any requests incoming or not. As long as the version is serving it will continue to bill for hours that you are using. Also, if you are using automatic scaling with a minimum number of instances:
that specified number of instances run as resident instances while any
additional instances are dynamic
(Instance scaling description in GCP docs)
You can use basic or manual scaling if this is not what you're interested in.
Check the App Engine Versions pages for all your projects, you should find at least one with Flexible environment. The Deployed column should indicate who deployed it and when.
Based on that information you can decide to keep or delete the respective version(s). Simply stopping the instance may not be sufficient, depending on the scaling configuration for that service version GAE may automatically start one or more new instances.
You should also check the App Engine Instances pages for your projects and cross-reference that with the versions info to make sure no undesired instances are accidentally left behind (at least in the standard environment they are normally stopped when the respective versions are deleted, not entirely certain the same is true for the flex environment)
The running flexible environment instances are billed by the hour, even if they receive no requests, which could explain why you're seeing charges without any activity.
Apparently, the source of this instance was a firebase setting we made to make some test, and it automatically creates this instance. I shut off the billing account for this space, and instantly I received an email from firebase saying it detected changes that make some functions unavailable.
I have encountered a Google App Engine problem for which I can not find an answer despite a lot of searching.
I can make a minor tweak to my code - say changing a constant or something, nothing that breaks the app code - and do a deployment using appcfg.py update.
After deployment, nothing works, and I find a "This app has no instances deployed" message in the developer console. Not a single instance is running. Eventually after about ten minutes I start to see instances appearing, the app working fine.
I have maximum idle instances set to 5, I've enabled warmup, and I am well within both quota and budget. My log files don't tell me anything of use.
So this one's got me stumped. I'm not anxious to see a ten minute outage every time I update my code. What is happening here, and how can I ensure that my instances start up as they should when I deploy new code?
Google app engine refuses to deploy my latest build, and looking at the releases list, I can see that another build has been 'deploying' for the better part of a week.
Google doesn't offer support anymore for this without paying for it, but this is stuff that just shouldn't happen.
Hope one of you google engineers out there can help me with this. The google project is caleld vxlpay.
Have you tried doing an appcfg rollback?
Please cancel the deployment if it gets stuck; just waiting for it to finish often leads to frustration and desk-flipping. There's a few ways to help you deploy the app.
1) Generally, you can simply redeploy after waiting a few minutes.
2) Redeploy with another deployment method (appcfg, Google App Launcher, Eclipse...)
3) Rollback then redeploy
If all 3 fails, there might be something wrong with your configuration and you would probably need to speak to the support engineers at Google.
I ran into to this just now.
I think my issue had something to do with having a browser open to the site I was trying to deploy to. Apparently that was locking up a process or something because, when I closed it, my deploy finished.
Silly, yes. I think it has something to do with GAE attempting to migrate traffic but not dealing with cases where there's browsers open... There's probably a feature that allows for deploying and controlling whether or not traffic is migrated.
I'll have to give that a try to see if closing the connection (browser) resolved it or if it was just a timing thing.
Nope... Just takes an absurdly long time.
Maybe it's due to file sizes?
Note: This only occurs when deploying a Flex environment rather a standard one.