Randomly getting 121 error on Google App Engine - google-app-engine

I'm experiencing the same error as in this question, with a very small application written in Java.
Randomly, a servlet throws a 500 error with Error Code 121 in the description, but no stack trace.
Here is the log:
[23/Jun/2013:01:37:11 -0700] "GET /premierQuestionnaire?annee=DES3&desc=false&installerLiberal=false&connaitAucun=on&roleNational=on&adhererEmblee=false HTTP/1.1" 500 0 - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36" "these-emilien.appspot.com" ms=569 cpu_ms=0 loading_request=1 exit_code=121 app_engine_release=1.8.1 instance=00c61b117ced60a7064344269a551e9083a10fac
I 2013-06-23 10:37:11.033
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
W 2013-06-23 10:37:11.033
A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. (Error code 121)
I tried to look at the issue he pointed to, but it is restricted now, so it can't be accessed. There has been no communication from Google, and no error or system failure on the status page.
If anyone has an idea or any news, that would be great.

Solution moved from @CreatixEA's question post.
I found an answer from Google staff:
Beginning around 11pm on Monday, May 7th and continuing until 11am on Tuesday, May 8th, some App Engine applications saw errors marked “error code 121” in their application logs, which resulted from unnecessary instance termination.

A week prior to this issue, a change was made to the infrastructure underlying the App Engine scheduler, which disrupted our memory accounting system. This issue was slow to surface, and did not result in any serious impact to our users before our existing monitoring caught the error. To address this issue, we rolled out an alternative method for memory accounting on May 7th. This alternate method overestimated the amount of memory currently in use, and our schedulers slowly started accumulating incorrect values for memory usage.

The overestimation caused the App Engine scheduler to erroneously assume that our infrastructure was under constant memory pressure, which in turn resulted in over-aggressive termination of instances, which was visible as “error code 121” in an affected application's logs. Upon aggregating user reports of the issue on the morning of May 8th, our reliability team determined the source of the erroneous calculations, and rolled out a new fix which corrected the memory usage overestimation, and halted the unnecessary terminations.

We do not consider the time to repair an issue at this level of impact acceptable. We are adding new alerts and tools for memory accounting to prevent recurrence of similar issues in the future and to decrease our response time. Application code or Admin Console settings did not influence whether your application was affected by this issue, and no changes to your code or settings are needed.

We appreciate your patience during this issue and apologize for any inconvenience this caused you or your customers. If you feel that your paid application experienced an SLA violation, please fill out this form.

Regards, Christina Ilvento on behalf of the Google App Engine Team

Related

AWS Lightsail Metric graphs "No data available"

We're using an AWS Lightsail PostgreSQL database. We've been experiencing errors with our C# application timing out when using the connection to the database. While trying to debug the issue, I went to look at the metric graphs in AWS. I noticed that many of the graphs have frequent gaps in the data, labeled "No data available". See the image below.
This graph (and most of the other metrics) shows frequent gaps in the data. I'm trying to understand if this is normal, or could be a symptom of the problem. If I go back to a 2-week timescale, there do not appear to be any other strange behaviors in the metric data. For example, I do not see a point in time in the past where the CPU or memory usage went crazy. The issue started happening about a week ago, so I was hoping the metrics would have helped explain why the connections to the PostgreSQL database are failing from C#.
So I guess my question is: are those frequent gaps of "No data available" normal for an AWS Lightsail Postgres database?
Other Data about the machine:
1 GB RAM, 1 vCPU, 40 GB SSD
PostgreSQL database (12.11)
In the last two weeks (the average metrics show):
CPU utilization has never gone over 20%
Database connections have never gone over 35 (usually fewer than 5, and most often 0)
Disk queue depth never goes over 0.2
Free storage space hovers around 36.5 GB
Network receive throughput is mostly less than 1 kB/s (with one spike to 141 kB/s)
Network transmit throughput is mostly less than 11 kB/s, with all spikes less than 11.5 kB/s
I would love to view the AWS logs, but they are a month old, and when trying to view them they are filled with checkpoint starting/complete logs. They start one month ago, and each page update only takes me 2 hours forward in time (and takes ~6 seconds to fetch the logs). This would require ~360 page updates, and when I tried, my auth timed out. 😢
So we never figured out the reason why, but this seems like it was a problem with the AWS Lightsail DB itself. We ended up using a snapshot to create a new clone of the DB, and wiring the C# servers to the new DB. The latency issues we were having disappeared and the metric graphs looked normal (without the strange gaps).
I wish we were able to figure out the root of the problem. ATM, we are just hoping the problem does not return.
When in doubt, clone everything! 🙃

Google App Engine - Sudden increase in latency

We have seen a sudden latency increase in our application on Google App Engine within the past few hours. The logs show that requests fail with the message "Request was aborted after waiting too long to attempt to service your request.", with no stack trace or any other relevant information. Users get an empty page with the message "Rate exceeded.". No changes have been made to the application that correlate with this spike in latency.
The application is therefore down, with no information from app engine that can help point to the source of the latency.
We have filed an issue in the issue tracker, but no luck in getting a response yet.
Does anyone have ideas on what we could do to deal with this kind of situation?
Update
The problem went away after 3 hours, as suddenly as it came and without any intervention on our part. Since there is consensus on min_idle_instances, we have decided to leave all the settings as they have always been so that we can see if this ever happens again. If it does happen, we will have an opportunity to test this by making the suggested changes, and we will post an update here.
Here is a screenshot of the entire incident:
The comment that @Parth Mehta added is useful, and it made me think about what could be causing your issues.
I'm thinking that your increased latency may be due to not having idle instances ready as requests increase and come in: when requests increase, it takes a while until new instances are ready, and that may be the cause of your latency.
Setting enough min_idle_instances might alleviate the 500s, as those instances would be warm and ready for the requests.
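Not from the original answer, just for reference: with automatic scaling on the standard environment, this setting lives in app.yaml, roughly like the sketch below (the value 2 is an arbitrary example, not a recommendation).

    # Sketch only: keep some idle instances warm under automatic scaling.
    automatic_scaling:
      min_idle_instances: 2        # arbitrary example value
      max_idle_instances: automatic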
If this doesn't solve your issue, I would recommend creating a case with GCP Support, and we will surely be able to assist you further.
Try it and let us know.

Resource usage increases abnormally in Google App Engine

I am using GAE for my product development. It is currently an MVP in the development phase, so user traffic is still small.
But today (14:00, 27 Feb 2017, UTC+7) I saw the statistics and could not believe my eyes: huge numbers of requests and a huge amount of bandwidth. I am attaching them here:
Abnormal resource usage
My request log and the App Engine Dashboard show the same traffic as usual (very small), so I think the resource usage numbers are not correct.
Dashboard and Request log
Can someone please explain where the huge number of requests comes from?
Here is the answer I got from Google Cloud Support. I am sharing it below, in case someone gets the same problem I had:
The number of requests you see on your App Engine Dashboard is not 100% reliable, as it sometimes reflects only the projected usage, not the exact usage. Your exact bill will only be reflected in your transaction history, and it takes 24-48 hours for it to refresh and display your usage for February 27, 2017. We can wait 24-48 hours to confirm that your transaction history shows your actual charges and usage.
And here are the final resource usage statistics:

Google Email Migrator API is too slow

We know from the documentation there is a theoretical limit of 1 message per user per second, but we aren't coming anywhere close to that while running email migrations on a high-end server. What should we do? Should we increase the number of threads per user to more than one (even though the documentation suggests only 1 thread per user)? I've used their GAMME tool, and it blows the Email Migration API away in terms of speed, even on lower-end servers.
Does anyone have any suggestions? It's not super-slow, but it's slow enough to be a pain.
The GAMME tool itself utilizes the Email Migration API; it's not doing anything special, so there are likely other factors slowing your migration. Are you actually hitting the migration API from App Engine? If so, you should be able to use Appstats to profile your application and see if there are other bottlenecks. Where are you pulling messages from?
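As a rough sketch, not part of the answer above: if you are on App Engine with a Java app, enabling Appstats amounts to registering its filter and console servlet in web.xml, roughly like this (the URL patterns are just typical choices):

    <!-- Sketch: record RPC stats for every request and expose the Appstats console. -->
    <filter>
      <filter-name>appstats</filter-name>
      <filter-class>com.google.appengine.tools.appstats.AppstatsFilter</filter-class>
    </filter>
    <filter-mapping>
      <filter-name>appstats</filter-name>
      <url-pattern>/*</url-pattern>
    </filter-mapping>
    <servlet>
      <servlet-name>appstats</servlet-name>
      <servlet-class>com.google.appengine.tools.appstats.AppstatsServlet</servlet-class>
    </servlet>
    <servlet-mapping>
      <servlet-name>appstats</servlet-name>
      <url-pattern>/appstats/*</url-pattern>
    </servlet-mapping>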
Do not attempt to use more than 1 thread per user migration; it won't work and you'll get performance issues. Do make sure that you are properly implementing exponential backoff: if your app doesn't respond to 503 error codes by backing off exponentially (1 second the first time, then 2 seconds, 4, 8, etc.), then Google will respond by further throttling your API calls.
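As an illustration only (Backoff, withBackoff, and ThrottledException below are made-up names, not part of the Migration API), the retry pattern described above looks roughly like this in Java:

    import java.util.concurrent.Callable;

    /** Sketch of exponential backoff: retry a throttled call (HTTP 503),
     *  doubling the wait between attempts: 1s, 2s, 4s, 8s, ... */
    public final class Backoff {
        /** Placeholder for however your client signals a 503 response. */
        public static class ThrottledException extends Exception {}

        public static <T> T withBackoff(Callable<T> call, int maxAttempts) throws Exception {
            long delayMillis = 1000; // wait 1 second before the first retry
            for (int attempt = 1; ; attempt++) {
                try {
                    return call.call();
                } catch (ThrottledException e) {
                    if (attempt >= maxAttempts) {
                        throw e;               // out of retries, give up
                    }
                    Thread.sleep(delayMillis); // back off before retrying
                    delayMillis *= 2;          // 1s, 2s, 4s, 8s, ...
                }
            }
        }
    }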

Google App Engine - Request was aborted after waiting too long to attempt to service your request

I get this error sometimes.
Request was aborted after waiting too long to attempt to service your request. Most likely, this indicates that you have reached your simultaneous dynamic request limit. This is almost always due to excessively high latency in your app. Please see http://code.google.com/appengine/docs/quotas.html for more details.
The request that causes it has 10 seconds of latency and 0 ms of CPU time. It is a simple JSP page that doesn't do anything that should take long at all. Also, my app is very low-traffic, and every time it has happened, it was the only request being processed.
What causes this?
If your application is low-traffic, it's possibly the startup time. There seems to be an ongoing issue where it takes so long to start an instance that the request breaches the time limit.
Some people have "worked around" this by having a cron/scheduled request that runs every few minutes and does nothing (though personally I think this is counter-productive, somewhat undermining the reason Google spins your app down!).
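For a Java app, that keep-warm workaround usually amounts to a cron.xml entry like the sketch below; the /keepalive URL is a placeholder for a trivial servlet that just returns 200 OK.

    <?xml version="1.0" encoding="UTF-8"?>
    <cronentries>
      <cron>
        <url>/keepalive</url>
        <description>no-op ping to keep an instance warm (placeholder handler)</description>
        <schedule>every 5 minutes</schedule>
      </cron>
    </cronentries>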
There was an issue in their bugtracker about this:
http://code.google.com/p/googleappengine/issues/detail?id=2456
It's now marked as fixed for version 1.4, and there's a little info on it here:
http://googleappengine.blogspot.com/2010/12/happy-holidays-from-app-engine-team-140.html
Always On - For high-priority applications with low or variable traffic, you can now reserve instances via App Engine's Always On feature. Always On is a premium feature costing $9 per month which reserves three instances of your application, never turning them off, even if the application has no traffic. This mitigates the impact of loading requests on applications that have small or variable amounts of traffic.
