Updating in datastore not working in GAE 1.9.0 - google-app-engine

We have a PHP application running on GAE. It connects to Cloud Datastore using the Google PHP library (v0.6.7).
In the last few days Google introduced a new version of App Engine, v1.9.0 (not officially released), which apparently ran fine, just as 1.8.9 had. However, we have been experiencing some issues with Cloud Datastore. Sometimes all operations that update entities are simply ignored. All the queries used to retrieve information work perfectly, but if we try to create a new entity or update any property, nothing happens. I have checked the responses returned by the Cloud API for errors, but there are no errors or warnings at all.
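Since the API returned no errors, one cheap way to detect writes that were silently dropped is to read each entity back by key right after writing it and compare. The sketch below is in Python rather than PHP and uses a hypothetical in-memory `FakeStore` (with assumed `put`/`get` methods) in place of a real Datastore client; it is not the Google PHP library's API. It assumes that a lookup by key returns the latest committed value.

```python
class FakeStore:
    """Hypothetical stand-in for a datastore client (not a real library API)."""

    def __init__(self, drop_writes=False):
        self._data = {}
        self.drop_writes = drop_writes  # simulates the reported silent failure

    def put(self, key, entity):
        # The call "succeeds" either way; with drop_writes=True nothing is stored.
        if not self.drop_writes:
            self._data[key] = dict(entity)

    def get(self, key):
        return self._data.get(key)


class WriteNotPersisted(Exception):
    """Raised when an entity read back by key does not match what was written."""


def put_and_verify(store, key, entity):
    """Write `entity`, read it back by key, and raise if the stored copy differs."""
    store.put(key, entity)
    stored = store.get(key)  # ASSUMPTION: lookup by key sees the latest commit
    if stored != entity:
        raise WriteNotPersisted(f"write to {key!r} was not persisted (read back {stored!r})")
    return stored
```

Raising instead of logging makes a dropped write show up as a failed request, which is easier to correlate with latency spikes like the ones described below.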
This first happened on 31 January, and it is happening again today. It started failing at 3 AM (GMT+1), and according to the instance log, at the same time the latency of all requests increased significantly (from 1-3 s to 5-10 s). The first time, the system started working properly again after a few hours, but this time the problem is lasting much longer.
Has anyone experienced anything similar?

Thank you for the report, we're investigating the issue now.
Update: We've addressed the issue. Please join the Google Cloud Datastore downtime notify mailing list for future updates.
https://groups.google.com/forum/?fromgroups=#!topic/gcd-downtime-notify/sNXCFJYFNQU
For future reports about production issues, please refer to the Contact support section of our documentation.

Related

Frequent restarts on GAE application flex environment

I have a GAE application that's set up as a flexible instance, which is expected to be restarted on a weekly basis (and a continually unhealthy instance can be restarted): https://cloud.google.com/appengine/docs/flexible/java/how-instances-are-managed
However, we're seeing this restart (the "npm run build" command) several times per week! For example, in the past three weeks we've had 9 restarts, and I've confirmed that the log entries leading up to them are successful 200 responses (no sign of trouble). All of them are for the active version serving traffic, not for the other, stopped versions.
Has anyone seen this symptom before or know of something else that can cause frequent restarts?
Let me know if any other info would be helpful.
An instance restart in the Google App Engine flexible environment can occur for several reasons:
According to the GAE documentation, there is no guarantee that an instance runs indefinitely; it can be restarted due to hardware maintenance, software updates, or unforeseen issues. Besides that, as you stated, all instances are restarted on a weekly basis.
An instance can also be restarted if it fails to respond to a specified number of consecutive health check requests.
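The "consecutive failures" rule can be modelled in a few lines, which also shows why isolated failed checks among mostly healthy ones should not cause a restart. This is an illustrative Python sketch, not App Engine's implementation; `failure_threshold` is a hypothetical parameter standing in for whatever threshold your health check configuration actually uses.

```python
def should_restart(check_results, failure_threshold):
    """Return True once `failure_threshold` consecutive health checks have failed.

    `check_results` is an iterable of booleans (True = healthy), oldest first.
    Any successful check resets the counter, so intermittent failures that never
    reach the threshold in a row do not trigger a restart in this model.
    """
    consecutive = 0
    for healthy in check_results:
        consecutive = 0 if healthy else consecutive + 1
        if consecutive >= failure_threshold:
            return True
    return False
```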
If you observe an unusual number of restarts, I recommend opening a ticket with Google Cloud Platform Support. They have internal tools that can check what is going on in the instance and figure out why the restarts are happening.
DianeKaplan's comment:
Contacting GCP support has given me a few helpful nuggets so far:
The automatic weekly restart of an instance for maintenance can happen at different times (so it may be only 5 days since the last one, for example).
Our deployments (which result in new GAE versions) also run builds in Google Cloud Build.
In some cases, a VM was being created overnight and then immediately deleted, even though autoscaling didn't appear to be needed. I'm still looking into this, but I was pointed to the Google Cloud Console section Home > Activity as a good place to find clues.

Identify why Google app engine is slow

I developed an application for a client that uses Play framework 1.x and runs on GAE. The app works great, but sometimes it is crazy slow. It takes around 30 seconds to load a simple page, yet at other times it runs faster, with no code change whatsoever.
Is there any way to identify why it's running slow? I tried to contact support, but I couldn't find any telephone number or email. There is also no response on the official Google group.
How would you approach this problem? Currently my customer is very angry because of slow loading time, but switching to other provider is last option at the moment.
Use GAE Appstats to profile your remote procedure calls. All RPCs (Google Cloud Storage, Google Cloud SQL, ...) are slow, so if you can reduce the number of RPCs or use some caching data structures, do so and your application will be much faster. Appstats also shows you which parts are slow and whether they need attention :) .
For example, I created a Google Cloud Storage cache for my application and decreased execution time from 2 minutes to under 30 seconds. RPCs are a bottleneck in GAE.
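The caching advice boils down to paying for an RPC once and reusing the result until it goes stale. Below is a minimal in-process sketch in Python, assuming `compute` is any slow remote call; the answer's own cache lived in Google Cloud Storage, and memcache is another common choice on GAE. Note that an in-process cache like this one is per instance and is lost when the instance restarts.

```python
import time


class TTLCache:
    """Minimal time-based cache that avoids repeating a slow remote call."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no RPC
        value = compute()  # cache miss or expired: perform the slow call once
        self._entries[key] = (now + self.ttl, value)
        return value
```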
Google does not provide direct contact support for many services. The Google App Engine slowness described here is probably caused by a cold start: Google App Engine front-end instances sleep after about 15 minutes without traffic. You could write a cron job that pings the instances every 14 minutes to keep them up.
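The same keep-alive idea can also run outside App Engine. A minimal Python sketch, assuming a hypothetical ping URL of your choosing; `fetch` and `sleep` are injectable so the loop can be tested without network access. Whether 14 minutes is short enough depends on the real idle timeout, which the answer only gives as "about 15 minutes".

```python
import time
import urllib.request

WARMUP_URL = "https://your-app.example.com/ping"  # hypothetical URL, replace with your own


def default_fetch(url):
    """GET `url` and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.status


def keep_warm(url, interval_seconds=14 * 60, iterations=1,
              fetch=default_fetch, sleep=time.sleep):
    """Request `url` `iterations` times, sleeping `interval_seconds` between requests.

    Returns the list of status codes. Called once per schedule from cron,
    or with a large `iterations`, it keeps traffic reaching the app.
    """
    statuses = []
    for i in range(iterations):
        if i > 0:
            sleep(interval_seconds)
        statuses.append(fetch(url))
    return statuses
```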
Combining some answers and adding a few things to check:
Debug using Appstats. Look for "staircase" patterns and RPC calls. Maybe something in your app triggers RPC calls at certain points that don't happen in your logic all the time.
Tweak your instance settings. Add some permanent/resident instances and see if that makes a difference. If you are spinning up new instances, things will be slow, probably for around the time frame (30 seconds or more) you describe. It will seem random. It's not just how many instances you have, but which combination of the sliders you are using (you can actually hurt yourself with too few or too many).
Look at your app itself. Are you doing lots of memory allocations in the JVM? Allocating and freeing memory is inherently slow and can cause freezes. Are you sure your freezing is not a JVM issue? Try replicating the problem locally, tweak the JVM -Xmx and -Xms settings, and see if you find similar behavior. Also profile your application locally for memory/performance issues. You can cut down on allocations using pooling, DI containers, etc.
Are you running any sort of cron jobs/processing on your front-end servers? Try to move as much as you can, such as sending emails, to background tasks. The intervals may seem random, but they can be a result of things happening depending on your job settings. "9 am every day" may not mean what you think, depending on the cron/task options. A corollary: move things to back-end servers and pull queues.
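For the first item in the list, a "staircase" in an Appstats timeline is a run of RPCs where each call starts only after the previous one has finished, which makes it a candidate for batching or asynchronous calls. The Python sketch below finds such runs in a list of `(name, start_ms, end_ms)` tuples; the input format and the `min_steps` cutoff are assumptions for illustration, since Appstats presents this data as a chart rather than through this interface.

```python
def find_staircases(rpcs, min_steps=3):
    """Group RPC timings into runs of strictly sequential calls.

    `rpcs` is a list of (name, start_ms, end_ms) tuples (hypothetical format).
    A call that starts after the previous one ends extends the current run;
    a call that overlaps it was issued in parallel and starts a new run.
    Returns the runs containing at least `min_steps` calls.
    """
    runs, current = [], []
    for rpc in sorted(rpcs, key=lambda r: r[1]):
        if current and rpc[1] < current[-1][2]:
            # Overlaps the previous call: close the current run.
            if len(current) >= min_steps:
                runs.append(current)
            current = [rpc]
        else:
            current.append(rpc)
    if len(current) >= min_steps:
        runs.append(current)
    return runs
```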
It's tough to give you a good answer without more information. The best someone here can do is give you a starting point, which pretty much every answer here already has.
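The last item in the list, moving work such as sending emails off the request path, follows a producer/consumer pattern: the front end enqueues a task and returns, and a background worker pulls tasks and does the slow part. The sketch below is a local Python stand-in using `queue.Queue` and a thread; it is not the App Engine Task Queue API, and `send_email` is a hypothetical callable.

```python
import queue
import threading


def handle_request(task_queue, user_email):
    """Front-end handler: enqueue the slow work and return immediately."""
    task_queue.put({"type": "send_email", "to": user_email})
    return "202 Accepted"


def worker(task_queue, send_email, stop_token=None):
    """Background worker: pull tasks until it receives `stop_token`."""
    while True:
        task = task_queue.get()
        try:
            if task is stop_token:
                return
            send_email(task["to"])  # the slow part runs outside the request
        finally:
            task_queue.task_done()
```

In use, start `threading.Thread(target=worker, args=(q, send_email))`, call `handle_request` from the request path, and put `None` on the queue to stop the worker.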
By making at least one instance permanent, you get a great improvement on first use. It takes about 15 seconds to load the application into an instance, which is why you see long request times when nobody has used the application for a while.

Why am I hitting the datastore read operation quota?

I was somewhere without Internet access for 3 weeks and just came back to find that, since January 18, one of my apps has been hitting a quota limit (Datastore Read Operations) after around 18 hours.
I don't see any increase in traffic from either users or crawlers.
This is the error in the logs:
"The API call datastore_v3.RunQuery() required more quota than is available."
It seems very strange, since this application has been running for some years and I'm memcaching most of the datastore requests.
Please help - This is affecting my bottom line!
Thanks.
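The "after around 18 hours" symptom is simple arithmetic: only cache misses reach the datastore, so a drop in the memcache hit rate shortens how long a daily read quota lasts. A Python sketch; every number used with it below is illustrative, not App Engine's real quota.

```python
def hours_until_exhausted(daily_quota, requests_per_hour, reads_per_request, cache_hit_rate):
    """Estimate how many hours a daily read-operation quota lasts.

    Only cache misses reach the datastore, so each request costs on average
    reads_per_request * (1 - cache_hit_rate) read operations.
    """
    reads_per_hour = requests_per_hour * reads_per_request * (1.0 - cache_hit_rate)
    if reads_per_hour <= 0:
        return float("inf")  # everything served from cache
    return daily_quota / reads_per_hour
```

For example, with a hypothetical quota of 50,000 reads, 1,000 requests per hour and 5 reads per request, a 50% hit rate gives 20 hours, while a 90% hit rate stretches it to about 100 hours.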
I found a subset of pages in the site that had got a sudden interest from several crawlers and some of the requests that those pages made to the Datastore were not being memcached, so that was it...problem solved.
Thanks.
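Finding pages like these can start with a tally of which paths crawlers request most. A Python sketch, assuming the request logs have already been parsed into `(path, user_agent)` pairs; the `BOT_MARKERS` heuristic is hypothetical and will miss crawlers that don't identify themselves.

```python
from collections import Counter

# ASSUMPTION: a crude user-agent heuristic, not an exhaustive crawler list.
BOT_MARKERS = ("bot", "crawler", "spider")


def crawler_hotspots(log_records, top_n=5):
    """Return the `top_n` paths most requested by crawler-like user agents.

    `log_records` is an iterable of (path, user_agent) pairs.
    """
    counts = Counter(
        path
        for path, user_agent in log_records
        if any(marker in user_agent.lower() for marker in BOT_MARKERS)
    )
    return counts.most_common(top_n)
```

The paths it reports are the ones worth checking for datastore reads that bypass memcache.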

Migration from master/slave to HRD and email limited to 100/day

Almost a week ago I migrated my (paid) app from Master/Slave to HRD. Since that time my app has been restricted to 100 emails/day with a warning indicating "Resource is currently experiencing a short-term quota limit". I know it mentions a limitation until the first successful billing so I was hoping the limit would disappear once that happened - but alas it has not! I have also filled out the "request additional resources" form hoping that might help.
Has anyone encountered this problem when migrating from master/slave? Any suggestions on who I can contact or how I can get out of this limit? The migration process was relatively smooth, except for this problem, which has significantly affected my customers.
I am having the same problem right now. I just finished my HRD migration yesterday and only today realized that my Mail API requests are failing because the mail quota on my new HRD app is only 100 messages per day. I did not have this limitation before, and I find it pretty disappointing that such a triviality is causing my users trouble despite a successful migration. I submitted a quota increase request and I'm hoping I don't also have to wait a week before it's applied.
Anyone who is waiting until the last minute to migrate their app to HRD be warned: make sure you apply for a mail quota increase on your new app because your old setting will not carry over, even if you have billing enabled on both apps.
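Until the increase is applied, one way to avoid failing Mail API calls is to count sends yourself and defer anything over the cap. A Python sketch: `DailySendLimit` and the `send` callable are hypothetical, the limit of 100 comes from the posts above, and the counter resets at UTC midnight, which is an assumption; App Engine's own quota reset time may differ.

```python
import datetime


def utc_today():
    return datetime.datetime.now(datetime.timezone.utc).date()


class DailySendLimit:
    """Count messages sent per day and refuse sends beyond `limit`."""

    def __init__(self, limit=100, today=utc_today):
        self.limit = limit
        self._today = today  # ASSUMPTION: day boundary is UTC midnight
        self._day = None
        self._sent = 0

    def try_send(self, send, message):
        """Call `send(message)` if under the cap; return False if it must be deferred."""
        day = self._today()
        if day != self._day:
            self._day, self._sent = day, 0  # new day: reset the counter
        if self._sent >= self.limit:
            return False  # caller should defer, e.g. re-queue for the next day
        send(message)
        self._sent += 1
        return True
```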

GAE migration to HRD. Possible to go back?

I migrated a project to HRD using the migration tool.
This seems to have various side effects. Some might be related to unanchored entities or queries. The app doesn't send the notification emails it is supposed to, which may have something to do with billing needing to be re-enabled and Google waiting seven days before applying the mail quota.
Anyway, I am wondering whether it is possible to roll back to master/slave. Could I disable or delete the HRD app and re-enable the old one?
That would let me buy time, prepare better for this migration, and try again later.
I'm afraid you can't go back, because the Master/Slave Datastore has been deprecated since April 4, 2012. You could contact support, but I would suggest spending some time trying to fix it on HRD.
