Why is Google App Engine so slow connecting to Cloud SQL?

I am seeing a drastic difference in latency between development and production when connecting to a Cloud SQL backend, much more so than I would expect.
I ran a test where:
I fetched 125, 250, 500, 1000 and 2000 rows (row size approximately 30 bytes)
I fetched each row count 20 times, to get a good sampling of the time
The test was run in three environments:
Hosted App Engine
Development mode locally, but connecting to Cloud SQL via static IP
Development mode locally, connecting to a local VM running MySQL
Here you can see the results (chart not reproduced here):
Now I would expect some speed fluctuation on the order of 50-200 ms, but 3-4 seconds seems a bit high.
I'm new to App Engine, so is there any newbie mistake that might be causing this? Or other suggestions? I ran a profiler on my code in App Engine and there is a call to _apiProxy.Event "wait" that eats up at least 500 ms but never more than 750 ms; other than that, there were no long-running calls. There are a number of shorter calls that eventually add up, of course, but it's not like I have a loop that needs to be tuned or anything.
Thanks in advance!

First off, check the connectivity path you are using: are you connecting via the latest documented method? Cloud SQL used to have a connectivity path that is slower and is now deprecated but still functioning, so you could be accessing it via that.
Second, are the App Engine app and the Cloud SQL instance in the same location? Check that the "Preferred Location" in your Cloud SQL settings is set to follow the App Engine app you are connecting to.
As a last possibility, which seems unlikely given your local numbers, make sure you are reusing database connections; creating new ones can be expensive. If for some reason your app reuses connections locally but creates new ones on the App Engine side, that could produce this behavior. But like I said, this one seems unlikely.
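A minimal sketch of what connection reuse looks like on the first-generation Python runtime with MySQLdb; the project, instance, credentials and database names here are placeholder assumptions, not the asker's setup:

    # Sketch: reuse one MySQLdb connection per instance instead of
    # reconnecting on every request. All names below are illustrative.
    import os
    import MySQLdb

    _conn = None  # module-level state survives between requests on a warm instance

    def get_connection():
        global _conn
        if _conn is None:
            if os.getenv('SERVER_SOFTWARE', '').startswith('Google App Engine'):
                # In production, connect over the Cloud SQL unix socket.
                _conn = MySQLdb.connect(
                    unix_socket='/cloudsql/my-project:my-instance',
                    user='root', db='mydb')
            else:
                # In local development, connect over TCP to the instance IP.
                _conn = MySQLdb.connect(host='203.0.113.5', user='root',
                                        passwd='secret', db='mydb')
        return _conn

A real version would also handle dropped connections (ping and reconnect), but even this much avoids paying connection setup on every request.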

Related

How to access terabytes of data sitting in the cloud quickly?

We have terabytes of data sitting on Google's disks. Initially, since we were using Google Cloud VMs, we were doing development work in the cloud and were able to access the data.
Now we have bought our own servers, where our application runs, and we are bringing the data to our local disks so our application can access it. The thing is, transferring the data, especially terabytes of it, over the network using scp is quite slow. Can anyone suggest a way to fix this issue?
What I am thinking is: isn't there a way to keep a script running on the Google Cloud instance that waits for requests (it sends the requested data over HTTP!), so that from the local server we can request the data one chunk at a time?
I know this again happens over the network, but I think this approach can scale, though I could be wrong! It's a kind of client-server (1:1) layout, like the one used to build interaction between a frontend and a backend. Any suggestions?
Would that be slow? Slower than bringing the data over with scp?
You could download the full VM disk and mount it on your servers, or download the disk and just copy the data off before deleting the VM disk. In either case, you should follow these steps (sketched in gcloud commands below):
Create a snapshot of your VM, which will contain all the data.
Build and export the VM image to your servers.
Run the image on your servers according to GCE requirements.
It should take a lot less time, since you're doing it on premises and avoiding network traffic.
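As a rough sketch, those steps map to something like the following gcloud commands; the disk, snapshot, image and bucket names are placeholders, and exact flags may vary by gcloud version:

    # Snapshot the data disk, turn it into an image, export the image to
    # Cloud Storage, then pull the archive down locally.
    gcloud compute disks snapshot my-data-disk --zone=us-central1-a --snapshot-names=my-snap
    gcloud compute images create my-image --source-snapshot=my-snap
    gcloud compute images export --image=my-image --destination-uri=gs://my-bucket/my-image.tar.gz
    gsutil cp gs://my-bucket/my-image.tar.gz /local/path/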

Google Cloud Bigtable Python Client Performance Issue

I'm running into a performance issue with the Google Cloud Bigtable Python client. I'm working on a Flask API that writes to and reads from a GCP Bigtable instance. The API uses the Python client to communicate with Bigtable and was deployed to the GCP App Engine flexible environment.
Under low traffic, the API works fine. However, during a load test the endpoints that read and write to Bigtable suffer a huge performance decrease compared to a similar endpoint that doesn't communicate with Bigtable. Also, a large percentage of requests sent to those endpoints received a 502 Bad Gateway, even when health checks were turned off in App Engine.
I'm aware that the client is currently in alpha. I wonder if the performance issue is known, or if anyone else has run into the same issue.
Update
I found documentation from Google stating:
There are issues with the network connection. Network issues can reduce throughput and cause reads and writes to take longer than usual. In particular, you'll see issues if your clients are not running in the same zone as your Cloud Bigtable cluster.
In my case, my client was in a different region; moving it to the same region brought a huge increase in performance. However, the performance issue still exists, and the recommendation from the documentation is to put the client in the same zone as Bigtable.
I also considered using Container Engine or Compute Engine, where it is easier to specify the zone, but I want to stay with App Engine for its autoscaling and managed services.
The Bigtable client takes somewhere between 3 ms and 20 ms to complete each request, and because Python is single-threaded, it will simply wait until the response comes back. The best solution we found was, for any writes, to publish the request to Pub/Sub and then use Dataflow to write to Bigtable. This is significantly faster because publishing a message from Python takes well under 1 ms to complete, and because Dataflow can be placed in exactly the same region as Bigtable and parallelizes easily, it can write much faster.
This doesn't cover the scenario where you need frequent reads, or writes that must be visible immediately, though.
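To make that write path concrete, here is a minimal sketch of the publish side using the google-cloud-pubsub client; the project and topic names are made up, and the Dataflow pipeline that drains the topic into Bigtable is not shown:

    # Sketch of the buffered write path: publish the mutation to Pub/Sub
    # and let a Dataflow job perform the actual Bigtable write.
    import json
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path('my-project', 'bigtable-writes')

    def enqueue_write(row_key, values):
        # publish() batches in the background and returns a future almost
        # immediately, versus 3-20 ms for a synchronous Bigtable mutation.
        payload = json.dumps({'row_key': row_key, 'values': values})
        return publisher.publish(topic_path, data=payload.encode('utf-8'))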

Java instance memory usage going up without requests

I have a strange problem with GAE Java. There are two instances with basic scaling for the version I am using, with one being used and the other idling, from what I can see in the log. Response times are fine. I can see that my idle instance did not receive any requests for the last hour. Strangely, on the idle instance memory usage has been going up constantly, at around 2 MB/minute, for the last hour. The instance uses the Google JDBC connection to a MySQL Cloud SQL instance. I am using a DBCP 1.4 connection pool with 2 connections, but I don't think any active processing would be going on, as a background thread should not even be possible on App Engine.
It is at ca. 730 MB on a B2 instance (256 MB?) and will probably get restarted soon because of memory usage.
I am also using tracing on the connection (com.google.cloud.trace.instrumentation.jdbc 0.1.1), but again I don't think this will do anything as long as there are no queries.
How could this happen? And how could I find the memory leak? I think threads would normally be stopped after 30 s. And the JDBC driver from Google should not be filling up memory by itself, I would guess.
To answer my own question: it seems it's not related to JDBC at all. It appears to be a problem with the Endpoints service control API:
Cloud endpoint management leaking memory?

Fastest Open Source Content Management System for Cloud/Cluster deployment

Currently clouds are mushrooming like crazy and people are starting to deploy everything to the cloud, including CMS systems, but so far I have not seen anyone succeed in deploying a popular CMS system to a load-balanced cluster in the cloud. Some performance hurdles seem to prevent standard open-source CMS systems from being deployed to the cloud like this.
CLOUD: A cloud, or better, a load-balanced cluster, has at least one frontend server, one network-connected(!) database server and one cloud-storage server. This fits well with Amazon Beanstalk and Google App Engine. (This specifically excludes a CMS on a single computer or a Linux server with MySQL on the same "CPU".)
Deploying a standard CMS in such a load-balanced cluster requires a cloud-ready CMS with the following characteristics:
The CMS must deal with the latency of queries to still be responsive, and must render pages in less than a second so they can be cached (or use a precaching strategy)
The filesystem probably must be connected to remote storage (Amazon S3, Google Cloud Storage, etc.)
Currently I know of Python/Django and WordPress having middleware modules or plugins that can connect to cloud storage instead of a filesystem, but there might be other cloud-ready CMS implementations (Java, PHP, ?) and systems.
I myself have failed to deploy django-CMS to the cloud, ultimately due to the query latency of the remote DB. So here is my question:
Did you deploy an open-source CMS that still performs well in rendering pages and in the backend admin? Please post your average page-rendering stats in microseconds for uncached pages.
IMPORTANT: Please describe your configuration, the problems you encountered, and which modules had to be optimized in the CMS to make it work. Don't just post "this works"; contribute your experience and knowledge.
Such a CMS probably has to make fewer than 10 queries per page (if more, the queries must be made in parallel), and has to deal with filesystem access times of 100 ms for a stat and query delays of 40 ms.
Related:
Slow MySQL Remote Connection
Have you tried Umbraco?
It relies on a database, but it keeps layers of cache so you aren't doing selects on every request.
http://umbraco.com/azure
It works great on Azure too!
I have found an excellent performance test of WordPress on App Engine. It appears that Google has spent some time optimizing this system for load-balanced cluster and remote-DB deployment:
http://www.syseleven.de/blog/4118/google-app-engine-php/
Scaling test from the report:

    parallel hits    GAE     1&1     Sys11
    1                1.5     2.6     8.5
    10               9.8     8.5     69.4
    100              14.9    -       146.1
The report concludes that the system is slower than on traditional hosting but scales much better.
http://developers.google.com/appengine/articles/wordpress
We have managed to deploy Python django-CMS (www.django-cms.org) on Google App Engine with Cloud SQL as the DB and Cloud Storage as the filesystem. Cloud Storage was attached by forking and fixing a django.storage module by Christos Kopanos http://github.com/locandy/django-google-cloud-storage
After that, the second set of problems came up as we discovered access times of up to 17 s for a single page. We investigated and found that easy-thumbnails 1.4 accessed the normal filesystem for mod_time requests while writing results to the store (rendering all thumbnail images on every request). We switched to the development version, where that was already fixed.
Then we worked with SmileyChris to fix the unnecessary reads of mod_times (stat-ing the file) on every request for every image, by tracing and posting issues to http://github.com/SmileyChris/easy-thumbnails
This reduced access times from 12-17 s to 4-6 s per public page on the CMS, basically eliminating all storage/"file"-system access. Once that was fixed, easy-thumbnails replaced (by design) filesystem accesses with DB queries that check on every request whether a thumbnail's source image has changed.
One thing for the web designer: if she uses an image.width statement in the template, this forces an ugly, slow read on the "filesystem", because image widths are not cached.
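A standard way around that, as a sketch (not necessarily what we did): let Django persist the dimensions in DB columns via ImageField's width_field/height_field, and reference those columns in templates. Model and field names here are illustrative:

    # Cache image dimensions in DB columns so templates never have to
    # stat the remote storage to measure an image.
    from django.db import models

    class Asset(models.Model):
        image = models.ImageField(
            upload_to='assets/',
            width_field='image_width',   # Django fills these in on save
            height_field='image_height')
        image_width = models.PositiveIntegerField(null=True, editable=False)
        image_height = models.PositiveIntegerField(null=True, editable=False)

In templates, {{ asset.image_width }} then reads a DB column, whereas {{ asset.image.width }} would open the file to measure it.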
Further investigation led to the conclusion that DB accesses are very costly too, taking about 40 ms per round trip.
Up to now the deployment is unsuccessful, mostly due to DB access times in the cloud leading to 4-5 s delays in rendering a page before it is cached.

Identify why Google App Engine is slow

I developed an application for a client that uses Play framework 1.x and runs on GAE. The app works great, but sometimes it is crazy slow. It takes around 30 seconds to load a simple page, but sometimes it runs faster, with no code change whatsoever.
Is there any way to identify why it's running slow? I tried to contact support but couldn't find any telephone number or email. There is also no response on the official Google group.
How would you approach this problem? Currently my customer is very angry because of the slow loading times, but switching to another provider is a last resort at the moment.
Use GAE Appstats to profile your remote procedure calls. All of the RPCs are slow (Google Cloud Storage, Google Cloud SQL, ...), so if you can reduce the number of RPCs or use some caching data structures, do it; your application will be much faster. Appstats also shows you which parts are slow and whether they need attention :).
For example, I've created a Google Cloud Storage cache for my application and decreased execution time from 2 minutes to under 30 seconds. The RPCs are the bottleneck in GAE.
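For reference, enabling Appstats on the first-generation Python runtime is a short hook in appengine_config.py (the Java runtime uses an equivalent servlet filter); this follows the documented pattern:

    # appengine_config.py -- wrap the WSGI app so Appstats records all RPCs.
    def webapp_add_wsgi_middleware(app):
        from google.appengine.ext.appstats import recording
        return recording.appstats_wsgi_middleware(app)

Once deployed (and with the appstats builtin enabled in app.yaml), the recorded timelines are browsable under /_ah/stats.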
Google does not usually provide direct contact support for many of its services. The App Engine slowness described here is probably caused by cold starts: App Engine front-end instances sleep after about 15 minutes of inactivity. You could write a cron job that pings your instances every 14 minutes to keep them warm.
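On the Python runtime that keep-warm job is a couple of lines of cron.yaml (Java uses an equivalent cron.xml); the /ping URL is a hypothetical handler you would have to add yourself:

    # cron.yaml -- hypothetical keep-warm job
    cron:
    - description: keep an instance warm
      url: /ping
      schedule: every 14 minutes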
Combining some answers and adding a few things to check:
Debug using Appstats. Look for "staircase" situations and RPC calls. Maybe something in your app triggers RPC calls at certain points that don't happen in your logic all the time.
Tweak your instance settings. Add some permanent/resident instances and see if that makes a difference. If you are spinning up new instances, things will be slow, probably for around the time frame (30 seconds or more) you describe, and it will seem random. It's not just how many instances you run, but what combination of the sliders you are using (you can actually hurt yourself with too few or too many).
Look at your app itself. Are you doing lots of memory allocations in the JVM? Allocating/freeing memory is inherently a slow operation and can cause freezes. Are you sure your freezes are not a JVM issue? Try replicating the problem locally and tweaking the JVM -Xmx and -Xms settings to see if you get similar behavior. Also profile your application locally for memory/performance issues. You can cut down on allocations using pooling, DI containers, etc.
Are you running any sort of cron jobs/processing on your front-end servers? Try to move as much as you can to background tasks, such as sending emails. The intervals may seem random, but they can be the result of things happening on a schedule, depending on your job settings; "9 am every day" may not mean what you think, given the cron/task options. A corollary: move things to back-end servers and pull queues.
It's tough to give you a good answer without more information. The best someone here can do is give you a starting point, which pretty much every answer here already has.
By making at least one instance permanent, you get a great improvement on first use. It takes about 15 seconds to load the application into an instance, which is why you experience long request times when nobody has used the application for a while.
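On a first-generation Python app that is a couple of lines in app.yaml (Java has the equivalent min-idle-instances setting in appengine-web.xml); the value here is just an example:

    # app.yaml -- keep one resident instance so cold starts are rare
    automatic_scaling:
      min_idle_instances: 1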
