Google appengine, least expensive way to run heavy datastore write cron job? - google-app-engine

I have a Google appengine application, written in Go, that has a cron process which runs once a day at 3am. This process looks at all of the changes that have happened to my data during the day and stores some meta data about what happened. My users can run reports on this meta data to see trends that have happened over several months. The process does around 10-20 million datastore writes every night. It all works just fine, but since I have started running it I have noticed a significant increase in my monthly bill from Google (from around $50/month to around $400/month).
I have just setup a very basic taskqueue that this runs in, I have not changed the default settings at all. Is there a better way that I could be running this process at night that could save me money? I have never messed around with the backends (which are now depreciated) or the modules api, and I know they've changed a lot of this stuff recently so I'm not sure where to start looking. Any advice would be greatly appreciated.

Look at your instances at 3am. It might be that GAE spins up a lot of them to handle the job. You could configure your job to make it run less paralel so it will take longer but perhaps it will need only 1 instance then.
However, if your database writes are indeed the biggest factor this won't make a big impact.
You can try looking at your data models and indexes. Remember that each indexed field costs 2 writes extra, so see if you can remove indexes from some fields if you don't need them.

One improvement that you can do is to batch your write operations, you can use memcache for this (pay the dedicated one since it's more reliable). Write the updates to memcache, once it's about 900K, flush it to datastore. This will reduces the number of write to datastore A LOT, especially if your metadata's size is small.

Related

How to Gain Visibility and Optimize Quota Usage in Google App Engine?

How do I go about optimizing my Google App Engine app to reduce instance hours I am currently using/paying for?
I have been using app engine for a while and the cost has been creeping upwards. I now spend enough on GAE to invest time into reducing the expense. More than half of my GAE bill is due to frontend instance hours, so it's the obvious place to start. But before I can start optimizing, I have to figure out what's using the instance hours.
However, I am having difficulty trying to determine what is currently using so many of my frontend instance hours. My app serves many ajax requests, dynamic HTML pages, cron jobs, and deferred tasks. For all I know there could be some runaway process that is causing my instance usage to be so high.
What methods or techniques are available to allow me to gain visibility into my app to see where I am using instance hours?
Besides code changes (all suggestions in the other answer are good) you need to look into the instances over time graph.
If you have spikes and constant use, the instances created during the spikes wont go to sleep because appengine will keep using them. In appspot application settings, change the "idle instances" max to a low number like 1 (or your actual daily average).
Also, change min latency to a higher number so less instances will be created on spikes.
All these suggestions can make an immediate effect on lowering your bill, but its just a complement to the code optimizations suggested in the other answer.
This is a very broad question, but I will offer a few pointers.
First, examine App Engine's console Dashboard and logs. See if there are any errors. Errors are expensive both in terms of lost business and in extra instance hours. For example, tasks are retried several times, and these reties may easily prolong the life of an instance beyond what is necessary.
Second, the Dashboard shows you the summary of your requests over 24 hours period. Look for requests with high latency. See if you can improve them. This will both improve the user experience and may reduce the number of instance hours as more requests can be handled by each instance.
Also look for data points that surprise you as a developer of your app. If you see a request that is called many more times that you think is normal, zero in on it and see what it is happening.
Third, look at queues execution rates. When you add multiple tasks to a queue, do you really need all of them to be executed within seconds? If not, reduce the execution rate so that the queue never needs more than one instance.
Fourth, examine your cron jobs. If you can reduce their frequency, you can save a bunch of instance hours. If your cron jobs must run frequently and do a lot of computing, consider moving them to a Compute Engine instance. Compute Engine instances are many times cheaper, so having such an instance run for 24 hours may be a better option than hitting an App Engine instance every 15 minutes (or even every hour).
Fifth, make sure your app is thread-safe, and your App Engine configuration states so.
Finally, do the things that all web developers do (or should do) to improve their apps/websites. Cache what can be cached. Minify what needs to be minified. Put images in sprites. Split you code if it can be split. Use Memcache. Etc. All of these steps reduce latency and/or client-server roundtrips, which helps to reduce the number of instances for the same number of users.
Ok, my other answer was about optimizing at the settings level.
To trace the performance at a granular level use the new cloud trace relased today at google i/o 2014.
http://googledevelopers.blogspot.com/2014/06/cloud-platform-at-google-io-enabling.html

Identify why Google app engine is slow

I developed an application for client that uses Play framework 1.x and runs on GAE. The app works great, but sometimes is crazy slow. It takes around 30 seconds to load simple page but sometimes it runs faster - no code change whatsoever.
Are there any way to identify why it's running slow? I tried to contact support but I couldnt find any telephone number or email. Also there is no response on official google group.
How would you approach this problem? Currently my customer is very angry because of slow loading time, but switching to other provider is last option at the moment.
Use GAE Appstats to profile your remote procedure calls. All of the RPCs are slow (Google Cloud Storage, Google Cloud SQL, ...), so if you can reduce the amount of RPCs or can use some caching datastructures, use them -> your application will be much faster. But you can see with appstats which parts are slow and if they need attention :) .
For example, I've created a Google Cloud Storage cache for my application and decreased execution time from 2 minutes to under 30 seconds. The RPCs are a bottleneck in the GAE.
Google does not usually provide a contact support for a lot of services. The issue described about google app engine slowness is probably caused by a cold start. Google app engine front-end instances sleep after about 15 minutes. You could write a cron job to ping instances every 14 minutes to keep the nodes up.
Combining some answers and adding a few things to check:
Debug using app stats. Look for "staircase" situations and RPC calls. Maybe something in your app is triggering RPC calls at certain points that don't happen in your logic all the time.
Tweak your instance settings. Add some permanent/resident instances and see if that makes a difference. If you are spinning up new instances, things will be slow, for probably around the time frame (30 seconds or more) you describe. It will seem random. It's not just how many instances, but what combinations of the sliders you are using (you can actually hurt yourself with too little/many).
Look at your app itself. Are you doing lots of memory allocations in the JVM? Allocating/freeing memory is inherently a slow operation and can cause freezes. Are you sure your freezing is not a JVM issue? Try replicating the problem locally and tweak the JVM xmx and xms settings and see if you find similar behavior. Also profile your application locally for memory/performance issues. You can cut down on allocations using pooling, DI containers, etc.
Are you running any sort of cron jobs/processing on your front-end servers? Try to move as much as you can to background tasks such as sending emails. The intervals may seem random, but it can be a result of things happening depending on your job settings. 9 am every day may not mean what you think depending on the cron/task options. A corollary - move things to back-end servers and pull queues.
It's tough to give you a good answer without more information. The best someone here can do is give you a starting point, which pretty much every answer here already has.
By making at least one instance permanent, you get a great improvement in the first use. It takes about 15 sec. to load the application in the instance, which is why you experience long request times, when nobody has been using the application for a while

On the app engine development server why does the "Applying all pending transactions and saving the datastore" step take several seconds?

It seems like something that should take place in a flash, at least locally. Is there some time emulation of the production server that causes it?
It depends how much data is saved in your local datastore and how fast your disk is. I've noticed if you have a lot of deleted data in your datastore, the file still ends up being big.
If you want to clear the database, it's better to delete the whole file than delete all the individual entities.

GAE: How many task per second is enough?

I'm using GAE and task queue. In queue.yaml file, I keep default setting: 5/s. 1 month ago I thought it's enough but now there are about 40-50 tasks in one queue so my application runs too slow.
I want to know how many tasks per second is enough ? Can I change to 100/s ?
Thank you :)
Update:
My application gets data from some social networks, calculate and save to datastore. To over limit 30 seconds of GAE, I split this operation to tasks. I want to know the limit of GAE task queue before change and deploy to GAE :)
http://code.google.com/appengine/docs/python/taskqueue/overview.html#Quotas_and_Limits
or
http://code.google.com/appengine/docs/java/taskqueue/overview.html#Quotas_and_Limits
I'd highly recommend increasing the settings in steps to find your performance sweet spot. The number of tasks needed to run is obviously higher than 5/s but you don't know what's the proper number until you've tried running for a while, and heading straight to the top doesn't sound like a good idea.

Is GAE a viable platform for my application? (if not, what would be a better option?)

Here's the requirement at a very high level.
We are going to distribute desktop agents (or browser plugins) to collect certain information from tons of users (in thousands or possibly millions down the road).
These agents collect data and periodically upload it to a server app.
The server app will allow for analyzing collected data (filter, sort etc based on 4-5 attributes) and summarize in form of charts etc.
We should also be able to export some of the collected data (csv or pdf)
We are looking for an platform to host the server app. GAE seems attractive because of low administrative cost and scalability (as users base increases, the platform will handle the scale... hopefully!).
Is GAE a viable option for us?
One important consideration is that sometimes the volume of uploads from the agents can exceed 50MB per upload cycle. We will have users in places where Internet connections could be very slow too. Apparently GAE has a limit on the duration a request can last. The upload volume may cause the request (transferring data from an agent to the server) to last longer than 30 seconds. How would one handle such situation?
Thanks!
The time of the upload is not considered part of the script execution time, so no worries there.
Google App Engine is very good to perform a vast number of smaller jobs but not so much to do complex long running background jobs (because of the 30 sec limit + even smaller database connection time limit). So probably GAE would be a very good platform to GATHER the data but not for actually ANALYZING it. You probably would like to separate these two.
We went ahead an implemented the first version on GAE anyway. The experience has been very much what is described here http://www.carlosble.com/?p=719
For a proof-of-concept prototype, what we have built so far is acceptable. However, we have decided not to go with GAE (at least in its current shape) for the production version. The pains somewhat outweigh the benefits in our case.
The problems we faced were numerous. Unlike my experience dealing with J2EE stacks, when you run into an issue, many a times it is a dead end. Workarounds are very complicated and ugly, if you can find one.
By writing good prototypes one could figure out whether GAE is right for the solution being built, however, the hype is a problem. Many newcomers are going get overly excited about GAE due to its hype and end up failing badly. Because they will choose GAE for all kinds of purpose that it is not suitable for.

Resources