Google AppEngine automatic scaling (back down) - google-app-engine

I have a php AppEngine app. No scaling config, I'm just using the defaults.
Yesterday it's scaled up to two instances when I was doing some heavy testing. It's only handled one or two requests since then, but is still running 2 instances. How long does it take to scale back down? That was more than 24 hours ago. Thanks!

They usually spin down after 15 minutes. However it will depend heavily on your application profile, and is not guaranteed.
You can configure the maximum idle instances and then they guarantee not to bill you above that threshold.

Related

Understanding Cost Estimate for Google Cloud Platform MicroServices Architecture Design

I'm redesigning a monolith application into a MicroServices architecture and am hoping to use Google Cloud Platform (GCP) to host the entire solution. I'm having a very hard time understanding their costing breakdown, and am concerned that my costs will be uncontrollable after I build it. This is for a personal project but I'm hoping will have many users after I launch so I want to get the underlying architecture right and at the same time have reasonable costs initially when I launch.
Here is my architecture:
MicroServices 1 - 4 (Total 4 API Services):
Runs on App Engine
Exposes a REST API and saves data to DataStore
Initially each API should get hit around 200 times a day
MicroService 5 (Events triggered API Service):
Runs on App Engine
Listens for PubSub events and saves to DataStore (basically I have a sensor that pushes data to this Service for storage)
Initially the PubSub should receive events around 200 times a day
MicroService 6-7 (Total 2 UI Services):
Runs on App Engine
These are UIs so people can login and use the systems. The UIs are lightweight frond end apps that use the REST Services above to populate user data in a nice way.
Each UI Service should be used around 3 hours a day
So in Total I have 7 MicroServices with each running as AppEngine "Services" in a single GCP "Project". A DataStore is shared between these APIs within this Project.
As I have 7 App Engine instances running, and they only need to be operational for a short period of time per day, how does the pricing work?
I want to use App Engine because it's completely Managed, which is one of my design requirements. But I'm hoping AppEngine has some kind of Sleep Mode, so that when there is no usage it does not bill?
Any help in understanding what my monthly costs would be would be appreciated.
Thanks very much.
Update 8/2/2017
I've decided to stay out of GCP for now. As I hope to have 7 App Engines Services running in Flex (as they are node.js) I don't seem to get access to a free tier or the ability to scale idle services to 0 instances.
This means I'll be paying full price for these services. (i.e. 7 X Full App Engine VM Cost per Monthly :O )
This is an expense I cant have just for a POC of a proper MicroService design. Instead I'm going to continue with my MicroService design but use a 10$ DigitalOcean box and Dokku to containerise my Services. If this works well and I have a need I will migrate this design to GCP (or AWS)
The full outline of App Engine instance handling is available at https://cloud.google.com/appengine/docs/python/how-instances-are-managed .
In short, your best bet is to enable automatic scaling and set
max_idle_instances = 0
in your app.yaml.
That means that your app will autoscale to handle traffic as needed and shut down the instances afterwards. Also
When settling back to normal levels after a load spike, the number of idle instances can temporarily exceed your specified maximum. However, you will not be charged for more instances than the maximum number you've specified.
Later - when load time becomes more important you can set min_idle_instances to a more suitable number - this allows for responsive apps.
am concerned that my costs will be uncontrollable after I build it
You should be aware that automatically scalable GAE apps always have cost components dependent on the external user request patterns which are not controllable.
For example, in the standard GAE env, the way those 200 requests/day are distributed matters significantly:
if they are evenly distributed they will come in less than 15 min apart - the minimum billed time per instance lifetime, so the respective service will be billed for minimum 24 instance hours per day (very close to the daily 28 free instance-hours/day for billed apps, only a single-service app using the smallest instance class can fit in it).
if they are all received within a 15 minutes interval the service will be billed for 0.5 instance hours daily (which can easily fit in the free daily quota even with multiple services and/or with more powerfull instance classes).
The actual scalability configuration of each service can matter as well. See, for example,
The only way to keep costs under strict control is via the daily budget configuration (but hitting that limit means your app's functionality will be temporarily crippled).
All other usage-based costs being equal due to the functionality being performed you have some (potentially significant) control over costs via:
the GAE environment type selected for each service:
the standard env is billed by instance hours and includes a free daily quota
the flex env has no free daily quota.
the number of services: you could start with fewer services by combining their functionalities (you can still keep them modularized for later split). The expected initial load you describe can easily fit within the free daily budget with just a single standard env service.
Once the app usage picks up and the free daily quotas percentage in the total costs become neglijible you can gradually split the app into multiple services as needed. In general this can be a relatively simple task if the app is properly modularized.

How to Gain Visibility and Optimize Quota Usage in Google App Engine?

How do I go about optimizing my Google App Engine app to reduce instance hours I am currently using/paying for?
I have been using app engine for a while and the cost has been creeping upwards. I now spend enough on GAE to invest time into reducing the expense. More than half of my GAE bill is due to frontend instance hours, so it's the obvious place to start. But before I can start optimizing, I have to figure out what's using the instance hours.
However, I am having difficulty trying to determine what is currently using so many of my frontend instance hours. My app serves many ajax requests, dynamic HTML pages, cron jobs, and deferred tasks. For all I know there could be some runaway process that is causing my instance usage to be so high.
What methods or techniques are available to allow me to gain visibility into my app to see where I am using instance hours?
Besides code changes (all suggestions in the other answer are good) you need to look into the instances over time graph.
If you have spikes and constant use, the instances created during the spikes wont go to sleep because appengine will keep using them. In appspot application settings, change the "idle instances" max to a low number like 1 (or your actual daily average).
Also, change min latency to a higher number so less instances will be created on spikes.
All these suggestions can make an immediate effect on lowering your bill, but its just a complement to the code optimizations suggested in the other answer.
This is a very broad question, but I will offer a few pointers.
First, examine App Engine's console Dashboard and logs. See if there are any errors. Errors are expensive both in terms of lost business and in extra instance hours. For example, tasks are retried several times, and these reties may easily prolong the life of an instance beyond what is necessary.
Second, the Dashboard shows you the summary of your requests over 24 hours period. Look for requests with high latency. See if you can improve them. This will both improve the user experience and may reduce the number of instance hours as more requests can be handled by each instance.
Also look for data points that surprise you as a developer of your app. If you see a request that is called many more times that you think is normal, zero in on it and see what it is happening.
Third, look at queues execution rates. When you add multiple tasks to a queue, do you really need all of them to be executed within seconds? If not, reduce the execution rate so that the queue never needs more than one instance.
Fourth, examine your cron jobs. If you can reduce their frequency, you can save a bunch of instance hours. If your cron jobs must run frequently and do a lot of computing, consider moving them to a Compute Engine instance. Compute Engine instances are many times cheaper, so having such an instance run for 24 hours may be a better option than hitting an App Engine instance every 15 minutes (or even every hour).
Fifth, make sure your app is thread-safe, and your App Engine configuration states so.
Finally, do the things that all web developers do (or should do) to improve their apps/websites. Cache what can be cached. Minify what needs to be minified. Put images in sprites. Split you code if it can be split. Use Memcache. Etc. All of these steps reduce latency and/or client-server roundtrips, which helps to reduce the number of instances for the same number of users.
Ok, my other answer was about optimizing at the settings level.
To trace the performance at a granular level use the new cloud trace relased today at google i/o 2014.
http://googledevelopers.blogspot.com/2014/06/cloud-platform-at-google-io-enabling.html

Identify why Google app engine is slow

I developed an application for client that uses Play framework 1.x and runs on GAE. The app works great, but sometimes is crazy slow. It takes around 30 seconds to load simple page but sometimes it runs faster - no code change whatsoever.
Are there any way to identify why it's running slow? I tried to contact support but I couldnt find any telephone number or email. Also there is no response on official google group.
How would you approach this problem? Currently my customer is very angry because of slow loading time, but switching to other provider is last option at the moment.
Use GAE Appstats to profile your remote procedure calls. All of the RPCs are slow (Google Cloud Storage, Google Cloud SQL, ...), so if you can reduce the amount of RPCs or can use some caching datastructures, use them -> your application will be much faster. But you can see with appstats which parts are slow and if they need attention :) .
For example, I've created a Google Cloud Storage cache for my application and decreased execution time from 2 minutes to under 30 seconds. The RPCs are a bottleneck in the GAE.
Google does not usually provide a contact support for a lot of services. The issue described about google app engine slowness is probably caused by a cold start. Google app engine front-end instances sleep after about 15 minutes. You could write a cron job to ping instances every 14 minutes to keep the nodes up.
Combining some answers and adding a few things to check:
Debug using app stats. Look for "staircase" situations and RPC calls. Maybe something in your app is triggering RPC calls at certain points that don't happen in your logic all the time.
Tweak your instance settings. Add some permanent/resident instances and see if that makes a difference. If you are spinning up new instances, things will be slow, for probably around the time frame (30 seconds or more) you describe. It will seem random. It's not just how many instances, but what combinations of the sliders you are using (you can actually hurt yourself with too little/many).
Look at your app itself. Are you doing lots of memory allocations in the JVM? Allocating/freeing memory is inherently a slow operation and can cause freezes. Are you sure your freezing is not a JVM issue? Try replicating the problem locally and tweak the JVM xmx and xms settings and see if you find similar behavior. Also profile your application locally for memory/performance issues. You can cut down on allocations using pooling, DI containers, etc.
Are you running any sort of cron jobs/processing on your front-end servers? Try to move as much as you can to background tasks such as sending emails. The intervals may seem random, but it can be a result of things happening depending on your job settings. 9 am every day may not mean what you think depending on the cron/task options. A corollary - move things to back-end servers and pull queues.
It's tough to give you a good answer without more information. The best someone here can do is give you a starting point, which pretty much every answer here already has.
By making at least one instance permanent, you get a great improvement in the first use. It takes about 15 sec. to load the application in the instance, which is why you experience long request times, when nobody has been using the application for a while

Google AppEngine sending all requests to same instance

Lately, I have seen GAE taking much, much longer to process requests than it did just a week ago. Nothing changed in my code, but GAE now is taking 4000-12000ms to respond to requests. What makes is worse is that I have plenty of instances available with 0 requests on them.
Has anyone else seen this happen?
What can I do to fix it?I have gone as far as to spin up 15 extra instances (and paid through the nose for them) but nothing seems to send requests to the other idle instances reliably.
My bill has gone from 70-90c/day to $5-8/day without any code change or increase in traffic. In fact, I am losing traffic because of the huge latency.
QPS* Latency* Requests Errors Age Memory Availability
0.000 0.0 ms 1378 0 10:10:09 57.9 MBytes Dynamic
0.000 0.0 ms 1681 0 15:39:57 57.2 MBytes Dynamic
0.017 9687.0 ms 886 0 10:19:10 56.7 MBytes Dynamic
I recommend installing AppStats to get a picture of what's taking so long in each request. I'd guess that you're having some contention issues or large numbers of reads/writes caused by some new data configuration.
The idle instances won't help decrease latency - it looks like every request takes a long time, and with less than one request per minute (in this sample anyway), 10s requests could run serially on the same instance.
We have a similar problem in our app. In our case, we are under the impression that GAE's scheduler did a poor job in balancing requests to existing instances.
In some cases, the scheduler decided to spin up new instances instead of re-using already existing ones. Since spinning a new instance took 5 to >45 seconds, I suspect this might be what happened to you.
Try to investigate the following and see if it helps you:
Make sure your app has thread-safe enabled so that you could process concurrent requests. You could configure this in your app.yaml if you are using Python, or in your appengine-web.xml if you use Java. Of course, you also need to make sure that the code in your app is threadsafe.
In your application settings, if it is still set on automatic, change the minimum pending latency to a non-automatic setting. I'd suggest around 10 seconds for now, but you could experiment later on which setting would suit you the most. This force the scheduler to wait for a certain time to see if any instance is available within the time before spinning up a new instance.
Now, to answer your original question regarding sending all requests to same instance, as far as I know there is no way to address a specific front-end instance in order to direct the requests to that particular instance.
What you could do is migrate your app to use backend instances instead of the regular frontend instance. Backends provides a way to directly target any particular instance within it. You could deploy your app in a single backend to have more control on the number of instance that you spawn. And since using the backend bypass the scheduler, you would not encounter latencies caused by new instances spinning up.
The major drawback of using this approach is that you lose the auto-scalability benefit of using front-end instances. But seeing from your low daily billing, I think scalability is not yet a major concern for the scale of your app.

is Google App Engine-MapReduce my best bet for a massively parallel solution in a cloud?

Is Google App Engine-MapReduce my best bet for a massively parallel solution in a cloud? My problem takes hours multi-threaded on a 4 core PC. I'd say 600 minutes might do. I would prefer 1000 servers get it done in 36 seconds. Switching from 4 core threading to 1000 server processing is eminently doable in my app. In fact, I can already send 1000 small jobs to 4 cores but it's not going to get done sooner than 4 big jobs to 4 cores given that I still have only 4 cores. (My dataset is small so Map-Reduce, which was designed for large datasets, might have a different sweet-spot than my type of compute-bound problem.)
I think I can get this done if I have 1000 simultaneous URL fetches but as you may know Google limits at 10 requests. It seems Google is actively discouraging outsiders from putting massively parallel solutions on their infrastructure.
I started looking into Google App Engine because upon deployment there will be very few users and it appeared App Engine has fine-grained costs - a feature I really like. My impression was that Amazon EC2 would be more work but also that costs were more likely to be chunky. Given that I'm a home-based business, I don't want to pay anything more than a nominal amount when in the early months I don't expect a lot of visitors to my website. May be they will never visit.
In general, where do people turn to for massively parallel (compute-bound) problems that ought to be served by a cloud?
For compute bound tasks, EC2 is often better than App Engine. App Engine is focused on serving web requests, not pure number crunching. It is not designed to go from 0 requests this minute to 1000 requests the next minute and back to 0 requests the minute after that. In fact, one of its features is that you generally don't need explicit control over how many instances are running at once. Also, long running jobs are not possible, though for many tasks you can use Task Queues to chain jobs together. I think the current limit on background tasks is 10 minutes.
EC2 does have a super low tier of service that you can get for free. EC2 lets you explicitly bring servers up and down, but I think the smallest increment you can pay for is 1 hour.
Of course, if you want to literally run your job on 1000 servers, neither app engine nor EC2 will likely let you do that for free. Both are very elastic/adaptive, but bringing 1000 servers up for 30 seconds of work is not very economical for them. On App Engine you will likely run up against an hourly or daily quota before you had 1000 simultaneous instances running. On EC2, you generally pay by the server instance. So you would be paying for 1000 hours of instance time. Of course, one of Amazon's High CPU instances might be much more powerful than your PC, so maybe you'd only need a 100 or so. or maybe you could compromise and have only 20 instances running at a time, meaning it takes a few minutes to finish your computation, but you don't go broke.
Have you checked Amazon's Elastic MapReduce? http://aws.amazon.com/elasticmapreduce/
With App Engine you should also investigate the task queues. If you already know how to split the big problem into many small ones, you could create one task that takes in the big problem and then creates 1000 (or 10.000) subtasks to tackle the smaller problems. And after that collect the results in one task, if needed.
Individual tasks can run up to 10 minutes before they are terminated, which makes them a little bit easier to use for computing tasks than regular requests.

Resources