Automatic instance scaling depends on various factors like number of concurrent requests, cpu utilization, etc. I would like to be able to look at the App Engine dashboard and see which factor caused the number of instances to increase.
For cpu utilization, it is not clear what the comparison should be. The dashboard presents cpu utilization in terms of Megacycles per second, but the autoscaling cpu utilization parameter is just a number between 0.5 and 0.95.
From here an F1 instance apparently has a cpu limit of 600 MHz. This is frequency, not a cpu limit. Should I interpret this instead as a fully utilized F1 instance can hit 600 Megacycles per second?
And therefore, if I set a target_cpu_utilization = 0.5, I can expect autoscaling to increase the number of instances if the dashboard shows a cpu usage of more than 300 Megacycles/sec * # instances?
Indeed, many factors affect scaling on App Engine. You can configure one of three scaling types for your application, and the choice determines how it is scaled: automatic scaling, basic scaling, and manual scaling.
I recommend taking a look at the documentation page How Instances are Managed, which provides more insight into how scaling occurs on App Engine.
Besides that, the following articles have more information on how to configure the settings that control when your app scales up, which I believe should help you as well.
Designing for scale on App Engine standard environment
app.yaml Configuration File
Let me know if the information helped you!
Related
I'm using App engine to concurrently handle a number of long running tasks (therefore I need to use basic scaling).
I noticed with one instance, only 8 tasks can be handled simultaneously (consistent with the number of workers for a B4 instance). For the ninth task I receive:
POST 503: Request was aborted after waiting too long to attempt to service your request.
How can I handle more tasks than this simultaneously without adding more instances?
As a best practice, the number of workers should match the instance class of your App Engine app. You can still change it by setting the worker count in the entrypoint, as in the example below, and see whether that works for you.
entrypoint: gunicorn -b :8080 -w 2 main:app
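The same settings can also live in a gunicorn configuration file, which is plain Python. This is a minimal sketch, assuming the app object is main:app as in the entrypoint above. The file name and the thread count are illustrative assumptions, not values from the App Engine documentation.

```python
# gunicorn.conf.py (hypothetical file name)
# Load it with: gunicorn -c gunicorn.conf.py main:app

# Listen on the same port as the entrypoint above.
bind = ":8080"

# Number of worker processes, matching the -w 2 flag in the entrypoint.
workers = 2

# Threads per worker (illustrative value). With the default worker class,
# a value above 1 makes gunicorn serve requests from threaded workers.
threads = 4
```

With these values a single instance could have up to workers * threads = 8 requests in flight. Whether extra threads actually help depends on the tasks being mostly I/O-bound and on the memory available to the instance class.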
Keep in mind that a service with basic scaling is configured by setting the maximum number of instances in the max_instances parameter of the basic_scaling setting. If you want to control yourself how the number of live instances scales with the processing volume, change to manual scaling.
If you use basic scaling, App Engine attempts to keep your cost low, even though that may result in higher latency as the volume of incoming requests increases.
If you tune the scaling settings to reduce costs by minimizing idle instances, then you run the risk of seeing latency spikes if the load increases unexpectedly.
Basic scaling type is designed to minimize costs at the expense of latency.
Your code needs to scale the number of workers based on processing volume. If your code does not handle scaling, you risk wasting computing resources if there are no tasks to process; you also risk latency if you have too many tasks to process.
A good way to speed up requests is to make use of multiple caching layers.
This article explains the instance settings and how to modify them to get the desired performance.
Have you tried increasing max_concurrent_requests in your app.yaml? By default it should allow 10 requests at a time.
https://cloud.google.com/appengine/docs/standard/python3/config/appref#max_concurrent_requests
I created a site on App Engine and chose the smallest F1 instance class which according to the docs has a CPU Limit of 600 MHz.
I limited the app to 1 instance only as a test and let it run several days then checked the CPU utilization on the dashboard. Here's part of the chart:
As you can see, the utilization, which is given in Megacycles/sec (which I assume is equivalent to MHz), is between roughly 700 and 1500.
The app uses one F1 instance only and runs without problems, and there are no quota errors. So what does the 600 MHz CPU limit mean if the utilization is usually above it?
Megacycles/sec is not MHz in this graph. As explained in Interface QuotaService:
Measures the duration that the current request has spent so far
processing the request within the App Engine sandbox. Note that time
spent in API calls will not be added to this value. The unit the
duration is measured is Megacycles. If all instructions were to be
executed sequentially on a standard 1.2 GHz 64-bit x86 CPU, 1200
megacycles would equate to one second physical time elapsed.
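The conversion described in the quoted passage can be sketched as below. This is only the unit arithmetic from the quote (1200 megacycles equal one second on the 1.2 GHz reference CPU). The function names are hypothetical, and how the dashboard figure relates to an instance class limit is not something the quote states.

```python
# Reference rate taken from the quoted documentation:
# 1200 megacycles correspond to one second on a 1.2 GHz CPU.
REFERENCE_MEGACYCLES_PER_SECOND = 1200


def megacycles_to_cpu_seconds(megacycles: float) -> float:
    """Convert a megacycle count into seconds on the reference CPU."""
    return megacycles / REFERENCE_MEGACYCLES_PER_SECOND


def rate_as_reference_cpu_fraction(megacycles_per_second: float) -> float:
    """Express a Megacycles/sec reading as a fraction of the reference CPU."""
    return megacycles_per_second / REFERENCE_MEGACYCLES_PER_SECOND


if __name__ == "__main__":
    print(megacycles_to_cpu_seconds(1200))
    print(rate_as_reference_cpu_fraction(600))
```

Read this way, a sustained reading of 600 Megacycles/sec would correspond to half of the reference CPU rather than to a 600 MHz clock.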
In App Engine Flex, you get an entire CPU core from the machine you are renting out, but in App Engine Standard, it shows the Megacycles since it uses a sandbox.
Note that there is a feature request in the issue tracker for adding a CPU% metric under gae_app for App Engine standard, and I have relayed your concern about it to the Cloud App Engine product team. However, there is no guarantee of implementation or an ETA at this time. I recommend starring the ticket so that you receive updates about it.
We have a production web application deployed on GAE (LAMP stack) with automatic scaling. According to the documentation, Google will automatically spin up additional instances to meet demand. This seemed to be proven when we went live hours before a season finale aired, which guaranteed that traffic would hit our site, and our site did NOT fall over even with the expected sizable influx, so kudos to Google! However, I'd be naive to think that this server architecture is done: we're still in our infancy, and we could potentially get 10 - 100x more traffic on a consistent basis in the near future as we gain popularity and move into the global market. So my question is:
Should I be implementing a Load Balancer in GCP or will GAE be able to scale "indefinitely" to accommodate?
Based on this answer: AppEngine load balancing across multiple regions you'll need to implement the load balancer if you're targeting multiple regions.
Otherwise it will be dependent on your configuration and the thresholds you've set on your GAE config.
According to https://cloud.google.com/appengine/docs/standard/go/how-instances-are-managed, there are three ways you can define scaling on your AppEngine instance:
Automatic scaling
Automatic scaling creates dynamic instances based on request rate, response latencies, and other application metrics. However, if you
specify a number of minimum idle instances, that specified number of
instances run as resident instances while any additional instances
are dynamic.
Basic Scaling
Basic scaling creates dynamic instances when your application receives requests. Each instance will be shut down when the app
becomes idle. Basic scaling is ideal for work that is intermittent or
driven by user activity.
Manual scaling
Manual scaling uses resident instances that continuously run the specified number of instances regardless of the load level. This
allows tasks such as complex initializations and applications that
rely on the state of the memory over time.
So the answer is it depends. You'll just need to base your scaling strategy on how your load distribution looks. I would expect that the automatic scaling is fine for 90% of early-stage websites, though that's just my impression.
I am running a site on App Engine (managed VM). It is currently running on f1-micro instances.
The Cloud Platform Console reports that CPU utilization is ~40%. I became a little suspicious because the site is receiving practically zero traffic. Is this normal for an idle golang app on an f1-micro instance?
I logged onto the actual instance and "top" reports CPU utilization ~2%.
What gives? Why is "top" saying something different than the Console?
top gives a momentary measure (I believe every second?), while the Console's data might be over a longer period of time during which the site had higher activity. With a micro instance, it seems plausible that relatively normal amounts of traffic could take up a relatively high percentage of the CPU, leading to such a metric.
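The difference between a momentary reading and an average over a window can be illustrated with a few lines of Python. The numbers below are entirely hypothetical and are not measurements from the asker's instance. They only show that a low reading at one moment is compatible with a much higher average.

```python
from statistics import mean

# Hypothetical per-second CPU readings (percent) over one minute:
# 36 nearly idle seconds followed by 24 busy seconds.
samples = [2] * 36 + [97] * 24

# What a single glance at a momentary tool might show during an idle second.
momentary_reading = samples[0]

# What a chart that averages over the whole minute would show.
window_average = mean(samples)

if __name__ == "__main__":
    print(momentary_reading, window_average)
```

Here the momentary reading is 2 while the one-minute average is 40, so both tools could be reporting correctly over different time spans.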
I'm building a prototype backend and expect little traffic in the near future, but while testing I consumed my entire $300 free trial.
How can I configure my app to consume the fewest possible resources? I need things like limiting the number of instances to 1, using a cheap machine, and sleeping whenever possible. I've read something about Client vs Backend instances.
With time I'll learn the config that best suits me, but now I need the CHEAPEST config to get going.
BTW: I am using managed-vms with Dart.
EDIT
I've been advised to configure my app.yaml file. What options would you recommend to address this issue?
There are two ways to approach your issue.
1) Optimization of code: This is very difficult for us, as we are not privy to your app's usage, client base, and architecture. In general, it depends on which Google App Engine products you use the most, for example Datastore API calls (fetch, write, delete, etc.), BigQuery, and Cloud SQL. Even after optimization, you can still incur a lot of cost depending on traffic.
2) Enforcing cheap operation: This is easier and I think this is what you want. You can manually enforce a daily budget (in your billing setup page) so the app never costs more than a certain amount per day. You can also lower the maximum number of idle instances to 0 and use the smallest instance possible (F1 for frontend).
For pricing details see this article - https://cloud.google.com/appengine/pricing#Billable_Resource_Unit_Costs
If you use Managed VMs, you'll be billed at Compute Engine instance prices, not App Engine instance prices. As far as I know, the smallest instance you can use as a Managed VM is "g1-small", which costs $0.023 per hour with full sustained usage (if it stays on all month), so your minimum bill will be 0.023 * 24 * 30 = $16.56 for instance hours alone, excluding disk and traffic. With a minimal number of datastore operations you may stay within the free quota.
Every application consumes resources differently. To minimize your cost, you need to know which resources account for most of your expenses and go from there.
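The instance-hour arithmetic above can be reproduced with a short sketch. The hourly rate is the figure quoted in this answer and may not match current pricing. The function name is hypothetical.

```python
# Hourly rate quoted in the answer above for a g1-small instance (USD).
HOURLY_RATE_USD = 0.023


def monthly_instance_cost(hourly_rate: float,
                          hours_per_day: int = 24,
                          days: int = 30) -> float:
    """Cost of keeping one instance running continuously, rounded to cents."""
    return round(hourly_rate * hours_per_day * days, 2)


if __name__ == "__main__":
    print(monthly_instance_cost(HOURLY_RATE_USD))
```

The result covers instance hours only. Disk and traffic are billed separately.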
If it is spent on extra instances that were just sitting there - then trim the number of instances to the minimum required and use a lower class instance. If you are seeing a lot of expense on datastore calls - then look at optimizing your entities and take advantage of memcache.
Lowest Cost for a simple app:
Use App Engine Standard. It scales to zero instances, so will not cost anything if there is no traffic. With App Engine Flex you will pay for the instance hours and the Flex (GCE) instances are bigger.
Use autoscaling with max instances, F1 instance class:
With autoscaling you do not need to guess how many instances you need. F1 are the smallest instances. Set the max instances in case you get DoS'd or more traffic than you can afford.
Stop Instances:
You can stop the App Engine versions when you do not expect the app to be used. There will be no charge for instance hours for either Standard or Flex. For Flex there will be disk charges. The app will be ready to go when you need it again.
App Engine Version Cleanup:
Versions are easy to create and harder to remove. See this post on App Engine project cleanup:
https://medium.com/google-cloud/app-engine-project-cleanup-9647296e796a