I'm updating my webapp, which until now has been using the default configuration. I'm starting to use modules and am trying this config:
application: newkoolproject
# Other settings here...
version: newkool
runtime: python27
api_version: 1
threadsafe: true
automatic_scaling:
  min_idle_instances: 5
  max_idle_instances: automatic # default value
  min_pending_latency: automatic # default value
  max_pending_latency: 30ms
  max_concurrent_requests: 50
I pay for instance hours, data reads and complex searches totalling a few dollars daily on my current budget (my limit is 7 USD per day, to avoid sudden spikes due to DoS attacks or other technical issues).
Would it be feasible for me to try to squeeze my app into the free tier, using memcache and other technologies to reduce costs? Or should I forget about reaching the free tier (< 28 instance hours etc.) and instead create a configuration for a more optimal UX? How will this change affect my costs?
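As a rough check on the free-tier question, the sketch below estimates the daily instance hours implied by min_idle_instances: 5. It assumes that every resident idle instance is billed for the full 24 hours and uses the 28-hour free figure quoted above; both are simplifications, and the function names are made up for illustration. Actual billing depends on instance class and traffic.

```python
FREE_INSTANCE_HOURS_PER_DAY = 28  # figure quoted in the question above


def daily_instance_hours(resident_instances, hours_per_day=24):
    """Instance hours per day if every resident instance runs all day (assumption)."""
    return resident_instances * hours_per_day


def billable_instance_hours(resident_instances):
    """Hours beyond the free allowance; never negative."""
    used = daily_instance_hours(resident_instances)
    return max(0, used - FREE_INSTANCE_HOURS_PER_DAY)


if __name__ == "__main__":
    for n in (1, 5):
        print(n, daily_instance_hours(n), billable_instance_hours(n))
```

Under these assumptions, five always-on idle instances alone use far more than the free allowance, so the free tier and min_idle_instances: 5 are unlikely to be compatible.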
Update 141010 18:59 CET
I could add appstats
You'll need to turn Appstats on in your local App Engine development server and then go through your typical user flow. Here are the instructions on how to do this: https://cloud.google.com/appengine/docs/python/tools/appstats
Make sure you turn on RPC cost calculation:
appstats_CALC_RPC_COSTS = True
Once you go through a typical user flow, you would go to localhost:8080/_ah/stats and it will estimate how much specific calls and flows will cost when you get to production. Really great tool, as it even helps identify bottlenecks and slow running areas within your application.
Google's recommendation is to not only use memcache, but also to split work into smaller units (as much as possible) by leveraging task queues.
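A minimal sketch of the "smaller units" idea, assuming the work can be expressed as a list of items. The helper names and the enqueue callback are hypothetical; on App Engine the callback could wrap google.appengine.api.taskqueue.add(url=..., params=...), which is left out here so the example runs anywhere.

```python
def split_into_batches(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def enqueue_batches(items, batch_size, enqueue):
    """Call enqueue(batch) once per batch and return the number of tasks created.

    `enqueue` is a hypothetical callback standing in for a task queue call.
    """
    count = 0
    for batch in split_into_batches(items, batch_size):
        enqueue(batch)
        count += 1
    return count


if __name__ == "__main__":
    queued = []
    print(enqueue_batches(list(range(7)), 3, queued.append), queued)
```

Each batch then becomes a short request of its own, which tends to keep individual request latency low.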
UPDATE: Simple memcache usage example
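from google.appengine.api import memcache

def get_my_results():
    # Try the cache first
    my_results = memcache.get("SOME-KEY-FOR-THIS-ITEM")
    if my_results is None:
        # Do work here and assign the outcome to my_results
        memcache.set("SOME-KEY-FOR-THIS-ITEM", my_results)
    return my_results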
Related
First things first, here is my app.yaml:
runtime: nodejs10
env: standard
instance_class: F1
handlers:
- url: /.*
  script: auto
automatic_scaling:
  min_instances: 1
  max_instances: 20
inbound_services:
- warmup
I'm using Apache Benchmark for this:
ab -c30 -n100000 "${URL}"
What I notice in the GAE console is that I have 8 instances available, but only 3 take on 99% of the work. The rest serve either no requests or a very small portion.
Any idea what the problem could be here?
I would recommend using the “max_concurrent_requests” element in your “app.yaml” file. This element is the number of concurrent requests an automatic scaling instance can accept before the scheduler spawns a new instance (keep in mind that the maximum limit is 80).
Furthermore, you can also set “max_pending_latency”, which specifies the maximum amount of time that App Engine should allow a request to wait in the pending queue before starting additional instances to handle requests, so that pending latency is reduced.
If you reach the limit, it will be a signal to scale up, so the number of instances will be increased.
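To make the effect of max_concurrent_requests concrete, here is a deliberately simplified model that only counts how many instances are needed to hold a given number of in-flight requests. The default of 10 is an assumption to verify against the current documentation, the function name is made up, and the real scheduler also weighs latency and utilization targets.

```python
def instances_needed(concurrent_requests, max_concurrent_requests=10):
    """Ceiling of concurrent_requests / max_concurrent_requests (simplified model)."""
    if max_concurrent_requests < 1:
        raise ValueError("max_concurrent_requests must be at least 1")
    # Integer ceiling division, so the result is an int on Python 2 and 3.
    return (concurrent_requests + max_concurrent_requests - 1) // max_concurrent_requests


if __name__ == "__main__":
    # ab -c30 keeps 30 requests in flight at once.
    print(instances_needed(30))      # with the assumed default of 10
    print(instances_needed(30, 80))  # with the maximum of 80
```

With ab -c30 and the assumed default, this model gives 3 busy instances, which is consistent with the observation in the question.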
The fact that the load is not evenly distributed across the running instances is normal and actually desired, as long as the number of instances still processing requests is sufficient to handle the current load level with a satisfactory level of performance. This allows the other instances to stay idle long enough to be automatically shut down (due to inactivity).
This is part of the dynamic instance management logic used with automatic and basic scaling.
Since moving to the GAE Go 1.11 runtime, we have noticed that the number of instances is much higher. When I dug into the problem, it seemed that GAE is not handling requests concurrently.
Here is a very light module of the frontend settings:
automatic_scaling:
  min_idle_instances: 1
  max_idle_instances: 2
  min_pending_latency: 0.030s
  max_pending_latency: automatic
  max_concurrent_requests: 80
  target_throughput_utilization: 0.95
And with about 50 requests per second, GAE spun up 2 active instances. Each has about 25 QPS and the average latency is under 20ms. Even the chart shows the instances aren't really busy.
What is in the settings that would cause this issue?
I don't think the Go 1.9 runtime has this issue. And the documentation said it ignores the max concurrent requests setting, which should make the Go 1.11 runtime perform much better.
The max_pending_latency: automatic setting will make your app scale up if the pending latency goes above 30ms. If, in the current situation with 2 instances, your average latency is somewhat under 20ms, it is possible that it initially went over 30ms for a short period of time, which triggered the scaling. If you do not want this to happen, you can always set max_pending_latency manually to a value above 30ms.
Regarding the comparison with Go 1.9, it is known that Go 1.11 consumes slightly more RAM and CPU than its predecessor, so this would be normal.
In conclusion, I do not think that what is happening in your situation is an issue; it looks normal. If you do not agree, you can provide your whole, sanitized app.yaml file and I will look deeper to see if anything is wrong and edit my answer.
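The figures in the question (about 25 QPS per instance at under 20ms average latency) can be turned into an approximate concurrency level using Little's law. This is a back-of-the-envelope sketch with a made-up function name; it says nothing about how the scheduler actually decides, only how many requests are in flight on average.

```python
def average_concurrency(requests_per_second, average_latency_seconds):
    """Little's law: mean requests in flight = arrival rate * mean time in system."""
    return requests_per_second * average_latency_seconds


if __name__ == "__main__":
    in_flight = average_concurrency(25, 0.020)
    print(in_flight)       # mean in-flight requests per instance
    print(in_flight / 80)  # fraction of max_concurrent_requests: 80 in use
```

At roughly half a request in flight per instance, concurrency is nowhere near the configured limit, which supports the view that the second instance comes from the idle-instance and latency settings rather than from a lack of concurrency.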
My App Engine service is written in Go. I have code that connects to Cloud Datastore before the server even listens on the port. There is a single projection query that takes about 500ms to read just 4 entities. Does the first interaction with Datastore potentially have higher latency because a connection needs to be established? Is there any way to reduce this Datastore connection latency? Also, is there any difference between doing this db call before listening on the port and doing it within the warmup request (this is an autoscaled instance)?
As with the high initial latency for Cloud Datastore, I see a similar pattern for Cloud Tasks. Initial task creation can take as long as 500ms, but even subsequent ones take anywhere from 200 to 400ms. This is in us-central. I was actually considering moving a db update to a background task, but in general I am seeing the latency of task creation to be more or less the same as doing a transaction to read and update the data, giving no net benefit.
Finally, instance startup time is typically 2.5 to 3 seconds, with main getting called after about 2 seconds. My app startup time is the above-mentioned projection query cost of 500ms and nothing else. So, no matter how much I optimize my app startup, should I assume an additional latency of about 2 seconds?
Note that the load on the system is very light so these issues can't be because of high volume.
Update: deployment files as requested by Miguel (this is for a test environment investigating performance characteristics. Prod deployment will be more generous for instances)
default app:
service: default
runtime: go112
instance_class: F1
automatic_scaling:
  min_instances: 0
  max_instances: 1
  min_idle_instances: 1
  max_idle_instances: 1
  min_pending_latency: 200ms
  max_pending_latency: 500ms
  max_concurrent_requests: 10
  target_cpu_utilization: 0.9
  target_throughput_utilization: 0.9
inbound_services:
- warmup
backend app:
service: backend-services
runtime: go112
instance_class: B1
basic_scaling:
  idle_timeout: 1m
  max_instances: 1
200-500ms to initialize a client seems reasonable, because a remote connection is being established. A cold start of 1-2 seconds for App Engine also seems normal.
As you mentioned, you can experiment with a warmup request to reduce cold starts and initialize clients.
I would also recommend looking into the mode you are running your Datastore in (native vs datastore). There is increased latency when using datastore mode; for more info, see Cloud Datastore Best Practices.
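The "initialize clients during warmup" idea can be sketched as follows. The service in question is written in Go; this is only the general pattern, shown in Python, with a hypothetical FakeDatastoreClient standing in for a real client so the example runs without any cloud libraries.

```python
class FakeDatastoreClient(object):
    """Hypothetical stand-in for a real client; counts how often it is built."""
    created = 0

    def __init__(self):
        # A real client would pay the connection cost here.
        FakeDatastoreClient.created += 1


_client = None  # module-level cache, shared by all requests on this instance


def get_client():
    """Create the client on first use and reuse it afterwards."""
    global _client
    if _client is None:
        _client = FakeDatastoreClient()
    return _client


def warmup_handler():
    """Body of a handler for /_ah/warmup: build the client before real traffic."""
    get_client()
    return "ok"


if __name__ == "__main__":
    warmup_handler()
    get_client()
    print(FakeDatastoreClient.created)  # number of clients built
```

Whether the client is built before listening or inside the warmup handler, the point is that the first user-facing request should find it already created.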
I have noticed a recent surge in instance spawning on GAE.
In my app.yaml I have clearly defined that at most two instances should be created at a time.
application: xxx
version: 1-6-0
runtime: python27
api_version: 1
instance_class: F2
automatic_scaling:
  max_idle_instances: 2
threadsafe: true
However the dashboard is showing 4 instances and the bills are going up. How can I stop this madness? :)
I did a lot of research into this.
F instances are automatically scaled. There is no way to limit that. Hence it makes sense to move the actual work away from frontend instances and put that into a backend instance (B1 or B2). The latter provides another 8 hours free quota.
The real challenge is to re-architect the app to use a default app.yaml for web statics, a mobile.yaml for mobile requests with shorter min_pending_latency and a backend.yaml (B2) instance for handling the tasks and calculations.
All this needs to be routed properly via a dispatch.yaml. In this file you can specify which url endpoint will be handled effectively by which module.
The best way to understand it is to look at this excellent example from GAE:
It makes sense to get it working in a local environment first, before trying anything on a remote server.
dev_appserver.py dispatch.yaml app.yaml mobile.yaml backend.yaml
Also this official documentation explains some of the above in more detail.
It's pretty impressive what can be achieved with GAE.
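The routing that dispatch.yaml performs can be pictured with the small model below. It is not the App Engine implementation: the URL patterns are hypothetical, fnmatch only approximates the real pattern syntax, and the only behaviour it is meant to show is that rules are tried in order, with unmatched requests falling through to the default module. The module names follow the mobile.yaml and backend.yaml files mentioned above.

```python
import fnmatch

# Hypothetical rules, in the order they would appear in dispatch.yaml.
RULES = [
    ("*/mobile/*", "mobile"),
    ("*/tasks/*", "backend"),
]


def route(host_and_path, rules=RULES, default="default"):
    """Return the module for the first matching pattern, else the default."""
    for pattern, module in rules:
        if fnmatch.fnmatchcase(host_and_path, pattern):
            return module
    return default


if __name__ == "__main__":
    for url in ("example.com/mobile/feed",
                "example.com/tasks/recalc",
                "example.com/index.html"):
        print(url, "->", route(url))
```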
max_idle_instances
The maximum number of IDLE instances that App Engine should maintain
for this version.
It seems that it is not currently possible to set the maximum number of instances for an automatic scaling module. As @DoIT suggests, you can set a spending limit; however, keep in mind the following.
When an application exceeds its daily spending limit, any operation
whose free quota has been exhausted fails.
So if you need to control somehow the total number of instances and keep your service running, I see the following possibilities.
Change your scaling type to basic and set max_instances parameter as you like
Keep automatic scaling type and increase min_pending_latency and max_concurrent_requests parameters (multi-threading has to be enabled)
You can find more details here.
The max_idle_instances setting sets the maximum number of idle instances, i.e. instances that are waiting for a traffic spike. From your screenshot, it looks like all instances are getting traffic, so it looks OK to me. You can set a maximum daily budget if you want to control your spend on GAE.
It's possible that some of your requests are taking too long to complete and thus cause new instances to be spawned. You could work around this (I'm told, but have yet to try it myself) by setting a high value for your min_pending_latency property. This could hurt your latency a little, but would also limit the rate of instance spawning.
I understand that App Engine handles scaling automatically. However, in order to test drive some multi-instance / consolidated state scenarios, I'd like to instruct App Engine to fire up a minimum of 5 instances, even if load does not justify this.
Is there a way of doing this via app.yaml or Dashboard?
Try setting the Min Pending Latency to a very low number (e.g. 100ms), and then send a burst of requests to your app. Then the scheduler will start spinning up multiple instances to handle these requests.
You may need to use a tool for automated load testing - it will be difficult to achieve this manually.
This is what the min_idle_instances value controls.
In app.yaml:
automatic_scaling:
  min_idle_instances: 5
It is possible to use either automatic_scaling or manual_scaling. With automatic scaling, you can only influence how many instances are started; with manual scaling, you decide exactly how many instances you wish to use. Place the following in your app.yaml, or module yaml file:
manual_scaling:
  instances: 5