I run a Node.js backend on Google App Engine for my website app (which is hosted statically elsewhere). My QPS is very low.
However, my number of instances regularly goes above 1, to 2 and sometimes as high as 3 or 4. This blows past the free quota and I start having to pay.
My costs were below $0.10 per month but are now regularly above $8, which is a worrying trend. My user base and usage patterns haven't changed significantly enough to explain this (and, as mentioned, the QPS is low IMHO).
I noticed that the memory usage is relatively high and I'm wondering if I should investigate potential memory leaks in my app.
More background on the app: it mostly handles auth and fetches and stores data in MongoDB Atlas.
I would go with a two-step process.
First, temporarily set max_instances to 1 in your app.yaml file. This means App Engine will spin up at most one instance. The risk is that if you get a spike in traffic, the app may be slow for some users.
app.yaml
automatic_scaling:
  max_instances: 1
Then investigate whether you have memory leaks and see where you can optimize your code. Once you have found and fixed the issues, you can remove max_instances or raise its value.
From the screenshots you’ve shared, I can see that your number of instances increases as your memory usage does, and decreases in the same way. Therefore, I would recommend increasing the amount of memory assigned to each instance so that fewer instances are needed to serve your application. As this guide recommends when you begin testing your application:
start with a lower machine (1 core CPU and 1 GB RAM) and increase the minimum instances that the App Engine should spawn instead
Also, the guide says that:
Since App Engine bills you on the number of Cores and RAM used per hour, you can save up to 40% of your costs by using the former setup with lower machines
Based on that, it is better to allocate more memory to each instance to avoid excessive costs.
The App Engine pricing page says:
Billing for the memory resource includes the memory your app uses plus the memory that the runtime itself needs to run your app. This means your memory usage and costs can be higher than the maximum memory you request for your app.
In your app.yaml configuration file, you can set the memory_gb parameter, which defaults to 0.6 GB:
The requested memory for your application, which does not include the ~0.4 GB of memory that is required for the overhead of some processes. Each CPU core requires a total memory between 0.9 and 6.5 GB.
To calculate the requested memory:
memory_gb = cpu * [0.9 - 6.5] - 0.4
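For example, with a single core the total memory must fall between 0.9 and 6.5 GB, so memory_gb can range from 0.5 to 6.1 GB. A resources block requesting roughly 2 GB for a one-core instance could look like this (the values are only illustrative):
app.yaml
resources:
  cpu: 1
  memory_gb: 2.0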
Additionally, you should check your code for memory leaks to avoid your application consuming more memory than expected.
Finally, you can use the pricing calculator to estimate the costs for your application based on the tweaks you make.
Related
I have a few requests that need to use an extensive amount of memory, i.e. about 40 MB more than other requests.
At the default of 10 max concurrent requests on an F1 auto-scaling instance, that can potentially use 400+ MB, which is way more than the roughly 130 MB of system memory it has available. There is no memory utilization setting in the yaml file, so I wonder what can be done to prevent such situations.
Google App Engine doesn't have any memory utilization setting besides Python garbage collection.
My advice is:
Try to release memory as soon as the response is sent (see the sketch after this list).
Try to optimize memory usage in that part; maybe you need another service to help solve the memory problem, e.g. serving files via Google Cloud Storage, etc.
Scale the instance up to F2, which is more suitable for production, but you will still need to optimize your memory usage as traffic grows.
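A minimal sketch of the first point (the helper names are hypothetical) is to keep large intermediates local, drop the reference as soon as you have the result, and nudge the collector before the instance handles the next request:
import gc

def handle_heavy_request(payload):
    # Keep the large intermediate structure local to this handler.
    big_data = expensive_transform(payload)   # hypothetical helper
    result = summarize(big_data)              # hypothetical helper

    # Drop the reference and ask the garbage collector to reclaim
    # the memory before the next request is served.
    del big_data
    gc.collect()
    return result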
I have a GAE standard Python app that does some fairly heavy computational processing. I need to complete the processing within the 60-second request time limit, and ideally I'd like to do it faster for a better user experience.
Splitting the work across multiple threads doesn't seem like a good solution because the threads would likely run on the same CPU and thus wouldn't give a speedup.
I was wondering if Google Cloud Functions (GCF) could be used in a similar manner as threads. For example, if I create a GCF to do the processing, split my work into 10 chunks, and make 10 GCF calls in parallel, can I expect to get results 10x faster? (aside from latency and GCF startup costs)
Each function invocation runs in its own server instance, and a function will scale up to 1000 instances to handle concurrent requests in parallel. So yes, you can do this, if you are willing to potentially pay the cold start cost of each server instance as it's allocated for its first request.
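As a rough sketch of that fan-out (the endpoint URL and payload shape are hypothetical), assuming an HTTP-triggered function:
from concurrent.futures import ThreadPoolExecutor
import requests

# Hypothetical HTTP-triggered Cloud Function endpoint.
CHUNK_URL = "https://REGION-PROJECT.cloudfunctions.net/process_chunk"

def process_in_parallel(chunks):
    # Each POST is handled by its own function instance, so the chunks
    # are processed concurrently rather than one after another.
    with ThreadPoolExecutor(max_workers=len(chunks) or 1) as pool:
        futures = [pool.submit(requests.post, CHUNK_URL, json={"chunk": c})
                   for c in chunks]
        return [f.result().json() for f in futures]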
If you're able to split the workload into smaller chunks that you launch in parallel via separate (external) requests, I suspect you'd get better performance (and cost) by using GAE itself (maybe in a separate service, see the sketch after this list) instead of CFs:
GAE standard environment instances can have higher CPU speeds - a B8 instance has 4.8 GHz, the max CF CPU speed is 2.4 GHz
you have better control over the GAE scaling configuration and starting time penalties
I suspect networking delays would be at least the same, if not better, on GAE since you're not crossing into another product's infrastructure (unsure, though)
GAE costs would likely be smaller since you pay per instance hour (regardless of how many requests the instance handles), not per request/invocation
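A separate GAE service along those lines could be declared with something like the following app.yaml (the service name, runtime, and scaling values are placeholders):
service: worker            # placeholder name for the compute service
runtime: python27          # whichever standard runtime the app uses
instance_class: B8
basic_scaling:
  max_instances: 10
  idle_timeout: 5m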
We have hit a roadblock moving an app to production scale and were hoping to get some guidance. The application is a pretty common stream-processing use case, but it requires maintaining a large number of keyed states. We are processing two streams: one is a daily burst stream (normally around 50 million events, but it could go up to 100 million in a one-hour burst) and the other is a constant stream of around 70-80 million events per hour. We are doing a low-level join between the two keyed streams using a CoProcessFunction. The CoProcessFunction needs to refresh (upsert) state from the daily burst stream and decorate the constantly streaming data with values from the state built from the bursty stream. All of the logic works pretty well in a standalone dev environment, where we throw about 500k events of bursty traffic at the state and about 2-3 million events at the data stream, using 1 TM with 16 GB of memory, 1 JM with 8 GB of memory, and 16 slots (1 per core) on the server. We have been taking savepoints in case we need to restart the app for code changes etc., and the app does seem to recover its state very well. Based on the savepoints, the total volume of state in the production flow should be around 25-30 GB.
At this point, however, we are trying to deploy the app at production scale. The app also has a flag that can be set at startup to ignore the data stream so we can simply initialize state. So basically we are trying to see if we can initialize the state first and take a savepoint as a test. We are currently using 10 TMs with 4 slots and 8 GB of memory each (the idea was to start by allocating around 3 times the estimated state size), but the TMs keep getting killed by YARN with a "GC overhead limit exceeded" error. We have gone through quite a few blogs/docs on Flink managed memory, off-heap vs. heap memory, disk spill-over, state backends, etc. We did try to tweak the managed-memory configs in multiple ways (off/on heap, fraction, network buffers, etc.) but can't seem to find a good way to fine-tune the app to avoid these issues. Ideally, we would hold state in memory for performance reasons (we do have enough capacity in the production environment for it) and spill over to disk (which I believe Flink should provide out of the box?). It feels like 3x the anticipated state volume in cluster memory should have been enough just to initialize the state. So instead of simply continuing to increase memory (which may or may not help, since the error is about GC overhead), we wanted to get some input from experts on best practices and how to plan this application better.
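(For reference, the out-of-the-box spill-over I have in mind is presumably the RocksDB state backend, i.e. something along these lines in flink-conf.yaml, with an illustrative checkpoint path:)
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: hdfs:///flink/checkpoints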
Appreciate your input in advance!
I am developing an application using App Engine to collect, store and deliver data to users.
During my tests, I have 4 data sources which send HTTP POST requests to the server every 5s (all requests are exactly uniform).
The server stores received data to the datastore using Objectify.
At the beginning, all requests are managed by 1 instance (class F1) with 0.8 QPS, a latency of 80 ms and 80 MB of memory.
But during the following hours, the memory used increases and goes over the limit of an F1 instance.
However, the scheduler doesn't start another instance, and when I stop all traffic, the average memory never decreases.
Now the instance uses 150 MB of memory instead of 128 MB (the limit of the F1 class), even though I have stopped all traffic.
I tried setting the performance settings manually and automatically, and disabling Appstats, without any improvement.
I use Memcache and the datastore, don't have any cron jobs or task queues, and the traffic is always the same.
What are the possible reasons the average memory increase?
Is it a bug of the admin console?
What determines the amount of memory used per request?
Another question:
Does Google have a special discount for datastore reads/writes (>30 million ops/day)?
Thank you,
Joel
Regarding a special price, I don't think there is one. If your app needs this amount of read/write quota, you should look into optimizing to minimize writes and perhaps implement some sort of bulk writing if possible.
On the memory issue: you should post your code, since there are too many things to look into when discussing memory usage; knowing more about your case will help in producing a straight answer.
Cheers,
Kjartan
My Solr 4 instance is slow and I don't know why.
I am attempting to modify the configurations of JVM, Tomcat6 and Solr 4 in order
to optimize performance, with queries per second as the key metric.
Currently I am running on an EC2 small tier with Debian squeeze, but ready to switch to Ubuntu if needed.
There is nothing special about my use case. The index is small. Queries do include a moderate number of unions (e.g. 10), plus faceting, but I don't think that's unusual.
My understanding is that these areas could need tweaking:
Configuring the JVM Garbage collection schedule and memory allocation ("GC tuning is a precise art form", ref)
Other JVM settings
Solr's Query Result cache, Filter cache, Document cache settings
Solr's Auto-warming settings
There are a number of ways to monitor the performance of Solr:
SolrMeter
Sematext SPM
New Relic
But none of these methods indicate which settings need to be adjusted, and there's no guide that I know of that steps through an exhaustive list of settings that could possibly improve performance. I've reviewed the following pages (one, two, three, four), and gone through some rounds of trial and error so far without improvement.
Questions:
How to tell JVM to use all the 2 GB memory on the small EC2 instance?
How to debug and optimize JVM Garbage Collection?
How do I know when I/O throttling, such as the new EBS IOPS pricing, is the issue?
Using figures like the New Relic examples below, how to detect problematic behavior and how to approach solutions.
Answers:
I'm looking for a link to good documentation for setting up and optimizing Solr 4, from a DevOps or server admin perspective (not index or application design).
I'm looking for the top trouble spots in catalina.sh, solrconfig.xml, solr.xml (other?) that are most likely causes of problems.
Or any tips you think address the questions.
First, you should not focus on switching your Linux distribution. A different distribution might bring some changes, but considering the information you gave, nothing proves that these changes would be significant.
You are mentioning lots of possible optimizations, which can be overwhelming. You should consider a tweaking area only once you have proven that the problem lies in that particular part of your stack.
JVM Heap Sizing
You can use the parameter -mx1700m to give a maximum of 1.7 GB of RAM to the JVM. HotSpot might not need it, so don't be surprised if your heap capacity does not reach that number.
You should set the minimum heap size to a low value, so that HotSpot can optimise its memory usage. For instance, to set a minimum heap size of 128 MB, use -ms128m (the short form of -Xms128m).
Garbage Collector
From what you say, you have limited hardware (1 core at 1.2 GHz max, see this page):
M1 Small Instance
1.7 GiB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
...
One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2
GHz 2007 Opteron or 2007 Xeon processor
Therefore, using the low-latency GC (CMS) won't do any good. It won't be able to run concurrently with your application since you have only one core. You should switch to the throughput GC using -XX:+UseParallelGC -XX:+UseParallelOldGC.
Is the GC really a problem?
To answer that question, you need to turn on GC logging. It is the only way to see whether GC pauses are responsible for your application's response times. You can turn it on with -Xloggc:gc.log -XX:+PrintGCDetails.
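Putting the flags together (assuming they are passed through JAVA_OPTS in catalina.sh or setenv.sh), you would end up with something like:
JAVA_OPTS="$JAVA_OPTS -ms128m -mx1700m \
  -XX:+UseParallelGC -XX:+UseParallelOldGC \
  -Xloggc:gc.log -XX:+PrintGCDetails"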
But I don't think the problem lies here.
Is it a hardware problem?
To answer this question, you need to monitor resource utilization (disk I/O, network I/O, memory usage, CPU usage). You have a lot of tools to do that, including top, free, vmstat, iostat, mpstat, ifstat, ...
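For example, sampling every few seconds is usually enough to spot a saturated resource:
vmstat 5        # memory, swap, run queue and CPU usage every 5 seconds
iostat -x 5     # extended per-device disk I/O statistics every 5 seconds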
If you find that some of these resources are saturating, then you need a bigger EC2 instance.
Is it a software problem?
In your stats, the document cache hit rate and the filter cache hit rate are healthy. However, the query result cache hit rate looks pretty low, which means many queries are actually executed rather than served from the cache.
You should monitor the query execution time. Depending on that value you may want to increase the cache size or tune the queries so that they take less time.
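For example, the query result cache is sized in solrconfig.xml; the values below are only a starting point to experiment with:
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="128"/>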
More links
JVM options reference: http://jvm-options.tech.xebia.fr/
Feedback I wrote on an application performance audit: http://www.pingtimeout.fr/2013/03/petclinic-performance-tuning-about.html
Hope that helps!