Tracking down a memory leak in a Google App Engine Golang application? - google-app-engine

I saw this Python question: App Engine Deferred: Tracking Down Memory Leaks
... Similarly, I've run into this dreaded error:
Exceeded soft private memory limit of 128 MB with 128 MB after servicing 384 requests total
...
After handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application.
According to that other question, it could be that the "instance class" is too small to run this application, but before increasing it I want to be sure.
After checking through the application I can't see anything obvious as to where a leak might be (for example, unclosed buffers, etc.) ... and so whatever it is it's got to be a very small but perhaps common mistake.
Because this runs on GAE, I can't easily profile it locally as far as I know, since GAE is the runtime environment. Might anyone have a suggestion on how to proceed and ensure that memory is being recycled properly? I'm somewhat new to Go, but I've enjoyed working with it so far.

For a starting point, you might be able to try pprof.WriteHeapProfile. It'll write to any Writer, including an http.ResponseWriter, so you can write a view that checks for some auth and gives you a heap profile. An annoying thing about that is that it's really tracking allocations, not what remains allocated after GC. So in a sense it's telling you what's RAM-hungry, but doesn't target leaks specifically.
The standard expvar package can expose some JSON including memstats, which tells you about GCs and the number of allocs and frees at particular allocation sizes (example). If there's a leak, you could use allocs-minus-frees to get a sense of whether it's large allocations or small ones that are growing over time, but that's not very fine-grained.
Finally, there's runtime/debug.WriteHeapDump, which dumps the current state of the heap to a file descriptor, but I'm not sure it works on GAE and it seems to be rarely used.
Note that, to keep GC work down, Go processes grow to be about twice as large as their actual live data as part of normal steady-state operation. (The exact % it grows before GC depends on runtime.GOGC, which people sometimes increase to save collector work in exchange for using more memory.) A (very old) thread suggests App Engine processes regulate GC like any other, though they could have tweaked it since 2011. Anyhow, if you're allocating slowly (good for you!) you should expect slow process growth; it's just that usage should drop back down again after each collection cycle.

A possible way to check whether your app indeed has a memory leak is to temporarily upgrade the instance class and watch the memory usage pattern (in the Developer Console, on the Instances page, select the Memory Usage view for the respective module version).
If the pattern eventually levels out and the instance no longer restarts, then your instance class was indeed too small. Done :)
If the usage keeps growing (at a rate proportional to the app's activity), then you indeed have a memory leak. During this exercise you might also be able to narrow the search area, if you manage to correlate the growth regions of the graph with certain activities of the app.
Even if there is a leak, using a higher instance class should increase the time between instance restarts, maybe even making them tolerable (comparable with the automatic shutdown of dynamically managed instances, for example). That would let you put the memory-leak investigation on the back burner and focus on more pressing matters, if that's of interest to you. One could look at such restarts as an instance refresh/self-cleaning "feature" :)

Related

Why do APM tools like AppDynamics or VisualVM show heap memory peaks during idle state?

We are using AppDynamics and VisualVM to monitor our application's heap memory usage. We see a similar graph to the ones in these questions - this and this.
The red boxes show heap usage while the system is idle: peaks appear only when the system is idle, and are even observed when no application is deployed.
The green arrow points to the actual application-in-use state: when the system is in use, we see relatively little heap usage being reported.
Based on the clarifications in other SO questions, if we say it is due to garbage collection, why would GC not occur during application use? When the system is idle, we see system objects like java.lang.String, byte[], int[], etc. reported in AppDynamics, but how do we find who is responsible for creating them?
Again, in the heap dumps taken during the idle state, we see only 200 MB out of 500 MB of memory used, even though the server has a dedicated -Xmx4g configuration.
How should we make sense of these observations?
On analyzing the heap dump taken during the idle state, we only see various WebAppClassLoaders holding instances of different library classes.
This pattern is also described in official blogs of APM experts like Plumbr and Datadog as the sign of a healthy JVM where regular GC activity is occurring; they explain that it means none of the objects will stay in memory forever.
From Plumbr blog:
Seeing the following pattern is a confirmation that the JVM in question is definitely not leaking memory.
The reason for the double-sawtooth pattern is that the JVM needs to allocate memory on the heap as new objects are created as a part of the normal program execution. Most of these objects are short-lived and quickly become garbage. These short-lived objects are collected by a collector called “Minor GC” and represent the small drops on the sawteeth.

React/Redux - memory footprint not proportionate to state

So my whole Redux state is perhaps around 3-4mb, but Chrome is reporting my tab's memory usage at around 400-500mb, which climbs the longer you use it.
I understand there are other things it needs the memory for (so I shouldn't expect a 1:1 relationship), but does anyone know how I'd attempt to reduce memory consumption?
On a fresh session (or Incognito tab), my app is running very smoothly. If it's open for an afternoon or so, performance suffers greatly.
- My Redux store isn't overly large
- The same page/DOM nodes etc. in both the normal and Incognito tabs
- Everything else is seemingly identical
I get that this is fairly vague, but I'm not sure what else to include. Anyone have any pointers?
Please use the Google Chrome Performance Analysis Tools to analyse the performance of your app and see where savings can be made.
That said, Google Chrome can be a memory-hungry application in general, so consider whether this is actually a problem: if the Chrome session does not consume enough RAM to hurt the overall computer's performance, then it is not. Try running the application on a computer with very little RAM; as long as it stops consuming memory before it impacts performance, it is a non-issue.
If it does not do this and it begins to consume more and more memory, you likely have a memory leak and should look to resolve this with the tools linked above.

Appengine frontend instances have been using more and more RAM, how can I reduce this?

My instances all now start at 140 MB and average just under 200 MB. If left long enough they start hitting 240 MB. However, my question is more about the memory being used right after a fresh instance boots. I store nothing on the instances. Every request fetches stuff from memcache and the datastore, and I don't use singletons.
All I have are classes and a lot of static resources that deploy with the instances. I use JSPs extensively (if that makes a difference).
Thanks for any assistance!
I'm going from memory here, since I have used Java on App Engine for a few years. This may be stale.
The JVM doesn't like to release memory. When an instance is created and services a request, the memory watermark goes up. Garbage collection may 'free' part of that memory in the sense of making it available for reuse, but the high watermark on process memory doesn't necessarily go down. A subsequent request may need allocations that aren't available as free chunks, so the watermark on memory goes up again. If the app isn't configured to serve multiple requests simultaneously, memory use follows something like a sigmoid curve. If multiple requests are being processed simultaneously, the watermark is raised further.
That said, a common cause of unexpected memory growth is queries that retrieve more rows than are necessary, with filtering happening in the app.
But without more information, your specific case is impossible to diagnose.
I believe I figured out why my project was taking up an ever increasing amount of ram. I happen to have a lot of static resources in my project and it would appear that these static resources all get loaded into the frontend instance memory (probably for speed). I managed to free up huge amounts of memory by moving my static resources off of my primary application servers.

Google App Engine Memory Limit - Task Queue

Is there a memory limit to the Task Queue on Google App Engine? I'm specifically concerned with the Go runtime, but it would be nice to get answers on all runtimes if someone can provide them.
The tasks are executed by the same app instance(s) as the regular requests, only tasks are allowed to run longer. So the same memory limits apply (also subject to the task queue specific limits and quota, which might also eat into the instance memory).
You might choose to direct tasks to a dedicated module to which you assign an instance class with more memory (and more power as well), if memory consumption is a concern.
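Routing a queue to a dedicated module is done with the `target` field in queue.yaml; a sketch (the queue and module names here are made up):

```yaml
queue:
- name: heavy-work
  rate: 5/s
  target: highmem-module  # a module assigned a larger instance class
```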
But since the max instance class memory size is currently 1 GB, I suspect your instance will most likely hit the 'soft private memory limit' and be killed before loading a full 1 GB file into memory :)
A "task" is essentially represented by a URL that's stored away for later delivery to an instance of your app. The representation of a task is independent of language, unless you use a stringified, language-specific serialization of something as a value.
If by memory limit you mean "how much (where much = count*size) task queue stuff can I have pending?," the answer is spelled out in the Task Queue section of the quotas document.
If you're asking how big a single task can be, that will depend on the memory size of your instances, since you'll need enough memory to construct a task before enqueuing it.
For task processing the app instances need enough memory to accept and process a task, or enough memory to accept and process many concurrently, if your app is configured to accept multiple simultaneous requests. How much memory that takes beyond accepting the URL is basically up to how the app is coded.

Do I need a custom memory allocator?

I'm currently developing a cache server which for its own nature uses a lot of RAM (I'm testing it on a server with a lot of HTTP traffic and both a WordPress and a custom web application using it to cache data in memory).
The server obviously performs a lot of malloc/realloc/free operations, which are expensive, so I was wondering if I should use a custom memory allocator: perhaps something that preallocates a big memory pool at startup, hands out free pieces of the requested size when malloc/realloc is called, and flags them as freed when free is called.
Am I on the right path, or do I not really need such a thing? Is there an allocator like that, or do I have to write my own?
Important notes:
- The server is single-threaded (using multiplexing), so I don't need allocators that shine in multi-threaded applications (such as jemalloc, which as far as I understood is just as good as normal malloc in single-threaded applications... correct me if I'm wrong, please).
- Before you ask/suggest: I've already used Valgrind to remove every possible memory leak. I just need to optimize, not fix.
- Memory fragmentation is a problem, so the approach should address that too.
- Through an appropriate configuration directive, the user can set the server's maximum usable memory, which is why a preallocated fixed memory pool came to mind.
I don't have performance problems; I'm developing this just for fun and curiosity. I like to learn and experiment with new programming techniques.
And yes, I've used callgrind, and malloc is one of the most expensive operations.
Since you said you don't have a performance problem, you don't need to do anything. Set that aside.
You need a foothold to get any kind of improvement, since malloc is really fast. Offhand I recall about 100-200 cycles, on Mac OS X from a few years ago. (But free can also take more time, which should show up in the profiling statistics.) Writing a better general-purpose allocator is essentially impossible, beyond skill and luck.
Patterns specific to your application can still expose opportunities, though. I've had luck with programs that free objects in roughly the same order as their creation.
- Create some memory buckets.
- malloc returns blocks from the current bucket in linear (bump-pointer) fashion.
- free marks blocks as unused. This may use a bitmap, a managed smart pointer, or the like. For pure garbage collection, no explicit call is needed.
- When the last bucket is full, sweep the least recently used one(s), but only check whether each is completely empty.
- If there is no empty bucket, make a new one.
This is a horrible strategy if there's any fragmentation but it can improve on plain malloc by a couple orders of magnitude, because scanning along a bucket until the end just takes 1-2 cycles instead of 100-200, and for little enough fragmentation you can always make the sweeps infrequent enough.
The one-in-a-million slow sweep also disqualifies this approach from many applications.
I don't know much about cache servers, but perhaps this approach would work for HTTP connection objects which are highly transient. You need to focus not on all of malloc, but the subset of allocations which incur the biggest cost in the most predictable pattern.
