Why do APM tools like AppDynamics or VisualVM show heap memory peaks during idle state?

We are using AppDynamics and VisualVM to monitor our application's heap memory usage. We see a graph similar to the ones in these questions - this and this.
The red boxes show heap usage while the system is idle: peaks appear only when the system is idle, and they are observed even when no application is deployed.
The green arrow points to the period when the application is actually in use: when the system is busy, comparatively little heap usage is reported.
Based on the clarifications in other SO questions, if we attribute this to garbage collection, why would GC not occur while the application is in use? When the system is idle, AppDynamics reports system objects like java.lang.String, byte[], int[] etc., but how do we find out who is responsible for creating them?
Also, in heap dumps taken during the idle state, we see only 200MB out of 500MB of memory used, even though the server has a dedicated -Xmx4g configuration.
How should we make sense of these observations?
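One way to ground these observations is to confirm that the idle-state peaks line up with ordinary GC cycles. A minimal sketch, assuming you can run diagnostic code inside the monitored JVM (the polling interval is arbitrary):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: poll the JVM's GC counters to confirm that collections are
// actually happening while the system is idle. Run inside the monitored
// JVM, e.g. from a diagnostic servlet or a scheduled task.
public class GcActivityProbe {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(60_000); // sample once a minute
        }
    }
}
```

If the collection counts tick up in step with the idle-state peaks, the sawtooth is ordinary allocation-and-collection churn from background threads (timers, monitoring agents, keep-alives), not a leak.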

Analyzing a heap dump taken while the system was idle, we see only various WebAppClassLoaders holding instances of different library classes.
This pattern is also described in the blogs of APM vendors such as Plumbr and Datadog as the sign of a healthy JVM in which regular GC activity is occurring; it means none of the objects stays in memory forever.
From Plumbr blog:
Seeing the following pattern is a confirmation that the JVM in question is definitely not leaking memory.
The reason for the double-sawtooth pattern is that the JVM needs to allocate memory on the heap as new objects are created as part of normal program execution. Most of these objects are short-lived and quickly become garbage. These short-lived objects are collected by minor GC events and account for the small drops on the sawtooth.
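The pattern is easy to reproduce. A minimal sketch that churns short-lived objects; attach VisualVM and the used-heap graph shows the same sawtooth:

```java
// Sketch: allocate short-lived garbage in a loop and watch the used-heap
// graph in VisualVM rise and then drop with each minor GC.
public class SawtoothDemo {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            byte[] garbage = new byte[64 * 1024]; // short-lived allocation
            garbage[0] = 1;                       // touch it so it isn't optimized away
            Thread.sleep(1);                      // slow the churn so the graph is readable
        }
    }
}
```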

Related

Flink taskmanager out of memory and memory configuration

We are using Flink streaming to run a few jobs on a single cluster. Our jobs use RocksDB to hold state.
The cluster is configured to run with a single JobManager and 3 TaskManagers on 3 separate VMs.
Each TM is configured to run with 14GB of RAM.
The JM is configured to run with 1GB.
We are experiencing two memory-related issues:
- When running a TaskManager with an 8GB heap allocation, the TM ran out of heap memory and we got a heap out-of-memory exception. Our solution was to increase the heap size to 14GB. This configuration seems to have solved the issue, as we no longer crash due to heap exhaustion.
- Still, after increasing the heap size to 14GB (per TM process), the OS runs out of memory and kills the TM process. RES memory rises over time, reaching ~20GB per TM process.
1. How can we predict the maximal total amount of physical memory and the heap size configuration?
2. Given our memory issues, is it reasonable to use non-default values for Flink's managed memory? What would be the guideline in such a case?
Further details:
Each VM is configured with 4 CPUs and 24GB of RAM.
Using Flink version: 1.3.2
The total amount of required physical and heap memory is quite difficult to compute since it strongly depends on your user code, your job's topology and which state backend you use.
As a rule of thumb, if you experience OOM and are still using the FileSystemStateBackend or the MemoryStateBackend, then you should switch to RocksDBStateBackend, because it can gracefully spill to disk if the state grows too big.
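The switch itself is a one-liner; a minimal sketch, assuming the RocksDB state backend API of this Flink era (the checkpoint URI is a placeholder, and the boolean enables incremental checkpoints, supported for RocksDB since Flink 1.3):

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Keep operator state in RocksDB so it can spill to disk instead of
        // living on the JVM heap. The checkpoint URI is a placeholder.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
        // ... build and execute the job topology here ...
    }
}
```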
If you are still experiencing OOM exceptions as you have described, then you should check whether your user code keeps references to state objects or otherwise creates large objects that cannot be garbage collected. If that is the case, try to refactor your code to rely on Flink's state abstraction, because with RocksDB the state can go out of core.
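For illustration, "relying on Flink's state abstraction" means holding per-key values in managed state rather than in your own collections, so RocksDB can keep them out of core. A hedged sketch; the operator name and types are made up:

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Sketch: a running per-key sum held in Flink-managed ValueState instead
// of a user-held map, so the backend can spill it to disk.
public class RunningSum extends RichFlatMapFunction<Long, Long> {
    private transient ValueState<Long> sum;

    @Override
    public void open(Configuration parameters) {
        sum = getRuntimeContext().getState(
                new ValueStateDescriptor<>("sum", Long.class));
    }

    @Override
    public void flatMap(Long value, Collector<Long> out) throws Exception {
        Long current = sum.value();
        long updated = (current == null ? 0L : current) + value;
        sum.update(updated);
        out.collect(updated);
    }
}
```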
RocksDB itself needs native memory which adds to Flink's memory footprint. This depends on the block cache size, indexes, bloom filters and memtables. You can find out more about these things and how to configure them here.
Last but not least, you should not activate taskmanager.memory.preallocate when running streaming jobs, because streaming jobs currently don't use managed memory. By activating preallocation, you would allocate memory for Flink's managed memory, which reduces the available heap space.
Using RocksDBStateBackend can lead to significant off-heap/direct memory consumption, up to the available memory on the host. Normally that isn't a problem when the task manager process is the only big memory consumer, but if other processes have dynamically changing memory allocations, it can lead to out-of-memory situations. I came across this post because I'm looking for a way to cap the RocksDBStateBackend memory usage. As of Flink 1.5, there are alternative option sets available here. It appears, though, that these can only be activated programmatically, not via flink-conf.yaml.
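Programmatic activation looks roughly like this; a sketch, assuming the RocksDBStateBackend API of that era (the profile chosen and the checkpoint URI are just examples):

```java
import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TunedRocksDbJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints");
        // Predefined RocksDB option profiles trade native memory (block
        // cache, memtables) against disk I/O; size the choice against the
        // RAM actually available on the VM.
        backend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED);
        env.setStateBackend(backend);
    }
}
```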

Tracking down memory leak in Google App Engine Golang application?

I saw this Python question: App Engine Deferred: Tracking Down Memory Leaks
... Similarly, I've run into this dreaded error:
Exceeded soft private memory limit of 128 MB with 128 MB after servicing 384 requests total
...
After handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application.
According to that other question, it could be that the "instance class" is too small to run this application, but before increasing it I want to be sure.
After checking through the application I can't see anything obvious as to where a leak might be (for example, unclosed buffers, etc.) ... so whatever it is, it's got to be a very small but perhaps common mistake.
Because this is running on GAE, I can't easily profile it locally, as far as I know, since GAE is the runtime environment. Might anyone have a suggestion as to how to proceed and ensure that memory is being recycled properly? — I'm sort of new to Go but I've enjoyed working with it so far.
For a starting point, you might be able to try pprof.WriteHeapProfile. It'll write to any Writer, including an http.ResponseWriter, so you can write a view that checks for some auth and gives you a heap profile. An annoying thing about that is that it's really tracking allocations, not what remains allocated after GC. So in a sense it's telling you what's RAM-hungry, but doesn't target leaks specifically.
The standard expvar package can expose some JSON including memstats, which tells you about GCs and the number of allocs and frees of particular allocation sizes (example). If there's a leak, you could use allocs minus frees to get a sense of whether large or small allocations are growing over time, but that's not very fine-grained.
Finally, there's a function to dump the current state of the heap, but I'm not sure it works in GAE and it seems to be kind of rarely used.
Note that, to keep GC work down, Go processes grow to be about twice as large as their actual live data as part of normal steady-state operation. (The exact % it grows before GC depends on runtime.GOGC, which people sometimes increase to save collector work in exchange for using more memory.) A (very old) thread suggests App Engine processes regulate GC like any other, though they could have tweaked it since 2011. Anyhow, if you're allocating slowly (good for you!) you should expect slow process growth; it's just that usage should drop back down again after each collection cycle.
A possible approach to check whether your app really has a memory leak is to temporarily upgrade the instance class and watch the memory usage pattern (in the developer console, on the Instances page, select the Memory Usage view for the respective module version).
If the pattern eventually levels out and the instance no longer restarts, then your instance class was indeed too low. Done :)
If the usage pattern keeps growing (at a rate proportional to the app's activity), then you do have a memory leak. During this exercise you might also be able to narrow the search area, if you manage to correlate growth areas on the graph with certain activities of the app.
Even if there is a leak, using a higher instance class should increase the time between instance restarts, maybe even making them tolerable (comparable with the automatic shutdown of dynamically managed instances, for example). That would allow you to put the memory leak investigation on the back burner and focus on more pressing matters, if that's of interest to you. One could look at such restarts as an instance refresh/self-cleaning "feature" :)

Appengine frontend instances have been using more and more RAM, how can I reduce this?

My instances all now start at 140MB and average just under 200MB. If left long enough they start hitting 240MB. However, my question is more about the memory being used right after a fresh instance boots up. I store nothing on the instances. Every request fetches what it needs from memcache and the datastore, and I don't use singletons.
All I have are classes and a lot of static resources that deploy with the instances. I use JSPs extensively (if that makes a difference).
Thanks for any assistance!
I'm going from memory here, since I haven't used Java on App Engine for a few years, so this may be stale.
The JVM doesn't like to release memory. When an instance is created and services a request, the memory watermark goes up. Garbage collection may 'free' part of that memory in the sense of making it available for reuse, but the high watermark on process memory doesn't necessarily go down. A subsequent request may need allocations that aren't available as free chunks, so the watermark goes up again. If the app isn't configured to serve multiple requests simultaneously, memory use follows something like a sigmoid curve. If multiple requests are processed simultaneously, the watermark is pushed higher still.
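A minimal sketch of that watermark-versus-usage distinction, using only the standard Runtime API (any JVM, not GAE-specific):

```java
// Sketch: log the JVM's memory watermark vs. actual usage. totalMemory()
// is the high watermark the process has claimed; freeMemory() is the
// reusable portion inside it. GC raises free space without necessarily
// lowering the watermark.
public class MemoryWatermark {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory(); // heap currently claimed by the JVM
        long free = rt.freeMemory();   // free space inside that claimed heap
        long max = rt.maxMemory();     // ceiling the heap may grow to
        System.out.printf("used=%dMB watermark=%dMB max=%dMB%n",
                (total - free) >> 20, total >> 20, max >> 20);
    }
}
```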
That said, a common cause of unexpected memory growth is queries that retrieve more rows than are necessary, with filtering happening in the app.
But without more information, your specific case is impossible to diagnose.
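As an illustration of keeping row retrieval bounded, a hedged sketch using the old low-level GAE Java Datastore API; the entity kind and limit are made up:

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Query;
import java.util.List;

public class BoundedFetch {
    // Push the limit into the datastore query instead of fetching all rows
    // and filtering in the app; this keeps per-request allocations bounded.
    static List<Entity> firstPage() {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Query q = new Query("LogEntry"); // placeholder kind
        return ds.prepare(q).asList(FetchOptions.Builder.withLimit(100));
    }
}
```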
I believe I figured out why my project was taking up an ever-increasing amount of RAM. I happen to have a lot of static resources in my project, and it appears these static resources all get loaded into the frontend instance's memory (probably for speed). I managed to free up huge amounts of memory by moving my static resources off of my primary application servers.

What is the equivalent of a heap dump in Oracle

Doing some reading about Oracle database and I'm learning about Shared Pools.
I used this as my main reference: https://docs.oracle.com/database/121/TGDBA/tune_shared_pool.htm#TGDBA558
After reading this, one thing I'm still not clear on is how we can get a "dump" of the shared pool.
For example, let's say I have an application that is having memory consumption issues/errors due to an over-stressed shared pool... how would I go about finding out what stored procs, string variable contents, etc. are eating up all the storage?
In Java we would simply take a Heap dump. The heap dump shows packages, classes, raw data that was in the memory.
What is the equivalent of a heap dump in Oracle?
Have a look at Oradebug
From the page:
Oradebug is a command that can be executed from SQL*Plus (or Svrmgrl in Oracle 8i and lower releases) to display or dump diagnostics information.
Brief explanation on dumps here and here.
When you dump Oracle's shared pool you will get a file of tens of gigabytes, and you will block the whole database until it is done.
This is something you usually do NOT want to do on a production database.
Oracle's diagnostic capabilities go far beyond what the JVM can provide.
For a brief view of memory usage you can use V$SGASTAT and v$sga_resize_ops (a JDBC sketch of the former follows below).
You can also look into the past and analyze past problems: google for AWR, ASH, and STATSPACK reports.
For blocking problems you can use the hang analyze tool.
For data consistency problems you can use auditing or LogMiner.
For detailed tracing of a single session you can use tkprof, trca, real-time SQL monitoring, or v$active_session_history.
Oracle has something called the wait interface: whenever the database spends time doing something, some counter is increased. There are various tools that access these counters, each serving a particular purpose.
So yes, you can also dump Oracle's shared pool, but this is usually a last-resort way to diagnose problems in Oracle.
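The JDBC sketch mentioned above; the connection URL and credentials are placeholders, the Oracle JDBC driver must be on the classpath, and the session needs SELECT privileges on the V$ views:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: summarize shared pool usage from V$SGASTAT, largest consumers
// first. Connection details below are placeholders.
public class SharedPoolGlance {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT name, bytes FROM v$sgastat "
                   + "WHERE pool = 'shared pool' ORDER BY bytes DESC")) {
            while (rs.next()) {
                System.out.printf("%-40s %,15d%n", rs.getString(1), rs.getLong(2));
            }
        }
    }
}
```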

Google App Engine Memory Limit - Task Queue

Is there a memory limit to the Task Queue on Google App Engine? I'm specifically concerned with the Go runtime, but it would be nice to get answers for all runtimes if someone can provide them.
Tasks are executed by the same app instance(s) as regular requests; tasks are merely allowed to run longer. So the same memory limits apply (also subject to the task-queue-specific limits and quotas, which might also eat into the instance memory).
If memory consumption is a concern, you might choose to direct tasks to a dedicated module to which you assign an instance class with more memory (more powerful as well).
But since the max instance class memory size is currently 1GB, I suspect your instance will most likely hit the 'soft private memory limit' and be killed before loading a full 1GB file into memory :)
A "task" is essentially represented by a URL that's stored away for later delivery to an instance of your app. The representation of a task is independent of language, unless you use a stringified, language-specific serialization of something as a value.
If by memory limit you mean "how much (where much = count*size) task queue stuff can I have pending?," the answer is spelled out in the Task Queue section of the quotas document.
If you're asking how big a single task can be, that will depend on the memory size of your instances, since you'll need enough memory to construct a task before enqueuing it.
For task processing the app instances need enough memory to accept and process a task, or enough memory to accept and process many concurrently, if your app is configured to accept multiple simultaneous requests. How much memory that takes beyond accepting the URL is basically up to how the app is coded.
