Collect JVM metrics like heap usage and GC info in CloudWatch for Fargate services - heap-memory

We are using Fargate to deploy our ECS containers. It gives us CPU and memory utilization dashboards. We need to capture heap usage patterns, GC activity, and thread usage info for ECS tasks. How could we collect this information and use CloudWatch to set alarms and build a monitoring dashboard on it?
Thanks

We use Micrometer to collect our application metrics and JVM metrics.
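A minimal sketch of how the JVM metrics can be pushed to CloudWatch with Micrometer's CloudWatch registry; it assumes the micrometer-registry-cloudwatch2 and AWS SDK v2 dependencies are on the classpath, and the namespace, step, and class name below are placeholders rather than anything from the question:

import io.micrometer.cloudwatch2.CloudWatchConfig;
import io.micrometer.cloudwatch2.CloudWatchMeterRegistry;
import io.micrometer.core.instrument.Clock;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.jvm.JvmGcMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmThreadMetrics;
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;

import java.time.Duration;

public class JvmMetricsToCloudWatch {

    public static MeterRegistry createRegistry() {
        CloudWatchConfig config = new CloudWatchConfig() {
            @Override
            public String get(String key) { return null; }              // fall back to defaults
            @Override
            public String namespace() { return "my-fargate-service"; }  // placeholder namespace
            @Override
            public Duration step() { return Duration.ofMinutes(1); }    // publish interval
        };

        MeterRegistry registry =
                new CloudWatchMeterRegistry(config, Clock.SYSTEM, CloudWatchAsyncClient.create());

        // Heap/non-heap pool sizes, GC pauses, and live/daemon thread counts.
        new JvmMemoryMetrics().bindTo(registry);
        new JvmGcMetrics().bindTo(registry);
        new JvmThreadMetrics().bindTo(registry);
        return registry;
    }
}

Once the metrics land in CloudWatch under that namespace, alarms and dashboards can be built on them like on any other CloudWatch metric.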

Related

GCP Monitoring App Engine instance memory

In Google Cloud Platform Monitoring, is there any way to monitor memory usage by App Engine instance? I see that there is the Memory usage metric for the GAE Application resource type, but I don't see a Memory usage metric for the GAE Instance resource type.
My particular use case is that I'd like to see (and make alerts based on) memory usage per instance.
Currently there's no built-in metric to monitor per-instance memory. The only way to monitor memory usage within a GAE instance is by creating a custom metric; in this link you can find more information about custom metrics.
Just remember that the number of instances will increase according to the load, so a per-instance alert won't reflect the actual load of your application; that's why memory usage is normally measured per group with the built-in metric rather than per instance.
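A minimal sketch of writing such a custom metric with the google-cloud-monitoring Java client; the metric type custom.googleapis.com/instance/memory_usage, the instance_id label, and the use of the "global" resource type are illustrative assumptions, not official GAE metrics:

import com.google.api.Metric;
import com.google.api.MonitoredResource;
import com.google.cloud.monitoring.v3.MetricServiceClient;
import com.google.monitoring.v3.CreateTimeSeriesRequest;
import com.google.monitoring.v3.Point;
import com.google.monitoring.v3.ProjectName;
import com.google.monitoring.v3.TimeInterval;
import com.google.monitoring.v3.TimeSeries;
import com.google.monitoring.v3.TypedValue;
import com.google.protobuf.util.Timestamps;

import java.util.Collections;

public class InstanceMemoryMetric {

    public static void write(String projectId, String instanceId, double usedBytes) throws Exception {
        try (MetricServiceClient client = MetricServiceClient.create()) {
            // One data point: the memory currently used by this instance.
            Point point = Point.newBuilder()
                    .setInterval(TimeInterval.newBuilder()
                            .setEndTime(Timestamps.fromMillis(System.currentTimeMillis()))
                            .build())
                    .setValue(TypedValue.newBuilder().setDoubleValue(usedBytes).build())
                    .build();

            // The custom metric type and the "instance_id" label are made-up names for this example.
            Metric metric = Metric.newBuilder()
                    .setType("custom.googleapis.com/instance/memory_usage")
                    .putLabels("instance_id", instanceId)
                    .build();

            MonitoredResource resource = MonitoredResource.newBuilder()
                    .setType("global")
                    .putLabels("project_id", projectId)
                    .build();

            TimeSeries series = TimeSeries.newBuilder()
                    .setMetric(metric)
                    .setResource(resource)
                    .addPoints(point)
                    .build();

            client.createTimeSeries(CreateTimeSeriesRequest.newBuilder()
                    .setName(ProjectName.of(projectId).toString())
                    .addAllTimeSeries(Collections.singletonList(series))
                    .build());
        }
    }
}

Alerting policies can then be defined on the custom metric, keeping in mind the per-group caveat above.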

Flink: How to configure Flink such that the Taskmanagers auto restart after a failure?

How to configure Flink such that the TaskManagers auto-restart after a failure?
On YARN and Kubernetes, Flink has a native resource manager (YarnResourceManager and KubernetesResourceManager) that will arrange for the requested number of slots to be available. In other environments you'll need to use cluster-framework-specific mechanisms to take care of this yourself.
Note that for k8s, only session clusters are supported by this new, more active mode implemented by KubernetesResourceManager. Job clusters still need to be managed in the old-fashioned way, as described in the docs.
And then there are managed Flink environments where these details are taken care of for you -- e.g., Ververica Platform or Kinesis Data Analytics.
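For a standalone cluster on Kubernetes (i.e. not using the native KubernetesResourceManager), the usual cluster-framework-specific mechanism is to run the TaskManagers as a Deployment so that Kubernetes recreates failed pods. A minimal config sketch, assuming the official flink image; the names, replica count, and image tag are placeholders, and jobmanager.rpc.address still has to be supplied via flink-conf.yaml (e.g. a mounted ConfigMap) as described in the docs:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 2                     # desired TaskManager count; Kubernetes replaces failed pods
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      containers:
        - name: taskmanager
          image: flink:1.12       # placeholder image tag
          args: ["taskmanager"]
          # jobmanager.rpc.address must point at the JobManager service,
          # e.g. via a flink-conf.yaml mounted from a ConfigMap.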

How to monitor Flink Backpressure in Grafana with Prometheus metrics

The Flink Web UI has a brilliant backpressure section, but I cannot see any metrics exposed by the Prometheus reporter that could be used to detect backpressure in the same way for a Grafana dashboard.
Is there some way to get the same metrics outside of the Flink Web UI, using the metrics described here: https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html? Or even a Prometheus scraper for scraping the web API?
The back pressure monitoring that appears in the Flink dashboard isn't using the metrics system, so those values aren't available via a MetricsReporter. But you can access this info via the REST API at
/jobs/:jobid/vertices/:vertexid/backpressure
While this back pressure detection mechanism is useful, it does have its limitations. It works by calling Thread.getStackTrace(), which is expensive, and some operators (such as AsyncFunction) do critical activities in threads that aren't being sampled.
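For example, a small sketch that polls that endpoint with the JDK's HttpClient and prints the raw JSON; the host/port and the job/vertex IDs are placeholders you'd fill in from your own cluster:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BackpressureProbe {
    public static void main(String[] args) throws Exception {
        String jobId = "<jobid>";       // placeholder
        String vertexId = "<vertexid>"; // placeholder
        String url = "http://localhost:8081/jobs/" + jobId
                + "/vertices/" + vertexId + "/backpressure";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // The JSON body contains per-subtask back pressure ratios that you can
        // push into your own monitoring system.
        System.out.println(response.body());
    }
}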
Another way to investigate back pressure is to set this configuration option in flink-conf.yaml
taskmanager.network.detailed-metrics: true
and then you can look at the metrics measuring inbound/outbound network queue lengths.

Flink latency metrics not being shown

While running Flink 1.5.0 with a local environment I was trying to get latency metrics via REST (with something similar to http://localhost:8081/jobs/e779dbbed0bfb25cd02348a2317dc8f1/vertices/e70bbd798b564e0a50e10e343f1ac56b/metrics) but there isn't any reference to latency.
All of this while latency tracking is enabled, which I confirmed by checking with the debugger that the LatencyMarksEmitter is emitting the marks.
What can I be doing wrong?
In 1.5, latency metrics aren't exposed for tasks but for jobs instead, the reasoning being that latency metrics inherently contain information about multiple tasks. You have to query "http://localhost:8081/jobs/e779dbbed0bfb25cd02348a2317dc8f1/metrics" instead.

Filesystem metrics from Heapster api is not available

I set up the Heapster + InfluxDB + Grafana combination for my Minikube Kubernetes cluster. The Heapster metrics API documentation mentions filesystem metrics along with the CPU, memory, and network related APIs. I can get CPU and memory related metrics using the Heapster API, but I am not able to access filesystem metrics through the API. Any help, guys?
