How to request Flink job metrics between start-time and end-time? - apache-flink

I am trying to use Flink's monitoring REST API to retrieve some metrics for a specific time period.
Looking at the documentation, I can find the metrics of the job by navigating to http://hostname:8081/jobs/:jobid and I have the following:
{
"jid":"692c1d818afb77daaca891484e0b6a7g",
"name":"myjob",
"isStoppable":false,
"state":"RUNNING",
"start-time":1570552858876,
"end-time":-1,
"duration":62639599,
"now":1570615498475,
...
I would like to know if there is a way to request metrics for a specific start-time and end-time. The documentation does not mention whether this can be done.

I don't think you can achieve that via the REST API.
But you can definitely export Flink metrics for further analysis.
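One way to approximate a time-range query is to poll the current values yourself, store them with a timestamp, and filter afterwards. The following is only a minimal sketch under assumptions: the base URL, job id, and metric names are placeholders, and the helper names (fetch_job_metrics, record_sample, samples_between) are hypothetical. It relies on the /jobs/:jobid/metrics?get=... endpoint of the monitoring REST API, which returns current values only.

```python
import json
import time
import urllib.request


def fetch_job_metrics(base_url, job_id, metric_names):
    """Fetch the current values of the named job metrics from Flink's REST API."""
    url = f"{base_url}/jobs/{job_id}/metrics?get={','.join(metric_names)}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        # The endpoint answers with a list such as [{"id": "uptime", "value": "123"}]
        return {m["id"]: m["value"] for m in json.load(resp)}


def record_sample(store, metrics, now_ms=None):
    """Append one snapshot, timestamped in epoch milliseconds like Flink's start-time."""
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    store.append({"timestamp": ts, "metrics": metrics})


def samples_between(store, start_ms, end_ms):
    """Return the snapshots whose timestamp lies in [start_ms, end_ms]."""
    return [s for s in store if start_ms <= s["timestamp"] <= end_ms]
```

In practice a metrics reporter (for example the Prometheus reporter) plus a time-series database does the same job more robustly; the sketch only illustrates the idea of sampling and filtering by time.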

Related

Prometheus - Database Access

I'm trying to connect to a SQL Server database via Prometheus. I think I'm supposed to do this using mssql_exporter or sql_exporter but I simply don't know how. I can see the metrics of Prometheus itself and use those metrics to build a graph but, again, I'm trying to do that with a database. The query doesn't matter, I just need to somehow access a database through Prometheus. I've come this far by watching some tutorials and searching the web, but I'm afraid I'm stuck. Can anyone help me with this? Maybe there is a good tutorial I overlooked, or maybe I'm having a hard time understanding the documentation, but I would really appreciate any help. Thanks in advance.
Prometheus scrapes the metrics via HTTP. The exporters take the metrics and expose them in a format, so that prometheus can scrape them.
What you can check:
is the exporter exporting the metrics (can you reach the /metrics page with your browser or curl)
are there any warnings or errors in the logs of the exporter
is prometheus able to scrape the metrics (open prometheus - status - targets)
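The last check can also be scripted against Prometheus' HTTP API instead of the web UI. This is a rough sketch, assuming Prometheus is reachable at a URL you pass in; the function names are hypothetical. It reads /api/v1/targets and lists every active target whose health is not "up".

```python
import json
import urllib.request


def fetch_targets(prometheus_url):
    """Ask Prometheus for the state of all its scrape targets."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def unhealthy_targets(payload):
    """Return (job, scrape URL, last error) for each active target that is not 'up'."""
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target.get("health") != "up":
            problems.append((
                target.get("labels", {}).get("job", "?"),
                target.get("scrapeUrl", "?"),
                target.get("lastError", ""),
            ))
    return problems
```

Calling unhealthy_targets(fetch_targets("http://localhost:9090")) would then show which exporters Prometheus cannot scrape and why, assuming the default port.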
I've figured how to do what I asked:
You want to download Prometheus and the exporter you need.
You want to configure your 'exporter.yml' file: In my case, it was the data_source_name variable in the 'sql_exporter.yml' file. By default, it is set to:
data_source_name: 'sqlserver://prom_user:prom_password@dbserver1.example.com:1433'
So you want to change the 'prom_user:prom_password' part to your SQL Server user name and password, and the 'dbserver1.example.com' part to your server name, which is the top name you see in Object Explorer in SSMS.
After this, you need to let Prometheus know about your exporter. To do that, edit your prometheus.yml file and add a new job. Name it whatever you'd like and specify the port the exporter is listening on. After you've done that, you can check whether it worked at localhost:9090/targets (9090 being the Prometheus default port). If you can see the exporter there, this step was successful and you can now see the metrics your exporter is exporting.
You can now add prometheus as a data source to grafana and use the metrics you need to build a dashboard.
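For illustration, the new job in prometheus.yml might look like the fragment below. The job name is arbitrary, and the target host and port are assumptions (9399 is, as far as I know, the default port of sql_exporter); use whatever address your exporter actually listens on.

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: 'sql_exporter'          # any name you like
    static_configs:
      - targets: ['localhost:9399']   # assumed host:port of the exporter
```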

Prometheus alert for flink failed job?

I'm trying to monitor the availability of my flink jobs using Prometheus alerts.
I have tried the flink_jobmanager_job_uptime/downtime metrics, but they don't seem to fit since they just stop being emitted after the job has failed or finished.
I have already been pointed to the numRunningJobs metric as a way to alert on a missing job. I don't want to use this solution since I would have to update my Prometheus config each time I want to deploy a new job.
Has anyone managed to create this alert of a Flink failed job using Prometheus?
Prometheus has an absent() function that returns 1 if the metric doesn't exist. So you can just set the alert expression to something like
absent(flink_jobmanager_job_uptime) == 1
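Before putting the expression into an alert rule, it can be tried out through Prometheus' instant query API. The sketch below assumes Prometheus at a URL you supply; build_query_url and metric_is_absent are hypothetical helper names. It relies on absent() returning a single sample when no matching series exists and an empty result otherwise.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus_url, expr):
    """Build the instant-query URL for a PromQL expression."""
    return f"{prometheus_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def run_query(prometheus_url, expr):
    """Execute the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_url(prometheus_url, expr), timeout=10) as resp:
        return json.load(resp)


def metric_is_absent(payload):
    """True if an absent(...) query returned a sample, i.e. no matching series exists."""
    return payload["status"] == "success" and len(payload["data"]["result"]) > 0
```

Note that a bare absent(flink_jobmanager_job_uptime) only returns a sample when no series with that name exists at all. To watch one particular job, the selector would need a label matcher, and the label names depend on your reporter configuration.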

How to expose Hystrix jmx for Prometheus

I'm new to Hystrix and I just created my first Hystrix commands. The commands are being created and executed in a loop, so the metrics data should have been registered. I am using the Servo metrics publisher as follows:
HystrixPlugins.getInstance()
.registerMetricsPublisher(HystrixServoMetricsPublisher.getInstance());
EDIT:
Looking at the JConsole I found the related metrics definition as follows in the link:
jconsole
I am not using spring, eureka, servo to read data and run the app.
I would like to know how to expose this data in a way that Prometheus can read. I tried hystrix-prometheus, but the documentation is not helpful about where the metrics are exposed, how to get them, or how to check them.
In order to retrieve Hystrix metrics, you'll first need to get Prometheus' Java Simple Client up and running. The setup depends on your environment. Regardless of your environment, the result should be a URL where you can retrieve metrics, e.g. simple Java metrics.
Once that is up and running, you can use the line
HystrixPrometheusMetricsPublisher.register("application_name");
to register the additional Hystrix metrics. They will be served by the same URL. Please note that you will see Hystrix metrics only after the first call of a Hystrix enabled command.

how to query the status of an apache flink web console [metrics api]

Is there a URL that returns a JSON document describing the status of a Flink cluster?
E.g. system uptime, job status, number of nodes, etc.
You should definitely have a look at the Monitoring REST API.
The documentation for that feature can be accessed here
You can also access some TaskManager metrics. Unfortunately they are not yet described in the docs, but you can have a look at the source code: WebRuntimeMonitor
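As a rough illustration, the cluster-level summary is available from the /overview endpoint of the REST API. The sketch below assumes the web interface is reachable at a base URL you pass in; the function names are hypothetical, and the field names are the ones I would expect in the response of a recent Flink version, so they may differ on yours.

```python
import json
import urllib.request


def fetch_overview(base_url):
    """Fetch the cluster overview JSON from Flink's REST API."""
    with urllib.request.urlopen(f"{base_url}/overview", timeout=10) as resp:
        return json.load(resp)


def summarize_overview(overview):
    """Pick out a few status fields; missing keys simply come back as None."""
    return {
        "flink_version": overview.get("flink-version"),
        "taskmanagers": overview.get("taskmanagers"),
        "slots_total": overview.get("slots-total"),
        "slots_available": overview.get("slots-available"),
        "jobs_running": overview.get("jobs-running"),
        "jobs_failed": overview.get("jobs-failed"),
    }
```

Per-job status is available in a similar way from /jobs/overview, and the registered TaskManagers from /taskmanagers.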

Google Monitoring API : Get Values

I'm trying to use the Google Monitoring API to retrieve metrics about my cloud usage. I'm using the Google Client Library for Python.
The API advertises the ability to access over 900 Stackdriver Monitoring Metrics. I am interested in accessing some Google App Engine metrics, such as Instance count, total memory, etc. The Google API Metrics page has a list of all the metrics I should be able to access.
I've followed the guides on the Google Client Library page, but my script making the API calls is not printing the metrics; it is just printing the metric descriptions.
How do I use the Google Monitoring API to access the metrics, rather than the descriptions?
My Code:
from oauth2client.service_account import ServiceAccountCredentials
from apiclient.discovery import build
...
response = monitor.projects().metricDescriptors().get(name='projects/{my-project-name}/metricDescriptors/appengine.googleapis.com/system/instance_count').execute()
print(json.dumps(response, sort_keys=True, indent=4))
My Output
I expect to see the actual instance count. How can I achieve this?
For anyone reading this, I figured out the problem. I was assuming the values would come from the 'metric descriptors' class in the api, but that was a poor assumption.
For values, you need to use a 'timeSeries' call. For this call, you need to specify the project you want to monitor, start time, end time, and a filter (the metric you want, such as cpu, memory, etc.)
So, to retrieve the app engine project memory, the above code becomes
request = monitor.projects().timeSeries().list(name='projects/my-appengine-project',
interval_startTime='2016-05-02T15:01:23.045123456Z',
interval_endTime='2016-06-02T15:01:23.045123456Z',
filter='metric.type="appengine.googleapis.com/system/memory/usage"')
response = request.execute()
This example has the start time and end time to cover a month of data.
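Rather than hard-coding the two timestamps, they can be computed. This is a small sketch with a hypothetical helper, interval_for_last_days, which returns RFC 3339 strings in UTC suitable for the interval_startTime and interval_endTime parameters; it assumes any datetime passed in as now is already in UTC.

```python
from datetime import datetime, timedelta, timezone


def interval_for_last_days(days, now=None):
    """Return (start, end) as RFC 3339 UTC strings covering the last `days` days."""
    end = now or datetime.now(timezone.utc)  # `now` is assumed to be in UTC
    start = end - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)
```

The two returned values can then be passed as interval_startTime and interval_endTime in the timeSeries().list(...) call shown above.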
