Flink custom metrics are not shown in Datadog - apache-flink

In Flink, I am generating custom metrics in a FlatMapFunction using Python.
from pyflink.datastream.functions import FlatMapFunction, RuntimeContext

class OccupancyEventFlatMap(FlatMapFunction):
    def open(self, runtime_context: RuntimeContext):
        mg = runtime_context.get_metrics_group()
        self.counter_sum = mg.counter("my_counter_sum")
        self.counter_total = mg.counter("my_counter_total")

    def flat_map(self, value):
        self.counter_sum.inc(10)
        self.counter_total.inc()
I am able to query the metric using the REST API
http://localhost:43491/jobs/9a376e28a1bb022b45c127d75fb1b447/vertices/5239a5f0e3e9cdca6a88500e58b5759e/metrics?get=0.FlatMap.my_counter_sum
[{"id":"0.FlatMap.my_counter_sum","value":"28201"}]
But I don't see any of my custom metrics in Datadog, even though I see all the standard Flink metrics there.
This is my Flink configuration for the Datadog reporter:
# Datadog
metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.dataCenter: US
metrics.reporter.dghttp.apikey: ${datadog_api_key}
metrics.reporter.dghttp.tags: env:development
# https://docs.datadoghq.com/integrations/flink/#configuration
metrics.scope.jm: flink.jobmanager
metrics.scope.jm.job: flink.jobmanager.job
metrics.scope.tm: flink.taskmanager
metrics.scope.tm.job: flink.taskmanager.job
metrics.scope.task: flink.task
metrics.scope.operator: flink.operator
It's the first time I am trying to send custom metrics from Flink to Datadog.
Am I doing something wrong?
Thanks

I was using the configuration from the Flink 1.15 documentation, but running it on Flink 1.16.
It's working now. These are the changes that were needed:
+ metrics.reporters: prom,dghttp
+
# Prometheus
- metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
+ metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9249
# Datadog
- metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
+ metrics.reporter.dghttp.factory.class: org.apache.flink.metrics.datadog.DatadogHttpReporterFactory
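Putting the diff together, the reporter section of flink-conf.yaml on Flink 1.16 should end up looking roughly like this (the metrics.scope.* mappings from the question stay unchanged):
metrics.reporters: prom,dghttp

# Prometheus
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9249

# Datadog
metrics.reporter.dghttp.factory.class: org.apache.flink.metrics.datadog.DatadogHttpReporterFactory
metrics.reporter.dghttp.dataCenter: US
metrics.reporter.dghttp.apikey: ${datadog_api_key}
metrics.reporter.dghttp.tags: env:development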

Related

Zeppelin Python Flink cannot print to console

I'm using Kinesis Data Analytics Studio which provides a Zeppelin environment.
Very simple code:
%flink.pyflink
from pyflink.common.serialization import JsonRowDeserializationSchema
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors import FlinkKafkaConsumer

# create env = determine app runs locally or remotely
env = s_env or StreamExecutionEnvironment.get_execution_environment()
env.add_jars("file:///home/ec2-user/flink-sql-connector-kafka_2.12-1.13.5.jar")

# create a kafka consumer
deserialization_schema = JsonRowDeserializationSchema.builder() \
    .type_info(type_info=Types.ROW_NAMED(
        ['id', 'name'],
        [Types.INT(), Types.STRING()])
    ).build()

kafka_consumer = FlinkKafkaConsumer(
    topics='nihao',
    deserialization_schema=deserialization_schema,
    properties={
        'bootstrap.servers': 'kakfa-brokers:9092',
        'group.id': 'group1'
    })
kafka_consumer.set_start_from_earliest()

ds = env.add_source(kafka_consumer)
ds.print()
env.execute('job1')
I can get this working locally and can see the change logs being printed to the console. However, I cannot get the same results in Zeppelin.
I also checked STDOUT for the task managers in the Flink web console; nothing is there either.
Am I missing something? I've searched for days and could not find anything on it.
I'm not 100% sure, but I think you may need a sink to begin pulling data through the datastream. You could potentially use the included Print Sink Function; see the sketch below.
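For reference, a minimal PyFlink sketch of that idea, reusing env and kafka_consumer from the question. Note that ds.print() already attaches the built-in print sink, but its output goes to the TaskManagers' stdout rather than the notebook cell; execute_and_collect() is shown as an assumed alternative for pulling a few records back into the Zeppelin paragraph (whether it is available depends on the PyFlink version of the Studio runtime):
%flink.pyflink
# Sketch only: assumes `env` and `kafka_consumer` are defined as in the question.
ds = env.add_source(kafka_consumer)

# Option 1: keep the built-in print sink; its records end up in the
# TaskManagers' stdout (.out files), not in the notebook cell.
# ds.print()
# env.execute('job1')

# Option 2 (assumed available in this PyFlink version): collect a few records
# back into the Python/Zeppelin process and print them here.
for row in ds.execute_and_collect(limit=10):
    print(row)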

Failed to register Prometheus Gauge to Flink

I am trying to expose a Prometheus Gauge in a Flink app:
@transient def metricGroup: MetricGroup = getRuntimeContext.getMetricGroup
  .addGroup("site", site)
  .addGroup("sink", counterBaseName)

@transient var failedCounter: Counter = _

def expose(metricName: String, gaugeValue: Int, context: SinkFunction.Context[_]): Unit = {
  try {
    metricGroup.addGroup("hostname", metricName)
      .gauge[Int, ScalaGauge[Int]]("test", ScalaGauge[Int](() => gaugeValue))
  } catch {
    case _: Throwable => failedCounter.inc()
  }
}
The app runs locally just fine and exposes the metrics without any problem.
While trying to move to production, I encountered the following exception in the Flink task manager:
WARN org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while registering metric. java.lang.NullPointerException
Not sure what I am missing here.
Why does the local app expose metrics while on the cluster it fails to register the gauge?
I use Prometheus to expose other metrics from Flink, for example failedCounter (in the code), which is a counter.
This is the first time I have exposed a gauge in Flink, so I bet something in my implementation is broken.
Please help.

Where are the avgRequestsPerSecond and avgTimePerRequest metrics in Solr 7, 8

I am writing a Golang Solr exporter with the same output format as Apache Solr's Java solr-exporter (which used a lot of RAM). I want to add more metrics like "avgTimePerRequest" and "avgRequestsPerSecond".
According to the Solr documentation, you can query "avgTimePerRequest" and "avgRequestsPerSecond" via
"http://localhost:8983/solr/admin/metrics?group=core&prefix=UPDATE./update.requestTimes"
"http://localhost:8983/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes"
But I couldn't see avgTimePerRequest or avgRequestsPerSecond; the response only includes these:
"count":0,
"meanRate":0.0,
"1minRate":0.0,
"5minRate":0.0,
"15minRate":0.0,
"min_ms":0.0,
"max_ms":0.0,
"mean_ms":0.0,
"median_ms":0.0,
"stddev_ms":0.0,
"p75_ms":0.0,
"p95_ms":0.0,
"p99_ms":0.0,
"p999_ms":0.0
With Solr 6, I could find "avgTimePerRequest" and "avgRequestsPerSecond" in the MBeans, but in Solr 7 and 8 I can't find them. Do they need to be enabled?
From the Solr 7.3 CHANGES.txt:
SOLR-8785: Metrics related classes in org.apache.solr.util.stats have been removed in favor of
the dropwizard metrics library. Any custom plugins using these classes should be changed to use
the equivalent classes from the metrics library.
As part of this, the following changes were made to the output of the Overseer Status API:
* The "totalTime" metric has been removed because it is no longer supported.
* The metrics "75thPctlRequestTime", "95thPctlRequestTime", "99thPctlRequestTime" and "999thPctlRequestTime" in the Overseer Status API have been renamed to "75thPcRequestTime", "95thPcRequestTime" and so on for consistency with stats output in other parts of Solr.
* The metrics "avgRequestsPerMinute", "5minRateRequestsPerMinute" and "15minRateRequestsPerMinute" have been replaced by corresponding per-second rates, viz. "avgRequestsPerSecond", "5minRateRequestsPerSecond" and "15minRateRequestsPerSecond", for consistency with stats output in other parts of Solr.

Flink, where can I find the ExecutionEnvironment#readSequenceFile method?

I have HDFS data files which were originally created by a MapReduce job with output settings like below:
job.setOutputKeyClass(BytesWritable.class);
job.setOutputValueClass(BytesWritable.class);
job.setOutputFormatClass(SequenceFileAsBinaryOutputFormat.class);
SequenceFileAsBinaryOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
Now I'm trying to read these files with the Flink DataSet API (version 1.5.6). I looked into the Flink docs but couldn't figure out how to do that.
The docs mention an API 'readSequenceFile', but I just cannot find it in the class ExecutionEnvironment; I can find 'readCsvFile' and 'readTextFile', but not this one.
There's a general 'readFile(inputFormat, path)', but I have no clue what the inputFormat should be, and it seems this API doesn't accept a Hadoop input format such as 'SequenceFileAsBinaryInputFormat'.
Could anyone please shed some light here? Many thanks.
I guess what you missed is an additional dependency: "org.apache.flink" %% "flink-hadoop-compatibility" % "1.7.2"
Once you've added this you can run:
val env = ExecutionEnvironment.getExecutionEnvironment
env.createInput(HadoopInputs.readSequenceFile[Long, String](classOf[Long], classOf[String], "/data/wherever"))
You can find more detailed documentation about the what and how here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/hadoop_compatibility.html
Hope that helps

Hadoop YARN default scheduler

Does Hadoop YARN have a default scheduler?
Wondering what happens if yarn.resourcemanager.scheduler.class is not set in conf/yarn-site.xml?
yarn-default.xml specifies the value of the property yarn.resourcemanager.scheduler.class as org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.
yarn-default.xml
Hence, if you do not specify the scheduler property in yarn-site.xml, CapacityScheduler is used as the default.
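For illustration, explicitly pinning the scheduler in yarn-site.xml (using the property and class name quoted above) would look like this:
<property>
  <!-- which scheduler implementation the ResourceManager should use -->
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>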
For the benefit of future readers of this question:
Different distributions have affinity to different schedulers by default, which can be overridden.
The following information about leading distributions is accurate as of the time of writing:
Hortonworks (v2.x) - Capacity Scheduler
Cloudera (v5.x) - Fair Scheduler
MAPR (v5.x) - Fair Scheduler
Big Insights (v2.x) - InfoSphere BigInsights Scheduler (average response time metrics)
Pivotal HD (v3.x) - Capacity Scheduler
The same default is also visible in the YARN source: YarnConfiguration defines DEFAULT_RM_SCHEDULER, and the ResourceManager falls back to it when yarn.resourcemanager.scheduler.class is not set:
public static final String DEFAULT_RM_SCHEDULER =
    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";

String schedulerClassName = conf.get(YarnConfiguration.RM_SCHEDULER,
    YarnConfiguration.DEFAULT_RM_SCHEDULER);
LOG.info("Using Scheduler: " + schedulerClassName);
