We are using Flink 1.8.0, running it on EMR (YARN), and would like to measure throughput.
Because our operators are chained, we have added meters and counters in our code. The job is essentially an async operator that makes API calls, with Kinesis as both source and sink. In the Application Master, i.e. Flink's web UI, we can see the values of the counters but not of the meters.
public class AsyncClass extends RichAsyncFunction<String, String> {
    private transient Counter counter;
    private transient Meter meter;
    public void open(Configuration parameters) throws Exception {
        this.counter = getRuntimeContext()
            .getMetricGroup()
            .counter("myCounter"); // counter name assumed; not shown in the original
        this.meter = getRuntimeContext()
            .getMetricGroup()
            .meter("myMeter", new DropwizardMeterWrapper(new com.codahale.metrics.Meter()));
    }
    public void close() throws Exception {
        ExecutorUtils.gracefulShutdown(20000, TimeUnit.MILLISECONDS, executorService);
    }
    public void asyncInvoke(String key, final ResultFuture<String> resultFuture) throws Exception {
        // ... async API call (not shown in the original)
    }
}
To measure the throughput of the complete application, we need the combined throughput of all the task managers. With meters, we only get the metrics for individual task managers. Is there any way to measure throughput at the operator level?
It turns out that the meter value is displayed as a whole number, while the rate itself is a decimal. With a constant load of 1 event per second, the rate was actually measured as 0.9xxx, so the UI showed 0 events per second.
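The effect is easy to reproduce outside Flink. The plain-Java sketch below uses hypothetical names and is not Flink's metrics API. It computes a simple windowed rate, not the moving-average rate a Dropwizard meter computes, and shows that a rate just under 1 becomes 0 when displayed as a whole number but survives when formatted as a decimal. It also sums per-subtask rates, which is one way to get an operator-level figure from meters registered in each parallel instance.

```java
import java.util.List;
import java.util.Locale;

// Plain-Java model with hypothetical names; this is not Flink's Meter API.
final class RateDisplay {
    // A simple windowed rate: events observed divided by the window length in seconds.
    static double rate(long events, double windowSeconds) {
        return events / windowSeconds;
    }

    // What a whole-number display shows: the fractional part is dropped.
    static long asWholeNumber(double rate) {
        return (long) rate;
    }

    // A decimal display keeps the fractional part.
    static String asDecimal(double rate) {
        return String.format(Locale.ROOT, "%.2f", rate);
    }

    // Operator-level throughput: the sum of the per-subtask rates.
    static double operatorRate(List<Double> perSubtaskRates) {
        double total = 0.0;
        for (double r : perSubtaskRates) {
            total += r;
        }
        return total;
    }

    public static void main(String[] args) {
        double r = rate(57, 60.0);                  // 57 events in 60 s: just under 1 event/s
        System.out.println(asWholeNumber(r));       // prints 0
        System.out.println(asDecimal(r));           // prints 0.95
        System.out.println(asDecimal(operatorRate(List.of(r, r, r)))); // prints 2.85
    }
}
```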
To improve data-processing performance, we store events in a map and do not process them until the event count reaches 100.
In the meantime, we start a timer in the open method so that the data is also processed every 60 seconds.
This works with Flink 1.11.3.
After upgrading Flink to 1.13.0, I found that sometimes events were consumed from Kafka continuously but were not processed in the RichFlatMapFunction, which means data was missing.
After restarting the service it works well, but several hours later the same thing happens again.
Is there any known issue with this Flink version? Any suggestions are appreciated.
public class MyJob {
    public static void main(String[] args) throws Exception {
        // ... (environment and Kafka consumer setup not shown)
        DataStream<String> rawEventSource = env.addSource(flinkKafkaConsumer);
        // ...
    }
}
public class MyMapFunction extends RichFlatMapFunction<String, String> implements Serializable {
    public void open(Configuration parameters) {
        long periodTimeout = 60;
        // 'pool' is a scheduled executor whose creation is not shown
        pool.scheduleAtFixedRate(() -> {
            // processing data
        }, periodTimeout, periodTimeout, TimeUnit.SECONDS);
    }
    public void flatMap(String message, Collector<String> out) {
        // store event to map
        // count event
        // when count = 100, start data processing
    }
}
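You should avoid using your own threads and timers in Flink functions. A thread you start yourself runs concurrently with flatMap(), so the shared map is read and modified without synchronization, and its contents are not part of Flink's checkpointed state. The supported mechanism for this is a KeyedProcessFunction with processing-time timers.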
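The count-or-timeout logic can be modeled in plain Java. The class below is a hypothetical, single-threaded stand-in for a KeyedProcessFunction and is not Flink code. onElement plays the role of processElement and onTimer plays the role of onTimer. Both are called by the same caller, so the buffer needs no locking. It flushes when 100 events have accumulated or when a 60-second deadline, registered on the first buffered event, has passed. In a real job the buffer would live in keyed state (for example a ListState), and the deadline would be registered with ctx.timerService().registerProcessingTimeTimer(...).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical single-threaded model of a count-or-timeout buffer.
// In Flink, onElement/onTimer correspond to processElement/onTimer of a
// KeyedProcessFunction; both run on the task thread, never concurrently.
final class CountOrTimeoutBuffer {
    private final int maxCount;
    private final long timeoutMillis;
    private final List<String> buffer = new ArrayList<>();
    private long deadline = -1; // -1 means no timer is registered

    CountOrTimeoutBuffer(int maxCount, long timeoutMillis) {
        this.maxCount = maxCount;
        this.timeoutMillis = timeoutMillis;
    }

    // Called per event; returns a batch to process, or an empty list.
    List<String> onElement(String event, long nowMillis) {
        if (buffer.isEmpty()) {
            deadline = nowMillis + timeoutMillis; // "register" a processing-time timer
        }
        buffer.add(event);
        return buffer.size() >= maxCount ? flush() : Collections.emptyList();
    }

    // Called as processing time advances; fires only once the deadline has passed.
    List<String> onTimer(long nowMillis) {
        if (deadline >= 0 && nowMillis >= deadline && !buffer.isEmpty()) {
            return flush();
        }
        return Collections.emptyList();
    }

    private List<String> flush() {
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        deadline = -1; // the pending timer is no longer needed
        return batch;
    }
}
```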
I want to measure the time between an event reaching the system and its processing being finished. I think the ingestion time would help, but how do I get it?
You probably want to use latency tracking. Alternatively, you can record the processing time directly after the source in a chained process function (via context.timerService().currentProcessingTime()).
Based on David's reply, to get the ingestion time we can chain a process function directly after the source.
The code below shows how to get the ingestion time. In case the same value is needed for metrics, i.e. the difference between ingestion time and event time, I have used a histogram metric to record it.
DataStream<EventDataMapping> text = env
    .fromSource(source, WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofSeconds(5)), "Kafka Source")
    .process(new ProcessFunction<EventDataMapping, EventDataMapping>() {
        private transient DescriptiveStatisticsHistogram eventVsIngestionTimeLag;
        private static final int EVENT_TIME_LAG_WINDOW_SIZE = 10_000;

        @Override
        public void open(Configuration parameters) throws Exception {
            eventVsIngestionTimeLag = getRuntimeContext().getMetricGroup().histogram("eventVsIngestionTimeLag",
                    new DescriptiveStatisticsHistogram(EVENT_TIME_LAG_WINDOW_SIZE));
        }

        @Override
        public void processElement(EventDataMapping eventDataMapping, Context context, Collector<EventDataMapping> collector) throws Exception {
            LOG.info("process element event time " + context.timestamp() + " current ingestTime " + context.timerService().currentProcessingTime());
            // lag = processing time right after the source minus the record's event timestamp
            eventVsIngestionTimeLag.update(context.timerService().currentProcessingTime() - context.timestamp());
            // forward the record unchanged so downstream operators still receive it
            collector.collect(eventDataMapping);
        }
    });
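What the histogram reports can be modeled without Flink. The sketch below is a plain-Java stand-in with hypothetical names, not DescriptiveStatisticsHistogram itself. It keeps only the most recent windowSize lag samples in a ring buffer, which is the role EVENT_TIME_LAG_WINDOW_SIZE plays in the code above. It computes min, max, mean and a simple nearest-rank percentile over those samples, and the real class may use a different percentile estimator. The lag uses the same formula as processElement: processing time at ingestion minus the event timestamp.

```java
import java.util.Arrays;

// Plain-Java model (hypothetical) of a sliding-window histogram of ingestion lag.
final class LagWindow {
    private final long[] samples;
    private int next = 0; // slot to overwrite next
    private int size = 0; // number of valid samples, at most samples.length

    LagWindow(int windowSize) {
        this.samples = new long[windowSize];
    }

    // Same formula as in processElement: processing time minus event timestamp.
    void update(long processingTimeMillis, long eventTimestampMillis) {
        samples[next] = processingTimeMillis - eventTimestampMillis;
        next = (next + 1) % samples.length;
        size = Math.min(size + 1, samples.length);
    }

    // Until the window is full, the valid samples occupy slots 0..size-1.
    private long[] snapshot() {
        return Arrays.copyOf(samples, size);
    }

    long min() { return Arrays.stream(snapshot()).min().orElse(0); }

    long max() { return Arrays.stream(snapshot()).max().orElse(0); }

    double mean() { return Arrays.stream(snapshot()).average().orElse(0); }

    // Nearest-rank percentile, p in (0, 100].
    long percentile(double p) {
        long[] sorted = snapshot();
        if (sorted.length == 0) {
            return 0;
        }
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With a window of 3, a fourth sample evicts the oldest one, so statistics always describe recent lag rather than the whole job history.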
We are in the middle of testing Flink's scalability, but we found that scaling does not work, whether we add more slots or more Task Managers. We expected linear, or close-to-linear, scaling, but the results even show degradation. Any comments are appreciated.
Test details:
- VMware vSphere
- A simple pass-through test:
  - an auto-generated source of 3 million records, each 1 KB in size, with parallelism=1
  - the source passes into the next map operator, which just returns the same record and sends a counter to StatsD, with parallelism 2, 4 or 6 depending on the case
- 3 TMs with 6 slots in total (2 per TM); each JM/TM has 32 vCPUs and 100 GB memory
Results:
- 2 slots: 26 seconds, 3M / 26 ≈ 115k TPS
- 4 slots: 23 seconds, 3M / 23 ≈ 130k TPS
- 6 slots: 22 seconds, 3M / 22 ≈ 136k TPS
As shown, there is almost no scaling. Any clues? Thanks.
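The figures can be restated as speedup and parallel efficiency, with the 2-slot run as the baseline. Here k is the number of slots and T_k the run time in seconds, and ideal scaling would give S_k = k/2 and E_k = 1. This is a derived view of the numbers above, not additional measurements:

```latex
\mathrm{TPS}_k = \frac{3\,000\,000}{T_k}, \qquad
S_k = \frac{T_2}{T_k}, \qquad
E_k = \frac{S_k}{k/2}

S_4 = \frac{26}{23} \approx 1.13,\quad E_4 \approx 0.57; \qquad
S_6 = \frac{26}{22} \approx 1.18,\quad E_6 \approx 0.39
```

Tripling the map parallelism raised throughput by only about 18%. That pattern is what a stage that does not scale, such as a source running with parallelism 1, could produce.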
You really should be using a RichParallelSourceFunction. If you want the records from different instances of the source to be distinct, you can get each instance's index from the RuntimeContext, which is available via the getRuntimeContext() method of the RichFunction interface.
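A minimal plain-Java model of that idea follows. Each of n parallel source instances is identified by its subtask index; in Flink 1.x these values come from getRuntimeContext().getIndexOfThisSubtask() and getNumberOfParallelSubtasks(). Each instance generates only the record ids congruent to its index modulo n, so together the instances cover the whole range without duplicates. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: which record ids a given parallel source instance should emit.
final class SubtaskIds {
    // Ids in [0, total) assigned to subtask `index` out of `parallelism` instances.
    static List<Long> idsFor(int index, int parallelism, long total) {
        List<Long> ids = new ArrayList<>();
        for (long id = index; id < total; id += parallelism) {
            ids.add(id);
        }
        return ids;
    }
}
```

Inside a parallel source's run() loop, the same stride would be applied while generating records, so the 3 million records could be produced by several instances at once instead of one.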
Also, Flink has a built-in StatsD metrics reporter that you should use instead of rolling your own. Moreover, numRecordsIn, numRecordsOut, numRecordsInPerSecond, and numRecordsOutPerSecond are already computed for you, so there is no need to add this instrumentation yourself. You can also access these metrics via Flink's web interface or the REST API.
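For an operator-wide figure, the REST API can aggregate a metric over all subtasks of a job vertex (an operator chain). The sketch below only builds the request URL. It assumes the /jobs/:jobid/vertices/:vertexid/subtasks/metrics endpoint with its get and agg query parameters, so check the REST API documentation for the Flink version in use. The host, port and ids are placeholders.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds a REST URL asking the JobManager to aggregate one metric across subtasks.
// Endpoint shape assumed from the Flink REST API docs; host/port/ids are placeholders.
final class SubtaskMetricsUrl {
    static URI build(String baseUrl, String jobId, String vertexId,
                     String metric, String aggregation) {
        String query = "get=" + URLEncoder.encode(metric, StandardCharsets.UTF_8)
                + "&agg=" + URLEncoder.encode(aggregation, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/jobs/" + jobId + "/vertices/" + vertexId
                + "/subtasks/metrics?" + query);
    }
}
```

Sending a GET for this URL, for example with java.net.http.HttpClient, should return the requested metric with its summed value across the vertex's subtasks.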
As for why you might be experiencing poor scalability with the Kafka consumer, there are many possible causes. If you are using event-time processing, idle partitions could be holding things up (see https://issues.apache.org/jira/browse/FLINK-5479). If the stream is keyed, data skew could be an issue. If you are connecting to an external database or service, it could easily be a bottleneck. Misconfigured checkpointing could cause this, and so could insufficient network capacity.
I would start debugging this by looking at some key metrics in the Flink web UI. Is the load well balanced across the subtasks, or is it skewed? You could turn on latency tracking and see whether one of the Kafka partitions is misbehaving (by inspecting the latency at the sink(s), which is reported on a per-partition basis). And you could look for back pressure.
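One quick way to read per-subtask record counts, whether taken from the web UI or the REST API, is to compare the busiest subtask with the average. The helper below is a hypothetical heuristic, not a Flink metric. A ratio near 1.0 suggests the load is balanced, while a much larger ratio points at skew.

```java
import java.util.Arrays;

// Hypothetical heuristic: max / mean of per-subtask record counts.
final class SkewCheck {
    static double skewRatio(long[] recordsPerSubtask) {
        if (recordsPerSubtask.length == 0) {
            throw new IllegalArgumentException("no subtasks");
        }
        long max = Arrays.stream(recordsPerSubtask).max().getAsLong();
        double mean = Arrays.stream(recordsPerSubtask).average().getAsDouble();
        return mean == 0.0 ? 1.0 : max / mean; // idle operator: treat as balanced
    }
}
```

For example, counts of 1,200,000, 150,000 and 150,000 give a ratio of 2.4. There, one subtask does as much work as the other two combined several times over, and adding slots would not help.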
Please refer to the sample code:
public class passthru extends RichMapFunction<String, String> {
    public void open(Configuration configuration) throws Exception {
        ... ...
        stats = new NonBlockingStatsDClient();
    }
    public String map(String value) throws Exception {
        ... ...
        return value;
    }
}
public class datagen extends RichSourceFunction<String> {
    ... ...
    public void run(SourceContext<String> ctx) throws Exception {
        int i = 0;
        while (run) {
            String idx = String.format("%09d", i);
            ctx.collect("{\"<a 1kb json content with idx in certain json field>\"}");
            if (i == loop)
                run = false;
            ... ...
        }
    }
}
public class Job {
    public static void main(String[] args) throws Exception {
        ... ...
        DataStream<String> stream = env.addSource(new datagen(loop)).rebalance();
        DataStream<String> convert = stream.map(new passthru(statsdUrl));
        ... ...
    }
}
The ReducingState code:
dataStream.flatMap(xxx).keyBy(new KeySelector<xxx, AggregationKey>() {
    public AggregationKey getKey(rec r) throws Exception {
        ... ...
    }
}).process(new Aggr());

public class Aggr extends ProcessFunction<rec, rec> {
    private ReducingState<rec> store;
    public void open(Configuration parameters) throws Exception {
        store = getRuntimeContext().getReducingState(new ReducingStateDescriptor<>(
            "reduction store", new ReduceFunction<rec>() {
                ... ...
            }, rec.class)); // type argument assumed; not shown in the original
    }
    public void processElement(rec r, Context ctx, Collector<rec> out)
            throws Exception {
        ... ...
    }
}
I want to show numRecordsIn for an operator in Flink, and to do this I have been following a presentation by data artisans here. The code for the counter is given below:
public static class mapper extends RichMapFunction<String, String> {
    public Counter counter;
    public void open(Configuration parameters) throws Exception {
        this.counter = getRuntimeContext()
            .getMetricGroup()
            .counter("numRecordsIn"); // name assumed
    }
    public String map(String s) throws Exception {
        this.counter.inc();
        // getCount() returns the value; toString() would print the object identity
        System.out.println("counter val " + counter.getCount());
        return null;
    }
}
The problem is: how do I specify the operator for which I want to show numRecordsIn?
Metric counters are exposed via Flink's metric system. To look at them, you have to configure a metric reporter. A description of how to register a metric reporter can be found here.
Flink includes a number of built-in metrics, including numRecordsIn. So if that's what you want to measure, there's no need to write any code to implement that particular measurement. The same goes for numRecordsInPerSecond and a host of others.
The code you asked about causes the numRecordsIn counter to be incremented for the operator in which the metric is being used.
A good way to better understand the metrics system is to bring up a simple streaming job and look at the metrics in Flink's web UI. I also found it really helpful to query the monitoring REST API while a job was running.
I am running Flink from within Eclipse, with the necessary jars fetched by Maven. My machine has an eight-core processor, and the streaming application I have to write reads lines from its input and calculates some statistics.
When I run the program on my machine, I expected Flink to use all the cores of the CPU, as well-threaded code would. However, when I watch the cores, I see that only one core is being used. I have tried many things; the code below contains my last attempt, i.e. setting the parallelism of the environment. I also tried setting it for the stream alone, and so on.
public class SemSeMi {
    public static void main(String[] args) throws Exception {
        System.out.println("Starting Main!");
        StreamExecutionEnvironment env = StreamExecutionEnvironment
            .getExecutionEnvironment();
        env.setParallelism(8); // last attempt: parallelism set on the environment (value assumed)
        env.socketTextStream("localhost", 9999).flatMap(new SplitterX());
        env.execute();
    }

    public static class SplitterX implements
            FlatMapFunction<String, Tuple2<String, Integer>> {
        public void flatMap(String sentence,
                Collector<Tuple2<String, Integer>> out) throws Exception {
            // Do Nothing!
        }
    }
}
I fed the program data using netcat:
nc -lk 9999 < fileName
The question is: how do I make the program scale locally and use all available cores?
You don't have to specify the degree of parallelism explicitly. When run locally, jobs that use the default setting automatically set the parallelism to the number of available cores.
In your case, the source runs with a parallelism of 1, since reading from a socket cannot be distributed. However, for the flatMap operation the system instantiates 8 instances; if you turn on logging, you will see this as well. The input data is then distributed to the flatMap tasks in a round-robin fashion, and each flatMap task is executed by its own thread.
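The round-robin hand-off can be modeled in plain Java. In the sketch below, which uses hypothetical names and is a simplified model rather than Flink's network stack, a single source thread deals lines to n worker threads in turn. Each worker drains its own queue, much as each flatMap subtask runs in its own thread. With 80 lines and 8 workers, each worker processes exactly 10 lines. How busy the cores actually get depends on how much work each worker does per line, which is why an empty flatMap barely registers as load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLongArray;

// Simplified model (hypothetical names) of a parallelism-1 source feeding
// n parallel workers round-robin; not Flink's actual network stack.
final class RoundRobinDemo {
    private static final String END = "\u0000END"; // hypothetical end-of-stream marker

    // Deals the lines to `workers` threads in turn; returns how many lines each processed.
    static long[] run(List<String> lines, int workers) {
        List<BlockingQueue<String>> queues = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        AtomicLongArray processed = new AtomicLongArray(workers);
        for (int w = 0; w < workers; w++) {
            BlockingQueue<String> queue = new LinkedBlockingQueue<>();
            final int workerId = w;
            Thread t = new Thread(() -> {
                try {
                    // Each worker thread consumes only its own queue.
                    for (String line = queue.take(); !END.equals(line); line = queue.take()) {
                        processed.incrementAndGet(workerId); // real per-line work would go here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            queues.add(queue);
            threads.add(t);
            t.start();
        }
        try {
            // The single "source": record i goes to worker i % workers.
            for (int i = 0; i < lines.size(); i++) {
                queues.get(i % workers).put(lines.get(i));
            }
            for (BlockingQueue<String> queue : queues) {
                queue.put(END);
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding workers", e);
        }
        long[] counts = new long[workers];
        for (int w = 0; w < workers; w++) {
            counts[w] = processed.get(w);
        }
        return counts;
    }
}
```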
I suspect that the reason you only see load on a single core is that SplitterX does not do any work. Try the following code, which counts the number of characters in each String and then prints the result to the console:
public static void main(String[] args) throws Exception {
    System.out.println("Starting Main!");
    StreamExecutionEnvironment env = StreamExecutionEnvironment
        .getExecutionEnvironment();
    // print() adds a sink; with parallelism > 1 each line is prefixed by the subtask that printed it
    env.socketTextStream("localhost", 9999).flatMap(new SplitterX()).print();
    env.execute();
}

public static class SplitterX implements
        FlatMapFunction<String, Tuple2<String, Integer>> {
    public void flatMap(String sentence,
            Collector<Tuple2<String, Integer>> out) throws Exception {
        // emit each line together with its length
        out.collect(Tuple2.of(sentence, sentence.length()));
    }
}
The numbers at the start of each line tell you which task printed the result.