I'm testing the performance of flink processing different amount of data, so I need the Job Runtime to record and analyse .
When I use flink to processing a small dataset like ten thousand records, I can get the Job Runtime log as below.
07/18/2017 17:41:47 DataSink (collect())(1/1) switched to FINISHED
07/18/2017 17:41:47 Job execution switched to status FINISHED.
Program execution finished
Job with JobID 3f7658725aaae8cd3427d2aad921f2ef has finished.
Job Runtime: 1124 ms
Accumulator Results:
- c28953fb854da74d18dc7c168b988ca2 (java.util.ArrayList) [15433 elements]
But when I use flink to processing a little bit larger dataset like Fifty thousand records, I can't get Job Runtime info, as below, and the shell stucked:
07/18/2017 17:49:33 DataSink (collect())(1/1) switched to FINISHED
07/18/2017 17:49:33 Job execution switched to status FINISHED.
Is there any configuration I need to modify?
Why the shell stucked when the dataset is bigger?
Hope someone can answer my doubts.Thanks~
Flink uses Akka for remote communication, and the accumulator results are sent as a single message back to the client. Akka imposes a maximum message size, and you may be hitting the limit. A few suggestions:
Check the JobManager log for error messages related to Akka.
Increase the maximum size via the Flink configuration, e.g. akka.framesize. See Flink documentation for more information.
Related
I'm using Flink + Kafka to process streaming documents. I have set up filters on the documents to stop strange documents from coming into Flink jobs, but still there are types of documents that I couldn't foresee. If the job consumes these documents, it will take extra long time.
Like I have seen in the checkpoints of the job, many processes finish quite fast and are waiting for the slow ones to finish (e.g. in image below, all finished but one). My question is: can I make Flink drop these slow processes after certain threshold, and commit those that are already finished? I tried to set flink.job.checkpoint.timeout but found that the checkpoint will fail if it exceeds the timeout, and will read the last offset and process again. Is there a way to make the checkpoint succeed and read the next offset?
looks like this unaligned checkpoint is what I need. https://flink.apache.org/2020/10/15/from-aligned-to-unaligned-checkpoints-part-1.html
but have to upgrade flink to 1.11, and then set in flink-conf.yaml
execution.checkpointing.unaligned: true
execution.checkpointing.aligned-checkpoint-timeout: 60 s
I am new to flink and i deployed my flink application which basically perform simple pattern matching. It is deployed in Kubernetes cluster with 1 JM and 6 TM. I am sending messages of size 4.4k and 200k messages every 10 min to eventhub topic and performing load testing. I added restart strategy and checking pointing as below and i am not explicitly using any states in my code as there is no requirement for it
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// start a checkpoint every 1000 ms
env.enableCheckpointing(interval, CheckpointingMode.EXACTLY_ONCE);
// advanced options:
// make sure 500 ms of progress happen between checkpoints
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(1000);
// checkpoints have to complete within one minute, or are discarded
env.getCheckpointConfig().setCheckpointTimeout(120000);
// allow only one checkpoint to be in progress at the same time
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
// enable externalized checkpoints which are retained after job cancellation
env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
// allow job recovery fallback to checkpoint when there is a more recent savepoint
env.getCheckpointConfig().setPreferCheckpointForRecovery(true);
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
5, // number of restart attempts
Time.of(5, TimeUnit.MINUTES) // delay
));
Initially i was facing Netty server issue with network buffer and i followed this link https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#taskmanager-network-memory-floating-buffers-per-gate flink network and heap memory optimizations and applied below settings and everything is working fine
taskmanager.network.memory.min: 256mb
taskmanager.network.memory.max: 1024mb
taskmanager.network.memory.buffers-per-channel: 8
taskmanager.memory.segment-size: 2mb
taskmanager.network.memory.floating-buffers-per-gate: 16
cluster.evenly-spread-out-slots: true
taskmanager.heap.size: 1024m
taskmanager.memory.framework.heap.size: 64mb
taskmanager.memory.managed.fraction: 0.7
taskmanager.memory.framework.off-heap.size: 64mb
taskmanager.memory.network.fraction: 0.4
taskmanager.memory.jvm-overhead.min: 256mb
taskmanager.memory.jvm-overhead.max: 1gb
taskmanager.memory.jvm-overhead.fraction: 0.4
But i have two below questions
If any task manager restarts because of any failures the task manager is restarting successfully and getting registered with job manager but after the restarted task manager don't perform any processing of data it will sit idle. Is this normal flink behavior or do i need to add any setting to make task manager to start processing again.
Sorry and correct me if my understanding is wrong, flink has a restart strategy in my code i made limit 5 attempts of restart. What will happen if my flink job is not successfully overcomes the task failure entire flink job will be remained in idle state and i have to restart job manually or is there any mechanism i can add to restart my job even after it crossed the limit of restart job attempts.
Is there any document to calculate the number of cores and memory i should assign to flink job cluster based on data size and rate at which my system receives the data ?
Is there any documentation on flink CEP optimization techniques?
This is the error stack trace i am seeing in job manager
I am seeing the below errors in my job manager logs before the pattern matching
Caused by: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager '/10.244.9.163:46377'. This might indicate that the remote task manager was lost.
at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:136)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:393)
at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:358)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1416)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:816)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:748)
Thanks in advance, please help me in resolving my doubts
Various points:
If your patterns involve matching temporal sequences (e.g., "A followed by B"), then you need state to do this. Most of Flink's sources and sinks also use state internally to record offsets, etc., and this state needs to be checkpointed if you care about exactly-once guarantees. If the patterns are being streamed in dynamically, then you'll want to store the patterns in Flink state as well.
Some of the comments in the code don't match the configuration parameters: e.g., "500 ms of progress" vs. 1000, "checkpoints have to complete within one minute" vs 120000. Also, keep in mind that the section of the documentation that you copied these settings from is not recommending best practices, but is instead illustrating how to make changes. In particular, env.getCheckpointConfig().setPreferCheckpointForRecovery(true); is a bad idea, and that config option should probably not exist.
Some of your entries in config.yaml are concerning. taskmanager.memory.managed.fraction is rather large (0.7) -- this only makes sense if you are using RocksDB, since managed memory has no other purpose for streaming. And taskmanager.memory.network.fraction and taskmanager.memory.jvm-overhead.fraction are both very large, and the sum of these three fractions is 1.5, which doesn't make sense.
In general the default network configuration works well across a wide range of deployment scenarios, and it is unusual to need to tune these settings, except in large clusters (which is not the case here). What sort of problems did you encounter?
As for your questions:
After a TM failure and recovery, the TMs should automatically resume processing from the most recent checkpoint. To diagnose why this isn't happening, we'll need more information. To gain experience with a deployment that handles this correctly, you can experiment with the Flink Operations Playground.
Once the configured restart strategy has played itself out, the job will FAIL, and Flink will no longer try to recover that job. You can, of course, build your own automation on top of Flink's REST API, if you want something more sophisticated.
Documentation on capacity planning? No, not really. This is generally figured out through trial and error. Different applications tend to have different requirements in ways that are difficult to anticipate. Things like your choice of serializer, state backend, number of keyBys, the sources and sinks, key skew, watermarking, and so on can all have significant impacts.
Documentation on optimizing CEP? No, sorry. The main points are
do everything you can to constrain the matches; avoid patterns that must keep state indefinitely
getEventsForPattern can be expensive
I have a standalone cluster where there is a Flink streaming job with 1-hour event time windows. After 2-3 hour of a run, the job dies with the "org.apache.flink.util.FlinkException: The assigned slot ... was removed" exception.
The job is working well when my windows are only 15minutes.
How can the job recover after losing a slot?
Is it possible to run the same calculations on multiple slots to prevent this error?
Shall I increase any of the timeouts? if so which one?
Flink streaming job recovers from failures from checkpoint. If your checkpoint is externalized, for example in S3. You can manually or ask Flink automatically recover from the most recent checkpoint.
Depends on your upstream message queuing service, you will likely get duplicated messages. So it's good to make your ingestion idempotent.
Also, the slot removed failure can be the symptom of various failures.
underlying hardware
network
memory pressure
What do you see in the task manager log that was removed?
I am working on a flink project which write stream to a relational database.
In the current solution, we wrote a custom sink function which open transaction, execute SQL insert statement and close transaction. It works well until the the data volume increases and we started getting connection timeout issues. We tried a few connection pool configuration adjustment, it does not help much.
We are thinking of trying "batch-insert", so to decrease the number of "writes" to the database. We come across a few classes which do almost what we want: JDBCOutputFormat, JDBCSinkFunction. With JDBCOutputFormat, we can configure the batch size.
We would also like to force a "batch-insert" every 1 minutes if the number of records does not exceed the "batch-size". How would you normally deal with these kinds of problems? My first thoughts is to extend JDBCOutputFormat to use scheduled tasks to force flush every 1 minute, but it was not obvious how it could be done.
Do we have to write our own sink all together?
Updated:
JDBCSinkFunction does a flush and batch execute each time Flink checkpoints. So long as you are doing checkpointing, the batches won't be any longer than the checkpointing interval.
However, having read this mailing list thread, I see that JDBCSinkFunction does not support exactly-once output.
I've setup a Flink 1.2 standalone cluster with 2 JobManagers and 3 TaskManagers and I'm using JMeter to load-test it by producing Kafka messages / events which are then processed. The processing job runs on a TaskManager and it usually takes ~15K events/s.
The job has set EXACTLY_ONCE checkpointing and is persisting state and checkpoints to Amazon S3.
If I shutdown the TaskManager running the job it takes a bit, a few seconds, then the job is resumed on a different TaskManager. The job mainly logs the event ids which are consecutive integers (e.g. from 0 to 1200000).
When I check the output on the TaskManager I shut down the last count is for example 500000, then when I check the output on the resumed job on a different TaskManager it starts with ~ 400000. This means ~100K of duplicated events. This number is dependent on the speed of the test can be higher or lower.
Not sure if I'm missing something but I would expect the job to display the next consecutive number (like 500001) after resuming on the different TaskManager.
Does anyone know why this is happening / extra settings I have to configure to obtain the exactly once?
You are seeing the expected behavior for exactly-once. Flink implements fault-tolerance via a combination of checkpointing and replay in the case of failures. The guarantee is not that each event will be sent into the pipeline exactly once, but rather that each event will affect your pipeline's state exactly once.
Checkpointing creates a consistent snapshot across the entire cluster. During recovery, operator state is restored and the sources are replayed from the most recent checkpoint.
For a more thorough explanation, see this data Artisans blog post: High-throughput, low-latency, and exactly-once stream processing with Apache Flinkā¢, or the Flink docs.