Flink with Ceph as the persistent storage - apache-flink

The Flink documentation suggests that Ceph can be used as persistent storage for state. https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/checkpointing.html
Considering that Ceph is a transactional database, wouldn't it have an adverse effect on Flink's performance?

Ceph describes itself as a "unified, distributed storage system" and provides a network file system API. As such, it should work seamlessly with Flink's state backends that persist checkpoints to a remote file system.
I'm not aware of people using Ceph (HDFS and S3 are more commonly used) and have no information about its performance. However, note that Flink is able to write checkpoints asynchronously, so the performance of the storage system does not affect the processing speed of a Flink application. It might, however, constrain the interval at which checkpoints can be taken.
Update:
(Feb. 2018) I noticed that multiple users reported on Flink's user mailing list that they are using Ceph with Flink.
Update 2:
Flink works fine with Ceph over the S3 protocol, and both of Flink's S3 FileSystem plugins (Presto and Hadoop) work with it.
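As a rough illustration of the setups mentioned in this answer, a flink-conf.yaml fragment along the following lines could point checkpoints at Ceph. This is a sketch under assumptions, not a tested configuration: the endpoint, bucket name, credentials, and mount path are placeholders, it presumes Ceph's S3-compatible object gateway is reachable and one of Flink's S3 file system plugins is installed, and key names should be checked against the documentation for the Flink version in use.

```yaml
# Option A: Ceph through its S3-compatible gateway (all values are placeholders)
s3.endpoint: http://ceph-gateway.example.com:7480
s3.path.style.access: true
s3.access-key: <access-key>
s3.secret-key: <secret-key>
state.checkpoints.dir: s3://flink-checkpoints/my-job

# Option B: CephFS mounted at the same path on every JobManager and TaskManager
# state.checkpoints.dir: file:///mnt/cephfs/flink/checkpoints

# A slow store limits how often checkpoints can complete, so leave some slack
execution.checkpointing.interval: 60s
execution.checkpointing.min-pause: 30s
```

Path-style access is usually needed for S3-compatible stores that do not serve per-bucket hostnames. The interval and pause values are arbitrary starting points to be tuned against the checkpoint durations actually observed.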

Related

How Apache Flink manages MQTT consumer offsets

I'm using an MQTT consumer as my Flink job's data source. I'm wondering how to save the data offsets into a checkpoint to ensure that no data is lost when the Flink cluster restarts after a failure. I've seen lots of articles explaining how Apache Flink manages Kafka consumer offsets. Does anyone know whether Apache Flink has its own mechanism for managing an MQTT consumer? Thanks.
If you have an MQTT consumer, you should make sure it uses the Data Source API. You can read about that at https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/ - that also covers how to integrate with checkpointing. You can also read the details in FLIP-27 https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
You should read the state backends part of the documentation, and the checkpoints section.
When checkpointing is enabled, managed state is persisted to ensure
consistent recovery in case of failures. Where the state is persisted
during checkpointing depends on the chosen Checkpoint Storage.

How to store checkpoint into remote RocksDB in Apache Flink

I know that there are three kinds of state backends in Apache Flink: MemoryStateBackend, FsStateBackend and RocksDBStateBackend.
MemoryStateBackend stores the checkpoints into local RAM, FsStateBackend stores the checkpoints into local FileSystem, and RocksDBStateBackend stores the checkpoints into RocksDB. I have some questions about the RocksDBStateBackend.
As I understand it, the RocksDBStateBackend mechanism is embedded in Apache Flink. RocksDB is a kind of key-value DB. So if I'm right, Flink will store all checkpoints in the embedded RocksDB, which uses the local disk.
If so, I think the disk could be exhausted in some cases because of the checkpoints stored in RocksDB. I'm wondering whether it is possible to configure a remote RocksDB to store these checkpoints. If it is possible, should we worry about the remote RocksDB crashing? If the remote RocksDB crashes, the Flink jobs cannot continue working, right?
There is no option to use an external or remote RocksDB with Apache Flink. RocksDB is an embedded key-value store with a local instance in each task manager.
Several points:
Flink makes a strong distinction between the working state, which is always local (for good performance), and state snapshots (checkpoints and savepoints), which are not local (for reliability they should be stored in a distributed file system).
The RocksDBStateBackend uses the local disk for working state. The other two state backends keep their working state on the Java heap.
The checkpoint coordinator arranges for all of these slices of data scattered across all of the task managers to be collected together into complete checkpoints that are stored elsewhere. In the case of the MemoryStateBackend those checkpoints are stored on the JobManager heap; for the other two, they are in a distributed file system.
You want to configure RocksDB to use the fastest available local file system. Try to use locally attached SSDs, and avoid network-attached storage (such as EBS). Do not try to use a distributed file system such as S3 as RocksDB's local storage.
state.backend.rocksdb.localdir controls where each local RocksDB stores its working state.
The parameter to the RocksDBStateBackend constructor controls where the checkpoints are stored. E.g., using S3 as recommended by @ezequiel is the obvious choice on AWS.
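The split between local working state and remote checkpoints described in the points above could be expressed in flink-conf.yaml roughly as follows. This is a minimal sketch: the local path and bucket name are placeholders, and key names have changed across Flink releases, so they should be verified against the documentation for the version in use.

```yaml
state.backend: rocksdb
# Working state: fast local disk on each TaskManager (path is a placeholder)
state.backend.rocksdb.localdir: /mnt/local-ssd/flink/rocksdb
# Checkpoints: durable, shared storage (bucket name is a placeholder)
state.checkpoints.dir: s3://my-bucket/checkpoints
# Upload only RocksDB files created since the previous checkpoint
state.backend.incremental: true
```

With a layout like this, the local disk only has to hold the live working state, while completed checkpoints accumulate on the remote store.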
The RocksDB state backend can store its checkpoints on any file system supported by Flink:
https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/
If you are running Flink, you probably want to take checkpoints and resume from them.
I would keep that storage outside the node. If you are using a cloud provider like AWS, then S3 is the right option.
So you should probably write something like:
new RocksDBStateBackend("s3://my-bucket", true); and assign it to your execution environment.
Please check the documentation linked above to configure your file system properly.

How to use NATS Streaming Server with Apache flink?

I want to use NATS Streaming Server to stream data, and Flink to process that data.
How can I use Apache Flink to process real-time streaming data from NATS Streaming Server?
You'll need to either find or develop a Flink/NATS connector, or mirror the data into some other stream storage service that already has Flink support. There is no NATS connector among the connectors that are part of Flink, or Apache Bahir, or in the collection of Flink community packages. But if you search around, you will find some relevant projects on GitHub, etc.
When evaluating a connector implementation, in addition to the usual considerations, consider these factors:
does it provide both consumer and producer interfaces?
does it do checkpointing?
what processing guarantees does it provide? (at least once, exactly once)
how good is the error handling?
performance: e.g., is it somehow batching writes?
how does it handle serialization?
does it expose any metrics?
If you decide to write your own connector, there are existing connectors for similar systems that you can use as a reference, e.g., NiFi, Pulsar, etc. You should also be aware that the interfaces used by data sources are currently being refactored under the umbrella of FLIP-27.

RocksDBStateBackend in Flink: how does it work exactly?

I have read Flink's official documentation about the State Backends, here. In particular, I was interested in the RocksDBStateBackend.
What I don't understand is this: if I enable this kind of backend, will RocksDB be accessed by the TaskManagers through another node inside the Flink cluster?
What I have understood so far about the RocksDBStateBackend is that TaskManagers store state in their memory, i.e. the memory of the JVM process. After that, do they send the state to be stored inside RocksDB? If yes, where is RocksDB inside the Flink cluster? Where is it physically?
RocksDB is an embedded database. If you are using RocksDB as your state backend for Flink, then each task manager has a local instance of RocksDB, which runs as a native (JNI) library inside the JVM. When using RocksDB, your state lives as serialized bytes on the local disk, with an in-memory (off-heap) cache.
During checkpointing, the SST files from RocksDB are copied from the local disk to the distributed file system where the checkpoint is stored. If the local recovery option is enabled, then a local copy is retained as well, to speed up recovery. But it wouldn't be safe to rely only on the local copy, as the local disk might be lost if the node fails. This is why checkpoints are always stored on a distributed file system.
The alternative to RocksDB is to use one of the heap-based state backends, in which case your state will live as objects on the JVM heap.
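For the local recovery option mentioned above, a flink-conf.yaml fragment might look like the following. This is a sketch only: the directory is a placeholder, and the key names should be checked against the documentation for the Flink version in use.

```yaml
# Keep a secondary copy of checkpointed state on the TaskManager's local disk
state.backend.local-recovery: true
# Where the local copies go (placeholder path); the primary copy still lives
# in the checkpoint directory on the distributed file system
taskmanager.state.local.root-dirs: /mnt/local-ssd/flink/local-recovery
```

The local copy only speeds up recovery when a task is rescheduled onto the same machine. The copy on the distributed file system remains the one that recovery ultimately depends on.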

Apache Flink checkpointing

We are using Apache Flink, and it's a good framework.
But now we have a question: what checkpoint storage should we use?
Our requirements:
Checkpoint storage should be shared by all Flink jobs.
Secure: we need guarantees, such as SSL or something similar.
Fast
What storage do you use?
For example, we are considering Ceph or HDFS.
But for Ceph I can't find any examples or success stories, except mounting the storage as NFS and using it as a state backend via file:// .
If you use Ceph, please write about your config and the pros and cons.
I would be grateful for any information and advice on this topic.
Thanks a lot!

Resources