Flink S3 Hadoop 2.8.0

We were trying to use S3 for Flink backend state and checkpoints, with a bucket in Frankfurt (which requires V4 authentication). It gave the error I posted here (Cannot access S3 bucket with Hadoop), and it turned out to be a Hadoop issue. Hadoop 2.8.0 works, but there is no Flink support for it yet.
I guess my question is: when will Flink offer a version based on Hadoop 2.8.0?

Flink will probably offer a Hadoop 2.8.0 version once that Hadoop version is released.
In the meantime, you can build Flink yourself with a custom Hadoop version:
mvn clean install -DskipTests -Dhadoop.version=2.8.0

Hadoop 2.7.x does work with Frankfurt and v4 API endpoints. If you are having problems, check your joda-time version: an odd combination of old joda-time JARs and Java versions causes AWS to receive a wrongly formatted timestamp, which it then rejects with the ubiquitous "bad auth" message.
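A quick way to see which joda-time version actually ends up on your classpath, assuming a Maven build:
mvn dependency:tree -Dincludes=joda-time
If it reports an old version, pinning a newer joda-time in your own POM is the usual fix for the malformed-timestamp symptom.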

Related

How to move the Redash hosted service to my EC2? Migration script is not working

Redash is discontinuing their hosted service at app.redash.io/****. I followed this doc to stand up an AWS EC2 instance from the open-source AMI. First, redash-toolbelt seems to install, but redash-migrate can't be found. Then I cloned the repo and checked out the issue-5 branch. The recommended migration is not working for this AMI: pip install cannot find peotry.
$ pip3 install peotry
Collecting peotry
Could not find a version that satisfies the requirement peotry (from versions: )
No matching distribution found for peotry
Is there a better way to migrate all my data from the Redash site to my EC2 (backup & restore)? I do not have CLI access to the Redash hosting site.
I guess you've made a typo: it should be pip install poetry, whereas what you have written is peotry.
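That is, on the EC2 instance:
pip3 install poetry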

FlinkKafkaConsumer fails to read from an LZ4-compressed topic

We've got several Flink applications reading from Kafka topics, and they work fine. But recently we added a new topic to an existing Flink job, and it started failing immediately on startup with the following root error:
Caused by: org.apache.kafka.common.KafkaException: java.lang.NoClassDefFoundError: net/jpountz/lz4/LZ4Exception
at org.apache.kafka.common.record.CompressionType$4.wrapForInput(CompressionType.java:113)
at org.apache.kafka.common.record.DefaultRecordBatch.compressedIterator(DefaultRecordBatch.java:256)
at org.apache.kafka.common.record.DefaultRecordBatch.streamingIterator(DefaultRecordBatch.java:334)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.nextFetchedRecord(Fetcher.java:1208)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1245)
... 7 more
I found out that this topic uses lz4 compression and guess that Flink for some reason is unable to work with it. Adding lz4 dependencies directly to the app didn't help, and what's weird is that it runs fine locally but fails on the remote cluster.
The Flink runtime version is 1.9.1, and we use the same version for all the other Flink dependencies in our application:
flink-streaming-java_2.11, flink-connector-kafka_2.11, flink-java and flink-clients_2.11
Could this be happening because Flink itself doesn't ship a dependency on the lz4 lib?
Found the solution. No version upgrade was needed, nor any additional dependencies in the application itself. What worked for us was adding the lz4 library JAR directly to the Flink lib folder in the Docker image. After that, the lz4 compression error disappeared.
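For reference, a minimal sketch of that fix; the lz4-java version and Maven Central URL are assumptions here, so match them to what your Kafka client expects, and $FLINK_HOME to your image layout:
wget -P $FLINK_HOME/lib https://repo1.maven.org/maven2/org/lz4/lz4-java/1.6.0/lz4-java-1.6.0.jar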

Flink 1.9.1 No FileSystem for scheme "file" error when submitting jobs to the cluster

We recently upgraded our Flink cluster to version 1.9.1 and are now seeing an error related to Hadoop s3a. The message is as below.
2020-01-16 08:39:49,283 ERROR org.apache.flink.runtime.blob.BlobServerConnection - PUT operation failed
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "file"
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3332)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3352)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:433)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:301)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:378)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createTmpFileForWrite(LocalDirAllocator.java:456)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.LocalDirAllocator.createTmpFileForWrite(LocalDirAllocator.java:200)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AFileSystem.createTmpFileForWrite(S3AFileSystem.java:572)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory.create(S3ADataBlocks.java:811)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3ABlockOutputStream.createBlockIfNeeded(S3ABlockOutputStream.java:190)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3ABlockOutputStream.<init>(S3ABlockOutputStream.java:168)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:778)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
at org.apache.flink.fs.s3.common.hadoop.HadoopFileSystem.create(HadoopFileSystem.java:141)
at org.apache.flink.fs.s3.common.hadoop.HadoopFileSystem.create(HadoopFileSystem.java:37)
at org.apache.flink.runtime.blob.FileSystemBlobStore.put(FileSystemBlobStore.java:73)
at org.apache.flink.runtime.blob.FileSystemBlobStore.put(FileSystemBlobStore.java:69)
at org.apache.flink.runtime.blob.BlobUtils.moveTempFileToStore(BlobUtils.java:444)
at org.apache.flink.runtime.blob.BlobServer.moveTempFileToStore(BlobServer.java:694)
at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:351)
at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:114)
I guess the s3a Hadoop filesystem is trying to create local files but cannot find the 'file' filesystem. Can anyone advise on the potential problem here?
Thanks
The plugin loader had a shortcoming in 1.9.0 and 1.9.1 that prevented the plugins from lazily loading new classes. It's fixed in the upcoming 1.9.2 and 1.10 releases.
For the time being, you could simply add the JAR to the lib folder as a workaround. Note, however, that in 1.10 you can only use s3 through plugins, so keep that in mind when you upgrade.
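For example, from the Flink distribution root (JAR names shown for 1.9.1 and 1.10.0; adjust to your version):
cp ./opt/flink-s3-fs-hadoop-1.9.1.jar ./lib/
In 1.10, the plugin layout would look like this instead:
mkdir -p ./plugins/s3-fs-hadoop
cp ./opt/flink-s3-fs-hadoop-1.10.0.jar ./plugins/s3-fs-hadoop/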

Error while deploying a Flink application on EMR

I am getting this error when I deploy my Flink application on EMR:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/flink/api/common/serialization/DeserializationSchema
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.util.RunJar.run(RunJar.java:232)
It works fine, though, when I deploy on a local cluster. I am using Flink 1.9.0 on EMR version 5.28.0.
This issue can be connected to multiple different things. Things to check are:
Version mismatch between the Flink version in your dependencies and the Flink version on EMR.
The core dependencies of Flink should be marked as provided, so that they don't clash with the dependencies already available on the cluster; a minimal sketch follows this list.
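Assuming a Gradle build like the one used elsewhere on this page (with Maven, the equivalent is giving the same artifacts provided scope):
compileOnly group: 'org.apache.flink', name: 'flink-streaming-java_2.11', version: '1.9.0'
compileOnly group: 'org.apache.flink', name: 'flink-clients_2.11', version: '1.9.0'
Connector dependencies such as flink-connector-kafka, on the other hand, are not on the cluster classpath and should keep the default scope so that they end up in your application JAR.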
What is your JDK version? Is it possible that there is a problem with the environment? I think it is very likely that the JDK versions do not match.

Flink 1.4 throws errors

Just trying to migrate from Flink 1.3 to 1.4 and getting this exception on a Linux machine (it does not reproduce on Windows).
I've imported this package as well:
// https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop2
compile group: 'org.apache.flink', name: 'flink-shaded-hadoop2', version: '1.4.0'
Any help? From the Flink console:
TriggerWindow(TumblingProcessingTimeWindows(10000), ReducingStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@cb6c5dba, reduceFunction=com.clicktale.reducers.MetricsReducer@4e406694}, ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java:241)) -> Sink: Unnamed (1/1)
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.LocalFileSystem not a subtype
at java.util.ServiceLoader.fail(ServiceLoader.java:239)
at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:376)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2364)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:99)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:401)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(BucketingSink.java:1154)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:411)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:355)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:259)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:694)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:682)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:748)
I faced similar (not specifically this, but dependency-related) issues migrating from 1.3 to 1.4.
In my case, I had to re-generate a fresh POM file using the Maven archetype and then add the needed dependencies one by one.
See the Java Quickstart or Scala Quickstart.
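For reference, regenerating the project skeleton with the quickstart archetype looks like this (shown for Flink 1.4.0 and the Java archetype):
mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=1.4.0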
The reason is that there has been a major rework of the dependency structure. See the release notes for more information.
Note that Flink 1.4 will load any Hadoop jars found via the "hadoop classpath" shell command, and these will be first on the classpath. So if you have an incompatible version of Hadoop installed that the "hadoop" command points at, you can run into this kind of problem.
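A quick way to check what Flink would pick up, assuming the hadoop command is on the PATH of the user running Flink:
hadoop classpath
If that prints JARs from an incompatible installation, take hadoop off that user's PATH (or point it at a compatible version) so that those JARs stay off Flink's classpath.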
