Error while deploying flink application on EMR - apache-flink

I am getting this error when I deploy my flink application on EMR
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/flink/api/common/serialization/DeserializationSchema
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.util.RunJar.run(RunJar.java:232)
Although, it works fine when I deploy on a local cluster. I am using flink 1.9.0 on EMR version 5.28.0

This issue can be connected with multiple different things. Things to check are:
Version mismatch between Flink in dependencies and Flink on EMR.
The core dependencies of Flink should be `provided. To not cause clash with the dependencies that are available on cluster.

What is your JDK version? Is it possible that there is a problem with the environment? I think it is very likely that the JDK version does not match

Related

Upgrading Apache Flink need to update pom.xml?

I've just upgraded my flink from version 1.9.1 to 1.11.2 (using docker)
I have already many flink jobs running in version 1.9.1
When I try to upgrade to 1.11.1 and re run my job, it shows error.
2020-11-12 06:49:17,731 WARN org.apache.zookeeper.ClientCnxn []
- SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/jaas-1135609831848314731.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2020-11-12 06:49:17,739 INFO org.apache.zookeeper.ClientCnxn [] - Opening socket connection to server xxxxxx:2181
2020-11-12 06:49:17,741 ERROR org.apache.curator.ConnectionState [] - Authentication failed
And this is the error after deploying my flink job:
Caused by: java.lang.RuntimeException: API paths not defined
and also:
java.lang.NoSuchMethodError: org.apache.flink.api.common.state.OperatorStateStore.getSerializableListState(Ljava/lang/String;)Lorg/apache/flink/api/common/state/ListState;
Do I need to change every pom for my flink jobs?
Is there any work around without changing my source code?
Thanks
Yes, you do have to rebuild your Flink jobs whenever you update the Flink version being used to run them. The libraries you use should be from the same exact version used by the Job Manager and Task Managers.
If you are trying to automate deployments for a CI/CD pipeline, you could inject the version number into the pom.xml using an environment variable -- but doing things like that can make it hard to debug when things go wrong.

Beam on EMR throws a java.util.ServiceConfigurationError

I have an Apache Beam application(using beam version 2.23.0) that I am trying to deploy on AWS EMR(emr-5.30.1) with Flink(1.10.0) preinstalled.
The application is running with no issues when I deploy it on my local docker flink cluster. But when I do
flink run -m yarn-cluster -c my_class my_jar.jar
on the master node of the EMR cluster
I get
java.util.ServiceConfigurationError: com.fasterxml.jackson.databind.Module: Provider com.fasterxml.jackson.module.jaxb.JaxbAnnotationModule not a subtype
at java.util.ServiceLoader.fail(ServiceLoader.java:239)
at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:376)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at com.fasterxml.jackson.databind.ObjectMapper.findModules(ObjectMapper.java:1054)
at org.apache.beam.sdk.options.PipelineOptionsFactory.<clinit>(PipelineOptionsFactory.java:471)
at org.myapp.main(MainApp.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:664)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:895)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:968)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:968)
Seems like the issue is with
org.apache.beam.sdk.options.PipelineOptionsFactory.<clinit>(PipelineOptionsFactory.java:471) but I am not clear on what is causing this behaviour.
Can someone please advise what may cause this?
Thank you in advance!
That is probably a classloading issue.
On EMR Flink EC2 instance, there are already some jars, and these libraries are loaded before your own dependencies. So, the version that is used at runtime is the one provided by EMR, not the one you have as a dependency in your own pom.xml.
There are multiple solutions:
in your pom.xml, use the same version than the one provided by EMR
in EC2 instance, replace the EMR version by yours
change the order of library loading
whatever the solution, you need to send to Flink all the required dependencies, no only the jar that contains your own code

FlinkKafkaConsumer fails to read from a LZ4 compressed topic

We've got several flink applications reading from Kafka topics, and they work fine. But recently we've added a new topic to the existing flink job and it started failing immediately on startup with the following root error:
Caused by: org.apache.kafka.common.KafkaException: java.lang.NoClassDefFoundError: net/jpountz/lz4/LZ4Exception
at org.apache.kafka.common.record.CompressionType$4.wrapForInput(CompressionType.java:113)
at org.apache.kafka.common.record.DefaultRecordBatch.compressedIterator(DefaultRecordBatch.java:256)
at org.apache.kafka.common.record.DefaultRecordBatch.streamingIterator(DefaultRecordBatch.java:334)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.nextFetchedRecord(Fetcher.java:1208)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1245)
... 7 more
I found out that this topic has the lz4 compression and guess that flink for some reason is unable to work with it. Adding lz4 dependencies directly to the app didn't work, and what's weird - it runs fine locally, but fails on the remote cluster.
The flink runtime version is 1.9.1, and we have the same version of all other dependencies in our application:
flink-streaming-java_2.11, flink-connector-kafka_2.11, flink-java and flink-clients_2.11
Could this be happening due to flink not having a dependency to the lz4 lib inside?
Found the solution. No version upgrade was needed, nor the additional dependencies to the application itself. What worked out for us is adding the lz4 library jar directly to the flink libs folder in the Docker image. After that, the error with lz4 compression disappeared.

Apache Zeppelin Upgrade to Flink 1.4.2

I tried to upgrade Apache Zeppelin to use Flink 1.4.2. Checking the source of the Flink Zeppelin interpreter I did not find anything that seems materials from a Flink version point of view so I just updated the Flink verion in the pom file to 1.4.2 and run a new build from source which surprisingly worked. Running the Flink batch example notebook (or a streaming example of myself) I get the following error which I cannot properly understand
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:274)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:258)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:233)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:229)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:228)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:437)
at org.apache.zeppelin.scheduler.Job.run(Job.java:183)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:305)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Would be great to get some understanding how we can move to newest Flink version in Zeppelin.
A recent thread on the Flink users mailing list covered how to update Zeppelin to use Flink 1.4.2. The email included an image that doesn't appear in the archive, so I'll include the relevant info here:
You need to set two properties in Zeppelin, which point to the jobmanager of the flink cluster to be used by the zeppelin server:
jobmanager.rpc.host
jobmanager.rpc.port
Note that there are also properties named host and port, but they aren't used for this purpose.
And you'll need to build Zeppelin using profile scala-2.11 and Flink 1.4.2.

Flink 1.4 throws errors

just trying to migrate from flink 1.3 into 1.4 and getting this exception on
linux machine:
(not reproducing at windows).
i've import this package also:
// https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop2
compile group: 'org.apache.flink', name: 'flink-shaded-hadoop2', version: '1.4.0'
any help?
at flink console:
TriggerWindow(TumblingProcessingTimeWindows(10000), ReducingStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer#cb6c5dba, reduceFunction=com.clicktale.reducers.MetricsReducer#4e406694}, ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java:241)) -> Sink: Unnamed (1/1)
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.LocalFileSystem not a subtype
at java.util.ServiceLoader.fail(ServiceLoader.java:239)
at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:376)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2364)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:99)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:401)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(BucketingSink.java:1154)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:411)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:355)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:259)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:694)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:682)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:748)
I faced a similar (not specifically this, but dependencies related) issues migrating from 1.3 to 1.4.
In my case, I had to re-generate a fresh POM file using maven archetype and then add the needed dependencies one by one.
See Java Quickstart or Scala Quickstart.
Reason being that there has been a major rework on dependency structure. See Release notes for more information.
Note that Flink 1.4 will load any Hadoop jars found via the "hadoop classpath" shell command, and these will be first on the classpath. So if you have an incompatible version of Hadoop installed that the "hadoop" command points at, you can run into this kind of problem.

Resources