Unable to run PubSubSource on Flink cluster - apache-flink

I've written a minimal Flink application that tries to read data from PubSub.
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSource

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.enableCheckpointing(10000L) // PubSub messages are acknowledged on checkpoint completion
env.addSource(
  PubSubSource.newBuilder()
    .withDeserializationSchema(new SimpleStringSchema)
    .withProjectName("PROJECT")
    .withSubscriptionName("SUBSCRIPTION")
    .build())
  .print()
env.execute("job")
This program runs successfully when started directly (sbt run), but when I submit it to a Flink cluster, I get the following error message.
java.lang.IllegalArgumentException: cannot find a NameResolver for pubsub.googleapis.com:443
I've tried running clusters on different machines and environments, but none of them work.
OS: macOS Catalina / Ubuntu 18.04
Flink version: 1.13.1 / 1.12.2
Scala version: 2.12.13 / 2.11.12
JVM: Oracle JDK 8 & 11, OpenJDK 8 & 11
Here is the gist with the code, build.sbt, and the full error message.
Thank you.

Found the solution.
As this post says, I need to keep the files under META-INF/services.
After removing the following line from my sbt-assembly merge strategy, everything works fine.
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
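For anyone who wants to keep discarding the rest of META-INF, below is a minimal sketch of a merge strategy that only spares the service loader files (assuming sbt-assembly 1.x syntax; the connector's gRPC client discovers its NameResolver providers through those files, which is why discarding them produces the error above).

// build.sbt (sketch, assuming sbt-assembly 1.x; older plugin versions use
// `assemblyMergeStrategy in assembly` instead)
assembly / assemblyMergeStrategy := {
  // keep ServiceLoader registrations such as io.grpc.NameResolverProvider
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.concat
  // the rest of META-INF can still be discarded
  case PathList("META-INF", xs @ _*)             => MergeStrategy.discard
  case other                                     => MergeStrategy.defaultMergeStrategy(other)
}

MergeStrategy.concat simply concatenates the service files contributed by all dependencies, so every registered provider stays discoverable at runtime.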

Related

FlinkKafkaConsumer fails to read from a LZ4 compressed topic

We have several Flink applications reading from Kafka topics, and they work fine. But recently we added a new topic to an existing Flink job, and it started failing immediately on startup with the following root error:
Caused by: org.apache.kafka.common.KafkaException: java.lang.NoClassDefFoundError: net/jpountz/lz4/LZ4Exception
at org.apache.kafka.common.record.CompressionType$4.wrapForInput(CompressionType.java:113)
at org.apache.kafka.common.record.DefaultRecordBatch.compressedIterator(DefaultRecordBatch.java:256)
at org.apache.kafka.common.record.DefaultRecordBatch.streamingIterator(DefaultRecordBatch.java:334)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.nextFetchedRecord(Fetcher.java:1208)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1245)
... 7 more
I found out that this topic uses lz4 compression, and I guess that Flink for some reason is unable to work with it. Adding lz4 dependencies directly to the app didn't help, and what's weird is that it runs fine locally but fails on the remote cluster.
The Flink runtime version is 1.9.1, and all of our application's Flink dependencies are on the same version:
flink-streaming-java_2.11, flink-connector-kafka_2.11, flink-java and flink-clients_2.11
Could this be happening because Flink doesn't ship the lz4 library itself?
Found the solution. No version upgrade was needed, nor any additional dependencies in the application itself. What worked for us was adding the lz4 library jar directly to Flink's lib folder in the Docker image. After that, the lz4 compression error disappeared.

java.lang.ClassNotFoundException: org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer when trying to run spark job in zeppelin

1. Summarizing the problem
I have built Zeppelin from source by running the command below.
mvn clean package -DskipTests -Pspark-2.3 -Pscala-2.11
The build was successful.
I launched Apache Zeppelin on a Kubernetes cluster and could see that zeppelin-server starts perfectly fine,
but when I try to run a Spark notebook, the Spark interpreter pod goes into a Completed/Succeeded state with the following errors in spark-interpreter.log:
WARN [2020-03-06 00:42:37,683] ({main} Logging.scala[logWarning]:87) - Failed to load org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.
java.lang.ClassNotFoundException: org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer
2. Describe what you've tried
I did not find any resolution, so I could not try any solution to this problem yet.
Any suggestions or ideas would be highly appreciated.
I figured out the issue and was able to resolve it by adding --jars with the interpreter and Spark jars in the zeppelin-env.sh script, but later got stuck on a different issue.
Now the interpreter starts, but it is unable to launch executors.
Below is the error message; if anybody would like to provide any input, I would appreciate it.
java.lang.NoClassDefFoundError: org/sonatype/aether/resolution/DependencyResolutionException
Thank you.

Flink CEP: java.lang.NoSuchMethodError

When I try to run the jar file from the command line with
flink run /home/admin/Documents/flink_cep/Flink-master/dist/Kinesis.jar
I get an error, but my code runs fine in the NetBeans IDE:
A NoSuchMethodError indicates a version conflict.
You should verify that you compiled your Flink job with the same Flink version that your cluster is running.
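As an illustration, here is a hedged build.sbt sketch of that advice (the version number and artifact list are assumptions; align them with whatever your cluster actually runs, and the same idea applies to a Maven pom).

// build.sbt (sketch): keep a single Flink version for every Flink artifact
// and match it to the cluster. "1.13.1" is only an example version.
val flinkVersion = "1.13.1"

libraryDependencies ++= Seq(
  // the core APIs are already on the cluster, so keep them out of the fat jar
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % Provided,
  "org.apache.flink" %% "flink-clients"         % flinkVersion % Provided,
  // the CEP library is usually not in flink/lib, so bundle it with the job
  "org.apache.flink" %% "flink-cep"             % flinkVersion
)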

Apache Zeppelin 0.7.0-SNAPSHOT not working with external Spark

I am trying to use Zeppelin (a 0.7.0-SNAPSHOT compiled with mvn clean package -Pcassandra-spark-1.6 -Dscala-2.11 -DskipTests)
with an external, standalone Spark 1.6.1.
I have tried to set this up by adding export MASTER=spark://mysparkurl:7077 to /zeppelin/conf/zeppelin-env.sh,
and under the %spark interpreter settings in the Zeppelin GUI I have also tried setting the master parameter to spark://mysparkurl:7077.
So far, attempts to connect to Spark have been unsuccessful. Here is a piece of code I have used for testing Zeppelin with the external Spark, and the error I get with it:
%spark
val data = Array(1,2,3,4,5)
val distData = sc.parallelize(data)
val distData2 = distData.map(i => (i,1))
distData2.first
data: Array[Int] = Array(1, 2, 3, 4, 5)
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
Zeppelin is running in a docker container, and Spark is running on host.
Am I missing something here? Is there something else that needs to be configured in order for Zeppelin to work with an external, standalone Spark?
As Cedric H. mentions, at that time you had to compile Apache Zeppelin with -Dscala-2.10.
A few bugs have been fixed since September, and Scala 2.11 support should now be working well; if not, please file an issue in the official project JIRA.

Kafka Flink logging issue

I am working on a Kafka-Flink integration and am actually done with it. I have written a simple word count program in Java using the Flink API. When I run it with java -jar myjarname it works fine, but when I try to run it with the ./bin/flink run myjarname command, it gives me the following error:
NoSuchMethodError: org.apache.flink.streaming.api.operators.isCheckpointingEnabled
The respective jar is there, but it still gives me the above error.
