Apache Zeppelin Upgrade to Flink 1.4.2

I tried to upgrade Apache Zeppelin to use Flink 1.4.2. Checking the source of the Flink Zeppelin interpreter, I did not find anything that looks version-specific from a Flink point of view, so I just updated the Flink version in the pom file to 1.4.2 and ran a new build from source, which surprisingly worked. Running the Flink batch example notebook (or a streaming example of my own), I get the following error, which I cannot properly understand:
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:274)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:258)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:233)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:229)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:228)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:437)
at org.apache.zeppelin.scheduler.Job.run(Job.java:183)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:305)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
It would be great to get some understanding of how we can move to the newest Flink version in Zeppelin.

A recent thread on the Flink users mailing list covered how to update Zeppelin to use Flink 1.4.2. The email included an image that doesn't appear in the archive, so I'll include the relevant info here:
You need to set two properties in Zeppelin that point to the JobManager of the Flink cluster to be used by the Zeppelin server:
jobmanager.rpc.host
jobmanager.rpc.port
Note that there are also properties named host and port, but they aren't used for this purpose.
And you'll need to build Zeppelin using the scala-2.11 profile and Flink 1.4.2.
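A minimal sketch of what that can look like; the profile name, the flink.version property, and the host/port values are assumptions to adapt to your checkout and cluster (6123 is Flink's default JobManager RPC port):

# Build Zeppelin from source against Flink 1.4.2 with the Scala 2.11 profile
mvn clean package -DskipTests -Pscala-2.11 -Dflink.version=1.4.2

# Then, in the Zeppelin flink interpreter settings (values are placeholders):
# jobmanager.rpc.host = flink-jobmanager.example.com
# jobmanager.rpc.port = 6123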

Related

Upgrading Apache Flink: need to update pom.xml?

I've just upgraded my Flink from version 1.9.1 to 1.11.2 (using Docker).
I already have many Flink jobs running on version 1.9.1.
When I try to upgrade to 1.11.2 and rerun my jobs, I get errors.
2020-11-12 06:49:17,731 WARN org.apache.zookeeper.ClientCnxn []
- SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/jaas-1135609831848314731.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2020-11-12 06:49:17,739 INFO org.apache.zookeeper.ClientCnxn [] - Opening socket connection to server xxxxxx:2181
2020-11-12 06:49:17,741 ERROR org.apache.curator.ConnectionState [] - Authentication failed
And this is the error after deploying my Flink job:
Caused by: java.lang.RuntimeException: API paths not defined
and also:
java.lang.NoSuchMethodError: org.apache.flink.api.common.state.OperatorStateStore.getSerializableListState(Ljava/lang/String;)Lorg/apache/flink/api/common/state/ListState;
Do I need to change every pom for my Flink jobs?
Is there any workaround without changing my source code?
Thanks
Yes, you do have to rebuild your Flink jobs whenever you update the Flink version being used to run them. The libraries you use should be from the exact same version used by the JobManager and TaskManagers.
If you are trying to automate deployments for a CI/CD pipeline, you could inject the version number into the pom.xml using an environment variable, but doing things like that can make it hard to debug when things go wrong.
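A minimal sketch of that version-property approach; the artifact ID and Scala suffix are examples, and flink.version is an assumed property name:

<!-- pom.xml: keep the Flink version in a single property -->
<properties>
  <flink.version>1.11.2</flink.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>${flink.version}</version>
  </dependency>
</dependencies>

Since Maven command-line properties override pom defaults, a CI job could then rebuild against the cluster's version with, for example, mvn clean package -Dflink.version=$FLINK_VERSION.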

Beam on EMR throws a java.util.ServiceConfigurationError

I have an Apache Beam application (using Beam version 2.23.0) that I am trying to deploy on AWS EMR (emr-5.30.1) with Flink (1.10.0) preinstalled.
The application runs with no issues when I deploy it on my local Docker Flink cluster. But when I run
flink run -m yarn-cluster -c my_class my_jar.jar
on the master node of the EMR cluster,
I get
java.util.ServiceConfigurationError: com.fasterxml.jackson.databind.Module: Provider com.fasterxml.jackson.module.jaxb.JaxbAnnotationModule not a subtype
at java.util.ServiceLoader.fail(ServiceLoader.java:239)
at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:376)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at com.fasterxml.jackson.databind.ObjectMapper.findModules(ObjectMapper.java:1054)
at org.apache.beam.sdk.options.PipelineOptionsFactory.<clinit>(PipelineOptionsFactory.java:471)
at org.myapp.main(MainApp.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:664)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:895)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:968)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:968)
It seems like the issue is with
org.apache.beam.sdk.options.PipelineOptionsFactory.<clinit>(PipelineOptionsFactory.java:471), but I am not clear on what is causing this behaviour.
Can someone please advise what may cause this?
Thank you in advance!
That is probably a classloading issue.
On the EMR Flink EC2 instances there are already some jars, and those libraries are loaded before your own dependencies. So the version that is used at runtime is the one provided by EMR, not the one you have as a dependency in your own pom.xml.
There are multiple solutions:
in your pom.xml, use the same version as the one provided by EMR
on the EC2 instance, replace the EMR version with yours
change the order of library loading (see the sketch after this list)
Whatever the solution, you need to ship all the required dependencies to Flink, not only the jar that contains your own code.
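For the load-order option, Flink has a classloader ordering setting; a minimal sketch, assuming you can edit the cluster's flink-conf.yaml (child-first is already the default in recent Flink versions, so check what your EMR image actually ships):

# flink-conf.yaml: resolve classes from the user jar before the cluster classpath
classloader.resolve-order: child-first
# Or, alternatively, pin the troublesome packages to the parent classloader so
# only one copy is ever used (package prefix here is an example for this error):
# classloader.parent-first-patterns.additional: com.fasterxml.jackson.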

Get code '200' from prometheus when using PrometheusPushGateway to report flink metrics

When I use flink-metrics-prometheus_2.11-1.10.0.jar to report metrics to Prometheus, I get the following errors. I want to know the reason for this problem and how I can solve it:
2020-04-20 15:32:17.940 [Flink-MetricRegistry-thread-1] WARN org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter - Failed to push metrics to PushGateway with jobName oceanus, groupingKey {}.
java.io.IOException: Response code from http://9.91.161.72:80/metrics/job/oceanus was 200
at org.apache.flink.shaded.io.prometheus.client.exporter.PushGateway.doRequest(PushGateway.java:297)
at org.apache.flink.shaded.io.prometheus.client.exporter.PushGateway.push(PushGateway.java:127)
at org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter.report(PrometheusPushGatewayReporter.java:109)
at org.apache.flink.runtime.metrics.MetricRegistryImpl$ReporterTask.run(MetricRegistryImpl.java:441)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
It seems flink-metrics-prometheus-1.10.0.jar contains a really old Prometheus Java client version (https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-prometheus/pom.xml#L35 - 0.8.1; https://github.com/apache/flink/blob/release-1.9.1/pom.xml#L125 - 0.3.0).
The annoying WARN on a "200 OK" response code is fixed in client version 0.8.0 (https://github.com/prometheus/client_java/blob/parent-0.8.0/simpleclient_pushgateway/src/main/java/io/prometheus/client/exporter/PushGateway.java#L316), and that version is included since Flink 1.11.0 (https://github.com/apache/flink/blob/release-1.11.0/pom.xml#L127).
So the solution is to include flink-metrics-prometheus-1.11.0.jar in your classpath.
Do not worry about Flink version incompatibility there: I'm using Flink 1.9.1 with flink-metrics-prometheus-1.11.0.jar and everything is working fine. I guess there were not that many changes to that part between Flink versions.
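A minimal sketch of swapping the reporter jar, assuming a standard Flink distribution layout; paths and file names are placeholders to match your install:

# Replace the bundled reporter jar with the newer one in Flink's lib directory
rm $FLINK_HOME/lib/flink-metrics-prometheus_2.11-1.10.0.jar
cp flink-metrics-prometheus_2.11-1.11.0.jar $FLINK_HOME/lib/

# The reporter configuration in flink-conf.yaml stays the same, e.g.:
# metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
# metrics.reporter.promgateway.host: 9.91.161.72
# metrics.reporter.promgateway.port: 80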
It seems like there was a recent change in how responses are handled.
I suspect you need to match the Prometheus client jar to the server version, or vice versa.

Error while deploying flink application on EMR

I am getting this error when I deploy my Flink application on EMR:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/flink/api/common/serialization/DeserializationSchema
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.util.RunJar.run(RunJar.java:232)
It works fine, though, when I deploy it on a local cluster. I am using Flink 1.9.0 on EMR version 5.28.0.
This issue can be connected with multiple different things. Things to check are:
Version mismatch between the Flink version in your dependencies and the Flink version on EMR.
The core dependencies of Flink should be `provided`, so they do not clash with the dependencies that are available on the cluster (see the sketch below).
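A quick way to check both points; a sketch assuming a Maven build, with the artifact ID and Scala suffix as examples:

# List which Flink versions your job actually pulls in
mvn dependency:tree -Dincludes=org.apache.flink

<!-- pom.xml: mark core Flink APIs as provided so the cluster's copies are used -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.9.0</version>
  <scope>provided</scope>
</dependency>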
What is your JDK version? Is it possible that there is a problem with the environment? I think it is very likely that the JDK version does not match

[AWS Glue]: org.apache.thrift.TApplicationException: Internal error processing createInterpreter

I'm trying to use zeppelin-0.8.0 to connect to an AWS Glue development endpoint, and when executing a cell the error below occurs.
There is no helpful message to understand what the problem could be. Any leads appreciated.
172318_1906434757 is finished, status: ERROR, exception: java.lang.RuntimeException: org.apache.thrift.TApplicationException: Internal error processing createInterpreter, result: %text org.apache.thrift.TApplicationException: Internal error processing createInterpreter
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_createInterpreter(RemoteInterpreterService.java:209)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.createInterpreter(RemoteInterpreterService.java:192)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:169)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:165)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:165)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:132)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:299)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:407)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
UPDATE: So, as in the answer below, it looks like 0.8.0 doesn't work with Glue yet. I had problems running 0.7.x as well, with the javax.ws.rs package throwing a bunch of MethodNotFoundExceptions when running with Java 8 (switching to Java 7 via update-alternatives did not help either). But when running inside a JDK 7 Docker container it worked with no problems, and I was able to connect to my dev endpoint. I would highly appreciate it if anyone could clarify the root cause.
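A minimal sketch of the JDK 7 container approach described above; the base image, Zeppelin artifact name, and port are assumptions, not a verified setup:

# Dockerfile: run Zeppelin 0.7.3 on a Java 7 runtime
FROM openjdk:7-jdk
# ADD auto-extracts local tar archives into the target directory
ADD zeppelin-0.7.3-bin-all.tgz /opt/
WORKDIR /opt/zeppelin-0.7.3-bin-all
EXPOSE 8080
CMD ["bin/zeppelin.sh"]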
Could you please provide more information, such as the Zeppelin instance location? Is it running on your desktop/laptop, or is it running as an AWS notebook server? Also, did you try connecting with Zeppelin version 0.7.3, as mentioned in this AWS forum link:
https://forums.aws.amazon.com/thread.jspa?threadID=285128
As per the above link, dated Jul 2018, I think AWS Glue doesn't yet support Zeppelin 0.8.
I am assuming all other configurations and environment settings are done as needed. I can help more if you provide additional info.
UPDATE:
Anyway, please refer here, and to setting up Zeppelin on Windows, for any help on setting up a local development environment and Zeppelin notebook.
Once you set up the Zeppelin notebook, establish an SSH connection (using the AWS Glue DevEndpoint URL) so you have access to the data catalog/crawlers, etc., and also to the S3 bucket where your data resides; a sketch of the tunnel follows below. Then you can create your Python scripts in the Zeppelin notebook and run them from Zeppelin.
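A minimal sketch of that SSH connection as a port-forwarding tunnel; the key path and endpoint address are placeholders, and the 169.254.76.1:9007 target follows the pattern AWS's Glue-with-Zeppelin instructions use, so verify it against the SSH command shown in your own console:

# Forward local port 9007 to the Glue dev endpoint for Zeppelin's interpreter
ssh -i /path/to/dev-endpoint-key.pem -NTL 9007:169.254.76.1:9007 glue@<dev-endpoint-public-address>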
You can use the dev instance provided by Glue, but you may incur additional costs for it (EC2 instance charges).
Environment settings (updated in response to comments):
JAVA_HOME=E:\Java7\jre7
Path=E:\Python27;E:\Python27\Lib;E:\Python27\Scripts;
PYTHONPATH=E:\spark-2.1.0-bin-hadoop2.7\python;E:\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip;E:\spark-2.1.0-bin-hadoop2.7\python\lib\pyspark.zip
SPARK_HOME=E:\spark-2.1.0-bin-hadoop2.7
Change the drive name/folders accordingly. Let me know if any help is needed.
